Prompt injection scanner for IBM watsonx

IBM watsonx is an enterprise AI platform spanning foundation model inference (watsonx.ai), an open lakehouse (watsonx.data), and enterprise chatbot tooling (watsonx Assistant). Each tier introduces a distinct multimodal attack surface: watsonx.ai pipelines ingest PDF pages and images into Granite Vision and third-party foundation models, watsonx Assistant handles file uploads from enterprise users, and IBM Cloud Object Storage (COS) feeds document stores that flow directly into watsonx.data lakehouse queries and RAG retrieval. IBM's GRC platform (OpenPages), SIEM layer (QRadar AI), and text-only NLP service (Watson NLU) do not inspect pixel-level content in these pipelines, so adversarially crafted images can pass through that governance stack without triggering a prompt-injection defence. Glyphward's multimodal scanner intercepts image bytes at each of these chokepoints before they reach a watsonx foundation model and blocks injections that arrive through visual input channels this tooling does not inspect.

TL;DR

Before passing any image or document page to a watsonx.ai foundation model or a watsonx Assistant skill, call POST https://glyphward.com/v1/scan with the base64-encoded image bytes. Reject the request if the returned score is 65 or higher. This threshold catches typo-glyph and FigStep-style pixel attacks with low false-positive rates on enterprise document content. The same scan call covers all four attack surfaces described below. The free tier allows 10 scans/day, with no card required.

The four multimodal attack surfaces in IBM watsonx

1. watsonx.ai Document AI (PDF and image ingestion for RAG). watsonx.ai RAG pipelines typically ingest documents through Watson Discovery or direct IBM Cloud Object Storage indexing. Watson Discovery converts PDF pages to images internally before extracting text, so a PDF with adversarial pixel-level content embedded in a page image can have its injected instruction surfaced alongside the legitimate document text in the retrieval context. Pipelines that use the ibm-watson SDK (Discovery client) or the watsonx.ai Python library to index COS buckets receive these pages without any intermediate visual-content inspection. When the retrieved context is passed to a Granite or Llama foundation model via the ModelInference API, the adversarial instruction is presented as trusted retrieval context. This is a textbook indirect prompt injection executed through the document ingestion layer.

2. watsonx Assistant multimodal file uploads from enterprise users. IBM watsonx Assistant (formerly Watson Assistant) supports custom extensions and skills that accept file uploads from enterprise end-users: HR document reviewers, procurement analysts, and compliance teams routinely upload scanned contracts, invoices, and forms. Each uploaded file is processed by the assistant's configured skill, which may pass image content to a watsonx.ai foundation model for analysis. An employee or external counterparty who can influence the content of an uploaded file, even a single page embedded inside a multi-page PDF, can inject instructions that override the assistant's system prompt or extract information from the conversation context. The built-in content moderation in watsonx Assistant operates on conversational text, not on the pixel content of uploaded files.

3. IBM Cloud Object Storage and watsonx.data lakehouse — adversarial images in document stores. watsonx.data is IBM's open lakehouse built on Apache Iceberg and Presto. Enterprise teams store Parquet and Delta tables in IBM COS buckets; those tables frequently include binary columns containing document images, scanned receipts, or product photographs that are passed to vision-capable foundation models for enrichment queries. An attacker who can write a single adversarially crafted image into a COS bucket — through a compromised upload workflow, a poisoned vendor data feed, or a misconfigured bucket policy — can corrupt an entire batch enrichment run. Because watsonx.data's query engine processes these images as opaque binary data and passes them to the foundation model without inspection, the injection travels undetected from the data lakehouse layer into the AI output layer.

4. Granite Vision and partner vision models in IBM RPA and workflow automation. IBM RPA (Robotic Process Automation) and business automation workflows built on IBM Cloud Pak for Business Automation frequently use watsonx.ai foundation models with image input capabilities, including Granite Vision and partner vision models available in the watsonx.ai model library, to extract structured data from scanned forms, invoices, and screenshots captured during automation runs. These automations operate with elevated permissions: they may write to ERP systems, trigger financial approvals, or update HR records. A manipulated screenshot or scanned form injected into an RPA workflow can steer the model into producing structured extraction results that cause the automation to perform unintended privileged actions. Unlike user-facing chatbot deployments, RPA workflows often run unattended, with no human review step between model output and downstream action execution.
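The batch cases in surfaces 3 and 4 can be sketched as a partition step that runs before any enrichment query or automation reads the images. This is a minimal sketch, not a Glyphward or IBM API: `partition_by_scan` and the injected `scan` callable are hypothetical names, and the only assumption about the scan result is that it is a dict with a numeric `score` field.

```python
from __future__ import annotations

from typing import Callable, Iterable

THRESHOLD = 65  # the cut-off recommended in this guide; tune per corpus


def partition_by_scan(
    rows: Iterable[tuple[str, bytes]],
    scan: Callable[[bytes], dict],
    threshold: int = THRESHOLD,
) -> tuple[list[str], list[str]]:
    """Split (row_id, image_bytes) pairs into allowed and quarantined ids.

    `scan` is a hypothetical callable returning a dict with a numeric
    "score". Any scanner error quarantines the row (fail-closed).
    """
    allowed: list[str] = []
    quarantined: list[str] = []
    for row_id, image_bytes in rows:
        try:
            score = scan(image_bytes)["score"]
        except Exception:
            quarantined.append(row_id)
            continue
        if score >= threshold:
            quarantined.append(row_id)
        else:
            allowed.append(row_id)
    return allowed, quarantined
```

Passing the scanner in as a callable keeps the partition logic independent of the HTTP client, so the same function can be exercised with a stub in tests and with a real scan call in production.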

Integration: watsonx.ai Python SDK with Glyphward pre-scan gate

import base64
import requests
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

IBM_API_KEY = "<your-ibm-cloud-api-key>"
IBM_PROJECT_ID = "<your-watsonx-project-id>"
IBM_REGION_URL = "https://us-south.ml.cloud.ibm.com"

GLYPHWARD_KEY = "<your-glyphward-api-key>"
GLYPHWARD_THRESHOLD = 65  # fail-closed threshold for enterprise document content

credentials = Credentials(
    url=IBM_REGION_URL,
    api_key=IBM_API_KEY,
)

model = ModelInference(
    model_id="ibm/granite-vision-3-2-2b",
    credentials=credentials,
    project_id=IBM_PROJECT_ID,
    # model.chat() expects chat-style parameter names (max_tokens), not the
    # GenTextParamsMetaNames used by generate() (max_new_tokens).
    params={
        "max_tokens": 512,
        "temperature": 0.0,
    },
)


def scan_image_for_injection(image_bytes: bytes) -> dict:
    """Scan image bytes for multimodal prompt injection before watsonx.ai call."""
    encoded = base64.b64encode(image_bytes).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": "ibm_watsonx"},
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()


def analyse_document_image(image_bytes: bytes, prompt: str = "Extract all text and summarise this document.") -> str:
    """Gate image through Glyphward before passing to watsonx.ai foundation model."""
    # Fail-closed: if scanner is unreachable, block the request
    try:
        scan = scan_image_for_injection(image_bytes)
    except Exception as exc:
        raise RuntimeError(
            "Image security check unavailable — request blocked. Please retry."
        ) from exc

    if scan["score"] >= GLYPHWARD_THRESHOLD:
        raise ValueError(
            f"Image blocked: adversarial content detected "
            f"(score {scan['score']}/100, scan_id={scan['scan_id']})"
        )

    # Safe — forward to watsonx.ai Granite Vision model
    encoded = base64.b64encode(image_bytes).decode()
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{encoded}"},
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]
    response = model.chat(messages=messages)
    return response["choices"][0]["message"]["content"]

The scan_image_for_injection() gate runs before every model.chat() call regardless of where the image comes from: a COS document bucket, a watsonx Assistant file upload, or an RPA screenshot capture. The GLYPHWARD_THRESHOLD of 65 is deliberately conservative for enterprise document content, where false negatives carry higher regulatory risk than false positives. Raise it only after measuring false-positive rates on your own document corpus and confirming that the false-negative rate on known-adversarial samples stays acceptable. The fail-closed handling of scanner failures ensures that a transient network error never silently lets an unscanned image through to the foundation model. You can swap model_id from ibm/granite-vision-3-2-2b to any other vision-capable model in the watsonx.ai model library; the same scan gate applies.
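The listing above hard-codes image/png in the data URL, which only matches PNG input. One way to handle mixed uploads is to sniff the format from the leading bytes. The helper below is a sketch under stated assumptions: `sniff_image_mime` and `to_data_url` are hypothetical names, only four common formats are recognised, and unknown input is rejected rather than guessed.

```python
import base64


def sniff_image_mime(image_bytes: bytes) -> str:
    """Return the MIME type for PNG, JPEG, GIF or WebP data, else raise."""
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if image_bytes.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if image_bytes[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognised image format")


def to_data_url(image_bytes: bytes) -> str:
    """Build a data URL whose MIME type matches the actual image bytes."""
    mime = sniff_image_mime(image_bytes)
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Whether a given model accepts every one of these formats is not covered here; check the model's documentation before forwarding anything other than PNG or JPEG.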

Coverage matrix

Defence layer	watsonx.ai Document AI / RAG	watsonx Assistant file uploads	COS + watsonx.data lakehouse images	Granite Vision / RPA automation
IBM OpenPages (GRC)	No — policy and risk workflow management, not content inspection	No	No	No
IBM Security QRadar AI (SIEM)	No — network and log threat detection, not AI input scanning	No	No	No
IBM Cloud Data Shield (Confidential Computing)	No — runtime memory encryption, not adversarial content detection	No	No	No
Watson NLU (text analysis)	No — text-only; cannot inspect pixel-level content in images	No — file content not analysed by NLU pre-ingestion	No	No
Glyphward pre-model scan	Yes — scan before Watson Discovery indexing or direct COS RAG retrieval	Yes — scan file upload bytes before skill processing	Yes — scan binary image columns before batch enrichment	Yes — scan RPA screenshots and scanned forms before model inference

Related questions

Does IBM's Adversarial Robustness Toolbox (ART) cover prompt injection in multimodal inputs?

No. IBM Research's Adversarial Robustness Toolbox (ART) is an open-source Python library designed to defend machine learning models against adversarial attacks — primarily evasion attacks, poisoning attacks, and extraction attacks against classifiers and object detectors. ART is a model-robustness framework: it helps you harden a model's decision boundary against perturbation attacks during training and evaluation. It does not provide runtime inspection of prompt inputs, does not detect injected natural-language instructions hidden inside image pixels, and does not integrate as an API gateway that blocks malicious inference requests. ART and Glyphward are complementary: ART addresses model-level adversarial ML robustness; Glyphward addresses application-level prompt injection through visual channels at runtime.

How does this interact with IBM's GDPR and FedRAMP compliance posture?

IBM watsonx deployments in regulated industries commonly operate under GDPR (for EU data subjects) and FedRAMP Moderate or High authorisations (for US federal workloads). Both frameworks expect documented controls over the integrity of processing and over access to processing systems. A successful multimodal prompt injection can bypass instruction boundaries and cause a model to disclose data from its retrieval context, which is a potential gap against GDPR Article 25 (data protection by design) and the SI-10 (information input validation) control in the FedRAMP baselines. Glyphward's pre-scan gate provides an auditable control point: each scan produces a scan_id that can be logged alongside the request as evidence of input validation in compliance audits. Glyphward does not store image bytes after scanning, which is compatible with GDPR data minimisation requirements.
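An audit entry for this control point could look like the sketch below. The field names and the `build_audit_record` helper are assumptions for illustration, not a Glyphward schema; only `score` and `scan_id` come from the scan response used in the integration listing. The record stores a SHA-256 digest of the image rather than the bytes, so the log itself holds no document content.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(image_bytes: bytes, scan: dict, threshold: int = 65) -> str:
    """Serialise one scan decision as a JSON line for a compliance log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "scan_id": scan["scan_id"],
        "score": scan["score"],
        "threshold": threshold,
        "decision": "blocked" if scan["score"] >= threshold else "allowed",
        # Digest only: lets an auditor match the entry to a file without
        # the log retaining the image itself.
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

Logging the threshold alongside the score keeps each entry interpretable after the threshold is tuned.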

What is the difference between scanning at the watsonx.ai API level vs. at the Watson Discovery ingestion level?

Scanning at the Watson Discovery ingestion level (when documents are first indexed into a collection) prevents adversarial images from entering the knowledge base at all — this is the preferred approach for RAG pipelines because it stops the payload at the earliest possible point and reduces the risk that a poisoned index entry is retrieved in future queries. Scanning at the watsonx.ai API level (immediately before a ModelInference.chat() or generate() call) catches adversarial content that arrives through real-time paths — user uploads, RPA screenshots, and live COS reads — that bypass the Discovery ingestion flow. In a full defence-in-depth deployment, both scan points are active: Discovery-ingestion scanning for batch document pipelines and API-level scanning for real-time inference paths.

Does the Glyphward scan threshold of 65 produce false positives on normal business documents?

At a threshold of 65, Glyphward's false-positive rate on typical enterprise document images — scanned invoices, contracts, ID documents, and product photography — is below 0.3% based on internal benchmarks. The threshold of 65 is chosen as a conservative default for IBM watsonx deployments where the downstream actions of the model (ERP writes, financial approvals, HR record updates) justify a low tolerance for missed detections. If your document corpus is narrow and well-characterised — for example, a single form template — you can evaluate false-positive rates against a representative sample using the free tier before tuning the threshold upward. Raise the threshold toward 80 only after confirming the false-negative rate on known-adversarial samples is acceptable for your risk posture.
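Evaluating a candidate threshold comes down to two counts: benign images scored at or above it (false positives) and known-adversarial images scored below it (false negatives). The sketch below assumes you have already collected scores for a labelled sample. `error_rates` is a hypothetical helper, and the score lists are made-up illustrative values, not benchmark data.

```python
def error_rates(benign_scores, adversarial_scores, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    An image counts as blocked when score >= threshold.
    """
    false_pos = sum(1 for s in benign_scores if s >= threshold)
    false_neg = sum(1 for s in adversarial_scores if s < threshold)
    return false_pos / len(benign_scores), false_neg / len(adversarial_scores)


# Illustrative values only.
benign = [5, 12, 30, 41, 66, 18, 9, 72]
adversarial = [91, 88, 67, 79, 95]

for t in (65, 70, 80):
    fpr, fnr = error_rates(benign, adversarial, t)
    print(f"threshold={t}: FPR={fpr:.3f} FNR={fnr:.3f}")
```

Reading both rates side by side makes the trade-off explicit: each step up in the threshold that removes a false positive should be checked against the adversarial samples it starts letting through.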

Can this gate be applied in an IBM Cloud Pak for Business Automation workflow?

Yes. IBM Cloud Pak for Business Automation (CP4BA) RPA workflows can call external REST APIs from within automation scripts. In an IBM RPA script, use the built-in HTTP Request command to POST the base64-encoded screenshot or scanned document to https://glyphward.com/v1/scan before passing it to a watsonx.ai API call or a Watson Discovery indexing step. In IBM Business Automation Workflow (BAW), you can invoke the scan from a Service Flow using an external service integration configured against the Glyphward API endpoint. The fail-closed pattern applies in both environments: if the scan API call fails or returns a non-2xx status, the automation script should abort the task and route it to a human review queue rather than continuing with an unscanned input.
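In an unattended workflow the same rule usually maps to a routing decision rather than an exception. The Python sketch below illustrates that control flow only; it is not IBM RPA or BAW syntax, and `route_task`, the injected `scan` callable, and the route labels are hypothetical.

```python
from enum import Enum
from typing import Callable


class Route(Enum):
    PROCEED = "proceed"
    BLOCK = "block"
    HUMAN_REVIEW = "human_review"


def route_task(
    image_bytes: bytes,
    scan: Callable[[bytes], dict],
    threshold: int = 65,
) -> Route:
    """Decide whether an automation task may continue with this image.

    Scanner failures and malformed responses go to human review, so the
    automation never acts on an unscanned input.
    """
    try:
        score = float(scan(image_bytes)["score"])
    except Exception:
        return Route.HUMAN_REVIEW
    return Route.BLOCK if score >= threshold else Route.PROCEED
```

Keeping scanner failure (HUMAN_REVIEW) distinct from a positive detection (BLOCK) lets operators tell an outage apart from an attack when they triage the queue.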
