OWASP LLM05:2025 Improper Output Handling — multimodal dimension

OWASP LLM05:2025 Improper Output Handling describes the class of vulnerabilities that arise when LLM-generated content is passed to downstream components (web renderers, SQL query builders, operating system shells, code interpreters, API clients) without sufficient validation or sanitisation. The canonical examples are text-to-SQL pipelines where the model output is executed directly, and web-application backends that render model output as HTML without escaping. The commonly cited mitigations for LLM05 (output encoding, allowlisting, sandboxed execution, treating output as untrusted) target the text strings the model produces. The multimodal dimension of LLM05 gets far less attention: when a vision-language model (VLM) processes an image before generating structured output that feeds a downstream system, an adversarially crafted image is the root injection vector. Fixing the output handling layer in isolation, by sanitising what the VLM returns, does not remove the adversarial instruction the model received at the input layer. Glyphward provides a pre-VLM scan gate that addresses the input side of the LLM05 multimodal chain.

TL;DR

In any multimodal pipeline where VLM output feeds a downstream executor (SQL engine, shell, code interpreter, API), the image sent to the VLM is a first-class injection surface under LLM05. Call POST https://glyphward.com/v1/scan on every image before it reaches the VLM. Reject images with score >= 65. Output-layer sanitisation remains necessary but is not sufficient — an adversarial image that successfully steers model output bypasses output sanitisation by producing structurally valid but semantically malicious content. Free tier — 10 scans/day, no card required.

The four multimodal attack surfaces in LLM05 Improper Output Handling

1. VLM-to-SQL pipelines: image fields that feed structured query generation. Document intelligence pipelines (invoice processing, form OCR, ID verification, medical record extraction) commonly use a VLM to extract structured fields from a document image and then construct a SQL query to persist or retrieve records from a database. The LLM05 text baseline addresses SQL injection in the model's output string: if the VLM returns a value containing SQL metacharacters, output sanitisation is expected to catch it. The multimodal extension: an adversarially crafted invoice image can instruct the VLM to return a structurally valid JSON field value that is simultaneously a SQL injection payload, for example a "vendor name" field rendered as Acme Corp'; DROP TABLE invoices;--. If the VLM is steered by pixel-level instructions in the image, the injection arrives at the output sanitisation layer as a legitimate-looking structured field value, not a raw SQL fragment. Parameterised queries prevent that value from executing as SQL, but they do not stop the VLM from emitting an attacker-chosen value in a specific JSON key; the application still stores and acts on that value, so the attack moves from syntax injection to business-logic injection.

2. VLM-to-code interpreter: image descriptions that trigger code execution. Agentic pipelines built on LangChain, LlamaIndex, AutoGen, and CrewAI commonly include steps where a VLM processes a screenshot or diagram to generate a Python code block, a shell command, or a tool call that is then executed by a code interpreter or subprocess. The LLM05 risk here is that VLM-generated code is executed with insufficient sandboxing. The multimodal injection path: an adversarial image (a diagram, a screenshot, a scanned document) instructs the VLM to emit a specific shell command, file operation, or API call in its output. Code generation is intended behaviour here, since the pipeline is designed to take VLM output and execute it. The adversarial image subverts the intended semantics of that output by making the VLM emit attacker-controlled commands rather than the task-appropriate commands the pipeline operator intended. Sandboxing the code interpreter (LLM05's recommended mitigation) limits blast radius but does not prevent the VLM from producing attacker-specified code, which the pipeline then executes within the sandbox's permission boundary.

3. VLM-to-API client: structured output that triggers SSRF or credential exfiltration. Automation pipelines that process visual content from external sources (customer-submitted screenshots, uploaded product images, external web page captures) often use a VLM to extract a URL, an API endpoint reference, or an action identifier from the visual content, then invoke that URL or API endpoint. The LLM05 SSRF surface: an adversarial image instructs the VLM to extract a URL that points to an internal service, a cloud metadata endpoint (AWS 169.254.169.254, the GCP metadata server), or an attacker-controlled webhook. If the application uses VLM output to construct an HTTP request without allowlisting permitted domains, the adversarial image creates an SSRF chain through the VLM output handling layer. This is distinct from classical SSRF: the attacker does not control a URL parameter directly; they control an image that steers a VLM to produce their desired URL as an extracted structured field.

4. VLM-to-rendered-HTML: image analysis output injected into web application responses. Customer-facing applications that allow users to upload images and display AI-generated descriptions, captions, or analyses (product catalogues, social platforms, document review portals) face the LLM05 XSS risk if VLM output is rendered as HTML without escaping. The multimodal injection: a user-uploaded image crafted with pixel-level instructions causes the VLM to produce an output string containing JavaScript payloads, HTML injection, or markdown links pointing to phishing URLs. The application's HTML rendering layer, designed to display a VLM-generated product description or image caption, renders the attacker-specified content. Standard LLM05 output encoding (HTML entity escaping, Content-Security-Policy headers) mitigates the rendering step, but the injected instruction has already taken effect inside the model by the time the output reaches the sanitisation layer, so attacker-chosen content that needs no markup to do harm, such as a phishing URL shown as plain text, still reaches the page.
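For the VLM-to-API surface (item 3), the output-side check amounts to validating every extracted URL before any request is made. The following is a minimal sketch, not a complete SSRF defence: the `ALLOWED_HOSTS` set and the function name `is_safe_extracted_url` are hypothetical, and the sketch does not cover redirects or DNS rebinding, which would need to be handled at the HTTP client.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the hosts your pipeline is allowed to call.
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}


def is_safe_extracted_url(url: str) -> bool:
    """Return True only for https URLs whose host is explicitly allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    if not host:
        return False
    # Reject IP literals outright: metadata endpoints such as 169.254.169.254
    # and many internal services are addressed this way.
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass  # not an IP literal, continue to the hostname check
    return host in ALLOWED_HOSTS
```

The check uses the parsed hostname rather than a substring match, so a URL such as `https://api.example.com@evil.test/` is evaluated against `evil.test` and rejected.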

Integration: pre-VLM scan gate for multimodal output handling pipelines

import base64
import requests
import openai
import json

GLYPHWARD_KEY = "<your-glyphward-api-key>"
GLYPHWARD_THRESHOLD = 65

client = openai.OpenAI()


def scan_image_before_vlm(image_bytes: bytes, source: str = "pipeline") -> dict:
    """Scan image for adversarial PI before passing to VLM in output-handling chain."""
    encoded = base64.b64encode(image_bytes).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": source},
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()


def extract_structured_fields_safe(
    image_bytes: bytes,
    extraction_schema: dict,
    system_prompt: str,
) -> dict:
    """
    LLM05 multimodal pattern: scan image BEFORE VLM structured output call.

    Returns {'status': 'ok', 'fields': {...}} or {'status': 'blocked', 'reason': '...'}.
    Apply output sanitisation to 'fields' values AFTER this function returns 'ok'.
    """
    scan = scan_image_before_vlm(image_bytes, source="structured_extraction")
    if scan["score"] >= GLYPHWARD_THRESHOLD:
        return {
            "status": "blocked",
            "reason": "adversarial_image_detected",
            "score": scan["score"],
            "scan_id": scan["scan_id"],
            "action": "route_to_manual_review",
        }

    encoded = base64.b64encode(image_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Extract fields matching this schema: {json.dumps(extraction_schema)}. Return JSON only.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            },
        ],
        response_format={"type": "json_object"},
        max_tokens=512,
    )

    raw_output = response.choices[0].message.content

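    # Output sanitisation layer (LLM05 baseline, still necessary after the input scan)
    try:
        fields = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a response with no text content (content is None)
        return {"status": "blocked", "reason": "invalid_json_output"}
    if not isinstance(fields, dict):
        return {"status": "blocked", "reason": "invalid_json_output"}

    # Defence in depth only: neutralise SQL metacharacters in string values.
    # This does not replace parameterised queries or HTML encoding downstream.
    sanitised = {}
    for key, value in fields.items():
        if isinstance(value, str):
            # Double single quotes; drop statement terminators and comment markers
            sanitised[key] = value.replace("'", "''").replace(";", "").replace("--", "")
        else:
            sanitised[key] = value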

    return {"status": "ok", "fields": sanitised, "scan_id": scan["scan_id"]}

The pattern above applies both layers: a Glyphward pre-scan at the image input layer (blocking adversarial images before the VLM call) and output sanitisation at the model output layer (neutralising SQL metacharacters in extracted field values). Both layers are necessary because they address different threat vectors: the pre-scan addresses adversarial steering of model behaviour; the output sanitisation addresses non-adversarial model errors and edge-case output formatting. For SQL persistence, use parameterised queries with the sanitised field values, and never string-interpolate model output into SQL statements, regardless of whether a pre-scan was performed. For HTML rendering, apply HTML entity encoding to all VLM output strings before inserting them into page templates.
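The two downstream rules can be sketched with the Python standard library alone. This is an illustrative example under assumptions: the `invoices` table, its columns, and the helper names `persist_invoice` and `render_caption` are hypothetical, and SQLite stands in for whatever database the pipeline actually uses.

```python
import html
import sqlite3


def persist_invoice(conn: sqlite3.Connection, fields: dict) -> None:
    """Insert extracted fields using placeholders, never string interpolation."""
    conn.execute(
        "INSERT INTO invoices (vendor_name, total) VALUES (?, ?)",
        (fields["vendor_name"], fields["total"]),
    )
    conn.commit()


def render_caption(caption: str) -> str:
    """Entity-encode model output before placing it in an HTML template."""
    return f'<p class="caption">{html.escape(caption)}</p>'


# Hypothetical in-memory table for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor_name TEXT, total REAL)")

# The payload is bound as data, so it is stored as a literal string
persist_invoice(conn, {"vendor_name": "Acme Corp'; DROP TABLE invoices;--", "total": 120.5})
```

With placeholders the driver binds the value as data, so the injection string is stored verbatim and the table survives. The stored value is still attacker-chosen, which is why the input-side scan and semantic checks on field values remain relevant.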

Coverage matrix

Mitigation layer	VLM-to-SQL (structured extraction)	VLM-to-code interpreter (agentic pipelines)	VLM-to-API client (SSRF via extracted URLs)	VLM-to-HTML renderer (XSS via captions)
Output encoding / sanitisation (LLM05 baseline)	Partial: catches SQL metacharacters in text output; misses VLM instructed to emit well-formed SQL via pixel PI	Partial: sandboxing limits blast radius; VLM still emits attacker-specified code within sandbox permissions	Partial: a domain allowlist rejects unlisted hosts, including internal and metadata endpoints; misses attacker-specified URLs on allowlisted hosts	Yes: HTML entity encoding catches XSS in VLM output; pixel PI still takes effect inside the model before sanitisation
Parameterised queries (LLM05 baseline)	Yes: prevents SQL injection from literal VLM output; does not prevent VLM from emitting attacker-specified field values that are business-logic injections	N/A	N/A	N/A
Text-only prompt injection scanners (Lakera, LLM Guard, Azure Prompt Shields)	No: these scan text input strings; image input to the VLM bypasses text-layer scanners	No	No	No
Glyphward pre-VLM image scan (multimodal PI detection)	Yes: blocks adversarial image before VLM structured extraction call	Yes: blocks adversarial screenshot/diagram before code generation step	Yes: blocks adversarial image before VLM URL/action extraction	Yes: blocks adversarial user-uploaded image before VLM caption/description generation

Related questions

How does LLM05 Improper Output Handling differ from LLM01 Prompt Injection in multimodal pipelines?

LLM01 Prompt Injection is the root cause: an adversarially crafted input (image or text) that causes the model to deviate from its intended instructions. LLM05 Improper Output Handling is the consequence pathway: the downstream system that receives and acts on model output without validating it. In a multimodal pipeline, both are present simultaneously: an adversarial image causes LLM01 (the model is injected), and the model's altered output then exploits LLM05 (the downstream SQL executor, code interpreter, or API client acts on the injected output without validation). Addressing only LLM01 (blocking the adversarial image input) breaks the chain for every image the scanner detects, but leaves downstream systems exposed to any image it misses; addressing only LLM05 (sanitising model output) limits blast radius but allows the injection to occur inside the model, potentially producing subtle semantic manipulations that pass surface-level sanitisation. The strongest posture addresses both layers.

Does structured output (JSON mode) in GPT-4o or Claude protect against LLM05 multimodal injection?

Structured output (JSON schema enforcement, tool-call output typing) ensures the model returns valid JSON conforming to a specified schema — it prevents malformed output and enforces field types. It does not prevent an adversarially injected image from causing the model to populate schema-conformant fields with attacker-specified values. If the schema expects a string for a "vendor_name" field, structured output guarantees a string is returned — but an adversarially crafted invoice image can instruct the model to return an attacker-specified string value in that field. Parameterised SQL queries then prevent that string from executing as SQL; the attacker's goal shifts to choosing a string value that achieves a business-logic injection (for example, a vendor name that matches a privileged internal account) rather than a syntax injection. Structured output is valuable for output handling robustness but is not a PI mitigation.
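Because schema conformance says nothing about whether a value is plausible, a semantic check on extracted fields can narrow what an attacker-chosen value is able to do. The sketch below is illustrative only: the `KNOWN_VENDORS` registry, the invoice number pattern, the amount limits, and the field names are all hypothetical and would come from your own master data and business rules.

```python
import re

# Hypothetical vendor registry; in practice this comes from your own master data.
KNOWN_VENDORS = {"acme corp", "globex ltd"}
# Hypothetical invoice number format: 2-4 uppercase letters, a dash, 4-10 digits.
INVOICE_NUMBER_RE = re.compile(r"[A-Z]{2,4}-\d{4,10}")


def validate_invoice_fields(fields: dict) -> list[str]:
    """Return a list of problems; an empty list means the fields passed."""
    problems = []

    vendor = fields.get("vendor_name")
    if not isinstance(vendor, str) or vendor.strip().lower() not in KNOWN_VENDORS:
        problems.append("vendor_name is not a known vendor")

    number = fields.get("invoice_number")
    if not isinstance(number, str) or not INVOICE_NUMBER_RE.fullmatch(number):
        problems.append("invoice_number has an unexpected format")

    total = fields.get("total")
    # bool is a subclass of int in Python, so exclude it explicitly
    is_number = isinstance(total, (int, float)) and not isinstance(total, bool)
    if not is_number or not 0 < total <= 100_000:
        problems.append("total is outside the expected range")

    return problems
```

A check like this rejects free-form payloads in constrained fields, but it cannot tell whether a value that is itself valid (a real vendor, a plausible amount) was chosen by the attacker, so it complements the input-side scan and does not replace it.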

Which OWASP LLM Top 10 items are most relevant to multimodal pipelines?

Five items in the OWASP LLM Top 10 2025 have meaningful multimodal dimensions: LLM01 Prompt Injection (adversarial images and audio as injection vectors), LLM02 Sensitive Information Disclosure (VLMs that reveal training data or system prompt content when prompted via adversarial images), LLM05 Improper Output Handling (this page — VLM output fed to downstream executors), LLM08 Vector and Embedding Weaknesses (adversarial images poisoning CLIP-based multimodal vector indexes), and LLM09 Misinformation (adversarial images causing VLMs to generate authoritative false claims). LLM05 is particularly impactful because it is the primary pathway from an LLM01 input-layer injection to real-world downstream harm — the output handling chain is where injections become consequences.

What is the right remediation priority for LLM05 in a multimodal pipeline?

Address both layers with explicit priority ordering. First, add a pre-VLM image scan gate (Glyphward) on every pipeline path where an image from an untrusted external source feeds a VLM whose output is subsequently executed, rendered, or acted upon by a downstream system — this eliminates the adversarial input before it reaches the model. Second, apply standard LLM05 output handling mitigations to all VLM output: parameterised queries for SQL, sandboxed execution for code, URL allowlisting for API calls, HTML entity encoding for web rendering. Third, log and alert on scan rejections — blocked images are evidence of active attack attempts, not background noise. The multimodal AI security checklist provides a structured review for both layers.
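The third step, logging and alerting on rejections, can be sketched as a small decision helper. It assumes the scan response carries the `score` and `scan_id` fields used in the integration example on this page; the logger name, the function name `record_scan_decision`, and the choice to fail closed on a malformed response are assumptions for illustration.

```python
import logging

logger = logging.getLogger("vlm_gate")  # hypothetical logger name


def record_scan_decision(scan: dict, source: str, threshold: int = 65) -> bool:
    """Log the scan outcome; return True if the image may proceed to the VLM."""
    score = scan.get("score")
    scan_id = scan.get("scan_id")

    # Fail closed: a response without a numeric score is treated as a rejection.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        logger.error("scan response missing score source=%s scan_id=%s", source, scan_id)
        return False

    if score >= threshold:
        # Rejections are logged at WARNING so alerting rules can match on them.
        logger.warning(
            "image rejected source=%s scan_id=%s score=%s", source, scan_id, score
        )
        return False

    logger.info("image accepted source=%s scan_id=%s score=%s", source, scan_id, score)
    return True
```

Keeping the `scan_id` in every log line lets a reviewer correlate a blocked upload with the scan record during manual review.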
