OWASP LLM01:2025 prompt injection — closing the multimodal sub-category
When AppSec teams audit an LLM application against the OWASP Top 10 for LLM Applications 2025, LLM01 — Prompt Injection — is the first control they have to evidence. The 2025 revision of LLM01 broadens the category beyond the original 2023 framing: it now recognises that prompt injection arrives through images, through audio, through embedded resources, and through any tool result the model treats as an instruction. Every public self-serve defender — Lakera Guard, LLM Guard, Azure Prompt Shields, Promptfoo — handles the text portion of LLM01. None of them, as shipped, cover the multimodal portion. That gap is what an LLM01:2025 audit will fail you on if your application accepts image upload, voice input, or any non-text content from a user or a tool. Here is what the multimodal piece of the control actually requires, and the inference-time scanner pattern that closes it.
TL;DR
LLM01:2025 (see the canonical risk page on the OWASP GenAI Security Project) treats prompt injection as a single category with multiple delivery channels — direct text, indirect text via retrieved or fetched content, and explicitly multimodal inputs (images, audio, mixed media). A control that satisfies LLM01 must inspect every channel the model actually consumes, at the place the model consumes it. Text-side filters are necessary but not sufficient: by design, they do not see pixels or waveforms. Glyphward sits as the inference-time scanner on the multimodal half of the control — bytes in, 0–100 score and flagged region out — and runs alongside the text-side guard you already have.
What LLM01:2025 actually says about multimodal
The 2025 revision of LLM01 extends the 2023 description in two places that matter for product teams. First, the threat description names multimodal inputs explicitly: any input the model accepts as part of its context — text strings, encoded images, encoded audio, embedded resources, file references, or content blocks delivered by tools — is a candidate carrier for an injected instruction. Second, the mitigation guidance names input inspection at the modality level. It is not enough to filter the text portion of a request and let the rest through; controls have to apply to whatever the model actually receives.
That is a small wording change with a large compliance consequence. Under LLM01:2023, an audit that demonstrated a text PI filter on user-typed prompts plus a basic system-prompt isolation pattern usually cleared the control. Under LLM01:2025, the same evidence does not clear the control if the application also accepts image upload (almost every chatbot in 2026), voice input (every voice agent), screenshots from a coding agent, or a retrieved PDF in a RAG pipeline. Each of those is a delivery channel the 2025 description names.
The OWASP GenAI Security Project — the 2025 successor to the original LLM Top 10 working group — maintains the LLM01 risk page as the canonical reference. Auditors and security reviewers will read that page first; their evidence questions will track its structure. If your application has a multimodal surface, expect to be asked which control inspects each modality and at what point in the pipeline.
Why the multimodal sub-category was added in 2025
Three things changed between the 2023 list and the 2025 list:
- Vision-language and audio-capable models went mainstream. GPT-4o, Claude 3 and 4, Gemini 1.5+, and the open-weight Qwen-VL line all accept images as ordinary chat content, and some of them (GPT-4o and Gemini, for example) accept audio as well. By 2024, accepting non-text inputs in production was the rule, not the exception.
- The attack literature caught up. FigStep, AgentTypo, and the typographic-PI class established that an instruction rendered as pixels survives every OCR-based defence (see FigStep detection and AgentTypo detector). WhisperInject established the audio analogue (see WhisperInject detection). Indirect prompt injection via images, ported from the Greshake et al. 2023 framing into 2024 multimodal pipelines, became a routine red-team finding (see indirect prompt injection in images).
- Existing public defenders did not extend. The text-PI scanners that cleared LLM01:2023 stayed text-only. The Sept–Nov 2025 acquisition of Lakera by Check Point pushed the leading text scanner upmarket rather than into modalities (see what Check Point buying Lakera means for self-serve AI-security buyers). The result was a public-control gap exactly where the 2025 list said one should not be.
The 2025 revision is the OWASP working group's response to that gap: name the multimodal channel explicitly so that an audit cannot quietly accept text-only evidence for a multimodal application.
The three multimodal delivery channels you have to cover
For a self-assessment against LLM01:2025, three multimodal channels recur in production AI applications. Each is a distinct evidence question.
1. Direct image upload. The user uploads a photo, a screenshot, a chart, or a meme. The image carries an instruction rendered onto its pixels: a FigStep-style anti-OCR overlay, an AgentTypo-style adversarial-glyph block, an attribute spoof, or a confusable visual prompt. Examples in scope: avatar SaaS (selfie-to-portrait), chatbots with image upload, support agents that accept screenshots of error states, content moderation pipelines, multimodal customer service. Per-product threat models are covered in avatar SaaS, chatbots with image upload, and screenshot-reading agents.
2. Direct audio input. The user speaks. The audio carries one of three things: a spoken jailbreak that the STT pipeline transcribes faithfully, an inter-word carrier that the transcript drops, or a WhisperInject-class out-of-band payload that the audio model decodes while the transcript-only filter sees nothing. Examples in scope: voice agents (telephony, in-app voice modes), audio-first chatbots, dictation assistants, transcript-then-act pipelines. The per-product threat model is covered in voice agents, and byte-level coverage in audio prompt-injection detection.
3. Indirect / tool-delivered multimodal content. The user does not upload anything; the model still sees image or audio bytes, because they came back from a retrieval, a tool call, an MCP server, or a fetched URL. Examples: a retrieved PDF in a RAG pipeline contains an embedded image with a FigStep payload (RAG pipelines); an MCP server returns a chart with an instruction overlay (MCP servers); a LangChain agent's tool call returns an image attachment with an injected instruction (LangChain agents). The 2025 LLM01 framing treats this as the same control: the bytes reach the model, so the bytes have to be inspected.
An LLM01:2025 self-assessment should produce one inspection-point answer per channel that applies to the application. "We do not accept user image upload" closes channel 1. "Our voice path uses STT and we filter the transcript" partially closes channel 2 — but only partially, because the transcript-side filter does not see WhisperInject-class carriers, and the auditor will probably ask. "We do not call multimodal-capable tools" closes channel 3. Anything else needs an active control.
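The one-answer-per-channel self-assessment can be written down as a small data structure. The sketch below is illustrative only: the `ChannelAnswer` type, its field names, and the status strings are assumptions made for this example, not part of the OWASP text or of any vendor API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelAnswer:
    """One inspection-point answer for one multimodal channel (hypothetical shape)."""
    channel: str                      # e.g. "direct_image", "direct_audio", "tool_delivered"
    in_scope: bool                    # does the application expose this channel at all?
    inspection_point: Optional[str]   # where the input is inspected, if anywhere
    reads_bytes: bool                 # True only if the control sees pixels or waveform


def status(answer: ChannelAnswer) -> str:
    """Map an answer onto the outcomes described in the paragraph above."""
    if not answer.in_scope:
        return "closed (channel not exposed)"
    if answer.inspection_point is None:
        return "open (needs an active control)"
    if not answer.reads_bytes:
        return "partial (derived text only)"
    return "covered"


# Example application: no image upload, transcript-only audio filter,
# and multimodal tool results that nothing inspects yet.
answers = [
    ChannelAnswer("direct_image", in_scope=False, inspection_point=None, reads_bytes=False),
    ChannelAnswer("direct_audio", in_scope=True,
                  inspection_point="transcript filter after STT", reads_bytes=False),
    ChannelAnswer("tool_delivered", in_scope=True, inspection_point=None, reads_bytes=False),
]
report = {a.channel: status(a) for a in answers}
```

For this example application the report marks the image channel closed, the audio channel partial, and the tool-delivered channel open, which is the same reading the paragraph above gives.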
Why text-side controls do not satisfy the multimodal sub-category
The argument that "we have a text PI scanner, so LLM01 is covered" fails on two grounds. First, by interface: a text PI scanner accepts strings. It does not accept PNG bytes or PCM-16 audio. Its API has nothing to score on a multimodal channel. Bolting an OCR adapter or an STT adapter in front of it converts the input to text, but the conversion is the very thing the attack defeats — see why every text-only scanner misses a 30-pixel PNG for the architectural form of that argument and building a prompt-injection scanner for voice agents for its audio analogue.
Second, by audit shape: LLM01:2025 evidence questions ask which control inspects each channel. An auditor will not accept "we feed the OCR output of every uploaded image into our text scanner" if the threat model includes adversarial-glyph attacks the OCR drops. They will ask whether the control reads the bytes, and if not, what the residual risk is. The honest answer for a text-side scanner with an OCR adapter is "high residual risk on the FigStep / AgentTypo class, mitigated only by post-hoc model behaviour monitoring," which is not a passing answer for a control whose purpose is pre-execution input inspection.
The same shape applies for audio. A text-side control that reads the STT transcript is not satisfying LLM01:2025 for audio input — it is mitigating a strict subset of the channel. The auditor's question is what reads the waveform, and "nothing" does not clear the control.
Coverage matrix against LLM01:2025 multimodal
For a buyer evaluating self-serve options against the multimodal sub-category specifically, the public-defender landscape sorts cleanly.
| Tool | Text channel | Image channel | Audio channel | LLM01 multimodal evidence |
|---|---|---|---|---|
| Lakera Guard | Yes | No (as of public coverage) | No | Partial — text only |
| LLM Guard (OSS) | Yes | No (text-only by design) | No | Partial — text only |
| Azure Prompt Shields | Yes (Azure-gated) | Image moderation, not PI | No | Partial — text + content moderation |
| Promptfoo | Test harness, eval-time | Test harness, eval-time | Test harness, eval-time | Not an inference-time control |
| Glyphward | Run-both with text scanner | Yes — bytes in, score and region | Yes — bytes in, score and region | Multimodal-channel control |
The "run-both" framing matters because LLM01:2025 does not require replacing the text scanner. It requires that every channel the model consumes is inspected. A typical setup uses Lakera Guard or LLM Guard for text and Glyphward for image and audio, so neither scanner competes for the other's channel. Side-by-side detail in Glyphward vs Lakera Guard, vs LLM Guard, vs Azure Prompt Shields, and vs Promptfoo; a self-serve pricing comparison at multimodal PI scanner pricing comparison.
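A run-both gate can be sketched in a few lines. Both scanners are passed in as plain callables here; their signatures (`text_scan` returning a 0–100 score for a string, `media_scan` returning a 0–100 score for raw bytes) and the single shared threshold are simplifying assumptions for illustration, not the interface of any specific product.

```python
from typing import Callable, Sequence

# Hypothetical scanner signatures: each returns a 0-100 risk score.
TextScan = Callable[[str], int]
MediaScan = Callable[[bytes], int]


def inspect_turn(
    text: str,
    attachments: Sequence[bytes],
    text_scan: TextScan,
    media_scan: MediaScan,
    threshold: int = 50,
) -> dict:
    """Score every channel of one request; allow only if all scores are below threshold."""
    text_score = text_scan(text)
    attachment_scores = [media_scan(raw) for raw in attachments]
    worst = max([text_score, *attachment_scores])
    return {
        "text_score": text_score,
        "attachment_scores": attachment_scores,
        "allowed": worst < threshold,
    }
```

The design point is that a clean text score never excuses an unscanned attachment: the decision is taken on the worst score across all channels the model will consume.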
Architecture for closing the multimodal half of LLM01
The shape of a control that clears LLM01:2025 multimodal is, deliberately, not novel. It is the same shape as a text PI scanner, applied to bytes:
- Mount on input. Place the scanner on the boundary the model actually consumes from. For a chatbot with image upload, that is the upload handler before the vision API call. For a voice agent, that is the audio buffer before STT or before the audio-aware model. For a RAG pipeline, that is the loader middleware (pre-ingestion or retrieval-time). For an MCP host, that is the tool-result handler.
- Score, do not block silently. Return a 0–100 score and the modality-tagged reason, not just a binary verdict. LLM01:2025 evidence is easier to defend with a continuous score and a tunable threshold than with an opaque "blocked / allowed" boolean. The threshold becomes a documented engineering parameter the auditor can read.
- Source-aware thresholds. Trust user-uploaded content less than first-party content, and trust third-party retrieved content least of all. The same scan call with three different threshold bands documents three risk tiers cleanly.
- Run-both with text. Keep the text scanner in front of the model. Add the multimodal scanner alongside it. The text channel is still a real channel, and the 2025 list still covers it as the original LLM01 sub-category.
- Log every score for evidence. A SOC 2 / ISO 27001 / FedRAMP-aligned LLM01 evidence trail wants per-request scoring data. Glyphward's API returns a request ID and a score; logging the pair against the application's request ID is the audit-friendly default.
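The source-aware threshold idea from the list above can be sketched as a lookup from content source to a block threshold. The three tier names and the numeric thresholds are placeholders chosen for this example; real values would come from tuning against your own traffic.

```python
from enum import Enum


class Source(Enum):
    FIRST_PARTY = "first_party"    # content your own system produced
    USER_UPLOAD = "user_upload"    # content a user supplied directly
    THIRD_PARTY = "third_party"    # retrieved or tool-delivered content


# Hypothetical threshold bands. A lower threshold is stricter:
# the least-trusted source is blocked at the lowest score.
BLOCK_THRESHOLDS = {
    Source.FIRST_PARTY: 80,
    Source.USER_UPLOAD: 60,
    Source.THIRD_PARTY: 40,
}


def decide(score: int, source: Source) -> str:
    """Turn a 0-100 scanner score into a decision using the tier for this source."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return "block" if score >= BLOCK_THRESHOLDS[source] else "allow"
```

With these placeholder bands, a score of 50 is allowed for a user upload but blocked for third-party retrieved content, so the same scan call documents distinct risk tiers.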
The byte-level scanning architecture this implements — CLIP embedding plus typographic head plus Tesseract OCR plus a curated payload corpus on the image side, and a waveform anomaly classifier plus a Whisper-small transcript filter on the audio side — is described in the multimodal prompt-injection threat model for AI product teams (2026) blog post. The five-step playbook there maps directly onto LLM01:2025 evidence questions.
How Glyphward fits
Glyphward is the inference-time multimodal scanner (bytes in, score and region out) that slots into the first item of the architecture above, the mount-on-input point. The HTTP contract is one POST per attachment: image bytes (or URL) or audio bytes; the response is a 0–100 score, the flagged region (bounding box for image, time window for audio), and a modality-tagged reason. The same contract is documented on the multimodal LLM security API page; pricing is flat-rate self-serve at $29/mo Pro and $99/mo Team, with a free tier sized for prototyping (free-tier API). Audit-friendly defaults: the Pro and Team tiers ship with logging and per-request IDs that satisfy a typical LLM01 evidence trail without further engineering.
The integration is provider-agnostic. Whether the application calls Anthropic, OpenAI, Google Gemini, AWS Bedrock, or a local model, the scanner reads bytes — not the chat-completion API the bytes are about to flow into. That is what makes it a clean LLM01 control rather than a vendor coupling.
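As a rough illustration of the one-POST-per-attachment contract described in this section, the sketch below builds a request and parses a response using only the Python standard library. The endpoint URL, the bearer-token header, and the JSON field names (`request_id`, `score`, `region`, `reason`) are all assumptions made for the example; check the actual API documentation for the real names before relying on any of them.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any

# Placeholder endpoint for illustration only; not a real URL.
SCAN_URL = "https://scanner.example.invalid/v1/scan"


@dataclass(frozen=True)
class ScanResult:
    request_id: str
    score: int                # 0-100
    region: dict[str, Any]    # bounding box (image) or time window (audio)
    reason: str               # modality-tagged reason


def build_scan_request(payload: bytes, content_type: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) one POST carrying the raw attachment bytes."""
    return urllib.request.Request(
        SCAN_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
    )


def parse_scan_response(body: bytes) -> ScanResult:
    """Parse a response body, assuming the hypothetical field names above."""
    doc = json.loads(body)
    score = int(doc["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return ScanResult(
        request_id=str(doc["request_id"]),
        score=score,
        region=dict(doc["region"]),
        reason=str(doc["reason"]),
    )
```

Sending the request would be a call to `urllib.request.urlopen(req, timeout=...)`; it is left out so the sketch stays independent of network access. Because the function takes bytes and a content type, nothing in it depends on which model provider the bytes flow to afterwards.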
Related questions
Will Glyphward show up as the LLM01 control in our SOC 2 / ISO 27001 evidence?
Glyphward returns a request ID and a 0–100 score per scan, which is the granularity an auditor wants for an inference-time control. Log the request ID against your application's request ID; the per-request scoring trail covers the multimodal half of the LLM01 evidence question. The text half is still your existing text scanner. Most teams document the two together as a single "input inspection" control, with two scanners feeding two evidence streams.
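One way to keep that pairing is a structured log line per scan. The sketch below uses the standard `logging` and `json` modules; the record's field names and the logger name are assumptions for illustration, and a real deployment would route the logger to whatever retention-controlled sink the audit expects.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; attach handlers for your own log pipeline.
evidence_log = logging.getLogger("llm01.evidence")


def evidence_record(app_request_id: str, scan_request_id: str,
                    modality: str, score: int, threshold: int) -> dict:
    """Pair the application's request ID with the scanner's request ID and score."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "control": "LLM01:2025 input inspection",
        "app_request_id": app_request_id,
        "scan_request_id": scan_request_id,
        "modality": modality,
        "score": score,
        "threshold": threshold,
        "flagged": score >= threshold,
    }


def log_scan(**fields) -> str:
    """Emit one JSON line per scan and return it for the caller's own use."""
    line = json.dumps(evidence_record(**fields), sort_keys=True)
    evidence_log.info(line)
    return line
```

Recording the threshold alongside the score means a later reviewer can see what decision rule was in force at the time of each request, not only the current configuration.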
Does Azure Prompt Shields' image moderation count as the LLM01 multimodal control?
Image moderation and prompt injection are different categories: moderation classifies content (NSFW, violence, hate), while PI detection classifies whether the content is an instruction carrier for the model. An adversarial-glyph block can be benign for moderation and still inject instructions into the model. Auditors familiar with the 2025 list will distinguish the two; documenting moderation as the LLM01 control on the image channel is a finding waiting to happen. Detail in Azure Prompt Shields alternative and the side-by-side comparison.
What about the indirect-PI sub-category specifically?
LLM01:2025 covers indirect prompt injection — payloads delivered through a retrieval, a tool call, or a fetched URL — within the same control. The multimodal channel applies the same way: an indirect image is still an image, and the inspection point is still the boundary at which the model receives it. For a RAG pipeline that means scanning at ingestion or retrieval time (see RAG pipelines); for an MCP host that means scanning every multimodal content block in CallToolResult (see MCP servers); for a LangChain agent that means scanning attachment parts in tool results (see LangChain agents).
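For the MCP case, scanning every multimodal content block might look like the sketch below. It treats the tool result as a plain dict whose `content` list holds blocks with `type`, base64 `data`, and `mimeType` keys, which is how image and audio content are represented in the MCP specification as far as this example assumes. The `Scanner` callable is hypothetical, and embedded resources are deliberately not handled here.

```python
import base64
from typing import Callable

# Hypothetical scanner: (raw bytes, MIME type) -> 0-100 score.
Scanner = Callable[[bytes, str], int]


def scan_tool_result(result: dict, scan: Scanner, threshold: int) -> list[dict]:
    """Score each image or audio block in a tool result before the model sees it."""
    findings = []
    for index, block in enumerate(result.get("content", [])):
        if block.get("type") not in ("image", "audio"):
            continue  # text blocks belong to the text-side scanner
        raw = base64.b64decode(block["data"])
        score = scan(raw, block["mimeType"])
        findings.append({
            "index": index,
            "type": block["type"],
            "score": score,
            "flagged": score >= threshold,
        })
    return findings
```

A host would call this in its tool-result handler and drop or quarantine any block whose finding is flagged before forwarding the result to the model. The same loop shape applies to attachment parts in a LangChain tool result or to images extracted from a retrieved PDF.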
Where does jailbreak detection fit, vs LLM01:2025?
OWASP folds jailbreak attempts into LLM01 as a sub-class — jailbreaks are prompt injections aimed at the system or safety policy rather than at the user's intended task. The same multimodal control covers jailbreaks delivered through pixels (FigStep) or waveforms (WhisperInject) the same way it covers task-redirecting injections. Glyphward does not separately badge "jailbreak"; the score and reason capture the carrier, and the policy decision (refuse vs warn vs allow) is the application's call.
Does this control overlap with LLM02 (Sensitive Information Disclosure)?
Slightly. A successful prompt injection can lead to information disclosure, but the controls live at different points: LLM01 inspects inputs before the model executes; LLM02 inspects outputs and prevents PII / secrets from leaving the model's response. A complete program needs both. Glyphward addresses LLM01 multimodal; output-side filtering for LLM02 belongs to a different tool (most teams use a text-side output filter, often the same vendor as their text-side input filter).
Further reading
- The multimodal prompt-injection threat model for AI product teams (2026) — full threat model, the five-step defender's playbook, and the byte-level architecture this LLM01 control instantiates.
- Why every text-only prompt-injection scanner misses a 30-pixel PNG — architectural argument for why an OCR-adapter-on-text-scanner has a structural ceiling below the FigStep / AgentTypo attack class.
- What Check Point buying Lakera means for self-serve AI-security buyers — buyer-side read on the public-defender consolidation that produced the LLM01 multimodal evidence gap.
- FigStep detection · AgentTypo detector · WhisperInject detection · Typographic PI scanner · Audio prompt-injection detection · Indirect prompt injection in images — what an LLM01-aligned scanner is actually catching, by attack family.
- For RAG pipelines · For MCP servers · For LangChain agents · For voice agents · For screenshot-reading agents · For avatar SaaS · For chatbots with image upload — per-product mount points, each one a different inspection-point answer to the same LLM01:2025 evidence question.
- Multimodal LLM security API — the API surface this control calls into.
- Lakera alternative · LLM Guard alternative · Azure Prompt Shields alternative · Promptfoo + multimodal scanning — coverage gaps of each public defender against the LLM01:2025 multimodal sub-category.