EU AI Act Article 15 — multimodal prompt injection compliance for high-risk AI systems

Article 15 of Regulation (EU) 2024/1689 — the EU AI Act — sets the accuracy, robustness, and cybersecurity requirements for high-risk AI systems. The provisions take full effect on 2 August 2026. For every provider of a high-risk system on the EU market, that date is a hard deadline: the cybersecurity controls Article 15(5) names have to be in place, evidenced, and described in the technical documentation. The text of Article 15(5) names adversarial examples or model evasion as one of the AI-specific vulnerability classes the technical solutions have to address. Multimodal prompt injection — instructions hidden in pixels or in waveforms that walk past a text-only filter — is squarely inside that class. Here is what Article 15 actually requires for the multimodal piece, and the inference-time scanner pattern that satisfies it.

TL;DR

Article 15(5) of the EU AI Act (the consolidated text is published on EUR-Lex as Regulation (EU) 2024/1689) requires technical solutions to include, where appropriate, measures to prevent, detect, respond to, resolve, and control for, among other named threats, "inputs designed to cause the AI model to make a mistake (adversarial examples or model evasion)." Multimodal prompt injection fits inside that named class: FigStep-class typographic image payloads, AgentTypo-class adversarial glyphs, WhisperInject-class audio carriers, and indirect payloads delivered through retrieved documents and tool calls. By construction, a text-only scanner cannot prevent or detect any of these vectors, because it never sees the image or audio bytes; bolting OCR or STT in front of it does not change the conclusion. To satisfy Article 15(5) on a high-risk multimodal application, the input-inspection control has to read the bytes the model reads. Glyphward is the inference-time multimodal scanner that closes that gap, running alongside whatever text-side guard you already have.

What Article 15 actually says

Two paragraphs of Article 15 carry the weight for a multimodal-PI evidence question. The first establishes the standard:

"High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle." — Article 15(1)

The second names the cybersecurity threats explicitly:

"High-risk AI systems shall be resilient against attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities. The technical solutions aiming to ensure the cybersecurity of high-risk AI systems shall be appropriate to the relevant circumstances and the risks. The technical solutions to address AI specific vulnerabilities shall include, where appropriate, measures to prevent, detect, respond to, resolve and control for attacks trying to manipulate the training data set (data poisoning), or pre-trained components used in training (model poisoning), inputs designed to cause the AI model to make a mistake (adversarial examples or model evasion), confidentiality attacks or model flaws." — Article 15(5)

The list is illustrative rather than exhaustive: the measures "shall include, where appropriate" those aimed at five named AI-specific vulnerability classes, namely data poisoning, model poisoning, adversarial examples / model evasion, confidentiality attacks, and model flaws. Prompt injection is not literally written into the text, but the operative class for runtime injection is "adversarial examples or model evasion", that is, inputs designed to cause the AI model to make a mistake. That is the same shape as a prompt-injection payload: a crafted input that diverts the model from its intended task. The European Commission's Joint Research Centre has published parallel guidance using the same vocabulary, and the academic and policy literature treats prompt injection as a member of the adversarial-input class. An audit against Article 15 can therefore be expected to treat prompt-injection mitigation as a control under the named "adversarial examples or model evasion" sub-class.

The scope phrase that does the most work in 15(5) is "appropriate to the relevant circumstances and the risks." This is the proportionality test: a static-image classifier in a low-stakes consumer app and a multimodal medical agent operating on patient data are not held to the same depth of cybersecurity control. But the test cuts both ways. For an application whose risk profile includes adversarial-input pressure on a multimodal channel — and most production multimodal LLM applications fit that description today — "appropriate" controls have to actually inspect that channel. A text-only filter is not appropriate to a risk profile that includes image and audio attack carriers.

The 2 August 2026 deadline and what it sweeps in

The EU AI Act entered into force on 1 August 2024. Different obligations switched on at different points: prohibitions on the unacceptable-risk practices in Article 5 took effect on 2 February 2025; the obligations on providers of general-purpose AI models took effect on 2 August 2025. The high-risk-system requirements, including Article 15, apply from 2 August 2026, two years after entry into force (for high-risk systems classified under Article 6(1) as safety components of regulated products, the corresponding date is 2 August 2027). From that date, providers placing a high-risk AI system on the EU market or putting it into service in the Union must satisfy Articles 9 (risk management), 10 (data governance), 11 (technical documentation), 12 (record-keeping), 13 (transparency), 14 (human oversight), and 15 (accuracy, robustness, cybersecurity), and a conformity assessment has to evidence the lot.

For an AppSec or compliance lead at an AI startup or scale-up shipping into the EU, that calendar fact has two practical consequences. First, the buying window for runtime cybersecurity controls is narrowing: every quarter that passes between now and the deadline is a quarter further along the budgeting and procurement cycle. Second, "we are not yet doing X but we will" stops being a defensible audit answer on 2 August 2026. The control either runs in production by that date or the conformity assessment fails.

Whether your AI system is "high-risk" under the Act's classification logic is decided by Annex III plus Article 6. Annex III lists eight broad use-case categories: biometrics, critical infrastructure, education and vocational training, employment / workers' management / access to self-employment, access to essential private services and essential public services and benefits, law enforcement, migration / asylum / border control, and administration of justice and democratic processes. Where the AI system performs a profiling, scoring, or decision-supporting function inside one of those categories, it is generally treated as high-risk; Article 6(3) carves out a narrow exception for systems that pose no significant risk, but that exception never applies to systems that profile natural persons. An LLM-based assistant integrated into a hiring workflow, a multimodal screening tool used in immigration processing, and an AI tutor that grades a student's submitted images and audio are all textbook high-risk systems under Annex III. The Article 15 obligations apply directly. (For systems that are not high-risk, Article 15 is not directly binding, but the same proportionality argument and the same control architecture remain a good answer to enterprise-buyer security reviews and to the OWASP LLM01:2025 evidence question; see OWASP LLM01:2025 multimodal.)

How multimodal prompt injection maps onto Article 15(5)

Article 15(5) names five vulnerability classes. Multimodal prompt injection touches three of them, with the bulk of the weight on "adversarial examples or model evasion."

Adversarial examples / model evasion (primary mapping). A FigStep-style image payload (see FigStep detection) is, in the language of the Act, an input designed to cause the AI model to make a mistake — to obey the instruction rendered in pixels rather than the user's intended task. An AgentTypo-style adversarial glyph block (see AgentTypo detector) and a typographic-PI overlay (see typographic PI scanner) are the same shape. WhisperInject-class audio carriers (see WhisperInject detection) are the audio analogue: an input that an audio-aware model decodes into an instruction the transcript-side filter never saw. Each of these is, by every reasonable reading of the Act's wording, an adversarial-input class member. The technical solution Article 15(5) requires for this class is a control that prevents, detects, responds to, resolves, and controls for those inputs — which, on a multimodal channel, means a scanner that reads the bytes.
Confidentiality attacks (secondary mapping). A successful multimodal prompt injection often serves as the entry vector for a confidentiality attack — the injected instruction tells the model to disclose its system prompt, to dump retrieved-document contents, or to leak the contents of a tool result. The runtime confidentiality control is downstream of the input control: input inspection prevents the disclosure-causing instruction from reaching the model in the first place. So the multimodal scanner is also load-bearing for the confidentiality sub-class, even though it is not the only control.
Model flaws (tertiary mapping). The Act's "model flaws" language covers both intentional and inadvertent failure modes traceable to the model itself. An always-trusts-pixels behaviour in a vision-language model is, in this taxonomy, a model flaw. The risk-management obligation under Article 9 is to identify it; the cybersecurity obligation under Article 15(5) is to compensate for it. A byte-level scanner is the operative compensating control — because the application provider cannot retrain the foundation model.

Two named classes — data poisoning and model poisoning — are upstream of an AI provider that consumes a foundation model. Those obligations, where they apply, sit with the upstream model provider under the general-purpose-AI provisions. Article 15(5) does still expect the downstream provider to "control for" them, but the operative inputs-side control that the application provider runs in production is the adversarial-input scanner.

Why text-only controls are not "appropriate" for a multimodal high-risk system

The proportionality test in Article 15(5) — "appropriate to the relevant circumstances and the risks" — is the lever an auditor will pull. Two arguments make the test bind on the multimodal channel.

First, the interface argument. A text PI scanner accepts strings. It does not accept PNG bytes or PCM-16 audio. Its API has no parameter on which a multimodal payload can be evaluated. Bolting an OCR adapter or an STT adapter in front of it converts the input to text — but the conversion is the very thing the FigStep / AgentTypo / WhisperInject family is designed to defeat. The architectural ceiling of "text scanner plus OCR adapter" is the OCR's sensitivity, which adversarial-glyph attacks deliberately drop below (see why every text-only scanner misses a 30-pixel PNG for the long form of this argument; the audio version is in building a PI scanner for voice agents).

Second, the audit-evidence argument. An auditor reading Article 15 evidence will ask which technical solution prevents, detects, responds to, resolves, and controls for adversarial inputs on each modality the system consumes. For an application that accepts user-uploaded images, the question for the image channel needs an answer that is not "the OCR adapter on our text scanner." The defensible answer is a control whose inputs are bytes and whose output is a per-input score and reason. The same shape applies on the audio channel. A control that admits to a known structural ceiling against the named adversarial-input class is not appropriate to a risk profile that includes that class.
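The auditor's question can be rehearsed as a small coverage check before the audit does it for you. The following is a minimal sketch, not a prescribed method: the `Control` record, the control name `text-guard`, and the shortened verb labels are all illustrative assumptions, and a real evidence map would be drawn from your own technical documentation.

```python
from dataclasses import dataclass

# Shortened labels for the five verbs Article 15(5) names.
VERBS = ("prevent", "detect", "respond", "resolve", "control")

@dataclass(frozen=True)
class Control:
    name: str              # hypothetical control label
    modalities: frozenset  # channels whose raw input the control inspects
    verbs: frozenset       # verbs the control can evidence in production

def coverage_gaps(consumed, controls):
    """Map each consumed modality to the verbs no deployed control evidences."""
    gaps = {}
    for modality in consumed:
        covered = set()
        for control in controls:
            if modality in control.modalities:
                covered |= control.verbs
        missing = [verb for verb in VERBS if verb not in covered]
        if missing:
            gaps[modality] = missing
    return gaps

# A text-only guard, however complete on its own channel, leaves the
# image and audio channels with no evidencing control at all.
text_guard = Control("text-guard", frozenset({"text"}), frozenset(VERBS))
gaps = coverage_gaps(["text", "image", "audio"], [text_guard])
print(sorted(gaps))  # ['audio', 'image']
```

An empty result is the state the technical documentation should be able to show for every modality the system consumes.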

This is not the same argument as "you must replace your text scanner." The Act asks for appropriateness, not vendor consolidation. The pragmatic production setup is a text-side scanner on the text channel and a multimodal scanner on the image and audio channels — two controls, two evidence streams, one input-inspection program. See vs Lakera Guard, vs LLM Guard, vs Azure Prompt Shields, and vs Promptfoo for the side-by-side coverage shape, and the multimodal PI scanner pricing comparison for the buyer view.
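The "two controls, one input-inspection program" setup can be pictured as a dispatcher that routes each input to the control built for its channel. This is a sketch under stated assumptions: both scanner functions are placeholders that always return 0, the control names are invented for illustration, and failing closed on an unknown modality is one possible policy choice rather than something the Act prescribes.

```python
def stub_text_scanner(text: str) -> int:
    """Placeholder for an existing text-side guard; always returns 0 here."""
    return 0

def stub_byte_scanner(data: bytes) -> int:
    """Placeholder for a byte-level multimodal scanner; always returns 0 here."""
    return 0

def scan_input(modality: str, payload: bytes) -> dict:
    """Route one input to the control that inspects its channel."""
    if modality == "text":
        # The text guard accepts strings, so the text channel is decoded first.
        score = stub_text_scanner(payload.decode("utf-8", errors="replace"))
        control = "text-guard"
    elif modality in ("image", "audio"):
        # Image and audio go to the byte-level scanner undecoded.
        score = stub_byte_scanner(payload)
        control = "multimodal-scanner"
    else:
        # Unknown channel: report it as unscanned so the caller can refuse it.
        return {"modality": modality, "control": None, "score": None, "scanned": False}
    return {"modality": modality, "control": control, "score": score, "scanned": True}

print(scan_input("image", b"\x89PNG")["control"])  # multimodal-scanner
```

Each branch produces its own record, which is what keeps the two evidence streams separate while the application sees a single entry point.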

Coverage matrix against Article 15(5) for multimodal high-risk systems

The same coverage-matrix shape that applies to OWASP LLM01:2025 applies to Article 15(5) — because both are saying the same thing about adversarial multimodal inputs. The 15(5) version asks each control whether it satisfies the named "prevent, detect, respond to, resolve, and control" verbs on each modality the system consumes.

Tool	Text channel	Image channel	Audio channel	Article 15(5) multimodal evidence
Lakera Guard	Yes (prevent / detect on text)	No (text-only as of public coverage)	No	Partial — does not address the named adversarial-input class on multimodal channels
LLM Guard (OSS)	Yes (text-only by design)	No	No	Partial — text-channel only
Azure Prompt Shields	Yes (Azure-gated)	Image moderation, not adversarial-input detection	No	Partial — moderation ≠ adversarial-input control under Article 15(5)
Promptfoo	Eval-time test harness	Eval-time test harness	Eval-time test harness	Not an inference-time control — does not "prevent" or "respond to" inputs in production
Glyphward	Not covered (run alongside a text scanner)	Yes (bytes in, score and region out)	Yes (bytes in, score and region out)	Multimodal-channel adversarial-input control with per-request evidence trail

The "image moderation, not adversarial-input detection" entry for Azure Prompt Shields marks the easiest evidence error for a provider to make and the easiest one for an auditor to find. Image moderation classifies content categories (NSFW, violence, hate); adversarial-input detection classifies whether the content is an instruction-carrier for the model. An adversarial-glyph block can pass moderation as benign and still inject the model. Article 15(5) names the adversarial-input class specifically, so documenting moderation as the 15(5) control on the image channel is a finding waiting to happen. The long-form treatment is in Azure Prompt Shields alternative (non-Azure).

Architecture for satisfying Article 15(5) on multimodal channels

The shape of an Article 15(5)-aligned input-inspection control on a multimodal channel is the same shape as the OWASP LLM01:2025 architecture, with the named verbs from 15(5) mapped to engineering primitives:

Mount on input: the "prevent" verb. Place the scanner on the boundary the model actually consumes from. For a chatbot with image upload, that is the upload handler before the vision API call. For a voice agent, that is the audio buffer before STT or before the audio-aware model. For a RAG pipeline (see RAG pipelines), that is the loader middleware. For an MCP host (see MCP servers), that is the tool-result handler. The "prevent" verb in 15(5) means the high-scoring input does not reach the model.
Score and tag: the "detect" verb. Return a 0–100 score and a modality-tagged reason. Article 15 evidence is easier to defend with a continuous score and a documented threshold than with an opaque blocked / allowed boolean. The threshold is the documented engineering parameter; the score history is the audit trail.
Respond per source: the "respond to" verb. Trust user-uploaded content less than first-party content, and trust third-party retrieved content least of all. The 15(5) "appropriate to the relevant circumstances" test means the response policy varies with the source. The same scan call with three different threshold bands documents three risk tiers cleanly.
Quarantine and dispute: the "resolve" verb. When a scan crosses the threshold, the input is quarantined rather than silently dropped: a request ID is logged, the user (or upstream caller) sees a structured refusal, and the security team has a queue to review false positives. "Resolve" in 15(5) means there is a documented path from a flagged input to a human-reviewable record.
Log every score for evidence: the "control for" verb. Per-request scoring data, retained against your application's request ID, is the granularity the 15(5) audit wants. Glyphward's API returns a request ID and a score; logging the pair is the audit-friendly default. Combined with Article 12's record-keeping obligation, this is the surface that the conformity-assessment auditor will read.
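The five steps above can be sketched as a single gate function that sits between the scanner and the model call. Everything specific here is an assumption made for illustration: the threshold values, the source-tier names, the record fields, and the in-memory lists that stand in for durable audit storage and a review queue.

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical threshold bands: the less trusted the source, the lower
# the score at which an input is blocked.
THRESHOLDS = {"first_party": 80, "user_upload": 60, "third_party": 40}

@dataclass
class ScanRecord:
    request_id: str
    modality: str
    source: str
    score: int       # 0-100, as returned by the scanner
    reason: str      # modality-tagged reason
    threshold: int   # documented threshold applied to this source tier
    blocked: bool
    logged_at: float

audit_log = []         # stands in for durable, append-only storage
quarantine_queue = []  # stands in for a human review queue

def gate(request_id, modality, source, score, reason):
    """Apply the per-source threshold, log the score, quarantine if blocked."""
    threshold = THRESHOLDS[source]
    record = ScanRecord(request_id, modality, source, score, reason,
                        threshold, score >= threshold, time.time())
    audit_log.append(json.dumps(asdict(record)))  # "control for": every score is logged
    if record.blocked:
        quarantine_queue.append(record)           # "resolve": human-reviewable record
        return {"allowed": False, "request_id": request_id,
                "error": "input_quarantined"}     # "respond to": structured refusal
    return {"allowed": True, "request_id": request_id}  # input may reach the model

# The same score lands differently depending on where the input came from.
a = gate("req-001", "image", "user_upload", 55, "image:typographic_instruction")
b = gate("req-002", "image", "third_party", 55, "image:typographic_instruction")
print(a["allowed"], b["allowed"], len(audit_log), len(quarantine_queue))
# True False 2 1
```

Logging allowed inputs as well as blocked ones is deliberate in this sketch: the score history for inputs that passed is what lets a threshold be defended or revised later.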

The byte-level scanning architecture this implements — CLIP image embedding plus a typographic head plus Tesseract OCR plus a curated payload corpus on the image side, and a waveform anomaly classifier plus a Whisper-small transcript filter on the audio side — is described in the multimodal prompt-injection threat model for AI product teams (2026) blog post. The five-step playbook there maps directly onto the five Article 15(5) verbs.

How Glyphward fits

Glyphward is the inference-time multimodal scanner (bytes in, score and region out) that slots into step 1 of the architecture above. The HTTP contract is one POST per attachment: image bytes (or an image URL) or audio bytes. The response is a 0–100 score, the flagged region (bounding box for image, time window for audio), and a modality-tagged reason. The same contract is exposed through the multimodal LLM security API page; pricing is flat-rate self-serve at $29/mo Pro and $99/mo Team, with a free tier sized for prototyping (see free-tier API). Audit-friendly defaults: every Pro and Team request returns a request ID and is retained on Glyphward's side per the documented retention policy, which is the granularity the Article 15(5) "control for" verb expects when paired with Article 12 record-keeping on the application side.
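To make the shape of that contract concrete, here is a sketch of a client-side wrapper. It is not Glyphward's documented API: the endpoint URL is a placeholder on the reserved `.invalid` domain, and the bearer-token header, the JSON field names (`request_id`, `score`, `reason`, `region`), and the sample response body are all assumptions chosen to mirror the description above. The request is built but never sent.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Placeholder endpoint, not the real one.
SCAN_URL = "https://scanner.example.invalid/v1/scan"

@dataclass
class ScanResult:
    request_id: str
    score: int               # 0-100
    reason: str              # modality-tagged reason
    region: Optional[dict]   # bounding box (image) or time window (audio)

def build_scan_request(payload: bytes, content_type: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) one POST carrying one attachment's raw bytes."""
    return urllib.request.Request(
        SCAN_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type,
                 "Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )

def parse_scan_response(body: bytes) -> ScanResult:
    """Parse a response body, rejecting scores outside the 0-100 range."""
    doc = json.loads(body)
    score = int(doc["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return ScanResult(doc["request_id"], score, doc["reason"], doc.get("region"))

# Illustrative response body using the assumed field names.
sample = (b'{"request_id": "req_123", "score": 87, '
          b'"reason": "image:typographic_instruction", '
          b'"region": {"x": 10, "y": 20, "w": 200, "h": 40}}')
req = build_scan_request(b"\x89PNG...", "image/png", "test-key")
result = parse_scan_response(sample)
print(req.get_method(), result.score, result.request_id)  # POST 87 req_123
```

Storing `result.request_id` next to your own application request ID gives the paired record that the Article 12 log on the application side needs.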

The integration is provider-agnostic. Whether the high-risk application calls Anthropic, OpenAI, Google Gemini, AWS Bedrock, or a self-hosted multimodal model, the scanner reads the bytes — not the chat-completion API the bytes are about to flow into. That is what makes Glyphward a clean Article 15(5) control rather than a vendor coupling.
