Compliance · CISA AI Security

CISA deploying AI systems securely — prompt injection and the multimodal gap

In April 2024, CISA published "Deploying AI Systems Securely" jointly with the NSA, FBI, and cyber agencies from the UK, Australia, Canada, New Zealand, Germany, Israel, Japan, and South Korea — fourteen co-signers in total. The advisory names prompt injection as a key adversarial threat to deployed AI systems and identifies the inference interface as the principal hardening point. What it does not enumerate is the modality dimension: for organisations operating AI applications that accept image upload, audio input, or multimodal tool results, the guidance's requirement to validate and sanitise all inputs before they reach the model applies to bytes in every channel — but the document does not spell this out per modality. For defence contractors, FedRAMP-path companies, and organisations that treat CISA guidance as their AI security baseline, that silence is not an exemption. It is an operator-level interpretive burden that requires closing with an active, inference-time multimodal control.

TL;DR

"Deploying AI Systems Securely" (CISA, April 2024) is the current high-watermark of US government AI security guidance for deployed systems. It identifies prompt injection as an adversarial threat class and recommends hardening the inference interface through input validation, privilege separation, monitoring, and logging. The guidance is modality-agnostic — it applies to image bytes and audio waveforms as directly as it applies to text strings, but does not name the multimodal channels explicitly. For US-adjacent organisations implementing this guidance, the operative question is: which control reads each input modality before it reaches the model? The standard public defenders — Lakera Guard, LLM Guard, Azure Prompt Shields — answer that question for the text channel only. Glyphward answers it for the image and audio channels: bytes in, 0–100 risk score and flagged region out, positioned at the same inference boundary the CISA guidance identifies as the hardening point.

What CISA's April 2024 advisory covers

"Deploying AI Systems Securely: Best Practices for Deploying Secure and Resilient AI Systems" was published on 15 April 2024 as a joint advisory available on the CISA resources portal. Its mandate is deployer-facing: not how to train a safer model, but how to operate an already-trained AI system in a production environment without creating new exploitable surfaces. The advisory organises its recommendations around five hardening areas.

The first is securing the deployment environment — isolating the AI system from adjacent network segments with least-privilege access controls, hardened host configurations, and clear boundaries between AI inference infrastructure and non-AI workloads. The second is protecting the AI supply chain — validating model provenance and third-party component integrity throughout the operational lifetime, maintaining an AI bill of materials, and detecting unexpected model updates. The third is protecting AI data — encrypting training datasets, inference inputs, and model outputs in transit and at rest, with particular attention to the inference pipeline where inputs of unknown provenance intermix with sensitive system context.

The fourth area — and the one most directly relevant to prompt injection — is hardening the inference interface. The advisory identifies this boundary as the point where untrusted external content enters the model's context: user-supplied inputs, retrieved content, tool results, and any data the model treats as part of its instruction. The recommended controls at this boundary are: validate and sanitise inputs before they reach the model; monitor model outputs for anomalous or out-of-scope content; log all requests and responses for post-hoc review. The fifth area is incident response — pre-defined playbooks for AI-specific failure modes including adversarial attacks, data poisoning, and prompt injection, with explicit logging requirements that make post-incident reconstruction possible.

Prompt injection sits squarely in the fourth category. The advisory treats it as a class of adversarial input — a payload delivered through the model's own inference channel that causes the model to deviate from its intended behaviour, act outside its authorised scope, or surface information it should not. The recommended control is inference-boundary hardening: inspect and sanitise what the model receives, monitor what it produces, and log the full request for the incident-response playbook to act on.

Why fourteen co-signing agencies change the compliance calculus

Most AI security guidance is advisory in tone — published recommendations that organisations may consider adopting. "Deploying AI Systems Securely" carries unusual weight for three reasons that change the compliance calculus for US-adjacent organisations.

First, the co-signer list includes NSA and FBI alongside CISA. When NSA co-signs a recommendation to harden the inference interface against prompt injection, that recommendation enters the informal standard of care that procurement and audit teams use when evaluating AI systems touching federal data or federal-adjacent supply chains. For defence prime contractors, components of a CMMC (Cybersecurity Maturity Model Certification) assessment, and companies on the FedRAMP authorisation path, the question "what did you do about prompt injection in your AI components?" is increasingly paired with "and how does your response align with CISA's April 2024 advisory?"

Second, the April 2024 advisory was not the first. It followed the November 2023 "Guidelines for Secure AI System Development" — also a joint CISA / NCSC-UK publication — which established that CISA's AI security posture is an evolving compliance baseline, not a one-off statement. Organisations that adopted the 2023 guidelines as their AI security framework are expected to extend them with the 2024 operational guidance. The trajectory is toward increasing specificity, not standing still.

Third, CISA's parallel "Secure by Design" programme includes AI prompt injection among the vulnerability classes that software manufacturers and deployers are expected to eliminate by architecture rather than detect at runtime. The Secure by Design framing is significant: it shifts the burden of proof. An organisation does not pass the control by having a monitoring alert for unusual AI behaviour; it passes by demonstrating a positive control that prevents the exploitation class at the input boundary. Inference-boundary inspection is the positive control that framing implies.

For security architects mapping controls to multiple frameworks, the CISA advisory is also the natural operational complement to the NIST AI RMF GenAI Profile. Where the NIST profile structures prompt injection as a risk inside a risk-management function (Govern / Map / Measure / Manage), the CISA advisory translates that structure into concrete deployment steps. The two documents are designed to be read together, and an AI security architecture that satisfies one without the other is incomplete against either framework's expectations.

The multimodal gap: what the guidance does not yet name

The April 2024 advisory names prompt injection as a threat and hardening the inference interface as the control. What it does not do is enumerate the modalities through which a prompt injection payload can arrive. The document's language about "user-provided data," "inputs," and "adversarial content" is deliberately modality-agnostic — which is both the guidance's structural strength and the operator's interpretive burden.

For AI applications that accept image upload, voice input, or multimodal tool results, the extension is unambiguous: an image is an input. An audio sample is an input. A retrieved PDF with embedded images is an input. The guidance's requirement to "validate and sanitise inputs before they reach the model" applies to bytes in every modality, not only to text strings. But the guidance does not say this explicitly, and teams without prior exposure to the multimodal attack literature may not make that extension when designing their inference-boundary controls.

The attack literature makes the extension unavoidable. FigStep demonstrated in 2023 that a jailbreak instruction rendered as typography in an image reaches a vision-language model cleanly — zero text-channel signal, same output-deviation as a typed jailbreak — because the attack defeats OCR-based filtering by design (see FigStep detection). AgentTypo extended this to adversarial glyph sequences that survive OCR by exploiting character-level confusables (see AgentTypo detector). WhisperInject established the audio analogue: a payload embedded in a waveform that the speech-recognition model decodes as a coherent instruction while the plain transcript drops it entirely (see WhisperInject detection). Indirect multimodal injection extended the threat surface beyond direct uploads: image bytes arriving from a retrieval, a tool call, or an MCP content block, with the user having uploaded nothing — the payload rides a trusted-looking channel (see indirect prompt injection via images).

None of these are laboratory curiosities. They are the current practitioner-facing threat model for any multimodal AI application in production. A CISA compliance reviewer asking about inference-boundary hardening in a multimodal application will not accept "our text scanner covers inputs" as a complete answer if the application accepts image or audio. The evidence question is which control inspects each modality, and "nothing" does not satisfy the advisory's positive-control requirement.

Why text-only input validation does not satisfy the multimodal requirement

The most common shortcut when a team recognises a multimodal input-validation gap is to route every input through an OCR or speech-to-text transcription step, then pass the result to the existing text-PI scanner. This pattern appears to satisfy the inference-boundary hardening requirement — every input was processed before reaching the model — but it does not, because the transcription step is precisely the attack surface that adversarial multimodal inputs exploit.

The FigStep class of attacks works by rendering an instruction in a way that a neural vision-language model reads as a coherent directive while an OCR adapter either drops the characters (adversarial rendering layers), misreads them (typographic confusables), or never encounters them (visual features encoded below the OCR token boundary). The architecture of an OCR-then-text-scanner pipeline is structurally identical to the architecture that adversarial multimodal images are designed to defeat: transcription at an upstream stage, text filter at a downstream stage, no inspection of the raw bytes the model will actually receive. See why every text-only prompt-injection scanner misses a 30-pixel PNG for the architectural argument in full.

The audio parallel is even cleaner. A speech-to-text adapter converts an audio stream to a transcript. A WhisperInject payload embeds an instruction in the waveform in a form that the speech model decodes as part of its context while the transcript representation omits it entirely. A downstream text filter operating on the transcript sees a clean transcript, because the payload was engineered to survive the speech-to-text layer and resolve only at the downstream language model (see audio prompt-injection detection and voice-agent threat model). No amount of sophistication at the text-filter stage fixes the structural gap at the transcription stage.

For a CISA-aligned compliance review, the issue is not only whether a specific known attack succeeds against the deployed system. It is whether the control architecture closes the named risk class. An architecture that transcribes before filtering has a residual risk in the exact category the advisory names — adversarial input that reaches the model — and that residual is not mitigated by transcript-quality improvements or OCR tuning. The evidence question "what reads the bytes the model will consume?" has only one satisfying answer: a control that reads those bytes directly, at the modality of origin, before the bytes reach the model.

Coverage matrix against CISA's inference-boundary hardening recommendations

For security teams mapping the self-serve tool landscape to CISA's inference-interface hardening requirements, the picture sorts cleanly along the modality boundary.

Tool	Text input inspection	Image input inspection	Audio input inspection	CISA inference-boundary evidence
Lakera Guard	Yes	No (text-only)	No	Partial — text channel only
LLM Guard (OSS)	Yes	No (text-only by design)	No	Partial — text channel only
Azure Prompt Shields	Yes (Azure-gated)	Content moderation, not PI	No	Partial — text + content-class moderation
Promptfoo	Eval-time test harness	Eval-time test harness	Eval-time test harness	Not a runtime inference control
Glyphward	Run-both with text scanner	Yes — bytes in, score + region	Yes — bytes in, score + region	Multimodal inference-boundary control

Two clarifications apply. First, Promptfoo is an evaluation harness that generates adversarial test cases at development and CI time — it verifies that a model or system behaves correctly under adversarial probing before deployment. It does not intercept production requests and does not constitute a runtime inference-boundary control in CISA's sense. Promptfoo is the Measure layer in NIST AI RMF terms; CISA's advisory is asking about the Manage layer. Second, Azure Prompt Shields' image-input capability covers content moderation classes — CSAM, violence, hate — rather than prompt-injection payload detection. Passing an uploaded image through Azure content moderation and recording a "no harmful content" result does not produce CISA-aligned evidence that adversarial-instruction payloads in that image were inspected and blocked. Moderation and PI detection are different functions serving different threat models. These distinctions are developed in the comparison pages at Lakera Guard vs Glyphward, Azure Prompt Shields vs Glyphward, and LLM Guard vs Glyphward.

Five-step architecture for CISA-aligned multimodal inference-boundary hardening

Implementing the CISA advisory's inference-boundary hardening requirement for a multimodal application requires five steps that map directly onto the guidance's language.

Inventory every input modality the model receives. This includes user-uploaded files, voice and audio streams, images delivered via tool calls or MCP content blocks, content retrieved from external sources (RAG chunks, web fetches), and any file format the model treats as part of its instruction context. The CISA advisory's "know what AI systems you have deployed" principle applies here at the modality level: you cannot harden a channel you have not identified. Per-product threat models are available for chatbots with image upload, RAG pipelines, MCP servers, screenshot-reading agents, and voice agents.
Place the scan at the inference boundary — after receipt, before the model call. The CISA advisory specifically names the inference interface as the hardening point. Scanning at upload time adds a useful layer but is insufficient if the bytes are stored and later forwarded to a model at a separate pipeline stage without re-inspection. The compliant architecture is: bytes arrive → scanner reads bytes → score above threshold → blocked before reaching model; score below threshold → bytes reach model with a source-trust tag. See multimodal LLM security API architecture for the pipeline shape and latency budget.
Apply source-trust thresholds, not binary pass/fail. The CISA advisory's proportionate-controls framing implies that a scored response is more useful than a binary block. An anonymous-user image upload warrants a stricter threshold than an internal-tool image result from a first-party service. The score, the threshold, and the source-trust tier together constitute the evidence record the advisory's logging requirement expects. A binary block log answers "did we reject inputs?" — a scored log answers "by how much and why?" for the incident-response playbook.
Log every scan result as the advisory's IR playbook requires. "Deploying AI Systems Securely" explicitly requires logging of AI system activity to support post-hoc review and incident response. Each scan result should produce an immutable log entry: input hash, modality, score, threshold applied, outcome (passed or blocked), and timestamp. This is the artifact a CISA-aligned review will request as evidence that the inference boundary was actively hardened — not just that a scanner exists. Glyphward's Pro and Team tiers return a stable per-request ID alongside the score; logging that ID against the application's own request ID is the audit-friendly default, with no additional engineering required.
Connect the multimodal scan signal to the full compliance vocabulary. CISA guidance does not exist in isolation. For organisations subject to NIST AI RMF, the same scan log answers the GenAI Profile's Measure and Manage evidence questions (see NIST AI RMF GenAI Profile). For OWASP audit prep, it answers the LLM01:2025 multimodal sub-category evidence question (see OWASP LLM01:2025 multimodal). For organisations doing EU-market compliance, it maps onto Article 15(5)'s adversarial-robustness requirement (see EU AI Act Article 15). For red-team programs, it covers MITRE ATLAS AML.T0051 and AML.T0054 (see MITRE ATLAS T0051 / T0054). A single inference-boundary multimodal scan, logged with a stable request ID, produces evidence readable across all five vocabulary sets simultaneously.

How Glyphward fits

Glyphward is the inference-time multimodal scanner — bytes in, 0–100 risk score and flagged region out — that positions at the inference boundary the CISA advisory identifies as the hardening point. The HTTP contract is one POST per attachment: image bytes or URL, or audio bytes; the response is a score, the flagged region (bounding box for image, time window for audio), a modality-tagged reason, and a stable request ID. The scanner runs CLIP embedding plus a typographic-PI detection head plus Tesseract OCR cross-referenced against a curated payload corpus on the image side, and a waveform anomaly classifier plus Whisper-small transcript filter on the audio side — architecture described in full in the multimodal prompt-injection threat model for AI product teams (2026).

For a CISA evidence package, Glyphward addresses the inference-boundary gap for multimodal channels: per-request scan IDs and scores constitute evidence that image and audio inputs are inspected before they reach the model, in the same way that an existing text-side scanner provides evidence for text inputs. The two scanners run in parallel rather than in sequence — Glyphward does not compete for the text channel, and the text scanner does not compete for the image or audio channels. That is the run-both pattern recommended across the comparison pages. Pricing is flat-rate self-serve at $29/mo Pro and $99/mo Team, with a free tier for prototyping at no-card signup.

Get early access · See the API surface

Related questions

Does "Deploying AI Systems Securely" explicitly name image or audio prompt injection?

No. The April 2024 advisory uses modality-agnostic language throughout: "user-provided inputs," "adversarial content," "inputs to the model." It names prompt injection as a threat class and requires inference-boundary hardening as the control, but it does not enumerate image, audio, or multimodal as distinct attack channels. This is a deliberate structural choice — modality-agnostic language does not become outdated when a new modality is added to a deployed system. The operator's responsibility is to apply the principle to every input modality the model actually consumes. A multimodal application team that reads "input validation" as covering text strings only is not meeting the spirit of the guidance, and a reviewer familiar with the April 2024 document will note the gap.

How does CISA guidance relate to OWASP LLM01:2025 and NIST AI RMF?

The three documents form a complementary evidence stack. OWASP LLM01:2025 is the practitioner-facing catalogue: it names the attack, categorises its variants (including the multimodal sub-category added in the 2025 revision), and lists mitigations at the application level. NIST AI RMF GenAI Profile (NIST AI 600-1) is the risk-management framework: it assigns prompt injection to specific functions (Govern, Map, Measure, Manage) and structures the audit trail around those functions. The CISA advisory is the operational hardening guide: it translates the framework's risk language into concrete deployment steps — harden this boundary, encrypt this data path, maintain this log. Reading the three together, a team gets threat identification (OWASP), risk structure (NIST), and operational execution (CISA). The NIST AI 100-2e2025 AML Taxonomy adds the underlying terminology — the canonical glossary entries the other documents lean on — and MITRE ATLAS adds the red-team technique-ID vocabulary. All five converge on the same inference-boundary control architecture for the multimodal channel.

Is CISA guidance mandatory for US companies?

CISA advisories do not carry direct legal-mandate force for private-sector companies unless incorporated by reference into a specific contract, regulation, or procurement requirement. However, "mandatory" is the wrong frame for how they operate in practice. For CMMC-scoped defence contractors, CISA guidance is part of the assessor's reference set. For FedRAMP applicants, CISA hardening baselines inform the authorisation review. For companies selling AI-enabled products to civilian federal agencies, procurement reviewers increasingly ask for demonstrated alignment with CISA AI guidance. Outside government-adjacent sales, the advisory operates as the informal standard of care in US AI security reviews: a reviewer who asks about prompt injection hardening and receives "we followed CISA's deploying-AI guidance" treats that differently from a blank answer. For companies with no government exposure, the OWASP + NIST vocabulary may be more directly binding via contractual SOC 2 or ISO 27001 evidence requirements.

What is the relationship between the April 2024 advisory and the earlier November 2023 guidelines?

The two documents cover complementary scopes. The November 2023 "Guidelines for Secure AI System Development" (co-authored by CISA and NCSC-UK) is aimed at AI developers and system builders — it covers the design, development, deployment, and operations lifecycle from the builder's perspective. The April 2024 "Deploying AI Systems Securely" is aimed at deployers and operators — organisations running an AI system they did not necessarily build, making it the more directly applicable document for product teams integrating third-party AI APIs or foundation models. Organisations building custom models need both; organisations deploying commercially-sourced or open-weight models primarily need the 2024 advisory. The 2024 document's focus on the inference interface specifically — validating inputs, monitoring outputs, logging requests — is the operational guidance that deployers can directly action, and it is the section most relevant to the prompt-injection control architecture.

How does this relate to CISA's Secure by Design programme?

CISA's Secure by Design programme asks software manufacturers to take responsibility for eliminating entire classes of vulnerabilities rather than patching them reactively or leaving mitigation to end-users. Prompt injection is explicitly within the AI scope of Secure by Design: the programme's framing says that an AI application should be architected so that adversarial-instruction injection via any input channel is prevented by design, not detected by monitoring after the fact. "Deploying AI Systems Securely" and Secure by Design address different parts of the supply chain — Secure by Design targets platform and application developers, the deployment advisory targets operators — but for an AI application developer both apply. The application layer should implement the deployment advisory's inference-boundary controls; the overall product design should follow Secure by Design's architectural principle that the injection class should be prevented at the boundary, not caught at the output. The intersection with OWASP LLM02:2025 (Insecure Output Handling) is direct: a Secure by Design reading says that a multimodal application allowing injected model outputs to reach a code interpreter or markdown renderer without sanitisation has a design-level failure regardless of how good the output monitor is.