NIST AI RMF GenAI Profile — multimodal prompt injection under the Information Security risk
The NIST AI Risk Management Framework — version 1.0, published 26 January 2023 — is the de facto common vocabulary for AI risk programs across US enterprise and US federal procurement. The Generative AI Profile that overlays the framework — NIST AI 600-1, published 26 July 2024 — enumerates twelve risks that are unique to or exacerbated by generative AI. Risk number 9 is Information Security, and prompt injection sits inside it. The Profile distinguishes direct prompt injection from indirect prompt injection in its own words. What the Profile does not do is enumerate modalities. Image and audio prompt injection — FigStep-class typographic payloads, AgentTypo-class adversarial glyphs, WhisperInject-class audio carriers — is the same Information Security risk delivered through channels a text-only control does not see. Here is how the AI RMF Govern / Map / Measure / Manage functions land on the multimodal piece, and the inference-time scanner pattern that closes the gap.
TL;DR
The AI RMF GenAI Profile (NIST AI 600-1) names Information Security as risk 9 of 12 and locates prompt injection inside it. Its definitions of direct and indirect prompt injection are quoted verbatim in the next section. The Profile does not enumerate modalities, so an organisation operationalising the Profile owns the read on whether image and audio prompt injection — FigStep, AgentTypo, WhisperInject, indirect carriers in retrieved documents and tool calls — counts as a delivery channel for the same Information Security risk. The defensible read is yes: an instruction rendered in pixels or in a waveform that diverts the model is the same risk class as an instruction encoded in characters. A text-only scanner does not satisfy a Map / Measure / Manage function on a multimodal channel by construction; bolting OCR or STT in front does not change the conclusion. Glyphward sits as the inference-time multimodal scanner — bytes in, score and region out — that closes that gap, alongside whatever text-side guard already covers the text channel.
What NIST AI 600-1 actually says about prompt injection
Section 2 of the Profile lists twelve GenAI risks: CBRN information or capabilities, confabulation, dangerous / violent / hateful content, data privacy, environmental impacts, harmful bias or homogenization, human-AI configuration, information integrity, information security, intellectual property, obscene / degrading / abusive content, and value-chain / component integration. Information Security is risk 9. The risk's Profile entry names cyberattacks against GenAI systems, including prompt injection, as the operative threat surface. The Profile gives the two attack vectors verbatim:
"In direct prompt injections, attackers might craft malicious prompts and input them directly to a GAI system, with a variety of downstream negative consequences to interconnected systems." — NIST AI 600-1 §2.9 (Information Security)
"Indirect prompt injection attacks occur when adversaries remotely (i.e., without a direct interface) exploit LLM-integrated applications by injecting prompts into data likely to be retrieved." — NIST AI 600-1 §2.9 (Information Security)
The Profile then notes the demonstrated downstream impact, again in its own words:
"Security researchers have already demonstrated how indirect prompt injections can exploit vulnerabilities by stealing proprietary data or running malicious code remotely on a machine." — NIST AI 600-1 §2.9 (Information Security)
What the Profile does not do is name modality. The verbatim language is "malicious prompts," "input them directly," "data likely to be retrieved." The text is mode-agnostic. That silence is the operative point: an organisation operationalising the Profile owns the read on whether image bytes and audio bytes count as inputs and retrieved data. The defensible read is yes, and the engineering literature treats the question as settled. FigStep (arXiv:2311.05608) is a typographic image payload that walks past every text-only scanner because the text never exists as characters; AgentTypo extends it with adversarial glyph blocks; WhisperInject (arXiv:2405.20653) demonstrates the audio analogue. A risk-management program that names prompt injection under Information Security but inspects only the text channel has documented a risk that it does not Measure on every channel the production system actually consumes.
The Profile is published as an overlay on AI RMF 1.0. The four AI RMF core functions — Govern, Map, Measure, Manage — sit above the GenAI-specific risks. Suggested actions in the Profile are tagged with codes of the shape GV-1.1-001, MP-5.1-001, MS-2.7-001, MG-2.2-001, where the leading letters denote the function (GV / MP / MS / MG) and the rest of the code locates the suggested action inside that function's category and subcategory tree. The full list is large — over two hundred suggested actions across the four functions — and the prompt-injection mitigations are scattered across all four. Treating the framework as a checklist of action codes misses the point; the Profile's own framing is that the actions are suggestions an organisation tailors to its risk profile and use-case. The operationally useful mapping is not action-code by action-code, but function by function: what does Govern, Map, Measure, and Manage each require for the multimodal piece of Information Security risk 9?
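To make the function-by-function grouping concrete, here is a minimal sketch that parses action codes of the shape quoted above and buckets them by AI RMF function. The regular expression is an assumption inferred only from the four example codes in this section (two-letter prefix, category.subcategory, three-digit sequence); it is not taken from the Profile's own grammar, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Function prefixes as described in the text above.
FUNCTIONS = {"GV": "Govern", "MP": "Map", "MS": "Measure", "MG": "Manage"}

# Assumed shape, inferred from the example codes: PREFIX-category.subcategory-sequence
ACTION_CODE = re.compile(r"^(GV|MP|MS|MG)-(\d+)\.(\d+)-(\d{3})$")


class ActionCode(NamedTuple):
    function: str
    category: int
    subcategory: int
    sequence: int


def parse_action_code(code: str) -> ActionCode:
    """Split a code such as 'MS-2.7-001' into its function and position."""
    match = ACTION_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised action code: {code!r}")
    prefix, category, subcategory, sequence = match.groups()
    return ActionCode(FUNCTIONS[prefix], int(category), int(subcategory), int(sequence))


def group_by_function(codes: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket codes under Govern / Map / Measure / Manage."""
    grouped: Dict[str, List[str]] = {name: [] for name in FUNCTIONS.values()}
    for code in codes:
        grouped[parse_action_code(code).function].append(code)
    return grouped
```

Grouping the codes a program has adopted this way gives a quick view of whether any of the four functions has been left empty for the multimodal channel.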
How the four AI RMF functions land on multimodal prompt injection
Each function is a different surface a multimodal Information Security control has to be visible on. The mapping below is the operational read for an AI startup or scale-up running the RMF on a production multimodal LLM application — not a comprehensive reproduction of the Profile's suggested actions for the Information Security risk.
- Govern. The organisation has named multimodal prompt injection as a member of the Information Security risk in its risk register, and has designated accountable owners (typically the AppSec lead and the AI engineering owner of the multimodal application). The Govern function is where the policy that says "image and audio inputs are inspected on the same threat-class as text inputs" gets written down. Without that policy artifact, downstream Map and Measure activities can be argued away by the next reviewer who treats modality as out-of-scope.
- Map. The organisation has identified each modality the production system consumes, the upstream source of each modality (first-party user upload, third-party tool result, retrieved document, tool-server output), and the trust posture of each source. A typical multimodal-LLM application has at least three Map entries: image bytes from end users, image bytes embedded in retrieved documents (see RAG pipelines), and image or audio bytes returned in tool calls (see MCP servers). Each Map entry is a distinct attack surface; each one needs a corresponding Measure activity.
- Measure. The organisation has a documented evaluation that exercises the multimodal Information Security control against representative inputs from each Mapped attack surface, and reports recall, false-positive rate, and latency on each. Measure is where pre-deployment red-teaming lives, and where a public-corpus benchmark on FigStep / AgentTypo / WhisperInject earns its keep. AI RMF Measure activities are not one-shot — the documented re-evaluation cadence (typically aligned to model upgrades, new attack publications, and incident learnings) is the auditable artefact.
- Manage. The organisation has a documented response to an inspection signal: a quarantine queue, a request-ID-keyed audit trail, a documented escalation path on a flagged input, and a feedback loop from production findings into the next Map / Measure cycle. Manage is the function that turns a runtime score into an organisational learning signal. A scanner whose output is a black-box block / allow boolean does not feed Manage; a scanner whose output is a per-request score with a modality-tagged reason and a request ID does.
The mapping above is intentionally function-level rather than action-code-level. The Profile invites the operator to tailor — and the action codes that are most load-bearing for a particular multimodal application are not the same codes that are most load-bearing for a different one. What is constant across operators is the four-function shape: Govern names the risk, Map enumerates the surfaces, Measure evaluates the control, Manage handles the signal.
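The four-function shape can be sketched as a small risk-register structure in which every Mapped surface must eventually carry a documented Measure activity. This is an illustrative data model only: the class names, field names, and owner labels are hypothetical and do not come from the AI RMF or the Profile.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Surface:
    """One Map entry: a modality arriving from a particular source."""
    modality: str  # "text", "image", or "audio"
    source: str    # e.g. "user_upload", "retrieved_document", "tool_result"
    owner: str     # accountable owner named under Govern


@dataclass
class RiskRegister:
    surfaces: List[Surface] = field(default_factory=list)
    measured: Set[Surface] = field(default_factory=set)

    def add_surface(self, surface: Surface) -> None:
        """Map: record a surface once."""
        if surface not in self.surfaces:
            self.surfaces.append(surface)

    def record_measure(self, surface: Surface) -> None:
        """Measure: note that a documented evaluation exists for this surface."""
        if surface not in self.surfaces:
            raise ValueError("Measure recorded for a surface that was never Mapped")
        self.measured.add(surface)

    def unmeasured(self) -> List[Surface]:
        """Surfaces that are Mapped but have no Measure activity yet."""
        return [s for s in self.surfaces if s not in self.measured]


register = RiskRegister()
user_images = Surface("image", "user_upload", "appsec-lead")
rag_images = Surface("image", "retrieved_document", "ai-eng-owner")
tool_audio = Surface("audio", "tool_result", "ai-eng-owner")
for surface in (user_images, rag_images, tool_audio):
    register.add_surface(surface)

register.record_measure(user_images)
gaps = register.unmeasured()  # the two surfaces still lacking an evaluation
```

A non-empty `unmeasured()` list is the kind of gap a reviewer would look for: a surface the organisation has named but not evaluated.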
Why a text-only Information Security control is not "tailored to the use-case" for multimodal systems
The AI RMF's language for "appropriateness" is "tailored to the AI actor's resources, the system's deployment context, and the risk profile." The same proportionality test that runs through the EU AI Act's Article 15 (see EU AI Act Article 15 multimodal) and the OWASP LLM01:2025 multimodal sub-category (see OWASP LLM01:2025 multimodal) lands here, in different vocabulary. Two arguments make the test bind on the multimodal channel.
The first is the interface argument. A text PI scanner accepts strings. It has no parameter on which a PNG byte array or a 16 kHz PCM audio buffer can be evaluated. Adapting it by running OCR or STT in front converts the input to text — but the conversion is the very thing the FigStep / AgentTypo / WhisperInject family is designed to defeat. The architectural ceiling of "text scanner plus OCR adapter" is the OCR's sensitivity, which adversarial-glyph attacks deliberately drop below. The long form of this argument is in why every text-only scanner misses a 30-pixel PNG; the audio version is in building a PI scanner for voice agents.
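The ceiling in the interface argument can be written as simple arithmetic. The sketch below assumes the text scanner can only flag a payload that the OCR or STT step actually recovered, so end-to-end recall is the recovery rate multiplied by the scanner's recall on recovered text. The numbers in the usage lines are illustrative placeholders, not measurements of any tool.

```python
def pipeline_recall(recovery_rate: float, scanner_recall_on_recovered: float) -> float:
    """End-to-end recall of an 'OCR/STT adapter + text scanner' pipeline.

    Assumption: a payload the adapter fails to recover is never flagged,
    so the result can never exceed recovery_rate.
    """
    for name, value in (("recovery_rate", recovery_rate),
                        ("scanner_recall_on_recovered", scanner_recall_on_recovered)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return recovery_rate * scanner_recall_on_recovered


# Illustrative only: a strong text scanner behind an adapter that recovers
# most ordinary text but little adversarially rendered text.
ordinary = pipeline_recall(recovery_rate=0.95, scanner_recall_on_recovered=0.98)
adversarial = pipeline_recall(recovery_rate=0.20, scanner_recall_on_recovered=0.98)
```

However good the text scanner is, the second figure stays below the adapter's recovery rate, which is the structural ceiling the argument describes.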
The second is the evidence argument. A reviewer reading the organisation's Measure activity will ask which control the Measure was performed against and on which inputs. For a text-only control, the Measure on the image channel is necessarily on a derivative — the OCR output, not the bytes. That derivative is exactly what an adversarial-glyph payload is engineered to make unrecoverable. A control whose Measure activity admits to a structural ceiling against the named adversarial-input class is not tailored to a risk profile that includes that class. This is the same shape as the EU AI Act Article 15 "appropriate to the relevant circumstances and the risks" argument, in NIST vocabulary.
This is not the same argument as "you must replace your text scanner." The RMF's tailoring language asks for an appropriate control mix, not vendor consolidation. The pragmatic production setup is a text-side scanner on the text channel and a multimodal scanner on the image and audio channels — two controls, two evidence streams, one Information Security program. See vs Lakera Guard, vs LLM Guard, vs Azure Prompt Shields, and vs Promptfoo for the side-by-side coverage shape, and the multimodal PI scanner pricing comparison for the buyer view.
Coverage matrix against the GenAI Profile Information Security risk on multimodal channels
The same coverage-matrix shape that applies to OWASP LLM01:2025 and to EU AI Act Article 15(5) applies to the GenAI Profile Information Security risk — because all three documents are saying the same thing about adversarial multimodal inputs in three different vocabularies. The Profile-aligned version of the matrix asks each control whether it satisfies a Map / Measure / Manage activity on each modality the production system consumes.
| Tool | Text channel | Image channel | Audio channel | GenAI Profile multimodal evidence |
|---|---|---|---|---|
| Lakera Guard | Yes (Measure on text inputs) | No (text-only as of public coverage) | No | Partial — does not Measure or Manage prompt injection on multimodal channels |
| LLM Guard (OSS) | Yes (text-only by design) | No | No | Partial — text-channel only |
| Azure Prompt Shields | Yes (Azure-gated) | Image moderation, not adversarial-input detection | No | Partial — moderation is a different Map entry from prompt-injection adversarial inputs |
| Promptfoo | Eval-time test harness | Eval-time test harness | Eval-time test harness | Useful inside Measure (pre-deployment); does not Manage runtime inputs |
| Glyphward | Run-both with text scanner | Yes — bytes in, score and region out | Yes — bytes in, score and region out | Multimodal-channel adversarial-input control with per-request evidence trail |
The "image moderation, not adversarial-input detection" line for Azure Prompt Shields is the easiest evidence error for an early Measure cycle to make and the easiest one for a careful reviewer to find. Image moderation classifies content categories (NSFW, violence, hate); adversarial-input detection classifies whether the content is an instruction-carrier for the model. An adversarial-glyph block can be benign for moderation and still inject the model. The Profile names prompt injection specifically; documenting moderation as the prompt-injection control on the image channel is a finding waiting to happen. Long-form treatment in Azure Prompt Shields alternative (non-Azure).
The Promptfoo line deserves its own footnote. Promptfoo is a strong Measure-function tool — it runs eval suites, including red-team probes, in CI. What it does not do is Manage a runtime input on a production request: a CI test harness is by construction not on the inference path. The pragmatic production setup is to use Promptfoo in CI to exercise the Glyphward scanner against your application's image and audio test corpus, and Glyphward at runtime to gate the production request. See Promptfoo + multimodal scanning for the YAML provider config that wires this up.
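Independent of any particular harness, a CI-side Measure check reduces to running a scanner over a labelled corpus and reporting recall and false-positive rate at a documented threshold. The sketch below is not Promptfoo configuration and does not call the Glyphward API; `scan` is a hypothetical callable that returns a 0 to 100 score for raw bytes, and latency measurement is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    payload: bytes
    is_attack: bool  # ground-truth label from the test corpus


@dataclass(frozen=True)
class MeasureReport:
    recall: float
    false_positive_rate: float


def evaluate(scan: Callable[[bytes], float],
             corpus: Iterable[Sample],
             threshold: float) -> MeasureReport:
    """Score every sample and compare the flag against its label."""
    tp = fn = fp = tn = 0
    for sample in corpus:
        flagged = scan(sample.payload) >= threshold
        if sample.is_attack:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("corpus needs at least one attack and one benign sample")
    return MeasureReport(recall=tp / (tp + fn), false_positive_rate=fp / (fp + tn))
```

A CI job can fail the build when recall drops below, or the false-positive rate rises above, the values recorded in the previous Measure cycle.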
Architecture for satisfying the Information Security control on multimodal channels
The shape of an AI RMF–aligned input-inspection control on a multimodal channel is the same shape as the OWASP LLM01:2025 architecture and the EU AI Act Article 15(5) architecture, with the four AI RMF functions mapped to engineering primitives:
- Mount on input — implements Map and Manage at the inference boundary. Place the scanner on the boundary the model actually consumes from. For a chatbot with image upload, that is the upload handler before the vision API call. For a voice agent, that is the audio buffer before STT or before the audio-aware model. For a RAG pipeline, that is the loader middleware (RAG pipelines). For an MCP host, that is the tool-result handler (MCP servers). Map identifies the surface; Manage runs the control on each request.
- Score and tag — implements Measure with a continuous output. Return a 0–100 score and the modality-tagged reason. AI RMF Measure activities are easier to defend with a continuous score and a documented threshold than with an opaque blocked / allowed boolean. The threshold is the documented engineering parameter; the score history is the audit trail.
- Respond per source — implements Govern's policy on differential trust. Trust user-uploaded content less than first-party content, and trust third-party retrieved content least of all. The Profile's tailoring language means the response policy varies with the source. The same scan call with three different threshold bands documents three risk tiers cleanly, and the policy that ties source to threshold is the Govern artefact.
- Quarantine and dispute — implements Manage's response cycle. When a scan crosses the threshold, the input is quarantined rather than silently dropped: a request ID is logged, the user (or upstream caller) sees a structured refusal, and the security team has a queue to review false positives. Manage in NIST's vocabulary is exactly this: a documented path from a flagged input to a human-reviewable record and a feedback loop into the next Measure cycle.
- Log every score for evidence — implements Measure's auditability. Per-request scoring data, retained against your application's request ID, is the granularity an AI RMF audit wants on a Measure-2-flavoured activity. Glyphward's API returns a request ID and a score; logging the pair is the audit-friendly default. Combined with the organisation's incident-response logging and any sectoral record-keeping obligation (HIPAA, SOX, FERPA, sector-specific), this is the surface a careful reviewer reads first.
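The respond-per-source, quarantine, and logging steps above can be sketched together as one decision function. The source names, threshold values, and record fields are hypothetical illustrations of the pattern, not Glyphward defaults or NIST-prescribed values.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical threshold bands: the less a source is trusted, the lower the
# score at which its input is quarantined.
THRESHOLDS = {
    "first_party": 80,
    "user_upload": 60,
    "third_party_retrieved": 40,
}


@dataclass(frozen=True)
class ScanResult:
    request_id: str
    score: int      # 0-100
    modality: str   # "image" or "audio"
    reason: str     # modality-tagged reason returned by the scanner


@dataclass(frozen=True)
class Decision:
    request_id: str
    source: str
    modality: str
    score: int
    threshold: int
    action: str     # "allow" or "quarantine"
    reason: str


def decide(result: ScanResult, source: str) -> Decision:
    """Apply the source-specific threshold; unknown sources raise KeyError."""
    threshold = THRESHOLDS[source]
    action = "quarantine" if result.score >= threshold else "allow"
    return Decision(result.request_id, source, result.modality,
                    result.score, threshold, action, result.reason)


def audit_line(decision: Decision) -> str:
    """One JSON line per request, written for allowed and quarantined inputs alike."""
    return json.dumps(asdict(decision), sort_keys=True)
```

Because the threshold is written into every record, a later change to the policy stays visible in the score history rather than silently rewriting it.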
The byte-level scanning architecture this implements — CLIP image embedding plus a typographic head plus Tesseract OCR plus a curated payload corpus on the image side, and a waveform anomaly classifier plus a Whisper-small transcript filter on the audio side — is described in the multimodal prompt-injection threat model for AI product teams (2026) blog post. The five-step playbook there maps directly onto the four AI RMF functions and the Information Security risk's prompt-injection sub-class.
How Glyphward fits
Glyphward is the inference-time multimodal scanner (bytes in, score and region out) that slots into the first step of the architecture above, mount on input. The HTTP contract is one POST per attachment: image bytes (or URL) or audio bytes; the response is a 0–100 score, the flagged region (bounding box for image, time window for audio), and a modality-tagged reason. The same contract is exposed through the multimodal LLM security API page; pricing is flat-rate self-serve at $29/mo Pro and $99/mo Team, with a free tier sized for prototyping (free-tier API). Audit-friendly defaults: every Pro and Team request returns a request ID and is retained on Glyphward's side per the documented retention policy, which is the granularity the Profile's Measure-flavoured activities expect when paired with the organisation's own Manage logging.
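As a rough illustration of the one-POST-per-attachment contract, the sketch below builds a request and parses a response using only the Python standard library. The URL, header names, and JSON field names are placeholders invented for this example, and it covers only the raw-bytes case; the actual endpoint, authentication scheme, and response schema should be taken from the API documentation.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Placeholder endpoint for illustration; not the real API URL.
SCAN_URL = "https://api.example.invalid/v1/scan"


@dataclass(frozen=True)
class ScanResponse:
    request_id: str
    score: int               # 0-100
    modality: str            # "image" or "audio"
    reason: str
    region: Optional[dict]   # bounding box or time window, if anything was flagged


def build_scan_request(payload: bytes, content_type: str, api_key: str) -> urllib.request.Request:
    """Prepare one POST carrying the raw attachment bytes (nothing is sent here)."""
    return urllib.request.Request(
        SCAN_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
    )


def parse_scan_response(body: bytes) -> ScanResponse:
    """Validate and unpack a response body using the assumed field names."""
    doc = json.loads(body)
    score = int(doc["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return ScanResponse(doc["request_id"], score, doc["modality"],
                        doc["reason"], doc.get("region"))
```

Sending the prepared request with `urllib.request.urlopen(request, timeout=...)` and passing the response body to `parse_scan_response` would complete the round trip; keeping build and parse separate makes both easy to exercise in CI without network access.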
The integration is provider-agnostic. Whether the AI application calls Anthropic, OpenAI, Google Gemini, AWS Bedrock, or a self-hosted multimodal model, the scanner reads the bytes — not the chat-completion API the bytes are about to flow into. That is what makes Glyphward a clean Information Security control under the Profile rather than a vendor coupling. It is also what makes it stack cleanly with whatever text-side guard already covers the text channel.
Related questions
Is multimodal prompt injection literally named in NIST AI 600-1?
No. The Profile names prompt injection inside the Information Security risk and gives verbatim definitions of direct and indirect prompt injection (quoted earlier on this page). Modality is silent in the text. The operationally useful read — and the read the engineering literature supports — is that an instruction encoded in pixels or in a waveform that diverts the model is the same risk class as an instruction encoded in characters, just delivered through a different channel. An organisation operationalising the Profile owns that read in its own risk register and Map activity. The vocabulary the Profile uses ("malicious prompts," "input them directly," "data likely to be retrieved") is mode-agnostic precisely because the Profile is meant to be tailored.
How does the AI RMF GenAI Profile relate to AI RMF 1.0?
AI RMF 1.0 (January 2023) is the underlying framework: four core functions (Govern, Map, Measure, Manage), a category and subcategory tree under each, and a Playbook of suggested actions. The GenAI Profile (NIST AI 600-1, July 2024) is a cross-sectoral overlay that enumerates GenAI-specific risks and lists suggested actions tagged to the same function-category-subcategory tree. The two documents are designed to be read together: the Profile says "for GenAI, here are the additional risks and the additional suggested actions"; the Framework supplies the structure those actions slot into. An organisation that has implemented AI RMF 1.0 already has the function-level scaffolding; the Profile's job is to point that scaffolding at the GenAI-specific risks.
Where does this fit with the EU AI Act and OWASP LLM01:2025?
Three documents, three vocabularies, one engineering control. The EU AI Act Article 15(5) names the threat as "adversarial examples or model evasion" and binds high-risk providers to a documented control by 2 August 2026. OWASP LLM01:2025 names it as "prompt injection" and explicitly recognises a multimodal sub-category. NIST AI 600-1 names it as Information Security and distinguishes direct from indirect injection. All three converge on the same control architecture for the multimodal piece: a byte-level inspection control on the image and audio channels that returns a per-request score with a modality-tagged reason. An organisation subject to all three (a US-headquartered AI startup with EU customers and an enterprise sales motion that triggers OWASP-mapped reviews) wants the same evidence stream to satisfy all three documents — which is what the architecture in this page is designed to produce.
Does Glyphward provide FedRAMP or sector-specific (HIPAA, FERPA, GLBA) compliance evidence?
Glyphward is a control component, not a compliance-evidence service. The control component side: it produces per-request scoring data with stable request IDs, modality-tagged reasons, and a documented retention policy — the granularity an AI RMF Measure / Manage cycle expects from a runtime Information Security control. The compliance-evidence side: that is your organisation's broader compliance program, which combines the AI RMF Govern artefacts, the Map / Measure / Manage cycle outputs, and the sectoral record-keeping the application is subject to (FedRAMP boundary documentation, HIPAA security-rule risk-analysis output, FERPA / GLBA equivalents). Glyphward feeds an evidence stream into the AI-specific layer of that program.
What about NIST AI 100-2 (Adversarial Machine Learning Taxonomy)?
The Adversarial Machine Learning Taxonomy (AML, NIST AI 100-2, 2024 update) is a separate companion document. It enumerates adversarial-ML attack classes — evasion, poisoning, privacy attacks — and treats prompt injection as a member of the evasion class for generative AI, which is consistent with how the EU AI Act treats it. AI RMF 1.0 plus AI 600-1 plus AI 100-2 are the three NIST-authored documents an AppSec lead at a multimodal AI application is most likely to be asked about; they overlap rather than conflict, and the multimodal scanner architecture in this page is the operative control on the inference-path adversarial-input class across all three.
Further reading
- NIST AI 600-1 (Generative AI Profile, July 2024) — canonical PDF on NIST Publications — the document quoted on this page.
- OWASP LLM01:2025 prompt injection — multimodal — the OWASP-framing of the same multimodal evidence question.
- EU AI Act Article 15 — multimodal prompt injection compliance — the EU regulatory framing of the same control architecture, with a 2 August 2026 deadline.
- The multimodal prompt-injection threat model for AI product teams (2026) — the threat model and the byte-level scanning architecture that the Information Security control instantiates.
- Why every text-only prompt-injection scanner misses a 30-pixel PNG — architectural argument for why the OCR-adapter-on-text-scanner pattern has a structural ceiling against the FigStep / AgentTypo class.
- Building a prompt-injection scanner for voice agents — the audio-channel version of the same architecture.
- What Check Point buying Lakera means for self-serve AI-security buyers — the public-defender consolidation that produced the multimodal evidence gap, written from a buyer's point of view.
- FigStep detection · AgentTypo detector · WhisperInject detection · Typographic PI scanner · Audio prompt-injection detection · Indirect prompt injection in images — the attack families that fall under the GenAI Profile's prompt-injection sub-class.
- For RAG pipelines · For MCP servers · For LangChain agents · For voice agents · For screenshot-reading agents · For avatar SaaS · For chatbots with image upload — per-product mount points; each one a different inspection-point answer to the same Information Security evidence question.
- Multimodal LLM security API — the API surface the Information Security control calls into.
- Lakera alternative · LLM Guard alternative · Azure Prompt Shields alternative · Promptfoo + multimodal scanning — coverage gaps of each public defender against the multimodal adversarial-input class.