
NIST AI RMF GenAI Profile — multimodal prompt injection under the Information Security risk

The NIST AI Risk Management Framework — version 1.0, published 26 January 2023 — is the de facto common vocabulary for AI risk programs across US enterprise and US federal procurement. The Generative AI Profile that overlays the framework — NIST AI 600-1, published 26 July 2024 — enumerates twelve risks that are unique to or exacerbated by generative AI. Risk number 9 is Information Security, and prompt injection sits inside it. The Profile distinguishes direct prompt injection from indirect prompt injection in its own words. What the Profile does not do is enumerate modalities. Image and audio prompt injection — FigStep-class typographic payloads, AgentTypo-class adversarial glyphs, WhisperInject-class audio carriers — is the same Information Security risk delivered through channels a text-only control does not see. Here is how the AI RMF Govern / Map / Measure / Manage functions land on the multimodal piece, and the inference-time scanner pattern that closes the gap.

TL;DR

The AI RMF GenAI Profile (NIST AI 600-1) names Information Security as risk 9 of 12 and locates prompt injection inside it. Its definitions of direct and indirect prompt injection are quoted verbatim in the next section. The Profile does not enumerate modalities, so an organisation operationalising the Profile owns the read on whether image and audio prompt injection (FigStep, AgentTypo, WhisperInject, indirect carriers in retrieved documents and tool calls) counts as a delivery channel for the same Information Security risk. The defensible read is yes: an instruction rendered in pixels or in a waveform that diverts the model is the same risk class as an instruction encoded in characters. By construction, a text-only scanner cannot satisfy a Map / Measure / Manage function on a multimodal channel, and bolting OCR or STT in front does not change the conclusion. Glyphward sits as the inference-time multimodal scanner (bytes in, score and region out) that closes that gap, alongside whatever text-side guard already covers the text channel.

What NIST AI 600-1 actually says about prompt injection

Section 2 of the Profile lists twelve GenAI risks: CBRN information or capabilities, confabulation, dangerous / violent / hateful content, data privacy, environmental impacts, harmful bias and homogenization, human-AI configuration, information integrity, information security, intellectual property, obscene / degrading / abusive content, and value chain and component integration. Information Security is risk 9. The Profile's entry for this risk names cyberattacks against GenAI systems, including prompt injection, as the operative threat surface. The Profile gives the two attack vectors verbatim:

"In direct prompt injections, attackers might craft malicious prompts and input them directly to a GAI system, with a variety of downstream negative consequences to interconnected systems." — NIST AI 600-1 §2.9 (Information Security)

"Indirect prompt injection attacks occur when adversaries remotely (i.e., without a direct interface) exploit LLM-integrated applications by injecting prompts into data likely to be retrieved." — NIST AI 600-1 §2.9 (Information Security)

The Profile then notes the demonstrated downstream impact, again in its own words:

"Security researchers have already demonstrated how indirect prompt injections can exploit vulnerabilities by stealing proprietary data or running malicious code remotely on a machine." — NIST AI 600-1 §2.9 (Information Security)

What the Profile does not do is name modality. The verbatim language is "malicious prompts," "input them directly," "data likely to be retrieved." The text is mode-agnostic. That silence is the operative point: an organisation operationalising the Profile owns the read on whether image bytes and audio bytes count as inputs and retrieved data. The defensible read is yes, and the engineering literature treats the question as settled. FigStep (arXiv:2311.05608) is a typographic image payload that walks past text-only scanners because the text never exists as characters; AgentTypo extends it with adversarial glyph blocks; WhisperInject (arXiv:2405.20653) demonstrates the audio analogue. A risk-management program that names prompt injection as an Information Security risk but inspects only the text channel has documented a risk that it does not Measure on every channel the production system actually consumes.

The Profile is published as an overlay on AI RMF 1.0. The four AI RMF core functions — Govern, Map, Measure, Manage — sit above the GenAI-specific risks. Suggested actions in the Profile are tagged with codes of the shape GV-1.1-001, MP-5.1-001, MS-2.7-001, MG-2.2-001, where the leading letters denote the function (GV / MP / MS / MG) and the rest of the code locates the suggested action inside that function's category and subcategory tree. The full list is large — over two hundred suggested actions across the four functions — and the prompt-injection mitigations are scattered across all four. Treating the framework as a checklist of action codes misses the point; the Profile's own framing is that the actions are suggestions an organisation tailors to its risk profile and use-case. The operationally useful mapping is not action-code by action-code, but function by function: what does Govern, Map, Measure, and Manage each require for the multimodal piece of Information Security risk 9?
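To make the code shape concrete, here is a minimal Python sketch that splits an action code into its function, category, subcategory, and action number, then groups codes by function. The regular expression is inferred only from the four example codes quoted above, so treat it as an assumption about the shape rather than a validator for the Profile's full list; `parse_action_code` and `group_by_function` are illustrative names.

```python
import re
from typing import NamedTuple

# Function prefixes as described in the text above.
FUNCTIONS = {"GV": "Govern", "MP": "Map", "MS": "Measure", "MG": "Manage"}

# Shape assumed from the quoted examples: GV-1.1-001, MP-5.1-001, MS-2.7-001, MG-2.2-001.
ACTION_CODE = re.compile(r"^(GV|MP|MS|MG)-(\d+)\.(\d+)-(\d{3})$")


class ActionCode(NamedTuple):
    function: str     # e.g. "Measure"
    category: int     # e.g. 2
    subcategory: int  # e.g. 7
    action: int       # e.g. 1


def parse_action_code(code: str) -> ActionCode:
    """Split a suggested-action code into its four parts."""
    match = ACTION_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised action code: {code!r}")
    prefix, category, subcategory, action = match.groups()
    return ActionCode(FUNCTIONS[prefix], int(category), int(subcategory), int(action))


def group_by_function(codes):
    """Bucket action codes under Govern / Map / Measure / Manage."""
    grouped = {name: [] for name in FUNCTIONS.values()}
    for code in codes:
        grouped[parse_action_code(code).function].append(code)
    return grouped
```

Grouping by function first, and only then looking at individual codes, mirrors the function-by-function reading this section argues for.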

How the four AI RMF functions land on multimodal prompt injection

Each function is a different surface a multimodal Information Security control has to be visible on. The mapping below is the operational read for an AI startup or scale-up running the RMF on a production multimodal LLM application — not a comprehensive reproduction of the Profile's suggested actions for the Information Security risk.

Govern. The organisation has named multimodal prompt injection as a member of the Information Security risk in its risk register, and has designated accountable owners (typically the AppSec lead and the AI engineering owner of the multimodal application). The Govern function is where the policy that says "image and audio inputs are inspected as the same threat class as text inputs" gets written down. Without that policy artefact, downstream Map and Measure activities can be argued away by the next reviewer who treats modality as out of scope.
Map. The organisation has identified each modality the production system consumes, the upstream source of each modality (first-party user upload, third-party tool result, retrieved document, tool-server output), and the trust posture of each source. A typical multimodal-LLM application has at least three Map entries: image bytes from end users, image bytes embedded in retrieved documents (see RAG pipelines), and image or audio bytes returned in tool calls (see MCP servers). Each Map entry is a distinct attack surface; each one needs a corresponding Measure activity.
Measure. The organisation has a documented evaluation that exercises the multimodal Information Security control against representative inputs from each Mapped attack surface, and reports recall, false-positive rate, and latency on each. Measure is where pre-deployment red-teaming lives, and where a public-corpus benchmark on FigStep / AgentTypo / WhisperInject earns its keep. AI RMF Measure activities are not one-shot — the documented re-evaluation cadence (typically aligned to model upgrades, new attack publications, and incident learnings) is the auditable artefact.
Manage. The organisation has a documented response to an inspection signal: a quarantine queue, a request-ID-keyed audit trail, a documented escalation path on a flagged input, and a feedback loop from production findings into the next Map / Measure cycle. Manage is the function that turns a runtime score into an organisational learning signal. A scanner whose output is a black-box block / allow boolean does not feed Manage; a scanner whose output is a per-request score with a modality-tagged reason and a request ID does.
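The Map requirement that every attack surface has a corresponding Measure activity can be checked mechanically. The sketch below is one possible shape for such a register, not a format the Profile prescribes: the `MapEntry` fields, the surface names, and the trust labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MapEntry:
    surface: str   # hypothetical identifier for the attack surface
    modality: str  # "text", "image", or "audio"
    source: str    # where the bytes come from
    trust: str     # hypothetical trust-tier label
    measure_activities: list = field(default_factory=list)


def unmeasured(entries):
    """Return the surfaces that have no Measure activity attached."""
    return [entry.surface for entry in entries if not entry.measure_activities]


# Three illustrative Map entries, matching the three surfaces named above.
register = [
    MapEntry("user-image-upload", "image", "end user", "low",
             ["red-team run on typographic image payloads"]),
    MapEntry("rag-embedded-image", "image", "retrieved document", "lowest"),
    MapEntry("tool-result-media", "audio", "tool call", "lowest"),
]

for surface in unmeasured(register):
    print(f"Map entry without a Measure activity: {surface}")
```

Run in CI or as part of a review, a check like this turns "each Map entry needs a Measure activity" into something that fails loudly when a new modality is added without an evaluation.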

The mapping above is intentionally function-level rather than action-code-level. The Profile invites the operator to tailor — and the action codes that are most load-bearing for a particular multimodal application are not the same codes that are most load-bearing for a different one. What is constant across operators is the four-function shape: Govern names the risk, Map enumerates the surfaces, Measure evaluates the control, Manage handles the signal.
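The Measure item asks for recall, false-positive rate, and latency per Mapped surface. A minimal sketch of that report follows, assuming the evaluation yields one `(score, is_attack, latency_ms)` tuple per test input and that a score at or above the threshold counts as a flag. The sample values are synthetic placeholders for illustration, not measurements of any scanner.

```python
import math


def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list; None if empty."""
    if not sorted_values:
        return None
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def measure_report(results, threshold):
    """Summarise one surface's evaluation run.

    results: list of (score, is_attack, latency_ms) tuples.
    """
    attacks = [r for r in results if r[1]]
    benign = [r for r in results if not r[1]]
    true_pos = sum(1 for score, _, _ in attacks if score >= threshold)
    false_pos = sum(1 for score, _, _ in benign if score >= threshold)
    latencies = sorted(latency for _, _, latency in results)
    return {
        "recall": true_pos / len(attacks) if attacks else None,
        "false_positive_rate": false_pos / len(benign) if benign else None,
        "p95_latency_ms": nearest_rank_percentile(latencies, 95),
    }


# Synthetic placeholder data: two attack inputs, three benign inputs.
sample = [
    (92, True, 40.0),
    (15, True, 38.0),
    (10, False, 35.0),
    (71, False, 44.0),
    (5, False, 36.0),
]
report = measure_report(sample, threshold=70)
```

Storing one such report per surface and per re-evaluation date gives the cadence described under Measure a concrete record to point at.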

Why a text-only Information Security control is not "tailored to the use-case" for multimodal systems

The AI RMF's language for "appropriateness" is "tailored to the AI actor's resources, the system's deployment context, and the risk profile." The same proportionality test that runs through the EU AI Act's Article 15 (see EU AI Act Article 15 multimodal) and the OWASP LLM01:2025 multimodal sub-category (see OWASP LLM01:2025 multimodal) lands here, in different vocabulary. Two arguments make the test bind on the multimodal channel.

The first is the interface argument. A text PI scanner accepts strings. It has no parameter on which a PNG byte array or a 16 kHz PCM audio buffer can be evaluated. Adapting it by running OCR or STT in front converts the input to text, but that conversion is the very step the FigStep / AgentTypo / WhisperInject family is designed to defeat. The architectural ceiling of "text scanner plus OCR adapter" is the OCR's recognition sensitivity, and adversarial-glyph attacks are built to fall below it. The long form of this argument is in why every text-only scanner misses a 30-pixel PNG; the audio version is in building a PI scanner for voice agents.
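One way to state the ceiling, under the simplifying assumption that the text scanner can only flag a payload that the OCR (or STT) step has first recovered as characters:

```latex
\Pr(\text{detect})
  = \Pr(\text{OCR recovers payload})
    \cdot \Pr(\text{scanner flags} \mid \text{OCR recovers payload})
  \le \Pr(\text{OCR recovers payload})
```

However good the text scanner is, the pipeline's recall on these inputs cannot exceed the OCR recovery rate, which is the quantity the attack is designed to drive down.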

The second is the evidence argument. A reviewer reading the organisation's Measure activity will ask which control the Measure was performed against and on which inputs. For a text-only control, the Measure on the image channel is necessarily on a derivative — the OCR output, not the bytes. That derivative is exactly what an adversarial-glyph payload is engineered to make unrecoverable. A control whose Measure activity admits to a structural ceiling against the named adversarial-input class is not tailored to a risk profile that includes that class. This is the same shape as the EU AI Act Article 15 "appropriate to the relevant circumstances and the risks" argument, in NIST vocabulary.

This is not the same argument as "you must replace your text scanner." The RMF's tailoring language asks for an appropriate control mix, not vendor consolidation. The pragmatic production setup is a text-side scanner on the text channel and a multimodal scanner on the image and audio channels: two controls, two evidence streams, one Information Security program. See the comparisons with Lakera Guard, LLM Guard, Azure Prompt Shields, and Promptfoo for the side-by-side coverage shape, and the multimodal PI scanner pricing comparison for the buyer view.

Coverage matrix against the GenAI Profile Information Security risk on multimodal channels

The same coverage-matrix shape that applies to OWASP LLM01:2025 and to EU AI Act Article 15(5) applies to the GenAI Profile Information Security risk — because all three documents are saying the same thing about adversarial multimodal inputs in three different vocabularies. The Profile-aligned version of the matrix asks each control whether it satisfies a Map / Measure / Manage activity on each modality the production system consumes.

Tool	Text channel	Image channel	Audio channel	GenAI Profile multimodal evidence
Lakera Guard	Yes (Measure on text inputs)	No (text-only as of public coverage)	No	Partial — does not Measure or Manage prompt injection on multimodal channels
LLM Guard (OSS)	Yes (text-only by design)	No	No	Partial — text-channel only
Azure Prompt Shields	Yes (Azure-gated)	Image moderation, not adversarial-input detection	No	Partial — moderation is a different Map entry from prompt-injection adversarial inputs
Promptfoo	Eval-time test harness	Eval-time test harness	Eval-time test harness	Useful inside Measure (pre-deployment); does not Manage runtime inputs
Glyphward	Run-both with text scanner	Yes — bytes in, score and region out	Yes — bytes in, score and region out	Multimodal-channel adversarial-input control with per-request evidence trail

The "image moderation, not adversarial-input detection" line for Azure Prompt Shields is the easiest evidence error for an early Measure cycle to make and the easiest one for a careful reviewer to find. Image moderation classifies content categories (NSFW, violence, hate); adversarial-input detection classifies whether the content is an instruction-carrier for the model. An adversarial-glyph block can be benign for moderation and still inject the model. The Profile names prompt injection specifically; documenting moderation as the prompt-injection control on the image channel is a finding waiting to happen. Long-form treatment in Azure Prompt Shields alternative (non-Azure).

The Promptfoo line deserves its own footnote. Promptfoo is a strong Measure-function tool — it runs eval suites, including red-team probes, in CI. What it does not do is Manage a runtime input on a production request: a CI test harness is by construction not on the inference path. The pragmatic production setup is to use Promptfoo in CI to exercise the Glyphward scanner against your application's image and audio test corpus, and Glyphward at runtime to gate the production request. See Promptfoo + multimodal scanning for the YAML provider config that wires this up.

Architecture for satisfying the Information Security control on multimodal channels

The shape of an AI RMF–aligned input-inspection control on a multimodal channel is the same shape as the OWASP LLM01:2025 architecture and the EU AI Act Article 15(5) architecture, with the four AI RMF functions mapped to engineering primitives:

Mount on input: implements Map and Manage at the inference boundary. Place the scanner on the boundary the model actually consumes from. For a chatbot with image upload, that is the upload handler before the vision API call. For a voice agent, that is the audio buffer before STT or before the audio-aware model. For a RAG pipeline, that is the loader middleware (see RAG pipelines). For an MCP host, that is the tool-result handler (see MCP servers). Map identifies the surface; Manage runs the control on each request.
Score and tag: implements Measure with a continuous output. Return a 0–100 score and the modality-tagged reason. AI RMF Measure activities are easier to defend with a continuous score and a documented threshold than with an opaque blocked / allowed boolean. The threshold is the documented engineering parameter; the score history is the audit trail.
Respond per source: implements Govern's policy on differential trust. Trust user-uploaded content less than first-party content, and trust third-party retrieved content least of all. The Profile's tailoring language means the response policy varies with the source. The same scan call with three different threshold bands documents three risk tiers cleanly, and the policy that ties source to threshold is the Govern artefact.
Quarantine and dispute: implements Manage's response cycle. When a scan crosses the threshold, the input is quarantined rather than silently dropped: a request ID is logged, the user (or upstream caller) sees a structured refusal, and the security team has a queue to review false positives. Manage in NIST's vocabulary is exactly this: a documented path from a flagged input to a human-reviewable record and a feedback loop into the next Measure cycle.
Log every score for evidence: implements Measure's auditability. Per-request scoring data, retained against your application's request ID, is the granularity an AI RMF audit wants on a Measure-2-flavoured activity. Glyphward's API returns a request ID and a score; logging the pair is the audit-friendly default. Combined with the organisation's incident-response logging and any sectoral record-keeping obligation (HIPAA, SOX, FERPA, or another sector-specific regime), this is the surface a careful reviewer reads first.
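The respond-per-source, quarantine, and logging steps above can be sketched together in a few lines of Python. Everything specific here is an assumption for illustration: the three source labels, the threshold values, and the `Decision` record are hypothetical, and a real policy would set the bands from the organisation's own Measure results.

```python
import json
import uuid
from dataclasses import asdict, dataclass

# Hypothetical Govern policy: a lower threshold means stricter handling.
# Third-party retrieved content is trusted least, so it gets the lowest band.
THRESHOLDS = {
    "first_party": 80,
    "user_upload": 60,
    "third_party_retrieved": 40,
}


@dataclass
class Decision:
    request_id: str
    source: str
    modality: str
    score: int
    threshold: int
    action: str  # "allow" or "quarantine"


def decide(source, modality, score, request_id=None):
    """Apply the source-specific threshold to a 0-100 scan score."""
    threshold = THRESHOLDS[source]
    action = "quarantine" if score >= threshold else "allow"
    return Decision(request_id or str(uuid.uuid4()),
                    source, modality, score, threshold, action)


def audit_line(decision):
    """Serialise the decision as one JSON log line, keyed by request ID."""
    return json.dumps(asdict(decision), sort_keys=True)


# The same score lands differently depending on where the bytes came from.
strict = decide("third_party_retrieved", "image", 55, request_id="req-1")
lenient = decide("first_party", "image", 55, request_id="req-2")
print(audit_line(strict))
print(audit_line(lenient))
```

Keeping the threshold in the log line alongside the score means a later threshold change does not make earlier decisions unreadable, which is the property the evidence trail needs.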

The byte-level scanning architecture this implements — CLIP image embedding plus a typographic head plus Tesseract OCR plus a curated payload corpus on the image side, and a waveform anomaly classifier plus a Whisper-small transcript filter on the audio side — is described in the multimodal prompt-injection threat model for AI product teams (2026) blog post. The five-step playbook there maps directly onto the four AI RMF functions and the Information Security risk's prompt-injection sub-class.

How Glyphward fits

Glyphward is the inference-time multimodal scanner — bytes in, score and region out — that slots into step 1 of the architecture above. The HTTP contract is one POST per attachment: image bytes (or URL) or audio bytes; the response is a 0–100 score, the flagged region (bounding box for image, time window for audio), and a modality-tagged reason. The same contract is exposed through the multimodal LLM security API page; pricing is flat-rate self-serve at $29/mo Pro and $99/mo Team, with a free tier sized for prototyping (free-tier API). Audit-friendly defaults: every Pro and Team request returns a request ID and is retained on Glyphward's side per the documented retention policy, which is the granularity the Profile's Measure-flavoured activities expect when paired with the organisation's own Manage logging.
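As an illustration of consuming the response described above, here is a minimal Python sketch that parses a JSON body into a typed record. The field names (`request_id`, `modality`, `score`, `reason`, `region`) and the example payload are hypothetical; the text states what the response contains but not its schema, so the actual API documentation is the authority on names and shapes.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanResult:
    request_id: str
    modality: str           # "image" or "audio"
    score: int              # 0-100
    reason: str             # modality-tagged reason
    region: Optional[dict]  # bounding box (image) or time window (audio)


def parse_scan_response(body: str) -> ScanResult:
    """Parse a scan response; the field names are assumptions, not the real schema."""
    data = json.loads(body)
    score = int(data["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return ScanResult(
        request_id=str(data["request_id"]),
        modality=str(data["modality"]),
        score=score,
        reason=str(data.get("reason", "")),
        region=data.get("region"),
    )


# Hypothetical response body for an image attachment.
example_body = json.dumps({
    "request_id": "req-123",
    "modality": "image",
    "score": 87,
    "reason": "image: typographic instruction block",
    "region": {"x": 12, "y": 40, "width": 300, "height": 64},
})
result = parse_scan_response(example_body)
```

Validating the score range at the parsing boundary keeps a malformed response from silently passing a threshold check further down the request path.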

The integration is provider-agnostic. Whether the AI application calls Anthropic, OpenAI, Google Gemini, AWS Bedrock, or a self-hosted multimodal model, the scanner reads the bytes — not the chat-completion API the bytes are about to flow into. That is what makes Glyphward a clean Information Security control under the Profile rather than a vendor coupling. It is also what makes it stack cleanly with whatever text-side guard already covers the text channel.
