
SOC 2 AI security controls — multimodal prompt injection evidence

SOC 2 Type II is the compliance programme most SaaS companies reach for first when enterprise procurement demands an independent security assessment. For AI-powered SaaS, meaning applications that accept image uploads, audio input, or other multimodal content as part of their feature set, the AICPA Trust Services Criteria (specifically CC6 Logical and Physical Access Controls and CC7 System Operations) now raise a new operational question: what control prevents adversarial instructions from reaching the model through every input channel? Standard text-only prompt-injection scanners answer that question well for the text channel. They do not answer it for image or audio channels, and the per-request evidence log a SOC 2 auditor requests (what was inspected, what score was returned, what action was taken) does not exist for those channels unless a multimodal scan was performed at the inference boundary.

TL;DR

SOC 2 Trust Services Criteria CC6.6 requires logical access security measures that protect against threats from sources outside the system boundary. For AI applications, the inference boundary is where untrusted user content (including image and audio files) crosses into the model's context, which makes it the operative control point. CC7.2 requires monitoring of system operations for anomalies, which for AI features means detecting adversarial-input events across every input modality. Text-only prompt-injection scanners such as Lakera Guard, LLM Guard, and Azure Prompt Shields can satisfy CC6.6 and CC7.2 evidence requirements for the text channel. They produce no scan log for image or audio inputs, leaving a per-request evidence gap that a Type II audit's operating-effectiveness testing will surface. Glyphward is an inference-time multimodal scanner that closes that gap: bytes in, a 0–100 risk score and flagged region out, and a per-request scan ID logged. That is the same form of evidence the text scanner produces, extended to the image and audio channels.

The Trust Services Criteria most relevant to AI inputs

SOC 2 is structured around five Trust Services Criteria sets: Security (the baseline, always included), Availability, Processing Integrity, Confidentiality, and Privacy. The Security criteria — formally designated the Common Criteria (CC) — are the ones most directly implicated by prompt injection controls. Three sub-criteria within CC6 and CC7 are the operative evidence anchors for AI input validation.

CC6.6 addresses "logical access security measures to protect against threats from sources outside system boundaries." The AICPA's points of focus for this criterion include restricting the types of activity that can occur through a communication channel and implementing boundary protection that guards external access points and is monitored to detect attempted attacks. Read as a principle, that covers attacks delivered through data the system legitimately accepts from external parties. For an AI application that accepts user-submitted images, those images are inputs from outside the system boundary. A prompt injection payload embedded in an image's pixels is a threat originating outside the system boundary, delivered through a legitimate input channel. CC6.6 is the criterion that asks whether a control was in place at the point where that external content crossed the boundary into the model's context.

CC7.2 addresses "monitoring of system components and the operation of those controls for anomalies that are indicative of malicious acts, natural disasters, and errors." For AI operations, this translates to continuous monitoring of inference requests for anomalous input patterns — specifically, inputs that appear to carry adversarial-instruction payloads. A runtime scan that scores every image or audio input against a payload corpus and generates a risk score is the concrete implementation of CC7.2 monitoring for multimodal AI features. CC7.3 — which requires evaluating detected security events to determine whether they indicate a failure to meet objectives — is the downstream complement: when a high-risk image is blocked, that event should be logged in a format that can inform the CC7.3 evaluation and, if necessary, feed an incident-response playbook.

CC9.2, which governs risk mitigation through controls over suppliers and business partners, becomes relevant when the AI pipeline includes third-party model APIs (OpenAI, Anthropic, Google). The argument that the third-party model provider is responsible for prompt injection mitigation does not satisfy CC6.6: the model provider controls the model's internal architecture, not the content of the inputs your application forwards to it. Your application is the system-boundary crossing point, and your application's controls are what CC6.6 is assessing. Delegating boundary hardening to the upstream model API leaves the criterion unaddressed regardless of how the model provider's terms of service are written.

What SOC 2 auditors are asking about AI inputs in 2026

SOC 2 Type II audits examine three things for each relevant control: design (does the control description cover the risk?), implementation (is the control actually in place?), and operating effectiveness (did the control operate consistently and correctly throughout the audit period?). For prompt injection controls, the design question is now routine for most security-focused auditors: "What prevents a malicious actor from manipulating your AI system through user-submitted content?" Many AI SaaS teams answer this with a text-side scanner (Lakera Guard, LLM Guard, Azure Prompt Shields), and that satisfies the design question for the text channel.

The follow-up is where the gap appears: "Does that control cover image uploads?" or "Your product accepts voice input — does the prompt injection control apply to audio files?" For most teams, the honest answer is no. The text scanner was deployed for the text channel. Image and audio inputs go directly to the vision or audio model with no intermediate inspection. There is no evidence record for those channels because no scan was performed.

Operating-effectiveness testing makes this gap concrete. During a Type II audit covering a twelve-month period, the auditor will sample a population of requests and ask for the corresponding control evidence. For a request that included a user-uploaded image alongside a text prompt, the auditor will ask: where is the scan log entry for the image? If the scanner only processed the text field, there is no scan log entry for the image. The control was not operating for that request on that channel. The operating-effectiveness finding is not that the scanner failed — it is that the scanner was not active for the input type in question. That is the evidence gap that cannot be resolved by tuning the text scanner or improving OCR quality.
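A team preparing for the audit can run the same test internally before the auditor does. The sketch below is a minimal illustration, assuming each request record carries the set of modalities the model consumed and the set of modalities that have a scan log entry; the `RequestEvidence` shape and its field names are hypothetical. A fixed seed keeps the sample reproducible, so the same selection can be re-drawn later.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RequestEvidence:
    """One inference request and the PI-scan evidence logged for it (hypothetical shape)."""
    request_id: str
    submitted: set[str]                              # modalities the model consumed
    logged: set[str] = field(default_factory=set)    # modalities with a scan log entry


def draw_sample(population: list[RequestEvidence], n: int, seed: int) -> list[RequestEvidence]:
    """Reproducible random sample: the same seed always yields the same selection."""
    rng = random.Random(seed)
    return rng.sample(population, min(n, len(population)))


def exceptions(sample: list[RequestEvidence]) -> list[str]:
    """Request IDs where at least one consumed modality has no scan evidence."""
    return [r.request_id for r in sample if not r.submitted <= r.logged]


population = [
    RequestEvidence("R1", {"text", "image"}, {"text"}),            # image bypassed the scanner
    RequestEvidence("R2", {"text"}, {"text"}),
    RequestEvidence("R3", {"text", "audio"}, {"text", "audio"}),
]
sample = draw_sample(population, n=3, seed=2026)
print(sorted(exceptions(sample)))  # ['R1']
```

Each exception here corresponds to the finding described above: the scanner did not fail, it simply produced no record for one of the inputs the model consumed.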

The gap surfaces more often as SaaS products add vision and voice AI features. The enterprise buyer whose security team is running SOC 2 due diligence on an AI vendor is asking exactly this question, often before the formal audit begins. "You have prompt injection controls; do they cover image uploads?" is now a standard pre-procurement question from buyers whose own compliance programmes require them to validate their vendors' security controls. A vendor whose text scanner provides no evidence for image inputs has a gap in its questionnaire response that the buyer's security team will notice.

The evidence gap: what text-only scanners cannot log for image and audio inputs

The operative compliance issue is not whether a known FigStep or WhisperInject payload would succeed against the deployed system. It is whether the control architecture produces a per-request evidence record for every input modality the model consumes. SOC 2 operating-effectiveness testing is evidence-based: the auditor requests a sample of transactions and matches each to a corresponding control log. If the control did not generate a log for a given input type, the control was not operating for that input.

For a text-only scanner applied to an AI application that accepts image uploads, the evidence structure looks like this. A user submits a request with both a text prompt and an attached image. The text scanner receives the text field, runs the detection pipeline, and writes a log entry: request_id=R1, modality=text, score=14, action=allowed. The image field bypasses the scanner entirely and is forwarded directly to the vision model. The control log for request R1 therefore contains one entry, covering the text field. The vision model receives the image, processes it, and returns a response. The image bytes were never inspected by a prompt-injection (PI) control, and no log entry for the image exists.
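The coverage check the auditor is effectively performing can be sketched by parsing entries in the `key=value` format shown above and comparing the logged modalities with the modalities the request carried. This is a minimal illustration; the log format and the `unscanned_modalities` helper are hypothetical, not a specific product's log schema.

```python
from collections import defaultdict


def parse_log_line(line: str) -> dict[str, str]:
    """Parse a 'key=value, key=value' control-log entry into a dict."""
    pairs = (part.strip().split("=", 1) for part in line.split(","))
    return {key: value for key, value in pairs}


def unscanned_modalities(request_id: str, submitted: set[str], log_lines: list[str]) -> set[str]:
    """Modalities the model consumed for this request that have no scan log entry."""
    logged: dict[str, set[str]] = defaultdict(set)
    for line in log_lines:
        entry = parse_log_line(line)
        logged[entry["request_id"]].add(entry["modality"])
    return submitted - logged[request_id]


control_log = ["request_id=R1, modality=text, score=14, action=allowed"]
print(unscanned_modalities("R1", {"text", "image"}, control_log))  # {'image'}
```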

When the auditor samples request R1 and asks "show me the CC6.6 control evidence for this request," the evidence package covers the text channel. The auditor then asks: "This request included an image — where is the PI control evidence for the image?" The team can explain that the scanner only covers text. The auditor records: CC6.6 — image inputs not covered by the logical access control during the audit period. Depending on the auditor and the severity of the other findings, this may result in a qualified opinion, a management letter finding, or (in a SOC 2 readiness engagement) a remediation action before the formal audit begins.

The argument that "images are safe because users uploaded them through our authenticated form" does not satisfy CC6.6. The criterion covers the system boundary crossing — the point where external content enters the model's processing context — not the authentication of the channel through which it was delivered. An authenticated user submitting a FigStep payload is still a threat from outside the system boundary. Authentication establishes identity, not content intent. CC6.6 requires a control that inspects the content at the boundary, regardless of the channel's authentication state.

The same argument applies to audio inputs in voice-enabled AI features. A WhisperInject-class payload embedded in a waveform submitted by an authenticated user is still external content crossing the system boundary. Authentication of the submission channel does not substitute for inspection of the content. For voice agents, RAG pipelines retrieving image-bearing PDFs, and MCP servers processing content blocks with embedded images, the scope of the evidence gap grows with the number of multimodal channels the application consumes. Each channel without a scan log is a channel where CC6.6 evidence is absent.
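The same check can be run at the configuration level rather than per request. As a sketch: inventory each channel the application feeds to a model, mark which ones have a scan wired in, and list the rest. The channel names below are hypothetical labels for the channels named in this section.

```python
# Hypothetical inventory: channel name -> modality the model consumes through it.
CONSUMED_CHANNELS = {
    "chat_text": "text",
    "chat_image_upload": "image",
    "voice_input": "audio",
    "rag_pdf_images": "image",
    "mcp_image_blocks": "image",
}

# Channels with a prompt-injection scan wired in (here: only the text scanner).
SCANNED_CHANNELS = {"chat_text"}


def channels_without_evidence(consumed: dict[str, str], scanned: set[str]) -> dict[str, str]:
    """Channels the model consumes that produce no scan log, with their modality."""
    return {name: modality for name, modality in sorted(consumed.items()) if name not in scanned}


for name, modality in channels_without_evidence(CONSUMED_CHANNELS, SCANNED_CHANNELS).items():
    print(f"{name} ({modality}): no CC6.6 scan evidence")
```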

Why routing images through OCR does not close the evidence gap

A common response to the multimodal evidence gap is to add an OCR step before the text scanner: convert the image to text, pass the text to the existing PI scanner, and treat that as covering the image channel. This approach does not satisfy the SOC 2 evidence requirement for two structural reasons.

The first reason is architectural: the FigStep class of attacks moves the adversarial instruction out of the text channel and into rendered typography, and variants can be styled so that a vision-language model reads the glyphs as a coherent directive while an OCR adapter drops, misreads, or fails to extract the characters. The attack exploits the gap between what a neural image encoder perceives and what an OCR engine extracts (see FigStep detection). The OCR transcript is clean; the text scanner finds nothing; the image is forwarded to the vision model with a passing scan record. That record is evidence that the OCR output was inspected, not evidence that the image bytes were inspected. A SOC 2 auditor who understands the attack model will note that the control inspected a derived representation, not the artifact the model actually consumed.

The second reason is evidentiary: the OCR-then-text-scan pattern produces a scan log for the text representation, not for the image. If the auditor asks "was the image inspected by a PI control?", the technically accurate answer remains no — the OCR output was inspected. Whether that distinction matters in a specific audit depends on the auditor's sophistication and the criteria language in the SOC 2 engagement letter. For a principle-based reading of CC6.6 — "controls to protect against threats from sources outside the system boundary" — the relevant boundary crossing is the image entering the model's multimodal context, and the relevant control is one that inspects the bytes at that boundary. A derived-text inspection does not cover the direct-bytes boundary crossing. See why every text-only prompt-injection scanner misses a 30-pixel PNG for the full architectural argument on why this structural ceiling is not addressable through OCR tuning.
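One way to make the distinction between a derived representation and the consumed artifact auditable (an assumption of this sketch, not a requirement stated by SOC 2) is to record a SHA-256 digest of the exact bytes the control inspected and compare it with a digest of the bytes forwarded to the model. An OCR-then-text pipeline records a digest of the transcript, which cannot match the image digest. The helper names and record fields are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def evidence_covers_model_input(inspected_digest: str, model_input: bytes) -> bool:
    """True only if the logged digest matches the bytes actually forwarded to the model."""
    return inspected_digest == sha256_hex(model_input)


# Placeholder bytes standing in for an uploaded PNG and the text an OCR step produced.
image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 32
ocr_transcript = b"Quarterly revenue by region"

direct_scan_record = {"modality": "image", "inspected_sha256": sha256_hex(image_bytes)}
ocr_scan_record = {"modality": "image", "inspected_sha256": sha256_hex(ocr_transcript)}

print(evidence_covers_model_input(direct_scan_record["inspected_sha256"], image_bytes))  # True
print(evidence_covers_model_input(ocr_scan_record["inspected_sha256"], image_bytes))     # False
```

Storing a digest rather than the bytes keeps the evidence log small while still letting an auditor confirm which artifact the control saw.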

The audio parallel is even more direct. A speech-to-text adapter converts an audio stream to a transcript. A text PI scanner operates on the transcript. A WhisperInject payload embeds an instruction in the waveform that the speech model decodes as part of its context while the transcript representation omits it (see WhisperInject detection and audio prompt-injection detection). The transcript scan log shows a clean transcript; the scanner scores it low; the audio is forwarded to the voice model. The waveform bytes were never inspected. The operating-effectiveness evidence for that request reflects a control that was not active for the input the model actually consumed. SOC 2 operating-effectiveness testing on a population of voice-feature requests will surface this pattern at scale.

Coverage matrix against CC6.6 and CC7.2 evidence requirements

For teams mapping the self-serve prompt injection tool landscape to SOC 2 evidence requirements, the picture sorts cleanly along the modality boundary.

Tool	Text PI evidence (CC6.6)	Image PI evidence (CC6.6)	Audio PI evidence (CC6.6)	Anomaly log (CC7.2)
Lakera Guard	Yes — per-request score + log	No (text-only)	No	Text channel only
LLM Guard (OSS)	Yes — per-request score	No (text-only by design)	No	Text channel only
Azure Prompt Shields	Yes (Azure-gated)	Content moderation, not PI	No	Text + content moderation only
Promptfoo	Eval-time test harness	Eval-time test harness	Eval-time test harness	Not a runtime evidence source
Glyphward	Runs alongside a text scanner	Yes — per-request score + scan_id	Yes — per-request score + scan_id	All channels; high-risk webhook (Team plan)

Three clarifications apply to the matrix. First, Promptfoo is a pre-deployment evaluation harness: it generates adversarial probes and verifies model behaviour in CI, which places it in the Measure function in NIST AI RMF terms. SOC 2 operating-effectiveness testing asks about runtime controls during the audit period, not pre-deployment test coverage, so Promptfoo records are development-time evidence rather than per-production-request evidence.

Second, the image capability in Azure's stack (Azure AI Content Safety image moderation, offered alongside Prompt Shields) covers content moderation categories such as hate, sexual content, violence, and self-harm, not prompt-injection payload detection. A "no harmful content" score from an image content moderation service is not evidence that adversarial-instruction payloads in the image were inspected and blocked. Content moderation and PI detection are different functions serving different threat models. These distinctions are developed in the comparison pages at Lakera Guard vs Glyphward and Azure Prompt Shields vs Glyphward.

Third, Lakera Guard was acquired by Check Point in 2025 and is moving upmarket, so its self-serve tier availability and evidence log format may change. The Lakera alternative page covers the practical implications for teams currently using Lakera Guard for text-channel coverage.

Five-step architecture for SOC 2-aligned multimodal inference-boundary controls

Implementing a SOC 2-compliant AI input control for a multimodal product requires five steps that map directly onto the Trust Services Criteria language.

Enumerate every input modality in the system description. SOC 2 Type II audits are scoped to a defined system. The system description must accurately reflect what the system does, including all input channels the AI features accept. A system description that lists "user text prompts" but omits "user-uploaded images" and "audio inputs" creates a scoping gap that the audit process will surface. Update the system description to enumerate each modality: direct user-uploaded images, user-submitted audio, images retrieved via tool calls, images arriving in RAG chunks, multimodal content blocks from MCP servers. Per-product threat model examples are available for chatbots with image upload, RAG pipelines, MCP servers, screenshot-reading agents, and voice agents.
Place the inference-boundary scan after receipt of the input and before the model call. CC6.6 specifies the logical access boundary — the point where external content crosses into system-controlled processing. For AI features, that crossing happens when the model call is made: the model receives the image or audio bytes as part of its multimodal context. The scan must be positioned between input receipt and model call, not at upload time (which may be minutes or hours before the inference request) or at output time (after the threat has already crossed the boundary). Each input modality requires a separate scan call: image bytes or URL to the image scan endpoint; audio bytes to the audio scan endpoint. The scan response returns a score, flagged region or time window, and a stable scan_id. The multimodal LLM security API architecture covers the pipeline shape and latency budget for high-throughput deployments.
Log every scan result as the CC7.2 operating evidence. The log entry for each request should record: application request_id, Glyphward scan_id, input modality, risk_score, threshold applied, and action taken (allowed or blocked). This is the per-request evidence artifact the SOC 2 auditor will sample during operating-effectiveness testing. The request_id links the scan log to the application's access log; the scan_id links it to Glyphward's immutable server-side record, which can be used as corroborating evidence if the application log is questioned. Log entries should be immutable — append-only, not editable — to satisfy the auditor's expectation that evidence was not reconstructed after the fact. Glyphward's Pro and Team tiers return a stable scan_id in the response; logging that ID alongside the application request_id requires no additional engineering beyond the HTTP response parse.
Route high-risk scan events into your SIEM or alerting toolchain for CC7.2 anomaly coverage. CC7.2 requires not just that anomalies are detectable but that the system is actively monitored for them. A scan result with a risk score above your blocking threshold is the anomaly indicator the criterion expects. On the Pro plan, Glyphward returns scan results synchronously in the HTTP response; webhook delivery of high-risk alerts, included in the Team plan, suits async pipeline architectures. Route those events to your SIEM ingestion endpoint (Splunk, Datadog, Elastic Security) or paging tool (PagerDuty) to generate alert records that evidence CC7.2 monitoring. The CC7.3 evaluation, which determines whether the detected event constitutes a security incident, can then follow the same runbook you use for other security alert classes, with the scan_id and risk_score as the incident-triage artefacts.
Review the control's scope and effectiveness in the annual control review cycle. A SOC 2 Type II report covers a defined audit period (often twelve months). At the end of each period, update the control documentation to reflect any changes in input modalities (new file types, new voice features, new retrieval sources) and run a scope review: is every currently active input channel covered by a scan? The payload corpus Glyphward scans against is updated continuously as new attack families are published, so the control design documentation should note that the corpus is living rather than static and that the scan's detection coverage evolves with the threat landscape. For organisations subject to several compliance frameworks at once, the same per-request scan log can also answer the OWASP LLM01:2025 multimodal evidence question, the NIST AI RMF GenAI Profile Measure/Manage evidence question, the CISA "Deploying AI Systems Securely" inference-boundary evidence question, and the EU AI Act Article 15(5) adversarial-robustness evidence question: one scan log, read in several compliance vocabularies.
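The placement, logging, and alerting steps above can be combined into one gate that runs before the model call. This is a minimal sketch under several assumptions: `scan_fn` stands in for whatever scanner client you use and is assumed to return a dict with `risk_score` and `scan_id`; the threshold of 70 is illustrative, not a recommendation; and the in-memory log approximates append-only storage by chaining each entry to the SHA-256 of the previous one, so an edited entry, or one removed from the middle of the log, is detectable. A production deployment would write to durable write-once storage instead.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical scanner client: (modality, raw bytes) -> {"risk_score": int, "scan_id": str, ...}
ScanFn = Callable[[str, bytes], dict]
GENESIS = "0" * 64


def _entry_hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AppendOnlyEvidenceLog:
    """In-memory stand-in for append-only storage: each entry chains to the previous hash."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, record: dict) -> dict:
        prev = self._entries[-1]["entry_hash"] if self._entries else GENESIS
        body = {**record, "prev_hash": prev}
        entry = {**body, "entry_hash": _entry_hash(body)}
        self._entries.append(entry)
        return dict(entry)  # hand out a copy so callers cannot mutate the stored entry

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or removing a non-final entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev or entry["entry_hash"] != _entry_hash(body):
                return False
            prev = entry["entry_hash"]
        return True

    @property
    def entries(self) -> list[dict]:
        return [dict(e) for e in self._entries]


def gate_request(
    request_id: str,
    attachments: list[tuple[str, bytes]],
    scan_fn: ScanFn,
    log: AppendOnlyEvidenceLog,
    alert_fn: Callable[[dict], None],
    threshold: int = 70,
) -> bool:
    """Scan every (modality, bytes) attachment before the model call; True means proceed."""
    allowed = True
    for modality, data in attachments:
        result = scan_fn(modality, data)
        action = "blocked" if result["risk_score"] >= threshold else "allowed"
        entry = log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "request_id": request_id,
            "scan_id": result["scan_id"],
            "modality": modality,
            "risk_score": result["risk_score"],
            "threshold": threshold,
            "action": action,
        })
        if action == "blocked":
            alert_fn(entry)  # e.g. forward to a SIEM ingestion endpoint or paging tool
            allowed = False
    return allowed


# Usage with a stand-in scanner that flags one obvious string.
def fake_scan(modality: str, data: bytes) -> dict:
    score = 92 if b"ignore previous instructions" in data else 8
    return {"risk_score": score, "scan_id": f"demo-{modality}-{len(data)}"}


alerts: list[dict] = []
log = AppendOnlyEvidenceLog()
ok = gate_request(
    "R1",
    [("image", b"...ignore previous instructions..."), ("audio", b"clean audio")],
    fake_scan, log, alerts.append,
)
print(ok, len(log.entries), len(alerts), log.verify())  # False 2 1 True
```

Every attachment is scanned even after one is blocked, so each modality in the request still gets its own evidence entry for the auditor to sample.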

How Glyphward fits

Glyphward is the inference-time multimodal scanner — bytes in, 0–100 risk score and flagged region out — positioned at the inference boundary that SOC 2's CC6.6 identifies as the logical access control point. The HTTP contract is a single POST per attachment: image bytes or URL forwarded to the image endpoint, or audio bytes forwarded to the audio endpoint. The response includes a risk score, a flagged region (bounding box for image inputs, time window for audio inputs), a modality-tagged reason string, and a stable scan_id. The scanner architecture runs CLIP embedding plus a typographic-PI detection head plus Tesseract OCR cross-referenced against a curated payload corpus on the image side, and a waveform anomaly classifier plus Whisper-small transcript filter on the audio side — the full architecture is in the multimodal prompt-injection threat model for AI product teams (2026).
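As an illustration of consuming that response, the sketch below parses a JSON body into a typed record and validates the 0–100 score range. The field names `risk_score`, `flagged_region`, and `scan_id` follow the names used in this article; `reason` and the nested region keys (`x`, `y`, `w`, `h` for images, `start_s`, `end_s` for audio) are assumptions, so check them against the actual API surface before relying on them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanResult:
    scan_id: str
    risk_score: int                                    # 0-100
    reason: str
    bbox: Optional[tuple[int, int, int, int]] = None   # image: x, y, width, height (assumed keys)
    window_s: Optional[tuple[float, float]] = None     # audio: start, end in seconds (assumed keys)


def parse_scan_response(body: str, modality: str) -> ScanResult:
    """Parse a scan response body into a ScanResult, rejecting out-of-range scores."""
    data = json.loads(body)
    score = int(data["risk_score"])
    if not 0 <= score <= 100:
        raise ValueError(f"risk_score out of range: {score}")
    region = data.get("flagged_region")
    bbox = window = None
    if region and modality == "image":
        bbox = (region["x"], region["y"], region["w"], region["h"])
    elif region and modality == "audio":
        window = (region["start_s"], region["end_s"])
    return ScanResult(data["scan_id"], score, data.get("reason", ""), bbox, window)


response_body = json.dumps({
    "scan_id": "scn_example",
    "risk_score": 87,
    "reason": "image: typographic instruction",
    "flagged_region": {"x": 12, "y": 40, "w": 300, "h": 30},
})
result = parse_scan_response(response_body, "image")
print(result.risk_score, result.bbox)  # 87 (12, 40, 300, 30)
```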

For SOC 2 evidence packaging, Glyphward produces the same per-request evidence structure on the image and audio side that your existing text scanner produces on the text side. The scan_id is the stable external reference that lets your audit log point to an immutable server-side record. The risk_score and flagged_region are the technical detail that answers the auditor's question "what exactly did the control inspect?" The action field (allowed or blocked) is the CC6.6 outcome: the logical access decision. Logging these four fields against your application request_id gives you a complete per-request CC6.6 evidence package across all input modalities for the duration of the audit period. Glyphward does not compete with your existing text scanner; the two run in parallel, each covering its own channels. Pricing is flat-rate self-serve, starting at $29/mo for Pro, with a free no-card tier for prototyping and scope evaluation (see the full pricing breakdown).


Related questions

Does SOC 2 specifically name prompt injection as a control requirement?

No. SOC 2 Trust Services Criteria are principle-based rather than threat-specific: they describe the control outcome required (protect the logical access boundary, monitor for anomalies) without enumerating the specific attack classes the control must cover. This is deliberate — a criteria set that named specific attacks would be immediately outdated by new attack families. The operative question for CC6.6 is whether the control protects against "threats from sources outside the system boundary" — and a prompt injection payload embedded in an image is exactly such a threat, regardless of whether the criterion names it. Security auditors familiar with the AI threat landscape are making this connection in their planning questionnaires; the criteria language does not need to name FigStep or WhisperInject to require a control that blocks them. For a detailed breakdown of how framework language maps onto the multimodal PI threat class, see OWASP LLM01:2025 multimodal (which does name the threat) and NIST AI RMF GenAI Profile (which names prompt injection explicitly in the Information Security risk).

Is multimodal prompt injection in SOC 2 scope if we do not sell to regulated industries?

Yes. SOC 2 scope is defined by the system description, the trust services categories the organisation selects, and its service commitments, not by the regulated-industry status of the buyer. A SaaS company selling to commercial buyers (not healthcare, not finance) that has committed to the SOC 2 Security criteria has made the same CC6.6 commitments as one selling to regulated buyers. If the system description includes AI features that accept image or audio uploads, CC6.6's logical access boundary control applies to those inputs regardless of who submits them. Regulated-industry considerations can affect how the optional categories are scoped and interpreted (Privacy, for example, when healthcare data is involved), but the Security Common Criteria, including CC6.6 and CC7.2, are evaluated against the same standard for any scoped system. The only question is whether image and audio inputs are in the system description. If they are, the controls must cover them.

Can we document a compensating control instead of adding a multimodal scan?

Compensating controls are accepted in SOC 2 when the primary control cannot be implemented as designed. Common examples include process controls that substitute for automated controls when system architecture precludes automation. For multimodal prompt injection, a compensating control might be: all image and audio uploads are processed in an isolated sandbox with no access to privileged context. To document this compensating control credibly, the team would need evidence that the sandbox isolation actually prevents injected instructions from reaching consequential system actions — which is a harder evidentiary case to make than a scan log. Compensating controls also require more documentation effort (design narrative, implementation evidence, effectiveness testing separate from the primary control) than a runtime scan that produces a per-request log automatically. For most teams, adding an inference-boundary multimodal scan is the path of least resistance to filling the CC6.6 evidence gap, and it produces better operating-effectiveness evidence than a process control description.

Which Trust Services Criteria does Glyphward's scan evidence address most directly?

CC6.6 is the most direct: the scan log is per-request evidence that a logical access control was active at the inference boundary for every image and audio input during the audit period. CC7.2 is the second most direct: high-risk scan results constitute the anomaly-detection signal the criterion requires, and webhook delivery into a SIEM generates the alert records that demonstrate active monitoring rather than the mere availability of a scanner. CC7.3, evaluating detected events to determine whether they constitute security incidents, is addressed by making the scan_id and risk_score available to the incident-response runbook as structured triage artefacts. A secondary argument exists for CC9.2 (the risk of forwarding user inputs to a third-party model API without inspection), though CC9.2 is more commonly addressed through model API vendor assessment. For organisations subject to SOC 2 alongside ISO/IEC 27001:2022, the same scan log can be mapped onto Annex A control 8.7 (protection against malware, if broadly construed to cover adversarial AI inputs) and control 8.28 (secure coding, applied to the inference pipeline design).

Can Glyphward scan logs be exported for the SOC 2 auditor?

The scan evidence for SOC 2 purposes should live in your application's own log, not solely on Glyphward's servers. The pattern recommended above, logging the Glyphward scan_id alongside your application request_id in your own immutable log, keeps the audit evidence package entirely in your control: you do not depend on pulling logs from a third-party API at audit time. The Glyphward scan_id lets the auditor check that the server-side record matches the application-side log if independent corroboration is requested. The Pro plan includes synchronous per-request scan responses with stable scan_ids. The Team plan adds webhook delivery of scan results and compare-report export functionality that can be used to assemble evidence packages for specific audit sampling requests, which is useful when the auditor asks for all high-risk events during a specific date window rather than sampling individual request IDs.
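A sketch of assembling such a date-window package from your own evidence log, assuming each record is a dict with an ISO 8601 `ts` field carrying a UTC offset and a numeric `risk_score`. The `ts` field and the `export_high_risk` function are hypothetical; the other field names follow the logging step earlier in this article.

```python
import json
from datetime import datetime, timezone


def export_high_risk(records: list[dict], start: datetime, end: datetime, threshold: int) -> str:
    """Return JSON Lines for events with risk_score >= threshold and start <= ts < end."""
    selected = [
        r for r in records
        if start <= datetime.fromisoformat(r["ts"]) < end and r["risk_score"] >= threshold
    ]
    selected.sort(key=lambda r: datetime.fromisoformat(r["ts"]))
    return "\n".join(json.dumps(r, sort_keys=True) for r in selected)


records = [
    {"ts": "2026-03-02T10:15:00+00:00", "request_id": "R7", "scan_id": "scn_a",
     "modality": "image", "risk_score": 91, "action": "blocked"},
    {"ts": "2026-03-02T11:00:00+00:00", "request_id": "R8", "scan_id": "scn_b",
     "modality": "audio", "risk_score": 12, "action": "allowed"},
    {"ts": "2026-04-01T09:00:00+00:00", "request_id": "R9", "scan_id": "scn_c",
     "modality": "image", "risk_score": 88, "action": "blocked"},
]
march = (datetime(2026, 3, 1, tzinfo=timezone.utc), datetime(2026, 4, 1, tzinfo=timezone.utc))
print(export_high_risk(records, *march, threshold=70))  # only R7; R9 falls outside the window
```

The half-open window (start inclusive, end exclusive) keeps consecutive monthly exports from double-counting an event that lands exactly on a boundary.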