FedRAMP AI security: prompt injection controls for multimodal federal systems

FedRAMP Moderate and High authorizations require cloud service providers to implement NIST SP 800-53 Rev 5 controls across every attack surface their system exposes — including AI inference inputs. When a federal AI system processes claimant-submitted ID documents, contractor field photos, or agency-internal scanned forms through a vision-language model, the image byte stream is an input channel that standard text-only prompt-injection scanners do not inspect. Adversarial text embedded in image pixels — rendered invisibly to a human reviewer but fully legible to a vision model — can redirect model behavior, suppress output, exfiltrate retrieved context, or cause the system to produce fraudulent determinations. Five NIST 800-53 Rev 5 controls directly implicate multimodal prompt-injection detection, and the absence of that control is a findable gap during a 3PAO assessment under SA-11.

TL;DR

Call POST https://api.glyphward.com/v1/scan with the base64-encoded image bytes before passing the image to your vision model. The response includes a 0–100 score, a decision of allow or block, and a scan_id that becomes your per-transaction audit evidence for SI-10 and AU-12. Block any image that scores at or above 70 on FedRAMP Moderate systems, or at or above 60 on High impact systems. Log the scan_id and image SHA-256 to your SIEM for continuous monitoring evidence. The free tier covers 1,000 scans per month; get early access to start building before your next 3PAO engagement.

Relevant NIST 800-53 Rev 5 controls for AI image inputs

FedRAMP does not yet publish a dedicated AI security baseline overlay, but the Joint Authorization Board (JAB) has confirmed that existing controls apply to AI system components. The five controls below have direct, documentable relationships to multimodal prompt-injection risk in vision AI systems.

SI-10 — Information Input Validation

SI-10 requires the information system to check the validity of information inputs. The control covers the syntax and semantics of inputs (character set, length, range, acceptable values), not merely text fields. For an AI system that routes uploaded images to a vision-language model, the image byte stream is an information input. A multimodal prompt-injection payload (adversarial text rendered in the image that instructs the model to alter its behavior) is an invalid input under SI-10: it is not a legitimate documentary image; it is an instruction masquerading as one. Two enhancements are relevant. SI-10(3), Predictable Behavior, requires the system to respond to invalid inputs in a predictable and documented manner, which for a microservices AI pipeline means a defined block path at the inference gateway, not only at the web application firewall. SI-10(6), Injection Prevention, addresses untrusted data injection directly. A pre-inference scan that returns a block decision before the image reaches the model is a natural SI-10 control implementation for multimodal AI inputs.

Assessors look for documented validation logic and evidence that it fires on every image input. The scan_id per-image log entry is that evidence.
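
As a rough illustration of the syntactic half of SI-10, the sketch below checks that uploaded bytes carry a known image signature and stay under a size limit before any scan call is made. The function name precheck_image, the accepted formats, and the 10 MB limit are assumptions chosen for illustration; none of them come from the Glyphward API or from NIST.

```python
# Hypothetical pre-scan syntax check; names and limits are illustrative only.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed limit, tune per system


def precheck_image(image_bytes: bytes) -> tuple[bool, str]:
    """Return (ok, reason). Rejects empty, oversized, or non-image input."""
    if not image_bytes:
        return False, "empty"
    if len(image_bytes) > MAX_IMAGE_BYTES:
        return False, "too_large"
    if image_bytes.startswith(PNG_MAGIC):
        return True, "png"
    if image_bytes.startswith(JPEG_MAGIC):
        return True, "jpeg"
    return False, "unknown_format"
```

A check like this does not detect injected text. It only filters malformed input so that the semantic scan, and the audit record it produces, applies to real images.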

SA-11 — Developer Testing and Evaluation

SA-11 requires developers to perform security testing and evaluation of the system. Enhancement SA-11(5) adds developer penetration testing at an organization-defined breadth and depth, and AI inference endpoints built by the CSP fall within that scope. The 3PAO security assessment package for a FedRAMP authorization with AI features should demonstrate that the CSP's developers tested those features for known AI attack classes. OWASP LLM01 (prompt injection) and its multimodal variants are documented vulnerability classes; a 3PAO assessor who is current on the AI threat landscape will look for evidence of multimodal prompt-injection (PI) testing. Providing the assessor with a set of adversarial test images and a log of scan decisions demonstrating that the scanner blocked them, with scan_id artifacts traceable to specific test runs, supplies penetration testing evidence under SA-11 for AI image inputs.
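
One way to produce that evidence is a small harness that runs every adversarial sample through the scanner and records one row per image. The sketch below is a minimal version under stated assumptions: run_adversarial_suite, missed_samples, and the Scanner callable are hypothetical names, and the scanner is injected so the harness can be exercised offline instead of calling the real endpoint.

```python
import hashlib
from typing import Callable

# A scanner is any callable mapping image bytes to (scan_id, decision).
# In a real run it would wrap the production scan call.
Scanner = Callable[[bytes], tuple[str, str]]


def run_adversarial_suite(
    test_run_id: str,
    samples: dict[str, bytes],
    scanner: Scanner,
) -> list[dict]:
    """Scan every adversarial sample and return one evidence row per image."""
    evidence = []
    for name, image_bytes in sorted(samples.items()):
        scan_id, decision = scanner(image_bytes)
        evidence.append({
            "test_run_id": test_run_id,
            "sample": name,
            "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
            "scan_id": scan_id,
            "decision": decision,
            "passed": decision == "block",  # adversarial images should be blocked
        })
    return evidence


def missed_samples(evidence: list[dict]) -> list[str]:
    """Names of adversarial samples the scanner failed to block."""
    return [row["sample"] for row in evidence if not row["passed"]]
```

Any sample returned by missed_samples is a finding to resolve before the evidence rows go into the assessment package.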

RA-5 — Vulnerability Monitoring and Scanning

RA-5 requires the organization to monitor and scan for vulnerabilities in the information system on an ongoing basis. FedRAMP Moderate and High both require continuous monitoring (ConMon) plans that enumerate the vulnerability classes the organization is scanning for. OWASP LLM01 multimodal prompt injection is a documented vulnerability class with a growing body of public exploit demonstrations, including attacks against document-processing AI published in 2024–2025. Once a vulnerability class appears in authoritative references (the OWASP Top 10 for LLM Applications, NVD entries, CISA advisories), it belongs in the RA-5 vulnerability monitoring scope for AI systems. A ConMon evidence package that demonstrates per-image scanning in production, with weekly summary logs from the Glyphward dashboard, provides an artifact that supports the RA-5 monitoring requirement for this vulnerability class.
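
If the per-image scan records are also kept locally, a weekly summary can be derived from them independently of any dashboard. The sketch below groups decisions by ISO week. The function name weekly_block_counts and the record shape (a dict with timestamp and decision keys, timestamps in UTC "YYYY-MM-DDTHH:MM:SSZ" form) are assumptions, not the dashboard's export format.

```python
from collections import Counter
from datetime import datetime


def weekly_block_counts(records: list[dict]) -> dict[str, Counter]:
    """Group scan decisions by ISO week, keyed like "2025-W03"."""
    summary: dict[str, Counter] = {}
    for rec in records:
        # Timestamps are assumed to use the "%Y-%m-%dT%H:%M:%SZ" UTC format.
        ts = datetime.strptime(rec["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        iso = ts.isocalendar()
        week = f"{iso[0]}-W{iso[1]:02d}"
        summary.setdefault(week, Counter())[rec["decision"]] += 1
    return summary
```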

SI-3 — Malicious Code Protection

SI-3 requires malicious code protection at information system entry and exit points. Traditional implementations address executable malware in file uploads. For AI systems, an adversarial image payload can be treated as a form of malicious code: it is a crafted artifact designed to alter the behavior of an executing process (the model inference) in a manner the legitimate user and the system owner did not authorize. The analogy to shellcode is close: the payload is not executable in isolation, but when the "interpreter" (the vision model) processes it, the embedded instruction takes effect. SI-3's requirement to scan at entry points maps directly to pre-inference multimodal PI scanning for every uploaded image before it reaches the model. The Rev 4 enhancements SI-3(1) and SI-3(2), which called for centrally managed and automatically updated malicious code protection tools, were withdrawn in Rev 5 and absorbed into other controls, with automatic updates now part of the SI-3 base control. An API-based scanner that Glyphward updates as new attack patterns emerge meets that expectation, so the CSP does not need to maintain adversarial pattern libraries in-house.

AU-2 / AU-12 — Audit Events and Audit Record Generation

AU-2 requires the organization to determine which events the system must audit. AU-12 requires the system to generate audit records for those events. For FedRAMP Moderate and High systems, every significant data-processing event must produce an auditable record. A vision AI system processing a claimant's uploaded document image is performing a significant data-processing event that affects the claimant's benefit determination. Each such event must appear in the audit log. The Glyphward scan call produces four audit-relevant fields per image: scan_id (unique per scan), image_sha256 (content-addressable identifier of the specific image processed), decision (allow/block), and timestamp. Logging these four fields to your SIEM alongside the application-layer transaction identifier provides the per-transaction audit trail AU-12 requires for AI inference events — a trail that shows which image was processed, when, by which system component, and what security decision was made about it.
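
To make the join between the scan fields and the application-layer transaction concrete, the sketch below builds one JSON audit line per image. The function name build_audit_event and the transaction_id and component field names are illustrative assumptions; only the four scan fields come from the description above.

```python
import json


def build_audit_event(scan: dict, transaction_id: str, component: str) -> str:
    """Serialize one AU-12 style audit line for an image scan."""
    required = ("scan_id", "image_sha256", "decision", "timestamp")
    missing = [key for key in required if key not in scan]
    if missing:
        # Refuse to emit a partial record; an incomplete trail is hard to defend.
        raise ValueError(f"scan record missing fields: {missing}")
    event = {key: scan[key] for key in required}
    event["transaction_id"] = transaction_id  # application-layer identifier
    event["component"] = component            # which service made the decision
    return json.dumps(event, sort_keys=True)
```

Sorting the keys keeps the log lines byte-stable for identical events, which simplifies deduplication and diffing in the SIEM.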

AI attack surfaces in federal deployments

The federal government's expansion of AI into citizen-facing and operational systems has created several distinct image-processing attack surfaces. Each has characteristics that affect the threat model and the priority level for multimodal PI controls.

Benefits processing: SSA, VA, HHS

AI systems at the Social Security Administration, Department of Veterans Affairs, and Department of Health and Human Services increasingly process claimant-submitted document images — identity documents, medical records, supporting evidence for disability claims, provider attestation forms. These images originate from members of the public who may or may not be the claimant of record. An adversarial claimant or third party can craft an ID scan or supporting document image containing hidden text that, when processed by the agency's vision AI, instructs the model to approve a claim, suppress a fraud flag, or return a determination the model's developers did not intend. The impact of a successful injection is an erroneous benefit determination affecting a federal entitlement program — a high-severity outcome that elevates the priority of this control relative to most commercial deployments.

Immigration and border: USCIS, CBP

U.S. Citizenship and Immigration Services processes millions of document images annually — petitions, supporting evidence, identity documents from applicants in jurisdictions where document fraud is a documented threat vector. An adversarial image payload submitted in a visa petition or naturalization application could, against a vulnerable vision AI, alter the AI's characterization of the document, suppress anomaly flags, or redirect model output in ways that affect an adjudication. CBP's image classification systems at ports of entry face a similar threat from cargo and traveler-submitted images. The FedRAMP authorization boundary for USCIS and CBP AI systems must account for this external-input attack surface explicitly.

Defense contractor pipelines: DoD NIPR

DoD AI vision systems on the Non-classified Internet Protocol Router Network (NIPRNet) process imagery submitted by contractors: field photographs, technical diagrams, equipment condition reports, inspection images. Contractors are external parties to the DoD authorization boundary; their submitted images are untrusted inputs. A contractor whose subcontractor has been compromised, or a contractor themselves acting adversarially, can embed instructions in a submitted image that affect downstream DoD AI processing. On NIPRNet, which handles controlled unclassified information (CUI), including data formerly marked For Official Use Only (FOUO), a successful injection attack against a DoD AI vision system can constitute a CUI handling incident. DoD Impact Level 2 corresponds to the FedRAMP Moderate baseline but does not cover CUI; NIPRNet AI deployments that process CUI fall under Impact Level 4 or higher, which layers DoD-specific requirements on top of the FedRAMP baseline.

Grant management: NSF, NIH

The National Science Foundation and National Institutes of Health use AI systems to assist in processing grant proposals, including research diagrams, figures, and supplementary materials submitted as image attachments. A research submitter with adversarial intent — whether an applicant attempting to manipulate AI scoring or a nation-state actor targeting US research programs — can embed prompt-injection payloads in submitted figures. The vision AI that processes those figures to assist reviewers may return altered characterizations of the research if it processes an adversarial image. NSF and NIH grant systems often operate at the FedRAMP Moderate baseline.

Tax and revenue: IRS document AI

IRS AI document processing systems handle uploaded W-2 images, scanned tax forms, supporting schedules, and identity verification documents. Taxpayers who submit these images are external actors with direct financial motivation to alter the AI system's output. Adversarial image payloads in tax document uploads represent a revenue-integrity risk. The IRS's authorization boundary for its document-processing AI must include controls for the image-input channel — SI-10 input validation and SI-3 malicious artifact scanning at the inference gateway are the applicable controls.

Implementation pattern

import base64
import hashlib
import json
import logging
import time
import uuid
from dataclasses import dataclass
from typing import Literal

import httpx

GLYPHWARD_API_KEY = "gw_..."          # stored in Secrets Manager, never in source
GLYPHWARD_SCAN_URL = "https://api.glyphward.com/v1/scan"

# Threshold by FedRAMP impact level.
# High impact uses a lower (more sensitive) threshold because the cost of a
# missed injection in a High-impact system exceeds the cost of a false positive.
FEDRAMP_THRESHOLDS: dict[str, int] = {
    "Moderate": 70,
    "High":     60,
}

# Dedicated logger so audit records can be routed separately from app logs.
audit_logger = logging.getLogger("audit.ai.image")


@dataclass
class FederalScanRecord:
    """
    Per-image audit record.  All fields must be written to the SIEM
    to satisfy AU-12 audit record generation for AI inference events.

    classification_context notes the FedRAMP impact level or system name.
    It MUST NOT contain actual CUI; use system identifiers only.
    """
    scan_id:                str
    image_sha256:           str
    score:                  int
    decision:               Literal["allow", "block"]
    timestamp:              str
    classification_context: str   # e.g. "FedRAMP-Moderate-BenefitsProcessing"


def log_to_audit_trail(record: FederalScanRecord) -> None:
    """
    Write the scan record to the system's authoritative audit log.

    In practice: publish to the CloudWatch Logs group / Splunk index /
    Azure Sentinel workspace that feeds the ConMon dashboard.  The
    scan_id ties back to the Glyphward dashboard for weekly anomaly review
    (RA-5 continuous monitoring evidence).
    """
    audit_logger.info(json.dumps({
        "event":                    "multimodal_pi_scan",
        "scan_id":                  record.scan_id,
        "image_sha256":             record.image_sha256,
        "score":                    record.score,
        "decision":                 record.decision,
        "timestamp":                record.timestamp,
        "classification_context":   record.classification_context,
    }))


def scan_federal_image(
    image_bytes: bytes,
    impact_level: Literal["Moderate", "High"],
    system_identifier: str,
) -> FederalScanRecord:
    """
    Scan an image for multimodal prompt-injection before passing it to a
    vision-language model in a FedRAMP-authorized system.

    Implements SI-10 (input validation), SI-3 (malicious artifact detection),
    and AU-12 (audit record generation) for AI image inputs.

    Fail-closed: any API error or timeout results in a block decision.
    The image is never forwarded to the vision model unless this function
    returns a record with decision == "allow".

    Args:
        image_bytes:       Raw bytes of the image to scan.
        impact_level:      "Moderate" or "High"; sets the block threshold.
        system_identifier: Non-CUI system name for the audit log context
                           (e.g. "BenefitsProcessing-Prod").

    Returns:
        FederalScanRecord with scan_id, image_sha256, score, decision,
        timestamp, and classification_context.

    Raises:
        ValueError if image_bytes is empty.
        KeyError if impact_level is not a key of FEDRAMP_THRESHOLDS.
    """
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")

    image_sha256 = hashlib.sha256(image_bytes).hexdigest()
    classification_context = f"FedRAMP-{impact_level}-{system_identifier}"
    threshold = FEDRAMP_THRESHOLDS[impact_level]

    try:
        response = httpx.post(
            GLYPHWARD_SCAN_URL,
            headers={
                "Authorization": f"Bearer {GLYPHWARD_API_KEY}",
                "Content-Type":  "application/json",
            },
            json={
                "image":  base64.b64encode(image_bytes).decode("ascii"),
                "source": classification_context,
            },
            timeout=5.0,   # fail-closed on timeout; do not stall inference pipeline
        )
        response.raise_for_status()
        payload = response.json()
        score    = int(payload["score"])
        scan_id  = payload["scan_id"]
        # The decision is derived locally from the score so the threshold
        # for this impact level is always the one applied.
        decision: Literal["allow", "block"] = (
            "block" if score >= threshold else "allow"
        )

    except Exception:
        # Fail-closed: network error, API error, malformed response, or
        # timeout -> block.  The failure is recorded as a block so the SIEM
        # alert fires instead of the image silently passing.
        score    = 100
        scan_id  = f"ERR-{uuid.uuid4()}"
        decision = "block"

    record = FederalScanRecord(
        scan_id=scan_id,
        image_sha256=image_sha256,
        score=score,
        decision=decision,
        timestamp=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        classification_context=classification_context,
    )
    log_to_audit_trail(record)
    return record


# ---------------------------------------------------------------------------
# Inference gateway — call this instead of calling the vision model directly
# ---------------------------------------------------------------------------

def process_document_image(
    image_bytes: bytes,
    impact_level: Literal["Moderate", "High"],
    system_id: str,
) -> dict:
    """
    SI-10 compliant entry point for document image processing.
    The vision model is only called when the scan decision is 'allow'.
    """
    record = scan_federal_image(image_bytes, impact_level, system_id)

    if record.decision == "block":
        # Return a structured rejection.  Do not surface score to the submitter
        # (avoids oracle attacks that let an adversary tune the payload).
        return {
            "status":  "rejected",
            "reason":  "Image failed security validation.",
            "scan_id": record.scan_id,   # for audit cross-reference only
        }

    # Safe to forward to vision model.
    return call_vision_model(image_bytes)


def call_vision_model(image_bytes: bytes) -> dict:
    """Placeholder for the actual vision model inference call."""
    raise NotImplementedError("Replace with your GPT-4V / Claude / Gemini call")

The critical invariant is that call_vision_model is never reached when record.decision == "block". The scan sits between the image ingestion handler and the model inference call — not as an async side-channel, not as a post-processing filter, but as a synchronous gate. The scan_id returned in the rejection response lets the operations team locate the full scan record in the audit log without exposing the score, which would let an attacker iterate toward a payload that falls below the threshold. For the same reason, the threshold value should not appear in any user-facing API response or error message.
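
The invariant is easy to regress during refactoring, so it is worth pinning with a test. The sketch below is a simplified stand-in for process_document_image with the scanner and model injected as callables, plus a check that a blocked image never reaches the model. The names gated_inference and check_gate_invariant are hypothetical, and the gate is reduced to its control flow.

```python
from typing import Callable


def gated_inference(
    image_bytes: bytes,
    scan: Callable[[bytes], str],
    model: Callable[[bytes], dict],
) -> dict:
    """Call `model` only when `scan` returns "allow"; otherwise reject."""
    if scan(image_bytes) != "allow":
        return {"status": "rejected"}
    return model(image_bytes)


def check_gate_invariant() -> bool:
    """Return True if a blocked image never reaches the model."""
    calls = []

    def spy_model(image_bytes: bytes) -> dict:
        calls.append(image_bytes)  # record every invocation of the model
        return {"status": "ok"}

    blocked = gated_inference(b"img", lambda _: "block", spy_model)
    model_skipped = calls == []
    allowed = gated_inference(b"img", lambda _: "allow", spy_model)
    return (
        blocked == {"status": "rejected"}
        and model_skipped
        and allowed == {"status": "ok"}
        and calls == [b"img"]
    )
```

Comparing against "allow" instead of "block" means any unexpected scanner value is also rejected, which keeps the gate fail-closed.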

The FedRAMP continuous monitoring package for this control should include: the scan log export (grouped by week), a count of block decisions per system, and any anomalous spikes in score distribution — available from the Glyphward dashboard and exportable as CSV for inclusion in the monthly ConMon report.
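
For teams that prefer to build the per-system counts from their own audit log, the sketch below renders them as CSV text. The function name conmon_csv and the column layout are assumptions for illustration and do not reflect the dashboard's export format; the input is assumed to be a list of dicts carrying the classification_context and decision fields from the audit record.

```python
import csv
import io
from collections import Counter


def conmon_csv(records: list[dict]) -> str:
    """Render per-system allow/block counts as CSV text for a ConMon report."""
    counts: dict[str, Counter] = {}
    for rec in records:
        system = rec["classification_context"]
        counts.setdefault(system, Counter())[rec["decision"]] += 1

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["system", "allow", "block", "block_rate"])
    for system in sorted(counts):
        allow, block = counts[system]["allow"], counts[system]["block"]
        total = allow + block
        rate = block / total if total else 0.0  # guard against unknown decisions
        writer.writerow([system, allow, block, f"{rate:.3f}"])
    return buffer.getvalue()
```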

Coverage matrix

Control	Claimant-submitted ID document	Agency-internal diagram/chart	Contractor-submitted field image	User-uploaded form scan
SI-10 Input Validation	Pre-inference scan required; external origin = untrusted input	Lower risk (internal origin) but still in scope for SI-10 input validation	Pre-inference scan required; contractor = external party	Pre-inference scan required; public-facing upload = untrusted
SI-3 Malicious Code Protection	Adversarial payload in claimant ID image is a malicious artifact at entry point	Reduced risk from internal origin; scan still satisfies SI-3 completeness requirement	Contractor-origin adversarial image is a malicious artifact; SI-3 entry-point scanning applies	Public upload is highest-risk SI-3 entry point; scanning mandatory
RA-5 Vulnerability Scanning	OWASP LLM01 multimodal PI is the documented vulnerability class; scan log is ConMon evidence	Internal images still within RA-5 continuous monitoring scope	External-origin images must be included in RA-5 monitoring scope	Highest-exposure RA-5 surface; anomaly spike detection required
SA-11 Penetration Testing	3PAO must test with adversarial claimant ID images; scan_id is test artifact	Internal diagram surface should be included in SA-11 scope for completeness	3PAO test scope must include contractor-submitted image channel	Primary SA-11 test surface for public-facing AI document systems
AU-12 Audit Record Generation	scan_id + image_sha256 + decision per document = per-transaction AU-12 record	Same audit fields required even for internal origin images	AU-12 record required; contractor identity logged at application layer, not in scan record	AU-12 record required; session token ties scan_id to authenticated user session