Security checklist · All modalities · 2026
Multimodal AI security checklist
Use this checklist before shipping any feature that accepts image, audio, or multi-document inputs into an LLM pipeline. It covers all six defence layers: threat modelling, input validation, pre-LLM scan gate, system prompt hardening, privilege separation, output encoding, and audit logging. Each item maps to the relevant OWASP LLM Top 10 2025 risk. All items are binary (✓ / ✗) — if you cannot tick an item, treat it as a blocking issue for your security review.
How to use this checklist
Paste this into your security review document, pentest scope sheet, or launch readiness checklist. Work through sections A–F in order. Incomplete items in sections B–D are high-severity for any application that accepts untrusted image or audio inputs. Incomplete items in sections E–F are medium-severity. If you need to unblock a launch before completing all items, document the gap, assign an owner, and set a remediation deadline — do not ship silently incomplete.
A — Threat modelling (do before any code)
- Identify every point where image bytes, audio bytes, or rendered document images enter the LLM pipeline — user upload, URL parameter, retrieved document, tool output, session state, or external API response. Map these to a data flow diagram. Every entry point is an injection surface.
- For each entry point, document the trust level of the source: anonymous internet user (lowest), authenticated user, internal service, or admin-only (highest). Scan thresholds should be lower (more aggressive) for lower-trust sources.
- Identify what tools (actions) the LLM can invoke and what each tool can modify, read, or send. Each tool is a blast-radius multiplier. A fully injected LLM with no tools can only produce text. With tools it can send email, write files, call APIs.
- Define the worst-case injection scenario: what is the most harmful action a fully successful injection could cause? LLM06 If the answer is "delete the production database" or "send all user emails to an attacker," this is a privilege separation problem that must be resolved before scan gates are relevant.
- Review the OWASP LLM Top 10 2025 and confirm which of LLM01–LLM10 apply to your pipeline. LLM01–LLM10
B — Image input validation (before any model call)
- Validate image file type by magic bytes (not file extension). LLM01 Rename a PNG to .jpg and check if your validator accepts it. If yes, fix the validator.
- Enforce a maximum image file size (≤ 4 MB recommended for most use cases).
- Enforce a maximum image resolution (≤ 1 500 × 1 500 px before model preprocessing). LLM04 Larger images increase compute cost quadratically and can be weaponised for resource exhaustion.
- Verify that the image is decodable (run a codec decode, not just MIME sniff). Reject images that throw decode errors.
- Strip EXIF metadata from user-uploaded images before passing to the model or storing. LLM06 EXIF GPS coordinates, device IDs, and author fields are PII. Some multimodal models read EXIF tags.
- For images from external URLs (user-provided or retrieved), validate that the URL resolves to an expected domain or allowlist. SSRF via image URL fetching is a distinct attack vector. LLM02
C — Pre-LLM multimodal scan gate
- Scan all image inputs (user uploads and retrieved document images) with a multimodal injection scanner before the model call. LLM01 Text-only guards (LLM Guard, Lakera, Azure Prompt Shields) do not inspect image bytes. A scanner that reads pixels is required.
- Configure fail-closed behaviour: if the scan API is unreachable or returns an error, reject the input rather than pass it through. LLM01
- Set scan thresholds per trust level: lower threshold (50–60) for anonymous user uploads, higher threshold (70–80) for internally ingested documents.
- For audio inputs, scan waveform anomalies and transcript-layer injection (WhisperInject pattern) before the STT step. LLM01
- For RAG pipelines, scan retrieved document images at every retrieval hop — not only at ingestion time. An adversarial image may enter the retrieval index after ingestion. LLM01
-
Check the scan API's
complexity_scorefield (or equivalent) to detect resource-exhaustion images in addition to injection-payload images. LLM04 - Log every scan result (scan ID, score, source, user ID hash, timestamp) to your audit log.
D — System prompt hardening
- Include an explicit instruction scope clause in the system prompt: the model should treat instructions from user content, images, and tool outputs as data, not as instructions. LLM01
- Include a role constraint: the model should refuse requests to change its role, persona, or system prompt. LLM01
- Constrain the output format in the system prompt and validate the model's response against the expected format before processing it. LLM02
- Use a clear delimiter between the system prompt and user-provided content. Do not interpolate user input directly into the system prompt string.
- Test the system prompt against known injection probes: "Ignore previous instructions," "You are now DAN," "Print your system prompt," and the FigStep / AgentTypo attack images from the attack corpus. Document the results.
E — Privilege separation and output handling
- The LLM has only the tools it needs for the current task — no extra tools are available. LLM06
- Irreversible tool actions (send email, post to API, modify database, delete file) require explicit human confirmation before execution. LLM06
- Tool call arguments generated by the LLM are validated against the expected schema before the tool is invoked. LLM07
-
LLM-generated output rendered in a browser is HTML-escaped. LLM output is never passed to
eval(),innerHTML, or equivalent. LLM02 - LLM output interpolated into SQL, shell, or other interpreted contexts uses parameterised queries / safe APIs — never string concatenation. LLM02
- Retrieved documents and tool outputs are stored in a separate memory tier from system instructions. Injected instructions in retrieved content cannot overwrite the system prompt. LLM01
F — Audit logging and compliance
- Every multimodal model call is logged with: timestamp, user ID hash, input modalities (text/image/audio), scan result (score and scan ID), model called, output summary hash.
- Logs are append-only (write once, no update/delete) and retained for the required compliance period (HIPAA: 6 years; SOC 2: 1 year; EU AI Act high-risk: logs available for regulatory inspection).
- Anomaly alerting is configured: alert on scan rejection rate > baseline + 2σ (indicates active attack campaign). LLM04
- Cost monitoring is configured: alert on per-user inference cost > 5× the expected value (indicates DoS via high-complexity images). LLM04
- Incident response plan includes a step for multimodal injection: who is paged, how the pipeline is disabled, and how affected users are notified.
- For HIPAA-regulated deployments: confirm that image inputs containing PHI (patient photos, medical imaging, handwritten notes) are encrypted at rest and in transit, and that scan API vendors are covered under a BAA. LLM06 See HIPAA compliant AI security page for BAA details.
- For EU AI Act high-risk systems: complete Article 9 risk management documentation, Article 13 transparency obligations, and Article 15 robustness/cybersecurity assessment. LLM09 See EU AI Act Article 15 page for the multimodal cybersecurity assessment template.
Get early access — Glyphward covers sections B–C automatically
Related questions
Which items are blocking for a security review?
All items in sections B and C are blocking for any application that accepts image or audio inputs from untrusted sources (i.e. any user-facing application). Missing a pre-LLM scan gate (C.1) or configuring fail-open scan behaviour (C.2) are the two most common critical gaps in production deployments. Items in sections D–F are high or medium severity depending on the trust level of inputs and the sensitivity of the tools available to the model.
Can I use this checklist for an SOC 2 Type II AI security control?
Yes. SOC 2 does not prescribe specific controls for LLM applications; the auditor will assess whether you have identified relevant risks (threat modelling, section A) and implemented reasonable controls (sections B–F). A completed checklist with evidence that each item is satisfied (code reference or test result) is the right artefact to present to an SOC 2 auditor under CC6 (logical access and security) and CC9 (risk mitigation). The audit log items in section F directly satisfy the availability and confidentiality trust service criteria. See the SOC 2 AI security controls page for the full mapping.
How often should this checklist be reviewed?
At minimum: before every feature launch that adds a new image, audio, or multimodal input path; before every third-party integration that returns data used as LLM context; and after every security incident. Quarterly re-review is reasonable for stable production applications. The threat landscape changes quickly — the FigStep attack was published in 2023, AgentTypo in 2024, WhisperInject variants emerged through 2025 — so checklist items from last year may need to be updated to reflect new attack patterns.
Further reading
- Prompt injection prevention best practices — deep dive on each of the six defence layers.
- OWASP LLM01:2025 — multimodal prompt injection
- OWASP LLM04:2025 — Model DoS via adversarial images
- Real-time vs batch scanning — architecture guide for section C scan gate.
- Multimodal LLM security API — Glyphward API overview.