Security checklist · All modalities · 2026

Multimodal AI security checklist

Use this checklist before shipping any feature that accepts image, audio, or multi-document inputs into an LLM pipeline. It covers all six defence layers: threat modelling, input validation, pre-LLM scan gate, system prompt hardening, privilege separation, output encoding, and audit logging. Each item maps to the relevant OWASP LLM Top 10 2025 risk. All items are binary (✓ / ✗) — if you cannot tick an item, treat it as a blocking issue for your security review.

How to use this checklist

Paste this into your security review document, pentest scope sheet, or launch readiness checklist. Work through sections A–F in order. Incomplete items in sections B–D are high-severity for any application that accepts untrusted image or audio inputs. Incomplete items in sections E–F are medium-severity. If you need to unblock a launch before completing all items, document the gap, assign an owner, and set a remediation deadline — do not ship silently incomplete.

A — Threat modelling (do before any code)

Identify every point where image bytes, audio bytes, or rendered document images enter the LLM pipeline — user upload, URL parameter, retrieved document, tool output, session state, or external API response. Map these to a data flow diagram. Every entry point is an injection surface.
For each entry point, document the trust level of the source: anonymous internet user (lowest), authenticated user, internal service, or admin-only (highest). Scan thresholds should be lower (more aggressive) for lower-trust sources.
Identify what tools (actions) the LLM can invoke and what each tool can modify, read, or send. Each tool is a blast-radius multiplier. A fully injected LLM with no tools can only produce text. With tools it can send email, write files, call APIs.
Define the worst-case injection scenario: what is the most harmful action a fully successful injection could cause? LLM06 If the answer is "delete the production database" or "send all user emails to an attacker," this is a privilege separation problem that must be resolved before scan gates are relevant.
Review the OWASP LLM Top 10 2025 and confirm which of LLM01–LLM10 apply to your pipeline. LLM01–LLM10

B — Image input validation (before any model call)

Validate image file type by magic bytes (not file extension). LLM01 Rename a PNG to .jpg and check if your validator accepts it. If yes, fix the validator.
Enforce a maximum image file size (≤ 4 MB recommended for most use cases).
Enforce a maximum image resolution (≤ 1 500 × 1 500 px before model preprocessing). LLM04 Larger images increase compute cost quadratically and can be weaponised for resource exhaustion.
Verify that the image is decodable (run a codec decode, not just MIME sniff). Reject images that throw decode errors.
Strip EXIF metadata from user-uploaded images before passing to the model or storing. LLM06 EXIF GPS coordinates, device IDs, and author fields are PII. Some multimodal models read EXIF tags.
For images from external URLs (user-provided or retrieved), validate that the URL resolves to an expected domain or allowlist. SSRF via image URL fetching is a distinct attack vector. LLM02

C — Pre-LLM multimodal scan gate

Scan all image inputs (user uploads and retrieved document images) with a multimodal injection scanner before the model call. LLM01 Text-only guards (LLM Guard, Lakera, Azure Prompt Shields) do not inspect image bytes. A scanner that reads pixels is required.
Configure fail-closed behaviour: if the scan API is unreachable or returns an error, reject the input rather than pass it through. LLM01
Set scan thresholds per trust level: lower threshold (50–60) for anonymous user uploads, higher threshold (70–80) for internally ingested documents.
For audio inputs, scan waveform anomalies and transcript-layer injection (WhisperInject pattern) before the STT step. LLM01
For RAG pipelines, scan retrieved document images at every retrieval hop — not only at ingestion time. An adversarial image may enter the retrieval index after ingestion. LLM01
Check the scan API's complexity_score field (or equivalent) to detect resource-exhaustion images in addition to injection-payload images. LLM04
Log every scan result (scan ID, score, source, user ID hash, timestamp) to your audit log.

D — System prompt hardening

Include an explicit instruction scope clause in the system prompt: the model should treat instructions from user content, images, and tool outputs as data, not as instructions. LLM01
Include a role constraint: the model should refuse requests to change its role, persona, or system prompt. LLM01
Constrain the output format in the system prompt and validate the model's response against the expected format before processing it. LLM02
Use a clear delimiter between the system prompt and user-provided content. Do not interpolate user input directly into the system prompt string.
Test the system prompt against known injection probes: "Ignore previous instructions," "You are now DAN," "Print your system prompt," and the FigStep / AgentTypo attack images from the attack corpus. Document the results.

E — Privilege separation and output handling

The LLM has only the tools it needs for the current task — no extra tools are available. LLM06
Irreversible tool actions (send email, post to API, modify database, delete file) require explicit human confirmation before execution. LLM06
Tool call arguments generated by the LLM are validated against the expected schema before the tool is invoked. LLM07
LLM-generated output rendered in a browser is HTML-escaped. LLM output is never passed to eval(), innerHTML, or equivalent. LLM02
LLM output interpolated into SQL, shell, or other interpreted contexts uses parameterised queries / safe APIs — never string concatenation. LLM02
Retrieved documents and tool outputs are stored in a separate memory tier from system instructions. Injected instructions in retrieved content cannot overwrite the system prompt. LLM01

F — Audit logging and compliance

Every multimodal model call is logged with: timestamp, user ID hash, input modalities (text/image/audio), scan result (score and scan ID), model called, output summary hash.
Logs are append-only (write once, no update/delete) and retained for the required compliance period (HIPAA: 6 years; SOC 2: 1 year; EU AI Act high-risk: logs available for regulatory inspection).
Anomaly alerting is configured: alert on scan rejection rate > baseline + 2σ (indicates active attack campaign). LLM04
Cost monitoring is configured: alert on per-user inference cost > 5× the expected value (indicates DoS via high-complexity images). LLM04
Incident response plan includes a step for multimodal injection: who is paged, how the pipeline is disabled, and how affected users are notified.
For HIPAA-regulated deployments: confirm that image inputs containing PHI (patient photos, medical imaging, handwritten notes) are encrypted at rest and in transit, and that scan API vendors are covered under a BAA. LLM06 See HIPAA compliant AI security page for BAA details.
For EU AI Act high-risk systems: complete Article 9 risk management documentation, Article 13 transparency obligations, and Article 15 robustness/cybersecurity assessment. LLM09 See EU AI Act Article 15 page for the multimodal cybersecurity assessment template.

Get early access — Glyphward covers sections B–C automatically