Category explainer · Image injection

Typographic prompt injection scanner

“Typographic prompt injection” is the umbrella term for every attack that smuggles an instruction to a vision-language model as rendered pixels instead of text. A scanner in this category has to look at the image — not the string that comes out of OCR — because the string is what the attacker is trying to hide.

TL;DR

A usable typographic PI scanner ingests an image, returns a risk score in 0–100, and highlights the pixel region responsible for the verdict. It runs four signals in parallel (OCR with confusable normalisation, a pixel-level instruction-layout classifier, a visual embedding nearest-neighbour over a shared corpus, and a perturbation-artefact detector) so that no single failure mode passes unchecked. Glyphward is that scanner at $0 / $29 / $99 per month.

What counts as typographic prompt injection

The class covers any payload where the instruction is delivered to the VLM as rendered glyphs rather than as text tokens. The most frequently seen subtypes in 2024–2026 public writeups:

FigStep-style rendered instructions — numbered lists, polite fill-in-the-blanks, sometimes plus an otherwise-innocent text prompt. Target: policy refusals.
AgentTypo-style adversarial glyphs — per-letter perturbation or anti-OCR fonts designed to defeat the OCR-first defence pattern specifically.
Confusable substitution — Unicode lookalikes (Cyrillic, Greek, mathematical alphanumeric symbols) rendered into an image to route around blocklists that match on exact code-points.
Screenshot-as-payload — a rendered screenshot of a webpage, DM thread, or IDE that itself contains a “when you read this, do X” instruction. Common against screenshot-reading agents.
Mixed-media composites — a benign chart or product photo with a small rendered-text overlay in a corner. Easy to miss visually; trivially visible to a VLM.

The defining feature is the same across all five: the token sequence the defender sees (or fails to recover) is not the token sequence the VLM acts on.

What a typographic PI scanner has to do

Any scanner that looks only at OCR output inherits OCR’s failure modes. A scanner that looks only at a visual classifier inherits the classifier’s blind spots. Reliable coverage is an ensemble of complementary signals, each catching a different subset of the surface:

OCR with confusable normalisation. Still useful — it recovers legible payloads cheaply. Normalise homoglyphs before matching so Cyrillic/Latin swaps do not hide the string.
Instruction-layout classifier. A small pixel-level model that fires on “this image contains instructional structure” (numbered lists, imperative verbs, bullet stacks). Independent of whether OCR recovers the letters.
Visual-embedding nearest-neighbour. CLIP-style embedding compared against a compounding corpus of known-malicious payloads. Catches paraphrased and font-swapped variants that fingerprint near known hits.
Adversarial-perturbation detector. High-frequency-artefact classifier that flags the noise pattern typical of AgentTypo-class distortions.

The output contract matters as much as the signals: the scanner must return a score and a bounding region, not a boolean. Callers need to log evidence, A/B against thresholds, and show regulators what fired — not just that something did.

How Glyphward works as your typographic PI scanner

POST an image to Glyphward and you get a 0–100 risk score, a list of firing signals, and the bounding pixel region. The free tier gives 10 scans per day with no card; Pro at $29/mo covers 100k scans per month with a webhook and SDK; Team at $99/mo adds 1M scans, SSO-lite, and Slack alerts. The corpus compounds across all paying users, so coverage improves week over week without work on your side. See how the scanner plugs in, how pricing stacks against Lakera / LLM Guard / Azure / Promptfoo, or get on the waitlist to claim an API key at launch.

Get early access

Typographic prompt injection scanner

TL;DR

What counts as typographic prompt injection

What a typographic PI scanner has to do

How Glyphward works as your typographic PI scanner

Related questions

Further reading