Category explainer · Image injection
Typographic prompt injection scanner
“Typographic prompt injection” is the umbrella term for every attack that smuggles an instruction to a vision-language model as rendered pixels instead of text. A scanner in this category has to look at the image — not the string that comes out of OCR — because the string is what the attacker is trying to hide.
TL;DR
A usable typographic PI scanner ingests an image, returns a risk score in 0–100, and highlights the pixel region responsible for the verdict. It runs four signals in parallel (OCR with confusable normalisation, a pixel-level instruction-layout classifier, a visual embedding nearest-neighbour over a shared corpus, and a perturbation-artefact detector) so that no single failure mode passes unchecked. Glyphward is that scanner at $0 / $29 / $99 per month.
What counts as typographic prompt injection
The class covers any payload where the instruction is delivered to the VLM as rendered glyphs rather than as text tokens. The most frequently seen subtypes in 2024–2026 public writeups:
- FigStep-style rendered instructions — numbered lists, polite fill-in-the-blanks, sometimes plus an otherwise-innocent text prompt. Target: policy refusals.
- AgentTypo-style adversarial glyphs — per-letter perturbation or anti-OCR fonts designed to defeat the OCR-first defence pattern specifically.
- Confusable substitution — Unicode lookalikes (Cyrillic, Greek, mathematical alphanumeric symbols) rendered into an image to route around blocklists that match on exact code-points.
- Screenshot-as-payload — a rendered screenshot of a webpage, DM thread, or IDE that itself contains a “when you read this, do X” instruction. Common against screenshot-reading agents.
- Mixed-media composites — a benign chart or product photo with a small rendered-text overlay in a corner. Easy to miss visually; trivially visible to a VLM.
The defining feature is the same across all five: the token sequence the defender sees (or fails to recover) is not the token sequence the VLM acts on.
What a typographic PI scanner has to do
Any scanner that looks only at OCR output inherits OCR’s failure modes. A scanner that looks only at a visual classifier inherits the classifier’s blind spots. Reliable coverage is an ensemble of complementary signals, each catching a different subset of the surface:
- OCR with confusable normalisation. Still useful — it recovers legible payloads cheaply. Normalise homoglyphs before matching so Cyrillic/Latin swaps do not hide the string.
- Instruction-layout classifier. A small pixel-level model that fires on “this image contains instructional structure” (numbered lists, imperative verbs, bullet stacks). Independent of whether OCR recovers the letters.
- Visual-embedding nearest-neighbour. CLIP-style embedding compared against a compounding corpus of known-malicious payloads. Catches paraphrased and font-swapped variants that fingerprint near known hits.
- Adversarial-perturbation detector. High-frequency-artefact classifier that flags the noise pattern typical of AgentTypo-class distortions.
The output contract matters as much as the signals: the scanner must return a score and a bounding region, not a boolean. Callers need to log evidence, A/B against thresholds, and show regulators what fired — not just that something did.
How Glyphward works as your typographic PI scanner
POST an image to Glyphward and you get a 0–100 risk score, a list of firing signals, and the bounding pixel region. The free tier gives 10 scans per day with no card; Pro at $29/mo covers 100k scans per month with a webhook and SDK; Team at $99/mo adds 1M scans, SSO-lite, and Slack alerts. The corpus compounds across all paying users, so coverage improves week over week without work on your side. See how the scanner plugs in, how pricing stacks against Lakera / LLM Guard / Azure / Promptfoo, or get on the waitlist to claim an API key at launch.
Related questions
Is typographic PI the same as “visual prompt injection”?
Visual prompt injection is the broader term — it covers typographic PI plus other image-based vectors (adversarial patches, steganographic encodings, image-as-context attacks). Typographic PI is the rendered-glyph subset. Most real-world incidents to date have been typographic.
Can I build this in-house with CLIP plus Tesseract?
You can build the first 40–50% coverage in a weekend. The trailing coverage — AgentTypo distortions, confusables, paraphrased FigStep variants — is where the compounding corpus matters, and that is the part we run for you.
Does this cover non-English payloads?
Partially. The pixel-level classifier and embedding nearest-neighbour are language-agnostic by design. The OCR + confusable-normalisation signal degrades for scripts with less training data (Arabic, Thai, CJK). Coverage is improving as the corpus grows; for launch, expect strong results in Latin-script languages and decent-but-not-leading results elsewhere.
Further reading
- FigStep detection — the best-known subtype of this class.
- AgentTypo detector — the adversarial-glyph sibling of FigStep.
- Indirect prompt injection via images — where typographic PI sits in the broader research thread.
- Audio prompt-injection detection — the waveform-side equivalent for voice agents.
- Lakera alternative (multimodal) — why the text-first market leader does not cover this class.