Category explainer · Image injection

Typographic prompt injection scanner

“Typographic prompt injection” is the umbrella term for every attack that smuggles an instruction to a vision-language model as rendered pixels instead of text. A scanner in this category has to look at the image — not the string that comes out of OCR — because the string is what the attacker is trying to hide.

TL;DR

A usable typographic PI scanner ingests an image, returns a risk score in 0–100, and highlights the pixel region responsible for the verdict. It runs four signals in parallel (OCR with confusable normalisation, a pixel-level instruction-layout classifier, a visual embedding nearest-neighbour over a shared corpus, and a perturbation-artefact detector) so that no single failure mode passes unchecked. Glyphward is that scanner at $0 / $29 / $99 per month.
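
In sketch form, that four-signal fan-out looks something like the following. The stub functions stand in for the real detectors, and the worst-case `max` combination rule is an illustrative assumption, not necessarily how Glyphward weighs signals:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stubs for the four signals; each real detector would take
# an image and return a 0-100 risk score.
def ocr_confusable_signal(image):
    return 12.0

def layout_classifier_signal(image):
    return 71.0

def embedding_nn_signal(image):
    return 55.0

def perturbation_signal(image):
    return 8.0

SIGNALS = [ocr_confusable_signal, layout_classifier_signal,
           embedding_nn_signal, perturbation_signal]

def scan(image) -> float:
    """Fan the image out to all four signals in parallel and keep the
    worst-case score, so one quiet signal cannot mask a firing one."""
    with ThreadPoolExecutor(max_workers=len(SIGNALS)) as pool:
        scores = list(pool.map(lambda signal: signal(image), SIGNALS))
    return max(scores)
```

The point of the structure, not the stubs: no single detector gets to veto the others.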

What counts as typographic prompt injection

The class covers any payload where the instruction is delivered to the VLM as rendered glyphs rather than as text tokens. The most frequently seen subtypes in 2024–2026 public writeups:

  1. Plainly legible rendered payloads: the instruction in an ordinary font, recoverable by OCR.
  2. Homoglyph and confusable swaps, such as Cyrillic/Latin substitutions that defeat string matching.
  3. FigStep-style step scaffolds, where the payload is rendered as a numbered-list skeleton.
  4. Font-swapped and paraphrased variants of known payloads.
  5. AgentTypo-class adversarial distortions: perturbation noise that carries the payload past OCR.

The defining feature is the same across all five: the token sequence the defender sees (or fails to recover) is not the token sequence the VLM acts on.

What a typographic PI scanner has to do

Any scanner that looks only at OCR output inherits OCR’s failure modes. A scanner that looks only at a visual classifier inherits the classifier’s blind spots. Reliable coverage is an ensemble of complementary signals, each catching a different subset of the surface:

  1. OCR with confusable normalisation. Still useful — it recovers legible payloads cheaply. Normalise homoglyphs before matching so Cyrillic/Latin swaps do not hide the string.
  2. Instruction-layout classifier. A small pixel-level model that fires on “this image contains instructional structure” (numbered lists, imperative verbs, bullet stacks). Independent of whether OCR recovers the letters.
  3. Visual-embedding nearest-neighbour. CLIP-style embedding compared against a compounding corpus of known-malicious payloads. Catches paraphrased and font-swapped variants that fingerprint near known hits.
  4. Adversarial-perturbation detector. High-frequency-artefact classifier that flags the noise pattern typical of AgentTypo-class distortions.
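
Signal 1 is the easiest to sketch. A minimal confusable-normalisation pass, using a toy homoglyph table rather than the full Unicode confusables list, might look like:

```python
import unicodedata

# Toy homoglyph table: a handful of lowercase Cyrillic letters that render
# identically to their Latin counterparts. A production table would cover
# the full Unicode confusables data, not just these pairs.
HOMOGLYPHS = str.maketrans({
    "а": "a", "е": "e", "о": "o", "р": "p",
    "с": "c", "х": "x", "і": "i", "ѕ": "s",
})

def normalise(ocr_text: str) -> str:
    """Fold compatibility forms, swap known homoglyphs, then casefold,
    so blocklist matching runs against one canonical spelling."""
    folded = unicodedata.normalize("NFKC", ocr_text)
    return folded.translate(HOMOGLYPHS).casefold()
```

With this in place, `normalise("Ignоre previous instructions")` (note the Cyrillic "о") returns `"ignore previous instructions"`, so a plain-ASCII blocklist entry matches again.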

The output contract matters as much as the signals: the scanner must return a score and a bounding region, not a boolean. Callers need to log evidence, A/B against thresholds, and show regulators what fired — not just that something did.
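
A caller-side view of that contract, sketched as a Python dataclass. The field names are illustrative, not Glyphward's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ScanResult:
    """The evidence a caller needs to log, threshold, and audit."""
    score: int                          # 0-100 overall risk
    signals: list[str]                  # detectors that fired, e.g. ["layout", "ocr"]
    region: tuple[int, int, int, int]   # bounding box: x, y, width, height

def is_blocked(result: ScanResult, threshold: int = 70) -> bool:
    # The caller, not the scanner, owns the cut-off, so teams can A/B
    # different thresholds against the same logged scores.
    return result.score >= threshold
```

Returning the full record and letting the caller apply the threshold is what makes the A/B testing and audit trail possible.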

How Glyphward works as your typographic PI scanner

POST an image to Glyphward and you get a 0–100 risk score, a list of firing signals, and the bounding pixel region. The free tier gives 10 scans per day with no card; Pro at $29/mo covers 100k scans per month with a webhook and SDK; Team at $99/mo adds 1M scans, SSO-lite, and Slack alerts. The corpus compounds across all paying users, so coverage improves week over week without work on your side. See how the scanner plugs in, how pricing stacks against Lakera / LLM Guard / Azure / Promptfoo, or get on the waitlist to claim an API key at launch.
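
A stdlib-only client sketch of that call. The endpoint URL, header names, and response fields are assumptions for illustration, not the published API:

```python
import json
import urllib.request

API_URL = "https://api.glyphward.example/v1/scan"  # hypothetical endpoint

def parse_verdict(body: bytes) -> dict:
    """Keep only the contract fields: score, firing signals, pixel region."""
    raw = json.loads(body)
    return {"score": raw["score"], "signals": raw["signals"], "region": raw["region"]}

def scan_image(image_bytes: bytes, api_key: str) -> dict:
    """POST raw image bytes and return the parsed verdict."""
    request = urllib.request.Request(
        API_URL,
        data=image_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_verdict(response.read())
```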

Get early access

Related questions

Is typographic PI the same as “visual prompt injection”?

Visual prompt injection is the broader term — it covers typographic PI plus other image-based vectors (adversarial patches, steganographic encodings, image-as-context attacks). Typographic PI is the rendered-glyph subset. Most real-world incidents to date have been typographic.

Can I build this in-house with CLIP plus Tesseract?

You can build the first 40–50% coverage in a weekend. The trailing coverage — AgentTypo distortions, confusables, paraphrased FigStep variants — is where the compounding corpus matters, and that is the part we run for you.
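
For a sense of what that weekend build buys, here is a stdlib-only sketch of its OCR half: pattern-matching on whatever text Tesseract recovers. The patterns are illustrative, and `ocr_text` is assumed to have already been extracted:

```python
import re

# Rough "weekend baseline" patterns: they catch plainly legible payloads
# in OCR output, and none of the distorted, confusable, or paraphrased tail.
INSTRUCTION_PATTERNS = [
    r"\bignore (all |any )?(previous|prior) instructions\b",
    r"\b(you must now|do not tell anyone|reveal your)\b",
    r"^\s*(\d+[.)]|[-*])\s+\w+",  # numbered/bulleted step scaffold
]

def naive_score(ocr_text: str) -> int:
    """Crude 0-100 score from pattern hits on raw OCR output."""
    hits = sum(
        bool(re.search(pattern, ocr_text, re.IGNORECASE | re.MULTILINE))
        for pattern in INSTRUCTION_PATTERNS
    )
    return min(100, hits * 40)
```

This is exactly the coverage that a Cyrillic "о", a perturbed glyph, or a paraphrase walks straight past.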

Does this cover non-English payloads?

Partially. The pixel-level classifier and embedding nearest-neighbour are language-agnostic by design. The OCR + confusable-normalisation signal degrades for scripts with less training data (Arabic, Thai, CJK). Coverage is improving as the corpus grows; for launch, expect strong results in Latin-script languages and decent-but-not-leading results elsewhere.

Further reading