Compare · Promptfoo

Promptfoo + multimodal scanning: Glyphward as the image and audio detector in your Promptfoo pipeline

Promptfoo is the open-source eval-and-red-team harness most LLM-app teams reach for first. It is excellent at running text adversarial suites against your models and asserting on outputs. It is not a real-time inference-path scanner, and it is not pixel-aware. If your eval suite needs to cover image and audio prompt-injection — and your runtime needs to block those payloads at request time — Glyphward fits inside Promptfoo, not against it.

TL;DR

Promptfoo is eval-time. Glyphward is inference-time. They are different layers of the stack, not different products in the same category. Use Promptfoo to red-team your model with adversarial multimodal payloads (FigStep, AgentTypo, WhisperInject corpora) and assert on whether Glyphward catches them. Use Glyphward in production to actually catch them at request time.

Two different jobs

The category confusion is worth dispelling up front, because it shows up in every comparison shopper's notebook:

Promptfoo is a test harness. You write a YAML config that describes a set of test cases (prompts, expected behaviours, providers, assertions), point it at one or more LLMs, and it runs the matrix and gives you a report. Their redteam command bundles known adversarial payloads to flush jailbreaks. The model under test is the system you are evaluating; Promptfoo itself does not sit inline with your production traffic.
Glyphward is an inline scanner. You call POST /v1/scan on every image and audio upload before it reaches your VLM or STT layer; you get back a 0–100 score plus modality-tagged reasons; you decide whether to pass, log, or block. Glyphward never sees your model — it sees the bytes the user sent.

You absolutely want both. Promptfoo tells you, at CI time, that FigStep payload variant 17 produces a successful jailbreak in your VLM. Glyphward tells you, at request time, that this specific upload from this specific user looks like a FigStep variant; here is the score, here is the flagged region. Different signals, different consumers, different latency budgets.

The integration recipe

The cleanest pattern is to write Promptfoo assertions that check whether Glyphward correctly flags a known-malicious image, then run that suite continuously.

Step 1 — corpus. Maintain a folder of labelled adversarial samples. Public starting points include the FigStep paper's released payloads, the AgentTypo supplementary material, and any WhisperInject samples you have access to. Glyphward customers get a curated subset on request.

Step 2 — Promptfoo provider. Define an HTTP provider in your promptfooconfig.yaml that posts each sample to the Glyphward scan endpoint, and surfaces the returned score:

providers:
  - id: glyphward
    config:
      url: https://glyphward.com/v1/scan
      method: POST
      headers:
        Authorization: Bearer ${GLYPHWARD_API_KEY}
      body:
        image_b64: "{{image_b64}}"
      transformResponse: |
        return { output: String(json.score), score: json.score };

tests:
  - description: FigStep variant 17 should score >= 70
    vars:
      image_b64: file://corpus/figstep_17.png.b64
    assert:
      - type: javascript
        value: parseFloat(output) >= 70
  - description: Clean stock photo should score < 30
    vars:
      image_b64: file://corpus/clean_001.jpg.b64
    assert:
      - type: javascript
        value: parseFloat(output) < 30

Step 3 — run on every PR. Add the suite to CI. If recall on the malicious set drops or false-positive rate on the clean set climbs, the build fails before it ships. Glyphward's free tier is enough to run a small CI suite; the Pro tier covers a larger one.

Architectural difference

	Promptfoo	Glyphward
Layer	Eval / red-team / CI	Inline scanner / inference path
Triggered by	Developer at CI time	End-user upload at request time
Subject under test	Your LLM (and any guardrails on it)	The uploaded image or audio bytes
Output	Pass/fail matrix per test case	Per-request risk score + flagged regions
Latency budget	Minutes per suite is fine	Sub-200ms p95 required
Pricing	OSS + Promptfoo Cloud (paid)	Free 10/day · $29/mo Pro · $99/mo Team
Multimodal coverage	Provider-dependent; harness handles bytes if your assertions do	Image + audio first-class

The architectural lesson lands in one line: eval-time and inference-time are not substitutes. Removing the eval suite does not protect your users; removing the inline scanner means every malicious upload reaches the model. Removing both leaves you with nothing in either layer, which is roughly where most multimodal apps sit today.

What this looks like in production

A typical Glyphward + Promptfoo deployment, end-to-end:

CI — Promptfoo runs the multimodal suite on every PR. Build fails if Glyphward's recall on the corpus regresses past your threshold.
Pre-deploy — Promptfoo's redteam generates fresh adversarial variants and probes your full stack (Glyphward + your VLM) end-to-end. New variants that bypass go into the corpus.
Production — every image and audio upload calls Glyphward inline. Score + reasons + region land in your request log; flagged regions feed back into the corpus.
Weekly review — last 7 days of flagged-but-passed samples (the grey-band middle scores) get reviewed manually and labelled. New labels feed the next CI run.

This is the loop. Promptfoo enforces the contract; Glyphward provides the scanner that the contract is enforced against; the corpus updates from production reality. None of the three steps replaces another.

When to pick which

Promptfoo only works if your application is text-only, you red-team in CI, and you do not need an inline scanner because no untrusted bytes ever reach your model.
Glyphward only works if you want production protection and are not yet ready to invest in a CI eval suite. Most teams add Promptfoo on top within a quarter once the inline scanner is paying off.
Both is the default for any multimodal app with users. Eval and inference are independent insurance — neither covers the other's failure mode.

What the integration costs

Adding Glyphward to an existing Promptfoo setup is a YAML provider definition and an API key. There is no Glyphward SDK to install in CI — Promptfoo's HTTP provider already speaks REST. The reverse direction is just as cheap: an existing Glyphward-protected app can adopt Promptfoo by writing one config file. The reason this combination works is that neither product reaches into the other's surface area.

Get early access · See full market comparison