Glyphward vs Promptfoo

Promptfoo is an open-source test harness and red-team eval framework — you run it at CI time to check what your model does when adversaries push on it. Glyphward is an inference-time scanner — it sits inline with production traffic and scores image and audio bytes before they reach your VLM or STT. Different layers, different latency budgets, different consumers. They are not substitutes; the strongest stacks run both.

TL;DR

Use Promptfoo to evaluate your defences against adversarial test cases on every PR. Use Glyphward as one of the defences Promptfoo evaluates and the runtime scanner that catches whatever slips through CI. Removing Promptfoo means you stop measuring your guardrail; removing Glyphward means malicious uploads reach the model unscored. Removing both is roughly where most multimodal apps sit today.

What each product actually is

Promptfoo is an MIT-licensed CLI and library (npx promptfoo) that runs a configurable matrix of test prompts against one or more model providers and reports pass/fail per assertion. Its redteam command bundles canonical jailbreak and prompt-injection payloads (DAN families, indirect-PI, role-play attacks, and more) so you can flush out regressions before shipping. Promptfoo Cloud adds dashboards and managed history on top of the OSS core. The tool's job is to tell you whether your model still misbehaves.

Glyphward is a managed HTTPS API. You POST an image or audio file to /v1/scan and receive a 0–100 risk score, modality-tagged reasons, and bounding-box coordinates on flagged pixels or waveform windows. We run the detector models — FigStep and AgentTypo-trained text-in-image heads, waveform-anomaly plus Whisper-small transcript ensemble for audio — and we keep them current as new attack vectors land. The tool's job is to score the bytes a real user just sent.
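As a rough illustration of the request and response shape described above, here is a minimal Python sketch. Only the /v1/scan path, the 0–100 score, the reasons, and the flagged regions come from the description; the base URL, the raw-body upload, the Bearer header, and the JSON field names (score, reasons, regions) are assumptions made for the example, not a documented schema.

```python
import json
import urllib.request
from dataclasses import dataclass, field

# Hypothetical base URL; only the /v1/scan path is named in the text above.
BASE_URL = "https://api.glyphward.example"


@dataclass
class ScanResult:
    score: int                                    # 0-100 risk score
    reasons: list = field(default_factory=list)   # modality-tagged reasons
    regions: list = field(default_factory=list)   # flagged boxes or waveform windows


def parse_scan_response(body: bytes) -> ScanResult:
    """Parse a scan response. The field names are assumed, not documented."""
    data = json.loads(body)
    score = int(data["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return ScanResult(score, data.get("reasons", []), data.get("regions", []))


def build_scan_request(file_bytes: bytes, content_type: str, api_key: str) -> urllib.request.Request:
    """Build the POST. Raw-body upload and Bearer auth are assumptions."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/scan",
        data=file_bytes,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {api_key}"},
    )


def scan(file_bytes: bytes, content_type: str, api_key: str, timeout: float = 2.0) -> ScanResult:
    """Send the upload and return the parsed result."""
    request = build_scan_request(file_bytes, content_type, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_scan_response(response.read())
```

Keeping the parser separate from the network call means the score handling can be unit-tested offline, whatever the real upload format turns out to be.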

Honest feature table

| | Promptfoo | Glyphward |
| Layer in the stack | Eval / red-team / CI | Inline scanner / inference path |
| Triggered by | Developer at CI time | End-user upload at request time |
| Subject under test | Your LLM and any guardrails on it | The uploaded image or audio bytes |
| Output | Pass/fail matrix per test case | Per-request risk score + flagged regions |
| Latency budget | Minutes per suite is fine | Sub-200 ms p95 |
| Multimodal coverage | Provider-dependent; harness handles bytes if your assertions do | Image + audio first-class |
| Licence | MIT (OSS) + Promptfoo Cloud (paid) | Commercial managed service |
| Hosting | Self-host the OSS, or Cloud | Managed (our infra) |
| Pricing | OSS free · Cloud per quote | Free 10/day · $29/mo Pro · $99/mo Team |
| Owns the corpus | You curate yours | We curate, signatures shared across customers |

Where Glyphward wins

Where Promptfoo wins

When to pick which

Pick Promptfoo if you need pre-deploy assurance that your LLM and its guardrails behave under adversarial input — and especially if you do not yet have a red-team suite at all. Promptfoo is the lowest-friction path from "no eval" to "passing eval on every PR".

Pick Glyphward if you accept user-uploaded images or audio in production and want a managed scanner you don't have to operate. Most multimodal apps we see — avatar SaaS, voice agents, screenshot-reading agents — do not want to host another inference stack just to score uploads.

Running both is the default recommendation for any team taking multimodal PI seriously: Promptfoo at CI time, checking whether Glyphward and your other defences still catch the latest payload classes; Glyphward at request time, scoring real user bytes before they reach your model. Two layers, two latency budgets, two consumers.

Integration sketch (running both)

The cleanest pattern is to register Glyphward as an HTTP provider inside your promptfooconfig.yaml and write assertions that confirm Glyphward scores known-malicious images at ≥ 70 and clean stock photos at < 30. Run the suite in CI on every PR. When Glyphward's recall on the malicious set drops or the false-positive rate on the clean set climbs, the build fails before the change ships. Promptfoo's free OSS core is enough; Glyphward's free tier (10 scans/day) covers a small CI suite, and the Pro tier ($29/mo, 100k scans) covers a much larger one.
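The gating logic itself is simple enough to sketch in Python, independent of how the harness is wired. The 70 and 30 cut-offs come from the paragraph above; the minimum recall and maximum false-positive rate used to fail the build are hypothetical values chosen for illustration, and you would tune them to your own corpus.

```python
MALICIOUS_MIN = 70   # known-malicious samples should score at or above this
CLEAN_MAX = 30       # clean samples should score below this


def _rate(hits: int, total: int) -> float:
    """Fraction of hits, refusing to divide by an empty corpus."""
    if total == 0:
        raise ValueError("corpus is empty")
    return hits / total


def recall(malicious_scores) -> float:
    """Share of known-malicious samples that were flagged."""
    scores = list(malicious_scores)
    return _rate(sum(s >= MALICIOUS_MIN for s in scores), len(scores))


def false_positive_rate(clean_scores) -> float:
    """Share of clean samples that scored too high."""
    scores = list(clean_scores)
    return _rate(sum(s >= CLEAN_MAX for s in scores), len(scores))


def gate(malicious_scores, clean_scores, min_recall=0.95, max_fpr=0.05) -> bool:
    """True if the build should pass. Both limits are assumed defaults."""
    return (
        recall(malicious_scores) >= min_recall
        and false_positive_rate(clean_scores) <= max_fpr
    )
```

In CI, a failing gate would map to a non-zero exit code, for example `sys.exit(0 if gate(mal, clean) else 1)`.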

A worked YAML example with the exact provider config, asserts, and how the corpus is laid out lives in our Promptfoo + multimodal scanning recipe.

FAQ

Does Promptfoo not include any image or audio detectors of its own?

Promptfoo orchestrates providers and asserts on outputs. The detectors that classify whether an image is a FigStep payload or whether audio carries a WhisperInject pattern are not part of the harness — that is what a scanner like Glyphward does, and that is what your assertions point at. If Promptfoo ships native multimodal detectors in the future we will update this page; their roadmap is theirs to announce.

Can I just call Glyphward from a Python test in pytest and skip Promptfoo?

Yes, and many teams do for a small suite. You give up the broader benefits Promptfoo brings: the provider matrix, dashboards, prompt regression tracking, and the bundled red-team payload library. For a suite of a few dozen samples on one provider, hand-rolled tests are fine; once you have multiple providers or want regression history, Promptfoo's harness pays for itself.
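A hand-rolled version might look like the sketch below. It assumes a scanner callable that takes file bytes and returns the 0–100 score, and it reuses the 70 and 30 cut-offs from the integration sketch. The stub scanner and the sample names are hypothetical and exist only so the test runs offline; in practice you would swap in a real call to the API.

```python
from typing import Callable, Iterable, Tuple

Scanner = Callable[[bytes], int]    # hypothetical: file bytes in, 0-100 score out
Sample = Tuple[str, bytes, bool]    # (name, file bytes, is_malicious)


def failing_samples(scan: Scanner, samples: Iterable[Sample]) -> list:
    """Return (name, score) for every sample on the wrong side of its cut-off."""
    failures = []
    for name, blob, is_malicious in samples:
        score = scan(blob)
        ok = score >= 70 if is_malicious else score < 30
        if not ok:
            failures.append((name, score))
    return failures


def test_corpus_is_classified_correctly():
    # Stub scanner so the sketch runs without network access.
    def stub_scan(blob: bytes) -> int:
        return 90 if blob.startswith(b"ATTACK") else 5

    samples = [
        ("typographic-attack.png", b"ATTACK-bytes", True),
        ("stock-photo.jpg", b"clean-bytes", False),
    ]
    assert failing_samples(stub_scan, samples) == []
```

pytest collects the test_ function automatically; returning the failing names and scores rather than a bare boolean keeps the failure message useful.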

Is there overlap between Promptfoo's red-team payloads and Glyphward's training corpus?

Some — both draw on the public attack literature (FigStep, AgentTypo, WhisperInject, indirect-PI). The difference is what each of us does with those payloads. Promptfoo uses them as inputs to drive its eval matrix. We use them as labelled training and evaluation data for the detector models, and we add real flagged samples customers see in production. Different role for the same source material.

What about latency?

Promptfoo's latency budget is "however long the suite needs"; tens of seconds to minutes per run is normal. Glyphward targets sub-200 ms p95 because it sits on the request path. The two budgets are not compatible: a CI tool that is allowed minutes per run is the wrong shape for a scanner that has to answer in 200 ms, and vice versa.
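To make "sub-200 ms p95" concrete: it means 95% of scan requests finish in under 200 ms. The Python sketch below checks a set of measured latencies against that budget using the nearest-rank percentile definition. This is one common convention, not necessarily how Glyphward measures its own target, and the function names are illustrative.

```python
def p95(latencies_ms) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no latency samples")
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]


def within_budget(latencies_ms, budget_ms: float = 200) -> bool:
    """True if the p95 latency is strictly under the budget."""
    return p95(latencies_ms) < budget_ms
```

For twenty samples spaced 10 ms apart from 10 to 200 ms, the nearest-rank p95 is the 19th value, 190 ms, which is inside the budget even though the slowest request is not.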

Will running both double my bill?

No. Promptfoo OSS is free; the cost is the CI runner time you already pay for. Glyphward starts at $0 and is a flat monthly rate above that. Running both does not compound — they sit at different points in your pipeline and bill against different budgets.
