ICP-by-product · LangChain agents

Prompt-injection scanner for LangChain agents

A LangChain agent that accepts an image is a LangChain agent that accepts an instruction. The text guard you wired into your LCEL chain — whatever scans the user's typed prompt before the model call — does not look at the bytes attached to it. FigStep, AgentTypo, indirect image PI, and the audio payloads behind WhisperInject all ride straight through. The fix is one Runnable, dropped in front of the model, that scans every attachment and short-circuits the chain when the score crosses a threshold.

TL;DR

LangChain's multimodal message format — a HumanMessage whose content is a list of {type: "text"}, {type: "image_url"} and provider-specific audio parts — passes bytes through your chain that no text guard will ever inspect. Wrap a Glyphward call in a RunnableLambda, mount it before your ChatModel (or before your create_tool_calling_agent executor), and you get a 0–100 score on every attachment with about 80 lines of Python and one API key.

Where the gap is in a typical LangChain pipeline

The default text-safety pattern in LangChain Expression Language is straightforward: a RunnableLambda that runs a text PI scanner — Lakera, LLM Guard, OpenAI's moderation endpoint, your own classifier — over the user's prompt, then forwards or rejects. It is a sound pattern. It is also blind to two things.

First, it sees only the text leg of a multimodal HumanMessage. When a user attaches an image_url part (a base64 PNG or a URL), or when your agent's tool returns an image — a screenshot, a chart, a captured camera frame — that part is not a string. The text scanner reads "what is in this picture?" and waves it through. The FigStep, AgentTypo, and typographic injection payloads all live in the bytes the text scanner does not look at.

Second, it runs at the wrong altitude for tool-using agents. create_tool_calling_agent, create_react_agent, and the LangGraph node-based agents all loop the model with tool outputs that may themselves contain bytes — a screenshot taken by a browser tool, an audio file fetched by a scraping tool, an image returned from an MCP server. Inspecting the human's first message is not enough. You need to inspect every multimodal message that lands on the model's input stack, on every iteration of the loop.

The Runnable that closes it

The pattern below is the minimum viable wedge. It walks the message list, base64-decodes any image_url parts, posts them to Glyphward, and either passes the chain through or short-circuits with a refusal. Drop the scanner before the model in the LCEL pipe, or before the agent executor, depending on which surface is yours.

from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.messages import HumanMessage, AIMessage
from langchain_openai import ChatOpenAI
import base64, os, requests

GLYPHWARD_KEY = os.environ["GLYPHWARD_API_KEY"]
SCAN_URL = "https://glyphward.com/v1/scan"
BLOCK_AT = 70  # 0-100; tune to your false-positive budget

def _scan_one(b64_or_url: str) -> float:
    # Resolve URLs to bytes; pass base64 through.
    if b64_or_url.startswith("data:"):
        b64 = b64_or_url.split(",", 1)[1]
    elif b64_or_url.startswith(("http://", "https://")):
        b64 = base64.b64encode(requests.get(b64_or_url, timeout=8).content).decode()
    else:
        b64 = b64_or_url
    r = requests.post(
        SCAN_URL,
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        json={"image_b64": b64},
        timeout=8,
    )
    r.raise_for_status()
    return float(r.json()["score"])

def scan_messages(messages):
    for m in messages:
        if not isinstance(m, HumanMessage) or not isinstance(m.content, list):
            continue
        for part in m.content:
            if part.get("type") == "image_url":
                url = part["image_url"]["url"] if isinstance(part["image_url"], dict) else part["image_url"]
                score = _scan_one(url)
                if score >= BLOCK_AT:
                    return AIMessage(content=f"Blocked: attachment scored {score:.0f}/100 for prompt injection.")
            elif part.get("type") == "input_audio":
                # provider-specific; same shape as image_url for the scan call
                audio_b64 = part.get("input_audio", {}).get("data") or ""
                if audio_b64 and _scan_one(audio_b64) >= BLOCK_AT:
                    return AIMessage(content="Blocked: audio attachment flagged.")
    return messages  # untouched — let the chain proceed

guard = RunnableLambda(scan_messages)
model = ChatOpenAI(model="gpt-4.1-mini")
chain = guard | model

That is the entire wedge. guard returns either the original message list (chain proceeds normally) or an AIMessage that short-circuits to the user. For tool-using agents, mount the same guard as a before_model hook on each LangGraph state transition, or wrap the model inside an AgentExecutor with a callback that calls it on every iteration. The scanning logic does not change; only the mount point does.

Where to mount it: three patterns

  1. Single-turn LCEL chain. Pipe the guard Runnable in front of the ChatModel. Easiest. Covers the user's first message and is enough if your chain is not tool-using.
  2. Tool-calling agent (create_tool_calling_agent, create_react_agent). Run the guard on the input messages and again before each model call inside the loop, since tool outputs can carry images. The cleanest place to add it is in a RunnableLambda wrapped around the model call inside the agent's prompt assembly.
  3. LangGraph state machine. Add a guard node before every node that calls a multimodal model. The graph already enforces ordering, so the guard fires on every transition that produces a model input. This is the version that scales to multi-step browsing agents and screenshot-reading workflows — see prompt-injection scanner for screenshot agents for the screenshot-specific threat model.

None of the three patterns require touching the underlying provider integration. The guard is provider-agnostic: it inspects bytes, not the chat-completion API the bytes are going to.

Why not just use LangSmith red-team or a text guard?

LangSmith's evaluator and red-team flows are eval-time tools: they exercise your chain offline against datasets of adversarial inputs and surface a pass/fail matrix. They do not sit on the inference path of a production request. You want both layers, in the same way you want both Promptfoo and an inline scanner — see Promptfoo + multimodal scanning for the eval-time vs inference-time argument that applies identically here.

A text-only guard like LLM Guard or Lakera Guard on the prompt-text leg is necessary and stays. It catches the half of the attack space that lives in the typed instruction. Glyphward sits beside it, on the bytes leg. The recommended LangChain stack is your existing text guard plus a Glyphward Runnable on every message that contains an attachment. Two scanners, two surfaces, one chain.

Latency budget for an inline LangChain guard

A single image scan returns in tens of milliseconds for typical chat attachment sizes (200 KB–2 MB) on commodity inference hardware. For a one-attachment user message that is below the noise floor of any production VLM call. For tool-loop agents that fire the guard several times per turn, parallelise: asyncio.gather over each image part, or batch via the /v1/scan_batch endpoint on the Pro tier. The scanner is not in the per-token path of your model — it is a single fixed-cost call per attachment, not per generated token.

For the audio side, the same pattern applies. Audio attachment shapes vary by provider (input_audio for OpenAI's audio preview, partner-specific for Gemini and Anthropic), but the scanner takes the bytes regardless of how they were wrapped. See audio prompt-injection detection for the broader audio threat model.

Pricing for a LangChain agent

Most LangChain prototypes can run on the free tier at 10 scans/day during development. A production agent serving real users moves to Pro at $29/month for 100,000 scans, which covers up to roughly 3,000 attachment-bearing turns/day at one scan per turn — enough for a small to mid-sized SaaS product. Team at $99 covers a million scans plus an audit log that pairs cleanly with LangSmith's tracing. Full breakdown on pricing and a side-by-side against the rest of the market on multimodal PI scanner pricing comparison.

Get early access · See the API surface

Related questions

Does this work with LangChain.js as well as Python?

Yes. The HTTP API is the same; the integration is a one-file wrapper around fetch in JS, mounted as a RunnableLambda in the JS LCEL pipe. The bytes the scanner reads are model-agnostic, so the JS port is purely cosmetic.

What about LangGraph's tools_condition / interrupt flow?

Add a guard node before the model node and route a "blocked" classification to an interrupt that returns the refusal to the human-in-the-loop edge. The standard LangGraph pattern of a conditional edge handles the routing — no surgery on LangGraph itself.

I am using RunnableWithMessageHistory for multi-turn memory. Will the guard see history attachments?

Mount the guard after history hydration, on the full message list that will be sent to the model. The guard reads from the same message list, so historical attachments are inspected on the turn they re-enter the model context. Cache previously-clean attachment scores by content hash if the cost matters.

Do I need to install a Glyphward SDK?

No. The guard above is one HTTP POST. We will publish a langchain-glyphward companion package once the public API is GA so the integration becomes a one-line import; the manual wrapper above is the canonical pattern in the meantime.

Will this slow my agent's first-token latency?

For a single attachment per turn, yes — by the scan time, typically tens of milliseconds. For zero-attachment turns the guard adds essentially nothing because the message walk is a no-op. If first-token latency is a hard constraint, run the guard in parallel with prompt assembly: kick off the scan as soon as the message list is built, and only block if the result returns over threshold before the model's first token starts streaming.

Further reading