
Prompt-injection scanner for CrewAI agents

CrewAI orchestrates multiple specialized agents — a researcher, a writer, a reviewer, a code executor — each receiving the previous agent's output as its task context. When that context includes an image attachment, a screenshot from a browser tool, or an audio clip a voice agent transcribed, the payload travels through the crew without any text guard touching the bytes. A FigStep-class glyph block embedded in a screenshot reaches your writer agent, your code agent, and your QA agent in sequence. The fix is one multimodal scan at each task boundary where non-text content enters or exits a crew member.

TL;DR

CrewAI's Task context can contain arbitrary content that flows between Agent instances. A text prompt-injection (PI) scanner watching the typed user prompt does not see the bytes inside task attachments. Wire a Glyphward scan into a custom Tool or a Task callback so that it runs before the next agent consumes the content: POST the image or audio bytes to /v1/scan, gate on the returned score, and either drop the task or add a quarantine annotation to the context. Free tier: 10 scans/day, no card. Pro: 100,000 scans/month at $29/mo.

Where multimodal content enters a CrewAI crew

Four injection points matter in production CrewAI deployments:

User-supplied initial context. If your crew's kickoff accepts a file — a PDF the user wants summarized, a screenshot they want described, an audio memo they want transcribed — the bytes land in the first agent's context before any agent logic runs. This is the easiest place to scan and the most dangerous if skipped, because the injected instruction reaches every downstream crew member.
Tool call results. Agents in a crew routinely call tools that return rich content: a browser screenshot from a Selenium or Playwright tool, a chart image returned by a data-analysis tool, a rendered dashboard from a BI connector. Each tool result is re-inserted into the calling agent's context and passed forward. A screenshot whose visible content happens to contain a typographic instruction in the corner goes through undetected by any text filter.
Inter-agent task outputs. When a researcher agent completes its task and the output contains an image (a chart, a map, a scanned document) that the next agent is supposed to analyze, the bytes travel through CrewAI's delegation mechanism. The writer agent that receives them sees a visual input with the same trust level as the researcher's prose output — there is no trust boundary between crew members by default.
Memory and shared context stores. CrewAI's optional memory layer lets agents write and read from a shared store between crew runs. If an earlier run wrote an image to memory and a later run retrieves it, the indirect-PI channel mirrors the RAG-pipeline pattern: the attacker writes into the store at one point, and the payload is retrieved later by a different query.
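The first entry point is the simplest to guard. The following is a minimal sketch, assuming a hypothetical scan_fn that takes raw bytes plus a modality and returns a 0 to 100 score (for example, a thin wrapper over the /v1/scan call). The Attachment type, guard_kickoff_inputs, and the threshold of 70 are illustrative names and values, not part of CrewAI or Glyphward:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Attachment:
    """Hypothetical container for one user-supplied file."""
    name: str
    data: bytes
    modality: str = "image"  # "image" or "audio"


class AttachmentBlocked(Exception):
    """Raised when an attachment scores above the threshold."""


def guard_kickoff_inputs(
    attachments: List[Attachment],
    scan_fn: Callable[[bytes, str], float],
    threshold: float = 70.0,
) -> Dict[str, bytes]:
    """Scan every attachment before the crew starts; fail closed on any hit."""
    clean: Dict[str, bytes] = {}
    for att in attachments:
        score = scan_fn(att.data, att.modality)
        if score > threshold:
            raise AttachmentBlocked(f"{att.name}: score {score} exceeds {threshold}")
        clean[att.name] = att.data
    return clean
```

Only the dict this function returns would be handed to the crew, so a single flagged file stops the run before any agent logic executes. Failing closed on the whole batch is a design choice; a more lenient variant could drop just the flagged file.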

Why text guards miss this

A typical CrewAI security setup adds a guard tool or wraps the crew's kickoff() in a pre-check that runs the user's typed input through a text PI scanner. This catches "ignore previous instructions" in the user message. It does not catch:

A FigStep-style numbered jailbreak list rendered as a PNG attached to the initial task.
An AgentTypo-distorted glyph block in a tool-returned screenshot that reads as benign ASCII to a text extractor but that the model's vision encoder interprets as an instruction.
A WhisperInject-class audio carrier in a voice memo the crew was asked to summarize: Whisper's transcript is clean, but the instruction carried out-of-band in the waveform is delivered when a downstream agent re-transcribes the audio.

The structural reason is the same across all three: the text scanner consumes the text representation of the artifact (the OCR output, the ASR transcript, the file name) and not the bytes the model's multimodal encoder will consume. See why text scanners miss a 30-pixel PNG for the canonical explanation.

Where to place the scan in a CrewAI codebase

CrewAI exposes two natural hooks: the Task's callback parameter (runs after the agent completes but before the output is passed forward) and a custom Tool that wraps any file-returning tool call. Both fit without framework changes.

Option A: Tool wrapper. Create a GlyphwardScanTool that accepts base64-encoded bytes and a source_trust tier and either returns the bytes (if clean) or raises an exception (if the score exceeds the threshold). Chain it inside any tool that returns image or audio content:

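import os

import httpx
from crewai_tools import BaseTool  # newer CrewAI releases expose this as crewai.tools.BaseTool

# Read the key from the environment rather than hard-coding it.
GLYPHWARD_API_KEY = os.environ["GLYPHWARD_API_KEY"]
SCORE_THRESHOLD = 70  # block anything scoring above this on the 0-100 scale


class GlyphwardScanTool(BaseTool):
    name: str = "glyphward_scan"
    description: str = "Scans image or audio bytes for multimodal prompt injection."

    def _run(self, b64_bytes: str, modality: str = "image", source_trust: str = "low") -> str:
        resp = httpx.post(
            "https://api.glyphward.com/v1/scan",
            json={"data": b64_bytes, "modality": modality, "source_trust": source_trust},
            headers={"Authorization": f"Bearer {GLYPHWARD_API_KEY}"},
            timeout=5,
        )
        resp.raise_for_status()  # fail closed: an HTTP error must not let the bytes through
        result = resp.json()
        if result["score"] > SCORE_THRESHOLD:
            raise Exception(f"Multimodal PI detected (score {result['score']}). Task blocked.")
        return b64_bytes  # clean: pass through unchanged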

Option B: Task callback. Use the task's callback to inspect the output for embedded images before it is handed to the next agent. This catches the inter-agent case where the researcher agent itself generated or attached an image as part of its output.
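A minimal sketch of the callback approach, assuming the agent embeds images in its text output as base64 data URIs and that the callback receives an object exposing the output text as a raw attribute (as CrewAI's TaskOutput does). The names quarantine_images, make_scan_callback, scan_fn, and the marker string are hypothetical, and other embedding formats would need their own extractor:

```python
import base64
import re
from typing import Callable, List, Tuple

# Matches inline images such as data:image/png;base64,iVBORw0...
DATA_URI = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,([A-Za-z0-9+/=]+)")
QUARANTINE_MARKER = "[image quarantined: possible multimodal prompt injection]"


def quarantine_images(
    text: str,
    scan_fn: Callable[[bytes], float],
    threshold: float = 70.0,
) -> Tuple[str, List[float]]:
    """Replace every embedded image scoring above threshold with a marker."""
    flagged: List[float] = []

    def _check(match: re.Match) -> str:
        score = scan_fn(base64.b64decode(match.group(1)))
        if score > threshold:
            flagged.append(score)
            return QUARANTINE_MARKER
        return match.group(0)  # clean: leave the image in place

    return DATA_URI.sub(_check, text), flagged


def make_scan_callback(scan_fn: Callable[[bytes], float], threshold: float = 70.0):
    """Build a callback that fails the task when any embedded image is flagged."""
    def callback(output) -> None:
        _, flagged = quarantine_images(output.raw, scan_fn, threshold)
        if flagged:
            raise RuntimeError(f"Multimodal PI detected in task output (scores {flagged}).")
    return callback
```

The callback would be attached with Task(..., callback=make_scan_callback(scan_fn)). Whether an exception raised inside a callback halts the crew, and whether a rewritten output propagates to the next agent, can depend on the CrewAI version, so verify that behavior before relying on it; quarantine_images is kept separate so the annotated text can be used wherever your pipeline allows.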

For both options, set source_trust based on where the bytes originated: "high" for internally generated content, "medium" for third-party tool results, "low" for user-supplied files or any content retrieved from the open internet.
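One way to keep the tier assignment consistent across tools and callbacks is a single lookup keyed by where the bytes came from. This is a sketch: the Origin labels are hypothetical, and defaulting unknown origins to "low" is an assumption rather than documented Glyphward behavior:

```python
from enum import Enum


class Origin(Enum):
    INTERNAL = "internal"          # generated inside your own pipeline
    THIRD_PARTY_TOOL = "tool"      # e.g. a BI connector or charting tool
    USER_UPLOAD = "user_upload"    # files supplied at kickoff
    OPEN_INTERNET = "internet"     # anything fetched from the public web


_TRUST_BY_ORIGIN = {
    Origin.INTERNAL: "high",
    Origin.THIRD_PARTY_TOOL: "medium",
    Origin.USER_UPLOAD: "low",
    Origin.OPEN_INTERNET: "low",
}


def source_trust_for(origin) -> str:
    """Return the source_trust value for an origin; unknown origins get "low"."""
    return _TRUST_BY_ORIGIN.get(origin, "low")
```

A browser screenshot of a public page arrives through a tool but shows attacker-reachable content, so tagging it OPEN_INTERNET rather than THIRD_PARTY_TOOL is the more conservative choice.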

Threat model for agentic pipelines

CrewAI agents typically have real-world effects: they call APIs, write to databases, send emails, execute code. The prompt-injection threat in a multi-agent context is not just "the model outputs something embarrassing" — it is "the model executes an action on behalf of the attacker." A jailbreak payload delivered through a screenshot tool result can instruct a code-executor agent to exfiltrate environment variables, instruct a writer agent to include a social-engineering lure in an outbound email, or instruct a QA agent to approve a code change it should flag.

The multimodal PI threat model for 2026 covers the agentic escalation path in detail. The MITRE ATLAS mapping (AML.T0051 LLM Prompt Injection + AML.T0054 LLM Jailbreak) is the red-team vocabulary for scoping CrewAI security testing.

How Glyphward fits in a CrewAI stack

Glyphward's /v1/scan endpoint accepts raw image or audio bytes (base64 or multipart) and returns a 0–100 risk score, the flagged pixel region or audio time window, and per-signal confidences for the three attack classes (FigStep / typographic, AgentTypo / distortion, WhisperInject / waveform carrier). Round-trip latency is under 200 ms, and results are cacheable by content hash.
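Because results can be cached by content hash, a crew that passes the same screenshot through several agents needs only one network call per unique artifact. The following is a minimal in-memory sketch, assuming a hypothetical scan_fn that performs the actual /v1/scan request and returns the parsed response; the ScanCache class and the choice of SHA-256 are illustrative:

```python
import hashlib
from typing import Callable, Dict


class ScanCache:
    """Memoize scan results by the SHA-256 digest of the raw bytes."""

    def __init__(self, scan_fn: Callable[[bytes], dict]):
        self._scan_fn = scan_fn
        self._results: Dict[str, dict] = {}
        self.misses = 0  # number of calls that actually reached scan_fn

    def scan(self, data: bytes) -> dict:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._scan_fn(data)
        return self._results[key]
```

If the returned score varies with source_trust, the tier should be folded into the cache key so that a result computed at "high" trust is never reused for "low" trust bytes.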

Text-side guards — Lakera Guard, LLM Guard, Azure Prompt Shields — keep their existing place on the typed-prompt and text-context legs of the pipeline. Glyphward covers the bytes those guards cannot reach. The two layers are additive, not competitive.
