Prompt-injection scanner for CrewAI agents
CrewAI orchestrates multiple specialized agents — a researcher, a writer, a reviewer, a code executor — each receiving the previous agent's output as its task context. When that context includes an image attachment, a screenshot from a browser tool, or an audio clip a voice agent transcribed, the payload travels through the crew without any text guard touching the bytes. A FigStep-class glyph block embedded in a screenshot reaches your writer agent, your code agent, and your QA agent in sequence. The fix is one multimodal scan at each task boundary where non-text content enters or exits a crew member.
TL;DR
CrewAI's Task context can contain arbitrary content that flows between Agent instances. A text PI scanner watching the typed user prompt does not see the bytes inside task attachments. Wire a Glyphward scan into a custom Tool or a Task callback so that non-text bytes are checked before the next agent consumes them: POST the image or audio bytes to /v1/scan, gate on the returned score, and either drop the task or add a quarantine annotation to the context. Free tier: 10 scans/day, no card. Pro: 100,000/month at $29/mo.
Where multimodal content enters a CrewAI crew
Four injection points matter in production CrewAI deployments:
- User-supplied initial context. If your crew's kickoff accepts a file — a PDF the user wants summarized, a screenshot they want described, an audio memo they want transcribed — the bytes land in the first agent's context before any agent logic runs. This is the easiest place to scan and the most dangerous if skipped, because the injected instruction reaches every downstream crew member.
- Tool call results. Agents in a crew routinely call tools that return rich content: a browser screenshot from a Selenium or Playwright tool, a chart image returned by a data-analysis tool, a rendered dashboard from a BI connector. Each tool result is re-inserted into the calling agent's context and passed forward. A screenshot whose visible content happens to contain a typographic instruction in the corner goes through undetected by any text filter.
- Inter-agent task outputs. When a researcher agent completes its task and the output contains an image (a chart, a map, a scanned document) that the next agent is supposed to analyze, the bytes travel through CrewAI's delegation mechanism. The writer agent that receives them sees a visual input with the same trust level as the researcher's prose output — there is no trust boundary between crew members by default.
- Memory and shared context stores. CrewAI's optional memory layer lets agents write and read from a shared store between crew runs. If an earlier run wrote an image to memory and a later run retrieves it, the indirect-PI channel mirrors the RAG-pipeline pattern: the attacker writes into the store at one point, the payload is retrieved by a different query at a later time.
Why text guards miss this
A typical CrewAI security setup adds a guard tool or wraps the crew's kickoff() in a pre-check that runs the user's typed input through a text PI scanner. This catches "ignore previous instructions" in the user message. It does not catch:
- A FigStep-style numbered jailbreak list rendered as a PNG attached to the initial task.
- An AgentTypo-distorted glyph block in a tool-returned screenshot that reads as benign ASCII to a text extractor but triggers the model's vision encoder as an instruction.
- A WhisperInject-class audio carrier in a voice memo the crew was asked to summarize, where Whisper's transcript is clean and the out-of-band waveform instruction is delivered when the audio is re-transcribed by a downstream agent.
The structural reason is the same across all three: the text scanner consumes the text representation of the artifact (the OCR output, the ASR transcript, the file name) and not the bytes the model's multimodal encoder will consume. See why text scanners miss a 30-pixel PNG for the canonical explanation.
Where to place the scan in a CrewAI codebase
CrewAI exposes two natural hooks: the Task's callback parameter (runs after the agent completes but before the output is passed forward) and a custom Tool that wraps any file-returning tool call. Both fit without framework changes.
Option A: Tool wrapper. Create a GlyphwardScanTool that accepts base64-encoded bytes and a source_trust tier, and either returns the bytes (if clean) or raises an exception (if the score exceeds the threshold). Chain it inside any tool that returns image or audio content:
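```python
import os

import httpx
from crewai_tools import BaseTool  # newer CrewAI releases may expose this as crewai.tools.BaseTool

# Assumption: the API key is supplied through an environment variable of this name.
GLYPHWARD_API_KEY = os.environ["GLYPHWARD_API_KEY"]
BLOCK_THRESHOLD = 70  # scores above this value block the task


class GlyphwardScanTool(BaseTool):
    name: str = "glyphward_scan"
    description: str = "Scans image or audio bytes for multimodal prompt injection."

    def _run(self, b64_bytes: str, modality: str = "image", source_trust: str = "low") -> str:
        resp = httpx.post(
            "https://api.glyphward.com/v1/scan",
            json={"data": b64_bytes, "modality": modality, "source_trust": source_trust},
            headers={"Authorization": f"Bearer {GLYPHWARD_API_KEY}"},
            timeout=5,
        )
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        result = resp.json()
        if result["score"] > BLOCK_THRESHOLD:
            raise RuntimeError(f"Multimodal PI detected (score {result['score']}). Task blocked.")
        return b64_bytes  # clean: pass through unchanged
```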
Option B: Task callback. Pass a function as the task's callback to inspect the output for embedded images or audio before it is handed to the next agent. This catches the inter-agent case where the researcher agent itself generated or attached an image as part of its output.
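A minimal sketch of Option B, assuming images and audio arrive inline as base64 data URIs in the output text and that the output object exposes that text as `.raw`. The names `make_scan_callback`, `InjectionDetected`, and the `scan(data, modality)` helper are hypothetical and not part of CrewAI or Glyphward; the threshold of 70 simply mirrors Option A.

```python
import base64
import re
from typing import Callable

# Matches inline data URIs such as data:image/png;base64,AAAA...
DATA_URI = re.compile(r"data:(image|audio)/[\w.+-]+;base64,([A-Za-z0-9+/=]+)")


class InjectionDetected(RuntimeError):
    """Raised when a scanned asset scores above the blocking threshold."""


def make_scan_callback(scan: Callable[[bytes, str], float], threshold: float = 70):
    """Build a callback that scans every embedded asset in a task output.

    `scan(data, modality)` is a hypothetical helper returning a 0-100 score,
    for example a thin wrapper around the POST to /v1/scan.
    """

    def callback(output) -> None:
        # Assumption: the output exposes its text as `.raw`; otherwise use str().
        text = getattr(output, "raw", None) or str(output)
        for match in DATA_URI.finditer(text):
            modality, payload = match.group(1), match.group(2)
            score = scan(base64.b64decode(payload), modality)
            if score > threshold:
                raise InjectionDetected(
                    f"Multimodal PI detected in task output (score {score})."
                )

    return callback
```

The returned function could be passed as `callback=` when constructing a Task. Whether an exception raised inside a callback halts the crew may depend on the CrewAI version, so it is worth verifying in your own setup.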
For both options, set source_trust based on where the bytes originated: "high" for internally generated content, "medium" for third-party tool results, "low" for user-supplied files or any content retrieved from the open internet.
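The tier assignment above could be centralized in one helper so that individual tools do not hard-code it. This is a sketch only: the `Origin` enum and `source_trust_for` are hypothetical names, and the rule that the lowest tier wins when several origins apply (for example, a third-party browser tool returning a page from the open internet) is an assumption, not documented Glyphward behavior.

```python
from enum import Enum


class Origin(Enum):
    INTERNAL = "internal"        # generated by your own agents or services
    THIRD_PARTY_TOOL = "tool"    # returned by a third-party tool
    USER_UPLOAD = "user"         # supplied by an end user
    OPEN_INTERNET = "web"        # retrieved from the open internet


_TRUST = {
    Origin.INTERNAL: "high",
    Origin.THIRD_PARTY_TOOL: "medium",
    Origin.USER_UPLOAD: "low",
    Origin.OPEN_INTERNET: "low",
}
_ORDER = ["low", "medium", "high"]  # ascending trust


def source_trust_for(*origins: Origin) -> str:
    """Return the source_trust value for bytes with the given origin(s).

    Assumption: when several origins apply, the least trusted tier is used,
    and bytes of unknown origin are treated as "low".
    """
    if not origins:
        return "low"
    return min((_TRUST[o] for o in origins), key=_ORDER.index)
```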
Threat model for agentic pipelines
CrewAI agents typically have real-world effects: they call APIs, write to databases, send emails, execute code. The prompt-injection threat in a multi-agent context is not just "the model outputs something embarrassing" — it is "the model executes an action on behalf of the attacker." A jailbreak payload delivered through a screenshot tool result can instruct a code-executor agent to exfiltrate environment variables, instruct a writer agent to include a social-engineering lure in an outbound email, or instruct a QA agent to approve a code change it should flag.
The multimodal PI threat model for 2026 covers the agentic escalation path in detail. The MITRE ATLAS mapping (AML.T0051 LLM Prompt Injection + AML.T0054 LLM Jailbreak) is the red-team vocabulary for scoping CrewAI security testing.
How Glyphward fits in a CrewAI stack
Glyphward's /v1/scan endpoint accepts raw image or audio bytes (base64 or multipart), returns a 0–100 risk score, the flagged pixel region or audio time window, and per-signal confidences for the three attack classes (FigStep / typographic, AgentTypo / distortion, WhisperInject / waveform carrier). Sub-200 ms round-trip, cacheable by content hash.
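Because results are cacheable by content hash, a crew that sees the same screenshot or chart at several task boundaries could avoid repeat calls with a small in-memory cache. The sketch below is illustrative: `ScanCache` and the injected `scan` callable are hypothetical, and it assumes the result depends only on the bytes. If the score also varies with `source_trust`, the tier would need to be part of the key.

```python
import hashlib
from typing import Callable, Dict


class ScanCache:
    """Memoize scan results by the SHA-256 digest of the scanned bytes."""

    def __init__(self, scan: Callable[[bytes], dict]):
        # `scan` is any function that sends bytes to the scanner and
        # returns the parsed result, e.g. a wrapper around /v1/scan.
        self._scan = scan
        self._results: Dict[str, dict] = {}

    def scan(self, data: bytes) -> dict:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._results:
            # Only bytes that have not been seen before reach the scanner.
            self._results[key] = self._scan(data)
        return self._results[key]
```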
Text-side guards — Lakera Guard, LLM Guard, Azure Prompt Shields — keep their existing place on the typed-prompt and text-context legs of the pipeline. Glyphward covers the bytes those guards cannot reach. The two layers are additive, not competitive.
Related questions
My CrewAI crew only passes text between agents. Do I need this?
If no crew member calls a tool that returns images, screenshots, or audio, and if no user input ever includes those file types, a text guard is sufficient. The moment any tool returns image content or any user uploads a file, the text guard has a gap. Most production crews grow to include browser or vision tools over time — wiring the scan early is lower-cost than retrofitting it after a breach.
Which agent in the crew should run the scan?
Scan at every boundary where untrusted bytes enter or leave a crew member. For tool results: the calling agent, immediately after the tool returns. For user-supplied files: before kickoff() hands the context to the first agent. For inter-agent outputs that include images: the receiving agent, before it processes the output. Over-scanning is cheaper than under-scanning in an agentic pipeline where downstream actions have real-world effects.
Does this add latency to the crew run?
One Glyphward scan adds ~150–200 ms per image or audio asset. When scans run concurrently with other async tool calls, they typically add little or no wall-clock time. Worst case: scanning the assets of an image-heavy tool result one after another adds ~200 ms per asset to that step. For crews where tool calls already take 1–5 seconds, this is in the noise.
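One way to keep an image-heavy step from paying the scan cost serially is to issue the scans concurrently. The sketch below assumes an async `scan(data)` coroutine (hypothetical, for example built on an async HTTP client) and caps concurrency with a semaphore; `scan_all` and the default limit of 8 are illustrative choices.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def scan_all(
    assets: Sequence[bytes],
    scan: Callable[[bytes], Awaitable[float]],
    max_concurrency: int = 8,
) -> List[float]:
    """Scan all assets concurrently and return scores in input order."""
    limit = asyncio.Semaphore(max_concurrency)

    async def scan_one(data: bytes) -> float:
        async with limit:  # never more than max_concurrency scans in flight
            return await scan(data)

    # gather preserves the order of its arguments in the result list.
    return list(await asyncio.gather(*(scan_one(a) for a in assets)))
```

With this shape, the added wall-clock time for a batch is closer to that of the slowest single scan than to the sum of all scans, as long as the batch fits within the concurrency limit.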
How do I handle flagged content? Should the crew raise an exception or continue with a warning?
Raise an exception for score > 80 (high-confidence PI — terminate the crew run). For score 50–80 (possible PI), annotate the task context with a quarantine flag and let the crew continue but with tool-execution permissions downgraded (read-only, no external writes). For score < 50, pass through. Thresholds should be tighter for user-supplied files than for internally generated content.
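The three-way policy above can be written as a small decision function. This is a sketch: `Verdict` and `decide` are hypothetical names, the defaults are the 80 and 50 cutoffs from the answer, and the tighter cutoff in the usage note is only an example value.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"              # hand the content to the next agent unchanged
    QUARANTINE = "quarantine"  # continue, but with downgraded tool permissions
    BLOCK = "block"            # terminate the crew run


def decide(score: float, block_above: float = 80, quarantine_from: float = 50) -> Verdict:
    """Map a 0-100 risk score to an action."""
    if score > block_above:
        return Verdict.BLOCK
    if score >= quarantine_from:
        return Verdict.QUARANTINE
    return Verdict.PASS
```

For user-supplied files, a caller might pass lower cutoffs, for instance `decide(score, block_above=60, quarantine_from=30)`, while keeping the defaults for internally generated content.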
Is this the same as the scanner needed for an OpenAI Assistants API or AutoGen pipeline?
The scanner is the same endpoint. The placement differs: Assistants API needs a scan on file attachments before they are uploaded to the Files API; AutoGen needs a scan on messages with image content before the group-chat relay. See those pages for framework-specific integration patterns.
Further reading
- The multimodal prompt-injection threat model for AI product teams (2026) — full threat model and the defender's playbook.
- Prompt-injection scanner for LangChain agents — the agent-framework sibling page; same scan, LangChain placement.
- Prompt-injection scanner for RAG pipelines — when the crew uses a retrieval backend with document storage.
- Prompt-injection scanner for MCP servers — if your crew tools speak MCP.
- MITRE ATLAS multimodal prompt injection — threat-intelligence framing for security review and red-team scoping.
- OWASP LLM01:2025 multimodal — compliance vocabulary for AppSec reviews of agentic pipelines.
- Multimodal LLM security API — the category-level overview.