
Multimodal prompt injection in agentic RAG pipelines

Simple retrieval-augmented generation (RAG) retrieves a fixed set of chunks, inserts them into the context, and calls the LLM once. Agentic RAG, as implemented in LangGraph, LlamaIndex agents, and AutoGen, works differently: the agent decides what to retrieve, when to stop, and which tools to call based on the intermediate results of prior retrieval steps. This widens the injection surface. In simple RAG, an adversarial image in a retrieved chunk corrupts the final answer. In agentic RAG, the same image can corrupt the agent's next retrieval decision, causing it to fetch attacker-controlled documents, call unintended tools, or terminate early with a false result. The injection propagates through the entire agent loop, not just the output token stream. Glyphward's scan gate sits at the tool-result layer, after the retriever returns chunks but before the LLM processes them, and breaks this loop at the last point where image bytes can be inspected before they reach the model.

TL;DR

In your LangGraph node or LlamaIndex agent tool, after the retriever returns document chunks but before the LLM sees them, scan the image bytes of every chunk that carries an image with POST https://glyphward.com/v1/scan. If score ≥ 70, remove that chunk from the context window and log the event. Apply this on every retrieval hop, not just the first. Free tier: 10 scans/day, no card required.

How agentic RAG creates a multi-hop injection surface

Simple RAG vs agentic RAG injection. In simple one-hop RAG: user query → vector search → top-k chunks → LLM prompt → final answer. An adversarial image in a retrieved chunk injects instructions into the final prompt. The LLM may follow those instructions in its response — but the response is the end of the pipeline. The scope of damage is: one response is wrong.

In agentic RAG with N retrieval hops: user query → agent plans retrieval → retriever returns chunks (including adversarial image) → LLM processes chunks → decides next retrieval query or tool call → retriever returns more chunks → … → final answer. If the LLM follows adversarial instructions embedded in a retrieved image at hop 1, it may issue a different retrieval query at hop 2 — one that fetches attacker-controlled documents. Every subsequent hop builds on the corrupted retrieval state. The scope of damage is: the entire agent session, and potentially any tool calls the agent makes along the way.

Where image bytes enter agentic RAG.

1. PDF page images from vector stores. Document loaders (PyPDFLoader, PDFPlumber, Unstructured) chunk PDFs by page or section. Pages with embedded images (charts, diagrams, annotated screenshots) are often stored as rendered page images alongside or instead of extracted text. Multimodal vector stores (Qdrant with payload.image, Weaviate with blob properties, LanceDB multimodal collections) store these image bytes alongside text embeddings. When the retriever returns a multimodal chunk, the agent loop includes the image bytes in the context window for the next LLM call.

2. Web search results with scraped images. Agentic RAG pipelines that include a web search tool (Tavily, Brave Search API, SerpAPI) may retrieve web pages containing images. If the agent fetches and processes the HTML of retrieved URLs (via a browse tool or a scraping step), any image on the page enters the multimodal context. An attacker who controls a web page indexed by the search provider can craft an adversarial image that the agent retrieves during its search loop.

3. Knowledge graph traversal with image nodes. Agents that traverse knowledge graphs (Neo4j, Amazon Neptune) as part of their retrieval strategy may encounter nodes with associated image properties — entity photos, document scans, diagram attachments. Each graph traversal step is a potential injection point if image nodes are included in the query result and forwarded to the vision LLM.

4. Tool outputs that include image references. Agentic RAG tools often include image-returning functions: screenshot tools, chart generators, document fetchers. If a tool returns an image (as base64 or a URL) in its output, the agent loop passes that image to the LLM for interpretation before deciding the next step. An adversarial tool output can inject instructions into the agent's planning state at any hop.
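The four entry points above deliver image data in different shapes: raw bytes in a vector-store payload, base64 strings in tool output, or data URIs scraped from HTML. A minimal sketch of a normaliser that turns these into bytes ready for scanning follows. The function name extract_image_bytes and the accepted payload shapes are assumptions for illustration, not part of any library or of the Glyphward API.

```python
import base64
import binascii
from typing import Optional, Union

DATA_URI_PREFIX = "data:image/"


def extract_image_bytes(payload: Union[bytes, bytearray, str, None]) -> Optional[bytes]:
    """Return encoded image bytes from a chunk or tool payload, or None.

    Accepted shapes (assumptions, adapt to your own schema):
      - raw bytes / bytearray           -> returned as bytes
      - "data:image/...;base64,<data>"  -> base64 part decoded
      - plain base64 string             -> decoded
    Anything else returns None, so the caller can decide whether to treat
    the item as text-only or to drop it (fail-closed).
    """
    if payload is None:
        return None
    if isinstance(payload, (bytes, bytearray)):
        return bytes(payload)
    if isinstance(payload, str):
        text = payload.strip()
        if not text:
            return None
        if text.startswith(DATA_URI_PREFIX):
            header, sep, text = text.partition(",")
            # Only base64-encoded data URIs are handled in this sketch.
            if not sep or not header.endswith(";base64"):
                return None
        try:
            # validate=True rejects strings with non-base64 characters,
            # such as plain URLs, instead of silently skipping them.
            return base64.b64decode(text, validate=True)
        except (binascii.Error, ValueError):
            return None
    return None
```

Note that a plain image URL returns None here: the image must be fetched first, and the fetched bytes scanned, before the model sees it.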

Integration: Python / LangGraph — scan in a tool-result processing node

import base64
from typing import TypedDict

import requests
from langchain_core.documents import Document
from langgraph.graph import StateGraph, END

GLYPHWARD_KEY = "<your-glyphward-api-key>"
SCAN_THRESHOLD = 70

def scan_image(image_bytes: bytes) -> dict:
    encoded = base64.b64encode(image_bytes).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": "agentic_rag"},
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()

def filter_retrieved_chunks(chunks: list[Document]) -> list[Document]:
    """
    Remove any chunk whose image payload scores at or above the injection threshold.
    Call this after every retrieval hop before feeding chunks to the LLM node.
    """
    clean_chunks = []
    for chunk in chunks:
        image_bytes = chunk.metadata.get("image_bytes")
        if image_bytes is None:
            # Text-only chunk — no scan needed
            clean_chunks.append(chunk)
            continue
        try:
            result = scan_image(image_bytes)
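        except Exception as exc:
            # Fail-closed: scanner unreachable or response invalid → drop the chunk,
            # but log the drop so it is visible in the audit trail
            print(
                f"Chunk dropped (scan failed): doc_id={chunk.metadata.get('doc_id')}, "
                f"error={exc!r}"
            )
            continue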
        if result["score"] < SCAN_THRESHOLD:
            clean_chunks.append(chunk)
        else:
            # Log the quarantined chunk for audit
            print(
                f"Chunk quarantined: doc_id={chunk.metadata.get('doc_id')}, "
                f"page={chunk.metadata.get('page')}, "
                f"score={result['score']}, scan_id={result['scan_id']}"
            )
    return clean_chunks

# ── LangGraph integration ────────────────────────────────────────────────

class AgentState(TypedDict):
    query: str
    retrieved_chunks: list[Document]
    answer: str

def retrieve_node(state: AgentState) -> AgentState:
    # Your vector store retrieval here — returns multimodal chunks
    chunks = your_vector_store.similarity_search(state["query"], k=5)
    # Scan image bytes in retrieved chunks before LLM sees them
    clean_chunks = filter_retrieved_chunks(chunks)
    return {**state, "retrieved_chunks": clean_chunks}

def generate_node(state: AgentState) -> AgentState:
    # LLM call — only clean chunks are in context
    context = format_chunks_for_llm(state["retrieved_chunks"])
    answer = your_llm.invoke(f"Context:\n{context}\n\nQuestion: {state['query']}")
    return {**state, "answer": answer}

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve_node)
graph.add_node("generate", generate_node)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
rag_agent = graph.compile()

Because the scan lives inside retrieve_node, filter_retrieved_chunks() runs every time that node executes. The graph above is single-hop: its edges are retrieve → generate → END, so retrieve_node runs once. In a multi-hop agent, where generate_node may return a new retrieval query rather than a final answer, add a conditional edge from generate back to retrieve; the graph then re-enters retrieve_node on each hop and the scan fires each time. This prevents an injection at hop N from affecting the retrieval query at hop N+1.
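The per-hop behaviour can be sketched without any framework. The following is a minimal illustration, not LangGraph code: run_multi_hop, its callable parameters, and the dict-based chunk shape are hypothetical names chosen for this example. The retriever, scanner, and planner are injected so the loop can be exercised with stubs.

```python
from typing import Callable, Optional


def run_multi_hop(
    query: str,
    retrieve: Callable[[str], list],
    scan: Callable[[bytes], dict],
    plan_next: Callable[[str, list], Optional[str]],
    threshold: int = 70,
    max_hops: int = 4,
) -> list:
    """Retrieve -> scan -> plan loop; returns the clean chunks gathered.

    Chunks are assumed to be dicts with an optional "image_bytes" key.
    plan_next stands in for the LLM planning step: it returns the next
    retrieval query, or None when the agent is done.
    """
    context: list = []
    current_query: Optional[str] = query
    hops = 0
    while current_query is not None and hops < max_hops:
        hops += 1
        for chunk in retrieve(current_query):
            image = chunk.get("image_bytes")
            if image is not None:
                try:
                    score = scan(image)["score"]
                except Exception:
                    continue  # fail closed: scanner error -> drop the chunk
                if score >= threshold:
                    continue  # quarantined before the planner can read it
            context.append(chunk)
        # The planner only ever sees chunks that passed the scan on this
        # and every earlier hop.
        current_query = plan_next(query, context)
    return context
```

The max_hops bound is a design choice added here so that a planner that never returns None cannot loop indefinitely.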


Coverage matrix

| Defence layer | PDF page image in retrieved chunk | Web search image in tool output | Knowledge graph image node |
| --- | --- | --- | --- |
| Text-only RAG guard (LLM Guard, Lakera) | No (image bytes ignored) | No | No |
| Vector store access control (Qdrant RBAC) | Prevents unauthorised reads, not content inspection | No | No |
| LLM system prompt hardening | Partial (may suppress some injections but not pixel-level adversarial content) | No | No |
| Glyphward scan at tool-result layer | Yes (scan before each hop's LLM call) | Yes | Yes |

Related questions

How does the injection risk in agentic RAG differ from simple one-hop RAG?

In simple RAG, the injection scope is one LLM response: the adversarial image makes the model produce a wrong or harmful answer in the current turn. In agentic RAG, the injection can corrupt the agent's planning state: the model may issue a different search query on the next hop (retrieving more attacker-controlled content), call a tool it should not (an exfiltration endpoint, a write API), or terminate the loop early with a false "I found the answer" signal before completing the required retrieval. The more tools the agent has and the more hops it takes, the larger the blast radius of a single injected image chunk. Scan at the tool-result layer to prevent the injection from entering the planning context at all.

Where exactly should I insert the scan in a LangGraph agent?

Insert filter_retrieved_chunks() immediately after the retriever returns chunks and before the LLM node sees them. In LangGraph, this can be an inline step at the end of the "retrieve" node (as in the integration example above), a dedicated "filter" node between the "retrieve" node and the "generate" node, or an inline step at the start of the "generate" node before the LLM call. Do not insert it only at the entry point of the graph: if the agent loops back to retrieve on subsequent hops, the scan must run on every retrieval result, not just the first.

What about images embedded in PDF chunks by pdf2image or PDFPlumber?

pdf2image renders each PDF page as a PIL Image object; PDFPlumber can extract embedded images from PDF content streams. In both cases, the image data is available in Python before it is stored in the vector store or passed to the LLM. Encode each image to a standard file format first, for example by saving the PIL image to an in-memory buffer as PNG. Do not use PIL.Image.tobytes() for this: it returns raw pixel data with no file header, which an image decoder cannot interpret on its own. Then call scan_image() at the document-loading step rather than at retrieval time. This prevents the adversarial page from being indexed in the vector store at all, which is stronger than retrieval-time filtering, because retrieval-time filtering still leaves the adversarial chunk in the index for other queries.
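A minimal sketch of ingest-time scanning, assuming page images that expose a PIL-style save(fp, format=...) method, as the images returned by pdf2image's convert_from_path do. The names to_png_bytes and clean_pages are hypothetical, and scan is any callable that returns a dict with a "score" key, such as the scan_image() function from the integration example.

```python
import io
from typing import Callable, Iterable, Iterator, Tuple


def to_png_bytes(image) -> bytes:
    """Encode a PIL-style image as a PNG file in memory (not raw pixels)."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def clean_pages(
    pages: Iterable,
    scan: Callable[[bytes], dict],
    threshold: int = 70,
) -> Iterator[Tuple[int, bytes]]:
    """Yield (page_number, png_bytes) for pages that pass the scan.

    Pages that score at or above the threshold, or whose scan raises,
    are skipped, so they never reach the index (fail-closed).
    """
    for page_number, image in enumerate(pages, start=1):
        png = to_png_bytes(image)
        try:
            score = scan(png)["score"]
        except Exception:
            continue  # scanner unreachable or malformed response
        if score < threshold:
            yield page_number, png
```

In a loader this might be called as clean_pages(convert_from_path("report.pdf"), scan_image), with only the yielded pages embedded and written to the vector store; the file name is a placeholder.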

Does this cover indirect injection via web search results in the agent loop?

Yes — if your agent's web search tool returns a page containing images, and those images are downloaded and included in the tool output (as base64 or as a fetched URL), scan the image bytes in the tool-output processing step (equivalent to the filter_retrieved_chunks() call but applied to the tool response). If your web search tool returns only text snippets (no images), there is no multimodal injection surface at that step. The risk is highest when the browse tool fetches the full HTML of retrieved URLs and passes rendered page images to the vision model for understanding.
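The tool-output step can be sketched in the same way as the chunk filter. The example below assumes a hypothetical tool result shaped as a dict with a "text" field and an "images" list of base64 strings; real search and browse tools use their own schemas, so treat the field names and the function sanitize_tool_output as placeholders.

```python
import base64
from typing import Callable


def sanitize_tool_output(
    output: dict,
    scan: Callable[[bytes], dict],
    threshold: int = 70,
) -> dict:
    """Return a copy of a tool result with flagged or unreadable images removed.

    Text fields pass through untouched. The added "images_dropped" count
    lets the caller log how many images were quarantined.
    """
    kept = []
    dropped = 0
    for encoded in output.get("images", []):
        try:
            image_bytes = base64.b64decode(encoded, validate=True)
            score = scan(image_bytes)["score"]
        except Exception:
            # Fail closed on undecodable payloads and scanner errors alike.
            dropped += 1
            continue
        if score < threshold:
            kept.append(encoded)
        else:
            dropped += 1
    return {**output, "images": kept, "images_dropped": dropped}
```

Returning a copy rather than mutating the tool result keeps the original available for audit logging.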

Further reading