Prompt injection scanner for Cohere Command R+

Cohere Command R+ is Cohere's flagship enterprise model, built for retrieval-augmented generation with document grounding and citation. The pipeline is Embed v3 multimodal → Rerank 3 → Command R+ generate: documents flow into the model's context as grounded sources that the model treats as authoritative. That trust model is what makes multimodal prompt injection especially dangerous here. An adversarial image planted inside a grounded document does not just enter the model's context window; it enters with higher trust than a user message. Cohere's content moderation endpoint, safety mode parameter, and grounding citation layer are all text-only. They tell you what Command R+ said and which document it cited; they do not inspect the image bytes inside that document. The scan gate must therefore sit between your document corpus and the documents= parameter of the chat() call.

TL;DR

Before passing any document to cohere.Client().chat(documents=[...]), scan the image bytes extracted from that document with POST https://glyphward.com/v1/scan. If score ≥ 65, remove the document from the documents list entirely. If the scanner is unreachable, remove the document as well: fail closed, not open. Apply the same gate at the Embed v3 indexing step to block adversarial images from entering the vector store in the first place. Free tier: 10 scans/day, no card required.

The four multimodal attack surfaces in Cohere RAG deployments

1. Cohere Embed v3 multimodal vector store — images indexed alongside text documents. Cohere Embed v3 supports multimodal embeddings: you can embed images and text documents into the same vector space and retrieve them together using a single query. In a typical enterprise document corpus — PDFs, slide decks, product manuals — images are extracted from documents and embedded alongside the surrounding text chunks. An attacker who can contribute a document to the corpus (via an upload form, a shared folder, a scraping pipeline that pulls public web pages) can embed an adversarial image alongside ordinary text. At retrieval time, that image is returned as a top-k grounded source. The adversarial image has now survived the embedding step intact — Embed v3 produces a vector for it, not a content verdict — and is about to enter Command R+'s context window tagged as a grounded, citable document. The retrieval pipeline has no mechanism to distinguish an adversarial image from a benign one; similarity distance is not a safety signal.

2. Command R+ direct image input via the multimodal chat API — user-uploaded images in enterprise document review workflows. Cohere's multimodal chat API allows images to be passed directly as message content, enabling enterprise use cases such as document review, form processing, invoice parsing, and visual Q&A over scanned contracts. In these workflows, the user — or a system that acts on the user's behalf — provides image bytes that go directly to Command R+ in the messages parameter. This is the highest-immediacy attack surface: an adversarial image reaches the model on the very next API call, with no retrieval step that might dilute its position in the context. There is no Cohere-native filter between the multimodal message and the model. Safety mode and content moderation are applied to the model's text output, not to the image that produced it. Scanning must happen in your application code before the chat() call.

3. Cohere Rerank 3 with image documents: adversarial image boosted to top-1 rank and elevated to highest-trust grounded source. Cohere Rerank 3 is a cross-encoder reranker that can process multimodal documents; it understands tables, figures, and image content when scoring document relevance. In a retrieve-then-rerank pipeline, Rerank 3 re-scores the initial retrieval results and returns a new ranked list, which is then passed to Command R+ as the ordered grounded context. The critical vulnerability is that Rerank 3 may legitimately boost an adversarial document to rank 1 if that document's visible content matches the query well, for instance an adversarial image embedded in a relevant-looking product manual page. Command R+ uses the grounded source order as a relevance signal: the top-ranked document is the most authoritative. An adversarial image that wins the reranking step enters the model's context at maximum trust. A high rerank score is a relevance indicator, not a safety indicator. The two are orthogonal, so an attacker can craft a document that is both highly relevant and adversarial.

4. Cohere on Azure AI Foundry or AWS Bedrock — Command R+ via partner cloud integration with additional trust assumptions. Command R+ is available through Azure AI Foundry (as a managed serverless deployment) and through AWS Bedrock (via the model catalog). In both cases, the Cohere RAG pipeline — Embed, Rerank, Command R+ generate — runs identically to the direct Cohere API, but with additional cloud-layer orchestration on top. Azure AI Foundry's Prompt Flow and AWS Bedrock Agents can wire Command R+ into automated document pipelines where the image sources are cloud storage buckets, SharePoint libraries, or Confluence spaces. These integration patterns increase the attack surface: document contributors are more numerous, documents arrive from more sources, and the pipeline may process documents automatically without any human review step. Neither Azure's content safety filters nor AWS Bedrock Guardrails inspect image bytes within Cohere's grounded document context. See the Azure AI Foundry and AWS Bedrock Agents pages for cloud-layer specifics; the Cohere-level scan gate described here applies regardless of which cloud hosts the deployment.
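
The point that rerank order is a relevance signal rather than a safety signal can be made concrete with a small sketch. It assumes the retrieved candidates are dicts that may carry an "image_bytes" key (the shape used in the integration code on this page) and that ranked_indices has been read off the reranker's results. The names gate_reranked and scan_fn are hypothetical; scan_fn stands in for whatever function returns the scanner's 0-100 score.

```python
from typing import Callable

SCAN_THRESHOLD = 65  # same cut-off as the rest of this guide


def gate_reranked(
    candidates: list[dict],
    ranked_indices: list[int],
    scan_fn: Callable[[bytes], int],
    threshold: int = SCAN_THRESHOLD,
) -> list[dict]:
    """Return candidates in rerank order, dropping any whose image fails the scan.

    candidates      documents as retrieved, each optionally carrying "image_bytes"
    ranked_indices  positions into `candidates`, best match first (assumed to be
                    taken from the reranker's results)
    scan_fn         hypothetical callable returning a 0-100 risk score for an image
    """
    kept = []
    for idx in ranked_indices:
        doc = candidates[idx]
        image = doc.get("image_bytes")
        if image:
            try:
                score = scan_fn(image)
            except Exception:
                continue  # fail closed: an unscannable image is dropped
            if score >= threshold:
                continue  # adversarial, however relevant the reranker found it
        kept.append(doc)
    return kept
```

A document at rank 1 gets no special treatment: it is dropped like any other if its image scores at or above the threshold, and the remaining documents keep their relative order.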

Integration: scanning documents before the Command R+ grounded chat call

import base64
import os

import cohere
import requests

COHERE_API_KEY = os.environ["COHERE_API_KEY"]
GLYPHWARD_KEY  = os.environ["GLYPHWARD_API_KEY"]
SCAN_THRESHOLD = 65

co = cohere.Client(COHERE_API_KEY)


# ── Helper: scan a single image and return the Glyphward score ─────────────

def scan_image_bytes(image_bytes: bytes, source_hint: str = "cohere_rag_doc") -> dict:
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={
            "image": base64.b64encode(image_bytes).decode(),
            "source": source_hint,
        },
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()   # {"score": int, "scan_id": str, "signals": [...]}


# ── Helper: extract images embedded in a PDF page (requires PyMuPDF) ──────

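def extract_pdf_images(pdf_bytes: bytes) -> list[bytes]:
    import fitz  # pip install PyMuPDF
    images = []
    seen_xrefs = set()  # an image reused on several pages shares one xref
    # The context manager closes the document even if extraction raises
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            for img_ref in page.get_images(full=True):
                xref = img_ref[0]
                if xref in seen_xrefs:
                    continue
                seen_xrefs.add(xref)
                base_image = doc.extract_image(xref)
                images.append(base_image["image"])
    return images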


# ── Core gate: scan a document dict and return None if adversarial ─────────

def scan_document(doc: dict) -> dict | None:
    """
    'doc' is an item in the list passed to cohere chat(documents=[...]).
    Expected shape: {"title": str, "text": str, "image_bytes": bytes | None, ...}

    Returns the doc unchanged if safe, or None to signal removal.
    Fail-closed: if scanner is unreachable, treat as adversarial and remove.
    """
    image_bytes = doc.get("image_bytes")
    if not image_bytes:
        return doc  # Text-only document — no image to scan

    try:
        result = scan_image_bytes(image_bytes, source_hint="cohere_documents_param")
        score  = result.get("score", 100)
        if score >= SCAN_THRESHOLD:
            print(
                f"[glyphward] Removed document '{doc.get('title', '?')}' "
                f"score={score} scan_id={result.get('scan_id')}"
            )
            return None   # Drop this document from the grounded context
    except Exception as exc:
        # Fail-closed: scanner unavailable → remove document to be safe
        print(f"[glyphward] Scanner error for '{doc.get('title', '?')}': {exc} — removing")
        return None

    return doc


# ── RAG pattern: retrieve, scan, then call Command R+ chat ─────────────────

def safe_cohere_rag_chat(query: str, raw_documents: list[dict]) -> str:
    """
    raw_documents: list of dicts with at least {"title": str, "text": str}
    and optionally {"image_bytes": bytes} for documents containing images.
    """
    # Scan each document; remove any that fail
    clean_documents = []
    for doc in raw_documents:
        safe_doc = scan_document(doc)
        if safe_doc is not None:
            # Strip the image_bytes key — Command R+ chat() takes text + metadata
            clean_doc = {k: v for k, v in safe_doc.items() if k != "image_bytes"}
            clean_documents.append(clean_doc)

    if not clean_documents:
        return "[All retrieved documents were removed by the security scan. Unable to answer.]"

    response = co.chat(
        model="command-r-plus",
        message=query,
        documents=clean_documents,
        # citation_quality="accurate" enables grounded citation generation
        citation_quality="accurate",
    )
    return response.text


# ── Embed v3 indexing gate: scan before adding to the vector store ─────────

def safe_embed_and_index(image_bytes: bytes, text: str, metadata: dict, index_fn) -> bool:
    """
    Call this before adding a multimodal document to a Cohere Embed v3 vector store.
    index_fn: your function that calls co.embed() and upserts into your vector DB.
    Returns True if indexed, False if blocked.
    """
    try:
        result = scan_image_bytes(image_bytes, source_hint="cohere_embed_v3_index")
        score = result.get("score", 100)  # a missing score is treated as adversarial
        if score >= SCAN_THRESHOLD:
            print(
                f"[glyphward] Blocked image from index: score={score} "
                f"scan_id={result.get('scan_id')} text_preview={text[:60]!r}"
            )
            return False
    except Exception as exc:
        # Fail-closed: scanner unavailable, so the document is not indexed
        print(f"[glyphward] Scanner error during indexing: {exc}; blocking")
        return False

    # Scanner passed — embed and index the document
    index_fn(image_bytes=image_bytes, text=text, metadata=metadata)
    return True


# ── Example: direct multimodal chat (image in user message) ───────────────

def safe_multimodal_chat(user_text: str, image_bytes: bytes) -> str:
    """
    Gate for the Command R+ multimodal chat API where the user uploads an image directly.
    """
    try:
        result = scan_image_bytes(image_bytes, source_hint="cohere_multimodal_chat")
        score = result.get("score", 100)  # a missing score is treated as adversarial
        if score >= SCAN_THRESHOLD:
            return f"[Image blocked by security scan. score={score}]"
    except Exception as exc:
        return f"[Image scan failed ({exc}). Image not sent to model.]"

    # Build the multimodal message for Command R+
    image_b64 = base64.b64encode(image_bytes).decode()
    response = co.chat(
        model="command-r-plus",
        message=user_text,
        # NOTE: `images=` is a placeholder, not a confirmed SDK parameter.
        # Check https://docs.cohere.com/ for the current multimodal message shape.
        images=[{"type": "base64", "data": image_b64}],
    )
    return response.text

The RAG path has two integration points. The document-param gate (safe_cohere_rag_chat) intercepts each document before it enters the documents= list of the chat() call. Removing a flagged document is the right action, rather than redacting part of it and keeping the rest, because Command R+ uses the full document structure (including its position in the list and any image annotations) to generate citations. A partially scrubbed document can still mislead the citation mechanism. The embed-time gate (safe_embed_and_index) blocks adversarial images before they enter the vector store, which is the more cost-efficient placement: one scan per document upload, amortised across every future retrieval. Run both for defence in depth. The embed-time gate blocks known-bad documents at ingestion; the retrieval-time gate catches anything that slipped through, such as documents indexed before the scanner was deployed, or images that scored just under the ingestion threshold but are blocked by a stricter (lower) threshold applied at retrieval time. The third function, safe_multimodal_chat, applies the same gate to the direct image path, where there is no retrieval step.
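
One way to express a retrieval-time policy with a lower threshold than the ingestion policy is to keep each gate's threshold in a small policy object. This is only a sketch: GatePolicy and is_blocked are hypothetical names, 65 is the threshold used throughout this page, and the retrieval-time value of 50 is purely an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatePolicy:
    name: str
    threshold: int  # block when score >= threshold


# 65 comes from this guide; 50 is an illustrative assumption, not a recommendation.
EMBED_TIME = GatePolicy("embed_time", 65)
RETRIEVAL_TIME = GatePolicy("retrieval_time", 50)


def is_blocked(score: Optional[int], policy: GatePolicy) -> bool:
    """True if the document should be dropped under `policy`.

    A missing score (scanner error or malformed response) blocks: fail closed.
    """
    if score is None:
        return True
    return score >= policy.threshold
```

Under these assumed values, an image scoring 55 is indexed at ingestion but removed again when it is retrieved, which is the borderline case the retrieval-time gate is meant to catch.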

Coverage matrix

Defence layer	Embed v3 vector store (adversarial image retrieved as grounded source)	Command R+ direct image input (multimodal chat API)	Rerank 3 (adversarial image boosted to top-1 rank)	Cohere on Azure / Bedrock (cloud-integrated RAG pipeline)
Cohere content moderation endpoint	No — text only; does not inspect image bytes in embedded documents	No — text only; applies to generated output, not image input	No	No
Cohere grounding / citation layer	Partial — records which document was cited; does not inspect whether that document is adversarial	No — citation applies to RAG outputs, not direct image inputs	Partial — logs the top-ranked document; does not flag adversarial content	Partial — audit trail only; no content inspection
Cohere safety mode parameter	No — text-only content filter applied to model output	No — does not scan the image that produced the output	No	No
RAG re-ranker score threshold (filtering low-relevance docs)	No — filters irrelevant documents, not adversarial ones; adversarial image may score high on relevance	N/A — re-ranker is not in the direct chat path	No — a high rerank score is a relevance signal, not a safety signal	No
Glyphward scan at Embed indexing + documents param + multimodal chat	Yes — scan at embed time blocks adversarial images from the vector store	Yes — scan before the chat() call; block if score ≥ 65	Yes — scan before rerank input is passed to Command R+ grounded context	Yes — Glyphward scan applies at the Cohere SDK layer regardless of cloud host

Related questions

Is Command R+ multimodal by default — do all deployments have an image attack surface?

There are two distinct multimodal paths to distinguish. The first is Embed v3 multimodal indexing: this is available to any deployment that uses Cohere's embedding API with images. If your pipeline extracts images from documents and embeds them into a vector store, the attack surface is active even if your Command R+ chat() call never receives an image directly — the adversarial image arrives as a grounded text-and-image document chunk. The second path is the Command R+ multimodal chat API, where image bytes are passed as message content. This path requires explicit feature use (the images parameter in the API call). If your deployment uses only text-formatted documents= parameters with no image bytes, this second path is not active. However, the Embed v3 path is active whenever multimodal indexing is used, which is a growing pattern in enterprise document corpora. Treat both paths as in-scope unless you have explicitly verified that no images flow through either.
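
Verifying that no images flow through a pipeline can start with a check on the document payloads themselves. The sketch below inspects leading magic bytes for four common raster formats; sniff_image_type and image_fields are hypothetical helper names. It only catches raw bytes values, so base64-encoded strings and other formats (TIFF, BMP, SVG) would need additional checks.

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Identify common raster formats from their leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def image_fields(doc: dict) -> dict[str, str]:
    """Map each field of `doc` that holds raw image bytes to its detected format."""
    found = {}
    for key, value in doc.items():
        if isinstance(value, (bytes, bytearray)):
            kind = sniff_image_type(bytes(value))
            if kind:
                found[key] = kind
    return found
```

Running image_fields over a sample of the documents that reach the embedding call and the chat() call gives a quick indication of whether either path carries image bytes.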

Does Cohere's citation layer help detect or prevent adversarial image injection?

No. The citation layer tells you which grounded document produced each sentence in Command R+'s response — it traces the provenance of the model's text output back to a specific document in the documents= list. This is valuable for audit and hallucination detection. It does not tell you whether the cited document contained an adversarial image, and it cannot prevent injection because the injection has already occurred by the time the citation is generated. In fact, the citation layer may make an injection attack harder to detect: the model's injected output will be faithfully attributed to the poisoned document, making the attribution appear legitimate. Log the cited document IDs, but do not rely on the citation mechanism as a security control.
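
Logging the cited document IDs next to the scan that cleared each document might look like the following sketch. It assumes each citation exposes a document_ids list, either as an attribute or as a dict key; check that field name against the SDK version in use. The helper names and the scan_ids mapping (document ID to scan_id recorded at scan time) are hypothetical.

```python
import json
from typing import Any, Iterable, Optional


def cited_document_ids(citations: Optional[Iterable[Any]]) -> list[str]:
    """Collect unique cited document IDs, in first-seen order."""
    seen: list[str] = []
    for citation in citations or []:
        if isinstance(citation, dict):
            ids = citation.get("document_ids")
        else:
            ids = getattr(citation, "document_ids", None)
        for doc_id in ids or []:
            if doc_id not in seen:
                seen.append(doc_id)
    return seen


def audit_line(
    query: str,
    citations: Optional[Iterable[Any]],
    scan_ids: dict[str, str],
) -> str:
    """Build one JSON log line linking each cited document to its scan_id."""
    cited = cited_document_ids(citations)
    record = {
        "query": query,
        "cited": [{"document_id": d, "scan_id": scan_ids.get(d)} for d in cited],
    }
    return json.dumps(record, sort_keys=True)
```

A cited document with a null scan_id in the log is one that reached the model without passing the gate, which is the case worth alerting on.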

What about self-hosted Cohere models via C4AI or the open-weights release?

Command R+ is available as an open-weights model through Cohere's C4AI community release (on Hugging Face and direct download). Self-hosted deployments running Command R+ on-premises or in a private cloud have the same grounded-source trust architecture as the hosted API: documents passed into the grounded context carry the same elevated trust. The only difference is that you control the full inference stack. There is no Cohere-side content moderation running in your on-premises setup, because safety mode and the moderation endpoint are features of the Cohere API, not of the model weights. A self-hosted deployment therefore runs without any of those API-level filters, which makes the Glyphward scan gate even more important. The scan gate in the integration code above carries over unchanged: replace the hosted chat() call with your own inference call and run the same POST /v1/scan check before populating the grounded documents.

How is this page different from the general RAG pipeline page on Glyphward?

The RAG pipelines page covers the general case: any retrieve-then-generate pipeline where the retrieved document may contain image bytes that reach a vision-capable LLM. The mechanics of pre-ingestion scanning, retrieval-time scanning, and content-hash caching apply there. This page is Cohere-specific and covers the grounded-source trust escalation unique to Command R+'s RAG architecture. In a generic RAG pipeline, a retrieved document is context — it informs the model's answer but does not carry a formal trust designation. In Command R+'s grounded generation, the documents= list members are authoritative sources that the model is trained to treat with citation-level confidence. That trust elevation changes the severity of an adversarial image landing in a grounded document: it is not just context that might influence the answer, it is a source the model is designed to follow. The attack surface is the same bytes; the blast radius is wider.
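
Content-hash caching, mentioned above, can be sketched in a few lines: key each scan result by the SHA-256 digest of the image bytes so that an image scanned at ingestion is not scanned again at every retrieval. ScanCache and scan_fn are hypothetical names, and the in-memory dict stands in for whatever store a deployment would actually use.

```python
import hashlib
from typing import Callable


class ScanCache:
    """Memoise scan scores by the SHA-256 digest of the image bytes."""

    def __init__(self, scan_fn: Callable[[bytes], int]) -> None:
        self._scan_fn = scan_fn
        self._scores: dict[str, int] = {}

    def score(self, image_bytes: bytes) -> int:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._scores:
            # Exceptions from scan_fn propagate and are not cached, so a
            # transient scanner outage never becomes a stored verdict.
            self._scores[key] = self._scan_fn(image_bytes)
        return self._scores[key]
```

Because the key is derived from the bytes alone, any change to the image, including re-encoding, produces a new key and triggers a fresh scan.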

Does Glyphward integrate with Cohere's connector API?

Yes. Cohere's connector API allows Command R+ to retrieve documents from external sources at chat time — a connector can pull from a CRM, a file server, or any REST endpoint and return documents that are then passed directly into the grounded context as if they were in the documents= list. If a connector fetches documents from a source that returns images (an image-capable SharePoint connector, a Google Drive connector pulling slide decks, or a custom connector that returns product images alongside specs), those images enter the grounded context through the connector's return payload. The correct gate is inside the connector implementation: before the connector returns its document list, scan any image bytes in the returned documents with POST /v1/scan and remove documents that score at or above the threshold. The connector is the earliest point in the call chain where you have custody of the image bytes — earlier than the Command R+ model, earlier than the grounding layer, and before Cohere's platform handles the connector response.
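
A gate inside a custom connector could be sketched as a filter applied just before the connector serialises its response. The sketch assumes the connector answers with a JSON body of the form {"results": [...]} and that images travel as base64 strings under a field called image_b64; that field name, filter_connector_results, and scan_fn are all hypothetical and should be adapted to the connector's real payload.

```python
import base64
from typing import Callable


def filter_connector_results(
    results: list[dict],
    scan_fn: Callable[[bytes], int],
    threshold: int = 65,
    image_key: str = "image_b64",  # hypothetical field name
) -> dict:
    """Drop results whose image fails the scan and strip image data from the rest."""
    kept = []
    for item in results:
        encoded = item.get(image_key)
        if encoded:
            try:
                score = scan_fn(base64.b64decode(encoded, validate=True))
            except Exception:
                continue  # undecodable image or scanner failure: fail closed
            if score >= threshold:
                continue  # at or above threshold: remove the whole document
        # Forward only the non-image fields of documents that passed
        kept.append({k: v for k, v in item.items() if k != image_key})
    return {"results": kept}
```

Text-only results pass through untouched, so the filter adds no scan cost for connectors that rarely return images.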
