Prompt injection scanner for Cohere Command R+
Cohere Command R+ is Cohere's flagship enterprise model, designed from the ground up for retrieval-augmented generation with document grounding and citation. The architecture is Embed v3 multimodal → Rerank 3 → Command R+ generate, where documents flow into the model's context as grounded sources that the model treats as authoritative. That trust model is what makes multimodal prompt injection uniquely dangerous here: an adversarial image planted inside a grounded document does not just enter the model's context window — it enters with higher trust than a user message. Cohere's content moderation endpoint, safety mode parameter, and grounding citation layer are all text-only. They tell you what Command R+ said and which document it cited; they do not inspect the image bytes inside that document. The scan gate must sit between your document corpus and the documents= parameter of the chat() call.
TL;DR
Before passing any document to cohere.Client().chat(documents=[...]), scan the image bytes extracted from that document with POST https://glyphward.com/v1/scan. If score ≥ 65, remove the document from the documents list entirely — fail-closed, not fail-open. Apply the same gate at the Embed v3 indexing step to block adversarial images from entering the vector store in the first place. Free tier — 10 scans/day, no card required.
The four multimodal attack surfaces in Cohere RAG deployments
1. Cohere Embed v3 multimodal vector store — images indexed alongside text documents. Cohere Embed v3 supports multimodal embeddings: you can embed images and text documents into the same vector space and retrieve them together using a single query. In a typical enterprise document corpus — PDFs, slide decks, product manuals — images are extracted from documents and embedded alongside the surrounding text chunks. An attacker who can contribute a document to the corpus (via an upload form, a shared folder, a scraping pipeline that pulls public web pages) can embed an adversarial image alongside ordinary text. At retrieval time, that image is returned as a top-k grounded source. The adversarial image has now survived the embedding step intact — Embed v3 produces a vector for it, not a content verdict — and is about to enter Command R+'s context window tagged as a grounded, citable document. The retrieval pipeline has no mechanism to distinguish an adversarial image from a benign one; similarity distance is not a safety signal.
2. Command R+ direct image input via the multimodal chat API — user-uploaded images in enterprise document review workflows. Cohere's multimodal chat API allows images to be passed directly as message content, enabling enterprise use cases such as document review, form processing, invoice parsing, and visual Q&A over scanned contracts. In these workflows, the user — or a system that acts on the user's behalf — provides image bytes that go directly to Command R+ in the messages parameter. This is the highest-immediacy attack surface: an adversarial image reaches the model on the very next API call, with no retrieval step that might dilute its position in the context. There is no Cohere-native filter between the multimodal message and the model. Safety mode and content moderation are applied to the model's text output, not to the image that produced it. Scanning must happen in your application code before the chat() call.
3. Cohere Rerank 3 with image documents — adversarial image boosted to top-1 rank and elevated to highest-trust grounded source. Cohere Rerank 3 is a cross-encoder reranker that can process multimodal documents — it understands tables, figures, and image content when scoring document relevance. In a retrieve-then-rerank pipeline, Rerank 3 re-scores the initial retrieval results and returns a new ranked list, which is then passed to Command R+ as the ordered grounded context. The critical vulnerability is that Rerank 3 may legitimately boost an adversarial document to rank 1 if that document's visible content matches the query well — for instance, an adversarial image embedded in a relevant-looking product manual page. Command R+ uses the grounded source order as a relevance signal: the top-ranked document is the most authoritative. An adversarial image that wins the reranking step enters the model's context at maximum trust. A high rerank score indicates relevance, not safety. The two are independent, so an attacker can craft a document that ranks first on relevance while still carrying an adversarial image.
4. Cohere on Azure AI Foundry or AWS Bedrock — Command R+ via partner cloud integration with additional trust assumptions. Command R+ is available through Azure AI Foundry (as a managed serverless deployment) and through AWS Bedrock (via the model catalog). In both cases, the Cohere RAG pipeline — Embed, Rerank, Command R+ generate — runs identically to the direct Cohere API, but with additional cloud-layer orchestration on top. Azure AI Foundry's Prompt Flow and AWS Bedrock Agents can wire Command R+ into automated document pipelines where the image sources are cloud storage buckets, SharePoint libraries, or Confluence spaces. These integration patterns increase the attack surface: document contributors are more numerous, documents arrive from more sources, and the pipeline may process documents automatically without any human review step. Neither Azure's content safety filters nor AWS Bedrock Guardrails inspect image bytes within Cohere's grounded document context. See the Azure AI Foundry and AWS Bedrock Agents pages for cloud-layer specifics; the Cohere-level scan gate described here applies regardless of which cloud hosts the deployment.
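The point that relevance and safety are independent signals can be made concrete with a small sketch. It assumes each reranked hit is a plain dict with the hypothetical keys `relevance` and `image_bytes`, and it takes the scanner as a callable `scan_fn(bytes) -> int` so that no real endpoint is needed. None of these names come from Cohere's SDK; they are placeholders for whatever your retrieval layer returns.

```python
from typing import Callable, Optional

SCAN_THRESHOLD = 65  # same cut-off used elsewhere in this guide


def filter_reranked(
    hits: list[dict],
    scan_fn: Callable[[bytes], int],
    threshold: int = SCAN_THRESHOLD,
) -> list[dict]:
    """Drop hits whose image scores at or above the threshold.

    The rerank order of the surviving hits is preserved, and the
    relevance score is never consulted when deciding what to drop.
    """
    kept = []
    for hit in hits:
        image: Optional[bytes] = hit.get("image_bytes")
        if image is None:
            kept.append(hit)  # text-only hit: nothing to scan
            continue
        try:
            score = scan_fn(image)
        except Exception:
            continue  # fail-closed: a hit that cannot be scanned is dropped
        if score < threshold:
            kept.append(hit)
    return kept


# Stub scanner for illustration only: flags any payload containing b"evil".
def fake_scan(image: bytes) -> int:
    return 90 if b"evil" in image else 5


hits = [
    {"title": "manual p.3", "relevance": 0.98, "image_bytes": b"evil-pixels"},
    {"title": "manual p.4", "relevance": 0.71, "image_bytes": b"plain-pixels"},
    {"title": "faq", "relevance": 0.40},
]
print([h["title"] for h in filter_reranked(hits, fake_scan)])
```

In this toy run the top-ranked hit is the one that gets removed, which is the scenario the Rerank 3 surface describes: winning on relevance says nothing about whether the image is safe to ground on.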
Integration: scanning documents before the Command R+ grounded chat call
```python
import base64
import os

import cohere
import requests

COHERE_API_KEY = os.environ["COHERE_API_KEY"]
GLYPHWARD_KEY = os.environ["GLYPHWARD_API_KEY"]
SCAN_THRESHOLD = 65

co = cohere.Client(COHERE_API_KEY)


# ── Helper: scan a single image and return the Glyphward score ─────────────
def scan_image_bytes(image_bytes: bytes, source_hint: str = "cohere_rag_doc") -> dict:
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={
            "image": base64.b64encode(image_bytes).decode(),
            "source": source_hint,
        },
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()  # {"score": int, "scan_id": str, "signals": [...]}


# ── Helper: extract images embedded in a PDF (requires PyMuPDF) ────────────
def extract_pdf_images(pdf_bytes: bytes) -> list[bytes]:
    import fitz  # pip install PyMuPDF

    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    images = []
    for page in doc:
        for img_ref in page.get_images(full=True):
            xref = img_ref[0]  # first tuple item is the image's xref number
            base_image = doc.extract_image(xref)
            images.append(base_image["image"])
    return images


# ── Core gate: scan a document dict and return None if adversarial ─────────
def scan_document(doc: dict) -> dict | None:
    """
    'doc' is an item in the list passed to cohere chat(documents=[...]).
    Expected shape: {"title": str, "text": str, "image_bytes": bytes | None, ...}
    Returns the doc unchanged if safe, or None to signal removal.
    Fail-closed: if scanner is unreachable, treat as adversarial and remove.
    """
    image_bytes = doc.get("image_bytes")
    if not image_bytes:
        return doc  # Text-only document: no image to scan
    try:
        result = scan_image_bytes(image_bytes, source_hint="cohere_documents_param")
        score = result.get("score", 100)  # a missing score counts as worst case
        if score >= SCAN_THRESHOLD:
            print(
                f"[glyphward] Removed document '{doc.get('title', '?')}' "
                f"score={score} scan_id={result.get('scan_id')}"
            )
            return None  # Drop this document from the grounded context
    except Exception as exc:
        # Fail-closed: scanner unavailable, so remove the document to be safe
        print(f"[glyphward] Scanner error for '{doc.get('title', '?')}': {exc}; removing")
        return None
    return doc


# ── RAG pattern: retrieve, scan, then call Command R+ chat ─────────────────
def safe_cohere_rag_chat(query: str, raw_documents: list[dict]) -> str:
    """
    raw_documents: list of dicts with at least {"title": str, "text": str}
    and optionally {"image_bytes": bytes} for documents containing images.
    """
    # Scan each document; remove any that fail
    clean_documents = []
    for doc in raw_documents:
        safe_doc = scan_document(doc)
        if safe_doc is not None:
            # Strip the image_bytes key: Command R+ chat() takes text + metadata
            clean_doc = {k: v for k, v in safe_doc.items() if k != "image_bytes"}
            clean_documents.append(clean_doc)
    if not clean_documents:
        return "[All retrieved documents were removed by the security scan. Unable to answer.]"
    response = co.chat(
        model="command-r-plus",
        message=query,
        documents=clean_documents,
        # citation_quality="accurate" enables grounded citation generation
        citation_quality="accurate",
    )
    return response.text


# ── Embed v3 indexing gate: scan before adding to the vector store ─────────
def safe_embed_and_index(image_bytes: bytes, text: str, metadata: dict, index_fn) -> bool:
    """
    Call this before adding a multimodal document to a Cohere Embed v3 vector store.
    index_fn: your function that calls co.embed() and upserts into your vector DB.
    Returns True if indexed, False if blocked.
    """
    try:
        result = scan_image_bytes(image_bytes, source_hint="cohere_embed_v3_index")
        score = result.get("score", 100)
        if score >= SCAN_THRESHOLD:
            print(
                f"[glyphward] Blocked image from index: score={score} "
                f"scan_id={result.get('scan_id')} text_preview={text[:60]!r}"
            )
            return False
    except Exception as exc:
        print(f"[glyphward] Scanner error during indexing: {exc}; blocking")
        return False
    # Scanner passed: embed and index the document
    index_fn(image_bytes=image_bytes, text=text, metadata=metadata)
    return True


# ── Example: direct multimodal chat (image in user message) ───────────────
def safe_multimodal_chat(user_text: str, image_bytes: bytes) -> str:
    """
    Gate for the Command R+ multimodal chat API where the user uploads an image directly.
    """
    try:
        result = scan_image_bytes(image_bytes, source_hint="cohere_multimodal_chat")
        score = result.get("score", 100)
        if score >= SCAN_THRESHOLD:
            return f"[Image blocked by security scan. score={score}]"
    except Exception as exc:
        return f"[Image scan failed ({exc}). Image not sent to model.]"
    # Build the multimodal message for Command R+
    image_b64 = base64.b64encode(image_bytes).decode()
    response = co.chat(
        model="command-r-plus",
        message=user_text,
        # Pass image via the multimodal message content API.
        # The `images=` keyword and payload shape below are placeholders:
        # refer to https://docs.cohere.com/ for the current multimodal message shape.
        images=[{"type": "base64", "data": image_b64}],
    )
    return response.text
```
The pattern has two integration points. The document-param gate (safe_cohere_rag_chat) intercepts each document before it enters the documents= list of the chat() call. Removing a document is the right action — not redacting its text and keeping it — because Command R+ uses the full document structure (including its position in the list and any image annotations) to generate citations. A partially scrubbed document can still mislead the citation mechanism. The embed-time gate (safe_embed_and_index) blocks adversarial images before they enter the vector store, which is the more cost-efficient placement: one scan per document upload, amortised across every future retrieval. Run both for defence in depth. The embed-time gate blocks known-bad documents at ingestion. The retrieval-time gate catches anything that slipped through: documents indexed before the scanner was deployed, or images that scored just under the ingestion threshold and are rejected by a retrieval-time policy that uses a lower, stricter threshold.
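The two-gate arrangement, with a stricter threshold at retrieval time, can be sketched as a small policy function. It also covers a case the listing above leaves open: a document that yields several images (for example from extract_pdf_images) is judged by its worst image. The retrieval threshold of 50 and the names `STAGE_THRESHOLDS`, `worst_score`, and `allow_document` are assumptions made for illustration, and the scanner is passed in as a callable so the sketch runs without network access.

```python
from typing import Callable, Iterable

# Hypothetical per-stage thresholds: retrieval is stricter than ingestion.
STAGE_THRESHOLDS = {"ingest": 65, "retrieve": 50}


def worst_score(images: Iterable[bytes], scan_fn: Callable[[bytes], int]) -> int:
    """Return the highest (worst) score across all images, or 0 if there are none.

    Any scanner exception is treated as the maximum score (fail-closed).
    """
    worst = 0
    for image in images:
        try:
            worst = max(worst, scan_fn(image))
        except Exception:
            return 100
    return worst


def allow_document(
    images: Iterable[bytes], stage: str, scan_fn: Callable[[bytes], int]
) -> bool:
    """A document passes a stage only if its worst image is under that stage's threshold."""
    return worst_score(images, scan_fn) < STAGE_THRESHOLDS[stage]


# Stub scanner for illustration only: the score is simply the payload length.
def length_scan(image: bytes) -> int:
    return len(image)


images = [b"x" * 10, b"x" * 55]
print(allow_document(images, "ingest", length_scan))    # worst score 55 is under 65
print(allow_document(images, "retrieve", length_scan))  # worst score 55 is not under 50
```

Taking the maximum rather than the mean is a deliberate choice in this sketch: one adversarial image among many benign ones should still remove the document.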
Coverage matrix
| Defence layer | Embed v3 vector store (adversarial image retrieved as grounded source) | Command R+ direct image input (multimodal chat API) | Rerank 3 (adversarial image boosted to top-1 rank) | Cohere on Azure / Bedrock (cloud-integrated RAG pipeline) |
|---|---|---|---|---|
| Cohere content moderation endpoint | No — text only; does not inspect image bytes in embedded documents | No — text only; applies to generated output, not image input | No | No |
| Cohere grounding / citation layer | Partial — records which document was cited; does not inspect whether that document is adversarial | No — citation applies to RAG outputs, not direct image inputs | Partial — logs the top-ranked document; does not flag adversarial content | Partial — audit trail only; no content inspection |
| Cohere safety mode parameter | No — text-only content filter applied to model output | No — does not scan the image that produced the output | No | No |
| RAG re-ranker score threshold (filtering low-relevance docs) | No — filters irrelevant documents, not adversarial ones; adversarial image may score high on relevance | N/A — re-ranker is not in the direct chat path | No — a high rerank score is a relevance signal, not a safety signal | No |
| Glyphward scan at Embed indexing + documents param + multimodal chat | Yes — scan at embed time blocks adversarial images from the vector store | Yes — scan before the chat() call; block if score ≥ 65 | Yes — scan reranked documents before they are passed to Command R+ as grounded context | Yes — Glyphward scan applies at the Cohere SDK layer regardless of cloud host |
Related questions
Is Command R+ multimodal by default — do all deployments have an image attack surface?
There are two distinct multimodal paths to distinguish. The first is Embed v3 multimodal indexing: this is available to any deployment that uses Cohere's embedding API with images. If your pipeline extracts images from documents and embeds them into a vector store, the attack surface is active even if your Command R+ chat() call never receives an image directly — the adversarial image arrives as a grounded text-and-image document chunk. The second path is the Command R+ multimodal chat API, where image bytes are passed as message content. This path requires explicit feature use (the images parameter in the API call). If your deployment uses only text-formatted documents= parameters with no image bytes, this second path is not active. However, the Embed v3 path is active whenever multimodal indexing is used, which is a growing pattern in enterprise document corpora. Treat both paths as in-scope unless you have explicitly verified that no images flow through either.
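One way to check whether images actually flow through a pipeline is to sniff the byte fields of the documents it produces. The sketch below is a rough audit helper, not a security control: it only recognises PNG, JPEG, GIF, and WebP by their leading magic bytes, and the function names and the example document shape are hypothetical.

```python
from typing import Optional

# Leading "magic" bytes of common raster formats.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def sniff_image(data: bytes) -> Optional[str]:
    """Return a format name if the bytes start like a known image, else None."""
    for kind, magic in IMAGE_SIGNATURES.items():
        if data.startswith(magic):
            return kind
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def image_fields(doc: dict) -> dict[str, str]:
    """Map each bytes-valued key in a document dict to its detected image format."""
    found = {}
    for key, value in doc.items():
        if isinstance(value, (bytes, bytearray)):
            kind = sniff_image(bytes(value))
            if kind:
                found[key] = kind
    return found


doc = {"title": "Spec", "text": "Dimensions and weights", "thumb": b"\x89PNG\r\n\x1a\n" + b"\x00" * 8}
print(image_fields(doc))
```

Running a check like this over a sample of indexed chunks and chat payloads gives evidence for or against the claim that a deployment is text-only. It does not catch images that are base64-encoded inside string fields.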
Does Cohere's citation layer help detect or prevent adversarial image injection?
No. The citation layer tells you which grounded document produced each sentence in Command R+'s response — it traces the provenance of the model's text output back to a specific document in the documents= list. This is valuable for audit and hallucination detection. It does not tell you whether the cited document contained an adversarial image, and it cannot prevent injection because the injection has already occurred by the time the citation is generated. In fact, the citation layer may make an injection attack harder to detect: the model's injected output will be faithfully attributed to the poisoned document, making the attribution appear legitimate. Log the cited document IDs, but do not rely on the citation mechanism as a security control.
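Logging cited document IDs is more useful when each ID is tied back to the scan that cleared it. The following sketch assumes citations are available as plain dicts with a `document_ids` list (mirroring the fields a grounded chat response exposes) and that your application kept a mapping from document ID to Glyphward `scan_id`. The function name and the record layout are hypothetical.

```python
import json
from collections import Counter


def citation_audit_record(citations: list[dict], scan_ids: dict[str, str]) -> dict:
    """Summarise which documents were cited and which scan cleared each one."""
    counts = Counter(
        doc_id for c in citations for doc_id in c.get("document_ids", [])
    )
    return {
        "cited": [
            {"document_id": d, "spans": n, "scan_id": scan_ids.get(d)}
            for d, n in sorted(counts.items())
        ],
        # Cited documents with no recorded scan deserve a closer look.
        "unscanned": sorted(d for d in counts if d not in scan_ids),
    }


citations = [
    {"text": "supports 240 V input", "document_ids": ["doc_0", "doc_2"]},
    {"text": "two-year warranty", "document_ids": ["doc_0"]},
]
scan_ids = {"doc_0": "scan_abc"}

record = citation_audit_record(citations, scan_ids)
print(json.dumps(record, indent=2))
```

The record is an audit trail only. A document appearing under `unscanned` indicates a gap in the gate, not proof of an attack, and a document with a `scan_id` was judged safe at scan time, nothing more.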
What about self-hosted Cohere models via C4AI or the open-weights release?
Command R+ is available as an open-weights model through Cohere's C4AI community release (on Hugging Face and direct download). Self-hosted deployments running Command R+ on-premises or in a private cloud have the same grounded-source trust architecture as the hosted API — the documents= parameter passes documents into the model context with the same elevated trust. The only difference is that you control the full inference stack. There is no Cohere-side content moderation running in your on-premises setup: safety mode and the moderation endpoint are Cohere API features, not model weights features. A self-hosted Command R+ deployment therefore runs without any of those API-side filters, which makes the Glyphward scan gate even more important. The integration code above works identically for self-hosted deployments — swap the model identifier and call the same POST /v1/scan gate before populating documents=.
How is this page different from the general RAG pipeline page on Glyphward?
The RAG pipelines page covers the general case: any retrieve-then-generate pipeline where the retrieved document may contain image bytes that reach a vision-capable LLM. The mechanics of pre-ingestion scanning, retrieval-time scanning, and content-hash caching apply there. This page is Cohere-specific and covers the grounded-source trust escalation unique to Command R+'s RAG architecture. In a generic RAG pipeline, a retrieved document is context — it informs the model's answer but does not carry a formal trust designation. In Command R+'s grounded generation, the documents= list members are authoritative sources that the model is trained to treat with citation-level confidence. That trust elevation changes the severity of an adversarial image landing in a grounded document: it is not just context that might influence the answer, it is a source the model is designed to follow. The attack surface is the same bytes; the blast radius is wider.
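Content-hash caching, mentioned above, pairs naturally with retrieval-time scanning because the same image is often retrieved many times. A minimal sketch, assuming an in-memory dict is an acceptable store and that the scanner is a callable `scan_fn(bytes) -> int`; the class name `ScanCache` is hypothetical, and a production version would likely add expiry so that scanner updates take effect.

```python
import hashlib
from typing import Callable


class ScanCache:
    """Memoise scan scores by the SHA-256 digest of the image bytes."""

    def __init__(self, scan_fn: Callable[[bytes], int]):
        self._scan_fn = scan_fn
        self._scores: dict[str, int] = {}
        self.misses = 0  # number of times the underlying scanner was called

    def score(self, image: bytes) -> int:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._scores:
            self.misses += 1
            # If scan_fn raises, nothing is cached and the error propagates,
            # so the caller's fail-closed handling still applies.
            self._scores[key] = self._scan_fn(image)
        return self._scores[key]


# Stub scanner for illustration only: the score is the payload length.
cache = ScanCache(lambda image: len(image))
print(cache.score(b"abc"), cache.score(b"abc"), cache.score(b"abcd"))
print("scanner calls:", cache.misses)
```

Keying on a hash of the bytes rather than on a filename or document ID means a renamed or re-uploaded copy of the same image reuses the earlier verdict, while any change to the bytes forces a fresh scan.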
Does Glyphward integrate with Cohere's connector API?
Yes. Cohere's connector API allows Command R+ to retrieve documents from external sources at chat time — a connector can pull from a CRM, a file server, or any REST endpoint and return documents that are then passed directly into the grounded context as if they were in the documents= list. If a connector fetches documents from a source that returns images (an image-capable SharePoint connector, a Google Drive connector pulling slide decks, or a custom connector that returns product images alongside specs), those images enter the grounded context through the connector's return payload. The correct gate is inside the connector implementation: before the connector returns its document list, scan any image bytes in the returned documents with POST /v1/scan and remove documents that score at or above the threshold. The connector is the earliest point in the call chain where you have custody of the image bytes — earlier than the Command R+ model, earlier than the grounding layer, and before Cohere's platform handles the connector response.
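The connector-side gate can be sketched as a pure function that a connector handler calls on its result list just before returning. It assumes results are dicts and that any image travels in a hypothetical `image_b64` field as base64 text; the field name, the function name, and the stub scanner are illustrative and not part of Cohere's connector contract.

```python
import base64
from typing import Callable


def gate_connector_results(
    results: list[dict],
    scan_fn: Callable[[bytes], int],
    threshold: int = 65,
) -> list[dict]:
    """Return only the results that are safe to hand to the grounded context."""
    safe = []
    for item in results:
        encoded = item.get("image_b64")
        if encoded is None:
            safe.append(item)  # text-only result: nothing to scan
            continue
        try:
            image = base64.b64decode(encoded, validate=True)
            if scan_fn(image) >= threshold:
                continue  # adversarial: drop the whole result
        except Exception:
            continue  # fail-closed on undecodable payloads or scanner errors
        # Passed the scan: forward the result without the raw image payload.
        safe.append({k: v for k, v in item.items() if k != "image_b64"})
    return safe


# Stub scanner for illustration only: flags any payload containing b"evil".
def fake_scan(image: bytes) -> int:
    return 90 if b"evil" in image else 5


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode()


results = [
    {"title": "Spec sheet", "text": "Rated for 240 V", "image_b64": b64(b"ok-pixels")},
    {"title": "Promo", "text": "New colours", "image_b64": b64(b"evil-pixels")},
    {"title": "Notes", "text": "Meeting summary"},
    {"title": "Broken", "text": "Bad upload", "image_b64": "%%%not-base64%%%"},
]
print([r["title"] for r in gate_connector_results(results, fake_scan)])
```

Keeping the gate as a plain function makes it independent of the web framework that hosts the connector, and it can be unit-tested with a stub scanner as shown.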
Further reading
- Prompt-injection scanner for RAG pipelines — general retrieve-then-generate patterns; pre-ingestion vs retrieval-time placement and latency budgets.
- Multimodal prompt injection in agentic RAG pipelines — multi-hop and multi-agent RAG architectures where retrieved images can redirect agent actions.
- Prompt-injection scanner for AWS Bedrock Agents — Command R+ deployed via Bedrock model catalog; Knowledge Base and ActionGroup attack surfaces.
- Prompt-injection scanner for Azure AI Foundry — Command R+ as a managed serverless deployment via Azure AI Foundry; Prompt Flow and Inference SDK integration.
- Real-time vs batch prompt injection scanning — architecture guide for latency-sensitive Cohere deployments: when to scan at embed time, rerank time, or retrieval time.