
OWASP LLM08: Vector and Embedding Weaknesses — the multimodal dimension

OWASP LLM08:2025 (Vector and Embedding Weaknesses) covers attacks on the retrieval layer of RAG systems: poisoning embedding stores so that adversarial content surfaces as top-k context, exploiting semantic similarity drift to redirect queries, and abusing access-control gaps in multi-tenant vector stores. The commonly cited LLM08 mitigations (input validation, embedding anomaly detection, access controls on collections) are framed around text-only embedding pipelines. The multimodal dimension is largely undiscussed: an adversarial image can be crafted to occupy a top-k retrieval slot in a CLIP-based or other multimodal embedding index while also carrying a prompt-injection payload in its pixel content. When that image is retrieved as context for a vision-language model, the embedded instructions reach the model as trusted retrieval context rather than as user input. This trust escalation is what makes indirect prompt injection via retrieval more dangerous than direct user-input injection. Text-only embedding scanners, metadata filters, and output monitors do not inspect the pixel payload of an image sitting in a top-k retrieval slot. Glyphward's scan gate at the ingestion point is designed to keep adversarially crafted images out of the vector index in the first place.

TL;DR

For every image that flows into a multimodal embedding pipeline (CLIP, Cohere Embed v3, OpenAI text embeddings over GPT-4o image descriptions, custom VLM encoders), call POST https://glyphward.com/v1/scan with the image bytes before computing the embedding and writing to the vector store. Reject and quarantine images with score >= 65. Scanning at ingestion is the control that stops LLM08-class embedding poisoning via adversarial image content before it reaches the index; metadata filters, access controls, and output monitors all act after ingestion, so they cannot remove a payload from an image that is already indexed. Free tier: 10 scans/day, no card required.

The four LLM08 multimodal attack surfaces in embedding pipelines

1. CLIP-based multimodal vector stores — adversarial images in top-k retrieval. CLIP (Contrastive Language-Image Pre-training) and its successors (SigLIP, OpenCLIP, EVA-CLIP) map images and text into a shared embedding space, enabling cross-modal semantic search: a text query can retrieve relevant images, and an image can retrieve relevant text chunks. Multimodal RAG pipelines built on Pinecone, Weaviate, Qdrant, or Milvus with CLIP embeddings are exposed to a two-layer attack. First, an adversarial image can be crafted to have a CLIP embedding that positions it near the top-k results for high-value queries — for example, queries about the system's operational parameters, admin credentials, or escalation procedures. Second, the same image carries an adversarially embedded natural-language instruction in its pixel content. When a CLIP-indexed RAG system retrieves this image as context for a VLM-powered response, the model receives the image both as a high-relevance "authoritative source" (per the retrieval ranking) and as a direct instruction channel (per the pixel payload). This compound attack exploits LLM08's trust-escalation dynamic — retrieved context is implicitly trusted more than user input — while routing the injection through a visual channel that text-based embedding anomaly detectors cannot examine.

2. Multimodal embedding poisoning via document ingestion pipelines. Enterprise document ingestion pipelines commonly embed PDF page images, product photographs, technical diagrams, and scanned forms alongside their textual content, creating multimodal vector stores where each document chunk is represented by both a text embedding and an image embedding. Pipelines built with LlamaIndex's MultiModalVectorStoreIndex, LangChain's multimodal document loaders, or custom pipelines using OpenAI's text-embedding-3-large with image descriptions generated by GPT-4o before embedding are all exposed to ingestion-time poisoning. An adversarial document — a supplier PDF with a manipulated invoice page, a product image with pixel-level instructions, a technical manual with an adversarially crafted diagram — enters the pipeline through a legitimate ingestion path (email attachment, file upload, S3 batch job) and is embedded into the vector store as a trusted enterprise document. The embedded injection then persists in the index indefinitely, surfacing in relevant queries without any ongoing attacker presence in the system. Unlike credential-based attacks, embedding poisoning leaves no authentication log trail — it is visible only at the content level of the index entries themselves.

3. Cross-modal retrieval injection (text queries retrieving adversarial images). A distinctive LLM08 multimodal attack targets the cross-modal retrieval path: a crafted image is positioned in the CLIP embedding space to match common operational text queries, so that when a legitimate user submits a text question to the RAG system, the adversarial image surfaces as a top-k result alongside legitimate text chunks. If the VLM generating the response processes both text chunks and image context from the retrieval set, the adversarial image's pixel payload reaches the model in the same retrieval batch as authoritative text, and the model has no signal distinguishing the adversarial image from a legitimate retrieved image. This attack is particularly effective when the RAG system's system prompt instructs the model to "use all retrieved context to answer the question," because that instruction tells the model to act on every retrieved item, including the adversarial image, with the authority the system prompt delegates to retrieved context. Preventing this attack requires scanning images at ingestion rather than at query time, because by query time the image is already embedded and retrievable.

4. Multimodal embedding model fine-tuning data poisoning. Teams that fine-tune custom CLIP or multimodal embedding models on domain-specific data (product imagery, medical scans, legal document images) to improve retrieval precision are exposed to training-data poisoning as an LLM08 vector. If the fine-tuning dataset contains adversarially crafted images (introduced through a compromised data labelling pipeline, a poisoned data vendor, or a misconfigured upload workflow), the resulting embedding model may systematically place those images near certain query embeddings in its learned space, creating a persistent backdoor in the embedding model itself. Unlike index-level poisoning, model-level poisoning survives re-indexing: even if the vector store is rebuilt from scratch, the fine-tuned embedding model will continue to place adversarial images near target queries. Scanning the training corpus for adversarial image content before fine-tuning is the most direct mitigation. Validating the resulting model's retrieval behaviour after the fact is insufficient on its own, because the adversarial positioning may only activate for the specific queries the attacker has targeted.
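The retrieval mechanics behind surfaces 1 and 3 can be illustrated with a toy ranking. The sketch below is not a CLIP pipeline: the three-dimensional vectors, the file names, and the example query are hypothetical stand-ins for real embeddings. It only shows that top-k selection looks at vector similarity and nothing else, so an image whose embedding has been optimised to sit next to a query outranks a legitimate document regardless of what its pixels say.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k):
    """Return the ids of the k index entries most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical toy vectors standing in for CLIP embeddings.
query = [0.9, 0.1, 0.0]  # e.g. the text query "escalation procedure"
index = [
    ("runbook_page.png", [0.8, 0.3, 0.1]),    # legitimate, relevant
    ("office_photo.png", [0.1, 0.9, 0.2]),    # legitimate, irrelevant
    ("crafted_image.png", [0.9, 0.12, 0.0]),  # optimised to sit next to the query
]

result = top_k(query, index, k=2)
print(result)
```

The crafted entry takes the first slot because its vector is almost parallel to the query. Nothing in the ranking step has access to the image content, which is why the content check has to happen before the vector is written.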

Integration: multimodal ingestion pipeline with Glyphward LLM08 scan gate

import base64
import requests
from pathlib import Path
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
import clip
import torch

GLYPHWARD_KEY = "<your-glyphward-api-key>"
GLYPHWARD_THRESHOLD = 65

QDRANT_URL = "<your-qdrant-instance-url>"
COLLECTION_NAME = "multimodal_docs"

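client = QdrantClient(url=QDRANT_URL)
clip_model, clip_preprocess = clip.load("ViT-B/32")

# Create the collection on first run; CLIP ViT-B/32 image embeddings are 512-dimensional.
if not client.collection_exists(COLLECTION_NAME):
    client.create_collection(
        collection_name=COLLECTION_NAME,
        vectors_config=VectorParams(size=512, distance=Distance.COSINE),
    )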


def scan_image_for_injection(image_bytes: bytes) -> dict:
    """Scan image for LLM08 multimodal embedding poisoning before CLIP encoding."""
    encoded = base64.b64encode(image_bytes).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": "vector_store_ingestion"},
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()


def ingest_image_to_vector_store(
    image_path: str,
    metadata: dict,
    doc_id: str,
) -> bool:
    """
    LLM08-safe multimodal ingestion: scan before embedding, quarantine on rejection.
    Returns True if ingested, False if quarantined.
    """
    image_bytes = Path(image_path).read_bytes()

    # LLM08 gate: scan before computing embedding
    try:
        scan = scan_image_for_injection(image_bytes)
    except Exception as exc:
        # Fail-closed: scanner unavailable means we cannot safely ingest
        print(f"[QUARANTINE] {image_path}: scanner unavailable — {exc}")
        log_quarantine(doc_id, image_path, reason="scanner_unavailable")
        return False

    if scan["score"] >= GLYPHWARD_THRESHOLD:
        print(
            f"[QUARANTINE] {image_path}: adversarial content detected "
            f"(score={scan['score']}, scan_id={scan['scan_id']})"
        )
        log_quarantine(doc_id, image_path, reason="adversarial_detected", scan=scan)
        return False

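    # Safe: compute CLIP embedding and upsert to Qdrant
    from PIL import Image
    import io
    import uuid

    pil_image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    # clip.load() picks CUDA when available, so move the input to the model's device
    device = next(clip_model.parameters()).device
    image_input = clip_preprocess(pil_image).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = clip_model.encode_image(image_input)
        embedding = image_features[0].float().cpu().tolist()

    # Qdrant point ids must be unsigned integers or UUIDs, so derive a stable UUID from doc_id
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, doc_id))
    client.upsert(
        collection_name=COLLECTION_NAME,
        points=[
            PointStruct(
                id=point_id,
                vector=embedding,
                payload={
                    **metadata,
                    "doc_id": doc_id,
                    "scan_id": scan["scan_id"],
                    "scan_score": scan["score"],
                },
            )
        ],
    )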
    return True


def log_quarantine(doc_id: str, path: str, reason: str, scan: dict = None) -> None:
    """Write quarantine record — integrate with your SIEM or audit log."""
    import json, datetime
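    record = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),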
        "doc_id": doc_id,
        "path": path,
        "reason": reason,
        "scan": scan,
    }
    with open("quarantine.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

The key LLM08 insight in this integration is that the scan gate runs before the CLIP embedding is computed, not after. Post-embedding inspection cannot prevent an adversarial image from occupying a misleading position in the embedding space: the damage is done the moment the embedding is written to the vector store. The scan_id is stored as payload metadata on every ingested point, providing an audit trail that can be queried during a security review to verify that all indexed images passed Glyphward scanning. The quarantine.jsonl log feeds your SIEM or security incident workflow for adversarial-content tracking.
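The audit described above can be sketched as a small check over point payloads. This is a minimal illustration, not part of the Glyphward or Qdrant APIs: the function name audit_points and the input shape (a mapping from point id to payload dict) are assumptions, and the payload keys scan_id and scan_score are the ones written by the ingestion code above.

```python
def audit_points(payloads: dict, threshold: int = 65) -> list:
    """Return ids of points that lack scan evidence or meet/exceed the threshold.

    `payloads` maps point id -> payload dict (hypothetical input shape).
    """
    failures = []
    for point_id, payload in payloads.items():
        scan_id = payload.get("scan_id")
        score = payload.get("scan_score")
        missing_evidence = not scan_id or not isinstance(score, (int, float))
        if missing_evidence or score >= threshold:
            failures.append(point_id)
    return failures


# Hypothetical payloads as they might be exported from the collection.
sample = {
    "a": {"scan_id": "s1", "scan_score": 4},
    "b": {"scan_score": 4},                    # never scanned: no scan_id
    "c": {"scan_id": "s3", "scan_score": 80},  # scored above the gate
    "d": {"scan_id": "s4"},                    # score missing
}
print(audit_points(sample))
```

In practice the payloads would come from paging through the collection, for example with the Qdrant client's scroll call and with_payload enabled. Any id the check returns points at an entry that bypassed the gate or was written by an older pipeline version.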

LLM08 coverage matrix: multimodal embedding attack surfaces

Defence layer	CLIP index poisoning	Document ingestion poisoning	Cross-modal retrieval injection	Fine-tuning data poisoning
Vector store access controls (Pinecone RBAC, Qdrant collections)	No — controls who can write, not what is written	No	No — restricts query access, not retrieval content	N/A
Metadata filtering and namespace isolation	No — filters post-ingestion; adversarial image already indexed	No	No	N/A
Embedding anomaly detection (statistical outlier detection)	Partial — may flag statistical outliers but not semantically crafted adversarial embeddings	Partial	No	No
Output monitoring (LLM output classifiers)	No — detects after injection executes, not before	No	No	No
Text-only PI scanners (Lakera Guard, LLM Guard, Azure Prompt Shields)	No — cannot inspect pixel content of image embeddings	No	No	No
Glyphward image scan at ingestion	Yes — scans pixel content before CLIP embedding is computed	Yes — scans document page images before indexing	Yes — adversarial images blocked before reaching the index	Yes — scans training corpus images before fine-tuning
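For the fine-tuning column of the matrix, one way to make "scan the corpus before fine-tuning" enforceable is to record a content hash for every image that passed the scan and have the training job refuse files that are not in that record. The sketch below assumes a caller-supplied scan_fn that returns a dict with a "score" key (for instance a wrapper around the scan call shown earlier); the function names and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def build_clean_manifest(corpus_dir: str,
                         scan_fn: Callable[[bytes], dict],
                         threshold: int = 65) -> dict:
    """Map relative path -> SHA-256 for every file that scores below threshold.

    Scanner errors propagate, so a failed scan aborts the build (fail closed).
    """
    root = Path(corpus_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if scan_fn(data)["score"] < threshold:
            manifest[path.relative_to(root).as_posix()] = hashlib.sha256(data).hexdigest()
    return manifest


def verify_before_training(corpus_dir: str, manifest: dict) -> list:
    """Return manifest entries whose bytes changed or vanished since the scan."""
    root = Path(corpus_dir)
    stale = []
    for rel_path, digest in manifest.items():
        path = root / rel_path
        if not path.is_file() or hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            stale.append(rel_path)
    return stale
```

The training loader would read only the paths listed in the manifest and call verify_before_training first. Hashing closes the gap between scan time and training time: a file swapped after it was scanned no longer matches its recorded digest.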

Related questions

How does LLM08 multimodal differ from LLM01 Prompt Injection?

OWASP LLM01:2025 (Prompt Injection) describes the direct mechanism: an adversarial instruction is embedded in the input and the model executes it. LLM08:2025 (Vector and Embedding Weaknesses) describes the retrieval-layer exploitation: the attack positions itself in the vector store so that it is retrieved as context during a later query, at which point the injection mechanism (LLM01) activates. In the multimodal context, an adversarial image is the carrier for both: it occupies a top-k retrieval slot (LLM08 attack surface) and its pixel content carries the injection payload (LLM01 mechanism). LLM08 mitigations address the retrieval positioning; LLM01 mitigations address the injection execution. You need both: LLM08-layer scanning at ingestion (Glyphward) and LLM01-layer scanning at query time for any images that arrive through paths other than the indexed vector store (direct user uploads, real-time image URLs).

Does Pinecone, Weaviate, or Qdrant provide built-in protection against LLM08 multimodal attacks?

No. Pinecone, Weaviate, Qdrant, Milvus, and pgvector are vector storage and retrieval systems. They provide efficient approximate nearest-neighbour (ANN) search, access control, and metadata filtering, but they do not inspect the content of the images behind the vectors they index. They have no concept of "adversarial pixel content," and their write paths store whatever vector and payload the client sends. The responsibility for content validation before ingestion therefore falls on the ingestion pipeline code. This is analogous to how a traditional database stores whatever strings the application inserts: sanitising input is the application's responsibility. Glyphward fills this gap at the ingestion pipeline layer, before the embedding is computed and the point is upserted.
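Because validation sits in pipeline code, a multi-page document needs a decision rule as well as a per-image scan. The sketch below shows one possible rule, under stated assumptions: scan_fn is a caller-supplied function returning a dict with a "score" key, GateResult and gate_document are hypothetical names, and the policy of rejecting the whole document when any single page is flagged is a design choice, not something the vector store or the scanner prescribes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GateResult:
    accepted: bool
    flagged_pages: List[int] = field(default_factory=list)


def gate_document(page_images: List[bytes],
                  scan_fn: Callable[[bytes], dict],
                  threshold: int = 65) -> GateResult:
    """Scan every page image; reject the whole document if any page is flagged."""
    flagged = []
    for page_no, image_bytes in enumerate(page_images, start=1):
        try:
            score = scan_fn(image_bytes)["score"]
        except Exception:
            # Fail closed: a page that cannot be scanned counts as flagged.
            flagged.append(page_no)
            continue
        if score >= threshold:
            flagged.append(page_no)
    return GateResult(accepted=not flagged, flagged_pages=flagged)
```

Rejecting at document level avoids indexing the clean pages of a file that an attacker has demonstrably tampered with. A pipeline that prefers page-level granularity could index only the unflagged pages, at the cost of trusting the rest of a suspicious document.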

What is the relationship between LLM08 and the Cohere Embed v3 multimodal attack surface?

Cohere Embed v3 with multimodal input (the embed-english-v3.0 model accepting images alongside text) is a specific implementation of the LLM08 multimodal attack surface. When an adversarial image is embedded using Embed v3 and stored in a vector index, it inherits the same top-k retrieval manipulation risk as CLIP-embedded images, with the additional risk that Cohere's grounded generation in Command R+ treats retrieved content as trusted sources with citation weight. The Cohere-specific attack surface (adversarial image in Embed v3 index retrieved as top-1 grounded source in a Command R+ response) is covered in detail in our Cohere Command R+ scanner page; this LLM08 page covers the vulnerability class across all multimodal embedding models.

Can scanning at ingestion miss adversarial images that are synthesised inside the pipeline?

Yes. If an attacker can influence an image-synthesis step inside the ingestion pipeline itself (for example, by controlling parameters of a screenshot capture, a diagram rendering, or an image transformation step), they may be able to introduce adversarial content after the scan gate. The correct mitigation for pipeline-internal synthesis is to scan at the point immediately before the embedding is computed, not at the raw input ingestion point. The code example above scans the image bytes in the same function that decodes them and calls clip_model.encode_image(), with no pipeline step in between, so it catches adversarial content regardless of how those bytes were produced, whether from a file upload, a URL fetch, a screenshot capture, or a rendering step. If your pipeline transforms images before embedding (resizing, format conversion, OCR pre-processing), scan the transformed image bytes, not the raw input, to catch any adversarial content that may have been introduced or preserved through the transformation.
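The ordering this answer recommends (transform first, then scan, then embed the same bytes) can be sketched with injected callables. The names embed_with_late_scan, transform, scan_fn, and embed_fn are hypothetical; in a real pipeline they would wrap your image transformation, the scan request, and the embedding model respectively.

```python
from typing import Callable, Optional, Sequence


def embed_with_late_scan(raw_bytes: bytes,
                         transform: Callable[[bytes], bytes],
                         scan_fn: Callable[[bytes], dict],
                         embed_fn: Callable[[bytes], Sequence[float]],
                         threshold: int = 65) -> Optional[Sequence[float]]:
    """Transform, scan the transformed bytes, then embed those same bytes.

    Returns None when the scan flags the image so the caller can quarantine it.
    """
    final_bytes = transform(raw_bytes)
    if scan_fn(final_bytes)["score"] >= threshold:
        return None
    # The embedder receives exactly the bytes that were scanned.
    return embed_fn(final_bytes)
```

Passing one final_bytes value to both the scanner and the embedder is the point of the design: no step can alter the image between the check and the embedding.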
