Platform guide · Mistral Pixtral

Prompt injection scanner for Mistral Pixtral

Mistral released Pixtral-12B as an open-weight multimodal model — it can be downloaded, fine-tuned, and self-hosted with no dependency on Mistral's infrastructure. Pixtral Large is available via la Plateforme (Mistral's cloud API). Both are gaining traction specifically in EU-regulated deployments: European fintech processing document images, healthcare companies running OCR on patient-submitted photos, and legal-tech platforms analysing scanned contracts — workloads where data residency under GDPR makes US-based cloud models a compliance risk. The critical difference from cloud-only models is that Pixtral-12B self-hosted deployments have no cloud safety layer. Mistral's API content filters, audit logging, and abuse monitoring exist only at la Plateforme. On a self-hosted vLLM or Ollama deployment, the model output goes directly to your application with no intermediary inspection. Your application is the entire safety stack. Glyphward provides the multimodal injection detection layer that the self-hosted model engine does not.

TL;DR

For Pixtral deployments — whether self-hosted (vLLM, Ollama) or via la Plateforme — scan every image before it reaches the model. Use POST https://glyphward.com/v1/scan; reject images with score ≥ 65. For self-hosted deployments, the scan gate is your only automated defence against multimodal injection. Free tier — 10 scans/day, no card required.

Four attack surfaces specific to Pixtral deployments

1. Self-hosted Pixtral-12B with no cloud safety layer. When you run Pixtral-12B via vLLM, Ollama, or a custom inference server, you receive raw model output — no content filters, no prompt-injection heuristics, no abuse monitoring. An adversarial image that instructs the model to ignore its system prompt, output a specific string, or claim a false identity passes directly to your application logic. This is the highest-risk configuration because the blast radius is entirely determined by what your application does with the model output. A document-extraction pipeline that writes to a database, a customer-service bot that updates CRM records, or a legal-review tool that generates compliance reports — all are directly affected by injected model output with no cloud-side backstop.

2. La Plateforme batch API for EU document processing. Mistral's cloud API (la Plateforme) offers a batch endpoint for high-volume document processing. EU-based applications use this to process large volumes of customer-submitted images while keeping data within EU infrastructure. At scale, the same multiplication effect as any high-throughput pipeline applies: a processing job that handles 50,000 invoice images per day provides 50,000 injection opportunities per day. Mistral's API includes content-moderation flagging for harmful content categories; it does not perform adversarial prompt-injection detection for the image layer.

3. LangChain/LlamaIndex integrations using ChatMistral with vision. Python developers building RAG or agent pipelines with LangChain's ChatMistralAI or LlamaIndex's MistralAI LLM wrapper pass images as HumanMessage content blocks. These integrations are particularly common in EU deployments where teams want a GDPR-compliant alternative to OpenAI's GPT-4o. The image content block bypasses any text-level input sanitisation middleware in the chain — LangChain's callback system and input guards operate on text tokens, not binary image data. The scan gate must be applied to the raw image bytes before constructing the HumanMessage.

4. GDPR on-prem deployments processing patient or client document images. Healthcare and legal-tech deployments running Pixtral-12B on-prem to avoid transferring patient photos or client documents to third-party clouds face the strongest argument for a scan gate: the same GDPR compliance posture that drives the on-prem deployment requires that user-submitted images be inspected for adversarial content before they influence AI-generated outputs (legal findings, clinical summaries, compliance assessments). A patient who uploads a photo containing typographic injection targeting a clinical NLP pipeline can cause the model to output fabricated diagnoses. The on-prem architecture cannot defer this to a cloud safety layer.

Integration: vLLM-hosted Pixtral scan gate (Python)

import base64, os, requests
from openai import OpenAI  # vLLM serves an OpenAI-compatible endpoint

GLYPHWARD_KEY = os.environ["GLYPHWARD_API_KEY"]
INJECTION_THRESHOLD = 65
VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")

# vLLM client using OpenAI-compatible endpoint
pixtral_client = OpenAI(base_url=VLLM_BASE_URL, api_key="dummy")


def scan_image(image_bytes: bytes, source: str) -> dict:
    try:
        resp = requests.post(
            "https://glyphward.com/v1/scan",
            json={"image": base64.b64encode(image_bytes).decode(), "source": source},
            headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
            timeout=8,
        )
        resp.raise_for_status()
        return resp.json()
    except Exception:
        return {"score": 100, "scan_id": None}  # Fail-closed


def safe_pixtral_call(
    image_bytes: bytes,
    source: str,
    system_prompt: str,
    user_text: str,
    model: str = "mistralai/Pixtral-12B-2409",
) -> str | None:
    """
    Scan an image, then call Pixtral if safe.
    Returns the model response text, or None if the image was rejected.
    """
    scan = scan_image(image_bytes, source)

    if scan["score"] >= INJECTION_THRESHOLD:
        print(
            f"Pixtral call rejected: source={source}, "
            f"score={scan['score']}, scan_id={scan['scan_id']}"
        )
        return None

    b64_image = base64.b64encode(image_bytes).decode()
    response = pixtral_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                    {"type": "text", "text": user_text},
                ],
            },
        ],
        max_tokens=1024,
    )
    return response.choices[0].message.content


# La Plateforme (Mistral cloud API) — same scan gate, different client
from mistralai import Mistral

mistral_cloud = Mistral(api_key=os.environ["MISTRAL_API_KEY"])


def safe_pixtral_large_call(
    image_bytes: bytes,
    source: str,
    user_text: str,
) -> str | None:
    scan = scan_image(image_bytes, source)
    if scan["score"] >= INJECTION_THRESHOLD:
        return None

    b64_image = base64.b64encode(image_bytes).decode()
    response = mistral_cloud.chat.complete(
        model="pixtral-large-latest",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                    {"type": "text", "text": user_text},
                ],
            }
        ],
    )
    return response.choices[0].message.content

Get early access

Coverage matrix

Defence layer	Self-hosted Pixtral-12B	La Plateforme batch API	LangChain integration	On-prem GDPR deployment
Mistral API content moderation	Not present — self-hosted has no cloud filter	Harm-category flagging — not adversarial PI	Only if using la Plateforme endpoint	Not present — on-prem has no cloud filter
vLLM / Ollama inference server	Routes requests to model — no content inspection	N/A	N/A	Routes requests to model — no content inspection
GDPR data residency controls	Controls data location — not adversarial image content	Controls data location — not content	Controls data location — not content	Controls data location — not content
Glyphward scan gate (pre-model)	Yes — your only automated injection defence on self-hosted	Yes — scan before batch submission	Yes — scan image bytes before HumanMessage construction	Yes — can be deployed in EU region for data residency compliance

Prompt injection scanner for Mistral Pixtral

TL;DR

Four attack surfaces specific to Pixtral deployments

Integration: vLLM-hosted Pixtral scan gate (Python)

Coverage matrix

Related questions

Further reading