Prompt injection scanner for Mistral Pixtral
Mistral released Pixtral-12B as an open-weight multimodal model: it can be downloaded, fine-tuned, and self-hosted with no dependency on Mistral's infrastructure. Pixtral Large is available via la Plateforme (Mistral's cloud API). Both are gaining traction in EU-regulated deployments such as European fintechs processing document images, healthcare companies running OCR on patient-submitted photos, and legal-tech platforms analysing scanned contracts. In these workloads, GDPR data-residency requirements make US-based cloud models a compliance risk.
The critical difference from cloud-only models is that self-hosted Pixtral-12B deployments have no cloud safety layer. Mistral's API content filters, audit logging, and abuse monitoring exist only on la Plateforme. On a self-hosted vLLM or Ollama deployment, the model output goes directly to your application with no intermediary inspection, so your application is the entire safety stack. Glyphward provides the multimodal injection detection layer that the self-hosted inference engine does not.
TL;DR
For Pixtral deployments — whether self-hosted (vLLM, Ollama) or via la Plateforme — scan every image before it reaches the model. Use POST https://glyphward.com/v1/scan; reject images with score ≥ 65. For self-hosted deployments, the scan gate is your only automated defence against multimodal injection. Free tier — 10 scans/day, no card required.
Four attack surfaces specific to Pixtral deployments
1. Self-hosted Pixtral-12B with no cloud safety layer. When you run Pixtral-12B via vLLM, Ollama, or a custom inference server, you receive raw model output — no content filters, no prompt-injection heuristics, no abuse monitoring. An adversarial image that instructs the model to ignore its system prompt, output a specific string, or claim a false identity passes directly to your application logic. This is the highest-risk configuration because the blast radius is entirely determined by what your application does with the model output. A document-extraction pipeline that writes to a database, a customer-service bot that updates CRM records, or a legal-review tool that generates compliance reports — all are directly affected by injected model output with no cloud-side backstop.
2. La Plateforme batch API for EU document processing. Mistral's cloud API (la Plateforme) offers a batch endpoint for high-volume document processing. EU-based applications use this to process large volumes of customer-submitted images while keeping data within EU infrastructure. At scale, the same multiplication effect as any high-throughput pipeline applies: a processing job that handles 50,000 invoice images per day provides 50,000 injection opportunities per day. Mistral's API includes content-moderation flagging for harmful content categories; it does not perform adversarial prompt-injection detection for the image layer.
3. LangChain/LlamaIndex integrations using ChatMistralAI with vision. Python developers building RAG or agent pipelines with LangChain's ChatMistralAI or LlamaIndex's MistralAI LLM wrapper pass images as content blocks inside a HumanMessage. These integrations are particularly common in EU deployments where teams want a GDPR-compliant alternative to OpenAI's GPT-4o. The image content block bypasses any text-level input sanitisation middleware in the chain, because LangChain's callback system and input guards operate on text, not binary image data. The scan gate must therefore be applied to the raw image bytes before the HumanMessage is constructed.
4. GDPR on-prem deployments processing patient or client document images. Healthcare and legal-tech teams run Pixtral-12B on-prem to avoid transferring patient photos or client documents to third-party clouds. These deployments have the strongest case for a scan gate: the compliance posture that motivates the on-prem architecture also implies that user-submitted images should be inspected for adversarial content before they influence AI-generated outputs such as legal findings, clinical summaries, or compliance assessments. A patient who uploads a photo containing a typographic injection aimed at a clinical pipeline can cause the model to output fabricated diagnoses. The on-prem architecture cannot defer this check to a cloud safety layer.
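Item 3 above says the gate has to run on the raw bytes before any message object exists. The following is a minimal sketch of that ordering, not a definitive integration. `build_guarded_content` and `ImageRejected` are hypothetical names; the scanner is passed in as a callable that returns a dict with `score` and `scan_id`; and the content-block layout mirrors the `image_url` blocks used elsewhere in this guide, which is assumed to be what your LangChain version accepts in `HumanMessage(content=...)`.

```python
import base64
from typing import Callable

INJECTION_THRESHOLD = 65  # same cut-off as the TL;DR


class ImageRejected(Exception):
    """Raised when the scan score meets or exceeds the threshold (hypothetical name)."""


def build_guarded_content(
    image_bytes: bytes,
    user_text: str,
    scan_fn: Callable[[bytes], dict],
    mime_type: str = "image/jpeg",
) -> list[dict]:
    """Scan raw bytes first; only build content blocks if the image passes."""
    result = scan_fn(image_bytes)
    # A missing score is treated as a rejection (fail-closed).
    score = result.get("score", 100)
    if score >= INJECTION_THRESHOLD:
        raise ImageRejected(f"score={score}, scan_id={result.get('scan_id')}")
    b64 = base64.b64encode(image_bytes).decode()
    return [
        {"type": "text", "text": user_text},
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{b64}"}},
    ]
```

The returned list would then be passed as the `content` of a `HumanMessage`. Because the exception is raised before any message is built, a rejected image never enters the chain.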
Integration: vLLM-hosted Pixtral scan gate (Python)
import base64
import os

import requests
from openai import OpenAI  # vLLM serves an OpenAI-compatible endpoint

GLYPHWARD_KEY = os.environ["GLYPHWARD_API_KEY"]
INJECTION_THRESHOLD = 65
VLLM_BASE_URL = os.environ.get("VLLM_BASE_URL", "http://localhost:8000/v1")

# vLLM client using OpenAI-compatible endpoint
pixtral_client = OpenAI(base_url=VLLM_BASE_URL, api_key="dummy")


def scan_image(image_bytes: bytes, source: str) -> dict:
    try:
        resp = requests.post(
            "https://glyphward.com/v1/scan",
            json={"image": base64.b64encode(image_bytes).decode(), "source": source},
            headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
            timeout=8,
        )
        resp.raise_for_status()
        return resp.json()
    except Exception:
        return {"score": 100, "scan_id": None}  # Fail-closed


def safe_pixtral_call(
    image_bytes: bytes,
    source: str,
    system_prompt: str,
    user_text: str,
    model: str = "mistralai/Pixtral-12B-2409",
) -> str | None:
    """
    Scan an image, then call Pixtral if safe.
    Returns the model response text, or None if the image was rejected.
    """
    scan = scan_image(image_bytes, source)
    if scan["score"] >= INJECTION_THRESHOLD:
        print(
            f"Pixtral call rejected: source={source}, "
            f"score={scan['score']}, scan_id={scan['scan_id']}"
        )
        return None

    b64_image = base64.b64encode(image_bytes).decode()
    response = pixtral_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                    {"type": "text", "text": user_text},
                ],
            },
        ],
        max_tokens=1024,
    )
    return response.choices[0].message.content


# La Plateforme (Mistral cloud API): same scan gate, different client
from mistralai import Mistral

mistral_cloud = Mistral(api_key=os.environ["MISTRAL_API_KEY"])


def safe_pixtral_large_call(
    image_bytes: bytes,
    source: str,
    user_text: str,
) -> str | None:
    scan = scan_image(image_bytes, source)
    if scan["score"] >= INJECTION_THRESHOLD:
        return None

    b64_image = base64.b64encode(image_bytes).decode()
    response = mistral_cloud.chat.complete(
        model="pixtral-large-latest",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                    },
                    {"type": "text", "text": user_text},
                ],
            }
        ],
    )
    return response.choices[0].message.content
Coverage matrix
| Defence layer | Self-hosted Pixtral-12B | La Plateforme batch API | LangChain integration | On-prem GDPR deployment |
|---|---|---|---|---|
| Mistral API content moderation | Not present (self-hosted has no cloud filter) | Harm-category flagging only; no adversarial prompt-injection detection | Only if using the la Plateforme endpoint | Not present (on-prem has no cloud filter) |
| vLLM / Ollama inference server | Routes requests to the model; no content inspection | N/A | N/A | Routes requests to the model; no content inspection |
| GDPR data residency controls | Control data location, not adversarial image content | Control data location, not content | Control data location, not content | Control data location, not content |
| Glyphward scan gate (pre-model) | Yes: your only automated injection defence on self-hosted | Yes: scan before batch submission | Yes: scan image bytes before HumanMessage construction | Yes: can be deployed in an EU region for data residency compliance |
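For the batch column, scanning before batch submission amounts to partitioning the job before anything is uploaded. The sketch below assumes a scanner callable with the same return shape as `scan_image` above (`score`, `scan_id`). `partition_batch` and the `(name, image_bytes)` item shape are hypothetical; rejected items are kept with their scan IDs so they can be reviewed rather than silently dropped.

```python
from typing import Callable, Iterable

INJECTION_THRESHOLD = 65


def partition_batch(
    items: Iterable[tuple[str, bytes]],
    scan_fn: Callable[[bytes], dict],
    threshold: int = INJECTION_THRESHOLD,
) -> tuple[list[tuple[str, bytes]], list[dict]]:
    """Split (name, image_bytes) pairs into accepted items and rejection records."""
    accepted: list[tuple[str, bytes]] = []
    rejected: list[dict] = []
    for name, image_bytes in items:
        try:
            result = scan_fn(image_bytes)
            score = result.get("score", 100)
        except Exception:
            # Fail-closed: a scanner error counts as a rejection.
            result, score = {}, 100
        if score >= threshold:
            rejected.append(
                {"name": name, "score": score, "scan_id": result.get("scan_id")}
            )
        else:
            accepted.append((name, image_bytes))
    return accepted, rejected
```

Only the `accepted` list would go on to the batch endpoint; the `rejected` records give a per-image audit trail for the job.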
Related questions
Does Glyphward send image data outside the EU?
By default, glyphward.com/v1/scan routes to the nearest available endpoint. For EU data-residency requirements (GDPR Article 44 restrictions on international transfers), contact us about the EU-region endpoint at eu.glyphward.com/v1/scan. Image data sent for scanning is not retained after the scan result is returned — see the privacy policy for data processing details.
Does this apply to Mixtral and other Mistral text-only models?
Mixtral-8x7B, Mistral-7B, and other text-only Mistral models do not process image inputs. The injection surface described here applies only to Pixtral-12B and Pixtral Large, which accept image_url content blocks. For text-only Mistral models, standard text-level prompt injection defences apply (input sanitisation, system prompt hardening, output validation).
Can we run the Glyphward scanner on-prem alongside our self-hosted Pixtral?
On-prem deployment of the Glyphward scanner is available on the Team plan and above. Contact us for the Docker image and deployment guide. The on-prem scanner exposes the same API interface as the cloud endpoint, so the only change is the scanner address: set the GLYPHWARD_BASE_URL environment variable or, if your code hardcodes https://glyphward.com/v1/scan as the integration example above does, replace that URL with your on-prem host. The on-prem scanner includes the same CLIP embedding, Tesseract OCR, and YOLO text-in-image detection stack as the cloud service.
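One way to make that switch a pure configuration change is to resolve the scan URL from the environment at start-up. This is a sketch only: `resolve_scan_url` is a hypothetical helper, and it assumes the on-prem scanner serves the same `/v1/scan` path under whatever host `GLYPHWARD_BASE_URL` names, which should be confirmed against the deployment guide.

```python
import os
from collections.abc import Mapping

DEFAULT_BASE_URL = "https://glyphward.com"  # cloud host used in this guide
SCAN_PATH = "/v1/scan"  # assumed to be identical on-prem


def resolve_scan_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the scan URL from GLYPHWARD_BASE_URL, defaulting to the cloud host."""
    base = env.get("GLYPHWARD_BASE_URL", "").strip() or DEFAULT_BASE_URL
    if not base.startswith(("http://", "https://")):
        raise ValueError(f"GLYPHWARD_BASE_URL must include a scheme: {base!r}")
    return base.rstrip("/") + SCAN_PATH
```

With a helper like this, `scan_image` would post to `resolve_scan_url()` instead of a hardcoded address, and setting `GLYPHWARD_BASE_URL=https://eu.glyphward.com` would select the EU-region host mentioned earlier.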
How does Pixtral handle images differently from GPT-4o Vision?
Both models process images via a visual encoder before producing tokens. The key difference is that Pixtral-12B is open-weight: you can inspect, fine-tune, and quantize the model, but you are also responsible for all safety controls. GPT-4o Vision is reached through OpenAI's API, which includes content moderation at the API layer. Self-hosted Pixtral-12B is reached through your vLLM or Ollama server, which performs no content moderation. The Glyphward scan gate fills this gap for self-hosted deployments regardless of model choice.
Further reading
- Vision-language model security overview — attack taxonomy across all VLM architectures.
- Real-time vs batch prompt injection scanning — architecture guide for synchronous vs asynchronous scan patterns.
- Prompt injection prevention best practices — full defence-in-depth stack including self-hosted model considerations.
- Multimodal AI security checklist — 40-point binary checklist for security reviews.
- Multimodal LLM security API — Glyphward API overview.