ICP-by-platform · Azure OpenAI Service

Prompt-injection scanner for Azure OpenAI

Azure OpenAI Service gives your team a managed deployment of GPT-4o, GPT-4o-mini, and GPT-4V (vision-preview) inside your Azure tenant. Azure AI Content Safety — including Azure Prompt Shields — integrates natively with these deployments and detects text-based direct and indirect prompt injection attempts in the messages[].content string fields. It does not inspect the pixel bytes of an image_url content block for FigStep-class typographic jailbreak payloads or AgentTypo-class glyph distortions. When a user submits an image alongside a text prompt, the image bytes travel to the GPT-4o vision encoder without any Azure-side PI scan touching them. Scan those bytes in your application before the chat completion call.

TL;DR

Before calling client.chat.completions.create() on your Azure OpenAI deployment, scan every image_url content block. POST the image bytes to Glyphward's /v1/scan — if the score exceeds your threshold, reject the request server-side before it reaches Azure. One POST, under 200 ms, returns a 0–100 risk score and the flagged pixel region. Free tier: 10 scans/day, no card. Pro: 100,000/month at $29/mo. Start on the free tier.

What Azure Prompt Shields cover — and where they stop

Azure Prompt Shields (part of Azure AI Content Safety, generally available in 2024) performs two checks on each inference request:

User prompt attack detection. The text in the user's messages[] is analysed for direct injection attempts — phrases that try to override the system prompt, jailbreak the model, or elicit disallowed outputs through text alone.
Document attack detection. When you pass retrieved context to the model as a document or tool output, Prompt Shields can detect injected instructions that an attacker embedded in that document text.

Both checks operate on text representations. An image_url content block passes a URL or a base64-encoded image — a blob of bytes, not a text string. Azure Prompt Shields does not decode that blob and search it for rendered instruction text. The typographic prompt injection payload that walks past Prompt Shields is not a text-format attack that text analysis could catch — it is an instruction rendered as pixels, legible to the GPT-4o vision encoder and invisible to any text-path scanner.

Azure Computer Vision (the imageanalysis API) can generate text captions and OCR transcripts for images. The OCR path has the same structural ceiling as all text-extraction-before-scan architectures: FigStep and AgentTypo attacks produce a clean or ambiguous OCR transcript while remaining clearly legible to the vision model's neural encoder. Scanning the OCR output, rather than the original image bytes, covers the wrong artefact.

The Python SDK intercept — openai + Azure endpoint

Azure OpenAI uses the same OpenAI Python SDK with an Azure endpoint and API key (or Entra ID credential). The scan intercept is a wrapper around the chat completion call that walks the messages list before dispatch:

import httpx
import base64
import os
from openai import AzureOpenAI

GLYPHWARD_API_KEY = os.environ["GLYPHWARD_API_KEY"]
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. https://my-resource.openai.azure.com
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
AZURE_DEPLOYMENT_NAME = "gpt-4o"  # your deployment name

azure_client = AzureOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY,
    api_version="2024-05-01-preview",
)

def _scan_image_url_block(block: dict, label: str) -> None:
    """Scan an image_url content block. Raises if score exceeds threshold."""
    url_val = block.get("image_url", {}).get("url", "")
    if url_val.startswith("data:image"):
        # Base64 data URI
        b64_data = url_val.split(",", 1)[1]
        img_bytes = base64.b64decode(b64_data)
    else:
        # Remote URL: fetch bytes first
        img_bytes = httpx.get(url_val, timeout=10).content

    resp = httpx.post(
        "https://api.glyphward.com/v1/scan",
        json={
            "data": base64.b64encode(img_bytes).decode(),
            "modality": "image",
            "source_trust": "low",
        },
        headers={"Authorization": f"Bearer {GLYPHWARD_API_KEY}"},
        timeout=5,
    )
    result = resp.json()
    if result["score"] > 70:
        raise ValueError(
            f"{label}: multimodal PI score {result['score']} "
            f"(region: {result.get('region')})"
        )

def safe_azure_chat_completion(messages: list, **kwargs) -> object:
    """Send a chat completion to Azure OpenAI after scanning all image_url blocks."""
    for msg in messages:
        content = msg.get("content", [])
        if isinstance(content, list):
            for i, block in enumerate(content):
                if isinstance(block, dict) and block.get("type") == "image_url":
                    _scan_image_url_block(block, f"{msg['role']}[{i}]")

    return azure_client.chat.completions.create(
        model=AZURE_DEPLOYMENT_NAME,
        messages=messages,
        **kwargs,
    )

Replace your existing azure_client.chat.completions.create() calls with safe_azure_chat_completion(). The added latency is ~150–200 ms per image block. GPT-4o's own inference latency is typically 1–4 seconds for multimodal prompts, so the scan overhead is well within user-acceptable response time.

Azure AI Search + Azure OpenAI RAG: the indirect-PI surface

A common Azure architecture feeds Azure AI Search (formerly Cognitive Search) results into an Azure OpenAI chat completion — the "own-data" pattern supported by Azure OpenAI's data_sources extension. When the documents in Azure AI Search include PDFs with embedded images, those image pages may contain typographic PI payloads that the search indexer's text extraction misses and that arrive in the model's context as trusted retrieved content.

The pre-ingestion scan pattern applies here too: before a document enters your Azure AI Search index, extract its embedded images and scan each one. A flagged document should be quarantined before indexing. The scan evidence (scan ID, document hash, decision) is the provenance record for your ISO 27001 A.8.28 input-validation obligation and your OWASP LLM03 dataset-provenance trail.

Running Prompt Shields and Glyphward together

Azure Prompt Shields and Glyphward cover non-overlapping attack surfaces. The recommended architecture uses both in the request handling path:

Glyphward scan (client-side): For each image_url block in the user's message, POST bytes to /v1/scan. If score > threshold, return HTTP 400 to the user. This gate runs before the Azure API call.
Azure Prompt Shields (Azure-side): Enable via the Content Safety configuration on your Azure OpenAI deployment or call POST /contentsafety/text:shieldPrompt on the text fields. This catches text-format injection in the same request.

Neither tool substitutes for the other. A text-only injection in the user message is caught by Prompt Shields; a pixel-layer injection in the attached image is caught by Glyphward. The per-request Glyphward scan log (scan_id, score, region, modality) integrates with Azure Monitor via webhook — forward it to the same Log Analytics workspace as your Azure Content Safety logs for a consolidated compliance evidence trail.

For teams pursuing SOC 2 Type II or ISO 27001:2022 certification, the combined log stream satisfies the per-request evidence requirement for CC6.6 and A.8.28 across both text and image modalities in a single audit artifact.

Get early access

Related questions

Does Azure Prompt Shields scan image_url content blocks?

No. Azure Prompt Shields operates on the text content of messages — the string fields in the user's input and retrieved document context. An image_url content block passes bytes (base64 or a URL referencing image bytes), not text. Those bytes are not analysed by Prompt Shields. The Azure Computer Vision image analysis API can generate OCR text from images, but OCR output is the wrong inspection artefact for FigStep-class attacks, which produce a clean OCR transcript while the vision model reads the injected instruction from the pixel layer.

What about Azure's image moderation API — does that help?

Azure AI Content Safety's image analysis detects violence, sexual content, hate/fairness content, and self-harm content categories. These are content-policy categories for user-generated content moderation. A FigStep prompt injection payload on a white background scores 0 on all four categories — it is not violent, sexual, hateful, or self-harm content. It is an instruction rendered as pixels. Content moderation and prompt-injection detection are different functions that address different threat models.

Does this apply to Azure AI Studio / Azure AI Foundry deployments?

Yes. Azure AI Foundry (previously Azure AI Studio) deploys models through the same Azure OpenAI Service endpoint. The image_url content block format and the gap in Prompt Shields coverage are identical regardless of whether you access the model through the Azure OpenAI SDK, the Azure AI Inference SDK, or the REST API directly. The Glyphward scan intercept pattern is the same.

What if I use Entra ID (Azure AD) credentials instead of an API key?

The Glyphward scan call uses your Glyphward API key, not your Azure credentials. The Azure OpenAI client can use either an API key or an AzureADTokenProvider for Entra ID auth — that credential is passed to the Azure call after the scan. The scan itself runs in your application, using only the image bytes, and does not interact with Azure identity at all.

How is this different from the Azure Prompt Shields alternative page?

The Azure Prompt Shields alternative page focuses on teams not on Azure who want multimodal PI scanning without any Azure dependency. This page focuses on teams who are building on Azure OpenAI and want to add image-layer PI scanning as a complement to Prompt Shields — keeping both tools and closing the modality gap Prompt Shields leaves open.