Platform guide · Anthropic Claude Enterprise

Prompt injection scanner for Anthropic Claude Enterprise

The Claude API direct integration page covers the single-application pattern: your code controls every image that enters the messages.create() call. Claude Enterprise is a different deployment model. Enterprise adds shared project contexts (files and system prompts uploaded by any workspace member and visible to all subsequent conversations in that project), org-wide MCP server integrations (admin-configured tools that return data — potentially including images — to every user in the organisation), and SSO-linked multi-user workspaces where the trust boundary is the organisation domain rather than an individual API key. These features shift the injection surface from the API request level to the workspace and project level. An adversarial image uploaded to a shared project file by one employee enters the context of every Claude conversation in that project. An org-wide MCP tool that returns an image from a third-party service can inject instructions into any user's session. Glyphward's scan gate must be applied at the document-upload step and the MCP tool-output step to protect the entire workspace.

TL;DR

For Claude Enterprise, scan images at two points: (1) when files are uploaded to shared project contexts — before they are accepted into the project; (2) when MCP servers return image data to the workspace — in the MCP server handler before the result is forwarded to Claude. Use POST https://glyphward.com/v1/scan; reject images with score ≥ 65. Free tier — 10 scans/day, no card required.

Four attack surfaces specific to Claude Enterprise

1. Shared project file uploads. Claude Enterprise projects allow team members to upload files (PDFs, images, Word documents) that become part of the project's persistent context. Every conversation started in that project includes the project files. An employee — or a compromised employee account — can upload an image file containing typographic prompt injection that instructs Claude to exfiltrate subsequent conversation content, change its role, or output malicious content to other team members. The damage is organisational, not individual: one malicious upload affects all project participants until the file is removed and the project context is refreshed.

2. Admin-managed MCP server integrations. Claude Enterprise allows org admins to configure organisation-wide Model Context Protocol (MCP) server integrations that all users in the organisation can access. MCP tools can return any data type, including images. A third-party MCP server integration that fetches data from an external service (a design tool, a monitoring dashboard, a customer CRM) can return adversarial images as part of its tool output. Because the MCP server is admin-configured rather than user-configured, its outputs receive implicitly higher trust — but trust level does not correlate with image safety. Every image returned by an MCP tool should be scanned before Claude processes it.

3. Conversation attachments in SSO-linked sessions. In Claude Enterprise, users can attach images directly to individual conversations. Unlike shared project files, which affect every user of the project, a conversation attachment normally affects only the current session. However, in collaborative workflows where conversation links are shared across the organisation, an attachment uploaded by one user can affect another user who opens the shared link. The trust boundary is the organisation domain: any user with valid SSO credentials can open a shared conversation, including an attacker who has compromised one account through credential stuffing or phishing.

4. Custom system prompts with embedded images. Claude Enterprise allows admins to define custom system prompts for projects. If an admin's system prompt includes an image URL (via a Markdown image reference or a URL placeholder), Claude will fetch and process that image as part of the system prompt. An attacker who can modify an admin's system prompt (via a compromised admin account or a social engineering attack) can inject an adversarial image into the system prompt level — the highest-privilege injection point in the entire hierarchy.
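For surface 4, the scan has to start from the prompt text rather than from an uploaded file. The following is a minimal sketch, assuming system prompts reference images with Markdown syntax of the form ![alt](url) as described above. The function name and the regular expression are illustrative and are not part of any Glyphward or Anthropic API. The sketch only lists the referenced URLs so that each image can be fetched and sent to the scan endpoint at the admin configuration step.

```python
import re

# Matches Markdown image references of the form ![alt text](url "optional title").
# Illustrative pattern only: it does not handle parentheses inside URLs.
MARKDOWN_IMAGE_RE = re.compile(r'!\[[^\]]*\]\(\s*([^\s)]+)(?:\s+"[^"]*")?\s*\)')


def find_image_urls(system_prompt: str) -> list[str]:
    """Return the image URLs referenced in a system prompt, in order, without duplicates."""
    seen: set[str] = set()
    urls: list[str] = []
    for match in MARKDOWN_IMAGE_RE.finditer(system_prompt):
        url = match.group(1)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Ordinary Markdown links (without the leading exclamation mark) are ignored, since they are not image references.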

Integration: MCP server wrapper with scan gate (TypeScript / Node.js)

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";
import axios from "axios";

const GLYPHWARD_KEY = process.env.GLYPHWARD_API_KEY!;
const INJECTION_THRESHOLD = 65;

// Content blocks this wrapper can return to the MCP client.
type ToolContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

// Submit the image to the scan endpoint and return its score and scan ID.
async function scanImageBuffer(imageBuffer: Buffer, source: string): Promise<{score: number; scanId: string}> {
  const resp = await axios.post(
    "https://glyphward.com/v1/scan",
    { image: imageBuffer.toString("base64"), source },
    { headers: { Authorization: `Bearer ${GLYPHWARD_KEY}` }, timeout: 8000 }
  );
  return { score: resp.data.score, scanId: resp.data.scan_id };
}

// Return the image as a tool result only if it scans below the threshold.
async function safeImageToolResult(imageBuffer: Buffer, mimeType: string, source: string): Promise<ToolContent> {
  let scan: { score: number; scanId: string };
  try {
    scan = await scanImageBuffer(imageBuffer, source);
  } catch {
    // Fail-closed: scanner unreachable → redact
    return { type: "text", text: "[Image redacted: scan service unavailable]" };
  }

  if (scan.score >= INJECTION_THRESHOLD) {
    console.error(`MCP image redacted: score=${scan.score}, scan_id=${scan.scanId}, source=${source}`);
    return { type: "text", text: "[Image redacted: adversarial content detected]" };
  }

  return {
    type: "image",
    data: imageBuffer.toString("base64"),
    mimeType,
  };
}

// Placeholder: replace with your own screenshot capture logic.
async function fetchScreenshot(url: string): Promise<Buffer> {
  throw new Error(`fetchScreenshot is not implemented (requested ${url})`);
}

// Example: MCP tool that fetches a screenshot and scans it before returning
const server = new Server({ name: "safe-screenshot-tool", version: "1.0.0" }, { capabilities: { tools: {} } });

// The SDK registers handlers by request schema, not by method-name string.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "get_screenshot") {
    const { url } = request.params.arguments as { url: string };

    // Fetch the screenshot image (your implementation here)
    const screenshotBuffer = await fetchScreenshot(url);

    // Scan before returning to Claude
    const safeContent = await safeImageToolResult(screenshotBuffer, "image/png", `mcp_screenshot:${url}`);

    return {
      content: [safeContent],
    };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});

const transport = new StdioServerTransport();
await server.connect(transport);

# Python: Pre-upload scan gate for shared project file uploads
# (integrate with your internal file management system or Claude API wrapper)
import base64
import os
from pathlib import Path

import requests

GLYPHWARD_KEY = os.environ["GLYPHWARD_API_KEY"]
INJECTION_THRESHOLD = 65
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}

def scan_project_file_upload(file_path: str, uploader_user_id: str) -> bool:
    """
    Returns True if the file is safe to add to the project context.
    Returns False (and logs) if the file contains adversarial image content.
    """
    ext = Path(file_path).suffix.lower()
    if ext not in IMAGE_EXTENSIONS:
        # For PDFs and Word docs, extract images first (out of scope for this example)
        # For text-only files, pass through (text-only scanner handles these)
        return True

    with open(file_path, "rb") as f:
        image_bytes = f.read()

    try:
        resp = requests.post(
            "https://glyphward.com/v1/scan",
            json={
                "image": base64.b64encode(image_bytes).decode(),
                "source": f"enterprise_project_upload:{uploader_user_id}",
            },
            headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
            timeout=8,
        )
        resp.raise_for_status()
        result = resp.json()
    except Exception:
        # Fail-closed: scanner unreachable → reject the upload
        return False

    if result["score"] >= INJECTION_THRESHOLD:
        print(
            f"Project upload rejected: user={uploader_user_id}, file={file_path}, "
            f"score={result['score']}, scan_id={result['scan_id']}"
        )
        return False
    return True
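
The function above passes Word documents through without looking inside them. The following is a minimal sketch of the extraction step for .docx files, relying on the fact that a .docx file is a ZIP archive that stores embedded images under word/media/. The helper name extract_docx_images is hypothetical, the extension set simply mirrors the example above, and PDFs are not covered because they need a separate PDF library. Each returned image would still have to be sent to the scan endpoint.

```python
import zipfile
from pathlib import PurePosixPath

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}


def extract_docx_images(docx_path: str) -> list[tuple[str, bytes]]:
    """Return (archive_name, image_bytes) for each embedded image in a .docx file."""
    images: list[tuple[str, bytes]] = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Embedded media lives under word/media/ inside the archive.
            if not name.startswith("word/media/"):
                continue
            # Keep only entries whose extension is in the image set.
            if PurePosixPath(name).suffix.lower() not in IMAGE_EXTENSIONS:
                continue
            images.append((name, archive.read(name)))
    return images
```

A stricter gate could reject the whole document as soon as any one embedded image scores at or above the threshold, which matches the fail-closed behaviour of the upload example.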

Coverage matrix

Defence layer	Shared project file upload	MCP tool image return	Conversation attachment	Admin system prompt image
Anthropic's built-in content moderation (Claude.ai)	Partial: policy violations (CSAM etc.) only, not prompt injection	No: MCP tool output not inspected	Partial: policy violations only	No
SSO / identity access controls	Prevents unauthorised workspace access; does not inspect content	No	No	No
MCP server allowlist (admin-configured)	N/A	Prevents unauthorised MCP servers; does not inspect their output	N/A	N/A
Glyphward scan at upload + MCP output	Yes — scan before project file accepted	Yes — scan in MCP server handler	Yes — scan at upload API	Yes — scan system prompt images at admin config step

Related questions

How is this different from the Claude API direct integration?

When you call the Claude API directly (anthropic.messages.create()), your application code controls every message, image, and tool result that enters the request. You can interpose a scan gate in your code before making the API call. In Claude Enterprise, users interact with Claude through a web or desktop interface, and shared project contexts/MCP tools create injection surfaces that exist outside your application's request-response cycle. The scan gate must be implemented at the document management and MCP server levels rather than at the API request level. See the Claude API direct integration page for the request-level pattern.
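To make the contrast concrete, here is a minimal sketch of a request-level gate for the direct API case. It assumes the Messages API representation of an inline image as a content block with type "image" and a base64 source. The function name gate_messages and the scan callable (a stand-in for a call to the scan endpoint that returns a score) are hypothetical.

```python
from typing import Any, Callable

INJECTION_THRESHOLD = 65
REDACTED = {"type": "text", "text": "[Image redacted: adversarial content detected]"}


def gate_messages(
    messages: list[dict[str, Any]],
    scan: Callable[[str], float],
) -> list[dict[str, Any]]:
    """Return a copy of messages with flagged base64 image blocks replaced by text."""
    gated = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            # Plain-string content carries no image blocks.
            gated.append(message)
            continue
        new_blocks = []
        for block in content:
            source = block.get("source", {})
            # Only inline base64 images are scanned in this sketch; other
            # source types are passed through untouched.
            is_inline_image = block.get("type") == "image" and source.get("type") == "base64"
            if is_inline_image and scan(source["data"]) >= INJECTION_THRESHOLD:
                new_blocks.append(dict(REDACTED))
            else:
                new_blocks.append(block)
        gated.append({**message, "content": new_blocks})
    return gated
```

If the scan callable raises, the exception propagates and the request is never sent, so the gate fails closed. No equivalent single choke point exists in Claude Enterprise, which is why the gate moves to the upload and MCP layers.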

Are MCP server images scanned by Anthropic's content filters?

As of 2026, Anthropic's Claude.ai content moderation covers policy-violating content (CSAM, explicit content, etc.). It does not perform adversarial pixel-level injection detection. MCP tool outputs are passed to Claude as tool results; the trust model treats them as structured data from a configured integration, not as untrusted user uploads. An adversarial image in an MCP tool result therefore receives higher implicit trust than a user upload, making it a particularly effective attack vector. The scan gate in the MCP server handler is the correct defence layer.

What should we do if an adversarial image is found in an existing shared project file?

Immediately remove the file from the project context. Note that in Claude Enterprise, project context is refreshed per conversation — removing the file prevents it from appearing in future conversations but does not affect conversations already in progress. Audit the conversation logs for the period after the file was uploaded to identify any sessions that may have been affected (look for unusual tool calls, unexpected output formats, or responses that acknowledge instructions from an image). Notify affected users if your incident response policy requires it. Change the compromised user's SSO credentials if the upload was not authorised.
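The audit step amounts to selecting the conversations that could have loaded the file. The following is a minimal sketch, assuming you can export conversation records that carry a project identifier and a start timestamp. The ConversationRecord fields are hypothetical and do not reflect any actual Claude Enterprise export format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConversationRecord:
    # Hypothetical export schema; adapt to whatever your audit logs provide.
    conversation_id: str
    project_id: str
    started_at: datetime


def conversations_to_audit(
    records: list[ConversationRecord],
    project_id: str,
    uploaded_at: datetime,
    removed_at: datetime,
) -> list[ConversationRecord]:
    """Select conversations in the project that started while the file was present."""
    exposed = [
        r for r in records
        if r.project_id == project_id and uploaded_at <= r.started_at < removed_at
    ]
    # Oldest first, so reviewers can follow the timeline from the upload onward.
    return sorted(exposed, key=lambda r: r.started_at)
```

Because project context is loaded per conversation, the window runs from the upload time to the removal time; conversations that started inside it may still be in progress and should be reviewed as well.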

Does this apply to Claude for Work (the Team plan) as well?

Claude Team (the plan below Enterprise) also supports shared project contexts and conversation sharing. The shared project file attack surface applies to Team in the same way as Enterprise. The MCP server attack surface applies to the extent that users configure personal MCP integrations in their workspace. Admin-managed org-wide MCP integrations are primarily an Enterprise feature. The scan gate pattern described above (project file upload scan + MCP server handler scan) applies to both plans.
