Prompt injection scanner for HuggingFace Spaces

HuggingFace Spaces lets anyone deploy a Gradio or Streamlit application as a publicly accessible web endpoint, backed by a hosted VLM (LLaVA, IDEFICS, InternVL, Phi-Vision, Qwen-VL, or any model from the Hub). What began as a demo-sharing platform has become an informal production API layer: hundreds of companies, independent developers, and academic projects route real user image uploads through Spaces-backed endpoints. The security model of Spaces was designed for demos and reproducibility, not production deployments: the app.py source code is publicly visible in the Space repository, which means system prompts and instruction templates are readable by anyone; Gradio's gr.Image component accepts any valid image file without content inspection; and HuggingFace's platform-level controls (organisation access settings, Space visibility, Secrets for API keys) address authentication and model access, not adversarial image content. A user submitting an adversarially crafted image to a Spaces-backed VLM endpoint bypasses every platform control because the platform has no mechanism for inspecting pixel-level prompt injection payloads. Glyphward provides the pre-VLM scan gate that must be added in application code — in the Gradio fn callback or the Streamlit page handler — before any image is passed to the hosted VLM.

TL;DR

In any HuggingFace Space that accepts user image uploads and passes them to a VLM (via the Inference API, a local pipeline, or a direct API call to an external provider), add a Glyphward scan in the Gradio callback or Streamlit handler before the model call. Reject images with score >= 65. If your Space's app.py is public, your system prompt is already readable: use Secrets for any API keys, but treat your system prompt as known to adversaries when designing the threat model. Free tier: 10 scans/day, no card required.

The four multimodal attack surfaces in HuggingFace Spaces

1. Public system prompt via visible app.py — adversary-informed injection targeting. Every HuggingFace Space's source code is visible in the Space's "Files" tab unless the Space is explicitly set to private. The app.py (Gradio) or streamlit_app.py (Streamlit) file contains the complete application logic — including the system prompt passed to the VLM, the instruction template structure, and any tool or function definitions wired to the model. An adversary who wants to craft an effective prompt injection payload against a specific Space-backed application starts by reading the app.py to learn the exact system prompt, the model name and version, the Gradio component types (which tells them what image preprocessing is applied), and any output validation logic. This gives the adversary complete knowledge of the instruction surface — they can craft an adversarial image that matches the model's context exactly and targets the specific output format or downstream action that the Space is designed to produce. For Spaces promoted beyond demo status, treat the system prompt as a publicly known input specification when designing the adversarial threat model.

2. Gradio image component — permissive file handler with no content inspection. Gradio's gr.Image component accepts any image file that parses as a valid image format (PNG, JPEG, WebP, GIF, BMP). It applies optional preprocessing (resize, normalise, convert to RGB) controlled by the component configuration, but it performs no content-level inspection. The image is passed to the Gradio callback function as a PIL Image object or a numpy array — at which point the application code is responsible for what happens next. Most Spaces pass this image directly to the VLM pipeline with no intermediate validation. A user submitting an adversarially crafted image — a typographically injected invoice, a FigStep-style instruction image, a steganographically encoded payload — passes through the Gradio file handler without any interception. The Gradio gr.Image component does not have a validate hook for content-level checks; validation must be added explicitly in the callback function before the model call.

3. Spaces-backed inference used as production API — security mismatches from demo-to-production drift. The HuggingFace Spaces API allows programmatic invocation of any Space via the gradio_client Python library or direct HTTP POST to the Space's API endpoint. Developers who start with a Space demo and then build a production application on top of it often call the Space's API from their backend service, treating it as a managed inference endpoint. This creates a security mismatch: the Space was built as a demo with the assumption that users interact with it via the Gradio UI (which provides some user-context friction); the production application bypasses the UI and sends images programmatically with no friction at all. Any image validation logic that existed in the UI layer (file size limits, format checks enforced by the browser) disappears in API-mode usage. The Space's app.py may have no input validation because it was written expecting demo users, not production volumes of programmatic submissions from untrusted external sources.

4. Multi-Space pipelines — adversarial image propagation across chained Spaces. Complex AI applications on HuggingFace sometimes chain multiple Spaces: a first Space pre-processes or classifies an image, a second Space performs VLM analysis, a third Space post-processes the output. Each Space in the chain may be owned by a different user or organisation. An adversarial image that passes the first Space's processing without detection may propagate to subsequent Spaces in the chain with elevated implicit trust — earlier-stage outputs are treated as pre-validated by downstream Spaces. If the first Space applies any transformation to the image (resize, format conversion, colour normalisation), the adversarial payload must survive that transformation — many typographic injection and steganographic injection techniques are designed to survive standard image preprocessing. The multi-Space pipeline pattern has no single point of content inspection unless explicitly added by the pipeline orchestrator.
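
One way to give a chained pipeline a single point of content inspection is to have the orchestrator scan the image before the first stage and again after every stage that returns a transformed image. The sketch below is an illustration under assumptions, not an existing API: `run_pipeline`, `ScanRejected`, and the `scan` and stage callables are hypothetical names, and `scan` stands in for whatever function returns a 0-100 score for image bytes. It only models image-to-image stages; a final stage that returns text would sit after the loop.

```python
from typing import Callable, Iterable

# Hypothetical orchestrator-level gate; all names here are illustrative.
Scan = Callable[[bytes], int]      # returns a 0-100 risk score for image bytes
Stage = Callable[[bytes], bytes]   # one image-transforming step (e.g. a Space call)


class ScanRejected(Exception):
    """Raised when an image scores at or above the threshold."""


def run_pipeline(image: bytes, stages: Iterable[Stage], scan: Scan,
                 threshold: int = 65) -> bytes:
    """Scan the input, then re-scan the output of every stage before passing it on."""
    def gate(data: bytes, where: str) -> None:
        score = scan(data)
        if score >= threshold:
            raise ScanRejected(f"{where}: score {score} >= {threshold}")

    gate(image, "input")
    for index, stage in enumerate(stages):
        image = stage(image)
        # Do not assume an upstream stage validated or neutralised the image.
        gate(image, f"after stage {index}")
    return image
```

Re-scanning after each stage costs one scan per hop, but it removes the implicit trust that downstream stages would otherwise place in upstream output.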

Integration: Gradio callback with Glyphward pre-scan gate

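import base64
import io
import os

import requests
import gradio as gr
from PIL import Image

# Read the key from the Space's Secrets, which Spaces exposes as environment variables.
# Do not hard-code the key in app.py: the file is public.
GLYPHWARD_KEY = os.environ["GLYPHWARD_KEY"]
GLYPHWARD_THRESHOLD = 65  # reject at or above this score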


def scan_image_pil(pil_image: Image.Image, source: str = "huggingface_spaces") -> dict:
    """Convert PIL image to bytes and scan for adversarial PI."""
    buf = io.BytesIO()
    pil_image.save(buf, format="PNG")
    image_bytes = buf.getvalue()
    encoded = base64.b64encode(image_bytes).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": source},
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()


def analyse_image(user_image: Image.Image, user_question: str) -> str:
    """
    Gradio callback: scan image BEFORE passing to VLM.
    Add this scan gate to any gr.Image callback in your Space's app.py.
    """
    if user_image is None:
        return "Please upload an image."

    try:
        scan = scan_image_pil(user_image)
        score = scan["score"]
    except Exception:
        # Fail closed: reject if the scanner is unreachable or its response is malformed.
        # Exception details are not echoed back to the user.
        return "Security scan unavailable. Please try again."

    if score >= GLYPHWARD_THRESHOLD:
        return (
            "Image rejected: adversarial content detected "
            f"(score {score}/100, scan ID {scan.get('scan_id', 'n/a')}). "
            "Contact support if you believe this is a false positive."
        )

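    # Scanner passed: proceed with the VLM call.
    # Replace the block below with your actual model inference call. This placeholder
    # captions the image and ignores user_question. In a real Space, build the pipeline
    # once at module level; constructing it here reloads the model on every request.
    from transformers import pipeline as hf_pipeline
    vlm = hf_pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")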
    result = vlm(user_image)
    caption = result[0]["generated_text"] if result else "No caption generated."

    return f"Analysis: {caption}"


demo = gr.Interface(
    fn=analyse_image,
    inputs=[gr.Image(type="pil", label="Upload image"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Result"),
    title="Image Analysis (PI-scanned)",
    description="Images are scanned for adversarial prompt injection before analysis.",
)

if __name__ == "__main__":
    demo.launch()

The scan gate runs inside the Gradio callback before the VLM call: any image that scores at or above the threshold returns an error string to the Gradio UI without reaching the model. The fail-closed pattern on scanner unavailability (the except block returning an error rather than falling through to the model) is correct for production use: a scanner outage should not silently disable PI protection. For Spaces used as API backends via gradio_client, the same scan gate applies, because the analyse_image function is called regardless of whether the invocation comes from the Gradio UI or a programmatic API client. Store GLYPHWARD_KEY in the Space's Secrets (not in app.py) to prevent exposure in the public repository.
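
Where a Space has several image callbacks, the gate can be factored into a decorator so that each callback gets the same fail-closed behaviour. This is a sketch under assumptions: `with_scan_gate` is a hypothetical helper (not part of Gradio or of any Glyphward client library), `scan_fn` is any callable that returns a dict with a numeric `score` key, and the wrapped callback is assumed to take the image as its first argument.

```python
import functools
from typing import Any, Callable


# Hypothetical helper; the name and messages are illustrative.
def with_scan_gate(scan_fn: Callable[[Any], dict], threshold: int = 65):
    """Wrap a callback so its first argument (the image) is scanned first."""
    def decorator(callback: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(callback)
        def wrapper(image: Any, *args: Any, **kwargs: Any) -> str:
            if image is None:
                return "Please upload an image."
            try:
                score = scan_fn(image)["score"]
            except Exception:
                # Fail closed: an outage or malformed response blocks the call.
                return "Security scan unavailable. Please try again."
            if score >= threshold:
                return f"Image rejected (score {score}/100)."
            return callback(image, *args, **kwargs)
        return wrapper
    return decorator
```

A callback would then be declared as `@with_scan_gate(my_scan_fn)` above its `def`, and passed to `gr.Interface` as usual; the wrapped function is what both UI and API invocations reach.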

Coverage matrix

Defence layer	Public system prompt (app.py visible)	Gradio image component (file upload)	Spaces API (programmatic invocation)	Multi-Space chained pipelines
HuggingFace Space visibility (private/public)	Partial — private Spaces hide app.py; most demo-origin Spaces are public	No — visibility controls access; does not inspect image content	Partial — private Spaces require HF token; does not inspect image content	No — per-Space visibility settings are independent
HuggingFace Secrets (env var protection)	Yes — API keys hidden; system prompt still visible in app.py if public	No — Secrets protect credentials; not content inspection	No	No
Gradio gr.Image preprocessing (resize, normalise)	N/A	No — preprocessing normalises format/size; does not detect adversarial pixel content	No	No — transformations may degrade but typically do not eliminate adversarial payloads
HuggingFace Inference API output filters	N/A	No — output moderation targets harmful content; not adversarial instruction detection	No	No
Glyphward pre-VLM scan in Gradio callback	Partial — scan does not hide system prompt; compensates by blocking payloads crafted against known prompts	Yes — scan gate in callback intercepts adversarial images before model call	Yes — scan gate applies to all invocation paths (UI and API) because it is in the callback function	Yes — add scan gate at pipeline orchestrator layer or in each Space's own callback

Related questions

How is HuggingFace Spaces different from HuggingFace Inference Endpoints for this threat?

HuggingFace Inference Endpoints (covered in the Inference Endpoints page) are managed, production-grade deployment targets for Hub models — they support custom containers, persistent VMs, private VPC networking, and per-endpoint scaling controls. Spaces are application hosting for Gradio and Streamlit UIs — they're designed for demos, model showcases, and interactive tools. In practice, many teams use Spaces-backed APIs as informal Inference Endpoints because they're free to public users and require no infrastructure setup. The attack surface differs: Spaces expose the application source code (app.py) and have less isolation between the UI and API layers; Inference Endpoints provide a clean REST API with more predictable input handling. Both require Glyphward scan gates for any image input passed to a VLM, but the integration point differs — in app.py callback for Spaces, in application code calling the endpoint URL for Inference Endpoints.

Does making a Space private protect against prompt injection?

Making a Space private prevents public access and hides the app.py source code from anyone who has not been granted access, which addresses the system-prompt-disclosure risk for public Spaces. It does not protect against users who have been granted access to the Space and submit adversarially crafted images. For internal company Spaces accessed by employees, the relevant threat model shifts from external adversaries to compromised credentials and insider threats. For Spaces used as APIs, authenticated API callers who discover the endpoint can still submit adversarial images. Private visibility is a meaningful security improvement but is not sufficient as a PI defence; Glyphward scan gates in the application code are required regardless of Space visibility setting.

Does the Gradio framework version affect the vulnerability?

The core vulnerability — image content passed to a VLM without pixel-level PI scanning — exists in all Gradio versions because it is a property of the application design, not a Gradio library bug. Newer Gradio versions add features like file upload whitelisting (allowed MIME types) and size limits that reduce the attack surface for certain payload delivery mechanisms, but they do not inspect pixel content for adversarial instructions. Keeping Gradio updated addresses known security issues in the framework itself (including some file upload and XSS vulnerabilities in earlier versions documented in the Gradio changelog) but does not provide PI protection. The scan gate must be added explicitly in the application callback regardless of Gradio version.
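
Size and format checks can also be enforced in the callback itself, so that they hold for programmatic callers as well as UI users. The following is a minimal sketch, not a Gradio feature: the function names, the 5 MB limit, and the set of accepted formats are illustrative assumptions. It inspects raw bytes using well-known file signatures and is meant to run before the content scan, not instead of it.

```python
from typing import Optional, Tuple

# Hypothetical server-side upload check; names and limits are illustrative.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumed limit, tune for your Space

# Leading byte signatures ("magic numbers") of common image formats.
_SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return a format name based on the leading bytes, or None if unknown."""
    for name, prefixes in _SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_upload(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> Tuple[bool, str]:
    """Validate size and format in the callback, independent of any UI checks."""
    if not data:
        return False, "empty upload"
    if len(data) > max_bytes:
        return False, f"upload exceeds {max_bytes} bytes"
    fmt = sniff_format(data)
    if fmt is None:
        return False, "unrecognised image format"
    return True, fmt
```

A signature check only confirms that the bytes look like the claimed file type. An adversarial image is a perfectly valid PNG or JPEG, so this check narrows delivery mechanisms without detecting injected instructions.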
