Prompt injection scanner for Google Cloud Run AI

Google Cloud Run is the dominant serverless container platform for deploying production AI applications on GCP: multimodal API backends, document intelligence pipelines, customer-facing vision tools, and AI-powered data extraction services. Cloud Run's security architecture — Cloud IAM authentication, VPC Service Controls, Secret Manager for credentials, Cloud Armor WAF for rate limiting and IP allowlisting — is designed around infrastructure access control and network security, not AI application content security. These controls answer "who can invoke this endpoint and how often?" rather than "what is in the image this endpoint will process?" A Cloud Run service that accepts user-uploaded images and passes them to the Gemini API (Vertex AI), a containerised open-source VLM (LLaVA, Phi-Vision, InternVL), or a Cloud Vision + GPT-4o hybrid pipeline has no platform-layer mechanism to detect adversarially crafted pixel content. The application code is the only layer that can add a pre-VLM scan gate, and most Cloud Run AI applications do not include one. Glyphward's scan API is callable from any Cloud Run container via a standard HTTPS POST — no VPC configuration or service mesh required.

TL;DR

In any Cloud Run service that accepts image input and passes it to a VLM (Gemini via Vertex AI, Anthropic Claude, or a containerised open-source model), add a Glyphward scan in the request handler before the model call. Reject images with score >= 65. Cloud Run's IAM, VPC, Cloud Armor, and Secret Manager controls are necessary, but they do not inspect images for pixel-level prompt injection (PI) payloads. The scan must be added in application code. Free tier: 10 scans/day, no card required.

The four multimodal attack surfaces in Cloud Run AI workloads

1. Cloud Run + Vertex AI Gemini multimodal endpoint — untrusted images in document and content pipelines. The most common Cloud Run AI pattern on GCP combines a Cloud Run containerised API with Vertex AI's Gemini API for multimodal processing. A Cloud Run service receives an HTTP request containing an image (as a base64 payload, a Cloud Storage URL, or a multipart form upload), validates the request authentication via IAM or a Cloud Run service account, and forwards the image to Vertex AI for Gemini Vision analysis. Cloud IAM validates the caller's identity and permissions; it does not inspect the image content. Vertex AI's Gemini API applies Google's content safety filters to model output (detecting harmful categories in generated text), but it does not scan input images for adversarially crafted PI payloads embedded at the pixel level. A caller who presents valid authentication credentials and submits an adversarially crafted image bypasses all Cloud IAM and Vertex AI safety controls — the adversarial pixel payload reaches the Gemini model unchecked.

2. Cloud Run containerised VLM inference — self-hosted models with no platform output filters. Teams deploying open-source VLMs (LLaVA-1.6, Phi-3.5-Vision, Qwen2-VL, InternVL2) on Cloud Run package the model server in a container (typically using vLLM, Hugging Face TGI, or a custom FastAPI wrapper) and expose a REST endpoint. Unlike Vertex AI's managed Gemini, these self-hosted model containers have no Google-provided output filtering — the model's raw output is returned directly to the caller. The adversarial image attack surface is identical: a caller submits an adversarially crafted image, the containerised VLM processes it, and the model outputs attacker-specified content. Because the output is unfiltered, the consequences of a successful injection are broader — the model can be instructed to output content that the application then uses for downstream actions (data extraction, decision making, report generation) without any moderation interception. Cloud Run's infrastructure-level controls (memory limits, request timeouts, instance concurrency) do not interact with image content at any layer.

3. Cloud Run event-driven image processing — Cloud Storage triggers and Pub/Sub pipelines. Cloud Run supports event-driven workloads via Eventarc triggers: a file uploaded to a Cloud Storage bucket triggers a Cloud Run service to process it. Document intelligence pipelines commonly use this pattern: a customer uploads a PDF, image, or Office document to a Cloud Storage bucket; Eventarc fires a Cloud Run service that extracts images from the document and passes them to Gemini or a containerised VLM for field extraction, summarisation, or classification. The Cloud Storage trigger and Eventarc routing architecture are infrastructure concerns — they route events without inspecting the content of the files that triggered them. A customer who uploads an adversarially crafted document (a poisoned invoice PDF, an image with embedded PI payload) triggers the Cloud Run pipeline with attacker-controlled content. Because these pipelines are automated and process documents without human review, the adversarial document may be processed multiple times across different pipeline stages before any anomaly is detected — if detection happens at all without a pre-VLM scan gate.

4. Cloud Run multi-service AI architectures — image propagation across chained services. Production AI applications on GCP often chain multiple Cloud Run services: a pre-processing service validates and normalises image formats, a feature extraction service runs CLIP embeddings or object detection, a VLM service performs higher-level analysis, a post-processing service structures the output for a downstream database or API. Each service communicates via HTTP calls, Cloud Pub/Sub messages, or Cloud Tasks. An adversarial image that passes the pre-processing service without detection propagates through the service chain with increasing implicit trust — later services assume earlier services have validated the content. If only the first service in the chain has image validation (for example, MIME type checking and file size limits), and that validation does not include PI scanning, the adversarial payload travels the complete pipeline undetected. Adding Glyphward scan gates at the entry point of each service that passes image bytes to a VLM — rather than only at the externally-exposed API gateway — is the correct architecture for defence-in-depth in multi-service pipelines.
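
The per-service gate described in item 4 could be sketched as a small reusable helper that each service calls before handing image bytes to its VLM. This is a minimal sketch, not Glyphward's client library: `make_scan_gate` and `ImageRejected` are hypothetical names, the scanner is passed in as a plain callable, and the only details taken from this guide are the threshold of 65 and the `score` and `scan_id` fields used in the integration example.

```python
from typing import Callable, Optional

# Rejection threshold from the TL;DR; everything else here is illustrative.
THRESHOLD = 65


class ImageRejected(Exception):
    """Raised when a scan score meets or exceeds the rejection threshold."""

    def __init__(self, score: int, scan_id: Optional[str]):
        super().__init__(f"image rejected with score {score}")
        self.score = score
        self.scan_id = scan_id


def make_scan_gate(
    scanner: Callable[[bytes], dict], threshold: int = THRESHOLD
) -> Callable[[bytes], dict]:
    """Wrap a scanner callable in a gate that every chained service can reuse.

    `scanner` is assumed to take raw image bytes and return a dict with a
    "score" key and, optionally, a "scan_id" key.
    """

    def gate(image_bytes: bytes) -> dict:
        # Scanner exceptions propagate to the caller, so the service fails
        # closed instead of forwarding an unscanned image to the VLM.
        result = scanner(image_bytes)
        if result["score"] >= threshold:
            raise ImageRejected(result["score"], result.get("scan_id"))
        return result

    return gate
```

Each service in the chain would build its own gate once at start-up and call it at its entry point, so no service relies on an upstream service having scanned the image.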

Integration: Cloud Run + Vertex AI Gemini with Glyphward pre-scan

import base64
import os
import requests
from flask import Flask, request, jsonify, abort
import vertexai
from vertexai.generative_models import GenerativeModel, Part

app = Flask(__name__)

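GLYPHWARD_KEY = os.environ["GLYPHWARD_KEY"]  # Injected from Secret Manager via the Cloud Run service config
GLYPHWARD_THRESHOLD = 65
# GOOGLE_CLOUD_PROJECT is not part of Cloud Run's default environment; set it explicitly on the service.
VERTEX_PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"]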
VERTEX_LOCATION = os.environ.get("VERTEX_LOCATION", "us-central1")

vertexai.init(project=VERTEX_PROJECT, location=VERTEX_LOCATION)
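# "gemini-1.5-pro-vision" is not a published model ID; use a multimodal Gemini model available in your project.
vlm = GenerativeModel("gemini-1.5-pro")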


def scan_image_bytes(image_bytes: bytes, source: str = "cloud_run") -> dict:
    """Scan image bytes for adversarial PI before Vertex AI call."""
    encoded = base64.b64encode(image_bytes).decode()
    resp = requests.post(
        "https://glyphward.com/v1/scan",
        json={"image": encoded, "source": source},
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        timeout=8,
    )
    resp.raise_for_status()
    return resp.json()


@app.route("/v1/analyse", methods=["POST"])
def analyse():
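    # silent=True returns None on malformed JSON, so fall back to an empty dict
    data = request.get_json(force=True, silent=True) or {}

    if "image_base64" not in data:
        abort(400, "Missing image_base64 field")

    try:
        image_bytes = base64.b64decode(data["image_base64"], validate=True)
    except (ValueError, TypeError):
        # Malformed base64 is a client error, not a server fault
        abort(400, "image_base64 is not valid base64")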
    prompt = data.get("prompt", "Describe this image.")

    # Pre-scan before Vertex AI call
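    try:
        scan = scan_image_bytes(image_bytes)
    except Exception:
        # Fail closed: do not proceed if the scanner is unavailable.
        # Log the details server-side rather than returning them to the caller.
        app.logger.exception("Glyphward scan failed")
        return jsonify({"error": "Security scan unavailable"}), 503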

    if scan["score"] >= GLYPHWARD_THRESHOLD:
        return jsonify({
            "error": "Image rejected: adversarial content detected",
            "score": scan["score"],
            "scan_id": scan["scan_id"],
        }), 422

    # Scanner passed — call Vertex AI Gemini
    # Use the caller-supplied MIME type when present; PNG remains the default
    image_part = Part.from_data(data=image_bytes, mime_type=data.get("mime_type", "image/png"))
    response = vlm.generate_content([image_part, prompt])
    return jsonify({
        "result": response.text,
        "scan_id": scan["scan_id"],
    })


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

Store GLYPHWARD_KEY in Google Cloud Secret Manager and expose it to the service as an environment variable in the Cloud Run service configuration; never hardcode it in the container image or application source. Cloud Run sets the PORT environment variable automatically, and the service listens on that port. Failing closed when the scanner is unavailable (returning HTTP 503 rather than proceeding to the Vertex AI call) is the appropriate default for production: a scanner outage should surface as a service degradation rather than silently disable PI protection. For event-driven pipelines triggered by Cloud Storage uploads, the Eventarc event payload identifies the uploaded object (bucket and object name) but does not contain the file itself, so download the object from Cloud Storage and extract the image bytes before calling this scan function. Route scan-rejected documents to a security review queue for manual inspection, for example via a Cloud Pub/Sub dead-letter topic.
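
For the event-driven case, the control flow might look like the sketch below. It assumes a direct Cloud Storage "object finalized" event whose JSON body carries `bucket` and `name` fields, and it treats the uploaded object as a single image (a PDF or Office document would need an image-extraction step first). `handle_storage_event` and its `download`, `scan`, and `quarantine` parameters are hypothetical names introduced for illustration: in a real service `download` might wrap the Cloud Storage client, `scan` the `scan_image_bytes` function above, and `quarantine` a publish to a review topic.

```python
from typing import Callable

THRESHOLD = 65  # same rejection threshold as the request handler above


def handle_storage_event(
    event_body: dict,
    download: Callable[[str, str], bytes],
    scan: Callable[[bytes], dict],
    quarantine: Callable[[str, str, dict], None],
    threshold: int = THRESHOLD,
) -> str:
    """Scan one uploaded object and decide whether the pipeline may continue.

    Returns "accepted" or "rejected". Exceptions raised by `download` or
    `scan` propagate, so the event is not treated as safe when either
    step fails (fail closed).
    """
    # The event identifies the object; it does not carry the file contents.
    bucket = event_body["bucket"]
    name = event_body["name"]

    image_bytes = download(bucket, name)
    result = scan(image_bytes)

    if result["score"] >= threshold:
        # Hand the object reference and scan result to the review path.
        quarantine(bucket, name, result)
        return "rejected"
    return "accepted"
```

Passing the three operations in as callables keeps the decision logic testable without GCP credentials, and lets each pipeline stage plug in its own download and review-routing code.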

Coverage matrix

Defence layer	Cloud Run + Vertex AI Gemini (managed VLM)	Cloud Run + containerised VLM (self-hosted)	Cloud Storage event-triggered pipeline	Multi-service chained architecture
Cloud IAM authentication	Yes for access control; No for image content inspection	Yes for access control; No for image content inspection	Yes — service account permissions; No for document content inspection	Yes — service-to-service auth; No for image content propagation
Cloud Armor WAF (rate limiting, IP allowlisting)	No — rate limits invocations; does not inspect image payload content	No	N/A — event-driven, not HTTP WAF	No
Vertex AI Gemini content safety filters	Partial — filters harmful output categories; does not detect adversarial PI in input images	N/A — self-hosted models have no platform output filters	Partial — if using Vertex AI Gemini in the pipeline	Partial — only at Vertex AI steps
Google Cloud DLP (data loss prevention)	No — inspects structured data and text; not pixel-level PI payloads in images	No	No — DLP can be applied to Cloud Storage; inspects text content in documents, not adversarial image pixels	No
Glyphward pre-VLM scan in Cloud Run request handler	Yes — scan gate in Flask/FastAPI handler before Vertex AI call	Yes — scan gate before containerised model inference call	Yes — scan extracted images before VLM step in Eventarc-triggered pipeline	Yes — add scan gate at each service entry point that passes images to a VLM
