Platform guide · Google Cloud Run
Prompt injection scanner for Google Cloud Run AI
Google Cloud Run is the dominant serverless container platform for deploying production AI applications on GCP: multimodal API backends, document intelligence pipelines, customer-facing vision tools, and AI-powered data extraction services. Cloud Run's security architecture — Cloud IAM authentication, VPC Service Controls, Secret Manager for credentials, Cloud Armor WAF for rate limiting and IP allowlisting — is designed around infrastructure access control and network security, not AI application content security. These controls answer "who can invoke this endpoint and how often?" rather than "what is in the image this endpoint will process?" A Cloud Run service that accepts user-uploaded images and passes them to the Gemini API (Vertex AI), a containerised open-source VLM (LLaVA, Phi-Vision, InternVL), or a Cloud Vision + GPT-4o hybrid pipeline has no platform-layer mechanism to detect adversarially crafted pixel content. The application code is the only layer that can add a pre-VLM scan gate, and most Cloud Run AI applications do not include one. Glyphward's scan API is callable from any Cloud Run container via a standard HTTPS POST — no VPC configuration or service mesh required.
TL;DR
In any Cloud Run service that accepts image input and passes it to a VLM (Gemini via Vertex AI, Anthropic Claude, or containerised open-source model), add a Glyphward scan in the request handler before the model call. Reject images with score >= 65. Cloud Run's IAM, VPC, Cloud Armor, and Secret Manager controls are necessary but do not inspect pixel-level PI payloads. The scan must be added in application code. Free tier — 10 scans/day, no card required.
The four multimodal attack surfaces in Cloud Run AI workloads
1. Cloud Run + Vertex AI Gemini multimodal endpoint — untrusted images in document and content pipelines. The most common Cloud Run AI pattern on GCP combines a Cloud Run containerised API with Vertex AI's Gemini API for multimodal processing. A Cloud Run service receives an HTTP request containing an image (as a base64 payload, a Cloud Storage URL, or a multipart form upload), validates the request authentication via IAM or a Cloud Run service account, and forwards the image to Vertex AI for Gemini Vision analysis. Cloud IAM validates the caller's identity and permissions; it does not inspect the image content. Vertex AI's Gemini API applies Google's content safety filters to model output (detecting harmful categories in generated text), but it does not scan input images for adversarially crafted PI payloads embedded at the pixel level. A caller who presents valid authentication credentials and submits an adversarially crafted image bypasses all Cloud IAM and Vertex AI safety controls — the adversarial pixel payload reaches the Gemini model unchecked.
2. Cloud Run containerised VLM inference — self-hosted models with no platform output filters. Teams deploying open-source VLMs (LLaVA-1.6, Phi-3.5-Vision, Qwen2-VL, InternVL2) on Cloud Run package the model server in a container (typically using vLLM, Hugging Face TGI, or a custom FastAPI wrapper) and expose a REST endpoint. Unlike Vertex AI's managed Gemini, these self-hosted model containers have no Google-provided output filtering — the model's raw output is returned directly to the caller. The adversarial image attack surface is identical: a caller submits an adversarially crafted image, the containerised VLM processes it, and the model outputs attacker-specified content. Because the output is unfiltered, the consequences of a successful injection are broader — the model can be instructed to output content that the application then uses for downstream actions (data extraction, decision making, report generation) without any moderation interception. Cloud Run's infrastructure-level controls (memory limits, request timeouts, instance concurrency) do not interact with image content at any layer.
3. Cloud Run event-driven image processing — Cloud Storage triggers and Pub/Sub pipelines. Cloud Run supports event-driven workloads via Eventarc triggers: a file uploaded to a Cloud Storage bucket triggers a Cloud Run service to process it. Document intelligence pipelines commonly use this pattern: a customer uploads a PDF, image, or Office document to a Cloud Storage bucket; Eventarc fires a Cloud Run service that extracts images from the document and passes them to Gemini or a containerised VLM for field extraction, summarisation, or classification. The Cloud Storage trigger and Eventarc routing architecture are infrastructure concerns — they route events without inspecting the content of the files that triggered them. A customer who uploads an adversarially crafted document (a poisoned invoice PDF, an image with embedded PI payload) triggers the Cloud Run pipeline with attacker-controlled content. Because these pipelines are automated and process documents without human review, the adversarial document may be processed multiple times across different pipeline stages before any anomaly is detected — if detection happens at all without a pre-VLM scan gate.
4. Cloud Run multi-service AI architectures — image propagation across chained services. Production AI applications on GCP often chain multiple Cloud Run services: a pre-processing service validates and normalises image formats, a feature extraction service runs CLIP embeddings or object detection, a VLM service performs higher-level analysis, a post-processing service structures the output for a downstream database or API. Each service communicates via HTTP calls, Cloud Pub/Sub messages, or Cloud Tasks. An adversarial image that passes the pre-processing service without detection propagates through the service chain with increasing implicit trust — later services assume earlier services have validated the content. If only the first service in the chain has image validation (for example, MIME type checking and file size limits), and that validation does not include PI scanning, the adversarial payload travels the complete pipeline undetected. Adding Glyphward scan gates at the entry point of each service that passes image bytes to a VLM — rather than only at the externally-exposed API gateway — is the correct architecture for defence-in-depth in multi-service pipelines.
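The per-service gate described in the fourth surface can be sketched as a small decorator that each service wraps around its own VLM-calling function. This is a minimal sketch, not a definitive implementation: `scan_gate` and `ImageRejected` are hypothetical names introduced for illustration, the scanner is passed in as a callable so the example stays self-contained, and the `score` field and threshold of 65 follow the values used on this page.

```python
import functools
from typing import Callable

THRESHOLD = 65  # rejection threshold used on this page


class ImageRejected(Exception):
    """Raised when the scanner scores an image at or above the threshold."""

    def __init__(self, score: int):
        super().__init__(f"image rejected with score {score}")
        self.score = score


def scan_gate(scan: Callable[[bytes], dict], threshold: int = THRESHOLD):
    """Decorator factory: scan the first argument (image bytes) before fn runs.

    `scan` is any callable returning a dict with a "score" key. Exceptions
    raised by `scan` propagate, so the wrapped function never runs when the
    scanner is unavailable (fail closed).
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(image_bytes: bytes, *args, **kwargs):
            result = scan(image_bytes)
            if result["score"] >= threshold:
                raise ImageRejected(result["score"])
            return fn(image_bytes, *args, **kwargs)
        return wrapper
    return decorator
```

Each service in the chain would apply the gate to the function that hands bytes to its model, for example `@scan_gate(my_scanner)` above a hypothetical `extract_fields(image_bytes, prompt)`, so that no service relies on an upstream service having already validated the image.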
Integration: Cloud Run + Vertex AI Gemini with Glyphward pre-scan
    import base64
    import binascii
    import os

    import requests
    from flask import Flask, request, jsonify, abort
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    app = Flask(__name__)

    GLYPHWARD_KEY = os.environ["GLYPHWARD_KEY"]  # Injected from Secret Manager via the Cloud Run service config
    GLYPHWARD_THRESHOLD = 65
    # Set this variable on the service; Cloud Run does not define it by default.
    VERTEX_PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"]
    VERTEX_LOCATION = os.environ.get("VERTEX_LOCATION", "us-central1")

    vertexai.init(project=VERTEX_PROJECT, location=VERTEX_LOCATION)
    # "gemini-1.5-pro-vision" is not a published model ID; substitute any
    # currently available vision-capable Gemini model.
    vlm = GenerativeModel("gemini-1.5-pro")


    def scan_image_bytes(image_bytes: bytes, source: str = "cloud_run") -> dict:
        """Scan image bytes for adversarial PI before the Vertex AI call."""
        encoded = base64.b64encode(image_bytes).decode()
        resp = requests.post(
            "https://glyphward.com/v1/scan",
            json={"image": encoded, "source": source},
            headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
            timeout=8,
        )
        resp.raise_for_status()
        return resp.json()


    @app.route("/v1/analyse", methods=["POST"])
    def analyse():
        data = request.get_json(force=True, silent=True)
        if not isinstance(data, dict) or "image_base64" not in data:
            abort(400, "Missing image_base64 field")
        try:
            image_bytes = base64.b64decode(data["image_base64"], validate=True)
        except (binascii.Error, TypeError, ValueError):
            abort(400, "image_base64 is not valid base64")
        prompt = data.get("prompt", "Describe this image.")

        # Pre-scan before the Vertex AI call
        try:
            scan = scan_image_bytes(image_bytes)
        except Exception as exc:
            # Fail closed: do not proceed if the scanner is unavailable
            return jsonify({"error": f"Security scan failed: {exc}"}), 503

        if scan["score"] >= GLYPHWARD_THRESHOLD:
            return jsonify({
                "error": "Image rejected: adversarial content detected",
                "score": scan["score"],
                "scan_id": scan["scan_id"],
            }), 422

        # Scan passed: call Vertex AI Gemini.
        # Assumes PNG input; pass the real MIME type for other formats.
        image_part = Part.from_data(data=image_bytes, mime_type="image/png")
        response = vlm.generate_content([image_part, prompt])
        return jsonify({
            "result": response.text,
            "scan_id": scan["scan_id"],
        })


    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
Store GLYPHWARD_KEY in Google Cloud Secret Manager and expose it to the service as an environment variable in the Cloud Run configuration; never hardcode it in the container image or application source. Cloud Run sets the PORT environment variable automatically, and the service must listen on that port. Failing closed when the scanner is unavailable (returning HTTP 503 rather than proceeding to the Vertex AI call) is the right default for production: a scanner outage should surface as service degradation rather than silently disabling PI protection. For event-driven pipelines triggered by Cloud Storage uploads, the Eventarc event identifies the uploaded object by bucket and name rather than carrying the file contents, so download the object from Cloud Storage, extract its images, and pass the bytes to this scan function. Publish scan-rejected documents to a dedicated Pub/Sub topic (or let them fall through to a dead-letter topic) that feeds a security review queue for manual inspection.
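For the event-driven case, the control flow can be sketched as follows, assuming the event body carries the `bucket` and `name` fields of the uploaded object. `handle_upload` and `object_ref` are hypothetical names, and the download, scan, analyse, and quarantine steps are injected as callables (in a real service they might wrap a Cloud Storage client, the scan function above, the VLM call, and a Pub/Sub publisher). The sketch treats the object as a single image; a PDF or Office document would need an image-extraction step first.

```python
from typing import Callable, Tuple

THRESHOLD = 65  # rejection threshold used on this page


def object_ref(event_body: dict) -> Tuple[str, str]:
    """Return (bucket, object name) from a Cloud Storage event body."""
    return event_body["bucket"], event_body["name"]


def handle_upload(
    event_body: dict,
    download: Callable[[str, str], bytes],
    scan: Callable[[bytes], dict],
    analyse: Callable[[bytes], str],
    quarantine: Callable[[str, str, dict], None],
) -> dict:
    """Download the uploaded object, scan it, then analyse or quarantine it.

    Exceptions from `download` or `scan` propagate to the caller, so the
    VLM step is never reached when either of them fails (fail closed).
    """
    bucket, name = object_ref(event_body)
    uri = f"gs://{bucket}/{name}"
    image_bytes = download(bucket, name)

    result = scan(image_bytes)
    if result["score"] >= THRESHOLD:
        # Route the object to manual review instead of the VLM.
        quarantine(bucket, name, result)
        return {"status": "rejected", "object": uri}

    return {"status": "processed", "object": uri, "result": analyse(image_bytes)}
```

Injecting the four steps keeps the gate logic testable without GCP credentials, and makes it explicit that the quarantine path runs instead of the model call, never after it.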
Coverage matrix
| Defence layer | Cloud Run + Vertex AI Gemini (managed VLM) | Cloud Run + containerised VLM (self-hosted) | Cloud Storage event-triggered pipeline | Multi-service chained architecture |
|---|---|---|---|---|
| Cloud IAM authentication | Yes for access control; No for image content inspection | Yes for access control; No for image content inspection | Yes — service account permissions; No for document content inspection | Yes — service-to-service auth; No for image content propagation |
| Cloud Armor WAF (rate limiting, IP allowlisting) | No — rate limits invocations; does not inspect image payload content | No | N/A — event-driven, not HTTP WAF | No |
| Vertex AI Gemini content safety filters | Partial — filters harmful output categories; does not detect adversarial PI in input images | N/A — self-hosted models have no platform output filters | Partial — if using Vertex AI Gemini in the pipeline | Partial — only at Vertex AI steps |
| Google Cloud DLP (data loss prevention) | No — inspects structured data and text; not pixel-level PI payloads in images | No | No — DLP can be applied to Cloud Storage; inspects text content in documents, not adversarial image pixels | No |
| Glyphward pre-VLM scan in Cloud Run request handler | Yes — scan gate in Flask/FastAPI handler before Vertex AI call | Yes — scan gate before containerised model inference call | Yes — scan extracted images before VLM step in Eventarc-triggered pipeline | Yes — add scan gate at each service entry point that passes images to a VLM |
Related questions
How does this differ from Google Vertex AI Agent Builder and Vertex AI Pipelines?
Google Vertex AI Agent Builder (the managed agent orchestration service, covered in the Vertex AI Agent Builder page) provides a higher-level abstraction for building conversational agents with Grounding and RAG. Cloud Run is the infrastructure layer where custom application code runs — it's what teams use when they need full control over the inference logic, pre/post-processing, or when using models outside the Vertex AI model garden. Vertex AI Pipelines are ML training and batch inference pipelines, typically run on managed infrastructure separate from Cloud Run. The PI risk in Cloud Run AI workloads is specifically about production serving code that a team wrote and deployed in a container — these workloads have no shared security baseline because every team writes their own application code. The scan gate must be added to that custom code.
Does Google Cloud Armor protect against prompt injection attacks?
Cloud Armor is a WAF and DDoS protection service: it inspects HTTP request metadata (IP address, headers, URI patterns, rate of requests) and applies rules based on those attributes. It can rate-limit high-frequency probing attempts, block known-malicious IP ranges, and reject requests with suspicious patterns in query parameters or headers. It does not inspect the binary payload of an image field in a POST request body for adversarial pixel content; that would require content-aware inspection of multimodal data, which is outside Cloud Armor's design scope. Cloud Armor can reduce the volume of attack attempts and protect against infrastructure-layer abuse; Glyphward provides the content-layer inspection that Cloud Armor cannot perform.
What Cloud Run logging and monitoring should I add alongside the PI scan gate?
At minimum, log every scan result to Cloud Logging with the scan ID, score, source label, and request metadata (Cloud Run instance ID, timestamp, calling service identity). Create a Cloud Monitoring alert on scan rejections (score >= 65); a spike in rejected scans is a signal of active attack probing. Write scan rejections to a separate BigQuery table or Pub/Sub topic for downstream analysis: cluster rejection events by source IP, user account, and time window to identify systematic probing campaigns. Carry the scan ID through the downstream pipeline, for example as a Cloud Trace span attribute or as a field in every log entry that shares the request's trace ID, so that any anomalous downstream output can be correlated back to the specific image submission that triggered it. The multimodal AI security checklist includes a logging and monitoring section for Cloud Run AI workloads.
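A minimal sketch of the logging step, assuming the service writes one JSON object per line to stdout, which Cloud Run forwards to Cloud Logging as a structured entry with `severity` and `message` treated as special fields. `scan_log_entry`, `log_scan`, and the field names other than `severity` and `message` are hypothetical choices for illustration; `score` and `scan_id` follow the scan response fields used in the integration code on this page.

```python
import json
import sys
from datetime import datetime, timezone

THRESHOLD = 65  # rejection threshold used on this page


def scan_log_entry(scan: dict, source: str, caller: str,
                   threshold: int = THRESHOLD) -> dict:
    """Build a structured log record for a single scan result."""
    rejected = scan["score"] >= threshold
    return {
        "severity": "WARNING" if rejected else "INFO",
        "message": "image scan rejected" if rejected else "image scan passed",
        "scan_id": scan["scan_id"],
        "score": scan["score"],
        "source": source,
        "caller": caller,
        "rejected": rejected,
        "scanned_at": datetime.now(timezone.utc).isoformat(),
    }


def log_scan(scan: dict, source: str, caller: str, stream=None) -> dict:
    """Write the record as one JSON line (stdout by default) and return it."""
    stream = stream if stream is not None else sys.stdout
    entry = scan_log_entry(scan, source, caller)
    stream.write(json.dumps(entry) + "\n")
    stream.flush()
    return entry
```

With records shaped like this, a log-based metric filtered on the `rejected` field is one way to drive the rejection alert, and the same records can be exported to BigQuery through a log sink for the clustering analysis.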
Further reading
- Prompt injection scanner for Google Vertex AI / Gemini — the Vertex AI managed Gemini API surface (used as the VLM backend in many Cloud Run deployments)
- Prompt injection scanner for Vertex AI Agent Builder — the higher-level agent orchestration layer above Cloud Run
- Agentic RAG pipeline prompt injection — the document retrieval and processing patterns often deployed on Cloud Run with Vertex AI backends
- PDF prompt injection detection — PDF image extraction before Vertex AI processing, applicable to Cloud Run document intelligence pipelines