Product · Architecture guide
Real-time vs batch prompt-injection scanning
Multimodal prompt injection scanners can operate in two fundamentally different modes: real-time (synchronous, pre-LLM-call, fail-closed on timeout) and batch (asynchronous, scheduled, forensic/compliance role). The right mode — or combination of modes — depends on your pipeline's latency budget, threat model, and compliance requirements. Interactive applications where a user waits for a response need real-time gating: a flagged image should block the LLM call before the response is generated. Internal document processing workflows with looser latency budgets can run batch sweeps on stored image archives. Both modes call Glyphward's /v1/scan endpoint; the difference is in how and when you invoke it. This page walks through the architecture of each mode, when to choose each, and how to combine them for defence-in-depth.
TL;DR
Use real-time when a user is waiting and a blocked image should stop the LLM call immediately. Use batch when scanning historical uploads, preparing training data, or generating compliance audit trails on a schedule. Combine both: real-time gates production inference, batch sweeps the archive. Start with the free tier — 10 scans/day, no card required.
Real-time scanning: synchronous, fail-closed, pre-LLM
Real-time scanning inserts POST https://glyphward.com/v1/scan between the image source and the LLM call. The scan is synchronous: the application awaits the result before deciding whether to proceed. Typical latency is under 150ms for images under 1MB and under 200ms for images up to 4MB. If the scanner is unreachable (network partition, service disruption) or does not answer within your timeout, the recommended behaviour is fail-closed: treat the image as unsafe and block the LLM call rather than passing the image unchecked.
When to use real-time:
- Interactive chatbots and assistants where a user submits an image and waits for a response
- Avatar SaaS and image-upload workflows where the scan result determines whether to process the upload
- Voice agents and bot platforms (Teams bots, Slack bots) where each incoming attachment needs immediate gating
- Webhook-triggered automation (Zapier, Make, n8n) where the scan result routes the image to different processing paths
- Any pipeline where the LLM output influences a downstream action (tool call, database write, email send) that should not execute on adversarial input
Batch scanning: asynchronous, high-throughput, forensic
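Batch scanning processes collections of images on a schedule (nightly or weekly) or in response to an event (new document ingested, upload completed). The caller works through a list of image identifiers or URLs outside the request path and collects the results afterwards. Today that means a scheduled job looping over the synchronous /v1/scan endpoint; the planned /v1/batch endpoint is intended to return results asynchronously via webhook or polling. Batch scanning is appropriate when: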
- The images have already been stored (the LLM call happened earlier, or the images are pre-processed before any LLM sees them)
- Throughput matters more than per-image latency (document archives of 10,000+ images)
- The output feeds a compliance audit trail or a data quality report rather than a real-time block decision
- Training data preparation: scan all images in a fine-tuning dataset before starting a training run
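For the dataset-preparation case, the split between usable and excluded images could be sketched as below. This is an illustration only: `partition_for_training` is a hypothetical helper, and it assumes result dicts with a `path` key and a `score` key that is `None` when the scan attempt failed, mirroring the integration code on this page rather than a documented response schema.

```python
def partition_for_training(results: list[dict], threshold: int = 60) -> dict[str, list[str]]:
    """Split scan results into clean, flagged, and unscanned image paths.

    Assumes each result has "path" and "score" keys, with score=None
    when the scan attempt failed (an assumption, not a documented schema).
    """
    buckets: dict[str, list[str]] = {"clean": [], "flagged": [], "unscanned": []}
    for r in results:
        score = r.get("score")
        if score is None:
            # A failed scan proves nothing: keep the image out of the dataset
            buckets["unscanned"].append(r["path"])
        elif score >= threshold:
            buckets["flagged"].append(r["path"])
        else:
            buckets["clean"].append(r["path"])
    return buckets
```

Only the `clean` bucket would feed the training run; `unscanned` paths are candidates for a retry rather than silent inclusion.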
Combining real-time and batch: defence-in-depth
Most production pipelines benefit from both modes running in parallel:
- Real-time gate at inference time — blocks the immediate LLM call for any image submitted in the current session.
- Batch sweep of the stored archive — sweeps all images ingested over the past 24 hours (or since the last batch run) to catch any that slipped through the real-time gate (for example, during a scanner outage window when fail-open was configured) and to re-scan against updated detection models as Glyphward's corpus grows.
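One way to select "images ingested since the last batch run" is to keep a timestamp watermark on disk and compare file modification times against it. The sketch below assumes images live on a local filesystem and that modification time is a fair proxy for ingestion time; the function names and the watermark file are hypothetical, not part of Glyphward's API.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}

def load_watermark(watermark_file: str) -> float:
    """Return the stored last-run timestamp, or 0.0 if no run has completed yet."""
    marker = Path(watermark_file)
    return float(marker.read_text()) if marker.exists() else 0.0

def save_watermark(watermark_file: str, timestamp: float) -> None:
    """Persist the timestamp that the next run should start from."""
    Path(watermark_file).write_text(str(timestamp))

def images_modified_since(image_dir: str, since: float) -> list[str]:
    """List image files under image_dir whose mtime is later than `since`."""
    return sorted(
        str(p)
        for p in Path(image_dir).rglob("*")
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.stat().st_mtime > since
    )
```

A cautious run would record the start time before listing files and call `save_watermark` only after the sweep finishes, so that a crashed run is repeated rather than skipped. Re-scanning the whole archive against updated detection models is a separate job that ignores the watermark.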
Integration code: real-time and batch in Python
import base64, json, time
from pathlib import Path

import requests

GLYPHWARD_KEY = "<your-glyphward-api-key>"
HEADERS = {"Authorization": f"Bearer {GLYPHWARD_KEY}", "Content-Type": "application/json"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}

# ── Real-time mode ──────────────────────────────────────────────────────
def scan_realtime(image_bytes: bytes, source: str = "realtime") -> dict:
    """Synchronous scan. Raises on scanner error (fail-closed pattern)."""
    encoded = base64.b64encode(image_bytes).decode()
    try:
        resp = requests.post(
            "https://glyphward.com/v1/scan",
            json={"image": encoded, "source": source},
            headers=HEADERS,
            timeout=8,  # seconds; fail-closed if the scanner does not respond in time
        )
        resp.raise_for_status()
        return resp.json()
    except Exception as exc:
        # Fail-closed: treat any scanner error as a blocked image
        raise RuntimeError(f"Scanner unavailable, image blocked. ({exc})") from exc

def handle_user_image(image_bytes: bytes) -> str:
    scan = scan_realtime(image_bytes)  # raises if scanner down
    if scan["score"] >= 70:
        return f"Image blocked (ref {scan['scan_id']})"
    # Safe: call your LLM here (call_vision_llm is a placeholder for your own client)
    return call_vision_llm(image_bytes)

# ── Batch mode ──────────────────────────────────────────────────────────
def scan_batch(image_paths: list[str], threshold: int = 70) -> list[dict]:
    """Scan a list of local image files. Returns per-image results."""
    results = []
    for path in image_paths:
        image_bytes = Path(path).read_bytes()
        encoded = base64.b64encode(image_bytes).decode()
        try:
            resp = requests.post(
                "https://glyphward.com/v1/scan",
                json={"image": encoded, "source": "batch"},
                headers=HEADERS,
                timeout=15,
            )
            resp.raise_for_status()
            result = resp.json()
        except Exception as exc:
            # Failed scan: keep the error and leave the score empty (unscanned)
            result = {"error": str(exc), "score": None}
        result["path"] = path
        score = result.get("score")
        # An unscanned image is neither clean nor flagged; comparing None would raise
        result["flagged"] = score is not None and score >= threshold
        results.append(result)
        # Brief pause between requests; size the batch to your plan's quota
        # (free tier 10/day, Pro 100k/mo)
        time.sleep(0.1)
    return results

def nightly_archive_sweep(image_dir: str) -> None:
    # Path.glob has no brace expansion, so walk the tree and filter by suffix
    paths = [
        str(p) for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    results = scan_batch(paths)
    flagged = [r for r in results if r.get("flagged")]
    if flagged:
        # Write to audit log, quarantine flagged files, alert on-call
        with open("audit-log.jsonl", "a") as f:
            for r in flagged:
                f.write(json.dumps(r) + "\n")
    print(f"Batch sweep: {len(flagged)}/{len(results)} images flagged; see audit-log.jsonl")
The batch function above calls the synchronous /v1/scan endpoint sequentially, which is suitable for archives of up to a few thousand images. For larger archives, contact Glyphward about the batch job API: the planned /v1/batch endpoint is intended to accept a list of blob storage URLs and call back via webhook on completion.
Which mode should I use?
| Use case | Recommended mode | Threshold | Fail behaviour |
|---|---|---|---|
| User-facing chatbot with image upload | Real-time | 70 | Fail-closed: block image, return error |
| Avatar SaaS: selfie upload before face-swap | Real-time | 65 | Fail-closed: reject upload, prompt retry |
| Voice agent with image attachments | Real-time | 70 | Fail-closed: skip image, process text-only |
| Nightly document archive audit | Batch | 70 | Log and quarantine flagged images |
| Fine-tuning dataset preparation | Batch | 60 | Strict: exclude any image scoring 60 or above |
| HIPAA / SOX compliance audit trail | Batch (+ real-time) | 70 | Both: gate production + generate audit evidence |
| Internal Slack bot (trusted users) | Real-time | 75 | Fail-open acceptable for trusted-user channels |
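The table above can be carried in code as a small policy map, so that thresholds and fail behaviour live in one place instead of being scattered across handlers. This is a sketch under assumptions: the `ScanPolicy` type, the dictionary keys, and `should_block` are hypothetical names, and a `score` of `None` stands for a scan that failed or timed out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ScanPolicy:
    mode: str        # "real-time" or "batch"
    threshold: int   # scores at or above this value count as flagged
    fail_open: bool  # True: let the image through when the scan itself fails

# Keys are illustrative labels for the rows of the table above
POLICIES = {
    "chatbot": ScanPolicy("real-time", 70, fail_open=False),
    "avatar_upload": ScanPolicy("real-time", 65, fail_open=False),
    "voice_agent": ScanPolicy("real-time", 70, fail_open=False),
    "archive_audit": ScanPolicy("batch", 70, fail_open=False),
    "finetune_dataset": ScanPolicy("batch", 60, fail_open=False),
    "internal_slack_bot": ScanPolicy("real-time", 75, fail_open=True),
}

def should_block(policy: ScanPolicy, score: Optional[int]) -> bool:
    """Decide whether to block; score=None means the scan failed or timed out."""
    if score is None:
        return not policy.fail_open
    return score >= policy.threshold
```

For example, `should_block(POLICIES["chatbot"], None)` blocks on a scanner outage, while the trusted-user Slack bot policy lets the same image through.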
Related questions
What is the typical latency of Glyphward real-time scanning?
Typical end-to-end latency (application → Glyphward → application) is under 150ms for JPEG or PNG images under 1MB, and under 200ms for images up to 4MB, measured from a European or US-East network location. For applications with a 200ms SLA, set the timeout parameter in your HTTP client to 180ms and handle the timeout as a fail-closed event (block the image). For applications with tighter latency budgets, use edge caching for repeated identical images (same Content-MD5 hash → cache hit on the scan result).
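The caching idea can be sketched with an in-process dictionary keyed by a content hash. This is a stand-in for a real edge cache, and it makes several assumptions: `ScanCache` is a hypothetical helper, `scan_fn` is whatever function you already use to call /v1/scan, and SHA-256 is used here as the fingerprint in place of Content-MD5. Note also that the `requests` library expresses timeouts in seconds, so a 180ms budget is `timeout=0.18`.

```python
import hashlib
from typing import Callable

class ScanCache:
    """Reuse scan results for byte-identical images (in-memory sketch)."""

    def __init__(self, scan_fn: Callable[[bytes], dict]):
        self._scan_fn = scan_fn               # e.g. a wrapper around POST /v1/scan
        self._results: dict[str, dict] = {}   # content hash -> scan result

    def scan(self, image_bytes: bytes) -> dict:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._results:
            # Cache miss: call the scanner. If scan_fn raises (timeout, HTTP
            # error), nothing is stored, so failures are never cached.
            self._results[key] = self._scan_fn(image_bytes)
        return self._results[key]
```

A production cache would also want an expiry, since a cached verdict does not reflect later updates to the detection models.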
Can batch scanning satisfy HIPAA or SOX audit trail requirements?
Yes — batch scans produce a scan_id for every image processed, regardless of whether the image was flagged. A nightly batch run over all images processed in the prior 24 hours generates a per-image evidence log: image hash, scan timestamp, score, and scan_id. This log can be used as ITGC (IT General Controls) evidence for SOX Section 404 audits (demonstrating that an AI input validation control ran on all financial images processed by the system) or as a PHI access audit supplement for HIPAA. See SOX compliance and AI security and HIPAA-compliant AI security for control-mapping details.
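A per-image evidence record with the four fields named above (image hash, scan timestamp, score, scan_id) might look like the following. The JSON key names, the choice of SHA-256, and the `audit_record` helper are assumptions for illustration; only `score` and `scan_id` are taken from the scan response as used elsewhere on this page.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

def audit_record(image_bytes: bytes, scan: dict, now: Optional[datetime] = None) -> str:
    """Build one JSON Lines entry for the audit log, flagged or not."""
    stamp = now or datetime.now(timezone.utc)
    record = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "scanned_at": stamp.isoformat(),
        "score": scan.get("score"),       # None if the scan failed
        "scan_id": scan.get("scan_id"),
    }
    return json.dumps(record, sort_keys=True)
```

Appending one such line per processed image, rather than only for flagged images, is what lets the log show that the control ran on every input.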
How does fail-closed differ between real-time and batch modes?
In real-time mode, fail-closed means: if the scanner is unreachable within the timeout window, treat the image as blocked and return an error to the caller. The LLM call does not happen. In batch mode, fail-closed means: if a scan attempt fails (network error, rate limit), mark that image as "unscanned" in the audit log, do not include it in any downstream pipeline that requires clean status, and retry on the next batch run. Batch mode does not have a user waiting, so there is no real-time block decision — the fail behaviour is instead quarantine + retry rather than immediate rejection.
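The contrast between the two fail behaviours can be made concrete with two small functions that differ only in how they treat a scanner failure. Both are hypothetical sketches: `scan_fn` stands for your own wrapper around /v1/scan and is assumed to raise on network errors, timeouts, or rate limits.

```python
from typing import Callable

def realtime_verdict(scan_fn: Callable[[bytes], dict], image: bytes, threshold: int = 70) -> str:
    """Real-time: any scanner failure blocks the LLM call."""
    try:
        score = scan_fn(image)["score"]
    except Exception:
        return "block"  # fail-closed: no verdict means no LLM call
    return "block" if score >= threshold else "allow"

def batch_status(scan_fn: Callable[[bytes], dict], image: bytes, threshold: int = 70) -> str:
    """Batch: a scanner failure leaves the image unscanned, to be retried later."""
    try:
        score = scan_fn(image)["score"]
    except Exception:
        return "unscanned"  # quarantine and retry on the next batch run
    return "flagged" if score >= threshold else "clean"
```

Downstream pipelines that require clean status would accept only `"clean"`, so an `"unscanned"` image stays out until a later run succeeds.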
What happens if real-time scan latency exceeds my SLA?
If the Glyphward real-time scan adds more latency than your SLA allows, consider: (a) reducing image resolution before scanning (resize to 512×512 before the scan call; this does not affect detection quality for most adversarial typographic payloads, which survive downscaling); (b) running the scan in parallel with other pre-processing steps using asyncio.gather() or a thread pool; (c) accepting a fail-open policy for images that time out, supplemented by a nightly batch sweep that retrospectively flags any adversarial images processed during timeout events. See the pricing page for rate limits by plan tier.
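Option (b) could be sketched with a thread pool as follows. The point of the sketch is that only pre-processing overlaps with the scan; the result is still withheld until the scan verdict is in. `gate_with_parallel_prep`, `scan_fn`, and `preprocess_fn` are hypothetical names, and `scan_fn` is assumed to enforce its own HTTP timeout and raise when it is exceeded.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional

def gate_with_parallel_prep(
    image: bytes,
    scan_fn: Callable[[bytes], dict],
    preprocess_fn: Callable[[bytes], Any],
    threshold: int = 70,
) -> Optional[Any]:
    """Run scan and pre-processing concurrently; return prepared input or None if blocked."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        scan_future = pool.submit(scan_fn, image)
        prep_future = pool.submit(preprocess_fn, image)
        prepared = prep_future.result()
        try:
            score = scan_future.result()["score"]
        except Exception:
            return None  # fail-closed: scanner error or timeout raised by scan_fn
    # The caller only reaches the LLM when the scan finished and passed
    return prepared if score < threshold else None
```

The added latency is then roughly the longer of the two steps rather than their sum. A caller that chooses option (c) instead would return `prepared` in the `except` branch and rely on the batch sweep to catch what slipped through.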
Further reading
- Prompt-injection API free tier — start with the free tier.
- Glyphward pricing — rate limits and plan comparison.
- HIPAA-compliant AI security — HIPAA audit trail pattern.
- SOX compliance and AI security — SOX ICFR evidence using batch scan logs.
- Multimodal LLM security API — full API reference overview.