
Real-time vs batch prompt-injection scanning

Multimodal prompt injection scanners can operate in two fundamentally different modes: real-time (synchronous, pre-LLM-call, fail-closed on timeout) and batch (asynchronous, scheduled, forensic/compliance role). The right mode — or combination of modes — depends on your pipeline's latency budget, threat model, and compliance requirements. Interactive applications where a user waits for a response need real-time gating: a flagged image should block the LLM call before the response is generated. Internal document processing workflows with looser latency budgets can run batch sweeps on stored image archives. Both modes call Glyphward's /v1/scan endpoint; the difference is in how and when you invoke it. This page walks through the architecture of each mode, when to choose each, and how to combine them for defence-in-depth.

TL;DR

Use real-time when a user is waiting and a blocked image should stop the LLM call immediately. Use batch when scanning historical uploads, preparing training data, or generating compliance audit trails on a schedule. Combine both: real-time gates production inference, batch sweeps the archive. Start with the free tier — 10 scans/day, no card required.

Real-time scanning: synchronous, fail-closed, pre-LLM

Real-time scanning inserts POST https://glyphward.com/v1/scan between the image source and the LLM call. The scan is synchronous: the application awaits the result before deciding whether to proceed. Typical latency is under 150ms for images under 1MB and under 200ms for images up to 4MB. If the scanner is unreachable (network partition, service disruption) or does not answer within the timeout, the recommended behaviour is fail-closed: treat the image as unsafe and block the LLM call rather than passing the image unchecked.

When to use real-time:

Batch scanning: asynchronous, high-throughput, forensic

Batch scanning processes collections of images on a schedule — nightly, weekly, or triggered by an event (new document ingested, upload completed). The caller submits a list of image identifiers or URLs and receives results asynchronously via webhook or polling. Batch scanning is appropriate when:

Combining real-time and batch: defence-in-depth

Most production pipelines benefit from both modes running in parallel:

  1. Real-time gate at inference time — blocks the immediate LLM call for any image submitted in the current session.
  2. Batch sweep of the stored archive — sweeps all images ingested over the past 24 hours (or since the last batch run) to catch any that slipped through the real-time gate (for example, during a scanner outage window when fail-open was configured) and to re-scan against updated detection models as Glyphward's corpus grows.
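The "since the last batch run" window in step 2 can be sketched as a file-selection helper. This is a minimal sketch that assumes images are stored as local files and that a file's modification time is a reasonable proxy for its ingestion time; the function name `images_since` and the suffix list are illustrative and not part of Glyphward's API.

```python
from pathlib import Path

# Hypothetical suffix list; adjust to whatever formats your archive holds.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}

def images_since(image_dir: str, last_run_epoch: float) -> list[Path]:
    """Return image files under image_dir modified after last_run_epoch.

    Assumes file mtime approximates the time the image was ingested.
    """
    selected = []
    for path in Path(image_dir).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if path.stat().st_mtime > last_run_epoch:
            selected.append(path)
    return sorted(selected)
```

A nightly job would persist the timestamp of each completed run and pass it as `last_run_epoch` on the next one. To re-scan the whole archive against updated detection models, pass `0` instead.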

Integration code: real-time and batch in Python

import base64, time, requests
from pathlib import Path

GLYPHWARD_KEY = "<your-glyphward-api-key>"
HEADERS = {"Authorization": f"Bearer {GLYPHWARD_KEY}", "Content-Type": "application/json"}

# ── Real-time mode ──────────────────────────────────────────────────────
def scan_realtime(image_bytes: bytes, source: str = "realtime") -> dict:
    """Synchronous scan. Raises on scanner error (fail-closed pattern)."""
    encoded = base64.b64encode(image_bytes).decode()
    try:
        resp = requests.post(
            "https://glyphward.com/v1/scan",
            json={"image": encoded, "source": source},
            headers=HEADERS,
            timeout=8,  # fail-closed if scanner takes > 8s
        )
        resp.raise_for_status()
        return resp.json()
    except Exception as exc:
        # Fail-closed: treat any scanner error or timeout as a blocked image.
        # "from exc" keeps the original error in the traceback for debugging.
        raise RuntimeError(f"Scanner unavailable; image blocked. ({exc})") from exc

def handle_user_image(image_bytes: bytes) -> str:
    scan = scan_realtime(image_bytes)          # raises if scanner down
    if scan["score"] >= 70:
        return f"Image blocked (ref {scan['scan_id']})"
    # Safe — call your LLM here
    return call_vision_llm(image_bytes)

# ── Batch mode ──────────────────────────────────────────────────────────
def scan_batch(image_paths: list[str], threshold: int = 70) -> list[dict]:
    """Scan a list of local image files. Returns per-image results."""
    results = []
    for path in image_paths:
        image_bytes = Path(path).read_bytes()
        encoded = base64.b64encode(image_bytes).decode()
        try:
            resp = requests.post(
                "https://glyphward.com/v1/scan",
                json={"image": encoded, "source": "batch"},
                headers=HEADERS,
                timeout=15,
            )
            resp.raise_for_status()
            result = resp.json()
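        except Exception as exc:
            # Scan failed: record the error and leave the image unscanned
            result = {"error": str(exc), "score": None}
        result["path"] = path
        score = result.get("score")
        # A failed scan has no score, so it is marked unscanned, never flagged
        # (comparing None >= threshold would raise TypeError)
        result["unscanned"] = score is None
        result["flagged"] = score is not None and score >= threshold
        results.append(result)
        time.sleep(0.1)  # brief pause between calls; does not by itself enforce plan quotas (free tier 10/day, Pro 100k/mo)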
    return results

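def nightly_archive_sweep(image_dir: str) -> None:
    import json  # local import keeps this function self-contained

    # pathlib's glob does not support {a,b} brace expansion, so filter by suffix
    suffixes = {".png", ".jpg", ".jpeg", ".webp"}
    paths = [str(p) for p in Path(image_dir).rglob("*")
             if p.is_file() and p.suffix.lower() in suffixes]
    results = scan_batch(paths)
    flagged = [r for r in results if r.get("flagged")]
    if flagged:
        # Write flagged results to the audit log
        # (quarantining files and alerting on-call would also go here)
        with open("audit-log.jsonl", "a") as f:
            for r in flagged:
                f.write(json.dumps(r) + "\n")
        print(f"Batch sweep: {len(flagged)}/{len(results)} images flagged; see audit-log.jsonl")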

The batch function above calls the synchronous /v1/scan endpoint sequentially, which is suitable for archives of up to a few thousand images. For larger archives, contact Glyphward about the batch job API (a planned /v1/batch endpoint), which is intended to accept a list of blob storage URLs and call back via webhook on completion.


Which mode should I use?

| Use case | Recommended mode | Threshold | Fail behaviour |
| --- | --- | --- | --- |
| User-facing chatbot with image upload | Real-time | 70 | Fail-closed: block image, return error |
| Avatar SaaS: selfie upload before face-swap | Real-time | 65 | Fail-closed: reject upload, prompt retry |
| Voice agent with image attachments | Real-time | 70 | Fail-closed: skip image, process text-only |
| Nightly document archive audit | Batch | 70 | Log and quarantine flagged images |
| Fine-tuning dataset preparation | Batch | 60 | Strict: exclude any image above 60 |
| HIPAA / SOX compliance audit trail | Batch (+ real-time) | 70 | Both: gate production + generate audit evidence |
| Internal Slack bot (trusted users) | Real-time | 75 | Fail-open acceptable for trusted-user channels |
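The rows above can be read as a per-use-case policy: a score threshold plus a rule for what to do when no score is available. The following is a minimal sketch of that idea, assuming an image is blocked when its score is at or above the threshold (as in the integration code on this page). The `ScanPolicy` class, the `POLICIES` keys, and `allow_image` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ScanPolicy:
    threshold: int   # block when score >= threshold
    fail_open: bool  # True: allow the image if the scan could not complete

# Illustrative subset of the table; the keys are hypothetical labels.
POLICIES = {
    "user_chatbot": ScanPolicy(threshold=70, fail_open=False),
    "avatar_upload": ScanPolicy(threshold=65, fail_open=False),
    "internal_slack_bot": ScanPolicy(threshold=75, fail_open=True),
}

def allow_image(score: Optional[int], policy: ScanPolicy) -> bool:
    """Decide whether an image may proceed to the LLM.

    score is None when the scan failed or timed out.
    """
    if score is None:
        return policy.fail_open
    return score < policy.threshold
```

Keeping the threshold and the fail behaviour together in one object makes it harder to change one without reviewing the other when a pipeline moves between trust levels.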

Related questions

What is the typical latency of Glyphward real-time scanning?

Typical end-to-end latency (application → Glyphward → application) is under 150ms for JPEG or PNG images under 1MB, and under 200ms for images up to 4MB, measured from a European or US-East network location. For applications with a 200ms SLA, set the timeout parameter in your HTTP client to 180ms and handle the timeout as a fail-closed event (block the image). For applications with tighter latency budgets, use edge caching for repeated identical images (same Content-MD5 hash → cache hit on the scan result).
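The caching idea for repeated identical images can be sketched as follows. This is an illustration under stated assumptions rather than a description of Glyphward's own caching: it uses an in-process dictionary instead of an edge cache, keys on a SHA-256 digest of the image bytes instead of Content-MD5 (MD5 collisions are practical to construct, which matters when inputs may be adversarial), and takes the scanner as an injected `scan_fn` callable. `ScanCache` is a hypothetical name.

```python
import hashlib
from typing import Callable

class ScanCache:
    """In-memory cache of scan results keyed by image content digest."""

    def __init__(self, scan_fn: Callable[[bytes], dict]):
        self._scan_fn = scan_fn  # e.g. a wrapper around POST /v1/scan
        self._results: dict[str, dict] = {}

    def scan(self, image_bytes: bytes) -> dict:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._results:
            # Only successful scans are cached: if scan_fn raises, the
            # exception propagates and nothing is stored (fail-closed).
            self._results[key] = self._scan_fn(image_bytes)
        return self._results[key]
```

Cached entries would go stale when detection models are updated, so a real deployment would likely add an expiry time or clear the cache on model changes.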

Can batch scanning satisfy HIPAA or SOX audit trail requirements?

Yes — batch scans produce a scan_id for every image processed, regardless of whether the image was flagged. A nightly batch run over all images processed in the prior 24 hours generates a per-image evidence log: image hash, scan timestamp, score, and scan_id. This log can be used as ITGC (IT General Controls) evidence for SOX Section 404 audits (demonstrating that an AI input validation control ran on all financial images processed by the system) or as a PHI access audit supplement for HIPAA. See SOX compliance and AI security and HIPAA-compliant AI security for control-mapping details.
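One way to assemble the per-image evidence entry described above is sketched below. It assumes the scan result carries the `score` and `scan_id` fields used elsewhere on this page. The JSON field names, the choice of SHA-256 for the image hash, and recording the timestamp locally at write time are assumptions made for illustration, not a documented Glyphward log format.

```python
import hashlib
import json
from datetime import datetime, timezone

def evidence_record(image_bytes: bytes, scan_result: dict) -> str:
    """Build one JSON Lines audit entry for a scanned image.

    Field names are hypothetical; the timestamp is taken locally in UTC.
    """
    record = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "scanned_at": datetime.now(timezone.utc).isoformat(),
        "score": scan_result["score"],
        "scan_id": scan_result["scan_id"],
    }
    return json.dumps(record, sort_keys=True)
```

Writing one line per image, flagged or not, is what lets an auditor confirm that the control ran on every image rather than only on the ones that were blocked.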

How does fail-closed differ between real-time and batch modes?

In real-time mode, fail-closed means: if the scanner does not respond within the timeout window, treat the image as blocked and return an error to the caller. The LLM call does not happen. In batch mode, fail-closed means: if a scan attempt fails (network error, rate limit), mark that image as "unscanned" in the audit log, exclude it from any downstream pipeline that requires clean status, and retry on the next batch run. Batch mode has no user waiting, so there is no real-time block decision; the fail behaviour is quarantine and retry rather than immediate rejection.
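The batch-mode rule can be sketched as a three-way split of results. This is a minimal sketch assuming each result is a dict that may carry `score` and `error` keys, as produced by the batch function on this page; `triage_batch` and the bucket names are hypothetical.

```python
def triage_batch(results: list[dict], threshold: int = 70) -> dict[str, list[dict]]:
    """Split batch results into clean, flagged, and unscanned buckets.

    A result with an "error" key or no score counts as unscanned: it is
    neither clean nor flagged, and should be retried on the next run.
    """
    buckets: dict[str, list[dict]] = {"clean": [], "flagged": [], "unscanned": []}
    for result in results:
        score = result.get("score")
        if "error" in result or score is None:
            buckets["unscanned"].append(result)
        elif score >= threshold:
            buckets["flagged"].append(result)
        else:
            buckets["clean"].append(result)
    return buckets
```

Only the `clean` bucket would be released to downstream pipelines. The `unscanned` bucket feeds the retry list for the next batch run.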

What happens if real-time scan latency exceeds my SLA?

If the Glyphward real-time scan adds more latency than your SLA allows, consider: (a) reducing image resolution before scanning (resize to 512×512 before the scan call; this does not affect detection quality for most adversarial typographic payloads, which survive downscaling); (b) running the scan in parallel with other pre-processing steps using asyncio.gather() or a thread pool; (c) accepting a fail-open policy for images whose scans time out, supplemented by a nightly batch sweep that retrospectively flags any adversarial images processed during timeout events. See the pricing page for rate limits by plan tier.
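Option (b) can be sketched with `asyncio.gather()` and worker threads. This sketch assumes Python 3.9 or later (for `asyncio.to_thread`) and that both the scanner call and the pre-processing step are ordinary blocking functions. `scan_fn`, `preprocess_fn`, and the 0.18-second default timeout are illustrative parameters, not part of any Glyphward client.

```python
import asyncio
from typing import Callable, Optional

async def scan_alongside_preprocessing(
    image_bytes: bytes,
    scan_fn: Callable[[bytes], dict],         # blocking scanner call (hypothetical)
    preprocess_fn: Callable[[bytes], bytes],  # blocking pre-processing (hypothetical)
    scan_timeout_s: float = 0.18,
) -> tuple[Optional[dict], bytes]:
    """Run the scan and pre-processing concurrently in worker threads.

    Returns (scan_result, preprocessed). scan_result is None if the scan
    timed out or failed; the caller applies its fail-open or fail-closed policy.
    """
    async def guarded_scan() -> Optional[dict]:
        try:
            # On timeout the await is abandoned; the worker thread itself
            # is not interrupted and finishes in the background.
            return await asyncio.wait_for(
                asyncio.to_thread(scan_fn, image_bytes), timeout=scan_timeout_s
            )
        except Exception:
            return None

    scan_result, preprocessed = await asyncio.gather(
        guarded_scan(), asyncio.to_thread(preprocess_fn, image_bytes)
    )
    return scan_result, preprocessed
```

Because the scan overlaps with pre-processing, the added latency is roughly the longer of the two steps rather than their sum. A `None` result should still be handled explicitly: block the image under fail-closed, or queue it for the nightly sweep under fail-open.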

Further reading