
Prompt-injection scanner for document AI pipelines

Document AI services (Google Cloud Document AI, AWS Textract, Azure Document Intelligence, formerly Form Recognizer, and custom vision-LLM extraction stacks) extract structured data from document images submitted by external users: invoices, purchase orders, identity documents, contracts, medical records, and tax forms. Pipelines built on these services tend to treat the document image as a trusted source of data, but any document image submitted by an external user is untrusted input. An attacker who controls a document submission can render adversarial text over a form field, using typographic prompt injection or near-invisible text such as white on a white background, that instructs the downstream LLM to return a false extracted value. The standard document validation pipeline (schema validation, format checks, value-range assertions) does not catch this attack because the extracted value looks structurally correct. Glyphward's pixel-level scanner detects adversarial overlays before the extraction call runs, blocking the attack at the document intake boundary.

TL;DR

At document intake, before calling Document AI, Textract, or your custom extraction LLM: convert the document to per-page images, call POST https://glyphward.com/v1/scan for each page, and reject any page with a returned score ≥ 65. Document pipelines benefit from a slightly lower threshold (65 vs 70) because document images are structured and false-positive rates on clean documents are low. Free tier — 10 scans/day, no card required.

The document AI prompt injection attack

The canonical document AI prompt-injection (PI) attack works as follows. The attacker creates a document image (a scan, a PDF rendered to an image, or a photo) in which a form field contains both the expected value, printed in the standard font, and an adversarial instruction, rendered in tiny white text on a white background or in a font that OCR misses but the vision model's image encoder still captures. When the extraction LLM processes the page, it "sees" both the legitimate content and the pixel pattern of the adversarial instruction. The instruction redirects the model's extraction output: it changes a dollar amount, a date, a name, or a document classification.

This attack is distinct from standard document fraud (physically altering the document) because: the alteration is invisible to a human reviewing the document, it survives the normal document scan → image → extraction pipeline without triggering format validation, and it cannot be detected by text-layer analysis of the extracted output (the extracted text looks valid — it is the value the attacker intended to produce).
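To make the "invisible to a human reviewer" point concrete, here is a minimal sketch that computes the WCAG 2.x contrast ratio between an overlay colour and the page background. The colour values are illustrative assumptions, not taken from a real attack, and this is not a description of how Glyphward's scanner works. It only shows that near-white text on white has almost no perceptual contrast even though the pixels differ.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB colour, between 0.0 (black) and 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio; 1.0 means identical colours, 21.0 is black on white."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
NEAR_WHITE = (250, 250, 250)   # illustrative overlay colour (assumption)
BLACK = (0, 0, 0)

overlay_contrast = contrast_ratio(NEAR_WHITE, WHITE)   # close to 1: effectively invisible
normal_contrast = contrast_ratio(BLACK, WHITE)         # ordinary printed text
```

An overlay with a ratio this close to 1 is far below anything a reviewer would notice on screen or in print, yet every pixel of it is still present in the image handed to the extraction model.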

Research documenting vision-LLM susceptibility to typographic overlays: the FigStep paper (arXiv 2311.05608) demonstrates this attack class on instruction-following vision models, with direct applicability to extraction pipelines that use the same model architectures.

High-risk document AI use cases

Invoice and purchase order processing (AP automation). AI-powered accounts-payable automation that extracts vendor name, invoice number, amount due, and payment terms from uploaded invoices is a high-value PI target. A supplier who submits a manipulated invoice image can instruct the extraction AI to output a different bank account number or a higher amount, which the AP system routes for payment without human review.

KYC and identity document verification. Onboarding AI that extracts date-of-birth, address, and document number from ID photos and selfies operates in a fraud-sensitive environment. An adversarial overlay on a submitted ID image can alter the extracted date-of-birth (to pass an age verification check) or the extracted address (to pass a residence check) without the overlay being visible to a human reviewer.

Contract and agreement analysis. Legal AI that classifies contract clauses, extracts terms, or identifies risk language in uploaded contract images accepts documents from counterparties — untrusted external parties with incentive to influence the AI's classification. A PI overlay on a contract page can suppress a risk flag or alter an extracted clause classification.

Medical record and clinical document ingestion. Healthcare AI that processes uploaded clinical documents — patient-submitted records, referral letters, imaging reports from external providers — has an amplified risk profile because incorrect extraction affects clinical decisions. See HIPAA-compliant AI security for the full compliance framing.

Tax and financial document processing. AI that extracts income, deductions, or asset values from submitted tax forms or financial statements accepts documents from applicants, clients, or counterparties. False extraction in these pipelines can affect loan eligibility decisions, tax calculations, or regulatory reporting.

Implementation: per-page scan before extraction

import base64, io, hashlib, os, requests
from pdf2image import convert_from_bytes   # pip install pdf2image; also needs the system Poppler package (e.g. apt install poppler-utils)
from PIL import Image                      # Pillow, installed as a pdf2image dependency

GLYPHWARD_API_KEY = os.environ["GLYPHWARD_API_KEY"]
SCAN_THRESHOLD = 65   # lower threshold for structured document images

def scan_document_for_pi(document_bytes: bytes, source: str) -> list[dict]:
    """
    Convert document to per-page images, scan each page.
    Returns list of scan results per page.
    Raises ValueError if any page is flagged.
    Raises RuntimeError if the scanner returns an error response (fail-closed).
    """
    # Support both raw image bytes and PDF (PDF files start with the bytes %PDF)
    if document_bytes[:4] == b'%PDF':
        pages = convert_from_bytes(document_bytes, dpi=150)
    else:
        pages = [Image.open(io.BytesIO(document_bytes))]

    scan_results = []
    for page_num, page_image in enumerate(pages, start=1):
        buf = io.BytesIO()
        page_image.save(buf, format='PNG')
        image_b64 = base64.b64encode(buf.getvalue()).decode()
        image_sha256 = hashlib.sha256(buf.getvalue()).hexdigest()

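        try:
            resp = requests.post(
                "https://glyphward.com/v1/scan",
                headers={
                    "Authorization": f"Bearer {GLYPHWARD_API_KEY}",
                    "Content-Type": "application/json",
                },
                json={"image": image_b64, "source": f"{source}_page{page_num}"},
                timeout=8,
            )
        except requests.RequestException as exc:
            # Fail-closed: timeouts and connection errors also reject the document
            raise RuntimeError(
                f"PI scanner unreachable on page {page_num}: document rejected. sha256={image_sha256}"
            ) from exc

        if not resp.ok:
            # Fail-closed: scanner unavailable = reject document
            raise RuntimeError(
                f"PI scanner unavailable on page {page_num}: document rejected. sha256={image_sha256}"
            )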

        result = resp.json()
        result["page"] = page_num
        result["image_sha256"] = image_sha256
        scan_results.append(result)

        if result["score"] >= SCAN_THRESHOLD:
            raise ValueError(
                f"Document page {page_num} flagged by PI scanner. "
                f"scan_id={result['scan_id']} score={result['score']}"
            )

    return scan_results   # all pages passed; proceed to extraction

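# Usage (call_document_ai and reject_document are your application's own functions):
try:
    scan_results = scan_document_for_pi(uploaded_file_bytes, "invoice_upload")
except ValueError as e:
    reject_document(str(e))   # flagged page: route to human review
except RuntimeError as e:
    reject_document(str(e))   # scanner unavailable: fail closed, do not extract
else:
    extracted = call_document_ai(uploaded_file_bytes)   # runs only if every page passed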

Scanning each page of a multi-page PDF before extraction ensures that an adversarial overlay on any single page is caught before the extraction call processes the full document. The scan_id for each page is your audit evidence that the document was inspected before extraction.
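One way to keep that audit evidence is to write a record per page as soon as the scan returns. The sketch below is a minimal example that assumes the per-page result fields used in the code above (page, image_sha256, scan_id, score). The document_id parameter and the JSON-lines layout are hypothetical choices for illustration, not part of the Glyphward API.

```python
import json
from datetime import datetime, timezone


def build_audit_records(scan_results: list[dict], document_id: str) -> list[str]:
    """Serialise per-page scan results as JSON lines for an append-only audit log."""
    scanned_at = datetime.now(timezone.utc).isoformat()
    records = []
    for result in scan_results:
        record = {
            "document_id": document_id,            # hypothetical internal identifier
            "page": result["page"],
            "image_sha256": result["image_sha256"],
            "scan_id": result["scan_id"],
            "score": result["score"],
            "scanned_at": scanned_at,              # UTC time the record was built
        }
        records.append(json.dumps(record, sort_keys=True))
    return records
```

Storing the image hash next to the scan_id lets you later show that the bytes sent for extraction are the same bytes that were scanned.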


Coverage matrix

Defence layer | Invoice image | ID document photo | Contract scan | Medical record PDF
Schema / format validation | Post-extraction only | Post-extraction only | Post-extraction only | Post-extraction only
Document forgery detection (metadata checks) | Does not catch pixel overlays | Does not catch overlays | Does not catch overlays | Does not catch overlays
Text-only scanner (LLM Guard, Lakera) | No (image bytes ignored) | No | No | No
Glyphward per-page scan | Yes (pixel-level) | Yes | Yes | Yes

Related questions

Does scanning affect throughput for high-volume document pipelines?

Glyphward's scan API is designed for concurrent use. For high-volume pipelines, scan pages in parallel (one async request per page) rather than sequentially. A 10-page PDF with parallel scanning completes in approximately 200–400 ms total. For very high volume (tens of thousands of documents per day), contact us about batch scanning endpoints and rate limit tiers on the Pro and Team plans.
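A minimal sketch of per-page parallel scanning follows. It uses a thread pool rather than asyncio, which is a simple fit when the HTTP client is the synchronous requests library. The scan_page callable is an assumption: it stands for whatever function in your code sends one page to the scan endpoint and returns a dict containing at least "page" and "score".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def scan_pages_in_parallel(
    pages: list[bytes],
    scan_page: Callable[[bytes, int], dict],   # hypothetical: scans one page, returns its result
    threshold: int = 65,
    max_workers: int = 8,
) -> list[dict]:
    """Scan every page concurrently; raise ValueError if any page meets the threshold."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(scan_page, page_bytes, page_num)
            for page_num, page_bytes in enumerate(pages, start=1)
        ]
        # .result() re-raises any exception from scan_page, so a scanner
        # failure on one page still rejects the whole document (fail-closed).
        results = [future.result() for future in futures]

    flagged = [r for r in results if r["score"] >= threshold]
    if flagged:
        pages_hit = ", ".join(str(r["page"]) for r in flagged)
        raise ValueError(f"Document flagged by PI scanner on page(s): {pages_hit}")
    return results
```

Unlike the sequential loop, this version waits for all pages before deciding, so a rejection message can list every flagged page at once. Keep max_workers within whatever rate limit applies to your plan.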

How does this interact with Google Document AI's built-in fraud detection?

Google Document AI includes a document fraud detection processor for certain document types (ID documents, financial forms). That processor checks for digital manipulation artifacts and known fraud patterns in the metadata and pixel structure. Glyphward is complementary: our scanner targets adversarial pixel-level PI payloads specifically, which are distinct from the image manipulation artifacts that fraud detection identifies. Use both: Glyphward before the LLM extraction call, Document AI's fraud processor as a post-extraction validation step.

What threshold should I use for structured business documents vs handwritten uploads?

For structured, typed documents (invoices, forms, contracts): use a threshold of 60–65. False positive rates on clean typed documents are very low because the scanner can clearly distinguish adversarial instruction patterns from standard form typography. For handwritten documents (handwritten forms, notes, medical charts): use threshold 70–75. Handwriting has higher variance that can produce elevated scores on clean images. If you have a mixed corpus, run a calibration pass with a sample of known-clean documents from your specific use case to confirm the threshold before deploying to production.
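The calibration pass can be as simple as scanning a sample of known-clean pages, collecting their scores, and checking how many would be rejected at each candidate threshold. The sketch below assumes you already have that list of scores. The function names and the acceptable false-positive rate are illustrative choices, not Glyphward recommendations.

```python
from typing import Optional


def false_positive_rate(clean_scores: list[float], threshold: float) -> float:
    """Fraction of known-clean pages that would be rejected at this threshold."""
    if not clean_scores:
        raise ValueError("need at least one clean score")
    rejected = sum(1 for score in clean_scores if score >= threshold)
    return rejected / len(clean_scores)


def lowest_acceptable_threshold(
    clean_scores: list[float],
    candidates: list[float],
    max_fpr: float,
) -> Optional[float]:
    """Return the lowest candidate whose false-positive rate is within max_fpr, else None."""
    for threshold in sorted(candidates):
        if false_positive_rate(clean_scores, threshold) <= max_fpr:
            return threshold
    return None
```

Picking the lowest threshold that stays within your false-positive budget keeps the scanner as strict as the corpus allows. A None result means none of the candidates is usable for that corpus and the sample deserves a closer look.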

Can I scan documents that are already in Google Cloud Storage or S3?

Yes, with an intermediate step. Fetch the document from Cloud Storage or S3 using your service account credentials, convert it to per-page images in your application, and send the image bytes to Glyphward. Do not send the storage URL of a private object: when given a URL, Glyphward fetches the image from within our scanning infrastructure, so the URL must be publicly accessible. For documents in private buckets, always download the file in your own environment and send the bytes rather than a pre-signed URL; this avoids access-control complexity.
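For S3, the download-then-scan step might look like the sketch below. It assumes a boto3 S3 client is passed in (for example boto3.client("s3")) and that scan_document is a function with the same shape as scan_document_for_pi above. The function name and the source label format are hypothetical.

```python
from typing import Callable


def scan_private_s3_object(
    s3_client,                                     # e.g. boto3.client("s3"); passed in by the caller
    bucket: str,
    key: str,
    scan_document: Callable[[bytes, str], list],   # e.g. scan_document_for_pi
) -> list:
    """Download a private object with your own credentials, then scan the bytes."""
    # get_object returns a dict whose "Body" is a stream exposing .read()
    response = s3_client.get_object(Bucket=bucket, Key=key)
    document_bytes = response["Body"].read()
    # Only image bytes are sent to the scanner; no URL or credential is shared.
    return scan_document(document_bytes, f"s3_{bucket}_{key}")
```

Passing the client in keeps credentials handling in one place and makes the function easy to exercise with a stub in tests.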

Further reading