Prompt-injection scanner for insurance AI

Insurance AI is rapidly expanding from text extraction to vision-capable workflows: motor claims processed from photos of damage, property claims assessed from smartphone images, life underwriting that analyses medical imaging, and document-intake pipelines that ingest claimant-submitted PDFs and forms. Each of these workflows receives images from claimants — untrusted external parties with direct financial incentive to manipulate the AI's output. A FigStep-class adversarial payload rendered into a claims photo at 30×30 pixels is invisible to a human reviewer and invisible to text-only input guards. The vision LLM reads it as an instruction: change the damage assessment, approve the claim, upgrade the coverage tier. Every text-only prompt-injection scanner on the market — Lakera Guard, LLM Guard, Azure Prompt Shields — operates on the text channel and cannot see the pixel stream. Glyphward is the scanner that reads what the model reads.

TL;DR

For insurance AI pipelines that accept claimant images or documents: call Glyphward's /v1/scan endpoint before passing any image to a vision LLM. Score ≥ 60 (conservative threshold for insurance) → hold for human review and log the anomaly. Score < 60 → pass to the model as normal. Sub-200 ms scan latency; full audit log via scan_id for SOC 2 CC6.6, NAIC Model Law 668 compliance records, and GDPR Article 22 automated decision-making audit trails. Free tier — 10 scans/day, no card.

Multimodal PI attack surfaces in insurance AI

Motor and property claims photos. Self-service claims portals ask claimants to upload photos of vehicle damage, storm damage, or personal property loss. These images are passed to a vision LLM for automated damage assessment, repair estimate generation, or reserve-setting decisions. A claimant can embed a typographic injection payload in the damage photo — a text overlay styled to match the background, or a low-contrast instruction in the corner of the image — that instructs the model to inflate the damage rating, approve total-loss status, or assign the claim to a high-reserve bucket. The image passes a visual human review because the payload is designed to be imperceptible at human inspection speeds and resolutions. The vision model reads it at full resolution.

Medical underwriting imaging and health claim documents. Life and health insurers use AI to analyse medical images and reports during underwriting. When the image input is provided by the applicant (a scan of a health document, a photo of a prescription) rather than retrieved from an authoritative medical source, it is an untrusted input. A fraudulent applicant who knows which vision model is in use can craft a medical document image that injects instructions into the underwriting model, for example to return a lower-risk health rating than the document's content warrants. Such underwriting systems fall under the EU AI Act's high-risk category in Annex III, point 5(c) (risk assessment and pricing for life and health insurance) and, for US insurers, within the scope of the information security programs required under NAIC Model Law 668 (Insurance Data Security).

Identity document verification. Know-Your-Customer and anti-fraud workflows that verify identity documents (driver's licences, passports, national IDs) by passing them to a vision LLM for field extraction are vulnerable to document manipulation attacks. A forged document with an adversarial overlay can instruct the extraction model to return a clean verification result for a fraudulent identity. The attack surface here is compounded by the fact that identity documents are high-value targets with well-resourced adversaries. See prompt-injection scanner for financial services AI for the KYC/AML variant of this attack surface.

Handwritten claim forms and agent notes scanned to PDF. OCR-plus-LLM pipelines that extract structured data from scanned handwritten forms receive claimant-controlled raster images. A handwritten instruction to the model, placed in a margin or added as a subtle overlay in the scan, is a direct typographic prompt injection at handwriting resolution. Standard OCR engines (Tesseract, Azure Form Recognizer) are tuned for printed text and can miss or discard handwriting and low-contrast marks, but a vision LLM reading the full page image sees the entire pixel stream, including anomalous text the OCR layer missed or discarded.

Third-party adjuster reports and repair estimates. In property and motor claims, third-party adjusters, garages, and repair vendors submit documents that feed the claims AI. These documents arrive from external parties and are not under the insurer's control. A compromised adjuster system — or a vendor with incentive to inflate estimates — can embed adversarial payloads in submitted documents. This is the indirect prompt injection via image pattern applied to a supply-chain document flow.

Python integration for insurance claims pipeline

Insurance AI pipelines are typically Python-based. Insert a scan call before any client.chat.completions.create() call that includes an image part:

import base64
import os
from pathlib import Path

import httpx

GLYPHWARD_API_KEY = os.environ["GLYPHWARD_API_KEY"]
INSURANCE_SCAN_THRESHOLD = 60  # conservative for insurance decisions

def scan_image_before_llm(image_path: str | Path, claim_id: str) -> dict:
    """Scan an image for PI payloads. Raise if risk too high for insurance context."""
    image_bytes = Path(image_path).read_bytes()
    image_b64 = base64.b64encode(image_bytes).decode()

    resp = httpx.post(
        "https://glyphward.com/v1/scan",
        headers={"Authorization": f"Bearer {GLYPHWARD_API_KEY}"},
        json={
            "image": image_b64,
            "source": "insurance_claims",
            "metadata": {"claim_id": claim_id},
        },
        timeout=5.0,
    )
    resp.raise_for_status()
    result = resp.json()

    # Log every scan for SOC 2 / NAIC 668 audit trail.
    # log_scan_result is the pipeline's own audit-logging helper, defined elsewhere.
    log_scan_result(
        claim_id=claim_id,
        scan_id=result["scan_id"],
        score=result["score"],
        flagged_region=result.get("flagged_region"),
    )

    if result["score"] >= INSURANCE_SCAN_THRESHOLD:
        raise PIAdversarialImageError(
            f"Image blocked: scan_id={result['scan_id']} "
            f"score={result['score']} claim={claim_id}"
        )
    return result


class PIAdversarialImageError(Exception):
    """Raised when a submitted image exceeds the PI risk threshold."""
    pass

Wrap every claims-intake image call with scan_image_before_llm(). The raised PIAdversarialImageError should route the claim to a human review queue rather than auto-rejecting it — a high PI score indicates adversarial content, not necessarily a fraudulent claim, and wrongful rejection has its own regulatory and customer-relations costs.
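
The routing described above could look roughly like the following. This is a minimal sketch, not a prescribed integration: `route_claim_image`, `ReviewQueue`, and the injected `scan` callable are hypothetical names, and the exception is redefined only so the snippet runs on its own.

```python
from dataclasses import dataclass, field


class PIAdversarialImageError(Exception):
    """Mirrors the exception in the integration code; redefined for a standalone sketch."""


@dataclass
class ReviewQueue:
    """Hypothetical in-memory stand-in for a human review queue."""
    items: list = field(default_factory=list)

    def enqueue(self, claim_id: str, reason: str) -> None:
        self.items.append({"claim_id": claim_id, "reason": reason})


def route_claim_image(image_path, claim_id, scan, queue) -> str:
    """Return 'auto' if the image may go to the vision LLM, else 'human_review'.

    `scan` is any callable with the signature of scan_image_before_llm().
    A blocked image is queued for a person to look at; the claim is not rejected.
    """
    try:
        scan(image_path, claim_id)
    except PIAdversarialImageError as exc:
        queue.enqueue(claim_id, reason=str(exc))
        return "human_review"
    return "auto"
```

Passing the scan function in as an argument keeps the routing logic testable without network access; in a real pipeline the queue would be whatever case-management system the claims team already uses.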

Regulatory and compliance mapping

NAIC Model Law 668 (Insurance Data Security). The National Association of Insurance Commissioners' Insurance Data Security Model Law (#668, adopted in variant form by multiple US states) requires licensees to maintain a risk-based information security program. The NAIC's separate model bulletin on the use of AI systems by insurers adds expectations around governance, documentation, and controls for AI systems. Logging every scan result with a scan_id tied to the claim_id creates an auditable record of PI-mitigation controls that supports the accountability and human-oversight evidence these frameworks call for.
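
The `log_scan_result` helper that the integration code calls is not defined on this page. One possible shape, offered as a sketch under assumptions: the record fields mirror the keyword arguments used in that call, while the `logged_at` timestamp, the `sink` parameter, and the JSON-lines format are illustrative choices rather than anything Glyphward prescribes.

```python
import json
import sys
from datetime import datetime, timezone


def log_scan_result(claim_id, scan_id, score, flagged_region=None, sink=None):
    """Write one JSON line per scan so each decision can be traced by scan_id."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),  # assumed field
        "claim_id": claim_id,
        "scan_id": scan_id,
        "score": score,
        "flagged_region": flagged_region,
    }
    out = sink if sink is not None else sys.stdout
    out.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Writing to an append-only sink keeps the scan record independent of the claim record, so an auditor can check that every image reaching the model has a matching scan_id.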

EU AI Act Annex III (High-Risk AI). AI systems used for risk assessment and pricing in life and health insurance are listed as high-risk systems under EU AI Act Annex III, point 5(c). High-risk systems must implement a risk management system (Article 9), maintain technical documentation (Article 11), and support human oversight (Article 14). PI detection at the image-input boundary is a direct implementation of Article 9's requirement to identify and analyse the known and reasonably foreseeable risks of the AI system and to adopt measures that address them. See also EU AI Act Article 15 and multimodal prompt injection.

SOC 2 CC6.6 (Logical and Physical Access — System Inputs). SOC 2 CC6.6 requires that "the entity implements logical access security measures to protect against threats from sources outside its system boundaries." Untrusted claimant images are inputs from outside the system boundary. Scanning them for adversarial content — and logging the scan results — is a direct control implementation for CC6.6. See SOC 2 AI security controls for prompt injection.

GDPR Article 22 (Automated Individual Decision-Making). In the EU, solely automated insurance decisions must comply with GDPR Article 22, which requires safeguards such as the right to obtain human intervention and to contest the decision, and is usually read alongside the GDPR's duty to give meaningful information about the logic involved. If an AI system's decision was influenced by an adversarial PI payload embedded in a claimant-submitted image, the explanation provided to the claimant would be misleading: the system would report a decision based on the document's content when the actual driver was the injected instruction. Catching PI payloads before the model processes the image preserves the integrity of the automated decision and the validity of any resulting explanation.

Coverage matrix

Defence layer | Claims photo with embedded overlay | Handwritten form with typographic PI | Third-party adjuster PDF with image PI | Identity document manipulation
Text-only scanner (Lakera, LLM Guard) | No (image bytes skipped) | No | No | No
OCR-based text extraction (Tesseract, Azure) | No (OCR discards anomalous glyphs) | Partial (misses visual overlays) | No (embedded-image pages missed) | No (trained for printed text)
Human reviewer | No (payload designed to evade inspection) | No (imperceptible at review speeds) | No | No (requires pixel-level diff)
Glyphward (pixel-level + waveform) | Yes | Yes | Yes (page-render scan) | Yes

Related questions

What threshold should insurance AI use?

The default Glyphward threshold of 70 is calibrated for a balance of detection rate and false-positive rate across general-purpose use cases. For insurance decisions — where a false-negative (missed payload) can result in fraudulent claim approval with direct financial loss — we recommend a threshold of 60. Scores 60–70 should route to human review rather than auto-rejection; scores above 70 can trigger an immediate hold and security alert. For high-value claims above a configurable monetary threshold, consider dropping to 55. Adjust based on your observed false-positive rate over the first 30 days of deployment.
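
The tiering above can be written down as a small decision function. This is a sketch under assumptions: the names `Action` and `decide` are hypothetical, the 50,000 cutoff for "high-value" is a placeholder for the configurable monetary threshold, and a score of exactly 70 is treated as part of the 60–70 review band.

```python
from enum import Enum

DEFAULT_THRESHOLD = 60     # recommended for insurance decisions
HIGH_VALUE_THRESHOLD = 55  # suggested for high-value claims
ALERT_ABOVE = 70           # scores above this trigger hold + security alert


class Action(Enum):
    PASS = "pass"
    HUMAN_REVIEW = "human_review"
    HOLD_AND_ALERT = "hold_and_alert"


def decide(score: float, claim_amount: float,
           high_value_cutoff: float = 50_000) -> Action:
    """Map a scan score to an action; high_value_cutoff is a placeholder value."""
    threshold = (HIGH_VALUE_THRESHOLD if claim_amount > high_value_cutoff
                 else DEFAULT_THRESHOLD)
    if score > ALERT_ABOVE:
        return Action.HOLD_AND_ALERT
    if score >= threshold:
        return Action.HUMAN_REVIEW
    return Action.PASS
```

Keeping the thresholds as named constants makes the 30-day false-positive tuning a configuration change rather than a code change.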

Does scanning affect claims processing speed?

The Glyphward scan completes in under 200 ms. For a typical claims processing pipeline where OCR, database writes, and LLM inference add up to 3–15 seconds, a serial scan adds roughly 1–7% to the total processing time per claim (200 ms against 15 seconds and 3 seconds respectively). For high-volume motor claims pipelines (thousands of claims per hour), use the batch scan endpoint or asynchronous scan pattern to parallelise the scan with upstream document intake steps rather than adding it as a serial step before the LLM call.
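
The asynchronous pattern might be sketched as follows, assuming the scan and the upstream intake work are both exposed as coroutines. `process_claim`, `scan`, and `intake` are hypothetical names; in a real pipeline the scan coroutine could wrap an `httpx.AsyncClient` request, and the batch endpoint is not shown because its request shape is not described here.

```python
import asyncio


async def process_claim(image_b64, claim_id, scan, intake):
    """Run the scan concurrently with intake work instead of serially.

    `scan(image_b64, claim_id)` and `intake(claim_id)` are caller-supplied
    coroutines. The LLM step should only start once both have finished and
    the scan score has been checked against the threshold.
    """
    scan_result, intake_result = await asyncio.gather(
        scan(image_b64, claim_id),
        intake(claim_id),
    )
    return {"claim_id": claim_id, "scan": scan_result, "intake": intake_result}
```

Because the scan overlaps with OCR and database writes, its latency is hidden whenever the intake step takes longer than the scan itself.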

We use a vision model built into our claims management system — can we still integrate?

If your claims management system vendor exposes an API or webhook at the document-intake step (before the LLM call), you can insert the Glyphward scan in that pre-processing hook. If the system is a closed black box with no pre-LLM hook, contact Glyphward about a proxy integration — the Team tier includes a configurable proxy mode that intercepts image inputs, scans them, and either passes or blocks them before forwarding to the downstream LLM endpoint.

What about voice claim submissions?

WhisperInject-class attacks embedded in recorded voice claim submissions are a real threat for insurers offering voice claim channels. A claimant who submits an audio file with an out-of-band waveform instruction can influence the transcription step in ways that alter the claim record. Use Glyphward's audio scan endpoint ("audio": "<base64>") before passing claim audio to Whisper or any other STT layer. See WhisperInject detection for the waveform attack class and audio prompt-injection detection for the integration pattern.
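
As a sketch of the request body for an audio scan: only the `"audio"` field with a base64 value comes from the description above. The `source` and `metadata` fields are assumed to carry over from the image request, and the endpoint path is left to the caller because it is not stated here. `build_audio_scan_payload` is a hypothetical helper name.

```python
import base64


def build_audio_scan_payload(audio_bytes: bytes, claim_id: str) -> dict:
    """Build the JSON body for an audio scan request.

    "audio" is the documented field; "source" and "metadata" are assumptions
    mirrored from the image scan request.
    """
    return {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "source": "insurance_claims",
        "metadata": {"claim_id": claim_id},
    }
```

The payload would be posted before the recording reaches the speech-to-text step, with the same threshold and logging treatment as image scans.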

Further reading