
Prompt-injection scanner for customer service AI

Customer service AI has crossed into mainstream production deployment. Zendesk AI (powered by OpenAI), Intercom Fin AI (Claude), and Freshdesk Freddy AI routinely read ticket attachments — error screenshots, product photos, delivery evidence images — as part of automated triage, routing, and suggested-reply generation. Every image attachment from every customer is an untrusted input. A customer who submits a support ticket with a crafted screenshot can embed adversarial pixel-level instructions that redirect the AI's triage classification, alter a suggested reply, or suppress a fraud flag. Text-only prompt injection scanners — which inspect the text body of the ticket — are blind to payloads hidden in image attachments. Glyphward scans image bytes before any LLM call and returns a 0–100 risk score in under 200 ms, giving customer service AI platforms a deterministic gate against FigStep-class and typographic PI attacks.

TL;DR

Before any AI triage or reply-generation call that includes an image attachment: download the attachment binary from your helpdesk platform's API, call POST https://glyphward.com/v1/scan with the base64-encoded image, and skip the AI call or route to human review if the returned score ≥ 70. Free tier — 10 scans/day, no card required.
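The gate described above can be sketched independently of any helpdesk platform. The helper names `build_scan_request` and `gate_decision` are hypothetical. The `image` and `source` fields and the threshold of 70 come from this page. Treating a response without a usable score as high risk is an assumption (fail closed), not documented behaviour.

```python
import base64

SCORE_THRESHOLD = 70  # block threshold suggested on this page


def build_scan_request(image_bytes: bytes, source: str) -> dict:
    """Build the JSON body for POST /v1/scan from raw attachment bytes."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "source": source,
    }


def gate_decision(scan_result: dict, threshold: int = SCORE_THRESHOLD) -> str:
    """Return "ai" if the attachment may go to the model, else "human_review".

    Assumption: a result with a missing or non-numeric score is treated
    as high risk, so the ticket falls back to a human.
    """
    score = scan_result.get("score")
    if not isinstance(score, (int, float)) or score >= threshold:
        return "human_review"
    return "ai"
```

Keeping the decision in one small function makes the threshold easy to tune per queue without touching the platform-specific webhook code.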

Why customer service AI is a high-value target

Triage AI controls ticket routing and prioritisation. An AI triage system that assigns priority labels, routes to specific agent queues, or automatically closes certain ticket types based on image content is a high-value target. A successful PI attack against triage AI can: downgrade a legitimate complaint to low priority (sidestepping the SLA attached to higher priorities), route a ticket to the wrong team (delaying resolution and creating confusion), or trigger automated responses that close tickets whose issues are still unresolved.

Suggested-reply AI can be made to output attacker-controlled text. AI-generated suggested replies that an agent approves and sends are a dangerous attack surface. A PI payload in a customer-supplied screenshot can instruct the reply-generation AI to draft a response containing attacker-chosen content, such as a false refund promise, a misleading product description, or a link to an external URL, which an inattentive agent then approves and sends on the company's behalf.

Customer service AI processes images from the highest-volume untrusted group. Unlike internal tools that accept images from employees (who have employment accountability), customer service AI accepts images from the full public customer base — millions of users with no accountability relationship beyond a payment account. The attack surface scales with customer volume.

Integration patterns by platform

Zendesk: Zendesk Apps Framework + webhook trigger. Zendesk's trigger system can fire a webhook when a ticket is created or updated with an attachment. Your webhook receiver downloads the attachment using the Zendesk API, calls Glyphward, and — if flagged — uses the Zendesk API to add an internal note to the ticket, set a custom tag (pi_image_flagged), and route to a human review queue before any AI agent action fires.

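import base64
import os

import requests

# Placeholder: ID of the Zendesk group that handles manual review, read from
# an environment variable of the same name.
HUMAN_REVIEW_QUEUE_ID = int(os.environ["HUMAN_REVIEW_QUEUE_ID"])


def handle_zendesk_ticket_attachment(ticket_id: int, attachment_url: str, token: str) -> bool:
    """Return True if AI triage may proceed, False if the ticket went to human review."""
    # `token` is the base64-encoded Basic credential; use "Bearer ..." for OAuth.
    zendesk_headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}

    try:
        # Download the attachment from Zendesk
        img_resp = requests.get(attachment_url, headers=zendesk_headers, timeout=10)
        img_resp.raise_for_status()
        img_b64 = base64.b64encode(img_resp.content).decode()

        scan = requests.post(
            "https://glyphward.com/v1/scan",
            headers={"Authorization": f"Bearer {os.environ['GLYPHWARD_API_KEY']}",
                     "Content-Type": "application/json"},
            json={"image": img_b64, "source": f"zendesk_ticket_{ticket_id}"},
            timeout=5,
        )
        scan.raise_for_status()
        result = scan.json()
    except (requests.RequestException, ValueError):
        # Fail closed: a failed download, timeout, or unreadable response is treated as high risk
        result = {"score": 100, "scan_id": "SCAN-FAILED"}

    score = result.get("score", 100)
    if score >= 70:
        # Tag ticket and suppress AI processing
        requests.put(
            f"https://your-domain.zendesk.com/api/v2/tickets/{ticket_id}",
            json={"ticket": {
                # Note: "tags" replaces the ticket's existing tag list
                "tags": ["pi_image_flagged"],
                "comment": {
                    "body": f"[AUTO] Image attachment flagged by PI scanner (score={score}, scan_id={result.get('scan_id', 'unknown')}). Routing to human review.",
                    "public": False,
                },
                # Assumes the review queue is a Zendesk group; use "assignee_id" to target one agent
                "group_id": HUMAN_REVIEW_QUEUE_ID,
            }},
            headers=zendesk_headers,
            timeout=10,
        )
        return False   # suppress AI triage for this ticket

    return True  # safe to proceed with AI triage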

Intercom: Custom inbox action or webhook. Intercom's webhook system fires on conversation.created and conversation.part.created events. Subscribe to these events, scan each image attachment they carry via Glyphward, and use Intercom's API to assign the conversation to a human inbox and add a private note before Fin AI's automated reply fires. A block_reply action in the Inbox SDK is intended for this use case; confirm the exact webhook topic names and the availability of block_reply against Intercom's current developer documentation before building on them.
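A webhook receiver first has to find the image attachments inside the event it was sent. The sketch below shows one way to do that. The payload shape it walks (attachments under the conversation source and under each conversation part, each with `url` and `content_type` keys) is an assumption to verify against a real webhook delivery, and `iter_image_attachments` is a hypothetical helper name.

```python
from typing import Iterator


def iter_image_attachments(event: dict) -> Iterator[str]:
    """Yield URLs of image attachments found in a conversation webhook event.

    Assumed payload shape (verify against a real delivery):
      event["data"]["item"]["source"]["attachments"]
      event["data"]["item"]["conversation_parts"]["conversation_parts"][i]["attachments"]
    """
    item = (event.get("data") or {}).get("item") or {}
    parts = (item.get("conversation_parts") or {}).get("conversation_parts") or []
    holders = [item.get("source") or {}] + list(parts)
    for holder in holders:
        for att in holder.get("attachments") or []:
            content_type = str(att.get("content_type", ""))
            # Only images are scanned here; other file types are skipped
            if content_type.startswith("image/") and att.get("url"):
                yield att["url"]
```

Each yielded URL can then be downloaded and passed to the scan call before any automated reply is allowed to run.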

Freshdesk: Automation trigger + Glyphward webhook. Freshdesk's Automation rules can trigger a webhook when a ticket with an attachment is created. The webhook handler scans the image and, if flagged, uses Freshdesk's API to assign the ticket to a human group, add a private note with the scan_id, and set a custom field (pi_scan_blocked = true) that prevents Freddy AI from generating a reply until a human agent clears the flag.
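The Freshdesk steps can be sketched as two API requests: one ticket update and one private note. The helper name `build_freshdesk_block_requests` is hypothetical, and the custom field key `cf_pi_scan_blocked` assumes Freshdesk's usual `cf_` prefix for custom fields; check the key your account actually generates. The function only builds the requests, so it can be inspected or tested without network access.

```python
def build_freshdesk_block_requests(
    domain: str, ticket_id: int, scan_id: str, score: int, review_group_id: int
) -> list:
    """Build the two requests that hand a flagged ticket to a human group."""
    base = f"https://{domain}.freshdesk.com/api/v2/tickets/{ticket_id}"
    update_ticket = {
        "method": "PUT",
        "url": base,
        "json": {
            "group_id": review_group_id,
            # Assumed custom field key; blocks AI replies until cleared
            "custom_fields": {"cf_pi_scan_blocked": True},
        },
    }
    private_note = {
        "method": "POST",
        "url": f"{base}/notes",
        "json": {
            "body": f"[AUTO] Image attachment flagged by PI scanner "
                    f"(score={score}, scan_id={scan_id}).",
            "private": True,  # internal note, not visible to the customer
        },
    }
    return [update_ticket, private_note]
```

Each entry can be sent with `requests.request(**req, auth=(api_key, "X"), timeout=10)`, following Freshdesk's API-key Basic auth convention.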

Handling false positives without degrading customer experience

A flagged image does not mean you must reject the ticket — it means the AI should not process the image. The recommended pattern is: route flagged tickets to a human agent with a private internal note containing the scan_id; the human agent reviews the image visually, clears the flag in your system, and optionally re-enables AI processing for that specific attachment. The customer sees no difference — their ticket is in the queue and being handled. You avoid both the PI risk and the customer-experience degradation of an outright rejection. False positive rates on standard receipt and screenshot images are low (under 2% in internal testing on typical support ticket corpora).
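The review loop above amounts to a small piece of per-attachment state: flagged until a human clears it. A minimal sketch, assuming you keep this record in your own system (the `AttachmentFlag` class and its field names are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewState(Enum):
    FLAGGED = "flagged"
    CLEARED = "cleared"


@dataclass
class AttachmentFlag:
    """Tracks whether AI processing is allowed for one flagged attachment."""
    scan_id: str
    score: int
    state: ReviewState = ReviewState.FLAGGED
    cleared_by: Optional[str] = None

    def clear(self, agent_id: str) -> None:
        """Record that a human agent reviewed the image and cleared it."""
        self.state = ReviewState.CLEARED
        self.cleared_by = agent_id

    @property
    def ai_allowed(self) -> bool:
        """AI may process the attachment only after a human has cleared it."""
        return self.state is ReviewState.CLEARED
```

Storing the `scan_id` and the clearing agent alongside the state gives you an audit trail for each false positive without exposing anything to the customer.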

Coverage matrix

Defence layer                         | Error screenshot             | Product photo | Delivery photo | Invoice attachment
Helpdesk built-in spam filter         | Email header only            | No            | No             | No
LLM system prompt hardening           | Probabilistic (not reliable) | Probabilistic | Probabilistic  | Probabilistic
Text-only scanner (Lakera, LLM Guard) | No (image bytes ignored)     | No            | No             | No
Glyphward pre-AI scan                 | Yes (pixel-level)            | Yes           | Yes            | Yes

Related questions

Does this add delay to ticket response times?

Glyphward's scan endpoint returns in under 200 ms for typical support ticket images (under 4 MB). In an asynchronous ticket processing pipeline this latency is invisible to the customer — the webhook receives the ticket, the scan runs, and the AI triage fires, all within the same background processing window that already takes several seconds. In real-time chat contexts (Intercom live chat with image attachment), the scan adds under 200 ms to the bot's first response — imperceptible in a conversational UI where a 1–3 second response is already expected.
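In a real-time chat path it helps to put a hard ceiling on how long the bot waits for the scan. The sketch below wraps any scan callable in a time budget and falls back to a high-risk result if the budget is exceeded or the call fails. The helper name `scan_within_budget` and the one-second default are assumptions; the fail-closed result mirrors the fallback used in the Zendesk example on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

FAIL_CLOSED = {"score": 100, "scan_id": "SCAN-FAILED"}


def scan_within_budget(scan_fn: Callable[[], dict], budget_s: float = 1.0) -> dict:
    """Run scan_fn in a worker thread and wait at most budget_s seconds.

    Any timeout or exception yields a copy of FAIL_CLOSED, so the caller
    routes the conversation to a human instead of the AI.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(scan_fn)
    try:
        return future.result(timeout=budget_s)
    except Exception:
        # Covers both the timeout and errors raised inside scan_fn
        return dict(FAIL_CLOSED)
    finally:
        # Do not block the caller; the worker thread finishes on its own
        pool.shutdown(wait=False)
```

Here `scan_fn` would typically be a closure around the HTTP call to the scan endpoint, for example `lambda: requests.post(...).json()`.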

What do we tell customers whose images are flagged?

Do not tell customers their image was blocked by a security scanner — this both leaks information about your detection approach and creates unnecessary friction for legitimate users who submitted benign images. The recommended approach: route the ticket silently to a human agent and respond with your standard "a member of our team will review your ticket shortly" message. The scan_id in your internal log allows your agent to reference the flagged event during their review without the customer knowing a security check ran.

Can we scan audio attachments too?

Yes. Voice AI in customer service — AI that transcribes and analyses voicemail recordings or audio messages attached to tickets — is exposed to audio prompt injection. Glyphward's /v1/scan endpoint accepts audio files as well as images. Pass the base64-encoded audio binary with "media_type": "audio" in the request body. The scanner applies a waveform anomaly classifier and Whisper transcript filter to detect out-of-band injection payloads.

Does this work with Salesforce Service Cloud and Einstein AI?

Yes. Salesforce Service Cloud Einstein AI (including Einstein Copilot for Service) processes case attachments. The integration pattern is the same: subscribe to the Salesforce Streaming API or Platform Events for new case attachments, download the attachment from the Salesforce Content API, scan with Glyphward, and use the Salesforce API to set a case flag that suppresses Einstein AI processing until the scan passes. Salesforce's Flow automation can be configured to gate Einstein AI actions on the presence or absence of the PI scan flag custom field.

Further reading