Healthcare AI security · Clinical imaging

Multimodal prompt injection in healthcare imaging AI

AI-assisted clinical imaging tools — radiology AI second-read systems, pathology digital slide analysers, retinal screening AI, dermatology classifiers — are among the highest-stakes deployments of vision models. These systems process medical images from a range of sources: referring institutions delivering DICOM files via DICOM network (DIMSE) or DICOMweb, PACS-stored images accumulated from prior studies, patient-uploaded photos in telemedicine platforms, and pathology whole-slide images from slide scanners. In each case, the image source introduces a degree of external trust: the institution delivering the DICOM file, the patient submitting the photo, the pathology lab operating the scanner. The medical imaging infrastructure surrounding these AI tools — PACS (Picture Archiving and Communication System), HL7/FHIR messaging for clinical data exchange, HIPAA-compliant storage with audit trails, SOC 2 Type II controls on the AI vendor's infrastructure — is designed to protect patient data privacy, ensure data integrity in transit, and enforce access control. None of these controls inspect image pixel content for embedded adversarial payloads that could direct a vision-language model to produce a specific false clinical finding. A VLM-based radiology AI second-read tool that receives an adversarially crafted DICOM file from a referring institution will process the pixel data, apply its clinical AI model, and may produce a specific false finding — not a random hallucination but a targeted, adversarially steered clinical assertion — if the pixel payload is effective. The clinical workflow then receives an AI-generated finding that is presented for clinician review, potentially under time pressure, in a high-volume reporting environment where AI assistance is trusted to reduce per-study cognitive load. Glyphward provides the pre-VLM scan gate that screens clinical images for adversarial pixel payloads before they reach the AI model.

TL;DR

Healthcare imaging AI pipelines receive DICOM files and medical images from external institutions and patient uploads. These sources introduce adversarial injection risk: pixel-level payloads can direct VLMs to produce specific false clinical findings. PACS, HIPAA, HL7/FHIR, and SOC 2 controls do not address this attack surface. Extract the pixel array from each DICOM file and scan with POST https://glyphward.com/v1/scan before the AI model invocation. Reject images with score >= 65 and route to manual radiologist review. Free tier — 10 scans/day, no card required.

The four multimodal attack surfaces in healthcare imaging AI

1. DICOM files from referring institutions — external-sourced adversarial pixel data in PACS-integrated AI workflows. Radiology AI second-read tools are commonly integrated with hospital PACS systems that receive DICOM files from referring institutions via DICOM network (C-STORE, C-MOVE) or DICOMweb REST API. The referring institution is an external entity with partial trust: the institution is a known clinical partner, DICOM TLS encrypts the transmission, DICOM conformance statements define the SOP classes exchanged, and the PACS audit log records the DICOM association — all controls on the integrity of the DICOM protocol exchange, not on the clinical integrity of the pixel data. An adversarially modified DICOM file — where the pixel data array carries an embedded adversarial payload alongside the legitimate medical image content — passes DICOM network validation (valid SOP class, valid pixel data format, valid DICOM metadata tags) and is stored in the PACS. The AI second-read tool retrieves the study from PACS, extracts the pixel array, runs its clinical AI model, and produces findings. The adversarial payload in the pixel data may direct the AI model toward a specific false finding: a false negative for a radiologically visible lesion (causing the AI not to flag a finding that would otherwise be flagged), or a specific false positive assertion (claiming a finding that is not present). In either case, the PACS audit trail, DICOM TLS, and HIPAA controls provide no signal that the pixel data was adversarially modified.

2. PACS-integrated AI second-read on archived studies — adversarial modification of historical image data. AI second-read tools operate not only on newly received studies but also on historically archived studies — retrospective AI review of prior imaging to flag studies that may have been misread, quality assurance audits of reporting accuracy, or research workflows that apply newer AI models to historical PACS data. Historical PACS archives contain images from past acquisitions across a range of imaging sources. PACS storage integrity is typically protected by WORM-equivalent storage policies, audit logs, and hash-based integrity checking of archived objects. If an adversarial modification occurred before the image entered the PACS archive — at the modality worklist acquisition step, at the DICOM routing layer, or at a third-party imaging service provider — the archived image contains the adversarial pixel payload with the hash-based integrity check computed over the adversarially modified pixel data. The AI second-read tool retrieves the archived study, extracts the pixel array, runs its model, and produces findings that reflect the adversarial payload — with the PACS integrity check confirming that the retrieved image matches the stored version, not that the stored image is free of adversarial modification. This attack surface is particularly relevant for AI retrospective review workflows that operate on large archived datasets at high throughput, where per-study adversarial image detection is not feasible without an automated pre-scan gate.

3. Pathology whole-slide imaging — adversarial regions in multi-gigapixel WSI files for digital pathology AI. Digital pathology AI analyses whole-slide images (WSI): multi-gigapixel image files produced by slide scanners (Aperio, Hamamatsu, Leica, Philips) and stored in formats including SVS, NDPI, SCN, and MRXS. WSI files are processed by AI in tiles: the WSI is divided into overlapping tile grids (typically 256×256 or 512×512 pixels per tile), and the AI model analyses each tile independently or in a multi-scale ensemble. Adversarial WSI attacks can target specific tile regions within the WSI: a maliciously prepared slide (prepared by an adversary with laboratory access, or a modified WSI file from an external laboratory) contains adversarial pixel content in one or more tile regions. These regions are analysed by the AI tile-processing pipeline and may produce specific false tile-level predictions that aggregate to a false slide-level finding. The attack requires adversarial knowledge of the AI model's tile-level classification boundaries — but this knowledge may be obtainable from published AI model papers (model architecture, training dataset) or through black-box probing of the AI system's API. WSI file integrity controls (hash-based storage verification, HIPAA audit trails) verify that the WSI file was not modified after ingestion, not that the WSI was prepared without adversarial intent before ingestion.

4. Telemedicine patient photo upload — adversarial images in AI-assisted dermatology and primary care triage. Telemedicine platforms that use AI to triage patient-submitted photos — dermatology lesion analysis, wound assessment, ophthalmology screening, primary care symptom assessment — process images submitted directly by patients through mobile apps or web portals. The patient is an external untrusted source by definition: while patients are not typically adversarial, telemedicine platforms serving large patient populations may also be accessed by individuals attempting to manipulate AI triage decisions — to obtain a referral for a condition they do not have, to avoid a referral for a condition they do, or to probe AI diagnostic boundaries for research or competitive purposes. Patient-submitted photos bypass the institutional trust controls applicable to DICOM-network-delivered clinical images: there is no DICOM TLS, no PACS integrity check, no institutional DICOM conformance statement. The AI triage system receives a JPEG or PNG from the patient's device and passes it directly to the vision model. Adversarially crafted patient photos — images that carry pixel-level payloads designed to trigger specific AI triage outputs — enter the AI pipeline with no layer of institutional validation between the patient's submission and the AI model's analysis.

Integration: pre-scan gate for DICOM-sourced healthcare imaging AI

import base64
import pydicom
import numpy as np
import requests
from PIL import Image
import io

GLYPHWARD_KEY = "<your-glyphward-api-key>"
GLYPHWARD_THRESHOLD = 65

def extract_pixel_array_as_jpeg(dicom_path: str) -> bytes:
    """
    Extract pixel array from DICOM file and convert to JPEG for scanning.
    Handles 16-bit grayscale (CT/MRI) and colour images (dermoscopy, pathology).
    """
    ds = pydicom.dcmread(dicom_path)
    pixel_array = ds.pixel_array

    # Apply window/level for grayscale modalities (CT, MRI, X-ray)
    if pixel_array.dtype != np.uint8:
        if hasattr(ds, "WindowCenter") and hasattr(ds, "WindowWidth"):
            center = float(ds.WindowCenter) if not isinstance(ds.WindowCenter, pydicom.multival.MultiValue) else float(ds.WindowCenter[0])
            width = float(ds.WindowWidth) if not isinstance(ds.WindowWidth, pydicom.multival.MultiValue) else float(ds.WindowWidth[0])
            lower = center - width / 2
            upper = center + width / 2
            pixel_array = np.clip(pixel_array, lower, upper)
            pixel_array = ((pixel_array - lower) / width * 255).astype(np.uint8)
        else:
            pixel_array = ((pixel_array - pixel_array.min()) / (pixel_array.max() - pixel_array.min()) * 255).astype(np.uint8)

    img = Image.fromarray(pixel_array)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=95)
    return buf.getvalue()

def scan_dicom_image(dicom_path: str) -> dict:
    """Scan DICOM pixel data for adversarial payloads before VLM analysis."""
    jpeg_bytes = extract_pixel_array_as_jpeg(dicom_path)
    encoded = base64.b64encode(jpeg_bytes).decode()

    response = requests.post(
        "https://glyphward.com/v1/scan",
        headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
        json={"image": encoded},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()

def process_clinical_image(dicom_path: str, ai_model_call) -> dict:
    """
    Full clinical image processing pipeline with Glyphward pre-scan gate.
    Routes adversarial images to mandatory manual radiologist review.
    """
    try:
        scan = scan_dicom_image(dicom_path)
    except Exception as e:
        # Fail-closed: scan unavailability → mandatory manual review
        return {
            "status": "manual_review_required",
            "reason": "scan_unavailable",
            "error": str(e),
            "dicom_path": dicom_path,
        }

    if scan["score"] >= GLYPHWARD_THRESHOLD:
        # Adversarial image detected — route to mandatory manual review
        # Do NOT run AI model; do NOT generate AI finding
        return {
            "status": "manual_review_required",
            "reason": "adversarial_image_detected",
            "score": scan["score"],
            "scan_id": scan["scan_id"],
            "dicom_path": dicom_path,
            # Include in radiologist worklist with adversarial flag for awareness
        }

    # AI model call — only reached by non-adversarial images
    ai_finding = ai_model_call(dicom_path)
    return {
        "status": "ai_finding",
        "finding": ai_finding,
        "scan_id": scan["scan_id"],
        "dicom_path": dicom_path,
    }

The DICOM pixel extraction step applies window/level normalisation before scanning — this is important because adversarial payloads embedded in the original 12-bit or 16-bit pixel values of CT or MRI images may be structured to affect the VLM's interpretation after the standard windowing applied by the radiology viewing system. Scanning the windowed JPEG (as the VLM will process it) catches payloads targeting the windowed representation. For pathology WSI files, apply the scan gate to each tile batch rather than the full WSI: extract a representative sample of tiles (or all tiles in a high-throughput scan) and scan each tile; if any tile scan returns score >= 65, route the entire slide to manual pathologist review without AI analysis. Fail-closed is the correct default for clinical imaging: scan unavailability or adversarial detection routes to mandatory manual review, never to AI-assisted analysis under adversarial conditions. This is consistent with FDA guidance on AI/ML-based SaMD that requires human oversight pathways for high-risk AI-generated findings. Get early access

Coverage matrix

Security or compliance control DICOM from referring institution PACS archived study AI review WSI pathology slide AI analysis Telemedicine patient photo upload
DICOM TLS and network encryption Encrypts transmission; does not validate pixel content for adversarial payloads Not applicable to PACS retrieval of already-stored images Encrypts WSI file transfer; does not inspect pixel tile content Not applicable (patient app uploads are HTTPS, not DICOM network)
PACS audit logs and WORM storage integrity Verifies image was not modified after PACS ingestion; does not detect pre-ingestion adversarial modification Hash verifies stored image matches retrieved image; does not detect adversarial modification before archive WSI file integrity checks; do not detect adversarial pixel regions before scanner output Not applicable to telemedicine direct upload
HIPAA technical safeguards and SOC 2 controls Protect PHI access and audit trail; do not inspect image pixel content Same — access control and audit, not pixel content inspection Same — PHI protection, not adversarial image detection Same — patient data protection, not adversarial pixel payload detection
Text-only prompt injection scanners (Lakera, LLM Guard, Azure Prompt Shields) No — scan text inputs; DICOM pixel payloads bypass all text-layer scanners No No No
Glyphward pre-VLM scan gate (multimodal PI detection) Yes — extracts and scans DICOM pixel array before AI model call; routes adversarial files to manual review Yes — scans archived study pixel data before retrospective AI analysis; detects adversarial pre-ingestion modification Yes — scans WSI tiles before pathology AI analysis; routes adversarial slides to manual pathologist review Yes — scans patient-uploaded JPEG/PNG before AI triage call; routes adversarial images to clinician review

Related questions

How does this attack differ from classical adversarial examples on medical image classifiers?

Classical adversarial examples on medical image classifiers (gradient-based perturbations, PGD attacks, C&W attacks) are well-studied: imperceptible pixel perturbations cause a trained classifier to misclassify with high confidence. These attacks require white-box access to the model's gradient or extensive black-box query access for transfer attacks. They produce false classification outputs by exploiting the model's learned decision boundary, without injecting readable instructions. Multimodal prompt injection in healthcare imaging AI targets a different mechanism: the adversarial pixel content carries typed instruction content — a natural-language command embedded in the image's visual structure — that the VLM processes as part of its interpretation of the image content. This requires no access to the model's gradients; any adversary with knowledge of how VLMs process images can craft typographic injection payloads (FigStep-style embedded text) without model access. The attack is accessible to a wider adversary pool than classical gradient-based adversarial examples. VLMs are increasingly used in healthcare AI second-read contexts precisely because their reasoning capability (not just classification score) is valuable for clinical reporting — and this reasoning capability is what typographic prompt injection exploits. Glyphward detects typographic and composite injection attacks; gradient-based imperceptible perturbations targeting specific VLM architectures are a separate concern typically addressed through adversarial training of the underlying model.

Does FDA regulation for AI/ML SaMD address adversarial image inputs?

The FDA's regulatory framework for AI/ML-based Software as a Medical Device (SaMD) addresses AI performance characteristics (sensitivity, specificity, generalisation to new patient populations, out-of-distribution handling) and human-AI integration design (clinician override, transparency of AI confidence, update control). FDA guidance documents (including the 2021 action plan for AI/ML SaMD and the 2024 marketing submissions guidance) discuss performance monitoring, predetermined change control plans, and transparency requirements. They do not specifically address adversarial image attacks as a cybersecurity risk vector for clinical imaging AI. However, the FDA's medical device cybersecurity guidance (2023 final guidance on cybersecurity in medical devices) requires manufacturers to identify, assess, and mitigate cybersecurity risks throughout the product lifecycle. Adversarial image attacks that manipulate AI-generated clinical findings are a cybersecurity risk to the AI SaMD — they represent an integrity threat to the AI output that could affect patient safety. This cybersecurity risk should be addressed in the pre-market submission's cybersecurity risk assessment and mitigated with controls including the pre-VLM scan gate. The EU MDR and EU AI Act similarly impose risk management obligations that extend to adversarial manipulation of AI inputs in high-risk AI systems, which clinical imaging AI qualifies as.

Is there a risk from DICOM metadata injection, or is the risk limited to pixel data?

DICOM metadata (tags in the DICOM header — patient demographics, modality, study description, series description, instance UID) is a separate attack surface from pixel data. Adversarial DICOM metadata attacks target VLMs that are provided with DICOM metadata as part of the clinical context alongside the image: if the system prompt or user message includes DICOM header fields (patient age, study indication, requesting clinician) and the DICOM header is controlled by an attacker, the metadata can carry text-layer prompt injection payloads that redirect the VLM's clinical interpretation. This is a text-layer attack that standard text PI scanners (Lakera, Azure Prompt Shields) can address for the metadata portion of the request. The pixel data attack (Glyphward's domain) and the metadata attack are orthogonal — both should be defended against in a comprehensive clinical imaging AI security architecture. For systems that use DICOM metadata as clinical context in VLM prompts: scan the metadata fields with a text PI scanner before including them in the prompt, and scan the pixel data with Glyphward before passing it to the VLM. The multimodal AI security checklist covers both attack surfaces in its DICOM-specific section.

What is the threshold score recommendation for clinical imaging applications?

The general Glyphward threshold recommendation is 65: reject images with score >= 65 and accept images with score < 65. For clinical imaging applications where false negatives (missing an adversarial image that reaches the AI model) have patient safety consequences, a lower threshold — 50 or 55 — trades more false positives (legitimate clinical images routed to manual review) for fewer false negatives (adversarial images reaching the AI model). The appropriate threshold depends on the clinical workflow's capacity to absorb manual review cases and the patient safety risk profile of the AI-generated finding type: a radiology AI that generates findings used directly in preliminary reports (visible to reporting radiologists as AI-generated suggestions) warrants a lower threshold (more conservative) than a screening AI that generates workflow prioritisation signals only (images flagged by AI for prioritised radiologist attention, not for direct reporting). For pathology tile-level scanning, a per-tile threshold of 50 is appropriate because individual tile false positives route only that tile's region to manual pathologist attention, not the entire slide. Test your chosen threshold against a hold-out set of your clinical image corpus using the Glyphward free tier before deploying to production.

Further reading