OWASP LLM10:2025 Unbounded Consumption — multimodal dimension

OWASP LLM10:2025 Unbounded Consumption describes the risk that LLM deployments allow excessive compute, token, and API resource consumption, enabling denial-of-service attacks, cost amplification, and quota exhaustion. The published mitigations focus on request rate limiting, token budget caps, cost monitoring, and prompt length restrictions. All of these are text-layer controls: they operate on prompt token counts, generation length limits, and per-user request caps.

The multimodal dimension of LLM10 receives little attention in published discussion. When a vision-language model processes an image, any token-equivalent charge typically depends on the image's dimensions rather than its content, yet the content can trigger disproportionate compute at the attention and generation layers. An adversarially crafted high-entropy image, visually resembling a normal photograph but structured to maximise worst-case patch attention computation, can cause a VLM to spend orders of magnitude more inference compute than a normal image of identical file size. Beyond raw compute, adversarial images can exploit retry logic: an image designed to make the VLM produce output that fails downstream validation triggers automated retries, multiplying the resource consumption by the number of attempts.

Text-layer rate limiting by request count or token budget does not prevent image-layer resource exhaustion because the adversarial payload is in the pixel content, not the prompt tokens. Glyphward's pre-VLM scan gate detects adversarially crafted images before they reach the model, blocking the resource exhaustion vector at the input layer.

TL;DR

In any multimodal pipeline where untrusted users can submit images for VLM processing, adversarially crafted high-entropy images are an LLM10 resource exhaustion vector. Token quotas and rate limits operate on text; they do not restrict image-layer compute amplification. Scan every untrusted image with POST https://glyphward.com/v1/scan before the VLM call. Reject images with score >= 65. Adversarial image detection is the only pre-VLM control that closes this input-layer DoS vector. Free tier — 10 scans/day, no card required.

The four multimodal attack surfaces in LLM10 Unbounded Consumption

1. Free-tier public APIs with image upload — adversarial high-entropy images exhausting per-user compute quotas. Public-facing multimodal APIs that allow any user to submit images for VLM analysis — document scanners, image captioning tools, visual question answering, screenshot analysis — are directly exposed to image-layer resource exhaustion. An adversarially crafted high-entropy image submitted by a free-tier user can trigger worst-case attention computation in the VLM's vision encoder: the maximum number of image patches, the highest attention head activation density, the longest completion generation before a stop condition. If the platform's per-user quota is measured in API calls or token-equivalent units rather than GPU-seconds, adversarial images allow a single free-tier user to consume compute equivalent to hundreds or thousands of normal requests. The attack is asymmetric: the adversary submits one API request; the platform incurs the compute cost of many. At scale, coordinated adversarial image submission can exhaust GPU capacity, trigger autoscaling beyond cost budgets, or force rate limiting on legitimate users. The adversarial image bypasses all text-layer protections — request rate limiting, prompt token caps, output length limits — because the attack is in the pixel content, not the request metadata.

2. Webhook and event-driven image processing pipelines — adversarial images triggering retry cascades. Asynchronous image processing pipelines that use webhooks, message queues, or event streams to trigger VLM analysis are exposed to a compounded LLM10 risk from adversarial images. In a typical pipeline, an image processing job is retried on failure — the message is requeued if processing throws an exception, if the VLM returns an unexpected output format, or if downstream validation fails. An adversarial image designed to cause the VLM to produce output that reliably fails the pipeline's downstream validation — wrong schema, invalid JSON structure, content policy flag — triggers the full retry sequence on every processing attempt. If the pipeline retries three times with exponential backoff, one adversarial image submission results in four full VLM inference calls, each consuming full model compute. In a queue-based architecture where the same adversarial image can be submitted by multiple event producers, the retry amplification compounds across all concurrent processing jobs. The adversarial image does not need to be crafted to cause a VLM crash — it only needs to cause a consistent validation failure to exploit the pipeline's own retry logic as a resource amplifier. Message queue poison-pill handling (dead-letter queues, max-retry caps) limits amplification but does not eliminate it; the Glyphward pre-scan gate eliminates the adversarial image before it enters the queue.

3. Multi-modal RAG pipelines with image indexing — adversarial images triggering excessive embedding recomputation. Retrieval-augmented generation systems that index images alongside text — visual knowledge bases, product catalogue RAG, document intelligence systems — are exposed to LLM10 resource exhaustion through the embedding and indexing layer. An adversarially crafted image submitted to a RAG ingestion pipeline can trigger excessive processing at multiple points: the vision encoder run to generate the image embedding (compute-intensive), the similarity search across the existing embedding index (scales with index size), any OCR pass over the image (CPU-intensive for high-entropy pixel regions), and any caching invalidation triggered by a high-entropy embedding that does not cluster with existing index entries. In large-scale RAG systems where embedding computation is distributed and cached, adversarial images with deliberately unusual embedding characteristics can defeat caching (the image embedding falls outside all cached clusters, forcing a full index scan on every retrieval), creating sustained high compute load from a small number of adversarial image submissions. The LLM10 risk compounds if the adversarial image also carries an LLM01 prompt injection payload — the injection steers the VLM to retrieve and surface attacker-specified context, while simultaneously triggering maximum retrieval compute.

4. Client-facing multimodal chatbot UI — adversarial images triggering infinite escalation and model fallback loops. Consumer-facing multimodal chatbots that process user-submitted images — customer service bots with screenshot upload, e-commerce visual search, educational tutoring with diagram upload — implement fallback logic for uncertain or low-confidence VLM responses: retry with a higher-capability model, escalate to a human agent, or request image resubmission. An adversarially crafted image designed to produce maximum uncertainty in the VLM's primary response — triggering the confidence-threshold fallback — forces the pipeline to invoke a more expensive fallback model on every request. If the adversarial image also bypasses the fallback model's confidence threshold, the pipeline escalates to human review, consuming human-agent time in addition to compute. In chatbot deployments with unlimited free image submissions per conversation, this pattern allows a single adversary to systematically exhaust the escalation queue by submitting adversarial images that always trigger the fallback path. Standard LLM10 mitigations (session rate limits, token budgets) do not address image-triggered escalation loops because the escalation is triggered by the image content's effect on VLM output confidence, not by the raw request volume.
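The retry amplification described for event-driven pipelines (surface 2) can be made concrete with a small simulation. This is only a sketch: `run_with_retries`, the stand-in model functions, and the validator are hypothetical names, and the backoff delay is omitted because it changes timing, not the number of inference calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobResult:
    succeeded: bool
    vlm_calls: int  # total inference calls spent on this one submission


def run_with_retries(
    vlm_call: Callable[[bytes], str],
    validate: Callable[[str], bool],
    image: bytes,
    max_retries: int = 3,
) -> JobResult:
    """Run one job: an initial attempt plus up to max_retries retries."""
    calls = 0
    for _attempt in range(1 + max_retries):
        calls += 1
        output = vlm_call(image)
        if validate(output):
            return JobResult(succeeded=True, vlm_calls=calls)
    # Retries exhausted: a real consumer would dead-letter the message here.
    return JobResult(succeeded=False, vlm_calls=calls)


# Hypothetical stand-ins: output for the crafted image never passes
# validation, output for the normal image always does.
def always_invalid(_image: bytes) -> str:
    return "not json"


def always_valid(_image: bytes) -> str:
    return '{"caption": "a cat"}'


def is_json_object(text: str) -> bool:
    return text.startswith("{") and text.endswith("}")


adversarial = run_with_retries(always_invalid, is_json_object, b"crafted")
benign = run_with_retries(always_valid, is_json_object, b"normal")
print(adversarial.vlm_calls, benign.vlm_calls)  # 4 1
```

With three retries configured, the image that always fails validation costs four inference calls against one for the normal image, which is the amplification factor the text describes.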

Integration: pre-scan gate blocking image-layer resource exhaustion

import base64

import requests

GLYPHWARD_KEY = "<your-glyphward-api-key>"
OPENAI_KEY = "<your-openai-api-key>"
GLYPHWARD_THRESHOLD = 65  # reject images scoring at or above this value


def process_image_safe(image_bytes: bytes, user_id: str) -> dict:
    """
    LLM10-safe image processing: scan before VLM invocation.
    Adversarial high-entropy images are blocked before they reach the model.
    """
    # Step 1: Glyphward pre-scan. Blocks adversarial images before the VLM call.
    encoded = base64.b64encode(image_bytes).decode()
    try:
        scan_resp = requests.post(
            "https://glyphward.com/v1/scan",
            headers={"Authorization": f"Bearer {GLYPHWARD_KEY}"},
            json={"image": encoded},
            timeout=5,
        )
    except requests.RequestException:
        # Fail-closed: a timeout or connection error counts as scan unavailability
        return {"status": "error", "reason": "scan_unavailable", "user_id": user_id}

    if scan_resp.status_code != 200:
        # Fail-closed: reject the image, do not fall through to the VLM
        return {"status": "error", "reason": "scan_unavailable", "user_id": user_id}

    scan = scan_resp.json()
    if scan["score"] >= GLYPHWARD_THRESHOLD:
        return {
            "status": "rejected",
            "reason": "adversarial_image_detected",
            "score": scan["score"],
            "user_id": user_id,
            "scan_id": scan["scan_id"],
        }

    # Step 2: VLM call, reached only by images that passed the scan.
    # Apply standard LLM10 text-layer controls in addition to the image-layer scan.
    try:
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {OPENAI_KEY}"},
            json={
                "model": "gpt-4o",
                "max_tokens": 512,  # LLM10 token budget cap (text layer)
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            # The data URL assumes a JPEG upload; adjust the MIME type otherwise
                            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
                            {"type": "text", "text": "Describe this image."},
                        ],
                    }
                ],
            },
            timeout=15,  # LLM10 timeout cap
        )
        response.raise_for_status()
    except requests.RequestException:
        # Surface VLM failures explicitly instead of raising on a missing "choices" key
        return {"status": "error", "reason": "vlm_unavailable", "scan_id": scan["scan_id"], "user_id": user_id}

    return {
        "status": "ok",
        "result": response.json()["choices"][0]["message"]["content"],
        "scan_id": scan["scan_id"],
        "user_id": user_id,
    }

The standard LLM10 text-layer controls, the max_tokens cap and the request timeout, remain in place at the VLM call level. These address text-output length and per-request duration. The Glyphward pre-scan gate addresses the image-input layer that text controls cannot reach: an adversarially crafted image that would trigger worst-case attention computation is blocked before the VLM call is ever issued, so the model never incurs the excessive compute cost. For retry-sensitive pipelines (queue-based architectures with dead-letter queue handling), apply the pre-scan gate at the queue consumer as well as at the ingestion endpoint: the consumer-side gate blocks adversarial images that were queued before the scan gate was added, and it prevents retry amplification by blocking before the first VLM call rather than after the first failure. For escalation-sensitive chatbot deployments, log the Glyphward scan ID alongside every escalation event to detect systematic adversarial escalation patterns across sessions.
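The consumer-side placement can be sketched as follows. Everything here is an assumption for illustration: `handle_message`, the injected `scan_image` and `run_vlm` callables, and the in-memory dead-letter list are hypothetical. Only the `score` and `scan_id` fields and the threshold of 65 come from the integration example above.

```python
from typing import Callable, List, Optional

THRESHOLD = 65  # same rejection threshold as the integration example


def handle_message(
    image: bytes,
    scan_image: Callable[[bytes], Optional[dict]],
    run_vlm: Callable[[bytes], str],
    dead_letter: List[dict],
) -> Optional[str]:
    """Scan first; dead-letter rejected images without retrying them."""
    scan = scan_image(image)
    if scan is None:
        # Fail closed: no scan result means no VLM call.
        dead_letter.append({"reason": "scan_unavailable"})
        return None
    if scan["score"] >= THRESHOLD:
        dead_letter.append(
            {"reason": "adversarial_image_detected", "scan_id": scan["scan_id"]}
        )
        return None
    return run_vlm(image)


# Hypothetical stand-ins for the scanner and the model.
vlm_calls = []


def fake_scan(image: bytes) -> Optional[dict]:
    score = 90 if image == b"crafted" else 5
    return {"score": score, "scan_id": "scan-" + image.decode()}


def fake_vlm(image: bytes) -> str:
    vlm_calls.append(image)
    return "caption"


dlq: List[dict] = []
results = [handle_message(img, fake_scan, fake_vlm, dlq) for img in (b"normal", b"crafted")]
print(results, len(vlm_calls), dlq)
```

Because the rejected message goes straight to the dead-letter list, it never enters the retry path, so the model is called once in total, for the normal image.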

Coverage matrix

Mitigation layer	Public API image upload (quota exhaustion)	Event-driven pipeline (retry cascade)	Multi-modal RAG (embedding recomputation)	Chatbot UI (escalation loop)
Token budget caps and output length limits (LLM10 baseline)	Partial — limits text token generation; does not limit image attention compute	No — retry count caps limit amplification; do not prevent the initial VLM invocation on the adversarial image	No — token limits apply to generation; embedding computation costs are outside token budget scope	Partial — limits response length; does not prevent confidence-threshold escalation triggered by image content
Request rate limiting per user/IP	Partial — limits request count; adversarial image compute amplification per request still occurs	No — queue-based pipelines decouple submission rate from processing rate	No — batch ingestion pipelines may not have per-user rate limits	Partial — session rate limits; adversarial image can trigger escalation within rate limit budget
Text-only prompt injection scanners (Lakera, LLM Guard, Azure Prompt Shields)	No — scan text inputs; image pixel payloads bypass all text-layer scanners	No	No	No
Glyphward pre-VLM image scan (multimodal PI detection)	Yes — blocks adversarial high-entropy images before VLM call; no model compute incurred	Yes — blocks adversarial image at queue consumer; prevents initial VLM call and all retries	Yes — blocks adversarial image before embedding computation and index search	Yes — blocks adversarial image before VLM confidence assessment; prevents escalation cascade

Related questions

How is image-layer resource exhaustion different from text-based DDoS on LLM APIs?

Text-based DDoS on LLM APIs (sending many requests with long prompts or requesting maximum token outputs) is addressed by standard LLM10 controls: request rate limiting, prompt length caps, output token budgets, and per-user cost throttling. These controls operate on measurable text-layer parameters. Image-layer resource exhaustion is categorically different because the attack is in the pixel content of a single image, not in the volume of requests or the length of text. One adversarially crafted image submitted within the normal rate limit can trigger worst-case VLM attention computation — the model spends maximum GPU time on a single request that is superficially indistinguishable from a normal image submission. The adversary does not need to flood the API; they need to craft one or a few images that maximise per-request compute cost. Text-layer rate limits do not help because the attack is within the rate limit budget. The correct countermeasure is image content inspection before the VLM call — which is what the Glyphward pre-scan gate provides.
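The gap between request-count limits and per-request compute cost can be illustrated with a budget metered in measured inference time rather than in calls. This is a minimal sketch under stated assumptions: `ComputeBudget` is a hypothetical helper, the durations are illustrative rather than measured, wall-clock seconds stand in for GPU-seconds, and window resets are left out. It limits damage after the fact and complements the pre-scan gate rather than replacing it.

```python
from collections import defaultdict


class ComputeBudget:
    """Per-user budget in seconds of measured inference time (hypothetical)."""

    def __init__(self, limit_seconds: float) -> None:
        self.limit_seconds = limit_seconds
        self.spent = defaultdict(float)

    def allow(self, user_id: str) -> bool:
        # A request is admitted while the user still has budget left.
        return self.spent[user_id] < self.limit_seconds

    def charge(self, user_id: str, seconds: float) -> None:
        # Called after the VLM call with its measured duration.
        self.spent[user_id] += seconds


budget = ComputeBudget(limit_seconds=30.0)

# (user, measured seconds per request): illustrative values only.
request_log = [
    ("alice", 0.5),
    ("mallory", 20.0),
    ("alice", 0.5),
    ("mallory", 20.0),
    ("mallory", 20.0),
]

served = defaultdict(int)
for user, duration in request_log:
    if budget.allow(user):
        served[user] += 1
        budget.charge(user, duration)

print(dict(served))  # {'alice': 2, 'mallory': 2}
```

A limit of five requests per window would have served every request in this log. The time-based budget cuts off the expensive sender after two requests while leaving the normal user unaffected.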

Can file size limits prevent image-layer resource exhaustion?

File size limits reduce the attack surface for extremely large adversarial images but do not eliminate image-layer resource exhaustion. VLM vision encoders process images by dividing them into patches and computing attention across all patch pairs — the compute cost scales with the number of patches squared, not linearly with file size. A high-entropy adversarial image at a standard size (e.g., 1024×1024 JPEG within a 5MB file size limit) can trigger maximum attention computation because the adversarial structure targets the patch-level attention mechanism, not the file byte count. Additionally, adversarial images designed to trigger retry cascades exploit the pipeline's logic (validation failures, confidence thresholds) rather than raw model compute — these work regardless of image size. File size limits are a useful baseline defence but are not a substitute for image content inspection. The Glyphward pre-scan gate inspects image content rather than file metadata, catching adversarial structure that is invisible to size-based filters.
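The quadratic relationship between patch count and attention cost can be checked with simple arithmetic. The sketch below assumes a ViT-style encoder with non-overlapping 16-pixel patches and no resizing or tiling. Real VLMs often downscale or tile inputs, so the absolute numbers are illustrative, and the function names are hypothetical.

```python
import math


def patch_count(width: int, height: int, patch_size: int = 16) -> int:
    """Number of non-overlapping patches, rounding partial patches up."""
    return math.ceil(width / patch_size) * math.ceil(height / patch_size)


def attention_pairs(patches: int) -> int:
    """Patch pairs scored by one self-attention layer: patches squared."""
    return patches * patches


for side in (256, 512, 1024):
    n = patch_count(side, side)
    print(f"{side}x{side}: {n} patches, {attention_pairs(n):,} pairs")
# 256x256: 256 patches, 65,536 pairs
# 512x512: 1024 patches, 1,048,576 pairs
# 1024x1024: 4096 patches, 16,777,216 pairs
```

Doubling the side length quadruples the patch count and multiplies the pair count by sixteen, while the compressed file size depends on the encoding and need not grow in step. This is why a byte limit alone says little about attention cost.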

Which other OWASP LLM Top 10 items interact with LLM10 in multimodal contexts?

LLM10 Unbounded Consumption in multimodal pipelines interacts with two other OWASP LLM items. LLM01 Prompt Injection is the mechanism that enables the most dangerous LLM10 amplification: a single adversarial image can trigger maximum compute (LLM10), inject a command to produce invalid output that causes retries (further LLM10 amplification), and direct the model toward attacker-controlled content (the LLM01 objective). Addressing LLM01 at the input layer with a pre-scan gate simultaneously addresses the image-layer LLM10 vector. LLM04 Model Denial of Service, from the earlier 2023 edition of the OWASP LLM Top 10, covered resource exhaustion attacks more narrowly; LLM10 in the 2025 list absorbs that coverage and extends it to cost amplification beyond pure availability attacks. The multimodal dimension applies equally: pixel-level resource exhaustion payloads exploit the gap that text-layer DoS mitigations leave open.

How do I detect if my pipeline is already being targeted by adversarial image resource exhaustion?

Several observable signals indicate adversarial image resource exhaustion targeting: (1) abnormally high per-request inference latency for image requests with normal file sizes — a VLM call on a 512KB image taking 10× longer than average suggests worst-case attention activation; (2) high dead-letter queue rates for image processing jobs without corresponding upstream error signals — adversarial images that reliably fail downstream validation will appear in dead-letter queues with consistent failure modes; (3) anomalous cost spikes on GPU usage metrics that correlate with specific users or IP ranges submitting images, not with request count spikes; (4) escalation rate anomalies in chatbot deployments — a single user session triggering escalation on every image submission. If any of these patterns are present, retroactively scan the image corpus with Glyphward to identify which images are adversarially crafted. The multimodal AI security checklist includes monitoring recommendations for LLM10 resource exhaustion signals.
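Signal (1) can be turned into a simple check over per-request latency records. This is a sketch only: `flag_slow_requests`, the request IDs, and the latency values are hypothetical. The median is used as the baseline instead of the mean so that the outliers being hunted do not inflate the reference value.

```python
from statistics import median
from typing import Dict, List


def flag_slow_requests(latencies: Dict[str, float], factor: float = 10.0) -> List[str]:
    """Return IDs of image requests slower than factor times the median."""
    baseline = median(latencies.values())
    return sorted(rid for rid, secs in latencies.items() if secs > factor * baseline)


# Illustrative latencies in seconds for image requests of similar file size.
observed = {
    "req-1": 1.1,
    "req-2": 0.9,
    "req-3": 1.0,
    "req-4": 14.0,
    "req-5": 1.2,
}

print(flag_slow_requests(observed))  # ['req-4']
```

Flagged request IDs are candidates for the retroactive scan described above. A slow request is not proof of an adversarial image, only a reason to inspect it.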
