vLLM Video Frames Can Flood Inference Memory (CVE-2026-5497)
Key Takeaways
- CVE-2026-5497 affects vLLM versions 0.8.0 and later.
- The vulnerable method, VideoMediaIO.load_base64(), processes video/jpeg data URLs without enforcing a frame count limit.
- A single request containing many base64 JPEG frames can cause out-of-memory denial of service.
- The vulnerability highlights resource governance risks in multimodal AI inference APIs.
Summary
CVE-2026-5497 is a denial-of-service vulnerability in vLLM, an AI inference framework, caused by unbounded video frame processing. The flaw shows how multimodal inputs can create availability risks when inference servers accept complex media data without resource limits.
What We Know
On June 11, 2026, public vulnerability databases published CVE-2026-5497 for vLLM, a widely used inference framework for serving large language models. OpenCVE describes an out-of-memory denial-of-service condition in VideoMediaIO.load_base64() affecting vLLM versions 0.8.0 and later. CVEFeed reports that video/jpeg data URLs are split into individual JPEG frames without a frame count limit, allowing a crafted request to force excessive decoding into memory. The GitHub Advisory Database published the issue as a high-severity advisory. This is AI-related because vLLM is a core component of many LLM serving stacks, and the vulnerable path sits in multimodal input handling for inference infrastructure.
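As a rough aid for triage, the affected range described above can be checked against an installed package. The sketch below is only an illustration: the helper names (release_tuple, in_affected_range, installed_vllm_is_affected) are hypothetical, and because the sources cited here give only a lower bound of 0.8.0 and no fixed release, the check flags every version at or above that bound. Consult the advisory itself for the patched release before relying on it.

```python
import re
from importlib import metadata

# Lower bound of the affected range as stated in the advisory text.
# No upper bound is assumed, because no fixed release is named here.
AFFECTED_FROM = (0, 8, 0)


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release (major, minor, patch) from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0; suffixes such as "rc1" or
    # ".post1" are ignored, so pre-releases are flagged conservatively.
    return tuple(int(part or 0) for part in match.groups())


def in_affected_range(version: str) -> bool:
    """Return True when the version is 0.8.0 or later."""
    return release_tuple(version) >= AFFECTED_FROM


def installed_vllm_is_affected():
    """Return True/False for the installed vllm package, or None if it is not installed."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None
    return in_affected_range(installed)
```

Calling installed_vllm_is_affected() in the serving environment gives a quick yes, no, or not-installed answer that can feed an inventory script.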
What Could Happen
The flaw is a resource consumption bug. When vLLM processes a video/jpeg data URL, the vulnerable code splits the base64 string on commas and treats the result as a collection of frames. Without a maximum frame count or memory budget, an attacker can send a single request containing thousands of frame-like segments. The server attempts to decode them, consumes excessive memory, and crashes or becomes unavailable. This is not a confidentiality or integrity attack. Its primary impact is availability. The AI-specific aspect is the input type and deployment role. Multimodal AI endpoints process richer media formats than traditional text APIs, and inference servers are resource-intensive even under normal load. That means malformed media can magnify cost, latency, and service stability issues. OpenAI-compatible APIs can also be exposed broadly by design, increasing the need for rate limits, media validation, and request-level resource controls.
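The missing control described above, a maximum frame count plus a memory budget checked before any decoding happens, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not vLLM's code or its actual patch: the function and constant names are hypothetical, the limits are placeholder values, and the input is assumed to be the comma-separated base64 portion of a video/jpeg data URL with the header already removed.

```python
import base64

# Hypothetical limits; tune them to the deployment's memory budget.
MAX_FRAMES = 64
MAX_TOTAL_DECODED_BYTES = 32 * 1024 * 1024  # 32 MiB


class MediaLimitError(ValueError):
    """Raised when a payload exceeds the configured frame or byte budget."""


def estimate_decoded_size(b64_text: str) -> int:
    """Upper bound on decoded size: base64 yields at most 3 bytes per 4 characters."""
    return (len(b64_text) * 3) // 4


def split_frames_with_limits(
    payload: str,
    max_frames: int = MAX_FRAMES,
    max_total_bytes: int = MAX_TOTAL_DECODED_BYTES,
) -> list[bytes]:
    """Decode comma-separated base64 frames only after both limits pass."""
    # Count separators first so an oversized request is rejected before
    # a list of thousands of segments is allocated.
    frame_count = payload.count(",") + 1
    if frame_count > max_frames:
        raise MediaLimitError(f"{frame_count} frames exceeds limit of {max_frames}")

    # Check the byte budget from the encoded length, still without decoding.
    if estimate_decoded_size(payload) > max_total_bytes:
        raise MediaLimitError("decoded payload would exceed the byte budget")

    # validate=True rejects characters outside the base64 alphabet.
    return [base64.b64decode(segment, validate=True) for segment in payload.split(",")]
```

The point of the design is ordering: both checks run on the raw string, so a request with thousands of frame-like segments is refused at the cost of one scan rather than after memory has already been consumed.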
Why It Matters
Availability failures in AI inference infrastructure can halt customer-facing copilots, internal assistants, developer tools, and automated workflows. For organizations using vLLM to serve production models, a denial-of-service bug can create operational disruption even without data theft. The financial impact may include excess compute costs, degraded service-level agreements, and engineering time spent on emergency mitigation. The reputational impact can be significant when business users rely on AI workflows for support, analysis, or operations. The incident also raises governance implications for multimodal AI adoption. Security teams must validate not only prompts and model outputs, but also media parsers, request sizes, decoding behavior, and resource limits. As AI systems accept images, audio, and video, classical input validation becomes part of AI safety and reliability.
PointGuard AI Perspective
PointGuard AI helps organizations manage this kind of AI infrastructure risk by combining visibility, policy enforcement, and runtime oversight. The PointGuard AI Runtime Guardrails focus on live AI traffic, applying controls before harmful or abnormal interactions disrupt workflows. The AI Runtime Protection glossary outlines why production AI systems need continuous monitoring of inputs, outputs, and behavior during active use. For agentic deployments that route requests through tools and services, the PointGuard AI Agent Control Plane can help validate actions and contain runaway behavior that consumes resources or destabilizes connected systems. In a vLLM scenario, PointGuard AI’s value is in helping teams discover exposed inference endpoints, enforce acceptable input policies, monitor unusual request patterns, and coordinate response when AI services become unstable. The broader takeaway is that AI runtime security includes availability. A trustworthy AI program must protect models from abuse, data from leakage, and infrastructure from malicious resource exhaustion.
Incident Scorecard Details
Total AISSI Score: 6.1/10
Criticality = 6.5 (AISSI weighting: 25%): The main impact is inference availability rather than data exposure.
Propagation = 6.0 (AISSI weighting: 20%): Any similarly exposed vulnerable multimodal endpoint can be affected, but the attack is not self-propagating.
Exploitability = 6.0 (AISSI weighting: 15%): The attack is low complexity and unauthenticated under the described conditions, with no broad exploitation confirmed.
Supply Chain = 7.5 (AISSI weighting: 15%): vLLM is a third-party AI inference framework used across AI stacks.
Business Impact = 5.0 (AISSI weighting: 25%): Potential service disruption is credible, but no confirmed material impact has been reported.
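The total can be reproduced from the five rows if the AISSI score is assumed to be a simple weighted sum of the category scores. The article does not state the formula, so treat that as an assumption; it is consistent with the published figures, which sum to 6.1. The names below are illustrative only.

```python
# Scores and percentage weights copied from the scorecard rows above.
SCORECARD = {
    "Criticality": (6.5, 25),
    "Propagation": (6.0, 20),
    "Exploitability": (6.0, 15),
    "Supply Chain": (7.5, 15),
    "Business Impact": (5.0, 25),
}


def weighted_total(scorecard: dict) -> float:
    """Weighted sum of scores, assuming the weights are percentages totalling 100."""
    total_weight = sum(weight for _, weight in scorecard.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    return sum(score * weight for score, weight in scorecard.values()) / 100


# 1.625 + 1.200 + 0.900 + 1.125 + 1.250 = 6.1
print(f"Total AISSI Score: {weighted_total(SCORECARD):.1f}/10")
```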
