NVIDIAScape: Critical Container Escape in Toolkit Puts Cloud AI at Host-Takeover Risk
Key Takeaways
- The flaw (CVE-2025-23266), dubbed NVIDIAScape, was discovered by cloud-security firm Wiz — it enables a malicious container image to escape isolation and execute arbitrary code on the host via a misconfigured OCI hook.
- The exploit is trivial: a “three-line Dockerfile” is sufficient to trigger the escape — attackers only need container-run privileges.
- Affected versions: NVIDIA Container Toolkit up to and including 1.17.7, and NVIDIA GPU Operator up to and including 25.3.0. Fixed in 1.17.8 and 25.3.1, respectively.
- The vulnerability poses high risk, especially in multi-tenant GPU cloud environments — a malicious container from one user can compromise neighboring workloads, steal models/data, or persist across the host.
Summary
Core AI Infra Isn’t Immune: Container Escape in NVIDIA Toolkit Shows Host-Level Risk
In July 2025, cloud-security firm Wiz disclosed the NVIDIAScape vulnerability (CVE-2025-23266), a flaw in NVIDIA’s Container Toolkit that undermines container isolation for GPU workloads. The toolkit’s createContainer OCI hook runs with root privileges on the host but inherits environment variables from the container image; by setting LD_PRELOAD to point at a malicious shared library (.so) shipped inside the container, an attacker can cause the privileged hook to load that library, escaping the container boundary and gaining root on the host. (wiz.io)
Because the toolkit is widely used in cloud and enterprise AI deployments — often on multi-tenant GPU clusters — the vulnerability allows a single malicious container to compromise the entire host, including all co-located workloads, potentially stealing sensitive data, proprietary models, or credentials. (wiz.io)
The exploit is alarmingly simple and requires only minimal permissions, making it a severe threat to the foundational infrastructure that powers modern AI and ML services.
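The “three-line Dockerfile” that researchers described follows a pattern like the sketch below. This is illustrative only, not a working exploit: poc.so stands in for an attacker-compiled shared library, and the /proc/self/cwd indirection reflects public write-ups reporting that the privileged hook’s working directory resolves into the container’s root filesystem.

```dockerfile
# Illustrative sketch only: poc.so is a placeholder for an
# attacker-compiled shared library; this is not a working exploit.
FROM busybox
# The createContainer hook inherits this variable from the image config.
# Per public write-ups, /proc/self/cwd reportedly resolves into the
# container's root filesystem from the hook's perspective, so the
# privileged hook process on the host loads the attacker's library.
ENV LD_PRELOAD=/proc/self/cwd/poc.so
ADD poc.so /
```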
What Happened: Incident Overview
- The vulnerability affects how the NVIDIA Container Toolkit handles OCI lifecycle hooks for GPU containers. Specifically, the createContainer hook inherits environment variables from the container image, including LD_PRELOAD, which can be used to load arbitrary shared libraries on the host. (wiz.io)
- Researchers demonstrated a working exploit using a minimal Dockerfile: setting LD_PRELOAD, adding a malicious .so file, and launching the container — resulting in root-level code execution on the host. (The Hacker News)
- The flaw affects both on-prem and cloud environments. In shared GPU clouds or Kubernetes clusters, a malicious user or compromised container image can compromise other tenants’ workloads. In single-tenant environments, the host and all contained workloads are at risk. (wiz.io)
- Upon disclosure, NVIDIA released patches: Container Toolkit v1.17.8 and GPU Operator v25.3.1. Organizations are urged to update immediately. (The Hacker News)
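Since the fix boundaries are public (Container Toolkit 1.17.8, GPU Operator 25.3.1), a fleet audit can reduce to simple version comparisons. The sketch below assumes you can already collect version strings (e.g. from `nvidia-ctk --version` output or a package inventory); the helper names are illustrative.

```python
# Sketch of a fleet-audit check for CVE-2025-23266 (NVIDIAScape).
# Version strings are assumed to be gathered elsewhere; only the
# comparison logic is shown here.

PATCHED_TOOLKIT = (1, 17, 8)   # first fixed NVIDIA Container Toolkit release
PATCHED_OPERATOR = (25, 3, 1)  # first fixed NVIDIA GPU Operator release

def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string like '1.17.7' into a comparable tuple."""
    return tuple(int(part) for part in version.strip().split("."))

def toolkit_is_vulnerable(version: str) -> bool:
    """True if the installed Container Toolkit predates the 1.17.8 fix."""
    return parse_version(version) < PATCHED_TOOLKIT

def operator_is_vulnerable(version: str) -> bool:
    """True if the installed GPU Operator predates the 25.3.1 fix."""
    return parse_version(version) < PATCHED_OPERATOR
```

Tuple comparison handles the multi-digit components correctly (e.g. 1.17.7 < 1.17.8 but 25.3.0 < 25.3.1), which naive string comparison would not.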
Impact: Why It Matters
- Full host takeover from a container — a core isolation guarantee of containerization collapses, undermining trust in GPU container security.
- Cross-tenant compromise — in shared GPU clusters or cloud AI services, one malicious container can compromise data, models, and workloads of all tenants on the node.
- Widespread exposure — NVIDIA’s container tools are foundational to hundreds of AI platforms and cloud providers. Estimates suggest this flaw affects roughly 35–37% of GPU-based cloud environments worldwide. (wiz.io)
- Ease of exploitation — minimal code needed; just a crafted container image with a few lines. No privileged access required beyond container run permission.
- Persistent systemic risk — even patched systems remain vulnerable if untrusted container images are used, or if patching is delayed. Older variants (e.g. CVE-2024-0132) also exist, showing recurring issues in the toolkit. (CSO Online)
This incident is a stark reminder that infrastructure components undergirding AI (GPUs, container runtimes, orchestration layers) must receive the same security scrutiny as application code or models.
PointGuard AI Perspective
At PointGuard AI, we treat infrastructure as a first-class security concern. The NVIDIAScape incident underscores why:
- Our asset discovery includes container runtimes, GPU toolchains, and orchestration layers — not just code or models.
- We enforce configuration posture checks, flagging vulnerable toolkit versions, unsafe hooks (like LD_PRELOAD), and use of untrusted container images in shared environments.
- Runtime monitoring & isolation enforcement help detect anomalous container activity, unexpected library loads, or attempts to escalate privileges.
- Governance and supply-chain controls help maintain an AI-SBOM, including infrastructure dependencies, and track which GPU hosts or clusters might be vulnerable.
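A minimal sketch of the kind of posture check described above, assuming the scanner can read a container bundle’s OCI runtime spec (config.json). The function names and the prefix list are illustrative, not an exhaustive policy.

```python
import json

# Linker-controlling variables that should not appear in untrusted images
# bound for shared GPU hosts; illustrative, not exhaustive.
SUSPICIOUS_ENV_PREFIXES = ("LD_PRELOAD=", "LD_LIBRARY_PATH=", "LD_AUDIT=")

def flag_suspicious_env(oci_spec: dict) -> list[str]:
    """Return env entries from an OCI runtime spec that could redirect
    dynamic loading inside a privileged hook process."""
    env = oci_spec.get("process", {}).get("env", [])
    return [entry for entry in env if entry.startswith(SUSPICIOUS_ENV_PREFIXES)]

def scan_config(path: str) -> list[str]:
    """Load a bundle's config.json and flag risky env entries."""
    with open(path) as fh:
        return flag_suspicious_env(json.load(fh))
```

Such a check would have surfaced the NVIDIAScape pattern before the container ever ran, since the malicious LD_PRELOAD value is visible in the image’s environment configuration.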
If you run containerized AI workloads — especially in multi-tenant or shared GPU clusters — this is a wake-up call. Infrastructure is not inert: it’s active, shared, and exploitable.
Incident Scorecard Details
Total AISSI Score: 8.5 / 10
- Criticality = 9: container escape enables full host compromise, root privileges, and data/model theft.
- Propagation = 8: affects many shared GPU cloud environments globally.
- Exploitability = 9: the exploit requires only a three-line Dockerfile and container-run access.
- Supply Chain = 8: the flaw resides in a foundational GPU container toolchain used widely across AI infrastructure.
- Business Impact = 9: potential for cross-tenant compromise, data loss, model theft, downtime, and massive regulatory or reputational damage.
Sources
- SecurityWeek — Critical Nvidia Security Flaw Exposes Cloud AI Systems to Host Takeover (SecurityWeek)
- The Hacker News — Critical NVIDIA Container Toolkit Flaw Allows Privilege Escalation on AI Cloud Services (The Hacker News)
- CVEDetails — CVE-2025-23266: NVIDIA Container Toolkit Hooks Vulnerability Allows Privileged Code Execution (cvedetails.com)
- CyberSecurityNews.com — PoC Exploit Released for Critical NVIDIA AI Container Toolkit Vulnerability (Cyber Security News)
- CSO Online — A critical NVIDIA Container Toolkit bug can allow a complete host takeover (CSO Online)
