A concise field guide to sandboxing untrusted AI code
A compact field guide separates sandboxing into three distinct decisions (boundary, policy, and lifecycle) and uses that separation to evaluate the practical tradeoffs of common sandboxing approaches for AI workloads. The framing is intentionally pragmatic: running model-generated or user-supplied code is now a routine source of risk, and the right sandbox depends on what must be trusted, what must be accessible, and what must persist between runs.
The three-question model (quick)
- What is shared between the untrusted code and the host?
- What can the code touch (files, network, devices, syscalls)?
- What survives between runs (ephemeral run, workspace, or snapshot/restore)?
This simple checklist is presented as the minimum needed to reason about a sandbox’s actual guarantees.
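The three questions can be written down as a small data model, which makes it harder to leave one of them unanswered. The following is a minimal sketch, not something taken from the guide: every name in it (`Boundary`, `Policy`, `Lifecycle`, `SandboxSpec`) is hypothetical, and the policy fields are a simplified assumption about what such a policy might list.

```python
from dataclasses import dataclass
from enum import Enum


class Boundary(Enum):
    """Question 1: what is shared between the untrusted code and the host."""
    SHARED_KERNEL = "container"
    USERSPACE_KERNEL = "gvisor"
    GUEST_KERNEL = "microvm"
    LANGUAGE_RUNTIME = "wasm/isolate"
    LOCAL_PROCESS = "local OS sandbox"


class Lifecycle(Enum):
    """Question 3: what survives between runs."""
    EPHEMERAL = "ephemeral"
    WORKSPACE = "workspace"
    SNAPSHOT = "snapshot/restore"


@dataclass(frozen=True)
class Policy:
    """Question 2: what the code can touch. Empty defaults mean nothing is allowed."""
    readable_paths: tuple[str, ...] = ()
    writable_paths: tuple[str, ...] = ()
    allowed_hosts: tuple[str, ...] = ()


@dataclass(frozen=True)
class SandboxSpec:
    """One answer to each of the three questions (hypothetical structure)."""
    boundary: Boundary
    policy: Policy
    lifecycle: Lifecycle

    def describe(self) -> str:
        network = ", ".join(self.policy.allowed_hosts) or "none"
        return (f"boundary={self.boundary.value}; network={network}; "
                f"lifecycle={self.lifecycle.value}")


# Example: a throwaway microVM that may only reach a package index.
spec = SandboxSpec(
    boundary=Boundary.GUEST_KERNEL,
    policy=Policy(writable_paths=("/workspace",), allowed_hosts=("pypi.org",)),
    lifecycle=Lifecycle.EPHEMERAL,
)
print(spec.describe())
```

Keeping the three answers in separate fields mirrors the guide's point that they are distinct decisions: changing the boundary does not by itself tighten the policy or shorten the lifecycle.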
Sandbox families at a glance
- Containers: convenient and dense, but they share the host kernel. Common failure modes include misconfiguration, reachable kernel bugs, and policy leakage (credentials, repos, network). Hardening helps, but does not change the shared-kernel boundary.
- gVisor: a userspace “Sentry” that intercepts syscalls and sharply reduces the host-kernel interface. Trades compatibility and some performance for a smaller trusted surface.
- MicroVMs (examples: Firecracker, cloud-hypervisor, libkrun): run a guest kernel behind a minimal VMM. Provide full Linux semantics, snapshot/restore options, and a narrower host interface (KVM ioctls + virtio devices), shifting the attack surface away from the host syscall ABI.
- Runtime sandboxes (Wasm, V8 isolates): no ambient syscall ABI; host capabilities must be explicitly granted. Excellent for high density and fast cold starts when tasks can be modeled as capability-scoped operations rather than arbitrary binaries.
- Local OS sandboxes (macOS Seatbelt, Linux Landlock + seccomp, Windows AppContainer): lightweight, per-process controls suitable for desktop agents where the primary risk is local data access rather than kernel 0-day.
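The "explicitly granted" model of runtime sandboxes can be illustrated with a toy host in plain Python. This shows only the deny-by-default shape of such an interface. Python itself provides no isolation here, and `Host`, `grant`, `call`, and `CapabilityError` are hypothetical names, not part of any Wasm or V8 API.

```python
class CapabilityError(PermissionError):
    """Raised when guest code asks for a capability it was not granted."""


class Host:
    """Toy host: the guest reaches only functions the host explicitly grants."""

    def __init__(self):
        self._granted = {}

    def grant(self, name, func):
        self._granted[name] = func

    def call(self, name, *args):
        # Deny by default: nothing is reachable unless it was granted.
        if name not in self._granted:
            raise CapabilityError(f"capability not granted: {name}")
        return self._granted[name](*args)


def guest_task(host):
    """Stand-in for untrusted code; its only view of the world is `host`."""
    total = host.call("add", 2, 3)
    try:
        host.call("read_file", "/etc/passwd")
        outcome = "read allowed"
    except CapabilityError:
        outcome = "read denied"
    return total, outcome


host = Host()
host.grant("add", lambda a, b: a + b)  # the only capability handed out
result = guest_task(host)
print(result)
```

In a real Wasm runtime or isolate the same idea is enforced by the runtime rather than by convention: the guest has no way to name a host function that was never imported into it.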
Practical signposts
- For multi-tenant coding agents that require shells and package managers, a microVM boundary is a common recommendation.
- For tool-calling and plugin isolation, Wasm or isolates are often the cleanest fit.
- For teams already invested in Kubernetes and willing to trade some compatibility, gVisor is a pragmatic middle ground.
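The signposts above can be condensed into a small lookup function. This is a sketch under stated assumptions rather than the guide's decision table: the `Workload` fields and `suggest_boundary` are hypothetical, and the order in which the rules are checked is an arbitrary choice made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload; every field is an assumption."""
    multi_tenant: bool = False
    needs_shell_and_packages: bool = False
    tool_calls_only: bool = False
    on_kubernetes: bool = False
    accepts_compat_tradeoff: bool = False


def suggest_boundary(w: Workload) -> str:
    """Map a workload to a starting-point boundary, following the signposts."""
    # Tool-calling and plugin isolation: Wasm or isolates.
    if w.tool_calls_only and not w.needs_shell_and_packages:
        return "wasm or isolate"
    # Existing Kubernetes investment plus tolerance for compatibility gaps: gVisor.
    # Checking this before the microVM rule is an assumption of this sketch.
    if w.on_kubernetes and w.accepts_compat_tradeoff:
        return "gvisor"
    # Multi-tenant agents that need shells and package managers: microVM.
    if w.multi_tenant and w.needs_shell_and_packages:
        return "microvm"
    # The signposts do not cover this case.
    return "undecided"


print(suggest_boundary(Workload(multi_tenant=True, needs_shell_and_packages=True)))
print(suggest_boundary(Workload(tool_calls_only=True)))
```

The result is best read as a first candidate to evaluate against the three questions, not as a final answer.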
For an in-depth comparison, concrete operational notes, and decision tables that map workloads to boundaries and policies, consult the full field guide at the original source: A field guide to sandboxes for AI.