Running Claude in YOLO Mode: Productivity vs. Security and Sandboxing

Simon Willison warns that Claude's YOLO mode can speed up coding but enables a 'lethal trifecta'—private data access, untrusted content, and outgoing communication—risking secret exfiltration. His remedy: run agents in isolated remote sandboxes.


TL;DR

  • YOLO mode (CLI flag: --dangerously-skip-permissions): lets Claude Code run long, low-supervision tasks by skipping the normal permission checks
  • 48-hour proof-of-concept work: got DeepSeek-OCR running on an NVIDIA Spark in Docker; ran Pyodide in Node.js to execute Python in WebAssembly; compiled SLOCCount (Perl + C) to run in-browser via WebAssembly
  • Lethal trifecta: access to private data + exposure to untrusted content + ability to externally communicate, enabling prompt-injection–driven exfiltration
  • Sandboxing as primary defense: prefer remote sandboxes to limit blast radius; referenced tools include OpenAI Codex Cloud, the code-interpreter features of major models, Anthropic’s sandbox-runtime, and macOS’s sandbox-exec (used in proof-of-concept sandboxes)

Living dangerously with Claude — a quick read

Simon Willison’s October 2025 post investigates a familiar tension in coding-agent workflows: huge productivity gains from running agents with minimal restrictions versus the real security risks that follow. The write-up centers on the informal “YOLO mode” (the CLI flag --dangerously-skip-permissions) that lets Claude Code take long-running, low-supervision tasks — and the measures required to avoid handing attackers the keys to the castle.

What happened in 48 hours

Willison recounts three rapid, curiosity-driven projects completed while the agent ran with relaxed permissions: getting DeepSeek-OCR working on an NVIDIA Spark in Docker, experimenting with Pyodide in Node.js to run Python in a WebAssembly sandbox, and compiling the 2001-era SLOCCount (Perl + C) to run in-browser via WebAssembly. These are framed as side quests that highlight how much can be offloaded to an autonomous coding agent when supervision is reduced.

The security counterpoint

The post pivots to the hazards of such freedom. Willison revisits prompt injection and lays out the concept he calls the lethal trifecta: the combination of access to private data, exposure to untrusted content, and the ability to externally communicate. When those three elements converge, an agent running in YOLO mode can be coerced into exfiltrating secrets.
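The trifecta is a conjunction: risk only becomes acute when all three capabilities are present at once, which also means removing any single leg is a valid mitigation. A minimal sketch of that logic (the `AgentCapabilities` type and helper below are hypothetical, not from the post or any library):

```python
# Sketch (hypothetical helper): flag agent configurations that combine all
# three elements of the "lethal trifecta" Willison describes.
from dataclasses import dataclass


@dataclass
class AgentCapabilities:
    reads_private_data: bool           # e.g. local files, env vars, credentials
    ingests_untrusted_content: bool    # e.g. web pages, issues, third-party docs
    can_communicate_externally: bool   # e.g. outbound HTTP, email, DNS

def is_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # The hazard is the *conjunction*: all three must hold simultaneously.
    return (caps.reads_private_data
            and caps.ingests_untrusted_content
            and caps.can_communicate_externally)

# A YOLO-mode agent with all three is exposed to prompt-injection exfiltration:
print(is_lethal_trifecta(AgentCapabilities(True, True, True)))   # True
# Cutting any one leg (here: outbound communication) breaks the trifecta:
print(is_lethal_trifecta(AgentCapabilities(True, True, False)))  # False
```

This is why network isolation alone is often proposed as the cheapest fix: it severs one leg without restricting what the agent can read or compute locally.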

The practical defense proposed is straightforward: sandbox agents. Willison highlights remote sandboxes (so the worst-case compromise affects another machine), points to services and features like OpenAI Codex Cloud and the code-interpreter offerings in major models, and notes Anthropic’s recently released sandbox-runtime library. On macOS, the long-used sandbox-exec mechanism plays a role in proof-of-concept sandboxes — even though it’s marked deprecated in Apple docs.
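One concrete way to apply the "limit the blast radius" advice is to wrap the YOLO-mode invocation in a container with networking disabled, so even a fully compromised agent has nowhere to send secrets. A sketch of building such a command (the image name and workdir are placeholders; the Docker flags and the --dangerously-skip-permissions flag are real, but this wrapper is an illustration, not the post's setup):

```python
# Sketch: assemble a docker command that runs a coding agent in YOLO mode
# inside a network-isolated container. "my-agent-image" is a hypothetical
# image assumed to have the agent CLI installed.
def sandboxed_command(workdir: str, image: str = "my-agent-image") -> list[str]:
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no outbound communication at all
        "-v", f"{workdir}:/workspace",  # only this directory is visible
        "-w", "/workspace",
        image,
        "claude", "--dangerously-skip-permissions",
    ]

print(" ".join(sandboxed_command("/tmp/project")))
```

With `--network none`, the third leg of the lethal trifecta (external communication) is cut, at the cost of the agent being unable to fetch packages or docs mid-task; a remote sandbox relaxes that trade-off by moving the risk onto a disposable machine instead.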

Why read the full post

The original post pairs the high-level argument with concrete examples, slides, and links that document the experiments and the sandboxing approach in more depth. For readers interested in practical agent workflows and the trade-offs of low-supervision modes, it’s a concise combination of hands-on experiments and security guidance.

Read the original post: https://simonwillison.net/2025/Oct/22/living-dangerously-with-claude/
