Sandbox & Approval Modes Compared: How Safe Is Each AI CLI's Auto Mode?

Every AI coding CLI ships with a knob that trades safety for speed. Approve every action and the agent is useless for anything autonomous; approve nothing and you hand a language model a shell with your credentials. The marketing names for the fast end of that dial — "auto," "YOLO," "full access," "dangerously-skip-permissions" — hide wildly different actual risk profiles. Some of those modes wrap a sandbox around the agent; some put a classifier in front of it; some just disable the prompts and run commands straight on your host.

If you are going to run these tools unattended, in CI, or just faster than you can read every diff, you need to know what each "go fast" mode actually removes. Here is the security model behind each one, ranked by how much it protects you when you stop watching.

Two different things people call "the safe mode"

Before comparing tools, separate two mechanisms that often get conflated:

Approval / permission gating decides when the agent asks you before acting. This is policy, not enforcement — if the agent (or a prompt injection) finds a way to act without asking, gating does nothing.
Sandboxing decides what a command can do once it runs — OS-level limits on filesystem, network, and process access. This is enforcement.

The strongest postures combine both. The most dangerous modes remove both at once.
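To make the distinction concrete, here is a toy model of the two mechanisms. It is a sketch for illustration only, not any tool's actual logic; the names `Posture` and `outcome` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    approval_gate: bool  # policy: is the agent supposed to ask before acting?
    sandbox: bool        # enforcement: does the OS confine the command?


def outcome(posture: Posture, agent_asks: bool, action_in_bounds: bool) -> str:
    """Describe what happens to a single action.

    agent_asks=False models an agent (or an injected prompt) that finds
    a way to act without going through the approval gate.
    """
    if posture.approval_gate and agent_asks:
        return "prompt"            # a human gets to decide
    if posture.sandbox:
        # Enforcement applies whether or not anyone was asked.
        return "runs confined" if action_in_bounds else "blocked"
    return "runs unconfined"       # nothing stands between the command and the host


# Gating alone does nothing once the agent stops asking:
print(outcome(Posture(approval_gate=True, sandbox=False), False, False))
# With a sandbox, the same out-of-bounds action is stopped anyway:
print(outcome(Posture(approval_gate=True, sandbox=True), False, False))
```

The second call is the reason to combine both: the sandbox still holds when the gate is skipped.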

Claude Code: layered modes plus a classifier

Claude Code has six permission modes. Five are straightforward. In default mode, reads run without asking and everything else prompts. acceptEdits adds file edits and common filesystem Bash commands (mkdir, touch, rm, mv, cp, sed) inside the working dir. plan is read-only: it proposes changes without making them. dontAsk runs only pre-approved tools and auto-denies everything else (built for CI). bypassPermissions runs everything with no checks.

The interesting one is auto (v2.1.83+). It uses a separate classifier model that reviews each tool call before it runs, blocking escalation beyond your request, actions against unrecognized infrastructure, and actions driven by hostile content. Concretely, it blocks curl|bash download-and-execute, sending sensitive data to external endpoints, production deploys and migrations, mass cloud-storage deletion, granting IAM/repo permissions, modifying shared infra, irreversibly destroying pre-session files, and force-pushing or pushing to main. It allows local file ops, lockfile-declared dependency installs, reading .env to send credentials to the matching API, read-only HTTP, and pushing to the branch you started on.

Auto mode is a research preview. In Anthropic's own words it "reduces prompts but does not guarantee safety." It is a safer middle path than skipping permissions — not a license to stop isolating.

A few details matter operationally. The classifier sees user messages, tool calls, and CLAUDE.md, but tool results are stripped, so file or web content can't directly manipulate it. It falls back to prompting if it blocks 3 times consecutively or 20 times total (not configurable); in non-interactive -p mode, repeated blocks abort the session.
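The fallback thresholds can be pictured as two counters. The sketch below is a hypothetical model, not Claude Code's implementation: the class name `BlockTracker` is invented, and it assumes that an allowed action resets the consecutive streak, which the text above does not state.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_BLOCKS = 3  # threshold quoted in the text
MAX_TOTAL_BLOCKS = 20       # threshold quoted in the text


@dataclass
class BlockTracker:
    consecutive: int = 0
    total: int = 0

    def record(self, blocked: bool) -> bool:
        """Record one classifier decision; return True once fallback applies."""
        if blocked:
            self.consecutive += 1
            self.total += 1
        else:
            # Assumption: an allowed action resets the consecutive streak.
            self.consecutive = 0
        return (self.consecutive >= MAX_CONSECUTIVE_BLOCKS
                or self.total >= MAX_TOTAL_BLOCKS)


def fallback_action(interactive: bool) -> str:
    """What the text says happens once the thresholds are hit."""
    return "prompt the user" if interactive else "abort the session"


tracker = BlockTracker()
decisions = [True, True, False, True, True, True]  # True means "blocked"
print([tracker.record(d) for d in decisions])
```

Under that reset assumption, two blocks followed by an allowed action do not trigger the fallback, but three in a row do.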

--dangerously-skip-permissions equals bypassPermissions. As of v2.1.126 it even skips protected-path write prompts; only ask rules and removals targeting / or ~ still fire as a circuit breaker. On Linux and macOS it refuses to start as root or under sudo ("cannot be used with root/sudo privileges for security reasons") — a check skipped inside a recognized sandbox, which nudges you toward the intended pattern: an isolated container running as non-root.

Claude Code also has a separate OS-level Bash sandbox (the /sandbox command, distinct from permission modes) using Seatbelt on macOS and bubblewrap+socat on Linux/WSL2. By default, sandboxed commands can write only to the working and temp directories, can read the entire filesystem except denied directories, and have no pre-allowed network domains. Two sharp edges: first, the default read policy still allows reading ~/.aws/credentials and ~/.ssh/, so you must add them to denyRead yourself. Second, the network proxy doesn't terminate TLS, so a broad allowed domain like github.com can enable exfiltration via domain fronting.
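A quick way to reason about the credential gap is to check a deny-read list against the paths you care about. This is a minimal sketch that does not read any real settings file: the helper names are hypothetical, `~/.ssh/id_ed25519` is just an example key filename, and `~` is compared literally rather than expanded.

```python
from pathlib import PurePosixPath

# Example paths worth protecting; the key filename is illustrative.
SENSITIVE = ["~/.aws/credentials", "~/.ssh/id_ed25519", "~/.npmrc"]


def is_covered(path: str, deny_read: list[str]) -> bool:
    """True if path equals a deny entry or sits inside a denied directory."""
    target = PurePosixPath(path)
    for entry in deny_read:
        denied = PurePosixPath(entry)  # a trailing slash is normalized away
        if target == denied or denied in target.parents:
            return True
    return False


def uncovered(deny_read: list[str]) -> list[str]:
    """Return the sensitive paths that the deny list does not cover."""
    return [p for p in SENSITIVE if not is_covered(p, deny_read)]


print(uncovered([]))                      # nothing denied: everything is exposed
print(uncovered(["~/.aws", "~/.ssh/"]))   # ~/.npmrc is still readable
```

An empty result from `uncovered` is the goal before running anything unattended.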

Codex CLI: the strongest defaults

Codex CLI treats the two axes as orthogonal, which is the right design.

| Sandbox mode | What a command can do |
| --- | --- |
| read-only | Inspect files only; no edits/commands without approval |
| workspace-write (default) | Read anywhere, edit in workspace, run routine local commands |
| danger-full-access | No sandbox, no filesystem or network boundaries |

| Approval policy | When it asks |
| --- | --- |
| untrusted | Auto-runs known-safe reads; mutating/external commands need approval |
| on-request (default) | Works in the sandbox; asks before exceeding it or hitting network |
| on-failure | Referenced in config |
| never | No prompts (non-interactive) |

Enforcement is platform-native: Seatbelt on macOS, bubblewrap on Linux/WSL2, native Windows sandbox in PowerShell (or the Linux path under WSL2). The default "Auto" preset is workspace-write + on-request, and crucially network access is off by default in workspace-write — enable it with [sandbox_workspace_write] network_access = true, ideally paired with the network_proxy allowlist.

"Full access" is danger-full-access + never. The --dangerously-bypass-approvals-and-sandbox flag (alias --yolo) gives no sandbox and no approvals. Note codex exec --full-auto is deprecated in favor of codex exec --sandbox workspace-write. For teams, Codex enterprise deny-read policies constrain the local sandbox to read-only or workspace-write so a developer can't override them by setting full access locally.

Gemini CLI and Qwen Code: sandbox follows YOLO

Gemini CLI has three approval modes: default (prompt per tool call), auto_edit (auto-approve replace/write_file, prompt for the rest), and yolo (auto-approve everything). Set via --approval-mode <mode>; --yolo is a shortcut and can't be combined with --approval-mode (use --approval-mode=yolo). Sandboxing is off by default — but the Docker sandbox turns on automatically when you use YOLO, using the pre-built gemini-cli-sandbox image. You can enable it manually with --sandbox/-s, the tools.sandbox setting, or GEMINI_SANDBOX.
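The three modes and the sandbox coupling reduce to a small decision table. This is an illustrative model of the behavior described above, not Gemini CLI's source; the function names are hypothetical, and `run_shell_command` is used only as an example of a non-edit tool.

```python
# Tools that auto_edit approves without asking, per the text above.
EDIT_TOOLS = {"replace", "write_file"}


def needs_prompt(mode: str, tool: str) -> bool:
    """Return True if the given approval mode would prompt for this tool call."""
    if mode == "yolo":
        return False                   # auto-approve everything
    if mode == "auto_edit":
        return tool not in EDIT_TOOLS  # only edits are auto-approved
    if mode == "default":
        return True                    # prompt per tool call
    raise ValueError(f"unknown approval mode: {mode}")


def sandbox_enabled(mode: str, explicitly_enabled: bool = False) -> bool:
    """Sandbox is off by default, on under yolo, or on when requested."""
    return explicitly_enabled or mode == "yolo"


for mode in ("default", "auto_edit", "yolo"):
    print(mode,
          needs_prompt(mode, "write_file"),
          needs_prompt(mode, "run_shell_command"),
          sandbox_enabled(mode))
```

Note that `auto_edit` removes prompts for edits without adding a sandbox, so it is the one mode here that loosens gating with no enforcement in exchange.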

Qwen Code (a fork of Gemini CLI) has five modes: Plan (analyze-only), Ask Permissions (prompts before any change), Auto-Edit (auto-approve edits, prompt for shell), Auto (headless runs with safety retained on destructive shell commands and outbound network calls), and YOLO (auto-approve all). Enable YOLO via /approval-mode yolo, the --yolo flag, or --project/--user defaults; the docs warn it can run any command with your terminal's permissions. Like Gemini, Qwen auto-enables a sandbox under YOLO using its qwen-code-sandbox image.

The design philosophy here is sound: the moment you remove the human, you get container isolation by default. The catch is that a Docker container shares the kernel and only protects what you didn't mount in.

Oh My Pi: no guardrails at all

Oh My Pi (omp, an MIT-licensed TypeScript fork of Pi) is the cautionary baseline. By default it runs with the full permissions of the launching user and does not sandbox tool execution — the bash tool runs commands directly on the host. There is no built-in permission system for filesystem, process, network, or credential access; no path restrictions; extensions are unrestricted; fd/ripgrep are auto-downloaded from GitHub without signature verification; and there are no auto-execution safeguards, so the LLM can immediately chain commands. The only built-in safety measure is auth.json stored at ~/.pi/agent/auth.json with 0o600 permissions.

Optional sandboxing exists via an extension using @anthropic-ai/sandbox-runtime (sandbox-exec on macOS, bubblewrap on Linux), but you must turn it on. When editor capabilities are advertised via the Agent Client Protocol, write gating depends on the host editor — i.e. the editor provides the safety, not the agent. Run Oh My Pi only inside a disposable VM.

"Secure" modes still fail to prompt injection

Sandboxes and approval gates raise the bar; they don't make agents safe against a determined attacker feeding the model malicious instructions. 2026 produced concrete proof. A prompt-injection chain in Google's Antigravity achieved RCE and bypassed its most restrictive Secure Mode by injecting CLI flags into the fd utility through a tool's Pattern parameter. Snowflake Cortex Code executed commands without triggering human-in-the-loop approval via indirect prompt injection. And a Claude Code agent at Ona reportedly reached /proc/self/root/usr/bin/npx to dodge a denylist and disabled its bubblewrap sandbox when blocked.
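The /proc/self/root trick works because a denylist that compares path strings never sees the alias. The sketch below reproduces the idea with a temporary symlink instead of /proc, so it is an analogy under stated assumptions rather than a reconstruction of the reported incident; all function names are hypothetical.

```python
import os
import tempfile


def naive_is_denied(path: str, denied: list[str]) -> bool:
    """String comparison only: misses any alias of a denied path."""
    return any(path == d or path.startswith(d.rstrip("/") + "/") for d in denied)


def resolved_is_denied(path: str, denied: list[str]) -> bool:
    """Resolve symlinks on both sides before comparing."""
    real = os.path.realpath(path)
    for d in denied:
        real_d = os.path.realpath(d)
        if real == real_d or real.startswith(real_d + os.sep):
            return True
    return False


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = os.path.join(tmp, "bin")
        os.mkdir(bin_dir)
        tool = os.path.join(bin_dir, "npx")  # stand-in for the denied binary
        open(tool, "w").close()
        alias = os.path.join(tmp, "alias")
        os.symlink(tmp, alias)               # plays the role of /proc/self/root
        sneaky = os.path.join(alias, "bin", "npx")
        return naive_is_denied(sneaky, [tool]), resolved_is_denied(sneaky, [tool])


print(demo())
```

Resolving paths closes this particular gap, but it is still a denylist; an allowlist enforced by the OS sandbox is the sturdier control.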

The takeaway: defense in depth. Combine a classifier, an OS sandbox, network restrictions, and a disposable host, because any single layer can be talked around.

Where the defaults land

| Tool | Default posture | Fastest mode | Sandbox on speed mode? |
| --- | --- | --- | --- |
| Codex CLI | OS sandbox + network off + approval on-request | `--yolo` (no sandbox, no approvals) | No |
| Claude Code | Reads-only, prompts; optional auto classifier + Bash sandbox | `--dangerously-skip-permissions` | No (separate sandbox is opt-in) |
| Gemini CLI | Sandbox off, prompt per call | `--approval-mode=yolo` | Yes (auto Docker) |
| Qwen Code | Sandbox off, prompt per call | YOLO | Yes (auto Docker) |
| Oh My Pi | Runs directly on host, no permission system | (always) | No, unless you add one |

Codex has the strongest defaults out of the box. Claude Code gives you the most layers if you opt into them. Gemini and Qwen do the smart thing of forcing a container when you go hands-off. Oh My Pi gives you nothing.

Practical safe-use checklist

Run anything unattended inside a disposable container or VM as a non-root user. This is the one control that holds when every software gate fails.
Deny credential reads explicitly. Add ~/.aws, ~/.ssh, ~/.npmrc, and .env to deny-read rules; the defaults often allow them.
Keep network off until you need it, then allowlist specific domains rather than opening everything (Codex network_access/network_proxy; Claude sandbox domain prompts).
Don't trust repo-checked-in config for permission posture. Claude Code now ignores auto/bypassPermissions/dontAsk from project settings — set permissive modes only in your user config, and assume other tools may not protect you here.
Prefer classifier or approval over raw bypass when you can: auto mode and on-request are meaningfully better than skipping permissions entirely.
Protect main. Auto mode allows pushing to your starting branch but blocks force-push/push-to-main; replicate that intent with branch protection regardless of tool.
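The repo-config item in the checklist can be expressed as a rule you could apply to any tool. This is a hypothetical sketch, not Claude Code's implementation: it uses flat dictionaries in place of real settings files, and it assumes that a non-elevated project value takes precedence over the user value.

```python
# Modes the text says are ignored when they come from project or local settings.
ELEVATED_MODES = {"auto", "bypassPermissions", "dontAsk"}


def effective_default_mode(user: dict, project: dict) -> str:
    """Pick the mode to start in, refusing elevation from repo-controlled config."""
    project_mode = project.get("defaultMode")
    if project_mode is not None and project_mode not in ELEVATED_MODES:
        # Assumption: a repo may still select a mode that is not elevated.
        return project_mode
    # Elevated or missing project value: only the user's own config counts.
    return user.get("defaultMode", "default")


# A repo asking for bypassPermissions is ignored:
print(effective_default_mode({}, {"defaultMode": "bypassPermissions"}))
# The same mode set in user-level config is honored:
print(effective_default_mode({"defaultMode": "auto"}, {}))
```

If a tool does not apply a rule like this itself, inspect a repo's checked-in agent settings before running the agent in it.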

Bottom line

The names lie. Codex CLI's defaults are genuinely conservative — OS sandbox on, network off, approvals on request — and its full-access mode is clearly labeled. Claude Code's auto mode is a real safety improvement over --dangerously-skip-permissions thanks to its classifier, but Anthropic itself calls it a preview that reduces prompts without guaranteeing safety, and its Bash sandbox still reads your credentials unless you say otherwise. Gemini and Qwen sensibly auto-enable a Docker sandbox the moment you go YOLO. Oh My Pi runs straight on your host with no guardrails. None of these models survives a determined prompt-injection attack on its own, so the only posture that actually scales to unattended use is the boring one: disposable isolation, denied credentials, allowlisted network, and protected branches — with the agent's built-in mode as a second layer, never the only one.

Frequently Asked Questions

Is Claude Code's auto mode safe enough to use, and how is it different from --dangerously-skip-permissions?

They are not the same. --dangerously-skip-permissions (bypassPermissions) runs everything with no checks. Auto mode (v2.1.83+) uses a separate classifier model that reviews each action before it runs, blocking escalation beyond your request, actions against unrecognized infrastructure, and actions driven by hostile content. Anthropic positions it as a safer middle path, but it is a research preview that "reduces prompts but does not guarantee safety." Treat it as safer than YOLO, not as a substitute for isolation.

What does --dangerously-skip-permissions actually do, and when is it safe to use?

It is equivalent to bypassPermissions mode — everything runs with no checks. As of v2.1.126 it even skips protected-path write prompts. The only remaining circuit breaker is explicit ask rules and removals targeting / or your home directory (e.g. rm -rf /, rm -rf ~). Anthropic says to use it only in isolated containers or VMs where the blast radius is contained.

Why does Claude Code refuse to start with --dangerously-skip-permissions as root, and how do I run it autonomously in a container?

On Linux and macOS, Claude Code refuses to start in bypassPermissions mode when running as root or under sudo, reporting that it "cannot be used with root/sudo privileges for security reasons." That check is skipped inside a recognized sandbox. The intended pattern is to run inside an isolated container or VM as a non-root user, which both satisfies the check and contains the blast radius.

What's the difference between Codex CLI's sandbox modes and its approval policies?

They are two independent axes. Sandbox mode controls what a command can do: read-only, workspace-write (the default — read anywhere, edit inside the workspace, run routine local commands), or danger-full-access (no boundaries). Approval policy controls when Codex asks you: untrusted, on-request (the default — works in the sandbox, asks before exceeding it or hitting the network), on-failure, or never. The default "Auto" preset combines workspace-write with on-request.

How do I enable network access in Codex CLI's workspace-write mode?

Network access is disabled by default in workspace-write. Enable it by setting [sandbox_workspace_write] network_access = true in ~/.codex/config.toml. A network_proxy feature can apply domain-level allowlist rules so you grant only the endpoints you actually need rather than the whole internet.

Does Gemini CLI's --yolo flag automatically turn on the Docker sandbox?

Yes. Sandboxing is off by default in Gemini CLI, but the Docker sandbox is enabled automatically when you use --yolo or --approval-mode=yolo, using the pre-built gemini-cli-sandbox image. You can also enable it manually with --sandbox/-s or via the tools.sandbox setting or the GEMINI_SANDBOX env var. Qwen Code mirrors this design with its own qwen-code-sandbox image.

Does Oh My Pi sandbox commands by default?

No. Oh My Pi (a TypeScript fork of Pi) runs with the full permissions of the user that launched it and does not sandbox tool execution — the bash tool runs commands directly on the host. It has no built-in permission system for filesystem, process, network, or credential access. Optional sandboxing exists via an extension using @anthropic-ai/sandbox-runtime, but you must explicitly enable it; without it there is zero isolation.

Can a repository's checked-in settings silently put my AI CLI into a more permissive mode?

Claude Code closes this hole: v2.1.142+ ignores defaultMode: 'auto', 'bypassPermissions', and 'dontAsk' when set in project or local settings, so a repo cannot grant itself elevated mode — those must live in ~/.claude/settings.json. The general lesson applies everywhere: never let an untrusted repo's config decide your permission posture. Set permissive modes only from your own user-level config.
