If you live in a terminal, the choice between OpenAI's Codex CLI and Anthropic's Claude Code has stopped being academic. They now do the same core job — read your repo, plan a change, edit multiple files, run commands, and iterate — and they cost real money when you run them all day. The problem is that most "X vs Y" posts lean on a single benchmark screenshot or one viral cost anecdote and call it settled. It isn't.
This is the honest version: what each tool actually is in 2026, what the models cost, how access and sandboxing differ, and what the cost-per-task numbers really tell you (and where they fall apart). I'll flag every place the data is soft.
What you're actually comparing
Both are free, open-source command-line agents. You pay for the model behind them, not the CLI.
- Codex CLI was open-sourced in April 2025 (Apache-2.0) and has since been rewritten in Rust. It defaults to GPT-5.5, OpenAI's newest frontier model for complex coding, with GPT-5.4 (flagship), GPT-5.4-mini (fast/efficient, good for subagents), and GPT-5.3-codex-spark (a Pro-only research preview for near-instant iteration) also available. You pick the model in `config.toml` or with the `--model` flag.
- Claude Code is Anthropic's open-source CLI. As of May 28, 2026 its default model is Claude Opus 4.8 (`claude-opus-4-8`), which requires Claude Code v2.1.154 or later. You can also drop down to Sonnet 4.6 or Haiku 4.5 for speed and cost.
So this is really GPT-5.5-class execution versus Opus-4.8-class reasoning, wrapped in two different agent shells with different defaults around autonomy and permissions.
Models and pricing at a glance
Here's the current lineup with API token pricing (per million tokens) where it applies.
| Model | Role | Input / Output (per MTok) | Context |
|---|---|---|---|
| Claude Opus 4.8 | Claude Code default | $5 / $25 | 1M, 128k max output |
| Claude Sonnet 4.6 | Fast/balanced | $3 / $15 | 1M |
| Claude Haiku 4.5 | Fastest | $1 / $5 | 200k |
| Claude Fable 5 | Most capable (GA June 9, 2026) | $10 / $50 | 1M |
| GPT-5.5 | Codex CLI default | Plan-based or API rates | — |
| GPT-5.4 / 5.4-mini | Flagship / fast | Plan-based or API rates | — |
One detail that matters in practice: Opus 4.8's effort parameter defaults to high across the API and Claude Code. That's part of why it produces strong results — and part of why it can burn more tokens.
Access and subscriptions: two different philosophies
This is where the tools diverge most, and it drives the real cost math.
Codex is bundled into ChatGPT plans rather than priced as its own product:
| Plan | Price | Notes |
|---|---|---|
| Free | $0 | No cloud tasks / code reviews |
| Go | $8/mo | No cloud tasks / code reviews |
| Plus | $20/mo | Full Codex access |
| Pro | from $100/mo | 5x or 20x rate-limit options |
| Business | per seat (PAYG) | Same limits as Plus per seat |
| Enterprise / Edu | custom | — |
Codex usage runs in 5-hour windows. On Plus, that's roughly 15–80 GPT-5.5 messages, 20–100 on GPT-5.4, and 60–350 on GPT-5.4-mini. Pro 5x and Pro 20x multiply those Plus limits accordingly. There's also an API-key mode that bills per token at standard rates — but cloud tasks and code reviews aren't available that way.
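To see what "multiply accordingly" means in practice, here's a rough sketch that scales the Plus ranges quoted above. It assumes the Pro multipliers apply strictly linearly to both ends of each range, which the plan descriptions imply but don't guarantee; `PLUS_LIMITS` and `scaled_limits` are names I made up for illustration.

```python
# Plus-tier message ranges per 5-hour window, as quoted in the text above.
PLUS_LIMITS = {
    "GPT-5.5": (15, 80),
    "GPT-5.4": (20, 100),
    "GPT-5.4-mini": (60, 350),
}


def scaled_limits(multiplier):
    """Scale every Plus range by a Pro multiplier (assumed to be linear)."""
    return {
        model: (low * multiplier, high * multiplier)
        for model, (low, high) in PLUS_LIMITS.items()
    }


for tier, multiplier in (("Plus", 1), ("Pro 5x", 5), ("Pro 20x", 20)):
    for model, (low, high) in scaled_limits(multiplier).items():
        print(f"{tier:8} {model:13} {low}-{high} messages per window")
```

Under that assumption, Pro 5x works out to roughly 75–400 GPT-5.5 messages per window and Pro 20x to roughly 300–1,600. The width of those ranges is the real takeaway: how far a window goes depends heavily on how big each task is.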
Claude Code is paid for through Anthropic subscriptions (Pro $20/mo, Max 5x $100/mo, Max 20x $200/mo, plus Team/Enterprise) or per-token API rates. On May 6, 2026 Anthropic doubled Claude Code's 5-hour rate limits for Pro/Max/Team and seat-based Enterprise, and removed the peak-hours limit reduction for Pro and Max. Both changes were a direct response to the single loudest complaint about the tool: rate limiting.
If you take one thing from the pricing section: for heavy daily agentic use, subscriptions crush pay-per-token. A Max 20x ($200/mo) heavy user can consume token volumes that would cost roughly $600–$1,500/mo at API rates. Codex is similarly cheaper bundled into a ChatGPT plan than in API-key mode.
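Here's a back-of-the-envelope sketch of that comparison using the per-token prices from the table above. The monthly token volumes are hypothetical, picked only because they land on the two ends of the $600–$1,500 range; the calculation also ignores prompt caching and any other discounts, so treat it as an upper-bound style estimate rather than a bill.

```python
# Per-million-token API prices in USD (input, output), from the pricing table.
PRICES = {
    "opus-4.8": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}

MAX_20X_MONTHLY = 200.00  # subscription price quoted in the text


def api_cost(model, input_mtok, output_mtok):
    """Monthly API cost in USD for a volume given in millions of tokens."""
    input_price, output_price = PRICES[model]
    return input_mtok * input_price + output_mtok * output_price


# Hypothetical monthly volumes (millions of input / output tokens).
for label, input_mtok, output_mtok in (("lighter", 80, 8), ("heavier", 200, 20)):
    cost = api_cost("opus-4.8", input_mtok, output_mtok)
    ratio = cost / MAX_20X_MONTHLY
    print(f"{label}: ${cost:,.0f} at API rates, {ratio:.1f}x the Max 20x price")
```

With these made-up volumes the API bill comes out at 3x to 7.5x the subscription price. Swap in your own token counts from a week of real usage before drawing conclusions.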
Sandboxing and auth
Codex leans into autonomous, safe-by-default execution. It runs locally with OS-level sandboxing across three safety levels — Read Only, Auto, and Full Access — using macOS Seatbelt and Linux Landlock, with network access blocked by default. Auth is via ChatGPT OAuth (token cached at ~/.codex/auth.json) or API key; v0.116.0 (March 19, 2026) added ChatGPT device-code sign-in for headless boxes.
Claude Code uses a permission-prompting model instead — it asks before running commands or touching files. Both are legitimate approaches, but if your goal is "let the agent run unattended in a box," Codex's sandbox model is more turnkey.
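If you're setting up a headless box, a small preflight check can tell you whether a ChatGPT login has already been cached before you kick off an unattended run. This is only a sketch: it checks for the `~/.codex/auth.json` path mentioned above and nothing more, so it says nothing about whether the token is still valid, and it doesn't apply if you authenticate with an API key. The function names are my own.

```python
from pathlib import Path
from typing import Optional


def codex_auth_cache(home: Optional[Path] = None) -> Path:
    """Return the cached-token path named in the text, under the given home dir."""
    return (home or Path.home()) / ".codex" / "auth.json"


def has_cached_login(home: Optional[Path] = None) -> bool:
    """True if the cache file exists; this does not validate the token itself."""
    return codex_auth_cache(home).is_file()


if has_cached_login():
    print("Found a cached ChatGPT login for Codex.")
else:
    print("No cached login found; sign in first or use an API key.")
```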
The benchmark numbers — and why to distrust them
Here's where I have to be blunt. The headline benchmark figures vary wildly by source and date. I'm including them so you've seen them, not because any single number is trustworthy.
- Terminal-Bench 2.0: Codex/GPT-5.5 has been reported at both 77.3% and 82.7%; Claude at 65.4% and 69.4% depending on the source.
- SWE-bench Verified: Opus shows up at 80.9%, 87.6%, and 88.6% across different write-ups; GPT-5.5 around 88.7%.
- SWE-bench Pro (June 2026): Opus at 64.3% vs GPT-5.5 at 58.6%.
Notice the contradictions: Codex leads some boards, Claude leads others, and the same model swings by nearly eight points between sources (Opus on SWE-bench Verified ranges from 80.9% to 88.6%). These come from secondary blogs with different harnesses, prompts, and dates. Don't pick a tool on a benchmark screenshot. The most defensible read is that GPT-5.5 and Opus 4.8 are in the same league, and harness/prompting differences explain most of the gaps.
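You can quantify the disagreement with a few lines of arithmetic. The sketch below only uses the figures listed above, for the model and benchmark pairs that have more than one reported score; it computes the gap between the highest and lowest number for each.

```python
# Scores (percent) as reported by different sources, copied from the list above.
REPORTED = {
    ("Terminal-Bench 2.0", "Codex/GPT-5.5"): [77.3, 82.7],
    ("Terminal-Bench 2.0", "Claude"): [65.4, 69.4],
    ("SWE-bench Verified", "Opus"): [80.9, 87.6, 88.6],
}


def spread(scores):
    """Gap in percentage points between the highest and lowest reported score."""
    return round(max(scores) - min(scores), 1)


for (benchmark, model), scores in REPORTED.items():
    print(f"{benchmark:20} {model:14} spread: {spread(scores)} points")
```

The spreads come out at 5.4, 4.0, and 7.7 points. On SWE-bench Verified, the gap between sources for Opus alone is far larger than the reported gap between Opus's best figure and GPT-5.5, which is the whole problem in one line.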
Real cost per task
The viral number is that Claude Code uses ~4x more tokens per task than Codex, with one cited Express.js refactor running ~$15 on Codex vs ~$155 on Claude Code. That's a striking spread — and it's a single anecdote from a secondary blog, not a controlled benchmark. Treat the dollar figures as illustrative.
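It's worth checking whether the two halves of that anecdote even agree with each other. The sketch below takes the three quoted figures at face value and asks how much of the dollar gap the token gap can explain. The "implied price ratio" is my own derived quantity, not something any source reports.

```python
# Figures quoted in the anecdote above.
CODEX_COST_USD = 15.0
CLAUDE_COST_USD = 155.0
TOKEN_RATIO = 4.0  # "~4x more tokens per task"

# How many times more the Claude Code run cost in dollars.
dollar_ratio = CLAUDE_COST_USD / CODEX_COST_USD

# If tokens explain 4x of the gap, the rest must come from something else,
# such as a higher effective price per token on that run.
implied_price_ratio = dollar_ratio / TOKEN_RATIO

print(f"Dollar gap: {dollar_ratio:.1f}x")
print(f"Unexplained by token count: {implied_price_ratio:.1f}x")
```

The dollar gap is about 10.3x, so a 4x token difference leaves roughly 2.6x unexplained. That residual could be per-token pricing, input/output mix, retries, or just a sloppy write-up; the anecdote doesn't say, which is one more reason not to lean on it.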
What's more reliable is the direction: multiple sources agree Codex is more token-efficient, and Opus 4.8's default high effort makes Claude Code thorough but token-hungry. For cost-sensitive or high-volume work, that efficiency compounds. Reported Codex spend averages roughly $100–200/developer/month with high variance — which is exactly why subscriptions exist.
Community sentiment (with a grain of salt)
A frequently cited 500+ developer Reddit survey found ~65% preferred Codex CLI day-to-day — token efficiency, speed, open-source flexibility, fewer limits — while blind reviews rated Claude Code's code cleaner ~67% of the time. The #1 Claude Code complaint was rate limiting (which the May 6 limit increase partly addresses). This survey is secondary and unverified, but it matches the broader pattern: people prefer Codex for the flow, and respect Claude for the output.
Decision matrix
| Your situation | Reach for |
|---|---|
| Large multi-file refactor, architecture, gnarly reasoning | Claude Code (Opus 4.8) |
| Long-context work across a big repo | Claude Code (1M context) |
| Code quality matters more than speed | Claude Code |
| Fast, autonomous, sandboxed execution | Codex CLI |
| Terminal-native DevOps / CI tasks | Codex CLI |
| Cost-sensitive or high-volume bulk edits | Codex CLI (token efficiency) |
| Already pay for ChatGPT, want zero extra spend | Codex CLI (bundled in plan) |
| Hate hitting rate limits | Lean Codex; or Claude Max with the May 2026 doubled limits |
| Unattended agent in a locked-down box | Codex CLI (Seatbelt/Landlock, network off) |
Bottom line
Stop looking for a winner. The honest 2026 answer is that Claude Code is the better thinker and Codex CLI is the better runner, and they're cheap enough to keep both.
If I had to pick a default workflow: drive architecture, complex features, and big refactors with Claude Code on Opus 4.8, then hand off autonomous, sandboxed, and cost-sensitive grunt work to Codex CLI on GPT-5.5 or GPT-5.4-mini. Pay for whichever subscription matches your daily volume (Claude Max 20x or ChatGPT Pro for heavy users), because per-token API mode is the expensive trap for anyone running agents all day. And whatever you read about benchmarks or a $155 refactor: verify it against your own repo before you believe it.