OpenAI Codex CLI is powerful, but performance issues can disrupt your workflow. Slow responses, timeouts, and laggy interactions often have identifiable causes and straightforward solutions. This guide covers how to diagnose and fix common performance problems.
Common Causes of Slow Performance
Understanding why Codex CLI runs slowly helps you target the right fix.
Context Size Overhead
Codex CLI reads your codebase to provide context-aware responses. Large repositories with many files significantly increase the amount of context sent with each request, which directly impacts:
- Initial response time: More context means more tokens to process
- Memory usage: Large contexts require more local and server-side resources
- Token costs: Larger contexts consume more of your quota faster
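To get a rough sense of how much material Codex could pull in, check how many files the working tree contains and how large it is before starting a session. A quick sketch with standard shell tools (the exact set of files Codex actually reads depends on its own ignore rules and any .codexignore you add):
# Count files that could become context (excluding VCS metadata)
find . -type f -not -path "./.git/*" | wc -l
# Total size of the working tree
du -sh .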
Model Selection
Different models have different speed characteristics:
| Model | Speed | Best For |
|---|---|---|
| gpt-4o-mini | Fastest | Simple tasks, quick questions |
| gpt-4o | Moderate | Balanced speed and capability |
| o1-preview | Slowest | Complex reasoning, architecture decisions |
Network Latency
Your connection to OpenAI's API affects every interaction. High latency manifests as:
- Delays before responses start streaming
- Intermittent pauses during output
- Frequent timeouts on longer operations
Approval Mode Overhead
The suggest approval mode requires confirmation for each file change, adding round-trip time for every operation. While safer, it slows down multi-step tasks significantly.
Measuring and Diagnosing Latency
Before optimizing, measure your baseline performance.
Check OpenAI API Status
First, verify the problem is not on OpenAI's end:
# Check OpenAI status page
curl -s https://status.openai.com/api/v2/status.json | jq '.status.description'
Or visit status.openai.com directly.
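The status page is a standard Statuspage site, so its other public v2 endpoints should work as well; for example, listing any unresolved incidents (endpoint assumed from the generic Statuspage API, not from Codex documentation):
# List any open incidents reported on the status page
curl -s https://status.openai.com/api/v2/incidents/unresolved.json | jq '.incidents[].name'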
Measure Response Time
Time a simple request to establish your baseline:
time codex "what is 2+2"
Compare this to a context-heavy request:
time codex "explain the architecture of this codebase"
The difference reveals how much context loading affects your performance.
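Single measurements are noisy, so repeat the baseline a few times and look at the spread rather than one number:
# Run the same cheap request three times to see the variance
for i in 1 2 3; do
  time codex "what is 2+2"
done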
Check Your Network
Test your connection to OpenAI's API:
ping api.openai.com
curl -o /dev/null -s -w "Connect: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" https://api.openai.com/v1/models
Look for connect times under 100ms and TTFB (Time To First Byte) under 500ms for optimal performance.
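As with the Codex timings, repeat the probe so one slow TLS handshake doesn't skew the picture:
# Repeat the probe to average out one-off slow connections
for i in 1 2 3 4 5; do
  curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s\n" https://api.openai.com/v1/models
done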
Configuration Optimizations
Select the Right Model
Switch to faster models for routine tasks:
# Use gpt-4o-mini for quick operations
codex --model gpt-4o-mini "format this JSON file"
# Reserve o1-preview for complex reasoning
codex --model o1-preview "design a caching strategy for this API"
Configure your default model in ~/.codex/config.toml:
[model]
default = "gpt-4o-mini" # Faster default for routine tasks
Adjust Timeout Settings
Increase timeouts for large operations:
[network]
timeout = 120 # seconds, default is often 60
Configure Context Limits
Limit how much context Codex reads:
[context]
max_files = 50 # Limit files read
max_file_size = 100 # KB, skip files larger than this
Using .codexignore to Reduce Context
Create a .codexignore file in your project root to exclude files from context:
# Dependencies (always exclude)
node_modules/
vendor/
.venv/
__pycache__/
# Build artifacts
dist/
build/
*.min.js
*.map
# Large data files
*.csv
*.json
data/
fixtures/
# Generated code
*.generated.*
coverage/
# Media files
*.png
*.jpg
*.mp4
Strategic Exclusions
Focus exclusions on:
- Dependencies: node_modules, vendor, virtual environments
- Build output: Compiled files, bundles, sourcemaps
- Test fixtures: Large test data files
- Generated code: Auto-generated files that change frequently
- Binary files: Images, videos, compiled binaries
After creating .codexignore, restart Codex to apply changes.
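If you are unsure what to exclude, look at what actually takes up space in the working tree; the largest directories and files are usually the best .codexignore candidates:
# Largest top-level directories (candidates for .codexignore)
du -sh ./*/ | sort -rh | head -10
# Files over 1 MB that Codex probably should not read as context
find . -type f -not -path "./.git/*" -size +1M | head -20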
Choosing Approval Modes for Speed
Codex CLI offers three approval modes with different performance characteristics:
Suggest Mode (Default)
codex --approval suggest "refactor this function"
- Requires approval for each change
- Safest but slowest for multi-file operations
- Best for learning or unfamiliar codebases
Auto-Edit Mode
codex --approval auto-edit "add error handling to all API calls"
- Automatically applies file changes
- Still requires approval for shell commands
- Good balance of speed and safety
Full-Auto Mode
codex --approval full-auto "run tests and fix any failures"
- Automatically applies all changes and runs commands
- Fastest for trusted operations
- Use only in sandboxed environments or with version control
For repetitive trusted tasks, auto-edit or full-auto can reduce completion time by 50-70%.
Network and API Latency Troubleshooting
Use a Faster DNS
Test whether a different DNS resolver responds faster:
# Test Cloudflare DNS
dig @1.1.1.1 api.openai.com
# Test Google DNS
dig @8.8.8.8 api.openai.com
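To compare both resolvers in one pass, loop over them and pull out the query time that dig reports:
# Compare resolvers by reported query time
for dns in 1.1.1.1 8.8.8.8; do
  echo "Resolver: $dns"
  dig @"$dns" api.openai.com | grep "Query time"
done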
Configure your system to use the faster option.
Check for VPN Interference
VPNs can add significant latency. Test with your VPN disabled:
# Disconnect VPN, then test
time codex "hello"
# Reconnect VPN, test again
time codex "hello"
If the VPN adds significant latency, consider split tunneling to route OpenAI traffic directly.
Off-Peak Usage
OpenAI's API experiences higher latency during peak hours (typically 9 AM - 6 PM Pacific Time, weekdays). For large operations, consider scheduling during off-peak hours.
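If a long, trusted job can run unattended (an assumption about your setup; it pairs with full-auto mode described below), ordinary cron scheduling is enough to push it into off-peak hours. The path and prompt here are placeholders:
# crontab entry: run a trusted task at 2 AM local time on weekdays
0 2 * * 1-5 cd /path/to/project && codex --approval full-auto "run tests and fix any failures" >> ~/codex-nightly.log 2>&1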
When to Clear Context and Start Fresh
Start a new session when:
- Switching projects: Old context pollutes new project responses
- After major refactoring: Stale context causes confusion
- When responses seem confused: Accumulated context may be contradictory
- Performance degrades over time: Long sessions accumulate history
Clear your session:
# Exit and restart Codex
exit
codex
# Or use the clear command within a session
/clear
Model Speed Comparison
Use this reference when choosing models:
| Model | Avg Response Time | Token Limit | Best Use Cases |
|---|---|---|---|
| gpt-4o-mini | 1-3 seconds | 128K | Quick edits, formatting, simple generation |
| gpt-4o | 3-8 seconds | 128K | Code review, moderate complexity tasks |
| o1-preview | 10-30 seconds | 128K | Architecture, complex debugging, planning |
| o1-mini | 5-15 seconds | 128K | Reasoning tasks with speed priority |
For most routine coding tasks, gpt-4o-mini offers the best trade-off between speed and capability. Reserve slower models for tasks that genuinely require deeper reasoning.
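Published averages only go so far; latency varies with prompt size and load, so time your own typical prompt against the candidates using the --model flag shown earlier (the prompt below is a placeholder):
# Time the same prompt against two models to get your own numbers
for model in gpt-4o-mini gpt-4o; do
  echo "Model: $model"
  time codex --model "$model" "summarize what this module does"
done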
Next Steps
- Learn how to switch models in Codex CLI for different tasks
- Set up MCP servers for extended capabilities
- Explore session management to preserve context across work sessions