
How to Fix OpenAI Codex CLI Slow Performance

Diagnose and resolve slow performance issues in OpenAI Codex CLI. Optimize response times, reduce latency, handle large context windows efficiently, and configure settings for faster interactions.

7 min read · Updated January 2025


OpenAI Codex CLI is powerful, but performance issues can disrupt your workflow. Slow responses, timeouts, and laggy interactions often have identifiable causes and straightforward solutions. This guide covers how to diagnose and fix common performance problems.

Common Causes of Slow Performance

Understanding why Codex CLI runs slowly helps you target the right fix.

Context Size Overhead

Codex CLI reads your codebase to provide context-aware responses. Large repositories with many files significantly increase the context window size, which directly impacts:

  • Initial response time: More context means more tokens to process
  • Memory usage: Large contexts require more local and server-side resources
  • Token costs: Larger contexts consume more of your quota faster
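
A quick way to gauge that overhead is to see how much material Codex could pull in. Assuming a git repository, the following gives a rough ballpark; the exact set of files Codex reads also depends on its ignore rules:

# Number of tracked files; more files generally means a larger context window
git ls-files | wc -l

# Size of the whole working tree; treat as an upper bound since it includes untracked files and build output
du -sh .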

Model Selection

Different models have different speed characteristics:

Model       | Speed    | Best For
gpt-4o-mini | Fastest  | Simple tasks, quick questions
gpt-4o      | Moderate | Balanced speed and capability
o1-preview  | Slowest  | Complex reasoning, architecture decisions

Network Latency

Your connection to OpenAI's API affects every interaction. High latency manifests as:

  • Delays before responses start streaming
  • Intermittent pauses during output
  • Frequent timeouts on longer operations

Approval Mode Overhead

The suggest approval mode requires confirmation for each file change, adding round-trip time for every operation. While safer, it slows down multi-step tasks significantly.

Measuring and Diagnosing Latency

Before optimizing, measure your baseline performance.

Check OpenAI API Status

First, verify the problem is not on OpenAI's end:

# Check OpenAI status page
curl -s https://status.openai.com/api/v2/status.json | jq '.status.description'

Or visit status.openai.com directly.

Measure Response Time

Time a simple request to establish your baseline:

time codex "what is 2+2"

Compare this to a context-heavy request:

time codex "explain the architecture of this codebase"

The difference reveals how much context loading affects your performance.
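
A single run can be noisy, so a short loop gives a steadier baseline. This is plain shell timing, not a Codex feature:

# Repeat the same lightweight prompt a few times and compare wall-clock times
for i in 1 2 3; do
  time codex "what is 2+2"
done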

Check Your Network

Test your connection to OpenAI's API:

ping api.openai.com
curl -o /dev/null -s -w "Connect: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" https://api.openai.com/v1/models

Look for connect times under 100ms and TTFB (Time To First Byte) under 500ms for optimal performance.
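
TTFB also varies from request to request, so a few samples are more telling than one. This sketch assumes OPENAI_API_KEY is exported; without it the endpoint rejects the request quickly, which makes the numbers look better than they are:

# Take five TTFB samples against the API
for i in 1 2 3 4 5; do
  curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s\n" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    https://api.openai.com/v1/models
done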

Configuration Optimizations

Select the Right Model

Switch to faster models for routine tasks:

# Use gpt-4o-mini for quick operations
codex --model gpt-4o-mini "format this JSON file"

# Reserve o1-preview for complex reasoning
codex --model o1-preview "design a caching strategy for this API"

Configure your default model in ~/.codex/config.toml:

[model]
default = "gpt-4o-mini"  # Faster default for routine tasks
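
If you switch models often, shell aliases keep the flag out of the way. The alias names below are arbitrary; the --model flag is the same one shown above:

# Fast model for routine work
alias cxq='codex --model gpt-4o-mini'

# Slower reasoning model for harder problems
alias cxd='codex --model o1-preview'

Add them to ~/.bashrc or ~/.zshrc so they persist across shells.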

Adjust Timeout Settings

Increase timeouts for large operations:

[network]
timeout = 120  # seconds, default is often 60

Configure Context Limits

Limit how much context Codex reads:

[context]
max_files = 50        # Limit files read
max_file_size = 100   # KB, skip files larger than this
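
To preview what a size cap like this would skip, a plain find is enough; nothing here is Codex-specific, and the 100k threshold simply mirrors the setting above:

# List files over 100 KB that max_file_size = 100 would exclude
find . -type f -size +100k -not -path "./node_modules/*" | head -20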

Using .codexignore to Reduce Context

Create a .codexignore file in your project root to exclude files from context:

# Dependencies (always exclude)
node_modules/
vendor/
.venv/
__pycache__/

# Build artifacts
dist/
build/
*.min.js
*.map

# Large data files
*.csv
# Careful: *.json also matches package.json and tsconfig.json; narrow this if Codex needs them
*.json
data/
fixtures/

# Generated code
*.generated.*
coverage/

# Media files
*.png
*.jpg
*.mp4

Strategic Exclusions

Focus exclusions on:

  1. Dependencies: node_modules, vendor, virtual environments
  2. Build output: Compiled files, bundles, sourcemaps
  3. Test fixtures: Large test data files
  4. Generated code: Auto-generated files that change frequently
  5. Binary files: Images, videos, compiled binaries
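
If you are unsure where the bulk of your repository lives, listing the largest top-level directories points at the best exclusion candidates. This uses standard GNU/BSD tools and nothing Codex-specific:

# The biggest directories are usually the best .codexignore candidates
du -sh ./*/ 2>/dev/null | sort -rh | head -10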

After creating .codexignore, restart Codex to apply changes.

Choosing Approval Modes for Speed

Codex CLI offers three approval modes with different performance characteristics:

Suggest Mode (Default)

codex --approval suggest "refactor this function"
  • Requires approval for each change
  • Safest but slowest for multi-file operations
  • Best for learning or unfamiliar codebases

Auto-Edit Mode

codex --approval auto-edit "add error handling to all API calls"
  • Automatically applies file changes
  • Still requires approval for shell commands
  • Good balance of speed and safety

Full-Auto Mode

codex --approval full-auto "run tests and fix any failures"
  • Automatically applies all changes and runs commands
  • Fastest for trusted operations
  • Use only in sandboxed environments or with version control

For repetitive trusted tasks, auto-edit or full-auto can reduce completion time by 50-70%.
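
One way to get full-auto speed without giving up review, assuming the project is under git, is to run it on a throwaway branch and diff afterwards. The branch name here is arbitrary:

# Work on a disposable branch so every automatic change stays reviewable
git checkout -b codex-autofix
codex --approval full-auto "run tests and fix any failures"

# Inspect everything before merging back
git diff main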

Network and API Latency Troubleshooting

Use a Faster DNS

Switch to a faster DNS resolver:

# Test Cloudflare DNS
dig @1.1.1.1 api.openai.com

# Test Google DNS
dig @8.8.8.8 api.openai.com
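
The number to compare is each resolver's reported query time; grepping it out makes the comparison direct:

# Pull out just the latency figure from each resolver
dig @1.1.1.1 api.openai.com | grep "Query time"
dig @8.8.8.8 api.openai.com | grep "Query time"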

Configure your system to use the faster option.

Check for VPN Interference

VPNs can add significant latency. Test with VPN disabled:

# Disconnect VPN, then test
time codex "hello"

# Reconnect VPN, test again
time codex "hello"

If VPN adds significant latency, consider split tunneling to route OpenAI traffic directly.

Off-Peak Usage

OpenAI's API experiences higher latency during peak hours (typically 9 AM - 6 PM Pacific Time, weekdays). For large operations, consider scheduling during off-peak hours.

When to Clear Context and Start Fresh

Start a new session when:

  • Switching projects: Old context pollutes new project responses
  • After major refactoring: Stale context causes confusion
  • When responses seem confused: Accumulated context may be contradictory
  • Performance degrades over time: Long sessions accumulate history

Clear your session:

# Exit and restart Codex
exit
codex

# Or use the clear command within a session
/clear

Model Speed Comparison

Use this reference when choosing models:

Model       | Avg Response Time | Token Limit | Best Use Cases
gpt-4o-mini | 1-3 seconds       | 128K        | Quick edits, formatting, simple generation
gpt-4o      | 3-8 seconds       | 128K        | Code review, moderate complexity tasks
o1-preview  | 10-30 seconds     | 128K        | Architecture, complex debugging, planning
o1-mini     | 5-15 seconds      | 128K        | Reasoning tasks with speed priority

For most coding tasks, gpt-4o-mini provides the best balance of speed and capability. Reserve slower models for tasks that genuinely require deeper reasoning.
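
A small wrapper function can make that fast default explicit while leaving an escape hatch for heavier models. The cx name and the CODEX_MODEL variable are illustrative conventions, not part of Codex CLI:

# cx: call codex with a fast default model unless CODEX_MODEL overrides it
cx() {
  codex --model "${CODEX_MODEL:-gpt-4o-mini}" "$@"
}

# Routine task on the fast default
cx "format this JSON file"

# Escalate only when the task genuinely needs deeper reasoning
CODEX_MODEL=o1-preview cx "design a caching strategy for this API"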

Frequently Asked Questions

Why does Codex CLI feel slower than using ChatGPT directly?

Codex CLI adds overhead for code analysis, file reading, and command execution context. Large codebases increase context size, which affects response time. The CLI also streams responses, which can feel slower than receiving a complete answer all at once.
