OpenAI Codex CLI can connect to any OpenAI-compatible API endpoint, including local inference servers like Ollama and LM Studio. This enables you to run coding tasks entirely on your own hardware, keeping sensitive code private and eliminating API costs.
Why Use Local Models
Running local models with Codex CLI offers several advantages:
- Privacy: Your code never leaves your machine. Ideal for proprietary codebases, client work, or sensitive projects
- Cost savings: No per-token charges after initial hardware investment
- Offline access: Work without internet connectivity
- No rate limits: Run as many requests as your hardware can handle
- Experimentation: Test different models without account restrictions
The tradeoff is that local models typically provide lower quality results than GPT-5-Codex, especially for complex multi-file refactoring. However, for routine tasks like code explanation, simple edits, and documentation, local models perform adequately.
Hardware Requirements
Local model performance depends heavily on your hardware. Here are the minimum and recommended specifications:
Minimum Requirements (7B Parameter Models)
| Component | Specification |
|---|---|
| RAM | 16GB |
| Storage | 20GB free space |
| CPU | Modern multi-core processor |
With these specs, you can run models like CodeLlama-7B, DeepSeek-Coder-6.7B, and similar lightweight coding models.
Recommended Requirements (13B-34B Parameter Models)
| Component | Specification |
|---|---|
| RAM | 32GB+ |
| GPU | NVIDIA with 8GB+ VRAM or Apple Silicon with 16GB+ unified memory |
| Storage | 100GB+ free space |
This configuration enables models like CodeLlama-34B, DeepSeek-Coder-33B, and Mixtral-8x7B, which provide significantly better coding assistance.
Optimal Setup (70B+ Parameter Models)
For the best local experience, you need one of the following:
- NVIDIA GPU with 24GB+ VRAM (RTX 4090, A6000)
- Apple Silicon Mac with 64GB+ unified memory (M2 Max, M3 Max, M4 Max)
- Multi-GPU setup with NVLink
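Not sure which tier your machine falls into? A few standard commands report the relevant numbers (this assumes an NVIDIA driver that ships `nvidia-smi`; the macOS value is reported in bytes):

```bash
# Total system RAM (Linux)
free -h

# Total system RAM (macOS, in bytes)
sysctl hw.memsize

# VRAM on NVIDIA GPUs
nvidia-smi --query-gpu=name,memory.total --format=csv
```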
Setting Up Ollama
Ollama is the simplest way to run local models. It handles model downloading, quantization, and provides an OpenAI-compatible API.
Installation
macOS:

```bash
brew install ollama
```

Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Windows: Download the installer from ollama.com.
Download a Coding Model
Pull a code-specialized model:
```bash
# Recommended for most users (6.7B parameters, ~4GB)
ollama pull deepseek-coder:6.7b

# Better quality if you have 16GB+ RAM
ollama pull codellama:13b-instruct

# Best local coding model if you have 32GB+ RAM or GPU
ollama pull deepseek-coder:33b
```
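After pulling, you can confirm what landed on disk and inspect a model's parameters with the standard Ollama commands:

```bash
# List downloaded models and their on-disk sizes
ollama list

# Show details (parameters, prompt template) for a specific model
ollama show deepseek-coder:6.7b
```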
Start the Ollama Server
```bash
ollama serve
```
By default, Ollama runs on http://localhost:11434.
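Before pointing Codex at it, a quick smoke test against the OpenAI-compatible endpoint confirms that the server and model respond; the model name must match one you pulled above:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-coder:6.7b",
    "messages": [{"role": "user", "content": "Write a one-line hello world in Python"}]
  }'
```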
Configure Codex CLI for Ollama
Set environment variables to point Codex at your local Ollama instance:
```bash
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"  # Any non-empty string works
```
Now run Codex with your local model:
```bash
codex --model deepseek-coder:6.7b "explain this function"
```
Setting Up LM Studio
LM Studio provides a graphical interface for managing local models and includes an OpenAI-compatible server.
Installation
- Download LM Studio from lmstudio.ai
- Install and launch the application
- Search for and download a coding model (recommended: DeepSeek-Coder, CodeLlama, or Qwen2.5-Coder)
Start the Local Server
- Click the Local Server tab (left sidebar)
- Select your downloaded model
- Click Start Server
- Note the server URL (default: http://localhost:1234/v1)
Configure Codex CLI for LM Studio
```bash
export OPENAI_API_BASE="http://localhost:1234/v1"
export OPENAI_API_KEY="lm-studio"  # Any non-empty string works
```
Run Codex specifying the model name as shown in LM Studio:
codex --model "deepseek-coder-6.7b-instruct" "add error handling to this code"
Configuration Options
Permanent Configuration
Add local model settings to your Codex config file:
`~/.codex/config.toml`:

```toml
# Use local model by default
model_provider = "oss"
model = "deepseek-coder:6.7b"

# Or configure a custom provider
[model_providers.local]
base_url = "http://localhost:11434/v1"
api_key = "ollama"

# Create profiles for different setups
[profiles.local]
model_provider = "local"
model = "deepseek-coder:6.7b"

[profiles.cloud]
model_provider = "openai"
model = "gpt-5.2-codex"
```
Use profiles to switch between local and cloud:
```bash
codex --profile local "simple task"
codex --profile cloud "complex refactoring"
```
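If you switch back and forth a lot, a pair of shell aliases (the names here are arbitrary) keeps the distinction to a couple of keystrokes:

```bash
# In ~/.bashrc or ~/.zshrc
alias cxl='codex --profile local'   # routine, private tasks
alias cxc='codex --profile cloud'   # heavier refactoring work
```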
Environment Variables
For temporary configuration without modifying config files:
```bash
# Ollama
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"

# LM Studio
export OPENAI_API_BASE="http://localhost:1234/v1"
export OPENAI_API_KEY="lm-studio"
```
Recommended Models for Coding
Choose your model based on available hardware and task complexity:
| Model | Size | VRAM/RAM | Best For |
|---|---|---|---|
| deepseek-coder:6.7b | ~4GB | 8GB | Quick tasks, explanations |
| codellama:13b-instruct | ~8GB | 16GB | General coding assistance |
| qwen2.5-coder:14b | ~9GB | 16GB | Balanced quality and speed |
| deepseek-coder:33b | ~20GB | 32GB | Complex coding tasks |
| codellama:70b | ~40GB | 48GB+ | Approaching cloud quality |
For code-specific tasks, prioritize models with "coder" or "code" in the name. These are fine-tuned on programming data and significantly outperform general-purpose models at coding tasks.
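As a rough way to map the table onto your machine, a short script can suggest a tier from installed RAM (a sketch for Linux and macOS; the thresholds mirror the table above and are approximations, not hard limits):

```bash
#!/usr/bin/env bash
# Suggest a local coding model tier based on total system RAM.
if [[ "$(uname)" == "Darwin" ]]; then
  ram_gb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
else
  ram_gb=$(( $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024 / 1024 ))
fi

if   (( ram_gb >= 48 )); then echo "Try codellama:70b or deepseek-coder:33b"
elif (( ram_gb >= 32 )); then echo "Try deepseek-coder:33b"
elif (( ram_gb >= 16 )); then echo "Try codellama:13b-instruct or qwen2.5-coder:14b"
else                          echo "Stick with deepseek-coder:6.7b"
fi
```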
Performance Comparison
Local models are improving rapidly but still trail cloud models for complex tasks:
| Task Type | Local Model Quality | Cloud Model Quality |
|---|---|---|
| Code explanation | Good | Excellent |
| Simple bug fixes | Good | Excellent |
| Documentation | Good | Excellent |
| Multi-file refactoring | Fair | Excellent |
| Complex architecture | Fair | Excellent |
| Security analysis | Poor | Good |
Use local models for routine tasks and switch to cloud for complex work.
Troubleshooting
Connection Refused Error
If Codex cannot connect to your local server:
- Verify the server is running: `curl http://localhost:11434/v1/models`
- Check that the port is not blocked by a firewall
- Ensure OPENAI_API_BASE includes the `/v1` suffix
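If that curl fails, it also helps to confirm that anything is listening on the expected port at all (11434 for Ollama, 1234 for LM Studio):

```bash
# Linux
ss -ltn | grep 11434

# macOS
lsof -iTCP:11434 -sTCP:LISTEN
```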
Model Not Found
If the model is not recognized:
- List available models with `ollama list`, or check the LM Studio UI
- Use the exact model name, including the version tag
- Pull the model first: `ollama pull model-name`
Slow Response Times
If responses are too slow:
- Use a smaller quantized model (Q4 instead of Q8)
- Reduce context length in model settings (see the Modelfile sketch after this list)
- Ensure GPU acceleration is enabled if available
- Close other memory-intensive applications
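With Ollama, one way to shrink the context window is to derive a variant of the model from a Modelfile; the `num_ctx` value and the derived model name below are just examples:

```bash
# Write a Modelfile that shrinks the context window
cat > Modelfile <<'EOF'
FROM deepseek-coder:6.7b
PARAMETER num_ctx 2048
EOF

# Build the smaller-context variant and use it with Codex
ollama create deepseek-coder-small-ctx -f Modelfile
codex --model deepseek-coder-small-ctx "explain this function"
```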
Out of Memory Errors
If you see memory errors:
- Switch to a smaller model
- Use a more aggressively quantized version (Q4_K_M)
- Reduce the context window size
- Enable swap/virtual memory (slower but works)
Hybrid Approach
The most practical setup uses local models for simple tasks and cloud models for complex work:
```toml
# ~/.codex/config.toml
# Default to local for privacy
model_provider = "oss"
model = "deepseek-coder:6.7b"

[profiles.cloud]
model_provider = "openai"
model = "gpt-5.2-codex"
```
Daily workflow:
```bash
# Quick local tasks (free, private)
codex "explain this function"
codex "add a docstring"

# Complex tasks (cloud quality)
codex --profile cloud "refactor this module to use dependency injection"
```
This approach maximizes privacy and cost savings while maintaining access to cloud-quality assistance when needed.
Next Steps
- Explore the Ollama model library for more coding models
- Learn about quantization formats to optimize memory usage
- Compare with Claude Code, which requires cloud access but offers superior reasoning
- Read about Codex configuration options