
How to Run OpenAI Codex CLI with Local Models (Ollama, LM Studio)

Configure OpenAI Codex CLI to use local LLMs via Ollama, LM Studio, or other OpenAI-compatible APIs. Run coding tasks privately without sending data to external services.

Updated January 2025

OpenAI Codex CLI can connect to any OpenAI-compatible API endpoint, including local inference servers like Ollama and LM Studio. This enables you to run coding tasks entirely on your own hardware, keeping sensitive code private and eliminating API costs.
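
To see what "OpenAI-compatible" means in practice, here is a minimal request against such an endpoint; it assumes an Ollama server on its default port with deepseek-coder:6.7b already pulled (setup steps follow below):

# Minimal chat completion request against a local OpenAI-compatible server
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-coder:6.7b",
    "messages": [{"role": "user", "content": "Write a hello world function in Python"}]
  }'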

Why Use Local Models

Running local models with Codex CLI offers several advantages:

  • Privacy: Your code never leaves your machine. Ideal for proprietary codebases, client work, or sensitive projects
  • Cost savings: No per-token charges after initial hardware investment
  • Offline access: Work without internet connectivity
  • No rate limits: Run as many requests as your hardware can handle
  • Experimentation: Test different models without account restrictions

The tradeoff is that local models typically provide lower quality results than GPT-5-Codex, especially for complex multi-file refactoring. However, for routine tasks like code explanation, simple edits, and documentation, local models perform adequately.

Hardware Requirements

Local model performance depends heavily on your hardware. Here are the minimum and recommended specifications:

Minimum Requirements (7B Parameter Models)

Component   Specification
RAM         16GB
Storage     20GB free space
CPU         Modern multi-core processor

With these specs, you can run models like CodeLlama-7B, DeepSeek-Coder-6.7B, and similar lightweight coding models.

Recommended Setup (13B-34B Parameter Models)

Component   Specification
RAM         32GB+
GPU         NVIDIA with 8GB+ VRAM or Apple Silicon with 16GB+ unified memory
Storage     100GB+ free space

This configuration enables models like CodeLlama-34B, DeepSeek-Coder-33B, and Mixtral-8x7B, which provide significantly better coding assistance.

Optimal Setup (70B+ Parameter Models)

For the best local experience, you need one of the following (a quick way to check your hardware follows this list):

  • NVIDIA GPU with 24GB+ VRAM (RTX 4090, A6000)
  • Apple Silicon Mac with 64GB+ unified memory (M2 Max, M3 Max, M4 Max)
  • Multi-GPU setup with NVLink
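
Standard commands for checking what your machine offers (run whichever applies to your platform):

# NVIDIA: total VRAM per GPU
nvidia-smi --query-gpu=memory.total --format=csv

# Apple Silicon (macOS): total unified memory in bytes
sysctl -n hw.memsize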

Setting Up Ollama

Ollama is the simplest way to run local models. It handles model downloading, quantization, and provides an OpenAI-compatible API.

Installation

macOS:

brew install ollama

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the installer from ollama.com.

Download a Coding Model

Pull a code-specialized model:

# Recommended for most users (6.7B parameters, ~4GB)
ollama pull deepseek-coder:6.7b

# Better quality if you have 16GB+ RAM
ollama pull codellama:13b-instruct

# Best local coding model if you have 32GB+ RAM or GPU
ollama pull deepseek-coder:33b

Start the Ollama Server

ollama serve

By default, Ollama runs on http://localhost:11434.
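
You can confirm the OpenAI-compatible endpoint is reachable before wiring up Codex:

# Should return the list of locally available models
curl http://localhost:11434/v1/models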

Configure Codex CLI for Ollama

Set environment variables to point Codex at your local Ollama instance:

export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"  # Any non-empty string works

Now run Codex with your local model:

codex --model deepseek-coder:6.7b "explain this function"
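
If you switch between local and cloud frequently, a small shell wrapper can set these variables for a single invocation. This is only a convenience sketch; the function name and default model are illustrative:

# Hypothetical helper: run Codex against local Ollama without changing global env vars
codex_local() {
  OPENAI_API_BASE="http://localhost:11434/v1" \
  OPENAI_API_KEY="ollama" \
  codex --model "${CODEX_LOCAL_MODEL:-deepseek-coder:6.7b}" "$@"
}

# Usage
codex_local "explain this function"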

Setting Up LM Studio

LM Studio provides a graphical interface for managing local models and includes an OpenAI-compatible server.

Installation

  1. Download LM Studio from lmstudio.ai
  2. Install and launch the application
  3. Search for and download a coding model (recommended: DeepSeek-Coder, CodeLlama, or Qwen2.5-Coder)

Start the Local Server

  1. Click the Local Server tab (left sidebar)
  2. Select your downloaded model
  3. Click Start Server
  4. Note the server URL (default: http://localhost:1234/v1); you can verify it as shown below
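
The models endpoint both confirms the server is reachable and shows the exact model identifier to pass to Codex (assuming the default port above):

# Lists the models LM Studio is serving, with their identifiers
curl http://localhost:1234/v1/models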

Configure Codex CLI for LM Studio

export OPENAI_API_BASE="http://localhost:1234/v1"
export OPENAI_API_KEY="lm-studio"  # Any non-empty string works

Run Codex specifying the model name as shown in LM Studio:

codex --model "deepseek-coder-6.7b-instruct" "add error handling to this code"

Configuration Options

Permanent Configuration

Add local model settings to your Codex config file:

~/.codex/config.toml:

# Use local model by default
model_provider = "oss"
model = "deepseek-coder:6.7b"

# Or configure a custom provider
[model_providers.local]
base_url = "http://localhost:11434/v1"
api_key = "ollama"

# Create profiles for different setups
[profiles.local]
model_provider = "local"
model = "deepseek-coder:6.7b"

[profiles.cloud]
model_provider = "openai"
model = "gpt-5.2-codex"

Use profiles to switch between local and cloud:

codex --profile local "simple task"
codex --profile cloud "complex refactoring"

Environment Variables

For temporary configuration without modifying config files:

# Ollama
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"

# LM Studio
export OPENAI_API_BASE="http://localhost:1234/v1"
export OPENAI_API_KEY="lm-studio"

Model Selection

Choose your model based on available hardware and task complexity:

Model                    Size    VRAM/RAM   Best For
deepseek-coder:6.7b      ~4GB    8GB        Quick tasks, explanations
codellama:13b-instruct   ~8GB    16GB       General coding assistance
qwen2.5-coder:14b        ~9GB    16GB       Balanced quality and speed
deepseek-coder:33b       ~20GB   32GB       Complex coding tasks
codellama:70b            ~40GB   48GB+      Approaching cloud quality

For code-specific tasks, prioritize models with "coder" or "code" in the name. These are fine-tuned on programming data and significantly outperform general-purpose models at coding tasks.
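
For example, to try the mid-sized option from the table (assuming the tag matches what Ollama's library publishes):

ollama pull qwen2.5-coder:14b
codex --model qwen2.5-coder:14b "summarize what this module does"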

Performance Comparison

Local models are improving rapidly but still trail cloud models for complex tasks:

Task Type                Local Model Quality   Cloud Model Quality
Code explanation         Good                  Excellent
Simple bug fixes         Good                  Excellent
Documentation            Good                  Excellent
Multi-file refactoring   Fair                  Excellent
Complex architecture     Fair                  Excellent
Security analysis        Poor                  Good

Use local models for routine tasks and switch to cloud for complex work.

Troubleshooting

Connection Refused Error

If Codex cannot connect to your local server:

  1. Verify the server is running: curl http://localhost:11434/v1/models
  2. Check the port is not blocked by a firewall
  3. Ensure OPENAI_API_BASE includes the /v1 suffix
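
A quick way to run all three checks at once, assuming the Ollama defaults used earlier:

# The base URL should end in /v1 and the endpoint should answer
echo "$OPENAI_API_BASE"
curl "$OPENAI_API_BASE/models"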

Model Not Found

If the model is not recognized:

  1. List available models: ollama list or check LM Studio UI
  2. Use the exact model name including version tag
  3. Pull the model first: ollama pull model-name

Slow Response Times

If responses are too slow:

  1. Use a smaller quantized model (Q4 instead of Q8)
  2. Reduce context length in model settings
  3. Ensure GPU acceleration is enabled if available
  4. Close other memory-intensive applications
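
To confirm whether Ollama is actually running the model on the GPU (recent Ollama versions include this command):

# The PROCESSOR column shows whether a loaded model is on CPU or GPU
ollama ps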

Out of Memory Errors

If you see memory errors:

  1. Switch to a smaller model
  2. Use a more aggressively quantized version (Q4_K_M)
  3. Reduce the context window size (see the Modelfile sketch after this list)
  4. Enable swap/virtual memory (slower but works)
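
One way to reduce the context window with Ollama is a custom Modelfile. A minimal sketch; the derived model name is just an illustration:

# Create a variant of the model with a smaller context window
cat > Modelfile <<'EOF'
FROM deepseek-coder:6.7b
PARAMETER num_ctx 2048
EOF

ollama create deepseek-coder-small -f Modelfile
codex --model deepseek-coder-small "explain this function"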

Hybrid Approach

The most practical setup uses local models for simple tasks and cloud models for complex work:

# ~/.codex/config.toml

# Default to local for privacy
model_provider = "oss"
model = "deepseek-coder:6.7b"

[profiles.cloud]
model_provider = "openai"
model = "gpt-5.2-codex"

Daily workflow:

# Quick local tasks (free, private)
codex "explain this function"
codex "add a docstring"

# Complex tasks (cloud quality)
codex --profile cloud "refactor this module to use dependency injection"

This approach maximizes privacy and cost savings while maintaining access to cloud-quality assistance when needed.

Frequently Asked Questions

Can Codex CLI use local models instead of the OpenAI API?

Yes, Codex CLI can connect to any OpenAI-compatible API endpoint. By pointing it to Ollama, LM Studio, or similar local inference servers, you can use local models for coding tasks while maintaining data privacy.
