OpenAI Codex CLI speaks the OpenAI API over HTTP, which means it does not have to talk to api.openai.com; any OpenAI-compatible endpoint will do. By overriding the base URL, you can route every request through a self-hosted endpoint: a local inference server, a corporate proxy, or an edge gateway that decides where each request actually runs. This guide covers every way to set a custom base URL for Codex CLI and the underlying OpenAI SDKs.
Why Change the Base URL
The default endpoint sends your prompts (and often your code) to OpenAI's servers and bills you per token. Pointing at a custom base URL lets you:
- Route to hardware you own: Send requests to a GPU on your network instead of the cloud, eliminating per-token cost for routine work.
- Insert a proxy or gateway: Add caching, logging, rate-limit handling, or failover in front of the model without changing the CLI.
- Keep data in your boundary: Compliance-sensitive code can stay inside your VPC or on-premise network.
- Swap providers without rewriting tooling: Any OpenAI-compatible vendor becomes a drop-in target.
The only hard requirement is that the endpoint implements the OpenAI chat-completions (or responses) contract. Codex CLI, Aider, Cline, and the official OpenAI SDKs all share this contract, which is why this technique works across them.
Method 1: Environment Variables (Quickest)
For a temporary or per-shell override, set environment variables before running Codex.
# The OpenAI SDKs read OPENAI_BASE_URL
export OPENAI_BASE_URL="https://gateway.example.com/v1"
# Some Codex versions and older tooling read OPENAI_API_BASE
export OPENAI_API_BASE="https://gateway.example.com/v1"
# Provide whatever token the endpoint expects
export OPENAI_API_KEY="your-endpoint-token"
Then run Codex as normal, specifying a model the endpoint serves:
codex --model qwen2.5-coder:14b "explain this function"
Always include the /v1 suffix. The most common cause of connection failures is a base URL that stops at the host (https://gateway.example.com) instead of the API root (https://gateway.example.com/v1).
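The /v1 rule is easy to enforce in scripts before a URL ever reaches Codex. The following is a minimal sketch, not part of Codex or the OpenAI SDK: ensure_v1_suffix is a hypothetical helper, and it assumes the endpoint serves its API root under /v1, as the examples in this guide do.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_v1_suffix(base_url: str) -> str:
    """Return base_url with a trailing /v1 path segment (hypothetical helper).

    Assumes the endpoint exposes its API root at /v1, as in this guide.
    """
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path = f"{path}/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(ensure_v1_suffix("https://gateway.example.com"))      # https://gateway.example.com/v1
print(ensure_v1_suffix("https://gateway.example.com/v1/"))  # https://gateway.example.com/v1
```

Normalizing once, at the point where the URL is read, keeps the host-only mistake out of every downstream tool.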
Make It Persistent
Add the exports to your shell profile so every session inherits them:
echo 'export OPENAI_BASE_URL="https://gateway.example.com/v1"' >> ~/.zshrc
echo 'export OPENAI_API_KEY="your-endpoint-token"' >> ~/.zshrc
source ~/.zshrc
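To confirm what a fresh shell actually hands to a client, a short script can report the effective values. This is a sketch under assumptions: the lookup order (OPENAI_BASE_URL first, then OPENAI_API_BASE) and the name resolve_base_url are illustrative choices, not documented Codex behavior.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Pick the base URL from the environment (assumed order, see lead-in)."""
    for name in ("OPENAI_BASE_URL", "OPENAI_API_BASE"):
        value = env.get(name, "").strip()
        if value:
            return value
    return DEFAULT_BASE_URL


print("base URL:", resolve_base_url())
# Report only whether a key is present, never the key itself.
print("API key set:", bool(os.environ.get("OPENAI_API_KEY")))
```

If the script prints the default URL in a new terminal, the exports did not reach your shell profile.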
Method 2: A Custom Provider in config.toml
Environment variables apply to every Codex run in the shell that exports them. If you want to keep OpenAI as the default and switch to your endpoint on demand, define a named provider in ~/.codex/config.toml.
# ~/.codex/config.toml
# Define a custom OpenAI-compatible provider
[model_providers.gateway]
name = "Edge Gateway"
base_url = "https://gateway.example.com/v1"
env_key = "GATEWAY_API_KEY" # Codex reads the key from this env var
# A profile that uses it
[profiles.gateway]
model_provider = "gateway"
model = "qwen2.5-coder:14b"
# Keep cloud as a separate profile
[profiles.cloud]
model_provider = "openai"
model = "gpt-5.2-codex"
Set the referenced key once:
export GATEWAY_API_KEY="your-endpoint-token"
Now switch endpoints per command:
codex --profile gateway "add error handling to this file"
codex --profile cloud "refactor this module to use dependency injection"
This is the cleanest setup for teams: the config file documents the endpoint, and the only environment variable anyone has to set is the one named in env_key.
Method 3: Custom Base URL in the OpenAI SDK
If you script against the OpenAI SDK directly (for batch jobs, CI checks, or your own tooling), set the base URL when you construct the client. The same endpoint that serves Codex also serves these clients.
Python:
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",
    api_key="your-endpoint-token",
)
resp = client.chat.completions.create(
    model="qwen2.5-coder:14b",
    messages=[{"role": "user", "content": "Summarize this diff"}],
)
print(resp.choices[0].message.content)
Node / TypeScript:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.example.com/v1",
  apiKey: process.env.GATEWAY_API_KEY,
});
const resp = await client.chat.completions.create({
  model: "qwen2.5-coder:14b",
  messages: [{ role: "user", content: "Summarize this diff" }],
});
console.log(resp.choices[0].message.content);
Because the contract is identical, you can develop against the cloud, then flip a single URL to run the same code against local hardware.
Verifying the Endpoint
Before debugging Codex, confirm the endpoint answers directly. A working OpenAI-compatible server lists its models:
curl https://gateway.example.com/v1/models \
  -H "Authorization: Bearer your-endpoint-token"
You should get a JSON list of model IDs. Test a completion next:
curl https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer your-endpoint-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:14b",
    "messages": [{"role": "user", "content": "ping"}]
  }'
If these curl calls succeed but Codex fails, the problem is most likely in your Codex config, not the endpoint.
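The same model-listing check can be scripted with only the Python standard library, which is handy in CI where curl output is awkward to parse. This is a sketch: the function names are hypothetical, and it assumes the endpoint returns the usual OpenAI-style list body with a data array of objects that each carry an id.

```python
import json
import urllib.request
from typing import List


def build_models_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for <base_url>/models with a bearer token."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def extract_model_ids(body: str) -> List[str]:
    """Pull model IDs out of an OpenAI-style list response."""
    payload = json.loads(body)
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str, token: str, timeout: float = 10.0) -> List[str]:
    """Fetch and return the model IDs the endpoint advertises."""
    request = build_models_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(response.read().decode("utf-8"))


# Example (performs a network call, so it is left commented out):
# print(list_models("https://gateway.example.com/v1", "your-endpoint-token"))
```

An HTTP error from urlopen surfaces as urllib.error.HTTPError, whose code attribute distinguishes a 401 from a 404.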
From a Single Endpoint to a Routing Gateway
A static base URL points every request at one place. That is fine until you want different behavior for different requests: cheap local inference for routine edits, cloud quality for hard refactors, and a fallback when your GPU is busy or offline.
Because the base URL is just an OpenAI-compatible HTTP endpoint, you can put a gateway at that URL instead of a single model server. The gateway accepts the same requests Codex already sends, then decides where to run them: a GPU you own (local, with no per-token charge), an edge cache for repeated prompts, and a cloud burst only when the local path fails. This is exactly the pattern an edge-first gateway such as Wide Area AI implements, and it is the practical reason the custom-base-URL pattern matters beyond a single local server. Codex never knows the difference, because the contract on the wire is unchanged.
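The routing decision itself can be very small. The sketch below is a toy model, not how Wide Area AI or any particular gateway is implemented: the Gateway class, its fields, and the cache-then-local-then-cloud ordering are all assumptions chosen to match the description above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Gateway:
    """Toy router: cache first, then local GPU, then cloud (all hypothetical)."""

    local_healthy: bool = True
    cache: Dict[str, str] = field(default_factory=dict)

    def choose_backend(self, prompt: str) -> str:
        if prompt in self.cache:
            return "cache"   # repeated prompt, answer already stored
        if self.local_healthy:
            return "local"   # GPU you own is reachable
        return "cloud"       # burst only when the local path fails


gateway = Gateway(cache={"ping": "pong"})
print(gateway.choose_backend("ping"))          # cache
print(gateway.choose_backend("explain this"))  # local
gateway.local_healthy = False
print(gateway.choose_backend("explain this"))  # cloud
```

A real gateway would replace the boolean with a health probe and key the cache on the full request, but the client-facing contract stays the same.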
Troubleshooting
Connection Refused or Timeout
- Confirm the base URL ends in /v1.
- Verify the server is reachable: curl https://gateway.example.com/v1/models.
- Check that no firewall or VPN is blocking the host or port.
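To separate a network problem from an API problem, it can help to test the raw TCP connection before any HTTP is involved. The following is a minimal sketch using only the standard library; host_and_port and can_connect are hypothetical helper names.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(base_url: str) -> Tuple[str, int]:
    """Extract host and port, defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(base_url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname or "", parts.port or default_port


def can_connect(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds."""
    host, port = host_and_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


print(host_and_port("https://gateway.example.com/v1"))  # ('gateway.example.com', 443)
```

If can_connect returns False, look at the firewall, VPN, or DNS first; if it returns True but requests still fail, the issue is more likely the path or the credentials.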
401 Unauthorized
- The endpoint is validating the key. Provide the token it expects.
- For local servers (Ollama, llama.cpp), any non-empty OPENAI_API_KEY works.
- Ensure the variable Codex reads (OPENAI_API_KEY or your env_key) is the one you set.
Model Not Found (404)
- The model name does not exist on the target endpoint.
- List available models with the /v1/models call above.
- Use the exact ID the endpoint reports, including any version tag.
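A missing version tag is a common reason for the mismatch. As a rough sketch, the requested name can be compared against the IDs the endpoint reports using difflib from the standard library; suggest_model and the sample ID list are hypothetical.

```python
import difflib
from typing import List, Optional


def suggest_model(requested: str, available: List[str]) -> Optional[str]:
    """Return the requested ID if served, else the closest available ID."""
    if requested in available:
        return requested
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=0.6)
    return matches[0] if matches else None


# Sample IDs for illustration; use the list your endpoint actually returns.
served = ["qwen2.5-coder:14b", "llama3.1:8b"]
print(suggest_model("qwen2.5-coder", served))
```

The suggestion is only a hint for the person debugging; the value passed to --model should still be copied exactly from the endpoint's list.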
Codex Still Hits api.openai.com
- A profile or config value is overriding your environment variable. Check ~/.codex/config.toml.
- --profile and explicit model_provider settings take precedence over OPENAI_BASE_URL.
- Run with verbose logging to confirm the resolved endpoint.