
How to Point Codex CLI at a Custom Base URL (Self-Hosted Endpoints)

Configure OpenAI Codex CLI or any OpenAI SDK to use a custom base URL. Route requests to a self-hosted gateway, proxy, or local inference server while keeping the OpenAI-compatible API.

9 min read · Updated June 2026

OpenAI Codex CLI speaks the OpenAI-compatible API over HTTP, which means it does not have to talk to api.openai.com. By overriding the base URL, you can route every request through a self-hosted endpoint: a local inference server, a corporate proxy, or an edge gateway that decides where each request actually runs. This guide covers three ways to set a custom base URL for Codex CLI and the underlying OpenAI SDKs: environment variables, a named provider in config.toml, and the SDK client constructor.

Why Change the Base URL

The default endpoint sends your prompts (and often your code) to OpenAI's servers and bills you per token. Pointing at a custom base URL lets you:

  • Route to hardware you own: Send requests to a GPU on your network instead of the cloud, eliminating per-token cost for routine work.
  • Insert a proxy or gateway: Add caching, logging, rate-limit handling, or failover in front of the model without changing the CLI.
  • Keep data in your boundary: Compliance-sensitive code can stay inside your VPC or on-premise network.
  • Swap providers without rewriting tooling: Any OpenAI-compatible vendor becomes a drop-in target.

The only hard requirement is that the endpoint implements the OpenAI chat-completions (or responses) contract. Codex CLI, Aider, Cline, and the official OpenAI SDKs all share this contract, which is why this technique works across them.

Method 1: Environment Variables (Quickest)

For a temporary or per-shell override, set environment variables before running Codex.

# The OpenAI SDKs read OPENAI_BASE_URL
export OPENAI_BASE_URL="https://gateway.example.com/v1"

# Some Codex versions and older tooling read OPENAI_API_BASE
export OPENAI_API_BASE="https://gateway.example.com/v1"

# Provide whatever token the endpoint expects
export OPENAI_API_KEY="your-endpoint-token"

Then run Codex as normal, specifying a model the endpoint serves:

codex --model qwen2.5-coder:14b "explain this function"

Always include the /v1 suffix. A common cause of connection failures is a base URL that stops at the host (https://gateway.example.com) instead of the API root (https://gateway.example.com/v1).
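A small guard can catch the missing suffix before any request is sent. The sketch below is a hypothetical helper (the name normalize_base_url is not part of Codex or the OpenAI SDK), and it assumes the endpoint's API root is /v1, as in the examples in this guide.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Return url with /v1 appended when it stops at the host.

    Hypothetical helper: assumes the API root is /v1. A URL that already
    carries a path (for example /v1 or /openai/v1) only loses its
    trailing slash.
    """
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    path = parts.path.rstrip("/")
    if path == "":
        # The URL stopped at the host, so add the assumed API root.
        path = "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Running the value through a check like this before exporting OPENAI_BASE_URL turns a silent connection failure into an immediate, readable error.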

Make It Persistent

Add the exports to your shell profile so every session inherits them. The example uses ~/.zshrc; bash users would use ~/.bashrc instead:

echo 'export OPENAI_BASE_URL="https://gateway.example.com/v1"' >> ~/.zshrc
echo 'export OPENAI_API_KEY="your-endpoint-token"' >> ~/.zshrc
source ~/.zshrc

Method 2: A Custom Provider in config.toml

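Environment variables apply to every Codex command in the shell that sets them and, once they are in your shell profile, to every session. If you want to keep OpenAI as the default and switch to your endpoint on demand, define a named provider in ~/.codex/config.toml.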

# ~/.codex/config.toml

# Define a custom OpenAI-compatible provider
[model_providers.gateway]
name = "Edge Gateway"
base_url = "https://gateway.example.com/v1"
env_key = "GATEWAY_API_KEY"   # Codex reads the key from this env var

# A profile that uses it
[profiles.gateway]
model_provider = "gateway"
model = "qwen2.5-coder:14b"

# Keep cloud as a separate profile
[profiles.cloud]
model_provider = "openai"
model = "gpt-5.2-codex"

Set the referenced key once:

export GATEWAY_API_KEY="your-endpoint-token"

Now switch endpoints per command:

codex --profile gateway "add error handling to this file"
codex --profile cloud "refactor this module to use dependency injection"

This is the cleanest setup for teams: the config file documents the endpoint, and the only environment variable anyone has to set is the one named in env_key.

Method 3: Custom Base URL in the OpenAI SDK

If you script against the OpenAI SDK directly (for batch jobs, CI checks, or your own tooling), set the base URL when you construct the client. The same endpoint that serves Codex also serves these clients.

Python:

import os

from openai import OpenAI

# Read the token from the environment instead of hard-coding it
client = OpenAI(
    base_url="https://gateway.example.com/v1",
    api_key=os.environ["GATEWAY_API_KEY"],
)

resp = client.chat.completions.create(
    model="qwen2.5-coder:14b",
    messages=[{"role": "user", "content": "Summarize this diff"}],
)
print(resp.choices[0].message.content)

Node / TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.example.com/v1",
  apiKey: process.env.GATEWAY_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "qwen2.5-coder:14b",
  messages: [{ role: "user", content: "Summarize this diff" }],
});
console.log(resp.choices[0].message.content);

Because the contract is identical, you can develop against the cloud, then flip a single URL to run the same code against local hardware.
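One way to make that single-URL switch explicit is to derive the client settings from the environment in one place. The official SDKs already read OPENAI_BASE_URL and OPENAI_API_KEY on their own, so the sketch below is optional; client_settings is a hypothetical helper that simply makes the fallback to the default OpenAI endpoint visible and testable.

```python
import os
from typing import Mapping

# Default API root used when no override is set.
OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def client_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) from environment variables."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        # Falls back to the cloud endpoint when no override is present.
        "base_url": env.get("OPENAI_BASE_URL", OPENAI_DEFAULT_BASE_URL),
        "api_key": api_key,
    }


# Usage, assuming the openai package is installed:
#   from openai import OpenAI
#   client = OpenAI(**client_settings())
```

With this in place, the same script targets the cloud or local hardware depending only on whether OPENAI_BASE_URL is exported.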

Verifying the Endpoint

Before debugging Codex, confirm the endpoint answers directly. A working OpenAI-compatible server lists its models:

curl https://gateway.example.com/v1/models \
  -H "Authorization: Bearer your-endpoint-token"

You should get a JSON list of model IDs. Test a completion next:

curl https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer your-endpoint-token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:14b",
    "messages": [{"role": "user", "content": "ping"}]
  }'

If these curl calls succeed but Codex fails, the problem is in your Codex config, not the endpoint.
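The same model-listing check can be scripted for CI or a pre-flight step. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the endpoint returns the usual list shape with a "data" array of objects that each carry an "id".

```python
import json
import urllib.request


def models_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET {base_url}/models request with a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {token}"},
    )


def model_ids(payload: dict) -> list[str]:
    """Extract IDs from a response shaped like {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, token: str, timeout: float = 10.0) -> list[str]:
    """Call the endpoint and return the model IDs it reports."""
    request = models_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return model_ids(json.load(resp))


# Example call (performs a real network request):
#   fetch_model_ids("https://gateway.example.com/v1", "your-endpoint-token")
```

An HTTP error status such as 401 raises urllib.error.HTTPError, which separates an authentication problem from an unreachable host (urllib.error.URLError).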

From a Single Endpoint to a Routing Gateway

A static base URL points every request at one place. That is fine until you want different behavior for different requests: cheap local inference for routine edits, cloud quality for hard refactors, and a fallback when your GPU is busy or offline.

Because the base URL is just an OpenAI-compatible HTTP endpoint, you can put a gateway at that URL instead of a single model server. The gateway accepts the same requests Codex already sends, then decides where to run them: a GPU you own (local, with no per-token charge), an edge cache for repeated prompts, and a cloud burst only when the local path fails. This is the model an edge-first gateway such as Wide Area AI implements, and it is the practical reason the custom-base-URL pattern matters beyond a single local server. Codex never knows the difference, because the contract on the wire is unchanged.
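The routing decision described above can be sketched in a few lines. This is an illustration of the cache, then local, then cloud order only, not the implementation of any particular gateway. The route function, the dict used as a cache, and the two backend callables are all hypothetical stand-ins.

```python
from typing import Callable

# A backend takes a prompt and returns a completion (hypothetical signature).
Backend = Callable[[str], str]


def route(prompt: str, cache: dict, local: Backend, cloud: Backend) -> tuple[str, str]:
    """Return (source, answer), trying cache, then local, then cloud."""
    if prompt in cache:
        # Repeated prompt: answer without touching any model.
        return "cache", cache[prompt]
    try:
        answer = local(prompt)
        source = "local"
    except (ConnectionError, TimeoutError):
        # Local node is offline or too slow: burst to the cloud.
        answer = cloud(prompt)
        source = "cloud"
    cache[prompt] = answer
    return source, answer
```

A real gateway would key the cache on the full request (model, messages, parameters) rather than a bare prompt string, and would likely also consider load and cost, but the fallback order stays the same.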

Troubleshooting

Connection Refused or Timeout

  1. Confirm the base URL ends in /v1.
  2. Verify the server is reachable: curl https://gateway.example.com/v1/models.
  3. Check that no firewall or VPN is blocking the host or port.
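The reachability step can be separated from HTTP entirely by testing the TCP connection first. The sketch below is a minimal check using only the standard library; host_port and is_reachable are hypothetical names, and the default ports (443 for https, 80 for http) are assumed when the URL does not give one.

```python
import socket
from urllib.parse import urlsplit


def host_port(base_url: str) -> tuple[str, int]:
    """Extract the host and TCP port a base URL points at."""
    parts = urlsplit(base_url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {base_url!r}")
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = host_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If is_reachable returns False, the problem is network-level (DNS, firewall, VPN, or a stopped server) and no Codex setting will fix it.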

401 Unauthorized

  1. The endpoint is validating the key. Provide the token it expects.
  2. For local servers (Ollama, llama.cpp), any non-empty OPENAI_API_KEY works unless the server was started with its own API key.
  3. Ensure the variable Codex reads (OPENAI_API_KEY or your env_key) is the one you set.

Model Not Found (404)

  1. The model name does not exist on the target endpoint.
  2. List available models with the /v1/models call above.
  3. Use the exact ID the endpoint reports, including any version tag.
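When the requested name is close to a real one (a missing version tag, for instance), a fuzzy match against the endpoint's model list points to the right ID quickly. The sketch below uses difflib from the standard library; suggest_model is a hypothetical helper, and the available list would come from the /v1/models call.

```python
import difflib


def suggest_model(requested: str, available: list[str]) -> list[str]:
    """Return the exact ID if served, else up to three similar IDs."""
    if requested in available:
        return [requested]
    # cutoff=0.6 is difflib's default similarity threshold.
    return difflib.get_close_matches(requested, available, n=3, cutoff=0.6)


served = ["qwen2.5-coder:14b", "llama3.1:8b"]
print(suggest_model("qwen2.5-coder", served))
```

An empty result means nothing on the endpoint resembles the requested name, which usually indicates the wrong endpoint or profile rather than a typo.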

Codex Still Hits api.openai.com

  1. A profile or config value is overriding your environment variable. Check ~/.codex/config.toml.
  2. --profile and explicit model_provider settings take precedence over OPENAI_BASE_URL.
  3. Run with verbose logging to confirm the resolved endpoint.

Frequently Asked Questions

Can Codex CLI use a custom or self-hosted endpoint?

Yes. Codex CLI speaks the OpenAI-compatible API, so any endpoint that implements /v1/chat/completions (or /v1/responses) works. Set OPENAI_BASE_URL or define a custom model_provider in ~/.codex/config.toml with your base_url.

What is the difference between OPENAI_BASE_URL and OPENAI_API_BASE?

Both point the OpenAI client at a non-default host. The official OpenAI Python and Node SDKs read OPENAI_BASE_URL. Older tooling and some versions of Codex CLI also honor OPENAI_API_BASE. Set both if you are unsure, and always include the /v1 suffix.

Do I need a real API key for a custom endpoint?

Only if your endpoint validates it. Local servers like Ollama or llama.cpp accept any non-empty string by default. A hosted gateway may issue its own key. Codex still requires the variable to be set, so provide whatever token your endpoint expects.

Does the endpoint have to serve OpenAI models?

No, but the model name must exist on the target endpoint. If you set base_url to a local server, pass a model that server actually serves (for example qwen2.5-coder:14b), not gpt-5.2-codex. Mismatched names return a 404 or model-not-found error.