The interactive AI coding session — you typing, the agent asking permission, you approving — does not survive contact with a build server. CI has no TTY, no human to click "yes," and a budget that is very much real. Yet the same agents that refactor code on your laptop can review pull requests, triage issues, and fix failing tests inside a pipeline. The trick is running them in headless mode with guardrails tight enough that a malicious issue comment can't walk off with your API key.
This is a practical guide to doing that with the three major CLIs — Claude Code, Codex CLI, and Gemini CLI — covering the headless flags, the GitHub Actions, the cost ceilings, and the security model you actually need.
Headless mode: the foundation
Every CI integration starts with a non-interactive invocation: one prompt in, one result out, then exit.
Claude Code uses the -p (or --print) flag:
claude -p "Find and fix the bug in auth.py" \
--allowedTools "Read,Edit,Bash"
Anthropic now frames headless mode as the Agent SDK delivered through the CLI; the older standalone --headless flag has been superseded by print mode. For scripted runs, add --bare, which skips auto-discovery of hooks, skills, plugins, MCP servers, auto-memory, and CLAUDE.md so a CI run doesn't behave differently from a local one. Anthropic recommends --bare for SDK/scripted calls and says it will become the default for -p in a future release. In bare mode, auth must come from ANTHROPIC_API_KEY or an apiKeyHelper.
Codex CLI uses codex exec:
codex exec "Add unit tests for the parser module"
It runs a single session to completion with no approval prompts, streams activity to stderr, and writes only the final agent message to stdout. By default codex exec is read-only — it cannot edit files or make network calls until you raise the sandbox level. That default is a feature: a read-only review job needs nothing more.
Gemini CLI enters headless mode automatically in a non-TTY environment or when you pass a prompt:
cat README.md | gemini -p "Summarize this and flag anything outdated"
All three accept piped stdin. Note Claude Code caps piped stdin at 10MB (as of v2.1.128); exceeding it exits with a clear error and non-zero status, so pass a file reference for larger inputs.
Structured output and per-run cost
CI needs machine-readable output, not prose. Each CLI obliges.
| CLI | JSON flag | What you get |
|---|---|---|
| Claude Code | --output-format json | total_cost_usd, per-model cost breakdown, session_id; --json-schema constrains output to a schema (returned in structured_output) |
| Codex | --json | NDJSON events: thread.started, turn.completed, item.*; --output-schema enforces a JSON Schema |
| Gemini | --output-format json | response, stats (per-model tokens/latency, tool call totals, file line modifications), error |
The Claude Code cost field is the one I lean on most — it lets a pipeline log exactly what each invocation spent:
COST=$(claude -p "$PROMPT" --output-format json --bare \
| jq -r '.total_cost_usd')
echo "This run cost \$$COST"
Cost and loop controls
Headless agents fail closed only if you make them. The two failure modes are runaway cost and infinite tool loops.
Claude Code gives you both levers directly. --max-budget-usd sets an absolute dollar ceiling — covering input/output tokens and tool uses — that the agent cannot exceed; when it's reached, the agent stops immediately. --max-turns limits reasoning/tool-use cycles and returns a partial result with a "Maximum turns reached" message, preventing infinite loops:
claude -p "$PROMPT" --bare \
--max-turns 5 \
--max-budget-usd 2.00 \
--allowedTools "Read,Edit"
Always pair an agent-level limit with a job-level timeout in your CI YAML. The agent's own ceiling protects your spend; the pipeline timeout protects against the agent hanging entirely.
For Codex, cost discipline comes from running codex exec (no TUI overhead), --ephemeral to skip persisting session rollout files, and choosing cheaper models for routine work — community reports put savings from a mini Codex model at roughly 40–60% on routine tasks. Bear in mind each connected MCP server adds tool definitions to every API call, inflating cost; trim MCP servers you don't need in CI.
One scheduling note for Claude subscribers: starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a new monthly Agent SDK credit, separate from interactive usage limits. Budget your automation against that pool, not your interactive quota.
GitHub Actions
Each vendor ships a first-party action.
anthropics/claude-code-action is general-purpose: it responds to @claude mentions, issue assignments, or explicit automation prompts, and runs entirely on your own runner. It supports Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry auth. The easiest setup is /install-github-app from the Claude Code terminal. The v1 action uses a unified claude_args configuration object (replacing the v0.x individual inputs); pin to @v1. A lower-level anthropics/claude-code-base-action exists as the building block inside the full action.
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
prompt: "Review this PR for security issues"
claude_args: "--max-turns 8 --allowedTools Read,Bash(git diff *)"
openai/codex-action@v1 installs the Codex CLI, starts a Responses API proxy when given an API key (to reduce key exposure), and runs codex exec under a configurable safety strategy. Inputs include openai-api-key, prompt/prompt-file, sandbox (read-only / workspace-write / danger-full-access), safety-strategy, allow-users/allow-bots, and output-file. The safety-strategy options are drop-sudo (default, irreversibly strips sudo — recommended), unprivileged-user, and unsafe (required for Windows, not recommended on Linux/macOS).
google-github-actions/run-gemini-cli invokes the Gemini CLI in Actions. Inputs include gemini_api_key (optional unless using Vertex/Code Assist), prompt, and settings (JSON for .gemini/settings.json). It supports event-based automation, @gemini-cli mentions with /review or /triage, and conversational requests; pre-built workflows live in examples/workflows (gemini-dispatch.yml, issue-triage.yml, pr-review.yml).
The security model you can't skip
Here is the part most teams underestimate. Microsoft Threat Intelligence disclosed a vulnerability where the Claude Code GitHub Action could expose CI/CD secrets: prompt injection via untrusted GitHub content (issue bodies, PR descriptions, comments) could direct the Read tool to access /proc/self/environ and exfiltrate ANTHROPIC_API_KEY. The Bash tool was sandboxed and scrubbed — but the Read tool wasn't subject to the same isolation. Anthropic mitigated it in v2.1.128 (released May 5, 2026) by rejecting sensitive /proc files.
The patch closes that specific hole. The pattern behind it does not go away, so adopt the Microsoft case study's mitigations as standing policy:
- Agents Rule of Two — never combine untrusted input, sensitive system access, and external communication in a single workflow. Break the chain on at least one axis.
- Least privilege — scoped tokens, one key per environment, monitor usage. A review bot does not need write access to your registry.
- Harden system prompts — use Claude Code's
--append-system-prompt(or--append-system-prompt-file) to declare that issue, comment, PR, and file contents are untrusted data, not instructions. Use--system-promptonly when you intend to replace the default entirely. - Architectural isolation — keep untrusted context separate from execution. Restrict triggers with
allow-users/allow-botsso external contributors can't invoke a privileged agent. - Lock down tools — Claude Code's
dontAskmode denies anything not inpermissions.allowor the read-only command set; Codex'sread-onlysandbox default does the same. Default deny, then widen.
Also keep an eye on Gemini: a 2026 trust-model update introduced a breaking change to how non-interactive/headless environments handle folder trust. Previously, Gemini CLI in CI automatically trusted workspace folders for loading configuration and environment variables; the update hardens workspace trust and tool allowlisting. Review your run-gemini-cli setup against the advisory.
Multi-step pipelines and gating
For workflows that span several CI steps, Claude Code supports --continue (most recent conversation) and --resume <session_id>; capture the ID from --output-format json with jq -r '.session_id'. Session lookup is scoped to the current project directory and its git worktrees.
Gating is just exit codes. Both Claude Code and Codex return non-zero on failure, so a failed agent step fails the job like any other command. Codex captures stdout, stderr, exit code, and timing with token-aware truncation (it preserves the beginning and end, eliding the middle), which is handy when an agent's log is too large to surface whole.
Finally, prefer self-hosted runners with least-privilege scoped tokens for anything that touches sensitive systems — and note Anthropic documents the Agent SDK in GitLab CI/CD as well as GitHub Actions, so none of this is GitHub-only.
Bottom line
Headless AI agents in CI are production-ready, but only when you treat them like any other privileged automation. Pin the version (≥ v2.1.128 for Claude Code), start from the most restrictive tool and sandbox settings, set both an agent-level budget (--max-budget-usd, --max-turns) and a pipeline timeout, and parse the JSON output for cost and success signals. Above all, assume any PR or issue content the agent reads is hostile, and apply the Agents Rule of Two so a single injected comment can never reach both your secrets and the open internet. Do that, and an AI reviewer becomes one of the cheapest, most consistent steps in your pipeline.