AI Coding CLIs in CI/CD: Headless Modes, GitHub Actions, and Safe Automation

A practical DevOps guide to running Claude Code, Codex CLI, and Gemini CLI non-interactively in pipelines — headless flags, GitHub Actions, cost ceilings, and the guardrails that keep secrets from leaking.

By Sean

The interactive AI coding session — you typing, the agent asking permission, you approving — does not survive contact with a build server. CI has no TTY, no human to click "yes," and a budget that is very much real. Yet the same agents that refactor code on your laptop can review pull requests, triage issues, and fix failing tests inside a pipeline. The trick is running them in headless mode with guardrails tight enough that a malicious issue comment can't walk off with your API key.

This is a practical guide to doing that with the three major CLIs — Claude Code, Codex CLI, and Gemini CLI — covering the headless flags, the GitHub Actions, the cost ceilings, and the security model you actually need.

Headless mode: the foundation

Every CI integration starts with a non-interactive invocation: one prompt in, one result out, then exit.

Claude Code uses the -p (or --print) flag:

claude -p "Find and fix the bug in auth.py" \
  --allowedTools "Read,Edit,Bash"

Anthropic now frames headless mode as the Agent SDK delivered through the CLI; the older standalone --headless flag has been superseded by print mode. For scripted runs, add --bare, which skips auto-discovery of hooks, skills, plugins, MCP servers, auto-memory, and CLAUDE.md, so the run behaves the same on every machine instead of inheriting whatever local configuration happens to be present. Anthropic recommends --bare for SDK/scripted calls and says it will become the default for -p in a future release. In bare mode, auth must come from ANTHROPIC_API_KEY or an apiKeyHelper.

Codex CLI uses codex exec:

codex exec "Add unit tests for the parser module"

It runs a single session to completion with no approval prompts, streams activity to stderr, and writes only the final agent message to stdout. By default codex exec is read-only — it cannot edit files or make network calls until you raise the sandbox level. That default is a feature: a read-only review job needs nothing more.

Gemini CLI enters headless mode automatically in a non-TTY environment or when you pass a prompt:

cat README.md | gemini -p "Summarize this and flag anything outdated"

All three accept piped stdin. Note Claude Code caps piped stdin at 10MB (as of v2.1.128); exceeding it exits with a clear error and non-zero status, so pass a file reference for larger inputs.
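A pre-flight size check can decide between piping and passing a file reference before the CLI ever runs. The following is a minimal sketch, not part of any CLI: `fits_stdin_cap`, `choose_input_strategy`, and `STDIN_CAP_BYTES` are hypothetical names, and reading "10MB" as 10,000,000 bytes is an assumption (the stricter of the two plausible interpretations).

```python
# Assumption: "10MB" is read as 10,000,000 bytes, the stricter of the
# decimal and binary interpretations, so the check stays conservative.
STDIN_CAP_BYTES = 10_000_000


def fits_stdin_cap(payload: bytes, cap_bytes: int = STDIN_CAP_BYTES) -> bool:
    """Return True when the payload is small enough to pipe via stdin."""
    return len(payload) <= cap_bytes


def choose_input_strategy(payload: bytes) -> str:
    """Pick 'stdin' for small inputs, otherwise 'file' (pass a path instead)."""
    return "stdin" if fits_stdin_cap(payload) else "file"
```

A pipeline step could call `choose_input_strategy` on the generated diff or log and branch on the result, rather than discovering the limit through a failed job.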

Structured output and per-run cost

CI needs machine-readable output, not prose. Each CLI obliges.

| CLI | JSON flag | What you get |
| --- | --- | --- |
| Claude Code | --output-format json | total_cost_usd, per-model cost breakdown, session_id; --json-schema constrains output to a schema (returned in structured_output) |
| Codex | --json | NDJSON events: thread.started, turn.completed, item.*; --output-schema enforces a JSON Schema |
| Gemini | --output-format json | response, stats (per-model tokens/latency, tool call totals, file line modifications), error |
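Newline-delimited output such as the Codex event stream is easiest to consume one line at a time. The sketch below tallies events by type; it assumes each event is a JSON object with a "type" field, and the sample lines (including `item.completed` as one instance of `item.*`) are hypothetical stand-ins for real output, which carries more fields.

```python
import json
from collections import Counter


def count_event_types(ndjson_text: str) -> Counter:
    """Count events by their "type" field, skipping blank or non-JSON lines."""
    counts: Counter = Counter()
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output in the stream
        if isinstance(event, dict):
            counts[event.get("type", "unknown")] += 1
    return counts


# Hypothetical sample stream for illustration only.
SAMPLE = "\n".join([
    '{"type": "thread.started"}',
    '{"type": "item.completed"}',
    '{"type": "turn.completed"}',
])

if __name__ == "__main__":
    print(count_event_types(SAMPLE))
```

A job could, for example, treat a stream with no `turn.completed` event as incomplete and fail the step.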

The Claude Code cost field is the one I lean on most — it lets a pipeline log exactly what each invocation spent:

COST=$(claude -p "$PROMPT" --output-format json --bare \
  | jq -r '.total_cost_usd')
echo "This run cost \$$COST"
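When a job makes several invocations, the same fields can be aggregated in a small script instead of one jq call per step. This is a sketch under stated assumptions: `summarize_run` and `total_cost` are hypothetical helpers, and the sample payloads contain only the two fields named above, whereas real output includes more.

```python
import json


def summarize_run(raw_json: str) -> dict:
    """Extract cost and session ID from one JSON result payload."""
    result = json.loads(raw_json)
    return {
        "cost_usd": float(result.get("total_cost_usd", 0.0)),
        "session_id": result.get("session_id"),
    }


def total_cost(payloads: list[str]) -> float:
    """Sum the reported cost across several invocations in one job."""
    return sum(summarize_run(p)["cost_usd"] for p in payloads)


# Hypothetical payloads for illustration only.
SAMPLE_RUNS = [
    '{"total_cost_usd": 0.25, "session_id": "abc-123"}',
    '{"total_cost_usd": 0.5, "session_id": "def-456"}',
]

if __name__ == "__main__":
    print(f"Job total: ${total_cost(SAMPLE_RUNS):.2f}")
```

Defaulting a missing cost to 0.0 keeps the summary from crashing on an unexpected payload; a stricter pipeline might prefer to fail instead.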

Cost and loop controls

Headless agents fail closed only if you make them. The two failure modes are runaway cost and infinite tool loops.

Claude Code gives you both levers directly. --max-budget-usd sets an absolute dollar ceiling — covering input/output tokens and tool uses — that the agent cannot exceed; when it's reached, the agent stops immediately. --max-turns limits reasoning/tool-use cycles and returns a partial result with a "Maximum turns reached" message, preventing infinite loops:

claude -p "$PROMPT" --bare \
  --max-turns 5 \
  --max-budget-usd 2.00 \
  --allowedTools "Read,Edit"

Always pair an agent-level limit with a job-level timeout in your CI YAML. The agent's own ceiling protects your spend; the pipeline timeout protects against the agent hanging entirely.
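If the agent is launched from a wrapper script rather than directly from the YAML, the same backstop can be applied at the process level. The sketch below is one way to do that with the standard library; `run_with_deadline` is a hypothetical helper, the exit code 124 simply borrows the convention of the GNU `timeout` utility, and the command shown is a stand-in so the example runs anywhere.

```python
import subprocess
import sys


def run_with_deadline(cmd: list[str], timeout_s: float) -> tuple[int, str]:
    """Run cmd and return (exit_code, stdout); a timeout maps to exit code 124."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return 124, ""
    return done.returncode, done.stdout


if __name__ == "__main__":
    # Stand-in command; in a pipeline this would be the headless agent invocation.
    code, out = run_with_deadline([sys.executable, "-c", "print('ok')"], timeout_s=30)
    print(code, out.strip())
```

This complements the job-level timeout rather than replacing it: the CI timeout still covers the case where the wrapper itself hangs.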

For Codex, cost discipline comes from running codex exec (no TUI overhead), --ephemeral to skip persisting session rollout files, and choosing cheaper models for routine work — community reports put savings from a mini Codex model at roughly 40–60% on routine tasks. Bear in mind each connected MCP server adds tool definitions to every API call, inflating cost; trim MCP servers you don't need in CI.

One billing note for Claude subscribers: starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a new monthly Agent SDK credit, separate from interactive usage limits. Budget your automation against that pool, not your interactive quota.

GitHub Actions

Each vendor ships a first-party action.

anthropics/claude-code-action is general-purpose: it responds to @claude mentions, issue assignments, or explicit automation prompts, and runs entirely on your own runner. It supports Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry auth. The easiest setup is /install-github-app from the Claude Code terminal. The v1 action uses a unified claude_args configuration object (replacing the v0.x individual inputs); pin to @v1. A lower-level anthropics/claude-code-base-action exists as the building block inside the full action.

- uses: anthropics/claude-code-action@v1
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    prompt: "Review this PR for security issues"
    claude_args: "--max-turns 8 --allowedTools 'Read,Bash(git diff *)'"

openai/codex-action@v1 installs the Codex CLI, starts a Responses API proxy when given an API key (to reduce key exposure), and runs codex exec under a configurable safety strategy. Inputs include openai-api-key, prompt/prompt-file, sandbox (read-only / workspace-write / danger-full-access), safety-strategy, allow-users/allow-bots, and output-file. The safety-strategy options are drop-sudo (default, irreversibly strips sudo — recommended), unprivileged-user, and unsafe (required for Windows, not recommended on Linux/macOS).

google-github-actions/run-gemini-cli invokes the Gemini CLI in Actions. Inputs include gemini_api_key (not needed when authenticating through Vertex AI or Gemini Code Assist instead), prompt, and settings (JSON for .gemini/settings.json). It supports event-based automation, @gemini-cli mentions with /review or /triage, and conversational requests; pre-built workflows live in examples/workflows (gemini-dispatch.yml, issue-triage.yml, pr-review.yml).

The security model you can't skip

Here is the part most teams underestimate. Microsoft Threat Intelligence disclosed a vulnerability where the Claude Code GitHub Action could expose CI/CD secrets: prompt injection via untrusted GitHub content (issue bodies, PR descriptions, comments) could direct the Read tool to access /proc/self/environ and exfiltrate ANTHROPIC_API_KEY. The Bash tool was sandboxed and scrubbed — but the Read tool wasn't subject to the same isolation. Anthropic mitigated it in v2.1.128 (released May 5, 2026) by rejecting sensitive /proc files.

The patch closes that specific hole. The pattern behind it does not go away, so adopt the Microsoft case study's mitigations as standing policy:

  • Agents Rule of Two — never combine untrusted input, sensitive system access, and external communication in a single workflow. Break the chain on at least one axis.
  • Least privilege — scoped tokens, one key per environment, monitor usage. A review bot does not need write access to your registry.
  • Harden system prompts — use Claude Code's --append-system-prompt (or --append-system-prompt-file) to declare that issue, comment, PR, and file contents are untrusted data, not instructions. Use --system-prompt only when you intend to replace the default entirely.
  • Architectural isolation — keep untrusted context separate from execution. Restrict triggers with allow-users/allow-bots so external contributors can't invoke a privileged agent.
  • Lock down tools — Claude Code's dontAsk mode denies anything not in permissions.allow or the read-only command set; Codex's read-only sandbox default does the same. Default deny, then widen.
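The Rule of Two from the list above is mechanical enough to check in a review script or a workflow linter. The sketch below is illustrative only: `WorkflowProfile` and `violates_rule_of_two` are hypothetical names, and how each flag gets set for a real workflow is left to whoever audits it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hand-assessed risk properties of one CI workflow (hypothetical model)."""
    reads_untrusted_input: bool       # issue bodies, PR descriptions, comments
    has_sensitive_access: bool        # secrets, write tokens, internal systems
    can_communicate_externally: bool  # network egress, posting comments


def violates_rule_of_two(profile: WorkflowProfile) -> bool:
    """True when all three risk properties are combined in a single workflow."""
    return (
        profile.reads_untrusted_input
        and profile.has_sensitive_access
        and profile.can_communicate_externally
    )


if __name__ == "__main__":
    triage_bot = WorkflowProfile(True, True, True)
    read_only_review = WorkflowProfile(True, False, True)
    print(violates_rule_of_two(triage_bot))        # all three axes present
    print(violates_rule_of_two(read_only_review))  # chain broken on one axis
```

Breaking the chain on any one axis is enough to pass the check, which mirrors the policy: the rule limits the combination, not any single capability.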

Also keep an eye on Gemini: a 2026 trust-model update introduced a breaking change to how non-interactive/headless environments handle folder trust. Previously, Gemini CLI in CI automatically trusted workspace folders for loading configuration and environment variables; the update hardens workspace trust and tool allowlisting. Review your run-gemini-cli setup against the advisory.

Multi-step pipelines and gating

For workflows that span several CI steps, Claude Code supports --continue (most recent conversation) and --resume <session_id>; capture the ID from --output-format json with jq -r '.session_id'. Session lookup is scoped to the current project directory and its git worktrees.
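Passing the session between steps amounts to reading session_id from the first step's JSON and feeding it to --resume in the next. A minimal sketch, assuming the first step's output was saved as a string; `build_resume_command` is a hypothetical helper, and it only assembles the argument list using flags mentioned in this article rather than running anything.

```python
import json


def build_resume_command(previous_output: str, prompt: str) -> list[str]:
    """Build argv for a follow-up step that resumes the earlier session."""
    session_id = json.loads(previous_output).get("session_id")
    if not session_id:
        raise ValueError("previous step produced no session_id")
    return ["claude", "-p", prompt, "--resume", session_id, "--output-format", "json"]


if __name__ == "__main__":
    # Hypothetical output from an earlier step.
    first_step = '{"session_id": "abc-123", "total_cost_usd": 0.1}'
    print(build_resume_command(first_step, "Now run the tests"))
```

Raising on a missing ID makes the second step fail loudly instead of silently starting a fresh conversation with no context.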

Gating is just exit codes. Both Claude Code and Codex return non-zero on failure, so a failed agent step fails the job like any other command. Codex captures stdout, stderr, exit code, and timing with token-aware truncation (it preserves the beginning and end, eliding the middle), which is handy when an agent's log is too large to surface whole.
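The keep-the-ends, drop-the-middle idea is useful for any oversized log you want to attach to a job summary. The sketch below is a simplified, character-based version written for illustration; it is not Codex's implementation, which the description above says is token-aware, and `truncate_middle` and its marker text are hypothetical.

```python
def truncate_middle(text: str, max_chars: int, marker: str = "\n...[truncated]...\n") -> str:
    """Keep the start and end of text, replacing the middle with a marker."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(marker)
    if keep <= 0:
        # No room for the marker; fall back to a plain prefix.
        return text[:max_chars]
    head = (keep + 1) // 2
    tail = keep // 2
    # Guard tail == 0: text[-0:] would return the whole string.
    return text[:head] + marker + (text[-tail:] if tail else "")


if __name__ == "__main__":
    log = "start of log\n" + "noise\n" * 1000 + "final error line\n"
    print(truncate_middle(log, 120))
```

The beginning of a log usually shows what was attempted and the end shows how it failed, which is why both ends are worth preserving.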

Finally, prefer self-hosted runners with least-privilege scoped tokens for anything that touches sensitive systems — and note Anthropic documents the Agent SDK in GitLab CI/CD as well as GitHub Actions, so none of this is GitHub-only.

Bottom line

Headless AI agents in CI are production-ready, but only when you treat them like any other privileged automation. Pin the version (≥ v2.1.128 for Claude Code), start from the most restrictive tool and sandbox settings, set both an agent-level budget (--max-budget-usd, --max-turns) and a pipeline timeout, and parse the JSON output for cost and success signals. Above all, assume any PR or issue content the agent reads is hostile, and apply the Agents Rule of Two so a single injected comment can never reach both your secrets and the open internet. Do that, and an AI reviewer becomes one of the cheapest, most consistent steps in your pipeline.

Frequently Asked Questions

How do I run Claude Code, Codex CLI, and Gemini CLI non-interactively?

Each CLI has a headless entry point. Claude Code uses claude -p "<prompt>" (the -p/--print flag: one prompt in, one result out, then exit). Codex uses codex exec "<prompt>", which runs a single session to completion with no approval prompts. Gemini CLI enters headless mode automatically in a non-TTY environment or when you pass -p/--prompt. All three accept piped stdin and emit machine-readable output for scripting.

How do I cap cost and stop runaway loops?

Claude Code provides --max-turns (caps reasoning/tool-use cycles, returns a partial result with a 'Maximum turns reached' message) and --max-budget-usd (an absolute dollar ceiling the agent cannot exceed; it stops immediately when reached). Codex bounds what a run can do through its sandbox mode, and any of the three CLIs can be wrapped in a job-level timeout. Always combine an agent-level limit with a pipeline timeout as a backstop.

How do I get machine-readable output and per-run cost?

Claude Code's --output-format json returns a payload including total_cost_usd and a per-model cost breakdown, plus a session_id you can extract with jq -r '.session_id'. Codex --json emits newline-delimited events (thread.started, turn.completed, item.*). Gemini's --output-format json returns response, stats (token counts, tool call totals, latency), and error. Parse any of them with jq.

How do I restrict which tools an agent can use?

Claude Code uses --allowedTools (e.g. --allowedTools "Read,Edit,Bash") with prefix-matching permission rules like Bash(git diff *), plus permission modes including acceptEdits and dontAsk (which denies anything not explicitly allowed). Codex uses --sandbox with read-only (default), workspace-write, or danger-full-access. Gemini uses --approval-mode and tool allowlisting. Start from the most restrictive setting and widen only as needed.

Is prompt injection a real risk for AI agents in CI?

Treat it as a real risk. Microsoft Threat Intelligence disclosed an issue where prompt injection via untrusted GitHub content (issue bodies, PR descriptions, comments) could make the Claude Code GitHub Action's Read tool access /proc/self/environ and exfiltrate ANTHROPIC_API_KEY. Apply the 'Agents Rule of Two' (never combine untrusted input, sensitive system access, and external communication in a single workflow), enforce least-privilege scoped tokens, and harden system prompts to declare issue/PR/comment content as untrusted data.

Has the Claude Code secret-exposure issue been fixed?

Anthropic mitigated the /proc/self/environ exposure in Claude Code v2.1.128 (released May 5, 2026) by rejecting access to sensitive /proc files. Upgrade your action and CLI to at least that version. The same release also capped piped stdin at 10MB.

How do I control who can trigger an agent run?

The Codex GitHub Action exposes allow-users and allow-bots inputs to restrict who can trigger a run. For the Claude Code action, scope triggers to specific events and gate on author association in the workflow. The goal is to ensure untrusted external contributors cannot invoke an agent that holds privileged credentials.

How do I gate a pipeline on the agent's result?

Both Claude Code and Codex signal failure with a non-zero exit status. A non-zero exit (for example, Claude Code exiting when piped stdin exceeds 10MB) fails the step like any other command, so you can use standard CI conditionals. Codex captures stdout, stderr, exit code, and timing, enabling you to gate downstream steps on agent success.
