# Bastion user guide (full)

> The complete user guide, concatenated in reading order. Canonical pages live under https://bastion.jessica.black/guide.

<!-- https://bastion.jessica.black/guide -->

# Bastion user guide

> Agentic code review for a world where agents write all of the code.

This guide teaches you how to use Bastion on your own project: what it is, how to
run it, how to write reviewers, and how to wire it into CI and governance. It is
written for two audiences at once: the human curating the review policy and the
agent looping against it. Bastion runs the repository's reviewers and merge gate for
both, through whatever surface is natural to each. CI can add the PR's description
and discussion to the reviewers' context, and a purely local run can add an
author's personal user-level reviewers, which CI never sees.

This guide is self-contained: everything you need to run Bastion, write reviewers,
and wire it into CI is here, with nothing essential living elsewhere. If you want to
work on Bastion itself rather than use it, the contributor and design docs live in the
[Bastion repository](https://github.com/jssblck/bastion).

> **Reading this as an agent?** The whole guide is also served as a single plain-text
> file at [`bastion.jessica.black/llms-full.txt`](https://bastion.jessica.black/llms-full.txt),
> so you can ingest every chapter in one fetch instead of crawling pages.

## Read in order

The chapters build on each other. If you read them top to bottom you will go from
"what is this" to "running it in CI with a governed policy" without backtracking.

1. **[Introduction](./introduction.md)**: the problem Bastion solves, the core
   idea (reviewers as fitness functions), and the mental model. Start here.
2. **[Getting started](./getting-started.md)**: install the CLI, write your
   first reviewer, and run your first review in about five minutes.
3. **[Concepts](./concepts.md)**: reviewers, triggers, modes, the verdict, and
   the merge gate, which together make up the vocabulary the rest of the guide assumes.
4. **[Authoring reviewers](./authoring-reviewers.md)**: the registry schema in
   full, from the four required fields to timeouts, backends, environment, and
   prompt inputs. How to write a reviewer that stays at high recall.
5. **[The local workflow](./local-workflow.md)**: the `bastion review` loop in
   depth, covering human output vs. the JSONL agent stream, exit codes, and
   inspecting saved runs (`runs`, `show`, `transcript`, `clean`).
6. **[Continuous integration](./continuous-integration.md)**: promoting your
   repository's reviewers into GitHub Actions: checks, the aggregate gate, and
   per-author billing.
7. **[Governance](./governance.md)**: keeping humans at the policy layer with
   CODEOWNERS and branch protection, the escape-to-improvement loop, and what
   Bastion deliberately does not guarantee.

## In a hurry: set up Bastion in CI

If your goal is "get Bastion reviewing pull requests on GitHub," here is the whole
path; each step links to its details:

1. **Install the CLI and pick a backend.** [Getting started](./getting-started.md)
   (a subscription works; no API key required).
2. **Write `.bastion.yaml`** at your repo root with one or two reviewers, and check
   it with `bastion validate`. [Authoring reviewers](./authoring-reviewers.md). To
   pin a model like `gpt-5.5:high`, set `model:` and `effort:` separately under a
   pinned `backend:`.
3. **Add the workflow** and the per-author auth step.
   [Continuous integration](./continuous-integration.md#the-workflow). The complete,
   copy-pasteable auth recipe (the `<BACKEND>_AUTH_<LOGIN>` secret convention, the
   `case`-arm mapping, Dependabot, and fork safety) is in
   [Authentication & billing](./continuous-integration.md#authentication--billing).
4. **Protect the policy and require the check.** [Governance](./governance.md):
   CODEOWNERS over `.bastion.yaml` and the workflow, and branch protection requiring
   the aggregate `bastion` check.

## The one-paragraph version

You declare **reviewers** (focused agent prompts, one concern each) in
`.bastion.yaml`. Each reviewer has a **trigger** (file globs) and a
**mode** (`gate` blocks the merge, `advisor` only comments). `bastion review`
finds the reviewers whose triggers match your working-tree changes, runs them in
parallel, and aggregates their verdicts into one decision: all gates must pass.
A local run can also merge in personal reviewers from a user-level `.bastion.yaml`,
so you can run a reviewer locally even where a repo has not adopted Bastion. An
authoring agent loops `bastion review` until it is green, then opens a PR where CI
runs the repository's reviewers (the user-level ones are local-only). CI usually
confirms the result, and can differ when it adds the PR's description and discussion
to the reviewers' context. Humans stay in the loop by owning the reviewer registry,
not by reading every diff.

## Status

Bastion is experimental and still partial. The routing, runner, verdict
aggregation, and on-disk run store are implemented and tested. The Claude Code,
Codex, and Pi backends execute reviewers for real, either natively or inside a
container when a reviewer declares a `runner`. A containerized reviewer must opt into
`capabilities.network: true`, which grants it general (unscoped) egress; provider-only
scoping is not yet built. A container left at the default `network: false` is
rejected before it runs, so a gate blocks and an advisor is skipped. The remaining
capability fields (`mcp` and `skills`) are accepted but not provisioned, so a
reviewer that opts into one fails closed rather than running without it. These
limits are called out where they appear in [Authoring reviewers](./authoring-reviewers.md).

---

<!-- https://bastion.jessica.black/guide/introduction -->

# Introduction

> Why Bastion exists, and the one idea you need to hold in your head.

## The problem

Agents write most of the code on a growing number of teams. When they are
fully unlocked, output volume looks more like *engineers x 100* than *x 1*. Two
things stop teams from unlocking that:

- **Human diff review does not scale.** Asking a 5-person team to review their
  agents' output is like asking 5 people in a 500-person org to review the other
  495. You cannot fix that by trying harder.
- **Without review, codebases rot.** Things go fine until they do not, and then
  you have a ball of mud nobody can work in.

The usual shape of agentic review hands the whole diff to one reviewer that checks
everything and writes comments designed for a person to act on. As you ask one
generic reviewer to check more things, its recall on any single one degrades. A
one-item checklist agent works; at ten items it is weaker; at a hundred it fails.

## The core idea

In Bastion, a reviewer is a **focused fitness function** (an automated check that
continuously asserts one property holds as the system evolves), and review is the
**author agent's loop taken to its conclusion**.

An authoring agent already loops against the compiler, the linter, and the tests.
Bastion adds loops whose oracle is *another agent*, one that encodes judgment a
compiler or a test cannot. The whole system follows from five principles:

1. **One concern per reviewer.** Single-responsibility reviewers stay at high
   recall and confidence. The unit of the system is *the reviewer*, not *the
   review*. You cover more ground by adding narrow reviewers, never by broadening
   one. A cross-cutting property like tenant isolation or migration safety is not
   special; it is just another reviewer whose single concern is that property.
2. **Reviewers run in the author's own loop, not only in CI.** The repository's
   reviewers run locally (fast, pre-PR) and in CI (authoritative), so CI usually
   confirms a green local loop. The two can differ when CI feeds reviewers the PR's
   description and discussion that a default local run lacks, and a purely local run
   can also include your personal user-level reviewers, which CI never runs (see
   [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).
3. **Humans sit at the policy layer.** The goal is not human-out-of-the-loop. It
   is to move the human from reviewing diffs to *authoring, curating, and
   governing reviewers*, plus triaging escapes (bugs that slipped through a review
   that should have caught them). Your interface becomes the reviewer registry, not
   the diff.
4. **Aligned agents can still inadvertently game the system.** Bastion tolerates
   this and makes it *visible* and *easy to correct* by adjusting reviewers,
   rather than trying to make gaming impossible (which would give up the benefits
   of agentic development entirely).
5. **Reviewers converge through use.** Ship a reviewer that is good enough, then
   improve it from the escapes you actually hit, rather than trying to design a
   perfect one up front. The escape-to-improvement loop is where that happens.

## The mental model

Picture the way a good team did code review before agents:

> An author opens a PR. A reviewer reads it, leaves feedback (some blocking, some
> optional) and withholds approval until satisfied. The author addresses the
> blocking items (by changing the code, or by convincing the reviewer the code is
> already right) and requests re-review. Repeat until approved.

Bastion brings *that* process to the agent era. The reviewers play the colleague's
role, their verdicts are the feedback, and the author agent resolves the blocking
items and re-runs. The human is still in charge, but of the reviewers, not of
every line.

## What Bastion is not

Two non-guarantees are deliberate. Keep them in mind before you adopt it:

- **No guarantee of correctness.** Bastion does not prove your code is free of
  bugs or vulnerabilities. It is code review without the human in the small loop;
  a reviewer is only as good as its model and its prompt.
- **No guarantee the right thing is being built.** Catching "this is the wrong
  thing to build" was never review's job. By PR time that ship has sailed; it is a
  design-time question. Keep humans in the design loop.

Bastion is also **not an adversarial security boundary**. It is the agent-era
equivalent of team code review for aligned contributors: a speed bump and a set of
good defaults that keep earnest actors on the rails, not a defense against a
determined malicious one. The practical consequences for you, and how to govern
within these limits, show up in [Governance](./governance.md).

---

Next: [Getting started](./getting-started.md), where you install the CLI and run
your first review.

---

<!-- https://bastion.jessica.black/guide/getting-started -->

# Getting started

> Install Bastion, write one reviewer, and run your first review.

This chapter gets you from nothing to a working review loop. It assumes you have a
git repository and one of the supported agent backends installed (the Claude Code,
Codex, or Pi CLI).

A little vocabulary shows up here in passing: *reviewer*, *gate*, *advisor*,
*verdict*, *findings*. The inline definitions are enough to follow along; the next
chapter, [Concepts](./concepts.md), defines each precisely.

## 1. Install the CLI

The quickest path is the install script. It detects your platform, downloads the
matching archive from the latest
[GitHub release](https://github.com/jssblck/bastion/releases), verifies its
SHA-256 checksum, and puts `bastion` on your `PATH`.

On Linux and macOS:

```sh
curl -sSfL https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.sh | bash
bastion --version
```

On Windows, from PowerShell:

```powershell
irm https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.ps1 | iex
bastion --version
```

The shell installer takes `-v/--version`, `-b/--bin-dir`, `-t/--tmp-dir`, and
`-l/--libc` (pass them after `bash -s --`); the PowerShell installer reads the
`Version` and `BinDir` environment variables. Pass `--help` (or set
`$env:Help="true"`) to see them all.

On Linux the installer autodetects the C runtime: it picks the statically linked
musl build on musl systems and on any host whose glibc is older than 2.35 (or
undetectable), and the glibc build only when the host glibc is 2.35 or newer
(Ubuntu 22.04, Debian 12, RHEL 9, and later). Force the choice with `--libc
gnu|musl` (or `BASTION_LIBC=...`) when you want to override it, for example to
take the portable musl build everywhere:

```sh
curl -sSfL https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.sh | bash -s -- --libc musl
# ...or, without the `-s --` dance, via the environment:
curl -sSfL https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.sh | BASTION_LIBC=musl bash
```
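The selection rule above is simple enough to state as code. Here is a minimal Python sketch of it, assuming the host glibc version comes from Python's `platform.libc_ver()`, which reports no glibc on musl hosts. It also assumes that archive names follow the `bastion-<arch>-unknown-linux-<libc>.tar.gz` pattern shown below for x86_64. The installer itself is a shell script, and `detect_glibc`, `pick_libc`, and `asset_name` are illustrative names, not part of it.

```python
from __future__ import annotations

import platform
from typing import Optional

# Hosts whose glibc is at or above this version get the glibc ("gnu") build.
MIN_GLIBC = (2, 35)


def detect_glibc() -> Optional[tuple[int, int]]:
    """Return the host glibc version, or None on musl or when undetectable."""
    lib, version = platform.libc_ver()
    if lib != "glibc":
        return None
    try:
        major, minor = (int(part) for part in version.split(".")[:2])
    except ValueError:  # empty, single-component, or non-numeric version
        return None
    return (major, minor)


def pick_libc(glibc: Optional[tuple[int, int]], override: Optional[str] = None) -> str:
    """An explicit override wins; otherwise musl unless glibc >= 2.35."""
    if override:
        if override not in ("gnu", "musl"):
            raise ValueError(f"libc override must be 'gnu' or 'musl', got {override!r}")
        return override
    if glibc is None or glibc < MIN_GLIBC:
        return "musl"
    return "gnu"


def asset_name(arch: str, libc: str) -> str:
    # Assumed naming, extrapolated from the x86_64 glibc archive used in this guide.
    return f"bastion-{arch}-unknown-linux-{libc}.tar.gz"
```

On a Linux host you might combine them as `asset_name(platform.machine(), pick_libc(detect_glibc(), os.environ.get("BASTION_LIBC")))`. Treating "undetectable" the same as "too old" is what makes musl the safe default.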

Prefer to grab the archive yourself? Prebuilt binaries are attached to every
release for Linux (x86_64 and aarch64, glibc and musl), macOS (Intel and Apple
silicon), and Windows (x86_64). Download the one for your platform, extract it, and
put `bastion` on your `PATH`:

```sh
# Example: Linux x86_64
curl -sSfL https://github.com/jssblck/bastion/releases/latest/download/bastion-x86_64-unknown-linux-gnu.tar.gz | tar -xz
sudo install bastion-x86_64-unknown-linux-gnu/bastion /usr/local/bin/
bastion --version
```

On a system with glibc older than 2.35, swap `gnu` for `musl` in those URLs to get
the static build.

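Prefer to build from source? You need a Rust toolchain that supports the 2024
edition (Rust 1.85 or newer). Build from a clone of the repository:

```sh
git clone https://github.com/jssblck/bastion.git
cd bastion
cargo build --release
./target/release/bastion --version
```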

`bastion --version` reports a release tag when one is reachable, otherwise the
short commit SHA, with a `-dirty` suffix when the tree has uncommitted changes.

## 2. Make sure the backend is ready

Bastion does not run its own agent loop. It shells out to an existing coding-agent
CLI and reuses whatever you already have configured locally, so your billing and
auth come along for free. Install and sign in to one of:

- **[Claude Code](https://docs.claude.com/en/docs/claude-code)** (`claude`): the
  default when a reviewer does not pin a backend.
- **[Codex](https://github.com/openai/codex)** (`codex`): pin it with
  `backend: codex` on a reviewer.
- **[Pi](https://github.com/earendil-works/pi)** (`pi`): pin it with `backend: pi`.
  Pi runs against whatever provider you have configured it with locally, unless a
  reviewer pins a `model` (Pi's `provider/id` form, which selects the provider too).

A **subscription** is fine; you do not need an API key. Because Bastion just runs
the CLI, whatever you signed in with works: a ChatGPT subscription through `codex`,
a Claude subscription through `claude`, and so on. The CLI reads its own auth file
(`~/.codex/auth.json`, `~/.claude`) and refreshes its token itself. Getting that same
subscription to bill the right person in CI is its own step, covered in
[Continuous integration](./continuous-integration.md#authentication--billing).

Bastion invokes the backend as a plain executable on your `PATH` (`claude`,
`codex`, or `pi`), so confirm the one you intend to use is installed and
authenticated before running a review:

```sh
claude --version    # for the Claude Code backend
codex --version     # for the Codex backend
```

If the binary lives elsewhere or you want to point at a wrapper, set
`BASTION_CLAUDE_BIN` or `BASTION_CODEX_BIN` to its path.

That covers the default, **native** path. A reviewer you author with a
[`runner`](./authoring-reviewers.md#runner-and-capabilities) instead runs its
backend inside a container. Such a reviewer must opt into
`capabilities.network: true`. Without it, the reviewer is rejected before it runs,
so a gate blocks and an advisor is skipped. A containerized reviewer needs a
container engine on the host rather than the backend CLI. Bastion shells out to
`docker` by default (set `BASTION_CONTAINER_ENGINE` to use another, for example
`podman`), and the backend CLI (`claude` / `codex`) must be present inside the image.
A fixed set of provider credential variables (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`,
and the like) is forwarded from your environment into the container by name so the
in-container agent can authenticate. Host CLI auth that lives in a file (`~/.claude`,
`~/.codex/auth.json`) is not forwarded, so an image that relies on it should bake it
in. You only need this once you start using `runner` reviewers; the quickstart below
stays native.

## 3. Write your first reviewer

Reviewers live in a declarative file at your repository root: `.bastion.yaml` (the
`.bastion.yml` spelling is also honored). Bastion discovers it by walking up from
your current directory, so you can run `bastion` from anywhere inside the repo. You
can also keep personal reviewers in a user-level `.bastion.yaml` in your platform
config directory; a local `bastion review` merges them with the repository's, which
lets you run a reviewer locally even in a repo that has not adopted Bastion (see
[Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)). Create the
repository file:

```yaml
# .bastion.yaml
reviewers:
  - name: single-responsibility
    trigger: [src/**/*.rs]   # which changed files wake this reviewer
    mode: gate               # gate = blocks the merge; advisor = comments only
    prompt: |
      Review the changeset to determine whether any one file concentrates too
      many unrelated responsibilities. If a file has clearly taken on multiple
      distinct concerns that should be separate modules, block the PR and name
      the file(s) and the concerns; otherwise approve it. A single large but
      cohesive module is not a violation.
```

That is a complete reviewer. Four fields carry the meaning: a unique `name`, the
`trigger` globs over your changed files, the `mode`, and the `prompt`. Everything
else has a sensible default. The next chapter, [Concepts](./concepts.md), explains
each of these; [Authoring reviewers](./authoring-reviewers.md) covers the full
schema.

> Adapt the trigger to your language: `src/**/*.ts`, `app/**/*.py`, and so on. The
> glob matches against the paths git reports as changed.

## 4. Run a review

Make a change in your working tree (you do not need to commit it; Bastion reviews
the working tree, including uncommitted and untracked files), then:

```sh
bastion review --base main
```

Bastion computes the files that differ from `main`, selects the reviewers whose
triggers match, runs them in parallel, and renders progress and verdicts. A blocked
review exits non-zero; a clean one exits zero. That exit code is what lets an agent
(or a shell loop) know whether to keep working:

```sh
while ! bastion review --base main; do
  # ... fix what blocked, then loop ...
  :   # no-op placeholder; a shell loop body needs at least one command
done
```
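That shell loop keeps running for as long as the review blocks. It would presumably also keep running if Bastion exited non-zero for another reason, such as a malformed registry. A driver that caps the number of attempts is safer. The minimal Python sketch below relies only on what this section states: exit code zero means the review passed. `review_until_green` and its `fix` hook are hypothetical names. `fix` stands in for whatever your agent does to address the blocking findings.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

REVIEW_CMD = ("bastion", "review", "--base", "main")


def review_until_green(
    fix: Callable[[int], None],
    max_attempts: int = 5,
    cmd: Sequence[str] = REVIEW_CMD,
) -> bool:
    """Re-run the review until it exits zero or the attempts run out.

    `fix(attempt)` is called after each failing run except the last, so the
    author agent can address what blocked before the next review.
    """
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(list(cmd)).returncode == 0:
            return True  # clean review: safe to open the PR
        if attempt < max_attempts:
            fix(attempt)
    return False  # still blocked: stop and hand off to a human
```

Returning `False` rather than looping forever turns a reviewer the agent cannot satisfy into a visible stop, which is where a human can step in to judge whether the code or the reviewer is wrong.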

## 5. Read it as a machine stream

An agent driving the loop wants structured events, not rendered text. Ask for
JSONL: one JSON object per line, emitted as each thing happens:

```sh
bastion review --base main --format jsonl
```

You will get one typed event per line as the run progresses, ending in a
`run.completed` that carries the aggregate verdict. The
[local workflow](./local-workflow.md) chapter documents every event type and the
exact contract an agent should follow when consuming them.
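Consuming the stream is a line-by-line JSON parse. The Python sketch below reads events until the stream ends and returns the aggregate verdict carried by `run.completed`. Only the `run.completed` event name comes from this guide. The `type` and `verdict` keys are placeholders, exposed as parameters, so check the [local workflow](./local-workflow.md) chapter for the real field names before relying on it. `final_verdict` is an illustrative name.

```python
from __future__ import annotations

import json
from typing import Iterable, Optional


def final_verdict(
    lines: Iterable[str],
    type_key: str = "type",        # ASSUMED name of the event-type field
    verdict_key: str = "verdict",  # ASSUMED name of the aggregate-verdict field
) -> Optional[str]:
    """Return the aggregate verdict from `run.completed`, or None if the stream
    ended without one (a caller should treat None as "not green")."""
    verdict = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)  # one JSON object per line
        if event.get(type_key) == "run.completed":
            verdict = event.get(verdict_key)
    return verdict


# Usage (requires bastion on PATH):
#   import subprocess
#   proc = subprocess.Popen(
#       ["bastion", "review", "--base", "main", "--format", "jsonl"],
#       stdout=subprocess.PIPE, text=True,
#   )
#   print(final_verdict(proc.stdout))
```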

## 6. Look at what was saved

Every run is persisted. Inspect history without re-running anything:

```sh
bastion runs                      # list recent runs and their verdicts
bastion show                      # re-print the latest run's findings
bastion transcript <reviewer>     # the full agent session for one reviewer
```

These are the on-demand detail; the common loop never needs them, but they are one
command away when a verdict surprises you. (`show` and `transcript` default to the
latest run; pass a run id for an older one, and the full forms are in
[the local workflow](./local-workflow.md).)

## 7. Teach your agents to use Bastion

You just drove the loop by hand. The point, though, is for your *coding agents* to
drive it themselves: run the review, read the findings, fix what blocks, and reach a
green gate before they ever open a PR. Bastion ships that instruction as a skill you
install into the repo and commit, so every agent picks it up on checkout:

```sh
bastion skills install
```

This writes a `using-bastion` skill into both `.claude/skills/` (Claude Code's
native skill path) and `.agents/skills/` (the agent-neutral convention). Commit the
result:

```sh
git add .claude/skills .agents/skills
git commit -m "Install the bastion onboarding skill"
```

The skill is generated from the binary, so re-running install after you upgrade
Bastion keeps the checked-in copy current. To confirm it has not drifted from the
binary (handy as a CI guard), run:

```sh
bastion skills check        # exits non-zero if a skill is missing or has drifted
```

The rendered file is deterministic (no version stamp or timestamp), so `check`
stays green across upgrades that do not change the skill text and only flags real
drift: a hand edit, or a forgotten re-install after the skill itself changed. When
you do upgrade, re-run `bastion skills install` to refresh, or
`bastion skills install --force` if you have local edits to overwrite. See what is
bundled with `bastion skills list`, and install into a different directory with
`--dir <path>` (repeatable).

## Keeping scratch runs out of your history

While you are experimenting, point Bastion at a throwaway data directory so trial
runs do not pile up in your real run history:

```sh
bastion --data-dir /tmp/bastion-scratch review --base main
```

The same override is available as the `BASTION_DATA_DIR` environment variable.

Note that `bastion review` always runs your reviewers on a real backend: there is
no built-in mode that fabricates verdicts without an agent, so a review still costs
a model call. To keep cost down while iterating, start with one cheap, fast
reviewer and a tight `timeout`.

## When something goes wrong

The most common first-run snags and what they mean:

- **"no reviewer registry found ..."**: there is no `.bastion.yaml` (or
  `.bastion.yml`) in this repo or any ancestor, and no user-level one in your config
  directory either. The command searches both and only errors when both are absent,
  so create a repository registry (step 3) or a personal one.
- **A reviewer registry error (malformed YAML, duplicate name, missing field).**
  The registry is validated before any agent runs, so these fail fast with a clear
  message. Run `bastion validate` (no model call) to check the merged set a local
  review would run, or `bastion validate path/to/.bastion.yaml` to check one file on
  its own; fix it and re-run. See [Authoring reviewers](./authoring-reviewers.md).
- **The review blocks immediately with "did not produce a verdict".** A gate failed
  closed, usually because the backend binary is missing or unauthenticated. Re-check
  `claude --version` / `codex --version` and that you are signed in (step 2).
- **No reviewers ran (a trivial pass).** Nothing in your changeset matched any
  reviewer's `trigger`. Confirm you actually changed a file the globs cover, and
  that `--base` points at the right branch.
- **Everything looks unchanged.** Bastion diffs against `--base` (default `main`);
  if your base branch has a different name, pass it explicitly.

---

You now have a working reviewer and a review loop. Next:
[Concepts](./concepts.md), the vocabulary (triggers, modes, verdicts, the gate)
that the rest of the guide builds on.

---

<!-- https://bastion.jessica.black/guide/concepts -->

# Concepts

> The vocabulary Bastion runs on: reviewers, triggers, modes, verdicts, and the
> merge gate.

This chapter defines the terms the rest of the guide uses. It is short on purpose;
each idea has a deeper home later, linked as it comes up.

## The reviewer

A **reviewer** is the unit of the system: a focused agent prompt responsible for
exactly one property of a changeset. It is a bundle of *prompt + trigger + mode*,
plus an optional execution profile (backend, timeout, environment, inputs, a
container `runner`, and `capabilities`, among others). All of it is declared
statically in `.bastion.yaml`; [Authoring reviewers](./authoring-reviewers.md)
is the full field reference. The repository's `.bastion.yaml` is the shared, governed
set; locally you can also keep personal reviewers in a user-level `.bastion.yaml`,
and `bastion review` runs the merged set (see
[Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

Two properties matter most:

- **Single concern.** A reviewer checks one thing and checks it well. You scale
  coverage by adding reviewers, never by widening one. This is what keeps recall
  high (see [Introduction](./introduction.md#the-core-idea)).
- **Declarative and static.** Reviewers are data, not code. Bastion never
  generates them on the fly. That keeps the trigger set stable and makes every
  reviewer reviewable, which is the foundation of [governance](./governance.md).

## The trigger and the changeset

A reviewer's **trigger** is a list of path globs. A reviewer runs only when at
least one changed file matches one of its globs. That is what makes a hundred
reviewers cheap: a docs-only change wakes the docs reviewers and nothing else.

```yaml
trigger: [src/server/**, src/client/**]   # runs when server or client code changed
```
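Routing reduces to one question: does any changed path match any of this reviewer's globs? The Python sketch below shows that check under an assumed, gitignore-like glob dialect. In that dialect, `*` and `?` stay within one path segment, `**` spans directories, and `src/**/*.rs` also matches `src/main.rs`. Bastion's real matcher may treat edge cases differently. `glob_to_regex` and `select_reviewers` are illustrative names.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a regex (assumed gitignore-like semantics)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")       # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select_reviewers(reviewers: list[dict], changed: list[str]) -> list[str]:
    """Names, in registry order, of reviewers with a glob matching a changed path."""
    selected = []
    for reviewer in reviewers:
        globs = [glob_to_regex(g) for g in reviewer["trigger"]]
        if any(g.fullmatch(path) for g in globs for path in changed):
            selected.append(reviewer["name"])
    return selected
```

With a registry holding a `src/server/**` reviewer, a `docs/**/*.md` reviewer, and a `src/**/*.rs` reviewer, a change to `docs/guide/intro.md` selects only the docs reviewer. That selectivity is what keeps a large registry cheap.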

The **changeset** is everything in your working tree that differs from the base
branch, *including uncommitted edits and new untracked files*, not just committed
history. This is deliberate: it lets an author loop against reviewers before
committing anything. (Locally, this means a reviewer sees your work in progress; in
CI the head is already committed, so the same definition gives the same result.)

## The mode: gate vs. advisor

Every reviewer has a **mode** that decides whether it can block a merge:

| Mode | Blocks the merge? | On crash/timeout/bad output |
| --- | --- | --- |
| `gate` | Yes, when it returns `block` | **Fails closed**: resolves to `block` |
| `advisor` | No, ever | **Fails open**: ignored in the aggregate |

A **gate** is a hard requirement: it must produce a clean `pass` for the merge to
proceed. If it crashes, times out, or cannot produce a valid verdict, it resolves
to a block, never a silent pass. An **advisor** comments but never holds up the
merge; even a clean `block` verdict from an advisor is treated as a pass for
aggregation (its findings still surface). A failed advisor is dropped.

Use a gate for properties that must hold (tenant isolation, fail-closed error
handling). Use an advisor for guidance you want surfaced but not enforced (test
coverage, doc gaps, style preferences).

## The verdict

Every reviewer returns a structured **verdict**, captured through the backend's
structured-output mechanism (a JSON schema for Claude Code, a requested verdict
block for Codex) so Bastion can parse and aggregate it:

```yaml
verdict: pass | block    # the authoritative gate decision (ignored for advisors)
summary: "..."           # a human-friendly one-paragraph explanation
findings:                # specific, located comments
  - kind: blocking       # blocking | optional
    path: src/server/db.rs
    line_start: 88
    line_end: 91
    detail: "scope this query by tenant_id"
```

The top-level `verdict` is the decision; `findings` explain it. A `block` should
carry at least one `blocking` finding (the reason), and a `pass` may still carry
`optional` findings as non-blocking suggestions. A finding's `kind` changes how it
is *surfaced*, not whether the merge proceeds; only `verdict` decides that.
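To make the shape concrete, here is a minimal Python sketch that loads a verdict object with the fields above and flags conventions it breaks. The field names and enum values come from the schema shown. The dataclasses, the `parse_verdict` and `lint_verdict` helpers, and the choice to treat a `block` with no `blocking` finding as a warning rather than an error are assumptions, not Bastion's own validation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Finding:
    kind: str        # "blocking" or "optional": affects how it is surfaced
    path: str
    line_start: int
    line_end: int
    detail: str


@dataclass
class Verdict:
    verdict: str     # "pass" or "block": the only field that decides the gate
    summary: str
    findings: list[Finding] = field(default_factory=list)


def parse_verdict(raw: dict) -> Verdict:
    """Build a Verdict from a decoded JSON object, rejecting unknown enum values."""
    if raw.get("verdict") not in ("pass", "block"):
        raise ValueError(f"verdict must be 'pass' or 'block', got {raw.get('verdict')!r}")
    findings = []
    for f in raw.get("findings", []):
        if f.get("kind") not in ("blocking", "optional"):
            raise ValueError(f"finding kind must be 'blocking' or 'optional', got {f.get('kind')!r}")
        findings.append(
            Finding(f["kind"], f["path"], int(f["line_start"]), int(f["line_end"]), f["detail"])
        )
    return Verdict(raw["verdict"], raw.get("summary", ""), findings)


def lint_verdict(v: Verdict) -> list[str]:
    """Soft checks for conventions a well-formed verdict should follow."""
    warnings = []
    if v.verdict == "block" and not any(f.kind == "blocking" for f in v.findings):
        warnings.append("a block verdict should carry at least one blocking finding")
    for f in v.findings:
        if f.line_start > f.line_end:
            warnings.append(f"{f.path}: line_start {f.line_start} is after line_end {f.line_end}")
    return warnings
```

For a gate, an object that fails `parse_verdict` corresponds to "cannot produce a valid verdict", which resolves to `block`.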

**Findings are the actionable surface.** An agent fixing a PR gets everything it
needs from the findings: a file, a line range, and what to change. It should never
have to open a transcript to learn what to do.

A reviewer reports the complete actionable set in one pass, one finding per
distinct instance, not just one representative reason. The author can then fix
everything from a single run instead of meeting the next issue on the following
review cycle. Bastion requests this from every reviewer automatically, so a prompt
does not need to ask for it.

## The merge gate

Bastion runs all matched reviewers in parallel (their latencies vary wildly; one
might take 90 seconds, another 15 minutes) and **aggregates** their verdicts into a
single decision:

- **All gates must pass.** The aggregate is `pass` only when every gate returned a
  clean `pass`.
- **Any blocked, errored, or timed-out gate blocks the aggregate.** "All gates
  pass" never includes a gate that failed to produce a verdict.
- **Advisors never affect the aggregate.** They contribute findings, not gate
  decisions.
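The mode table and these aggregation rules combine into one small function. Here is a minimal Python sketch of it. It assumes each reviewer's outcome has already been reduced to `"pass"`, `"block"`, or `None`, where `None` means crashed, timed out, or no valid verdict. `ReviewerResult`, `effective_verdict`, and `aggregate` are illustrative names, not Bastion's internals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class ReviewerResult:
    name: str
    mode: Literal["gate", "advisor"]
    verdict: Optional[str]  # "pass", "block", or None if no valid verdict


def effective_verdict(r: ReviewerResult) -> Optional[str]:
    """Apply the mode's policy; None means 'excluded from the aggregate'."""
    if r.mode == "advisor":
        return None      # advisors never affect the aggregate, even on a block
    if r.verdict == "pass":
        return "pass"
    return "block"       # an explicit block, or no valid verdict: gates fail closed


def aggregate(results: list[ReviewerResult]) -> str:
    """The aggregate is pass only when every gate returned a clean pass."""
    gates = [v for r in results if (v := effective_verdict(r)) is not None]
    return "pass" if all(v == "pass" for v in gates) else "block"
```

An empty list aggregates to `pass`, which matches the trivial pass you get when no reviewer's trigger matched the changeset.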

Locally, that aggregate is the exit code of `bastion review`. In CI it is the result
of the Bastion review job, and `bastion github report` also posts it as a single
always-present check named `bastion`. Either way the aggregation rule is the same, and
CI runs the repository's reviewers. The decision matches when both runs see the same
reviewers and context. Two things can make a local run differ: CI can add the PR's
description and discussion, which a default local run does not have, and a purely
local run can include your personal user-level reviewers, which CI never runs (see
[Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

## The backend

A **backend** is the agent harness a reviewer runs on. Bastion does not implement
its own agent loop; it translates the reviewer into the backend's native config and
shells out to its CLI, reusing your local auth and billing.

- `any` (the default): Bastion chooses; that resolves to Claude Code.
- `claude-code`: Anthropic's Claude Code CLI.
- `codex`: OpenAI's Codex CLI.
- `pi`: the Pi CLI; uses whatever provider you have configured it with locally,
  unless a reviewer pins a `model` (Pi's `provider/id` form selects the provider too).

You pin a backend when a subscription's terms require a specific harness, or when
one model is better at a given concern. See
[Authoring reviewers](./authoring-reviewers.md#backend) and, for CI
billing, [Continuous integration](./continuous-integration.md#authentication--billing).

By default the backend CLI runs **natively** on the host, using the `claude`,
`codex`, or `pi` binary already on your `PATH` and the auth and billing that CLI is
configured with. A reviewer that declares a
[`runner`](./authoring-reviewers.md#runner-and-capabilities) instead runs that same
backend **inside a container**: Bastion invokes the container engine on the host,
and the backend CLI resolves inside the image. A containerized reviewer requires
`capabilities.network: true`. Without it, the reviewer is rejected before it runs, so
a gate blocks and an advisor is skipped. A fixed set of model-provider credential
variables (`ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_BASE_URL`,
`ANTHROPIC_MODEL`, `CLAUDE_CODE_OAUTH_TOKEN`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`,
`CODEX_API_KEY`) is forwarded from Bastion's environment into the container by name,
so the in-container agent can still reach its provider. An image can also bake in its
own auth. If the reviewer's own `env` sets one of those names, that value wins and
the host's value is not forwarded, so the reviewer can pin a specific credential.
Nothing else from your host environment crosses that boundary. To give the
in-container agent another value, set it as a literal in the reviewer's `env`, which
is forwarded alongside the credentials.
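The forwarding rule is easiest to see as a function from the host environment and the reviewer's `env` to the container's environment. The Python sketch below uses the variable names listed above. `container_env` is a hypothetical helper. It models only which names and values cross the boundary, not how Bastion hands them to the container engine.

```python
from __future__ import annotations

# Model-provider credentials forwarded by name (the list given in this guide).
FORWARDED_CREDENTIALS = (
    "ANTHROPIC_API_KEY",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_MODEL",
    "CLAUDE_CODE_OAUTH_TOKEN",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "CODEX_API_KEY",
)


def container_env(host_env: dict[str, str], reviewer_env: dict[str, str]) -> dict[str, str]:
    """The environment the in-container agent sees."""
    # Only the listed credentials cross over from the host, and only if set there.
    env = {name: host_env[name] for name in FORWARDED_CREDENTIALS if name in host_env}
    # The reviewer's literal `env` goes in too and wins on any shared name,
    # so the host's value for a pinned credential never reaches the container.
    env.update(reviewer_env)
    return env
```

A host variable such as `HOME` or `GITHUB_TOKEN` never appears in the result unless the reviewer's `env` sets it explicitly.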

## How it all fits

```text
.bastion.yaml                 you author this
        |
        v
   bastion review  --->  compute changeset (working tree vs base)
        |
        v
   route: select reviewers whose trigger globs match
        |
        v
   run matched reviewers in parallel (each on its backend, each timeout-bounded)
        |
        v
   each returns a verdict (pass/block + summary + findings)
        |
        v
   aggregate: all gates must pass  --->  one decision (exit code locally; the
                                          review gate in CI)
```

---

Next: [Authoring reviewers](./authoring-reviewers.md). The full registry schema,
from the four required fields out to timeouts, environment, and prompt inputs.

---

<!-- https://bastion.jessica.black/guide/authoring-reviewers -->

# Authoring reviewers

> The registry schema in full, and how to write a reviewer that stays sharp.

Reviewers are the whole policy. This chapter is the reference for writing them:
the file, the required fields, the optional execution profile, and the craft of a
prompt that keeps recall high. It progresses from the minimum you need to the
fields you will reach for only occasionally.

## The registry file

The repository's reviewers live in one file at its root: `.bastion.yaml` (the
`.bastion.yml` spelling is also honored). Bastion finds it by walking up from the
current directory, so the command works anywhere inside the repo. The file is a
single `reviewers:` list:

```yaml
reviewers:
  - name: single-responsibility
    trigger: [src/**/*.rs]
    mode: gate
    prompt: |
      ...
  - name: test-coverage
    trigger: [src/**/*.rs]
    mode: advisor
    prompt: |
      ...
```
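
The discovery rule (walk up from the current directory until a registry file appears) can be sketched in Python as follows. `find_registry` is a hypothetical name. Preferring `.bastion.yaml` over `.bastion.yml` when both sit in one directory is an assumption, and the sketch ignores the legacy `bastion/reviewers.yaml` location.

```python
from pathlib import Path
from typing import Optional

# Assumed preference order when both spellings exist in one directory.
REGISTRY_NAMES = (".bastion.yaml", ".bastion.yml")

def find_registry(start: Path) -> Optional[Path]:
    """Return the first registry file found walking up from `start`."""
    start = start.resolve()
    for directory in (start, *start.parents):
        for name in REGISTRY_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None  # no registry anywhere above `start`
```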

Reviewer **names must be unique** within the file; a duplicate name is a load
error. A name also has to work as a directory name in the run store, so a name that
reduces to an empty component, `.`, or `..` is rejected, as are two names that
collapse to the same component once non-portable characters are normalized (for
example `repo:test` and `repo-test`); plain names are unaffected. Because this file
*is* the review policy, changes to it should require human review; see
[Governance](./governance.md) and `bastion github codeowners`.
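
To see why `repo:test` and `repo-test` clash, here is a Python sketch of the name check. The normalization rule (replace any character outside `A-Z a-z 0-9 . _ -` with `-`) is an assumption chosen to reproduce this section's example. Bastion's actual rule may differ, and `dir_component` and `check_names` are hypothetical names.

```python
import re

def dir_component(name: str) -> str:
    # Assumed normalization: non-portable characters become "-".
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)

def check_names(names: list[str]) -> None:
    """Raise ValueError for duplicate, unusable, or colliding reviewer names."""
    seen: dict[str, str] = {}
    for name in names:
        component = dir_component(name)
        if component in ("", ".", ".."):
            raise ValueError(f"reviewer name {name!r} is not a usable directory name")
        if component in seen:
            if seen[component] == name:
                raise ValueError(f"duplicate reviewer name {name!r}")
            raise ValueError(f"{seen[component]!r} and {name!r} collide as {component!r}")
        seen[component] = name
```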

> **Migrating from `bastion/reviewers.yaml`.** Bastion still loads the legacy
> `bastion/reviewers.yaml` location but prints a deprecation warning; the supported
> location is `.bastion.yaml` at your repository root. Move the file (the contents
> are unchanged) and regenerate your CODEOWNERS block with `bastion github
> codeowners`.

## User-level reviewers

You can also keep personal reviewers in a user-level `.bastion.yaml` (or
`.bastion.yml`) in your platform config directory, so a reviewer you rely on runs
locally whether or not a given repository has adopted Bastion:

- Linux: `$XDG_CONFIG_HOME/bastion`, defaulting to `~/.config/bastion`.
- macOS: `~/Library/Application Support/bastion`.
- Windows: `%APPDATA%\bastion`.

When both files exist, a local `bastion review` merges the repository's reviewers
with your user-level ones into one set, by reviewer name:

- A reviewer only one file defines is included as-is.
- The same reviewer in both files is deduplicated to one. Sameness is compared by the
  *effective* configuration after each file's registry `defaults` are applied, so a
  reviewer that inherits a default `model` or `effort` and one that spells out the
  same value count as identical.
- A name in both files with a *different* effective configuration is a collision; both
  are kept, your copy under its plain name and the repository's scoped to
  `repo:<name>`, so neither silently wins. The two files are governed separately, so
  the collision is surfaced rather than resolved by precedence.

This layer is local-only. A review carrying a GitHub source (with `--repo`/`--pr`, as
CI runs) skips the user-level registry, so a pull request is gated by the
repository's reviewers alone, the `repo:` scope never appears there, and a personal
reviewer can never gate someone else's change.
`--config-dir <path>` (or `$BASTION_CONFIG_DIR`) overrides where the user-level file
is read from.
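
The merge rules translate directly into a dictionary merge. In the Python sketch below, `merge_registries` and `effective` are hypothetical names, and each registry is modeled as a map from reviewer name to that reviewer's *effective* configuration (its own fields with the file's `defaults` filled in).

```python
def effective(reviewer: dict, defaults: dict) -> dict:
    # Registry `defaults` fill only the fields the reviewer leaves unset.
    return {**defaults, **reviewer}

def merge_registries(repo: dict[str, dict], user: dict[str, dict]) -> dict[str, dict]:
    """Merge repository and user-level reviewers by name."""
    merged = dict(user)  # your reviewers keep their plain names
    for name, config in repo.items():
        if name not in user:
            merged[name] = config  # only the repository defines it
        elif user[name] != config:
            merged[f"repo:{name}"] = config  # collision: keep both, scope the repo copy
        # Otherwise the effective configs match: deduplicate to one entry.
    return merged
```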

## Registry-wide defaults

An optional top-level `defaults:` block sets a house `model` and `effort` that
every reviewer inherits unless it sets its own. A reviewer's explicit field always
wins; the default just fills the gap, so you set the model and effort once instead
of repeating them on every reviewer:

```yaml
defaults:
  model: gpt-5
  effort: high
reviewers:
  - name: single-responsibility
    trigger: [src/**/*.rs]
    mode: gate
    backend: codex      # required: an inherited model needs a pinned backend
    prompt: |
      ...
```

A default `model` is still backend-specific, so a reviewer that inherits it must
pin a `backend`; an inherited model under `backend: any` is rejected the same way
an explicit one is. `defaults` sits *above* each backend's own built-in default
(Opus 4.8 at `high` effort on Claude Code), so the resolution order is: the
reviewer's own field, then `defaults`, then the backend default.
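
The resolution order, including the rule that a model needs a pinned backend, can be sketched in Python. `resolve_setting` and `BACKEND_DEFAULTS` are hypothetical names. The Claude Code model id is borrowed from this chapter's `model` example, and `None` stands for "let the harness choose".

```python
from typing import Optional

# Backend built-in defaults as described in this guide; None means the harness decides.
BACKEND_DEFAULTS = {
    "claude-code": {"model": "claude-opus-4-8", "effort": "high"},
    "codex": {"effort": "high"},
    "pi": {"effort": "high"},
}

def resolve_setting(field: str, reviewer: dict, defaults: dict) -> Optional[str]:
    """Resolve `model` or `effort`: reviewer field, then `defaults`, then backend default."""
    backend = reviewer.get("backend", "any")
    if field in reviewer:
        value = reviewer[field]
    elif field in defaults:
        value = defaults[field]
    else:
        # `any` resolves to Claude Code.
        concrete = "claude-code" if backend == "any" else backend
        return BACKEND_DEFAULTS.get(concrete, {}).get(field)
    if field == "model" and backend == "any":
        # An explicit or inherited model id is backend-specific.
        raise ValueError("a model needs a pinned backend")
    return value
```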

## The required fields

Four fields are mandatory. A reviewer with just these is complete and runnable.

### `name`

A unique identifier. It is also the reviewer's check-run name in CI
(`bastion / single-responsibility`), so keep it short and descriptive.

### `trigger`

A list of path globs matched against the changed files. The reviewer runs if any
changed file matches any glob. Globs use the usual `**` (any depth) and `*` (one
segment) syntax:

```yaml
trigger: [src/**/*.rs]                       # all Rust under src, any depth
trigger: [src/server/**, src/client/**]      # either subtree
trigger: [src/**/*.rs, docs/**/*.md, ".bastion.yaml"]   # multiple kinds
```

Quote a glob if YAML would otherwise mis-parse it (a bare leading `*`, for
instance). Scope triggers tightly: a narrow trigger is what keeps an irrelevant
reviewer from waking on every change.
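
The `**` and `*` semantics can be made concrete with a small glob-to-regex translation. This Python sketch is an approximation under stated assumptions: `**/` matches zero or more directories, a trailing `**` matches everything below, and `*` and `?` never cross a `/`. Bastion's own matcher may handle edge cases differently, and `glob_to_regex` and `triggered` are hypothetical names.

```python
import re

def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a trigger glob into an anchored regular expression."""
    out, i = [], 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")  # everything below this point
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")  # within one path segment
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out))

def triggered(trigger: list[str], changed: list[str]) -> bool:
    """A reviewer runs if any changed file matches any of its globs."""
    patterns = [glob_to_regex(g) for g in trigger]
    return any(p.fullmatch(path) for p in patterns for path in changed)
```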

### `mode`

`gate` (blocks the merge when it returns `block`; fails closed) or `advisor`
(never blocks; fails open). See [Concepts](./concepts.md#the-mode-gate-vs-advisor)
for the full semantics.

### `prompt`

The instruction handed to the reviewing agent. This is where the craft lives; see
[Writing a good prompt](#writing-a-good-prompt) below.

## The optional execution profile

The remaining fields tune *how* a reviewer runs. All have defaults; omit them
until you need them.

### `backend`

Which agent harness runs the reviewer. Default `any` (resolves to Claude Code).
Pin `claude-code`, `codex`, or `pi` to force a specific harness, usually
because a subscription's terms require it, or because one model is better at a
given concern.

```yaml
backend: codex
```

> `pi` is multi-provider. Pin its provider and model together in the [`model`](#model)
> field using Pi's `provider/id` form (e.g. `openai-codex/gpt-5.5`); omit `model` to
> run against whatever provider and model your local Pi CLI defaults to.

### `model`

The specific model the backend should use, for example `claude-opus-4-8` on Claude
Code or `gpt-5` on Codex. A model id is **backend-specific**, so pinning one
requires a pinned `backend`: a `model` under `backend: any` is rejected when the
registry loads, since Bastion cannot know which backend the id is meant for.

```yaml
backend: codex
model: gpt-5
```

Under `backend: pi` the model also names its **provider**, written in Pi's
`provider/id` form, because Pi is multi-provider and its bare default provider is
`google`. So a Pi reviewer that wants an OpenAI Codex model writes the provider into
the id rather than a separate field:

```yaml
backend: pi
model: openai-codex/gpt-5.5
```

Omit it to take the backend's default. On Claude Code that default is **Opus 4.8**;
on Codex and Pi it is whatever the harness itself resolves (for Pi, its configured
default provider and model). To set a model once for the whole registry rather than
per reviewer, use the [`defaults`](#registry-wide-defaults) block.

### `effort`

The reasoning-effort level, forwarded verbatim to the active backend's effort
control (Claude Code's `--effort`, Codex's `model_reasoning_effort`, Pi's
`--thinking`). Like `model`, the value is opaque: use whatever vocabulary your
backend accepts. Claude Code takes `low`, `medium`, `high`, `xhigh`, or `max`; Codex
takes `minimal`, `low`, `medium`, or `high`; Pi takes `off`, `minimal`, `low`,
`medium`, `high`, or `xhigh`. The shared `low`/`medium`/`high` levels work on any
backend; the backend-specific ones do not. Bastion forwards a mismatched value
unchecked, and the backend decides what to do with it (Claude Code, for instance,
warns and falls back to its own default).

```yaml
effort: high
```

The default is **`high`** (accepted by every backend). Lower it on cheap,
mechanical reviewers to save tokens; raise it on the ones that need to reason hard.

> **The `model:effort` shorthand.** People often write a model and effort together
> as `gpt-5.5:high` or `claude-opus-4-8:max`. Bastion has no combined field: that is
> just `model:` plus `effort:`. Split it across the two fields, with a `backend`
> pinned so the model id is unambiguous:
>
> ```yaml
> backend: codex
> model: gpt-5.5      # the part before the colon
> effort: high        # the part after it
> ```
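
If you are porting a list of `model:effort` shorthands into a registry, a throwaway helper can do the split. `split_shorthand` is hypothetical and not part of Bastion. It splits on the *last* colon, and you still choose and pin the `backend` yourself.

```python
def split_shorthand(spec: str) -> dict[str, str]:
    """Split 'MODEL:EFFORT' into the registry's separate `model` and `effort` fields."""
    model, sep, effort = spec.rpartition(":")
    if not sep or not model or not effort:
        raise ValueError(f"expected MODEL:EFFORT, got {spec!r}")
    return {"model": model, "effort": effort}
```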

### `timeout`

A per-reviewer wall-clock limit, written in human form (`90s`, `15m`). When a
reviewer exceeds it, a gate fails closed (block) and an advisor is skipped. The
default is **15 minutes**. Set a short timeout on cheap reviewers and a long one on
heavy end-to-end checks:

```yaml
timeout: 90s    # a cheap, mechanical reviewer; a heavy e2e check might set 30m
```
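
For intuition about the human-form values, here is a minimal Python parser. It assumes a single integer followed by one unit (`s`, `m`, `h`, or `d`, the units used in this guide's examples). Bastion's real parser may accept more forms, and `parse_duration` is a hypothetical name.

```python
import re
from datetime import timedelta

# Assumed unit set: seconds, minutes, hours, days.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '90s', '15m', '12h', or '7d'."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```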

### `env`

Environment variables injected into the reviewer's process, so the agent and any
tool it runs can see them. Use this to hand a reviewer a value your environment
already provides, say a preview URL:

```yaml
env:
  PREVIEW_URL: http://localhost:3000
```

Values are **literal**: Bastion does not perform shell `$VAR` expansion, so write
the actual value, not `${SOMETHING}`. Bastion consumes environments, it does not
provision them: locally the value must already exist (a precommit script might boot
the service and export it), and in CI the workflow stands it up. See
[Continuous integration](./continuous-integration.md#environments--inputs).

How the value reaches the agent depends on where the reviewer runs:

- **Native reviewers** (no `runner`) also inherit Bastion's own environment, so a
  variable your shell or CI has already exported is visible to the agent even
  without listing it here; the `env` block sets additional values explicitly.
- **Containerized reviewers** (with a `runner` and `capabilities.network: true`) do
  *not* inherit Bastion's environment. Exactly two things go into the container: the
  `env` pairs written here (as literal values, the same as everywhere else) and a fixed
  set of model-provider credential variables (see
  [Backends](./concepts.md#the-backend)). Nothing else crosses, so a value an outer
  shell or CI job exported reaches a containerized reviewer only if its literal value
  is written into this `env` block (template the registry if the value is dynamic, for
  example a per-PR preview URL).

  Bastion writes a containerized reviewer's `env` pairs to a temporary file and hands
  it to the engine as `--env-file`, so their values never appear on the `docker run`
  command line (a secret in `env` stays out of a process listing) and are never set in
  the engine *client*'s environment; the provider credentials are the only variables
  forwarded by name from Bastion's own environment. If you set one of those credential
  names in this `env` block, your value wins: Bastion does not also forward the host's
  value for that name, matching how a native reviewer's `env` overrides the inherited
  environment.

  The env-file format (one `KEY=VALUE` per line, no escaping) imposes one
  container-only constraint: a containerized reviewer's `env` cannot carry a key
  containing a newline or `=`, or a value containing a newline. Such a pair is rejected
  and the reviewer fails closed rather than receive a corrupted value, so a multiline
  value (a PEM key, say) has to reach a containerized reviewer some other way (a file
  in the image, or one its Dockerfile copies in). Native reviewers have no such limit.
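
The env-file constraint follows from the file format itself. The Python sketch below validates the pairs, writes them one `KEY=VALUE` per line, and builds an engine command line that passes credentials by name only. It is a hypothetical illustration: `write_env_file` and `engine_argv` are illustrative names, and `docker` and `--rm` are assumptions about the engine invocation.

```python
import os
import tempfile

def write_env_file(pairs: dict[str, str]) -> str:
    """Validate `env` pairs, then write them to a private temp file; return its path."""
    for key, value in pairs.items():
        # No escaping exists in the format, so these would corrupt the file.
        if "\n" in key or "=" in key or "\n" in value:
            raise ValueError(f"env pair {key!r} cannot be written to an env file")
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="bastion-env-", suffix=".env")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for key, value in pairs.items():
            handle.write(f"{key}={value}\n")
    return path

def engine_argv(image: str, env_file: str, forwarded: list[str], command: list[str]) -> list[str]:
    """Build an engine command line that never carries a secret value."""
    argv = ["docker", "run", "--rm", "--env-file", env_file]
    for name in forwarded:
        # `-e NAME` with no value makes the engine copy NAME from its own environment.
        argv += ["-e", name]
    return argv + [image, *command]
```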

### `inputs`

Values interpolated into the prompt *before* it reaches the agent. Reference an
input as `${name}` in the prompt; Bastion substitutes the value. Unknown
placeholders are left untouched.

```yaml
inputs:
  preview_url: http://localhost:3000
prompt: |
  Run the checkout flow against the preview environment at `${preview_url}`.
  If it fails, block the PR and explain; otherwise approve it.
```
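
Input substitution behaves like a lookup that leaves misses alone. In this Python sketch, `interpolate` is a hypothetical name and the placeholder-name grammar (letters, digits, underscore) is an assumption. The shell environment is never consulted.

```python
import re

# Assumed placeholder grammar: ${identifier}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(prompt: str, inputs: dict[str, str]) -> str:
    """Substitute ${name} from `inputs`; leave unknown placeholders untouched."""
    def substitute(match: re.Match) -> str:
        return inputs.get(match.group(1), match.group(0))
    return _PLACEHOLDER.sub(substitute, prompt)
```

Under this model, a `${HOME}` in a prompt stays literally `${HOME}` unless `inputs` defines `HOME`.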

`env` puts a value in the *process*; `inputs` puts a value in the *prompt text*.
They are independent: use `env` for tools the agent invokes, `inputs` for values
the agent should read in its instructions. Input values are literal as well: a
`${name}` in the prompt is substituted only from this `inputs` map, never from your
shell environment.

### `runner` and `capabilities`

The schema also accepts a `runner` block (`dockerfile` / `image`) and a
`capabilities` block (`network`, `mcp`, `skills`) to opt into an execution
environment beyond the least-privilege default. Where these stand:

- **`runner` is provisioned (paired with `network: true`).** A reviewer with a
  `runner` block and `capabilities.network: true` runs its backend inside a
  container. A `dockerfile` is built (tagged by a content hash of the Dockerfile, so an
  unchanged file reuses the engine's layer cache); an `image` is used as-is (the engine
  pulls it on demand at run time). If both are set, `dockerfile` wins; a `runner` with
  neither fails closed. The `dockerfile` path is relative to the repository root and
  must resolve inside it: an absolute path, any path with a `..` component (rejected
  outright, even one that would resolve back inside), or one that canonicalizes
  outside the repo through a symlink all fail closed. The build runs with the
  repository root as its build context, so the Dockerfile's `COPY` and `ADD` can
  reference files anywhere in the repo. An `image` reference beginning with `-` fails
  closed, since the engine would read it as a command-line option rather than an image
  name. The selected backend's executable must exist on `PATH` inside the image
  (`claude` for `claude-code`, `codex` for `codex`). This lets a reviewer carry tools
  or a pinned toolchain the host does not have.
- **`capabilities.network: true` is required to run a container; the default
  `network: false` fails closed.** `network: true` gives a containerized reviewer
  general (unscoped) outbound network. A container's egress cannot be scoped to the
  model provider yet (the allowlisting proxy is unbuilt), so the default
  `network: false` reads as restricted but cannot be enforced: rather than silently
  attach general egress, `ExecutionPlan::resolve` rejects a container with
  `network: false` before it runs. As with `mcp`/`skills`, that rejection **fails
  closed**: a gate blocks and an advisor is skipped, with a message naming the field. A
  containerized reviewer must opt into `network: true` to run, accepting general egress
  for now. A *native* `network: true` (no `runner`) also fails closed, since with no
  container there is nothing to scope.
- **`capabilities.mcp` and `capabilities.skills` are not provisioned.** A
  reviewer that declares either **fails closed**: a gate blocks and an advisor is
  skipped, with a message naming the unprovisioned field, rather than running
  degraded (a gate that quietly ran without a privilege it asked for would be a
  silent fail-open). Leave them out.

The least-privilege default (no `runner`, `network: false`, no `mcp` or `skills`)
runs natively on the host.
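
Together, these rules form a small decision procedure. The Python sketch below is a hypothetical model of it, not Bastion's `ExecutionPlan::resolve`: `resolve_plan`, `checked_dockerfile`, and `Rejected` are illustrative names. The chapter states that `mcp` is checked first; the order of the remaining checks is an assumption.

```python
from pathlib import Path

class Rejected(Exception):
    """The reviewer fails closed: a gate blocks and an advisor is skipped."""

def checked_dockerfile(relative: str, repo_root: Path) -> Path:
    path = Path(relative)
    if path.is_absolute() or ".." in path.parts:
        raise Rejected(f"dockerfile {relative!r} must be relative, with no '..'")
    root = repo_root.resolve()
    resolved = (root / path).resolve()  # follows symlinks
    if not resolved.is_relative_to(root):
        raise Rejected(f"dockerfile {relative!r} resolves outside the repository")
    return resolved

def resolve_plan(reviewer: dict, repo_root: Path) -> tuple[str, str]:
    caps = reviewer.get("capabilities", {})
    for field in ("mcp", "skills"):  # unprovisioned tiers are checked first
        if caps.get(field):
            raise Rejected(f"capabilities.{field} is not provisioned")
    network = caps.get("network", False)
    runner = reviewer.get("runner")
    if runner is None:
        if network:
            raise Rejected("network: true without a runner has nothing to scope")
        return ("native", "host")  # the least-privilege default
    if not network:
        raise Rejected("a runner requires capabilities.network: true")
    if "dockerfile" in runner:  # dockerfile wins over image
        return ("build", str(checked_dockerfile(runner["dockerfile"], repo_root)))
    if "image" in runner:
        if runner["image"].startswith("-"):
            raise Rejected("an image reference may not begin with '-'")
        return ("image", runner["image"])
    raise Rejected("runner sets neither dockerfile nor image")
```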

## A fully-loaded example

Putting the optional fields together. As written, this reviewer runs in the container
built from its Dockerfile. It must declare `network: true` to run (a containerized
reviewer needs general egress, since provider-only scoping is unbuilt), and Bastion
forwards its `env` into that container.

```yaml
reviewers:
  - name: e2e-checkout-flow
    trigger: [src/**]
    mode: gate
    backend: claude-code
    timeout: 15m
    env:
      PREVIEW_URL: http://localhost:3000     # literal value, no shell expansion
    inputs:
      preview_url: http://localhost:3000     # substituted into the prompt as ${preview_url}
    runner:                                  # provisioned: runs the backend in this image
      dockerfile: ./.bastion/e2e.Dockerfile
    capabilities:
      network: true                          # required to run a container; grants general (unscoped) egress
    prompt: |
      Run the e2e checkout flow against the preview environment at `${preview_url}`
      using Playwright. If it fails, block the PR and explain; otherwise approve it.
```

Adding an unprovisioned capability flips the whole reviewer to fail closed. For
example, adding `mcp: [playwright]` under `capabilities` would block this gate before
it ever reaches the container, since `mcp` is checked first. Leave `mcp` and `skills`
out until those tiers land.

## Writing a good prompt

The prompt is the reviewer. A few habits keep recall high:

- **Say what to block on, explicitly.** End with a clear instruction: "block the
  PR if X; otherwise approve it." The reviewer's job is a decision, not an essay.
- **Name the one concern and stay on it.** If you find yourself writing "also
  check...", that "also" is a second reviewer. Split it.
- **Carve out the false positives you can predict.** "A single large but cohesive
  module is not a violation." "Panics in `#[cfg(test)]` code are acceptable."
  Pre-empting the obvious wrong flags keeps false positives down.
- **Match the mode to the language.** A gate's prompt should be decisive; an
  advisor's should say "report as optional findings... do not block," so its
  output stays advisory even if the model is tempted to be firm.
- **Let the agent explore.** Every reviewer gets a full checkout and is told how to
  see the changeset (the diff against the base, plus untracked files). You do not
  need to paste the diff into the prompt; point the reviewer at the property.
- **You do not need to ask for completeness.** Bastion appends an instruction to
  every reviewer prompt telling the agent to report every distinct finding in one
  pass, not just the first. Write the prompt for the concern and phrase findings
  per instance (one per file and line range), and the agent enumerates them all so
  the author fixes the whole set from one run.

Some worked examples, taken from Bastion's own registry
([`.bastion.yaml`](https://github.com/jssblck/bastion/blob/main/.bastion.yaml)):

```yaml
  - name: error-handling
    trigger: [src/**/*.rs]
    mode: gate
    backend: codex
    prompt: |
      Review the changeset for error-handling discipline: no `.unwrap()` or
      `.expect()` on recoverable errors in non-test code, errors propagated with
      `?` and given context, and gates that fail closed. Block the PR if you find
      a recoverable error that can panic in production; otherwise approve it.
      Panics in `#[cfg(test)]` code and in genuinely-unreachable invariants that
      are documented as such are acceptable.

  - name: test-coverage
    trigger: [src/**/*.rs]
    mode: advisor
    backend: codex
    prompt: |
      Check whether new or changed behavior in this changeset is covered by
      tests. This is advisory: report uncovered behavior as optional findings so
      the author can decide, but do not block.
```

## Validating your registry

Run `bastion validate` to parse the registry and report any problem without running
a single reviewer or spending a model call:

```sh
bastion validate                          # validate the merged set review would run
bastion validate path/to/.bastion.yaml    # check a specific file on its own
```

With no file argument it validates the same merged set a local `bastion review`
would run, the discovered repository registry plus your user-level one, and names
each source it merged. An explicit `FILE` is checked on its own, with no merging. It
loads through the same path `bastion review` uses, so it catches exactly the errors a
real review would hit at load time: malformed YAML, an unknown field, a duplicate
name (including one that survives the user/repo merge), a reviewer missing a required
field, or a model pinned under `backend: any`. A valid registry prints a one-line
summary and the reviewers it parsed, and exits zero; an invalid one prints the error
and exits non-zero, so the command works as a pre-commit or CI lint as well as a
quick local check.

The registry is also validated whenever it loads for a real `bastion review`, so a
malformed file fails fast there too. `bastion validate` just lets you check it on its
own, for free, before you run anything.

---

Next: [The local workflow](./local-workflow.md). Running `bastion review` in
depth, the JSONL agent stream, and inspecting saved runs.

---

<!-- https://bastion.jessica.black/guide/local-workflow -->

# The local workflow

> Running `bastion review` for real: the loop, the two output formats, exit codes,
> and inspecting what was saved.

The local CLI is the surface an authoring agent optimizes against before opening a
PR. It runs the *same* reviewers CI will run, so a green local loop usually means a PR
that CI confirms. Two things can make a local run differ: CI feeds reviewers the PR's
description and discussion that a default local run lacks, and a local run also merges
in any personal reviewers from your user-level registry, which CI never sees (see
[Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)). This chapter
covers the loop in depth.

## The loop

The intended use is a tight loop: run the review, read what blocks, fix it, run
again, until green.

```sh
bastion review --base main
```

`bastion review` computes the changeset (working tree vs. `--base`, including
uncommitted and untracked files), selects the reviewers whose triggers match, runs
them in parallel with per-reviewer timeouts, and renders progress and verdicts.

- `--base <branch>`: the branch to diff against. Defaults to `main`.
- `--format <human|jsonl>`: output format. Defaults to `human`.
- `--repo <owner/name>`: the GitHub repository to gather pull request context from. Defaults to `$GITHUB_REPOSITORY`.
- `--pr <number>`: the pull request whose description and discussion the reviewers read as context. Requires a repository, from `--repo` or `$GITHUB_REPOSITORY`; passing `--pr` with no repository is an error.
- `--config-dir <path>`: the user-level config directory to merge personal reviewers from (env `BASTION_CONFIG_DIR`). Defaults to your platform config directory (`~/.config/bastion` on Linux, `~/Library/Application Support/bastion` on macOS, `%APPDATA%\bastion` on Windows). The user-level layer is applied only to a purely local review; a review carrying `--repo`/`--pr` uses the repository's reviewers alone.

The CI workflow passes `--repo`/`--pr` so reviewers see the PR's stated intent and discussion. Locally you rarely need them: with no PR, intent comes from your branch's commit messages (`base..HEAD`), and each reviewer's prior findings come from the run store.

When you do pass them, Bastion builds its GitHub REST client from `GITHUB_TOKEN` and `GITHUB_API_URL` (the latter defaults to the public API and points at a GitHub Enterprise host when set). Discussion gathering reads the first 100 conversation comments and the first 100 review comments and does not paginate, so later comments on a very long thread are not included. Gathering PR context is read-only and best effort, so an API or token failure never fails the review; it just drops back to the local context.

### Exit codes

The exit code *is* the gate, so a loop can branch on it:

| Aggregate verdict | Exit code |
| --- | --- |
| `pass` (all gates passed) | `0` |
| `block` (a gate blocked, errored, or timed out) | non-zero |

```sh
# Keep working until every gate is green.
until bastion review --base main; do
  echo "still blocked; fixing..."
  # ... make changes ...
done
```

A blocked review is an *expected* outcome, not a crash: Bastion still emits its full
structured output, and only the exit code signals the gate.

## Two audiences, two formats

By default `bastion review` renders human-readable progress for a person watching.
An agent passes `--format jsonl` and gets a machine stream instead. Both describe
the same run; only the presentation differs.

### The JSONL stream

With `--format jsonl`, Bastion emits one JSON object per line, as each thing
happens. A run is a typed sequence of events:

```jsonl
{"type":"run.started","run":"r-0f3a","branch":"feat/cart","base":"main","changed":12,"reviewers":[{"name":"file-responsibility","mode":"gate"},{"name":"tenant-isolation","mode":"gate"}]}
{"type":"reviewer.started","run":"r-0f3a","reviewer":"tenant-isolation","mode":"gate","backend":"claude-code"}
{"type":"reviewer.resolved","run":"r-0f3a","reviewer":"tenant-isolation","verdict":"block","summary":"A new query path reads rows without scoping by tenant id.","findings":[{"kind":"blocking","path":"src/server/db.rs","line_start":88,"line_end":91,"detail":"scope this query by tenant_id"}],"usage":{"tokens_in":18204,"tokens_out":1560,"cache_read":12000,"cost_usd":0.21},"duration_ms":38120,"has_transcript":true}
{"type":"run.completed","run":"r-0f3a","verdict":"block","gates":{"total":2,"passed":1,"blocked":1},"duration_ms":41030,"tokens_in":20480,"tokens_out":1875,"cache_read":13100,"cost_usd":0.37}
```

The event types:

| Event | Meaning |
| --- | --- |
| `run.started` | The run began; lists the reviewers that matched and will run. |
| `reviewer.started` | One reviewer was dispatched. |
| `reviewer.resolved` | One reviewer finished; carries its `verdict`, `summary`, `findings`, `usage`, and a `has_transcript` flag. |
| `run.completed` | The aggregate decision and the gate tally, plus the run's wall-clock `duration_ms` and the usage totals (`tokens_in`, `tokens_out`, `cache_read`, `cost_usd`) summed across reviewers. |

How an agent should consume it:

- **Only need the outcome?** Ignore everything until `run.completed` and read its
  `verdict`.
- **Want to react as you go?** Read each `reviewer.resolved` as it lands and act on
  its `findings`: a `path`, a `line_start`/`line_end`, and a `detail` telling you
  what to change. The findings are everything you need to fix the code.

### For agents: the consumption contract

If you are an agent driving the loop, this is the whole contract:

1. Run `bastion review --base <branch> --format jsonl`.
2. Parse stdout one line at a time as JSON; each line has a `type`.
3. Act on every `reviewer.resolved` with `verdict: "block"` using its `findings`
   (`path` + `line_start`/`line_end` + `detail`). Do not open transcripts; the
   findings already say what to change.
4. The aggregate decision is `run.completed.verdict`. The process also exits
   non-zero on `block`, so you can branch on the exit code alone if you only need
   pass/fail.
5. Fix what blocked and re-run. Loop until `run.completed.verdict` is `pass` (exit
   zero), then open your PR.

This contract is exactly what `bastion skills install` checks into your repo as the
`using-bastion` agent skill, so your agents follow it without being told each time.
See [Teach your agents to use Bastion](./getting-started.md#7-teach-your-agents-to-use-bastion).
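
The contract above can be followed by a short Python consumer. `consume` and `run_review` are hypothetical names. The wrapper assumes `bastion` is on `PATH` and does not treat a non-zero exit as an error, since `block` is an expected outcome. Stderr is captured separately, so the skills-freshness notice never reaches the JSON parser.

```python
import json
import subprocess
from typing import Iterable, Optional

def consume(lines: Iterable[str]) -> tuple[Optional[str], list[dict]]:
    """Return the run verdict and every finding from a blocking reviewer."""
    verdict, to_fix = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "reviewer.resolved" and event.get("verdict") == "block":
            for finding in event.get("findings", []):
                # Keep path, line_start/line_end, and detail; tag the reviewer.
                to_fix.append({"reviewer": event["reviewer"], **finding})
        elif event["type"] == "run.completed":
            verdict = event["verdict"]
    return verdict, to_fix

def run_review(base: str = "main") -> tuple[int, Optional[str], list[dict]]:
    """Run one review; `check=False` because a block exits non-zero by design."""
    proc = subprocess.run(
        ["bastion", "review", "--base", base, "--format", "jsonl"],
        capture_output=True, text=True, check=False,
    )
    verdict, findings = consume(proc.stdout.splitlines())
    return proc.returncode, verdict, findings
```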

### The skills-freshness notice on stderr

Before it runs, `bastion review` compares the `using-bastion` skill checked into your
repo (under `.claude/skills` and `.agents/skills`) against the copy bundled in the
running binary, the same comparison `bastion skills check` makes. When the checked-in
copy is missing or has drifted, it prints a one-line notice to **stderr** naming the
affected files and pointing at `bastion skills install`. This is the case where your
agents may be following stale guidance, so the driving agent sees the notice inline
with the run.

It goes to stderr on purpose, keeping stdout as pure JSONL for a parser; the notice is
advisory, so it never adds an event to the stream and never changes the exit status. A
`block` still comes only from a reviewer. Run `bastion skills install` (add `--force`
to overwrite a file you edited) and commit the result to clear it.

### Money is dollars

Bastion tracks cost internally in exact cents and serializes `cost_usd` as dollars
(`0.21`), so you never see floating-point cent drift in the stream.
Token fields (`tokens_in`, `tokens_out`, `cache_read`) are plain integer counts;
on `run.completed` they are the totals summed across every reviewer that reported
usage, the same way `cost_usd` is. `cache_read` is the input tokens served from the
provider's prompt cache (cache hits); each backend names it differently natively
(Claude's `cache_read_input_tokens`, Codex's `cached_input_tokens`, Pi's
`cacheRead`) and Bastion normalizes them to one field. It is 0 when a backend
reports no cache usage.
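
Both normalizations in this section are easy to model. In the Python sketch below, the per-backend field names come from this section. The assumption that costs are held as whole integer cents, and the names `cache_read`, `to_dollars`, and `total_cost_usd`, are hypothetical.

```python
from decimal import Decimal

# Native names for cache-hit input tokens, per backend (from this guide).
CACHE_READ_FIELD = {
    "claude-code": "cache_read_input_tokens",
    "codex": "cached_input_tokens",
    "pi": "cacheRead",
}

def cache_read(backend: str, raw_usage: dict) -> int:
    """Normalize a backend's cache-hit count to `cache_read`; 0 when unreported."""
    return int(raw_usage.get(CACHE_READ_FIELD[backend], 0))

def to_dollars(cents: int) -> float:
    """Convert exact integer cents to a dollar figure for serialization."""
    return float(Decimal(cents) / 100)

def total_cost_usd(per_reviewer_cents: list[int]) -> float:
    # Sum in integer cents first, then convert once, so no float error accumulates.
    return to_dollars(sum(per_reviewer_cents))
```

Summing floats directly would not be safe: in Python, `0.1 + 0.2 == 0.3` is `False`, while `total_cost_usd([10, 20])` is exactly `0.3`.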

## What is streamed vs. what is saved

The stream deliberately leaves out the verbose detail. A transcript is mostly noise
to an agent that just wants to know what to fix; streaming thousands of lines on
every run would bury the findings and burn the agent's own context.

- **Streamed:** the decisions and the things you act on immediately: the reviewer
  set, start and resolve events, verdicts, summaries, findings, per-reviewer usage.
- **Saved, not streamed:** the verbose detail: full session transcripts, raw
  verdict payloads, per-reviewer metadata. Written to disk, read on demand.

That is why `reviewer.resolved` carries `has_transcript: true` rather than the
transcript itself: when a decision surprises you, the transcript is one command
away (next section).

## Inspecting saved runs

Every run is persisted, so you can inspect history without re-running anything.
These commands are the local equivalent of clicking "Details" on a CI check. The
run-targeted ones (`show`, `transcript`) default to the latest run when you omit a
run id; `runs` and `clean` operate over all saved runs.

```sh
bastion runs                         # list recent runs: id, verdict, branch, reviewer count
bastion show [<run>]                 # re-print a run's summaries, verdicts, findings
bastion transcript [<run>] <reviewer>   # the full agent session for one reviewer
bastion clean [--keep N | --older-than <dur>]   # prune saved runs
```

- **`runs`** is the index: what ran recently and how each landed.
- **`show`** re-emits a past run's verdicts and findings, the same content as the
  stream's resolve and complete events, on demand. Accepts `--format human|jsonl`.
- **`transcript`** prints the saved session for one reviewer. This is the explicit,
  opt-in way to see what was kept off the stream; reach for it when a verdict is
  surprising and you want to know why. It is raw text (a transcript is already a
  document). Pass either `<reviewer>` (latest run) or `<run> <reviewer>`.
- **`clean`** prunes old runs. `--keep N` retains the N most recent;
  `--older-than <dur>` (e.g. `7d`, `12h`) removes runs older than a duration. The
  two are mutually exclusive.
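
The pruning options reduce to a sort and a filter. In this Python sketch, `runs_to_delete` is a hypothetical name, each run is modeled as `(run_id, started_at)`, and measuring age from a run's start time is an assumption. The no-argument default of keeping 20 comes from this guide.

```python
from datetime import datetime, timedelta
from typing import Optional

def runs_to_delete(
    runs: list[tuple[str, datetime]],
    keep: Optional[int] = None,
    older_than: Optional[timedelta] = None,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the run ids that `bastion clean` would prune, newest first."""
    if keep is not None and older_than is not None:
        raise ValueError("--keep and --older-than are mutually exclusive")
    if keep is None and older_than is None:
        keep = 20  # the documented no-argument default
    newest_first = sorted(runs, key=lambda run: run[1], reverse=True)
    if keep is not None:
        return [run_id for run_id, _ in newest_first[keep:]]
    cutoff = (now or datetime.now()) - older_than
    return [run_id for run_id, started in newest_first if started < cutoff]
```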

## Where runs live

Bastion persists every run under a per-user data directory, by platform
convention:

- Linux: `$XDG_DATA_HOME/bastion`, default `~/.local/share/bastion`
- macOS: `~/Library/Application Support/bastion`
- Windows: `%APPDATA%\bastion`

Override it with `--data-dir <path>` or the `BASTION_DATA_DIR` environment
variable, handy for scratch runs you do not want in your real history. The layout:

```text
<data-dir>/
  runs/
    r-0f3a/
      run.jsonl                  # the full event stream (always JSONL, regardless of display format)
      reviewers/
        tenant-isolation/
          transcript.jsonl       # the full agent session
          verdict.json           # the raw structured verdict
          meta.json              # backend, timing, usage, matched trigger
    latest                       # a plain file holding the most recent run id
```
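
Locating the store and the latest run by hand takes only a few lines of Python. `data_dir` and `latest_run_dir` are hypothetical names. The sketch assumes `--data-dir` takes precedence over `BASTION_DATA_DIR`, which in turn overrides the platform default listed above.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

def data_dir(
    flag: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    platform: Optional[str] = None,
    home: Optional[Path] = None,
) -> Path:
    """Resolve the run store: flag, then BASTION_DATA_DIR, then the platform default."""
    if flag:
        return Path(flag)
    env = os.environ if env is None else env
    if env.get("BASTION_DATA_DIR"):
        return Path(env["BASTION_DATA_DIR"])
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else home
    if platform == "darwin":
        return home / "Library" / "Application Support" / "bastion"
    if platform == "win32":
        return Path(env["APPDATA"]) / "bastion"
    xdg = env.get("XDG_DATA_HOME")
    return Path(xdg) / "bastion" if xdg else home / ".local" / "share" / "bastion"

def latest_run_dir(root: Path) -> Path:
    """Follow the plain-text `latest` file to the newest run's directory."""
    run_id = (root / "latest").read_text(encoding="utf-8").strip()
    return root / "runs" / run_id
```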

`run.jsonl` is the same event stream whether a human or an agent triggered the
run, so any run can be replayed or inspected after the fact. Runs accumulate:
`bastion review` never prunes, so history grows until you run `bastion clean`. With no
arguments, `bastion clean` keeps the 20 most recent runs; pass `--keep N` or
`--older-than <dur>` to choose a different cutoff.
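
Scripts and agents can also reach a run's files directly from the layout above. This
minimal POSIX shell sketch assumes only what this section states: `BASTION_DATA_DIR`
overrides the platform default, and `runs/latest` holds the most recent run id. It
ignores Windows and the `--data-dir` flag, and the function names are hypothetical;
`bastion show` and `bastion transcript` remain the supported way to read a run.

```sh
# Hypothetical helpers, not part of Bastion: resolve the data directory and
# the artifact directory for one reviewer in the most recent run.

bastion_data_dir() {
  # An explicit BASTION_DATA_DIR override wins over the platform default.
  if [ -n "${BASTION_DATA_DIR:-}" ]; then
    printf '%s\n' "$BASTION_DATA_DIR"
    return
  fi
  case "$(uname -s)" in
    Darwin) printf '%s\n' "$HOME/Library/Application Support/bastion" ;;
    # Linux: $XDG_DATA_HOME/bastion, defaulting to ~/.local/share/bastion.
    *)      printf '%s\n' "${XDG_DATA_HOME:-$HOME/.local/share}/bastion" ;;
  esac
}

# Print the directory holding one reviewer's files from the latest run.
latest_reviewer_dir() {
  reviewer="$1"
  data_dir="$(bastion_data_dir)"
  latest_file="$data_dir/runs/latest"
  if [ ! -f "$latest_file" ]; then
    echo "no runs recorded under $data_dir" >&2
    return 1
  fi
  # The file holds a bare run id such as r-0f3a; drop any trailing newline.
  run_id="$(tr -d '[:space:]' < "$latest_file")"
  printf '%s\n' "$data_dir/runs/$run_id/reviewers/$reviewer"
}
```

For example, `cat "$(latest_reviewer_dir tenant-isolation)/verdict.json"` prints the
raw verdict that reviewer produced in the latest run.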

## Providing environments locally

For a **native** reviewer, the reviewer process inherits Bastion's own environment, so
anything your shell or a `precommit` script has exported (say, a URL for a service on
`http://localhost:3000`) is visible to the agent. A reviewer's `env` and `inputs`
values, by contrast, are literal text set in the YAML and are not shell-expanded.
Bastion only reads what your shell or CI has already set up; it does not start
services for you. This is the same boundary CI honors, which keeps the local and CI
surfaces in agreement.

A **containerized** reviewer (one with a
[`runner`](./authoring-reviewers.md#runner-and-capabilities), which today must also set
`capabilities.network: true` to run) runs in a container and does not inherit your
shell environment. Only the reviewer's literal `env` pairs and a fixed set of
provider-credential variables cross into it. (If the reviewer's `env` sets one of
those credential names, its value wins and the host's value is not also forwarded.)
So an exported `PREVIEW_URL` that a native reviewer sees for free reaches a
containerized one only if you write its literal value into that reviewer's `env`. A
containerized reviewer also typically reaches a host service over the container
network rather than `localhost`.
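
The forwarding rule can be modeled in a few lines of shell, which makes the
precedence concrete. This is an illustration, not Bastion's code: the guide does not
enumerate the fixed credential set, so `CREDENTIAL_VARS` below holds two hypothetical
stand-ins borrowed from the API-key examples in the CI chapter, and the reviewer's
`env` pairs are passed to the function as `NAME=value` arguments.

```sh
# Illustrative model, not Bastion's implementation: print the environment a
# containerized reviewer would receive. CREDENTIAL_VARS is a hypothetical
# stand-in for Bastion's fixed provider-credential set.
CREDENTIAL_VARS="ANTHROPIC_API_KEY CODEX_API_KEY"

container_env() {
  # 1. The reviewer's literal env pairs always cross, verbatim.
  for pair in "$@"; do
    printf '%s\n' "$pair"
  done
  # 2. Each credential is forwarded from the host by name, unless the
  #    reviewer's own env already set that name (its literal value wins).
  for name in $CREDENTIAL_VARS; do
    overridden=no
    for pair in "$@"; do
      case "$pair" in "$name="*) overridden=yes ;; esac
    done
    # printenv fails for an unset variable, so unset credentials are skipped.
    if [ "$overridden" = no ] && value="$(printenv "$name")"; then
      printf '%s=%s\n' "$name" "$value"
    fi
  done
  # 3. Nothing else crosses: host variables such as PREVIEW_URL are dropped.
}
```

Calling `container_env` with no arguments in a shell that exports `PREVIEW_URL`
prints only the credential lines, which is exactly why the value has to be written
into the reviewer's `env` instead.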

## The same surface in CI

For the repository's reviewers, these local events are not a separate system from CI;
they are the same decisions in a finer-grained form. Each decision event has a GitHub
twin (a check run, a comment, an annotation), laid out side by side in the
[Continuous integration](./continuous-integration.md#how-a-run-maps-to-github)
chapter. Both surfaces run the repository's reviewers and the same aggregation, so a
green local loop predicts a green PR when both runs see the same reviewers and
context. The context can differ: CI adds the PR's description and discussion, which a
default local run lacks, so a reviewer that weighs that context can decide
differently. A purely local run can also include your personal user-level reviewers;
their events are local-only and never become checks or comments (see
[Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

---

Next: [Continuous integration](./continuous-integration.md). Promoting these same
reviewers into GitHub Actions as a required merge check.

---

<!-- https://bastion.jessica.black/guide/continuous-integration -->

# Continuous integration

> Promoting your reviewers into GitHub Actions: one required check and per-author
> billing.

The local loop gets you to green before you open a PR. CI is the authoritative
confirmation: it runs the reviewers from the repository's `.bastion.yaml` and reports
one merge gate. Because routing and aggregation are shared, CI rarely surprises an
author who looped locally. It can differ in two ways: CI adds the PR's description and
discussion that a default local run lacks, and CI runs the repository's reviewers
only, while a local run can also include your personal user-level reviewers (see
[Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)). The user-level
layer is local-only by design, so it can never gate someone else's pull request. This
chapter covers the GitHub adapter, the one forge Bastion targets.

> Bastion does not own CI; it plugs into yours. The workflow, the secrets, the
> preview environments, and the branch-protection rules are GitHub's. Bastion
> reads and writes them through a thin adapter and otherwise stays out of the way.

## How a run maps to GitHub

On each pull-request event (`opened`, `synchronize`, `reopened`) the workflow runs
`bastion review`, which computes the changed files, routes to the matching
reviewers, runs them in parallel with per-reviewer timeouts, and persists the run. A
second step, `bastion github report`, reads that run and posts it. A verdict reaches
two GitHub surfaces:

- **Findings are posted to the PR.** `bastion github report` renders every finding
  (blocking and optional) into a single sticky PR comment, and attaches each located
  finding to its reviewer's check run as an annotation on the finding's `path` and
  line range. The sticky comment is the surface an implementing agent reads; it
  carries everything the agent needs to act.
- **Each verdict becomes a check run** named after the reviewer
  (`bastion / tenant-isolation`). A blocking gate reports `failure`; a passing gate
  reports `success`; an advisor reports `success` with its findings attached.

`bastion github report` also folds a skills-freshness advisory into the sticky comment
when the checked-out repo's bundled skills (`.claude/skills` and `.agents/skills`) are
missing or have drifted from the reporting binary, the same comparison
`bastion skills check` makes. It renders as a `> [!WARNING]` callout just under the
headline, naming each affected file and pointing at `bastion skills install`. It is
advisory only, so it never changes a check-run conclusion or the `bastion` gate; it
tells you to refresh stale skills without failing the build. The local `bastion review`
prints the same notice to stderr.

The local-to-GitHub mapping is one-to-one for the repository's reviewers: the JSONL
events a CI or `bastion review --repo/--pr` run produces are the same decisions GitHub
renders as checks and a comment. (A purely local run can also include your personal
user-level reviewers, whose events are local-only and have no GitHub twin.) Each
GitHub surface has a local twin:

| GitHub                                                         | Local                               |
| -------------------------------------------------------------- | ----------------------------------- |
| A per-reviewer check run reaching its conclusion               | `reviewer.resolved` event           |
| Findings in the sticky PR comment and as check-run annotations | `findings` in `reviewer.resolved`   |
| Tokens and cost in the check output                            | `usage` in `reviewer.resolved`      |
| The aggregate `bastion` check and the sticky PR comment        | `run.completed` event               |
| Transcript in the uploaded run artifact                        | saved on disk, `bastion transcript` |

The local stream additionally carries `run.started` and `reviewer.started` for an
agent reacting as the run goes; those have no separate GitHub surface, because
`bastion github report` runs after the review finishes and renders the result in one
pass. This mapping is deliberate, so an agent's local loop and the CI gate stay
aligned on what a review means.

## The one required check

Branch protection needs you to name the checks that must pass, but Bastion's set of
reviewers *varies per PR*: a docs-only PR and a server PR trigger different
reviewers, so there is no fixed list of names to require.

The fix is a single always-present check, **`bastion`**, and it is the only one
branch protection requires. It runs even when zero reviewers match (a trivial pass),
so it is always there to require. Internally it reflects the aggregate: `success`
only when every triggered gate passed, `failure` if any gate blocked, errored, or
timed out (fail-closed). The per-reviewer checks stay informational; `bastion` is
the gate.
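
The aggregate rule, together with the per-reviewer conclusions above, can be sketched
as a fold over the triggered reviewers. In this illustrative model (not Bastion's
code), each argument is a hypothetical `mode:outcome` token, with mode `gate` or
`advisor` and outcome `pass`, `block`, `error`, or `timeout`. Advisors never affect
the gate, any non-passing gate fails it closed, and an empty list is the trivial pass.

```sh
# Illustrative model of the aggregate `bastion` check, not Bastion's code.
# Each argument is a hypothetical "<mode>:<outcome>" token.
aggregate_conclusion() {
  for result in "$@"; do
    case "$result" in
      advisor:*) ;;                       # advisors are informational only
      gate:pass) ;;                       # a passing gate keeps the run green
      gate:block|gate:error|gate:timeout)
        echo failure; return 0 ;;         # fail closed on the first bad gate
      *)
        echo "unknown result: $result" >&2; return 2 ;;
    esac
  done
  # Reached when every gate passed, or when zero reviewers matched.
  echo success
}
```

`aggregate_conclusion gate:pass advisor:error` prints `success`, while
`aggregate_conclusion gate:pass gate:timeout` prints `failure`: a timeout counts
against a gate exactly like a block.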

## The workflow

The adapter is a GitHub Actions workflow that lives in your own repository. It installs
a published `bastion` release plus your backend CLI, authenticates the backend, runs
`bastion review`, and then runs `bastion github report` to post the results to the PR.
The CLI exits non-zero if any gate blocks, so the job's pass/fail *is* your merge gate;
the report step adds the sticky comment and the per-reviewer and aggregate check runs.

The host's backend CLI and its auth serve **native** reviewers (the default). A
reviewer with a [`runner`](./authoring-reviewers.md#runner-and-capabilities) instead
runs its backend *inside a container*, and must declare `capabilities.network: true`:
without it the reviewer is rejected before it runs, so a gate blocks and an advisor is
skipped. For those reviewers the job needs a container engine on the runner (`docker`
by default, or whatever `BASTION_CONTAINER_ENGINE` names), and the backend executable
plus its auth live inside the image, not on the host. The fixed provider credential
variables are still forwarded from the job into the container by name, so host-side
auth reaches a containerized reviewer's provider even though the CLI itself lives in
the image:

```yaml
name: bastion
on:
  pull_request:
    types: [opened, synchronize, reopened]

# The report step writes the PR comment and the check runs, so the job needs more
# than read access.
permissions:
  contents: read
  pull-requests: write
  checks: write

jobs:
  review:
    runs-on: ubuntu-latest
    # True only when both dedicated-app secrets are set (the id and key are one
    # credential), so a half-configured repo falls back instead of failing the mint
    # step. Computed here because the `if:` below can read `env` but not `secrets`.
    env:
      HAS_BASTION_APP: ${{ secrets.BASTION_APP_ID != '' && secrets.BASTION_APP_PRIVATE_KEY != '' }}
    # Agentic backends run over the PR's code with live credentials, so restrict to
    # same-repo PRs; a maintainer re-runs a fork PR from a trusted branch.
    if: github.event.pull_request.head.repo.full_name == github.repository
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history; reviewers diff against the base

      # 1. Install a published bastion release (not built from the PR).
      # 2. For native reviewers: install your backend CLI (claude, codex, or pi) on
      #    the runner and authenticate it as the PR author. The concrete per-author
      #    auth step is in "Authentication & billing" below; drop it in here. For
      #    reviewers with a `runner`: ensure a container engine is on the runner
      #    (docker by default, or set BASTION_CONTAINER_ENGINE) and that the backend
      #    CLI and its auth live inside the image; the provider credential variables
      #    are forwarded in by name.
      # 3. Stand up anything your reviewers consume (a preview env, a database).

      - name: Review
        env:
          BASTION_DATA_DIR: ${{ github.workspace }}/.bastion
          # Lets the reviewers read the PR's description and discussion as context
          # (read-only, best effort; gathering reads the first 100 conversation comments
          # and first 100 review comments, no pagination). Omit the --repo/--pr flags
          # below to review the diff and local context without PR discussion.
          GITHUB_TOKEN: ${{ github.token }}
        # Non-zero exit on a blocked gate fails the job; that is the merge gate.
        # --repo/--pr feed the reviewers the PR's stated intent and discussion alongside
        # the diff. Cross-run prior-findings memory needs the run store persisted between
        # runs (upload and restore .bastion/runs); a fresh runner starts without it.
        run: |
          bastion review --base "origin/${{ github.base_ref }}" \
            --repo "${{ github.repository }}" \
            --pr "${{ github.event.pull_request.number }}"

      # Optional: mint a token for a dedicated Bastion app so the check runs get
      # their own check suite and render under the app's name. Skipped (and the
      # report falls back to the default GITHUB_TOKEN) when the app is not set up.
      # See "Grouping the checks under their own app" below.
      - id: app-token
        if: ${{ always() && env.HAS_BASTION_APP == 'true' }}
        uses: actions/create-github-app-token@v2
        with:
          app-id: ${{ secrets.BASTION_APP_ID }}
          private-key: ${{ secrets.BASTION_APP_PRIVATE_KEY }}

      - name: Report to the PR
        # Runs even when the review blocked and failed the job, so the comment and
        # checks always land. Creating check runs needs a GitHub App installation
        # token (a classic PAT cannot); both the dedicated-app token and the default
        # GITHUB_TOKEN qualify, so use the dedicated one when present and fall back.
        if: always()
        env:
          GITHUB_TOKEN: ${{ steps.app-token.outputs.token || github.token }}
          BASTION_DATA_DIR: ${{ github.workspace }}/.bastion
        run: |
          set -euo pipefail
          bastion github report \
            --repo "${{ github.repository }}" \
            --pr "${{ github.event.pull_request.number }}" \
            --sha "${{ github.event.pull_request.head.sha }}"
```

### `bastion github report`

The report step reads the run that `bastion review` just persisted (under
`BASTION_DATA_DIR`) and posts it to the pull request. Its full surface:

```
bastion github report --repo <OWNER/NAME> --pr <N> --sha <SHA> [RUN]
```

- `--repo <OWNER/NAME>`: the repository to post to. Defaults to the
  `GITHUB_REPOSITORY` environment variable that Actions sets, so you can usually
  omit it.
- `--pr <N>`: the pull request number (required).
- `--sha <SHA>`: the head commit the check runs attach to (required); pass the
  PR's `head.sha`, not the merge commit.
- `RUN`: an optional positional run id to report; defaults to the latest recorded
  run, which is what you want right after `bastion review`.

It needs a token with `pull-requests: write` and `checks: write` in `GITHUB_TOKEN`,
and reads `GITHUB_API_URL` (Actions sets it; also the hook for GitHub Enterprise).
Creating check runs requires a GitHub App installation token; both the default
Actions `GITHUB_TOKEN` and a dedicated-app token (see below) are installation
tokens and qualify, while a classic personal access token does not. If the run
cannot be found (an earlier failure persisted nothing), it prints a notice and
exits 0 rather than failing the step a second time. The command is CI-facing and
has no local mirror: locally you read findings straight from
`bastion review --format jsonl`.
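
Since `--pr` and `--sha` are required and `--repo` only falls back to
`GITHUB_REPOSITORY`, a wrapper script outside the example workflow (a reusable action,
say) can check its inputs up front and report every missing one at once. The function
below is a hypothetical sketch, not part of Bastion; `REPO`, `PR_NUMBER`, and
`HEAD_SHA` are variable names assumed for the script.

```sh
# Hypothetical pre-flight for the report step, not part of Bastion. It checks
# the inputs `bastion github report` needs and resolves the repository the way
# the command does: an explicit value first, then GITHUB_REPOSITORY.
report_preflight() {
  repo="${REPO:-${GITHUB_REPOSITORY:-}}"
  missing=""
  [ -n "$repo" ]             || missing="$missing REPO/GITHUB_REPOSITORY"
  [ -n "${PR_NUMBER:-}" ]    || missing="$missing PR_NUMBER"
  # Must be the PR's head commit, not the merge commit.
  [ -n "${HEAD_SHA:-}" ]     || missing="$missing HEAD_SHA"
  # Needs pull-requests: write and checks: write.
  [ -n "${GITHUB_TOKEN:-}" ] || missing="$missing GITHUB_TOKEN"
  if [ -n "$missing" ]; then
    echo "report pre-flight failed, missing:$missing" >&2
    return 1
  fi
}
```

On success the resolved repository is left in `$repo`, so a caller can follow it with
`bastion github report --repo "$repo" --pr "$PR_NUMBER" --sha "$HEAD_SHA"`. Listing
every gap in one message saves a failed run per missing input when wiring up a new
repository.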

### Grouping the checks under their own app

In the PR checks list, the name before the `/` is not the workflow that created a
check; it is the **check suite** the check belongs to, and a check suite is keyed by
`(GitHub App, commit)`. Every GitHub Actions workflow runs under the one shared
`github-actions` app, so a commit that triggers several workflows has several
`github-actions` suites. The check runs `bastion github report` creates through the
REST API carry no suite id (the API does not accept one), so GitHub attaches them to
one of those suites of its own choosing, often a sibling workflow's. The result is
check runs that read like `Security / fail-closed-gates` instead of grouping on
their own.

A check run lands in its own named suite only when a **distinct GitHub App**
creates it. So the fix is to post the report under a small app of your own rather
than the shared Actions identity:

1. Create the app. Go to
   [bastion.jessica.black/github-app](https://bastion.jessica.black/github-app) and
   follow the walkthrough; it shows how to create a GitHub App by hand in GitHub's UI
   with exactly the permissions the report step needs (`checks: write`,
   `pull_requests: write`, `contents: read`, no webhook). The app's **name** is what
   the checks group under, for example `YourOrg's Bastion`.
2. Generate the app's private key, note its numeric App ID, and install the app on
   the repositories that run Bastion.
3. Store `BASTION_APP_ID` (the App ID) and `BASTION_APP_PRIVATE_KEY` (the `.pem`
   contents) as Actions secrets. For Dependabot-triggered runs, set them in the
   Dependabot secret store too.

The workflow above mints a token from those secrets with
[`actions/create-github-app-token`](https://github.com/actions/create-github-app-token)
and hands it to the report step; the per-reviewer and aggregate checks then render
under the app's name. The step is fully optional: with the secrets unset it is
skipped and reporting falls back to the default `GITHUB_TOKEN`, which still posts
the comment and checks, only grouped under whichever suite GitHub picks. When that
happens, `bastion github report` notices (it reads back the app that GitHub stamped
on the check runs it just created) and closes the PR comment with a short note
linking here; once a dedicated app is configured, the note disappears. The report
detects this from GitHub's own response, so the workflow needs no flag for it.

For a complete, working example (latest-release install, per-author backend
credentials, and fork-PR safety), see Bastion's own
[`.github/workflows/bastion.yml`](https://github.com/jssblck/bastion/blob/main/.github/workflows/bastion.yml).
It wires up the per-author auth recipe in [Authentication & billing](#authentication--billing)
below, on the Codex backend.

Configure branch protection on your default branch to require this job (and to
require review of the reviewer-policy paths; see [Governance](./governance.md)).
Merging stays GitHub-native: an author enables auto-merge, and once the required
job is green GitHub merges. A push re-triggers the workflow and it resolves again.

## Authentication & billing

Coding-agent subscriptions tie usage to an individual, not a team, so Bastion bills
a PR's reviews to the *PR author*. Reviewing Alice's PR is billed to Alice's
subscription, which is the ToS-compliant reading: each contributor's plan powers the
review of their own changes. Bastion never stores credentials. The team stores each
author's credential as an Actions secret, and the workflow maps the PR author's
GitHub login to the matching secret at run time.

Bastion just runs your backend CLI, and the backend reads whatever auth it finds on
the runner. Your job in CI is to place the right author's credential where that CLI
looks before `bastion review` runs. The pattern is the same for every backend:

1. **Capture the credential once, locally.** Each contributor signs in to the
   backend on their own machine. The CLI writes a credential file:

   | Backend       | Sign-in            | Credential file the CLI reads                  |
   | ------------- | ------------------ | ---------------------------------------------- |
   | `codex`       | `codex login`      | `~/.codex/auth.json` (relocatable: `CODEX_HOME`) |
   | `pi`          | `pi` auth flow     | `~/.pi/agent/auth.json`                         |
   | `claude-code` | `claude` sign-in   | `~/.claude` (OAuth token)                       |

   For a ChatGPT or Claude **subscription**, this file holds an OAuth credential (an
   access token plus a refresh token); the CLI refreshes the short-lived access
   token from the stored refresh token on each run, so the secret does not need
   rotating every time the access token expires. A Codex `auth.json` from a ChatGPT
   sign-in carries `"auth_mode": "chatgpt"`, and the native `backend: codex` reads it
   directly: you do **not** need Pi to spend a ChatGPT subscription (see
   [Spending a subscription in CI](#spending-a-subscription-in-ci) below).

2. **Store it as a per-author secret.** Copy the file's contents into a repository
   secret named `<BACKEND>_AUTH_<LOGIN>`: the backend, then the GitHub login
   uppercased. For the `codex` backend and the login `jssblck`, that is
   `CODEX_AUTH_JSSBLCK`; for `pi`, `PI_AUTH_JSSBLCK`. The name is a convention you
   pick and reference in the workflow, not something Bastion parses.

3. **Map the login to the secret in the workflow.** Resolve
   `github.event.pull_request.user.login` to the matching secret through a `case`
   arm, then write it back to the path the CLI reads:

   ```yaml
   - name: Authenticate Codex as the PR author
     env:
       AUTHOR: ${{ github.event.pull_request.user.login }}
       CODEX_AUTH_JSSBLCK: ${{ secrets.CODEX_AUTH_JSSBLCK }}
     run: |
       set -euo pipefail
       author="$(printf '%s' "$AUTHOR" | tr '[:upper:]' '[:lower:]')"
       case "$author" in
         jssblck) cred="$CODEX_AUTH_JSSBLCK" ;;
         *)
           echo "::error::No Codex credential mapped for PR author '$AUTHOR'. Add a CODEX_AUTH_<LOGIN> secret and a case arm." >&2
           exit 1 ;;
       esac
       if [ -z "$cred" ]; then
         echo "::error::Codex credential for '$AUTHOR' is mapped but its secret is empty." >&2
         exit 1
       fi
       mkdir -p "$HOME/.codex"
       printf '%s' "$cred" > "$HOME/.codex/auth.json"
       chmod 600 "$HOME/.codex/auth.json"
   ```

   Onboarding a contributor is then two reviewed lines: their secret and a `case`
   arm. Because the mapping lives in the workflow, which is a CODEOWNERS-protected
   path (see [Governance](./governance.md)), changing who may spend a subscription is
   itself a human-reviewed change.
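
The naming convention is mechanical enough to script when onboarding several
contributors, with one wrinkle: GitHub secret names may contain only letters, digits,
and underscores, so a hyphenated login such as `jane-doe` cannot be used verbatim.
The helper below is a hypothetical sketch that uppercases both parts and turns every
other character into `_`. That substitution is an assumption of the sketch rather
than anything Bastion parses, so whatever rule you pick, use the same name in the
secret, the step's `env`, and the `case` arm.

```sh
# Hypothetical helper: derive the <BACKEND>_AUTH_<LOGIN> secret name for a
# backend and GitHub login. GitHub secret names allow only [A-Za-z0-9_], so
# every other character (hyphens, brackets) becomes an underscore.
auth_secret_name() {
  backend="$1"
  login="$2"
  printf '%s_AUTH_%s\n' "$backend" "$login" |
    tr '[:lower:]' '[:upper:]' |
    tr -c 'A-Z0-9_\n' '_'
}
```

Storing a contributor's credential then looks like
`gh secret set "$(auth_secret_name codex jane-doe)" < ~/.codex/auth.json`, with
`--app dependabot` added for the Dependabot secret store.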

An author with no mapped secret **fails closed**: the step errors and the gate
blocks, rather than silently billing someone else's subscription. If you would
rather a new contributor never be blocked, point the `*)` arm at a shared metered
**API key** instead of erroring: store the provider's API key as a secret and export
it (for example `CODEX_API_KEY` / `ANTHROPIC_API_KEY`) into the review step rather
than writing an `auth.json`. The same login-to-secret shape applies. Under heavy
volume a subscription's rate limits can throttle reviewers, and because gates fail
closed a throttled reviewer reads as a blocked merge, so some teams use API billing
in CI and keep subscriptions for the local loop.
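
The fallback changes only what happens after the `case`. In this sketch, written as a
function so each branch can be exercised outside CI, a mapped author still fails
closed on an empty secret (the usual sign of a missing Dependabot-store copy), while
an unmapped author falls through to a shared key. `SHARED_CODEX_API_KEY` is a
hypothetical secret name; the key reaches the review step as `CODEX_API_KEY` through
the `$GITHUB_ENV` file that GitHub Actions reads between steps.

```sh
# Sketch of the credential step with a metered fallback instead of fail-closed.
# SHARED_CODEX_API_KEY is a hypothetical secret name; CODEX_AUTH_JSSBLCK follows
# the per-author convention above.
authenticate_codex() {
  author="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$author" in
    jssblck) cred="${CODEX_AUTH_JSSBLCK:-}"; mapped=yes ;;
    *)       cred=""; mapped=no ;;   # unmapped: fall through to the shared key
  esac
  if [ "$mapped" = yes ] && [ -z "$cred" ]; then
    echo "::error::Codex credential for '$1' is mapped but its secret is empty." >&2
    return 1
  fi
  if [ "$mapped" = yes ]; then
    # Per-author subscription: rehydrate the file the Codex CLI reads.
    mkdir -p "$HOME/.codex"
    printf '%s' "$cred" > "$HOME/.codex/auth.json"
    chmod 600 "$HOME/.codex/auth.json"
    echo subscription
  elif [ -n "${SHARED_CODEX_API_KEY:-}" ]; then
    # Metered fallback: hand the key to later steps instead of an auth.json.
    printf 'CODEX_API_KEY=%s\n' "$SHARED_CODEX_API_KEY" >> "$GITHUB_ENV"
    echo api-key
  else
    echo "::error::No credential for '$1' and no shared API key." >&2
    return 1
  fi
}
```

Writing the key to `$GITHUB_ENV` is one option; when every author uses the same key,
mapping the secret straight into the review step's `env:` block works too.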

### Spending a subscription in CI

A ChatGPT or Claude subscription works in CI the same way it does locally: the
backend CLI reads its OAuth `auth.json` and refreshes the token itself. Use the
backend that matches the subscription you have:

- **`backend: codex` with a ChatGPT subscription.** Sign in with `codex login`
  (ChatGPT), store `~/.codex/auth.json` as `CODEX_AUTH_<LOGIN>`, and rehydrate it to
  `$HOME/.codex/auth.json` as shown above. This is the direct path; no Pi involved.
- **`backend: claude-code` with a Claude subscription.** Same shape against the
  `claude` CLI's auth.
- **`backend: pi` with the `openai-codex` provider.** Pi can also spend a ChatGPT
  subscription, through its `openai-codex` provider (`model: openai-codex/gpt-5.5`).
  Reach for this only when you specifically want Pi's multi-provider routing; for
  plain Codex-on-ChatGPT, the native `codex` backend is simpler.

> **The two `auth.json` files are different.** `~/.codex/auth.json` (Codex CLI) and
> `~/.pi/agent/auth.json` (Pi CLI) are distinct file formats backed by the same
> ChatGPT account. The secret you store must match the backend you pin: a Codex
> `auth.json` rehydrated where Pi looks (or the reverse) will not authenticate. Pick
> the backend first, then capture that CLI's file.
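
One cheap guard against pinning the wrong file is to check the credential before
rehydrating it. The sketch below relies only on the detail stated earlier, that a
Codex `auth.json` from a ChatGPT sign-in carries `"auth_mode": "chatgpt"`. It is a
text match rather than a JSON parse (so it needs no `jq` on the runner), and it says
nothing about Pi's format, which this guide does not describe.

```sh
# Heuristic guard, not a validator: succeed only if the credential text looks
# like a Codex auth.json from a ChatGPT sign-in ("auth_mode": "chatgpt").
looks_like_codex_chatgpt_auth() {
  printf '%s' "$1" | grep -Eq '"auth_mode"[[:space:]]*:[[:space:]]*"chatgpt"'
}
```

In the rehydrate step, a line such as
`looks_like_codex_chatgpt_auth "$cred" || { echo "::error::not a Codex ChatGPT auth.json" >&2; exit 1; }`
placed before the `printf` turns a mismatched secret into a clear error. It applies
only to the ChatGPT-subscription path; an API-key setup is expected to fail it.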

### Dependabot and bot authors

Dependabot opens **same-repo** PRs, so they clear the fork guard and Bastion reviews
them like any other PR. With the `permissions:` block the example workflow declares,
the default `GITHUB_TOKEN` posts the `bastion` check on a Dependabot PR, so you can
require it for those PRs too. There is no read-only-token deadlock to work around.
Dependabot has one required difference for everyone and one extra step that applies
only to per-author billing:

- **Secrets come from a separate store (applies to everyone).** GitHub serves
  secrets to Dependabot-triggered runs from a *Dependabot* secret store, not the
  Actions store. Whatever credential your review step reads, an `ANTHROPIC_API_KEY`
  or a per-author `<BACKEND>_AUTH_<LOGIN>`, must be set in that store as well
  (`gh secret set <NAME> --app dependabot`), or it arrives empty on a Dependabot PR
  and the gate fails closed.
- **A bot has no subscription of its own (per-author billing only).** If you map
  per-author credentials, the bot author needs a `case` arm pointing at a maintainer
  who sponsors its reviews, and the bracketed login must be quoted, since `[bot]` is
  a glob character class in a shell `case` pattern:
  `'dependabot[bot]') cred="$CODEX_AUTH_JSSBLCK" ;;`. An arm that maps to an empty
  secret fails closed with a "mapped but empty" error, usually the sign the
  Dependabot-store copy is missing. Billing with a shared API key instead of
  per-author secrets avoids this entirely: there is no per-author arm to maintain.
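
The quoting rule is easy to get wrong because the unquoted pattern still parses; it
just matches the wrong strings. This small demonstration (the function names are
illustrative) shows that an unquoted `dependabot[bot]` is a bracket expression that
matches exactly one of `b`, `o`, or `t`, while the quoted form matches the literal
login:

```sh
# Unquoted: [bot] is a bracket expression, so this matches "dependabotb",
# "dependaboto", or "dependabott", but not the literal "dependabot[bot]".
matches_unquoted() {
  case "$1" in
    dependabot[bot]) echo yes ;;
    *) echo no ;;
  esac
}

# Quoted: the brackets are literal characters.
matches_quoted() {
  case "$1" in
    'dependabot[bot]') echo yes ;;
    *) echo no ;;
  esac
}
```

Both forms run without error, which is why the mistake usually surfaces as the bot
falling through to the `*)` arm and failing closed.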

### Fork-PR safety

GitHub does not expose secrets to workflows triggered by **fork** pull requests, and
an agentic backend should never run over untrusted code with a live credential
anyway. The example workflow guards on
`github.event.pull_request.head.repo.full_name == github.repository`, so it runs for
same-repo PRs only. A fork contribution is reviewed by a maintainer re-running it
from a trusted branch in the repo.

## Environments & inputs

Bastion consumes environments; it does not provision them. A reviewer that needs a
preview URL, a database, or any running dependency expects the workflow to have
stood it up and exposed it. Typically an earlier job deploys a preview environment
for the PR and passes its URL into the Bastion job as an environment variable.

How that variable reaches the agent depends on where the reviewer runs. A **native**
reviewer inherits the job environment, so the agent sees it directly. A
**containerized** reviewer (one with a
[`runner`](./authoring-reviewers.md#runner-and-capabilities) and
`capabilities.network: true`) runs in a container and does *not* inherit the job
environment. Only the reviewer's literal `env` pairs cross that boundary, plus a fixed
provider-credential set (a credential name set in the reviewer's own `env` wins and is
not also forwarded from the job environment). A per-PR value therefore reaches a
containerized reviewer only if you write it into the registry, typically by templating
`.bastion.yaml` before the Bastion job runs.

A reviewer's `env` and `inputs` values are literal (Bastion does not shell-expand
them), so to put a dynamic value into the prompt itself, either template the registry
or have the prompt direct the agent to read the variable. Standing up the environment
is a deploy concern; Bastion's job starts once it exists. (See
[Authoring reviewers](./authoring-reviewers.md#env) for the reviewer side.)
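
Templating can be as small as a substitution that runs before `bastion review`. The
sketch below assumes a hypothetical convention: `.bastion.yaml` is committed with a
placeholder such as `@PREVIEW_URL@` in a containerized reviewer's `env`, and the
workflow replaces it with the deployed URL. The placeholder is not a Bastion feature;
Bastion only ever sees the rendered, literal value. The rendered file is a CI-only
artifact and is not meant to be committed.

```sh
# Hypothetical templating step: replace a @PREVIEW_URL@ placeholder in the
# registry with the URL an earlier deploy job produced. The placeholder is a
# convention of this sketch, not a Bastion feature.
render_registry() {
  file="$1"
  url="$2"
  case "$url" in
    http://*|https://*) ;;
    *) echo "refusing to template a non-URL value: $url" >&2; return 1 ;;
  esac
  # Escape the characters that are special in a sed replacement (\, &, and
  # the | delimiter) so the URL is inserted literally.
  escaped="$(printf '%s' "$url" | sed 's/[\\&|]/\\&/g')"
  # Write to a temp file and move it into place; this avoids sed -i, whose
  # flags differ between GNU and BSD sed.
  sed "s|@PREVIEW_URL@|$escaped|g" "$file" > "$file.rendered" &&
    mv "$file.rendered" "$file"
}
```

In a workflow, this would run in a step just before **Review**, for example
`render_registry .bastion.yaml "$PREVIEW_URL"`, with the URL passed in from the deploy
job. Native reviewers do not need it, since they already see the job environment.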

## Self-hosting note

Bastion dogfoods the adapter through
[`.github/workflows/bastion.yml`](https://github.com/jssblck/bastion/blob/main/.github/workflows/bastion.yml),
running the latest published `bastion` release rather than a binary built from the
PR's own sources, so a change can never edit the engine that judges it. That workflow
is a concrete, self-hosted instance of everything this chapter describes.

---

Next: [Governance](./governance.md). Keeping humans at the policy layer with
CODEOWNERS and branch protection, and the escape-to-improvement loop.

---

<!-- https://bastion.jessica.black/guide/governance -->

# Governance

> Keeping humans at the policy layer: protecting the registry, the
> escape-to-improvement loop, and what Bastion deliberately does not guarantee.

Bastion relocates the human from reviewing diffs to *governing the reviewers*. That
only works if the reviewer policy itself is protected and continuously improved.
This chapter is the human's operating manual.

## The policy layer

The reviewers, their prompts, and their triggers *are* the review policy. The whole
safety story rests on a simple rule: **any change to that policy is reviewed by a
human before it merges.** Otherwise an aligned-but-mistaken agent could quietly
loosen a trigger or soften a prompt, and the gate would erode without anyone
noticing.

Two native GitHub mechanisms enforce this; neither is exotic.

### CODEOWNERS protects the registry

Bastion can generate a CODEOWNERS block covering the reviewer-policy paths: the
registry, the reviewer definitions, the Bastion workflow, and the CODEOWNERS file
itself:

```sh
bastion github codeowners --owner @your-org/platform
```

Pass `--owner` once per owner (it is repeatable). Add the generated block to your
`CODEOWNERS` and enable **Require review from Code Owners** in your default branch's
protection rules; CODEOWNERS on its own only requests reviewers. With both in place,
any PR that adds, removes, or edits a reviewer; loosens a trigger; or changes a prompt
touches an owned path, so GitHub requires a human review before merge. You can also
write your own CODEOWNERS instead; the generated block is a sound starting point.

> Why generate it statically rather than have Bastion manage it live? CODEOWNERS
> changes only take effect *after* a PR merges, so the file must be written to
> protect every path Bastion will ever write into, ahead of time, which is what
> the generated, reviewed block provides.

### Branch protection requires the check

Require Bastion's review on your default branch. That is the review job from
[Continuous integration](./continuous-integration.md#the-workflow), which
also posts the always-present aggregate check named `bastion` (with a check run per
reviewer alongside it), so you can require either the job or that `bastion` check.
A PR then cannot merge with the gate switched off, and
because the workflow file and the registry are themselves owned paths, switching it
off is itself a policy change a human sees.

That is the entire enforcement story, and it is intentionally modest. The
contributor Bastion is designed for is an aligned agent that would never quietly
disable CI; the CODEOWNERS trip wire and the required check exist so that *if*
policy changes, a human is in the loop, not so that a determined adversary is
stopped.

## The escape-to-improvement loop

An **escape** is a PR that merged but should have been blocked: a reviewer missed
something. Escapes are inevitable, especially early while reviewers are still being
tuned, and they are the single most valuable signal for improving the system.

Bastion cannot detect escapes itself: if it could, it would have blocked them. This
is a human governance loop:

1. **Notice** an escape (monitoring, a bug report, a production incident).
2. **Triage** it: which reviewer(s) should have caught it, and why did they not?
3. **Improve** the policy: sharpen a prompt, add a new single-concern reviewer for
   the missed property, or fix the reviewer's environment.
4. **Merge** the policy change (through the CODEOWNERS-gated human review above).

This is why Bastion expects reviewers to improve over time. Start with a reviewer
that is good enough and sharpen it from real escapes instead of perfecting it on
paper. Treat escapes as expected feedback rather than failures, and triage them
regularly so the policy keeps improving.

## What Bastion does not guarantee

Govern with these limits in mind; they are deliberate, not gaps to be closed:

- **It is not a correctness proof.** Bastion does not guarantee code is free of
  bugs or vulnerabilities. A reviewer is only as good as its model and prompt;
  it is code review without the human in the small loop, not a verifier.
- **It does not judge whether the right thing is being built.** That is a
  design-time question; by PR time the ship has sailed. Keep humans in the design
  loop.
- **It is not an adversarial security boundary.** Bastion assumes PR authors are
  aligned contributors and treats reviewed code as trusted input; it does not
  defend reviewer agents against prompt injection or exfiltration from the code
  they review. The bar is *reasonable reduction proportionate to effort*: a speed
  bump and good defaults, like lint and CI and human review before it. Anything
  stronger (signing, external rule storage, an enumerated trusted-computing-base)
  is deliberately out of scope.

These limits follow from one assumption: the threat being managed is an
aligned-but-fallible agent, not a determined adversary. Govern accordingly. Bastion
is a control on honest mistakes and drift, layered with the rest of your CI, not a
boundary that holds against someone actively trying to defeat it.

## A governance checklist

For a healthy deployment:

- [ ] `.bastion.yaml` and the Bastion workflow are CODEOWNERS-protected, and branch
      protection requires code-owner review.
- [ ] Bastion's review is required by branch protection on the default branch (the
      review job, or the aggregate `bastion` check that `bastion github report` posts).
- [ ] Reviewer-policy PRs get a real human review, not a rubber stamp.
- [ ] Someone owns escape triage, and escapes feed back into reviewer changes.
- [ ] Billing is configured (per-author secrets or an API-key fallback) so reviews
      are not silently blocked by missing credentials. See
      [Continuous integration](./continuous-integration.md#authentication--billing).

---

That is the guide. If you want to work on Bastion itself rather than use it, the
design notes and contributor docs live in the
[Bastion repository](https://github.com/jssblck/bastion).