# Bastion user guide (full)

> The complete user guide, concatenated in reading order. Canonical pages live under https://bastion.jessica.black/guide.

# Bastion user guide

> Agentic code review for a world where agents write all of the code.

This guide teaches you how to use Bastion on your own project: what it is, how to run it, how to write reviewers, and how to wire it into CI and governance. It is written for two audiences at once (the human curating the review policy and the agent looping against it), because Bastion runs the repository's reviewers and merge gate for both through whatever surface is natural to each (CI can add the PR's description and discussion to the reviewers' context, and a purely local run can add an author's personal user-level reviewers, which CI never sees).

This guide is self-contained: everything you need to run Bastion, write reviewers, and wire it into CI is here, with nothing essential living elsewhere. If you want to work on Bastion itself rather than use it, the contributor and design docs live in the [Bastion repository](https://github.com/jssblck/bastion).

> **Reading this as an agent?** The whole guide is also served as a single plain-text
> file at [`bastion.jessica.black/llms-full.txt`](https://bastion.jessica.black/llms-full.txt),
> so you can ingest every chapter in one fetch instead of crawling pages.

## Read in order

The chapters build on each other. If you read them top to bottom you will go from "what is this" to "running it in CI with a governed policy" without backtracking.

1. **[Introduction](./introduction.md)**: the problem Bastion solves, the core idea (reviewers as fitness functions), and the mental model. Start here.
2. **[Getting started](./getting-started.md)**: install the CLI, write your first reviewer, and run your first review in about five minutes.
3. **[Concepts](./concepts.md)**: reviewers, triggers, modes, the verdict, and the merge gate. The vocabulary the rest of the guide assumes.
4. **[Authoring reviewers](./authoring-reviewers.md)**: the registry schema in full, from the four required fields to timeouts, backends, environment, and prompt inputs. How to write a reviewer that stays at high recall.
5. **[The local workflow](./local-workflow.md)**: the `bastion review` loop in depth: human output vs. the JSONL agent stream, exit codes, and inspecting saved runs (`runs`, `show`, `transcript`, `clean`).
6. **[Continuous integration](./continuous-integration.md)**: promoting your repository's reviewers into GitHub Actions: checks, the aggregate gate, and per-author billing.
7. **[Governance](./governance.md)**: keeping humans at the policy layer with CODEOWNERS and branch protection, the escape-to-improvement loop, and what Bastion deliberately does not guarantee.

## In a hurry: set up Bastion in CI

If your goal is "get Bastion reviewing pull requests on GitHub," here is the whole path; each step links to its details:

1. **Install the CLI and pick a backend.** [Getting started](./getting-started.md) (a subscription works; no API key required).
2. **Write `.bastion.yaml`** at your repo root with one or two reviewers, and check it with `bastion validate`. [Authoring reviewers](./authoring-reviewers.md). To pin a model like `gpt-5.5:high`, set `model:` and `effort:` separately under a pinned `backend:`.
3. **Add the workflow** and the per-author auth step. [Continuous integration](./continuous-integration.md#the-workflow). The complete, copy-pasteable auth recipe (the `_AUTH_` secret convention, the `case`-arm mapping, Dependabot, and fork safety) is in [Authentication & billing](./continuous-integration.md#authentication--billing).
4. **Protect the policy and require the check.** [Governance](./governance.md): CODEOWNERS over `.bastion.yaml` and the workflow, and branch protection requiring the aggregate `bastion` check.

## The one-paragraph version

You declare **reviewers** (focused agent prompts, one concern each) in `.bastion.yaml`.
Each reviewer has a **trigger** (file globs) and a **mode** (`gate` blocks the merge, `advisor` only comments). `bastion review` finds the reviewers whose triggers match your working-tree changes, runs them in parallel, and aggregates their verdicts into one decision: all gates must pass. A local run can also merge in personal reviewers from a user-level `.bastion.yaml`, so you can run a reviewer locally even where a repo has not adopted Bastion. An authoring agent loops `bastion review` until it is green, then opens a PR where CI runs the repository's reviewers (the user-level ones are local-only). CI usually confirms the result, and can differ when it adds the PR's description and discussion to the reviewers' context. Humans stay in the loop by owning the reviewer registry, not by reading every diff.

## Status

Bastion is experimental and still partial. The routing, runner, verdict aggregation, and on-disk run store are implemented and tested, and the Claude Code, Codex, and Pi backends execute reviewers for real, either natively or, when a reviewer declares a `runner` and opts into `capabilities.network: true`, inside a container. The remaining capability fields (`mcp` and `skills`) are accepted but not provisioned, so a reviewer that opts into one fails closed rather than running without it. A containerized reviewer must opt into `network: true`, which grants it general (unscoped) egress because provider-only scoping is unbuilt. A container left at the default `network: false` is rejected before it runs, so a gate blocks and an advisor is skipped. These limits are called out where they appear in [Authoring reviewers](./authoring-reviewers.md).

---

# Introduction

> Why Bastion exists, and the one idea you need to hold in your head.

## The problem

Agents write most of the code on a growing number of teams. When they are fully unlocked, output volume looks more like *engineers x 100* than *x 1*.
Two things stop teams from unlocking that:

- **Human diff review does not scale.** Asking a 5-person team to review their agents' output is like asking 5 people in a 500-person org to review the other 495. You cannot fix that by trying harder.
- **Without review, codebases rot.** Things go fine until they do not, and then you have a ball of mud nobody can work in.

The usual shape of agentic review hands the whole diff to one reviewer that checks everything and writes comments designed for a person to act on. As you ask one generic reviewer to check more things, its recall on any single one degrades. A one-item checklist agent works; at ten items it is weaker; at a hundred it fails.

## The core idea

In Bastion, a reviewer is a **focused fitness function** (an automated check that continuously asserts one property holds as the system evolves), and review is the **author agent's loop taken to its conclusion**. An authoring agent already loops against the compiler, the linter, and the tests. Bastion adds loops whose oracle is *another agent*, one that encodes judgment a compiler or a test cannot.

The whole system follows from five principles:

1. **One concern per reviewer.** Single-responsibility reviewers stay at high recall and confidence. The unit of the system is *the reviewer*, not *the review*. You cover more ground by adding narrow reviewers, never by broadening one. A cross-cutting property like tenant isolation or migration safety is not special; it is just another reviewer whose single concern is that property.
2. **Reviewers run in the author's own loop, not only in CI.** The repository's reviewers run locally (fast, pre-PR) and in CI (authoritative), so CI usually confirms a green local loop.
   The two can differ when CI feeds reviewers the PR's description and discussion that a default local run lacks, and a purely local run can also include your personal user-level reviewers, which CI never runs (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).
3. **Humans sit at the policy layer.** The goal is not human-out-of-the-loop. It is to move the human from reviewing diffs to *authoring, curating, and governing reviewers*, plus triaging escapes (bugs that slipped through a review that should have caught them). Your interface becomes the reviewer registry, not the diff.
4. **Aligned agents can still inadvertently game the system.** Bastion tolerates this and makes it *visible* and *easy to correct* by adjusting reviewers, rather than trying to make gaming impossible (which would give up the benefits of agentic development entirely).
5. **Reviewers converge through use.** Ship a reviewer that is good enough, then improve it from the escapes you actually hit, rather than trying to design a perfect one up front. The escape-to-improvement loop is where that happens.

## The mental model

Picture the way a good team did code review before agents:

> An author opens a PR. A reviewer reads it, leaves feedback (some blocking, some
> optional) and withholds approval until satisfied. The author addresses the
> blocking items (by changing the code, or by convincing the reviewer the code is
> already right) and requests re-review. Repeat until approved.

Bastion brings *that* process to the agent era. The reviewers play the colleague's role, their verdicts are the feedback, and the author agent resolves the blocking items and re-runs. The human is still in charge, but of the reviewers, not of every line.

## What Bastion is not

Two non-guarantees are deliberate. Keep them in mind before you adopt it:

- **No guarantee of correctness.** Bastion does not prove your code is free of bugs or vulnerabilities.
  It is code review without the human in the small loop; a reviewer is only as good as its model and its prompt.
- **No guarantee the right thing is being built.** Catching "this is the wrong thing to build" was never review's job. By PR time that ship has sailed; it is a design-time question. Keep humans in the design loop.

Bastion is also **not an adversarial security boundary**. It is the agent-era equivalent of team code review for aligned contributors: a speed bump and a set of good defaults that keep earnest actors on the rails, not a defense against a determined malicious one. The practical consequences for you, and how to govern within these limits, show up in [Governance](./governance.md).

---

Next: [Getting started](./getting-started.md) -> install the CLI and run your first review.

---

# Getting started

> Install Bastion, write one reviewer, and run your first review.

This chapter gets you from nothing to a working review loop. It assumes you have a git repository and one of the supported agent backends installed (the Claude Code, Codex, or Pi CLI).

A little vocabulary shows up here in passing: *reviewer*, *gate*, *advisor*, *verdict*, *findings*. The inline definitions are enough to follow along; the next chapter, [Concepts](./concepts.md), defines each precisely.

## 1. Install the CLI

The quickest path is the install script. It detects your platform, downloads the matching archive from the latest [GitHub release](https://github.com/jssblck/bastion/releases), verifies its SHA-256 checksum, and puts `bastion` on your `PATH`.

On Linux and macOS:

```sh
curl -sSfL https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.sh | bash
bastion --version
```

On Windows, from PowerShell:

```powershell
irm https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.ps1 | iex
bastion --version
```

The shell installer takes `-v/--version`, `-b/--bin-dir`, `-t/--tmp-dir`, and `-l/--libc` (pass them after `bash -s --`); the PowerShell installer reads the `Version` and `BinDir` environment variables. Pass `--help` (or set `$env:Help="true"`) to see them all.

On Linux the installer autodetects the C runtime: it picks the statically linked musl build on musl systems and on any host whose glibc is older than 2.35 (or undetectable), and the glibc build only when the host glibc is 2.35 or newer (Ubuntu 22.04, Debian 12, RHEL 9, and later). Force the choice with `--libc gnu|musl` (or `BASTION_LIBC=...`) when you want to override it, for example to take the portable musl build everywhere:

```sh
curl -sSfL https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.sh | bash -s -- --libc musl
# ...or, without the `-s --` dance, via the environment:
curl -sSfL https://raw.githubusercontent.com/jssblck/bastion/main/scripts/install.sh | BASTION_LIBC=musl bash
```

Prefer to grab the archive yourself? Prebuilt binaries are attached to every release for Linux (x86_64 and aarch64, glibc and musl), macOS (Intel and Apple silicon), and Windows (x86_64). Download the one for your platform, extract it, and put `bastion` on your `PATH`:

```sh
# Example: Linux x86_64
curl -sSL https://github.com/jssblck/bastion/releases/latest/download/bastion-x86_64-unknown-linux-gnu.tar.gz | tar -xz
sudo install bastion-x86_64-unknown-linux-gnu/bastion /usr/local/bin/
bastion --version
```

On a system with glibc older than 2.35, swap `gnu` for `musl` in those URLs to get the static build.

Prefer to build from source?
You need a Rust 2024 toolchain:

```sh
cargo build --release
./target/release/bastion --version
```

`bastion --version` reports a release tag when one is reachable, otherwise the short commit SHA, with a `-dirty` suffix when the tree has uncommitted changes.

## 2. Make sure the backend is ready

Bastion does not run its own agent loop. It shells out to an existing coding-agent CLI and reuses whatever you already have configured locally, so your billing and auth come along for free. Install and sign in to one of:

- **[Claude Code](https://docs.claude.com/en/docs/claude-code)** (`claude`): the default when a reviewer does not pin a backend.
- **[Codex](https://github.com/openai/codex)** (`codex`): pin it with `backend: codex` on a reviewer.
- **[Pi](https://github.com/earendil-works/pi)** (`pi`): pin it with `backend: pi`. Pi runs against whatever provider you have configured it with locally, unless a reviewer pins a `model` (Pi's `provider/id` form, which selects the provider too).

A **subscription** is fine; you do not need an API key. Because Bastion just runs the CLI, whatever you signed in with works: a ChatGPT subscription through `codex`, a Claude subscription through `claude`, and so on. The CLI reads its own auth file (`~/.codex/auth.json`, `~/.claude`) and refreshes its token itself. Getting that same subscription to bill the right person in CI is its own step, covered in [Continuous integration](./continuous-integration.md#authentication--billing).

Bastion invokes the backend as a plain executable on your `PATH` (`claude`, `codex`, or `pi`), so confirm the one you intend to use is installed and authenticated before running a review:

```sh
claude --version   # for the Claude Code backend
codex --version    # for the Codex backend
```

If the binary lives elsewhere or you want to point at a wrapper, set `BASTION_CLAUDE_BIN` or `BASTION_CODEX_BIN` to its path.

That covers the default, **native** path.
If you author a reviewer with a [`runner`](./authoring-reviewers.md#runner-and-capabilities), that reviewer runs its backend inside a container instead. It must opt into `capabilities.network: true`; without it the reviewer is rejected before it runs, so a gate blocks and an advisor is skipped. A `runner` reviewer needs a container engine on the host rather than the backend CLI: Bastion shells out to `docker` by default (set `BASTION_CONTAINER_ENGINE` to use another, for example `podman`), and the backend CLI (`claude` / `codex`) must be present inside the image. A fixed set of provider credential variables (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, and the like) is forwarded from your environment into the container by name so the in-container agent can authenticate; host CLI auth that lives in a file (`~/.claude`, `~/.codex/auth.json`) is not forwarded, so an image that relies on it should bake it in. You only need this once you start using `runner` reviewers; the quickstart below stays native.

## 3. Write your first reviewer

Reviewers live in a declarative file at your repository root: `.bastion.yaml` (the `.bastion.yml` spelling is also honored). Bastion discovers it by walking up from your current directory, so you can run `bastion` from anywhere inside the repo.

You can also keep personal reviewers in a user-level `.bastion.yaml` in your platform config directory; a local `bastion review` merges them with the repository's, which lets you run a reviewer locally even in a repo that has not adopted Bastion (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

Create the repository file:

```yaml
# .bastion.yaml
reviewers:
  - name: single-responsibility
    trigger: [src/**/*.rs]   # which changed files wake this reviewer
    mode: gate               # gate = blocks the merge; advisor = comments only
    prompt: |
      Review the changeset to determine whether any one file concentrates
      too many unrelated responsibilities.
      If a file has clearly taken on multiple distinct concerns that should be
      separate modules, block the PR and name the file(s) and the concerns;
      otherwise approve it. A single large but cohesive module is not a violation.
```

That is a complete reviewer. Four fields carry the meaning: a unique `name`, the `trigger` globs over your changed files, the `mode`, and the `prompt`. Everything else has a sensible default. The next chapter, [Concepts](./concepts.md), explains each of these; [Authoring reviewers](./authoring-reviewers.md) covers the full schema.

> Adapt the trigger to your language: `src/**/*.ts`, `app/**/*.py`, and so on. The
> glob matches against the paths git reports as changed.

## 4. Run a review

Make a change in your working tree (you do not need to commit it; Bastion reviews the working tree, including uncommitted and untracked files), then:

```sh
bastion review --base main
```

Bastion computes the files that differ from `main`, selects the reviewers whose triggers match, runs them in parallel, and renders progress and verdicts. A blocked review exits non-zero; a clean one exits zero. That exit code is what lets an agent (or a shell loop) know whether to keep working:

```sh
while ! bastion review --base main; do
  # ... fix what blocked, then loop ...
  :  # no-op so the loop body stays valid shell until a real fix step replaces it
done
```

## 5. Read it as a machine stream

An agent driving the loop wants structured events, not rendered text. Ask for JSONL: one JSON object per line, emitted as each thing happens:

```sh
bastion review --base main --format jsonl
```

You will get one typed event per line as the run progresses, ending in a `run.completed` that carries the aggregate verdict. The [local workflow](./local-workflow.md) chapter documents every event type and the exact contract an agent should follow when consuming them.

## 6. Look at what was saved

Every run is persisted.
Inspect history without re-running anything:

```sh
bastion runs         # list recent runs and their verdicts
bastion show         # re-print the latest run's findings
bastion transcript   # the full agent session for one reviewer
```

These are the on-demand detail; the common loop never needs them, but they are one command away when a verdict surprises you. (`show` and `transcript` default to the latest run; pass a run id for an older one, and the full forms are in [the local workflow](./local-workflow.md).)

## 7. Teach your agents to use Bastion

You just drove the loop by hand. The point, though, is for your *coding agents* to drive it themselves: run the review, read the findings, fix what blocks, and reach a green gate before they ever open a PR. Bastion ships that instruction as a skill you install into the repo and commit, so every agent picks it up on checkout:

```sh
bastion skills install
```

This writes a `using-bastion` skill into both `.claude/skills/` (Claude Code's native skill path) and `.agents/skills/` (the agent-neutral convention). Commit the result:

```sh
git add .claude/skills .agents/skills
git commit -m "Install the bastion onboarding skill"
```

The skill is generated from the binary, so re-running install after you upgrade Bastion keeps the checked-in copy current. To confirm it has not drifted from the binary (handy as a CI guard), run:

```sh
bastion skills check   # exits non-zero if a skill is missing or has drifted
```

The rendered file is deterministic (no version stamp or timestamp), so `check` stays green across upgrades that do not change the skill text and only flags real drift: a hand edit, or a forgotten re-install after the skill itself changed. When you do upgrade, re-run `bastion skills install` to refresh, or `bastion skills install --force` if you have local edits to overwrite.

See what is bundled with `bastion skills list`, and install into a different directory with `--dir ` (repeatable).

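The drift check described above can be wrapped as a small CI guard. This is a minimal sketch, not something Bastion ships: the function name `skills_guard` and its messages are invented for illustration, and the only behavior it relies on is that `bastion skills check` exits non-zero when a skill is missing or has drifted.

```sh
# Hypothetical CI guard (the name and messages are illustrative, not part of Bastion).
# It relies only on the exit code of `bastion skills check`.
skills_guard() {
  if bastion skills check; then
    echo "bastion skills: up to date"
  else
    # Point the author at the fix named in the guide: re-install, then commit.
    echo "bastion skills: missing or drifted; run 'bastion skills install' and commit the result" >&2
    return 1
  fi
}
```

Calling `skills_guard` as a step in a CI job would then fail that job on drift while leaving the remedy in the log.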
## Keeping scratch runs out of your history

While you are experimenting, point Bastion at a throwaway data directory so trial runs do not pile up in your real run history:

```sh
bastion --data-dir /tmp/bastion-scratch review --base main
```

The same override is available as the `BASTION_DATA_DIR` environment variable.

Note that `bastion review` always runs your reviewers on a real backend: there is no built-in mode that fabricates verdicts without an agent, so a review still costs a model call. To keep cost down while iterating, start with one cheap, fast reviewer and a tight `timeout`.

## When something goes wrong

The most common first-run snags and what they mean:

- **"no reviewer registry found ..."**: there is no `.bastion.yaml` (or `.bastion.yml`) in this repo or any ancestor, and no user-level one in your config directory either. The command searches both and only errors when both are absent, so create a repository registry (step 3) or a personal one.
- **A reviewer registry error (malformed YAML, duplicate name, missing field).** The registry is validated before any agent runs, so these fail fast with a clear message. Run `bastion validate` (no model call) to check the merged set a local review would run, or `bastion validate path/to/.bastion.yaml` to check one file on its own; fix it and re-run. See [Authoring reviewers](./authoring-reviewers.md).
- **The review blocks immediately with "did not produce a verdict".** A gate failed closed, usually because the backend binary is missing or unauthenticated. Re-check `claude --version` / `codex --version` and that you are signed in (step 2).
- **No reviewers ran (a trivial pass).** Nothing in your changeset matched any reviewer's `trigger`. Confirm you actually changed a file the globs cover, and that `--base` points at the right branch.
- **Everything looks unchanged.** Bastion diffs against `--base` (default `main`); if your base branch has a different name, pass it explicitly.

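Two of the snags above (a missing registry, a missing backend binary) can be ruled out before spending a model call. The sketch below is an illustration only, not Bastion's implementation: `find_registry` and `backend_ready` are hypothetical helper names, the lookup covers only the repository registry (not the user-level or legacy locations), and trying `.bastion.yaml` before `.bastion.yml` is an assumption.

```sh
# Hypothetical helper: walk up from a directory (default: the current one)
# and print the first repository registry found.
find_registry() {
  dir=${1:-$PWD}
  while :; do
    # Assumption: `.bastion.yaml` is tried before `.bastion.yml`.
    for name in .bastion.yaml .bastion.yml; do
      if [ -f "$dir/$name" ]; then
        printf '%s\n' "$dir/$name"
        return 0
      fi
    done
    parent=$(dirname "$dir")
    # Stop once dirname no longer changes the path (the filesystem root).
    if [ "$parent" = "$dir" ]; then
      return 1
    fi
    dir=$parent
  done
}

# Hypothetical helper: succeed when the named backend CLI resolves on PATH.
# This checks presence only; it cannot tell whether you are signed in.
backend_ready() {
  command -v "$1" >/dev/null 2>&1
}
```

A preflight might run `find_registry >/dev/null || echo "no repository registry here"` and `backend_ready claude || echo "claude is not on PATH"`. A binary supplied through `BASTION_CLAUDE_BIN` or `BASTION_CODEX_BIN` would not show up in the `PATH` check, and `bastion validate` remains the authoritative check of the registry itself.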

---

You now have a working reviewer and a review loop. Next: [Concepts](./concepts.md). The vocabulary (triggers, modes, verdicts, the gate) the rest of the guide builds on.

---

# Concepts

> The vocabulary Bastion runs on: reviewers, triggers, modes, verdicts, and the
> merge gate.

This chapter defines the terms the rest of the guide uses. It is short on purpose; each idea has a deeper home later, linked as it comes up.

## The reviewer

A **reviewer** is the unit of the system: a focused agent prompt responsible for exactly one property of a changeset. It is a bundle of *prompt + trigger + mode*, plus an optional execution profile (backend, timeout, environment, inputs, a container `runner`, and `capabilities`, among others). All of it is declared statically in `.bastion.yaml`; [Authoring reviewers](./authoring-reviewers.md) is the full field reference. The repository's `.bastion.yaml` is the shared, governed set; locally you can also keep personal reviewers in a user-level `.bastion.yaml`, and `bastion review` runs the merged set (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

Two properties matter most:

- **Single concern.** A reviewer checks one thing and checks it well. You scale coverage by adding reviewers, never by widening one. This is what keeps recall high (see [Introduction](./introduction.md#the-core-idea)).
- **Declarative and static.** Reviewers are data, not code. Bastion never generates them on the fly. That keeps the trigger set stable and makes every reviewer reviewable, which is the foundation of [governance](./governance.md).

## The trigger and the changeset

A reviewer's **trigger** is a list of path globs. A reviewer runs only when at least one changed file matches one of its globs. That is what makes a hundred reviewers cheap: a docs-only change wakes the docs reviewers and nothing else.

```yaml
trigger: [src/server/**, src/client/**]   # runs when server or client code changed
```

The **changeset** is everything in your working tree that differs from the base branch, *including uncommitted edits and new untracked files*, not just committed history. This is deliberate: it lets an author loop against reviewers before committing anything. (Locally, this means a reviewer sees your work in progress; in CI the head is already committed, so the same definition gives the same result.)

## The mode: gate vs. advisor

Every reviewer has a **mode** that decides whether it can block a merge:

| Mode | Blocks the merge? | On crash/timeout/bad output |
| --- | --- | --- |
| `gate` | Yes, when it returns `block` | **Fails closed**: resolves to `block` |
| `advisor` | No, never | **Fails open**: ignored in the aggregate |

A **gate** is a hard requirement: it must produce a clean `pass` for the merge to proceed. If it crashes, times out, or cannot produce a valid verdict, it resolves to a block, never a silent pass. An **advisor** comments but never holds up the merge; even a clean `block` verdict from an advisor is treated as a pass for aggregation (its findings still surface). A failed advisor is dropped.

Use a gate for properties that must hold (tenant isolation, fail-closed error handling). Use an advisor for guidance you want surfaced but not enforced (test coverage, doc gaps, style preferences).

## The verdict

Every reviewer returns a structured **verdict**, captured through the backend's structured-output mechanism (a JSON schema for Claude Code, a requested verdict block for Codex) so Bastion can parse and aggregate it:

```yaml
verdict: pass | block        # the authoritative gate decision (ignored for advisors)
summary: "..."               # a human-friendly one-paragraph explanation
findings:                    # specific, located comments
  - kind: blocking           # blocking | optional
    path: src/server/db.rs
    line_start: 88
    line_end: 91
    detail: "scope this query by tenant_id"
```

The top-level `verdict` is the decision; `findings` explain it. A `block` should carry at least one `blocking` finding (the reason), and a `pass` may still carry `optional` findings as non-blocking suggestions. A finding's `kind` changes how it is *surfaced*, not whether the merge proceeds; only `verdict` decides that.

**Findings are the actionable surface.** An agent fixing a PR gets everything it needs from the findings: a file, a line range, and what to change. It should never have to open a transcript to learn what to do.

A reviewer reports the complete actionable set in one pass, one finding per distinct instance, not just one representative reason. The author can then fix everything from a single run instead of meeting the next issue on the following review cycle. Bastion requests this from every reviewer automatically, so a prompt does not need to ask for it.

## The merge gate

Bastion runs all matched reviewers in parallel (they have wildly different latencies: one might take 90 seconds, another 15 minutes) and **aggregates** their verdicts into a single decision:

- **All gates must pass.** The aggregate is `pass` only when every gate returned a clean `pass`.
- **Any blocked, errored, or timed-out gate blocks the aggregate.** "All gates pass" never includes a gate that failed to produce a verdict.
- **Advisors never affect the aggregate.** They contribute findings, not gate decisions.

Locally, that aggregate is the exit code of `bastion review`. In CI it is the result of the Bastion review job, and `bastion github report` also posts it as a single always-present check named `bastion`. Either way the aggregation rule is the same, and CI runs the repository's reviewers.
The decision matches when both runs see the same reviewers and context. Two things can make a local run differ: CI can add the PR's description and discussion, which a default local run does not see, and a purely local run can include your personal user-level reviewers, which CI never runs (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

## The backend

A **backend** is the agent harness a reviewer runs on. Bastion does not implement its own agent loop; it translates the reviewer into the backend's native config and shells out to its CLI, reusing your local auth and billing.

- `any` (the default): Bastion chooses; that resolves to Claude Code.
- `claude-code`: Anthropic's Claude Code CLI.
- `codex`: OpenAI's Codex CLI.
- `pi`: the Pi CLI; uses whatever provider you have configured it with locally, unless a reviewer pins a `model` (Pi's `provider/id` form selects the provider too).

You pin a backend when a subscription's terms require a specific harness, or when one model is better at a given concern. See [Authoring reviewers](./authoring-reviewers.md#backend) and, for CI billing, [Continuous integration](./continuous-integration.md#authentication--billing).

By default the backend CLI runs **natively** on the host, using the `claude` or `codex` already on your `PATH` and the auth and billing that CLI is configured with. A reviewer that declares a [`runner`](./authoring-reviewers.md#runner-and-capabilities) instead runs that same backend **inside a container** (which requires `capabilities.network: true`; without it the reviewer is rejected before it runs, so a gate blocks and an advisor is skipped): Bastion invokes the container engine on the host, and the backend CLI resolves inside the image.
A fixed set of model-provider credential variables (`ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_BASE_URL`, `ANTHROPIC_MODEL`, `CLAUDE_CODE_OAUTH_TOKEN`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `CODEX_API_KEY`) is forwarded from Bastion's environment into the container by name, so the in-container agent can still reach its provider; an image can also bake in its own auth. If the reviewer's own `env` sets one of those names, that value wins and the host's is not also forwarded, so the reviewer can pin a specific credential. Nothing else from your host environment crosses that boundary. To give the in-container agent another value, set it as a literal in the reviewer's `env`, which is forwarded in alongside the credentials.

## How it all fits

```text
.bastion.yaml                 you author this
      |
      v
bastion review ---> compute changeset (working tree vs base)
      |
      v
route: select reviewers whose trigger globs match
      |
      v
run matched reviewers in parallel (each on its backend, each timeout-bounded)
      |
      v
each returns a verdict (pass/block + summary + findings)
      |
      v
aggregate: all gates must pass ---> one decision
                                    (exit code locally; the review gate in CI)
```

---

Next: [Authoring reviewers](./authoring-reviewers.md). The full registry schema, from the four required fields out to timeouts, environment, and prompt inputs.

---

# Authoring reviewers

> The registry schema in full, and how to write a reviewer that stays sharp.

Reviewers are the whole policy. This chapter is the reference for writing them: the file, the required fields, the optional execution profile, and the craft of a prompt that keeps recall high. It progresses from the minimum you need to the fields you will reach for only occasionally.

## The registry file

The repository's reviewers live in one file at its root: `.bastion.yaml` (the `.bastion.yml` spelling is also honored). Bastion finds it by walking up from the current directory, so the command works anywhere inside the repo.
The file is a single `reviewers:` list:

```yaml
reviewers:
  - name: single-responsibility
    trigger: [src/**/*.rs]
    mode: gate
    prompt: |
      ...

  - name: test-coverage
    trigger: [src/**/*.rs]
    mode: advisor
    prompt: |
      ...
```

Reviewer **names must be unique** within the file; a duplicate name is a load error. A name also has to work as a directory name in the run store, so a name that reduces to an empty, `.`, or `..` component is rejected, as are two names that collapse to the same component once non-portable characters are normalized (for example `repo:test` and `repo-test`); plain names are unaffected.

Because this file *is* the review policy, changes to it should require human review; see [Governance](./governance.md) and `bastion github codeowners`.

> **Migrating from `bastion/reviewers.yaml`.** Bastion still loads the legacy
> `bastion/reviewers.yaml` location but prints a deprecation warning; the supported
> location is `.bastion.yaml` at your repository root. Move the file (the contents
> are unchanged) and regenerate your CODEOWNERS block with
> `bastion github codeowners`.

## User-level reviewers

You can also keep personal reviewers in a user-level `.bastion.yaml` (or `.bastion.yml`) in your platform config directory, so a reviewer you rely on runs locally whether or not a given repository has adopted Bastion:

- Linux: `$XDG_CONFIG_HOME/bastion`, defaulting to `~/.config/bastion`.
- macOS: `~/Library/Application Support/bastion`.
- Windows: `%APPDATA%\bastion`.

When both files exist, a local `bastion review` merges the repository's reviewers with your user-level ones into one set, by reviewer name:

- A reviewer only one file defines is included as-is.
- The same reviewer in both files is deduplicated to one. Sameness is compared by the *effective* configuration after each file's registry `defaults` are applied, so a reviewer that inherits a default `model` or `effort` and one that spells out the same value count as identical.
- A name in both files with a *different* effective configuration is a collision; both are kept, your copy under its plain name and the repository's scoped to `repo:`, so neither silently wins. The two files are governed separately, so the collision is surfaced rather than resolved by precedence.

This layer is local-only. A review carrying a GitHub source (with `--repo`/`--pr`, as CI runs) skips the user-level registry, so a pull request is gated by the repository's reviewers alone, the `repo:` scope never appears there, and a personal reviewer can never gate someone else's change.

`--config-dir ` (or `$BASTION_CONFIG_DIR`) overrides where the user-level file is read from.

## Registry-wide defaults

An optional top-level `defaults:` block sets a house `model` and `effort` that every reviewer inherits unless it sets its own. A reviewer's explicit field always wins; the default just fills the gap, so you set the model and effort once instead of repeating them on every reviewer:

```yaml
defaults:
  model: gpt-5
  effort: high

reviewers:
  - name: single-responsibility
    trigger: [src/**/*.rs]
    mode: gate
    backend: codex   # required: an inherited model needs a pinned backend
    prompt: |
      ...
```

A default `model` is still backend-specific, so a reviewer that inherits it must pin a `backend`; an inherited model under `backend: any` is rejected the same way an explicit one is. `defaults` sits *above* each backend's own built-in default (Opus 4.8 at `high` effort on Claude Code), so the resolution order is: the reviewer's own field, then `defaults`, then the backend default.

## The required fields

Four fields are mandatory. A reviewer with just these is complete and runnable.

### `name`

A unique identifier. It is also the reviewer's check-run name in CI (`bastion / single-responsibility`), so keep it short and descriptive.

### `trigger`

A list of path globs matched against the changed files. The reviewer runs if any changed file matches any glob.
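
As a rough illustration of that routing rule, the Python sketch below translates a trigger glob into a regular expression and reports whether any changed file matches any glob. It is not Bastion's matcher: `glob_to_regex` and `reviewer_matches` are hypothetical names, and the sketch assumes repository-relative paths with `/` separators, that `**` crosses directory boundaries (with `**/` also matching zero directories), and that `*` stays within one path segment.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a trigger glob into a compiled regex (sketch semantics only)."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append("(?:.*/)?")   # any number of directories, including none
            i += 3
        elif glob.startswith("**", i):
            parts.append(".*")         # anything, across directory boundaries
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")      # anything within a single path segment
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def reviewer_matches(trigger: list[str], changed_files: list[str]) -> bool:
    """True when any changed file matches any glob in the trigger list."""
    patterns = [glob_to_regex(glob) for glob in trigger]
    return any(p.fullmatch(path) for p in patterns for path in changed_files)
```

Under these assumptions, `reviewer_matches(["src/**/*.rs"], ["docs/guide.md"])` is false, so a docs-only change would leave a Rust-only reviewer asleep.
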
Globs use the usual `**` (any depth) and `*` (one segment) syntax:

```yaml
trigger: [src/**/*.rs]                                  # all Rust under src, any depth
trigger: [src/server/**, src/client/**]                 # either subtree
trigger: [src/**/*.rs, docs/**/*.md, ".bastion.yaml"]   # multiple kinds
```

Quote a glob if YAML would otherwise mis-parse it (a bare leading `*`, for instance). Scope triggers tightly: a narrow trigger is what keeps an irrelevant reviewer from waking on every change.

### `mode`

`gate` (blocks the merge when it returns `block`; fails closed) or `advisor` (never blocks; fails open). See [Concepts](./concepts.md#the-mode-gate-vs-advisor) for the full semantics.

### `prompt`

The instruction handed to the reviewing agent. This is where the craft lives; see [Writing a good prompt](#writing-a-good-prompt) below.

## The optional execution profile

The remaining fields tune *how* a reviewer runs. All have defaults; omit them until you need them.

### `backend`

Which agent harness runs the reviewer. Default `any` (resolves to Claude Code). Pin `claude-code`, `codex`, or `pi` to force a specific harness, usually because a subscription's terms require it, or because one model is better at a given concern.

```yaml
backend: codex
```

> `pi` is multi-provider. Pin its provider and model together in the [`model`](#model)
> field using Pi's `provider/id` form (e.g. `openai-codex/gpt-5.5`); omit `model` to
> run against whatever provider and model your local Pi CLI defaults to.

### `model`

The specific model the backend should use, for example `claude-opus-4-8` on Claude Code or `gpt-5` on Codex. A model id is **backend-specific**, so pinning one requires a pinned `backend`: a `model` under `backend: any` is rejected when the registry loads, since Bastion cannot know which backend the id is meant for.
```yaml
backend: codex
model: gpt-5
```

Under `backend: pi` the model also names its **provider**, written in Pi's `provider/id` form, because Pi is multi-provider and its bare default provider is `google`. So a Pi reviewer that wants an OpenAI Codex model writes the provider into the id rather than a separate field:

```yaml
backend: pi
model: openai-codex/gpt-5.5
```

Omit it to take the backend's default. On Claude Code that default is **Opus 4.8**; on Codex and Pi it is whatever the harness itself resolves (for Pi, its configured default provider and model). To set a model once for the whole registry rather than per reviewer, use the [`defaults`](#registry-wide-defaults) block.

### `effort`

The reasoning-effort level, forwarded verbatim to the active backend's effort control (Claude Code's `--effort`, Codex's `model_reasoning_effort`, Pi's `--thinking`). Like `model`, the value is opaque: use whatever vocabulary your backend accepts. Claude Code takes `low`, `medium`, `high`, `xhigh`, or `max`; Codex takes `minimal`, `low`, `medium`, or `high`; Pi takes `off`, `minimal`, `low`, `medium`, `high`, or `xhigh`. The shared `low`/`medium`/`high` levels work on any backend; the backend-specific ones do not, so a value that does not match the reviewer's backend is the backend's problem (Claude Code, for instance, warns and falls back to its own default).

```yaml
effort: high
```

The default is **`high`** (accepted by every backend). Lower it on cheap, mechanical reviewers to save tokens; raise it on the ones that need to reason hard.

> **The `model:effort` shorthand.** People often write a model and effort together
> as `gpt-5.5:high` or `claude-opus-4-8:max`. Bastion has no combined field: that is
> just `model:` plus `effort:`.
> Split it across the two fields, with a `backend`
> pinned so the model id is unambiguous:
>
> ```yaml
> backend: codex
> model: gpt-5.5   # the part before the colon
> effort: high     # the part after it
> ```

### `timeout`

A per-reviewer wall-clock limit, written in human form (`90s`, `15m`). When a reviewer exceeds it, a gate fails closed (block) and an advisor is skipped. The default is **15 minutes**. Set a short timeout on cheap reviewers and a long one on heavy end-to-end checks:

```yaml
timeout: 15m
```

### `env`

Environment variables injected into the reviewer's process, so the agent and any tool it runs can see them. Use this to hand a reviewer a value your environment already provides, say a preview URL:

```yaml
env:
  PREVIEW_URL: http://localhost:3000
```

Values are **literal**: Bastion does not perform shell `$VAR` expansion, so write the actual value, not `${SOMETHING}`. Bastion consumes environments; it does not provision them. Locally the value must already exist (a precommit script might boot the service and export it), and in CI the workflow stands it up. See [Continuous integration](./continuous-integration.md#environments--inputs).

How the value reaches the agent depends on where the reviewer runs:

- **Native reviewers** (no `runner`) also inherit Bastion's own environment, so a variable your shell or CI has already exported is visible to the agent even without listing it here; the `env` block sets additional values explicitly.
- **Containerized reviewers** (with a `runner` and `capabilities.network: true`) do *not* inherit Bastion's arbitrary environment. Into the container go exactly the `env` pairs written here (as literal values, the same as everywhere else) plus a fixed set of model-provider credential variables (see [Backends](./concepts.md#the-backend)).
  Nothing else crosses, so a value an outer shell or CI job exported reaches a containerized reviewer only if its literal value is written into this `env` block (template the registry if the value is dynamic, for example a per-PR preview URL).

For a containerized reviewer the `env` pairs are written to a temporary file handed to the engine as `--env-file`, so their values never appear on the `docker run` command line (a secret in `env` stays out of a process listing) and their names never touch the engine *client* process; the provider credentials are the only variables forwarded by name from Bastion's own environment. If you set one of those provider credential names in this `env` block, your value wins: Bastion does not also forward the host's value for that name, so the reviewer's `env` overrides it (matching how a native reviewer's `env` overrides the inherited environment).

One container-only constraint follows from that env-file format (one `KEY=VALUE` per line, no escaping): a containerized reviewer's `env` cannot carry a key containing a newline or `=`, or a value containing a newline. Such a pair is rejected and the reviewer fails closed rather than receive a corrupted value; a multiline value (a PEM key, say) has to reach a containerized reviewer some other way (a file in the image, or one its Dockerfile copies in). Native reviewers have no such limit.

### `inputs`

Values interpolated into the prompt *before* it reaches the agent. Reference an input as `${name}` in the prompt; Bastion substitutes the value. Unknown placeholders are left untouched.

```yaml
inputs:
  preview_url: http://localhost:3000
prompt: |
  Run the checkout flow against the preview environment at `${preview_url}`.
  If it fails, block the PR and explain; otherwise approve it.
```

`env` puts a value in the *process*; `inputs` puts a value in the *prompt text*. They are independent: use `env` for tools the agent invokes, `inputs` for values the agent should read in its instructions.
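
The substitution rule for `inputs` is small enough to sketch in Python. This is an illustration under stated assumptions rather than Bastion's code: `interpolate` is a hypothetical name, and the set of characters allowed in a placeholder name (an identifier-like pattern here) is a guess.

```python
import re

# `${name}` with an identifier-like name. The exact characters Bastion accepts
# in an input name are an assumption of this sketch.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(prompt: str, inputs: dict[str, str]) -> str:
    """Substitute known `${name}` placeholders; leave unknown ones untouched."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Look only in the inputs map, never in the process environment.
        return inputs.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, prompt)
```

Because the lookup consults only the `inputs` map, an exported shell variable with the same name has no effect on the prompt text in this sketch.
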
Input values are literal as well: a `${name}` in the prompt is substituted only from this `inputs` map, never from your shell environment.

### `runner` and `capabilities`

The schema also accepts a `runner` block (`dockerfile` / `image`) and a `capabilities` block (`network`, `mcp`, `skills`) to opt into an execution environment beyond the least-privilege default. Where these stand:

- **`runner` is provisioned (paired with `network: true`).** A reviewer with a `runner` block and `capabilities.network: true` runs its backend inside a container: a `dockerfile` is built (tagged by a content hash of the Dockerfile, so an unchanged file reuses the engine's layer cache), an `image` is used as-is (the engine pulls it on demand at run time). If both are set, `dockerfile` wins; a `runner` with neither fails closed. The `dockerfile` path is relative to the repository root and must resolve inside it: an absolute path, any path with a `..` component (rejected outright, even one that would resolve back inside), or one that canonicalizes outside the repo through a symlink all fail closed. The build runs with the repository root as its build context, so the Dockerfile's `COPY` and `ADD` can reference files anywhere in the repo. An `image` reference beginning with `-` fails closed, since the engine would read it as a command-line option rather than an image name. The selected backend's executable must exist inside the image on `PATH` (`claude` for `claude-code`, `codex` for `codex`). This lets a reviewer carry tools or a pinned toolchain the host does not have.
- **`capabilities.network: true` is required to run a container; the default `network: false` fails closed.** `network: true` gives a containerized reviewer general (unscoped) outbound network.
  A container's egress cannot be scoped to the model provider yet (the allowlisting proxy is unbuilt), so the default `network: false` reads as restricted but cannot be enforced: rather than silently attach general egress, `ExecutionPlan::resolve` rejects a container with `network: false` before it runs. As with `mcp`/`skills`, that rejection **fails closed**: a gate blocks and an advisor is skipped, with a message naming the field. A containerized reviewer must opt into `network: true` to run, accepting general egress for now. A *native* `network: true` (no `runner`) also fails closed, since with no container there is nothing to scope.
- **`capabilities.mcp` and `capabilities.skills` are not provisioned.** A reviewer that declares either **fails closed**: a gate blocks and an advisor is skipped, with a message naming the unprovisioned field, rather than running degraded (a gate that quietly ran without a privilege it asked for would be a silent fail-open). Leave them out.

The least-privilege default (no `runner`, `network: false`, no `mcp` or `skills`) runs natively on the host.

## A fully-loaded example

Putting the optional fields together. As written, this reviewer runs in the container built from its Dockerfile. It must declare `network: true` to run (a containerized reviewer needs general egress, since provider-only scoping is unbuilt), and Bastion forwards its `env` into that container.

```yaml
reviewers:
  - name: e2e-checkout-flow
    trigger: [src/**]
    mode: gate
    backend: claude-code
    timeout: 15m
    env:
      PREVIEW_URL: http://localhost:3000   # literal value, no shell expansion
    inputs:
      preview_url: http://localhost:3000   # substituted into the prompt as ${preview_url}
    runner:                                # provisioned: runs the backend in this image
      dockerfile: ./.bastion/e2e.Dockerfile
    capabilities:
      network: true                        # required to run a container; grants general (unscoped) egress
    prompt: |
      Run the e2e checkout flow against the preview environment at `${preview_url}` using Playwright.
      If it fails, block the PR and explain; otherwise approve it.
```

Adding an unprovisioned capability flips the whole reviewer to fail closed. For example, adding `mcp: [playwright]` under `capabilities` would block this gate before it ever reaches the container, since `mcp` is checked first. Leave `mcp` and `skills` out until those tiers land.

## Writing a good prompt

The prompt is the reviewer. A few habits keep recall high:

- **Say what to block on, explicitly.** End with a clear instruction: "block the PR if X; otherwise approve it." The reviewer's job is a decision, not an essay.
- **Name the one concern and stay on it.** If you find yourself writing "also check...", that "also" is a second reviewer. Split it.
- **Carve out the false positives you can predict.** "A single large but cohesive module is not a violation." "Panics in `#[cfg(test)]` code are acceptable." Pre-empting the obvious wrong flags keeps false positives down.
- **Match the language to the mode.** A gate's prompt should be decisive; an advisor's should say "report as optional findings... do not block," so its output stays advisory even if the model is tempted to be firm.
- **Let the agent explore.** Every reviewer gets a full checkout and is told how to see the changeset (the diff against the base, plus untracked files). You do not need to paste the diff into the prompt; point the reviewer at the property.
- **You do not need to ask for completeness.** Bastion appends an instruction to every reviewer prompt telling the agent to report every distinct finding in one pass, not just the first. Write the prompt for the concern and phrase findings per instance (one per file and line range), and the agent enumerates them all so the author fixes the whole set from one run.


Some worked examples, taken from Bastion's own registry ([`.bastion.yaml`](https://github.com/jssblck/bastion/blob/main/.bastion.yaml)):

```yaml
- name: error-handling
  trigger: [src/**/*.rs]
  mode: gate
  backend: codex
  prompt: |
    Review the changeset for error-handling discipline: no `.unwrap()` or
    `.expect()` on recoverable errors in non-test code, errors propagated with
    `?` and given context, and gates that fail closed. Block the PR if you find
    a recoverable error that can panic in production; otherwise approve it.
    Panics in `#[cfg(test)]` code and in genuinely-unreachable invariants that
    are documented as such are acceptable.

- name: test-coverage
  trigger: [src/**/*.rs]
  mode: advisor
  backend: codex
  prompt: |
    Check whether new or changed behavior in this changeset is covered by
    tests. This is advisory: report uncovered behavior as optional findings so
    the author can decide, but do not block.
```

## Validating your registry

Run `bastion validate` to parse the registry and report any problem without running a single reviewer or spending a model call:

```sh
bastion validate                        # validate the merged set review would run
bastion validate path/to/.bastion.yaml  # check a specific file on its own
```

With no file argument it validates the same merged set a local `bastion review` would run, the discovered repository registry plus your user-level one, and names each source it merged. An explicit `FILE` is checked on its own, with no merging.

It loads through the same path `bastion review` uses, so it catches exactly the errors a real review would hit at load time: malformed YAML, an unknown field, a duplicate name (including one that survives the user/repo merge), a reviewer missing a required field, or a model pinned under `backend: any`. A valid registry prints a one-line summary and the reviewers it parsed, and exits zero; an invalid one prints the error and exits non-zero, so the command works as a pre-commit or CI lint as well as a quick local check.

The registry is also validated whenever it loads for a real `bastion review`, so a malformed file fails fast there too. `bastion validate` just lets you check it on its own, for free, before you run anything.

---

Next: [The local workflow](./local-workflow.md). Running `bastion review` in depth, the JSONL agent stream, and inspecting saved runs.

---

# The local workflow

> Running `bastion review` for real: the loop, the two output formats, exit codes,
> and inspecting what was saved.

The local CLI is the surface an authoring agent optimizes against before opening a PR. It runs the *same* reviewers CI will run, so a green local loop usually means a PR that CI confirms. Two things can make a local run differ: CI feeds reviewers the PR's description and discussion that a default local run lacks, and a local run also merges in any personal reviewers from your user-level registry, which CI never sees (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)). This chapter covers the loop in depth.

## The loop

The intended use is a tight loop: run the review, read what blocks, fix it, run again, until green.

```sh
bastion review --base main
```

`bastion review` computes the changeset (working tree vs. `--base`, including uncommitted and untracked files), selects the reviewers whose triggers match, runs them in parallel with per-reviewer timeouts, and renders progress and verdicts.

- `--base `: the branch to diff against. Defaults to `main`.
- `--format `: output format. Defaults to `human`.
- `--repo `: the GitHub repository to gather pull request context from. Defaults to `$GITHUB_REPOSITORY`.
- `--pr `: the pull request whose description and discussion the reviewers read as context. Requires a repository, from `--repo` or `$GITHUB_REPOSITORY`; passing `--pr` with no repository is an error.
- `--config-dir `: the user-level config directory to merge personal reviewers from (env `BASTION_CONFIG_DIR`).
  Defaults to your platform config directory (`~/.config/bastion` on Linux, `~/Library/Application Support/bastion` on macOS, `%APPDATA%\bastion` on Windows). The user-level layer is applied only to a purely local review; a review carrying `--repo`/`--pr` uses the repository's reviewers alone.

The CI workflow passes `--repo`/`--pr` so reviewers see the PR's stated intent and discussion. Locally you rarely need them: with no PR, intent comes from your branch's commit messages (`base..HEAD`), and each reviewer's prior findings come from the run store. When you do pass them, Bastion builds its GitHub REST client from `GITHUB_TOKEN` and `GITHUB_API_URL` (the latter defaults to the public API and points at a GitHub Enterprise host when set). Discussion gathering reads the first 100 conversation comments and the first 100 review comments and does not paginate, so later comments on a very long thread are not included. Gathering PR context is read-only and best effort, so an API or token failure never fails the review; it just drops back to the local context.

### Exit codes

The exit code *is* the gate, so a loop can branch on it:

| Aggregate verdict | Exit code |
| --- | --- |
| `pass` (all gates passed) | `0` |
| `block` (a gate blocked, errored, or timed out) | non-zero |

```sh
# Keep working until every gate is green.
until bastion review --base main; do
  echo "still blocked; fixing..."
  # ... make changes ...
done
```

A blocked review is an *expected* outcome, not a crash: Bastion still exits cleanly with structured output, and only the code signals the gate.

## Two audiences, two formats

By default `bastion review` renders human-readable progress for a person watching. An agent passes `--format jsonl` and gets a machine stream instead. Both describe the same run; only the presentation differs.

### The JSONL stream

With `--format jsonl`, Bastion emits one JSON object per line, as each thing happens.
A run is a typed sequence of events:

```jsonl
{"type":"run.started","run":"r-0f3a","branch":"feat/cart","base":"main","changed":12,"reviewers":[{"name":"file-responsibility","mode":"gate"},{"name":"tenant-isolation","mode":"gate"}]}
{"type":"reviewer.started","run":"r-0f3a","reviewer":"tenant-isolation","mode":"gate","backend":"claude-code"}
{"type":"reviewer.resolved","run":"r-0f3a","reviewer":"tenant-isolation","verdict":"block","summary":"A new query path reads rows without scoping by tenant id.","findings":[{"kind":"blocking","path":"src/server/db.rs","line_start":88,"line_end":91,"detail":"scope this query by tenant_id"}],"usage":{"tokens_in":18204,"tokens_out":1560,"cache_read":12000,"cost_usd":0.21},"duration_ms":38120,"has_transcript":true}
{"type":"run.completed","run":"r-0f3a","verdict":"block","gates":{"total":2,"passed":1,"blocked":1},"duration_ms":41030,"tokens_in":20480,"tokens_out":1875,"cache_read":13100,"cost_usd":0.37}
```

The event types:

| Event | Meaning |
| --- | --- |
| `run.started` | The run began; lists the reviewers that matched and will run. |
| `reviewer.started` | One reviewer was dispatched. |
| `reviewer.resolved` | One reviewer finished; carries its `verdict`, `summary`, `findings`, `usage`, and a `has_transcript` flag. |
| `run.completed` | The aggregate decision and the gate tally, plus the run's wall-clock `duration_ms` and the usage totals (`tokens_in`, `tokens_out`, `cache_read`, `cost_usd`) summed across reviewers. |

How an agent should consume it:

- **Only need the outcome?** Ignore everything until `run.completed` and read its `verdict`.
- **Want to react as you go?** Read each `reviewer.resolved` as it lands and act on its `findings`: a `path`, a `line_start`/`line_end`, and a `detail` telling you what to change. The findings are everything you need to fix the code.

### For agents: the consumption contract

If you are an agent driving the loop, this is the whole contract:

1. Run `bastion review --base --format jsonl`.
2. Parse stdout one line at a time as JSON; each line has a `type`.
3. Act on every `reviewer.resolved` with `verdict: "block"` using its `findings` (`path` + `line_start`/`line_end` + `detail`). Do not open transcripts; the findings already say what to change.
4. The aggregate decision is `run.completed.verdict`. The process also exits non-zero on `block`, so you can branch on the exit code alone if you only need pass/fail.
5. Fix what blocked and re-run. Loop until `run.completed.verdict` is `pass` (exit zero), then open your PR.

This contract is exactly what `bastion skills install` checks into your repo as the `using-bastion` agent skill, so your agents follow it without being told each time. See [Teach your agents to use Bastion](./getting-started.md#7-teach-your-agents-to-use-bastion).

### The skills-freshness notice on stderr

Before it runs, `bastion review` compares the `using-bastion` skill checked into your repo (under `.claude/skills` and `.agents/skills`) against the copy bundled in the running binary, the same comparison `bastion skills check` makes. When the checked-in copy is missing or has drifted, it prints a one-line notice to **stderr** naming the affected files and pointing at `bastion skills install`. This is the case where your agents may be following stale guidance, so the driving agent sees the notice inline with the run. It goes to stderr on purpose, keeping stdout as pure JSONL for a parser; the notice is advisory, so it never adds an event to the stream and never changes the exit status. A `block` still comes only from a reviewer. Run `bastion skills install` (add `--force` to overwrite a file you edited) and commit the result to clear it.

### Money is dollars

Cost fields (`cost_usd`) serialize as dollars (`0.21`), but Bastion tracks exact cents internally, so you never see floating-point cent drift in the stream.
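
A consumer that wants to total costs itself can keep that exactness by converting each `cost_usd` back to integer cents. The Python sketch below is one way to do it, not part of Bastion; `cost_cents` is a hypothetical helper, and it relies only on the field positions shown in the sample stream (`cost_usd` at the top level of `run.completed`, nested under `usage` in `reviewer.resolved`).

```python
import json
from decimal import Decimal


def cost_cents(event_line: str) -> int:
    """Return an event's `cost_usd` as integer cents (0 if the field is absent)."""
    # parse_float=Decimal keeps 0.21 as exactly 0.21 instead of a binary float.
    event = json.loads(event_line, parse_float=Decimal)
    # run.completed carries cost_usd at the top level;
    # reviewer.resolved nests it under `usage`.
    cost = event.get("cost_usd", event.get("usage", {}).get("cost_usd", 0))
    return int((Decimal(cost) * 100).to_integral_value())
```

Summing the integer results across events avoids the drift that adding binary floats such as `0.1 + 0.2` would introduce.
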
Token fields (`tokens_in`, `tokens_out`, `cache_read`) are plain integer counts; on `run.completed` they are the totals summed across every reviewer that reported usage, the same way `cost_usd` is. `cache_read` is the input tokens served from the provider's prompt cache (cache hits); each backend names it differently natively (Claude's `cache_read_input_tokens`, Codex's `cached_input_tokens`, Pi's `cacheRead`) and Bastion normalizes them to one field. It is 0 when a backend reports no cache usage.

## What is streamed vs. what is saved

The stream deliberately leaves out the verbose detail. A transcript is mostly noise to an agent that just wants to know what to fix; streaming thousands of lines on every run would bury the findings and burn the agent's own context.

- **Streamed:** the decisions and the things you act on immediately: the reviewer set, start and resolve events, verdicts, summaries, findings, per-reviewer usage.
- **Saved, not streamed:** the verbose detail: full session transcripts, raw verdict payloads, per-reviewer metadata. Written to disk, read on demand.

That is why `reviewer.resolved` carries `has_transcript: true` rather than the transcript itself: when a decision surprises you, the transcript is one command away (next section).

## Inspecting saved runs

Every run is persisted, so you can inspect history without re-running anything. These commands are the local equivalent of clicking "Details" on a CI check. The run-targeted ones (`show`, `transcript`) default to the latest run when you omit a run id; `runs` and `clean` operate over all saved runs.

```sh
bastion runs                              # list recent runs: id, verdict, branch, reviewer count
bastion show []                           # re-print a run's summaries, verdicts, findings
bastion transcript []                     # the full agent session for one reviewer
bastion clean [--keep N | --older-than ]  # prune saved runs
```

- **`runs`** is the index: what ran recently and how each landed.
- **`show`** re-emits a past run's verdicts and findings, the same content as the stream's resolve and complete events, on demand. Accepts `--format human|jsonl`.
- **`transcript`** prints the saved session for one reviewer. This is the explicit, opt-in way to see what was kept off the stream; reach for it when a verdict is surprising and you want to know why. It is raw text (a transcript is already a document). Pass either `` (latest run) or ` `.
- **`clean`** prunes old runs. `--keep N` retains the N most recent; `--older-than ` (e.g. `7d`, `12h`) removes runs older than a duration. The two are mutually exclusive.

## Where runs live

Bastion persists every run under a per-user data directory, by platform convention:

- Linux: `$XDG_DATA_HOME/bastion`, default `~/.local/share/bastion`
- macOS: `~/Library/Application Support/bastion`
- Windows: `%APPDATA%\bastion`

Override it with `--data-dir ` or the `BASTION_DATA_DIR` environment variable, handy for scratch runs you do not want in your real history. The layout:

```text
/
  runs/
    r-0f3a/
      run.jsonl               # the full event stream (always JSONL, regardless of display format)
      reviewers/
        tenant-isolation/
          transcript.jsonl    # the full agent session
          verdict.json        # the raw structured verdict
          meta.json           # backend, timing, usage, matched trigger
    latest                    # a plain file holding the most recent run id
```

`run.jsonl` is the same event stream whether a human or an agent triggered the run, so any run can be replayed or inspected after the fact.

Runs accumulate: `bastion review` does not prune, so history grows until you run `bastion clean`, which keeps the most recent 20 when given no arguments (or use `--keep N` / `--older-than `).

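
Because `run.jsonl` is the same event stream an agent reads live, a saved run can be re-read with a few lines of code. The Python sketch below is illustrative only: the function names are hypothetical, and since the exact location of the `latest` file relative to `runs/` is an assumption, it checks both plausible places. The `bastion show` command remains the supported way to do this.

```python
import json
from pathlib import Path


def read_latest_run_id(data_dir: Path) -> str:
    """Read the run id stored in the plain `latest` file."""
    # Whether `latest` sits inside `runs/` or beside it is an assumption,
    # so this sketch checks both places.
    for candidate in (data_dir / "runs" / "latest", data_dir / "latest"):
        if candidate.is_file():
            return candidate.read_text().strip()
    raise FileNotFoundError(f"no `latest` file under {data_dir}")


def parse_events(lines):
    """Parse JSONL text lines into event dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(events):
    """Return (aggregate verdict, findings from every blocked reviewer)."""
    verdict = None
    to_fix = []
    for event in events:
        if event["type"] == "reviewer.resolved" and event["verdict"] == "block":
            for finding in event.get("findings", []):
                to_fix.append((event["reviewer"], finding.get("path"), finding["detail"]))
        elif event["type"] == "run.completed":
            verdict = event["verdict"]
    return verdict, to_fix


def summarize_latest(data_dir: Path):
    """Load the most recent saved run and summarize it."""
    run_file = data_dir / "runs" / read_latest_run_id(data_dir) / "run.jsonl"
    return summarize(parse_events(run_file.read_text().splitlines()))
```

The same `summarize` function works on live stdout from `--format jsonl`, since the saved file and the stream carry identical events.
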
## Providing environments locally

For a **native** reviewer, the reviewer process inherits Bastion's own environment, so anything your shell or a `precommit` script has exported (a service on `http://localhost:3000`, say) is visible to the agent; a reviewer's `env` and `inputs` values are literal text set in the YAML, not shell-expanded. Bastion only reads values your shell or CI already exported; it does not stand them up. This is the same boundary CI honors, which keeps the local and CI surfaces in agreement.

A **containerized** reviewer (one with a [`runner`](./authoring-reviewers.md#runner-and-capabilities), which today must also set `capabilities.network: true` to run) does not inherit your shell environment, since it runs in a container. Into it go the reviewer's literal `env` pairs plus a fixed provider-credential set, and nothing else. (If the reviewer's `env` sets one of those credential names, its value wins and the host's is not also forwarded.) So an exported `PREVIEW_URL` that a native reviewer would see for free reaches a containerized one only if you write its literal value into that reviewer's `env`, and a containerized reviewer typically reaches a host service over the container network rather than `localhost`.

## The same surface in CI

For the repository's reviewers, these local events are not a separate system from CI; they are the same decisions in a finer-grained form. Each such JSONL event has a GitHub twin (a check run, a comment, an annotation), laid out side by side in the [Continuous integration](./continuous-integration.md#how-a-run-maps-to-github) chapter.

A green local loop predicts a green PR when both runs see the same reviewers and context. The two surfaces run the repository's reviewers and aggregation, and CI adds the PR's description and discussion that a default local run does not, so a reviewer that weighs that context can decide differently.
A purely local run can also include your personal user-level reviewers; their `run.started` and `reviewer.resolved` events are local-only and never become checks or comments (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)).

---

Next: [Continuous integration](./continuous-integration.md). Promoting these same reviewers into GitHub Actions as a required merge check.

---

# Continuous integration

> Promoting your reviewers into GitHub Actions: one required check and per-author
> billing.

The local loop gets you to green before you open a PR. CI is the authoritative confirmation: it runs the reviewers from the repository's `.bastion.yaml` and reports one merge gate. Because routing and aggregation are shared, CI rarely surprises an author who looped locally. It can differ in two ways: CI adds the PR's description and discussion that a default local run lacks, and CI runs the repository's reviewers only, while a local run can also include your personal user-level reviewers (see [Authoring reviewers](./authoring-reviewers.md#user-level-reviewers)). The user-level layer is local-only by design, so it can never gate someone else's pull request. This chapter covers the GitHub adapter, the one forge Bastion targets.

> Bastion does not own CI; it plugs into yours. The workflow, the secrets, the
> preview environments, and the branch-protection rules are GitHub's. Bastion
> reads and writes them through a thin adapter and otherwise stays out of the way.

## How a run maps to GitHub

On each pull-request event (`opened`, `synchronize`, `reopened`) the workflow runs `bastion review`, which computes the changed files, routes to the matching reviewers, runs them in parallel with per-reviewer timeouts, and persists the run. A second step, `bastion github report`, reads that run and posts it.

A verdict reaches two GitHub surfaces:

- **Findings are posted to the PR.** `bastion github report` renders every finding (blocking and optional) into a single sticky PR comment, and attaches each located finding to its reviewer's check run as an annotation on the finding's `path` and line range. The sticky comment is the surface an implementing agent reads; it carries everything it needs to act.
- **Each verdict becomes a check run** named after the reviewer (`bastion / tenant-isolation`). A blocking gate reports `failure`; a passing gate reports `success`; an advisor reports `success` with its findings attached.

`bastion github report` also folds a skills-freshness advisory into the sticky comment when the checked-out repo's bundled skills (`.claude/skills` and `.agents/skills`) are missing or have drifted from the reporting binary, the same comparison `bastion skills check` makes. It renders as a `> [!WARNING]` callout just under the headline, naming each affected file and pointing at `bastion skills install`. It is advisory only, so it never changes a check-run conclusion or the `bastion` gate; it tells you to refresh stale skills without failing the build. The local `bastion review` prints the same notice to stderr.

The local-to-GitHub mapping is one-to-one for the repository's reviewers: the JSONL events a CI or `bastion review --repo/--pr` run produces are the same decisions GitHub renders as checks and a comment. (A purely local run can also include your personal user-level reviewers, whose events are local-only and have no GitHub twin.)

Each GitHub surface has a local twin:

| GitHub                                                         | Local                               |
| -------------------------------------------------------------- | ----------------------------------- |
| A per-reviewer check run reaching its conclusion               | `reviewer.resolved` event           |
| Findings in the sticky PR comment and as check-run annotations | `findings` in `reviewer.resolved`   |
| Tokens and cost in the check output                            | `usage` in `reviewer.resolved`      |
| The aggregate `bastion` check and the sticky PR comment        | `run.completed` event               |
| Transcript in the uploaded run artifact                        | saved on disk, `bastion transcript` |

The local stream additionally carries `run.started` and `reviewer.started` for an agent reacting as the run goes; those have no separate GitHub surface, because `bastion github report` runs after the review finishes and renders the result in one pass. This mapping is deliberate, so an agent's local loop and the CI gate stay aligned on what a review means.

## The one required check

Branch protection needs you to name the checks that must pass, but Bastion's set of reviewers *varies per PR*: a docs-only PR and a server PR trigger different reviewers, so there is no fixed list of names to require.

The fix is a single always-present check, **`bastion`**, and it is the only one branch protection requires. It runs even when zero reviewers match (a trivial pass) so it is always there to require. Internally it reflects the aggregate: `success` only when every triggered gate passed, `failure` if any gate blocked, errored, or timed out (fail-closed). The per-reviewer checks stay informational; `bastion` is the gate.

## The workflow

The adapter is a self-hosted workflow that installs a published `bastion` release plus your backend CLI, authenticates the backend, runs `bastion review`, and then runs `bastion github report` to post the results to the PR.
The CLI exits non-zero if any gate blocks, so the job's pass/fail *is* your merge gate; the report step adds the sticky comment and the per-reviewer and aggregate check runs.

That host backend CLI and its auth cover **native** reviewers (the default). A reviewer with a [`runner`](./authoring-reviewers.md#runner-and-capabilities) runs its backend *inside a container* instead (and must declare `capabilities.network: true`; without it the reviewer is rejected before it runs, so a gate blocks and an advisor is skipped), so for those the job needs a container engine on the runner (`docker` by default, or whatever `BASTION_CONTAINER_ENGINE` names) and the backend executable plus its auth inside the image, not on the host. The fixed provider credential variables are forwarded from the job into the container by name, so the host auth still reaches a containerized reviewer's provider even though the CLI itself lives in the image:

```yaml
name: bastion
on:
  pull_request:
    types: [opened, synchronize, reopened]

# The report step writes the PR comment and the check runs, so the job needs more
# than read access.
permissions:
  contents: read
  pull-requests: write
  checks: write

jobs:
  review:
    runs-on: ubuntu-latest
    # True only when both dedicated-app secrets are set (the id and key are one
    # credential), so a half-configured repo falls back instead of failing the mint
    # step. Computed here because the `if:` below can read `env` but not `secrets`.
    env:
      HAS_BASTION_APP: ${{ secrets.BASTION_APP_ID != '' && secrets.BASTION_APP_PRIVATE_KEY != '' }}
    # Agentic backends run over the PR's code with live credentials, so restrict to
    # same-repo PRs; a maintainer re-runs a fork PR from a trusted branch.
    if: github.event.pull_request.head.repo.full_name == github.repository
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history; reviewers diff against the base

      # 1. Install a published bastion release (not built from the PR).
      # 2. For native reviewers: install your backend CLI (claude, codex, or pi) on
      #    the runner and authenticate it as the PR author. The concrete per-author
      #    auth step is in "Authentication & billing" below; drop it in here. For
      #    reviewers with a `runner`: ensure a container engine is on the runner
      #    (docker by default, or set BASTION_CONTAINER_ENGINE) and that the backend
      #    CLI and its auth live inside the image; the provider credential variables
      #    are forwarded in by name.
      # 3. Stand up anything your reviewers consume (a preview env, a database).

      - name: Review
        env:
          BASTION_DATA_DIR: ${{ github.workspace }}/.bastion
          # Lets the reviewers read the PR's description and discussion as context
          # (read-only, best effort; gathering reads the first 100 conversation comments
          # and first 100 review comments, no pagination). Omit the --repo/--pr flags
          # below to review the diff and local context without PR discussion.
          GITHUB_TOKEN: ${{ github.token }}
        # Non-zero exit on a blocked gate fails the job; that is the merge gate.
        # --repo/--pr feed the reviewers the PR's stated intent and discussion alongside
        # the diff. Cross-run prior-findings memory needs the run store persisted between
        # runs (upload and restore .bastion/runs); a fresh runner starts without it.
        run: |
          bastion review --base "origin/${{ github.base_ref }}" \
            --repo "${{ github.repository }}" \
            --pr "${{ github.event.pull_request.number }}"

      # Optional: mint a token for a dedicated Bastion app so the check runs get
      # their own check suite and render under the app's name. Skipped (and the
      # report falls back to the default GITHUB_TOKEN) when the app is not set up.
      # See "Grouping the checks under their own app" below.
      - id: app-token
        if: ${{ always() && env.HAS_BASTION_APP == 'true' }}
        uses: actions/create-github-app-token@v2
        with:
          app-id: ${{ secrets.BASTION_APP_ID }}
          private-key: ${{ secrets.BASTION_APP_PRIVATE_KEY }}

      - name: Report to the PR
        # Runs even when the review blocked and failed the job, so the comment and
        # checks always land. Creating check runs needs a GitHub App installation
        # token (a classic PAT cannot); both the dedicated-app token and the default
        # GITHUB_TOKEN qualify, so use the dedicated one when present and fall back.
        if: always()
        env:
          GITHUB_TOKEN: ${{ steps.app-token.outputs.token || github.token }}
          BASTION_DATA_DIR: ${{ github.workspace }}/.bastion
        run: |
          set -euo pipefail
          bastion github report \
            --repo "${{ github.repository }}" \
            --pr "${{ github.event.pull_request.number }}" \
            --sha "${{ github.event.pull_request.head.sha }}"
```

### `bastion github report`

The report step reads the run that `bastion review` just persisted (under `BASTION_DATA_DIR`) and posts it to the pull request. Its full surface:

```
bastion github report --repo --pr --sha [RUN]
```

- `--repo `: the repository to post to. Defaults to the `GITHUB_REPOSITORY` environment variable that Actions sets, so you can usually omit it.
- `--pr `: the pull request number (required).
- `--sha `: the head commit the check runs attach to (required); pass the PR's `head.sha`, not the merge commit.
- `RUN`: an optional positional run id to report; defaults to the latest recorded run, which is what you want right after `bastion review`.

It needs a token with `pull-requests: write` and `checks: write` in `GITHUB_TOKEN`, and reads `GITHUB_API_URL` (Actions sets it; also the hook for GitHub Enterprise). Creating check runs requires a GitHub App installation token; both the default Actions `GITHUB_TOKEN` and a dedicated-app token (see below) are installation tokens and qualify, while a classic personal access token does not.
If the run cannot be found (an earlier failure persisted nothing), it prints a notice and exits 0 rather than failing the step a second time. The command is CI-facing and has no local mirror: locally you read findings straight from `bastion review --format jsonl`. ### Grouping the checks under their own app In the PR checks list, the name before the `/` is not the workflow that created a check; it is the **check suite** the check belongs to, and a check suite is keyed by `(GitHub App, commit)`. Every GitHub Actions workflow runs under the one shared `github-actions` app, so a commit that triggers several workflows has several `github-actions` suites. The check runs `bastion github report` creates through the REST API carry no suite id (the API does not accept one), so GitHub attaches them to one of those suites of its own choosing, often a sibling workflow's. The result is check runs that read like `Security / fail-closed-gates` instead of grouping on their own. A check run lands in its own named suite only when a **distinct GitHub App** creates it. So the fix is to post the report under a small app of your own rather than the shared Actions identity: 1. Create the app. Go to [bastion.jessica.black/github-app](https://bastion.jessica.black/github-app) and follow the walkthrough; it shows how to create a GitHub App by hand in GitHub's UI with exactly the permissions the report step needs (`checks: write`, `pull_requests: write`, `contents: read`, no webhook). The app's **name** is what the checks group under, for example `YourOrg's Bastion`. 2. Generate the app's private key, note its numeric App ID, and install the app on the repositories that run Bastion. 3. Store `BASTION_APP_ID` (the App ID) and `BASTION_APP_PRIVATE_KEY` (the `.pem` contents) as Actions secrets. For Dependabot-triggered runs, set them in the Dependabot secret store too. 
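Step 3 can be scripted with the GitHub CLI. The sketch below is one way to do it, assuming `gh` is installed, authenticated, and run from a checkout of the target repository; the `run` wrapper, the `DRY_RUN` variable, and the `store_app_secrets` function are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# store_app_secrets APP_ID KEY_FILE
# Writes both app secrets to the Actions store and to the Dependabot store,
# so Dependabot-triggered runs can mint the app token too.
store_app_secrets() {
  app_id="$1"    # the app's numeric App ID
  key_file="$2"  # path to the downloaded .pem private key
  for store in actions dependabot; do
    run gh secret set BASTION_APP_ID --app "$store" --body "$app_id"
    # With no --body, gh reads the secret value from standard input.
    run gh secret set BASTION_APP_PRIVATE_KEY --app "$store" < "$key_file"
  done
}
```

Running `DRY_RUN=1 store_app_secrets 123456 ./bastion-app.pem` first (both arguments are placeholders) shows the four `gh secret set` commands without changing anything.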
The workflow above mints a token from those secrets with [`actions/create-github-app-token`](https://github.com/actions/create-github-app-token) and hands it to the report step; the per-reviewer and aggregate checks then render under the app's name. The step is fully optional: with the secrets unset it is skipped and reporting falls back to the default `GITHUB_TOKEN`, which still posts the comment and checks, only grouped under whichever suite GitHub picks. When that happens, `bastion github report` notices (it reads back the app that GitHub stamped on the check runs it just created) and closes the PR comment with a short note linking here; once a dedicated app is configured the note disappears. Because the report reads GitHub's response, the workflow does not pass a flag. For a complete, working example (latest-release install, per-author backend credentials, and fork-PR safety), see Bastion's own [`.github/workflows/bastion.yml`](https://github.com/jssblck/bastion/blob/main/.github/workflows/bastion.yml). It wires up the per-author auth recipe in [Authentication & billing](#authentication--billing) below, on the Codex backend. Configure branch protection on your default branch to require this job (and to require review of the reviewer-policy paths; see [Governance](./governance.md)). Merging stays GitHub-native: an author enables auto-merge, and once the required job is green GitHub merges. A push re-triggers the workflow and it resolves again. ## Authentication & billing Coding-agent subscriptions tie usage to an individual, not a team, so Bastion bills a PR's reviews to the *PR author*. Reviewing Alice's PR is billed to Alice's subscription, which is the ToS-compliant reading: each contributor's plan powers the review of their own changes. Bastion never stores credentials. The team stores each author's credential as an Actions secret, and the workflow maps the PR author's GitHub login to the matching secret at run time. 
Bastion just runs your backend CLI, and the backend reads whatever auth it finds on the runner. Your job in CI is to place the right author's credential where that CLI looks before `bastion review` runs. The pattern is the same for every backend:

1. **Capture the credential once, locally.** Each contributor signs in to the backend on their own machine. The CLI writes a credential file:

   | Backend       | Sign-in          | Credential file the CLI reads                    |
   | ------------- | ---------------- | ------------------------------------------------ |
   | `codex`       | `codex login`    | `~/.codex/auth.json` (relocatable: `CODEX_HOME`) |
   | `pi`          | `pi` auth flow   | `~/.pi/agent/auth.json`                          |
   | `claude-code` | `claude` sign-in | `~/.claude` (OAuth token)                        |

   For a ChatGPT or Claude **subscription**, this file holds an OAuth credential (an access token plus a refresh token); the CLI refreshes the short-lived access token from the stored refresh token on each run, so the secret does not need rotating every time the access token expires. A Codex `auth.json` from a ChatGPT sign-in carries `"auth_mode": "chatgpt"`, and the native `backend: codex` reads it directly: you do **not** need Pi to spend a ChatGPT subscription (see [Spending a subscription in CI](#spending-a-subscription-in-ci) below).

2. **Store it as a per-author secret.** Copy the file's contents into a repository secret named `_AUTH_`: the backend, then the GitHub login uppercased. For the `codex` backend and the login `jssblck`, that is `CODEX_AUTH_JSSBLCK`; for `pi`, `PI_AUTH_JSSBLCK`. The name is a convention you pick and reference in the workflow, not something Bastion parses.

3. **Map the login to the secret in the workflow.** Resolve `github.event.pull_request.user.login` to the matching secret through a `case` arm, then write it back to the path the CLI reads:

   ```yaml
   - name: Authenticate Codex as the PR author
     env:
       AUTHOR: ${{ github.event.pull_request.user.login }}
       CODEX_AUTH_JSSBLCK: ${{ secrets.CODEX_AUTH_JSSBLCK }}
     run: |
       set -euo pipefail
       author="$(printf '%s' "$AUTHOR" | tr '[:upper:]' '[:lower:]')"
       case "$author" in
         jssblck) cred="$CODEX_AUTH_JSSBLCK" ;;
         *)
           echo "::error::No Codex credential mapped for PR author '$AUTHOR'. Add a CODEX_AUTH_ secret and a case arm." >&2
           exit 1
           ;;
       esac
       if [ -z "$cred" ]; then
         echo "::error::Codex credential for '$AUTHOR' is mapped but its secret is empty." >&2
         exit 1
       fi
       mkdir -p "$HOME/.codex"
       printf '%s' "$cred" > "$HOME/.codex/auth.json"
       chmod 600 "$HOME/.codex/auth.json"
   ```

Onboarding a contributor is then two reviewed lines: their secret and a `case` arm. Because the mapping lives in the workflow, which is a CODEOWNERS-protected path (see [Governance](./governance.md)), changing who may spend a subscription is itself a human-reviewed change.

An author with no mapped secret **fails closed**: the step errors and the gate blocks, rather than silently billing someone else's subscription. If you would rather a new contributor never be blocked, point the `*)` arm at a shared metered **API key** instead of erroring: store the provider's API key as a secret and export it (for example `CODEX_API_KEY` / `ANTHROPIC_API_KEY`) into the review step rather than writing an `auth.json`. The same login-to-secret shape applies. Under heavy volume a subscription's rate limits can throttle reviewers, and because gates fail closed a throttled reviewer reads as a blocked merge, so some teams use API billing in CI and keep subscriptions for the local loop.
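The API-key fallback might look like the following sketch of the auth step's shell body. It assumes the shared key is stored under a secret exposed to the step as `SHARED_CODEX_API_KEY` (a hypothetical name) and uses the Actions `$GITHUB_ENV` file to hand `CODEX_API_KEY` to later steps; the `authenticate_author` function name is also an assumption. A mapped author whose secret is empty still fails closed.

```shell
# Hypothetical sketch: authenticate as the PR author, falling back to a shared
# metered API key for authors with no case arm.
authenticate_author() {
  author="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$author" in
    jssblck)
      # Mapped author: an empty secret is an error, never a silent fallback.
      if [ -z "${CODEX_AUTH_JSSBLCK:-}" ]; then
        echo "::error::Codex credential for '$1' is mapped but its secret is empty." >&2
        return 1
      fi
      mkdir -p "$HOME/.codex"
      printf '%s' "$CODEX_AUTH_JSSBLCK" > "$HOME/.codex/auth.json"
      chmod 600 "$HOME/.codex/auth.json"
      echo "subscription"
      ;;
    *)
      # Unmapped author: export the shared key to later steps instead of
      # writing an auth.json.
      if [ -z "${SHARED_CODEX_API_KEY:-}" ]; then
        echo "::error::No credential for '$1' and no shared API key is set." >&2
        return 1
      fi
      echo "CODEX_API_KEY=$SHARED_CODEX_API_KEY" >> "$GITHUB_ENV"
      echo "api-key"
      ;;
  esac
}
```

In the workflow step this would be called as `authenticate_author "$AUTHOR"`, with `AUTHOR` and the secrets passed through `env:` as in the step above. The printed word (`subscription` or `api-key`) is only there to make the chosen path visible in the job log.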
### Spending a subscription in CI A ChatGPT or Claude subscription works in CI the same way it does locally: the backend CLI reads its OAuth `auth.json` and refreshes the token itself. Use the backend that matches the subscription you have: - **`backend: codex` with a ChatGPT subscription.** Sign in with `codex login` (ChatGPT), store `~/.codex/auth.json` as `CODEX_AUTH_`, and rehydrate it to `$HOME/.codex/auth.json` as shown above. This is the direct path; no Pi involved. - **`backend: claude-code` with a Claude subscription.** Same shape against the `claude` CLI's auth. - **`backend: pi` with the `openai-codex` provider.** Pi can also spend a ChatGPT subscription, through its `openai-codex` provider (`model: openai-codex/gpt-5.5`). Reach for this only when you specifically want Pi's multi-provider routing; for plain Codex-on-ChatGPT, the native `codex` backend is simpler. > **The two `auth.json` files are different.** `~/.codex/auth.json` (Codex CLI) and > `~/.pi/agent/auth.json` (Pi CLI) are distinct file formats backed by the same > ChatGPT account. The secret you store must match the backend you pin: a Codex > `auth.json` rehydrated where Pi looks (or the reverse) will not authenticate. Pick > the backend first, then capture that CLI's file. ### Dependabot and bot authors Dependabot opens **same-repo** PRs, so they clear the fork guard and Bastion reviews them like any other PR. With the `permissions:` block the example workflow declares, the default `GITHUB_TOKEN` posts the `bastion` check on a Dependabot PR, so you can require it for those PRs too. There is no read-only-token deadlock to work around. Dependabot has one required difference for everyone and one extra step that applies only to per-author billing: - **Secrets come from a separate store (applies to everyone).** GitHub serves secrets to Dependabot-triggered runs from a *Dependabot* secret store, not the Actions store. 
Whatever credential your review step reads, an `ANTHROPIC_API_KEY` or a per-author `_AUTH_`, must be set in that store as well (`gh secret set --app dependabot`), or it arrives empty on a Dependabot PR and the gate fails closed. - **A bot has no subscription of its own (per-author billing only).** If you map per-author credentials, the bot author needs a `case` arm pointing at a maintainer who sponsors its reviews, and the bracketed login must be quoted, since `[bot]` is a glob character class in a shell `case` pattern: `'dependabot[bot]') cred="$CODEX_AUTH_JSSBLCK" ;;`. An arm that maps to an empty secret fails closed with a "mapped but empty" error, usually the sign the Dependabot-store copy is missing. Billing with a shared API key instead of per-author secrets avoids this entirely: there is no per-author arm to maintain. ### Fork-PR safety GitHub does not expose secrets to workflows triggered by **fork** pull requests, and an agentic backend should never run over untrusted code with a live credential anyway. The example workflow guards on `github.event.pull_request.head.repo.full_name == github.repository`, so it runs for same-repo PRs only. A fork contribution is reviewed by a maintainer re-running it from a trusted branch in the repo. ## Environments & inputs Bastion consumes environments; it does not provision them. A reviewer that needs a preview URL, a database, or any running dependency expects the workflow to have stood it up and exposed it. Typically an earlier job deploys a preview environment for the PR and passes its URL into the Bastion job as an environment variable. How that variable reaches the agent depends on where the reviewer runs. A **native** reviewer inherits the job environment, so the agent can see it directly. A **containerized** reviewer (one with a [`runner`](./authoring-reviewers.md#runner-and-capabilities) and `capabilities.network: true`) runs in a container and does *not* inherit the arbitrary job environment. 
Only the reviewer's literal `env` pairs cross that boundary (plus a fixed provider-credential set, except that a credential name set in the reviewer's own `env` wins and is not also forwarded from the job environment), so a per-PR value reaches a containerized reviewer only if you write its value into the registry, typically by templating `.bastion.yaml` before the Bastion job runs. A reviewer's `env` and `inputs` values are literal (Bastion does not shell-expand them), so to put a dynamic value into the prompt itself you template the registry or have the prompt read the variable. Standing up the environment is a deploy concern; Bastion's job starts once it exists. (See [Authoring reviewers](./authoring-reviewers.md#env) for the reviewer side.) ## Self-hosting note Bastion dogfoods the adapter through [`.github/workflows/bastion.yml`](https://github.com/jssblck/bastion/blob/main/.github/workflows/bastion.yml), running the latest published `bastion` release rather than a binary built from the PR's own sources, so a change can never edit the engine that judges it. That workflow is a concrete, self-hosted instance of everything this chapter describes. --- Next: [Governance](./governance.md). Keeping humans at the policy layer with CODEOWNERS and branch protection, and the escape-to-improvement loop. --- # Governance > Keeping humans at the policy layer: protecting the registry, the > escape-to-improvement loop, and what Bastion deliberately does not guarantee. Bastion relocates the human from reviewing diffs to *governing the reviewers*. That only works if the reviewer policy itself is protected and continuously improved. This chapter is the human's operating manual. ## The policy layer The reviewers, their prompts, and their triggers *are* the review policy. 
The whole safety story rests on a simple rule: **any change to that policy is reviewed by a human before it merges.** Otherwise an aligned-but-mistaken agent could quietly loosen a trigger or soften a prompt, and the gate would erode without anyone noticing.

Two native GitHub mechanisms enforce this; neither is exotic.

### CODEOWNERS protects the registry

Bastion can generate a CODEOWNERS block covering the reviewer-policy paths: the registry, the reviewer definitions, the Bastion workflow, and the CODEOWNERS file itself:

```sh
bastion github codeowners --owner @your-org/platform
```

Pass `--owner` once per owner (it is repeatable). Add the generated block to your `CODEOWNERS`. With that block in place, any PR that adds, removes, or edits a reviewer; loosens a trigger; or changes a prompt touches an owned path, so GitHub requires a human review before merge. You can also write your own CODEOWNERS instead; the generated block is a correct starting suggestion.

> Why generate it statically rather than have Bastion manage it live? CODEOWNERS
> changes only take effect *after* a PR merges, so the file must be written to
> protect every path Bastion will ever write into, ahead of time, which is what
> the generated, reviewed block provides.

### Branch protection requires the check

Require Bastion's review on your default branch. That is the review job from [Continuous integration](./continuous-integration.md#the-workflow), which also posts the always-present aggregate check named `bastion` (with a check run per reviewer alongside it), so you can require either the job or that `bastion` check. A PR then cannot merge with the gate switched off, and because the workflow file and the registry are themselves owned paths, switching it off is itself a policy change a human sees.

That is the entire enforcement story, and it is intentionally modest.
The contributor Bastion is designed for is an aligned agent that would never quietly disable CI; the CODEOWNERS trip wire and the required check exist so that *if* policy changes, a human is in the loop, not so that a determined adversary is stopped. ## The escape-to-improvement loop An **escape** is a PR that merged but should have been blocked: a reviewer missed something. Escapes are inevitable, especially early while reviewers are still being tuned, and they are the single most valuable signal for improving the system. Bastion cannot detect escapes itself: if it could, it would have blocked them. This is a human governance loop: 1. **Notice** an escape (monitoring, a bug report, a production incident). 2. **Triage** it: which reviewer(s) should have caught it, and why did they not? 3. **Improve** the policy: sharpen a prompt, add a new single-concern reviewer for the missed property, or fix the reviewer's environment. 4. **Merge** the policy change (through the CODEOWNERS-gated human review above). This is why Bastion expects reviewers to improve over time. Start with a reviewer that is good enough and sharpen it from real escapes instead of perfecting it on paper. Treat escapes as expected feedback rather than failures, and triage them regularly so the policy keeps improving. ## What Bastion does not guarantee Govern with these limits in mind; they are deliberate, not gaps to be closed: - **It is not a correctness proof.** Bastion does not guarantee code is free of bugs or vulnerabilities. A reviewer is only as good as its model and prompt; it is code review without the human in the small loop, not a verifier. - **It does not judge whether the right thing is being built.** That is a design-time question; by PR time the ship has sailed. Keep humans in the design loop. 
- **It is not an adversarial security boundary.** Bastion assumes PR authors are aligned contributors and treats reviewed code as trusted input; it does not defend reviewer agents against prompt injection or exfiltration from the code they review. The bar is *reasonable reduction proportionate to effort*: a speed bump and good defaults, like lint and CI and human review before it. Anything stronger (signing, external rule storage, an enumerated trusted-computing-base) is deliberately out of scope. These limits follow from one assumption: the threat being managed is an aligned-but-fallible agent, not a determined adversary. Govern accordingly. Bastion is a control on honest mistakes and drift, layered with the rest of your CI, not a boundary that holds against someone actively trying to defeat it. ## A governance checklist For a healthy deployment: - [ ] `.bastion.yaml` and the Bastion workflow are CODEOWNERS-protected. - [ ] Bastion's review is required by branch protection on the default branch (the review job, or the aggregate `bastion` check that `bastion github report` posts). - [ ] Reviewer-policy PRs get a real human review, not a rubber stamp. - [ ] Someone owns escape triage, and escapes feed back into reviewer changes. - [ ] Billing is configured (per-author secrets or an API-key fallback) so reviews are not silently blocked by missing credentials. See [Continuous integration](./continuous-integration.md#authentication--billing). --- That is the guide. If you want to work on Bastion itself rather than use it, the design notes and contributor docs live in the [Bastion repository](https://github.com/jssblck/bastion).