For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: When a test-passing change to wiki-browser's own Go source lands on master, the Raspberry Pi rebuilds and redeploys itself — automatically, crash-safely, and with auto-rollback on a bad deploy.
Architecture: A GitHub Action (the WB Action) runs on every push, classifies the changed paths, runs go test ./... for code changes, and makes one HMAC-signed POST /api/webhook/ci to the Pi. The handler always triggers a git-sync Sync and, on deploy:true, writes the approved commit to a trigger file. A systemd .path unit turns that into a wb-cd oneshot run: wb-cd builds the exact commit via git archive, atomically swaps the binaries, restarts wiki-browser (which now drains in-flight Agent jobs gracefully), health-checks /healthz, and rolls back to the previous binaries if the new one is unhealthy.
Tech Stack: Go 1.x, os/exec (git, make, systemctl, tar), net/http, gopkg.in/yaml.v3, the existing internal/walker / internal/gitsync / internal/alert / internal/agent packages, GitHub Actions, systemd.
Spec: docs/superpowers/specs/2026-05-21-wiki-browser-cd-design.html
New files:
internal/walker/matcher.go: Matcher, the shared "is this a tracked document" predicate.
internal/walker/matcher_test.go: matcher tests.
cmd/classify-ci-change/main.go: the WB Action's path classifier (Go program, run via go run).
cmd/classify-ci-change/classify.go: the pure classification function.
cmd/classify-ci-change/classify_test.go: classifier tests.
internal/cd/config.go: Config for the deploy engine.
internal/cd/state.go: State (deployed/poisoned commit) load + atomic save.
internal/cd/state_test.go: state tests.
internal/cd/git.go: commitPresent, archiveCommit.
internal/cd/git_test.go: git-helper tests.
internal/cd/build.go: buildCommit (archive + make build).
internal/cd/swap.go: snapshotBinaries, swapBinaries, restartService, healthCheck.
internal/cd/swap_test.go: swap/health tests.
internal/cd/deploy.go: Deployer, Run (the trigger-drain loop), deployOne, rollback, alerting.
internal/cd/deploy_test.go: deploy-loop tests.
internal/cd/helpers_test.go: newTestRepo and shared test helpers.
cmd/wb-cd/main.go: the wb-cd oneshot (config load → cd.Deployer → Run).
deploy/wb-cd.path: systemd path unit watching the trigger file.
deploy/wb-cd.service: systemd oneshot unit.
deploy/wb-cd-alert.service: systemd OnFailure= alert unit.
wiki-browser/ci-tracked-paths.yaml: committed non-secret extensions/exclude policy for the classifier.
.github/workflows/wiki-browser.yml: the WB Action (committed at the monorepo root, not under wiki-browser/).
Modified files:
internal/config/config.go: add the CD block, defaults, validation.
internal/walker/walker.go: use Matcher instead of the inline extSet/exclude fields.
internal/server/handler_webhook.go: rewrite for the /api/webhook/ci payload + trigger-file write.
internal/server/server.go: Deps gains Version/Commit/BuildTime/CDTriggerFile/Draining; route rename; drainGuard; real /healthz handler.
internal/server/embed.go: ShellData gains version fields.
internal/server/handler_doc.go: writeShell populates the version fields.
internal/server/templates/shell.html: version footer.
internal/server/static/chrome.css: footer styling.
internal/agent/service.go: Drain, draining flag, ErrDraining.
cmd/wiki-browser/main.go: version vars; new shutdown sequence (drain); wire CD config + version into Deps.
deploy/wiki-browser.service: TimeoutStopSec=11m.
Makefile: COMMIT/VERSION/BUILD_TIME ldflags; build cmd/wb-cd.
wiki-browser.example.yaml: cd: block.
Decomposition note: internal/cd is split by responsibility: config.go/state.go (persisted state), git.go/build.go (read-only build inputs), swap.go (filesystem + systemd side effects), deploy.go (the orchestration loop). deploy.go holds function-field seams for build/restart/health so the loop and rollback logic are unit-testable without compiling a real binary or touching systemd.
Spec deviation (carry into review): The spec's drain section says "queued jobs stay in the DB; the next instance drains them on boot." internal/agent has no dequeue loop — Submit inserts the job row and immediately spawns its goroutine, which then blocks on a semaphore. So "already-submitted" jobs are in-flight goroutines, not a DB queue. This plan's Drain rejects only new Submit calls and waits for every already-submitted job (running or semaphore-blocked) to finish, bounded by the 10-minute cap. This loses no work and is simpler than introducing a dequeue loop; the cap + hard Stop() still bound the wait.
Working directory convention: Tasks 1–14 run from wiki-browser/. Task 15 also starts from wiki-browser/, except the GitHub workflow is created and staged as ../.github/workflows/wiki-browser.yml because GitHub only reads workflows from the monorepo root.
cd: block
Files:
Modify: internal/config/config.go
Test: internal/config/config_test.go
Step 1: Write the failing tests
Add to internal/config/config_test.go. Mirror the existing git:/alert: block tests in that file for the config-loading helper shape (minimalValidConfigYAML / mustLoadConfigFromString / loadConfigFromString already exist there from the #10 work).
func TestCDBlockDefaults(t *testing.T) {
dir := t.TempDir()
yaml := minimalValidConfigYAML(t, dir) +
"\ncd:\n" +
" bin_dir: /srv/wiki-browser/bin\n" +
" trigger_file: /srv/wiki-browser/cd-trigger\n" +
" state_file: /srv/wiki-browser/cd-state.json\n"
c := mustLoadConfigFromString(t, yaml)
if c.CD == nil {
t.Fatal("CD block should be non-nil when present")
}
if c.CD.HealthPollTimeout != 90*time.Second {
t.Errorf("HealthPollTimeout default = %v, want 90s", c.CD.HealthPollTimeout)
}
}
func TestCDBlockRequiresPaths(t *testing.T) {
dir := t.TempDir()
yaml := minimalValidConfigYAML(t, dir) + "\ncd:\n bin_dir: /srv/wiki-browser/bin\n"
if _, err := loadConfigFromString(t, yaml); err == nil {
t.Fatal("expected error: cd.trigger_file / cd.state_file required")
}
}
func TestNoCDBlockMeansNilCD(t *testing.T) {
dir := t.TempDir()
c := mustLoadConfigFromString(t, minimalValidConfigYAML(t, dir))
if c.CD != nil {
t.Errorf("CD should be nil when no cd: block present")
}
}
Run: go test ./internal/config/ -run 'TestCDBlock|TestNoCDBlock' -v
Expected: FAIL — c.CD undefined.
In internal/config/config.go, add to the Config struct after Alert:
CD *CD `yaml:"cd"`
Add the new type after the Alert type:
// CD configures the wb-cd continuous-delivery oneshot. Optional: absent a `cd:`
// block, the /api/webhook/ci route still serves doc-sync, but a deploy:true
// payload has nowhere to write and is logged and ignored.
type CD struct {
	BinDir            string        `yaml:"bin_dir"`             // live binaries + bin/prev/
	TriggerFile       string        `yaml:"trigger_file"`        // webhook writes here; .path watches it
	StateFile         string        `yaml:"state_file"`          // deployed/poisoned commit
	HealthPollTimeout time.Duration `yaml:"health_poll_timeout"` // /healthz poll budget
}
In applyDefaults(), after the Alert block:
if c.CD != nil && c.CD.HealthPollTimeout == 0 {
c.CD.HealthPollTimeout = 90 * time.Second
}
In validate(), before the final return nil:
if c.CD != nil {
	if c.CD.BinDir == "" {
		return fmt.Errorf("cd.bin_dir is required when a cd: block is present")
	}
	if c.CD.TriggerFile == "" {
		return fmt.Errorf("cd.trigger_file is required when a cd: block is present")
	}
	if c.CD.StateFile == "" {
		return fmt.Errorf("cd.state_file is required when a cd: block is present")
	}
	if c.CD.HealthPollTimeout < 0 {
		return fmt.Errorf("cd.health_poll_timeout must not be negative")
	}
}
These paths are intentionally not os.Stat-checked: the trigger and state files are created on demand, and bin_dir is a provisioning artifact validated by the operator runbook, not config load.
Run: go test ./internal/config/ -v
Expected: PASS (all existing config tests plus the three new ones).
git add internal/config/config.go internal/config/config_test.go
git commit -m "config: add optional cd: block"
walker.Matcher: shared tracked-path predicate
Files:
Create: internal/walker/matcher.go
Modify: internal/walker/walker.go
Test: internal/walker/matcher_test.go
The classifier (cmd/classify-ci-change, Task 3) must answer "would the running wiki serve this file?" with the exact rule the walker uses. Extract that rule into a Matcher so there is one implementation, not two.
Create internal/walker/matcher_test.go:
package walker
import "testing"
func TestMatcherTracked(t *testing.T) {
m := NewMatcher([]string{".md", ".html"}, []string{"www/**"})
cases := []struct {
path string
want bool
}{
{"docs/intro.md", true},
{"docs/page.HTML", true}, // extension match is case-insensitive
{"main.go", false}, // wrong extension
{"www/index.md", false}, // user exclude
{".git/config.md", false}, // baked-in default exclude
{"node_modules/x/readme.md", false},
{"a/.claude/skills/s.md", false},
}
for _, c := range cases {
if got := m.Tracked(c.path); got != c.want {
t.Errorf("Tracked(%q) = %v, want %v", c.path, got, c.want)
}
}
}
func TestMatcherExcluded(t *testing.T) {
m := NewMatcher([]string{".md"}, []string{"vendor/**"})
if !m.Excluded("vendor/lib/x.md") {
t.Error("Excluded(vendor/...) should be true")
}
if m.Excluded("docs/x.md") {
t.Error("Excluded(docs/...) should be false")
}
}
Run: go test ./internal/walker/ -run TestMatcher -v
Expected: FAIL — NewMatcher undefined.
matcher.go
Create internal/walker/matcher.go:
package walker
import (
"path/filepath"
"strings"
)
// Matcher decides, for a repo-relative slash-separated path, whether it is a
// tracked document: its extension is in the configured set and it is not
// excluded. It is the single source of truth shared by the Walker (what to
// serve) and cmd/classify-ci-change (what a push touched).
type Matcher struct {
extSet map[string]struct{}
exclude *ExcludeMatcher
}
// NewMatcher builds a Matcher. exclude is combined with DefaultExcludes.
func NewMatcher(extensions, exclude []string) *Matcher {
m := &Matcher{
extSet: make(map[string]struct{}, len(extensions)),
exclude: NewExcludeMatcher(exclude),
}
for _, e := range extensions {
m.extSet[strings.ToLower(e)] = struct{}{}
}
return m
}
// Excluded reports whether path is excluded (baked-in defaults or user list).
func (m *Matcher) Excluded(path string) bool { return m.exclude.Match(path) }
// extOK reports whether path has a tracked extension.
func (m *Matcher) extOK(path string) bool {
_, ok := m.extSet[strings.ToLower(filepath.Ext(path))]
return ok
}
// Tracked reports whether path is a served document: tracked extension and not
// excluded.
func (m *Matcher) Tracked(path string) bool {
return !m.Excluded(path) && m.extOK(path)
}
walker.go to use Matcher
In internal/walker/walker.go, replace the Walker struct's exclude/extSet fields with a single matcher:
type Walker struct {
opts Options
matcher *Matcher
mu sync.RWMutex
files map[string]struct{} // repo-relative slash paths
}
In New, replace the matcher construction:
w := &Walker{
opts: opts,
matcher: NewMatcher(opts.Extensions, opts.Exclude),
files: make(map[string]struct{}),
}
if err := w.scan(); err != nil {
return nil, fmt.Errorf("initial scan: %w", err)
}
return w, nil
(Delete the old w.exclude / w.extSet assignment and the for _, e := range opts.Extensions loop.)
In scanInto, replace w.exclude.Match(rel) with w.matcher.Excluded(rel) and !w.matchesExt(rel) with !w.matcher.extOK(rel):
if w.matcher.Excluded(rel) {
if d.IsDir() {
return fs.SkipDir
}
return nil
}
if d.IsDir() {
return nil
}
if !w.matcher.extOK(rel) {
return nil
}
Delete the now-unused matchesExt method. Remove the strings import from walker.go if it becomes unused (it does — strings only appeared in matchesExt and the deleted loop; verify with the build in Step 5).
Run: go test ./internal/walker/ -v
Expected: PASS — all existing walker tests plus the new matcher tests.
Run: go build ./...
Expected: builds clean.
git add internal/walker/matcher.go internal/walker/matcher_test.go internal/walker/walker.go
git commit -m "walker: extract shared Matcher predicate"
cmd/classify-ci-change: the WB Action classifier
Files:
Create: cmd/classify-ci-change/classify.go
Create: cmd/classify-ci-change/main.go
Test: cmd/classify-ci-change/classify_test.go
A Go program run by the WB Action via go run. It diffs the push range (github.event.before to github.sha), classifies the changed paths as deploy build inputs and/or tracked documents, and writes three booleans (code_changed, tracked_changed, notify) to $GITHUB_OUTPUT.
Create cmd/classify-ci-change/classify_test.go:
package main
import (
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"github.com/getorcha/wiki-browser/internal/walker"
)
func TestClassify(t *testing.T) {
m := walker.NewMatcher([]string{".md", ".html"}, []string{"www/**"})
cases := []struct {
name string
paths []string
code, tracked, notify bool
}{
{
name: "doc only",
paths: []string{"docs/x.md"},
code: false, tracked: true, notify: true,
},
{
name: "wiki-browser go code",
paths: []string{"wiki-browser/internal/server/server.go"},
code: true, tracked: false, notify: true,
},
{
name: "wiki-browser embedded template",
paths: []string{"wiki-browser/internal/server/templates/shell.html"},
code: true, tracked: true, notify: true, // .html is a tracked extension, so Matcher.Tracked also matches
},
{
name: "wiki-browser doc - tracked doc but not a build input",
paths: []string{"wiki-browser/docs/plans/x.md"},
code: false, tracked: true, notify: true,
},
{
name: "wiki-browser markdown - tracked doc but not a build input",
paths: []string{"wiki-browser/CLAUDE.md"},
code: false, tracked: true, notify: true,
},
{
name: "the workflow file itself",
paths: []string{".github/workflows/wiki-browser.yml"},
code: true, tracked: false, notify: true,
},
{
name: "excluded doc",
paths: []string{"www/landing.md"},
code: false, tracked: false, notify: false,
},
{
name: "unrelated subproject code",
paths: []string{"crm/src/main.go"},
code: false, tracked: false, notify: false,
},
{
name: "code and doc together",
paths: []string{"wiki-browser/internal/index/index.go", "docs/x.md"},
code: true, tracked: true, notify: true,
},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
got := classify(c.paths, m)
if got.Code != c.code || got.Tracked != c.tracked || got.Notify != c.notify {
t.Errorf("classify(%v) = %+v, want code=%v tracked=%v notify=%v",
c.paths, got, c.code, c.tracked, c.notify)
}
})
}
}
func TestChangedPathsUsesPushRange(t *testing.T) {
dir := t.TempDir()
oldWD, _ := os.Getwd()
if err := os.Chdir(dir); err != nil {
t.Fatal(err)
}
defer os.Chdir(oldWD)
runGit(t, "init", "-b", "master")
runGit(t, "config", "user.email", "test@example.com")
runGit(t, "config", "user.name", "Test")
os.WriteFile("a.md", []byte("a"), 0o644)
runGit(t, "add", "a.md")
runGit(t, "commit", "-m", "a")
base := gitOut(t, "rev-parse", "HEAD")
os.WriteFile("b.md", []byte("b"), 0o644)
runGit(t, "add", "b.md")
runGit(t, "commit", "-m", "b")
os.WriteFile("c.md", []byte("c"), 0o644)
runGit(t, "add", "c.md")
runGit(t, "commit", "-m", "c")
head := gitOut(t, "rev-parse", "HEAD")
paths, err := changedPaths(base, head)
if err != nil {
t.Fatal(err)
}
if got := strings.Join(paths, ","); got != "b.md,c.md" {
t.Fatalf("changedPaths = %q, want b.md,c.md", got)
}
}
func TestChangedPathsAllZeroBaseUsesEmptyTree(t *testing.T) {
dir := t.TempDir()
oldWD, _ := os.Getwd()
if err := os.Chdir(dir); err != nil {
t.Fatal(err)
}
defer os.Chdir(oldWD)
runGit(t, "init", "-b", "master")
runGit(t, "config", "user.email", "test@example.com")
runGit(t, "config", "user.name", "Test")
os.WriteFile("README.md", []byte("x"), 0o644)
runGit(t, "add", "README.md")
runGit(t, "commit", "-m", "seed")
head := gitOut(t, "rev-parse", "HEAD")
paths, err := changedPaths(strings.Repeat("0", 40), head)
if err != nil {
t.Fatal(err)
}
if len(paths) != 1 || paths[0] != "README.md" {
t.Fatalf("changedPaths = %v, want [README.md]", paths)
}
}
func TestWriteOutputsAppendsGitHubOutput(t *testing.T) {
out := filepath.Join(t.TempDir(), "github-output")
t.Setenv("GITHUB_OUTPUT", out)
if err := writeOutputs(map[string]bool{"notify": true}); err != nil {
t.Fatal(err)
}
got, err := os.ReadFile(out)
if err != nil {
t.Fatal(err)
}
if !strings.Contains(string(got), "notify=true\n") {
t.Fatalf("GITHUB_OUTPUT = %q, want notify=true", got)
}
}
func runGit(t *testing.T, args ...string) {
t.Helper()
if out, err := exec.Command("git", args...).CombinedOutput(); err != nil {
t.Fatalf("git %v: %v\n%s", args, err, out)
}
}
func gitOut(t *testing.T, args ...string) string {
t.Helper()
out, err := exec.Command("git", args...).Output()
if err != nil {
t.Fatalf("git %v: %v", args, err)
}
return strings.TrimSpace(string(out))
}
Run: go test ./cmd/classify-ci-change/ -v
Expected: FAIL — package does not exist.
classify.go
Create cmd/classify-ci-change/classify.go:
package main
import "strings"
// Result is the classifier verdict.
type Result struct {
Code bool // a deploy build input changed
Tracked bool // a document the running wiki serves changed
Notify bool // Code || Tracked — whether to call the Pi at all
}
// trackedMatcher is the narrow surface classify needs from walker.Matcher.
type trackedMatcher interface {
Tracked(path string) bool
}
// classify decides, from a list of repo-relative changed paths, whether the
// push is a deploy build-input change and/or a tracked-document change.
func classify(paths []string, m trackedMatcher) Result {
var r Result
for _, p := range paths {
if isBuildInput(p) {
r.Code = true
}
if m.Tracked(p) {
r.Tracked = true
}
}
r.Notify = r.Code || r.Tracked
return r
}
// isBuildInput reports whether a changed path affects the wiki-browser binary.
// Build inputs are everything under wiki-browser/ except docs/, .claude/ and
// *.md, plus the WB Action workflow file itself (it controls deployment).
func isBuildInput(path string) bool {
if path == ".github/workflows/wiki-browser.yml" {
return true
}
if !strings.HasPrefix(path, "wiki-browser/") {
return false
}
if strings.HasPrefix(path, "wiki-browser/docs/") {
return false
}
if strings.HasPrefix(path, "wiki-browser/.claude/") {
return false
}
if strings.HasSuffix(strings.ToLower(path), ".md") {
return false
}
return true
}
Run: go test ./cmd/classify-ci-change/ -v
Expected: PASS.
main.go
Create cmd/classify-ci-change/main.go:
// Command classify-ci-change is run by the WB Action. It diffs the push range,
// classifies the changed paths, and writes code_changed / tracked_changed /
// notify to $GITHUB_OUTPUT. Any error exits non-zero so the Action fails loud.
package main
import (
"flag"
"fmt"
"os"
"os/exec"
"strings"
"github.com/getorcha/wiki-browser/internal/walker"
"gopkg.in/yaml.v3"
)
// emptyTree is git's well-known empty-tree hash, used when base is the
// all-zero sentinel (a branch's first push).
const emptyTree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
// policy is the committed non-secret mirror of the wiki's document-tracking
// config: just the values, not the matching logic.
type policy struct {
Extensions []string `yaml:"extensions"`
Exclude []string `yaml:"exclude"`
}
func main() {
base := flag.String("base", "", "github.event.before")
head := flag.String("head", "", "github.sha")
policyPath := flag.String("policy", "ci-tracked-paths.yaml", "path policy file")
flag.Parse()
if err := run(*base, *head, *policyPath); err != nil {
fmt.Fprintln(os.Stderr, "classify-ci-change:", err)
os.Exit(1)
}
}
func run(base, head, policyPath string) error {
pol, err := loadPolicy(policyPath)
if err != nil {
return err
}
m := walker.NewMatcher(pol.Extensions, pol.Exclude)
paths, err := changedPaths(base, head)
if err != nil {
return err
}
res := classify(paths, m)
return writeOutputs(map[string]bool{
"code_changed": res.Code,
"tracked_changed": res.Tracked,
"notify": res.Notify,
})
}
func loadPolicy(path string) (policy, error) {
raw, err := os.ReadFile(path)
if err != nil {
return policy{}, fmt.Errorf("read policy %s: %w", path, err)
}
var p policy
if err := yaml.Unmarshal(raw, &p); err != nil {
return policy{}, fmt.Errorf("parse policy %s: %w", path, err)
}
if len(p.Extensions) == 0 {
return policy{}, fmt.Errorf("policy %s: extensions is required", path)
}
return p, nil
}
// changedPaths returns the repo-relative paths changed between base and head.
// A missing or all-zero base (first push) is diffed against the empty tree.
func changedPaths(base, head string) ([]string, error) {
if head == "" {
return nil, fmt.Errorf("head is required")
}
from := base
if from == "" || from == strings.Repeat("0", 40) || !revExists(base) {
from = emptyTree
}
out, err := exec.Command("git", "diff", "--name-only", from+".."+head).CombinedOutput()
if err != nil {
return nil, fmt.Errorf("git diff %s..%s: %w\n%s", from, head, err, out)
}
var paths []string
for _, ln := range strings.Split(string(out), "\n") {
if ln = strings.TrimSpace(ln); ln != "" {
paths = append(paths, ln)
}
}
return paths, nil
}
func revExists(rev string) bool {
return exec.Command("git", "rev-parse", "--verify", "--quiet", rev+"^{commit}").Run() == nil
}
// writeOutputs appends key=value lines to the file named by $GITHUB_OUTPUT.
func writeOutputs(kv map[string]bool) error {
path := os.Getenv("GITHUB_OUTPUT")
if path == "" {
// Not under Actions — print so a manual run is still useful.
for k, v := range kv {
fmt.Printf("%s=%t\n", k, v)
}
return nil
}
f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
if err != nil {
return fmt.Errorf("open GITHUB_OUTPUT: %w", err)
}
defer f.Close()
for k, v := range kv {
if _, err := fmt.Fprintf(f, "%s=%t\n", k, v); err != nil {
return err
}
}
return nil
}
Run: go build ./... && go test ./cmd/classify-ci-change/ -v
Expected: builds clean; tests PASS.
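Go randomizes map iteration order, so writeOutputs emits its key=value lines in a different order from run to run. Each line sets an independent output, so the order does not change the result, but a stable order makes Action logs easier to compare. A small sketch of a deterministic variant, assuming a hypothetical formatOutputs helper that is not part of the plan:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatOutputs is a hypothetical helper: it renders key=value lines in
// sorted key order so the output is identical on every run.
func formatOutputs(kv map[string]bool) string {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%t\n", k, kv[k])
	}
	return b.String()
}

func main() {
	fmt.Print(formatOutputs(map[string]bool{
		"notify":          true,
		"code_changed":    false,
		"tracked_changed": true,
	}))
	// Prints:
	// code_changed=false
	// notify=true
	// tracked_changed=true
}
```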
git add cmd/classify-ci-change/
git commit -m "cmd/classify-ci-change: WB Action path classifier"
Files:
Modify: cmd/wiki-browser/main.go
Modify: Makefile
Test: cmd/wiki-browser/main_test.go
Step 1: Write the failing test
Add to cmd/wiki-browser/main_test.go:
func TestVersionVarsHaveDefaults(t *testing.T) {
// The ldflags-injected vars must have safe defaults so a bare `go build`
// or `go test` (no -X) does not produce empty version strings.
if version == "" || commit == "" || buildTime == "" {
t.Fatalf("version vars must default non-empty: version=%q commit=%q buildTime=%q",
version, commit, buildTime)
}
}
Run: go test ./cmd/wiki-browser/ -run TestVersionVars -v
Expected: FAIL — version undefined.
In cmd/wiki-browser/main.go, after the const block at the top of the file:
// Version metadata, overridden at build time via -ldflags "-X main.version=...".
// Defaults keep a bare `go build` / `go test` working.
var (
version = "dev"
commit = "dev"
buildTime = "unknown"
)
Run: go test ./cmd/wiki-browser/ -run TestVersionVars -v
Expected: PASS.
Replace the top of Makefile (the build and build-arm64 targets) with:
# Version metadata stamped into the wiki-browser binary. wb-cd overrides these
# on the Pi with the exact commit it is deploying.
COMMIT ?= $(shell git rev-parse HEAD 2>/dev/null || echo dev)
VERSION ?= $(shell git rev-parse --short HEAD 2>/dev/null || echo dev)
BUILD_TIME ?= $(shell date -u +%Y-%m-%dT%H:%MZ)
LDFLAGS = -s -w -X main.version=$(VERSION) -X main.commit=$(COMMIT) -X main.buildTime=$(BUILD_TIME)
build:
	go build -trimpath -ldflags="$(LDFLAGS)" -o dist/wiki-browser ./cmd/wiki-browser
	go build -trimpath -ldflags="$(LDFLAGS)" -o dist/wb-agent ./cmd/wb-agent
	go build -trimpath -ldflags="$(LDFLAGS)" -o dist/wb-cd ./cmd/wb-cd

# Cross-compile every binary for a 64-bit Pi.
build-arm64:
	GOOS=linux GOARCH=arm64 CGO_ENABLED=0 \
		go build -trimpath -ldflags="$(LDFLAGS)" -o dist/wiki-browser-arm64 ./cmd/wiki-browser
	GOOS=linux GOARCH=arm64 CGO_ENABLED=0 \
		go build -trimpath -ldflags="$(LDFLAGS)" -o dist/wb-agent-arm64 ./cmd/wb-agent
	GOOS=linux GOARCH=arm64 CGO_ENABLED=0 \
		go build -trimpath -ldflags="$(LDFLAGS)" -o dist/wb-cd-arm64 ./cmd/wb-cd
The -X main.commit= flag is a no-op for wb-agent/wb-cd (they have no main.commit symbol) — the linker silently ignores it, so one LDFLAGS for all three is correct.
make build now references ./cmd/wb-cd, which does not exist until Task 14. Do not run make build until then — go build ./... and go test ./... in the interim do not depend on the Makefile.
git add cmd/wiki-browser/main.go cmd/wiki-browser/main_test.go Makefile
git commit -m "build: stamp version/commit/build-time via ldflags"
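The linker's -X flag only takes effect on a package-level string variable that is either uninitialized or initialized to a constant string expression, which is why the version vars are plain string literals. A minimal sketch of how the defaults behave when no -X is supplied; buildIdentity is a hypothetical helper, not something the plan defines:

```go
package main

import "fmt"

// Defaults used when the binary is built without -ldflags "-X main.version=...".
var (
	version   = "dev"
	commit    = "dev"
	buildTime = "unknown"
)

// buildIdentity is a hypothetical helper that renders the stamp as one line.
func buildIdentity() string {
	return fmt.Sprintf("%s (%s, built %s)", version, commit, buildTime)
}

func main() {
	// With a bare `go run` or `go build` this prints: dev (dev, built unknown)
	fmt.Println(buildIdentity())
}
```

Building the same file with `go build -ldflags "-X main.version=abc1234"` would replace only the version field and leave the other two defaults in place.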
/healthz JSON
Files:
Modify: internal/server/server.go
Test: internal/server/server_test.go (create if absent; otherwise add to it)
Step 1: Write the failing test
Add to internal/server/server_test.go (create the file with package server if it does not exist):
package server
import (
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
)
func TestHealthzReturnsJSONVersion(t *testing.T) {
d := Deps{Version: "abc1234", Commit: "abc1234full", BuildTime: "2026-05-21T14:30Z"}
rec := httptest.NewRecorder()
d.handleHealthz(rec, httptest.NewRequest("GET", "/healthz", nil))
if rec.Code != http.StatusOK {
t.Fatalf("status = %d, want 200", rec.Code)
}
var got map[string]string
if err := json.Unmarshal(rec.Body.Bytes(), &got); err != nil {
t.Fatalf("body not JSON: %v", err)
}
if got["status"] != "ok" {
t.Errorf("status = %q, want ok", got["status"])
}
if got["commit"] != "abc1234full" {
t.Errorf("commit = %q, want abc1234full", got["commit"])
}
if got["version"] != "abc1234" {
t.Errorf("version = %q, want abc1234", got["version"])
}
if got["built"] != "2026-05-21T14:30Z" {
t.Errorf("built = %q", got["built"])
}
}
Run: go test ./internal/server/ -run TestHealthz -v
Expected: FAIL — handleHealthz / Deps.Version undefined.
Deps fields and the handler
In internal/server/server.go, add to the Deps struct (after Title/Root):
// Version, Commit, BuildTime are the build-stamp values surfaced at
// /healthz and in the UI footer. Empty in tests and dev builds.
Version string
Commit string
BuildTime string
Add the handler (a method on Deps, e.g. at the end of server.go):
// handleHealthz reports liveness plus the running build identity. wb-cd's
// post-deploy check reads the full commit here to confirm the swap took.
func (d Deps) handleHealthz(w http.ResponseWriter, r *http.Request) {
writeJSON(w, http.StatusOK, map[string]string{
"status": "ok",
"commit": d.Commit,
"version": d.Version,
"built": d.BuildTime,
})
}
In Mux, replace the inline /healthz closure:
mux.HandleFunc("GET /healthz", d.handleHealthz)
Run: go test ./internal/server/ -run TestHealthz -v
Expected: PASS.
git add internal/server/server.go internal/server/server_test.go
git commit -m "server: /healthz returns JSON build identity"
Files:
Modify: internal/server/embed.go
Modify: internal/server/handler_doc.go
Modify: internal/server/templates/shell.html
Modify: internal/server/static/chrome.css
Test: internal/server/handler_doc_test.go
Step 1: Write the failing test
Add to internal/server/handler_doc_test.go. Mirror the existing writeShell/handleRoot tests in that file for the Deps setup.
func TestShellRendersVersionFooter(t *testing.T) {
d := newDocTestDeps(t) // existing helper used by other handler_doc tests
d.Version = "abc1234"
d.Commit = "abc1234ffffffffffffffffffffffffffffffff"
d.BuildTime = "2026-05-21 14:30"
rec := httptest.NewRecorder()
d.handleRoot(rec, httptest.NewRequest("GET", "/", nil))
body := rec.Body.String()
if !strings.Contains(body, "abc1234") {
t.Error("footer should show the version")
}
if !strings.Contains(body, "github.com/getorcha/orcha/commit/abc1234ffff") {
t.Error("version should link to the GitHub commit")
}
if !strings.Contains(body, "2026-05-21 14:30") {
t.Error("footer should show the build time")
}
}
If newDocTestDeps does not exist, build the Deps inline the way the existing handler_doc_test.go tests do (they need a Walker); the assertions above are what matter.
Run: go test ./internal/server/ -run TestShellRendersVersionFooter -v
Expected: FAIL — version not in the rendered shell.
ShellData fields
In internal/server/embed.go, add to the ShellData struct:
// Version/Commit/BuildTime drive the subtle chrome footer. Empty in dev
// builds — the template omits the footer when Version is empty.
Version string
Commit string
BuildTime string
writeShell
In internal/server/handler_doc.go, add to the ShellData{...} literal in writeShell:
Version: d.Version,
Commit: d.Commit,
BuildTime: d.BuildTime,
shell.html
In internal/server/templates/shell.html, immediately before the closing </body> tag:
{{ if .Version }}
<footer class="wb-version">
<a href="https://github.com/getorcha/orcha/commit/{{ .Commit }}" target="_blank" rel="noopener">{{ .Version }}</a>
<span>· {{ .BuildTime }}</span>
</footer>
{{ end }}
Append to internal/server/static/chrome.css:
/* Version footer — subtle build-identity stamp, bottom-right of the chrome. */
.wb-version {
position: fixed;
bottom: 0;
right: 0;
z-index: 5;
padding: 2px 8px;
font-size: 11px;
line-height: 1.4;
color: var(--wb-muted, #78716c);
background: var(--wb-surface, #ffffff);
border-top-left-radius: 4px;
border-top: 1px solid var(--wb-rule, #e7e5e4);
border-left: 1px solid var(--wb-rule, #e7e5e4);
opacity: 0.6;
}
.wb-version:hover { opacity: 1; }
.wb-version a { color: inherit; text-decoration: none; }
.wb-version a:hover { text-decoration: underline; }
(If chrome.css does not define --wb-muted/--wb-surface/--wb-rule, the fallbacks in the var() calls apply; adjust the fallbacks to match the file's actual palette if it uses different token names.)
Run: go test ./internal/server/ -run TestShellRendersVersionFooter -v
Expected: PASS.
Templates and static assets are embedded via embed.FS — rebuild and restart. Per wiki-browser/CLAUDE.md:
make build
pkill -f 'dist/wiki-browser' || true
nohup ./dist/wiki-browser -config=wiki-browser.dev.yaml >/tmp/wb.log 2>&1 &
disown
playwright-cli open --browser=chromium http://localhost:8080/doc/README.md
playwright-cli screenshot --filename=/tmp/wb-footer.png
make build requires cmd/wb-cd (Task 14). If executing tasks in order, defer this browser step until after Task 14, or temporarily build only the server with go build -o dist/wiki-browser ./cmd/wiki-browser. Then read /tmp/wb-footer.png and confirm the version chip is visible, subtle, and bottom-right.
git add internal/server/embed.go internal/server/handler_doc.go internal/server/templates/shell.html internal/server/static/chrome.css internal/server/handler_doc_test.go
git commit -m "chrome: subtle version footer"
/api/webhook/ci
Files:
Modify: internal/server/handler_webhook.go
Modify: internal/server/server.go
Test: internal/server/handler_webhook_test.go
The #10 Task-9 handler parsed GitHub's native push event. The rewritten handler parses the WB Action's payload {deploy, commit, ref, delivery_id}, always triggers doc-sync, and on deploy:true writes the commit to the CD trigger file.
Replace the body of internal/server/handler_webhook_test.go with:
package server
import (
"crypto/hmac"
"crypto/sha256"
"encoding/hex"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
)
func sign(secret, body []byte) string {
m := hmac.New(sha256.New, secret)
m.Write(body)
return "sha256=" + hex.EncodeToString(m.Sum(nil))
}
const fakeCommit = "a1b2c3d4e5f6789012345678901234567890abcd"
func TestCIWebhookRejectsMissingSignature(t *testing.T) {
d := Deps{WebhookSecret: []byte("topsecret")}
req := httptest.NewRequest("POST", "/api/webhook/ci",
strings.NewReader(`{"deploy":false,"commit":"`+fakeCommit+`"}`))
rec := httptest.NewRecorder()
d.handleCIWebhook(rec, req)
if rec.Code != http.StatusUnauthorized {
t.Fatalf("status = %d, want 401", rec.Code)
}
}
func TestCIWebhookRejectsBadCommit(t *testing.T) {
secret := []byte("s")
body := []byte(`{"deploy":false,"commit":"not-a-sha"}`)
d := Deps{WebhookSecret: secret}
req := httptest.NewRequest("POST", "/api/webhook/ci", strings.NewReader(string(body)))
req.Header.Set("X-Hub-Signature-256", sign(secret, body))
rec := httptest.NewRecorder()
d.handleCIWebhook(rec, req)
if rec.Code != http.StatusBadRequest {
t.Fatalf("status = %d, want 400", rec.Code)
}
}
func TestCIWebhookSyncOnlyReturns204(t *testing.T) {
secret := []byte("s")
body := []byte(`{"deploy":false,"commit":"` + fakeCommit + `","ref":"refs/heads/master"}`)
d := Deps{WebhookSecret: secret} // GitSync nil, no trigger file — must still 204
req := httptest.NewRequest("POST", "/api/webhook/ci", strings.NewReader(string(body)))
req.Header.Set("X-Hub-Signature-256", sign(secret, body))
rec := httptest.NewRecorder()
d.handleCIWebhook(rec, req)
if rec.Code != http.StatusNoContent {
t.Fatalf("status = %d, want 204", rec.Code)
}
}
func TestCIWebhookDeployWritesTriggerFile(t *testing.T) {
secret := []byte("s")
trigger := filepath.Join(t.TempDir(), "cd-trigger")
body := []byte(`{"deploy":true,"commit":"` + fakeCommit + `","ref":"refs/heads/master"}`)
d := Deps{WebhookSecret: secret, CDTriggerFile: trigger}
req := httptest.NewRequest("POST", "/api/webhook/ci", strings.NewReader(string(body)))
req.Header.Set("X-Hub-Signature-256", sign(secret, body))
rec := httptest.NewRecorder()
d.handleCIWebhook(rec, req)
if rec.Code != http.StatusNoContent {
t.Fatalf("status = %d, want 204", rec.Code)
}
got, err := os.ReadFile(trigger)
if err != nil {
t.Fatalf("trigger file not written: %v", err)
}
if strings.TrimSpace(string(got)) != fakeCommit {
t.Fatalf("trigger file = %q, want %q", got, fakeCommit)
}
}
func TestCIWebhookDeployWriteFailureReturns5xx(t *testing.T) {
secret := []byte("s")
trigger := filepath.Join(t.TempDir(), "missing", "cd-trigger")
body := []byte(`{"deploy":true,"commit":"` + fakeCommit + `","ref":"refs/heads/master"}`)
d := Deps{WebhookSecret: secret, CDTriggerFile: trigger}
req := httptest.NewRequest("POST", "/api/webhook/ci", strings.NewReader(string(body)))
req.Header.Set("X-Hub-Signature-256", sign(secret, body))
rec := httptest.NewRecorder()
d.handleCIWebhook(rec, req)
if rec.Code < 500 {
t.Fatalf("status = %d, want 5xx on trigger write failure", rec.Code)
}
}
func TestCIWebhookDeployWithoutCDConfiguredStill204(t *testing.T) {
secret := []byte("s")
body := []byte(`{"deploy":true,"commit":"` + fakeCommit + `","ref":"refs/heads/master"}`)
d := Deps{WebhookSecret: secret} // CDTriggerFile empty
req := httptest.NewRequest("POST", "/api/webhook/ci", strings.NewReader(string(body)))
req.Header.Set("X-Hub-Signature-256", sign(secret, body))
rec := httptest.NewRecorder()
d.handleCIWebhook(rec, req)
if rec.Code != http.StatusNoContent {
t.Fatalf("status = %d, want 204 (deploy ignored, logged)", rec.Code)
}
}
Run: go test ./internal/server/ -run TestCIWebhook -v
Expected: FAIL — handleCIWebhook / Deps.CDTriggerFile undefined.
handler_webhook.go
Replace the whole file with:
package server
import (
"crypto/hmac"
"crypto/sha256"
"encoding/hex"
"encoding/json"
"io"
"log/slog"
"net/http"
"os"
"path/filepath"
"regexp"
)
// webhookMaxBody caps the request body the webhook handler will read.
const webhookMaxBody = 1 << 20 // 1 MiB — the CI payload is tiny
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)
// ciPayload is the WB Action's request body.
type ciPayload struct {
Deploy bool `json:"deploy"`
Commit string `json:"commit"`
Ref string `json:"ref"`
DeliveryID string `json:"delivery_id"`
}
// handleCIWebhook verifies the WB Action's HMAC-signed delivery, triggers a
// git-sync Sync (the doc pull), and — on deploy:true — writes the approved
// commit to the CD trigger file. Public route: unauthenticated by session,
// protected only by the shared secret. Responds 204 fast after successful
// validation and trigger write; sync and the CD run proceed out of band.
func (d Deps) handleCIWebhook(w http.ResponseWriter, r *http.Request) {
if len(d.WebhookSecret) == 0 {
w.WriteHeader(http.StatusServiceUnavailable)
return
}
body, err := io.ReadAll(io.LimitReader(r.Body, webhookMaxBody))
if err != nil {
w.WriteHeader(http.StatusBadRequest)
return
}
if !validSignature(d.WebhookSecret, body, r.Header.Get("X-Hub-Signature-256")) {
slog.Warn("ci-webhook: signature verification failed", "remote", r.RemoteAddr)
w.WriteHeader(http.StatusUnauthorized)
return
}
var p ciPayload
if err := json.Unmarshal(body, &p); err != nil {
slog.Warn("ci-webhook: bad payload", "err", err)
w.WriteHeader(http.StatusBadRequest)
return
}
if !fullSHA.MatchString(p.Commit) {
slog.Warn("ci-webhook: malformed commit", "commit", p.Commit, "delivery", p.DeliveryID)
w.WriteHeader(http.StatusBadRequest)
return
}
if d.GitSync != nil {
wantRef := "refs/heads/" + d.GitSync.Branch()
if p.Ref != wantRef {
slog.Warn("ci-webhook: ref not the synced branch", "ref", p.Ref, "want", wantRef)
w.WriteHeader(http.StatusBadRequest)
return
}
d.GitSync.RequestSync()
}
if p.Deploy {
if d.CDTriggerFile == "" {
slog.Warn("ci-webhook: deploy requested but cd is not configured",
"commit", p.Commit, "delivery", p.DeliveryID)
} else if err := writeTriggerFile(d.CDTriggerFile, p.Commit); err != nil {
slog.Error("ci-webhook: write trigger file failed", "err", err)
w.WriteHeader(http.StatusServiceUnavailable)
return
} else {
slog.Info("ci-webhook: deploy queued", "commit", p.Commit, "delivery", p.DeliveryID)
}
}
w.WriteHeader(http.StatusNoContent)
}
// writeTriggerFile atomically writes commit to path (temp file + rename), so
// the .path unit never observes a half-written trigger.
func writeTriggerFile(path, commit string) error {
tmp := path + ".tmp"
if err := os.WriteFile(tmp, []byte(commit+"\n"), 0o644); err != nil {
return err
}
return os.Rename(tmp, filepath.Clean(path))
}
// validSignature constant-time compares the HMAC-SHA256 header.
func validSignature(secret, body []byte, header string) bool {
if header == "" {
return false
}
m := hmac.New(sha256.New, secret)
m.Write(body)
want := "sha256=" + hex.EncodeToString(m.Sum(nil))
return hmac.Equal([]byte(header), []byte(want))
}
Deps and the route
In internal/server/server.go, add to the Deps struct after WebhookSecret:
// CDTriggerFile is the path wb-cd watches. Empty leaves a deploy:true
// webhook logged-and-ignored (doc-sync still happens).
CDTriggerFile string
In Mux, rename the webhook route:
if len(d.WebhookSecret) > 0 {
mux.HandleFunc("POST /api/webhook/ci", d.handleCIWebhook)
}
Run: go test ./internal/server/ -run TestCIWebhook -v
Expected: PASS.
Run: go test ./internal/server/ -v
Expected: PASS. No other test should still reference the old /api/webhook/github route or handleGitHubWebhook; if one does, update it to the new route and handler name.
git add internal/server/handler_webhook.go internal/server/server.go internal/server/handler_webhook_test.go
git commit -m "server: rewrite webhook as /api/webhook/ci with deploy payload"
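The trigger file written by this task is a one-line contract: a full 40-character commit SHA followed by a newline. A consumer could read it back as sketched below. readTrigger and roundTrip are hypothetical names used only for illustration; the actual wb-cd reader is defined in its own task and may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// fullSHA is the same shape check the webhook handler applies.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// readTrigger (hypothetical) returns the commit stored in the trigger file,
// rejecting anything that is not a full lowercase hex SHA.
func readTrigger(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	commit := strings.TrimSpace(string(raw))
	if !fullSHA.MatchString(commit) {
		return "", fmt.Errorf("trigger file %s: malformed commit %q", path, commit)
	}
	return commit, nil
}

// roundTrip writes commit the way writeTriggerFile does (temp file, then
// rename) into a fresh temp dir and reads it back with readTrigger.
func roundTrip(commit string) (string, error) {
	dir, err := os.MkdirTemp("", "cd-trigger-*")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "cd-trigger")
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, []byte(commit+"\n"), 0o644); err != nil {
		return "", err
	}
	if err := os.Rename(tmp, path); err != nil {
		return "", err
	}
	return readTrigger(path)
}
```

Validating on the read side as well is a defensive choice: the file lives on disk, so the reader should not assume only the webhook handler ever wrote it.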
Files:
Modify: internal/agent/service.go
Test: internal/agent/service_test.go
Step 1: Write the failing test
Add to internal/agent/service_test.go. Reuse the existing service-test harness (fakeRunner, the NewService/ServiceConfig setup used by other tests in the file). The new tests need context, errors, and time in the import block; add any that are missing.
func TestDrainRejectsNewSubmitsAndWaitsForInflight(t *testing.T) {
// A runner that blocks until released, so a job is genuinely in flight
// while Drain runs.
release := make(chan struct{})
started := make(chan struct{})
svc := newTestService(t, testServiceOpts{
runner: blockingRunner{started: started, release: release},
})
jobID, err := svc.Submit(SubmitInput{
Kind: "perspective", SourcePath: "a.md",
PersonaName: "p", SourceSHA: "s", PersonaSHA: "ps",
})
if err != nil {
t.Fatalf("Submit: %v", err)
}
<-started // the job is running
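drained := make(chan bool, 1)
go func() { drained <- svc.Drain(context.Background()) }()
// Wait until Drain has set the flag. Without this the Submit below can run
// before the goroutine is scheduled and be accepted instead of rejected.
deadline := time.Now().Add(2 * time.Second)
for {
svc.mu.Lock()
isDraining := svc.draining
svc.mu.Unlock()
if isDraining {
break
}
if time.Now().After(deadline) {
t.Fatal("Drain did not set draining within 2s")
}
time.Sleep(time.Millisecond)
}
// New Submit during the drain must be rejected.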
if _, err := svc.Submit(SubmitInput{
Kind: "perspective", SourcePath: "b.md",
PersonaName: "p", SourceSHA: "s", PersonaSHA: "ps",
}); !errors.Is(err, ErrDraining) {
t.Fatalf("Submit during drain: err = %v, want ErrDraining", err)
}
close(release) // let the in-flight job finish
if ok := <-drained; !ok {
t.Fatal("Drain returned false; want true (job finished within budget)")
}
_ = jobID
}
func TestDrainHitsCapWhenJobNeverFinishes(t *testing.T) {
release := make(chan struct{})
started := make(chan struct{})
svc := newTestService(t, testServiceOpts{
runner: blockingRunner{started: started, release: release},
})
defer func() { close(release); svc.Stop() }()
if _, err := svc.Submit(SubmitInput{
Kind: "perspective", SourcePath: "a.md",
PersonaName: "p", SourceSHA: "s", PersonaSHA: "ps",
}); err != nil {
t.Fatalf("Submit: %v", err)
}
<-started
ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
defer cancel()
if ok := svc.Drain(ctx); ok {
t.Fatal("Drain returned true; want false (cap hit before the job finished)")
}
}
blockingRunner is a Runner whose Run closes started, then blocks on release, then returns a success RunResult. Add it to the test file (mirror the existing fakeRunner shape):
type blockingRunner struct {
started chan struct{}
release chan struct{}
}
func (r blockingRunner) Run(ctx context.Context, _ Job) RunResult {
close(r.started)
select {
case <-r.release:
case <-ctx.Done():
}
return RunResult{ExitCode: 0}
}
If newTestService/testServiceOpts do not already exist in service_test.go, build the Service inline the way the existing tests do; the behavioural assertions are the contract.
Also add a regression assertion for the acceptance race: the implementation should expose the accepted job to Drain before Submit releases s.mu. If the local test harness can pause storage calls, pause Submit after acceptance but before the runner goroutine starts, call Drain concurrently, and assert it does not return until the paused submit is released and the job finishes. If the existing harness cannot pause storage cleanly, factor the acceptance block into a small helper and unit-test that it sets draining/inflight/wg accounting as one critical section. This test is mandatory: a job accepted before drain begins must not be missed by the wait group.
Run: go test ./internal/agent/ -run TestDrain -v
Expected: FAIL — Drain / ErrDraining undefined.
Drain
In internal/agent/service.go:
Add to the error var block:
ErrDraining = errors.New("agent: service is draining; not accepting new jobs")
Add a draining field to the Service struct's mutex-guarded block (next to consecutiveFailures):
draining bool // guarded by mu
In Submit, add the draining check inside the existing s.mu.Lock() block — right after the lock, before the inflight check:
s.mu.Lock()
if s.draining {
s.mu.Unlock()
return "", ErrDraining
}
if _, ok := s.inflight[key]; ok {
s.mu.Unlock()
return "", ErrInflight
}
jobID := uuid.NewString()
s.inflight[key] = struct{}{}
s.wg.Add(1) // accepted jobs must be visible to Drain before releasing mu
s.mu.Unlock()
Remove the later jobID := uuid.NewString(), s.inflight[key] = struct{}{}, and s.wg.Add(1) from their old locations. From this point on, any error path before the goroutine is launched must balance both the in-memory gate and the wait group:
releaseAccepted := func() {
s.releaseInflight(key)
s.wg.Done()
}
Use releaseAccepted() for checkPersistedInflight / InsertJob failures, and keep the goroutine's existing defer s.wg.Done() for accepted jobs that actually launch. The invariant is: once Submit accepts a job under s.mu, Drain must wait for it even if the job has not reached the runner yet. Do not check draining, unlock, and only later call wg.Add(1) — that reintroduces the race this task is meant to prevent.
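The invariant can be seen in isolation with a stripped-down model. The sketch below is not the Service type: gate, accept, done, and demo are hypothetical stand-ins that keep only the mutex, the draining flag, and the wait group, to show why the draining check and wg.Add(1) must share one critical section.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// gate is a minimal stand-in for the Service fields involved in draining.
type gate struct {
	mu       sync.Mutex
	draining bool
	wg       sync.WaitGroup
}

// accept admits a job unless draining. The check and wg.Add happen under the
// same lock, so drain can never miss a job that was admitted.
func (g *gate) accept() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.draining {
		return false
	}
	g.wg.Add(1)
	return true
}

// done marks one admitted job finished (or abandoned on an error path).
func (g *gate) done() { g.wg.Done() }

// drain closes the gate, then waits for admitted jobs or for ctx.
func (g *gate) drain(ctx context.Context) bool {
	g.mu.Lock()
	g.draining = true
	g.mu.Unlock()
	finished := make(chan struct{})
	go func() { g.wg.Wait(); close(finished) }()
	select {
	case <-finished:
		return true
	case <-ctx.Done():
		return false
	}
}

// demo admits one job, drains with a short cap while it is still open,
// tries to admit another, then finishes the job and drains again.
func demo() (capped, rejected, drained bool) {
	g := &gate{}
	g.accept()
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	capped = !g.drain(ctx)
	rejected = !g.accept()
	g.done()
	drained = g.drain(context.Background())
	return
}
```

If accept released the lock between the draining check and wg.Add(1), drain could observe a zero counter and return true while a job was about to start, which is the race this task rules out.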
Add the Drain method (after Stop):
// Drain stops accepting new jobs and waits for every already-submitted job
// (running or semaphore-queued) to finish, or until ctx is done. It does NOT
// cancel in-flight jobs — that is the whole point. Returns true if all jobs
// finished within ctx, false if ctx fired first; on false the caller should
// Stop() to hard-cancel the stragglers.
func (s *Service) Drain(ctx context.Context) bool {
s.mu.Lock()
s.draining = true
s.mu.Unlock()
done := make(chan struct{})
go func() {
s.wg.Wait()
close(done)
}()
select {
case <-done:
return true
case <-ctx.Done():
return false
}
}
Run: go test ./internal/agent/ -v
Expected: PASS.
git add internal/agent/service.go internal/agent/service_test.go
git commit -m "agent: Drain — stop new jobs, wait for in-flight"
Files:
Modify: internal/server/server.go
Modify: cmd/wiki-browser/main.go
Test: internal/server/server_test.go
Step 1: Write the failing test
Add to internal/server/server_test.go:
import "sync/atomic" // add to the import block
func TestDrainGuardRejectsMutationsWhileDraining(t *testing.T) {
var draining atomic.Bool
inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
})
h := drainGuard(&draining, inner)
// Not draining → passes through.
rec := httptest.NewRecorder()
h.ServeHTTP(rec, httptest.NewRequest("POST", "/x", nil))
if rec.Code != http.StatusOK {
t.Fatalf("not draining: status = %d, want 200", rec.Code)
}
// Draining → 503 with Retry-After.
draining.Store(true)
rec = httptest.NewRecorder()
h.ServeHTTP(rec, httptest.NewRequest("POST", "/x", nil))
if rec.Code != http.StatusServiceUnavailable {
t.Fatalf("draining: status = %d, want 503", rec.Code)
}
if rec.Header().Get("Retry-After") == "" {
t.Error("draining response missing Retry-After")
}
}
func TestDrainGuardNilIsPassthrough(t *testing.T) {
inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
})
rec := httptest.NewRecorder()
drainGuard(nil, inner).ServeHTTP(rec, httptest.NewRequest("POST", "/x", nil))
if rec.Code != http.StatusOK {
t.Fatalf("nil draining flag: status = %d, want 200", rec.Code)
}
}
Run: go test ./internal/server/ -run TestDrainGuard -v
Expected: FAIL — drainGuard undefined.
Deps field and drainGuard
In internal/server/server.go, add "sync/atomic" to the imports, and add to Deps after CDTriggerFile:
// Draining, when set and true, makes mutating routes return 503 so a
// redeploy's drain is not extended by new write requests. Nil disables
// the guard (tests, and any build without a shutdown drain).
Draining *atomic.Bool
Add the middleware (near withSession in server.go):
// drainGuard returns 503 for mutating requests once draining is set, so the
// shutdown drain is not prolonged by new writes. The CI webhook is guarded
// too: the WB Action retries a 503 against the freshly-started process.
func drainGuard(draining *atomic.Bool, h http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if draining != nil && draining.Load() {
w.Header().Set("Retry-After", "30")
writeJSONError(w, http.StatusServiceUnavailable, "draining")
return
}
h.ServeHTTP(w, r)
})
}
In Mux, wrap every mutating route with drainGuard. The mutating routes are the POST /api/batches, POST /api/topics, POST /api/topics/{id}/messages, POST /api/topics/{id}/proposals, POST /api/topics/{id}/discard, POST /api/proposals/{id}/incorporate, POST /api/agent/jobs handlers and the CI webhook. Wrap the outermost handler. For example:
mux.Handle("POST /api/batches",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleCreateBatch))))))
mux.Handle("POST /api/topics",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleCreateTopic))))))
mux.Handle("POST /api/topics/{id}/messages",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleAppendTopicMessage))))))
mux.Handle("POST /api/topics/{id}/proposals",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleProposeRewrite))))))
mux.Handle("POST /api/topics/{id}/discard",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleDiscardTopic))))))
mux.Handle("POST /api/proposals/{id}/incorporate",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleIncorporate))))))
mux.Handle("POST /api/agent/jobs",
drainGuard(d.Draining, d.withSession(auth.RequireCollaborator(auth.RequireCSRF(http.HandlerFunc(d.handleCreateAgentJob))))))
And the CI webhook:
if len(d.WebhookSecret) > 0 {
mux.Handle("POST /api/webhook/ci", drainGuard(d.Draining, http.HandlerFunc(d.handleCIWebhook)))
}
GET routes and POST /api/stream/focus (presence only, not durable state) are left unguarded — reads must keep working during the drain.
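From the caller's side, the guard's 503 plus Retry-After is a request to try again against the restarted process. A client could interpret it roughly as below. retryDelay and exampleDelays are hypothetical helpers, and the 5-second fallback is an arbitrary illustrative value; the WB Action's real retry policy is not specified in this section.

```go
package main

import (
	"net/http"
	"strconv"
	"time"
)

// retryDelay (hypothetical) reports whether a response is a drain-style 503
// and how long to wait before retrying. Only the delta-seconds form of
// Retry-After is handled, which is the form drainGuard emits; a missing or
// unparsable header falls back to def.
func retryDelay(status int, h http.Header, def time.Duration) (time.Duration, bool) {
	if status != http.StatusServiceUnavailable {
		return 0, false
	}
	secs, err := strconv.Atoi(h.Get("Retry-After"))
	if err != nil || secs < 0 {
		return def, true
	}
	return time.Duration(secs) * time.Second, true
}

// exampleDelays returns, in whole seconds, the delay for a 503 carrying
// Retry-After: 30 and for a 503 without the header, plus whether a 204 asks
// for a retry at all.
func exampleDelays() (withHeader, withoutHeader int, retry204 bool) {
	h := http.Header{}
	h.Set("Retry-After", "30")
	d1, _ := retryDelay(http.StatusServiceUnavailable, h, 5*time.Second)
	d2, _ := retryDelay(http.StatusServiceUnavailable, http.Header{}, 5*time.Second)
	_, retry204 = retryDelay(http.StatusNoContent, h, 5*time.Second)
	return int(d1 / time.Second), int(d2 / time.Second), retry204
}
```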
main.go shutdown sequence
In cmd/wiki-browser/main.go:
Add a constant near the other consts:
// agentDrainCap bounds how long a redeploy waits for in-flight Agent jobs
// before giving up and hard-cancelling them. wiki-browser.service sets
// TimeoutStopSec (11m) above this cap so systemd never SIGKILLs mid-drain.
const agentDrainCap = 10 * time.Minute
Add "sync/atomic" to the imports.
Before the deps := server.Deps{...} literal, create the draining flag:
var draining atomic.Bool
Add Draining: &draining, to the server.Deps{...} literal.
Replace the select { ... } block at the end of run() with:
select {
case err := <-errCh:
if errors.Is(err, http.ErrServerClosed) {
return nil
}
return err
case <-rootCtx.Done():
slog.Info("shutdown: draining")
// 1. Refuse new mutations so the drain is not prolonged.
draining.Store(true)
// 2. Wait for in-flight Agent jobs (up to the cap). Reads keep serving.
drainCtx, drainCancel := context.WithTimeout(context.Background(), agentDrainCap)
if ok := agentSvc.Drain(drainCtx); !ok {
slog.Warn("shutdown: agent drain hit the cap; stragglers will be cancelled")
}
drainCancel()
// 3. Stop the HTTP server (active read handlers finish fast).
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
return srv.Shutdown(shutdownCtx)
}
The existing defer agentSvc.Stop() (idempotent hard-cancel) still runs on return, after srv.Shutdown — it cancels any straggler the capped drain left running. Leave that defer in place.
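The timeouts in this sequence have to fit inside systemd's stop timeout, and the arithmetic is easy to check. The sketch below uses the three durations stated in this task (10m drain cap, 30s HTTP shutdown, 11m TimeoutStopSec); httpShutdownCap, timeoutStopSec, and stopHeadroom are illustrative names, not identifiers in main.go.

```go
package main

import "time"

const (
	agentDrainCap   = 10 * time.Minute // drain cap from this task
	httpShutdownCap = 30 * time.Second // srv.Shutdown timeout in the select block
	timeoutStopSec  = 11 * time.Minute // wiki-browser.service TimeoutStopSec
)

// stopHeadroom returns how much of systemd's stop timeout is left after the
// worst-case agent drain plus HTTP shutdown. A negative value would mean
// systemd could SIGKILL the process before shutdown completes.
func stopHeadroom() time.Duration {
	return timeoutStopSec - (agentDrainCap + httpShutdownCap)
}
```

With these values the headroom is 30 seconds. The deferred agentSvc.Stop() also runs inside that window, so if either cap is raised, TimeoutStopSec should be raised with it.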
Run: go test ./internal/server/ ./cmd/wiki-browser/ -v
Expected: PASS.
Run: go build ./...
Expected: builds clean.
git add internal/server/server.go internal/server/server_test.go cmd/wiki-browser/main.go
git commit -m "server,main: graceful drain — 503 mutations, wait for agent jobs"
internal/cd — Config and State
Files:
Create: internal/cd/config.go
Create: internal/cd/state.go
Test: internal/cd/state_test.go
Step 1: Write the failing test
Create internal/cd/state_test.go:
package cd
import (
"path/filepath"
"testing"
)
func TestLoadStateMissingFileIsZero(t *testing.T) {
s, err := LoadState(filepath.Join(t.TempDir(), "nope.json"))
if err != nil {
t.Fatalf("LoadState of missing file: %v", err)
}
if s.DeployedCommit != "" || s.PoisonedCommit != "" {
t.Errorf("missing file should yield zero State, got %+v", s)
}
}
func TestSaveThenLoadRoundTrips(t *testing.T) {
path := filepath.Join(t.TempDir(), "cd-state.json")
want := State{
DeployedCommit: "a1b2c3d",
DeployedAt: "2026-05-21T14:30:00Z",
PoisonedCommit: "deadbeef",
}
if err := SaveState(path, want); err != nil {
t.Fatalf("SaveState: %v", err)
}
got, err := LoadState(path)
if err != nil {
t.Fatalf("LoadState: %v", err)
}
if got != want {
t.Fatalf("round trip = %+v, want %+v", got, want)
}
}
Run: go test ./internal/cd/ -v
Expected: FAIL — package does not exist.
config.go
Create internal/cd/config.go:
// Package cd is the wiki-browser continuous-delivery engine. It rebuilds the
// binaries for a test-approved commit, swaps them atomically, restarts the
// service, health-checks it, and rolls back to the last-known-good binaries on
// failure. It only ever reads the clone (git archive / cat-file); git-sync
// remains the sole mutator of the clone's git state.
package cd
import "time"
// Config is the fully-specified input to New.
type Config struct {
Root string // the clone — git archive source (cfg.Root)
BinDir string // live binaries; bin/prev/ is the rollback copy
TriggerFile string // the webhook writes the target commit here
StateFile string // deployed/poisoned commit, JSON
HealthURL string // e.g. http://localhost:8080/healthz
HealthPollTimeout time.Duration // how long to wait for a healthy /healthz
ServiceName string // the systemd unit to restart, e.g. "wiki-browser"
}
// binaries are the three artifacts a deploy swaps, in dependency-safe order
// (wb-agent and wb-cd first, the user-facing wiki-browser last).
var binaries = []string{"wb-agent", "wb-cd", "wiki-browser"}
state.go
Create internal/cd/state.go:
package cd
import (
"encoding/json"
"errors"
"io/fs"
"os"
)
// State is wb-cd's persisted memory between runs.
type State struct {
DeployedCommit string `json:"deployed_commit"`
DeployedAt string `json:"deployed_at"`
// PoisonedCommit is a commit that failed to deploy (bad build, unhealthy
// binary). wb-cd skips it until a *different* commit arrives.
PoisonedCommit string `json:"poisoned_commit"`
}
// LoadState reads the state file. A missing file yields the zero State.
func LoadState(path string) (State, error) {
raw, err := os.ReadFile(path)
if errors.Is(err, fs.ErrNotExist) {
return State{}, nil
}
if err != nil {
return State{}, err
}
var s State
if err := json.Unmarshal(raw, &s); err != nil {
return State{}, err
}
return s, nil
}
// SaveState atomically writes the state file (temp file + rename).
func SaveState(path string, s State) error {
raw, err := json.MarshalIndent(s, "", " ")
if err != nil {
return err
}
tmp := path + ".tmp"
if err := os.WriteFile(tmp, raw, 0o644); err != nil {
return err
}
return os.Rename(tmp, path)
}
Run: go test ./internal/cd/ -v
Expected: PASS.
git add internal/cd/config.go internal/cd/state.go internal/cd/state_test.go
git commit -m "cd: Config and persisted State"
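To make the role of PoisonedCommit concrete, here is one way the skip decision could look. decide is a hypothetical helper, not part of this task's files. Skipping a poisoned commit follows the State comment above; skipping an already-deployed commit and an empty target are assumptions added for the example.

```go
package main

// State mirrors the persisted fields defined in state.go.
type State struct {
	DeployedCommit string `json:"deployed_commit"`
	DeployedAt     string `json:"deployed_at"`
	PoisonedCommit string `json:"poisoned_commit"`
}

// decide (hypothetical) reports whether target should be deployed and why.
func decide(s State, target string) (deploy bool, reason string) {
	switch {
	case target == "":
		return false, "empty trigger" // assumption: nothing to do
	case target == s.PoisonedCommit:
		return false, "poisoned" // failed before; wait for a different commit
	case target == s.DeployedCommit:
		return false, "already deployed" // assumption: redeploy is a no-op
	default:
		return true, "new commit"
	}
}
```

Because the poison marker is compared by equality, any later commit clears the block without an explicit reset step.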
internal/cd — git helpers and build
Files:
Create: internal/cd/git.go
Create: internal/cd/build.go
Create: internal/cd/helpers_test.go
Test: internal/cd/git_test.go
Step 1: Write the test helpers
Create internal/cd/helpers_test.go:
package cd
import (
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
)
// mustGit runs git in dir and fails the test on error.
func mustGit(t *testing.T, dir string, args ...string) string {
t.Helper()
cmd := exec.Command("git", append([]string{"-C", dir}, args...)...)
cmd.Env = append(os.Environ(),
"GIT_AUTHOR_NAME=test", "GIT_AUTHOR_EMAIL=test@test",
"GIT_COMMITTER_NAME=test", "GIT_COMMITTER_EMAIL=test@test",
)
out, err := cmd.CombinedOutput()
if err != nil {
t.Fatalf("git %s: %v\n%s", strings.Join(args, " "), err, out)
}
return strings.TrimSpace(string(out))
}
// newTestRepo builds a git repo with one commit and returns its path plus the
// HEAD commit sha.
func newTestRepo(t *testing.T) (root, head string) {
t.Helper()
root = t.TempDir()
mustGit(t, root, "init", "-b", "master")
mustGit(t, root, "config", "user.name", "test")
mustGit(t, root, "config", "user.email", "test@test")
if err := os.WriteFile(filepath.Join(root, "README.md"), []byte("hello\n"), 0o644); err != nil {
t.Fatal(err)
}
mustGit(t, root, "add", "README.md")
mustGit(t, root, "commit", "-m", "init")
return root, mustGit(t, root, "rev-parse", "HEAD")
}
Create internal/cd/git_test.go:
package cd
import (
"os"
"path/filepath"
"testing"
)
func TestCommitPresent(t *testing.T) {
root, head := newTestRepo(t)
if !commitPresent(root, head) {
t.Errorf("commitPresent(%s) = false, want true", head)
}
if commitPresent(root, "0000000000000000000000000000000000000000") {
t.Error("commitPresent of a nonexistent commit = true, want false")
}
}
func TestArchiveCommitExtractsSubtree(t *testing.T) {
root := t.TempDir()
mustGit(t, root, "init", "-b", "master")
mustGit(t, root, "config", "user.name", "test")
mustGit(t, root, "config", "user.email", "test@test")
// A monorepo shape: wiki-browser/ plus an unrelated subproject.
if err := os.MkdirAll(filepath.Join(root, "wiki-browser"), 0o755); err != nil {
t.Fatal(err)
}
if err := os.MkdirAll(filepath.Join(root, "crm"), 0o755); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(root, "wiki-browser", "go.mod"), []byte("module x\n"), 0o644); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(root, "crm", "main.go"), []byte("package main\n"), 0o644); err != nil {
t.Fatal(err)
}
mustGit(t, root, "add", ".")
mustGit(t, root, "commit", "-m", "init")
head := mustGit(t, root, "rev-parse", "HEAD")
dest := t.TempDir()
if err := archiveCommit(root, head, dest); err != nil {
t.Fatalf("archiveCommit: %v", err)
}
if _, err := os.Stat(filepath.Join(dest, "wiki-browser", "go.mod")); err != nil {
t.Errorf("wiki-browser/go.mod not extracted: %v", err)
}
if _, err := os.Stat(filepath.Join(dest, "crm")); !os.IsNotExist(err) {
t.Error("crm/ should NOT be extracted — archive is scoped to wiki-browser")
}
}
Run: go test ./internal/cd/ -run 'TestCommitPresent|TestArchive' -v
Expected: FAIL — commitPresent / archiveCommit undefined.
git.go
Create internal/cd/git.go:
package cd
import (
	"bytes"
	"fmt"
	"os/exec"
)
// commitPresent reports whether root's object database contains commit.
func commitPresent(root, commit string) bool {
	return exec.Command("git", "-C", root, "cat-file", "-e", commit+"^{commit}").Run() == nil
}
// archiveCommit extracts the wiki-browser/ subtree of commit into destDir. It
// reads only the object database (no working tree, no refs), so it never
// races git-sync's mutations of the live clone.
//
// archive's stdout is claimed by StdoutPipe, so CombinedOutput cannot be used
// on it (it fails with "Stdout already set"). Each command's stderr goes to
// its own buffer instead, and archive is started with Run.
func archiveCommit(root, commit, destDir string) error {
	archive := exec.Command("git", "-C", root, "archive", "--format=tar", commit, "wiki-browser")
	var archiveErr bytes.Buffer
	archive.Stderr = &archiveErr
	extract := exec.Command("tar", "-x", "-C", destDir)
	var extractErr bytes.Buffer
	extract.Stderr = &extractErr
	pipe, err := archive.StdoutPipe()
	if err != nil {
		return err
	}
	extract.Stdin = pipe
	if err := extract.Start(); err != nil {
		return fmt.Errorf("start tar: %w", err)
	}
	if err := archive.Run(); err != nil {
		_ = extract.Wait()
		return fmt.Errorf("git archive %s: %w\n%s", commit, err, archiveErr.String())
	}
	if err := extract.Wait(); err != nil {
		return fmt.Errorf("tar extract: %w\n%s", err, extractErr.String())
	}
	return nil
}
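The reason archiveCommit captures stderr in a buffer rather than calling CombinedOutput can be checked in a few lines. In the sketch below, combinedAfterPipe is a hypothetical demonstration function; it relies only on documented os/exec behaviour, and no process is actually started.

```go
package main

import "os/exec"

// combinedAfterPipe takes a stdout pipe on a command and then calls
// CombinedOutput on the same command. CombinedOutput refuses up front,
// because StdoutPipe has already set the command's Stdout, so the returned
// error is non-nil and the command never runs.
func combinedAfterPipe() error {
	cmd := exec.Command("git", "version")
	pipe, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	defer pipe.Close()
	_, err = cmd.CombinedOutput()
	return err
}
```

This is why a piped command needs its stderr wired to a separate buffer before it is started.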
build.go
Create internal/cd/build.go:
package cd
import (
"bytes"
"context"
"fmt"
"os"
"os/exec"
"path/filepath"
"time"
)
// buildResult is where buildCommit left the freshly compiled binaries.
type buildResult struct {
distDir string // contains wiki-browser, wb-agent, wb-cd
tmpDir string // the archive extraction root; caller removes it
}
// buildCommit archives commit from root, runs `make build` in the extracted
// wiki-browser module, and returns the dist directory. The caller must remove
// result.tmpDir. Version metadata is passed explicitly because the extracted
// tree has no .git for the Makefile to query.
func buildCommit(ctx context.Context, root, commit string) (buildResult, error) {
tmp, err := os.MkdirTemp("", "wb-cd-build-*")
if err != nil {
return buildResult{}, err
}
if err := archiveCommit(root, commit, tmp); err != nil {
_ = os.RemoveAll(tmp)
return buildResult{}, err
}
module := filepath.Join(tmp, "wiki-browser")
short := commit
if len(short) > 7 {
short = short[:7]
}
cmd := exec.CommandContext(ctx, "make", "-C", module, "build",
"COMMIT="+commit,
"VERSION="+short,
"BUILD_TIME="+time.Now().UTC().Format("2006-01-02T15:04Z"),
)
var out bytes.Buffer
cmd.Stdout = &out
cmd.Stderr = &out
if err := cmd.Run(); err != nil {
_ = os.RemoveAll(tmp)
return buildResult{}, fmt.Errorf("make build: %w\n%s", err, out.String())
}
return buildResult{distDir: filepath.Join(module, "dist"), tmpDir: tmp}, nil
}
Run: go test ./internal/cd/ -v && go build ./internal/cd/
Expected: PASS; builds clean.
git add internal/cd/git.go internal/cd/build.go internal/cd/git_test.go internal/cd/helpers_test.go
git commit -m "cd: git archive helpers and buildCommit"
internal/cd — swap, restart, health-check
Files:
Create: internal/cd/swap.go
Test: internal/cd/swap_test.go
Step 1: Write the failing tests
Create internal/cd/swap_test.go:
package cd
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"time"
)
// writeFakeBinaries drops the three named files into dir with given contents.
func writeFakeBinaries(t *testing.T, dir string, content map[string]string) {
t.Helper()
for name, body := range content {
if err := os.WriteFile(filepath.Join(dir, name), []byte(body), 0o755); err != nil {
t.Fatal(err)
}
}
}
func TestSnapshotAndSwap(t *testing.T) {
binDir := t.TempDir()
distDir := t.TempDir()
writeFakeBinaries(t, binDir, map[string]string{
"wiki-browser": "OLD", "wb-agent": "OLD", "wb-cd": "OLD",
})
writeFakeBinaries(t, distDir, map[string]string{
"wiki-browser": "NEW", "wb-agent": "NEW", "wb-cd": "NEW",
})
if err := snapshotBinaries(binDir); err != nil {
t.Fatalf("snapshotBinaries: %v", err)
}
for _, name := range binaries {
got, _ := os.ReadFile(filepath.Join(binDir, "prev", name))
if string(got) != "OLD" {
t.Errorf("prev/%s = %q, want OLD", name, got)
}
}
if err := swapBinaries(distDir, binDir); err != nil {
t.Fatalf("swapBinaries: %v", err)
}
for _, name := range binaries {
got, _ := os.ReadFile(filepath.Join(binDir, name))
if string(got) != "NEW" {
t.Errorf("%s = %q, want NEW after swap", name, got)
}
}
}
func TestRestoreFromPrev(t *testing.T) {
binDir := t.TempDir()
writeFakeBinaries(t, binDir, map[string]string{
"wiki-browser": "BAD", "wb-agent": "BAD", "wb-cd": "BAD",
})
if err := os.MkdirAll(filepath.Join(binDir, "prev"), 0o755); err != nil {
t.Fatal(err)
}
writeFakeBinaries(t, filepath.Join(binDir, "prev"), map[string]string{
"wiki-browser": "GOOD", "wb-agent": "GOOD", "wb-cd": "GOOD",
})
if err := restoreFromPrev(binDir); err != nil {
t.Fatalf("restoreFromPrev: %v", err)
}
for _, name := range binaries {
got, _ := os.ReadFile(filepath.Join(binDir, name))
if string(got) != "GOOD" {
t.Errorf("%s = %q, want GOOD after restore", name, got)
}
}
}
func TestHealthCheckPassesOnMatchingCommit(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok", "commit": "abc123"})
}))
defer srv.Close()
if err := healthCheck(context.Background(), srv.URL, "abc123", time.Second); err != nil {
t.Fatalf("healthCheck: %v", err)
}
}
func TestHealthCheckFailsOnWrongCommit(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok", "commit": "stale"})
}))
defer srv.Close()
err := healthCheck(context.Background(), srv.URL, "abc123", 300*time.Millisecond)
if err == nil {
t.Fatal("healthCheck should fail when /healthz reports a different commit")
}
}
func TestHealthCheckAllowsEmptyWantedCommitForRollbackFallback(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok", "commit": "unknown"})
}))
defer srv.Close()
if err := healthCheck(context.Background(), srv.URL, "", time.Second); err != nil {
t.Fatalf("healthCheck with empty wanted commit should accept any healthy 200: %v", err)
}
}
Run: go test ./internal/cd/ -run 'TestSnapshot|TestRestore|TestHealth' -v
Expected: FAIL — snapshotBinaries etc. undefined.
swap.go
Create internal/cd/swap.go:
package cd
import (
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"os/exec"
"path/filepath"
"time"
)
// copyFile copies src to dst, creating dst with mode 0755. It writes a temp
// file and renames it into place, so dst is never observed half-written and a
// running binary keeps executing its old inode.
func copyFile(src, dst string) error {
in, err := os.Open(src)
if err != nil {
return err
}
defer in.Close()
tmp := dst + ".new"
out, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
if err != nil {
return err
}
if _, err := io.Copy(out, in); err != nil {
out.Close()
return err
}
if err := out.Close(); err != nil {
return err
}
return os.Rename(tmp, dst)
}
// snapshotBinaries copies the current live binaries into binDir/prev/ so a
// failed deploy can be rolled back.
func snapshotBinaries(binDir string) error {
prev := filepath.Join(binDir, "prev")
if err := os.MkdirAll(prev, 0o755); err != nil {
return err
}
for _, name := range binaries {
if err := copyFile(filepath.Join(binDir, name), filepath.Join(prev, name)); err != nil {
return fmt.Errorf("snapshot %s: %w", name, err)
}
}
return nil
}
// swapBinaries atomically installs the freshly built binaries from distDir.
func swapBinaries(distDir, binDir string) error {
for _, name := range binaries {
if err := copyFile(filepath.Join(distDir, name), filepath.Join(binDir, name)); err != nil {
return fmt.Errorf("swap %s: %w", name, err)
}
}
return nil
}
// restoreFromPrev re-installs the binaries snapshotted in binDir/prev/.
func restoreFromPrev(binDir string) error {
prev := filepath.Join(binDir, "prev")
for _, name := range binaries {
if err := copyFile(filepath.Join(prev, name), filepath.Join(binDir, name)); err != nil {
return fmt.Errorf("restore %s: %w", name, err)
}
}
return nil
}
// restartService restarts the systemd unit via sudo. The call blocks for the
// whole graceful drain (wiki-browser.service TimeoutStopSec).
func restartService(ctx context.Context, name string) error {
cmd := exec.CommandContext(ctx, "sudo", "systemctl", "restart", name)
if out, err := cmd.CombinedOutput(); err != nil {
return fmt.Errorf("systemctl restart %s: %w\n%s", name, err, out)
}
return nil
}
// healthCheck polls healthURL until it returns 200 with a JSON body whose
// commit field equals wantCommit, or until budget elapses. Empty wantCommit
// means "any healthy 200" and is used only as a rollback fallback when no
// previous deployed commit was recorded.
func healthCheck(ctx context.Context, healthURL, wantCommit string, budget time.Duration) error {
deadline := time.Now().Add(budget)
var lastErr error
for time.Now().Before(deadline) {
if err := ctx.Err(); err != nil {
return err
}
got, err := probeHealth(ctx, healthURL)
if err == nil && (wantCommit == "" || got == wantCommit) {
return nil
}
if err != nil {
lastErr = err
} else {
lastErr = fmt.Errorf("/healthz reports commit %q, want %q", got, wantCommit)
}
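// Wait before the next probe, but never past the deadline, and stop
// promptly if ctx is cancelled.
wait := 2 * time.Second
if rem := time.Until(deadline); rem < wait {
wait = rem
}
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(wait):
}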
}
if lastErr == nil {
lastErr = fmt.Errorf("health check timed out")
}
return lastErr
}
// probeHealth does one GET and returns the reported commit.
func probeHealth(ctx context.Context, healthURL string) (string, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL, nil)
if err != nil {
return "", err
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return "", err
}
defer resp.Body.Close()
body, _ := io.ReadAll(io.LimitReader(resp.Body, 1<<16))
if resp.StatusCode != http.StatusOK {
return "", fmt.Errorf("/healthz status %d", resp.StatusCode)
}
var payload struct {
Commit string `json:"commit"`
}
if err := json.Unmarshal(body, &payload); err != nil {
return "", fmt.Errorf("/healthz body not JSON: %w", err)
}
return payload.Commit, nil
}
Run: go test ./internal/cd/ -v
Expected: PASS.
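The temp-file-plus-rename idea behind copyFile can be seen in isolation. The sketch below is illustrative only: replaceAtomically and demo are hypothetical names, not part of internal/cd, and the behaviour shown assumes a POSIX filesystem such as the Pi's. It replaces a file while an older handle is still open, the way a running binary holds its executable open.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// replaceAtomically writes data to a sibling temp file, syncs it, and renames
// it over dst. The function name and the ".new" suffix are illustrative.
func replaceAtomically(dst string, data []byte, mode os.FileMode) error {
	tmp := dst + ".new"
	f, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, mode)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, dst)
}

// demo replaces a file while an old handle is open and returns what the old
// handle and the path each see afterwards.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "swap-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "wiki-browser")
	if err := os.WriteFile(path, []byte("OLD"), 0o755); err != nil {
		return "", "", err
	}
	old, err := os.Open(path) // stands in for the still-running process
	if err != nil {
		return "", "", err
	}
	defer old.Close()
	if err := replaceAtomically(path, []byte("NEW"), 0o755); err != nil {
		return "", "", err
	}
	buf := make([]byte, 3)
	n, err := old.Read(buf)
	if err != nil {
		return "", "", err
	}
	cur, err := os.ReadFile(path)
	if err != nil {
		return "", "", err
	}
	return string(buf[:n]), string(cur), nil
}

func main() {
	oldView, newView, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Printf("open handle sees %q, path sees %q\n", oldView, newView)
}
```

The old handle keeps reading the original inode, which is why wb-cd can replace its own binary mid-run without disturbing the process that is doing the replacing.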
git add internal/cd/swap.go internal/cd/swap_test.go
git commit -m "cd: atomic binary swap, restart, health-check"
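For reference, the poll-until-healthy shape used by healthCheck can be written as a small generic helper. This is a sketch under assumptions: pollUntil and demoPoll are hypothetical names, and the probe is a plain function rather than an HTTP call, so the example runs without a server. It bounds the whole loop with a context deadline instead of a wall-clock comparison.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil calls probe every interval until it returns nil, the parent
// context is cancelled, or budget elapses. The last probe error is wrapped
// into the returned error so the caller can report why the wait failed.
func pollUntil(ctx context.Context, budget, interval time.Duration, probe func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()
	for {
		err := probe(ctx)
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("gave up (%v); last probe error: %w", ctx.Err(), err)
		case <-time.After(interval):
		}
	}
}

// demoPoll runs pollUntil against a fake probe that fails the first
// `failures` times, and returns how many attempts were made.
func demoPoll(failures, budgetMillis int) (int, error) {
	attempts := 0
	probe := func(context.Context) error {
		attempts++
		if attempts <= failures {
			return errors.New("not ready")
		}
		return nil
	}
	budget := time.Duration(budgetMillis) * time.Millisecond
	err := pollUntil(context.Background(), budget, 5*time.Millisecond, probe)
	return attempts, err
}

func main() {
	attempts, err := demoPoll(2, 1000)
	fmt.Println(attempts, err)
}
```

Passing the derived context into the probe means a hung request is cut off at the deadline as well, which a plain time.Sleep loop does not give you.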
internal/cd: the deploy loop and rollback
Files:
internal/cd/deploy.go
internal/cd/deploy_test.go
The Deployer orchestrates one trigger-drain run. Build, restart, and health check are function-field seams, so the loop, rollback, and poisoning logic are unit-testable without compiling a real binary or touching systemd.
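As a reminder of what a function-field seam looks like, here is a minimal sketch. The runner type and its methods are hypothetical stand-ins, not the Deployer defined in this task: the constructor wires default behaviour, and a test overwrites the fields with fakes.

```go
package main

import (
	"errors"
	"fmt"
)

// runner is a hypothetical stand-in for Deployer: its side-effecting steps
// are function fields, so a test can replace them with fakes.
type runner struct {
	build   func(commit string) error
	restart func() error
}

// newRunner wires placeholder "real" steps. In this sketch they always fail,
// because there is nothing real to build or restart here.
func newRunner() *runner {
	return &runner{
		build:   func(commit string) error { return errors.New("real build not available in this sketch") },
		restart: func() error { return errors.New("real restart not available in this sketch") },
	}
}

// deploy returns the name of the stage that failed, or "" on success.
func (r *runner) deploy(commit string) string {
	if err := r.build(commit); err != nil {
		return "build"
	}
	if err := r.restart(); err != nil {
		return "restart"
	}
	return ""
}

func main() {
	r := newRunner()
	restarted := false
	// Override the seams exactly as a test would.
	r.build = func(string) error { return nil }
	r.restart = func() error { restarted = true; return nil }
	fmt.Println(r.deploy("abc123") == "", restarted)
}
```

Compared with an interface, function fields let each test override only the step it cares about and leave the rest at their defaults.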
Create internal/cd/deploy_test.go:
package cd
import (
"context"
"os"
"path/filepath"
"strings"
"sync"
"testing"
)
// recordingNotifier captures alert messages.
type recordingNotifier struct {
mu sync.Mutex
msgs []string
}
func (n *recordingNotifier) Send(msg string) {
n.mu.Lock()
defer n.mu.Unlock()
n.msgs = append(n.msgs, msg)
}
func (n *recordingNotifier) all() []string {
n.mu.Lock()
defer n.mu.Unlock()
return append([]string(nil), n.msgs...)
}
// fixture builds a Deployer with overridable seams over a temp clone + binDir.
type fixture struct {
dep *Deployer
cfg Config
notifier *recordingNotifier
binDir string
root string
head string
}
func newFixture(t *testing.T) *fixture {
t.Helper()
root, head := newTestRepo(t)
binDir := t.TempDir()
writeFakeBinaries(t, binDir, map[string]string{
"wiki-browser": "OLD", "wb-agent": "OLD", "wb-cd": "OLD",
})
cfg := Config{
Root: root,
BinDir: binDir,
TriggerFile: filepath.Join(t.TempDir(), "trigger"),
StateFile: filepath.Join(t.TempDir(), "state.json"),
ServiceName: "wiki-browser",
}
n := &recordingNotifier{}
dep := New(cfg, n)
// Default seams: build succeeds (writes NEW binaries to a fresh dist),
// restart succeeds, health passes.
dep.build = func(ctx context.Context, commit string) (string, func(), error) {
dist := t.TempDir()
writeFakeBinaries(t, dist, map[string]string{
"wiki-browser": "NEW", "wb-agent": "NEW", "wb-cd": "NEW",
})
return dist, func() {}, nil
}
dep.restart = func(ctx context.Context) error { return nil }
dep.health = func(ctx context.Context, wantCommit string) error { return nil }
return &fixture{dep: dep, cfg: cfg, notifier: n, binDir: binDir, root: root, head: head}
}
func writeTrigger(t *testing.T, path, commit string) {
t.Helper()
if err := os.WriteFile(path, []byte(commit+"\n"), 0o644); err != nil {
t.Fatal(err)
}
}
func TestRunDeploysHealthyCommit(t *testing.T) {
fx := newFixture(t)
writeTrigger(t, fx.cfg.TriggerFile, fx.head)
if err := fx.dep.Run(context.Background()); err != nil {
t.Fatalf("Run: %v", err)
}
for _, name := range binaries {
got, _ := os.ReadFile(filepath.Join(fx.binDir, name))
if string(got) != "NEW" {
t.Errorf("%s = %q, want NEW", name, got)
}
}
st, _ := LoadState(fx.cfg.StateFile)
if st.DeployedCommit != fx.head {
t.Errorf("deployed_commit = %q, want %q", st.DeployedCommit, fx.head)
}
}
func TestRunRollsBackOnUnhealthyDeploy(t *testing.T) {
fx := newFixture(t)
prev := strings.Repeat("b", 40)
if err := SaveState(fx.cfg.StateFile, State{DeployedCommit: prev}); err != nil {
t.Fatal(err)
}
fx.dep.health = func(ctx context.Context, wantCommit string) error {
if wantCommit == fx.head {
return context.DeadlineExceeded // new binary unhealthy
}
if wantCommit != prev {
t.Fatalf("rollback health checked %q, want previous commit %q", wantCommit, prev)
}
return nil
}
writeTrigger(t, fx.cfg.TriggerFile, fx.head)
if err := fx.dep.Run(context.Background()); err != nil {
t.Fatalf("Run should not return an error after a handled rollback: %v", err)
}
// Binaries restored to OLD.
for _, name := range binaries {
got, _ := os.ReadFile(filepath.Join(fx.binDir, name))
if string(got) != "OLD" {
t.Errorf("%s = %q, want OLD after rollback", name, got)
}
}
// Commit poisoned, not deployed.
st, _ := LoadState(fx.cfg.StateFile)
if st.PoisonedCommit != fx.head {
t.Errorf("poisoned_commit = %q, want %q", st.PoisonedCommit, fx.head)
}
if st.DeployedCommit == fx.head {
t.Error("a failed deploy must not record deployed_commit")
}
if st.DeployedCommit != prev {
t.Errorf("previous deployed commit = %q, want preserved %q", st.DeployedCommit, prev)
}
// An alert fired.
if len(fx.notifier.all()) == 0 {
t.Error("rollback should fire an alert")
}
}
func TestRunAlertsWhenRollbackHealthFails(t *testing.T) {
fx := newFixture(t)
prev := strings.Repeat("b", 40)
if err := SaveState(fx.cfg.StateFile, State{DeployedCommit: prev}); err != nil {
t.Fatal(err)
}
fx.dep.health = func(ctx context.Context, wantCommit string) error {
return context.DeadlineExceeded
}
writeTrigger(t, fx.cfg.TriggerFile, fx.head)
if err := fx.dep.Run(context.Background()); err != nil {
t.Fatalf("Run should handle rollback-health failure without returning: %v", err)
}
st, _ := LoadState(fx.cfg.StateFile)
if st.PoisonedCommit != fx.head {
t.Errorf("poisoned_commit = %q, want %q", st.PoisonedCommit, fx.head)
}
if len(fx.notifier.all()) == 0 {
t.Fatal("rollback-health failure should alert")
}
}
func TestRunSkipsPoisonedCommit(t *testing.T) {
fx := newFixture(t)
if err := SaveState(fx.cfg.StateFile, State{PoisonedCommit: fx.head}); err != nil {
t.Fatal(err)
}
built := false
fx.dep.build = func(ctx context.Context, commit string) (string, func(), error) {
built = true
return "", func() {}, nil
}
writeTrigger(t, fx.cfg.TriggerFile, fx.head)
if err := fx.dep.Run(context.Background()); err != nil {
t.Fatalf("Run: %v", err)
}
if built {
t.Error("a poisoned commit must not be rebuilt")
}
}
func TestRunSkipsAlreadyDeployedCommit(t *testing.T) {
fx := newFixture(t)
if err := SaveState(fx.cfg.StateFile, State{DeployedCommit: fx.head}); err != nil {
t.Fatal(err)
}
built := false
fx.dep.build = func(ctx context.Context, commit string) (string, func(), error) {
built = true
return "", func() {}, nil
}
writeTrigger(t, fx.cfg.TriggerFile, fx.head)
if err := fx.dep.Run(context.Background()); err != nil {
t.Fatalf("Run: %v", err)
}
if built {
t.Error("an already-deployed commit must not be rebuilt")
}
}
func TestRunBuildFailurePoisonsAndDoesNotSwap(t *testing.T) {
fx := newFixture(t)
fx.dep.build = func(ctx context.Context, commit string) (string, func(), error) {
return "", func() {}, context.Canceled // stand-in build failure
}
writeTrigger(t, fx.cfg.TriggerFile, fx.head)
if err := fx.dep.Run(context.Background()); err != nil {
t.Fatalf("Run should handle a build failure without returning: %v", err)
}
for _, name := range binaries {
got, _ := os.ReadFile(filepath.Join(fx.binDir, name))
if string(got) != "OLD" {
t.Errorf("%s = %q, want OLD (build failed — no swap)", name, got)
}
}
st, _ := LoadState(fx.cfg.StateFile)
if st.PoisonedCommit != fx.head {
t.Errorf("poisoned_commit = %q, want %q", st.PoisonedCommit, fx.head)
}
msgs := fx.notifier.all()
if len(msgs) == 0 {
t.Error("build failure should fire an alert")
}
if len(msgs) > 0 && strings.Contains(msgs[0], "Rolled back") {
t.Errorf("build failure alert must not claim rollback happened: %q", msgs[0])
}
}
Run: go test ./internal/cd/ -run TestRun -v
Expected: FAIL — Deployer / New / Run undefined.
deploy.go
Create internal/cd/deploy.go:
package cd
import (
"context"
"fmt"
"log/slog"
"os"
"strings"
"time"
"github.com/getorcha/wiki-browser/internal/alert"
)
// Deployer runs one wb-cd activation: it drains the trigger file, deploying
// each target commit in turn until no new commit remains.
type Deployer struct {
cfg Config
notifier alert.Notifier
// Seams — wired to the real implementations by New; overridden in tests.
// build returns the dist dir and a cleanup func for the build temp dir.
build func(ctx context.Context, commit string) (distDir string, cleanup func(), err error)
restart func(ctx context.Context) error
health func(ctx context.Context, wantCommit string) error
}
// New returns a Deployer wired to the real build/restart/health operations.
func New(cfg Config, notifier alert.Notifier) *Deployer {
if notifier == nil {
notifier = alert.Nop{}
}
d := &Deployer{cfg: cfg, notifier: notifier}
d.build = func(ctx context.Context, commit string) (string, func(), error) {
res, err := buildCommit(ctx, cfg.Root, commit)
if err != nil {
return "", func() {}, err
}
return res.distDir, func() { _ = os.RemoveAll(res.tmpDir) }, nil
}
d.restart = func(ctx context.Context) error {
return restartService(ctx, cfg.ServiceName)
}
d.health = func(ctx context.Context, wantCommit string) error {
return healthCheck(ctx, cfg.HealthURL, wantCommit, cfg.HealthPollTimeout)
}
return d
}
// Run drains the trigger file: it deploys the target commit, then re-reads the
// trigger and repeats if a newer un-deployed, non-poisoned commit arrived
// while the deploy was running. This is what makes .path activation safe when
// a push lands while wb-cd.service is already active.
func (d *Deployer) Run(ctx context.Context) error {
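var last string // commit attempted by the previous iteration
for {
commit, err := readTrigger(d.cfg.TriggerFile)
if err != nil {
return fmt.Errorf("read trigger file: %w", err)
}
if commit == "" {
return nil // no trigger written
}
st, err := LoadState(d.cfg.StateFile)
if err != nil {
return fmt.Errorf("load state: %w", err)
}
switch commit {
case st.DeployedCommit:
slog.Info("wb-cd: commit already deployed; nothing to do", "commit", commit)
return nil
case st.PoisonedCommit:
slog.Warn("wb-cd: commit is poisoned; skipping until a new commit arrives",
"commit", commit)
return nil
}
if commit == last {
// deployOne already ran for this commit, yet the state file records
// neither success nor poison (SaveState failed). Exit non-zero rather
// than redeploy forever; OnFailure= then raises the alert.
return fmt.Errorf("commit %s was attempted but its outcome was not saved to state", short(commit))
}
d.deployOne(ctx, commit, st)
last = commit
// Loop: a push during the deploy may have rewritten the trigger.
}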
}
// deployOne builds, swaps, restarts, and health-checks one commit. A failure
// at any stage rolls back (where binaries were touched), poisons the commit,
// and alerts — it never returns an error, so Run continues to drain.
func (d *Deployer) deployOne(ctx context.Context, commit string, st State) {
slog.Info("wb-cd: deploying", "commit", commit)
if !commitPresent(d.cfg.Root, commit) {
if err := d.waitForCommit(ctx, commit); err != nil {
d.fail(commit, st, "commit never arrived in the clone", err, false)
return
}
}
distDir, cleanup, err := d.build(ctx, commit)
if err != nil {
d.fail(commit, st, "build failed", err, false)
return
}
defer cleanup()
if err := snapshotBinaries(d.cfg.BinDir); err != nil {
d.fail(commit, st, "snapshot of current binaries failed", err, false)
return
}
if err := swapBinaries(distDir, d.cfg.BinDir); err != nil {
d.fail(commit, st, "binary swap failed", err, true)
return
}
if err := d.restart(ctx); err != nil {
d.fail(commit, st, "service restart failed", err, true)
return
}
if err := d.health(ctx, commit); err != nil {
d.fail(commit, st, "new binary failed its health check", err, true)
return
}
st.DeployedCommit = commit
st.DeployedAt = time.Now().UTC().Format(time.RFC3339)
if err := SaveState(d.cfg.StateFile, st); err != nil {
slog.Error("wb-cd: deployed but could not save state", "err", err)
}
slog.Info("wb-cd: deploy succeeded", "commit", commit)
}
// fail rolls back (when binaries were already swapped), poisons the commit,
// and alerts. needRollback is true once the live binaries were replaced.
func (d *Deployer) fail(commit string, st State, reason string, cause error, needRollback bool) {
slog.Error("wb-cd: deploy failed", "commit", commit, "reason", reason, "err", cause)
st.PoisonedCommit = commit
if err := SaveState(d.cfg.StateFile, st); err != nil {
slog.Error("wb-cd: could not save poisoned state", "err", err)
}
if needRollback {
if err := restoreFromPrev(d.cfg.BinDir); err != nil {
d.notifier.Send(fmt.Sprintf(
"wiki-browser CD: ROLLBACK FAILED for %s — %s: %v — and restoring the "+
"previous binaries also failed: %v. Manual intervention needed.",
short(commit), reason, cause, err))
return
}
if err := d.restart(context.Background()); err != nil {
d.notifier.Send(fmt.Sprintf(
"wiki-browser CD: rolled back binaries for %s but the restart failed: %v. "+
"Manual intervention needed.", short(commit), err))
return
}
if st.DeployedCommit != "" {
if err := d.health(context.Background(), st.DeployedCommit); err != nil {
d.notifier.Send(fmt.Sprintf(
"wiki-browser CD: rollback restart completed for %s but previous commit "+
"%s did not become healthy: %v. Manual intervention needed.",
short(commit), short(st.DeployedCommit), err))
return
}
} else if err := d.health(context.Background(), ""); err != nil {
d.notifier.Send(fmt.Sprintf(
"wiki-browser CD: rollback restart completed for %s but /healthz is still "+
"unhealthy and no previous deployed commit was recorded: %v. Manual intervention needed.",
short(commit), err))
return
}
}
if needRollback {
d.notifier.Send(fmt.Sprintf(
"wiki-browser CD: deploy of %s FAILED — %s: %v. Rolled back to the previous "+
"binary; this commit will be skipped until a new one is pushed.",
short(commit), reason, cause))
} else {
d.notifier.Send(fmt.Sprintf(
"wiki-browser CD: deploy of %s FAILED before binaries were touched — %s: %v. "+
"This commit will be skipped until a new one is pushed.",
short(commit), reason, cause))
}
}
// waitForCommit blocks until the clone contains commit (git-sync's pull,
// kicked by the same webhook, delivers it) or ~60s elapses.
func (d *Deployer) waitForCommit(ctx context.Context, commit string) error {
deadline := time.Now().Add(60 * time.Second)
for time.Now().Before(deadline) {
if commitPresent(d.cfg.Root, commit) {
return nil
}
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(2 * time.Second):
}
}
return fmt.Errorf("commit %s not present after waiting for git-sync", commit)
}
// readTrigger reads and trims the trigger file. A missing file yields "".
func readTrigger(path string) (string, error) {
raw, err := os.ReadFile(path)
if os.IsNotExist(err) {
return "", nil
}
if err != nil {
return "", err
}
return strings.TrimSpace(string(raw)), nil
}
func short(commit string) string {
if len(commit) > 7 {
return commit[:7]
}
return commit
}
Run: go test ./internal/cd/ -v
Expected: PASS.
git add internal/cd/deploy.go internal/cd/deploy_test.go
git commit -m "cd: deploy loop, rollback, poisoning, alerting"
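The trigger-drain behaviour of Run is easier to see with the I/O stripped away. The sketch below is a simplification under assumptions: state, drain, and the callback signatures are illustrative, the trigger is an in-memory string rather than a file, and alerting and rollback are omitted.

```go
package main

import "fmt"

// state mirrors the two fields the loop consults; the names are illustrative.
type state struct {
	deployed string
	poisoned string
}

// drain keeps deploying whatever readTrigger returns until the trigger names
// a commit that is already deployed or poisoned (or is empty). deploy reports
// success. It returns the commits it attempted, in order.
func drain(readTrigger func() string, st *state, deploy func(commit string) bool) []string {
	var attempted []string
	for {
		commit := readTrigger()
		if commit == "" || commit == st.deployed || commit == st.poisoned {
			return attempted
		}
		attempted = append(attempted, commit)
		if deploy(commit) {
			st.deployed = commit
		} else {
			st.poisoned = commit
		}
	}
}

func main() {
	trigger := "aaa"
	st := &state{}
	attempted := drain(
		func() string { return trigger },
		st,
		func(commit string) bool {
			if commit == "aaa" {
				trigger = "bbb" // a push lands while "aaa" is deploying
			}
			return true
		},
	)
	fmt.Println(attempted, st.deployed)
}
```

Because every attempted commit ends up either deployed or poisoned, the loop exits as soon as the trigger stops changing, and a push that arrives mid-deploy is picked up by the same activation.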
cmd/wb-cd + main.go CD wiring
Files:
Create: cmd/wb-cd/main.go
Modify: cmd/wiki-browser/main.go
Test: cmd/wb-cd/main_test.go
Step 1: Write the failing test
Create cmd/wb-cd/main_test.go:
package main
import (
"strings"
"testing"
)
func TestHealthURLFromListen(t *testing.T) {
cases := map[string]string{
":8080": "http://localhost:8080/healthz",
"127.0.0.1:9000": "http://127.0.0.1:9000/healthz",
}
for listen, want := range cases {
if got := healthURL(listen); got != want {
t.Errorf("healthURL(%q) = %q, want %q", listen, got, want)
}
}
}
func TestHealthURLDefaultsHostForBareColonForm(t *testing.T) {
if got := healthURL(":8080"); !strings.HasPrefix(got, "http://localhost:") {
t.Errorf("bare :port should resolve to localhost, got %q", got)
}
}
Run: go test ./cmd/wb-cd/ -v
Expected: FAIL — package does not exist.
cmd/wb-cd/main.go
// Command wb-cd is the wiki-browser continuous-delivery oneshot. A systemd
// .path unit activates it when the CD trigger file changes; it builds the
// approved commit, swaps the binaries, restarts wiki-browser, health-checks,
// and rolls back on failure. It runs once and exits.
package main
import (
"context"
"flag"
"log/slog"
"os"
"strings"
"github.com/getorcha/wiki-browser/internal/alert"
"github.com/getorcha/wiki-browser/internal/cd"
"github.com/getorcha/wiki-browser/internal/config"
)
func main() {
configPath := flag.String("config", "wiki-browser.yaml", "path to config file")
flag.Parse()
logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo}))
slog.SetDefault(logger)
if err := run(*configPath); err != nil {
slog.Error("wb-cd: fatal", "err", err)
os.Exit(1)
}
}
func run(configPath string) error {
cfg, err := config.Load(configPath)
if err != nil {
return err
}
if cfg.CD == nil {
slog.Warn("wb-cd: no cd: block in config — nothing to do")
return nil
}
notifier := buildNotifier(cfg.Alert)
dep := cd.New(cd.Config{
Root: cfg.Root,
BinDir: cfg.CD.BinDir,
TriggerFile: cfg.CD.TriggerFile,
StateFile: cfg.CD.StateFile,
HealthURL: healthURL(cfg.Listen),
HealthPollTimeout: cfg.CD.HealthPollTimeout,
ServiceName: "wiki-browser",
}, notifier)
return dep.Run(context.Background())
}
// healthURL builds the local /healthz URL from the configured listen address.
// A bare ":port" resolves to localhost.
func healthURL(listen string) string {
host := listen
if strings.HasPrefix(listen, ":") {
host = "localhost" + listen
}
return "http://" + host + "/healthz"
}
// buildNotifier turns the optional alert config into an alert.Notifier. A nil
// config — or an unreadable URL file — yields a no-op notifier.
func buildNotifier(cfg *config.Alert) alert.Notifier {
if cfg == nil {
return alert.Nop{}
}
raw, err := os.ReadFile(cfg.SlackWebhookURLFile)
if err != nil {
slog.Warn("wb-cd: cannot read slack webhook url; alerting disabled", "err", err)
return alert.Nop{}
}
return alert.NewSlack(strings.TrimSpace(string(raw)))
}
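healthURL above handles the two listen forms the test covers. If the listen address could ever be a wildcard such as 0.0.0.0:8080 or [::]:8080, a variant built on net.SplitHostPort is one option. This is a sketch only: localHealthURL is a hypothetical name, and mapping wildcard hosts to localhost is an assumption, not something the plan specifies.

```go
package main

import (
	"fmt"
	"net"
)

// localHealthURL is a hypothetical variant of healthURL. It parses the listen
// address with net.SplitHostPort, maps empty and wildcard hosts to localhost,
// and reports malformed addresses instead of building a bad URL.
func localHealthURL(listen string) (string, error) {
	host, port, err := net.SplitHostPort(listen)
	if err != nil {
		return "", fmt.Errorf("listen address %q: %w", listen, err)
	}
	switch host {
	case "", "0.0.0.0", "::":
		host = "localhost" // assumption: wildcard binds are reachable via loopback
	}
	return "http://" + net.JoinHostPort(host, port) + "/healthz", nil
}

func main() {
	for _, listen := range []string{":8080", "127.0.0.1:9000", "[::]:8080", "8080"} {
		url, err := localHealthURL(listen)
		fmt.Printf("%q -> %q (err: %v)\n", listen, url, err)
	}
}
```

net.JoinHostPort also re-adds the brackets an IPv6 literal needs, which plain string concatenation would not.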
wiki-browser's Deps
In cmd/wiki-browser/main.go, in the deps := server.Deps{...} literal, add the version fields and the trigger file:
Version: version,
Commit: commit,
BuildTime: buildTime,
And after the literal, alongside the existing if gitSync != nil { deps.GitSync = gitSync }:
if cfg.CD != nil {
deps.CDTriggerFile = cfg.CD.TriggerFile
}
Run: go test ./cmd/wb-cd/ -v
Expected: PASS.
Run: go build ./... && go test ./...
Expected: builds clean; full suite PASS.
Run: make build
Expected: all three binaries build into dist/ (this is the first task where make build works, since cmd/wb-cd now exists).
git add cmd/wb-cd/ cmd/wiki-browser/main.go
git commit -m "cmd/wb-cd: the CD oneshot; wire version + trigger into the server"
Files:
deploy/wb-cd.path
deploy/wb-cd.service
deploy/wb-cd-alert.service
ci-tracked-paths.yaml (monorepo path wiki-browser/ci-tracked-paths.yaml)
.github/workflows/wiki-browser.yml (at the monorepo root)
deploy/wiki-browser.service
wiki-browser.example.yaml
This task ships configuration and operator artifacts; there is no Go code, so no TDD loop. The build/test verification is in Step 8.
Add TimeoutStopSec to the wiki-browser unit
In deploy/wiki-browser.service, in the [Service] section, add after RestartSec=2s:
# The graceful drain finishes in-flight Agent jobs (capped at 10m in-process).
# This must sit above that cap so systemd never SIGKILLs mid-drain.
TimeoutStopSec=11min
deploy/wb-cd.path
# deploy/wb-cd.path: activates wb-cd.service when the CD trigger file changes.
# Coalescing is fine: wb-cd re-reads the trigger before exit, so a change that
# lands while wb-cd is already running is not lost.
[Unit]
Description=wiki-browser CD trigger watch
[Path]
PathChanged=/srv/wiki-browser/cd-trigger
Unit=wb-cd.service
[Install]
WantedBy=multi-user.target
deploy/wb-cd.service
# deploy/wb-cd.service: the continuous-delivery oneshot.
[Unit]
Description=wiki-browser continuous delivery
# OnFailure fires the alert unit when a wb-cd run exits non-zero (a crash that
# wb-cd's own alerting could not report).
OnFailure=wb-cd-alert.service
[Service]
Type=oneshot
User=karn
ExecStart=/srv/wiki-browser/bin/wb-cd -config=/srv/orcha/wiki-browser/wiki-browser.yaml
# Covers build + the up-to-11m wiki-browser drain + the health poll.
TimeoutStartSec=20min
deploy/wb-cd-alert.service
# deploy/wb-cd-alert.service: OnFailure backstop. wb-cd self-alerts for its
# known failure paths; this catches a wb-cd process that crashed outright.
# A dedicated `wb-cd -alert` flag would be overkill, so this unit posts a
# fixed message via a one-line curl against the Slack webhook file.
[Unit]
Description=wiki-browser CD failure alert
[Service]
Type=oneshot
User=karn
# Reads the Slack webhook URL from the secret file and posts a fixed message.
ExecStart=/bin/sh -c 'curl -fsS -X POST -H "Content-Type: application/json" \
--data "{\"text\":\"wiki-browser CD: a wb-cd run exited non-zero. Check journalctl -u wb-cd.\"}" \
"$(cat /srv/wiki-browser/secrets/slack-webhook-url)"'
ci-tracked-paths.yaml
This is the committed, non-secret mirror of the production document-tracking config. It must carry the same extensions and exclude values as the Pi's wiki-browser.yaml. Set the values to match the current production config (confirm against the deployed wiki-browser.yaml during provisioning; see the Task-15 review row):
# CI mirror of the wiki's document-tracking config. The classifier
# (cmd/classify-ci-change) reads these VALUES; the matching LOGIC is the shared
# internal/walker.Matcher, so only these two lists can drift — keep them in
# step with the production wiki-browser.yaml in the same commit.
extensions: [".md", ".html"]
exclude:
- "www/**"
- "marketing/**"
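To make the effect of these two lists concrete, here is a deliberately simplified predicate. It is not internal/walker.Matcher: the policy type and tracked method are hypothetical, and the sketch understands only exclude patterns of the form dir/**, which is all the sample policy uses. The real matcher's glob semantics may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// policy holds the two lists from ci-tracked-paths.yaml. The type and its
// method are illustrative, not the shared walker.Matcher.
type policy struct {
	extensions []string
	exclude    []string
}

// tracked reports whether rel (a slash-separated repo-relative path) has a
// tracked extension and is not under an excluded "dir/**" pattern.
func (p policy) tracked(rel string) bool {
	if !contains(p.extensions, path.Ext(rel)) {
		return false
	}
	for _, pat := range p.exclude {
		if !strings.HasSuffix(pat, "/**") {
			continue // other glob forms are out of scope for this sketch
		}
		dir := strings.TrimSuffix(pat, "/**")
		if strings.HasPrefix(rel, dir+"/") {
			return false
		}
	}
	return true
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// samplePolicy returns the values shown in ci-tracked-paths.yaml above.
func samplePolicy() policy {
	return policy{
		extensions: []string{".md", ".html"},
		exclude:    []string{"www/**", "marketing/**"},
	}
}

func main() {
	p := samplePolicy()
	for _, rel := range []string{"docs/plan.md", "www/index.html", "wiki-browser/main.go"} {
		fmt.Println(rel, p.tracked(rel))
	}
}
```

Under this policy a push touching only www/index.html is not a document change, and wiki-browser/main.go is not a document at all, so the classifier would have to treat it as a code path by some other rule.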
.github/workflows/wiki-browser.yml
At the monorepo root (getorcha/orcha/.github/workflows/wiki-browser.yml), not under wiki-browser/:
name: wiki-browser
on:
push:
branches: [master]
jobs:
wiki-browser:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # github.event.before must be reachable
- uses: actions/setup-go@v5
with:
go-version-file: wiki-browser/go.mod
- id: classify
working-directory: wiki-browser
run: >
go run ./cmd/classify-ci-change
--base "${{ github.event.before }}"
--head "${{ github.sha }}"
--policy ci-tracked-paths.yaml
- if: steps.classify.outputs.code_changed == 'true'
working-directory: wiki-browser
run: go test ./...
- name: notify the pi
if: always() && steps.classify.outputs.notify == 'true'
env:
SECRET: ${{ secrets.WB_DEPLOY_HMAC_SECRET }}
URL: ${{ secrets.WB_WEBHOOK_URL }}
run: |
deploy=false
if [ "${{ steps.classify.outputs.code_changed }}" = "true" ] && [ "${{ job.status }}" = "success" ]; then
deploy=true
fi
body=$(printf '{"deploy":%s,"commit":"%s","ref":"%s","delivery_id":"github-run-%s"}' \
"$deploy" "${{ github.sha }}" "${{ github.ref }}" "${{ github.run_id }}")
sig="sha256=$(printf '%s' "$body" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/^.* //')"
curl -fsS --retry 5 --retry-delay 10 --retry-all-errors -X POST "$URL" \
-H "Content-Type: application/json" \
-H "X-Hub-Signature-256: $sig" \
--data "$body"
Note: the notify step runs with if: always(), so a test failure still notifies the Pi: the docs in the push still sync, and deploy is false. If classify itself failed, outputs.notify is empty and the step is skipped; the job is already red, so the failure is loud. The curl retry is intentional: if wiki-browser is draining during a restart, /api/webhook/ci may briefly return 503 with Retry-After, and a transient delivery failure must be retried rather than silently dropping a doc-sync or deploy notification.
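For orientation, the signature the workflow computes with openssl can be reproduced and checked in Go with the standard library. This is a sketch, not the handler from the webhook task: sign and verify are hypothetical names, and the sha256= prefix follows the X-Hub-Signature-256 header format used in the curl call above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign returns the header value for body: "sha256=" followed by the
// hex-encoded HMAC-SHA256 of the raw bytes under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC over the raw body and compares it to the header
// in constant time. It rejects a missing prefix or non-hex digest.
func verify(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(header[len(prefix):])
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // placeholder, not a real secret
	body := []byte(`{"deploy":true,"commit":"abc123"}`)
	header := sign(secret, body)
	fmt.Println(verify(secret, body, header))
	fmt.Println(verify(secret, []byte(`{"deploy":false}`), header))
}
```

The MAC must be computed over the exact bytes received, before any JSON decoding, since re-encoding the payload can change whitespace or key order and invalidate the signature.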
cd: block in the example config
In wiki-browser.example.yaml, append:
# cd: enables the wb-cd continuous-delivery oneshot. Omit the whole block for a
# build that never self-deploys (development, local). The /api/webhook/ci route
# still serves doc-sync without it; a deploy:true payload is then logged and
# ignored.
cd:
bin_dir: "/srv/wiki-browser/bin" # live binaries; bin/prev/ is the rollback copy
trigger_file: "/srv/wiki-browser/cd-trigger"
state_file: "/srv/wiki-browser/cd-state.json"
health_poll_timeout: "90s" # default
Path sanity before running commands:
From wiki-browser/, test -f ci-tracked-paths.yaml.
From wiki-browser/, test -f ../.github/workflows/wiki-browser.yml.
From wiki-browser/, run the classifier the same way the Action does: go run ./cmd/classify-ci-change --base HEAD~1 --head HEAD --policy ci-tracked-paths.yaml.
Run: go build ./... && go test ./...
Expected: builds clean; full suite PASS.
Run: make build
Expected: all three binaries build.
Validate the systemd unit syntax (does not require installing the units):
Run: systemd-analyze verify deploy/wb-cd.path deploy/wb-cd.service deploy/wb-cd-alert.service deploy/wiki-browser.service
Expected: no syntax errors. (Off-Pi, ignore "Unit ... not found" notes for wb-cd.service/network-online.target cross-references.)
Validate the workflow YAML parses:
Run: python3 -c "import yaml,sys; yaml.safe_load(open('../.github/workflows/wiki-browser.yml'))"
Expected: no error.
git add deploy/wb-cd.path deploy/wb-cd.service deploy/wb-cd-alert.service deploy/wiki-browser.service ci-tracked-paths.yaml wiki-browser.example.yaml ../.github/workflows/wiki-browser.yml
git commit -m "deploy: wb-cd systemd units, WB Action workflow, cd example config"
Not implementation tasks — these are one-time provisioning actions the operator performs after the code merges. Listed here so they are not lost.
On the Pi:
1. wb-cd builds natively on the Pi.
2. mkdir -p /srv/wiki-browser/bin/prev.
3. Create /srv/wiki-browser/secrets/github-webhook-secret (mode 0600, owned by karn).
4. Install wb-cd.path, wb-cd.service, wb-cd-alert.service; systemctl daemon-reload; systemctl enable --now wb-cd.path.
5. systemctl daemon-reload after the wiki-browser.service TimeoutStopSec edit.
6. echo 'karn ALL=(root) NOPASSWD: /usr/bin/systemctl restart wiki-browser' > /etc/sudoers.d/wb-cd (mode 0440, visudo -c to check).
7. Add the cd: block to the Pi's wiki-browser.yaml.
On GitHub (operator, via gh):
8. gh secret set WB_DEPLOY_HMAC_SECRET — the same value as the Pi's secret file.
9. gh variable set WB_WEBHOOK_URL (or gh secret set) — https://wiki.<domain>/api/webhook/ci. The workflow reads it as secrets.WB_WEBHOOK_URL; if you use a variable instead, change ${{ secrets.WB_WEBHOOK_URL }} to ${{ vars.WB_WEBHOOK_URL }}.
10. If the #10 native Settings→Webhooks hook was already created, delete it: gh api repos/getorcha/orcha/hooks to find the id, then gh api -X DELETE repos/getorcha/orcha/hooks/<id>.
Review:
11. Confirm wiki-browser/ci-tracked-paths.yaml matches the Pi's wiki-browser.yaml extensions and exclude.
Spec coverage — every Design subsection maps to a task:
| Spec section | Task(s) |
|---|---|
| End-to-end flow | whole plan; webhook 7, .path/units 15, wb-cd 13–14 |
| The WB Action (classifier, workflow) | 3 (classifier), 15 (workflow + ci-tracked-paths.yaml) |
| Shared internal/walker predicate | 2 |
| The single webhook endpoint (/api/webhook/ci) | 7 |
| wb-cd: deploy oneshot (archive exact commit, atomic swap, trigger loop) | 11 (build), 12 (swap), 13 (loop), 14 (wiring) |
| CD self-update | covered by 14 (make build builds all three) + 12 (atomic rename) |
| Health check & rollback (poisoned commit, state file) | 10 (state), 12 (health), 13 (rollback) |
| Graceful drain (agent drain, draining mode, TimeoutStopSec) | 8 (agent), 9 (server + main), 15 (TimeoutStopSec) |
| Versioning & observability (ldflags, /healthz JSON, footer) | 4 (ldflags), 5 (/healthz), 6 (footer) |
| systemd units (.path, wb-cd.service, OnFailure) | 15 |
| Configuration (cd: block) | 1, 15 (example) |
| Provisioning | operator runbook section (non-code) |
| Failure modes | 13 (build fail, unhealthy, rollback-fails), 7 (bad payload), 3 (classifier error) |
| Security (HMAC, sudoers, build trust boundary) | 7 (HMAC), 15 (sudoers) |
Placeholder scan — no TBD/TODO. The corrected archiveCommit in Task 11 Step 4 supersedes the first draft shown immediately above it (the step explicitly says "Use this corrected version") — the engineer writes only the corrected form.
Type consistency — checked across tasks: config.CD{BinDir,TriggerFile,StateFile,HealthPollTimeout}; walker.NewMatcher(extensions, exclude) → Matcher.Tracked/Excluded/extOK; cd.Config{Root,BinDir,TriggerFile,StateFile,HealthURL,HealthPollTimeout,ServiceName}; cd.State{DeployedCommit,DeployedAt,PoisonedCommit}; cd.New(Config, alert.Notifier) → Deployer.Run; the Deployer seam build func(ctx, commit) (distDir string, cleanup func(), err error) matches its use in deployOne and the test fixture; server.Deps new fields Version/Commit/BuildTime/CDTriggerFile/Draining; ciPayload{Deploy,Commit,Ref,DeliveryID}; agent.Service.Drain(ctx) bool + ErrDraining. All consistent.
Plan complete and saved to docs/superpowers/plans/2026-05-21-wiki-browser-cd.md.