Agent runtime & harness invocation — Design
Draft

2026-05-12 · Daniel · wiki-browser · sub-project #3

Problem

The collaborative-annotations initiative introduces an Agent that owns Source rewrites, anchor maintenance, and Perspective generation. Sub-projects #1 (document model & persistence) and #2 (topic core: data model + anchoring) are implemented and assume the Agent exists; both punt the question of how the harness actually invokes it.

Today, internal/collab stores proposals and applies approved ones, but nothing produces them. The wiki-browser Go server has no way to spawn a Claude Code instance, ship it the context for a job, or learn whether the job succeeded. This sub-project defines that runtime.

The scope here is narrow on purpose. #4 (Topic resolution & incorporation) and #5 (Perspectives) own the prompt wording and the per-job invariants their skills must respect. #3 owns the substrate that lets those skills run, persist results, and surface state.

Goals & non-goals

Goals

Non-goals

Approach

Use a headless claude -p subprocess per job. The Go server spawns Claude Code, hands it a short prompt that names a skill and includes the job parameters as body text, and waits for exit. The agent does its work using its standard tools — filesystem reads/writes for Source, and a new wb-agent CLI for DB access. The server learns "succeeded" or "failed" purely from exit code and a stderr tail; everything else is read from the collab DB after exit.

[Figure: architecture diagram. wiki-browser's agent.Service (queue + agent_jobs) spawns a claude -p subprocess with a fresh context, passing prompt + cwd and receiving exit + stderr tail. The subprocess runs the wb-incorporate or wb-perspective skill, uses Read on Source files (git working tree), and uses Bash to run wb-agent against the collab DB (SQLite, WAL, cross-process). The server reads the collab DB post-exit; the browser polls /api/agent-jobs.]
The server spawns Claude as a one-shot child. Inside the process, the agent uses Read on Source files and Bash to invoke wb-agent for DB writes. The server learns only "done / failed" from exit; the UI learns by reading agent_jobs.

Decision: subprocess per job, channels deferred

Headless claude -p wins on Pi-friendliness, maturity, and zero idle footprint. claude --channels would amortise startup but adds a supervised long-running process, a custom channel implementation, and context-isolation discipline. Channels have no documented session-lifetime cap, so v2 adoption remains open — the v1 skill code must not bake in fresh-process assumptions.

The agent owns the work surface. The Go server is a launcher and a status mirror: it spawns the process, surfaces success/failure to humans, and reads the rows the agent created. It does not parse agent stdout, does not enforce job-specific invariants (anchor placement, persona shape), and does not retry. Validation that lives inside the skill belongs to #4/#5; validation that lives in wb-agent reuses the existing internal/collab code paths.

Design

Process model

One subprocess per job, spawned via Go's exec.CommandContext:

cmd := exec.CommandContext(ctx,
    cfg.Agent.ClaudeBin,                       // "claude" by default
    "-p", promptBody,
    "--dangerously-skip-permissions",
)
cmd.Dir = wikiBrowserRoot                      // for .claude/skills/ discovery only
cmd.Stdout = &stdoutBuf                        // captured for debug logs
cmd.Stderr = &stderrBuf                        // last 4 KiB → agent_jobs.error_tail

// Process-group + graceful shutdown. cmd.Cancel + WaitDelay are Go 1.20+:
// on ctx cancel (shutdown or timeout) we SIGTERM the whole process group. If
// claude has not exited 5 s later, Wait SIGKILLs it (the direct child only).
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Cancel = func() error {
    return syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)
}
cmd.WaitDelay = 5 * time.Second                // then default kill (SIGKILL)
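
The "last 4 KiB" of stderr mentioned in the snippet above could be kept by a small bounded writer instead of an unbounded buffer. The following is a minimal sketch under that assumption; tailBuffer is a hypothetical name, not an existing type in wiki-browser.

```go
package main

import (
	"fmt"
	"io"
)

// tailBuffer is a hypothetical io.Writer that retains only the last max bytes
// written to it. It is an illustration, not part of the existing codebase.
type tailBuffer struct {
	max int
	buf []byte
}

// Compile-time check that tailBuffer can be assigned to cmd.Stderr.
var _ io.Writer = (*tailBuffer)(nil)

// Write appends p, then drops everything except the newest max bytes.
func (t *tailBuffer) Write(p []byte) (int, error) {
	t.buf = append(t.buf, p...)
	if over := len(t.buf) - t.max; over > 0 {
		// Shift the retained tail to the front of the same backing array.
		t.buf = append(t.buf[:0], t.buf[over:]...)
	}
	return len(p), nil
}

// String returns the retained tail.
func (t *tailBuffer) String() string { return string(t.buf) }

func main() {
	tail := &tailBuffer{max: 4096} // 4 KiB, matching agent_jobs.error_tail
	fmt.Fprint(tail, "first line\n")
	fmt.Fprint(tail, "last line\n")
	fmt.Print(tail.String())
}
```

Because the cut is byte-based, the first retained character may be a partial UTF-8 sequence; a real implementation might trim to the next line boundary before storing the tail.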

Skill layout

Project-local skills, versioned in git as part of wiki-browser/:

wiki-browser/.claude/skills/
├── playwright-cli/           # existing
├── wb-incorporate/
│   └── SKILL.md
└── wb-perspective/
    └── SKILL.md

Each SKILL.md follows the standard frontmatter format:

---
name: wb-incorporate
description: Produce a proposed Source rewrite for an open Topic, re-anchoring
  every other open non-global Topic on the same Source. Used by wiki-browser
  during Topic incorporation.
---

# wb-incorporate

Parse the job parameters from the prompt body. They will look like:

    Job ID:          <uuid>
    Topic ID:        <topic-id>
    Source path:     <repo-relative>
    Base source SHA: <git blob SHA>
    Repo root:       <absolute path to the orcha monorepo root>
    wb-agent path:   <absolute path to the wb-agent binary>

The Repo root + Source path concatenation gives you the absolute Source
file path. Always use the absolute path the harness gave you — never
rely on the current working directory for resolving Source files.
Always invoke wb-agent via the absolute path the harness gave you —
never rely on PATH lookup.

Then:

1. Read the Source file at <Repo root>/<Source path>.
2. Run `<wb-agent path> get-topic --id=<topic-id>` to load the topic,
   anchor, and full message thread.
3. Run `<wb-agent path> list-open-topics --source-path=<source-path>`
   to load every other open Topic on this Source, with anchors.
4. <REWRITE CONTRACT OWNED BY #4 — re-anchor + rewrite contract goes here.>
5. Pipe the proposed Source to:
     `<wb-agent path> insert-proposal --topic-id=<topic-id> --base-sha=<sha>`
6. Exit.

Why pass paths in the prompt instead of relying on cwd/PATH

The subprocess cwd is fixed to wikiBrowserRoot for skill discovery (Claude Code looks for .claude/skills/ in cwd). But Source files live under cfg.Root — the orcha monorepo root, which is the parent of wikiBrowserRoot in the typical layout. Source paths in the DB are stored relative to cfg.Root, not wikiBrowserRoot. Resolving them against cwd would silently read the wrong tree.

Same logic for wb-agent: the binary ships in dist/wb-agent next to dist/wiki-browser and is not installed to a system PATH in this deployment. The harness computes its absolute path once at startup and passes it explicitly so the skill never depends on environment-PATH magic.
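
A minimal sketch of the two path computations described above, assuming a Unix-style layout. The helper names siblingWBAgent and absSourcePath are hypothetical, and absSourcePath is only a stand-in for whatever validation the real code applies to Source paths.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// siblingWBAgent returns the wb-agent path in the same directory as the
// given executable path. Hypothetical helper for illustration.
func siblingWBAgent(exePath string) string {
	return filepath.Join(filepath.Dir(exePath), "wb-agent")
}

// absSourcePath joins a repo-relative Source path onto the repo root and
// rejects paths that are empty, absolute, or escape the root via "..".
// filepath.IsLocal requires Go 1.20+.
func absSourcePath(repoRoot, sourcePath string) (string, error) {
	if !filepath.IsLocal(sourcePath) {
		return "", fmt.Errorf("source path %q is not repo-relative", sourcePath)
	}
	return filepath.Join(repoRoot, sourcePath), nil
}

func main() {
	// In the server, the argument would come from os.Executable().
	fmt.Println(siblingWBAgent("/home/volrath/code/orcha/wiki-browser/dist/wiki-browser"))

	p, err := absSourcePath("/home/volrath/code/orcha", "docs/foo.md")
	fmt.Println(p, err)
}
```

Resolving against an explicit root rather than cwd is what keeps the skill from silently reading the wrong tree when cwd is wikiBrowserRoot.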

#3 ships both SKILL.md files with stub prompt bodies (clearly marked <REWRITE CONTRACT OWNED BY #4/#5>) plus a minimal exit-zero pass so the end-to-end runtime is testable. #4 and #5 fill in the substantive prompt content when they land.

Prompt format

The prompt body is a short instruction naming the skill, followed by job parameters as a labelled block:

Use the wb-incorporate skill.

Job ID:          0d4b9a2f8c7e4a13b6a01e9c2d8f5b34
Topic ID:        t-xyz
Source path:     docs/foo.md
Base source SHA: deadbeefcafef00d...
Repo root:       /home/volrath/code/orcha
wb-agent path:   /home/volrath/code/orcha/wiki-browser/dist/wb-agent

When done, exit 0. On any unrecoverable error, exit non-zero — the wiki-browser
server will surface stderr to the operator.

The parameter block is the same for every wb-incorporate invocation; the skill's first paragraph teaches Claude how to read it. wb-perspective uses an analogous block with Persona name, Source SHA, and Persona SHA in place of Topic ID and Base source SHA. Repo root and wb-agent path appear in both.

The harness computes Repo root from cfg.Root (the orcha monorepo root) and wb-agent path from cfg.Agent.WBAgentBin. The latter defaults at startup to the wb-agent file next to the running wiki-browser binary: the directory of the path returned by os.Executable(), joined with "wb-agent". os.Executable returns (string, error), so the error must be handled before the join. Both values are validated at config-load time.
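
As an illustration of the prompt format above, the labelled block could be rendered with a fixed-width label column. This is a sketch only: incorporateParams and buildIncorporatePrompt are hypothetical names, and the closing instruction is abbreviated from the example.

```go
package main

import (
	"fmt"
	"strings"
)

// incorporateParams holds the six values of the labelled block.
// The type and field names are assumptions made for this sketch.
type incorporateParams struct {
	JobID, TopicID, SourcePath, BaseSHA, RepoRoot, WBAgentPath string
}

// buildIncorporatePrompt renders the skill instruction plus the parameter
// block, padding each label to 16 columns so the values line up.
func buildIncorporatePrompt(p incorporateParams) string {
	rows := [][2]string{
		{"Job ID:", p.JobID},
		{"Topic ID:", p.TopicID},
		{"Source path:", p.SourcePath},
		{"Base source SHA:", p.BaseSHA},
		{"Repo root:", p.RepoRoot},
		{"wb-agent path:", p.WBAgentPath},
	}
	var b strings.Builder
	b.WriteString("Use the wb-incorporate skill.\n\n")
	for _, r := range rows {
		fmt.Fprintf(&b, "%-16s %s\n", r[0], r[1])
	}
	b.WriteString("\nWhen done, exit 0. On any unrecoverable error, exit non-zero.\n")
	return b.String()
}

func main() {
	fmt.Print(buildIncorporatePrompt(incorporateParams{
		JobID:       "0d4b9a2f8c7e4a13b6a01e9c2d8f5b34",
		TopicID:     "t-xyz",
		SourcePath:  "docs/foo.md",
		BaseSHA:     "deadbeef",
		RepoRoot:    "/home/volrath/code/orcha",
		WBAgentPath: "/home/volrath/code/orcha/wiki-browser/dist/wb-agent",
	}))
}
```

A wb-perspective builder would follow the same shape with its own label set.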

Capability surface

Two job kinds in v1:

| Kind | Skill | Inputs (prompt body) | Side effects |
| --- | --- | --- | --- |
| incorporate | wb-incorporate | Job ID, Topic ID, Source path, Base source SHA, Repo root, wb-agent path | Inserts a row into incorporation_proposals via wb-agent insert-proposal. |
| perspective | wb-perspective | Job ID, Source path, Persona name, Source SHA, Persona SHA, Repo root, wb-agent path | Upserts a row into the perspective cache via wb-agent put-perspective. |

Job kinds the v1 runtime supports. New kinds add a new skill directory plus a new wb-agent subcommand.

The wb-agent CLI

A new binary, built from the same Go module as wiki-browser, lives at cmd/wb-agent/main.go and ships as dist/wb-agent. It opens the collab DB directly using the existing internal/collab code paths so all validation, FK enforcement, CHECK constraints, and the sequence-allocation logic that Store applies are reused. The agent invokes it via Claude Code's Bash tool.

Because wb-agent is a separate process from the running server, it cannot share the server's in-memory write funnel. SQLite's WAL mode plus the per-DSN busy_timeout=5000 already configured in collab.Open handle cross-process write contention: brief blocking is acceptable for low-volume agent writes. wb-agent opens its own short-lived connection per invocation; the funnel goroutine model remains the discipline within the server process.

v1 subcommands:

| Subcommand | Owner | Behavior |
| --- | --- | --- |
| wb-agent get-topic --id=<id> | #3 | Reads the topic row, its anchor JSON, and the full message thread (ordered by sequence). Emits a JSON object on stdout. |
| wb-agent list-open-topics --source-path=<path> | #3 | Reads every open topic for the given Source path with current anchors. Emits a JSON array on stdout. |
| wb-agent insert-proposal --topic-id=<id> --base-sha=<sha> | #3 | Reads proposed Source from stdin. Allocates the next revision_number for that topic, validates, and inserts a row into incorporation_proposals with proposed_by = NULL (the Agent is not a user per #1's data model; see schema migration below). Prints the new proposal ID on stdout. |
| wb-agent get-persona --source-path=<path> --name=<name> | #5 fills in | Scaffold only in #3: stub returns a placeholder. #5 implements the real persona lookup against perspective_defs. |
| wb-agent put-perspective --source-path=<path> --persona=<name> --source-sha=<sha> --persona-sha=<sha> | #5 fills in | Scaffold only in #3: stub accepts stdin and returns OK without writing. #5 implements the cache upsert. |

wb-agent subcommands. #3 fully implements the incorporate-related ones plus the two perspective scaffolds.

wb-agent reads wiki-browser.yaml to locate the collab DB. The path is resolved via the same -config flag the server uses (default wiki-browser.yaml in the working directory). The agent's working directory is wikiBrowserRoot when invoked, so the default works without configuration.
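
One plausible shape for the subcommand dispatcher, sketched with the standard flag package (which accepts both -id and --id forms). Only get-topic is shown; the function names and the returned string are placeholders, and the real handler would query the collab DB and emit JSON.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// parseGetTopic parses the arguments that follow "get-topic".
func parseGetTopic(args []string) (string, error) {
	fs := flag.NewFlagSet("get-topic", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text off stderr in this sketch
	id := fs.String("id", "", "topic ID to load")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *id == "" {
		return "", errors.New("get-topic: --id is required")
	}
	return *id, nil
}

// dispatch routes argv (without the program name) to a subcommand.
// The returned string stands in for the handler's real stdout output.
func dispatch(argv []string) (string, error) {
	if len(argv) == 0 {
		return "", errors.New("usage: wb-agent <subcommand> [flags]")
	}
	switch argv[0] {
	case "get-topic":
		id, err := parseGetTopic(argv[1:])
		if err != nil {
			return "", err
		}
		return "get-topic id=" + id, nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", argv[0])
	}
}

func main() {
	out, err := dispatch([]string{"get-topic", "--id=t-xyz"})
	fmt.Println(out, err)
}
```

In a real main, any non-nil error would be written to stderr and turned into a non-zero exit so the skill can detect the failure.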

Concurrency and the in-memory queue

A new internal/agent package owns the runtime. The Service wraps an in-memory queue keyed by Source path:

Schema: relaxing incorporation_proposals.proposed_by

The current 001_initial.sql migration declares proposed_by TEXT NOT NULL with a FK to users(id). That contradicts #1's intent — recorded in the decisions doc as "the Agent is not a user; agent-authored content may have null user references." The constraint never fired because nothing has produced an agent proposal yet. #3 drops NOT NULL via SQLite's twelve-step table-rebuild procedure.

The rebuild has constraints that the existing migration runner cannot satisfy:

The fix is a small enhancement to the migration runner: a per-file directive that opts out of the runner's transaction wrapper and lets the migration manage its own boundaries.

// internal/collab/migrate.go (runner change)
// If the first non-blank line is exactly "-- migrate:no-tx",
// applyOne skips its own BEGIN/COMMIT. The migration file is
// executed as-is and is responsible for its own tx + FK toggling.
// schema_migrations bookkeeping runs in a separate short tx afterward.

-- migrate:no-tx
-- migrations/003_agent_runtime.sql

-- FK toggling must happen outside any transaction; the runner honors the
-- no-tx directive above and lets this file own its own BEGIN/COMMIT.
PRAGMA foreign_keys = OFF;

BEGIN;

-- SQLite cannot ALTER a column's NOT NULL; rebuild the table.
CREATE TABLE incorporation_proposals_new (
  id              TEXT PRIMARY KEY,
  topic_id        TEXT NOT NULL,
  revision_number INTEGER NOT NULL,
  proposed_source TEXT NOT NULL,
  base_source_sha TEXT NOT NULL,
  proposed_by     TEXT,                  -- now nullable; NULL = Agent
  created_at      INTEGER NOT NULL,
  FOREIGN KEY (topic_id)    REFERENCES topics(id),
  FOREIGN KEY (proposed_by) REFERENCES users(id)
);
INSERT INTO incorporation_proposals_new SELECT * FROM incorporation_proposals;
DROP TABLE incorporation_proposals;
ALTER TABLE incorporation_proposals_new RENAME TO incorporation_proposals;
CREATE UNIQUE INDEX incorporation_proposals_topic_rev
  ON incorporation_proposals(topic_id, revision_number);
CREATE UNIQUE INDEX incorporation_proposals_id_topic
  ON incorporation_proposals(id, topic_id);

-- The composite FK in incorporation_attempts references incorporation_proposals
-- by name, so it resolves to the rebuilt table once the rename completes.

PRAGMA foreign_key_check;            -- returns one row per violation (it does not raise); expect none

COMMIT;

PRAGMA foreign_keys = ON;

The same migration file then creates agent_jobs (see below). Tests cover: (a) the rebuilt table accepts proposed_by IS NULL; (b) existing non-null rows survive the rebuild with values intact; (c) foreign_key_check returns no rows after the rebuild on a populated DB; (d) the runner's no-tx directive correctly leaves schema_migrations recorded even when the migration manages its own transaction.
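
The directive check described in the runner comment could be as small as the sketch below. hasNoTxDirective is a hypothetical name; the only behavior taken from the design is "first non-blank line is exactly -- migrate:no-tx".

```go
package main

import (
	"fmt"
	"strings"
)

// noTxDirective is the exact line the runner looks for.
const noTxDirective = "-- migrate:no-tx"

// hasNoTxDirective reports whether the first non-blank line of the
// migration text is exactly the directive. Any other leading content,
// including an ordinary SQL comment, means the runner keeps its own
// BEGIN/COMMIT wrapper.
func hasNoTxDirective(migration string) bool {
	for _, line := range strings.Split(migration, "\n") {
		line = strings.TrimRight(line, "\r") // tolerate CRLF files
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines before the first real line
		}
		return line == noTxDirective
	}
	return false
}

func main() {
	fmt.Println(hasNoTxDirective("\n-- migrate:no-tx\nPRAGMA foreign_keys = OFF;\n"))
	fmt.Println(hasNoTxDirective("-- some comment\n-- migrate:no-tx\n"))
}
```

Note the second case: a comment placed ahead of the directive defeats it, so the directive has to be the very first non-blank line of the file.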

collab.InsertProposal changes its ProposedBy field from string to *string, with the existing required-fields check dropping that key. Existing rows produced before the migration are unaffected because they all already have non-null proposed_by values.

The agent_jobs table

The single source of truth for "what is the agent doing right now, and what did it do last." Added by the same migration (still inside the no-tx file, after the rebuild commits and FKs are re-enabled):

CREATE TABLE agent_jobs (
  id            TEXT PRIMARY KEY,
  kind          TEXT NOT NULL,   -- 'incorporate' | 'perspective'
  source_path   TEXT NOT NULL,
  topic_id      TEXT,            -- non-null iff kind = 'incorporate'
  persona_name  TEXT,            -- non-null iff kind = 'perspective'
  status        TEXT NOT NULL,   -- queued|running|succeeded|failed|timed_out
  started_at    INTEGER,         -- unix seconds; null until run begins
  completed_at  INTEGER,         -- unix seconds; null until terminal
  exit_code     INTEGER,         -- null until terminal
  error_tail    TEXT,            -- last 4 KiB of stderr; null on success
  created_at    INTEGER NOT NULL,
  CHECK (status IN ('queued','running','succeeded','failed','timed_out')),
  CHECK (
    (kind = 'incorporate' AND topic_id IS NOT NULL AND persona_name IS NULL) OR
    (kind = 'perspective' AND persona_name IS NOT NULL AND topic_id IS NULL)
  ),
  CHECK ((status IN ('queued','running')) =
         (completed_at IS NULL)),
  FOREIGN KEY (topic_id) REFERENCES topics(id)
);
CREATE INDEX agent_jobs_status      ON agent_jobs(status);
CREATE INDEX agent_jobs_source_path ON agent_jobs(source_path, created_at DESC);

The kind/discriminator CHECK requires the opposite field to be NULL — a perspective job with topic_id set, or an incorporate job with persona_name set, is rejected at the schema level. The source_path column is validated through ValidateSourcePath at insert time (same discipline as topics.source_path). Lifecycle transitions are routed through the existing single-writer funnel in collab.Store via new mutators (InsertJob, StartJob, CompleteJob) — the same pattern as topics, messages, and proposals.
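
The mutators could mirror the schema's CHECK constraints in Go so that bad input fails with a readable error before it reaches SQLite. The sketch below is an assumption about how that might look: the function names are hypothetical, the empty string stands in for SQL NULL, and the transition table is inferred from the status list and the startup sweep rather than stated in the design.

```go
package main

import "fmt"

// checkDiscriminator mirrors the kind/discriminator CHECK on agent_jobs.
// An empty string plays the role of NULL in this sketch.
func checkDiscriminator(kind, topicID, personaName string) error {
	switch kind {
	case "incorporate":
		if topicID == "" || personaName != "" {
			return fmt.Errorf("incorporate job needs topic_id and no persona_name")
		}
	case "perspective":
		if personaName == "" || topicID != "" {
			return fmt.Errorf("perspective job needs persona_name and no topic_id")
		}
	default:
		return fmt.Errorf("unknown job kind %q", kind)
	}
	return nil
}

// allowed lists the status transitions assumed here: queued -> running,
// running -> any terminal status, and queued -> failed for the startup sweep.
var allowed = map[string][]string{
	"queued":  {"running", "failed"},
	"running": {"succeeded", "failed", "timed_out"},
}

// canTransition reports whether a job may move from one status to another.
func canTransition(from, to string) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(checkDiscriminator("incorporate", "t-xyz", ""))
	fmt.Println(checkDiscriminator("perspective", "t-xyz", "skeptic"))
	fmt.Println(canTransition("queued", "running"), canTransition("succeeded", "running"))
}
```

The schema remains the final authority; a Go-side check like this only improves the error message.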

Startup sweep

On server startup, before collab.Recover runs:

UPDATE agent_jobs
   SET status       = 'failed',
       completed_at = unixepoch(),
       error_tail   = 'server restarted while job in flight'
 WHERE status IN ('queued','running');

This restores the invariant that no queued or running row outlasts a server process; queued rows are swept too because the in-memory queue that held them is gone. A more sophisticated recovery (re-queueing) is rejected for v1: the agent's work may have partially landed (e.g. a proposal row exists), and the safest thing is to surface "this job didn't finish; retry if you still want it." The user retries through the UI.

HTTP surface

New endpoints under /api/agent/:

| Method & path | Body / response |
| --- | --- |
| POST /api/agent/jobs | Body: {kind, source_path, topic_id?, persona_name?}. Validates inputs, inserts an agent_jobs row with status=queued, enqueues. Returns {job_id}. Returns 409 if an in-flight job exists for the same Source. |
| GET /api/agent/jobs?source_path=… | Returns the most recent agent jobs for a Source (default last 20). UI polls this to update spinners and surface errors. |
| GET /api/agent/jobs/{id} | Single job by ID, including error_tail when relevant. |

Agent-job endpoints. All require collaborator auth; the principal helper from #7 wraps each handler.
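
The 409 rule for POST /api/agent/jobs implies some per-Source bookkeeping of in-flight jobs. A minimal sketch of that gate, assuming an in-memory map guarded by a mutex; the inflight type and its methods are hypothetical and say nothing about how agent.Service actually structures its queue.

```go
package main

import (
	"fmt"
	"sync"
)

// inflight tracks at most one active job per Source path.
type inflight struct {
	mu   sync.Mutex
	jobs map[string]string // source path -> job ID holding it
}

func newInflight() *inflight {
	return &inflight{jobs: make(map[string]string)}
}

// tryAcquire claims the Source for jobID. If another job already holds it,
// it returns that job's ID and false, which a handler could map to HTTP 409.
func (f *inflight) tryAcquire(sourcePath, jobID string) (holder string, ok bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if h, busy := f.jobs[sourcePath]; busy {
		return h, false
	}
	f.jobs[sourcePath] = jobID
	return jobID, true
}

// release frees the Source once its job reaches a terminal status.
func (f *inflight) release(sourcePath string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.jobs, sourcePath)
}

func main() {
	gate := newInflight()
	fmt.Println(gate.tryAcquire("docs/foo.md", "job-1")) // job-1 true
	fmt.Println(gate.tryAcquire("docs/foo.md", "job-2")) // job-1 false
	gate.release("docs/foo.md")
	fmt.Println(gate.tryAcquire("docs/foo.md", "job-2")) // job-2 true
}
```

Jobs on different Sources do not block each other at this gate; overall parallelism would still be bounded separately by max_concurrent_jobs.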

These endpoints are triggers and observers; they are not how the agent itself talks to the server. The agent uses wb-agent for its own writes and has no HTTP access.

Git identity

The Agent does not commit. The harness commits, post-approval, via the existing collab.IncorporateCommitSourceRewrite path. #3's git work is purely config plumbing: new agent.author_name and agent.author_email fields, threaded into the IncorporateInput.AuthorName/AuthorEmail arguments already in place.

// in the handler that approves a proposal:
sha, err := collab.Incorporate(store, collab.IncorporateInput{
    RepoRoot:     cfg.Root,
    ProposalID:   proposalID,
    ApproverID:   principal.ID,
    ApproverName: principal.DisplayName,
    Subject:      subject,
    Body:         body,
    AuthorName:   cfg.Agent.AuthorName,   // new
    AuthorEmail:  cfg.Agent.AuthorEmail,  // new
})

This decouples "who wrote the new Source" (the Agent, via the git author field) from "who approved it" (the human, via a commit trailer). git blame attributes the rewrite to the Agent; git log reveals the human approver and Topic ID.

Failure modes

| Failure | Detection | State stored | User-facing message |
| --- | --- | --- | --- |
| Claude binary not in PATH / spawn error | cmd.Start returns error | status=failed, error_tail = "agent unreachable: <err>" | "Agent is unreachable. Check that claude is installed." |
| Non-zero exit | cmd.Wait returns *exec.ExitError | status=failed, exit_code, last 4 KiB of stderr | "Agent failed. See log." |
| Timeout | context.DeadlineExceeded | status=timed_out, partial error_tail | "Agent timed out after Nm." |
| Server shutdown during run | service receives ctx.Done() | Job left in running in DB; startup sweep on next boot marks it failed | "Agent failed: server restarted." (after restart) |
| Exit 0 but no proposal row (incorporate) | Post-exit check in service: no new incorporation_proposals row for topic_id with created_at >= job.started_at | status=failed, error_tail = "agent exited 0 but produced no proposal" | "Agent finished without producing a proposal — retry." |

How #3 classifies the outcome of a job. #4 and #5 may layer their own invariant checks on top.

No automatic retries. A failed job is surfaced to the user, who decides whether to re-trigger.
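
The classification rules above can be read as one ordered decision. The sketch below is a hedged rendering of that order, not the service's actual code: classify is a hypothetical function, RunResult copies the shape shown under Test boundary, and it assumes a non-zero exit is reported through ExitCode with Err left nil.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// RunResult has the same shape as the runner result defined for the test boundary.
type RunResult struct {
	ExitCode  int
	ErrorTail string
	Err       error
}

// classify maps a finished run to the status and error_tail to store.
// proposalFound is the result of the post-exit check for incorporate jobs.
func classify(kind string, res RunResult, proposalFound bool) (status, errorTail string) {
	switch {
	case errors.Is(res.Err, context.DeadlineExceeded):
		return "timed_out", res.ErrorTail
	case errors.Is(res.Err, context.Canceled):
		// Server shutdown: leave the row as running for the startup sweep.
		return "running", ""
	case res.Err != nil:
		return "failed", "agent unreachable: " + res.Err.Error()
	case res.ExitCode != 0:
		return "failed", res.ErrorTail
	case kind == "incorporate" && !proposalFound:
		return "failed", "agent exited 0 but produced no proposal"
	default:
		return "succeeded", ""
	}
}

func main() {
	fmt.Println(classify("incorporate", RunResult{Err: context.DeadlineExceeded, ErrorTail: "partial"}, false))
	fmt.Println(classify("incorporate", RunResult{ExitCode: 2, ErrorTail: "boom"}, false))
	fmt.Println(classify("incorporate", RunResult{}, true))
}
```

Checking the timeout first matters because a killed process may also report a non-zero exit code.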

Observability

Configuration

New agent: block in wiki-browser.yaml:

agent:
  author_name:         "Orcha Agent"
  author_email:        "agent@orcha.local"
  claude_bin:          ""           # optional; default "claude" (resolved against $PATH)
  wb_agent_bin:        ""           # optional; default sibling of the wiki-browser binary
  max_concurrent_jobs: 1
  incorporate_timeout: "5m"
  perspective_timeout: "3m"
  log_dir:             "./agent-logs"   # optional; empty disables file logging

The block is required once #3 lands. If agent: is missing, startup fails with a clear error — there is no implicit default for git authorship, and silent fallback would corrupt the audit trail.
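
A sketch of what load-time checking of the agent: block might involve. Only two rules come from the text (authorship is required, and claude_bin defaults to "claude"); the remaining checks, the AgentConfig type, and the Normalize method are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// AgentConfig mirrors a subset of the agent: block. Names are illustrative.
type AgentConfig struct {
	AuthorName         string
	AuthorEmail        string
	ClaudeBin          string
	MaxConcurrentJobs  int
	IncorporateTimeout string
	PerspectiveTimeout string
}

// Normalize applies defaults and rejects values the runtime could not use.
func (c *AgentConfig) Normalize() error {
	if c.AuthorName == "" || c.AuthorEmail == "" {
		// No implicit default for git authorship.
		return errors.New("agent.author_name and agent.author_email are required")
	}
	if c.ClaudeBin == "" {
		c.ClaudeBin = "claude" // resolved against $PATH at spawn time
	}
	if c.MaxConcurrentJobs < 1 {
		return errors.New("agent.max_concurrent_jobs must be at least 1")
	}
	timeouts := []struct{ key, val string }{
		{"agent.incorporate_timeout", c.IncorporateTimeout},
		{"agent.perspective_timeout", c.PerspectiveTimeout},
	}
	for _, t := range timeouts {
		d, err := time.ParseDuration(t.val)
		if err != nil || d <= 0 {
			return fmt.Errorf("%s: invalid duration %q", t.key, t.val)
		}
	}
	return nil
}

func main() {
	cfg := AgentConfig{
		AuthorName:         "Orcha Agent",
		AuthorEmail:        "agent@orcha.local",
		MaxConcurrentJobs:  1,
		IncorporateTimeout: "5m",
		PerspectiveTimeout: "3m",
	}
	fmt.Println(cfg.Normalize(), cfg.ClaudeBin)
}
```

Failing at startup keeps a misconfigured authorship from ever reaching a commit.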

Validation at config load:

Module layout

New code goes in two packages and one binary directory:

internal/agent/
├── service.go          # queue + lifecycle + agent_jobs writes
├── service_test.go
├── runner.go           # Runner interface + ClaudeCLIRunner
├── runner_test.go
└── fake_runner.go      # test impl, also usable from external tests

internal/collab/
└── agent_jobs.go       # InsertJob / StartJob / CompleteJob mutators

cmd/wb-agent/
└── main.go             # subcommand dispatcher + handlers

internal/agent depends on internal/collab but not on the HTTP layer. internal/server wires agent.Service into its dependency bundle and exposes the HTTP handlers. The existing internal/collab/incorporate.go is unchanged: it remains the post-approval apply step.

Test boundary

The runtime is testable end-to-end without ever invoking claude. The Runner interface:

type Job struct {
    ID         string
    Kind       string           // "incorporate" | "perspective"
    SourcePath string
    TopicID    string           // "" when Kind == "perspective"
    Persona    string           // "" when Kind == "incorporate"
    BaseSHA    string           // "" when Kind == "perspective"
    PersonaSHA string           // "" when Kind == "incorporate"
    SourceSHA  string           // "" when Kind == "incorporate"
}

type RunResult struct {
    ExitCode  int
    ErrorTail string         // last 4 KiB of stderr
    Err       error          // non-nil on spawn errors / timeouts
}

type Runner interface {
    Run(ctx context.Context, j Job) RunResult
}

ClaudeCLIRunner spawns the real subprocess. FakeRunner takes a user-supplied func(Job) RunResult and runs it inline, letting tests assert on queue state, simulate the agent's wb-agent writes against the test DB, and exercise every failure-mode branch deterministically.
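
For illustration, a FakeRunner along the lines described could look like the sketch below. The Job type is trimmed to a few fields, and the Calls slice and the runOnce helper are additions made for this example rather than part of the design.

```go
package main

import (
	"context"
	"fmt"
)

// Job is trimmed to the fields this sketch needs.
type Job struct {
	ID, Kind, SourcePath, TopicID string
}

// RunResult matches the shape defined above.
type RunResult struct {
	ExitCode  int
	ErrorTail string
	Err       error
}

// Runner matches the interface defined above.
type Runner interface {
	Run(ctx context.Context, j Job) RunResult
}

// FakeRunner runs a caller-supplied function inline instead of spawning
// claude, and records every job it was asked to run. Calls is not safe
// for concurrent use; a real test double might guard it with a mutex.
type FakeRunner struct {
	Fn    func(Job) RunResult
	Calls []Job
}

// Run records the job, honors an already-cancelled context, then calls Fn.
func (f *FakeRunner) Run(ctx context.Context, j Job) RunResult {
	f.Calls = append(f.Calls, j)
	if err := ctx.Err(); err != nil {
		return RunResult{ExitCode: -1, Err: err}
	}
	return f.Fn(j)
}

// runOnce is a convenience wrapper that runs a job with a background context.
func runOnce(r Runner, j Job) RunResult {
	return r.Run(context.Background(), j)
}

func main() {
	fake := &FakeRunner{Fn: func(j Job) RunResult {
		// A test could write a proposal row to the test DB here.
		return RunResult{ExitCode: 1, ErrorTail: "simulated failure for " + j.ID}
	}}
	res := runOnce(fake, Job{ID: "job-1", Kind: "incorporate", TopicID: "t-xyz"})
	fmt.Println(res.ExitCode, res.ErrorTail, len(fake.Calls))
}
```

Because Fn runs inline, a test controls exactly when the "agent" writes to the DB relative to the service's post-exit checks.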

Open questions

The remaining unknowns belong to other sub-projects, not to #3:

References