Pi deployment & git-sync — Design
Draft

2026-05-21 · Daniel · wiki-browser · sub-project #10

Problem

wiki-browser runs fine on a developer laptop against a local checkout, but the collaboration loop it was built for — view a document, open Topics, discuss, incorporate the resolution — only pays off when both collaborators point at the same always-on instance. Today there is no such instance.

The two of us (Daniel, Max) work trunk-based: documents are committed straight to master of getorcha/orcha and pushed to GitHub, no pull requests. The intended loop is: a document lands on master → the wiki serves it → we discuss Topics on it in the UI → the Agent rewrites the Source → the resolution is committed and pushed back to master → both of us git pull the updated copy. For that loop to close, a deployed wiki-browser must do two things it cannot do today:

  1. Pull to serve. When a commit lands on master, the deployed instance must fetch it and serve the new content. Today the server only sees a filesystem; it has no notion of a remote.
  2. Push what it incorporates. Incorporation already writes the rewritten Source and runs git commit with Topic:/Proposal: trailers (internal/collab/gitops.go) — but it never pushes. The commit sits on the deployed clone, invisible to everyone else. There is no fetch, pull, or push anywhere in the codebase.

The target host is a Raspberry Pi 5 on the home LAN, already reachable from outside through an existing Nginx Proxy Manager that terminates TLS and routes a domain to a host:port. So this spec is two things at once: an operational runbook for standing the Pi up, and the design of a new git-sync engine that turns the local-only server into a participant on a shared master.

A related want — continuously redeploying the wiki-browser binary itself when its source changes — is explicitly a later sub-project. It is addressed here only to the extent of leaving clean seams (see Design § CD seams).

Goals & non-goals

Goals

Non-goals

Approach

The new behavior is a git-sync engine: pull-to-serve on a webhook, push-after-incorporate, and a startup catch-up. The question is where it lives.

| Approach | New Go code | Webhook-native | Race-safe | Verdict |
| --- | --- | --- | --- | --- |
| B. External pull (cron / systemd timer) + in-binary push | Low | No | No | Rejected |
| C. Bare mirror repo + separate served working tree | High | Partial | Yes | Rejected |

Sync-engine placement. The webhook choice and the existing in-binary commit path decide it.

B is ruled out by the webhook: a GitHub webhook is an HTTP request, and it lands in the wiki-browser process — so the pull must be in-binary regardless. Adding a second process that also mutates the same clone reintroduces exactly the race the lock exists to prevent: an external git merge interleaving with the in-binary git add/git commit. C separates "receive" from "serve" cleanly but is far too much machinery for two users and one host, and the wiki-browser already commits straight into the served tree — C would mean re-plumbing that.

A is chosen. The webhook already arrives in the Go process; incorporation, committing, and startup recovery already live there. A new internal/gitsync package adds one mutex that every git mutation acquires, and the whole system stays single-process, single-writer, single-lock.

Pros — Approach A

  • One process, one lock — pulls and incorporation commits cannot interleave by construction.
  • Composes with the webhook (already in-process) and with the existing commit / recover code.
  • Sync status is in-process, so it can be surfaced in the UI and read by a future CD watcher.

Cons — Approach A

  • New Go code in the binary (a package plus an incorporation seam).
  • A large fetch can briefly block an incorporation behind the lock — acceptable; fetches are small and infrequent.

Design

Topology & Pi layout

[Figure: topology. Nodes: Daniel's and Max's browsers; GitHub (getorcha/orcha, master); and, on the Raspberry Pi 5 (home LAN), Nginx Proxy Manager (TLS termination), wiki-browser + gitsync engine, the /srv/orcha clone (served working tree), and collab.db (topics, threads). Edge labels: https; webhook (push event); http :8080; fetch and push (SSH); commit / read.]
The public edge is Nginx Proxy Manager. wiki-browser binds a localhost / LAN port only. Git traffic to GitHub is outbound SSH from the wiki-browser process; the inbound webhook rides the same HTTPS path as browsers.

On-disk layout. The config lives inside the clone, so the .claude/skills/ the Agent runtime requires (wb-incorporate, wb-perspective — validated at startup by validateAgentRuntimeRoot) arrive and stay current with every pull. Databases live outside the clone on a stable path:

filesystem: raspberry pi
/srv/orcha/                         # clone of getorcha/orcha; this IS cfg.root
  wiki-browser/
    .claude/skills/{wb-incorporate,wb-perspective}/   # ship via git
    wiki-browser.yaml               # config; gitignored; agent-runtime root
/srv/wiki-browser/
  bin/{wiki-browser,wb-agent}       # arm64 binaries
  data/{collab.db,index.db}         # absolute paths in config
  secrets/{google-client-secret,github-webhook-secret,slack-webhook-url}
  agent-logs/

The collab DB holds every Topic, proposal, discussion message and session — none of it is in git. It is the one irreplaceable artifact on the Pi and is treated accordingly (see Operations). The index DB is disposable: it is rebuilt from the repo.

The git-sync engine (internal/gitsync)

A new package wrapping the clone. It owns one mutex that every git mutation acquires — webhook fetch, incorporation commit + push, startup catch-up, background push. Reads that do not mutate (the existing git log recovery scan, content hashing) are unaffected. Sketch of the surface:

go: internal/gitsync
package gitsync

import (
    "context"
    "time"
)

// Surface only: Config and Repo are elided, and the functions at the
// bottom are signatures without bodies.

type SyncResult struct {
    OldHead, NewHead string
    ChangedPaths     []string // repo-relative, ext/exclude-filtered
    Rebased          bool
}

type State string // "synced" | "syncing" | "push-pending" | "diverged"

type Status struct {
    State      State
    Head       string // current HEAD sha; also the CD watcher's signal
    Ahead      int    // local commits not yet on origin
    LastSyncAt time.Time
    LastError  string
}

func New(cfg Config) (*Repo, error)                          // validates: git repo, on branch, remote present
func (r *Repo) Sync(ctx context.Context) (SyncResult, error) // lock → reconcile
func (r *Repo) Push(ctx context.Context) error               // lock → reconcile → git push
func (r *Repo) Incorporate(ctx context.Context, fn func() (string, error)) (string, error)
func (r *Repo) Status() Status

The shared primitive is reconcile, which is never exported and always runs under the lock: git fetch <remote> <branch>, then git merge --ff-only; if the local branch has diverged (it carries unpushed incorporation commits and origin also moved), fall back to git rebase <remote>/<branch>. Every public operation is built on it: Sync is lock → reconcile, Push is lock → reconcile → git push, and Incorporate is lock → reconcile → the caller's function → git push.
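The fetch, fast-forward, rebase-fallback sequence can be sketched as follows. This is a minimal illustration rather than the package's actual code: gitFunc, execGit and errDiverged are hypothetical names, the context parameter is omitted for brevity, and the sketch assumes that any refused fast-forward means the branch has diverged.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// gitFunc runs one git subcommand in the clone. It is a hypothetical seam
// that lets the sketch be exercised without a real repository.
type gitFunc func(args ...string) error

// errDiverged signals a rebase conflict that a human has to resolve.
var errDiverged = errors.New("gitsync: local branch diverged from remote")

// execGit returns a gitFunc that shells out to the git binary in dir.
func execGit(dir string) gitFunc {
	return func(args ...string) error {
		cmd := exec.Command("git", args...)
		cmd.Dir = dir
		if out, err := cmd.CombinedOutput(); err != nil {
			return fmt.Errorf("git %v: %w: %s", args, err, out)
		}
		return nil
	}
}

// reconcile fetches, fast-forwards when possible, and otherwise rebases the
// local commits onto the remote branch. It reports whether a rebase happened.
// The caller must already hold the engine lock.
func reconcile(git gitFunc, remote, branch string) (bool, error) {
	upstream := remote + "/" + branch
	if err := git("fetch", remote, branch); err != nil {
		return false, err
	}
	if err := git("merge", "--ff-only", upstream); err == nil {
		return false, nil
	}
	// Fast-forward refused: assume unpushed local commits and a moved remote.
	if err := git("rebase", upstream); err != nil {
		_ = git("rebase", "--abort") // restore the pre-rebase tree
		return false, errDiverged
	}
	return true, nil
}
```

A caller would pass execGit("/srv/orcha") in production and a recording fake in tests. Returning a sentinel error for the conflict case lets the engine map it to the diverged state without parsing git output.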

After any reconcile that changed files, the engine refreshes the served view deterministically rather than relying on filesystem-watch timing: it calls a new walker.Rescan() (re-runs the existing scan(), atomically swapping the file-set map), then for each path in ChangedPaths calls index.Reindex if the walker still has it or index.Remove otherwise. ChangedPaths comes from git diff --name-only OldHead NewHead. The fsnotify watcher stays in place for local edits and as a backstop, but correctness of a git-driven update does not depend on it.
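The reindex-or-remove decision described above is a pure function of the rescanned file set and ChangedPaths. A minimal sketch, assuming the walker's file set can be viewed as a map keyed by repo-relative path (planRefresh and served are hypothetical names):

```go
package main

// planRefresh splits changedPaths into paths to reindex (still served after
// the rescan) and paths to remove from the index (no longer served).
// served stands in for the walker's file-set map.
func planRefresh(served map[string]bool, changedPaths []string) (reindex, remove []string) {
	for _, p := range changedPaths {
		if served[p] {
			reindex = append(reindex, p)
		} else {
			remove = append(remove, p)
		}
	}
	return reindex, remove
}
```

Keeping the decision separate from the index calls makes the "deleted upstream" and "renamed upstream" cases easy to cover in a table test without touching the filesystem.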

Webhook endpoint

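POST /api/webhook/github is a public route, mounted alongside /auth/* and outside the OAuth session middleware, since GitHub cannot authenticate as a user. It is protected by HMAC instead: the request signature is verified against the shared secret read from git.webhook_secret_file, and a bad or missing signature gets a 401, is logged, and triggers no sync.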
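A sketch of the signature check, assuming the standard GitHub scheme in which the X-Hub-Signature-256 header carries "sha256=" followed by the hex HMAC-SHA256 of the raw request body. The function names, the 1 MiB body cap, and running the sync in a goroutine are assumptions of this sketch, not decisions recorded in the design.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"net/http"
)

// sign computes the header value GitHub would send for body.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature compares the received header against the expected value
// without an early exit on the first differing byte.
func verifySignature(secret, body []byte, header string) bool {
	return hmac.Equal([]byte(header), []byte(sign(secret, body)))
}

// webhookHandler rejects unsigned or mis-signed requests with 401 and starts
// a sync for push events. onPush is a hypothetical hook into the engine.
func webhookHandler(secret []byte, onPush func()) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil || !verifySignature(secret, body, r.Header.Get("X-Hub-Signature-256")) {
			http.Error(w, "bad signature", http.StatusUnauthorized)
			return
		}
		if r.Header.Get("X-GitHub-Event") == "push" {
			go onPush() // assumption: sync runs off the request path
		}
		w.WriteHeader(http.StatusNoContent)
	}
}
```

The body must be read once and verified as raw bytes; re-encoding the parsed JSON before hashing would change the digest.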

Note

The webhook also fires for the Pi's own incorporation pushes. That is harmless: the follow-up Sync fetches, finds origin/<branch> already equal to local HEAD, and the fast-forward is a no-op. No de-duplication needed.

Incorporation: push integration

Incorporation today (internal/collab/incorporate.go) loads the proposal, stale-checks base_source_sha against the Source on disk, writes the recovery marker, commits via CommitSourceRewrite, then completes the DB transaction. The change is to run that unchanged sequence inside the engine's lock, with a reconcile before and a push after. The incorporate HTTP handler calls:

go: incorporate handler
sha, err := gitSync.Incorporate(ctx, func() (string, error) {
    return collab.Incorporate(store, in)   // existing call, unchanged
})

Ordering inside one lock hold: (1) reconcile — the working tree is now equal to origin/<branch>. (2) collab.Incorporate runs; its stale-check now compares the proposal's base_source_sha against the freshly-pulled Source — so an upstream edit to the same document cleanly fails the check as ErrStaleProposal, and the user regenerates. (3) git push.

The push is the last, best-effort step. The commit and the DB transaction are already done and mutually consistent before the push is attempted — so a push failure does not fail the incorporation. The handler reports success; the engine sets state push-pending; the background pusher retries. The local commit is authoritative, the push is replication. This is the property that makes a restart mid-incorporation safe (see CD seams). Batched incorporation (#9) flows through the identical seam — it is the same collab.Incorporate call with ChildTopicIDs populated. Perspective regeneration commits nothing and never pushes.
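The ordering inside one lock hold might look like the sketch below. The repo type, its function fields and errOffline are hypothetical, and the context parameter is omitted. One assumption goes beyond the text: the sketch fails the call when the initial reconcile fails, which the design does not specify.

```go
package main

import (
	"errors"
	"sync"
)

// errOffline exists only so the sketch can be exercised with a failing push.
var errOffline = errors.New("gitsync: remote unreachable")

// repo is a stand-in for the engine; reconcile and push are injected.
type repo struct {
	mu        sync.Mutex
	state     string       // "synced" | "push-pending" | ...
	reconcile func() error // fetch + fast-forward or rebase
	push      func() error // git push
}

// Incorporate runs fn (the existing commit + DB sequence) between a
// reconcile and a best-effort push, all under one lock hold.
func (r *repo) Incorporate(fn func() (string, error)) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if err := r.reconcile(); err != nil {
		return "", err // assumption: a failed reconcile aborts the call
	}
	sha, err := fn()
	if err != nil {
		return "", err // e.g. a stale proposal; nothing was committed
	}
	if err := r.push(); err != nil {
		// The commit and DB are already consistent: report success and
		// leave replication to the background pusher.
		r.state = "push-pending"
		return sha, nil
	}
	r.state = "synced"
	return sha, nil
}
```

The point of the shape is that fn never observes a push failure, so the handler's success path does not depend on the network.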

Conflict & drift handling

| Situation | Resolution |
| --- | --- |
| Upstream edits a document that has an open Topic / generated proposal | The proposal's base_source_sha no longer matches the pulled Source → existing ErrStaleProposal / freshness machinery fires → the user regenerates the proposal. No new code. |
| Push rejected: someone pushed in the window between our fetch and our push | reconcile rebases the local incorporation commit onto the new origin/<branch>; retry the push. Bounded attempts (e.g. 3). An incorporation commit is a single-file pathspec commit, so the rebase is clean unless the same document moved upstream. |
| Genuine rebase conflict: the Pi and a teammate edited the same document concurrently | git rebase --abort; set state diverged; log loudly; surface the UI banner and fire an alert (see Alerting). The incorporation already succeeded locally and the DB is consistent; only the push is blocked. A human resolves the merge on the Pi. The background pusher skips while diverged so it does not thrash. |
| Network down: fetch or push fails | Transient. Sync errors are non-fatal and logged; the background pusher and the next webhook retry. Local state stays consistent throughout. |

The reconcile-before-commit ordering shrinks the push-rejection window to milliseconds; the residual genuine-conflict case is made loud, not silent.
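The push-rejected and rebase-conflict rows can be read as one bounded loop. A sketch under stated assumptions: pushWithRetry, errRejected and errDiverged are hypothetical names, the limit of 3 is the table's example value, and any push failure is treated as a possible rejection worth one reconcile.

```go
package main

import "errors"

var (
	errRejected = errors.New("gitsync: push rejected, remote moved")
	errDiverged = errors.New("gitsync: rebase conflict")
)

const maxPushAttempts = 3

// pushWithRetry tries to publish local commits and returns the resulting
// engine state: "synced", "push-pending", or "diverged".
func pushWithRetry(push, reconcile func() error) (state string, err error) {
	for attempt := 1; attempt <= maxPushAttempts; attempt++ {
		if err = push(); err == nil {
			return "synced", nil
		}
		// Rebase onto the new remote head before the next attempt.
		if rerr := reconcile(); rerr != nil {
			if errors.Is(rerr, errDiverged) {
				return "diverged", rerr // needs a human; stop retrying
			}
			return "push-pending", rerr // e.g. network down; retry later
		}
	}
	return "push-pending", err
}
```

Returning the state alongside the error keeps the policy (when to alert, when the background pusher skips) outside the retry loop.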
Warning

Before deployment, the watched tree only changed via incorporation or a local editor. After deployment, arbitrary upstream pushes rewrite watched documents far more often. That exercises the existing Topic-anchor / freshness / recover paths (character-offset anchors drifting under an upstream rewrite) harder than they have been. This is existing behavior, not new design — but it is the most likely place for a latent bug to surface, and the implementation plan must include explicit tests for "open Topic on a document that is rewritten by an upstream pull."

Startup sequence

Catch-up runs early in run() — after config.Load, before walker.New — so the initial filesystem scan and collab.Recover both see the latest tree:

  1. config.Load.
  2. gitsync.New — validate the clone (is a git repo, on the configured branch, remote present); fail loudly with a clear message if not (provisioning did the initial git clone).
  3. Sync — fetch + fast-forward. This is what self-heals webhooks missed while the Pi was offline.
  4. Push — flush any incorporation commit that a previous run committed but did not push.
  5. walker.New, index.Open, collab.Open, RevokeSessions, SweepIncompleteJobs, collab.Recover — unchanged, now operating on the up-to-date tree.
Decision

Steps 3–4 are non-fatal. If the Pi boots with no network, gitsync.New still succeeds (the clone is valid), the Sync/Push errors are logged, and the server starts and serves whatever the clone currently holds. The next webhook or the background pusher reconciles once the network returns. Contrast collab.Recover, which stays fatal — a corrupt collab DB is not something to serve through.
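The non-fatal treatment of steps 3 and 4 might be expressed as below. catchUp and errOffline are hypothetical names; the sketch assumes the codebase logs through log/slog and that Push is still attempted after a failed Sync.

```go
package main

import (
	"errors"
	"log/slog"
)

// errOffline exists only so the sketch can be exercised without a network.
var errOffline = errors.New("gitsync: remote unreachable")

// catchUp runs the startup Sync and Push. Failures are logged and reported
// by name, never returned as an error, so the server still starts.
func catchUp(syncFn, pushFn func() error) (failed []string) {
	steps := []struct {
		name string
		run  func() error
	}{{"sync", syncFn}, {"push", pushFn}}

	for _, s := range steps {
		if err := s.run(); err != nil {
			slog.Warn("startup catch-up step failed; continuing", "step", s.name, "err", err)
			failed = append(failed, s.name)
		}
	}
	return failed
}
```

By contrast, the gitsync.New and collab.Recover calls around it would keep returning errors that abort startup.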

Configuration

Two new top-level blocks. git: configures the sync engine; alert: configures the Slack notifier. The commit-author identity is the existing agent.author_name/author_email; SSH transport for fetch/push uses the service user's deploy key and needs no config entry.

yaml: wiki-browser.yaml
git:
  remote: "origin"              # default
  branch: "master"              # default
  webhook_secret_file: "/srv/wiki-browser/secrets/github-webhook-secret"
  poll_interval: "0"            # 0 = webhook-only. Set e.g. "10m" to
                                # enable a safety-net poll.
alert:
  slack_webhook_url_file: "/srv/wiki-browser/secrets/slack-webhook-url"
  fail_threshold: "15m"         # alert if sync/push fails continuously this long
Note

Both new secrets follow the existing google_client_secret_file convention — the config holds a path, not the value. The parsed config struct then carries only paths, never secret material, so a stray slog of the config or a config-wrapped error cannot leak a secret; the config file itself stays non-sensitive; and each secret rotates independently. Config load validates each path with an os.Stat existence check, exactly as it already does for google_client_secret_file.

Note

Webhook-only has one gap: a webhook missed while the Pi is online (a GitHub delivery failure, a momentary proxy hiccup) leaves the Pi stale until the next push. Offline misses self-heal at startup (step 3 above). poll_interval closes the online gap and is disabled by default — the safety net is one config line away without changing the primary mechanism.
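One way to interpret poll_interval, assuming the value is parsed with Go's time.ParseDuration (which accepts a bare "0") and that negative values are rejected. parsePollInterval and pollLoop are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// parsePollInterval turns git.poll_interval into a duration.
// enabled is false for "0", which means webhook-only.
func parsePollInterval(s string) (d time.Duration, enabled bool, err error) {
	d, err = time.ParseDuration(s)
	if err != nil {
		return 0, false, fmt.Errorf("git.poll_interval %q: %w", s, err)
	}
	if d < 0 {
		return 0, false, fmt.Errorf("git.poll_interval %q: must not be negative", s)
	}
	return d, d > 0, nil
}

// pollLoop calls syncFn every interval until ctx is cancelled.
// Only start it when parsePollInterval reported enabled: NewTicker
// panics on a non-positive interval.
func pollLoop(ctx context.Context, every time.Duration, syncFn func()) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			syncFn()
		}
	}
}
```

Because the poll only calls the same Sync the webhook uses, enabling it adds no new code path through the lock.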

Provisioning checklist

Operations

Alerting

A UI banner and log lines are passive — they assume someone is looking. A diverged state, a silently broken deploy key, or an Agent runtime that has stopped working all have to reach the operator. A small general-purpose notifier — internal/alert — POSTs a message to a Slack incoming webhook ({"text": ...}), the URL read from the file at alert.slack_webhook_url_file. Both the gitsync engine and the Agent service hold a reference to it.

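Three conditions raise an alert: the engine entering the diverged state after a rebase conflict (immediately); sync or push failing continuously for longer than alert.fail_threshold; and three consecutive Agent job failures.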

Each alert carries the condition, the current HEAD, and the last error. When a condition clears — the rebase is resolved, the network returns, an Agent job succeeds — a single "recovered" message fires and the relevant counter resets. Alerts are edge-triggered: a long outage produces one alert and one recovery, never a stream.
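The edge-triggered behavior for the time-based conditions can be sketched as a small state machine. The condition type and Observe are hypothetical; time is passed in as an offset from process start so the sketch stays deterministic, and the counter-based Agent condition is not covered.

```go
package main

import "time"

// condition tracks one alertable condition and emits at most one "alert"
// and one "recovered" per outage.
type condition struct {
	threshold    time.Duration // 0 = alert on the first failure (e.g. diverged)
	failing      bool
	failingSince time.Duration
	alerted      bool
}

// Observe records the latest outcome at time now and returns "alert",
// "recovered", or "" when nothing should be sent.
func (c *condition) Observe(now time.Duration, failed bool) string {
	if !failed {
		wasAlerted := c.alerted
		c.failing, c.alerted = false, false
		if wasAlerted {
			return "recovered"
		}
		return ""
	}
	if !c.failing {
		c.failing, c.failingSince = true, now
	}
	if !c.alerted && now-c.failingSince >= c.threshold {
		c.alerted = true
		return "alert"
	}
	return ""
}
```

A failure that clears before the threshold produces neither message, which matches the "one alert and one recovery, never a stream" rule.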

Note

The notifier is best-effort and must never block or fail the operation that triggered it — a failed POST is logged and dropped. If alerting itself is down, the UI banner and logs remain as the passive fallback.
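A best-effort POST might look like this. Slack incoming webhooks accept a JSON body of the form {"text": ...}; the function names, the 5 s timeout and the URL-stripping helper are assumptions of this sketch. The helper is there because net/http embeds the request URL in its errors, and here the URL is itself a secret.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"log/slog"
	"net/http"
	"net/url"
	"time"
)

var slackClient = &http.Client{Timeout: 5 * time.Second}

// slackPayload builds the {"text": ...} body for an incoming webhook.
func slackPayload(text string) ([]byte, error) {
	return json.Marshal(struct {
		Text string `json:"text"`
	}{Text: text})
}

// safeErr drops the request URL that net/http wraps into its errors.
func safeErr(err error) error {
	var ue *url.Error
	if errors.As(err, &ue) {
		return ue.Err
	}
	return err
}

// notify POSTs text to webhookURL. It returns nothing: a failed POST is
// logged and dropped so the caller's operation is never affected.
func notify(webhookURL, text string) {
	body, err := slackPayload(text)
	if err != nil {
		slog.Warn("alert: encode failed; dropping", "err", err)
		return
	}
	resp, err := slackClient.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		slog.Warn("alert: post failed; dropping", "err", safeErr(err))
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		slog.Warn("alert: unexpected status; dropping", "status", resp.StatusCode)
	}
}
```

Callers that must not wait on the network can invoke it as go notify(url, msg); the client timeout bounds how long the goroutine lives.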

Failure modes

| Failure | Detection | Behavior / recovery |
| --- | --- | --- |
| Pi offline / rebooted | | Startup Sync catches up; webhooks missed while down are subsumed by the fetch. |
| Process killed mid-incorporation, after commit, before DB completion | collab.Recover on next boot | Recovery reconciles the DB against git log trailers (existing machinery). |
| Process killed after DB completion, before push | Startup Push (local branch ahead of origin) | The pending commit is pushed on next boot. Local commit was authoritative throughout. |
| Push fails (network) | git push error | State push-pending; background pusher retries; no state corruption. An alert fires if failures persist past fail_threshold. |
| Rebase conflict (same document edited both sides) | git rebase exit status | State diverged; surfaced in UI + logs; alert fires immediately; human resolves on the Pi. |
| Agent job crash | SweepIncompleteJobs on boot | Existing: incomplete jobs swept before Recover. |
| Agent runtime broken (expired claude login, missing binary, API down) | 3 consecutive job failures | Jobs surface as failed in the UI; an alert fires so the operator can re-authenticate or fix the runtime. |
| collab DB corruption / SD-card loss | DB open / integrity error | Restore from the latest off-device backup. |
| Bad / missing webhook signature | HMAC verify | 401, logged, no sync. Repeated failures indicate a secret mismatch. |
| Deploy-key auth failure | fetch / push error | Sync stalls, logged; sync-status reflects LastError; an alert fires after fail_threshold. |

Security

Continuous-delivery seams

CD of the wiki-browser binary is a later sub-project. This design does not build it, but three additive seams keep it from fighting the architecture later:

  1. Sync returns a result, not void. SyncResult carries {OldHead, NewHead, ChangedPaths}. Useful now for logging; later it is how CD answers "did wiki-browser/ source change in this push?" without re-querying git.
  2. Status includes the current HEAD sha. The same /api/sync-status surface that shows synced/diverged is the thing an external CD watcher reads to detect "new code landed."
  3. Invariant — the wiki-browser process is the sole mutator of the clone's git state. CD, whenever it is built, observes (reads sync-status, reacts) and never runs its own fetch/pull/commit/push on /srv/orcha. Violating this re-introduces the Approach B two-writer race. CD therefore lives as a separate systemd unit, not inside the binary.
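The first seam's question, "did wiki-browser/ source change in this push?", reduces to a prefix test over the changed paths. A minimal sketch with a hypothetical helper name; it assumes the path list handed to it has not already been narrowed to served documents by the ext/exclude filter, since filtered paths would hide source files.

```go
package main

import "strings"

// touchesPrefix reports whether any changed path lies under prefix,
// for example "wiki-browser/".
func touchesPrefix(changedPaths []string, prefix string) bool {
	for _, p := range changedPaths {
		if strings.HasPrefix(p, prefix) {
			return true
		}
	}
	return false
}
```

Including the trailing slash in the prefix avoids matching sibling directories whose names merely start with the same characters.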

The hard part of CD — surviving a restart at any instant — is already delivered by this design: a CD restart is just another crash, and the startup sequence (collab.Recover + startup Push of pending commits + SweepIncompleteJobs) already makes every step recoverable. The "local commit authoritative, push replayable" property specifically covers a restart mid-incorporation.

Note

One CD-era refinement is deliberately not done here: graceful-shutdown drain. main.go currently gives srv.Shutdown a 5 s budget, which can cut off a slow incorporation. Recovery still catches it, so this is polish, not a correctness gap — but the CD sub-project should revisit it so routine redeploys are clean rather than merely safe.

Resolved decisions

The draft's open questions were all resolved during review:

References