Prod-Clone Refactor Testing

Pre-deploy validation procedure for large, schema-touching refactors. Restores a clone of prod Postgres locally, applies pending migrations, verifies the resulting schema matches a fresh-from-init baseline, and exercises ingestion and UI against the migrated clone.

Design spec: docs/superpowers/specs/2026-04-24-prod-clone-refactor-testing-design.md

When to use

Run before deploying any migration that:

Not needed for additive migrations (new columns/tables) that are covered by unit/integration tests and don't touch live data layout.

Prerequisites

Procedure

Step 1 — Clone prod to a dump file

Runs unattended once the confirmation prompt is answered (or skipped with --yes), ~30–45 min (longer if a fresh snapshot needs to be taken).

bb db:clone-prod

The script first calls aws sts get-caller-identity and aborts unless the profile resolves to the prod account. It then prints the session header and prompts Continue? [y/N] before any AWS mutation. Confirm with y to proceed, or pass --yes/-y to skip the prompt for unattended runs.
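The account check described above can be sketched in shell as follows. This is a minimal illustration, not the script's actual code: the function names and the expected account ID passed in are hypothetical, and only the aws sts get-caller-identity call is taken from the text.

```shell
# Sketch of an account guard. The expected account ID is supplied by the
# caller; no real prod account ID is assumed here.

# Return 0 only when a non-empty resolved account equals the expected one.
account_matches() {
  actual="$1"
  expected="$2"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Resolve the account behind a profile and refuse to continue on mismatch.
require_prod_account() {
  profile="$1"
  expected="$2"
  actual="$(aws sts get-caller-identity --profile "$profile" \
    --query Account --output text)" || return 1
  if ! account_matches "$actual" "$expected"; then
    echo "refusing to continue: $profile resolves to account $actual" >&2
    return 1
  fi
}

# Usage (EXPECTED_ACCOUNT is a placeholder you would set yourself):
#   require_prod_account orcha-prod "$EXPECTED_ACCOUNT" && bb db:clone-prod
```

Keeping the comparison in its own function makes the guard testable without AWS credentials.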

Produces dump/prod-<timestamp>.dump. Note the throwaway clone identifier in the output: if the script dies unexpectedly, that is the instance you need to delete manually (Step 9 lists leftovers; Troubleshooting has the delete command).

If the script creates a fresh manual source snapshot, it tags it as temporary clone-test data and deletes it during cleanup. If cleanup fails, Step 9 shows how to list and remove the leftover snapshot.

Flags:

Step 2 — Load the dump into local Postgres

~2–5 min.

bb db:load-clone

Creates orcha_prod_clone in local docker-compose Postgres, restored from the newest dump/*.dump file.
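To see which file would count as the newest dump before loading it, a helper along these lines can be used. It is a sketch that assumes "newest" means most recently modified; the real script may instead sort by the timestamp in the filename. The function name is hypothetical.

```shell
# Print the most recently modified *.dump file in a directory (default: dump).
# Returns non-zero when no dump file exists.
# Assumes filenames contain no newlines.
newest_dump() {
  dir="${1:-dump}"
  newest="$(ls -t "$dir"/*.dump 2>/dev/null | head -n 1)"
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage:
#   newest_dump          # inspects ./dump
#   newest_dump /tmp/d   # inspects another directory
```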

Step 3 — Sanity check: current master boots against pre-migration clone

ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone clj -M:dev

At the REPL: (integrant.repl/go) — must succeed. Confirms the clone is usable and that current master is healthy against prod schema. Exit the REPL.

Step 4 — Apply pending migrations on the clone

ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone bb migrate migrate

Expected result: all pending migrations apply cleanly against prod data. Any migration error is a failure; go to Step 8.

Step 5 — Schema assertion (the gate)

bb db:fresh
bb db:schema-diff --a orcha_prod_clone --b orcha_fresh

bb db:fresh (re)creates orcha_fresh at HEAD schema. bb db:schema-diff exits 0 only when the diff is empty. Any diff means the migrated clone has a schema that diverges from what init.sql + migrations produce on a fresh database, i.e., a migration bug. Iterate until it exits 0.
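The idea behind the gate can be illustrated with plain pg_dump and diff. This is a simplified stand-in, not the implementation in scripts/schema_diff.clj: it has no canonicalizer rules, so it may report differences the real tool ignores. The function names are hypothetical, and connection settings are assumed to come from the standard PG* environment variables.

```shell
# Dump only the schema of a database, without ownership or grants.
dump_schema() {
  pg_dump --schema-only --no-owner --no-privileges "$1"
}

# Compare two schema dump files; succeed only when they are identical.
schema_gate() {
  if diff -u "$1" "$2"; then
    echo "schema gate: PASS"
  else
    echo "schema gate: FAIL" >&2
    return 1
  fi
}

# Usage:
#   dump_schema orcha_prod_clone > /tmp/clone.sql
#   dump_schema orcha_fresh      > /tmp/fresh.sql
#   schema_gate /tmp/clone.sql /tmp/fresh.sql
```

A raw textual diff like this is stricter than a canonicalized one, which makes it useful as a second opinion when a difference looks like noise.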

Step 6 — Ingestion smoke (programmatic)

Start the app pointed at the clone:

ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone clj -M:dev

At the REPL: (integrant.repl/go). In another terminal, list available fixtures and ingest a couple. Don't reference prod document IDs — these should be fresh test PDFs, since prod S3 objects aren't local.

ls test/fixtures/        # discover what's available
bb ingest test/fixtures/<picked-invoice-1>.pdf
bb ingest test/fixtures/<picked-invoice-2>.pdf

Assert each document reaches a terminal state (processed/failed) without errors. Watch the REPL logs for:

Step 7 — UI smoke (manual)

With the app still running against the clone, walk through:

Watch browser devtools and REPL logs for 500s.

Step 8 — Iterate on failure

If any step fails:

  1. Fix code or migration in the repo.
  2. bb db:load-clone — resets orcha_prod_clone from the cached dump file (~3 min, no re-clone of prod needed).
  3. Repeat from Step 4.
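The reset-and-retry loop above can be wrapped in a small function so each iteration is a single command. This sketch only chains commands already shown in this procedure; the function name is hypothetical, and Steps 6 and 7 still have to be run by hand afterwards.

```shell
# Reset the clone from the cached dump, then rerun Steps 4 and 5.
# Stops at the first failing command and returns non-zero.
reset_and_recheck() {
  bb db:load-clone &&
  ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone bb migrate migrate &&
  bb db:fresh &&
  bb db:schema-diff --a orcha_prod_clone --b orcha_fresh
}

# Usage, after fixing the code or migration:
#   reset_and_recheck && echo "gate passed, continue with Step 6"
```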

Step 9 — Cleanup

bb db:drop-clone
bb db:list-clones   # must be empty
bb db:list-clone-snapshots   # must be empty
rm dump/prod-<timestamp>.dump   # manually; contains PII

bb db:list-clones surfaces any leaked throwaway RDS instances (should not happen under normal conditions — the script has a shutdown hook — but run this as a belt-and-braces check). bb db:list-clone-snapshots surfaces any leaked manual source snapshots created by this workflow.
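The "must be empty" checks can be made to fail loudly instead of relying on a visual scan. The helper below is a sketch with a hypothetical name, and it assumes the list commands print nothing at all when no resources remain; if they print a header or a "none found" message, the check would need adjusting.

```shell
# Run a listing command and fail if it prints anything or exits non-zero.
must_be_empty() {
  out="$("$@")" || return 1
  if [ -n "$out" ]; then
    echo "leftover resources reported by: $*" >&2
    printf '%s\n' "$out" >&2
    return 1
  fi
}

# Usage:
#   must_be_empty bb db:list-clones &&
#   must_be_empty bb db:list-clone-snapshots
```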

What this does NOT cover

Troubleshooting

Clone restore hangs past 25 minutes. The AWS wait timeout may have been exceeded. Run bb db:list-clones: if the instance exists and is available, restart the script; if it is still creating, keep waiting (larger snapshots take longer).
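The decision above can be sketched as a status lookup plus a mapping. The describe-db-instances call is a standard AWS CLI command using the profile and region that appear elsewhere in this document. The function names and the "investigate" fallback for other statuses are assumptions added for illustration.

```shell
# Look up the status of one RDS instance by identifier.
clone_status() {
  aws rds describe-db-instances --profile orcha-prod --region eu-central-1 \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].DBInstanceStatus' --output text
}

# Map an instance status to the next action.
clone_next_action() {
  case "$1" in
    available) echo "restart-script" ;;
    creating)  echo "keep-waiting" ;;
    *)         echo "investigate" ;;  # assumption: not covered by the runbook
  esac
}

# Usage (<ID> comes from bb db:list-clones):
#   clone_next_action "$(clone_status <ID>)"
```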

SSM port-forward fails with "target not connected". Run aws sso login --profile orcha-prod and retry.

pg_dump fails with password authentication failed. The script fetches the master password from SSM (/v1-orcha/db-credentials), but a restored snapshot keeps the master password as it was at snapshot time. If the prod password was rotated between the snapshot and the restore, the SSM value no longer matches the clone's password. Resolutions:

pg_restore warnings about missing roles. Expected; the restore runs with --no-owner --no-privileges, so ownership and grants are not restored. Exit code 1 with warnings is treated as success by bb db:load-clone.
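The exit-code tolerance can be pictured as a thin wrapper around the restore command. This is a sketch of the idea only; how bb db:load-clone actually invokes pg_restore is not shown in this document, and the function name is hypothetical.

```shell
# Run a restore command; treat exit codes 0 and 1 as success and
# propagate anything else as a failure.
tolerant_restore() {
  code=0
  "$@" || code=$?
  case "$code" in
    0|1) return 0 ;;
    *)   echo "restore failed with exit code $code" >&2
         return "$code" ;;
  esac
}

# Usage (illustrative invocation, not the script's exact command line):
#   tolerant_restore pg_restore --no-owner --no-privileges \
#     -d orcha_prod_clone dump/prod-<timestamp>.dump
```

Because exit code 1 is accepted, the warnings themselves are worth skimming once to confirm they only concern roles.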

Schema diff shows differences between expected and actual that you think are fine. Review the diff carefully first, because apparent "noise" is often a subtle bug. Only if it is genuinely harmless, add a canonicalizer rule in scripts/schema_diff.clj.

bb db:list-clones shows a stray instance you don't recognize. Delete it:

aws rds delete-db-instance --profile orcha-prod --region eu-central-1 \
  --db-instance-identifier <ID> --skip-final-snapshot --delete-automated-backups