Pre-deploy validation procedure for large, schema-touching refactors. Restores a clone of prod Postgres locally, applies pending migrations, verifies the resulting schema matches a fresh-from-init baseline, and exercises ingestion and UI against the migrated clone.
Design spec: docs/superpowers/specs/2026-04-24-prod-clone-refactor-testing-design.md
Run before deploying any migration that:
NOT NULL a previously-nullable column.
bb migrate migrate against init.sql wouldn't catch data-shape issues.
Not needed for additive migrations (new columns/tables) that are covered by unit/integration tests and don't touch live data layout.
aws sso login --profile orcha-prod
bb dev:up && bb dev:seed
pg_dump, pg_restore, psql).
The host client major version MUST be ≥ the docker-compose Postgres
server version (currently pgvector/pgvector:pg18 per docker-compose.yml).
Verify both:
pg_dump --version # → 18.x
docker-compose exec -T postgres psql -U postgres -tAc "SHOW server_version;" # → 18.x
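The two version checks above can be folded into a single guard. This is a minimal sketch, not part of the repo: `pg_major` and `client_ok` are hypothetical helper names, and the parsing assumes version strings shaped like `pg_dump (PostgreSQL) 18.1` and `18.1 (Debian ...)`.

```shell
# Hypothetical helper: extract the major version from a Postgres version string.
pg_major() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1 | cut -d. -f1
}

# Hypothetical helper: succeed only if the client major is >= the server major.
# Runs in a subshell so the temporary variables do not leak.
client_ok() (
  client=$(pg_major "$1")
  server=$(pg_major "$2")
  [ -n "$client" ] && [ -n "$server" ] && [ "$client" -ge "$server" ]
)

# Example invocation against the real tools (commented out so that sourcing
# this snippet has no side effects):
# client_ok "$(pg_dump --version)" \
#   "$(docker-compose exec -T postgres psql -U postgres -tAc 'SHOW server_version;')" \
#   || echo "host pg client is older than the server" >&2
```

Comparing only the major version mirrors the rule stated above; minor versions are ignored on purpose.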
Mismatch will cause pg_restore to fail loading the prod-shaped dump.
session-manager-plugin --version must succeed). Without it, the SSM port-forward step silently hangs.
Unattended, ~30–45 min (longer if a fresh snapshot needs to be taken).
bb db:clone-prod
The script first calls aws sts get-caller-identity and aborts unless the
profile resolves to the prod account. It then prints the session header and
prompts Continue? [y/N] before any AWS mutation. Confirm with y to
proceed, or pass --yes/-y to skip the prompt for unattended runs.
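The guard-then-confirm behavior described above could look roughly like the following. This is an illustrative sketch only and not the script's actual code: `account_matches` and `confirm` are hypothetical helpers, `PROD_ACCOUNT_ID` is an assumed variable, and accepting an uppercase `Y` is an assumption.

```shell
# Hypothetical helper: succeed only if the resolved account equals the expected one.
account_matches() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical helper: prompt on stderr and succeed only on an explicit y/Y.
# Anything else, including an empty reply or EOF, counts as "no".
confirm() {
  printf 'Continue? [y/N] ' >&2
  read -r reply || return 1
  case "$reply" in
    y|Y) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example wiring (commented out; PROD_ACCOUNT_ID is an assumed variable):
# actual=$(aws sts get-caller-identity --profile orcha-prod --query Account --output text)
# account_matches "$actual" "$PROD_ACCOUNT_ID" || { echo "not the prod account" >&2; exit 1; }
# confirm || exit 1
```

Defaulting to "no" on anything other than an explicit yes matches the `[y/N]` convention in the prompt.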
Produces dump/prod-<timestamp>.dump. Watch output for the throwaway clone
identifier — if the script dies unexpectedly, that's what you need to delete
manually (see Step 9).
If the script creates a fresh manual source snapshot, it tags it as temporary clone-test data and deletes it during cleanup. If cleanup fails, Step 9 shows how to list and remove the leftover snapshot.
Flags:
--fresh-snapshot: skip snapshot reuse; always create a new one.
--freshness-hours N: reuse snapshots up to N hours old (default 24).
--yes, -y: skip the interactive confirmation prompt.
--skip-restore: dry-run; print plan and exit before any AWS calls.
~2–5 min.
bb db:load-clone
Creates orcha_prod_clone in local docker-compose Postgres, restored from
the newest dump/*.dump file.
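To preview which file would be picked before loading, something like the sketch below may help. It assumes "newest" means most recently modified; the actual script may instead order by the timestamp in the filename. `newest_dump` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified *.dump file in a
# directory (default: dump). Prints nothing if no dump exists.
newest_dump() {
  # ls -t sorts by modification time, newest first.
  ls -t "${1:-dump}"/*.dump 2>/dev/null | head -n 1
}

# Example: newest_dump dump
```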
ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone clj -M:dev
At the REPL: (integrant.repl/go) — must succeed. Confirms the clone is
usable and that current master is healthy against prod schema. Exit the REPL.
ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone bb migrate migrate
All pending migrations apply cleanly against prod data.
bb db:fresh
bb db:schema-diff --a orcha_prod_clone --b orcha_fresh
bb db:fresh (re)creates orcha_fresh at HEAD schema. bb db:schema-diff
exits 0 on empty diff. Any diff means the migration produces a schema that
diverges from what init.sql + migrations produce — i.e., a migration bug.
Iterate until exit 0.
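Since the check relies on the exit code, the iterate-until-clean loop can be wrapped in a small gate. This is a sketch under the assumption that any non-zero exit means a diff or an error; `schema_gate` is a hypothetical helper, not a repo task.

```shell
# Hypothetical helper: run the given command, report the outcome, and
# propagate its exit status unchanged.
schema_gate() {
  if "$@"; then
    echo "schema match"
  else
    status=$?
    echo "schema diverges (exit $status)" >&2
    return "$status"
  fi
}

# Example (commented out so that sourcing this snippet runs nothing):
# bb db:fresh && schema_gate bb db:schema-diff --a orcha_prod_clone --b orcha_fresh
```

Passing the command as arguments keeps the gate reusable for any check that signals success through its exit code.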
Start the app pointed at the clone:
ORCHA_LOCAL_DB_NAME_OVERRIDE=orcha_prod_clone clj -M:dev
At the REPL: (integrant.repl/go). In another terminal, list available
fixtures and ingest a couple. Don't reference prod document IDs — these
should be fresh test PDFs, since prod S3 objects aren't local.
ls test/fixtures/ # discover what's available
bb ingest test/fixtures/<picked-invoice-1>.pdf
bb ingest test/fixtures/<picked-invoice-2>.pdf
Assert each document reaches a terminal state (processed/failed) without errors. Watch the REPL logs for:
legal_entity_id, etc.)
:legal-entity-id vs :tenant-id)
With the app still running against the clone, walk through:
/tenants): loads, shows renamed entities
/organizations): loads
Watch browser devtools and REPL logs for 500s.
If any step fails:
bb db:load-clone: resets orcha_prod_clone from the cached dump
file (~3 min, no re-clone of prod needed).
bb db:drop-clone
bb db:list-clones # must be empty
bb db:list-clone-snapshots # must be empty
rm dump/prod-<timestamp>.dump # manually; contains PII
bb db:list-clones surfaces any leaked throwaway RDS instances (should not
happen under normal conditions — the script has a shutdown hook — but run
this as a belt-and-braces check).
bb db:list-clone-snapshots surfaces any leaked manual source snapshots
created by this workflow.
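Because the dump contains PII, it may be worth confirming that no prod dump is left on disk once cleanup is done. This is a minimal sketch that assumes dumps live directly under dump/ and follow the prod-<timestamp>.dump naming; `leftover_dumps` and `dumps_clean` are hypothetical helpers.

```shell
# Hypothetical helper: list prod dump files still present in a directory
# (default: dump).
leftover_dumps() {
  find "${1:-dump}" -maxdepth 1 -type f -name 'prod-*.dump' 2>/dev/null
}

# Hypothetical helper: succeed only when no prod dump remains.
dumps_clean() {
  [ -z "$(leftover_dumps "${1:-dump}")" ]
}

# Example: dumps_clean dump || { echo "PII dumps still on disk:" >&2; leftover_dumps dump >&2; }
```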
Clone restore hangs past 25 minutes. The AWS wait timeout may have been
exceeded. Run bb db:list-clones: if the instance exists and is available,
restart the script; if it's creating, keep waiting (larger snapshots take
longer).
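The decision above can be written down as a small lookup on the instance status. This is a sketch, assuming the status strings are the ones reported by RDS (`available`, `creating`); `clone_next_action` is a hypothetical helper, and querying the AWS CLI directly is an alternative to bb db:list-clones rather than what the script does.

```shell
# Hypothetical helper: map an RDS instance status to the action described above.
clone_next_action() {
  case "$1" in
    available) echo "restart the script" ;;
    creating)  echo "keep waiting" ;;
    *)         echo "investigate: unexpected status '$1'" ;;
  esac
}

# Example (commented out; <ID> is the throwaway clone identifier):
# status=$(aws rds describe-db-instances --profile orcha-prod --region eu-central-1 \
#   --db-instance-identifier <ID> \
#   --query 'DBInstances[0].DBInstanceStatus' --output text)
# clone_next_action "$status"
```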
SSM port-forward fails with "target not connected". Run
aws sso login --profile orcha-prod and retry.
pg_dump fails with password authentication failed. The script
fetches the master password from SSM (/v1-orcha/db-credentials), and
restored snapshots inherit the master password as it was AT SNAPSHOT TIME.
If the prod password rotated between the snapshot and the restore, those
won't match. Resolutions:
--fresh-snapshot so the snapshot reflects the current password.
--master-user-password <known> to
restore-db-instance-from-db-snapshot and use that known password
(script enhancement; not currently supported).
pg_restore warnings about missing roles. Expected; --no-owner --no-privileges skips ownership. Exit code 1 with warnings is treated as success by bb db:load-clone.
Schema diff shows expected vs actual differences you think are fine.
Add a canonicalizer rule in scripts/schema_diff.clj. Review the diff
carefully — "noise" is often a subtle bug.
bb db:list-clones shows a stray instance you don't recognize.
Delete it:
aws rds delete-db-instance --profile orcha-prod --region eu-central-1 \
--db-instance-identifier <ID> --skip-final-snapshot --delete-automated-backups