For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Build a /regression-test skill that dispatches parallel inspector agents to compare ingestion results before and after code changes.
Architecture: A Claude Code skill orchestrates parallel subagents (one per document). Each subagent fetches the document from prod if needed, captures baseline structured data, triggers reingestion, polls for completion, compares results, and writes a report. PDF visual inspection is used only when the comparison is ambiguous.
Tech Stack: Claude Code skills (SKILL.md + prompt template), Bash (psql, bb), clojure-eval (REPL), Write tool (reports)
Design doc: docs/plans/2026-03-09-ingestion-regression-testing-design.md
Add docs/regression-reports/ to .gitignore
Files:
.gitignore
Step 1: Add the gitignore entry
Add this line to .gitignore:
docs/regression-reports/
Step 2: Create the directory
mkdir -p docs/regression-reports
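Steps 1 and 2 can be made safe to re-run. The Bash sketch below is one way to do that; the function name ensure_gitignore_entry is made up for illustration and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append ENTRY to FILE unless that exact line is already there.
# Usage: ensure_gitignore_entry FILE ENTRY
ensure_gitignore_entry() {
  local file="$1" entry="$2"
  touch "$file"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  if grep -qxF -- "$entry" "$file"; then
    return 0
  fi
  # If the existing content does not end with a newline, add one first
  # so the new entry is not glued onto the last line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$entry" >> "$file"
}

# Example, run from the repository root:
#   ensure_gitignore_entry .gitignore 'docs/regression-reports/'
#   mkdir -p docs/regression-reports
```

Git does not track empty directories, and this one is ignored anyway, so the mkdir only affects the local checkout. That is why Step 3 commits .gitignore alone.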
Step 3: Commit
git add .gitignore
git commit -m "chore: gitignore regression report output directory"
Fix the reingest-doc skill (broken 4-arity call)
The current /reingest-doc skill calls (erp.ingestion/queue-for-ingestion! db-pool aws doc-id nil) with four arguments, which does not match the function's 3-arity signature [db-pool aws-config opts]. The same broken call appears in the printed instructions of the debug_fetch_document.clj script.
The actual reingestion logic lives in src/com/getorcha/erp/http/documents/view/shared.clj:854-888 — it inserts an ingestion record directly and sends the ingestion ID to SQS.
Add a REPL-callable requeue-document! function to com.getorcha.erp.ingestion that does this without HTTP dependencies.
Files:
src/com/getorcha/erp/ingestion.clj
.claude/skills/reingest-doc/SKILL.md
scripts/debug_fetch_document.clj (printed instructions at line 142)
Step 1: Write the requeue-document! function
Add to src/com/getorcha/erp/ingestion.clj, after queue-for-ingestion!:
(defn requeue-document!
  "Re-queue an existing document for ingestion. For use from REPL or scripts.
  Unlike `queue-for-ingestion!`, this doesn't create or upload the document;
  it assumes the document already exists in DB and S3. Creates a new ingestion
  record and queues it for processing.
  Returns map with :ingestion/id and :document/id, or nil if document not found.
  Returns :skipped? true if document already has an in-progress ingestion."
  [db-pool aws-config document-id]
  (let [document (db.sql/execute-one!
                  db-pool
                  {:select [:id
                            [{:select [:status]
                              :from [:ingestion]
                              :where [:and
                                      [:= :document-id :document.id]
                                      [:= :status (db.sql/->cast :in-progress :ingestion-status)]]
                              :limit 1}
                             :in-progress-status]]
                   :from [:document]
                   :where [:= :id document-id]})]
    (when document
      (if (:in-progress-status document)
        {:document/id (:document/id document)
         :skipped? true}
        (let [;; Look up uploaded-by from the latest ingestion
              uploaded-by (:ingestion/uploaded-by
                           (db.sql/execute-one!
                            db-pool
                            {:select [:uploaded-by]
                             :from [:ingestion]
                             :where [:and
                                     [:= :document-id document-id]
                                     [:not= :uploaded-by nil]]
                             :order-by [[:created-at :desc]]
                             :limit 1}
                            {:builder-fn ingestion-builder-fn}))
              doc-source-id (when-not uploaded-by
                              (:ingestion/doc-source-id
                               (db.sql/execute-one!
                                db-pool
                                {:select [:doc-source-id]
                                 :from [:ingestion]
                                 :where [:and
                                         [:= :document-id document-id]
                                         [:not= :doc-source-id nil]]
                                 :order-by [[:created-at :desc]]
                                 :limit 1}
                                {:builder-fn ingestion-builder-fn})))
              ingestion (db.sql/execute-one!
                         db-pool
                         {:insert-into :ingestion
                          :values [(cond-> {:document-id document-id}
                                     uploaded-by (assoc :uploaded-by uploaded-by)
                                     doc-source-id (assoc :doc-source-id doc-source-id))]
                          :returning [:id]}
                         {:builder-fn ingestion-builder-fn})
              sqs-client (get-in aws-config [:clients :sqs])
              queue-url (get-in aws-config [:queue-urls :ingestion])]
          (aws/send-message! sqs-client queue-url (str (:ingestion/id ingestion)))
          (log/info "Document requeued for ingestion"
                    {:document/id document-id
                     :ingestion/id (:ingestion/id ingestion)})
          {:ingestion/id (:ingestion/id ingestion)
           :document/id document-id})))))
Step 2: Add a repl/aws helper
Add to repl/com/getorcha/repl.clj:
(defn aws
  []
  (:com.getorcha.aws/state system))
This makes it consistent with (repl/db-pool) and fixes the pattern used across skills and scripts.
Step 3: Update the reingest-doc skill
Update .claude/skills/reingest-doc/SKILL.md to use the new function:
(require '[com.getorcha.erp.ingestion :as erp.ingestion])
(erp.ingestion/requeue-document! (repl/db-pool) (repl/aws) #uuid "<doc-id>")
Step 4: Update the debug_fetch_document.clj printed instructions
Update scripts/debug_fetch_document.clj lines 141-142 to print the corrected call.
Step 5: Verify it compiles
clj-nrepl-eval -p <PORT> "(require '[com.getorcha.erp.ingestion :as erp.ingestion] :reload)"
Step 6: Lint
clj-kondo --lint src/com/getorcha/erp/ingestion.clj repl/com/getorcha/repl.clj
Step 7: Commit
git add src/com/getorcha/erp/ingestion.clj repl/com/getorcha/repl.clj .claude/skills/reingest-doc/SKILL.md scripts/debug_fetch_document.clj
git commit -m "feat: add requeue-document! for REPL reingestion, fix broken 4-arity call"
This is the prompt each subagent receives. It must be self-contained — the subagent knows nothing about the codebase.
Files:
.claude/skills/regression-test/inspector-prompt.md
Step 1: Write the inspector prompt
The prompt is a template with placeholders that the orchestrator skill fills in. The inspector agent needs instructions for:
Ensure document exists locally — check local DB via psql, if missing run bb debug:fetch-document --force <doc-id>. If auth error, run aws sso login --profile orcha-prod and retry.
Capture baseline — query the latest completed ingestion's structured_data and commit_sha via psql:
SELECT structured_data, commit_sha
FROM ingestion
WHERE document_id = '<doc-id>' AND status = 'completed'
ORDER BY created_at DESC LIMIT 1
Trigger reingestion — use clj-nrepl-eval to call requeue-document!:
clj-nrepl-eval -p <PORT> "(require '[com.getorcha.erp.ingestion :as erp.ingestion]) (erp.ingestion/requeue-document! (repl/db-pool) (repl/aws) #uuid \"<doc-id>\")"
Parse the returned map to extract :ingestion/id. If :skipped? true, wait for the existing in-progress ingestion instead.
Poll for completion — poll via clj-nrepl-eval every 10 seconds, up to 3 minutes:
clj-nrepl-eval -p <PORT> "(require '[com.getorcha.db.sql :as db.sql]) (db.sql/execute-one! (repl/db-pool) {:select [:status] :from [:ingestion] :where [:= :id #uuid \"<ingestion-id>\"]})"
Wait until status is completed or failed. If failed, report failure and stop.
Fetch new structured data — query the new ingestion's structured_data via psql:
SELECT structured_data FROM ingestion WHERE id = '<ingestion-id>'
Compare — deep-diff the entire structured_data objects. Exclude document matching/clusters (fields like cluster_id, matching_status on the document table — these are not in structured_data). Only report differences. Assess verdict: Identical, Improved, Regressed, Mixed, or Unclear.
PDF visual inspection (only if verdict is Unclear) — download and read the PDF:
bb dev:aws-cli s3 cp s3://v1-orcha-global-storage-local-stack/documents/<doc-id>.pdf /tmp/<doc-id>.pdf
Then use Read tool on /tmp/<doc-id>.pdf. Use the visual content as ground truth to resolve the ambiguity.
Get current local commit SHA:
git rev-parse HEAD
Write report: use the Write tool to create docs/regression-reports/<doc-id>-<commit-short>.md, following the report format specified in the design doc.
Return summary — end with a one-line summary: <doc-id> | <filename> | <verdict>
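The polling step above (every 10 seconds, up to 3 minutes, stop on completed or failed) could be written as a small Bash loop. In this sketch, poll_until_done and ingestion_status are hypothetical names. Matching the status by substring assumes the REPL prints the status value somewhere in its output, which the plan does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a status command repeatedly until it reports a
# terminal state or the time budget is used up.
# Usage: poll_until_done INTERVAL_SECONDS TIMEOUT_SECONDS COMMAND [ARGS...]
# Prints completed, failed, or timeout; returns 0, 1, or 2 respectively.
poll_until_done() {
  local interval="$1" timeout="$2"
  shift 2
  local elapsed=0 status
  while :; do
    # A failing status command is treated as "not done yet".
    status="$("$@")" || status=""
    case "$status" in
      *completed*) echo completed; return 0 ;;
      *failed*)    echo failed;    return 1 ;;
    esac
    # elapsed counts sleep time only, not the time spent in the status command.
    if [ "$elapsed" -ge "$timeout" ]; then
      echo timeout
      return 2
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example (ingestion_status is a hypothetical wrapper around the
# clj-nrepl-eval status query shown in the polling step):
#   poll_until_done 10 180 ingestion_status "$INGESTION_ID"
```

Distinct return codes let the inspector tell a failed ingestion apart from one that never finished. The plan defines what to do on failed but not on a timeout, so handling for code 2 is left to the prompt author.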
Important notes to include in the prompt:
<PORT>: use it for all clj-nrepl-eval calls
document.file_path in the DB: not all documents are PDFs
Step 2: Commit
git add .claude/skills/regression-test/inspector-prompt.md
git commit -m "feat: add inspector prompt template for regression testing"
Files:
.claude/skills/regression-test/SKILL.md
Step 1: Write the skill
The SKILL.md is what gets loaded when the user runs /regression-test. It should:
Parse document IDs from the arguments. UUIDs are detected by format (8-4-4-4-12 hex pattern). The user may provide them inline or describe them in natural language (e.g., "all invoices from commit fcb985b..."). If natural language, query the DB to resolve:
psql -h localhost -U postgres -d orcha -c "SELECT DISTINCT d.id FROM document d JOIN ingestion i ON i.document_id = d.id WHERE d.type = 'invoice' AND i.commit_sha LIKE '<prefix>%' AND i.status = 'completed'"
Discover nREPL port — run clj-nrepl-eval --discover-ports and select the appropriate port. This port is passed to all inspector agents.
Dispatch one Agent per document — all in parallel, using the inspector-prompt.md template with placeholders filled in (document ID, nREPL port).
Collect results — after all agents complete, print a summary table to the terminal.
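The UUID detection described in the first item can be sketched in Bash. extract_uuids is a hypothetical helper, not an existing script. It only covers IDs given inline; natural-language descriptions still need the database query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every UUID (8-4-4-4-12 hex groups) found in the
# arguments, one per line, lowercased and de-duplicated in first-seen order.
# Prints nothing when the arguments contain no UUID.
extract_uuids() {
  printf '%s\n' "$*" \
    | grep -oiE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
    | tr 'A-F' 'a-f' \
    | awk '!seen[$0]++'
}

# Example:
#   extract_uuids "compare 123e4567-e89b-12d3-a456-426614174000 please"
```

Empty output is the signal to fall back to resolving the request through the database.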
Step 2: Commit
git add .claude/skills/regression-test/SKILL.md
git commit -m "feat: add /regression-test orchestrator skill"
Files: none (manual verification)
Step 1: Pick a test document
Use one of the documents already in the local DB:
psql -h localhost -U postgres -d orcha -c "SELECT d.id, d.file_original_name, i.commit_sha FROM document d JOIN ingestion i ON i.document_id = d.id WHERE d.type = 'invoice' AND i.status = 'completed' ORDER BY i.created_at DESC LIMIT 3"
Step 2: Run the skill
/regression-test <doc-id>
Step 3: Verify
docs/regression-reports/<doc-id>-<commit>.md
Step 4: Run with 2-3 documents to verify parallel dispatch
/regression-test <doc-id-1> <doc-id-2> <doc-id-3>
Verify all agents run in parallel and all reports are created.
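One quick way to confirm that every report was written is to check for the expected file names. This sketch assumes the commit part of the file name is the output of git rev-parse --short HEAD, which the plan does not spell out. missing_reports is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each document ID that has no report file
# named <doc-id>-<commit>.md in DIR.
# Usage: missing_reports DIR COMMIT DOC_ID...
missing_reports() {
  local dir="$1" commit="$2" id
  shift 2
  for id in "$@"; do
    if [ ! -f "$dir/$id-$commit.md" ]; then
      printf '%s\n' "$id"
    fi
  done
}

# Example, run from the repository root (the short SHA form is an assumption):
#   missing_reports docs/regression-reports "$(git rev-parse --short HEAD)" "$DOC_1" "$DOC_2" "$DOC_3"
```

Empty output means a report exists for every ID. It does not show whether the agents actually ran in parallel, which still has to be observed during the run.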