Ingestion Regression Testing — Design

Problem

When making changes to the ingestion pipeline (transcription, classification, extraction, post-processing), we need a way to verify that existing documents don't regress in quality. Currently there's no automated way to compare ingestion results before and after a code change.

Solution

A Claude Code skill (/regression-test) that takes a list of document IDs (or natural language description), dispatches parallel inspector agents — one per document — that each:

  1. Ensure the document exists locally (fetching from prod if needed)
  2. Capture the baseline structured data from the existing (prod) ingestion
  3. Trigger a reingestion with the current local code
  4. Wait for completion
  5. Compare the full structured data (old vs new)
  6. If the comparison is ambiguous, visually inspect the PDF to determine ground truth
  7. Write an individual report to disk

Architecture

User: "/regression-test <document IDs or description>"
         │
         ▼
┌──────────────────────────┐
│   Orchestrator (skill)   │
│                          │
│  1. Parse document IDs   │
│  2. Dispatch inspectors  │──── one Agent per document, parallel
│  3. Wait for completion  │
│  4. Print summary table  │──── terminal only, not persisted
└──────────────────────────┘
         │
         ▼  (parallel, one per document)
┌──────────────────────────┐
│   Inspector Agent        │
│                          │
│  1. Check local DB       │──── psql
│  2. If missing, fetch    │──── bb debug:fetch-document --force <id>
│  3. Capture prod baseline│──── psql (structured_data + commit_sha
│     from latest completed│     from latest completed ingestion)
│     ingestion            │
│  4. Trigger reingestion  │──── clojure-eval REPL
│  5. Poll until complete  │──── clojure-eval (SQL on ingestion table)
│  6. Fetch new structured │──── clojure-eval (SQL)
│     data                 │
│  7. Compare old vs new   │
│  8. If ambiguous, read   │──── Read tool on PDF from local S3
│     the PDF visually     │
│  9. Write report to disk │──── docs/regression-reports/<doc-id>-<commit>.md
└──────────────────────────┘

Inspector Details

Document Fetching

Follows the /debug-doc pattern:

  1. Check local DB: SELECT id FROM document WHERE id = '<doc-id>'
  2. If missing: bb debug:fetch-document --force <doc-id>

Baseline Capture

Before reingesting, capture from the latest completed ingestion:

SELECT structured_data, commit_sha
FROM ingestion
WHERE document_id = '<doc-id>' AND status = 'completed'
ORDER BY created_at DESC
LIMIT 1

Reingestion

Trigger via clojure-eval REPL:

(require '[com.getorcha.erp.ingestion :as erp.ingestion])
(erp.ingestion/queue-for-ingestion! (repl/db-pool) (repl/aws) #uuid "<doc-id>" nil)

Polling for Completion

Poll the ingestion status via clojure-eval until it leaves in-progress:

(db.sql/execute-one! (repl/db-pool)
  {:select [:status]
   :from   [:ingestion]
   :where  [:= :id #uuid "<ingestion-id>"]})

Poll interval: a few seconds. Timeout: reasonable upper bound (e.g., 2-3 minutes).

Comparison Logic

Compare the entire structured_data JSON object — old vs new. This includes all extraction outputs and all post-processing outputs (account matching, fraud detection, validation, tax issues, etc.). This is a regression test on ingestion quality, not just extraction quality.

Excluded from comparison: document matching / clusters. The inspector prompt explicitly calls this out to avoid false regression reports.

Deep-diff the full objects. Categorize differences by severity. Only report fields that differ — identical fields are omitted from the report.

Verdicts

PDF Visual Inspection

Only triggered when the verdict is "Unclear" — significant discrepancies between old and new structured data with no clear winner.

The inspector downloads the PDF from local S3 and reads it visually (Claude vision capabilities) to determine ground truth, then revises the verdict.

bb dev:aws-cli s3 cp s3://v1-orcha-global-storage-local-stack/documents/<doc-id>.pdf /tmp/<doc-id>.pdf

Then uses the Read tool on the PDF file.

Report Format

File naming

docs/regression-reports/<doc-id>-<new-commit-sha-short>.md

Examples:

Glob by document: docs/regression-reports/019c0fdc-* Glob by commit: docs/regression-reports/*-abc1234.md

Identical result (minimal)

# Regression: <doc-id> — <new-commit-short>
- **Verdict**: Identical
- **Prod commit**: `<sha>`
- **New commit**: `<sha>`

No meaningful differences in structured data.

Non-identical result (full narrative)

# Regression: <doc-id> — <new-commit-short>
- **Filename**: <original filename>
- **Verdict**: Improved / Regressed / Mixed / Unclear
- **Prod commit**: `<sha>`
- **New commit**: `<sha>`

## Changes
<narrative description of differences only, organized by significance>

## PDF Inspection
<only present if verdict required visual inspection>
<what the agent saw and how it resolved the ambiguity>

Orchestrator terminal summary

Not persisted. Printed after all inspectors complete:

| Document | Filename | Verdict   |
|----------|----------|-----------|
| 019c...  | inv.pdf  | Regressed |
| 019d...  | bill.pdf | Improved  |

X documents tested. Y improved, Z regressed, W identical, ...

Scope

Implementation Artifacts