Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Build a REPL-driven testing framework that runs real PDFs through the full ingestion pipeline and compares output against curated golden-file snapshots using semantic diff.
Architecture: Two Clojure namespaces in dev/: a semantic diff library (dev.getorcha.correctness.diff) and a runner (dev.getorcha.correctness). Test cases live in dev/correctness/ with committed PDFs and golden EDN files. Tests run against the local dev system via (repl/db-pool) and (repl/aws).
Tech Stack: Clojure, next.jdbc (DB), AWS S3 (LocalStack), Malli (schema), clojure.data (diff foundation), clojure.test (for diff unit tests)
Design spec: docs/superpowers/specs/2026-04-18-pipeline-correctness-testing-design.md
dev/
  correctness/
    manifest.edn          # test case definitions
    pdfs/                 # test PDFs (committed, ~10MB)
    golden/               # expected structured_data EDN files
    master-data/          # optional chart of accounts, cost centers, etc.
  dev/getorcha/
    correctness.clj       # runner: run!, run-all!, update-golden!, etc.
    correctness/
      diff.clj            # semantic diff algorithm
test/com/getorcha/
  correctness/
    diff_test.clj         # unit tests for the diff algorithm
Key existing files referenced:
repl/com/getorcha/repl.clj: (repl/db-pool), (repl/aws) accessors
src/com/getorcha/app/ingestion.clj: queue-for-ingestion! (line 57)
src/com/getorcha/schema/invoice/structured_data.clj: Malli schema for invoice structured data
src/com/getorcha/schema/common.clj: Issuer, Recipient, etc.
test/com/getorcha/test/notification_helpers.clj: create-legal-entity! pattern (line 13)
Files:
dev/dev/getorcha/correctness/diff.clj
test/com/getorcha/correctness/diff_test.clj
This is the core of the framework. It compares two structured_data maps and classifies differences as :trivial, :material, or :ignored.
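The per-field classification rolls up into a single verdict for the whole document. As a minimal sketch of that roll-up (the function name `verdict` and the sample entries are illustrative assumptions, not part of the plan's code), any material entry wins, then any trivial entry, otherwise the maps count as identical:

```clojure
(defn verdict
  "Roll categorized diff entries up into a single verdict keyword.
   Hypothetical helper for illustration only."
  [diffs]
  (let [categories (set (map :category diffs))]
    (cond
      (categories :material) :material-diff
      (categories :trivial)  :trivial-only
      :else                  :identical)))

;; No differences at all
(verdict [])

;; Only a small confidence drift
(verdict [{:path [:line-items 0 :debit-account :confidence] :category :trivial}])

;; One material entry outweighs any number of trivial ones
(verdict [{:path [:invoice-number] :category :material}
          {:path [:line-items 0 :debit-account :confidence] :category :trivial}])
```

Ignored fields never produce an entry, so they cannot influence the verdict.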
Create test/com/getorcha/correctness/diff_test.clj:
(ns com.getorcha.correctness.diff-test
(:require [clojure.test :refer [deftest is testing]]
[dev.getorcha.correctness.diff :as diff]))
(deftest identical-maps-test
(testing "identical maps return :identical verdict"
(let [data {:invoice-number "INV-001"
:total 100.0
:issuer {:name "Test GmbH" :vat-id "DE123456789"}}
result (diff/compare-structured-data data data)]
(is (= :identical (:verdict result)))
(is (empty? (:material result)))
(is (empty? (:trivial result))))))
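(deftest number-rounding-trivial-test
(testing "numeric differences within tolerance are treated as equal"
(let [expected {:total 100.0 :subtotal 84.03}
;; 100.0 and 100.00 read as the same double, so use a value that
;; actually differs but stays inside :number-tolerance (0.005).
actual {:total 100.004 :subtotal 84.03}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result))))))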
(deftest material-value-diff-test
(testing "different invoice numbers are material"
(let [expected {:invoice-number "INV-001" :total 100.0}
actual {:invoice-number "INV-002" :total 100.0}
result (diff/compare-structured-data expected actual)]
(is (= :material-diff (:verdict result)))
(is (= 1 (count (:material result))))
(is (= [:invoice-number] (:path (first (:material result))))))))
(deftest reasoning-ignored-test
(testing "reasoning fields are ignored"
(let [expected {:line-items [{:description "Widget"
:amount 100.0
:debit-account {:number "4800"
:confidence 0.9
:reasoning "Because widgets"}}]}
actual {:line-items [{:description "Widget"
:amount 100.0
:debit-account {:number "4800"
:confidence 0.9
:reasoning "Different reason"}}]}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result))))))
(deftest confidence-tolerance-test
(testing "small confidence changes are trivial"
(let [expected {:line-items [{:description "Widget"
:amount 100.0
:debit-account {:number "4800"
:confidence 0.90
:reasoning nil}}]}
actual {:line-items [{:description "Widget"
:amount 100.0
:debit-account {:number "4800"
:confidence 0.82
:reasoning nil}}]}
result (diff/compare-structured-data expected actual)]
(is (= :trivial-only (:verdict result)))
(is (= 1 (count (:trivial result))))))
(testing "large confidence changes are material"
(let [expected {:line-items [{:description "Widget"
:amount 100.0
:debit-account {:number "4800"
:confidence 0.90
:reasoning nil}}]}
actual {:line-items [{:description "Widget"
:amount 100.0
:debit-account {:number "4800"
:confidence 0.50
:reasoning nil}}]}
result (diff/compare-structured-data expected actual)]
(is (= :material-diff (:verdict result))))))
(deftest nil-vs-absent-test
(testing "nil value and absent key are equivalent"
(let [expected {:invoice-number "INV-001" :discount nil}
actual {:invoice-number "INV-001"}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result))))))
(deftest fuzzy-string-test
(testing "issuer name is compared case-insensitively with whitespace normalization"
(let [expected {:issuer {:name "Müller GmbH" :vat-id "DE123"}}
actual {:issuer {:name "müller GmbH" :vat-id "DE123"}}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result))))))
(deftest line-items-sorted-before-compare-test
(testing "line items are sorted by description before comparing"
(let [expected {:line-items [{:description "Alpha" :amount 10.0}
{:description "Beta" :amount 20.0}]}
actual {:line-items [{:description "Beta" :amount 20.0}
{:description "Alpha" :amount 10.0}]}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result))))))
(deftest validation-results-status-only-test
(testing "validation results compare status only, ignore message"
(let [expected {:validation-results {:line-item-math {:status "pass" :message "All good"}}}
actual {:validation-results {:line-item-math {:status "pass" :message "Checks out"}}}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result)))))
(testing "validation status change is material"
(let [expected {:validation-results {:line-item-math {:status "pass"}}}
actual {:validation-results {:line-item-math {:status "error" :message "Bad math"}}}
result (diff/compare-structured-data expected actual)]
(is (= :material-diff (:verdict result))))))
(deftest fraud-flags-type-severity-only-test
(testing "fraud flags compare type+severity, ignore message"
(let [expected {:fraud-flags [{:rule-id :ef1-01 :type :bank-account-mismatch
:severity :warning :message "Old msg"}]}
actual {:fraud-flags [{:rule-id :ef1-01 :type :bank-account-mismatch
:severity :warning :message "New msg"}]}
result (diff/compare-structured-data expected actual)]
(is (= :identical (:verdict result))))))
(deftest mixed-diffs-test
(testing "mix of trivial and material diffs"
(let [expected {:invoice-number "INV-001"
:total 100.0
:line-items [{:description "Widget"
:amount 50.0
:debit-account {:number "4800"
:confidence 0.9
:reasoning nil}}]}
actual {:invoice-number "INV-002" ;; material
:total 100.00 ;; same double as 100.0, so no diff is reported
:line-items [{:description "Widget"
:amount 50.0
:debit-account {:number "4800"
:confidence 0.82 ;; trivial (tolerance)
:reasoning nil}}]}
result (diff/compare-structured-data expected actual)]
(is (= :material-diff (:verdict result)))
(is (= 1 (count (:material result))))
(is (= 1 (count (:trivial result)))))))
cd orcha && clj -X:test:silent :nses '[com.getorcha.correctness.diff-test]'
Expected: compilation error — dev.getorcha.correctness.diff namespace not found.
Create dev/dev/getorcha/correctness/diff.clj:
(ns dev.getorcha.correctness.diff
"Semantic diff for structured_data maps.
Compares two structured_data maps field-by-field, classifying
differences as :material, :trivial, or :ignored based on field
type and configurable tolerances."
(:require [clojure.string :as str]))
(def ^:private default-config
"Diff configuration. Controls how fields are compared."
{;; Absolute tolerance for numeric comparison (amounts, quantities)
:number-tolerance 0.005
;; Absolute tolerance for confidence scores
:confidence-tolerance 0.15
;; Keys whose values are ignored entirely
:ignored-keys #{:reasoning :match-reasoning :suggestion}
;; Keys compared case-insensitively with whitespace normalization
:fuzzy-string-keys #{:name :address}
;; For vectors of maps: which key to sort by before comparing
:sort-keys {:line-items :description
:fraud-flags :type
:tax-issues :type
:compliance-statements :type
:prepayments :description
:tax-rate-breakdowns :rate
:breakdown-items :description}
;; Sub-maps where only specific keys matter
:partial-compare-keys {:validation-results #{:status}
:fraud-flags #{:rule-id :type :severity}}})
(defn ^:private normalize-string
"Normalize string for fuzzy comparison: lowercase, collapse whitespace, trim."
[s]
(when s
(-> s str/trim (str/replace #"\s+" " ") str/lower-case)))
(defn ^:private numbers-equal?
"Compare two numbers with tolerance."
[a b tolerance]
(and (number? a) (number? b)
(<= (abs (- (double a) (double b))) tolerance)))
(defn ^:private nil-equivalent?
"True if both values are nil-equivalent (nil, absent, empty string)."
[a b]
(let [nil-ish? #(or (nil? %) (and (string? %) (str/blank? %)))]
(and (nil-ish? a) (nil-ish? b))))
(defn ^:private sort-vector-of-maps
"Sort a vector of maps by the given key for stable comparison."
[v sort-key]
(if (and (vector? v) (every? map? v) sort-key)
(vec (sort-by #(str (get % sort-key "")) v))
v))
(defn ^:private diff-values
"Compare two values at a given path. Returns nil (equal), or a diff entry."
[path expected actual config]
(let [last-key (last path)
parent-key (last (butlast path))]
(cond
;; Both nil-equivalent
(nil-equivalent? expected actual)
nil
;; Ignored key
(contains? (:ignored-keys config) last-key)
nil
;; Partial-compare: only check specific sub-keys
(and (map? expected) (map? actual)
(contains? (:partial-compare-keys config) parent-key))
(let [keys-to-check (get (:partial-compare-keys config) parent-key)]
(when-not (= (select-keys expected keys-to-check)
(select-keys actual keys-to-check))
{:path path
:expected (select-keys expected keys-to-check)
:actual (select-keys actual keys-to-check)
:category :material
:reason :value-mismatch}))
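;; Confidence score: equal within :number-tolerance is not reported,
;; drift within :confidence-tolerance is trivial, anything larger is material.
;; This matches confidence-tolerance-test (0.90 -> 0.82 trivial, 0.90 -> 0.50 material).
(= :confidence last-key)
(cond
(numbers-equal? expected actual (:number-tolerance config)) nil
(numbers-equal? expected actual (:confidence-tolerance config))
{:path path :expected expected :actual actual
:category :trivial :reason :confidence-drift}
:else
{:path path :expected expected :actual actual
:category :material :reason :confidence-drift-large})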
;; Numbers
(and (number? expected) (number? actual))
(when-not (numbers-equal? expected actual (:number-tolerance config))
{:path path :expected expected :actual actual
:category :material :reason :value-mismatch})
;; Fuzzy strings
(and (string? expected) (string? actual)
(contains? (:fuzzy-string-keys config) last-key))
(when-not (= (normalize-string expected) (normalize-string actual))
{:path path :expected expected :actual actual
:category :material :reason :value-mismatch})
;; Exact comparison for everything else
:else
(when-not (= expected actual)
{:path path :expected expected :actual actual
:category :material :reason :value-mismatch}))))
(defn ^:private diff-maps
"Recursively diff two maps. Returns vector of diff entries."
[path expected actual config]
(let [all-keys (distinct (concat (keys expected) (keys actual)))]
(reduce
(fn [diffs k]
(let [child-path (conj path k)
ev (get expected k)
av (get actual k)]
(cond
;; Both nil-equivalent
(nil-equivalent? ev av)
diffs
;; Ignored key
(contains? (:ignored-keys config) k)
diffs
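;; Both maps under a partial-compare parent (e.g. :validation-results):
;; compare only the configured sub-keys instead of recursing into every key,
;; otherwise a changed :message would be reported as material
(and (map? ev) (map? av)
(contains? (:partial-compare-keys config) (peek path)))
(if-let [d (diff-values child-path ev av config)]
(conj diffs d)
diffs)
;; Both maps: recurse
(and (map? ev) (map? av))
(into diffs (diff-maps child-path ev av config))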
;; Both vectors — sort and compare element-by-element
(and (vector? ev) (vector? av))
(let [sort-key (get (:sort-keys config) k)
;; Check if elements should be partially compared
partial-keys (get (:partial-compare-keys config) k)
sv (sort-vector-of-maps ev sort-key)
sa (sort-vector-of-maps av sort-key)
max-len (max (count sv) (count sa))]
(reduce
(fn [diffs i]
(let [ei (get sv i)
ai (get sa i)]
(cond
(nil? ei)
(conj diffs {:path (conj child-path i) :expected nil :actual ai
:category :material :reason :extra-item})
(nil? ai)
(conj diffs {:path (conj child-path i) :expected ei :actual nil
:category :material :reason :missing-item})
(and (map? ei) (map? ai))
(if partial-keys
;; Partial compare for this vector's elements
(if (= (select-keys ei partial-keys)
(select-keys ai partial-keys))
diffs
(conj diffs {:path (conj child-path i)
:expected (select-keys ei partial-keys)
:actual (select-keys ai partial-keys)
:category :material
:reason :value-mismatch}))
(into diffs (diff-maps (conj child-path i) ei ai config)))
:else
(if-let [d (diff-values (conj child-path i) ei ai config)]
(conj diffs d)
diffs))))
diffs
(range max-len)))
;; Leaf values
:else
(if-let [d (diff-values child-path ev av config)]
(conj diffs d)
diffs))))
[]
all-keys)))
(defn compare-structured-data
"Compare expected and actual structured_data maps.
Returns:
{:verdict :identical | :trivial-only | :material-diff
:material [{:path [...] :expected v :actual v :reason kw}]
:trivial [{:path [...] :expected v :actual v :reason kw}]}
Ignored keys are skipped during comparison and do not appear in the result."
([expected actual]
(compare-structured-data expected actual default-config))
([expected actual config]
(let [config (merge default-config config)
all-diffs (diff-maps [] expected actual config)
material (filterv #(= :material (:category %)) all-diffs)
trivial (filterv #(= :trivial (:category %)) all-diffs)]
{:verdict (cond
(seq material) :material-diff
(seq trivial) :trivial-only
:else :identical)
:material material
:trivial trivial})))
cd orcha && clj -X:test:silent :nses '[com.getorcha.correctness.diff-test]'
Expected: all tests pass.
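One detail worth knowing when passing a custom config: compare-structured-data combines it with the defaults using `merge`, which is shallow. The sketch below uses a cut-down copy of the default config to show the effect; `deep-merge` is a hypothetical helper that is not part of the plan and is shown only for contrast.

```clojure
;; Cut-down copy of the default config, for illustration only.
(def default-config
  {:number-tolerance 0.005
   :sort-keys        {:line-items  :description
                      :fraud-flags :type}})

;; Shallow merge: the nested :sort-keys map is replaced wholesale,
;; so the :fraud-flags entry is lost.
(def shallow
  (merge default-config {:sort-keys {:line-items :amount}}))

;; Hypothetical deep merge that only descends into nested maps.
(defn deep-merge [a b]
  (merge-with (fn [x y]
                (if (and (map? x) (map? y))
                  (deep-merge x y)
                  y))
              a b))

(def deep
  (deep-merge default-config {:sort-keys {:line-items :amount}}))

(:sort-keys shallow) ; only the overriding map survives
(:sort-keys deep)    ; override applied, :fraud-flags kept
```

With the shallow behaviour, a caller overriding :sort-keys or :partial-compare-keys should supply the complete map rather than a single entry.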
cd orcha && git add dev/dev/getorcha/correctness/diff.clj test/com/getorcha/correctness/diff_test.clj && git commit -m "feat: add semantic diff algorithm for pipeline correctness testing"
Files:
dev/dev/getorcha/correctness.clj
The runner namespace provides the REPL API: run!, run-all!, run-tagged!, create-golden!, update-golden!, show-diff!.
Create dev/dev/getorcha/correctness.clj:
(ns dev.getorcha.correctness
"Pipeline correctness testing framework.
Runs test cases against the local dev system and compares output
to golden-file snapshots using semantic diff.
Usage from REPL:
(correctness/run! \"inv-001-standard\")
(correctness/run-all!)
(correctness/run-tagged! :extraction :accounts)
(correctness/create-golden! \"inv-001-standard\")
(correctness/update-golden! \"inv-001-standard\")"
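;; run! is defined below, so exclude the clojure.core var to avoid a
;; "run! already refers to clojure.core/run!" warning on load.
(:refer-clojure :exclude [run!])
;; Only namespaces and classes that are actually used are listed, so the
;; clj-kondo step reports no unused-namespace or unused-import warnings.
(:require [cheshire.core :as json]
[clojure.java.io :as io]
[clojure.pprint :as pprint]
[clojure.string :as str]
[clojure.tools.reader.edn :as edn]
[com.getorcha.app.ingestion :as app.ingestion]
[com.getorcha.db.sql :as db.sql]
[com.getorcha.repl :as repl]
[dev.getorcha.correctness.diff :as diff])
(:import (java.io PushbackReader)
(java.nio.file Files)
(java.time Instant)))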
;; ---------------------------------------------------------------------------
;; Paths
;; ---------------------------------------------------------------------------
(def ^:private base-dir
"Base directory for correctness test data."
"dev/correctness")
(defn ^:private manifest-path [] (str base-dir "/manifest.edn"))
(defn ^:private pdf-path [relative] (str base-dir "/" relative))
(defn ^:private golden-path [relative] (str base-dir "/" relative))
;; ---------------------------------------------------------------------------
;; Manifest
;; ---------------------------------------------------------------------------
(defn ^:private load-manifest
"Load and parse the manifest.edn file."
[]
(let [f (io/file (manifest-path))]
(when-not (.exists f)
(throw (ex-info "Manifest not found" {:path (manifest-path)})))
(with-open [r (PushbackReader. (io/reader f))]
(edn/read r))))
(defn ^:private find-case
"Find a test case by ID in the manifest."
[manifest case-id]
(let [cases (:cases manifest)]
(or (first (filter #(= case-id (:id %)) cases))
(throw (ex-info (str "Test case not found: " case-id)
{:case-id case-id
:available (mapv :id cases)})))))
;; ---------------------------------------------------------------------------
;; System access
;; ---------------------------------------------------------------------------
(defn ^:private ensure-system!
"Verify the Integrant system is running. Throws if not."
[]
(try
(repl/db-pool)
(catch Throwable _
(throw (ex-info "System not running. Call (go) or (reset) first." {})))))
;; ---------------------------------------------------------------------------
;; Test data setup and teardown
;; ---------------------------------------------------------------------------
(defn ^:private setup-legal-entity!
"Create a tenant + legal entity for the test case. Returns legal-entity-id."
[db-pool case-config defaults]
(let [le-config (merge (:legal-entity defaults) (:legal-entity case-config))
tenant-id (random-uuid)
le-id (random-uuid)
slug (str "correctness-" (:id case-config))]
(db.sql/execute-one!
db-pool
{:insert-into :tenant
:values [{:id tenant-id :name "Correctness Test" :slug slug}]})
(db.sql/execute-one!
db-pool
{:insert-into :legal-entity
:values [{:id le-id
:name (or (:name le-config) "Correctness Test GmbH")
:company-address (:address le-config)
:company-vat-id (:vat-id le-config)
:company-tax-id (:tax-id le-config)
:company-country (:country le-config)
:tenant-id tenant-id}]})
le-id))
(defn ^:private setup-identity!
"Create a test identity for uploaded-by. Returns identity-id."
[db-pool]
(let [id (random-uuid)]
(db.sql/execute-one!
db-pool
{:insert-into :identity
:values [{:id id
:email "correctness-test@getorcha.com"}]})
id))
(defn ^:private setup-master-data!
"Insert master data (chart of accounts, cost centers, business partners)
for the legal entity if configured."
[db-pool le-id case-config defaults]
(let [md (merge (:master-data defaults) (:master-data case-config))]
(doseq [[table-key table-name active-field] [[:chart-of-accounts :gl-accounts-dataset :is-active]
[:cost-centers :cost-center-dataset :position]
[:business-partners :business-partner-dataset :is-active]]]
(when-let [path (get md table-key)]
(let [f (io/file (str base-dir "/" path))]
(when (.exists f)
(let [data (with-open [r (PushbackReader. (io/reader f))]
(edn/read r))
row (cond-> {:legal-entity-id le-id
:data [:lift (json/generate-string data)]}
;; gl_accounts_dataset and business_partner_dataset use is_active
;; cost_center_dataset uses position (integer) for active ordering
(= active-field :is-active) (assoc :is-active true)
(= active-field :position) (assoc :position 0 :headers [:lift "[]"]))]
(db.sql/execute-one!
db-pool
{:insert-into table-name
:values [row]}))))))))
(defn ^:private teardown!
"Delete all test data by removing the tenant (cascades to legal entity,
documents, ingestions, master data, etc.)."
[db-pool tenant-slug]
(db.sql/execute-one!
db-pool
{:delete-from :tenant
:where [:= :slug tenant-slug]}))
;; ---------------------------------------------------------------------------
;; Pipeline execution
;; ---------------------------------------------------------------------------
(defn ^:private ingest-pdf!
"Upload PDF and queue for ingestion. Returns {:ingestion-id UUID :document-id UUID}."
[db-pool aws-config le-id identity-id case-config]
(let [pdf-file (io/file (pdf-path (:pdf case-config)))
_ (when-not (.exists pdf-file)
(throw (ex-info (str "PDF not found: " (:pdf case-config))
{:path (pdf-path (:pdf case-config))})))
content (Files/readAllBytes (.toPath pdf-file))
result (app.ingestion/queue-for-ingestion!
db-pool aws-config
{:content content
:content-type "application/pdf"
:legal-entity-id le-id
:uploaded-by identity-id
:file-original-name (.getName pdf-file)})]
(when (:skipped? result)
(throw (ex-info "Document already has in-progress ingestion" result)))
{:ingestion-id (:ap-ingestion/id result)
:document-id (:document/id result)}))
(defn ^:private poll-ingestion!
"Poll for ingestion completion. Returns structured-data or throws on failure/timeout."
[db-pool ingestion-id & {:keys [timeout-seconds poll-interval-seconds]
:or {timeout-seconds 300
poll-interval-seconds 5}}]
(let [deadline (Instant/ofEpochMilli (+ (System/currentTimeMillis) (* timeout-seconds 1000)))]
(loop []
(let [{:ap-ingestion/keys [status structured-data error-message]}
(db.sql/execute-one!
db-pool
{:select [:status :structured-data :error-message]
:from [:ap-ingestion]
:where [:= :id ingestion-id]})]
(case (str status)
"completed" (or structured-data
(throw (ex-info "Ingestion completed but no structured_data"
{:ingestion-id ingestion-id})))
"failed" (throw (ex-info (str "Ingestion failed: " error-message)
{:ingestion-id ingestion-id}))
"skipped" (throw (ex-info "Ingestion was skipped"
{:ingestion-id ingestion-id}))
;; Still in progress
(if (.isAfter (Instant/now) deadline)
(throw (ex-info "Ingestion timed out"
{:ingestion-id ingestion-id
:timeout-seconds timeout-seconds}))
(do (Thread/sleep (* poll-interval-seconds 1000))
(recur))))))))
;; ---------------------------------------------------------------------------
;; Golden file I/O
;; ---------------------------------------------------------------------------
(defn ^:private read-golden
"Read a golden file. Returns the structured_data map."
[case-config]
(let [f (io/file (golden-path (:golden case-config)))]
(when-not (.exists f)
(throw (ex-info (str "Golden file not found: " (:golden case-config)
"\nRun (create-golden! \"" (:id case-config) "\") first.")
{:path (golden-path (:golden case-config))})))
(with-open [r (PushbackReader. (io/reader f))]
(edn/read r))))
(defn ^:private write-golden!
"Write structured_data as a golden file."
[case-config data]
(let [f (io/file (golden-path (:golden case-config)))]
(io/make-parents f)
(spit f (with-out-str (pprint/pprint data)))
(println "Golden file written:" (.getPath f))))
;; ---------------------------------------------------------------------------
;; Reporting
;; ---------------------------------------------------------------------------
(defn ^:private format-path
"Format a diff path for display."
[path]
(str/join " > " (map #(if (number? %) (str "[" % "]") (name %)) path)))
(defn show-diff!
"Print the detailed diff for a result map as returned by run!."
[result]
(let [{:keys [material trivial]} result]
(when (seq material)
(println "\nMATERIAL" (str "(" (count material) "):"))
(doseq [{:keys [path expected actual reason]} material]
(println (str " " (format-path path)))
(println (str " expected: " (pr-str expected)))
(println (str " actual: " (pr-str actual)))
(when (not= reason :value-mismatch)
(println (str " reason: " (name reason))))))
(when (seq trivial)
(println "\nTRIVIAL" (str "(" (count trivial) "):"))
(doseq [{:keys [path expected actual reason]} trivial]
(println (str " " (format-path path) " "
(pr-str expected) " -> " (pr-str actual)
" (" (name reason) ")"))))))
(defn ^:private print-summary-table
"Print a summary table of test results."
[results]
(println)
(println "Pipeline Correctness Results")
(println (str/join "" (repeat 70 "=")))
(printf "| %-25s | %-8s | %-13s | %8s | %7s | %5s |\n"
"Case" "Type" "Verdict" "Material" "Trivial" "Time")
(println (str/join "" (repeat 70 "-")))
(doseq [{:keys [case-id type verdict material trivial elapsed-ms]} results]
(printf "| %-25s | %-8s | %-13s | %8d | %7d | %4.0fs |\n"
(subs case-id 0 (min 25 (count case-id)))
(or type "?")
(name verdict)
(count material)
(count trivial)
(/ (double (or elapsed-ms 0)) 1000.0)))
(println (str/join "" (repeat 70 "=")))
(let [freqs (frequencies (map :verdict results))]
(printf "%d cases: %s\n"
(count results)
(str/join ", " (for [[v c] (sort-by key freqs)]
(str c " " (name v)))))))
;; ---------------------------------------------------------------------------
;; Public API
;; ---------------------------------------------------------------------------
(defn run!
"Run a single test case. Returns result map."
[case-id]
(ensure-system!)
(let [manifest (load-manifest)
case-config (find-case manifest case-id)
defaults (:defaults manifest)
db-pool (repl/db-pool)
aws-config (repl/aws)
tenant-slug (str "correctness-" case-id)
start (System/currentTimeMillis)]
(try
;; Setup
(let [le-id (setup-legal-entity! db-pool case-config defaults)
identity-id (setup-identity! db-pool)]
(setup-master-data! db-pool le-id case-config defaults)
(try
;; Execute pipeline
(let [{:keys [ingestion-id]} (ingest-pdf! db-pool aws-config le-id identity-id case-config)
_ (println (str "Ingesting " case-id " (ingestion " ingestion-id ")..."))
actual (poll-ingestion! db-pool ingestion-id)
golden (read-golden case-config)
diff-result (diff/compare-structured-data golden actual)
elapsed (- (System/currentTimeMillis) start)
result (merge diff-result
{:case-id case-id
:type (:type case-config)
:elapsed-ms elapsed
:actual actual})]
(println (str case-id ": " (name (:verdict result))
" (" (count (:material result)) " material, "
(count (:trivial result)) " trivial)"))
(when (= :material-diff (:verdict result))
(show-diff! result))
result)
(finally
;; Teardown
(teardown! db-pool tenant-slug))))
(catch Throwable t
;; Ensure cleanup even on setup failure
(try (teardown! db-pool tenant-slug) (catch Throwable _))
(let [elapsed (- (System/currentTimeMillis) start)]
(println (str case-id ": ERROR - " (ex-message t)))
{:case-id case-id
:type (:type case-config)
:verdict :error
:error (ex-message t)
:material []
:trivial []
:elapsed-ms elapsed})))))
(defn run-all!
"Run all test cases. Prints summary table."
[]
(ensure-system!)
(let [manifest (load-manifest)
results (mapv #(run! (:id %)) (:cases manifest))]
(print-summary-table results)
results))
(defn run-tagged!
"Run all test cases matching any of the given tags."
[& tags]
(ensure-system!)
(let [manifest (load-manifest)
tag-set (set tags)
cases (filter #(some tag-set (:tags %)) (:cases manifest))
results (mapv #(run! (:id %)) cases)]
(print-summary-table results)
results))
(defn create-golden!
"Run the pipeline and save output as golden file (first-time setup)."
[case-id]
(ensure-system!)
(let [manifest (load-manifest)
case-config (find-case manifest case-id)
defaults (:defaults manifest)
db-pool (repl/db-pool)
aws-config (repl/aws)
tenant-slug (str "correctness-" case-id)]
(try
(let [le-id (setup-legal-entity! db-pool case-config defaults)
identity-id (setup-identity! db-pool)]
(setup-master-data! db-pool le-id case-config defaults)
(try
(let [{:keys [ingestion-id]} (ingest-pdf! db-pool aws-config le-id identity-id case-config)
_ (println (str "Ingesting " case-id " for golden file capture..."))
actual (poll-ingestion! db-pool ingestion-id)]
(write-golden! case-config actual)
actual)
(finally
(teardown! db-pool tenant-slug))))
(catch Throwable t
(try (teardown! db-pool tenant-slug) (catch Throwable _))
(throw t)))))
(defn update-golden!
"Run the pipeline and overwrite the golden file with current output."
[case-id]
(create-golden! case-id))
cd orcha && clj-kondo --lint dev/dev/getorcha/correctness.clj dev/dev/getorcha/correctness/diff.clj
Expected: no errors or warnings.
cd orcha && git add dev/dev/getorcha/correctness.clj && git commit -m "feat: add correctness test runner with REPL API"
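The runner's poll-ingestion! is a deadline-based polling loop: query the status, return on a terminal state, otherwise sleep and retry until the timeout. The sketch below isolates that pattern so it can be tried without a database. `poll-until` and `fake-status` are hypothetical names, and the stub stands in for the real status query.

```clojure
(defn poll-until
  "Call `fetch-status` until it returns a terminal status or the deadline passes.
   `fetch-status` is a hypothetical zero-arg function standing in for the DB query."
  [fetch-status {:keys [timeout-ms interval-ms]}]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (let [status (fetch-status)]
        (cond
          ;; Terminal states end the loop immediately
          (#{"completed" "failed" "skipped"} status)
          status

          ;; Still in progress and out of time
          (> (System/currentTimeMillis) deadline)
          (throw (ex-info "Timed out" {:last-status status}))

          ;; Still in progress: wait, then poll again
          :else
          (do (Thread/sleep (long interval-ms))
              (recur)))))))

;; Stub that reports "processing" twice, then "completed".
(def calls (atom 0))
(defn fake-status []
  (if (< (swap! calls inc) 3) "processing" "completed"))

(poll-until fake-status {:timeout-ms 1000 :interval-ms 10})
```

Checking the deadline only after a non-terminal status means an ingestion that finishes on the final poll is still accepted, which is also how the runner's loop behaves.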
Files:
Create: dev/correctness/manifest.edn
Create: dev/correctness/pdfs/.gitkeep
Create: dev/correctness/golden/.gitkeep
Create: dev/correctness/master-data/.gitkeep
Step 1: Create the manifest with placeholder cases
Create dev/correctness/manifest.edn:
{:defaults
{:legal-entity {:name "Correctness Test GmbH"
:country "DE"
:vat-id "DE123456789"
:tax-id "123/456/78901"
:address "Musterstraße 1, 12345 Berlin"}}
:cases
[;; Add test cases here. Example:
;; {:id "inv-001-standard"
;; :description "Standard German invoice, happy path"
;; :pdf "pdfs/inv-001-standard.pdf"
;; :golden "golden/inv-001-standard.edn"
;; :type "invoice"
;; :tags #{:extraction :accounts :cost-center :validation}}
]
:match-groups
[;; Add match groups here. Example:
;; {:id "match-001-invoice-po"
;; :description "Invoice with PO reference"
;; :cases ["inv-004-with-po" "po-001-purchase-order"]
;; :expected-edges [{:a "inv-004-with-po" :b "po-001-purchase-order" :min-score 0.7}]}
]}
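Because match groups refer to cases by :id and every case names its own PDF and golden file, a quick structural check can catch typos before a slow pipeline run. The following is a minimal sketch, not part of the planned runner: `manifest-problems` is a hypothetical helper, and treating :id, :pdf, :golden, :type and :tags as required is an assumption based on the example case above.

```clojure
(require '[clojure.set :as set])

;; Assumed set of required keys, taken from the commented example case.
(def required-case-keys #{:id :pdf :golden :type :tags})

(defn manifest-problems
  "Return a vector of human-readable problems found in a manifest map."
  [{:keys [cases match-groups]}]
  (let [ids   (map :id cases)
        known (set ids)]
    (vec
     (concat
      ;; Case ids must be unique
      (for [[id n] (frequencies ids) :when (> n 1)]
        (str "duplicate case id: " id))
      ;; Every case needs the required keys
      (for [c cases
            :let [missing (set/difference required-case-keys (set (keys c)))]
            :when (seq missing)]
        (str (:id c) ": missing keys " (vec (sort missing))))
      ;; Match groups may only reference known cases
      (for [g match-groups
            id (:cases g)
            :when (not (known id))]
        (str (:id g) ": unknown case " id))))))

(manifest-problems
 {:cases        [{:id "inv-001-standard"
                  :pdf "pdfs/inv-001-standard.pdf"
                  :type "invoice"
                  :tags #{:extraction}}]
  :match-groups [{:id "match-001-invoice-po"
                  :cases ["inv-004-with-po"]}]})
```

An empty result means the manifest is structurally consistent. It does not check that the referenced files exist on disk.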
cd orcha && mkdir -p dev/correctness/pdfs dev/correctness/golden dev/correctness/master-data && touch dev/correctness/pdfs/.gitkeep dev/correctness/golden/.gitkeep dev/correctness/master-data/.gitkeep
Check if .gitattributes exists. If not, create one in the repo root. Add:
dev/correctness/pdfs/*.pdf binary
This prevents git from trying to diff binary PDFs.
cd orcha && git add dev/correctness/ && git commit -m "feat: add correctness test directory structure and manifest"
Files:
Before adding real test PDFs, verify the full flow works with any available PDF.
Load the namespace and verify it compiles:
(require '[dev.getorcha.correctness :as correctness] :reload-all)
Expected: no errors.
Pick any PDF available locally (from dev snapshots or Downloads). Add a test case to manifest.edn:
{:id "smoke-test"
:description "Temporary smoke test"
:pdf "pdfs/smoke-test.pdf"
:golden "golden/smoke-test.edn"
:type "invoice"
:tags #{:smoke}}
Copy the PDF to dev/correctness/pdfs/smoke-test.pdf.
(correctness/create-golden! "smoke-test")
Expected: prints "Ingesting smoke-test for golden file capture...", waits for completion, writes the golden file, and prints its path.
(correctness/run! "smoke-test")
Expected: prints verdict (likely :identical or :trivial-only since it was just captured). If :material-diff, the diff is printed. This verifies the full loop: setup -> ingest -> poll -> diff -> teardown.
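The important property of that loop is that teardown runs no matter which step fails. A stripped-down sketch of the shape, using stub steps instead of the real setup, ingest and poll functions (`run-case` and `log` are hypothetical names for illustration):

```clojure
(def log (atom []))

(defn run-case
  "Run `steps` (a hypothetical seq of [name thunk] pairs) and always tear down."
  [steps]
  (try
    (doseq [[step-name f] steps]
      (swap! log conj step-name)
      (f))
    :ok
    (catch Exception e
      ;; Report the failure as a result instead of propagating it
      {:verdict :error :error (ex-message e)})
    (finally
      ;; Runs on success and on failure alike
      (swap! log conj :teardown))))

(run-case [[:setup  (fn [])]
           [:ingest (fn [] (throw (ex-info "S3 unavailable" {})))]
           [:poll   (fn [])]])

@log ; steps attempted before the failure, followed by :teardown
```

Here the :poll step is never reached, yet :teardown is still recorded, which is the behaviour the smoke test should confirm against the real database.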
(correctness/run-all!)
Expected: prints summary table with the smoke test result.
Remove smoke-test.pdf and smoke-test.edn from the correctness directories if not needed as a permanent test case.
Files:
dev/correctness/manifest.edn (add cases)
dev/correctness/pdfs/
This task is about curating the 5-10 test cases. The user needs to identify PDFs for each category and create golden files.
The user selects PDFs for these categories (from Downloads, dev snapshots, or production):
| ID | Description | Source |
|---|---|---|
| inv-001-standard | Standard German invoice, happy path | A known-good invoice |
| inv-002-credit-note | Credit note (Gutschrift) | Past failure case |
| inv-003-mixed-tax-rates | Invoice with 7% and 19% VAT | Past failure case |
| inv-004-with-po | Invoice referencing a PO | For matching test |
| inv-005-ocr-difficult | Scan with poor quality | Past failure case |
| inv-006-wrong-accounts | Past account assignment failure | Past failure case |
For each PDF, copy it:
cp /path/to/invoice.pdf orcha/dev/correctness/pdfs/inv-001-standard.pdf
Add each case to dev/correctness/manifest.edn following the format in the existing comments.
For each case:
(correctness/create-golden! "inv-001-standard")
Review the golden file output to verify it looks correct (this becomes the ground truth).
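Since the golden file becomes the ground truth, it helps to confirm that what was written reads back as the same data. A minimal round-trip sketch follows, assuming plain EDN-friendly values. It uses clojure.edn from the standard library rather than the tools.reader variant in the runner, and `write-edn!`, `read-edn` and the sample map are illustrative names and data only.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.pprint :as pprint])

(defn write-edn!
  "Pretty-print `data` to `file`, the same way the runner writes golden files."
  [file data]
  (spit file (with-out-str (pprint/pprint data))))

(defn read-edn
  "Read a single EDN value back from `file`."
  [file]
  (edn/read-string (slurp file)))

(def sample
  {:invoice-number "INV-001"
   :total          100.0
   :line-items     [{:description "Widget" :amount 100.0}]})

(let [f (java.io.File/createTempFile "golden" ".edn")]
  (.deleteOnExit f)
  (write-edn! f sample)
  ;; true when the file reads back as exactly the data that was written
  (= sample (read-edn f)))
```

If this check fails for real pipeline output, the usual suspects are values without an EDN representation or a REPL that has `*print-length*` bound, which truncates pretty-printed collections.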
(correctness/run-all!)
Expected: summary table showing all cases. Most should be :identical or :trivial-only since golden files were just captured.
cd orcha && git add dev/correctness/ && git commit -m "feat: add initial correctness test cases and golden files"
Files:
dev/dev/getorcha/correctness.clj (add run-match-group!)
This task adds support for testing four-way matching. It requires at least two related documents (e.g., invoice + PO). Implement after the core framework is working.
Add run-match-group! to the runner in dev/dev/getorcha/correctness.clj:
(defn run-match-group!
"Run a match group: ingest all documents, trigger matching, verify edges."
[group-id]
(ensure-system!)
(let [manifest (load-manifest)
group (or (first (filter #(= group-id (:id %)) (:match-groups manifest)))
(throw (ex-info (str "Match group not found: " group-id) {})))
defaults (:defaults manifest)
db-pool (repl/db-pool)
aws-config (repl/aws)
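case-configs (mapv #(find-case manifest %) (:cases group))
;; setup-legal-entity! derives the tenant slug from the :id of the case it
;; is given (the first case here), so teardown must use the same slug or
;; the test tenant is never deleted.
tenant-slug (str "correctness-" (:id (first case-configs)))]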
(try
(let [le-id (setup-legal-entity! db-pool (first case-configs) defaults)
identity-id (setup-identity! db-pool)]
(setup-master-data! db-pool le-id (first case-configs) defaults)
(try
;; Ingest all documents in the group
(let [ingestion-results
(mapv (fn [case-config]
(let [result (ingest-pdf! db-pool aws-config le-id identity-id case-config)]
(println (str "Queued " (:id case-config) " (ingestion " (:ingestion-id result) ")"))
(assoc result :case-id (:id case-config))))
case-configs)]
;; Poll all ingestions until complete
(doseq [{:keys [ingestion-id case-id]} ingestion-results]
(println (str "Waiting for " case-id "..."))
(poll-ingestion! db-pool ingestion-id))
(println "All ingestions complete. Waiting for matching...")
;; Allow time for the matching worker to process
(Thread/sleep 15000)
;; Check expected edges
(let [doc-id-by-case (into {}
(map (fn [{:keys [case-id document-id]}]
[case-id document-id])
ingestion-results))
results
(mapv (fn [{:keys [a b min-score]}]
(let [doc-a (get doc-id-by-case a)
doc-b (get doc-id-by-case b)
[id-a id-b] (sort [doc-a doc-b])
match (db.sql/execute-one!
db-pool
{:select [:blended-score :match-method]
:from [:ap-document-match]
:where [:and
[:= :document-a-id id-a]
[:= :document-b-id id-b]]})]
{:edge (str a " <-> " b)
:found? (some? match)
:score (:ap-document-match/blended-score match)
:method (:ap-document-match/match-method match)
:min-score min-score
:pass? (and (some? match)
(>= (double (:ap-document-match/blended-score match))
(double min-score)))}))
(:expected-edges group))]
(println "\nMatch Group Results:" group-id)
(doseq [{:keys [edge found? score method min-score pass?]} results]
(println (str " " (if pass? "PASS" "FAIL") " " edge
(if found?
(str " score=" (format "%.3f" (double score))
" method=" method
" (min=" min-score ")")
" NOT FOUND"))))
results))
(finally
(teardown! db-pool tenant-slug))))
(catch Throwable t
(try (teardown! db-pool tenant-slug) (catch Throwable _))
(throw t)))))
Requires inv-004-with-po and po-001-purchase-order cases to already exist. Add to manifest:
:match-groups
[{:id "match-001-invoice-po"
:description "Invoice with PO reference"
:cases ["inv-004-with-po" "po-001-purchase-order"]
:expected-edges [{:a "inv-004-with-po" :b "po-001-purchase-order" :min-score 0.7}]}]
(correctness/run-match-group! "match-001-invoice-po")
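The pass/fail rule for an expected edge is simple: the match must exist and its blended score must reach :min-score. The sketch below expresses that rule as a pure function so it can be checked without running the pipeline. `check-edge`, the in-memory `scores` map, and the case id "po-002-other" are hypothetical stand-ins for the database lookup and real data.

```clojure
(defn check-edge
  "Evaluate one expected edge against `scores`, a hypothetical map of
   #{case-a case-b} -> blended score standing in for the DB lookup."
  [scores {:keys [a b min-score]}]
  (let [score (get scores #{a b})]
    {:edge   (str a " <-> " b)
     :found? (some? score)
     :score  score
     ;; A missing match fails; a found match passes only at or above min-score
     :pass?  (boolean (and score (>= score min-score)))}))

(def scores
  {#{"inv-004-with-po" "po-001-purchase-order"} 0.82})

;; Found and above the threshold
(check-edge scores {:a "inv-004-with-po" :b "po-001-purchase-order" :min-score 0.7})

;; No match recorded for this pair
(check-edge scores {:a "inv-004-with-po" :b "po-002-other" :min-score 0.7})
```

Keying the lookup on a set makes it independent of argument order, which plays the same role as sorting the two document IDs before the query in run-match-group!.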
cd orcha && git add dev/dev/getorcha/correctness.clj dev/correctness/manifest.edn && git commit -m "feat: add match group support to correctness testing"