Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Validate invoice recipient fields against legal entity master data, with UVR vision fallback for mismatches.
Architecture: Pre-fetch LE data in job-handler into pipeline state. Add check-recipient-identity as a validation check. Extend UVR to resolve recipient identity warnings via vision.
Tech Stack: Clojure, HoneySQL, validation multimethod, UVR post-processor
check-recipient-identity with tests
Files:
Modify: src/com/getorcha/workers/ap/ingestion/validation.clj (add function before line 845)
Modify: test/com/getorcha/workers/ap/ingestion/validation_test.clj (add tests at end)
Step 1: Write tests for check-recipient-identity
Add to test/com/getorcha/workers/ap/ingestion/validation_test.clj:
;; Recipient Identity Tests
;; -----------------------------------------------------------------------------

(def ^:private base-legal-entity
  "Legal entity master data for recipient identity tests."
  {:legal-entity/name "Kunde AG"
   :legal-entity/company-address "Musterstr. 1, 80331 München"
   :legal-entity/company-vat-id "DE123456789"
   :legal-entity/company-tax-id "143/123/12345"
   :legal-entity/company-country "DE"})

(deftest test-check-recipient-identity
  (testing "all fields match → pass"
    (let [sd {:recipient {:name "Kunde AG"
                          :address "Musterstr. 1, 80331 München"
                          :country "DE"
                          :tax-id-type "vat"
                          :tax-id "DE123456789"}}]
      (is (= {:status "pass"
              :details {:name :match :vat-id :match :tax-id :skip
                        :address :match :country :match}}
             (validation/check-recipient-identity sd base-legal-entity)))))

  (testing "name mismatch → warning"
    (let [sd {:recipient {:name "Wrong Company GmbH"}}]
      (is (= "warning"
             (:status (validation/check-recipient-identity sd base-legal-entity))))
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :name])))))

  (testing "whitespace normalization — extra spaces still match"
    (let [sd {:recipient {:name " Kunde AG "}}]
      (is (= "pass"
             (:status (validation/check-recipient-identity sd base-legal-entity))))))

  (testing "VAT ID mismatch → warning"
    (let [sd {:recipient {:name "Kunde AG"
                          :tax-id-type "vat"
                          :tax-id "AT999999999"}}]
      (is (= "warning"
             (:status (validation/check-recipient-identity sd base-legal-entity))))
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :vat-id])))))

  (testing "non-vat tax-id comparison"
    (let [sd {:recipient {:name "Kunde AG"
                          :tax-id-type "ein"
                          :tax-id "143/123/12345"}}]
      (is (= :match
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :tax-id])))))

  (testing "non-vat tax-id mismatch"
    (let [sd {:recipient {:name "Kunde AG"
                          :tax-id-type "ein"
                          :tax-id "999/999/99999"}}]
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :tax-id])))))

  (testing "address normalization — punctuation/case differences still match"
    (let [sd {:recipient {:name "Kunde AG"
                          :address "musterstr 1 80331 münchen"}}]
      (is (= :match
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :address])))))

  (testing "address mismatch"
    (let [sd {:recipient {:name "Kunde AG"
                          :address "Hauptstr. 99, 10115 Berlin"}}]
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :address])))))

  (testing "invoice field blank → skip"
    (let [sd {:recipient {:name "Kunde AG"}}]
      (is (= "pass"
             (:status (validation/check-recipient-identity sd base-legal-entity))))
      (is (= :skip
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :address])))))

  (testing "LE field nil → skip"
    (let [le (assoc base-legal-entity :legal-entity/company-vat-id nil)
          sd {:recipient {:name "Kunde AG" :tax-id-type "vat" :tax-id "DE999999999"}}]
      (is (= :skip
             (get-in (validation/check-recipient-identity sd le)
                     [:details :vat-id])))))

  (testing "no recipient → pass"
    (is (= {:status "pass"
            :details {:name :skip :vat-id :skip :tax-id :skip
                      :address :skip :country :skip}}
           (validation/check-recipient-identity {} base-legal-entity))))

  (testing "nil legal entity → pass (all skip)"
    (let [sd {:recipient {:name "Anything"}}]
      (is (= "pass"
             (:status (validation/check-recipient-identity sd nil))))))

  (testing "country case normalization"
    (let [sd {:recipient {:name "Kunde AG" :country "de"}}]
      (is (= :match
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :country])))))

  (testing "multiple mismatches → all reported in details"
    (let [sd {:recipient {:name "Wrong Inc"
                          :address "Wrong Street"
                          :country "AT"}}]
      (is (= {:name :mismatch :address :mismatch :country :mismatch
              :vat-id :skip :tax-id :skip}
             (:details (validation/check-recipient-identity sd base-legal-entity)))))))
Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.validation-test]' 2>&1 | grep -E "(FAIL|ERROR|Ran)"
Expected: Compilation error — check-recipient-identity does not exist.
check-recipient-identity
Add to src/com/getorcha/workers/ap/ingestion/validation.clj, before the ;; Document Type Dispatch section (before line 845):
(defn ^:private normalize-whitespace
  "Collapse all whitespace runs to single space, trim."
  [s]
  (some-> s str/trim (str/replace #"\s+" " ")))

(defn ^:private normalize-address
  "Lowercase, strip punctuation, collapse whitespace."
  [s]
  (some-> s
          str/lower-case
          (str/replace #"[.,;:\-/()]" "")
          str/trim
          (str/replace #"\s+" " ")))

;; NOTE: `normalize-non-vat-tax-id` already exists at line 562 in this file.
;; It's private but in the same ns, so it's callable from here.

(defn check-recipient-identity
  "Compare invoice recipient fields against legal entity master data.
  Returns a validation result with per-field match/mismatch/skip details."
  [{:keys [recipient] :as _structured-data}
   legal-entity]
  (let [;; A field is compared only when both sides carry a non-blank value.
        compare-field (fn [invoice-val le-val normalize-fn]
                        (cond
                          (or (nil? invoice-val) (str/blank? invoice-val)) :skip
                          (or (nil? le-val) (str/blank? le-val)) :skip
                          (= (normalize-fn invoice-val)
                             (normalize-fn le-val)) :match
                          :else :mismatch))
        ;; Resolve effective VAT ID and tax ID from recipient
        {:keys [tax-id-type tax-id vat-id]} recipient
        effective-vat-id (cond
                           (= tax-id-type "vat") tax-id
                           (seq vat-id) vat-id
                           :else nil)
        effective-tax-id (when (and tax-id-type (not= tax-id-type "vat"))
                           tax-id)
        details {:name (compare-field (:name recipient)
                                      (:legal-entity/name legal-entity)
                                      normalize-whitespace)
                 :vat-id (compare-field effective-vat-id
                                        (:legal-entity/company-vat-id legal-entity)
                                        tax/normalize-vat-id)
                 :tax-id (compare-field effective-tax-id
                                        (:legal-entity/company-tax-id legal-entity)
                                        normalize-non-vat-tax-id)
                 :address (compare-field (:address recipient)
                                         (:legal-entity/company-address legal-entity)
                                         normalize-address)
                 :country (compare-field (:country recipient)
                                         (:legal-entity/company-country legal-entity)
                                         str/upper-case)}]
    ;; Any single mismatch downgrades the whole check to a warning.
    (if (some #{:mismatch} (vals details))
      {:status "warning"
       :message "Recipient does not match legal entity master data"
       :details details}
      {:status "pass"
       :details details})))
Important: The normalize-non-vat-tax-id function already exists as a private fn at line 562. It's in the same namespace so it's already callable. Do NOT re-define it — just use it. The note above is for clarity.
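To make the normalization rules concrete, here is a minimal REPL sketch. It re-declares the two new helpers under hypothetical `-sketch` names (so nothing collides with the real private functions) and applies them to values taken from the tests in Step 1. Note that hyphens and slashes are removed rather than replaced by a space, so hyphenated street names collapse into one token.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-alone copies of the two normalizers from this step.
(defn normalize-whitespace-sketch [s]
  (some-> s str/trim (str/replace #"\s+" " ")))

(defn normalize-address-sketch [s]
  (some-> s
          str/lower-case
          (str/replace #"[.,;:\-/()]" "")   ; punctuation is deleted, not spaced
          str/trim
          (str/replace #"\s+" " ")))

(normalize-whitespace-sketch "  Kunde   AG ")
;; => "Kunde AG"

(normalize-address-sketch "Musterstr. 1, 80331 München")
;; => "musterstr 1 80331 münchen"

(normalize-address-sketch "Karl-Marx-Str. 5")
;; => "karlmarxstr 5"

(normalize-address-sketch nil)
;; => nil (some-> short-circuits on nil input)
```

Because both sides of a comparison go through the same function, the hyphen handling only matters if one source writes "Karl Marx Str" with spaces while the other uses hyphens; in that case the check would report a mismatch.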
Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.validation-test]' 2>&1 | grep -E "(FAIL|ERROR|Ran)"
Expected: All tests pass, including the new test-check-recipient-identity.
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/validation.clj test/com/getorcha/workers/ap/ingestion/validation_test.clj
Fix any issues.
git add src/com/getorcha/workers/ap/ingestion/validation.clj test/com/getorcha/workers/ap/ingestion/validation_test.clj
git commit -m "feat: add check-recipient-identity validation check (issue #335)"
validate multimethod signature and wire LE data
Files:
Modify: src/com/getorcha/workers/ap/ingestion/validation.clj:848-893 (multimethod + all defmethods)
Modify: src/com/getorcha/workers/ap/ingestion.clj:336-347 (with-validations)
Modify: test/com/getorcha/workers/ap/ingestion/validation_test.clj (update validate calls only if needed; existing single-argument calls keep working)
Modify: src/com/getorcha/workers/ap/ingestion/post_process/financial_validation.clj (only if it calls validate)
Modify: src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj (only if it calls validate; it calls check-required-fields directly rather than validate, so no change is expected there)
Step 1: Update the multimethod dispatch and all defmethods
In src/com/getorcha/workers/ap/ingestion/validation.clj:
Change the multimethod at line 848:
(defmulti validate
  "Run all deterministic validation checks on structured-data.
  Dispatches on the :document-type field in structured-data.
  Accepts an optional legal-entity map for recipient identity validation.
  Returns structured-data with :validation-results map keyed by check name.
  Each check result has :status (\"pass\", \"warning\", \"error\", or \"uncertain\")
  and optionally :field, :message, :details.
  Summary predicates can be derived on-site:
  (some #(= \"error\" (:status %)) (vals (:validation-results data)))"
  (fn [structured-data & _] (:document-type structured-data)))
Change the :default method at line 862:
(defmethod validate :default
  [structured-data & _]
  (throw (ex-info "Validation not implemented for document type"
                  {:kind ::unsupported-document-type
                   :document-type (:document-type structured-data)})))
Change the "invoice" method at line 869:
(defmethod validate "invoice"
  [structured-data & [legal-entity]]
  (assoc structured-data
         :validation-results
         (cond-> {:financial-math (check-financial-math structured-data)
                  :required-fields (check-required-fields structured-data)
                  :tax-id-format (check-tax-id-format structured-data)
                  :iban-format (check-iban structured-data)
                  :date-reasonableness (check-date-reasonableness structured-data)
                  :issuer-country (check-issuer-country structured-data)
                  :recipient-country (check-recipient-country structured-data)
                  :recipient-identity (check-recipient-identity structured-data legal-entity)}
           (:summary-page-range structured-data)
           (assoc :large-document-summary-only (check-large-document-summary-only structured-data)))))
Change the "purchase-order" method at line 884:
(defmethod validate "purchase-order"
  [structured-data & _]
  (assoc structured-data
         :validation-results
         {:required-fields (if (and (seq (:po-number structured-data))
                                    (seq (get-in structured-data [:supplier :name])))
                             {:status "pass"}
                             {:status "error"
                              :message "Missing required fields: po-number and/or supplier name"})}))
Do the same & _ change for the "contract" and "goods-received-note" defmethods (around lines 1048 and 1060).
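The signature change relies on variadic dispatch: the dispatch function ignores everything after the first argument, and each method decides whether it cares about the extra argument. The toy multimethod below (hypothetical name `validate-sketch`, not the real `validate`) is a small sketch of that shape, showing that one-argument and two-argument calls both dispatch correctly.

```clojure
;; Toy multimethod mirroring the `& _` dispatch shape from this step.
(defmulti validate-sketch
  (fn [structured-data & _] (:document-type structured-data)))

;; The invoice method picks the optional second argument out of the rest args.
(defmethod validate-sketch "invoice"
  [structured-data & [legal-entity]]
  (assoc structured-data :saw-legal-entity? (some? legal-entity)))

;; Other document types accept and ignore any extra arguments.
(defmethod validate-sketch "purchase-order"
  [structured-data & _]
  (assoc structured-data :saw-legal-entity? false))

(validate-sketch {:document-type "invoice"})
;; => {:document-type "invoice", :saw-legal-entity? false}

(validate-sketch {:document-type "invoice"} {:legal-entity/name "Kunde AG"})
;; => {:document-type "invoice", :saw-legal-entity? true}

(validate-sketch {:document-type "purchase-order"} {:legal-entity/name "Kunde AG"})
;; => {:document-type "purchase-order", :saw-legal-entity? false}
```

The `& _` form matters on every method, not just the invoice one: a method declared with a fixed one-argument vector would throw an arity error as soon as a caller passes the legal entity.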
with-validations in ingestion.clj
Change src/com/getorcha/workers/ap/ingestion.clj:336-347:
(defn ^:private with-validations
  "Runs deterministic validation checks on extracted structured-data.
  Adds :validation-results to structured-data. Does not reject invalid documents."
  [_context ingestion]
  (log/info "Running validation checks")
  (let [legal-entity (:legal-entity ingestion)
        ingestion' (update ingestion :structured-data validation/validate legal-entity)
        results (get-in ingestion' [:structured-data :validation-results])
        has-errors (some #(= "error" (:status %)) (vals results))]
    (log/info "Validation completed"
              {:has-errors has-errors
               :check-count (count results)})
    ingestion'))
In test/com/getorcha/workers/ap/ingestion/validation_test.clj, all existing calls to (validation/validate ...) pass one argument. With the & [legal-entity] signature, these calls continue to work unchanged: the second argument defaults to nil, and check-recipient-identity with a nil legal-entity returns pass with all-skip details. No test changes are needed.
Verify by running: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.validation-test]' 2>&1 | grep -E "(FAIL|ERROR|Ran)"
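The `with-validations` change leans on the fact that `update` forwards extra arguments: `(update m k f x)` calls `(f (get m k) x)`. A minimal sketch, using a hypothetical `validate-stub` in place of `validation/validate`:

```clojure
;; Hypothetical stand-in for validation/validate; it only records which
;; legal entity it was handed so the argument passing is visible.
(defn validate-stub
  [structured-data legal-entity]
  (assoc structured-data :validated-against (:legal-entity/name legal-entity)))

(def ingestion-sketch
  {:legal-entity    {:legal-entity/name "Kunde AG"}
   :structured-data {:document-type "invoice"}})

;; Same call shape as in with-validations: the legal entity rides along
;; as the extra argument to `update`.
(def validated
  (update ingestion-sketch
          :structured-data
          validate-stub
          (:legal-entity ingestion-sketch)))

(get-in validated [:structured-data :validated-against])
;; => "Kunde AG"
```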
validate
Search for other callers:
grep -rn "validation/validate" src/ test/
If financial_validation.clj or uncertain_validations.clj call validate (not just individual check functions like check-required-fields), update those calls too. From reading the code, they call individual check functions directly, not validate, so no changes should be needed.
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/validation.clj src/com/getorcha/workers/ap/ingestion.clj
Fix any issues.
git add src/com/getorcha/workers/ap/ingestion/validation.clj src/com/getorcha/workers/ap/ingestion.clj
git commit -m "refactor: add legal-entity arg to validate multimethod (issue #335)"
job-handler
Files:
Modify: src/com/getorcha/workers/ap/ingestion.clj:830-831 (add LE fetch)
Modify: src/com/getorcha/workers/ap/ingestion/extraction.clj:1318-1333 (remove redundant fetch)
Modify: src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj:304-309 (remove redundant fetch)
Step 1: Add LE fetch to job-handler
In src/com/getorcha/workers/ap/ingestion.clj, after line 831 ({:keys [document]} pipeline-state), add the LE fetch and merge it into pipeline-state:
(if-let [pipeline-state (claim-ingestion! context ingestion-id)]
  (let [{:keys [document]} pipeline-state
        legal-entity (db.sql/execute-one!
                      db-pool
                      {:select [:name :company-address :company-vat-id
                                :company-tax-id :company-country]
                       :from [:legal-entity]
                       :where [:= :id (:document/legal-entity-id document)]})
        pipeline-state (assoc pipeline-state :legal-entity legal-entity)
        {:keys [^ScheduledExecutorService heartbeat-scheduler]} worker-pools
This requires com.getorcha.db.sql to be required in the namespace. Check if it's already required — if not, add it.
In src/com/getorcha/workers/ap/ingestion/extraction.clj:1318-1333, change the contract extraction to read from the ingestion map:
(defmethod structured-data "contract"
  [{:keys [db-pool llm-config] :as _context}
   {{:keys [text]} :transcription-result :keys [document legal-entity] :as _ingestion}]
  (let [started-at (java.time.Instant/now)
        extraction-cfg (:extraction llm-config)
        legal-entity-id (:document/legal-entity-id document)
        {:legal-entity/keys [name company-address company-vat-id company-country]}
        legal-entity
        legal-entity-details
        (str "- Company name: " name
             (when company-address (str "\n- Address: " company-address))
             (when company-vat-id (str "\n- VAT ID: " company-vat-id))
             (when company-country (str "\n- Country: " company-country)))
The only change is: destructure :legal-entity from the ingestion map, bind its keys instead of the DB query result, and remove the db.sql/execute-one! call.
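The destructuring form `{:legal-entity/keys [...]}` works the same on the pre-fetched map as it did on the query result, and it is nil-safe. The sketch below isolates the prompt-fragment construction in a hypothetical helper (`legal-entity-details-sketch`, not a function in the codebase) to show what happens with a full row and with a missing one.

```clojure
(defn legal-entity-details-sketch
  "Hypothetical helper: builds the prompt fragment from a pre-fetched
  legal entity map with namespaced keys."
  [legal-entity]
  (let [{:legal-entity/keys [name company-address company-vat-id company-country]}
        legal-entity]
    (str "- Company name: " name
         (when company-address (str "\n- Address: " company-address))
         (when company-vat-id (str "\n- VAT ID: " company-vat-id))
         (when company-country (str "\n- Country: " company-country)))))

(legal-entity-details-sketch
 {:legal-entity/name "Kunde AG"
  :legal-entity/company-country "DE"})
;; => "- Company name: Kunde AG\n- Country: DE"

;; Destructuring nil binds every key to nil, so nothing throws,
;; but the prompt then carries an empty company name.
(legal-entity-details-sketch nil)
;; => "- Company name: "
```

The second call is worth keeping in mind: if the pre-fetch in job-handler finds no row, contract extraction still runs, just with an empty company line in the prompt.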
In src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj:289-310, change accounting-system-instructions to read from the ingestion map:
(defn ^:private accounting-system-instructions
  "Enriches prompt-vars with accounting-system-specific variables from tenant config.
  Merges :accounting-system-instructions and :extra-rules into prompt-vars.
  Merges :line-item-output-fields into the :line-items entry of :output-schema.
  Reads integration type from legal_entity_datev_integration table and legal entity country
  for country-specific BU code rules (DE vs AT)."
  [{:keys [db-pool] :as _context}
   {:keys [document legal-entity] :as _ingestion}
   prompt-vars]
  (let [legal-entity-id (:document/legal-entity-id document)
        integration (db.sql/execute-one!
                     db-pool
                     {:select [:integration-type]
                      :from [:legal-entity-datev-integration]
                      :where [:and
                              [:= :legal-entity-id legal-entity-id]
                              [:= :is-active true]]})
        acct-vars (when (some-> integration
                                :legal-entity-datev-integration/integration-type
                                keyword
                                (= :datev))
                    (let [country (or (:legal-entity/company-country legal-entity) "DE")]
                      (datev-prompt-vars country)))]
    (cond-> (merge-with merge prompt-vars (dissoc acct-vars :line-item-output-fields))
      (:line-item-output-fields acct-vars)
      (update-in [:output-schema :line-items 0]
                 merge (:line-item-output-fields acct-vars)))))
The change: destructure :legal-entity from ingestion, read company-country from it directly, remove the second db.sql/execute-one! call.
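For readers unfamiliar with the merge at the end of this function, the sketch below replays it on made-up data. The shapes of `prompt-vars-sketch` and `acct-vars-sketch` are assumptions for illustration only (the real maps come from the prompt config and `datev-prompt-vars`); the point is how `merge-with merge`, the `"DE"` fallback, and the `update-in` on the first line-item schema behave.

```clojure
;; Assumed shapes, for illustration only.
(def prompt-vars-sketch
  {:output-schema {:line-items [{:description "string"}]}
   :extra-rules   {:base "rule"}})

(def acct-vars-sketch
  {:extra-rules             {:datev "BU code rule"}
   :line-item-output-fields {:bu-code "string"}})

(defn enrich-sketch
  "Same merge logic as the tail of accounting-system-instructions."
  [prompt-vars acct-vars]
  (cond-> (merge-with merge prompt-vars (dissoc acct-vars :line-item-output-fields))
    (:line-item-output-fields acct-vars)
    (update-in [:output-schema :line-items 0]
               merge (:line-item-output-fields acct-vars))))

(enrich-sketch prompt-vars-sketch acct-vars-sketch)
;; => {:output-schema {:line-items [{:description "string", :bu-code "string"}]}
;;     :extra-rules   {:base "rule", :datev "BU code rule"}}

;; With no active integration acct-vars is nil and prompt-vars passes through.
(enrich-sketch prompt-vars-sketch nil)
;; => prompt-vars-sketch, unchanged

;; Country fallback when the pre-fetched legal entity is missing or has no country.
(or (:legal-entity/company-country nil) "DE")
;; => "DE"
```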
Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Execution error|failed because|Ran .* tests)"
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion.clj src/com/getorcha/workers/ap/ingestion/extraction.clj src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj
git add src/com/getorcha/workers/ap/ingestion.clj src/com/getorcha/workers/ap/ingestion/extraction.clj src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj
git commit -m "refactor: pre-fetch legal entity in job-handler, remove redundant fetches (issue #335)"
Files:
Modify: src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj:40-46,163-287
Modify: src/com/getorcha/workers/ap/ingestion/post_process/common.clj:41-68
Step 1: Add recipient correction field paths
In src/com/getorcha/workers/ap/ingestion/post_process/common.clj, add to correction-field-paths (after the existing "recipient.country" entry at line 66):
"recipient.vat-id" [:recipient :vat-id]
"recipient.tax-id" [:recipient :tax-id]
"recipient.tax-id-type" [:recipient :tax-id-type]
("recipient.name", "recipient.address", and "recipient.country" already exist.)
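The new entries matter because corrections coming back from the vision model are keyed by dotted strings, and only keys present in the path table can be written into structured data. The sketch below shows one plausible way such a table is applied. `apply-corrections-sketch` is hypothetical: the real `apply-field-corrections` is not shown in this plan and may behave differently, for example in how it treats unknown keys.

```clojure
;; Subset of the path table, including the three entries added in this step.
(def correction-field-paths-sketch
  {"recipient.name"        [:recipient :name]
   "recipient.vat-id"      [:recipient :vat-id]
   "recipient.tax-id"      [:recipient :tax-id]
   "recipient.tax-id-type" [:recipient :tax-id-type]})

(defn apply-corrections-sketch
  "Hypothetical: writes each correction to its mapped path and
  silently ignores keys that are not in the table."
  [data corrections]
  (reduce-kv (fn [d field value]
               (if-let [path (get correction-field-paths-sketch field)]
                 (assoc-in d path value)
                 d))
             data
             corrections))

(apply-corrections-sketch
 {:recipient {:name "Kunde AC"}}
 {"recipient.name"   "Kunde AG"
  "recipient.vat-id" "DE123456789"
  "unknown.field"    "ignored"})
;; => {:recipient {:name "Kunde AG", :vat-id "DE123456789"}}
```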
In src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj, add after the iban-instruction def (after line 63):
(def ^:private recipient-identity-instruction
"**Recipient identity questions**:
The extracted recipient data does not match the legal entity master data.
Look at the actual recipient block on the attached PDF and verify each mismatched field.
If the PDF shows the correct legal entity data and the extraction had an OCR/transcription
error, provide corrections to fix the extracted data.
If the recipient on the PDF genuinely differs from the legal entity, confirm the mismatch.
Return: {\"recipient-identity\": {\"value\": \"pass\"|\"warning\", \"confidence\": 0.9,
\"reasoning\": \"...\", \"corrections\": {\"recipient.name\": \"...\"}}}
If confirming mismatch: {\"recipient-identity\": {\"value\": \"warning\", \"confidence\": 0.9,
\"reasoning\": \"The PDF clearly shows a different recipient name: ...\"}}")
-compute to include recipient-identity warnings
In uncertain_validations.clj, change the filter at lines 172-177:
uncertain-checks (when validation-results
                   (->> validation-results
                        (filter (fn [[k v]]
                                  (and (not= k :financial-math)
                                       (or (= "uncertain" (:status v))
                                           (and (= "warning" (:status v))
                                                (= k :recipient-identity))))))
                        (into {})))]
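To see which checks the widened filter lets through, here is the same predicate applied to a made-up `validation-results` map (the statuses are invented for illustration). Only `:recipient-identity` is picked up on a warning; other warnings, and anything under `:financial-math`, are still left alone.

```clojure
;; Invented sample results, one entry per interesting case.
(def validation-results-sketch
  {:financial-math     {:status "uncertain"}   ; always excluded
   :iban-format        {:status "uncertain"}   ; included (uncertain)
   :recipient-country  {:status "warning"}     ; excluded (warning, wrong key)
   :recipient-identity {:status "warning"}     ; included (new rule)
   :required-fields    {:status "pass"}})      ; excluded (pass)

(def uncertain-checks-sketch
  (->> validation-results-sketch
       (filter (fn [[k v]]
                 (and (not= k :financial-math)
                      (or (= "uncertain" (:status v))
                          (and (= "warning" (:status v))
                               (= k :recipient-identity))))))
       (into {})))

(set (keys uncertain-checks-sketch))
;; => #{:iban-format :recipient-identity}
```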
has-recipient-identity? flag and wire the instruction
In the (when (seq uncertain-checks) ...) block (lines 178-227), add the flag and update needs-pdf? and instructions:
has-recipient-identity? (contains? check-names :recipient-identity)
;; ... existing flags ...
needs-pdf? (or has-iban? has-required-fields? has-recipient-identity?)
instructions (string/join "\n\n"
                          (cond-> []
                            has-country? (conj country-instruction)
                            has-supplier-match? (conj supplier-match-instruction)
                            has-iban? (conj iban-instruction)
                            has-required-fields? (conj (required-fields-instruction missing-fields))
                            has-recipient-identity? (conj recipient-identity-instruction)))
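The flag wiring can be tried in isolation. The sketch below is a cut-down version with only two flags and placeholder instruction strings; it assumes `check-names` is a set of check keywords and that the IBAN check is keyed `:iban-format`, neither of which is confirmed by the lines shown here.

```clojure
(require '[clojure.string :as string])

(defn build-instructions-sketch
  "Hypothetical reduction of the flag logic to two checks.
  `check-names` is assumed to be a set of keywords."
  [check-names]
  (let [has-iban?               (contains? check-names :iban-format)
        has-recipient-identity? (contains? check-names :recipient-identity)]
    {:needs-pdf?   (or has-iban? has-recipient-identity?)
     :instructions (string/join "\n\n"
                                (cond-> []
                                  has-iban?               (conj "IBAN instruction")
                                  has-recipient-identity? (conj "Recipient identity instruction")))}))

(build-instructions-sketch #{:recipient-identity})
;; => {:needs-pdf? true, :instructions "Recipient identity instruction"}

(build-instructions-sketch #{:iban-format :recipient-identity})
;; => {:needs-pdf? true,
;;     :instructions "IBAN instruction\n\nRecipient identity instruction"}

(build-instructions-sketch #{})
;; => {:needs-pdf? false, :instructions ""}
```

A recipient-identity warning on its own is therefore enough to attach the PDF, which is what the instruction text requires ("Look at the actual recipient block on the attached PDF").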
:recipient-identity in -apply
In the -apply method (lines 229-286), add a clause in the reduce-kv body. The existing code has special handling for :required-fields (line 245) and a default branch (line 270). Add a clause for :recipient-identity before the default branch:
(if (= check-name :required-fields)
  ;; ... existing required-fields logic ...
  (if (= check-name :recipient-identity)
    (let [should-correct? (and (map? corrections)
                               (seq corrections)
                               (= value "pass"))
          d' (if should-correct?
               (apply-field-corrections d corrections)
               d)
          le (:legal-entity ingestion)
          fresh (validation/check-recipient-identity d' le)]
      (if (= "pass" (:status fresh))
        (assoc-in d' [:validation-results :recipient-identity]
                  {:status "pass"
                   :resolved-by :uncertain-validations-resolver
                   :confidence confidence
                   :reasoning reasoning
                   :fields-fixed (when should-correct? (keys corrections))})
        (assoc-in d' [:validation-results :recipient-identity]
                  {:status "warning"
                   :resolved-by :uncertain-validations-resolver
                   :confidence confidence
                   :reasoning reasoning
                   :message "Recipient mismatch confirmed by vision review"
                   :details (:details fresh)})))
    ;; ... existing default branch (cond-> assoc-in ...) ...
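The key property of this clause is that the model's verdict is never trusted directly: corrections are applied only when the model answers "pass", and the final status comes from re-running the deterministic check on the corrected data. The sketch below restates that flow as a stand-alone function so it can be run at a REPL. All names are hypothetical, and the corrector and re-check are passed in as stub functions rather than calling `apply-field-corrections` and `check-recipient-identity`.

```clojure
(defn resolve-recipient-identity-sketch
  "Hypothetical stand-alone version of the :recipient-identity clause."
  [data {:keys [value confidence reasoning corrections]} apply-corrections recheck]
  (let [should-correct? (boolean (and (map? corrections)
                                      (seq corrections)
                                      (= value "pass")))
        data'           (if should-correct? (apply-corrections data corrections) data)
        fresh           (recheck data')          ; deterministic re-check decides
        base            {:resolved-by :uncertain-validations-resolver
                         :confidence  confidence
                         :reasoning   reasoning}]
    (assoc-in data' [:validation-results :recipient-identity]
              (if (= "pass" (:status fresh))
                (assoc base
                       :status "pass"
                       :fields-fixed (when should-correct? (keys corrections)))
                (assoc base
                       :status "warning"
                       :message "Recipient mismatch confirmed by vision review"
                       :details (:details fresh))))))

;; Stubs: the re-check passes only for the exact master-data name.
(defn recheck-stub [data]
  (if (= "Kunde AG" (get-in data [:recipient :name]))
    {:status "pass"}
    {:status "warning" :details {:name :mismatch}}))

(defn apply-corrections-stub [data corrections]
  (assoc-in data [:recipient :name] (get corrections "recipient.name")))

(def misread {:recipient {:name "Kunde AC"}})

;; OCR error: model answers "pass" with a correction, re-check then passes.
(def fixed
  (resolve-recipient-identity-sketch
   misread
   {:value "pass" :confidence 0.9 :reasoning "OCR misread"
    :corrections {"recipient.name" "Kunde AG"}}
   apply-corrections-stub recheck-stub))

;; Genuine mismatch: model answers "warning", data stays untouched.
(def confirmed
  (resolve-recipient-identity-sketch
   misread
   {:value "warning" :confidence 0.9 :reasoning "Different recipient on PDF"}
   apply-corrections-stub recheck-stub))
```

One consequence of this design: if the model answers "pass" but supplies no corrections, the data is unchanged, the re-check still fails, and the result stays a warning.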
Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Execution error|failed because|Ran .* tests)"
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj src/com/getorcha/workers/ap/ingestion/post_process/common.clj
git add src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj src/com/getorcha/workers/ap/ingestion/post_process/common.clj
git commit -m "feat: extend UVR to resolve recipient identity warnings via vision (issue #335)"
Use the /ingestion-regression-test skill to verify that existing documents still produce the same structured data after these changes. This catches any unintended side effects from:
The validate signature change
The pre-fetched LE data flowing through the pipeline
The new :recipient-identity key appearing in validation-results
Step 2: Review new recipient-identity results in regression output
Every invoice document should now have a :recipient-identity entry in :validation-results. Verify:
Documents with matching recipients → {:status "pass"}
The details map contains the expected field statuses
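When the regression output covers many documents, a quick tally can make this review faster. The helper below is a hypothetical sketch: it assumes each regression result is available as a structured-data map carrying `:validation-results`, which may not match how the regression skill actually stores its output.

```clojure
(defn recipient-identity-summary
  "Hypothetical helper: counts :recipient-identity statuses across
  structured-data maps; documents without the entry count as :missing."
  [documents]
  (->> documents
       (map #(get-in % [:validation-results :recipient-identity :status] :missing))
       frequencies))

(recipient-identity-summary
 [{:validation-results {:recipient-identity {:status "pass"}}}
  {:validation-results {:recipient-identity {:status "pass"}}}
  {:validation-results {:recipient-identity {:status "warning"}}}
  {:validation-results {}}])
;; => {"pass" 2, "warning" 1, :missing 1}
```

Any `:missing` count for invoice documents would suggest the legal entity or the new check is not reaching `validate` for those documents.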
Step 3: Commit any regression baseline updates
If the regression baseline needs updating (because :recipient-identity is a new field in validation-results):
git add <regression baseline files>
git commit -m "test: update regression baseline with recipient-identity results (issue #335)"