Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Recipient Identity Validation Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Validate invoice recipient fields against legal entity (LE) master data, with a vision fallback through the uncertain-validations resolver (UVR) for mismatches.

Architecture: Pre-fetch LE data in job-handler and carry it in pipeline state. Add check-recipient-identity as a validation check. Extend UVR to resolve recipient identity warnings via vision.

Tech Stack: Clojure, HoneySQL, validation multimethod, UVR post-processor


Task 1: Write check-recipient-identity with tests

Files:

Add to test/com/getorcha/workers/ap/ingestion/validation_test.clj:

;; Recipient Identity Tests
;; -----------------------------------------------------------------------------

(def ^:private base-legal-entity
  "Legal entity master data for recipient identity tests."
  {:legal-entity/name            "Kunde AG"
   :legal-entity/company-address "Musterstr. 1, 80331 München"
   :legal-entity/company-vat-id  "DE123456789"
   :legal-entity/company-tax-id  "143/123/12345"
   :legal-entity/company-country "DE"})


(deftest test-check-recipient-identity
  (testing "all fields match → pass"
    (let [sd {:recipient {:name    "Kunde AG"
                          :address "Musterstr. 1, 80331 München"
                          :country "DE"
                          :tax-id-type "vat"
                          :tax-id  "DE123456789"}}]
      (is (= {:status "pass"
              :details {:name :match :vat-id :match :tax-id :skip
                        :address :match :country :match}}
             (validation/check-recipient-identity sd base-legal-entity)))))

  (testing "name mismatch → warning"
    (let [sd {:recipient {:name "Wrong Company GmbH"}}]
      (is (= "warning"
             (:status (validation/check-recipient-identity sd base-legal-entity))))
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :name])))))

  (testing "whitespace normalization — extra spaces still match"
    (let [sd {:recipient {:name "  Kunde   AG  "}}]
      (is (= "pass"
             (:status (validation/check-recipient-identity sd base-legal-entity))))))

  (testing "VAT ID mismatch → warning"
    (let [sd {:recipient {:name "Kunde AG"
                          :tax-id-type "vat"
                          :tax-id "AT999999999"}}]
      (is (= "warning"
             (:status (validation/check-recipient-identity sd base-legal-entity))))
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :vat-id])))))

  (testing "non-vat tax-id comparison"
    (let [sd {:recipient {:name "Kunde AG"
                          :tax-id-type "ein"
                          :tax-id "143/123/12345"}}]
      (is (= :match
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :tax-id])))))

  (testing "non-vat tax-id mismatch"
    (let [sd {:recipient {:name "Kunde AG"
                          :tax-id-type "ein"
                          :tax-id "999/999/99999"}}]
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :tax-id])))))

  (testing "address normalization — punctuation/case differences still match"
    (let [sd {:recipient {:name    "Kunde AG"
                          :address "musterstr 1 80331 münchen"}}]
      (is (= :match
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :address])))))

  (testing "address mismatch"
    (let [sd {:recipient {:name    "Kunde AG"
                          :address "Hauptstr. 99, 10115 Berlin"}}]
      (is (= :mismatch
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :address])))))

  (testing "invoice field blank → skip"
    (let [sd {:recipient {:name "Kunde AG"}}]
      (is (= "pass"
             (:status (validation/check-recipient-identity sd base-legal-entity))))
      (is (= :skip
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :address])))))

  (testing "LE field nil → skip"
    (let [le (assoc base-legal-entity :legal-entity/company-vat-id nil)
          sd {:recipient {:name "Kunde AG" :tax-id-type "vat" :tax-id "DE999999999"}}]
      (is (= :skip
             (get-in (validation/check-recipient-identity sd le)
                     [:details :vat-id])))))

  (testing "no recipient → pass"
    (is (= {:status "pass"
            :details {:name :skip :vat-id :skip :tax-id :skip
                      :address :skip :country :skip}}
           (validation/check-recipient-identity {} base-legal-entity))))

  (testing "nil legal entity → pass (all skip)"
    (let [sd {:recipient {:name "Anything"}}]
      (is (= "pass"
             (:status (validation/check-recipient-identity sd nil))))))

  (testing "country case normalization"
    (let [sd {:recipient {:name "Kunde AG" :country "de"}}]
      (is (= :match
             (get-in (validation/check-recipient-identity sd base-legal-entity)
                     [:details :country])))))

  (testing "multiple mismatches → all reported in details"
    (let [sd {:recipient {:name    "Wrong Inc"
                          :address "Wrong Street"
                          :country "AT"}}]
      (is (= {:name :mismatch :address :mismatch :country :mismatch
              :vat-id :skip :tax-id :skip}
             (:details (validation/check-recipient-identity sd base-legal-entity)))))))

Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.validation-test]' 2>&1 | grep -E "(FAIL|ERROR|Ran)"

Expected: Compilation error — check-recipient-identity does not exist.

Add to src/com/getorcha/workers/ap/ingestion/validation.clj, before the ;; Document Type Dispatch section (before line 845):

(defn ^:private normalize-whitespace
  "Collapse all whitespace runs to single space, trim."
  [s]
  (some-> s str/trim (str/replace #"\s+" " ")))


(defn ^:private normalize-address
  "Lowercase, strip punctuation, collapse whitespace."
  [s]
  (some-> s
          str/lower-case
          (str/replace #"[.,;:\-/()]" "")
          str/trim
          (str/replace #"\s+" " ")))


;; NOTE: `normalize-non-vat-tax-id` already exists at line 562 in this file.
;; It's private but in the same ns, so it's callable from here.


(defn check-recipient-identity
  "Compare invoice recipient fields against legal entity master data.
   Returns a validation result with per-field match/mismatch/skip details."
  [{:keys [recipient] :as _structured-data}
   legal-entity]
  (let [compare-field (fn [invoice-val le-val normalize-fn]
                        (cond
                          (or (nil? invoice-val) (str/blank? invoice-val)) :skip
                          (or (nil? le-val) (str/blank? le-val))           :skip
                          (= (normalize-fn invoice-val)
                             (normalize-fn le-val))                        :match
                          :else                                            :mismatch))
        ;; Resolve effective VAT ID and tax ID from recipient
        {:keys [tax-id-type tax-id vat-id]} recipient
        effective-vat-id (cond
                           (= tax-id-type "vat") tax-id
                           (seq vat-id)           vat-id
                           :else                  nil)
        effective-tax-id (when (and tax-id-type (not= tax-id-type "vat"))
                           tax-id)
        details {:name    (compare-field (:name recipient)
                                         (:legal-entity/name legal-entity)
                                         normalize-whitespace)
                 :vat-id  (compare-field effective-vat-id
                                         (:legal-entity/company-vat-id legal-entity)
                                         tax/normalize-vat-id)
                 :tax-id  (compare-field effective-tax-id
                                         (:legal-entity/company-tax-id legal-entity)
                                         normalize-non-vat-tax-id)
                 :address (compare-field (:address recipient)
                                         (:legal-entity/company-address legal-entity)
                                         normalize-address)
                 :country (compare-field (:country recipient)
                                         (:legal-entity/company-country legal-entity)
                                         str/upper-case)}]
    (if (some #{:mismatch} (vals details))
      {:status  "warning"
       :message "Recipient does not match legal entity master data"
       :details details}
      {:status  "pass"
       :details details})))

Important: normalize-non-vat-tax-id already exists as a private function at line 562 of this file. Private functions are callable from within their own namespace, so do NOT redefine it; just call it.
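To get a feel for what the two new normalizers accept as "the same", they can be tried in a scratch REPL. The sketch below copies normalize-whitespace and normalize-address from this task and makes them public for experimentation only; in validation.clj they stay private.

```clojure
(require '[clojure.string :as str])

;; Copies of the two normalizers from this task, public here so they can be
;; called from a scratch REPL (in validation.clj they are private).
(defn normalize-whitespace [s]
  (some-> s str/trim (str/replace #"\s+" " ")))

(defn normalize-address [s]
  (some-> s
          str/lower-case
          (str/replace #"[.,;:\-/()]" "")
          str/trim
          (str/replace #"\s+" " ")))

(normalize-whitespace "  Kunde   AG  ")
;; => "Kunde AG"

(normalize-address "Musterstr. 1, 80331 München")
;; => "musterstr 1 80331 münchen"

;; Punctuation is deleted, not replaced by a space, so a missing space after
;; the period yields a different string:
(normalize-address "Musterstr.1, 80331 München")
;; => "musterstr1 80331 münchen"
```

Two consequences are worth keeping in mind when reading warnings: name comparison is case-sensitive (only whitespace is normalized), and an address written as "Musterstr.1" will not match "Musterstr. 1". Both cases end up as :mismatch and are left for UVR to review in Task 4.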

Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.validation-test]' 2>&1 | grep -E "(FAIL|ERROR|Ran)"

Expected: All tests pass, including the new test-check-recipient-identity.

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/validation.clj test/com/getorcha/workers/ap/ingestion/validation_test.clj

Fix any issues.

git add src/com/getorcha/workers/ap/ingestion/validation.clj test/com/getorcha/workers/ap/ingestion/validation_test.clj
git commit -m "feat: add check-recipient-identity validation check (issue #335)"

Task 2: Change validate multimethod signature and wire LE data

Files:

In src/com/getorcha/workers/ap/ingestion/validation.clj:

Change the multimethod at line 848:

(defmulti validate
  "Run all deterministic validation checks on structured-data.

   Dispatches on the :document-type field in structured-data.
   Accepts an optional legal-entity map for recipient identity validation.

   Returns structured-data with :validation-results map keyed by check name.
   Each check result has :status (\"pass\", \"warning\", \"error\", or \"uncertain\")
   and optionally :field, :message, :details.

   Summary predicates can be derived on-site:
     (some #(= \"error\" (:status %)) (vals (:validation-results data)))"
  (fn [structured-data & _] (:document-type structured-data)))

Change the :default method at line 862:

(defmethod validate :default
  [structured-data & _]
  (throw (ex-info "Validation not implemented for document type"
                  {:kind          ::unsupported-document-type
                   :document-type (:document-type structured-data)})))

Change the "invoice" method at line 869:

(defmethod validate "invoice"
  [structured-data & [legal-entity]]
  (assoc structured-data
         :validation-results
         (cond-> {:financial-math       (check-financial-math structured-data)
                  :required-fields      (check-required-fields structured-data)
                  :tax-id-format        (check-tax-id-format structured-data)
                  :iban-format          (check-iban structured-data)
                  :date-reasonableness  (check-date-reasonableness structured-data)
                  :issuer-country       (check-issuer-country structured-data)
                  :recipient-country    (check-recipient-country structured-data)
                  :recipient-identity   (check-recipient-identity structured-data legal-entity)}
           (:summary-page-range structured-data)
           (assoc :large-document-summary-only (check-large-document-summary-only structured-data)))))

Change the "purchase-order" method at line 884:

(defmethod validate "purchase-order"
  [structured-data & _]
  (assoc structured-data
         :validation-results
         {:required-fields (if (and (seq (:po-number structured-data))
                                    (seq (get-in structured-data [:supplier :name])))
                             {:status "pass"}
                             {:status "error"
                              :message "Missing required fields: po-number and/or supplier name"})}))

Do the same & _ change for the "contract" and "goods-received-note" defmethods (around lines 1048 and 1060).

Change src/com/getorcha/workers/ap/ingestion.clj:336-347:

(defn ^:private with-validations
  "Runs deterministic validation checks on extracted structured-data.
   Adds :validation-results to structured-data. Does not reject invalid documents."
  [_context ingestion]
  (log/info "Running validation checks")
  (let [legal-entity (:legal-entity ingestion)
        ingestion'   (update ingestion :structured-data validation/validate legal-entity)
        results      (get-in ingestion' [:structured-data :validation-results])
        has-errors   (some #(= "error" (:status %)) (vals results))]
    (log/info "Validation completed"
              {:has-errors  has-errors
               :check-count (count results)})
    ingestion'))

In test/com/getorcha/workers/ap/ingestion/validation_test.clj, all existing calls to (validation/validate ...) pass one argument. With the & [legal-entity] signature they keep working unchanged: legal-entity binds to nil, and check-recipient-identity with a nil legal entity returns pass with all-skip details. No test changes are needed unless an existing test asserts on the exact set of keys in :validation-results for an invoice, which now also contains :recipient-identity.
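The backward-compatibility argument can be checked in isolation. The following is a minimal sketch with a hypothetical multimethod, validate-sketch, that mirrors only the argument shapes used in this task (variadic dispatch function, & [legal-entity] in the invoice method, & _ elsewhere); it is not the real validate.

```clojure
;; Hypothetical stand-in for `validate`; validate-sketch does not exist in the codebase.
(defmulti validate-sketch
  (fn [structured-data & _] (:document-type structured-data)))

(defmethod validate-sketch "invoice"
  [structured-data & [legal-entity]]
  ;; legal-entity is nil when the caller passes only structured-data
  (assoc structured-data :saw-legal-entity? (some? legal-entity)))

(defmethod validate-sketch :default
  [structured-data & _]
  (assoc structured-data :saw-legal-entity? :ignored))

;; One-argument call, as in the existing tests:
(validate-sketch {:document-type "invoice"})
;; => {:document-type "invoice", :saw-legal-entity? false}

;; Two-argument call, as in with-validations:
(validate-sketch {:document-type "invoice"} {:legal-entity/name "Kunde AG"})
;; => {:document-type "invoice", :saw-legal-entity? true}
```

One REPL caveat: defmulti does not replace the dispatch function of a multimethod that is already defined, so re-evaluating the changed validate form in a running REPL has no effect. Restarting the process, or unmapping the var with ns-unmap and reloading the namespace, picks up the new dispatch function.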

Verify by running: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.validation-test]' 2>&1 | grep -E "(FAIL|ERROR|Ran)"

Search for other callers:

grep -rn "validation/validate" src/ test/

If financial_validation.clj or uncertain_validations.clj call validate itself (not just individual check functions such as check-required-fields), update those calls too. From reading the code, they call the individual check functions directly, so no changes are expected; use the grep output above to confirm.

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/validation.clj src/com/getorcha/workers/ap/ingestion.clj

Fix any issues.

git add src/com/getorcha/workers/ap/ingestion/validation.clj src/com/getorcha/workers/ap/ingestion.clj
git commit -m "refactor: add legal-entity arg to validate multimethod (issue #335)"

Task 3: Pre-fetch legal entity in job-handler and remove redundant fetches

Files:

In src/com/getorcha/workers/ap/ingestion.clj, after line 831 ({:keys [document]} pipeline-state), add the LE fetch and merge it into pipeline-state:

(if-let [pipeline-state (claim-ingestion! context ingestion-id)]
  (let [{:keys [document]}                                   pipeline-state
        legal-entity    (db.sql/execute-one!
                         db-pool
                         {:select [:name :company-address :company-vat-id
                                   :company-tax-id :company-country]
                          :from   [:legal-entity]
                          :where  [:= :id (:document/legal-entity-id document)]})
        pipeline-state  (assoc pipeline-state :legal-entity legal-entity)
        {:keys [^ScheduledExecutorService heartbeat-scheduler]} worker-pools

This requires com.getorcha.db.sql to be required in the namespace. Check if it's already required — if not, add it.

In src/com/getorcha/workers/ap/ingestion/extraction.clj:1318-1333, change the contract extraction to read from the ingestion map:

(defmethod structured-data "contract"
  [{:keys [db-pool llm-config] :as _context}
   {{:keys [text]} :transcription-result :keys [document legal-entity] :as _ingestion}]
  (let [started-at      (java.time.Instant/now)
        extraction-cfg  (:extraction llm-config)
        legal-entity-id (:document/legal-entity-id document)
        {:legal-entity/keys [name company-address company-vat-id company-country]}
        legal-entity
        legal-entity-details
        (str "- Company name: " name
             (when company-address (str "\n- Address: " company-address))
             (when company-vat-id  (str "\n- VAT ID: " company-vat-id))
             (when company-country (str "\n- Country: " company-country)))

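The changes are: destructure :legal-entity from the ingestion map, bind its keys instead of the DB query result, and remove the db.sql/execute-one! call. If db-pool is then unused in this method, drop it from the context destructuring so clj-kondo does not flag an unused binding.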
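The destructuring above can be exercised on its own. This sketch pulls the string-building part into a hypothetical helper, legal-entity-details-sketch, purely for illustration (the plan keeps it inline in the "contract" defmethod), and shows what happens when no legal entity was pre-fetched.

```clojure
;; Hypothetical helper, extracted only for illustration.
(defn legal-entity-details-sketch
  [{:keys [legal-entity] :as _ingestion}]
  (let [{:legal-entity/keys [name company-address company-vat-id company-country]}
        legal-entity]
    ;; Optional lines are skipped when the field is nil.
    (str "- Company name: " name
         (when company-address (str "\n- Address: " company-address))
         (when company-vat-id  (str "\n- VAT ID: " company-vat-id))
         (when company-country (str "\n- Country: " company-country)))))

(legal-entity-details-sketch
 {:legal-entity {:legal-entity/name            "Kunde AG"
                 :legal-entity/company-country "DE"}})
;; => "- Company name: Kunde AG\n- Country: DE"

;; With no pre-fetched row, the name line is emitted with an empty value:
(legal-entity-details-sketch {})
;; => "- Company name: "
```

The second call is the case to watch: if the pre-fetch in job-handler returns nil, destructuring silently yields nils rather than failing. As far as this sketch can tell, that matches what the previous in-method query would have done for a missing row, but it is worth confirming against the existing tests.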

In src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj:289-310, change accounting-system-instructions to read from the ingestion map:

(defn ^:private accounting-system-instructions
  "Enriches prompt-vars with accounting-system-specific variables from tenant config.
   Merges :accounting-system-instructions and :extra-rules into prompt-vars.
   Merges :line-item-output-fields into the :line-items entry of :output-schema.
   Reads integration type from legal_entity_datev_integration table and legal entity country
   for country-specific BU code rules (DE vs AT)."
  [{:keys [db-pool] :as _context}
   {:keys [document legal-entity] :as _ingestion}
   prompt-vars]
  (let [legal-entity-id (:document/legal-entity-id document)
        integration     (db.sql/execute-one!
                         db-pool
                         {:select [:integration-type]
                          :from   [:legal-entity-datev-integration]
                          :where  [:and
                                   [:= :legal-entity-id legal-entity-id]
                                   [:= :is-active true]]})
        acct-vars       (when (some-> integration
                                      :legal-entity-datev-integration/integration-type
                                      keyword
                                      (= :datev))
                          (let [country (or (:legal-entity/company-country legal-entity) "DE")]
                            (datev-prompt-vars country)))]
    (cond-> (merge-with merge prompt-vars (dissoc acct-vars :line-item-output-fields))
      (:line-item-output-fields acct-vars)
      (update-in [:output-schema :line-items 0]
                 merge (:line-item-output-fields acct-vars)))))

The change: destructure :legal-entity from ingestion, read company-country from it directly, remove the second db.sql/execute-one! call.

Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Execution error|failed because|Ran .* tests)"

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion.clj src/com/getorcha/workers/ap/ingestion/extraction.clj src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj

git add src/com/getorcha/workers/ap/ingestion.clj src/com/getorcha/workers/ap/ingestion/extraction.clj src/com/getorcha/workers/ap/ingestion/post_process/tax_compliance.clj
git commit -m "refactor: pre-fetch legal entity in job-handler, remove redundant fetches (issue #335)"

Task 4: Extend UVR to resolve recipient identity warnings

Files:

In src/com/getorcha/workers/ap/ingestion/post_process/common.clj, add to correction-field-paths (after the existing "recipient.country" entry at line 66):

   "recipient.vat-id"         [:recipient :vat-id]
   "recipient.tax-id"         [:recipient :tax-id]
   "recipient.tax-id-type"    [:recipient :tax-id-type]

("recipient.name", "recipient.address", and "recipient.country" already exist.)
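Why the three entries matter: UVR returns corrections keyed by dotted field names, and those names have to resolve to paths in structured-data. The sketch below shows one way such a lookup table could be consumed. apply-corrections-sketch is hypothetical and is not the real apply-field-corrections, whose behavior for unknown keys should be checked in common.clj.

```clojure
;; Subset of correction-field-paths, including the three new entries.
(def correction-field-paths-sketch
  {"recipient.name"        [:recipient :name]
   "recipient.vat-id"      [:recipient :vat-id]
   "recipient.tax-id"      [:recipient :tax-id]
   "recipient.tax-id-type" [:recipient :tax-id-type]})

;; Hypothetical applier: writes each correction at its mapped path and
;; ignores field names that have no entry in the table (an assumption).
(defn apply-corrections-sketch
  [structured-data corrections]
  (reduce-kv (fn [sd field value]
               (if-let [path (correction-field-paths-sketch field)]
                 (assoc-in sd path value)
                 sd))
             structured-data
             corrections))

(apply-corrections-sketch
 {:recipient {:name "Kunde AG" :tax-id "DE12345678G"}}
 {"recipient.tax-id" "DE123456789"
  "issuer.unmapped"  "ignored"})
;; => {:recipient {:name "Kunde AG", :tax-id "DE123456789"}}
```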

In src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj, add after the iban-instruction def (after line 63):

(def ^:private recipient-identity-instruction
  "**Recipient identity questions**:
The extracted recipient data does not match the legal entity master data.
Look at the actual recipient block on the attached PDF and verify each mismatched field.
If the PDF shows the correct legal entity data and the extraction had an OCR/transcription
error, provide corrections to fix the extracted data.
If the recipient on the PDF genuinely differs from the legal entity, confirm the mismatch.
Return: {\"recipient-identity\": {\"value\": \"pass\"|\"warning\", \"confidence\": 0.9,
  \"reasoning\": \"...\", \"corrections\": {\"recipient.name\": \"...\"}}}
If confirming mismatch: {\"recipient-identity\": {\"value\": \"warning\", \"confidence\": 0.9,
  \"reasoning\": \"The PDF clearly shows a different recipient name: ...\"}}")

In uncertain_validations.clj, change the filter at lines 172-177:

          uncertain-checks                    (when validation-results
                                                (->> validation-results
                                                     (filter (fn [[k v]]
                                                               (and (not= k :financial-math)
                                                                    (or (= "uncertain" (:status v))
                                                                        (and (= "warning" (:status v))
                                                                             (= k :recipient-identity))))))
                                                     (into {})))]
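The effect of the widened filter is easiest to see on a sample map. This sketch lifts the predicate into a hypothetical function, needs-review?, and runs it over made-up validation results; only the predicate logic is taken from the change above.

```clojure
(defn needs-review?
  "True for uncertain checks and for :recipient-identity warnings;
   :financial-math is always excluded."
  [[check-name {:keys [status]}]]
  (and (not= check-name :financial-math)
       (or (= "uncertain" status)
           (and (= "warning" status)
                (= check-name :recipient-identity)))))

;; Made-up results for illustration.
(def sample-results
  {:financial-math     {:status "uncertain"}
   :iban-format        {:status "uncertain"}
   :issuer-country     {:status "warning"}
   :recipient-identity {:status "warning"}
   :required-fields    {:status "pass"}})

(into {} (filter needs-review?) sample-results)
;; => {:iban-format {:status "uncertain"}, :recipient-identity {:status "warning"}}
```

Other warnings, such as :issuer-country here, are still not sent to UVR; only :recipient-identity gets the special treatment.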

In the (when (seq uncertain-checks) ...) block (lines 178-227), add the flag and update needs-pdf? and instructions:

              has-recipient-identity? (contains? check-names :recipient-identity)
              ;; ... existing flags ...
              needs-pdf?           (or has-iban? has-required-fields? has-recipient-identity?)
              instructions         (string/join "\n\n"
                                                (cond-> []
                                                  has-country?              (conj country-instruction)
                                                  has-supplier-match?       (conj supplier-match-instruction)
                                                  has-iban?                 (conj iban-instruction)
                                                  has-required-fields?      (conj (required-fields-instruction missing-fields))
                                                  has-recipient-identity?   (conj recipient-identity-instruction)))

In the -apply method (lines 229-286), add a clause in the reduce-kv body. The existing code has special handling for :required-fields (line 245) and a default branch (line 270). Add a clause for :recipient-identity before the default branch:

             (if (= check-name :required-fields)
               ;; ... existing required-fields logic ...

               (if (= check-name :recipient-identity)
                 (let [should-correct? (and (map? corrections)
                                            (seq corrections)
                                            (= value "pass"))
                       d'              (if should-correct?
                                         (apply-field-corrections d corrections)
                                         d)
                       le              (:legal-entity ingestion)
                       fresh           (validation/check-recipient-identity d' le)]
                   (if (= "pass" (:status fresh))
                     (assoc-in d' [:validation-results :recipient-identity]
                               {:status      "pass"
                                :resolved-by :uncertain-validations-resolver
                                :confidence  confidence
                                :reasoning   reasoning
                                :fields-fixed (when should-correct? (keys corrections))})
                     (assoc-in d' [:validation-results :recipient-identity]
                               {:status      "warning"
                                :resolved-by :uncertain-validations-resolver
                                :confidence  confidence
                                :reasoning   reasoning
                                :message     "Recipient mismatch confirmed by vision review"
                                :details     (:details fresh)})))

                 ;; ... existing default branch (cond-> assoc-in ...) ...
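The key property of this clause is that the model's verdict is never taken at face value: corrections are applied only when it answers "pass", and the final status always comes from re-running the deterministic check. The sketch below reproduces that decision with simplified, hypothetical stand-ins (check-sketch compares only the name, and corrections are keyed directly by path instead of dotted field names).

```clojure
;; Hypothetical stand-in for check-recipient-identity; compares :name only.
(defn check-sketch [sd le]
  (if (= (get-in sd [:recipient :name]) (:legal-entity/name le))
    {:status "pass"}
    {:status "warning" :details {:name :mismatch}}))

(defn resolve-recipient-identity-sketch
  "Returns the final status after optionally applying corrections."
  [sd le {:keys [value corrections]}]
  (let [should-correct? (and (map? corrections)
                             (seq corrections)
                             (= value "pass"))
        sd'             (if should-correct?
                          (reduce-kv assoc-in sd corrections) ; keys are paths here
                          sd)]
    (:status (check-sketch sd' le))))

(def le {:legal-entity/name "Kunde AG"})
(def sd {:recipient {:name "Kunde AC"}}) ; OCR-style misread

;; Correction fixes the misread, so the recheck passes:
(resolve-recipient-identity-sketch
 sd le {:value "pass" :corrections {[:recipient :name] "Kunde AG"}})
;; => "pass"

;; Model says "pass" but supplies no corrections: recheck still warns.
(resolve-recipient-identity-sketch sd le {:value "pass"})
;; => "warning"

;; Model confirms the mismatch: corrections, if any, are not applied.
(resolve-recipient-identity-sketch
 sd le {:value "warning" :corrections {[:recipient :name] "Kunde AG"}})
;; => "warning"
```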

Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Execution error|failed because|Ran .* tests)"

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj src/com/getorcha/workers/ap/ingestion/post_process/common.clj

git add src/com/getorcha/workers/ap/ingestion/post_process/uncertain_validations.clj src/com/getorcha/workers/ap/ingestion/post_process/common.clj
git commit -m "feat: extend UVR to resolve recipient identity warnings via vision (issue #335)"

Task 5: Integration test with ingestion regression

Use the /ingestion-regression-test skill to verify that existing documents still produce the same structured data after these changes. This catches any unintended side effects from:

Every invoice document should now have a :recipient-identity entry in :validation-results. Verify:

If the regression baseline needs updating (because :recipient-identity is a new field in validation-results):

git add <regression baseline files>
git commit -m "test: update regression baseline with recipient-identity results (issue #335)"