Invoice ↔ GRN Direct Matching Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Enable direct matching between invoices and goods received notes (GRNs) without requiring a purchase order as intermediary.

Architecture: Add #{"invoice" "goods-received-note"} as a matchable pair. Add a gr-reference-exact evidence signal and extend date-within-period for GRN receipt dates. Add LLM prompt entries for both directions. Update the UI counterpart-types map.

Tech Stack: Clojure, clojure.test, HoneySQL

Design doc: docs/plans/2026-03-03-invoice-grn-matching-design.md


Task 1: Evidence — gr-reference-exact signal

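Files:

src/com/getorcha/workers/matching/evidence.clj
test/com/getorcha/workers/matching/evidence_test.clj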

Step 1: Write the failing tests

Add to evidence_test.clj:

(deftest gr-reference-exact-signal-test
  (testing "fires when invoice gr-references match GRN delivery-note-numbers"
    (let [{:keys [evidence]}
          (evidence/compute-score
           #:document{:type "invoice"
                      :structured-data {:gr-references ["50653" "2057422"]}}
           #:document{:type "goods-received-note"
                      :structured-data {:delivery-note-numbers ["50653"]}})]
      (is (some #(= :gr-reference-exact (:signal %)) evidence))))

  (testing "fires when invoice gr-references match GRN grn-number"
    (let [{:keys [evidence]}
          (evidence/compute-score
           #:document{:type "invoice"
                      :structured-data {:gr-references ["GRN-001"]}}
           #:document{:type "goods-received-note"
                      :structured-data {:grn-number "GRN-001"
                                        :delivery-note-numbers []}})]
      (is (some #(= :gr-reference-exact (:signal %)) evidence))))

  (testing "weight is 55"
    (let [{:keys [evidence]}
          (evidence/compute-score
           #:document{:type "invoice"
                      :structured-data {:gr-references ["DN-100"]}}
           #:document{:type "goods-received-note"
                      :structured-data {:delivery-note-numbers ["DN-100"]}})]
      (is (= 55 (:weight (first (filter #(= :gr-reference-exact (:signal %)) evidence)))))))

  (testing "does not fire when no references overlap"
    (let [{:keys [evidence]}
          (evidence/compute-score
           #:document{:type "invoice"
                      :structured-data {:gr-references ["50653"]}}
           #:document{:type "goods-received-note"
                      :structured-data {:delivery-note-numbers ["99999"]
                                        :grn-number "88888"}})]
      (is (not (some #(= :gr-reference-exact (:signal %)) evidence)))))

  (testing "does not fire when invoice has no gr-references"
    (let [{:keys [evidence]}
          (evidence/compute-score
           #:document{:type "invoice"
                      :structured-data {}}
           #:document{:type "goods-received-note"
                      :structured-data {:delivery-note-numbers ["50653"]}})]
      (is (not (some #(= :gr-reference-exact (:signal %)) evidence)))))

  (testing "works in both directions (GRN as doc-a, invoice as doc-b)"
    (let [{:keys [evidence]}
          (evidence/compute-score
           #:document{:type "goods-received-note"
                      :structured-data {:delivery-note-numbers ["50653"]}}
           #:document{:type "invoice"
                      :structured-data {:gr-references ["50653"]}})]
      (is (some #(= :gr-reference-exact (:signal %)) evidence)))))

Step 2: Run tests to verify they fail

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'

Expected: FAIL. The :gr-reference-exact signal does not exist yet, so the four positive assertions fail; the two "does not fire" tests already pass.

Step 3: Implement

In evidence.clj, add the signal weight to evidence-signals:

(def evidence-signals
  {:po-number-exact     60
   :contract-ref-exact  55
   :po-ref-exact        55
   :gr-reference-exact  55   ; GR reference on invoice matches GRN delivery note
   ;; ... rest unchanged

Add an extractor function after get-contract-refs:

(defn ^:private get-gr-references
  "Extract GR/delivery-note references from document as a set of non-nil strings.
   Invoices yield `:gr-references`. GRNs yield `:grn-number` + `:delivery-note-numbers`."
  [{:document/keys [type structured-data]}]
  (let [refs (case type
               "invoice"             (:gr-references structured-data)
               "goods-received-note" (into [(:grn-number structured-data)]
                                           (:delivery-note-numbers structured-data))
               [])]
    (into #{} (remove nil?) refs)))

Add the signal check in collect-signals, after the contract reference block:

    ;; GR reference match (set intersection)
    (let [gr-a (get-gr-references doc-a)
          gr-b (get-gr-references doc-b)
          gr-common (set/intersection gr-a gr-b)]
      (when (seq gr-common)
        (conj! signals {:signal :gr-reference-exact
                        :value  (str/join ", " (sort gr-common))
                        :weight (:gr-reference-exact evidence-signals)})))
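
To see the set-intersection check in isolation, the following sketch can be pasted into a REPL. It copies the extractor from this step under a public name and wraps the check in a hypothetical gr-reference-signal helper that returns the signal map instead of calling conj!. Both names are illustrative and not part of evidence.clj, and the weight is hard-coded to 55 on the assumption that it mirrors the evidence-signals entry above.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Standalone copy of the extractor from this step, public so it can be
;; called directly at the REPL.
(defn gr-references
  [{:document/keys [type structured-data]}]
  (let [refs (case type
               "invoice"             (:gr-references structured-data)
               "goods-received-note" (into [(:grn-number structured-data)]
                                           (:delivery-note-numbers structured-data))
               [])]
    ;; A missing :grn-number shows up as nil and is dropped here.
    (into #{} (remove nil?) refs)))

;; Hypothetical helper: returns the signal map, or nil when nothing overlaps.
(defn gr-reference-signal
  [doc-a doc-b]
  (let [common (set/intersection (gr-references doc-a) (gr-references doc-b))]
    (when (seq common)
      {:signal :gr-reference-exact
       :value  (str/join ", " (sort common))
       :weight 55}))) ; assumed to match evidence-signals

(def invoice
  #:document{:type "invoice"
             :structured-data {:gr-references ["50653" "2057422"]}})

(def grn
  #:document{:type "goods-received-note"
             :structured-data {:delivery-note-numbers ["50653"]}}) ; no :grn-number

(gr-references grn)
;; => #{"50653"}

(gr-reference-signal invoice grn)
;; => {:signal :gr-reference-exact, :value "50653", :weight 55}
```

A GRN without :grn-number still works because nil entries are removed before the intersection, and swapping the two arguments gives the same result, which is what the "works in both directions" test relies on.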

Step 4: Run tests to verify they pass

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'

Expected: PASS

Step 5: Commit

git add src/com/getorcha/workers/matching/evidence.clj test/com/getorcha/workers/matching/evidence_test.clj
git commit -m "feat: add gr-reference-exact evidence signal for invoice-GRN matching"

Task 2: Evidence — extend date-within-period for GRN receipt date

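Files:

src/com/getorcha/workers/matching/evidence.clj
test/com/getorcha/workers/matching/evidence_test.clj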

Step 1: Write the failing tests

Add to date-within-period-signal-test in evidence_test.clj:

  (testing "fires when GRN receipt-date falls within invoice service period"
    (let [{:keys [evidence]} (evidence/compute-score
                              #:document{:type "invoice"
                                         :structured-data {:service-period {:start "2025-06-12"
                                                                            :end   "2025-06-23"}}}
                              #:document{:type "goods-received-note"
                                         :structured-data {:receipt-date "2025-06-18"}})]
      (is (some #(= :date-within-period (:signal %)) evidence))))

  (testing "fires when GRN receipt-date equals service period boundary"
    (let [{:keys [evidence]} (evidence/compute-score
                              #:document{:type "goods-received-note"
                                         :structured-data {:receipt-date "2025-06-12"}}
                              #:document{:type "invoice"
                                         :structured-data {:service-period {:start "2025-06-12"
                                                                            :end   "2025-06-23"}}})]
      (is (some #(= :date-within-period (:signal %)) evidence))))

  (testing "does not fire when GRN receipt-date is outside invoice service period"
    (let [{:keys [evidence]} (evidence/compute-score
                              #:document{:type "invoice"
                                         :structured-data {:service-period {:start "2025-06-12"
                                                                            :end   "2025-06-23"}}}
                              #:document{:type "goods-received-note"
                                         :structured-data {:receipt-date "2025-07-01"}})]
      (is (not (some #(= :date-within-period (:signal %)) evidence)))))

  (testing "does not fire when GRN has no receipt-date"
    (let [{:keys [evidence]} (evidence/compute-score
                              #:document{:type "invoice"
                                         :structured-data {:service-period {:start "2025-06-12"
                                                                            :end   "2025-06-23"}}}
                              #:document{:type "goods-received-note"
                                         :structured-data {}})]
      (is (not (some #(= :date-within-period (:signal %)) evidence)))))
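
The tests above pass dates as ISO-8601 strings rather than date objects. A minimal sketch of why plain string comparison is enough for the inclusive range check: zero-padded YYYY-MM-DD strings sort lexicographically in the same order as the dates they represent. The within-period? helper is hypothetical and only illustrates the comparison; it is not a function in evidence.clj.

```clojure
;; Hypothetical helper: inclusive range check on ISO-8601 (YYYY-MM-DD)
;; date strings using clojure.core/compare.
(defn within-period?
  [date start end]
  (and (>= (compare date start) 0)
       (<= (compare date end) 0)))

(within-period? "2025-06-18" "2025-06-12" "2025-06-23") ; inside the period
;; => true

(within-period? "2025-06-12" "2025-06-12" "2025-06-23") ; on the start boundary
;; => true

(within-period? "2025-07-01" "2025-06-12" "2025-06-23") ; after the end
;; => false
```

This only holds when every date uses the same zero-padded format. A value such as "2025-6-3" would compare incorrectly, so the extraction step has to normalize dates before they reach the evidence code.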

Step 2: Run tests to verify they fail

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'

Expected: FAIL. The two "fires when" tests fail because the existing logic only handles invoice↔contract; the two "does not fire" tests already pass.

Step 3: Implement

In evidence.clj, add a get-receipt-date extractor after get-invoice-date:

(defn ^:private get-receipt-date
  "Extract receipt date from GRN as an ISO date string."
  [{:document/keys [type structured-data]}]
  (when (= type "goods-received-note")
    (:receipt-date structured-data)))

Extend the date-within-period block in collect-signals. Add a new case after the existing Case 3 (invoice-date within contract dates), before the closing paren of the cond:

        ;; Case 4: GRN receipt-date within invoice service period
        (let [receipt (or (get-receipt-date doc-a) (get-receipt-date doc-b))]
          (when (and svc-start svc-end receipt)
            (when (and (>= (compare receipt svc-start) 0)
                       (<= (compare receipt svc-end) 0))
              (conj! signals {:signal :date-within-period
                              :value  (str receipt " within " svc-start " to " svc-end)
                              :weight (:date-within-period evidence-signals)}))))

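Note: The snippet above shows the check in isolation. A let form cannot be dropped into a cond as written, because cond expects test/expression pairs. In the final code, receipt is bound in the outer let and the check becomes a fourth cond clause. Cases 1 to 3 all require contract-start, which is nil for an invoice↔GRN pair, so Case 4 is reached exactly when no contract dates are present. Restructure the date-within-period section as follows: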

    ;; Date within period
    (let [[svc-start svc-end]           (or (get-service-period doc-a) (get-service-period doc-b))
          [contract-start contract-end] (or (get-contract-dates doc-a) (get-contract-dates doc-b))
          inv-date                      (or (get-invoice-date doc-a) (get-invoice-date doc-b))
          receipt                       (or (get-receipt-date doc-a) (get-receipt-date doc-b))]
      (cond
        ;; Case 1: Full service period + full contract dates
        (and svc-start svc-end contract-start contract-end)
        (when (and (>= (compare svc-start contract-start) 0)
                   (<= (compare svc-end contract-end) 0))
          (conj! signals {:signal :date-within-period
                          :value  (str svc-start " to " svc-end " within " contract-start " to " contract-end)
                          :weight (:date-within-period evidence-signals)}))

        ;; Case 2: Full service period + open-ended contract
        (and svc-start svc-end contract-start (nil? contract-end))
        (when (>= (compare svc-start contract-start) 0)
          (conj! signals {:signal :date-within-period
                          :value  (str svc-start " to " svc-end " after " contract-start)
                          :weight (:date-within-period evidence-signals)}))

        ;; Case 3: Invoice date only + contract dates (with or without expiration)
        (and inv-date contract-start)
        (when (and (>= (compare inv-date contract-start) 0)
                   (or (nil? contract-end)
                       (<= (compare inv-date contract-end) 0)))
          (conj! signals {:signal :date-within-period
                          :value  (str inv-date " within " contract-start " to " (or contract-end "open-ended"))
                          :weight (:date-within-period evidence-signals)}))

        ;; Case 4: GRN receipt-date within invoice service period
        (and svc-start svc-end receipt)
        (when (and (>= (compare receipt svc-start) 0)
                   (<= (compare receipt svc-end) 0))
          (conj! signals {:signal :date-within-period
                          :value  (str receipt " within " svc-start " to " svc-end)
                          :weight (:date-within-period evidence-signals)}))))
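
The ordering of the cond clauses matters, because cond stops at the first truthy test even when that clause's body returns nil. The sketch below is a simplified, hypothetical mirror of the cond above that only reports which case would be selected for a given set of extracted dates. The selected-case name and the flat input map are assumptions made for illustration.

```clojure
;; Hypothetical mirror of the cond tests only; the bodies are replaced by
;; keywords so the selected branch is visible.
(defn selected-case
  [{:keys [svc-start svc-end contract-start contract-end inv-date receipt]}]
  (cond
    (and svc-start svc-end contract-start contract-end)        :case-1
    (and svc-start svc-end contract-start (nil? contract-end)) :case-2
    (and inv-date contract-start)                              :case-3
    (and svc-start svc-end receipt)                            :case-4
    :else                                                      nil))

;; Invoice vs. GRN: no contract dates, so Cases 1 to 3 are skipped.
(selected-case {:svc-start "2025-06-12" :svc-end "2025-06-23"
                :inv-date  "2025-06-25" :receipt "2025-06-18"})
;; => :case-4

;; Invoice vs. contract: behaviour is unchanged by the new clause.
(selected-case {:svc-start      "2025-06-12" :svc-end      "2025-06-23"
                :contract-start "2025-01-01" :contract-end "2025-12-31"})
;; => :case-1

;; GRN without a receipt date: nothing is selected.
(selected-case {:svc-start "2025-06-12" :svc-end "2025-06-23"})
;; => nil
```

If contract dates and a receipt date were ever present in the same comparison, Case 4 would not be reached. With two-document comparisons restricted to the matchable pairs in this plan, that combination should not occur.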

Step 4: Run tests to verify they pass

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'

Expected: PASS

Step 5: Commit

git add src/com/getorcha/workers/matching/evidence.clj test/com/getorcha/workers/matching/evidence_test.clj
git commit -m "feat: extend date-within-period signal for GRN receipt-date within invoice service period"

Task 3: LLM prompts for invoice ↔ GRN

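Files:

src/com/getorcha/workers/matching/llm_decision.clj
test/com/getorcha/workers/matching/llm_decision_test.clj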

Step 1: Write the failing tests

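Add two new testing blocks inside the existing build-match-prompt-pair-specific-test in llm_decision_test.clj. The blocks assume invoice-doc, grn-doc, and candidates-fn are already bound in that test; add a grn-doc fixture if one is missing: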

    (testing "invoice → goods-received-note: line-item prompt with GR references"
      (let [{:keys [system user]} (#'llm-decision/build-match-prompt
                                   nil nil invoice-doc "goods-received-note" (candidates-fn grn-doc))]
        (is (str/includes? (str/lower-case system) "invoice"))
        (is (str/includes? (str/lower-case system) "goods received note"))
        (is (str/includes? (str/lower-case system) "gr reference"))
        (is (str/includes? (str/lower-case user) "multiple matches"))))

    (testing "goods-received-note → invoice: line-item prompt"
      (let [{:keys [system user]} (#'llm-decision/build-match-prompt
                                   nil nil grn-doc "invoice" (candidates-fn invoice-doc))]
        (is (str/includes? (str/lower-case system) "goods received note"))
        (is (str/includes? (str/lower-case system) "invoice"))
        (is (str/includes? (str/lower-case system) "delivery note"))))

Step 2: Run tests to verify they fail

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.llm-decision-test]'

Expected: FAIL. The lookup falls back to the generic prompt, which mentions neither "gr reference" nor "delivery note".

Step 3: Implement

Add two entries to pair-prompts in llm_decision.clj, after the ["purchase-order" "goods-received-note"] entry:

   ["invoice" "goods-received-note"]
   {:system "You are a financial document matching assistant.
Your task is to match an invoice to goods received notes by comparing line items and references.
An invoice can reference multiple GRNs — each GRN represents a separate delivery that contributes to the invoice total.
Focus on: GR reference numbers on the invoice matching delivery note numbers on GRNs, product descriptions, quantities (GRN quantities should aggregate to invoice line item quantities across all matched GRNs), and whether delivery dates fall within the invoice service period."
    :task "Match all candidates whose delivery note numbers or line items correspond to the source invoice.
Multiple matches are expected and encouraged when justified. Partial matches count."}

   ["goods-received-note" "invoice"]
   {:system "You are a financial document matching assistant.
Your task is to match a goods received note to invoices by comparing line items and references.
A GRN typically corresponds to one invoice, but could appear on multiple invoices.
Focus on: whether the GRN delivery note number appears in the invoice GR references, product descriptions, quantities (GRN quantity should be part of the invoice line item total), and whether the delivery date falls within the invoice service period."
    :task "Match all candidates whose GR references or line items correspond to the source goods received note.
Multiple matches are expected and encouraged when justified. Partial matches count."}
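
Two entries are needed because the prompt keys are ordered vectors, so ["invoice" "goods-received-note"] and ["goods-received-note" "invoice"] are distinct keys. The sketch below illustrates that lookup with a generic fallback. The prompt-for function, the generic-prompt var, and the shortened prompt strings are hypothetical stand-ins; the real lookup in llm_decision.clj may be shaped differently.

```clojure
;; Hypothetical stand-ins for illustration only.
(def generic-prompt
  {:system "You are a financial document matching assistant."
   :task   "Match all candidates that correspond to the source document."})

(def pair-prompts
  {["invoice" "goods-received-note"]
   {:system "Match an invoice to goods received notes (GR reference numbers)."
    :task   "Multiple matches are expected."}

   ["goods-received-note" "invoice"]
   {:system "Match a goods received note to invoices (delivery note number)."
    :task   "Multiple matches are expected."}})

(defn prompt-for
  "Looks up the prompt for an ordered [source target] pair, falling back to
   the generic prompt when no pair-specific entry exists."
  [source-type target-type]
  (get pair-prompts [source-type target-type] generic-prompt))

;; Direction matters: each ordering resolves to its own entry.
(:system (prompt-for "invoice" "goods-received-note"))
;; => "Match an invoice to goods received notes (GR reference numbers)."

;; A pair without an entry gets the generic prompt.
(= generic-prompt (prompt-for "contract" "goods-received-note"))
;; => true
```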

Step 4: Run tests to verify they pass

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.llm-decision-test]'

Expected: PASS

Step 5: Commit

git add src/com/getorcha/workers/matching/llm_decision.clj test/com/getorcha/workers/matching/llm_decision_test.clj
git commit -m "feat: add LLM prompt entries for invoice-GRN matching"

Task 4: Register matchable pair and update UI

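Files:

src/com/getorcha/workers/matching/candidates.clj
src/com/getorcha/erp/http/documents/view/shared.clj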

Step 1: Update matchable-pairs in candidates.clj

Change:

(def ^:private matchable-pairs
  "Valid document type pairs for matching."
  #{#{"invoice" "purchase-order"}
    #{"invoice" "contract"}
    #{"purchase-order" "contract"}
    #{"goods-received-note" "purchase-order"}})

To:

(def ^:private matchable-pairs
  "Valid document type pairs for matching."
  #{#{"invoice" "purchase-order"}
    #{"invoice" "contract"}
    #{"invoice" "goods-received-note"}
    #{"purchase-order" "contract"}
    #{"goods-received-note" "purchase-order"}})
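
Because matchable-pairs is a set of sets, one new entry covers both directions. A minimal sketch of a direction-independent membership check follows; the matchable? predicate is hypothetical, and the actual lookup in candidates.clj may be written differently.

```clojure
(def matchable-pairs
  #{#{"invoice" "purchase-order"}
    #{"invoice" "contract"}
    #{"invoice" "goods-received-note"}
    #{"purchase-order" "contract"}
    #{"goods-received-note" "purchase-order"}})

;; Hypothetical predicate. (set [a b]) is used instead of the literal #{a b}
;; because a set literal throws on duplicate keys when both types are equal.
(defn matchable?
  [type-a type-b]
  (contains? matchable-pairs (set [type-a type-b])))

(matchable? "invoice" "goods-received-note")
;; => true

(matchable? "goods-received-note" "invoice")
;; => true

(matchable? "contract" "goods-received-note")
;; => false
```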

Step 2: Update counterpart-types in view/shared.clj

Change:

(def ^:private counterpart-types
  "For each document type, the ordered list of counterpart types to show in the
   Matches section. Only directly matchable pairs (per candidates/matchable-pairs)."
  {:invoice             [:contract :purchase-order]
   :contract            [:invoice :purchase-order]
   :purchase-order      [:invoice :contract :goods-received-note]
   :goods-received-note [:purchase-order]})

To:

(def ^:private counterpart-types
  "For each document type, the ordered list of counterpart types to show in the
   Matches section. Only directly matchable pairs (per candidates/matchable-pairs)."
  {:invoice             [:contract :purchase-order :goods-received-note]
   :contract            [:invoice :purchase-order]
   :purchase-order      [:invoice :contract :goods-received-note]
   :goods-received-note [:purchase-order :invoice]})
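
The docstring states that counterpart-types should list only directly matchable pairs, so the two definitions have to stay in sync. The following sketch shows one way to check that at the REPL, using public copies of both definitions after this task's changes. The ui-pairs and symmetric? names are hypothetical and not part of the codebase.

```clojure
(def matchable-pairs
  #{#{"invoice" "purchase-order"}
    #{"invoice" "contract"}
    #{"invoice" "goods-received-note"}
    #{"purchase-order" "contract"}
    #{"goods-received-note" "purchase-order"}})

(def counterpart-types
  {:invoice             [:contract :purchase-order :goods-received-note]
   :contract            [:invoice :purchase-order]
   :purchase-order      [:invoice :contract :goods-received-note]
   :goods-received-note [:purchase-order :invoice]})

;; Unordered pairs implied by the UI map, as sets of type-name strings.
(def ui-pairs
  (set (for [[doc-type counterparts] counterpart-types
             counterpart             counterparts]
         (hash-set (name doc-type) (name counterpart)))))

;; True when every listed counterpart lists the document type back.
(def symmetric?
  (every? (fn [[doc-type counterparts]]
            (every? #(some #{doc-type} (counterpart-types %)) counterparts))
          counterpart-types))

(= ui-pairs matchable-pairs)
;; => true

symmetric?
;; => true
```

With the pre-change counterpart-types map, ui-pairs would lack the invoice and goods-received-note pair, so the equality check would return false once matchable-pairs has been updated.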

Step 3: Run all matching tests

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test com.getorcha.workers.matching.llm-decision-test]'

Expected: PASS

Step 4: Commit

git add src/com/getorcha/workers/matching/candidates.clj src/com/getorcha/erp/http/documents/view/shared.clj
git commit -m "feat: register invoice-GRN as matchable pair and update UI counterpart types"

Task 5: Lint and full test suite

Step 1: Run linter

Run: clj-kondo --lint src test dev

Expected: No errors. Fix any that appear.

Step 2: Run full matching test suite

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test com.getorcha.workers.matching.llm-decision-test com.getorcha.workers.matching.normalize-test com.getorcha.workers.matching.searchable-text-test com.getorcha.workers.matching.core-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: All tests pass.

Step 3: Commit if lint/tests required fixes

Only if changes were needed.


Task 6: Manual verification with ingested documents

Retrigger matching for the 5 ATP1E4CT documents (one invoice and four GRNs) and verify they cluster correctly.

Step 1: Reset matching status

psql -h localhost -U postgres -d orcha -c "UPDATE document SET matching_status = 'pending', cluster_id = NULL WHERE file_original_name LIKE 'ATP1E4CT%'"

Also delete any existing match edges between these documents:

psql -h localhost -U postgres -d orcha -c "DELETE FROM document_match WHERE document_a_id IN (SELECT id FROM document WHERE file_original_name LIKE 'ATP1E4CT%') OR document_b_id IN (SELECT id FROM document WHERE file_original_name LIKE 'ATP1E4CT%')"

Step 2: Trigger matching via REPL or SQS

Send matching messages for each document. The exact method depends on the running system — either via nREPL (reset) + SQS message, or by calling match-document! directly from the REPL.

Step 3: Verify results

psql -h localhost -U postgres -d orcha -c "
SELECT d1.file_original_name AS doc_a,
       d2.file_original_name AS doc_b,
       dm.blended_score,
       dm.match_method,
       dm.evidence
FROM document_match dm
JOIN document d1 ON dm.document_a_id = d1.id
JOIN document d2 ON dm.document_b_id = d2.id
WHERE d1.file_original_name LIKE 'ATP1E4CT%'
   OR d2.file_original_name LIKE 'ATP1E4CT%'
ORDER BY dm.blended_score DESC"

Expected: The invoice matches all 4 GRNs, and each match has gr-reference-exact in its evidence. Matches that combine a reference match with description overlap should score above 0.70 and be auto-matched by the rule-based method.

Step 4: Verify UI

Open the invoice detail page in the browser. The Matches section should show all 4 GRNs. Open a GRN detail page — it should show the invoice as a match.