Many-to-Many Document Matching Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Allow the matching system to create multiple matches per document type group instead of picking a single best match.

Architecture: Two-tier decision logic in decide-matches. Every candidate at or above the high threshold is auto-matched (rule-based); uncertain-zone candidates (up to 50, highest scores first) are sent to the LLM for a per-candidate yes/no evaluation. The LLM prompt is reframed from "pick the best" to "evaluate each candidate independently."
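
The two tiers described above can be sketched in isolation as follows. This is only an illustration, not the production code: `split-tiers`, `thresholds`, and `max-uncertain` are hypothetical names, the 0.70 high threshold is inferred from the test notes in Task 1, and the 0.40 low threshold is an assumed value.

```clojure
;; Hypothetical thresholds for illustration only. The plan implies a high
;; threshold of 0.70; the low value is an assumption.
(def thresholds {:low 0.40 :high 0.70})

;; Mirrors the cap of 50 described in the architecture note.
(def max-uncertain 50)

(defn split-tiers
  "Splits scored candidates into :auto (score >= high) and :uncertain
   (low <= score < high, best first, capped). Anything below low is dropped."
  [{:keys [low high]} candidates]
  {:auto      (filterv #(>= (:score %) high) candidates)
   :uncertain (->> candidates
                   (filter #(and (>= (:score %) low) (< (:score %) high)))
                   (sort-by :score >)
                   (take max-uncertain)
                   vec)})

(split-tiers thresholds
             [{:id :a :score 0.85}
              {:id :b :score 0.75}
              {:id :c :score 0.55}
              {:id :d :score 0.20}])
;; => {:auto [{:id :a, :score 0.85} {:id :b, :score 0.75}]
;;     :uncertain [{:id :c, :score 0.55}]}
```

Both tiers are computed from the same candidate list, so a single document can produce rule-based and LLM matches in the same pass.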

Tech Stack: Clojure, Malli, LLM integration (existing).

Design Doc: docs/plans/2026-03-01-many-to-many-matching-design.md


Task 1: Update decide-matches Tests for Many-to-Many

Files: test/com/getorcha/workers/matching/core_test.clj

Step 1: Update existing tests to reflect new behavior

The test "multiple high-confidence candidates without LLM config picks top high candidate" (line 63) currently asserts only 1 match is returned. Under many-to-many, both should be auto-matched.

The test "multiple high-confidence candidates with LLM config uses LLM" (line 76) currently sends both candidates to the LLM. Under many-to-many, both score above the 0.70 high threshold, so they should be auto-matched without calling the LLM.

The test "sends only top 3 candidates to LLM when multiple exist" (line 134) currently asserts that exactly 3 candidates are sent. Under many-to-many, all 5 candidates (all in the uncertain zone, 0.45–0.65) should go to the LLM: the take 3 limit is replaced by the much larger cap of 50.

Replace the decide-matches-test and decide-matches-sends-top-3-to-llm-test deftest forms with:

(deftest decide-matches-test
  (testing "single high-confidence candidate matches without LLM"
    (let [candidates [{:doc   {:document/id (random-uuid) :document/type "purchase-order"}
                       :score 0.85
                       :evidence [{:signal :po-number-exact :value "PO-001" :weight 60}]}]
          result     (matching/decide-matches nil nil candidates)]
      (is (= 1 (count result)))
      (is (= "rule-based" (:match-method (first result))))
      (is (= 0.85 (:score (first result))))))

  (testing "no candidates returns empty"
    (is (empty? (matching/decide-matches nil nil []))))

  (testing "no candidates above low threshold returns empty"
    (let [candidates [{:doc   {:document/id (random-uuid)}
                       :score 0.20
                       :evidence []}]]
      (is (empty? (matching/decide-matches nil nil candidates)))))

  (testing "multiple high-confidence candidates all auto-matched"
    (let [candidates [{:doc {:document/id (random-uuid) :document/type "purchase-order"}
                       :score 0.80
                       :evidence [{:signal :po-number-exact}]}
                      {:doc {:document/id (random-uuid) :document/type "purchase-order"}
                       :score 0.75
                       :evidence [{:signal :vat-id-match}]}]
          result     (matching/decide-matches nil nil candidates)]
      (is (= 2 (count result)))
      (is (every? #(= "rule-based" (:match-method %)) result))))

  (testing "multiple high-confidence candidates auto-matched even with LLM config"
    (let [candidates [{:doc {:document/id (random-uuid) :document/type "purchase-order"}
                       :score 0.80
                       :evidence [{:signal :po-number-exact}]}
                      {:doc {:document/id (random-uuid) :document/type "purchase-order"}
                       :score 0.75
                       :evidence [{:signal :vat-id-match}]}]
          llm-config {:provider :anthropic :api-key "test" :model "test"}
          result     (matching/decide-matches llm-config nil candidates)]
      (is (= 2 (count result)))
      (is (every? #(= "rule-based" (:match-method %)) result))))

  (testing "uncertain-zone candidates sent to LLM"
    (let [source-doc  #:document{:type "invoice"
                                 :structured-data {:issuer {:name "ACME"}}}
          candidates  [{:doc #:document{:type "purchase-order"
                                        :structured-data {:supplier {:name "ACME"}}}
                        :score 0.55
                        :evidence [{:signal :vat-id-match}]}]
          llm-config  {:provider :anthropic :api-key "test" :model "test"}
          llm-result  {:matches [{:candidate 1 :confidence "medium" :reasoning "Supplier matches"}]}
          result      (with-redefs [com.getorcha.workers.matching.llm-decision/llm-match-decision
                                    (fn [_cfg _src _cands] llm-result)]
                        (matching/decide-matches llm-config source-doc candidates))]
      (is (= 1 (count result)))
      (is (= "llm" (:match-method (first result))))))

  (testing "uncertain-zone candidates without LLM config returns empty"
    (let [candidates [{:doc {:document/id (random-uuid)} :score 0.55 :evidence []}]]
      (is (empty? (matching/decide-matches nil nil candidates)))))

  (testing "mixed tiers: high auto-matched + uncertain sent to LLM"
    (let [source-doc  #:document{:type "invoice"
                                 :structured-data {:issuer {:name "ACME"}}}
          high-id     (random-uuid)
          uncertain-id (random-uuid)
          candidates  [{:doc {:document/id high-id :document/type "purchase-order"
                              :document/structured-data {:supplier {:name "ACME"}}}
                        :score 0.85
                        :evidence [{:signal :po-number-exact}]}
                       {:doc {:document/id uncertain-id :document/type "purchase-order"
                              :document/structured-data {:supplier {:name "ACME"}}}
                        :score 0.55
                        :evidence [{:signal :vat-id-match}]}]
          llm-config  {:provider :anthropic :api-key "test" :model "test"}
          llm-result  {:matches [{:candidate 1 :confidence "high" :reasoning "matches"}]}
          result      (with-redefs [com.getorcha.workers.matching.llm-decision/llm-match-decision
                                    (fn [_cfg _src cands]
                                      ;; LLM should only receive the uncertain candidate
                                      (is (= 1 (count cands)))
                                      (is (= uncertain-id (:document/id (:doc (first cands)))))
                                      llm-result)]
                        (matching/decide-matches llm-config source-doc candidates))]
      (is (= 2 (count result)))
      (is (= #{"rule-based" "llm"} (set (map :match-method result))))))

  (testing "LLM with invalid candidate index (0) is safely ignored"
    (let [source-doc #:document{:type "invoice" :structured-data {}}
          candidates [{:doc #:document{:type "purchase-order" :structured-data {}}
                       :score 0.55 :evidence []}]
          llm-config {:provider :anthropic}
          llm-result {:matches [{:candidate 0 :confidence "high" :reasoning "bad index"}]}
          result     (with-redefs [com.getorcha.workers.matching.llm-decision/llm-match-decision
                                   (fn [_cfg _src _cands] llm-result)]
                       (matching/decide-matches llm-config source-doc candidates))]
      (is (empty? result)))))


(deftest decide-matches-sends-all-uncertain-to-llm-test
  (testing "sends all uncertain-zone candidates to LLM (no top-3 cap)"
    (let [candidates (mapv #(hash-map :doc {:document/id (random-uuid)}
                                      :score %
                                      :evidence [])
                           [0.65 0.60 0.55 0.50 0.45])
          llm-calls  (atom [])
          llm-config {:matching {:provider :test}}]
      (with-redefs [com.getorcha.workers.matching.llm-decision/llm-match-decision
                    (fn [_ _ cands]
                      (reset! llm-calls cands)
                      {:matches [{:candidate 1 :confidence "high" :reasoning "test"}]})]
        (matching/decide-matches llm-config {} candidates)
        ;; LLM should receive all 5, not top 3
        (is (= 5 (count @llm-calls)))))))

Step 2: Run tests to verify they fail

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test]'
Expected: Multiple failures, because decide-matches still has the old single-match logic.

Step 3: Commit failing tests

git add test/com/getorcha/workers/matching/core_test.clj
git commit -m "test: update decide-matches tests for many-to-many matching"

Task 2: Rewrite decide-matches for Many-to-Many

Files: src/com/getorcha/workers/matching/core.clj

Step 1: Rewrite decide-matches

Replace the decide-matches function (lines 67–93) with:

(def ^:private max-uncertain-for-llm
  "Maximum number of uncertain-zone candidates to send to the LLM."
  50)


(defn decide-matches
  "Decide which candidates to match based on scores.

   Two-tier approach:
   - All candidates >= high threshold → auto-match (rule-based)
   - Candidates in uncertain zone (low..high) → LLM decides (if config provided)

   Both tiers produce matches simultaneously. Uncertain-zone candidates
   are capped at `max-uncertain-for-llm` (sorted by score descending).

   Returns seq of `{:doc :score :evidence :match-method}`."
  [llm-config source-doc candidates]
  (when (seq candidates)
    (let [;; Assumes match-thresholds carries a :low key alongside :high.
          {:keys [low high]} evidence/match-thresholds
          rule-matches (->> candidates
                            (filter #(>= (:score %) high))
                            (mapv #(assoc % :match-method "rule-based")))
          ;; Uncertain zone is low <= score < high. Sort best-first so the cap
          ;; drops the weakest candidates, as the docstring promises.
          batch        (->> candidates
                            (filter #(and (>= (:score %) low) (< (:score %) high)))
                            (sort-by :score >)
                            (take max-uncertain-for-llm)
                            vec)
          llm-matches  (when (and llm-config (seq batch))
                         (resolve-llm-matches
                          (llm-decision/llm-match-decision llm-config source-doc batch)
                          batch))]
      (into rule-matches llm-matches))))
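
The existing resolve-llm-matches helper is not shown in this plan, but the test "LLM with invalid candidate index (0) is safely ignored" depends on how it maps the LLM's 1-indexed answers back onto the batch. A minimal sketch of that mapping, assuming the helper behaves roughly like this (the name `resolve-llm-matches-sketch` and the `:llm-confidence` / `:llm-reasoning` keys are hypothetical):

```clojure
(defn resolve-llm-matches-sketch
  "Maps 1-indexed LLM verdicts back onto the batch that was sent.
   Verdicts whose index falls outside 1..(count batch) are dropped."
  [llm-result batch]
  (let [batch (vec batch)
        n     (count batch)]
    (into []
          (keep (fn [{:keys [candidate confidence reasoning]}]
                  ;; Guard against 0, negative, too-large, or non-integer indices.
                  (when (and (integer? candidate) (<= 1 candidate n))
                    (assoc (nth batch (dec candidate))
                           :match-method "llm"
                           :llm-confidence confidence
                           :llm-reasoning reasoning))))
          (:matches llm-result))))

(resolve-llm-matches-sketch
 {:matches [{:candidate 0 :confidence "high" :reasoning "bad index"}
            {:candidate 1 :confidence "medium" :reasoning "Supplier matches"}]}
 [{:doc {:document/id 42} :score 0.55 :evidence []}])
;; => [{:doc {:document/id 42}, :score 0.55, :evidence [], :match-method "llm",
;;      :llm-confidence "medium", :llm-reasoning "Supplier matches"}]
```

Because the indices refer to positions in the batch, the same batch that was sent to the LLM must be passed to the resolver, which is why decide-matches reuses `batch` for both calls.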

Step 2: Run tests to verify they pass

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test]'
Expected: All tests pass.

Step 3: Commit

git add src/com/getorcha/workers/matching/core.clj
git commit -m "feat: rewrite decide-matches for many-to-many matching

Auto-match all candidates above high threshold. Send uncertain-zone
candidates (up to 50) to LLM for per-candidate evaluation."

Task 3: Update LLM Prompt for Per-Candidate Evaluation

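Files: src/com/getorcha/workers/matching/llm_decision.clj, test/com/getorcha/workers/matching/llm_decision_test.clj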

Step 1: Write a test for the new prompt wording

Add to test/com/getorcha/workers/matching/llm_decision_test.clj, inside the existing build-match-prompt-test:

  (testing "prompt asks for per-candidate evaluation, not best-match selection"
    (let [source     #:document{:type "invoice"
                                :structured-data {:invoice-number "INV-003"
                                                  :issuer {:name "Test"}
                                                  :total 100}}
          candidates [{:doc #:document{:type "purchase-order"
                                       :structured-data {:po-number "PO-X"}}
                       :score 0.55
                       :evidence []}]
          {:keys [user]} (llm-decision/build-match-prompt source candidates)]
      (is (str/includes? user "each candidate"))
      (is (not (str/includes? user "Which candidate(s) match")))))

Step 2: Run test to verify it fails

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.llm-decision-test]'
Expected: FAIL, because the prompt still contains the old wording.

Step 3: Update build-match-prompt

Replace the :user string's "## Task\n" section (lines 182–185) with:

              "\n\n## Task\n"
              "For each candidate, determine whether it genuinely belongs to the same "
              "business transaction as the source document. Consider: supplier/counterparty "
              "identity, amounts, dates, reference numbers, and cross-references.\n\n"
              "Return JSON with only the candidates that match:\n"
              "{\"matches\": [{\"candidate\": <1-indexed>, \"confidence\": \"high\"|\"medium\"|\"low\", \"reasoning\": \"...\"}]}\n\n"
              "If none match confidently, return {\"matches\": []}")})

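Since Malli is already in the tech stack, the response shape requested by the prompt above could be checked before the matches are resolved. The following is only a sketch: `LlmMatchResponse` is a hypothetical schema name, and it assumes the JSON has already been parsed into a map with keyword keys, as in the mocked results in Task 1.

```clojure
(require '[malli.core :as m])

;; Hypothetical schema; the shape mirrors the JSON described in the prompt.
(def LlmMatchResponse
  [:map
   [:matches
    [:vector
     [:map
      [:candidate pos-int?]                        ; 1-indexed position
      [:confidence [:enum "high" "medium" "low"]]
      [:reasoning :string]]]]])

(m/validate LlmMatchResponse
            {:matches [{:candidate 1 :confidence "medium" :reasoning "Supplier matches"}]})
;; => true

(m/validate LlmMatchResponse
            {:matches [{:candidate 0 :confidence "high" :reasoning "bad index"}]})
;; => false (0 is not a valid 1-indexed position)

(m/validate LlmMatchResponse {:matches []})
;; => true
```

Rejecting a whole response at this boundary is stricter than the behavior tested in Task 1, where an invalid index is ignored and the remaining matches are kept, so a check like this would suit logging or metrics better than gating.
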
Step 4: Run tests to verify they pass

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.llm-decision-test]'
Expected: All tests pass.

Step 5: Commit

git add src/com/getorcha/workers/matching/llm_decision.clj test/com/getorcha/workers/matching/llm_decision_test.clj
git commit -m "feat: update LLM prompt for per-candidate evaluation

Reframe from 'pick the best match' to 'evaluate each candidate
independently as belonging to the same business transaction.'"

Task 4: Run Full Test Suite and Lint

Step 1: Run all matching tests

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test com.getorcha.workers.matching.llm-decision-test]'
Expected: All tests pass.

Step 2: Run linter

Run: clj-kondo --lint src test dev
Expected: No warnings or errors.

Step 3: Run integration tests

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.integration-test]'
Expected: All tests pass.

The integration tests exercise match-document! end-to-end and should work with the new decide-matches without modification, because they cover single-match scenarios that remain valid under many-to-many.

Step 4: Make a final commit if any fixups were needed