Scoring Blend Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Integrate cosine similarity from hybrid search into evidence scoring via a weighted blend, removing redundant signals.

Architecture: The final match score becomes α × cosine_similarity + (1-α) × deterministic_score, where α varies by document-type pair. Two redundant signals (supplier-name-fuzzy, description-overlap) are removed. RRF fusion is fixed to preserve cosine scores on all result rows.
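In symbols: let c be the cosine similarity from hybrid search, d the deterministic evidence score, and α the weight looked up for the unordered document-type pair (default 0.4). The blend-score function in Task 2 computes:

```latex
s =
\begin{cases}
\min\bigl(1,\ \max\bigl(0,\ \alpha_{\{t_a, t_b\}}\, c + (1 - \alpha_{\{t_a, t_b\}})\, d\bigr)\bigr) & \text{if } c \text{ is present} \\
d & \text{if } c = \text{nil}
\end{cases}
```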

Tech Stack: Clojure, next-jdbc, pgvector, Apache Commons Text (JaroWinklerSimilarity removed)

Design doc: docs/plans/2026-02-26-scoring-blend-design.md


Task 1: Fix RRF fusion to preserve all search scores, rename :score to :cosine-score

Two changes in search.clj:

  1. rrf-fuse currently takes the row from the first search method only ((:row (first entries))), which loses the cosine similarity key for documents that appear in both BM25 and semantic results. Fix it to merge row data from all entries.
  2. Rename the ambiguous :score key (cosine similarity from vector search) to :cosine-score for clarity, consistent with :bm25-score.

Files: src/com/getorcha/search.clj, test/com/getorcha/search_test.clj

Step 1: Write the failing test

File: test/com/getorcha/search_test.clj

(ns com.getorcha.search-test
  (:require [clojure.test :refer [deftest is testing]]
            [com.getorcha.search :as search]))


(deftest rrf-fuse-preserves-all-scores-test
  (testing "rows appearing in both result lists retain keys from both"
    (let [bm25-results    [{:id 1 :rank 1 :document/type "invoice" :bm25-score 0.95}
                           {:id 2 :rank 2 :document/type "contract" :bm25-score 0.80}]
          semantic-results [{:id 2 :rank 1 :document/type "contract" :cosine-score 0.85}
                            {:id 1 :rank 2 :document/type "invoice" :cosine-score 0.70}]
          results          (#'search/rrf-fuse 60 bm25-results semantic-results)
          by-id            (zipmap (map :id results) results)]
      ;; Doc 1: appeared in both -> should have both :bm25-score and :cosine-score
      (is (= 0.95 (:bm25-score (by-id 1))))
      (is (= 0.70 (:cosine-score (by-id 1))))
      ;; Doc 2: appeared in both -> should have both
      (is (= 0.80 (:bm25-score (by-id 2))))
      (is (= 0.85 (:cosine-score (by-id 2))))))

  (testing "rows appearing in only one list retain their scores"
    (let [bm25-results    [{:id 1 :rank 1 :bm25-score 0.95}]
          semantic-results [{:id 2 :rank 1 :cosine-score 0.85}]
          results          (#'search/rrf-fuse 60 bm25-results semantic-results)
          by-id            (zipmap (map :id results) results)]
      (is (= 0.95 (:bm25-score (by-id 1))))
      (is (nil? (:cosine-score (by-id 1))))
      (is (= 0.85 (:cosine-score (by-id 2))))
      (is (nil? (:bm25-score (by-id 2)))))))

Step 2: Run test to verify it fails

Run: clj -X:test:silent :nses '[com.getorcha.search-test]'

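Expected: FAIL in the first testing block. rrf-fuse keeps only the first entry's row, so each document that appears in both lists is missing either :bm25-score or :cosine-score. (The test builds its input rows with :cosine-score directly, so the failure comes from the row-merging bug, not from the rename.)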

Step 3: Implement the fixes

In src/com/getorcha/search.clj:

3a. Rename the SQL alias in vector-search (line 199):

Replace:

        query          {:select   [:* [[[:- [:inline 1] distance]] :score]]

With:

        query          {:select   [:* [[[:- [:inline 1] distance]] :cosine-score]]

Also update the docstring (line 182):

   - :cosine-score - cosine similarity (1 = identical, 0 = orthogonal)
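For reference, here is a minimal pure-Clojure sketch of the quantity this column holds. It assumes cosine distance is defined as 1 - cos θ, so the query's `1 - distance` recovers the similarity. The namespace and function names are hypothetical and are not part of search.clj. The sketch combines the norms as sqrt((a·a)(b·b)), which keeps identical vectors at exactly 1.0.

```clojure
(ns example.cosine
  "Hypothetical illustration of the :cosine-score value; not part of search.clj.")

(defn dot
  "Dot product of two equal-length numeric sequences, as a double."
  [a b]
  (reduce + 0.0 (map * a b)))

(defn cosine-similarity
  "cos θ = a·b / sqrt((a·a)(b·b)).
   1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
   A zero-norm input gives 0.0 / 0.0, i.e. NaN (undefined)."
  [a b]
  (/ (dot a b)
     (Math/sqrt (* (dot a a) (dot b b)))))

(defn cosine-distance
  "Cosine distance under the assumed convention: 1 - similarity."
  [a b]
  (- 1.0 (cosine-similarity a b)))
```

Uniform vectors such as (repeat 768 1.0) give exactly 1.0. All-zero vectors give NaN, which is the undefined case the integration-test stubs run into.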

3b. Fix row merging in rrf-fuse (line 236):

Replace:

                      row (:row (first entries))]

With:

                      row (apply merge (map :row entries))]

This merges row data from all search methods. Shared keys (document columns) have identical values. Unique keys (:bm25-score, :cosine-score) are preserved from their respective methods.
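To make the fusion step concrete, here is a minimal, self-contained sketch of reciprocal rank fusion that merges rows in the way the fix describes. It assumes each row carries :id and a 1-based :rank, and that the fused score is the sum of 1 / (k + rank). The namespace and the :rrf-score key are hypothetical. The real rrf-fuse in search.clj may differ.

```clojure
(ns example.rrf
  "Hypothetical reciprocal-rank-fusion sketch; not the search.clj implementation.")

(defn rrf-fuse
  "Fuse ranked result lists with reciprocal rank fusion.
   Each row needs :id and a 1-based :rank. A document's fused score is the
   sum of 1 / (k + rank) over every list it appears in. All rows for the same
   :id are merged, so method-specific keys (:bm25-score, :cosine-score) survive;
   for shared keys the row from the later list wins."
  [k & result-lists]
  (let [entries (for [rows result-lists
                      row  rows]
                  {:id  (:id row)
                   :rrf (/ 1.0 (+ k (:rank row)))
                   :row row})]
    (->> (group-by :id entries)
         (map (fn [[_id id-entries]]
                (assoc (apply merge (map :row id-entries))
                       :rrf-score (reduce + (map :rrf id-entries)))))
         (sort-by :rrf-score >))))
```

With k = 60 (the value the tests pass), a document ranked 1st in one list and 2nd in the other scores 1/61 + 1/62 ≈ 0.0325. That is roughly twice the score of a document found by only one method.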

Step 4: Run test to verify it passes

Run: clj -X:test:silent :nses '[com.getorcha.search-test]'

Expected: PASS

Step 5: Commit

git add src/com/getorcha/search.clj test/com/getorcha/search_test.clj
git commit -m "fix(search): preserve all scores through RRF fusion, rename :score to :cosine-score"

Task 2: Remove redundant signals from evidence.clj

Remove :supplier-name-fuzzy and :description-overlap signals, plus all supporting code (Jaro-Winkler import, stop-words, extract-description-words). Add the type-pair-alpha map and blend-score function.

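Files: src/com/getorcha/workers/matching/evidence.clj, test/com/getorcha/workers/matching/evidence_test.clj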

Step 1: Write failing tests for blend-score

Add tests to evidence_test.clj. Remove the supplier-name-fuzzy-signal-test and description-overlap-signal-test deftests entirely.

Add at the end of the file:

(deftest type-pair-alpha-test
  (testing "known type pairs return their configured alpha"
    (is (= 0.6 (evidence/type-pair-alpha :invoice :contract)))
    (is (= 0.6 (evidence/type-pair-alpha :contract :invoice)))
    (is (= 0.5 (evidence/type-pair-alpha :invoice :purchase-order)))
    (is (= 0.5 (evidence/type-pair-alpha :purchase-order :invoice)))
    (is (= 0.3 (evidence/type-pair-alpha :invoice :goods-received-note)))
    (is (= 0.5 (evidence/type-pair-alpha :purchase-order :contract)))
    (is (= 0.3 (evidence/type-pair-alpha :purchase-order :goods-received-note))))

  (testing "unknown type pairs return default alpha"
    (is (= 0.4 (evidence/type-pair-alpha :contract :goods-received-note)))))


(deftest blend-score-test
  (testing "blends cosine and deterministic scores using alpha for type pair"
    ;; invoice<->contract: alpha=0.6
    ;; final = 0.6 * 0.80 + 0.4 * 0.0 = 0.48
    (is (== 0.48 (evidence/blend-score :invoice :contract 0.80 0.0))))

  (testing "invoice<->PO uses alpha=0.5"
    ;; final = 0.5 * 0.85 + 0.5 * 1.0 = 0.925
    (is (== 0.925 (evidence/blend-score :invoice :purchase-order 0.85 1.0))))

  (testing "nil cosine falls back to deterministic only"
    (is (== 1.0 (evidence/blend-score :invoice :purchase-order nil 1.0))))

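  (testing "clamps to 0-1 range"
    (is (== 0.0 (evidence/blend-score :invoice :contract 0.0 0.0)))
    (is (== 1.0 (evidence/blend-score :invoice :contract 1.0 1.0)))
    ;; cosine similarity can be negative: 0.6 * -0.5 + 0.4 * 0.0 = -0.3 -> clamped to 0.0
    (is (== 0.0 (evidence/blend-score :invoice :contract -0.5 0.0)))))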

Step 2: Run tests to verify they fail

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'

Expected: FAIL — type-pair-alpha and blend-score don't exist yet.

Step 3: Implement changes

In src/com/getorcha/workers/matching/evidence.clj:

3a. Remove imports and dead code:

Replace the ns declaration (lines 1-11) with:

(ns com.getorcha.workers.matching.evidence
  "Evidence signal collection and scoring for document matching.

   Compares two documents and produces a list of evidence signals (positive and
   negative) along with a normalized 0-1 confidence score. Used by the matching
   worker to decide whether documents should be linked."
  (:require [clojure.set :as set]
            [clojure.string :as str]))

Removes: com.getorcha.util.text, com.getorcha.workers.matching.normalize, JaroWinklerSimilarity import.

3b. Remove signals from the map (lines 16-30):

Replace evidence-signals with:

(def evidence-signals
  {:po-number-exact     60   ; PO number on invoice/GRN matches PO document
   :contract-ref-exact  55   ; Contract reference matches
   :po-ref-exact        55   ; PO reference on GRN matches
   :vat-id-match        30   ; Supplier VAT/tax IDs match
   :iban-match          25   ; Supplier bank accounts match
   :quantity-exact      35   ; Same quantity in both documents
   :amount-within-2pct  20   ; Amounts within 2% tolerance
   :amount-within-5pct  10   ; Amounts within 5% tolerance
   :date-within-period  20   ; Service period within contract dates
   :delivery-date-match 25   ; Delivery dates align
   :currency-mismatch  -30   ; Currencies present but different
   :vat-id-mismatch    -40}) ; VAT/tax IDs present but don't match
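The worked examples in Task 4 (65 → 0.65, 110 → capped at 1.0) suggest that the deterministic score is the sum of the fired signal weights, divided by 100 and clamped. Below is a minimal sketch of that reading. compute-score's actual code is not shown in this plan, so normalized-score is hypothetical, and the lower clamp at 0 is an assumption.

```clojure
(ns example.evidence-score
  "Hypothetical reconstruction of deterministic score normalization,
   inferred from this plan's worked examples; not the actual compute-score.")

(def evidence-signals
  {:po-number-exact     60
   :contract-ref-exact  55
   :po-ref-exact        55
   :vat-id-match        30
   :iban-match          25
   :quantity-exact      35
   :amount-within-2pct  20
   :amount-within-5pct  10
   :date-within-period  20
   :delivery-date-match 25
   :currency-mismatch  -30
   :vat-id-mismatch    -40})

(defn normalized-score
  "Sum the weights of the fired signals, divide by 100, clamp to [0, 1]
   (the lower clamp is an assumption)."
  [fired-signals]
  (-> (reduce + 0 (map evidence-signals fired-signals))
      (/ 100.0)
      (max 0.0)
      (min 1.0)))
```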

3c. Delete dead code blocks:

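Delete these sections entirely: stop-words, extract-description-words, and the code that emits the :supplier-name-fuzzy and :description-overlap signals.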

3d. Add type-pair-alpha and blend-score after match-thresholds:

(def ^:private alpha-by-type-pair
  "Retrieval-vs-deterministic blend weight (α) per document-type pair.
   Higher α → more weight on cosine similarity from hybrid search.
   Lower α → more weight on deterministic structured-field signals."
  {#{:invoice :contract}              0.6
   #{:invoice :purchase-order}        0.5
   #{:invoice :goods-received-note}   0.3
   #{:purchase-order :contract}       0.5
   #{:purchase-order :goods-received-note} 0.3})


(def ^:private default-alpha 0.4)


(defn type-pair-alpha
  "Look up the blend alpha for a pair of document types."
  [type-a type-b]
  (get alpha-by-type-pair #{type-a type-b} default-alpha))


(defn blend-score
  "Blend cosine similarity and deterministic evidence score.
   Returns `α × cosine + (1 - α) × deterministic`, clamped to [0, 1].
   Falls back to deterministic-only when cosine is nil (no embedding available)."
  [type-a type-b cosine-similarity deterministic-score]
  (if (nil? cosine-similarity)
    deterministic-score
    (let [alpha (type-pair-alpha type-a type-b)]
      (-> (+ (* alpha (double cosine-similarity))
             (* (- 1.0 alpha) (double deterministic-score)))
          (max 0.0)
          (min 1.0)))))

Step 4: Run tests to verify they pass

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'

Expected: PASS (new blend tests pass, removed signal tests are gone, existing deterministic signal tests still pass).

Step 5: Lint

Run: clj-kondo --lint src/com/getorcha/workers/matching/evidence.clj --fail-level warning

Also confirm that nothing left in evidence.clj still references the removed text or normalize aliases or the JaroWinklerSimilarity class. Any leftover reference will fail to compile when the namespace loads. The str require is still used in collect-signals (e.g., str/join), and the set require is still used for set/intersection.

Step 6: Commit

git add src/com/getorcha/workers/matching/evidence.clj test/com/getorcha/workers/matching/evidence_test.clj
git commit -m "refactor(matching): remove redundant signals, add blend scoring"

Task 3: Wire blend scoring into core.clj

Update score-all-candidates to pass the cosine similarity from the candidate row through to blend-score, producing the final blended score.

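Files: src/com/getorcha/workers/matching/core.clj; test/com/getorcha/workers/matching/core_test.clj (re-run only, no changes expected)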

Step 1: Update candidate-row->doc and score-all-candidates

In src/com/getorcha/workers/matching/core.clj:

Replace score-all-candidates (lines 32-40) with:

(defn ^:private score-all-candidates
  "Score all candidates against the source document.
   Blends cosine similarity from retrieval with deterministic evidence signals.
   Returns unsorted, unfiltered."
  [source-doc candidate-rows]
  (let [source-type (:type source-doc)]
    (mapv (fn [row]
            (let [candidate-type    (keyword (:document/type row))
                  cosine-similarity (:cosine-score row)
                  {:keys [score evidence]} (evidence/compute-score
                                            source-doc
                                            (candidate-row->doc row))
                  final-score       (evidence/blend-score
                                     source-type candidate-type
                                     cosine-similarity score)]
              {:doc                row
               :score              final-score
               :retrieval-score    cosine-similarity
               :deterministic-score score
               :evidence           evidence}))
          candidate-rows)))

Step 2: Update logging in match-document!

In the :scores log output (around lines 189-194), add the sub-scores:

                        :scores           (mapv (fn [{:keys [doc score retrieval-score
                                                             deterministic-score evidence]}]
                                                  {:candidate-id        (:document/id doc)
                                                   :candidate-type      (:document/type doc)
                                                   :score               score
                                                   :retrieval-score     retrieval-score
                                                   :deterministic-score deterministic-score
                                                   :signals             (mapv :signal evidence)})
                                                all-scored)})

Step 3: Update core_test.clj

The decide-matches-test candidates use hardcoded :score values and are pure function tests — they don't go through score-all-candidates, so they need no changes.

The match-document-test integration tests stub embed-document and embed-query to return (vec (repeat 768 0.0)), and cosine similarity between zero vectors is undefined. pgvector returns NaN for the cosine distance of two zero vectors, so :cosine-score (formerly :score) will likely be NaN rather than nil.

blend-score falls back to the deterministic score only when the cosine similarity is nil. A NaN passes the nil? check and propagates through the arithmetic, so the final score is no longer guaranteed to equal the deterministic score. If the integration tests fail at this step, this is the likely cause. Task 4 replaces the zero-vector stubs with non-zero vectors.
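Because blend-score only checks nil?, a NaN similarity would pass straight into the arithmetic. If the tests show NaN, one option (not part of this plan) is to normalize the value before blending. Below is a minimal sketch; usable-cosine is a hypothetical helper.

```clojure
(ns example.cosine-guard
  "Hypothetical helper, not part of the plan: treat non-finite cosine values
   (such as NaN from zero-vector embeddings) the same as a missing value.")

(defn usable-cosine
  "Return x as a double when it is a finite number, otherwise nil."
  [x]
  (when (and (number? x) (Double/isFinite (double x)))
    (double x)))
```

In score-all-candidates this would read (evidence/blend-score source-type candidate-type (usable-cosine (:cosine-score row)) score), so a NaN would hit the deterministic-only fallback.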

Verify by running:

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test]'

Expected: PASS

Step 4: Commit

git add src/com/getorcha/workers/matching/core.clj
git commit -m "feat(matching): wire blend scoring into candidate scoring pipeline"

Task 4: Update integration tests

The integration tests need updates for two reasons:

  1. The contract-invoice-matching-without-references-test asserts a score based on the old signal set (supplier-name-fuzzy + quantity-exact + vat-id-match = 0.80). With supplier-name-fuzzy removed, the deterministic score drops to 0.65 (quantity-exact 35 + vat-id-match 30 = 65/100). The outcome then depends on what the zero-vector stubs yield for the cosine similarity. If it comes back nil, blend-score falls back to 0.65, which is above 0.30 but below 0.70. If it were 0.0, the blend would be 0.6 × 0.0 + 0.4 × 0.65 = 0.26, below 0.30. Either way, the test needs realistic embeddings or adjusted assertions.
  2. The invoice-po-matching-integration-test score comment mentions "PO exact (60) + VAT match (30) + amount 2% (20) = 110 -> capped at 1.0". The deterministic score stays 1.0, so the final score is 1.0 with a nil cosine. It is also 1.0 with cosine = 1.0 (invoice↔PO, α = 0.5). Its score is unaffected.

Files: test/com/getorcha/workers/matching/integration_test.clj

Step 1: Fix the contract-invoice test

Cosine distance of zero vectors is undefined in pgvector, so the zero-vector embedding stubs do not give a usable cosine similarity. At best, it comes back nil and blend-score falls back to the deterministic score: (35 + 30) / 100 = 0.65.

This is below the 0.70 high threshold, so it won't auto-match rule-based. Without LLM config it won't match at all.
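The outcome logic described here can be summarized in a small sketch. The names are hypothetical, and the inclusive >= comparisons are an assumption. Only the 0.70 and 0.30 values come from this plan; this is not the actual decide-matches code.

```clojure
(ns example.thresholds
  "Hypothetical decision sketch; not the actual decide-matches logic.")

(def high-threshold 0.70)   ; rule-based auto-match at or above this (assumed inclusive)
(def low-threshold 0.30)    ; LLM review band starts here (assumed inclusive)

(defn decision
  "Classify a blended score into an outcome.
   Assumes the LLM review step only runs when LLM config is present."
  [score llm-configured?]
  (cond
    (>= score high-threshold)                      :auto-match
    (and llm-configured? (>= score low-threshold)) :llm-review
    :else                                          :no-match))
```

Under these assumptions, the 0.65 deterministic-only score lands in :no-match without LLM config. The 0.86 blend from identical embeddings auto-matches.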

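Two options:

  (a) Keep the zero-vector stubs and adjust the assertions to the deterministic-only outcome.
  (b) Stub non-zero embeddings so the test goes through the blend.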

Option (b) is cleaner — it tests the blend path. Use embeddings that produce a known cosine similarity. Two identical non-zero vectors → cosine similarity = 1.0.

Replace the embedding stubs in contract-invoice-matching-without-references-test:

      (with-redefs [search/embed-document (constantly (vec (repeat 768 1.0)))
                    search/embed-query    (constantly (vec (repeat 768 1.0)))]

With cosine similarity = 1.0 (identical embeddings), blend for invoice↔contract (α=0.6): 0.6 × 1.0 + 0.4 × 0.65 = 0.86 → above 0.70, auto-match.

Update the score comment (line 325):

          ;; deterministic: quantity-exact (35) + vat-id-match (30) = 65 -> 0.65
          ;; blend: 0.6 * 1.0 (cosine) + 0.4 * 0.65 = 0.86

The assertion (>= confidence 0.7M) remains valid.

Step 2: Update other integration tests for consistency

The invoice-po-matching-integration-test also uses zero vectors. It is not yet confirmed whether pgvector then yields nil or a numeric value for :cosine-score. The column is computed as 1 - cosine_distance, and cosine distance is undefined for zero vectors, so NaN (or an error) is the likely result.

To be safe, update ALL integration tests to use (vec (repeat 768 1.0)) instead of (vec (repeat 768 0.0)) for both embed-document and embed-query. This gives cosine similarity = 1.0 everywhere, making the blend well-defined.

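For the invoice-po-matching-integration-test: the deterministic score is still 1.0 (110 capped). With cosine = 1.0, the invoice↔PO blend (α = 0.5) is 0.5 × 1.0 + 0.5 × 1.0 = 1.0, so only the embedding stubs change.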

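For multiple-candidates-best-match-wins-test: every candidate now gets the same cosine similarity (1.0). Candidates of the same document type also share α, so the blend rises with the deterministic score and the best deterministic match still wins. Re-check the expected winner if the candidates differ in type.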

Step 3: Run all matching tests

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.integration-test]'

Expected: PASS

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test]'

Expected: PASS

Step 4: Commit

git add test/com/getorcha/workers/matching/integration_test.clj
git commit -m "test(matching): update integration tests for blend scoring"

Task 5: Run full test suite and lint

Step 1: Lint

Run: clj-kondo --lint src test dev --fail-level warning

Expected: No new warnings. The removed imports (text, normalize, JaroWinklerSimilarity) should not be referenced anywhere else in evidence.clj.

Step 2: Run all matching tests

Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test com.getorcha.workers.matching.core-test com.getorcha.workers.matching.integration-test com.getorcha.search-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: All tests pass.

Step 3: Run full test suite

Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Execution error|failed because|Ran .* tests)"

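Expected: All tests pass, with no regressions from the search.clj changes. Because :score was renamed to :cosine-score, also confirm that no remaining caller of vector-search still reads :score.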

Step 4: Commit (if any fixes were needed)

Only if lint or tests required fixes in the previous steps.