For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Integrate cosine similarity from hybrid search into evidence scoring via a weighted blend, removing redundant signals.
Architecture: The final match score becomes α × cosine_similarity + (1-α) × deterministic_score, where α varies by document-type pair. Two redundant signals (supplier-name-fuzzy, description-overlap) are removed. RRF fusion is fixed to preserve cosine scores on all result rows.
Tech Stack: Clojure, next-jdbc, pgvector, Apache Commons Text (JaroWinklerSimilarity removed)
Design doc: docs/plans/2026-02-26-scoring-blend-design.md
Fix RRF fusion; rename :score to :cosine-score
Two changes in search.clj:
rrf-fuse currently takes the row from the first search method only ((:row (first entries))), which loses the cosine similarity key for documents that appear in both BM25 and semantic results. Fix it to merge row data from all entries.
Rename the :score key (cosine similarity from vector search) to :cosine-score for clarity, consistent with :bm25-score.
Files:
src/com/getorcha/search.clj:199,236
test/com/getorcha/search_test.clj
Step 1: Write the failing test
File: test/com/getorcha/search_test.clj
(ns com.getorcha.search-test
  (:require [clojure.test :refer [deftest is testing]]
            [com.getorcha.search :as search]))

(deftest rrf-fuse-preserves-all-scores-test
  (testing "rows appearing in both result lists retain keys from both"
    (let [bm25-results     [{:id 1 :rank 1 :document/type "invoice" :bm25-score 0.95}
                            {:id 2 :rank 2 :document/type "contract" :bm25-score 0.80}]
          semantic-results [{:id 2 :rank 1 :document/type "contract" :cosine-score 0.85}
                            {:id 1 :rank 2 :document/type "invoice" :cosine-score 0.70}]
          results          (#'search/rrf-fuse 60 bm25-results semantic-results)
          by-id            (zipmap (map :id results) results)]
      ;; Doc 1: appeared in both -> should have both :bm25-score and :cosine-score
      (is (= 0.95 (:bm25-score (by-id 1))))
      (is (= 0.70 (:cosine-score (by-id 1))))
      ;; Doc 2: appeared in both -> should have both
      (is (= 0.80 (:bm25-score (by-id 2))))
      (is (= 0.85 (:cosine-score (by-id 2))))))
  (testing "rows appearing in only one list retain their scores"
    (let [bm25-results     [{:id 1 :rank 1 :bm25-score 0.95}]
          semantic-results [{:id 2 :rank 1 :cosine-score 0.85}]
          results          (#'search/rrf-fuse 60 bm25-results semantic-results)
          by-id            (zipmap (map :id results) results)]
      (is (= 0.95 (:bm25-score (by-id 1))))
      (is (nil? (:cosine-score (by-id 1))))
      (is (= 0.85 (:cosine-score (by-id 2))))
      (is (nil? (:bm25-score (by-id 2)))))))
Step 2: Run test to verify it fails
Run: clj -X:test:silent :nses '[com.getorcha.search-test]'
Expected: FAIL. For documents found by both methods, rrf-fuse keeps only the first entry's row, so one of :bm25-score / :cosine-score is nil.
Step 3: Implement the fixes
In src/com/getorcha/search.clj:
3a. Rename the SQL alias in vector-search (line 199):
Replace:
query {:select [:* [[[:- [:inline 1] distance]] :score]]
With:
query {:select [:* [[[:- [:inline 1] distance]] :cosine-score]]
Also update the docstring (line 182):
- :cosine-score - cosine similarity (1 = identical, 0 = orthogonal)
3b. Fix row merging in rrf-fuse (line 236):
Replace:
row (:row (first entries))]
With:
row (apply merge (map :row entries))]
This merges row data from all search methods. Shared keys (document columns) have identical values. Unique keys (:bm25-score, :cosine-score) are preserved from their respective methods.
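To make the merge behaviour concrete, here is a minimal sketch of reciprocal rank fusion with row merging. It is not the code in search.clj: rrf-fuse-sketch, the :rrf-score key, and the assumption that every row carries :id and a 1-based :rank are all illustrative.

```clojure
(defn rrf-fuse-sketch
  "Hypothetical stand-in for rrf-fuse. Fuses ranked result lists with
  reciprocal rank fusion (score = sum of 1 / (k + rank)) and merges the
  row maps from every list a document appears in."
  [k & result-lists]
  (->> (apply concat result-lists)
       (group-by :id)
       vals
       (map (fn [rows]
              (-> (apply merge rows)   ; keep keys from every search method
                  (dissoc :rank)       ; per-list rank has no meaning after fusion
                  (assoc :rrf-score
                         (reduce + (map #(/ 1.0 (+ k (:rank %))) rows))))))
       (sort-by :rrf-score >)
       vec))

;; Example call with the same shape as the test fixtures:
(rrf-fuse-sketch 60
                 [{:id 1 :rank 1 :bm25-score 0.95}]
                 [{:id 1 :rank 2 :cosine-score 0.70}])
```

Note that merge is last-wins for conflicting keys, so any key whose value differs per method (such as :rank in the test fixtures) ends up with the value from the last list.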
Step 4: Run test to verify it passes
Run: clj -X:test:silent :nses '[com.getorcha.search-test]'
Expected: PASS
Step 5: Commit
git add src/com/getorcha/search.clj test/com/getorcha/search_test.clj
git commit -m "fix(search): preserve all scores through RRF fusion, rename :score to :cosine-score"
Remove :supplier-name-fuzzy and :description-overlap signals, plus all supporting code (Jaro-Winkler import, stop-words, extract-description-words). Add the type-pair-alpha map and blend-score function.
Files:
src/com/getorcha/workers/matching/evidence.clj
test/com/getorcha/workers/matching/evidence_test.clj
Step 1: Write failing tests for blend-score
Add tests to evidence_test.clj. Remove the supplier-name-fuzzy-signal-test and description-overlap-signal-test deftests entirely.
Add at the end of the file:
(deftest type-pair-alpha-test
  (testing "known type pairs return their configured alpha"
    (is (= 0.6 (evidence/type-pair-alpha :invoice :contract)))
    (is (= 0.6 (evidence/type-pair-alpha :contract :invoice)))
    (is (= 0.5 (evidence/type-pair-alpha :invoice :purchase-order)))
    (is (= 0.5 (evidence/type-pair-alpha :purchase-order :invoice)))
    (is (= 0.3 (evidence/type-pair-alpha :invoice :goods-received-note)))
    (is (= 0.5 (evidence/type-pair-alpha :purchase-order :contract)))
    (is (= 0.3 (evidence/type-pair-alpha :purchase-order :goods-received-note))))
  (testing "unknown type pairs return default alpha"
    (is (= 0.4 (evidence/type-pair-alpha :contract :goods-received-note)))))

(deftest blend-score-test
  (testing "blends cosine and deterministic scores using alpha for type pair"
    ;; invoice<->contract: alpha=0.6
    ;; final = 0.6 * 0.80 + 0.4 * 0.0 = 0.48
    (is (== 0.48 (evidence/blend-score :invoice :contract 0.80 0.0))))
  (testing "invoice<->PO uses alpha=0.5"
    ;; final = 0.5 * 0.85 + 0.5 * 1.0 = 0.925
    (is (== 0.925 (evidence/blend-score :invoice :purchase-order 0.85 1.0))))
  (testing "nil cosine falls back to deterministic only"
    (is (== 1.0 (evidence/blend-score :invoice :purchase-order nil 1.0))))
  (testing "clamps to 0-1 range"
    (is (== 0.0 (evidence/blend-score :invoice :contract 0.0 0.0)))
    (is (== 1.0 (evidence/blend-score :invoice :contract 1.0 1.0)))))
Step 2: Run tests to verify they fail
Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'
Expected: FAIL — type-pair-alpha and blend-score don't exist yet.
Step 3: Implement changes
In src/com/getorcha/workers/matching/evidence.clj:
3a. Remove imports and dead code:
Replace the ns declaration (lines 1-11) with:
(ns com.getorcha.workers.matching.evidence
  "Evidence signal collection and scoring for document matching.
  Compares two documents and produces a list of evidence signals (positive and
  negative) along with a normalized 0-1 confidence score. Used by the matching
  worker to decide whether documents should be linked."
  (:require [clojure.set :as set]
            [clojure.string :as str]))
Removes: com.getorcha.util.text, com.getorcha.workers.matching.normalize, JaroWinklerSimilarity import.
3b. Remove signals from the map (lines 16-30):
Replace evidence-signals with:
(def evidence-signals
  {:po-number-exact     60    ; PO number on invoice/GRN matches PO document
   :contract-ref-exact  55    ; Contract reference matches
   :po-ref-exact        55    ; PO reference on GRN matches
   :vat-id-match        30    ; Supplier VAT/tax IDs match
   :iban-match          25    ; Supplier bank accounts match
   :quantity-exact      35    ; Same quantity in both documents
   :amount-within-2pct  20    ; Amounts within 2% tolerance
   :amount-within-5pct  10    ; Amounts within 5% tolerance
   :date-within-period  20    ; Service period within contract dates
   :delivery-date-match 25    ; Delivery dates align
   :currency-mismatch   -30   ; Currencies present but different
   :vat-id-mismatch     -40}) ; VAT/tax IDs present but don't match
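The integration-test notes in this plan compute the deterministic score as the summed signal weights divided by 100, capped at 1.0 (for example 35 + 30 = 65 → 0.65, and 110 → 1.0). The sketch below shows that arithmetic only. normalize-score is a hypothetical name, the weights map is a small excerpt, and the lower clamp at 0.0 is an assumption based on the "normalized 0-1" wording, not a description of compute-score.

```clojure
;; Excerpt of the weights above, enough to reproduce the plan's examples.
(def example-weights
  {:po-number-exact    60
   :vat-id-match       30
   :quantity-exact     35
   :amount-within-2pct 20
   :currency-mismatch  -30
   :vat-id-mismatch    -40})

(defn normalize-score
  "Hypothetical helper: sum the weights of the fired signals, divide by 100,
  and clamp to [0, 1]."
  [signals]
  (-> (reduce + (map example-weights signals))
      (/ 100.0)
      (max 0.0)
      (min 1.0)))

(normalize-score [:quantity-exact :vat-id-match])                      ; 65 / 100
(normalize-score [:po-number-exact :vat-id-match :amount-within-2pct]) ; 110, capped
```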
3c. Delete dead code blocks:
Delete these sections entirely:
jaro-winkler def (line 99)
stop-words def (lines 128-131)
extract-description-words function (lines 134-157)
collect-signals (lines 298-308)
collect-signals (lines 310-320)
3d. Add type-pair-alpha and blend-score after match-thresholds:
(def ^:private alpha-by-type-pair
  "Retrieval-vs-deterministic blend weight (α) per document-type pair.
  Higher α → more weight on cosine similarity from hybrid search.
  Lower α → more weight on deterministic structured-field signals."
  {#{:invoice :contract}                   0.6
   #{:invoice :purchase-order}             0.5
   #{:invoice :goods-received-note}        0.3
   #{:purchase-order :contract}            0.5
   #{:purchase-order :goods-received-note} 0.3})

(def ^:private default-alpha 0.4)

(defn type-pair-alpha
  "Look up the blend alpha for a pair of document types."
  [type-a type-b]
  ;; hash-set rather than a #{type-a type-b} literal: the literal throws
  ;; "Duplicate key" when both types are equal.
  (get alpha-by-type-pair (hash-set type-a type-b) default-alpha))

(defn blend-score
  "Blend cosine similarity and deterministic evidence score.
  Returns `α × cosine + (1 - α) × deterministic`, clamped to [0, 1].
  Falls back to deterministic-only when cosine is nil (no embedding available)."
  [type-a type-b cosine-similarity deterministic-score]
  (if (nil? cosine-similarity)
    deterministic-score
    (let [alpha (type-pair-alpha type-a type-b)]
      (-> (+ (* alpha (double cosine-similarity))
             (* (- 1.0 alpha) (double deterministic-score)))
          (max 0.0)
          (min 1.0)))))
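Keying the alpha map by a set makes the lookup independent of argument order. One detail to watch when the key is built from runtime values: a set literal such as #{a b} throws if both values are equal, whereas hash-set deduplicates silently. The sketch below illustrates both behaviours; all names in it are illustrative and not part of evidence.clj.

```clojure
(def alphas {#{:invoice :contract} 0.6})

(defn alpha-via-literal
  "Builds the lookup key with a set literal."
  [a b]
  (get alphas #{a b} 0.4))

(defn alpha-via-hash-set
  "Builds the lookup key with hash-set, which tolerates duplicates."
  [a b]
  (get alphas (hash-set a b) 0.4))

(defn same-type-result
  "Calls f with two identical types and reports whether it threw."
  [f]
  (try
    (f :invoice :invoice)
    (catch IllegalArgumentException _
      :duplicate-key-error)))

(alpha-via-literal :invoice :contract)  ; order-independent lookup
(alpha-via-literal :contract :invoice)
(same-type-result alpha-via-literal)    ; the literal rejects equal elements
(same-type-result alpha-via-hash-set)   ; falls through to the default
```

Whether same-type pairs can reach this function depends on what the candidate search returns, which this plan does not specify.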
Step 4: Run tests to verify they pass
Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test]'
Expected: PASS (new blend tests pass, removed signal tests are gone, existing deterministic signal tests still pass).
Step 5: Lint
Run: clj-kondo --lint src/com/getorcha/workers/matching/evidence.clj --fail-level warning
Check that removing the text, normalize, and JaroWinklerSimilarity requires doesn't leave unused imports elsewhere. The str require is still used in collect-signals (e.g., str/join). The set require is still used for set/intersection.
Step 6: Commit
git add src/com/getorcha/workers/matching/evidence.clj test/com/getorcha/workers/matching/evidence_test.clj
git commit -m "refactor(matching): remove redundant signals, add blend scoring"
Update score-all-candidates to pass the cosine similarity from the candidate row through to blend-score, producing the final blended score.
Files:
src/com/getorcha/workers/matching/core.clj:24-40
test/com/getorcha/workers/matching/core_test.clj
Step 1: Update candidate-row->doc and score-all-candidates
In src/com/getorcha/workers/matching/core.clj:
Replace score-all-candidates (lines 32-40) with:
(defn ^:private score-all-candidates
  "Score all candidates against the source document.
  Blends cosine similarity from retrieval with deterministic evidence signals.
  Returns unsorted, unfiltered."
  [source-doc candidate-rows]
  (let [source-type (:type source-doc)]
    (mapv (fn [row]
            (let [candidate-type           (keyword (:document/type row))
                  cosine-similarity        (:cosine-score row)
                  {:keys [score evidence]} (evidence/compute-score
                                            source-doc
                                            (candidate-row->doc row))
                  final-score              (evidence/blend-score
                                            source-type candidate-type
                                            cosine-similarity score)]
              {:doc                 row
               :score               final-score
               :retrieval-score     cosine-similarity
               :deterministic-score score
               :evidence            evidence}))
          candidate-rows)))
Step 2: Update logging in match-document!
In the :scores log output (around line 189-194), add the sub-scores:
:scores (mapv (fn [{:keys [doc score retrieval-score
                           deterministic-score evidence]}]
                {:candidate-id        (:document/id doc)
                 :candidate-type      (:document/type doc)
                 :score               score
                 :retrieval-score     retrieval-score
                 :deterministic-score deterministic-score
                 :signals             (mapv :signal evidence)})
              all-scored)})
Step 3: Update core_test.clj
The decide-matches-test candidates use hardcoded :score values and are pure function tests — they don't go through score-all-candidates, so they need no changes.
The match-document-test integration tests stub embeddings as zero vectors: embed-document returns (vec (repeat 768 0.0)) and embed-query returns the same. Cosine similarity between zero vectors is undefined, and pgvector's cosine distance for two zero vectors is NaN, so :cosine-score may come back as nil or NaN.
When :cosine-score is nil, blend-score falls back to deterministic-only, so existing integration tests should continue to work without changes; the blend acts as a passthrough. A NaN is not nil, however, so it would not trigger the fallback; confirm which value the stubbed query actually returns.
Verify by running:
Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test]'
Expected: PASS
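Because the nil fallback in blend-score only checks for nil, a NaN cosine score would bypass it. If the zero-vector stubs do turn out to produce NaN, one option is to normalize the value before blending. The sketch below is an assumption about how that could look: usable-cosine is a hypothetical helper, not something that exists in core.clj or evidence.clj.

```clojure
(defn usable-cosine
  "Hypothetical helper: returns the cosine score as a double, or nil when
  it is missing or NaN, so callers can rely on a plain nil check."
  [x]
  (when (and (number? x)
             (not (Double/isNaN (double x))))
    (double x)))

(nil? Double/NaN)           ; false: NaN does not trigger a nil fallback
(usable-cosine Double/NaN)  ; nil
(usable-cosine nil)         ; nil
(usable-cosine 0.85)        ; 0.85
```

In score-all-candidates this would wrap the (:cosine-score row) lookup, leaving blend-score itself unchanged.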
Step 4: Commit
git add src/com/getorcha/workers/matching/core.clj
git commit -m "feat(matching): wire blend scoring into candidate scoring pipeline"
The integration tests need updates for two reasons:
contract-invoice-matching-without-references-test asserts a score based on the old signal set (supplier-name-fuzzy + quantity-exact + vat-id-match = 0.80). With supplier-name-fuzzy removed, the deterministic score drops to 0.65 (quantity-exact 35 + vat-id-match 30 = 65/100). With the zero-vector stubs the cosine score is nil, so the nil fallback makes the final score 0.65: still above 0.30 but below 0.70. This test needs either realistic embeddings or adjusted assertions.
invoice-po-matching-integration-test has a score comment that reads "PO exact (60) + VAT match (30) + amount 2% (20) = 110 -> capped at 1.0". With the blend: deterministic = 1.0, cosine = nil → final = 1.0. No change needed.
Files:
test/com/getorcha/workers/matching/integration_test.clj
Step 1: Fix the contract-invoice test
The zero-vector embedding stubs produce nil cosine similarity (cosine distance of zero vectors is undefined in pgvector). With blend-score nil fallback, the final score is pure deterministic: (35 + 30) / 100 = 0.65.
This is below the 0.70 high threshold, so the rule-based path won't auto-match it. Without LLM config it won't match at all.
Two options: (a) adjust the assertions to the deterministic-only score, or (b) stub embeddings that produce a known cosine similarity.
Option (b) is cleaner because it tests the blend path. Two identical non-zero vectors give cosine similarity = 1.0.
Replace the embedding stubs in contract-invoice-matching-without-references-test:
(with-redefs [search/embed-document (constantly (vec (repeat 768 1.0)))
              search/embed-query    (constantly (vec (repeat 768 1.0)))]
With cosine similarity = 1.0 (identical embeddings), blend for invoice↔contract (α=0.6):
0.6 × 1.0 + 0.4 × 0.65 = 0.86 → above 0.70, auto-match.
Update the score comment (line 325):
;; deterministic: quantity-exact (35) + vat-id-match (30) = 65 -> 0.65
;; blend: 0.6 * 1.0 (cosine) + 0.4 * 0.65 = 0.86
The assertion (>= confidence 0.7M) remains valid.
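The arithmetic for this test can be checked in isolation. The sketch below re-implements only the blend formula with an explicit α; blend, close?, and the threshold name are illustrative, and the 0.70 value is the high threshold quoted in this step.

```clojure
(defn blend
  "α × cosine + (1 - α) × deterministic, without clamping."
  [alpha cosine deterministic]
  (+ (* alpha cosine)
     (* (- 1.0 alpha) deterministic)))

(defn close?
  "Floating-point comparison with a small tolerance."
  [a b]
  (< (Math/abs (double (- a b))) 1e-9))

(def high-threshold 0.70)

;; invoice<->contract, identical embeddings (cosine = 1.0), deterministic = 0.65
(def blended (blend 0.6 1.0 0.65))

(close? blended 0.86)        ; 0.6 + 0.4 * 0.65
(>= blended high-threshold)  ; blended score clears the threshold
(>= 0.65 high-threshold)     ; deterministic-only score does not
```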
Step 2: Update other integration tests for consistency
The invoice-po-matching-integration-test uses zero vectors. With zero vectors:
:cosine-score is nil or NaN
blend-score with nil cosine falls back to deterministic only
Check whether zero vectors produce a nil :cosine-score or a numeric value in pgvector. The formula is 1 - cosine_distance. For zero vectors, cosine distance is undefined → likely NaN or an error.
To be safe, update ALL integration tests to use (vec (repeat 768 1.0)) instead of (vec (repeat 768 0.0)) for both embed-document and embed-query. This gives cosine similarity = 1.0 everywhere, making the blend well-defined.
For the invoice-po-matching-integration-test:
0.5 × 1.0 + 0.5 × 1.0 = 1.0
(>= confidence 0.7M) still passes.
For multiple-candidates-best-match-wins-test:
0.5 × 1.0 + 0.5 × 0.5 = 0.75
Step 3: Run all matching tests
Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.integration-test]'
Expected: PASS
Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.core-test]'
Expected: PASS
Step 4: Commit
git add test/com/getorcha/workers/matching/integration_test.clj
git commit -m "test(matching): update integration tests for blend scoring"
Step 1: Lint
Run: clj-kondo --lint src test dev --fail-level warning
Expected: No new warnings. The removed imports (text, normalize, JaroWinklerSimilarity) should not be referenced anywhere else in evidence.clj.
Step 2: Run all matching tests
Run: clj -X:test:silent :nses '[com.getorcha.workers.matching.evidence-test com.getorcha.workers.matching.core-test com.getorcha.workers.matching.integration-test com.getorcha.search-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"
Expected: All tests pass.
Step 3: Run full test suite
Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Execution error|failed because|Ran .* tests)"
Expected: All tests pass. No regressions from search.clj change.
Step 4: Commit (if any fixes were needed)
Only if lint or tests required fixes in the previous steps.