Scoring Blend: Integrate Retrieval Scores into Evidence Scoring

Problem

The matching pipeline has two phases: hybrid search (BM25 + semantic) retrieves candidates, then deterministic evidence signals score them independently. The retrieval scores are discarded before scoring.

This causes two problems:

Redundant signals: :supplier-name-fuzzy and :description-overlap are weaker re-implementations of what BM25 and semantic search already do.
Sparse-type penalty: Document pairs like invoice-contract share few structured fields (contracts typically carry no VAT ID, IBAN, or amounts), so they score near zero even when retrieval correctly identifies them as related.

Example: the bikosigma invoice vs. contract pair scored 0.15 (only :supplier-name-fuzzy fired), even though hybrid search correctly found the contract and the invoice line items match the contract fee schedule verbatim.

Design

Scoring Model

final_score = alpha * cosine_similarity + (1 - alpha) * deterministic_score

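Cosine similarity (from embedding vector search): treated as lying in [0,1] (cosine similarity can be negative in general, so negative values should be clamped to 0), an absolute measure of semantic similarity, independent of candidate set size.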
Deterministic score: unchanged normalization (sum of the weights of fired signals / 100, clamped to [0,1]).
alpha: varies by document-type pair, higher when deterministic signals are structurally sparse.
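
A minimal sketch of the blend formula above, with alpha passed in directly. The name weighted-blend and the input clamping are assumptions for illustration, not the planned blend-score signature.

```clojure
(defn clamp01
  "Clamps x to the closed interval [0,1]."
  [x]
  (-> x (max 0.0) (min 1.0)))

(defn weighted-blend
  "final = alpha * cosine + (1 - alpha) * deterministic.
  Hypothetical helper: alpha is given directly rather than looked up
  by document-type pair, and both inputs are clamped defensively."
  [alpha cosine deterministic]
  (+ (* alpha (clamp01 cosine))
     (* (- 1.0 alpha) (clamp01 deterministic))))

;; Inputs of the bikosigma invoice vs. contract case: alpha 0.6, cosine 0.80,
;; no deterministic signals fired.
(weighted-blend 0.6 0.80 0.0)
```

With alpha at 0.6, a candidate with no deterministic evidence can still reach 60% of its cosine similarity, which is what lifts sparse pairs above the minimum threshold.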

Alpha Values

Document-Type Pair	alpha	Rationale
invoice <-> contract	0.6	Contracts lack VAT, IBAN, amounts. Semantic similarity is primary signal.
invoice <-> PO	0.5	Rich deterministic fields but retrieval also valuable.
invoice <-> GRN	0.3	Quantities, dates, supplier info commonly present on both.
PO <-> contract	0.5	Moderate — contracts may have PO refs but often sparse.
PO <-> GRN	0.3	Rich — PO refs, quantities, dates.
default	0.4	Balanced fallback for unlisted pairs.
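
One possible shape for the alpha lookup, sketched under assumptions: the document-type keywords (:invoice, :contract, :po, :grn), the use of sets as keys, and the helper name alpha-for are all hypothetical. Only the numeric values come from the table above.

```clojure
;; Keys are unordered sets, so [:invoice :contract] and [:contract :invoice]
;; resolve to the same alpha. The keyword names are assumed, not confirmed.
(def type-pair-alpha
  {#{:invoice :contract} 0.6
   #{:invoice :po}       0.5
   #{:invoice :grn}      0.3
   #{:po :contract}      0.5
   #{:po :grn}           0.3})

(def default-alpha 0.4)

(defn alpha-for
  "Returns the blend weight for a pair of document types, in either order.
  Unlisted pairs fall back to default-alpha."
  [type-a type-b]
  ;; (set [...]) instead of a #{...} literal: the literal throws on
  ;; duplicate elements when both types are the same.
  (get type-pair-alpha (set [type-a type-b]) default-alpha))

(alpha-for :contract :invoice) ; same entry as :invoice / :contract
(alpha-for :grn :contract)     ; unlisted pair, falls back to the default
```

Keying on a set keeps the table symmetric without listing each pair twice.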

Signal Changes

Remove (redundant with hybrid search):

:supplier-name-fuzzy (weight 15) — candidate retrieval already filters by normalized_counterparty
:description-overlap (weight 10) — BM25 does bag-of-words better on full searchable text

Keep (structured field comparisons that add value beyond retrieval):

:po-number-exact (60), :contract-ref-exact (55), :po-ref-exact (55)
:vat-id-match (30), :vat-id-mismatch (-40)
:iban-match (25)
:quantity-exact (35)
:amount-within-2pct (20), :amount-within-5pct (10)
:date-within-period (20), :delivery-date-match (25)
:currency-mismatch (-30)
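
The kept signals and the normalization rule (sum of weights / 100, clamped to [0,1]) could be expressed roughly as below. The names signal-weights and deterministic-score are hypothetical, and the real evidence-signals structure in evidence.clj may be shaped differently; only the weights are taken from the list above.

```clojure
;; Weights of the kept signals, copied from the list above.
(def signal-weights
  {:po-number-exact     60
   :contract-ref-exact  55
   :po-ref-exact        55
   :vat-id-match        30
   :vat-id-mismatch     -40
   :iban-match          25
   :quantity-exact      35
   :amount-within-2pct  20
   :amount-within-5pct  10
   :date-within-period  20
   :delivery-date-match 25
   :currency-mismatch   -30})

(defn deterministic-score
  "Sums the weights of the fired signals, divides by 100, and clamps
  the result to [0,1]. Unknown signal keys contribute 0."
  [fired-signals]
  (let [total (reduce + 0 (map #(get signal-weights % 0) fired-signals))]
    (-> (/ total 100.0) (max 0.0) (min 1.0))))

(deterministic-score [:po-number-exact])                 ; single strong signal
(deterministic-score [:vat-id-match :currency-mismatch]) ; penalties cancel it out
```

Because negative weights can push the sum below zero and several strong matches can push it above 100, the clamp is what keeps the value usable in the blend.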

Thresholds

Unchanged: 0.70 (auto-match), 0.30 (minimum to consider); scores in between go to the LLM for a decision. Recalibrate empirically after deployment.
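
A sketch of how the two thresholds partition the final score. The function name classify-score, the outcome keywords, and the choice of inclusive lower bounds are assumptions; the source only gives the two cut-off values.

```clojure
(def auto-match-threshold 0.70)
(def min-consider-threshold 0.30)

(defn classify-score
  "Maps a final blended score to an outcome.
  Boundaries are treated as inclusive here, which is an assumption."
  [score]
  (cond
    (>= score auto-match-threshold)   :auto-match
    (>= score min-consider-threshold) :llm-decides
    :else                             :filtered))

(classify-score 0.925) ; above the auto-match cut-off
(classify-score 0.48)  ; middle band
(classify-score 0.14)  ; below the minimum
```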

Data Flow

candidates/find-candidates -> [rows with cosine similarity preserved]
                            |
              evidence/compute-score -> deterministic score (12 signals)
                            |
              blend-score(type-pair, cosine, deterministic) -> final score

candidates/find-candidates already returns :score (cosine) per candidate row. No change.
core/score-all-candidates reads candidate :score, computes deterministic score, blends them.
Returns {:score final, :retrieval-score cosine, :deterministic-score det, :evidence signals}.
document_match.confidence stores the final blended score.
Sub-scores recorded in evidence JSONB for auditability.

No schema migration needed.
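
The result map described above could be assembled roughly as follows. score-candidate and its argument list are hypothetical; alpha and the deterministic score are passed in here, whereas in the design they would come from type-pair-alpha and evidence/compute-score.

```clojure
(defn score-candidate
  "Builds the result map for one candidate row.
  candidate is assumed to carry its cosine similarity under :score,
  as returned by candidates/find-candidates."
  [candidate alpha det-score signals]
  (let [cosine (:score candidate)
        final  (+ (* alpha cosine)
                  (* (- 1.0 alpha) det-score))]
    {:score               final      ; stored in document_match.confidence
     :retrieval-score     cosine     ; sub-score kept for auditability
     :deterministic-score det-score  ; sub-score kept for auditability
     :evidence            signals}))

;; Example: a candidate with cosine 0.85 and a deterministic score of 1.0.
(score-candidate {:score 0.85} 0.5 1.0 [:po-number-exact])
```

Keeping both sub-scores next to the final score makes it possible to see after the fact whether a match was driven by retrieval or by structured evidence.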

File Changes

`evidence.clj`

Remove :supplier-name-fuzzy and :description-overlap from evidence-signals
Remove extract-description-words, stop-words, Jaro-Winkler import, and corresponding collect-signals blocks
Add type-pair-alpha map
Add blend-score function

`core.clj`

score-all-candidates: pass candidate :score (cosine) through to blend-score
Return richer result map with sub-scores
Update logging to include retrieval and deterministic sub-scores

`normalize.clj`

No changes. get-counterparty-name is still used by extract-counterparty for candidate retrieval.

Tests

Remove tests for dropped signals
Add tests for blend-score with various type pairs and score combinations
Update integration test assertions for new score structure
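
A sketch of the blend-score tests using clojure.test, with rows taken from the Example Scenarios table. The blend function here takes alpha directly, which simplifies the planned (type-pair, cosine, deterministic) signature; blend and approx= are hypothetical helpers.

```clojure
(require '[clojure.test :refer [deftest are run-tests]])

(defn blend
  "Stand-in for blend-score with alpha supplied directly (assumption)."
  [alpha cosine det]
  (+ (* alpha cosine) (* (- 1.0 alpha) det)))

(defn approx=
  "Floating-point comparison with a small absolute tolerance."
  [a b]
  (< (Math/abs (double (- a b))) 1e-9))

(deftest blend-scenarios
  ;; Columns: alpha, cosine, deterministic, expected final score.
  (are [alpha cosine det expected]
       (approx= expected (blend alpha cosine det))
    0.6 0.80 0.00 0.48
    0.5 0.85 1.00 0.925
    0.5 0.85 0.00 0.425
    0.3 0.75 0.60 0.645
    0.4 0.35 0.00 0.14))

(run-tests)
```

Comparing with a tolerance rather than = avoids spurious failures from floating-point rounding in the weighted sum.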

Example Scenarios

Scenario	Cosine	Deterministic	alpha	Final	Outcome
bikosigma invoice <-> contract	0.80	0.00	0.6	0.48	LLM decides (was: filtered at 0.15)
Invoice <-> PO with matching PO#	0.85	1.00	0.5	0.925	Auto-match
Invoice <-> PO, no deterministic	0.85	0.00	0.5	0.425	LLM decides
Invoice <-> GRN with quantities	0.75	0.60	0.3	0.645	LLM decides
Unrelated documents	0.35	0.00	0.4	0.14	Filtered out