Scoring Blend: Integrate Retrieval Scores into Evidence Scoring

Problem

The matching pipeline has two phases: hybrid search (BM25 + semantic) retrieves candidates, then deterministic evidence signals score them independently. The retrieval scores are discarded before scoring.

This causes two problems:

  1. Redundant signals: :supplier-name-fuzzy and :description-overlap poorly re-implement what BM25/semantic search already does.
  2. Sparse-type penalty: Document pairs like invoice-contract have few matching structured fields (no VAT, IBAN, amounts on contracts), so they score near zero even when retrieval correctly identifies them as related.

Example: the bikosigma invoice-contract pair scored 0.15 (only :supplier-name-fuzzy fired), which is below the 0.30 minimum, even though hybrid search correctly found the contract and the invoice line items match the contract fee schedule verbatim.

Design

Scoring Model

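final_score = alpha * cosine_similarity + (1 - alpha) * deterministic_score

Here alpha is a weight between 0 and 1 chosen per document-type pair (see Alpha Values), cosine_similarity is the similarity that find-candidates preserves on each candidate row, and deterministic_score is the output of the evidence signals.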

Alpha Values

Document-Type Pair     alpha  Rationale
invoice <-> contract   0.6    Contracts lack VAT, IBAN, amounts. Semantic similarity is primary signal.
invoice <-> PO         0.5    Rich deterministic fields but retrieval also valuable.
invoice <-> GRN        0.3    Quantities, dates, supplier info commonly present on both.
PO <-> contract        0.5    Moderate: contracts may have PO refs but are often sparse.
PO <-> GRN             0.3    Rich: PO refs, quantities, dates.
default                0.4    Balanced fallback for unlisted pairs.
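
As a rough illustration, the alpha table and the scoring formula could be combined as below. This is a minimal sketch, not the actual implementation: the type keywords (:invoice, :po, :grn, :contract), the helper alpha-for, and the choice to pass the type pair as a two-element vector are all assumptions made for the example. Only the name blend-score and its argument order come from the Data Flow section.

```clojure
;; Hypothetical type keywords; the real pipeline may name document types differently.
;; Keys are sets because each pair in the table is symmetric (<->).
(def alpha-by-type-pair
  {#{:invoice :contract} 0.6
   #{:invoice :po}       0.5
   #{:invoice :grn}      0.3
   #{:po :contract}      0.5
   #{:po :grn}           0.3})

;; Fallback for pairs not listed in the table.
(def default-alpha 0.4)

(defn alpha-for
  "Looks up alpha for a two-element collection of document types.
   (set ...) is used instead of a set literal so that a same-type pair
   such as [:invoice :invoice] does not throw a duplicate-key error."
  [type-pair]
  (get alpha-by-type-pair (set type-pair) default-alpha))

(defn blend-score
  "final = alpha * cosine + (1 - alpha) * deterministic"
  [type-pair cosine deterministic]
  (let [alpha (alpha-for type-pair)]
    (+ (* alpha cosine)
       (* (- 1.0 alpha) deterministic))))
```

Keying the map by sets means [:invoice :contract] and [:contract :invoice] resolve to the same alpha, so callers do not need to normalize the order of the pair.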

Signal Changes

Remove (redundant with hybrid search):

Keep (structured field comparisons that add value beyond retrieval):

Thresholds

Unchanged: 0.70 (auto-match), 0.30 (minimum to consider). Recalibrate empirically after deployment.
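
The two thresholds partition final scores into the three outcomes used in the Example Scenarios table. A small sketch of that partition follows; the function name classify-score, the outcome keywords, and the inclusive treatment of scores exactly at a threshold are assumptions, since the source does not say which side of each boundary is included.

```clojure
(def auto-match-threshold 0.70)
(def min-consider-threshold 0.30)

(defn classify-score
  "Maps a final score to an outcome keyword.
   Assumption: a score equal to a threshold counts as meeting it."
  [final-score]
  (cond
    (>= final-score auto-match-threshold)   :auto-match
    (>= final-score min-consider-threshold) :llm-decides
    :else                                   :filtered-out))
```

For instance, (classify-score 0.48) evaluates to :llm-decides and (classify-score 0.14) to :filtered-out.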

Data Flow

candidates/find-candidates -> [rows with cosine similarity preserved]
                            |
              evidence/compute-score -> deterministic score (10 signals)
                            |
              blend-score(type-pair, cosine, deterministic) -> final score

No schema migration needed.
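
One way to picture the flow above is a function that threads each candidate row through the deterministic scorer and then the blend. This is only a sketch under stated assumptions: the row keys (:type-pair, :cosine, :deterministic-score, :final-score) and the function name score-candidates are hypothetical, and the scorer and blend function are passed in as arguments so the example does not depend on how evidence/compute-score or blend-score are actually defined.

```clojure
(defn score-candidates
  "For each candidate row, computes the deterministic score, blends it
   with the cosine similarity kept on the row, and attaches both results.
   compute-score: (fn [row]) -> deterministic score
   blend-score:   (fn [type-pair cosine deterministic]) -> final score"
  [candidates compute-score blend-score]
  (mapv (fn [{:keys [type-pair cosine] :as row}]
          (let [deterministic (compute-score row)]
            (assoc row
                   :deterministic-score deterministic
                   :final-score (blend-score type-pair cosine deterministic))))
        candidates))
```

Because the scores are attached to in-memory rows rather than persisted, a shape like this is consistent with no schema migration being required.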

File Changes

evidence.clj

core.clj

normalize.clj

Tests

Example Scenarios

Scenario                           Cosine  Deterministic  alpha  Final  Outcome
bikosigma invoice <-> contract     0.80    0.00           0.6    0.48   LLM decides (was: filtered at 0.15)
Invoice <-> PO with matching PO#   0.85    1.00           0.5    0.925  Auto-match
Invoice <-> PO, no deterministic   0.85    0.00           0.5    0.425  LLM decides
Invoice <-> GRN with quantities    0.75    0.60           0.3    0.645  LLM decides
Unrelated documents                0.35    0.00           0.4    0.14   Filtered out
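
The Final column can be re-derived mechanically from the other columns, which may be a useful starting point for a regression test. The sketch below takes alpha directly from each row rather than from a lookup; the names blend, scenarios, close?, and scenarios-consistent? are hypothetical.

```clojure
(defn blend
  "final = alpha * cosine + (1 - alpha) * deterministic"
  [alpha cosine deterministic]
  (+ (* alpha cosine)
     (* (- 1.0 alpha) deterministic)))

;; Each row: [cosine deterministic alpha expected-final],
;; copied from the scenario table above.
(def scenarios
  [[0.80 0.00 0.6 0.48]
   [0.85 1.00 0.5 0.925]
   [0.85 0.00 0.5 0.425]
   [0.75 0.60 0.3 0.645]
   [0.35 0.00 0.4 0.14]])

(defn close?
  "Floating-point comparison with a small tolerance."
  [a b]
  (< (Math/abs (double (- a b))) 1e-9))

(defn scenarios-consistent?
  "True when every row's expected final score matches the formula."
  []
  (every? (fn [[cosine deterministic alpha expected]]
            (close? (blend alpha cosine deterministic) expected))
          scenarios))
```

A tolerance is used instead of = because the blended values are doubles and exact equality is not guaranteed.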