Many-to-Many Document Matching

Problem

The current matching algorithm is oriented toward finding a single best match per document type group. When a document arrives and multiple existing documents are genuine matches (e.g., a contract added after 5 invoices from the same supplier already exist), the system should create matches with all of them, not just pick the best one.

Design

Decision Logic Change

Replace the current decide-matches logic with a two-tier approach per type group:

  1. High-confidence tier (score ≥ 0.70): auto-match ALL candidates, with no cap. Method: rule-based.
  2. Uncertain tier (0.30 ≤ score < 0.70): send up to 50 candidates to the LLM together for per-candidate yes/no evaluation. Method: llm.

Both tiers can produce matches for the same type group at the same time; they are not mutually exclusive.
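
The two-tier split could be sketched roughly as follows. This is a minimal illustration, not the actual decide-matches implementation: the function name, the candidate shape ({:id ... :score ...}), and the output keys are hypothetical. It also assumes that a score of exactly 0.70 falls in the high-confidence tier and that, when more than 50 uncertain candidates exist, the highest-scoring ones are kept; the design above does not specify which candidates to drop.

```clojure
;; Thresholds and cap taken from the tier definitions above.
(def high-threshold 0.70)
(def low-threshold 0.30)
(def llm-batch-cap 50)

(defn decide-matches-sketch
  "Hypothetical sketch: splits the scored candidates of one type group into
  rule-based auto-matches and a capped batch for LLM evaluation."
  [candidates]
  (let [high      (filter #(>= (:score %) high-threshold) candidates)
        uncertain (->> candidates
                       (filter #(and (>= (:score %) low-threshold)
                                     (< (:score %) high-threshold)))
                       ;; Assumption: keep the highest-scoring candidates
                       ;; when the cap is exceeded.
                       (sort-by :score >)
                       (take llm-batch-cap))]
    ;; Both tiers are returned together; neither excludes the other.
    {:auto-matches (mapv #(assoc % :method :rule-based) high)
     :llm-batch    (vec uncertain)}))
```

Called with one candidate scoring 0.92 and another scoring 0.45, this sketch would place the first in :auto-matches and the second in :llm-batch, so a single type group can yield matches from both tiers.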

LLM Prompt Change

Reframe from "pick the best match" to "for each candidate, determine whether it genuinely belongs to the same business transaction as the source document." Candidates are sent together so the LLM can reason about overlaps and duplicates.
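
As an illustration of the reframed wording, a prompt builder might look something like the sketch below. It is not the real build-match-prompt: the function name and the :id and :summary fields are hypothetical, and the exact prompt text is only an example of the per-candidate framing.

```clojure
(require '[clojure.string :as str])

(defn prompt-sketch
  "Hypothetical sketch: builds one prompt that lists all candidates together,
  so the model can judge each one and also see overlaps between them."
  [source candidates]
  (str "For each candidate below, determine whether it genuinely belongs to "
       "the same business transaction as the source document.\n"
       "Return only the candidates that match.\n\n"
       "Source document: " (:summary source) "\n\n"
       "Candidates:\n"
       ;; One numbered line per candidate, all in the same prompt.
       (str/join "\n"
                 (map-indexed (fn [i c]
                                (str (inc i) ". [" (:id c) "] " (:summary c)))
                              candidates))))
```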

The response schema is unchanged: {matches: [{candidate, confidence, reasoning}]}. A candidate absent from the list is treated as no match.
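
The "absence means no match" rule could be applied along these lines. This is a sketch under assumptions: it presumes the response has already been parsed into a map with keyword keys, that the candidate field carries the candidate's identifier, and that candidates are maps with an :id key. The function name and the :match? key are hypothetical.

```clojure
(defn llm-decisions-sketch
  "Hypothetical sketch: turns a parsed LLM response into one decision per
  candidate in the batch. Candidates missing from :matches are non-matches."
  [batch response]
  (let [;; Index the returned matches by candidate identifier.
        by-id (into {} (map (juxt :candidate identity)) (:matches response))]
    (mapv (fn [{:keys [id] :as candidate}]
            (if-let [m (get by-id id)]
              (assoc candidate
                     :match? true
                     :method :llm
                     :confidence (:confidence m)
                     :reasoning (:reasoning m))
              ;; Not listed in the response: treated as no match.
              (assoc candidate :match? false)))
          batch)))
```

Iterating over the batch rather than over the response means identifiers the model returns that were never sent are ignored, which is a defensive choice of this sketch rather than something the design specifies.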

What Doesn't Change

Files Affected

File                                                  Change
src/com/getorcha/workers/matching/core.clj            Rewrite decide-matches
src/com/getorcha/workers/matching/llm_decision.clj    Update build-match-prompt wording
Tests for both namespaces                             Update expectations