Date: 2026-03-02 Status: Approved
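The document matching LLM prompt is generic: it says "determine whether it genuinely belongs to the same business transaction" regardless of which document types are being matched. Different pairings have fundamentally different matching semantics. Some are exclusive (an invoice belongs to a single contract), some are one-to-many (a contract covers many invoices and POs), and some match at line-item level (invoices and goods-received notes against purchase orders).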
A generic prompt can't express these constraints, leading to suboptimal matching quality.
The lookup key for prompts is `[source-type candidate-type]` (ordered vector),
not an unordered set. This captures the directional nature of matching
semantics.
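A minimal sketch of why the key is a vector rather than a set. It assumes document types are plain keywords, and the two-entry map and the `lookup-pair-prompt` helper are hypothetical, included only to show the lookup behaviour:

```clojure
;; Hypothetical two-entry excerpt; the real map would hold all 8 configurations.
(def pair-prompts
  {[:invoice :contract] {:cardinality :exclusive}
   [:contract :invoice] {:cardinality :many}})

(defn lookup-pair-prompt
  "Returns the config for the ordered [source-type candidate-type] pair, or nil."
  [source-type candidate-type]
  (get pair-prompts [source-type candidate-type]))

;; Vectors compare element by element, so each direction has its own entry.
(lookup-pair-prompt :invoice :contract) ;; => {:cardinality :exclusive}
(lookup-pair-prompt :contract :invoice) ;; => {:cardinality :many}

;; A set would collapse both directions into a single key.
(= #{:invoice :contract} #{:contract :invoice}) ;; => true
(= [:invoice :contract] [:contract :invoice])   ;; => false
```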
8 prompt configurations (4 pairs × 2 directions):
| Source → Candidate | Cardinality | Focus |
|---|---|---|
| invoice → contract | 1 (exclusive) | Counterparty, contract scope vs invoice items, amounts, service period, contract refs |
| contract → invoice | many | All invoices under this contract, dates within contract period |
| invoice → purchase-order | many (line-item) | Line item descriptions, quantities, amounts, PO refs |
| purchase-order → invoice | many (line-item) | Partial invoicing, line item matching |
| purchase-order → contract | 1 (exclusive) | Contract scope vs PO items, contract refs |
| contract → purchase-order | many | All POs under this contract |
| goods-received-note → purchase-order | many (line-item) | Quantities received vs ordered, line items, PO refs |
| purchase-order → goods-received-note | many (line-item) | Delivery confirmations, quantities |
For exclusive (1:1) pairs, the prompt instructs the LLM to "select the single best match." However, if the LLM returns multiple matches, we accept them all. The instruction improves prompt quality but the LLM's judgment prevails.
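The rule above could be sketched as follows. The names `exclusive-pairs`, `cardinality-instruction`, and `accepted-matches` are hypothetical, document types are assumed to be keywords, and the wording of the non-exclusive instruction is a placeholder rather than the actual prompt text:

```clojure
;; The two pairs the table marks as "1 (exclusive)".
(def exclusive-pairs
  #{[:invoice :contract] [:purchase-order :contract]})

(defn cardinality-instruction
  "Returns the instruction sentence for the ordered pair."
  [source-type candidate-type]
  (if (contains? exclusive-pairs [source-type candidate-type])
    "Select the single best match."
    ;; Placeholder wording for the non-exclusive case.
    "Select every candidate that matches."))

(defn accepted-matches
  "Keeps every match the LLM returned. There is deliberately no truncation to
   one match for exclusive pairs: the instruction only shapes the prompt."
  [llm-response]
  (vec (:matches llm-response)))
```

With this shape, an exclusive pair that comes back with two matches still yields both of them downstream.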
Currently, a single LLM call evaluates all candidates regardless of type. An invoice's candidates could include both POs and contracts in the same batch.
With pair-specific prompts, candidates are grouped by type before LLM evaluation:
candidates → evidence scoring → group by candidate type →
per group: partition by threshold → LLM call with pair-specific prompt →
merge all groups → cluster assignment
Groups are evaluated in parallel (futures) to avoid increased latency.
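The grouping and parallel evaluation might look roughly like this. It is a sketch under assumptions: `evaluate-groups` and its `evaluate-group` argument are hypothetical names, with `evaluate-group` standing in for the threshold partitioning plus the pair-specific LLM call, and candidates are assumed to carry their document under `:doc`:

```clojure
(defn evaluate-groups
  "Groups scored candidates by document type, evaluates each group in its own
   future, then merges the per-group results into one vector.
   `evaluate-group` is a hypothetical fn of [candidate-type candidates]."
  [evaluate-group candidates]
  (let [groups  (group-by #(:document/type (:doc %)) candidates)
        ;; mapv is eager, so every future starts before any is dereferenced.
        pending (mapv (fn [[candidate-type group]]
                        (future (evaluate-group candidate-type group)))
                      groups)]
    ;; deref blocks until each group finishes; mapcat merges the results.
    (into [] (mapcat deref) pending)))

;; Usage with a stub evaluator that only tags each candidate with its group.
(def sample-candidates
  [{:doc {:document/type :contract       :id 1}}
   {:doc {:document/type :purchase-order :id 2}}
   {:doc {:document/type :contract       :id 3}}])

(def sample-result
  (evaluate-groups
   (fn [candidate-type group]
     (mapv #(assoc % :evaluated-as candidate-type) group))
   sample-candidates))
```

Because all futures are started before the first `deref`, total latency is bounded by the slowest group rather than the sum of the groups. When run as a standalone script, calling `(shutdown-agents)` at the end lets the JVM exit promptly.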
Each pair configuration provides:
Shared across all pairs (unchanged):
- `format-document-summary`: type-aware document formatting
- `format-line-items`: line item list formatting
- `format-candidates`: candidate list with scores and evidence
- `{matches: [{candidate, confidence, reasoning}]}`

Exclusive pairs (invoice→contract, PO→contract):
Many pairs (contract→invoice, contract→PO):
Line-item pairs (invoice↔PO, GRN↔PO):
`llm_decision.clj`:

- `pair-prompts` map keyed by `[source-type candidate-type]` vectors
- `build-match-prompt` signature: `[source-doc candidates]` → `[source-doc candidate-type candidates]`
- `llm-match-decision` signature to also accept `candidate-type`

`core.clj`:

- `(:document/type (:doc candidate))`
- `llm-match-decision` with `candidate-type`

The `document_match` table and evidence JSONB work unchanged. Optionally
store the pair-prompt key in evidence for debugging.
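One possible way to record the pair-prompt key in the evidence map is sketched below. The `:pair-prompt-key` field name and the `with-pair-prompt-key` helper are hypothetical, document types are assumed to be keywords, and the types are stored as strings on the assumption that the map is later serialized to JSON:

```clojure
(defn with-pair-prompt-key
  "Adds the ordered pair used to select the prompt to an evidence map.
   Keywords are converted to strings so the value serializes cleanly."
  [evidence source-type candidate-type]
  (assoc evidence
         :pair-prompt-key [(name source-type) (name candidate-type)]))

(with-pair-prompt-key {:score 0.82} :invoice :contract)
;; => {:score 0.82, :pair-prompt-key ["invoice" "contract"]}
```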
`build-match-prompt`: each of the 8 pair prompts produces correct
system + task text
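A test along these lines could cover all 8 pairs. This is a sketch only: the `build-match-prompt` defined here is a stand-in so the example runs on its own, and its return shape (a map with `:system` and `:task` strings) is an assumption, as is the use of keywords for document types:

```clojure
(require '[clojure.test :refer [deftest is testing]])

;; The 8 ordered pairs from the table above.
(def all-pairs
  [[:invoice :contract]                   [:contract :invoice]
   [:invoice :purchase-order]             [:purchase-order :invoice]
   [:purchase-order :contract]            [:contract :purchase-order]
   [:goods-received-note :purchase-order] [:purchase-order :goods-received-note]])

;; Stand-in with the new three-argument signature; replace with the real function.
(defn build-match-prompt
  [source-doc candidate-type candidates]
  {:system (str "Match " (name (:document/type source-doc))
                " to " (name candidate-type) ".")
   :task   (str "Evaluate " (count candidates) " candidates.")})

(deftest build-match-prompt-covers-all-pairs
  (doseq [[source-type candidate-type] all-pairs]
    (testing (str source-type " -> " candidate-type)
      (let [{:keys [system task]}
            (build-match-prompt {:document/type source-type} candidate-type [])]
        (is (string? system))
        (is (seq system))
        (is (string? task))
        (is (seq task))))))
```

A fuller version would also assert on pair-specific wording, for example that exclusive pairs contain the "select the single best match" instruction.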