Pairing-Specific LLM Matching Prompts

Date: 2026-03-02 Status: Approved

Problem

The document-matching LLM prompt is generic: it asks the model to "determine whether it genuinely belongs to the same business transaction" regardless of which document types are being matched. Different pairings have fundamentally different matching semantics:

Invoice → Contract: one (an invoice is typically covered by exactly one contract)
Contract → Invoice: many (a contract covers many invoices)
Invoice ↔ PO: many:many at line-item level
GRN ↔ PO: many:many at line-item level

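A single generic prompt cannot express these constraints. It never tells the LLM whether to pick one candidate or several, or whether partial line-item overlap counts as a match, so matching quality suffers.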

Design

Direction-Dependent Pair Prompts

The lookup key for prompts is the ordered vector [source-type candidate-type], not an unordered set, so invoice → contract and contract → invoice resolve to different prompts. This captures the directional nature of matching semantics.
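A minimal sketch of why the key has to be ordered, assuming document types are Clojure keywords such as :invoice (an assumption; the design does not say how types are represented). The keyword values in the map are placeholders for real prompt configurations.

```clojure
;; Assumption: document types are keywords; the map values are
;; placeholders standing in for real prompt configurations.
(def prompt-keys
  {[:invoice :contract] :invoice->contract
   [:contract :invoice] :contract->invoice})

;; Vectors are ordered, so each direction resolves to its own entry.
(get prompt-keys [:invoice :contract]) ;=> :invoice->contract
(get prompt-keys [:contract :invoice]) ;=> :contract->invoice

;; Sets are unordered: both directions would collapse into one key.
(= #{:invoice :contract} #{:contract :invoice}) ;=> true
```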

8 prompt configurations (4 pairs × 2 directions):

Source → Candidate	Cardinality	Focus
invoice → contract	1 (exclusive)	Counterparty, contract scope vs invoice items, amounts, service period, contract refs
contract → invoice	many	All invoices under this contract, dates within contract period
invoice → purchase-order	many (line-item)	Line item descriptions, quantities, amounts, PO refs
purchase-order → invoice	many (line-item)	Partial invoicing, line item matching
purchase-order → contract	1 (exclusive)	Contract scope vs PO items, contract refs
contract → purchase-order	many	All POs under this contract
goods-received-note → purchase-order	many (line-item)	Quantities received vs ordered, line items, PO refs
purchase-order → goods-received-note	many (line-item)	Delivery confirmations, quantities
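The table can be held as plain data. This sketch encodes only the cardinality column, using hypothetical keywords (:exclusive, :many, :line-item) for "1 (exclusive)", "many" and "many (line-item)"; the real pair-prompts map would also carry the prompt text. The helper checks that every pair is configured in both directions.

```clojure
;; Hypothetical encoding of the cardinality column only.
(def pair-cardinality
  {[:invoice :contract]                    :exclusive
   [:contract :invoice]                    :many
   [:invoice :purchase-order]              :line-item
   [:purchase-order :invoice]              :line-item
   [:purchase-order :contract]             :exclusive
   [:contract :purchase-order]             :many
   [:goods-received-note :purchase-order]  :line-item
   [:purchase-order :goods-received-note]  :line-item})

(defn both-directions-covered?
  "True when every [a b] key has a matching [b a] key."
  [m]
  (every? (fn [[a b]] (contains? m [b a])) (keys m)))

(count pair-cardinality)                    ;=> 8
(both-directions-covered? pair-cardinality) ;=> true
```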

Cardinality Is Guidance, Not Enforcement

For exclusive pairs, the prompt instructs the LLM to "select the single best match." However, if the LLM returns multiple matches, we accept them all. The instruction steers the LLM toward a single answer, but the LLM's judgment prevails: nothing downstream discards the extra matches.
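A sketch of the "guidance, not enforcement" split, assuming cardinality is carried as a keyword on each pair configuration (the keyword names are hypothetical; the strings are the task texts from this design). Cardinality selects the wording sent to the LLM, while post-processing keeps whatever comes back.

```clojure
;; Cardinality only changes the task wording sent to the LLM.
(defn task-instruction [cardinality]
  (case cardinality
    :exclusive "Select the single best matching contract, if any."
    :many      "Match all candidates that are genuinely related."
    :line-item (str "Match all candidates that share relevant line items. "
                    "Partial matches (some line items match) are valid.")))

(defn accept-matches
  "Keeps every match the LLM returned. Cardinality is deliberately
  ignored, so an exclusive pair can still yield several matches."
  [_cardinality matches]
  (vec matches))

(count (accept-matches :exclusive [{:candidate 1} {:candidate 2}])) ;=> 2
```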

Candidate Grouping

Currently, a single LLM call evaluates all candidates regardless of type. An invoice's candidates could include both POs and contracts in the same batch.

With pair-specific prompts, candidates are grouped by type before LLM evaluation:

candidates → evidence scoring → group by candidate type →
  per group: partition by threshold → LLM call with pair-specific prompt →
merge all groups → cluster assignment

Groups are evaluated in parallel (one future per group), so end-to-end latency tracks the slowest group rather than the sum of all groups.
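The grouping and partitioning steps of the pipeline can be sketched as pure data transformations. Assumptions, none of which the design specifies: each scored candidate is a map {:doc {...} :score n}, candidates at or above the threshold are the ones sent to the LLM, and 0.5 is an illustrative threshold only.

```clojure
;; Assumptions: candidate shape {:doc {...} :score n}; candidates with
;; score >= threshold go to the LLM; 0.5 is illustrative.
(defn group-by-candidate-type [candidates]
  ;; Same key as the design: (:document/type (:doc candidate))
  (group-by (comp :document/type :doc) candidates))

(defn partition-by-threshold [threshold candidates]
  {:to-llm (filterv #(>= (:score %) threshold) candidates)
   :below  (filterv #(< (:score %) threshold) candidates)})

(def candidates
  [{:doc {:document/id 1 :document/type :purchase-order} :score 0.8}
   {:doc {:document/id 2 :document/type :contract} :score 0.9}
   {:doc {:document/id 3 :document/type :purchase-order} :score 0.2}])

;; One entry per candidate type, each already split by threshold.
(def per-group
  (into {}
        (for [[candidate-type cands] (group-by-candidate-type candidates)]
          [candidate-type (partition-by-threshold 0.5 cands)])))
```

Each group's :to-llm vector is what would be sent with that group's pair-specific prompt.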

Prompt Structure

Each pair configuration provides:

System prompt: Role context, matching semantics, what to focus on
Task instructions: Cardinality guidance, output format

Shared across all pairs (unchanged):

format-document-summary — type-aware document formatting
format-line-items — line item list formatting
format-candidates — candidate list with scores and evidence
Response schema — {matches: [{candidate, confidence, reasoning}]}
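A sketch of how the per-pair and shared parts might combine into one request. The formatter functions below are hypothetical stand-ins (the real format-document-summary, format-line-items and format-candidates are unchanged and their signatures are not shown here), and the role/content message shape is an assumption about the LLM client.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-ins for the shared, unchanged formatters.
(defn summary* [doc] (str "Source: " (name (:document/type doc))))
(defn line-items* [doc] (str/join "\n" (map :description (:line-items doc))))
(defn candidates* [cands]
  (str/join "\n" (map #(str "- " (:document/id (:doc %))) cands)))

(defn assemble-messages
  "Only :system and :task come from the pair configuration; the
  formatted document, line items and candidates are shared."
  [{:keys [system task]} source-doc candidates]
  [{:role "system" :content system}
   {:role "user"
    :content (str/join "\n\n" [(summary* source-doc)
                               (line-items* source-doc)
                               (candidates* candidates)
                               task])}])
```

Because the response schema is shared too, parsing the {matches: [...]} reply does not depend on which pair configuration built the request.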

Prompt Content Direction

Exclusive pairs (invoice→contract, PO→contract):

System: "Match [source] to its governing contract. A [source] is typically covered by exactly one contract."
Focus: counterparty identity, contract scope/deliverables vs source items, amounts within contract value, date alignment, explicit contract refs
Task: "Select the single best matching contract, if any."

Many pairs (contract→invoice, contract→PO):

System: "Match [source] to related [candidates]. A [source] can be associated with multiple [candidates]."
Focus: counterparty identity, dates within relevant periods, amounts
Task: "Match all candidates that are genuinely related."

Line-item pairs (invoice↔PO, GRN↔PO):

System: "Match [source] to [candidates] by comparing line items. A [source] can partially match multiple [candidates]. Focus on line item descriptions, quantities, and amounts."
Task: "Match all candidates that share relevant line items. Partial matches (some line items match) are valid."
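One way to keep the eight system prompts consistent is to render each from its class template by filling the [source] and [candidates] placeholders. This is a sketch under assumptions: the display names are hypothetical, and the article handling ("An invoice", "the invoice") is this sketch's own addition to make the templates read naturally.

```clojure
(require '[clojure.string :as str])

;; Hypothetical display names: [singular plural] per document type.
(def type-names
  {:invoice             ["invoice" "invoices"]
   :contract            ["contract" "contracts"]
   :purchase-order      ["purchase order" "purchase orders"]
   :goods-received-note ["goods received note" "goods received notes"]})

(def system-templates
  {:exclusive (str "Match [source] to its governing contract. "
                   "A [source] is typically covered by exactly one contract.")
   :many      (str "Match [source] to related [candidates]. "
                   "A [source] can be associated with multiple [candidates].")
   :line-item (str "Match [source] to [candidates] by comparing line items. "
                   "A [source] can partially match multiple [candidates]. "
                   "Focus on line item descriptions, quantities, and amounts.")})

(defn- with-article [noun]
  (str (if (#{\a \e \i \o \u} (first noun)) "An " "A ") noun))

(defn render-system
  "Fills the placeholders of a class template. \"A [source]\" is replaced
  first so the article agrees with the noun."
  [pair-class source-type candidate-type]
  (let [source (first (type-names source-type))
        cands  (second (type-names candidate-type))]
    (-> (system-templates pair-class)
        (str/replace "A [source]" (with-article source))
        (str/replace "[source]" (str "the " source))
        (str/replace "[candidates]" cands))))

(render-system :many :contract :invoice)
;=> "Match the contract to related invoices. A contract can be associated with multiple invoices."
```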

Code Changes

`llm_decision.clj`

Add pair-prompts map keyed by [source-type candidate-type] vectors
Change build-match-prompt signature: [source-doc candidates] → [source-doc candidate-type candidates]
Change llm-match-decision signature to also accept candidate-type
No changes to formatting functions or response parsing
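A sketch of the new arities. Assumptions: pair-prompts values are {:system :task} maps (only one entry is shown), pr-str stands in for the unchanged format-candidates, call-llm is a hypothetical stub for the LLM client, and an unconfigured pair throws. Whether the real code should throw or fall back to the generic prompt is not specified in this design.

```clojure
;; Assumption: pair-prompts values are {:system :task} maps.
(def pair-prompts
  {[:invoice :contract]
   {:system "Match the invoice to its governing contract."
    :task   "Select the single best matching contract, if any."}})

(defn build-match-prompt
  "Was [source-doc candidates]; now also takes candidate-type so the
  ordered key [source-type candidate-type] can be built."
  [source-doc candidate-type candidates]
  (let [pair-key [(:document/type source-doc) candidate-type]
        prompt   (or (get pair-prompts pair-key)
                     (throw (ex-info "No pair prompt configured"
                                     {:pair-key pair-key})))]
    {:pair-key pair-key
     :system   (:system prompt)
     ;; pr-str is a placeholder for the unchanged format-candidates.
     :user     (str (pr-str (mapv (comp :document/id :doc) candidates))
                    "\n\n" (:task prompt))}))

(defn call-llm
  "Hypothetical stub standing in for the real LLM client."
  [_prompt]
  {:matches []})

(defn llm-match-decision
  "Threads candidate-type through to build-match-prompt; response
  parsing is unchanged."
  [source-doc candidate-type candidates]
  (call-llm (build-match-prompt source-doc candidate-type candidates)))
```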

`core.clj`

After evidence scoring, group candidates by (:document/type (:doc candidate))
Per group: partition by threshold, call llm-match-decision with candidate-type
Run groups in parallel (futures)
Merge results from all groups for cluster assignment
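The parallel step and the merge can be sketched with plain futures. This assumes a hypothetical decide-fn with the new llm-match-decision arity [source-doc candidate-type candidates] that returns {:matches [...]}, and groups that have already been partitioned by threshold.

```clojure
;; Assumption: decide-fn has the arity [source-doc candidate-type candidates]
;; and returns {:matches [...]}; groups maps candidate type -> candidates.
(defn decide-all-groups
  "Starts one future per candidate-type group, then derefs them all and
  concatenates the matches for cluster assignment."
  [decide-fn source-doc groups]
  (let [pending (mapv (fn [[candidate-type cands]]
                        (future (decide-fn source-doc candidate-type cands)))
                      groups)]
    ;; Every future is already running; deref waits for each in turn.
    (into [] (mapcat (comp :matches deref)) pending)))
```

Using mapv rather than map matters here: a lazy map would start each future only when it is dereferenced, which would serialize the LLM calls.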

No DB schema changes

The document_match table and evidence JSONB work unchanged. Optionally store the pair-prompt key in evidence for debugging.
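A sketch of the optional debugging field. The field name :pair-prompt-key is hypothetical; the keywords are converted to strings so the stored value does not depend on how the JSON encoder treats keywords.

```clojure
;; Hypothetical field name; stored as strings, e.g. ["invoice" "contract"].
(defn with-pair-prompt-key [evidence source-type candidate-type]
  (assoc evidence :pair-prompt-key [(name source-type) (name candidate-type)]))

(with-pair-prompt-key {:score 0.9} :invoice :contract)
;=> {:score 0.9, :pair-prompt-key ["invoice" "contract"]}
```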

Testing

Unit tests for build-match-prompt: each of the 8 pair prompts produces the correct system and task text
Unit tests for candidate grouping: mixed candidate types are correctly split into per-type groups
Integration test with a mocked LLM: verify that the correct prompt is sent for each group
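The mocked-LLM integration test could take roughly this shape with clojure.test and with-redefs. Both call-llm and evaluate-groups here are hypothetical stand-ins for the production functions; the test records the pair key of every prompt sent and checks that each group got its own.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Hypothetical stand-ins for production code.
(defn call-llm [_prompt]
  (throw (ex-info "Real LLM must not be called from tests" {})))

(defn evaluate-groups [source-doc groups]
  (into {}
        (for [[candidate-type cands] groups]
          [candidate-type
           (call-llm {:pair-key [(:document/type source-doc) candidate-type]
                      :candidates cands})])))

(deftest sends-pair-specific-prompt-per-group
  (let [sent (atom [])]
    ;; Replace the LLM client with a recorder that returns no matches.
    (with-redefs [call-llm (fn [prompt]
                             (swap! sent conj (:pair-key prompt))
                             {:matches []})]
      (evaluate-groups {:document/type :invoice}
                       {:contract [:c1] :purchase-order [:p1]}))
    (is (= #{[:invoice :contract] [:invoice :purchase-order]}
           (set @sent)))))

(run-tests)
```

with-redefs replaces the var's root binding, so the mock is also visible inside futures, provided they finish before the with-redefs body exits.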