Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.
The current booking history matching implementation produces poor results because it weights description similarity at 40% in the confidence formula. Historical descriptions ("01.2025 Car Lease") and invoice descriptions ("Servicerate vom 01.02.2026 bis 28.02.2026") are semantically related but textually different, causing pg_trgm to return ~0.06 similarity and tank the overall score.
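To make the effect of the 40% weight concrete, here is a minimal arithmetic sketch. The source only states that description similarity carries 40% of the confidence formula. The linear weighted sum, the 60% supplier weight, and the name `confidence` are assumptions made for illustration, not the actual implementation.

```clojure
;; ASSUMPTION: only the 40% description weight comes from the document.
;; The linear form and the 60% supplier weight are hypothetical.
(def description-weight 0.4)
(def supplier-weight 0.6)

(defn confidence
  "Weighted sum of two similarity scores, each in [0, 1]."
  [supplier-sim description-sim]
  (+ (* supplier-weight supplier-sim)
     (* description-weight description-sim)))

;; A perfect supplier match combined with the ~0.06 description
;; similarity reported above.
(confidence 1.0 0.06)
```

Under these assumptions, even an exact supplier match ends up at roughly 0.62, because the description term contributes almost nothing of its 40% share.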
PR #257's original approach worked better: filter by supplier name only, pass all matches to the LLM as CSV, let the LLM reason about semantic similarity.
Simplify to match the PR #257 strategy, with explicit parameter passing.

run "invoice"
│
├─► fetch-supplier-booking-history(db-pool, legal-entity-id, issuer-name)
│     └─► SQL: similarity(supplier_name_normalized, ...) >= 0.7, LIMIT 50
│
├─► booking-history->csv(matches)
│     └─► CSV string with: supplier-name, description, net-amount,
│                          debit-account, credit-account, cost-center
│
└─► Pass booking-csv to processors:
      (->AccountsMatcher context ingestion booking-csv)
      (->CostCenterMatcher context ingestion booking-csv)
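
The first two steps of the flow above could look roughly like the sketch below. It builds the query as a plain `[sql & params]` vector so it can be inspected without a database, and formats matches as CSV. The table name `booking_history`, the `legal_entity_id` column, the selected column names, and the helpers `supplier-history-sqlvec` and `csv-cell` are assumptions; only the similarity filter on `supplier_name_normalized`, the 0.7 threshold, the limit of 50, and the CSV fields come from this document.

```clojure
(require '[clojure.string :as str])

;; Values taken from the flow above.
(def supplier-threshold 0.7)
(def max-matches 50)

;; ASSUMPTION: table and column names other than supplier_name_normalized
;; are hypothetical. No ordering is specified in the source, so none is added.
(defn supplier-history-sqlvec
  "Builds a [sql & params] vector for a supplier-only lookup."
  [legal-entity-id issuer-name]
  [(str "SELECT supplier_name, description, net_amount, "
        "debit_account, credit_account, cost_center "
        "FROM booking_history "
        "WHERE legal_entity_id = ? "
        "AND similarity(supplier_name_normalized, ?) >= ? "
        "LIMIT ?")
   legal-entity-id issuer-name supplier-threshold max-matches])

;; CSV fields as listed in the flow above.
(def csv-columns
  [:supplier-name :description :net-amount
   :debit-account :credit-account :cost-center])

(defn csv-cell
  "Renders one value; quotes it when it contains a comma, quote, or line break."
  [v]
  (let [s (str v)] ; nil becomes the empty string
    (if (re-find #"[\",\r\n]" s)
      (str "\"" (str/replace s "\"" "\"\"") "\"")
      s)))

(defn booking-history->csv
  "Turns a seq of match maps into a CSV string with a header row."
  [matches]
  (->> matches
       (map (fn [m] (str/join "," (map #(csv-cell (get m %)) csv-columns))))
       (cons (str/join "," (map name csv-columns)))
       (str/join "\n")))
```

Quoting cells matters here because descriptions are free text and may contain commas, which would otherwise shift columns in the prompt.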
Database:
- description_normalized (already done in migration)

Code (post_process.clj):
- enrich-with-booking-history
- find-booking-history-matches
- fetch-supplier-booking-history: supplier-only filtering, 0.7 threshold, limit 50
- AccountsMatcher record: add booking-csv field
- CostCenterMatcher record: add booking-csv field
- run "invoice": fetch once, pass to both processors

Prompts:
- ${booking-history} CSV into accounts-match and cost-center-match prompts

| Aspect | Before | After |
|---|---|---|
| Filtering | supplier (0.3) + description (weighted 40%) | supplier only (0.7) |
| Confidence | high/medium/none tiers | None - pass all matches |
| Format | Nested JSON on line items | CSV in prompt |
| Computation | Pre-enrichment before processors | Fetch once, pass explicitly |
| Max matches | 10 candidates, return 1-3 | 50 matches |
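
The "fetch once, pass explicitly" and "CSV in prompt" rows can be sketched as follows. The record field order follows the constructor calls shown in the flow earlier in this document. The helpers `render-prompt` and `build-processors` are hypothetical names introduced for illustration; the actual prompt templating code is not shown in the source.

```clojure
(require '[clojure.string :as str])

;; Field lists follow the constructor calls in the flow:
;; (->AccountsMatcher context ingestion booking-csv)
(defrecord AccountsMatcher [context ingestion booking-csv])
(defrecord CostCenterMatcher [context ingestion booking-csv])

;; ASSUMPTION: hypothetical helper. It replaces the literal
;; ${booking-history} placeholder with the CSV string.
(defn render-prompt
  [template booking-csv]
  (str/replace template "${booking-history}" booking-csv))

;; ASSUMPTION: hypothetical helper. It receives an already-fetched CSV,
;; so the history lookup runs once per invoice rather than per processor.
(defn build-processors
  [context ingestion booking-csv]
  [(->AccountsMatcher context ingestion booking-csv)
   (->CostCenterMatcher context ingestion booking-csv)])
```

Passing the CSV as an explicit field keeps both processors working from the same data without a separate enrichment pass over the line items.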

- fetch-supplier-booking-history for threshold/limit behavior
- booking_history_integration_test.clj