Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Booking History Matching Simplification

Problem

The current booking history matching implementation produces poor results because it weights description similarity at 40% in the confidence formula. Historical descriptions ("01.2025 Car Lease") and invoice descriptions ("Servicerate vom 01.02.2026 bis 28.02.2026") are semantically related but textually different, so pg_trgm returns a similarity of about 0.06 for the description, which drags the overall score down even when the supplier matches.

PR #257's original approach worked better: filter by supplier name only, pass all matches to the LLM as CSV, and let the LLM reason about semantic similarity.

Design

Simplify to match the PR #257 strategy: fetch the supplier's booking history once and pass it to the processors as an explicit parameter.

Data Flow

run "invoice"
    │
    ├─► fetch-supplier-booking-history(db-pool, legal-entity-id, issuer-name)
    │       └─► SQL: similarity(supplier_name_normalized, ...) >= 0.7, LIMIT 50
    │
    ├─► booking-history->csv(matches)
    │       └─► CSV string with: supplier-name, description, net-amount,
    │           debit-account, credit-account, cost-center
    │
    └─► Pass booking-csv to processors:
            (->AccountsMatcher context ingestion booking-csv)
            (->CostCenterMatcher context ingestion booking-csv)
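
The flow above can be sketched in Clojure as follows. This is a minimal illustration, not the code in post_process.clj: the map keys mirror the CSV columns listed in the diagram, the two records are hypothetical stand-ins for the real processors (only their constructor argument order comes from the diagram), and `build-processors`, `fetch-fn`, and the sample account numbers are made up for the example. Quoting fields and returning a header-only CSV when there are no matches are also assumptions.

```clojure
(ns booking-history-sketch
  (:require [clojure.string :as str]))

;; Column order taken from the data flow diagram.
(def csv-columns
  [:supplier-name :description :net-amount
   :debit-account :credit-account :cost-center])

(defn- csv-field
  "Renders one value as a CSV field, quoting it when it contains a
  comma, double quote, or line break. nil becomes an empty field."
  [v]
  (let [s (str v)]
    (if (re-find #"[\",\r\n]" s)
      (str "\"" (str/replace s "\"" "\"\"") "\"")
      s)))

(defn booking-history->csv
  "Turns a seq of booking maps into a CSV string with a header row.
  With no matches, only the header row is returned (an assumption)."
  [matches]
  (let [header (str/join "," (map name csv-columns))
        rows   (for [m matches]
                 (str/join "," (map #(csv-field (get m %)) csv-columns)))]
    (str/join "\n" (cons header rows))))

;; Hypothetical stand-ins for the real processors. Only the argument
;; order (context ingestion booking-csv) comes from the design.
(defrecord AccountsMatcher [context ingestion booking-csv])
(defrecord CostCenterMatcher [context ingestion booking-csv])

(defn build-processors
  "Fetches supplier history once and hands the same CSV string to both
  processors. `fetch-fn` stands in for fetch-supplier-booking-history."
  [fetch-fn db-pool legal-entity-id issuer-name context ingestion]
  (let [booking-csv (-> (fetch-fn db-pool legal-entity-id issuer-name)
                        booking-history->csv)]
    [(->AccountsMatcher context ingestion booking-csv)
     (->CostCenterMatcher context ingestion booking-csv)]))

;; Usage with one made-up match:
(comment
  (println
   (booking-history->csv
    [{:supplier-name "Acme, Inc." :description "01.2025 Car Lease"
      :net-amount 100 :debit-account "6500" :credit-account "1600"
      :cost-center nil}])))
```

Injecting the fetch function keeps the sketch runnable without a database, and it makes the "fetch once, pass explicitly" property easy to check: both processors receive the identical `booking-csv` value.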

Changes

Database:

Code (post_process.clj):

Prompts:

Comparison

Aspect      | Before                                       | After
Filtering   | supplier (0.3) + description (weighted 40%)  | supplier only (0.7)
Confidence  | high/medium/none tiers                       | None - pass all matches
Format      | Nested JSON on line items                    | CSV in prompt
Computation | Pre-enrichment before processors             | Fetch once, pass explicitly
Max matches | 10 candidates, return 1-3                    | 50 matches

Testing