Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Booking History Matching Simplification

Problem

The current booking history matching implementation produces poor results because it weights description similarity at 40% in the confidence formula. Historical descriptions ("01.2025 Car Lease") and invoice descriptions ("Servicerate vom 01.02.2026 bis 28.02.2026") are semantically related but textually different. pg_trgm compares character trigrams rather than meaning, so it returns a similarity of about 0.06 for such pairs, which drags the overall score down even when the supplier matches well.
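
The effect of the description weight can be illustrated with a small calculation. This is only a sketch: the 40% description weight comes from the text above, but giving the remaining 60% to supplier similarity is an assumption made for illustration, and the function name is hypothetical. The actual formula may contain other terms.

```clojure
;; Only the 0.4 description weight is taken from this document.
;; Assigning the remaining 0.6 to supplier similarity is an ASSUMPTION.
(def description-weight 0.4)
(def supplier-weight 0.6)

(defn weighted-confidence
  "Linear blend of supplier and description similarity (both in [0, 1]).
  Hypothetical stand-in for the old confidence formula."
  [supplier-sim description-sim]
  (+ (* supplier-weight supplier-sim)
     (* description-weight description-sim)))

;; Perfect supplier match, but pg_trgm sees almost no textual overlap
;; between the historical and the invoice description.
(weighted-confidence 1.0 0.06)   ; roughly 0.62

;; Same supplier match with a textually similar description, for comparison.
(weighted-confidence 1.0 0.9)    ; roughly 0.96
```

Under these assumed weights, a correct supplier with an unrelated-looking description loses about a third of the achievable score, and the loss comes entirely from the description term.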

PR #257's original approach worked better: filter by supplier name only, pass all matches to the LLM as CSV, and let the LLM reason about semantic similarity.

Design

Simplify to match the PR #257 strategy, with explicit parameter passing.

Data Flow

run "invoice"
    │
    ├─► fetch-supplier-booking-history(db-pool, legal-entity-id, issuer-name)
    │       └─► SQL: similarity(supplier_name_normalized, ...) >= 0.7, LIMIT 50
    │
    ├─► booking-history->csv(matches)
    │       └─► CSV string with: supplier-name, description, net-amount,
    │           debit-account, credit-account, cost-center
    │
    └─► Pass booking-csv to processors:
            (->AccountsMatcher context ingestion booking-csv)
            (->CostCenterMatcher context ingestion booking-csv)
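
The CSV step in the flow above could look roughly like the following. This is a minimal sketch, not the actual implementation: it assumes each match is a map with kebab-case keyword keys named after the columns in the diagram, and the header line and the quoting rules are assumptions as well.

```clojure
(require '[clojure.string :as str])

;; Column order follows the data-flow diagram.
(def csv-columns
  [:supplier-name :description :net-amount
   :debit-account :credit-account :cost-center])

(defn- csv-escape
  "Quotes a field when it contains a comma, double quote, or line break."
  [value]
  (let [s (str value)]   ; (str nil) is "", so missing fields become empty cells
    (if (re-find #"[\",\r\n]" s)
      (str "\"" (str/replace s "\"" "\"\"") "\"")
      s)))

(defn booking-history->csv
  "Renders booking-history rows (maps) as a CSV string with a header line.
  Assumes rows are keyed by the keywords in csv-columns."
  [matches]
  (let [header    (str/join "," (map name csv-columns))
        row->line (fn [row]
                    (str/join "," (map #(csv-escape (get row %)) csv-columns)))]
    (str/join "\n" (cons header (map row->line matches)))))

;; Usage with made-up account numbers and amounts:
(booking-history->csv
 [{:supplier-name  "Alphabet Fuhrparkmanagement"
   :description    "01.2025 Car Lease"
   :net-amount     450
   :debit-account  "6500"
   :credit-account "1600"
   :cost-center    "100"}])
```

Keeping this function pure (rows in, string out) means it can be tested without a database and reused unchanged by both processors.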

Changes

Database:

Remove GiST index on description_normalized (already done in migration)

Code (post_process.clj):

Delete enrich-with-booking-history
Delete find-booking-history-matches
Add fetch-supplier-booking-history — supplier-only filtering, 0.7 threshold, limit 50
Modify AccountsMatcher record — add booking-csv field
Modify CostCenterMatcher record — add booking-csv field
Modify run "invoice" — fetch once, pass to both processors
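
The code changes listed above might be sketched as follows. Several names here are assumptions and not taken from the codebase: the table name booking_history, the column names other than supplier_name_normalized, the ORDER BY clause, the helper names supplier-history-sqlvec and build-invoice-processors, and the extra execute! argument, which is injected only so the sketch runs without a database (a function such as next.jdbc/execute! has a compatible shape). The real fetch-supplier-booking-history takes db-pool, legal-entity-id and issuer-name, as shown in the data flow.

```clojure
(def supplier-similarity-threshold 0.7)
(def max-history-rows 50)

(defn supplier-history-sqlvec
  "Builds [sql & params] for the supplier-only lookup.
  Table and column names are hypothetical, except supplier_name_normalized.
  issuer-name is assumed to be normalized the same way as the column."
  [legal-entity-id issuer-name]
  ["SELECT supplier_name, description, net_amount,
           debit_account, credit_account, cost_center
      FROM booking_history
     WHERE legal_entity_id = ?
       AND similarity(supplier_name_normalized, ?) >= ?
     ORDER BY similarity(supplier_name_normalized, ?) DESC
     LIMIT ?"
   legal-entity-id
   issuer-name
   supplier-similarity-threshold
   issuer-name
   max-history-rows])

(defn fetch-supplier-booking-history
  "Fetches up to 50 historical bookings for suppliers similar to issuer-name.
  execute! is any (fn [db-pool sqlvec]) returning rows; passing it in is an
  assumption of this sketch, not part of the design."
  [execute! db-pool legal-entity-id issuer-name]
  (execute! db-pool (supplier-history-sqlvec legal-entity-id issuer-name)))

;; Both records gain a booking-csv field, matching the constructor calls
;; shown in the data flow.
(defrecord AccountsMatcher [context ingestion booking-csv])
(defrecord CostCenterMatcher [context ingestion booking-csv])

(defn build-invoice-processors
  "Hypothetical helper: the history is fetched and rendered once by the
  caller, then the same CSV string is handed to both processors."
  [context ingestion booking-csv]
  [(->AccountsMatcher context ingestion booking-csv)
   (->CostCenterMatcher context ingestion booking-csv)])
```

Passing booking-csv as an explicit field keeps each processor's inputs visible at construction time, instead of hiding them in pre-enriched line items.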

Prompts:

Inject ${booking-history} CSV into accounts-match and cost-center-match prompts
LLM reasons over historical data to inform account/cost-center assignments
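
Injecting the CSV into a prompt can be as simple as a literal placeholder substitution. The sketch below assumes the prompts are plain strings containing the ${booking-history} placeholder. The function name and the fallback text for an empty history are hypothetical.

```clojure
(require '[clojure.string :as str])

(def booking-history-placeholder "${booking-history}")

(defn inject-booking-history
  "Replaces the ${booking-history} placeholder with the CSV string.
  With string arguments, str/replace performs a literal replacement, so
  characters such as $ in the CSV are not treated as regex references."
  [prompt-template booking-csv]
  (str/replace prompt-template
               booking-history-placeholder
               (if (str/blank? booking-csv)
                 "(no booking history available)"   ; assumed fallback text
                 booking-csv)))

;; Usage with a made-up template:
(inject-booking-history
 "Historical bookings for this supplier:\n${booking-history}\n\nAssign accounts."
 "supplier-name,description\nAlphabet Fuhrparkmanagement,01.2025 Car Lease")
```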

Comparison

Aspect        Before                                        After
Filtering     supplier (0.3) + description (weighted 40%)   supplier only (0.7)
Confidence    high/medium/none tiers                        None - pass all matches
Format        Nested JSON on line items                     CSV in prompt
Computation   Pre-enrichment before processors              Fetch once, pass explicitly
Max matches   10 candidates, return 1-3                     50 matches

Testing

Unit test fetch-supplier-booking-history for threshold/limit behavior
Update integration tests in booking_history_integration_test.clj
REPL verification with Alphabet Fuhrparkmanagement test case