Semantic Search for LLM Booking Context

Goal

Replace the current pg_trgm supplier-name matching with semantic search to select a curated, deduplicated set of historical bookings to send to the LLM for account/cost-center assignment.

Background

Orcha's current approach:

Problems:

Scope

In scope:

Out of scope:

Embedding Changes

Text format:

{supplier_name} | {description}

The text is embedded raw, with no normalization.
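A minimal sketch of how the embedding input could be assembled from the format above. The function name build_embedding_text is hypothetical; the only behavior taken from this document is the " | " separator and the absence of any normalization (no trimming, lowercasing, or whitespace collapsing).

```python
def build_embedding_text(supplier_name: str, description: str) -> str:
    """Join the two raw fields as '{supplier_name} | {description}'.

    The inputs are passed through untouched: no strip(), no lower(),
    no whitespace collapsing, matching "raw text, no normalization".
    """
    return f"{supplier_name} | {description}"
```

Because nothing is normalized, two bookings that differ only in casing or spacing produce different embedding inputs.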

Schema changes:

Code cleanup:

Search & Curation Algorithm

Input: Invoice with N line items from supplier X

Step 1: Per-line-item semantic search

For each line item:

Configurable parameters (exposed in UI):

Step 2: Merge results

Collect the results of all N searches into one pool. Tag each result with the line item(s) whose search returned it.
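The merge could look roughly like the sketch below. It assumes each search hit is a dict carrying a booking_id that identifies the historical booking, plus debit_account and cost_center; those field names, the MergedResult type, and the choice to key the pool by booking_id are illustrative assumptions, not part of this spec.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergedResult:
    booking_id: int            # assumed identifier of the historical booking
    debit_account: str
    cost_center: str
    found_by: set[int] = field(default_factory=set)  # line items that found it


def merge_results(per_line_results: dict[int, list[dict]]) -> list[MergedResult]:
    """Pool the hits of all per-line-item searches into one list.

    per_line_results maps a line item number to the hits of its search.
    A booking returned by several searches appears once in the pool,
    tagged with every line item that found it.
    """
    pool: dict[int, MergedResult] = {}
    for line_no, hits in per_line_results.items():
        for hit in hits:
            entry = pool.get(hit["booking_id"])
            if entry is None:
                entry = MergedResult(
                    booking_id=hit["booking_id"],
                    debit_account=hit["debit_account"],
                    cost_center=hit["cost_center"],
                )
                pool[hit["booking_id"]] = entry
            entry.found_by.add(line_no)
    return list(pool.values())
```

Keying the pool by booking means a booking found by two line items is stored once with both tags, rather than twice.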

Step 3: Cluster by (debit_account, cost_center)

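Group the pooled results by their (debit_account, cost_center) assignment, so that each cluster holds the bookings that share the same account and cost center.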
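As a rough illustration, the grouping can be done with a dictionary keyed by the (debit_account, cost_center) pair. The sketch assumes each pooled result is a dict with those two keys; the function name cluster_by_assignment is hypothetical.

```python
from collections import defaultdict


def cluster_by_assignment(results: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group results by their (debit_account, cost_center) pair.

    Results within a cluster keep the order in which they were passed in.
    """
    clusters: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for result in results:
        key = (result["debit_account"], result["cost_center"])
        clusters[key].append(result)
    return dict(clusters)
```

Two bookings with the same account but different cost centers land in separate clusters, since the key is the full pair.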

Step 4: Deduplicate within clusters

For each cluster:

Step 5: Build output

For each cluster:

Output: Curated list as CSV.
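One possible way to serialize the curated list, using Python's standard csv module. The column set is not fixed by this document, so the caller passes it in; the function name to_csv and the columns used in the usage note are placeholders only.

```python
import csv
import io


def to_csv(rows: list[dict], columns: list[str]) -> str:
    """Serialize curated bookings to a CSV string with a header row.

    Keys in a row that are not listed in columns are ignored, and values
    containing commas or quotes are quoted by the csv module.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",   # drop keys that are not output columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, to_csv(rows, ["supplier_name", "description", "debit_account", "cost_center"]) would emit one header line followed by one line per curated booking. Going through the csv module rather than joining strings by hand keeps supplier names that contain commas intact.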

UI / Interface

Search input:

Search output:

Batch mode:

Code Cleanup Summary

Remove:

Update:

Add: