# Phase 3: Search Implementation - Context

Gathered: 2026-02-20

Status: Ready for planning

## Phase Boundary

A query interface that returns search results from two backends: pgvector semantic search (3 embedding models) and LLM context matching. Users submit a query through a web interface and see the results from all approaches side by side.

## Implementation Decisions

### Query Interface

- Simple web interface for submitting queries
- Single text input for the query, dropdown for K (3, 5, 10)
- Results always show all 3 embedding models + LLM in one view
- No model selector needed; all models run on every query
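The form rules above can be sketched as a small validation helper. This is a minimal sketch, not the planned implementation: the field names `query` and `k`, the `SearchRequest` type, and the default of 5 are assumptions, since this document only fixes the dropdown values. In a Flask view the function could be called with `request.form`.

```python
from dataclasses import dataclass
from typing import Mapping

ALLOWED_K = (3, 5, 10)  # dropdown options listed in the decisions above
DEFAULT_K = 5           # assumption: this document does not specify a default


@dataclass(frozen=True)
class SearchRequest:
    """Validated form input (hypothetical type, for illustration only)."""
    query: str
    k: int


def parse_search_form(form: Mapping[str, str]) -> SearchRequest:
    """Validate the submitted form and raise ValueError on bad input."""
    query = (form.get("query") or "").strip()
    if not query:
        raise ValueError("query must not be empty")

    raw_k = form.get("k", str(DEFAULT_K))
    try:
        k = int(raw_k)
    except (TypeError, ValueError):
        raise ValueError(f"k must be an integer, got {raw_k!r}") from None
    if k not in ALLOWED_K:
        raise ValueError(f"k must be one of {ALLOWED_K}, got {k}")

    return SearchRequest(query=query, k=k)
```

Because there is no model selector, the request carries only the query text and K; every backend receives the same pair.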

### Result Display

- Side-by-side columns: Google | Jina | MiniLM | LLM
- Each pgvector column shows the top-K results
- LLM column shows a single prediction (GL account + cost center)
- Full details per result row: supplier name, description, GL account, cost center, amounts, similarity score, embedding model used
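A top-K lookup for one pgvector column might be built roughly as follows. This is a sketch under stated assumptions: the table names in `MODEL_TABLES`, the column names, and the choice of cosine distance are all hypothetical, since the schema and distance metric are not given here. The only pgvector facts relied on are that `<=>` is the cosine distance operator and that a vector can be passed as a `'[x,y,...]'` text literal cast to `vector`.

```python
from typing import Sequence, Tuple

# Hypothetical table names, one per embedding model; the real schema may differ.
MODEL_TABLES = {
    "google": "embeddings_google",
    "jina": "embeddings_jina",
    "minilm": "embeddings_minilm",
}


def build_top_k_query(
    model: str, query_vector: Sequence[float], k: int
) -> Tuple[str, tuple]:
    """Return (sql, params) for a top-K cosine search on one model's table."""
    # Looking the table up in a fixed dict means no user input reaches the SQL text.
    table = MODEL_TABLES[model]

    # pgvector accepts vectors in the text form '[0.1,0.2,...]'.
    vector_literal = "[" + ",".join(repr(float(x)) for x in query_vector) + "]"

    # Column names below are assumptions based on the fields listed above.
    sql = (
        "SELECT supplier_name, description, gl_account, cost_center, amount, "
        "1 - (embedding <=> %s::vector) AS similarity "  # cosine similarity
        f"FROM {table} "
        "ORDER BY embedding <=> %s::vector "  # smallest distance first
        "LIMIT %s"
    )
    return sql, (vector_literal, vector_literal, k)
```

The returned pair is shaped for a DB-API cursor that uses `%s` placeholders, for example `cursor.execute(sql, params)`. The model name for the "embedding model used" field can be attached by the caller, since each query targets exactly one model.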

### LLM Matching Behavior

- Replicate Orcha's actual approach: investigate `config.edn`, `ingestion.clj`, `post_process.clj`
- Use Gemini Flash (same as Orcha)
- Copy credentials from the Orcha config
- Match Orcha's prompt/context structure exactly
- Return GL account + cost center only (no confidence score)
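The "GL account + cost center only" rule can be enforced at the point where the LLM response is parsed. The sketch below is a placeholder: it assumes the model returns a JSON object with `gl_account` and `cost_center` keys, which is hypothetical until Orcha's files have been reviewed. The actual response format should follow whatever Orcha's prompt produces.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMPrediction:
    """The only two fields the LLM column displays."""
    gl_account: str
    cost_center: str


def parse_llm_prediction(raw_text: str) -> LLMPrediction:
    """Parse a JSON response, keeping only GL account and cost center."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("LLM response must be a JSON object")

    # Key names are assumptions; adjust them to match Orcha's output.
    missing = [key for key in ("gl_account", "cost_center") if key not in payload]
    if missing:
        raise ValueError(f"LLM response is missing fields: {missing}")

    # Any extra keys (for example a confidence value) are deliberately dropped.
    return LLMPrediction(
        gl_account=str(payload["gl_account"]),
        cost_center=str(payload["cost_center"]),
    )
```

Raising `ValueError` for malformed responses gives the web layer a single exception type to turn into an error message in the LLM column.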

### Execution

- Run all 4 searches (3 embedding models + LLM) in parallel
- Rationale: concurrent execution gives a faster response than running the searches one after another
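One way to run the four searches concurrently is a thread pool from the standard library, which suits I/O-bound database and API calls. This is a minimal sketch: `run_all_searches`, the `SearchFn` signature, and the `{"ok": ..., "data"/"error": ...}` result shape are illustrative assumptions, not decisions recorded in this document. Failures are captured per backend so that an LLM API error does not hide the pgvector columns.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical signature: each backend takes (query, k) and returns its results.
SearchFn = Callable[[str, int], Any]


def run_all_searches(
    backends: Dict[str, SearchFn], query: str, k: int
) -> Dict[str, Dict[str, Any]]:
    """Run every backend in its own thread and collect results per backend."""
    results: Dict[str, Dict[str, Any]] = {}
    if not backends:
        return results  # ThreadPoolExecutor rejects max_workers=0

    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        # Submit everything first so all backends start before we wait on any.
        futures = {name: pool.submit(fn, query, k) for name, fn in backends.items()}

        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "data": future.result()}
            except Exception as exc:
                # Keep the failure local to this column; others still render.
                results[name] = {"ok": False, "error": str(exc)}
    return results
```

The result dict preserves the order in which the backends were passed in, so the template can render the columns in a fixed order such as Google, Jina, MiniLM, LLM.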

### Claude's Discretion

- Results page UX (inline vs. navigation): pick the simplest approach
- LLM API error handling: appropriate error display
- Flask app structure and routing

## Specific Ideas

- "Whatever is simplest": prioritize straightforward implementation over features
- Web interface should be minimal, not fancy
- Orcha replication is key for LLM matching; investigate their actual code

## Deferred Ideas

None. Discussion stayed within phase scope.

Phase: 03-search-implementation

Context gathered: 2026-02-20