# Phase 3: Search Implementation - Context

Gathered: 2026-02-20
Status: Ready for planning
## Phase Boundary
A query interface that returns search results from two backends: pgvector semantic search (3 embedding models) and LLM context matching. Users submit a query through a web interface and see the results from all approaches side by side.
## Implementation Decisions
### Query Interface
- Simple web interface for submitting queries
- Single text input for the query, dropdown for K, the number of results per embedding model (3, 5, or 10)
- Results always show all 3 embedding models + LLM in one view
- No model selector needed — all models run on every query
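
A minimal sketch of validating the submitted form, assuming field names `query` and `k` and a default K of 5 (neither is specified above; both are placeholders). It is framework-agnostic so that a Flask view could pass `request.form` to it directly.

```python
from dataclasses import dataclass
from typing import Mapping

ALLOWED_K = (3, 5, 10)  # the dropdown options listed above
DEFAULT_K = 5           # assumption: no default is specified in these notes


@dataclass(frozen=True)
class SearchRequest:
    query: str
    k: int


def parse_search_form(form: Mapping[str, str]) -> SearchRequest:
    """Validate the submitted form and return a SearchRequest.

    Raises ValueError for an empty query or a K outside the dropdown options.
    """
    query = form.get("query", "").strip()
    if not query:
        raise ValueError("query must not be empty")

    raw_k = form.get("k", str(DEFAULT_K))
    try:
        k = int(raw_k)
    except ValueError:
        raise ValueError(f"k must be an integer, got {raw_k!r}") from None
    if k not in ALLOWED_K:
        raise ValueError(f"k must be one of {ALLOWED_K}, got {k}")

    return SearchRequest(query=query, k=k)
```

Validating K on the server as well as in the dropdown keeps a hand-edited request from asking for an arbitrary number of rows.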
### Result Display
- Side-by-side columns: Google | Jina | MiniLM | LLM
- Each pgvector column shows top-K results
- LLM column shows single prediction (GL account + cost center)
- Full details per result row: supplier name, description, GL account, cost center, amounts, similarity score, embedding model used
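
The per-model top-K lookup could be sketched as below. The table name `invoices`, the embedding column names, the single `amount` field, and the use of cosine distance are all assumptions, since the schema and distance metric are not stated in these notes. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives a cosine similarity score.

```python
from dataclasses import dataclass

# Hypothetical mapping from display column to embedding column in the table.
EMBEDDING_COLUMNS = {
    "Google": "embedding_google",
    "Jina": "embedding_jina",
    "MiniLM": "embedding_minilm",
}


@dataclass(frozen=True)
class ResultRow:
    """One row in a pgvector result column (fields from the list above)."""
    supplier_name: str
    description: str
    gl_account: str
    cost_center: str
    amount: float          # assumption: a single amount field
    similarity: float
    embedding_model: str


def build_top_k_sql(model: str) -> str:
    """Return a parameterized top-K query for one embedding model.

    The column name comes from a fixed whitelist, so only the query vector
    and K are passed as bound parameters (psycopg-style named placeholders).
    """
    column = EMBEDDING_COLUMNS[model]  # KeyError for an unknown model
    return (
        "SELECT supplier_name, description, gl_account, cost_center, amount, "
        f"1 - ({column} <=> %(query_vec)s::vector) AS similarity "
        "FROM invoices "  # hypothetical table name
        f"ORDER BY {column} <=> %(query_vec)s::vector "
        "LIMIT %(k)s"
    )
```

Ordering by the distance expression itself, rather than by the computed similarity alias, is the form pgvector can serve from an approximate index if one exists on that column.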
### LLM Matching Behavior
- Replicate Orcha's actual approach — investigate config.edn, ingestion.clj, post_process.clj
- Use Gemini Flash (same as Orcha)
- Copy credentials from Orcha config
- Match Orcha's prompt/context structure exactly
- Return GL account + cost center only (no confidence score)
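
A sketch of the return shape only, not of Orcha's prompt or response format, which still has to be read from its code. It assumes, purely for illustration, that the model replies with a JSON object containing `gl_account` and `cost_center` keys; the real parsing will depend on what Orcha's prompt asks for.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmPrediction:
    """The single LLM result: GL account and cost center, no confidence."""
    gl_account: str
    cost_center: str


def parse_llm_prediction(response_text: str) -> LlmPrediction:
    """Extract the two fields from a JSON reply (hypothetical format).

    Any extra keys, such as a confidence value, are ignored.
    Raises ValueError if the reply is not a JSON object with both fields.
    """
    data = json.loads(response_text)  # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("LLM response is not a JSON object")
    try:
        return LlmPrediction(
            gl_account=str(data["gl_account"]),
            cost_center=str(data["cost_center"]),
        )
    except KeyError as exc:
        raise ValueError(f"LLM response is missing field {exc}") from None
```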
### Execution
- Run all 4 searches (3 embedding models + LLM) in parallel
- Concurrent execution keeps response time close to that of the slowest single search rather than the sum of all four
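
One way to run the four searches concurrently is a thread pool, which suits this workload because each search mostly waits on the database or an API. This is a sketch under assumptions: the function name `run_all_searches` and the `{"ok": ..., "data"/"error": ...}` result shape are illustrative, and each backend is taken to be a callable accepting `(query, k)`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Assumed backend signature: (query, k) -> results for one column.
SearchFn = Callable[[str, int], Any]


def run_all_searches(
    backends: Dict[str, SearchFn], query: str, k: int
) -> Dict[str, Dict[str, Any]]:
    """Run every backend in its own thread and collect results per column.

    A failure in one backend (for example the LLM API) is captured as an
    error entry so the other columns can still be displayed.
    """
    results: Dict[str, Dict[str, Any]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        # Submit everything first so all searches start before any is awaited.
        futures = {name: pool.submit(fn, query, k) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "data": future.result()}
            except Exception as exc:
                results[name] = {"ok": False, "error": str(exc)}
    return results
```

Passing the backends as a dict keyed by column name (Google, Jina, MiniLM, LLM) preserves display order and lets the template render an error message in place of a failed column.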
### Claude's Discretion
- Results page UX (inline vs navigation) — pick simplest approach
- LLM API error handling — appropriate error display
- Flask app structure and routing
## Specific Ideas
- "Whatever is simplest" — prioritize straightforward implementation over features
- Web interface should be minimal, not fancy
- Orcha replication is key for LLM matching — investigate their actual code
## Deferred Ideas
None — discussion stayed within phase scope
Phase: 03-search-implementation
Context gathered: 2026-02-20