Create LLM context matching backend and Flask web interface for semantic search comparison.

Purpose: Complete the search implementation phase by adding Orcha-style LLM matching (using Gemini Flash with historical booking context) and a minimal web interface that displays results from all 4 search approaches side-by-side.

Output: LLM matching module and Flask app with search form and results display.

<execution_context> @./.claude/get-shit-done/workflows/execute-plan.md @./.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/03-search-implementation/03-CONTEXT.md @.planning/phases/03-search-implementation/03-RESEARCH.md

Prior plan summary for search module

@.planning/phases/03-search-implementation/03-01-SUMMARY.md

Database and normalize modules

@src/db.py @src/normalize.py

Task 1: Enable pg_trgm extension and create LLM matching module init.sql, src/search/llm_matching.py 1. Add pg_trgm extension to init.sql: - Add `CREATE EXTENSION IF NOT EXISTS pg_trgm;` after pgvector extension - Add GIN index on supplier_name_normalized for similarity search: `CREATE INDEX IF NOT EXISTS idx_supplier_trgm ON line_item USING gin (supplier_name_normalized gin_trgm_ops);`
  1. Create src/search/llm_matching.py with Orcha-style LLM context matching:

fetch_booking_history(conn, supplier_name: str, limit: int = 50) -> list[dict]:

  • Normalize supplier name using src.normalize.normalize_supplier_name
  • Query using pg_trgm similarity operator % with threshold 0.7 (matching Orcha exactly per user decision)
  • Order by similarity(supplier_name_normalized, %s) DESC
  • Return fields: supplier_name, description, net_amount, debit_account, cost_center
  • Handle empty/None supplier gracefully (return empty list)

format_as_csv(data: list[dict]) -> str:

  • Convert list of dicts to CSV string
  • Header row with column names
  • Handle empty data (return empty string)

build_matching_prompt(query_text: str, historical_bookings: list[dict]) -> str:

  • Replicate Orcha's prompt structure (see RESEARCH.md)
  • Include historical bookings as CSV context
  • Include query text
  • Request JSON response with debit_account and cost_center

call_gemini_flash(prompt: str) -> str:

  • Use google-genai Client (NOT Vertex AI - use API key like Orcha)
  • Model: 'gemini-2.5-flash'
  • Config: response_mime_type='application/json', temperature=0
  • Handle JSON wrapped in markdown code blocks (strip json if present)

llm_context_match(conn, query_text: str) -> dict:

  • Parse supplier hint from query (format: "supplier | description")
  • Fetch booking history for supplier
  • Build prompt and call Gemini
  • Parse JSON response
  • Return dict with keys: debit_account, cost_center, error (if failed)

Load .env for GOOGLE_API_KEY (different from Vertex AI credentials).

IMPORTANT: For Gemini via API key (not Vertex AI):

  • Remove GOOGLE_GENAI_USE_VERTEXAI or set to False
  • Create client without vertexai parameter: genai.Client()
  • This uses GOOGLE_API_KEY env var automatically

Update src/search/init.py to export llm_context_match.

# Apply pg_trgm extension
docker exec -i semantic-search-db psql -U dev -d semantic_search -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
docker exec -i semantic-search-db psql -U dev -d semantic_search -c "CREATE INDEX IF NOT EXISTS idx_supplier_trgm ON line_item USING gin (supplier_name_normalized gin_trgm_ops);"

# Test fetch_booking_history
python -c "
from src.db import get_connection
from src.search.llm_matching import fetch_booking_history
conn = get_connection()
results = fetch_booking_history(conn, 'AMAZON')
print(f'Found {len(results)} historical bookings for AMAZON')
assert len(results) > 0
conn.close()
"
pg_trgm extension enabled, LLM matching module fetches historical bookings and generates predictions.
Task 2: Create Flask web interface with parallel search execution src/app.py, src/templates/search.html 1. Install Flask: `uv add flask`
  1. Create src/templates/ directory

  2. Create src/app.py with Flask application:

from flask import Flask, render_template, request
from concurrent.futures import ThreadPoolExecutor, as_completed
from src.db import get_connection
from src.search import search_all_models
from src.search.llm_matching import llm_context_match

app = Flask(__name__, template_folder='templates')

@app.route('/', methods=['GET', 'POST'])
def search():
    results = None
    query = ''
    k = 5
    timing = {}

    if request.method == 'POST':
        query = request.form.get('query', '').strip()
        k = int(request.form.get('k', 5))

        if query:
            conn = get_connection()
            try:
                # Execute all searches in parallel
                results = execute_parallel_search(conn, query, k)
            finally:
                conn.close()

    return render_template('search.html', query=query, k=k, results=results)

def execute_parallel_search(conn, query: str, k: int) -> dict:
    """Execute pgvector and LLM searches in parallel."""
    results = {}

    def search_pgvector_all():
        # Uses search_all_models from Plan 01
        return ('pgvector', search_all_models(conn, query, k))

    def search_llm():
        return ('llm', llm_context_match(conn, query))

    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [
            executor.submit(search_pgvector_all),
            executor.submit(search_llm),
        ]
        for future in as_completed(futures):
            try:
                key, result = future.result()
                if key == 'pgvector':
                    results.update(result)  # Adds google, jina, minilm keys
                else:
                    results[key] = result
            except Exception as e:
                # Handle individual search failures gracefully
                if key == 'llm':
                    results['llm'] = {'error': str(e)}

    return results

if __name__ == '__main__':
    app.run(debug=True, port=5000)
  1. Create src/templates/search.html:
  • HTML5 document with minimal CSS (inline style block)
  • Search form: text input (size 80), K dropdown (3, 5, 10), submit button
  • Results grid: 4 columns (Google, Jina, MiniLM, LLM)
  • Each pgvector column shows top-K results with:
    • Supplier name (bold)
    • Description
    • GL Account: {debit_account} | CC: {cost_center or '-'}
    • Amount: {net_amount}
    • Similarity: {similarity:.3f} (green, bold)
  • LLM column shows single prediction:
    • GL Account: {debit_account or 'N/A'}
    • Cost Center: {cost_center or 'N/A'}
    • Or error message if failed
  • Pre-fill query and K from form submission for easy iteration
  • Use Jinja2 template syntax (Flask default)

Keep styles minimal but functional:

  • CSS grid for 4-column layout
  • Border around each column
  • Light border-bottom between result items
  • Green color for similarity scores
# Start Flask app and verify it runs
cd /home/volrath/code/worktrees-orcha-semantic-search/spikes/semantic-search
python -c "from src.app import app; print('Flask app imports successfully')"

# Test search endpoint manually
python -c "
from src.app import app
with app.test_client() as client:
    # GET request should show form
    response = client.get('/')
    assert response.status_code == 200
    assert b'Search' in response.data

    # POST request with query
    response = client.post('/', data={'query': 'AMAZON | Office supplies', 'k': '3'})
    assert response.status_code == 200
    print('Flask app responding correctly')
"
Flask web interface displays search form and results from all 4 approaches in side-by-side columns.
Task 3: End-to-end verification with sample queries Run the Flask app and verify full functionality:
  1. Start the Flask development server:

    cd /home/volrath/code/worktrees-orcha-semantic-search/spikes/semantic-search
    python -m src.app
    
  2. Test with sample queries from the test set:

    • Query: "AMAZON | Bueroartikel"
    • Query: "Deutsche Telekom | Mobilfunk"
    • Query: "IKEA | Moebel"
  3. Verify for each query:

    • All 4 columns show results
    • pgvector columns show K results with similarity scores
    • LLM column shows GL account and cost center prediction
    • No errors in any column
  4. Test edge cases:

    • Empty query (should show form only)
    • Query with no similar suppliers (LLM should handle gracefully)
    • K values: 3, 5, 10

Create a simple test script at src/test_search.py for automated verification:

"""Quick test script for search functionality."""
from src.db import get_connection
from src.search import search_all_models
from src.search.llm_matching import llm_context_match

def test_search():
    conn = get_connection()

    # Test pgvector search
    results = search_all_models(conn, "AMAZON | Office supplies", k=3)
    assert 'google' in results
    assert 'jina' in results
    assert 'minilm' in results
    assert len(results['google']) == 3

    # Test LLM matching
    llm_result = llm_context_match(conn, "AMAZON | Office supplies")
    assert 'debit_account' in llm_result or 'error' in llm_result

    conn.close()
    print("All search tests passed!")

if __name__ == "__main__":
    test_search()

Run: python -m src.test_search

python -m src.test_search

Expected output: "All search tests passed!" End-to-end search works with all 4 approaches returning results.

1. pg_trgm extension installed: ```sql SELECT * FROM pg_extension WHERE extname = 'pg_trgm'; ```
  1. LLM matching returns predictions:

    from src.search.llm_matching import llm_context_match
    from src.db import get_connection
    conn = get_connection()
    result = llm_context_match(conn, "AMAZON | supplies")
    print(result)  # Should have debit_account, cost_center
    
  2. Flask app accessible at http://localhost:5000:

    • Form displays correctly
    • Search returns 4 columns of results
    • All models show similarity scores
    • LLM shows prediction
  3. Parallel execution (4 searches complete faster than sequential):

    • Total time should be close to slowest individual search
    • Not 4x the average search time

<success_criteria>

  • pg_trgm extension enabled with GIN index on supplier_name_normalized
  • llm_context_match returns GL account and cost center predictions
  • Flask app runs at localhost:5000 with search form
  • Results display in 4-column grid (Google, Jina, MiniLM, LLM)
  • All searches execute in parallel via ThreadPoolExecutor
  • Sample queries return meaningful results from all approaches </success_criteria>
After completion, create `.planning/phases/03-search-implementation/03-02-SUMMARY.md`