Stack Research

Domain: Semantic Search Comparison Benchmark (pgvector vs LLM-based matching)
Researched: 2026-02-20
Confidence: HIGH

Core Technologies

| Technology | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| PostgreSQL | 18.2 | Database with vector search | Latest stable and supported by pgvector 0.8.1 (which works with Postgres 13+). PGDATA changed in v18: mount the volume at /var/lib/postgresql, not a version-specific path | HIGH |
| pgvector | 0.8.1 | Vector similarity search extension | Latest stable (Nov 2024). HNSW indexes, halfvec support, binary quantization up to 64K dimensions | HIGH |
| Python | 3.12 | Embedding generation, benchmarking | Sweet spot: supported by all required libraries (google-genai, sentence-transformers, streamlit) | HIGH |
| Docker | latest | Container runtime | Official pgvector/pgvector image available with pg18 tag | HIGH |

Database Layer

| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| pgvector (Python) | 0.4.2 | Python bindings for pgvector | Official pgvector Python package, supports psycopg3 | HIGH |
| psycopg | 3.3.3 | PostgreSQL adapter | Modern async-capable driver, replaces psycopg2. Recommended driver for the pgvector Python package, which also supports psycopg2 | HIGH |

Embedding Libraries

| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| google-genai | 1.64.0 | Google embeddings (gemini-embedding-001, text-multilingual-embedding-002) | New unified SDK. Replaces deprecated google-generativeai. Works with both Gemini API and Vertex AI | HIGH |
| sentence-transformers | 5.2.3 | Local embeddings (all-MiniLM-L6-v2), Jina models | De facto standard for local embeddings. Can run Jina v3/v4 with trust_remote_code=True | HIGH |
| transformers | latest | Backend for sentence-transformers | Required dependency, handles model loading | HIGH |

Embedding Models Supported

| Model | Dimensions | Max Tokens | Access | Use Case |
|---|---|---|---|---|
| gemini-embedding-001 | 3072 (default), scalable to 768 | 2048 | Google API | Production multilingual. RECOMMENDED over text-multilingual-embedding-002 |
| text-multilingual-embedding-002 | 768 | 2048 | Vertex AI | Legacy multilingual. Still supported, but gemini-embedding-001 outperforms it |
| jina-embeddings-v3 | 1024 (default), scalable 32-1024 | 8192 | Jina API / SentenceTransformers | Multilingual retrieval, 570M params, 94 languages |
| all-MiniLM-L6-v2 | 384 | 256 | Local (free) | Fast local baseline, about 22M parameters (roughly 80MB on disk), no API costs |
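
The table lists gemini-embedding-001 and jina-embeddings-v3 as scalable to fewer dimensions. A minimal sketch of what such a reduction amounts to, assuming Matryoshka-style truncation (keep the leading components, then rescale to unit length so cosine comparisons stay well-behaved). The helper name `truncate_embedding` is hypothetical and not part of any SDK:

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components of `vec` and rescale to unit length."""
    if dims > len(vec):
        raise ValueError("cannot truncate to more dimensions than the vector has")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real 3072-dimensional embedding.
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_embedding(full, 2)
```

Whatever size is chosen must match the `vector(N)` column definition, so every model/dimension combination in the benchmark needs its own column or table.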

Benchmarking Framework

| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| ragas | 0.4.3 | RAG evaluation metrics | Industry standard (65% Fortune 500 adoption). LLM-as-judge, 92% human-aligned faithfulness scoring | HIGH |
| pandas | latest | Data manipulation | Standard for tabular data, comparison matrices | HIGH |
| numpy | latest | Numerical operations | Required for embedding operations, cosine similarity | HIGH |
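
To make the comparison concrete, here is a minimal sketch of the two calculations a benchmark of this kind typically needs: cosine similarity between embeddings, and a hit rate at k over a labelled query set. It uses only the standard library for portability (numpy would be the natural choice at scale). The function names and the toy data are assumptions for illustration, not part of ragas or any other library listed above:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Return the ids of the k corpus vectors most similar to `query`."""
    ranked = sorted(corpus, key=lambda item_id: -cosine_similarity(query, corpus[item_id]))
    return ranked[:k]


def hit_rate_at_k(ranked_lists, expected_ids, k):
    """Fraction of queries whose expected id appears in the top k results."""
    if not expected_ids:
        return 0.0
    hits = sum(1 for ranked, expected in zip(ranked_lists, expected_ids) if expected in ranked[:k])
    return hits / len(expected_ids)


# Toy 2-dimensional corpus keyed by hypothetical line-item ids.
corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.1], corpus, 2)
```

The same `hit_rate_at_k` can score both the pgvector results and the LLM-based matcher, as long as each produces a ranked list of ids per query.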

Interactive Dashboard

| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| streamlit | 1.54.0 | Interactive HTML dashboard | Best for quick data science apps. Simpler than Dash, more capable than Gradio for benchmarking UIs | HIGH |
| plotly | latest | Interactive visualizations | First-class Streamlit integration, interactive charts for benchmark comparisons | HIGH |

Development Tools

| Tool | Purpose | Notes |
|---|---|---|
| uv | Python package management | Faster than pip, better dependency resolution. Use uv pip install |
| pytest | Test framework | For embedding quality tests, regression detection |
| python-dotenv | Environment variables | Load API keys from .env files |

Docker Setup

version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:0.8.1-pg18
    environment:
      POSTGRES_USER: semantic
      POSTGRES_PASSWORD: semantic
      POSTGRES_DB: semantic_search
    ports:
      - "5432:5432"
    volumes:
      # IMPORTANT: Postgres 18 changed PGDATA location
      - pgdata:/var/lib/postgresql
    shm_size: 256mb  # Required for parallel HNSW index builds

volumes:
  pgdata:
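
A minimal sketch of connecting to the container defined above from Python. The defaults mirror the credentials, port and database name in the compose file; `build_conninfo` and `connect` are hypothetical helper names, and `localhost` assumes the client runs on the Docker host:

```python
from urllib.parse import quote


def build_conninfo(user="semantic", password="semantic", host="localhost",
                   port=5432, dbname="semantic_search"):
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


def connect():
    """Open a connection with psycopg (v3)."""
    # Imported lazily so build_conninfo() stays usable without the driver installed.
    import psycopg

    return psycopg.connect(build_conninfo())


conninfo = build_conninfo()
```

Percent-encoding matters once the password contains characters such as `@` or `/`, which would otherwise break the URI.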

pgvector Index Configuration

-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE line_items (
    id SERIAL PRIMARY KEY,
    supplier_name TEXT,
    description TEXT,
    embedding vector(768),  -- Adjust dimensions per model
    debit_account TEXT,
    credit_account TEXT,
    cost_center TEXT
);

-- HNSW index (faster queries, slower builds)
CREATE INDEX ON line_items USING hnsw (embedding vector_cosine_ops);

-- Alternative: IVFFlat (faster builds, good for prototyping)
-- CREATE INDEX ON line_items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
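
A minimal sketch of querying the table above for nearest neighbours from Python. It relies on pgvector's `<=>` cosine-distance operator, which matches the `vector_cosine_ops` index, and passes the query embedding as a text literal cast to `vector`. The helper names are hypothetical, and `conn` is assumed to be an open psycopg (v3) connection:

```python
def to_vector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Ordering by the distance expression with a LIMIT lets the planner use the HNSW index.
NEAREST_SQL = """
SELECT id, supplier_name, description,
       embedding <=> %s::vector AS cosine_distance
FROM line_items
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def nearest_line_items(conn, query_embedding, limit=5):
    """Return the `limit` rows closest to `query_embedding` by cosine distance."""
    literal = to_vector_literal(query_embedding)
    return conn.execute(NEAREST_SQL, (literal, literal, limit)).fetchall()


example_literal = to_vector_literal([1, 2.5, -0.25])
```

The query embedding must have the same number of dimensions as the column (768 in the table above), otherwise PostgreSQL rejects the comparison.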

Installation

# Core dependencies
# Quote the extras spec so shells such as zsh do not expand the brackets
pip install "psycopg[binary]" pgvector google-genai sentence-transformers

# Benchmarking
pip install ragas pandas numpy

# Dashboard
pip install streamlit plotly

# Development
pip install python-dotenv pytest

# Full install (single command)
pip install "psycopg[binary]" pgvector google-genai sentence-transformers ragas pandas numpy streamlit plotly python-dotenv pytest

Requirements.txt

# Database
psycopg[binary]>=3.3.0
pgvector>=0.4.2

# Embeddings
google-genai>=1.64.0
sentence-transformers>=5.2.0

# Benchmarking
ragas>=0.4.0
pandas>=2.0.0
numpy>=1.24.0

# Dashboard
streamlit>=1.54.0
plotly>=5.0.0

# Development
python-dotenv>=1.0.0
pytest>=8.0.0

Alternatives Considered

| Recommended | Alternative | Why Not Use Alternative |
|---|---|---|
| google-genai | google-generativeai | DEPRECATED since Nov 2025. google-genai is the new unified SDK |
| google-genai | google-cloud-aiplatform | vertexai modules deprecated June 2025, removed June 2026. Migrate to google-genai |
| psycopg (v3) | psycopg2 | psycopg3 is async-capable, better performance. pgvector Python package supports both but v3 recommended |
| streamlit | Gradio | Gradio optimized for ML demos, not data dashboards. Streamlit better for benchmark comparisons |
| streamlit | Dash | Dash more powerful but steeper learning curve, requires HTML/CSS. Overkill for spike |
| all-MiniLM-L6-v2 | all-mpnet-base-v2 | MiniLM is roughly 80MB on disk (about 22M parameters) vs 420MB, about 5x faster. Acceptable quality tradeoff for local baseline |
| jina-embeddings-v3 | jina-embeddings-v4 | v4 is 3.8B params (roughly 7x larger than v3's 570M), multimodal. v3 sufficient for text-only line item matching |

What NOT to Use

| Avoid | Why | Use Instead |
|---|---|---|
| google-generativeai | Deprecated Nov 2025, limited maintenance only | google-genai |
| vertexai.language_models | Deprecated June 2025, removed June 2026 | google-genai with vertexai=True |
| psycopg2-binary | Legacy, no async support | psycopg[binary] (v3) |
| text-embedding-004 | Deprecated Aug 2025 | gemini-embedding-001 |
| embedding-001 (old Gemini) | Already deprecated | gemini-embedding-001 |
| Gradio for dashboards | Designed for ML demos, not data apps | streamlit |
| Flask/Django | Overkill for benchmark dashboard, require frontend work | streamlit |

Stack Patterns by Variant

For API-based embeddings (production-like):

For local embeddings (baseline comparison):

For LLM-based matching (Orcha replication):

Version Compatibility Matrix

| Package | Compatible With | Notes |
|---|---|---|
| pgvector 0.8.1 | Postgres 13+ | Postgres 17.0-17.2 has linking bug, use 17.3+ or 18 |
| pgvector (Python) 0.4.2 | psycopg 3.x, psycopg2 | Both supported, psycopg3 recommended |
| google-genai 1.64.0 | Python 3.10+ | Does NOT support Python 3.9 |
| sentence-transformers 5.2.3 | Python 3.10+ | Dropped Python 3.9 support |
| streamlit 1.54.0 | Python 3.10+ | Dropped Python 3.9 support |
| ragas 0.4.3 | Python 3.9+ | Still supports 3.9 |

Recommended Python: 3.12 - Stable, supported by all libraries, good performance.

Embedding Model Selection Guide

| Scenario | Model | Rationale |
|---|---|---|
| Multilingual production | gemini-embedding-001 | Best MTEB multilingual scores, 100+ languages |
| German invoices specifically | text-multilingual-embedding-002 | Good German support, lower cost than gemini |
| Cost-sensitive/offline | all-MiniLM-L6-v2 | Free, local, fast. Baseline comparison |
| Long descriptions (>256 tokens) | jina-embeddings-v3 | 8192 token context, late chunking support |
| Best retrieval quality | jina-embeddings-v3 | Task-specific adapters, Matryoshka embeddings |
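
The "long descriptions" row comes down to each model's input limit. A minimal sketch that filters candidate models by token count, using the max-token figures from the Embedding Models Supported table. The function name is hypothetical, and the token count is assumed to come from whatever tokenizer the caller uses:

```python
# Max input tokens per model, as listed in the Embedding Models Supported table.
MAX_TOKENS = {
    "gemini-embedding-001": 2048,
    "text-multilingual-embedding-002": 2048,
    "jina-embeddings-v3": 8192,
    "all-MiniLM-L6-v2": 256,
}


def models_fitting(token_count):
    """Return the models whose input limit can hold `token_count` tokens."""
    return sorted(name for name, limit in MAX_TOKENS.items() if token_count <= limit)


candidates_for_long_text = models_fitting(300)
```

Inputs above a model's limit are typically truncated rather than rejected, so a check like this helps avoid silently comparing models on different amounts of text.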

API Key Requirements

| Service | Environment Variable | Notes |
|---|---|---|
| Google AI (Gemini API) | GOOGLE_API_KEY | Free tier available via AI Studio |
| Google Cloud (Vertex AI) | GOOGLE_APPLICATION_CREDENTIALS | Service account JSON, existing Orcha credentials |
| Jina AI | JINA_API_KEY | Free tier with generous limits |
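
A minimal sketch of checking these variables before a benchmark run, so that a missing key is reported up front rather than midway through an embedding job. `missing_credentials` is a hypothetical helper; in a real run the environment would first be populated from a .env file, for example with python-dotenv's `load_dotenv()`:

```python
import os

# Environment variables from the table above, keyed by the service they unlock.
CREDENTIAL_VARS = {
    "Google AI (Gemini API)": "GOOGLE_API_KEY",
    "Google Cloud (Vertex AI)": "GOOGLE_APPLICATION_CREDENTIALS",
    "Jina AI": "JINA_API_KEY",
}


def missing_credentials(env=None):
    """Return the services whose environment variable is unset or empty."""
    env = os.environ if env is None else env
    return [service for service, var in CREDENTIAL_VARS.items() if not env.get(var)]


# Example with an explicit mapping instead of the real environment.
example_missing = missing_credentials({"GOOGLE_API_KEY": "placeholder-value"})
```

Services reported as missing can simply be skipped, which keeps the local all-MiniLM-L6-v2 baseline runnable without any keys.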
