Stack Research
Domain: Semantic Search Comparison Benchmark (pgvector vs LLM-based matching)
Researched: 2026-02-20
Confidence: HIGH
Recommended Stack
Core Technologies
| Technology | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| PostgreSQL | 18.2 | Database with vector search | Latest stable release. pgvector 0.8.1 supports Postgres 13+, so 18 is a choice rather than a requirement. The official v18 Docker image moved PGDATA to a version-specific subdirectory (/var/lib/postgresql/18/docker), so mount the volume at /var/lib/postgresql, not /var/lib/postgresql/data | HIGH |
| pgvector | 0.8.1 | Vector similarity search extension | Latest stable release. HNSW and IVFFlat indexes, halfvec (half-precision) storage, binary quantization. Indexes support up to 2,000 dimensions on vector columns, 4,000 on halfvec and 64,000 on bit | HIGH |
| Python | 3.12 | Embedding generation, benchmarking | Sweet spot: supported by all required libraries (google-genai, sentence-transformers, streamlit) | HIGH |
| Docker | latest | Container runtime | Official pgvector/pgvector image available with pg18 tag | HIGH |
Database Layer
| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| pgvector (Python) | 0.4.2 | Python bindings for pgvector | Official pgvector Python package; supports psycopg 3 (as well as psycopg2, asyncpg, SQLAlchemy and Django) | HIGH |
| psycopg | 3.3.3 | PostgreSQL adapter | Modern async-capable driver, successor to psycopg2. Recommended (not required) driver for the pgvector Python package | HIGH |
Embedding Libraries
| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| google-genai | 1.64.0 | Google embeddings (gemini-embedding-001, text-multilingual-embedding-002) | NEW unified SDK. Replaces the deprecated google-generativeai. Works with both the Gemini API and Vertex AI | HIGH |
| sentence-transformers | 5.2.3 | Local embeddings (all-MiniLM-L6-v2), Jina models | De facto standard for local embeddings. Can run Jina v3/v4 with trust_remote_code=True | HIGH |
| transformers | latest | Backend for sentence-transformers | Required dependency; handles model loading | HIGH |
Embedding Models Supported
| Model | Dimensions | Max Input Tokens | Access | Use Case |
|---|---|---|---|---|
| gemini-embedding-001 | 3072 (default); reducible to 1536 or 768 via output_dimensionality | 2048 | Google API | Production multilingual; RECOMMENDED over text-multilingual-embedding-002 |
| text-multilingual-embedding-002 | 768 | 2048 | Vertex AI | Legacy multilingual; still supported, but gemini-embedding-001 outperforms it |
| jina-embeddings-v3 | 1024 (default); Matryoshka truncation to 32-1024 | 8192 | Jina API / SentenceTransformers | Multilingual retrieval, 570M params, 94 languages |
| all-MiniLM-L6-v2 | 384 | 256 | Local (free) | Fast local baseline; 22M params (about 90MB on disk), no API costs |
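gemini-embedding-001 and jina-embeddings-v3 are both trained Matryoshka-style, so their leading dimensions carry most of the information and can be used as a shorter embedding. Google's Gemini API documentation notes that only the full 3072-dimension gemini-embedding-001 output is normalized; reduced sizes such as 768 should be re-normalized before cosine or inner-product search. A minimal numpy sketch, assuming embeddings arrive as lists of floats; truncate_and_normalize is a hypothetical helper, not part of any SDK.

```python
import numpy as np


def truncate_and_normalize(embedding, dims):
    """Keep the leading `dims` components of a Matryoshka-style embedding, rescaled to unit L2 norm."""
    vec = np.asarray(embedding, dtype=np.float32)
    if not 0 < dims <= vec.shape[-1]:
        raise ValueError(f"dims must be in 1..{vec.shape[-1]}, got {dims}")
    truncated = vec[..., :dims]  # works for a single vector or a (batch, dims) matrix
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    # Leave all-zero vectors unchanged instead of dividing by zero.
    return truncated / np.where(norms == 0, 1.0, norms)
```

With google-genai it is usually simpler to request the smaller size directly (types.EmbedContentConfig(output_dimensionality=768)) and still pass the result through the normalization step.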
Benchmarking Framework
| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| ragas | 0.4.3 | RAG evaluation metrics | Widely used RAG evaluation library with LLM-as-judge metrics (reported figures: 65% Fortune 500 adoption, 92% agreement with human faithfulness ratings) | HIGH |
| pandas | latest | Data manipulation | Standard for tabular data and comparison matrices | HIGH |
| numpy | latest | Numerical operations | Required for embedding operations and cosine similarity | HIGH |
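numpy covers the cosine-similarity role listed above, and retrieval quality for the comparison matrix can be summarized with standard metrics such as recall@k and mean reciprocal rank (MRR). The sketch below does exact brute-force search, which is also useful as a ground-truth check against pgvector's approximate HNSW results. The function names, and the assumption that "relevant" ids are historical line items booked to the same accounts, are illustrative; these are not ragas APIs.

```python
import numpy as np


def cosine_top_k(query, corpus, k=5):
    """Indices of the k corpus rows most cosine-similar to `query`, best first (exact search)."""
    q = np.asarray(query, dtype=np.float64)
    c = np.asarray(corpus, dtype=np.float64)
    # Cosine similarity = dot product divided by the product of L2 norms.
    sims = (c @ q) / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims, kind="stable")[:k]


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids that appear among the first k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant id, or 0.0 if none is retrieved; average over queries for MRR."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging reciprocal_rank over all benchmark queries gives MRR, and the per-model averages drop straight into a pandas comparison table.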
Interactive Dashboard
| Library | Version | Purpose | Why Recommended | Confidence |
|---|---|---|---|---|
| streamlit | 1.54.0 | Interactive web dashboard | Best for quick data science apps. Simpler than Dash, more capable than Gradio for benchmarking UIs | HIGH |
| plotly | latest | Interactive visualizations | First-class Streamlit integration (st.plotly_chart); interactive charts for benchmark comparisons | HIGH |
Development Tools
| Tool | Purpose | Notes |
|---|---|---|
| uv | Python package management | Faster than pip, better dependency resolution. Use uv pip install |
| pytest | Test framework | For embedding quality tests and regression detection |
| python-dotenv | Environment variables | Loads API keys from .env files |
Docker Setup
Recommended Docker Compose
```yaml
services:
  postgres:
    image: pgvector/pgvector:0.8.1-pg18
    environment:
      POSTGRES_USER: semantic
      POSTGRES_PASSWORD: semantic
      POSTGRES_DB: semantic_search
    ports:
      - "5432:5432"
    volumes:
      # IMPORTANT: Postgres 18 images keep PGDATA in a version-specific
      # subdirectory (/var/lib/postgresql/18/docker), so mount the parent.
      - pgdata:/var/lib/postgresql
    shm_size: 256mb  # Docker's 64MB default /dev/shm can be too small for parallel HNSW index builds

volumes:
  pgdata:
```
pgvector Index Configuration
```sql
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE line_items (
    id SERIAL PRIMARY KEY,
    supplier_name TEXT,
    description TEXT,
    embedding vector(768),  -- Adjust dimensions per model; indexed vector columns are limited to 2,000 dims,
                            -- so 3072-d gemini-embedding-001 output needs halfvec(3072) + halfvec_cosine_ops
    debit_account TEXT,
    credit_account TEXT,
    cost_center TEXT
);

-- HNSW index (faster queries, slower builds)
CREATE INDEX ON line_items USING hnsw (embedding vector_cosine_ops);

-- Alternative: IVFFlat (faster builds, good for prototyping; create it after loading data)
-- CREATE INDEX ON line_items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
Installation
```bash
# Core dependencies (extras are quoted so shells like zsh don't treat [] as a glob)
pip install "psycopg[binary]" pgvector google-genai sentence-transformers

# Benchmarking
pip install ragas pandas numpy

# Dashboard
pip install streamlit plotly

# Development
pip install python-dotenv pytest

# Full install (single command); with uv, run "uv pip install" with the same arguments
pip install "psycopg[binary]" pgvector google-genai sentence-transformers ragas pandas numpy streamlit plotly python-dotenv pytest
```
requirements.txt
```text
# Database
psycopg[binary]>=3.3.0
pgvector>=0.4.2

# Embeddings
google-genai>=1.64.0
sentence-transformers>=5.2.0

# Benchmarking
ragas>=0.4.0
pandas>=2.0.0
numpy>=1.24.0

# Dashboard
streamlit>=1.54.0
plotly>=5.0.0

# Development
python-dotenv>=1.0.0
pytest>=8.0.0
```
Alternatives Considered
| Recommended | Alternative | Why Not Use Alternative |
|---|---|---|
| google-genai | google-generativeai | Deprecated since Nov 2025; google-genai is the new unified SDK |
| google-genai | google-cloud-aiplatform | vertexai generative modules deprecated June 2025, removal scheduled for June 2026. Migrate to google-genai |
| psycopg (v3) | psycopg2 | psycopg 3 is async-capable with better performance. The pgvector Python package supports both, but v3 is recommended |
| streamlit | Gradio | Gradio is optimized for ML demos, not data dashboards; Streamlit is better for benchmark comparisons |
| streamlit | Dash | Dash is more powerful but has a steeper learning curve (layouts are built from HTML components and callbacks). Overkill for a spike |
| all-MiniLM-L6-v2 | all-mpnet-base-v2 | MiniLM is about 90MB (22M params) vs about 420MB (110M params) and roughly 5x faster. Acceptable quality tradeoff for a local baseline |
| jina-embeddings-v3 | jina-embeddings-v4 | v4 has 3.8B params (about 7x larger than v3's 570M) and is multimodal. v3 is sufficient for text-only line item matching |
What NOT to Use
| Avoid | Why | Use Instead |
|---|---|---|
| google-generativeai | Deprecated Nov 2025; limited maintenance only | google-genai |
| vertexai.language_models | Deprecated June 2025, removal scheduled for June 2026 | google-genai with vertexai=True |
| psycopg2-binary | Legacy, no async support | psycopg[binary] (v3) |
| text-embedding-004 | Deprecated Aug 2025 | gemini-embedding-001 |
| embedding-001 (old Gemini) | Already deprecated | gemini-embedding-001 |
| Gradio for dashboards | Designed for ML demos, not data apps | streamlit |
| Flask/Django | Overkill for a benchmark dashboard; requires frontend work | streamlit |
Stack Patterns by Variant
For API-based embeddings (production-like):
- Use google-genai for Google models
- Use Jina API directly via requests (no official Python SDK needed)
- Measure API latency and cost per query
For local embeddings (baseline comparison):
- Use sentence-transformers with all-MiniLM-L6-v2
- No API costs, predictable latency
- Can also run Jina v3 locally via sentence-transformers
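Both the API-based and local variants call for latency measurements. A backend-agnostic sketch: measure_latency times any callable that embeds a list of strings and reports median and 95th-percentile (nearest-rank) latency, excluding warm-up calls; minilm_embedder shows one such callable for the local baseline. Both helpers are hypothetical; an API embedder (google-genai or the Jina HTTP API) would be wrapped in the same one-argument signature.

```python
import math
import statistics
import time


def measure_latency(embed_fn, batches, warmup=1):
    """Time embed_fn on each batch of texts; return latency stats in milliseconds."""
    if not batches:
        raise ValueError("need at least one batch")
    for batch in batches[:warmup]:
        embed_fn(batch)  # untimed warm-up: lazy initialization, HTTP connection setup, caches
    timings = []
    for batch in batches:
        start = time.perf_counter()
        embed_fn(batch)
        timings.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(timings)
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]  # nearest-rank percentile
    return {
        "n": len(ordered),
        "p50_ms": statistics.median(ordered),
        "p95_ms": p95,
        "mean_ms": statistics.fmean(ordered),
    }


def minilm_embedder():
    """Local baseline: returns a callable that embeds texts with all-MiniLM-L6-v2."""
    # Requires sentence-transformers; the model is downloaded on first use.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return lambda texts: model.encode(texts, normalize_embeddings=True)
```

For example, measure_latency(minilm_embedder(), batches) for the local model. Reporting p95 alongside the median matters most for API backends, where occasional slow requests dominate perceived latency.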
For LLM-based matching (Orcha replication):
- Use google-genai with Gemini 2.5 Flash
- Pass 50 historical bookings as CSV context
- Measure token usage for cost comparison
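The LLM-based variant replaces vector search with in-context examples. A minimal sketch of that flow, assuming bookings are dicts with the same fields as the line_items table; the prompt wording, BOOKING_FIELDS and both function names are hypothetical and do not reproduce Orcha's actual prompt. genai.Client, client.models.generate_content and the usage_metadata token counts are real google-genai APIs, and the token counts are what the cost comparison needs.

```python
import csv
import io

# Hypothetical field names mirroring the line_items table; adapt them to the real booking export.
BOOKING_FIELDS = ["supplier_name", "description", "debit_account", "credit_account", "cost_center"]


def build_matching_prompt(history, new_item, max_examples=50):
    """Render up to max_examples historical bookings as CSV, followed by the line item to classify."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=BOOKING_FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(history[:max_examples])  # extra keys in a booking are ignored
    return (
        "Historical bookings (CSV):\n"
        f"{buf.getvalue()}\n"
        "Using the historical bookings above, assign debit_account, credit_account "
        "and cost_center to this line item.\n"
        f"supplier_name: {new_item['supplier_name']}\n"
        f"description: {new_item['description']}\n"
    )


def match_with_gemini(prompt, model="gemini-2.5-flash"):
    """Send the prompt to Gemini; return (answer text, prompt tokens, output tokens) for cost tracking."""
    from google import genai  # lazy import: the prompt builder itself needs only the stdlib

    client = genai.Client()  # picks up GOOGLE_API_KEY (or GEMINI_API_KEY) from the environment
    response = client.models.generate_content(model=model, contents=prompt)
    usage = response.usage_metadata
    return response.text, usage.prompt_token_count, usage.candidates_token_count
```

Because the 50 examples are resent with every query, prompt tokens dominate the per-query cost of this approach, which is the figure to compare against the one-off embedding cost of the pgvector variant.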
Version Compatibility Matrix
| Package | Compatible With | Notes |
|---|---|---|
| pgvector 0.8.1 | Postgres 13+ | Postgres 17.0-17.2 has a linking bug; use 17.3+ or 18 |
| pgvector (Python) 0.4.2 | psycopg 3.x, psycopg2 | Both supported; psycopg 3 recommended |
| google-genai 1.64.0 | Python 3.10+ | Does NOT support Python 3.9 |
| sentence-transformers 5.2.3 | Python 3.10+ | Dropped Python 3.9 support |
| streamlit 1.54.0 | Python 3.10+ | Dropped Python 3.9 support |
| ragas 0.4.3 | Python 3.9+ | Still supports 3.9 |
Recommended Python: 3.12 - Stable, supported by all libraries, good performance.
Embedding Model Selection Guide
| Scenario | Model | Rationale |
|---|---|---|
| Multilingual production | gemini-embedding-001 | Top MTEB multilingual scores at release, 100+ languages |
| German invoices specifically | text-multilingual-embedding-002 | Good German support, lower cost than gemini |
| Cost-sensitive/offline | all-MiniLM-L6-v2 | Free, local, fast; baseline comparison. Trained on English data, so expect weaker results on German text |
| Long descriptions (>256 tokens) | jina-embeddings-v3 | 8192-token context, late chunking support |
| Best retrieval quality | jina-embeddings-v3 | Task-specific LoRA adapters (e.g. retrieval.query, retrieval.passage), Matryoshka embeddings |
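The "Long descriptions (>256 tokens)" row reflects a practical trap: sentence-transformers silently truncates inputs longer than a model's max sequence length instead of raising an error, so all-MiniLM-L6-v2 would embed only the first 256 tokens. A small sketch that filters the models from the table by context window; each model uses its own tokenizer, so the caller-supplied token count is treated as an approximation, and EmbeddingModel, MODELS and candidate_models are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingModel:
    name: str
    dims: int        # default output dimensions
    max_tokens: int  # maximum input length in tokens


# Values taken from the "Embedding Models Supported" table.
MODELS = [
    EmbeddingModel("gemini-embedding-001", 3072, 2048),
    EmbeddingModel("text-multilingual-embedding-002", 768, 2048),
    EmbeddingModel("jina-embeddings-v3", 1024, 8192),
    EmbeddingModel("all-MiniLM-L6-v2", 384, 256),
]


def candidate_models(longest_input_tokens):
    """Names of models whose context window covers the longest input (approximate token count)."""
    return [m.name for m in MODELS if m.max_tokens >= longest_input_tokens]
```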
API Key Requirements
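| Service | Environment Variable | Notes |
|---|---|---|
| Google AI (Gemini API) | GOOGLE_API_KEY | Free tier available via AI Studio. google-genai also accepts GEMINI_API_KEY |
| Google Cloud (Vertex AI) | GOOGLE_APPLICATION_CREDENTIALS | Service account JSON (existing Orcha credentials). For google-genai, also set GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION and GOOGLE_GENAI_USE_VERTEXAI=true, or pass vertexai=True, project and location to genai.Client |
| Jina AI | JINA_API_KEY | Free tier with generous limits |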
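python-dotenv is listed for loading these keys. A minimal sketch that loads a local .env (when python-dotenv is installed) and reports which variables are still missing for the backends a run needs; REQUIRED_ENV, load_env_file and missing_keys are hypothetical names. The Vertex AI entry assumes project and location are supplied through GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION, which google-genai reads when they are not passed to genai.Client directly.

```python
import os

# Variables each backend needs. The Vertex AI project/location entries are an assumption:
# they apply when project and location are not passed to genai.Client(...) directly.
REQUIRED_ENV = {
    "gemini_api": ["GOOGLE_API_KEY"],
    "vertex_ai": ["GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION"],
    "jina_api": ["JINA_API_KEY"],
}


def load_env_file():
    """Load .env into os.environ when python-dotenv is installed; already-set variables are kept."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    return load_dotenv()


def missing_keys(backends, env=None):
    """Sorted names of required variables that are unset or empty for the given backends."""
    env = os.environ if env is None else env
    return sorted({name for b in backends for name in REQUIRED_ENV[b] if not env.get(name)})
```

Calling load_env_file() and then missing_keys(["gemini_api", "jina_api"]) at benchmark start-up fails fast instead of partway through a run.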