Purpose: EVAL-01 requires clean train/test separation before any embedding work to prevent data leakage. INFRA-07 requires model metadata storage. Both must be done before embedding generation begins.
Output: Database schema additions (is_test_set column, embedding_model_config table), stratified split applied to existing 6078 line items, Python module for split operations.
<execution_context> @./.claude/get-shit-done/workflows/execute-plan.md @./.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/phases/01-foundation/01-02-SUMMARY.md
@src/db.py @init.sql
is_test_set column to line_item table:ALTER TABLE line_item ADD COLUMN IF NOT EXISTS is_test_set BOOLEAN DEFAULT FALSE;
CREATE TABLE IF NOT EXISTS embedding_model_config (
id SERIAL PRIMARY KEY,
model_name TEXT NOT NULL UNIQUE,
column_name TEXT NOT NULL,
dimensions INT NOT NULL,
distance_metric TEXT NOT NULL,
task_type TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
INSERT INTO embedding_model_config (model_name, column_name, dimensions, distance_metric, task_type) VALUES
('text-multilingual-embedding-002', 'embedding_google', 768, 'cosine', 'RETRIEVAL_DOCUMENT'),
('jina-embeddings-v3', 'embedding_jina', 1024, 'cosine', 'retrieval.passage'),
('all-MiniLM-L6-v2', 'embedding_minilm', 384, 'cosine', NULL)
ON CONFLICT (model_name) DO NOTHING;
Place these statements AFTER the CREATE TABLE line_item statement but BEFORE any index creation comments.
Also create a migration script migrations/001_train_test_split.sql with just the ALTER TABLE and CREATE TABLE statements for running against existing database.
docker compose exec -T postgres psql -U dev -d dev -c "\d line_item" and verify is_test_set column exists.
Run docker compose exec -T postgres psql -U dev -d dev -c "SELECT * FROM embedding_model_config" and verify 3 rows.
src/evaluation/__init__.py:"""Evaluation utilities for semantic search comparison."""
from .train_test_split import create_train_test_split
__all__ = ["create_train_test_split"]
src/evaluation/train_test_split.py:"""Train/test split with stratification by debit_account."""
import pandas as pd
from sklearn.model_selection import train_test_split as sklearn_split
def create_train_test_split(
conn,
test_size: float = 0.20,
random_state: int = 42
) -> tuple[int, int]:
"""
Create stratified train/test split by debit_account.
Marks ~20% of line items as test set with proportional
representation of each debit account.
Args:
conn: psycopg connection
test_size: Fraction of data for test set (default 0.20)
random_state: Random seed for reproducibility (default 42)
Returns:
Tuple of (train_count, test_count)
"""
# Fetch all IDs and their debit_accounts
df = pd.read_sql(
"SELECT id, debit_account FROM line_item",
conn
)
# Stratified split - ensures proportional debit_account distribution
train_ids, test_ids = sklearn_split(
df['id'],
test_size=test_size,
random_state=random_state,
stratify=df['debit_account']
)
# Update database
with conn.cursor() as cur:
# Reset all to train
cur.execute("UPDATE line_item SET is_test_set = FALSE")
# Mark test set
cur.execute(
"UPDATE line_item SET is_test_set = TRUE WHERE id = ANY(%s)",
(list(test_ids),)
)
conn.commit()
return len(train_ids), len(test_ids)
def get_split_stats(conn) -> dict:
"""Get current train/test split statistics."""
with conn.cursor() as cur:
cur.execute("""
SELECT
is_test_set,
COUNT(*) as count,
COUNT(DISTINCT debit_account) as unique_accounts
FROM line_item
GROUP BY is_test_set
ORDER BY is_test_set
""")
rows = cur.fetchall()
return {
"train": {"count": rows[0][1], "unique_accounts": rows[0][2]} if rows else {},
"test": {"count": rows[1][1], "unique_accounts": rows[1][2]} if len(rows) > 1 else {}
}
Use sklearn's train_test_split with stratify parameter for balanced evaluation.
uv run python -c "from src.evaluation import create_train_test_split; print('OK')" - should print OK.
docker compose exec -T postgres psql -U dev -d dev < migrations/001_train_test_split.sql
from src.db import get_connection
from src.evaluation import create_train_test_split, get_split_stats
conn = get_connection()
train_count, test_count = create_train_test_split(conn)
print(f"Split complete: {train_count} train, {test_count} test ({test_count/(train_count+test_count)*100:.1f}%)")
stats = get_split_stats(conn)
print(f"Train: {stats['train']['count']} items, {stats['train']['unique_accounts']} unique accounts")
print(f"Test: {stats['test']['count']} items, {stats['test']['unique_accounts']} unique accounts")
conn.close()
Expected output: ~4862 train (80%), ~1216 test (20%)
docker compose exec -T postgres psql -U dev -d dev -c "SELECT is_test_set, COUNT(*) FROM line_item GROUP BY is_test_set"
Expected:
<success_criteria>