Benchmark - Semantic Search

Benchmark Dashboard

Run Benchmark
Max: {{ total_queries }}
{% if total_queries > 100 %}
{% endif %}
{% if error %}
{{ error }}
{% endif %} {% if results %}
{{ results.google.total_queries }}
Total Queries
{% set total_time = (results.google.latency_mean_ms + results.jina.latency_mean_ms + results.minilm.latency_mean_ms + results.llm.latency_mean_ms) * results.google.total_queries / 1000 %}
{{ "%.1f"|format(total_time) }}s
Total Time (est.)
{{ "%.1f"|format(results.google.total_queries / total_queries * 100) }}%
Test Set Coverage
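The two summary figures are plain arithmetic. Total Time (est.) adds up every model's mean latency and multiplies by the query count, which in effect treats the four models as being queried one after another; Test Set Coverage divides the queries run by the test-set size. The sketch below restates both in Python, assuming each model's metrics are a dict keyed by the same names the template reads (`latency_mean_ms`, `total_queries`). The helper names `estimate_total_time_s` and `coverage_pct` are hypothetical, and the input numbers are made up for illustration.

```python
# Hypothetical helpers mirroring the two summary cards. `results` is assumed to
# map each model key to a dict keyed like the template's attributes.
MODELS = ["google", "jina", "minilm", "llm"]


def estimate_total_time_s(results: dict) -> float:
    """Sum of per-model mean latencies (ms) times the query count, in seconds."""
    summed_mean_ms = sum(results[m]["latency_mean_ms"] for m in MODELS)
    # Like the template, read the query count from the "google" entry.
    n_queries = results["google"]["total_queries"]
    return summed_mean_ms * n_queries / 1000


def coverage_pct(queries_run: int, test_set_size: int) -> float:
    """Share of the test set covered by this run, as a percentage."""
    if test_set_size == 0:
        return 0.0  # guard added in this sketch; the template divides directly
    return queries_run / test_set_size * 100


# Made-up inputs for illustration only (not benchmark results).
results = {
    "google": {"latency_mean_ms": 120.0, "total_queries": 50},
    "jina": {"latency_mean_ms": 80.0, "total_queries": 50},
    "minilm": {"latency_mean_ms": 10.0, "total_queries": 50},
    "llm": {"latency_mean_ms": 790.0, "total_queries": 50},
}
print(f"{estimate_total_time_s(results):.1f}s")  # 50.0s
print(f"{coverage_pct(50, 200):.1f}%")  # 25.0%
```

Because the estimate sums sequential latencies, it will overstate wall-clock time if the benchmark ever issues requests to the models concurrently.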
Export Report

Generate a self-contained HTML report with a confusion matrix, curated examples, and aggregate metrics. The report can be shared and viewed offline.

This will re-run the benchmark with {{ limit }} queries to collect detailed results.
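A confusion matrix counts, for each expected label, how often each label was predicted; its diagonal holds the exact matches. The following is a generic Python illustration of that idea, not the report generator's own code: the `(expected, predicted)` pair input and the helper names `confusion_matrix` and `exact_match_rate` are hypothetical.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count (expected, predicted) label pairs into a square matrix.

    Rows are expected labels and columns are predicted labels, both in sorted order.
    """
    counts = Counter(pairs)
    labels = sorted({label for pair in counts for label in pair})
    matrix = [[counts[(exp, pred)] for pred in labels] for exp in labels]
    return labels, matrix


def exact_match_rate(matrix):
    """Fraction of queries on the diagonal, i.e. predicted == expected."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Illustrative labels only.
labels, matrix = confusion_matrix([("A", "A"), ("A", "B"), ("B", "B"), ("C", "C")])
print(labels)  # ['A', 'B', 'C']
print(matrix)  # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
print(exact_match_rate(matrix))  # 0.75
```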
Benchmark Results
Model | GL Accuracy | CC Accuracy | Top-3 | Top-5 | Top-10 | Latency (p50) | Latency (p95) | Cost (USD)
{% set models = ['google', 'jina', 'minilm', 'llm'] %}
{% set model_names = {'google': 'Google (768d)', 'jina': 'Jina (1024d)', 'minilm': 'MiniLM (384d)', 'llm': 'LLM (Gemini)'} %}
{# Find best accuracies and worst latencies #}
{% set best_gl = results|dictsort|map(attribute='1.exact_match_gl')|max %}
{% set best_cc = results|dictsort|map(attribute='1.exact_match_cc')|max %}
{% set best_top3 = results|dictsort|map(attribute='1.top_3_accuracy')|max %}
{% set best_top5 = results|dictsort|map(attribute='1.top_5_accuracy')|max %}
{% set best_top10 = results|dictsort|map(attribute='1.top_10_accuracy')|max %}
{% set worst_p50 = results|dictsort|map(attribute='1.latency_p50_ms')|max %}
{% set worst_p95 = results|dictsort|map(attribute='1.latency_p95_ms')|max %}
{% for model in models %}
{% set r = results[model] %}
{# Check if LLM was skipped (no API calls = N/A) #}
{% set llm_skipped = (model == 'llm' and r.latency_p50_ms == 0 and r.exact_match_gl == 0) %}
{{ model_names[model] }} |
{% if llm_skipped %}
N/A - GOOGLE_API_KEY not set
{% else %}
{{ "%.1f"|format(r.exact_match_gl * 100) }}% | {% if model == 'llm' %}{{ "%.1f"|format(r.exact_match_cc * 100) }}%{% else %}N/A{% endif %} | {% if model == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_3_accuracy * 100) }}%{% endif %} | {% if model == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_5_accuracy * 100) }}%{% endif %} | {% if model == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_10_accuracy * 100) }}%{% endif %} | {{ "%.0f"|format(r.latency_p50_ms) }}ms | {{ "%.0f"|format(r.latency_p95_ms) }}ms | ${{ "%.4f"|format(r.total_cost_usd) }}
{% endif %}
{% endfor %}
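The table logic can be restated in plain Python for reference: the per-column maxima that the `dictsort|map(attribute=...)|max` lines compute, the heuristic that treats an LLM row with zero p50 latency and zero GL accuracy as skipped, and the per-cell formatting. This is a sketch that assumes `results` maps each model key to a dict of the metric names used in the template. The helper names and the `" | "` cell separator are choices made here, not part of the original code.

```python
# Hypothetical helpers; `results` is assumed to map a model key to a dict whose
# keys match the attribute names the template reads.
MODEL_NAMES = {
    "google": "Google (768d)",
    "jina": "Jina (1024d)",
    "minilm": "MiniLM (384d)",
    "llm": "LLM (Gemini)",
}
TOP_K_KEYS = ["top_3_accuracy", "top_5_accuracy", "top_10_accuracy"]
ACCURACY_KEYS = ["exact_match_gl", "exact_match_cc", *TOP_K_KEYS]
LATENCY_KEYS = ["latency_p50_ms", "latency_p95_ms"]


def column_extremes(results: dict) -> dict:
    """Max of each metric over every model: best accuracy, worst latency."""
    return {key: max(r[key] for r in results.values())
            for key in ACCURACY_KEYS + LATENCY_KEYS}


def llm_skipped(model: str, r: dict) -> bool:
    """Same heuristic as the template: no LLM calls means zero latency and accuracy."""
    return model == "llm" and r["latency_p50_ms"] == 0 and r["exact_match_gl"] == 0


def pct(x: float) -> str:
    """Render a 0-1 fraction as a percentage with one decimal place."""
    return f"{x * 100:.1f}%"


def format_row(model: str, r: dict) -> str:
    """One table row as ' | '-separated plain text."""
    name = MODEL_NAMES[model]
    if llm_skipped(model, r):
        return f"{name} | N/A - GOOGLE_API_KEY not set"
    is_llm = model == "llm"
    # CC accuracy is shown only for the LLM; top-k columns only for the other models.
    cc = pct(r["exact_match_cc"]) if is_llm else "N/A"
    top_k = ["N/A"] * 3 if is_llm else [pct(r[k]) for k in TOP_K_KEYS]
    cells = [
        name,
        pct(r["exact_match_gl"]),
        cc,
        *top_k,
        f"{r['latency_p50_ms']:.0f}ms",
        f"{r['latency_p95_ms']:.0f}ms",
        f"${r['total_cost_usd']:.4f}",
    ]
    return " | ".join(cells)
```

As in the template, the maxima are taken over all four models, including the LLM. A nonzero LLM top-k value would therefore count toward `best_top3` and its siblings even though those LLM cells render as N/A.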
{% endif %}