Semantic Search Benchmark Report

Comparison of Embedding Models vs LLM-Based Matching
Generated: {{ generated_at }}

Aggregate Metrics

{% if benchmark_results %}
{{ benchmark_results.google.total_queries if benchmark_results.google else 0 }}
Total Queries
{% set best_gl = benchmark_results.values()|map(attribute='exact_match_gl')|max %}
{{ "%.1f"|format(best_gl * 100) }}%
Best GL Accuracy
{% set best_latency = [benchmark_results.google.latency_p50_ms, benchmark_results.jina.latency_p50_ms, benchmark_results.minilm.latency_p50_ms]|select|min %}
{{ "%.0f"|format(best_latency if best_latency else 0) }}ms
Best P50 Latency
{% set models = [ ('google', 'Google (768d)'), ('jina', 'Jina (1024d)'), ('minilm', 'MiniLM (384d)'), ('llm', 'LLM (Gemini)') ] %}
{% set best_gl = benchmark_results.values()|map(attribute='exact_match_gl')|max %}
{% set best_top3 = benchmark_results.values()|map(attribute='top_3_accuracy')|max %}
{% set best_top5 = benchmark_results.values()|map(attribute='top_5_accuracy')|max %}
{% set best_top10 = benchmark_results.values()|map(attribute='top_10_accuracy')|max %}
Model | GL Accuracy | CC Accuracy | Top-3 | Top-5 | Top-10 | Latency (P50) | Latency (P95) | Cost (USD)
{% for model_key, model_name in models %}
{% set r = benchmark_results.get(model_key, {}) %}
{% set llm_skipped = (model_key == 'llm' and r.latency_p50_ms == 0 and r.exact_match_gl == 0) %}
{% if llm_skipped %}
{{ model_name }} | N/A - GOOGLE_API_KEY not set
{% else %}
{{ model_name }} | {{ "%.1f"|format(r.exact_match_gl * 100) }}% | {% if model_key == 'llm' %}{{ "%.1f"|format(r.exact_match_cc * 100) }}%{% else %}N/A{% endif %} | {% if model_key == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_3_accuracy * 100) }}%{% endif %} | {% if model_key == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_5_accuracy * 100) }}%{% endif %} | {% if model_key == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_10_accuracy * 100) }}%{% endif %} | {{ "%.0f"|format(r.latency_p50_ms) }}ms | {{ "%.0f"|format(r.latency_p95_ms) }}ms | ${{ "%.4f"|format(r.total_cost_usd) }}
{% endif %}
{% endfor %}
{% else %}

No benchmark results available.

{% endif %}
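
The report does not state how the aggregate metrics are computed, so the following is only a minimal sketch under common definitions: exact-match accuracy as top-1 accuracy, top-k accuracy as the share of queries whose true GL account appears among the first k ranked predictions, and latency percentiles by the nearest-rank method. The function names `top_k_accuracy` and `latency_percentile` and the sample data are hypothetical and are not taken from the benchmark code.

```python
import math


def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of queries whose true label is among the first k predictions.

    With k=1 this is the exact-match accuracy (assumed definition).
    """
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, ground_truth)
        if truth in preds[:k]
    )
    return hits / len(ground_truth)


def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile of a list of latencies in milliseconds."""
    if not latencies_ms:
        return 0.0
    ordered = sorted(latencies_ms)
    # Nearest-rank: smallest value with at least pct percent of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: three queries, each with a ranked list of GL accounts.
ranked = [
    ["6100", "6200", "6300"],
    ["7000", "6100", "6200"],
    ["6200", "6300", "6400"],
]
truth = ["6100", "6100", "6500"]
latencies = [10, 20, 30, 40, 50]

exact_match = top_k_accuracy(ranked, truth, k=1)
top_3 = top_k_accuracy(ranked, truth, k=3)
p50 = latency_percentile(latencies, 50)
p95 = latency_percentile(latencies, 95)
```

Other percentile conventions (for example linear interpolation) give slightly different P50 and P95 values on small samples, so the method used should be stated alongside the numbers.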

GL Account Confusion Matrix

This matrix shows which GL accounts the Google embedding model predicts for each true account. Low-frequency accounts are grouped into "Other".

{% if confusion_matrix_img %}
[Figure: Confusion Matrix]
{% else %}

Insufficient data to generate confusion matrix.

{% endif %}
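
The grouping of low-frequency accounts into "Other" could be done along the following lines. This is a sketch only: the threshold `min_count`, the helper names `group_rare_labels` and `confusion_counts`, and the sample labels are assumptions for illustration, and the report does not say which cutoff the actual benchmark uses.

```python
from collections import Counter


def group_rare_labels(true_labels, predicted_labels, min_count=2, other="Other"):
    """Collapse labels that occur fewer than min_count times in the ground truth."""
    freq = Counter(true_labels)
    keep = {label for label, n in freq.items() if n >= min_count}

    def collapse(label):
        return label if label in keep else other

    return (
        [collapse(t) for t in true_labels],
        [collapse(p) for p in predicted_labels],
    )


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) pairs; each key is one cell of the matrix."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical sample: account 6300 appears once, so it falls into "Other".
true_gl = ["6100", "6100", "6200", "6200", "6300"]
pred_gl = ["6100", "6200", "6200", "6200", "6100"]

grouped_true, grouped_pred = group_rare_labels(true_gl, pred_gl, min_count=2)
matrix = confusion_counts(grouped_true, grouped_pred)
```

Frequencies are taken from the ground-truth labels here, so a predicted account that never occurs as a true label is also mapped to "Other"; counting over both lists would be an equally reasonable choice.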

Curated Examples

Representative examples from each outcome category illustrate how the models behave.

{% if showcase and showcase.categories %} {% for category in showcase.categories %}

{{ category.title }}{% if category.subtitle %} - {{ category.subtitle }}{% endif %}

{{ category.count }} example{{ '' if category.count == 1 else 's' }}
{{ category.description }}
{% if category.examples %} {% for ex in category.examples %}
Query: {{ ex.query_text }} | Google similarity: {{ "%.2f"|format(ex.google_similarity) }}
Google: {{ ex.google_prediction or 'N/A' }}
Jina: {{ ex.jina_prediction or 'N/A' }}
MiniLM: {{ ex.minilm_prediction or 'N/A' }}
LLM: {{ ex.llm_prediction or 'N/A' }}
Ground Truth: GL: {{ ex.ground_truth }} {% if ex.cost_center %} | CC: {{ ex.cost_center }}{% endif %}
{% endfor %} {% else %}

No examples found for this category.

{% endif %}
{% endfor %} {% else %}

No showcase examples available.

{% endif %}