Semantic Search Benchmark Report

Comparison of Embedding Models vs LLM-Based Matching

Generated: {{ generated_at }}

Aggregate Metrics

{% if benchmark_results %}

{{ benchmark_results.google.total_queries if benchmark_results.google else 0 }}

Total Queries

{% set best_gl = benchmark_results.values()|map(attribute='exact_match_gl')|max %}

{{ "%.1f"|format(best_gl * 100) }}%

Best GL Accuracy

{% set best_latency = [benchmark_results.google.latency_p50_ms, benchmark_results.jina.latency_p50_ms, benchmark_results.minilm.latency_p50_ms]|select|min %}

{{ "%.0f"|format(best_latency if best_latency else 0) }}ms

Best P50 Latency (embedding models)
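The summary values above reduce the per-model results to single numbers. Below is a minimal Python sketch of the same reductions. It assumes `benchmark_results` is a dict keyed by model name (`google`, `jina`, `minilm`, `llm`) whose values are dicts carrying the field names this template reads. Best GL accuracy is the maximum `exact_match_gl` over every model present. Best P50 latency is the smallest non-zero `latency_p50_ms` among the three embedding models, so the LLM and any model that reports 0 (not run) are ignored. The helper name `summary_cards` is hypothetical.

```python
from typing import Optional

# The LLM is deliberately excluded from the latency summary.
EMBEDDING_MODELS = ("google", "jina", "minilm")


def summary_cards(results: dict) -> dict:
    """Hypothetical helper mirroring the template's summary calculations."""
    total_queries = results["google"]["total_queries"] if results.get("google") else 0

    # Best GL accuracy: maximum exact-match rate over every model present.
    best_gl = max((r["exact_match_gl"] for r in results.values()), default=0.0)

    # Best P50 latency: skip missing models and zero values (runs that did not happen),
    # then take the minimum; report 0 when nothing qualifies.
    latencies = [
        results[m]["latency_p50_ms"]
        for m in EMBEDDING_MODELS
        if m in results and results[m]["latency_p50_ms"]
    ]
    best_latency: Optional[float] = min(latencies) if latencies else None

    return {
        "total_queries": total_queries,
        "best_gl_pct": "%.1f%%" % (best_gl * 100),
        "best_p50": "%.0fms" % (best_latency or 0),
    }
```

Using `max(..., default=...)` and an explicit empty check keeps the summary from failing when a model is absent. The template's own guard on `benchmark_results.google` suggests that this can happen.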

{% set models = [('google', 'Google (768d)'), ('jina', 'Jina (1024d)'), ('minilm', 'MiniLM (384d)'), ('llm', 'LLM (Gemini)')] %}
{% set best_gl = benchmark_results.values()|map(attribute='exact_match_gl')|max %}
{% set best_top3 = benchmark_results.values()|map(attribute='top_3_accuracy')|max %}
{% set best_top5 = benchmark_results.values()|map(attribute='top_5_accuracy')|max %}
{% set best_top10 = benchmark_results.values()|map(attribute='top_10_accuracy')|max %}

Model	GL Accuracy	CC Accuracy	Top-3	Top-5	Top-10	Latency (P50)	Latency (P95)	Cost (USD)
{% for model_key, model_name in models %}
{% set r = benchmark_results.get(model_key, {}) %}
{% set llm_skipped = (model_key == 'llm' and r.latency_p50_ms == 0 and r.exact_match_gl == 0) %}
{% if llm_skipped %}
{{ model_name }}	N/A - GOOGLE_API_KEY not set
{% else %}
{{ model_name }}	{{ "%.1f"|format(r.exact_match_gl * 100) }}%	{% if model_key == 'llm' %}{{ "%.1f"|format(r.exact_match_cc * 100) }}%{% else %}N/A{% endif %}	{% if model_key == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_3_accuracy * 100) }}%{% endif %}	{% if model_key == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_5_accuracy * 100) }}%{% endif %}	{% if model_key == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_10_accuracy * 100) }}%{% endif %}	{{ "%.0f"|format(r.latency_p50_ms) }}ms	{{ "%.0f"|format(r.latency_p95_ms) }}ms	${{ "%.4f"|format(r.total_cost_usd) }}
{% endif %}
{% endfor %}

{% else %}

No benchmark results available.

{% endif %}
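The Top-k and latency columns are not defined elsewhere in this report. The sketch below uses the conventional definitions, which are an assumption here because the benchmark code is not shown. A query counts toward Top-k accuracy when its ground-truth GL account appears among the model's k highest-ranked candidates. P50 and P95 are the 50th and 95th percentiles of per-query latency. The LLM row prints N/A for Top-k, presumably because it returns a single answer rather than a ranked list. The helpers `top_k_accuracy` and `percentile` are hypothetical. The percentile uses linear interpolation between closest ranks, which may differ from the method the benchmark actually uses.

```python
import math
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of queries whose ground-truth label is among the top-k ranked candidates."""
    if not truth:
        return 0.0
    hits = sum(1 for cands, gt in zip(ranked, truth) if gt in cands[:k])
    return hits / len(truth)


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolated between the two closest ranks."""
    data = sorted(values)
    if not data:
        return 0.0
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)
```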

GL Account Confusion Matrix

Predicted versus ground-truth GL accounts for the Google embedding model. Low-frequency accounts are grouped into "Other".

{% if confusion_matrix_img %}

{% else %}

Insufficient data to generate confusion matrix.

{% endif %}
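Grouping low-frequency accounts into "Other" keeps the matrix readable. The sketch below shows one way to do this in Python, assuming a frequency threshold on the ground-truth labels. The `min_count` parameter and the `grouped_confusion` helper are hypothetical, and the report does not state the cutoff it uses. Any account seen fewer than `min_count` times in the ground truth is relabeled "Other" in both the true and predicted positions. The result counts (true, predicted) pairs.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def grouped_confusion(
    truth: Sequence[str], predicted: Sequence[str], min_count: int = 5
) -> Tuple[List[str], Dict[Tuple[str, str], int]]:
    """Count (true, predicted) GL pairs after folding rare accounts into "Other"."""
    freq = Counter(truth)
    # Keep only accounts that occur often enough in the ground truth.
    keep = {label for label, n in freq.items() if n >= min_count}

    def fold(label: str) -> str:
        return label if label in keep else "Other"

    pairs = Counter((fold(t), fold(p)) for t, p in zip(truth, predicted))
    labels = sorted(keep) + (["Other"] if any("Other" in pair for pair in pairs) else [])
    return labels, dict(pairs)
```

A prediction of an account that is rare in the ground truth also lands in "Other". As a result, the diagonal cell ("Other", "Other") can include mismatches between two different rare accounts.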

Curated Examples

Representative examples across different outcome categories to illustrate model behavior.

{% if showcase and showcase.categories %} {% for category in showcase.categories %}

{{ category.title }}

{{ category.subtitle }}

{{ category.count }} examples

{% if category.examples %} {% for ex in category.examples %}

Query: {{ ex.query_text }}

Google similarity: {{ "%.2f"|format(ex.google_similarity) if ex.google_similarity is number else 'N/A' }}

Google: {{ ex.google_prediction or 'N/A' }}

Jina: {{ ex.jina_prediction or 'N/A' }}

MiniLM: {{ ex.minilm_prediction or 'N/A' }}

LLM: {{ ex.llm_prediction or 'N/A' }}

Ground Truth: GL {{ ex.ground_truth }}{% if ex.cost_center %} | CC {{ ex.cost_center }}{% endif %}

{% endfor %} {% else %}

No examples found for this category.

{% endif %}

{% endfor %} {% else %}

No showcase examples available.

{% endif %}

