Generate a self-contained HTML report with a confusion matrix, curated examples, and aggregate metrics. The report can be shared and viewed offline.
| Model | GL Accuracy | CC Accuracy | Top-3 | Top-5 | Top-10 | Latency (p50) | Latency (p95) | Cost (USD) |
|---|---|---|---|---|---|---|---|---|
| {{ model_names[model] }} | {% if llm_skipped %}N/A - GOOGLE_API_KEY not set{% else %}{{ "%.1f"|format(r.exact_match_gl * 100) }}% | {% if model == 'llm' %}{{ "%.1f"|format(r.exact_match_cc * 100) }}%{% else %}N/A{% endif %} | {% if model == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_3_accuracy * 100) }}%{% endif %} | {% if model == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_5_accuracy * 100) }}%{% endif %} | {% if model == 'llm' %}N/A{% else %}{{ "%.1f"|format(r.top_10_accuracy * 100) }}%{% endif %} | {{ "%.0f"|format(r.latency_p50_ms) }}ms | {{ "%.0f"|format(r.latency_p95_ms) }}ms | ${{ "%.4f"|format(r.total_cost_usd) }}{% endif %} |
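
The branching in the row template can be easier to follow as plain Python. The sketch below is not the report generator itself: it only mirrors the template's conditions and `%`-style format strings. The helper names `pct` and `render_row`, the use of a dict for `r`, and the `"baseline"` model with its numbers are all hypothetical, chosen for illustration.

```python
def pct(fraction):
    """Format a 0-1 fraction like the template: one decimal place, then '%'."""
    return "%.1f%%" % (fraction * 100)


def render_row(display_name, model, r, llm_skipped=False):
    """Return the cell strings for one table row, mirroring the template branches."""
    if llm_skipped:
        # Same placeholder text the template emits when the API key is missing.
        return [display_name, "N/A - GOOGLE_API_KEY not set"]
    is_llm = model == "llm"
    return [
        display_name,
        pct(r["exact_match_gl"]),
        pct(r["exact_match_cc"]) if is_llm else "N/A",  # CC accuracy: LLM only
        "N/A" if is_llm else pct(r["top_3_accuracy"]),  # top-k: non-LLM only
        "N/A" if is_llm else pct(r["top_5_accuracy"]),
        "N/A" if is_llm else pct(r["top_10_accuracy"]),
        "%.0fms" % r["latency_p50_ms"],
        "%.0fms" % r["latency_p95_ms"],
        "$%.4f" % r["total_cost_usd"],
    ]


# Made-up numbers for a hypothetical non-LLM model called "baseline".
sample = {
    "exact_match_gl": 0.875,
    "top_3_accuracy": 0.9,
    "top_5_accuracy": 0.95,
    "top_10_accuracy": 1.0,
    "latency_p50_ms": 12.0,
    "latency_p95_ms": 40.0,
    "total_cost_usd": 0.0,
}
row = render_row("Baseline", "baseline", sample)
print(" | ".join(row))
```

Jinja's `format` filter applies the same `%`-style formatting as Python's `%` operator, which is why the format strings carry over unchanged. Accuracies are stored as fractions and multiplied by 100 only at display time.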