Refactor the evaluation report generation to more closely match Orcha's ingestion process, enabling meaningful comparison and eventual integration.
Key changes:
| Aspect | Current | Target |
|---|---|---|
| LLM calls per invoice | N (one per line item) | 2 (AccountsMatcher + CostCenterMatcher) |
| CoA/CC in prompt | None | Full datasets as CSV |
| Historical bookings | Curated per line item | Curated once per invoice |
| Output structure | Combined GL+CC | Separate GL and CC results |
Export CoA and CC from Orcha DB once, commit to repo:
# Chart of Accounts
psql -h localhost -U postgres -d orcha -c "\COPY (
SELECT number, name, description, balance_position
FROM gl_accounts_dataset
WHERE legal_entity_id = '<REGNOLOGY_ID>' AND is_active = true
ORDER BY number
) TO STDOUT CSV HEADER" > data/regnology_coa.csv
# Cost Centers
psql -h localhost -U postgres -d orcha -c "\COPY (
SELECT *
FROM cost_center_dataset
WHERE legal_entity_id = '<REGNOLOGY_ID>' AND position IS NOT NULL
ORDER BY number
) TO STDOUT CSV HEADER" > data/regnology_cc.csv
COA_CSV_PATH = Path(__file__).parent.parent.parent / 'data' / 'regnology_coa.csv'
CC_CSV_PATH = Path(__file__).parent.parent.parent / 'data' / 'regnology_cc.csv'
# Loaded once at startup
coa_csv = COA_CSV_PATH.read_text()
cc_csv = CC_CSV_PATH.read_text()
You are a double-entry bookkeeping specialist for accounts payable invoices.
Your task: assign the correct GL debit account for each line item.
## Chart of Accounts
${coa_csv}
## Historical Booking Patterns (from similar invoices)
${curated_bookings_csv}
## Invoice Data
Supplier: ${supplier_name}
Line Items:
${line_items_json}
## Instructions
1. For each line item, find the most appropriate GL account from the Chart of Accounts
2. Consider historical patterns for similar suppliers/descriptions
3. Prefer specific accounts over generic ones
## Response Format
{
"line_items": [
{"debit_account": "XXXXXX", "confidence": 0.85, "reasoning": "..."},
...
]
}
You are a cost center allocation specialist for accounts payable invoices.
Your task: assign the correct cost center for each line item.
## Cost Centers
${cc_csv}
## Historical Booking Patterns (from similar invoices)
${curated_bookings_csv}
## Invoice Data
Supplier: ${supplier_name}
Line Items:
${line_items_json}
## Instructions
1. For each line item, find the most appropriate cost center
2. Consider historical patterns for similar suppliers/descriptions
3. Different line items may belong to different cost centers
## Response Format
{
"line_items": [
{"cost_center": "XXXXXX", "confidence": 0.85, "reasoning": "..."},
...
]
}
The existing curate_bookings_for_invoice() already supports multiple line items. Change from calling it per line item to calling it once per invoice with all descriptions.
for li in line_items:
curated = curate_bookings_for_invoice(conn, supplier_name, [li['description']], ...)
all_descriptions = [li.get('description', '') for li in line_items]
curated = curate_bookings_for_invoice(conn, supplier_name, all_descriptions, ...)
Invoice with 3 line items:
- "Software license renewal"
- "Professional services - implementation"
- "Training workshop"
Step 1: Semantic search per line item (k=10 each)
→ Up to 30 raw matches
Step 2: Merge and dedupe by ID
→ ~20 unique matches
Step 3: Cluster by (debit_account, cost_center)
→ Groups matches by their GL/CC combination
Step 4: Deduplicate within clusters (similarity > 0.9)
→ ~2-3 diverse examples per cluster
Step 5: Output CSV sorted by cluster frequency
Output includes cluster_count to indicate pattern frequency.
def evaluate_invoice(issue, orcha_conn, semantic_conn, coa_csv, cc_csv, ...):
# 1. Fetch invoice from Orcha (unchanged)
invoice_data = fetch_invoice_from_orcha(orcha_conn, issue.invoice_number)
# 2. Collect ALL line item descriptions for curation
descriptions = [li.get('description', '') for li in invoice_data['line_items']]
# 3. Curate bookings once for entire invoice
curated = curate_bookings_for_invoice(conn, supplier_name, descriptions, ...)
curated_csv = curated_bookings_to_csv(curated)
# 4. Build line items JSON for prompt
line_items_json = json.dumps([
{"description": li['description'], "amount": li['amount']}
for li in invoice_data['line_items']
], indent=2)
# 5. Call AccountsMatcher (one LLM call for all line items)
gl_results = call_accounts_matcher(supplier_name, line_items_json, coa_csv, curated_csv)
# 6. Call CostCenterMatcher (one LLM call for all line items)
cc_results = call_cost_center_matcher(supplier_name, line_items_json, cc_csv, curated_csv)
# 7. Merge results and compare against ground truth
for i, li in enumerate(invoice_data['line_items']):
li_result = LineItemResult(
llm_gl=gl_results[i]['debit_account'],
llm_cc=cc_results[i]['cost_center'],
...
)
run_evaluation()def run_evaluation(issues_file, output_file, ...):
# Load static data once
coa_csv = COA_CSV_PATH.read_text()
cc_csv = CC_CSV_PATH.read_text()
# ... existing setup ...
for issue in issues:
result = evaluate_invoice(
issue, orcha_conn, semantic_conn,
coa_csv, cc_csv, # Pass pre-loaded data
k, threshold
)
results.append(result)
src/evaluation/llm_eval.py
call_accounts_matcher() and call_cost_center_matcher() functionsevaluate_invoice() to call matchers once per invoicerun_evaluation() to load and pass CoA/CC databuild_llm_prompt() and call_llm()New: data/regnology_coa.csv
gl_accounts_dataset tableNew: data/regnology_cc.csv
cost_center_dataset table