Line-Item Reconciliation for Document Matching

Issue: #291
Date: 2026-03-02

Problem

The matching system finds related documents but doesn't verify whether their contents agree. An invoice that overcharges by 50% compared to the PO still matches with high confidence. Line items, quantities, and prices are never cross-checked.

Feature

After documents are matched into a cluster, a single LLM call compares their contents and produces a structured reconciliation report surfacing price discrepancies, quantity mismatches, unmatched line items, and total inconsistencies.

Data Model

New table: document_cluster

CREATE TABLE document_cluster (
  id               uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  reconciliation   jsonb,
  reconciled_at    timestamptz,
  created_at       timestamptz NOT NULL DEFAULT now(),
  updated_at       timestamptz NOT NULL DEFAULT now()
);

document.cluster_id becomes a FK to document_cluster(id) ON DELETE SET NULL.

Existing cluster UUIDs are migrated: one document_cluster row per distinct cluster_id currently in use.
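
The ordering of the migration matters: the rows have to exist before the foreign key is added. A minimal sketch of the two statements, held as Clojure data, assuming document.cluster_id is already a uuid column and document_cluster has been created; the constraint name is hypothetical:

```clojure
;; Hypothetical migration statements, in the order they would need to run.
;; Assumes document.cluster_id is a uuid column and that the
;; document_cluster table already exists. The constraint name is made up.
(def migration-statements
  [;; 1. One document_cluster row per distinct cluster_id currently in use.
   "INSERT INTO document_cluster (id)
    SELECT DISTINCT cluster_id
    FROM document
    WHERE cluster_id IS NOT NULL"
   ;; 2. Add the FK afterwards, so every existing cluster_id has a target row.
   "ALTER TABLE document
    ADD CONSTRAINT document_cluster_id_fkey
    FOREIGN KEY (cluster_id) REFERENCES document_cluster (id)
    ON DELETE SET NULL"])
```

Running the statements in the opposite order would fail, because existing cluster_id values would reference rows that do not exist yet.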

Reconciliation JSONB structure

{:status  "reconciled" ;; or "discrepancies"
 :summary "..."        ;; cluster-level human-readable summary
 :issues  [{:severity     "warning"  ;; or "error"
            :category     "price-discrepancy"
            :summary      "Invoice bills €120/unit but PO specifies €100/unit for Widget A"
            :document-ids ["<uuid-a>" "<uuid-b>"]
            :details      [{:field    "unit-price"
                            :expected "100.00"
                            :actual   "120.00"}]}]}
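
To illustrate how a consumer might read this structure, here is a small sketch that counts issues per severity and finds the issues touching one document. It assumes the JSONB has been parsed with keyword keys; the function names and the second issue (including its category "unmatched-line-item") are invented for the example:

```clojure
;; Example report in the shape described above. All values are made up.
(def example-report
  {:status  "discrepancies"
   :summary "Invoice and PO disagree on one unit price."
   :issues  [{:severity     "error"
              :category     "price-discrepancy"
              :summary      "Invoice bills €120/unit but PO specifies €100/unit for Widget A"
              :document-ids ["<uuid-a>" "<uuid-b>"]
              :details      [{:field "unit-price" :expected "100.00" :actual "120.00"}]}
             {:severity     "warning"
              :category     "unmatched-line-item" ; hypothetical category
              :summary      "Invoice line 'Shipping' has no counterpart on the PO"
              :document-ids ["<uuid-a>"]}]})

(defn issue-counts
  "Counts issues per severity, e.g. {\"error\" 1, \"warning\" 1}."
  [report]
  (frequencies (map :severity (:issues report))))

(defn issues-for-document
  "Returns the issues that mention the given document ID."
  [report document-id]
  (filterv #(some #{document-id} (:document-ids %)) (:issues report)))
```

Note that :details is optional, so consumers should not assume it is present on every issue.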

No changes to document_match.

Workers — Matching Changes

assign-cluster! updated

Creates/merges document_cluster rows instead of bare UUIDs:

format-document-summary made public

format-document-summary is currently private in llm_decision.clj. It becomes public so the reconciliation namespace can reuse it.

Workers — Reconciliation

New namespace: com.getorcha.workers.matching.reconciliation

reconcile-cluster!:

  1. Load all documents in the cluster
  2. Format each with format-document-summary, including document ID
  3. Single Sonnet call with all documents
  4. Validate response against Malli schema
  5. Write to document_cluster.reconciliation + reconciled_at
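
The five steps can be sketched as a single function with every side effect injected. This is not the real reconcile-cluster! signature: the function name and all keys in the dependency map are hypothetical, chosen so the flow can be read without a database or an LLM client.

```clojure
(defn reconcile-cluster-sketch!
  "Runs the five steps with all side effects passed in as functions.
   Every key in the first argument is a hypothetical name."
  [{:keys [load-documents format-summary call-llm valid? save!]} cluster-id]
  (let [docs      (load-documents cluster-id)            ; step 1: load documents
        summaries (mapv (fn [doc]                        ; step 2: format, keep the ID
                          {:id (:id doc) :summary (format-summary doc)})
                        docs)
        response  (call-llm summaries)]                  ; step 3: one call, all documents
    (when-not (valid? response)                          ; step 4: schema validation
      (throw (ex-info "Reconciliation response failed validation"
                      {:cluster-id cluster-id :response response})))
    (save! cluster-id response)                          ; step 5: persist the report
    response))
```

Throwing on an invalid response is one possible choice; it keeps a malformed report from being written and leaves the failure policy to the caller.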

Trigger point

In process-document! (worker.clj), between match-document! and set-matching-status! "succeeded":

;; Capture the cluster before matching so a document that moves clusters
;; also triggers reconciliation of the cluster it left.
(let [cluster-before (get-cluster-id db doc-id)
      _              (matching/match-document! db search-config llm-config doc)
      cluster-after  (get-cluster-id db doc-id)
      ;; A set, so an unchanged cluster is reconciled only once.
      affected       (cond-> #{}
                       cluster-before (conj cluster-before)
                       cluster-after  (conj cluster-after))]
  (doseq [cluster-id affected]
    (reconciliation/reconcile-cluster! db llm-config cluster-id))
  (db.matching/set-matching-status! db doc-id {:status "succeeded"}))

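This handles reingestion naturally: when a document moves to a different cluster, both the old and the new cluster are reconciled, and when its cluster is unchanged, that cluster is reconciled once.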
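
The set-building step can be pulled out and checked against each case in isolation. The function name below is hypothetical; the body mirrors the cond-> expression in the trigger code:

```clojure
(defn affected-clusters
  "Set of cluster IDs to reconcile, given the document's cluster before and
   after matching. nil means the document was not in a cluster."
  [cluster-before cluster-after]
  (cond-> #{}
    cluster-before (conj cluster-before)
    cluster-after  (conj cluster-after)))

(affected-clusters nil "c1")   ; first ingestion: only the new cluster
(affected-clusters "c1" "c1")  ; cluster unchanged: one reconciliation, not two
(affected-clusters "c1" "c2")  ; moved: both clusters
(affected-clusters "c1" nil)   ; left its cluster: only the old cluster
```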

Error handling

If reconciliation still fails after retries, log a warning, notify admins, and set the matching status to "succeeded" anyway. Reconciliation is additive: a missing report does not block matching.
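
One way to get this behavior is a wrapper that never lets a reconciliation failure escape to the caller. A minimal sketch, assuming retries happen inside the injected reconcile! function; reconcile!, notify-admins!, and the wrapper name are all hypothetical:

```clojure
(defn reconcile-safely!
  "Calls `reconcile!` for one cluster and swallows any failure.
   `reconcile!` and `notify-admins!` are hypothetical injected functions;
   retries are assumed to happen inside `reconcile!`."
  [reconcile! notify-admins! cluster-id]
  (try
    (reconcile! cluster-id)
    :ok
    (catch Exception e
      ;; A real worker would use its logging library at warn level here.
      (binding [*out* *err*]
        (println "WARN reconciliation failed for cluster" cluster-id "-" (ex-message e)))
      (notify-admins! {:cluster-id cluster-id :error (ex-message e)})
      :failed)))
```

Because the wrapper returns :failed instead of throwing, the caller can go on to set the matching status to "succeeded" regardless of the outcome.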

LLM

Prompt structure

System message: role, rules (tolerance, matching strategy, output format).

User message: all documents in the cluster, each with its UUID and formatted summary, followed by instructions to identify price discrepancies, quantity mismatches, unmatched line items, and total inconsistencies.

Response: JSON matching ReconciliationResponse schema.
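
As a rough illustration of the user message, the sketch below joins per-document blocks under a short instruction. The wording, the function name, and the :id / :summary keys are illustrative only; the summaries are assumed to come from format-document-summary:

```clojure
(require '[clojure.string :as str])

(defn build-user-message
  "Builds the user message from documents that already carry a formatted
   summary. The wording and the :id / :summary keys are illustrative."
  [documents]
  (str "Compare the following documents and report price discrepancies, "
       "quantity mismatches, unmatched line items, and total inconsistencies.\n\n"
       (str/join "\n\n"
                 (map (fn [{:keys [id summary]}]
                        (str "Document " id ":\n" summary))
                      documents))))
```

Putting the UUID on the line directly above each summary gives the model an unambiguous value to echo back in :document-ids.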

Malli Schemas

(def ReconciliationIssue
  [:map
   [:severity [:enum "warning" "error"]]
   [:category :string]
   [:summary :string]
   [:document-ids [:vector :string]]
   [:details {:optional true}
    [:vector
     [:map
      [:field :string]
      [:expected [:maybe :string]]
      [:actual [:maybe :string]]]]]])

(def ReconciliationResponse
  [:map
   [:status [:enum "reconciled" "discrepancies"]]
   [:summary :string]
   [:issues [:vector ReconciliationIssue]]])
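
Validation against these schemas might look like the sketch below, which returns either the response or a human-readable explanation. It assumes the LLM's JSON has already been parsed with keyword keys; check-response and the trimmed StatusOnly schema are hypothetical stand-ins used to keep the example short:

```clojure
(require '[malli.core :as m]
         '[malli.error :as me])

(defn check-response
  "Returns {:ok response} when the response satisfies the schema, otherwise
   {:error <humanized explanation>}."
  [schema response]
  (if (m/validate schema response)
    {:ok response}
    {:error (me/humanize (m/explain schema response))}))

;; A trimmed stand-in for ReconciliationResponse, only for this example.
(def StatusOnly
  [:map [:status [:enum "reconciled" "discrepancies"]]])

(check-response StatusOnly {:status "reconciled"}) ; valid
(check-response StatusOnly {:status "maybe"})      ; invalid: not in the enum
```

The humanized error is convenient for logs and for the admin notification when a response is rejected.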

UI

Matches section header

Badge next to "Matches" heading:

Issues list

Below match cards, inside the matches section. Each issue renders:

Data flow

Reconciliation data is loaded from document_cluster.reconciliation via the document's cluster_id. It is rendered as part of the matches section, so no separate endpoint is needed. The existing matching-complete SSE event refreshes the whole section, so reconciliation results appear when matching finishes.

Cost

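One Sonnet call per cluster. A typical cluster of 2-3 documents is roughly 2-4K input tokens, or about $0.01-0.02 per reconciliation. A document that moves between clusters triggers two reconciliations, one per affected cluster.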
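
The estimate can be reproduced with a few lines of arithmetic. The per-token prices and the output token counts below are assumptions made for this sketch, not figures from this document, and should be checked against current pricing:

```clojure
;; Assumed prices in USD per million tokens. Verify against current pricing.
(def input-price-per-mtok 3.0)
(def output-price-per-mtok 15.0)

(defn estimate-cost
  "Estimated USD cost of one call for the given token counts."
  [input-tokens output-tokens]
  (+ (* input-tokens  (/ input-price-per-mtok 1e6))
     (* output-tokens (/ output-price-per-mtok 1e6))))

(estimate-cost 2000 300) ; small cluster, assumed 300 output tokens: about 0.0105
(estimate-cost 4000 500) ; larger cluster, assumed 500 output tokens: about 0.0195
```

Under these assumptions both ends of the range land within the $0.01-0.02 figure above.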

Not included