Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Booking History Matching Simplification - Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Simplify booking history matching to filter by supplier only and pass the matches to the LLM prompts as CSV.

Architecture: Fetch booking history once per invoice in run "invoice", convert it to CSV, and pass it explicitly to the AccountsMatcher and CostCenterMatcher constructors. Remove pre-enrichment and confidence tiers.

Tech Stack: Clojure, HoneySQL, pg_trgm similarity, next-jdbc


Task 1: Add fetch-supplier-booking-history function

Files: src/com/getorcha/workers/ingestion/post_process.clj

Step 1: Write the new function

Replace find-booking-history-matches (lines 41-107) with:

(defn ^:private fetch-supplier-booking-history
  "Fetches booking history entries matching the invoice issuer by supplier name.
   Uses pg_trgm similarity >= 0.7 threshold on normalized supplier name.
   Returns up to 50 matches as a vector of maps, or nil if no matches."
  [db legal-entity-id issuer-name]
  (when (and legal-entity-id (not (string/blank? issuer-name)))
    (let [supplier-norm (util.text/normalize-supplier-name issuer-name)
          rows (db.sql/execute!
                 db
                 {:select [:supplier-name :description :net-amount
                           :debit-account :credit-account :cost-center]
                  :from   [[:booking-history-item :bhi]]
                  :join   [[:booking-history-upload :bhu]
                           [:= :bhu.id :bhi.upload-id]]
                  :where  [:and
                           [:= :bhu.legal-entity-id legal-entity-id]
                           [:is :bhi.deleted-at nil]
                           [:>= [:similarity :bhi.supplier-name-normalized [:inline supplier-norm]] 0.7]]
                  :order-by [[[:similarity :bhi.supplier-name-normalized [:inline supplier-norm]] :desc]]
                  :limit  50})]
      (when (seq rows)
        (mapv (fn [row]
                {:supplier-name  (:booking-history-item/supplier-name row)
                 :description    (:booking-history-item/description row)
                 :net-amount     (:booking-history-item/net-amount row)
                 :debit-account  (:booking-history-item/debit-account row)
                 :credit-account (:booking-history-item/credit-account row)
                 :cost-center    (:booking-history-item/cost-center row)})
              rows)))))
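
To get a feel for the 0.7 threshold, the sketch below approximates pg_trgm's similarity in plain Clojure: lower-case the input, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams overall. The names trigrams and similarity are illustrative and not part of the codebase, and the sketch skips util.text/normalize-supplier-name, whose behavior is not shown in this plan.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as string])

(defn trigrams
  "Approximates pg_trgm trigram extraction: each lower-cased alphanumeric
   word is padded with two leading spaces and one trailing space."
  [s]
  (->> (re-seq #"[\p{L}\p{N}]+" (string/lower-case s))
       (mapcat (fn [word]
                 (map #(apply str %)
                      (partition 3 1 (str "  " word " ")))))
       set))

(defn similarity
  "Shared trigrams divided by distinct trigrams across both strings."
  [a b]
  (let [ta    (trigrams a)
        tb    (trigrams b)
        total (count (set/union ta tb))]
    (if (zero? total)
      0.0
      (double (/ (count (set/intersection ta tb)) total)))))

(similarity "ACME Corporation" "ACME Corp") ;; => 0.5
(similarity "ACME" "acme")                  ;; => 1.0
```

Without normalization, an abbreviated name such as "ACME Corp" scores 0.5 against "ACME Corporation" and would fall below the threshold, so how much the normalized column strips from supplier names largely decides which rows are returned.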

Step 2: Verify it compiles

Run in REPL:

(require '[com.getorcha.workers.ingestion.post-process :as post-process] :reload)

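Expected: No errors from the new function. If the reload reports find-booking-history-matches as unresolved, the caller is most likely enrich-with-booking-history, which Task 2 deletes.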

Step 3: Commit

git add src/com/getorcha/workers/ingestion/post_process.clj
git commit -m "Add fetch-supplier-booking-history with supplier-only filtering"

Task 2: Delete enrich-with-booking-history

Files: src/com/getorcha/workers/ingestion/post_process.clj

Step 1: Delete the function

Delete lines 110-140 (the entire enrich-with-booking-history function).

Step 2: Verify it compiles

Run in REPL:

(require '[com.getorcha.workers.ingestion.post-process :as post-process] :reload)

Expected: An error about enrich-with-booking-history being called in run "invoice". This is intentional and is fixed in Task 5.

Step 3: Commit (skip for now, bundle with Task 5)


Task 3: Modify AccountsMatcher to accept booking-csv

Files: src/com/getorcha/workers/ingestion/post_process.clj

Step 1: Update the record definition

Change line 298 from:

(defrecord AccountsMatcher [context ingestion]

to:

(defrecord AccountsMatcher [context ingestion booking-csv]

Step 2: Update the prompt template

In the :accounts-match prompt (around lines 266-268), after ${accounts-csv} add:

${booking-history}

Step 3: Inject booking-csv into prompt

In the -compute method (around lines 327-330), update the prompt call to include booking-history:

Change:

prompt     (workers/legal-entity-prompt db-pool legal-entity-id :accounts-match
                                  {:structured-data (json/generate-string batch-data {:pretty true})
                                   :accounts-csv    accounts-csv
                                   :email-context   (or email-ctx "")})

to:

prompt     (workers/legal-entity-prompt db-pool legal-entity-id :accounts-match
                                  {:structured-data (json/generate-string batch-data {:pretty true})
                                   :accounts-csv    accounts-csv
                                   :booking-history (or booking-csv "")
                                   :email-context   (or email-ctx "")})
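
A minimal sketch of how the ${...} placeholders might be filled, assuming plain string substitution. render-template is a hypothetical stand-in; what workers/legal-entity-prompt does internally is not shown in this plan. The sketch illustrates why an empty string is passed when there is no history: the placeholder disappears from the rendered prompt, which matches the "no historical section is present" case in the prompt instructions.

```clojure
(require '[clojure.string :as string])

(defn render-template
  "Replaces each ${key} in template with the value under (keyword key).
   Missing keys render as an empty string. Hypothetical helper."
  [template values]
  (string/replace template
                  #"\$\{([^}]+)\}"
                  (fn [[_ k]] (str (get values (keyword k) "")))))

(render-template "ACCOUNTS:\n${accounts-csv}\n${booking-history}\nEND"
                 {:accounts-csv    "6300,Consulting"
                  :booking-history ""})
;; => "ACCOUNTS:\n6300,Consulting\n\nEND"
```

Because the replacement is computed by a function, clojure.string/replace inserts the returned value literally, so dollar signs or backslashes inside the CSV are not interpreted as regex group references.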

Step 4: Update prompt instructions

In the :accounts-match prompt, replace STEP 0 (lines 187-192) with:

STEP 0: CHECK HISTORICAL BOOKINGS
If a HISTORICAL BOOKINGS section is provided below, it contains verified past bookings for the same or similar supplier.
- Compare each line item's description against historical descriptions to find similar bookings
- If a match is found, prefer the same debit/credit accounts unless clearly inappropriate
- Use historical data to inform account choices, not as absolute rules
- If no historical section is present, proceed with standard matching

Step 5: Verify it compiles

Run in REPL:

(require '[com.getorcha.workers.ingestion.post-process :as post-process] :reload)

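Expected: No new errors. The reload still reports the missing enrich-with-booking-history from Task 2, and existing two-argument constructor calls will fail until Task 5 updates run "invoice".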
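
The positional constructor that defrecord generates has a fixed arity, which is why two-argument calls break once the third field is added. The sketch below shows this with ExampleMatcher, a stand-in record that is not part of the codebase; arity-error? is likewise a hypothetical helper.

```clojure
;; Stand-in for AccountsMatcher after booking-csv has been added.
(defrecord ExampleMatcher [context ingestion booking-csv])

(defn arity-error?
  "True when calling f with args throws an ArityException."
  [f & args]
  (try
    (apply f args)
    false
    (catch clojure.lang.ArityException _
      true)))

(arity-error? ->ExampleMatcher {} {})     ;; => true, old two-argument call
(arity-error? ->ExampleMatcher {} {} nil) ;; => false, nil booking-csv is accepted

;; The map constructor tolerates omitted fields and leaves them nil.
(:booking-csv (map->ExampleMatcher {:context {} :ingestion {}})) ;; => nil
```

The failure surfaces when the constructor is called, not when the namespace is loaded, so a clean reload does not prove that every call site has been updated.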

Step 6: Commit

git add src/com/getorcha/workers/ingestion/post_process.clj
git commit -m "Add booking-csv parameter to AccountsMatcher"

Task 4: Modify CostCenterMatcher to accept booking-csv

Files: src/com/getorcha/workers/ingestion/post_process.clj

Step 1: Update the record definition

Change line 549 from:

(defrecord CostCenterMatcher [context ingestion]

to:

(defrecord CostCenterMatcher [context ingestion booking-csv]

Step 2: Update the prompt template

In the :cost-center-match prompt (around lines 479-485), after ${cost-centers-csv} add:

${booking-history}

Step 3: Inject booking-csv into prompt

In the -compute method, update the build-cc-prompt call (around lines 595-600) to include booking-history:

Change:

prompt     (build-cc-prompt strategy-section
                            {:column-context      column-context
                             :confidence-guidance confidence-guidance
                             :structured-data     (json/generate-string batch-data {:pretty true})
                             :cost-centers-csv    cost-centers-csv
                             :email-context       (or email-ctx "")})

to:

prompt     (build-cc-prompt strategy-section
                            {:column-context      column-context
                             :confidence-guidance confidence-guidance
                             :structured-data     (json/generate-string batch-data {:pretty true})
                             :cost-centers-csv    cost-centers-csv
                             :booking-history     (or booking-csv "")
                             :email-context       (or email-ctx "")})

Step 4: Verify it compiles

Run in REPL:

(require '[com.getorcha.workers.ingestion.post-process :as post-process] :reload)

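Expected: No new errors. The reload still reports the missing enrich-with-booking-history from Task 2 until Task 5 updates run "invoice".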

Step 5: Commit

git add src/com/getorcha/workers/ingestion/post_process.clj
git commit -m "Add booking-csv parameter to CostCenterMatcher"

Task 5: Update run "invoice" to fetch once and pass to processors

Files: src/com/getorcha/workers/ingestion/post_process.clj

Step 1: Update the run method

Replace lines 2460-2472 with:

(defmethod run "invoice"
  [context ingestion]
  (let [{:keys [db-pool]} context
        {:keys [structured-data document]} ingestion
        legal-entity-id (:document/legal-entity-id document)
        issuer-name     (get-in structured-data [:issuer :name])
        ;; Fetch booking history once
        booking-matches (fetch-supplier-booking-history db-pool legal-entity-id issuer-name)
        booking-csv     (when (seq booking-matches)
                          (str "--- HISTORICAL BOOKINGS FOR THIS SUPPLIER (CSV) ---\n"
                               (dataset->csv booking-matches)
                               "\n\nUse these verified past bookings to inform account and cost center choices for similar line items."))
        processors      [(->AccountsMatcher context ingestion booking-csv)
                         (->CostCenterMatcher context ingestion booking-csv)
                         (->AccrualsMatcher context ingestion)
                         (->SupplierMatcher context ingestion)
                         (->TaxComplianceAnalyzer context ingestion)
                         (->FinancialValidationResolver context ingestion)
                         (->UncertainValidationsResolver context ingestion)]]
    (run-processors ingestion processors)))
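
The CSV assembly above relies on the existing dataset->csv helper, whose implementation is not shown in this plan. As a rough illustration of the text the matchers receive, the sketch below uses rows->csv, a hypothetical stand-in that assumes a fixed column order and minimal quoting; the real helper may differ in both.

```clojure
(require '[clojure.string :as string])

;; Column order assumed for illustration only.
(def columns
  [:supplier-name :description :net-amount
   :debit-account :credit-account :cost-center])

(defn csv-field
  "Quotes a value only when it contains a comma, quote, or line break."
  [v]
  (let [s (str v)] ; nil becomes an empty field
    (if (re-find #"[\",\r\n]" s)
      (str "\"" (string/replace s "\"" "\"\"") "\"")
      s)))

(defn rows->csv
  "Header line followed by one line per row. Hypothetical stand-in for dataset->csv."
  [rows]
  (string/join
    "\n"
    (cons (string/join "," (map name columns))
          (for [row rows]
            (string/join "," (map #(csv-field (get row %)) columns))))))

(defn booking-section
  "Returns the prompt section, or nil when there are no matches."
  [rows]
  (when (seq rows)
    (str "--- HISTORICAL BOOKINGS FOR THIS SUPPLIER (CSV) ---\n"
         (rows->csv rows))))

(println
  (booking-section
    [{:supplier-name "ACME, Inc." :description "Consulting" :net-amount 5000.0
      :debit-account "6300" :credit-account "3300" :cost-center nil}]))
```

Returning nil for an empty result keeps the (or booking-csv "") fallback in the matchers meaningful: without matches, no header or empty table reaches the prompt.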

Step 2: Verify it compiles

Run in REPL:

(require '[com.getorcha.workers.ingestion.post-process :as post-process] :reload)

Expected: No errors

Step 3: Commit

git add src/com/getorcha/workers/ingestion/post_process.clj
git commit -m "Fetch booking history once in run invoice, pass to processors"

Task 6: Update integration tests

Files: test/com/getorcha/erp/http/settings/booking_history_integration_test.clj

Step 1: Update tests to use new function

Replace the test file content:

(ns com.getorcha.erp.http.settings.booking-history-integration-test
  (:require [clojure.test :refer [deftest is testing use-fixtures]]
            [com.getorcha.erp.http.settings.booking-history :as http.booking-history]
            [com.getorcha.test.fixtures :as fixtures]
            [com.getorcha.test.notification-helpers :as helpers]
            [com.getorcha.workers.ingestion.post-process :as post-process]))


(use-fixtures :once fixtures/with-running-system)
(use-fixtures :each fixtures/with-db-rollback)


(deftest fetch-supplier-booking-history-test
  (testing "fetches matching booking history by supplier name"
    (let [legal-entity-id (helpers/create-legal-entity!)

          _ (#'http.booking-history/insert-booking-history-upload!
              fixtures/*db*
              {:legal-entity-id legal-entity-id
               :filename "test.csv"
               :items [{:supplier-name "ACME Corporation"
                        :description "Monthly consulting services"
                        :debit-account "6300"
                        :credit-account "3300"
                        :cost-center "CC-100"
                        :net-amount 5000.0}]})

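          ;; Query with an abbreviated supplier name. Raw trigram similarity between
          ;; "ACME Corporation" and "ACME Corp" is about 0.5, so this match depends on
          ;; normalize-supplier-name bringing the two names closer together.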
          matches (#'post-process/fetch-supplier-booking-history
                    fixtures/*db*
                    legal-entity-id
                    "ACME Corp")]

      (is (seq matches))
      (is (= "6300" (:debit-account (first matches))))
      (is (= "CC-100" (:cost-center (first matches)))))))


(deftest soft-delete-prevents-matching-test
  (testing "deleted items are not returned in matches"
    (let [legal-entity-id (helpers/create-legal-entity!)

          _ (#'http.booking-history/insert-booking-history-upload!
              fixtures/*db*
              {:legal-entity-id legal-entity-id
               :filename "test.csv"
               :items [{:supplier-name "ACME"
                        :description "Test"
                        :debit-account "6300"}]})

          _ (#'http.booking-history/soft-delete-booking-history! fixtures/*db* legal-entity-id)

          matches (#'post-process/fetch-supplier-booking-history
                    fixtures/*db*
                    legal-entity-id
                    "ACME")]

      (is (nil? matches)))))


(deftest new-upload-replaces-old-test
  (testing "new upload soft-deletes existing items"
    (let [legal-entity-id (helpers/create-legal-entity!)

          ;; First upload
          _ (#'http.booking-history/insert-booking-history-upload!
              fixtures/*db*
              {:legal-entity-id legal-entity-id
               :filename "first.csv"
               :items [{:supplier-name "Old Supplier"
                        :description "Old service"
                        :debit-account "6100"}]})

          ;; Second upload (should replace first)
          _ (#'http.booking-history/insert-booking-history-upload!
              fixtures/*db*
              {:legal-entity-id legal-entity-id
               :filename "second.csv"
               :items [{:supplier-name "New Supplier"
                        :description "New service"
                        :debit-account "6200"}]})

          ;; Query for old supplier - should not match
          old-matches (#'post-process/fetch-supplier-booking-history
                        fixtures/*db*
                        legal-entity-id
                        "Old Supplier")

          ;; Query for new supplier - should match
          new-matches (#'post-process/fetch-supplier-booking-history
                        fixtures/*db*
                        legal-entity-id
                        "New Supplier")]

      (is (nil? old-matches))
      (is (seq new-matches))
      (is (= "6200" (:debit-account (first new-matches)))))))

Step 2: Run tests

clj -X:test:silent :nses '[com.getorcha.erp.http.settings.booking-history-integration-test]'

Expected: All tests pass

Step 3: Commit

git add test/com/getorcha/erp/http/settings/booking_history_integration_test.clj
git commit -m "Update booking history tests for supplier-only matching"

Task 7: REPL verification with Alphabet test case

Files: None (REPL only)

Step 1: Test the full flow

(require '[com.getorcha.db.sql :as db.sql]
         '[com.getorcha.workers.ingestion.post-process :as post-process] :reload)

(def db-pool (:com.getorcha.db/pool integrant.repl.state/system))
(def legal-entity-id #uuid "00000000-0000-0000-0000-000000000001")
(def issuer-name "Alphabet Fuhrparkmanagement GmbH")

;; Should return up to 50 matches (the query is capped by :limit 50)
(def matches (#'post-process/fetch-supplier-booking-history db-pool legal-entity-id issuer-name))
(count matches)  ;; expect <= 50
(first matches)  ;; inspect structure

Step 2: Verify CSV generation

(def csv (#'post-process/dataset->csv matches))
(println (subs csv 0 500))  ;; print first 500 chars

Expected: CSV with headers: supplier-name, description, net-amount, debit-account, credit-account, cost-center

Step 3: Final commit

git add -A
git commit -m "Complete booking history matching simplification"

Summary of Changes

File                                  Change
post_process.clj                      Replace find-booking-history-matches with fetch-supplier-booking-history
post_process.clj                      Delete enrich-with-booking-history
post_process.clj                      Add booking-csv field to AccountsMatcher record
post_process.clj                      Add booking-csv field to CostCenterMatcher record
post_process.clj                      Update run "invoice" to fetch once, pass to processors
post_process.clj                      Update :accounts-match prompt template
post_process.clj                      Update :cost-center-match prompt template
booking_history_integration_test.clj  Update tests for new API