Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Dense Layout Handling Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Prevent garbled line-item extraction on dense documents (receipts) by adapting layout reconstruction tolerance and falling back to vision transcription when needed.

Architecture: Two layers — (1) adaptive tolerance in the shared layout reconstruction that tightens row grouping on dense pages, (2) vision fallback in the OCR pipeline triggered by gap analysis on non-PDFBox documents.

Tech Stack: Clojure, Google Document AI bounding boxes, Gemini Vision API (existing)


Task 1: Add density-ratio computation to layout.clj

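Files:
- Modify: src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
- Test: test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj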

;; In layout_test.clj, add:

(deftest test-density-ratio
  (testing "Normal invoice layout — ratio above threshold"
    (let [elements [{:text "A" :x 0.1 :y 0.10 :width 0.3 :height 0.02}
                    {:text "B" :x 0.1 :y 0.15 :width 0.3 :height 0.02}
                    {:text "C" :x 0.1 :y 0.20 :width 0.3 :height 0.02}]]
      (is (> (#'layout/density-ratio elements) 0.7))))

  (testing "Dense receipt layout — ratio below threshold"
    ;; Simulates receipt: elements with height ~0.01, gaps ~0.003
    (let [elements [{:text "A" :x 0.1 :y 0.200 :width 0.3 :height 0.010}
                    {:text "B" :x 0.1 :y 0.213 :width 0.3 :height 0.010}
                    {:text "C" :x 0.1 :y 0.226 :width 0.3 :height 0.010}
                    {:text "D" :x 0.1 :y 0.239 :width 0.3 :height 0.010}]]
      (is (< (#'layout/density-ratio elements) 0.7))))

  (testing "Single element — returns default high ratio"
    (is (> (#'layout/density-ratio [{:text "A" :x 0.1 :y 0.1 :width 0.3 :height 0.02}])
           0.7)))

  (testing "Empty elements — returns default high ratio"
    (is (> (#'layout/density-ratio []) 0.7))))

Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: FAIL, because density-ratio does not exist yet and #'layout/density-ratio cannot be resolved when the test namespace loads.

Add to layout.clj after the midpoint-y function:

(defn ^:private density-ratio
  "Compute density ratio for a page's elements.
   Returns median_gap / median_height. Low values (< 0.7) indicate dense
   layouts where row grouping tolerance should be tightened.
   Returns 1.0 for fewer than 2 elements (not enough data)."
  [elements]
  (if (< (count elements) 2)
    1.0
    (let [sorted  (sort-by :y elements)
          heights (keep #(when (pos? (:height %)) (:height %)) sorted)
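          ;; Gap = whitespace between the bottom edge of one element and the
          ;; top edge of the next. Overlapping boxes give negative gaps, which
          ;; (filter pos?) drops.
          gaps    (->> (partition 2 1 sorted)
                       (map (fn [[a b]] (- (:y b) (+ (:y a) (:height a)))))
                       (filter pos?))]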
      (if (or (empty? heights) (empty? gaps))
        1.0
        (let [median   (fn [coll]
                         (let [s (vec (sort coll))
                               n (count s)]
                           (if (odd? n)
                             (s (quot n 2))
                             (* 0.5 (+ (s (quot n 2)) (s (dec (quot n 2))))))))
              med-h    (median heights)
              med-gap  (median gaps)]
          (if (pos? med-h)
            (/ med-gap med-h)
            1.0))))))
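The ratio depends on what counts as a "gap". A quick REPL check on the two fixtures from test-density-ratio compares two readings: the top-to-top distance (pitch) between consecutive elements, and the whitespace between one element's bottom edge and the next one's top edge. This is a self-contained sketch. median, gap-height-ratio, pitch-ratio and whitespace-ratio are illustrative names, not functions in layout.clj, and it assumes :y is the top edge of the bounding box.

```clojure
(defn- median
  "Median of a non-empty numeric collection."
  [coll]
  (let [s (vec (sort coll))
        n (count s)]
    (if (odd? n)
      (s (quot n 2))
      (* 0.5 (+ (s (quot n 2)) (s (dec (quot n 2))))))))

(defn- gap-height-ratio
  "median(positive gaps) / median(heights) for elements sorted by :y.
   gap-fn measures the vertical gap between two consecutive elements."
  [gap-fn elements]
  (let [sorted (sort-by :y elements)
        gaps   (->> (partition 2 1 sorted)
                    (map (fn [[a b]] (gap-fn a b)))
                    (filter pos?))]
    (/ (median gaps) (median (map :height sorted)))))

;; Reading 1: top-to-top distance (pitch) between consecutive elements.
(defn pitch-ratio [elements]
  (gap-height-ratio (fn [a b] (- (:y b) (:y a))) elements))

;; Reading 2: whitespace between the bottom edge of a and the top edge of b.
(defn whitespace-ratio [elements]
  (gap-height-ratio (fn [a b] (- (:y b) (+ (:y a) (:height a)))) elements))

(def normal-invoice
  [{:text "A" :x 0.1 :y 0.10 :width 0.3 :height 0.02}
   {:text "B" :x 0.1 :y 0.15 :width 0.3 :height 0.02}
   {:text "C" :x 0.1 :y 0.20 :width 0.3 :height 0.02}])

(def dense-receipt
  [{:text "A" :x 0.1 :y 0.200 :width 0.3 :height 0.010}
   {:text "B" :x 0.1 :y 0.213 :width 0.3 :height 0.010}
   {:text "C" :x 0.1 :y 0.226 :width 0.3 :height 0.010}
   {:text "D" :x 0.1 :y 0.239 :width 0.3 :height 0.010}])

(pitch-ratio normal-invoice)      ;; ~2.5
(whitespace-ratio normal-invoice) ;; ~1.5
(pitch-ratio dense-receipt)       ;; ~1.3, above the 0.7 threshold
(whitespace-ratio dense-receipt)  ;; ~0.3, below the 0.7 threshold
```

Both readings classify the invoice as sparse, but only the whitespace reading puts the receipt fixture below 0.7, which is what the "Dense receipt layout" test expects.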

Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: All tests pass

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription/layout.clj

git add src/com/getorcha/workers/ap/ingestion/transcription/layout.clj test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj
git commit -m "feat: add density-ratio computation for layout elements"

Task 2: Make same-row? and group-into-rows density-aware

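Files:
- Modify: src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
- Test: test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj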

;; In layout_test.clj, add to test-group-into-rows:

  (testing "Dense mode separates tightly packed items that normal mode merges"
    ;; Simulates IKEA receipt: VAT code at y=0.2073 and next item's article
    ;; at y=0.2086 are only 0.0013 apart. Normal tolerance merges them.
    (let [elements [{:text "3,99"   :x 0.4 :y 0.2040 :width 0.06 :height 0.0101}
                    {:text "7,98"   :x 0.6 :y 0.2052 :width 0.06 :height 0.0105}
                    {:text "0"      :x 0.8 :y 0.2073 :width 0.02 :height 0.0076}
                    {:text "Art 90" :x 0.1 :y 0.2086 :width 0.15 :height 0.0151}]]
      ;; Normal mode: all 4 elements in one row (current behavior)
      (is (= 1 (count (#'layout/group-into-rows elements false)))
          "Normal mode merges these elements")
      ;; Dense mode: Art 90 should be in a separate row
      (is (= 2 (count (#'layout/group-into-rows elements true)))
          "Dense mode separates the article line from prices")))

Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: ERROR. group-into-rows currently takes only 1 argument, so the 2-argument call throws an ArityException.

In layout.clj, replace the existing same-row? and group-into-rows:

(defn ^:private same-row?
  "Check if element belongs in the current row using midpoint proximity.
   Compares the element's Y-midpoint against the row anchor (first element).
   The anchor is fixed, preventing transitive chain growth in tight tables.

   In dense mode (dense? true), uses tighter tolerance with min height
   to prevent cross-item bleeding on receipts and similar layouts."
  [anchor element dense?]
  (let [tolerance (if dense?
                    (* 0.5 (min (:height anchor) (:height element)))
                    (* 0.75 (max (:height anchor) (:height element))))]
    (< (abs (- (midpoint-y anchor) (midpoint-y element)))
       tolerance)))


(defn ^:private group-into-rows
  "Group elements into visual rows using anchor-based midpoint proximity.
   The first element of each row is the anchor; subsequent elements join
   if their Y-midpoint is within tolerance of the anchor's midpoint.
   Returns seq of rows, each row being a seq of elements.

   When dense? is true, uses tighter tolerance to avoid merging elements
   from adjacent logical items."
  ([elements]
   (group-into-rows elements false))
  ([elements dense?]
   (->> elements
        (sort-by :y)
        (reduce
         (fn [rows element]
           (if (empty? rows)
             [[element]]
             (let [current-row (peek rows)
                   anchor      (first current-row)]
               (if (same-row? anchor element dense?)
                 (conj (pop rows) (conj current-row element))
                 (conj rows [element])))))
         []))))
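To see why the two modes split the IKEA fixture differently, compare the midpoint distance between the anchor ("3,99") and "Art 90" with each tolerance. A standalone sketch, assuming midpoint-y is (+ y (/ height 2)) as its name suggests; tolerance is a hypothetical helper that mirrors the two branches of same-row?.

```clojure
(defn midpoint-y
  "Assumed definition: vertical centre of the bounding box (:y is the top edge)."
  [{:keys [y height]}]
  (+ y (/ height 2)))

(defn tolerance
  "Mirrors the two branches of same-row?."
  [anchor element dense?]
  (if dense?
    (* 0.5 (min (:height anchor) (:height element)))
    (* 0.75 (max (:height anchor) (:height element)))))

(def anchor  {:text "3,99"   :y 0.2040 :height 0.0101})
(def article {:text "Art 90" :y 0.2086 :height 0.0151})

;; Distance between the two midpoints: ~0.0071
(def distance (Math/abs (- (midpoint-y anchor) (midpoint-y article))))

(tolerance anchor article false) ;; ~0.0113: distance is inside, same row
(tolerance anchor article true)  ;; ~0.00505: distance is outside, new row
```

The other two elements ("7,98" and "0") sit within about 0.002 of the anchor's midpoint, which is inside both tolerances, so they stay in the first row in either mode.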

In layout.clj, replace elements->structured-text:

(defn elements->structured-text
  "Convert positioned text elements into structured row-based text.

   Takes a seq of `{:text :x :y :width :height}` maps and options.
   Returns a string with rows separated by newlines and columns by `|`.

   Automatically detects dense layouts (receipts, etc.) and tightens
   row grouping tolerance to prevent cross-item bleeding.

   Options:
     :column-gap-threshold - normalized gap above which a `|` separator is
                             inserted (default 0.05)"
  [elements {:keys [column-gap-threshold]
             :or   {column-gap-threshold 0.05}
             :as   _opts}]
  (let [dense? (< (density-ratio elements) 0.7)]
    (->> (group-into-rows elements dense?)
         (map #(row->text % column-gap-threshold))
         (remove str/blank?)
         (str/join "\n"))))

The existing tests in test-group-into-rows call (#'layout/group-into-rows elements) with 1 arg. The new arity-1 version defaults to dense? false, so existing tests should still pass without changes. Verify by running them.

Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: All tests pass

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription/layout.clj

git add src/com/getorcha/workers/ap/ingestion/transcription/layout.clj test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj
git commit -m "feat: density-aware row grouping in layout reconstruction"

Task 3: Add needs-dense-layout-fallback? to transcription

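Files:
- Modify: src/com/getorcha/workers/ap/ingestion/transcription.clj
- Modify: src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
- Check only: ocr_layout.clj (no change expected)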

Add after needs-vision-fallback? (around line 493) in transcription.clj:

(defn ^:private needs-dense-layout-fallback?
  "Check if OCR result has dense page layouts that cause garbled row grouping.
   Computes per-page gap statistics from Document AI line bounding boxes.
   Returns true if any page has a density ratio below the threshold.
   Requires >20 elements on a page to trigger (avoids false positives on
   sparse pages with coincidentally tight spacing)."
  [{:keys [dense-layout-ratio] :as _vision-config}
   {:keys [raw-response] :as _ocr-result}]
  (when (and dense-layout-ratio (seq raw-response))
    (some (fn [response]
            (let [document-text (get-in response [:document :text])]
              (some (fn [page]
                      (when (seq (:lines page))
                        (let [elements (ocr-layout/page->elements document-text page)]
                          (when (> (count elements) 20)
                            (< (layout/density-ratio elements) dense-layout-ratio)))))
                    (get-in response [:document :pages]))))
          raw-response)))
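The traversal can be checked without Document AI. In this self-contained sketch, dense-layout? and fake-page are hypothetical names. Each page carries a pre-built :elements vector instead of going through ocr-layout/page->elements, so the (seq (:lines page)) check is not modelled. The ratio function is passed in, so the example exercises only the nesting over batches and pages, the short-circuit through some, and the more-than-20-elements guard. Unlike the real function, which returns nil when nothing matches, the sketch wraps the result in boolean.

```clojure
(defn dense-layout?
  "Sketch of needs-dense-layout-fallback? over pre-built elements.
   raw-response is a seq of batch responses, each {:document {:pages [...]}}."
  [ratio-fn threshold raw-response]
  (boolean
   (some (fn [response]
           (some (fn [{:keys [elements]}]
                   (and (> (count elements) 20)
                        (< (ratio-fn elements) threshold)))
                 (get-in response [:document :pages])))
         raw-response)))

(defn fake-page
  "Page with n identical elements; only the count matters here."
  [n]
  {:elements (vec (repeat n {:text "x" :x 0.1 :y 0.1 :width 0.3 :height 0.01}))})

(def always-dense (constantly 0.3))

;; Two batches; only the second one has a page with more than 20 elements.
(def raw-response
  [{:document {:pages [(fake-page 5)]}}
   {:document {:pages [(fake-page 3) (fake-page 21)]}}])

(dense-layout? always-dense 0.7 raw-response)                            ;; => true
(dense-layout? always-dense 0.7 [{:document {:pages [(fake-page 20)]}}]) ;; => false, 20 is not > 20
(dense-layout? (constantly 1.5) 0.7 raw-response)                        ;; => false
```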

This requires both ocr-layout/page->elements and layout/density-ratio to be public. The next two steps check and fix their visibility.

In ocr_layout.clj, page->elements (line 50) is already public (no ^:private metadata). No change needed.

In layout.clj, remove the ^:private metadata from density-ratio so transcription.clj can call it. The docstring and body stay unchanged:

(defn density-ratio
  "Compute density ratio for a page's elements.
   Returns median_gap / median_height. Low values (< 0.7) indicate dense
   layouts where row grouping tolerance should be tightened.
   Returns 1.0 for fewer than 2 elements (not enough data)."
  [elements]
  ;; ... implementation unchanged

If transcription.clj does not already require the layout namespace, add it to the :require vector of its ns declaration (keeping alphabetical order):

[com.getorcha.workers.ap.ingestion.transcription.layout :as layout]

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription.clj src/com/getorcha/workers/ap/ingestion/transcription/layout.clj

git add src/com/getorcha/workers/ap/ingestion/transcription.clj src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
git commit -m "feat: add dense layout detection for vision fallback"

Task 4: Wire dense layout check into ocr-with-vision-fallback

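Files:
- Modify: resources/com/getorcha/config.edn
- Modify: src/com/getorcha/workers/ap/ingestion/transcription.clj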

In resources/com/getorcha/config.edn, add :dense-layout-ratio to the :vision transcription config (lines 248-252):

:vision  {:pages-per-batch          10
          :max-image-size-mb        7
          ;; Fallback triggers when >low-confidence-ratio of tokens have confidence <low-confidence-threshold
          :low-confidence-threshold 0.8
          :low-confidence-ratio     0.05
          ;; Fallback triggers when any page has density ratio below this threshold
          :dense-layout-ratio       0.7}

Replace the existing ocr-with-vision-fallback function:

(defn ^:private ocr-with-vision-fallback
  "Run OCR with optional vision fallback for low-quality results.
   Vision fallback triggers when:
   - Token confidence is low (existing), OR
   - Page layout is too dense for reliable row grouping (new)
   Vision fallback only applies to PDFs (renders pages to images for Gemini)."
  [context vision-config worker-pools db-pool legal-entity-id ingestion]
  (let [mime-type        (get-in ingestion [:file :mime-type])
        ocr-result       (ocr-transcribe! context ingestion)
        needs-fallback?  (and (pdf? mime-type)
                              vision-config
                              (:api-key vision-config)
                              (or (needs-vision-fallback? vision-config ocr-result)
                                  (needs-dense-layout-fallback? vision-config ocr-result)))
        ocr-with-layout  (ocr-layout/reconstruct-layout ocr-result)]
    (if needs-fallback?
      (do
        (log/info "OCR fallback to vision"
                  {:reason (cond
                             (needs-vision-fallback? vision-config ocr-result) :low-confidence
                             :else :dense-layout)
                   :ocr-token-quality-stats (:ocr-token-quality-stats ocr-result)})
        (try
          (vision-transcribe! vision-config
                              (:preprocessing-pool worker-pools)
                              db-pool
                              legal-entity-id
                              ingestion)
          (catch Exception e
            (log/warn "Vision fallback failed, using OCR result"
                      {:error (ex-message e)})
            ocr-with-layout)))
      ocr-with-layout)))

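Note: needs-dense-layout-fallback? reads :raw-response from ocr-result, so the density check sees the raw Document AI line boxes rather than the reconstructed text; the order of the let bindings (needs-fallback? before reconstruct-layout) does not affect this. reconstruct-layout, which now applies the adaptive tolerance from Task 2, still runs on every call, and its result is returned when no fallback is needed or when vision transcription fails.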
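As written, ocr-with-vision-fallback calls needs-vision-fallback? twice when the low-confidence check fires: once inside needs-fallback? and again in the cond that picks the log :reason. One option is to choose the reason once and derive the decision from it. A minimal sketch, where fallback-reason is a hypothetical helper and the two checks are passed in as zero-argument thunks so the logic can be exercised without running OCR:

```clojure
(defn fallback-reason
  "Returns :low-confidence, :dense-layout, or nil when no fallback applies.
   low-confidence? and dense-layout? are zero-arg thunks, so the dense check
   only runs when the confidence check does not already trigger."
  [{:keys [pdf? api-key]} low-confidence? dense-layout?]
  (when (and pdf? api-key)
    (cond
      (low-confidence?) :low-confidence
      (dense-layout?)   :dense-layout)))

(fallback-reason {:pdf? true :api-key "k"} (constantly true) (constantly true))    ;; => :low-confidence
(fallback-reason {:pdf? true :api-key "k"} (constantly false) (constantly true))   ;; => :dense-layout
(fallback-reason {:pdf? true :api-key "k"} (constantly false) (constantly false))  ;; => nil
(fallback-reason {:pdf? false :api-key "k"} (constantly true) (constantly true))   ;; => nil
```

In ocr-with-vision-fallback, the thunks would wrap (needs-vision-fallback? vision-config ocr-result) and (needs-dense-layout-fallback? vision-config ocr-result), needs-fallback? becomes (some? reason), and the log line uses reason directly.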

Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription.clj

Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"

Expected: All tests pass

git add src/com/getorcha/workers/ap/ingestion/transcription.clj resources/com/getorcha/config.edn
git commit -m "feat: wire dense layout vision fallback into OCR pipeline"

Task 5: Integration test — re-ingest IKEA receipt

Files: None (manual verification)

;; In REPL
(integrant.repl/reset)

Use /clojure-eval skill. Allow extra time (~15s) for system restart.

(require '[com.getorcha.app.ingestion :as app.ingestion])
(app.ingestion/requeue-document!
  (:com.getorcha.db/pool integrant.repl.state/system)
  (:com.getorcha.aws/state integrant.repl.state/system)
  #uuid "019d2e52-e9c9-70e0-88f6-353601d75aa8")

Query the ingestion status:

psql -h localhost -U postgres -d orcha -c "SELECT id, status, transcription_method, vision_fallback_used FROM ap_ingestion WHERE document_id = '019d2e52-e9c9-70e0-88f6-353601d75aa8' ORDER BY created_at DESC LIMIT 1"

Then check a few line items match the PDF:

psql -h localhost -U postgres -d orcha -t -A -c "SELECT structured_data FROM document WHERE id = '019d2e52-e9c9-70e0-88f6-353601d75aa8'" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for li in d['line-items'][:5]:
    print(f\"{li['description']:<35} qty={li.get('quantity',1)} up={li.get('unit-price')} amt={li.get('amount')}\")
print(f\"Sum: {sum(li['amount'] for li in d['line-items']):.2f}\")
print(f\"Total: {d['total']}\")
"

Expected: the first item, "SAMLA Box 39x28x28 c", has qty=2, unit-price=3.99 and amount=7.98, matching the PDF. The sum of the line items should approximately equal the total (375.45).
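Spot-checking five rows catches the SAMLA case, but a small REPL helper can flag every item whose quantity times unit price disagrees with its amount. A sketch, assuming structured_data parses into a map with the same line-items, quantity, unit-price and amount keys the Python script reads (as keywords); line-item-mismatches is a hypothetical name, and the 0.01 tolerance is an assumption. The garbled item is made-up input, not real extraction output.

```clojure
(defn line-item-mismatches
  "Returns the line items whose quantity * unit-price differs from amount by
   more than tol. Quantity defaults to 1, as in the Python check; items
   missing unit-price or amount are skipped."
  [{:keys [line-items]} tol]
  (filterv (fn [{:keys [quantity unit-price amount]}]
             (and unit-price
                  amount
                  (> (Math/abs (double (- (* (or quantity 1) unit-price) amount)))
                     tol)))
           line-items))

(def expected-item
  {:description "SAMLA Box 39x28x28 c" :quantity 2 :unit-price 3.99 :amount 7.98})

;; Hypothetical garbled extraction: quantity lost, amount kept.
(def garbled-item (assoc expected-item :quantity 1))

;; Returns a vector containing only garbled-item.
(line-item-mismatches {:line-items [expected-item garbled-item]} 0.01)
```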