Note (2026-04-24): After this document was written, `legal_entity` was renamed to `tenant` and the old `tenant` was renamed to `organization`. Read references to these terms with the pre-rename meaning.
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Prevent garbled line-item extraction on dense documents (receipts) by adapting layout reconstruction tolerance and falling back to vision transcription when needed.
Architecture: Two layers — (1) adaptive tolerance in the shared layout reconstruction that tightens row grouping on dense pages, (2) vision fallback in the OCR pipeline triggered by gap analysis on non-PDFBox documents.
Tech Stack: Clojure, Google Document AI bounding boxes, Gemini Vision API (existing)
Task 1: Add `density-ratio` computation to `layout.clj`
Files:
Modify: src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
Modify: test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj
Step 1: Write failing test for density-ratio
```clojure
;; In layout_test.clj, add:
(deftest test-density-ratio
  (testing "Normal invoice layout — ratio above threshold"
    (let [elements [{:text "A" :x 0.1 :y 0.10 :width 0.3 :height 0.02}
                    {:text "B" :x 0.1 :y 0.15 :width 0.3 :height 0.02}
                    {:text "C" :x 0.1 :y 0.20 :width 0.3 :height 0.02}]]
      (is (> (#'layout/density-ratio elements) 0.7))))
  (testing "Dense receipt layout — ratio below threshold"
    ;; Simulates receipt: elements with height ~0.01, gaps ~0.003
    (let [elements [{:text "A" :x 0.1 :y 0.200 :width 0.3 :height 0.010}
                    {:text "B" :x 0.1 :y 0.213 :width 0.3 :height 0.010}
                    {:text "C" :x 0.1 :y 0.226 :width 0.3 :height 0.010}
                    {:text "D" :x 0.1 :y 0.239 :width 0.3 :height 0.010}]]
      (is (< (#'layout/density-ratio elements) 0.7))))
  (testing "Single element — returns default high ratio"
    (is (> (#'layout/density-ratio [{:text "A" :x 0.1 :y 0.1 :width 0.3 :height 0.02}])
           0.7)))
  (testing "Empty elements — returns default high ratio"
    (is (> (#'layout/density-ratio []) 0.7))))
```
Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"
Expected: FAIL — density-ratio not found
Implement `density-ratio`. Add it to `layout.clj` after the `midpoint-y` function:

```clojure
(defn ^:private density-ratio
  "Compute density ratio for a page's elements.
  Returns median_gap / median_height. Low values (< 0.7) indicate dense
  layouts where row grouping tolerance should be tightened.
  Returns 1.0 for fewer than 2 elements (not enough data)."
  [elements]
  (if (< (count elements) 2)
    1.0
    (let [sorted  (sort-by :y elements)
          heights (keep (fn [{:keys [height]}]
                          (when (and height (pos? height)) height))
                        sorted)
          ;; A gap is the vertical whitespace between consecutive elements:
          ;; top of the next element minus bottom of the previous one.
          ;; Overlapping elements (e.g. neighbours on the same visual row)
          ;; give non-positive values and are dropped.
          gaps    (->> (partition 2 1 sorted)
                       (map (fn [[a b]] (- (:y b) (+ (:y a) (:height a 0)))))
                       (filter pos?))]
      (if (or (empty? heights) (empty? gaps))
        1.0
        (let [median  (fn [coll]
                        (let [s (vec (sort coll))
                              n (count s)]
                          (if (odd? n)
                            (s (quot n 2))
                            (* 0.5 (+ (s (quot n 2)) (s (dec (quot n 2))))))))
              med-h   (median heights)
              med-gap (median gaps)]
          (if (pos? med-h)
            (/ med-gap med-h)
            1.0))))))
```
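To sanity-check the thresholds used in `test-density-ratio` without loading the project, the standalone sketch below recomputes the ratio for the two multi-element fixtures. It measures gaps edge to edge (top of the next element minus bottom of the previous one), which is how the receipt test's comment ("gaps ~0.003") reads them. `median` and `density-ratio-sketch` are illustrative names, not part of `layout.clj`, and the maps are reduced to `:y` and `:height`.

```clojure
;; Standalone sketch (illustrative names, not the layout.clj code):
;; ratio = median edge-to-edge gap / median element height.

(defn median
  "Median of a non-empty numeric collection."
  [coll]
  (let [s (vec (sort coll))
        n (count s)]
    (if (odd? n)
      (s (quot n 2))
      (* 0.5 (+ (s (quot n 2)) (s (dec (quot n 2))))))))

(defn density-ratio-sketch
  "Gap = top of next element minus bottom of previous (sorted by :y).
  Non-positive gaps (overlaps) are ignored. Returns 1.0 when there is
  not enough data."
  [elements]
  (let [sorted  (sort-by :y elements)
        heights (filter pos? (keep :height sorted))
        gaps    (->> (partition 2 1 sorted)
                     (map (fn [[a b]] (- (:y b) (+ (:y a) (:height a)))))
                     (filter pos?))]
    (if (or (< (count elements) 2) (empty? heights) (empty? gaps))
      1.0
      (/ (median gaps) (median heights)))))

;; Fixtures from test-density-ratio, reduced to :y and :height.
(def invoice [{:y 0.10 :height 0.02} {:y 0.15 :height 0.02} {:y 0.20 :height 0.02}])
(def receipt [{:y 0.200 :height 0.010} {:y 0.213 :height 0.010}
              {:y 0.226 :height 0.010} {:y 0.239 :height 0.010}])

(density-ratio-sketch invoice) ; ≈ 1.5 (gap 0.03 / height 0.02)
(density-ratio-sketch receipt) ; ≈ 0.3 (gap 0.003 / height 0.010)
```

Measuring top-to-top pitch instead would give 0.013 / 0.010 = 1.3 for the receipt fixture, which is above the 0.7 threshold, so the dense-layout assertion would not pass.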
Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"
Expected: All tests pass
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
git add src/com/getorcha/workers/ap/ingestion/transcription/layout.clj test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj
git commit -m "feat: add density-ratio computation for layout elements"
Task 2: Make `same-row?` and `group-into-rows` density-aware
Files:
Modify: src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
Modify: test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj
Step 1: Write failing test for dense-mode row grouping
```clojure
;; In layout_test.clj, add to test-group-into-rows:
(testing "Dense mode separates tightly packed items that normal mode merges"
  ;; Simulates IKEA receipt: VAT code at y=0.2073 and next item's article
  ;; at y=0.2086 are only 0.0013 apart. Normal tolerance merges them.
  (let [elements [{:text "3,99"   :x 0.4 :y 0.2040 :width 0.06 :height 0.0101}
                  {:text "7,98"   :x 0.6 :y 0.2052 :width 0.06 :height 0.0105}
                  {:text "0"      :x 0.8 :y 0.2073 :width 0.02 :height 0.0076}
                  {:text "Art 90" :x 0.1 :y 0.2086 :width 0.15 :height 0.0151}]]
    ;; Normal mode: all 4 elements in one row (current behavior)
    (is (= 1 (count (#'layout/group-into-rows elements false)))
        "Normal mode merges these elements")
    ;; Dense mode: Art 90 should be in a separate row
    (is (= 2 (count (#'layout/group-into-rows elements true)))
        "Dense mode separates the article line from prices")))
```
Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"
Expected: FAIL — wrong arity for group-into-rows (currently takes 1 arg)
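The numbers in this test can be checked by hand. The sketch below assumes `midpoint-y` is `y + height/2` (the real `midpoint-y` in `layout.clj` is not shown in this plan) and copies the two tolerance formulas from the `same-row?` change in this task; `tolerance` and `distance` are illustrative names.

```clojure
;; Assumption: midpoint-y = y + height/2 (not confirmed against layout.clj).
(defn midpoint-y [{:keys [y height]}]
  (+ y (/ height 2)))

;; Same formulas as same-row?: 0.75 * larger height normally,
;; 0.5 * smaller height in dense mode.
(defn tolerance [anchor element dense?]
  (if dense?
    (* 0.5 (min (:height anchor) (:height element)))
    (* 0.75 (max (:height anchor) (:height element)))))

(def anchor  {:text "3,99"   :y 0.2040 :height 0.0101})
(def article {:text "Art 90" :y 0.2086 :height 0.0151})

;; Midpoints ≈ 0.20905 and 0.21615, so the distance is ≈ 0.0071.
(def distance
  (Math/abs (double (- (midpoint-y anchor) (midpoint-y article)))))

(< distance (tolerance anchor article false)) ; normal: 0.0071 < 0.011325, merged
(< distance (tolerance anchor article true))  ; dense: 0.0071 >= 0.00505, split
```

Using the smaller height in dense mode matters here: with 0.5 × the larger height the window would be 0.00755, and the tall `Art 90` element would still join the price row.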
Update `same-row?` and `group-into-rows` to accept a dense mode. In `layout.clj`, replace the existing `same-row?` and `group-into-rows`:
```clojure
(defn ^:private same-row?
  "Check if element belongs in the current row using midpoint proximity.
  Compares the element's Y-midpoint against the row anchor (first element).
  The anchor is fixed, preventing transitive chain growth in tight tables.
  In dense mode (dense? true), uses tighter tolerance with min height
  to prevent cross-item bleeding on receipts and similar layouts."
  [anchor element dense?]
  (let [tolerance (if dense?
                    (* 0.5 (min (:height anchor) (:height element)))
                    (* 0.75 (max (:height anchor) (:height element))))]
    (< (abs (- (midpoint-y anchor) (midpoint-y element)))
       tolerance)))

(defn ^:private group-into-rows
  "Group elements into visual rows using anchor-based midpoint proximity.
  The first element of each row is the anchor; subsequent elements join
  if their Y-midpoint is within tolerance of the anchor's midpoint.
  Returns seq of rows, each row being a seq of elements.
  When dense? is true, uses tighter tolerance to avoid merging elements
  from adjacent logical items."
  ([elements]
   (group-into-rows elements false))
  ([elements dense?]
   (->> elements
        (sort-by :y)
        (reduce
         (fn [rows element]
           (if (empty? rows)
             [[element]]
             (let [current-row (peek rows)
                   anchor      (first current-row)]
               (if (same-row? anchor element dense?)
                 (conj (pop rows) (conj current-row element))
                 (conj rows [element])))))
         []))))
```
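The docstring's claim that a fixed anchor prevents transitive chain growth can be seen on a small synthetic column. The sketch below compares anchor-based grouping with a hypothetical chained variant that compares each element to the previous one. Both use the normal-mode tolerance and assume `midpoint-y` is `y + height/2`; `close?`, `group-rows` and the `column` data are illustrative.

```clojure
;; Illustrative comparison; assumes midpoint-y = y + height/2.
(defn midpoint-y [{:keys [y height]}]
  (+ y (/ height 2)))

;; Normal-mode rule from same-row?: within 0.75 * the larger height.
(defn close? [a b]
  (< (Math/abs (double (- (midpoint-y a) (midpoint-y b))))
     (* 0.75 (max (:height a) (:height b)))))

(defn group-rows
  "Groups elements (sorted by :y) into rows. pick-ref chooses which element
  of the current row a new element is compared against."
  [pick-ref elements]
  (reduce (fn [rows e]
            (let [row (peek rows)]
              (if (and row (close? (pick-ref row) e))
                (conj (pop rows) (conj row e))
                (conj rows [e]))))
          []
          (sort-by :y elements)))

;; Four lines 0.006 apart, each 0.010 tall, so the tolerance is 0.0075.
(def column (for [y [0.100 0.106 0.112 0.118]] {:y y :height 0.010}))

(count (group-rows first column)) ; anchor (first element of row): 2 rows
(count (group-rows peek column))  ; chained (last element of row): 1 row
```

In the chained variant every step is within tolerance, so four lines spanning 0.018 collapse into one row; the anchor version starts a new row once the distance from the row's first element exceeds 0.0075.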
Update `elements->structured-text` to use density-aware grouping. In `layout.clj`, replace `elements->structured-text`:
```clojure
(defn elements->structured-text
  "Convert positioned text elements into structured row-based text.
  Takes a seq of `{:text :x :y :width :height}` maps and options.
  Returns a string with rows separated by newlines and columns by `|`.
  Automatically detects dense layouts (receipts, etc.) and tightens
  row grouping tolerance to prevent cross-item bleeding.
  Options:
    :column-gap-threshold - normalized gap above which a `|` separator is
                            inserted (default 0.05)"
  [elements {:keys [column-gap-threshold]
             :or   {column-gap-threshold 0.05}
             :as   _opts}]
  (let [dense? (< (density-ratio elements) 0.7)]
    (->> (group-into-rows elements dense?)
         (map #(row->text % column-gap-threshold))
         (remove str/blank?)
         (str/join "\n"))))
```
Verify existing tests that call `group-into-rows` directly. The existing tests in `test-group-into-rows` call `(#'layout/group-into-rows elements)` with one argument. The new arity-1 version defaults to `dense?` false, so these tests should still pass without changes; run them to confirm.
Run: clj -X:test:silent :nses '[com.getorcha.workers.ap.ingestion.transcription.layout-test]' 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"
Expected: All tests pass
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
git add src/com/getorcha/workers/ap/ingestion/transcription/layout.clj test/com/getorcha/workers/ap/ingestion/transcription/layout_test.clj
git commit -m "feat: density-aware row grouping in layout reconstruction"
Task 3: Add `needs-dense-layout-fallback?` to `transcription.clj`
Files:
Modify: src/com/getorcha/workers/ap/ingestion/transcription.clj
Step 1: Add the density check function
Add after needs-vision-fallback? (around line 493) in transcription.clj:
```clojure
(defn ^:private needs-dense-layout-fallback?
  "Check if OCR result has dense page layouts that cause garbled row grouping.
  Computes per-page gap statistics from Document AI line bounding boxes.
  Returns true if any page has a density ratio below the threshold.
  Requires >20 elements on a page to trigger (avoids false positives on
  sparse pages with coincidentally tight spacing)."
  [{:keys [dense-layout-ratio] :as _vision-config}
   {:keys [raw-response] :as _ocr-result}]
  (when (and dense-layout-ratio (seq raw-response))
    (some (fn [response]
            (let [document-text (get-in response [:document :text])]
              (some (fn [page]
                      (when (seq (:lines page))
                        (let [elements (ocr-layout/page->elements document-text page)]
                          (when (> (count elements) 20)
                            (< (layout/density-ratio elements) dense-layout-ratio)))))
                    (get-in response [:document :pages]))))
          raw-response)))
```
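The per-page rule (more than 20 elements and a ratio below `:dense-layout-ratio`, with a missing ratio disabling the check) can be sketched on plain data. Here pages are vectors of element maps and `ratio-fn` stands in for `layout/density-ratio`; `dense-page?`, `any-dense-page?` and the page shape are simplified assumptions, not the Document AI response structure.

```clojure
;; A page counts as dense only when it has enough elements to trust the statistic.
(defn dense-page? [ratio-fn threshold elements]
  (and (> (count elements) 20)
       (< (ratio-fn elements) threshold)))

;; A nil threshold (key absent from config) disables the check entirely.
(defn any-dense-page? [ratio-fn threshold pages]
  (boolean
   (and threshold
        (some #(dense-page? ratio-fn threshold %) pages))))

(def small-page (vec (repeat 5 {})))
(def large-page (vec (repeat 25 {})))

(any-dense-page? (constantly 0.2) 0.7 [small-page])            ; sparse page ignored
(any-dense-page? (constantly 0.2) 0.7 [small-page large-page]) ; second page triggers
(any-dense-page? (constantly 0.2) nil [large-page])            ; check disabled
```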
This requires page->elements to be public and density-ratio to be public. We'll fix visibility in the next steps.
Confirm `page->elements` is public in `ocr_layout.clj`. In `ocr_layout.clj`, `page->elements` (line 50) is already public (no `^:private` metadata), so no change is needed.
Make `density-ratio` public in `layout.clj`. In `layout.clj`, remove the `^:private` metadata from `density-ratio`; the docstring and body stay as written in Task 1:
```clojure
(defn density-ratio
  "Compute density ratio for a page's elements.
  Returns median_gap / median_height. Low values (< 0.7) indicate dense
  layouts where row grouping tolerance should be tightened.
  Returns 1.0 for fewer than 2 elements (not enough data)."
  [elements]
  ;; ... implementation unchanged
  )
```
Require `layout` in `transcription.clj`. First check whether the ns declaration already requires it; if not, add it to the `:require` vector (keeping alphabetical order):

```clojure
[com.getorcha.workers.ap.ingestion.transcription.layout :as layout]
```
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription.clj src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
git add src/com/getorcha/workers/ap/ingestion/transcription.clj src/com/getorcha/workers/ap/ingestion/transcription/layout.clj
git commit -m "feat: add dense layout detection for vision fallback"
Task 4: Wire dense-layout detection into `ocr-with-vision-fallback`
Files:
Modify: src/com/getorcha/workers/ap/ingestion/transcription.clj
Modify: resources/com/getorcha/config.edn
Step 1: Add config parameter
In `config.edn`, add `:dense-layout-ratio` to the `:vision` transcription config (lines 248-252):

```clojure
:vision {:pages-per-batch 10
         :max-image-size-mb 7
         ;; Fallback triggers when >low-confidence-ratio of tokens have confidence <low-confidence-threshold
         :low-confidence-threshold 0.8
         :low-confidence-ratio 0.05
         ;; Fallback triggers when any page with more than 20 elements has a density ratio below this threshold
         :dense-layout-ratio 0.7}
```
Update `ocr-with-vision-fallback` to check for dense layouts. Replace the existing `ocr-with-vision-fallback` function:
```clojure
(defn ^:private ocr-with-vision-fallback
  "Run OCR with optional vision fallback for low-quality results.
  Vision fallback triggers when:
  - Token confidence is low (existing), OR
  - Page layout is too dense for reliable row grouping (new)
  Vision fallback only applies to PDFs (renders pages to images for Gemini)."
  [context vision-config worker-pools db-pool legal-entity-id ingestion]
  (let [mime-type       (get-in ingestion [:file :mime-type])
        ocr-result      (ocr-transcribe! context ingestion)
        ;; Each check runs at most once; the dense-layout check only runs
        ;; when the confidence check passes. nil means no fallback.
        fallback-reason (when (and (pdf? mime-type)
                                   vision-config
                                   (:api-key vision-config))
                          (cond
                            (needs-vision-fallback? vision-config ocr-result)       :low-confidence
                            (needs-dense-layout-fallback? vision-config ocr-result) :dense-layout))
        ocr-with-layout (ocr-layout/reconstruct-layout ocr-result)]
    (if fallback-reason
      (do
        (log/info "OCR fallback to vision"
                  {:reason                  fallback-reason
                   :ocr-token-quality-stats (:ocr-token-quality-stats ocr-result)})
        (try
          (vision-transcribe! vision-config
                              (:preprocessing-pool worker-pools)
                              db-pool
                              legal-entity-id
                              ingestion)
          (catch Exception e
            (log/warn "Vision fallback failed, using OCR result"
                      {:error (ex-message e)})
            ocr-with-layout)))
      ocr-with-layout)))
```

Note: both fallback checks read the raw Document AI response in `ocr-result` (line bounding boxes), not the reconstructed text, so the order of the `let` bindings does not affect them. The `reconstruct-layout` result (which now uses the adaptive tolerance from Task 2) is returned when no fallback is needed and when the vision call fails.
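The fallback decision can be isolated as a pure function to make its precedence explicit. In this sketch the checks are passed as zero-argument functions (a hypothetical shape chosen for illustration, not the real signatures) so that the short-circuit evaluation in `ocr-with-vision-fallback` is visible: the dense-layout check only runs for PDFs with an API key when the confidence check has not already triggered.

```clojure
;; Illustrative only: checks are zero-argument functions so evaluation order is visible.
(defn fallback-reason
  "Returns :low-confidence, :dense-layout, or nil (no fallback)."
  [{:keys [pdf? api-key low-confidence? dense-layout?]}]
  (when (and pdf? api-key)
    (cond
      (low-confidence?) :low-confidence
      (dense-layout?)   :dense-layout)))

;; Counts how often the dense-layout check actually runs.
(def dense-calls (atom 0))

(defn dense-check []
  (swap! dense-calls inc)
  true)

(fallback-reason {:pdf? false :api-key "key"
                  :low-confidence? (constantly false) :dense-layout? dense-check})
;; => nil; neither check runs for non-PDFs
(fallback-reason {:pdf? true :api-key "key"
                  :low-confidence? (constantly true) :dense-layout? dense-check})
;; => :low-confidence; dense check skipped
(fallback-reason {:pdf? true :api-key "key"
                  :low-confidence? (constantly false) :dense-layout? dense-check})
;; => :dense-layout
@dense-calls
;; => 1
```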
Run: clj-kondo --lint src/com/getorcha/workers/ap/ingestion/transcription.clj
Run: clj -X:test:silent 2>&1 | grep -A 5 -E "(FAIL in|ERROR in|Ran .* tests)"
Expected: All tests pass
git add src/com/getorcha/workers/ap/ingestion/transcription.clj resources/com/getorcha/config.edn
git commit -m "feat: wire dense layout vision fallback into OCR pipeline"
Task 5: Manual end-to-end verification
Files: None (manual verification)
Use the `/clojure-eval` skill to restart the system in the REPL. Allow extra time (~15s) for the restart:

```clojure
(integrant.repl/reset)
```

Then requeue the receipt document:

```clojure
(require '[com.getorcha.app.ingestion :as app.ingestion])

(app.ingestion/requeue-document!
 (:com.getorcha.db/pool integrant.repl.state/system)
 (:com.getorcha.aws/state integrant.repl.state/system)
 #uuid "019d2e52-e9c9-70e0-88f6-353601d75aa8")
```
Query the ingestion status:
```bash
psql -h localhost -U postgres -d orcha -c "SELECT id, status, transcription_method, vision_fallback_used FROM ap_ingestion WHERE document_id = '019d2e52-e9c9-70e0-88f6-353601d75aa8' ORDER BY created_at DESC LIMIT 1"
```
Then check a few line items match the PDF:
```bash
psql -h localhost -U postgres -d orcha -t -A -c "SELECT structured_data FROM document WHERE id = '019d2e52-e9c9-70e0-88f6-353601d75aa8'" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for li in d['line-items'][:5]:
    print(f\"{li['description']:<35} qty={li.get('quantity',1)} up={li.get('unit-price')} amt={li.get('amount')}\")
print(f\"Sum: {sum(li['amount'] for li in d['line-items']):.2f}\")
print(f\"Total: {d['total']}\")
"
```
Expected: First item SAMLA Box 39x28x28 c should have qty=2, unit-price=3.99, amount=7.98 (matching the PDF). Sum of line items should approximate the total (375.45).
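The same expectations can also be checked from the REPL. This sketch assumes the line items have already been read into maps with keyword keys (`:quantity`, `:unit-price`, `:amount`) mirroring the JSON keys in the query above; the function names and the half-cent tolerance are illustrative choices, not existing project code.

```clojure
(defn line-item-consistent?
  "True when quantity * unit-price matches amount to within half a cent.
  Quantity defaults to 1, as in the Python check above."
  [{:keys [quantity unit-price amount] :or {quantity 1}}]
  (< (Math/abs (double (- (* quantity unit-price) amount))) 0.005))

(defn sum-matches-total?
  "True when the line-item amounts sum to total within tolerance."
  [line-items total tolerance]
  (< (Math/abs (double (- (reduce + 0 (map :amount line-items)) total)))
     tolerance))

;; The first expected item from the receipt.
(def samla {:description "SAMLA Box 39x28x28 c" :quantity 2 :unit-price 3.99 :amount 7.98})

(line-item-consistent? samla) ; => true
```

For the full receipt, pass all line items, the expected total (375.45) and a tolerance that reflects how closely the sum is expected to match.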