Note (2026-04-24): After this document was written,
legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.
The document detail UI has inline editing shipped as a UI-only
prototype (no persistence — see
2026-04-09-inline-editing-ui-design.md).
We need a backend that captures every edit with a full audit trail,
tolerates concurrent edits, handles re-ingestion cleanly, and enables
downstream LLM-improvement analysis from the corrections users make.
A unified append-only document_history table stores both ingestion
and edit events as RFC 6902 patches with an ID-based array-path
extension. document.structured_data remains the materialized current
state and gains a version column for optimistic locking. User edits
flow through HTMX-shaped endpoints that return HTML fragments;
responses vary by status code, not by response body shape.
document_history table; document.version column;
line-item.id + line-item.order fields; removal of
trg_update_document_from_ingestion trigger.

document_history, threaded through the detail view's Hiccup render.

document_history

CREATE TYPE document_history_change_type AS ENUM ('ingestion', 'edit');
CREATE TABLE document_history (
id UUID PRIMARY KEY DEFAULT uuidv7(),
document_id UUID NOT NULL REFERENCES document(id) ON DELETE CASCADE,
change_type document_history_change_type NOT NULL,
ingestion_id UUID REFERENCES ingestion(id) ON DELETE SET NULL,
edited_by UUID REFERENCES "identity"(id) ON DELETE SET NULL,
patch JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT document_history_source_xor CHECK (
(change_type = 'ingestion' AND ingestion_id IS NOT NULL AND edited_by IS NULL)
OR
(change_type = 'edit' AND edited_by IS NOT NULL AND ingestion_id IS NULL)
)
);
CREATE INDEX idx_document_history_document_created
ON document_history(document_id, created_at);
Append-only. One row per HTTP operation (ingestion completion or edit
request). ingestion_id / edited_by are mutually exclusive,
enforced by the XOR check. uuidv7 primary key so rows sort natively
by creation time.
document

ALTER TABLE document ADD COLUMN version INT NOT NULL DEFAULT 1;
Incremented in the same transaction as every document_history
write. Read by the UI into body[data-document-version], sent back
with every edit request as expected-version.
ingestion (and what stays)

Two things change:

- trg_update_document_from_ingestion is dropped. Its logic (type-setting and needs_human_review derivation) moves into the application's ingestion-completion handler.
- ingestion.structured_data and ingestion.valid_structured_data are no longer written by new code.

The two columns remain in the schema and keep their legacy values
for rows predating this release. A future migration will drop them;
tracked in resources/migrations/PENDING-CLEANUPS.md (see §6).
RFC 6902 JSON Patch stored as a JSONB array, with one extension for
array paths: elements are addressed by stable id, not index.
Examples:
// scalar replace
[{"op": "replace", "path": "/invoice-number", "value": "INV-2024-0099"}]
// nested field on a line item
[{"op": "replace", "path": "/line-items[id=li-abc]/debit-account/number", "value": "1200"}]
// full ingestion (always "replace" at root, first and subsequent)
[{"op": "replace", "path": "", "value": {"document-type": "invoice", "invoice-number": "...", "line-items": [...]}}]
// line-item add
[{"op": "add", "path": "/line-items/-", "value": {"id": "li-xyz", "order": 5, "description": "New line item", ...}}]
// line-item remove
[{"op": "remove", "path": "/line-items[id=li-abc]"}]
// reorder (multi-op)
[
{"op": "replace", "path": "/line-items[id=li-abc]/order", "value": 0},
{"op": "replace", "path": "/line-items[id=li-def]/order", "value": 1},
{"op": "replace", "path": "/line-items[id=li-ghi]/order", "value": 2}
]
The [id=X] segment is resolved to a concrete index by an
application-level applier (in-house, not a third-party JSON-Patch
library; we already own the code path). Applier contract:

- [id=X] matches any array element where element["id"] == X; the index is resolved by linear scan.
- Stored patches keep the [id=X] form; they remain valid across later reorderings that change numeric indices.

Every line item carries two new fields in structured_data:

- id: UUID string (java.util.UUID/randomUUID). Stable across reorderings. Generated server-side as the final post-processing step of an ingestion, and on POST /documents/:id/line-items.
- order: 0-based integer. UI sorts by this. Reassigned contiguously on reorder.

Both are added to the Malli LineItem schema as required fields.
Existing line items are backfilled in the migration.
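
A minimal sketch of the applier contract for [id=X] paths, assuming structured_data is held as a map with string keys and line items as a vector. The names parse-segment, resolve-path, and apply-replace are hypothetical, not the in-house applier's API. Only non-root replace ops are covered, and JSON Pointer escapes (~0, ~1) are ignored.

```clojure
(require '[clojure.string :as str])

(defn parse-segment
  "Splits one path segment: \"line-items[id=li-abc]\" gives
  [\"line-items\" \"li-abc\"]; a plain key gives [key nil]."
  [segment]
  (if-let [[_ k id] (re-matches #"(.+)\[id=(.+)\]" segment)]
    [k id]
    [segment nil]))

(defn resolve-path
  "Resolves a patch path against `data` into a key vector usable with
  get-in / assoc-in. An [id=X] segment is located by linear scan over
  the array; throws when no element carries that id."
  [data path]
  (reduce
   (fn [resolved segment]
     (let [[k id]   (parse-segment segment)
           with-key (conj resolved k)]
       (if (nil? id)
         with-key
         (let [items (get-in data with-key)
               idx   (first (keep-indexed
                             (fn [i item] (when (= id (get item "id")) i))
                             items))]
           (when (nil? idx)
             (throw (ex-info "id not resolvable" {:path path :id id})))
           (conj with-key idx)))))
   []
   (remove str/blank? (str/split path #"/"))))

(defn apply-replace
  "Applies a single non-root replace op to `data`."
  [data {:strs [path value]}]
  (assoc-in data (resolve-path data path) value))
```

Because the stored path keeps the id and the index is looked up at apply time, the same op still targets the right element after a reorder.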
structured_data does not contain document_history (see §4.2).

The worker pipeline gains an explicit terminal transaction replacing
what trg_update_document_from_ingestion used to do:
1. LLM extraction → raw structured_data (no id/order)
2. Post-processing enrichments (accounts, accruals, tax compliance, …)
3. Annotate each line-item with :id and :order (last post-processing step)
4. Malli schema validation: schema now requires :id and :order,
   so annotation must precede validation
5. Transactional write:
     begin tx
       SELECT document.version FOR UPDATE
       INSERT INTO document_history (change_type='ingestion', ingestion_id, patch)
         patch = [{"op": "replace", "path": "", "value": <final structured_data>}]
       UPDATE document SET
         structured_data = <final>,
         type = <extracted document-type>,
         needs_human_review = <derived from validation-results + schema validity>,
         version = version + 1,
         updated_at = now()
       UPDATE ingestion SET status = 'completed', completed_at = now(), …
     commit
Type-setting and needs_human_review derivation — formerly in the
trigger body — move into this handler. SELECT … FOR UPDATE on the
document row prevents racing with a concurrent user edit. Failed /
skipped ingestions skip this entirely and write no history row.
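
Step 3 of the pipeline could look roughly like the sketch below. It assumes structured data is a Clojure map with keyword keys and line items under :line-items; annotate-line-items is a hypothetical name, not the worker's actual function.

```clojure
(defn annotate-line-items
  "Adds a fresh UUID string :id and a 0-based :order to every line item,
  following each item's current position. Data without :line-items is
  returned unchanged."
  [structured-data]
  (if (contains? structured-data :line-items)
    (update structured-data :line-items
            (fn [items]
              (vec (map-indexed
                    (fn [idx item]
                      (assoc item
                             :id (str (java.util.UUID/randomUUID))
                             :order idx))
                    items))))
    structured-data))
```

Running this before Malli validation means the required :id and :order fields are always present by the time the schema is checked.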
Four HTMX-shaped endpoints. Each is a thin wrapper around the same core "apply patch, write history, bump version, return HTML" flow. The differences are only in how the form params become a patch.
| Route | Form / params | Patch produced (stored) |
|---|---|---|
| PATCH /documents/:id/structured-data | path, value, expected-version | [{op: replace, path, value}] |
| POST /documents/:id/line-items | expected-version | [{op: add, path: "/line-items/-", value: {id: <server uuid>, order: <max+1>, description: "New line item", page-location: [0,0]}}] |
| DELETE /documents/:id/line-items/:item-id | expected-version (query) | [{op: remove, path: "/line-items[id=<item-id>]"}] |
| PATCH /documents/:id/line-items | expected-version, item-id=a&item-id=b&… (positional, from SortableJS) | [{op: replace, path: "/line-items[id=<id>]/order", value: <index>}, …] |
Reorder lives on the collection path (PATCH /documents/:id/line-items)
rather than a child path to avoid a :item-id/order route conflict
and to reflect the collection-level semantics of a reorder.
Shared handler skeleton (Clojure pseudocode):
(defn apply-edit!
  [db-pool identity-id document-id expected-version patch-builder render-response]
  (jdbc/with-transaction [tx db-pool]
    ;; The FOR UPDATE row lock serializes this edit against concurrent
    ;; edits and against an ingestion-completion write.
    (let [{:document/keys [version structured-data legal-entity-id]}
          (fetch-document-for-update tx document-id)]
      (assert-tenant-membership! identity-id legal-entity-id)
      ;; expected-version must already be parsed to an integer here;
      ;; a raw form-param string would never equal the INT column.
      (if (not= version expected-version)
        ;; Stale client version: answer 409 with the current state.
        (conflict-response tx document-id structured-data version)
        (let [patch       (patch-builder structured-data)
              new-data    (apply-patch structured-data patch) ;; resolves [id=X]
              new-version (inc version)]
          (insert-history-row! tx {:document-id document-id
                                   :change-type :edit
                                   :edited-by   identity-id
                                   :patch       patch})
          (update-document! tx document-id new-data new-version)
          (render-response new-data new-version patch))))))
Each endpoint supplies a patch-builder (closure over the form
params) and a render-response (closure that renders the affected
Hiccup fragment).
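
As an illustration of the patch-builder closure, here is a sketch for the reorder endpoint. It assumes the handler has already collected the submitted item-id values into a vector, in order. reorder-patch-builder is a hypothetical name, and the sketch does not check that the submitted ids cover every line item.

```clojure
(defn reorder-patch-builder
  "Returns a patch-builder for the reorder endpoint. `item-ids` is the
  positional list of ids as submitted; each id receives its list
  position as the new order value."
  [item-ids]
  (fn [_structured-data]
    (vec (map-indexed
          (fn [idx item-id]
            {"op"    "replace"
             "path"  (str "/line-items[id=" item-id "]/order")
             "value" idx})
          item-ids))))
```

For example, ((reorder-patch-builder ["li-def" "li-abc"]) data) yields two replace ops that assign order 0 to li-def and order 1 to li-abc.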
Responses are HTML fragments. Status code selects the variant; HTMX swaps whatever came back. No JSON.
200 / 201 success

- editable-value wrapper at that path
- <tr> or card
- <tbody>
- body[data-document-version] with the new version.
- HX-Trigger: documentEdited (future-useful for cross-cutting listeners; not required for MVP).

409 stale version

- <tbody> is re-rendered, since structural changes may have happened.
- body[data-document-version] to the current version so the next attempt on the page doesn't also 409 instantly.
- HX-Trigger: editRejected, an optional hook for a toast.

422 invalid patch (unknown path, type mismatch, id not resolvable)

403 / 404

- HX-Trigger: toast with error text. HX-Reswap: none so the UI doesn't swap.

One check at the top of every handler: identity-id from session has
a row in tenant_membership for the document's legal entity. No role
gate. Fails → 403.
Minimal. editable-fields.js shrinks but does not disappear:
The commit path calls
htmx.ajax('PATCH', url, {values: {path, value, 'expected-version': document.body.dataset.documentVersion}, target: editableValueEl, swap: 'outerHTML'})
instead of mutating the DOM locally.

Add / delete / reorder require zero new JS beyond ~10 lines of
SortableJS glue wired via htmx.onLoad:
htmx.onLoad((content) => {
  content.querySelectorAll('.line-items-sortable').forEach((el) => {
    new Sortable(el, { animation: 150, handle: '.drag-handle' });
  });
});
The sortable <tbody> carries hx-patch, hx-trigger="end", and
hx-include="closest tbody". Each <tr> holds <input type="hidden" name="item-id" value="X">. SortableJS reorders the DOM; end fires;
HTMX submits rows' item-id values in their current DOM order.
Unchanged in shape: the view namespaces (view/invoice.clj, etc.)
render the detail page by reading document.structured_data. Two
additions:
- body[data-document-version] attribute, set from document.version, so every editable-value's outgoing hx-vals can pick up the current version.
- editable-value receives a CSS class and title tooltip when its path has been human-edited since the last ingestion.

The view handler computes a provenance-map once per page load and
threads it down to the components that render editable-values.
One helper, called once per document-detail request:
(defn document-provenance
  "Returns {path-string → {:edited-by, :edited-at}} for every path
  with a human edit since the most recent ingestion for this
  document. Paths absent from the map are implicitly LLM-sourced."
  [db-pool document-id]
  (let [rows        (db.sql/execute!
                     db-pool
                     {:select   [:*]
                      :from     [:document-history]
                      :where    [:= :document-id document-id]
                      :order-by [[:created-at :desc]]})
        ;; rows are newest-first; keep only edits made after the latest ingestion
        post-ingest (take-while #(not= "ingestion"
                                       (:document-history/change-type %))
                                rows)]
    ;; Fold oldest → newest and always assoc, so a later edit overwrites
    ;; an earlier one on the same path.
    (reduce (fn [acc {:document-history/keys [patch edited-by created-at]}]
              (reduce (fn [acc' op]
                        (assoc acc' (get op "path") {:edited-by edited-by
                                                     :edited-at created-at}))
                      acc
                      patch))
            {}
            (reverse post-ingest))))
Notes:

- document_history(document_id, created_at). take-while stops at the most recent ingestion row.
- Patch paths use the [id=X] extension; the returned map keys use the same form. Renderers resolve [id=X] → current numeric index once per line item when emitting the display.
- Paths whose [id=X] no longer resolves (line item was later removed) stay in the map but decorate nothing; the renderer never asks for them.

Minimal for MVP: a subtle dot or underline on edited
editable-values, with a title="Edited by {name} at {time}"
tooltip. No history dialog, no diff view — those are follow-ups.
editable-value gains an optional :provenance kwarg:
(editable-value path :text
{:provenance (get provenance-map path)}
display)
When :provenance is present, the wrapper renders with an extra
.is-human-edited class and a title attribute. Absent → identical
rest-state markup to today.
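
A sketch of how the wrapper's attribute map might be derived from :provenance. provenance-attrs is a hypothetical helper, and it assumes the caller has already resolved :edited-by to a display name and :edited-at to a printable value.

```clojure
(require '[clojure.string :as str])

(defn provenance-attrs
  "Returns `base-attrs` unchanged when there is no provenance; otherwise
  appends the is-human-edited class and sets a title tooltip."
  [base-attrs {:keys [edited-by edited-at] :as provenance}]
  (if provenance
    (-> base-attrs
        (update :class #(str/trim (str % " is-human-edited")))
        (assoc :title (str "Edited by " edited-by " at " edited-at)))
    base-attrs))
```

Returning the base map untouched in the nil case keeps the rest-state markup identical for paths with no human edit.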
Per document detail view:

document_history.

No caching for MVP. If detail view latency ever matters, per-request memoization or materialization can come later.
Implicit supersession, defined entirely by three read-side rules:

- The provenance read walks document_history newest → oldest and stops at the first change_type='ingestion' row. Edits older than that are never consulted.
- document.structured_data is always the materialized current state; reads never fold history. Re-ingestion's root-level replace patch overwrites any prior state when the worker applies it.
- No stored supersession marker (superseded_at column, archive table). "Superseded" is derivable from position in the history timeline: a row is active iff its created_at is greater than the most recent change_type='ingestion' row's created_at for that document.

Re-ingestion then proceeds as follows:

- A new ingestion row starts, status in-progress.
- On completion, the worker writes an ingestion history row with a root replace patch, updates document.structured_data, bumps version.
- An edit carrying a stale expected-version gets a 409 with the usual inline-error UX.
- Earlier edit rows stay in document_history for audit and for "fields the LLM historically got wrong" analyses.

User submits an edit just as a re-ingestion's transaction commits.
The edit's expected-version is stale → 409 with the freshly-ingested
value and the standard error banner. Expected behavior; falls out of
optimistic locking without special handling.
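
The "active iff newer than the latest ingestion row" rule from the supersession section can be expressed as a small pure function. This is a sketch over plain maps with hypothetical :change-type and :created-at keys, not the namespaced columns the real query returns.

```clojure
(defn active-edits
  "Given all history rows for one document (in any order), returns the
  edit rows created after the most recent ingestion row. With no
  ingestion row at all, every edit counts as active."
  [rows]
  (let [last-ingestion (->> rows
                            (filter #(= "ingestion" (:change-type %)))
                            (map :created-at)
                            sort
                            last)]
    (filter (fn [{:keys [change-type created-at]}]
              (and (= "edit" change-type)
                   (or (nil? last-ingestion)
                       (pos? (compare created-at last-ingestion)))))
            rows)))
```

Nothing is written to mark a row as superseded; the result is recomputed from timestamps on every read, which matches the read-side-only definition.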
One migration file + one code deploy. A second migration (dropping
the now-unused ingestion columns) is deferred and tracked in
PENDING-CLEANUPS.md.
-- up
CREATE TYPE document_history_change_type AS ENUM ('ingestion', 'edit');
CREATE TABLE document_history (
id UUID PRIMARY KEY DEFAULT uuidv7(),
document_id UUID NOT NULL REFERENCES document(id) ON DELETE CASCADE,
change_type document_history_change_type NOT NULL,
ingestion_id UUID REFERENCES ingestion(id) ON DELETE SET NULL,
edited_by UUID REFERENCES "identity"(id) ON DELETE SET NULL,
patch JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT document_history_source_xor CHECK (
(change_type = 'ingestion' AND ingestion_id IS NOT NULL AND edited_by IS NULL)
OR
(change_type = 'edit' AND edited_by IS NOT NULL AND ingestion_id IS NULL)
)
);
CREATE INDEX idx_document_history_document_created
ON document_history(document_id, created_at);
ALTER TABLE document ADD COLUMN version INT NOT NULL DEFAULT 1;
-- Backfill :id and :order onto existing line items.
-- COALESCE guards empty arrays: jsonb_agg over zero rows yields NULL,
-- and jsonb_set with a NULL argument would null out structured_data.
UPDATE document SET structured_data = (
  SELECT jsonb_set(
    structured_data,
    '{line-items}',
    COALESCE(
      (SELECT jsonb_agg(
                item || jsonb_build_object(
                  'id', gen_random_uuid()::text,
                  'order', (idx - 1)
                ) ORDER BY idx
              )
         FROM jsonb_array_elements(structured_data->'line-items')
              WITH ORDINALITY AS t(item, idx)),
      '[]'::jsonb)
  )
)
WHERE structured_data ? 'line-items'
  AND jsonb_typeof(structured_data->'line-items') = 'array';
-- Backfill one history row per document from its most recent successful ingestion.
-- Inner lateral join: document_history_source_xor rejects an 'ingestion' row
-- with a NULL ingestion_id, so documents without a completed ingestion get
-- no backfilled row.
INSERT INTO document_history (document_id, change_type, ingestion_id, patch, created_at)
SELECT
  d.id,
  'ingestion'::document_history_change_type,
  latest.id,
  jsonb_build_array(jsonb_build_object(
    'op', 'replace',
    'path', '',
    'value', d.structured_data
  )),
  COALESCE(latest.completed_at, d.created_at)
FROM document d
JOIN LATERAL (
  SELECT id, completed_at
  FROM ingestion
  WHERE ingestion.document_id = d.id
    AND status = 'completed'
  ORDER BY completed_at DESC NULLS LAST
  LIMIT 1
) latest ON TRUE
WHERE d.structured_data IS NOT NULL;
-- Drop the trigger; app-level code now handles ingestion → document
DROP TRIGGER IF EXISTS trg_update_document_from_ingestion ON ingestion;
DROP FUNCTION IF EXISTS update_document_from_ingestion();
- schema.invoice.structured-data/LineItem gains :id :string and :order :int as required fields (also updated in PO / GRN / contract line-item schemas wherever they exist).
- workers/ap/ingestion.clj adds id/order annotation as the final post-processing step, before schema validation.
- workers/ap/ingestion.clj completion handler: transactional document_history insert + document update + ingestion status update (replacing what the trigger used to do).
- app/http/documents/ per §3.2.
- editable-value hiccup helper threads expected-version from body[data-document-version] into every htmx.ajax call, accepts a :provenance option.
- View handlers call document-provenance once and thread the result through component renders.
- editable-fields.js: commit-path rewritten to call htmx.ajax instead of mutating the DOM locally.
- SortableJS htmx.onLoad glue.

ingestion.structured_data and ingestion.valid_structured_data stay
in the schema after this release. The new code no longer writes to
them; on existing rows they keep their last-trigger-era values; on
new ingestions they stay NULL. No reader in the new code depends on
them. They are documented for later removal in a tracked file:
resources/migrations/PENDING-CLEANUPS.md — created as part of
this change:
# Pending schema cleanups
Tracks columns, tables, and triggers that are no longer written or
read but have not yet been dropped. Each entry states what's stale,
what replaces it, and the gating condition for removal.
## `ingestion.structured_data` (JSONB)
- **Replaced by:** `document_history.patch` (for per-ingestion state)
and `document.structured_data` (for current materialized state).
- **Stopped being written:** <DATE OF MIGRATION>, when
`trg_update_document_from_ingestion` was dropped and the ingestion
worker's completion handler began writing to `document_history` +
`document` transactionally.
- **Gate to drop:** migration above verified stable in production for
enough time to have no open regressions referencing this column.
## `ingestion.valid_structured_data` (BOOLEAN)
- **Replaced by:** Malli schema validation in the ingestion worker,
recorded implicitly by the presence of a `document_history` row
with `change_type='ingestion'` (failed validation means no row).
- **Stopped being written:** same as above.
- **Gate to drop:** same as above.
The window between the migration commit (which drops the trigger) and
the new-code process accepting its first ingestion completion is
narrow. Migratus runs at startup before routes register, so any
ingestion that finishes during this window is handled by the new
code. Ingestions in in-progress status at deploy time fall into one
of two cases: (a) the ingestion was interrupted mid-flight and is
re-claimed by the new worker, where the new code path applies
cleanly; or (b) the pre-deploy worker already committed
structured_data via the trigger before the deploy, in which case the
result shows up in document.structured_data and is backfilled as a
history row just before the trigger drop.
All of the following are part of this implementation, not follow-up work.
- .claude/skills/debug-doc/SKILL.md, Step 5 (line 115+): replace the ap_ingestion.structured_data query with a document_history query joined on ingestion_id, alongside the existing document.structured_data query. Note that new ingestions no longer populate ingestion.structured_data. While in there, verify the ap_ingestion table name (appears stale post-rename; confirm and fix).
- .claude/skills/ingestion-regression-test/inspector-prompt.md: replace all three ingestion.structured_data reads (lines 38, 82, 89) with reads from the matching document_history row (WHERE ingestion_id = X), extracting patch->0->>'value'. This is the new baseline/current source of truth for per-ingestion extraction results.
- test/com/getorcha/workers/ap/ingestion_test.clj: update assertions that inspect ingestion.structured_data or :ap-ingestion/structured-data. New assertions read document.structured_data for current state and query document_history for the per-ingestion patch when the test specifically cares about the ingestion write. Add fresh tests for the new transactional write path (history row + document update + ingestion status), the version bump, and the optimistic-lock conflict.
- src/com/getorcha/schema/ingestion.clj: remove :ap-ingestion/structured-data and :ap-ingestion/valid-structured-data from the Malli schema. The columns physically remain in the DB per PENDING-CLEANUPS.md, but the code-level schema no longer exposes them; new code neither reads nor writes them until they are dropped.
- scripts/debug_fetch_document.clj: fetches a prod document + its ingestions into the local DB for debugging. Must also copy the document's document_history rows; otherwise local debug will not see edits or the ingestion-patch lineage.
- scripts/debug_common.clj: ingestion-jsonb-keys (which includes :structured-data) can stay as-is, since legacy rows still round-trip through it. Add a new jsonb-keys set (or extend document-jsonb-keys machinery) to cover the new document_history.patch column so history rows round-trip on insert/fetch.
- scripts/ingest.clj and scripts/export_invoices.clj: unaffected. Both read document.structured_data, which continues to carry the current state.

Any production SQL dashboards / ad-hoc queries reading
ingestion.structured_data will show stale values for old ingestions
and NULL for new ones. Release notes should call this out so the
owning engineers can update their queries to read from
document_history (for per-ingestion state) or
document.structured_data (for current state).