Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Document Edit History — Design

1. Scope & goals

Problem

The document detail UI has inline editing shipped as a UI-only prototype (no persistence — see 2026-04-09-inline-editing-ui-design.md). We need a backend that captures every edit with a full audit trail, tolerates concurrent edits, handles re-ingestion cleanly, and enables downstream LLM-improvement analysis from the corrections users make.

Approach

A unified append-only document_history table stores both ingestion and edit events as RFC 6902 patches with an ID-based array-path extension. document.structured_data remains the materialized current state and gains a version column for optimistic locking. User edits flow through HTMX-shaped endpoints that return HTML fragments; responses vary by status code, not by response body shape.

In scope

Out of scope, explicitly deferred

2. Storage model

New table: document_history

CREATE TYPE document_history_change_type AS ENUM ('ingestion', 'edit');

CREATE TABLE document_history (
    id            UUID        PRIMARY KEY DEFAULT uuidv7(),
    document_id   UUID        NOT NULL REFERENCES document(id) ON DELETE CASCADE,
    change_type   document_history_change_type NOT NULL,
    ingestion_id  UUID        REFERENCES ingestion(id) ON DELETE SET NULL,
    edited_by     UUID        REFERENCES "identity"(id) ON DELETE SET NULL,
    patch         JSONB       NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),

    CONSTRAINT document_history_source_xor CHECK (
        (change_type = 'ingestion' AND ingestion_id IS NOT NULL AND edited_by IS NULL)
        OR
        (change_type = 'edit'      AND edited_by    IS NOT NULL AND ingestion_id IS NULL)
    )
);

CREATE INDEX idx_document_history_document_created
    ON document_history(document_id, created_at);

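Append-only. One row per operation (an ingestion completion or an edit request). ingestion_id and edited_by are mutually exclusive, enforced by the XOR check. The primary key is a uuidv7, so rows sort natively by creation time. Because the check requires the relevant source column to be non-null, the ON DELETE SET NULL actions cannot succeed as written: deleting a referenced ingestion or identity fails with a check violation instead of nulling the column.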

Changes to document

ALTER TABLE document ADD COLUMN version INT NOT NULL DEFAULT 1;

Incremented in the same transaction as every document_history write. Read by the UI into body[data-document-version], sent back with every edit request as expected-version.

Changes to ingestion (and what stays)

Two things change:

  1. The trigger trg_update_document_from_ingestion is dropped. Its logic (type-setting and needs_human_review derivation) moves into the application's ingestion-completion handler.
  2. The ingestion worker stops writing to ingestion.structured_data and ingestion.valid_structured_data.

The two columns remain in the schema — they are no longer written by new code but keep their legacy values for rows predating this release. A future migration will drop them; tracked in resources/migrations/PENDING-CLEANUPS.md (see §6).

Patch format

RFC 6902 JSON Patch stored as a JSONB array, with one extension for array paths: elements are addressed by stable id, not index.

Examples:

// scalar replace
[{"op": "replace", "path": "/invoice-number", "value": "INV-2024-0099"}]

// nested field on a line item
[{"op": "replace", "path": "/line-items[id=li-abc]/debit-account/number", "value": "1200"}]

// full ingestion (always "replace" at root, first and subsequent)
[{"op": "replace", "path": "", "value": {"document-type": "invoice", "invoice-number": "...", "line-items": [...]}}]

// line-item add
[{"op": "add", "path": "/line-items/-", "value": {"id": "li-xyz", "order": 5, "description": "New line item", ...}}]

// line-item remove
[{"op": "remove", "path": "/line-items[id=li-abc]"}]

// reorder (multi-op)
[
  {"op": "replace", "path": "/line-items[id=li-abc]/order", "value": 0},
  {"op": "replace", "path": "/line-items[id=li-def]/order", "value": 1},
  {"op": "replace", "path": "/line-items[id=li-ghi]/order", "value": 2}
]
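The examples above can be exercised with a small applier. The following is a minimal sketch, not the project's in-house applier: the function names (parsePath, indexOfId, descend, applyOp, applyPatch) are hypothetical, only the add, replace, and remove ops used in this document are handled, and RFC 6901 escaping (~0, ~1) is ignored on the assumption that keys and ids never contain "/" or "~".

```javascript
// Hypothetical sketch; function names and error handling are assumptions.

// "/line-items[id=li-abc]/order" ->
//   [{ key: "line-items", id: "li-abc" }, { key: "order", id: null }]
function parsePath(path) {
  if (path === "") return []; // root
  return path.slice(1).split("/").map((segment) => {
    const match = /^(.*)\[id=([^\]]+)\]$/.exec(segment);
    return match ? { key: match[1], id: match[2] } : { key: segment, id: null };
  });
}

// Resolve a stable id to the element's current index, failing loudly if absent.
function indexOfId(array, id) {
  const index = Array.isArray(array) ? array.findIndex((el) => el.id === id) : -1;
  if (index === -1) throw new Error(`no element with id=${id}`);
  return index;
}

// Step one path segment down from `node`.
function descend(node, { key, id }) {
  const child = node[key];
  if (child === undefined) throw new Error(`missing key: ${key}`);
  return id === null ? child : child[indexOfId(child, id)];
}

function applyOp(doc, { op, path, value }) {
  const segments = parsePath(path);
  if (segments.length === 0) {
    if (op !== "replace") throw new Error(`unsupported root op: ${op}`);
    return value; // root replace, as written by ingestion
  }
  const parent = segments.slice(0, -1).reduce(descend, doc);
  const { key, id } = segments[segments.length - 1];
  if (id !== null) {
    const index = indexOfId(parent[key], id);
    if (op === "replace") parent[key][index] = value;
    else if (op === "remove") parent[key].splice(index, 1);
    else throw new Error(`unsupported op on id segment: ${op}`);
  } else if (op === "add" && key === "-" && Array.isArray(parent)) {
    parent.push(value); // "/line-items/-" appends
  } else if (op === "add" || (op === "replace" && key in parent)) {
    parent[key] = value;
  } else if (op === "remove" && key in parent) {
    delete parent[key];
  } else {
    throw new Error(`cannot apply ${op} at ${path}`);
  }
  return doc;
}

// Apply all ops to a deep copy so the caller's document is never mutated.
function applyPatch(doc, patch) {
  return patch.reduce(applyOp, JSON.parse(JSON.stringify(doc)));
}
```

Resolving the id at apply time keeps stored patches valid after reorders or removals, since the index of li-abc may differ between the moment the patch was written and the moment it is replayed.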

The [id=X] segment is resolved to a concrete index by an application-level applier (in-house, not a third-party JSON-Patch library — we already own the code path). Applier contract:

Line-item shape additions

Every line item carries two new fields in structured_data:

Both are added to the Malli LineItem schema as required fields. Existing line items are backfilled in the migration.

What structured_data does not contain

3. Write paths

3.1 Ingestion completion (replacing the trigger)

The worker pipeline gains an explicit terminal transaction replacing what trg_update_document_from_ingestion used to do:

1. LLM extraction → raw structured_data (no id/order)
2. Post-processing enrichments (accounts, accruals, tax compliance, …)
3. Annotate each line-item with :id and :order (last post-processing step)
4. Malli schema validation — schema now requires :id and :order,
   so annotation must precede validation
5. Transactional write:
   begin tx
     SELECT document.version FOR UPDATE
     INSERT INTO document_history (change_type='ingestion', ingestion_id, patch)
       patch = [{"op": "replace", "path": "", "value": <final structured_data>}]
     UPDATE document SET
       structured_data    = <final>,
       type               = <extracted document-type>,
       needs_human_review = <derived from validation-results + schema validity>,
       version            = version + 1,
       updated_at         = now()
     UPDATE ingestion SET status = 'completed', completed_at = now(), …
   commit

Type-setting and needs_human_review derivation — formerly in the trigger body — move into this handler. SELECT … FOR UPDATE on the document row prevents racing with a concurrent user edit. Failed / skipped ingestions skip this entirely and write no history row.
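Step 3 of the pipeline (annotating line items before validation) could look roughly like the sketch below. The function name annotateLineItems and the injected newId generator are assumptions made for illustration; the zero-based order matches the values used in the reorder example and the migration backfill.

```javascript
// Hypothetical sketch of pipeline step 3. `newId` is injected so the function
// stays deterministic under test; a UUID generator would be passed in practice
// (an assumption, the design only says the id is stable).
function annotateLineItems(structuredData, newId) {
  const items = structuredData["line-items"];
  if (!Array.isArray(items)) return structuredData; // nothing to annotate
  return {
    ...structuredData,
    "line-items": items.map((item, index) => ({
      ...item,
      id: newId(),   // stable id used by [id=X] path segments
      order: index,  // zero-based position at ingestion time
    })),
  };
}
```

Running this as the last post-processing step means the schema validation in step 4 sees the same shape that is later written to document.structured_data.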

3.2 User edit endpoints

Four HTMX-shaped endpoints. Each is a thin wrapper around the same core "apply patch, write history, bump version, return HTML" flow. The differences are only in how the form params become a patch.

| Route | Form / params | Patch produced (stored) |
|---|---|---|
| PATCH /documents/:id/structured-data | path, value, expected-version | [{op: replace, path, value}] |
| POST /documents/:id/line-items | expected-version | [{op: add, path: "/line-items/-", value: {id: <server uuid>, order: <max+1>, description: "New line item", page-location: [0,0]}}] |
| DELETE /documents/:id/line-items/:item-id | expected-version (query) | [{op: remove, path: "/line-items[id=<item-id>]"}] |
| PATCH /documents/:id/line-items | expected-version, item-id=a&item-id=b&… (positional, from SortableJS) | [{op: replace, path: "/line-items[id=<id>]/order", value: <index>}, …] |

Reorder lives on the collection path (PATCH /documents/:id/line-items) rather than a child path to avoid a :item-id/order route conflict and to reflect the collection-level semantics of a reorder.
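As an illustration of how the positional item-id params might become the stored reorder patch, here is a minimal sketch. The function name buildReorderPatch and the validation against the existing ids are assumptions; the design only specifies the shape of the resulting ops.

```javascript
// Hypothetical sketch: turn item ids (in submitted DOM order) into one
// "replace order" op per line item.
function buildReorderPatch(itemIds, existingIds) {
  // Assumed safety check: the submission must be a permutation of the
  // document's current line-item ids (no duplicates, none missing or unknown).
  const submitted = new Set(itemIds);
  const valid =
    submitted.size === itemIds.length &&
    itemIds.length === existingIds.length &&
    existingIds.every((id) => submitted.has(id));
  if (!valid) throw new Error("item-id list does not match current line items");

  return itemIds.map((id, index) => ({
    op: "replace",
    path: `/line-items[id=${id}]/order`,
    value: index,
  }));
}
```

For example, submitting item-id=li-def&item-id=li-abc against a document holding li-abc and li-def yields two ops that set li-def to order 0 and li-abc to order 1.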

Shared handler skeleton (Clojure pseudocode):

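(defn apply-edit!
  "Core flow shared by all edit endpoints: lock the document row, authorize,
   compare versions, then apply the patch, append a history row, and bump
   the version, all in one transaction."
  [db-pool identity-id document-id expected-version patch-builder render-response]
  (jdbc/with-transaction [tx db-pool]
    ;; Row lock (SELECT … FOR UPDATE) serializes edits and ingestion completions.
    (let [{:document/keys [version structured-data legal-entity-id]}
          (fetch-document-for-update tx document-id)]
      (assert-tenant-membership! identity-id legal-entity-id)  ;; fails → 403
      ;; expected-version must already be parsed to an integer here; a raw
      ;; form-param string would never equal the INT column value.
      (if (not= version expected-version)
        ;; Stale client: write nothing and return the 409 fragment.
        (conflict-response tx document-id structured-data version)
        (let [patch       (patch-builder structured-data)
              new-data    (apply-patch structured-data patch)  ;; resolves [id=X]
              new-version (inc version)]
          (insert-history-row! tx {:document-id document-id
                                   :change-type :edit
                                   :edited-by   identity-id
                                   :patch       patch})
          (update-document! tx document-id new-data new-version)
          (render-response new-data new-version patch))))))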

Each endpoint supplies a patch-builder (closure over the form params) and a render-response (closure that renders the affected Hiccup fragment).

3.3 Response contract

Responses are HTML fragments. Status code selects the variant; HTMX swaps whatever came back. No JSON.
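One practical detail: by default htmx does not swap the body of 4xx and 5xx responses, so swapping a 409 fragment needs an opt-in. The sketch below uses the documented htmx:beforeSwap event and its detail.shouldSwap and detail.isError fields. The handler name allowConflictSwap is hypothetical, and whether this project wires it this way, or handles other statuses such as 403 the same way, is an assumption.

```javascript
// Hypothetical sketch: opt in to swapping 409 responses so the conflict
// fragment reaches the DOM. Kept as a plain function so it can be exercised
// without a browser.
function allowConflictSwap(event) {
  if (event.detail.xhr.status === 409) {
    event.detail.shouldSwap = true; // swap the returned HTML fragment
    event.detail.isError = false;   // do not treat it as an htmx error
  }
}

// Register only when running in a browser.
if (typeof document !== "undefined" && document.body) {
  document.body.addEventListener("htmx:beforeSwap", allowConflictSwap);
}
```

Other statuses fall through untouched, so successful responses and genuine server errors keep htmx's default behavior.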

3.4 Authorization

One check at the top of every handler: identity-id from session has a row in tenant_membership for the document's legal entity. No role gate. Fails → 403.

3.5 Client-side JS

Minimal. editable-fields.js shrinks but does not disappear:

Add / delete / reorder require zero new JS beyond ~10 lines of SortableJS glue wired via htmx.onLoad:

htmx.onLoad((content) => {
  content.querySelectorAll('.line-items-sortable').forEach((el) => {
    new Sortable(el, { animation: 150, handle: '.drag-handle' });
  });
});

The sortable <tbody> carries hx-patch, hx-trigger="end", and hx-include="closest tbody". Each <tr> holds <input type="hidden" name="item-id" value="X">. SortableJS reorders the DOM; end fires; HTMX submits rows' item-id values in their current DOM order.

4. Read path & provenance

4.1 Rendering the document detail view

Unchanged in shape: the view namespaces (view/invoice.clj, etc.) render the detail page by reading document.structured_data. Two additions:

  1. body[data-document-version] attribute, set from document.version, so every editable-value's outgoing hx-vals can pick up the current version.
  2. Provenance decoration — each editable-value receives a CSS class and title tooltip when its path has been human-edited since the last ingestion.

The view handler computes a provenance-map once per page load and threads it down to the components that render editable-values.

4.2 Provenance computation (on-the-fly)

One helper, called once per document-detail request:

(defn document-provenance
  "Returns {path-string → {:edited-by, :edited-at}} for every path
   with a human edit since the most recent ingestion for this
   document. Paths absent from the map are implicitly LLM-sourced."
  [db-pool document-id]
  (let [rows        (db.sql/execute!
                     db-pool
                     {:select   [:*]
                      :from     [:document-history]
                      :where    [:= :document-id document-id]
                      :order-by [[:created-at :desc]]})
        post-ingest (take-while #(not= "ingestion"
                                       (:document-history/change-type %))
                                rows)]
    ;; post-ingest is newest-first, so the first entry recorded for a path
    ;; is its most recent edit; older edits to the same path are skipped.
    (reduce (fn [acc {:document-history/keys [patch edited-by created-at]}]
              (reduce (fn [acc' op]
                        (let [path (get op "path")]
                          (if (contains? acc' path)
                            acc'  ;; a newer edit already claimed this path
                            (assoc acc' path {:edited-by edited-by
                                              :edited-at created-at}))))
                      acc
                      patch))
            {}
            post-ingest)))

Notes:

4.3 Provenance indicator in the UI

Minimal for MVP: a subtle dot or underline on edited editable-values, with a title="Edited by {name} at {time}" tooltip. No history dialog, no diff view — those are follow-ups.

editable-value gains an optional :provenance kwarg:

(editable-value path :text
                {:provenance (get provenance-map path)}
                display)

When :provenance is present, the wrapper renders with an extra .is-human-edited class and a title attribute. Absent → identical rest-state markup to today.

4.4 Costs

Per document detail view:

No caching for MVP. If detail view latency ever matters, per-request memoization or materialization can come later.

5. Re-ingestion & supersession

Implicit supersession, defined entirely by three read-side rules:

  1. The provenance walker (§4.2) reads document_history newest → oldest and stops at the first change_type='ingestion' row. Edits older than that are never consulted.
  2. document.structured_data is always the materialized current state; reads never fold history. Re-ingestion's root-level replace patch overwrites any prior state when the worker applies it.
  3. No DB-level supersession marker (superseded_at column, archive table). "Superseded" is derivable from position in the history timeline — a row is active iff its created_at is greater than the most recent change_type='ingestion' row's created_at for that document.
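Rule 3 can be expressed as a small predicate over history rows. This is a minimal sketch under stated assumptions: the function name activeEdits, the camelCase row keys, and numeric createdAt timestamps are all hypothetical stand-ins for the real row shape.

```javascript
// Hypothetical sketch of the "active iff newer than the latest ingestion" rule.
// `rows` are the history rows for one document, in any order; createdAt is
// assumed to be a comparable number (for example epoch milliseconds).
function activeEdits(rows) {
  let latestIngestionAt = null;
  for (const row of rows) {
    if (
      row.changeType === "ingestion" &&
      (latestIngestionAt === null || row.createdAt > latestIngestionAt)
    ) {
      latestIngestionAt = row.createdAt;
    }
  }
  // Edits at or before the latest ingestion are superseded.
  return rows.filter(
    (row) =>
      row.changeType === "edit" &&
      (latestIngestionAt === null || row.createdAt > latestIngestionAt)
  );
}
```

Nothing is written to mark a row as superseded; the same rows yield a different answer as soon as a newer ingestion row is appended.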

On re-ingestion

  1. A new ingestion row starts, status in-progress.
  2. Worker runs extraction + post-processing + id/order annotation + schema validation, exactly like the first ingestion.
  3. Worker's completion transaction (§3.1): inserts a new ingestion history row with a root replace patch, updates document.structured_data, bumps version.
  4. Any in-flight user edit with the old expected-version gets a 409 with the usual inline-error UX.

What users perceive

Edge case: edit in flight when re-ingestion completes

User submits an edit just as a re-ingestion's transaction commits. The edit's expected-version is stale → 409 with the freshly-ingested value and the standard error banner. Expected behavior; falls out of optimistic locking without special handling.
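The race can be walked through with an in-memory stand-in for the locked document row. This is only an illustration: tryEdit and completeIngestion are hypothetical names, the 409 status comes from this design, and the 200 success status is an assumption.

```javascript
// Hypothetical simulation of the edge case. In the real flow both writers
// take the row lock first; here they simply run one after the other.
function completeIngestion(doc, structuredData) {
  doc.structuredData = structuredData; // root-level replace
  doc.version += 1;
}

function tryEdit(doc, expectedVersion, newInvoiceNumber) {
  if (doc.version !== expectedVersion) {
    // Stale edit: nothing is written, the caller gets the current value back.
    return {
      status: 409,
      currentValue: doc.structuredData["invoice-number"],
      version: doc.version,
    };
  }
  doc.structuredData = { ...doc.structuredData, "invoice-number": newInvoiceNumber };
  doc.version += 1;
  return { status: 200, version: doc.version }; // 200 is an assumption
}

const doc = { version: 1, structuredData: { "invoice-number": "INV-1" } };
const pageVersion = doc.version;                         // user loads the page at version 1
completeIngestion(doc, { "invoice-number": "INV-2" });   // re-ingestion commits first
const result = tryEdit(doc, pageVersion, "INV-1-fixed"); // edit arrives with a stale version
console.log(result); // { status: 409, currentValue: 'INV-2', version: 2 }
```

The user's value is discarded rather than merged, which is the intended outcome: they see the freshly ingested value and can redo the edit against version 2.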

6. Migration plan

One migration file + one code deploy. A second migration (dropping the now-unused ingestion columns) is deferred and tracked in PENDING-CLEANUPS.md.

Migration file (ships with the release)

-- up

CREATE TYPE document_history_change_type AS ENUM ('ingestion', 'edit');

CREATE TABLE document_history (
    id            UUID        PRIMARY KEY DEFAULT uuidv7(),
    document_id   UUID        NOT NULL REFERENCES document(id) ON DELETE CASCADE,
    change_type   document_history_change_type NOT NULL,
    ingestion_id  UUID        REFERENCES ingestion(id) ON DELETE SET NULL,
    edited_by     UUID        REFERENCES "identity"(id) ON DELETE SET NULL,
    patch         JSONB       NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),

    CONSTRAINT document_history_source_xor CHECK (
        (change_type = 'ingestion' AND ingestion_id IS NOT NULL AND edited_by IS NULL)
        OR
        (change_type = 'edit'      AND edited_by    IS NOT NULL AND ingestion_id IS NULL)
    )
);

CREATE INDEX idx_document_history_document_created
    ON document_history(document_id, created_at);

ALTER TABLE document ADD COLUMN version INT NOT NULL DEFAULT 1;

-- Backfill :id and :order onto existing line items
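UPDATE document SET structured_data = (
    SELECT jsonb_set(
        structured_data,
        '{line-items}',
        -- jsonb_agg over an empty array yields NULL, and jsonb_set returns
        -- NULL for a NULL argument, which would wipe structured_data.
        -- Fall back to an empty array in that case.
        COALESCE(
            (SELECT jsonb_agg(
                item || jsonb_build_object(
                    'id',    gen_random_uuid()::text,
                    'order', (idx - 1)
                ) ORDER BY idx
            )
            FROM jsonb_array_elements(structured_data->'line-items')
                 WITH ORDINALITY AS t(item, idx)),
            '[]'::jsonb
        )
    )
)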
WHERE structured_data ? 'line-items'
  AND jsonb_typeof(structured_data->'line-items') = 'array';

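-- Backfill one history row per document from its most recent successful ingestion.
-- NOTE: the LEFT JOIN keeps documents that have no completed ingestion; for
-- those rows latest.id is NULL, which violates document_history_source_xor
-- and aborts the migration. Verify no such documents exist before running.
INSERT INTO document_history (document_id, change_type, ingestion_id, patch, created_at)
SELECT
    d.id,
    'ingestion'::document_history_change_type,
    latest.id,
    jsonb_build_array(jsonb_build_object(
        'op',    'replace',
        'path',  '',
        'value', d.structured_data
    )),
    COALESCE(latest.completed_at, d.created_at)
FROM document d
LEFT JOIN LATERAL (
    SELECT id, completed_at
    FROM ingestion
    WHERE ingestion.document_id = d.id
      AND status = 'completed'
    -- DESC sorts NULLs first by default; keep rows with a timestamp ahead of them
    ORDER BY completed_at DESC NULLS LAST
    LIMIT 1
) latest ON TRUE
WHERE d.structured_data IS NOT NULL;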

-- Drop the trigger; app-level code now handles ingestion → document
DROP TRIGGER IF EXISTS trg_update_document_from_ingestion ON ingestion;
DROP FUNCTION IF EXISTS update_document_from_ingestion();

Code deploy (same release)

Deferred cleanup

ingestion.structured_data and ingestion.valid_structured_data stay in the schema after this release. The new code no longer writes to them; on existing rows they keep their last-trigger-era values; on new ingestions they stay NULL. No reader in the new code depends on them. They are documented for later removal in a tracked file:

resources/migrations/PENDING-CLEANUPS.md — created as part of this change:

# Pending schema cleanups

Tracks columns, tables, and triggers that are no longer written or
read but have not yet been dropped. Each entry states what's stale,
what replaces it, and the gating condition for removal.

## `ingestion.structured_data` (JSONB)

- **Replaced by:** `document_history.patch` (for per-ingestion state)
  and `document.structured_data` (for current materialized state).
- **Stopped being written:** <DATE OF MIGRATION>, when
  `trg_update_document_from_ingestion` was dropped and the ingestion
  worker's completion handler began writing to `document_history` +
  `document` transactionally.
- **Gate to drop:** migration above verified stable in production for
  enough time to have no open regressions referencing this column.

## `ingestion.valid_structured_data` (BOOLEAN)

- **Replaced by:** Malli schema validation in the ingestion worker,
  recorded implicitly by the presence of a `document_history` row
  with `change_type='ingestion'` (failed validation means no row).
- **Stopped being written:** same as above.
- **Gate to drop:** same as above.

Risk window

The window between the migration commit (which drops the trigger) and the new-code process accepting its first ingestion completion is narrow. Migratus runs at startup, before routes register, so any ingestion that finishes during this window is handled by the new code. An ingestion that is in-progress around deploy time falls into one of two cases: (a) it is interrupted mid-flight and re-claimed by the new worker, in which case the new code path applies cleanly; or (b) the pre-deploy worker had already committed its structured_data via the trigger before the deploy, in which case the result is already in document.structured_data and is backfilled as a history row just before the trigger is dropped.

7. Downstream updates (non-schema)

All of the following are part of this implementation, not follow-up work.

Claude skills

Tests

Ingestion schema

Scripts

Out-of-repo (flagged, not in this plan)

Any production SQL dashboards / ad-hoc queries reading ingestion.structured_data will show stale values for old ingestions and NULL for new ones. Release notes should call this out so the owning engineers can update their queries to read from document_history (for per-ingestion state) or document.structured_data (for current state).