Note (2026-04-24): After this document was written, legal_entity was renamed to tenant and the old tenant was renamed to organization. Read references to these terms with the pre-rename meaning.

Unified Processors — Design

1. Scope & goals

Problem

Three mechanisms in the codebase today run "derive something from document state, record the run, write outputs":

  1. Post-processor pipeline (IProcessor protocol + post-process/run) — the 9 in-ingestion processors (accounts, cost-center, accruals, supplier-matcher, supplier-verifier, tax-compliance-analyzer, financial-validation-resolver, fraud-detector, uncertain-validations-resolver).
  2. Matching SQS worker (matching.worker/process-document!) — bespoke orchestration that runs matching, writes diagnostics, and drives reconciliation per cluster.
  3. Diagnostics-recompute SQS worker (just landed in document-diagnostics) — placeholder stubs for edit-triggered recomputes.

Each one inserts document_processor_run rows, writes diagnostic slices, and conditionally mutates structured_data. They diverge on protocol shape, side-effect discipline, and trigger handling. Adding edit-triggered recompute to the mix (the reason this work started) would introduce a fourth implementation and compound the drift.

Separately, the diagnostics work shipped without a story for conditional recomputation: today an edit-triggered recompute would blindly re-run every processor, wasting LLM spend and risking the engine overwriting user-entered values.

Approach

Unify under one Processor protocol and one run-processors! engine. Every current processor (including matching and reconciliation) becomes an implementation. Two callers — ingestion and edit-recompute — parameterise the same engine with different phase lists and modes.

Introduce declarative :reads / :writes metadata on each processor so the engine can (a) skip processors whose declared reads weren't touched by the triggering edit, and (b) refuse to overwrite any field a user has manually edited.

Reorganise namespaces so matching lives alongside the other processors (workers.ap.processors.*), and rename -slice-diagnostic to match the terminology in the rest of the domain.

In scope

Out of scope

2. Concepts

2.1 Processor

A unit of work that derives something from document state. Every processor:

2.2 State

Replaces the ad-hoc ingestion map threaded through IProcessors. Single shape passed through the engine:

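{:mode            <keyword>                   ;; :ingestion or :edit (§5); read by processors such as tax-compliance (§14)
 :document        <document-row>              ;; includes :structured-data, :version, :legal-entity-id
 :legal-entity    <legal-entity-row>
 :file            {:contents <bytes-or-nil> :mime-type <string>}
 :structured-data <map>                       ;; mirror of :document/structured-data for convenience
 :initial-structured-data <map>               ;; snapshot taken on entry; compared on completion (§4.1)
 :diagnostics     <map>                       ;; refreshed from document.diagnostics before each phase (§4)
 :changed-leaves  <set-or-nil>                ;; set in :edit mode only (§6.3)
 :commit-sha      <string-or-nil>
 :ingestion-id    <uuid-or-nil>               ;; set in :ingestion mode only
 :history-id      <uuid-or-nil>}              ;; set in :edit mode only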

:file is populated on demand by the engine, and only for processors that need PDF bytes (currently tax-compliance's vision fallback). Loading it lazily avoids unnecessary S3 fetches.
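One way to keep the fetch lazy is to hold a memoised loader for the whole run and attach :file only when a processor needs it. The sketch below uses a Clojure delay for the memoisation; file-loader, state-for-processor and the needs-file? flag are hypothetical names, and fetch! stands in for the real S3 read.

```clojure
(defn file-loader
  "Hypothetical: returns a zero-arg fn that runs fetch! at most once, on first call."
  [fetch!]
  (let [file (delay (fetch!))]
    (fn [] @file)))

(defn state-for-processor
  "Hypothetical: attaches :file only for processors that need PDF bytes."
  [state needs-file? get-file]
  (if needs-file?
    (assoc state :file (get-file))
    state))
```

Processors that never ask for the file never trigger the fetch, and processors that do share a single download per run.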

2.3 Leaves

Every :reads entry and every :writes entry is a leaf path — a path whose tail addresses an atomic field (scalar, or a primitive inside a [:vector <primitive>]), per the structured-data malli schema. Subtree declarations ([:cat [:= :issuer]]) are forbidden.

Leaves are expressed as malli seq-regex patterns over clj-path segments:

[:cat [:= :issuer] [:= :name]]
[:cat [:= :line-items] :any [:= :description]]
[:cat [:= :line-items] :any [:= :debit-account]]

:any matches one segment of any shape — array index, {:id X} map, or keyword. No other wildcards are needed.
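For this restricted pattern language (only :cat, :any and [:= v]), matching a clj-path against a leaf is a segment-by-segment comparison. A minimal sketch, assuming patterns never use other malli seq operators; leaf-matches? is a hypothetical name, not an existing function in reads.clj:

```clojure
(defn leaf-matches?
  "Hypothetical helper: true when clj-path matches a leaf pattern written in
  the restricted seq-regex subset [:cat seg ...], where each seg is either
  :any (one segment of any shape) or [:= value] (an exact segment)."
  [[op & seg-patterns] clj-path]
  (and (= :cat op)
       ;; Every pattern element consumes exactly one segment, so lengths must agree.
       (= (count seg-patterns) (count clj-path))
       (every? true?
               (map (fn [pattern segment]
                      (or (= :any pattern)
                          (and (vector? pattern)
                               (= := (first pattern))
                               (= (second pattern) segment))))
                    seg-patterns
                    clj-path))))
```

Because every element consumes exactly one segment, no backtracking is needed; for this subset the answer should agree with validating the same pattern with malli.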

An authoring helper lowers a terse form, in which :* stands for any single segment, to seq-regex:

(read-leaf :line-items :* :description)
;; => [:cat [:= :line-items] :any [:= :description]]
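The helper's body is not specified here; a sketch of the lowering it describes, assuming :* is the only wildcard in the terse form:

```clojure
(defn read-leaf
  "Hypothetical sketch: lowers a terse leaf path to a seq-regex pattern.
  :* becomes :any; every other segment becomes an exact [:= segment] match."
  [& segments]
  (into [:cat]
        (map (fn [segment] (if (= :* segment) :any [:= segment])))
        segments))

(read-leaf :issuer :name)
;; => [:cat [:= :issuer] [:= :name]]
```

The helper does not enforce the leaf-only rule; rejecting a subtree declaration such as (read-leaf :issuer) needs the structured-data schema.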

2.4 Diagnostic (renamed)

Rename -slice-diagnostic to -diagnostic everywhere (protocol method, engine param, helper fn names). Semantics are unchanged: one top-level key in document.diagnostics is owned by one processor. Processors whose output lives only in structured-data return nil.

3. IProcessor protocol (v2)

Keep the existing name. Modernise the contract:

(defprotocol IProcessor
  (-id         [this])                   ;; :keyword
  (-reads      [this state])             ;; [seq-regex-pattern...] — state-aware for per-type dispatch
  (-writes     [this state])             ;; [seq-regex-pattern...] — may be empty
  (-diagnostic [this state])             ;; DiagnosticSpec or vector thereof, or nil
  (-modes      [this])                   ;; #{:ingestion :edit}
  (-always?    [this])                   ;; boolean — bypasses conditional filter
  (-compute    [this ctx state])         ;; -> {:result _ :stats _}
  (-apply-ops  [this state result]))     ;; -> [json-patch-op ...]

-reads, -writes, and -diagnostic take state so processors can dispatch on document type. matching's reads differ for invoice/purchase-order/contract/GRN (§12.2); validations similarly dispatches its sub-path ownership per document type (see §12.1 and §13). State is guaranteed to contain :document with :document/structured-data including :document-type before these methods are called.

-compute replaces today's -compute [this], which closed over context and ingestion at construction time. Passing ctx and state as arguments lets the engine reuse a single processor instance per run and supply the canonical state map.

-apply-ops replaces today's -apply [this sd result] → new-sd. Returning ops instead of a new map is required so the engine can filter them against user-edited paths before applying them. Returned ops use string keys ({"op" "replace" "path" "..." "value" ...}), matching the JSON-Patch convention used by json-patch/apply-patch and the edit handlers.

-always? is an escape hatch for processors whose reads are too broad to enumerate (validations, which runs cheap deterministic checks over the whole document). When it returns true, the conditional filter never skips the processor in :edit mode and its -reads declaration is ignored. Defaults to false (validations is the only processor that returns true; §12.1).
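To make the contract concrete, here is a hypothetical toy processor (it is not in the migration table) written against the protocol above. It assumes the engine passes the :result map from -compute into -apply-ops and that :currency is a scalar leaf in structured-data.

```clojure
(require '[clojure.string :as str])

(defprotocol IProcessor
  (-id         [this])
  (-reads      [this state])
  (-writes     [this state])
  (-diagnostic [this state])
  (-modes      [this])
  (-always?    [this])
  (-compute    [this ctx state])
  (-apply-ops  [this state result]))

;; Hypothetical toy processor: upper-cases the document currency.
(defrecord CurrencyNormaliser []
  IProcessor
  (-id [_] :currency-normaliser)
  (-reads [_ _state] [[:cat [:= :currency]]])
  (-writes [_ _state] [[:cat [:= :currency]]])
  (-diagnostic [_ _state] nil)            ; output lives only in structured-data
  (-modes [_] #{:ingestion :edit})
  (-always? [_] false)
  (-compute [_ _ctx state]
    (let [currency (get-in state [:structured-data :currency])]
      {:result {:currency (some-> currency str/upper-case)}
       :stats  {:llm-calls 0}}))
  (-apply-ops [_ state result]
    ;; Emit a string-keyed JSON-Patch op only when the value actually changes.
    (if (and (:currency result)
             (not= (:currency result) (get-in state [:structured-data :currency])))
      [{"op" "replace" "path" "/currency" "value" (:currency result)}]
      [])))
```

Keeping -compute free of writes and returning ops from -apply-ops is what lets the engine decide, per op, whether a write is allowed.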

3.1 DiagnosticSpec

-diagnostic returns one of:

nil                                        ;; no diagnostic write
{:slice :kw}                               ;; replace whole slice
{:slice :kw :sub-path [k1 k2 ...]}         ;; replace at sub-path
{:slice :kw :sub-paths [[k1] [k2] ...]}    ;; multiple sub-paths, each written
                                           ;; separately with the result map's
                                           ;; top-level keys
[spec1 spec2 ...]                          ;; multi-slice: write to several slices
                                           ;; from one processor's result

Multi-slice (the vector form) is used by tax-compliance-analyzer, which writes both :tax-issues and :line-items from a single LLM call. When the vector form is used, the :result returned by -compute is a map whose top-level keys match the slice names:

{:tax-issues [...]
 :line-items {"li-abc" {:vat-validation {...}} ...}}

The engine routes each top-level key to the slice of the same name.
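A sketch of how the engine might flatten any DiagnosticSpec shape into individual writes before calling update-diagnostic!. diagnostic-writes is a hypothetical name; the lookup rule for :sub-paths (each sub-path read from the result with get-in) is an assumption that fits the one-key sub-paths used in §13.

```clojure
(defn diagnostic-writes
  "Hypothetical sketch: expands a DiagnosticSpec plus a processor's :result
  into a flat vector of {:slice :sub-path :value} writes."
  [spec result]
  (cond
    (nil? spec) []
    ;; Multi-slice: each element spec receives the result's top-level key
    ;; named after its slice.
    (vector? spec)
    (into [] (mapcat (fn [{:keys [slice] :as one}]
                       (diagnostic-writes one (get result slice))))
          spec)
    ;; Several sub-paths, each written separately (assumed lookup: get-in).
    (:sub-paths spec)
    (mapv (fn [sub-path]
            {:slice (:slice spec) :sub-path sub-path :value (get-in result sub-path)})
          (:sub-paths spec))
    ;; Single slice, optionally at one sub-path; [] means the whole slice.
    :else
    [{:slice (:slice spec) :sub-path (or (:sub-path spec) []) :value result}]))
```

Keys in the result that no spec names (such as TCA's processor-internal :bu-codes) are simply not routed.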

4. Engine: run-processors!

(run-processors! ctx state phases)

No separate processors-filter argument. Filtering happens inside the engine per phase.

Flow

For each phase, in order:

  1. Refresh: refetch document.diagnostics and populate state.diagnostics. Phase-1 processors see empty/stale diagnostics; phase-2 sees phase-1's writes (e.g. tax-compliance-analyzer reads state.diagnostics.validations.tax-id-format that validations wrote in phase 1).
  2. Schedule: in :edit mode, drop processors whose -always? is false AND whose -reads (evaluated against current state) don't intersect :changed-leaves AND which have a :completed run at the current document version. In :ingestion mode, no filtering.
  3. Execute in parallel (virtual-thread-per-processor):
  4. Fold: after each phase, the engine returns an updated state reflecting applied ops and refreshed diagnostics. Subsequent phases see prior phases' mutations.
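The four steps can be read as a reduce over phases. A skeleton under stated assumptions: all I/O (diagnostics refresh, scheduling, processor execution, op application) is injected, run-row bookkeeping and diagnostic writes are omitted, and a cached thread pool stands in for the virtual-thread-per-processor executor (on JDK 21+, Executors/newVirtualThreadPerTaskExecutor is the direct replacement). run-phases and run-in-parallel are hypothetical names.

```clojure
(import '(java.util.concurrent Callable ExecutorService Executors Future))

(defn- run-in-parallel
  "Runs (f x) for every x on its own pool thread; returns results in input order."
  [f xs]
  (let [^ExecutorService pool (Executors/newCachedThreadPool)]
    (try
      (->> xs
           (mapv (fn [x] (.submit pool ^Callable (fn [] (f x)))))
           (mapv (fn [^Future fut] (.get fut))))
      (finally
        (.shutdown pool)))))

(defn run-phases
  "Hypothetical skeleton of the per-phase loop in run-processors!. I/O is
  injected through deps:
    :refresh-diagnostics (fn [state]) -> diagnostics map
    :schedule?           (fn [processor state]) -> boolean
    :execute             (fn [processor state]) -> ops
    :apply-ops           (fn [state ops]) -> state"
  [{:keys [refresh-diagnostics schedule? execute apply-ops]} state phases]
  (reduce
   (fn [state phase]
     (let [state    (assoc state :diagnostics (refresh-diagnostics state)) ; 1. refresh
           runnable (filterv #(schedule? % state) phase)                  ; 2. schedule
           op-lists (run-in-parallel #(execute % state) runnable)]        ; 3. execute
       (reduce apply-ops state op-lists)))                                ; 4. fold
   state
   phases))
```

Each phase sees the state folded from every earlier phase, while processors inside one phase all see the same input state.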

4.1 Persistence

The engine mutates state.structured-data in memory across phases. When the engine run completes, if state.structured-data differs from state.initial-structured-data:

A new change_type = 'derivation' enum value is added to document_history_change_type. Derivation rows have edited_by = NULL and ingestion_id = NULL; only patch is populated. The constraint on document_history is relaxed to allow this combination.

Return value

The updated state. Callers wire this back into their end-of-pipeline logic (e.g., ingestion's complete-ingestion! writes the final document_history row + document update in the same transaction).

5. Modes

5.1 :ingestion

5.2 :edit

5.3 :manual (future)

Not part of this work, but reserved at the protocol level for ops scripts that need to force a recompute without an ingestion or edit anchor.

6. Conditional recomputation

6.1 Changed leaves

Given the current edit's patch (from document_history.patch), the engine produces the set of changed leaves:

  1. Parse each op's path string to a clj-path via json-patch.path/pointer->clj-path.
  2. If the op's path resolves to a leaf (per structured-data malli schema), keep it.
  3. Otherwise (non-leaf ops: add/remove/replace subtree), expand by walking the op's value (or the pre-patch value for remove) and emitting one virtual leaf path per atomic descendant.

The expansion uses the StructuredData schema to decide "is this atomic?" — primitive schemas and [:vector <primitive>] are atomic; maps and vectors of maps are not.
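Step 3 can be sketched as a recursive walk. Two assumptions: the schema lookup is replaced by a value-based stand-in (value-atomic?), and vector elements that carry an :id get an {:id X} segment while other elements get their index. expand-leaves, value-atomic? and element-segment are hypothetical names.

```clojure
(defn value-atomic?
  "Stand-in for the schema lookup: atomic unless the value is a map or a
  sequence containing maps, so [:vector <primitive>] values stay atomic."
  [_path value]
  (not (or (map? value)
           (and (sequential? value) (some map? value)))))

(defn- element-segment
  "Assumed segment for a vector element: {:id X} when it has an :id, else its index."
  [index element]
  (if (and (map? element) (contains? element :id))
    {:id (:id element)}
    index))

(defn expand-leaves
  "Emits one leaf path per atomic descendant of value, rooted at path."
  [atomic? path value]
  (cond
    (atomic? path value) [path]
    (map? value)
    (into [] (mapcat (fn [[k v]] (expand-leaves atomic? (conj path k) v))) value)
    :else ; a sequence of maps
    (into []
          (comp (map-indexed (fn [i v] (expand-leaves atomic? (conj path (element-segment i v)) v)))
                cat)
          value)))
```

For a remove op the walk would run over the pre-patch value, as described in step 3.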

6.2 Scheduling filter

For each processor in the phase:

The filter runs per phase, immediately before execution. Later phases see earlier phases' mutations, which may themselves touch leaves relevant to later-phase processors; the filter recomputes between phases.
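The scheduling rule from §4, written as a predicate. The processor is plain data here rather than a protocol instance, matches? is whatever pattern matcher reads.clj provides, and the mode check is an assumption (a processor is not run outside its declared -modes); should-run? is a hypothetical name.

```clojure
(defn should-run?
  "Hypothetical scheduling predicate for one processor in one phase."
  [{:keys [mode changed-leaves]} {:keys [modes always? reads]} completed-at-version? matches?]
  (and
   ;; Assumption: processors never run outside the modes they declare.
   (contains? modes mode)
   (or (= :ingestion mode)            ; no filtering during ingestion
       always?                        ; -always? bypasses the filter
       (not completed-at-version?)    ; no :completed run at this version yet
       (boolean (some (fn [pattern] (some #(matches? pattern %) changed-leaves))
                      reads)))))
```

A processor is dropped only when all three skip conditions hold together, which is why the inner test is an or.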

6.3 Edit-context feed

On entry in :edit mode, the engine computes the :changed-leaves set and adds it to state so processors can consume it as well (for example, a processor could recompute only the affected deltas). This is optional: the scheduling filter handles most cases, and the set is there for processors that want finer control.

7. Write-protection

7.1 User-edited path set

Reuses the shared logic from view/provenance.clj, extracted to a new namespace com.getorcha.document.provenance:

(provenance/user-edited-paths db-pool document-id)
;; => #{"/issuer/name" "/line-items[id=abc]/debit-account" ...}

Internally, it walks document_history from newest to oldest, collecting op paths up to (but not including) the most recent :ingestion row. This is the same behaviour the UI relies on.
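The walk is a take-while over the history rows. A sketch of its pure core, assuming rows arrive newest-first as maps with :change-type and a :patch vector of string-keyed ops; the database query is left out and edited-paths-from-history is a hypothetical name.

```clojure
(defn edited-paths-from-history
  "Hypothetical pure core of provenance/user-edited-paths: collects op paths
  from newest-first history rows until the most recent :ingestion row (exclusive)."
  [history-newest-first]
  (into #{}
        (comp (take-while #(not= :ingestion (:change-type %)))
              (mapcat :patch)
              (map #(get % "path")))
        history-newest-first))
```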

The extracted ns exports both:

7.2 Op filter

The filter runs in :edit mode only. Ingestion mode applies every op unconditionally (re-ingestion wipes state).

For each op the engine is about to apply:

Prefix matching is important: if the user replaced /line-items[id=abc] wholesale, processor writes to /line-items[id=abc]/debit-account must be blocked too.
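A sketch of the op filter with prefix matching done per path segment, so that /line-items[id=abc] blocks writes beneath it, while /issuer/name does not block /issuer/name-alt (which a plain string-prefix check would). The names are hypothetical, JSON-Pointer ~0/~1 escaping is ignored, and bracketed selectors such as line-items[id=abc] are treated as opaque segments.

```clojure
(require '[clojure.string :as str])

(defn- path-segments
  "Splits a path string on '/'; escaping is ignored in this sketch."
  [path]
  (vec (remove str/blank? (str/split path #"/"))))

(defn blocked-by-user-edit?
  "True when some user-edited path equals op-path or is a whole-segment prefix of it."
  [user-edited-paths op-path]
  (let [op-segs (path-segments op-path)]
    (boolean
     (some (fn [edited]
             (let [edited-segs (path-segments edited)
                   n           (count edited-segs)]
               (and (<= n (count op-segs))
                    (= edited-segs (subvec op-segs 0 n)))))
           user-edited-paths))))

(defn filter-ops
  "In :edit mode, drops ops that would overwrite a user-edited path;
  in :ingestion mode, returns every op."
  [mode user-edited-paths ops]
  (if (= :edit mode)
    (filterv #(not (blocked-by-user-edit? user-edited-paths (get % "path"))) ops)
    (vec ops)))
```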

7.3 Test coverage

Unit-test the filter with representative scenarios:

8. Notifications

Failed processor runs fire admin notifications in both modes. Notification payload includes:

{:kind       :processor/failure
 :processor  :matching
 :trigger    {:kind :edit :history-id #uuid ... :edited-by #uuid ...}
 :document   {:id ... :legal-entity-id ... :file-original-name ...}
 :error      <message>}

For :ingestion triggers, :edited-by is nil. For :edit triggers, the admin payload identifies the user who performed the edit (useful when an edit pattern is destabilising a processor). The single :processor/failure kind, with the processor id in the payload, replaces today's :matching/permanent-failure and :reconciliation/failure kinds.

9. Renames

Old                                   New
IProcessor (protocol)                 IProcessor (kept; contract extended)
db.diagnostics/update-slice!          db.diagnostics/update-diagnostic! (no alias; all callers updated)
publish-document-ready!               processors.matching.queue/enqueue!
:matching/permanent-failure           :processor/failure with :processor :matching
:reconciliation/failure               :processor/failure with :processor :reconciliation
compute! (post-process.clj)           absorbed into engine
with-run-row! (post-process.clj)      absorbed into engine
run-processor-phases                  absorbed into engine
run-phase                             absorbed into engine
tax-compliance/run-vat-validation     deleted (TCA writes :line-items diagnostic slice directly; §13.2)
validation/validate (multimethod)     deleted (callsites moved to validations, FVR, UVR processors)
with-validations (ingestion.clj)      deleted (validations is phase 1 of the engine)

New database:

10. Namespace reorganisation

src/com/getorcha/workers/ap/
  ingestion.clj                       ;; shrinks; runs extraction + calls engine
  ingestion/
    classification.clj
    extraction.clj
    transcription.clj
    vat_rules.clj
    validation.clj                    ;; pure rules (check-* functions)
    post_process/                     ;; [DELETED — see processors/]
  processors/
    engine.clj                        ;; IProcessor protocol + run-processors!
    reads.clj                         ;; seq-regex helpers, leaf expansion
    accounts.clj                      ;; (moved from post_process/)
    accruals.clj
    cost_center.clj
    financial_validation.clj
    fraud.clj
    supplier.clj
    tax_compliance.clj
    uncertain_validations.clj
    validations.clj                   ;; NEW — wraps ingestion/validation.clj
    matching.clj                      ;; NEW — wraps match-document! + reconcile-cluster!
    matching/
      queue.clj                       ;; NEW — enqueue! (was publish-document-ready!)
      core.clj                        ;; unchanged internals
      candidates.clj
      evidence.clj
      llm_decision.clj
      normalize.clj
      reconciliation.clj              ;; moved — still internal, no longer a separate processor
      searchable_text.clj

Note: the provenance logic moves to com.getorcha.document.provenance (new top-level ns, neutral between UI and workers). The UI's existing com.getorcha.app.http.documents.view.provenance ns becomes a thin shim that re-exports com.getorcha.document.provenance, or is deleted with its UI callers updated to the new ns, whichever is cheaper at implementation time.

11. Phase lists

Ingestion pipeline change

Today the ingestion pipeline in workers.ap.ingestion runs these stages sequentially:

transcribe → classify → extract → validate (with-validations) → post-process → complete

validate mutates structured-data with :validation-results; post-process runs the 9 post-processors through the old IProcessor protocol and mutates structured-data again.

Under the unified model, validate and post-process collapse into a single engine call with THREE phases. The pipeline becomes:

transcribe → classify → extract → run-processors! [validations] [post-procs…] [fraud] → complete

validations (always-run, deterministic-only) runs alone in phase 1 because phase-2 processors read its output. The nine existing post-processors plus the new validations processor are distributed across the three phases as shown below. (There is no separate vat-validation processor: tax-compliance-analyzer writes the per-line vat-validation diagnostic directly; see §13.2.)

Ingestion (invoice)

The engine replaces both with-validations and post-process/run. Ingestion calls:

(engine/run-processors!
  ctx state-in-ingestion-mode
  [;; Phase 1 — deterministic validations (fast; produces the validation
   ;; statuses that downstream LLM processors consult)
   [validations]
   ;; Phase 2 — enrichment, analysis, resolvers
   [accounts cost-center accruals
    supplier-matcher supplier-verifier
    tax-compliance-analyzer
    financial-validation-resolver
    uncertain-validations-resolver]
   ;; Phase 3 — sees phase-2 mutations (e.g. tax-id correction)
   [fraud-detector]])

Ingestion then calls processors.matching.queue/enqueue! to hand off to the matching SQS worker. Matching stays async for latency isolation.

Phase rationale:

Matching worker

The matching SQS worker handles only the post-ingestion continuation (edit mode runs matching inline):

(engine/run-processors!
  ctx state-in-ingestion-mode
  [[matching]])

matching's -compute runs match-document! then invokes reconcile-cluster! for each affected cluster. Reconciliation is NOT a separate processor — it's an internal step of matching. matching writes both :matching and :reconciliation diagnostic slices (see §13).

Edit recompute

(engine/run-processors!
  ctx state-in-edit-mode
  [;; Phase 1 — validations (always runs; produces statuses downstream reads)
   [validations]
   ;; Phase 2 — everything else, conditionally
   [tax-compliance-analyzer
    fraud-detector matching
    accounts cost-center accruals
    supplier-matcher supplier-verifier
    financial-validation-resolver
    uncertain-validations-resolver]])

Phase 1 is the same as ingestion's phase 1: validations runs first because phase-2 processors (tax-compliance-analyzer, FVR, UVR) read its output.

Phase 2 merges ingestion's phases 2 and 3. In :edit mode, -apply-ops mutations are filtered against the user-edit set, so phase-2 corrections rarely propagate to phase-3 readers in a meaningful way; when they would, the reader recomputes on a subsequent edit anyway. Running fraud-detector alongside tax-compliance-analyzer in phase 2 is a small latency win. It is acceptable because fraud-detector's output is a diagnostic (not a correction), so a slightly stale tax-id-type at fraud time only produces a slightly stale fraud flag.

Every phase-2 processor declares -modes #{:ingestion :edit}; the conditional filter (§6) decides which ones actually run based on :changed-leaves. validations runs unconditionally (-always? true).

Matching's reconciliation sub-step is sequenced inside matching's -compute, not at the engine's phase level. reconcile-cluster! still inserts its own document_processor_run rows for cluster-peer documents and writes their :reconciliation slices — these are side effects of the matching processor on OTHER documents, outside the engine's current-document scope. The engine itself only tracks runs/slices for the document that triggered the run.

12. Migration table

Every current processor gets a v2 profile. Values below are illustrative for the spec; exact reads/writes are locked in during implementation from source inspection.

Each entry lists the processor's reads (leaves, terse), structured-data writes, diagnostic, modes, and whether it always runs.

accounts
  Reads:      issuer.name, issuer.vat-id, issuer.country, line-items.*.description, line-items.*.amount
  Writes:     line-items.*.debit-account, line-items.*.credit-account
  Diagnostic: none
  Modes:      #{:ingestion :edit}    Always?: no

cost-center
  Reads:      issuer.name, line-items.*.description, line-items.*.amount
  Writes:     line-items.*.cost-center
  Diagnostic: none
  Modes:      #{:ingestion :edit}    Always?: no

accruals
  Reads:      invoice-date, line-items.*.description
  Writes:     line-items.*.accrual
  Diagnostic: none
  Modes:      #{:ingestion :edit}    Always?: no

supplier-matcher
  Reads:      issuer.name, issuer.vat-id, issuer.iban
  Writes:     supplier-match
  Diagnostic: none
  Modes:      #{:ingestion :edit}    Always?: no

supplier-verifier
  Reads:      issuer.name, issuer.vat-id, issuer.country, issuer.address
  Writes:     supplier-verification-id
  Diagnostic: none
  Modes:      #{:ingestion :edit}    Always?: no

tax-compliance-analyzer
  Reads:      issuer.country, issuer.tax-id, issuer.tax-id-type, recipient.country, recipient.tax-id-type, shipping-country, line-items.*.tax-rate, line-items.*.description, delivery-terms-raw, incoterm-code, compliance-statements.*.text
  Writes:     service-category, line-items.*.bu-code; the tax-id-correction branch (vision PDF) is :ingestion-only (see §14)
  Diagnostic: multi-slice, :tax-issues + :line-items (see §13)
  Modes:      #{:ingestion :edit}    Always?: no

financial-validation-resolver
  Reads:      subtotal, total, tax-amount, line-items.*.amount, line-items.*.quantity, line-items.*.unit-price
  Writes:     none
  Diagnostic: :validations.financial-math (sub-path; see §13)
  Modes:      #{:ingestion :edit}    Always?: no

fraud-detector
  Reads:      issuer.name, issuer.country, issuer.vat-id, issuer.tax-id, issuer.iban, issuer.account-number, issuer.sort-code, issuer.routing-number, issuer.bsb, recipient.country, invoice-date, line-items.*.description
  Writes:     none
  Diagnostic: :fraud-flags
  Modes:      #{:ingestion :edit}    Always?: no

uncertain-validations-resolver
  Reads:      issuer.name, issuer.address, recipient.name, recipient.address, invoice-date, invoice-number
  Writes:     none
  Diagnostic: :validations.{required-fields,date-reasonableness,recipient-identity} (sub-paths; see §13)
  Modes:      #{:ingestion :edit}    Always?: no

validations
  Reads:      per-doc-type dispatch (see §12.1)
  Writes:     none
  Diagnostic: :validations (per-doc-type sub-paths; see §13)
  Modes:      #{:ingestion :edit}    Always?: yes

matching
  Reads:      per-doc-type (see §12.2)
  Writes:     matches rows in document_match, cluster-id on document, cluster reconciliation state on ap_document_cluster; also triggers reconcile-cluster!, which writes the :reconciliation slice for each cluster peer (§13.3)
  Diagnostic: :matching
  Modes:      #{:ingestion :edit}    Always?: no

12.1 validations — per-doc-type dispatch, always-run

validations is the only -always? true processor. It runs cheap deterministic checks over the whole document. Enumerating every leaf it touches would produce a brittle declaration, and the cost profile (a few hundred microseconds, no LLM calls) doesn't justify the filtering overhead.

Its reads/writes/diagnostic vary by document type (via -reads [this state] dispatch on state.document.structured-data.document-type):

Doc type             Sub-paths owned by validations
invoice              :tax-id-format, :iban-format, :issuer-country, :recipient-country, :large-document-summary-only (invoice-specific checks; :financial-math is owned by FVR, and :required-fields/:date-reasonableness/:recipient-identity by UVR)
purchase-order       :required-fields (the whole validations slice for POs)
contract             :signature-presence, :required-fields, :date-validity, :party-identification, :financial-consistency, :termination-clause
goods-received-note  :required-fields

Contract/PO/GRN have no LLM validation resolvers; validations owns their entire :validations slice. Invoice has FVR and UVR (§13) owning specific sub-paths.

12.2 Matching — per-doc-type reads

Matching's internal code (normalize.clj, searchable-text.clj, evidence.clj) dispatches on :document/type when extracting counterparty names, references, and scoring fields. The processor's -reads mirrors this dispatch:

Doc type             Read leaves
invoice              issuer.name, issuer.vat-id, issuer.iban, invoice-number, total, currency, line-items.*.description, line-items.*.quantity, line-items.*.unit, po-references.*, gr-references.*, service-period.start, service-period.end
purchase-order       supplier.name, supplier.vat-id, po-number, total-value, currency, line-items.*.description, line-items.*.quantity, line-items.*.unit, contract-references.*, requisition-numbers.*
contract             counterparty.name, counterparty.tax-id, contract-number, total-value, currency, deliverables.*
goods-received-note  supplier.name, supplier.vat-id, grn-number, line-items.*.description, line-items.*.quantity, line-items.*.unit, po-references.*, delivery-note-numbers.*
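The table maps naturally onto a lookup keyed by document type. A sketch of matching's -reads body, assuming :document-type values are keywords and reusing the :* convention of read-leaf; matching-reads and lower are hypothetical names.

```clojure
(defn- lower
  "Same lowering as read-leaf: :* -> :any, other segments -> [:= segment]."
  [terse]
  (into [:cat] (map #(if (= :* %) :any [:= %])) terse))

(def ^:private matching-reads-by-type
  ;; Terse leaves from the table above; keyword document types are assumed.
  {:invoice             [[:issuer :name] [:issuer :vat-id] [:issuer :iban] [:invoice-number]
                         [:total] [:currency] [:line-items :* :description]
                         [:line-items :* :quantity] [:line-items :* :unit]
                         [:po-references :*] [:gr-references :*]
                         [:service-period :start] [:service-period :end]]
   :purchase-order      [[:supplier :name] [:supplier :vat-id] [:po-number] [:total-value]
                         [:currency] [:line-items :* :description] [:line-items :* :quantity]
                         [:line-items :* :unit] [:contract-references :*] [:requisition-numbers :*]]
   :contract            [[:counterparty :name] [:counterparty :tax-id] [:contract-number]
                         [:total-value] [:currency] [:deliverables :*]]
   :goods-received-note [[:supplier :name] [:supplier :vat-id] [:grn-number]
                         [:line-items :* :description] [:line-items :* :quantity]
                         [:line-items :* :unit] [:po-references :*] [:delivery-note-numbers :*]]})

(defn matching-reads
  "Hypothetical -reads body: dispatches on the document type held in state."
  [state]
  (let [doc-type (get-in state [:document :document/structured-data :document-type])]
    (mapv lower (get matching-reads-by-type doc-type []))))
```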

12.3 Reconciliation — internal to matching

Reconciliation is not a separate processor. It's a sub-step of matching's -compute: after match-document! writes matches and assigns/merges clusters, matching calls reconcile-cluster! for each affected cluster with ≥ 2 documents, and writes the :reconciliation diagnostic slice for the current document. For cluster peers, the existing peer-cluster run-row + slice-writing in reconcile-cluster! remains unchanged (the engine only tracks the current document's runs).

13. Diagnostic slice co-ownership

:validations is written by THREE processors: validations (base deterministic checks), financial-validation-resolver (resolves the financial-math sub-check), uncertain-validations-resolver (resolves required-fields, date-reasonableness, recipient-identity sub-checks).

Under the old pipeline these flowed through a shared structured-data.validation-results map that processors merged into. Under the new model the slice is a single JSONB object; co-ownership requires merge-not-replace semantics with non-overlapping ownership.

Resolution: -diagnostic returns either a single spec (slice + optional sub-path) or a vector of specs (multi-slice). See §3.1 for the full shape definition. The engine routes per-processor slice writes via jsonb_set for atomicity per sub-path.

13.1 Ownership of :validations slice — invoice

Sub-path                        Owner
[:financial-math]               financial-validation-resolver
[:required-fields]              uncertain-validations-resolver
[:date-reasonableness]          uncertain-validations-resolver
[:recipient-identity]           uncertain-validations-resolver
[:tax-id-format]                validations
[:iban-format]                  validations
[:issuer-country]               validations
[:recipient-country]            validations
[:large-document-summary-only]  validations

For contract, PO, GRN: validations owns the whole :validations slice (no resolvers exist for those doc types). See §12.1.

Each sub-path has exactly ONE owner per document type. No two processors write to the same sub-path. This requires a small refactor of existing code: today's check-financial-math, check-required-fields, check-date-reasonableness and check-recipient-identity stay in ingestion/validation.clj as pure functions, but their composition moves out of validation/validate into the respective resolver processors (each runs the deterministic check and the LLM refinement in one -compute). For invoice, validations' -compute stops emitting those four sub-paths. For contract/PO/GRN, validations still runs all deterministic checks for that type.
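The single-owner invariant is easy to check mechanically, for example in a unit test over each document type's ownership table. A sketch using the invoice table from §13.1; overlapping-sub-paths and invoice-validations-owners are hypothetical names.

```clojure
(def invoice-validations-owners
  {:financial-validation-resolver  [[:financial-math]]
   :uncertain-validations-resolver [[:required-fields] [:date-reasonableness] [:recipient-identity]]
   :validations                    [[:tax-id-format] [:iban-format] [:issuer-country]
                                    [:recipient-country] [:large-document-summary-only]]})

(defn overlapping-sub-paths
  "Hypothetical check: returns {sub-path [owner ...]} for every sub-path
  claimed by more than one processor; an empty map means the invariant holds."
  [owners]
  (->> (for [[processor sub-paths] owners
             sub-path sub-paths]
         [sub-path processor])
       (group-by first)
       (keep (fn [[sub-path pairs]]
               (when (> (count pairs) 1)
                 [sub-path (mapv second pairs)])))
       (into {})))
```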

13.2 Ownership of :tax-issues and :line-items slices

tax-compliance-analyzer owns BOTH diagnostic slices (:tax-issues invoice-level, :line-items per-line :vat-validation) via multi-slice -diagnostic return (§3.1).

TCA's -diagnostic returns:

[{:slice :tax-issues} {:slice :line-items}]

TCA's -compute result has corresponding top-level keys (plus whatever the processor wants for its own -apply-ops):

{:tax-issues [{:type :missing-vat-id :severity "warning" ...} ...]
 :line-items {"li-abc" {:vat-validation {...}}
              "li-def" {:vat-validation {...}}}
 ;; processor-internal: used by -apply-ops to build structured-data ops
 :service-category   {...}
 :bu-codes           {"li-abc" {...} "li-def" {...}}
 :tax-id-correction  {:status "corrected" :tax-id "..." :tax-id-type "..."}}

Engine reads top-level keys matching declared slice names (:tax-issues, :line-items) and writes them via update-diagnostic!. There is no separate vat-validation processor — the previous transitional function tax-compliance/run-vat-validation (which only extracted data TCA's LLM had stuffed onto structured-data) is deleted, along with the "extract then strip" dance at ingestion-completion.

TCA's -apply-ops emits structured-data mutations from the processor-internal keys in result:
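A hypothetical sketch of that emission, under stated assumptions: the service category is written to /service-category, BU codes go to each line item's bu-code using the /line-items[id=<id>] path style from §7.1, and a corrected tax id goes to /issuer/tax-id and /issuer/tax-id-type. tca-ops is a hypothetical name, and the target paths are assumptions rather than the locked-in design.

```clojure
(defn tca-ops
  "Hypothetical sketch of TCA's -apply-ops body: builds string-keyed
  JSON-Patch ops from the processor-internal keys of the :result map."
  [{:keys [service-category bu-codes tax-id-correction]}]
  (cond-> []
    (some? service-category)
    (conj {"op" "replace" "path" "/service-category" "value" service-category})

    :always
    (into (map (fn [[line-item-id bu-code]]
                 {"op"    "replace"
                  "path"  (str "/line-items[id=" line-item-id "]/bu-code")
                  "value" bu-code}))
          bu-codes)

    ;; Only present in :ingestion mode, since -compute skips the branch in :edit (§14).
    (= "corrected" (:status tax-id-correction))
    (into [{"op" "replace" "path" "/issuer/tax-id" "value" (:tax-id tax-id-correction)}
           {"op" "replace" "path" "/issuer/tax-id-type" "value" (:tax-id-type tax-id-correction)}])))
```

Because these are ops rather than a rewritten map, the §7.2 filter can still drop any of them that would overwrite a user edit.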

13.3 :matching and :reconciliation slice ownership

matching processor's -diagnostic returns {:slice :matching}. The :reconciliation slice is written by reconcile-cluster! internally during matching's -compute — it iterates every cluster-peer document and writes each doc's per-doc :reconciliation slice (summaries are filtered per document). This is outside the engine's current-document scope (§12.3). No co-ownership concerns: matching's -diagnostic writes only :matching; reconcile-cluster! writes :reconciliation for all cluster docs (including the current one).

13.4 UI impact — none

The UI reads the final merged slices regardless of which processor wrote which sub-path. No rendering changes. Existing per-section states (not-yet-run, in-progress, completed) from the diagnostics feature already handle the case where individual sub-paths have differing run statuses.

14. Tax-compliance vision mode

The existing tax-compliance analyser has a vision fallback for tax-id correction — when the prior :tax-id-format validation failed, it fetches the PDF and asks a vision LLM to read the correct tax-id off the invoice image.

Policy: vision mode is :ingestion-only. In :edit mode, if the user edited an invalid tax-id and it's still invalid, that's user intent to flag (the validations processor will emit the format warning). We don't second-guess the user with a vision LLM. This simplifies edit-mode plumbing too — no S3 fetch needed.

Implementation: tax-compliance-analyzer's -compute inspects state.mode. When :edit, the no-vat tax-id correction branch and the tax-id-warn vision extension are both skipped.

15. Test strategy

Existing tests under test/com/getorcha/workers/ap/ingestion/post_process/ move under test/com/getorcha/workers/ap/processors/ with their ns updates.

16. Rollout

Since D3 settled on "do everything in one branch," the migration ships atomically:

  1. Migration: add 'derivation' to document_history_change_type enum; relax the CHECK constraint. Down-migration drops the value (if unused) or keeps it (if any rows exist); since PostgreSQL has no ALTER TYPE ... DROP VALUE, dropping it means recreating the enum type.
  2. Introduce IProcessor protocol (v2) + engine (no callers yet).
  3. Extract provenance to shared ns (both UI and engine consume it).
  4. Implement reads helpers (leaf expansion, pattern matching).
  5. Extend db.document-processor-run/count-runs to accept :document-version kwarg (needed by the engine's conditional filter in §6.2).
  6. Extend db.diagnostics/update-diagnostic! for sub-path and multi-slice writes via jsonb_set.
  7. Migrate each post-processor to the v2 protocol (one commit per processor). Old record arities coexist briefly via a deprecated shim; shim removed once all callers switch.
  8. Migrate matching internals to processors/matching/* and introduce the processors/matching.clj wrapper (handles reconciliation inside -compute).
  9. Rewrite post-process/run + with-validations to call run-processors!.
  10. Rewrite matching.worker/process-document! to call run-processors!.
  11. Rewrite diagnostics-recompute/orchestrator stubs to call run-processors! with the edit-mode phase list + filter.
  12. Delete old IProcessor shim, validate multimethod, with-validations, tax-compliance/run-vat-validation, with-run-row!, compute!, run-processor-phases, run-phase, build-diagnostics, publish-document-ready!.
  13. Add notification payload unification.

Tests gate each commit. If a commit breaks a regression test, it gets fixed or reverted before the next step.

17. Risks & mitigations

18. Deferred decisions