Document Data Panel — Design
Problem
The document data panel is the right-hand panel of the document detail page (PDF on the left, data panel on the right). It serves every document type on production's single Document Management list: contracts, purchase orders, and goods-received notes (invoices live on a separate page and are out of scope here). Two things undercut it:
- Contracts render the same sections regardless of sub-type. An NDA shows an empty Financial and Renewal block; a loan has nowhere to put a repayment schedule. To a human, those empty, ill-fitting boxes read as one thing: this system doesn't really understand contracts.
- Across types, the panel's treatment is uneven, and the types sit side-by-side in one list, so weak presentation on any one type drags down perceived competence on all of them.
The two functional jobs the contract panel once justified are already done:
- Cheap preselection metadata (counterparty, renewal/expiration date, commercial terms): extracted, stored in structured_data JSONB, already driving the list filter/sort and the panel header. (Built.)
- Formal validation: a deterministic 6-check processor computed on every ingestion and rendered today. (Built.)
And the agent itself never consumes the structured display — humans do. So the remaining value of this rework is not data the system needs; it is perception and trust.
The panel's job is to demonstrate, for every document type a customer sees, that we read the document correctly with all its nuance. Contracts are the sharpest case — 11 wildly different sub-types — so they ship first and deepest. The same trust treatment then spans purchase orders and goods-received notes, so the whole Document Management list reads as competent rather than just the contracts in it.
Goals & non-goals
Goals
- One unified panel for the in-scope document types. A single architecture — a consistent skeleton (a per-type section configuration) rendered through shared components, with cross-cutting trust overlays — covering contract, purchase order, and goods-received note (invoices are a separate page, out of scope).
- Trust mechanisms for every type (the heart of the rework): a coverage view (found / not-found), plain-language interpretation callouts on nuanced or risky terms, page-level source anchoring (click a fact → jump the PDF to its page), and the existing formal validation surfaced prominently. (Phase 1 builds these as renderers against fixtures; real contract coverage/callout data and verified source-anchor navigation land in Phase 2 / after the spike — see the phases table.)
- Type-adaptive slot for contracts. The 11 contract sub-types each get a fixed set of section buckets whose key/value content is decided by the LLM — a contract-specific feature living inside the unified architecture.
- Contracts deep first. Contracts are the first delivery; POs and GRNs come onto the same skeleton and overlays in a subsequent phase.
- Hybrid renderer. One generic key/value renderer for the long tail, plus a few hand-built per-type components where nuance matters (contract payment schedule, PO/GRN line items, the coverage view).
- Renderer-first. Build the Hiccup renderer + the JSON/EDN contract against fixtures; wire the LLM prompts that produce the JSON as a backend follow-up.
Non-goals (this rework)
- The contract legal-analysis logic itself. That is #389; here we only render its :legal output in the Validation & Compliance section.
- Re-extracting preselection metadata or rebuilding formal validation. Both already exist (we render them).
- Pixel-precise PDF highlighting (bounding-box or text-quote) — a separate infrastructure project; see the phases table and Open questions.
- Invoices — they live on a separate page and are out of scope for this rework.
- Changing the document list page or the demo's P2P/O2C/Contracts subpages.
Approach
The panel is a per-type section configuration rendered through shared components, with cross-cutting trust overlays (source anchors, coverage, callouts, validation badge) applied throughout. The governing principle is unchanged from the legal-analysis rework: prompts decide content; code decides shape, section selection, and trust rendering.
The decisions that distinguish this rework from the original draft:
Rather than a contracts-only panel, build a single document data-panel architecture and bring every type onto it. The 11-sub-type "type slot" stays contract-specific (POs and GRNs are single-shape), but the skeleton, the hybrid renderer, and all trust overlays are shared. A contract's payment schedule and a PO/GRN's line items are both just per-type components within the same skeleton. Contracts ship first and deepest; PO/GRN follow. (Invoices have their own separate page and are out of scope.)
Relax the original "pure generic key/value, minimal hardcoding" rule. Flat KV lists read like a data dump and undercut the nuance we are trying to convey. Keep one generic renderer for the long tail but allow a few hand-built per-type components where nuance matters most: the contract payment / repayment schedule, the PO/GRN line items, and the coverage view.
Per-fact source anchoring moves from "deferred" into scope, realized at page granularity: each fact carries a source page; clicking it navigates the PDF to that page. No in-scope document type stores per-fact page data today, so source pages come from extraction — contracts first, in Phase 2; in Phase 1 the chips render from fixtures only. Pixel-precise highlight is a separate project — see the phases table.
Open technical risk (spike before Phase 1 commits): the PDF panel has
two iframes — #invoice-frame and #email-frame — toggled by a
tab switcher; the document frame's src already carries a fragment
(#view=FitH&pagemode=none). A page chip must target the correct (document)
frame. Changing only the hash of an already-loaded native PDF viewer does not reliably
re-navigate (Chrome/PDFium often ignores it; behaviour differs across Chrome/Firefox/Safari).
The spike picks the technique — e.g. reassign the document frame's src with
#page=N&view=FitH (mutating only the fragment, preserving the presigned query
string) and force a reload, vs. adopting PDF.js. Its exit criterion is behavioural
— the page visibly changes in a loaded viewer across our target browsers — not a unit test on the
fragment string. Treat "no PDF.js required" as a hypothesis, not a settled fact.
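As a starting point for the spike, the src-reassignment candidate can be sketched as a pure function. This is a minimal sketch, not the chosen technique: page-src is a hypothetical helper name and the example URL is made up. It only builds the string; whether reassigning src actually re-navigates a loaded viewer is exactly what the spike has to establish.

```clojure
(require '[clojure.string :as str])

(defn page-src
  "Hypothetical helper: returns `src` with its fragment replaced by
  #page=N&view=FitH. Everything before the first # (including the
  presigned query string) is kept unchanged."
  [src page]
  (let [base (first (str/split src #"#" 2))]
    (str base "#page=" page "&view=FitH")))

(page-src "https://example.test/doc.pdf?sig=abc#view=FitH&pagemode=none" 3)
;; => "https://example.test/doc.pdf?sig=abc#page=3&view=FitH"
```

Keeping the string construction separate from the navigation trigger means the fragment logic can be unit-tested while the behavioural exit criterion stays a browser-level check.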
We reuse existing data where it exists — the ContractData schema, the
formal-validation diagnostics, the compliance-checks the contract prompt already
produces, the preselection metadata, and the hero/payment-schedule timeline.
But the rendering architecture is greenfield: there is no generic key/value
renderer, no :component dispatch, and no skeleton today (every section in
contract.clj is bespoke Hiccup), and the EDN content model
(:sections/:fields/:kind/:component/:coverage) is a new shape not present in
ContractData. So Phase 1 is "render existing data through a new pipeline",
not a re-skin. Two data caveats: contract risk-flags are in the
schema and consumed by the UI but generated by no current code, so coverage and
callout data is Phase 2 (Phase 1 renders them from fixtures); whereas
compliance-checks is produced today and is real Phase-1 data to render.
Design
The shared skeleton
Every document type renders through the same skeleton; a per-type config decides which sections appear and in what order. Empty sections auto-hide. The trust overlays apply within any section.
- Summary / metadata: key facts at a glance. Whether a type gets a prominent hero band is decided per type in the design (contracts keep theirs; PO/GRN TBD during implementation).
- Type sections: the per-type body. Contracts use the type-adaptive slot (below); PO/GRN use their P2P sections (line items + references).
- Validation & Compliance: the existing formal validation surfaced as a prominent trust badge, plus the :legal slice from #389 for contracts when available.
- Coverage: the found / not-found checklist for the type (a panel-level overlay component).
- Matches: linked documents (POs / invoices / GRNs).
- History / versions.
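The skeleton plus auto-hide can be sketched as a per-type ordered list of section ids and one filter. This is an illustration under assumptions: section-config, visible-sections, and the section ids are hypothetical names, and the panel data is assumed to be a map from section id to a (possibly empty) collection.

```clojure
;; Hypothetical config: ids and per-type orders are illustrative only.
(def section-config
  {:contract       [:summary :type-sections :validation :coverage :matches :history]
   :purchase-order [:summary :type-sections :validation :coverage :matches :history]})

(defn visible-sections
  "Section ids for doc-type in configured order, dropping sections whose
  data is empty. :coverage is always kept, because it shows not-found
  on purpose."
  [doc-type data]
  (filterv (fn [id] (or (= id :coverage) (seq (get data id))))
           (get section-config doc-type)))

(visible-sections :contract {:summary {:counterparty "ACME"} :matches []})
;; => [:summary :coverage]
```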
The contract spine (deep, first delivery)
Contracts render this locked order. The existing section renderers are partly reused, but the order and grouping change materially from today: the current panel is Hero → Obligations → Financial → Validation → Renewal&Termination → Parties → Scope → Legal, with no standalone Term section and Parties below Financial. This rework promotes Parties, splits a dedicated Term & Renewal section, replaces Financial+Scope with the type-adaptive middle, and moves Validation down. Two pieces of coupled work the reorder pulls in:
- Auto-hide is new for several sections. Empty body sections should auto-hide (the coverage checklist is the deliberate exception — it shows not-found on purpose), but today Parties, Renewal&Termination, Legal, and Obligations render unconditionally (only Financial and Scope are guarded; Obligations shows an internal "none extracted" state). Auto-hide is per-section logic to add, not a property to preserve.
- The tab strip and scroll-spy change in lockstep. A separate hardcoded contract tab list (section-tabs-for-type in shared.clj) plus a scroll-spy and per-tab collapse JS are coupled to the section ids. Dropping Scope/Financial and adding Term & Renewal + the type-adaptive middle means reworking that tab strip and its id contract, either in Phase-1 scope or explicitly deferred.
Treat the reorder as real work, not a no-op.
The locked order:
- Summary / metadata hero (exists): counterparty, dates, total value, renewal.
- Parties (exists): principal / counterparty.
- Term & Renewal (new section): effective / expiration, renewal type, notice period, cancel-by. Pulls together fields that today live split across the hero (effective/expiration) and the existing "Renewal & Termination" renderer; the section itself is new.
- ⟳ Type-adaptive middle: 1–3 type-specific sections (below). Replaces today's one-size-fits-all Financial + Scope.
- Validation & Compliance: formal validation badge. The :legal slice (#389) is not built yet, so in Phase 1 this section renders the validation badge only; the :legal area stays absent until #389 lands.
- Obligations (exists): key obligations.
- Legal (exists): governing law, jurisdiction, liability cap, confidentiality.
- Matches: linked POs / invoices. Moves from today's after-hero injection (extra-after-hero in type-specific-view) to this dedicated late section.
- History / versions.
The contract type-adaptive middle — fixed buckets, LLM-decided content
- Section buckets are fixed per contract type (id, title, order), defined in config and passed into the prompt as input.
- The LLM decides the content of each bucket: flexible key/value pairs. Per-section guidance tells it what to look for; the guidance is non-exhaustive — the LLM may add relevant pairs we did not list and omit ones absent from the contract.
- Rendering is hybrid: a bucket may name a per-type :component (e.g. :payment-schedule) for high-nuance content; otherwise its :fields render through the generic key/value renderer.
| Type | Sections (buckets) |
|---|---|
| subscription | Subscription & Billing · Service Levels · Price Escalation & Changes |
| loan | Loan Terms · Repayment Schedule · Collateral & Covenants |
| lease | Leased Asset · Lease Terms & Payments · Return & Maintenance |
| rental | Rented Object · Rental Terms & Payments · Maintenance & Return |
| insurance | Coverage · Premium & Payment · Exclusions & Deductibles |
| service | Scope & Deliverables · Service Levels · Pricing & Penalties |
| supply | Supply Scope & Items · Pricing & Delivery · Quality & Penalties |
| purchase | Purchase Terms · Warranty & Acceptance |
| framework | Framework Scope & Ceiling · Call-off Terms |
| nda | Confidentiality Scope · Breach & Penalties |
| other | Key Terms (generic catch-all) |
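The table above maps directly onto a config map that both the prompt and the renderer could read. A minimal sketch: contract-buckets and buckets-for are hypothetical names, only titles are shown (real buckets also carry an id and guidance), and falling back to the catch-all for an unknown type is an assumption, not a stated decision.

```clojure
(def contract-buckets
  {:subscription ["Subscription & Billing" "Service Levels" "Price Escalation & Changes"]
   :loan         ["Loan Terms" "Repayment Schedule" "Collateral & Covenants"]
   :lease        ["Leased Asset" "Lease Terms & Payments" "Return & Maintenance"]
   :rental       ["Rented Object" "Rental Terms & Payments" "Maintenance & Return"]
   :insurance    ["Coverage" "Premium & Payment" "Exclusions & Deductibles"]
   :service      ["Scope & Deliverables" "Service Levels" "Pricing & Penalties"]
   :supply       ["Supply Scope & Items" "Pricing & Delivery" "Quality & Penalties"]
   :purchase     ["Purchase Terms" "Warranty & Acceptance"]
   :framework    ["Framework Scope & Ceiling" "Call-off Terms"]
   :nda          ["Confidentiality Scope" "Breach & Penalties"]
   :other        ["Key Terms"]})

(defn buckets-for
  "Ordered bucket titles for a contract type; unknown types fall back
  to the generic catch-all (assumed behaviour)."
  [contract-type]
  (get contract-buckets contract-type (:other contract-buckets)))

(buckets-for :nda)
;; => ["Confidentiality Scope" "Breach & Penalties"]
```

A config test in the spirit of the Testing section would assert that all 11 sub-types resolve and that each has between one and three buckets.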
Other document types on the unified panel
- Purchase order / goods-received note. P2P section sets (header / line items / references) rendered through the skeleton, with the trust overlays. Line items use the per-type :line-items component; source pages come from their extraction (no existing page-location to lean on). Summary-band decision per type (Open questions).
- None of these need the 11-sub-type slot; they are single-shape types with a fixed section config.
Content model & the JSON / EDN contract
The type-slot output per document. Three additions over the original draft: per-field
:source (page), a :callout field kind for interpretation, and a
panel-level :coverage checklist. The same field model is reused by other types'
sections.
Fixture for one contract:

```edn
{:type "subscription"
 :sections
 [{:id "billing"
   :title "Subscription & Billing"
   :fields [{:label "Recurring fee" :value 2400 :kind :currency :source 2}
            {:label "Billing cycle" :value "Annual" :source 2}
            {:kind :callout :severity :warning :source 5
             :value "Auto-renews for 12 months unless cancelled 90 days prior."}]}
  {:id "payment-plan"
   :title "Payment Schedule"
   :component :payment-schedule ; per-type component, not generic KV
   :rows [["2026-01-01" 2400] ["2027-01-01" 2520]]
   :source 6}]
 :coverage [{:term "Liability cap" :found? false}
            {:term "Confidentiality" :found? true :source 7}
            {:term "Price escalation" :found? true :source 4}]}
```
- :sections. Only buckets the LLM populated; rendered in the configured order (not the LLM's), so the panel stays consistent. Empty buckets are omitted.
- :fields. {:label, :value, :kind?, :source?, :severity?}.
- :kind ∈ text (default) · number · currency · date · percent · boolean · table · callout. :kind drives formatting only; the generic renderer stays generic.
- :component. When present on a section, a hand-built per-type component renders it (contract :payment-schedule, PO/GRN :line-items) instead of generic fields.
- :source. 1-indexed page number; powers the click-to-page anchor.
- :coverage. Per-type expected-terms checklist with :found? and optional :source; powers the coverage view.
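The "configured order, not the LLM's" rule for :sections can be sketched as a lookup followed by a filter. Assumptions: ordered-sections is a hypothetical name, bucket ids are strings as in the fixture, and sections whose id is not in the config are silently dropped (the design does not say what happens to them).

```clojure
(defn ordered-sections
  "Returns the LLM's sections re-ordered to match bucket-ids. Sections
  with an unknown id are dropped, as are sections that have neither a
  :component nor any :fields."
  [bucket-ids llm-sections]
  (let [by-id (into {} (map (juxt :id identity)) llm-sections)]
    (->> bucket-ids
         (keep by-id)
         (filterv #(or (:component %) (seq (:fields %)))))))

(mapv :id
      (ordered-sections
       ["billing" "service-levels" "price-changes"]
       [{:id "price-changes" :fields [{:label "Cap" :value "3%"}]}
        {:id "service-levels" :fields []}
        {:id "billing" :fields [{:label "Cycle" :value "Annual"}]}]))
;; => ["billing" "price-changes"]
```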
Scope of this model (Phase 1). The EDN above describes the type-adaptive
middle plus the panel-level :coverage — not the whole panel. The spine sections
(Parties, Term & Renewal, Validation, Obligations, Legal) stay hand-built from
structured-data in Phase 1; generalizing them onto this model is an open question. So
a fixture carries the adaptive middle + coverage, not Parties/Legal. :source is valid
at field level and at section level (e.g. on a :component section like the payment
schedule); it is not used on spine sections in Phase 1.
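The field shape can be checked with plain predicates before a fixture reaches the renderer. A minimal sketch under assumptions: valid-field? is a hypothetical name, keys and kinds are assumed to be keywords already, and two rules (a callout needs no :label, :value is always required) are read off the fixture rather than stated anywhere in this design.

```clojure
(def kinds
  #{:text :number :currency :date :percent :boolean :table :callout})

(defn valid-field?
  "True when a field map matches the shape
  {:label, :value, :kind?, :source?, :severity?}."
  [{:keys [label value kind source]}]
  (boolean
   (and (some? value)                              ; :value is required
        (or (nil? kind) (contains? kinds kind))    ; :kind optional, text by default
        (or (= kind :callout) (string? label))     ; callouts carry no label
        (or (nil? source) (pos-int? source)))))    ; pages are 1-indexed

(valid-field? {:label "Recurring fee" :value 2400 :kind :currency :source 2})
;; => true
(valid-field? {:label "Recurring fee" :value 2400 :source 0})
;; => false
```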
Trust mechanisms (cross-cutting, all types)
Coverage view. For each document type, a curated expected-terms
checklist — a new, authored deliverable, one list per contract sub-type and per other type
(see Open questions) — is rendered with each item marked found or not found.
Absence becomes visible thoroughness rather than an empty box. The data behind found/not-found is
the contract risk-flags / extraction output, which is produced in
Phase 2 (not generated today); Phase 1 renders the view from fixtures.
Phase 1 must also remove the existing contract-risk-signals-box (in
contract-validation-section): because risk-flags are never produced, it currently
renders all 8 risk types as a green "pass" — a false all-clear that is the opposite of the
trust we want. The new coverage view supersedes it. (compliance-checks is real data
that renders today — but note #389 plans to re-home it under the :legal slice, so the
Phase-1 compliance renderer is interim, not permanent.)
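Joining the authored checklist with extraction results is a small merge. The sketch below assumes the :coverage shape from the fixture; coverage-rows is a hypothetical name, and treating a term that is missing from the results as not found is an assumption.

```clojure
(defn coverage-rows
  "One row per expected term, in checklist order. Terms without a
  matching result are reported as not found."
  [expected-terms results]
  (let [by-term (into {} (map (juxt :term identity)) results)]
    (mapv (fn [term]
            (merge {:term term :found? false}
                   (select-keys (get by-term term) [:found? :source])))
          expected-terms)))

(coverage-rows ["Liability cap" "Confidentiality" "Termination"]
               [{:term "Confidentiality" :found? true :source 7}
                {:term "Liability cap" :found? false}])
;; => [{:term "Liability cap" :found? false}
;;     {:term "Confidentiality" :found? true :source 7}
;;     {:term "Termination" :found? false}]
```

Driving the rows from the checklist rather than from the results keeps the order stable and makes absence explicit, which is the point of the view.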
Interpretation callouts. Short plain-language notes on nuanced or risky terms
(an auto-renewal trap, a Vertragsstrafe, an unusual termination condition), via the
:callout field kind. Same data dependency as coverage: rendered in Phase 1 from
fixtures, populated in Phase 2 once contract risk-flags are generated.
Source anchoring (page-level). Each fact shows a small page chip; clicking it navigates the PDF to that page (technique pending the spike in Approach). No in-scope type stores per-fact page data today, so contracts get the first real source pages once extraction emits them (Phase 2); Phase 1 renders chips from fixtures. No stored coordinates required.
Validation made prominent. Surface the already-built formal validation as a
visible trust badge at the top of Validation & Compliance, not a buried sub-section.
Data source: the 6 checks are computed in validation.clj but read by the panel from
the diagnostics model (:document/diagnostics → :validations) and
rendered by contract-validation-section; we reuse that path.
Callouts and not-found markers are powerful precisely because they are rare. Flag the genuinely notable; a panel that warns about everything trains the reader to ignore all of it.
Penalties & cross-cutting content (contracts)
Penalties / fines (e.g. a Vertragsstrafe) are cross-cutting and must never be lost:
- Dedicated home where prominent: service/supply ("…& Penalties"), NDA ("Breach & Penalties"), subscription (service credits in Service Levels), loan (default terms in Repayment / Covenants).
- Safety net everywhere else: because content is LLM-decided and guidance is non-exhaustive, a defined fine returns as a :callout (or KV pair) in the closest section even where no dedicated bucket exists.
Prompt & extraction changes (backend)
- One prompt per contract type (or one parameterized prompt taking the type's section config). It receives the fixed buckets + per-section "what to check" guidance and must return the JSON contract above. Other document types get analogous per-type guidance later.
- Guidance is a checklist, not a schema — the LLM returns only what is present and may add relevant pairs.
- Emit a :source page per fact (the transcript already carries === PAGE N === markers) and the :coverage results per type.
- Reuse the existing per-tenant tenant_prompt_customization path. Mirrors the legal-analysis decision: prompts, not skills; the model fills content, a schema validates the shape.
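One way the golden-document check on :source pages could work is to split the transcript on its page markers and confirm that a fact's text occurs on the page it claims. This is a sketch, not the pipeline's method: transcript-pages and source-plausible? are hypothetical names, and the marker is assumed to appear exactly as === PAGE N ===.

```clojure
(require '[clojure.string :as str])

(defn transcript-pages
  "Splits a transcript on === PAGE N === markers into {page-number text}.
  Text before the first marker is ignored."
  [transcript]
  (into {}
        (map (fn [[_ n text]] [(Long/parseLong n) (str/trim text)]))
        (re-seq #"(?s)=== PAGE (\d+) ===(.*?)(?=(?:=== PAGE \d+ ===)|\z)"
                transcript)))

(defn source-plausible?
  "True when `snippet` occurs on the page the fact names as its :source."
  [pages {:keys [source]} snippet]
  (str/includes? (get pages source "") snippet))

(def pages
  (transcript-pages
   "=== PAGE 1 ===\nMaster agreement\n=== PAGE 2 ===\nRecurring fee 2400 EUR\n"))

(source-plausible? pages {:source 2} "2400") ;; => true
(source-plausible? pages {:source 1} "2400") ;; => false
```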
Frontend renderer (Hiccup)
- A generic component renders :sections → :fields with per-:kind formatting, a small table renderer, and the :callout treatment.
- A small set of per-type components (contract :payment-schedule, PO/GRN :line-items, the panel-level coverage view) render high-nuance content the generic renderer cannot do justice.
- Source chips, coverage, and the validation badge are shared overlay helpers usable from any type's section.
- Renderer-first: build and validate against EDN fixtures before the LLM produces real data.
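The hybrid renderer can be sketched as a multimethod keyed on :component, with the generic key/value path as its default. All names here (format-value, source-chip, render-field, render-section), the CSS classes, and the US-style number formatting are assumptions for illustration. The functions return plain Hiccup vectors, so no library is needed to try them.

```clojure
(defn format-value
  "Formats a field value by :kind. Only a few kinds are shown."
  [{:keys [kind value]}]
  (case kind
    :currency (String/format java.util.Locale/US "%,.2f"
                             (to-array [(double value)]))
    :percent  (str value "%")
    :boolean  (if value "Yes" "No")
    (str value)))                      ; :text and anything unhandled

(defn source-chip [page]
  (when page
    [:button.page-chip {:data-page page} (str "p. " page)]))

(defn render-field [{:keys [kind label source] :as field}]
  (if (= kind :callout)
    [:div.callout {:class (name (:severity field :info))}
     (:value field) (source-chip source)]
    [:div.kv [:span.k label] [:span.v (format-value field)]
     (source-chip source)]))

;; Sections without a :component dispatch on nil and hit :default.
(defmulti render-section :component)

(defmethod render-section :default [{:keys [id title fields]}]
  (into [:section {:id id} [:h3 title]] (map render-field fields)))

(defmethod render-section :payment-schedule [{:keys [id title rows]}]
  [:section {:id id} [:h3 title]
   (into [:table]
         (for [[date amount] rows]
           [:tr [:td date] [:td (str amount)]]))])

(render-field {:label "Billing cycle" :value "Annual" :source 2})
;; => [:div.kv [:span.k "Billing cycle"] [:span.v "Annual"]
;;     [:button.page-chip {:data-page 2} "p. 2"]]
```

Adding a per-type component then means adding one defmethod, which keeps the generic path untouched.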
Build phases
| Phase | Scope | Touches extraction? |
|---|---|---|
| 1 — Contracts, renderer-first | New rendering pipeline (generic :kind renderer + :component dispatch + EDN content model) — none exist today; the spine reorder incl. auto-hide for Parties/Term/Legal and the coupled tab-strip/scroll-spy rework; the 4 trust-overlay components; the contract type-adaptive slot; removal of the misleading all-green contract-risk-signals-box. Built and tested against EDN fixtures. Renders existing data where present (metadata, validation diagnostics, compliance-checks, payment schedule); coverage and callouts render from fixtures only (risk-flag data is Phase 2). Includes the source-anchoring navigation spike. | No |
| 2 — Contracts, backend extraction | Per-type prompts producing the JSON; generate contract risk-flags → coverage + callouts; teach the contract prompt to use the === PAGE N === markers and emit a per-fact :source page (with golden-doc accuracy validation); schema validation; pipeline wiring. | Yes |
| 3 — PO & GRN onto the unified panel | Purchase orders and goods-received notes get the skeleton + trust overlays + their P2P sections (incl. a :line-items component). No existing per-fact page data, so source pages come from their extraction; summary-band decided per type. (Invoices are out of scope — separate page.) | Some |
| Future (out of scope) — precise highlight | Text-quote / bounding-box highlight in the PDF. Requires persisting Document AI geometry + coordinate mapping + a PDF.js (or page-image + SVG overlay) rewrite. | Yes (heavy) |
The unified all-types architecture is a design constraint, not first-delivery surface.
The first writing-plans output covers Phase 1 (contracts, renderer-first
against fixtures) only; Phases 2 and 3 get their own plans (and may warrant their own
specs). This keeps the first plan tractable.
Testing
- Renderer: fixture-driven snapshot tests — one EDN fixture per contract sub-type (and per other document type as they land), covering empty sections, table fields, every value kind, callouts, coverage (found/not-found), and source chips → expected rendered panel.
- Config: every contract sub-type, and every covered document type, resolves to a section config.
- Source anchoring: a chip targets the document frame (#invoice-frame) and triggers the spike-selected navigation technique. The spike's behavioural exit criterion (page visibly changes in a loaded viewer) is the real gate, not a unit test on the fragment string.
- Backend (Phase 2+): validate LLM output against the JSON contract; golden documents per type asserting expected sections / pairs / coverage appear.
Open questions
- Summary band per type. Contracts keep a hero band; do POs and GRNs get one? Decided per type during implementation.
- Precise highlighting (future). Is page-level jump enough, or do we later want text-quote / bounding-box highlight (persist OCR geometry, map LLM output to coordinates, migrate the iframe to PDF.js or page-image + SVG overlay)?
- Per-type component proliferation. We allow a few hand-built components. Proposed guardrail: a :component is justified only when it needs bespoke visual treatment or interaction the generic renderer can't give. The payment-schedule timeline qualifies; plain tabular data (penalties, variable components, simple line lists) does not and must use :kind :table. Confirm the rule and the boundary.
- Generalize the generic renderer onto spine sections? Parties, Legal, etc. are still hand-built. Is converging them later worthwhile, or do we leave them?
- Inline editing? Do contracts / PO / GRN get editable KV values? Deferred.
- Localization. Contracts are largely German, but the panel chrome, section titles, coverage terms, and callout labels are hardcoded English, and there is no i18n infrastructure today. English labels beside a German PDF undercut the very "we read it correctly" trust pitch. Do section titles / coverage terms / callout labels need German (locale-keyed config strings), or is English acceptable for now?
- Accessibility. Page chips, collapsible sections, and found/not-found markers need keyboard nav, ARIA, and a color-independent found/not-found encoding (icon/text, not the warm palette alone).
- Authoring the coverage checklists. Who curates the per-type expected-terms lists, and against what source (legal review)? This is net-new content, not derivable from the schema.
References
- 2026-05-20-legal-analysis-rework-design (#389): feeds the Validation & Compliance :legal slice.
- 2026-04-13-document-diagnostics-design: diagnostics model the validation output rides on.
- Code: view/contract.clj (contract panel), view/shared.clj (detail-page-content, type-specific-view, PDF iframe at the #page=N anchor point), ui/components.clj (contract-validation-section), workers/ap/ingestion/validation.clj (computes the 6 formal checks; the panel reads them from the diagnostics model :document/diagnostics → :validations), schema/contract/structured_data.clj (ContractData, the 11 types, risk-flags, compliance-checks).