VAT Rate Statement Check (§14 Abs. 4 Nr. 8)
Problem
§ 14 Abs. 4 Satz 1 Nr. 8 UStG requires an invoice to state both the applicable tax rate and the tax amount ("der anzuwendende Steuersatz sowie der auf das Entgelt entfallende Steuerbetrag" — "and", not "or"). Some supplier invoices print the net total, the tax amount, and the gross total but omit the explicit rate (e.g. no "19 %" anywhere). That is a formal compliance defect we should surface.
Today we never flag it. check-required-fields already requires
tax-rate for the EU tier, but the check runs on extracted
structured-data, where the LLM derives tax-rate: 19 from
tax-amount ÷ net even when no rate is printed. So has-tax-rate? is
satisfied by an inferred value and the invoice passes.
Every downstream check (check-required-fields, the
tax-compliance analyzer) consumes post-extraction
structured-data, which has no "printed vs inferred" signal.
document.provenance is edit-provenance only
(human vs LLM since last ingestion), not source evidence. The signal
does not exist and must be created at extraction time.
Concrete instance: invoice RE_CFG-20260518100554_TPM (bikosigma
GmbH) prints Summe, Umsatzsteuer 5.044,36 €, and
Gesamtbetrag but no rate. It currently passes required-fields.
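The masking effect can be shown with a minimal sketch. The amounts are hypothetical (not taken from the bikosigma invoice), and `infer-tax-rate` and `rate-present?` are illustrative stand-ins, not the real extractor or `has-tax-rate?`:

```clojure
;; Hypothetical amounts for illustration; no rate is printed on this invoice.
(def printed-fields
  {:net-amount 1000.00
   :tax-amount 190.00})

(defn infer-tax-rate
  "Derives a percentage from the amounts, as the LLM effectively does today."
  [{:keys [net-amount tax-amount]}]
  (Math/round (double (* 100.0 (/ tax-amount net-amount)))))

(defn rate-present?
  "Stand-in for has-tax-rate?: it only asks whether a value exists."
  [data]
  (some? (:tax-rate data)))

;; Structured data after extraction: the inferred rate looks like a printed one.
(def extracted
  (assoc printed-fields :tax-rate (infer-tax-rate printed-fields)))

(rate-present? printed-fields) ; what the paper shows
(rate-present? extracted)      ; what the check sees
```

Once the rate is stored, nothing in the map distinguishes a derived value from a printed one, which is why the signal has to be captured during extraction.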
Goals & non-goals
Goals
- Detect, at extraction time, whether the applicable VAT rate is printed on the invoice.
- Raise a warning (not a hard error) when a positive German/EU VAT applies but the rate is not stated.
- Route the signal through the Uncertain Validations Resolver (UVR) so a vision-LLM look at the PDF is the final decision-maker — protecting against a false-negative extraction boolean.
- No false warnings on legacy documents, non-EU invoices, or reverse-charge / exempt invoices.
Non-goals
- A general §14 Abs. 4 element-presence framework (net / amount / gross). Those are not actual gaps; only the rate is. The design stays additive so a future element pass is possible.
- Auto-correcting structured-data. This is a presentation defect on the supplier's paper; there is nothing to fix in our data.
- A retroactive backfill. Forward-going only; a backfill re-ingest can be run separately if desired.
Approach
An extraction-emitted boolean is the only reliable source of truth (chosen
over an OCR-text heuristic or extending the structured-data-only
tax-compliance analyzer). It mirrors the discipline already applied to
tax-amount ("extract only if printed, do NOT calculate") and
keeps the deterministic check deterministic.
Pros
- High precision; LLM reliably reports presence of a printed token
- Consistent with the existing tax-amount pattern
- Deterministic check stays deterministic
- UVR backstop covers extraction false-negatives
Cons
- Extraction-prompt change → needs a regression/eval pass
- Retroactive coverage needs re-ingest, not just recompute
The chosen shape is a bare boolean :tax-rate-stated? (schema-optional / nilable),
scoped to the rate only and surfaced as a
warning, with UVR as the final decision-maker. It gets a separate
UVR sub-path rather than being folded into required-fields
because the semantics differ: required-fields = "data we could not find";
this = "the data exists but the invoice omits the printed rate".
Design
1. Extraction signal
Add [:tax-rate-stated? {:optional true} [:maybe :boolean]] to
InvoiceData (schema/invoice/structured_data.clj).
Optional and nilable, mirroring
[:freight-included [:maybe :boolean]] and the
{:optional true} precedent on :summary-page-range.
One prompt instruction in extraction.clj, parallel to the
existing tax-amount rule:
    tax-rate-stated? = true ONLY if the applicable VAT rate is printed
    verbatim on the invoice (e.g. "19 %", "MwSt 19%", "USt 19 %",
    "zzgl. 19% USt") OR an explicit exemption / reverse-charge note is
    present. false if the rate is absent and only derivable from amounts.
    Do NOT infer.

:tax-rate-stated? must not be a required key. InvoiceData is an
open Malli :map validated non-blockingly
(ingestion.clj only logs + sets
:valid-structured-data), but a required key would fail the
existing structured_data_test.clj fixtures and flip
every legacy doc to "schema invalid" on re-ingest/recompute. The
deterministic check treats absent OR nil identically →
pass (legacy docs stay silent).
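As a rough illustration of why the key is optional and nilable, the sketch below validates a few maps against a cut-down schema. `InvoiceDataSketch` and `RequiredVariant` are hypothetical stand-ins containing only this one field; the real InvoiceData has many more keys. It assumes Malli is on the classpath.

```clojure
(require '[malli.core :as m])

;; Hypothetical one-field stand-in for InvoiceData (an open :map).
(def InvoiceDataSketch
  [:map
   [:tax-rate-stated? {:optional true} [:maybe :boolean]]])

;; Same field without {:optional true}, to show what a required key would do.
(def RequiredVariant
  [:map
   [:tax-rate-stated? [:maybe :boolean]]])

;; Legacy doc: key absent. Valid only because the key is optional.
(m/validate InvoiceDataSketch {:invoice-number "A-1"})
(m/validate RequiredVariant   {:invoice-number "A-1"})

;; Extractor returned null, true or false: all accepted via [:maybe :boolean].
(m/validate InvoiceDataSketch {:tax-rate-stated? nil})
(m/validate InvoiceDataSketch {:tax-rate-stated? false})

;; A non-boolean value is rejected.
(m/validate InvoiceDataSketch {:tax-rate-stated? "yes"})
```

Under the required variant the legacy map fails validation even though the value itself may be nil, which is the "schema invalid" flip the design avoids.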
2. Deterministic check (validation.clj)
New check-tax-rate-stated, sibling to
check-required-fields. Returns one of
pass / not-applicable / uncertain.
It never terminally warns. Evaluation is an ordered
cond — rows are tried top to bottom, first match wins:
| # | Condition | Result |
|---|---|---|
| 1 | :tax-rate-stated? is true (extractor saw it printed) | pass — short-circuits everything below |
| 2 | :tax-rate-stated? absent or nil (legacy / pre-feature doc) | pass |
| 3 | invoice-tier is :non-eu | pass (out of scope, as today) |
| 4 | No positive VAT: neither a printed tax-amount nor any positive-rate tax-rate-breakdowns entry | pass |
| 5 | Reverse-charge / exempt: a compliance-statements entry of type reverse-charge / vat-exemption (or legal-basis §13b), with tax-rate 0/nil | not-applicable |
| 6 | Reached here ⇒ :eu/:kleinbetrag, positive VAT, rate not stated, not exempt | uncertain → escalate to UVR |
Reverse-charge precedence: an extractor-set :tax-rate-stated? =
true (row 1) always wins, including when the extractor set it true because a
reverse-charge note is present. The row-5 deterministic exempt branch
only matters when :tax-rate-stated? is false, since absent or nil already
passes at row 2. The prompt
and the deterministic check therefore cannot contradict each other.
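The ordered evaluation can be sketched as a single cond. This is a minimal illustration under stated assumptions, not the actual validation.clj code: the helper names `positive-vat?` and `exempt?`, the keyword statement types, the string statuses, and passing the tier as an argument are all hypothetical, and the legal-basis §13b variant of row 5 is left out for brevity.

```clojure
(defn positive-vat?
  "Row 4 helper (hypothetical): a positive printed tax amount, or any
   breakdown entry with a positive rate."
  [{:keys [tax-amount tax-rate-breakdowns]}]
  (boolean
   (or (and (number? tax-amount) (pos? tax-amount))
       (some #(some-> (:tax-rate %) pos?) tax-rate-breakdowns))))

(defn exempt?
  "Row 5 helper (hypothetical): a reverse-charge / exemption statement
   together with a zero or missing tax rate."
  [{:keys [compliance-statements tax-rate]}]
  (boolean
   (and (or (nil? tax-rate) (zero? tax-rate))
        (some #(contains? #{:reverse-charge :vat-exemption} (:type %))
              compliance-statements))))

(defn check-tax-rate-stated
  "Rows are tried top to bottom; the first match wins."
  [invoice-tier {:keys [tax-rate-stated?] :as data}]
  (cond
    (true? tax-rate-stated?)   "pass"            ; row 1: printed
    (nil? tax-rate-stated?)    "pass"            ; row 2: absent or nil
    (= :non-eu invoice-tier)   "pass"            ; row 3: out of scope
    (not (positive-vat? data)) "pass"            ; row 4: no positive VAT
    (exempt? data)             "not-applicable"  ; row 5: reverse-charge / exempt
    :else                      "uncertain"))     ; row 6: escalate to UVR
```

Because destructuring yields nil for a missing key, absent and nil fall into the same branch without extra handling.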
3. UVR integration (uncertain_validations.clj)
- Add :validations.tax-rate-stated as the 4th owned sub-path (extend ns docstring + dispatch alongside required-fields / date-reasonableness / recipient-identity).
- New resolve-tax-rate-stated + focused prompt instruction: "Is the applicable VAT rate, or an explicit exemption / reverse-charge statement, printed anywhere on the invoice?"
- Resolver output → final status: resolved → pass (extractor false-negative; rate is actually present), not-applicable → not-applicable, unresolved/warning → warning.
- Per the existing UVR pattern, (or (:det-result resolved) det-check), the resolver result overwrites the escalated "uncertain"; a stuck "uncertain" only occurs if the resolver throws, and findings/formal-finding maps that to a (visible, fail-safe) :warning rather than silence.
- apply-ops stays []: no structured-data corrections (presentation defect, nothing to fix).
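The outcome mapping and the overwrite pattern might look roughly like the sketch below. Only the (or (:det-result resolved) det-check) expression comes from the existing code. The function names, the shape of the check map, the zero-argument `ask-vision-llm` callback, and the point at which an exception is caught are assumptions made for illustration.

```clojure
(def outcome->status
  "Resolver outcome to final check status, following the mapping above."
  {"resolved"       "pass"
   "not-applicable" "not-applicable"
   "unresolved"     "warning"
   "warning"        "warning"})

(defn resolve-tax-rate-stated-sketch
  "ask-vision-llm is a hypothetical zero-arg fn returning an outcome string.
   Returns nil if it throws, so no :det-result is produced."
  [ask-vision-llm det-check]
  (try
    {:det-result (assoc det-check :status (outcome->status (ask-vision-llm)))
     :apply-ops  []} ; presentation defect: never any corrections
    (catch Exception _ nil)))

(defn final-check
  "Existing UVR pattern: the resolver result overwrites the escalated check."
  [det-check resolved]
  (or (:det-result resolved) det-check))

(defn finding-severity
  "Hypothetical stand-in for the fail-safe in formal-finding: a stuck
   \"uncertain\" is still shown as a warning."
  [{:keys [status]}]
  (case status
    ("warning" "uncertain") :warning
    nil))

(def escalated {:check :tax-rate-stated :status "uncertain"})

(final-check escalated
             (resolve-tax-rate-stated-sketch (fn [] "resolved") escalated))
```

If the callback throws, `final-check` falls back to the escalated check and `finding-severity` still reports a warning.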
4. Surfacing
Check ordering is duplicated across five hardcoded lists,
not just findings.clj. All five must include
:tax-rate-stated or the warning renders in the banner card
but is invisible in the issue-count badge and the document-list semaphore,
or renders with no status text / empty tooltip.
| Surface | Change |
|---|---|
| findings.clj, formal-requirement-checks | Add :tax-rate-stated to ordering (rides formal-finding → banner + DATEV cover page, severity :warning) |
| findings.clj, validation-check-labels | Add :tax-rate-stated → "VAT Rate Statement" |
| view/invoice.clj:201, hardcoded formal-issue-count literal | Add :tax-rate-stated so the "Validation N" header badge counts it |
| components.clj, validation-check-order (≈1944) | Add :tax-rate-stated so the document-list validation-semaphore reflects it |
| components.clj, formal-status-texts (≈1971) & validation-check-descriptions (≈1958) | Add status-text map + tooltip text, else the row renders raw status / empty tooltip |
Message: "Invoice does not state the applicable VAT rate (§14 Abs. 4 Nr. 8 UStG); only the tax amount is shown."
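Since a missed list fails quietly, one option is a guard that reports which surfaces do not yet mention a check key. The sketch below uses hypothetical stand-ins for the surfaces in the table; the real definitions in findings.clj, view/invoice.clj and components.clj are likely shaped differently, and every entry except the "VAT Rate Statement" label is invented for illustration.

```clojure
;; Hypothetical stand-ins; names follow the table, contents are illustrative.
(def surfaces
  {:formal-requirement-checks [:required-fields :tax-rate-stated]
   :validation-check-labels   {:required-fields "Required Fields"
                               :tax-rate-stated "VAT Rate Statement"}
   :formal-issue-count-keys   #{:required-fields :tax-rate-stated}
   :validation-check-order    [:required-fields :tax-rate-stated]
   :formal-status-texts       {:required-fields {}
                               :tax-rate-stated {}}})

(defn mentions?
  "True if the surface (a map, vector or set) contains check key k."
  [coll k]
  (if (map? coll)
    (contains? coll k)
    (boolean (some #{k} coll))))

(defn surfaces-missing
  "Names of the surfaces that do not mention check key k."
  [k]
  (into []
        (comp (remove (fn [[_ coll]] (mentions? coll k)))
              (map key))
        surfaces))

(surfaces-missing :tax-rate-stated)
```

An empty result means the key is registered everywhere; any name returned points at a surface where the warning would render incompletely.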
5. Testing & retroactivity
- Unit tests for check-tax-rate-stated across tiers and the reverse-charge / exempt / legacy-absent branches.
- UVR resolver tested with a stubbed LLM (with-redefs) for the three outcomes (resolved / not-applicable / warning).
- Findings test for label + severity; banner + cover-page parity.
- Extraction prompt change → run the ingestion-regression-test skill on a sample (incl. the bikosigma doc) to confirm no extraction regressions.
- Retroactivity: the field is extraction-time, so existing docs need re-ingest (not just diagnostics-recompute). Absent the field, the check returns pass, so legacy docs are silent.
- Known edge: for >50-page "large document, summary only" extractions, header fields (incl. this boolean) come from the summary pages, which may omit the page printing the rate → a possible false :tax-rate-stated? = false. Harmless: the doc already carries the large-document-summary-only warning and UVR backstops against the real PDF. Note it; no special handling.
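A stubbed-LLM resolver test could take roughly the following form. `call-vision-llm` and `resolve-outcome` are hypothetical names standing in for the real LLM call and resolver; only with-redefs and clojure.test are taken as given.

```clojure
(require '[clojure.test :refer [deftest are]])

(defn call-vision-llm
  "Hypothetical LLM call; the real one would hit the network."
  [_pdf]
  (throw (ex-info "not available in unit tests; stub with with-redefs" {})))

(defn resolve-outcome
  "Hypothetical resolver core: maps the LLM answer to a final status."
  [pdf]
  (case (call-vision-llm pdf)
    "resolved"       "pass"
    "not-applicable" "not-applicable"
    "warning"))

(deftest resolver-outcomes
  (are [llm-answer expected]
       (with-redefs [call-vision-llm (fn [_] llm-answer)]
         (= expected (resolve-outcome :stub-pdf)))
    "resolved"       "pass"
    "not-applicable" "not-applicable"
    "unresolved"     "warning"))
```

with-redefs swaps the var's root binding only for the duration of the body, so the stub cannot leak into other tests.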
Open questions
- None blocking. The reverse-charge / exempt detection reuses existing compliance-statements + VAT-treatment signals; if those prove noisy in practice the suppression predicate can be tightened in a follow-up.
References
- src/com/getorcha/workers/ap/ingestion/validation.clj: check-required-fields, has-tax-rate?, invoice-tier
- src/com/getorcha/workers/ap/processors/uncertain_validations.clj: UVR sub-path pattern, resolve-required-fields
- src/com/getorcha/schema/invoice/structured_data.clj: InvoiceData
- src/com/getorcha/workers/ap/ingestion/extraction.clj: tax-amount "extract only if printed" precedent (≈ lines 101–103)
- src/com/getorcha/diagnostics/findings.clj: formal-requirement-checks, formal-finding
- § 14 Abs. 4 Satz 1 Nr. 8 UStG; § 33 UStDV (Kleinbetragsrechnung)
- Instance: document 019e3a85-4776-7067-81b6-49a6dfde031a (bikosigma RE_CFG-20260518100554_TPM)