VAT Master Data Enrichment — Orcha

VAT Master Data Enrichment

Automated validation and enrichment of VAT identification numbers from existing ERP master data — preparing your vendor records for the e-invoicing era.

ERP Master Data
EU & International VAT Registers
AI-Powered Research

From Legacy Data to E-Invoice Readiness

With mandatory e-invoicing approaching across the EU, vendor master data that was “good enough” for traditional processes now needs to meet strict structural requirements.

With e-invoicing mandates like ViDA (VAT in the Digital Age) approaching, every invoice will require a machine-readable, validated VAT ID. Vendor records that worked for manual AP processing need to be enriched and validated to meet these new requirements.

Multi-Stage Validation & Enrichment Pipeline

A tiered approach that starts with the most reliable sources and progressively applies more sophisticated methods for remaining unresolved records.

End-to-End Process Flow

flowchart TD START["ERP Vendor Master Data\nNames, addresses, existing VAT IDs"] subgraph STAGE0 ["Stage 1 — Data Preparation"] direction TB S0_CLEAN["Data Cleansing & Normalisation\nStandardise company names, addresses,\ncountry codes, VAT ID formats"] S0_DONE["Structured Records\nClean, consistent format\nready for automated processing"] end subgraph STAGE1 ["Stage 2 — Validate Existing VAT IDs"] direction TB S1_CHECK["VIES / HMRC / National API\nValidate all existing VAT IDs\nagainst official tax authority databases"] S1_VALID["Valid\nVAT ID confirmed active,\nname & address match"] S1_INVALID["Invalid or Missing\nVAT ID expired, deregistered,\nor not present in ERP"] end subgraph STAGE2 ["Stage 3 — National Register Lookup"] direction TB S2_LOOKUP["Query National Business Registers\nSearch by company name & address\nin country-specific databases"] S2_FOUND["Candidate VAT ID Found\nDerived from register number\nor directly returned"] S2_NOTFOUND["Not Found in Registers\nCompany not matched or\nregister does not expose VAT ID"] end subgraph STAGE3 ["Stage 4 — AI-Powered Web Research"] direction TB S3_SEARCH["Intelligent Web Search\nSearch company websites, Impressum pages,\nlegal disclosures, public directories"] S3_FOUND["VAT ID Extracted\nFrom Impressum, legal pages,\nor public business directories"] S3_NOTFOUND["Not Found Online\nNo reliable source identified"] end subgraph STAGE4 ["Stage 5 — Re-Validation"] direction TB S4_CHECK["VIES / National API\nValidate all newly discovered\nVAT IDs from Stages 3 & 4"] S4_VALID["Confirmed Valid"] S4_INVALID["Validation Failed\nCandidate ID does not check out"] end subgraph STAGE5 ["Stage 6 — Remaining Records"] direction TB S5_OPTIONS["Resolution Options"] S5_COMMERCIAL["Commercial Data Provider\nD&B, Creditreform, or\nBureau van Dijk enrichment"] S5_MANUAL["Manual Review Queue\nFlagged for AP team with\nbest candidate matches & confidence scores"] end RESULT["Enriched Master Data\nAll vendor records with\nvalidated VAT IDs or\nclear action items"] START --> S0_CLEAN S0_CLEAN --> S0_DONE S0_DONE --> S1_CHECK S1_CHECK --> S1_VALID S1_CHECK --> S1_INVALID S1_INVALID --> S2_LOOKUP S2_LOOKUP --> S2_FOUND S2_LOOKUP --> S2_NOTFOUND S2_NOTFOUND --> S3_SEARCH S3_SEARCH --> S3_FOUND S3_SEARCH --> S3_NOTFOUND S2_FOUND --> S4_CHECK S3_FOUND --> S4_CHECK S4_CHECK --> S4_VALID S4_CHECK --> S4_INVALID S4_INVALID --> S5_OPTIONS S3_NOTFOUND --> S5_OPTIONS S5_OPTIONS --> S5_COMMERCIAL S5_OPTIONS --> S5_MANUAL S1_VALID --> RESULT S4_VALID --> RESULT S5_COMMERCIAL --> RESULT S5_MANUAL --> RESULT style START fill:#eef2ff,stroke:#6366f1,color:#1e1b4b style S0_CLEAN fill:#0e7490,color:#fff,stroke:#0891b2 style S0_DONE fill:#cffafe,stroke:#0e7490,color:#164e63 style STAGE0 fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style S1_CHECK fill:#0078d4,color:#fff,stroke:#005a9e style S1_VALID fill:#dcfce7,stroke:#16a34a,color:#14532d style S1_INVALID fill:#fee2e2,stroke:#dc2626,color:#7f1d1d style STAGE1 fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style S2_LOOKUP fill:#0078d4,color:#fff,stroke:#005a9e style S2_FOUND fill:#dcfce7,stroke:#16a34a,color:#14532d style S2_NOTFOUND fill:#fee2e2,stroke:#dc2626,color:#7f1d1d style STAGE2 fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style S3_SEARCH fill:#7c3aed,color:#fff,stroke:#6d28d9 style S3_FOUND fill:#dcfce7,stroke:#16a34a,color:#14532d style S3_NOTFOUND fill:#fee2e2,stroke:#dc2626,color:#7f1d1d style STAGE3 fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style S4_CHECK fill:#0078d4,color:#fff,stroke:#005a9e style S4_VALID fill:#dcfce7,stroke:#16a34a,color:#14532d style S4_INVALID fill:#fef3c7,stroke:#f59e0b,color:#78350f style STAGE4 fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style S5_OPTIONS fill:#fef3c7,stroke:#f59e0b,color:#78350f style S5_COMMERCIAL fill:#f5f3ff,stroke:#8b5cf6,color:#3b0764 style S5_MANUAL fill:#f5f3ff,stroke:#8b5cf6,color:#3b0764 style STAGE5 fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style RESULT fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d

Stage Details

1
Data Preparation & Normalisation

Before any validation can happen, the raw ERP data needs to be cleansed and structured into a consistent format. This includes standardising company names (handling legal form abbreviations like GmbH, AG, S.A., B.V.), normalising addresses (country codes, postal formats, character encoding), and formatting existing VAT IDs into the correct pattern (adding/removing country prefixes, stripping spaces and dashes). This step dramatically improves match rates in all subsequent stages.

2
Validate Existing VAT IDs

All vendor records that already contain a VAT ID are checked against the official EU VIES database. For UK vendors, the HMRC VAT API is used. VIES returns the registration status and, for most countries, the registered company name and address — enabling an additional cross-check against the ERP record.

3
National Business Register Lookup

For records without a valid VAT ID, Orcha queries national business registers using the company name and address. Several countries offer searchable APIs where the VAT ID can be derived directly from the company registration number. This is particularly effective for France (SIRENE → algorithmic VAT derivation), Belgium (enterprise number = VAT number), and the Netherlands (KVK API).

4
AI-Powered Web Research

For remaining unresolved records, an AI agent performs targeted web searches. In DACH countries, companies are legally required to display their VAT ID on their Impressum page. The agent searches for the company website, navigates to the Impressum or legal disclosure page, and extracts the VAT ID using pattern recognition. This also covers companies that publish their tax ID in footers, terms & conditions, or public directories.

5
Re-Validation

All VAT IDs discovered in Stages 3 and 4 are validated against VIES (or the relevant national system) before being written back to the master data. This ensures that only confirmed, currently active VAT IDs enter the ERP system. Any candidate that fails validation is flagged for further investigation.

6
Remaining Records — Commercial Enrichment or Manual Review

Records that could not be resolved through the automated pipeline are handled through two channels:

  • Commercial data providers (Dun & Bradstreet, Creditreform, Bureau van Dijk / Orbis) offer batch enrichment with high-quality fuzzy matching. These services excel at matching incomplete or inconsistent company data to the correct entity. A one-time batch run is often sufficient.
  • Manual review queue where each record is presented with the best candidate matches, confidence scores, and links to relevant sources. This allows the AP team to resolve edge cases efficiently.

Available Registers & APIs

An overview of the official and commercial data sources available for VAT ID validation and enrichment, by country.

EU — Official VIES Validation

VIES — VAT Information Exchange System

The European Commission’s official service for verifying EU VAT numbers. Covers all 27 EU member states plus Northern Ireland. REST and SOAP API available. Limitation: validation only — you must already have the VAT ID. No reverse lookup by company name.

National Business Registers with API Access

Country Register Name Search VAT ID API
France SIRENE (INSEE) Yes Derivable from SIREN number REST API
Belgium KBO / BCE Yes Enterprise number = VAT number Public search + data dumps
Netherlands KVK Yes BTW-nummer included REST API
Germany Handelsregister Yes (website) USt-IdNr. not in register No official API
Italy Registro Imprese Yes Partita IVA included Limited API (via InfoCamere)
Austria Firmenbuch Yes (website) UID not in commercial register No public API

DACH Advantage: Impressum Requirement

Legal Requirement Works in Our Favour

German, Austrian, and Swiss law requires companies to publish their VAT identification number (USt-IdNr. / UID / MWST-Nr.) on their website’s Impressum or legal disclosure page. This makes the AI-powered web research stage particularly effective for DACH-region vendors — the VAT ID is almost always publicly accessible on the company’s own website.

Commercial Data Providers

Dun & Bradstreet (D&B Direct+)

World’s largest commercial database. Excellent fuzzy matching — handles abbreviations, misspellings, subsidiaries vs. parent entities. Company Match and Company Search API endpoints.

Creditreform

Strong coverage in the DACH region. Matching and enrichment services including VAT IDs. Particularly relevant given Regnology’s vendor base.

Bureau van Dijk / Orbis (Moody’s)

400+ million companies worldwide. Purpose-built fuzzy matching for entity resolution. REST API available. Strong in Western Europe.

OpenCorporates

Aggregates data from 180+ company registers. Returns company register numbers — VAT IDs not always included but can be derived for some countries.

Beyond the EU

For vendors outside the EU, there is no single equivalent to VIES. However, several countries offer their own validation systems with API access.

International Tax ID Validation Systems

Country / Region System Name Search API
United Kingdom HMRC VAT API No (validation only) REST API
Switzerland UID Register (uid.admin.ch) Yes SOAP API
Norway Brønnøysund Register (brreg.no) Yes REST API
Australia ABN Lookup Yes REST API
India GST Verification Portal No (validation only) Limited
Brazil Receita Federal (CNPJ) No (validation only) Semi-official

Switzerland Deserves Special Attention

The Swiss UID register is exceptionally well-designed for this use case. It allows full-text company name search and returns the UID number, which serves as the VAT number (with “MWST” suffix). The SOAP API makes it possible to programmatically resolve Swiss vendors with high accuracy — comparable to the best EU registers. Given the strong business ties between DACH and Switzerland, this is a highly relevant data source.

Resolving French Vendors from Messy Data

France is one of the most favourable countries for VAT enrichment. The VAT number can be mathematically derived from the business registration number — no lookup required.

Why France Is a Best Case

Unlike most EU countries where the VAT ID must be found in a register or on a website, the French numéro de TVA intracommunautaire can be algorithmically computed from the 9-digit SIREN number. If you can match a vendor to its SIREN entry, you have the VAT ID — guaranteed, no additional lookup needed.

The French Business Register Landscape

Identifier Format What It Is
SIREN 9 digits Unique identifier for the legal entity (entreprise). Assigned by INSEE at registration. This is the key to deriving the VAT ID.
SIRET 14 digits (SIREN + 5-digit NIC) Identifies a specific establishment (site/branch) of the company. Multiple SIRETs can exist per SIREN.
TVA Intracommunautaire FR + 2-digit key + SIREN The EU VAT ID. Algorithmically derived: key = (12 + 3 × (SIREN mod 97)) mod 97

How It Works with Messy Vendor Data

The typical scenario: ERP vendor records contain French company names with inconsistent formatting, missing accents, abbreviated legal forms, and incomplete addresses. Here is how the pipeline handles this:

flowchart TD RAW["Messy ERP Record\n'STE DUPONT & FILS SARL\n14 Rue de la Paix, Paris'"] subgraph NORM ["Step 1 — Normalisation"] direction TB N1["Clean company name\nExpand abbreviations: STE → SOCIÉTÉ\nStandardise legal form: SARL → S.A.R.L.\nHandle accents & special characters"] N2["Normalise address\nParse postal code, city, street\nStandardise to French postal format"] end subgraph SIRENE ["Step 2 — SIRENE API Search"] direction TB S1["Full-text search on SIRENE\nAPI endpoint: api.insee.fr\nQuery by name + postal code + city"] S2["Fuzzy matching & ranking\nMultiple candidates returned\nScore by name similarity + address proximity"] S3["Best Match Found\nSIREN: 443 061 841\nDENOMINATION: SOCIETE DUPONT ET FILS"] end subgraph DERIVE ["Step 3 — Derive VAT ID"] direction TB D1["Apply formula\nSIREN = 443061841\nkey = (12 + 3 × (443061841 mod 97)) mod 97\nkey = (12 + 3 × 64) mod 97 = 17"] D2["VAT ID = FR17443061841"] end subgraph VALIDATE ["Step 4 — VIES Validation"] direction TB V1["Check FR17443061841 against VIES\nConfirm active registration\nCross-check returned name & address"] V_OK["Validated & Confirmed\nName match: 94%\nAddress match: 91%"] end RESULT["Enriched Record\nVAT ID: FR17443061841\nConfidence: High\nSource: SIRENE + VIES"] RAW --> N1 N1 --> N2 N2 --> S1 S1 --> S2 S2 --> S3 S3 --> D1 D1 --> D2 D2 --> V1 V1 --> V_OK V_OK --> RESULT style RAW fill:#fef3c7,stroke:#f59e0b,color:#78350f style N1 fill:#0e7490,color:#fff,stroke:#0891b2 style N2 fill:#0e7490,color:#fff,stroke:#0891b2 style NORM fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style S1 fill:#0078d4,color:#fff,stroke:#005a9e style S2 fill:#0078d4,color:#fff,stroke:#005a9e style S3 fill:#dcfce7,stroke:#16a34a,color:#14532d style SIRENE fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style D1 fill:#7c3aed,color:#fff,stroke:#6d28d9 style D2 fill:#dcfce7,stroke:#16a34a,color:#14532d style DERIVE fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style V1 fill:#0078d4,color:#fff,stroke:#005a9e style V_OK fill:#dcfce7,stroke:#16a34a,color:#14532d style VALIDATE fill:#f8fafc,stroke:#94a3b8,stroke-dasharray: 5 5 style RESULT fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d

Handling Common Data Quality Issues

Abbreviated & Misspelled Names

French vendor records frequently use abbreviations (STE for Société, ETS for Établissements, CIE for Compagnie) or contain typos. The normalisation step expands these before searching. SIRENE’s full-text search is tolerant of minor variations, and we apply additional fuzzy matching on the returned candidates.

Inconsistent Legal Forms

S.A.R.L., SARL, Sarl, S.A.S., SAS, S.A., SA — French legal form abbreviations appear in many variations. The normalisation layer maps all variants to a canonical form before searching. This prevents false negatives where the name matches but the legal form string differs.

Missing or Partial Addresses

Many vendor records only have a city or postal code, not a full street address. SIRENE allows searching by postal code + city + name, which is often sufficient. When only the city is known, we search by name + department (first 2 digits of postal code) to narrow results.

Accents & Special Characters

French company names contain accented characters (é, è, ê, à, ç, ô) that are often stripped or garbled in ERP systems. The normalisation step handles both directions: adding likely accents to stripped names and removing accents for comparison, ensuring matches regardless of encoding.

Multiple Establishments (SIRETs)

A single French company (one SIREN) can have many establishments (SIRETs) — e.g., branches, warehouses, offices. The vendor record might reference a specific site. SIRENE returns all establishments, and we match to the correct SIRET while extracting the parent SIREN for VAT derivation.

Ceased or Merged Companies

SIRENE includes historical data — companies that have been dissolved, merged, or renamed. When a vendor’s SIREN points to a ceased entity, the pipeline checks for successor companies and flags the record for review with the successor’s details and new VAT ID.

SIRENE API Details

API Access & Rate Limits

The SIRENE API is operated by INSEE and freely accessible after registration. Key characteristics:

  • Endpoint: api.insee.fr/entreprises/sirene/V3.11
  • Authentication: OAuth2 bearer token (free registration)
  • Rate limit: 30 requests/minute (standard), higher with partnership agreement
  • Search fields: denominationUniteLegale (company name), codePostalEtablissement (postal code), libelleCommuneEtablissement (city), and many more
  • Fuzzy search: Built-in phonetic matching (useful for misspelled names)
  • Coverage: ~12 million active legal entities, ~32 million establishments (including historical)

VAT Derivation Formula

Computing the TVA Number from SIREN

The French intra-community VAT number follows a deterministic formula. There is no database lookup needed — it is pure arithmetic:

StepOperationExample (SIREN = 443 061 841)
1Take the 9-digit SIREN number443061841
2Compute: SIREN mod 97443061841 mod 97 = 64
3Compute: (12 + 3 × result) mod 97(12 + 3 × 64) mod 97 = 204 mod 97 = 10
4Zero-pad to 2 digits → this is the key10
5Concatenate: FR + key + SIRENFR10443061841

This formula is defined by French tax law and works for all standard French companies. The only exceptions are certain entities with special tax statuses (e.g., some non-profit organisations or public bodies), which may have a different VAT key or no intra-community VAT number at all.

~12M
Active French companies in SIRENE
100%
VAT derivation accuracy from SIREN
Free
SIRENE API access
30/min
Standard API rate limit

Bottom Line for French Vendors

Even with the messiest vendor data, the combination of SIRENE’s tolerant full-text search and the deterministic VAT derivation formula means that the vast majority of French vendors can be resolved automatically. The main challenge is not finding the VAT ID — it is correctly matching the messy ERP record to the right SIREN entry. This is where Orcha’s AI-powered normalisation and fuzzy matching provide the most value.

Questions to Scope the Project

To design the right solution and estimate effort, we need to understand the current state of your vendor master data.

1
How many vendor records are there?

Total number of active vendor/supplier records in the ERP. This determines the scale of the enrichment run and which approach is most efficient.

2
How many already have a VAT ID?

What percentage of records already contain a VAT identification number? This tells us how large the “missing” vs. “validate existing” workstreams will be.

3
Which countries are represented?

A breakdown of vendor countries — especially: which are within the EU, and which are outside? This determines which register APIs and validation systems we can leverage.

4
Which non-EU countries are most represented?

For vendors outside the EU, the approach varies significantly by country. Knowing the top non-EU countries (e.g., UK, Switzerland, Norway, US) allows us to prioritise which international systems to integrate.

5
What data is available per vendor?

Which fields are typically populated — company name, address, country, existing tax IDs, contact details? The quality and completeness of existing fields determines match rates.

6
Is this a one-time cleanup or ongoing?

Should this be a one-time batch enrichment of the existing master data, or an ongoing process that validates new vendors as they are created?