Automated validation and enrichment of VAT identification numbers from existing ERP master data — preparing your vendor records for the e-invoicing era.
With mandatory e-invoicing approaching across the EU, vendor master data that was “good enough” for traditional processes now needs to meet strict structural requirements.
E-invoicing mandates such as ViDA (VAT in the Digital Age) depend on a machine-readable, validated VAT ID on each structured invoice. Vendor records that were sufficient for manual AP processing therefore need to be enriched and validated before these requirements take effect.
A tiered approach that starts with the most reliable sources and progressively applies more sophisticated methods for remaining unresolved records.
Before any validation can happen, the raw ERP data needs to be cleansed and structured into a consistent format. This includes standardising company names (handling legal form abbreviations like GmbH, AG, S.A., B.V.), normalising addresses (country codes, postal formats, character encoding), and formatting existing VAT IDs into the correct pattern (adding/removing country prefixes, stripping spaces and dashes). This step dramatically improves match rates in all subsequent stages.
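The VAT ID formatting part of this step can be sketched as follows. This is a minimal illustration, not the production cleansing logic: the function name `normalise_vat_id` is hypothetical, and it assumes the caller passes the VAT prefix of the vendor's country (for example `EL` rather than `GR` for Greece).

```python
import re


def normalise_vat_id(raw: str, vat_prefix: str) -> str:
    """Return the VAT ID in upper case, without separators, with its country prefix.

    Hypothetical helper: `vat_prefix` is assumed to be the VAT prefix
    of the vendor's country, supplied by the caller.
    """
    # Drop everything that is not a letter or digit (spaces, dashes, dots, slashes).
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    prefix = vat_prefix.upper()
    # Add the country prefix only when it is missing.
    if not cleaned.startswith(prefix):
        cleaned = prefix + cleaned
    return cleaned
```

For example, `normalise_vat_id("de 123 456 789", "DE")` and `normalise_vat_id("123-456-789", "de")` both yield `DE123456789`, so later stages compare one canonical string per vendor.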
All vendor records that already contain a VAT ID are checked against the official EU VIES database. For UK vendors, the HMRC VAT API is used. VIES returns the registration status and, for most countries, the registered company name and address — enabling an additional cross-check against the ERP record.
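The cross-check between the registered name and the ERP name could look like the sketch below. It uses Python's standard `difflib` as a stand-in similarity measure. The function names and the 0.85 threshold are illustrative assumptions, not values taken from the pipeline.

```python
from difflib import SequenceMatcher


def name_similarity(erp_name: str, register_name: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two company names."""
    # Compare case-insensitively and collapse repeated whitespace.
    a = " ".join(erp_name.casefold().split())
    b = " ".join(register_name.casefold().split())
    return SequenceMatcher(None, a, b).ratio()


def needs_review(erp_name: str, register_name: str, threshold: float = 0.85) -> bool:
    """Flag a record when the registered name differs too much from the ERP name.

    The default threshold is an assumed starting point and would need tuning.
    """
    return name_similarity(erp_name, register_name) < threshold
```

A record whose VAT ID is valid but whose registered name scores below the threshold is a candidate for manual review, since the ID may belong to a different legal entity.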
For records without a valid VAT ID, Orcha queries national business registers using the company name and address. Several countries offer searchable APIs where the VAT ID can be derived directly from the company registration number. This is particularly effective for France (SIRENE → algorithmic VAT derivation), Belgium (enterprise number = VAT number), and the Netherlands (KVK API).
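For Belgium, the derivation is a matter of reformatting. The sketch below assumes a 10-digit enterprise number and applies the commonly documented mod-97 check (the last two digits equal 97 minus the first eight digits mod 97). The function name is hypothetical, and the number in the usage note is synthetic.

```python
import re


def belgian_vat_from_enterprise_number(enterprise_number: str) -> str:
    """Turn a Belgian enterprise number such as '0123.456.749' into a VAT ID."""
    digits = re.sub(r"\D", "", enterprise_number)
    if len(digits) != 10:
        raise ValueError("expected a 10-digit enterprise number")
    # Check digits: 97 minus (first eight digits mod 97) must equal the last two digits.
    expected_check = 97 - int(digits[:8]) % 97
    if expected_check != int(digits[8:]):
        raise ValueError("check digits do not match")
    return "BE" + digits
```

Calling it with the synthetic number `0123.456.749` returns `BE0123456749`. A candidate derived this way would still go through VIES validation before being written back.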
For remaining unresolved records, an AI agent performs targeted web searches. In DACH countries, companies are legally required to display their VAT ID on their Impressum page. The agent searches for the company website, navigates to the Impressum or legal disclosure page, and extracts the VAT ID using pattern recognition. This also covers companies that publish their tax ID in footers, terms & conditions, or public directories.
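The pattern-recognition step on an already downloaded Impressum text can be illustrated with regular expressions. This sketch covers only the common German (`DE` + 9 digits), Austrian (`ATU` + 8 digits) and Swiss (`CHE` + 9 digits) layouts. The pattern and function name are assumptions for illustration, and the agent's actual extraction logic may differ.

```python
import re

# Candidate patterns for DACH VAT IDs, allowing the usual spaces, dashes and dots.
CANDIDATE = re.compile(
    r"\b(?:DE ?\d{3} ?\d{3} ?\d{3}"
    r"|ATU ?\d{8}"
    r"|CHE[- ]?\d{3}\.?\d{3}\.?\d{3})\b"
)


def extract_vat_candidates(text: str) -> list:
    """Return de-duplicated VAT ID candidates found in the text, separators removed."""
    found = []
    for match in CANDIDATE.finditer(text):
        vat_id = re.sub(r"[^A-Z0-9]", "", match.group(0))
        if vat_id not in found:
            found.append(vat_id)
    return found
```

Every string returned here is only a candidate. It still has to pass validation against VIES or the relevant national system before it counts as a result.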
All VAT IDs discovered in Stages 3 and 4 are validated against VIES (or the relevant national system) before being written back to the master data. This ensures that only confirmed, currently active VAT IDs enter the ERP system. Any candidate that fails validation is flagged for further investigation.
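The gate described here can be sketched as a simple partition step. The validator is passed in as a function so that the sketch stays independent of any particular service. `gate_candidates` and its parameters are hypothetical names, and in practice `is_valid` would wrap a call to VIES or a national system.

```python
from typing import Callable, Dict, Iterable, Tuple


def gate_candidates(
    candidates: Iterable[Tuple[str, str]],
    is_valid: Callable[[str], bool],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split (vendor_id, vat_id) pairs into confirmed and flagged records."""
    confirmed: Dict[str, str] = {}
    flagged: Dict[str, str] = {}
    for vendor_id, vat_id in candidates:
        # Only candidates that pass validation may be written back to the ERP.
        target = confirmed if is_valid(vat_id) else flagged
        target[vendor_id] = vat_id
    return confirmed, flagged
```

Keeping the flagged records together with their rejected candidates preserves the evidence needed for the follow-up investigation.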
Records that could not be resolved through the automated pipeline are handled through two channels:
An overview of the official and commercial data sources available for VAT ID validation and enrichment, by country.
The European Commission’s official service for verifying EU VAT numbers. Covers all 27 EU member states plus Northern Ireland. REST and SOAP API available. Limitation: validation only — you must already have the VAT ID. No reverse lookup by company name.
| Country | Register | Name Search | VAT ID Availability | API |
|---|---|---|---|---|
| France | SIRENE (INSEE) | Yes | Derivable from SIREN number | REST API |
| Belgium | KBO / BCE | Yes | Enterprise number = VAT number | Public search + data dumps |
| Netherlands | KVK | Yes | BTW-nummer included | REST API |
| Germany | Handelsregister | Yes (website) | USt-IdNr. not in register | No official API |
| Italy | Registro Imprese | Yes | Partita IVA included | Limited API (via InfoCamere) |
| Austria | Firmenbuch | Yes (website) | UID not in commercial register | No public API |
German, Austrian, and Swiss law requires companies to publish their VAT identification number (USt-IdNr. / UID / MWST-Nr.) on their website’s Impressum or legal disclosure page. This makes the AI-powered web research stage particularly effective for DACH-region vendors — the VAT ID is almost always publicly accessible on the company’s own website.
World’s largest commercial database. Excellent fuzzy matching — handles abbreviations, misspellings, subsidiaries vs. parent entities. Company Match and Company Search API endpoints.
Strong coverage in the DACH region. Matching and enrichment services including VAT IDs. Particularly relevant given Regnology’s vendor base.
400+ million companies worldwide. Purpose-built fuzzy matching for entity resolution. REST API available. Strong in Western Europe.
Aggregates data from 180+ company registers. Returns company register numbers — VAT IDs not always included but can be derived for some countries.
For vendors outside the EU, there is no single equivalent to VIES. However, several countries offer their own validation systems with API access.
| Country / Region | System | Name Search | API |
|---|---|---|---|
| United Kingdom | HMRC VAT API | No (validation only) | REST API |
| Switzerland | UID Register (uid.admin.ch) | Yes | SOAP API |
| Norway | Brønnøysund Register (brreg.no) | Yes | REST API |
| Australia | ABN Lookup | Yes | REST API |
| India | GST Verification Portal | No (validation only) | Limited |
| Brazil | Receita Federal (CNPJ) | No (validation only) | Semi-official |
The Swiss UID register is exceptionally well-designed for this use case. It allows full-text company name search and returns the UID number, which serves as the VAT number (with “MWST” suffix). The SOAP API makes it possible to programmatically resolve Swiss vendors with high accuracy — comparable to the best EU registers. Given the strong business ties between DACH and Switzerland, this is a highly relevant data source.
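Once a UID has been resolved, turning it into the VAT number notation is a formatting step. The sketch below assumes the German-language `MWST` suffix mentioned above and does not verify the UID check digit. The function name is hypothetical.

```python
import re


def format_swiss_vat(uid: str) -> str:
    """Format a Swiss UID such as 'CHE123456789' as 'CHE-123.456.789 MWST'."""
    digits = re.sub(r"\D", "", uid)
    if len(digits) != 9:
        raise ValueError("a Swiss UID contains exactly 9 digits")
    return f"CHE-{digits[0:3]}.{digits[3:6]}.{digits[6:9]} MWST"
```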
France is one of the most favourable countries for VAT enrichment. The VAT number can be mathematically derived from the business registration number — no lookup required.
Unlike most EU countries, where the VAT ID must be found in a register or on a website, the French numéro de TVA intracommunautaire can be computed algorithmically from the 9-digit SIREN number. Once a vendor is matched to its SIREN entry, the VAT ID follows directly for standard companies, with no additional lookup. The derived ID is still validated against VIES before it is written back.
| Identifier | Format | What It Is |
|---|---|---|
| SIREN | 9 digits | Unique identifier for the legal entity (entreprise). Assigned by INSEE at registration. This is the key to deriving the VAT ID. |
| SIRET | 14 digits (SIREN + 5-digit NIC) | Identifies a specific establishment (site/branch) of the company. Multiple SIRETs can exist per SIREN. |
| TVA Intracommunautaire | FR + 2-digit key + SIREN | The EU VAT ID. Algorithmically derived: key = (12 + 3 × (SIREN mod 97)) mod 97 |
The typical scenario: ERP vendor records contain French company names with inconsistent formatting, missing accents, abbreviated legal forms, and incomplete addresses. Here is how the pipeline handles this:
French vendor records frequently use abbreviations (STE for Société, ETS for Établissements, CIE for Compagnie) or contain typos. The normalisation step expands these before searching. SIRENE’s full-text search is tolerant of minor variations, and we apply additional fuzzy matching on the returned candidates.
S.A.R.L., SARL, Sarl, S.A.S., SAS, S.A., SA — French legal form abbreviations appear in many variations. The normalisation layer maps all variants to a canonical form before searching. This prevents false negatives where the name matches but the legal form string differs.
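A minimal version of this mapping is sketched below. The set of legal forms is a small illustrative sample, not the full list used by the normalisation layer, and `split_legal_form` is a hypothetical name.

```python
from typing import Optional, Tuple

# Illustrative sample of canonical French legal forms (assumed, not exhaustive).
KNOWN_FORMS = {"SARL", "SAS", "SASU", "SA", "EURL", "SNC"}


def split_legal_form(name: str) -> Tuple[str, Optional[str]]:
    """Separate the core company name from its canonical legal form, if any."""
    legal_form = None
    core_tokens = []
    for token in name.split():
        # 'S.A.R.L.', 'Sarl' and 'SARL' all collapse to 'SARL'.
        compact = token.replace(".", "").upper()
        if compact in KNOWN_FORMS:
            legal_form = compact
        else:
            core_tokens.append(token)
    return " ".join(core_tokens), legal_form
```

Searching with the core name and comparing the legal form separately means that `Dupont S.A.R.L.` and `Dupont Sarl` produce the same query.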
Many vendor records only have a city or postal code, not a full street address. SIRENE allows searching by postal code + city + name, which is often sufficient. When only the city is known, we search by name + department (first 2 digits of postal code) to narrow results.
French company names contain accented characters (é, è, ê, à, ç, ô) that are often stripped or garbled in ERP systems. The normalisation step handles both directions: adding likely accents to stripped names and removing accents for comparison, ensuring matches regardless of encoding.
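The accent-removal direction can be sketched with Python's standard `unicodedata` module. The helper name `comparison_key` is an assumption. Restoring likely accents on stripped names is a separate problem that this sketch does not cover.

```python
import unicodedata


def comparison_key(name: str) -> str:
    """Return an accent-free, case-insensitive key for comparing company names."""
    # NFKD splits 'é' into 'e' plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Fold case and collapse repeated whitespace.
    return " ".join(without_accents.casefold().split())
```

With this key, `Société Générale` and `SOCIETE GENERALE` compare as equal even though the stored strings differ.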
A single French company (one SIREN) can have many establishments (SIRETs) — e.g., branches, warehouses, offices. The vendor record might reference a specific site. SIRENE returns all establishments, and we match to the correct SIRET while extracting the parent SIREN for VAT derivation.
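Extracting the parent SIREN from a matched SIRET is a prefix operation, as sketched below. The sketch assumes that the SIRET passes the Luhn checksum, which is how these identifiers are commonly validated. Treat that check as an assumption to verify against INSEE documentation, since some entities may follow a different rule. The function names are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True when a digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def siren_from_siret(siret: str) -> str:
    """Return the 9-digit SIREN that prefixes a 14-digit SIRET."""
    digits = "".join(ch for ch in siret if ch.isdigit())
    if len(digits) != 14:
        raise ValueError("a SIRET contains exactly 14 digits")
    if not luhn_valid(digits):
        raise ValueError("SIRET fails the Luhn checksum")
    return digits[:9]
```

For example, `siren_from_siret("443 061 841 00047")` returns `443061841`, which is the input for the VAT derivation.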
SIRENE includes historical data — companies that have been dissolved, merged, or renamed. When a vendor’s SIREN points to a ceased entity, the pipeline checks for successor companies and flags the record for review with the successor’s details and new VAT ID.
The SIRENE API is operated by INSEE and freely accessible after registration. Key characteristics:
The French intra-community VAT number follows a deterministic formula. There is no database lookup needed — it is pure arithmetic:
| Step | Operation | Example (SIREN = 443 061 841) |
|---|---|---|
| 1 | Take the 9-digit SIREN number | 443061841 |
| 2 | Compute: SIREN mod 97 | 443061841 mod 97 = 82 |
| 3 | Compute: (12 + 3 × result) mod 97 | (12 + 3 × 82) mod 97 = 258 mod 97 = 64 |
| 4 | Zero-pad to 2 digits → this is the key | 64 |
| 5 | Concatenate: FR + key + SIREN | FR64443061841 |
This formula is defined by French tax law and works for all standard French companies. The only exceptions are certain entities with special tax statuses (e.g., some non-profit organisations or public bodies), which may have a different VAT key or no intra-community VAT number at all.
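The derivation fits in a few lines of code. The sketch below implements the formula key = (12 + 3 × (SIREN mod 97)) mod 97 for standard companies. The function name is hypothetical, and entities with a special tax status are out of scope.

```python
def french_vat_from_siren(siren: str) -> str:
    """Derive the French intra-community VAT ID from a 9-digit SIREN."""
    digits = "".join(ch for ch in siren if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("a SIREN contains exactly 9 digits")
    # key = (12 + 3 * (SIREN mod 97)) mod 97, zero-padded to two digits
    key = (12 + 3 * (int(digits) % 97)) % 97
    return f"FR{key:02d}{digits}"
```

For the SIREN 443 061 841, the remainder mod 97 is 82, the key is 258 mod 97 = 64, and the function returns `FR64443061841`.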
Even with the messiest vendor data, the combination of SIRENE’s tolerant full-text search and the deterministic VAT derivation formula means that the vast majority of French vendors can be resolved automatically. The main challenge is not finding the VAT ID — it is correctly matching the messy ERP record to the right SIREN entry. This is where Orcha’s AI-powered normalisation and fuzzy matching provide the most value.
To design the right solution and estimate effort, we need to understand the current state of your vendor master data.
Total number of active vendor/supplier records in the ERP. This determines the scale of the enrichment run and which approach is most efficient.
What percentage of records already contain a VAT identification number? This tells us how large the “missing” vs. “validate existing” workstreams will be.
A breakdown of vendor countries — especially: which are within the EU, and which are outside? This determines which register APIs and validation systems we can leverage.
For vendors outside the EU, the approach varies significantly by country. Knowing the top non-EU countries (e.g., UK, Switzerland, Norway, US) allows us to prioritise which international systems to integrate.
Which fields are typically populated — company name, address, country, existing tax IDs, contact details? The quality and completeness of existing fields determines match rates.
Should this be a one-time batch enrichment of the existing master data, or an ongoing process that validates new vendors as they are created?