From any PDF to a legally compliant e-invoice — automatically.
Most invoice tools expect clean, structured data. Your invoices usually aren't. InvoiceXML's AI pipeline reads any PDF — typed, scanned, or photographed — extracts every required field, and produces a structured UBL, CII, ZUGFeRD, Factur-X, or XRechnung file, automatically validated against Schematron rules.
The Problem
E-invoicing mandates don't care what format your suppliers use
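European e-invoicing mandates are here. Germany's B2B requirement began phasing in on 1 January 2025, when every domestic business had to be able to receive e-invoices. France's rollout starts in September 2026. The EU's ViDA package extends mandatory e-invoicing to cross-border B2B transactions across member states from July 2030.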
But your incoming invoices are still a pile of PDFs — some exported from accounting software, some scanned from paper, some photographed on a phone. The legal requirement is clear. The reality of your invoice stack is not.
Open-source libraries can convert structured data between formats. They cannot read an unstructured document. That gap is where businesses get stuck — and where InvoiceXML begins.
Format Mismatch
Your ERP wants structured XML. Your suppliers send PDFs in a hundred different layouts.
Library Gap
Open-source tools convert formats. They can't read an unstructured or scanned document.
Compliance Risk
Manual data entry is the current fallback. It doesn't scale, and it fails at the worst moments.
How it works
Four stages. Zero manual work.
A complete AI document pipeline — from raw PDF input to legally valid e-invoice output. Every step automated, every output validated.
Ingest
Any invoice, any quality
Accepts native PDFs, scanned documents, MS Office files (DOCX, XLSX), and even images — photographed invoices up to 20 MB. No preprocessing required.
Technical note: Automatic format detection. Scanned documents go through OCR before extraction.
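To illustrate what automatic format detection can look like, here is a minimal Python sketch that inspects the first bytes of an upload. The function name `detect_format`, the `MAX_UPLOAD_BYTES` constant, and the returned labels are assumptions made for this example, not InvoiceXML's actual implementation, and the 20 MB limit is applied to every upload as a simplification. The byte signatures are the standard ones for PDF, PNG, JPEG, and ZIP (DOCX and XLSX files are ZIP containers).

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed limit, based on the 20 MB figure above

# Standard file signatures ("magic numbers") matched against the leading bytes.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "office-zip"),  # DOCX and XLSX are ZIP archives
]


def detect_format(data: bytes) -> str:
    """Return a coarse format label for an uploaded document (hypothetical helper)."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds the assumed 20 MB limit")
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A PDF made up only of page images would still be labelled `pdf` here. Deciding whether it needs OCR takes a further check for an extractable text layer.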
Extract
AI reads your invoice like a human — but faster
Our extraction model identifies every field required by EN 16931: seller and buyer identity, line items, tax breakdowns, dates, payment terms, currency. It understands layout, not just text — so totals in footers, tax numbers in headers, and line items in tables are all found correctly.
Technical note: Extraction model trained on European invoice formats in DE, FR, EN, IT, ES. Returns confidence scores per field.
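Per-field confidence scores are most useful when they feed a review queue. The sketch below assumes a hypothetical response shape: a dictionary that maps each field name to a `value` and a `confidence` between 0 and 1. The field names, sample values, and the 0.85 threshold are illustrative and are not taken from the documented API.

```python
def fields_needing_review(fields: dict, threshold: float = 0.85) -> list:
    """Return the names of fields below the threshold, least confident first."""
    low = [
        (info["confidence"], name)
        for name, info in fields.items()
        if info["confidence"] < threshold
    ]
    return [name for _, name in sorted(low)]


# Hypothetical extraction result, for illustration only.
extracted = {
    "invoice_number": {"value": "RE-2025-0042", "confidence": 0.99},
    "seller_vat_id": {"value": "DE123456789", "confidence": 0.72},
    "total_with_vat": {"value": "119.00", "confidence": 0.91},
    "due_date": {"value": "2025-03-01", "confidence": 0.64},
}

print(fields_needing_review(extracted))  # ['due_date', 'seller_vat_id']
```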
Map & Enrich
Raw data becomes a valid compliance schema
Extracted fields are mapped to the EN 16931 semantic data model. Missing optional fields are inferred where possible. Required fields that cannot be extracted are flagged with structured errors — not silent failures.
Technical note: Output is a typed, validated data model — not raw key-value pairs. Supports MINIMUM, BASIC WL, BASIC, EN 16931, and EXTENDED profiles.
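A rough picture of "typed model plus structured errors", as a Python sketch. The class, attribute names, and error shape are assumptions made for this example. Only the business-term IDs in the comments (BT-1, BT-2, BT-5, BT-20, BT-27, BT-44) come from EN 16931, and the set of required terms is deliberately shortened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceHeader:
    """A handful of EN 16931 header terms; attribute names are this sketch's own."""
    invoice_number: Optional[str] = None  # BT-1
    issue_date: Optional[str] = None      # BT-2
    currency_code: Optional[str] = None   # BT-5
    seller_name: Optional[str] = None     # BT-27
    buyer_name: Optional[str] = None      # BT-44
    payment_terms: Optional[str] = None   # BT-20 (optional)


# Attributes this sketch treats as required, with their business-term IDs.
REQUIRED = {
    "invoice_number": "BT-1",
    "issue_date": "BT-2",
    "currency_code": "BT-5",
    "seller_name": "BT-27",
    "buyer_name": "BT-44",
}


def missing_required(header: InvoiceHeader) -> list:
    """Return one structured error per required field that is empty or missing."""
    return [
        {"field": attr, "term": term, "code": "missing_required_field"}
        for attr, term in REQUIRED.items()
        if not getattr(header, attr)
    ]
```

For example, a header with only the invoice number, issue date, and seller name filled in yields two errors, for BT-5 and BT-44, rather than an incomplete file that fails later.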
Generate & Validate
A validated file, ready for your review before delivery
The mapped data is used to generate your chosen output format — ZUGFeRD PDF/A-3b, Factur-X PDF/A-3b, XRechnung XML, CII XML, or UBL XML. Every output is run through Schematron validation against official EU business rules before being returned. Always review the output before sending.
Technical note: Validation against EN 16931 Schematron + format-specific rules (e.g. CIUS-REC-DE for XRechnung). Errors returned as structured RFC 7807 responses.
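For readers who have not worked with RFC 7807, the sketch below shows how a client might turn a problem document into a short report. The `type`, `title`, `status`, and `detail` members are defined by the RFC. The `violations` array is an extension member assumed for this example, and the sample payload is invented for illustration rather than copied from a real InvoiceXML response.

```python
import json


def summarize_problem(body: str) -> str:
    """Turn an RFC 7807 problem document into a short, readable report."""
    problem = json.loads(body)
    status = problem.get("status")
    title = problem.get("title", "Unknown problem")
    lines = [f"{status} {title}" if status is not None else title]
    if "detail" in problem:
        lines.append(problem["detail"])
    # "violations" is an extension member assumed by this sketch, not part of the RFC.
    for violation in problem.get("violations", []):
        lines.append(f"- {violation['rule']}: {violation['message']}")
    return "\n".join(lines)


# Invented sample payload; the rule IDs are real EN 16931 business rules.
sample = json.dumps({
    "type": "about:blank",
    "title": "Validation failed",
    "status": 422,
    "detail": "2 Schematron rules failed.",
    "violations": [
        {"rule": "BR-02", "message": "An Invoice shall have an Invoice number (BT-1)."},
        {"rule": "BR-CO-15", "message": "Total with VAT must equal total without VAT plus VAT."},
    ],
})

print(summarize_problem(sample))
```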
Why it's hard
This is not a format converter.
This is document intelligence.
Real invoices are messy
Supplier invoices come in hundreds of layouts. Tax numbers appear in different positions. Line item tables have inconsistent column headers. Totals are sometimes miscalculated. Our extraction layer handles real-world variance — not just textbook examples.
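Miscalculated totals are one kind of variance that can be caught mechanically. As a minimal sketch, the Python below checks two EN 16931 arithmetic rules: BR-CO-10 (the sum of line net amounts must match the line amounts) and BR-CO-15 (total with VAT equals total without VAT plus VAT). The function name and argument names are assumptions for this example. Amounts are passed as strings and compared as `Decimal` to avoid floating-point rounding.

```python
from decimal import Decimal


def check_totals(line_nets, sum_of_lines, total_without_vat, vat_total, total_with_vat):
    """Return the IDs of the EN 16931 arithmetic rules that the amounts violate."""
    violations = []
    # BR-CO-10: sum of invoice line net amounts (BT-106) = sum of BT-131 over all lines.
    if sum(map(Decimal, line_nets), Decimal("0")) != Decimal(sum_of_lines):
        violations.append("BR-CO-10")
    # BR-CO-15: total with VAT (BT-112) = total without VAT (BT-109) + VAT total (BT-110).
    if Decimal(total_without_vat) + Decimal(vat_total) != Decimal(total_with_vat):
        violations.append("BR-CO-15")
    return violations


print(check_totals(["60.00", "40.00"], "100.00", "100.00", "19.00", "119.00"))  # []
print(check_totals(["60.00", "40.00"], "100.00", "100.00", "19.00", "118.00"))  # ['BR-CO-15']
```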
Open source stops at the format layer
Libraries like Mustang or the Factur-X Python package are excellent at converting structured data into compliant XML. They cannot read an unstructured PDF. That's the gap InvoiceXML fills: the step before the step that open source handles.
Compliance is a moving target
EN 16931 profiles, XRechnung CIUS versions, Factur-X 1.0 vs 1.07 — the standards evolve continuously. We maintain the Schematron validation rules so you don't have to. Every API call runs against the current official specification.
Who it's for
Built for anyone who touches invoices at scale
Developers & engineering teams
Add e-invoicing to your product in an afternoon
You're adding compliance to your product or internal stack. You don't want to maintain a library, track standard updates, or debug Schematron errors on a Friday. One API endpoint handles the full pipeline. Keep shipping.
Read the API docs →
Finance & operations teams
Automate incoming invoices without writing a line of code
You process dozens or hundreds of supplier invoices a month. Manual entry is slow and error-prone. Connect InvoiceXML to Make, Zapier, or n8n — no code required. Every invoice, compliant and structured, without touching it.
See no-code integrations →
AI agents & autonomous workflows
Give your AI agent a compliance superpower
Your agent receives invoices, validates or converts them, and routes structured data downstream. InvoiceXML's MCP server lets AI assistants handle compliance natively — without leaving the workflow or calling a human.
Explore the MCP server →
Output Formats
One pipeline. Every major e-invoice standard.
Whatever format your counterparty, ERP, or tax authority requires — InvoiceXML generates it from the same source document.
EN 16931
The EU's core semantic model, based on Directive 2014/55/EU, that underpins every national format.
CII
Cross-Industry Invoice XML. The UN/CEFACT standard used by ZUGFeRD and Factur-X.
UBL
Universal Business Language. Used by Peppol, XRechnung, and many national standards.
ZUGFeRD
Germany's standard for modern B2B e-invoicing. PDF/A-3 with embedded CII XML.
Factur-X
France's equivalent, built on the same CII foundation. Accepted for B2G and expanding to B2B.
XRechnung
Germany's public sector standard. Pure UBL or CII XML, with no PDF wrapper.
Peppol
Pan-European e-delivery network. UBL-based, used for cross-border invoicing.
PDF/A-3
Archival PDF format with embedded XML, required for long-term storage compliance.
JSON
AI-extracted invoice as a validated JSON model. Every EN 16931 field ready to consume.
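The descriptions above can be condensed into a small lookup table, which is handy when the target format depends on what a counterparty can accept. This is a sketch only: the dictionary keys and the `formats_for` helper are names invented for this example, and the table covers just the syntax and container facts stated in this section.

```python
# XML syntax and container per output format, as described in this section.
FORMATS = {
    "zugferd": {"syntax": ("CII",), "container": "PDF/A-3"},
    "factur-x": {"syntax": ("CII",), "container": "PDF/A-3"},
    "xrechnung": {"syntax": ("UBL", "CII"), "container": None},
    "cii": {"syntax": ("CII",), "container": None},
    "ubl": {"syntax": ("UBL",), "container": None},
}


def formats_for(syntax: str, with_pdf: bool) -> list:
    """List formats that use the given XML syntax, with or without a PDF wrapper."""
    return sorted(
        name
        for name, spec in FORMATS.items()
        if syntax in spec["syntax"] and (spec["container"] is not None) == with_pdf
    )


print(formats_for("CII", with_pdf=True))   # ['factur-x', 'zugferd']
print(formats_for("UBL", with_pdf=False))  # ['ubl', 'xrechnung']
```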
Your invoice data never stays here
InvoiceXML processes your documents in memory and returns the result immediately. No invoice data is written to disk or retained after your API response. No training on your documents. HTTPS-only. EU-based infrastructure.
In-memory only
No disk writes
Zero retention
Deleted on response
No model training
Your data is yours
EU infrastructure
GDPR compliant
Every E-Invoicing Tool You Need
From AI-powered conversion to standards validation — a complete toolkit for global e-invoice compliance.
AI Tools
Convert messy or scanned PDF invoices into compliant e-invoices.
Create
Build compliant e-invoices with PDF/A-3b conformance from scratch.
Validate
Validate invoices against official standards and Schematron rules.
Render
Generate visual PDF previews from structured e-invoicing formats.
Convert
Transform between e-invoicing formats for cross-border compatibility.
Extract
Parse and extract structured data from invoice documents.