AI Document Pipeline

From any PDF to a legally compliant e-invoice — automatically.

Most invoice tools expect clean, structured data. Your invoices usually aren't. InvoiceXML's AI pipeline reads any PDF — typed, scanned, or photographed — extracts every required field, and produces a structured UBL, CII, ZUGFeRD, Factur-X, or XRechnung file, automatically validated against Schematron rules.

Works with scanned invoices

EN 16931 validated output

No data retained after processing

EU-based infrastructure

The Problem

E-invoicing mandates don't care what format your suppliers use

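European e-invoicing mandates are here. Germany's B2B requirement to receive e-invoices took effect in January 2025, with the obligation to issue them phasing in from 2027. France's mandate starts phasing in from September 2026. The EU's ViDA package makes e-invoicing the rule for cross-border B2B transactions from July 2030.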

But your incoming invoices are still a pile of PDFs — some exported from accounting software, some scanned from paper, some photographed on a phone. The legal requirement is clear. The reality of your invoice stack is not.

Open-source libraries can convert structured data between formats. They cannot read an unstructured document. That gap is where businesses get stuck — and where InvoiceXML begins.

Format Mismatch

Your ERP wants structured XML. Your suppliers send PDFs in a hundred different layouts.

Library Gap

Open-source tools convert formats. They can't read an unstructured or scanned document.

Compliance Risk

Manual data entry is the current fallback. It doesn't scale, and it fails at the worst moments.

How it works

Four stages. Zero manual work.

A complete AI document pipeline — from raw PDF input to legally valid e-invoice output. Every step automated, every output validated.

01

Ingest

Any invoice, any quality

Accepts native PDFs, scanned documents, MS Office files (DOCX, XLSX), and images such as photographed invoices, up to 20 MB. No preprocessing required.

Technical note: Automatic format detection. Scanned documents go through OCR before extraction.
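To make the idea of automatic format detection concrete, here is a minimal sketch that classifies an upload by its leading bytes rather than its file extension. It is an illustration only, not InvoiceXML's implementation: the names `InputKind` and `detect_kind` are hypothetical, and it assumes the 20 MB limit applies to every upload and is counted in binary megabytes.

```python
from enum import Enum

# Assumption: the 20 MB limit covers all uploads and means 20 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


class InputKind(Enum):
    PDF = "pdf"
    OFFICE = "office"    # DOCX and XLSX are ZIP containers
    IMAGE = "image"
    UNKNOWN = "unknown"


# Well-known leading "magic bytes" for each container type.
_SIGNATURES = [
    (b"%PDF-", InputKind.PDF),
    (b"PK\x03\x04", InputKind.OFFICE),
    (b"\xff\xd8\xff", InputKind.IMAGE),        # JPEG
    (b"\x89PNG\r\n\x1a\n", InputKind.IMAGE),   # PNG
]


def detect_kind(data: bytes) -> InputKind:
    """Classify an upload by its leading bytes, ignoring the file extension."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds 20 MB")
    for prefix, kind in _SIGNATURES:
        if data.startswith(prefix):
            return kind
    return InputKind.UNKNOWN
```

Deciding whether a PDF is native or scanned (and therefore needs OCR) is a separate step that this sketch does not cover.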

02

Extract

AI reads your invoice like a human — but faster

Our extraction model identifies every field required by EN 16931: seller and buyer identity, line items, tax breakdowns, dates, payment terms, currency. It understands layout, not just text — so totals in footers, tax numbers in headers, and line items in tables are all found correctly.

Technical note: Extraction model trained on European invoice formats in DE, FR, EN, IT, ES. Returns confidence scores per field.
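Per-field confidence scores are most useful when they drive a review step. The sketch below shows one way a client might use them, assuming a hypothetical response shape in which each field carries a `value` and a `confidence` between 0 and 1. The field names and the 0.85 threshold are illustrative assumptions, not the documented API.

```python
from typing import Any

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off; tune to your own risk tolerance


def fields_needing_review(
    extracted: dict[str, dict[str, Any]],
    threshold: float = REVIEW_THRESHOLD,
) -> list[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return sorted(
        name
        for name, field in extracted.items()
        if field["confidence"] < threshold
    )


# Made-up sample data in the assumed shape.
sample = {
    "seller_name": {"value": "Muster GmbH", "confidence": 0.99},
    "seller_vat_id": {"value": "DE123456789", "confidence": 0.62},
    "invoice_total": {"value": "119.00", "confidence": 0.97},
    "due_date": {"value": "2025-03-31", "confidence": 0.80},
}

print(fields_needing_review(sample))  # ['due_date', 'seller_vat_id']
```

Routing only the flagged fields to a human keeps review effort proportional to how uncertain the extraction actually was.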

03

Map & Enrich

Raw data becomes a valid compliance schema

Extracted fields are mapped to the EN 16931 semantic data model. Missing optional fields are inferred where possible. Required fields that cannot be extracted are flagged with structured errors — not silent failures.

Technical note: Output is a typed, validated data model — not raw key-value pairs. Supports MINIMUM, BASIC WL, BASIC, EN 16931, and EXTENDED profiles.
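The difference between a structured error and a silent failure can be sketched in a few lines. The example below models a handful of header fields that EN 16931 treats as mandatory (the BT numbers are the standard's business term identifiers) and reports every missing one. The class names, the error code string, and the message wording are hypothetical; the real data model is much larger.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InvoiceHeader:
    # A small subset of mandatory EN 16931 business terms.
    invoice_number: Optional[str]  # BT-1
    issue_date: Optional[str]      # BT-2
    currency: Optional[str]        # BT-5
    seller_name: Optional[str]     # BT-27
    buyer_name: Optional[str]      # BT-44


@dataclass
class FieldError:
    field: str
    code: str
    message: str


def missing_required(header: InvoiceHeader) -> list[FieldError]:
    """Report every empty required field instead of failing silently."""
    errors = []
    for f in fields(header):
        value = getattr(header, f.name)
        if value is None or not value.strip():
            errors.append(
                FieldError(
                    field=f.name,
                    code="required_field_missing",  # hypothetical error code
                    message=f"{f.name} could not be extracted",
                )
            )
    return errors
```

Collecting all errors in one pass, rather than stopping at the first, lets the caller fix an invoice in a single round trip.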

04

Generate & Validate

A validated file, ready for your review before delivery

The mapped data is used to generate your chosen output format — ZUGFeRD PDF/A-3b, Factur-X PDF/A-3b, XRechnung XML, CII XML, or UBL XML. Every output is run through Schematron validation against official EU business rules before being returned. Always review the output before sending.

Technical note: Validation against EN 16931 Schematron + format-specific rules (e.g. CIUS-REC-DE for XRechnung). Errors returned as structured RFC 7807 responses.
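RFC 7807 defines five standard members for a problem document (`type`, `title`, `status`, `detail`, `instance`) and allows extra members alongside them. The sketch below parses such a body on the client side. The sample payload is invented for illustration: the `violations` extension member and the `example.com` type URL are assumptions, while BR-CO-15 is the EN 16931 rule that ties the gross total to net plus VAT.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional

_STANDARD_MEMBERS = ("type", "title", "status", "detail", "instance")


@dataclass
class Problem:
    type: str = "about:blank"  # RFC 7807 default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: dict[str, Any] = field(default_factory=dict)


def parse_problem(body: str) -> Problem:
    """Split a problem+json body into standard members and extensions."""
    raw = json.loads(body)
    known = {k: raw[k] for k in _STANDARD_MEMBERS if k in raw}
    extra = {k: v for k, v in raw.items() if k not in _STANDARD_MEMBERS}
    return Problem(**known, extensions=extra)


# Invented sample body; the "violations" member is a hypothetical extension.
sample_body = json.dumps({
    "type": "https://example.com/problems/validation-failed",
    "title": "Schematron validation failed",
    "status": 422,
    "detail": "1 business rule violated",
    "violations": [{"rule": "BR-CO-15", "message": "Totals do not add up"}],
})

problem = parse_problem(sample_body)
for v in problem.extensions.get("violations", []):
    print(problem.status, v["rule"], v["message"])
```

Keeping unknown members in `extensions` means the client keeps working if the server adds new detail fields later.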

Why it's hard

This is not a format converter.
This is document intelligence.

Real invoices are messy

Supplier invoices come in hundreds of layouts. Tax numbers appear in different positions. Line item tables have inconsistent column headers. Totals are sometimes miscalculated. Our extraction layer handles real-world variance — not just textbook examples.
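A miscalculated total is the easiest kind of real-world mess to show in code. The sketch below recomputes VAT and gross from the line amounts using decimal arithmetic and compares them with the figures printed on the invoice. It is a simplified illustration, assuming a single VAT rate and half-up rounding to the cent; the function name and message format are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def check_totals(
    line_nets: list[str],
    vat_rate_percent: str,
    stated_vat: str,
    stated_gross: str,
) -> list[str]:
    """Recompute VAT and gross from line amounts and list any mismatches."""
    net = sum((Decimal(n) for n in line_nets), Decimal("0"))
    # Assumption: one VAT rate for the whole invoice, rounded half-up to cents.
    vat = (net * Decimal(vat_rate_percent) / 100).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    gross = net + vat

    problems = []
    if vat != Decimal(stated_vat):
        problems.append(f"VAT: stated {stated_vat}, computed {vat}")
    if gross != Decimal(stated_gross):
        problems.append(f"gross: stated {stated_gross}, computed {gross}")
    return problems


# Net 149.50 at 19 % gives VAT 28.405, which rounds to 28.41, so gross is 177.91.
print(check_totals(["100.00", "49.50"], "19", "28.40", "177.90"))
```

Using `Decimal` instead of floats avoids binary rounding artefacts, which matters when a one-cent difference decides whether a validation rule passes.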

Open source stops at the format layer

Libraries like Mustang or the Factur-X Python package are excellent at converting structured data into compliant XML. They cannot read an unstructured PDF. That's the gap InvoiceXML fills: the step before the one that open source handles.

Compliance is a moving target

EN 16931 profiles, XRechnung CIUS versions, Factur-X 1.0 vs 1.07 — the standards evolve continuously. We maintain the Schematron validation rules so you don't have to. Every API call runs against the current official specification.

Who it's for

Built for anyone who touches invoices at scale

Developers & engineering teams

Add e-invoicing to your product in an afternoon

You're adding compliance to your product or internal stack. You don't want to maintain a library, track standard updates, or debug Schematron errors on a Friday. One API endpoint handles the full pipeline. Keep shipping.

Read the API docs →

Finance & operations teams

Automate incoming invoices without writing a line of code

You process dozens or hundreds of supplier invoices a month. Manual entry is slow and error-prone. Connect InvoiceXML to Make, Zapier, or n8n — no code required. Every invoice, compliant and structured, without touching it.

See no-code integrations →

AI agents & autonomous workflows

Give your AI agent a compliance superpower

Your agent receives invoices, validates or converts them, and routes structured data downstream. InvoiceXML's MCP server lets AI assistants handle compliance natively — without leaving the workflow or calling a human.

Explore the MCP server →

Output Formats

One pipeline. Every global invoice standard.

Whatever format your counterparty, ERP, or tax authority requires — InvoiceXML generates it from the same source document.

EN 16931

The EU's core semantic model based on Directive 2014/55/EU that underpins every national format.

CII

Cross-Industry Invoice XML. The UN/CEFACT standard used by ZUGFeRD and Factur-X.

UBL

Universal Business Language. Used by Peppol, XRechnung, and many national standards.

ZUGFeRD

Germany's standard for modern B2B e-invoicing. PDF/A-3 with embedded CII XML.

Factur-X

France's equivalent, built on the same CII foundation. Required for B2G and expanding to B2B.

XRechnung

Germany's public sector standard. Pure UBL or CII XML — no PDF wrapper.

Peppol

Pan-European e-delivery network. UBL-based, used for cross-border invoicing.

PDF/A-3b

Archival PDF format with embedded XML, required for long-term storage compliance.

JSON

AI-extracted invoice as a validated JSON model. Every EN 16931 field ready to consume.
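Because several of the formats above share the same two XML syntaxes (XRechnung, for instance, can be either UBL or CII), downstream code often needs to tell which one it received. A minimal sketch, using the root element namespaces published by UN/CEFACT and OASIS; the function name `detect_syntax` is hypothetical, and the standard-library parser used here is only suitable for XML you trust.

```python
import xml.etree.ElementTree as ET

# Root element names in ElementTree's "{namespace}localname" notation.
CII_ROOT = (
    "{urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100}"
    "CrossIndustryInvoice"
)
UBL_INVOICE_ROOT = "{urn:oasis:names:specification:ubl:schema:xsd:Invoice-2}Invoice"
UBL_CREDIT_NOTE_ROOT = (
    "{urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2}CreditNote"
)


def detect_syntax(xml_bytes: bytes) -> str:
    """Return "CII", "UBL", or "unknown" based on the document's root element."""
    root = ET.fromstring(xml_bytes)
    if root.tag == CII_ROOT:
        return "CII"
    if root.tag in (UBL_INVOICE_ROOT, UBL_CREDIT_NOTE_ROOT):
        return "UBL"
    return "unknown"
```

This only identifies the syntax. Whether the document satisfies a particular profile or CIUS still has to be established by Schematron validation.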

Your invoice data never stays here

InvoiceXML processes your documents in memory and returns the result immediately. No invoice data is written to disk or retained after your API response. No training on your documents. HTTPS-only. EU-based infrastructure.

In-memory only

No disk writes

Zero retention

Deleted on response

No model training

Your data is yours

EU infrastructure

GDPR compliant

Full Tool Suite

Every E-Invoicing Tool You Need

From AI-powered conversion to standards validation — a complete toolkit for global e-invoice compliance.

AI Tools

Convert messy or scanned PDF invoices into compliant e-invoices.

Create

Build compliant e-invoices with PDF/A-3b conformance from scratch.

Validate

Validate invoices against official standards and Schematron rules.

Render

Generate visual PDF previews from structured e-invoicing formats.

Convert

Transform between e-invoicing formats for cross-border compatibility.

Extract

Parse and extract structured data from invoice documents.

Ready to automate your invoices?

Start your 30-day free trial. No credit card required.

Get Started