
Parse Any Invoice to Structured JSON using AI

Test the invoice-to-JSON extraction REST API: upload any invoice (typed, scanned, or photographed) and get back a validated, structured JSON object with every field extracted and verified.

REST API demo · POST /v1/parse/json · Accepted formats: PDF, JPEG, PNG, TIFF · Max 20 MB

How AI Invoice Parsing Works

Upload Any Invoice

Drop a native PDF, a scanned document, or a photo of a paper invoice. The AI handles any layout, orientation, language, or quality level.

AI Reads & Validates

The AI extracts every field, then verifies: tax number formats, address plausibility, unit price × quantity = line total, and sum of lines + tax = invoice total.

Receive Structured JSON

Download a normalised invoice JSON with consistent field names (seller, buyer, line items, tax breakdown, payment terms), ready for any app or database.

The Invoice Stack Reality

Most invoice parsing tools assume clean, computer-generated PDFs. Your suppliers don't send those. They send scanned faxes, smartphone photos of paper invoices, thermal-printed receipts with missing fields, and exported PDFs from a dozen different accounting systems, each with a completely different layout. InvoiceXML's AI engine was built for that reality. It reads invoices the way a human accountant would, understanding context, inferring missing values, and flagging anything it cannot verify with confidence.

Scanned & Photographed Invoices

OCR combined with semantic AI understands invoice structure regardless of scan quality, rotation, or partial occlusion, with no template configuration required.

Tax Number Verification

Extracted VAT and business registration numbers are verified against country-specific format rules, catching transposition errors and truncated values before they hit your system.
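
As a rough illustration of what a format-only check involves, here is a minimal Python sketch. The three patterns are the published VAT identifier formats for Germany, Italy, and the Netherlands; the function names and the choice of countries are assumptions made for this example, and the service's own rule set may differ. A format check of this kind catches truncated or malformed values, while catching transposed digits generally also needs a country-specific checksum, which is left out here.

```python
import re

# Format-only patterns for a few EU VAT identifiers (illustrative subset,
# not the service's actual rule set).
VAT_PATTERNS = {
    "DE": re.compile(r"DE\d{9}"),        # Germany: DE + 9 digits
    "IT": re.compile(r"IT\d{11}"),       # Italy: IT + 11 digits
    "NL": re.compile(r"NL\d{9}B\d{2}"),  # Netherlands: NL + 9 digits + B + 2 digits
}


def normalise_vat(raw: str) -> str:
    """Uppercase the value and strip spaces, dots, and hyphens."""
    return re.sub(r"[\s.\-]", "", raw).upper()


def vat_format_ok(raw: str) -> bool:
    """Return True if the value matches the format rule for its country prefix."""
    value = normalise_vat(raw)
    pattern = VAT_PATTERNS.get(value[:2])
    return bool(pattern and pattern.fullmatch(value))
```

For example, `vat_format_ok("DE 123 456 789")` passes, while a value with a dropped digit such as `"DE12345678"` is rejected. Unknown country prefixes are rejected too, which a production check would probably treat as "not verifiable" instead.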

Line-Item Arithmetic Check

Every line total is recomputed from unit price and quantity. Tax amounts are recalculated per category. The invoice grand total is verified. Discrepancies are flagged in the output.
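
The line and total checks can be sketched in a few lines of Python. This is an illustration, not the service's implementation: the key names `unitPrice`, `quantity`, and `lineTotal`, the half-up rounding, and the one-cent tolerance are all assumptions, and the per-category tax recalculation is omitted for brevity.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def money(value) -> Decimal:
    """Convert to Decimal and round to two places, half up."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def check_invoice(lines, tax_total, grand_total, tolerance=CENT):
    """Return a list of discrepancies; an empty list means every check passed."""
    issues = []
    net_sum = Decimal("0")
    for i, line in enumerate(lines, start=1):
        # Recompute the line total from unit price and quantity.
        expected = money(Decimal(str(line["unitPrice"])) * Decimal(str(line["quantity"])))
        actual = money(line["lineTotal"])
        if abs(expected - actual) > tolerance:
            issues.append(f"line {i}: expected {expected}, found {actual}")
        net_sum += actual
    # Sum of stated line totals plus tax should equal the invoice total.
    expected_total = money(net_sum + money(tax_total))
    if abs(expected_total - money(grand_total)) > tolerance:
        issues.append(f"total: expected {expected_total}, found {money(grand_total)}")
    return issues
```

Decimal arithmetic is used so that amounts such as 19.99 are not distorted by binary floating point. Differences of up to one cent are accepted here to absorb rounding conventions that vary between accounting systems.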

Address Cross-Referencing

Seller and buyer addresses are parsed into structured components (street, city, postal code, country) and checked for internal consistency, catching OCR errors in postal codes or truncated street names.
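
To make the idea of an internal consistency check concrete, the sketch below validates a structured address against the postal code format of its country. The `Address` class, its field names, and the three country patterns are assumptions chosen for illustration; the actual checks performed by the service are not documented on this page.

```python
import re
from dataclasses import dataclass

# Postal code formats for a few countries (illustrative subset).
POSTAL_PATTERNS = {
    "DE": re.compile(r"\d{5}"),
    "FR": re.compile(r"\d{5}"),
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),
}


@dataclass
class Address:
    street: str
    city: str
    postal_code: str
    country: str  # ISO 3166-1 alpha-2 code, e.g. "DE"


def address_issues(addr: Address) -> list[str]:
    """Return a list of problems found; an empty list means the address looks consistent."""
    issues = []
    if not addr.street.strip():
        issues.append("street is empty")
    if not addr.city.strip():
        issues.append("city is empty")
    # Only countries with a known pattern are checked; others pass through.
    pattern = POSTAL_PATTERNS.get(addr.country.upper())
    if pattern and not pattern.fullmatch(addr.postal_code.strip().upper()):
        issues.append(f"postal code {addr.postal_code!r} does not match {addr.country} format")
    return issues
```

A typical OCR slip such as the letter O in place of a zero (`"1O115"` for a German address) fails the five-digit pattern and is reported.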

Schema Validation

The extracted data is validated against the EN 16931 invoice data model to ensure all required fields are present, correctly typed, and within acceptable value ranges before the JSON is returned.
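
A simplified picture of a presence, type, and range check is sketched below in Python. It is not the EN 16931 rule set: the top-level field names are taken from the FAQ on this page, while the expected Python types, the ISO date format, and the currency code check are assumptions for this example.

```python
from datetime import date

# Top-level fields and the JSON types assumed for them in this sketch.
REQUIRED = {
    "invoiceNumber": str,
    "issueDate": str,
    "currency": str,
    "seller": dict,
    "buyer": dict,
    "lines": list,
    "totals": dict,
}


def validate_document(doc: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the document passed."""
    errors = []
    for name, expected_type in REQUIRED.items():
        if name not in doc:
            errors.append(f"{name}: missing")
        elif not isinstance(doc[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")

    issue_date = doc.get("issueDate")
    if isinstance(issue_date, str):
        try:
            date.fromisoformat(issue_date)
        except ValueError:
            errors.append("issueDate: not an ISO 8601 date")

    currency = doc.get("currency")
    if isinstance(currency, str) and not (
        len(currency) == 3 and currency.isalpha() and currency.isupper()
    ):
        errors.append("currency: expected a three-letter uppercase code")

    lines = doc.get("lines")
    if isinstance(lines, list) and not lines:
        errors.append("lines: at least one line is required")
    return errors
```

Collecting every error in a list, instead of stopping at the first one, lets a caller show all problems with a document in a single pass.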

Integration-Ready Output

The JSON uses consistent, predictable field names regardless of the source language or format. Drop it directly into your database, ERP import, or data pipeline without transformation.

Developer API

Built for Developers

A single REST endpoint. Upload any invoice PDF (native, scanned, or photographed) and receive a structured InvoiceDocument JSON. AI-driven with OCR and EN 16931 field mapping.

  • Supports DOCX, XLSX, PDF, and image inputs with no preprocessing needed
  • Works on scanned and photographed invoices
  • Every response validated against official Schematron rules

Supports ZUGFeRD · Factur-X · XRechnung · UBL · CII · EN 16931

API Documentation
Terminal
$ curl -X POST https://api.invoicexml.com/v1/parse/json \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]"

// => 200 OK
// => { "id": "inv_8f3k...", "status": "completed" }
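
The same call can be made from Python using only the standard library. The sketch below builds the multipart request without sending it. The endpoint URL and the Bearer header come from the curl example above; the form field name `file`, the function name `build_parse_request`, and the `application/pdf` part type are assumptions, so check the API documentation for the exact field name.

```python
import uuid
import urllib.request

API_URL = "https://api.invoicexml.com/v1/parse/json"


def build_parse_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the parse endpoint."""
    boundary = uuid.uuid4().hex
    # "file" is an assumed form field name, not confirmed by this page.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        API_URL,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the body with `json.load`. Keeping request construction separate from the network call makes the function easy to unit test.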

Frequently Asked Questions

Does this work with scanned or photographed invoices?

Yes, that is the primary use case. The AI reads the document like a human, recognising fields regardless of layout, orientation, scan quality, or language. It handles thermal-printed receipts, faxed documents, and smartphone photos of paper invoices.

What does the JSON output look like?

The output is the BT-first EN 16931 InvoiceDocument with consistent field names: invoiceNumber, issueDate, currency, seller, buyer, paymentDetails, lines, totals, and vatBreakdowns. It has the same shape as the request body accepted by the create endpoints, so you can pipe one into the other.

What formats are accepted?

PDF only (native or scanned). Maximum file size is 20 MB. For PDFs that already carry an embedded CII or UBL XML attachment, use /v1/extract/json instead to skip AI and parse the structured data directly.

How is this different from the Extract endpoints?

The Extract endpoints (/v1/extract/xml and /v1/extract/json) work only with structured XML, either embedded in a Factur-X / ZUGFeRD / Peppol PDF or supplied as a standalone XML file. They are fast, deterministic, and never call AI. The Parse endpoint (/v1/parse/json) runs the AI pipeline on the PDF itself, ignoring any embedded XML, and is the right choice for native, scanned, or photographed invoices. Use Extract when your input has structured data; use Parse when it does not.
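
If you route documents automatically, the decision can be approximated on the client side. The sketch below is a crude heuristic and an assumption of this example, not part of the API: it treats input that starts with `<` as standalone XML and looks for the `/EmbeddedFiles` marker in PDF bytes. The marker can be hidden inside compressed object streams, and an attachment is not necessarily an invoice XML, so a real integration should use a PDF library or fall back to Parse when Extract finds nothing.

```python
PARSE = "/v1/parse/json"
EXTRACT = "/v1/extract/json"

UTF8_BOM = b"\xef\xbb\xbf"


def choose_endpoint(data: bytes) -> str:
    """Pick Extract for inputs that appear to carry structured XML, otherwise Parse."""
    # Ignore a UTF-8 byte order mark and leading whitespace before sniffing.
    head = data.removeprefix(UTF8_BOM).lstrip()
    if head.startswith(b"<"):
        return EXTRACT  # looks like a standalone XML file
    if data.startswith(b"%PDF") and b"/EmbeddedFiles" in data:
        return EXTRACT  # PDF that appears to contain an attachment
    return PARSE
```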

Start free today

Ready to automate your invoices?

Validate, convert and embed compliant e-invoices through one API. Start your 30-day free trial. No credit card required.
