Parse Any Invoice to Structured JSON using AI
Test the invoice-to-JSON extraction REST API: upload any invoice (typed, scanned, or photographed) and get back a validated, structured JSON object with every field extracted and verified.
REST API demo · POST /v1/parse/json · Accepted formats: PDF, JPEG, PNG, TIFF · Max 20 MB
How AI Invoice Parsing Works
Upload Any Invoice
Drop a native PDF, a scanned document, or a photo of a paper invoice. The AI handles any layout, orientation, language, or quality level.
AI Reads & Validates
The AI extracts every field, then verifies tax number formats, address plausibility, that unit price × quantity equals each line total, and that the sum of lines plus tax equals the invoice total.
Receive Structured JSON
Download a normalised invoice JSON with consistent field names (seller, buyer, line items, tax breakdown, payment terms), ready for any app or database.
The Invoice Stack Reality
Most invoice parsing tools assume clean, computer-generated PDFs. Your suppliers don't send those. They send scanned faxes, smartphone photos of paper invoices, thermal-printed receipts with missing fields, and exported PDFs from a dozen different accounting systems, each with a completely different layout. InvoiceXML's AI engine was built for that reality. It reads invoices the way a human accountant would, understanding context, inferring missing values, and flagging anything it cannot verify with confidence.
Scanned & Photographed Invoices
OCR combined with semantic AI understands invoice structure regardless of scan quality, rotation, or partial occlusion. No template configuration is required.
Tax Number Verification
Extracted VAT and business registration numbers are verified against country-specific format rules, catching transposition errors and truncated values before they hit your system.
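As a rough illustration of what a country-specific format rule looks like, here is a minimal Python sketch. The pattern table and the function names (`VAT_PATTERNS`, `normalise_vat`, `check_vat_format`) are assumptions made for this example, not InvoiceXML's actual rule set, and the French pattern is deliberately loose. A format-only check like this catches truncated values; catching transposed digits generally needs a country-specific checksum on top.

```python
import re

# Illustrative format rules for a few EU VAT identifiers.
# Assumption: this table is a sketch, not the service's real rule set.
VAT_PATTERNS = {
    "DE": re.compile(r"DE\d{9}"),
    "FR": re.compile(r"FR[0-9A-Z]{2}\d{9}"),
    "IT": re.compile(r"IT\d{11}"),
    "NL": re.compile(r"NL\d{9}B\d{2}"),
}


def normalise_vat(raw: str) -> str:
    """Upper-case the value and drop spaces, dots, and hyphens."""
    return re.sub(r"[\s.\-]", "", raw).upper()


def check_vat_format(raw: str) -> bool:
    """Return True if the value matches the format rule for its country prefix."""
    vat = normalise_vat(raw)
    pattern = VAT_PATTERNS.get(vat[:2])
    return bool(pattern and pattern.fullmatch(vat))
```

A value such as "DE 123 456 789" passes after normalisation, while "DE12345678" fails because one digit is missing.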
Line-Item Arithmetic Check
Every line total is recomputed from unit price and quantity. Tax amounts are recalculated per category. The invoice grand total is verified. Discrepancies are flagged in the output.
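The same arithmetic can be re-checked on the client side once the JSON arrives. The sketch below is one way to do it in Python under stated assumptions: `lines` and `totals` are field names taken from this page, but the nested keys (`quantity`, `unitPrice`, `lineTotal`, `taxAmount`, `grandTotal`), the use of string amounts, and the one-cent tolerance are hypothetical and should be adapted to the real response.

```python
from decimal import Decimal


def check_arithmetic(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Recompute line totals and the grand total; return human-readable issues."""
    issues = []
    net_sum = Decimal("0")
    for i, line in enumerate(invoice["lines"], start=1):
        # Hypothetical keys: quantity, unitPrice, lineTotal (amounts as strings).
        expected = Decimal(line["quantity"]) * Decimal(line["unitPrice"])
        actual = Decimal(line["lineTotal"])
        if abs(expected - actual) > tolerance:
            issues.append(f"line {i}: expected {expected}, got {actual}")
        net_sum += actual
    totals = invoice["totals"]
    # Hypothetical keys: taxAmount, grandTotal.
    expected_total = net_sum + Decimal(totals["taxAmount"])
    grand_total = Decimal(totals["grandTotal"])
    if abs(expected_total - grand_total) > tolerance:
        issues.append(f"total: expected {expected_total}, got {grand_total}")
    return issues
```

Using `Decimal` rather than `float` avoids binary rounding noise, so a reported discrepancy reflects the document and not the arithmetic.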
Address Cross-Referencing
Seller and buyer addresses are parsed into structured components (street, city, postal code, country) and checked for internal consistency, catching OCR errors in postal codes or truncated street names.
Schema Validation
The extracted data is validated against the EN 16931 invoice data model to ensure all required fields are present, correctly typed, and within acceptable value ranges before the JSON is returned.
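A consumer that wants a cheap sanity check before trusting a response could run something like the following Python sketch. It is far lighter than real EN 16931 validation: the top-level field names come from this page, but which of them are treated as required, the ISO date format for `issueDate`, and the three-letter currency rule are assumptions made for the example.

```python
import re
from datetime import date

# Assumption: these top-level fields are treated as required for the sketch.
REQUIRED_FIELDS = (
    "invoiceNumber", "issueDate", "currency", "seller", "buyer", "lines", "totals",
)


def basic_shape_errors(doc: dict) -> list[str]:
    """Return a list of presence and type problems found in the document."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if "issueDate" in doc:
        try:
            date.fromisoformat(doc["issueDate"])
        except (TypeError, ValueError):
            errors.append("issueDate is not an ISO 8601 date")
    if "currency" in doc and not re.fullmatch(r"[A-Z]{3}", str(doc["currency"])):
        errors.append("currency is not a three-letter code")
    if "lines" in doc and not doc["lines"]:
        errors.append("lines is empty")
    return errors
```

An empty list means the document passed these basic checks only; it says nothing about the full business rules.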
Integration-Ready Output
The JSON uses consistent, predictable field names regardless of the source language or format. Drop it directly into your database, ERP import, or data pipeline without transformation.
Built for Developers
A single REST endpoint. Upload any invoice PDF (native, scanned, or photographed) and receive a structured InvoiceDocument JSON. AI-driven with OCR and EN 16931 field mapping.
- Supports DOCX, XLSX, PDF, and image inputs; no preprocessing needed
- Works on scanned and photographed invoices
- Every response validated against official Schematron rules
Supports ZUGFeRD · Factur-X · XRechnung · UBL · CII · EN 16931
API Documentation
$ curl -X POST https://api.invoicexml.com/v1/parse/json \
    -H "Authorization: Bearer sk_live_..." \
    -H "Content-Type: multipart/form-data" \
    -F "[email protected]"
// => 200 OK
// => { "id": "inv_8f3k...", "status": "completed" }
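The curl call above can be mirrored in Python using only the standard library. This is a sketch under assumptions: the multipart form field name is not recoverable from this page, so `file` is a guess, and the `application/pdf` part type and the helper name `build_parse_request` are likewise illustrative. The function only builds the request; sending it is left to the caller.

```python
import urllib.request
import uuid

API_URL = "https://api.invoicexml.com/v1/parse/json"


def build_parse_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF; nothing is sent here."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Assumption: the upload field is named "file".
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the response body with `json.loads`.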
Frequently Asked Questions
Does this work with scanned or photographed invoices?
Yes, that is the primary use case. The AI reads the document like a human, recognising fields regardless of layout, orientation, scan quality, or language. It handles thermal-printed receipts, faxed documents, and smartphone photos of paper invoices.
What does the JSON output look like?
The output is the BT-first EN 16931 InvoiceDocument with consistent field names: invoiceNumber, issueDate, currency, seller, buyer, paymentDetails, lines, totals, and vatBreakdowns. It has the same shape as the request body accepted by the create endpoints, so you can pipe one into the other.
What formats are accepted?
PDF only (native or scanned). Maximum file size is 20 MB. For PDFs that already carry an embedded CII or UBL XML attachment, use /v1/extract/json instead to skip AI and parse the structured data directly.
How is this different from the Extract endpoints?
The Extract endpoints (/v1/extract/xml and /v1/extract/json) work only with structured XML, either embedded in a Factur-X / ZUGFeRD / Peppol PDF or supplied as a standalone XML file. They are fast, deterministic, and never call AI. The Parse endpoint (/v1/parse/json) runs the AI pipeline on the PDF itself, ignoring any embedded XML, and is the right choice for native, scanned, or photographed invoices. Use Extract when your input has structured data; use Parse when it does not.
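The routing rule above could be applied automatically before upload. The Python sketch below is a heuristic and an assumption, not part of the API: it treats input that starts with "<" as standalone XML and looks for the `/EmbeddedFiles` name in the raw PDF bytes as a hint that an attachment is present. That hint can miss attachments referenced from compressed object streams, and an attachment is not necessarily invoice XML, so falling back to Parse when Extract finds nothing would still be needed.

```python
EXTRACT_ENDPOINT = "/v1/extract/json"
PARSE_ENDPOINT = "/v1/parse/json"


def choose_endpoint(data: bytes) -> str:
    """Pick an endpoint path from the raw bytes of the input file (heuristic)."""
    # Skip a UTF-8 byte order mark and leading whitespace before sniffing XML.
    head = data.lstrip(b"\xef\xbb\xbf \t\r\n")[:64]
    if head.startswith(b"<"):
        return EXTRACT_ENDPOINT
    # Heuristic: a PDF that declares embedded files may carry CII or UBL XML.
    if data.startswith(b"%PDF") and b"/EmbeddedFiles" in data:
        return EXTRACT_ENDPOINT
    return PARSE_ENDPOINT
```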
Ready to automate your invoices?
Validate, convert and embed compliant e-invoices through one API. Start your 30-day free trial. No credit card required.