Invoice JSON Extraction API Reference

Read a structured XML invoice and return it as InvoiceDocument JSON. The endpoint accepts either a PDF, whose embedded CII or UBL XML attachment is used, or a standalone XML file. It performs pure XML parsing and involves no AI. If your PDF has no embedded XML and you need the data extracted with AI, use /v1/parse/json instead.

POST https://api.invoicexml.com/v1/extract/json

Code Example

curl -X POST https://api.invoicexml.com/v1/extract/json \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf"

Request

Parameter	Type	Description
file *	binary	The invoice file to process.

Content-Type: multipart/form-data

The source and target formats are part of the endpoint path, and everything else (syntax, declared profile, specification identifier) is read from the document itself, so there is nothing more to configure.

Headers

Header	Value
Authorization *	Bearer YOUR_API_KEY
Content-Type	multipart/form-data
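
For clients that do not shell out to curl, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: the function names build_extract_request and extract_invoice are hypothetical, and the application/octet-stream content type on the file part is an assumption, since the reference only specifies the file field, the Authorization header, and multipart/form-data.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.invoicexml.com/v1/extract/json"


def build_extract_request(filename: str, content: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The single form field is named "file", as in the Request table.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        # Assumption: a generic part type; the reference does not require one.
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_invoice(path: str, api_key: str) -> dict:
    """Send the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as fh:
        request = build_extract_request(os.path.basename(path), fh.read(), api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling extract_invoice("invoice.pdf", "YOUR_API_KEY") would return the decoded response body. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx statuses, so error responses need their own handler.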

Response

200 Extracted JSON

Returns the parsed invoice data as a structured JSON object.

Content-Type: application/json

The response filename is derived from the uploaded file: {original-name}.json. The Content-Disposition header is set to attachment for direct download.

The root object carries the full extracted document under an invoice key, so your client reads response.invoice.seller, response.invoice.totals, response.invoice.lines and so on. Fields that are not present on the source document are returned as null. The envelope mirrors the request body of the /v1/create endpoints, so a response can be piped straight back into invoice creation.
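
Because absent fields come back as null, client code has to tolerate None at any depth. The sketch below shows one way to read the envelope defensively. The dig helper is hypothetical, and the trimmed sample assumes that a missing group such as contact may arrive as null as a whole, which the reference does not spell out.

```python
import json
from typing import Any


def dig(node: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested objects; return `default` when a key is missing or null."""
    for key in keys:
        if not isinstance(node, dict):
            return default
        node = node.get(key)
    return default if node is None else node


# Trimmed sample in the documented shape, with some fields set to null.
sample = json.loads("""
{"invoice": {"invoiceNumber": "INV-2026-0042", "currency": "EUR", "taxCurrency": null,
  "seller": {"name": "Muster Lieferant GmbH", "contact": null},
  "lines": [{"lineId": "1", "lineNetAmount": 875.00}]}}
""")

invoice = sample["invoice"]
seller_name = dig(invoice, "seller", "name")
# A null group does not raise; the default is returned instead.
seller_email = dig(invoice, "seller", "contact", "email", default="")
# Fall back to the invoice currency when no separate tax currency is given.
tax_currency = dig(invoice, "taxCurrency", default=invoice["currency"])
line_ids = [line["lineId"] for line in dig(invoice, "lines", default=[])]
```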

Complete example response

The example below shows every field of the InvoiceDocument model populated, so you can see the exact shape your parser has to handle: header terms, seller and buyer groups, delivery, payment details, totals, VAT breakdowns, line items with price details and item attributes, document-level allowances and charges, supporting documents, and preceding invoice references. A real response contains the same properties for every invoice; whatever the source document does not carry is returned as null (or as an empty array for repeating groups), never omitted.

{
  "invoice": {
    "invoiceNumber": "INV-2026-0042",
    "issueDate": "2026-06-14",
    "dueDate": "2026-07-14",
    "typeCode": "380",
    "currency": "EUR",
    "taxCurrency": null,
    "specificationId": "urn:cen.eu:en16931:2017",
    "businessProcessType": "urn:fdc:peppol.eu:2017:poacc:billing:01:1.0",
    "buyerReference": "BUYER-REF-2026-998",
    "purchaseOrderReference": "PO-2026-5571",
    "notes": [
      "Thank you for your business.",
      "Goods delivered to the Lyon warehouse during May 2026."
    ],
    "vatPointDateCode": "3",
    "supportingDocuments": [
      {
        "reference": "TIMESHEET-2026-05",
        "schemeId": null,
        "documentTypeCode": "916",
        "description": "Signed timesheet for May 2026",
        "externalUri": "https://docs.example.com/timesheets/2026-05.pdf",
        "attachment": "JVBERi0xLjcK",
        "attachmentMimeCode": "application/pdf",
        "attachmentFilename": "timesheet-2026-05.pdf"
      }
    ],
    "precedingInvoiceReferences": [
      { "reference": "INV-2026-0031", "issueDate": "2026-05-02" }
    ],
    "seller": {
      "name": "Muster Lieferant GmbH",
      "tradingName": "Muster Supplies",
      "postalAddress": {
        "line1": "Lieferantenstrasse 20",
        "line2": "Gebaeude 4",
        "line3": "Etage 3",
        "city": "Berlin",
        "postCode": "10115",
        "countrySubdivision": "Berlin",
        "country": "DE"
      },
      "contact": {
        "name": "Max Mustermann",
        "phone": "+49 30 1234567",
        "email": "[email protected]"
      },
      "identifiers": [
        { "identifier": "4000001000005", "schemeId": "0088" }
      ],
      "legalRegistration": { "identifier": "HRB 12345", "schemeId": "0002" },
      "vatIdentifier": "DE123456789",
      "taxRegistrationIdentifier": "201/123/12345",
      "additionalLegalInformation": "Geschaeftsfuehrer: Max Mustermann. Amtsgericht Berlin HRB 12345.",
      "electronicAddress": { "identifier": "DE123456789", "schemeId": "9930" }
    },
    "buyer": {
      "name": "Client Acheteur SARL",
      "tradingName": "Acheteur Retail",
      "postalAddress": {
        "line1": "12 Rue de l'Acheteur",
        "line2": "Batiment B",
        "line3": "Bureau 210",
        "city": "Lyon",
        "postCode": "69002",
        "countrySubdivision": "Auvergne-Rhone-Alpes",
        "country": "FR"
      },
      "contact": {
        "name": "Marie Durand",
        "phone": "+33 4 78 00 00 00",
        "email": "[email protected]"
      },
      "identifiers": [
        { "identifier": "3000002000003", "schemeId": "0088" }
      ],
      "legalRegistration": { "identifier": "303 265 045", "schemeId": "0002" },
      "vatIdentifier": "FR40303265045",
      "electronicAddress": { "identifier": "FR40303265045", "schemeId": "9957" }
    },
    "delivery": {
      "receiverName": "Acheteur Warehouse Lyon",
      "locationIdentifier": "3000002000010",
      "actualDeliveryDate": "2026-05-30",
      "deliveryAddress": {
        "line1": "5 Avenue de la Logistique",
        "line2": "Quai 7",
        "line3": "Zone C",
        "city": "Venissieux",
        "postCode": "69200",
        "countrySubdivision": "Auvergne-Rhone-Alpes",
        "country": "FR"
      }
    },
    "invoicingPeriod": {
      "startDate": "2026-05-01",
      "endDate": "2026-05-31"
    },
    "paymentDetails": {
      "paymentMeansCode": "58",
      "paymentMeansText": "SEPA credit transfer to the main account",
      "remittanceInformation": "INV-2026-0042",
      "paymentAccountIdentifier": "DE89370400440532013000",
      "paymentAccountName": "Muster Lieferant GmbH",
      "bic": "COBADEFFXXX",
      "mandateReference": "MNDT-2026-0042",
      "paymentTerms": "Payment within 30 days net. 2% discount if paid within 10 days."
    },
    "totals": {
      "sumOfLineNetAmounts": 975.00,
      "sumOfAllowances": 15.00,
      "sumOfCharges": 30.00,
      "taxBasisTotalAmount": 990.00,
      "taxTotalAmount": 176.10,
      "taxTotalAmountInAccountingCurrency": null,
      "grandTotalAmount": 1166.10,
      "paidAmount": 100.00,
      "roundingAmount": 0.00,
      "duePayableAmount": 1066.10
    },
    "vatBreakdowns": [
      {
        "taxableAmount": 890.00,
        "taxAmount": 169.10,
        "categoryCode": "S",
        "rate": 19,
        "exemptionReasonText": null,
        "exemptionReasonCode": null
      },
      {
        "taxableAmount": 100.00,
        "taxAmount": 7.00,
        "categoryCode": "S",
        "rate": 7,
        "exemptionReasonText": null,
        "exemptionReasonCode": null
      }
    ],
    "lines": [
      {
        "lineId": "1",
        "lineNote": "Includes 12-month standard warranty.",
        "objectIdentifier": { "identifier": "OBJ-AB-001", "schemeId": "AAJ" },
        "quantity": 10,
        "unitCode": "C62",
        "lineNetAmount": 875.00,
        "buyerOrderLineReference": "PO-2026-5571-1",
        "lineBuyerAccountingReference": "COSTCENTER-4711",
        "linePeriod": { "startDate": "2026-05-01", "endDate": "2026-05-31" },
        "priceDetails": {
          "netPrice": 90.00,
          "discountAmount": 10.00,
          "grossPrice": 100.00,
          "priceBaseQuantity": 1,
          "priceBaseUnit": "C62"
        },
        "vatInformation": { "categoryCode": "S", "rate": 19 },
        "item": {
          "name": "Ergonomic office chair",
          "description": "Adjustable ergonomic office chair, black mesh back.",
          "sellerIdentifier": "CHAIR-ERGO-BLK",
          "buyerIdentifier": "BUY-CHR-001",
          "standardIdentifier": { "identifier": "4012345000009", "schemeId": "0160" },
          "classifications": [
            { "identifier": "56101700", "schemeId": "STI", "schemeVersion": "26.0801" }
          ],
          "attributes": [
            { "name": "Colour", "value": "Black" },
            { "name": "Backrest", "value": "Mesh" }
          ],
          "countryOfOrigin": "DE"
        },
        "allowances": [
          {
            "amount": 50.00,
            "baseAmount": 1000.00,
            "percentage": 5,
            "reason": "Volume discount",
            "reasonCode": "95"
          }
        ],
        "charges": [
          {
            "amount": 25.00,
            "baseAmount": 1000.00,
            "percentage": 2.5,
            "reason": "Handling and packaging",
            "reasonCode": "FC"
          }
        ]
      },
      {
        "lineId": "2",
        "lineNote": "Reduced-rate consumable.",
        "objectIdentifier": { "identifier": "OBJ-AB-002", "schemeId": "AAJ" },
        "quantity": 5,
        "unitCode": "C62",
        "lineNetAmount": 100.00,
        "buyerOrderLineReference": "PO-2026-5571-2",
        "lineBuyerAccountingReference": "COSTCENTER-4712",
        "linePeriod": { "startDate": "2026-05-01", "endDate": "2026-05-31" },
        "priceDetails": {
          "netPrice": 20.00,
          "discountAmount": 5.00,
          "grossPrice": 25.00,
          "priceBaseQuantity": 1,
          "priceBaseUnit": "C62"
        },
        "vatInformation": { "categoryCode": "S", "rate": 7 },
        "item": {
          "name": "Printed product catalogue",
          "description": "Full-colour A4 product catalogue, 120 pages.",
          "sellerIdentifier": "CAT-2026",
          "buyerIdentifier": "BUY-CAT-002",
          "standardIdentifier": { "identifier": "4012345000160", "schemeId": "0160" },
          "classifications": [
            { "identifier": "55101500", "schemeId": "STI", "schemeVersion": "26.0801" }
          ],
          "attributes": [],
          "countryOfOrigin": "DE"
        },
        "allowances": [],
        "charges": []
      }
    ],
    "allowances": [
      {
        "amount": 15.00,
        "baseAmount": 300.00,
        "percentage": 5,
        "vatCategoryCode": "S",
        "vatRate": 19,
        "reason": "Loyalty rebate",
        "reasonCode": "100"
      }
    ],
    "charges": [
      {
        "amount": 30.00,
        "baseAmount": 600.00,
        "percentage": 5,
        "vatCategoryCode": "S",
        "vatRate": 19,
        "reason": "Freight",
        "reasonCode": "FC"
      }
    ]
  }
}
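
The totals in the example are internally consistent, and a client can confirm that for any response. The sketch below is a hypothetical client-side check, not part of the API. It assumes the usual EN 16931 relationships between the totals fields and treats null amounts as zero. Parsing with parse_float=Decimal avoids binary floating-point noise in money values.

```python
import json
from decimal import Decimal

# Totals and VAT breakdowns copied from the example response.
EXAMPLE = """
{"invoice": {
  "totals": {"sumOfLineNetAmounts": 975.00, "sumOfAllowances": 15.00, "sumOfCharges": 30.00,
             "taxBasisTotalAmount": 990.00, "taxTotalAmount": 176.10, "grandTotalAmount": 1166.10,
             "paidAmount": 100.00, "roundingAmount": 0.00, "duePayableAmount": 1066.10},
  "vatBreakdowns": [{"taxableAmount": 890.00, "taxAmount": 169.10, "rate": 19},
                    {"taxableAmount": 100.00, "taxAmount": 7.00, "rate": 7}]
}}
"""


def amt(value) -> Decimal:
    """Treat null amounts as zero (an assumption for this sketch)."""
    return Decimal(0) if value is None else Decimal(value)


def check_totals(invoice: dict) -> list:
    """Return a list of human-readable mismatches; empty means consistent."""
    t = {key: amt(value) for key, value in invoice["totals"].items()}
    vat_sum = sum((amt(b["taxAmount"]) for b in invoice["vatBreakdowns"]), Decimal(0))
    checks = {
        "tax basis": (t["sumOfLineNetAmounts"] - t["sumOfAllowances"] + t["sumOfCharges"],
                      t["taxBasisTotalAmount"]),
        "tax total": (vat_sum, t["taxTotalAmount"]),
        "grand total": (t["taxBasisTotalAmount"] + t["taxTotalAmount"], t["grandTotalAmount"]),
        "amount due": (t["grandTotalAmount"] - t["paidAmount"] + t["roundingAmount"],
                       t["duePayableAmount"]),
    }
    return [f"{name}: stated {stated}, computed {computed}"
            for name, (computed, stated) in checks.items() if computed != stated]


invoice = json.loads(EXAMPLE, parse_float=Decimal)["invoice"]
problems = check_totals(invoice)
```

For the example values, problems is an empty list: 975.00 - 15.00 + 30.00 = 990.00, 169.10 + 7.00 = 176.10, 990.00 + 176.10 = 1166.10, and 1166.10 - 100.00 + 0.00 = 1066.10.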

How the Parse Invoice JSON API Works

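This section describes the AI-based /v1/parse/json endpoint, not the deterministic XML parsing performed by /v1/extract/json. That endpoint runs a four-stage pipeline on every uploaded document to produce a validated, integration-ready JSON object: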

Document ingestion & OCR

The uploaded file is decoded and, if necessary, passed through an OCR engine. Native PDFs are parsed at the text layer; scanned PDFs, JPEG, PNG, TIFF, and WEBP files are processed via optical character recognition before any field extraction begins.

AI field extraction & semantic mapping

A large language model reads the full document and identifies every invoice field (seller, buyer, invoice number, date, line items, tax rates, payment terms, bank details) regardless of layout, language, or formatting. Fields are mapped to a normalised schema, and the model self-rates its confidence for each extraction area.

Multi-layer validation & cross-referencing

Four validation passes run sequentially:

Schema: all mandatory EN 16931 fields present, correct data types, valid code-list values.
Tax numbers: VAT IDs and business registration numbers verified against country-specific format rules.
Addresses: seller and buyer addresses parsed into components and checked for internal consistency.
Arithmetic: unit price × quantity = line total; sum of line totals + tax amounts = invoice grand total.

Discrepancies are corrected deterministically where the arithmetic allows it and otherwise surface as lower confidence scores rather than blocking the API response, so your application can decide how to handle borderline cases.
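
A client can apply the same "report, do not block" idea to its own line-level check. The sketch below is an illustration only and not the service's implementation. It assumes the line net amount equals net price × quantity ÷ price base quantity, minus line allowances, plus line charges. That is how line 1 of the example response works out (90.00 × 10 - 50.00 + 25.00 = 875.00). The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def D(value) -> Decimal:
    """Convert a JSON number to Decimal via its string form."""
    return Decimal(str(value))


def line_discrepancy(line: dict) -> Decimal:
    """Stated lineNetAmount minus the amount recomputed from the line's parts."""
    price = line["priceDetails"]
    base_qty = D(price.get("priceBaseQuantity") or 1)
    expected = D(price["netPrice"]) * D(line["quantity"]) / base_qty
    expected -= sum((D(a["amount"]) for a in line.get("allowances") or []), Decimal(0))
    expected += sum((D(c["amount"]) for c in line.get("charges") or []), Decimal(0))
    expected = expected.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return D(line["lineNetAmount"]) - expected


def flag_lines(lines: list) -> list:
    """Collect (lineId, difference) pairs instead of raising on the first mismatch."""
    flagged = []
    for line in lines:
        diff = line_discrepancy(line)
        if diff != 0:
            flagged.append((line["lineId"], diff))
    return flagged


# The two lines of the example response, reduced to the fields used here.
lines = [
    {"lineId": "1", "quantity": 10, "lineNetAmount": 875.00,
     "priceDetails": {"netPrice": 90.00, "priceBaseQuantity": 1},
     "allowances": [{"amount": 50.00}], "charges": [{"amount": 25.00}]},
    {"lineId": "2", "quantity": 5, "lineNetAmount": 100.00,
     "priceDetails": {"netPrice": 20.00, "priceBaseQuantity": 1},
     "allowances": [], "charges": []},
]
flagged = flag_lines(lines)
```

Returning the flagged lines leaves the decision to the caller, which can accept the invoice, queue it for review, or reject it.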

Structured JSON delivery

The validated invoice object is serialised as application/json and returned with Content-Disposition: attachment. Field names are consistent across all source documents, languages, and invoice formats, so no post-processing is required before API integration.

Frequently Asked Questions

What file formats are accepted?

A PDF containing embedded CII or UBL XML (Factur-X, ZUGFeRD, or Peppol PDF/A-3), or a standalone XML file (CII D16B / UBL 2.1). Maximum file size is 20 MB.
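
A cheap local check before uploading can catch oversized or obviously wrong files. This is a hypothetical sketch: the name preflight is illustrative, "20 MB" is read conservatively as 20,000,000 bytes, and the type is sniffed from the leading bytes only. It cannot tell whether a PDF actually carries an embedded XML attachment.

```python
MAX_BYTES = 20 * 1000 * 1000  # assumption: "20 MB" interpreted as decimal megabytes
UTF8_BOM = b"\xef\xbb\xbf"


def preflight(content: bytes) -> str:
    """Return 'pdf' or 'xml' for plausible uploads; raise ValueError otherwise."""
    if len(content) > MAX_BYTES:
        raise ValueError(f"file is {len(content)} bytes, above the {MAX_BYTES} byte limit")
    # Ignore a UTF-8 byte order mark and leading whitespace before sniffing.
    head = content[:1024].removeprefix(UTF8_BOM).lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"<"):
        return "xml"
    raise ValueError("file is neither a PDF nor an XML document")
```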

What happens if the PDF has no embedded XML?

The API returns a 400 response with errorCode 4006 (NoEmbeddedXml). For PDFs without an embedded XML attachment (typed, scanned, or photographed invoices), use POST /v1/parse/json instead, which runs the AI extraction pipeline.
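
The fallback described above can be decided from the status code and error body alone. The sketch below assumes the error body is a JSON object with a top-level errorCode property. The reference names the code but not the exact body shape, so treat that as an assumption. The helper name next_endpoint is hypothetical.

```python
import json
from typing import Optional

PARSE_PATH = "/v1/parse/json"
NO_EMBEDDED_XML = 4006


def next_endpoint(status: int, body: bytes) -> Optional[str]:
    """Return the endpoint to retry with, or None when no fallback applies."""
    if status != 400:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON (or not decodable): nothing to base a fallback on.
        return None
    code = payload.get("errorCode") if isinstance(payload, dict) else None
    # Assumption: the code may be serialised as a number or as a string.
    return PARSE_PATH if str(code) == str(NO_EMBEDDED_XML) else None
```

A caller would resend the same file to the returned path only when the result is not None, and surface every other error unchanged.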

Is the XML validated before parsing?

The parser reads the XML against the CII or UBL schema, but EN 16931 Schematron rules are not checked here. If you need a full validation pass, use POST /v1/validate/{format} on the same input.

What does the JSON output contain?

The BT-first InvoiceDocument: invoiceNumber, issueDate, currency, seller, buyer, paymentDetails, lines, totals, vatBreakdowns, and the rest of the EN 16931 model. Field names mirror the request bodies accepted by /v1/create/*.

How does this differ from /v1/parse/json?

/v1/extract/json is deterministic XML parsing. It does not call the AI model and produces an exact mapping of the source XML. /v1/parse/json is AI-driven extraction from PDFs that do not have embedded XML.