XML Extraction API Reference
Pull the embedded XML attachment out of a hybrid PDF (Factur-X, ZUGFeRD, or any PDF/A-3 carrying a CII or UBL invoice). The endpoint streams the XML straight from the PDF container without any transformation. If the PDF has no embedded XML, the API returns a 400 response with errorCode 4006 (NoEmbeddedXml).
POST https://api.invoicexml.com/v1/extract/xml
Code Example
curl -X POST https://api.invoicexml.com/v1/extract/xml \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice-2026.pdf"
Request
| Parameter | Type | Description |
|---|---|---|
| file * | binary | The hybrid PDF invoice to process (required). |
Content-Type: multipart/form-data
The source and target formats are part of the endpoint path, and everything else (syntax, declared profile, specification identifier) is read from the document itself, so there is nothing more to configure.
Headers
| Header | Value |
|---|---|
| Authorization * | Bearer YOUR_API_KEY |
| Content-Type | multipart/form-data |
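For clients that do not shell out to curl, the request described above can be sketched with the Python standard library alone. This is a minimal sketch, not an official client: the helper name `build_extract_request` is hypothetical, and labelling the uploaded part as `application/pdf` is an assumption. Only the URL, the `file` field, and the two headers come from this page.

```python
import urllib.request
import uuid

ENDPOINT = "https://api.invoicexml.com/v1/extract/xml"


def build_extract_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the multipart POST for the extract endpoint.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    boundary = uuid.uuid4().hex
    # A single form part named "file", as listed in the parameter table.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",  # assumption: part is labelled as PDF
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (performs a real network call, so it is left commented out):
# with open("invoice-2026.pdf", "rb") as f:
#     req = build_extract_request(f.read(), "invoice-2026.pdf", "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     xml_bytes = resp.read()
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any API call is made.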
Response
200 Extracted XML
Returns the embedded XML document as a file download.
The response filename is derived from the uploaded file: {original-name}.xml.
The Content-Disposition header is set to attachment for direct download.
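A client that saves the download may want to pick the local filename the same way. The sketch below is hedged: this page only says the header is set to attachment, so the presence of a `filename` parameter in Content-Disposition is an assumption, and the hypothetical helper `response_filename` falls back to the documented `{original-name}.xml` rule when no such parameter is found.

```python
from email.message import Message
from pathlib import PurePath
from typing import Optional


def response_filename(content_disposition: Optional[str], uploaded_name: str) -> str:
    """Choose a local filename for the extracted XML.

    Hypothetical helper: prefers a filename parameter in the response header
    (if the server sends one), else applies the {original-name}.xml rule.
    """
    msg = Message()
    msg["Content-Disposition"] = content_disposition or "attachment"
    name = msg.get_filename()  # None when the header has no filename parameter
    if name:
        # Keep only the final path component so a header cannot point outside the target dir.
        return PurePath(name).name
    return PurePath(uploaded_name).stem + ".xml"
```

For example, `response_filename("attachment", "invoice-2026.pdf")` yields `invoice-2026.xml`, matching the naming rule stated above.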
Frequently Asked Questions
What PDFs work with this endpoint?
PDF/A-3 hybrid invoices that carry an embedded CII or UBL XML attachment, e.g. Factur-X (factur-x.xml), ZUGFeRD (zugferd-invoice.xml), or Peppol PDFs. The API recognises the standard attachment names defined by each format.
What if the PDF has no embedded XML?
The API returns a 400 response with errorCode 4006 (NoEmbeddedXml). To extract invoice data from a PDF that has no embedded XML, use POST /v1/parse/json, which uses AI to read the visual content.
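The fallback described in this answer can be sketched as a small decision helper. The shape of the error body is not documented on this page, so the sketch assumes a JSON object with a top-level `errorCode` field (accepted as a number or a string). The function names are hypothetical.

```python
import json
from typing import Optional

NO_EMBEDDED_XML = "4006"  # errorCode documented for NoEmbeddedXml


def is_no_embedded_xml(status: int, body: bytes) -> bool:
    """Return True when a response looks like the documented 400 / 4006 error.

    Assumption: the error body is a JSON object with a top-level "errorCode".
    """
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and str(payload.get("errorCode")) == NO_EMBEDDED_XML


def fallback_path(status: int, body: bytes) -> Optional[str]:
    """Suggest the AI parsing endpoint when the PDF carries no embedded XML."""
    return "/v1/parse/json" if is_no_embedded_xml(status, body) else None
```

Any other 400 response returns `None` here, so unrelated errors are not silently retried against the parsing endpoint.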
Is the extracted XML modified or validated?
No. The XML is returned exactly as it sits inside the PDF, byte-for-byte. If you need Schematron / EN 16931 validation, pass the output to POST /v1/validate/{format}.
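Because the XML comes back unmodified, a client may want to check which syntax it received before passing it on. The following sketch inspects only the root element and leaves the original bytes untouched. The namespaces are the standard ones for UN/CEFACT CII (D16B) and UBL 2.x; the helper name `detect_syntax` is hypothetical, and how the result maps to the `{format}` path segment of the validation endpoint is not specified here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Root elements (in Clark notation) of the two syntaxes named on this page.
ROOT_SYNTAX = {
    "{urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100}CrossIndustryInvoice": "CII",
    "{urn:oasis:names:specification:ubl:schema:xsd:Invoice-2}Invoice": "UBL",
    "{urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2}CreditNote": "UBL",
}


def detect_syntax(xml_bytes: bytes) -> Optional[str]:
    """Return "CII" or "UBL" based on the root element, or None if unrecognised.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(xml_bytes)
    return ROOT_SYNTAX.get(root.tag)
```

Documents with other root elements return `None`, so callers can decide how to handle them explicitly.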
Can I get JSON from a Factur-X PDF instead of XML?
Yes. POST /v1/extract/json takes the same hybrid PDF, extracts the embedded XML, parses it, and returns the result as an InvoiceDocument JSON object.
What is the output filename?
Derived from the uploaded PDF: if you upload invoice-2026.pdf you receive invoice-2026.xml. The Content-Disposition header is set to attachment for direct download.