How to Extract Invoice Data with n8n + Ollama (Free AI Document Processing Workflow)
Invoice processing is one of the most tedious tasks in any finance or operations team. Manually copying vendor names, invoice numbers, line items, and totals from PDFs into spreadsheets is slow, error-prone, and mind-numbing. Cloud OCR and document AI services like AWS Textract or Google Document AI can help — but they charge per page, require sending your financial documents to external servers, and raise serious GDPR concerns.
With n8n and Ollama, you can build an automated invoice extraction pipeline that runs entirely on your own hardware. No API costs, no cloud storage of sensitive financial data, and full control over how your documents are processed.
In this tutorial, you'll build a workflow that:
- Receives invoices via email attachment or webhook upload
- Extracts raw text from the PDF or image
- Sends the text to Ollama for structured data extraction
- Outputs clean, structured JSON with all invoice fields
- Logs the data to Google Sheets or a database automatically
Why Local AI for Invoice Processing?
Financial documents are among the most sensitive data your business handles. Invoices contain vendor relationships, pricing agreements, bank details, and spending patterns — information you do not want sitting on a third-party server.
| Concern | Cloud API Approach | n8n + Ollama (Local) |
|---|---|---|
| Data privacy | Documents sent to AWS, Google, or OpenAI servers | Never leaves your server |
| GDPR compliance | Requires DPA agreements, data residency checks | No third-party processor, so no DPA or transfer review for this step |
| Cost | $0.01–$0.15 per page (adds up fast) | $0 per document |
| Volume limits | Rate limits and tier caps | Limited only by your hardware |
| Vendor lock-in | Dependent on API availability and pricing | No dependencies |
GDPR note: Under GDPR, sending employee or customer financial data to a third-party processor requires a Data Processing Agreement (DPA), and transfers outside the EU/EEA need additional safeguards. Running Ollama locally removes the third-party processor from this step because the data never leaves your infrastructure. Your other GDPR obligations (lawful basis, retention, access controls) still apply.
The Architecture
The invoice extraction pipeline follows a straightforward pattern: receive the document, extract text, structure it with AI, and output the data.
[Email trigger / Webhook: Invoice arrives]
|
v
[Extract text from PDF/image attachment]
|
v
[Ollama: Structured data extraction]
|
v
[Set node: Parse + validate JSON fields]
|
v
[Google Sheets / PostgreSQL: Store results]
|
v
[Optional: Slack alert or approval workflow]
The Ollama call is the core of the pipeline. A well-engineered extraction prompt reliably pulls structured fields from raw invoice text, even when the formatting varies across vendors — which it always does.
Fields to Extract
A complete invoice extraction should capture these fields:
| Field | Description | Example |
|---|---|---|
| vendor_name | Supplier company name | "Acme Supplies Ltd" |
| vendor_address | Supplier billing address | "123 Main St, London EC1A" |
| invoice_number | Unique invoice identifier | "INV-2026-00842" |
| invoice_date | Date invoice was issued | "2026-03-15" |
| due_date | Payment due date | "2026-04-14" |
| currency | ISO currency code | "GBP" |
| line_items | Array of products/services | [{description, qty, unit_price, total}] |
| subtotal | Total before tax | 450.00 |
| tax_rate | VAT/GST/tax percentage | 20 |
| tax_amount | Tax amount in currency | 90.00 |
| total_amount | Final amount due | 540.00 |
| payment_terms | Net-30, Net-60, etc. | "Net 30" |
| purchase_order | Your PO number if referenced | "PO-2026-112" |
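A downstream step can check that a parsed result actually carries every field in the table. The following is a minimal Python sketch of that check. The name `missing_fields` is hypothetical, and treating only absent keys (not null values) as missing is an assumption rather than something the workflow itself does.

```python
from __future__ import annotations

# Field names taken from the table above.
EXPECTED_FIELDS = (
    "vendor_name", "vendor_address", "invoice_number", "invoice_date",
    "due_date", "currency", "line_items", "subtotal", "tax_rate",
    "tax_amount", "total_amount", "payment_terms", "purchase_order",
)


def missing_fields(invoice: dict) -> list[str]:
    """Return the expected keys that are absent from a parsed invoice.

    A key whose value is None counts as present: a field that could not be
    found on the invoice is still reported, just with an empty value.
    """
    return [name for name in EXPECTED_FIELDS if name not in invoice]


if __name__ == "__main__":
    partial = {"vendor_name": "Acme Supplies Ltd", "total_amount": 540.00}
    print(missing_fields(partial))
```

If the returned list is non-empty, the invoice is a reasonable candidate for manual review.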
The Ollama Extraction Prompt
The prompt is the most important part of this workflow. It needs to handle the wide variation in how different vendors format their invoices — some use tables, some use paragraphs, some mix both.
Here is the extraction prompt used in this workflow:
You are an invoice data extraction specialist. Extract all invoice information from the following document text and return it as structured JSON.
Extract these fields exactly:
- vendor_name: the supplier/seller company name
- vendor_address: the supplier's address (single string)
- invoice_number: the invoice ID, reference number, or invoice #
- invoice_date: date the invoice was issued (ISO format: YYYY-MM-DD)
- due_date: payment due date (ISO format: YYYY-MM-DD, or null if not found)
- currency: 3-letter ISO currency code (USD, EUR, GBP, etc.)
- line_items: array of objects with fields: description, quantity, unit_price, total
- subtotal: total before tax (number, no currency symbol)
- tax_rate: tax/VAT percentage as a number (e.g. 20 for 20%), or null
- tax_amount: total tax charged (number), or null
- total_amount: final total amount due (number, no currency symbol)
- payment_terms: payment terms string (e.g. "Net 30"), or null
- purchase_order: PO number if referenced, or null
- confidence: your confidence in the extraction from 0.0 to 1.0
Rules:
- Return ONLY valid JSON, no explanation, no markdown code blocks
- If a field cannot be found, use null
- All monetary values must be numbers (not strings)
- Dates must be in YYYY-MM-DD format
- If the currency symbol is present ($ £ €), infer the currency code
Invoice text:
{{invoice_text}}
Why this prompt works: The explicit field list, strict JSON-only instruction, and null-fallback rules are critical. Without them, local models tend to add explanatory text around the JSON or format numbers as strings like "£540.00". The temperature: 0.1 setting keeps output close to deterministic, so repeated runs on the same document return largely the same result.
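To make the request shape concrete, here is a small Python sketch that fills the `{{invoice_text}}` placeholder and builds the body for Ollama's `/api/generate` endpoint. The function name `build_payload` is hypothetical, and the `num_predict` value of 1000 simply mirrors the workflow JSON in this tutorial; treat it as an illustration, not the workflow's actual implementation.

```python
import json


def build_payload(prompt_template: str, invoice_text: str,
                  model: str = "llama3:8b") -> dict:
    """Build the JSON body for a POST to Ollama's /api/generate endpoint."""
    # Substitute the raw invoice text for the {{invoice_text}} placeholder.
    prompt = prompt_template.replace("{{invoice_text}}", invoice_text)
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,          # ask for one complete response, not chunks
        "options": {
            "temperature": 0.1,   # low temperature for repeatable output
            "num_predict": 1000,  # leave room for the full JSON object
        },
    }


if __name__ == "__main__":
    # Shortened stand-in for the full prompt shown above.
    template = "Return ONLY valid JSON.\nInvoice text:\n{{invoice_text}}"
    body = build_payload(template, "Invoice #: INV-2026-00842")
    print(json.dumps(body, indent=2))
```

Keeping the template separate from the payload builder makes it easy to adjust the prompt wording without touching the request options.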
Prerequisites
You need n8n and Ollama running. If you haven't set them up yet:
# Install Ollama and pull a capable model
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3:8b
# For better extraction accuracy on complex invoices:
# ollama pull mistral:7b-instruct
# Run n8n in Docker
docker run -d --name n8n -p 5678:5678 \
--add-host=host.docker.internal:host-gateway \
-v n8n_data:/home/node/.n8n \
n8nio/n8n
Model choice: llama3:8b handles most invoices well. For invoices with complex multi-page tables or non-English text, mistral:7b-instruct or llama3:70b (if your hardware allows) may give more reliable extraction. The workflow JSON works with any Ollama model; just update the model name in the request body.
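Before importing the workflow, it can help to confirm that the model you plan to use has finished pulling. Ollama lists local models at `GET /api/tags`. The sketch below is one way to check that from Python; the helper names `fetch_tags` and `model_available` are hypothetical, and the base URL assumes Ollama is on its default port on the same machine.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags, which lists the models available on this Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_available(tags_response: dict, model: str) -> bool:
    """Check whether `model` appears in a parsed /api/tags response."""
    names = {entry.get("name") for entry in tags_response.get("models", [])}
    return model in names
```

Calling `model_available(fetch_tags(), "llama3:8b")` on the host should return `True` once the pull has completed.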
Free Workflow: AI Invoice Extractor
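Here is a working n8n workflow that covers the core of the pipeline: the webhook, the Ollama call, and result parsing. Import the JSON directly into your n8n instance to get started immediately. It accepts already-extracted invoice text; the email trigger, file extraction, and storage steps described in this tutorial are added on top of it.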
What Each Step Does
The workflow starts when an invoice arrives. Use the Email Trigger (IMAP) node to monitor an inbox like invoices@yourcompany.com, or use a Webhook node if you're uploading invoices via an API or form. The node captures the raw attachment binary data and passes it downstream.
An HTTP Request node calls a local text extraction service, or you can use n8n's Extract from File node (available in newer n8n versions). For PDF invoices, the workflow pipes the binary through a simple PDF-to-text conversion. The output is a plain text string containing all the invoice content.
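The raw invoice text is injected into the extraction prompt and sent to Ollama via its local HTTP API. Ollama returns a JSON envelope whose response field holds the model's output, which should be the structured invoice JSON. The temperature: 0.1 setting keeps output largely consistent across runs. If n8n runs in Docker as in the prerequisites, localhost inside the container refers to the container itself, so the request URL needs to point to http://host.docker.internal:11434; on Linux, Ollama may also need OLLAMA_HOST=0.0.0.0 so that it accepts connections from the container.
A Set node pulls the JSON object out of the response text and parses it. If parsing fails or the extraction confidence is below 0.7, the invoice is flagged for human review. High-confidence extractions pass through automatically.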
The structured invoice data is written to a Google Sheets spreadsheet or a PostgreSQL/MySQL database. Each invoice becomes one row with all fields as columns. Line items are stored as a JSON string or written to a separate child table if your database schema supports it.
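The parse-and-flag step is easier to reason about outside of an n8n expression. Here is a rough Python equivalent, offered as a sketch: the function name `parse_response` is hypothetical, and treating a missing confidence value as low confidence is an assumption about the behavior you probably want.

```python
from __future__ import annotations

import json
import re

REVIEW_THRESHOLD = 0.7  # same cut-off described above


def parse_response(raw: str) -> tuple[dict, bool]:
    """Return (extracted_fields, needs_review) from raw model output."""
    # Take everything from the first "{" to the last "}" so that stray
    # explanatory text around the JSON is ignored.
    match = re.search(r"\{[\s\S]*\}", raw or "")
    if not match:
        return {"error": "parse_failed", "raw": raw}, True
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return {"error": str(exc)}, True
    confidence = data.get("confidence")
    # A missing or non-numeric confidence is treated as low confidence.
    is_confident = (isinstance(confidence, (int, float))
                    and confidence >= REVIEW_THRESHOLD)
    return data, not is_confident


if __name__ == "__main__":
    reply = 'Sure: {"vendor_name": "Acme Supplies Ltd", "confidence": 0.97}'
    print(parse_response(reply))
```

Any failure path returns `needs_review = True`, so a broken extraction never reaches storage unnoticed.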
The Workflow JSON
{
"name": "AI Invoice Data Extractor (Ollama)",
"nodes": [
{
"parameters": {
"httpMethod": "POST",
"path": "extract-invoice",
"responseMode": "responseNode",
"options": {}
},
"id": "webhook",
"name": "Receive Invoice",
"type": "n8n-nodes-base.webhook",
"typeVersion": 2,
"position": [240, 300],
"webhookId": "extract-invoice"
},
{
"parameters": {
"assignments": {
"assignments": [
{
"id": "invoice_text",
"name": "invoice_text",
"value": "={{ $json.body.text || $json.body.invoice_text || '' }}",
"type": "string"
}
]
}
},
"id": "prepare-text",
"name": "Prepare Invoice Text",
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [460, 300]
},
{
"parameters": {
"method": "POST",
"url": "http://host.docker.internal:11434/api/generate",
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ model: 'llama3:8b', prompt: 'You are an invoice data extraction specialist. Extract all invoice information from the following document text and return it as structured JSON.\\n\\nExtract these fields exactly:\\n- vendor_name: the supplier/seller company name\\n- vendor_address: the supplier\\'s address (single string)\\n- invoice_number: the invoice ID, reference number, or invoice #\\n- invoice_date: date the invoice was issued (ISO format: YYYY-MM-DD)\\n- due_date: payment due date (ISO format: YYYY-MM-DD, or null if not found)\\n- currency: 3-letter ISO currency code (USD, EUR, GBP, etc.)\\n- line_items: array of objects with fields: description, quantity, unit_price, total\\n- subtotal: total before tax (number, no currency symbol)\\n- tax_rate: tax/VAT percentage as a number (e.g. 20 for 20%), or null\\n- tax_amount: total tax charged (number), or null\\n- total_amount: final total amount due (number, no currency symbol)\\n- payment_terms: payment terms string (e.g. \\'Net 30\\'), or null\\n- purchase_order: PO number if referenced, or null\\n- confidence: your confidence in the extraction from 0.0 to 1.0\\n\\nRules:\\n- Return ONLY valid JSON, no explanation, no markdown code blocks\\n- If a field cannot be found, use null\\n- All monetary values must be numbers (not strings)\\n- Dates must be in YYYY-MM-DD format\\n\\nInvoice text:\\n' + $json.invoice_text, stream: false, options: { temperature: 0.1, num_predict: 1000 } }) }}",
"options": { "timeout": 120000 }
},
"id": "ollama-extract",
"name": "Extract with Ollama",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [680, 300]
},
{
"parameters": {
"assignments": {
"assignments": [
{
"id": "extracted",
"name": "extracted",
"value": "={{ (() => { try { const raw = $json.response; const match = raw.match(/\\{[\\s\\S]*\\}/); return match ? JSON.parse(match[0]) : { error: 'parse_failed', raw: raw } } catch(e) { return { error: e.message } } })() }}",
"type": "object"
},
{
"id": "needs_review",
"name": "needs_review",
"value": "={{ (() => { try { const m = ($json.response || '').match(/\\{[\\s\\S]*\\}/); return m ? !(JSON.parse(m[0]).confidence >= 0.7) : true } catch(e) { return true } })() }}",
"type": "boolean"
}
]
}
},
"id": "parse-result",
"name": "Parse Extraction Result",
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [900, 300]
},
{
"parameters": {
"respondWith": "json",
"responseBody": "={{ JSON.stringify({ success: !$json.extracted.error, needs_review: $json.needs_review, invoice: $json.extracted }) }}",
"options": {}
},
"id": "respond",
"name": "Return Result",
"type": "n8n-nodes-base.respondToWebhook",
"typeVersion": 1.1,
"position": [1120, 300]
}
],
"connections": {
"Receive Invoice": {
"main": [[{ "node": "Prepare Invoice Text", "type": "main", "index": 0 }]]
},
"Prepare Invoice Text": {
"main": [[{ "node": "Extract with Ollama", "type": "main", "index": 0 }]]
},
"Extract with Ollama": {
"main": [[{ "node": "Parse Extraction Result", "type": "main", "index": 0 }]]
},
"Parse Extraction Result": {
"main": [[{ "node": "Return Result", "type": "main", "index": 0 }]]
}
},
"settings": { "executionOrder": "v1" },
"tags": [
{ "name": "AI" },
{ "name": "Ollama" },
{ "name": "Finance" },
{ "name": "Invoice Processing" },
{ "name": "Document Extraction" }
]
}
Testing the Workflow
Once imported and activated, test it by sending invoice text directly via curl. In production you'd pipe in the extracted PDF text, but for testing you can send the text inline:
curl -X POST http://localhost:5678/webhook/extract-invoice \
-H "Content-Type: application/json" \
-d '{
"text": "INVOICE\n\nFrom: Acme Supplies Ltd\n123 Industrial Way, Manchester M1 2AB\n\nInvoice #: INV-2026-00842\nDate: 15 March 2026\nDue Date: 14 April 2026\nPayment Terms: Net 30\n\nBill To: Your Company Ltd\n\nItem Qty Unit Price Total\nWeb Hosting x3 3 £50.00 £150.00\nSSL Certificate 1 £80.00 £80.00\nSetup Fee 1 £120.00 £120.00\n\nSubtotal: £350.00\nVAT (20%): £70.00\nTotal Due: £420.00\n\nBank: Barclays | Sort: 20-30-40 | Acc: 12345678"
}'
Expected response (exact values, especially confidence, vary by model and run):
{
"success": true,
"needs_review": false,
"invoice": {
"vendor_name": "Acme Supplies Ltd",
"vendor_address": "123 Industrial Way, Manchester M1 2AB",
"invoice_number": "INV-2026-00842",
"invoice_date": "2026-03-15",
"due_date": "2026-04-14",
"currency": "GBP",
"line_items": [
{ "description": "Web Hosting x3", "quantity": 3, "unit_price": 50.00, "total": 150.00 },
{ "description": "SSL Certificate", "quantity": 1, "unit_price": 80.00, "total": 80.00 },
{ "description": "Setup Fee", "quantity": 1, "unit_price": 120.00, "total": 120.00 }
],
"subtotal": 350.00,
"tax_rate": 20,
"tax_amount": 70.00,
"total_amount": 420.00,
"payment_terms": "Net 30",
"purchase_order": null,
"confidence": 0.97
}
}
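The model's self-reported confidence is not the only signal available. Invoice arithmetic gives you an independent check: line items should sum to the subtotal, and subtotal plus tax should equal the total. The sketch below shows one way to test that. The function name `check_totals` and the one-cent tolerance are assumptions for illustration.

```python
from __future__ import annotations


def check_totals(invoice: dict, tolerance: float = 0.01) -> list[str]:
    """Return a list of arithmetic inconsistencies found in an invoice."""
    problems = []
    items = invoice.get("line_items") or []
    subtotal = invoice.get("subtotal")
    tax = invoice.get("tax_amount") or 0
    total = invoice.get("total_amount")

    # Line item totals should add up to the subtotal.
    if subtotal is not None and items:
        item_sum = sum(item.get("total") or 0 for item in items)
        if abs(item_sum - subtotal) > tolerance:
            problems.append(
                f"line items sum to {item_sum:.2f}, subtotal is {subtotal:.2f}")

    # Subtotal plus tax should equal the amount due.
    if subtotal is not None and total is not None:
        if abs(subtotal + tax - total) > tolerance:
            problems.append(
                f"subtotal + tax is {subtotal + tax:.2f}, total is {total:.2f}")
    return problems


if __name__ == "__main__":
    sample = {
        "line_items": [{"total": 150.00}, {"total": 80.00}, {"total": 120.00}],
        "subtotal": 350.00, "tax_amount": 70.00, "total_amount": 420.00,
    }
    print(check_totals(sample))
```

An invoice that fails either check is worth routing to review even when the reported confidence is high.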
Connecting to Email Automatically
The webhook approach works great for API-driven uploads. For a fully automated pipeline that processes invoices as they arrive by email, replace the Webhook node with an Email Trigger (IMAP) node:
- Add an Email Trigger (IMAP) node and connect it to your invoices@yourcompany.com inbox
- Enable "Download Attachments" in the node settings
- Add an Extract from File node (for PDF) or use a text extraction HTTP call to convert the binary to text
- Connect the text output to the "Prepare Invoice Text" node
- Set the polling interval to every 5 or 15 minutes
With this setup, anyone on your team can forward an invoice to a dedicated email address and it will be automatically extracted and logged within minutes — no manual data entry required.
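If you want to see what attachment handling involves outside n8n, for example to debug a message the trigger skipped, the Python standard library can pull PDF attachments from a raw email. This is only an illustrative sketch and not how the n8n node is implemented; the helper names and the sample message contents are made up.

```python
from __future__ import annotations

from email import message_from_bytes, policy
from email.message import EmailMessage


def pdf_attachments(raw_email: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, content) pairs for each PDF attached to an email."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            name = part.get_filename() or "invoice.pdf"
            found.append((name, part.get_content()))
    return found


def sample_email() -> bytes:
    """Build a small test message; the sender and PDF bytes are placeholders."""
    msg = EmailMessage()
    msg["From"] = "billing@vendor.example"
    msg["To"] = "invoices@yourcompany.com"
    msg["Subject"] = "Invoice INV-2026-00842"
    msg.set_content("Invoice attached.")
    msg.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                       subtype="pdf", filename="INV-2026-00842.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    for filename, content in pdf_attachments(sample_email()):
        print(filename, len(content), "bytes")
```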
Writing to Google Sheets
To log extracted invoices to a Google Sheets spreadsheet, add a Google Sheets node after the "Parse Extraction Result" node:
- Connect your Google account in n8n's credentials manager
- Create a spreadsheet with columns matching your invoice fields
- Set the operation to "Append Row"
- Map each field: vendor_name, invoice_number, invoice_date, total_amount, currency, etc.
- For line_items, use JSON.stringify() to store the array as a string, or create a second sheet for line item detail rows
The result is a living spreadsheet of every invoice your business receives, auto-populated with structured data. Your accounts team can sort, filter, and export without touching the underlying automation.
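The mapping from an extracted invoice to one spreadsheet row can be sketched as follows. The column list and the function name `to_sheet_row` are assumptions chosen for illustration; use whatever columns your sheet defines. Nested values such as line_items are serialized to a JSON string, mirroring the JSON.stringify() approach.

```python
from __future__ import annotations

import json

# Hypothetical column order; match this to your own spreadsheet header.
COLUMNS = ["vendor_name", "invoice_number", "invoice_date",
           "total_amount", "currency", "line_items"]


def to_sheet_row(invoice: dict, columns: list[str] = COLUMNS) -> list:
    """Flatten an extracted invoice into one row of cell values."""
    row = []
    for name in columns:
        value = invoice.get(name)
        # Arrays and objects cannot live in a single cell, so store them
        # as a JSON string.
        if isinstance(value, (list, dict)):
            value = json.dumps(value, ensure_ascii=False)
        # Missing or null fields become empty cells.
        row.append("" if value is None else value)
    return row


if __name__ == "__main__":
    invoice = {
        "vendor_name": "Acme Supplies Ltd",
        "invoice_number": "INV-2026-00842",
        "invoice_date": "2026-03-15",
        "total_amount": 420.00,
        "currency": "GBP",
        "line_items": [{"description": "Setup Fee", "total": 120.00}],
    }
    print(to_sheet_row(invoice))
```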
Handling Tricky Invoice Formats
Invoices vary enormously between vendors. Here's how this workflow handles common edge cases:
Multi-page PDFs
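The text extraction step concatenates all pages before sending to Ollama, so the model receives the full document text and can find fields that span pages. Keep the num_predict value high (1000+) to allow a complete JSON response for documents with many line items. For long documents you may also need to raise the num_ctx option, because input that exceeds the model's context window is truncated.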
Image-based invoices (scanned PDFs)
Scanned PDFs require OCR before text extraction. In the full pack workflow, an additional step uses Tesseract (via a local Docker container) or an n8n HTTP call to a locally-running OCR service. This converts the image to text before the Ollama extraction step runs.
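One simple way to decide whether a PDF needs the OCR route is to look at how much usable text the ordinary extraction produced. The sketch below is a hypothetical heuristic, not the pack's implementation; the function name `needs_ocr` and the 50-character threshold are arbitrary assumptions you would tune against your own documents.

```python
def needs_ocr(extracted_text: str, min_chars: int = 50) -> bool:
    """Guess whether a PDF is a scan based on how little text came out."""
    # Count only letters and digits, so whitespace and stray symbols from an
    # empty text layer do not make a scanned page look like it has content.
    meaningful = sum(ch.isalnum() for ch in extracted_text or "")
    return meaningful < min_chars


if __name__ == "__main__":
    print(needs_ocr(""))  # no text layer at all
    print(needs_ocr("Invoice #: INV-2026-00842\nAcme Supplies Ltd\n"
                    "123 Industrial Way, Manchester\nTotal Due: 420.00"))
```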
Non-English invoices
Multilingual models available through Ollama handle common European languages well. Add "The invoice may be in a language other than English. Extract fields into English field names but preserve the original text values" to the prompt for best results with mixed-language invoices.
Low confidence extractions
The needs_review flag triggers when confidence falls below 0.7. Add a Switch node after parsing to route low-confidence invoices to a Slack message or email alert asking a human to verify before the data is written to the spreadsheet.
Accuracy in practice: On clean, machine-generated PDFs (the majority of invoices from SaaS vendors, suppliers with modern billing systems), extraction accuracy is consistently 95%+. Scanned or handwritten invoices are harder — expect 80–90% field accuracy and plan for a human review step for those cases.
Comparison: Local vs. Cloud Invoice Processing
| | n8n + Ollama (Local) | AWS Textract / Google Doc AI |
|---|---|---|
| Cost | $0 per document | $0.01–$0.15 per page |
| Data privacy | Never leaves your server | Sent to cloud for processing |
| GDPR | No third-party processor for this step | Requires DPA, data residency review |
| Setup complexity | Moderate (30–60 min) | Low (API key + SDK) |
| Accuracy on clean PDFs | 95%+ | 98%+ |
| Accuracy on scanned docs | 80–90% | 90–95% |
| Customization | Full control over fields and logic | Limited to API schema |
| Break-even volume | Day 1 (hardware cost aside) | Costly above ~1,000 pages/month |
For a company processing 200 invoices per month at an average of 2 pages each (400 pages), AWS Textract costs roughly $4–$60/month depending on the feature tier used. Over a year that's $48–$720. The local approach pays for itself quickly if you already have a server running n8n.
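The arithmetic behind those figures is easy to rerun with your own volumes. This short sketch works in whole cents to avoid floating-point rounding; the function name `cloud_cost_cents` is hypothetical, and the two rates are the $0.01 and $0.15 per-page ends of the range quoted in the table.

```python
def cloud_cost_cents(invoices_per_month: int, pages_per_invoice: int,
                     cents_per_page: int, months: int = 1) -> int:
    """Per-page cloud processing cost in cents over a number of months."""
    return invoices_per_month * pages_per_invoice * cents_per_page * months


if __name__ == "__main__":
    # 1 cent and 15 cents per page: the low and high ends of the range.
    for label, rate in (("low", 1), ("high", 15)):
        monthly = cloud_cost_cents(200, 2, rate)
        yearly = cloud_cost_cents(200, 2, rate, months=12)
        print(f"{label}: ${monthly / 100:.2f}/month, ${yearly / 100:.2f}/year")
```

Swap in your own invoice count and page average to see where your volume lands in that range.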
Want the Full Production-Ready Pack?
The Self-Hosted AI Workflow Pack includes an advanced invoice processor with OCR support for scanned PDFs, human-review routing via Slack, multi-currency normalization, PostgreSQL storage, duplicate detection, and 10 more AI workflows — all running locally with Ollama.
Get All 11 Workflows: $39. One-time purchase. No subscriptions. 30-day money-back guarantee.
What's in the Full Pack
The free workflow above handles the core extraction. The full pack adds:
- OCR preprocessing — Automatically detects scanned vs. digital PDFs and routes through Tesseract OCR when needed
- Duplicate detection — Checks the database for matching invoice numbers before inserting to prevent double-processing
- Multi-currency normalization — Converts extracted amounts to a base currency using daily exchange rates
- Approval workflow — Invoices above a configurable threshold are sent to Slack for approval before logging
- Vendor matching — Fuzzy-matches extracted vendor names against your approved vendor list
- PostgreSQL schema — Ready-to-run SQL schema with invoices and line_items tables
- 10 additional workflows — Lead scoring, email automation, social content generation, customer support triage, and more
Next Steps
- Import the workflow: copy the JSON above and paste it onto the n8n canvas, or save it to a file and use Import from File in the workflow menu
- Test with your invoices: extract text from a real PDF and send it via the curl command
- Connect your email inbox: replace the Webhook node with an Email Trigger (IMAP) node
- Add Google Sheets output: connect a Google Sheets node to log every extracted invoice
- Set up human review: add a Slack notification for low-confidence extractions