How to Extract Invoice Data with n8n + Ollama (Free AI Document Processing Workflow)

Published March 24, 2026 · 11 min read

Invoice processing is one of the most tedious tasks in any finance or operations team. Manually copying vendor names, invoice numbers, line items, and totals from PDFs into spreadsheets is slow, error-prone, and mind-numbing. Cloud OCR and document AI services like AWS Textract or Google Document AI can help — but they charge per page, require sending your financial documents to external servers, and raise serious GDPR concerns.

With n8n and Ollama, you can build an automated invoice extraction pipeline that runs entirely on your own hardware. No API costs, no cloud storage of sensitive financial data, and full control over how your documents are processed.

In this tutorial, you'll build a workflow that:

Receives invoices via email attachment or webhook upload
Extracts raw text from the PDF or image
Sends the text to Ollama for structured data extraction
Outputs clean, structured JSON with all invoice fields
Logs the data to Google Sheets or a database automatically

Why Local AI for Invoice Processing?

Financial documents are among the most sensitive data your business handles. Invoices contain vendor relationships, pricing agreements, bank details, and spending patterns — information you do not want sitting on a third-party server.

Concern	Cloud API Approach	n8n + Ollama (Local)
Data privacy	Documents sent to AWS, Google, or OpenAI servers	Never leaves your server
GDPR compliance	Requires DPA agreements, data residency checks	No third-party processor, so no DPA needed for extraction
Cost	$0.01–$0.15 per page (adds up fast)	$0 per document
Volume limits	Rate limits and tier caps	Bounded only by your own hardware
Vendor lock-in	Dependent on API availability and pricing	No external API dependency

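GDPR note: Under GDPR, sending employee or customer financial data to a third-party processor requires a Data Processing Agreement (DPA), and transfers outside the EU/EEA need additional safeguards. Running Ollama locally removes the third-party processor from the extraction step, so neither requirement arises for it: the data never leaves your infrastructure. Your other GDPR obligations, such as retention limits and access control, still apply.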

The Architecture

The invoice extraction pipeline follows a straightforward pattern: receive the document, extract text, structure it with AI, and output the data.

[Email trigger / Webhook: Invoice arrives]
        |
        v
[Extract text from PDF/image attachment]
        |
        v
[Ollama: Structured data extraction]
        |
        v
[Set node: Parse + validate JSON fields]
        |
        v
[Google Sheets / PostgreSQL: Store results]
        |
        v
[Optional: Slack alert or approval workflow]

The Ollama call is the core of the pipeline. A well-engineered extraction prompt reliably pulls structured fields from raw invoice text, even when the formatting varies across vendors — which it always does.

Fields to Extract

A complete invoice extraction should capture these fields:

Field	Description	Example
`vendor_name`	Supplier company name	"Acme Supplies Ltd"
`vendor_address`	Supplier billing address	"123 Main St, London EC1A"
`invoice_number`	Unique invoice identifier	"INV-2026-00842"
`invoice_date`	Date invoice was issued	"2026-03-15"
`due_date`	Payment due date	"2026-04-14"
`currency`	ISO currency code	"GBP"
`line_items`	Array of products/services	[{description, qty, unit_price, total}]
`subtotal`	Total before tax	450.00
`tax_rate`	VAT/GST/tax percentage	20
`tax_amount`	Tax amount in currency	90.00
`total_amount`	Final amount due	540.00
`payment_terms`	Net-30, Net-60, etc.	"Net 30"
`purchase_order`	Your PO number if referenced	"PO-2026-112"
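
The field table above maps naturally onto a small shape check. The sketch below is illustrative only: `FIELD_TYPES`, `REQUIRED`, and `checkInvoiceShape` are hypothetical names that are not part of the workflow, and the choice of which four fields count as required is an assumption you would adjust to your own bookkeeping rules.

```javascript
// Expected kind of value for each field in the table above.
const FIELD_TYPES = {
  vendor_name: "string",
  vendor_address: "string",
  invoice_number: "string",
  invoice_date: "date",
  due_date: "date",
  currency: "currency",
  line_items: "array",
  subtotal: "number",
  tax_rate: "number",
  tax_amount: "number",
  total_amount: "number",
  payment_terms: "string",
  purchase_order: "string",
};

// Assumption: a row is unusable without these four fields.
const REQUIRED = ["vendor_name", "invoice_number", "invoice_date", "total_amount"];

const CHECKS = {
  string: (v) => typeof v === "string",
  number: (v) => typeof v === "number" && Number.isFinite(v),
  array: (v) => Array.isArray(v),
  date: (v) => typeof v === "string" && /^\d{4}-\d{2}-\d{2}$/.test(v),
  currency: (v) => typeof v === "string" && /^[A-Z]{3}$/.test(v),
};

// Returns a list of problems; an empty list means the shape looks fine.
function checkInvoiceShape(invoice) {
  const problems = [];
  for (const [field, kind] of Object.entries(FIELD_TYPES)) {
    const value = invoice[field];
    if (value === null || value === undefined) {
      // null is allowed for optional fields, but not for required ones
      if (REQUIRED.includes(field)) problems.push(`${field}: missing`);
      continue;
    }
    if (!CHECKS[kind](value)) problems.push(`${field}: expected ${kind}`);
  }
  return problems;
}
```

A check like this catches the two failure modes local models produce most often: amounts returned as strings such as "£540.00" and dates left in their printed format.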

The Ollama Extraction Prompt

The prompt is the most important part of this workflow. It needs to handle the wide variation in how different vendors format their invoices — some use tables, some use paragraphs, some mix both.

Here is the extraction prompt used in this workflow:

You are an invoice data extraction specialist. Extract all invoice information from the following document text and return it as structured JSON.

Extract these fields exactly:
- vendor_name: the supplier/seller company name
- vendor_address: the supplier's address (single string)
- invoice_number: the invoice ID, reference number, or invoice #
- invoice_date: date the invoice was issued (ISO format: YYYY-MM-DD)
- due_date: payment due date (ISO format: YYYY-MM-DD, or null if not found)
- currency: 3-letter ISO currency code (USD, EUR, GBP, etc.)
- line_items: array of objects with fields: description, quantity, unit_price, total
- subtotal: total before tax (number, no currency symbol)
- tax_rate: tax/VAT percentage as a number (e.g. 20 for 20%), or null
- tax_amount: total tax charged (number), or null
- total_amount: final total amount due (number, no currency symbol)
- payment_terms: payment terms string (e.g. "Net 30"), or null
- purchase_order: PO number if referenced, or null
- confidence: your confidence in the extraction from 0.0 to 1.0

Rules:
- Return ONLY valid JSON, no explanation, no markdown code blocks
- If a field cannot be found, use null
- All monetary values must be numbers (not strings)
- Dates must be in YYYY-MM-DD format
- If the currency symbol is present ($ £ €), infer the currency code

Invoice text:
{{invoice_text}}

Why this prompt works: The explicit field list, strict JSON-only instruction, and null-fallback rules are critical. Without them, local models tend to add explanatory text around the JSON or format numbers as strings like "£540.00". The workflow also sends temperature: 0.1 in the request options (it is an API setting, not part of the prompt), which keeps output largely consistent across repeated runs on the same document, though not strictly deterministic.
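
To make the substitution of {{invoice_text}} concrete, here is a minimal sketch of how the prompt and the request body for Ollama's /api/generate endpoint could be assembled in plain JavaScript. The helper name `buildGenerateRequest` is hypothetical, and the template in the usage lines is a shortened stand-in for the full prompt above; the body fields mirror the ones the workflow sends.

```javascript
// Builds the JSON body for POST /api/generate from a prompt template
// containing the {{invoice_text}} placeholder.
function buildGenerateRequest(template, invoiceText, model = "llama3:8b") {
  // A replacer function keeps "$" sequences in the invoice text literal.
  // A plain string replacement would treat "$&" or "$$" as special patterns,
  // which matters for documents full of dollar amounts.
  const prompt = template.replace("{{invoice_text}}", () => invoiceText);
  return {
    model,
    prompt,
    stream: false, // one complete JSON reply instead of a token stream
    options: { temperature: 0.1, num_predict: 1000 },
  };
}

// Usage with a shortened stand-in for the full prompt.
const template = "Return ONLY valid JSON.\n\nInvoice text:\n{{invoice_text}}";
const body = buildGenerateRequest(template, "Total Due: $420.00");
console.log(JSON.stringify(body, null, 2));
```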

Prerequisites

You need n8n and Ollama running. If you haven't set them up yet:

# Install Ollama and pull a capable model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3:8b

# For better extraction accuracy on complex invoices:
# ollama pull mistral:7b-instruct

# Run n8n in Docker
docker run -d --name n8n -p 5678:5678 \
  --add-host=host.docker.internal:host-gateway \
  -v n8n_data:/home/node/.n8n \
  n8nio/n8n

Model choice: llama3:8b handles most invoices well. For invoices with complex multi-page tables or non-English text, mistral:7b-instruct or llama3:70b (if your hardware allows) gives more reliable extraction. The workflow JSON works with any Ollama model — just update the model name field.
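
Before importing the workflow, it can help to confirm that the model you plan to use is actually installed. Ollama lists local models at GET /api/tags (for example, curl http://localhost:11434/api/tags). The sketch below checks a parsed response of that shape; `hasModel` is a hypothetical helper, and the sample object is trimmed to the one field the check relies on.

```javascript
// Returns true if the parsed /api/tags response lists the wanted model.
function hasModel(tagsResponse, wanted) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  // Ollama lists untagged pulls as "<name>:latest", so normalize both sides.
  const normalize = (name) => (name.includes(":") ? name : `${name}:latest`);
  return models.some(
    (m) => typeof m.name === "string" && normalize(m.name) === normalize(wanted)
  );
}

// Trimmed example of the response shape.
const sample = { models: [{ name: "llama3:8b" }, { name: "mistral:7b-instruct" }] };
console.log(hasModel(sample, "llama3:8b"));  // true
console.log(hasModel(sample, "llama3:70b")); // false
```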

Free Workflow: AI Invoice Extractor

Here is a complete, working n8n workflow. Import the JSON directly into your n8n instance to get started immediately.

What Each Step Does

STEP 1: Email Trigger or Webhook

The workflow starts when an invoice arrives. Use the Email Trigger (IMAP) node to monitor an inbox like invoices@yourcompany.com, or use a Webhook node if you're uploading invoices via an API or form. The node captures the raw attachment binary data and passes it downstream.

STEP 2: Text Extraction

An HTTP Request node calls a local text extraction service, or you can use n8n's Extract from File node (available in newer n8n versions). For PDF invoices, the workflow pipes the binary through a simple PDF-to-text conversion. The output is a plain text string containing all the invoice content. The free workflow below skips this step and reads already-extracted text from the webhook body (the text or invoice_text field).
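
PDF-to-text output usually arrives with mixed line endings, trailing spaces, and long runs of blank lines. A light clean-up before prompting keeps the input compact. This is a minimal sketch that could run in an n8n Code node or any Node.js script; `normalizeInvoiceText` is a hypothetical helper, and it assumes the extraction step gives you one string per page.

```javascript
// Joins per-page text into one string and tidies whitespace so the prompt
// is not padded with layout debris from the PDF-to-text step.
function normalizeInvoiceText(pages) {
  return pages
    .map((page) =>
      page
        .replace(/\r\n?/g, "\n")    // normalize Windows and old Mac line endings
        .replace(/[ \t]+$/gm, "")   // strip trailing spaces on each line
        .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
        .trim()
    )
    .filter((page) => page.length > 0) // drop empty pages
    .join("\n\n");
}

console.log(normalizeInvoiceText(["INVOICE\r\n\r\n\r\nFrom: Acme   \r\n", "", "Total Due: £420.00\n"]));
```

Spaces inside a line are deliberately left alone, since column alignment in line-item tables can help the model tell quantities from prices.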

STEP 3: Ollama Structured Extraction

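The raw invoice text is appended to the extraction prompt and sent to Ollama's local HTTP API with a POST to /api/generate. If n8n runs in the Docker container from the Prerequisites, Ollama on the host is reached at http://host.docker.internal:11434 rather than localhost; on Linux you may also need Ollama to listen beyond 127.0.0.1 (the OLLAMA_HOST environment variable controls this). Ollama replies with a JSON object whose response field holds the model's output, which should be the structured invoice JSON. The temperature: 0.1 setting keeps output largely consistent across runs.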

STEP 4: Parse and Validate

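A Set node pulls the JSON object out of the model's reply and parses it. If parsing fails or the extraction confidence is below 0.7, the invoice is flagged for human review through the needs_review field. High-confidence extractions pass through automatically. The free workflow does not check individual fields beyond this, so add your own checks for any fields you treat as required.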
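
The parsing logic in this step is easier to read outside a one-line n8n expression. The sketch below shows the same idea as a standalone function; `parseExtraction` and `REVIEW_THRESHOLD` are hypothetical names, and treating a missing confidence value as low confidence is a design choice made here, not something the prompt guarantees.

```javascript
const REVIEW_THRESHOLD = 0.7; // same cut-off the workflow uses

// Pulls the first {...} span out of the model's reply and parses it.
// Any failure results in needs_review: true rather than an exception.
function parseExtraction(raw) {
  const match = typeof raw === "string" ? raw.match(/\{[\s\S]*\}/) : null;
  if (!match) {
    return { extracted: { error: "parse_failed", raw }, needs_review: true };
  }
  try {
    const extracted = JSON.parse(match[0]);
    // A missing or non-numeric confidence is treated as low confidence.
    const confident =
      typeof extracted.confidence === "number" &&
      extracted.confidence >= REVIEW_THRESHOLD;
    return { extracted, needs_review: !confident };
  } catch (e) {
    return { extracted: { error: e.message }, needs_review: true };
  }
}

// The regex tolerates models that wrap the JSON in prose or code fences.
console.log(parseExtraction('Here you go: {"vendor_name":"Acme","confidence":0.97}'));
```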

STEP 5: Output to Google Sheets or Database

The structured invoice data is written to a Google Sheets spreadsheet or a PostgreSQL/MySQL database. Each invoice becomes one row with all fields as columns. Line items are stored as a JSON string or written to a separate child table if your database schema supports it. The free workflow below returns the result in the webhook response instead; the Google Sheets step is covered later in this tutorial.
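
Both storage options described above can be derived from the same extracted object. The sketch below is one way to do it, assuming the field names from the extraction prompt; `toRows` and the `position` column are hypothetical and would need to match your own sheet or table layout.

```javascript
// Splits one extracted invoice into a flat parent row and child line-item rows.
function toRows(invoice) {
  const { line_items, ...header } = invoice;
  const items = Array.isArray(line_items) ? line_items : [];
  return {
    // Option 1 (spreadsheet): one row, line items serialized as JSON text.
    sheetRow: { ...header, line_items: JSON.stringify(items) },
    // Option 2 (SQL child table): one row per line item, keyed by invoice number.
    lineItemRows: items.map((item, index) => ({
      invoice_number: invoice.invoice_number,
      position: index + 1, // 1-based order on the invoice
      description: item.description,
      quantity: item.quantity,
      unit_price: item.unit_price,
      total: item.total,
    })),
  };
}

const rows = toRows({
  invoice_number: "INV-2026-00842",
  vendor_name: "Acme Supplies Ltd",
  total_amount: 420,
  line_items: [{ description: "Setup Fee", quantity: 1, unit_price: 120, total: 120 }],
});
console.log(rows.sheetRow, rows.lineItemRows);
```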

The Workflow JSON


{
  "name": "AI Invoice Data Extractor (Ollama)",
  "nodes": [
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "extract-invoice",
        "responseMode": "responseNode",
        "options": {}
      },
      "id": "webhook",
      "name": "Receive Invoice",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 2,
      "position": [240, 300],
      "webhookId": "extract-invoice"
    },
    {
      "parameters": {
        "assignments": {
          "assignments": [
            {
              "id": "invoice_text",
              "name": "invoice_text",
              "value": "={{ $json.body.text || $json.body.invoice_text || '' }}",
              "type": "string"
            }
          ]
        }
      },
      "id": "prepare-text",
      "name": "Prepare Invoice Text",
      "type": "n8n-nodes-base.set",
      "typeVersion": 3.4,
      "position": [460, 300]
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://host.docker.internal:11434/api/generate",
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify({ model: 'llama3:8b', prompt: 'You are an invoice data extraction specialist. Extract all invoice information from the following document text and return it as structured JSON.\\n\\nExtract these fields exactly:\\n- vendor_name: the supplier/seller company name\\n- vendor_address: the supplier\\'s address (single string)\\n- invoice_number: the invoice ID, reference number, or invoice #\\n- invoice_date: date the invoice was issued (ISO format: YYYY-MM-DD)\\n- due_date: payment due date (ISO format: YYYY-MM-DD, or null if not found)\\n- currency: 3-letter ISO currency code (USD, EUR, GBP, etc.)\\n- line_items: array of objects with fields: description, quantity, unit_price, total\\n- subtotal: total before tax (number, no currency symbol)\\n- tax_rate: tax/VAT percentage as a number (e.g. 20 for 20%), or null\\n- tax_amount: total tax charged (number), or null\\n- total_amount: final total amount due (number, no currency symbol)\\n- payment_terms: payment terms string (e.g. \\'Net 30\\'), or null\\n- purchase_order: PO number if referenced, or null\\n- confidence: your confidence in the extraction from 0.0 to 1.0\\n\\nRules:\\n- Return ONLY valid JSON, no explanation, no markdown code blocks\\n- If a field cannot be found, use null\\n- All monetary values must be numbers (not strings)\\n- Dates must be in YYYY-MM-DD format\\n- If the currency symbol is present ($ £ €), infer the currency code\\n\\nInvoice text:\\n' + $json.invoice_text, stream: false, options: { temperature: 0.1, num_predict: 1000 } }) }}",
        "options": { "timeout": 120000 }
      },
      "id": "ollama-extract",
      "name": "Extract with Ollama",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [680, 300]
    },
    {
      "parameters": {
        "assignments": {
          "assignments": [
            {
              "id": "extracted",
              "name": "extracted",
              "value": "={{ (() => { try { const raw = $json.response; const match = raw.match(/\\{[\\s\\S]*\\}/); return match ? JSON.parse(match[0]) : { error: 'parse_failed', raw: raw } } catch(e) { return { error: e.message } } })() }}",
              "type": "object"
            },
            {
              "id": "needs_review",
              "name": "needs_review",
              "value": "={{ (() => { try { const m = ($json.response || '').match(/\\{[\\s\\S]*\\}/); return m ? !(JSON.parse(m[0]).confidence >= 0.7) : true } catch(e) { return true } })() }}",
              "type": "boolean"
            }
          ]
        }
      },
      "id": "parse-result",
      "name": "Parse Extraction Result",
      "type": "n8n-nodes-base.set",
      "typeVersion": 3.4,
      "position": [900, 300]
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify({ success: !$json.extracted.error, needs_review: $json.needs_review, invoice: $json.extracted }) }}",
        "options": {}
      },
      "id": "respond",
      "name": "Return Result",
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1.1,
      "position": [1120, 300]
    }
  ],
  "connections": {
    "Receive Invoice": {
      "main": [[{ "node": "Prepare Invoice Text", "type": "main", "index": 0 }]]
    },
    "Prepare Invoice Text": {
      "main": [[{ "node": "Extract with Ollama", "type": "main", "index": 0 }]]
    },
    "Extract with Ollama": {
      "main": [[{ "node": "Parse Extraction Result", "type": "main", "index": 0 }]]
    },
    "Parse Extraction Result": {
      "main": [[{ "node": "Return Result", "type": "main", "index": 0 }]]
    }
  },
  "settings": { "executionOrder": "v1" },
  "tags": [
    { "name": "AI" },
    { "name": "Ollama" },
    { "name": "Finance" },
    { "name": "Invoice Processing" },
    { "name": "Document Extraction" }
  ]
}

Testing the Workflow

Once imported and activated, test it by sending invoice text directly via curl. In production you'd pipe in the extracted PDF text, but for testing you can send the text inline:

curl -X POST http://localhost:5678/webhook/extract-invoice \
  -H "Content-Type: application/json" \
  -d '{
    "text": "INVOICE\n\nFrom: Acme Supplies Ltd\n123 Industrial Way, Manchester M1 2AB\n\nInvoice #: INV-2026-00842\nDate: 15 March 2026\nDue Date: 14 April 2026\nPayment Terms: Net 30\n\nBill To: Your Company Ltd\n\nItem             Qty    Unit Price    Total\nWeb Hosting x3    3      £50.00      £150.00\nSSL Certificate   1      £80.00       £80.00\nSetup Fee         1     £120.00      £120.00\n\nSubtotal: £350.00\nVAT (20%): £70.00\nTotal Due: £420.00\n\nBank: Barclays | Sort: 20-30-40 | Acc: 12345678"
  }'

Expected response (exact values, especially confidence, will vary by model and run):

{
  "success": true,
  "needs_review": false,
  "invoice": {
    "vendor_name": "Acme Supplies Ltd",
    "vendor_address": "123 Industrial Way, Manchester M1 2AB",
    "invoice_number": "INV-2026-00842",
    "invoice_date": "2026-03-15",
    "due_date": "2026-04-14",
    "currency": "GBP",
    "line_items": [
      { "description": "Web Hosting x3", "quantity": 3, "unit_price": 50.00, "total": 150.00 },
      { "description": "SSL Certificate",  "quantity": 1, "unit_price": 80.00, "total": 80.00 },
      { "description": "Setup Fee",         "quantity": 1, "unit_price": 120.00, "total": 120.00 }
    ],
    "subtotal": 350.00,
    "tax_rate": 20,
    "tax_amount": 70.00,
    "total_amount": 420.00,
    "payment_terms": "Net 30",
    "purchase_order": null,
    "confidence": 0.97
  }
}
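
A cheap sanity check on a response like this one is arithmetic: the line item totals should add up to the subtotal, and subtotal plus tax should equal the total. The sketch below is an optional addition, not part of the free workflow; `checkTotals` is a hypothetical helper, and it ignores discounts or shipping lines, which some invoices list separately.

```javascript
// Compare in integer cents to avoid floating-point noise.
const cents = (n) => Math.round((Number(n) || 0) * 100);

// Returns a list of arithmetic inconsistencies; empty means the numbers agree.
function checkTotals(invoice) {
  const problems = [];
  const items = Array.isArray(invoice.line_items) ? invoice.line_items : [];
  const itemSum = items.reduce((sum, item) => sum + cents(item.total), 0);

  if (items.length > 0 && invoice.subtotal != null && itemSum !== cents(invoice.subtotal)) {
    problems.push("line items do not add up to subtotal");
  }
  if (invoice.subtotal != null && invoice.total_amount != null) {
    const expected = cents(invoice.subtotal) + cents(invoice.tax_amount);
    if (expected !== cents(invoice.total_amount)) {
      problems.push("subtotal + tax does not equal total_amount");
    }
  }
  return problems;
}

// The sample response above passes: 150 + 80 + 120 = 350, and 350 + 70 = 420.
console.log(checkTotals({
  line_items: [{ total: 150.0 }, { total: 80.0 }, { total: 120.0 }],
  subtotal: 350.0,
  tax_amount: 70.0,
  total_amount: 420.0,
})); // []
```

A mismatch is a stronger signal than the model's self-reported confidence, so it is a reasonable extra trigger for routing an invoice to human review.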

Connecting to Email Automatically

The webhook approach works great for API-driven uploads. For a fully automated pipeline that processes invoices as they arrive by email, replace the Webhook node with an Email Trigger (IMAP) node:

Add an Email Trigger (IMAP) node and connect it to your invoices@yourcompany.com inbox
Enable "Download Attachments" in the node settings
Add an Extract from File node (for PDF) or use a text extraction HTTP call to convert the binary to text
Connect the text output to the "Prepare Invoice Text" node
Set the polling interval to every 5 or 15 minutes

With this setup, anyone on your team can forward an invoice to a dedicated email address and it will be automatically extracted and logged within minutes — no manual data entry required.
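
Forwarded emails often carry more than the invoice itself, such as signature logos or calendar files, so it is worth filtering attachments before text extraction. The sketch below could sit in a Code node after the trigger. It assumes each attachment appears as an entry in the item's binary data with mimeType and fileName properties; verify the property names against your own n8n version. `pickInvoiceAttachments` is a hypothetical helper.

```javascript
const ACCEPTED = ["application/pdf", "image/png", "image/jpeg", "image/tiff"];

// Returns the names of the binary properties worth sending to text extraction.
function pickInvoiceAttachments(binary) {
  return Object.entries(binary || {})
    .filter(([, file]) => {
      const mime = String(file?.mimeType || "").toLowerCase();
      const name = String(file?.fileName || "").toLowerCase();
      // Some mail clients send PDFs as application/octet-stream,
      // so fall back to the file extension.
      return ACCEPTED.includes(mime) || name.endsWith(".pdf");
    })
    .map(([key]) => key);
}

console.log(pickInvoiceAttachments({
  attachment_0: { mimeType: "application/pdf", fileName: "invoice.pdf" },
  attachment_1: { mimeType: "image/gif", fileName: "logo.gif" },
})); // [ 'attachment_0' ]
```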

Writing to Google Sheets

To log extracted invoices to a Google Sheets spreadsheet, add a Google Sheets node after the "Parse Extraction Result" node:

Connect your Google account in n8n's credentials manager
Create a spreadsheet with columns matching your invoice fields
Set the operation to "Append Row"
Map each field: vendor_name, invoice_number, invoice_date, total_amount, currency, etc.
For line_items, use JSON.stringify() to store the array as a string, or create a second sheet for line item detail rows

The result is a living spreadsheet of every invoice your business receives, auto-populated with structured data. Your accounts team can sort, filter, and export without touching the underlying automation.

Handling Tricky Invoice Formats

Invoices vary enormously between vendors. Here's how this workflow handles common edge cases:

Multi-page PDFs

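The text extraction step concatenates all pages before sending to Ollama, so the model receives the full document text and can find fields that span pages. Keep the num_predict value high (1000+) to allow a complete JSON response for documents with many line items. Long documents can also exceed the model's context window, in which case part of the input is truncated; raise num_ctx in the request options if your invoices run to many pages.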
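
Whether a long invoice fits can be estimated before the request is sent. Ollama's num_ctx option sets the context window, which has to hold both the prompt and the num_predict tokens of output. The sketch below uses a rough rule of thumb of four characters per token; that ratio is an assumption that varies by model and language, and `estimateContext` and `fitsContext` are hypothetical helpers.

```javascript
// Very rough heuristic: about 4 characters per token for English text.
// Real tokenizers differ, so treat the result as a warning signal only.
const CHARS_PER_TOKEN = 4;

// Estimates the tokens needed for the prompt plus the reserved output.
function estimateContext(promptText, numPredict = 1000) {
  const promptTokens = Math.ceil(promptText.length / CHARS_PER_TOKEN);
  return { promptTokens, needed: promptTokens + numPredict };
}

// True if prompt plus output are estimated to fit in a window of numCtx tokens.
function fitsContext(promptText, numCtx, numPredict = 1000) {
  return estimateContext(promptText, numPredict).needed <= numCtx;
}

const longPrompt = "x".repeat(8000); // stand-in for a multi-page invoice
console.log(estimateContext(longPrompt));   // { promptTokens: 2000, needed: 3000 }
console.log(fitsContext(longPrompt, 2048)); // false
console.log(fitsContext(longPrompt, 4096)); // true
```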

Image-based invoices (scanned PDFs)

Scanned PDFs require OCR before text extraction. In the full pack workflow, an additional step uses Tesseract (via a local Docker container) or an n8n HTTP call to a locally-running OCR service. This converts the image to text before the Ollama extraction step runs.
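
A simple way to decide whether a PDF needs the OCR path is to look at how much usable text the normal extraction produced. The sketch below is only a heuristic under stated assumptions, not the detection logic of the full pack: `needsOcr` is a hypothetical helper, and the threshold of 50 letters or digits per page is a guess to tune on your own documents.

```javascript
// Assumption: fewer than about 50 letters or digits per page suggests
// the PDF has no usable text layer and should go through OCR.
const MIN_CHARS_PER_PAGE = 50;

function needsOcr(extractedText, pageCount = 1) {
  // Count Unicode letters and digits, ignoring whitespace and punctuation.
  const useful = (extractedText.match(/[\p{L}\p{N}]/gu) || []).length;
  return useful / Math.max(pageCount, 1) < MIN_CHARS_PER_PAGE;
}

console.log(needsOcr(" \n\f \n", 2));                     // true
console.log(needsOcr("Invoice total 420 ".repeat(10), 1)); // false
```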

Non-English invoices

Multilingual models available through Ollama handle common European languages well. Add "The invoice may be in a language other than English. Extract fields into English field names but preserve the original text values" to the prompt for best results with mixed-language invoices.

Low confidence extractions

The needs_review flag triggers when confidence falls below 0.7. Add a Switch node after parsing to route low-confidence invoices to a Slack message or email alert asking a human to verify before the data is written to the spreadsheet.

Accuracy in practice: On clean, machine-generated PDFs (the majority of invoices from SaaS vendors, suppliers with modern billing systems), extraction accuracy is consistently 95%+. Scanned or handwritten invoices are harder — expect 80–90% field accuracy and plan for a human review step for those cases.

Comparison: Local vs. Cloud Invoice Processing

	n8n + Ollama (Local)	AWS Textract / Google Doc AI
Cost	$0 per document	$0.01–$0.15 per page
Data privacy	Never leaves your server	Sent to cloud for processing
GDPR	No third-party processor for this step	Requires DPA, data residency review
Setup complexity	Moderate (30–60 min)	Low (API key + SDK)
Accuracy on clean PDFs	95%+	98%+
Accuracy on scanned docs	80–90%	90–95%
Customization	Full control over fields and logic	Limited to API schema
Break-even volume	Day 1 (hardware cost aside)	Costly above ~1,000 pages/month

For a company processing 200 invoices per month at an average of 2 pages each (400 pages), AWS Textract costs roughly $4–$60/month depending on the feature tier used. Over a year that's $48–$720. The local approach pays for itself quickly if you already have a server running n8n.
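
The arithmetic behind those figures is easy to rerun for your own volume. The sketch below uses the per-page price range quoted in the comparison table, not an official price list; `cloudCost` is a hypothetical helper.

```javascript
// Monthly and yearly cloud cost for a given volume and per-page price,
// rounded to whole cents.
function cloudCost(invoicesPerMonth, pagesPerInvoice, pricePerPage) {
  const pages = invoicesPerMonth * pagesPerInvoice;
  const monthly = Math.round(pages * pricePerPage * 100) / 100;
  return { monthly, yearly: Math.round(monthly * 12 * 100) / 100 };
}

console.log(cloudCost(200, 2, 0.01)); // { monthly: 4, yearly: 48 }
console.log(cloudCost(200, 2, 0.15)); // { monthly: 60, yearly: 720 }
```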

Want the Full Production-Ready Pack?

The Self-Hosted AI Workflow Pack includes an advanced invoice processor with OCR support for scanned PDFs, human-review routing via Slack, multi-currency normalization, PostgreSQL storage, duplicate detection, and 10 more AI workflows — all running locally with Ollama.

Get All 11 Workflows — $39

One-time purchase. No subscriptions. 30-day money-back guarantee.

What's in the Full Pack

The free workflow above handles the core extraction. The full pack adds:

OCR preprocessing — Automatically detects scanned vs. digital PDFs and routes through Tesseract OCR when needed
Duplicate detection — Checks the database for matching invoice numbers before inserting to prevent double-processing
Multi-currency normalization — Converts extracted amounts to a base currency using daily exchange rates
Approval workflow — Invoices above a configurable threshold are sent to Slack for approval before logging
Vendor matching — Fuzzy-matches extracted vendor names against your approved vendor list
PostgreSQL schema — Ready-to-run SQL schema with invoices and line_items tables
10 additional workflows — Lead scoring, email automation, social content generation, customer support triage, and more

Next Steps

Import the workflow — Copy the JSON above and paste it onto an empty n8n canvas, or save it as a file and use Import from File in the workflow menu
Test with your invoices — Extract text from a real PDF and send it via the curl command
Connect your email inbox — Replace the Webhook node with an Email Trigger (IMAP) node
Add Google Sheets output — Connect a Google Sheets node to log every extracted invoice
Set up human review — Add a Slack notification for low-confidence extractions
