Build a Self-Hosted AI Chatbot with n8n + Ollama (No API Keys)
Every SaaS chatbot solution — Intercom, Drift, Zendesk AI — charges $50–300/month and sends your customer conversations to third-party servers. For companies handling sensitive data (healthcare, legal, finance), that's a compliance headache. For everyone else, it's an unnecessary recurring cost.
With n8n and Ollama, you can build a fully private AI chatbot that runs on your own hardware. No API keys, no per-message fees, no data leaving your infrastructure. It handles customer questions, routes complex issues to humans, maintains conversation context, and integrates with Slack, Telegram, or any web frontend.
In this tutorial, you'll build a chatbot workflow that:
- Accepts messages from multiple channels (webhook, Slack, Telegram)
- Maintains conversation history with session memory
- Answers questions using a custom knowledge base (RAG)
- Escalates to a human agent when confidence is low
- Logs all conversations for analytics
Why Self-Hosted Chatbots Make Sense in 2026
The landscape has shifted. Local LLMs like Llama 3, Mistral, and Qwen 2.5 now handle conversational tasks that required GPT-4 two years ago. Here's the comparison:
| | Cloud Chatbot SaaS | Self-Hosted (n8n + Ollama) |
|---|---|---|
| Monthly cost | $50–300/month | $0 (runs on existing hardware) |
| Per-message cost | $0.002–0.06/message | $0 |
| Data privacy | Conversations sent to cloud | 100% on your servers |
| GDPR compliance | Requires DPA with vendor | Compliant by design |
| Customization | Limited to vendor's features | Full control over everything |
| Vendor lock-in | High (proprietary formats) | None (open source stack) |
When self-hosted works best: Internal company chatbots (IT helpdesk, HR FAQ), customer support for regulated industries, any use case where conversation data is sensitive. If you're processing 1,000+ concurrent conversations and need sub-100ms latency, a cloud solution may still be more practical.
The Architecture
The chatbot uses n8n's Chat Trigger for the web interface, with optional Slack/Telegram integrations. Ollama provides the AI brain, and PostgreSQL (or Redis) handles session memory.
User Message (Web / Slack / Telegram)
        ↓
[Chat Trigger / Webhook] → [Load Session Memory]
        ↓
[Build Context Prompt]
        ↓
[Ollama Chat Model]
        ↓
[Confidence Check]
      /        \
 [High]        [Low]
    ↓             ↓
[Send Response]  [Escalate to Human]
    ↓
[Save to Memory]
    ↓
[Log Conversation]
Step 1: Set Up the Chat Trigger
n8n has a built-in Chat Trigger node that provides a web-based chat widget out of the box. Add one to your canvas:
- Node: Chat Trigger
- Authentication: None (or Basic Auth for internal use)
- Initial messages:
["Hi! I'm your AI assistant. How can I help you today?"]
This gives you a working chat URL at https://your-n8n.com/webhook/chat that you can embed in any website via iframe or link to directly.
Alternative: Slack Integration
To receive messages from Slack, use a Webhook node instead:
// Slack Event Subscription webhook
// Configure at api.slack.com → Event Subscriptions
// Subscribe to: message.channels, message.im
// The webhook receives:
{
  "event": {
    "type": "message",
    "text": "How do I reset my password?",
    "user": "U024BE7LH",
    "channel": "C024BE91L",
    "ts": "1710000000.000100"
  }
}
Alternative: Telegram Integration
Use n8n's built-in Telegram Trigger node:
- Create a bot via @BotFather on Telegram
- Add the Telegram Trigger node with your bot token
- Set trigger on: message
Step 2: Session Memory with PostgreSQL
Chatbots need memory. Without it, every message is treated as a fresh conversation. n8n supports multiple memory backends:
// Use n8n's built-in Memory Buffer Window node
// or PostgreSQL Chat Memory for persistence
// PostgreSQL Chat Memory Configuration:
// Connection: your-postgres-connection
// Session ID: {{ $json.sessionId || $json.event?.user || 'default' }}
// Context Window Length: 10 (keeps last 10 messages)
// Table Name: chatbot_memory
Session ID strategy: Use the user's unique identifier as the session ID. For Slack it's event.user, for Telegram it's message.chat.id, for web chat it's the browser session cookie. This ensures each user gets their own conversation thread.
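The strategy above can be sketched as a small Code node that normalizes the session ID across all three channels. The field names match the payload shapes shown earlier; the channel prefixes are an assumption added here to keep IDs from different channels from colliding.

```javascript
// Sketch of a session ID resolver for a multi-channel chatbot.
// Field names follow the Chat Trigger, Slack, and Telegram payloads above;
// the "web:/slack:/telegram:" prefixes are illustrative, not required by n8n.
function resolveSessionId(payload) {
  // n8n's Chat Trigger provides sessionId directly
  if (payload.sessionId) return `web:${payload.sessionId}`;
  // Slack events carry the user ID in event.user
  if (payload.event && payload.event.user) return `slack:${payload.event.user}`;
  // Telegram updates carry the chat ID in message.chat.id
  if (payload.message && payload.message.chat) return `telegram:${payload.message.chat.id}`;
  return 'default';
}
```

Inside an n8n Code node you would call this on `$json` and pass the result to the memory node's Session ID field.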
Step 3: Build the AI Agent with Ollama
The core of the chatbot is an AI Agent node connected to Ollama. This gives you tool-calling capabilities, not just text generation.
// AI Agent Node Configuration
System Prompt:
"You are a helpful customer support assistant for [YOUR COMPANY].
Rules:
- Answer questions based on the knowledge base provided via tools.
- If you don't know the answer, say so honestly. Do not make up information.
- Keep responses concise (2-3 sentences for simple questions).
- For complex issues that require human intervention (billing disputes,
account deletion, technical bugs), respond with: ESCALATE: [reason]
- Be friendly but professional. No emojis unless the user uses them first.
- If asked about pricing, always refer to the official pricing page.
Company context:
- Product: [Your product description]
- Support hours: Monday-Friday 9am-6pm CET
- Escalation email: support@yourcompany.com"
Connect the Ollama Chat Model sub-node:
- Base URL: http://localhost:11434
- Model: llama3.1:8b (best balance of quality and speed)
- Temperature: 0.3 (consistent, factual responses)
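Under the hood, the Ollama Chat Model node talks to Ollama's `/api/chat` endpoint. If you ever need to debug responses with curl or a custom HTTP Request node, this sketch shows the request body that endpoint expects, using the same model and temperature configured above:

```javascript
// Sketch of the request body for Ollama's /api/chat endpoint.
// n8n's Ollama Chat Model node builds this for you; shown for debugging only.
function buildChatRequest(systemPrompt, history, userMessage) {
  return {
    model: 'llama3.1:8b',
    messages: [
      { role: 'system', content: systemPrompt },
      ...history,                     // prior { role, content } turns from memory
      { role: 'user', content: userMessage },
    ],
    stream: false,                    // one JSON response instead of a chunk stream
    options: { temperature: 0.3 },    // matches the node configuration above
  };
}
```

POST this object to `http://localhost:11434/api/chat` and the reply arrives in `message.content`.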
Step 4: Add a Knowledge Base (RAG)
A chatbot that only uses the LLM's training data will hallucinate about your specific product. Add a Qdrant Vector Store tool so the agent can search your documentation:
4a: Ingest Your Documentation
// Separate ingestion workflow (run once, then on doc updates)
//
// [Manual Trigger] → [Read Files] → [Text Splitter] → [Ollama Embeddings] → [Qdrant Upsert]
//
// Text Splitter settings:
// Chunk size: 500 tokens
// Chunk overlap: 50 tokens
//
// Ollama Embeddings:
// Model: nomic-embed-text
// Base URL: http://localhost:11434
//
// Qdrant:
// URL: http://localhost:6333
// Collection: company_docs
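To make the chunk size and overlap settings concrete, here is a sketch of the sliding-window logic the Text Splitter applies. It uses words as a rough stand-in for tokens (n8n's splitter counts actual tokens), so the numbers won't match exactly:

```javascript
// Sketch of sliding-window chunking with overlap, as in the Text Splitter above.
// Words approximate tokens here; real tokenizers split more finely.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap;   // each advance leaves `overlap` words shared
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break;  // final window reached the end
  }
  return chunks;
}
```

The overlap ensures a sentence that straddles a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that happen to sit at a split point.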
4b: Connect as Agent Tool
// Add a "Qdrant Vector Store" tool to the AI Agent
//
// Tool name: search_knowledge_base
// Description: "Search the company knowledge base for product documentation,
// FAQs, and support articles. Use this for any product-specific
// questions."
//
// Qdrant URL: http://localhost:6333
// Collection: company_docs
// Embedding model: nomic-embed-text (via Ollama)
// Top K: 3 (return top 3 most relevant chunks)
Embedding model tip: Use nomic-embed-text for RAG. It's designed specifically for retrieval tasks and is competitive with much larger general-purpose models on search accuracy. At only 274MB, it loads fast and uses minimal memory alongside your chat model.
Step 5: Confidence Check and Human Escalation
Not every question should be answered by AI. Add a check for low-confidence responses and explicit escalation requests:
// Function Node: Check Confidence
const response = $json.output || $json.text || '';

// Check for explicit escalation
if (response.includes('ESCALATE:')) {
  const reason = response.split('ESCALATE:')[1]?.trim() || 'Complex issue';
  return [{
    json: {
      action: 'escalate',
      reason,
      originalMessage: $('Chat Trigger').item.json.chatInput,
      sessionId: $('Chat Trigger').item.json.sessionId
    }
  }];
}

// Check for hedging language (low confidence)
const hedges = ['I\'m not sure', 'I don\'t have', 'I cannot find', 'beyond my knowledge'];
const isLowConfidence = hedges.some(h => response.toLowerCase().includes(h.toLowerCase()));

if (isLowConfidence) {
  return [{
    json: {
      action: 'escalate',
      reason: 'Low confidence response detected',
      aiResponse: response,
      originalMessage: $('Chat Trigger').item.json.chatInput
    }
  }];
}

return [{ json: { action: 'respond', response } }];
For escalation, send a notification to your support team via Slack, email, or a ticketing system:
// Slack Notification for Escalation
// Channel: #support-escalations
// Message:
"New chatbot escalation:
*User message:* {{ $json.originalMessage }}
*Reason:* {{ $json.reason }}
*Session:* {{ $json.sessionId }}
*AI attempted response:* {{ $json.aiResponse || 'N/A' }}"
Step 6: Conversation Logging
Log every conversation for analytics and training data:
// Function Node: Log Conversation
const timestamp = new Date().toISOString();

const logEntry = {
  timestamp,
  sessionId: $('Chat Trigger').item.json.sessionId || 'unknown',
  userMessage: $('Chat Trigger').item.json.chatInput,
  aiResponse: $json.response || $json.output,
  wasEscalated: $json.action === 'escalate',
  model: 'llama3.1:8b'
};

// Store in PostgreSQL, or append to a file
return [{ json: logEntry }];
Model Selection for Chatbots
The model you choose affects response quality, speed, and memory usage:
| Model | Size | Response Quality | Speed | Best For |
|---|---|---|---|---|
| llama3.1:8b | 4.7 GB | Excellent | Fast | General-purpose chatbot (recommended) |
| mistral:7b | 4.1 GB | Very good | Very fast | Quick responses, simpler conversations |
| qwen2.5:7b | 4.7 GB | Excellent | Fast | Multilingual support, tool calling |
| llama3.1:70b | 40 GB | Outstanding | Slow | Complex reasoning, nuanced responses |
| phi3:3.8b | 2.3 GB | Good | Very fast | Low-resource servers, high throughput |
Recommendation: Start with llama3.1:8b. It handles multi-turn conversations well, follows system prompts reliably, and runs fast on consumer GPUs (RTX 3060+). For servers with only CPU, use phi3:3.8b — it's surprisingly good for its size and responds in 2–5 seconds on modern CPUs.
Adding a Web Chat Widget
n8n's Chat Trigger provides a built-in chat page, but you can also embed it in your website. Add this snippet to any HTML page:
<!-- Minimal chat widget embedding -->
<iframe
src="https://your-n8n.com/webhook/chat"
style="position: fixed; bottom: 20px; right: 20px;
width: 400px; height: 600px; border: none;
border-radius: 12px; box-shadow: 0 4px 24px rgba(0,0,0,0.3);
z-index: 10000;"
></iframe>
<!-- Or use n8n's official chat widget -->
<link href="https://cdn.jsdelivr.net/npm/@n8n/chat/style.css" rel="stylesheet" />
<script type="module">
import { createChat } from 'https://cdn.jsdelivr.net/npm/@n8n/chat/chat.bundle.es.js';
createChat({
webhookUrl: 'https://your-n8n.com/webhook/chatbot',
initialMessages: ['Hi! How can I help you today?'],
theme: { primaryColor: '#4f46e5' }
});
</script>
Production Hardening
Rate Limiting
Prevent abuse by limiting messages per session:
- Add a Function node that checks message count per session in the last hour
- If > 50 messages/hour from one session, return a "slow down" message
- Block sessions that send > 200 messages/hour entirely
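The three rules above can be sketched as a sliding-window limiter keyed by session ID. In n8n you would persist the `buckets` object via workflow static data or Redis; a plain object is used here so the logic stays self-contained:

```javascript
// Sketch of a sliding-window rate limiter per session.
// Limits match the rules above; persistence is left to your deployment.
const WINDOW_MS = 60 * 60 * 1000;   // one hour
const SOFT_LIMIT = 50;              // "slow down" above this
const HARD_LIMIT = 200;             // block entirely above this

function checkRateLimit(buckets, sessionId, now = Date.now()) {
  const cutoff = now - WINDOW_MS;
  // keep only timestamps inside the current one-hour window
  const recent = (buckets[sessionId] || []).filter(ts => ts > cutoff);
  recent.push(now);
  buckets[sessionId] = recent;
  if (recent.length > HARD_LIMIT) return 'block';
  if (recent.length > SOFT_LIMIT) return 'slow_down';
  return 'ok';
}
```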
Input Sanitization
Always sanitize user input before passing to the LLM:
- Strip HTML tags to prevent XSS in logged conversations
- Truncate messages to 2000 characters (prevents context window flooding)
- Filter prompt injection attempts (messages trying to override system prompt)
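A sanitizer covering all three checks might look like this sketch. The injection patterns are illustrative examples, not an exhaustive filter:

```javascript
// Sketch of an input sanitizer for chatbot messages.
// INJECTION_PATTERNS is a small illustrative list, not a complete defense.
const MAX_LENGTH = 2000;
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /reveal .*system prompt/i,
];

function sanitizeInput(raw) {
  // strip HTML tags so logged conversations can't carry XSS payloads
  let text = String(raw).replace(/<[^>]*>/g, '');
  // truncate so one message can't flood the context window
  text = text.slice(0, MAX_LENGTH).trim();
  // flag likely prompt-injection attempts rather than silently passing them on
  const flagged = INJECTION_PATTERNS.some(p => p.test(text));
  return { text, flagged };
}
```

Flagged messages can be routed straight to the escalation path instead of the model.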
Monitoring
- Track average response time — if it exceeds 10 seconds, your Ollama instance may need more resources
- Monitor escalation rate — if > 30% of conversations escalate, improve your knowledge base
- Log token usage per conversation to estimate capacity
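The escalation-rate check above is straightforward to compute from the log entries produced in Step 6. A sketch of a weekly analytics pass, assuming the log shape shown there:

```javascript
// Sketch of a summary pass over logged conversations (log shape from Step 6).
function summarizeLogs(entries) {
  const total = entries.length;
  const escalated = entries.filter(e => e.wasEscalated).length;
  const sessions = new Set(entries.map(e => e.sessionId)).size;
  return {
    total,
    sessions,
    escalationRate: total ? escalated / total : 0,  // review KB if this exceeds 0.3
  };
}
```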
Complete Workflow JSON
Import this into n8n to get the base chatbot running:
{
  "name": "AI Chatbot (Ollama + RAG + Escalation)",
  "nodes": [
    {
      "parameters": {
        "options": {
          "allowedOrigins": "*"
        }
      },
      "id": "chat-trigger",
      "name": "Chat Trigger",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "typeVersion": 1.1,
      "position": [240, 300],
      "webhookId": "chatbot"
    },
    {
      "parameters": {
        "options": {
          "systemMessage": "You are a helpful customer support assistant. Answer questions using the knowledge base tool when available. If you cannot find the answer or the issue requires human intervention, respond with ESCALATE: followed by the reason. Keep responses concise and professional."
        }
      },
      "id": "ai-agent",
      "name": "AI Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "typeVersion": 1.7,
      "position": [480, 300]
    },
    {
      "parameters": {
        "model": "llama3.1:8b",
        "options": {
          "temperature": 0.3
        }
      },
      "id": "ollama-chat",
      "name": "Ollama Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOllama",
      "typeVersion": 1,
      "position": [480, 500]
    },
    {
      "parameters": {
        "contextWindowLength": 10
      },
      "id": "memory",
      "name": "Simple Memory",
      "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
      "typeVersion": 1.2,
      "position": [640, 500]
    },
    {
      "parameters": {
        "toolDescription": "Search the company knowledge base for product documentation, FAQs, and support articles. Use this tool for any product-specific questions.",
        "qdrantCollection": "company_docs",
        "topK": 3
      },
      "id": "qdrant-tool",
      "name": "Knowledge Base (Qdrant)",
      "type": "@n8n/n8n-nodes-langchain.toolVectorStore",
      "typeVersion": 1,
      "position": [360, 500]
    },
    {
      "parameters": {
        "model": "nomic-embed-text"
      },
      "id": "embeddings",
      "name": "Ollama Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
      "typeVersion": 1,
      "position": [360, 680]
    }
  ],
  "connections": {
    "Chat Trigger": {
      "main": [[{ "node": "AI Agent", "type": "main", "index": 0 }]]
    },
    "Ollama Chat Model": {
      "ai_languageModel": [[{ "node": "AI Agent", "type": "ai_languageModel", "index": 0 }]]
    },
    "Simple Memory": {
      "ai_memory": [[{ "node": "AI Agent", "type": "ai_memory", "index": 0 }]]
    },
    "Knowledge Base (Qdrant)": {
      "ai_tool": [[{ "node": "AI Agent", "type": "ai_tool", "index": 0 }]]
    },
    "Ollama Embeddings": {
      "ai_embedding": [[{ "node": "Knowledge Base (Qdrant)", "type": "ai_embedding", "index": 0 }]]
    }
  },
  "settings": { "executionOrder": "v1" },
  "tags": [{ "name": "AI" }, { "name": "Ollama" }, { "name": "Chatbot" }, { "name": "RAG" }, { "name": "Customer Support" }]
}
Next Steps
Once your base chatbot is running:
- Populate the knowledge base — Ingest your docs, FAQs, and support articles into Qdrant
- Add channel integrations — Connect Slack and/or Telegram using n8n's built-in nodes
- Set up escalation routing — Connect the escalation path to your ticketing system (Jira, Linear, email)
- Monitor and improve — Review logged conversations weekly. Add common questions to the knowledge base. Fine-tune the system prompt based on real interactions.
A self-hosted chatbot with n8n + Ollama gives you full control over your AI assistant — the conversation data, the model behavior, the integration points, and the cost. Start with the template above and customize it for your specific use case.
Want 11 Production-Ready AI Workflows?
The Self-Hosted AI Workflow Pack includes a chatbot template, email automation, document processing, lead scoring, and 7 more n8n + Ollama workflows. One payment, unlimited runs, zero API costs.
Get the Full Pack — $39