Build a Self-Hosted AI Chatbot with n8n + Ollama (No API Keys)
Every SaaS chatbot solution — Intercom, Drift, Zendesk AI — charges $50–300/month and sends your customer conversations to third-party servers. For companies handling sensitive data (healthcare, legal, finance), that's a compliance headache. For everyone else, it's an unnecessary recurring cost.
With n8n and Ollama, you can build a fully private AI chatbot that runs on your own hardware. No API keys, no per-message fees, no data leaving your infrastructure. It handles customer questions, routes complex issues to humans, maintains conversation context, and integrates with Slack, Telegram, or any web frontend.
In this tutorial, you'll build a chatbot workflow that:
- Accepts messages from multiple channels (webhook, Slack, Telegram)
- Maintains conversation history with session memory
- Answers questions using a custom knowledge base (RAG)
- Escalates to a human agent when confidence is low
- Logs all conversations for analytics
Why Self-Hosted Chatbots Make Sense in 2026
The landscape has shifted. Local LLMs like Llama 3, Mistral, and Qwen 2.5 now handle conversational tasks that required GPT-4 two years ago. Here's the comparison:
| | Cloud Chatbot SaaS | Self-Hosted (n8n + Ollama) |
|---|---|---|
| Monthly cost | $50–300/month | $0 (runs on existing hardware) |
| Per-message cost | $0.002–0.06/message | $0 |
| Data privacy | Conversations sent to cloud | 100% on your servers |
| GDPR compliance | Requires DPA with vendor | Compliant by design |
| Customization | Limited to vendor's features | Full control over everything |
| Vendor lock-in | High (proprietary formats) | None (open source stack) |
When self-hosted works best: Internal company chatbots (IT helpdesk, HR FAQ), customer support for regulated industries, any use case where conversation data is sensitive. If you're processing 1,000+ concurrent conversations and need sub-100ms latency, a cloud solution may still be more practical.
The Architecture
The chatbot uses n8n's Chat Trigger for the web interface, with optional Slack/Telegram integrations. Ollama provides the AI brain, and PostgreSQL (or Redis) handles session memory.
User Message (Web / Slack / Telegram)
        ↓
[Chat Trigger / Webhook] → [Load Session Memory]
        ↓
[Build Context Prompt]
        ↓
[Ollama Chat Model]
        ↓
[Confidence Check]
      /        \
 [High]        [Low]
    ↓             ↓
[Send Response]  [Escalate to Human]
    ↓
[Save to Memory]
    ↓
[Log Conversation]
Step 1: Set Up the Chat Trigger
n8n has a built-in Chat Trigger node that provides a web-based chat widget out of the box. Add one to your canvas:
- Node: Chat Trigger
- Authentication: None (or Basic Auth for internal use)
- Initial messages:
["Hi! I'm your AI assistant. How can I help you today?"]
This gives you a working chat URL at https://your-n8n.com/webhook/chat that you can embed in any website via iframe or link to directly.
Alternative: Slack Integration
To receive messages from Slack, use a Webhook node instead:
// Slack Event Subscription webhook
// Configure at api.slack.com → Event Subscriptions
// Subscribe to: message.channels, message.im
// The webhook receives:
{
  "event": {
    "type": "message",
    "text": "How do I reset my password?",
    "user": "U024BE7LH",
    "channel": "C024BE91L",
    "ts": "1710000000.000100"
  }
}
Alternative: Telegram Integration
Use n8n's built-in Telegram Trigger node:
- Create a bot via @BotFather on Telegram
- Add the Telegram Trigger node with your bot token
- Set trigger on: message
Step 2: Session Memory with PostgreSQL
Chatbots need memory. Without it, every message is treated as a fresh conversation. n8n supports multiple memory backends:
// Use n8n's built-in Memory Buffer Window node
// or PostgreSQL Chat Memory for persistence
// PostgreSQL Chat Memory Configuration:
// Connection: your-postgres-connection
// Session ID: {{ $json.sessionId || $json.event?.user || 'default' }}
// Context Window Length: 10 (keeps last 10 messages)
// Table Name: chatbot_memory
Session ID strategy: Use the user's unique identifier as the session ID. For Slack it's event.user, for Telegram it's message.chat.id, for web chat it's the browser session cookie. This ensures each user gets their own conversation thread.
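The strategy above can be sketched as a small Code node that normalizes the session ID across all three channels. The field names match the payload shapes shown earlier; the channel prefixes are an assumption added here to keep IDs from different channels from colliding.

```javascript
// Sketch of a session ID resolver for a multi-channel chatbot.
// Field names follow the Chat Trigger, Slack, and Telegram payloads above;
// the "web:/slack:/telegram:" prefixes are illustrative, not required by n8n.
function resolveSessionId(payload) {
  // n8n's Chat Trigger provides sessionId directly
  if (payload.sessionId) return `web:${payload.sessionId}`;
  // Slack events carry the user ID in event.user
  if (payload.event && payload.event.user) return `slack:${payload.event.user}`;
  // Telegram updates carry the chat ID in message.chat.id
  if (payload.message && payload.message.chat) return `telegram:${payload.message.chat.id}`;
  return 'default';
}
```

Inside an n8n Code node you would call this on `$json` and pass the result to the memory node's Session ID field.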
Step 3: Build the AI Agent with Ollama
The core of the chatbot is an AI Agent node connected to Ollama. This gives you tool-calling capabilities, not just text generation.
// AI Agent Node Configuration
System Prompt:
"You are a helpful customer support assistant for [YOUR COMPANY].
Rules:
- Answer questions based on the knowledge base provided via tools.
- If you don't know the answer, say so honestly. Do not make up information.
- Keep responses concise (2-3 sentences for simple questions).
- For complex issues that require human intervention (billing disputes,
account deletion, technical bugs), respond with: ESCALATE: [reason]
- Be friendly but professional. No emojis unless the user uses them first.
- If asked about pricing, always refer to the official pricing page.
Company context:
- Product: [Your product description]
- Support hours: Monday-Friday 9am-6pm CET
- Escalation email: support@yourcompany.com"
Connect the Ollama Chat Model sub-node:
- Base URL: http://localhost:11434
- Model: llama3.1:8b (best balance of quality and speed)
- Temperature: 0.3 (consistent, factual responses)
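Under the hood, the Ollama Chat Model node talks to Ollama's `/api/chat` endpoint. If you ever need to debug responses with curl or a custom HTTP Request node, this sketch shows the request body that endpoint expects, using the same model and temperature configured above:

```javascript
// Sketch of the request body for Ollama's /api/chat endpoint.
// n8n's Ollama Chat Model node builds this for you; shown for debugging only.
function buildChatRequest(systemPrompt, history, userMessage) {
  return {
    model: 'llama3.1:8b',
    messages: [
      { role: 'system', content: systemPrompt },
      ...history,                     // prior { role, content } turns from memory
      { role: 'user', content: userMessage },
    ],
    stream: false,                    // one JSON response instead of a chunk stream
    options: { temperature: 0.3 },    // matches the node configuration above
  };
}
```

POST this object to `http://localhost:11434/api/chat` and the reply arrives in `message.content`.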
Step 4: Add a Knowledge Base (RAG)
A chatbot that only uses the LLM's training data will hallucinate about your specific product. Add a Qdrant Vector Store tool so the agent can search your documentation:
4a: Ingest Your Documentation
// Separate ingestion workflow (run once, then on doc updates)
//
// [Manual Trigger] → [Read Files] → [Text Splitter] → [Ollama Embeddings] → [Qdrant Upsert]
//
// Text Splitter settings:
// Chunk size: 500 tokens
// Chunk overlap: 50 tokens
//
// Ollama Embeddings:
// Model: nomic-embed-text
// Base URL: http://localhost:11434
//
// Qdrant:
// URL: http://localhost:6333
// Collection: company_docs
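To make the chunk size and overlap settings concrete, here is a sketch of the sliding-window logic the Text Splitter applies. It uses words as a rough stand-in for tokens (n8n's splitter counts actual tokens), so the numbers won't match exactly:

```javascript
// Sketch of sliding-window chunking with overlap, as in the Text Splitter above.
// Words approximate tokens here; real tokenizers split more finely.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap;   // each advance leaves `overlap` words shared
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break;  // final window reached the end
  }
  return chunks;
}
```

The overlap ensures a sentence that straddles a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that happen to sit at a split point.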
4b: Connect as Agent Tool
// Add a "Qdrant Vector Store" tool to the AI Agent
//
// Tool name: search_knowledge_base
// Description: "Search the company knowledge base for product documentation,
// FAQs, and support articles. Use this for any product-specific
// questions."
//
// Qdrant URL: http://localhost:6333
// Collection: company_docs
// Embedding model: nomic-embed-text (via Ollama)
// Top K: 3 (return top 3 most relevant chunks)
Embedding model tip: Use nomic-embed-text for RAG. It's designed specifically for retrieval tasks and is competitive with much larger general-purpose models on search accuracy. At only 274MB, it loads fast and uses minimal memory alongside your chat model.
Step 5: Confidence Check and Human Escalation
Not every question should be answered by AI. Add a check for low-confidence responses and explicit escalation requests:
// Function Node: Check Confidence
const response = $json.output || $json.text || '';

// Check for explicit escalation
if (response.includes('ESCALATE:')) {
  const reason = response.split('ESCALATE:')[1]?.trim() || 'Complex issue';
  return [{
    json: {
      action: 'escalate',
      reason,
      originalMessage: $('Chat Trigger').item.json.chatInput,
      sessionId: $('Chat Trigger').item.json.sessionId
    }
  }];
}

// Check for hedging language (low confidence)
const hedges = ['I\'m not sure', 'I don\'t have', 'I cannot find', 'beyond my knowledge'];
const isLowConfidence = hedges.some(h => response.toLowerCase().includes(h.toLowerCase()));

if (isLowConfidence) {
  return [{
    json: {
      action: 'escalate',
      reason: 'Low confidence response detected',
      aiResponse: response,
      originalMessage: $('Chat Trigger').item.json.chatInput
    }
  }];
}

return [{ json: { action: 'respond', response } }];
For escalation, send a notification to your support team via Slack, email, or a ticketing system:
// Slack Notification for Escalation
// Channel: #support-escalations
// Message:
"New chatbot escalation:
*User message:* {{ $json.originalMessage }}
*Reason:* {{ $json.reason }}
*Session:* {{ $json.sessionId }}
*AI attempted response:* {{ $json.aiResponse || 'N/A' }}"
Step 6: Conversation Logging
Log every conversation for analytics and training data:
// Function Node: Log Conversation
const timestamp = new Date().toISOString();

const logEntry = {
  timestamp,
  sessionId: $('Chat Trigger').item.json.sessionId || 'unknown',
  userMessage: $('Chat Trigger').item.json.chatInput,
  aiResponse: $json.response || $json.output,
  wasEscalated: $json.action === 'escalate',
  model: 'llama3.1:8b'
};

// Store in PostgreSQL, or append to a file
return [{ json: logEntry }];
Model Selection for Chatbots
The model you choose affects response quality, speed, and memory usage:
| Model | Size | Response Quality | Speed | Best For |
|---|---|---|---|---|
| llama3.1:8b | 4.7 GB | Excellent | Fast | General-purpose chatbot (recommended) |
| mistral:7b | 4.1 GB | Very good | Very fast | Quick responses, simpler conversations |
| qwen2.5:7b | 4.7 GB | Excellent | Fast | Multilingual support, tool calling |
| llama3.1:70b | 40 GB | Outstanding | Slow | Complex reasoning, nuanced responses |
| phi3:3.8b | 2.3 GB | Good | Very fast | Low-resource servers, high throughput |
Recommendation: Start with llama3.1:8b. It handles multi-turn conversations well, follows system prompts reliably, and runs fast on consumer GPUs (RTX 3060+). For servers with only CPU, use phi3:3.8b — it's surprisingly good for its size and responds in 2–5 seconds on modern CPUs.
Adding a Web Chat Widget
n8n's Chat Trigger provides a built-in chat page, but you can also embed it in your website. Add this snippet to any HTML page:
<!-- Minimal chat widget embedding -->
<iframe
src="https://your-n8n.com/webhook/chat"
style="position: fixed; bottom: 20px; right: 20px;
width: 400px; height: 600px; border: none;
border-radius: 12px; box-shadow: 0 4px 24px rgba(0,0,0,0.3);
z-index: 10000;"
></iframe>
<!-- Or use n8n's official chat widget -->
<link href="https://cdn.jsdelivr.net/npm/@n8n/chat/style.css" rel="stylesheet" />
<script type="module">
import { createChat } from 'https://cdn.jsdelivr.net/npm/@n8n/chat/chat.bundle.es.js';
createChat({
webhookUrl: 'https://your-n8n.com/webhook/chatbot',
initialMessages: ['Hi! How can I help you today?'],
theme: { primaryColor: '#4f46e5' }
});
</script>
Production Hardening
Rate Limiting
Prevent abuse by limiting messages per session:
- Add a Function node that checks message count per session in the last hour
- If > 50 messages/hour from one session, return a "slow down" message
- Block sessions that send > 200 messages/hour entirely
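The three rules above can be sketched as a sliding-window limiter keyed by session ID. In n8n you would persist the `buckets` object via workflow static data or Redis; a plain object is used here so the logic stays self-contained:

```javascript
// Sketch of a sliding-window rate limiter per session.
// Limits match the rules above; persistence is left to your deployment.
const WINDOW_MS = 60 * 60 * 1000;   // one hour
const SOFT_LIMIT = 50;              // "slow down" above this
const HARD_LIMIT = 200;             // block entirely above this

function checkRateLimit(buckets, sessionId, now = Date.now()) {
  const cutoff = now - WINDOW_MS;
  // keep only timestamps inside the current one-hour window
  const recent = (buckets[sessionId] || []).filter(ts => ts > cutoff);
  recent.push(now);
  buckets[sessionId] = recent;
  if (recent.length > HARD_LIMIT) return 'block';
  if (recent.length > SOFT_LIMIT) return 'slow_down';
  return 'ok';
}
```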
Input Sanitization
Always sanitize user input before passing to the LLM:
- Strip HTML tags to prevent XSS in logged conversations
- Truncate messages to 2000 characters (prevents context window flooding)
- Filter prompt injection attempts (messages trying to override system prompt)
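A sanitizer covering all three checks might look like this sketch. The injection patterns are illustrative examples, not an exhaustive filter:

```javascript
// Sketch of an input sanitizer for chatbot messages.
// INJECTION_PATTERNS is a small illustrative list, not a complete defense.
const MAX_LENGTH = 2000;
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /reveal .*system prompt/i,
];

function sanitizeInput(raw) {
  // strip HTML tags so logged conversations can't carry XSS payloads
  let text = String(raw).replace(/<[^>]*>/g, '');
  // truncate so one message can't flood the context window
  text = text.slice(0, MAX_LENGTH).trim();
  // flag likely prompt-injection attempts rather than silently passing them on
  const flagged = INJECTION_PATTERNS.some(p => p.test(text));
  return { text, flagged };
}
```

Flagged messages can be routed straight to the escalation path instead of the model.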
Monitoring
- Track average response time — if it exceeds 10 seconds, your Ollama instance may need more resources
- Monitor escalation rate — if > 30% of conversations escalate, improve your knowledge base
- Log token usage per conversation to estimate capacity
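The escalation-rate check above is straightforward to compute from the log entries produced in Step 6. A sketch of a weekly analytics pass, assuming the log shape shown there:

```javascript
// Sketch of a summary pass over logged conversations (log shape from Step 6).
function summarizeLogs(entries) {
  const total = entries.length;
  const escalated = entries.filter(e => e.wasEscalated).length;
  const sessions = new Set(entries.map(e => e.sessionId)).size;
  return {
    total,
    sessions,
    escalationRate: total ? escalated / total : 0,  // review KB if this exceeds 0.3
  };
}
```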
Complete Workflow JSON
Import this into n8n to get the base chatbot running:
{
  "name": "AI Chatbot (Ollama + RAG + Escalation)",
  "nodes": [
    {
      "parameters": {
        "options": {
          "allowedOrigins": "*"
        }
      },
      "id": "chat-trigger",
      "name": "Chat Trigger",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "typeVersion": 1.1,
      "position": [240, 300],
      "webhookId": "chatbot"
    },
    {
      "parameters": {
        "options": {
          "systemMessage": "You are a helpful customer support assistant. Answer questions using the knowledge base tool when available. If you cannot find the answer or the issue requires human intervention, respond with ESCALATE: followed by the reason. Keep responses concise and professional."
        }
      },
      "id": "ai-agent",
      "name": "AI Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "typeVersion": 1.7,
      "position": [480, 300]
    },
    {
      "parameters": {
        "model": "llama3.1:8b",
        "options": {
          "temperature": 0.3
        }
      },
      "id": "ollama-chat",
      "name": "Ollama Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOllama",
      "typeVersion": 1,
      "position": [480, 500]
    },
    {
      "parameters": {
        "contextWindowLength": 10
      },
      "id": "memory",
      "name": "Simple Memory",
      "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
      "typeVersion": 1.2,
      "position": [640, 500]
    },
    {
      "parameters": {
        "toolDescription": "Search the company knowledge base for product documentation, FAQs, and support articles. Use this tool for any product-specific questions.",
        "qdrantCollection": "company_docs",
        "topK": 3
      },
      "id": "qdrant-tool",
      "name": "Knowledge Base (Qdrant)",
      "type": "@n8n/n8n-nodes-langchain.toolVectorStore",
      "typeVersion": 1,
      "position": [360, 500]
    },
    {
      "parameters": {
        "model": "nomic-embed-text"
      },
      "id": "embeddings",
      "name": "Ollama Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
      "typeVersion": 1,
      "position": [360, 680]
    }
  ],
  "connections": {
    "Chat Trigger": {
      "main": [[{ "node": "AI Agent", "type": "main", "index": 0 }]]
    },
    "Ollama Chat Model": {
      "ai_languageModel": [[{ "node": "AI Agent", "type": "ai_languageModel", "index": 0 }]]
    },
    "Simple Memory": {
      "ai_memory": [[{ "node": "AI Agent", "type": "ai_memory", "index": 0 }]]
    },
    "Knowledge Base (Qdrant)": {
      "ai_tool": [[{ "node": "AI Agent", "type": "ai_tool", "index": 0 }]]
    },
    "Ollama Embeddings": {
      "ai_embedding": [[{ "node": "Knowledge Base (Qdrant)", "type": "ai_embedding", "index": 0 }]]
    }
  },
  "settings": { "executionOrder": "v1" },
  "tags": [{ "name": "AI" }, { "name": "Ollama" }, { "name": "Chatbot" }, { "name": "RAG" }, { "name": "Customer Support" }]
}
Next Steps
Once your base chatbot is running:
- Populate the knowledge base — Ingest your docs, FAQs, and support articles into Qdrant
- Add channel integrations — Connect Slack and/or Telegram using n8n's built-in nodes
- Set up escalation routing — Connect the escalation path to your ticketing system (Jira, Linear, email)
- Monitor and improve — Review logged conversations weekly. Add common questions to the knowledge base. Fine-tune the system prompt based on real interactions.
A self-hosted chatbot with n8n + Ollama gives you full control over your AI assistant — the conversation data, the model behavior, the integration points, and the cost. Start with the template above and customize it for your specific use case.
Want 11 Production-Ready AI Workflows?
The Self-Hosted AI Workflow Pack includes a chatbot template, email automation, document processing, lead scoring, and 7 more n8n + Ollama workflows. One payment, unlimited runs, zero API costs.
Get the Full Pack — $39