Automate Content Moderation with n8n + Ollama (AI Comment Filter)
Content moderation is one of the most expensive and mentally draining tasks for any online community or platform. Cloud moderation APIs charge $1–3 per 1,000 requests and send every piece of user content to a third party. With n8n and Ollama, you can build a fully automated moderation pipeline that runs on your own hardware — zero per-request costs, unlimited volume, and complete data privacy.
In this tutorial you'll build a workflow that:
- Receives user-generated content (comments, posts, reviews) via webhook
- Classifies each piece of content into safe, toxic, spam, inappropriate, or needs review
- Auto-approves safe content and auto-rejects obvious spam/toxic content
- Routes borderline content to a human review queue
- Logs every moderation decision for audit compliance
Why Local AI for Content Moderation?
| Feature | Cloud APIs (Perspective, OpenAI) | Self-Hosted (n8n + Ollama) |
|---|---|---|
| Cost per 1K requests | $1–3 | $0 (your hardware) |
| Data privacy | Content sent to third party | Never leaves your server |
| Custom rules | Limited to provider's categories | Fully customizable prompt |
| Latency | 200–500ms (network + inference) | 100–300ms (local inference only) |
| Rate limits | Yes (often 1–10 QPS) | No limits beyond your hardware |
| GDPR compliance | Requires DPA with provider | Inherent — data stays local |
Prerequisites
- n8n — self-hosted instance (Docker or npm install)
- Ollama — running locally with a model pulled (we'll use mistral:7b for speed or qwen2.5:14b for accuracy)
- ~4GB RAM for the 7B model, ~8GB for the 14B model
Architecture Overview
User Content (webhook)
↓
[Preprocessing] — Strip HTML, normalize text, extract metadata
↓
[Ollama Classification] — Classify: safe / toxic / spam / inappropriate / needs_review
↓
[Decision Router] — Switch node based on classification
↓ ↓ ↓
[Auto-Approve] [Auto-Reject] [Human Review Queue]
↓ ↓ ↓
[Callback API] [Notify User] [Slack/Email Alert]
↓ ↓ ↓
[Audit Log] [Audit Log] [Audit Log]
Step-by-Step Build
Receive content to moderate
Add a Webhook node as the trigger. Your application sends a POST request whenever new content is submitted:
POST /webhook/moderate
Content-Type: application/json
{
"content_id": "comment_12345",
"text": "This is the user's comment text",
"author_id": "user_789",
"content_type": "comment",
"context": "product-review"
}
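From the application side, the submission can be sketched in a few lines of Node.js. The helper names and the webhook URL below are illustrative, not part of n8n; the payload fields match the example above.

```javascript
// Hypothetical helper that builds the moderation payload your app
// would POST to the n8n webhook. Field names match the example above.
function buildModerationPayload(contentId, text, authorId, contentType, context) {
  return {
    content_id: contentId,
    text: text,
    author_id: authorId,
    content_type: contentType,
    context: context
  };
}

// Example: POST with the built-in fetch (Node 18+); the URL is a placeholder
// for wherever your n8n instance exposes the webhook.
async function sendForModeration(payload) {
  const res = await fetch('http://localhost:5678/webhook/moderate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
  return res.json();
}
```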
Clean and normalize input
Add a Code node to strip HTML tags, normalize whitespace, and extract metadata that helps with classification:
// Strip HTML and normalize
const text = $input.item.json.text
.replace(/<[^>]*>/g, '')
.replace(/\s+/g, ' ')
.trim();
// Extract basic signals
const hasUrls = /https?:\/\/\S+/i.test(text);
const hasExcessiveCaps = text.length > 0 && (text.replace(/[^A-Z]/g, '').length / text.length) > 0.5;
const wordCount = text === '' ? 0 : text.split(/\s+/).length;
return {
json: {
...$input.item.json,
cleaned_text: text,
signals: { hasUrls, hasExcessiveCaps, wordCount }
}
};
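Outside n8n, the same cleaning and signal logic can be exercised directly, which is handy for quick testing. A minimal sketch (the preprocess function name is illustrative):

```javascript
// Standalone version of the Code node's cleaning/signal logic.
function preprocess(rawText) {
  const text = rawText
    .replace(/<[^>]*>/g, '')   // strip HTML tags
    .replace(/\s+/g, ' ')      // collapse whitespace
    .trim();
  const hasUrls = /https?:\/\/\S+/i.test(text);
  const capsRatio = text.length > 0
    ? text.replace(/[^A-Z]/g, '').length / text.length
    : 0;
  return {
    cleaned_text: text,
    signals: {
      hasUrls,
      hasExcessiveCaps: capsRatio > 0.5,
      wordCount: text === '' ? 0 : text.split(/\s+/).length
    }
  };
}
```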
Classify content with local AI
Add an HTTP Request node to call your local Ollama instance. The prompt is the most critical part — it defines your moderation policy:
POST http://localhost:11434/api/generate
{
"model": "mistral:7b",
"prompt": "You are a content moderator. Classify the following user-generated content into exactly one category.\n\nCategories:\n- SAFE: Normal, constructive content. Opinions, questions, feedback are fine.\n- TOXIC: Hate speech, threats, severe insults, harassment, discrimination.\n- SPAM: Promotional content, repeated text, SEO spam, phishing links.\n- INAPPROPRIATE: Sexual content, graphic violence, personal information exposure.\n- NEEDS_REVIEW: Borderline content that could go either way. Sarcasm, dark humor, heated but not hateful debate.\n\nContent to classify:\n\"{{cleaned_text}}\"\n\nAdditional signals:\n- Contains URLs: {{signals.hasUrls}}\n- Excessive capitals: {{signals.hasExcessiveCaps}}\n- Word count: {{signals.wordCount}}\n\nRespond with ONLY a JSON object:\n{\"category\": \"SAFE|TOXIC|SPAM|INAPPROPRIATE|NEEDS_REVIEW\", \"confidence\": 0.0-1.0, \"reason\": \"brief explanation\"}",
"stream": false,
"options": {
"temperature": 0.1,
"num_predict": 100
}
}
Setting temperature: 0.1 keeps the output close to deterministic. The num_predict: 100 limit caps the response length so the model can't ramble; moderation decisions should be short.
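The same request body can be assembled programmatically, which keeps the signals interpolation out of the raw JSON. A sketch (buildClassifyRequest is an illustrative name; the /api/generate endpoint, stream, and options fields are standard Ollama, and the category definitions are abbreviated here):

```javascript
// Build the Ollama /api/generate request body with signals interpolated.
function buildClassifyRequest(cleanedText, signals, model = 'mistral:7b') {
  const prompt = [
    'You are a content moderator. Classify the following user-generated content into exactly one category.',
    '',
    'Categories:',
    '- SAFE: Normal, constructive content.',
    '- TOXIC: Hate speech, threats, severe insults, harassment.',
    '- SPAM: Promotional content, repeated text, phishing links.',
    '- INAPPROPRIATE: Sexual content, graphic violence, personal information exposure.',
    '- NEEDS_REVIEW: Borderline content that could go either way.',
    '',
    `Content to classify:\n"${cleanedText}"`,
    '',
    `Additional signals:\n- Contains URLs: ${signals.hasUrls}\n- Excessive capitals: ${signals.hasExcessiveCaps}\n- Word count: ${signals.wordCount}`,
    '',
    'Respond with ONLY a JSON object:',
    '{"category": "SAFE|TOXIC|SPAM|INAPPROPRIATE|NEEDS_REVIEW", "confidence": 0.0-1.0, "reason": "brief explanation"}'
  ].join('\n');

  return {
    model,
    prompt,
    stream: false,
    options: { temperature: 0.1, num_predict: 100 }
  };
}
```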
Extract the classification
Add a Code node to parse Ollama's JSON response with error handling:
const response = $input.item.json.response;
// Extract JSON from response (model sometimes wraps in markdown)
const jsonMatch = response.match(/\{[\s\S]*?\}/);
let classification;
try {
classification = JSON.parse(jsonMatch[0]);
} catch (e) {
// Fallback: if JSON parsing fails, send to human review
classification = {
category: "NEEDS_REVIEW",
confidence: 0,
reason: "Failed to parse AI response"
};
}
// Normalize category
const validCategories = ["SAFE", "TOXIC", "SPAM", "INAPPROPRIATE", "NEEDS_REVIEW"];
if (!validCategories.includes(classification.category)) {
classification.category = "NEEDS_REVIEW";
}
return {
json: {
...$input.item.json,
moderation: classification
}
};
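Isolated as a plain function, the parsing logic is easy to test against the two common failure modes: the model wrapping its JSON in markdown fences, and the model returning garbage. A sketch (parseClassification is an illustrative name):

```javascript
const VALID_CATEGORIES = ['SAFE', 'TOXIC', 'SPAM', 'INAPPROPRIATE', 'NEEDS_REVIEW'];

// Parse raw model output into a safe classification object.
// Anything unparseable falls back to human review.
function parseClassification(raw) {
  const fallback = {
    category: 'NEEDS_REVIEW',
    confidence: 0,
    reason: 'Failed to parse AI response'
  };
  const match = (raw || '').match(/\{[\s\S]*?\}/);
  if (!match) return fallback;
  try {
    const parsed = JSON.parse(match[0]);
    if (!VALID_CATEGORIES.includes(parsed.category)) {
      parsed.category = 'NEEDS_REVIEW';
    }
    return parsed;
  } catch (e) {
    return fallback;
  }
}
```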
Route based on classification
Add a Switch node that routes content based on the category:
- SAFE → Auto-approve (callback to your API)
- TOXIC / SPAM → Auto-reject (callback + notify user)
- INAPPROPRIATE → Auto-reject with specific reason
- NEEDS_REVIEW → Human review queue (Slack/email alert)
Switch condition: {{ $json.moderation.category }}
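The Switch node's branching can be expressed as a plain function, which mirrors the four routes listed above (the action strings are illustrative labels, not n8n constants):

```javascript
// Map a classification category to a pipeline action,
// mirroring the Switch node's four outputs.
function routeDecision(category) {
  switch (category) {
    case 'SAFE':          return 'approve';
    case 'TOXIC':
    case 'SPAM':          return 'reject';
    case 'INAPPROPRIATE': return 'reject_with_reason';
    default:              return 'review'; // NEEDS_REVIEW and anything unexpected
  }
}
```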
Act on the decision
For auto-approve, send a callback to your application:
POST https://your-app.com/api/moderation/callback
{
"content_id": "{{content_id}}",
"action": "approve",
"category": "{{moderation.category}}",
"confidence": {{moderation.confidence}}
}
For human review, send a Slack message with the content and one-click approve/reject buttons:
{
"text": "Content needs review",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Content ID:* {{content_id}}\n*Author:* {{author_id}}\n*AI Classification:* {{moderation.category}} ({{moderation.confidence}})\n*Reason:* {{moderation.reason}}\n\n> {{cleaned_text}}"
}
}
]
}
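Building that payload in a Code node before the Slack node keeps the templating in one place. A sketch (buildReviewAlert is an illustrative name; the block layout follows Slack's Block Kit section block):

```javascript
// Build the Slack Block Kit message for the human review queue.
function buildReviewAlert(item) {
  const m = item.moderation;
  return {
    text: 'Content needs review',
    blocks: [{
      type: 'section',
      text: {
        type: 'mrkdwn',
        text: `*Content ID:* ${item.content_id}\n*Author:* ${item.author_id}\n` +
              `*AI Classification:* ${m.category} (${m.confidence})\n` +
              `*Reason:* ${m.reason}\n\n> ${item.cleaned_text}`
      }
    }]
  };
}
```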
Log every decision for compliance
Every moderation action must be logged. Add a final Spreadsheet/Database node that records:
{
"timestamp": "{{$now.toISO()}}",
"content_id": "{{content_id}}",
"author_id": "{{author_id}}",
"category": "{{moderation.category}}",
"confidence": {{moderation.confidence}},
"reason": "{{moderation.reason}}",
"action_taken": "approve|reject|review",
"reviewed_by": "ai"
}
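As a plain function, the log entry can be assembled like this (buildAuditRecord is an illustrative name; the fields match the record above):

```javascript
// Assemble one audit-log row for a moderation decision.
function buildAuditRecord(item, actionTaken) {
  return {
    timestamp: new Date().toISOString(),
    content_id: item.content_id,
    author_id: item.author_id,
    category: item.moderation.category,
    confidence: item.moderation.confidence,
    reason: item.moderation.reason,
    action_taken: actionTaken, // 'approve' | 'reject' | 'review'
    reviewed_by: 'ai'
  };
}
```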
This audit trail is essential for GDPR compliance (users can request why their content was rejected) and for improving your moderation rules over time.
Tuning Your Moderation Policy
Confidence thresholds
Don't trust the AI blindly. Add confidence-based routing:
- High confidence (>0.85) — Auto-action (approve or reject)
- Medium confidence (0.5–0.85) — Send to human review regardless of category
- Low confidence (<0.5) — Always send to human review
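Layered on top of the category, the thresholds reduce to a small helper (a sketch using the values from the list above; since both medium and low confidence route to review, the check collapses to a single cutoff):

```javascript
// Apply confidence thresholds on top of the category decision.
// High confidence: trust the AI's category. Otherwise: escalate to a human.
function applyConfidence(category, confidence) {
  if (confidence > 0.85) return category; // auto-action
  return 'NEEDS_REVIEW';                  // medium and low both go to review
}
```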
Custom rules per context
Different content types need different policies. A product review has different standards than a children's forum. Adjust the prompt per context field:
// In the preprocessing Code node:
const prompts = {
"product-review": "Allow negative opinions but flag personal attacks...",
"kids-forum": "Strict moderation. Flag anything remotely inappropriate...",
"internal-chat": "Relaxed moderation. Only flag clearly toxic content..."
};
const moderationPrompt = prompts[$input.item.json.context] || prompts["product-review"];
Model selection
| Model | Speed | Accuracy | Best for |
|---|---|---|---|
| mistral:7b | Fast (~200ms) | Good | High-volume, clear-cut content |
| qwen2.5:14b | Medium (~500ms) | Very good | Nuanced content, sarcasm detection |
| llama3.1:8b | Fast (~250ms) | Good | General purpose, good multilingual |
| qwen2.5:32b | Slow (~1.5s) | Excellent | Second-pass review of borderline content |
Scaling to Production
Batch processing
For platforms with high volume, modify the webhook to accept arrays of content items and process them in parallel using n8n's Split In Batches node. A single 7B model on a modern GPU can handle 10–20 classifications per second.
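Outside a workflow, the batching idea is just fixed-size chunking before classification; the Split In Batches node does the equivalent inside n8n. A sketch (chunk is an illustrative helper):

```javascript
// Split an array of content items into batches of a fixed size.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```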
Feedback loop
When human reviewers override AI decisions, log the correction. Periodically review these overrides to improve your prompts:
- If the AI flags too many safe comments as toxic → adjust the TOXIC definition in your prompt
- If spam is getting through → add more spam indicators to the signals
- If a specific topic keeps getting flagged incorrectly → add it as an exception in the prompt
Multi-language support
Ollama models like qwen2.5 and llama3.1 support 20+ languages natively. Add a language detection step before classification, and use language-specific prompt variants for better accuracy.
Common Use Cases
- Forum/community moderation — Auto-moderate user posts and comments
- Product review filtering — Filter fake reviews, spam, and inappropriate content
- Support ticket triage — Flag abusive customer messages for manager review
- Internal tool protection — Moderate AI chatbot outputs before they reach users
- Social media management — Pre-screen comments on your brand's social accounts
- E-commerce listing moderation — Verify product descriptions meet platform guidelines
Want the complete workflow + 10 more AI templates?
Get the full Self-Hosted AI Workflow Pack for n8n + Ollama. 11 production-ready templates including content moderation, email automation, lead scoring, and more.
Get the Pack — $39
Summary
Content moderation is a perfect use case for local AI: it's high-volume, privacy-sensitive, and the classification task is well-suited for smaller language models. By running it on your own hardware with n8n + Ollama, you eliminate per-request costs, keep user data private, and get full control over your moderation policy.
The two-pass architecture (fast model for initial screening + accurate model for edge cases) gives you the best of both worlds: speed for the 80% of content that's clearly safe or clearly bad, and accuracy for the 20% that needs careful judgment.
Start with the basic pipeline, then iterate on your prompts based on real moderation data. The audit log is your best friend — it shows you exactly where the AI gets it wrong so you can fix it.