---
title: "Build a RAG Knowledge-Base Chatbot for Support in n8n"
description: "Step-by-step: build a RAG chatbot in n8n that indexes your docs into a vector store and answers questions with citations. AI credits included."
canonical: https://agentroost.app/en/blog/rag-knowledge-base-chatbot-n8n
date: 2026-06-02T20:00:00Z
---


# Build a RAG Knowledge-Base Chatbot for Support in n8n

Your support docs exist. Your users can't find the answers in them. The gap between "we wrote it down" and "the customer got helped" is exactly what a retrieval-augmented generation (RAG) chatbot closes.

This guide walks you through building one end-to-end inside n8n — from indexing your help articles into a vector store to wiring a chat interface that returns grounded answers with source citations. No Python environment to stand up, no separate embedding server to babysit.

---

## What RAG Actually Does (and Why It Beats a Fine-Tuned Model)

RAG splits the problem in two:

1. **Retrieval** — at query time, search a vector store for the chunks of your documentation most semantically similar to the user's question.
2. **Generation** — pass those chunks to an LLM as context and ask it to answer using only what's there.

The result: the model is far less likely to hallucinate, because its answer is anchored to the docs you feed it rather than to its training data. When you update an article, re-index it and the chatbot's answers update too: no retraining, no fine-tuning job, no model redeploy.
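
To make the retrieval half concrete, here is a toy sketch of what a vector store does at query time: score every stored chunk against the query vector and keep the top K. The three-dimensional vectors and file names are invented for illustration; real embeddings have hundreds or thousands of dimensions, and Qdrant uses an index rather than a full scan.

```js
// Toy retrieval step: rank stored chunks by cosine similarity to a query vector.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, sort best-first, keep the k highest.
function topK(queryVector, chunks, k) {
  return chunks
    .map(c => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up vectors standing in for real embeddings
const chunks = [
  { source: 'export.md',  vector: [0.9, 0.1, 0.0] },
  { source: 'billing.md', vector: [0.0, 1.0, 0.2] },
  { source: 'import.md',  vector: [0.7, 0.3, 0.1] },
];
const results = topK([1, 0, 0], chunks, 2);
console.log(results.map(r => r.source)); // [ 'export.md', 'import.md' ]
```

The generation half then only ever sees those top-K chunks, which is why retrieval quality matters more than prompt wording.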

---

## What You Need Before Starting

- Your support documentation in any crawlable or exportable form: a Notion export, a folder of Markdown files, a Zendesk article list, a sitemap you can walk with HTTP Request nodes.
- A vector store. n8n has built-in integrations for **Pinecone**, **Qdrant**, **Supabase pgvector**, and **in-memory** (good for prototyping). This guide uses **Qdrant** (self-hostable, with a free cloud tier for getting started).
- An n8n instance where the AI/LLM nodes already have credits wired in — more on that below.

---

## Part 1 — The Indexing Workflow

This workflow runs once (then on a schedule or webhook trigger whenever docs change) and populates your vector store.

### Step 1 — Trigger

Use a **Manual Trigger** to start. Once it's working you'll swap it for a **Schedule Trigger** (`0 2 * * *` = 2 AM daily) or a **Webhook** node fired by your CMS on publish.

### Step 2 — Fetch Your Docs

If your docs are Markdown files in a GitHub repo:

```
HTTP Request
  Method: GET
  URL: https://api.github.com/repos/YOUR_ORG/docs/git/trees/main?recursive=1
  Authentication: GitHub (Personal Access Token)
```

Parse the tree with a **Code** node (JavaScript):

```js
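// The tree lists directories too (type "tree"); keep only Markdown files.
return $input.first().json.tree
  .filter(f => f.type === 'blob' && f.path.endsWith('.md'))
  // f.url is the blob API URL; its response carries the file content base64-encoded
  .map(f => ({ json: { path: f.path, url: f.url } }));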
```

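Then a second **HTTP Request** (looped via **Split In Batches**, called **Loop Over Items** in recent n8n versions, batch size 5) fetches each file's blob from `url`. The blob API returns the content base64-encoded, so decode it in a **Code** node and store the result as `content` next to `path`.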
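
The decoding step could look roughly like this. It is a minimal sketch that assumes the response has the `content` and `encoding` fields GitHub's blob endpoint returns, and that `Buffer` is available in your Code node; `decodeBlob` is a helper name made up for this example.

```js
// Decode a GitHub blob API response into plain text.
// GitHub wraps the base64 string with newlines; Buffer ignores that whitespace.
function decodeBlob(blob) {
  if (blob.encoding !== 'base64') {
    throw new Error(`Unexpected encoding: ${blob.encoding}`);
  }
  return Buffer.from(blob.content, 'base64').toString('utf8');
}

// Hand-made response body that mimics the line-wrapped base64 GitHub sends
const encoded = Buffer.from('# Exporting data\n', 'utf8').toString('base64');
const sample = {
  encoding: 'base64',
  content: encoded.slice(0, 8) + '\n' + encoded.slice(8) + '\n',
};
console.log(decodeBlob(sample));

// In an n8n Code node (all-items mode) the same helper would be used like:
// return $input.all().map(i => ({ json: { path: i.json.path, content: decodeBlob(i.json) } }));
```

Note that the blob response does not include the file path, so carry `path` over from the tree item if your loop doesn't already preserve it.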

If your docs are on a web subdomain, use the **HTTP Request** node to fetch the sitemap XML, parse `<loc>` tags with a **Code** node, then loop-fetch each page's HTML and strip tags.
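
For the sitemap route, the two Code-node helpers might be sketched as below. Both are deliberately naive regex-based versions (the function names and example URLs are assumptions for illustration): they don't decode HTML entities and won't cope with every malformed page, so treat them as a starting point rather than a robust parser.

```js
// Pull every <loc> URL out of a sitemap XML string.
function extractLocs(sitemapXml) {
  const urls = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match;
  while ((match = re.exec(sitemapXml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Reduce an HTML page to plain text: drop script/style bodies,
// replace remaining tags with spaces, then collapse whitespace.
function stripHtml(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical sitemap and page, just to exercise the helpers
const sitemap =
  '<urlset><url><loc>https://docs.example.com/a</loc></url>' +
  '<url><loc> https://docs.example.com/b </loc></url></urlset>';
console.log(extractLocs(sitemap));
console.log(stripHtml('<h1>Export</h1><script>var x = 1;</script><p>Go to  Settings.</p>'));
```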

### Step 3 — Chunk the Text

LLMs have context windows; vector stores work better with smaller, focused chunks. Add a **Code** node:

```js
const chunkSize = 500;   // characters
const overlap   = 100;   // characters shared between neighbouring chunks
const chunks = [];
// Process every fetched document, not just the first item
for (const item of $input.all()) {
  const { content: text, path } = item.json;
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push({
      json: {
        text: text.slice(i, i + chunkSize),
        source: path,
      }
    });
  }
}
return chunks;
```

Chunk size 400–600 characters with 10–20% overlap is a practical default. Smaller chunks = more precise retrieval; larger = more context per chunk. Tune after testing.
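
When tuning, it helps to know how many chunks (and therefore embedding calls) a setting produces. The sketch below works that out for the sliding-window scheme used above; `estimateChunks` is a helper name invented for this example.

```js
// How many chunks does a sliding window of `chunkSize` with `overlap` produce?
// The window advances by (chunkSize - overlap) characters each step.
function estimateChunks(textLength, chunkSize, overlap) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  return {
    step,
    chunks: Math.ceil(textLength / step),
    overlapPct: (overlap * 100) / chunkSize,
  };
}

// A 2,000-character article with the defaults from the Code node above
console.log(estimateChunks(2000, 500, 100)); // { step: 400, chunks: 5, overlapPct: 20 }
```

So the defaults sit at the top of the 10–20% overlap range, and halving the chunk size roughly doubles the number of vectors you store.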

### Step 4 — Embed + Upsert

Add the **Embeddings** node. It is an AI sub-node, so it attaches to the vector store node described next rather than sitting in the main flow:

- **Model**: `text-embedding-3-small` (OpenAI) or equivalent — available through included credits.
- **Input field**: `text`

Then add the **Qdrant Vector Store** node in **Insert** mode:

```
Collection: support-docs
ID field: auto (n8n generates a UUID per chunk)
Metadata fields: source
```

Run the workflow. A few hundred support articles will typically index in under two minutes.

---

## Part 2 — The Chat Workflow

This is what runs every time a user asks a question.

### Step 1 — Receive the Question

Use a **Webhook** node (POST, any path, e.g. `/support-chat`). The payload should include `{ "question": "..." }`. On a hosted instance, n8n gives you a public HTTPS URL for this webhook automatically, with no reverse-proxy config needed; on a self-hosted one you have to expose it yourself.

```json
{
  "question": "How do I export my data?"
}
```
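
Because the webhook is public, it is worth rejecting empty or oversized questions before spending credits on embeddings. A minimal validation sketch for a Code node placed right after the Webhook follows; `parseQuestion` and the 1,000-character cap are assumptions, not n8n settings.

```js
// Validate the incoming webhook body before it reaches the embedding step.
function parseQuestion(body, maxLength = 1000) {
  const question = typeof body?.question === 'string' ? body.question.trim() : '';
  if (!question) {
    return { ok: false, error: 'Missing "question" field' };
  }
  if (question.length > maxLength) {
    return { ok: false, error: `Question exceeds ${maxLength} characters` };
  }
  return { ok: true, question };
}

console.log(parseQuestion({ question: '  How do I export my data? ' }));
console.log(parseQuestion({}));

// In an n8n Code node this might be called as:
// return [{ json: parseQuestion($input.first().json.body) }];
```

An **IF** node on `ok` can then route invalid requests straight to an error response.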

### Step 2 — Embed the Query

Add the same **Embeddings** node as above, this time with the incoming `question` field as input. This converts the user's question into a vector so you can find similar doc chunks.

### Step 3 — Retrieve Relevant Chunks

Add the **Qdrant Vector Store** node in **Retrieve** mode:

```
Collection: support-docs
Top K: 4
```

This returns the 4 most relevant chunks from your docs. Set **Metadata filter** if you have multiple products in the same collection and need to scope results.
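
A metadata filter is a Qdrant filter object. The sketch below builds one for a hypothetical `product` metadata field (this guide only stores `source`, so you would have to add `product` at indexing time). It also assumes chunk metadata is nested under a `metadata` key in the Qdrant payload, which is worth confirming by inspecting a stored point.

```js
// Build a Qdrant filter that keeps only chunks for one product.
// `product` is a hypothetical metadata field; `metadata.` is the assumed payload prefix.
function buildProductFilter(product) {
  return {
    must: [
      { key: 'metadata.product', match: { value: product } },
    ],
  };
}

// Paste the resulting JSON into the node's filter field
console.log(JSON.stringify(buildProductFilter('billing'), null, 2));
```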

### Step 4 — Build the Prompt

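Two **Set** nodes in sequence assemble the prompt, because a field can't reference another field defined in the same node. The first builds `context` (turn on **Execute Once** in its settings so it runs a single time rather than once per retrieved chunk); the second builds `prompt`. The expressions assume the retrieval node is named `Qdrant` and emits items shaped `{ document: { pageContent, metadata } }`; check its output panel and adjust the paths if yours differ:

```
context = {{ $('Qdrant').all().map(i => `[${i.json.document.metadata.source}]\n${i.json.document.pageContent}`).join("\n\n") }}
prompt  = You are a support assistant. Answer ONLY using the context below.
          If the answer isn't in the context, say "I don't know."

          Context:
          {{ $json.context }}

          Question: {{ $('Webhook').first().json.body.question }}
```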
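
If you would rather build the prompt in one **Code** node, the same assembly can be sketched as a plain function. The chunk shape (`pageContent` plus `metadata.source`) and the `buildPrompt` name are assumptions for illustration; map your retrieval node's output into that shape first.

```js
// Assemble the grounded prompt from retrieved chunks and the user's question.
function buildPrompt(chunks, question) {
  // Label each chunk with its source so the model can cite it
  const context = chunks
    .map(c => `[${c.metadata.source}]\n${c.pageContent}`)
    .join('\n\n');
  return [
    'You are a support assistant. Answer ONLY using the context below.',
    'If the answer isn\'t in the context, say "I don\'t know."',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

// Invented chunks standing in for real retrieval results
const retrieved = [
  { pageContent: 'Go to Settings > Data > Export.', metadata: { source: 'export.md' } },
  { pageContent: 'Exports are delivered as CSV.',   metadata: { source: 'formats.md' } },
];
console.log(buildPrompt(retrieved, 'How do I export my data?'));
```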

### Step 5 — Generate the Answer

Add the **AI/LLM** node (Chat Model sub-node):

- **Model**: `gpt-4o-mini` or whichever model fits your needs — you can switch models any time from the same node. `gpt-4o-mini` is fast and cost-efficient for support answers.
- **Messages**: System message with the assembled prompt above.
- Enable **Return full response** if you want token counts.

### Step 6 — Respond

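Add a **Respond to Webhook** node. Wrapping each expression in `JSON.stringify` keeps quotes and newlines in the answer from breaking the JSON body. The `sources` expression assumes the retrieval node is named `Qdrant` and emits items shaped `{ document: { pageContent, metadata } }`:

```json
{
  "answer": {{ JSON.stringify($json.text) }},
  "sources": {{ JSON.stringify($('Qdrant').all().map(d => d.json.document.metadata.source).join(', ')) }}
}
```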

Your chatbot now returns grounded answers with source attribution.
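
With Top K set to 4, several chunks often come from the same article, so the raw source list repeats itself. A small de-duplication step, sketched here under the same assumed chunk shape (`metadata.source`) and with a made-up `buildResponse` helper, keeps the citation list clean.

```js
// Build the webhook response, listing each source article only once.
function buildResponse(answerText, chunks) {
  // A Set preserves first-seen order, so citations stay ranked by relevance
  const sources = [...new Set(chunks.map(c => c.metadata.source))];
  return { answer: answerText, sources: sources.join(', ') };
}

// Invented retrieval results: two chunks share a source
const hits = [
  { metadata: { source: 'export.md' } },
  { metadata: { source: 'export.md' } },
  { metadata: { source: 'account.md' } },
];
console.log(buildResponse('Go to Settings > Data > Export.', hits));
```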

---

## Connecting It to Your Product

The webhook URL is the integration point. Drop it anywhere:

- **Intercom / Crisp / Freshdesk**: paste the webhook URL into their "AI bot" or "custom bot" settings and map the `answer` field.
- **Slack**: a second n8n workflow listens for `app_mention` events and calls the chat webhook internally.
- **Your own frontend**: a `<form>` or React component posts to the webhook and renders `answer` + `sources`.
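
For the custom-frontend route, the call is an ordinary JSON POST. The sketch below assumes a runtime with a global `fetch` (browsers, Node 18+) and uses a placeholder webhook URL; `buildChatRequest` and `askSupport` are names invented for this example.

```js
// Placeholder URL: substitute the production URL shown on your Webhook node.
const WEBHOOK_URL = 'https://n8n.example.com/webhook/support-chat';

// Build the fetch options for a chat request (kept separate so it is easy to test).
function buildChatRequest(question) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  };
}

// Send the question and return { answer, sources } from the workflow.
async function askSupport(question, url = WEBHOOK_URL) {
  const response = await fetch(url, buildChatRequest(question));
  if (!response.ok) {
    throw new Error(`Chat webhook returned HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (not executed here because it needs a live webhook):
// askSupport('How do I export my data?').then(({ answer, sources }) => {
//   console.log(answer, sources);
// });
```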

---

## How to Run This on AgentRoost

Building this on your laptop is fine for testing. Running it in production means keeping the n8n process alive, keeping Qdrant reachable, and making sure the embedding + LLM API calls actually have credits to run on.

On [AgentRoost](/en/agents/n8n):

1. **Sign up** at agentroost.app — email/password, Google, Microsoft, or Discord.
2. **Pick the n8n framework** and give your instance a name.
3. Your private n8n editor opens at `https://<your-id>.agentroost.app` within about two minutes.
4. **The AI nodes already have credits loaded** — no OpenAI key to paste, no Pinecone billing to set up separately. Embeddings and the chat model call the same included credit pool.
5. Import or build the two workflows above, set your Qdrant connection once, and run the indexing workflow.

The webhooks get a permanent public HTTPS URL from day one. Your data and workflows stay on your instance, and that instance is yours, not ours.

Pricing starts at **$19.99/mo all-in** — that covers the server, the AI credits, and everything between. 14-day money-back guarantee, cancel anytime.

[Compare plans and get started](/en/pricing)

---

## Tips and Pitfalls

- **Chunk metadata matters.** Always store `source` (the article URL or filename) alongside the chunk. Without it you can't give citations, and you can't surgically re-index one article when it changes.
- **Test retrieval before wiring the LLM.** Run the Qdrant Retrieve step with a real question and inspect the returned chunks. If the top-4 chunks are irrelevant, your chunk size is wrong or your collection needs a better embedding model — not a better prompt.
- **Prompt the model to say "I don't know."** This is the most important guardrail. Without it, the LLM will interpolate from its training data and give a confident wrong answer that contradicts your docs.
- **Re-index on publish, not on a cron.** Hook into your CMS's publish webhook so the vector store updates within seconds, not overnight.
- **Watch for duplicate chunks.** If you re-run indexing without deleting old vectors, you'll accumulate stale copies. Either delete the collection before re-indexing, or upsert using a deterministic ID (e.g. a hash of `source + chunk_index`, formatted as a UUID because Qdrant point IDs must be unsigned integers or UUIDs).
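
One way to derive such a deterministic ID is to hash the source and chunk index and format the digest as a UUID-shaped string. This is a sketch, not the n8n node's built-in behavior: `chunkId` is a made-up helper, and on self-hosted n8n the `crypto` module may need to be allowed for Code nodes.

```js
const crypto = require('crypto');

// Derive a stable, UUID-shaped ID from the article path and chunk position.
// The same (source, chunkIndex) pair always yields the same ID, so re-indexing
// overwrites the old vector instead of adding a duplicate.
function chunkId(source, chunkIndex) {
  const hex = crypto
    .createHash('sha256')
    .update(`${source}#${chunkIndex}`)
    .digest('hex');
  // Use the first 128 bits of the digest, grouped 8-4-4-4-12 like a UUID
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join('-');
}

console.log(chunkId('guides/export.md', 0));
console.log(chunkId('guides/export.md', 1));
```

If an article shrinks, chunks with higher indexes from the old version will linger, so pair this with a delete-by-`source` step before re-inserting.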
