Build a RAG Knowledge-Base Chatbot for Support in n8n

AgentRoost · June 2, 2026 · 7 min read

Your support docs exist. Your users can't find the answers in them. The gap between "we wrote it down" and "the customer got helped" is exactly what a retrieval-augmented generation (RAG) chatbot closes.

This guide walks you through building one end-to-end inside n8n — from indexing your help articles into a vector store to wiring a chat interface that returns grounded answers with source citations. No Python environment to stand up, no separate embedding server to babysit.

What RAG Actually Does (and Why It Beats a Fine-Tuned Model)

RAG splits the problem in two:

Retrieval — at query time, search a vector store for the chunks of your documentation most semantically similar to the user's question.
Generation — pass those chunks to an LLM as context and ask it to answer using only what's there.

The result: the model is far less likely to hallucinate, because its answer is built from your docs rather than from whatever it remembers from training. When you update an article, re-index it and the chatbot's answers update too. No retraining, no fine-tuning job, no model redeploy.
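To make the retrieval half concrete, here is a toy sketch of what a vector store does at query time. The three-number vectors and the sample sentences are made up for illustration; a real embedding model returns far longer vectors, and Qdrant uses an index instead of scoring every chunk.

```javascript
// Toy 3-dimensional "embeddings" (invented values, for illustration only).
const chunks = [
  { text: 'Go to Settings > Export to download your data.', vector: [0.9, 0.1, 0.0] },
  { text: 'Invoices are emailed on the first of each month.', vector: [0.1, 0.9, 0.1] },
  { text: 'Reset your password from the login screen.', vector: [0.0, 0.2, 0.9] },
];

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieval: score every chunk against the query vector and keep the best k.
function retrieve(queryVector, k) {
  return chunks
    .map(c => ({ text: c.text, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Pretend this vector is the embedding of "How do I export my data?"
const top = retrieve([0.8, 0.2, 0.1], 2);
console.log(top.map(t => t.text));
```

The export chunk ranks first because its vector points in nearly the same direction as the query. Generation then just means pasting those top chunks into the prompt.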

What You Need Before Starting

Your support documentation in any crawlable or exportable form: a Notion export, a folder of Markdown files, a Zendesk article list, a sitemap you can walk with HTTP Request nodes.
A vector store. n8n has built-in integrations for Pinecone, Qdrant, Supabase pgvector, and in-memory (good for prototyping). This guide uses Qdrant (self-hostable, with a free cloud tier for getting started).
An n8n instance where the AI/LLM nodes already have credits wired in — more on that below.

Part 1 — The Indexing Workflow

This workflow runs once (then on a schedule or webhook trigger whenever docs change) and populates your vector store.

Step 1 — Trigger

Use a Manual Trigger to start. Once it's working you'll swap it for a Schedule Trigger (0 2 * * * = 2 AM daily) or a Webhook node fired by your CMS on publish.

Step 2 — Fetch Your Docs

If your docs are Markdown files in a GitHub repo:

HTTP Request
  Method: GET
  URL: https://api.github.com/repos/YOUR_ORG/docs/git/trees/main?recursive=1
  Authentication: GitHub (Personal Access Token)

Parse the tree with a Code node (JavaScript):

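// Keep only Markdown files (type "blob"), passing along each file's path and blob URL.
return $input.first().json.tree
  .filter(f => f.type === 'blob' && f.path.endsWith('.md'))
  .map(f => ({ json: { path: f.path, url: f.url } }));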

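Then a second HTTP Request (looped via Split In Batches, called Loop Over Items in recent n8n versions, batch size 5) fetches the blob for each file from its url. The response carries the file body base64-encoded in a content field, so follow it with a Code node that decodes it. Keep path alongside the decoded text in a content field, since the chunking step reads both.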
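The decode step might look like the sketch below. It assumes the blob response has the shape { content, encoding: 'base64' }; decodeBlob and the sample payload are illustrative names, not part of n8n.

```javascript
// Decode the JSON body returned for a single GitHub blob.
// `blob` is assumed to look like { content: '<base64>', encoding: 'base64' }.
function decodeBlob(blob) {
  if (blob.encoding !== 'base64') {
    throw new Error(`Unexpected encoding: ${blob.encoding}`);
  }
  // The base64 payload is wrapped with newlines; Buffer ignores that whitespace.
  return Buffer.from(blob.content, 'base64').toString('utf8');
}

// Sample payload for illustration (note the embedded line break).
const sampleBlob = { encoding: 'base64', content: 'IyBFeHBvcnRpbmcg\nZGF0YQo=' };
console.log(decodeBlob(sampleBlob));
```

Inside a Code node you would call decodeBlob on the HTTP Request output and return the result as content, together with the path from the tree step.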

If your docs are on a web subdomain, use the HTTP Request node to fetch the sitemap XML, parse <loc> tags with a Code node, then loop-fetch each page's HTML and strip tags.
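For the sitemap route, the two Code-node helpers could be sketched like this. The regexes are a deliberately crude assumption that works for simple help-center pages; they do not decode HTML entities, and a proper HTML parser is the safer choice for messy markup. The help.example.com URLs are placeholders.

```javascript
// Pull every <loc> URL out of a sitemap XML string.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map(m => m[1]);
}

// Crude HTML-to-text: drop script/style blocks, then all tags, then collapse whitespace.
function stripTags(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

const sampleSitemap = `<urlset>
  <url><loc>https://help.example.com/export</loc></url>
  <url><loc>https://help.example.com/billing</loc></url>
</urlset>`;
console.log(extractLocs(sampleSitemap));
console.log(stripTags('<h1>Export</h1><script>track()</script><p>Open  <b>Settings</b>.</p>'));
```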

Step 3 — Chunk the Text

LLMs have context windows; vector stores work better with smaller, focused chunks. Add a Code node:

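const chunkSize = 500;   // characters per chunk
const overlap   = 100;   // characters shared with the previous chunk
const chunks = [];
// Chunk every incoming document, not just the first one.
for (const item of $input.all()) {
  const { content, path } = item.json;
  for (let i = 0, n = 0; i < content.length; i += chunkSize - overlap, n++) {
    chunks.push({
      json: {
        text: content.slice(i, i + chunkSize),
        source: path,
        chunk_index: n,   // position within the article, handy for stable IDs
      }
    });
  }
}
return chunks;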

Chunk size 400–600 characters with 10–20% overlap is a practical default. Smaller chunks = more precise retrieval; larger = more context per chunk. Tune after testing.
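When you tune these numbers, it helps to know how many chunks (and therefore embedding calls) a setting produces. The sliding window advances by chunkSize minus overlap each step, so the count is the text length divided by that step, rounded up. countChunks is a hypothetical helper for the arithmetic, not an n8n function.

```javascript
// Number of chunks a sliding window produces for a text of `length` characters.
function countChunks(length, chunkSize = 500, overlap = 100) {
  const step = chunkSize - overlap;
  if (step <= 0) throw new Error('overlap must be smaller than chunkSize');
  return Math.ceil(length / step);
}

// A 2,000-character article with the defaults: windows start at 0, 400, 800, 1200, 1600.
console.log(countChunks(2000));            // 5
// The same article with 1,000-character chunks: windows start at 0, 900, 1800.
console.log(countChunks(2000, 1000, 100)); // 3
```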

Step 4 — Embed + Upsert

Add the Embeddings node (n8n's AI sub-node):

Model: text-embedding-3-small (OpenAI) or equivalent — available through included credits.
Input field: text

Then add the Qdrant Vector Store node in Insert mode:

Collection: support-docs
ID field: auto (n8n generates a UUID per chunk)
Metadata fields: source

Run the workflow. A few hundred support articles will typically index in under two minutes.

Part 2 — The Chat Workflow

This is what runs every time a user asks a question.

Step 1 — Receive the Question

Use a Webhook node (POST, any path, e.g. /support-chat). The payload should include { "question": "..." }. On a hosted n8n instance the webhook gets a public HTTPS URL automatically, with no reverse-proxy config needed; a self-hosted instance has to be reachable from the internet first.

{
  "question": "How do I export my data?"
}

Step 2 — Embed the Query

Add the same Embeddings node as above, this time with the incoming question field as input. This converts the user's question into a vector so you can find similar doc chunks.

Step 3 — Retrieve Relevant Chunks

Add the Qdrant Vector Store node in Retrieve mode:

Collection: support-docs
Top K: 4

This returns the 4 most relevant chunks from your docs. Set Metadata filter if you have multiple products in the same collection and need to scope results.
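If you end up writing the scope as a raw Qdrant filter, the shape is a must list of key/match conditions. The sketch below assumes chunk metadata is stored under a metadata payload key (check a stored point in your collection to confirm), and product is a hypothetical metadata field you would have to add at indexing time.

```javascript
// Build a Qdrant filter that keeps only chunks whose metadata matches every key/value given.
// Assumption: metadata lives under the `metadata` payload key.
function buildMetadataFilter(conditions) {
  return {
    must: Object.entries(conditions).map(([key, value]) => ({
      key: `metadata.${key}`,
      match: { value },
    })),
  };
}

// `product` is a hypothetical field; `source` is the only one this guide stores.
console.log(JSON.stringify(buildMetadataFilter({ product: 'billing' }), null, 2));
```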

Step 4 — Build the Prompt

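A Set node assembles the context. Expressions read the node's input, not fields the same node is still creating, so define context in one Set node and prompt in a second one placed after it. Turn on Execute Once in both so they run once per question rather than once per retrieved chunk. The paths below assume the retrieval node is named Qdrant and that each retrieved item is shaped like { document: { pageContent, metadata }, score }; check the node's output panel and adjust if yours differs.

context = {{ $('Qdrant').all().map(d => `[${d.json.document.metadata.source}]\n${d.json.document.pageContent}`).join("\n\n") }}
prompt  = You are a support assistant. Answer ONLY using the context below.
          If the answer isn't in the context, say "I don't know."
          
          Context:
          {{ $json.context }}
          
          Question: {{ $('Webhook').first().json.body.question }}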
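If you would rather assemble the prompt in a Code node, the same template can be sketched as a plain function. Here docs is assumed to be an array of { pageContent, metadata: { source } } objects that you have already pulled out of the retrieval output; buildPrompt is an illustrative name.

```javascript
// Assemble the grounded prompt from the question and the retrieved chunks.
function buildPrompt(question, docs) {
  // Label each chunk with its source so the model (and you) can trace answers back.
  const context = docs
    .map(d => `[${d.metadata.source}]\n${d.pageContent}`)
    .join('\n\n');
  return [
    'You are a support assistant. Answer ONLY using the context below.',
    'If the answer isn\'t in the context, say "I don\'t know."',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

const samplePrompt = buildPrompt('How do I export my data?', [
  { pageContent: 'Go to Settings > Export.', metadata: { source: 'export.md' } },
  { pageContent: 'Exports are CSV files.', metadata: { source: 'formats.md' } },
]);
console.log(samplePrompt);
```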

Step 5 — Generate the Answer

Add the AI/LLM node (Chat Model sub-node):

Model: gpt-4o-mini or whichever model fits your needs — you can switch models any time from the same node. gpt-4o-mini is fast and cost-efficient for support answers.
Messages: System message with the assembled prompt above.
Enable Return full response if you want token counts.

Step 6 — Respond

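Add a Respond to Webhook node that responds with JSON. Serialize the values with JSON.stringify instead of wrapping them in quotes, so quotes and line breaks in the answer can't break the response body. As before, the source path assumes each retrieved item is shaped like { document: { pageContent, metadata }, score }:

{
  "answer": {{ JSON.stringify($json.text) }},
  "sources": {{ JSON.stringify($('Qdrant').all().map(d => d.json.document.metadata.source).join(', ')) }}
}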

Your chatbot now returns grounded answers with source attribution.

Connecting It to Your Product

The webhook URL is the integration point. Drop it anywhere:

Intercom / Crisp / Freshdesk: paste the webhook URL into their "AI bot" or "custom bot" settings and map the answer field.
Slack: a second n8n workflow listens for app_mention events and calls the chat webhook internally.
Your own frontend: a <form> or React component posts to the webhook and renders answer + sources.
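For the custom-frontend option, a minimal client might look like the sketch below. The URL is a placeholder (copy the production URL from your Webhook node), and it assumes the response is the { answer, sources } JSON from the Respond step. It relies on the global fetch available in browsers and Node 18+.

```javascript
// Hypothetical URL: replace with the production URL shown on your Webhook node.
const WEBHOOK_URL = 'https://your-n8n-host.example.com/webhook/support-chat';

// Build the fetch() arguments separately so they can be inspected or tested offline.
function buildChatRequest(question) {
  const trimmed = String(question ?? '').trim();
  if (!trimmed) throw new Error('question must not be empty');
  return {
    url: WEBHOOK_URL,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ question: trimmed }),
    },
  };
}

// POST the question and return the parsed response payload.
async function askSupportBot(question) {
  const { url, options } = buildChatRequest(question);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Chat webhook returned HTTP ${res.status}`);
  return res.json();
}
```

Call askSupportBot('How do I export my data?') from your form handler and render the answer and sources fields it resolves to.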

How to Run This on AgentRoost

Building this on your laptop is fine for testing. Running it in production means keeping the n8n process alive, keeping Qdrant reachable, and making sure the embedding + LLM API calls actually have credits to run on.

On AgentRoost:

Sign up at agentroost.app — email/password, Google, Microsoft, or Discord.
Pick the n8n framework and give your instance a name.
Your private n8n editor opens at https://<your-id>.agentroost.app within about two minutes.
The AI nodes already have credits loaded, so there is no OpenAI key to paste and no separate model billing to set up. Embeddings and the chat model draw on the same included credit pool.
Import or build the two workflows above, set your Qdrant connection once, and run the indexing workflow.

The webhooks get a permanent public HTTPS URL from day one. Your data and workflows stay on your instance — you own the instance, not us.

Pricing starts at $19.99/mo all-in — that covers the server, the AI credits, and everything between. 14-day money-back guarantee, cancel anytime.

Compare plans and get started

Tips and Pitfalls

Chunk metadata matters. Always store source (the article URL or filename) alongside the chunk. Without it you can't give citations, and you can't surgically re-index one article when it changes.
Test retrieval before wiring the LLM. Run the Qdrant Retrieve step with a real question and inspect the returned chunks. If the top-4 chunks are irrelevant, the likely culprit is your chunk size or your embedding model, not your prompt.
Prompt the model to say "I don't know." This is the most important guardrail. Without it, the LLM will fall back on its training data whenever the retrieved chunks don't cover the question, and give a confident wrong answer that contradicts your docs.
Re-index on publish, not on a cron. Hook into your CMS's publish webhook so the vector store updates within seconds, not overnight.
Watch for duplicate chunks. If you re-run indexing without deleting old vectors, you'll accumulate stale copies. Either delete the collection before re-indexing, or upsert using a deterministic ID (e.g. hash of source + chunk_index).
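One wrinkle with the deterministic-ID tip: Qdrant point IDs must be unsigned integers or UUIDs, so a raw hash string won't be accepted as-is. A minimal sketch of one workaround is to format the first 128 bits of a SHA-256 hash as a UUID. chunkId is a hypothetical helper; how you pass the ID to the vector store depends on your node version, and on self-hosted n8n the crypto module may need to be allowed for Code nodes (NODE_FUNCTION_ALLOW_BUILTIN).

```javascript
const crypto = require('node:crypto');

// Derive a stable, UUID-shaped ID from the article source and the chunk's position.
function chunkId(source, chunkIndex) {
  const hex = crypto.createHash('sha256')
    .update(`${source}#${chunkIndex}`)
    .digest('hex')
    .slice(0, 32)
    .split('');
  // Set the version and variant nibbles so strict UUID parsers accept the value.
  hex[12] = '5';
  hex[16] = ((parseInt(hex[16], 16) & 0x3) | 0x8).toString(16);
  const h = hex.join('');
  // Format as 8-4-4-4-12 hex groups.
  return [h.slice(0, 8), h.slice(8, 12), h.slice(12, 16), h.slice(16, 20), h.slice(20, 32)].join('-');
}

// The same source and index always map to the same ID, so a re-index overwrites in place.
console.log(chunkId('guides/export.md', 0));
console.log(chunkId('guides/export.md', 1));
```

This only overwrites chunks that still exist. If an article shrinks, its leftover higher-numbered chunks still need to be deleted.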

Frequently asked questions

Do I need to bring my own OpenAI API key for the embeddings and chat model?

No. On AgentRoost, AI credits are included in your subscription. The Embeddings node and the AI/LLM node use the same built-in credit pool — you don't paste any API keys into n8n or manage a separate OpenAI account.

Which vector store should I use?

Qdrant is a solid default — it has a free cloud tier and n8n has a native Qdrant node. Pinecone and Supabase pgvector also work with built-in nodes. For quick prototyping, n8n's in-memory vector store lets you test the full RAG pipeline without any external account.

How do I update the chatbot when I publish a new support article?

Trigger the indexing workflow via a webhook from your CMS whenever an article is published or updated. Wire your CMS's publish hook to the n8n webhook URL, and the new content will be chunked, embedded, and upserted within seconds — no manual re-run needed.

Can I cancel if it doesn't work for my use case?

Yes. AgentRoost subscriptions are monthly with no annual lock-in, and there's a 14-day money-back guarantee. Cancel any time from your account settings.

Will my workflow data and documents stay private?

Your n8n instance is single-tenant — your login, your workflows, your data. No other AgentRoost customer shares your instance. The vector store (Qdrant, Pinecone, etc.) is whichever provider you connect; if you self-host Qdrant, the data never leaves your own infrastructure.