---
title: "Build a Simple RAG Q&A Bot Over Your Docs with n8n"
description: "Step-by-step guide: chunk, embed, and query your own documents with a RAG Q&A flow in n8n. AI credits included — no API key, no DevOps setup required."
canonical: https://agentroost.app/en/blog/rag-qa-bot-over-docs-n8n
date: 2026-06-07T20:00:00Z
---

Most chatbots have a fundamental problem: they answer from their training data, not your content. Ask a general-purpose LLM about your internal product docs, your SOPs, or your knowledge base and you get confident-sounding guesses. Retrieval-Augmented Generation (RAG) fixes this by forcing the model to answer *only from text you hand it* — fetched at query time from a vector store you control.

n8n is one of the cleaner places to build this because the entire pipeline — chunking, embedding, storing, retrieving, generating — can live in a single visual workflow with no Python scripts, no LangChain boilerplate, and no external orchestration layer. This guide walks through the exact build.

## What RAG Actually Does (in 30 Seconds)

RAG has two distinct phases:

1. **Indexing** — you take your documents, split them into chunks (e.g. 500 tokens each, 50-token overlap), convert each chunk into a vector embedding, and write those vectors to a store.
2. **Querying** — when a user asks a question, you embed the question the same way, find the nearest chunks in the store (cosine similarity), and pass those chunks as context into an LLM prompt. The model answers from the retrieved context, not from memory.

The result is grounded, citable answers. If the answer isn't in your docs, the model should say so.
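
The two phases can be sketched in a few lines of plain Python. This is an illustration of the idea only, not what n8n runs internally: `embed` is a toy bag-of-words stand-in for a real embedding model, and the sample chunks are made up.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexing phase: embed every chunk once and keep text and vector together."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, top_k: int = 2) -> list[str]:
    """Querying phase: embed the question the same way and return the nearest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Made-up sample chunks, for illustration only.
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "The office is closed on public holidays.",
    "Password resets are handled through the account page.",
]
index = build_index(chunks)
top = retrieve(index, "how do refunds work", top_k=1)
```

The retrieved chunks in `top` are what would be pasted into the LLM prompt as context. A real embedding model captures meaning rather than exact word overlap, but the ranking step works the same way.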

---

## The n8n Flow Architecture

You need two workflows (or two sub-flows in one workflow with separate triggers):

```
[Indexing flow]
Trigger (manual / schedule / webhook)
  → Read/Write Files from Disk  (or HTTP Request / Google Drive)
  → Recursive Character Text Splitter
  → Embeddings node
  → Vector Store (Insert)

[Query flow]
Chat Trigger  (or Webhook)
  → Embeddings node  (same model as indexing)
  → Vector Store (Retrieve)
  → AI Agent / LLM Chain  (with retrieved context injected)
  → Respond to Webhook / Chat
```

Both flows share the same vector store and the same embedding model — that's the key constraint. Mixing models breaks retrieval.

---

## Step 1 — Set Up the Indexing Flow

### 1a. Trigger and Document Source

For a manual one-time index, use a **Manual Trigger**. For ongoing ingestion, use a **Schedule Trigger** (cron expression: `0 2 * * *` for nightly at 2 AM) or a **Webhook** node so you can trigger indexing from an external system when files change.
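
If cron syntax is unfamiliar, the sketch below labels each field of the expression above. It assumes the standard five-field layout; `describe_cron` is a hypothetical helper for illustration, not part of n8n.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Split a standard five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


nightly = describe_cron("0 2 * * *")
```

Here `nightly` maps `minute` to `0` and `hour` to `2`, with `*` (meaning "every") in the remaining three fields, which reads as 02:00 every day.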

To load documents, the most flexible option is an **HTTP Request** node pointing at any URL that returns text or a binary file. For local files or Google Drive, use the built-in **Read/Write Files from Disk** or **Google Drive** nodes, respectively.

### 1b. Chunk the Text

Add a **Recursive Character Text Splitter** node (listed among the text splitters in n8n's AI nodes). Reasonable defaults:

| Parameter | Value |
|---|---|
| Chunk Size | 500 |
| Chunk Overlap | 50 |
| Separators | `\n\n`, `\n`, ` ` |

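Smaller chunks give more precise retrieval; larger chunks give more context per result. This splitter measures size in characters, not tokens, so a 500-character chunk is roughly 100 to 150 tokens of English prose. 500/50 is a safe starting point for prose documents.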
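
To see what chunk size and overlap mean in practice, here is a minimal sliding-window chunker. It is a simplified stand-in, not the recursive splitter itself: the real node first tries to break on the separators in the table, while this sketch cuts at fixed character offsets. The function name `chunk_text` is an assumption for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the previous
    one, so consecutive chunks share `overlap` characters.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 1,200 characters of filler text, just to exercise the function.
sample = "".join(str(i % 10) for i in range(1200))
pieces = chunk_text(sample)
```

With the defaults, the 1,200-character sample yields three chunks, and the last 50 characters of each chunk reappear at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from either side.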

### 1c. Embed the Chunks

Connect an **Embeddings** node. In n8n this is labelled something like **Embeddings OpenAI** or **Embeddings Cohere** depending on your provider selection. Pick `text-embedding-3-small` (OpenAI) or an equivalent — it's fast and cost-effective for indexing. You configure the model inside the node; on AgentRoost the credential is already pointed at your included credits (more on that below).

### 1d. Write to the Vector Store

Connect a **Vector Store** node in *Insert* mode. For a quick start, choose **In-Memory Vector Store**: it requires no external service and works immediately. For persistence in production (data that survives n8n restarts), swap to **PGVector** with a Postgres connection, or **Qdrant**.

Your indexing flow is now complete. Run it manually once to populate the store.

---

## Step 2 — Build the Query Flow

### 2a. Accept the Question

Use a **Chat Trigger** node (built into n8n, with a test chat panel) or a **Webhook** node if you want to connect a front end, Telegram, or Slack.

### 2b. Embed the Question

Add the same **Embeddings** node you used in the indexing flow, pointing at the same model. n8n will reuse the same credential configuration.

### 2c. Retrieve Relevant Chunks

Add a **Vector Store** node in *Retrieve* (or *Search*) mode. Connect it to the same store you wrote to. Set `Top K` to 4 or 5 — that's the number of nearest chunks the model will receive as context.

The node outputs an array of document chunks with their content and metadata.

### 2d. Generate the Answer

Add an **AI Agent** node (or a simpler **LLM Chain** if you don't need tool-calling). In the system prompt, explicitly instruct the model to answer only from the provided context:

```
You are a helpful assistant. Answer the user's question using ONLY the
context below. If the answer is not in the context, say so clearly.

Context:
{{ $json.documents.map(d => d.pageContent).join('\n\n---\n\n') }}
```

Set the user message to `{{ $('Chat Trigger').item.json.chatInput }}` (or the equivalent input field from your trigger).
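
For readers who want to see what the prompt expression produces, the following sketch does the same join in Python. It assumes each retrieved document is a dict with a `pageContent` key, mirroring the expression above; the exact shape of your Vector Store node's output may differ, so check it in the node's output panel.

```python
SYSTEM_TEMPLATE = (
    "You are a helpful assistant. Answer the user's question using ONLY the\n"
    "context below. If the answer is not in the context, say so clearly.\n"
    "\n"
    "Context:\n"
    "{context}"
)


def build_system_prompt(documents: list[dict]) -> str:
    """Join retrieved chunks with a visible separator and fill the template."""
    context = "\n\n---\n\n".join(doc["pageContent"] for doc in documents)
    return SYSTEM_TEMPLATE.format(context=context)


# Made-up chunks, for illustration only.
prompt = build_system_prompt([
    {"pageContent": "Refunds are issued within 14 days."},
    {"pageContent": "Refund requests go through the account page."},
])
```

The `---` separator makes chunk boundaries visible to the model, which helps it avoid blending two unrelated passages into one claim.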

Pick any model from your preferred provider: `gpt-4o-mini` or `claude-3-haiku` for fast, low-latency answers, or any of the 350+ models available on AgentRoost. Swap models with a single dropdown change; no credential update needed.

### 2e. Return the Answer

Wire the output back to a **Respond to Webhook** node (if you used a webhook trigger) or let the Chat Trigger display it in n8n's built-in chat panel.

---

## Pitfalls to Watch

**Embedding model mismatch.** If you index with `text-embedding-3-small` and query with `text-embedding-ada-002`, vector distances are meaningless. Pin the model name in both nodes.
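
One way to make a mismatch fail loudly is to record the model name alongside the stored vectors and check it on every call. The sketch below shows the idea; `PinnedStore` is a hypothetical wrapper, not an n8n feature, and in n8n itself the equivalent is simply keeping the model field identical in both Embeddings nodes.

```python
from dataclasses import dataclass, field


@dataclass
class PinnedStore:
    """Vector store wrapper that remembers which embedding model filled it."""
    model: str
    vectors: list[list[float]] = field(default_factory=list)

    def check(self, model: str) -> None:
        """Raise if a caller uses a different embedding model than the index."""
        if model != self.model:
            raise ValueError(f"store was indexed with {self.model!r}, got {model!r}")

    def insert(self, model: str, vector: list[float]) -> None:
        self.check(model)
        self.vectors.append(vector)


store = PinnedStore(model="text-embedding-3-small")
store.insert("text-embedding-3-small", [0.1, 0.2])

try:
    store.check("text-embedding-ada-002")  # querying with the wrong model
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```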

**In-memory store resets on restart.** The in-memory vector store is cleared when n8n restarts. For anything you want to persist, use PGVector or Qdrant from day one. The swap is a node replacement — your chunking and query logic stays identical.

**Chunk size vs. context window.** If you set `Top K` to 5 and each chunk runs to 500 tokens, you're injecting 2,500 tokens of context. On models with smaller context windows, that leaves little room for the response. Either use a larger-context model or reduce K.
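
The arithmetic is easy to check before you pick a model. In this sketch the 4,096-token window and the 200-token allowance for the system prompt are assumptions chosen for illustration, not properties of any particular model.

```python
def remaining_tokens(top_k: int, chunk_tokens: int, context_window: int,
                     prompt_overhead: int = 200) -> int:
    """Tokens left for the question and the answer once context is injected."""
    injected = top_k * chunk_tokens
    return context_window - injected - prompt_overhead


# Hypothetical 4,096-token model; the window size is an assumption.
with_k5 = remaining_tokens(top_k=5, chunk_tokens=500, context_window=4096)
with_k3 = remaining_tokens(top_k=3, chunk_tokens=500, context_window=4096)
```

Under these assumptions, K=5 leaves 1,396 tokens for the question and answer, while K=3 leaves 2,396.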

**No deduplication on re-index.** If you re-run the indexing flow without clearing the store, chunks double up. Add a **Delete All** operation at the start of the indexing flow (or filter by document ID) if you run it on a schedule.
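
The "filter by document ID" option can be sketched as follows: give every chunk a stable ID derived from its document and content, and drop a document's old chunks before writing the new ones. The plain dict stands in for a vector store, and `chunk_id` and `reindex` are hypothetical helpers for illustration.

```python
import hashlib


def chunk_id(document_id: str, chunk: str) -> str:
    """Stable ID: the same document and text always produce the same key."""
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:16]
    return f"{document_id}:{digest}"


def reindex(store: dict[str, str], document_id: str, chunks: list[str]) -> None:
    """Remove everything stored for document_id, then write the new chunks."""
    stale = [key for key in store if key.startswith(f"{document_id}:")]
    for key in stale:
        del store[key]
    for chunk in chunks:
        store[chunk_id(document_id, chunk)] = chunk


store: dict[str, str] = {}
reindex(store, "handbook", ["Refund policy.", "Holiday schedule."])
reindex(store, "handbook", ["Refund policy.", "Holiday schedule."])  # second nightly run
```

After both runs the store still holds two chunks, not four. Deleting by document ID also removes chunks whose text no longer exists in the source, which a pure "skip if already present" check would leave behind.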

---

## Running This on AgentRoost

On your own server you'd need to: install n8n, configure a reverse proxy and SSL, set up an OpenAI account, fund it, copy API keys, and keep the server patched. That's the self-hosting tax.

On AgentRoost, you get **your own n8n instance** — your login, your workflows, your data — at `https://<your-id>.agentroost.app`, with none of that overhead:

1. Sign up at [agentroost.app](/en/agents/n8n).
2. Pick the **n8n** framework, name your instance.
3. Your private n8n editor opens in about two minutes.
4. Open the Credentials panel — the AI/LLM and Embeddings credentials are already configured against your included credits.
5. Build the two flows above. Hit **Execute** on the indexing flow, then open the chat panel and ask a question.

Both the embedding calls during indexing *and* the generation calls during querying run against your included credits — no OpenAI billing account, no API key rotation, no surprise invoice at the end of the month. Plans start at $19.99/mo all-in, and there's a 14-day money-back guarantee if it doesn't fit your workflow.

[See what's included in each plan](/en/pricing) or [go straight to the n8n workspace](/en/agents/n8n).

---

## What to Build Next

Once the basic RAG loop works, the n8n ecosystem makes it straightforward to extend:

- **Slack or Telegram input**: replace the Chat Trigger or Webhook with a **Telegram Trigger** or **Slack Trigger** node; your team asks questions in chat and gets grounded answers.
- **Auto-ingestion from Notion or Confluence**: add an HTTP Request to the Notion or Confluence API on a schedule; new pages get indexed overnight.
- **Source citations**: pass chunk metadata (page number, filename, URL) through to the prompt so the answer includes `[Source: handbook-v3.pdf, p.12]`.
- **Multi-collection routing**: use an **IF** node to route HR questions to an HR vector store and product questions to a product docs store, which improves retrieval precision.
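
As an example of the source-citation idea, the sketch below prefixes each chunk with a label built from its metadata before it goes into the prompt. The `filename` and `page` keys are assumptions; use whatever metadata fields your loader actually attaches.

```python
def format_chunk(doc: dict) -> str:
    """Prefix a chunk with a citation label built from its metadata."""
    meta = doc.get("metadata", {})
    label = meta.get("filename", "unknown source")
    if "page" in meta:
        label += f", p.{meta['page']}"
    return f"[Source: {label}]\n{doc['pageContent']}"


# Made-up chunk and metadata, for illustration only.
cited = format_chunk({
    "pageContent": "Employees accrue leave monthly.",
    "metadata": {"filename": "handbook-v3.pdf", "page": 12},
})
```

With the label sitting directly above each chunk, you can instruct the model in the system prompt to repeat it next to any claim it draws from that chunk.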

The flow you built today is the foundation for all of those.