Build a Simple RAG Q&A Bot Over Your Docs with n8n
Most chatbots have a fundamental problem: they answer from their training data, not your content. Ask a general-purpose LLM about your internal product docs, your SOPs, or your knowledge base and you get confident-sounding guesses. Retrieval-Augmented Generation (RAG) fixes this by forcing the model to answer only from text you hand it — fetched at query time from a vector store you control.
n8n is one of the cleaner places to build this because the entire pipeline — chunking, embedding, storing, retrieving, generating — can live in a single visual workflow with no Python scripts, no LangChain boilerplate, and no external orchestration layer. This guide walks through the exact build.
What RAG Actually Does (in 30 Seconds)
RAG has two distinct phases:
- Indexing — you take your documents, split them into chunks (e.g. 500 tokens each, 50-token overlap), convert each chunk into a vector embedding, and write those vectors to a store.
- Querying — when a user asks a question, you embed the question the same way, find the nearest chunks in the store (cosine similarity), and pass those chunks as context into an LLM prompt. The model answers from the retrieved context, not from memory.
The result is grounded, citable answers. If the answer isn't in your docs, the model should say so.
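The two phases can be sketched in a few lines of Python. This is a toy illustration, not what n8n runs internally: `embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, and the three sample chunks are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing phase: embed every chunk once and keep (chunk, vector) pairs.
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, top_k: int = 2) -> list[str]:
    # Querying phase: embed the question the same way, rank chunks by similarity.
    query_vector = embed(question)
    ranked = sorted(store, key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


top = retrieve("What is the API rate limit?", top_k=1)
```

The retrieved chunks are what gets pasted into the LLM prompt as context; a real embedding model captures meaning rather than exact word overlap, but the shape of the loop is the same.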
The n8n Flow Architecture
You need two workflows (or two sub-flows in one workflow with separate triggers):
[Indexing flow]
Trigger (manual / schedule / webhook)
→ Read Binary Files (or HTTP Request / Google Drive)
→ Recursive Character Text Splitter
→ Embeddings node
→ Vector Store (Insert)
[Query flow]
Chat Trigger (or Webhook)
→ Embeddings node (same model as indexing)
→ Vector Store (Retrieve)
→ AI Agent / LLM Chain (with retrieved context injected)
→ Respond to Webhook / Chat
Both flows share the same vector store and the same embedding model — that's the key constraint. Mixing models breaks retrieval.
Step 1 — Set Up the Indexing Flow
1a. Trigger and Document Source
For a manual one-time index, use a Manual Trigger. For ongoing ingestion, use a Schedule Trigger (cron expression: 0 2 * * * for nightly at 2 AM) or a Webhook node so you can trigger indexing from an external system when files change.
To load documents, the most flexible option is an HTTP Request node pointing at any URL that returns text or a binary file. For local files or Google Drive, use the built-in Google Drive or Read/Write Files from Disk nodes.
1b. Chunk the Text
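Add a Recursive Character Text Splitter node. In n8n it is an AI sub-node that attaches to a document loader (for example the Default Data Loader); the exact menu location can differ between n8n versions. Reasonable defaults: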
| Parameter | Value |
|---|---|
| Chunk Size | 500 |
| Chunk Overlap | 50 |
| Separators | \n\n, \n, (space) |
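Smaller chunks give more precise retrieval; larger chunks give more context per result. 500/50 is a safe starting point for prose documents. Note that this splitter measures Chunk Size and Chunk Overlap in characters rather than tokens, so a 500-character chunk is considerably shorter than a 500-token one.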
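To see what chunk size and overlap mean in practice, here is a minimal sketch of a fixed-window splitter. It is deliberately simpler than the recursive splitter, which also tries to break on the separators in the table; `split_with_overlap` is a hypothetical helper written for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a window of chunk_size characters, stepping forward by
    # (chunk_size - overlap) so consecutive chunks share `overlap` characters.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
pieces = split_with_overlap(sample, chunk_size=500, overlap=50)
```

With 1,000 characters of input, the windows start at positions 0, 450, and 900, so the last 50 characters of one chunk reappear as the first 50 of the next. That overlap is what keeps a sentence that straddles a boundary retrievable from either side.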
1c. Embed the Chunks
Connect an Embeddings node. In n8n this is labelled something like Embeddings OpenAI or Embeddings Cohere depending on your provider selection. Pick text-embedding-3-small (OpenAI) or an equivalent — it's fast and cost-effective for indexing. You configure the model inside the node; on AgentRoost the credential is already pointed at your included credits (more on that below).
1d. Write to the Vector Store
Connect a Vector Store node in Insert mode. For a quick start, choose In-Memory Vector Store — it requires no external service and works immediately. For production persistence (survives workflow restarts), swap to PGVector with a Postgres connection, or Qdrant.
Your indexing flow is now complete. Run it manually once to populate the store.
Step 2 — Build the Query Flow
2a. Accept the Question
Use a Chat Trigger node (built into n8n; it gives you a test chat panel) or a Webhook node if you want to hook it up to a front-end, Telegram, or Slack.
2b. Embed the Question
Add the same Embeddings node you used in the indexing flow, pointing at the same model. n8n will reuse the same credential configuration.
2c. Retrieve Relevant Chunks
Add a Vector Store node in Retrieve (or Search) mode. Connect it to the same store you wrote to. Set Top K to 4 or 5 — that's the number of nearest chunks the model will receive as context.
The node outputs an array of document chunks with their content and metadata.
2d. Generate the Answer
Add an AI Agent node (or a simpler LLM Chain if you don't need tool-calling). In the system prompt, explicitly instruct the model to answer only from the provided context:
You are a helpful assistant. Answer the user's question using ONLY the
context below. If the answer is not in the context, say so clearly.
Context:
{{ $json.documents.map(d => d.pageContent).join('\n\n---\n\n') }}
Set the user message to {{ $('Chat Trigger').item.json.chatInput }} (or the equivalent input field from your trigger).
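For readers who want to see what the context expression produces, the same assembly can be sketched in Python. It assumes each retrieved document is a dictionary with a `pageContent` field, mirroring the expression above; the sample documents and the `build_system_prompt` helper are hypothetical.

```python
SYSTEM_TEMPLATE = (
    "You are a helpful assistant. Answer the user's question using ONLY the\n"
    "context below. If the answer is not in the context, say so clearly.\n"
    "Context:\n"
    "{context}"
)


def build_system_prompt(documents: list[dict]) -> str:
    # Join the retrieved chunks with a visible divider, as the n8n expression does.
    context = "\n\n---\n\n".join(doc["pageContent"] for doc in documents)
    return SYSTEM_TEMPLATE.format(context=context)


# Hypothetical retrieval output, shaped like the expression above expects.
documents = [
    {"pageContent": "Refunds are issued within 14 days."},
    {"pageContent": "Support is available Monday through Friday."},
]
prompt = build_system_prompt(documents)
```

The divider between chunks is a small design choice: it helps the model tell where one retrieved passage ends and the next begins.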
Pick any model from your preferred provider — gpt-4o-mini for speed, claude-3-haiku for low latency, or any of the 350+ models available on AgentRoost. Swap models with a single dropdown change; no credential update needed.
2e. Return the Answer
Wire the output back to a Respond to Webhook node (if you used a webhook trigger) or let the Chat Trigger display it in n8n's built-in chat panel.
Pitfalls to Watch
Embedding model mismatch. If you index with text-embedding-3-small and query with text-embedding-ada-002, vector distances are meaningless. Pin the model name in both nodes.
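One way to make the mismatch fail loudly instead of silently is to record the model name when you index and compare it at query time. The sketch below is an assumption about how you might do that outside n8n; `check_embedding_model` is a hypothetical helper, not an n8n feature.

```python
def check_embedding_model(index_model: str, query_model: str) -> None:
    # Refuse to search if the query would be embedded with a different model
    # than the one used to build the index.
    if index_model != query_model:
        raise ValueError(
            f"Index was built with {index_model!r} but the query uses {query_model!r}"
        )


# Same model on both sides: no error.
check_embedding_model("text-embedding-3-small", "text-embedding-3-small")
```

A check like this matters because mismatched models do not necessarily raise an error on their own; when two models happen to produce vectors of the same length, the search still runs and simply returns irrelevant chunks.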
In-memory store resets on restart. The in-memory vector store is cleared when n8n restarts. For anything you want to persist, use PGVector or Qdrant from day one. The swap is a node replacement — your chunking and query logic stays identical.
Chunk size vs. context window. If you set Top K=5 and each chunk is 500 tokens, you're injecting 2,500 tokens of context. With models that have smaller context windows, that leaves little room for the response. Either use a larger-context model or reduce K.
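The arithmetic is easy to check with a small budget calculator. The 4,096-token window and the reserved prompt and answer sizes below are hypothetical numbers chosen for illustration, not limits of any particular model.

```python
def context_tokens(top_k: int, chunk_tokens: int) -> int:
    # Tokens injected into the prompt by the retrieved chunks.
    return top_k * chunk_tokens


def max_top_k(context_window: int, chunk_tokens: int,
              prompt_tokens: int, answer_tokens: int) -> int:
    # Largest K that still leaves room for the instructions and the answer.
    available = context_window - prompt_tokens - answer_tokens
    return max(available // chunk_tokens, 0)


injected = context_tokens(top_k=5, chunk_tokens=500)
largest_k = max_top_k(context_window=4096, chunk_tokens=500,
                      prompt_tokens=200, answer_tokens=1000)
```

Under those assumed numbers, K=5 just fits; reserving 2,000 tokens for the answer instead would bring the ceiling down to K=3.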
No deduplication on re-index. If you re-run the indexing flow without clearing the store, chunks double up. Add a Delete All operation at the start of the indexing flow (or filter by document ID) if you run it on a schedule.
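The "filter by document ID" option can be sketched as delete-then-insert: drop every stored chunk that belongs to the document, then write the fresh chunks. The list-of-dicts store and the `reindex_document` helper are hypothetical stand-ins for whatever vector store you use.

```python
def reindex_document(store: list[dict], document_id: str, chunks: list[str]) -> list[dict]:
    # Remove any chunks previously stored for this document...
    kept = [row for row in store if row["document_id"] != document_id]
    # ...then add the freshly split chunks.
    fresh = [{"document_id": document_id, "text": chunk} for chunk in chunks]
    return kept + fresh


store: list[dict] = []
store = reindex_document(store, "handbook", ["chunk A", "chunk B"])
store = reindex_document(store, "faq", ["chunk C"])
# Re-running the index for the same document does not double its chunks.
store = reindex_document(store, "handbook", ["chunk A", "chunk B"])
```

Compared with a Delete All step, this keeps other documents searchable while one document is being refreshed, at the cost of needing a reliable document ID in each chunk's metadata.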
Running This on AgentRoost
On your own server you'd need to: install n8n, configure a reverse proxy and SSL, set up an OpenAI account, fund it, copy API keys, and keep the server patched. That's the self-hosting tax.
On AgentRoost, you get your own n8n instance — your login, your workflows, your data — at https://<your-id>.agentroost.app, with none of that overhead:
- Sign up at agentroost.app.
- Pick the n8n framework, name your instance.
- Your private n8n editor opens in about two minutes.
- Open the Credentials panel — the AI/LLM and Embeddings credentials are already configured against your included credits.
- Build the two flows above. Hit Execute on the indexing flow, then open the chat panel and ask a question.
Both the embedding calls during indexing and the generation calls during querying run against your included credits — no OpenAI billing account, no API key rotation, no surprise invoice at the end of the month. Plans start at $19.99/mo all-in, and there's a 14-day money-back guarantee if it doesn't fit your workflow.
See what's included in each plan or go straight to the n8n workspace.
What to Build Next
Once the basic RAG loop works, the n8n ecosystem makes it straightforward to extend:
- Slack or Telegram input: replace the Webhook trigger with a Telegram Trigger or Slack Trigger node; your team asks questions in chat and gets grounded answers.
- Auto-ingestion from Notion or Confluence: add an HTTP Request to the Notion API on a schedule; new pages get indexed overnight.
- Source citations: pass chunk metadata (page number, filename, URL) through to the prompt so the answer includes [Source: handbook-v3.pdf, p.12].
- Multi-collection routing: use an IF node to route HR questions to an HR vector store and product questions to a product docs store, giving better retrieval precision.
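As an illustration of the source-citation idea, the sketch below appends a source label to each chunk before it goes into the prompt. The `metadata` keys (`filename`, `page`) are assumptions about what your loader stores, and `format_context_with_sources` is a hypothetical helper.

```python
def format_context_with_sources(chunks: list[dict]) -> str:
    # Append a "[Source: ...]" label to each chunk so the model can cite it.
    blocks = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        label = meta.get("filename", "unknown source")
        if "page" in meta:
            label += f", p.{meta['page']}"
        blocks.append(f"{chunk['pageContent']}\n[Source: {label}]")
    return "\n\n---\n\n".join(blocks)


# Hypothetical retrieved chunks with loader-provided metadata.
retrieved = [
    {"pageContent": "Employees accrue 20 vacation days per year.",
     "metadata": {"filename": "handbook-v3.pdf", "page": 12}},
    {"pageContent": "Requests go through the HR portal.",
     "metadata": {"filename": "hr-faq.md"}},
]
context = format_context_with_sources(retrieved)
```

For the labels to show up in answers, the system prompt also needs to ask the model to quote the source label of any chunk it relies on.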
The flow you built today is the foundation for all of those.
Frequently asked questions
Do I need an OpenAI API key to use the embedding or AI nodes?
Not on AgentRoost. AI credits are included in every plan, so the Embeddings and AI/LLM nodes work out of the box. You select your model inside n8n and the cost is covered — no BYOK step.
Which vector stores does n8n support for RAG?
n8n has built-in nodes for Pinecone, Qdrant, Supabase Vector, PGVector (Postgres), and a simple in-memory store. For a self-contained setup, the In-Memory Vector Store node works immediately; for persistence across restarts, switch to PGVector (Postgres) or another connector.
How many documents can this handle?
That depends on your chunk count and the vector store's capacity. The in-memory store suits small doc sets (dozens of pages). For larger corpora, connect an external Qdrant or Supabase instance: you replace the vector store node and add its credential, and the rest of the flow stays the same.
Can I cancel if it doesn't work for my use case?
Yes. AgentRoost offers a 14-day money-back guarantee and monthly billing with no lock-in — cancel anytime from your account dashboard.
Can I run the indexing pipeline on a schedule to pick up new documents automatically?
Absolutely. Add a Schedule Trigger node at the top of your indexing flow and point it at a folder, an RSS feed, an HTTP endpoint, or a Google Drive folder via the HTTP Request or Google Drive node. New files get chunked and embedded automatically on each run.