Build a Telegram Bot That Answers From Your Own Docs (RAG)


AgentRoost · May 24, 2026 · 7 min read


Your team wiki exists. Nobody reads it.

Someone asks the same question for the third time this week. You paste the link. They skim the page, miss the relevant paragraph, and ask again. This is the actual problem a retrieval-augmented generation (RAG) bot solves: it reads the question, pulls only the relevant chunk from your documents, and returns a direct answer — with a source reference.

This guide walks you through building exactly that on your own n8n instance: a Telegram bot that answers strictly from a knowledge base you supply, with the AI inference already paid for.

What "RAG" Actually Means Here

RAG stands for retrieval-augmented generation. Instead of asking the LLM to answer from memory (where it can hallucinate), you:

Split your documents into chunks and store them as vectors (embeddings).
When a question arrives, convert it to an embedding and find the most similar chunks.
Stuff those chunks into the LLM prompt as context, then ask it to answer only from that context.

The result: a bot that is grounded in your content, is instructed to decline rather than guess when the answer isn't there, and cites the source it pulled from.
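To make the "find the most similar chunks" step concrete, here is a minimal sketch of ranking by cosine similarity. In the real workflow the vector store does this for you; the three-dimensional embeddings, the chunk texts, and the function names below are made up for illustration.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topKChunks(queryEmbedding, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical toy data: real embeddings have hundreds of dimensions.
const exampleChunks = [
  { text: "Refunds are handled by the billing team.", source: "policy.md", embedding: [0.9, 0.1, 0.0] },
  { text: "Office hours are 9 to 5.", source: "handbook.md", embedding: [0.0, 0.2, 0.9] },
  { text: "Returns require a receipt.", source: "policy.md", embedding: [0.8, 0.3, 0.1] },
];

const best = topKChunks([1, 0, 0], exampleChunks, 2);
console.log(best.map((c) => c.text));
```

The two policy.md chunks rank first because their vectors point in nearly the same direction as the query vector. Those are the chunks that would be placed into the prompt as context.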

What You're Building

Input: A Telegram message ("What's the refund policy?")
Retrieval: n8n fetches the most relevant passages from a vector store loaded with your docs
Generation: An LLM node composes a cited answer using only those passages
Output: The answer is sent back in Telegram

The workflow lives in your own n8n instance. The AI nodes run against included credits — no OpenAI key, no Anthropic key, nothing to wire up yourself.

Step 1: Ingest Your Documents (One-Time Setup)

Create a Document Ingestion workflow. This runs once (or whenever your docs change).

Nodes you need

Node	Role
Manual Trigger or Schedule Trigger	Start the ingestion run
Read Binary File / HTTP Request	Load a PDF, a plain-text file, or pull from a URL
Recursive Character Text Splitter (n8n built-in)	Chunk the text (e.g. 500 characters, 50 overlap)
Embeddings node (AI → Embeddings)	Turn each chunk into a vector
Vector Store Insert (Pinecone / Supabase pgvector / Qdrant)	Persist the vectors

A minimal chunk config that works well:

{
  "chunkSize": 500,
  "chunkOverlap": 50,
  "separators": ["\n\n", "\n", " "]
}

What to feed it: Team handbooks, product manuals, FAQ pages, support runbooks — any plain text or PDF you own. Keep each document tagged with a source metadata field so the answer can cite it.
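To show what chunkSize and chunkOverlap do, and how a source field travels with each chunk, here is a simplified sketch. It counts characters and uses a plain sliding window, so it is not the recursive splitter itself, which also tries to break on the configured separators. The function name and the metadata.start field are illustrative assumptions.

```javascript
// Simplified fixed-size chunker: counts characters and ignores separators.
// Each chunk carries a `source` so the answer can cite where it came from.
function chunkText(text, source, chunkSize = 500, chunkOverlap = 50) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      pageContent: text.slice(start, start + chunkSize),
      metadata: { source, start },
    });
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

const sample = "a".repeat(1200);
const pieces = chunkText(sample, "handbook.pdf");
console.log(pieces.map((p) => [p.metadata.start, p.pageContent.length]));
// [[0, 500], [450, 500], [900, 300]]
```

Each window starts 450 characters after the previous one, so consecutive chunks share 50 characters.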

Tip: If your docs live in Notion or Google Drive, n8n has built-in trigger nodes for both. You can schedule re-ingestion nightly with a Schedule Trigger so the bot always reflects your latest content.

Step 2: Build the Answer Workflow

This is the workflow that runs every time someone messages your bot.

Node-by-node

Telegram Trigger

Create a bot via BotFather to get your bot token, then add it to the Telegram Trigger node credentials. n8n registers the webhook automatically when you activate the workflow — no manual setWebhook call needed.
Output: {{ $json.message.text }} is the user's question; {{ $json.message.chat.id }} is where to send the reply.

Vector Store Retrieval

Use the Vector Store Retriever node pointed at the same store you populated in Step 1.
Set topK: 4 — retrieve the four most similar chunks.
Pass {{ $json.message.text }} as the query.

LLM Node (AI → Chat Model)

Choose any of the 350+ models available on your instance — GPT-4o, Claude 3.5 Sonnet, Llama 3, Mistral, and more.
System prompt (paste this verbatim, then tweak):

You are a helpful assistant that ONLY answers from the provided context.
If the context does not contain enough information to answer, say:
"I don't have that in my docs — please check with the team."
Always cite the source at the end of your answer as: Source: <source>.
Do not invent information.

User message:

Question: {{ $json.message.text }}

Context:
{{ $('Vector Store Retrieval').all().map(item => item.json.pageContent + '\nSource: ' + item.json.metadata.source).join('\n\n') }}
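When retrieval returns several chunks, the context needs to include all of them, each with its own source. Here is a minimal sketch of that assembly as a plain function. The item shape (json.pageContent, json.metadata.source) follows the expressions used in this guide and may differ from what your vector store node actually emits. In an n8n Code node the items would likely come from $input.all().

```javascript
// Build the user message from every retrieved chunk, labelling each with its source.
function buildUserMessage(question, items) {
  const context = items
    .map((item, i) => {
      const source = item.json.metadata?.source ?? "unknown";
      return `[${i + 1}] (Source: ${source})\n${item.json.pageContent}`;
    })
    .join("\n\n");
  return `Question: ${question}\n\nContext:\n${context}`;
}

// Hypothetical retrieval output, for illustration only.
const retrieved = [
  { json: { pageContent: "Refund requests go to billing.", metadata: { source: "policy.md" } } },
  { json: { pageContent: "Billing replies within two days.", metadata: { source: "faq.md" } } },
  { json: { pageContent: "A chunk stored without metadata.", metadata: {} } },
];

const userMessage = buildUserMessage("What's the refund policy?", retrieved);
console.log(userMessage);
```

Numbering the chunks gives the model a way to refer back to a specific passage. The "unknown" fallback makes missing metadata visible instead of silently dropping the citation.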

IF Node

Condition: {{ ($json.text || '').trim().length > 0 }} guards against empty or missing responses before sending.

Telegram node (Send Message)

Chat ID: {{ $('Telegram Trigger').item.json.message.chat.id }}
Text: {{ $json.text }}
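Parse mode: Markdown (so bold citations render cleanly). Telegram rejects messages whose Markdown is unbalanced, such as a lone _ or * in a filename, so escape those characters or fall back to plain text if sends fail.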
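One thing to watch when sending: the Telegram Bot API limits a single message to 4096 characters, so a long answer has to be split. A minimal sketch, assuming you would run something like this in a Code node before the Send Message node (the function name is illustrative):

```javascript
const TELEGRAM_MAX_LENGTH = 4096; // sendMessage text limit in the Bot API

// Split a long answer into parts that each fit in one Telegram message,
// preferring to break at the last newline inside the limit.
function splitForTelegram(text, limit = TELEGRAM_MAX_LENGTH) {
  const parts = [];
  let rest = text;
  while (rest.length > limit) {
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) cut = limit; // no usable newline: hard cut at the limit
    parts.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\n+/, ""); // drop the newline(s) we split on
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}

const longAnswer = "x".repeat(4000) + "\n" + "y".repeat(1000);
const messageParts = splitForTelegram(longAnswer);
console.log(messageParts.map((p) => p.length)); // [4000, 1000]
```

Splitting can cut a Markdown entity in two, which is another reason to keep a plain-text fallback for long replies.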

Step 3: Handle Edge Cases

A bot that silently fails frustrates users more than one that admits ignorance.

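When no relevant chunks are found: add an IF node after retrieval that checks whether the retrieval step returned zero items, and send a fallback message: "I couldn't find anything relevant in the docs. Try rephrasing or contact the team." A similarity search usually returns its topK nearest chunks even when none of them is a good match, so if your vector store exposes a similarity score, also treat results below a threshold you choose as "not found". Enable Always Output Data in the retrieval node's settings so the IF node still runs when nothing comes back.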
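The fallback decision can be sketched as a small routing function. The score field and the 0.75 threshold are assumptions for illustration: check what your vector store actually returns, since some report a distance (lower is better) rather than a similarity.

```javascript
const FALLBACK =
  "I couldn't find anything relevant in the docs. Try rephrasing or contact the team.";

// Keep only results whose (hypothetical) similarity score clears the threshold.
function filterRelevant(results, minScore = 0.75) {
  return results.filter((r) => typeof r.score === "number" && r.score >= minScore);
}

// Decide whether to call the LLM or short-circuit to the fixed fallback message.
function route(results, minScore) {
  const relevant = filterRelevant(results, minScore);
  return relevant.length === 0
    ? { action: "fallback", text: FALLBACK }
    : { action: "answer", chunks: relevant };
}

console.log(route([]).action);                                 // "fallback"
console.log(route([{ score: 0.4 }]).action);                   // "fallback"
console.log(route([{ score: 0.9 }, { score: 0.5 }]).action);   // "answer"
```

Skipping the LLM call on the fallback path also saves inference credits on questions the docs can't answer.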

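Keeping context manageable: the context you send grows with topK × chunk size, not with the size of the knowledge base. Small chunks at topK: 3–6 fit comfortably in most models; if you move to much larger chunks or a much higher topK, pick a model with a large context window (for example 128k tokens).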
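A back-of-the-envelope estimate helps when choosing topK. The sketch below assumes chunk size is measured in characters and uses roughly 4 characters per token, a common rule of thumb for English text rather than an exact figure. The 600-character prompt overhead is also a placeholder.

```javascript
// Rough rule of thumb only: real tokenizers vary by model and language.
const CHARS_PER_TOKEN = 4;

// Estimate the prompt size in tokens for a given retrieval configuration.
function estimateContextTokens(topK, chunkSizeChars, promptOverheadChars = 600) {
  const totalChars = topK * chunkSizeChars + promptOverheadChars;
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

console.log(estimateContextTokens(4, 500));  // 650
console.log(estimateContextTokens(6, 4000)); // 6150
```

Under these assumptions, even six 4000-character chunks stay far below a 128k-token window. Context size only becomes a real constraint with much larger chunks or a much higher topK.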

Rate limiting: if this bot is used by a whole team, consider throttling requests per user (for example, by counting recent messages per message.from.id in a data store) or checking message.from.id against an allowlist using an IF node.
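Per-user throttling can be sketched as a sliding-window counter keyed by the Telegram user ID. This is an in-memory illustration with hypothetical names. Inside n8n each execution starts fresh, so the timestamps would need to be persisted somewhere (workflow static data or an external store) for this to work across messages.

```javascript
// In-memory sliding-window limiter keyed by Telegram user ID.
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.hits = new Map(); // userId -> array of request timestamps (ms)
  }

  // Returns true and records the request if allowed; returns false otherwise.
  allow(userId, now = Date.now()) {
    const previous = this.hits.get(userId) ?? [];
    const recent = previous.filter((t) => now - t < this.windowMs);
    if (recent.length >= this.maxRequests) {
      this.hits.set(userId, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(userId, recent);
    return true;
  }
}

// Example: at most 2 questions per user per second.
const limiter = new RateLimiter(2, 1000);
console.log(limiter.allow(42, 0));    // true
console.log(limiter.allow(42, 100));  // true
console.log(limiter.allow(42, 200));  // false
console.log(limiter.allow(42, 1100)); // true (earlier hits fell out of the window)
```

Rejected requests are not recorded, so a user who keeps retrying is not locked out beyond the window itself.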

How to Do This on AgentRoost

You need an n8n instance to run these workflows. Setting one up yourself means an afternoon of work: domain, SSL, Docker, reverse proxy, process manager, vector store, and API keys for every AI service you want to use.

On AgentRoost:

Sign up at agentroost.app — email/password, Google, Microsoft, or Discord.
Pick the n8n framework, name your instance.
Your private n8n editor opens at https://<your-id>.agentroost.app — SSL done, subdomain done, always on.
The AI nodes — embeddings, chat models, vector retrieval — are already connected to included credits. You pick a model from the dropdown and start building. No OpenAI key. No Anthropic key. Nothing.
Build the two workflows above. Your Telegram bot is live in under two minutes after the instance spins up.

What's included in the subscription: your own dedicated n8n instance (your login, your data, your workflows), AI/LLM inference credits, a public HTTPS webhook URL, 350+ model choices, and a 14-day money-back guarantee. From $19.99/mo all-in.

Every alternative — n8n Cloud, Elestio, Sliplane — requires you to bring your own API keys and pay for AI usage separately. On AgentRoost the credits are already in the plan.


Tips and Pitfalls

Do use metadata. Always store a source field (filename, URL, section title) when inserting chunks. Without it, the bot can answer correctly but can't tell the user where it came from — which undermines trust.

Do use chunk overlap. A 50-character overlap makes it less likely that an answer straddling a chunk boundary gets split in half.

Don't skip re-ingestion. If your docs change and you don't re-embed, the bot will answer from stale content. Schedule a nightly ingestion run even if the docs barely change.

Don't use the LLM without a strict system prompt. Without explicit grounding instructions, the model will fill gaps with plausible-sounding hallucinations. The prompt above ("ONLY answer from the provided context") is the key constraint.

Do test refusal. Ask the bot something that definitely isn't in your docs. It should say so cleanly rather than making something up. If it doesn't, tighten the system prompt.

Frequently asked questions

Do I need an OpenAI or Anthropic API key to run the AI nodes?

No. On AgentRoost the AI/LLM credits are included in your subscription. You pick a model from the dropdown in the n8n AI node and it works — no key needed. Every competitor requires you to bring your own key and pay for usage on top.

Which vector store should I use?

n8n has built-in support for Pinecone, Supabase (pgvector), Qdrant, and Weaviate. For a small knowledge base (under ~10,000 chunks), an in-memory store or Supabase pgvector is a practical starting point; note that an in-memory store loses its contents when the instance restarts, so you would need to re-run ingestion. For production workloads, Qdrant self-hosted or Pinecone serverless are solid choices.

Can I restrict the bot to certain Telegram users?

Yes — add an IF node right after the Telegram Trigger that checks {{ $json.message.from.id }} against a hardcoded list of allowed user IDs. Anyone not on the list gets an 'Access denied' reply and the workflow stops.
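The allowlist check itself is small. A minimal sketch as a plain function, with made-up user IDs; in an IF node the same idea would be a single expression that tests whether $json.message.from.id is in your list.

```javascript
// Hypothetical allowlist of Telegram numeric user IDs.
const ALLOWED_USER_IDS = new Set([111111111, 222222222]);

// `update` is the Telegram update object emitted by the trigger.
function isAllowed(update, allowed = ALLOWED_USER_IDS) {
  const userId = update?.message?.from?.id;
  return typeof userId === "number" && allowed.has(userId);
}

console.log(isAllowed({ message: { from: { id: 111111111 } } })); // true
console.log(isAllowed({ message: { from: { id: 5 } } }));         // false
console.log(isAllowed({}));                                       // false
```

Updates without a message.from field are treated as not allowed, so an unexpected payload fails closed.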

Can I export my workflows if I cancel?

Yes. n8n lets you export any workflow as a JSON file directly from the editor (workflow menu → Download). Your workflows are yours: you can import them into any other n8n instance at any time.

What happens if the bot can't find a relevant answer in my docs?

If topK retrieval returns no results (or only very low-similarity ones), the LLM should follow the system prompt instruction and say it doesn't have that information. You can also add an explicit IF node after retrieval to send a fixed fallback message when the result count is zero.