---
title: "LLM Tokens, Context Windows & Temperature Explained"
description: "Plain-English guide to LLM tokens, context windows, and temperature — what they mean, when to change them, and what happens when you get them wrong."
canonical: https://agentroost.app/en/blog/llm-tokens-context-window-temperature-explained
date: 2026-05-14T20:00:00Z
---


If you have ever built an automation with an AI step and seen it produce garbled output, cut off mid-sentence, or repeat the same phrase three times, one of three things was usually wrong: the input overflowed the context window, the output hit a token cap, or the temperature was misconfigured. None of these terms is hard once you see what it actually maps to.

This guide explains all three from first principles, gives you practical rules of thumb, and shows you what to do when something goes wrong in an AI node.

---

## What is a Token?

A **token** is the smallest chunk a language model processes. It is not a word and it is not a character — it sits somewhere in between, determined by the model's tokenizer.

A rough rule of thumb that holds across most English-language models:

- **~4 characters ≈ 1 token**
- **~75 words ≈ 100 tokens**
- A typical email (200 words) ≈ 270 tokens
- A one-page document (500 words) ≈ 670 tokens
- A full novel chapter (3,000 words) ≈ 4,000 tokens

Tokens matter for two reasons:

1. **Cost.** Most model providers charge per token, usually at different rates for input and output. If you are on a bring-your-own-key setup and you forget to cap `max_tokens`, a runaway loop can drain your API credits in minutes.
2. **Context limits.** Each model can only "see" a fixed number of tokens at once. Once you hit that limit, either the request is rejected or the tooling in front of the model starts dropping the oldest content to make room.
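
To make the cost point concrete, here is a minimal sketch of the per-call arithmetic. The function name and the prices are hypothetical placeholders chosen for illustration, not any provider's actual rates; substitute the figures from your own provider's pricing page.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Return the estimated cost of one call in US dollars."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices for illustration only:
# $1 per million input tokens, $3 per million output tokens.
per_call = estimate_cost_usd(2_000, 500, 1.0, 3.0)

# A loop that fires the same call 10,000 times multiplies the bill.
runaway = per_call * 10_000
print(f"one call: ${per_call:.4f}, 10,000 calls: ${runaway:.2f}")
```

A single call looks negligible, which is exactly why an uncapped loop is easy to miss until the bill arrives.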

> **Practical tip:** When you paste a long document into an AI node, count the words first. Divide by 0.75 to get a rough token estimate. If that number approaches the model's context limit, you need to chunk the document before sending it.
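
The rule of thumb above can be turned into a small helper. This is a rough sketch, not a real tokenizer: `estimate_tokens`, `needs_chunking`, and the 80% safety margin are assumptions made for illustration, and actual counts vary by model and language.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text; not a real tokenizer."""
    by_words = len(text.split()) / 0.75  # ~75 words per 100 tokens
    by_chars = len(text) / 4             # ~4 characters per token
    # Take the larger of the two estimates to stay on the safe side.
    return round(max(by_words, by_chars))


def needs_chunking(text: str, context_limit: int,
                   safety_margin: float = 0.8) -> bool:
    """True when the estimate approaches the model's context limit."""
    return estimate_tokens(text) > context_limit * safety_margin


document = "word " * 300  # stand-in for a 300-word document
print(estimate_tokens(document), needs_chunking(document, context_limit=4096))
```

When you need exact numbers, use the tokenizer that matches your model; the estimate is only meant for a quick go/no-go decision before sending.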

---

## What is a Context Window?

The **context window** is the maximum number of tokens a model can hold in "working memory" for a single inference call. Everything inside the window — the system prompt, the conversation history, the user message, tool outputs — must fit within it. What sits outside the window is invisible to the model.
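
A quick way to reason about this is as a budget. The sketch below simply adds up the pieces and reports what is left; the function name and the token figures are made up for illustration. On many models the generated reply also has to fit in the same window, so the remainder is effectively the room left for output.

```python
def remaining_context(context_window: int, system_tokens: int,
                      history_tokens: int, user_tokens: int,
                      tool_tokens: int = 0) -> int:
    """Tokens left in the window after all inputs are counted.

    A negative result means the inputs alone do not fit.
    """
    used = system_tokens + history_tokens + user_tokens + tool_tokens
    return context_window - used


# Illustrative numbers: an 8,192-token window with a heavy system prompt.
left = remaining_context(context_window=8192, system_tokens=2000,
                         history_tokens=4500, user_tokens=300,
                         tool_tokens=1000)
print(f"tokens left for the reply: {left}")
```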

Different models have very different context window sizes:

| Model family | Typical context window |
|---|---|
| Older GPT-3.5 class | 4 K – 16 K tokens |
| GPT-4 class | 8 K – 128 K tokens |
| Claude 3 class | 200 K tokens |
| Gemini 1.5 Pro | 1 M tokens |
| Many open-source models | 4 K – 32 K tokens |

### Why this trips people up in automations

Suppose you build a workflow that:
1. Pulls a customer's full email thread (50 messages).
2. Feeds it to an AI node.
3. Asks the model to summarize the issue and suggest a reply.

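If the thread is 6,000 tokens and your model's context window is 4,096, the thread cannot fit. Some APIs reject the oversized request with an error, but many chat wrappers and memory components instead **silently drop the oldest messages** so the call still succeeds. The model then summarizes only the most recent part of the thread. No error, no warning: the output just looks oddly truncated or misses the original complaint.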
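
Here is a minimal sketch of how oldest-first trimming can lose the original complaint. It is not how any particular platform implements it; `trim_oldest` and the flat 120 tokens per message are assumptions chosen so the thread totals 6,000 tokens.

```python
def trim_oldest(messages: list[str], token_counts: list[int],
                budget: int) -> list[str]:
    """Keep the most recent messages that fit within `budget` tokens."""
    kept = []
    total = 0
    # Walk from newest to oldest and stop at the first message that overflows.
    for message, count in zip(reversed(messages), reversed(token_counts)):
        if total + count > budget:
            break
        kept.append(message)
        total += count
    return list(reversed(kept))  # restore chronological order


thread = [f"message {i}" for i in range(1, 51)]  # 50 emails
counts = [120] * 50                              # 6,000 tokens in total
visible = trim_oldest(thread, counts, budget=4096)
print(len(visible), "messages survive; the first one is", visible[0])
```

With these numbers only the last 34 messages survive, so the first 16, including the one that opened the thread, never reach the model.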

### How to handle context limits

- **Summarize first, reason second.** Use a first AI node to reduce a long document to 300–500 tokens. Feed that summary to the next node.
- **Keep system prompts lean.** A 2,000-token system prompt burns 2,000 tokens of context on every single call.
- **Use retrieval (RAG) for large corpora.** Instead of feeding an entire knowledge base, retrieve only the two or three most relevant chunks, then feed those.
- **Pick the right model for the job.** A high-context model is not always better — it is often slower and more expensive. Use it only when you actually need the window.
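
For the chunk-then-summarize approach, a simple word-based splitter is often enough to get started. The sketch below assumes the 75-words-per-100-tokens rule of thumb; `chunk_by_words` is a hypothetical helper, and a production version would more likely split on paragraph or sentence boundaries.

```python
def chunk_by_words(text: str, max_tokens_per_chunk: int) -> list[str]:
    """Split text into chunks that each stay under a rough token budget."""
    words = text.split()
    # ~0.75 words per token, so a 400-token budget allows about 300 words.
    words_per_chunk = max(1, int(max_tokens_per_chunk * 0.75))
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


long_document = " ".join(["word"] * 1000)  # stand-in for a 1,000-word document
chunks = chunk_by_words(long_document, max_tokens_per_chunk=400)
print(len(chunks), "chunks;", len(chunks[-1].split()), "words in the last one")
```

Each chunk can then go through a summarizing AI node, and the combined summaries feed the node that does the actual reasoning.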

---

## What is Temperature?

**Temperature** controls how deterministic or creative the model's output is. It is a number, typically between `0` and `2`, although some providers cap it at `1` and the useful range for most tasks is `0` to `1.5`.

Internally, the model computes a score (a logit) for every candidate next token. Temperature divides those scores before they are converted into probabilities and one token is sampled. A low temperature makes the most likely token overwhelmingly favored. A high temperature flattens the distribution, giving less-likely tokens a real shot.
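
The effect is easy to see with a toy example. The sketch below applies the standard softmax-with-temperature formula to three made-up scores; real models do this over tens of thousands of tokens, and providers may add other sampling controls on top.

```python
import math


def softmax_with_temperature(logits: list[float],
                             temperature: float) -> list[float]:
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled as "always pick the top token".
        raise ValueError("temperature must be greater than 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At `T=0.2` the top token takes almost all of the probability mass; at `T=2.0` the three options move much closer together.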

| Temperature | Behavior | Best for |
|---|---|---|
| `0.0` | Almost deterministic; picks the highest-probability token nearly every time | Classification, extraction, structured JSON output, routing |
| `0.1 – 0.4` | Consistent but slightly flexible | Summarization, data cleaning, customer support replies |
| `0.5 – 0.8` | Balanced creativity and coherence | Marketing copy, email drafts, general writing |
| `0.9 – 1.2` | More varied; occasional surprising word choices | Brainstorming, creative variations, story generation |
| `> 1.2` | Unpredictable; may lose coherence | Rarely useful in production |

### Temperature rules of thumb

**Use `0` for structured output.** If your AI node is supposed to return JSON with specific keys, high temperature is your enemy. The model may decide to "get creative" and invent a key name, break the JSON structure, or add commentary. Set temperature to `0` and add a strict JSON schema in your system prompt.
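
Even at temperature `0`, it is worth validating the reply before the next node consumes it. The following is a minimal sketch using only the standard library; `parse_strict_json` and the example keys are hypothetical, and a full JSON Schema validator would be the sturdier choice.

```python
import json


def parse_strict_json(reply: str, required_keys: set[str]) -> dict:
    """Parse a model reply and insist on exactly the expected keys."""
    # json.loads raises json.JSONDecodeError (a ValueError) on prose or broken JSON.
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    unexpected = set(data) - required_keys
    if missing or unexpected:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(unexpected)}"
        )
    return data


expected = {"category", "priority"}
ticket = parse_strict_json('{"category": "billing", "priority": "high"}', expected)
print(ticket["category"])
```

A reply with added commentary or an invented key raises a `ValueError`, which the workflow can catch to retry or route the item for review.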

**Use `0.7` as the starting default.** Most general-purpose tasks work well here. Adjust up if the output feels robotic; adjust down if it feels inconsistent.

**Do not conflate temperature with quality.** A common misconception: "higher temperature = better/smarter output." Temperature only changes variability, not capability. A model at temperature `1.5` is not smarter than the same model at `0.3` — it is just more unpredictable.

---

## When Your AI Node Misbehaves: A Diagnostic Checklist

Before you change the model or rebuild the workflow, run through this:

1. **Output cuts off mid-sentence** → Check `max_tokens`. If you set it too low, the model stops generating when it hits the limit. Raise it. Also check that the *input* is not close to the context limit.
2. **Model "forgot" something from earlier in the conversation** → You have exceeded the context window. The model is not hallucinating; it genuinely cannot see the dropped content. Summarize or chunk.
3. **JSON output is malformed or has random extra text** → Temperature is too high, or you are missing a strict output instruction in the system prompt. Set temperature to `0` and add `Return ONLY valid JSON, no prose.` to the prompt.
4. **Output is repetitive or eerily uniform** → Temperature may be at `0` for a creative task. Bump to `0.6–0.8`.
5. **Output is inconsistent run-to-run on the same input** → Expected at temperature > 0. If you need reproducibility, drop temperature to `0` and set a fixed seed if the model supports it. Even then, treat identical output as likely rather than guaranteed.
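
The first check on the list can often be automated. OpenAI-style chat completion responses report a `finish_reason` for each choice, where `"length"` means generation stopped at a token limit and `"stop"` means the model ended on its own. The sketch below assumes that convention; other providers use different field names and values, and `diagnose_cutoff` is a hypothetical helper.

```python
def diagnose_cutoff(finish_reason: str) -> str:
    """Map an OpenAI-style finish_reason to a next step."""
    if finish_reason == "length":
        return "truncated: raise max_tokens or shorten the input"
    if finish_reason == "stop":
        return "complete: the model ended on its own"
    return f"other: inspect finish_reason={finish_reason!r}"


print(diagnose_cutoff("length"))
```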

---

## How to Experiment Freely on AgentRoost

Understanding these three knobs is useful in theory. What makes it practical is being able to actually try different configurations without worrying about a surprise billing event at the end of the month.

On AgentRoost, **AI/LLM credits are included in your subscription**. You are not bringing your own API key — the credits are already there, already paid for. This matters specifically when you are learning: running 30 test completions at different temperatures to see how the output changes costs you nothing extra.

Here is how a practical experiment looks on AgentRoost:

1. **Sign up** at [agentroost.app](https://agentroost.app) and pick the **n8n** framework.
2. Name your instance. Your private n8n editor opens at `https://<your-id>.agentroost.app` — no extra setup, no Docker, no SSL configuration.
3. Add a **Chat Model** node (or any AI/LLM node). The credential is pre-configured — no API key entry required.
4. Switch models from the dropdown. You have access to 350+ models. Run the same prompt against a 4 K-context model and a 200 K-context model. See the difference immediately.
5. Adjust `temperature` in the node parameters. Run your prompt. Adjust again. The credits are included, so every test iteration costs nothing on top of your monthly plan.

This is exactly the kind of friction-free experimentation that is hard to do when every test call drains a pay-as-you-go balance. At $19.99/mo all-in with a 14-day money-back guarantee, you can run dozens of experiments before deciding what model and configuration actually fits your workflow.

[Compare plans](/en/pricing) — or go straight to the [n8n agent page](/en/agents/n8n) to see what is included.

---

## Quick Reference

| Concept | Controls | When to change |
|---|---|---|
| **Tokens** | Size of input + output | Chunk long inputs; set `max_tokens` to avoid runaway generation |
| **Context window** | How much the model can "see" | Choose a larger-window model for long documents; summarize for small-window models |
| **Temperature** | Determinism vs. creativity | `0` for structured output; `0.7` for general tasks; `1.0+` for brainstorming |

Master these three and most AI node misbehavior becomes diagnosable in under a minute.