---
title: "Summarize Long Emails and PDFs with AI in n8n"
description: "Step-by-step recipe for an n8n AI summarization workflow that digests long emails and PDFs automatically using a map-reduce pattern. No API key needed."
canonical: https://agentroost.app/en/blog/ai-summarization-emails-pdfs-n8n
date: 2026-05-16T04:00:00Z
---


Inbox zero is a myth. But inbox *comprehension* is achievable — if you stop reading every 3 000-word thread in full and start routing them through an AI that extracts what actually matters.

This guide walks you through a concrete n8n workflow that watches your email, detects long messages and PDF attachments, chunks them correctly, runs them through an AI summarization step, and delivers a tight digest wherever you want it. We will also cover the map-reduce trick that keeps the workflow accurate even when the input exceeds a model's context window.

---

## Why "just paste it into ChatGPT" doesn't scale

Manual copy-paste breaks the moment you have more than a handful of long emails per day. You want:

- **Automatic triggering** — no human in the loop.
- **Reliable chunking** — one prompt per 40-page PDF will hallucinate or silently truncate.
- **Structured output** — a digest you can act on, not a wall of re-phrased text.
- **Delivery** — the summary has to land where you will actually see it.

n8n handles all four. The AI/LLM node does the language work; the surrounding nodes do the plumbing.

---

## The workflow overview

```
Schedule Trigger (or Gmail Trigger)
  └─► Gmail: Get Many Messages  ──► IF: body length > 1 500 chars?
                                          │ Yes
                                          ▼
                                   Loop Over Items (chunks)
                                          │
                                   AI/LLM Node — "Summarize this chunk"
                                          │
                                   Aggregate (collect partial summaries)
                                          │
                                   AI/LLM Node — "Combine into final digest"
                                          │
                                   Gmail: Send / Slack / HTTP Request
```

Short emails (under the threshold) skip the loop and go straight to the final summarization step. Long emails and extracted PDF text go through the map-reduce branch.

---

## Step 1 — Trigger: choose poll or event

**Option A — Schedule Trigger (recommended for batches)**

Add a **Schedule Trigger** node. Set it to run every hour or at a fixed morning time. Connect it to a **Gmail** node (action: *Get Many Messages*). In the Gmail node:

- **Filters → Label**: `INBOX`
- **Filters → After date**: use `{{ $now.minus({hours: 1}).toISO() }}` so each hourly run fetches only new mail. Match the window to your schedule, for example `{hours: 24}` for a once-a-day run.
- **Return All**: off — set a sane limit (25–50) to avoid hitting API rate limits.

**Option B — Gmail Trigger (real-time)**

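If you need near-instant processing, replace both nodes with a single **Gmail Trigger** node on the *Message Received* event. The trigger polls Gmail on the interval you configure (as often as every minute), so new messages are picked up shortly after they arrive rather than at the next scheduled batch.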

---

## Step 2 — Extract PDF attachments (optional branch)

Add an **IF** node after the Gmail node that checks:

```
{{ (($json.payload || {}).parts || []).some(p => p.mimeType === 'application/pdf') }}
```

On the `true` branch:
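1. **Gmail: Get** (message by ID) with the *Download Attachments* option enabled, so the first PDF attachment arrives as binary data. Option names may differ slightly between n8n versions.
2. **Extract from File** node (operation: *Extract From PDF*), which outputs the PDF text as `{{ $json.text }}`.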
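
The PDF check in the IF node only looks at top-level parts, while Gmail payloads can nest attachments inside other multipart containers. The sketch below is one way to make the check recursive, for example inside a Code node that sets a boolean flag for the IF node. `hasPdfPart` and `samplePayload` are hypothetical names used for illustration, and the payload shape is assumed to follow the Gmail API's `payload.parts` structure.

```javascript
// Hypothetical helper: true if any part of a Gmail message payload,
// at any nesting depth, has the PDF MIME type.
function hasPdfPart(payload) {
  if (!payload) return false;
  if (payload.mimeType === 'application/pdf') return true;
  // Multipart containers keep their children in `parts`; leaf parts have none.
  return (payload.parts || []).some((part) => hasPdfPart(part));
}

// Sample payload shaped like a multipart/mixed message with one PDF attachment.
const samplePayload = {
  mimeType: 'multipart/mixed',
  parts: [
    {
      mimeType: 'multipart/alternative',
      parts: [{ mimeType: 'text/plain' }, { mimeType: 'text/html' }],
    },
    { mimeType: 'application/pdf', filename: 'report.pdf' },
  ],
};

console.log(hasPdfPart(samplePayload));              // true
console.log(hasPdfPart({ mimeType: 'text/plain' })); // false
```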

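Write the text to summarize into a single field called `content` using a **Set** node. The expression below prefers the extracted PDF text and falls back to the email's `snippet`. Note that Gmail's `snippet` is only a short preview of the message, so to summarize full email bodies, point the fallback at whichever field holds the complete body in your Gmail node's output: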

```json
{
  "content": "{{ $json.text || $json.snippet }}",
  "subject": "{{ $('Gmail').item.json.payload.headers.find(h => h.name==='Subject').value }}"
}
```
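
If you would rather keep both the email body and the PDF text instead of choosing one, the same step can be sketched in a Code node. This is a minimal illustration under stated assumptions: `buildContentItem`, `bodyText`, and `pdfText` are hypothetical names, and you would map them to whatever fields your Gmail and Extract from File nodes actually emit. It also avoids an error when a message has no Subject header.

```javascript
// Hypothetical helper: builds the `content` and `subject` fields for one email.
function buildContentItem({ bodyText, pdfText, headers }) {
  // Header names are matched case-insensitively; a missing Subject is tolerated.
  const subjectHeader = (headers || []).find(
    (h) => h.name.toLowerCase() === 'subject'
  );
  // Keep every non-empty source so the PDF text does not hide the email body.
  const content = [bodyText, pdfText]
    .filter((part) => typeof part === 'string' && part.trim() !== '')
    .join('\n\n');
  return {
    content,
    subject: subjectHeader ? subjectHeader.value : '(no subject)',
  };
}

const exampleItem = buildContentItem({
  bodyText: 'Please review the attached contract.',
  pdfText: 'Contract terms: net 30.',
  headers: [{ name: 'Subject', value: 'Contract draft' }],
});
console.log(exampleItem.subject); // Contract draft
console.log(exampleItem.content);
```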

---

## Step 3 — Decide: short or long content?

Add an **IF** node:

- **Condition**: `{{ $json.content.length }} > 1500`

Strings shorter than ~1 500 characters (roughly 300 words) comfortably fit in a single prompt. Route them directly to Step 5. Everything longer goes to the chunking loop.

---

## Step 4 — Map-reduce chunking for long content

### 4a. Split into chunks

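Add a **Code** node (JavaScript, mode *Run Once for All Items*) to split each item's `content` into overlapping windows:

```js
// Runs once for all items and returns one output item per chunk.
const chunkSize = 2000;   // characters (~500 tokens)
const overlap   = 200;    // characters shared by neighbouring chunks
const chunks    = [];
for (const item of $input.all()) {
  const text = item.json.content || '';
  let start = 0;
  while (start < text.length) {
    chunks.push({ json: { chunk: text.slice(start, start + chunkSize) } });
    if (start + chunkSize >= text.length) break; // window reached the end of the text
    start += chunkSize - overlap;
  }
}
return chunks;
```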

Connect the Code node to a **Loop Over Items** node (batch size: 1) so each chunk is processed individually.
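
To estimate how many AI calls a document will cost before you run it, the splitting logic can be written as a plain function and checked outside n8n. The sketch below is a standalone variant that stops once a window reaches the end of the text, so it never emits a trailing chunk made only of overlap. `splitIntoChunks` and `expectedChunkCount` are hypothetical names, and the 3 000 characters per page figure is an assumption for illustration only.

```javascript
// Standalone splitter for testing outside n8n. All sizes are in characters.
function splitIntoChunks(text, chunkSize = 2000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
    start += chunkSize - overlap;
  }
  return chunks;
}

// Closed-form count for the same loop: one window, plus one more per stride
// needed to cover whatever the first window did not reach.
function expectedChunkCount(length, chunkSize = 2000, overlap = 200) {
  if (length === 0) return 0;
  if (length <= chunkSize) return 1;
  return 1 + Math.ceil((length - chunkSize) / (chunkSize - overlap));
}

// Assumed input: a 40-page PDF at about 3 000 characters per page.
const samplePdfText = 'x'.repeat(40 * 3000);
const sampleChunks = splitIntoChunks(samplePdfText);
console.log(sampleChunks.length, expectedChunkCount(samplePdfText.length)); // 67 67
```

Under those assumptions, a 40-page PDF means 67 map calls plus one reduce call, which is a useful number to have in mind when choosing a model and a schedule.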

### 4b. Summarize each chunk (the "map" step)

Inside the loop, add an **AI/LLM** node (or the **Summarization Chain** node if you prefer the pre-built tool; n8n's similarly named **Summarize** node only aggregates data and does not call a model). Configure it:

- **Prompt**:
  ```
  You are a concise business summarizer.
  Summarize the following excerpt in 3-5 bullet points.
  Focus on decisions, action items, and key figures.

  Excerpt:
  {{ $json.chunk }}
  ```
- **Model**: any model available in the node — GPT-4o mini works well for cost/quality at this task.
- **Output field**: `partialSummary`

### 4c. Aggregate partial summaries (the "reduce" step)

After the loop, add an **Aggregate** node:

- **Field to aggregate**: `partialSummary`
- **Aggregate function**: *Concatenate with separator* → `\n`

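This produces a single item whose `data` field holds all chunk summaries. If the Aggregate node in your n8n version returns them as a list rather than one joined string, join them in the next expression with `{{ $json.data.join('\n') }}`.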
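
Seen end to end, the map and reduce steps amount to the following control flow. This is a sketch of the pattern, not of n8n internals: `mapReduceSummarize` is a hypothetical function, and `summarize` is an injected stand-in for the AI node so the example runs without any API.

```javascript
// Sketch of the map-reduce flow with an injected summarizer.
// `summarize(prompt)` is a hypothetical async function standing in for the AI node.
async function mapReduceSummarize(chunks, summarize) {
  // Map: one call per chunk, sequentially, like Loop Over Items with batch size 1.
  const partials = [];
  for (const chunk of chunks) {
    const prompt =
      'Summarize the following excerpt in 3-5 bullet points.\n\nExcerpt:\n' + chunk;
    partials.push(await summarize(prompt));
  }
  // Reduce: join the partial summaries and ask for one combined digest.
  const notes = partials.join('\n');
  const digest = await summarize(
    'Turn the following notes into a single summary of 5-8 bullet points.\n\nNotes:\n' + notes
  );
  return { partials, digest };
}

// Stub summarizer: reports the prompt length instead of calling a model.
const fakeSummarize = async (prompt) => `- summary of ${prompt.length} chars`;

mapReduceSummarize(['first chunk', 'second chunk'], fakeSummarize).then((result) => {
  console.log(result.partials.length); // 2
  console.log(result.digest);
});
```

The number of model calls is always the number of chunks plus one, which is why the chunk size chosen in step 4a directly drives cost.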

---

## Step 5 — Final digest prompt

Add a second **AI/LLM** node connected to both branches (long content via Aggregate, short content directly):

```
You are an executive assistant. Turn the following notes into a
single, crisp summary of 5-8 bullet points. Group related items.
Remove duplicates. Highlight any required action or deadline.

Notes:
{{ $json.data || $json.content }}
```

Add a **Set** node after it to store the output cleanly:

```json
{
  "digest": "{{ $json.text }}",
  "subject": "{{ $('Set').item.json.subject }}"
}
```
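
The `{{ $json.data || $json.content }}` fallback in the prompt works when `data` is a plain string. If you prefer to normalize the input first, a small Code node placed before the digest prompt could do it. The sketch below is one possible version under that assumption; `notesForDigest` is a hypothetical name, and it accepts `data` either as a string or as a list of partial summaries.

```javascript
// Hypothetical helper: picks the text to feed into the final digest prompt.
function notesForDigest(json) {
  // Long branch: `data` may arrive as one string or as a list of summaries.
  if (Array.isArray(json.data)) return json.data.join('\n');
  if (typeof json.data === 'string' && json.data !== '') return json.data;
  // Short branch: there is no `data` field, so fall back to the raw content.
  return json.content || '';
}

console.log(notesForDigest({ data: ['- point a', '- point b'] }));
console.log(notesForDigest({ content: 'Short email body' })); // Short email body
```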

---

## Step 6 — Deliver the digest

Pick your channel:

| Destination | Node | Key config |
|---|---|---|
| Email to yourself | **Gmail: Send Email** | To: your address; Body: `{{ $json.digest }}` |
| Slack | **Slack: Post Message** | Channel: `#digest`; Text: `*{{ $json.subject }}*\n{{ $json.digest }}` |
| Notion | **Notion: Create Page** | Database ID + properties map |
| Telegram | **HTTP Request** | `POST https://api.telegram.org/bot<TOKEN>/sendMessage` with `chat_id` and `text` in the body |

For the Telegram option: if you are running the Hermes or OpenClaw frameworks on AgentRoost, your Telegram bot is already provisioned — paste the outbound URL and you are done.
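
For reference, the Telegram request configured in the HTTP Request node looks roughly like this when written out by hand. The sketch only builds the request and does not send it. `buildTelegramRequest` is a hypothetical helper, and the token and chat ID are placeholders. The trimming reflects Telegram's 4096-character limit on a single text message.

```javascript
// Telegram rejects text messages longer than 4096 characters.
const TELEGRAM_MAX = 4096;

// Hypothetical helper: builds the URL and options for a sendMessage call.
function buildTelegramRequest(token, chatId, subject, digest) {
  const text = `${subject}\n${digest}`;
  return {
    url: `https://api.telegram.org/bot${token}/sendMessage`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        chat_id: chatId,
        // Trim oversized digests so the API does not reject the message.
        text: text.length > TELEGRAM_MAX ? text.slice(0, TELEGRAM_MAX - 1) + '…' : text,
      }),
    },
  };
}

// Placeholder token and chat ID; replace them with your own values.
const telegramRequest = buildTelegramRequest(
  '<TOKEN>', 123456789, 'Q3 budget thread', '- Approve by Friday'
);
console.log(telegramRequest.url);
// To send it from plain Node.js: await fetch(telegramRequest.url, telegramRequest.options);
```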

---

## Tips and pitfalls

- **Overlap matters.** Without the 200-character overlap between chunks, sentences that fall on a boundary get cut in half and the AI misses the point. Keep at least 10% overlap.
- **System prompt placement.** Put your persona and formatting instructions in the *System* field of the AI node, not inline in the user prompt. It produces more consistent output.
- **Rate limiting.** Gmail's API allows 250 quota units per user per second, and reading a single message costs 5 units. Fetching 50 messages with full body content is safe; raising the limit to 200 without a Wait node between pages can exceed the quota.
- **Token counting.** 2 000 characters is roughly 500 tokens for English text. If you switch to a smaller context model, reduce `chunkSize` accordingly.
- **Deduplication.** Store processed message IDs in a **Redis** node (or a Google Sheet if you prefer no-infra) and skip them in an IF node at the top of the workflow.
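
The deduplication tip comes down to a membership check against the set of IDs you have already processed. The sketch below uses an in-memory `Set` as a stand-in for Redis or a Google Sheet, so it illustrates the logic only and does not persist anything between runs. `filterUnprocessed` is a hypothetical name.

```javascript
// In-memory stand-in for the Redis or Google Sheet store of processed IDs.
function filterUnprocessed(messages, seenIds) {
  const fresh = [];
  for (const message of messages) {
    if (seenIds.has(message.id)) continue; // already summarized earlier
    seenIds.add(message.id);               // remember it for later runs
    fresh.push(message);
  }
  return fresh;
}

const seen = new Set(['msg-1']);
const batch = [{ id: 'msg-1' }, { id: 'msg-2' }, { id: 'msg-2' }, { id: 'msg-3' }];
console.log(filterUnprocessed(batch, seen).map((m) => m.id)); // [ 'msg-2', 'msg-3' ]
```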

---

## Run it on AgentRoost — your own n8n, no DevOps

The workflow above works in any n8n instance. The friction is usually everything *around* n8n: provisioning a server, keeping it online, managing SSL, and — most commonly — sourcing and funding an OpenAI or Anthropic API key.

On **AgentRoost** those problems disappear at once:

1. **[Get started](/en/agents/n8n)** — create your account and pick the n8n framework.
2. **Name your instance.** Your private n8n editor opens at `https://<your-id>.agentroost.app` — your login, your data, your workflows.
3. **The AI/LLM node already has credits wired in.** Open the node, pick a model, run the workflow. No API key to obtain, no separate billing to set up.

Plans [start at $19.99/mo](/en/pricing) — compute, instance, and AI credits bundled. Monthly billing, cancel anytime, 14-day money-back guarantee.

Every competitor that offers n8n hosting (Elestio, Sliplane, Hostinger) requires you to bring your own API key. The AI nodes sit there disabled until you fund an external account. On AgentRoost the AI nodes are ready the moment your instance boots.

[Compare plans and get started →](/en/pricing)