You paste 50 pages of campaign notes into ChatGPT and ask about that merchant the party met in Millhaven. You know the answer is in there somewhere. And ChatGPT responds with something vague about "checking your Session 3 notes" instead of telling you it was Corvus the Wanderer, the Shadow Guild informant with the crooked smile.
We've all been there. You wrote detailed notes. The AI can technically read them. So why does it keep forgetting the good stuff?
Turns out, Stanford researchers found the answer: AI models are surprisingly bad at recalling information from the middle of long prompts. The phenomenon, called "Lost in the Middle," can cut accuracy by 50% or more when the detail the AI needs is buried mid-input. For chronological session notes, that's almost always where the important details live.
Your campaign details aren't being ignored because you wrote bad notes. They're being ignored because of how AI models process long inputs.
But there's a solution that doesn't involve smaller campaigns or better note-taking. It's called Retrieval-Augmented Generation, and it changes how AI interacts with campaign knowledge.
The Memory Problem All GMs Face
Here's the uncomfortable truth: TTRPGs generate information faster than any human can reliably track it. Every session creates:
- 3-5 new NPCs (even if we planned for two)
- Plot threads our players care about (and several we thought they'd ignore)
- World details we improvised on the spot
- Connections between characters, factions, and locations
- That one offhand comment players will definitely remember in six months
After 10 sessions, we have a novel's worth of content. After 50? We're managing a small encyclopedia. And unlike published adventures where someone else organized everything with an index, our campaigns are organic messes of Google Docs, notebooks, scattered text files, and that voice memo we recorded at 2 AM.
The natural solution seems obvious: AI tools like ChatGPT can remember things, right? Just dump our campaign notes in and ask questions whenever we need to recall something.
Except it doesn't work the way we'd hope.
Why Does Dumping Everything Into ChatGPT Fail?
Let's say we've got 50 pages of campaign notes. We paste them into ChatGPT and ask, "Who was the suspicious merchant the party met in Millhaven?"
What we want: "That was Corvus the Wanderer, a Shadow Guild informant who offered them a map to the Forgotten Crypts. He had a crooked smile and mentioned 'the old debts always come due,' which connects to the party's patron owing money to the Guild."
What we often get: "Based on your notes, there was a merchant in Millhaven. You might want to check your Session 3 notes for more details."
Why the disconnect? Here are the problems with the "dump everything into ChatGPT" approach:
Problem 1: Context Windows Hit Walls (Even Big Ones)
Modern AI models have impressive context windows—GPT-4 can handle 128,000 tokens, Claude pushes past 200,000. That sounds enormous until we realize that 100,000 tokens is roughly 75,000 words. A moderately detailed campaign hits that limit faster than we'd expect.
More importantly, even when our notes technically fit, we're burning through available context on static information rather than using it for the actual creative work we need help with. Every token spent on context is a token we can't use for generating that NPC personality or plot twist.
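For a back-of-the-envelope sense of that budget, the ~0.75 words-per-token rule of thumb for English text (an approximation that varies by tokenizer and content, not any specific model's tokenizer) can be sketched as:

```python
def estimate_tokens(word_count: int) -> int:
    # Rough rule of thumb for English prose: about 0.75 words per
    # token, so tokens ~= words / 0.75. Real tokenizers (tiktoken,
    # etc.) vary by model and by the text itself.
    return round(word_count / 0.75)

# A 50-page campaign doc at ~500 words per page:
print(estimate_tokens(50 * 500))   # roughly 33,000 tokens
# The reference point above: 75,000 words
print(estimate_tokens(75_000))     # 100,000 tokens
```

Even a modest campaign document eats a third of a 100k-token window before we've asked a single question.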
Problem 2: Lost in the Middle (Why Your Best Notes Get Ignored)
Here's where it gets frustrating. Stanford researchers discovered that AI models are great at remembering information at the beginning and end of what we give them, but information in the middle? It gets lost.
The phenomenon, appropriately called "Lost in the Middle," shows that even models designed for long context windows significantly underperform when the relevant information appears in the middle of their input. In controlled studies, performance dropped dramatically—sometimes by 50% or more—when the answer was buried in the middle of a long document, even when the total length was well within the model's technical limits.
Think about it like this: Imagine telling someone a long, detailed story and then asking them to recall a specific detail from the middle. Most people remember the beginning and the ending clearly, but that bit from the middle? Fuzzy at best.
Our campaign notes aren't neatly organized with the most important information at the start and end. The detail about Corvus the merchant from Session 3 is buried somewhere in the middle of chronological session notes, surrounded by combat encounters, side quests, and tavern descriptions. The AI might see it, but it's not prioritizing it when we ask.
Problem 3: Context Rot—Quality Degrades With Length
Research in 2024 and 2025 uncovered another issue: AI performance degrades as prompt length increases, even when staying well within technical limits. This phenomenon, called "context rot," shows that LLMs struggle with reasoning, accuracy, and relevance as we feed them longer inputs.
Studies found performance drops beginning around 3,000-5,000 tokens—far below the models' maximum capacity—and continuing to decline as context grows. The models become more prone to hallucinations, less coherent in their responses, and more likely to miss relevant information entirely.
Think of it like trying to have a conversation in an increasingly noisy room. Technically we can still hear the person talking, but the quality of communication degrades. Our campaign notes create that noise, and the AI starts giving vaguer, more generic responses because it's struggling to focus on what matters.
We've all experienced this—asking ChatGPT for help with our campaign and getting responses that feel... off. They're not wrong, exactly, but they don't sound like our world.
Problem 4: Generic Responses That Forget Your World
Even when ChatGPT does recall details from our notes, the responses often feel generic. We ask for an NPC to fill an unexpected scene, and get "Eldrin the Wise, an elderly elf wizard with a long white beard and mysterious past."
That's not wrong. It's also not OUR campaign. Where are the naming conventions we've established? Our world's tone? The specific details that make our campaigns feel cohesive? ChatGPT is trained on generic fantasy tropes, and without smart retrieval, it defaults to them instead of pulling from our specific world's identity.
What Is RAG and How Does It Help?
This is where Retrieval-Augmented Generation (RAG) solves the problem. The technical term sounds intimidating, so let's break it down in GM-friendly terms:
RAG is like having a librarian who finds exactly the right book pages we need, instead of dumping the entire library on our desk and hoping we spot the relevant paragraph.
Here's how it actually works:
Step 1: Building a Searchable Memory
When we upload campaign documents to a RAG-powered knowledge base, the system doesn't just store them as static files. It processes them into something smarter: semantic embeddings.
Think of this as the AI reading our notes and understanding not just the words, but the concepts and relationships. It knows that "Corvus the merchant" is related to "Shadow Guild," "Millhaven," "suspicious map," and "Session 3" not because these words appear near each other, but because it understands their conceptual connections.
These relationships get stored in what's called a vector database—a specialized system designed to organize information by meaning and context rather than just keywords.
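To make that concrete, here's a deliberately simplified sketch in Python. The `embed` function below is a toy word-hashing stand-in (a real system would call a learned embedding model), and the note text is invented for illustration:

```python
import math
import re
import zlib
from collections import Counter

def embed(text: str, dims: int = 256) -> list[float]:
    # Toy stand-in for a real embedding model: hash each word into a
    # fixed-size vector. Production systems use learned semantic
    # embeddings, which capture meaning rather than word overlap.
    vec = [0.0] * dims
    for word, count in Counter(re.findall(r"[a-z0-9']+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % dims] += count
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: values near 1.0 mean very similar vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# A minimal "vector database": each note chunk stored with its vector.
notes = [
    "Session 3: Corvus the Wanderer, a Shadow Guild informant, "
    "offered the party a suspicious map in Millhaven.",
    "Session 4: the party fought gnolls on the river road.",
]
store = [(chunk, embed(chunk)) for chunk in notes]

# Rank stored chunks against a question instead of keyword-matching.
query = embed("suspicious merchant in Millhaven")
best = max(store, key=lambda item: cosine(query, item[1]))
print(best[0])  # the Corvus chunk ranks highest
```

The point is the shape of the system, not this toy math: every chunk of notes becomes a vector, and questions are answered by vector similarity rather than exact keyword matching.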
Step 2: Smart Retrieval Instead of Bulk Loading
When we ask "Who was the suspicious merchant in Millhaven?", the RAG system doesn't dump all 50 pages of notes into the AI's context window. Instead, it:
- Understands what we're asking for semantically (not just matching the word "merchant")
- Searches through our knowledge base for the most relevant information
- Retrieves only the specific passages that matter—Corvus's description, his connection to the Shadow Guild, the relevant session notes
- Sends this targeted context to the AI along with our question
The AI gets maybe 500-1,000 tokens of highly relevant context instead of 50,000 tokens of everything. It stays focused, accurate, and specific to our campaign.
Step 3: Campaign-Specific Responses
Because the AI is working with carefully selected information from OUR world, its responses stay true to our campaign's identity. It uses our naming conventions, references our established factions, and maintains the tone we've built over dozens of sessions.
Ask for an NPC in Millhaven, and instead of "Eldrin the Wise," we get someone who fits our world's style—because the AI is drawing from our actual NPCs, not generic fantasy tropes.
How It Works in ScriptoriumGM: A Real Scenario
Here's what this looks like in practice, with a scenario that probably sounds familiar:
The Setup: We're running Session 18 of an urban intrigue campaign. The party just mentioned wanting to follow up on rumors about corrupt city guards—something we completely forgot we'd introduced as background flavor in Session 4.
The Old Way (ChatGPT dump):
- Open session notes folder
- Scroll through multiple documents looking for "guards" or "corruption"
- Copy 15,000 words of potentially relevant content into ChatGPT
- Ask about the corrupt guards subplot
- Get a generic response suggesting we create a new storyline about corrupt guards
- Improvise something on the spot that doesn't quite connect to what we'd established
- Make a mental note to organize better (that we'll forget by next week)
The RAG Way (ScriptoriumGM):
- Ask the campaign assistant: "What did I establish about corrupt guards?"
- The system searches the knowledge base semantically
- Retrieves the relevant passage from Session 4 where we mentioned Captain Thorne taking bribes from the Silk Merchants' Guild
- AI responds: "You introduced Captain Thorne of the East Watch in Session 4. He's been taking bribes from the Silk Merchants' Guild to look the other way on smuggling operations. The party overheard rumors about this in the Broken Anchor tavern but didn't pursue it at the time."
- Ask: "Create a scene where the party confronts evidence of this corruption"
- Get a scenario that fits our established world, references our actual NPCs and locations, and builds on what we already created
Time saved: 15-20 minutes of searching and context-switching.
Quality gained: Continuity with our established campaign instead of improvised retcons.
Real-World Scenarios Where RAG Saves You
A few more examples of when this saves your session:
Scenario 1: The Callback Your Players Remember (But We Don't)
The Moment: Session 32. The bard says, "I want to contact that information broker we met at the start of the campaign—the one who owed us a favor."
Without RAG: Panic. We vaguely remember an information broker. Was it in Session 2? Session 5? What was their name? What was the favor about? We improvise something and hope it's close enough to what we actually established.
With RAG: "Who was the information broker who owed the party a favor?" The knowledge base immediately pulls up Silas Venn from Session 2, including his personality traits, his connection to the Thieves' Quarter, and the specific terms of the favor (information about rival guild movements in exchange for keeping his involvement in a botched heist quiet). We reference him accurately and our players feel heard because we remembered the detail they cared about.
Scenario 2: The Unplanned Scene That Needs NPCs
The Moment: The players decide to visit the Temple of the Dawn for healing instead of going straight to the dungeon we prepped. We need a priest NPC right now, and that priest needs to feel like they belong in our world.
Without RAG: We create Brother Whoever, a generic cleric who heals them and doesn't quite sound like the other religious NPCs in our campaign. Players get what they need but the world feels slightly less cohesive.
With RAG: Ask the assistant to generate a Temple priest. Because it has access to our knowledge base, it knows our world's religious structure, naming conventions, and the personality style of other clergy we've introduced. We get Father Aldric, who speaks with the formal dialect we established for Dawn worshippers, references the temple's recent struggles with funding (mentioned in Session 11 notes), and feels like a natural part of our world.
Scenario 3: Building on Forgotten Threads
The Moment: We're prepping Session 40 and realize the players are headed to the northern territories. We know we mentioned something about political tensions there, but that was 20 sessions and three months ago.
Without RAG: We re-read old session notes, try to piece together what we established, and hope we're not contradicting ourselves. This takes an hour of limited prep time.
With RAG: "What have I established about politics in the northern territories?" The knowledge base pulls every relevant mention: the succession crisis in Session 18, the border disputes mentioned in Session 22, the NPC who fled from there in Session 27. We get a consolidated picture in 30 seconds and can build on our existing foundation instead of accidentally retconning or forgetting important details.
Scenario 4: Maintaining Voice and Tone
The Moment: Our homebrew world has a specific feel—maybe it's dark and gritty, or whimsical and fairytale-like. We need to generate content quickly but want it to match our established tone.
Without RAG: Generic AI tools give us responses based on common fantasy tropes. We spend time rewriting to match our voice, or worse, gradually drift away from our original vision because generic is easier.
With RAG: Because the system learns from our existing campaign writing, its suggestions naturally match our tone. A gritty noir campaign gets morally gray NPCs and harsh consequences. A whimsical campaign gets playful descriptions and lighthearted complications. The AI adapts to our style instead of imposing its defaults.
Time Saved & Confidence Gained
Time matters when you're prepping between work, family, and actually having a life:
Traditional approach to campaign recall:
- Searching through documents: 5-15 minutes per question
- Re-reading to find context: 10-30 minutes
- Risk of missing information: High
- Quality of recall: Depends on note organization
RAG-powered knowledge base:
- Retrieving specific information: 10-30 seconds
- Getting contextual connections: Included automatically
- Risk of missing information: Low (semantic search finds related content)
- Quality of recall: Consistent and comprehensive
Over a campaign, those minutes add up to hours—hours we could spend on creative prep, or more importantly, not prepping at all because we have lives outside GMing.
But the bigger win isn't time. It's confidence.
We know that feeling when a player references something from months ago and we remember it accurately? When the world feels cohesive because we built on previous threads instead of contradicting them? When NPCs feel like they belong to the same world instead of random generators?
That's the confidence that comes from reliable campaign memory. We're not frantically searching or hoping we remember correctly. We know we can access what we've established, build on it, and keep our world feeling lived-in and consistent.
How Do You Get Started Building a Knowledge Base?
If you have 30 sessions' worth of scattered notes and this sounds overwhelming, relax: you don't need to start with a perfect system.
Week 1: Current Campaign State
Upload what matters right now:
- Main NPCs (the ones players interact with regularly)
- Current plot threads and active quests
- Key locations in play
- Important world lore that's currently relevant
Week 2: Recent Sessions
Add the last 5-10 sessions of notes. These establish recent continuity and are most likely to be referenced soon.
Week 3: Foundation Documents
Include core worldbuilding:
- Setting overview and tone
- Major factions and organizations
- Character backstories
- Any "campaign bible" documents we've created
Week 4 and Beyond: Fill Gaps As Needed
Don't try to upload everything at once. Add old session notes when we need to reference them, or during downtime between arcs. The knowledge base gets smarter as it grows, but it's useful immediately even with partial information.
The Secret: A knowledge base doesn't need to be complete to be valuable. Even uploading the last 10 sessions creates a semantic search foundation that's infinitely better than trying to remember everything ourselves or ctrl+F through documents.
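Under the hood, "uploading" notes usually means splitting them into retrievable chunks first. A minimal paragraph-based chunker (the function name and the 150-word default are illustrative assumptions, not any specific tool's behavior) might look like:

```python
def chunk_notes(text: str, max_words: int = 150) -> list[str]:
    # Split notes on blank lines, then pack paragraphs into chunks of
    # at most ~max_words, so retrieval can surface one focused passage
    # instead of a whole document. A single paragraph longer than
    # max_words still becomes its own (oversized) chunk.
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

session = "The party met Corvus.\n\nThey fought gnolls.\n\nThey rested."
print(chunk_notes(session, max_words=5))
```

Smaller, self-contained chunks are what let semantic search return "the Corvus paragraph from Session 3" rather than "all of Session 3."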
The Hybrid Approach: AI + Our Creativity
RAG-powered knowledge bases handle the mechanical memory work so we can focus on the creative decisions that actually require our brains.
Think of it like the difference between:
Without support: Spending 20 minutes searching for a name, 15 minutes remembering what we'd established about a faction, and 30 minutes trying to improvise an NPC that fits our world—leaving us exhausted before we even start the creative work.
With RAG support: Spending 2 minutes retrieving what we need, then having 63 fresh minutes for the creative work we actually enjoy: developing plot twists, building emotional scenes, crafting interesting decisions for our players.
The AI handles the remembering so we can focus on the storytelling. It maintains consistency so we can build on that foundation with our imagination.
We're not handing over creative control. We're offloading the mechanical memory work that drains our prep time and mental energy.
The Future Is Campaign-Specific Intelligence
Here's where this technology opens up real possibilities: we're moving from generic AI tools that know everything about nothing specific, to campaign-specific intelligence that knows everything about OUR world specifically.
Imagine an assistant that:
- Knows our NPCs better than we remember them
- Understands the complex web of relationships we've built
- Can generate new content that feels native to our world
- Suggests plot connections we'd forgotten about
- Helps maintain continuity without constant re-reading
- Scales perfectly whether our campaign is 10 pages or 1,000 pages
That's not science fiction. That's what RAG-powered knowledge bases enable right now.
The difference between a 5-session campaign and a 50-session campaign becomes logistical, not technical. Our ability to recall, reference, and build on what we've created doesn't degrade as our world grows—it actually gets better because the AI has more context to work with.
The system that helps us remember Corvus in Session 3 scales to help us track 200 NPCs across 5 years of play. Same effort from us, exponentially more value as our campaigns grow.
Why This Matters More Than We Think
What this really enables: longer campaigns that don't collapse under their own weight.
How many campaigns have we seen (or run) that fizzled out not because the story wasn't interesting, but because tracking everything became exhausting? Where we gradually forgot important details, players noticed the inconsistencies, and the world started feeling less real?
How many times have we wanted to run an epic, years-long campaign but worried about our ability to maintain continuity across that much time and information?
RAG-powered knowledge bases solve the scalability problem of long-form TTRPGs. They make the 100-session campaign as manageable as the 10-session campaign. They prevent the GM burnout that comes from drowning in details we can't reliably recall.
This isn't just about convenience. It's about enabling the kinds of deep, long-running campaigns that define the best experiences in this hobby—without requiring superhuman memory or unsustainable prep loads.
Our Campaigns Deserve Better Than Forgetting
We put hours into building our worlds. Our players invest in their characters and get attached to NPCs. Together we create something meaningful that exists nowhere else—a unique story that's ours collectively.
That work deserves better than being forgotten in scattered notes we can't efficiently search. Our players deserve better than watching us frantically scroll through documents when they reference something important. We deserve better than feeling like we're drowning in our own campaign's details.
Modern tools can handle the remembering. We handle the magic.
Start Building Your Campaign Memory
If you're tired of forgetting your own campaign details, spending more time searching notes than creating content, or wanting your 50th session to feel as coherent and consistent as your 5th, it's time to try a campaign-specific knowledge base.
ScriptoriumGM is built specifically for this: upload campaign documents, ask questions, get answers that actually know YOUR world. Generate NPCs that fit your style. Recall plot threads you'd forgotten. Build on what you've created instead of accidentally contradicting it.
It's not about replacing your creativity. It's about freeing it from the mechanical burden of remembering everything you've ever established.
Your campaigns have a memory now. It's time to use it.
Try ScriptoriumGM's Knowledge Base and stop forgetting your own brilliant ideas.
What's Your Experience?
How do you handle campaign continuity right now? Drowning in notes, relying purely on memory, or have you found a system that works? What's the most embarrassing thing you've forgotten from your own campaign?
Share your campaign memory struggles (and solutions) in our Discord community. We've all been there, and sometimes the best solutions come from GMs who've already fought through the same problems.
Sources
- Lost in the Middle: How Language Models Use Long Contexts - Liu et al., Stanford University, demonstrating how LLMs struggle with information in the middle of long contexts
- Context Rot: How Increasing Input Tokens Impacts LLM Performance - Chroma Research, 2025, showing performance degradation as prompt length increases
- Claude Context Window: Token Limits and 2025 Rules - Current context window specifications for major LLMs
- ChatGPT Plus Token Limits 2025: Complete Guide - Understanding practical context limitations
- What is Retrieval-Augmented Generation (RAG)? - AWS explanation of RAG architecture and benefits
- Retrieval-Augmented Generation (RAG) - Pinecone's technical overview of RAG systems
- What Is Retrieval-Augmented Generation (RAG)? - IBM's accessible explanation of RAG benefits
- Context Length Alone Hurts LLM Performance - Research showing degradation even within technical context limits
- The Impact of Prompt Bloat on LLM Output Quality - Analysis of quality degradation with long prompts


