
Context Rot: Why AI Performance Drops in Long Chats (And Fixes)


Context rot is the progressive degradation of an LLM's performance as its context window fills up with conversation history. With every message you send, your AI gets just a little bit worse at reasoning, coding, and recalling facts.

I’ve seen this time and time again in my own development work, and it’s not just anecdotal. Hard data confirms that once you pass a certain token threshold, your "smart" AI assistant starts acting like a frantic intern who forgot the instructions you gave five minutes ago.

Here is exactly why context rot happens, the math behind the hidden costs of your chat history, and the specific strategies—like the GSD framework—I use to stop it.

What Is Context Rot?

The idea of context rot is simple: The longer you interact with an AI system in a single session, the worse it performs.

Most people assume that giving an AI more information makes it smarter. In reality, shoving more data into the model often creates noise that drowns out the signal. The context window is the AI's short-term memory, and when you overload it, reliability tanks.

A recent study by Chroma highlighted this perfectly. They tested top models—including Claude, GPT-4, Qwen, and Gemini—by giving them a task and then progressively filling their context windows with more tokens.

The results were alarming. As the context window filled up, performance didn't just dip; it fell off a cliff. Even with newer models that boast massive context windows (like Gemini’s 2 million tokens), the effective retrieval and reasoning ability degrades significantly the further you go.

The magic number seems to be around 100,000 to 120,000 tokens. Past this point, regardless of the model's theoretical max capacity, effectiveness drops: the model stops following complex instructions and starts hallucinating.

How Do Context Windows Actually Work?

To battle this, you need to understand the mechanics of tokens. Think of tokens as the currency of Large Language Models (LLMs), and the context window as your budget.

For the sake of simplicity, let's say 1 token equals 1 word. (Technically it's roughly 0.75 words, but for 99% of use cases, 1-to-1 is the right mental model).
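The 0.75-words-per-token ratio makes a quick estimator trivial to write. This is a rough heuristic only, not a real tokenizer (real tokenizers split on subwords, and code tokenizes denser than prose):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token rule of thumb."""
    words = len(text.split())
    return round(words / 0.75)

print(estimate_tokens("Explain how context windows work in large language models"))
```

For budgeting purposes this is close enough; for billing-accurate counts, use the provider's own tokenizer or token-counting endpoint.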

Here is where most people get the math wrong. They think if they send a 10-word message, they are using 10 tokens. That is false.

The "Back and Forth" Multiplier

When you interact with an LLM, you aren't just sending your latest message. You are sending input tokens (everything you've ever said in that chat + the system prompt + tools) and receiving output tokens (the AI's answer).

Let’s look at a hypothetical scenario with Claude:

  1. Message 1: You say "Hi Claude" (2 tokens).

    • Input: 2 tokens.
    • Claude Replies: "Hi, how are you?" (6 tokens).
    • Current Context Load: 8 tokens.
  2. Message 2: You say "I'm well. Explain context windows." (8 tokens).

    • Input: It is NOT just 8 tokens. It is the previous 8 tokens (history) + the new 8 tokens. Total Input = 16 tokens.
    • Claude Replies: A long explanation (100 tokens).
    • Current Context Load: 116 tokens.
  3. Message 3: You ask a follow-up question (6 tokens).

    • Input: 116 (history) + 6 (new message) = 122 input tokens.

Every time you hit enter, you are re-uploading the entire conversation history. If you are 50 messages deep, you might be sending 50,000 tokens of input just to ask a 10-word question.
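The running totals from the example above can be reproduced in a few lines. The token counts are the hypothetical values from the example, not output from a real tokenizer:

```python
def conversation_cost(turns):
    """Each turn is (user_tokens, reply_tokens). Returns the input token
    count billed per turn, plus the grand total across the session."""
    history = 0          # tokens accumulated so far
    inputs = []
    for user_tokens, reply_tokens in turns:
        input_tokens = history + user_tokens   # full history is re-sent
        inputs.append(input_tokens)
        history = input_tokens + reply_tokens  # the reply joins the history
    return inputs, sum(inputs)

# The three messages from the example above
per_turn, total = conversation_cost([(2, 6), (8, 100), (6, 0)])
print(per_turn)  # [2, 16, 122]
```

Note how the per-turn input cost grows even when your messages stay short: the history term dominates almost immediately.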

But it gets worse. It's not just text.

The Hidden Bloat: System Prompts and Tools

If you look at the raw logs of a Claude Code session, user messages are often just a small slice of the pie. The context window is also filled with:

  • System Prompts: The massive block of text telling the AI how to behave.
  • Available Tools: Definitions of every function the AI can call.
  • MCP Tools: Model Context Protocol tools (more on this later).

Your first message might technically cost you 5,000 tokens before you've even typed a word because of the system overhead. All of this fights for real estate in the context window.

How Do You Fix Context Rot?

Since we know performance drops after ~100k tokens, our goal is to stay in the "Goldilocks Zone"—enough context to be useful, but not enough to cause rot. Here are the three weapons I use to manage this.

1. Atomic Task Management

The most important shift you can make is in Task Management. Stop giving your LLM vague, massive objectives.

If you tell Claude, "Build me a SaaS project management tool for creators," you are setting it up to fail. That request requires holding too many dependencies, file structures, and logic paths in the context window simultaneously.

Instead, break the idea down into its smallest parts—what I call Atomic Tasks.

  • Bad: "Build the website."
  • Good: "Create a Product Requirement Document (PRD) for the landing page."
  • Good: "Write the index.html based on the PRD."
  • Good: "Write the form validation logic for the sign-up component."

This is why I stress starting with a PRD and a Kanban board. By breaking the project into discrete tickets, you only need the AI to focus on one small context-light task at a time. It uses less context to execute the task, keeping the "rot" away longer.

2. Aggressive Session Management

If you have been talking to Claude for hours in the same chat window, your performance is garbage. You need to clear the cache.

Session management simply means knowing when to kill the current chat and start a fresh one.

The Manual Method:

  1. Ask the AI: "Create a comprehensive summary of everything we've worked on so far, including current file status."
  2. Copy the summary.
  3. Open a new chat.
  4. Paste the summary.

The Automated Method (Claude Code): Tools like Claude Code (the CLI interface) actually have an auto-compact feature built-in. When the context fills up (around 50-60% capacity), it automatically summarizes the session, dumps the history, and starts a fresh session with just the summary. This keeps the AI sharp without you having to manually babysit the token count.
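The core idea behind auto-compaction is easy to sketch. This is a toy illustration of the pattern, not Claude Code's actual implementation; `summarize` stands in for a real LLM call:

```python
def maybe_compact(history, token_count, limit, threshold=0.55, summarize=None):
    """Toy sketch of auto-compaction: once the context passes ~55% of the
    window, replace the full history with a summary and start fresh.
    `summarize` is a stand-in for an LLM summarization call."""
    if token_count / limit < threshold:
        return history, token_count           # still in the safe zone
    summary = summarize(history)              # condense the whole session
    return [summary], len(summary.split())    # fresh session, seeded with summary

# Simulate a bloated session: 120k tokens in a 200k window (60% > 55%)
stub = lambda msgs: f"SUMMARY: {len(msgs)} messages condensed"
history, tokens = maybe_compact(["msg"] * 40, token_count=120_000,
                                limit=200_000, summarize=stub)
print(history)  # ['SUMMARY: 40 messages condensed']
```

The threshold and window size here are illustrative; the point is that compaction trades lossy history for a context load near zero.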

3. Use Frameworks Like GSD or Ralph Loop

If you want to automate this process entirely, look at scaffolding frameworks like the Ralph Loop or the GSD (Get Sh*t Done) Framework.

These systems are designed to mitigate context rot architecturally:

  1. GSD Framework: Takes a PRD, breaks it into atomic tasks, and launches sub-agents. Each sub-agent spins up with a totally fresh context window, completes its specific task, and then dies.
  2. Ralph Loop: Attacks tasks one by one, starting a new session for every single attempt.

By using these frameworks, you ensure that no single agent ever gets "tired" or confused by 200,000 tokens of chat history. You are essentially giving every task a fresh brain.
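The "fresh brain per task" pattern these frameworks share can be sketched in a few lines. `run_task` here is a stand-in for launching a real sub-agent, not part of either framework's API:

```python
def run_with_fresh_agents(tasks, run_task):
    """Sketch of the sub-agent pattern: every task starts with an empty
    context, so no agent inherits another task's chat history."""
    results = []
    for task in tasks:
        context = []                          # fresh brain: zero prior history
        results.append(run_task(task, context))
    return results

tickets = ["Write the PRD", "Build index.html", "Add form validation"]
done = run_with_fresh_agents(tickets, lambda task, ctx: f"{task} (started with {len(ctx)} tokens of history)")
print(done)
```

Contrast this with a single long session, where task 3 would be executed against everything said during tasks 1 and 2.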

The MCP Warning

A final note on the Model Context Protocol (MCP). When MCPs launched, everyone (myself included) went nuts installing 30 different tools.

Here is the reality: MCPs are heavy. Extremely heavy.

Anthropic admitted this in November 2025: MCP tools can bloat the context window significantly because the model needs to keep the tool definitions loaded at all times.

If you have 10 MCP servers running—Google Drive, Slack, GitHub, Filesystem, etc.—you might be burning 20,000+ tokens of context space per message just to keep those tools available.
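Because tool definitions are re-sent with every message, the overhead compounds over a session. A back-of-envelope calculation makes the waste obvious (the per-tool cost below is an illustrative assumption, not a measured value):

```python
TOOL_DEF_TOKENS = 2_000  # assumed token cost of one MCP server's definitions

def session_overhead(num_servers, num_messages, per_tool=TOOL_DEF_TOKENS):
    """Total input tokens spent just keeping tool definitions loaded,
    since the definitions ride along with every single message."""
    return num_servers * per_tool * num_messages

print(session_overhead(10, 50))  # 10 servers across a 50-message session
```

At these assumed numbers, ten idle servers over fifty messages burns a million input tokens on definitions alone, before a single tool is ever called.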

My advice: Use MCPs sparingly. Toggle them off when you aren't using them. Do not let them sit in the background eating your context budget.

FAQ

What is considered a "large" context window today?

While models like Gemini 1.5 Pro boast up to 2 million tokens, practical testing shows that reasoning capabilities degrade significantly past the 100,000–120,000 token mark. Just because a model accepts 2 million tokens doesn't mean it can smartly use them.

Does starting a new chat really fix the problem?

Yes. When you start a new chat, the input tokens reset to near zero (just the system prompt). By carrying over a summary of the previous session, you keep the necessary context while discarding the thousands of tokens of "fluff" and conversational back-and-forth from the previous session.

How many words is 1,000 tokens?

A good rule of thumb is that 1,000 tokens is roughly 750 words. However, for quick mental math, treating 1 token as 1 word is usually close enough for context management. Code snippets, however, consume tokens faster than standard English text.

Why are MCP tools bad for context?

MCP tools aren't "bad," but they are expensive in terms of tokens. To use a tool, the LLM must have the tool's definition and instructions loaded in its context window constantly. Having unused tools active is like running high-CPU background apps on your computer—it slows everything down significantly.

The Bottom Line

Context rot is unavoidable if you don't manage it actively. The best prompts in the world won't save you if your context window is overflowing with garbage data.

The fix is discipline: Break your work into atomic tasks, use frameworks that support sub-agents, and don't be afraid to hit the "New Chat" button. If you want to see exactly how I set up these context-mitigation workflows using the GSD framework, join the free Chase AI community for templates and live breakdowns.