The 7 Levels of Claude Code and RAG: What Level Are You On?
Claude Code and memory is one of the hardest problems in AI right now, and the default answer — RAG — is not always the right one. This post is a map of seven levels, from the auto-memory system Claude Code uses out of the box all the way up to agentic graph RAG with multimodal ingestion. At each level, I tell you how to know you're there, what to master, what the trap is, and how to move up. The biggest lesson: most people don't actually need to get past level 4 (Obsidian). Level 7 is overkill for 95% of use cases.
This is a roadmap, not a technical setup tutorial. When you need to actually build LightRAG or RAG-Anything, I've done deep dives on those separately and I'll link them where relevant.
What Is Level 1 Claude Code Memory (Auto-Memory)?
Level 1 is where you've never set anything up intentionally. You rely on whatever Claude Code does automatically, which is called auto-memory.
Auto-memory creates markdown files inside your .claude folder (specifically projects/memory/) based on its own intuition about what's important. Go look in that folder — you'll see files that Claude Code wrote itself, like Post-it notes. Things like "Chase mentioned his YouTube growth goals" or a small note about a skill. There's also an index memory file that lists the sub-memory files and what's in them.
It's cute. It's also not that useful. It works like ChatGPT's memory system — every so often it shoehorns in something random from a prior conversation in a way that feels off. Auto-memory alone is not enough.
The trap at level 1 is passivity. If you never take an active role in what Claude Code remembers, you have no control over what it considers when answering your questions. The skill to master is understanding that auto-memory is insufficient and that you need explicit memory structures.
What Is Level 2 Claude Code Memory (CLAUDE.md)?
Level 2 is where you start editing CLAUDE.md — the file Claude Code references on essentially every prompt. It feels like the holy grail at first. Finally, a place to tell Claude Code the rules and conventions you always want followed.
Claude Code auto-creates a CLAUDE.md, but you can edit it, and you can rebuild it on demand with /init. For all intents and purposes, Claude Code reads this file before every task it executes in that project. So if you want it to remember things, the instinct is to cram them in here.
But there's a study (the "Evaluating AGENTS.md" paper — substitute AGENTS.md for CLAUDE.md, same idea) showing that these global instruction files can reduce LLM effectiveness when they're bloated. The reason: injecting the same context into every prompt is a double-edged sword. If the content isn't relevant to nearly every prompt, you're polluting the context window, not helping it.
The trap at level 2 is a bloated rulebook. Based on that study and my own experience, less is more. Context pollution is real. If something isn't relevant to virtually every prompt you're going to run in the project, it doesn't belong in CLAUDE.md. Master high-signal project context and treat CLAUDE.md as a pointer file, not a catch-all.
What Is Level 3 Claude Code Memory (Multi-File State)?
Level 3 is where CLAUDE.md becomes an index instead of a rulebook. You break memory across multiple structured markdown files, each with a specific purpose.
A good example of this architecture is what the GSD orchestration tool does: it creates a project.md, requirements.md, roadmap.md, and state.md. Requirements persist the spec. Roadmap tracks what's done and what's next. Project defines the north star. State captures the current moment. CLAUDE.md then just points at these when needed.
By splitting memory this way, you fight context rot. You avoid the failure mode from level 2 where every prompt gets flooded with the same giant file. And you start setting up something that structurally looks like a crude version of chunking and similarity search — only with four files instead of thousands.
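To make the pointer-file idea concrete, here's a minimal sketch of what "CLAUDE.md as index" buys you. The file names come from the GSD-style layout above; the `load_context` helper and its keys are hypothetical, just illustrating the pattern of pulling in only the memory files a given task needs:

```python
from pathlib import Path

# GSD-style state files, as described above. The keys are made up
# for this sketch; only the file names come from the actual layout.
STATE_FILES = {
    "spec": "requirements.md",   # persists the spec
    "progress": "roadmap.md",    # what's done and what's next
    "vision": "project.md",      # the north star
    "now": "state.md",           # the current moment
}

def load_context(root: str, needs: list[str]) -> str:
    """Assemble context from only the memory files a task needs,
    instead of flooding every prompt with one giant file."""
    parts = []
    for need in needs:
        path = Path(root) / STATE_FILES[need]
        if path.exists():
            parts.append(f"## {STATE_FILES[need]}\n{path.read_text()}")
    return "\n\n".join(parts)
```

A planning task might load `progress` and `vision`; a bug fix might load only `now`. That selectivity is exactly what a single bloated CLAUDE.md can't give you.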
The trap at level 3 is that your system is project-specific. These markdown files live inside the project and don't transfer cleanly to other projects. That's what pushes most people to level 4.
What Is Level 4 Claude Code Memory (Obsidian)?
Level 4 is Obsidian plus Claude Code, and honestly, this is where most people should stop. It's the 80% solution that functions as a 99% solution for most real-world use cases. Free, low overhead, works.
Andrej Karpathy talked about his Obsidian-based LLM knowledge base in a video that got close to 20 million views. It's worth paying attention to. The pattern looks like this:
- Vault folder — your root knowledge base.
- Raw folder — staging area for ingested material. Think Claude Code doing competitive analysis on 50 sites and dumping all the research here.
- Wiki folder — the structured, processed version of that raw data, organized as Wikipedia-style articles.
Each wiki article gets its own folder. The wiki has a master index markdown file that lists what exists. Each sub-folder has its own index. When you ask Claude Code a question, it walks: vault -> wiki -> master index -> article. Because the hierarchy is explicit, Claude Code can navigate thousands of documents without RAG.
The key is structure. Most people pile a thousand documents into a single folder and then wonder why Claude Code can't find anything. That's asking it to search a factory floor. Give it a filing cabinet and it'll do well.
The trap at level 4 is cost and speed at scale. There's a study comparing textual RAG to textual LLMs that found RAG was 1,200 times cheaper and 1,200 times faster for equivalent answers. That study is from 2025 — before newer models and bigger context windows — so the gap has shrunk. But even if it's now 10x or 100x, that still matters when you're at scale. Obsidian works great until it suddenly doesn't. You won't know where your line is until you experiment.
What Is Naive RAG in Claude Code?
Level 5 is where you finally learn how real RAG works — chunking, embeddings, vector databases, and retrieval. Even if you never implement naive RAG in production, you need to understand it to make good decisions about what comes next.
RAG breaks into three stages:
- Ingestion. Documents get split into chunks. Each chunk goes through an embedding model that turns it into a vector — a series of numbers representing the chunk's semantic meaning.
- Vector database. Vectors get stored in a multi-dimensional space (hundreds or thousands of dimensions, though a 3D analogy works for intuition). Similar concepts cluster together — fruit vectors near other fruit, ship vectors near other ships.
- Retrieval. When you ask a question, the question also gets embedded. The system finds the vectors closest to the question vector, pulls those chunks, and feeds them to the LLM as augmentation on top of the model's training data. Hence "retrieval augmented generation."
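All three stages fit in a toy sketch. The `embed` function here is a deliberately crude stand-in — one dimension per known word, where a real embedding model outputs hundreds of learned dimensions — but the geometry (nearest vectors win) works the same way:

```python
import math

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: one dimension per vocabulary word. A real
    system calls an embedding model and gets a dense vector back."""
    words = text.lower().split()
    return [float(words.count(v)) for v in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Embed every chunk and the question, return the k nearest
    chunks. A real pipeline persists chunk vectors in a vector
    database at ingestion instead of re-embedding per query."""
    vocab = sorted({w for c in chunks for w in c.lower().split()})
    q = embed(question, vocab)
    return sorted(chunks, key=lambda c: cosine(embed(c, vocab), q),
                  reverse=True)[:k]
```

The retrieved chunks get prepended to the prompt as augmentation — that's the whole "RAG" loop. Notice the trap is already visible: retrieval only sees word-level similarity per chunk, with no idea how chunks relate to each other.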
The trap at level 5 is thinking naive RAG is good enough. It usually isn't. Chunking is arbitrary — how do you decide boundaries? What happens when chunk 3 references something only in chunk 1 and retrieval misses it? Vectors sit in silos and can't express relationships between concepts. Re-rankers help, but naive RAG setups are basically a more complicated Ctrl-F with a worse hit rate.
If you see someone selling you a "Pinecone RAG system" or a "Supabase RAG system" that doesn't mention graph structure or agentic routing, assume effectiveness around 25%. Know this before you buy.
What Is Graph RAG with LightRAG?
Level 6 is graph RAG, which is the minimum viable sophistication level if you're actually going to use RAG in production. My recommendation: start with LightRAG, which is an open-source graph RAG implementation — the lightest-weight option I know of.
Graph RAG's core insight: everything is connected. Instead of isolated vectors, you have entities and relationships. When you retrieve, you're traversing a graph — pulling not just similar chunks but the relationships that connect them.
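A tiny sketch of what "traversing a graph" means at retrieval time. The entities and relations below are illustrative, and this is not LightRAG's actual API — LightRAG extracts entities and relations with an LLM pass at ingestion; here they're hard-coded so the traversal itself is visible:

```python
# Hypothetical extracted graph: entity -> {relation: target}.
graph = {
    "Obsidian": {"stores": "markdown vault"},
    "LightRAG": {"builds": "knowledge graph",
                 "improves_on": "naive RAG"},
    "naive RAG": {"retrieves": "isolated chunks"},
}

def graph_retrieve(entity: str, hops: int = 1) -> list[str]:
    """Start at a matched entity and pull its relationships too,
    expanding outward hop by hop, instead of returning only the
    chunk the entity happened to appear in."""
    facts, frontier = [], [entity]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, target in graph.get(node, {}).items():
                facts.append(f"{node} --{rel}--> {target}")
                next_frontier.append(target)
        frontier = next_frontier
    return facts
```

With `hops=2`, asking about LightRAG also surfaces that naive RAG retrieves isolated chunks — a connected fact no similarity search over isolated vectors would hand you.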
LightRAG's own benchmarks versus naive RAG show jumps that are often more than 100% (31.6 to 68.4, 24 to 76, 24 to 75 across different metrics). Take LightRAG's numbers with a grain of salt — they're their numbers. But the direction of improvement is real and consistent with published graph RAG literature generally.
A knowledge graph in LightRAG looks visually similar to Obsidian's graph view. Don't confuse them. Obsidian's connections are manual or heuristic — you or Claude Code added [[wiki links]]. LightRAG's are derived from actual embedding and entity extraction on document content. They are not the same system. We are not the same, brother.
The trap at level 6 is that plain LightRAG is text-only. Scanned PDFs, images, videos — none of it ingests cleanly. And you still haven't built the data pipeline around the RAG system.
What Is Agentic RAG?
Level 7 is agentic, multimodal RAG — the current bleeding edge as of April 2026. This is where you start ingesting images, tables, videos, and scanned PDFs into your knowledge graph and layering an AI agent on top to route queries intelligently.
Tools worth knowing at this level:
- RAG-Anything. Lets you ingest images and non-text documents (think scanned PDFs) into a LightRAG-style knowledge graph.
- Gemini embedding 2. Released March 2026 — can actually embed videos directly, not just transcripts. This is a big deal because a huge amount of knowledge on the internet is locked in video, and transcripts alone don't capture enough.
But the real level 7 work isn't the RAG system itself — it's the surrounding architecture. Data ingestion pipelines. Deduplication. Version management. Update propagation. Access control for teams. Looking at a production agentic RAG system diagram, the graph RAG part is a small box in the corner; the rest is data plumbing.
An agentic layer on top also matters. In a real team setting, you're probably juggling a graph RAG database, standard Postgres tables queried with SQL, and maybe other sources. The agent needs to decide which to hit for a given question.
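At its simplest, that routing decision looks like this sketch. In production the router is itself an LLM call with the backends described in its prompt; a keyword heuristic stands in for it here, and the backend names are assumptions for illustration:

```python
def route(question: str) -> str:
    """Decide which backend should answer a question: Postgres via
    SQL for exact aggregates, graph RAG for conceptual knowledge.
    A real agentic router replaces this heuristic with an LLM call."""
    structured_signals = ("how many", "total", "average", "count", "sum")
    q = question.lower()
    if any(s in q for s in structured_signals):
        return "sql"        # exact numbers live in structured tables
    return "graph_rag"      # conceptual questions hit the knowledge graph
```

The point isn't the heuristic — it's that once you have multiple sources, something has to make this choice per query, and that something is the agentic layer.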
The trap at level 7 is forcing yourself into it when you don't need it. Even after all this, most people are fine with Obsidian. Only be here if you have genuinely multimodal content and team-scale complexity.
Which Level of Claude Code RAG Should You Actually Use?
For 95% of solo operators, level 4 (Obsidian) is the right answer. It's free, it's flexible, and it's enough. The mistake most people make is jumping to level 6 or 7 because RAG is the shiny thing everyone talks about.
My standard recommendation — including for clients — is to always start simple and level up only when you hit a real wall. Specifically:
- If you can get away with multi-file markdown state inside your project, do that.
- If that's not enough, go to Obsidian.
- If Obsidian can't handle your scale, try LightRAG (open source, lightweight, easy to walk away from).
- If LightRAG isn't enough, add RAG-Anything for multimodal.
- Only go full agentic + pipelines if you have a team or multi-source requirements.
The only way to find your line is to experiment. Nobody can tell you ahead of time when Obsidian breaks down for your specific workload. You have to push it and see.
Frequently Asked Questions
Do I need RAG to use Claude Code effectively?
No. Most solo users never need a real RAG system. A well-structured Obsidian vault with a clear index hierarchy handles thousands of documents fine when you use Claude Code's built-in grep and file search. Only move to RAG when you have a real scaling or cost problem.
What's the difference between Obsidian and graph RAG?
Obsidian's graph is built from manual or heuristic wiki-links — connections you or Claude Code added by hand. Graph RAG (like LightRAG) builds connections automatically from embedding and entity extraction on actual document content. Obsidian is rudimentary; graph RAG is sophisticated. They look similar but work very differently.
Is naive RAG good enough for a production system?
Usually not. Naive RAG (vector chunks with similarity search, no graph, no re-ranker) often performs around 25% effectiveness on real questions because chunks lose context and relationships between concepts aren't captured. If someone sells you a naive RAG system, push for graph RAG or at least a re-ranker.
What's the 80/20 on Claude Code memory for solo operators?
Obsidian, structured with a vault -> raw -> wiki folder hierarchy, plus a master index markdown that Claude Code can navigate. That combination scales to thousands of documents, costs nothing, and you can see exactly what's in the system at any time. Start here.
When should I use LightRAG over Obsidian?
When Obsidian starts giving you wrong answers on complex multi-document queries, or when token cost and response time on Obsidian lookups become painful at your document volume. LightRAG is open source and lightweight, so it's a low-risk next step — if it doesn't help, you haven't spent much to find out.
If you want to go deeper into Claude Code memory, Obsidian, and RAG, join the free Chase AI community for templates, prompts, and live breakdowns. And if you're serious about building with AI — including production-level RAG systems — check out the paid community, Chase AI+, for hands-on guidance on how to make money with AI.


