Claude Code + LightRAG: Graph RAG for Large Document Sets
The death of RAG has been greatly exaggerated. Yes, Opus 4.6 handles massive context windows. But once you're working with more than a few thousand pages of documents, pure context stuffing falls apart — it gets slower, more expensive, and less accurate than a proper RAG setup. The right answer in 2026 is graph RAG, specifically LightRAG, and it plugs directly into Claude Code through a few simple skills. I'll explain how it works, when you need it, and exactly how to set it up.
LightRAG is an open-source graph RAG system that competes with Microsoft's GraphRAG at a tiny fraction of the cost. One study from last summer found RAG-based retrieval running 1,250x cheaper than standard LLM approaches at scale. The models have moved since then, so the gap has narrowed, but the principle holds: at large document scale, RAG wins on both cost and speed.
What is LightRAG and how does it differ from naive RAG?
LightRAG is a graph RAG system that builds both a vector database and a knowledge graph from your documents, then uses both in parallel to answer queries. Naive RAG — the kind you saw in every n8n automation from 2024 — just chunks documents, embeds them, and does cosine similarity lookups. That worked for toy examples. It falls apart in production.
The problem with naive RAG is context. If you ask "how does Anthropic relate to Claude Code," naive RAG finds chunks mentioning those terms but has no structural understanding of the connection. Graph RAG solves this by extracting entities (like "Anthropic" and "Claude Code") and relationships (like "Anthropic created Claude Code") from every document, then storing them as a navigable graph.
When you query a graph RAG system, it does two things simultaneously: finds the closest matching vectors in the flat database, and traverses the knowledge graph from the relevant entities. You get the raw text match plus the conceptual connections around it. That's what makes it way better for deep questions spanning multiple documents.
How does graph RAG actually work?
Here's the step-by-step flow for any document going into LightRAG:
- Chunking. The document gets split into chunks — typically a few paragraphs each.
- Embedding. Each chunk goes through an embedding model (OpenAI's text-embedding-3-large by default in my setup) and becomes a vector — a point in high-dimensional space.
- Entity + relationship extraction. In parallel, an LLM reads each chunk and pulls out the entities mentioned and the relationships between them. Those get stored as nodes and edges in a knowledge graph.
- Storage. The vectors go in a vector database. The entities and relationships go in the knowledge graph. Both sit locally (or remotely if you scale it up).
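The ingestion flow above can be sketched in a few lines of Python. Everything here is a toy stand-in (a hash-based fake embedder, a hardcoded extraction rule) to show the shape of the data flow, not LightRAG's actual internals:

```python
import hashlib

def chunk(text: str, size: int = 200) -> list[str]:
    # Split on paragraph boundaries, merging until ~size characters per chunk.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(current) + len(para) > size and current:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def fake_embed(text: str, dim: int = 8) -> list[float]:
    # Stand-in for text-embedding-3-large: deterministic hash -> tiny vector.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

def fake_extract(chunk_text: str) -> list[tuple[str, str, str]]:
    # Stand-in for the LLM extraction step: (subject, relation, object) triples.
    triples = []
    if "Anthropic" in chunk_text and "Claude Code" in chunk_text:
        triples.append(("Anthropic", "created", "Claude Code"))
    return triples

doc = "Anthropic created Claude Code.\n\nClaude Code runs in the terminal."
vector_store = {}   # chunk text -> embedding (the "vector database")
graph_edges = []    # (subject, relation, object) edges (the "knowledge graph")

for c in chunk(doc):
    vector_store[c] = fake_embed(c)
    graph_edges.extend(fake_extract(c))
```

The real system swaps the fakes for an embedding API call and an LLM extraction prompt, and the two dicts for proper vector and graph storage, but the split — every chunk lands in both stores — is the same.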
When you query, the LLM's question gets embedded the same way. The system pulls the closest vectors (cosine similarity) AND traverses the knowledge graph from the matched entities. Both sets of results get handed back to the LLM to augment the answer. That's how you get retrieval + structural context in a single pass.
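The query side can be mocked the same way: rank stored chunks by cosine similarity against the question vector, then walk one hop out from any entity matched in the question. Toy data again, not LightRAG's code:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stores standing in for the vector database and knowledge graph.
vector_store = {
    "Anthropic created Claude Code.": [1.0, 0.0, 0.0],
    "Docker packages applications.":  [0.0, 1.0, 0.0],
}
graph_edges = [("Anthropic", "created", "Claude Code"),
               ("Claude Code", "runs in", "the terminal")]

def query(question_vec: list[float], question_entities: set[str], top_k: int = 1):
    # 1. Vector path: closest chunks by cosine similarity.
    ranked = sorted(vector_store, key=lambda c: -cosine(vector_store[c], question_vec))
    chunks = ranked[:top_k]
    # 2. Graph path: one-hop traversal from entities found in the question.
    related = [(s, r, o) for s, r, o in graph_edges
               if s in question_entities or o in question_entities]
    return chunks, related

chunks, related = query([0.9, 0.1, 0.0], {"Claude Code"})
```

Here the question vector sits closest to the first chunk, and the graph path pulls in both edges touching "Claude Code" — including the terminal fact, which no text match on the question would have surfaced. That's the structural context a flat vector store can't give you.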
Imagine 10 documents turned into this web of entities and edges. Now imagine 1,000. That's the scale where graph RAG starts doing things no context window can replicate.
How do you install LightRAG with Claude Code?
Dead simple. Three prerequisites:
- An OpenAI API key (for embeddings — you can go fully local with Ollama if you want, but OpenAI is faster to get running).
- Docker Desktop installed and running. LightRAG ships as a Docker container.
- Claude Code open in whatever directory you want to work from.
Then give Claude Code this prompt:
"Clone the LightRAG repo. Write the .env file configured for OpenAI with GPT-5 mini and text-embedding-3-large. Use all default local storage and start it with Docker Compose."
Include the LightRAG GitHub URL and let it run. The full prompt is inside the free Chase AI community along with the skills I'll mention below.
When it finishes, you'll see a LightRAG container running in Docker Desktop and a localhost URL (typically port 9621). That URL opens the LightRAG web UI — upload tab, knowledge graph visualizer, retrieval tab, and the API endpoints.
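For reference, the .env Claude Code writes for this setup looks roughly like the following. The variable names follow LightRAG's server env.example as I understand it and may differ between versions, so copy from the repo's own example file rather than this sketch; the model names are the ones from the prompt above:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_API_KEY=your-openai-key
EMBEDDING_BINDING=openai
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIM=3072
```

Leaving the storage variables at their defaults keeps everything local, which is exactly what the Docker Compose setup expects.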
How do you connect LightRAG to Claude Code?
The web UI works, but you don't want to bounce between tabs every time you need to query your knowledge base. The better path is exposing LightRAG's API as Claude Code skills. The four skills that do the heavy lifting are query, upload, explore, and status.
All four are in the free community. Drop them in your .claude/skills/ folder and they're ready. Usage looks like:
"Use the LightRAG query skill and ask: what's the full cost picture of running RAG in 2026?"
Claude Code hits the LightRAG API, gets back a full response (with references to the source documents), and summarizes it for you inline. You can also ask for the raw JSON response if you want the unfiltered output. The knowledge graph viewer in the web UI is still useful for exploring entities visually, but the day-to-day querying lives in Claude Code.
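Under the hood, a query skill is just an HTTP call to the local LightRAG server. A minimal sketch, assuming the /query endpoint, the four retrieval modes, and the "response" field LightRAG's server API exposes (check the API docs on your running instance, since these details can change between versions):

```python
import json
import urllib.request

LIGHTRAG_URL = "http://localhost:9621"  # default port from the Docker setup

def build_query_payload(question: str, mode: str = "hybrid") -> dict:
    # "hybrid" combines the vector search and graph traversal described above;
    # "naive" is flat vector lookup only.
    if mode not in {"naive", "local", "global", "hybrid"}:
        raise ValueError(f"unknown mode: {mode}")
    return {"query": question, "mode": mode}

def ask(question: str) -> str:
    payload = json.dumps(build_query_payload(question)).encode()
    req = urllib.request.Request(
        f"{LIGHTRAG_URL}/query", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Usage (with the container running):
#   print(ask("What's the full cost picture of running RAG in 2026?"))
```

The skill file mostly just tells Claude Code when to reach for this call and how to summarize what comes back.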
When should you actually use LightRAG instead of just context?
Here's the real question. Claude Code with agentic file search is already great for small-to-medium codebases and document sets. The LightRAG tradeoff isn't worth it if you're just working with a few hundred pages.
The rough line is 500 to 2,000 pages. Below that, stick with context and agentic search. Above that — especially if you're pushing into 1 million tokens of source material — start integrating LightRAG. The embedding cost is minor, the graph build happens once per document, and queries are dramatically cheaper than stuffing everything into context.
That study I mentioned earlier (from July 2025) measured RAG at 1,250x cheaper than pure LLM context retrieval on large corpuses. Models have improved since then, so the gap is smaller now, but the direction is correct: at scale, RAG beats context stuffing on both speed and dollars.
You don't have to decide upfront. LightRAG is easy enough to spin up that you can just try it. If your document count is borderline, install it, benchmark both paths, and move forward with whichever works better for your workload.
What are LightRAG's limitations?
Two big ones:
- Text only out of the box. LightRAG can't natively ingest scanned PDFs, charts, tables, or images. If your documents contain non-text content, you need RAG-Anything on top — which is a separate video, but it's from the same team and plugs directly into LightRAG.
- Single-node by default. The default Docker setup runs everything locally. If you need production scale, you can swap the storage backend to Postgres (or Neon for managed Postgres) and host LightRAG itself in the cloud. The flexibility is there, it just requires config changes.
For most solo devs and small teams, the local Docker setup is plenty. I've been running it that way for weeks with no issues.
FAQ
What's the difference between naive RAG and graph RAG?
Naive RAG only stores document chunks as vectors in a flat database and retrieves them via cosine similarity. Graph RAG adds a knowledge graph of entities and relationships extracted from the documents, so queries can traverse conceptual connections in addition to finding matching text chunks.
When should you use LightRAG instead of Claude Code's built-in file search?
When your document corpus exceeds roughly 500 to 2,000 pages (around 1 million tokens). Below that, Claude Code's agentic file search is fine. Above that, LightRAG is cheaper, faster, and more accurate because it offloads retrieval from the LLM's context window.
Do you need OpenAI to run LightRAG?
No — you can go fully local with Ollama for both embeddings and the LLM. OpenAI with text-embedding-3-large and GPT-5 mini is faster to get running and produces good-quality embeddings. Local setups trade speed for zero API cost and full privacy.
How much does LightRAG cost to run?
The LightRAG software is free and open source. You pay only for the embedding API calls (one-time per document) and the LLM calls at query time. At scale, this runs roughly 1,250x cheaper than stuffing documents into context via standard LLM calls, based on the mid-2025 benchmark cited above.
Can LightRAG handle PDFs with charts and images?
Not natively — LightRAG is text-only. To ingest scanned PDFs, charts, tables, or images, install RAG-Anything on top. RAG-Anything is built by the same team and plugs directly into your existing LightRAG instance.
If you want to go deeper into building RAG systems with Claude Code, join the free Chase AI community for templates, prompts, and live breakdowns. And if you're serious about building with AI, check out the paid community, Chase AI+, for hands-on guidance on how to make money with AI.


