
5 Open Source Repos That Make Claude Code Unstoppable


Every single day there's a new open source repo on GitHub that changes how you use Claude Code. Most of it is noise. A few genuinely move the needle. Here are five open source projects from the last month that I think are worth your time — three heavy hitters you may have seen, two that will probably surprise you, and a clear breakdown of when each one makes sense.

Repo #1: Auto Research by Karpathy

Auto Research is Andrej Karpathy's machine learning optimization loop in a box — a repo that lets Claude Code iteratively self-improve at a defined task by running tests, scoring the results, and keeping only the changes that score higher. It came out three weeks ago and is already closing in on 60,000 stars.

The loop is simple. Claude Code attempts the task, gets a score, and then does one of two things: if the score improved, it commits. If the score got worse, it does a git reset and throws the attempt away. Then it thinks of something new to try, edits the code, and runs the loop again. Rinse and repeat, forever if you want.
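The commit-or-reset cycle can be sketched in a few lines of Python. This is a conceptual toy rather than the repo's actual code: `score_fn` stands in for the test harness and `mutate_fn` for Claude Code's edits, with git swapped out for plain variables.

```python
import random

def optimize(score_fn, mutate_fn, initial, steps=100, seed=0):
    """Keep-if-better loop: try a change, keep it only if the score improves.
    Mirrors the commit-vs-reset cycle, with git swapped out for variables."""
    rng = random.Random(seed)
    best, best_score = initial, score_fn(initial)
    kept = 0
    for _ in range(steps):
        candidate = mutate_fn(best, rng)        # "edit the code"
        candidate_score = score_fn(candidate)   # "run the tests"
        if candidate_score > best_score:        # improved, so "git commit"
            best, best_score, kept = candidate, candidate_score, kept + 1
        # otherwise the attempt is simply discarded, i.e. "git reset"
    return best, best_score, kept

# Toy task: get as close to 42.0 as possible.
score = lambda x: -abs(x - 42.0)
mutate = lambda x, rng: x + rng.uniform(-5.0, 5.0)
best, best_score, kept = optimize(score, mutate, initial=0.0, steps=200)
```

Note that `kept` will end up far below `steps`: most attempts don't improve the score, and that's expected.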

The results people are getting are real. Tobi (CEO of Shopify) ran Auto Research on an internal 0.8-billion-parameter model, let it run for 8 hours across 37 experiments, and ended up with a model 19% more efficient than the starting point.

How Auto Research Actually Works

There are three core files you need to understand:

  • program.markdown — this is where you define the task and constraints. You edit this. This is the equivalent of saying "here's what I want you to get better at."
  • train.py — this is what Claude Code edits on each loop. Think of this file as the model's "weights" in a standard ML setup. Auto Research isn't actually adjusting a billion parameters; it's adjusting the code that produces the output.
  • prepare.py — Karpathy's foundational code. Don't touch it.

When Auto Research Actually Works

Auto Research only works when the task has a binary or numeric score. If the "did it improve" question has a subjective answer, the loop falls apart: the LLM has no way of knowing whether its last attempt was better or worse.

Things it's great at:

  • Making a Python script run faster (measurable in ms).
  • Prompt optimization (pass/fail against a test set).
  • Skill optimization (did the skill pass its tests or fail them).
  • System prompt iteration against an expected output format (yes/no match).

Things it's useless for:

  • Social media content ("is this post good?" has no objective score).
  • Cold emails, copywriting, creative writing.
  • Anything where the output is subjective or requires live human judgment.
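The dividing line is whether you can write the score function at all. For the cases in the first list it's mechanical. Here's a toy pass/fail scorer, where `render` stands in for whatever the prompt plus model produces (the arithmetic stub is purely illustrative):

```python
def score_prompt(render, test_set):
    """Fraction of test cases whose output exactly matches the expectation.
    This is the kind of objective, numeric score the loop needs."""
    passed = sum(1 for case in test_set if render(case["input"]) == case["expected"])
    return passed / len(test_set)

# Stand-in for "run the prompt through the model": a trivial calculator.
render = lambda expr: str(eval(expr))

tests = [
    {"input": "2+2", "expected": "4"},
    {"input": "3*3", "expected": "9"},
    {"input": "10-7", "expected": "3"},
]
```

If you can't write a `score_prompt` like this for your task, the task belongs in the second list.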

In the Auto Research repo's first example, 83 experiments produced 15 kept improvements. That's an 18% hit rate — not because the tool is broken, but because most attempts don't improve the score. That's the nature of optimization loops.

Repo #2: OpenSpace — Self-Improving Skills

OpenSpace is an MCP server from HKUDS (Hong Kong University Data Science Lab) that tracks how your Claude Code skills perform and automatically buckets them into autofix, autoimprove, or autolearn. It came out 4 days ago, sitting at 1.7K stars.

HKUDS is worth paying attention to. These are the folks behind LightRAG, NanoBrowser, RAG-Anything, DeepCode, AI-Trader, and Cite-Anything. When they ship, it's usually worth a look.

The idea: every time Claude Code uses a skill, OpenSpace monitors quality and sorts the skill into one of three buckets:

  • Autofix — the skill is broken. Rewrite it.
  • Autoimprove — it works, but there's meaningful room to do better.
  • Autolearn — this is about as good as it gets. Lock it in.
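The bucketing decision reduces to a threshold rule on observed quality. This sketch uses made-up thresholds; OpenSpace's actual criteria aren't documented here:

```python
def bucket_skill(success_rate, fix_below=0.5, improve_below=0.9):
    """Sort a skill into one of the three buckets by observed success rate.
    The thresholds are illustrative guesses, not OpenSpace's actual values."""
    if success_rate < fix_below:
        return "autofix"      # broken: rewrite it
    if success_rate < improve_below:
        return "autoimprove"  # works, but meaningful room to do better
    return "autolearn"        # about as good as it gets: lock it in
```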

There's an upfront token cost (you're running an MCP server alongside every skill call), but OpenSpace claims the net effect is a token saving: better skills mean less retry work. Their benchmark shows 46% fewer tokens used on real-world tasks with the improved skills.

They also cite an $11,000 figure from their internal "GDP" metric — take that one with a grain of salt, since it's their custom measure. The more credible number is the benchmark methodology: 220 real-world professional tasks across 44 occupations, using Qwen 3.5+ on both sides. On that test, OpenSpace produced 4.2x higher income, 73% value capture, and 70% quality vs. 40% quality on the baseline.

One of their showcase demos built a "super-world monitor" dashboard (maps, video streams, analytics in one view). Starting from 6 initial skills, OpenSpace's system generated 54 additional skills automatically during the build — 60 total.

If you're deep into the skills ecosystem and care about building up a library that gets better over time, this is worth a real look.

Repo #3: CLI-Anything from HKUDS

CLI-Anything (command line interface anything) lets you turn any open source project into a Claude Code CLI — it analyzes the repo, runs tests, documents the code, and publishes it as a tool Claude Code can use. It launched in early March and already has around 24,000 stars.

This sits in the middle of a bigger shift in the Claude Code ecosystem: people are moving away from MCPs toward CLIs. CLIs are more token-efficient because they live where Claude Code already lives — in the terminal. No MCP server overhead, no protocol translation, just a native tool call.
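To see why the CLI shape is so agent-friendly, here's the minimal skeleton of a wrapper CLI in Python. This is a hypothetical sketch of the kind of tool CLI-Anything produces, not its actual output:

```python
#!/usr/bin/env python3
import argparse
import json

def run(action: str, path: str) -> str:
    # Stand-in for calling into the wrapped project's real API.
    return json.dumps({"action": action, "path": path, "ok": True})

def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy wrapper CLI an agent can call")
    parser.add_argument("action", choices=["render", "export"])
    parser.add_argument("--path", required=True, help="file to operate on")
    args = parser.parse_args(argv)
    print(run(args.action, args.path))

if __name__ == "__main__":
    main()
```

Because it's just a process with arguments and stdout, Claude Code calls it the same way you would, and you can debug it yourself without an agent in the loop.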

Install is simple. Two-line plugin setup, then a single command where you point CLI-Anything at an open source project. It runs the whole pipeline on its own.

The team has already used it to build CLI tools for Blender, Audacity, OBS, and Zoom. Those exist in the repo right now — you can use them directly without running the pipeline yourself. Or use them as templates for your own.

The real value here is closing the agent-software gap. Claude Code is good, but there are a thousand pieces of software it can't interact with yet. CLI-Anything is how that gap gets filled — one open source project at a time.

Repo #4: Claude Peers — Multi-Session Communication

Claude Peers lets multiple Claude Code sessions find each other and talk to each other through an MCP server and a SQLite database. It shipped last week and sits at just over 1,000 stars.

Normally every Claude Code terminal you have open runs in its own vacuum. Claude Peers breaks that. The MCP server auto-launches when your first session starts, and every additional session spawned after that gets a summary of the existing sessions pushed into it on startup. From then on they can message each other.
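A minimal version of that coordination layer fits in a page of Python. The schema and function names below are illustrative, not Claude Peers' actual API:

```python
import sqlite3

# Conceptual sketch of a SQLite-backed mailbox between sessions.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    sender TEXT, recipient TEXT, body TEXT,
    read INTEGER DEFAULT 0)""")

def send(sender, recipient, body):
    db.execute("INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
               (sender, recipient, body))
    db.commit()

def inbox(recipient):
    """Fetch unread messages for a session, then mark them read."""
    rows = db.execute(
        "SELECT sender, body FROM messages WHERE recipient = ? AND read = 0",
        (recipient,)).fetchall()
    db.execute("UPDATE messages SET read = 1 WHERE recipient = ?", (recipient,))
    db.commit()
    return rows

send("executor", "evaluator", "Draft of the login page is in src/app.py")
```

Because the mailbox is a shared database file rather than process memory, any session that can see the file can join the conversation.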

Why You'd Actually Want This

The use case clicked for me when I read Anthropic's March 24 blog post on long-running application development. The post is about harnesses for complex tasks (front-end design, video games) where Claude Code struggles in a single session — especially because Claude Code is a poor evaluator of its own work. It tends to hype itself up.

Anthropic's proposed harness splits the work into three roles: a planner, an executor, and an evaluator. One session plans. Another writes the code. A third grades the output. That structure works because the evaluator isn't emotionally attached to the work.

Claude Peers is the piece that makes this harness trivial to build. Instead of manually shuttling information between terminals, you let the sessions talk to each other directly. Executor ships a draft, evaluator critiques it, executor iterates. Everything happens in a persistent coordination layer.
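Stubbed out in Python, the harness loop looks something like this. Each function stands in for an independent session, and the scoring rule is obviously a toy:

```python
def executor(draft, feedback):
    # Stand-in for the session that writes code: each round of feedback
    # produces a revision.
    return draft + [feedback]

def evaluator(draft):
    # Stand-in for the grading session. Crucially, it didn't write the draft.
    score = len(draft) / 3
    feedback = None if score >= 1.0 else "needs another pass"
    return score, feedback

def harness(plan, max_rounds=10):
    draft = [plan]
    score, feedback = evaluator(draft)
    rounds = 0
    while feedback and rounds < max_rounds:
        draft = executor(draft, feedback)
        score, feedback = evaluator(draft)
        rounds += 1
    return score, rounds
```

The structure matters more than the stubs: the loop only terminates when the evaluator, not the executor, says the work is done.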

For any long-running build where you need both creation and honest evaluation, this pattern is probably where we're all heading.

Repo #5: Google Workspace CLI

The Google Workspace CLI (official-unofficial — built by Google developers but not officially endorsed) gives Claude Code access to Gmail, Drive, Docs, Calendar, Sheets, and the rest of Google Workspace through a full CLI surface.

Two ways to use it. Go all-in and let Claude Code drive your entire Google account, or sandbox it — give it a dedicated email, share only specific Drive folders, wire it up to only the services you actually need. I'd start sandboxed.

The install is a little heavier than the others because the repo has a huge number of skills. Before you run the full install, clone the repo and have Claude Code walk you through which skills you actually need. You're not forced into an all-or-nothing setup.

What About Prompt Injections?

The obvious concern with giving Claude Code access to your Gmail is prompt injection attacks. An adversary sends an email with hidden instructions and suddenly your agent is forwarding credentials.

The Google Workspace CLI ships with skills for Model Armor — Google's built-in prompt injection defense. Model Armor scans incoming content before it hits Claude Code and flags (or strips out entirely, depending on config) anything that looks like an injection attempt. You can tune how aggressive you want it to be.
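In spirit, the scan step works like the sketch below. To be clear, Model Armor's real detection is model-based and far more robust than keyword matching; this regex toy only illustrates the flag-versus-strip distinction:

```python
import re

# Illustrative patterns only; real injection detection is model-based.
SUSPICIOUS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard your system prompt",
    r"forward .*(password|credential)",
]

def scan(text, mode="flag"):
    """Return (flagged, cleaned). In 'strip' mode, offending lines are
    removed before the content ever reaches the agent."""
    flagged, kept = False, []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in SUSPICIOUS):
            flagged = True
            if mode == "strip":
                continue
        kept.append(line)
    return flagged, "\n".join(kept)
```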

If you're using Claude Code as a personal assistant, this repo is probably the single biggest productivity unlock you can install. Everything I do in Workspace every day — email triage, doc drafting, calendar coordination — is now agent-addressable without glue code.

FAQ

Which of these five repos is actually worth installing first?

Depends on your goal. For coding productivity, CLI-Anything has the broadest immediate value. For personal assistant workflows, Google Workspace CLI is the biggest unlock. Auto Research is the most interesting research tool but only pays off if your task has a numeric score. OpenSpace is for people already deep into the skills ecosystem. Claude Peers is experimental but points at where multi-agent workflows are heading.

Why is the ecosystem moving from MCPs to CLIs?

CLIs are more token-efficient and more capable. They live in the terminal where Claude Code already runs, so there's no MCP protocol overhead. They're also typically faster to install and easier to debug because you can call them yourself without going through Claude Code at all.

Can Auto Research improve a full foundation model?

In theory, yes — Karpathy's loop is model-agnostic. In practice, Tobi at Shopify ran it on a 0.8-billion-parameter model for 8 hours and got a 19% efficiency gain. The technique works best on smaller models and specific optimization targets, not on training a frontier-scale model from scratch.

Is it safe to let Claude Code access my Gmail through Google Workspace CLI?

With Model Armor enabled and a sandboxed account, yes. The safer pattern is to set up a secondary email dedicated to agent work, share only the Drive folders and services you actually need, and run Model Armor in strip-out mode for any incoming content. Don't just connect it to your primary inbox with full access on day one.

Do these repos work with Claude Code's subagents?

Most of them do. Claude Peers specifically is designed around multi-session coordination, which is a close cousin to subagents. OpenSpace improves skills that subagents use. CLI-Anything produces tools that any agent flavor can call. The only one that's single-session by design is the Auto Research loop.


If you want to go deeper on open source Claude Code tooling, join the free Chase AI community for templates, prompts, and live breakdowns. And if you're serious about building with AI, check out the paid community, Chase AI+, for hands-on guidance on how to make money with AI.