Claude Code's Source Got Leaked. Here's What Agent Builders Should Actually Learn From It.
Anthropic’s most powerful developer tool had its internals laid bare. The Claude Code source leak spread across GitHub, Reddit, and X in a matter of hours. Developers expected to find proprietary algorithms, secret model fine-tuning, or some hidden competitive moat.
What they found instead was a while loop, a handful of tools, and a lot of thoughtful engineering.
The leak didn’t reveal a secret. It revealed a blueprint. And if you’re building AI agents, that blueprint is more valuable than any secret would have been. A community project called Learn Claude Code has since distilled the architecture into 12 composable sessions that anyone can study.
Here’s what matters for agent builders, and why most of it matters even if you’ve never written a line of code.
The Core Surprise: One Loop Does Everything
The entire foundation of Claude Code is a loop with one exit condition. The AI receives a task, decides what tool to use, checks the result, and repeats until it’s done.
%%{init: {"layout": "dagre"}}%%
flowchart LR
A[Your Request] --> B[AI Decides]
B --> C[Runs a Tool]
C --> D{Done?}
D --> |no| B
D --> |yes| E[Returns Answer]
That’s it. Under 30 lines of Python. No orchestration framework, no workflow engine, no graph database.
Here’s the actual structure:
def agent_loop(query):
    messages = [{"role": "user", "content": query}]
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return  # Done. The only exit.
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
One exit condition controls the entire flow: stop_reason != "tool_use". The model keeps calling tools until it decides it’s done. No state machine. No graph traversal. Just a while loop and a list of messages that grows.
I wrote about how this agentic loop works in detail previously. The leak confirmed every architectural bet: the gather-act-verify cycle, the thin client, the model-driven control flow.
Most teams overcomplicate their first agent. They reach for LangChain, CrewAI, or AutoGen before writing a single line of their own code. The leak proved that the most capable coding agent in production runs on a pattern simple enough to fit on an index card.
The outcome: Start with the simplest loop that works. Add complexity only when real problems force you to. The teams shipping agents fastest aren’t using the fanciest frameworks. They’re running variations of this exact loop.
Tools Are Just a Dictionary
Adding new capabilities to Claude Code doesn’t require rewriting the loop. Each tool is a function registered in a lookup table. The AI decides which tool to call. The client looks it up, runs it, and returns the result.
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
                                       kw["new_text"]),
}
The dispatch map is a Python dictionary. One lookup replaces any if/elif chain. Adding a new tool means writing one handler function and one dictionary entry. The loop body stays identical to the 30-line version above.
| Component | What It Does |
|---|---|
| Tool schema | JSON description the AI reads to understand the tool |
| Handler function | Executes the action (with path sandboxing for safety) |
| Dispatch map | {tool_name: handler} dictionary lookup |
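For concreteness, here's what one of those schemas might look like in the Anthropic Messages API's tool format. The description text here is illustrative, not the leaked original:

```python
# A tool schema in the Anthropic Messages API shape: the model reads this
# JSON Schema to learn the tool's name, purpose, and expected arguments.
BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command and return its stdout/stderr.",
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The shell command to execute.",
            },
        },
        "required": ["command"],
    },
}

TOOLS = [BASH_TOOL]  # passed as tools=TOOLS in client.messages.create
```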
Path sandboxing prevents workspace escape:
def safe_path(p: str) -> Path:
    resolved = (WORKDIR / p).resolve()
    if not resolved.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return resolved
Every file tool calls safe_path() before touching disk. The AI can ask to read /etc/passwd. The handler refuses. Security lives in the client, not the model.
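The run_tool call from the loop can be sketched the same way. This is an assumption about its shape, not the leaked code: a dictionary lookup plus a catch-all that turns exceptions into text the model can read and react to. The handlers argument is passed explicitly here to keep the sketch self-contained:

```python
# A minimal run_tool sketch: look up the handler, execute it, and convert
# any exception into an error string. Failures go back to the model as a
# tool_result, so the model can self-correct instead of crashing the loop.
def run_tool(name, tool_input, handlers):
    handler = handlers.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return str(handler(**tool_input))
    except Exception as exc:  # the model sees the error text, not a traceback
        return f"Error: {exc}"
```

The design choice worth copying: a failed tool call is information, not a crash. The model reads the error string and tries a different approach.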
The outcome: Teams that adopt this pattern report building new agent capabilities in hours instead of days. The architecture scales without architectural changes. Need a database query tool? One function, one dict entry, zero loop modifications.
The AI Plans Its Own Work
On multi-step tasks, Claude Code doesn’t wing it. It writes itself a checklist first, then works through items one at a time. A built-in “nag” system reminds the AI to update its checklist if it hasn’t done so recently.
TodoManager state
[x] Create project structure
[>] Write API endpoints    <- currently working
[ ] Set up database schema
The constraint is intentional: only one item can be in_progress at a time. This forces sequential focus. And if the model goes 3+ rounds without updating its checklist, the harness injects a <reminder>Update your todos.</reminder> message into the next tool result.
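A minimal sketch of that checklist discipline, assuming a simple status field per item. The class and method names here are guesses, not the leaked implementation:

```python
# Sketch of the checklist pattern: one in_progress item at a time, plus a
# staleness counter the harness can use to inject a reminder after 3 quiet
# rounds. Field names and the threshold mirror the article's description.
class TodoManager:
    def __init__(self):
        self.items = []  # each item: {"text": str, "status": str}
        self.rounds_since_update = 0

    def set_todos(self, items):
        if sum(1 for i in items if i["status"] == "in_progress") > 1:
            raise ValueError("Only one item may be in_progress at a time")
        self.items = items
        self.rounds_since_update = 0

    def reminder(self):
        """Called once per loop round; returns a nag string when stale."""
        self.rounds_since_update += 1
        if self.rounds_since_update >= 3:
            return "<reminder>Update your todos.</reminder>"
        return None
```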
This solves a real problem. AI models lose track of what they’re doing on long tasks. They repeat work, skip steps, or wander off into tangents. The checklist keeps them honest. I explored the broader principle in my post on agentic design patterns, where the reflection pattern (producer-critic loops) addresses the same challenge: keeping AI focused through explicit structure rather than hoping it stays on track.
The outcome: Agents that plan before they act complete multi-step tasks at 2-3x the rate of agents that improvise. If your agent is unreliable on complex work, this is likely the first pattern to add.
Subagents Keep the Main Thread Clean
Here’s a pattern that surprised many reviewers. When Claude Code needs to research something, it doesn’t pollute its own memory with all the intermediate work. It spawns a “subagent” with a fresh, empty context. The subagent does the research, returns a short summary, and its entire working memory gets discarded.
%%{init: {"layout": "dagre"}}%%
flowchart LR
subgraph Parent["Parent Agent (preserved)"]
P1[Working on task]
P2[Gets: 'X uses pytest']
P3[Continues work]
end
subgraph Child["Subagent (disposable)"]
C1[Fresh context]
C2[Reads 5 files]
C3[Runs 3 commands]
C4[Summary only returned]
end
P1 --> |"'find testing framework'"| C1
C4 --> |"one-paragraph summary"| P2
The child’s entire message history (possibly 30+ tool calls, thousands of tokens of file contents) gets discarded. The parent receives a one-paragraph summary as a normal tool result.
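The pattern fits in a few lines. In this sketch, run_loop stands in for the agent_loop shown earlier and is injected as a parameter so the example doesn't need an API client; the real wiring is surely different:

```python
# Subagent sketch: run a fresh loop on a focused task, keep only the final
# text, and let the child's entire message history be garbage-collected.
def run_subagent(task, run_loop):
    child_messages = [{"role": "user", "content": task}]
    final = run_loop(child_messages)  # child may make many tool calls here
    # Only the last reply survives; child_messages is discarded on return,
    # so the parent never pays for the child's intermediate context.
    return final[-1]["content"]
```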
This maps directly to the multi-agent collaboration pattern I wrote about: decomposition creates focus. A research agent that only researches produces better answers than a generalist juggling research alongside its main task. The difference is that subagents here are disposable by design. No persistent identity, no message bus. Spawn, summarize, discard.
Context window management is the #1 silent killer of agent quality. Every file read, every command output, every search result eats into the AI’s working memory. When that memory fills up, the AI starts making worse decisions. Subagents are the fix.
The outcome: Agent systems that isolate research from execution produce higher quality results on complex tasks. Your main agent makes better decisions because it only sees clean summaries, not raw data dumps.
Knowledge Loads on Demand, Not Upfront
Claude Code doesn’t stuff its system prompt with every possible instruction. Instead, it uses a two-layer approach:
Layer 1 (System prompt, always loaded):
┌──────────────────────────────────┐
│ You are a coding agent. │
│ Skills available: │
│ - git: Git workflow helpers │ ~100 tokens per skill
│ - test: Testing best practices │
│ - review: Code review process │
└──────────────────────────────────┘
Layer 2 (On demand, via tool_result):
┌──────────────────────────────────┐
│ <skill name="git"> │
│ Full git workflow instructions │
│ Branching conventions... │ ~2,000 tokens per skill
│ Commit message format... │
│ </skill> │
└──────────────────────────────────┘
Layer one: a lightweight menu of available skills. Costs almost nothing. Layer two: the full instructions for a specific skill, loaded only when the AI calls load_skill("git").
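A sketch of how the two layers might be wired up, assuming skills live as markdown files on disk. The file layout and the SKILLS structure are illustrative, not the leaked code:

```python
from pathlib import Path

# Assumed registry: one short summary (layer 1) and one file of full
# instructions (layer 2) per skill.
SKILLS = {
    "git": {"summary": "Git workflow helpers", "path": Path("skills/git.md")},
    "test": {"summary": "Testing best practices", "path": Path("skills/test.md")},
}

def skill_menu():
    """Layer 1: roughly one line per skill, always in the system prompt."""
    return "\n".join(f"- {name}: {s['summary']}" for name, s in SKILLS.items())

def load_skill(name):
    """Layer 2: full instructions, returned as a tool_result on demand."""
    skill = SKILLS.get(name)
    if skill is None:
        return f"Error: unknown skill '{name}'"
    return f'<skill name="{name}">\n{skill["path"].read_text()}\n</skill>'
```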
Think of it like a restaurant. The waiter doesn’t memorize every recipe. They know the menu. When you order, they relay the specific dish to the kitchen.
The math makes this obvious. 10 skills at 2,000 tokens each = 20,000 tokens permanently in the system prompt. With on-demand loading, you spend ~1,000 tokens on the menu and load only the 1-2 skills you actually need per session, for roughly 3,000-5,000 tokens total. That's a 75-85% token reduction.
The outcome: Lower costs, faster responses, and the AI stays focused on what’s relevant to the current task rather than wading through 20,000 tokens of instructions it won’t use.
Memory Doesn’t Have to Be Infinite. Just Smart.
This is the pattern that makes Claude Code usable on large codebases. Three compression layers work together:
%%{init: {"layout": "dagre"}}%%
flowchart TB
TR[Tool Result] --> L1["Layer 1: Micro-compact\n(every turn, silent)"]
L1 --> Check{"Tokens\n> 50,000?"}
Check --> |no| Continue[Continue normally]
Check --> |yes| L2["Layer 2: Auto-compact\n(save transcript, summarize)"]
L2 --> Fresh[Fresh context + summary]
Manual["Layer 3: Model calls\ncompact tool"] --> L2
- Micro-compact: Every turn, tool results older than 3 rounds get replaced with placeholders like `[Previous: used read_file]`. The AI still knows it read the file. The raw contents are gone.
- Auto-compact: When token count crosses 50,000, the full transcript gets saved to disk (`.transcripts/`), then an LLM call summarizes the conversation into a few paragraphs. All messages get replaced with that summary.
- Manual compact: The AI can trigger compression itself when it knows a big task is coming.
This is the agentic equivalent of context window checkpointing, one of the 15 production patterns I wrote about. The principle is identical: periodically distill state into a compact summary, continue with a fresh window. Transcripts on disk mean nothing is truly lost. Just moved out of active context.
Without compression, an agent hits its memory limit after reading ~30 files. That’s not enough for real work. With these three layers, it can work indefinitely.
The outcome: The teams building production agents that handle multi-hour sessions all implement some version of this compression stack. It’s not optional infrastructure. It’s table stakes for any agent that needs to work on real projects.
Tasks Survive Crashes. Conversations Don’t.
Claude Code stores its task list as files on disk, not in the AI’s memory. Each task is a JSON file with status, dependencies, and ownership. When one task completes, it automatically unblocks dependent tasks.
%%{init: {"layout": "dagre"}}%%
flowchart TB
T1["Task 1\ncompleted"] --> T2["Task 2\npending (unblocked)"]
T1 --> T3["Task 3\npending (unblocked)"]
T2 --> T4["Task 4\nblocked by 2,3"]
T3 --> T4
This is a DAG (directed acyclic graph) persisted to disk:
.tasks/
task_1.json {"id":1, "status":"completed", "blockedBy":[]}
task_2.json {"id":2, "status":"pending", "blockedBy":[]}
task_3.json {"id":3, "status":"pending", "blockedBy":[]}
task_4.json {"id":4, "status":"blocked", "blockedBy":[2,3]}
Completing task 1 triggers _clear_dependency(1), which removes 1 from every other task’s blockedBy list. Tasks 2 and 3 become unblocked automatically. The graph answers three questions at any moment: what’s ready, what’s blocked, what’s done.
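That unblocking step can be sketched as a rewrite over the task files. This mirrors the `_clear_dependency` behavior described above, with assumed file contents:

```python
import json
from pathlib import Path

# Dependency-clearing sketch: when a task finishes, remove its id from
# every other task's blockedBy list; a task whose list empties becomes
# pending and can be claimed.
def clear_dependency(done_id, tasks_dir):
    for path in Path(tasks_dir).glob("task_*.json"):
        task = json.loads(path.read_text())
        if done_id in task["blockedBy"]:
            task["blockedBy"].remove(done_id)
            if not task["blockedBy"]:
                task["status"] = "pending"  # ready to be claimed
            path.write_text(json.dumps(task))
```

Because the state lives in files rather than in memory, a crashed agent loses nothing: the next scan of the directory sees exactly the same graph.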
If the AI crashes, forgets, or gets compressed, the task board is still intact on disk. The agent can pick up exactly where it left off.
The outcome: Agent systems with persistent task graphs handle interruptions gracefully. Users can close their laptop, restart the tool, even switch machines, and the work continues. This is the difference between a demo and a product.
Agents That Find Their Own Work
Here’s where it gets interesting. In earlier patterns, the lead agent assigns every task manually. That doesn’t scale. In the leaked architecture, autonomous teammates scan the task board themselves, claim unclaimed tasks, and work on them without being told.
The lifecycle:
%%{init: {"layout": "dagre"}}%%
stateDiagram-v2
[*] --> Spawn
Spawn --> Working
Working --> Idle : task done or idle signal
Idle --> Working : inbox message or unclaimed task found
Idle --> Shutdown : 60s timeout, nothing to do
During the idle phase, a teammate polls every 5 seconds:
- Check inbox for messages from other agents
- Scan `.tasks/` for pending, unowned, unblocked tasks
- If found, claim it and resume working
- If 60 seconds pass with nothing, shut down gracefully
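The polling lifecycle above can be sketched as follows. The inbox and board checks are injected as callables, and the clock is parameterized; both are assumptions made for testability, not how the leaked code is structured:

```python
import time

# Idle-phase sketch: poll every 5 seconds for work, shut down after 60
# seconds of nothing. Returns the claimed work item, or None on timeout.
def idle_loop(check_inbox, scan_board, poll=5, timeout=60,
              sleep=time.sleep, now=time.monotonic):
    deadline = now() + timeout
    while now() < deadline:
        work = check_inbox() or scan_board()  # inbox takes priority
        if work is not None:
            return work   # transition back to Working
        sleep(poll)
    return None           # nothing to do: shut down gracefully
```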
One subtlety: after context compression, the agent might forget who it is. The harness re-injects an identity block (<identity>You are 'alice', role: coder</identity>) whenever messages drop below a threshold. Identity survives compression.
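A sketch of that re-injection guard, with an assumed message-count threshold:

```python
# Identity re-injection sketch: when compression shrinks the history below
# a floor, prepend the identity block again (unless it's already present)
# so the teammate still knows its name and role.
def ensure_identity(messages, name, role, floor=4):
    tag = f"<identity>You are '{name}', role: {role}</identity>"
    already_present = any(
        isinstance(m.get("content"), str) and tag in m["content"]
        for m in messages
    )
    if len(messages) < floor and not already_present:
        messages.insert(0, {"role": "user", "content": tag})
    return messages
```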
I explored the workflow vs agent distinction in a previous post. The key test: “Who controls the steps and their order?” In this pattern, the answer is fully autonomous. No human assigns work. No lead agent delegates. Agents discover and claim work from a shared board.
The outcome: Teams report that self-organizing agent pools complete backlogs 2-3x faster than lead-directed approaches. The coordination cost disappears because there is no coordinator.
Multiple Agents, Zero Collisions
The final and most sophisticated pattern: when multiple AI agents work on the same codebase simultaneously, each one gets its own isolated directory (a git worktree). Agent A edits authentication in one copy. Agent B fixes the UI in another. Their changes never interfere.
%%{init: {"layout": "dagre"}}%%
flowchart TB
subgraph Control[".tasks/ (what to do)"]
T1["task_1.json\nworktree: auth-refactor"]
T2["task_2.json\nworktree: ui-login"]
end
subgraph Execution[".worktrees/ (where to do it)"]
W1["auth-refactor/\nbranch: wt/auth-refactor"]
W2["ui-login/\nbranch: wt/ui-login"]
end
T1 <--> |"bound by task_id"| W1
T2 <--> |"bound by task_id"| W2
The task board tracks what needs doing. Worktrees track where each agent is doing it. They’re linked by task ID. Creating a worktree with task_id=1 auto-advances that task to in_progress. Removing a worktree with complete_task=True marks the task done and emits a lifecycle event to .worktrees/events.jsonl.
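Creating the isolated checkout is plain git under the hood. A sketch using the directory and branch naming from the diagram above, with the task-board update omitted:

```python
import os
import subprocess

# Worktree-per-agent sketch: each agent gets its own checkout on its own
# branch, so parallel edits can never collide in the working tree.
def create_worktree(repo, name):
    path = os.path.join(repo, ".worktrees", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", path, "-b", f"wt/{name}"],
        check=True,
    )
    return path
```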
Without isolation, two agents editing the same file at the same time will corrupt each other’s work. It’s the AI equivalent of two people typing in the same Google Doc and deleting each other’s paragraphs.
This connects to the blast radius limiter pattern from my production agents writeup. Worktrees are containment at the filesystem level. Each agent’s blast radius is limited to its own directory. A bad edit in one worktree can’t affect another.
The outcome: Teams running parallel AI agents with worktree isolation report completing feature work 2-3x faster than serial approaches. The architecture supports it safely because each agent has its own sandbox.
The Real Lesson: Boring Beats Clever
Here’s what the leak ultimately taught the developer community. The most capable AI coding agent in production isn’t powered by a secret algorithm. It’s powered by patterns that have existed in software engineering for decades:
| Pattern | What It Does in Claude Code | Old World Equivalent |
|---|---|---|
| Agent loop | Runs tools until done | Event loop (Node.js, game engines) |
| Tool dispatch | Routes tool calls to handlers | Plugin architecture |
| TodoWrite | Plans before executing | Project management boards |
| Subagents | Isolates research from main thread | Microservices with clean interfaces |
| Skill loading | Loads instructions on demand | Lazy loading, dependency injection |
| Context compact | Compresses memory when full | Garbage collection |
| Task graph | Persists goals with dependencies | Database-backed job queues |
| Background tasks | Runs slow ops in parallel | Thread pools, async workers |
| Agent teams | Coordinates persistent teammates | Message queues, pub/sub |
| Protocols | Structured handshakes between agents | RPC, request-response patterns |
| Autonomous agents | Self-assign from a shared board | Work-stealing schedulers |
| Worktree isolation | Isolates each agent’s filesystem | Container isolation, sandboxing |
None of these ideas are new. What’s new is applying them to AI agents in a cohesive, composable system where each layer builds on the previous one without replacing it.
The takeaway: You don’t need a PhD in machine learning to build production-grade AI agents. You need solid software engineering fundamentals and the discipline to start simple.
Where to Start
If you’re building agents today, the leak gives you a clear progression:
- Week one: Build the loop. One tool (bash or an API call), one exit condition. Get something working end-to-end.
- Week two: Add a tool dispatch map. Register 3-4 tools without touching the loop.
- Week three: Add a task checklist so the agent plans before it acts.
- Week four: Add context compression so the agent can work on real-sized projects.
Everything else (subagents, skills, team coordination, worktrees) layers on top of those four foundations without replacing them.
The Bottom Line
The Claude Code leak revealed that the gap between hobbyist agents and production agents isn’t secret technology. It’s engineering discipline. A simple loop, clean tool boundaries, persistent state, and smart memory management.
The blueprint is public now. The 12 patterns compose into each other. Each one solves a specific, real problem that every agent builder hits eventually. The question isn’t whether you’ll need these patterns. It’s whether you’ll discover them the hard way or learn from the leak.
The best agents aren’t the cleverest. They’re the most composable.
Building AI agents and want to discuss these patterns? I’d love to hear which ones changed your approach. Reach out on LinkedIn.