The AI Agent Stack: From Sandboxes to Swarms
TL;DR: AI coding tools stack into four layers — LLM providers, coding agents, agent runtimes, and orchestrators — and most discourse conflates them all. Overstory and Gastown solve coordination at the top. Claude Code and Crush solve coding in the middle. IronClaw solves trust at the bottom. They don't compete. They compose.
How Overstory, Gastown, Crush, OpenCode, and the Claw Wars map to four distinct layers of the AI coding agent landscape.
Quick Orientation
| If you care about... | Look at... |
|---|---|
| Coordinating 10+ AI agents on one codebase | Overstory or Gastown (Layer 4) |
| A single AI agent that writes code well | Claude Code, Crush, or OpenCode (Layer 3) |
| Preventing your agent from stealing your AWS keys | IronClaw (Layer 2) |
| The model behind the agent | Claude, GPT, Gemini, local models (Layer 1) |
These tools don't compete. They stack. Most people conflate them because they all involve "AI writing code," but they're solving fundamentally different problems at different layers. This post maps the landscape.
The Four-Layer Model
┌─────────────────────────────────────────────────────────┐
│ Layer 4: ORCHESTRATION │
│ "Who does what, when, and how does it merge?" │
│ Overstory, Gastown, Claude Code Agent Teams │
├─────────────────────────────────────────────────────────┤
│ Layer 3: CODING AGENT │
│ "One agent, one session, write some code" │
│ Claude Code, Crush, OpenCode, Cursor, Codex CLI │
├─────────────────────────────────────────────────────────┤
│ Layer 2: AGENT RUNTIME │
│ "Can I trust this tool not to exfiltrate my data?" │
│ IronClaw, OpenClaw, ZeroClaw │
├─────────────────────────────────────────────────────────┤
│ Layer 1: LLM PROVIDER │
│ "Raw intelligence" │
│ Claude, GPT, Gemini, Llama, Ollama │
└─────────────────────────────────────────────────────────┘
Most discourse treats everything in this stack as "AI coding tools" and tries to compare them head-to-head. That's like comparing Kubernetes to React because they both "run JavaScript." Let's look at what each layer actually does.
Layer 4: The Orchestrators
This is where things get interesting. You have one codebase, ten agents, and they all need to write code without destroying each other's work. Two systems dominate this space today.
Overstory
Built by: jayminwest | Language: TypeScript/Bun | Deps: Zero runtime npm dependencies
Overstory turns your Claude Code session into the orchestrator. There's no separate daemon. Your session IS the brain. It spawns workers via tmux into isolated git worktrees and coordinates them through a custom SQLite mail system.
Architecture:
Your Claude Code Session (orchestrator)
├── overstory sling lead → Team Lead (tmux + worktree)
│ ├── overstory sling builder → Builder (tmux + worktree)
│ ├── overstory sling scout → Scout (tmux + worktree, read-only)
│ └── overstory sling reviewer → Reviewer (tmux + worktree, read-only)
└── overstory sling lead → Another Team Lead
└── ...
What makes it tick:
| Component | Implementation |
|---|---|
| Agent spawning | overstory sling creates worktree + tmux + CLAUDE.md overlay |
| Messaging | Custom SQLite mail (WAL mode, ~1-5ms queries, typed messages) |
| Isolation | Git worktrees — each agent gets its own branch and directory |
| Merge pipeline | 4-tier: clean merge → auto-resolve → AI-resolve → reimagine |
| Health monitoring | 3-tier watchdog: mechanical daemon → AI triage → monitor agent |
| Expertise | Mulch integration — agents accumulate domain knowledge across sessions |
| Observability | events.db + trace + replay + feed + dashboard + inspect + costs |
| Hierarchy | Depth-limited tree (coordinator → lead → worker), max depth 2 |
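To make the messaging row concrete, here is a minimal sketch of a typed mailbox on SQLite in WAL mode. It is not Overstory's schema: the table name, column names, message kinds, and agent names are all assumptions chosen for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: names and message kinds are illustrative, not Overstory's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mail (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL,
    kind      TEXT NOT NULL CHECK (kind IN ('task', 'status', 'question', 'result')),
    body      TEXT NOT NULL,
    read      INTEGER NOT NULL DEFAULT 0
);
"""


def open_mailbox(path):
    """Open (or create) the mail database with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.executescript(SCHEMA)
    return conn


def send(conn, sender, recipient, kind, body):
    """Append one typed message; the CHECK constraint rejects unknown kinds."""
    with conn:
        conn.execute(
            "INSERT INTO mail (sender, recipient, kind, body) VALUES (?, ?, ?, ?)",
            (sender, recipient, kind, body),
        )


def fetch_unread(conn, recipient):
    """Return unread messages for one agent, oldest first, and mark them read."""
    with conn:
        rows = conn.execute(
            "SELECT id, sender, kind, body FROM mail "
            "WHERE recipient = ? AND read = 0 ORDER BY id",
            (recipient,),
        ).fetchall()
        conn.executemany("UPDATE mail SET read = 1 WHERE id = ?", [(r[0],) for r in rows])
    return [(sender, kind, body) for _, sender, kind, body in rows]


# WAL needs a file-backed database, so the demo uses a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "mail.db")
conn = open_mailbox(db_path)
send(conn, "lead-1", "builder-1", "task", "implement src/auth/login.ts")
send(conn, "builder-1", "lead-1", "result", "done, branch builder-1/auth")
inbox = fetch_unread(conn, "builder-1")
```

Because every agent process opens the same file, a plain database doubles as the message bus, and the `kind` column is one simple way to keep messages typed.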
The two-layer instruction model is clever: each agent gets a base definition (the HOW — agents/builder.md describes what a builder does) plus a per-task overlay CLAUDE.md (the WHAT — task ID, file scope, spec path, branch name). The orchestrator only passes WHAT. The base definition already has HOW.
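The split between HOW and WHAT can be sketched as a small renderer that writes only the per-task fields and points at the base definition rather than copying it. The field names, overlay layout, task ID, and paths below are assumptions for illustration, not Overstory's actual overlay format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """The per-task WHAT. Field names are illustrative."""
    task_id: str
    file_scope: tuple[str, ...]
    spec_path: str
    branch: str


def render_overlay(role, task):
    """Build overlay text that references the base definition instead of repeating it."""
    scope = "\n".join(f"- {path}" for path in task.file_scope)
    return (
        f"# Task {task.task_id}\n"
        f"Role definition: agents/{role}.md\n"
        f"Spec: {task.spec_path}\n"
        f"Branch: {task.branch}\n"
        f"File scope (touch nothing else):\n{scope}\n"
    )


overlay = render_overlay(
    "builder",
    Task(
        task_id="task-42",  # hypothetical ID
        file_scope=("src/auth/login.ts", "src/auth/session.ts"),
        spec_path="specs/auth.md",
        branch="builder-1/task-42",
    ),
)
print(overlay)
```

Keeping the overlay this small means the orchestrator's context is spent on task details, while role behavior lives in one file per role.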
The merge pipeline is the most sophisticated in any orchestrator I've seen. Four escalation tiers:
- Clean merge: git merge --no-ff, zero conflicts. Done.
- Auto-resolve: Parse conflict markers, keep the agent's changes.
- AI-resolve: Claude analyzes the conflicts with mulch history context and generates merged content.
- Reimagine: Nuclear option — abort the merge, replay the agent's changes from scratch on a fresh canonical state.
If tier N fails, it escalates to tier N+1. Conflict history from mulch informs which tiers to skip (if this file always fails at tier 2, start at tier 3).
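The escalation logic itself is small. The sketch below shows one way it could look, with stub tiers standing in for the real merge strategies; the function names, the history format (file path mapped to the highest tier index that keeps failing), and the branch name are assumptions, not Overstory's code.

```python
def merge_with_escalation(branch, tiers, start_at=0):
    """Try tiers in order from start_at; return (tier_name, result) for the first success."""
    for name, attempt in tiers[start_at:]:
        result = attempt(branch)
        if result is not None:  # None means "this tier could not merge"
            return name, result
    raise RuntimeError(f"all merge tiers failed for {branch}")


def starting_tier(history, path, tier_count):
    """Start one tier above the highest tier recorded as failing for this file."""
    highest_failed = history.get(path, -1)
    return min(highest_failed + 1, tier_count - 1)


# Stub tiers: each takes a branch name and returns merged content or None.
tiers = [
    ("clean-merge", lambda branch: None),                # stub: conflicts exist
    ("auto-resolve", lambda branch: None),               # stub: markers not resolvable
    ("ai-resolve", lambda branch: f"merged:{branch}"),   # stub: model produced a merge
    ("reimagine", lambda branch: f"replayed:{branch}"),  # stub: replay on fresh state
]

# Hypothetical conflict history: tier index 1 (auto-resolve) keeps failing here.
history = {"src/auth/login.ts": 1}
start = starting_tier(history, "src/auth/login.ts", len(tiers))
outcome = merge_with_escalation("builder-1/task-42", tiers, start_at=start)
```

Here the history lookup skips the first two tiers, so the pipeline goes straight to the AI-resolve stub instead of repeating attempts that are known to fail for that file.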
Gastown (Steve Yegge)
Built by: Steve Yegge | Language: Go | Data: Dolt (git-backed database)
Gastown uses theatrical metaphors for the same fundamental primitives:
| Gastown term | Overstory equivalent | What it is |
|---|---|---|
| Mayor | Orchestrator session | Your primary Claude Code with full context |
| Polecats | Builder/Scout/Reviewer | Worker agents with persistent identity |
| Rigs | .overstory/ per-project | Project containers wrapping git repos |
| Hooks | Git worktrees | Persistent storage that survives crashes |
| Convoys | Groups | Bundles of work items assigned to agents |
| Beads | Beads | Issue tracking (both use bd) |
Shared DNA: Both systems use beads (bd) for issue tracking, tmux for agent sessions, git worktrees for isolation, and CLAUDE.md hooks for agent instructions. They diverged on storage (SQLite vs Dolt), language (TypeScript vs Go), and orchestration philosophy.
Key differences:
| | Overstory | Gastown |
|---|---|---|
| Hierarchy | Explicit depth-limited tree | Flatter (Mayor → Polecat) |
| Merge | 4-tier escalation pipeline | Git-based, less formalized |
| Expertise | Mulch integration (persistent domain knowledge) | Not mentioned |
| Observability | 7+ query tools (trace, replay, feed, etc.) | Beads-based tracking |
| Multi-runtime | Claude Code only | Claude Code + Codex |
| Scale target | Structured specialist teams | 20-30 agents comfortably |
| Runtime deps | Zero | Go modules |
The relationship: They're siblings, not competitors. Same parents (beads, worktrees, tmux, CLAUDE.md), different upbringings. Gastown is broader and flatter (multi-runtime, more agents); Overstory is deeper and more structured (typed hierarchy, 4-tier merge, integrated expertise, richer observability).
Both represent what Paddo's blog calls "operational multi-agent" — agents with external state management and git worktrees for isolation, as opposed to BMAD-style "SDLC theater" that recreates human organizational bottlenecks with sequential persona handoffs.
Layer 3: The Coding Agents
These are the individual agents that actually write code. One session, one agent, one conversation. The orchestrators in Layer 4 spawn many of these.
Claude Code (Anthropic)
The 800-pound gorilla. Terminal-based, Claude-only, with an experimental built-in Agent Teams feature that's disabled by default. Agent Teams uses Claude Code's own Task tool to spawn subagents — simpler than Overstory/Gastown but without external state persistence, worktree isolation, or merge pipelines. If a session crashes, the coordination state dies with it.
Crush (Charm)
Language: Go | TUI: Bubble Tea | Provider abstraction: Fantasy library
Built by the Charm team (Bubble Tea, Lip Gloss, Glamour) after they recruited the original OpenCode (Go) creator. The coordinator pattern is the architectural core: a centralized hub managing per-session FIFO queues, routing prompts to agents with injected dependencies, handling OAuth refresh, and coordinating three-layer config merging.
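The per-session FIFO part of that coordinator pattern can be sketched in a few lines. This is Python for illustration only, not Crush's Go code; the class name, method names, and the echo agent are hypothetical, and OAuth refresh and config merging are left out.

```python
from collections import defaultdict, deque


class Coordinator:
    """Central hub holding one FIFO queue of prompts per session."""

    def __init__(self, agent):
        self._agent = agent                # injected dependency: callable(session_id, prompt)
        self._queues = defaultdict(deque)  # session_id -> pending prompts

    def submit(self, session_id, prompt):
        """Queue a prompt behind any earlier prompts for the same session."""
        self._queues[session_id].append(prompt)

    def run_next(self, session_id):
        """Process the oldest pending prompt for a session, or return None if idle."""
        queue = self._queues[session_id]
        if not queue:
            return None
        return self._agent(session_id, queue.popleft())


def echo_agent(session_id, prompt):
    """Stand-in for a real coding agent."""
    return f"[{session_id}] {prompt}"


hub = Coordinator(echo_agent)
hub.submit("s1", "add tests")
hub.submit("s2", "fix lint")
hub.submit("s1", "refactor parser")
first = hub.run_next("s1")
second = hub.run_next("s1")
```

Prompts for one session are handled strictly in arrival order, while other sessions keep their own independent queues.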
Crush has a sub-agent architecture where coder agents can spawn isolated read-only task agents. This is not multi-agent orchestration — it's delegation within a single session. No agent-to-agent messaging, no shared task queues, no merge pipelines.
What Crush does well: LSP integration, MCP server support (stdio/HTTP/SSE), multi-provider model switching, and the most polished terminal UX in the space (it's Charm, after all).
OpenCode (Anomaly Innovations)
Two lives: The archived Go version (Bubble Tea TUI, monolithic) and the active TypeScript rewrite (client/server, Hono API, SolidJS TUI at 60fps, Bun runtime).
The TypeScript version is a full client/server architecture — HTTP API with SSE for real-time updates, SQLite + Drizzle ORM for persistence, Vercel AI SDK for provider abstraction (75+ models). It has a Coder agent, Task agent, and Title agent, but these are isolated sub-agents within a single session. No inter-agent communication or coordination.
The OpenCode → Crush pipeline: The original Go OpenCode creator joined Charm and built Crush. The TypeScript rewrite is maintained by a different team (Anomaly Innovations). So OpenCode and Crush share architectural philosophy but diverged on language and maintainership.
How They Relate to Layer 4
Simple: Layer 3 agents are the worker processes that Layer 4 orchestrators spawn and coordinate. When Overstory runs overstory sling builder, it creates a tmux session running Claude Code. If Gastown supported Crush as a runtime, it would create a tmux session running Crush instead.
Layer 3 tools don't know they're being orchestrated. They just see a CLAUDE.md with instructions and get to work.
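In shell terms, spawning a worker comes down to two commands: create a git worktree on a new branch, then start a detached tmux session in it. The sketch below only builds those commands and does not run them. The directory layout, branch naming, and repository path are assumptions, and the real `overstory sling` does more (for example writing the CLAUDE.md overlay).

```python
import shlex


def spawn_commands(repo, agent_name, base_branch="main", agent_cmd="claude"):
    """Return argv lists that would create a worktree and a detached tmux session."""
    worktree = f"{repo}/.worktrees/{agent_name}"  # hypothetical layout
    branch = f"agents/{agent_name}"               # hypothetical branch naming
    return [
        # New branch checked out in its own directory, based on base_branch.
        ["git", "-C", repo, "worktree", "add", "-b", branch, worktree, base_branch],
        # Detached session named after the agent, started inside the worktree.
        ["tmux", "new-session", "-d", "-s", agent_name, "-c", worktree, agent_cmd],
    ]


for argv in spawn_commands("/home/dev/project", "builder-1"):
    print(shlex.join(argv))
```

Swapping `agent_cmd` for another CLI is all it would take at this level to run a different Layer 3 agent in the same slot.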
Layer 2: The Agent Runtimes (The Claw Wars)
This layer asks a different question entirely: not "how do agents coordinate?" but "can I trust what this agent's tools are doing?"
When Claude Code runs a bash command or writes a file, what prevents it from reading your .env, exfiltrating credentials to an external server, or rm -rf-ing your home directory? Claude Code has built-in safety checks and user confirmation prompts. The Claws argue that's not enough.
IronClaw (NEAR AI / Illia Polosukhin)
Language: Rust | Binary: 3.4MB | Startup: <10ms | Idle RAM: ~7.8MB
The security-first contender. Every tool runs in an isolated WebAssembly sandbox with capability-based permissions inspired by the seL4 microkernel.
The killer feature: host-boundary credential injection.
Traditional approach:
Tool receives API key → Tool makes request → Hope tool doesn't leak it
IronClaw approach:
Tool requests HTTP (no auth) → Host injects credentials at boundary →
Leak detection scans I/O → Tool receives sanitized response
Tools never possess credentials. They can't leak what they don't have. The host (IronClaw runtime) injects secrets into outbound requests and strips them from responses. Aho-Corasick pattern matching catches 15+ credential formats.
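A toy version of the host-boundary idea is sketched below, under loud assumptions: the class names, the origin-keyed secret map, and the example URL and key are invented for illustration, and a plain substring scan stands in for the Aho-Corasick matcher. It is not IronClaw's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    url: str
    headers: dict = field(default_factory=dict)


class Host:
    """Owns the secrets; tools only ever build Request objects without auth."""

    def __init__(self, secrets_by_origin):
        self._secrets = secrets_by_origin  # origin -> secret, never handed to tools

    def _secret_for(self, url):
        for origin, secret in self._secrets.items():
            if url.startswith(origin + "/"):
                return secret
        return None

    def send(self, request, transport):
        """Inject credentials, call the transport, and redact any secret echoed back."""
        outbound = Request(request.url, dict(request.headers))  # copy: tool's object stays clean
        secret = self._secret_for(request.url)
        if secret is not None:
            outbound.headers["Authorization"] = f"Bearer {secret}"
        response = transport(outbound)
        for value in self._secrets.values():  # naive scan, stands in for Aho-Corasick
            response = response.replace(value, "[REDACTED]")
        return response


def leaky_transport(req):
    """Simulates a server that echoes the auth header back in its body."""
    return f"ok; you sent {req.headers.get('Authorization', 'nothing')}"


host = Host({"https://api.example.com": "sk-test-123"})  # hypothetical origin and key
tool_request = Request("https://api.example.com/v1/items")
reply = host.send(tool_request, leaky_transport)
```

Even with a server that echoes the header back, the tool sees only the redacted reply, and its own request object never carries the secret.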
WASM sandbox restrictions:
- No environment variable access
- No arbitrary filesystem access
- No direct network sockets
- No credential vault access
- No process spawning
Tools must hold explicit capability tokens for each permitted action. Rate limiting, memory limits, CPU limits, execution time limits — all enforced at the sandbox boundary.
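Capability checks of this kind can be modeled as a lookup performed before every action. The sketch below covers filesystem paths only; the action names, class names, and granted paths are assumptions, and a real WASM host would enforce this at the sandbox boundary rather than in Python.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Capability:
    """Permission for one action under one path prefix."""
    action: str    # hypothetical action names, e.g. "fs.read", "fs.write"
    resource: str  # directory the action is limited to


class Sandbox:
    def __init__(self, capabilities):
        self._caps = frozenset(capabilities)

    def check(self, action, resource):
        """Raise PermissionError unless a held capability covers the request."""
        for cap in self._caps:
            if cap.action == action and self._covers(cap.resource, resource):
                return
        raise PermissionError(f"{action} on {resource} not permitted")

    @staticmethod
    def _covers(granted, requested):
        requested_path = PurePosixPath(requested)
        if ".." in requested_path.parts:  # refuse traversal out of the granted directory
            return False
        # Component-wise comparison, so "src/authz" does not match "src/auth".
        return requested_path.is_relative_to(PurePosixPath(granted))


sandbox = Sandbox([Capability("fs.read", "src/auth"), Capability("fs.write", "src/auth")])


def is_allowed(action, resource):
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        sandbox.check(action, resource)
        return True
    except PermissionError:
        return False


print(is_allowed("fs.write", "src/auth/login.ts"))
print(is_allowed("fs.read", ".env"))
```

The default is denial: anything without a matching token, including actions such as process spawning that have no capability defined at all, fails the check.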
The Claw Landscape
Per the head-to-head comparison:
| | OpenClaw | IronClaw | ZeroClaw |
|---|---|---|---|
| Stars | 216K | Small (new) | Small (new) |
| Language | Python | Rust | Go |
| Security | Weak (plaintext creds, 24 vulns) | Strong (WASM sandbox, host-boundary injection) | Medium (encrypted creds, same-process tools) |
| Multi-agent | Yes (only Claw with orchestration) | No | No |
| Binary size | 28MB+ | 3.4MB | 8.8MB |
| Idle RAM | 394MB | ~7.8MB | ~12MB |
| Channels | ~15 | Growing | 23 |
| Providers | ~8 | MCP-based | 30+ |
The paradox: OpenClaw is the only Claw with multi-agent orchestration, but it has the worst security. IronClaw has the best security, but no multi-agent. You can't have both yet.
Recommended path: Harden OpenClaw now for orchestration capabilities, monitor IronClaw's multi-agent roadmap (~2,200 lines of code away per estimates), migrate when it's ready.
Why This Layer is Orthogonal to Overstory
Overstory enforces agent boundaries through instructions: "CLAUDE.md says only touch these files in your FILE_SCOPE." IronClaw enforces them through architecture: WASM physically prevents unauthorized file access.
| Security concern | Overstory's answer | IronClaw's answer |
|---|---|---|
| Agent reads files outside scope | Trust + instructions | WASM capability tokens |
| Tool steals credentials | Not addressed | Host-boundary injection |
| Tool exfiltrates data | Not addressed | Network allowlisting + leak detection |
| Prompt injection | Not addressed | 5-layer sanitization |
| rm -rf / | Claude Code's built-in safety | WASM sandbox (no FS access) |
Meanwhile, IronClaw has zero answers for:
- Agent-to-agent communication
- File conflict avoidance across agents
- Merge pipelines
- Agent health monitoring
- Expertise accumulation
- Agent spawning and scaling
They solve completely different problems. Overstory is urban planning; IronClaw is building codes. You need both for a city, but they're designed by different people for different reasons.
The Dream Stack
Nobody has built this yet, but the layers compose naturally:
┌─────────────────────────────────────────────┐
│ Overstory / Gastown │
│ "10 agents, exclusive file scopes, │
│ SQLite mail, 4-tier merge" │
├─────────────────────────────────────────────┤
│ IronClaw │
│ "Each agent's tools run in WASM sandboxes, │
│ credentials injected at boundary" │
├─────────────────────────────────────────────┤
│ Claude Code / Crush / OpenCode │
│ "Prompt → reason → tool call → code" │
├─────────────────────────────────────────────┤
│ Claude / GPT / Gemini / Local │
│ "Raw inference" │
└─────────────────────────────────────────────┘
Overstory tells agent-3 to build the auth module. IronClaw ensures agent-3's tools can only access src/auth/ and can't leak the database password. Claude Code handles the reasoning loop. Claude provides the intelligence.
Today, Overstory skips Layer 2 entirely and trusts Claude Code's built-in safety. That works for trusted development environments. For production agent deployments — where tools run code from untrusted sources, handle customer data, or access production infrastructure — the Claw Wars matter a lot more.
What I'd Watch
- Overstory's merge pipeline is the most underrated innovation in this space. Everyone talks about spawning agents. Nobody talks about what happens when 10 agents' branches need to merge. The 4-tier escalation with mulch-informed conflict history is genuinely novel.
- IronClaw adding multi-agent. The moment IronClaw ships agent-to-agent messaging and task coordination, it becomes the obvious Layer 2 choice for any orchestrator.
- Claude Code Agent Teams maturing. If Anthropic builds worktree isolation, external state persistence, and merge pipelines into the native Agent Teams feature, the case for external orchestrators weakens significantly.
- Crush or OpenCode as Overstory/Gastown runtimes. Gastown already supports Codex as an alternative runtime. If Crush or OpenCode become viable agent runtimes for orchestrators, you get multi-provider model flexibility at the agent level.
Sources
- Overstory (GitHub) — Multi-agent orchestration for Claude Code
- Gastown (GitHub) — Multi-agent workspace manager
- GasTown and the Two Kinds of Multi-Agent
- OpenCode (GitHub) — Open-source AI coding agent
- Crush (GitHub) — Glamourous AI coding agent
- IronClaw (GitHub) — Security-first agent runtime
- Claude Code Agent Teams Docs