The AI Agent Stack: From Sandboxes to Swarms

Overstory
Tags: overstory, gastown, ironclaw, architecture, comparison

TL;DR: AI coding tools stack into four layers (LLM providers, agent runtimes, coding agents, and orchestrators), and most discourse conflates them all. Overstory and Gastown solve coordination at the top. Claude Code and Crush solve coding in the middle. IronClaw solves trust in the runtime layer beneath the coding agents. They don't compete. They compose.

How Overstory, Gastown, Crush, OpenCode, and the Claw Wars map to four distinct layers of the AI coding agent landscape.

Quick Orientation

If you care about... | Look at...
Coordinating 10+ AI agents on one codebase | Overstory or Gastown (Layer 4)
A single AI agent that writes code well | Claude Code, Crush, or OpenCode (Layer 3)
Preventing your agent from stealing your AWS keys | IronClaw (Layer 2)
The model behind the agent | Claude, GPT, Gemini, local models (Layer 1)

These tools don't compete. They stack. Most people conflate them because they all involve "AI writing code," but they're solving fundamentally different problems at different layers. This post maps the landscape.


The Four-Layer Model

┌─────────────────────────────────────────────────────────┐
│  Layer 4: ORCHESTRATION                                 │
│  "Who does what, when, and how does it merge?"          │
│  Overstory, Gastown, Claude Code Agent Teams            │
├─────────────────────────────────────────────────────────┤
│  Layer 3: CODING AGENT                                  │
│  "One agent, one session, write some code"              │
│  Claude Code, Crush, OpenCode, Cursor, Codex CLI        │
├─────────────────────────────────────────────────────────┤
│  Layer 2: AGENT RUNTIME                                 │
│  "Can I trust this tool not to exfiltrate my data?"     │
│  IronClaw, OpenClaw, ZeroClaw                           │
├─────────────────────────────────────────────────────────┤
│  Layer 1: LLM PROVIDER                                  │
│  "Raw intelligence"                                     │
│  Claude, GPT, Gemini, Llama, Ollama                     │
└─────────────────────────────────────────────────────────┘

Most discourse treats everything in this stack as "AI coding tools" and tries to compare them head-to-head. That's like comparing Kubernetes to React because they both "run JavaScript." Let's look at what each layer actually does.


Layer 4: The Orchestrators

This is where things get interesting. You have one codebase, ten agents, and they all need to write code without destroying each other's work. Two systems dominate this space today.

Overstory

Built by: jayminwest | Language: TypeScript/Bun | Deps: Zero runtime npm dependencies

Overstory turns your Claude Code session into the orchestrator. There's no separate orchestrator daemon: your session IS the brain. It spawns workers via tmux into isolated git worktrees and coordinates them through a custom SQLite mail system.

Architecture:

Your Claude Code Session (orchestrator)
  ├── overstory sling lead → Team Lead (tmux + worktree)
  │     ├── overstory sling builder → Builder (tmux + worktree)
  │     ├── overstory sling scout → Scout (tmux + worktree, read-only)
  │     └── overstory sling reviewer → Reviewer (tmux + worktree, read-only)
  └── overstory sling lead → Another Team Lead
        └── ...
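The tree above is depth-limited: Overstory caps it at coordinator → lead → worker (max depth 2), and scouts and reviewers are read-only. A minimal sketch of how such spawn rules could be enforced follows; `spawnChild`, the allowed-children map, and the field names are hypothetical, not Overstory's code.

```typescript
// Hypothetical model of a depth-limited agent tree; not Overstory's real code.
type Role = "coordinator" | "lead" | "builder" | "scout" | "reviewer";

interface AgentNode {
  agentName: string;
  role: Role;
  depth: number; // coordinator = 0, lead = 1, worker = 2
  readOnly: boolean;
}

const MAX_DEPTH = 2;

// Which roles each role may spawn, following the tree above.
const ALLOWED_CHILDREN: Record<Role, readonly Role[]> = {
  coordinator: ["lead"],
  lead: ["builder", "scout", "reviewer"],
  builder: [],
  scout: [],
  reviewer: [],
};

// Scouts and reviewers get read-only worktrees.
const READ_ONLY_ROLES: ReadonlySet<Role> = new Set<Role>(["scout", "reviewer"]);

function spawnChild(from: AgentNode, role: Role, agentName: string): AgentNode {
  const depth = from.depth + 1;
  // Depth cap as a backstop, even though workers have no allowed children.
  if (depth > MAX_DEPTH) {
    throw new Error(`${agentName}: depth ${depth} exceeds max depth ${MAX_DEPTH}`);
  }
  if (!ALLOWED_CHILDREN[from.role].includes(role)) {
    throw new Error(`${from.role} may not spawn ${role}`);
  }
  return { agentName, role, depth, readOnly: READ_ONLY_ROLES.has(role) };
}

const session: AgentNode = { agentName: "session", role: "coordinator", depth: 0, readOnly: false };
const teamLead = spawnChild(session, "lead", "lead-1");
const scout1 = spawnChild(teamLead, "scout", "scout-1");
console.log(scout1.depth, scout1.readOnly); // 2 true
```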

What makes it tick:

Component | Implementation
Agent spawning | overstory sling creates worktree + tmux + CLAUDE.md overlay
Messaging | Custom SQLite mail (WAL mode, ~1-5ms queries, typed messages)
Isolation | Git worktrees: each agent gets its own branch and directory
Merge pipeline | 4-tier: clean merge → auto-resolve → AI-resolve → reimagine
Health monitoring | 3-tier watchdog: mechanical daemon → AI triage → monitor agent
Expertise | Mulch integration: agents accumulate domain knowledge across sessions
Observability | events.db + trace + replay + feed + dashboard + inspect + costs
Hierarchy | Depth-limited tree (coordinator → lead → worker), max depth 2
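The messaging row describes typed messages stored in SQLite. In TypeScript, "typed" maps naturally onto a discriminated union. This sketch assumes an in-memory array in place of the WAL-mode SQLite table, and the message kinds (`dispatch`, `status`, `merge_ready`) and their fields are hypothetical rather than Overstory's schema.

```typescript
// Hypothetical typed-mail sketch; an in-memory array stands in for the SQLite table.
type MailMessage =
  | { kind: "dispatch"; from: string; to: string; taskId: string; specPath: string }
  | { kind: "status"; from: string; to: string; taskId: string; progress: number }
  | { kind: "merge_ready"; from: string; to: string; taskId: string; branch: string };

interface MailRow {
  id: number;
  read: boolean;
  msg: MailMessage;
}

class Mailbox {
  private readonly rows: MailRow[] = [];
  private nextId = 1;

  send(msg: MailMessage): number {
    const id = this.nextId++;
    this.rows.push({ id, read: false, msg });
    return id;
  }

  // Unread messages for one agent, oldest first; checking marks them read.
  check(agent: string): MailMessage[] {
    const unread = this.rows.filter((row) => !row.read && row.msg.to === agent);
    for (const row of unread) row.read = true;
    return unread.map((row) => row.msg);
  }
}

// Switching on `kind` narrows the type, so each branch sees only its own fields.
function summarize(msg: MailMessage): string {
  switch (msg.kind) {
    case "dispatch":
      return `${msg.to}: start ${msg.taskId} (spec ${msg.specPath})`;
    case "status":
      return `${msg.from}: ${msg.taskId} at ${msg.progress}%`;
    case "merge_ready":
      return `${msg.from}: ${msg.branch} ready to merge`;
    default: {
      const unreachable: never = msg;
      throw new Error(`unknown message: ${JSON.stringify(unreachable)}`);
    }
  }
}

const mail = new Mailbox();
mail.send({ kind: "dispatch", from: "lead-1", to: "builder-1", taskId: "t-42", specPath: "specs/t-42.md" });
for (const msg of mail.check("builder-1")) console.log(summarize(msg));
// builder-1: start t-42 (spec specs/t-42.md)
```

Adding a new message kind without handling it in `summarize` becomes a compile error, which is the main payoff of typing the mail.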

The two-layer instruction model is clever: each agent gets a base definition (the HOW — agents/builder.md describes what a builder does) plus a per-task overlay CLAUDE.md (the WHAT — task ID, file scope, spec path, branch name). The orchestrator only passes WHAT. The base definition already has HOW.
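A sketch of the HOW/WHAT split, assuming the overlay simply points at the role's base definition instead of repeating it. Only `agents/builder.md` and FILE_SCOPE come from the text; the other file names, the `TaskSpec` fields, and the overlay layout are hypothetical.

```typescript
// Hypothetical sketch of the two-layer instruction model; not Overstory's real code.
// HOW: one reusable base definition per role, written once.
const BASE_DEFINITIONS = {
  builder: "agents/builder.md",
  scout: "agents/scout.md", // hypothetical path
  reviewer: "agents/reviewer.md", // hypothetical path
} as const;

type WorkerRole = keyof typeof BASE_DEFINITIONS;

// WHAT: the only data the orchestrator supplies per task.
interface TaskSpec {
  taskId: string;
  fileScope: string[];
  specPath: string;
  branch: string;
}

// Render the per-task CLAUDE.md overlay; it references the base definition
// rather than restating the role's instructions.
function renderOverlay(role: WorkerRole, task: TaskSpec): string {
  return [
    `# Task ${task.taskId}`,
    `Role instructions: ${BASE_DEFINITIONS[role]}`,
    `Spec: ${task.specPath}`,
    `Branch: ${task.branch}`,
    "FILE_SCOPE:",
    ...task.fileScope.map((p) => `- ${p}`),
  ].join("\n");
}

console.log(
  renderOverlay("builder", {
    taskId: "auth-01",
    fileScope: ["src/auth/"],
    specPath: "specs/auth-01.md",
    branch: "builder/auth-01",
  }),
);
```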

The merge pipeline is the most sophisticated in any orchestrator I've seen. Four escalation tiers:

  1. Clean merge: git merge --no-ff, zero conflicts. Done.
  2. Auto-resolve: Parse conflict markers, keep the agent's changes.
  3. AI-resolve: Claude analyzes the conflicts with mulch history context and generates merged content.
  4. Reimagine: Nuclear option — abort the merge, replay the agent's changes from scratch on a fresh canonical state.

If tier N fails, it escalates to tier N+1. Conflict history from mulch informs which tiers to skip (if this file always fails at tier 2, start at tier 3).
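The escalation loop can be sketched as a list of strategies tried in order, starting from whatever tier the conflict history suggests. In this hypothetical version each tier is a stub that returns success or failure, and a plain `Map` stands in for mulch's conflict history; none of these names come from Overstory.

```typescript
// Hypothetical 4-tier merge escalation; not Overstory's implementation.
type Tier = "clean" | "auto-resolve" | "ai-resolve" | "reimagine";
const TIERS: readonly Tier[] = ["clean", "auto-resolve", "ai-resolve", "reimagine"];

// A strategy attempts one tier for one file and reports success.
type MergeStrategy = (file: string) => boolean;

// conflictHistory remembers the tier that last resolved each file
// (a stand-in for mulch's conflict history).
function mergeWithEscalation(
  file: string,
  strategies: Record<Tier, MergeStrategy>,
  conflictHistory: Map<string, Tier>,
): Tier | null {
  const known = conflictHistory.get(file);
  // Skip the tiers below the one that worked for this file last time.
  const start = known === undefined ? 0 : TIERS.indexOf(known);
  for (const tier of TIERS.slice(start)) {
    if (strategies[tier](file)) {
      conflictHistory.set(file, tier);
      return tier;
    }
  }
  return null; // every tier failed: escalate to a human
}

const strategies: Record<Tier, MergeStrategy> = {
  clean: () => false, // conflicts present
  "auto-resolve": () => false, // markers could not be resolved mechanically
  "ai-resolve": () => true,
  reimagine: () => true,
};
const conflictHistory = new Map<string, Tier>();
console.log(mergeWithEscalation("src/auth/session.ts", strategies, conflictHistory)); // ai-resolve
console.log(conflictHistory.get("src/auth/session.ts")); // ai-resolve
```

One design consequence: recording only the last successful tier means a file that once needed `reimagine` never retries the cheaper tiers. A real pipeline would probably want that history to decay.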

Gastown (Steve Yegge)

Built by: Steve Yegge | Language: Go | Data: Dolt (git-backed database)

Gastown uses theatrical metaphors for the same fundamental primitives:

Gastown term | Overstory equivalent | What it is
Mayor | Orchestrator session | Your primary Claude Code with full context
Polecats | Builder/Scout/Reviewer | Worker agents with persistent identity
Rigs | .overstory/ per-project | Project containers wrapping git repos
Hooks | Git worktrees | Persistent storage that survives crashes
Convoys | Groups | Bundles of work items assigned to agents
Beads | Beads | Issue tracking (both use bd)

Shared DNA: Both systems use beads (bd) for issue tracking, tmux for agent sessions, git worktrees for isolation, and CLAUDE.md hooks for agent instructions. They diverged on storage (SQLite vs Dolt), language (TypeScript vs Go), and orchestration philosophy.

Key differences:

Aspect | Overstory | Gastown
Hierarchy | Explicit depth-limited tree | Flatter (Mayor → Polecat)
Merge | 4-tier escalation pipeline | Git-based, less formalized
Expertise | Mulch integration (persistent domain knowledge) | Not mentioned
Observability | 7+ query tools (trace, replay, feed, etc.) | Beads-based tracking
Multi-runtime | Claude Code only | Claude Code + Codex
Scale target | Structured specialist teams | 20-30 agents comfortably
Runtime deps | Zero | Go modules

The relationship: They're siblings, not competitors. Same parents (beads, worktrees, tmux, CLAUDE.md), different upbringings. Gastown is broader and flatter (multi-runtime, more agents); Overstory is deeper and more structured (typed hierarchy, 4-tier merge, integrated expertise, richer observability).

Both represent what Paddo's blog calls "operational multi-agent" — agents with external state management and git worktrees for isolation, as opposed to BMAD-style "SDLC theater" that recreates human organizational bottlenecks with sequential persona handoffs.


Layer 3: The Coding Agents

These are the individual agents that actually write code. One session, one agent, one conversation. The orchestrators in Layer 4 spawn many of these.

Claude Code (Anthropic)

The 800-pound gorilla. Terminal-based, Claude-only, with an experimental built-in Agent Teams feature that's disabled by default. Agent Teams uses Claude Code's own Task tool to spawn subagents — simpler than Overstory/Gastown but without external state persistence, worktree isolation, or merge pipelines. If a session crashes, the coordination state dies with it.

Crush (Charm)

Language: Go | TUI: Bubble Tea | Provider abstraction: Fantasy library

Built by the Charm team (Bubble Tea, Lip Gloss, Glamour) after they recruited the original OpenCode (Go) creator. The coordinator pattern is the architectural core: a centralized hub managing per-session FIFO queues, routing prompts to agents with injected dependencies, handling OAuth refresh, and coordinating three-layer config merging.
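The per-session FIFO idea is language-agnostic. The sketch below shows it in TypeScript (Crush itself is Go): prompts in one session run strictly in order, while separate sessions proceed concurrently. `SessionCoordinator` and `AgentRunner` are hypothetical names, and OAuth refresh, dependency injection, and config merging are left out.

```typescript
// Hypothetical per-session FIFO coordinator; not Crush's code.
type AgentRunner = (sessionId: string, prompt: string) => Promise<string>;

interface QueuedPrompt {
  prompt: string;
  resolve: (reply: string) => void;
  reject: (err: unknown) => void;
}

class SessionCoordinator {
  private readonly queues = new Map<string, QueuedPrompt[]>();
  private readonly busy = new Set<string>();
  private readonly runAgent: AgentRunner;

  constructor(runAgent: AgentRunner) {
    this.runAgent = runAgent;
  }

  // Prompts within one session run strictly in arrival order;
  // different sessions proceed independently.
  submit(sessionId: string, prompt: string): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const queue = this.queues.get(sessionId) ?? [];
      queue.push({ prompt, resolve, reject });
      this.queues.set(sessionId, queue);
      if (!this.busy.has(sessionId)) void this.drain(sessionId);
    });
  }

  // Process one session's queue until it is empty, one prompt at a time.
  private async drain(sessionId: string): Promise<void> {
    this.busy.add(sessionId);
    const queue = this.queues.get(sessionId) ?? [];
    let next = queue.shift();
    while (next !== undefined) {
      try {
        next.resolve(await this.runAgent(sessionId, next.prompt));
      } catch (err) {
        next.reject(err);
      }
      next = queue.shift();
    }
    this.busy.delete(sessionId);
    this.queues.delete(sessionId);
  }
}

const coordinator = new SessionCoordinator(async (sessionId, prompt) => `[${sessionId}] handled: ${prompt}`);
void coordinator.submit("s1", "add a test").then((reply) => console.log(reply));
```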

Crush has a sub-agent architecture where coder agents can spawn isolated read-only task agents. This is not multi-agent orchestration — it's delegation within a single session. No agent-to-agent messaging, no shared task queues, no merge pipelines.

What Crush does well: LSP integration, MCP server support (stdio/HTTP/SSE), multi-provider model switching, and the most polished terminal UX in the space (it's Charm, after all).

OpenCode (Anomaly Innovations)

Two lives: The archived Go version (Bubble Tea TUI, monolithic) and the active TypeScript rewrite (client/server, Hono API, SolidJS TUI at 60fps, Bun runtime).

The TypeScript version is a full client/server architecture: an HTTP API with SSE for real-time updates, SQLite + Drizzle ORM for persistence, and the Vercel AI SDK for provider abstraction (75+ providers). It has Coder, Task, and Title agents, but these are isolated sub-agents within a single session, with no inter-agent communication or coordination.

The OpenCode → Crush pipeline: The original Go OpenCode creator joined Charm and built Crush. The TypeScript rewrite is maintained by a different team (Anomaly Innovations). So OpenCode and Crush share architectural philosophy but diverged on language and maintainership.

How They Relate to Layer 4

Simple: Layer 3 agents are the worker processes that Layer 4 orchestrators spawn and coordinate. When Overstory runs overstory sling builder, it creates a tmux session running Claude Code. If Gastown supported Crush as a runtime, it would create a tmux session running Crush instead.

Layer 3 tools don't know they're being orchestrated. They just see a CLAUDE.md with instructions and get to work.
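Concretely, spawning a worker comes down to a git worktree plus a tmux session running the agent CLI. This sketch only builds the argument lists and executes nothing. The `git worktree add` and `tmux new-session` flags are standard; the `.worktrees/` directory, the `agents/` branch prefix, and the naming scheme are assumptions, and Overstory's `sling` may lay things out differently. Changing `agentCommand` from `claude` to another CLI (for example `crush`) is exactly the runtime substitution described above; writing the CLAUDE.md overlay into the worktree would happen between the two steps.

```typescript
// Hypothetical sketch: the commands an orchestrator could run to start one Layer 3 worker.
// It only builds argument lists; nothing is executed here.
interface WorkerPlan {
  agentName: string; // e.g. "builder-auth-01" (hypothetical naming)
  repoRoot: string; // main checkout
  baseRef: string; // branch or commit the worker starts from
  agentCommand: string; // "claude" for Claude Code; another CLI for another runtime
}

function spawnCommands(plan: WorkerPlan): string[][] {
  const worktreePath = `${plan.repoRoot}/.worktrees/${plan.agentName}`;
  const branch = `agents/${plan.agentName}`;
  return [
    // 1. Isolated checkout on a new branch.
    ["git", "-C", plan.repoRoot, "worktree", "add", "-b", branch, worktreePath, plan.baseRef],
    // 2. Detached tmux session started in the worktree, running the agent CLI.
    ["tmux", "new-session", "-d", "-s", plan.agentName, "-c", worktreePath, plan.agentCommand],
  ];
}

const plan: WorkerPlan = { agentName: "builder-auth-01", repoRoot: "/repo", baseRef: "main", agentCommand: "claude" };
for (const cmd of spawnCommands(plan)) console.log(cmd.join(" "));
// git -C /repo worktree add -b agents/builder-auth-01 /repo/.worktrees/builder-auth-01 main
// tmux new-session -d -s builder-auth-01 -c /repo/.worktrees/builder-auth-01 claude
```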


Layer 2: The Agent Runtimes (The Claw Wars)

This layer asks a different question entirely: not "how do agents coordinate?" but "can I trust what this agent's tools are doing?"

When Claude Code runs a bash command or writes a file, what prevents it from reading your .env, exfiltrating credentials to an external server, or rm -rf-ing your home directory? Claude Code has built-in safety checks and user confirmation prompts. The Claws argue that's not enough.

IronClaw (NEAR AI / Llion Jones)

Language: Rust | Binary: 3.4MB | Startup: <10ms | Idle RAM: ~7.8MB

The security-first contender. Every tool runs in an isolated WebAssembly sandbox with capability-based permissions inspired by the seL4 microkernel.

The killer feature: host-boundary credential injection.

Traditional approach:
  Tool receives API key → Tool makes request → Hope tool doesn't leak it

IronClaw approach:
  Tool requests HTTP (no auth) → Host injects credentials at boundary →
  Leak detection scans I/O → Tool receives sanitized response

Tools never possess credentials. They can't leak what they don't have. The host (IronClaw runtime) injects secrets into outbound requests and strips them from responses. Aho-Corasick pattern matching catches 15+ credential formats.
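A minimal sketch of the injection flow, assuming a per-host secret store that only the host holds. IronClaw is written in Rust and uses Aho-Corasick over 15+ formats; this TypeScript version swaps in two plain regular expressions (AWS access key IDs and GitHub classic tokens) plus exact-match redaction of known secrets. All names are hypothetical.

```typescript
// Hypothetical host-boundary credential injection; not IronClaw's code.
interface ToolHttpRequest {
  host: string;
  path: string;
  headers: Record<string, string>;
}

// Secrets keyed by host. Only the host process holds this; tools never see it.
interface HostVault {
  secretsByHost: Map<string, string>;
}

// Naive stand-ins for an Aho-Corasick matcher over many credential formats.
const LEAK_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g, // AWS access key ID
  /ghp_[A-Za-z0-9]{36}/g, // GitHub classic personal access token
];

// Only allowlisted hosts get credentials; anything else is refused outright.
function injectCredentials(req: ToolHttpRequest, vault: HostVault): ToolHttpRequest {
  const secret = vault.secretsByHost.get(req.host);
  if (secret === undefined) {
    throw new Error(`host ${req.host} is not allowlisted`);
  }
  // Return a copy so the tool's own request object never holds the secret.
  return { ...req, headers: { ...req.headers, Authorization: `Bearer ${secret}` } };
}

// Redact known secrets and credential-shaped strings before the tool sees the response.
function sanitizeResponse(body: string, vault: HostVault): string {
  let clean = body;
  vault.secretsByHost.forEach((secret) => {
    if (secret !== "") clean = clean.split(secret).join("[REDACTED]");
  });
  for (const pattern of LEAK_PATTERNS) clean = clean.replace(pattern, "[REDACTED]");
  return clean;
}

const vault: HostVault = { secretsByHost: new Map([["api.example.com", "s3cr3t-token"]]) };
const toolReq: ToolHttpRequest = { host: "api.example.com", path: "/v1/items", headers: {} };
const outbound = injectCredentials(toolReq, vault); // what actually leaves the host
console.log("Authorization" in toolReq.headers); // false
console.log(sanitizeResponse("echo: s3cr3t-token", vault)); // echo: [REDACTED]
```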

WASM sandbox restrictions:

  • No environment variable access
  • No arbitrary filesystem access
  • No direct network sockets
  • No credential vault access
  • No process spawning

Tools must hold explicit capability tokens for each permitted action. Rate limiting, memory limits, CPU limits, execution time limits — all enforced at the sandbox boundary.
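Capability tokens make "deny" the default: a tool with an empty grant can do nothing. The hypothetical sketch below checks each requested action against explicit grants and applies a one-minute sliding-window rate limit. Memory, CPU, and execution-time limits are enforced by the WASM engine itself and are not modeled, and none of these types are IronClaw's real API.

```typescript
// Hypothetical capability-token check at a sandbox boundary; not IronClaw's API.
type Capability =
  | { kind: "http"; host: string }
  | { kind: "readFile"; pathPrefix: string };

type ToolAction =
  | { kind: "http"; host: string }
  | { kind: "readFile"; path: string };

interface ToolGrant {
  toolName: string;
  capabilities: Capability[];
  maxCallsPerMinute: number;
}

// Does one capability explicitly cover this action? Paths are assumed already normalized.
function covers(cap: Capability, action: ToolAction): boolean {
  if (cap.kind === "http" && action.kind === "http") return cap.host === action.host;
  if (cap.kind === "readFile" && action.kind === "readFile") return action.path.startsWith(cap.pathPrefix);
  return false;
}

class SandboxBoundary {
  private readonly callTimes = new Map<string, number[]>();

  // Default deny: no matching capability means no access, whatever the tool asks for.
  authorize(grant: ToolGrant, action: ToolAction, nowMs: number): boolean {
    if (!grant.capabilities.some((cap) => covers(cap, action))) return false;
    // Sliding one-minute window per tool.
    const recent = (this.callTimes.get(grant.toolName) ?? []).filter((t) => nowMs - t < 60_000);
    const allowed = recent.length < grant.maxCallsPerMinute;
    if (allowed) recent.push(nowMs);
    this.callTimes.set(grant.toolName, recent);
    return allowed;
  }
}

const boundary = new SandboxBoundary();
const grant: ToolGrant = {
  toolName: "fetch-issues",
  capabilities: [{ kind: "http", host: "api.github.com" }],
  maxCallsPerMinute: 2,
};
console.log(boundary.authorize(grant, { kind: "http", host: "api.github.com" }, 0)); // true
console.log(boundary.authorize(grant, { kind: "readFile", path: ".env" }, 1)); // false: no capability
console.log(boundary.authorize(grant, { kind: "http", host: "api.github.com" }, 2)); // true
console.log(boundary.authorize(grant, { kind: "http", host: "api.github.com" }, 3)); // false: rate limited
```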

The Claw Landscape

Per the head-to-head comparison:

Aspect | OpenClaw | IronClaw | ZeroClaw
Stars | 216K | Small (new) | Small (new)
Language | Python | Rust | Go
Security | Weak (plaintext creds, 24 vulns) | Strong (WASM sandbox, host-boundary injection) | Medium (encrypted creds, same-process tools)
Multi-agent | Yes (only Claw with orchestration) | No | No
Binary size | 28MB+ | 3.4MB | 8.8MB
Idle RAM | 394MB | ~7.8MB | ~12MB
Channels | ~15 | Growing | 23
Providers | ~8 | MCP-based | 30+

The paradox: OpenClaw is the only Claw with multi-agent orchestration, but it has the worst security. IronClaw has the best security, but no multi-agent. You can't have both yet.

Recommended path: if you need orchestration today, harden OpenClaw; meanwhile, track IronClaw's multi-agent roadmap (an estimated ~2,200 lines of code away) and migrate once it ships.

Why This Layer is Orthogonal to Overstory

Overstory enforces agent boundaries through instructions: "CLAUDE.md says only touch these files in your FILE_SCOPE." IronClaw enforces them through architecture: WASM physically prevents unauthorized file access.

Security concern | Overstory's answer | IronClaw's answer
Agent reads files outside scope | Trust + instructions | WASM capability tokens
Tool steals credentials | Not addressed | Host-boundary injection
Tool exfiltrates data | Not addressed | Network allowlisting + leak detection
Prompt injection | Not addressed | 5-layer sanitization
rm -rf / | Claude Code's built-in safety | WASM sandbox (no arbitrary FS access)
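The practical difference between "trust + instructions" and an architectural check is that the latter resolves every path before comparing it to FILE_SCOPE, so `src/auth/../../.env` is rejected even though it starts with `src/auth/`. A minimal sketch, assuming POSIX-style paths relative to the workspace and no symlinks (a real sandbox must also handle those; WASI-based sandboxes typically expose only preopened directories). The function names are hypothetical.

```typescript
// Hypothetical architectural FILE_SCOPE check; not code from Overstory or IronClaw.

// Collapse "." and ".." segments of a relative, "/"-separated path.
// Returns null for absolute paths or paths that climb above the workspace root.
function normalizeRelative(rawPath: string): string | null {
  if (rawPath.startsWith("/")) return null;
  const out: string[] = [];
  for (const segment of rawPath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(segment);
    }
  }
  return out.join("/");
}

// A path is in scope only if, once resolved, it equals or sits under an allowed directory.
function isInScope(rawPath: string, scopes: string[]): boolean {
  const resolved = normalizeRelative(rawPath);
  if (resolved === null) return false;
  return scopes.some((scopeDir) => {
    const dir = normalizeRelative(scopeDir);
    if (dir === null) return false;
    return resolved === dir || resolved.startsWith(`${dir}/`);
  });
}

const authScope = ["src/auth/"];
console.log(isInScope("src/auth/login.ts", authScope)); // true
console.log(isInScope("src/auth/../../.env", authScope)); // false
console.log(isInScope("src/authz/keys.ts", authScope)); // false
```

Comparing against `${dir}/` rather than the raw prefix is what keeps `src/authz/` from matching a `src/auth/` scope.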

Meanwhile, IronClaw has zero answers for:

  • Agent-to-agent communication
  • File conflict avoidance across agents
  • Merge pipelines
  • Agent health monitoring
  • Expertise accumulation
  • Agent spawning and scaling

They solve completely different problems. Overstory is urban planning; IronClaw is building codes. You need both for a city, but they're designed by different people for different reasons.


The Dream Stack

Nobody has built this yet, but the layers compose naturally:

┌─────────────────────────────────────────────┐
│  Overstory / Gastown                        │
│  "10 agents, exclusive file scopes,         │
│   SQLite mail, 4-tier merge"                │
├─────────────────────────────────────────────┤
│  IronClaw                                   │
│  "Each agent's tools run in WASM sandboxes, │
│   credentials injected at boundary"         │
├─────────────────────────────────────────────┤
│  Claude Code / Crush / OpenCode             │
│  "Prompt → reason → tool call → code"       │
├─────────────────────────────────────────────┤
│  Claude / GPT / Gemini / Local              │
│  "Raw inference"                            │
└─────────────────────────────────────────────┘

Overstory tells agent-3 to build the auth module. IronClaw ensures agent-3's tools can only access src/auth/ and can't leak the database password. Claude Code handles the reasoning loop. Claude provides the intelligence.

Today, Overstory skips Layer 2 entirely and trusts Claude Code's built-in safety. That works for trusted development environments. For production agent deployments — where tools run code from untrusted sources, handle customer data, or access production infrastructure — the Claw Wars matter a lot more.


What I'd Watch

  1. Overstory's merge pipeline is the most underrated innovation in this space. Everyone talks about spawning agents. Nobody talks about what happens when 10 agents' branches need to merge. The 4-tier escalation with mulch-informed conflict history is genuinely novel.

  2. IronClaw adding multi-agent. The moment IronClaw ships agent-to-agent messaging and task coordination, it becomes the obvious Layer 2 choice for any orchestrator.

  3. Claude Code Agent Teams maturing. If Anthropic builds worktree isolation, external state persistence, and merge pipelines into the native Agent Teams feature, the case for external orchestrators weakens significantly.

  4. Crush or OpenCode as Overstory/Gastown runtimes. Gastown already supports Codex as an alternative runtime. If Crush or OpenCode become viable agent runtimes for orchestrators, you get multi-provider model flexibility at the agent level.


Sources