Overstory Architecture Deep Dive
TL;DR: Overstory turns a single Claude Code session into a coordinated swarm of up to 25 concurrent agents by parasitizing Claude Code's own hook system, isolating work in git worktrees, and resolving merge conflicts through a 4-tier escalation pipeline that runs from a plain git merge to "ask an AI to rewrite it from scratch", all with zero npm runtime dependencies.
If you're evaluating multi-agent orchestration for Claude Code — or wondering how Overstory compares to Gastown, Agent Teams, Ruflo, and MCO — this is the architectural teardown.
1. What Is Overstory?
Overstory is a project-agnostic swarm system for Claude Code agent orchestration. There is no separate daemon. Your Claude Code session is the orchestrator. The system bootstraps itself through three mechanisms:
- CLAUDE.md: the project instruction file that Claude Code reads on startup
- Hooks: Claude Code's lifecycle event system (SessionStart, PreToolUse, PostToolUse, Stop)
- The overstory CLI: 29 commands for spawning agents, messaging, merging, and monitoring
The runtime is Bun with TypeScript. Zero npm runtime dependencies — every external interaction (git, tmux, Claude CLI) goes through Bun.spawn. All persistent state lives in SQLite databases using bun:sqlite with WAL mode for concurrent access across agents.
Tech Stack at a Glance
| Component | Implementation |
|---|---|
| Runtime | Bun (runs TypeScript directly, no build step) |
| Dependencies | Zero runtime. bun:sqlite, Bun.spawn, Bun.file only |
| Databases | 4 SQLite DBs: mail.db, sessions.db, events.db, metrics.db |
| Agent isolation | Git worktrees (one per agent) |
| Agent execution | Tmux sessions (one per agent, running claude --dangerously-skip-permissions) |
| Communication | Custom SQLite mail system (~1-5ms per query) |
| Issue tracking | Beads (bd CLI, git-backed JSONL) |
| Expertise | Mulch (mulch CLI, structured knowledge records) |
2. The Hierarchy
Overstory enforces a strict 3-level hierarchy with depth limits:
```
Orchestrator (your Claude Code session, depth 0)
├── Coordinator agent (depth 0, spawns leads only)
│   ├── Lead agent (depth 1, spawns workers)
│   │   ├── Scout (depth 2, read-only recon)
│   │   ├── Builder (depth 2, writes code)
│   │   ├── Reviewer (depth 2, read-only validation)
│   │   └── Merger (depth 2, branch integration)
│   └── Supervisor agent (depth 1, persistent per-project)
│       └── Workers (depth 2, same types as above)
└── Monitor agent (Tier 2, observer only, no worktree)
```
This is code-enforced. If the coordinator tries to spawn a builder directly, it gets a HierarchyError:
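```typescript
// HierarchyError is Overstory's own error class, defined elsewhere in the codebase.
export function validateHierarchy(
  parentAgent: string | null,
  capability: string,
  name: string,
  _depth: number, // unused here: depth has its own check in the sling sequence
  forceHierarchy: boolean,
): void {
  // An explicit override skips the hierarchy check entirely.
  if (forceHierarchy) return;
  // No parent means the spawn comes from the coordinator, which may only create leads.
  if (parentAgent === null && capability !== "lead") {
    throw new HierarchyError(
      `Coordinator cannot spawn "${capability}" directly. Only "lead" is allowed without --parent.`,
      { agentName: name, requestedCapability: capability },
    );
  }
}
```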
The lead agent enforces a mandatory 3-phase workflow: Scout → Build → Review. Skipping scouts is a named failure mode (SCOUT_SKIP). Every builder must have a corresponding reviewer. Non-overlapping file scopes are enforced — two builders cannot own the same file.
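The non-overlapping file-scope rule reduces to a set check across builders. The sketch below is hypothetical: findScopeOverlaps and the BuilderScope shape are illustrative names, not Overstory's API, and it assumes each scope is a list of repo-relative paths like the --files argument to sling.

```typescript
// Hypothetical input: one entry per builder, mirroring the --files list given to sling.
interface BuilderScope {
  name: string;
  files: string[];
}

// Maps each file claimed by more than one builder to the builders claiming it.
// An empty result means the scopes are disjoint and the builders can run in parallel.
function findScopeOverlaps(builders: BuilderScope[]): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const builder of builders) {
    // Ignore a builder listing the same file twice; only cross-builder overlap counts.
    const unique = builder.files.filter((f, i, all) => all.indexOf(f) === i);
    for (const file of unique) {
      const claimants = owners.get(file) ?? [];
      claimants.push(builder.name);
      owners.set(file, claimants);
    }
  }
  const overlaps = new Map<string, string[]>();
  owners.forEach((claimants, file) => {
    if (claimants.length > 1) overlaps.set(file, claimants);
  });
  return overlaps;
}

const conflicts = findScopeOverlaps([
  { name: "auth-builder", files: ["src/auth.ts", "src/auth.test.ts"] },
  { name: "session-builder", files: ["src/session.ts", "src/auth.ts"] },
]);
conflicts.forEach((claimants, file) => {
  console.log(`${file} is claimed by ${claimants.join(", ")}`);
});
// prints "src/auth.ts is claimed by auth-builder, session-builder"
```

A lead would run a check like this before dispatching builders and split or reassign any file that shows up in the result.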
3. How Agent Spawning Works
The overstory sling command is the critical path. It orchestrates a 14-step spawn sequence:
```
overstory sling <task-id> --name auth-builder --capability builder --parent auth-lead --depth 2 --files src/auth.ts,src/auth.test.ts
```
Here's what happens (traced from src/commands/sling.ts):
1. Load config: reads .overstory/config.yaml and resolves the project root (handles worktree detection)
2. Validate depth: depth > config.agents.maxDepth (default 2) → reject
3. Validate hierarchy: the coordinator can only spawn leads
4. Load agent manifest: validates that the capability exists in agent-manifest.json
5. Resolve run ID: reads or creates current-run.txt for session grouping
6. Check concurrency: active agents vs maxConcurrent (default 25)
7. Check name uniqueness: no duplicate active agent names
8. Enforce stagger delay: minimum gap between spawns (default 2s)
9. Validate bead: the task must be open or in_progress
10. Create git worktree: git worktree add -b overstory/{name}/{taskId} .overstory/worktrees/{name} main
11. Generate overlay CLAUDE.md: Layer 1 (role HOW) + Layer 2 (task WHAT), including pre-fetched mulch expertise
12. Deploy hooks: capability-specific guards written to the worktree's .claude/settings.local.json
13. Create tmux session: tmux new-session -d -s overstory-{project}-{name} -c {worktreePath} "claude --model {model} --dangerously-skip-permissions"
14. Send startup beacon: a structured message via tmux send-keys after a 3s initialization delay
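The depth, concurrency, name-uniqueness and stagger steps are cheap guards that run before any git or tmux work. Here is a hedged sketch of those four checks. It assumes a hypothetical SpawnState snapshot, since in Overstory this information comes from config.yaml and sessions.db. The defaults come from the text (maxDepth 2, maxConcurrent 25, 2s stagger). Modelling the stagger as a wait time rather than a rejection is an interpretation of "enforce".

```typescript
// Hypothetical snapshot of what sling reads from config.yaml and sessions.db.
interface SpawnState {
  maxDepth: number;             // config.agents.maxDepth (default 2)
  maxConcurrent: number;        // default 25
  staggerMs: number;            // minimum gap between spawns (default 2000 ms)
  activeNames: string[];        // names of agents that are currently active
  lastSpawnAtMs: number | null; // time of the previous spawn, if any
}

type SpawnCheck =
  | { ok: true; waitMs: number }   // proceed, after sleeping waitMs to honor the stagger
  | { ok: false; reason: string }; // reject the spawn outright

function checkSpawn(state: SpawnState, name: string, depth: number, nowMs: number): SpawnCheck {
  if (depth > state.maxDepth) {
    return { ok: false, reason: `depth ${depth} exceeds maxDepth ${state.maxDepth}` };
  }
  if (state.activeNames.length >= state.maxConcurrent) {
    return { ok: false, reason: `${state.maxConcurrent} agents already active` };
  }
  if (state.activeNames.indexOf(name) !== -1) {
    return { ok: false, reason: `an active agent is already named "${name}"` };
  }
  // No previous spawn means no stagger to wait out.
  const elapsed = state.lastSpawnAtMs === null ? Infinity : nowMs - state.lastSpawnAtMs;
  return { ok: true, waitMs: Math.max(0, state.staggerMs - elapsed) };
}

const state: SpawnState = {
  maxDepth: 2, maxConcurrent: 25, staggerMs: 2000,
  activeNames: ["auth-lead"], lastSpawnAtMs: 10000,
};
console.log(JSON.stringify(checkSpawn(state, "auth-builder", 2, 11000))); // {"ok":true,"waitMs":1000}
console.log(JSON.stringify(checkSpawn(state, "auth-builder", 3, 11000))); // {"ok":false,"reason":"depth 3 exceeds maxDepth 2"}
```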
The tmux session creation in src/worktree/tmux.ts handles PATH injection so hooks can find the overstory binary:
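```typescript
// runCommand and detectOverstoryBinDir are Overstory helpers defined elsewhere.
export async function createSession(
  name: string,
  cwd: string,
  command: string,
  env?: Record<string, string>,
): Promise<number> {
  // Prepend the overstory bin directory so hooks inside the session can find the CLI.
  const overstoryBinDir = await detectOverstoryBinDir();
  const exports: string[] = [];
  if (overstoryBinDir) {
    exports.push(`export PATH="${overstoryBinDir}:$PATH"`);
  }
  if (env) {
    for (const [key, value] of Object.entries(env)) {
      // Values are wrapped in double quotes but not escaped, so a `"`, `$` or
      // backtick inside a value is still interpreted by the shell.
      exports.push(`export ${key}="${value}"`);
    }
  }
  // Chain the exports in front of the real command: "export A=... && export B=... && claude ...".
  const wrappedCommand = exports.length > 0
    ? `${exports.join(" && ")} && ${command}`
    : command;
  // -d: detached; -s: session name; -c: start directory (the agent's worktree).
  const { exitCode, stderr } = await runCommand(
    ["tmux", "new-session", "-d", "-s", name, "-c", cwd, wrappedCommand],
    cwd,
  );
  // ... PID retrieval via tmux list-panes ...
}
```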
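createSession puts each env value inside double quotes without escaping it. A value containing a double quote, $ or a backtick is therefore still interpreted by the shell that tmux starts. A hedged variant of the same wrapping step uses POSIX single-quote escaping and rejects names that are not valid shell identifiers. wrapWithEnv and shellQuote are illustrative names, not Overstory code.

```typescript
// POSIX single-quote escaping: wrap the value in '...' and turn each embedded ' into '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Builds the same "export ... && command" chain as createSession, with quoted values.
function wrapWithEnv(command: string, env: Record<string, string> = {}, binDir?: string): string {
  const parts: string[] = [];
  if (binDir) {
    // "$PATH" stays outside the single quotes so the shell still expands it.
    parts.push(`export PATH=${shellQuote(binDir)}:"$PATH"`);
  }
  for (const [key, value] of Object.entries(env)) {
    // Variable names cannot be quoted, so reject anything that is not a valid identifier.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      throw new Error(`invalid environment variable name: ${key}`);
    }
    parts.push(`export ${key}=${shellQuote(value)}`);
  }
  return parts.length > 0 ? `${parts.join(" && ")} && ${command}` : command;
}

console.log(wrapWithEnv("claude", { NOTE: `say "hi" to $USER` }));
// export NOTE='say "hi" to $USER' && claude
```

Inside single quotes the shell expands nothing, so the only character that needs special handling is the single quote itself.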
4. The Messaging System: SQLite Mail
Agents communicate through a custom SQLite mail system in .overstory/mail.db. Not email. Not HTTP. Not Redis. A single SQLite database in WAL mode with prepared statements and ~1-5ms query latency.
The schema (src/mail/store.ts):
```sql
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  from_agent TEXT NOT NULL,
  to_agent TEXT NOT NULL,
  subject TEXT NOT NULL,
  body TEXT NOT NULL,
  type TEXT NOT NULL DEFAULT 'status'
    CHECK(type IN ('status','question','result','error',
      'worker_done','merge_ready','merged','merge_failed',
      'escalation','health_check','dispatch','assign')),
  priority TEXT NOT NULL DEFAULT 'normal'
    CHECK(priority IN ('low','normal','high','urgent')),
  thread_id TEXT,
  payload TEXT,
  read INTEGER NOT NULL DEFAULT 0,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_inbox ON messages(to_agent, read);
```
There are 4 semantic types (status, question, result, error) and 8 protocol types (worker_done, merge_ready, merged, merge_failed, escalation, health_check, dispatch, assign). Protocol types carry structured JSON payloads — for example, worker_done includes { beadId, branch, exitCode, filesModified }.
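Protocol messages keep their structured data as JSON text in the payload column, so a reader has to parse and validate it. This sketch shows one way to decode a worker_done row into a typed object. It uses only the fields named above (beadId, branch, exitCode, filesModified). WorkerDonePayload, MessageRow and parseWorkerDone are hypothetical names, and treating filesModified as an array of path strings is an assumption.

```typescript
// Fields named in the text for worker_done; filesModified as string[] is an assumption.
interface WorkerDonePayload {
  beadId: string;
  branch: string;
  exitCode: number;
  filesModified: string[];
}

// The two messages-table columns this sketch needs.
interface MessageRow {
  type: string;
  payload: string | null;
}

// Returns the typed payload of a worker_done message, or null if it is missing or malformed.
function parseWorkerDone(row: MessageRow): WorkerDonePayload | null {
  if (row.type !== "worker_done" || row.payload === null) return null;
  let data: unknown;
  try {
    data = JSON.parse(row.payload);
  } catch {
    return null; // the payload column held invalid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { beadId, branch, exitCode, filesModified } = data as Record<string, unknown>;
  if (
    typeof beadId !== "string" ||
    typeof branch !== "string" ||
    typeof exitCode !== "number" ||
    !Array.isArray(filesModified) ||
    !filesModified.every((f) => typeof f === "string")
  ) {
    return null;
  }
  return { beadId, branch, exitCode, filesModified };
}

// Illustrative values only; bead IDs and branch names here are made up.
const row: MessageRow = {
  type: "worker_done",
  payload: JSON.stringify({
    beadId: "bd-42",
    branch: "overstory/auth-builder/bd-42",
    exitCode: 0,
    filesModified: ["src/auth.ts"],
  }),
};
console.log(parseWorkerDone(row)?.branch); // overstory/auth-builder/bd-42
```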
Mail delivery is hook-driven. The UserPromptSubmit and PostToolUse hooks run overstory mail check --inject --agent {name}, which queries unread messages, marks them read, and formats them as human-readable text injected into the agent's context:
```
You have 2 new messages:
--- From: auth-lead [HIGH] (question) ---
Subject: File scope clarification
Are you handling the middleware auth or just the route handler?
[Reply with: overstory mail reply msg-abc123 --body "..."]
```
This is the parasitic genius of the system — Claude Code's own hook infrastructure becomes the agent's mailbox polling loop. No separate daemon needed.
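The injected block is plain string formatting over unread rows. The formatter below is hypothetical but reproduces the layout of the example above. The InboxMessage shape is an assumption based on the schema columns. Showing a priority tag only for non-normal priorities is also a guess, because the example shows a single [HIGH] message.

```typescript
// Assumed message shape, derived from the messages table columns.
interface InboxMessage {
  id: string;
  from: string;
  subject: string;
  body: string;
  type: string;
  priority: "low" | "normal" | "high" | "urgent";
}

// Formats unread messages the way they appear when injected into an agent's context.
function formatInbox(messages: InboxMessage[]): string {
  if (messages.length === 0) return ""; // nothing to inject
  const noun = messages.length === 1 ? "message" : "messages";
  const lines: string[] = [`You have ${messages.length} new ${noun}:`];
  for (const m of messages) {
    // Assumption: "normal" priority gets no tag; others render like [HIGH].
    const tag = m.priority === "normal" ? "" : ` [${m.priority.toUpperCase()}]`;
    lines.push(`--- From: ${m.from}${tag} (${m.type}) ---`);
    lines.push(`Subject: ${m.subject}`);
    lines.push(m.body);
    lines.push(`[Reply with: overstory mail reply ${m.id} --body "..."]`);
  }
  return lines.join("\n");
}

console.log(formatInbox([{
  id: "msg-abc123",
  from: "auth-lead",
  subject: "File scope clarification",
  body: "Are you handling the middleware auth or just the route handler?",
  type: "question",
  priority: "high",
}]));
```

The reply hint at the end of each message matters: the agent has no UI, so the injected text itself tells it which CLI command answers the thread.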
5. The Merge Pipeline: 4-Tier Escalation
When an agent finishes work, its branch needs to be integrated. Overstory uses a FIFO merge queue (merge-queue.db, SQLite) and a 4-tier conflict resolution pipeline in src/merge/resolver.ts:
| Tier | Strategy | How It Works |
|---|---|---|
| 1: Clean merge | git merge --no-edit | If no conflicts, done. Cheapest possible outcome. |
| 2: Auto-resolve | Keep incoming changes | Parse conflict markers, replace each block with the agent's version. |
| 3: AI-resolve | claude --print per file | Send conflicted file content to Claude with "output ONLY the resolved content, NO explanation." Validates output isn't prose. |
| 4: Re-imagine | Full rewrite | Abort merge. Show Claude both canonical and branch versions. "Reimplement the changes from the branch version onto the canonical version." |
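Tier 2 works on the conflict markers git leaves in a file. Here is a minimal sketch of "keep incoming". It assumes the agent's branch is the one being merged in, so its lines sit between ======= and >>>>>>>, with the canonical side above. resolveKeepIncoming is an illustrative name, not the function in resolver.ts, and the branch name in the example is made up.

```typescript
type Side = "outside" | "ours" | "base" | "theirs";

// Replaces every conflict block with its incoming (agent) side.
// Returns null if a block is left open, so the caller can escalate to the next tier.
function resolveKeepIncoming(content: string): string | null {
  const out: string[] = [];
  let side: Side = "outside";
  for (const line of content.split("\n")) {
    if (side === "outside") {
      if (line.startsWith("<<<<<<<")) side = "ours";
      else out.push(line);
    } else if (side === "ours" || side === "base") {
      // Our (canonical) lines and any diff3-style base section are dropped.
      if (side === "ours" && line.startsWith("|||||||")) side = "base";
      else if (line === "=======") side = "theirs";
    } else {
      // Incoming lines are kept until the closing marker.
      if (line.startsWith(">>>>>>>")) side = "outside";
      else out.push(line);
    }
  }
  return side === "outside" ? out.join("\n") : null;
}

const conflicted = [
  "const timeout = 30;",
  "<<<<<<< HEAD",
  "const retries = 3;",
  "=======",
  "const retries = 5;",
  ">>>>>>> overstory/auth-builder/bd-42",
  "export { timeout, retries };",
].join("\n");
console.log(resolveKeepIncoming(conflicted));
// const timeout = 30;
// const retries = 5;
// export { timeout, retries };
```

This is cheap and deterministic, but it silently discards the canonical side of every block. That is why it sits below the AI tiers, which can combine both sides.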
The prose detection in Tier 3 (looksLikeProse()) catches when Claude forgets it's supposed to output raw code:
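```typescript
// Tier 3 heuristic: returns true if the model's output looks like an explanation
// rather than the raw resolved file content.
export function looksLikeProse(text: string): boolean {
  const prosePatterns = [
    // Conversational openers at the very start of the output.
    /^(I |I'[a-z]+ |Here |Here's |The |This |Let me |Sure|Unfortunately)/i,
    /^(To resolve|Looking at|Based on|After reviewing|The conflict)/i,
    /^```/m, // Markdown fencing: the model wrapped the code
    /I need permission/i, // the model asked for permission instead of answering
  ];
  for (const pattern of prosePatterns) {
    if (pattern.test(text.trim())) return true;
  }
  return false;
}
```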
The system also queries mulch for historical conflict patterns. If a tier has failed at least twice for the same files and has never succeeded on them, it is skipped entirely. Past successful resolutions are fed into the AI prompt as context. This means the merge pipeline literally learns from its own failures.
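The skip rule can be stated as a predicate over past attempts. This is a hypothetical sketch. It assumes the history is available as per-tier outcome records, but the real data lives in mulch records, whose format is not shown here. It also reads "the same files" as the same set of paths regardless of order.

```typescript
type Tier = "clean-merge" | "auto-resolve" | "ai-resolve" | "reimagine";

// Hypothetical record of one resolution attempt.
interface TierAttempt {
  tier: Tier;
  files: string[];
  succeeded: boolean;
}

// Normalizes a file list so ["b.ts", "a.ts"] and ["a.ts", "b.ts"] count as the same set.
function fileSetKey(files: string[]): string {
  return files.slice().sort().join("\n");
}

// Skip a tier if it has failed at least twice on this file set and never succeeded.
function shouldSkipTier(tier: Tier, files: string[], history: TierAttempt[]): boolean {
  const key = fileSetKey(files);
  let failures = 0;
  for (const attempt of history) {
    if (attempt.tier !== tier || fileSetKey(attempt.files) !== key) continue;
    if (attempt.succeeded) return false; // one past success keeps the tier in play
    failures++;
  }
  return failures >= 2;
}

const history: TierAttempt[] = [
  { tier: "ai-resolve", files: ["src/auth.ts"], succeeded: false },
  { tier: "ai-resolve", files: ["src/auth.ts"], succeeded: false },
];
console.log(shouldSkipTier("ai-resolve", ["src/auth.ts"], history)); // true
console.log(shouldSkipTier("reimagine", ["src/auth.ts"], history)); // false
```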
6. The Watchdog: 4-Tier Health Monitoring
Agent health monitoring runs across four tiers:
| Tier | Name | What It Does |
|---|---|---|
| 0 | Mechanical daemon | Process monitoring via setInterval. Checks tmux liveness, PID liveness, lastActivity timestamps. |
| 1 | AI triage | Ephemeral claude --print call to classify failure as retry, terminate, or extend. |
| 2 | Monitor agent | Persistent Claude Code session doing continuous patrol across all agents. |
| 3 | Supervisor | Per-project oversight (built into supervisor agent role). |
The Tier 0 daemon implements the ZFC principle (Zero Failure Crash) from src/watchdog/health.ts: observable state always wins over recorded state.
Signal priority (highest to lowest):
1. tmux session liveness — Is the tmux session actually running?
2. Process liveness (pid) — Is the Claude Code process still alive?
3. Recorded state — What does sessions.db claim?
When signals conflict:
- tmux dead + sessions.db says "working" → zombie immediately
- tmux alive + sessions.db says "zombie" → investigate, don't auto-kill
- pid dead + tmux alive → zombie (agent exited, shell survived)
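The conflict rules translate into a small decision function that checks the observed signals before the recorded state. Here is a sketch with hypothetical types, since Overstory's actual health.ts types are not shown here. The three rules are the ones listed above. Treating "booting" and "stalled" like "working" when tmux is dead is an assumption, and any combination the rules do not cover falls back to the recorded state.

```typescript
type RecordedState = "booting" | "working" | "completed" | "stalled" | "zombie";

interface HealthSignals {
  tmuxAlive: boolean;      // does the tmux session still exist?
  pidAlive: boolean;       // is the Claude Code process still running?
  recorded: RecordedState; // what sessions.db claims
}

type Verdict = "zombie" | "investigate" | "trust-recorded";

function reconcile(s: HealthSignals): Verdict {
  const settled = s.recorded === "completed" || s.recorded === "zombie";
  if (!s.tmuxAlive) {
    // tmux dead while sessions.db says the agent is active: zombie immediately.
    // (The text states this for "working"; booting and stalled are treated the same here.)
    return settled ? "trust-recorded" : "zombie";
  }
  if (!s.pidAlive) {
    // The agent process exited but the shell in the tmux pane survived.
    return "zombie";
  }
  if (s.recorded === "zombie") {
    // Observation says alive, the record says dead: look closer, never auto-kill.
    return "investigate";
  }
  return "trust-recorded";
}

console.log(reconcile({ tmuxAlive: false, pidAlive: false, recorded: "working" })); // zombie
console.log(reconcile({ tmuxAlive: true, pidAlive: true, recorded: "zombie" })); // investigate
```

The order of the if-statements encodes the signal priority: tmux liveness is checked first, then the PID, and sessions.db is consulted only when both observations agree the agent is alive.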
Progressive escalation for stalled agents follows a 4-level nudge protocol:
Level 0 (warn): Log warning, no action
Level 1 (nudge): Send tmux nudge: "appears stalled, please check"
Level 2 (escalate): Invoke Tier 1 AI triage (if enabled)
Level 3 (terminate): Kill tmux session + descendant process tree
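One way to drive the four levels is to count consecutive stalled health checks and cap the count at the terminate level. The counter-based trigger is an assumption for illustration, since the text does not say how Overstory decides when to advance a level. Repeating the nudge at level 2 when AI triage is disabled is also a guess.

```typescript
type StallAction = "warn" | "nudge" | "ai-triage" | "terminate";

// stalledChecks: consecutive health checks that found the agent stalled (1 = first time).
function stallAction(stalledChecks: number, aiTriageEnabled: boolean): StallAction {
  // Level 0 on the first stalled check, one level per further check, capped at 3.
  const level = Math.min(Math.max(stalledChecks - 1, 0), 3);
  switch (level) {
    case 0:
      return "warn";
    case 1:
      return "nudge";
    case 2:
      // Assumption: with AI triage disabled, level 2 repeats the nudge.
      return aiTriageEnabled ? "ai-triage" : "nudge";
    default:
      return "terminate";
  }
}

for (let checks = 1; checks <= 4; checks++) {
  console.log(checks, stallAction(checks, true)); // warn, nudge, ai-triage, terminate
}
```

Resetting the counter whenever the agent shows new activity would keep a briefly slow agent from ever reaching the terminate level.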
Process tree cleanup in src/worktree/tmux.ts walks descendant PIDs recursively using pgrep -P, sends SIGTERM deepest-first, waits a 2-second grace period, then SIGKILL survivors. This prevents orphaned git, bun test, and biome processes from accumulating.
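Deepest-first ordering matters because when a parent dies first, its children are reparented and a later pgrep -P on the old PID no longer finds them. Given a parent-to-children map (what repeated pgrep -P calls would produce), the kill order is the breadth-first levels reversed. The sketch stubs the process listing out as a plain Map so the ordering logic stands alone. killOrder is an illustrative name.

```typescript
// Returns PIDs ordered deepest level first, with the root last.
function killOrder(root: number, children: Map<number, number[]>): number[] {
  const levels: number[][] = [];
  const seen = new Set<number>([root]); // guards against a malformed, cyclic map
  let frontier: number[] = [root];
  while (frontier.length > 0) {
    levels.push(frontier);
    const next: number[] = [];
    for (const pid of frontier) {
      for (const child of children.get(pid) ?? []) {
        if (!seen.has(child)) {
          seen.add(child);
          next.push(child);
        }
      }
    }
    frontier = next;
  }
  // Reverse the breadth-first levels so the deepest processes are signalled first.
  const order: number[] = [];
  for (let i = levels.length - 1; i >= 0; i--) order.push(...levels[i]);
  return order;
}

// 100 = claude, 300 = bun test spawned by it, 400 = a worker spawned by bun test.
const tree = new Map<number, number[]>([[100, [200, 300]], [300, [400]]]);
console.log(killOrder(100, tree).join(" ")); // 400 200 300 100
```

In a real run the map would be filled by calling pgrep -P per PID. The caller would then send SIGTERM in this order (for example with process.kill(pid, "SIGTERM")), wait out the grace period, and send SIGKILL to whatever remains.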
7. Hooks: The Nervous System
The hooks template (templates/hooks.json.tmpl) wires Overstory into Claude Code's lifecycle:
| Hook | What It Does |
|---|---|
| SessionStart | overstory prime --agent {name}: loads config, recent activity, mulch expertise |
| UserPromptSubmit | overstory mail check --inject: polls inbox, injects unread messages |
| PreToolUse (all) | overstory log tool-start: records tool invocation to events.db |
| PreToolUse (Bash) | Blocks git push with a decision: block response |
| PostToolUse | overstory log tool-end: records completion, plus mail check --inject with debounce |
| Stop | overstory log session-end + mulch learn: captures session insights |
| PreCompact | overstory prime --compact: re-injects context before context window compaction |
The git push block is enforced at the hook level for all agents — no agent can push to remote. All integration goes through the merge pipeline. This is a hard architectural constraint, not a suggestion in a markdown file.
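The push guard only has to recognize git push inside the Bash command a tool call is about to run. This is a hedged sketch. The { decision: "block", reason } shape mirrors the "decision: block" response mentioned in the hooks table, but check the exact JSON Claude Code exchanges with hooks against its hooks documentation. checkBashCommand is a hypothetical name. The token-based match is deliberately simple and errs toward blocking; it will not catch pushes hidden behind aliases or scripts.

```typescript
interface BlockDecision {
  decision: "block";
  reason: string;
}

// Returns a block decision if any command segment runs `git push`, otherwise null.
function checkBashCommand(command: string): BlockDecision | null {
  // Split on &&, ||, ;, | and newlines so chained commands are checked individually.
  for (const segment of command.split(/&&|\|\||;|\||\n/)) {
    const tokens = segment.trim().split(/\s+/);
    let i = tokens.indexOf("git");
    if (i === -1) continue;
    i++;
    // Skip git's global options; -C <path> and -c <name=value> consume the next token.
    while (i < tokens.length && tokens[i].startsWith("-")) {
      i += tokens[i] === "-C" || tokens[i] === "-c" ? 2 : 1;
    }
    if (tokens[i] === "push") {
      return {
        decision: "block",
        reason: "git push is disabled for agents; integrate through the merge pipeline.",
      };
    }
  }
  return null;
}

console.log(checkBashCommand("bun test && git push origin main")?.decision); // block
console.log(checkBashCommand("git log --oneline")); // null
```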
8. Observability Stack
Four SQLite databases provide full-fleet observability:
| Database | Stores | Query Commands |
|---|---|---|
| sessions.db | Agent sessions (state machine: booting->working->completed/stalled->zombie), runs | overstory status, overstory run |
| events.db | Tool invocations, session lifecycle, errors, custom events | overstory trace, overstory errors, overstory replay |
| mail.db | Inter-agent messages with threading and priority | overstory mail list |
| metrics.db | Token usage, cost estimates per session | overstory costs |
The overstory dashboard command provides a live TUI polling every 2 seconds. overstory replay interleaves events across agents chronologically for post-mortem analysis. overstory costs --live shows real-time token burn rates for active agents.
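Interleaving events across agents amounts to merging per-agent streams by timestamp. This sketch assumes each event carries a timestamp in SQLite's datetime('now') format ("YYYY-MM-DD HH:MM:SS"), which sorts correctly as a plain string. It breaks ties by agent name so the output is deterministic. The ReplayEvent shape is hypothetical, not the events.db schema.

```typescript
// Hypothetical event shape; the real events.db columns are not shown in the text.
interface ReplayEvent {
  agent: string;
  timestamp: string; // "YYYY-MM-DD HH:MM:SS", the format of SQLite's datetime('now')
  kind: string;
}

// Merges per-agent event streams into one chronological timeline.
function interleave(streams: ReplayEvent[][]): ReplayEvent[] {
  const all: ReplayEvent[] = [];
  for (const stream of streams) all.push(...stream);
  // Fixed-width timestamps compare correctly as strings; ties go to the agent name.
  // Array.prototype.sort is stable (ES2019+), so same-agent ties keep stream order.
  return all.sort((a, b) => {
    if (a.timestamp !== b.timestamp) return a.timestamp < b.timestamp ? -1 : 1;
    if (a.agent !== b.agent) return a.agent < b.agent ? -1 : 1;
    return 0;
  });
}

const builder: ReplayEvent[] = [
  { agent: "auth-builder", timestamp: "2026-03-01 10:00:02", kind: "tool-start" },
  { agent: "auth-builder", timestamp: "2026-03-01 10:00:05", kind: "tool-end" },
];
const lead: ReplayEvent[] = [
  { agent: "auth-lead", timestamp: "2026-03-01 10:00:01", kind: "mail" },
  { agent: "auth-lead", timestamp: "2026-03-01 10:00:05", kind: "session-end" },
];
for (const e of interleave([builder, lead])) {
  console.log(`${e.timestamp} ${e.agent} ${e.kind}`);
}
```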
9. Comparison: Overstory vs the Field
Here's how Overstory stacks up against the other major Claude Code orchestration systems as of March 2026:
| Dimension | Overstory | Gastown | Agent Teams (Native) | MCO | Ruflo |
|---|---|---|---|---|---|
| Builder | js0n | Steve Yegge | Anthropic | mco-org | Reuven Cohen |
| Architecture | CLAUDE.md + hooks + CLI | Mayor + Polecats + hooks | Lead + Teammates (flat) | Fan-out/wait-all | Queen-led swarms (layered) |
| Max agents | 25 (configurable) | 20-30 | 3-5 recommended | Per-provider | 60+ types |
| Hierarchy depth | 3 levels (enforced) | Hierarchical (Mayor -> Polecats) | 2 levels (lead + teammates) | Flat (parallel) | Queen -> workers |
| Git isolation | Git worktrees (auto) | Git worktrees (auto) | None (manual) | None | Not documented |
| Merge handling | 4-tier escalation (clean->auto->AI->reimagine) | Implicit (non-overlapping tasks) | None (overwrites) | N/A (read-only) | Not documented |
| Communication | SQLite mail (~1-5ms) | Mailboxes + convoys | Direct messaging + shared tasks | Indirect via aggregation | Consensus protocols (Raft, BFT) |
| Health monitoring | 4-tier (daemon->AI triage->monitor agent->supervisor) | Feed + problems view | Hooks (TeammateIdle) | doctor command | 12 background workers |
| Multi-model | No (Claude only) | Yes (Claude, Codex, Gemini) | No (Claude only) | Yes (any CLI agent) | Yes (Claude, GPT, Gemini, Ollama) |
| Runtime deps | Zero | Go + Dolt + beads | Zero (built-in) | npm (adapter packages) | MCP + various |
| Expertise system | Mulch (structured records) | Hooks (git-backed persistence) | None | None | RuVector (self-learning) |
| Maturity | v0.5.7, actively developed | Early (10.8k stars) | Experimental (Feb 2026) | Stable CLI | v3.5 (post-alpha, 18.6k stars) |
Where Overstory Wins
Merge conflict resolution. Nobody else has a 4-tier pipeline. Agent Teams warns about overwrites. Gastown uses implicit avoidance. Ruflo doesn't document it. Overstory will literally spawn an AI to rewrite your changes from scratch if three other tiers fail.
Structured health monitoring. The ZFC principle (observable state beats recorded state) with progressive escalation from "log a warning" to "kill the process tree" is more sophisticated than any competitor's health system.
Zero dependencies. No Go compiler. No Dolt. No npm packages. Bun runs the TypeScript directly. Everything external (git, tmux, claude) is invoked via Bun.spawn.
Expertise accumulation. Mulch records (conventions, patterns, failures, decisions) persist across sessions and get injected into agent context at spawn time. Agents literally learn from previous runs.
Where Overstory Loses
Multi-model support. Claude only. Gastown and Ruflo support Codex, Gemini, and others. If you want model diversity, Overstory is not the answer.
First-party support. Agent Teams is built into Claude Code. No installation, no extra CLI, no configuration files. It just works (with the caveats of no git isolation and no merge handling).
Community and ecosystem. Gastown has 10.8k stars and Steve Yegge's name recognition. Ruflo has 18.6k stars and 60+ agent types. Overstory is a smaller project with deeper architectural investment in the problems others ignore.
Scale experience. Anthropic stress-tested Agent Teams with 16 agents across ~2,000 sessions building a C compiler. Gastown reports running 20-30 agents. Overstory's default ceiling is 25 concurrent agents, but real-world scale reports are limited.
10. The Shared DNA with Gastown
Overstory and Gastown solve the same fundamental problem using remarkably similar primitives:
| Primitive | Overstory | Gastown |
|---|---|---|
| Agent isolation | git worktree add | git worktree add |
| Agent execution | Tmux sessions | Tmux sessions |
| Agent identity | identity.yaml in .overstory/agents/{name}/ | Persistent identity via hooks |
| Issue tracking | Beads (bd CLI) | Beads (bd CLI), the same project |
| State persistence | SQLite (4 databases) | Dolt (Git for data) |
| Communication | SQLite mail | Mailboxes |
| Session grouping | Runs (current-run.txt) | Convoys |
The divergence is in philosophy. Gastown is multi-runtime (Claude, Codex, Gemini, Cursor) with Go infrastructure and Dolt for data versioning. Overstory is Claude-native with zero dependencies, purpose-built SQLite stores, and deeper investment in conflict resolution and health monitoring.
Gastown uses Formulas (embedded TOML workflow definitions) for repeatable processes. Overstory uses agent definition files (markdown) with a two-layer overlay system (base HOW + dynamic WHAT). Both achieve the same goal — encoding reusable workflow knowledge — through different mechanisms.
11. What's Missing
| Feature | Status |
|---|---|
| Multi-model support | No. Claude only. |
| Automatic cost ceiling / budget cap | No. overstory costs --live shows burn rate, but there's no automatic stop. |
| Web UI / dashboard | No. Terminal TUI only (overstory dashboard). |
| CI/CD integration | No. No GitHub Actions, no PR creation, no automated deployment. |
| Remote execution | No. Local tmux only. No SSH, no container orchestration. |
| Windows support | No. Requires tmux (macOS/Linux). |
| Automatic retry on agent crash | Partial. Watchdog detects crashes and can nudge. Session checkpoint/resume exists but is not fully automatic. |
| Nested team composition | Yes. 3-level hierarchy is core architecture. |
| Conflict history learning | Yes. Mulch records merge patterns, skips tiers that repeatedly fail. |
12. The Bottom Line
The Claude Code multi-agent space in early 2026 has a clear spectrum:
If you want zero setup: Agent Teams. It's built in, it works, and its limitations (no git isolation, no merge handling) may not matter for your use case.
If you want multi-model and scale: Gastown. It supports everything, scales to 30 agents, and has Steve Yegge's operational experience behind it. Pay the $100/hr and bring your own Dolt.
If you want architectural rigor: Overstory. It's the only system with a real merge pipeline, ZFC-principled health monitoring, and zero runtime dependencies. It's Claude-only, and it's not trying to be everything for everyone.
If you want intelligent routing: Ruflo. 60+ agent types, self-learning, consensus algorithms. Whether the complexity pays for itself is an open question.
If you want multi-model code review: MCO. It's a different tool for a different job — parallel analysis across providers, not code generation orchestration.
Pick your orchestrator based on whether you need git safety nets or model diversity. As of today, nobody offers both.