TL;DR: Overstory turns a single Claude Code session into a coordinated swarm of up to 25 concurrent agents (the default cap) by parasitizing Claude Code's own hook system, isolating work in git worktrees, and resolving merge conflicts through a 4-tier escalation pipeline that goes from git merge to "ask an AI to rewrite it from scratch", all with zero npm runtime dependencies.

Overstory Architecture Deep Dive

If you're evaluating multi-agent orchestration for Claude Code — or wondering how Overstory compares to Gastown, Agent Teams, Ruflo, and MCO — this is the architectural teardown.

1. What Is Overstory?

Overstory is a project-agnostic swarm system for Claude Code agent orchestration. There is no separate orchestration daemon: your Claude Code session is the orchestrator. The system bootstraps itself through three mechanisms:

CLAUDE.md: the project instruction file that Claude Code reads on startup
Hooks: Claude Code's lifecycle event system (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop, PreCompact)
The overstory CLI: 29 commands for spawning agents, messaging, merging, and monitoring

The runtime is Bun with TypeScript. Zero npm runtime dependencies — every external interaction (git, tmux, Claude CLI) goes through Bun.spawn. All persistent state lives in SQLite databases using bun:sqlite with WAL mode for concurrent access across agents.

Tech Stack at a Glance

| Component | Implementation |
|---|---|
| Runtime | Bun (runs TypeScript directly, no build step) |
| Dependencies | Zero runtime npm packages. `bun:sqlite`, `Bun.spawn`, `Bun.file` only |
| Databases | 4 SQLite DBs: `mail.db`, `sessions.db`, `events.db`, `metrics.db` (plus `merge-queue.db` for the merge queue) |
| Agent isolation | Git worktrees (one per agent) |
| Agent execution | Tmux sessions (one per agent, running `claude --dangerously-skip-permissions`) |
| Communication | Custom SQLite mail system (~1-5ms per query) |
| Issue tracking | Beads (`bd` CLI, git-backed JSONL) |
| Expertise | Mulch (`mulch` CLI, structured knowledge records) |

2. The Hierarchy

Overstory enforces a strict 3-level hierarchy with depth limits:

```text
Orchestrator (your Claude Code session, depth 0)
├── Coordinator agent (depth 0, spawns leads only)
│   ├── Lead agent (depth 1, spawns workers)
│   │   ├── Scout (depth 2, read-only recon)
│   │   ├── Builder (depth 2, writes code)
│   │   ├── Reviewer (depth 2, read-only validation)
│   │   └── Merger (depth 2, branch integration)
│   └── Supervisor agent (depth 1, persistent per-project)
│       └── Workers (depth 2, same types as above)
└── Monitor agent (Tier 2, observer only, no worktree)
```

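This is code-enforced. If the coordinator tries to spawn a builder directly, it gets a HierarchyError. The check keys off the missing parent: any spawn without `--parent` must request the `lead` capability, and only `forceHierarchy` bypasses it: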

```typescript
export function validateHierarchy(
  parentAgent: string | null,
  capability: string,
  name: string,
  _depth: number,
  forceHierarchy: boolean,
): void {
  if (forceHierarchy) return;
  if (parentAgent === null && capability !== "lead") {
    throw new HierarchyError(
      `Coordinator cannot spawn "${capability}" directly. Only "lead" is allowed without --parent.`,
      { agentName: name, requestedCapability: capability },
    );
  }
}
```

The lead agent enforces a mandatory 3-phase workflow: Scout → Build → Review. Skipping scouts is a named failure mode (SCOUT_SKIP). Every builder must have a corresponding reviewer. Non-overlapping file scopes are enforced — two builders cannot own the same file.
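A minimal sketch of a non-overlapping file-scope check, assuming each builder's scope is the list of paths passed via `--files`. `FileScope` and `findScopeConflicts` are hypothetical names, not Overstory's code, and the sketch compares exact paths only (no globs, directory prefixes, or path normalization).

```typescript
// Hypothetical input: one entry per builder, with its scope taken from --files.
interface FileScope {
  agent: string;
  files: string[];
}

interface ScopeConflict {
  file: string;
  agents: [string, string]; // [earlier owner, later claimant]
}

// Reports each claim on a file that a different builder already owns.
// Exact path comparison only.
function findScopeConflicts(scopes: FileScope[]): ScopeConflict[] {
  const owner = new Map<string, string>();
  const conflicts: ScopeConflict[] = [];
  for (const { agent, files } of scopes) {
    for (const file of files) {
      const existing = owner.get(file);
      if (existing === undefined) {
        owner.set(file, agent);
      } else if (existing !== agent) {
        conflicts.push({ file, agents: [existing, agent] });
      }
    }
  }
  return conflicts;
}

// Two builders both claim src/auth.ts, so one conflict is reported.
const conflicts = findScopeConflicts([
  { agent: "auth-builder", files: ["src/auth.ts", "src/auth.test.ts"] },
  { agent: "session-builder", files: ["src/session.ts", "src/auth.ts"] },
]);
console.log(conflicts);
```

Running the check before any worktree is created means an overlapping plan is rejected while it is still cheap to fix.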

3. How Agent Spawning Works

The overstory sling command is the critical path. It orchestrates a 14-step spawn sequence:

```sh
overstory sling <task-id> --name auth-builder --capability builder --parent auth-lead --depth 2 --files src/auth.ts,src/auth.test.ts
```

Here's what happens (traced from src/commands/sling.ts):

1. Load config: reads `.overstory/config.yaml`, resolves the project root (handles worktree detection)
2. Validate depth: `depth > config.agents.maxDepth` (default 2) → reject
3. Validate hierarchy: the coordinator can only spawn leads
4. Load agent manifest: validates that the capability exists in `agent-manifest.json`
5. Resolve run ID: reads or creates `current-run.txt` for session grouping
6. Check concurrency: active agents vs `maxConcurrent` (default 25)
7. Check name uniqueness: no duplicate active agent names
8. Enforce stagger delay: minimum gap between spawns (default 2s)
9. Validate bead: the task must be `open` or `in_progress`
10. Create git worktree: `git worktree add -b overstory/{name}/{taskId} .overstory/worktrees/{name} main`
11. Generate overlay CLAUDE.md: Layer 1 (role HOW) + Layer 2 (task WHAT), including pre-fetched mulch expertise
12. Deploy hooks: capability-specific guards written to the worktree's `.claude/settings.local.json`
13. Create tmux session: `tmux new-session -d -s overstory-{project}-{name} -c {worktreePath} "claude --model {model} --dangerously-skip-permissions"`
14. Send startup beacon: a structured message via `tmux send-keys` after a 3s initialization delay
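The state-only checks in this sequence (depth, concurrency, name uniqueness, stagger delay) can be sketched as a pure function, which keeps them testable without git or tmux. This is a hypothetical reconstruction rather than code from src/commands/sling.ts: `SlingRequest`, `FleetState`, `SpawnLimits`, and `preflight` are invented names, and only the defaults (maxDepth 2, maxConcurrent 25, a 2s stagger) come from the list above.

```typescript
// Hypothetical types; Overstory's real config and state shapes may differ.
interface SpawnLimits {
  maxDepth: number;      // article default: 2
  maxConcurrent: number; // article default: 25
  staggerMs: number;     // article default: 2000 (2s)
}

interface FleetState {
  activeAgentNames: ReadonlySet<string>;
  lastSpawnAtMs: number | null;
}

interface SlingRequest {
  name: string;
  depth: number;
  nowMs: number;
}

type Preflight =
  | { ok: true }
  | { ok: false; reason: string; retryAfterMs?: number };

// Runs the state-only checks in the order the spawn sequence lists them.
function preflight(req: SlingRequest, fleet: FleetState, limits: SpawnLimits): Preflight {
  if (req.depth > limits.maxDepth) {
    return { ok: false, reason: `depth ${req.depth} exceeds maxDepth ${limits.maxDepth}` };
  }
  if (fleet.activeAgentNames.size >= limits.maxConcurrent) {
    return { ok: false, reason: `maxConcurrent (${limits.maxConcurrent}) reached` };
  }
  if (fleet.activeAgentNames.has(req.name)) {
    return { ok: false, reason: `an active agent is already named "${req.name}"` };
  }
  if (fleet.lastSpawnAtMs !== null) {
    const elapsed = req.nowMs - fleet.lastSpawnAtMs;
    if (elapsed < limits.staggerMs) {
      // Whether sling waits or rejects here isn't stated; this sketch reports the remaining gap.
      return { ok: false, reason: "stagger delay", retryAfterMs: limits.staggerMs - elapsed };
    }
  }
  return { ok: true };
}

const limits: SpawnLimits = { maxDepth: 2, maxConcurrent: 25, staggerMs: 2000 };
const fleet: FleetState = { activeAgentNames: new Set(["auth-lead"]), lastSpawnAtMs: 10_000 };
const tooSoon = preflight({ name: "auth-builder", depth: 2, nowMs: 11_500 }, fleet, limits);
const later = preflight({ name: "auth-builder", depth: 2, nowMs: 12_500 }, fleet, limits);
console.log(tooSoon, later);
```

Ordering the cheap checks before worktree creation means a rejected spawn leaves nothing on disk to clean up.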

The tmux session creation in src/worktree/tmux.ts handles PATH injection so hooks can find the overstory binary:

```typescript
export async function createSession(
  name: string, cwd: string, command: string,
  env?: Record<string, string>,
): Promise<number> {
  // Prepend the overstory binary's directory so hook commands can resolve `overstory`.
  const overstoryBinDir = await detectOverstoryBinDir();
  const exports: string[] = [];
  if (overstoryBinDir) {
    exports.push(`export PATH="${overstoryBinDir}:$PATH"`);
  }
  if (env) {
    for (const [key, value] of Object.entries(env)) {
      // Values are interpolated into a double-quoted shell string without escaping.
      exports.push(`export ${key}="${value}"`);
    }
  }
  // tmux hands this single string to a shell: exports first, then the agent command.
  const wrappedCommand = exports.length > 0
    ? `${exports.join(" && ")} && ${command}` : command;

  const { exitCode, stderr } = await runCommand(
    ["tmux", "new-session", "-d", "-s", name, "-c", cwd, wrappedCommand], cwd,
  );
  // ... PID retrieval via tmux list-panes ...
}
```
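Because `createSession` assembles one shell string, an env value containing `"` or `$` is reinterpreted by the shell before it reaches the agent. A minimal sketch of a safer variant, assuming tmux runs the command through a POSIX shell: every value is single-quoted with embedded single quotes escaped, and env var names are validated. `shellQuote`, `buildWrappedCommand`, and `EXAMPLE_VAR` are hypothetical, not part of Overstory.

```typescript
// POSIX single-quote escaping: wrap in '...' and turn each embedded ' into '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Shell identifiers only; anything else is rejected rather than quoted.
const ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical safer variant of the wrapped-command construction in createSession.
function buildWrappedCommand(
  command: string,
  binDir: string | null,
  env: Record<string, string> = {},
): string {
  const parts: string[] = [];
  if (binDir) {
    // $PATH stays outside the single quotes so the shell still expands it.
    parts.push(`export PATH=${shellQuote(binDir)}:"$PATH"`);
  }
  for (const [key, value] of Object.entries(env)) {
    if (!ENV_NAME.test(key)) throw new Error(`invalid env var name: ${key}`);
    parts.push(`export ${key}=${shellQuote(value)}`);
  }
  return parts.length > 0 ? `${parts.join(" && ")} && ${command}` : command;
}

const wrapped = buildWrappedCommand(
  "claude --dangerously-skip-permissions",
  "/opt/overstory/bin",
  { EXAMPLE_VAR: `a "b" $1` },
);
console.log(wrapped);
```

The result would still be passed as the single command argument to `tmux new-session`; only the quoting of the exported values changes.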

4. The Messaging System: SQLite Mail

Agents communicate through a custom SQLite mail system in .overstory/mail.db. Not email. Not HTTP. Not Redis. A single SQLite database in WAL mode with prepared statements and ~1-5ms query latency.

The schema (src/mail/store.ts):

```sql
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  from_agent TEXT NOT NULL,
  to_agent TEXT NOT NULL,
  subject TEXT NOT NULL,
  body TEXT NOT NULL,
  type TEXT NOT NULL DEFAULT 'status'
    CHECK(type IN ('status','question','result','error',
      'worker_done','merge_ready','merged','merge_failed',
      'escalation','health_check','dispatch','assign')),
  priority TEXT NOT NULL DEFAULT 'normal'
    CHECK(priority IN ('low','normal','high','urgent')),
  thread_id TEXT,
  payload TEXT,
  read INTEGER NOT NULL DEFAULT 0,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_inbox ON messages(to_agent, read);
```

There are 4 semantic types (status, question, result, error) and 8 protocol types (worker_done, merge_ready, merged, merge_failed, escalation, health_check, dispatch, assign). Protocol types carry structured JSON payloads — for example, worker_done includes { beadId, branch, exitCode, filesModified }.
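Because `payload` is stored as TEXT, each consumer has to parse and validate it. A minimal sketch of a decoder for the `worker_done` payload, using the four field names listed above; the field types (`exitCode` as a number, `filesModified` as an array of paths) and the `parseWorkerDone` name are assumptions.

```typescript
interface WorkerDonePayload {
  beadId: string;
  branch: string;
  exitCode: number;
  filesModified: string[];
}

// Returns the decoded payload, or null if the JSON is malformed or any field has the wrong type.
function parseWorkerDone(raw: string | null): WorkerDonePayload | null {
  if (raw === null) return null; // payload is a nullable column
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { beadId, branch, exitCode, filesModified } = data as Record<string, unknown>;
  if (
    typeof beadId !== "string" ||
    typeof branch !== "string" ||
    typeof exitCode !== "number" ||
    !Array.isArray(filesModified) ||
    !filesModified.every((f) => typeof f === "string")
  ) {
    return null;
  }
  return { beadId, branch, exitCode, filesModified: filesModified as string[] };
}

const done = parseWorkerDone(
  '{"beadId":"bd-42","branch":"overstory/auth-builder/bd-42","exitCode":0,"filesModified":["src/auth.ts"]}',
);
console.log(done?.filesModified);
```

Returning `null` instead of throwing lets a receiving lead treat a malformed protocol message like any other unreadable mail.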

Mail delivery is hook-driven. The UserPromptSubmit and PostToolUse hooks run overstory mail check --inject --agent {name}, which queries unread messages, marks them read, and formats them as human-readable text injected into the agent's context:

```text
You have 2 new messages:

--- From: auth-lead [HIGH] (question) ---
Subject: File scope clarification
Are you handling the middleware auth or just the route handler?
[Reply with: overstory mail reply msg-abc123 --body "..."]
```

This is the parasitic genius of the system — Claude Code's own hook infrastructure becomes the agent's mailbox polling loop. No separate daemon needed.
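The injected block is a plain-text rendering of the unread rows. A minimal sketch that reproduces the layout of the example above from rows shaped like the `messages` table; `MailRow` and `formatInbox` are hypothetical, and hiding the tag for `normal` priority is a guess, since the example only shows `[HIGH]`.

```typescript
interface MailRow {
  id: string;
  from_agent: string;
  subject: string;
  body: string;
  type: string;
  priority: "low" | "normal" | "high" | "urgent";
}

// Renders unread rows in the layout shown above; returns "" when there is nothing to inject.
function formatInbox(rows: readonly MailRow[]): string {
  if (rows.length === 0) return "";
  const noun = rows.length === 1 ? "message" : "messages";
  const blocks = rows.map((m) => {
    // Assumption: only non-normal priorities get a tag such as [HIGH].
    const tag = m.priority === "normal" ? "" : ` [${m.priority.toUpperCase()}]`;
    return [
      `--- From: ${m.from_agent}${tag} (${m.type}) ---`,
      `Subject: ${m.subject}`,
      m.body,
      `[Reply with: overstory mail reply ${m.id} --body "..."]`,
    ].join("\n");
  });
  return `You have ${rows.length} new ${noun}:\n\n${blocks.join("\n\n")}`;
}

const text = formatInbox([
  {
    id: "msg-abc123",
    from_agent: "auth-lead",
    subject: "File scope clarification",
    body: "Are you handling the middleware auth or just the route handler?",
    type: "question",
    priority: "high",
  },
]);
console.log(text);
```

Returning an empty string for an empty inbox lets a hook print the result unconditionally without adding noise to the agent's context.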

5. The Merge Pipeline: 4-Tier Escalation

When an agent finishes work, its branch needs to be integrated. Overstory uses a FIFO merge queue (merge-queue.db, SQLite) and a 4-tier conflict resolution pipeline in src/merge/resolver.ts:

| Tier | Strategy | How It Works |
|---|---|---|
| 1: Clean merge | `git merge --no-edit` | If no conflicts, done. Cheapest possible outcome. |
| 2: Auto-resolve | Keep incoming changes | Parse conflict markers, replace each block with the agent's version. |
| 3: AI-resolve | `claude --print` per file | Send conflicted file content to Claude with "output ONLY the resolved content, NO explanation." Validates output isn't prose. |
| 4: Re-imagine | Full rewrite | Abort merge. Show Claude both canonical and branch versions: "Reimplement the changes from the branch version onto the canonical version." |
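Tier 2 can be illustrated with a small conflict-marker parser. This sketch assumes standard git markers (`<<<<<<<`, `=======`, `>>>>>>>`, plus an optional `|||||||` base section in diff3 style) and keeps the side after `=======`, which during `git merge <agent-branch>` is the incoming branch. `keepIncoming` is a hypothetical illustration of the strategy in the table, not the code in src/merge/resolver.ts.

```typescript
type Section = "outside" | "ours" | "base" | "theirs";

// Replaces every conflict block with its incoming ("theirs") side.
// Returns null if a block is left unterminated, so the caller can escalate to the next tier.
function keepIncoming(content: string): string | null {
  const kept: string[] = [];
  let section: Section = "outside";
  for (const line of content.split("\n")) {
    const marker = line.trimEnd(); // tolerate CRLF line endings on marker lines
    switch (section) {
      case "outside":
        if (marker.startsWith("<<<<<<<")) section = "ours";
        else kept.push(line);
        break;
      case "ours":
        if (marker.startsWith("|||||||")) section = "base";
        else if (marker === "=======") section = "theirs";
        break; // lines from the canonical side are dropped
      case "base":
        if (marker === "=======") section = "theirs";
        break; // diff3 base lines are dropped too
      case "theirs":
        if (marker.startsWith(">>>>>>>")) section = "outside";
        else kept.push(line);
        break;
    }
  }
  return section === "outside" ? kept.join("\n") : null;
}

const conflicted = [
  "const a = 1;",
  "<<<<<<< HEAD",
  "const timeout = 30;",
  "=======",
  "const timeout = 60;",
  ">>>>>>> overstory/auth-builder/bd-42",
  "export { a };",
].join("\n");
const resolved = keepIncoming(conflicted);
console.log(resolved);
```

This strategy discards the canonical side of every conflict block, so it is only safe when the agent's version already contains everything the canonical side added.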

The prose detection in Tier 3 (looksLikeProse()) catches when Claude forgets it's supposed to output raw code:

````typescript
export function looksLikeProse(text: string): boolean {
  const prosePatterns = [
    /^(I |I'[a-z]+ |Here |Here's |The |This |Let me |Sure|Unfortunately)/i,
    /^(To resolve|Looking at|Based on|After reviewing|The conflict)/i,
    /^```/m, // Markdown fencing: the model wrapped the code
    /I need permission/i,
  ];
  for (const pattern of prosePatterns) {
    if (pattern.test(text.trim())) return true;
  }
  return false;
}
````

The system also queries mulch for historical conflict patterns. If a tier has failed two or more times for the same files and has never succeeded, it is skipped entirely, and past successful resolutions are fed into the AI prompt as context. In effect, the merge pipeline learns from its own history.
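The tier ordering and the history rule combine naturally into one loop. A minimal sketch, assuming a per-tier success/failure count has already been looked up for the current file set; `TierHistory`, `shouldSkipTier`, and `resolveWithEscalation` are hypothetical names, and feeding past successes into the AI prompt is omitted. The threshold of two failures follows the rule stated above.

```typescript
type Tier = "clean-merge" | "auto-resolve" | "ai-resolve" | "reimagine";

interface TierHistory {
  successes: number;
  failures: number;
}

// The rule described above: skip a tier with two or more failures and no successes.
function shouldSkipTier(h: TierHistory | undefined): boolean {
  return h !== undefined && h.failures >= 2 && h.successes === 0;
}

interface EscalationResult {
  resolvedBy: Tier | null; // null means every tier failed or was skipped
  attempted: Tier[];
  skipped: Tier[];
}

// Walks the tiers in order and stops at the first strategy that reports success.
// Strategies are synchronous here for brevity; real tiers would shell out to git or claude.
function resolveWithEscalation(
  strategies: ReadonlyArray<readonly [Tier, () => boolean]>,
  history: ReadonlyMap<Tier, TierHistory>,
): EscalationResult {
  const attempted: Tier[] = [];
  const skipped: Tier[] = [];
  for (const [tier, attempt] of strategies) {
    if (shouldSkipTier(history.get(tier))) {
      skipped.push(tier);
      continue;
    }
    attempted.push(tier);
    if (attempt()) return { resolvedBy: tier, attempted, skipped };
  }
  return { resolvedBy: null, attempted, skipped };
}

// auto-resolve has failed three times for these files and never succeeded, so it is skipped.
const history = new Map<Tier, TierHistory>([["auto-resolve", { successes: 0, failures: 3 }]]);
const result = resolveWithEscalation(
  [
    ["clean-merge", () => false],
    ["auto-resolve", () => true],
    ["ai-resolve", () => true],
    ["reimagine", () => true],
  ],
  history,
);
console.log(result);
```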

6. The Watchdog: 4-Tier Health Monitoring

Agent health monitoring runs across four tiers:

| Tier | Name | What It Does |
|---|---|---|
| 0 | Mechanical daemon | Process monitoring via `setInterval`. Checks tmux liveness, PID liveness, lastActivity timestamps. |
| 1 | AI triage | Ephemeral `claude --print` call to classify failure as `retry`, `terminate`, or `extend`. |
| 2 | Monitor agent | Persistent Claude Code session doing continuous patrol across all agents. |
| 3 | Supervisor | Per-project oversight (built into supervisor agent role). |

The Tier 0 daemon implements the ZFC principle (Zero Failure Crash) from src/watchdog/health.ts: observable state always wins over recorded state.

```text
Signal priority (highest to lowest):
  1. tmux session liveness:  Is the tmux session actually running?
  2. Process liveness (pid): Is the Claude Code process still alive?
  3. Recorded state:         What does sessions.db claim?

When signals conflict:
  - tmux dead + sessions.db says "working" → zombie immediately
  - tmux alive + sessions.db says "zombie" → investigate, don't auto-kill
  - pid dead + tmux alive → zombie (agent exited, shell survived)
```
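These rules reduce to a small decision function over three already-collected signals (for example, from `tmux has-session -t <name>` and `process.kill(pid, 0)`). A minimal sketch; `Signals`, `Verdict`, and `reconcile` are hypothetical names, the recorded states come from the sessions.db state machine, and treating a missing tmux session as a zombie for every record except `completed` extends the single rule stated for `working`.

```typescript
type RecordedState = "booting" | "working" | "completed" | "stalled" | "zombie";

// "ok": nothing to do; "zombie": clean up; "investigate": alive but recorded dead.
type Verdict = "ok" | "zombie" | "investigate";

interface Signals {
  tmuxAlive: boolean;      // highest priority
  pidAlive: boolean;       // second priority
  recorded: RecordedState; // what sessions.db claims; lowest priority
}

// Observable state is checked first and always overrides the recorded state.
function reconcile(s: Signals): Verdict {
  if (!s.tmuxAlive) {
    // No session exists. Assumption: any record other than "completed" is stale.
    return s.recorded === "completed" ? "ok" : "zombie";
  }
  if (!s.pidAlive) return "zombie"; // agent exited, shell survived
  if (s.recorded === "zombie") return "investigate"; // don't auto-kill a live session
  return "ok";
}

console.log(reconcile({ tmuxAlive: true, pidAlive: false, recorded: "working" }));
```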

Progressive escalation for stalled agents follows a 4-level nudge protocol:

```text
Level 0 (warn):      Log warning, no action
Level 1 (nudge):     Send tmux nudge: "appears stalled, please check"
Level 2 (escalate):  Invoke Tier 1 AI triage (if enabled)
Level 3 (terminate): Kill tmux session + descendant process tree
```
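The article does not say how an agent moves between these levels. A minimal sketch, assuming one level per consecutive stalled health check, capped at Level 3; `escalationAction` and `NudgeAction` are hypothetical, and falling back to another nudge when AI triage is disabled is also an assumption.

```typescript
type NudgeAction = "warn" | "nudge" | "triage" | "terminate";

// Hypothetical policy: the first stalled check is Level 0, each further one raises the level, capped at 3.
function escalationAction(consecutiveStalledChecks: number, triageEnabled: boolean): NudgeAction {
  const level = Math.min(Math.max(consecutiveStalledChecks - 1, 0), 3);
  switch (level) {
    case 0:
      return "warn";
    case 1:
      return "nudge";
    case 2:
      // Assumption: with AI triage disabled, Level 2 falls back to another nudge.
      return triageEnabled ? "triage" : "nudge";
    default:
      return "terminate";
  }
}

// The counter would reset to 0 whenever the agent shows fresh activity.
const actions = [1, 2, 3, 4].map((n) => escalationAction(n, true));
console.log(actions);
```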

Process tree cleanup in src/worktree/tmux.ts walks descendant PIDs recursively using pgrep -P, sends SIGTERM deepest-first, waits a 2-second grace period, then sends SIGKILL to any survivors. This prevents orphaned git, bun test, and biome processes from accumulating.
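The ordering step can be sketched separately from the signalling. This assumes the parent-to-children map has already been assembled from repeated `pgrep -P <pid>` calls; `deepestFirst` is a hypothetical name, not Overstory's function. Grouping by depth guarantees every descendant is signaled before its ancestors.

```typescript
// Groups a process tree into depth levels (breadth-first) and returns the deepest level first,
// so no process is signaled before any of its descendants.
function deepestFirst(rootPid: number, childrenOf: ReadonlyMap<number, readonly number[]>): number[] {
  const levels: number[][] = [];
  const seen = new Set<number>([rootPid]); // guards against a cycle from PID reuse
  let frontier: number[] = [rootPid];
  while (frontier.length > 0) {
    levels.push(frontier);
    const next: number[] = [];
    for (const pid of frontier) {
      for (const child of childrenOf.get(pid) ?? []) {
        if (!seen.has(child)) {
          seen.add(child);
          next.push(child);
        }
      }
    }
    frontier = next;
  }
  const order: number[] = [];
  for (const level of levels.reverse()) order.push(...level);
  return order;
}

// 100 is the agent's shell; 300 spawned 500, which spawned 600.
const tree = new Map<number, number[]>([
  [100, [200, 300]],
  [300, [500]],
  [500, [600]],
]);
const order = deepestFirst(100, tree);
console.log(order);
```

A caller would send SIGTERM to each PID in this order (for example with `process.kill(pid, "SIGTERM")`), wait out the grace period, and then send SIGKILL to any PID that is still alive.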

7. Hooks: The Nervous System

The hooks template (templates/hooks.json.tmpl) wires Overstory into Claude Code's lifecycle:

| Hook | What It Does |
|---|---|
| `SessionStart` | `overstory prime --agent {name}`: loads config, recent activity, mulch expertise |
| `UserPromptSubmit` | `overstory mail check --inject`: polls inbox, injects unread messages |
| `PreToolUse` (all) | `overstory log tool-start`: records tool invocation to events.db |
| `PreToolUse` (Bash) | Blocks `git push` with a `decision: block` response |
| `PostToolUse` | `overstory log tool-end`: records completion + `mail check --inject` with debounce |
| `Stop` | `overstory log session-end` + `mulch learn`: captures session insights |
| `PreCompact` | `overstory prime --compact`: re-injects context before context window compaction |

The git push block is enforced at the hook level for all agents — no agent can push to remote. All integration goes through the merge pipeline. This is a hard architectural constraint, not a suggestion in a markdown file.
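A minimal sketch of the Bash guard's decision logic. It assumes the hook receives the tool call as JSON with `tool_name` and `tool_input.command` fields and returns the `decision: block` shape named in the hooks table; check Claude Code's hooks reference for the exact input and output schema your version expects. `guardGitPush` and its regex are illustrative, not Overstory's implementation: the pattern allows global options such as `-C <dir>` before `push` while ignoring subcommands like `git stash push`.

```typescript
// Assumed subset of the JSON a PreToolUse hook receives on stdin.
interface PreToolUseInput {
  tool_name: string;
  tool_input: { command?: string };
}

type HookDecision = { decision: "block"; reason: string } | null;

// `git`, then any global options (e.g. -C <dir>, -c k=v, --no-pager), then the `push` subcommand.
const GIT_PUSH = /\bgit(?:\s+(?:-[Cc]\s+\S+|--?[\w-]+(?:=\S+)?))*\s+push\b/;

function guardGitPush(input: PreToolUseInput): HookDecision {
  if (input.tool_name !== "Bash") return null;
  const command = input.tool_input.command ?? "";
  if (!GIT_PUSH.test(command)) return null;
  return {
    decision: "block",
    reason: "Agents may not push. Integrate through the Overstory merge queue instead.",
  };
}

const bash = (command: string): PreToolUseInput => ({ tool_name: "Bash", tool_input: { command } });
console.log(guardGitPush(bash("cd repo && git push origin main")));
```

The match is deliberately conservative: text such as `echo git push` is also blocked, which costs an agent a retry but never lets a real push through.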

8. Observability Stack

Four SQLite databases provide full-fleet observability:

| Database | Stores | Query Commands |
|---|---|---|
| `sessions.db` | Agent sessions (state machine: booting->working->completed/stalled->zombie), runs | `overstory status`, `overstory run` |
| `events.db` | Tool invocations, session lifecycle, errors, custom events | `overstory trace`, `overstory errors`, `overstory replay` |
| `mail.db` | Inter-agent messages with threading and priority | `overstory mail list` |
| `metrics.db` | Token usage, cost estimates per session | `overstory costs` |

The overstory dashboard command provides a live TUI polling every 2 seconds. overstory replay interleaves events across agents chronologically for post-mortem analysis. overstory costs --live shows real-time token burn rates for active agents.
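A minimal sketch of the interleaving step behind a replay view, assuming each event carries an ISO-8601 timestamp. `AgentEvent`, `interleave`, and `renderTimeline` are hypothetical, and the event kinds reuse the `overstory log` subcommand names from the hooks table purely for illustration.

```typescript
interface AgentEvent {
  agent: string;
  at: string;   // ISO-8601 timestamp (assumed storage format)
  kind: string; // illustrative kinds: "tool-start", "tool-end", "session-end"
}

// Flattens per-agent streams into one chronological timeline.
// Array.prototype.sort is stable (ES2019+), so equal timestamps keep their input order.
function interleave(streams: ReadonlyArray<readonly AgentEvent[]>): AgentEvent[] {
  const all: AgentEvent[] = [];
  for (const stream of streams) all.push(...stream);
  return all.sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
}

// Prints each event with its offset from the first one, which is easier to scan in a post-mortem.
function renderTimeline(events: readonly AgentEvent[]): string[] {
  const first = events[0];
  if (first === undefined) return [];
  const start = Date.parse(first.at);
  return events.map((e) => `+${((Date.parse(e.at) - start) / 1000).toFixed(1)}s ${e.agent} ${e.kind}`);
}

const lead: AgentEvent[] = [
  { agent: "auth-lead", at: "2026-03-01T10:00:00.000Z", kind: "tool-start" },
  { agent: "auth-lead", at: "2026-03-01T10:00:03.000Z", kind: "tool-end" },
];
const builder: AgentEvent[] = [
  { agent: "auth-builder", at: "2026-03-01T10:00:01.500Z", kind: "tool-start" },
];
const timeline = renderTimeline(interleave([lead, builder]));
console.log(timeline.join("\n"));
```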

9. Comparison: Overstory vs the Field

Here's how Overstory stacks up against the other major Claude Code orchestration systems as of March 2026:

| Dimension | Overstory | Gastown | Agent Teams (Native) | MCO | Ruflo |
|---|---|---|---|---|---|
| Builder | js0n | Steve Yegge | Anthropic | mco-org | Reuven Cohen |
| Architecture | CLAUDE.md + hooks + CLI | Mayor + Polecats + hooks | Lead + Teammates (flat) | Fan-out/wait-all | Queen-led swarms (layered) |
| Max agents | 25 (configurable) | 20-30 | 3-5 recommended | Per-provider | 60+ agent types |
| Hierarchy depth | 3 levels (enforced) | Hierarchical (Mayor -> Polecats) | 2 levels (lead + teammates) | Flat (parallel) | Queen -> workers |
| Git isolation | Git worktrees (auto) | Git worktrees (auto) | None (manual) | None | Not documented |
| Merge handling | 4-tier escalation (clean->auto->AI->reimagine) | Implicit (non-overlapping tasks) | None (overwrites) | N/A (read-only) | Not documented |
| Communication | SQLite mail (~1-5ms) | Mailboxes + convoys | Direct messaging + shared tasks | Indirect via aggregation | Consensus protocols (Raft, BFT) |
| Health monitoring | 4-tier (daemon->AI triage->monitor agent->supervisor) | Feed + problems view | Hooks (TeammateIdle) | `doctor` command | 12 background workers |
| Multi-model | No (Claude only) | Yes (Claude, Codex, Gemini) | No (Claude only) | Yes (any CLI agent) | Yes (Claude, GPT, Gemini, Ollama) |
| Runtime deps | Zero npm packages (uses Bun, git, tmux, `bd`, `mulch`) | Go + Dolt + beads | Zero (built-in) | npm (adapter packages) | MCP + various |
| Expertise system | Mulch (structured records) | Hooks (git-backed persistence) | None | None | RuVector (self-learning) |
| Maturity | v0.5.7, actively developed | Early (10.8k stars) | Experimental (Feb 2026) | Stable CLI | v3.5 (post-alpha, 18.6k stars) |

Where Overstory Wins

Merge conflict resolution. Nobody else has a 4-tier pipeline. Agent Teams warns about overwrites, Gastown relies on implicit avoidance, and Ruflo doesn't document merge handling. If the first three tiers fail, Overstory hands Claude both the canonical and branch versions and asks it to reimplement the changes from scratch.

Structured health monitoring. The ZFC principle (observable state beats recorded state), combined with progressive escalation from "log a warning" to "kill the process tree", is more systematic than the health tooling documented by the other systems in this comparison.

Zero npm dependencies. No Go compiler and no Dolt. Bun runs the TypeScript directly, and everything external (git, tmux, claude, plus the `bd` and `mulch` CLIs) is invoked via Bun.spawn.

Expertise accumulation. Mulch records (conventions, patterns, failures, decisions) persist across sessions and get injected into agent context at spawn time, so each new agent starts with what earlier runs recorded.

Where Overstory Loses

Multi-model support. Claude only. Gastown and Ruflo support Codex, Gemini, and others. If you want model diversity, Overstory is not the answer.

First-party support. Agent Teams is built into Claude Code. No installation, no extra CLI, no configuration files. It just works (with the caveats of no git isolation and no merge handling).

Community and ecosystem. Gastown has 10.8k stars and Steve Yegge's name recognition. Ruflo has 18.6k stars and 60+ agent types. Overstory is a smaller project with deeper architectural investment in the problems others ignore.

Scale experience. Anthropic stress-tested Agent Teams with 16 agents across ~2,000 sessions building a C compiler. Gastown reports running 20-30 agents. Overstory's default ceiling is 25 concurrent agents, but real-world scale reports are limited.

10. The Shared DNA with Gastown

Overstory and Gastown solve the same fundamental problem using remarkably similar primitives:

| Primitive | Overstory | Gastown |
|---|---|---|
| Agent isolation | `git worktree add` | `git worktree add` |
| Agent execution | Tmux sessions | Tmux sessions |
| Agent identity | `identity.yaml` in `.overstory/agents/{name}/` | Persistent identity via hooks |
| Issue tracking | Beads (`bd` CLI) | Beads (`bd` CLI), the same project |
| State persistence | SQLite (4 databases) | Dolt (Git for data) |
| Communication | SQLite mail | Mailboxes |
| Session grouping | Runs (`current-run.txt`) | Convoys |

The divergence is in philosophy. Gastown is multi-runtime (Claude, Codex, Gemini, Cursor) with Go infrastructure and Dolt for data versioning. Overstory is Claude-native with zero npm dependencies, purpose-built SQLite stores, and deeper investment in conflict resolution and health monitoring.

Gastown uses Formulas (embedded TOML workflow definitions) for repeatable processes. Overstory uses agent definition files (markdown) with a two-layer overlay system (base HOW + dynamic WHAT). Both achieve the same goal — encoding reusable workflow knowledge — through different mechanisms.

11. What's Missing

| Feature | Status |
|---|---|
| Multi-model support | No. Claude only. |
| Automatic cost ceiling / budget cap | No. `overstory costs --live` shows burn rate, but there's no automatic stop. |
| Web UI / dashboard | No. Terminal TUI only (`overstory dashboard`). |
| CI/CD integration | No. No GitHub Actions, no PR creation, no automated deployment. |
| Remote execution | No. Local tmux only. No SSH, no container orchestration. |
| Windows support | No. Requires tmux (macOS/Linux). |
| Automatic retry on agent crash | Partial. Watchdog detects crashes and can nudge. Session checkpoint/resume exists but is not fully automatic. |
| Nested team composition | Yes. 3-level hierarchy is core architecture. |
| Conflict history learning | Yes. Mulch records merge patterns, skips tiers that repeatedly fail. |

12. The Bottom Line

The Claude Code multi-agent space in early 2026 has a clear spectrum:

If you want zero setup: Agent Teams. It's built in, it works, and its limitations (no git isolation, no merge handling) may not matter for your use case.

If you want multi-model and scale: Gastown. It supports everything, scales to 30 agents, and has Steve Yegge's operational experience behind it. Pay the $100/hr and bring your own Dolt.

If you want architectural rigor: Overstory. It's the only system here with a real merge pipeline, ZFC-principled health monitoring, and zero npm runtime dependencies. It's Claude-only, and it's not trying to be everything for everyone.

If you want intelligent routing: Ruflo. 60+ agent types, self-learning, consensus algorithms. Whether the complexity pays for itself is an open question.

If you want multi-model code review: MCO. It's a different tool for a different job — parallel analysis across providers, not code generation orchestration.

Pick your orchestrator based on whether you need merge-conflict safety nets or model diversity. As of this writing, nobody offers both.