
Overstory Architecture Deep Dive


TL;DR: Overstory turns a single Claude Code session into a coordinated swarm of 25 agents by parasitizing Claude Code's own hook system, isolating work in git worktrees, and resolving merge conflicts through a 4-tier escalation pipeline that goes from git merge to "ask an AI to rewrite it from scratch" — all with zero npm dependencies.


If you're evaluating multi-agent orchestration for Claude Code — or wondering how Overstory compares to Gastown, Agent Teams, Ruflo, and MCO — this is the architectural teardown.


1. What Is Overstory?

Overstory is a project-agnostic swarm system for Claude Code agent orchestration. There is no separate daemon. Your Claude Code session is the orchestrator. The system bootstraps itself through three mechanisms:

  1. CLAUDE.md — the project instruction file that Claude Code reads on startup
  2. Hooks — Claude Code's lifecycle event system (SessionStart, PreToolUse, PostToolUse, Stop)
  3. The overstory CLI — 29 commands for spawning agents, messaging, merging, and monitoring

The runtime is Bun with TypeScript. Zero npm runtime dependencies — every external interaction (git, tmux, Claude CLI) goes through Bun.spawn. All persistent state lives in SQLite databases using bun:sqlite with WAL mode for concurrent access across agents.

Tech Stack at a Glance

| Component | Implementation |
|---|---|
| Runtime | Bun (runs TypeScript directly, no build step) |
| Dependencies | Zero runtime. bun:sqlite, Bun.spawn, Bun.file only |
| Databases | 4 SQLite DBs: mail.db, sessions.db, events.db, metrics.db |
| Agent isolation | Git worktrees (one per agent) |
| Agent execution | Tmux sessions (one per agent, running claude --dangerously-skip-permissions) |
| Communication | Custom SQLite mail system (~1-5ms per query) |
| Issue tracking | Beads (bd CLI, git-backed JSONL) |
| Expertise | Mulch (mulch CLI, structured knowledge records) |

2. The Hierarchy

Overstory enforces a strict 3-level hierarchy with depth limits:

Orchestrator (your Claude Code session, depth 0)
├── Coordinator agent (depth 0, spawns leads only)
│   ├── Lead agent (depth 1, spawns workers)
│   │   ├── Scout (depth 2, read-only recon)
│   │   ├── Builder (depth 2, writes code)
│   │   ├── Reviewer (depth 2, read-only validation)
│   │   └── Merger (depth 2, branch integration)
│   └── Supervisor agent (depth 1, persistent per-project)
│       └── Workers (depth 2, same types as above)
└── Monitor agent (Tier 2, observer only, no worktree)

This is code-enforced. If the coordinator tries to spawn a builder directly, it gets a HierarchyError:

export function validateHierarchy(
  parentAgent: string | null,
  capability: string,
  name: string,
  _depth: number,
  forceHierarchy: boolean,
): void {
  if (forceHierarchy) return;
  if (parentAgent === null && capability !== "lead") {
    throw new HierarchyError(
      `Coordinator cannot spawn "${capability}" directly. Only "lead" is allowed without --parent.`,
      { agentName: name, requestedCapability: capability },
    );
  }
}

The lead agent enforces a mandatory 3-phase workflow: Scout → Build → Review. Skipping scouts is a named failure mode (SCOUT_SKIP). Every builder must have a corresponding reviewer. Non-overlapping file scopes are enforced — two builders cannot own the same file.
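The non-overlapping file scope rule is easy to picture as a pre-dispatch check. Here is a minimal sketch of that idea; the `BuilderScope` shape and the `findScopeOverlaps` helper are hypothetical names for illustration and are not taken from the Overstory source.

```typescript
// Hypothetical shape: each builder declares the files it owns.
interface BuilderScope {
  name: string;
  files: string[];
}

// Returns every file claimed by more than one builder, keyed by file path,
// with the names of the builders that claim it.
function findScopeOverlaps(scopes: BuilderScope[]): Record<string, string[]> {
  const owners: Record<string, string[]> = {};
  for (const scope of scopes) {
    for (const file of scope.files) {
      const list = owners[file] ?? [];
      // Ignore a builder listing the same file twice.
      if (list.indexOf(scope.name) === -1) list.push(scope.name);
      owners[file] = list;
    }
  }
  const overlaps: Record<string, string[]> = {};
  for (const file of Object.keys(owners)) {
    if (owners[file].length > 1) overlaps[file] = owners[file];
  }
  return overlaps;
}

// Usage: an empty result means the scopes are disjoint.
const overlapExample = findScopeOverlaps([
  { name: "auth-builder", files: ["src/auth.ts", "src/auth.test.ts"] },
  { name: "route-builder", files: ["src/routes.ts", "src/auth.ts"] },
]);
```

A lead could refuse to dispatch builders while the returned object is non-empty. Whether Overstory checks this at dispatch time or elsewhere is not something this sketch claims.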

3. How Agent Spawning Works

The overstory sling command is the critical path. It orchestrates a 14-step spawn sequence:

overstory sling <task-id> --name auth-builder --capability builder --parent auth-lead --depth 2 --files src/auth.ts,src/auth.test.ts

Here's what happens (traced from src/commands/sling.ts):

  1. Load config: reads .overstory/config.yaml, resolves project root (handles worktree detection)
  2. Validate depth: reject if depth > config.agents.maxDepth (default 2)
  3. Validate hierarchy: coordinator can only spawn leads
  4. Load agent manifest: validates capability exists in agent-manifest.json
  5. Resolve run ID: reads/creates current-run.txt for session grouping
  6. Check concurrency: active agents vs maxConcurrent (default 25)
  7. Check name uniqueness: no duplicate active agent names
  8. Enforce stagger delay: minimum gap between spawns (default 2s)
  9. Validate bead: task must be open or in_progress
  10. Create git worktree: git worktree add -b overstory/{name}/{taskId} .overstory/worktrees/{name} main
  11. Generate overlay CLAUDE.md: Layer 1 (role HOW) + Layer 2 (task WHAT), including pre-fetched mulch expertise
  12. Deploy hooks: capability-specific guards to the worktree's .claude/settings.local.json
  13. Create tmux session: tmux new-session -d -s overstory-{project}-{name} -c {worktreePath} "claude --model {model} --dangerously-skip-permissions"
  14. Send startup beacon: structured message via tmux send-keys after 3s initialization delay
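The cheap validation steps in that sequence (depth, hierarchy, concurrency, name uniqueness, stagger delay) can be pictured as one pure pre-flight function. This is a sketch under assumptions: the type names, the `preflight` function, and the choice to return a wait time for the stagger delay rather than reject are all hypothetical. Only the default values (maxDepth 2, maxConcurrent 25, 2s stagger) come from the list above.

```typescript
interface SpawnRequest {
  name: string;
  capability: string;
  parent: string | null;
  depth: number;
}

interface FleetState {
  activeNames: string[];        // names of currently active agents
  lastSpawnAtMs: number | null; // timestamp of the previous spawn, if any
}

interface Limits {
  maxDepth: number;
  maxConcurrent: number;
  staggerDelayMs: number;
}

// Defaults quoted in the spawn sequence.
const DEFAULT_LIMITS: Limits = { maxDepth: 2, maxConcurrent: 25, staggerDelayMs: 2000 };

type Preflight = { ok: true; waitMs: number } | { ok: false; reason: string };

function preflight(
  req: SpawnRequest,
  fleet: FleetState,
  nowMs: number,
  limits: Limits = DEFAULT_LIMITS,
): Preflight {
  if (req.depth > limits.maxDepth) {
    return { ok: false, reason: `depth ${req.depth} exceeds maxDepth ${limits.maxDepth}` };
  }
  if (req.parent === null && req.capability !== "lead") {
    return { ok: false, reason: `only "lead" may be spawned without a parent` };
  }
  if (fleet.activeNames.length >= limits.maxConcurrent) {
    return { ok: false, reason: `maxConcurrent ${limits.maxConcurrent} reached` };
  }
  if (fleet.activeNames.indexOf(req.name) !== -1) {
    return { ok: false, reason: `agent name "${req.name}" is already active` };
  }
  // Assumption: the stagger delay is honored by waiting, not by rejecting.
  const elapsed = fleet.lastSpawnAtMs === null ? Infinity : nowMs - fleet.lastSpawnAtMs;
  return { ok: true, waitMs: Math.max(0, limits.staggerDelayMs - elapsed) };
}
```

Keeping these checks free of I/O means they can run before any worktree or tmux session exists, so a rejected spawn leaves nothing to clean up.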

The tmux session creation in src/worktree/tmux.ts handles PATH injection so hooks can find the overstory binary:

export async function createSession(
  name: string, cwd: string, command: string,
  env?: Record<string, string>,
): Promise<number> {
  const overstoryBinDir = await detectOverstoryBinDir();
  const exports: string[] = [];
  if (overstoryBinDir) {
    exports.push(`export PATH="${overstoryBinDir}:$PATH"`);
  }
  if (env) {
    for (const [key, value] of Object.entries(env)) {
      exports.push(`export ${key}="${value}"`);
    }
  }
  const wrappedCommand = exports.length > 0
    ? `${exports.join(" && ")} && ${command}` : command;

  const { exitCode, stderr } = await runCommand(
    ["tmux", "new-session", "-d", "-s", name, "-c", cwd, wrappedCommand], cwd,
  );
  // ... PID retrieval via tmux list-panes ...
}

4. The Messaging System: SQLite Mail

Agents communicate through a custom SQLite mail system in .overstory/mail.db. Not email. Not HTTP. Not Redis. A single SQLite database in WAL mode with prepared statements and ~1-5ms query latency.

The schema (src/mail/store.ts):

CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  from_agent TEXT NOT NULL,
  to_agent TEXT NOT NULL,
  subject TEXT NOT NULL,
  body TEXT NOT NULL,
  type TEXT NOT NULL DEFAULT 'status'
    CHECK(type IN ('status','question','result','error',
      'worker_done','merge_ready','merged','merge_failed',
      'escalation','health_check','dispatch','assign')),
  priority TEXT NOT NULL DEFAULT 'normal'
    CHECK(priority IN ('low','normal','high','urgent')),
  thread_id TEXT,
  payload TEXT,
  read INTEGER NOT NULL DEFAULT 0,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_inbox ON messages(to_agent, read);

There are 4 semantic types (status, question, result, error) and 8 protocol types (worker_done, merge_ready, merged, merge_failed, escalation, health_check, dispatch, assign). Protocol types carry structured JSON payloads — for example, worker_done includes { beadId, branch, exitCode, filesModified }.
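Because the payload column is plain TEXT, a receiver has to validate the JSON before trusting it. Below is a minimal sketch of parsing a worker_done payload. The field names come from the example above; the `WorkerDonePayload` interface, the field types, and the `parseWorkerDone` helper are assumptions for illustration.

```typescript
// Field names from the worker_done example; the types are assumed.
interface WorkerDonePayload {
  beadId: string;
  branch: string;
  exitCode: number;
  filesModified: string[];
}

// Returns the typed payload, or null if the column is empty or malformed.
function parseWorkerDone(raw: string | null): WorkerDonePayload | null {
  if (raw === null) return null;
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const beadId = record["beadId"];
  const branch = record["branch"];
  const exitCode = record["exitCode"];
  const files = record["filesModified"];
  if (typeof beadId !== "string" || typeof branch !== "string") return null;
  if (typeof exitCode !== "number") return null;
  if (!Array.isArray(files)) return null;
  const filesModified: string[] = [];
  for (const f of files) {
    if (typeof f !== "string") return null;
    filesModified.push(f);
  }
  return { beadId, branch, exitCode, filesModified };
}
```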

Mail delivery is hook-driven. The UserPromptSubmit and PostToolUse hooks run overstory mail check --inject --agent {name}, which queries unread messages, marks them read, and formats them as human-readable text injected into the agent's context:

You have 2 new messages:

--- From: auth-lead [HIGH] (question) ---
Subject: File scope clarification
Are you handling the middleware auth or just the route handler?
[Reply with: overstory mail reply msg-abc123 --body "..."]

This is the parasitic genius of the system — Claude Code's own hook infrastructure becomes the agent's mailbox polling loop. No separate daemon needed.
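To make the injection step concrete, here is a sketch of a formatter that produces text in the shape of the sample above. The `MailMessage` interface and `formatInbox` function are hypothetical, and the singular "message" for a one-item inbox is an assumption. The real command also marks messages as read, which this pure function leaves out.

```typescript
interface MailMessage {
  id: string;
  from: string;
  subject: string;
  body: string;
  type: string;
  priority: "low" | "normal" | "high" | "urgent";
}

// Formats unread messages as the text block a hook would inject into context.
function formatInbox(messages: MailMessage[]): string {
  if (messages.length === 0) return "";
  const plural = messages.length === 1 ? "" : "s";
  const header = `You have ${messages.length} new message${plural}:`;
  const blocks = messages.map((m) =>
    [
      `--- From: ${m.from} [${m.priority.toUpperCase()}] (${m.type}) ---`,
      `Subject: ${m.subject}`,
      m.body,
      `[Reply with: overstory mail reply ${m.id} --body "..."]`,
    ].join("\n"),
  );
  // Blank line between the header and each message block.
  return [header, ...blocks].join("\n\n");
}
```

Returning an empty string for an empty inbox keeps the hook silent when there is nothing to deliver, so polling on every tool call does not add noise to the agent's context.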

5. The Merge Pipeline: 4-Tier Escalation

When an agent finishes work, its branch needs to be integrated. Overstory uses a FIFO merge queue (merge-queue.db, a SQLite database separate from the four listed in the tech stack) and a 4-tier conflict resolution pipeline in src/merge/resolver.ts:

| Tier | Strategy | How It Works |
|---|---|---|
| 1: Clean merge | git merge --no-edit | If no conflicts, done. Cheapest possible outcome. |
| 2: Auto-resolve | Keep incoming changes | Parse conflict markers, replace each block with the agent's version. |
| 3: AI-resolve | claude --print per file | Send conflicted file content to Claude with "output ONLY the resolved content, NO explanation." Validates output isn't prose. |
| 4: Re-imagine | Full rewrite | Abort merge. Show Claude both canonical and branch versions. "Reimplement the changes from the branch version onto the canonical version." |
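Tier 2 is the most mechanical of the four, so it is worth sketching. The function below walks a conflicted file line by line and keeps only the incoming side of each conflict block, meaning the lines between `=======` and `>>>>>>>`. The `keepIncoming` name and the decision to return null on malformed markers (so a caller can escalate to the next tier) are assumptions, not the resolver's actual code.

```typescript
// Resolves standard git conflict markers by keeping the incoming side.
// Returns null if the markers are unbalanced.
function keepIncoming(content: string): string | null {
  const out: string[] = [];
  let state: "normal" | "ours" | "theirs" = "normal";
  for (const line of content.split("\n")) {
    if (line.indexOf("<<<<<<<") === 0) {
      if (state !== "normal") return null; // nested or repeated opener
      state = "ours";
    } else if (line === "=======" && state === "ours") {
      state = "theirs";
    } else if (line.indexOf(">>>>>>>") === 0) {
      if (state !== "theirs") return null; // closer without a matching block
      state = "normal";
    } else if (state !== "ours") {
      // Outside a conflict, or inside the incoming side: keep the line.
      out.push(line);
    }
  }
  return state === "normal" ? out.join("\n") : null;
}
```

Dropping the canonical side wholesale is only safe when the agent's version is a superset of what was there, which is presumably why two more tiers sit behind this one.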

The prose detection in Tier 3 (looksLikeProse()) catches when Claude forgets it's supposed to output raw code:

export function looksLikeProse(text: string): boolean {
  const prosePatterns = [
    /^(I |I'[a-z]+ |Here |Here's |The |This |Let me |Sure|Unfortunately)/i,
    /^(To resolve|Looking at|Based on|After reviewing|The conflict)/i,
    /^```/m, // Markdown fencing — the model wrapped the code
    /I need permission/i,
  ];
  for (const pattern of prosePatterns) {
    if (pattern.test(text.trim())) return true;
  }
  return false;
}

The system also queries mulch for historical conflict patterns. If a tier has failed at least twice for the same files and has never succeeded on them, it is skipped entirely. Past successful resolutions are fed into the AI prompt for context. This means the merge pipeline literally learns from its own failures.
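The skip rule reduces to a small predicate over per-tier history. In this sketch the tier labels, the `TierHistory` shape, and the `planTiers` helper are hypothetical. The only thing taken from the description is the rule itself: two or more failures with zero successes means skip. Applying the rule uniformly to every tier, including the clean merge, is an assumption.

```typescript
type Tier = "clean-merge" | "auto-resolve" | "ai-resolve" | "reimagine";

// Hypothetical summary of what was recorded for one tier on a given set of files.
interface TierHistory {
  failures: number;
  successes: number;
}

const ALL_TIERS: Tier[] = ["clean-merge", "auto-resolve", "ai-resolve", "reimagine"];

// Skip a tier that has failed at least twice and never succeeded.
function shouldSkipTier(history: TierHistory | undefined): boolean {
  return history !== undefined && history.failures >= 2 && history.successes === 0;
}

// Returns the tiers to attempt, in escalation order.
function planTiers(history: Partial<Record<Tier, TierHistory>>): Tier[] {
  return ALL_TIERS.filter((tier) => !shouldSkipTier(history[tier]));
}

// Usage: auto-resolve is skipped, ai-resolve stays because it succeeded once.
const tierPlanExample = planTiers({
  "auto-resolve": { failures: 2, successes: 0 },
  "ai-resolve": { failures: 3, successes: 1 },
});
```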

6. The Watchdog: 4-Tier Health Monitoring

Agent health monitoring runs across four tiers:

| Tier | Name | What It Does |
|---|---|---|
| 0 | Mechanical daemon | Process monitoring via setInterval. Checks tmux liveness, PID liveness, lastActivity timestamps. |
| 1 | AI triage | Ephemeral claude --print call to classify failure as retry, terminate, or extend. |
| 2 | Monitor agent | Persistent Claude Code session doing continuous patrol across all agents. |
| 3 | Supervisor | Per-project oversight (built into supervisor agent role). |

The Tier 0 daemon implements the ZFC principle (Zero Failure Crash) from src/watchdog/health.ts: observable state always wins over recorded state.

Signal priority (highest to lowest):
  1. tmux session liveness  — Is the tmux session actually running?
  2. Process liveness (pid) — Is the Claude Code process still alive?
  3. Recorded state         — What does sessions.db claim?

When signals conflict:
  - tmux dead + sessions.db says "working" → zombie immediately
  - tmux alive + sessions.db says "zombie" → investigate, don't auto-kill
  - pid dead + tmux alive → zombie (agent exited, shell survived)
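The three conflict rules above fit in one small decision function. This is a sketch of the priority ordering only: the `reconcile` name, the `Verdict` shape, and the action labels are hypothetical, and terminal states such as completed are deliberately left out.

```typescript
// Non-terminal recorded states only; "completed" is out of scope for this sketch.
type RecordedState = "booting" | "working" | "stalled" | "zombie";

interface Verdict {
  action: "mark-zombie" | "investigate" | "trust-recorded";
  reason: string;
}

// Observable signals are consulted in priority order before the recorded state.
function reconcile(tmuxAlive: boolean, pidAlive: boolean, recorded: RecordedState): Verdict {
  if (!tmuxAlive) {
    return { action: "mark-zombie", reason: "tmux session is gone" };
  }
  if (!pidAlive) {
    return { action: "mark-zombie", reason: "agent process exited, shell survived" };
  }
  if (recorded === "zombie") {
    // Everything observable looks alive, so the record is suspect. Do not auto-kill.
    return { action: "investigate", reason: "recorded zombie but session and process are alive" };
  }
  return { action: "trust-recorded", reason: "observable state agrees with the record" };
}
```

The ordering of the `if` statements is the whole point: the recorded state is only read once both observable signals have been checked.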

Progressive escalation for stalled agents follows a 4-level nudge protocol:

Level 0 (warn):      Log warning, no action
Level 1 (nudge):     Send tmux nudge: "appears stalled, please check"
Level 2 (escalate):  Invoke Tier 1 AI triage (if enabled)
Level 3 (terminate): Kill tmux session + descendant process tree

Process tree cleanup in src/worktree/tmux.ts walks descendant PIDs recursively using pgrep -P, sends SIGTERM deepest-first, waits out a 2-second grace period, then sends SIGKILL to any survivors. This prevents orphaned git, bun test, and biome processes from accumulating.
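The ordering is the interesting part of that cleanup. A post-order walk guarantees every child is signaled before its parent, which is one way to get the deepest-first behavior described. The sketch below takes the parent-to-children map as plain data (what repeated pgrep -P calls would yield) so it stays free of process calls. The `killOrder` name and the choice to include the root PID last are assumptions.

```typescript
// children maps a PID to its direct child PIDs.
// Returns PIDs in an order where every descendant precedes its ancestor.
function killOrder(root: number, children: Record<number, number[]>): number[] {
  const order: number[] = [];
  const visit = (pid: number): void => {
    for (const child of children[pid] ?? []) {
      visit(child);
    }
    order.push(pid); // pushed only after all of its descendants
  };
  visit(root);
  return order;
}

// Usage: PID 1 has children 2 and 3, and PID 2 has child 4.
const killOrderExample = killOrder(1, { 1: [2, 3], 2: [4] });
```

Signaling leaves first avoids a window where a parent dies and its children are re-parented to init before they can be found.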

7. Hooks: The Nervous System

The hooks template (templates/hooks.json.tmpl) wires Overstory into Claude Code's lifecycle:

| Hook | What It Does |
|---|---|
| SessionStart | overstory prime --agent {name}: loads config, recent activity, mulch expertise |
| UserPromptSubmit | overstory mail check --inject: polls inbox, injects unread messages |
| PreToolUse (all) | overstory log tool-start: records tool invocation to events.db |
| PreToolUse (Bash) | Blocks git push with a decision: block response |
| PostToolUse | overstory log tool-end: records completion + mail check --inject with debounce |
| Stop | overstory log session-end + mulch learn: captures session insights |
| PreCompact | overstory prime --compact: re-injects context before context window compaction |

The git push block is enforced at the hook level for all agents — no agent can push to remote. All integration goes through the merge pipeline. This is a hard architectural constraint, not a suggestion in a markdown file.
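A guard like this boils down to inspecting the Bash command before it runs and answering with a block decision. The sketch below is a heuristic illustration, not the deployed hook: the `guardBashCommand` name, the `reason` field, and the regex are assumptions, and simple pattern matching will not catch every way of invoking a push (for example, through an alias or `git -C dir push`).

```typescript
// Shape loosely mirrors the "decision: block" response described above.
interface HookDecision {
  decision: "block";
  reason: string;
}

// Returns a block decision if any segment of the command line starts with
// "git push", or null to let the command through.
function guardBashCommand(command: string): HookDecision | null {
  // Split on common shell separators so "cd repo && git push" is also caught.
  const segments = command.split(/&&|\|\||;|\||\n/);
  for (const segment of segments) {
    if (/^\s*git\s+push\b/.test(segment)) {
      return {
        decision: "block",
        reason: "git push is disabled for agents; integrate through the merge pipeline",
      };
    }
  }
  return null;
}
```

Enforcing this in a hook rather than in prompt text means the agent cannot talk itself out of the rule.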

8. Observability Stack

Four SQLite databases provide full-fleet observability:

| Database | Stores | Query Commands |
|---|---|---|
| sessions.db | Agent sessions (state machine: booting -> working -> completed/stalled -> zombie), runs | overstory status, overstory run |
| events.db | Tool invocations, session lifecycle, errors, custom events | overstory trace, overstory errors, overstory replay |
| mail.db | Inter-agent messages with threading and priority | overstory mail list |
| metrics.db | Token usage, cost estimates per session | overstory costs |

The overstory dashboard command provides a live TUI polling every 2 seconds. overstory replay interleaves events across agents chronologically for post-mortem analysis. overstory costs --live shows real-time token burn rates for active agents.

9. Comparison: Overstory vs the Field

Here's how Overstory stacks up against the other major Claude Code orchestration systems as of March 2026:

| Dimension | Overstory | Gastown | Agent Teams (Native) | MCO | Ruflo |
|---|---|---|---|---|---|
| Builder | js0n | Steve Yegge | Anthropic | mco-org | Reuven Cohen |
| Architecture | CLAUDE.md + hooks + CLI | Mayor + Polecats + hooks | Lead + Teammates (flat) | Fan-out/wait-all | Queen-led swarms (layered) |
| Max agents | 25 (configurable) | 20-30 | 3-5 recommended | Per-provider | 60+ types |
| Hierarchy depth | 3 levels (enforced) | Hierarchical (Mayor -> Polecats) | 2 levels (lead + teammates) | Flat (parallel) | Queen -> workers |
| Git isolation | Git worktrees (auto) | Git worktrees (auto) | None (manual) | None | Not documented |
| Merge handling | 4-tier escalation (clean->auto->AI->reimagine) | Implicit (non-overlapping tasks) | None (overwrites) | N/A (read-only) | Not documented |
| Communication | SQLite mail (~1-5ms) | Mailboxes + convoys | Direct messaging + shared tasks | Indirect via aggregation | Consensus protocols (Raft, BFT) |
| Health monitoring | 4-tier (daemon->AI triage->monitor agent->supervisor) | Feed + problems view | Hooks (TeammateIdle) | doctor command | 12 background workers |
| Multi-model | No (Claude only) | Yes (Claude, Codex, Gemini) | No (Claude only) | Yes (any CLI agent) | Yes (Claude, GPT, Gemini, Ollama) |
| Runtime deps | Zero | Go + Dolt + beads | Zero (built-in) | npm (adapter packages) | MCP + various |
| Expertise system | Mulch (structured records) | Hooks (git-backed persistence) | None | None | RuVector (self-learning) |
| Maturity | v0.5.7, actively developed | Early (10.8k stars) | Experimental (Feb 2026) | Stable CLI | v3.5 (post-alpha, 18.6k stars) |

Where Overstory Wins

Merge conflict resolution. Nobody else has a 4-tier pipeline. Agent Teams warns about overwrites. Gastown uses implicit avoidance. Ruflo doesn't document it. Overstory will literally spawn an AI to rewrite your changes from scratch if three other tiers fail.

Structured health monitoring. The ZFC principle (observable state beats recorded state) with progressive escalation from "log a warning" to "kill the process tree" is more sophisticated than any competitor's health system.

Zero dependencies. No Go compiler. No Dolt. No npm packages. Bun runs the TypeScript directly. Everything external (git, tmux, claude) is invoked via Bun.spawn.

Expertise accumulation. Mulch records (conventions, patterns, failures, decisions) persist across sessions and get injected into agent context at spawn time. Agents literally learn from previous runs.

Where Overstory Loses

Multi-model support. Claude only. Gastown and Ruflo support Codex, Gemini, and others. If you want model diversity, Overstory is not the answer.

First-party support. Agent Teams is built into Claude Code. No installation, no extra CLI, no configuration files. It just works (with the caveats of no git isolation and no merge handling).

Community and ecosystem. Gastown has 10.8k stars and Steve Yegge's name recognition. Ruflo has 18.6k stars and 60+ agent types. Overstory is a smaller project with deeper architectural investment in the problems others ignore.

Scale experience. Anthropic stress-tested Agent Teams with 16 agents across ~2,000 sessions building a C compiler. Gastown reports running 20-30 agents. Overstory's default ceiling is 25 concurrent agents, but real-world scale reports are limited.

10. The Shared DNA with Gastown

Overstory and Gastown solve the same fundamental problem using remarkably similar primitives:

| Primitive | Overstory | Gastown |
|---|---|---|
| Agent isolation | git worktree add | git worktree add |
| Agent execution | Tmux sessions | Tmux sessions |
| Agent identity | identity.yaml in .overstory/agents/{name}/ | Persistent identity via hooks |
| Issue tracking | Beads (bd CLI) | Beads (beads CLI), same project |
| State persistence | SQLite (4 databases) | Dolt (Git for data) |
| Communication | SQLite mail | Mailboxes |
| Session grouping | Runs (current-run.txt) | Convoys |

The divergence is in philosophy. Gastown is multi-runtime (Claude, Codex, Gemini, Cursor) with Go infrastructure and Dolt for data versioning. Overstory is Claude-native with zero dependencies, purpose-built SQLite stores, and deeper investment in conflict resolution and health monitoring.

Gastown uses Formulas (embedded TOML workflow definitions) for repeatable processes. Overstory uses agent definition files (markdown) with a two-layer overlay system (base HOW + dynamic WHAT). Both achieve the same goal — encoding reusable workflow knowledge — through different mechanisms.

11. What's Missing

| Feature | Status |
|---|---|
| Multi-model support | No. Claude only. |
| Automatic cost ceiling / budget cap | No. overstory costs --live shows burn rate, but there's no automatic stop. |
| Web UI / dashboard | No. Terminal TUI only (overstory dashboard). |
| CI/CD integration | No. No GitHub Actions, no PR creation, no automated deployment. |
| Remote execution | No. Local tmux only. No SSH, no container orchestration. |
| Windows support | No. Requires tmux (macOS/Linux). |
| Automatic retry on agent crash | Partial. Watchdog detects crashes and can nudge. Session checkpoint/resume exists but is not fully automatic. |
| Nested team composition | Yes. 3-level hierarchy is core architecture. |
| Conflict history learning | Yes. Mulch records merge patterns, skips tiers that repeatedly fail. |

12. The Bottom Line

The Claude Code multi-agent space in early 2026 has a clear spectrum:

If you want zero setup: Agent Teams. It's built in, it works, and its limitations (no git isolation, no merge handling) may not matter for your use case.

If you want multi-model and scale: Gastown. It supports everything, scales to 30 agents, and has Steve Yegge's operational experience behind it. Pay the $100/hr and bring your own Dolt.

If you want architectural rigor: Overstory. It's the only system with a real merge pipeline, ZFC-principled health monitoring, and zero runtime dependencies. It's Claude-only, and it's not trying to be everything for everyone.

If you want intelligent routing: Ruflo. 60+ agent types, self-learning, consensus algorithms. Whether the complexity pays for itself is an open question.

If you want multi-model code review: MCO. It's a different tool for a different job — parallel analysis across providers, not code generation orchestration.

Pick your orchestrator based on whether you need a real merge-conflict pipeline or model diversity. As of today, nobody offers both.

