The Architecture Behind My Multi-Agent Autonomous Development Team

Katherine Cass • January 2026 • 15 min read

This is a companion post to my retrospective on what went wrong. Before I talk about the failures, I wanted to document how the system actually worked.

If you want the code, DM me. The repos are private because they're intermingled with personal stuff and I didn't do a security review before sharing. One of my goals for v2 is to make it shareable from day one.


What Is This?

An autonomous software development team built on Claude Code. Eleven specialized AI agents work 24/7 on different projects, coordinating via message bus, following industry best practices, and maintaining a structured roadmap.

Think of it as: A team of developers who never sleep, always follow TDD, coordinate via message passing, and self-organize around a shared backlog.



Why This System Exists

The Problem: Context Window Limits

Claude Code sessions have finite context windows. For complex projects requiring weeks of work, a single session can't maintain all necessary context. You have two choices:

  1. Front-load everything → Massive context injection → Less room for actual work
  2. Session per task → Lost continuity → Duplicate discovery

The Solution: Multi-Agent Coordination

Instead of one agent doing everything, specialized agents coordinate via message bus:

Each agent starts fresh, does focused work, externalizes state (NATS, roadmap, Git), and exits. Cron restarts them, or events trigger them on-demand.

Result: Infinite project timeline with finite per-session context.
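As a rough sketch of what "externalize state, then exit" looks like (hypothetical helper names; the real system persists to NATS, the roadmap, and Git rather than a local JSON file):

```python
import json
from pathlib import Path

# Hypothetical state directory; the real system externalizes to
# NATS, the roadmap, and Git instead of local JSON.
STATE_DIR = Path("orchestrator/state/sessions")

def save_session_state(agent: str, state: dict) -> Path:
    """Persist everything the next session needs before this one exits."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    path = STATE_DIR / f"{agent}.json"
    path.write_text(json.dumps(state, indent=2))
    return path

def load_session_state(agent: str) -> dict:
    """A fresh session starts by rehydrating externalized state."""
    path = STATE_DIR / f"{agent}.json"
    return json.loads(path.read_text()) if path.exists() else {}
```

Each agent run is then just load → work → save → exit, which is what keeps per-session context finite.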

Anthropic's research validates this: Their multi-agent research system (2025) showed that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on their internal research eval.

System Architecture

┌──────────────────────────────────────────────────────────────┐
│                       USER INTERFACES                        │
│    Slack Mobile │ Matrix │ Web Dashboard │ Interactive       │
└─────────────┬────────────────────────────────────┬───────────┘
              │                                    │
     ┌────────▼────────────────────────────────────▼──────┐
     │      INTEGRATION LAYER (Bridges & Triggers)        │
     │  • Slack-NATS Bridge (monitors channels)           │
     │  • Grace Trigger Daemon (event-driven, <10s)       │
     │  • Matrix Bridge (dual-platform messaging)         │
     └──────────────────┬─────────────────────────────────┘
                        │
     ┌──────────────────▼─────────────────────────────────┐
     │        NATS JetStream Message Bus (4222)           │
     │   Streams: #coordination, #watchdog, #errors       │
     │   Retention: 7 days / 10K messages                 │
     └──────────────────┬─────────────────────────────────┘
                        │
     ┌──────────────────▼─────────────────────────────────┐
     │               ORCHESTRATION LAYER                  │
     │  • run-agent.sh (prompt injection + launch)        │
     │  • Cron schedules (hourly, staggered)              │
     │  • Rate limiting (Claude Max subscription)         │
     │  • Resource limits (2GB RAM, 1 CPU per agent)      │
     └──────────────────┬─────────────────────────────────┘
                        │
     ┌──────────────────▼─────────────────────────────────┐
     │               AGENT TEAM (11 agents)               │
     │  LEADERSHIP:  Grace, Henry, Kemo                   │
     │  OPERATIONS:  Sophie, Bertha, Ollie, Ralph         │
     │  DEVELOPERS:  Nadia, Anette, Dorian, Ginny         │
     └────────────────────────────────────────────────────┘

Two entry points: Slack for async messages (drop a request and walk away) and Claude Code CLI for interactive work sessions. Both route to the same agent infrastructure.

Core Components

| Component | Location | Purpose |
|-----------|----------|---------|
| Orchestrator | orchestrator/ | Cron-based agent launcher, configuration, logging |
| Message Bus | mcp-agent-chat/ | NATS JetStream MCP server for agent coordination |
| Agent Prompts | orchestrator/agents/ | Individual behavior definitions and shared culture |
| Roadmap System | plans/ | Work package tracking with Vikunja integration |
| Skills & Hooks | .claude/ | Reusable slash commands and session lifecycle hooks |
| Web Dashboard | web-ui/ | Flask dashboard for monitoring agents, work packages, health |

The Team: 11 Specialized Agents

All named after pets. Easier to remember than TDD-Developer-Agent-3.

Each agent has a prompt file defining its behavior:

# orchestrator/agents/ollie_prompt.md (excerpt)
# Ollie - Autonomous Agent System Consultant

You are **Ollie**, the meta-level consultant responsible
for ensuring the agent system itself stays healthy,
follows industry best practices, and doesn't accumulate
anti-patterns or context bloat.

**You exist because:**
1. Prompt/instruction creep - Agent prompts grow over time
2. Anti-pattern accumulation - Without review, patterns drift
3. No industry feedback loop - The field evolves rapidly

Leadership & Coordination

| Agent | Schedule | Key Responsibilities |
|-------|----------|----------------------|
| Grace (Team Manager) | Event-driven (<10s latency) | Routes Slack/Matrix messages, coordinates team, manages pause/resume during rate limits |
| Henry (Project Manager) | Hourly (:00) | Owns roadmap, creates work packages, gardens completed items, triages user requests |
| Kemo (Business Analyst) | On-demand | Priority/ROI analysis, requirements refinement, helps Henry prioritize backlog |

Operations & Quality

| Agent | Schedule | Key Responsibilities |
|-------|----------|----------------------|
| Sophie (Watchdog) | Every 2 hours | Health monitoring (NATS, dashboard, agents), incident detection, auto-recovery |
| Bertha (Quality Engineer) | Daily (2 AM) | Test coverage tracking, CI enforcement, RCA participation, quality gate reviews |
| Ollie (System Consultant) | Every 2 days + weekly research | Context budget analysis, anti-pattern detection, industry research, design reviews |
| Ralph (UX Specialist) | Weekly + on-demand | UX research updates, design authority, UI/UX review |

Development Team

| Agent | Schedule | Key Responsibilities |
|-------|----------|----------------------|
| Nadia | Hourly (:10) | TDD development (write tests first), CI enforcement, claims work from any project |
| Anette | Hourly (:25) | TDD development (write tests first), CI enforcement, claims work from any project |
| Dorian | Hourly (:40) | TDD development (write tests first), CI enforcement, claims work from any project |
| Ginny | Hourly (:55) | Frontend/UI specialist, dashboard development, accessibility |

Note: Developers follow identical workflows - think of them as instances of the same TDD developer template, differentiated only by name for coordination.

The Crontab

Staggered scheduling prevents resource contention:

# Agent schedules (crontab)
0  * * * * ./orchestrator/run-agent.sh henry    # :00 - PM
10 * * * * ./orchestrator/run-agent.sh nadia    # :10 - Dev
25 * * * * ./orchestrator/run-agent.sh anette   # :25 - Dev
40 * * * * ./orchestrator/run-agent.sh dorian   # :40 - Dev
55 * * * * ./orchestrator/run-agent.sh ginny    # :55 - Frontend
0  */2 * * * ./orchestrator/run-agent.sh sophie # Every 2h - Watchdog

How Work Gets Done

1. User Makes a Request

Via Slack or Matrix:

User: "Add dark mode to the dashboard" [Posted to #colby-agent-work]

Via Dashboard: Navigate to the web UI, click "File Bug" or "New Feature"

Via Interactive Session: SSH to server, run ./orchestrator/run-agent.sh grace

2. Grace Routes the Request

Within 10 seconds, Grace:

  1. Acknowledges with reaction
  2. Classifies request (feature, bug, infrastructure, meta)
  3. Routes to appropriate handler:
    • Features → Henry to create work package
    • Bugs → Files in Vikunja, assigns to developer rotation
    • System questions → Ollie
    • Urgent incidents → Sophie
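Grace's routing step boils down to a small dispatch table. A minimal sketch, with the categories and handlers from above but the code itself assumed:

```python
# Hypothetical sketch of Grace's routing table; the categories and
# handler names come from the post, the dispatch code is mine.
ROUTING = {
    "feature": "henry",         # Henry creates a work package
    "bug": "vikunja",           # filed in Vikunja, developer rotation
    "infrastructure": "sophie",
    "meta": "ollie",            # system questions go to Ollie
}

def route_request(category: str, urgent: bool = False) -> str:
    """Return the handler for a classified request."""
    if urgent:
        return "sophie"  # urgent incidents bypass normal routing
    return ROUTING.get(category, "henry")  # default: PM triages
```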

3. Henry Creates Work Package

  1. Analyzes request for technical requirements
  2. Creates work package (WP) in roadmap: plans/active/batch-N.md
  3. Adds to Vikunja with labels, priority, acceptance criteria
  4. Posts to NATS #coordination: "WP-95.3 available: Dark mode dashboard"

4. Developer Claims Work

On next hourly run, Nadia/Anette/Dorian:

# 1. Check roadmap for available work
python orchestrator/roadmap_index.py available

# 2. Claim atomically (prevents race conditions)
./orchestrator/work_claims.py claim WP-95.3 Agent-Nadia
# Exit code 0 = success, proceed

# 3. Announce on NATS for team visibility
nats pub agent.chat.coordination \
  "Agent-Nadia claimed WP-95.3: Dark mode dashboard"

5. TDD Implementation

Anthropic Best Practice: "Test-Driven Development becomes even more powerful with agentic coding."

# 1. Write test FIRST (failing)
def test_dark_mode_toggle():
    response = client.get('/toggle-dark-mode')
    assert response.status_code == 200
    assert b'dark-mode-enabled' in response.data

# 2. Implement feature (make test pass)
@app.route('/toggle-dark-mode')
def toggle_dark_mode():
    session['dark_mode'] = not session.get('dark_mode', False)
    theme = 'dark-mode-enabled' if session['dark_mode'] else 'dark-mode-disabled'
    return jsonify({'status': 'success', 'theme': theme})

# 3. Verify test passes
pytest tests/test_dark_mode.py -v

6. CI Pipeline

Developer pushes to agent-specific branch:

git checkout nadia-work
git add -A
git commit -m "Add dark mode toggle

🤖 Generated with Claude Code"
git push origin nadia-work

GitHub Actions then runs the CI pipeline against the branch.

CRITICAL RULE: Work is NOT complete until CI passes. No exceptions.

7. Mark Work Complete

After CI passes and PR merges:

# 1. Complete in Vikunja
./orchestrator/vikunja_tasks.py complete WP-95.3 Agent-Nadia

# 2. Release atomic claim
./orchestrator/work_claims.py release WP-95.3 Agent-Nadia

# 3. Post completion to NATS
nats pub agent.chat.coordination "DONE: WP-95.3 Dark mode - CI PASSED"

# 4. Notify user in original thread
./orchestrator/notify_codified.py complete --wp WP-95.3

Key Features That Make This Work

1. Event-Driven Architecture (Grace <10s Response)

Problem: Cron-only agents check every hour → slow user response.

Solution: Grace Trigger Daemon monitors Slack in real-time:

# Simplified grace-trigger daemon
import time

while True:
    messages = slack_bridge.get_pending_messages()
    if messages:
        trigger_agent("grace", reason="New Slack message")
    time.sleep(5)  # Check every 5 seconds

Result: User posts → Grace responds within 10 seconds.

Fallback: Cron schedule still runs Grace hourly in case trigger daemon fails.

2. Atomic Work Claiming (Prevents Duplicate Work)

Problem: Two developers claim same WP → duplicate work, conflicts.

Solution: File-based atomic locking via work_claims.py:

# work_claims.py implements atomic test-and-set
import json
from datetime import datetime

def claim(wp_id, agent_name):
    lock_file = f"orchestrator/state/work_claims/{wp_id}.lock"
    try:
        # Atomic create (O_CREAT | O_EXCL) - raises if another agent
        # already holds the lock, so there is no check-then-write race
        with open(lock_file, 'x') as f:
            json.dump({
                'agent': agent_name,
                'claimed_at': datetime.now().isoformat()
            }, f)
    except FileExistsError:
        return 1  # Already claimed
    return 0  # Success

Claims expire after 60 minutes (handles agent crashes gracefully).
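A sketch of what that expiry could look like, assuming the lock-file JSON layout shown above (the reaper function and its name are mine, not the real implementation):

```python
import json
import os
from datetime import datetime, timedelta

# 60-minute TTL from the post; layout matches the lock-file JSON above.
CLAIM_TTL = timedelta(minutes=60)

def is_stale(lock_file, now=None):
    """True if the claim in lock_file is older than the TTL."""
    now = now or datetime.now()
    with open(lock_file) as f:
        claimed_at = datetime.fromisoformat(json.load(f)['claimed_at'])
    return now - claimed_at > CLAIM_TTL

def reap_stale_claims(claims_dir):
    """Delete expired locks so a crashed agent can't block work forever."""
    reaped = []
    for name in sorted(os.listdir(claims_dir)):
        path = os.path.join(claims_dir, name)
        if name.endswith('.lock') and is_stale(path):
            os.remove(path)
            reaped.append(name)
    return reaped
```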

3. Multi-Project Support

10+ active project domains:

| Project | Purpose |
|---------|---------|
| agent-automation | The agent system itself |
| Smarthome | Home automation (lights, vacuum, sensors) |
| trading-bot | Trading automation |
| health-app | Nutrient-focused meal planning |
| f3-sword-academy | HEMA club management |
| relic | Unity game development |
| personal-automation | Personal scripts and workflows |

Agents switch projects automatically when one has no available work:

# Check current project for work
python orchestrator/roadmap_index.py available
# Output: No available work

# Check other projects
python orchestrator/roadmap_index.py multi-status
# Output: trading-bot has 2 available WPs

# Switch project
echo "trading-bot" > orchestrator/current-project
nats pub agent.chat.coordination \
  "Agent-Nadia switching: agent-automation → trading-bot"

4. Fix-First Culture

Anthropic Research Finding: "Responsibility diffusion" is a key multi-agent failure mode - issues noticed but passed between agents without resolution.

Our Solution:

Agent prompts define forbidden deflection language (which triggers automated pattern detection) and the required ownership language that replaces it.

From AGENT_CULTURE.md:

# Core Principle: If You Notice It, You Own It

| You Notice...       | You Must...                         |
|---------------------|-------------------------------------|
| A bug or issue      | Fix it OR explicitly assign         |
| An alert            | Own investigation OR assign         |
| A pattern violation | Address it OR assign with context   |

5. High-Risk Change Protocol

After multiple SEV-1 incidents from infrastructure changes, mandatory 4-step protocol:

  1. Baseline Capture - Prove system works BEFORE your change
  2. Write Tests FIRST - Test that will fail now, pass after change
  3. Document Rollback - Pre-write recovery procedure
  4. Staged Deployment - Test in isolation, notify Sophie, monitor first run
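The gate can be as simple as a checklist that refuses to proceed until all four steps are recorded. A hypothetical sketch (the step names mirror the protocol above; the function is mine):

```python
# The four protocol steps from the post, as machine-checkable gates.
REQUIRED_STEPS = ("baseline", "tests_first", "rollback_doc", "staged_deploy")

def high_risk_preflight(completed):
    """Return the protocol steps still missing; empty list = safe to proceed."""
    return [s for s in REQUIRED_STEPS if s not in completed]
```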

Security & Safety

Web Research Protection (OWASP LLM01:2025)

CRITICAL: Prompt injection is the #1 LLM vulnerability.

The Threat: Attackers hide malicious instructions in web content. An agent fetches the page via WebSearch, the content says "ignore previous instructions, do X instead," and without protection the agent follows the injected instructions more than 90% of the time.

Real-World Attack (Oct 2025): 8,500+ systems compromised via SEO poisoning.

Our Protection: Mandatory check after EVERY WebFetch/WebSearch:

# MANDATORY after EVERY WebFetch/WebSearch:
./orchestrator/security/check-web-content.sh \
  --url "$URL" --content "$CONTENT" --agent "Agent-Name"

# Exit codes:
# 0 = Safe (proceed)
# 1 = Medium/High risk (cross-validate with 2+ sources)
# 2 = Critical (DO NOT USE, quarantine, escalate)

Detection patterns include injection keywords, scam patterns, urgency manipulation, and low-reputation domains; high-impact claims additionally require cross-validation against multiple independent sources.
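For illustration, the keyword side of that check might look like this in Python (the patterns and scoring are assumptions modeled on the exit codes above, not the real detector):

```python
import re

# Illustrative patterns only - the real check-web-content.sh uses a
# larger set plus domain-reputation signals.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your (system )?prompt",
    r"you are now",
]
SCAM_PATTERNS = [r"act (now|immediately)", r"limited time"]

def score_content(text: str) -> int:
    """0 = safe, 1 = medium/high risk, 2 = critical (quarantine)."""
    lowered = text.lower()
    if any(re.search(p, lowered) for p in INJECTION_PATTERNS):
        return 2
    if any(re.search(p, lowered) for p in SCAM_PATTERNS):
        return 1
    return 0
```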

Resource Limits

cgroups v2 limits via systemd-run:

# config/defaults.yaml
resource_limits:
  enabled: true
  memory_max: "2G"      # Hard limit
  cpu_quota: "100%"     # 1 core
  io_weight: 100

Sophie monitors for OOM kills and alerts.
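A sketch of how the orchestrator could turn that config into a systemd-run invocation (the -p properties are standard systemd resource-control settings; the wrapper function is hypothetical):

```python
# Build (but don't execute) the systemd-run command for one agent.
# MemoryMax, CPUQuota and IOWeight are documented systemd properties;
# the unit naming and this helper are assumptions.
def systemd_run_cmd(agent: str, limits: dict) -> list:
    cmd = ["systemd-run", "--scope", f"--unit=agent-{agent}"]
    cmd += ["-p", f"MemoryMax={limits['memory_max']}"]
    cmd += ["-p", f"CPUQuota={limits['cpu_quota']}"]
    cmd += ["-p", f"IOWeight={limits['io_weight']}"]
    cmd += ["./orchestrator/run-agent.sh", agent]
    return cmd
```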


Monitoring & Operations

Health Checks (Sophie Every 2 Hours)

| Component | Check | Alert Threshold |
|-----------|-------|-----------------|
| NATS | Stream connectivity, message flow | >5 consecutive failures |
| Dashboard | HTTP 200 on /health, /agents, /bugs | Any 404/500 |
| Agents | Recent activity in logs, no stuck processes | >4 hours idle |
| Resources | Disk usage, memory pressure, OOM kills | Disk >90% |
| Commitments | Promises made to user, deadlines | Overdue by >1 hour |
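The dashboard row of that table might be probed like this (a stdlib-only sketch; the endpoint paths are the ones listed above, the function is mine):

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def check_dashboard(base_url, paths=("/health", "/agents", "/bugs")):
    """Map each endpoint to its HTTP status (0 = unreachable)."""
    results = {}
    for path in paths:
        try:
            results[path] = urlopen(base_url + path, timeout=5).status
        except HTTPError as e:
            results[path] = e.code  # a 404/500 here should page Sophie
        except URLError:
            results[path] = 0       # dashboard down entirely
    return results
```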

Incident Response

| SEV | Definition | Response Time |
|-----|------------|---------------|
| SEV-0 | System down, all agents failing | Immediate |
| SEV-1 | Major feature broken | 1 hour |
| SEV-2 | Degraded functionality | 4 hours |

CRITICAL Verification Standard: "Verified" means checking the ACTUAL USER-FACING SYSTEM: send a test message to Slack and watch Grace respond in the channel; trigger the cron job and watch the agent complete and post to NATS. Not "service status shows active" or "logs show started messages."


Infrastructure

| Component | Technology |
|-----------|------------|
| Server | Home server (i7-6700K, 16GB RAM, Ubuntu 24.04) |
| Message Bus | NATS JetStream |
| Agent Runtime | Claude (via Claude Code CLI and Claude Max) |
| Slack Integration | Python service with Slack Bolt |
| CI/CD | GitHub Actions |
| Task Management | Vikunja (self-hosted) |
| Logging | SQLite (local) |
| Scheduling | Cron + systemd timers |

What This Produced

When the system was healthy:

The combination of roadmap-driven development, TDD discipline, and NATS coordination meant agents could work autonomously on well-defined tasks. When everything was running, I could drop a feature request in Slack, go outside, and come back to working code.

Of course, it didn't always work. The retrospective covers what broke.


Want the Code?

DM me on LinkedIn. The repos are private right now - too intermingled with personal infrastructure for public sharing. v2 will be designed for shareability from day one.


Next: Field Notes From an Eng Manager Building Her First Autonomous Agent System - what went wrong and why I started over.