   1  ---
   2  title: Multi-Agent System
   3  category: automation
   4  last_verified: 2026-02-26
   5  related_files:
   6    - src/agents/base-agent.js
   7    - src/agents/utils/task-manager.js
   8    - src/agents/workflows/bug-fix.js
   9    - src/agents/workflows/feature.js
  10    - src/agents/workflows/refactor.js
  11    - src/cron/sonnet-overseer.js
  12    - src/cli/agent-manager.js
  13  tags: [agents, automation, workflows, testing]
  14  status: active
  15  replaces: docs/AGENTS.md
  16  ---
  17  
  18  # Multi-Agent System Guide
  19  
  20  ## Table of Contents
  21  
  22  - [Overview](#overview)
  23  - [Architecture](#architecture)
  24  - [Getting Started](#getting-started)
  25  - [Agents](#agents)
  26  - [Task Routing](#task-routing)
  27  - [Workflows](#workflows)
  28  - [CLI Commands](#cli-commands)
  29  - [Horizontal Scaling](#horizontal-scaling)
  30  - [Configuration](#configuration)
  31  - [Safety Features](#safety-features)
  32  - [Cost Management](#cost-management)
  33  - [Communication Patterns](#communication-patterns)
  34  - [Testing](#testing)
  35  - [Troubleshooting](#troubleshooting)
  36  - [Best Practices](#best-practices)
  37  - [Known Gaps & Industry Standards](#known-gaps--industry-standards)
  38  - [Future Enhancements](#future-enhancements)
  39  
  40  ---
  41  
  42  ## Overview
  43  
  44  The 333 Method uses a database-driven multi-agent system where specialized AI agents collaborate autonomously to handle development, testing, security, and architecture tasks.
  45  
  46  ### Benefits
  47  
  48  - **Token efficiency**: 75-85% reduction vs monolithic approach (20-25KB per invocation vs 100-150KB)
  49  - **Specialization**: Each agent has focused responsibilities and optimized context
  50  - **Peer review**: Built-in workflows ensure quality through agent collaboration
  51  - **Autonomy**: Agents work continuously via cron scheduling
  52  - **Audit trail**: Complete tracking of all agent actions and decisions
  53  
  54  ### How It Works
  55  
  56  1. **Monitor Agent** scans logs every 5 minutes and creates tasks for detected issues
  57  2. **Triage Agent** classifies errors and routes tasks to appropriate agents
  58  3. **Developer Agent** fixes bugs and implements features
  59  4. **QA Agent** verifies fixes and enforces test coverage gates
  60  5. **Security Agent** performs security reviews and compliance checks
  61  6. **Architect Agent** reviews designs and maintains documentation freshness
  62  
  63  Agents communicate through a database-driven message queue, creating a collaborative workflow where each agent builds on others' work.
  64  
  65  ---
  66  
  67  ## Architecture
  68  
  69  ### Core Components
  70  
  71  #### 1. Database Tables (Migrations 041 and 051)
  72  
  73  **agent_tasks** - Task queue with priority and status tracking
  74  
  75  ```sql
  76  CREATE TABLE agent_tasks (
  77    id INTEGER PRIMARY KEY AUTOINCREMENT,
  78    task_type TEXT NOT NULL,
  79    assigned_to TEXT NOT NULL,
  80    status TEXT NOT NULL,
  81    priority INTEGER DEFAULT 5,
  82    parent_task_id INTEGER,
  83    context_json TEXT,
  84    result_json TEXT,
  85    retry_count INTEGER DEFAULT 0,
  86    reviewed_by TEXT,
  87    approval_json TEXT,
  88    created_at TEXT DEFAULT CURRENT_TIMESTAMP
  89  );
  90  ```
  91  
  92  **agent_messages** - Inter-agent communication
  93  
  94  ```sql
  95  CREATE TABLE agent_messages (
  96    id INTEGER PRIMARY KEY AUTOINCREMENT,
  97    task_id INTEGER NOT NULL,
  98    from_agent TEXT NOT NULL,
  99    to_agent TEXT NOT NULL,
 100    message_type TEXT NOT NULL,
 101    message_text TEXT,
 102    metadata_json TEXT,
 103    created_at TEXT DEFAULT CURRENT_TIMESTAMP
 104  );
 105  ```
 106  
 107  **agent_logs** - Execution audit trail
 108  
 109  ```sql
 110  CREATE TABLE agent_logs (
 111    id INTEGER PRIMARY KEY AUTOINCREMENT,
 112    task_id INTEGER,
 113    agent_name TEXT NOT NULL,
 114    level TEXT NOT NULL,
 115    message TEXT NOT NULL,
 116    metadata_json TEXT,
 117    created_at TEXT DEFAULT CURRENT_TIMESTAMP
 118  );
 119  ```
 120  
 121  **agent_state** - Agent status and metrics
 122  
 123  ```sql
 124  CREATE TABLE agent_state (
 125    agent_name TEXT PRIMARY KEY,
 126    status TEXT NOT NULL,
 127    current_task_id INTEGER,
 128    last_active DATETIME DEFAULT CURRENT_TIMESTAMP,
 129    metrics_json TEXT
 130  );
 131  ```
 132  
 133  **agent_outcomes** - Task outcomes for learning (Migration 052)
 134  
 135  ```sql
 136  CREATE TABLE agent_outcomes (
 137    id INTEGER PRIMARY KEY AUTOINCREMENT,
 138    task_id INTEGER NOT NULL REFERENCES agent_tasks(id) ON DELETE CASCADE,
 139    agent_name TEXT NOT NULL,
 140    task_type TEXT NOT NULL,
 141    outcome TEXT NOT NULL CHECK(outcome IN ('success', 'failure')),
 142    context_json TEXT,  -- Task-specific context (error_type, file_path, etc.)
 143    result_json TEXT,   -- Task result details (what worked, what didn't)
 144    duration_ms INTEGER,
 145    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
 146  );
 147  ```
 148  
 149  This table enables **task history and learning** - agents learn from past successes and failures to improve future performance. See [docs/agents/task-history.md](../agents/task-history.md) for details.
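
As a rough illustration of how outcome rows could be turned into something an agent can learn from, the sketch below aggregates past outcomes per task type. The row fields follow the `agent_outcomes` columns above; `summarizeOutcomes` is a hypothetical helper and is not taken from `context-builder.js`.

```javascript
// Hypothetical helper: aggregate agent_outcomes rows into per-task-type stats.
// Row fields (task_type, outcome, duration_ms) follow the table definition above.
function summarizeOutcomes(rows) {
  const stats = {};
  for (const row of rows) {
    const s = (stats[row.task_type] ??= { success: 0, failure: 0, totalMs: 0 });
    s[row.outcome] += 1; // outcome is constrained to 'success' | 'failure'
    s.totalMs += row.duration_ms ?? 0;
  }
  for (const s of Object.values(stats)) {
    const total = s.success + s.failure;
    s.successRate = total === 0 ? 0 : s.success / total;
    s.avgDurationMs = total === 0 ? 0 : s.totalMs / total;
  }
  return stats;
}

const sampleRows = [
  { task_type: 'fix_bug', outcome: 'success', duration_ms: 1200 },
  { task_type: 'fix_bug', outcome: 'failure', duration_ms: 800 },
  { task_type: 'write_test', outcome: 'success', duration_ms: 500 },
];
console.log(summarizeOutcomes(sampleRows));
```

A summary like this could be appended to a task's context so that an agent sees which task types have historically failed or run long.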
 150  
 151  #### 2. Context Files
 152  
 153  Each agent loads a base context (~15KB) plus role-specific context:
 154  
 155  | Agent     | Context Files          | Total Size |
 156  | --------- | ---------------------- | ---------- |
 157  | Monitor   | base.md + monitor.md   | 20KB       |
 158  | Triage    | base.md + triage.md    | 23.5KB     |
 159  | Developer | base.md + developer.md | 21.3KB     |
 160  | QA        | base.md + qa.md        | 23KB       |
 161  | Security  | base.md + security.md  | 21KB       |
 162  | Architect | base.md + architect.md | 25KB       |
 163  
 164  **Location:** `src/agents/contexts/`
 165  
 166  #### 3. Agent Framework
 167  
 168  **BaseAgent class** (`src/agents/base-agent.js`)
 169  
 170  - Task polling and execution
 171  - Message sending/receiving
 172  - Logging and error handling
 173  - Circuit breaker integration
 174  
 175  **Utility modules:**
 176  
 177  - `context-loader.js` - Merges context files
 178  - `context-builder.js` - Enriches context with task history for learning
 179  - `task-manager.js` - CRUD operations for tasks
 180  - `message-manager.js` - Inter-agent messaging
 181  
 182  ### Workflow States
 183  
 184  Tasks progress through these states:
 185  
 186  ```
 187  pending -> running -> completed
 188                |
 189           awaiting_po_approval -> approved -> pending
 190                |
 191           awaiting_architect_approval -> approved -> pending
 192                |
 193              failed
 194                |
 195             blocked
 196  ```
 197  
 198  **State Descriptions:**
 199  
 200  - `pending` - Ready to work on
 201  - `running` - Currently being processed by an agent
 202  - `awaiting_po_approval` - Design proposal waiting for Product Owner sign-off
 203  - `awaiting_architect_approval` - Implementation plan waiting for technical review
 204  - `completed` - Successfully finished
 205  - `failed` - Failed after 3 retry attempts
 206  - `blocked` - Blocked on external dependency or human action
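
The states above can be read as a small state machine. The following is a minimal sketch, assuming the transitions shown in the diagram are the only legal ones; the `TRANSITIONS` table and `canTransition` are illustrative names, not part of the actual framework, and rejection or retry paths are left out because the diagram does not show them.

```javascript
// Assumed transition table, read off the state diagram above.
const TRANSITIONS = {
  pending: ['running'],
  running: [
    'completed',
    'awaiting_po_approval',
    'awaiting_architect_approval',
    'failed',
    'blocked',
  ],
  awaiting_po_approval: ['pending'],        // approved -> back to pending
  awaiting_architect_approval: ['pending'], // approved -> back to pending
  completed: [],
  failed: [],
  blocked: [],
};

// Returns true only when `to` is listed as a successor of `from`.
function canTransition(from, to) {
  const next = Object.hasOwn(TRANSITIONS, from) ? TRANSITIONS[from] : [];
  return next.includes(to);
}

console.log(canTransition('pending', 'running'));
console.log(canTransition('completed', 'running'));
```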
 207  
 208  ### Approval System
 209  
 210  **Product Owner Approval** (for significant changes):
 211  
 212  - Required for: Breaking changes, schema migrations, features >4 hours effort
 213  - Workflow: Architect creates design proposal -> PO reviews -> Approve/Reject
 214  - CLI: `npm run agent:approve -- --task-id X --decision approved --reviewer "Jason"`
 215  
 216  **Architect Approval** (for all implementation plans):
 217  
 218  - Required for: All implementation plans, refactorings, performance optimizations
 219  - Review criteria: Files won't exceed 150 lines, test coverage >=85%, documentation updated
 220  - Workflow: Developer creates plan -> Architect reviews -> Approve/Reject
 221  
 222  **Database Schema** (migration 051):
 223  
 224  ```sql
 225  -- agent_tasks additions
 226  reviewed_by TEXT              -- Who approved the task
 227  approval_json TEXT            -- {decision, reviewer, timestamp, notes, conditions}
 228  status CHECK(..., 'awaiting_po_approval', 'awaiting_architect_approval')
 229  ```
 230  
 231  ---
 232  
 233  ## Getting Started
 234  
 235  ### Prerequisites
 236  
 237  1. Database initialized with agent tables (migrations 041 and 051)
 238  2. Environment variable set: `AGENT_SYSTEM_ENABLED=true`
 239  3. Cron system enabled to run agents every 5 minutes
 240  
 241  ### Quick Start
 242  
 243  #### 1. Enable the Agent System
 244  
 245  ```bash
 246  # Add to .env
 247  echo "AGENT_SYSTEM_ENABLED=true" >> .env
 248  ```
 249  
 250  #### 2. Bootstrap the Monitor Agent
 251  
 252  The Monitor agent needs an initial task to start its self-scheduling loop:
 253  
 254  ```bash
 255  npm run agent:create -- --agent monitor --task scan_logs --context '{"incremental":true}' --priority 5
 256  ```
 257  
 258  #### 3. Verify Agents Are Running
 259  
 260  ```bash
 261  # Check agent status
 262  npm run agent:list
 263  
 264  # View pending tasks
 265  npm run agent:tasks
 266  
 267  # View recent logs
 268  npm run agent:logs -- --level info
 269  ```
 270  
 271  #### 4. Trigger a Test Workflow
 272  
 273  ```bash
 274  # Test bug fix workflow
 275  npm run agent:workflow -- --workflow bug-fix --error "Test error for verification" --stage scoring
 276  
 277  # Check workflow status
 278  npm run agent:tasks
 279  ```
 280  
 281  ---
 282  
 283  ## Agents
 284  
 285  ### 1. Monitor Agent
 286  
 287  **Role:** The system's immune system - proactive detection of issues
 288  
 289  **Responsibilities:**
 290  
 291  - Scan log files for ERROR/FATAL patterns every 5 minutes
 292  - Detect looping errors (same error >3x in 1 hour)
 293  - Monitor stale tasks (pending >1 hour)
 294  - Verify process compliance (expected stage transitions)
 295  - Track agent health (success/failure ratios)
 296  - Check documentation drift daily
 297  
 298  **Task Types:**
 299  
 300  - `scan_logs` - Incremental log scanning (self-scheduling)
 301  - `check_agent_health` - Monitor agent success rates
 302  - `check_process_compliance` - Verify workflow adherence
 303  - `check_doc_freshness` - Detect stale documentation
 304  
 305  **Context Size:** 20KB (base.md + monitor.md)
 306  
 307  **Self-Scheduling:** Creates new `scan_logs` task after each completion
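
To make the looping-error rule (same error more than 3 times in 1 hour) concrete, here is a minimal sketch. The entry shape `{ message, timestamp }` with millisecond timestamps and the function name `findLoopingErrors` are assumptions for illustration; the real Monitor agent may group errors differently.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Returns the error messages seen more than `threshold` times within the
// window ending at `now`. Entry shape { message, timestamp } is assumed.
function findLoopingErrors(entries, now, threshold = 3, windowMs = ONE_HOUR_MS) {
  const counts = new Map();
  for (const { message, timestamp } of entries) {
    const age = now - timestamp;
    if (age < 0 || age > windowMs) continue; // outside the window
    counts.set(message, (counts.get(message) ?? 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > threshold)
    .map(([message]) => message);
}

const now = Date.now();
const entries = [0, 1, 2, 3].map((i) => ({
  message: 'SQLITE_BUSY: database is locked',
  timestamp: now - i * 60_000,
}));
console.log(findLoopingErrors(entries, now));
```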
 308  
 309  **Example:**
 310  
 311  ```bash
 312  # View Monitor status
 313  npm run agent:list | grep monitor
 314  
 315  # View Monitor logs
 316  npm run agent:logs -- --agent-name monitor
 317  ```
 318  
 319  ### 2. Triage Agent
 320  
 321  **Role:** Error classifier and task router
 322  
 323  **Responsibilities:**
 324  
 325  - Classify errors by type (null_pointer, network, database_constraint, api_error, security, configuration)
 326  - Determine severity (critical, high, medium, low)
 327  - Calculate priority (1-10 scale based on severity + impact)
 328  - Route tasks to appropriate agents
 329  - Suggest initial fix approaches
 330  
 331  **Task Types:**
 332  
 333  - `classify_error` - Analyze error and create appropriate task
 334  
 335  **Context Size:** 23.5KB (base.md + triage.md)
 336  
 337  **Routing Logic:**
 338  
 339  - Security errors -> Security Agent (priority 10)
 340  - Network errors -> Developer Agent
 341  - Database/API errors -> Developer Agent
 342  - Complex architectural issues -> Architect Agent
 343  - Configuration errors -> Developer Agent
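
The routing rules above amount to a lookup from error class to agent. A minimal sketch, assuming the class names from the Responsibilities list; `ROUTES`, `routeError`, and the fallback to the Developer agent for unlisted classes are illustrative assumptions rather than the Triage agent's actual code.

```javascript
// Assumed mapping from Triage error classes to agents, per the list above.
const ROUTES = {
  security: { agent: 'security', priority: 10 },
  network: { agent: 'developer' },
  database_constraint: { agent: 'developer' },
  api_error: { agent: 'developer' },
  configuration: { agent: 'developer' },
  architectural: { agent: 'architect' },
};

// defaultPriority mirrors the agent_tasks column default of 5.
function routeError(errorType, defaultPriority = 5) {
  const route = Object.hasOwn(ROUTES, errorType)
    ? ROUTES[errorType]
    : { agent: 'developer' }; // fallback for unlisted classes is an assumption
  return {
    assignedTo: route.agent,
    priority: route.priority ?? defaultPriority,
  };
}

console.log(routeError('security'));
console.log(routeError('network', 7));
```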
 344  
 345  **Example:**
 346  
 347  ```bash
 348  # Manually trigger triage
 349  npm run agent:create -- --agent triage --task classify_error --context '{"error":"TypeError: Cannot read property score of null","file":"src/score.js"}' --priority 7
 350  ```
 351  
 352  ### 3. Developer Agent
 353  
 354  **Role:** Bug fixes and feature implementation
 355  
 356  **Responsibilities:**
 357  
 358  - Analyze error messages and stack traces
 359  - Extract affected file paths
 360  - Generate bug fixes
 361  - Implement new features
 362  - **CRITICAL:** Enforce 85%+ code coverage before commits
 363  - Create git commits (only if coverage gate passes)
 364  - Hand off to QA for verification
 365  
 366  **Task Types:**
 367  
 368  - `fix_bug` - Analyze and fix bugs
 369  - `implement_feature` - Build new features
 370  - `implementation_plan` - Create detailed implementation plan
 371  
 372  **Context Size:** 21.3KB (base.md + developer.md)
 373  
 374  **Coverage Gate:**
 375  Developer enforces 85%+ coverage BEFORE creating commits:
 376  
 377  1. Make code changes
 378  2. Run `checkCoverageBeforeCommit(files, taskId)`
 379  3. If coverage <85%: Attempt automatic test generation
 380  4. If auto-fix fails: Escalate to Architect for guidance
 381  5. Only commit if coverage >=85%
 382  
 383  **Coverage Escalation:**
 384  
 385  When coverage <85% and auto-fix fails, Developer asks Architect for guidance:
 386  
 387  - Option A: Refactor code for better testability
 388  - Option B: Accept lower coverage with technical debt justification (requires human approval)
 389  - Option C: Provide manual test guidance for complex uncovered branches
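
The gate and escalation steps above can be sketched as a small decision function. This is not the implementation of `checkCoverageBeforeCommit`; `coverageGate`, `measureCoverage`, and `generateTests` are hypothetical names, with the last two injected as stand-ins for the real coverage run and automatic test generation.

```javascript
const COVERAGE_THRESHOLD = 85;

// `measureCoverage` returns a coverage percentage; `generateTests` attempts
// automatic test generation. Both are injected stand-ins (assumptions).
function coverageGate({ measureCoverage, generateTests }) {
  let pct = measureCoverage();
  if (pct >= COVERAGE_THRESHOLD) return { action: 'commit', coverage: pct };

  generateTests(); // step 3: attempt automatic test generation
  pct = measureCoverage();
  if (pct >= COVERAGE_THRESHOLD) return { action: 'commit', coverage: pct };

  // step 4: auto-fix failed, so ask the Architect for guidance
  return { action: 'escalate_to_architect', coverage: pct };
}

// Usage with canned coverage readings: 70% first, 90% after generating tests.
const readings = [70, 90];
console.log(
  coverageGate({
    measureCoverage: () => readings.shift(),
    generateTests: () => {},
  })
);
```

Injecting the two functions keeps the decision logic testable without running a real test suite.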
 390  
 391  **Workflow Example:**
 392  
 393  ```
 394  1. Receive fix_bug task from Triage
 395  2. Analyze error and identify affected files
 396  3. Generate fix
 397  4. Run coverage check
 398  5. If coverage passes: Create commit
 399  6. Create verify_fix task for QA
 400  7. Send handoff message to QA
 401  ```
 402  
 403  **Example:**
 404  
 405  ```bash
 406  # View Developer tasks
 407  npm run agent:tasks -- --assigned-to developer
 408  
 409  # Trigger bug fix
 410  npm run agent:workflow -- --workflow bug-fix --error "..." --file src/score.js
 411  ```
 412  
 413  ### 4. QA Agent
 414  
 415  **Role:** Test generation, verification, coverage enforcement
 416  
 417  **Responsibilities:**
 418  
 419  - Generate unit tests for new features
 420  - Verify bug fixes work correctly
 421  - Enforce 80%+ coverage gate (HARD BLOCK on task completion)
 422  - Run test suite and parse coverage reports
 423  - Create feedback for developers on failures
 424  - Tag regression tests
 425  
 426  **Task Types:**
 427  
 428  - `write_test` - Generate unit test
 429  - `verify_fix` - Verify bug fix works
 430  - `check_coverage` - Ensure 80%+ coverage
 431  - `write_missing_tests` - Fill coverage gaps
 432  
 433  **Context Size:** 23KB (base.md + qa.md)
 434  
 435  **Coverage Gate:**
 436  QA enforces 80%+ coverage AFTER commits as a second safety layer:
 437  
 438  1. Receive verify_fix task
 439  2. Run tests for changed files
 440  3. Check coverage with c8
 441  4. If <80%: Create write_missing_tests task, block parent task
 442  5. If >=80%: Mark task complete
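
A minimal sketch of the QA decision in steps 3 to 5, assuming coverage is read from a parsed `coverage-summary.json` (the istanbul `json-summary` report format, which c8 can emit) and that line coverage is the metric being gated. `qaCoverageDecision` and the shape of its return value are hypothetical.

```javascript
const QA_THRESHOLD = 80;

// `summary` is the parsed coverage-summary.json; `verifyTaskId` is the id of
// the verify_fix task being processed.
function qaCoverageDecision(summary, verifyTaskId) {
  const pct = summary.total.lines.pct;
  if (pct >= QA_THRESHOLD) {
    return { pass: true, blockParent: false, followUp: null };
  }
  return {
    pass: false,
    blockParent: true,
    // Follow-up task that fills the coverage gap
    followUp: {
      task_type: 'write_missing_tests',
      assigned_to: 'qa',
      parent_task_id: verifyTaskId,
      context_json: JSON.stringify({ coverage: pct, required: QA_THRESHOLD }),
    },
  };
}

console.log(qaCoverageDecision({ total: { lines: { pct: 72.4 } } }, 42));
```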
 443  
 444  **Example:**
 445  
 446  ```bash
 447  # View QA tasks
 448  npm run agent:tasks -- --assigned-to qa
 449  
 450  # Check recent verifications
 451  npm run agent:logs -- --agent-name qa --level info
 452  ```
 453  
 454  ### 5. Security Agent
 455  
 456  **Role:** Security audits, compliance, vulnerability scanning
 457  
 458  **Responsibilities:**
 459  
 460  - Code security reviews (SQL injection, XSS, command injection)
 461  - Dependency vulnerability scanning (`npm audit`)
 462  - Secrets detection (hardcoded keys, credentials)
 463  - TCPA/CAN-SPAM/GDPR compliance validation
 464  - Track vulnerability remediation time
 465  
 466  **Task Types:**
 467  
 468  - `audit_code` - Security code review
 469  - `scan_dependencies` - Check for vulnerable dependencies
 470  - `compliance_check` - Validate TCPA/CAN-SPAM adherence
 471  - `scan_secrets` - Detect exposed credentials
 472  
 473  **Context Size:** 21KB (base.md + security.md)
 474  
 475  **Example:**
 476  
 477  ```bash
 478  # Trigger security audit
 479  npm run agent:create -- --agent security --task audit_code --context '{"files":["src/outreach/sms.js"]}' --priority 8
 480  
 481  # View security findings
 482  npm run agent:logs -- --agent-name security --level error
 483  ```
 484  
 485  ### 6. Architect Agent
 486  
 487  **Role:** Design review, refactoring, documentation freshness
 488  
 489  **Responsibilities:**
 490  
 491  - Design reviews for new features
 492  - Refactoring suggestions based on complexity analysis
 493  - Code complexity monitoring (max 150 lines per file, cyclomatic complexity of at most 15)
 494  - Documentation freshness checks
 495  - Schema change validation
 496  - Create Architecture Decision Records (ADRs)
 497  
 498  **Task Types:**
 499  
 500  - `design_proposal` - Create design document for significant changes
 501  - `technical_review` - Review implementation plans
 502  - `suggest_refactor` - Recommend refactoring
 503  - `update_documentation` - Fix stale docs
 504  - `review_design` - Evaluate feature designs
 505  
 506  **Context Size:** 25KB (base.md + architect.md)
 507  
 508  **Documentation Freshness Checks:**
 509  On every commit, Architect verifies:
 510  
 511  - New env vars -> `.env.example` updated?
 512  - New npm scripts -> `README.md` updated?
 513  - New modules -> `CLAUDE.md` updated?
 514  - Schema changes -> `db/schema.sql` + migration?
 515  - Features done -> `docs/TODO.md` updated?
 516  
 517  **Example:**
 518  
 519  ```bash
 520  # Request design review
 521  npm run agent:create -- --agent architect --task design_proposal --context '{"feature":"Dark mode toggle","requirements":["Settings UI","Persistence","Global theme"]}' --priority 6
 522  
 523  # View pending reviews
 524  npm run agent:tasks -- --assigned-to architect --status awaiting_po_approval
 525  ```
 526  
 527  ---
 528  
 529  ## Task Routing
 530  
 531  The agent system uses a centralized task routing configuration to ensure tasks are always assigned to the correct agent.
 532  
 533  ### Routing Configuration
 534  
 535  **Location:** `src/agents/utils/task-routing.js`
 536  
 537  This module provides:
 538  
 539  - `TASK_ROUTING` - Complete mapping of task types to agents
 540  - `getAgentForTaskType(taskType)` - Get correct agent for a task type
 541  - `validateTaskAssignment(taskType, assignedTo)` - Validate task is correctly routed
 542  - `getTaskTypesForAgent(agentName)` - Get all task types an agent handles
 543  
 544  ### Complete Task Type Reference
 545  
 546  | Task Type                       | Agent     | Description                                    |
 547  | ------------------------------- | --------- | ---------------------------------------------- |
 548  | **Developer Tasks**             |           |                                                |
 549  | `fix_bug`                       | developer | Fix bugs identified by Triage                  |
 550  | `implement_feature`             | developer | Implement new features after design approval   |
 551  | `refactor_code`                 | developer | Refactor complex or problematic code           |
 552  | `apply_feedback`                | developer | Address feedback from other agents             |
 553  | `implementation_plan`           | developer | Create detailed implementation plan            |
 554  | **QA Tasks**                    |           |                                                |
 555  | `write_test`                    | qa        | Generate unit tests for code                   |
 556  | `verify_fix`                    | qa        | Verify bug fix works correctly                 |
 557  | `check_coverage`                | qa        | Check test coverage meets 80%+ requirement     |
 558  | `run_tests`                     | qa        | Run test suite for files                       |
 559  | **Security Tasks**              |           |                                                |
 560  | `audit_code`                    | security  | Security code review (SQL injection, XSS, etc) |
 561  | `scan_dependencies`             | security  | Check for vulnerable dependencies              |
 562  | `verify_compliance`             | security  | Validate TCPA/CAN-SPAM/GDPR compliance         |
 563  | `scan_secrets`                  | security  | Detect exposed credentials                     |
 564  | `threat_model`                  | security  | STRIDE threat modeling for component           |
 565  | `fix_security_issue`            | security  | Auto-fix security vulnerabilities              |
 566  | `review_dependency_update`      | security  | Review dependency updates for security         |
 567  | **Architect Tasks**             |           |                                                |
 568  | `design_proposal`               | architect | Create design proposal for features            |
 569  | `technical_review`              | architect | Review implementation plan for soundness       |
 570  | `review_design`                 | architect | Review design against principles               |
 571  | `suggest_refactor`              | architect | Suggest refactoring for complex code           |
 572  | `update_documentation`          | architect | Update documentation with Claude API           |
 573  | `check_documentation_freshness` | architect | Check for stale documentation                  |
 574  | `check_complexity`              | architect | Check code complexity metrics                  |
 575  | `audit_documentation`           | architect | Verify documentation matches reality           |
 576  | `check_branch_health`           | architect | Check for stale branches                       |
 577  | `profile_performance`           | architect | Profile pipeline performance                   |
 578  | `review_documentation`          | architect | Review documentation accuracy                  |
 579  | **Triage Tasks**                |           |                                                |
 580  | `classify_error`                | triage    | Classify error and route to agent              |
 581  | `route_task`                    | triage    | Route generic task to agent                    |
 582  | `prioritize_tasks`              | triage    | Prioritize pending tasks                       |
 583  | **Monitor Tasks**               |           |                                                |
 584  | `scan_logs`                     | monitor   | Scan logs for errors (self-scheduling)         |
 585  | `check_agent_health`            | monitor   | Monitor agent success rates                    |
 586  | `check_process_compliance`      | monitor   | Verify workflow adherence                      |
 587  | `detect_anomaly`                | monitor   | Detect anomalous behavior                      |
 588  | `check_pipeline_health`         | monitor   | Check pipeline for blockages                   |
 589  | `check_slo_compliance`          | monitor   | Check SLO compliance metrics                   |
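
For orientation, the table above could be encoded roughly as follows. This is an abbreviated sketch: the function names come from the Routing Configuration list, but the object shape, the return values, and the subset of entries are assumptions, and the real module in `src/agents/utils/task-routing.js` may differ.

```javascript
// Abbreviated, illustrative subset of the task-type-to-agent mapping.
const TASK_ROUTING = {
  fix_bug: 'developer',
  implement_feature: 'developer',
  verify_fix: 'qa',
  audit_code: 'security',
  design_proposal: 'architect',
  classify_error: 'triage',
  scan_logs: 'monitor',
};

// Returns the agent name for a task type, or null when it is unknown.
function getAgentForTaskType(taskType) {
  return Object.hasOwn(TASK_ROUTING, taskType) ? TASK_ROUTING[taskType] : null;
}

// Reports whether a task is assigned to the agent the mapping expects.
function validateTaskAssignment(taskType, assignedTo) {
  const expected = getAgentForTaskType(taskType);
  return { valid: expected !== null && expected === assignedTo, expected };
}

// Lists every task type the given agent handles.
function getTaskTypesForAgent(agentName) {
  return Object.keys(TASK_ROUTING).filter(
    (taskType) => TASK_ROUTING[taskType] === agentName
  );
}

console.log(validateTaskAssignment('implement_feature', 'monitor'));
console.log(getTaskTypesForAgent('developer'));
```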
 590  
 591  ### Auto-Delegation
 592  
 593  When an agent receives a task type it doesn't handle, it automatically delegates to the correct agent using `BaseAgent.delegateToCorrectAgent()`:
 594  
 595  **Example:** If `implement_feature` is mistakenly assigned to `monitor`:
 596  
 597  1. Monitor calls `delegateToCorrectAgent(task)`
 598  2. Creates new task assigned to `developer`
 599  3. Completes original task with delegation note
 600  4. Logs routing correction for analysis
 601  
 602  This prevents "Unknown task type" errors and ensures no tasks are lost due to misrouting.
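
The delegation steps can be sketched as a pure function that returns the two database changes involved. This is illustrative only: the real `BaseAgent.delegateToCorrectAgent()` is a method whose signature is not shown here, and linking the new task back through `parent_task_id` is an assumption.

```javascript
// Illustrative only: returns the new task to insert and the update to apply
// to the misrouted original. `routing` maps task types to agent names.
function delegateToCorrectAgent(task, routing) {
  const correctAgent = Object.hasOwn(routing, task.task_type)
    ? routing[task.task_type]
    : null;
  if (!correctAgent) {
    throw new Error(`Unknown task type: ${task.task_type}`);
  }
  return {
    // New task for the agent that actually handles this task type
    newTask: {
      task_type: task.task_type,
      assigned_to: correctAgent,
      status: 'pending',
      priority: task.priority,
      parent_task_id: task.id, // assumption: keeps the audit trail linked
      context_json: task.context_json,
    },
    // The misrouted original is closed out with a delegation note
    originalUpdate: {
      id: task.id,
      status: 'completed',
      result_json: JSON.stringify({
        delegated_to: correctAgent,
        delegated_from: task.assigned_to,
      }),
    },
  };
}

const misrouted = {
  id: 7,
  task_type: 'implement_feature',
  assigned_to: 'monitor',
  priority: 5,
  context_json: '{}',
};
console.log(delegateToCorrectAgent(misrouted, { implement_feature: 'developer' }));
```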
 603  
 604  ### Common Routing Errors Fixed
 605  
 606  **Before (Errors):**
 607  
 608  - `implement_feature` -> monitor, triage, qa, security, architect (wrong)
 609  - `fix_bug` -> architect (wrong)
 610  - `review_documentation` -> unknown (wrong)
 611  - `review_dependency_update` -> unknown (wrong)
 612  
 613  **After (Correct Routing):**
 614  
 615  - `implement_feature` -> developer
 616  - `fix_bug` -> developer
 617  - `review_documentation` -> architect
 618  - `review_dependency_update` -> security
 619  
 620  ### Testing
 621  
 622  Run task routing tests:
 623  
 624  ```bash
 625  node --test tests/agents/task-routing.test.js
 626  ```
 627  
 628  This validates all task types are correctly mapped and delegation works properly.
 629  
 630  ---
 631  
 632  ## Workflows
 633  
 634  ### Standard Workflow Types
 635  
 636  #### 1. Feature Implementation (Significant)
 637  
 638  Used for breaking changes, database migrations, or features requiring more than 4 hours of effort.
 639  
 640  ```
 641  Product Request
 642     |
 643  Architect: design_proposal
 644     |
 645  Status: awaiting_po_approval
 646     |
 647  PO Reviews -> Approves/Rejects
 648     | (approved)
 649  Developer: implementation_plan
 650     |
 651  Status: awaiting_architect_approval
 652     |
 653  Architect: technical_review -> Approves/Rejects
 654     | (approved)
 655  Developer: implement_feature
 656     |
 657  QA: verify_fix
 658     |
 659  Security: audit_code (if needed)
 660  ```
 661  
 662  **Example:**
 663  
 664  ```bash
 665  npm run agent:workflow -- --workflow feature --description "Add two-factor authentication" --requirements '["SMS OTP","Email backup codes","Recovery process"]'
 666  
 667  # View approval queue
 668  npm run agent:approvals -- --status awaiting_po_approval
 669  
 670  # Approve design
 671  npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision approved
 672  ```
 673  
 674  #### 2. Feature Implementation (Minor)
 675  
 676  Used for small features (<=4 hours of effort) with no breaking changes or migrations.
 677  
 678  ```
 679  Product Request
 680     |
 681  Architect: design_proposal (auto-approved)
 682     |
 683  Developer: implementation_plan
 684     |
 685  Architect: technical_review
 686     |
 687  Developer: implement_feature
 688     |
 689  QA: verify_fix
 690  ```
 691  
 692  **Example:**
 693  
 694  ```bash
 695  npm run agent:workflow -- --workflow feature --description "Add logging to enrich stage" --requirements '["Log contact count","Log errors"]'
 696  ```
 697  
 698  #### 3. Bug Fix (Architectural)
 699  
 700  For bugs affecting multiple modules or requiring schema changes.
 701  
 702  ```
 703  Error Detected
 704     |
 705  Triage: classify_error -> architectural
 706     |
 707  Architect: design_proposal
 708     |
 709  Status: awaiting_po_approval
 710     |
 711  PO Approves
 712     |
 713  Developer: implementation_plan
 714     |
 715  Architect: technical_review
 716     |
 717  Developer: fix_bug
 718     |
 719  QA: verify_fix
 720  ```
 721  
 722  #### 4. Bug Fix (Standard)
 723  
 724  For isolated bugs in a single file with low complexity.
 725  
 726  ```
 727  Error Detected
 728     |
 729  Triage: classify_error -> simple
 730     |
 731  Developer: fix_bug
 732     |
 733  QA: verify_fix
 734  ```
 735  
 736  **Example:**
 737  
 738  ```bash
 739  npm run agent:workflow -- --workflow bug-fix --error "TypeError: Cannot read property 'score' of null" --file src/score.js --stack "..."
 740  ```
 741  
 742  #### 5. Refactor Workflow
 743  
 744  For code complexity reduction or architectural improvements.
 745  
 746  ```
 747  Complexity Detected
 748     |
 749  Architect: design_proposal
 750     |
 751  Developer: implementation_plan
 752     |
 753  Architect: technical_review
 754     |
 755  Developer: implement refactoring
 756     |
 757  QA: verify_fix (ensure no regressions)
 758  ```
 759  
 760  **Example:**
 761  
 762  ```bash
 763  npm run agent:workflow -- --workflow refactor --file src/utils/stealth-browser.js --reason "Cyclomatic complexity exceeds 15"
 764  ```
 765  
 766  ### Validation Rules
 767  
 768  All tasks validate workflow dependencies before creation:
 769  
 770  1. **implement_feature** requires approved `design_proposal` parent
 771  2. **Developer implementation** requires approved `implementation_plan`
 772  3. **QA verification** requires completed Developer task
 773  4. **Parent tasks** must be completed before children start
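
One way rules 1 and 2 could be checked is by walking the `parent_task_id` chain and looking for approved ancestors. The sketch below is an assumption about how approval is recorded (via `approval_json.decision`) and treats `approved_with_conditions` as an approval; all function names are hypothetical.

```javascript
// A task counts as approved when its approval_json carries an approving decision.
function isApproved(task) {
  if (!task.approval_json) return false;
  const { decision } = JSON.parse(task.approval_json);
  return decision === 'approved' || decision === 'approved_with_conditions';
}

// Walk up parent_task_id links looking for an approved task of `type`.
// `tasksById` is a Map of id -> task row; `seen` guards against cycles.
function hasApprovedAncestor(task, tasksById, type) {
  const seen = new Set();
  let id = task.parent_task_id;
  while (id != null && !seen.has(id)) {
    seen.add(id);
    const parent = tasksById.get(id);
    if (!parent) return false;
    if (parent.task_type === type && isApproved(parent)) return true;
    id = parent.parent_task_id;
  }
  return false;
}

// Rules 1 and 2 combined for an implement_feature task.
function canCreateImplementFeature(task, tasksById) {
  return (
    hasApprovedAncestor(task, tasksById, 'design_proposal') &&
    hasApprovedAncestor(task, tasksById, 'implementation_plan')
  );
}

const approved = JSON.stringify({ decision: 'approved' });
const tasksById = new Map([
  [1, { id: 1, task_type: 'design_proposal', parent_task_id: null, approval_json: approved }],
  [2, { id: 2, task_type: 'implementation_plan', parent_task_id: 1, approval_json: approved }],
]);
console.log(canCreateImplementFeature({ task_type: 'implement_feature', parent_task_id: 2 }, tasksById));
```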
 774  
 775  ### Approval System
 776  
 777  #### Product Owner Approval
 778  
 779  **Required for:**
 780  
 781  - Breaking changes
 782  - Database migrations
 783  - Features with >4 hours estimated effort
 784  - Changes explicitly marked "significant"
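
Expressed as a predicate, the criteria above might look like the following. The field names on `change` (`breaking`, `hasMigration`, `estimatedHours`, `significant`) are hypothetical; only the criteria themselves come from the list.

```javascript
// Returns true when a change needs Product Owner sign-off.
// Field names on `change` are hypothetical; thresholds come from the list above.
function requiresPoApproval(change) {
  return Boolean(
    change.breaking ||
      change.hasMigration ||
      change.estimatedHours > 4 ||
      change.significant
  );
}

console.log(requiresPoApproval({ estimatedHours: 2 }));
console.log(requiresPoApproval({ estimatedHours: 6 }));
console.log(requiresPoApproval({ estimatedHours: 1, hasMigration: true }));
```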
 785  
 786  **Process:**
 787  
 788  1. Architect creates design_proposal task
 789  2. Task status -> `awaiting_po_approval`
 790  3. Task appears in `human_review_queue`
 791  4. PO reviews via CLI: `npm run agent:approvals`
 792  5. PO approves/rejects via: `npm run agent:approve`
 793  
 794  **Approval Schema:**
 795  
 796  ```json
 797  {
 798    "decision": "approved | approved_with_conditions | rejected",
 799    "reviewer": "Jason",
 800    "timestamp": "2026-02-15T10:30:00Z",
 801    "notes": "Looks good, keep scope tight",
 802    "conditions": ["Max 2 files", "No new dependencies"]
 803  }
 804  ```
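
A record in this shape could be assembled and validated as sketched below. `buildApproval` is a hypothetical helper; it assumes conditions arrive as a comma-separated string, as in the `agent:approve` CLI examples, and note that `toISOString()` includes milliseconds, unlike the sample timestamp above.

```javascript
const DECISIONS = ['approved', 'approved_with_conditions', 'rejected'];

// Hypothetical helper that builds the object stored in approval_json.
function buildApproval(
  { decision, reviewer, notes = '', conditions = '' },
  now = new Date()
) {
  if (!DECISIONS.includes(decision)) {
    throw new Error(`Invalid decision: ${decision}`);
  }
  return {
    decision,
    reviewer,
    timestamp: now.toISOString(),
    notes,
    // Split the comma-separated string into a clean array
    conditions: conditions
      .split(',')
      .map((c) => c.trim())
      .filter(Boolean),
  };
}

console.log(
  buildApproval({
    decision: 'approved_with_conditions',
    reviewer: 'Jason',
    notes: 'Keep it simple',
    conditions: 'Max 2 files,No new dependencies',
  })
);
```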
 805  
 806  #### Architect Approval
 807  
 808  **Required for:**
 809  
 810  - All implementation plans
 811  - Refactorings
 812  - Performance optimizations
 813  
 814  **Review Criteria:**
 815  
 816  - Files won't exceed 150 lines
 817  - Test coverage >=85%
 818  - Documentation updated
 819  - No circular dependencies
 820  - Follows architectural patterns
 821  
 822  **Process:**
 823  
 824  1. Developer creates implementation_plan
 825  2. Task status -> `awaiting_architect_approval`
 826  3. Architect agent reviews plan
 827  4. Creates technical_review task
 828  5. Approves -> status back to `pending`, Developer proceeds
 829  6. Rejects -> feedback to Developer, plan revised
 830  
 831  ---
 832  
 833  ## CLI Commands
 834  
 835  ### View Agent Status
 836  
 837  ```bash
 838  # List all agents with current status
 839  npm run agent:list
 840  
 841  # Output:
 842  # Agent: monitor, Status: idle, Last run: 2026-02-15 10:25:00
 843  # Agent: developer, Status: running, Current task: 42
 844  # Circuit breaker: All agents operational
 845  ```
 846  
 847  ### Manage Tasks
 848  
 849  ```bash
 850  # View all pending tasks
 851  npm run agent:tasks
 852  
 853  # View tasks for specific agent
 854  npm run agent:tasks -- --assigned-to developer
 855  
 856  # View tasks by status
 857  npm run agent:tasks -- --status pending
 858  npm run agent:tasks -- --status awaiting_po_approval
 859  
 860  # View specific task details
 861  npm run agent:tasks -- --task-id 42
 862  ```
 863  
 864  ### Create Tasks Manually
 865  
 866  ```bash
 867  # Create task for developer
 868  npm run agent:create -- --agent developer --task fix_bug --context '{"error":"...","file":"src/score.js"}' --priority 7
 869  
 870  # Create task for QA
 871  npm run agent:create -- --agent qa --task write_test --context '{"module":"scoring","function":"calculateScore"}' --priority 5
 872  ```
 873  
 874  ### Trigger Workflows
 875  
 876  ```bash
 877  # Bug fix workflow
 878  npm run agent:workflow -- --workflow bug-fix --error "TypeError: Cannot read property 'score' of null" --stage scoring
 879  
 880  # Feature workflow
 881  npm run agent:workflow -- --workflow feature --description "Add export to CSV" --requirements '["Export button","CSV format","Download trigger"]'
 882  
 883  # Refactor workflow
 884  npm run agent:workflow -- --workflow refactor --file src/utils/stealth-browser.js --reason "Cyclomatic complexity exceeds 15"
 885  ```
 886  
 887  ### Manage Approvals
 888  
 889  ```bash
 890  # View all pending approvals
 891  npm run agent:approvals
 892  
 893  # Filter by approval type
 894  npm run agent:approvals -- --status awaiting_po_approval
 895  npm run agent:approvals -- --status awaiting_architect_approval
 896  
 897  # Approve task
 898  npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision approved
 899  
 900  # Approve with conditions
 901  npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision approved_with_conditions --notes "Keep it simple" --conditions "Max 2 files,No new dependencies"
 902  
 903  # Reject task
 904  npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision rejected --notes "Scope too large, break into smaller pieces"
 905  ```
 906  
 907  ### View Workflow Status
 908  
 909  ```bash
 910  # View workflow tree (parent/child tasks)
 911  npm run agent:workflow:status -- --workflow-id 42
 912  
 913  # Output shows task hierarchy and status
 914  ```
 915  
 916  ### View Logs
 917  
 918  ```bash
 919  # View all agent logs
 920  npm run agent:logs
 921  
 922  # Filter by agent
 923  npm run agent:logs -- --agent-name developer
 924  
 925  # Filter by task
 926  npm run agent:logs -- --task-id 42
 927  
 928  # Filter by level
 929  npm run agent:logs -- --level error
 930  npm run agent:logs -- --agent-name developer --level error
 931  ```
 932  
 933  ### View Statistics
 934  
 935  ```bash
 936  # View success rates and metrics
 937  npm run agent:stats
 938  
 939  # Output:
 940  # Agent: developer, Tasks: 45, Success: 42, Failure: 3, Rate: 93%
 941  # Agent: qa, Tasks: 38, Success: 38, Failure: 0, Rate: 100%
 942  # Circuit breaker: All agents operational
 943  ```
 944  
 945  ### Run Agents Manually
 946  
 947  ```bash
 948  # Run all agents once
 949  npm run agent:run
 950  
 951  # Run with verbose logging
 952  npm run agent:run -- --verbose
 953  
 954  # Process up to N tasks
 955  npm run agent:run -- --tasks=10
 956  
# Run a single agent (pass the agent name)
npm run agent:run:single developer
 959  ```
 960  
 961  ---
 962  
 963  ## Horizontal Scaling
 964  
 965  The agent system supports horizontal scaling through row-level task locking, allowing multiple instances of the same agent to run concurrently without conflicts.
 966  
 967  ### How It Works
 968  
 969  **Row-Level Locking:**
 970  
 971  - Each agent instance atomically claims individual tasks from the database
 972  - SQLite transactions ensure only one instance can claim any given task
 973  - Multiple instances safely process different tasks simultaneously
 974  - No duplicate processing, even with 5+ concurrent instances
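
The claim step can be pictured as a compare-and-set on the task's status. The sketch below models it over an in-memory array rather than SQLite; `claimNextTask` and the `claimed_by` field are hypothetical names used only for illustration, and in the real system the equivalent check-and-update would run inside a single SQLite transaction.

```javascript
// Hypothetical in-memory model of atomic task claiming (not the project's code).
// In SQLite the same idea is a single statement such as:
//   UPDATE agent_tasks SET status = 'running' WHERE id = ? AND status = 'pending'
// where "one row changed" means this instance won the claim.
function claimNextTask(tasks, agentName, instanceId) {
  const candidates = tasks
    .filter((t) => t.assigned_to === agentName && t.status === 'pending')
    .sort((a, b) => b.priority - a.priority || a.id - b.id); // highest priority first, then lowest id

  for (const task of candidates) {
    // Compare-and-set: only flip the status if it is still 'pending'.
    if (task.status === 'pending') {
      task.status = 'running';
      task.claimed_by = instanceId;
      return task;
    }
  }
  return null; // nothing left to claim
}

const tasks = [
  { id: 1, assigned_to: 'developer', status: 'pending', priority: 5 },
  { id: 2, assigned_to: 'developer', status: 'pending', priority: 9 },
  { id: 3, assigned_to: 'qa', status: 'pending', priority: 7 },
];

const first = claimNextTask(tasks, 'developer', 'dev-1');
const second = claimNextTask(tasks, 'developer', 'dev-2');
const third = claimNextTask(tasks, 'developer', 'dev-3');
console.log(first.id, second.id, third); // 2 1 null
```

Each developer instance receives a different task, the QA task is left alone, and a third instance finds nothing to claim.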
 975  
 976  **Configuration:**
 977  
 978  ```env
 979  # Enable row-level locking (default: true)
 980  AGENT_ENABLE_ROW_LOCKING=true
 981  
 982  # Allow concurrent instances of same agent (default: false)
 983  AGENT_ALLOW_CONCURRENT_INSTANCES=true
 984  ```
 985  
 986  ### Running Multiple Instances
 987  
 988  **Example: 3 Developer Agents:**
 989  
 990  ```bash
 991  # Terminal 1
 992  npm run agent:run:single developer &
 993  
 994  # Terminal 2
 995  npm run agent:run:single developer &
 996  
 997  # Terminal 3
 998  npm run agent:run:single developer &
 999  ```
1000  
1001  All three instances will process different tasks concurrently. Work distribution is automatic and race-condition safe.
1002  
1003  ### Performance Benefits
1004  
1005  **Task Throughput:**
1006  
1007  - 1 developer agent: ~5-10 tasks/hour (depending on complexity)
- 3 developer agents: ~15-30 tasks/hour (up to 3x throughput)
- 5 developer agents: ~25-50 tasks/hour (up to 5x throughput; expect diminishing returns as SQLite write contention grows)
1010  
1011  **When to Scale:**
1012  
1013  - High task queue depth (>20 pending tasks)
1014  - Long-running tasks (>5 minutes each)
1015  - Time-sensitive workflows (critical bug fixes)
1016  
1017  ### Safety Mechanisms
1018  
1019  **Agent-Level Locking (Optional):**
1020  
1021  ```env
# Set to true to turn off agent-level locking for horizontal scaling
1023  AGENT_ALLOW_CONCURRENT_INSTANCES=true
1024  ```
1025  
When `AGENT_ALLOW_CONCURRENT_INSTANCES` is `false` (the default), agent-level locking stays active and only one instance of each agent runs at a time (backwards compatible).
1027  
1028  **Task States:**
1029  
1030  - `pending` -> Available for claiming
1031  - `running` -> Claimed by an instance (atomic transition)
1032  - `completed` -> Finished successfully
1033  - `failed` -> Error after max retries
1034  
1035  **Edge Cases Handled:**
1036  
1037  - Race conditions: Transaction ensures atomic claiming
1038  - Crashed instances: Stale lock cleanup after 2 minutes
1039  - Duplicate processing: Prevented by atomic UPDATE WHERE status='pending'
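
Stale-lock cleanup can be sketched as a periodic sweep that returns old claims to the queue. The `started_at` field, the `releaseStaleTasks` helper, and the millisecond timestamps are assumptions for illustration; only the 2-minute window comes from the list above.

```javascript
// Hypothetical sketch: release tasks whose claim is older than the stale window.
const STALE_AFTER_MS = 2 * 60 * 1000; // 2 minutes, per the edge cases listed above

function releaseStaleTasks(tasks, nowMs) {
  const released = [];
  for (const task of tasks) {
    const isStale = task.status === 'running' && nowMs - task.started_at > STALE_AFTER_MS;
    if (isStale) {
      task.status = 'pending'; // make it claimable again
      task.started_at = null;
      released.push(task.id);
    }
  }
  return released;
}

const now = 1_000_000;
const running = [
  { id: 10, status: 'running', started_at: now - 30_000 },  // 30 s old: still healthy
  { id: 11, status: 'running', started_at: now - 180_000 }, // 3 min old: stale
  { id: 12, status: 'completed', started_at: now - 600_000 },
];
const releasedIds = releaseStaleTasks(running, now);
console.log(releasedIds); // [ 11 ]
```

A production version would probably compare against a heartbeat timestamp rather than the start time, so that legitimately long-running tasks are not released while their instance is still alive.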
1040  
1041  ### Monitoring Concurrent Agents
1042  
1043  **View running instances:**
1044  
1045  ```bash
1046  # Check agent states
1047  npm run agent:list
1048  
1049  # Monitor task processing
1050  watch -n 5 'npm run agent:tasks'
1051  ```
1052  
1053  **Database query:**
1054  
1055  ```sql
SELECT
  assigned_to AS agent_name,
  COUNT(*) AS processing_count,
  GROUP_CONCAT(id) AS task_ids
FROM agent_tasks
WHERE status = 'running'
GROUP BY assigned_to;
1063  ```
1064  
1065  ### Limitations
1066  
1067  **SQLite Concurrency:**
1068  
1069  - WAL mode recommended for high concurrency
1070  - ~10 concurrent writers is safe limit
1071  - Consider PostgreSQL for >10 instances
1072  
1073  **Cost Considerations:**
1074  
1075  - Each instance makes LLM API calls
1076  - Budget enforcement: `AGENT_DAILY_BUDGET=10` (USD)
1077  - Emergency shutdown if >$5/hour spend rate
1078  
1079  ### Scaling Best Practices
1080  
1081  1. **Start with 2-3 instances** - Verify row-level locking works correctly
1082  2. **Monitor task completion** - Ensure no duplicate processing
1083  3. **Check database locks** - Avoid SQLite contention
1084  4. **Scale gradually** - Add instances as queue depth increases
1085  5. **Use priority wisely** - High-priority tasks processed first
1086  
1087  ### Testing Concurrent Locking
1088  
1089  ```bash
1090  # Run concurrent locking tests
1091  npm test tests/agents/concurrent-locking.test.js
1092  ```
1093  
1094  Tests verify:
1095  
1096  - No duplicate task processing
1097  - Correct priority ordering
1098  - Work distribution across instances
1099  - Agent isolation (developer vs QA)
1100  - Backwards compatibility with single instance
1101  
1102  ---
1103  
1104  ## Configuration
1105  
1106  ### Environment Variables
1107  
1108  ```bash
1109  # Enable/disable agent system
1110  AGENT_SYSTEM_ENABLED=true
1111  
1112  # Circuit breaker threshold (30% failure rate triggers disable)
1113  AGENT_CIRCUIT_BREAKER_THRESHOLD=0.3
1114  
1115  # Rate limit (max invocations per hour)
1116  AGENT_MAX_INVOCATIONS_PER_HOUR=60
1117  
1118  # Immediate invocation (default: true)
1119  # Event-driven agent invocation eliminates 5-minute cron delays
1120  # Agents invoke each other immediately after handoffs and task creation
1121  # Speeds up workflows 10-15x (from 15-20 min to < 2 min)
1122  # See docs/IMMEDIATE-INVOCATION.md for details
1123  AGENT_IMMEDIATE_INVOCATION=true
1124  
1125  # Max chain depth (default: 10)
1126  # Prevents infinite loops by limiting consecutive immediate invocations
1127  # After reaching depth, agents fall back to cron polling
1128  AGENT_MAX_CHAIN_DEPTH=10
1129  
1130  # Database path (for testing)
1131  DATABASE_PATH=./db/sites.db
1132  ```
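
As a rough sketch of how these variables might be read at startup, the helper below parses them from an env-like object. `loadAgentConfig` and the returned property names are hypothetical; the fallbacks for the immediate-invocation flag and chain depth follow the defaults stated above, while the other fallbacks are assumptions based on the example values.

```javascript
// Hypothetical loader: reads the variables documented above from an env-like object.
function loadAgentConfig(env) {
  const bool = (value, fallback) =>
    value === undefined ? fallback : value.toLowerCase() === 'true';
  const num = (value, fallback) => {
    const parsed = Number(value);
    return value === undefined || Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    enabled: bool(env.AGENT_SYSTEM_ENABLED, false), // assumed: off unless explicitly enabled
    circuitBreakerThreshold: num(env.AGENT_CIRCUIT_BREAKER_THRESHOLD, 0.3), // assumed fallback
    maxInvocationsPerHour: num(env.AGENT_MAX_INVOCATIONS_PER_HOUR, 60),     // assumed fallback
    immediateInvocation: bool(env.AGENT_IMMEDIATE_INVOCATION, true),
    maxChainDepth: num(env.AGENT_MAX_CHAIN_DEPTH, 10),
    databasePath: env.DATABASE_PATH ?? './db/sites.db',
  };
}

const config = loadAgentConfig({ AGENT_SYSTEM_ENABLED: 'true', AGENT_MAX_CHAIN_DEPTH: '5' });
console.log(config.enabled, config.maxChainDepth, config.circuitBreakerThreshold); // true 5 0.3
```

In a running process the argument would typically be `process.env`.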
1133  
1134  ### Quality Gates
1135  
1136  **Developer Agent:**
1137  
1138  - **Coverage gate:** 85%+ required BEFORE commits (HARD BLOCK)
1139  - Automatic test generation attempted if coverage <85%
1140  - Escalates to Architect if auto-fix fails
1141  
1142  **QA Agent:**
1143  
1144  - **Coverage gate:** 80%+ required to approve tasks (HARD BLOCK)
- Creates `write_test` task if coverage <80%
1146  - Blocks parent task until coverage improves
1147  
1148  **Other Gates:**
1149  
1150  - **Retry limit:** 3 retries per task before marking as failed
1151  - **Task TTL:** Tasks pending >1 hour escalate to human review
1152  - **Circuit breaker:** >30% failure rate disables agent
1153  
1154  **Coverage Enforcement:**
1155  
1156  1. **Developer Agent** (85% gate): Checks coverage BEFORE commit
1157     - Blocks commit if any changed source file <85%
1158     - Attempts to write tests automatically
1159     - Escalates to Architect if auto-fix fails
1160  2. **QA Agent** (80% gate): Checks coverage AFTER commit
1161     - Blocks task completion if <80%
1162     - Creates `write_test` task for missing coverage
1163     - Provides second layer of enforcement
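
The two gates can be illustrated with a single threshold check applied at different points. This is a sketch under assumptions: `checkCoverageGate` and the per-file coverage map are hypothetical, and the real agents may aggregate coverage differently.

```javascript
// Hypothetical sketch of the two coverage gates; names and input shape are assumed.
const DEVELOPER_GATE = 85; // percent, checked per changed file before commit
const QA_GATE = 80;        // percent, checked after commit

function checkCoverageGate(fileCoverage, threshold) {
  // fileCoverage: { 'path/to/file.js': percentCovered }
  const failing = Object.entries(fileCoverage)
    .filter(([, percent]) => percent < threshold)
    .map(([file, percent]) => ({ file, percent }));
  return { passed: failing.length === 0, failing };
}

const changed = { 'src/score.js': 91.2, 'src/utils/error-handler.js': 82.5 };
const devGate = checkCoverageGate(changed, DEVELOPER_GATE);
const qaGate = checkCoverageGate(changed, QA_GATE);
console.log(devGate.passed, devGate.failing.map((f) => f.file)); // false [ 'src/utils/error-handler.js' ]
console.log(qaGate.passed); // true
```

The same change set can pass the QA gate and still be blocked by the stricter Developer gate, which is why the Developer check runs first.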
1164  
1165  ### Scheduling
1166  
1167  **Immediate Invocation** (Event-Driven):
1168  
1169  Agents are invoked immediately when:
1170  
1171  - Another agent hands off a task (`handoff()`)
1172  - A new task is created (`createTask()`)
1173  
1174  This eliminates 5-minute cron delays, speeding up workflows **10-15x** (from 15-20 minutes to < 2 minutes).
1175  
1176  See [IMMEDIATE-INVOCATION.md](../IMMEDIATE-INVOCATION.md) for details.
1177  
1178  **Cron Fallback** (Scheduled Polling):
1179  
1180  Agents also run via cron job every 5 minutes as a safety net:
1181  
1182  ```sql
1183  -- cron_jobs table entry
1184  INSERT INTO cron_jobs (name, schedule, handler, enabled)
1185  VALUES ('agent-runner', '*/5 * * * *', 'node src/agents/runner.js', 1);
1186  ```
1187  
1188  Cron picks up tasks that were missed by immediate invocation (e.g., due to errors or depth limits).
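
The choice between immediate invocation and the cron fallback can be sketched as a small decision helper. `chooseInvocation` and the zero-based `chainDepth` counter are assumptions for illustration; the actual runner may track depth differently.

```javascript
// Hypothetical decision helper for the scheduling rules described above.
function chooseInvocation({ immediateInvocation, maxChainDepth }, chainDepth) {
  if (!immediateInvocation) return 'cron';
  // Once the chain has reached the configured depth, stop chaining and let
  // the 5-minute cron run pick the task up instead.
  return chainDepth < maxChainDepth ? 'immediate' : 'cron';
}

const settings = { immediateInvocation: true, maxChainDepth: 10 };
console.log(chooseInvocation(settings, 0));  // immediate
console.log(chooseInvocation(settings, 9));  // immediate
console.log(chooseInvocation(settings, 10)); // cron
```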
1189  
1190  **Manual control:**
1191  
1192  - Start: Set `enabled = 1` in cron_jobs
1193  - Stop: Set `enabled = 0` in cron_jobs
1194  - One-time run: `npm run agent:run`
1195  
1196  ---
1197  
1198  ## Safety Features
1199  
1200  ### Circuit Breaker
1201  
1202  **Purpose:** Prevent runaway agent failures from consuming resources
1203  
1204  **How it works:**
1205  
1206  1. Monitors agent success/failure ratios
1207  2. If failure rate >30% (and >=10 tasks completed): Trigger circuit breaker
1208  3. Agent status -> `blocked`
1209  4. Timestamp recorded in `agent_state.metrics_json`
1210  5. Manual reset required
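
The trigger condition in step 2 amounts to a ratio check with a minimum sample size. A minimal sketch, assuming a hypothetical `shouldTripCircuitBreaker` helper and simple success/failure counters:

```javascript
// Hypothetical sketch of the trigger rule: failure rate above the threshold,
// but only once at least `minTasks` tasks have completed.
function shouldTripCircuitBreaker({ success, failure }, threshold = 0.3, minTasks = 10) {
  const total = success + failure;
  if (total < minTasks) return false; // too few completed tasks to judge
  return failure / total > threshold;
}

console.log(shouldTripCircuitBreaker({ success: 42, failure: 3 })); // false (3 of 45 failed)
console.log(shouldTripCircuitBreaker({ success: 6, failure: 4 }));  // true  (4 of 10 failed)
console.log(shouldTripCircuitBreaker({ success: 2, failure: 5 }));  // false (only 7 tasks so far)
console.log(shouldTripCircuitBreaker({ success: 7, failure: 3 }));  // false (exactly 30% is not above 30%)
```

The minimum-sample guard keeps a single early failure from blocking an agent that has barely started.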
1211  
1212  **When triggered:**
1213  
1214  - Agent logged to `human_review_queue`
1215  - All tasks for that agent paused
1216  - Root cause investigation required
1217  
1218  **Reset:**
1219  
1220  ```sql
1221  UPDATE agent_state
1222  SET status = 'idle',
1223      metrics_json = json_remove(metrics_json, '$.circuit_breaker_triggered_at')
1224  WHERE agent_name = 'developer';
1225  ```
1226  
1227  ### Escalation to Human Review
1228  
1229  Tasks automatically escalate to `human_review_queue` for:
1230  
1231  - Database schema changes
1232  - Breaking API changes
1233  - Security-sensitive changes (auth, secrets, compliance)
1234  - Circuit breaker triggers
1235  - Stale tasks (pending >1 hour)
1236  - Failed tasks after 3 retries
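
The two time- and retry-based triggers in this list lend themselves to a simple check. The sketch below is illustrative only: `escalationReason` and the `created_at_ms` field are hypothetical, and the content-based triggers (schema, API, and security changes) are not modeled.

```javascript
// Hypothetical sketch covering only the stale-task and retry-limit triggers.
const ONE_HOUR_MS = 60 * 60 * 1000;
const MAX_RETRIES = 3;

function escalationReason(task, nowMs) {
  if (task.status === 'failed' && task.retry_count >= MAX_RETRIES) return 'retries_exhausted';
  if (task.status === 'pending' && nowMs - task.created_at_ms > ONE_HOUR_MS) return 'stale';
  return null; // no automatic escalation from these two rules
}

const now = Date.UTC(2026, 1, 26, 12, 0, 0);
const examples = [
  { id: 1, status: 'failed', retry_count: 3, created_at_ms: now - 10 * 60 * 1000 },
  { id: 2, status: 'pending', retry_count: 0, created_at_ms: now - 2 * ONE_HOUR_MS },
  { id: 3, status: 'pending', retry_count: 1, created_at_ms: now - 5 * 60 * 1000 },
];
console.log(examples.map((t) => escalationReason(t, now))); // [ 'retries_exhausted', 'stale', null ]
```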
1237  
1238  **Review queue:**
1239  
1240  ```bash
1241  # View human review items
1242  npm run agent:approvals
1243  
1244  # Approve/reject from queue
npm run agent:approve -- --task-id <id> --reviewer "Name" --decision <approved|rejected>
1246  ```
1247  
1248  ### Audit Trail
1249  
1250  Complete tracking of all agent actions:
1251  
1252  **agent_logs table:**
1253  
1254  - Every task execution logged with level (info, warning, error)
1255  - Metadata includes context, decisions, file paths
1256  
1257  **agent_messages table:**
1258  
1259  - All inter-agent communication recorded
1260  - Message types: handoff, question, answer, notification
1261  
1262  **agent_tasks table:**
1263  
1264  - Task status changes tracked
1265  - Retry attempts logged
1266  - Result stored in result_json
1267  
1268  **agent_state table:**
1269  
1270  - Agent status changes
1271  - Metrics tracked (success/failure rates)
1272  - Last active timestamps
1273  
1274  ### Rollback Protection
1275  
1276  **Before making changes:**
1277  
1278  1. Developer agent checks coverage
1279  2. Architect reviews implementation plan
1280  3. QA verifies changes don't break tests
1281  
1282  **If something breaks:**
1283  
1284  1. Monitor detects errors in logs
1285  2. Triage classifies and routes
1286  3. Developer creates fix
1287  4. Workflow repeats with proper gates
1288  
1289  **Manual rollback:**
1290  
1291  ```bash
1292  # View recent changes
1293  git log --oneline -5
1294  
1295  # Rollback if needed
1296  git revert <commit-hash>
1297  
1298  # Trigger QA verification
1299  npm run agent:create -- --agent qa --task verify_fix --context '{"commit":"..."}' --priority 10
1300  ```
1301  
1302  ---
1303  
1304  ## Cost Management
1305  
1306  ### Token Usage Reduction
1307  
- **Monolithic approach:** 100-150KB per invocation (full CLAUDE.md)
- **Multi-agent approach:** 20-25KB per invocation (base + role context)
- **Reduction:** 75-85%
1311  
1312  ### Breakdown by Agent
1313  
1314  | Agent     | Context Size | Tokens/Invocation | Reduction |
1315  | --------- | ------------ | ----------------- | --------- |
1316  | Monitor   | 20KB         | ~5,000            | 80%       |
1317  | Triage    | 23.5KB       | ~6,000            | 76%       |
1318  | Developer | 21.3KB       | ~5,300            | 79%       |
1319  | QA        | 23KB         | ~5,800            | 77%       |
1320  | Security  | 21KB         | ~5,200            | 79%       |
1321  | Architect | 25KB         | ~6,200            | 75%       |
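
The table's figures appear consistent with roughly 4 bytes per token and a 100KB monolithic baseline (the low end of the 100-150KB range). Both constants are inferred from the numbers rather than stated by the source, so treat this as a sketch of the arithmetic:

```javascript
// Assumed constants, inferred from the table above rather than documented.
const BASELINE_KB = 100;   // low end of the monolithic context size
const BYTES_PER_TOKEN = 4; // rough rule of thumb

function contextStats(sizeKb) {
  return {
    approxTokens: Math.round((sizeKb * 1000) / BYTES_PER_TOKEN),
    reductionPercent: (1 - sizeKb / BASELINE_KB) * 100,
  };
}

const monitor = contextStats(20);
const architect = contextStats(25);
console.log(monitor.approxTokens, Math.round(monitor.reductionPercent));     // 5000 80
console.log(architect.approxTokens, Math.round(architect.reductionPercent)); // 6250 75
```

Measured against the 150KB end of the range, the reductions would be correspondingly larger.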
1322  
1323  ### Rate Limiting
1324  
1325  **Environment variable:**
1326  
1327  ```bash
1328  AGENT_MAX_INVOCATIONS_PER_HOUR=60
1329  ```
1330  
1331  **How it works:**
1332  
1333  - Tracks invocations per hour in `agent_state.metrics_json`
1334  - If limit exceeded: Agent status -> `blocked`
1335  - Resets every hour
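
An hourly limit of this kind can be sketched as a fixed-window counter. The `windowStart` and `invocations` fields below are hypothetical and do not reflect the actual `metrics_json` schema:

```javascript
// Hypothetical fixed-window counter; field names are assumptions.
const HOUR_MS = 60 * 60 * 1000;

function recordInvocation(metrics, nowMs, maxPerHour = 60) {
  // Start a fresh window when none exists or the previous one has expired.
  if (metrics.windowStart === undefined || nowMs - metrics.windowStart >= HOUR_MS) {
    metrics.windowStart = nowMs;
    metrics.invocations = 0;
  }
  if (metrics.invocations >= maxPerHour) {
    return { allowed: false, status: 'blocked' };
  }
  metrics.invocations += 1;
  return { allowed: true, status: 'running' };
}

const metrics = {};
let last;
for (let i = 0; i < 61; i += 1) last = recordInvocation(metrics, 0);
console.log(last.allowed, metrics.invocations);          // false 60
console.log(recordInvocation(metrics, HOUR_MS).allowed); // true (new window)
```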
1336  
1337  **Monitoring:**
1338  
1339  ```bash
1340  # Check invocation counts
1341  npm run agent:stats
1342  
1343  # View recent logs
1344  npm run agent:logs -- --agent-name developer
1345  ```
1346  
1347  ### Budget Controls
1348  
1349  **Prevent cost overruns:**
1350  
1351  1. **Set rate limits:** `AGENT_MAX_INVOCATIONS_PER_HOUR=60`
1352  2. **Monitor stats:** `npm run agent:stats` daily
1353  3. **Review logs:** Check for unnecessary task creation
1354  4. **Optimize context:** Keep context files lean and focused
1355  5. **Use Haiku for simple tasks:** `AGENT_USE_HAIKU_FOR_SIMPLE_TASKS=true` (50-70% cost reduction)
1356  
1357  ### Smart Model Selection (Haiku vs Sonnet)
1358  
1359  **Cost optimization:** The agent system automatically selects the appropriate model based on task complexity.
1360  
1361  **Cost comparison:**
1362  
1363  | Model             | Input          | Output            | Use Case                              |
1364  | ----------------- | -------------- | ----------------- | ------------------------------------- |
1365  | Claude 3.5 Haiku  | $0.80/M        | $4.00/M           | Simple pattern-based tasks            |
1366  | Claude 3.5 Sonnet | $3.00/M        | $15.00/M          | Complex reasoning & code generation   |
| **Cost Savings**  | **3.75x cheaper** | **3.75x cheaper** | **50-70% reduction for simple tasks** |
1368  
1369  **Haiku tasks (simple/pattern-based):**
1370  
1371  - **Triage:** Error classification via pattern matching
1372  - **Monitor:** Log scanning and anomaly detection
1373  - **Security:** Regex-based security checks (SQL injection, secrets, command injection patterns)
1374  - **QA:** Test file discovery and simple test generation
1375  
1376  **Sonnet tasks (complex reasoning):**
1377  
1378  - **Developer:** Bug fixing and code generation
1379  - **Architect:** Design reviews and architectural decisions
1380  - **Security:** Advanced threat modeling (STRIDE analysis)
1381  - **QA:** Coverage analysis and complex integration tests
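
The selection rule can be sketched as a lookup on task complexity with an explicit override. `selectModel` and the `'simple'`/`'complex'` labels here are illustrative assumptions; the real classifier may use task types or other signals.

```javascript
// Hypothetical sketch of complexity-based model selection.
const MODELS = {
  haiku: 'claude-3-5-haiku-20241022',
  sonnet: 'claude-3-5-sonnet-20241022',
};

function selectModel({ complexity, model }, useHaikuForSimpleTasks = true) {
  if (model) return model; // an explicit override always wins
  if (useHaikuForSimpleTasks && complexity === 'simple') return MODELS.haiku;
  return MODELS.sonnet; // default to the stronger model when unsure
}

console.log(selectModel({ complexity: 'simple' }));        // claude-3-5-haiku-20241022
console.log(selectModel({ complexity: 'complex' }));       // claude-3-5-sonnet-20241022
console.log(selectModel({ complexity: 'simple' }, false)); // claude-3-5-sonnet-20241022
console.log(selectModel({ complexity: 'simple', model: MODELS.sonnet })); // claude-3-5-sonnet-20241022
```

Falling back to Sonnet for anything not marked simple trades some cost for a lower risk of under-powered results.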
1382  
1383  **Configuration:**
1384  
1385  ```bash
1386  # Enable Haiku optimization (default: true for 50-70% cost reduction)
1387  AGENT_USE_HAIKU_FOR_SIMPLE_TASKS=true
1388  ```
1389  
1390  **Override model selection:**
1391  
1392  ```javascript
// Force Haiku
const classification = await classifyIssue(agentName, taskId, errorMessage, {
  model: 'claude-3-5-haiku-20241022',
});

// Force Sonnet
const analysis = await analyzeCode(agentName, taskId, filePath, prompt, {
  model: 'claude-3-5-sonnet-20241022',
  complexity: 'complex',
});
1403  ```
1404  
1405  **Track cost savings:**
1406  
1407  ```bash
1408  npm run agent:stats
1409  ```
1410  
1411  Output includes model breakdown:
1412  
1413  ```json
1414  {
1415    "modelBreakdown": {
1416      "haiku": {
1417        "calls": 150,
1418        "cost": 0.45,
1419        "avgCost": 0.003
1420      },
1421      "sonnet": {
1422        "calls": 50,
1423        "cost": 1.2,
1424        "avgCost": 0.024
1425      },
1426      "savings": "27.3"
1427    }
1428  }
1429  ```
1430  
1431  **Expected savings:**
1432  
1433  - Monitor/Triage agents: 60-70% cost reduction (mostly Haiku)
1434  - Developer/Architect: 10-20% cost reduction (mostly Sonnet)
1435  - Security: 30-40% cost reduction (mix of simple checks and complex modeling)
1436  - QA: 40-50% cost reduction (test generation uses Haiku, coverage analysis uses Sonnet)
1437  
Beyond model selection, circuit breakers (see [Circuit Breaker](#circuit-breaker)) also limit cost by stopping runaway failures.
1439  
1440  **Cost estimation:**
1441  
- Average task: ~6,000 input tokens + ~2,000 output tokens
- At 60 invocations/hour: ~360,000 input + ~120,000 output tokens/hour
- At Sonnet rates ($3/M input, $15/M output): ~$1.08 + ~$1.80 = ~$2.88/hour
- Daily cost (24 hours at the rate limit): ~$69
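
The estimate can be reproduced by pricing input and output tokens separately at the rates from the model comparison table. Because output tokens are billed at the higher output rate, the result comes out higher than applying a flat $3/M to every token. `hourlyCost` is a hypothetical helper, and the workload assumes every invocation hits the average task size at the full rate limit.

```javascript
// Per-million-token rates (USD) taken from the model comparison table above.
const RATES = {
  sonnet: { input: 3.0, output: 15.0 },
  haiku: { input: 0.8, output: 4.0 },
};

function hourlyCost(model, { inputTokens, outputTokens, invocationsPerHour }) {
  const { input, output } = RATES[model];
  const perCall = (inputTokens * input + outputTokens * output) / 1_000_000;
  return perCall * invocationsPerHour;
}

const load = { inputTokens: 6000, outputTokens: 2000, invocationsPerHour: 60 };
for (const model of ['sonnet', 'haiku']) {
  const perHour = hourlyCost(model, load);
  console.log(`${model}: $${perHour.toFixed(2)}/hour, $${(perHour * 24).toFixed(2)}/day`);
}
// sonnet: $2.88/hour, $69.12/day
// haiku: $0.77/hour, $18.43/day
```

These are worst-case figures: an agent system that idles between tasks, or routes simple tasks to Haiku, would spend less.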
1446  
1447  **Cost optimization tips:**
1448  
1449  - Reduce task creation frequency if logs are clean
1450  - Increase Monitor scan interval from 5 to 10 minutes
1451  - Disable agents not currently needed
1452  - Use smaller models for simple tasks (Haiku for classification)
1453  
1454  ---
1455  
1456  ## Communication Patterns
1457  
1458  ### Pattern 1: Task Handoff
1459  
1460  ```javascript
1461  await agent.createTask({
1462    task_type: 'verify_fix',
1463    assigned_to: 'qa',
1464    parent_task_id: 123,
1465    priority: 5,
1466    context: {
1467      files_changed: ['src/score.js'],
1468      fix_commit: 'abc123',
1469      test_instructions: 'Verify null check works',
1470    },
1471  });
1472  ```
1473  
1474  ### Pattern 2: Question & Answer
1475  
1476  ```javascript
1477  await agent.askQuestion(
1478    taskId,
1479    'developer',
1480    'Should this test cover mobile and desktop screenshots?'
1481  );
1482  ```
1483  
1484  ### Pattern 3: Workflow Chain
1485  
1486  Triage -> Developer -> QA -> Security (automated handoffs)
1487  
1488  ---
1489  
1490  ## Testing
1491  
1492  ### Unit Tests
1493  
1494  ```bash
1495  # Test individual agents
1496  node --test tests/agents/triage.test.js
1497  node --test tests/agents/developer.test.js
1498  ```
1499  
1500  ### Integration Tests
1501  
1502  ```bash
1503  # Test full workflows
1504  node --test tests/agents/workflow.integration.test.js
1505  ```
1506  
1507  ### E2E Integration Tests
1508  
1509  **Location:** `tests/agents-e2e-implementation.test.js`
1510  
1511  Comprehensive end-to-end tests for the complete agent workflow system. Tests real-world scenarios:
1512  
1513  1. **Bug Fix Workflow** - Triage -> Developer -> QA complete success path
1514  2. **Feature Implementation** - Multi-agent collaboration with 85%+ coverage target
1515  3. **Security Fix** - High priority workflow with security verification
1516  4. **Coverage Improvement** - QA proactively improving test coverage
1517  5. **Rollback on Failure** - Error recovery and retry with different approach
1518  6. **Budget Enforcement** - API call limits and emergency shutdown
1519  
1520  **Test Features:**
1521  
1522  - Isolated test database (`db/test-agents-e2e-impl.db`)
1523  - Mock Anthropic API for faster tests
1524  - Parent-child task relationship verification
1525  - Inter-agent messaging validation
1526  - Database integrity checks (foreign keys, constraints)
1527  - No database pollution between tests
1528  
1529  **Run Tests:**
1530  
1531  ```bash
1532  node --experimental-test-module-mocks --test tests/agents-e2e-implementation.test.js
1533  ```
1534  
1535  **Test Duration:** ~3 minutes for all 12 tests
1536  
1537  ### Test Results (Phase 5)
1538  
1539  - Triage Agent: 28/28 passed
1540  - Developer Agent: 15/16 passed
1541  - Workflow Integration: 5/6 passed
1542  
1543  ---
1544  
1545  ## Creating New Agents
1546  
1547  ### Step 1: Create Agent Class
1548  
1549  ```javascript
1550  // src/agents/my-agent.js
1551  import { BaseAgent } from './base-agent.js';
1552  
1553  export class MyAgent extends BaseAgent {
1554    constructor() {
1555      super('my-agent', ['base.md', 'my-agent.md']);
1556    }
1557  
1558    async processTask(task) {
1559      if (task.task_type === 'my_task_type') {
1560        await this.handleMyTask(task);
1561      }
1562    }
1563  
1564    async handleMyTask(task) {
1565      const { context_json } = task;
1566      const context = JSON.parse(context_json);
1567  
1568      // Do work...
1569  
1570      await this.completeTask(task.id, { result: 'success' });
1571    }
1572  }
1573  ```
1574  
1575  ### Step 2: Create Context File
1576  
1577  ```markdown
1578  <!-- src/agents/contexts/my-agent.md -->
1579  
1580  # My Agent Context
1581  
1582  ## Responsibilities
1583  
1584  - Task 1
1585  - Task 2
1586  
1587  ## Task Types
1588  
1589  - my_task_type
1590  
1591  ## Best Practices
1592  
1593  ...
1594  ```
1595  
1596  ### Step 3: Add to Runner
1597  
1598  ```javascript
1599  // src/agents/runner.js
1600  import { MyAgent } from './my-agent.js';
1601  
1602  const agents = [
1603    // ... existing agents
1604    new MyAgent(),
1605  ];
1606  ```
1607  
1608  ### Step 4: Update Database
1609  
1610  ```sql
1611  -- Add to agent_state table
1612  INSERT INTO agent_state (agent_name, status)
1613  VALUES ('my-agent', 'idle');
1614  
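-- Update CHECK constraints if needed.
-- SQLite cannot add a constraint with ALTER TABLE: recreate agent_tasks with
-- the extended CHECK (create new table, copy rows, drop old table, rename).
-- On PostgreSQL, drop the existing constraint first, then:
ALTER TABLE agent_tasks
  ADD CONSTRAINT agent_tasks_assigned_to_check
  CHECK (assigned_to IN ('developer', 'qa', 'security', 'architect', 'triage', 'monitor', 'my-agent'));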
1619  ```
1620  
1621  ### Step 5: Write Tests
1622  
1623  ```javascript
1624  // tests/agents/my-agent.test.js
1625  import { test, describe } from 'node:test';
1626  import { MyAgent } from '../../src/agents/my-agent.js';
1627  
1628  describe('MyAgent', () => {
1629    test('processes my_task_type tasks', async () => {
1630      // ...
1631    });
1632  });
1633  ```
1634  
1635  ---
1636  
1637  ## Troubleshooting
1638  
1639  ### Agent Not Processing Tasks
1640  
1641  **Symptoms:**
1642  
1643  - Tasks stuck in `pending` status
1644  - Agent status shows `blocked`
1645  - No recent logs for agent
1646  
1647  **Diagnosis:**
1648  
1649  ```bash
1650  # Check agent status
1651  npm run agent:list
1652  
1653  # Check circuit breaker
1654  npm run agent:stats
1655  
1656  # View error logs
1657  npm run agent:logs -- --agent-name developer --level error
1658  ```
1659  
1660  **Solutions:**
1661  
1662  1. **Circuit breaker triggered:**
1663  
1664  ```sql
1665  -- Check metrics
1666  SELECT metrics_json FROM agent_state WHERE agent_name = 'developer';
1667  
1668  -- Reset if safe
1669  UPDATE agent_state
1670  SET status = 'idle',
1671      metrics_json = json_remove(metrics_json, '$.circuit_breaker_triggered_at')
1672  WHERE agent_name = 'developer';
1673  ```
1674  
1675  2. **Rate limit exceeded:**
1676  
1677  ```bash
1678  # Wait for hourly reset, or increase limit
1679  AGENT_MAX_INVOCATIONS_PER_HOUR=120
1680  ```
1681  
1682  3. **Agent disabled:**
1683  
1684  ```sql
1685  -- Re-enable agent
1686  UPDATE agent_state SET status = 'idle' WHERE agent_name = 'developer';
1687  ```
1688  
1689  ### Tasks Stuck in Pending
1690  
1691  **Symptoms:**
1692  
1693  - Tasks created but never start
1694  - Task age >1 hour
1695  
1696  **Diagnosis:**
1697  
1698  ```bash
1699  # View pending tasks
1700  npm run agent:tasks -- --status pending
1701  
1702  # Check if agents are running
1703  npm run agent:list
1704  
1705  # Check task dependencies
1706  npm run agent:workflow:status -- --workflow-id 42
1707  ```
1708  
1709  **Solutions:**
1710  
1711  1. **Parent task incomplete:**
1712     - Tasks with `parent_task_id` won't start until parent completes
1713     - Check parent status: `npm run agent:tasks -- --task-id <parent_id>`
1714     - Complete or cancel parent task
1715  
1716  2. **Agent not running:**
1717     - Check cron job enabled: `SELECT * FROM cron_jobs WHERE name = 'agent-runner';`
1718     - Enable: `UPDATE cron_jobs SET enabled = 1 WHERE name = 'agent-runner';`
1719     - Manual run: `npm run agent:run`
1720  
1721  3. **Task priority too low:**
1722     - Increase priority: `UPDATE agent_tasks SET priority = 10 WHERE id = 42;`
1723  
1724  ### High Token Costs
1725  
1726  **Symptoms:**
1727  
1728  - Higher than expected API bills
1729  - Many agent invocations
1730  
1731  **Diagnosis:**
1732  
```sql
-- Check invocation counts
SELECT agent_name, COUNT(*) as invocations
FROM agent_logs
WHERE created_at > datetime('now', '-1 hour')
GROUP BY agent_name;

-- Check task creation rate
1741  SELECT task_type, COUNT(*) as count
1742  FROM agent_tasks
1743  WHERE created_at > datetime('now', '-24 hours')
1744  GROUP BY task_type;
1745  ```
1746  
1747  **Solutions:**
1748  
1749  1. **Reduce invocation frequency:**
1750  
1751  ```bash
1752  # Lower rate limit
1753  AGENT_MAX_INVOCATIONS_PER_HOUR=30
1754  
1755  # Increase Monitor scan interval
1756  # Edit cron_jobs: '*/5 * * * *' -> '*/10 * * * *'
1757  ```
1758  
1759  2. **Optimize context:**
1760     - Review context files for unnecessary content
1761     - Remove duplicate information
1762     - Keep context files lean
1763  
1764  3. **Disable unnecessary agents:**
1765  
1766  ```sql
1767  -- Temporarily disable Security agent
1768  UPDATE agent_state SET status = 'disabled' WHERE agent_name = 'security';
1769  ```
1770  
1771  ### Circuit Breaker Triggered
1772  
1773  **Symptoms:**
1774  
1775  - Agent status = `blocked`
1776  - `circuit_breaker_triggered_at` in metrics_json
1777  
1778  **Diagnosis:**
1779  
1780  ```bash
1781  # View error logs
1782  npm run agent:logs -- --agent-name developer --level error
1783  
1784  # Check failure rate
1785  npm run agent:stats
1786  ```
1787  
1788  **Solutions:**
1789  
1790  1. **Identify root cause:**
1791     - Review error logs for patterns
1792     - Check recent code changes
1793     - Verify external dependencies (DB, APIs)
1794  
1795  2. **Fix underlying issue:**
1796     - If code bug: Fix and test
1797     - If external issue: Wait for resolution
1798     - If config issue: Update configuration
1799  
1800  3. **Reset circuit breaker:**
1801  
1802  ```sql
1803  -- Only after fixing root cause!
1804  UPDATE agent_state
1805  SET status = 'idle',
1806      metrics_json = json_remove(metrics_json, '$.circuit_breaker_triggered_at')
1807  WHERE agent_name = 'developer';
1808  ```
1809  
1810  ### Tasks Failing Repeatedly
1811  
1812  **Symptoms:**
1813  
1814  - Task retry_count = 3
1815  - Task status = `failed`
1816  - Same error in multiple tasks
1817  
1818  **Diagnosis:**
1819  
1820  ```bash
# View failed tasks (adjust the path if DATABASE_PATH differs)
sqlite3 ./db/sites.db "SELECT * FROM agent_tasks WHERE status = 'failed' ORDER BY created_at DESC LIMIT 10;"
1823  
1824  # Check error patterns
1825  npm run agent:logs -- --level error
1826  ```
1827  
1828  **Solutions:**
1829  
1830  1. **Code issue:**
1831     - Manually fix the bug
1832     - Reset task: `UPDATE agent_tasks SET retry_count = 0, status = 'pending' WHERE id = 42;`
1833  
1834  2. **Missing dependencies:**
1835     - Install required packages: `npm install`
1836     - Update environment variables
1837  
1838  3. **Task too complex:**
1839     - Break into smaller subtasks
1840     - Provide more context in context_json
1841  
1842  ---
1843  
1844  ## Best Practices
1845  
1846  ### When to Use Agents
1847  
1848  **Use agents for:**
1849  
1850  - Automated bug fixes from error logs
1851  - Test generation for new features
1852  - Security audits on commits
1853  - Documentation freshness checks
1854  - Refactoring suggestions
1855  - Routine maintenance tasks
1856  
1857  **Don't use agents for:**
1858  
1859  - Quick one-off tasks (just do it manually)
1860  - Tasks requiring complex user input
1861  - Real-time user interactions
1862  - Tasks with high uncertainty (needs human judgment)
1863  - Exploratory work without clear goals
1864  
1865  ### Task Design
1866  
1867  **Be specific:**
1868  
1869  ```json
1870  {
1871    "error": "TypeError: Cannot read property 'score' of null",
1872    "file": "src/score.js",
1873    "line": 42,
1874    "stack": "..."
1875  }
1876  ```
1877  
1878  **Include context:**
1879  
1880  ```json
1881  {
1882    "files_changed": ["src/score.js", "src/utils/error-handler.js"],
1883    "related_issues": ["Issue #123"],
1884    "previous_attempts": ["Tried null check, still failing"]
1885  }
1886  ```
1887  
1888  **Set appropriate priority:**
1889  
1890  - 10: Critical (system down, security breach)
1891  - 7-9: High (blocking issue, major bug)
1892  - 4-6: Medium (normal bugs, features)
1893  - 1-3: Low (nice-to-haves, refactoring)
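
The bands above can be encoded as a small helper so that callers validate priorities consistently. `priorityBand` is a hypothetical name, and rejecting values outside 1-10 is an assumption about the allowed range:

```javascript
// Hypothetical helper mapping a numeric priority to the bands listed above.
function priorityBand(priority) {
  if (!Number.isInteger(priority) || priority < 1 || priority > 10) {
    throw new RangeError(`priority must be an integer from 1 to 10, got ${priority}`);
  }
  if (priority === 10) return 'critical';
  if (priority >= 7) return 'high';
  if (priority >= 4) return 'medium';
  return 'low';
}

console.log([10, 7, 5, 2].map(priorityBand)); // [ 'critical', 'high', 'medium', 'low' ]
```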
1894  
1895  **Link parent tasks:**
1896  
1897  ```javascript
1898  await createTask({
1899    task_type: 'verify_fix',
1900    assigned_to: 'qa',
1901    parent_task_id: 123, // Links to fix_bug task
1902    priority: 5,
1903  });
1904  ```
1905  
1906  ### Message Design
1907  
1908  **Use handoff for task completion:**
1909  
1910  ```javascript
1911  await agent.sendMessage(taskId, 'qa', 'handoff', 'Bug fix complete, ready for verification', {
1912    commit: 'abc123',
1913    files_changed: ['src/score.js'],
1914  });
1915  ```
1916  
1917  **Use questions for clarification:**
1918  
1919  ```javascript
1920  await agent.askQuestion(
1921    taskId,
1922    'developer',
1923    'Should this handle mobile and desktop screenshots differently?'
1924  );
1925  ```
1926  
1927  **Use notifications for FYI:**
1928  
1929  ```javascript
1930  await agent.sendMessage(
1931    taskId,
1932    'architect',
1933    'notification',
1934    'Coverage gate blocked commit due to <85% coverage',
1935    { current_coverage: 78, required: 85 }
1936  );
1937  ```
1938  
1939  ### Agent Development
1940  
1941  **Keep agents focused:**
1942  
1943  - Single responsibility principle
1944  - One agent = one clear role
1945  - Don't create "do everything" agents
1946  
1947  **Log liberally:**
1948  
1949  ```javascript
1950  await this.log(taskId, 'info', 'Starting bug fix analysis');
await this.log(taskId, 'info', 'Identified affected files', { files: ['src/score.js'] });
1952  await this.log(taskId, 'info', 'Generated fix, checking coverage');
1953  ```
1954  
1955  **Fail gracefully:**
1956  
1957  ```javascript
1958  try {
1959    const result = await this.analyzeBug(task);
1960    return result;
1961  } catch (error) {
1962    await this.log(task.id, 'error', 'Bug analysis failed', { error: error.message });
1963    await this.failTask(task.id, { reason: 'Analysis failed', error: error.message });
  return null; // The task is already marked failed; there is no result to return
1965  }
1966  ```
1967  
1968  **Validate inputs:**
1969  
1970  ```javascript
1971  async processTask(task) {
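  // context_json comes from the database; guard against malformed JSON
  let context;
  try {
    context = JSON.parse(task.context_json);
  } catch (error) {
    await this.failTask(task.id, { reason: 'Invalid context_json', error: error.message });
    return;
  }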
1974  
1975    // Validate required fields
1976    if (!context.error || !context.file) {
1977      await this.failTask(task.id, { reason: 'Missing required context fields' });
1978      return;
1979    }
1980  
1981    // Continue processing...
1982  }
1983  ```
1984  
1985  **Test thoroughly:**
1986  
1987  - Unit tests for agent logic
1988  - Integration tests for workflows
1989  - Test error handling paths
1990  - Verify circuit breaker behavior
1991  
### Monitoring and Maintenance

**Daily checks:**

```bash
# Check agent health
npm run agent:stats

# Review errors
npm run agent:logs -- --level error

# Check approval queue
npm run agent:approvals
```

**Weekly reviews:**

- Review circuit breaker triggers (if any)
- Analyze token usage trends
- Check task completion rates
- Review escalated items

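For the completion-rate check in the weekly review, a small helper could summarize task rows per agent. This is a minimal sketch: the `assigned_to` and `status` field names and the `completed` / `failed` status values are assumptions, not taken from the schema.

```javascript
// Group task rows by agent and compute completed / (completed + failed).
// Tasks that are still open are counted but excluded from the rate.
function completionRates(tasks) {
  const byAgent = new Map();
  for (const { assigned_to, status } of tasks) {
    const stats = byAgent.get(assigned_to) ?? { completed: 0, failed: 0, open: 0 };
    if (status === 'completed') stats.completed += 1;
    else if (status === 'failed') stats.failed += 1;
    else stats.open += 1;
    byAgent.set(assigned_to, stats);
  }
  return [...byAgent.entries()].map(([agent, stats]) => {
    const finished = stats.completed + stats.failed;
    // rate is null when the agent has no finished tasks yet
    return { agent, ...stats, rate: finished === 0 ? null : stats.completed / finished };
  });
}
```
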
**Monthly optimization:**

- Analyze agent effectiveness
- Optimize context files
- Update agent logic based on patterns
- Review and update approval thresholds

---

## Known Gaps & Industry Standards

### Gap Analysis Summary

**Status**: 41 gaps identified across all agents compared to industry standards (Google SRE, TOGAF, ISTQB, OWASP/NIST, ITIL)

**Archived analysis:** [docs/plans/archive/agent-job-roles-gaps.md](/home/jason/code/333Method/docs/plans/archive/agent-job-roles-gaps.md)

### Gap Analysis by Agent

The 41 gaps break down by agent as follows. Each heading gives that agent's total gap count; the bullets beneath it list only the critical gaps:

#### 1. Monitor Agent - SRE Standards (7 gaps)

**Critical gaps:**

- **SLO Tracking**: No service-level objectives for pipeline stages (e.g., "95% of sites score within 1 hour")
- **Capacity Planning**: No forward-looking capacity analysis based on growth trends
- **Toil Automation**: No identification of repetitive manual work
- **Latency Monitoring**: No pipeline stage latency tracking (p50, p95, p99)
- **On-Call Runbooks**: No automated incident response playbooks

**Recommendation**: Add SLO tracking via the `pipeline_metrics` table and track growth trends

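The latency and SLO gaps above could be approached with a per-stage summary like the following. This is a sketch under stated assumptions: the `stage` and `duration_ms` fields and the `scoring` stage name are hypothetical, since the `pipeline_metrics` table does not exist yet. It uses the nearest-rank percentile method.

```javascript
// Nearest-rank percentile over an array of numbers (p between 0 and 100).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize one stage: p50/p95/p99 latency and the share of runs within the SLO threshold.
function summarizeStage(rows, stage, sloThresholdMs) {
  const durations = rows.filter((r) => r.stage === stage).map((r) => r.duration_ms);
  if (durations.length === 0) return null;
  const withinSlo = durations.filter((d) => d <= sloThresholdMs).length;
  return {
    stage,
    count: durations.length,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
    sloCompliance: withinSlo / durations.length,
  };
}

// The example SLO from the list above: 95% of sites score within 1 hour.
const MINUTE_MS = 60 * 1000;
const sampleRows = [
  { stage: 'scoring', duration_ms: 10 * MINUTE_MS },
  { stage: 'scoring', duration_ms: 20 * MINUTE_MS },
  { stage: 'scoring', duration_ms: 30 * MINUTE_MS },
  { stage: 'scoring', duration_ms: 90 * MINUTE_MS }, // breaches the 1-hour threshold
];
const summary = summarizeStage(sampleRows, 'scoring', 60 * MINUTE_MS);
const sloMet = summary.sloCompliance >= 0.95; // false for this sample: 3 of 4 runs are in time
```
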
#### 2. Architect Agent Standards (6 gaps)

**Critical gaps:**

- **Architecture Decision Records (ADRs)**: No formal decision tracking
- **Performance Profiling**: No bottleneck identification
- **Scalability Planning**: No analysis of scaling limits
- **Technical Debt Management**: No debt inventory or prioritization

**Recommendation**: Create `docs/decisions/` for ADRs, add `profile_performance` task

#### 3. Developer Agent Standards (5 gaps)

**Critical gaps:**

- **Root Cause Analysis**: No systematic RCA for recurring bugs
- **Code Review Automation**: Limited automated review (ESLint exists but is not enforced in the workflow)
- **Observability Instrumentation**: No logging/metrics added when fixing bugs

**Recommendation**: Require RCA for bugs that recur more than twice, add log statements to error paths

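The recurrence rule in the recommendation could be checked with a simple counter over past bug records. A minimal sketch, assuming each record carries the `error` and `file` context fields and that their combination is a good enough signature; `findBugsNeedingRca` is a hypothetical helper, not an existing function.

```javascript
// "Recurring >2x" from the recommendation: more than two occurrences triggers RCA.
const RCA_RECURRENCE_THRESHOLD = 2;

// Count occurrences per file/error signature and return those above the threshold.
function findBugsNeedingRca(bugRecords, threshold = RCA_RECURRENCE_THRESHOLD) {
  const counts = new Map();
  for (const { error, file } of bugRecords) {
    const signature = `${file}::${error}`;
    counts.set(signature, (counts.get(signature) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count > threshold)
    .map(([signature, count]) => ({ signature, count }));
}
```

Matching on the exact error string is deliberately strict; a real implementation would probably normalize messages first so that near-identical errors count as the same bug.
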
#### 4. QA Agent Standards (6 gaps)

**Critical gaps:**

- **Test Data Management**: No test data generation or anonymization
- **Non-Functional Testing**: No performance, load, or stress testing
- **Test Prioritization**: No risk-based test prioritization
- **Regression Testing**: No automated regression suite tracking

**Recommendation**: Add `run_load_test` task for critical paths, tag regression tests

#### 5. Security Agent Standards (7 gaps)

**Critical gaps:**

- **Threat Modeling**: No systematic threat analysis (STRIDE, DREAD) for new features
- **Security Regression Testing**: No automated security test suite
- **Penetration Testing**: No regular pentests (Shannon integration planned for Phase 7)
- **Security Metrics**: No MTTR tracking for vulnerabilities

**Recommendation**: Add `threat_model` task for new features, track remediation time

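Remediation time can be tracked as a mean over resolved vulnerabilities. The sketch below assumes each record has ISO 8601 `detected_at` and `resolved_at` timestamps; both field names are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Mean time to remediate (MTTR) in hours, over vulnerabilities that have been resolved.
// Unresolved entries (no resolved_at) are ignored; returns null if none are resolved.
function meanTimeToRemediateHours(vulnerabilities) {
  const resolved = vulnerabilities.filter((v) => v.resolved_at);
  if (resolved.length === 0) return null;
  const totalMs = resolved.reduce(
    (sum, v) => sum + (Date.parse(v.resolved_at) - Date.parse(v.detected_at)),
    0
  );
  return totalMs / resolved.length / HOUR_MS;
}
```
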
#### 6. Triage Agent Standards (6 gaps)

**Critical gaps:**

- **Incident Correlation**: No linking of related incidents
- **Escalation Policies**: No time-based escalation rules
- **Known Error Database**: No knowledge base of solved issues
- **Postmortem Triggering**: No automated postmortems for critical incidents

**Recommendation**: Store successful fixes in `error_fix_history`, create postmortem tasks

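A known error database comes down to a lookup keyed on a normalized error signature. The sketch below uses an in-memory `Map` as a stand-in for `error_fix_history`; the `KnownErrorStore` class and the normalization rules are assumptions for illustration.

```javascript
// Collapse volatile details (numbers, extra whitespace, case) so that
// similar errors share one signature.
function errorSignature(message) {
  return message
    .toLowerCase()
    .replace(/\d+/g, 'N')
    .replace(/\s+/g, ' ')
    .trim();
}

// In-memory stand-in for the error_fix_history table.
class KnownErrorStore {
  constructor() {
    this.fixes = new Map();
  }

  // Remember the fix that resolved an error.
  recordFix(message, fixSummary) {
    this.fixes.set(errorSignature(message), fixSummary);
  }

  // Return the stored fix for a matching error, or null if it is unknown.
  lookup(message) {
    return this.fixes.get(errorSignature(message)) ?? null;
  }
}
```

With this shape, the Triage agent could check `lookup()` before creating a new bug-fix task and attach the earlier fix summary to the task context when one is found.
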
#### 7. Cross-Cutting Gaps (4 gaps)

**Critical gaps:**

- **Observability & Telemetry**: No structured logging, metrics, or tracing
- **Feedback Loops**: Limited learning from outcomes (exists for prompts, needs expansion)
- **Human-in-the-Loop**: Approval gates exist but are not consistently enforced
- **Runbook Automation**: No automated resolution for known issues

**Recommendation**: Standardize JSON logging, extend `prompt_feedback` to all agents

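Standardized JSON logging could start from one shared formatter that every agent uses. A minimal sketch: `formatLogEntry` and its field names (`timestamp`, `agent`, `task_id`) are assumptions, not an existing project convention.

```javascript
// Build one JSON log line with a fixed set of top-level fields.
// `now` is injectable so that tests can pin the timestamp.
function formatLogEntry(agent, taskId, level, message, data = {}, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    agent,
    task_id: taskId,
    level,
    message,
    data,
  });
}
```

Keeping the top-level keys fixed and pushing everything else into `data` makes the lines easy to filter by agent, task, or level without parsing free-form messages.
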
### Prioritized Improvement Roadmap

**Immediate (Week 1):**

1. Bootstrap Monitor Agent (create initial `scan_logs` task)
2. Add SLO tracking for pipeline stages
3. Implement known error database in Triage

**Short-Term (Month 1):**

4. Add performance profiling to Architect
5. Add root cause analysis to Developer
6. Add threat modeling to Security
7. Add non-functional testing to QA

**Medium-Term (Quarter 1):**

8. Implement capacity planning in Monitor
9. Add technical debt management to Architect
10. Implement incident correlation in Triage

---

## Future Enhancements

**Planned features:**

- **Agent learning**: Track feedback patterns and improve over time (the foundation exists with `prompt_feedback`)
- **Parallel execution**: Multiple agents working simultaneously
- **Agent metrics dashboard**: Real-time monitoring UI
- **Custom workflows**: User-defined agent chains

**Integration opportunities:**

- **Shannon Pen Tester**: Add security scanning to Security Agent (planned for Phase 7)
- **CI/CD pipelines**: Trigger agent workflows on push/PR
- **Slack/Discord notifications**: Alert on circuit breakers, escalations
- **LLM provider switching**: Auto-switch based on task type/cost

---

## Implementation Status

**Status**: All 6 agents implemented, tested, and deployed to production (2026-02-15).

**Bootstrap Issue**: RESOLVED (2026-02-15). The Monitor agent bootstrap problem is solved; the system now initializes properly and self-schedules tasks.

---

## Additional Resources

- **Workflow System:** `/home/jason/code/333Method/docs/06-automation/agent-workflow.md`
- **Base Agent Code:** `/home/jason/code/333Method/src/agents/base-agent.js`
- **Agent Implementations:** `/home/jason/code/333Method/src/agents/`
- **Context Files:** `/home/jason/code/333Method/src/agents/contexts/`
- **CLI Manager:** `/home/jason/code/333Method/src/cli/agent-manager.js`
- **Database Schema:** `/home/jason/code/333Method/db/schema.sql`
- **Migrations:** `/home/jason/code/333Method/db/migrations/041-create-agent-system.sql`
- **Gap Analysis:** `docs/plans/archive/agent-job-roles-gaps.md` (41 gaps identified)

---

**Last Updated:** 2026-02-26
**Version:** 2.0 (merged from docs/AGENTS.md + docs/06-automation/agent-system.md)
**Status:** Production-ready