agent-system.md
---
title: Multi-Agent System
category: automation
last_verified: 2026-02-26
related_files:
  - src/agents/base-agent.js
  - src/agents/utils/task-manager.js
  - src/agents/workflows/bug-fix.js
  - src/agents/workflows/feature.js
  - src/agents/workflows/refactor.js
  - src/cron/sonnet-overseer.js
  - src/cli/agent-manager.js
tags: [agents, automation, workflows, testing]
status: active
replaces: docs/AGENTS.md
---

# Multi-Agent System Guide

## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Getting Started](#getting-started)
- [Agents](#agents)
- [Task Routing](#task-routing)
- [Workflows](#workflows)
- [CLI Commands](#cli-commands)
- [Horizontal Scaling](#horizontal-scaling)
- [Configuration](#configuration)
- [Safety Features](#safety-features)
- [Cost Management](#cost-management)
- [Communication Patterns](#communication-patterns)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
- [Known Gaps & Industry Standards](#known-gaps--industry-standards)
- [Future Enhancements](#future-enhancements)

---

## Overview

The 333 Method uses a database-driven multi-agent system where specialized AI agents collaborate autonomously to handle development, testing, security, and architecture tasks.

### Benefits

- **Token efficiency**: 75-85% reduction vs monolithic approach (20-25KB per invocation vs 100-150KB)
- **Specialization**: Each agent has focused responsibilities and optimized context
- **Peer review**: Built-in workflows ensure quality through agent collaboration
- **Autonomy**: Agents work continuously via cron scheduling
- **Audit trail**: Complete tracking of all agent actions and decisions

### How It Works

1. **Monitor Agent** scans logs every 5 minutes and creates tasks for detected issues
2. **Triage Agent** classifies errors and routes tasks to appropriate agents
3. **Developer Agent** fixes bugs and implements features
4. **QA Agent** verifies fixes and enforces test coverage gates
5. **Security Agent** performs security reviews and compliance checks
6. **Architect Agent** reviews designs and maintains documentation freshness

Agents communicate through a database-driven message queue, creating a collaborative workflow where each agent builds on others' work.

---

## Architecture

### Core Components

#### 1. Database Tables (Migration 041, 051)

**agent_tasks** - Task queue with priority and status tracking

```sql
CREATE TABLE agent_tasks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_type TEXT NOT NULL,
  assigned_to TEXT NOT NULL,
  status TEXT NOT NULL,
  priority INTEGER DEFAULT 5,
  parent_task_id INTEGER,
  context_json TEXT,
  result_json TEXT,
  retry_count INTEGER DEFAULT 0,
  reviewed_by TEXT,
  approval_json TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

**agent_messages** - Inter-agent communication

```sql
CREATE TABLE agent_messages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id INTEGER NOT NULL,
  from_agent TEXT NOT NULL,
  to_agent TEXT NOT NULL,
  message_type TEXT NOT NULL,
  message_text TEXT,
  metadata_json TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

**agent_logs** - Execution audit trail

```sql
CREATE TABLE agent_logs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id INTEGER,
  agent_name TEXT NOT NULL,
  level TEXT NOT NULL,
  message TEXT NOT NULL,
  metadata_json TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

**agent_state** - Agent status and metrics

```sql
CREATE TABLE agent_state (
  agent_name TEXT PRIMARY KEY,
  status TEXT NOT NULL,
  current_task_id INTEGER,
  last_active DATETIME DEFAULT CURRENT_TIMESTAMP,
  metrics_json TEXT
);
```

**agent_outcomes** - Task outcomes for learning (Migration 052)

```sql
CREATE TABLE agent_outcomes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id INTEGER NOT NULL REFERENCES agent_tasks(id) ON DELETE CASCADE,
  agent_name TEXT NOT NULL,
  task_type TEXT NOT NULL,
  outcome TEXT NOT NULL CHECK(outcome IN ('success', 'failure')),
  context_json TEXT, -- Task-specific context (error_type, file_path, etc.)
  result_json TEXT, -- Task result details (what worked, what didn't)
  duration_ms INTEGER,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

This table enables **task history and learning** - agents learn from past successes and failures to improve future performance. See [docs/agents/task-history.md](../agents/task-history.md) for details.

#### 2. Context Files

Each agent loads a base context (~15KB) plus role-specific context:

| Agent     | Context Files          | Total Size |
| --------- | ---------------------- | ---------- |
| Monitor   | base.md + monitor.md   | 20KB       |
| Triage    | base.md + triage.md    | 23.5KB     |
| Developer | base.md + developer.md | 21.3KB     |
| QA        | base.md + qa.md        | 23KB       |
| Security  | base.md + security.md  | 21KB       |
| Architect | base.md + architect.md | 25KB       |

**Location:** `/home/jason/code/333Method/src/agents/contexts/`

#### 3. Agent Framework

**BaseAgent class** (`src/agents/base-agent.js`)

- Task polling and execution
- Message sending/receiving
- Logging and error handling
- Circuit breaker integration

**Utility modules:**

- `context-loader.js` - Merges context files
- `context-builder.js` - Enriches context with task history for learning
- `task-manager.js` - CRUD operations for tasks
- `message-manager.js` - Inter-agent messaging

### Workflow States

Tasks progress through these states:

```
pending -> running -> completed
  |
awaiting_po_approval -> approved -> pending
  |
awaiting_architect_approval -> approved -> pending
  |
failed
  |
blocked
```

**State Descriptions:**

- `pending` - Ready to work on
- `running` - Currently being processed by an agent
- `awaiting_po_approval` - Design proposal waiting for Product Owner sign-off
- `awaiting_architect_approval` - Implementation plan waiting for technical review
- `completed` - Successfully finished
- `failed` - Failed after 3 retry attempts
- `blocked` - Blocked on external dependency or human action

### Approval System

**Product Owner Approval** (for significant changes):

- Required for: Breaking changes, schema migrations, features >4 hours effort
- Workflow: Architect creates design proposal -> PO reviews -> Approve/Reject
- CLI: `npm run agent:approve -- --task-id X --decision approved --reviewer "Jason"`

**Architect Approval** (for all implementation plans):

- Required for: All implementation plans, refactorings, performance optimizations
- Review criteria: Files won't exceed 150 lines, test coverage >=85%, documentation updated
- Workflow: Developer creates plan -> Architect reviews -> Approve/Reject

**Database Schema** (migration 051):

```sql
-- agent_tasks additions
reviewed_by TEXT -- Who approved the task
approval_json TEXT -- {decision, reviewer, timestamp, notes, conditions}
status CHECK(..., 'awaiting_po_approval', 'awaiting_architect_approval')
```

---

## Getting Started

### Prerequisites

1. Database initialized with agent tables (migration 041, 051)
2. Environment variable set: `AGENT_SYSTEM_ENABLED=true`
3. Cron system enabled to run agents every 5 minutes

### Quick Start

#### 1. Enable the Agent System

```bash
# Add to .env
echo "AGENT_SYSTEM_ENABLED=true" >> .env
```

#### 2. Bootstrap the Monitor Agent

The Monitor agent needs an initial task to start its self-scheduling loop:

```bash
npm run agent:create -- --agent monitor --task scan_logs --context '{"incremental":true}' --priority 5
```

#### 3. Verify Agents Are Running

```bash
# Check agent status
npm run agent:list

# View pending tasks
npm run agent:tasks

# View recent logs
npm run agent:logs -- --level info
```

#### 4. Trigger a Test Workflow

```bash
# Test bug fix workflow
npm run agent:workflow -- --workflow bug-fix --error "Test error for verification" --stage scoring

# Check workflow status
npm run agent:tasks
```

---

## Agents

### 1. Monitor Agent

**Role:** System immune system - proactive detection of issues

**Responsibilities:**

- Scan log files for ERROR/FATAL patterns every 5 minutes
- Detect looping errors (same error >3x in 1 hour)
- Monitor stale tasks (pending >1 hour)
- Verify process compliance (expected stage transitions)
- Track agent health (success/failure ratios)
- Check documentation drift daily

**Task Types:**

- `scan_logs` - Incremental log scanning (self-scheduling)
- `check_agent_health` - Monitor agent success rates
- `check_process_compliance` - Verify workflow adherence
- `check_doc_freshness` - Detect stale documentation

**Context Size:** 20KB (base.md + monitor.md)

**Self-Scheduling:** Creates new `scan_logs` task after each completion

**Example:**

```bash
# View Monitor status
npm run agent:list | grep monitor

# View Monitor logs
npm run agent:logs -- --agent-name monitor
```

### 2. Triage Agent

**Role:** Error classifier and task router

**Responsibilities:**

- Classify errors by type (null_pointer, network, database_constraint, api_error, security, configuration)
- Determine severity (critical, high, medium, low)
- Calculate priority (1-10 scale based on severity + impact)
- Route tasks to appropriate agents
- Suggest initial fix approaches

**Task Types:**

- `classify_error` - Analyze error and create appropriate task

**Context Size:** 23.5KB (base.md + triage.md)

**Routing Logic:**

- Security errors -> Security Agent (priority 10)
- Network errors -> Developer Agent
- Database/API errors -> Developer Agent
- Complex architectural issues -> Architect Agent
- Configuration errors -> Developer Agent

**Example:**

```bash
# Manually trigger triage
npm run agent:create -- --agent triage --task classify_error --context '{"error":"TypeError: Cannot read property score of null","file":"src/score.js"}' --priority 7
```

### 3. Developer Agent

**Role:** Bug fixes and feature implementation

**Responsibilities:**

- Analyze error messages and stack traces
- Extract affected file paths
- Generate bug fixes
- Implement new features
- **CRITICAL:** Enforce 85%+ code coverage before commits
- Create git commits (only if coverage gate passes)
- Hand off to QA for verification

**Task Types:**

- `fix_bug` - Analyze and fix bugs
- `implement_feature` - Build new features
- `implementation_plan` - Create detailed implementation plan

**Context Size:** 21.3KB (base.md + developer.md)

**Coverage Gate:**
Developer enforces 85%+ coverage BEFORE creating commits:

1. Make code changes
2. Run `checkCoverageBeforeCommit(files, taskId)`
3. If coverage <85%: Attempt automatic test generation
4. If auto-fix fails: Escalate to Architect for guidance
5. Only commit if coverage >=85%

**Coverage Escalation:**

When coverage <85% and auto-fix fails, Developer asks Architect for guidance:

- Option A: Refactor code for better testability
- Option B: Accept lower coverage with technical debt justification (requires human approval)
- Option C: Provide manual test guidance for complex uncovered branches

**Workflow Example:**

```
1. Receive fix_bug task from Triage
2. Analyze error and identify affected files
3. Generate fix
4. Run coverage check
5. If coverage passes: Create commit
6. Create verify_fix task for QA
7. Send handoff message to QA
```

**Example:**

```bash
# View Developer tasks
npm run agent:tasks -- --assigned-to developer

# Trigger bug fix
npm run agent:workflow -- --workflow bug-fix --error "..." --file src/score.js
```

### 4. QA Agent

**Role:** Test generation, verification, coverage enforcement

**Responsibilities:**

- Generate unit tests for new features
- Verify bug fixes work correctly
- Enforce 80%+ coverage gate (HARD BLOCK on task completion)
- Run test suite and parse coverage reports
- Create feedback for developers on failures
- Tag regression tests

**Task Types:**

- `write_test` - Generate unit test
- `verify_fix` - Verify bug fix works
- `check_coverage` - Ensure 80%+ coverage
- `write_missing_tests` - Fill coverage gaps

**Context Size:** 23KB (base.md + qa.md)

**Coverage Gate:**
QA enforces 80%+ coverage AFTER commits as a second safety layer:

1. Receive verify_fix task
2. Run tests for changed files
3. Check coverage with c8
4. If <80%: Create write_missing_tests task, block parent task
5. If >=80%: Mark task complete

**Example:**

```bash
# View QA tasks
npm run agent:tasks -- --assigned-to qa

# Check recent verifications
npm run agent:logs -- --agent-name qa --level info
```

### 5. Security Agent

**Role:** Security audits, compliance, vulnerability scanning

**Responsibilities:**

- Code security reviews (SQL injection, XSS, command injection)
- Dependency vulnerability scanning (`npm audit`)
- Secrets detection (hardcoded keys, credentials)
- TCPA/CAN-SPAM/GDPR compliance validation
- Track vulnerability remediation time

**Task Types:**

- `audit_code` - Security code review
- `scan_dependencies` - Check for vulnerable dependencies
- `compliance_check` - Validate TCPA/CAN-SPAM adherence
- `scan_secrets` - Detect exposed credentials

**Context Size:** 21KB (base.md + security.md)

**Example:**

```bash
# Trigger security audit
npm run agent:create -- --agent security --task audit_code --context '{"files":["src/outreach/sms.js"]}' --priority 8

# View security findings
npm run agent:logs -- --agent-name security --level error
```

### 6. Architect Agent

**Role:** Design review, refactoring, documentation freshness

**Responsibilities:**

- Design reviews for new features
- Refactoring suggestions based on complexity analysis
- Code complexity monitoring (max 150 lines, complexity 15)
- Documentation freshness checks
- Schema change validation
- Create Architecture Decision Records (ADRs)

**Task Types:**

- `design_proposal` - Create design document for significant changes
- `technical_review` - Review implementation plans
- `suggest_refactor` - Recommend refactoring
- `update_documentation` - Fix stale docs
- `review_design` - Evaluate feature designs

**Context Size:** 25KB (base.md + architect.md)

**Documentation Freshness Checks:**
On every commit, Architect verifies:

- New env vars -> `.env.example` updated?
- New npm scripts -> `README.md` updated?
- New modules -> `CLAUDE.md` updated?
- Schema changes -> `db/schema.sql` + migration?
- Features done -> `docs/TODO.md` updated?

**Example:**

```bash
# Request design review
npm run agent:create -- --agent architect --task design_proposal --context '{"feature":"Dark mode toggle","requirements":["Settings UI","Persistence","Global theme"]}' --priority 6

# View pending reviews
npm run agent:tasks -- --assigned-to architect --status awaiting_po_approval
```

---

## Task Routing

The agent system uses a centralized task routing configuration to ensure tasks are always assigned to the correct agent.

### Routing Configuration

**Location:** `src/agents/utils/task-routing.js`

This module provides:

- `TASK_ROUTING` - Complete mapping of task types to agents
- `getAgentForTaskType(taskType)` - Get correct agent for a task type
- `validateTaskAssignment(taskType, assignedTo)` - Validate task is correctly routed
- `getTaskTypesForAgent(agentName)` - Get all task types an agent handles

### Complete Task Type Reference

| Task Type                       | Agent     | Description                                    |
| ------------------------------- | --------- | ---------------------------------------------- |
| **Developer Tasks**             |           |                                                |
| `fix_bug`                       | developer | Fix bugs identified by Triage                  |
| `implement_feature`             | developer | Implement new features after design approval   |
| `refactor_code`                 | developer | Refactor complex or problematic code           |
| `apply_feedback`                | developer | Address feedback from other agents             |
| `implementation_plan`           | developer | Create detailed implementation plan            |
| **QA Tasks**                    |           |                                                |
| `write_test`                    | qa        | Generate unit tests for code                   |
| `verify_fix`                    | qa        | Verify bug fix works correctly                 |
| `check_coverage`                | qa        | Check test coverage meets 80%+ requirement     |
| `run_tests`                     | qa        | Run test suite for files                       |
| **Security Tasks**              |           |                                                |
| `audit_code`                    | security  | Security code review (SQL injection, XSS, etc) |
| `scan_dependencies`             | security  | Check for vulnerable dependencies              |
| `verify_compliance`             | security  | Validate TCPA/CAN-SPAM/GDPR compliance         |
| `scan_secrets`                  | security  | Detect exposed credentials                     |
| `threat_model`                  | security  | STRIDE threat modeling for component           |
| `fix_security_issue`            | security  | Auto-fix security vulnerabilities              |
| `review_dependency_update`      | security  | Review dependency updates for security         |
| **Architect Tasks**             |           |                                                |
| `design_proposal`               | architect | Create design proposal for features            |
| `technical_review`              | architect | Review implementation plan for soundness       |
| `review_design`                 | architect | Review design against principles               |
| `suggest_refactor`              | architect | Suggest refactoring for complex code           |
| `update_documentation`          | architect | Update documentation with Claude API           |
| `check_documentation_freshness` | architect | Check for stale documentation                  |
| `check_complexity`              | architect | Check code complexity metrics                  |
| `audit_documentation`           | architect | Verify documentation matches reality           |
| `check_branch_health`           | architect | Check for stale branches                       |
| `profile_performance`           | architect | Profile pipeline performance                   |
| `review_documentation`          | architect | Review documentation accuracy                  |
| **Triage Tasks**                |           |                                                |
| `classify_error`                | triage    | Classify error and route to agent              |
| `route_task`                    | triage    | Route generic task to agent                    |
| `prioritize_tasks`              | triage    | Prioritize pending tasks                       |
| **Monitor Tasks**               |           |                                                |
| `scan_logs`                     | monitor   | Scan logs for errors (self-scheduling)         |
| `check_agent_health`            | monitor   | Monitor agent success rates                    |
| `check_process_compliance`      | monitor   | Verify workflow adherence                      |
| `detect_anomaly`                | monitor   | Detect anomalous behavior                      |
| `check_pipeline_health`         | monitor   | Check pipeline for blockages                   |
| `check_slo_compliance`          | monitor   | Check SLO compliance metrics                   |

### Auto-Delegation

When an agent receives a task type it doesn't handle, it automatically delegates to the correct agent using `BaseAgent.delegateToCorrectAgent()`:

**Example:** If `implement_feature` is mistakenly assigned to `monitor`:

1. Monitor calls `delegateToCorrectAgent(task)`
2. Creates new task assigned to `developer`
3. Completes original task with delegation note
4. Logs routing correction for analysis

This prevents "Unknown task type" errors and ensures no tasks are lost due to misrouting.
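
The routing lookup and auto-delegation can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/agents/utils/task-routing.js`: the map holds only a few entries from the task type reference, the function bodies are assumptions about how helpers with those names might behave, and `delegate` with its `createTask` callback is a hypothetical stand-in for `BaseAgent.delegateToCorrectAgent()`. Completing the original task and logging the correction are left out.

```javascript
// Subset of the task-type -> agent mapping (illustrative, not the full table).
const TASK_ROUTING = Object.freeze({
  fix_bug: 'developer',
  implement_feature: 'developer',
  verify_fix: 'qa',
  audit_code: 'security',
  review_dependency_update: 'security',
  review_documentation: 'architect',
  classify_error: 'triage',
  scan_logs: 'monitor',
});

// Returns the agent for a task type, or null when the type is unknown.
function getAgentForTaskType(taskType) {
  const known = Object.prototype.hasOwnProperty.call(TASK_ROUTING, taskType);
  return known ? TASK_ROUTING[taskType] : null;
}

// Reports whether a task is assigned to the agent the map expects.
function validateTaskAssignment(taskType, assignedTo) {
  const expected = getAgentForTaskType(taskType);
  return { valid: expected !== null && expected === assignedTo, expected };
}

// Lists every task type routed to the given agent.
function getTaskTypesForAgent(agentName) {
  return Object.keys(TASK_ROUTING).filter((type) => TASK_ROUTING[type] === agentName);
}

// Hypothetical delegation step: re-create a misrouted task for the right agent.
// `createTask` is an assumed callback that persists a task and returns it.
function delegate(task, createTask) {
  const { valid, expected } = validateTaskAssignment(task.task_type, task.assigned_to);
  if (valid || expected === null) return null; // correctly routed, or unknown type
  return createTask({
    task_type: task.task_type,
    assigned_to: expected,
    parent_task_id: task.id, // assumption: link back to the misrouted task
    context_json: task.context_json,
  });
}
```

In this sketch, `delegate({ id: 1, task_type: 'implement_feature', assigned_to: 'monitor' }, createTask)` calls `createTask` with `assigned_to: 'developer'`, while a correctly routed task returns `null` and creates nothing.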

### Common Routing Errors Fixed

**Before (Errors):**

- `implement_feature` -> monitor, triage, qa, security, architect (wrong)
- `fix_bug` -> architect (wrong)
- `review_documentation` -> unknown (wrong)
- `review_dependency_update` -> unknown (wrong)

**After (Correct Routing):**

- `implement_feature` -> developer
- `fix_bug` -> developer
- `review_documentation` -> architect
- `review_dependency_update` -> security

### Testing

Run task routing tests:

```bash
node --test tests/agents/task-routing.test.js
```

This validates all task types are correctly mapped and delegation works properly.

---

## Workflows

### Standard Workflow Types

#### 1. Feature Implementation (Significant)

Used for breaking changes, database migrations, or features >4 hours effort.

```
Product Request
  |
Architect: design_proposal
  |
Status: awaiting_po_approval
  |
PO Reviews -> Approves/Rejects
  | (approved)
Developer: implementation_plan
  |
Status: awaiting_architect_approval
  |
Architect: technical_review -> Approves/Rejects
  | (approved)
Developer: implement_feature
  |
QA: verify_fix
  |
Security: audit_code (if needed)
```

**Example:**

```bash
npm run agent:workflow -- --workflow feature --description "Add two-factor authentication" --requirements '["SMS OTP","Email backup codes","Recovery process"]'

# View approval queue
npm run agent:approvals -- --status awaiting_po_approval

# Approve design
npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision approved
```

#### 2. Feature Implementation (Minor)

Used for small features <=4 hours, no breaking changes or migrations.
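
The split between the significant and minor feature workflows follows from the criteria stated for each. A minimal sketch of that decision, assuming a hypothetical request object with `estimatedHours`, `breakingChange`, and `schemaMigration` fields (none of these names come from the codebase):

```javascript
// Threshold from the workflow descriptions: more than 4 hours counts as significant.
const SIGNIFICANT_EFFORT_HOURS = 4;

// Hypothetical request shape: { estimatedHours, breakingChange, schemaMigration }.
function isSignificantFeature(request) {
  return Boolean(
    request.breakingChange ||
      request.schemaMigration ||
      request.estimatedHours > SIGNIFICANT_EFFORT_HOURS
  );
}

// Maps a request to the workflow variant and whether the design needs PO sign-off.
function planFeatureWorkflow(request) {
  const significant = isSignificantFeature(request);
  return {
    variant: significant ? 'significant' : 'minor',
    // Significant designs wait for the Product Owner; minor ones are auto-approved.
    requiresPoApproval: significant,
  };
}
```

A request estimated at exactly 4 hours with no breaking change or migration takes the minor path; any one of the three criteria is enough to move it to the significant path.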

```
Product Request
  |
Architect: design_proposal (auto-approved)
  |
Developer: implementation_plan
  |
Architect: technical_review
  |
Developer: implement_feature
  |
QA: verify_fix
```

**Example:**

```bash
npm run agent:workflow -- --workflow feature --description "Add logging to enrich stage" --requirements '["Log contact count","Log errors"]'
```

#### 3. Bug Fix (Architectural)

For bugs affecting multiple modules or requiring schema changes.

```
Error Detected
  |
Triage: classify_error -> architectural
  |
Architect: design_proposal
  |
Status: awaiting_po_approval
  |
PO Approves
  |
Developer: implementation_plan
  |
Architect: technical_review
  |
Developer: fix_bug
  |
QA: verify_fix
```

#### 4. Bug Fix (Standard)

For isolated bugs in a single file with low complexity.

```
Error Detected
  |
Triage: classify_error -> simple
  |
Developer: fix_bug
  |
QA: verify_fix
```

**Example:**

```bash
npm run agent:workflow -- --workflow bug-fix --error "TypeError: Cannot read property 'score' of null" --file src/score.js --stack "..."
```

#### 5. Refactor Workflow

For code complexity reduction or architectural improvements.

```
Complexity Detected
  |
Architect: design_proposal
  |
Developer: implementation_plan
  |
Architect: technical_review
  |
Developer: implement refactoring
  |
QA: verify_fix (ensure no regressions)
```

**Example:**

```bash
npm run agent:workflow -- --workflow refactor --file src/utils/stealth-browser.js --reason "Cyclomatic complexity exceeds 15"
```

### Validation Rules

All tasks validate workflow dependencies before creation:

1. **implement_feature** requires approved `design_proposal` parent
2. **Developer implementation** requires approved `implementation_plan`
3. **QA verification** requires completed Developer task
4. **Parent tasks** must be completed before children start

### Approval System

#### Product Owner Approval

**Required for:**

- Breaking changes
- Database migrations
- Features with >4 hours estimated effort
- Changes explicitly marked "significant"

**Process:**

1. Architect creates design_proposal task
2. Task status -> `awaiting_po_approval`
3. Task appears in human_review_queue
4. PO reviews via CLI: `npm run agent:approvals`
5. PO approves/rejects via: `npm run agent:approve`

**Approval Schema:**

```json
{
  "decision": "approved | approved_with_conditions | rejected",
  "reviewer": "Jason",
  "timestamp": "2026-02-15T10:30:00Z",
  "notes": "Looks good, keep scope tight",
  "conditions": ["Max 2 files", "No new dependencies"]
}
```

#### Architect Approval

**Required for:**

- All implementation plans
- Refactorings
- Performance optimizations

**Review Criteria:**

- Files won't exceed 150 lines
- Test coverage >=85%
- Documentation updated
- No circular dependencies
- Follows architectural patterns

**Process:**

1. Developer creates implementation_plan
2. Task status -> `awaiting_architect_approval`
3. Architect agent reviews plan
4. Creates technical_review task
5. Approves -> status back to `pending`, Developer proceeds
6. Rejects -> feedback to Developer, plan revised

---

## CLI Commands

### View Agent Status

```bash
# List all agents with current status
npm run agent:list

# Output:
# Agent: monitor, Status: idle, Last run: 2026-02-15 10:25:00
# Agent: developer, Status: running, Current task: 42
# Circuit breaker: All agents operational
```

### Manage Tasks

```bash
# View all pending tasks
npm run agent:tasks

# View tasks for specific agent
npm run agent:tasks -- --assigned-to developer

# View tasks by status
npm run agent:tasks -- --status pending
npm run agent:tasks -- --status awaiting_po_approval

# View specific task details
npm run agent:tasks -- --task-id 42
```

### Create Tasks Manually

```bash
# Create task for developer
npm run agent:create -- --agent developer --task fix_bug --context '{"error":"...","file":"src/score.js"}' --priority 7

# Create task for QA
npm run agent:create -- --agent qa --task write_test --context '{"module":"scoring","function":"calculateScore"}' --priority 5
```

### Trigger Workflows

```bash
# Bug fix workflow
npm run agent:workflow -- --workflow bug-fix --error "TypeError: Cannot read property 'score' of null" --stage scoring

# Feature workflow
npm run agent:workflow -- --workflow feature --description "Add export to CSV" --requirements '["Export button","CSV format","Download trigger"]'

# Refactor workflow
npm run agent:workflow -- --workflow refactor --file src/utils/stealth-browser.js --reason "Cyclomatic complexity exceeds 15"
```

### Manage Approvals

```bash
# View all pending approvals
npm run agent:approvals

# Filter by approval type
npm run agent:approvals -- --status awaiting_po_approval
npm run agent:approvals -- --status awaiting_architect_approval

# Approve task
npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision approved

# Approve with conditions
npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision approved_with_conditions --notes "Keep it simple" --conditions "Max 2 files,No new dependencies"

# Reject task
npm run agent:approve -- --task-id 42 --reviewer "Jason" --decision rejected --notes "Scope too large, break into smaller pieces"
```

### View Workflow Status

```bash
# View workflow tree (parent/child tasks)
npm run agent:workflow:status -- --workflow-id 42

# Output shows task hierarchy and status
```

### View Logs

```bash
# View all agent logs
npm run agent:logs

# Filter by agent
npm run agent:logs -- --agent-name developer

# Filter by task
npm run agent:logs -- --task-id 42

# Filter by level
npm run agent:logs -- --level error
npm run agent:logs -- --agent-name developer --level error
```

### View Statistics

```bash
# View success rates and metrics
npm run agent:stats

# Output:
# Agent: developer, Tasks: 45, Success: 42, Failure: 3, Rate: 93%
# Agent: qa, Tasks: 38, Success: 38, Failure: 0, Rate: 100%
# Circuit breaker: All agents operational
```

### Run Agents Manually

```bash
# Run all agents once
npm run agent:run

# Run with verbose logging
npm run agent:run -- --verbose

# Process up to N tasks
npm run agent:run -- --tasks=10

# Run single agent
npm run agent:run:single
```

---

## Horizontal Scaling

The agent system supports horizontal scaling through row-level task locking, allowing multiple instances of the same agent to run concurrently without conflicts.

### How It Works

**Row-Level Locking:**

- Each agent instance atomically claims individual tasks from the database
- SQLite transactions ensure only one instance can claim any given task
- Multiple instances safely process different tasks simultaneously
- No duplicate processing, even with 5+ concurrent instances

**Configuration:**

```env
# Enable row-level locking (default: true)
AGENT_ENABLE_ROW_LOCKING=true

# Allow concurrent instances of same agent (default: false)
AGENT_ALLOW_CONCURRENT_INSTANCES=true
```

### Running Multiple Instances

**Example: 3 Developer Agents:**

```bash
# Terminal 1
npm run agent:run:single developer &

# Terminal 2
npm run agent:run:single developer &

# Terminal 3
npm run agent:run:single developer &
```

All three instances will process different tasks concurrently. Work distribution is automatic and race-condition safe.

### Performance Benefits

**Task Throughput:**

- 1 developer agent: ~5-10 tasks/hour (depending on complexity)
- 3 developer agents: ~15-30 tasks/hour (3x throughput)
- 5 developer agents: ~25-50 tasks/hour (5x throughput, diminishing returns)

**When to Scale:**

- High task queue depth (>20 pending tasks)
- Long-running tasks (>5 minutes each)
- Time-sensitive workflows (critical bug fixes)

### Safety Mechanisms

**Agent-Level Locking (Optional):**

```env
# Set to true to turn off agent-level locking for horizontal scaling
AGENT_ALLOW_CONCURRENT_INSTANCES=true
```

When `AGENT_ALLOW_CONCURRENT_INSTANCES` is `false` (the default), agent-level locking stays on and only one instance of each agent runs at a time (backwards compatible).
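
The row-level claim described under How It Works behaves like a compare-and-set on the task's status. The sketch below models that in memory only; the real system performs the transition inside a SQLite transaction. `createTaskStore`, `tryClaim`, and `claimNext` are hypothetical names, and treating a larger `priority` number as more urgent is an assumption based on security errors being routed at priority 10.

```javascript
// In-memory stand-in for the agent_tasks table (illustration only).
function createTaskStore(rows) {
  const tasks = rows.map((row) => ({ ...row }));

  // Models a conditional update: the pending -> running transition succeeds
  // only if the row is still pending at the moment of the claim.
  function tryClaim(taskId) {
    const task = tasks.find((t) => t.id === taskId);
    if (!task || task.status !== 'pending') return false;
    task.status = 'running';
    return true;
  }

  // Picks the most urgent pending task for an agent and tries to claim it.
  // Assumption: higher priority number first, then oldest id.
  function claimNext(agentName) {
    const candidates = tasks
      .filter((t) => t.assigned_to === agentName && t.status === 'pending')
      .sort((a, b) => b.priority - a.priority || a.id - b.id);
    for (const candidate of candidates) {
      if (tryClaim(candidate.id)) return { ...candidate };
    }
    return null; // nothing left to claim
  }

  return { tryClaim, claimNext };
}
```

Two instances calling `claimNext('developer')` against the same store receive different tasks, and a second `tryClaim` on an already claimed id returns `false`, which is the property that prevents duplicate processing.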

**Task States:**

- `pending` -> Available for claiming
- `running` -> Claimed by an instance (atomic transition)
- `completed` -> Finished successfully
- `failed` -> Error after max retries

**Edge Cases Handled:**

- Race conditions: Transaction ensures atomic claiming
- Crashed instances: Stale lock cleanup after 2 minutes
- Duplicate processing: Prevented by atomic UPDATE WHERE status='pending'

### Monitoring Concurrent Agents

**View running instances:**

```bash
# Check agent states
npm run agent:list

# Monitor task processing
watch -n 5 'npm run agent:tasks'
```

**Database query:**

```sql
SELECT
  agent_name,
  COUNT(*) as processing_count,
  GROUP_CONCAT(id) as task_ids
FROM agent_tasks
WHERE status = 'running'
GROUP BY agent_name;
```

### Limitations

**SQLite Concurrency:**

- WAL mode recommended for high concurrency
- ~10 concurrent writers is safe limit
- Consider PostgreSQL for >10 instances

**Cost Considerations:**

- Each instance makes LLM API calls
- Budget enforcement: `AGENT_DAILY_BUDGET=10` (USD)
- Emergency shutdown if >$5/hour spend rate

### Scaling Best Practices

1. **Start with 2-3 instances** - Verify row-level locking works correctly
2. **Monitor task completion** - Ensure no duplicate processing
3. **Check database locks** - Avoid SQLite contention
4. **Scale gradually** - Add instances as queue depth increases
5. **Use priority wisely** - High-priority tasks processed first

### Testing Concurrent Locking

```bash
# Run concurrent locking tests
npm test tests/agents/concurrent-locking.test.js
```

Tests verify:

- No duplicate task processing
- Correct priority ordering
- Work distribution across instances
- Agent isolation (developer vs QA)
- Backwards compatibility with single instance

---

## Configuration

### Environment Variables

```bash
# Enable/disable agent system
AGENT_SYSTEM_ENABLED=true

# Circuit breaker threshold (30% failure rate triggers disable)
AGENT_CIRCUIT_BREAKER_THRESHOLD=0.3

# Rate limit (max invocations per hour)
AGENT_MAX_INVOCATIONS_PER_HOUR=60

# Immediate invocation (default: true)
# Event-driven agent invocation eliminates 5-minute cron delays
# Agents invoke each other immediately after handoffs and task creation
# Speeds up workflows 10-15x (from 15-20 min to < 2 min)
# See docs/IMMEDIATE-INVOCATION.md for details
AGENT_IMMEDIATE_INVOCATION=true

# Max chain depth (default: 10)
# Prevents infinite loops by limiting consecutive immediate invocations
# After reaching depth, agents fall back to cron polling
AGENT_MAX_CHAIN_DEPTH=10

# Database path (for testing)
DATABASE_PATH=./db/sites.db
```

### Quality Gates

**Developer Agent:**

- **Coverage gate:** 85%+ required BEFORE commits (HARD BLOCK)
- Automatic test generation attempted if coverage <85%
- Escalates to Architect if auto-fix fails

**QA Agent:**

- **Coverage gate:** 80%+ required to approve tasks (HARD BLOCK)
- Creates `write_missing_tests` task if coverage <80%
- Blocks parent task until coverage improves

**Other Gates:**

- **Retry limit:** 3 retries per task before marking as failed
- **Task TTL:** Tasks pending >1 hour escalate to human review
- **Circuit breaker:** >30% failure rate disables agent

**Coverage Enforcement:**

1. **Developer Agent** (85% gate): Checks coverage BEFORE commit
   - Blocks commit if any changed source file <85%
   - Attempts to write tests automatically
   - Escalates to Architect if auto-fix fails
2. **QA Agent** (80% gate): Checks coverage AFTER commit
   - Blocks task completion if <80%
   - Creates `write_test` task for missing coverage
   - Provides second layer of enforcement

### Scheduling

**Immediate Invocation** (Event-Driven):

Agents are invoked immediately when:

- Another agent hands off a task (`handoff()`)
- A new task is created (`createTask()`)

This eliminates 5-minute cron delays, speeding up workflows **10-15x** (from 15-20 minutes to < 2 minutes).

See [IMMEDIATE-INVOCATION.md](../IMMEDIATE-INVOCATION.md) for details.

**Cron Fallback** (Scheduled Polling):

Agents also run via cron job every 5 minutes as a safety net:

```sql
-- cron_jobs table entry
INSERT INTO cron_jobs (name, schedule, handler, enabled)
VALUES ('agent-runner', '*/5 * * * *', 'node src/agents/runner.js', 1);
```

Cron picks up tasks that were missed by immediate invocation (e.g., due to errors or depth limits).

**Manual control:**

- Start: Set `enabled = 1` in cron_jobs
- Stop: Set `enabled = 0` in cron_jobs
- One-time run: `npm run agent:run`

---

## Safety Features

### Circuit Breaker

**Purpose:** Prevent runaway agent failures from consuming resources

**How it works:**

1. Monitors agent success/failure ratios
2. If failure rate >30% (and >=10 tasks completed): Trigger circuit breaker
3. Agent status -> `blocked`
4.
Timestamp recorded in `agent_state.metrics_json` 1210 5. Manual reset required 1211 1212 **When triggered:** 1213 1214 - Agent logged to `human_review_queue` 1215 - All tasks for that agent paused 1216 - Root cause investigation required 1217 1218 **Reset:** 1219 1220 ```sql 1221 UPDATE agent_state 1222 SET status = 'idle', 1223 metrics_json = json_remove(metrics_json, '$.circuit_breaker_triggered_at') 1224 WHERE agent_name = 'developer'; 1225 ``` 1226 1227 ### Escalation to Human Review 1228 1229 Tasks automatically escalate to `human_review_queue` for: 1230 1231 - Database schema changes 1232 - Breaking API changes 1233 - Security-sensitive changes (auth, secrets, compliance) 1234 - Circuit breaker triggers 1235 - Stale tasks (pending >1 hour) 1236 - Failed tasks after 3 retries 1237 1238 **Review queue:** 1239 1240 ```bash 1241 # View human review items 1242 npm run agent:approvals 1243 1244 # Approve/reject from queue 1245 npm run agent:approve -- --task-id <id> --reviewer "Name" --decision approved|rejected 1246 ``` 1247 1248 ### Audit Trail 1249 1250 Complete tracking of all agent actions: 1251 1252 **agent_logs table:** 1253 1254 - Every task execution logged with level (info, warning, error) 1255 - Metadata includes context, decisions, file paths 1256 1257 **agent_messages table:** 1258 1259 - All inter-agent communication recorded 1260 - Message types: handoff, question, answer, notification 1261 1262 **agent_tasks table:** 1263 1264 - Task status changes tracked 1265 - Retry attempts logged 1266 - Result stored in result_json 1267 1268 **agent_state table:** 1269 1270 - Agent status changes 1271 - Metrics tracked (success/failure rates) 1272 - Last active timestamps 1273 1274 ### Rollback Protection 1275 1276 **Before making changes:** 1277 1278 1. Developer agent checks coverage 1279 2. Architect reviews implementation plan 1280 3. QA verifies changes don't break tests 1281 1282 **If something breaks:** 1283 1284 1. Monitor detects errors in logs 1285 2. 
Triage classifies and routes 1286 3. Developer creates fix 1287 4. Workflow repeats with proper gates 1288 1289 **Manual rollback:** 1290 1291 ```bash 1292 # View recent changes 1293 git log --oneline -5 1294 1295 # Rollback if needed 1296 git revert <commit-hash> 1297 1298 # Trigger QA verification 1299 npm run agent:create -- --agent qa --task verify_fix --context '{"commit":"..."}' --priority 10 1300 ``` 1301 1302 --- 1303 1304 ## Cost Management 1305 1306 ### Token Usage Reduction 1307 1308 **Monolithic approach:** 100-150KB per invocation (full CLAUDE.md) 1309 **Multi-agent approach:** 20-25KB per invocation (base + role context) 1310 **Reduction:** 75-85% 1311 1312 ### Breakdown by Agent 1313 1314 | Agent | Context Size | Tokens/Invocation | Reduction | 1315 | --------- | ------------ | ----------------- | --------- | 1316 | Monitor | 20KB | ~5,000 | 80% | 1317 | Triage | 23.5KB | ~6,000 | 76% | 1318 | Developer | 21.3KB | ~5,300 | 79% | 1319 | QA | 23KB | ~5,800 | 77% | 1320 | Security | 21KB | ~5,200 | 79% | 1321 | Architect | 25KB | ~6,200 | 75% | 1322 1323 ### Rate Limiting 1324 1325 **Environment variable:** 1326 1327 ```bash 1328 AGENT_MAX_INVOCATIONS_PER_HOUR=60 1329 ``` 1330 1331 **How it works:** 1332 1333 - Tracks invocations per hour in `agent_state.metrics_json` 1334 - If limit exceeded: Agent status -> `blocked` 1335 - Resets every hour 1336 1337 **Monitoring:** 1338 1339 ```bash 1340 # Check invocation counts 1341 npm run agent:stats 1342 1343 # View recent logs 1344 npm run agent:logs -- --agent-name developer 1345 ``` 1346 1347 ### Budget Controls 1348 1349 **Prevent cost overruns:** 1350 1351 1. **Set rate limits:** `AGENT_MAX_INVOCATIONS_PER_HOUR=60` 1352 2. **Monitor stats:** `npm run agent:stats` daily 1353 3. **Review logs:** Check for unnecessary task creation 1354 4. **Optimize context:** Keep context files lean and focused 1355 5. 
**Use Haiku for simple tasks:** `AGENT_USE_HAIKU_FOR_SIMPLE_TASKS=true` (50-70% cost reduction) 1356 1357 ### Smart Model Selection (Haiku vs Sonnet) 1358 1359 **Cost optimization:** The agent system automatically selects the appropriate model based on task complexity. 1360 1361 **Cost comparison:** 1362 1363 | Model | Input | Output | Use Case | 1364 | ----------------- | -------------- | ----------------- | ------------------------------------- | 1365 | Claude 3.5 Haiku | $0.80/M | $4.00/M | Simple pattern-based tasks | 1366 | Claude 3.5 Sonnet | $3.00/M | $15.00/M | Complex reasoning & code generation | 1367 | **Cost Savings** | **4x cheaper** | **3.75x cheaper** | **50-70% reduction for simple tasks** | 1368 1369 **Haiku tasks (simple/pattern-based):** 1370 1371 - **Triage:** Error classification via pattern matching 1372 - **Monitor:** Log scanning and anomaly detection 1373 - **Security:** Regex-based security checks (SQL injection, secrets, command injection patterns) 1374 - **QA:** Test file discovery and simple test generation 1375 1376 **Sonnet tasks (complex reasoning):** 1377 1378 - **Developer:** Bug fixing and code generation 1379 - **Architect:** Design reviews and architectural decisions 1380 - **Security:** Advanced threat modeling (STRIDE analysis) 1381 - **QA:** Coverage analysis and complex integration tests 1382 1383 **Configuration:** 1384 1385 ```bash 1386 # Enable Haiku optimization (default: true for 50-70% cost reduction) 1387 AGENT_USE_HAIKU_FOR_SIMPLE_TASKS=true 1388 ``` 1389 1390 **Override model selection:** 1391 1392 ```javascript 1393 // Force Haiku 1394 const result = await classifyIssue(agentName, taskId, errorMessage, { 1395 model: 'claude-3-5-haiku-20241022', 1396 }); 1397 1398 // Force Sonnet 1399 const result = await analyzeCode(agentName, taskId, filePath, prompt, { 1400 model: 'claude-3-5-sonnet-20241022', 1401 complexity: 'complex', 1402 }); 1403 ``` 1404 1405 **Track cost savings:** 1406 1407 ```bash 1408 npm run 
agent:stats 1409 ``` 1410 1411 Output includes model breakdown: 1412 1413 ```json 1414 { 1415 "modelBreakdown": { 1416 "haiku": { 1417 "calls": 150, 1418 "cost": 0.45, 1419 "avgCost": 0.003 1420 }, 1421 "sonnet": { 1422 "calls": 50, 1423 "cost": 1.2, 1424 "avgCost": 0.024 1425 }, 1426 "savings": "27.3" 1427 } 1428 } 1429 ``` 1430 1431 **Expected savings:** 1432 1433 - Monitor/Triage agents: 60-70% cost reduction (mostly Haiku) 1434 - Developer/Architect: 10-20% cost reduction (mostly Sonnet) 1435 - Security: 30-40% cost reduction (mix of simple checks and complex modeling) 1436 - QA: 40-50% cost reduction (test generation uses Haiku, coverage analysis uses Sonnet) 1437 1438 6. **Use circuit breakers:** Prevent runaway failures 1439 1440 **Cost estimation:** 1441 1442 - Average task: ~6,000 tokens input + ~2,000 tokens output = 8,000 tokens 1443 - At 60 invocations/hour: ~480,000 tokens/hour 1444 - At $3/M tokens (Sonnet): ~$1.44/hour 1445 - Daily cost (24 hours): ~$35 1446 1447 **Cost optimization tips:** 1448 1449 - Reduce task creation frequency if logs are clean 1450 - Increase Monitor scan interval from 5 to 10 minutes 1451 - Disable agents not currently needed 1452 - Use smaller models for simple tasks (Haiku for classification) 1453 1454 --- 1455 1456 ## Communication Patterns 1457 1458 ### Pattern 1: Task Handoff 1459 1460 ```javascript 1461 await agent.createTask({ 1462 task_type: 'verify_fix', 1463 assigned_to: 'qa', 1464 parent_task_id: 123, 1465 priority: 5, 1466 context: { 1467 files_changed: ['src/score.js'], 1468 fix_commit: 'abc123', 1469 test_instructions: 'Verify null check works', 1470 }, 1471 }); 1472 ``` 1473 1474 ### Pattern 2: Question & Answer 1475 1476 ```javascript 1477 await agent.askQuestion( 1478 taskId, 1479 'developer', 1480 'Should this test cover mobile and desktop screenshots?' 
1481 ); 1482 ``` 1483 1484 ### Pattern 3: Workflow Chain 1485 1486 Triage -> Developer -> QA -> Security (automated handoffs) 1487 1488 --- 1489 1490 ## Testing 1491 1492 ### Unit Tests 1493 1494 ```bash 1495 # Test individual agents 1496 node --test tests/agents/triage.test.js 1497 node --test tests/agents/developer.test.js 1498 ``` 1499 1500 ### Integration Tests 1501 1502 ```bash 1503 # Test full workflows 1504 node --test tests/agents/workflow.integration.test.js 1505 ``` 1506 1507 ### E2E Integration Tests 1508 1509 **Location:** `tests/agents-e2e-implementation.test.js` 1510 1511 Comprehensive end-to-end tests for the complete agent workflow system. Tests real-world scenarios: 1512 1513 1. **Bug Fix Workflow** - Triage -> Developer -> QA complete success path 1514 2. **Feature Implementation** - Multi-agent collaboration with 85%+ coverage target 1515 3. **Security Fix** - High priority workflow with security verification 1516 4. **Coverage Improvement** - QA proactively improving test coverage 1517 5. **Rollback on Failure** - Error recovery and retry with different approach 1518 6. 
**Budget Enforcement** - API call limits and emergency shutdown 1519 1520 **Test Features:** 1521 1522 - Isolated test database (`db/test-agents-e2e-impl.db`) 1523 - Mock Anthropic API for faster tests 1524 - Parent-child task relationship verification 1525 - Inter-agent messaging validation 1526 - Database integrity checks (foreign keys, constraints) 1527 - No database pollution between tests 1528 1529 **Run Tests:** 1530 1531 ```bash 1532 node --experimental-test-module-mocks --test tests/agents-e2e-implementation.test.js 1533 ``` 1534 1535 **Test Duration:** ~3 minutes for all 12 tests 1536 1537 ### Test Results (Phase 5) 1538 1539 - Triage Agent: 28/28 passed 1540 - Developer Agent: 15/16 passed 1541 - Workflow Integration: 5/6 passed 1542 1543 --- 1544 1545 ## Creating New Agents 1546 1547 ### Step 1: Create Agent Class 1548 1549 ```javascript 1550 // src/agents/my-agent.js 1551 import { BaseAgent } from './base-agent.js'; 1552 1553 export class MyAgent extends BaseAgent { 1554 constructor() { 1555 super('my-agent', ['base.md', 'my-agent.md']); 1556 } 1557 1558 async processTask(task) { 1559 if (task.task_type === 'my_task_type') { 1560 await this.handleMyTask(task); 1561 } 1562 } 1563 1564 async handleMyTask(task) { 1565 const { context_json } = task; 1566 const context = JSON.parse(context_json); 1567 1568 // Do work... 1569 1570 await this.completeTask(task.id, { result: 'success' }); 1571 } 1572 } 1573 ``` 1574 1575 ### Step 2: Create Context File 1576 1577 ```markdown 1578 <!-- src/agents/contexts/my-agent.md --> 1579 1580 # My Agent Context 1581 1582 ## Responsibilities 1583 1584 - Task 1 1585 - Task 2 1586 1587 ## Task Types 1588 1589 - my_task_type 1590 1591 ## Best Practices 1592 1593 ... 1594 ``` 1595 1596 ### Step 3: Add to Runner 1597 1598 ```javascript 1599 // src/agents/runner.js 1600 import { MyAgent } from './my-agent.js'; 1601 1602 const agents = [ 1603 // ... 
existing agents 1604 new MyAgent(), 1605 ]; 1606 ``` 1607 1608 ### Step 4: Update Database 1609 1610 ```sql 1611 -- Add to agent_state table 1612 INSERT INTO agent_state (agent_name, status) 1613 VALUES ('my-agent', 'idle'); 1614 1615 -- Update CHECK constraints if needed 1616 ALTER TABLE agent_tasks 1617 ADD CONSTRAINT agent_tasks_assigned_to_check 1618 CHECK (assigned_to IN ('developer', 'qa', 'security', 'architect', 'triage', 'monitor', 'my-agent')); 1619 ``` 1620 1621 ### Step 5: Write Tests 1622 1623 ```javascript 1624 // tests/agents/my-agent.test.js 1625 import { test, describe } from 'node:test'; 1626 import { MyAgent } from '../../src/agents/my-agent.js'; 1627 1628 describe('MyAgent', () => { 1629 test('processes my_task_type tasks', async () => { 1630 // ... 1631 }); 1632 }); 1633 ``` 1634 1635 --- 1636 1637 ## Troubleshooting 1638 1639 ### Agent Not Processing Tasks 1640 1641 **Symptoms:** 1642 1643 - Tasks stuck in `pending` status 1644 - Agent status shows `blocked` 1645 - No recent logs for agent 1646 1647 **Diagnosis:** 1648 1649 ```bash 1650 # Check agent status 1651 npm run agent:list 1652 1653 # Check circuit breaker 1654 npm run agent:stats 1655 1656 # View error logs 1657 npm run agent:logs -- --agent-name developer --level error 1658 ``` 1659 1660 **Solutions:** 1661 1662 1. **Circuit breaker triggered:** 1663 1664 ```sql 1665 -- Check metrics 1666 SELECT metrics_json FROM agent_state WHERE agent_name = 'developer'; 1667 1668 -- Reset if safe 1669 UPDATE agent_state 1670 SET status = 'idle', 1671 metrics_json = json_remove(metrics_json, '$.circuit_breaker_triggered_at') 1672 WHERE agent_name = 'developer'; 1673 ``` 1674 1675 2. **Rate limit exceeded:** 1676 1677 ```bash 1678 # Wait for hourly reset, or increase limit 1679 AGENT_MAX_INVOCATIONS_PER_HOUR=120 1680 ``` 1681 1682 3. 
**Agent disabled:** 1683 1684 ```sql 1685 -- Re-enable agent 1686 UPDATE agent_state SET status = 'idle' WHERE agent_name = 'developer'; 1687 ``` 1688 1689 ### Tasks Stuck in Pending 1690 1691 **Symptoms:** 1692 1693 - Tasks created but never start 1694 - Task age >1 hour 1695 1696 **Diagnosis:** 1697 1698 ```bash 1699 # View pending tasks 1700 npm run agent:tasks -- --status pending 1701 1702 # Check if agents are running 1703 npm run agent:list 1704 1705 # Check task dependencies 1706 npm run agent:workflow:status -- --workflow-id 42 1707 ``` 1708 1709 **Solutions:** 1710 1711 1. **Parent task incomplete:** 1712 - Tasks with `parent_task_id` won't start until parent completes 1713 - Check parent status: `npm run agent:tasks -- --task-id <parent_id>` 1714 - Complete or cancel parent task 1715 1716 2. **Agent not running:** 1717 - Check cron job enabled: `SELECT * FROM cron_jobs WHERE name = 'agent-runner';` 1718 - Enable: `UPDATE cron_jobs SET enabled = 1 WHERE name = 'agent-runner';` 1719 - Manual run: `npm run agent:run` 1720 1721 3. **Task priority too low:** 1722 - Increase priority: `UPDATE agent_tasks SET priority = 10 WHERE id = 42;` 1723 1724 ### High Token Costs 1725 1726 **Symptoms:** 1727 1728 - Higher than expected API bills 1729 - Many agent invocations 1730 1731 **Diagnosis:** 1732 1733 ```bash 1734 # Check invocation counts 1735 SELECT agent_name, COUNT(*) as invocations 1736 FROM agent_logs 1737 WHERE created_at > datetime('now', '-1 hour') 1738 GROUP BY agent_name; 1739 1740 # Check task creation rate 1741 SELECT task_type, COUNT(*) as count 1742 FROM agent_tasks 1743 WHERE created_at > datetime('now', '-24 hours') 1744 GROUP BY task_type; 1745 ``` 1746 1747 **Solutions:** 1748 1749 1. **Reduce invocation frequency:** 1750 1751 ```bash 1752 # Lower rate limit 1753 AGENT_MAX_INVOCATIONS_PER_HOUR=30 1754 1755 # Increase Monitor scan interval 1756 # Edit cron_jobs: '*/5 * * * *' -> '*/10 * * * *' 1757 ``` 1758 1759 2. 
**Optimize context:** 1760 - Review context files for unnecessary content 1761 - Remove duplicate information 1762 - Keep context files lean 1763 1764 3. **Disable unnecessary agents:** 1765 1766 ```sql 1767 -- Temporarily disable Security agent 1768 UPDATE agent_state SET status = 'disabled' WHERE agent_name = 'security'; 1769 ``` 1770 1771 ### Circuit Breaker Triggered 1772 1773 **Symptoms:** 1774 1775 - Agent status = `blocked` 1776 - `circuit_breaker_triggered_at` in metrics_json 1777 1778 **Diagnosis:** 1779 1780 ```bash 1781 # View error logs 1782 npm run agent:logs -- --agent-name developer --level error 1783 1784 # Check failure rate 1785 npm run agent:stats 1786 ``` 1787 1788 **Solutions:** 1789 1790 1. **Identify root cause:** 1791 - Review error logs for patterns 1792 - Check recent code changes 1793 - Verify external dependencies (DB, APIs) 1794 1795 2. **Fix underlying issue:** 1796 - If code bug: Fix and test 1797 - If external issue: Wait for resolution 1798 - If config issue: Update configuration 1799 1800 3. **Reset circuit breaker:** 1801 1802 ```sql 1803 -- Only after fixing root cause! 1804 UPDATE agent_state 1805 SET status = 'idle', 1806 metrics_json = json_remove(metrics_json, '$.circuit_breaker_triggered_at') 1807 WHERE agent_name = 'developer'; 1808 ``` 1809 1810 ### Tasks Failing Repeatedly 1811 1812 **Symptoms:** 1813 1814 - Task retry_count = 3 1815 - Task status = `failed` 1816 - Same error in multiple tasks 1817 1818 **Diagnosis:** 1819 1820 ```bash 1821 # View failed tasks 1822 SELECT * FROM agent_tasks WHERE status = 'failed' ORDER BY created_at DESC LIMIT 10; 1823 1824 # Check error patterns 1825 npm run agent:logs -- --level error 1826 ``` 1827 1828 **Solutions:** 1829 1830 1. **Code issue:** 1831 - Manually fix the bug 1832 - Reset task: `UPDATE agent_tasks SET retry_count = 0, status = 'pending' WHERE id = 42;` 1833 1834 2. 
**Missing dependencies:** 1835 - Install required packages: `npm install` 1836 - Update environment variables 1837 1838 3. **Task too complex:** 1839 - Break into smaller subtasks 1840 - Provide more context in context_json 1841 1842 --- 1843 1844 ## Best Practices 1845 1846 ### When to Use Agents 1847 1848 **Use agents for:** 1849 1850 - Automated bug fixes from error logs 1851 - Test generation for new features 1852 - Security audits on commits 1853 - Documentation freshness checks 1854 - Refactoring suggestions 1855 - Routine maintenance tasks 1856 1857 **Don't use agents for:** 1858 1859 - Quick one-off tasks (just do it manually) 1860 - Tasks requiring complex user input 1861 - Real-time user interactions 1862 - Tasks with high uncertainty (needs human judgment) 1863 - Exploratory work without clear goals 1864 1865 ### Task Design 1866 1867 **Be specific:** 1868 1869 ```json 1870 { 1871 "error": "TypeError: Cannot read property 'score' of null", 1872 "file": "src/score.js", 1873 "line": 42, 1874 "stack": "..." 
1875 } 1876 ``` 1877 1878 **Include context:** 1879 1880 ```json 1881 { 1882 "files_changed": ["src/score.js", "src/utils/error-handler.js"], 1883 "related_issues": ["Issue #123"], 1884 "previous_attempts": ["Tried null check, still failing"] 1885 } 1886 ``` 1887 1888 **Set appropriate priority:** 1889 1890 - 10: Critical (system down, security breach) 1891 - 7-9: High (blocking issue, major bug) 1892 - 4-6: Medium (normal bugs, features) 1893 - 1-3: Low (nice-to-haves, refactoring) 1894 1895 **Link parent tasks:** 1896 1897 ```javascript 1898 await createTask({ 1899 task_type: 'verify_fix', 1900 assigned_to: 'qa', 1901 parent_task_id: 123, // Links to fix_bug task 1902 priority: 5, 1903 }); 1904 ``` 1905 1906 ### Message Design 1907 1908 **Use handoff for task completion:** 1909 1910 ```javascript 1911 await agent.sendMessage(taskId, 'qa', 'handoff', 'Bug fix complete, ready for verification', { 1912 commit: 'abc123', 1913 files_changed: ['src/score.js'], 1914 }); 1915 ``` 1916 1917 **Use questions for clarification:** 1918 1919 ```javascript 1920 await agent.askQuestion( 1921 taskId, 1922 'developer', 1923 'Should this handle mobile and desktop screenshots differently?' 1924 ); 1925 ``` 1926 1927 **Use notifications for FYI:** 1928 1929 ```javascript 1930 await agent.sendMessage( 1931 taskId, 1932 'architect', 1933 'notification', 1934 'Coverage gate blocked commit due to <85% coverage', 1935 { current_coverage: 78, required: 85 } 1936 ); 1937 ``` 1938 1939 ### Agent Development 1940 1941 **Keep agents focused:** 1942 1943 - Single responsibility principle 1944 - One agent = one clear role 1945 - Don't create "do everything" agents 1946 1947 **Log liberally:** 1948 1949 ```javascript 1950 await this.log(taskId, 'info', 'Starting bug fix analysis'); 1951 await this.log(taskId, 'info', 'Identified affected files', { files: [...] 
}); 1952 await this.log(taskId, 'info', 'Generated fix, checking coverage'); 1953 ``` 1954 1955 **Fail gracefully:** 1956 1957 ```javascript 1958 try { 1959 const result = await this.analyzeBug(task); 1960 return result; 1961 } catch (error) { 1962 await this.log(task.id, 'error', 'Bug analysis failed', { error: error.message }); 1963 await this.failTask(task.id, { reason: 'Analysis failed', error: error.message }); 1964 return null; // Return partial results if possible 1965 } 1966 ``` 1967 1968 **Validate inputs:** 1969 1970 ```javascript 1971 async processTask(task) { 1972 const { context_json } = task; 1973 const context = JSON.parse(context_json); 1974 1975 // Validate required fields 1976 if (!context.error || !context.file) { 1977 await this.failTask(task.id, { reason: 'Missing required context fields' }); 1978 return; 1979 } 1980 1981 // Continue processing... 1982 } 1983 ``` 1984 1985 **Test thoroughly:** 1986 1987 - Unit tests for agent logic 1988 - Integration tests for workflows 1989 - Test error handling paths 1990 - Verify circuit breaker behavior 1991 1992 ### Monitoring and Maintenance 1993 1994 **Daily checks:** 1995 1996 ```bash 1997 # Check agent health 1998 npm run agent:stats 1999 2000 # Review errors 2001 npm run agent:logs -- --level error 2002 2003 # Check approval queue 2004 npm run agent:approvals 2005 ``` 2006 2007 **Weekly reviews:** 2008 2009 - Review circuit breaker triggers (if any) 2010 - Analyze token usage trends 2011 - Check task completion rates 2012 - Review escalated items 2013 2014 **Monthly optimization:** 2015 2016 - Analyze agent effectiveness 2017 - Optimize context files 2018 - Update agent logic based on patterns 2019 - Review and update approval thresholds 2020 2021 --- 2022 2023 ## Known Gaps & Industry Standards 2024 2025 ### Gap Analysis Summary 2026 2027 **Status**: 41 gaps identified across all agents compared to industry standards (Google SRE, TOGAF, ISTQB, OWASP/NIST, ITIL) 2028 2029 **Archived analysis:** 
[docs/plans/archive/agent-job-roles-gaps.md](/home/jason/code/333Method/docs/plans/archive/agent-job-roles-gaps.md) 2030 2031 ### Gap Analysis by Agent 2032 2033 Based on industry standards (Google SRE, TOGAF, ISTQB, OWASP/NIST, ITIL), the current system has **41 identified gaps** across all agents: 2034 2035 #### 1. Monitor Agent - SRE Standards (7 gaps) 2036 2037 **Critical gaps:** 2038 2039 - **SLO Tracking**: No service-level objectives for pipeline stages (e.g., "95% of sites score within 1 hour") 2040 - **Capacity Planning**: No forward-looking capacity analysis based on growth trends 2041 - **Toil Automation**: No identification of repetitive manual work 2042 - **Latency Monitoring**: No pipeline stage latency tracking (p50, p95, p99) 2043 - **On-Call Runbooks**: No automated incident response playbooks 2044 2045 **Recommendation**: Add SLO tracking via `pipeline_metrics` table, track growth trends 2046 2047 #### 2. Architect Agent Standards (6 gaps) 2048 2049 **Critical gaps:** 2050 2051 - **Architecture Decision Records (ADRs)**: No formal decision tracking 2052 - **Performance Profiling**: No bottleneck identification 2053 - **Scalability Planning**: No analysis of scaling limits 2054 - **Technical Debt Management**: No debt inventory or prioritization 2055 2056 **Recommendation**: Create `docs/decisions/` for ADRs, add `profile_performance` task 2057 2058 #### 3. Developer Agent Standards (5 gaps) 2059 2060 **Critical gaps:** 2061 2062 - **Root Cause Analysis**: No systematic RCA for recurring bugs 2063 - **Code Review Automation**: Limited automated review (ESLint exists but not enforced in workflow) 2064 - **Observability Instrumentation**: No logging/metrics added when fixing bugs 2065 2066 **Recommendation**: Require RCA for bugs recurring >2x, add log statements to error paths 2067 2068 #### 4. 
QA Agent Standards (6 gaps) 2069 2070 **Critical gaps:** 2071 2072 - **Test Data Management**: No test data generation or anonymization 2073 - **Non-Functional Testing**: No performance, load, or stress testing 2074 - **Test Prioritization**: No risk-based test prioritization 2075 - **Regression Testing**: No automated regression suite tracking 2076 2077 **Recommendation**: Add `run_load_test` task for critical paths, tag regression tests 2078 2079 #### 5. Security Agent Standards (7 gaps) 2080 2081 **Critical gaps:** 2082 2083 - **Threat Modeling**: No systematic threat analysis (STRIDE, DREAD) for new features 2084 - **Security Regression Testing**: No automated security test suite 2085 - **Penetration Testing**: No regular pentests (Shannon integration planned Phase 7) 2086 - **Security Metrics**: No MTTR tracking for vulnerabilities 2087 2088 **Recommendation**: Add `threat_model` task for new features, track remediation time 2089 2090 #### 6. Triage Agent Standards (6 gaps) 2091 2092 **Critical gaps:** 2093 2094 - **Incident Correlation**: No linking of related incidents 2095 - **Escalation Policies**: No time-based escalation rules 2096 - **Known Error Database**: No knowledge base of solved issues 2097 - **Postmortem Triggering**: No automated postmortems for critical incidents 2098 2099 **Recommendation**: Store successful fixes in `error_fix_history`, create postmortem tasks 2100 2101 #### 7. 
Cross-Cutting Gaps (4 gaps) 2102 2103 **Critical gaps:** 2104 2105 - **Observability & Telemetry**: No structured logging, metrics, or tracing 2106 - **Feedback Loops**: Limited learning from outcomes (exists for prompts, needs expansion) 2107 - **Human-in-the-Loop**: Approval gates exist but not consistently enforced 2108 - **Runbook Automation**: No automated resolution for known issues 2109 2110 **Recommendation**: Standardize JSON logging, extend `prompt_feedback` to all agents 2111 2112 ### Prioritized Improvement Roadmap 2113 2114 **Immediate (Week 1):** 2115 2116 1. Bootstrap Monitor Agent (create initial `scan_logs` task) 2117 2. Add SLO tracking for pipeline stages 2118 3. Implement known error database in Triage 2119 2120 **Short-Term (Month 1):** 2121 2122 4. Add performance profiling to Architect 2123 5. Add root cause analysis to Developer 2124 6. Add threat modeling to Security 2125 7. Add non-functional testing to QA 2126 2127 **Medium-Term (Quarter 1):** 2128 2129 8. Implement capacity planning in Monitor 2130 9. Add technical debt management to Architect 2131 10. 
Implement incident correlation in Triage 2132 2133 --- 2134 2135 ## Future Enhancements 2136 2137 **Planned features:** 2138 2139 - **Agent learning**: Track feedback patterns, improve over time (foundation exists with `prompt_feedback`) 2140 - **Parallel execution**: Multiple agents working simultaneously 2141 - **Agent metrics dashboard**: Real-time monitoring UI 2142 - **Custom workflows**: User-defined agent chains 2143 2144 **Integration opportunities:** 2145 2146 - **Shannon Pen Tester**: Add security scanning to Security Agent (planned Phase 7) 2147 - **CI/CD pipelines**: Trigger agent workflows on push/PR 2148 - **Slack/Discord notifications**: Alert on circuit breakers, escalations 2149 - **LLM provider switching**: Auto-switch based on task type/cost 2150 2151 --- 2152 2153 ## Implementation Status 2154 2155 **Status**: All 6 agents implemented, tested, and deployed to production (2026-02-15). 2156 2157 **Bootstrap Issue**: RESOLVED (2026-02-15) - Monitor agent bootstrap problem solved. System now properly initializes and self-schedules tasks. 2158 2159 --- 2160 2161 ## Additional Resources 2162 2163 - **Workflow System:** `/home/jason/code/333Method/docs/06-automation/agent-workflow.md` 2164 - **Base Agent Code:** `/home/jason/code/333Method/src/agents/base-agent.js` 2165 - **Agent Implementations:** `/home/jason/code/333Method/src/agents/` 2166 - **Context Files:** `/home/jason/code/333Method/src/agents/contexts/` 2167 - **CLI Manager:** `/home/jason/code/333Method/src/cli/agent-manager.js` 2168 - **Database Schema:** `/home/jason/code/333Method/db/schema.sql` 2169 - **Migrations:** `/home/jason/code/333Method/db/migrations/041-create-agent-system.sql` 2170 - **Gap Analysis:** `docs/plans/archive/agent-job-roles-gaps.md` (41 gaps identified) 2171 2172 --- 2173 2174 **Last Updated:** 2026-02-26 2175 **Version:** 2.0 (merged from docs/AGENTS.md + docs/06-automation/agent-system.md) 2176 **Status:** Production-ready