README-E2E-TESTS.md
# Agent System E2E Tests

## Quick Start

```bash
# Run all E2E tests
npm test tests/agents/e2e-agent-system.test.js

# Run with coverage
npm run test:coverage tests/agents/e2e-agent-system.test.js

# Run specific test group
npm test tests/agents/e2e-agent-system.test.js -- --grep "Task Lifecycle"
```

## What This Tests

Comprehensive end-to-end validation of the multi-agent collaboration system:

1. **Task Lifecycle** - Tasks flow correctly through pending → running → completed
2. **Inter-agent Communication** - Agents create tasks for each other and exchange messages
3. **Error Handling** - System fails gracefully with clear error messages
4. **Circuit Breaker** - Prevents cascade failures, auto-recovers after cooldown
5. **Task Routing** - Triage correctly routes errors to specialist agents
6. **Priority Handling** - High-priority tasks processed first
7. **Row-level Locking** - Concurrent agents don't claim same task (horizontal scaling)
8. **Known Error Database** - Similar errors get suggested fixes from history
9. **Coverage Gates** - Developer enforces 85% coverage before commits
10. **Workflow Dependencies** - Features require approved designs (TOGAF compliance)
11. **Full Multi-Agent Workflow** - Monitor → Triage → Developer → QA integration

## Test Structure

```
tests/agents/
├── e2e-agent-system.test.js       # 1,248 lines - Main test suite
├── E2E-TEST-DOCUMENTATION.md      # Developer guide & reference
├── E2E-TEST-SUMMARY.md            # Executive summary
└── README-E2E-TESTS.md            # This file
```

## Files

### `e2e-agent-system.test.js`

**Purpose**: Complete E2E test suite (18 test scenarios)
**Size**: 1,248 lines
**Test Count**: 18 scenarios across 11 describe blocks
**Coverage**: All critical agent workflows

**Key Features**:

- Real SQLite database (in-memory, isolated per test)
- Comprehensive mocking (file ops, LLM calls, git, test runner)
- Clean setup/teardown (no test pollution)
- Realistic multi-agent workflows

### `E2E-TEST-DOCUMENTATION.md`

**Purpose**: Detailed developer guide
**Size**: ~400 lines
**Audience**: Developers extending or debugging tests

**Contents**:

- What each test validates (with code examples)
- Mocking strategy (what/why/how)
- Running instructions (full suite, individual tests, debug mode)
- Troubleshooting guide (common failures)
- Future enhancements (load testing, integration tests)
- Contributing guidelines (adding new tests)

### `E2E-TEST-SUMMARY.md`

**Purpose**: Executive summary
**Size**: ~300 lines
**Audience**: Project stakeholders, tech leads

**Contents**:

- Test coverage overview
- Production readiness assessment
- Known issues and workarounds
- Key learnings
- Next steps (immediate, short-term, long-term)

## Test Coverage

### Covered ✅

- Task creation with correct schema
- Task status transitions (pending → running → completed)
- Inter-agent task creation and handoffs
- Question/answer messaging between agents
- Invalid context handling (graceful failures)
- Retry logic (3 attempts → failed)
- Circuit breaker (opens after failures, auto-recovers)
- Task routing (security → Security, database → Developer, network → Architect)
- Priority scheduling (high priority first)
- Row-level locking (concurrent agents don't duplicate work)
- Known error database (similar errors get suggested fixes)
- Coverage gates (85% required before commits)
- Workflow dependencies (features require approved designs)
- Full multi-agent workflow (Monitor → Triage → Developer → QA)

### Not Covered (Future Work)

- Load testing (100+ concurrent tasks)
- Real LLM integration tests (limited budget)
- Agent deadlock detection (circular dependencies)
- Performance benchmarks (task throughput)
- Chaos engineering (random failures)

## Known Issues

### 1. StructuredLogger Readonly Database

**Symptom**: `[StructuredLogger] Database write failed: attempt to write a readonly database`
**Impact**: Low - some log assertions fail, but core functionality works
**Workaround**: Tests verify core functionality despite logging errors
**Fix**: Mock or disable StructuredLogger in test environment

### 2. Test Execution Time

**Duration**: ~35 seconds for full suite
**Cause**: Real SQLite operations + agent initialization
**Impact**: Low - E2E tests should be thorough, not fast
**Improvement**: Could parallelize independent test suites

## Production Readiness

### ✅ Ready

- Task lifecycle management
- Agent collaboration and handoffs
- Error handling and graceful degradation
- Circuit breaker auto-recovery
- Intelligent task routing
- Priority-based scheduling
- Horizontal scaling (row-level locking)
- Learning from past fixes

### ⚠️ Minor Issues (Non-blocking)

- StructuredLogger readonly database errors (logging only)

### 🔄 Future Enhancements

- Load testing under high concurrency
- Integration tests with real LLM calls
- Chaos engineering tests
- Performance benchmarking

## Running Tests

### Basic Commands

```bash
# All E2E tests
npm test tests/agents/e2e-agent-system.test.js

# With coverage report
npm run test:coverage tests/agents/e2e-agent-system.test.js

# Specific test group
npm test tests/agents/e2e-agent-system.test.js -- --grep "Circuit Breaker"

# Debug mode
DEBUG=1 npm test tests/agents/e2e-agent-system.test.js
```

### Individual Test Groups

```bash
# Task lifecycle
npm test tests/agents/e2e-agent-system.test.js -- --grep "1. Task Lifecycle"

# Inter-agent communication
npm test tests/agents/e2e-agent-system.test.js -- --grep "2. Inter-agent Communication"

# Error handling
npm test tests/agents/e2e-agent-system.test.js -- --grep "3. Error Handling"

# Circuit breaker
npm test tests/agents/e2e-agent-system.test.js -- --grep "4. Circuit Breaker"

# Task routing
npm test tests/agents/e2e-agent-system.test.js -- --grep "5. Task Routing"

# Priority handling
npm test tests/agents/e2e-agent-system.test.js -- --grep "6. Priority Handling"

# Row-level locking
npm test tests/agents/e2e-agent-system.test.js -- --grep "7. Row-level Locking"

# Known error database
npm test tests/agents/e2e-agent-system.test.js -- --grep "8. Known Error Database"

# Coverage gates
npm test tests/agents/e2e-agent-system.test.js -- --grep "9. Coverage Gates"

# Workflow dependencies
npm test tests/agents/e2e-agent-system.test.js -- --grep "10. Workflow Dependencies"

# Full workflow
npm test tests/agents/e2e-agent-system.test.js -- --grep "11. Agent System Integration"
```

## What Each Test Validates

### 1. Task Lifecycle

- ✅ Tasks created with correct schema
- ✅ Status transitions work (pending → running → completed)
- ✅ Result JSON stored correctly
- ✅ Timestamps set appropriately

### 2. Inter-agent Communication

- ✅ Developer creates QA verification tasks
- ✅ Handoff messages delivered
- ✅ Parent-child relationships maintained
- ✅ Question/answer messaging works

### 3. Error Handling

- ✅ Invalid context fails gracefully
- ✅ Malformed JSON handled correctly
- ✅ Retry logic (3 attempts max)
- ✅ Error messages are descriptive

### 4. Circuit Breaker

- ✅ Circuit opens after multiple failures
- ✅ Auto-recovers after 30-minute cooldown
- ✅ Half-open state works correctly
- ✅ Failure count tracked

### 5. Task Routing

- ✅ Security errors → Security agent
- ✅ Database errors → Developer agent
- ✅ Network errors → Architect agent
- ✅ Suggested fixes included

### 6. Priority Handling

- ✅ High priority tasks processed first
- ✅ Priority calculated from severity and stage
- ✅ Queue ordering works correctly

### 7. Row-level Locking

- ✅ Concurrent agents don't claim same task
- ✅ Only one agent processes each task
- ✅ Horizontal scaling enabled

### 8. Known Error Database

- ✅ Similar errors detected (70%+ similarity)
- ✅ Suggested fixes from past tasks
- ✅ Fix descriptions included
- ✅ Reference to original fix task

### 9. Coverage Gates

- ✅ 85% coverage required before commits
- ✅ Low coverage blocks commits
- ✅ QA task created to write tests
- ✅ Escalation to Architect if coverage can't be met

### 10. Workflow Dependencies

- ✅ Features require approved designs
- ✅ Auto-creates design_proposal tasks
- ✅ Approved designs enable implementation
- ✅ TOGAF workflow compliance

### 11. Full Multi-Agent Workflow

- ✅ Monitor → Triage → Developer → QA
- ✅ Complete workflow chain verified
- ✅ All agents participate correctly
- ✅ Real-world scenario validation

## Troubleshooting

### "Database locked" Error

**Cause**: Database connections not properly closed
**Fix**: Ensure afterEach() calls `resetBaseDb()`, `resetTaskDb()`, `resetMessageDb()`

### "AssertionError: task not completed"

**Cause**: Agent not initialized or missing context fields
**Fix**: Check agent initialization and context_json fields

### "Mock not called" Error

**Cause**: Mock not set up before agent processes task
**Fix**: Verify mock.method() called before agent.pollTasks()

### Tests Timeout

**Cause**: Infinite loop or missing AGENT_IMMEDIATE_INVOCATION=false
**Fix**: Disable immediate invocation in test setup

## Contributing

### Adding New Tests

1. Follow existing describe/test structure
2. Use descriptive test names (what + why)
3. Add mocks for expensive operations
4. Document test purpose in comments
5. Verify cleanup in afterEach()

### Test Organization

- Group related tests in describe() blocks
- Order by complexity (simple → complex)
- Keep test cases independent
- Use setup/teardown for common initialization

### Mocking Best Practices

- Mock expensive operations (LLM, file I/O, git)
- Don't mock core agent logic
- Restore mocks in afterEach()
- Use realistic mock data

### Assertions

- Use specific assertions (`strictEqual` vs `ok`)
- Provide descriptive failure messages
- Test both positive and negative cases
- Verify side effects (logs, messages, child tasks)

## References

- **Agent System Docs**: `/docs/06-automation/agent-system.md`
- **Test Suite**: `/tests/agents/e2e-agent-system.test.js`
- **Test Documentation**: `/tests/agents/E2E-TEST-DOCUMENTATION.md`
- **Test Summary**: `/tests/agents/E2E-TEST-SUMMARY.md`
- **Base Agent**: `/src/agents/base-agent.js`
- **Task Manager**: `/src/agents/utils/task-manager.js`
- **Message Manager**: `/src/agents/utils/message-manager.js`

## Support

For questions or issues:

1. Check E2E-TEST-DOCUMENTATION.md (troubleshooting section)
2. Review test code and comments
3. Check agent logs in agent_logs table
4. Run tests with DEBUG=1 for verbose output
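
When reading or extending the circuit breaker tests (group 4), it may help to have the expected state transitions in front of you: closed → open after repeated failures, open → half-open once the cooldown has elapsed, and half-open → closed on the next success. The sketch below is a minimal illustration of that state machine, not the agent system's actual implementation. The `CircuitBreaker` class, its method names, and the failure threshold of 3 are assumptions made for this example; only the 30-minute cooldown comes from this README.

```javascript
// Hypothetical circuit breaker sketch. Class name, method names, and the
// failure threshold are illustrative assumptions, not the project's real API.
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 30 * 60 * 1000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // assumed value
    this.cooldownMs = cooldownMs;             // 30-minute cooldown, as listed in this README
    this.now = now;                           // injectable clock so tests need not wait
    this.failureCount = 0;
    this.openedAt = null;                     // null while the circuit is closed
  }

  // State is derived from the failure history and the time since opening.
  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open';
  }

  // Work is allowed when closed, and as a trial call when half-open.
  canExecute() {
    return this.state !== 'open';
  }

  recordFailure() {
    const wasHalfOpen = this.state === 'half-open';
    this.failureCount += 1;
    // A failed trial call in half-open state re-opens the circuit immediately.
    if (wasHalfOpen || this.failureCount >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }

  recordSuccess() {
    this.failureCount = 0;
    this.openedAt = null;
  }
}

// Usage with a fake clock, so the cooldown can be simulated instantly.
let fakeTime = 0;
const breaker = new CircuitBreaker({ now: () => fakeTime });

breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.state);        // open
console.log(breaker.canExecute()); // false

fakeTime += 30 * 60 * 1000;        // simulate the cooldown elapsing
console.log(breaker.state);        // half-open

breaker.recordSuccess();
console.log(breaker.state);        // closed
```

Injecting the clock rather than calling `Date.now()` directly is the design choice that matters for testing: a 30-minute cooldown can be exercised without real waiting, which helps keep the suite's runtime down.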