# Agent System E2E Tests

## Quick Start

```bash
# Run all E2E tests
npm test tests/agents/e2e-agent-system.test.js

# Run with coverage
npm run test:coverage tests/agents/e2e-agent-system.test.js

# Run a specific test group
npm test tests/agents/e2e-agent-system.test.js -- --grep "Task Lifecycle"
```

## What This Tests

Comprehensive end-to-end validation of the multi-agent collaboration system:

1. **Task Lifecycle** - Tasks flow correctly through pending → running → completed
2. **Inter-agent Communication** - Agents create tasks for each other and exchange messages
3. **Error Handling** - System fails gracefully with clear error messages
4. **Circuit Breaker** - Prevents cascading failures, auto-recovers after cooldown
5. **Task Routing** - Triage correctly routes errors to specialist agents
6. **Priority Handling** - High-priority tasks are processed first
7. **Row-level Locking** - Concurrent agents don't claim the same task (horizontal scaling)
8. **Known Error Database** - Similar errors get suggested fixes from history
9. **Coverage Gates** - Developer enforces 85% coverage before commits
10. **Workflow Dependencies** - Features require approved designs (TOGAF compliance)
11. **Full Multi-Agent Workflow** - Monitor → Triage → Developer → QA integration

## Test Structure

```
tests/agents/
├── e2e-agent-system.test.js          # 1,248 lines - Main test suite
├── E2E-TEST-DOCUMENTATION.md         # Developer guide & reference
├── E2E-TEST-SUMMARY.md               # Executive summary
└── README-E2E-TESTS.md               # This file
```

## Files

### `e2e-agent-system.test.js`

**Purpose**: Complete E2E test suite (18 test scenarios)
**Size**: 1,248 lines
**Test Count**: 18 scenarios across 11 describe blocks
**Coverage**: All critical agent workflows

**Key Features**:

- Real SQLite database (in-memory, isolated per test)
- Comprehensive mocking (file ops, LLM calls, git, test runner)
- Clean setup/teardown (no test pollution)
- Realistic multi-agent workflows

### `E2E-TEST-DOCUMENTATION.md`

**Purpose**: Detailed developer guide
**Size**: ~400 lines
**Audience**: Developers extending or debugging tests

**Contents**:

- What each test validates (with code examples)
- Mocking strategy (what/why/how)
- Running instructions (full suite, individual tests, debug mode)
- Troubleshooting guide (common failures)
- Future enhancements (load testing, integration tests)
- Contributing guidelines (adding new tests)

### `E2E-TEST-SUMMARY.md`

**Purpose**: Executive summary
**Size**: ~300 lines
**Audience**: Project stakeholders, tech leads

**Contents**:

- Test coverage overview
- Production readiness assessment
- Known issues and workarounds
- Key learnings
- Next steps (immediate, short-term, long-term)

## Test Coverage

### Covered ✅

- Task creation with correct schema
- Task status transitions (pending → running → completed)
- Inter-agent task creation and handoffs
- Question/answer messaging between agents
- Invalid context handling (graceful failures)
- Retry logic (3 attempts → failed)
- Circuit breaker (opens after failures, auto-recovers)
- Task routing (security → Security, database → Developer, network → Architect)
- Priority scheduling (high priority first)
- Row-level locking (concurrent agents don't duplicate work)
- Known error database (similar errors get suggested fixes)
- Coverage gates (85% required before commits)
- Workflow dependencies (features require approved designs)
- Full multi-agent workflow (Monitor → Triage → Developer → QA)

### Not Covered (Future Work)

- Load testing (100+ concurrent tasks)
- Real LLM integration tests (limited budget)
- Agent deadlock detection (circular dependencies)
- Performance benchmarks (task throughput)
- Chaos engineering (random failures)

## Known Issues

### 1. StructuredLogger Readonly Database

**Symptom**: `[StructuredLogger] Database write failed: attempt to write a readonly database`
**Impact**: Low - some log assertions fail, but core functionality works
**Workaround**: Tests verify core functionality despite logging errors
**Fix**: Mock or disable StructuredLogger in test environment

### 2. Test Execution Time

**Duration**: ~35 seconds for full suite
**Cause**: Real SQLite operations + agent initialization
**Impact**: Low - E2E tests should be thorough, not fast
**Improvement**: Could parallelize independent test suites

## Production Readiness

### ✅ Ready

- Task lifecycle management
- Agent collaboration and handoffs
- Error handling and graceful degradation
- Circuit breaker auto-recovery
- Intelligent task routing
- Priority-based scheduling
- Horizontal scaling (row-level locking)
- Learning from past fixes

### ⚠️ Minor Issues (Non-blocking)

- StructuredLogger readonly database errors (logging only)

### 🔄 Future Enhancements

- Load testing under high concurrency
- Integration tests with real LLM calls
- Chaos engineering tests
- Performance benchmarking

## Running Tests

### Basic Commands

```bash
# All E2E tests
npm test tests/agents/e2e-agent-system.test.js

# With coverage report
npm run test:coverage tests/agents/e2e-agent-system.test.js

# Specific test group
npm test tests/agents/e2e-agent-system.test.js -- --grep "Circuit Breaker"

# Debug mode
DEBUG=1 npm test tests/agents/e2e-agent-system.test.js
```

### Individual Test Groups

```bash
# Task lifecycle
npm test tests/agents/e2e-agent-system.test.js -- --grep "1. Task Lifecycle"

# Inter-agent communication
npm test tests/agents/e2e-agent-system.test.js -- --grep "2. Inter-agent Communication"

# Error handling
npm test tests/agents/e2e-agent-system.test.js -- --grep "3. Error Handling"

# Circuit breaker
npm test tests/agents/e2e-agent-system.test.js -- --grep "4. Circuit Breaker"

# Task routing
npm test tests/agents/e2e-agent-system.test.js -- --grep "5. Task Routing"

# Priority handling
npm test tests/agents/e2e-agent-system.test.js -- --grep "6. Priority Handling"

# Row-level locking
npm test tests/agents/e2e-agent-system.test.js -- --grep "7. Row-level Locking"

# Known error database
npm test tests/agents/e2e-agent-system.test.js -- --grep "8. Known Error Database"

# Coverage gates
npm test tests/agents/e2e-agent-system.test.js -- --grep "9. Coverage Gates"

# Workflow dependencies
npm test tests/agents/e2e-agent-system.test.js -- --grep "10. Workflow Dependencies"

# Full workflow
npm test tests/agents/e2e-agent-system.test.js -- --grep "11. Agent System Integration"
```

## What Each Test Validates

### 1. Task Lifecycle

- ✅ Tasks created with correct schema
- ✅ Status transitions work (pending → running → completed)
- ✅ Result JSON stored correctly
- ✅ Timestamps set appropriately

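
The status flow in this group can be pictured as a small state machine. The sketch below is illustrative only: `ALLOWED_TRANSITIONS`, `transition`, and the fields `started_at`, `completed_at`, and `result_json` are assumed names, not the project's actual schema, and the `failed` state is borrowed from the retry behavior described elsewhere in this README.

```javascript
// Hypothetical sketch of the pending → running → completed flow.
// Field and function names are assumptions, not the real task schema.
const ALLOWED_TRANSITIONS = {
  pending: ['running'],
  running: ['completed', 'failed'],
  completed: [],
  failed: [],
};

function transition(task, nextStatus, result = null, now = new Date()) {
  const allowed = ALLOWED_TRANSITIONS[task.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Invalid transition: ${task.status} -> ${nextStatus}`);
  }
  const updated = { ...task, status: nextStatus };
  if (nextStatus === 'running') {
    updated.started_at = now.toISOString();
  }
  if (nextStatus === 'completed' || nextStatus === 'failed') {
    updated.completed_at = now.toISOString();
    updated.result_json = JSON.stringify(result); // result stored as JSON text
  }
  return updated;
}

const created = { id: 1, status: 'pending' };
const running = transition(created, 'running');
const done = transition(running, 'completed', { ok: true });
console.log(done.status, done.result_json);
```

A test in this group would typically assert on the final status, the parsed result, and the presence of both timestamps, and also check that an illegal jump (for example completed → running) is rejected.
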
### 2. Inter-agent Communication

- ✅ Developer creates QA verification tasks
- ✅ Handoff messages delivered
- ✅ Parent-child relationships maintained
- ✅ Question/answer messaging works

### 3. Error Handling

- ✅ Invalid context fails gracefully
- ✅ Malformed JSON handled correctly
- ✅ Retry logic (3 attempts max)
- ✅ Error messages are descriptive

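
As a rough illustration of the "3 attempts, then failed" rule, the sketch below retries a handler and records a descriptive error once the attempts are exhausted. `runWithRetry` and the task fields are hypothetical names; only the limit of three attempts comes from this README.

```javascript
// Hypothetical sketch: retry a task handler, then mark the task failed.
const MAX_ATTEMPTS = 3; // limit taken from this README; the constant name is assumed

function runWithRetry(task, handler) {
  let lastError = null;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      const result = handler(task);
      return { ...task, status: 'completed', attempts: attempt, result };
    } catch (err) {
      lastError = err; // remember the most recent failure for the error message
    }
  }
  return {
    ...task,
    status: 'failed',
    attempts: MAX_ATTEMPTS,
    error: lastError.message,
  };
}

// A handler that fails once, then succeeds.
let calls = 0;
const flaky = runWithRetry({ id: 1, status: 'running' }, () => {
  calls += 1;
  if (calls < 2) throw new Error('transient failure');
  return 'fixed';
});

// A handler that never succeeds.
const broken = runWithRetry({ id: 2, status: 'running' }, () => {
  throw new Error('invalid context');
});

console.log(flaky.status, flaky.attempts);
console.log(broken.status, broken.attempts, broken.error);
```
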
### 4. Circuit Breaker

- ✅ Circuit opens after multiple failures
- ✅ Auto-recovers after 30-minute cooldown
- ✅ Half-open state works correctly
- ✅ Failure count tracked

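
The closed → open → half-open cycle can be sketched as follows. This is a generic circuit breaker, not the project's implementation: the class and method names are made up for illustration, and `FAILURE_THRESHOLD` is an assumption because this README only says "multiple failures". The 30-minute cooldown is the one value taken from the list above. A fake clock is injected so the cooldown can be exercised without waiting.

```javascript
const COOLDOWN_MS = 30 * 60 * 1000; // 30-minute cooldown, per this README
const FAILURE_THRESHOLD = 3;        // assumed value; the README does not state it

class CircuitBreaker {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.state = 'closed';
    this.failureCount = 0;
    this.openedAt = null;
  }

  canExecute() {
    // After the cooldown, an open circuit lets one trial call through.
    if (this.state === 'open' && this.now() - this.openedAt >= COOLDOWN_MS) {
      this.state = 'half-open';
    }
    return this.state !== 'open';
  }

  recordSuccess() {
    this.state = 'closed';
    this.failureCount = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failureCount += 1;
    // A failed trial call, or too many failures, (re)opens the circuit.
    if (this.state === 'half-open' || this.failureCount >= FAILURE_THRESHOLD) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}

// Usage with a fake clock (milliseconds).
let clock = 0;
const breaker = new CircuitBreaker(() => clock);
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();
const blocked = !breaker.canExecute();     // circuit is open
clock += COOLDOWN_MS;                      // jump past the cooldown
const trialAllowed = breaker.canExecute(); // circuit is now half-open
breaker.recordSuccess();                   // trial succeeded, circuit closes
console.log(blocked, trialAllowed, breaker.state);
```
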
### 5. Task Routing

- ✅ Security errors → Security agent
- ✅ Database errors → Developer agent
- ✅ Network errors → Architect agent
- ✅ Suggested fixes included

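
One way to picture the routing table is a list of category rules checked in order. Only the three category-to-agent pairs come from this README; the keyword patterns, the `hint` text, the `routeError` name, and the fallback to Triage are assumptions made for the sake of a runnable example.

```javascript
// Hypothetical routing rules; the patterns and hints are illustrative only.
const ROUTES = [
  { category: 'security', agent: 'Security', pattern: /unauthorized|forbidden|xss|injection/i,
    hint: 'Review authentication and input validation' },
  { category: 'database', agent: 'Developer', pattern: /sql|database|constraint|deadlock/i,
    hint: 'Inspect the failing query and schema constraints' },
  { category: 'network', agent: 'Architect', pattern: /timeout|econnrefused|dns|socket/i,
    hint: 'Check service connectivity and timeouts' },
];

function routeError(message) {
  const match = ROUTES.find((route) => route.pattern.test(message));
  if (!match) {
    // Assumed fallback: unclassified errors stay with Triage.
    return { category: 'unknown', agent: 'Triage', suggestedFix: null };
  }
  return { category: match.category, agent: match.agent, suggestedFix: match.hint };
}

console.log(routeError('403 Forbidden: invalid token').agent);
console.log(routeError('SQLITE_CONSTRAINT: UNIQUE constraint failed').agent);
console.log(routeError('connect ECONNREFUSED 10.0.0.5:5432').agent);
```
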
### 6. Priority Handling

- ✅ High priority tasks processed first
- ✅ Priority calculated from severity and stage
- ✅ Queue ordering works correctly

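
This README does not give the actual priority formula, so the sketch below invents one purely to show the shape of the check: a score derived from severity and stage, and a queue ordered by that score. The weights, the stage names, and the `created_at` tie-break are all assumptions.

```javascript
// Invented weights for illustration; the real formula is not documented here.
const SEVERITY_WEIGHT = { critical: 100, high: 75, medium: 50, low: 25 };
const STAGE_WEIGHT = { production: 20, staging: 10, development: 0 };

function calculatePriority({ severity, stage }) {
  return (SEVERITY_WEIGHT[severity] ?? 0) + (STAGE_WEIGHT[stage] ?? 0);
}

// Highest score first; older tasks win ties.
function orderQueue(tasks) {
  return [...tasks].sort(
    (a, b) => calculatePriority(b) - calculatePriority(a) || a.created_at - b.created_at
  );
}

const queue = [
  { id: 1, severity: 'low', stage: 'production', created_at: 1 },
  { id: 2, severity: 'critical', stage: 'development', created_at: 2 },
  { id: 3, severity: 'high', stage: 'production', created_at: 3 },
];
const orderedIds = orderQueue(queue).map((task) => task.id);
console.log(orderedIds); // ids in order 2, 3, 1
```
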
### 7. Row-level Locking

- ✅ Concurrent agents don't claim the same task
- ✅ Only one agent processes each task
- ✅ Horizontal scaling enabled

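
The guarantee being tested is that claiming a task is a single check-and-set step. The sketch below uses an in-memory table as a stand-in for the database so it runs without dependencies; the SQL in the comment and all names are assumptions, not the project's actual query.

```javascript
// In a SQL-backed queue, a claim is commonly one conditional statement, e.g.
//   UPDATE tasks SET status = 'running', claimed_by = ?
//   WHERE id = ? AND status = 'pending'
// and the claim succeeded only if exactly one row changed.
// (Assumed SQL for illustration; not taken from the project.)
class InMemoryTaskTable {
  constructor(tasks) {
    this.rows = new Map(tasks.map((task) => [task.id, { ...task }]));
  }

  // Mimics the conditional UPDATE: returns the number of rows changed (0 or 1).
  claim(taskId, agentId) {
    const row = this.rows.get(taskId);
    if (!row || row.status !== 'pending') return 0;
    row.status = 'running';
    row.claimed_by = agentId;
    return 1;
  }
}

const table = new InMemoryTaskTable([{ id: 42, status: 'pending' }]);
const results = ['developer-1', 'developer-2'].map((agentId) => ({
  agentId,
  claimed: table.claim(42, agentId) === 1,
}));
console.log(results);
```

Because JavaScript runs this check-and-set on a single thread, the in-memory version is trivially atomic; with a real database the same property has to come from the conditional update or a transaction.
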
### 8. Known Error Database

- ✅ Similar errors detected (70%+ similarity)
- ✅ Suggested fixes from past tasks
- ✅ Fix descriptions included
- ✅ Reference to original fix task

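
This README states the 70% threshold but not how similarity is measured. As a stand-in, the sketch below uses Jaccard similarity over word tokens, which is an assumption chosen for simplicity; `suggestFix` and the record fields (`message`, `fix`, `fixTaskId`) are likewise hypothetical.

```javascript
const SIMILARITY_THRESHOLD = 0.7; // "70%+ similarity" from this README

function tokenize(message) {
  return new Set(message.toLowerCase().match(/[a-z0-9_]+/g) || []);
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a, b) {
  const tokensA = tokenize(a);
  const tokensB = tokenize(b);
  if (tokensA.size === 0 && tokensB.size === 0) return 1;
  let shared = 0;
  for (const token of tokensA) {
    if (tokensB.has(token)) shared += 1;
  }
  return shared / (tokensA.size + tokensB.size - shared);
}

// Returns the best match at or above the threshold, or null.
function suggestFix(errorMessage, knownErrors) {
  let best = null;
  for (const known of knownErrors) {
    const score = jaccard(errorMessage, known.message);
    if (score >= SIMILARITY_THRESHOLD && (!best || score > best.score)) {
      best = { score, fix: known.fix, fixTaskId: known.fixTaskId };
    }
  }
  return best;
}

const knownErrors = [
  {
    message: "TypeError: Cannot read property 'id' of undefined in user-service",
    fix: 'Add a null check before accessing the user object',
    fixTaskId: 17,
  },
];

const similar = suggestFix(
  "TypeError: Cannot read property 'name' of undefined in user-service",
  knownErrors
);
const unrelated = suggestFix('connect ECONNREFUSED 127.0.0.1:5432', knownErrors);
console.log(similar && similar.fixTaskId, unrelated);
```
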
### 9. Coverage Gates

- ✅ 85% coverage required before commits
- ✅ Low coverage blocks commits
- ✅ QA task created to write tests
- ✅ Escalation to Architect if coverage can't be met

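
The gate logic can be summarized as a small decision function. The 85% threshold and the QA and Architect follow-ups come from the list above; the function name, the task `type` strings, and `MAX_QA_ROUNDS` (how many QA attempts happen before escalating) are assumptions.

```javascript
const COVERAGE_THRESHOLD = 85; // percent, per this README
const MAX_QA_ROUNDS = 2;       // assumed: QA attempts allowed before escalating

function checkCoverageGate(coveragePct, qaRounds = 0) {
  if (coveragePct >= COVERAGE_THRESHOLD) {
    return { allowCommit: true, followUp: null };
  }
  if (qaRounds >= MAX_QA_ROUNDS) {
    // Coverage still too low after repeated QA work: escalate.
    return { allowCommit: false, followUp: { assignee: 'Architect', type: 'escalation' } };
  }
  // Block the commit and ask QA to write the missing tests.
  return { allowCommit: false, followUp: { assignee: 'QA', type: 'write_tests' } };
}

console.log(checkCoverageGate(90));
console.log(checkCoverageGate(72));
console.log(checkCoverageGate(72, 2));
```
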
### 10. Workflow Dependencies

- ✅ Features require approved designs
- ✅ Auto-creates design_proposal tasks
- ✅ Approved designs enable implementation
- ✅ TOGAF workflow compliance

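
A minimal sketch of the dependency rule, assuming designs are looked up by feature ID. Only the `design_proposal` task type is named in this README; the function name, the `implementation` type, the assignees, and the record fields are hypothetical.

```javascript
// Hypothetical sketch: a feature is implemented only once its design is approved.
function nextTaskForFeature(feature, designs) {
  const design = designs.find((d) => d.featureId === feature.id);
  if (!design) {
    // No design yet: create a design_proposal task instead of implementing.
    return { type: 'design_proposal', featureId: feature.id, assignee: 'Architect' };
  }
  if (design.status !== 'approved') {
    return null; // a design exists but is not approved; nothing to schedule yet
  }
  return {
    type: 'implementation',
    featureId: feature.id,
    designId: design.id,
    assignee: 'Developer',
  };
}

const feature = { id: 'feat-1' };
const withoutDesign = nextTaskForFeature(feature, []);
const pendingDesign = nextTaskForFeature(feature, [
  { id: 'design-1', featureId: 'feat-1', status: 'pending' },
]);
const approvedDesign = nextTaskForFeature(feature, [
  { id: 'design-1', featureId: 'feat-1', status: 'approved' },
]);
console.log(withoutDesign.type, pendingDesign, approvedDesign.type);
```
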
### 11. Full Multi-Agent Workflow

- ✅ Monitor → Triage → Developer → QA
- ✅ Complete workflow chain verified
- ✅ All agents participate correctly
- ✅ Real-world scenario validation

## Troubleshooting

### "Database locked" Error

**Cause**: Database connections not properly closed
**Fix**: Ensure `afterEach()` calls `resetBaseDb()`, `resetTaskDb()`, `resetMessageDb()`

### "AssertionError: task not completed"

**Cause**: Agent not initialized or missing context fields
**Fix**: Check agent initialization and `context_json` fields

### "Mock not called" Error

**Cause**: Mock not set up before the agent processes the task
**Fix**: Verify `mock.method()` is called before `agent.pollTasks()`

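
The ordering issue can be reproduced in isolation. The sketch below assumes the `mock` helper from Node's built-in `node:test` module, which may not be what the suite uses; `llmClient` and `agent` are hypothetical stand-ins, and the calls are synchronous to keep the example short.

```javascript
// Assumes Node's built-in mock API; the project may use a different helper.
const { mock } = require('node:test');

// Hypothetical stand-ins for an LLM client and an agent that depends on it.
const llmClient = {
  complete() {
    throw new Error('real LLM call attempted in a test');
  },
};
const agent = {
  pollTasks() {
    return llmClient.complete('diagnose error');
  },
};

// 1. Install the mock first...
const completeMock = mock.method(llmClient, 'complete', () => 'mocked fix');

// 2. ...then let the agent process its task.
const answer = agent.pollTasks();
const callsWhileMocked = completeMock.mock.calls.length;

// 3. Restore the original method, as afterEach() would.
mock.restoreAll();

console.log(answer, callsWhileMocked);
```

If steps 1 and 2 are swapped, the agent reaches the real method and the mock records zero calls, which is the failure described above.
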
### Tests Timeout

**Cause**: Infinite loop, or `AGENT_IMMEDIATE_INVOCATION=false` is not set
**Fix**: Disable immediate invocation in test setup

## Contributing

### Adding New Tests

1. Follow existing describe/test structure
2. Use descriptive test names (what + why)
3. Add mocks for expensive operations
4. Document test purpose in comments
5. Verify cleanup in `afterEach()`

### Test Organization

- Group related tests in `describe()` blocks
- Order by complexity (simple → complex)
- Keep test cases independent
- Use setup/teardown for common initialization

### Mocking Best Practices

- Mock expensive operations (LLM, file I/O, git)
- Don't mock core agent logic
- Restore mocks in `afterEach()`
- Use realistic mock data

### Assertions

- Use specific assertions (`strictEqual` vs `ok`)
- Provide descriptive failure messages
- Test both positive and negative cases
- Verify side effects (logs, messages, child tasks)

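
To see why `strictEqual` is preferred over `ok`, consider a task that actually failed. The example assumes Node's built-in `assert` module; the `task` object is made up for illustration.

```javascript
const { ok, strictEqual, throws } = require('node:assert');

const task = { id: 1, status: 'failed' };

// Weak: any non-empty string is truthy, so this passes even though the task failed.
ok(task.status);

// Specific: compares the exact value and reports the custom message on mismatch.
throws(
  () => strictEqual(task.status, 'completed', `task ${task.id} should be completed`),
  /task 1 should be completed/
);

console.log('ok() passed on a failed task; strictEqual() caught it');
```
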
## References

- **Agent System Docs**: `/docs/06-automation/agent-system.md`
- **Test Suite**: `/tests/agents/e2e-agent-system.test.js`
- **Test Documentation**: `/tests/agents/E2E-TEST-DOCUMENTATION.md`
- **Test Summary**: `/tests/agents/E2E-TEST-SUMMARY.md`
- **Base Agent**: `/src/agents/base-agent.js`
- **Task Manager**: `/src/agents/utils/task-manager.js`
- **Message Manager**: `/src/agents/utils/message-manager.js`

## Support

For questions or issues:

1. Check `E2E-TEST-DOCUMENTATION.md` (troubleshooting section)
2. Review test code and comments
3. Check agent logs in the `agent_logs` table
4. Run tests with `DEBUG=1` for verbose output