task-history.md
---
title: Agent Task History & Learning
category: agents
last_verified: 2026-02-16
tags: [agents, learning, context, memory, performance]
status: implemented
related_files:
  - src/agents/utils/context-builder.js
  - src/agents/utils/agent-claude-api.js
  - src/agents/base-agent.js
  - tests/agents/context-builder.test.js
---

# Agent Task History & Learning

## Overview

The agent system implements **task history and learning** so that agents can improve their performance over time by learning from past successes and failures.

Every agent can access:

- **Recent successful tasks** - Patterns that work
- **Recent failures** - Mistakes to avoid
- **Related tasks** - Tasks involving the same files or error types

This context is injected into LLM calls, giving agents a memory of past outcomes to learn from.

## How It Works

### Architecture

```
┌─────────────────┐
│   Agent Task    │
│   (current)     │
└────────┬────────┘
         │
         v
┌─────────────────┐        ┌──────────────────┐
│ Context Builder │───────>│  agent_outcomes  │
│                 │        │  (history DB)    │
└────────┬────────┘        └──────────────────┘
         │
         v
┌─────────────────┐
│    Enriched     │
│    Context =    │
│ Base + History  │
└────────┬────────┘
         │
         v
┌─────────────────┐
│    LLM Call     │
│ (with learning) │
└─────────────────┘
```

### Data Flow

1. **Outcome Recording**: After completing a task, the agent records the outcome in the `agent_outcomes` table
2. **History Retrieval**: When processing a new task, the agent retrieves relevant history
3. **Context Enrichment**: The history is formatted and appended to the base context
4. **LLM Call**: The enriched context is sent to the Claude API
5. **Learning**: The model can reuse approaches that worked and avoid those that failed

## Database Schema

### agent_outcomes Table

Stores task outcomes for learning:

```sql
CREATE TABLE agent_outcomes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id INTEGER NOT NULL REFERENCES agent_tasks(id) ON DELETE CASCADE,
  agent_name TEXT NOT NULL,
  task_type TEXT NOT NULL,
  outcome TEXT NOT NULL CHECK(outcome IN ('success', 'failure')),
  context_json TEXT, -- Task-specific context (error_type, file_path, etc.)
  result_json TEXT,  -- Task result details (what worked, what didn't)
  duration_ms INTEGER,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_agent_outcomes_agent ON agent_outcomes(agent_name, task_type);
CREATE INDEX idx_agent_outcomes_task_type ON agent_outcomes(task_type, outcome);
CREATE INDEX idx_agent_outcomes_outcome ON agent_outcomes(outcome, created_at);
```

## Usage Examples

### Recording Outcomes

Agents automatically record outcomes after task completion:

```javascript
// In BaseAgent.executeTask()
const startTime = Date.now();

try {
  await this.processTask(task);

  // Record success
  await this.recordOutcome(task.id, 'success', {
    task_type: task.task_type,
    duration_ms: Date.now() - startTime,
  });
} catch (error) {
  // Record failure, keeping the error details for later learning
  await this.recordOutcome(task.id, 'failure', {
    task_type: task.task_type,
    error: error.message,
    stack: error.stack,
    duration_ms: Date.now() - startTime,
  });
}
```

### Getting Enriched Context

Agents can get enriched context with history:

```javascript
import { buildAgentContext } from './utils/context-builder.js';

// Get context with task history
const context = await buildAgentContext('developer', ['base.md', 'developer.md'], currentTask);

console.log(context.fullContext); // Base + history combined
console.log(context.baseContext); // Original base context
console.log(context.historyContext); // Just the history section
console.log(context.historyTokens); // Token cost of history
console.log(context.totalTokens); // Total tokens
console.log(context.metadata.historyStats); // Stats: {recentSuccesses, recentFailures, relatedTasks}
```

### Using in LLM Calls

Inject history into system prompts:

```javascript
import { simpleLLMCall } from './utils/agent-claude-api.js';
import { buildAgentContext } from './utils/context-builder.js';

// Get enriched context
const enrichedContext = await buildAgentContext('developer', ['base.md', 'developer.md'], task);

// Make LLM call with task history
const response = await simpleLLMCall('developer', task.id, {
  prompt: 'Fix this bug...',
  systemPrompt: 'You are an expert developer...',
  taskHistory: enrichedContext.historyContext, // ← Inject history here
  maxTokens: 2000,
});
```

### Direct BaseAgent Method

BaseAgent provides a convenience method:

```javascript
class DeveloperAgent extends BaseAgent {
  async fixBug(task) {
    // Get context with history for this task
    const context = await this.getContextForTask(task);

    // Use context.fullContext in LLM prompts
    // (assumes task.context_json has already been parsed into an object)
    const fixPrompt = `
${context.fullContext}

Fix this bug: ${task.context_json.error_message}
`;

    // ...pass fixPrompt to the LLM call
  }
}
```

## History Format

### Successful Tasks

```markdown
## Task History (Learning Context)

### Recent Successful Approaches (Last 7 Days)

- **fix_bug** (Task #42)
  - Files: src/scoring.js
  - Approach: Added null check before accessing property
  - Duration: 3s

- **implement_feature** (Task #38)
  - Files: src/outreach/sms.js, tests/outreach-sms.test.js
  - Approach: Implemented retry logic with exponential backoff
  - Duration: 45s
```

### Failed Tasks

```markdown
### Past Failures to Avoid (Last 7 Days)

- **fix_bug** (Task #40)
  - Error: Twilio API timeout after 30 seconds
  - Error Type: api_timeout
  - File: src/outreach/sms.js

- **implement_feature** (Task #35)
  - Error: Test coverage below 80% ([file:line])
  - Error Type: coverage_gate
```

### Related Tasks

```markdown
### Related Tasks (Similar Context)

- ✓ **fix_bug** (Task #28) - completed
  - File: src/capture.js
  - What worked: Added null check before accessing browser context

- ✗ **refactor_code** (Task #32) - failed
  - File: src/capture.js
  - What failed: Broke existing tests by changing function signature
```

## Configuration

### Environment Variables

```bash
# Enable/disable task history (default: true)
AGENT_ENABLE_TASK_HISTORY=true

# Cache TTL (default: 30 minutes)
# Internal constant, not configurable via env var
```

### Limits

- **Recent Successes**: Last 10 successful tasks (7 days max)
- **Recent Failures**: Last 5 failed tasks (7 days max)
- **Related Tasks**: Last 5 related tasks (30 days max)
- **Cache TTL**: 30 minutes (reduces DB queries)

## Benefits

### 1. Pattern Recognition

Agents learn which approaches work for specific error types:

```
"For null_pointer errors in Playwright code:
 - Past success: Added null check before accessing page.context()
 - Past failure: Wrapping in try-catch without fixing root cause
 → Apply null check pattern"
```

### 2. Avoiding Repeated Mistakes

```
"File src/capture.js:
 - Previous failure: Changed function signature, broke 8 tests
 → This time: Keep function signature, add optional parameter"
```

### 3. Faster Resolution

- **Without history**: Agent tries various approaches, some of which fail
- **With history**: Agent sees what worked before and applies it immediately

### 4. Consistency

The goal is for agents to learn from each other's experiences (full cross-agent sharing is still listed under Future Enhancements):

- Developer fixes bug → QA sees the pattern
- Security finds vulnerability → Developer knows to check for it

## Performance Impact

### Token Usage

- **Base context**: ~10-15KB (varies by agent)
- **History context**: ~2-5KB (depends on history size)
- **Total overhead**: ~20-40% token increase
- **Benefit**: Less trial and error, faster resolution

### Example Token Breakdown

```
Developer Agent with History:
- Base context: 12KB = ~3,000 tokens
- History (5 successes, 2 failures): 3KB = ~750 tokens
- Total: 15KB = ~3,750 tokens
- Overhead: +25% tokens
```

### Caching

Context builder caches history for 30 minutes:

- **First call**: Query DB for history
- **Subsequent calls**: Return cached data
- **Cache invalidation**: After 30 minutes

## Querying Task History

### Get Recent Outcomes

```javascript
// Via BaseAgent method
const insights = await agent.learnFromPastOutcomes('fix_bug', 50);

console.log(insights);
// {
//   task_type: 'fix_bug',
//   total_outcomes: 42,
//   success_count: 35,
//   failure_count: 7,
//   success_rate: 83.33,
//   avg_duration_ms: 2500,
//   context_patterns: {
//     'null_pointer': { total: 12, successes: 10, success_rate: 83 },
//     'api_timeout': { total: 8, successes: 5, success_rate: 62 }
//   },
//   success_patterns: [
//     'Successfully handled: capture.js',
//     'Added null check before accessing property'
//   ],
//   failure_patterns: [
//     'Twilio API timeout after [N] seconds',
//     'Test coverage below [N]%'
//   ],
//   recommendations: [
//     'Continue using successful approaches: Added null check before accessing property',
//     'Avoid common failure pattern: Twilio API timeout after [N] seconds'
//   ]
// }
```

### Direct Database Queries

```sql
-- Get success rate by agent and task type
SELECT
  agent_name,
  task_type,
  COUNT(*) as total,
  SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END) as successes,
  ROUND(100.0 * SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END) / COUNT(*), 2) as success_rate
FROM agent_outcomes
WHERE created_at > datetime('now', '-7 days')
GROUP BY agent_name, task_type
ORDER BY success_rate DESC;

-- Find patterns for specific error type
SELECT
  task_id,
  outcome,
  duration_ms,
  context_json,
  result_json
FROM agent_outcomes
WHERE agent_name = 'developer'
  AND task_type = 'fix_bug'
  AND context_json LIKE '%null_pointer%'
ORDER BY created_at DESC
LIMIT 10;
```

## Testing

### Test Coverage

```bash
# Run context-builder tests
npm test tests/agents/context-builder.test.js
```

### Test Scenarios

1. **No history**: Returns empty history message
2. **With successful tasks**: Includes success section
3. **With failed tasks**: Includes failure section
4. **With related tasks**: Includes related tasks by file/error type
5. **Caching**: Verifies cache behavior
6. **Disabled via env var**: Skips history when disabled
7. **Token estimation**: Validates token counting
8. **Mixed history**: Success + failure combinations

## Troubleshooting

### History Not Showing

**Symptom**: LLM calls don't include task history

**Check**:

```bash
# Verify env var not disabled
echo $AGENT_ENABLE_TASK_HISTORY  # Should be empty or 'true'

# Check outcomes table has data
sqlite3 db/sites.db "SELECT COUNT(*) FROM agent_outcomes;"

# Verify agent is recording outcomes
sqlite3 db/sites.db "SELECT * FROM agent_outcomes ORDER BY created_at DESC LIMIT 5;"
```

### High Token Usage

**Symptom**: Context tokens unexpectedly high

**Solution**:

```bash
# Check history size
sqlite3 db/sites.db "SELECT agent_name, COUNT(*) FROM agent_outcomes GROUP BY agent_name;"

# Prune outcomes older than the 30-day lookback window
# (housekeeping only: history retrieval is already capped, see Limits)
sqlite3 db/sites.db "DELETE FROM agent_outcomes WHERE created_at < datetime('now', '-30 days');"

# Or disable history temporarily
export AGENT_ENABLE_TASK_HISTORY=false
```

### Cache Not Working

**Symptom**: Every call queries the database

**Debug**:

```javascript
import { clearCache } from './utils/context-builder.js';

// Clear cache manually
clearCache();

// Check if being called with different parameters
// (each parameter combination has a separate cache key)
```

## Best Practices

### 1. Always Record Outcomes

```javascript
// ✅ Good: Record detailed outcomes
await this.recordOutcome(
  taskId,
  'success',
  {
    task_type: 'fix_bug',
    file_path: 'src/capture.js',
    error_type: 'null_pointer',
    duration_ms: 2500,
  },
  {
    approach: 'Added null check before accessing page.context()',
    files_changed: ['src/capture.js'],
  }
);

// ❌ Bad: No outcome recording
await this.completeTask(taskId);
```

### 2. Include Enough Context

```javascript
// ✅ Good: Include file path, error type
const context = {
  error_type: 'null_pointer',
  file_path: 'src/capture.js',
  error_message: 'Cannot read property context of null',
};

// ❌ Bad: Too vague
const vagueContext = {
  error: 'Something failed',
};
```

### 3. Use in Complex Tasks

```javascript
// ✅ Good: Use history for complex debugging
async fixBug(task) {
  const context = await this.getContextForTask(task);
  // LLM benefits from past bug fixes
}

// ⚠️ OK: Skip for simple tasks
async scanSecrets(task) {
  // Simple regex-based, no need for history
}
```

### 4. Monitor Token Usage

```javascript
const context = await buildAgentContext('developer', ['base.md', 'developer.md'], task);

if (context.historyTokens > 1000) {
  // History is getting large, consider reducing limits
  console.warn(`History using ${context.historyTokens} tokens`);
}
```

## Future Enhancements

### Planned Improvements

1. **Similarity Scoring**: Rank related tasks by relevance, not just file match
2. **Pattern Extraction**: Auto-detect successful patterns from outcomes
3. **Adaptive Limits**: Dynamically adjust history size based on task complexity
4. **Cross-Agent Learning**: Share patterns across different agents
5. **Long-Term Memory**: Persistent embeddings for very old but relevant outcomes

### Experimental Features

```javascript
// Not yet implemented
const context = await buildAgentContext('developer', ['base.md', 'developer.md'], task, {
  includePatterns: true, // Auto-extracted patterns
  semanticSearch: true, // Embedding-based similarity
  crossAgentLearning: true, // Include other agents' outcomes
  adaptiveHistorySize: true, // Dynamic limits based on task
});
```

## References

- **Implementation**: `src/agents/utils/context-builder.js`
- **Integration**: `src/agents/base-agent.js` (recordOutcome, learnFromPastOutcomes, getContextForTask)
- **API**: `src/agents/utils/agent-claude-api.js` (simpleLLMCall with taskHistory)
- **Tests**: `tests/agents/context-builder.test.js`
- **Schema**: `db/migrations/052-create-agent-outcomes.sql`
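
The token figures under **Performance Impact** imply a rough ratio of about 4 characters per token (12KB ≈ 3,000 tokens). The sketch below shows that arithmetic. It is only an illustration: `estimateTokens` and `historyOverheadPercent` are hypothetical names, and the real estimator in `context-builder.js` may count tokens differently.

```javascript
// Hypothetical sketch, not the estimator in context-builder.js.
// Assumes ~4 characters per token, the ratio implied by "12KB = ~3,000 tokens".
// String length counts UTF-16 code units, used here as a stand-in for bytes.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  if (!text) return 0;
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Extra tokens added by the history section, as a percentage of the base context.
function historyOverheadPercent(baseContext, historyContext) {
  const baseTokens = estimateTokens(baseContext);
  if (baseTokens === 0) return 0;
  return Math.round((100 * estimateTokens(historyContext)) / baseTokens);
}

// Reproduce the Example Token Breakdown: 12,000 characters of base context
// plus 3,000 characters of history.
const base = 'x'.repeat(12000);
const history = 'y'.repeat(3000);
console.log(estimateTokens(base)); // 3000
console.log(estimateTokens(history)); // 750
console.log(estimateTokens(base + history)); // 3750
console.log(historyOverheadPercent(base, history)); // 25
```

A character-based heuristic is cheap and good enough for budget warnings such as the `historyTokens > 1000` check. Exact billing requires the model's own tokenizer.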
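
The 30-minute history cache described under **Caching** gives each parameter combination its own entry, as noted under **Cache Not Working**. It could be built as a small time-to-live map. This is a hypothetical sketch. `HistoryCache`, `makeCacheKey`, and the choice of key fields (agent, task type, file, error type) are assumptions, not the internals of `context-builder.js`.

```javascript
// Hypothetical sketch of a 30-minute history cache keyed by call parameters.
// HistoryCache and makeCacheKey are illustrative names, not the real internals.
const CACHE_TTL_MS = 30 * 60 * 1000; // 30 minutes, per the Caching section

// Each distinct parameter combination produces a distinct key, so calls with
// different parameters never share cached history.
function makeCacheKey({ agentName, taskType, filePath = '', errorType = '' }) {
  return JSON.stringify([agentName, taskType, filePath, errorType]);
}

class HistoryCache {
  constructor(ttlMs = CACHE_TTL_MS, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock so expiry can be tested
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: the next call must query the DB again
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear() {
    this.entries.clear();
  }

  // Return cached history, or run the async loader and cache its result.
  async getOrLoad(key, loader) {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await loader();
    this.set(key, value);
    return value;
  }
}

// Usage: the second call within 30 minutes is served from memory.
const cache = new HistoryCache();
const key = makeCacheKey({ agentName: 'developer', taskType: 'fix_bug', filePath: 'src/capture.js' });
let dbQueries = 0;
const loadFromDb = async () => {
  dbQueries += 1; // stands in for the agent_outcomes queries
  return { recentSuccesses: [], recentFailures: [], relatedTasks: [] };
};
cache
  .getOrLoad(key, loadFromDb)
  .then(() => cache.getOrLoad(key, loadFromDb))
  .then(() => console.log(dbQueries)); // 1
```

Because `getOrLoad` caches the resolved value, two concurrent misses for the same key would both query the database. Caching the pending promise instead would collapse them into a single query.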
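
The numeric part of `learnFromPastOutcomes()` can be derived from `agent_outcomes` rows roughly as follows. That part covers counts, success rate, average duration, and per-error-type patterns. This is a hypothetical sketch, and `summarizeOutcomes` is an illustrative name. It rests on several assumptions:

- `error_type` is stored inside `context_json`.
- Both rates are rounded to two decimals (the example output above shows whole numbers for per-pattern rates).
- The text-based `success_patterns`, `failure_patterns`, and `recommendations` fields are omitted.

```javascript
// Hypothetical sketch of the statistics behind learnFromPastOutcomes().
// Assumes rows shaped like agent_outcomes, with an optional error_type inside
// context_json. Not the actual BaseAgent implementation.
function round2(value) {
  return Math.round(value * 100) / 100;
}

function parseErrorType(contextJson) {
  if (!contextJson) return undefined;
  try {
    return JSON.parse(contextJson).error_type;
  } catch {
    return undefined; // malformed or non-object JSON: treat as "no error type"
  }
}

function summarizeOutcomes(taskType, rows) {
  const relevant = rows.filter((row) => row.task_type === taskType);
  const successCount = relevant.filter((row) => row.outcome === 'success').length;
  const durations = relevant
    .map((row) => row.duration_ms)
    .filter((ms) => typeof ms === 'number');

  // Per-error-type tallies, e.g. { null_pointer: { total, successes, success_rate } }
  const contextPatterns = {};
  for (const row of relevant) {
    const errorType = parseErrorType(row.context_json);
    if (!errorType) continue;
    const stats = contextPatterns[errorType] ?? { total: 0, successes: 0, success_rate: 0 };
    stats.total += 1;
    if (row.outcome === 'success') stats.successes += 1;
    stats.success_rate = round2((100 * stats.successes) / stats.total);
    contextPatterns[errorType] = stats;
  }

  return {
    task_type: taskType,
    total_outcomes: relevant.length,
    success_count: successCount,
    failure_count: relevant.length - successCount,
    success_rate: relevant.length ? round2((100 * successCount) / relevant.length) : 0,
    avg_duration_ms: durations.length
      ? Math.round(durations.reduce((sum, ms) => sum + ms, 0) / durations.length)
      : 0,
    context_patterns: contextPatterns,
  };
}
```

Fed the 42 `fix_bug` outcomes from the documented example (35 successes), this sketch yields `success_rate: 83.33`, matching the example output. Rows without a numeric `duration_ms` are left out of the average rather than counted as zero.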