---
title: Agent Task History & Learning
category: agents
last_verified: 2026-02-16
tags: [agents, learning, context, memory, performance]
status: implemented
related_files:
  - src/agents/utils/context-builder.js
  - src/agents/utils/agent-claude-api.js
  - src/agents/base-agent.js
  - tests/agents/context-builder.test.js
---

# Agent Task History & Learning

## Overview

The agent system implements **task history and learning** so that agents can improve over time by drawing on past successes and failures. The learning happens in context: each task's outcome is recorded, and relevant history is fed back into later LLM prompts.

Every agent can access:

- **Recent successful tasks** - Patterns that work
- **Recent failures** - Mistakes to avoid
- **Related tasks** - Tasks involving the same files or error types

This history is automatically injected into LLM calls, giving agents a memory of what has and has not worked.

## How It Works

### Architecture

```
┌─────────────────┐
│  Agent Task     │
│  (current)      │
└────────┬────────┘
         │
         v
┌─────────────────┐        ┌──────────────────┐
│ Context Builder │───────>│ agent_outcomes   │
│                 │        │ (history DB)     │
└────────┬────────┘        └──────────────────┘
         │
         v
┌─────────────────┐
│ Enriched        │
│ Context =       │
│ Base + History  │
└────────┬────────┘
         │
         v
┌─────────────────┐
│ LLM Call        │
│ (with learning) │
└─────────────────┘
```

### Data Flow

1. **Outcome Recording**: After completing a task, the agent records its outcome in the `agent_outcomes` table
2. **History Retrieval**: When processing a new task, the agent retrieves relevant history
3. **Context Enrichment**: The history is formatted and appended to the base context
4. **LLM Call**: The enriched context is sent to the Claude API
5. **Learning**: The LLM sees which approaches succeeded and which failed, and can reuse or avoid them

## Database Schema

### agent_outcomes Table

Stores task outcomes for learning:

```sql
CREATE TABLE agent_outcomes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id INTEGER NOT NULL REFERENCES agent_tasks(id) ON DELETE CASCADE,
    agent_name TEXT NOT NULL,
    task_type TEXT NOT NULL,
    outcome TEXT NOT NULL CHECK(outcome IN ('success', 'failure')),
    context_json TEXT,  -- Task-specific context (error_type, file_path, etc.)
    result_json TEXT,   -- Task result details (what worked, what didn't)
    duration_ms INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_agent_outcomes_agent ON agent_outcomes(agent_name, task_type);
CREATE INDEX idx_agent_outcomes_task_type ON agent_outcomes(task_type, outcome);
CREATE INDEX idx_agent_outcomes_outcome ON agent_outcomes(outcome, created_at);
```
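
To make the schema concrete, here is a minimal sketch of how a caller might turn an outcome into a row shaped like `agent_outcomes`. The helper `toOutcomeRow` is hypothetical (it is not part of `base-agent.js`), and it assumes, as in the `recordOutcome()` examples in this document, that `task_type` and `duration_ms` travel inside the context object. It mirrors the `CHECK` constraint on `outcome` and stores context and result as JSON text.

```javascript
// Hypothetical helper (not part of base-agent.js): builds a row object
// matching the agent_outcomes columns, minus id and created_at (DB defaults).
const VALID_OUTCOMES = new Set(['success', 'failure']);

function toOutcomeRow(taskId, agentName, outcome, context = {}, result = null) {
  // Mirror CHECK(outcome IN ('success', 'failure')) before touching the DB
  if (!VALID_OUTCOMES.has(outcome)) {
    throw new Error(`Invalid outcome: ${outcome}`);
  }
  // task_type is NOT NULL in the schema; assumed to live in the context object
  if (!context.task_type) {
    throw new Error('task_type is required (column is NOT NULL)');
  }
  return {
    task_id: taskId,
    agent_name: agentName,
    task_type: context.task_type,
    outcome,
    context_json: JSON.stringify(context), // error_type, file_path, etc.
    result_json: result === null ? null : JSON.stringify(result), // what worked / what didn't
    duration_ms: context.duration_ms ?? null,
  };
}

// Example: a successful bug fix
const row = toOutcomeRow(
  42,
  'developer',
  'success',
  { task_type: 'fix_bug', file_path: 'src/scoring.js', duration_ms: 3000 },
  { approach: 'Added null check before accessing property' },
);
console.log(row.context_json);
```

Validating in application code is redundant with the database constraint, but it surfaces a bad `outcome` value with a clearer error than a SQLite constraint failure.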

## Usage Examples

### Recording Outcomes

Agents automatically record outcomes after task completion:

```javascript
// In BaseAgent.executeTask()
const startTime = Date.now();

try {
  await this.processTask(task);

  // Record success
  await this.recordOutcome(task.id, 'success', {
    task_type: task.task_type,
    duration_ms: Date.now() - startTime,
  });
} catch (error) {
  // Record failure, including error details for later analysis
  await this.recordOutcome(task.id, 'failure', {
    task_type: task.task_type,
    error: error.message,
    stack: error.stack,
    duration_ms: Date.now() - startTime,
  });
}
```

### Getting Enriched Context

Agents can request context enriched with task history:

```javascript
import { buildAgentContext } from './utils/context-builder.js';

// Get context with task history
const context = await buildAgentContext('developer', ['base.md', 'developer.md'], currentTask);

console.log(context.fullContext); // Base + history combined
console.log(context.baseContext); // Original base context
console.log(context.historyContext); // Just the history section
console.log(context.historyTokens); // Token cost of history
console.log(context.totalTokens); // Total tokens
console.log(context.metadata.historyStats); // Stats: {recentSuccesses, recentFailures, relatedTasks}
```
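
The relationship between these fields can be sketched as follows. This is an illustrative assumption, not the code in `context-builder.js`: `composeContext` and `estimateTokens` are hypothetical names, and the token estimate uses a rough 4-characters-per-token heuristic, consistent with the "12KB = ~3,000 tokens" figure in the Performance Impact section.

```javascript
// Hypothetical sketch of how the enriched-context fields relate.
// Assumes ~4 characters per token, a rough heuristic for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function composeContext(baseContext, historyContext, historyStats) {
  // Only add a separator when there is history to append
  const fullContext = historyContext
    ? `${baseContext}\n\n${historyContext}`
    : baseContext;
  return {
    fullContext,
    baseContext,
    historyContext,
    historyTokens: estimateTokens(historyContext),
    totalTokens: estimateTokens(fullContext),
    metadata: { historyStats },
  };
}

const ctx = composeContext(
  '# Developer agent instructions...',
  '## Task History (Learning Context)\n...',
  { recentSuccesses: 2, recentFailures: 0, relatedTasks: 1 },
);
console.log(ctx.historyTokens, ctx.totalTokens);
```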

### Using in LLM Calls

Inject history into the system prompt by passing it to `simpleLLMCall()` as `taskHistory`:

```javascript
import { simpleLLMCall } from './utils/agent-claude-api.js';
import { buildAgentContext } from './utils/context-builder.js';

// Get enriched context
const enrichedContext = await buildAgentContext('developer', ['base.md', 'developer.md'], task);

// Make LLM call with task history
const response = await simpleLLMCall('developer', task.id, {
  prompt: 'Fix this bug...',
  systemPrompt: 'You are an expert developer...',
  taskHistory: enrichedContext.historyContext, // ← Inject history here
  maxTokens: 2000,
});
```

### Direct BaseAgent Method

BaseAgent provides a convenience method, `getContextForTask()`:

```javascript
class DeveloperAgent extends BaseAgent {
  async fixBug(task) {
    // Get context with history for this task
    const context = await this.getContextForTask(task);

    // context_json is stored as TEXT, so parse it if it arrives as a string
    const taskContext =
      typeof task.context_json === 'string' ? JSON.parse(task.context_json) : task.context_json;

    // Use context.fullContext in LLM prompts
    const fixPrompt = `
      ${context.fullContext}

      Fix this bug: ${taskContext.error_message}
    `;

    // ...then send fixPrompt to the LLM (e.g. via simpleLLMCall)
  }
}
```

## History Format

### Successful Tasks

```markdown
## Task History (Learning Context)

### Recent Successful Approaches (Last 7 Days)

- **fix_bug** (Task #42)
  - Files: src/scoring.js
  - Approach: Added null check before accessing property
  - Duration: 3s

- **implement_feature** (Task #38)
  - Files: src/outreach/sms.js, tests/outreach-sms.test.js
  - Approach: Implemented retry logic with exponential backoff
  - Duration: 45s
```
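
A formatter producing the "Recent Successful Approaches" subsection might look like the sketch below. It is hypothetical: the function name and the input fields (`task_id`, `task_type`, `files`, `approach`, `duration_ms`) are assumptions about the shape of parsed `agent_outcomes` rows, and it emits only the subsection, not the `## Task History (Learning Context)` wrapper heading.

```javascript
// Hypothetical formatter reproducing the "Recent Successful Approaches"
// subsection. Input field names are assumptions about parsed outcome rows.
function formatSuccessSection(successes, days = 7) {
  const lines = [`### Recent Successful Approaches (Last ${days} Days)`, ''];
  for (const s of successes) {
    lines.push(`- **${s.task_type}** (Task #${s.task_id})`);
    if (s.files?.length) lines.push(`  - Files: ${s.files.join(', ')}`);
    if (s.approach) lines.push(`  - Approach: ${s.approach}`);
    if (s.duration_ms != null) {
      // Render milliseconds as whole seconds, as in the example above
      lines.push(`  - Duration: ${Math.round(s.duration_ms / 1000)}s`);
    }
    lines.push(''); // blank line between entries
  }
  // Drop the trailing blank line after the last entry
  return lines.join('\n').trimEnd();
}

console.log(formatSuccessSection([
  {
    task_type: 'fix_bug',
    task_id: 42,
    files: ['src/scoring.js'],
    approach: 'Added null check before accessing property',
    duration_ms: 3000,
  },
]));
```

Optional fields are skipped rather than printed as empty bullets, which keeps the section short when an outcome was recorded with little detail.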

### Failed Tasks

```markdown
### Past Failures to Avoid (Last 7 Days)

- **fix_bug** (Task #40)
  - Error: Twilio API timeout after 30 seconds
  - Error Type: api_timeout
  - File: src/outreach/sms.js

- **implement_feature** (Task #35)
  - Error: Test coverage below 80% ([file:line])
  - Error Type: coverage_gate
```

### Related Tasks

```markdown
### Related Tasks (Similar Context)

- ✓ **fix_bug** (Task #28) - completed
  - File: src/capture.js
  - What worked: Added null check before accessing browser context

- ✗ **refactor_code** (Task #32) - failed
  - File: src/capture.js
  - What failed: Broke existing tests by changing function signature
```
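
Related-task selection (same file or same error type, within the 30-day window listed under Limits) can be sketched as below. This is a hypothetical illustration, not the context builder's query: `findRelatedTasks` and `formatRelated` are invented names, and `created_at` is assumed to be epoch milliseconds rather than the SQLite `DATETIME` text the table actually stores.

```javascript
// Hypothetical sketch of related-task selection: match on shared file_path
// or error_type, keep a 30-day window, newest first, at most 5 results.
// Assumes created_at is epoch milliseconds.
const DAY_MS = 24 * 60 * 60 * 1000;

function findRelatedTasks(outcomes, current, { limit = 5, days = 30, now = Date.now() } = {}) {
  const cutoff = now - days * DAY_MS;
  return outcomes
    .filter((o) => o.task_id !== current.task_id) // never relate a task to itself
    .filter((o) => o.created_at >= cutoff)
    .filter(
      (o) =>
        (current.file_path && o.file_path === current.file_path) ||
        (current.error_type && o.error_type === current.error_type),
    )
    .sort((a, b) => b.created_at - a.created_at)
    .slice(0, limit);
}

// Render the header line with the ✓ / ✗ markers used in the example above
function formatRelated(task) {
  const ok = task.outcome === 'success';
  return `- ${ok ? '✓' : '✗'} **${task.task_type}** (Task #${task.task_id}) - ${ok ? 'completed' : 'failed'}`;
}
```

Matching on exact `file_path` or `error_type` is the simple rule this section describes; ranking by relevance is listed under Future Enhancements as similarity scoring.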

## Configuration

### Environment Variables

```bash
# Enable/disable task history (default: true)
AGENT_ENABLE_TASK_HISTORY=true

# Cache TTL: 30 minutes (internal constant, not configurable via env var)
```
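
The Troubleshooting section says the variable should be empty or `'true'` for history to appear. A default-on check consistent with that might look like the sketch below; `isTaskHistoryEnabled` is a hypothetical name, and treating only the literal `false` (case-insensitive) as "off" is an assumption about the real parsing.

```javascript
// Hypothetical reading of AGENT_ENABLE_TASK_HISTORY: history is on by
// default and turned off only by an explicit "false" (case-insensitive).
function isTaskHistoryEnabled(env = process.env) {
  const value = env.AGENT_ENABLE_TASK_HISTORY;
  if (value === undefined) return true; // unset → default: enabled
  return value.trim().toLowerCase() !== 'false';
}

console.log(isTaskHistoryEnabled());
```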

### Limits

- **Recent Successes**: Last 10 successful tasks (7 days max)
- **Recent Failures**: Last 5 failed tasks (7 days max)
- **Related Tasks**: Last 5 related tasks (30 days max)
- **Cache TTL**: 30 minutes (reduces DB queries)
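
Applying the success and failure limits amounts to "filter by outcome and age, newest first, then cap". The sketch below shows that under stated assumptions: `selectRecent` and `HISTORY_LIMITS` are hypothetical names, and `created_at` is treated as epoch milliseconds rather than SQLite `DATETIME` text.

```javascript
// Hypothetical sketch of the success/failure limits listed above.
// Assumes created_at is epoch milliseconds.
const DAY_MS = 24 * 60 * 60 * 1000;

const HISTORY_LIMITS = {
  success: { count: 10, days: 7 },
  failure: { count: 5, days: 7 },
};

function selectRecent(outcomes, outcome, now = Date.now()) {
  const { count, days } = HISTORY_LIMITS[outcome];
  const cutoff = now - days * DAY_MS;
  return outcomes
    .filter((o) => o.outcome === outcome && o.created_at >= cutoff)
    .sort((a, b) => b.created_at - a.created_at) // newest first
    .slice(0, count);
}
```

Capping by count as well as by age keeps the history section's token cost bounded even on days with many recorded outcomes.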

## Benefits

### 1. Pattern Recognition

Agents learn which approaches work for specific error types:

```
"For null_pointer errors in Playwright code:
 - Past success: Added null check before accessing page.context()
 - Past failure: Wrapping in try-catch without fixing root cause
 → Apply null check pattern"
```

### 2. Avoiding Repeated Mistakes

```
"File src/capture.js:
 - Previous failure: Changed function signature, broke 8 tests
 → This time: Keep function signature, add optional parameter"
```

### 3. Faster Resolution

- **Without history**: Agent tries various approaches, some of which fail
- **With history**: Agent sees what worked before and applies it immediately

### 4. Consistency

All agents record outcomes in the shared `agent_outcomes` table, so their experiences accumulate in one place:

- Developer fixes bug → QA sees the pattern
- Security finds vulnerability → Developer knows to check for it

## Performance Impact

### Token Usage

- **Base context**: ~10-15KB (varies by agent)
- **History context**: ~2-5KB (depends on history size)
- **Total overhead**: ~20-40% token increase
- **Benefit**: Less trial and error, faster resolution

### Example Token Breakdown

```
Developer Agent with History:
- Base context: 12KB = ~3,000 tokens
- History (5 successes, 2 failures): 3KB = ~750 tokens
- Total: 15KB = ~3,750 tokens
- Overhead: +25% tokens
```

### Caching

The context builder caches history for 30 minutes:

- **First call**: Queries the DB for history
- **Subsequent calls**: Return cached data
- **Cache invalidation**: Entries expire after 30 minutes
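
A time-to-live cache with this behavior, keyed so that each parameter combination gets its own entry (as the Troubleshooting section notes), can be sketched as follows. `createTtlCache` is a hypothetical name and the JSON-stringified key is an assumption; the clock is injectable so expiry can be tested without waiting 30 minutes.

```javascript
// Hypothetical TTL cache keyed by the call's parameters, so each parameter
// combination gets its own entry. `now` is injectable for testing.
function createTtlCache(ttlMs = 30 * 60 * 1000, now = () => Date.now()) {
  const entries = new Map();
  return {
    key: (...params) => JSON.stringify(params),
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() - entry.storedAt >= ttlMs) {
        entries.delete(key); // expired: caller falls back to a fresh DB query
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, storedAt: now() });
    },
    clear() {
      entries.clear();
    },
  };
}

// Usage: look up history, filling the cache on a miss
const cache = createTtlCache();
const cacheKey = cache.key('developer', ['base.md', 'developer.md'], 42);
if (cache.get(cacheKey) === undefined) {
  cache.set(cacheKey, '## Task History (Learning Context)\n...'); // stand-in for a DB query
}
```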

## Querying Task History

### Get Recent Outcomes

```javascript
// Via BaseAgent method
const insights = await agent.learnFromPastOutcomes('fix_bug', 50);

console.log(insights);
// {
//   task_type: 'fix_bug',
//   total_outcomes: 42,
//   success_count: 35,
//   failure_count: 7,
//   success_rate: 83.33,
//   avg_duration_ms: 2500,
//   context_patterns: {
//     'null_pointer': { total: 12, successes: 10, success_rate: 83 },
//     'api_timeout': { total: 8, successes: 5, success_rate: 62 }
//   },
//   success_patterns: [
//     'Successfully handled: capture.js',
//     'Added null check before accessing property'
//   ],
//   failure_patterns: [
//     'Twilio API timeout after [N] seconds',
//     'Test coverage below [N]%'
//   ],
//   recommendations: [
//     'Continue using successful approaches: Added null check before accessing property',
//     'Avoid common failure pattern: Twilio API timeout after [N] seconds'
//   ]
// }
```
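
The top-level numbers in `insights` follow from simple counting. The sketch below is a hypothetical reconstruction, not the `learnFromPastOutcomes` implementation: it covers only the count, rate, and average fields, rounding `success_rate` to two decimals so that 35 successes out of 42 gives 83.33.

```javascript
// Hypothetical sketch of the top-level fields in the insights object.
// Not the actual learnFromPastOutcomes implementation.
function summarizeOutcomes(taskType, outcomes) {
  const total = outcomes.length;
  const successCount = outcomes.filter((o) => o.outcome === 'success').length;
  const durations = outcomes
    .map((o) => o.duration_ms)
    .filter((d) => typeof d === 'number');
  return {
    task_type: taskType,
    total_outcomes: total,
    success_count: successCount,
    failure_count: total - successCount,
    // Percentage rounded to two decimals, e.g. 35 of 42 -> 83.33
    success_rate: total ? Math.round((10000 * successCount) / total) / 100 : 0,
    avg_duration_ms: durations.length
      ? Math.round(durations.reduce((acc, d) => acc + d, 0) / durations.length)
      : null,
  };
}
```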

### Direct Database Queries

```sql
-- Get success rate by agent and task type
SELECT
  agent_name,
  task_type,
  COUNT(*) as total,
  SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END) as successes,
  ROUND(100.0 * SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END) / COUNT(*), 2) as success_rate
FROM agent_outcomes
WHERE created_at > datetime('now', '-7 days')
GROUP BY agent_name, task_type
ORDER BY success_rate DESC;

-- Find patterns for specific error type
SELECT
  task_id,
  outcome,
  duration_ms,
  context_json,
  result_json
FROM agent_outcomes
WHERE agent_name = 'developer'
  AND task_type = 'fix_bug'
  AND context_json LIKE '%null_pointer%'
ORDER BY created_at DESC
LIMIT 10;
```

## Testing

### Test Coverage

```bash
# Run context-builder tests
npm test tests/agents/context-builder.test.js
```

### Test Scenarios

1. **No history**: Returns empty history message
2. **With successful tasks**: Includes success section
3. **With failed tasks**: Includes failure section
4. **With related tasks**: Includes related tasks by file/error type
5. **Caching**: Verifies cache behavior
6. **Disabled via env var**: Skips history when disabled
7. **Token estimation**: Validates token counting
8. **Mixed history**: Success + failure combinations

## Troubleshooting

### History Not Showing

**Symptom**: LLM calls don't include task history

**Check**:

```bash
# Verify env var not disabled
echo $AGENT_ENABLE_TASK_HISTORY  # Should be empty or 'true'

# Check outcomes table has data
sqlite3 db/sites.db "SELECT COUNT(*) FROM agent_outcomes;"

# Verify agent is recording outcomes
sqlite3 db/sites.db "SELECT * FROM agent_outcomes ORDER BY created_at DESC LIMIT 5;"
```

### High Token Usage

**Symptom**: Context tokens unexpectedly high

**Solution**:

```bash
# Check history size
sqlite3 db/sites.db "SELECT agent_name, COUNT(*) FROM agent_outcomes GROUP BY agent_name;"

# Prune outcomes older than the 30-day lookback window
sqlite3 db/sites.db "DELETE FROM agent_outcomes WHERE created_at < datetime('now', '-30 days');"

# Or disable history temporarily
export AGENT_ENABLE_TASK_HISTORY=false
```

### Cache Not Working

**Symptom**: Every call queries the database

**Debug**:

```javascript
import { clearCache } from './utils/context-builder.js';

// Clear cache manually
clearCache();

// Check whether calls use different parameters
// (each parameter combination has a separate cache key)
```

## Best Practices

### 1. Always Record Outcomes

```javascript
// ✅ Good: Record detailed outcomes
await this.recordOutcome(
  taskId,
  'success',
  {
    task_type: 'fix_bug',
    file_path: 'src/capture.js',
    error_type: 'null_pointer',
    duration_ms: 2500,
  },
  {
    approach: 'Added null check before accessing page.context()',
    files_changed: ['src/capture.js'],
  }
);

// ❌ Bad: No outcome recording
await this.completeTask(taskId);
```

### 2. Include Enough Context

```javascript
// ✅ Good: Include file path, error type
const goodContext = {
  error_type: 'null_pointer',
  file_path: 'src/capture.js',
  error_message: 'Cannot read property context of null',
};

// ❌ Bad: Too vague
const badContext = {
  error: 'Something failed',
};
```

### 3. Use in Complex Tasks

```javascript
// ✅ Good: Use history for complex debugging
async fixBug(task) {
  const context = await this.getContextForTask(task);
  // LLM benefits from past bug fixes
}

// ⚠️ OK: Skip for simple tasks
async scanSecrets(task) {
  // Simple regex-based, no need for history
}
```

### 4. Monitor Token Usage

```javascript
const context = await buildAgentContext('developer', ['base.md', 'developer.md'], task);

if (context.historyTokens > 1000) {
  // History is getting large, consider reducing limits
  console.warn(`History using ${context.historyTokens} tokens`);
}
```

## Future Enhancements

### Planned Improvements

1. **Similarity Scoring**: Rank related tasks by relevance, not just file match
2. **Pattern Extraction**: Auto-detect successful patterns from outcomes
3. **Adaptive Limits**: Dynamically adjust history size based on task complexity
4. **Cross-Agent Learning**: Share patterns across different agents
5. **Long-Term Memory**: Persistent embeddings for very old but relevant outcomes

### Experimental Features

```javascript
// Not yet implemented
const context = await buildAgentContext('developer', ['base.md', 'developer.md'], task, {
  includePatterns: true, // Auto-extracted patterns
  semanticSearch: true, // Embedding-based similarity
  crossAgentLearning: true, // Include other agents' outcomes
  adaptiveHistorySize: true, // Dynamic limits based on task
});
```

## References

- **Implementation**: `src/agents/utils/context-builder.js`
- **Integration**: `src/agents/base-agent.js` (recordOutcome, learnFromPastOutcomes, getContextForTask)
- **API**: `src/agents/utils/agent-claude-api.js` (simpleLLMCall with taskHistory)
- **Tests**: `tests/agents/context-builder.test.js`
- **Schema**: `db/migrations/052-create-agent-outcomes.sql`