# Epic 12: Everything Agent Upgrade

> Status: Done | Last updated: 2026-04-04

## Goal

Transform Bob's voice agent from a fixed-tool assistant into an "everything agent" capable of answering arbitrary questions by writing and executing code, remembering prior conversations across sessions, managing long multi-turn dialogues without context overflow, and fast-tracking deterministic queries without LLM invocation. This epic delivers the core runtime intelligence upgrades that make Bob genuinely useful for daily family life.

## Dependencies

- Depends on: Epic 06 (voice pipeline, Pipecat agent), Epic 02 (LLM inference, vLLM), Epic 04 (Graphiti + Neo4j for temporal memory)
- Blocks: Epic 13 (Coordinator Agent, future), Model Tiering (Bonsai 8B, future)

## Context

Bob currently has four hardcoded tools (weather, HA state, HA control, knowledge graph query). This limits him to answering only pre-anticipated questions. Research (agent-optimization-synthesis.md) shows that REPL-based agents dramatically outperform discrete-tool agents — MIT's RLMs scale to 10M+ tokens, and Agentica pushed ARC-AGI-2 from <10% to 85.28% using a persistent REPL. Additionally, Bob has no cross-session memory (each connection starts fresh) and no protection against context overflow in long conversations.

This epic addresses these gaps in priority order: sandboxed code execution (capability breadth), session memory (continuity), context compaction (reliability), and deterministic fast-path (latency).

## Stories

| ID | Story | Status | OpenSpec Refs |
|----|-------|--------|---------------|
| S12-01 | REPL Sandbox — Docker-based Python execution tool for the voice agent | Done (S5) | REQ-EA-001, REQ-EA-002, REQ-EA-003 |
| S12-02 | Voice Session Consolidation — Graphiti-backed cross-session memory | Done (S5+S9) | REQ-EA-004, REQ-EA-005 |
| S12-03 | Context Compaction — Token-aware summarization for long conversations | Done (S5) | REQ-EA-006, REQ-EA-007 |
| S12-04 | Fast-Path Deterministic Queries — Pattern-matched bypass for simple questions | Done (S5) | REQ-EA-008, REQ-EA-009 |

## Acceptance Criteria

- [x] Bob can answer arbitrary questions by writing and executing Python code in a sandboxed container
- [x] After a voice session ends, a summary is stored in Graphiti and retrieved at the start of the next session with the same speaker
- [x] Conversations exceeding the token threshold are automatically compacted without user-visible degradation
- [x] Simple queries (time, date, weather, system status) are answered in < 200ms without LLM invocation
- [x] All sandbox executions are time-bounded (30s default) with stdout/stderr capture and no host escape
- [x] The REPL sandbox has API access to Bob's services (Docker, Prometheus, NATS, HA, Oxigraph)

## Technical Notes

- REPL sandbox uses a dedicated Docker container with a Python runtime, pre-installed libraries (requests, docker, nats-py, prometheus-api-client), and network access restricted to Bob's internal services
- Sandbox execution is invoked as an LLM tool call: the LLM generates Python code, the tool runs it in Docker, and the captured stdout is returned as the tool result (see the first sketch below)
- Session consolidation reuses the existing Graphiti client (`services/knowledge-gardener/graphiti_client.py`) and Neo4j instance (:7687); see the second sketch below
- Context compaction targets Qwen3-32B's 16K context window — trigger summarization at ~12K tokens and keep the last 4 turns verbatim (see the third sketch below)
- Fast-path uses regex/keyword patterns evaluated before the LLM context is assembled; matched queries call deterministic functions directly and inject the result as a TTS frame (see the final sketch below)
- Security: the sandbox container runs as non-root with no Docker socket mount, network access is restricted to internal services, and the filesystem is ephemeral (destroyed after each execution)
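
A minimal sketch of the sandbox tool using the `docker` Python SDK, not the shipped implementation: the image name `bob-sandbox`, the `bob-internal` network, and the memory limit are illustrative assumptions.

```python
import docker

SANDBOX_IMAGE = "bob-sandbox:latest"  # assumed: image with requests, docker, nats-py, prometheus-api-client
INTERNAL_NET = "bob-internal"         # assumed: Docker network that reaches only Bob's services

def run_python_in_sandbox(code: str, timeout: int = 30) -> str:
    """Run LLM-generated Python in an ephemeral, non-root container and return its output."""
    client = docker.from_env()
    container = client.containers.run(
        SANDBOX_IMAGE,
        command=["python", "-c", code],
        detach=True,
        user="nobody",         # non-root, per the security notes
        network=INTERNAL_NET,  # internal services only; no general egress
        mem_limit="512m",      # assumed resource limit
        # No volume mounts and no Docker socket: the filesystem is ephemeral.
    )
    try:
        container.wait(timeout=timeout)  # enforce the 30s default time bound
        stdout = container.logs(stdout=True, stderr=False).decode()
        stderr = container.logs(stdout=False, stderr=True).decode()
        return stdout if not stderr else f"{stdout}\n[stderr]\n{stderr}"
    except Exception:  # timeout on overrun, or a Docker API error
        try:
            container.kill()
        except docker.errors.APIError:
            pass
        return f"[error] execution exceeded {timeout}s and was killed"
    finally:
        container.remove(force=True)  # destroy the container after every execution
```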
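Session consolidation could look roughly like the following, written directly against graphiti-core's `add_episode`/`search` calls rather than the project's wrapper (whose surface may differ); `summarize()` is a hypothetical LLM summarization helper.

```python
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def consolidate_session(graphiti: Graphiti, speaker: str, transcript: str) -> None:
    """On session end: persist an LLM-written summary as a Graphiti episode."""
    summary = await summarize(transcript)  # hypothetical helper: LLM summarization call
    await graphiti.add_episode(
        name=f"voice-session-{speaker}",
        episode_body=summary,
        source=EpisodeType.text,
        source_description="voice session summary",
        reference_time=datetime.now(timezone.utc),
    )

async def recall_for_speaker(graphiti: Graphiti, speaker: str) -> str:
    """On session start: retrieve prior facts about this speaker for the system prompt."""
    edges = await graphiti.search(f"recent conversations with {speaker}")
    return "\n".join(edge.fact for edge in edges)
```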
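A compaction sketch under the thresholds in the note above; `count_tokens` and `summarize_turns` are hypothetical helpers (a real implementation would use the model's tokenizer and an LLM summarization call).

```python
COMPACT_AT = 12_000   # trigger well below the 16K window
KEEP_VERBATIM = 4     # most recent turns are never summarized

async def maybe_compact(messages: list[dict]) -> list[dict]:
    """Replace older turns with a summary once the context nears the window."""
    if count_tokens(messages) < COMPACT_AT:  # hypothetical tokenizer-based counter
        return messages
    system, history = messages[0], messages[1:]
    older, recent = history[:-KEEP_VERBATIM], history[-KEEP_VERBATIM:]
    summary = await summarize_turns(older)  # hypothetical LLM summarization call
    return [
        system,
        {"role": "system", "content": f"Summary of the earlier conversation: {summary}"},
        *recent,
    ]
```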
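The fast path is the simplest piece: an ordered pattern table checked before any LLM work begins. The patterns and handlers below are illustrative, not the shipped set.

```python
import re
from datetime import datetime
from typing import Callable

def _say_time(_: re.Match) -> str:
    return "It's " + datetime.now().strftime("%I:%M %p").lstrip("0") + "."

def _say_date(_: re.Match) -> str:
    return "Today is " + datetime.now().strftime("%A, %B %d").replace(" 0", " ") + "."

# Ordered pattern table, evaluated before the LLM context is assembled.
FAST_PATHS: list[tuple[re.Pattern, Callable[[re.Match], str]]] = [
    (re.compile(r"\bwhat time is it\b", re.I), _say_time),
    (re.compile(r"\bwhat(?:'s| is) (?:the date|today's date)\b", re.I), _say_date),
]

def try_fast_path(utterance: str) -> str | None:
    """Return a deterministic answer to inject as a TTS frame, or None to fall through to the LLM."""
    for pattern, handler in FAST_PATHS:
        match = pattern.search(utterance)
        if match:
            return handler(match)
    return None
```

Because no model is invoked on a match, the < 200ms acceptance budget is spent almost entirely in the handler itself.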