---
sidebar_position: 3
title: "Agent Loop Internals"
description: "Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior"
---

# Agent Loop Internals

The core orchestration engine is the `AIAgent` class in `run_agent.py`: roughly 13,700 lines that handle everything from prompt assembly to tool dispatch to provider failover.

## Core Responsibilities

`AIAgent` is responsible for:

- Assembling the effective system prompt and tool schemas via `prompt_builder.py`
- Selecting the correct provider/API mode (`chat_completions`, `codex_responses`, `anthropic_messages`)
- Making interruptible model calls with cancellation support
- Executing tool calls (sequentially or concurrently via thread pool)
- Maintaining conversation history in OpenAI message format
- Handling compression, retries, and fallback model switching
- Tracking iteration budgets across parent and child agents
- Flushing persistent memory before context is lost

## Two Entry Points

```python
# Simple interface: returns the final response string
response = agent.chat("Fix the bug in main.py")

# Full interface: returns a dict with messages, metadata, usage stats
result = agent.run_conversation(
    user_message="Fix the bug in main.py",
    system_message=None,            # auto-built if omitted
    conversation_history=None,      # auto-loaded from session if omitted
    task_id="task_abc123",
)
```

`chat()` is a thin wrapper around `run_conversation()` that extracts the `final_response` field from the result dict.
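
The wrapper relationship can be pictured with a minimal sketch. `SketchAgent` is a hypothetical stand-in, not the real `AIAgent`; only the `final_response` key and the two method names come from the text above, and the echo behavior is purely illustrative.

```python
class SketchAgent:
    """Hypothetical stand-in for AIAgent; shows only the wrapper relationship."""

    def run_conversation(self, user_message, system_message=None,
                         conversation_history=None, task_id=None):
        # A real implementation would run the full agent loop here.
        return {
            "final_response": f"echo: {user_message}",
            "messages": [{"role": "user", "content": user_message}],
        }

    def chat(self, message):
        # Delegate to the full interface and keep only the final text.
        result = self.run_conversation(user_message=message)
        return result["final_response"]
```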

## API Modes

Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:

| API mode | Used for | Client type |
|----------|----------|-------------|
| `chat_completions` | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | `openai.OpenAI` |
| `codex_responses` | OpenAI Codex / Responses API | `openai.OpenAI` with Responses format |
| `anthropic_messages` | Native Anthropic Messages API | `anthropic.Anthropic` via adapter |

The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style `role`/`content`/`tool_calls` dicts) before and after API calls.

**Mode resolution order:**
1. Explicit `api_mode` constructor arg (highest priority)
2. Provider-specific detection (e.g., `anthropic` provider → `anthropic_messages`)
3. Base URL heuristics (e.g., `api.anthropic.com` → `anthropic_messages`)
4. Default: `chat_completions`
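
The resolution order above could be sketched as a single function. The name `resolve_api_mode`, its parameters, and the lookup table are assumptions for illustration; only the two example mappings shown in the list are taken from the text.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables; only the Anthropic entries appear in the docs.
PROVIDER_MODES = {"anthropic": "anthropic_messages"}
HOST_MODES = {"api.anthropic.com": "anthropic_messages"}


def resolve_api_mode(api_mode=None, provider=None, base_url=None):
    # 1. An explicit constructor argument always wins.
    if api_mode:
        return api_mode
    # 2. Provider-specific detection.
    if provider in PROVIDER_MODES:
        return PROVIDER_MODES[provider]
    # 3. Base URL heuristics, matched on the host name.
    host = urlparse(base_url).hostname if base_url else None
    if host in HOST_MODES:
        return HOST_MODES[host]
    # 4. Default.
    return "chat_completions"
```

For example, `resolve_api_mode(api_mode="codex_responses", provider="anthropic")` returns `"codex_responses"` because the explicit argument outranks provider detection.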

## Turn Lifecycle

Each call to `run_conversation()` follows the sequence below; when the model requests tools, the loop returns to step 5:

```text
run_conversation()
  1. Generate task_id if not provided
  2. Append user message to conversation history
  3. Build or reuse cached system prompt (prompt_builder.py)
  4. Check if preflight compression is needed (>50% context)
  5. Build API messages from conversation history
     - chat_completions: OpenAI format as-is
     - codex_responses: convert to Responses API input items
     - anthropic_messages: convert via anthropic_adapter.py
  6. Inject ephemeral prompt layers (budget warnings, context pressure)
  7. Apply prompt caching markers if on Anthropic
  8. Make interruptible API call (_interruptible_api_call)
  9. Parse response:
     - If tool_calls: execute them, append results, loop back to step 5
     - If text response: persist session, flush memory if needed, return
```
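
The core of steps 8 and 9 is a loop that alternates model calls and tool execution. The sketch below is a simplification under stated assumptions: `run_loop`, `call_model`, and `run_tool` are hypothetical names, tool calls use a flattened `{"id", "name"}` shape rather than the full provider format, and compression, caching, and persistence are omitted.

```python
def run_loop(call_model, run_tool, history, max_iterations=90):
    """Repeat model call -> tool execution until the model answers in text."""
    for _ in range(max_iterations):
        assistant_msg = call_model(history)           # step 8: API call
        history.append(assistant_msg)
        tool_calls = assistant_msg.get("tool_calls")
        if not tool_calls:
            return assistant_msg["content"]           # step 9: text response
        for call in tool_calls:                       # step 9: tool branch
            history.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": run_tool(call),
            })
    return None  # iteration budget exhausted without a text response
```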

### Message Format

All messages use OpenAI-compatible format internally:

```python
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
```

Reasoning content (from models that support extended thinking) is stored in `assistant_msg["reasoning"]` and optionally displayed via the `reasoning_callback`.

### Message Alternation Rules

The agent loop enforces strict message role alternation:

- After the system message: `User → Assistant → User → Assistant → ...`
- During tool calling: `Assistant (with tool_calls) → Tool → Tool → ... → Assistant`
- **Never** two assistant messages in a row
- **Never** two user messages in a row
- **Only** `tool` role can have consecutive entries (parallel tool results)

Providers validate these sequences and will reject malformed histories.
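
A small checker makes these rules concrete. This is an illustrative sketch, not a function from the codebase: `validate_alternation` is a hypothetical name, and it checks only the rules listed above (no repeated `user` or `assistant` roles, and `tool` messages only after an assistant message that carries `tool_calls`).

```python
def validate_alternation(messages):
    """Return True if the role sequence obeys the alternation rules."""
    prev_role = None
    tool_results_allowed = False  # True after an assistant message with tool_calls
    for msg in messages:
        role = msg["role"]
        if role in ("user", "assistant") and role == prev_role:
            return False  # two user or two assistant messages in a row
        if role == "tool" and not tool_results_allowed:
            return False  # tool result without a requesting assistant message
        if role == "assistant":
            tool_results_allowed = bool(msg.get("tool_calls"))
        elif role != "tool":
            tool_results_allowed = False
        prev_role = role
    return True
```

Running a check like this before sending a history would surface a malformed sequence locally instead of as a provider-side rejection.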

## Interruptible API Calls

API requests are wrapped in `_interruptible_api_call()`, which runs the actual HTTP call in a background thread while monitoring an interrupt event:

```text
┌────────────────────────────────────────────────────┐
│  Main thread                  API thread           │
│                                                    │
│   wait on:                     HTTP POST           │
│    - response ready     ───▶   to provider         │
│    - interrupt event                               │
│    - timeout                                       │
└────────────────────────────────────────────────────┘
```

When interrupted (user sends new message, `/stop` command, or signal):
- The API thread is abandoned (response discarded)
- The agent can process the new input or shut down cleanly
- No partial response is injected into conversation history
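
The pattern in the diagram can be approximated with the standard `threading` module. This is a minimal sketch under assumptions, not the body of `_interruptible_api_call()`: the name `interruptible_call`, the polling interval, and the choice to return `None` on interrupt or timeout are all illustrative.

```python
import threading
import time


def interruptible_call(request_fn, interrupt_event, timeout=60.0, poll_interval=0.05):
    """Run request_fn in a worker thread; return None if interrupted or timed out."""
    outcome = {}
    done = threading.Event()

    def worker():
        try:
            outcome["value"] = request_fn()
        except Exception as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc
        finally:
            done.set()

    # Daemon thread: an abandoned request never blocks interpreter shutdown.
    threading.Thread(target=worker, daemon=True).start()

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if done.wait(poll_interval):
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        if interrupt_event.is_set():
            return None  # abandon the worker; its response is discarded
    return None  # timed out
```

Because the worker writes only to a local dict that the caller stops reading after an interrupt, a late response never reaches conversation history.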

## Tool Execution

### Sequential vs Concurrent

When the model returns tool calls:

- **Single tool call** → executed directly in the main thread
- **Multiple tool calls** → executed concurrently via `ThreadPoolExecutor`
  - Exception: tools marked as interactive (e.g., `clarify`) force sequential execution
  - Results are reinserted in the original tool call order regardless of completion order
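
One way to get order-preserving concurrency is `Executor.map`, which yields results in submission order. The sketch below assumes a flattened tool call shape with a `name` key; `execute_tool_calls`, `INTERACTIVE_TOOLS`, and `max_workers=4` are hypothetical choices, and the real implementation may track ordering differently.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: interactive tools are identified by name; only `clarify` is documented.
INTERACTIVE_TOOLS = {"clarify"}


def execute_tool_calls(tool_calls, run_tool, max_workers=4):
    """Run tool calls and return results in the original call order."""
    needs_sequential = len(tool_calls) <= 1 or any(
        call["name"] in INTERACTIVE_TOOLS for call in tool_calls
    )
    if needs_sequential:
        # Single or interactive calls run one at a time in the calling thread.
        return [run_tool(call) for call in tool_calls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, not completion order.
        return list(pool.map(run_tool, tool_calls))
```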

### Execution Flow

```text
for each tool_call in response.tool_calls:
    1. Resolve handler from tools/registry.py
    2. Fire pre_tool_call plugin hook
    3. Check if dangerous command (tools/approval.py)
       - If dangerous: invoke approval_callback, wait for user
    4. Execute handler with args + task_id
    5. Fire post_tool_call plugin hook
    6. Append {"role": "tool", "tool_call_id": ..., "content": result} to history
```
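
The six steps can be sketched as one function. Everything here is a stand-in: the registry is a plain dict, hooks are lists of callables, and the message returned when approval is denied is an assumption, since the text does not say what a denial produces.

```python
def execute_tool_call(tool_call, registry, hooks, is_dangerous, approval_callback, task_id):
    """Run one tool call through hooks and approval; return the tool message."""
    handler = registry[tool_call["name"]]                      # 1. resolve handler
    for hook in hooks.get("pre_tool_call", []):                # 2. pre hook
        hook(tool_call)
    if is_dangerous(tool_call) and not approval_callback(tool_call):
        result = "Command denied by user"                      # 3. assumed denial result
    else:
        result = handler(task_id=task_id, **tool_call["args"])  # 4. execute
    for hook in hooks.get("post_tool_call", []):               # 5. post hook
        hook(tool_call, result)
    return {                                                   # 6. message for history
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": result,
    }
```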

### Agent-Level Tools

Some tools are intercepted by `run_agent.py` *before* reaching `handle_function_call()`:

| Tool | Why intercepted |
|------|-----------------|
| `todo` | Reads/writes agent-local task state |
| `memory` | Writes to persistent memory files with character limits |
| `session_search` | Queries session history via the agent's session DB |
| `delegate_task` | Spawns subagent(s) with isolated context |

These tools modify agent state directly and return synthetic tool results without going through the registry.
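
Interception amounts to a name check ahead of the normal dispatch path. In this sketch, `InterceptingAgent` and its `todo` behavior are hypothetical; `registry_dispatch` stands in for `handle_function_call()`, whose real signature is not shown in this document.

```python
AGENT_LEVEL_TOOLS = {"todo", "memory", "session_search", "delegate_task"}


class InterceptingAgent:
    """Hypothetical agent that handles some tools itself before the registry."""

    def __init__(self, registry_dispatch):
        self.registry_dispatch = registry_dispatch  # stand-in for handle_function_call()
        self.todos = []                             # agent-local task state

    def dispatch(self, name, args):
        if name in AGENT_LEVEL_TOOLS:
            return self._handle_agent_tool(name, args)
        return self.registry_dispatch(name, args)

    def _handle_agent_tool(self, name, args):
        if name == "todo":
            # Mutates agent state directly and returns a synthetic result.
            self.todos.append(args["item"])
            return f"{len(self.todos)} todo(s) tracked"
        return f"{name}: not implemented in this sketch"
```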

## Callback Surfaces

`AIAgent` supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:

| Callback | When fired | Used by |
|----------|------------|---------|
| `tool_progress_callback` | Before/after each tool execution | CLI spinner, gateway progress messages |
| `thinking_callback` | When model starts/stops thinking | CLI "thinking..." indicator |
| `reasoning_callback` | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks |
| `clarify_callback` | When `clarify` tool is called | CLI input prompt, gateway interactive message |
| `step_callback` | After each complete agent turn | Gateway step tracking, ACP progress |
| `stream_delta_callback` | Each streaming token (when enabled) | CLI streaming display |
| `tool_gen_callback` | When tool call is parsed from stream | CLI tool preview in spinner |
| `status_callback` | State changes (thinking, executing, etc.) | ACP status updates |

## Budget and Fallback Behavior

### Iteration Budget

The agent tracks iterations via `IterationBudget`:

- Default: 90 iterations (configurable via `agent.max_turns`)
- Each agent gets its own budget. Subagents get independent budgets capped at `delegation.max_iterations` (default 50), so total iterations across parent and subagents can exceed the parent's cap
- At 100%, the agent stops and returns a summary of work done
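
The independence of parent and child budgets can be illustrated with a small counter class. `BudgetSketch` and its methods are hypothetical and do not reproduce the real `IterationBudget` interface; only the default values (90 and 50) come from the list above.

```python
class BudgetSketch:
    """Hypothetical iteration counter; not the real IterationBudget API."""

    def __init__(self, max_iterations=90):
        self.max_iterations = max_iterations
        self.used = 0

    def consume(self):
        """Record one iteration; return False once the budget is exhausted."""
        if self.used >= self.max_iterations:
            return False
        self.used += 1
        return True

    def child(self, cap=50):
        # Subagents get a fresh, independent budget rather than a share of the parent's.
        return BudgetSketch(max_iterations=cap)
```

Because the child counter is separate, a parent capped at 90 that spawns one subagent capped at 50 can account for up to 140 iterations in total.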

### Fallback Model

When the primary model fails (429 rate limit, 5xx server error, or 401/403 auth error):

1. On 401/403, attempt a credential refresh before failing over
2. Check the `fallback_providers` list in config
3. Try each fallback in order
4. On success, continue the conversation with the new provider

The fallback system also covers auxiliary tasks independently: vision, compression, web extraction, and session search each have their own fallback chain, configurable via the `auxiliary.*` config section.
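
A fallback chain of this kind could look like the sketch below. All names are hypothetical (`ProviderError`, `call_with_fallback`, `refresh_credentials`), providers are modeled as plain callables, and retrying the primary once after a successful credential refresh is an assumption about what "before failing over" means.

```python
class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def should_fail_over(status):
    # 429 rate limit, 401/403 auth errors, and any 5xx server error.
    return status in (401, 403, 429) or 500 <= status <= 599


def call_with_fallback(primary, fallbacks, request, refresh_credentials=None):
    try:
        return primary(request)
    except ProviderError as exc:
        if not should_fail_over(exc.status):
            raise
        last_error = exc
    # Auth errors get one credential refresh and retry before failing over.
    if last_error.status in (401, 403) and refresh_credentials is not None:
        if refresh_credentials():
            try:
                return primary(request)
            except ProviderError as exc:
                last_error = exc
    for provider in fallbacks:  # try each fallback in configured order
        try:
            return provider(request)
        except ProviderError as exc:
            last_error = exc
    raise last_error
```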

## Compression and Persistence

### When Compression Triggers

- **Preflight** (before API call): if the conversation exceeds 50% of the model's context window
- **Gateway auto-compression**: if the conversation exceeds 85% (more aggressive, runs between turns)

### What Happens During Compression

1. Memory is flushed to disk first (preventing data loss)
2. Middle conversation turns are summarized into a compact summary
3. The last N messages are preserved intact (`compression.protect_last_n`, default: 20)
4. Tool call/result message pairs are kept together (never split)
5. A new session lineage ID is generated (compression creates a "child" session)
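
The threshold check and the "protect the tail without splitting tool pairs" rule can be sketched as follows. Both function names are hypothetical, token counting is left to the caller, and the summarization step itself is omitted; the real compressor in `agent/context_compressor.py` may choose its boundaries differently.

```python
def needs_preflight_compression(used_tokens, context_window, threshold=0.5):
    """True when the conversation exceeds the given share of the context window."""
    return used_tokens > threshold * context_window


def split_for_compression(messages, protect_last_n=20):
    """Return (head, middle, tail): system prompt, turns to summarize, protected tail."""
    head = messages[:1] if messages and messages[0]["role"] == "system" else []
    body = messages[len(head):]
    cut = max(len(body) - protect_last_n, 0)
    # Move the cut earlier so tool results stay with the assistant message
    # that requested them.
    while 0 < cut < len(body) and body[cut]["role"] == "tool":
        cut -= 1
    return head, body[:cut], body[cut:]
```

Only `middle` would be handed to the summarizer; `head` and `tail` pass through unchanged.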

### Session Persistence

After each turn:
- Messages are saved to the session store (SQLite via `hermes_state.py`)
- Memory changes are flushed to `MEMORY.md` / `USER.md`
- The session can be resumed later via `/resume` or `hermes chat --resume`

## Key Source Files

| File | Purpose |
|------|---------|
| `run_agent.py` | `AIAgent` class: the complete agent loop (~13,700 lines) |
| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
| `agent/context_engine.py` | `ContextEngine` ABC: pluggable context management |
| `agent/context_compressor.py` | Default engine: lossy summarization algorithm |
| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |

## Related Docs

- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)
- [Architecture Overview](./architecture.md)