# Trajectory Format

Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.

Source files: `agent/trajectory.py`, `run_agent.py` (search for `_save_trajectory`), `batch_runner.py`


## File Naming Convention

Trajectories are written to files in the current working directory:

| File | When |
|------|------|
| `trajectory_samples.jsonl` | Conversations that completed successfully (`completed=True`) |
| `failed_trajectories.jsonl` | Conversations that failed or were interrupted (`completed=False`) |

The batch runner (`batch_runner.py`) writes to a custom output file per batch
(e.g., `batch_001_output.jsonl`) with additional metadata fields.

You can override the filename via the `filename` parameter in `save_trajectory()`.


## JSONL Entry Format

Each line in the file is a self-contained JSON object. There are two variants:

### CLI/Interactive Format (from `_save_trajectory`)

```json
{
  "conversations": [ ... ],
  "timestamp": "2026-03-30T14:22:31.456789",
  "model": "anthropic/claude-sonnet-4.6",
  "completed": true
}
```

### Batch Runner Format (from `batch_runner.py`)

```json
{
  "prompt_index": 42,
  "conversations": [ ...
  ],
  "metadata": { "prompt_source": "gsm8k", "difficulty": "hard" },
  "completed": true,
  "partial": false,
  "api_calls": 7,
  "toolsets_used": ["code_tools", "file_tools"],
  "tool_stats": {
    "terminal": {"count": 3, "success": 3, "failure": 0},
    "read_file": {"count": 2, "success": 2, "failure": 0},
    "write_file": {"count": 0, "success": 0, "failure": 0}
  },
  "tool_error_counts": {
    "terminal": 0,
    "read_file": 0,
    "write_file": 0
  }
}
```

The `tool_stats` and `tool_error_counts` dictionaries are normalized to include
ALL possible tools (from `model_tools.TOOL_TO_TOOLSET_MAP`) with zero defaults,
ensuring a consistent schema across entries for HuggingFace dataset loading.


## Conversations Array (ShareGPT Format)

The `conversations` array uses ShareGPT role conventions:

| API Role | ShareGPT `from` |
|----------|-----------------|
| system | `"system"` |
| user | `"human"` |
| assistant | `"gpt"` |
| tool | `"tool"` |

### Complete Example

```json
{
  "conversations": [
    {
      "from": "system",
      "value": "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. If available tools are not relevant in assisting with user query, just respond in natural conversational language. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. Here are the available tools:\n<tools>\n[{\"name\": \"terminal\", \"description\": \"Execute shell commands\", \"parameters\": {\"type\": \"object\", \"properties\": {\"command\": {\"type\": \"string\"}}}, \"required\": null}]\n</tools>\nFor each function call return a JSON object, with the following pydantic model json schema for each:\n{'title': 'FunctionCall', 'type': 'object', 'properties': {'name': {'title': 'Name', 'type': 'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, 'required': ['name', 'arguments']}\nEach function call should be enclosed within <tool_call> </tool_call> XML tags.\nExample:\n<tool_call>\n{'name': <function-name>,'arguments': <args-dict>}\n</tool_call>"
    },
    {
      "from": "human",
      "value": "What Python version is installed?"
    },
    {
      "from": "gpt",
      "value": "<think>\nThe user wants to know the Python version. I should run python3 --version.\n</think>\n<tool_call>\n{\"name\": \"terminal\", \"arguments\": {\"command\": \"python3 --version\"}}\n</tool_call>"
    },
    {
      "from": "tool",
      "value": "<tool_response>\n{\"tool_call_id\": \"call_abc123\", \"name\": \"terminal\", \"content\": \"Python 3.11.6\"}\n</tool_response>"
    },
    {
      "from": "gpt",
      "value": "<think>\nGot the version. I can now answer the user.\n</think>\nPython 3.11.6 is installed on this system."
    }
  ],
  "timestamp": "2026-03-30T14:22:31.456789",
  "model": "anthropic/claude-sonnet-4.6",
  "completed": true
}
```


## Normalization Rules

### Reasoning Content Markup

The trajectory converter normalizes ALL reasoning into `<think>` tags, regardless
of how the model originally produced it:

1. **Native thinking tokens** (the `msg["reasoning"]` field from providers such as
   Anthropic or OpenAI o-series models): wrapped as `<think>\n{reasoning}\n</think>\n`
   and prepended to the content.

2. **REASONING_SCRATCHPAD XML** (when native thinking is disabled and the model
   reasons via system-prompt-instructed XML): `<REASONING_SCRATCHPAD>` tags are
   converted to `<think>` via `convert_scratchpad_to_think()`.

3. **Empty think blocks**: every `gpt` turn is guaranteed to have a `<think>`
   block. If no reasoning was produced, an empty block is inserted
   (`<think>\n</think>\n`), ensuring a consistent format for training data.

### Tool Call Normalization

Tool calls from the API format (with `tool_call_id`, function name, and arguments as a
JSON string) are converted to XML-wrapped JSON:

```
<tool_call>
{"name": "terminal", "arguments": {"command": "ls -la"}}
</tool_call>
```

- Arguments are parsed from JSON strings back to objects (not double-encoded).
- If JSON parsing fails (which shouldn't happen, since arguments are validated during
  the conversation), an empty `{}` is used and a warning is logged.
- Multiple tool calls in one assistant turn produce multiple `<tool_call>` blocks
  in a single `gpt` message.

### Tool Response Normalization

All tool results following an assistant message are grouped into a single `tool`
turn with XML-wrapped JSON responses:

```
<tool_response>
{"tool_call_id": "call_abc123", "name": "terminal", "content": "output here"}
</tool_response>
```

- If the tool content looks like JSON (starts with `{` or `[`), it is parsed so the
  `content` field contains a JSON object/array rather than a string.
- Multiple tool results are joined with newlines in one message.
- The tool name is matched by position against the parent assistant message's
  `tool_calls` array.

### System Message

The system message is generated at save time (not taken from the conversation).
It follows the Hermes function-calling prompt template with:

- A preamble explaining the function-calling protocol
- A `<tools>` XML block containing the JSON tool definitions
- A schema reference for `FunctionCall` objects
- A `<tool_call>` example

Tool definitions include `name`, `description`, `parameters`, and `required`
(set to `null` to match the canonical format).


## Loading Trajectories

Trajectories are standard JSONL and can be loaded with any JSON-lines reader:

```python
import json

def load_trajectories(path: str):
    """Load trajectory entries from a JSONL file."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries

# Filter to successful completions only
successful = [e for e in load_trajectories("trajectory_samples.jsonl")
              if e.get("completed")]

# Extract just the conversations for training
training_data = [e["conversations"] for e in successful]
```

### Loading for HuggingFace Datasets

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="trajectory_samples.jsonl")
```

The normalized `tool_stats` schema ensures all entries have the same columns,
preventing Arrow schema mismatch errors during dataset loading.


## Controlling Trajectory Saving

In the CLI, trajectory saving is controlled by:

```yaml
# config.yaml
agent:
  save_trajectories: true  # default: false
```

Or via the `--save-trajectories` flag. When the agent is initialized with
`save_trajectories=True`, the `_save_trajectory()` method is called at the end
of each conversation turn.

The batch runner always saves trajectories (that's its primary purpose).
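Because the batch-runner schema guarantees that every tool key is present in `tool_stats` (with zero defaults), aggregating tool usage across a batch output file needs no special-casing for missing keys. A minimal sketch, where `aggregate_tool_stats` is an illustrative helper rather than part of the codebase:

```python
import json
from collections import Counter

def aggregate_tool_stats(path: str) -> dict:
    """Sum per-tool call counts across all entries in a batch output JSONL file."""
    totals = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            # tool_stats maps tool name -> {"count": ..., "success": ..., "failure": ...}
            for tool, stats in entry.get("tool_stats", {}).items():
                totals[tool] += stats.get("count", 0)
    return dict(totals)
```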
Samples with zero reasoning across all turns are automatically discarded by the
batch runner to avoid polluting training data with non-reasoning examples.
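The same kind of filter can be applied client-side when post-processing saved files. A minimal sketch, assuming a turn counts as reasoning-free when its `<think>` block is empty; the helper name and criterion are illustrative, not the batch runner's exact internal logic:

```python
import re

def has_reasoning(conversations: list[dict]) -> bool:
    """True if any gpt turn carries a non-empty <think> block."""
    for turn in conversations:
        if turn.get("from") != "gpt":
            continue
        # Every gpt turn is normalized to contain a <think> block;
        # check whether it has any content.
        m = re.search(r"<think>(.*?)</think>", turn.get("value", ""), re.DOTALL)
        if m and m.group(1).strip():
            return True
    return False
```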