# Trajectory Format

Hermes Agent saves conversation trajectories in ShareGPT-compatible JSONL format
for use as training data, debugging artifacts, and reinforcement learning datasets.

Source files: `agent/trajectory.py`, `run_agent.py` (search for `_save_trajectory`), `batch_runner.py`


## File Naming Convention

Trajectories are written to files in the current working directory:

| File | When |
|------|------|
| `trajectory_samples.jsonl` | Conversations that completed successfully (`completed=True`) |
| `failed_trajectories.jsonl` | Conversations that failed or were interrupted (`completed=False`) |

The batch runner (`batch_runner.py`) writes to a custom output file per batch
(e.g., `batch_001_output.jsonl`) with additional metadata fields.

You can override the filename via the `filename` parameter in `save_trajectory()`.


## JSONL Entry Format

Each line in the file is a self-contained JSON object. There are two variants:

### CLI/Interactive Format (from `_save_trajectory`)

```json
{
  "conversations": [ ... ],
  "timestamp": "2026-03-30T14:22:31.456789",
  "model": "anthropic/claude-sonnet-4.6",
  "completed": true
}
```

### Batch Runner Format (from `batch_runner.py`)

```json
{
  "prompt_index": 42,
  "conversations": [ ... ],
  "metadata": { "prompt_source": "gsm8k", "difficulty": "hard" },
  "completed": true,
  "partial": false,
  "api_calls": 7,
  "toolsets_used": ["code_tools", "file_tools"],
  "tool_stats": {
    "terminal": {"count": 3, "success": 3, "failure": 0},
    "read_file": {"count": 2, "success": 2, "failure": 0},
    "write_file": {"count": 0, "success": 0, "failure": 0}
  },
  "tool_error_counts": {
    "terminal": 0,
    "read_file": 0,
    "write_file": 0
  }
}
```

The `tool_stats` and `tool_error_counts` dictionaries are normalized to include
ALL possible tools (from `model_tools.TOOL_TO_TOOLSET_MAP`) with zero defaults,
ensuring a consistent schema across entries for HuggingFace dataset loading.
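
As a rough sketch of that normalization (the map below is an illustrative subset; the real `TOOL_TO_TOOLSET_MAP` lives in `model_tools`):

```python
# Sketch only: normalize observed stats against the full tool registry,
# zero-filling any tool that was never called.
TOOL_TO_TOOLSET_MAP = {  # illustrative subset, not the real map
    "terminal": "code_tools",
    "read_file": "file_tools",
    "write_file": "file_tools",
}

def normalize_tool_stats(observed: dict) -> dict:
    """Return a stats dict covering every known tool."""
    zero = {"count": 0, "success": 0, "failure": 0}
    return {tool: observed.get(tool, dict(zero)) for tool in TOOL_TO_TOOLSET_MAP}
```

Every entry then shares the same keys, which is what keeps the Arrow schema stable across rows.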


## Conversations Array (ShareGPT Format)

The `conversations` array uses ShareGPT role conventions:

| API Role | ShareGPT `from` |
|----------|-----------------|
| system | `"system"` |
| user | `"human"` |
| assistant | `"gpt"` |
| tool | `"tool"` |
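
Converting an API-format message into a ShareGPT turn is a direct mapping; a minimal sketch (the helper name is hypothetical):

```python
# Hypothetical helper: map an OpenAI-style chat message to a ShareGPT turn.
ROLE_TO_SHAREGPT = {"system": "system", "user": "human", "assistant": "gpt", "tool": "tool"}

def to_sharegpt_turn(msg: dict) -> dict:
    return {"from": ROLE_TO_SHAREGPT[msg["role"]], "value": msg.get("content", "")}

print(to_sharegpt_turn({"role": "user", "content": "Hi"}))
# → {'from': 'human', 'value': 'Hi'}
```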

### Complete Example

```json
{
  "conversations": [
    {
      "from": "system",
      "value": "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. If available tools are not relevant in assisting with user query, just respond in natural conversational language. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. Here are the available tools:\n<tools>\n[{\"name\": \"terminal\", \"description\": \"Execute shell commands\", \"parameters\": {\"type\": \"object\", \"properties\": {\"command\": {\"type\": \"string\"}}}, \"required\": null}]\n</tools>\nFor each function call return a JSON object, with the following pydantic model json schema for each:\n{'title': 'FunctionCall', 'type': 'object', 'properties': {'name': {'title': 'Name', 'type': 'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, 'required': ['name', 'arguments']}\nEach function call should be enclosed within <tool_call> </tool_call> XML tags.\nExample:\n<tool_call>\n{'name': <function-name>,'arguments': <args-dict>}\n</tool_call>"
    },
    {
      "from": "human",
      "value": "What Python version is installed?"
    },
    {
      "from": "gpt",
      "value": "<think>\nThe user wants to know the Python version. I should run python3 --version.\n</think>\n<tool_call>\n{\"name\": \"terminal\", \"arguments\": {\"command\": \"python3 --version\"}}\n</tool_call>"
    },
    {
      "from": "tool",
      "value": "<tool_response>\n{\"tool_call_id\": \"call_abc123\", \"name\": \"terminal\", \"content\": \"Python 3.11.6\"}\n</tool_response>"
    },
    {
      "from": "gpt",
      "value": "<think>\nGot the version. I can now answer the user.\n</think>\nPython 3.11.6 is installed on this system."
    }
  ],
  "timestamp": "2026-03-30T14:22:31.456789",
  "model": "anthropic/claude-sonnet-4.6",
  "completed": true
}
```


## Normalization Rules

### Reasoning Content Markup

The trajectory converter normalizes ALL reasoning into `<think>` tags, regardless
of how the model originally produced it:

1. **Native thinking tokens** (`msg["reasoning"]` field from providers like
   Anthropic, OpenAI o-series): Wrapped as `<think>\n{reasoning}\n</think>\n`
   and prepended to the content.

2. **REASONING_SCRATCHPAD XML** (when native thinking is disabled and the model
   reasons via system-prompt-instructed XML): `<REASONING_SCRATCHPAD>` tags are
   converted to `<think>` via `convert_scratchpad_to_think()`.

3. **Empty think blocks**: Every `gpt` turn is guaranteed to have a `<think>`
   block. If no reasoning was produced, an empty block is inserted:
   `<think>\n</think>\n` — this ensures a consistent format for training data.
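
The three rules can be sketched as a single pass over an assistant message (the helper below is illustrative, not the actual converter):

```python
def normalize_reasoning(msg: dict) -> str:
    """Sketch of the <think> normalization rules; the field names follow
    the rules above, but this helper itself is hypothetical."""
    content = msg.get("content") or ""
    # Rule 2: prompt-instructed scratchpad XML becomes <think> markup.
    content = content.replace("<REASONING_SCRATCHPAD>", "<think>")
    content = content.replace("</REASONING_SCRATCHPAD>", "</think>")
    # Rule 1: native reasoning tokens are wrapped and prepended.
    if msg.get("reasoning"):
        content = f"<think>\n{msg['reasoning']}\n</think>\n" + content
    # Rule 3: every gpt turn gets a <think> block, empty if necessary.
    if "<think>" not in content:
        content = "<think>\n</think>\n" + content
    return content
```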

### Tool Call Normalization

Tool calls from the API format (with `tool_call_id`, function name, and arguments
as a JSON string) are converted to XML-wrapped JSON:

```
<tool_call>
{"name": "terminal", "arguments": {"command": "ls -la"}}
</tool_call>
```

- Arguments are parsed from JSON strings back to objects (not double-encoded)
- If JSON parsing fails (which shouldn't happen, since arguments are validated
  during the conversation), an empty `{}` is used and a warning is logged
- Multiple tool calls in one assistant turn produce multiple `<tool_call>` blocks
  in a single `gpt` message
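
Assuming OpenAI-style `tool_calls` entries (a `function` object carrying `name` and an `arguments` JSON string), the conversion might look like this sketch:

```python
import json
import logging

def render_tool_calls(tool_calls: list) -> str:
    """Sketch: serialize API-format tool calls as <tool_call> blocks."""
    blocks = []
    for call in tool_calls:
        fn = call["function"]
        try:
            # Parse the argument string so it isn't double-encoded.
            args = json.loads(fn["arguments"])
        except (json.JSONDecodeError, TypeError):
            logging.warning("unparseable arguments for %s", fn["name"])
            args = {}
        payload = json.dumps({"name": fn["name"], "arguments": args})
        blocks.append(f"<tool_call>\n{payload}\n</tool_call>")
    return "\n".join(blocks)
```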

### Tool Response Normalization

All tool results following an assistant message are grouped into a single `tool`
turn with XML-wrapped JSON responses:

```
<tool_response>
{"tool_call_id": "call_abc123", "name": "terminal", "content": "output here"}
</tool_response>
```

- If tool content looks like JSON (starts with `{` or `[`), it is parsed so the
  `content` field contains a JSON object/array rather than a string
- Multiple tool results are joined with newlines in one message
- The tool name is matched by position against the parent assistant message's
  `tool_calls` array
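
A sketch of that grouping, with positional name matching (the message shapes and helper are assumptions, not the actual implementation):

```python
import json

def render_tool_responses(tool_msgs: list, tool_calls: list) -> str:
    """Sketch: merge consecutive tool results into one XML-wrapped turn."""
    blocks = []
    for i, msg in enumerate(tool_msgs):
        content = msg.get("content", "")
        # JSON-looking content is parsed so it embeds as an object, not a string.
        if isinstance(content, str) and content.lstrip()[:1] in ("{", "["):
            try:
                content = json.loads(content)
            except json.JSONDecodeError:
                pass
        # Tool name is matched by position against the parent tool_calls array.
        name = tool_calls[i]["function"]["name"] if i < len(tool_calls) else ""
        payload = json.dumps({"tool_call_id": msg.get("tool_call_id"),
                              "name": name, "content": content})
        blocks.append(f"<tool_response>\n{payload}\n</tool_response>")
    return "\n".join(blocks)
```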

### System Message

The system message is generated at save time (not taken from the conversation).
It follows the Hermes function-calling prompt template with:

- Preamble explaining the function-calling protocol
- `<tools>` XML block containing the JSON tool definitions
- Schema reference for `FunctionCall` objects
- `<tool_call>` example

Tool definitions include `name`, `description`, `parameters`, and `required`
(set to `null` to match the canonical format).


## Loading Trajectories

Trajectories are standard JSONL — load them with any JSON-lines reader:

```python
import json

def load_trajectories(path: str):
    """Load trajectory entries from a JSONL file."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries

# Filter to successful completions only
successful = [e for e in load_trajectories("trajectory_samples.jsonl")
              if e.get("completed")]

# Extract just the conversations for training
training_data = [e["conversations"] for e in successful]
```

### Loading for HuggingFace Datasets

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="trajectory_samples.jsonl")
```

The normalized `tool_stats` schema ensures all entries have the same columns,
preventing Arrow schema mismatch errors during dataset loading.


## Controlling Trajectory Saving

In the CLI, trajectory saving is controlled by:

```yaml
# config.yaml
agent:
  save_trajectories: true  # default: false
```

Or via the `--save-trajectories` flag. When the agent initializes with
`save_trajectories=True`, the `_save_trajectory()` method is called at the end
of each conversation turn.

The batch runner always saves trajectories (that's its primary purpose).

Samples with zero reasoning across all turns are automatically discarded by the
batch runner to avoid polluting training data with non-reasoning examples.
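
That filter can be as simple as checking for a non-empty `<think>` body in any `gpt` turn; a hypothetical sketch:

```python
def has_reasoning(conversations: list) -> bool:
    """Sketch of the batch runner's discard rule (helper name hypothetical):
    keep a sample only if some gpt turn has a non-empty <think> block."""
    for turn in conversations:
        if turn.get("from") != "gpt":
            continue
        value = turn.get("value", "")
        start, end = value.find("<think>"), value.find("</think>")
        if start != -1 and end != -1:
            if value[start + len("<think>"):end].strip():
                return True
    return False
```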