---
title: Diagnosing and Resolving Problems
sidebar_position: 30
---

# Diagnosing and Resolving Problems

Effective debugging in Agent Mesh requires a systematic approach that accounts for the platform's distributed architecture. Because your system consists of multiple agents communicating through a Solace event broker, issues can arise at several levels, from individual agent logic to inter-component communication patterns.

The key to successful debugging lies in understanding where problems might occur and having the right tools to investigate each layer of your system. Agent Mesh provides observability features that serve as your foundation for debugging activities. For detailed information about these monitoring capabilities, see [Observability](./observability.md).

This guide presents debugging strategies arranged from simple isolation techniques to advanced diagnostic methods. Each approach targets different types of issues, so you can choose the most effective method for your situation.

## Isolating Components

When facing complex issues in a multi-agent system, isolation is your most powerful debugging technique. By running only the components directly related to your problem, you eliminate variables and focus your investigation on the most likely sources of trouble.

Component isolation works because it reduces system complexity to a manageable level. Instead of trying to understand interactions across dozens of agents, you can focus on a small subset and verify their behavior under controlled conditions.

The Agent Mesh CLI lets you control which components run in your debugging session. You specify exactly which configuration files to load, creating a minimal environment that includes only the agents you need to investigate.

For example, if you're debugging an issue with a specific tool integration, you might run only the agents involved in it:

```bash
sam run configs/agents/my_tool_1.yaml configs/agents/my_tool_2.yaml
```

This command creates a focused debugging environment that includes only the agents defined in `my_tool_1.yaml` and `my_tool_2.yaml`. By eliminating unrelated components, you reduce log noise and make it easier to trace the specific interactions that might be causing problems.

This isolation approach is particularly effective when you suspect issues with agent-to-agent communication, configuration problems, or logic errors within specific agents.
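
If you switch between isolation sets often, a small wrapper can assemble the command for you. The following is only a sketch: the helper name `build_sam_run_command` is hypothetical, and it assumes nothing about the CLI beyond the `sam run <config files>` form shown above.

```python
import shlex
from pathlib import Path


def build_sam_run_command(config_files: list[str]) -> list[str]:
    """Return the argv for `sam run` limited to the given config files.

    Hypothetical helper: it only mirrors the `sam run <files...>` form
    shown in this guide and rejects an empty selection.
    """
    if not config_files:
        raise ValueError("select at least one configuration file")
    # Normalize path separators so the command reads the same everywhere.
    return ["sam", "run", *(Path(p).as_posix() for p in config_files)]


command = build_sam_run_command(
    ["configs/agents/my_tool_1.yaml", "configs/agents/my_tool_2.yaml"]
)
print(shlex.join(command))
# To launch it from Python: subprocess.run(command, check=True)
```

Keeping each isolation set as a named list in a script like this makes it easy to rerun exactly the same environment while you investigate.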

## Examining STIM Files

STIM files are your most detailed forensic evidence when debugging complex issues. These traces capture how requests flow through your system, which makes them well suited to problems that span multiple agents or involve timing-sensitive interactions.

[STIM files](./observability.md#examining-stimulus-logs) provide the most complete picture available of stimulus lifecycles. Unlike real-time monitoring tools that show current activity, STIM files preserve historical data that you can analyze repeatedly and share with team members for collaborative debugging.

Each `.stim` file contains a complete record of all Solace event broker events related to a single stimulus, from the initial user request through every agent interaction to the final response delivery. This coverage makes STIM files particularly useful for debugging issues that involve:

- Multi-agent workflows where the problem might occur at any step
- Timing-related issues where sequence and duration matter
- Intermittent problems that are difficult to reproduce in real time
- Performance bottlenecks that require detailed timing analysis

When examining STIM files, look for patterns in agent response times, unexpected message routing, or missing interactions that should have occurred based on your system design.
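
The layout of a `.stim` file is not described in this guide, so the following is only a sketch of the timing analysis. It assumes you have already extracted each broker event into a hypothetical `(timestamp_seconds, topic)` pair, and it flags any gap between consecutive events that exceeds a threshold.

```python
from typing import NamedTuple


class Gap(NamedTuple):
    seconds: float
    before_topic: str
    after_topic: str


def find_slow_gaps(events: list[tuple[float, str]], threshold: float) -> list[Gap]:
    """Return gaps longer than `threshold` seconds between consecutive events."""
    ordered = sorted(events)  # sort by timestamp (first tuple element)
    gaps = []
    for (t_prev, topic_prev), (t_next, topic_next) in zip(ordered, ordered[1:]):
        delta = t_next - t_prev
        if delta > threshold:
            gaps.append(Gap(round(delta, 3), topic_prev, topic_next))
    return gaps


# Illustrative data only; real timestamps and topics come from your own trace.
sample = [
    (0.00, "a2a/v1/agent/request/orchestrator"),
    (0.12, "a2a/v1/agent/request/my_tool_1"),
    (4.87, "a2a/v1/gateway/response/0000000/task-0000000"),
]
for gap in find_slow_gaps(sample, threshold=1.0):
    print(f"{gap.seconds}s between {gap.before_topic} and {gap.after_topic}")
```

A long gap right after a request to one agent points at that agent's processing time, while a request with no later event at all suggests a missing interaction.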

## Monitoring Event Broker Activity

Real-time Solace event broker monitoring shows your system's communication patterns as they happen and helps you identify issues as they occur. This approach complements STIM file analysis by giving you live visibility into message flows and event interactions.

Broker-level monitoring is particularly valuable because it shows the actual communication happening between components, regardless of how agents are configured or what they report about their own status. This ground-truth perspective helps identify discrepancies between expected and actual behavior.

For guidance on Solace event broker monitoring techniques and tools, see [Monitoring Event Broker Activity](./observability.md#monitoring-event-broker-activity).

## Using Debug Mode

Interactive debugging provides the deepest level of investigation by allowing you to pause execution and examine system state in real time. Because Agent Mesh is built on Python, you can use standard Python debugging tools and IDE features to step through code execution and inspect variables.

This approach is most effective when you've already isolated the problem to specific components and need to understand exactly what's happening within agent logic or framework code.

### Setting Up VSCode Debugging

VSCode's integrated debugger allows you to set breakpoints, step through code execution, and inspect variables in real time, making it easier to understand complex agent interactions and identify logic errors.

Configure debugging by creating or updating your `.vscode/launch.json` file:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "sam-debug",
      "type": "debugpy",
      "request": "launch",
      "module": "solace_agent_mesh.cli.main",
      "console": "integratedTerminal",
      "envFile": "${workspaceFolder}/.env",
      "args": [
        "run",
        "configs/agents/main_orchestrator.yaml",
        "configs/gateways/webui.yaml"
        // Add any other components you want to run here
      ],
      "justMyCode": false
    }
  ]
}
```

The `"justMyCode": false` setting is particularly important because it allows you to step into Agent Mesh framework code, not just your custom agent logic. This capability is valuable when debugging issues that might involve framework behavior or when you need to understand how your agents interact with the underlying platform.

To start a debugging session:

1. Open the **RUN AND DEBUG** panel in the left sidebar
2. Select `sam-debug` from the configuration dropdown
3. Click the **Play** button to launch your system in debug mode

Once running, you can set breakpoints in your agent code, framework files, or any Python modules your system uses. When execution hits a breakpoint, you can inspect variable states, evaluate expressions, and step through code line by line to understand exactly what's happening.

## Invoking Agents Directly

Direct agent invocation lets you test individual agents outside of normal user workflows. This approach helps you verify that specific agents work correctly in isolation, making it easier to determine whether problems lie within agent logic or in the broader system interactions.

You can invoke agents directly through two primary methods: using the web UI's agent selection dropdown for quick testing, or sending messages directly through the Solace event broker for more controlled testing scenarios.

The Solace event broker-based approach gives you complete control over message content and timing, making it ideal for testing edge cases, error conditions, or specific message formats that might be difficult to generate through normal user interactions.

### Using Tools for Direct Message Testing

Several tools facilitate direct message testing, each suited to different debugging scenarios:

**[Solace Try Me VSCode Extension](https://marketplace.visualstudio.com/items?itemName=solace-tools.solace-try-me-vsc-extension)**: Integrates directly into your development environment, making it convenient to test messages without switching contexts. This tool is particularly useful during active development when you need to quickly verify agent behavior.

**[Solace Try Me (STM) CLI Tool](https://github.com/SolaceLabs/solace-tryme-cli)**: Provides command-line access for scripted testing and automation. This tool excels in scenarios where you need to send multiple test messages or integrate testing into automated workflows.

### Formatting Messages for Direct Invocation

Direct agent testing only works if the message format is exact. The following structure shows how the Agent Mesh framework expects messages to be formatted:

**Topic Structure**:
```
[NAME_SPACES]a2a/v1/agent/request/<agent_name>
```

Replace `<agent_name>` with the name of the agent you want to test, and `[NAME_SPACES]` with the namespace prefix from your system configuration.

**Required User Properties**:
```
userId: test-0000
clientId: test-0000
replyTo: [NAME_SPACES]a2a/v1/gateway/response/0000000/task-0000000
a2aUserConfig: {}
```

These properties provide essential context that agents expect, including user identification and response routing information.
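
The topic and user-property templates above can be filled in programmatically. This is a minimal sketch: the function names are hypothetical, the `response_id` parameter simply stands for the `0000000` segment whose meaning is not specified here, and the trailing-slash handling assumes the namespace prefix ends with `/`, which the `[NAME_SPACES]` placeholder suggests but does not confirm.

```python
from typing import Any


def _prefix(namespace: str) -> str:
    """Return the namespace with exactly one trailing '/', or '' if empty."""
    trimmed = namespace.strip("/")
    return f"{trimmed}/" if trimmed else ""


def request_topic(namespace: str, agent_name: str) -> str:
    """Request topic for invoking one agent directly."""
    return f"{_prefix(namespace)}a2a/v1/agent/request/{agent_name}"


def reply_topic(namespace: str, response_id: str, task_id: str) -> str:
    """Response topic, sent to the agent as the `replyTo` property."""
    return f"{_prefix(namespace)}a2a/v1/gateway/response/{response_id}/{task_id}"


def user_properties(namespace: str, user_id: str, client_id: str,
                    response_id: str, task_id: str) -> dict[str, Any]:
    """Build the user properties listed above for one test request."""
    return {
        "userId": user_id,
        "clientId": client_id,
        "replyTo": reply_topic(namespace, response_id, task_id),
        "a2aUserConfig": {},  # empty config, as in the example above
    }


# "myorg/dev" is a made-up namespace; use the one from your configuration.
print(request_topic("myorg/dev", "my_tool_1"))
print(user_properties("myorg/dev", "test-0000", "test-0000", "0000000", "task-0000000"))
```

Deriving `replyTo` from the same inputs as the request keeps the topic you subscribe to and the topic the agent replies on from drifting apart during testing.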

**Message Payload**:
```json
{
  "jsonrpc": "2.0",
  "id": "000000000",
  "method": "tasks/sendSubscribe",
  "params": {
    "id": "task-0000000",
    "sessionId": "web-session-00000000",
    "message": {
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "Hello World!"
        }
      ]
    },
    "acceptedOutputModes": [
      "text"
    ],
    "metadata": {
      "system_purpose": "The system is an AI Chatbot with agentic capabilities. It uses the agents available to provide information, reasoning and general assistance for the users in this system. **Always return useful artifacts and files that you create to the user.** Provide a status update before each tool call. Your external name is Agent Mesh.\n",
      "response_format": "Responses should be clear, concise, and professionally toned. Format responses to the user in Markdown using appropriate formatting.\n"
    }
  }
}
```
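
When you send many test messages, building the payload in code avoids copy-and-paste mistakes. The sketch below reproduces the fields of the payload above with the standard `json` module; the function name `build_request_payload` is hypothetical, and the shortened metadata values are placeholders rather than recommended settings.

```python
import json


def build_request_payload(request_id: str, task_id: str, session_id: str,
                          text: str, metadata: dict[str, str]) -> str:
    """Serialize a single-text-part request with the fields shown above."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tasks/sendSubscribe",
        "params": {
            "id": task_id,
            "sessionId": session_id,
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": text}],
            },
            "acceptedOutputModes": ["text"],
            "metadata": metadata,
        },
    }
    return json.dumps(payload, indent=2)


body = build_request_payload(
    request_id="000000000",
    task_id="task-0000000",
    session_id="web-session-00000000",
    text="Hello World!",
    # Shortened placeholder values; use the full strings from the payload above.
    metadata={"system_purpose": "...", "response_format": "..."},
)
print(body)
```

Pass the same `task_id` here that you used in the `replyTo` property so the response arrives on the topic you are subscribed to.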

**Expected Response Topic**:
```
[NAME_SPACES]a2a/v1/gateway/response/0000000/task-0000000
```

Subscribe to this topic to receive the agent's response. The response will follow the same JSON-RPC format and contain the agent's output.

By sending carefully crafted requests and observing responses, you can verify agent behavior in complete isolation. This technique helps distinguish between agent-specific issues and broader system problems, significantly streamlining your debugging process.
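
To check a message received on the response topic, you can validate its JSON-RPC 2.0 envelope before looking at the content. This sketch relies only on the generic JSON-RPC rules (a response carries either `result` or `error`); the shape of `result` for Agent Mesh is not specified in this guide, so the sample below uses a placeholder. The names `parse_response` and `AgentError` are hypothetical.

```python
import json
from typing import Any, Optional


class AgentError(Exception):
    """Raised when the response carries a JSON-RPC error object."""


def parse_response(raw: str, expected_id: Optional[str] = None) -> Any:
    """Validate one JSON-RPC 2.0 response and return its `result`."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if expected_id is not None and message.get("id") != expected_id:
        raise ValueError(f"unexpected id {message.get('id')!r}")
    if "error" in message:
        err = message["error"]
        raise AgentError(f"{err.get('code')}: {err.get('message')}")
    return message.get("result")


# Placeholder result, not the real Agent Mesh response schema.
sample = '{"jsonrpc": "2.0", "id": "000000000", "result": {"note": "placeholder"}}'
print(parse_response(sample, expected_id="000000000"))
```

The function handles one message at a time, so call it for each message that arrives on the response topic.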

## Analyzing System Logs

System logs record application behavior, from routine operations to error conditions. These logs provide a different perspective than STIM files or Solace event broker monitoring: they focus on internal application state and framework behavior rather than message flows.

System logs are most useful when debugging issues related to agent initialization, configuration problems, or internal framework errors that might not be visible through other observability tools.

For detailed information about configuring system logs, see [Logging Configuration](./logging.md).
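
A quick first pass over a log is to count entries by severity. The sketch below assumes only that each line contains one of the standard Python level names; the actual line format depends on your logging configuration, and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Standard Python logging level names.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines by the first level name found in each line."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines; real entries depend on your logging configuration.
sample_lines = [
    "2025-01-01 12:00:00 INFO agent starting",
    "2025-01-01 12:00:01 WARNING configuration value missing, using default",
    "2025-01-01 12:00:02 ERROR failed to initialize component",
    "2025-01-01 12:00:03 INFO agent ready",
]
print(count_levels(sample_lines))
```

Because the function accepts any iterable of strings, you can pass it an open file object to summarize a whole log file.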