# Diagnostics & Troubleshooting

---

## Logs

| Log | Content |
|-----|---------|
| `logs/backend.log` | API server (10 MB rotation, 5 backups) |
| `logs/agent_cli.log` | CLI execution |
| `logs/latest-test-results.log` | Last test run (overwritten) |

```bash
docker logs project-ag3ntum-api-1 --tail 100 -f     # Container stdout
./run.sh shell                                       # Open a shell in the container, then:
tail -f /logs/backend.log                            #   follow the backend log (inside container)
grep -i "denied\|blocked" logs/backend.log           # Security denials
grep "ERROR\|Exception" logs/backend.log             # Errors
```

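The two `grep` filters can also be run in one pass from Python, which helps when you want line numbers for both categories at once. This is a minimal sketch: `scan_log` is a hypothetical helper, not part of the project, and it assumes nothing about the log line format beyond the same patterns the `grep` commands use. Call it as `scan_log("logs/backend.log")`.

```python
import re
from pathlib import Path

# Same patterns as the grep commands; only the denial match is case-insensitive.
DENIAL_RE = re.compile(r"denied|blocked", re.IGNORECASE)
ERROR_RE = re.compile(r"ERROR|Exception")


def scan_log(path):
    """Return (denials, errors): lists of (line_number, text) for matching lines."""
    denials, errors = [], []
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            text = line.rstrip("\n")
            if DENIAL_RE.search(text):
                denials.append((number, text))
            if ERROR_RE.search(text):
                errors.append((number, text))
    return denials, errors
```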
**Loggers**: `src.api` | `src.services` | `src.core` | `src.db` | `ag3ntum` | `tools.ag3ntum` | `uvicorn` | `fastapi`

---

## Database

Open with `sqlite3 data/ag3ntum.db`. Tables: `users`, `sessions`, `events`, `tokens`

```sql
-- List recent sessions
SELECT id, status, task, total_cost_usd FROM sessions ORDER BY created_at DESC LIMIT 10;
-- Check user UIDs (sandbox debug)
SELECT username, linux_uid FROM users WHERE linux_uid BETWEEN 50000 AND 60000;
-- Count events for a session
SELECT COUNT(*) FROM events WHERE session_id = 'SESSION_ID';
-- Find the terminal event
SELECT event_type FROM events WHERE session_id = 'SESSION_ID'
  AND event_type IN ('agent_complete', 'error', 'cancelled');
```
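
The last two queries are often run together, so a small Python wrapper can be handy. The sketch below uses only the `events` columns that appear in the queries above (`session_id`, `event_type`); `session_summary` is a hypothetical helper, and the demo schema and the `other` event type are stand-ins, since the real table likely has more columns. Against the real database, connect with `sqlite3.connect("data/ag3ntum.db")` instead.

```python
import sqlite3

# Terminal event types, as listed in the query above.
TERMINAL_EVENTS = ("agent_complete", "error", "cancelled")


def session_summary(conn, session_id):
    """Return (event_count, terminal_event_types) for one session."""
    count = conn.execute(
        "SELECT COUNT(*) FROM events WHERE session_id = ?", (session_id,)
    ).fetchone()[0]
    placeholders = ", ".join("?" for _ in TERMINAL_EVENTS)
    rows = conn.execute(
        f"SELECT event_type FROM events WHERE session_id = ? "
        f"AND event_type IN ({placeholders})",
        (session_id, *TERMINAL_EVENTS),
    ).fetchall()
    return count, [row[0] for row in rows]


if __name__ == "__main__":
    # Demo on an in-memory database with a stand-in schema (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (session_id TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("s1", "other"), ("s1", "agent_complete"), ("s2", "other")],
    )
    print(session_summary(conn, "s1"))  # (2, ['agent_complete'])
    print(session_summary(conn, "s2"))  # (1, [])
```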

---

## Debug Agent Execution

```bash
./venv/bin/python scripts/ag3ntum_debug.py -r "task" --user "email" --password "pass"
# -v  verbose (all events)
# -s  security only (blocked ops)
# -d  dump session files
# -m/--model  override model (e.g., "openrouter:openai/gpt-5.2")
```

Read @`how-to-debug-agent-with-ag3ntum_debug.md` for details. Note: authentication uses the email address, while the filesystem uses the username.

---

## Troubleshooting Flowcharts

**Session stuck in "running"**:
1. Check process: `ps aux | grep SESSION_ID` inside container
2. Check DB: `SELECT status, updated_at FROM sessions WHERE id = '...';`
3. Fix: `./run.sh restart` (cleans stale sessions on startup)

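To find every stuck session at once rather than checking one ID, the DB check can be generalized. This is a sketch under assumptions: `stale_running_sessions` is a hypothetical helper, the 30-minute threshold is arbitrary, and `updated_at` is assumed to be stored as naive ISO-8601 text (for example `2024-01-01 10:00:00`) in the same timezone as `now`. Verify the stored format before relying on it.

```python
import sqlite3
from datetime import datetime, timedelta


def stale_running_sessions(conn, now, max_idle=timedelta(minutes=30)):
    """Return [(id, updated_at)] for 'running' sessions idle longer than max_idle.

    Assumes updated_at is naive ISO-8601 text in the same timezone as `now`.
    """
    rows = conn.execute(
        "SELECT id, updated_at FROM sessions "
        "WHERE status = 'running' ORDER BY updated_at"
    ).fetchall()
    stale = []
    for session_id, updated_at in rows:
        # Parse in Python so both second and microsecond precision are accepted.
        if now - datetime.fromisoformat(updated_at) > max_idle:
            stale.append((session_id, updated_at))
    return stale
```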
**Events not appearing in UI**:
1. Redis alive? `redis-cli ping` (inside container)
2. Events persisted? `SELECT COUNT(*) FROM events WHERE session_id = '...';`
3. SSE connection errors? Check the browser console.
4. JWT token valid? Check expiry in browser DevTools.

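For the JWT check, the expiry can be read directly from a token copied out of DevTools. The sketch below assumes a standard three-segment signed JWT with a numeric `exp` claim; `jwt_expiry` and `is_expired` are hypothetical helpers, and the signature is deliberately not verified, so use this for inspection only.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) from a JWT payload, or None if absent.

    The signature is NOT verified; this only inspects the token for debugging.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token, now=None):
    """True if the token has an `exp` claim that is not in the future."""
    exp = jwt_expiry(token)
    current = time.time() if now is None else now
    return exp is not None and exp <= current
```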
**Agent failing silently**:
1. Check SDK log: `tail -50 users/USER/sessions/ID/agent.jsonl | grep -i error`
2. Check backend: `grep -A5 "Exception\|Traceback" logs/backend.log | tail -30`

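Because `agent.jsonl` is JSON Lines, parsing the matching lines makes long records easier to read than raw `grep` output. This sketch mirrors the `tail -50 | grep -i error` step; `find_error_records` is a hypothetical helper and assumes nothing about the record fields, only that each line is intended to be one JSON value.

```python
import json


def find_error_records(path, tail=50):
    """Return parsed records from the last `tail` lines whose text mentions 'error'."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()[-tail:]
    hits = []
    for line in lines:
        if "error" not in line.lower():
            continue
        try:
            hits.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep malformed lines too: a truncated final line is itself a clue.
            hits.append({"unparsed": line.strip()})
    return hits
```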
**Container won't start**:
1. Port conflict: `lsof -i :40080` / `lsof -i :50080`
2. Stale containers: `./run.sh cleanup && ./run.sh build`
3. Permission issue (Linux): `./run.sh build` re-runs chown

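Where `lsof` is unavailable, a TCP connect probe gives a rough equivalent of the port-conflict check. This is a sketch: `port_in_use` is a hypothetical helper that only detects a listener accepting connections on the loopback address, and unlike `lsof` it does not tell you which process holds the port.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (40080, 50080):  # ports from the checklist above
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```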
**Tests failing unexpectedly**:
1. Check `logs/latest-test-results.log` for full output
2. Stale container? `./run.sh rebuild && ./run.sh test`
3. Redis down? Tests need Redis: `docker ps | grep redis`
4. Wrong platform binaries (UI tests)? `run.sh` auto-detects and reinstalls node_modules

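`docker ps` shows that the Redis container exists, not that it answers. A protocol-level probe can confirm the latter without a Redis client library. In this sketch `redis_ping` is a hypothetical helper; the host and port are assumptions (Redis's default `6379` on loopback), so adjust them to wherever the project's Redis is reachable. A healthy server replies `+PONG`; a reply starting with `-NOAUTH` means it is up but requires authentication.

```python
import socket


def redis_ping(host="127.0.0.1", port=6379, timeout=2.0):
    """Send an inline PING; return the first reply line, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
    return reply.decode("ascii", errors="replace").strip()
```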
**Dev/Prod mode issues**:
1. **503 "Frontend not built"**: The web container is in prod mode but `/web_dist` is missing or empty. Rebuild: `./run.sh build --no-cache`
2. **Stale frontend after code changes**: In prod mode, the frontend is baked into the Docker image, so you must rebuild (`./run.sh build`) to pick up changes. In dev mode, Vite HMR handles this automatically.
3. **Wrong mode after restart**: `./run.sh restart` preserves the current mode from `.env`. To switch modes, use `./run.sh build` (prod) or `./run.sh build --dev` (dev).
4. **UI tests fail with ENOENT /app/package.json**: The web container is in prod mode (no node_modules). Run `./run.sh test --ui`, which auto-switches to dev mode for tests.
5. **"Failed to resolve import" errors**: Check that `vite.shared.mjs` exists and is imported by both `vite.config.mjs` and `vitest.config.mjs`. Rebuild with `--no-cache` if the Docker image is stale.

**SSE streaming broken**:
1. Frontend falls back: SSE → backoff → polling (after 3+ failures) → SSE retry (60s)
2. Check `ConnectionManager` state in React DevTools
3. Check `/sessions/{id}/events` endpoint in Network tab
4. Fallback endpoint: `/sessions/{id}/events/history` (polling)
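
When reading `ConnectionManager` state, it helps to have a mental model of the fallback chain in step 1. The sketch below is one plausible reading of it, not the frontend's actual code: `FallbackState` and its methods are hypothetical, the backoff schedule is assumed (the source gives no values), and only the 3-failure threshold and the 60s retry come from the checklist.

```python
from dataclasses import dataclass

FAILS_BEFORE_POLLING = 3    # "3+ failures" from the checklist
SSE_RETRY_AFTER_S = 60.0    # "SSE retry (60s)" from the checklist
BASE_BACKOFF_S = 1.0        # assumed; the source gives no backoff values


@dataclass
class FallbackState:
    mode: str = "sse"            # "sse" or "polling"
    failures: int = 0            # consecutive SSE failures
    polling_since: float = 0.0   # timestamp when polling began

    def on_sse_failure(self, now):
        """Record a failed SSE attempt; return seconds to wait before the next step."""
        self.failures += 1
        if self.failures >= FAILS_BEFORE_POLLING:
            # Too many failures: switch to the polling endpoint immediately.
            self.mode = "polling"
            self.polling_since = now
            return 0.0
        # Exponential backoff before the next SSE attempt: 1s, then 2s (assumed).
        return BASE_BACKOFF_S * 2 ** (self.failures - 1)

    def on_sse_success(self):
        """Reset to normal streaming after a successful SSE connection."""
        self.mode = "sse"
        self.failures = 0

    def should_retry_sse(self, now):
        """While polling, report whether it is time to try SSE again."""
        return self.mode == "polling" and now - self.polling_since >= SSE_RETRY_AFTER_S
```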