troubleshooting.md
# Diagnostics & Troubleshooting

---

## Logs

| Log | Content |
|-----|---------|
| `logs/backend.log` | API server (10MB rotation, 5 backups) |
| `logs/agent_cli.log` | CLI execution |
| `logs/latest-test-results.log` | Last test run (overwritten) |

```bash
docker logs project-ag3ntum-api-1 --tail 100 -f    # Container stdout
./run.sh shell && tail -f /logs/backend.log        # Inside container
grep -i "denied\|blocked" logs/backend.log         # Security denials
grep "ERROR\|Exception" logs/backend.log           # Errors
```

**Loggers**: `src.api` | `src.services` | `src.core` | `src.db` | `ag3ntum` | `tools.ag3ntum` | `uvicorn` | `fastapi`

---

## Database

`sqlite3 data/ag3ntum.db` (tables: `users`, `sessions`, `events`, `tokens`)

```sql
-- List recent sessions
SELECT id, status, task, total_cost_usd FROM sessions ORDER BY created_at DESC LIMIT 10;
-- Check user UIDs (sandbox debug)
SELECT username, linux_uid FROM users WHERE linux_uid BETWEEN 50000 AND 60000;
-- Count events for session
SELECT COUNT(*) FROM events WHERE session_id = 'SESSION_ID';
-- Find terminal event
SELECT event_type FROM events WHERE session_id = 'SESSION_ID'
  AND event_type IN ('agent_complete', 'error', 'cancelled');
```

---

## Debug Agent Execution

```bash
./venv/bin/python scripts/ag3ntum_debug.py -r "task" --user "email" --password "pass"
# -v verbose (all events)
# -s security only (blocked ops)
# -d dump session files
# -m/--model override model (e.g., "openrouter:openai/gpt-5.2")
```
Read @`how-to-debug-agent-with-ag3ntum_debug.md`. Note: auth uses email, filesystem uses username.

---

## Troubleshooting Flowcharts

**Session stuck in "running"**:
1. Check process: `ps aux | grep session_id` inside container
2. Check DB: `SELECT status, updated_at FROM sessions WHERE id = '...';`
3. Fix: `./run.sh restart` (cleans stale sessions on startup)

**Events not appearing in UI**:
1. Redis alive? `redis-cli ping` (inside container)
2. Events persisted? `SELECT COUNT(*) FROM events WHERE session_id = '...';`
3. Browser console → SSE connection errors?
4. JWT token valid? Check expiry in browser DevTools.

**Agent failing silently**:
1. Check SDK log: `tail -50 users/USER/sessions/ID/agent.jsonl | grep -i error`
2. Check backend: `grep -A5 "Exception\|Traceback" logs/backend.log | tail -30`

**Container won't start**:
1. Port conflict: `lsof -i :40080` / `lsof -i :50080`
2. Stale containers: `./run.sh cleanup && ./run.sh build`
3. Permission issue (Linux): `./run.sh build` re-runs chown

**Tests failing unexpectedly**:
1. Check `logs/latest-test-results.log` for full output
2. Stale container? `./run.sh rebuild && ./run.sh test`
3. Redis down? Tests need Redis: `docker ps | grep redis`
4. Wrong platform binaries (UI tests)? `run.sh` auto-detects and reinstalls node_modules

**Dev/Prod mode issues**:
1. **503 "Frontend not built"**: Web container is in prod mode but `/web_dist` is missing or empty. Rebuild: `./run.sh build --no-cache`
2. **Stale frontend after code changes**: In prod mode, frontend is baked into the Docker image. Must rebuild (`./run.sh build`) to pick up changes. In dev mode, Vite HMR handles this automatically.
3. **Wrong mode after restart**: `./run.sh restart` preserves the current mode from `.env`. To switch modes, use `./run.sh build` (prod) or `./run.sh build --dev` (dev).
4. **UI tests fail with ENOENT /app/package.json**: Web container is in prod mode (no node_modules). Run `./run.sh test --ui`; it auto-switches to dev mode for tests.
5. **"Failed to resolve import" errors**: Check that `vite.shared.mjs` exists and is imported by both `vite.config.mjs` and `vitest.config.mjs`. Rebuild with `--no-cache` if the Docker image is stale.
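
The "Session stuck in running" check and the terminal-event query from the Database section can be combined into one pass over the database. The following is a minimal sketch, not part of the project: `find_stuck_sessions` is a hypothetical helper, and it assumes the status value is stored literally as `'running'` and that the `sessions` and `events` columns match the ones used in the queries earlier in this document.

```python
import sqlite3

# Terminal event types, copied from the "Find terminal event" query.
TERMINAL_EVENTS = ("agent_complete", "error", "cancelled")


def find_stuck_sessions(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (id, updated_at) for sessions marked 'running' that have
    no terminal event recorded. Hypothetical helper for illustration."""
    placeholders = ", ".join("?" for _ in TERMINAL_EVENTS)
    query = f"""
        SELECT s.id, s.updated_at
        FROM sessions AS s
        WHERE s.status = 'running'          -- assumed literal status value
          AND NOT EXISTS (
              SELECT 1 FROM events AS e
              WHERE e.session_id = s.id
                AND e.event_type IN ({placeholders})
          )
        ORDER BY s.updated_at
    """
    return conn.execute(query, TERMINAL_EVENTS).fetchall()
```

To run it against the real database without risking writes, one option is to open the file read-only with `sqlite3.connect("file:data/ag3ntum.db?mode=ro", uri=True)` and pass that connection in. Sessions it returns with an old `updated_at` are candidates for the `./run.sh restart` fix.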

**SSE streaming broken**:
1. Frontend falls back: SSE → backoff → polling (3+ fails) → SSE retry (60s)
2. Check `ConnectionManager` state in React DevTools
3. Check `/sessions/{id}/events` endpoint in Network tab
4. Fallback endpoint: `/sessions/{id}/events/history` (polling)
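
When reading `ConnectionManager` state, it can help to have the fallback sequence from step 1 written out as a small state machine. The sketch below is an illustration in Python, not the frontend's actual code: the `TransportState` class is hypothetical, the thresholds come from the sequence above (3 failures, 60 s), and the exponential backoff schedule is an assumption.

```python
from dataclasses import dataclass

MAX_SSE_FAILURES = 3       # "polling (3+ fails)"
SSE_RETRY_AFTER_S = 60.0   # "SSE retry (60s)"


@dataclass
class TransportState:
    """Hypothetical model of the SSE -> backoff -> polling -> SSE retry cycle."""
    mode: str = "sse"            # "sse" or "polling"
    failures: int = 0            # consecutive SSE failures
    polling_since: float = 0.0   # timestamp when polling started

    def on_sse_failure(self, now: float) -> float:
        """Record a failed SSE attempt; return seconds to wait before retrying."""
        self.failures += 1
        if self.failures >= MAX_SSE_FAILURES:
            # Too many failures: switch to the polling endpoint immediately.
            self.mode = "polling"
            self.polling_since = now
            return 0.0
        # Assumed backoff schedule: 1 s, then 2 s.
        return float(2 ** (self.failures - 1))

    def on_sse_success(self) -> None:
        """A working SSE connection resets the failure counter."""
        self.mode = "sse"
        self.failures = 0

    def should_retry_sse(self, now: float) -> bool:
        """While polling, retry SSE once the retry interval has elapsed."""
        return self.mode == "polling" and now - self.polling_since >= SSE_RETRY_AFTER_S
```

If the Network tab shows requests to `/sessions/{id}/events/history` only, the client is in the polling state of this cycle and should attempt `/sessions/{id}/events` again after roughly a minute.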