server.md
# Server & Infrastructure — Bob

> Last updated: 2026-04-09 (Session 12 final)

## Access

- **Hostname**: rig.lan (192.168.1.137)
- **SSH**: `ssh rig@rig.lan` (key: `~/.ssh/id_rsa` from NH-OneXPlayer)
- **User**: rig (passwordless sudo)

## Hardware

| Component | Spec |
|-----------|------|
| CPU | AMD Ryzen 9 8945HS (8 cores / 16 threads, up to 5.26 GHz) |
| RAM | 80 GB |
| GPU 0 | NVIDIA GeForce RTX 3090 (24 GB, bus 01:00.0) — internal PCIe x4 |
| GPU 1 | NVIDIA GeForce RTX 3090 (24 GB, bus 08:00.0) — Razer Core X via TB3 |
| GPU 2 | NVIDIA GeForce RTX 3090 (24 GB, bus 68:00.0) — Razer Core X via TB3 |
| Total VRAM | 72 GB |
| eGPU enclosures | 2x Razer Core X (Thunderbolt 3, Intel JHL6340) |
| Storage | 3.6 TB NVMe (nvme0n1) |
| Network | enp3s0 (Ethernet, DHCP) |
| PSU | **Unknown — verify ≥ 1200W** (3x 350W TDP GPUs = 1050W GPU alone) |

## Current OS

- **NixOS 26.05 (Yarara)** — installed 2026-03-24, boot repaired 2026-03-29
- Kernel: 6.18.19
- NVIDIA Driver: 590.48.01 (proprietary), CUDA 13.1
- Bootloader: systemd-boot
- Thunderbolt: auto-authorize via udev rule + bolt.service
- Flake: `nix/` in this repo

## Disk Layout

```
nvme0n1       3.6T
├─nvme0n1p1   512M  vfat  /boot  (PARTLABEL=disk-nvme-esp)
├─nvme0n1p2   8G    swap         (PARTLABEL=swap, not active)
└─nvme0n1p3   3.6T  ext4  /      (PARTLABEL=disk-nvme-root)
```

## Services — Running

| Service | Port | Runtime | Notes |
|---------|------|---------|-------|
| vLLM (primary) | 8000 | Docker (GPU 0+1, TP=2) | Qwen3-32B AWQ, ~40 tok/s, tool calling, llm.rig.lan |
| Embeddings (TEI) | 8080 | Docker (GPU 2) | BAAI/bge-m3 |
| faster-whisper | 10300 | Docker (GPU 2) | Whisper large-v3 INT8, STT, stt.rig.lan |
| Kokoro TTS | 10400 | Docker (GPU 2) | 54 voices, <300ms latency, tts.rig.lan |
| Fish Speech | 10600 | Docker (GPU 2) | v1.5, Ray Porter voice clone (primary TTS) |
| openWakeWord | 10500 | Docker (CPU) | Wyoming protocol, custom "hey bob" wake word |
| Ollama | 11434 | Docker (GPU 2) | Qwen2.5-VL-3B vision, time-shared, 30s keepalive |
| NATS JetStream | 4222/8222/1883 | NixOS native | Event bus + MQTT bridge, nats.rig.lan |
| HomeAssistant | 8123 | Docker (CPU, host network) | Onboarded, https://home.genexergy.org |
| Pipecat Agent | 10700 | Docker (CPU, host net) | Voice pipeline + 9 tools, wake word, diarization |
| Diarization | — | Docker (GPU 2) | diart + CAM++ streaming, 3 speakers enrolled |
| Oxigraph | 7878 | Docker (CPU) | SPARQL endpoint, ~122 triples (BFO+CCO data lost — needs reload) |
| Neo4j | 7474/7687 | Docker (CPU) | Graph DB for Graphiti agent memory |
| HA→NATS Bridge | — | Docker (CPU) | Publishes HA state changes to NATS |
| Caddy | 80/443 | NixOS native | Reverse proxy for *.rig.lan + voice.rig.lan |
| Prometheus | 9090 | NixOS native | Metrics, 30d retention, prometheus.rig.lan |
| Node Exporter | 9100 | NixOS native | System metrics |
| Alertmanager | 9093 | NixOS native | 8 alert rules |
| Reticulum | 4242 | Docker (CPU) | Transport node, TCP gateway, MichMesh + RMAP peered |
| Syncthing | 8384/22000 | Docker (CPU) | Running, rig + kairos synced |
| Firefly III | 8181 | Docker (CPU) | Financial management, sops credentials |
| Firefly DB | 3306 | Docker (CPU) | MariaDB for Firefly III |
| TrustGraph | 8888/8088 | Docker Compose (~44 containers) | Workbench :8888 + API :8088, Authenticated (API key configured) |
| Squid Proxy | 3128 | Docker (nuclide-amd.lan) | Authenticated forward proxy, residential IP egress. Pending router port forward. |

| Bob Agent | Schedule | Notes |
|-----------|----------|-------|
| Agent Scheduler | always-on | Cron trigger service, NATS JetStream |
| Home Keeper | hourly | Infrastructure health checks |
| Morning Coordinator | 7:45 AM ET | Daily briefing (weather, news, health) |
| Evening Coordinator | 8:00 PM ET | Daily summary |
| Knowledge Gardener | 2:00 AM ET | Consolidation + real-time session storage + pruning |
| System Sentinel | every 15 min | Deep monitoring (Prometheus, Docker, SSH inventory) |
| News Aggregator | every 2 hours | RSS feeds + NWS + Guardian API |
| Device Health | every 4 hours | SSH checks across managed devices |
| Alert Bridge | always-on | Alertmanager → NATS webhook bridge |
| Calendar Bridge | always-on | ICS feed poller, 4 feeds active (Proton x2, MS365, Google) |
| Coordinator | always-on | Request classifier + model router |
| Home Automations | always-on | 3-tier rule engine (YAML, pattern, LLM) |
| Network Discovery | always-on | Subnet scanner |
| Announce Player | always-on | TTS announcements on speakers |
| REPL Sandbox | always-on | Sandboxed Python execution |
| Voice Enrollment | always-on | Speaker enrollment training |

## Services — DOWN or Degraded

| Service | Issue | Impact |
|---------|-------|--------|
| TrustGraph (7 of ~48) | Exited: workbench-ui, loki, grafana, prometheus, ddg-mcp, garage, mcp-server | TG monitoring + some UI unavailable |

> All Bob containers are now NixOS-managed. No unmanaged containers remain.

## GPU Allocation

| GPU | Bus | Services | VRAM Used | VRAM Free |
|-----|-----|----------|-----------|-----------|
| 0 | 01:00.0 (internal) | vLLM TP rank 0 (Qwen3-32B AWQ) | 20.9 GB | 2.7 GB |
| 1 | 08:00.0 (Razer Core X) | vLLM TP rank 1 (Qwen3-32B AWQ) | 20.9 GB | 2.7 GB |
| 2 | 68:00.0 (Razer Core X) | Classifier + Embeddings + STT + TTS (Fish+Kokoro) + Diarization + Ollama (on-demand) | 15.6 GB | 8.2 GB |

> **Note:** GPU 2 has ~8 GB free with classifier running. Ollama loads Qwen2.5-VL-3B on demand (~3.2 GB, 30s keepalive) — will temporarily reduce free VRAM to ~5 GB.

## Credentials

<!-- Secrets managed via sops-nix after NixOS install -->
| Account | Username | Password | Used For |
|---------|----------|----------|----------|
| rig SSH | rig | key-based | System access |
| Neo4j | neo4j | sops: `neo4j_password` | Graph DB |
| Grafana | admin | sops: `grafana_admin_password` | Monitoring dashboards |
| Firefly III DB | firefly | sops: `firefly_db_password` | MariaDB |
| Firefly III | — | sops: `firefly_app_key` | Laravel APP_KEY |
| Firefly DB root | root | sops: `firefly_db_root_password` | MariaDB root |
| Calendar Bridge | — | sops: `calendar_ics_urls` | 4 ICS feeds (Proton x2, MS365, Google) |
| HomeAssistant | — | sops: `ha_token` | Long-lived access token |
| HuggingFace | — | sops: `hf_token` | Gated model access (pyannote) |
| Residential Proxy | glean | sops: `proxy_password` | Squid proxy auth (proxy.genexergy.org:3128) |
| Restic Backup | — | sops: `restic_password` | Encrypted backup repo |

**Keycloak Users** (realm: hydra-ops, auth.genexergy.org):
| Username | Name | Role | Notes |
|----------|------|------|-------|
| cam | Cameron Hunt | haven | Dad — primary admin |
| aj | Adriane Hunt | haven | Mom |
| hailen | Hailen Hunt | haven | Son — email: hailen.n.hunt@outlook.com (verified) |
| operator | — | haven | Service account |
| greatroom | — | haven | Kiosk location account |
| garage | — | haven | Kiosk location account |

> Secrets managed by sops-nix. Decrypt with: `SOPS_AGE_KEY_FILE=/var/lib/sops-nix/key.txt sops nix/secrets/secrets.yaml`

## Common Operations

```bash
# SSH into rig
ssh rig@rig.lan

# Check GPU status
ssh rig@rig.lan "nvidia-smi"

# Check running containers
ssh rig@rig.lan "sudo docker ps"

# Test LLM inference
ssh rig@rig.lan 'curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"Qwen/Qwen3-32B-AWQ\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}],\"max_tokens\":64}"'

# Test embeddings
ssh rig@rig.lan 'curl -s http://localhost:8080/embed \
  -H "Content-Type: application/json" \
  -d "{\"inputs\":\"test\"}" | jq ".[0][:3]"'

# Test TTS → WAV
ssh rig@rig.lan 'curl -s http://localhost:10400/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"kokoro\",\"input\":\"Hello\",\"voice\":\"af_heart\"}" -o /tmp/test.wav'

# Test STT (transcribe a WAV)
ssh rig@rig.lan 'curl -s http://localhost:10300/v1/audio/transcriptions \
  -F "file=@/tmp/test.wav" -F "model=Systran/faster-whisper-large-v3"'

# Check NATS JetStream
ssh rig@rig.lan "curl -s http://localhost:8222/varz | jq .jetstream.config"

# Deploy NixOS config changes
rsync -avz nix/ rig@rig.lan:/tmp/haven-nix/
ssh rig@rig.lan "sudo nixos-rebuild switch --flake /tmp/haven-nix#rig"
```

## Notes

- **RAM**: 78 GB total, ~21 GB used, 56 GB available. Adequate for current stack.
- **Disk**: 250 GB / 3.6 TB used (8%). 3.2 TB free.
- **PSU**: Must verify wattage. 3x RTX 3090 at full load + CPU + system = ~1200W minimum.
- **Docker containers fully declarative**: 35 Bob containers managed via NixOS `containers.nix`. TrustGraph has its own docker-compose (~44 containers).
- **Model cache**: `/srv/bob/vllm` holds HuggingFace model downloads. Persistent across container restarts.
- **Single code directory**: All service code lives in `/home/rig/bob/services/`. The legacy `haven/` directory was removed in Session 12 (archived at `/home/rig/haven-archive-20260407.tar.gz`).
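
The PSU note reaches roughly 1200 W by adding CPU and system draw to the 1050 W GPU figure. A minimal sketch of that arithmetic follows. Only the 350 W per-GPU TDP comes from this document; the CPU and system wattages are placeholder assumptions, and the function name `psu_budget_w` is hypothetical.

```bash
#!/usr/bin/env bash
# Rough PSU sizing: sum of GPU TDPs plus CPU and remaining system draw.
# Arguments: gpu_count gpu_tdp_w cpu_w system_w (all in watts).
psu_budget_w() {
  local gpu_count="$1" gpu_tdp_w="$2" cpu_w="$3" system_w="$4"
  echo $(( gpu_count * gpu_tdp_w + cpu_w + system_w ))
}

# 3 GPUs at 350 W each (from the Hardware table).
# 50 W CPU and 100 W for everything else are ASSUMED values, not measurements.
psu_budget_w 3 350 50 100   # → 1200
```

Replace the two assumed values with measured draw (for example from a wall-socket meter) before relying on the result. If the eGPU enclosures power their own cards, the internal PSU only needs to cover its share of the total.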
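
The GPU Allocation note estimates that an on-demand Ollama load leaves about 5 GB free on GPU 2. The sketch below shows one way to check that headroom before a load. The function name `vram_headroom` is hypothetical, and the 3277 MiB figure assumes the documented ~3.2 GB means GiB. It expects `index, memory.free` rows in MiB on stdin, which is what `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits` prints.

```bash
#!/usr/bin/env bash
# Check whether a GPU has enough free VRAM for an on-demand model load.
# Reads "index, memory.free" CSV rows (MiB) on stdin.
# Exit status: 0 = enough room, 1 = not enough, 2 = GPU index not found.
vram_headroom() {
  local gpu="$1" need_mib="$2"
  awk -F', *' -v gpu="$gpu" -v need="$need_mib" '
    $1 == gpu { found = 1; free = $2 }
    END {
      if (!found) exit 2
      printf "GPU %s: %d MiB free, %d MiB after load\n", gpu, free, free - need
      exit (free >= need ? 0 : 1)
    }
  '
}
```

Against the rig this could be run as `ssh rig@rig.lan "nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits" | vram_headroom 2 3277`. A non-zero exit status is a cue to unload something on GPU 2 before the vision model is requested.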