DEPLOYMENT.md
# ACDC Botnet - Production Deployment Guide

## Overview

This guide covers deploying ACDC Botnet in production with proper resource management, systemd integration, and multi-server orchestration.

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│           Coordinator (Command & Control)           │
│  - ci.ac-dc.network:50051                           │
│  - Scenario orchestration                           │
│  - Bot distribution                                 │
│  - Metrics aggregation                              │
│  - systemd service: acdc-botnet-coordinator         │
└──────────┬──────────────────────────────────────────┘
           │ gRPC bidirectional streams
           ├─────────────────┬─────────────────┬──────────────────
           │                 │                 │
     ┌─────▼─────┐     ┌─────▼─────┐     ┌─────▼─────┐
     │ Worker 1  │     │ Worker 2  │     │ Worker N  │
     │  (GPU)    │     │  (CPU)    │     │  (CPU)    │
     ├───────────┤     ├───────────┤     ├───────────┤
     │ 50 bots   │     │ 200 bots  │     │ 200 bots  │
     │ - Provers │     │ - Traders │     │ - Users   │
     │ - Users   │     │ - Users   │     │ - Govs    │
     └───────────┘     └───────────┘     └───────────┘
```

**Key Components:**
- **Coordinator**: Single instance, lightweight (25% CPU, 2GB RAM)
- **Workers**: Multiple instances, configurable resources (40-80% CPU, 8-16GB RAM)
- **Protocol**: gRPC over TCP (port 50051)
- **Orchestration**: systemd with resource controls

---

## Prerequisites

### System Requirements

#### Coordinator Server
- **CPU**: 2-4 cores
- **RAM**: 4GB minimum
- **Network**: Public IP or accessible to worker nodes
- **OS**: Linux with systemd (Ubuntu 20.04+, RHEL 8+, Debian 11+)

#### Worker Servers
- **CPU**: 8+ cores recommended (4 cores minimum)
- **RAM**: 16GB recommended (8GB minimum)
- **Network**: Access to coordinator on port 50051
- **GPU**: Optional (for prover bots with ZK proof generation)
- **OS**: Linux with systemd

### Software Dependencies
- Rust 1.92.0+ (for building)
- systemd 240+ (for resource controls)
- gRPC libraries (included in binary)

---

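Before installing, the worker minimums listed under System Requirements (4 cores, 8GB RAM, systemd 240+) can be verified with a short script. The following is a minimal sketch, assuming a Linux host where `nproc`, `/proc/meminfo`, and `systemctl` are available; `check_min`, `kb_to_gib`, and `preflight_worker` are illustrative names and are not part of ACDC Botnet.

```bash
#!/usr/bin/env bash
# Pre-flight sketch for a worker host. Thresholds are the worker minimums
# from the System Requirements list; all function names are hypothetical.

# check_min LABEL ACTUAL REQUIRED: print a verdict, return non-zero on failure.
check_min() {
  local label="$1" actual="$2" required="$3"
  if (( actual >= required )); then
    echo "OK   ${label}: ${actual} (minimum ${required})"
  else
    echo "FAIL ${label}: ${actual} (minimum ${required})"
    return 1
  fi
}

# Convert a MemTotal value in kB to GiB, rounded to the nearest integer,
# because the kernel reports slightly less than the installed RAM.
kb_to_gib() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Run all three checks; the return value is the number of failed checks.
preflight_worker() {
  local failures=0 cores mem_kb systemd_ver
  cores="$(nproc)"
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  systemd_ver="$(systemctl --version | awk 'NR==1 {print $2}')"

  check_min "CPU cores" "$cores" 4 || failures=$((failures + 1))
  check_min "RAM (GiB)" "$(kb_to_gib "$mem_kb")" 8 || failures=$((failures + 1))
  check_min "systemd version" "$systemd_ver" 240 || failures=$((failures + 1))
  return "$failures"
}
```

Sourcing this file and calling `preflight_worker` on a candidate host prints one line per check. For a coordinator host, the same `check_min` helper could be called with the coordinator minimums (2 cores, 4GB RAM) instead.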
## Installation

### Step 1: Build Binary

On build server or CI:

```bash
cd /home/devops/working-repos/acdc-botnet
cargo build --release --all-features

# Binary will be at: target/release/acdc-botnet
```

### Step 2: Deploy to Servers

#### Deploy to Coordinator

```bash
# Copy binary
scp target/release/acdc-botnet coordinator.example.com:/tmp/
ssh coordinator.example.com

# Install binary
sudo mv /tmp/acdc-botnet /usr/local/bin/
sudo chmod +x /usr/local/bin/acdc-botnet
```

#### Deploy to Workers

```bash
# Copy to all worker nodes
for host in worker1 worker2 worker3; do
  scp target/release/acdc-botnet $host:/tmp/
  ssh $host "sudo mv /tmp/acdc-botnet /usr/local/bin/ && sudo chmod +x /usr/local/bin/acdc-botnet"
done
```

### Step 3: Install Systemd Services

#### On Coordinator

```bash
# Copy systemd files
cd /home/devops/working-repos/acdc-botnet
scp systemd/* coordinator.example.com:/tmp/

# Install services
ssh coordinator.example.com
cd /tmp
sudo ./install.sh
```

#### On Each Worker

```bash
# Copy systemd files
scp systemd/* worker1.example.com:/tmp/

# Install services
ssh worker1.example.com
cd /tmp
sudo ./install.sh
```

---

## Configuration

### Coordinator Configuration

Edit `/etc/systemd/system/acdc-botnet-coordinator.service` if needed:

```ini
[Service]
# Default bind address (0.0.0.0 = all interfaces)
ExecStart=/usr/local/bin/acdc-botnet coordinator start \
  --bind 0.0.0.0:50051 \
  --checkpointing \
  --checkpoint-interval 30s \
  --checkpoint-dir /var/lib/acdc-botnet/checkpoints \
  --metrics-port 9090

# Resource limits (adjust based on coordinator load)
CPUQuota=25%
MemoryMax=2G
```

### Worker Configuration

Each worker has a configuration file at `/etc/acdc-botnet/worker-<N>.conf`.

**Example: High-capacity worker** (`/etc/acdc-botnet/worker-1.conf`):
```bash
# Coordinator address (REQUIRED - update with your coordinator IP/hostname)
COORDINATOR_ADDR=ci.ac-dc.network:50051

# Worker identification
WORKER_ID=worker-1

# Maximum bots (adjust based on server capacity)
MAX_BOTS=300

# Bot capabilities
CAPABILITIES=trader,user,governor

# Resource limits
CPU_QUOTA=80%
MEMORY_MAX=16G
```

**Example: Co-located with validator** (`/etc/acdc-botnet/worker-2.conf`):
```bash
# Lighter configuration for servers running validators
COORDINATOR_ADDR=ci.ac-dc.network:50051
WORKER_ID=worker-2
MAX_BOTS=150
CAPABILITIES=trader,user
CPU_QUOTA=40%
MEMORY_MAX=8G
```

**Example: GPU-enabled worker** (`/etc/acdc-botnet/worker-gpu.conf`):
```bash
# GPU worker for ZK proof generation
COORDINATOR_ADDR=ci.ac-dc.network:50051
WORKER_ID=worker-gpu
MAX_BOTS=50
CAPABILITIES=prover,trader,user
CPU_QUOTA=60%
MEMORY_MAX=12G
```

### Bot Capability Types

| Capability | Description | Resource Requirements |
|-----------|-------------|---------------------|
| `user` | General users (transfers, queries) | Low CPU, Low RAM |
| `trader` | DEX trading (spot, perpetuals) | Medium CPU, Medium RAM |
| `governor` | Governance voting | Low CPU, Low RAM |
| `prover` | ZK proof generation | High CPU, High RAM, GPU optional |
| `validator` | Consensus participation | High CPU, Medium RAM |
| `liquidity_provider` | DEX liquidity operations | Medium CPU, Medium RAM |

---

## Starting Services

### Start Coordinator (First)

```bash
# On coordinator server
sudo systemctl start acdc-botnet-coordinator
sudo systemctl enable acdc-botnet-coordinator  # Start on boot

# Check status
sudo systemctl status acdc-botnet-coordinator

# View logs
sudo journalctl -u acdc-botnet-coordinator -f
```

**Expected log output:**
```
Coordinator listening on 0.0.0.0:50051
Checkpointing enabled: interval=30s, dir=/var/lib/acdc-botnet/checkpoints
Metrics server started on port 9090
```

### Start Workers (After Coordinator)

```bash
# On each worker server
sudo systemctl start acdc-botnet-worker@1
sudo systemctl enable acdc-botnet-worker@1

# Check status
sudo systemctl status acdc-botnet-worker@1

# View logs
sudo journalctl -u acdc-botnet-worker@1 -f
```

**Expected log output:**
```
Connecting to coordinator at ci.ac-dc.network:50051
Connected successfully
Worker registered: worker-1, capacity=300 bots
Capabilities: trader, user, governor
Waiting for bot assignments...
```

### Start Multiple Worker Instances

```bash
# Start workers 1-3 on same host (if sufficient resources)
sudo systemctl start acdc-botnet-worker@1
sudo systemctl start acdc-botnet-worker@2
sudo systemctl start acdc-botnet-worker@3

# Enable on boot
sudo systemctl enable acdc-botnet-worker@{1,2,3}
```

---

## Running Scenarios

### From Coordinator

```bash
# SSH to coordinator
ssh coordinator.example.com

# Run scenario (coordinator distributes bots to workers)
acdc-botnet run daily-network-ops --duration 10m

# Run high-load scenario
acdc-botnet run peak-tps-stress --workers 5 --bots-per-worker 200

# Check status
acdc-botnet status --show-workers
```

**Output:**
```
Coordinator: ci.ac-dc.network:50051
Workers: 5 active, 0 down
  worker-1 (GPU): 50/50 bots, 15% CPU, 4GB RAM
  worker-2 (CPU): 200/200 bots, 80% CPU, 8GB RAM
  worker-3 (CPU): 200/200 bots, 82% CPU, 8GB RAM
Total: 1000 bots, 3500 TPS, 0.2% errors
```

---

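The per-worker lines printed by `acdc-botnet status --show-workers` lend themselves to simple scripted checks. The following is a minimal sketch, assuming the output keeps the `worker-N (TYPE): used/max bots, NN% CPU, NGB RAM` layout shown in the sample output; `hot_workers` is a hypothetical helper, not part of the `acdc-botnet` CLI, and the default threshold of 80 is only an illustrative choice.

```bash
#!/usr/bin/env bash
# hot_workers [THRESHOLD]: read status text on stdin and print every worker
# whose CPU percentage is strictly above THRESHOLD (default 80).
# Assumes per-worker lines shaped like the sample output:
#   worker-3 (CPU): 200/200 bots, 82% CPU, 8GB RAM
hot_workers() {
  local threshold="${1:-80}"
  awk -v max="$threshold" '
    /^[[:space:]]*worker-/ {
      cpu = $5              # fifth field is the CPU figure, such as 82%
      sub(/%/, "", cpu)     # drop the percent sign before comparing
      if (cpu + 0 > max + 0) print $1, cpu "%"
    }'
}

# Example using the sample status output; on a live system, pipe in
# `acdc-botnet status --show-workers` instead of the here-document.
hot_workers 80 <<'EOF'
Coordinator: ci.ac-dc.network:50051
Workers: 5 active, 0 down
  worker-1 (GPU): 50/50 bots, 15% CPU, 4GB RAM
  worker-2 (CPU): 200/200 bots, 80% CPU, 8GB RAM
  worker-3 (CPU): 200/200 bots, 82% CPU, 8GB RAM
Total: 1000 bots, 3500 TPS, 0.2% errors
EOF
# prints: worker-3 82%
```

Because the comparison is strict, a worker sitting exactly at the threshold (worker-2 at 80% here) is not reported. If the real status layout differs from the sample, the field index in the `awk` program would need adjusting.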
## Monitoring

### Systemd Resource Monitoring

```bash
# Real-time resource usage for all services
systemd-cgtop

# Specific service metrics
systemctl show acdc-botnet-coordinator -p CPUUsageNSec -p MemoryCurrent
systemctl show acdc-botnet-worker@1 -p CPUUsageNSec -p MemoryCurrent
```

### Prometheus Metrics

Coordinator exposes metrics on port 9090:

```bash
# Query coordinator metrics
curl http://coordinator.example.com:9090/metrics

# Key metrics:
# - testbots_worker_count{status="active"}
# - testbots_worker_count{status="down"}
# - testbots_total_bots
# - testbots_global_tps
# - testbots_scenario_duration_seconds
```

### Log Aggregation

```bash
# View all coordinator logs
sudo journalctl -u acdc-botnet-coordinator --since "1 hour ago"

# View all worker logs
sudo journalctl -u 'acdc-botnet-worker@*' --since "1 hour ago"

# Follow logs from all services
sudo journalctl -u acdc-botnet-coordinator -u 'acdc-botnet-worker@*' -f
```

---

## Resource Tuning

### Adjusting CPU Quotas

```bash
# Open a drop-in override for the worker unit
sudo systemctl edit acdc-botnet-worker@1

# In the editor that opens, add:
#   [Service]
#   CPUQuota=60%

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart acdc-botnet-worker@1
```

### Adjusting Memory Limits

```bash
# Open a drop-in override for the worker unit
sudo systemctl edit acdc-botnet-worker@1

# In the editor that opens, add:
#   [Service]
#   MemoryMax=12G

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart acdc-botnet-worker@1
```

### Bot Capacity Tuning

Edit `/etc/acdc-botnet/worker-N.conf`:

```bash
# Rule of thumb:
# - Each bot: ~0.1 CPU core, ~50MB RAM
# - 8 cores, 16GB RAM → MAX_BOTS=150-200
# - 16 cores, 32GB RAM → MAX_BOTS=300-400
# - 32 cores, 64GB RAM → MAX_BOTS=600-800

MAX_BOTS=400

# Then restart worker
sudo systemctl restart acdc-botnet-worker@1
```

---

## Troubleshooting

### Worker Cannot Connect to Coordinator

**Symptom:**
```
Error: Failed to connect to coordinator at ci.ac-dc.network:50051
```

**Solutions:**
1. Check that the coordinator is running: `sudo systemctl status acdc-botnet-coordinator`
2. Verify the firewall allows port 50051: `sudo ufw allow 50051/tcp`
3. Test connectivity: `telnet ci.ac-dc.network 50051`
4. Check coordinator logs: `sudo journalctl -u acdc-botnet-coordinator`

### Out-of-Memory Kills

**Symptom:**
```
systemd[1]: acdc-botnet-worker@1.service: A process of this unit has been killed by the OOM killer.
```

**Solutions:**
1. Reduce `MAX_BOTS` in `/etc/acdc-botnet/worker-N.conf`
2. Increase `MemoryMax` in the systemd service (if physical RAM is available)
3. Enable swap (last resort): `sudo swapon /swapfile`

### CPU Throttling

**Symptom:**
```
Worker performance degraded, TPS dropping
```

**Solutions:**
1. Check CPU usage: `systemd-cgtop`
2. Increase `CPUQuota` if the host is underutilized
3. Reduce `MAX_BOTS` if the host is overloaded
4. Check for other processes competing for CPU

### Coordinator Checkpointing Failures

**Symptom:**
```
Error: Failed to write checkpoint to /var/lib/acdc-botnet/checkpoints
```

**Solutions:**
1. Check directory permissions: `ls -ld /var/lib/acdc-botnet/checkpoints`
2. Ensure the directory exists: `sudo mkdir -p /var/lib/acdc-botnet/checkpoints`
3. Set ownership: `sudo chown -R devops:devops /var/lib/acdc-botnet`
4. Check disk space: `df -h /var/lib/acdc-botnet`

---

## Scaling

### Adding Workers

1. Deploy binary to new server
2. Install systemd services
3. Configure worker: `/etc/acdc-botnet/worker-N.conf`
4. Start worker: `sudo systemctl start acdc-botnet-worker@N`
5. Worker auto-registers with coordinator

### Removing Workers

```bash
# Graceful shutdown (allows 60s for bots to finish)
sudo systemctl stop acdc-botnet-worker@N

# Disable on boot
sudo systemctl disable acdc-botnet-worker@N

# Coordinator will detect worker down after 3 missed heartbeats (15s)
# and migrate bots to healthy workers
```

### Horizontal Scaling Limits

- **Theoretical**: 100+ workers per coordinator
- **Tested**: 10 workers, 3000 total bots
- **Bottleneck**: Coordinator gRPC throughput (~10k messages/sec)

---

## Security Hardening

All services include security hardening:

```ini
# Systemd security directives (unit files do not allow trailing comments)
# Cannot escalate privileges
NoNewPrivileges=true
# Isolated /tmp directory
PrivateTmp=true
# Entire file system hierarchy read-only, except /dev, /proc, /sys
ProtectSystem=strict
# No access to /home, /root, /run/user
ProtectHome=true
# Only write to data dir
ReadWritePaths=/var/lib/acdc-botnet
```

**Additional recommendations:**
1. Run coordinator behind reverse proxy (nginx/caddy) with TLS
2. Use firewall to restrict port 50051 to worker IPs only
3. Enable SELinux or AppArmor for additional confinement
4. Rotate checkpoint files periodically to prevent disk exhaustion

---

## Maintenance

### Service Restart (Zero Downtime)

```bash
# Restart workers one at a time (coordinator migrates bots)
sudo systemctl restart acdc-botnet-worker@1
sleep 60  # Wait 60s for bots to migrate
sudo systemctl restart acdc-botnet-worker@2
sleep 60  # Wait 60s
sudo systemctl restart acdc-botnet-worker@3
```

### Coordinator Restart (With Downtime)

```bash
# Coordinator restart causes brief outage (~5-10s)
sudo systemctl restart acdc-botnet-coordinator

# Workers will reconnect automatically
# Bots are recreated from last checkpoint (30s intervals)
```

### Log Rotation

Logs are managed by journald. Configure retention:

```bash
# Edit journald config
sudo nano /etc/systemd/journald.conf

# Under the [Journal] section, set:
#   SystemMaxUse=1G
#   MaxRetentionSec=7day

# Restart journald
sudo systemctl restart systemd-journald
```

---

## Performance Benchmarks

| Configuration | Bots | TPS | CPU Usage | RAM Usage | Latency (p95) |
|--------------|------|-----|-----------|-----------|---------------|
| 1 worker, 8 cores | 200 | 1,500 | 70% | 10GB | 250ms |
| 3 workers, 24 cores | 600 | 4,500 | 65% | 28GB | 280ms |
| 5 workers, 40 cores | 1000 | 7,500 | 70% | 45GB | 320ms |

**Notes:**
- Measurements on testnet (Alpha/Delta dual-chain)
- Mixed workload (50% trades, 30% transfers, 20% governance)
- Network latency: <50ms coordinator↔workers

---

## Quick Reference

### Essential Commands

```bash
# Coordinator
sudo systemctl start acdc-botnet-coordinator
sudo systemctl status acdc-botnet-coordinator
sudo journalctl -u acdc-botnet-coordinator -f

# Workers
sudo systemctl start acdc-botnet-worker@1
sudo systemctl status acdc-botnet-worker@1
sudo journalctl -u acdc-botnet-worker@1 -f

# Resource monitoring
systemd-cgtop
systemctl show acdc-botnet-worker@1 -p CPUUsageNSec -p MemoryCurrent

# Run scenario
acdc-botnet run daily-network-ops
acdc-botnet status --show-workers

# Metrics
curl http://coordinator.example.com:9090/metrics
```

### Configuration Files

| File | Purpose |
|------|---------|
| `/etc/systemd/system/acdc-botnet-coordinator.service` | Coordinator service |
| `/etc/systemd/system/acdc-botnet-worker@.service` | Worker template service |
| `/etc/acdc-botnet/worker-N.conf` | Per-worker configuration |
| `/var/lib/acdc-botnet/checkpoints/` | Coordinator state checkpoints |
| `/opt/acdc-botnet/` | Working directory |

---

## Next Steps

1. **Dynamic Resource Management** (see `RESOURCE_MANAGEMENT.md`):
   - Implement ResourceMonitor for real-time CPU/memory tracking
   - Add ThrottleController for automatic bot scaling
   - Target: Operate at 80% capacity without manual tuning

2. **Enhanced Metrics**:
   - Add per-bot resource tracking
   - Implement anomaly detection (3-sigma + MAD)
   - Dashboard visualization (Grafana integration)

3. **High Availability**:
   - Multi-coordinator consensus (Raft/etcd)
   - Worker failover testing at scale (>10 workers)
   - Zero-downtime upgrades

---

For questions or issues, see:
- **Repository**: https://source.ac-dc.network/alpha-delta-network/acdc-botnet
- **Documentation**: `/docs/` directory
- **CI Status**: https://ci.ac-dc.network/alpha-delta-network/acdc-botnet