# ACDC Botnet - Production Deployment Guide

## Overview

This guide covers deploying ACDC Botnet in production with proper resource management, systemd integration, and multi-server orchestration.

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│  Coordinator (Command & Control)                    │
│  - ci.ac-dc.network:50051                           │
│  - Scenario orchestration                           │
│  - Bot distribution                                 │
│  - Metrics aggregation                              │
│  - systemd service: acdc-botnet-coordinator         │
└──────────┬──────────────────────────────────────────┘
           │ gRPC bidirectional streams
           ├─────────────────┬─────────────────┬──────────────────
           │                 │                 │
     ┌─────▼─────┐     ┌─────▼─────┐     ┌─────▼─────┐
     │  Worker 1 │     │  Worker 2 │     │  Worker N │
     │  (GPU)    │     │  (CPU)    │     │  (CPU)    │
     ├───────────┤     ├───────────┤     ├───────────┤
     │ 50 bots   │     │ 200 bots  │     │ 200 bots  │
     │ - Provers │     │ - Traders │     │ - Users   │
     │ - Users   │     │ - Users   │     │ - Govs    │
     └───────────┘     └───────────┘     └───────────┘
```

**Key Components:**
- **Coordinator**: Single instance, lightweight (25% CPU, 2GB RAM)
- **Workers**: Multiple instances, configurable resources (40-80% CPU, 8-16GB RAM)
- **Protocol**: gRPC over TCP (port 50051)
- **Orchestration**: systemd with resource controls

---

## Prerequisites

### System Requirements

#### Coordinator Server
- **CPU**: 2-4 cores
- **RAM**: 4GB minimum
- **Network**: Public IP or accessible to worker nodes
- **OS**: Linux with systemd (Ubuntu 20.04+, RHEL 8+, Debian 11+)

#### Worker Servers
- **CPU**: 8+ cores recommended (4 cores minimum)
- **RAM**: 16GB recommended (8GB minimum)
- **Network**: Access to coordinator on port 50051
- **GPU**: Optional (for prover bots with ZK proof generation)
- **OS**: Linux with systemd

### Software Dependencies
- Rust 1.92.0+ (for building)
- systemd 240+ (for resource controls)
- gRPC libraries (included in binary)

---

## Installation

### Step 1: Build Binary

On build server or CI:

```bash
cd /home/devops/working-repos/acdc-botnet
cargo build --release --all-features

# Binary will be at: target/release/acdc-botnet
```

### Step 2: Deploy to Servers

#### Deploy to Coordinator

```bash
# Copy binary
scp target/release/acdc-botnet coordinator.example.com:/tmp/
ssh coordinator.example.com

# Install binary
sudo mv /tmp/acdc-botnet /usr/local/bin/
sudo chmod +x /usr/local/bin/acdc-botnet
```

#### Deploy to Workers

```bash
# Copy to all worker nodes
for host in worker1 worker2 worker3; do
    scp target/release/acdc-botnet "$host":/tmp/
    ssh "$host" "sudo mv /tmp/acdc-botnet /usr/local/bin/ && sudo chmod +x /usr/local/bin/acdc-botnet"
done
```

### Step 3: Install Systemd Services

#### On Coordinator

```bash
# Copy systemd files
cd /home/devops/working-repos/acdc-botnet
scp systemd/* coordinator.example.com:/tmp/

# Install services
ssh coordinator.example.com
cd /tmp
sudo ./install.sh
```

#### On Each Worker

```bash
# Copy systemd files
scp systemd/* worker1.example.com:/tmp/

# Install services
ssh worker1.example.com
cd /tmp
sudo ./install.sh
```
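With more than a few workers, the per-host steps above can be wrapped in a loop. The following is a minimal sketch and not part of the repository: `install_on_workers` is a hypothetical helper, and it assumes passwordless `sudo` on each host and that it is run from the repository root so that `systemd/*` resolves. Setting `RUN=echo` prints the commands instead of running them.

```bash
# Hypothetical helper: copy the unit files to each worker and run install.sh.
# Set RUN=echo to print the commands instead of executing them.
# Example: RUN=echo install_on_workers worker1.example.com worker2.example.com
install_on_workers() {
    for host in "$@"; do
        ${RUN:-} scp systemd/* "$host:/tmp/" || return 1
        ${RUN:-} ssh "$host" "cd /tmp && sudo ./install.sh" || return 1
    done
}
```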

---

## Configuration

### Coordinator Configuration

Edit `/etc/systemd/system/acdc-botnet-coordinator.service` if needed:

```ini
[Service]
# Default bind address (0.0.0.0 = all interfaces)
ExecStart=/usr/local/bin/acdc-botnet coordinator start \
  --bind 0.0.0.0:50051 \
  --checkpointing \
  --checkpoint-interval 30s \
  --checkpoint-dir /var/lib/acdc-botnet/checkpoints \
  --metrics-port 9090

# Resource limits (adjust based on coordinator load)
CPUQuota=25%
MemoryMax=2G
```

### Worker Configuration

Each worker has a configuration file at `/etc/acdc-botnet/worker-<N>.conf`.

**Example: High-capacity worker** (`/etc/acdc-botnet/worker-1.conf`):
```bash
# Coordinator address (REQUIRED - update with your coordinator IP/hostname)
COORDINATOR_ADDR=ci.ac-dc.network:50051

# Worker identification
WORKER_ID=worker-1

# Maximum bots (adjust based on server capacity)
MAX_BOTS=300

# Bot capabilities
CAPABILITIES=trader,user,governor

# Resource limits
CPU_QUOTA=80%
MEMORY_MAX=16G
```

**Example: Co-located with validator** (`/etc/acdc-botnet/worker-2.conf`):
```bash
# Lighter configuration for servers running validators
COORDINATOR_ADDR=ci.ac-dc.network:50051
WORKER_ID=worker-2
MAX_BOTS=150
CAPABILITIES=trader,user
CPU_QUOTA=40%
MEMORY_MAX=8G
```

**Example: GPU-enabled worker** (`/etc/acdc-botnet/worker-gpu.conf`):
```bash
# GPU worker for ZK proof generation
COORDINATOR_ADDR=ci.ac-dc.network:50051
WORKER_ID=worker-gpu
MAX_BOTS=50
CAPABILITIES=prover,trader,user
CPU_QUOTA=60%
MEMORY_MAX=12G
```
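A typo in `CAPABILITIES` is easy to miss, so a pre-flight check can compare each entry with the capability types this guide documents (`user`, `trader`, `governor`, `prover`, `validator`, `liquidity_provider`). This is a minimal sketch: `check_capabilities` is a hypothetical helper, not part of the `acdc-botnet` CLI, and it assumes the comma-separated format used in the examples above. It could be called after sourcing a worker file, for example `. /etc/acdc-botnet/worker-1.conf && check_capabilities "$CAPABILITIES"`.

```bash
# Hypothetical helper: succeed only if every comma-separated capability
# is one of the types documented in this guide.
check_capabilities() {
    known="user trader governor prover validator liquidity_provider"
    old_ifs=$IFS
    IFS=,
    for cap in $1; do
        case " $known " in
            *" $cap "*) ;;
            *) IFS=$old_ifs; echo "unknown capability: $cap" >&2; return 1 ;;
        esac
    done
    IFS=$old_ifs
}
```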

### Bot Capability Types

| Capability | Description | Resource Requirements |
|-----------|-------------|---------------------|
| `user` | General users (transfers, queries) | Low CPU, Low RAM |
| `trader` | DEX trading (spot, perpetuals) | Medium CPU, Medium RAM |
| `governor` | Governance voting | Low CPU, Low RAM |
| `prover` | ZK proof generation | High CPU, High RAM, GPU optional |
| `validator` | Consensus participation | High CPU, Medium RAM |
| `liquidity_provider` | DEX liquidity operations | Medium CPU, Medium RAM |

---

## Starting Services

### Start Coordinator (First)

```bash
# On coordinator server
sudo systemctl start acdc-botnet-coordinator
sudo systemctl enable acdc-botnet-coordinator  # Start on boot

# Check status
sudo systemctl status acdc-botnet-coordinator

# View logs
sudo journalctl -u acdc-botnet-coordinator -f
```

**Expected log output:**
```
Coordinator listening on 0.0.0.0:50051
Checkpointing enabled: interval=30s, dir=/var/lib/acdc-botnet/checkpoints
Metrics server started on port 9090
```

### Start Workers (After Coordinator)

```bash
# On each worker server
sudo systemctl start acdc-botnet-worker@1
sudo systemctl enable acdc-botnet-worker@1

# Check status
sudo systemctl status acdc-botnet-worker@1

# View logs
sudo journalctl -u acdc-botnet-worker@1 -f
```

**Expected log output:**
```
Connecting to coordinator at ci.ac-dc.network:50051
Connected successfully
Worker registered: worker-1, capacity=300 bots
Capabilities: trader, user, governor
Waiting for bot assignments...
```
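Because workers must start after the coordinator, a start script can first wait until the coordinator port accepts TCP connections. The sketch below is an assumption-laden example rather than project tooling: `wait_for_coordinator` is a hypothetical name, it needs `bash` and coreutils `timeout` on the worker host, and an open port only shows that something is listening, not that the gRPC service is healthy. A possible call is `wait_for_coordinator ci.ac-dc.network 50051 && sudo systemctl start acdc-botnet-worker@1`.

```bash
# Hypothetical helper: succeed once HOST:PORT accepts a TCP connection,
# retrying once per second for up to TRIES attempts (default 30).
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
wait_for_coordinator() {
    host=$1
    port=$2
    tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep 1
    done
    return 1
}
```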

### Start Multiple Worker Instances

```bash
# Start workers 1-3 on same host (if sufficient resources)
sudo systemctl start acdc-botnet-worker@1
sudo systemctl start acdc-botnet-worker@2
sudo systemctl start acdc-botnet-worker@3

# Enable on boot
sudo systemctl enable acdc-botnet-worker@{1,2,3}
```

---

## Running Scenarios

### From Coordinator

```bash
# SSH to coordinator
ssh coordinator.example.com

# Run scenario (coordinator distributes bots to workers)
acdc-botnet run daily-network-ops --duration 10m

# Run high-load scenario
acdc-botnet run peak-tps-stress --workers 5 --bots-per-worker 200

# Check status
acdc-botnet status --show-workers
```

**Example output** (abridged; three of the five workers shown):
```
Coordinator: ci.ac-dc.network:50051
Workers: 5 active, 0 down
  worker-1 (GPU): 50/50 bots, 15% CPU, 4GB RAM
  worker-2 (CPU): 200/200 bots, 80% CPU, 8GB RAM
  worker-3 (CPU): 200/200 bots, 82% CPU, 8GB RAM
Total: 1000 bots, 3500 TPS, 0.2% errors
```

---

## Monitoring

### Systemd Resource Monitoring

```bash
# Real-time resource usage for all services
systemd-cgtop

# Specific service metrics
systemctl show acdc-botnet-coordinator -p CPUUsageNSec -p MemoryCurrent
systemctl show acdc-botnet-worker@1 -p CPUUsageNSec -p MemoryCurrent
```

### Prometheus Metrics

Coordinator exposes metrics on port 9090:

```bash
# Query coordinator metrics
curl http://coordinator.example.com:9090/metrics

# Key metrics:
# - testbots_worker_count{status="active"}
# - testbots_worker_count{status="down"}
# - testbots_total_bots
# - testbots_global_tps
# - testbots_scenario_duration_seconds
```
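For a quick check without a Prometheus server, the text output can be filtered directly. The following is a sketch only: `workers_down` is a hypothetical helper, and it assumes the metric is exposed exactly as `testbots_worker_count{status="down"}` with no additional labels. It reads the metrics text on stdin, for example `curl -s http://coordinator.example.com:9090/metrics | workers_down`.

```bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of testbots_worker_count{status="down"} (0 if absent).
workers_down() {
    awk '$1 == "testbots_worker_count{status=\"down\"}" { v = $2 } END { print v + 0 }'
}
```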

### Log Aggregation

```bash
# View all coordinator logs
sudo journalctl -u acdc-botnet-coordinator --since "1 hour ago"

# View all worker logs
sudo journalctl -u 'acdc-botnet-worker@*' --since "1 hour ago"

# Follow logs from all services
sudo journalctl -u acdc-botnet-coordinator -u 'acdc-botnet-worker@*' -f
```

---

## Resource Tuning

### Adjusting CPU Quotas

```bash
# Open a drop-in override for the worker unit
sudo systemctl edit acdc-botnet-worker@1

# In the editor, add:
#   [Service]
#   CPUQuota=60%

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart acdc-botnet-worker@1
```

### Adjusting Memory Limits

```bash
# Open a drop-in override for the worker unit
sudo systemctl edit acdc-botnet-worker@1

# In the editor, add:
#   [Service]
#   MemoryMax=12G

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart acdc-botnet-worker@1
```

### Bot Capacity Tuning

Edit `/etc/acdc-botnet/worker-N.conf`:

```bash
# Rule of thumb:
# - Each bot: ~0.05 CPU core, ~50MB RAM
# - 8 cores, 16GB RAM → MAX_BOTS=150-200
# - 16 cores, 32GB RAM → MAX_BOTS=300-400
# - 32 cores, 64GB RAM → MAX_BOTS=600-800

MAX_BOTS=400

# Then restart the worker (shell command, not part of the conf file)
sudo systemctl restart acdc-botnet-worker@1
```
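The sizing tiers above can be turned into a quick estimate. This is a rough sketch under stated assumptions: `estimate_max_bots` is a hypothetical helper, the 50MB per bot comes from the rule of thumb, and the 50 millicores per bot is an assumed figure chosen because it reproduces the tiers (8 cores gives 160, inside the 150-200 range). Measured load should take precedence over the estimate. For example, `estimate_max_bots 16 32` prints 320.

```bash
# Hypothetical helper: estimate MAX_BOTS from core count and RAM (GB).
# Assumed per-bot cost: 50 millicores of CPU and 50 MB of RAM.
estimate_max_bots() {
    cores=$1
    ram_gb=$2
    by_cpu=$(( cores * 1000 / 50 ))
    by_ram=$(( ram_gb * 1024 / 50 ))
    # The scarcer resource sets the limit.
    if [ "$by_cpu" -lt "$by_ram" ]; then
        echo "$by_cpu"
    else
        echo "$by_ram"
    fi
}
```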

---

## Troubleshooting

### Worker Cannot Connect to Coordinator

**Symptom:**
```
Error: Failed to connect to coordinator at ci.ac-dc.network:50051
```

**Solutions:**
1. Check that the coordinator is running: `sudo systemctl status acdc-botnet-coordinator`
2. Open port 50051 on the coordinator's firewall if it is blocked (ufw shown): `sudo ufw allow 50051/tcp`
3. Test connectivity from the worker: `telnet ci.ac-dc.network 50051`
4. Check coordinator logs: `sudo journalctl -u acdc-botnet-coordinator`

### Out-of-Memory Kills

**Symptom:**
```
systemd[1]: acdc-botnet-worker@1.service: A process of this unit has been killed by the OOM killer.
```

**Solutions:**
1. Reduce `MAX_BOTS` in `/etc/acdc-botnet/worker-N.conf`
2. Increase `MemoryMax` in the systemd service (if physical RAM is available)
3. Enable swap (last resort; assumes `/swapfile` is already set up): `sudo swapon /swapfile`

### CPU Throttling

**Symptom:**
```
Worker performance degraded, TPS dropping
```

**Solutions:**
1. Check CPU usage: `systemd-cgtop`
2. Increase `CPUQuota` if the host has spare CPU capacity
3. Reduce `MAX_BOTS` if the worker is overloaded
4. Check for other processes competing for CPU

### Coordinator Checkpointing Failures

**Symptom:**
```
Error: Failed to write checkpoint to /var/lib/acdc-botnet/checkpoints
```

**Solutions:**
1. Check directory permissions: `ls -ld /var/lib/acdc-botnet/checkpoints`
2. Ensure directory exists: `sudo mkdir -p /var/lib/acdc-botnet/checkpoints`
3. Set ownership (replace `devops` with the account the service runs as): `sudo chown -R devops:devops /var/lib/acdc-botnet`
4. Check disk space: `df -h /var/lib/acdc-botnet`

---

## Scaling

### Adding Workers

1. Deploy binary to new server
2. Install systemd services
3. Configure worker: `/etc/acdc-botnet/worker-N.conf`
4. Start worker: `sudo systemctl start acdc-botnet-worker@N`
5. Worker auto-registers with coordinator

### Removing Workers

```bash
# Graceful shutdown (allows 60s for bots to finish)
sudo systemctl stop acdc-botnet-worker@N

# Disable on boot
sudo systemctl disable acdc-botnet-worker@N

# Coordinator will detect worker down after 3 missed heartbeats (15s)
# and migrate bots to healthy workers
```

### Horizontal Scaling Limits

- **Theoretical**: 100+ workers per coordinator
- **Tested**: 10 workers, 3000 total bots
- **Bottleneck**: Coordinator gRPC throughput (~10k messages/sec)

---

## Security Hardening

All services include security hardening:

```ini
# Systemd security directives
# Cannot escalate privileges
NoNewPrivileges=true
# Isolated /tmp directory
PrivateTmp=true
# Entire file system read-only, except /dev, /proc and /sys
ProtectSystem=strict
# No access to /home, /root, /run/user
ProtectHome=true
# Only write to data dir
ReadWritePaths=/var/lib/acdc-botnet
```

**Additional recommendations:**
1. Run the coordinator behind a reverse proxy (nginx/caddy) with TLS
2. Use a firewall to restrict port 50051 to worker IPs only
3. Enable SELinux or AppArmor for additional confinement
4. Rotate checkpoint files periodically to prevent disk exhaustion

---

## Maintenance

### Service Restart (Zero Downtime)

```bash
# Restart workers one at a time (coordinator migrates bots)
sudo systemctl restart acdc-botnet-worker@1
sleep 60  # Wait 60s for bots to migrate
sudo systemctl restart acdc-botnet-worker@2
sleep 60  # Wait 60s
sudo systemctl restart acdc-botnet-worker@3
```
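For more than a handful of instances, the same sequence can be written as a loop that stops at the first worker that fails to come back. This is a minimal sketch, not project tooling: `rolling_restart`, `SYSTEMCTL` and `SETTLE_SECS` are hypothetical names, and the 60s default mirrors the wait used above. A call such as `rolling_restart 1 2 3` restarts workers 1 to 3 in order, while `SYSTEMCTL=echo SETTLE_SECS=0 rolling_restart 1 2 3` only prints the commands.

```bash
# Hypothetical helper: restart worker instances one at a time.
# SYSTEMCTL and SETTLE_SECS are overridable so the loop can be dry-run.
rolling_restart() {
    for n in "$@"; do
        ${SYSTEMCTL:-sudo systemctl} restart "acdc-botnet-worker@$n" || return 1
        # Stop here if the unit did not come back, leaving the rest untouched.
        ${SYSTEMCTL:-sudo systemctl} is-active --quiet "acdc-botnet-worker@$n" || return 1
        # Give the coordinator time to migrate bots before the next restart.
        sleep "${SETTLE_SECS:-60}"
    done
}
```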

### Coordinator Restart (With Downtime)

```bash
# Coordinator restart causes brief outage (~5-10s)
sudo systemctl restart acdc-botnet-coordinator

# Workers will reconnect automatically
# Bots are recreated from last checkpoint (30s intervals)
```

### Log Rotation

Logs are managed by journald. Configure retention:

```bash
# Edit journald config
sudo nano /etc/systemd/journald.conf

# Under [Journal], set:
#   SystemMaxUse=1G
#   MaxRetentionSec=7day

# Restart journald
sudo systemctl restart systemd-journald
```
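journald retention does not apply to the coordinator's checkpoint files, which the hardening recommendations suggest rotating. One possible approach is an age-based cleanup run from cron or a systemd timer. The sketch below makes assumptions: `prune_checkpoints` is a hypothetical helper, the 7-day default is arbitrary, and it treats every regular file under the checkpoint directory as a checkpoint, so the retention should stay well above the 30s checkpoint interval.

```bash
# Hypothetical helper: delete regular files last modified more than N days
# ago (default 7) under the checkpoint directory, printing each removed path.
# The file layout inside the directory is assumed, not documented.
prune_checkpoints() {
    dir=${1:-/var/lib/acdc-botnet/checkpoints}
    days=${2:-7}
    find "$dir" -type f -mtime +"$days" -print -delete
}
```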

---

## Performance Benchmarks

| Configuration | Bots | TPS | CPU Usage | RAM Usage | Latency (p95) |
|--------------|------|-----|-----------|-----------|---------------|
| 1 worker, 8 cores | 200 | 1,500 | 70% | 10GB | 250ms |
| 3 workers, 24 cores | 600 | 4,500 | 65% | 28GB | 280ms |
| 5 workers, 40 cores | 1,000 | 7,500 | 70% | 45GB | 320ms |

**Notes:**
- Measurements on testnet (Alpha/Delta dual-chain)
- Mixed workload (50% trades, 30% transfers, 20% governance)
- Network latency: <50ms coordinator↔workers
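
Dividing TPS by bot count for each benchmark row shows that throughput scaled linearly in these runs, at 7.5 TPS per bot, while p95 latency rose from 250ms to 320ms. The arithmetic can be reproduced with a short helper; `tps_per_bot` is an illustrative name and not part of the CLI.

```bash
# Illustrative helper: read "bots tps" pairs on stdin and print TPS per bot.
tps_per_bot() {
    awk '{ printf "%d bots: %.1f TPS per bot\n", $1, $2 / $1 }'
}

# Rows taken from the benchmark table above.
printf '%s\n' '200 1500' '600 4500' '1000 7500' | tps_per_bot
# prints:
# 200 bots: 7.5 TPS per bot
# 600 bots: 7.5 TPS per bot
# 1000 bots: 7.5 TPS per bot
```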

---

## Quick Reference

### Essential Commands

```bash
# Coordinator
sudo systemctl start acdc-botnet-coordinator
sudo systemctl status acdc-botnet-coordinator
sudo journalctl -u acdc-botnet-coordinator -f

# Workers
sudo systemctl start acdc-botnet-worker@1
sudo systemctl status acdc-botnet-worker@1
sudo journalctl -u acdc-botnet-worker@1 -f

# Resource monitoring
systemd-cgtop
systemctl show acdc-botnet-worker@1 -p CPUUsageNSec -p MemoryCurrent

# Run scenario
acdc-botnet run daily-network-ops
acdc-botnet status --show-workers

# Metrics
curl http://coordinator.example.com:9090/metrics
```

### Configuration Files

| File | Purpose |
|------|---------|
| `/etc/systemd/system/acdc-botnet-coordinator.service` | Coordinator service |
| `/etc/systemd/system/acdc-botnet-worker@.service` | Worker template service |
| `/etc/acdc-botnet/worker-N.conf` | Per-worker configuration |
| `/var/lib/acdc-botnet/checkpoints/` | Coordinator state checkpoints |
| `/opt/acdc-botnet/` | Working directory |

---

## Next Steps

1. **Dynamic Resource Management** (see `RESOURCE_MANAGEMENT.md`):
   - Implement ResourceMonitor for real-time CPU/memory tracking
   - Add ThrottleController for automatic bot scaling
   - Target: Operate at 80% capacity without manual tuning

2. **Enhanced Metrics**:
   - Add per-bot resource tracking
   - Implement anomaly detection (3-sigma + MAD)
   - Dashboard visualization (Grafana integration)

3. **High Availability**:
   - Multi-coordinator consensus (Raft/etcd)
   - Worker failover testing at scale (>10 workers)
   - Zero-downtime upgrades

---

For questions or issues, see:
- **Repository**: https://source.ac-dc.network/alpha-delta-network/acdc-botnet
- **Documentation**: `/docs/` directory
- **CI Status**: https://ci.ac-dc.network/alpha-delta-network/acdc-botnet