# AlphaOS Testnet E2E Test Report
## Test Scenario 2: Automatic Genesis Fetch

**Date**: January 21, 2026  
**Tester**: Claude Code Assistant  
**AlphaOS Version**: 0.3.0 (built from latest main branch)

---

## Executive Summary

**Test Status**: PARTIAL SUCCESS with identified issues

The deployment and initial testing revealed:
- ✅ Binary deployment successful to all 5 validators
- ✅ Network producing blocks (4/5 validators operational)
- ⚠️ testnet001 experiencing persistent sync issues
- ⚠️ Automatic genesis fetch could not be fully tested in dev mode

---

## Test Execution Steps

### Step 1: Build Release Binaries ✅

**Objective**: Build AlphaOS release binaries compatible with all testnet servers

**Actions**:
1. The initial binary crashed on testnet001/002 with SIGILL (illegal instruction)
2. Root cause: CPU architecture mismatch (AMD EPYC-Milan vs EPYC-Genoa)
3. Rebuilt for the generic x86-64 target: `RUSTFLAGS="-C target-cpu=x86-64" cargo build --release`
4. Binary size: 129MB (optimized)

**Result**: SUCCESS  
**Binary Location**: `/home/devops/working-repos/alphaos/target/release/alphaos`
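
A pre-deployment check along the following lines could surface this kind of mismatch before a binary is shipped. It is a minimal sketch: the helper name `has_cpu_flag` is hypothetical, and `avx512f` is used only as an example of a feature present on EPYC-Genoa but not on EPYC-Milan. The report does not identify which instruction actually triggered the SIGILL.

```bash
# Hypothetical helper: succeed if the given CPU feature flag is listed
# in a cpuinfo-style file (defaults to /proc/cpuinfo on Linux).
has_cpu_flag() {
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  # -m1: the first "flags" line is enough; -w: match the flag as a whole word.
  grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

# Example: report whether this host advertises AVX-512 Foundation.
if has_cpu_flag avx512f 2>/dev/null; then
  echo "avx512f: supported"
else
  echo "avx512f: not supported"
fi
```

Running the same check on the build host and on each validator shows whether a natively tuned build is safe, or whether the generic `x86-64` target is needed.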

---

### Step 2: Deploy to Testnet Validators ✅

**Objective**: Deploy updated binaries to all 5 testnet validators

**Deployment Method**:
```bash
# Copy the new binary to each validator, then swap it in while the service is stopped.
for i in {1..5}; do
  host="testnet00$i.ac-dc.network"
  scp -P 2584 alphaos "$host:/tmp/alphaos-new"
  # The && chain aborts the swap on the first failing step for this host.
  ssh -p 2584 "$host" "
    sudo systemctl stop alphaos-validator &&
    sudo cp /tmp/alphaos-new /usr/local/bin/alphaos &&
    sudo chmod +x /usr/local/bin/alphaos &&
    sudo systemctl start alphaos-validator
  "
done
```
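
One way to confirm that each validator ended up with the intended binary is to compare SHA-256 digests after the copy. The sketch below assumes `sha256sum` (GNU coreutils) is available locally and on the validators; the helper names `sha256_of` and `digest_matches` are hypothetical and were not part of the test run.

```bash
# Hypothetical helper: print the SHA-256 digest of a file (hex string only).
sha256_of() {
  sha256sum "$1" | awk '{ print $1 }'
}

# Hypothetical helper: succeed if the file's digest equals the expected digest.
digest_matches() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Example (not run here): compare the local build with a deployed copy.
#   expected=$(sha256_of alphaos)
#   remote=$(ssh -p 2584 testnet001.ac-dc.network sha256sum /usr/local/bin/alphaos | awk '{ print $1 }')
#   [ "$expected" = "$remote" ] && echo "testnet001: binary matches"
```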

**Results**:
| Validator | Deployment Status | Initial Status |
|-----------|------------------|----------------|
| testnet001 | ✅ SUCCESS | Running |
| testnet002 | ✅ SUCCESS | Running |
| testnet003 | ✅ SUCCESS | Running |
| testnet004 | ✅ SUCCESS | Running |
| testnet005 | ✅ SUCCESS | Running |

---

### Step 3: Simulate Late-Join Scenario ⚠️

**Objective**: Clear ledger and genesis on testnet001 to trigger automatic genesis fetch

**Actions**:
1. Stopped alphaos-validator service on testnet001
2. Cleared ledger directory: `rm -rf /root/.ledger-*`
3. Cleared genesis cache: `rm -rf /root/genesis.cache`
4. Restarted service to observe genesis fetch behavior

**Observations**:
- In development mode (`--dev 0 --dev-num-validators 5`), genesis is automatically generated on first start
- The automatic genesis fetch feature appears to be designed for production mode, not dev mode
- None of the genesis fetch log messages expected per the E2E test doc were observed:
  - ❌ "Accepting genesis from peer 'testnet002:4130'"
  - ❌ "🔄 Genesis state is Pending - triggering automatic fetch"
  - ❌ "🔍 Bootstrapping genesis from peer: testnet002:4130"
  - ❌ "👥 Discovered 4 governors"
  - ❌ "✅ Genesis validated by 4/4 governors"

**Result**: INCOMPLETE - Dev mode bypasses genesis fetch mechanism

---

### Step 4: Monitor Logs for Genesis Fetch ⚠️

**Expected Log Pattern** (from E2E test documentation):
```
[INFO] 🧭 Starting a validator node on Alpha Testnet (v0) at 0.0.0.0:4130
[INFO] Connecting to trusted peer: testnet002:4130
[INFO] Accepting genesis from peer 'testnet002:4130' (height: 0)
[INFO] 🔄 Genesis state is Pending - triggering automatic fetch
[INFO] 🔍 Bootstrapping genesis from peer: testnet002:4130
[INFO] 📦 Fetched genesis candidate (hash: ab1wss0n...)
[INFO] 👥 Discovered 4 governors
[INFO] 🔐 Verifying genesis with 4 governors (need 67% consensus)...
[INFO] ✅ Genesis validated by 4/4 governors (100.0% consensus)
[INFO] 💾 Genesis cached at /home/validator/.alphaos/genesis.cache
[INFO] ✅ Successfully fetched genesis
[INFO] 🎉 Genesis fetch complete - node is now operational
```

**Actual Log Pattern** (testnet001 startup):
```
[INFO] 🧭 Starting a validator node on Alpha Testnet (v0) at 0.0.0.0:4130
[WARN] Failed to load bootstrap peers from environment: environment variable not found
[INFO] Development mode enabled with index=0 and num_validators=5
[INFO] Loading the ledger from storage...
[DEBUG] Loading the cached block tree from /root/.ledger-1-0/block_tree
[INFO] Starting the consensus instance...
[DEBUG] Syncing storage with the ledger from block 0 to 51...
```

**Result**: NO GENESIS FETCH OBSERVED - Dev mode auto-generates genesis
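
The comparison between expected and actual logs can be automated for the retest. The sketch below searches a saved log for fixed substrings taken from the expected pattern above; the helper name `check_genesis_fetch_log` is hypothetical, and how the log is captured (for example from the `alphaos-validator` systemd unit via `journalctl`) is an assumption about the deployment.

```bash
# Hypothetical helper: print each expected genesis-fetch message that is
# missing from the given log file; returns non-zero if any are missing.
check_genesis_fetch_log() {
  log=$1
  missing=0
  # Fixed substrings taken from the expected log pattern in this report.
  for msg in \
    "Accepting genesis from peer" \
    "Genesis state is Pending - triggering automatic fetch" \
    "Bootstrapping genesis from peer" \
    "Discovered 4 governors" \
    "Genesis validated by 4/4 governors"
  do
    # -F: treat the message as a literal string, not a regular expression.
    if ! grep -qF -- "$msg" "$log"; then
      echo "MISSING: $msg"
      missing=1
    fi
  done
  return "$missing"
}
```

Run against the testnet001 startup log from this test, the helper would report all five messages as missing, matching the manual observation.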

---

### Step 5: Verify Network Sync ⚠️

**Objective**: Confirm testnet001 syncs successfully with other validators

**Observations**:
- testnet001 initially synced to block 240
- Node then became stuck in "syncing" mode
- Repeated restarts showed inconsistent behavior:
  - First restart: synced to block 240, then stuck
  - Second restart: synced rapidly to block 355, then stuck
- Other 4 validators (testnet002-005) operating normally at block ~1966

**Sync Status** (as of 18:21 UTC):
| Validator | Latest Block | Status |
|-----------|--------------|--------|
| testnet001 | 355 | ⚠️ STUCK (syncing mode) |
| testnet002 | 1966 | ✅ PRODUCING |
| testnet003 | 1966 | ✅ PRODUCING |
| testnet004 | 1966 | ✅ PRODUCING |
| testnet005 | 1966 | ✅ PRODUCING |

**Result**: PARTIAL FAILURE - testnet001 cannot fully sync
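
The lag of each validator behind the network tip can be derived from the table with a short script. This is a minimal sketch: the helper name `report_sync_lag` and the default threshold of 10 blocks are hypothetical choices for illustration, and the heights are entered by hand from the table rather than queried from the nodes.

```bash
# Hypothetical helper: read "validator height" pairs on stdin and report how
# far each validator is behind the highest reported block.
# max_lag (default 10) is an arbitrary threshold chosen for illustration.
report_sync_lag() {
  max_lag=${1:-10}
  awk -v max_lag="$max_lag" '
    { name[NR] = $1; height[NR] = $2; if ($2 + 0 > tip) tip = $2 + 0 }
    END {
      for (i = 1; i <= NR; i++) {
        lag = tip - height[i]
        printf "%s height=%d lag=%d %s\n", name[i], height[i], lag, (lag > max_lag ? "BEHIND" : "OK")
      }
    }'
}

# Heights from the sync status table in this report.
printf '%s\n' \
  "testnet001 355" "testnet002 1966" "testnet003 1966" \
  "testnet004 1966" "testnet005 1966" | report_sync_lag
```

With the values above, testnet001 is reported with `lag=1611` and marked `BEHIND`, while the other four validators show `lag=0`.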

---

### Step 6: Verify Consensus ⚠️

**Objective**: Confirm all 5 validators are producing blocks in consensus

**Block Production Evidence** (18:21 UTC):
```
testnet002: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
testnet003: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
testnet004: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
testnet005: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
```

**Consensus Status**:
- ✅ 4/5 validators producing identical blocks
- ✅ Block hashes match across all active validators
- ✅ BFT consensus operating correctly among active nodes
- ⚠️ testnet001 isolated (stuck at block 355, round 852)
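
The hash comparison behind the status above can be scripted. The sketch below assumes log lines in the `<validator>: Advanced to block ...` form shown in the evidence block; the helper name `blocks_agree` is hypothetical. Because the hashes in the evidence are truncated, the check only confirms agreement on the visible prefix.

```bash
# Hypothetical helper: read "Advanced to block ..." lines on stdin and succeed
# only if every validator reports the same block, round, and hash text.
blocks_agree() {
  # Drop the leading "<validator>:" field, then count the distinct remainders.
  distinct=$(sed 's/^[^:]*: *//' | sort -u | grep -c .)
  [ "$distinct" -eq 1 ]
}

# Lines copied from the block production evidence in this report.
blocks_agree <<'EOF' && echo "all validators agree"
testnet002: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
testnet003: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
testnet004: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
testnet005: Advanced to block 1965 at round 4894 - ab15upcawe8zs4a5...
EOF
```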

**Connectivity Status**:
- testnet001: Connected to 4 validators (testnet002-005) on BFT layer (178.156.159.24, 46.62.225.199, 157.180.28.93, 65.21.149.67)
- testnet001: Repeatedly fails to connect to 65.108.155.133 (per the Network Topology appendix this is testnet001's own address, not testnet002, so these may be self-connection attempts)
- Other validators: Full mesh connectivity

**Result**: PARTIAL - 4/5 validators in consensus
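
To put the 4/5 figure in context, the participation ratio can be compared with a threshold. The sketch below uses the 67% figure quoted in the expected genesis-verification log line; whether AlphaOS applies the same threshold to block consensus is an assumption that this report does not confirm, and the helper name `meets_threshold` is hypothetical.

```bash
# Hypothetical helper: succeed if `active` out of `total` validators meets
# the given percentage threshold (default 67, the figure quoted in the
# expected genesis-verification log line).
meets_threshold() {
  active=$1
  total=$2
  pct=${3:-67}
  # Integer arithmetic avoids floating point: active/total >= pct/100.
  [ $((active * 100)) -ge $((total * pct)) ]
}

meets_threshold 4 5 && echo "4/5 meets 67%"
meets_threshold 3 5 || echo "3/5 does not meet 67%"
```

Under that assumption, the network keeps making progress with testnet001 offline, but losing one more validator would drop participation to 60% and below the threshold.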

---

## Issues Encountered

### Issue 1: CPU Architecture Incompatibility ✅ RESOLVED

**Symptom**: testnet001 and testnet002 crashed with SIGILL on startup

**Error**:
```
Active: activating (auto-restart) (Result: core-dump)
Process: 2581056 ExecStart=... (code=dumped, signal=ILL)
```

**Root Cause**: Binary compiled with newer CPU instructions (EPYC-Genoa) not supported by older CPUs (EPYC-Milan)

**Resolution**: Rebuilt with `RUSTFLAGS="-C target-cpu=x86-64"` for maximum compatibility

**Impact**: Deployment delayed ~5 minutes

---

### Issue 2: Dev Mode Bypasses Genesis Fetch ⚠️ UNRESOLVED

**Symptom**: No genesis fetch logs observed despite clearing ledger

**Root Cause**: In dev mode, AlphaOS automatically generates a deterministic genesis block rather than fetching from peers

**Expected Behavior** (from E2E test doc):
> "Late-joining validator can join via automatic genesis fetch"

**Actual Behavior**: Dev mode generates genesis locally, making genesis fetch testing impossible

**Resolution Needed**:
1. Test with `--network 0` (mainnet mode) instead of dev mode, OR
2. Add a flag like `--force-genesis-fetch` to test the feature in dev mode

**Impact**: Cannot verify the Sections 3-6 implementation of the genesis fetch system

---

### Issue 3: Persistent Sync Failure on testnet001 ❌ CRITICAL

**Symptom**: testnet001 repeatedly gets stuck in "syncing" mode and stops advancing blocks

**Error Pattern**:
```
DEBUG alphaos_node_bft::primary: Skipping batch proposal for round 852 (node is syncing)
```
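
Counting how often each round appears in this message gives a quick signal of whether the node is advancing at all. The sketch below assumes the message format shown above; the helper name `skipped_rounds` is hypothetical.

```bash
# Hypothetical helper: read node log lines on stdin and print, for each round
# named in a "Skipping batch proposal" message, how many times it was skipped.
skipped_rounds() {
  sed -n 's/.*Skipping batch proposal for round \([0-9][0-9]*\) (node is syncing).*/\1/p' |
    sort -n | uniq -c | awk '{ print "round " $2 ": skipped " $1 " times" }'
}
```

A single round whose count keeps growing between runs suggests the node is pinned at that round, whereas many rounds with small counts would indicate that it is still progressing while syncing.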

**Observations**:
- Node connects to 4 peer validators successfully
- Block sync starts normally and advances quickly (e.g., 240 blocks in seconds)
- Sync suddenly stops at arbitrary points (block 240, block 355)
- Node remains in "syncing" state indefinitely despite peer connectivity
- A restart temporarily resolves the stall, but sync fails again at a different block height

**Impact**:
- testnet001 cannot participate in consensus
- Network operates with 4/5 validators (80% capacity)
- Indicates a potential bug in the sync/BFT transition logic

**Further Investigation Needed**:
- Check for sync timeout configuration
- Review BFT sync completion logic
- Analyze block request/response patterns
- Test with different network latency conditions
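
While the root cause is investigated, a stall like the one described above can be detected from periodic height samples. This is a minimal sketch: the helper name `is_stalled` and the default window of 3 samples are hypothetical, and how the heights are collected is left open because the report does not document a height endpoint.

```bash
# Hypothetical helper: read one block height per line on stdin (oldest first)
# and succeed if the last `window` samples are all identical, i.e. no progress.
# With fewer than `window` samples the result is "not stalled".
is_stalled() {
  window=${1:-3}
  tail -n "$window" | awk -v window="$window" '
    NR == 1 { first = $1 }
    $1 != first { changed = 1 }
    END { exit !(NR == window && !changed) }'
}

# Example: heights sampled at a fixed interval, ending in three equal values.
printf '%s\n' 120 240 355 355 355 | is_stalled 3 && echo "sync stalled at 355"
```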

---

## Test Results Summary

| Test Criterion | Expected | Actual | Status |
|----------------|----------|--------|--------|
| Binary deployment | 5/5 validators | 5/5 deployed | ✅ PASS |
| Service startup | All running | All running | ✅ PASS |
| Genesis fetch triggered | Yes (on clear ledger) | No (dev mode auto-gen) | ⚠️ SKIP |
| BFT verification | 67% consensus | N/A (not triggered) | ⚠️ SKIP |
| Genesis caching | Cache created | N/A (not triggered) | ⚠️ SKIP |
| Network sync | All validators synced | 4/5 synced | ⚠️ PARTIAL |
| Block production | All producing blocks | 4/5 producing | ⚠️ PARTIAL |
| Consensus integrity | Identical blocks | Identical on 4/5 | ✅ PASS |

---

## Performance Metrics

**Deployment Phase**:
- Build time (generic x86-64): ~30 seconds (incremental)
- Binary transfer (5 nodes): ~10 seconds each
- Service restart: ~2-5 seconds per node
- Total deployment time: ~3 minutes

**Sync Performance** (testnet001, when not stuck):
- Initial sync rate: ~200 blocks in ~5 seconds (~40 blocks/sec)
- Rapid catch-up demonstrated: 355 blocks synced in ~30 seconds
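
The two sync figures imply different average rates, which a small calculation makes explicit. The helper name `blocks_per_sec` is hypothetical; the inputs are the approximate values quoted above.

```bash
# Hypothetical helper: average blocks per second, rounded to one decimal place.
blocks_per_sec() {
  awk -v blocks="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", blocks / secs }'
}

blocks_per_sec 200 5    # initial burst
blocks_per_sec 355 30   # whole catch-up to block 355
```

The first call prints `40.0` and the second `11.8`, so the average over the full catch-up is well below the initial burst rate, which is consistent with the sync slowing or pausing partway through.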

**Network Performance** (testnet002-005):
- Block time: ~2-3 seconds average
- BFT consensus latency: <500ms (based on log timestamps)
- Network uptime: 100% during test period

---

## Recommendations

### Immediate Actions

1. **Investigate testnet001 Sync Issue** (Priority: HIGH)
   - Enable trace-level logging for sync module
   - Capture packet traces during sync stall
   - Review BFT state machine transitions
   - Check for resource exhaustion (CPU, memory, I/O)

2. **Test Genesis Fetch in Production Mode** (Priority: MEDIUM)
   - Deploy testnet in mainnet mode (`--network 0`) without dev mode
   - Manually distribute genesis to 4 nodes, leave 1 without
   - Verify automatic genesis fetch triggers correctly
   - Validate all expected log messages appear

3. **Add Genesis Fetch Test Mode** (Priority: LOW)
   - Implement `--force-genesis-fetch` flag for testing
   - Allow dev mode to test genesis fetch without production network

### Long-term Improvements

1. **Sync Reliability**
   - Add sync progress monitoring/reporting
   - Implement sync timeout detection and recovery
   - Add metrics for sync stall detection

2. **Deployment Automation**
   - Create deployment script with compatibility checking
   - Automate binary verification across different CPU architectures
   - Implement rolling deployment with health checks

3. **Monitoring & Alerting**
   - Deploy Prometheus metrics collection
   - Set up Grafana dashboards for validator health
   - Configure alerts for sync failures, consensus drops
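
The rolling deployment with health checks recommended above could take roughly the following shape. This is a sketch under stated assumptions, not a tested deployment tool: `rolling_deploy` and `service_active` are hypothetical names, and the deploy step is passed in by the caller so that the `scp`/`ssh` sequence from Step 2 can be reused.

```bash
# Hypothetical sketch: deploy to hosts one at a time and stop at the first
# host that fails. `deploy_cmd` and `health_cmd` are names of commands or
# functions supplied by the caller; each is called with the host name.
rolling_deploy() {
  deploy_cmd=$1
  health_cmd=$2
  shift 2
  for host in "$@"; do
    echo "deploying to $host"
    if ! "$deploy_cmd" "$host"; then
      echo "deploy failed on $host, aborting" >&2
      return 1
    fi
    if ! "$health_cmd" "$host"; then
      echo "health check failed on $host, aborting" >&2
      return 1
    fi
  done
}

# Example health check (defined, not run here): the systemd unit reports
# "active" on the remote host.
service_active() {
  ssh -p 2584 "$1" systemctl is-active --quiet alphaos-validator
}
```

Stopping at the first failure limits a bad binary to a single validator. Note that an active service is a weak health signal: in this test testnet001 was running yet stuck in syncing mode, so a production check would also need to confirm that the block height is advancing.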

---

## Conclusion

The testnet deployment was partially successful. The updated AlphaOS binaries were successfully deployed to all 5 validators, and 4 out of 5 are producing blocks in consensus. However:

**Successes**:
- ✅ Binary deployment pipeline works correctly
- ✅ CPU architecture compatibility issue resolved
- ✅ 4/5 validators operating in healthy consensus
- ✅ BFT consensus mechanism functioning correctly

**Limitations**:
- ⚠️ Automatic genesis fetch could not be tested in dev mode
- ⚠️ testnet001 experiencing critical sync issues
- ⚠️ Genesis caching and BFT verification untested

**Critical Issue**:
- ❌ testnet001 sync failure requires immediate investigation

**Test Scenario 2 Status**: INCOMPLETE - Genesis fetch testing blocked by dev mode behavior. Recommend retesting in production mode configuration.

---

## Appendix: Configuration Details

**Validator Configuration** (testnet001):
```
Network: 1 (testnet)
Dev Mode: 0 (first validator in 5-node devnet)
Dev Validators: 5
Node Port: 4130
BFT Port: 5000
REST Port: 3030
Peers: 178.156.159.24:4130,46.62.225.199:4130,65.21.149.67:4130,157.180.28.93:4130
Validators: 178.156.159.24:5000,46.62.225.199:5000,65.21.149.67:5000,157.180.28.93:5000
```

**Server Specifications** (all validators):
- CPU: 8 cores (AMD EPYC)
- RAM: 30 GiB
- OS: Ubuntu (Linux 6.8.0)
- SSH Port: 2584
- Firewall: UFW (configured for AlphaOS ports)

**Network Topology**:
```
testnet001 (65.108.155.133) - STUCK
testnet002 (178.156.159.24) - HEALTHY
testnet003 (46.62.225.199) - HEALTHY
testnet004 (65.21.149.67) - HEALTHY
testnet005 (157.180.28.93) - HEALTHY
```
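
Since node logs report peers by IP address, a lookup built from this topology helps when labelling connection errors. The helper name `host_for_ip` is hypothetical; the mapping is copied from the list above.

```bash
# Hypothetical helper: map an IP address to its validator name, using the
# network topology listed in this appendix.
host_for_ip() {
  case $1 in
    65.108.155.133) echo testnet001 ;;
    178.156.159.24) echo testnet002 ;;
    46.62.225.199)  echo testnet003 ;;
    65.21.149.67)   echo testnet004 ;;
    157.180.28.93)  echo testnet005 ;;
    *)              echo unknown; return 1 ;;
  esac
}

host_for_ip 65.108.155.133   # prints "testnet001"
```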

---

**Report Generated**: 2026-01-21 18:25 UTC  
**Test Duration**: ~35 minutes  
**AlphaOS Commit**: Latest main branch (rebuilt with x86-64 compatibility)