id: testnet-proposal-cache-fixes
title: Fix Persistent Proposal Cache Issues in Dev Mode
status: proposed
priority: high
estimated_effort: 6-10 hours
created: 2026-01-22
author: Claude Sonnet 4.5

## Context

During Section 12b governance testing, we discovered that the BFT proposal cache persists across testnet restarts, leaving validators stuck and unable to propose new batches.

**Root Cause**: On startup, the validator loads `latest_certificate_round` from the previous session's proposal cache file. It then refuses to propose for round 1 because the cache records round 5236.

**Impact**: New features cannot be tested on the testnet without a full infrastructure rebuild.

## Problem Statement

**Files Involved**:
- `alphaos/node/bft/src/primary.rs:173-188` - Loads the proposal cache unconditionally
- `alphaos/node/bft/src/helpers/proposal_cache.rs:32-48` - Generates the cache file path
- `alphaos/cli/src/commands/start.rs` - Has no fresh-start option

**Current Behavior**:
1. The validator starts with `--dev 0 --dev-num-validators 5`
2. The BFT primary calls `load_proposal_cache()`
3. It loads `/root/.current-proposal-cache-1-0` if the file exists
4. It sets `propose_lock` to the cached round (e.g., 5236)
5. It refuses to propose for round 1 because 1 < 5236
6. **Result**: The testnet stalls indefinitely

**Error Message**:
```
Cannot propose a batch for round 1 - the latest proposal cache round is 5236
```

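The stall described in steps 4-6 can be modeled with a small round check. This is a sketch under assumptions: `ProposeGuard` and `check_round` are hypothetical names, not the actual `Primary` API, and only the comparison and the error text come from the description above. With a cached round of 5236, `check_round(1)` returns the error shown; with no cache loaded (round 0), round 1 is accepted.

```rust
/// Hypothetical stand-in for the primary's proposal lock (not the real alphaos type).
struct ProposeGuard {
    /// Round restored from the proposal cache; 0 when no cache was loaded.
    cached_round: u64,
}

impl ProposeGuard {
    /// Rejects any round below the cached round, mirroring step 5 above.
    fn check_round(&self, round: u64) -> Result<(), String> {
        if round < self.cached_round {
            return Err(format!(
                "Cannot propose a batch for round {round} - the latest proposal cache round is {}",
                self.cached_round
            ));
        }
        Ok(())
    }
}
```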
## Proposed Solution

### Fix 1: Add --fresh-start Flag

**File**: `alphaos/cli/src/commands/start.rs`
**Lines**: Add around line 112 (with the other flags)

```rust
/// Start with fresh state (delete proposal cache)
#[clap(long = "fresh-start")]
pub fresh_start: bool,
```

**Integration** (in `parse_node()` around line 650):
```rust
// Before starting consensus, check for fresh start
if self.fresh_start {
    info!("⚠️  Fresh start requested - deleting proposal cache");

    use acdc_std::StorageMode;
    use alphaos_node_bft::helpers::proposal_cache_path;

    // Resolve the same storage mode the node will run with, so the path matches.
    let storage_mode = if let Some(dev_id) = self.dev {
        StorageMode::Development(dev_id)
    } else {
        StorageMode::Production
    };

    let cache_path = proposal_cache_path(N::ID, &storage_mode);

    if cache_path.exists() {
        // Note: `.context(...)` requires the `anyhow::Context` trait to be in scope.
        std::fs::remove_file(&cache_path)
            .context("Failed to delete proposal cache")?;
        info!("✅ Deleted proposal cache: {}", cache_path.display());
    } else {
        info!("No proposal cache found");
    }
}
```

**Usage**:
```bash
alphaos start --network 1 --validator --dev 0 --dev-num-validators 5 --fresh-start
```

**Testing**:
1. Start the validator normally and let it run to round 100
2. Stop the validator
3. Restart with `--fresh-start`
4. Verify the old cache file was deleted
5. Verify the validator starts at round 1

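One possible refinement of the deletion step is to call `remove_file` directly and treat a missing file as success, which avoids the gap between the `exists()` check and the removal. The sketch below assumes a hypothetical helper name, `remove_cache_if_present`, and takes a plain path so that it does not depend on `proposal_cache_path`. It returns `true` when a file was deleted and `false` when there was nothing to delete.

```rust
use std::{fs, io, path::Path};

/// Removes the file at `path`, returning `Ok(true)` if it was deleted and
/// `Ok(false)` if it did not exist. Other I/O errors are passed through.
/// Hypothetical helper; not part of alphaos.
fn remove_cache_if_present(path: &Path) -> io::Result<bool> {
    match fs::remove_file(path) {
        Ok(()) => Ok(true),
        // A missing file is the desired end state, so it is not an error.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}
```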
---

### Fix 2: Skip Proposal Cache in Dev Mode

**File**: `alphaos/node/bft/src/primary.rs`
**Function**: `load_proposal_cache()` (lines 172-212)

**Change**:
```rust
/// Load the proposal cache file and update the Primary state with the stored data.
async fn load_proposal_cache(&self) -> Result<()> {
    // IMPORTANT: Skip proposal cache in dev mode to allow fresh starts
    if self.storage_mode.dev().is_some() {
        info!("Skipping proposal cache in dev mode (allows fresh restarts)");
        return Ok(());
    }

    // Fetch the signed proposals from the file system if it exists.
    match ProposalCache::<N>::exists(&self.storage_mode) {
        // ... rest of existing logic
    }
}
```

**Rationale**:
- Dev mode is for testing and should allow easy resets
- Production mode still benefits from the proposal cache for crash recovery
- Cache loading is tied to the storage mode, so running in production mode is the explicit opt-in

**Side Effects**:
- Dev mode validators lose proposal state on restart (acceptable for testing)
- Slightly slower recovery after a crash in dev mode (acceptable trade-off)

**Testing**:
1. Start a dev mode validator and let it run to round 100
2. Stop the validator (the cache file should exist)
3. Restart the validator
4. Verify it starts at round 1 (cache ignored)
5. Verify production mode still loads the cache

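The intended behavior reduces to a single decision on the storage mode, which can be checked in isolation. In this sketch, `Mode` is a simplified stand-in for `StorageMode` (an assumption; the real enum may have other variants or a different payload type), and `should_load_cache` is a hypothetical name.

```rust
/// Simplified stand-in for `StorageMode` (assumption: the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Production,
    Development(u16),
}

impl Mode {
    /// Returns the dev node ID, if any, like the `dev()` call in the change above.
    fn dev(self) -> Option<u16> {
        match self {
            Mode::Development(id) => Some(id),
            Mode::Production => None,
        }
    }
}

/// The proposal cache is loaded only outside dev mode.
fn should_load_cache(mode: Mode) -> bool {
    mode.dev().is_none()
}
```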
---

### Fix 3: Proposal Cache Validation

**File**: `alphaos/node/bft/src/helpers/proposal_cache.rs`
**Function**: `load()` (around line 92)

**Add validation after loading**:
```rust
pub fn load(signer: Address<N>, storage_mode: &StorageMode) -> Result<Self> {
    // Construct the proposal cache file system path.
    let path = proposal_cache_path(N::ID, storage_mode);

    // Attempt to read the proposal cache file.
    let proposal_cache = match fs::read(&path) {
        Ok(bytes) => match Self::from_bytes_le(&bytes[..]) {
            Ok(cache) => cache,
            Err(err) => bail!("Failed to deserialize the proposal cache - {err}"),
        },
        Err(err) => bail!("Failed to read the proposal cache from {} - {err}", path.display()),
    };

    // Validate the proposal cache.
    ensure!(proposal_cache.is_valid(signer), "The proposal cache is invalid");

    // NEW: Warn when the cache may be stale.
    // A very high round suggests the cache comes from an old session.
    // Note: this is a fixed heuristic and only warns. The round-5236 cache from the
    // Context section is below the threshold and would load without a warning.
    if proposal_cache.latest_round > 10_000 {
        warn!(
            "Proposal cache round {} is very high - this may be from an old session. \
             Consider using --fresh-start to reset state.",
            proposal_cache.latest_round
        );
    }

    info!("Loaded the proposal cache from {} at round {}", path.display(), proposal_cache.latest_round);

    Ok(proposal_cache)
}
```

**Alternative** (stricter validation):
```rust
// Compare the cache round with the current ledger state.
// Requires access to the ledger, so this needs an API change.
pub fn load(
    signer: Address<N>,
    storage_mode: &StorageMode,
    current_height: u32,  // NEW parameter
) -> Result<Self> {
    // ... load cache

    // Validate cache round is reasonable compared to ledger height
    // Typically round ≈ height, so if round >> height, cache is stale
    // Widen the height before adding (assumes `latest_round` is a `u64`).
    if proposal_cache.latest_round > u64::from(current_height) + 1000 {
        bail!(
            "Proposal cache round {} is far ahead of ledger height {} - \
             refusing to load stale cache. Use --fresh-start to reset.",
            proposal_cache.latest_round,
            current_height
        );
    }

    Ok(proposal_cache)
}
```

**Testing** (stricter alternative):
1. Create a cache file with round 5236 and a ledger at height 10
2. Attempt to load the cache
3. Verify the validation rejects the cache (5236 > 10 + 1000)
4. Verify the error message points to `--fresh-start`
5. For the warning-only variant, repeat with a round above 10000 and verify the warning is logged

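The height comparison from the stricter alternative can be isolated as a pure function, which makes the boundary cases easy to unit test. This is a sketch under assumptions: `validate_against_height` as a free function, the `u64` round type, and the `MAX_ROUND_LEAD` constant name are illustrative. Only the 1000-round slack and the message text come from the snippet above. With round 5236 and height 10 the check fails, because 5236 exceeds 10 + 1000.

```rust
/// Slack allowed between the cached round and the ledger height
/// (value taken from the stricter alternative above).
const MAX_ROUND_LEAD: u64 = 1000;

/// Returns an error when `cached_round` is more than `MAX_ROUND_LEAD` ahead of
/// `current_height`. Hypothetical helper; assumes rounds are `u64` and heights `u32`.
fn validate_against_height(cached_round: u64, current_height: u32) -> Result<(), String> {
    // Widen the height before adding so the sum cannot overflow `u32`.
    let limit = u64::from(current_height).saturating_add(MAX_ROUND_LEAD);
    if cached_round > limit {
        return Err(format!(
            "Proposal cache round {cached_round} is far ahead of ledger height {current_height} - \
             refusing to load stale cache. Use --fresh-start to reset."
        ));
    }
    Ok(())
}
```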
---

### Fix 4: Add alphaos clean Subcommand

**New File**: `alphaos/cli/src/commands/clean.rs`

```rust
// Copyright (c) 2025-2026 ACDC Network
// ... standard header

use acdc_std::{alpha_ledger_dir, StorageMode};
use alphaos_node_bft::helpers::proposal_cache_path;
use alphavm::prelude::{anyhow::Result, info};
use clap::Parser;
use std::fs;

/// Clean validator state (proposal cache, BFT storage)
#[derive(Debug, Parser)]
pub struct Clean {
    /// Network ID (0=mainnet, 1=testnet, 2=devnet)
    #[clap(long, default_value = "1")]
    pub network: u16,

    /// Development node ID (0-4 for dev mode)
    // Note: the integer type must match the payload of `StorageMode::Development`.
    #[clap(long)]
    pub dev: Option<u8>,

    /// Confirm deletion (required for safety)
    #[clap(long)]
    pub confirm: bool,

    /// Clean blockchain ledgers too (dangerous)
    #[clap(long)]
    pub clean_ledger: bool,
}

impl Clean {
    pub fn run(self) -> Result<()> {
        // Refuse to delete anything unless the user passed --confirm.
        if !self.confirm {
            println!("⚠️  This will delete validator state files.");
            println!("   Add --confirm to proceed.");
            return Ok(());
        }

        let storage_mode = if let Some(dev_id) = self.dev {
            StorageMode::Development(dev_id)
        } else {
            StorageMode::Production
        };

        // Delete proposal cache
        let cache_path = proposal_cache_path(self.network, &storage_mode);
        if cache_path.exists() {
            fs::remove_file(&cache_path)?;
            info!("✅ Deleted proposal cache: {}", cache_path.display());
        } else {
            info!("No proposal cache found at {}", cache_path.display());
        }

        // Delete BFT storage
        // TODO: Confirm that `alpha_ledger_dir` holds only BFT storage. If it also
        // contains the blockchain ledger, this step belongs behind --clean-ledger.
        let ledger_dir = alpha_ledger_dir(self.network, &storage_mode);
        if ledger_dir.exists() {
            fs::remove_dir_all(&ledger_dir)?;
            info!("✅ Deleted BFT storage: {}", ledger_dir.display());
        } else {
            info!("No BFT storage found at {}", ledger_dir.display());
        }

        // Optionally delete blockchain ledgers
        if self.clean_ledger {
            // TODO: Add ledger deletion
            // This is more dangerous and should require extra confirmation
            info!("⚠️  Blockchain ledger cleaning not yet implemented");
        }

        println!("✅ Cleanup complete!");
        Ok(())
    }
}
```

**Integration**: `alphaos/cli/src/commands/mod.rs`
```rust
pub mod clean;
pub use clean::Clean;
```

**CLI Integration**: `alphaos/cli/src/main.rs`
```rust
#[derive(Debug, Parser)]
pub enum Command {
    Start(Start),
    Clean(Clean),  // NEW
    // ... other commands
}

// In main():
match cli.command {
    Command::Start(start) => start.parse().await,
    Command::Clean(clean) => clean.run(),  // NEW
    // ... other commands
}
```

**Usage**:
```bash
# Clean dev mode validator 0
alphaos clean --network 1 --dev 0 --confirm

# Clean production testnet validator
alphaos clean --network 1 --confirm

# Also request ledger cleanup (not yet implemented, see the TODO in run())
alphaos clean --network 1 --dev 0 --confirm --clean-ledger
```

**Testing**:
1. Start the validator and let it run
2. Stop the validator
3. Run `alphaos clean --network 1 --dev 0 --confirm`
4. Verify the proposal cache and BFT storage are deleted
5. Start the validator again and verify a fresh start

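The `--confirm` gate in `run()` could also be factored into a small planning step that reports what would be deleted before anything is touched. This is a sketch only: `deletion_plan` is a hypothetical helper, and the paths passed to it are placeholders rather than the real cache or storage locations. Without confirmation the plan is empty, so a caller can print it as a dry run.

```rust
use std::path::PathBuf;

/// Returns the paths that a cleanup run would delete.
/// Without confirmation the plan is empty; with it, only existing paths are listed.
/// Hypothetical helper; not part of alphaos.
fn deletion_plan(confirm: bool, candidates: &[PathBuf]) -> Vec<PathBuf> {
    if !confirm {
        return Vec::new();
    }
    candidates.iter().filter(|path| path.exists()).cloned().collect()
}
```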
---

## Implementation Plan

### Phase 1: Quick Fix (1-2 hours)
1. Implement Fix 2 (skip cache in dev mode)
2. Test on local dev environment
3. Deploy to testnet for verification

### Phase 2: User-Friendly Fix (2-3 hours)
1. Implement Fix 1 (--fresh-start flag)
2. Update systemd service files to document flag
3. Add to testnet deployment scripts

### Phase 3: Robustness (1-2 hours)
1. Implement Fix 3 (cache validation)
2. Add unit tests for validation logic
3. Test with various cache scenarios

### Phase 4: Developer Experience (2-3 hours)
1. Implement Fix 4 (alphaos clean subcommand)
2. Write documentation
3. Update testnet operations guide

## Testing Strategy

### Unit Tests
**File**: `alphaos/node/bft/src/helpers/proposal_cache_tests.rs`

```rust
#[test]
fn test_cache_validation_stale_round() {
    // Create cache with round 5236
    let cache = ProposalCache::new(5236, None, Default::default(), Default::default());

    // Validate against current height 10.
    // `validate_against_height` is a proposed helper that would hold the height
    // check from Fix 3's stricter alternative; it does not exist yet.
    let result = cache.validate_against_height(10);

    assert!(result.is_err());
    assert!(result.unwrap_err().to_string().contains("stale cache"));
}

// `.await` requires an async test; `#[tokio::test]` assumes the crate tests with tokio.
#[tokio::test]
async fn test_dev_mode_skips_cache() {
    // Start primary in dev mode (other constructor arguments elided)
    let primary = Primary::new(..., StorageMode::Development(0));

    // Run the load path; in dev mode it should return early
    primary.load_proposal_cache().await.unwrap();

    // Verify cache not loaded
    assert_eq!(*primary.propose_lock.lock().await, 0);
}
```

### Integration Tests
1. **Test fresh-start flag**:
   - Start the validator, run to round 100, stop
   - Restart with `--fresh-start`
   - Verify the round resets to 1

2. **Test clean command**:
   - Start the validator, create state, stop
   - Run `alphaos clean --network 1 --dev 0 --confirm`
   - Verify the proposal cache and BFT storage are deleted

3. **Test cache validation**:
   - Manually create a stale cache file
   - Attempt to start the validator
   - Verify the stricter validation prevents startup with a clear error (the warning-only variant should log a warning instead)

### Testnet Verification
1. Deploy the updated binary to all 5 validators
2. Test the fresh restart procedure
3. Verify governance testing can proceed
4. Document the new operational procedures

## Success Criteria

- [ ] Dev mode validators can restart fresh without manual cleanup
- [ ] `--fresh-start` flag works correctly
- [ ] `alphaos clean` command implemented and tested
- [ ] Cache validation flags stale state (warning-only variant) or refuses to load it (stricter variant)
- [ ] Section 12b governance testing unblocked
- [ ] Documentation updated with new procedures

## Dependencies

- None (all changes self-contained in alphaos)

## Risks

- **Data loss**: Clean command could delete important state if misused
  - Mitigation: Require the `--confirm` flag and print clear warnings
- **Behavior change**: Skipping the cache in dev mode changes restart behavior
  - Mitigation: Only affects dev mode; production is unchanged
- **Testing gaps**: Cache validation logic needs thorough testing
  - Mitigation: Comprehensive unit tests before deployment

## Documentation Updates

### Files to Update
1. `alpha-delta-context/docs/operations/testnet-reset.md` - Add clean procedures
2. `alpha-delta-context/docs/operations/validator-onboarding.md` - Mention --fresh-start
3. `alphaos/cli/README.md` - Document clean subcommand

### New Documentation
1. Troubleshooting guide for "Cannot propose a batch" error
2. Best practices for dev mode testing
3. State management in production vs dev mode

## Related Issues

- Section 12b governance testing (blocked by this issue)
- Future testnet deployments (need reliable reset procedure)
- CI/CD integration testing (needs clean state between runs)

## Notes

This is a HIGH priority fix because it blocks testing of all new consensus features. The immediate workaround (Fix 2) can be implemented in < 1 hour and unblocks governance testing.

The proposal cache is valuable for production crash recovery, but becomes a liability in dev/test environments. The solution is to make cache behavior configurable and provide tools for state management.
441  
442  The proposal cache is valuable for production crash recovery, but becomes a liability in dev/test environments. The solution is to make cache behavior configurable and provide tools for state management.