pre-scale-safety-audit.md
# Pre-Scale Safety Audit

*proto-030 | Operational pattern for analyzing automation before scaling*

---

- **principle**
  - "Before scaling any automation, analyze for runaway risks and add structural safeguards."
  - "Scale geometrically, not linearly. Verify at each expansion."

- **shape**
  - Analyze system for compounding/cascading risks
  - Add structural safeguards (caps, gates, aborts)
  - Test at small scale (1 unit)
  - Expand geometrically: 1 → 10 → 100 → all
  - Verify at each expansion before proceeding

---

## The Insight

Automation that works on 1 item can explode on 1000. Many failure modes only become visible at scale.

```
LINEAR SCALING (DANGEROUS):
Build automation → Run on all sessions
                         ↓
                   Discover runaway at scale (too late)

GEOMETRIC SCALING (SAFE):
Build automation → Safety audit → Safeguards
                                      ↓
                                Run on 1 → Verify
                                      ↓
                                Run on 10 → Verify
                                      ↓
                                Run on 100 → Verify
                                      ↓
                                Run on all
```

The geometric approach catches exponential risks while the blast radius is still small, before they compound.

---

## The Safety Audit Checklist

Before scaling any automation, check:

| Risk Category | Question | Safeguard |
|---------------|----------|-----------|
| **Cascade spawning** | Can A spawn B which spawns more A? | MAX_GENERATIONS cap |
| **Feedback loops** | Can output become input that triggers more output? | Generation tracking |
| **Resource exhaustion** | Can subprocesses, files, or memory grow without bound? | Explicit caps |
| **Threshold drift** | Can fixes create new threshold crossings? | Post-action re-check |
| **Time bombs** | Anything that grows with session count? | Per-session caps |

---

## Implementation

### Phase 1: Safety Audit
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    category: str      # short tag, e.g. "cascade" (field names are illustrative)
    description: str   # human-readable explanation


# Before scaling, analyze the system.
# can_spawn_self, output_becomes_input, find_loops, and has_explicit_cap are
# system-specific predicates that you supply for your own automation.
def pre_scale_audit(system) -> List[Risk]:
    risks = []

    # Check for cascade spawning
    if can_spawn_self(system):
        risks.append(Risk("cascade", "System can spawn itself"))

    # Check for feedback loops
    if output_becomes_input(system):
        risks.append(Risk("feedback", "Output feeds back to input"))

    # Check for unbounded growth
    for loop in find_loops(system):
        if not has_explicit_cap(loop):
            risks.append(Risk("unbounded", f"Loop {loop} has no cap"))

    return risks
```

### Phase 2: Add Safeguards
```python
# Structural caps (not behavioral)
MAX_GENERATIONS = 1            # Prevent cascade spawning
MAX_ITEMS_PER_RUN = 50         # Prevent unbounded processing
MAX_TOTAL_BEFORE_ABORT = 1000  # Safety valve
```

### Phase 3: Geometric Scaling
```python
def scale_geometrically(automation, items):
    """Scale in geometric steps, verifying at each.

    `verify` is a system-specific check that you supply; it returns True
    when a run's result looks healthy.
    """

    # Step 1: Single item
    result = automation.run(items[:1])
    if not verify(result):
        return "ABORT at scale=1"

    # Step 2: 10 items
    result = automation.run(items[:10])
    if not verify(result):
        return "ABORT at scale=10"

    # Step 3: 100 items (slicing returns all items if there are fewer)
    result = automation.run(items[:100])
    if not verify(result):
        return "ABORT at scale=100"

    # Step 4: All items
    return automation.run(items)
```

---

## When to Apply

| Trigger | Action |
|---------|--------|
| "Let's run this on all X" | STOP. Run safety audit first. |
| "Scale up to production" | STOP. Add safeguards first. |
| "Process the entire history" | STOP. Geometric expansion. |
| Automation worked on 1 item | Before expanding: audit + safeguards |

**Rule:** Never go from 1 to all. Always go 1 → 10 → 100 → all.

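
The 1 → 10 → 100 → all rule can be turned into an explicit schedule so that the stage sizes are computed rather than typed by hand. The following is a minimal sketch, not part of the original pattern: `expansion_stages`, `factor`, and `last_checkpoint` are hypothetical names introduced here for illustration, and the defaults simply mirror the rule above.

```python
from typing import List


def expansion_stages(total: int, factor: int = 10, last_checkpoint: int = 100) -> List[int]:
    """Return batch sizes for a geometric rollout, ending with the full set.

    Hypothetical helper: stages grow by `factor` starting from 1, stop once
    they pass `last_checkpoint`, and the final stage always equals `total`.
    """
    if total < 1:
        return []
    if factor < 2:
        raise ValueError("factor must be at least 2")

    stages = []
    size = 1
    while size < total and size <= last_checkpoint:
        stages.append(size)
        size *= factor
    stages.append(total)  # the last stage covers every item
    return stages


print(expansion_stages(5000))  # [1, 10, 100, 5000]
print(expansion_stages(42))    # [1, 10, 42]
print(expansion_stages(1))     # [1]
```

Each returned size is the point at which to pause and verify before moving on. Small inputs collapse naturally: with 42 items there is no separate 100-item stage, because the final stage already covers everything.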

---

## Axiom Alignment

| Axiom | Alignment |
|-------|-----------|
| **A4 (Ergodicity)** | Prevent ruin before optimizing gain - one runaway can undo all progress |
| **A0 (Boundary)** | Safeguards define hard boundaries automation cannot cross |
| **A3 (Navigation)** | Caps can be adjusted based on observed behavior at each scale |

---

## Instances

### Positive Instance: Speed Run Scaling
- **Context:** About to run speed run across all historical sessions
- **Audit:** Analyzed for cascade spawning, feedback loops, resource exhaustion
- **Risks Found:** Fixer→Fixer cascade, threshold feedback loops
- **Safeguards Added:** MAX_FIXER_GENERATIONS=1, post-fixer threshold re-check
- **Scaling:** Today's session → Yesterday → This week → All
- **Outcome:** ✓ Safe to scale with structural protections

### Negative Instance: Unaudited Scaling
- **Context:** Automation worked on test case
- **Skip:** Ran directly on production data
- **Result:** Cascade spawned 10,000 subprocesses
- **Impact:** System crash, 4 hours to recover
- **Outcome:** ✗ 30-minute audit would have caught it

---

## The Geometric Expansion Protocol

```
STEP 1: AUDIT
├─ Analyze for runaway risks
├─ Document risks found
└─ Add structural safeguards

STEP 2: SCALE 1
├─ Run on 1 item (today's session)
├─ Verify: No unexpected behavior?
├─ Check: Resource usage normal?
└─ If OK → proceed to SCALE 10

STEP 3: SCALE 10
├─ Run on 10 items (yesterday + today)
├─ Verify: Behavior scales linearly?
├─ Check: No exponential growth?
└─ If OK → proceed to SCALE 100

STEP 4: SCALE 100
├─ Run on 100 items (this week)
├─ Verify: Still bounded?
├─ Check: Safeguards activated appropriately?
└─ If OK → proceed to SCALE ALL

STEP 5: SCALE ALL
├─ Run on all items
├─ Monitor closely
└─ Have kill switch ready
```

---

## Related

- [[threshold-triggered-automation]] - What gets triggered at scale
- [[validation-loop]] - Verify at each expansion
- [[infrastructure-over-suggestion]] - Safeguards are structural, not behavioral

---

*proto-030 | Pre-Scale Safety Audit | 2026-01-15*