# Pre-Scale Safety Audit

*proto-030 | Operational pattern for analyzing automation before scaling*

---

- **principle**
  - "Before scaling any automation, analyze for runaway risks and add structural safeguards."
  - "Scale geometrically, not linearly. Verify at each expansion."

- **shape**
  - Analyze system for compounding/cascading risks
  - Add structural safeguards (caps, gates, aborts)
  - Test at small scale (1 unit)
  - Expand geometrically: 1 → 10 → 100 → all
  - Verify at each expansion before proceeding

---

## The Insight

Automation that works on 1 item can explode on 1000. The failure modes aren't visible until scale.

```
LINEAR SCALING (DANGEROUS):
Build automation → Run on all sessions
                        ↓
               Discover runaway at scale (too late)

GEOMETRIC SCALING (SAFE):
Build automation → Safety audit → Safeguards
                                      ↓
    Run on 1 → Verify
         ↓
    Run on 10 → Verify
         ↓
    Run on 100 → Verify
         ↓
    Run on all
```

The geometric approach catches exponential risks before they compound.
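The diagram is easier to trust with numbers. The sketch below assumes a hypothetical runaway in which every spawned process spawns `branching` more, for a fixed number of levels; `cascade_size` and its parameters are illustrative, not taken from a real system. The per-item blow-up is the same at every scale, so the 1-item run already exposes it; what changes with scale is only the cost of discovering it.

```python
def cascade_size(items: int, branching: int, generations: int) -> int:
    """Total processes started when each item starts one process and every
    process spawns `branching` children, down to `generations` levels.

    Per item this is the geometric series 1 + b + b^2 + ... + b^g.
    """
    per_item = sum(branching ** g for g in range(generations + 1))
    return items * per_item


# Hypothetical runaway: each process spawns 2 more, 9 levels deep.
for scale in (1, 10, 100, 1000):
    print(scale, cascade_size(scale, branching=2, generations=9))
# 1 1023
# 10 10230
# 100 102300
# 1000 1023000
```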

---

## The Safety Audit Checklist

Before scaling any automation, check:

| Risk Category | Question | Safeguard |
|---------------|----------|-----------|
| **Cascade spawning** | Can A spawn B which spawns more A? | MAX_GENERATIONS cap |
| **Feedback loops** | Can output become input that triggers more output? | Generation tracking |
| **Resource exhaustion** | Unbounded subprocess/file/memory growth? | Explicit caps |
| **Threshold drift** | Can fixes create new threshold crossings? | Post-action re-check |
| **Time bombs** | Anything that grows with session count? | Per-session caps |
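The checklist can be kept as data so an audit produces a concrete gap list rather than a feeling of coverage. A minimal sketch, assuming risks and safeguards are tracked as string labels; `CHECKLIST`, `missing_safeguards` and the snake_case labels are hypothetical names.

```python
from typing import List, Set

# The checklist as data: risk category -> the safeguard that answers it.
# The snake_case labels are illustrative names, not an existing API.
CHECKLIST = {
    "cascade_spawning": "max_generations_cap",
    "feedback_loops": "generation_tracking",
    "resource_exhaustion": "explicit_caps",
    "threshold_drift": "post_action_recheck",
    "time_bombs": "per_session_caps",
}


def missing_safeguards(risks_found: Set[str], safeguards_in_place: Set[str]) -> List[str]:
    """Return risk categories that were found but whose safeguard is not in place.

    A category missing from CHECKLIST maps to None, which is never "in place",
    so unknown risks are always reported rather than silently passed.
    """
    return sorted(
        risk for risk in risks_found
        if CHECKLIST.get(risk) not in safeguards_in_place
    )


# Audit found a cascade and a threshold-drift risk; only the cap exists so far.
print(missing_safeguards({"cascade_spawning", "threshold_drift"}, {"max_generations_cap"}))
# ['threshold_drift']
```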
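
---

## Implementation

### Phase 1: Safety Audit
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    category: str      # e.g. "cascade", "feedback", "unbounded"
    description: str


# Before scaling, analyze the system.
# can_spawn_self, output_becomes_input, find_loops and has_explicit_cap are
# placeholders for project-specific analysis; they are not defined here.
def pre_scale_audit(system) -> List[Risk]:
    risks: List[Risk] = []

    # Cascade spawning: can A spawn B which spawns more A?
    if can_spawn_self(system):
        risks.append(Risk("cascade", "System can spawn itself"))

    # Feedback loops: can output become input that triggers more output?
    if output_becomes_input(system):
        risks.append(Risk("feedback", "Output feeds back to input"))

    # Unbounded growth: every loop needs an explicit cap
    for loop in find_loops(system):
        if not has_explicit_cap(loop):
            risks.append(Risk("unbounded", f"Loop {loop} has no cap"))

    return risks
```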
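The audit leaves `can_spawn_self` and `output_becomes_input` abstract. One way to back both, assuming the system can be described as a trigger graph (component → components it can start), is cycle detection: a self-edge is cascade spawning, a longer cycle is a feedback loop. `TriggerGraph`, `find_cycles` and the component names in the example are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical system model: each component maps to the components it can
# trigger or spawn.
TriggerGraph = Dict[str, List[str]]


def find_cycles(graph: TriggerGraph) -> List[List[str]]:
    """Return the cycles closed by back edges during a depth-first search.

    If the graph has any cycle, at least one is reported; this does not
    enumerate every cycle. A self-edge (A -> A) is cascade spawning;
    a longer cycle (A -> B -> A) is a feedback loop.
    """
    cycles: List[List[str]] = []
    path: List[str] = []
    on_path: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> None:
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_path:
                # Back edge: the cycle runs from nxt's position to the end of path.
                cycles.append(path[path.index(nxt):] + [nxt])
            elif nxt not in done:
                visit(nxt)
        path.pop()
        on_path.remove(node)
        done.add(node)

    for start in graph:
        if start not in done:
            visit(start)
    return cycles


system = {"detector": ["fixer"], "fixer": ["fixer", "detector"]}
print(find_cycles(system))
# [['fixer', 'fixer'], ['detector', 'fixer', 'detector']]
```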

### Phase 2: Add Safeguards
```python
# Structural caps (not behavioral)
MAX_GENERATIONS = 1            # Prevent cascade spawning
MAX_ITEMS_PER_RUN = 50         # Prevent unbounded processing
MAX_TOTAL_BEFORE_ABORT = 1000  # Safety valve
```
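The caps only help if the scheduler enforces them. A minimal sketch of what "structural, not behavioral" can look like, assuming work is queued as tasks that carry a generation number: the queue drops spawns past `MAX_GENERATIONS`, truncates input at `MAX_ITEMS_PER_RUN`, and raises once `MAX_TOTAL_BEFORE_ABORT` tasks have been processed while more remain. `Task`, `AbortRun`, `run_capped` and the `handle` callback are hypothetical; the constants repeat the values from Phase 2.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence

MAX_GENERATIONS = 1            # original tasks may spawn; spawned tasks may not
MAX_ITEMS_PER_RUN = 50         # input items accepted per run
MAX_TOTAL_BEFORE_ABORT = 1000  # hard ceiling on tasks processed in one run


@dataclass
class Task:
    item: str
    generation: int = 0  # 0 = original item, 1 = spawned by a generation-0 task


class AbortRun(RuntimeError):
    """Raised when the safety valve trips."""


def run_capped(items: Sequence[str], handle: Callable[[Task], Iterable[str]]) -> int:
    """Process items breadth-first; `handle` returns follow-up items to spawn.

    Caps live in the queue, so a misbehaving `handle` cannot exceed them.
    Returns the number of tasks processed.
    """
    queue = deque(Task(item) for item in items[:MAX_ITEMS_PER_RUN])
    processed = 0
    while queue:
        if processed >= MAX_TOTAL_BEFORE_ABORT:
            raise AbortRun(f"hit {MAX_TOTAL_BEFORE_ABORT} tasks; aborting run")
        task = queue.popleft()
        processed += 1
        for child in handle(task):
            if task.generation < MAX_GENERATIONS:  # spawns past the cap are dropped
                queue.append(Task(child, task.generation + 1))
    return processed


# Every task asks for a follow-up fixer; only generation-0 tasks get one.
print(run_capped(["a", "b", "c"], lambda task: [task.item + "-fix"]))  # 6
```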

### Phase 3: Geometric Scaling
```python
from typing import Callable, Sequence


def scale_geometrically(automation, items: Sequence, verify: Callable[[object], bool]):
    """Scale in geometric steps (1 -> 10 -> 100 -> all), verifying at each.

    Each step re-runs on a prefix of `items` (earlier items included),
    so `automation.run` should be safe to repeat on the same item.
    """
    for scale in (1, 10, 100):
        # Slicing past the end is safe: items[:100] is all items if fewer exist
        result = automation.run(items[:scale])
        if not verify(result):
            return f"ABORT at scale={scale}"

    # Final step: all items. Monitor closely and keep a kill switch ready.
    return automation.run(items)
```
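`verify` is left abstract above. One concrete check it could include, aimed at the "No exponential growth?" question, compares cost per item across the scales run so far. The function name, the cost measure and the default `tolerance=2.0` are assumptions for illustration.

```python
from typing import Dict


def grows_superlinearly(cost_by_scale: Dict[int, float], tolerance: float = 2.0) -> bool:
    """Return True if cost per item rises faster than the item count.

    `cost_by_scale` maps a scale (items processed) to a measured cost, such as
    subprocesses spawned or peak memory. Linear behavior keeps cost per item
    roughly flat; a runaway makes it climb with scale.
    """
    scales = sorted(cost_by_scale)
    baseline = cost_by_scale[scales[0]] / scales[0]  # cost per item at the smallest scale
    for scale in scales[1:]:
        if cost_by_scale[scale] / scale > tolerance * baseline:
            return True
    return False


print(grows_superlinearly({1: 3, 10: 31, 100: 290}))   # False: about 3 per item throughout
print(grows_superlinearly({1: 3, 10: 60, 100: 2000}))  # True: 20 per item at scale 100
```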

---

## When to Apply

| Trigger | Action |
|---------|--------|
| "Let's run this on all X" | STOP. Run safety audit first. |
| "Scale up to production" | STOP. Add safeguards first. |
| "Process the entire history" | STOP. Geometric expansion. |
| Automation worked on 1 item | Before expanding: audit + safeguards |

**Rule:** Never go from 1 to all. Always go 1 → 10 → 100 → all.
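The rule is simple enough to enforce in code rather than leave to judgment. A minimal sketch, assuming the runner records the largest scale verified so far; `next_scale` and `approve` are hypothetical names.

```python
def next_scale(verified: int, total: int) -> int:
    """Largest run size allowed once `verified` items have run cleanly.

    Encodes 1 -> 10 -> 100 -> all: nothing verified allows 1 item, each
    step below 100 allows at most 10x, and 100 verified unlocks all items.
    """
    if verified <= 0:
        return min(1, total)
    if verified >= 100:
        return total
    return min(verified * 10, total)


def approve(verified: int, requested: int, total: int) -> bool:
    """Reject any request that skips a step, such as going from 1 to all."""
    return requested <= next_scale(verified, total)


print(approve(verified=1, requested=5000, total=5000))    # False: skips 10 and 100
print(approve(verified=100, requested=5000, total=5000))  # True
```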

---

## Axiom Alignment

| Axiom | Alignment |
|-------|-----------|
| **A4 (Ergodicity)** | Prevent ruin before optimizing gain: one runaway can undo all progress |
| **A0 (Boundary)** | Safeguards define hard boundaries automation cannot cross |
| **A3 (Navigation)** | Caps can be adjusted based on observed behavior at each scale |

---

## Instances

### Positive Instance: Speed Run Scaling
- **Context:** About to run the speed run across all historical sessions
- **Audit:** Analyzed for cascade spawning, feedback loops, resource exhaustion
- **Risks Found:** Fixer→Fixer cascade, threshold feedback loops
- **Safeguards Added:** MAX_FIXER_GENERATIONS=1, post-fixer threshold re-check
- **Scaling:** Today's session → Yesterday → This week → All
- **Outcome:** ✓ Safe to scale with structural protections
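The post-fixer threshold re-check from this instance can be sketched as follows. It assumes metrics are named numbers compared against per-metric thresholds and that a fixer returns the metrics after its change; `fix_with_recheck`, `compact`, the metric names and the numbers are illustrative, not the speed run's actual code. Anything still over threshold is returned for reporting instead of triggering another fixer, so a fixer never spawns a fixer.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def fix_with_recheck(
    metrics: Metrics,
    thresholds: Metrics,
    fixer: Callable[[Metrics], Metrics],
) -> List[str]:
    """Run exactly one fixer pass, then re-check every threshold.

    Returns the names of metrics still over their threshold after the fix,
    including any the fix itself pushed over (threshold drift). The caller
    reports these; it does not launch a second fixer.
    """
    after = fixer(metrics)  # the only fixer generation
    return sorted(
        name for name, limit in thresholds.items()
        if after.get(name, 0.0) > limit
    )


# A hypothetical fixer halves "context" but writes summary files as a side effect.
def compact(m: Metrics) -> Metrics:
    return {"context": m["context"] / 2, "files": m["files"] + 15}


print(fix_with_recheck({"context": 120, "files": 40}, {"context": 100, "files": 50}, compact))
# ['files']: the fix cleared one threshold and crossed another
```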

### Negative Instance: Unaudited Scaling
- **Context:** Automation worked on a test case
- **Skip:** Skipped the audit and ran directly on production data
- **Result:** Cascade spawned 10,000 subprocesses
- **Impact:** System crash, 4 hours to recover
- **Outcome:** ✗ A 30-minute audit would have caught it

---

## The Geometric Expansion Protocol

```
STEP 1: AUDIT
├─ Analyze for runaway risks
├─ Document risks found
└─ Add structural safeguards

STEP 2: SCALE 1
├─ Run on 1 item (today's session)
├─ Verify: No unexpected behavior?
├─ Check: Resource usage normal?
└─ If OK → proceed to SCALE 10

STEP 3: SCALE 10
├─ Run on 10 items (yesterday + today)
├─ Verify: Behavior scales linearly?
├─ Check: No exponential growth?
└─ If OK → proceed to SCALE 100

STEP 4: SCALE 100
├─ Run on 100 items (this week)
├─ Verify: Still bounded?
├─ Check: Safeguards activated appropriately?
└─ If OK → proceed to SCALE ALL

STEP 5: SCALE ALL
├─ Run on all items
├─ Monitor closely
└─ Have kill switch ready
```
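Step 5's kill switch can be as simple as a flag checked between items. A minimal sketch using the standard library's `threading.Event`; `run_with_kill_switch` and the `process` callback are hypothetical. A monitor (another thread, a signal handler, or a cap check inside `process`) sets the event, and the run stops before starting the next item; the item already in progress is allowed to finish.

```python
import threading
from typing import Callable, Iterable, List


def run_with_kill_switch(
    items: Iterable[str],
    process: Callable[[str], None],
    stop: threading.Event,
) -> List[str]:
    """Process items in order, checking the kill switch before each one.

    Returns the items that were fully processed before the switch was set.
    """
    done: List[str] = []
    for item in items:
        if stop.is_set():
            break  # kill switch pulled: do not start another item
        process(item)
        done.append(item)
    return done


stop = threading.Event()
seen: List[str] = []


def process(item: str) -> None:
    seen.append(item)
    if len(seen) == 3:  # stand-in for a monitor noticing a cap was hit
        stop.set()


print(run_with_kill_switch(["a", "b", "c", "d", "e"], process, stop))  # ['a', 'b', 'c']
```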

---

## Related

- [[threshold-triggered-automation]] - What gets triggered at scale
- [[validation-loop]] - Verify at each expansion
- [[infrastructure-over-suggestion]] - Safeguards are structural, not behavioral

---

*proto-030 | Pre-Scale Safety Audit | 2026-01-15*