test-before-ship.md
1 # Test Before Ship 2 3 *proto-029 | Operational pattern for validating artifacts before committing* 4 5 --- 6 7 - **principle** 8 - "Run the artifact before committing. Verify it works, not just compiles." 9 - "The commit is the claim. The test is the evidence." 10 11 - **shape** 12 - Build/write the artifact 13 - Run it (not just syntax check) 14 - Verify output matches expectations 15 - Then commit with evidence of testing 16 17 --- 18 19 ## The Insight 20 21 Compiling is not testing. Syntax-correct is not working. 22 23 ``` 24 WEAK PATTERN: 25 Write code → Commit → "It should work" 26 ↓ 27 Discover failures later 28 29 TEST BEFORE SHIP: 30 Write code → Run it → Verify output → Commit 31 ↓ 32 Evidence in commit history 33 ``` 34 35 The cost of testing before commit: 30 seconds. 36 The cost of shipping broken code: hours of debugging, lost trust. 37 38 --- 39 40 ## The Verification Ladder 41 42 | Level | What You Verified | Confidence | 43 |-------|-------------------|------------| 44 | 0. Nothing | Just committed | 0% | 45 | 1. Syntax | No parse errors | 10% | 46 | 2. Imports | Dependencies resolve | 30% | 47 | 3. --help | CLI at least starts | 50% | 48 | 4. Basic run | Happy path works | 70% | 49 | 5. Edge cases | Handles errors | 90% | 50 | 6. Integration | Works with other components | 95% | 51 52 **Minimum:** Level 4 (basic run) before any commit. 53 54 --- 55 56 ## Implementation 57 58 ### For Scripts 59 ```bash 60 # Write the script 61 vim scripts/my_script.py 62 63 # Test it 64 python3 scripts/my_script.py --help 65 python3 scripts/my_script.py test-input.json 66 67 # Verify output 68 # ... check it looks right ... 69 70 # Then commit 71 git add scripts/my_script.py 72 git commit -m "Add my_script with tested CLI" 73 ``` 74 75 ### For Functions 76 ```python 77 # After writing the function, test immediately 78 def my_function(x): 79 return x * 2 80 81 # Quick verification 82 assert my_function(5) == 10 83 assert my_function(0) == 0 84 print("Tests pass") 85 86 # Then commit 87 ``` 88 89 ### For Infrastructure 90 ```bash 91 # After modifying infrastructure 92 vim core/consciousness/first_officer.py 93 94 # Run the tests 95 python3 -m pytest tests/test_first_officer.py 96 97 # Or run the component 98 python3 -c "from core.consciousness.first_officer import FirstOfficer; print('Import OK')" 99 100 # Then commit 101 ``` 102 103 --- 104 105 ## What "Run It" Means 106 107 | Artifact Type | Minimum Test | 108 |--------------|--------------| 109 | CLI script | `script.py --help` + one real invocation | 110 | Library function | Import + basic call with known output | 111 | Config file | Load it + verify parsed correctly | 112 | Integration | Run dependent component | 113 | Documentation | Render it + verify links work | 114 115 --- 116 117 ## The Commit Message Evidence 118 119 Good commit messages reference the testing: 120 121 ``` 122 Add speed run protocol with tribe architecture 123 124 - Tested: python3 speed_run.py session.jsonl --json 125 - Verified: 252 segments parsed, 26 workers spawned 126 - Edge case: Empty transcript returns gracefully 127 128 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> 129 ``` 130 131 This is evidence, not just description. 132 133 --- 134 135 ## Common Failures 136 137 ### "I'll Test Later" 138 ``` 139 Commit without testing 140 ↓ 141 Context switch to new task 142 ↓ 143 Bug discovered days later 144 ↓ 145 Forgot how it was supposed to work 146 ``` 147 148 ### "It Worked Last Time" 149 ``` 150 Small change, didn't test 151 ↓ 152 Small change broke something 153 ↓ 154 Not caught until integration 155 ``` 156 157 ### "Tests Pass" (But Not Run) 158 ``` 159 Tests exist 160 ↓ 161 Didn't run them before commit 162 ↓ 163 Tests would have caught the bug 164 ``` 165 166 --- 167 168 ## Axiom Alignment 169 170 | Axiom | Alignment | 171 |-------|-----------| 172 | **A2 (Life)** | Working code (tested) beats static code (untested) | 173 | **A4 (Ergodicity)** | Shipping untested code risks compounding failures | 174 | **A0 (Boundary)** | Test defines boundary between "works" and "might work" | 175 176 --- 177 178 ## Instances 179 180 ### Positive Instance: Speed Run CLI 181 - **Context:** Built 2000+ line speed_run.py 182 - **Test:** `python3 speed_run.py --help` then real transcript 183 - **Verification:** Saw 252 segments, 26 workers, output formatted 184 - **Then:** Committed 185 - **Outcome:** ✓ Known working at commit time 186 187 ### Positive Instance: Fixer Spawning 188 - **Context:** Added --spawn-fixers flag 189 - **Test:** Ran with flag, verified files created 190 - **Verification:** Checked manifest.json, prompt files exist 191 - **Then:** Committed 192 - **Outcome:** ✓ Feature verified before ship 193 194 ### Negative Instance: Untested Edit 195 - **Context:** Quick edit to existing function 196 - **Skip:** Didn't run tests 197 - **Result:** Typo broke import 198 - **Caught:** Next session, delayed work 199 - **Outcome:** ✗ 30-second test would have caught it 200 201 --- 202 203 ## The Rule 204 205 > **Never commit code you haven't run.** 206 > 207 > Syntax highlighting is not testing. 208 > "Should work" is not working. 209 > Run it. See output. Then commit. 210 211 --- 212 213 ## Related 214 215 - [[validation-loop]] - Testing after fixes specifically 216 - [[infrastructure-over-suggestion]] - Build tests, not just docs 217 - [[build-test-ground-protocol]] - Testing patterns for complex artifacts 218 219 --- 220 221 *proto-029 | Test Before Ship | 2026-01-15*