# Validation Loop

*proto-027 | Operational pattern for verifying fixes actually worked*

---

- **principle**
  - "After any fix, re-run the check to verify improvement. Report the delta."
  - "Trust but verify. The fix isn't done until validation passes."

- **shape**
  - Capture baseline metrics before fix
  - Apply fix
  - Re-run same measurement
  - Compare before/after
  - Report delta (not just "done")

---

## The Insight

"Fixed" is a claim. "Violations reduced from 57 to 0" is evidence.

```
WEAK PATTERN:
  Fix applied → "Done"
       ↓
  How do we know it worked?

VALIDATION LOOP:
  Measure (before) → Fix → Measure (after) → Report delta
                                                  ↓
                                    "Violations: 57 → 12 (-45)"
```

The validation loop closes the feedback cycle. Without it, you're hoping.

---

## Implementation

```python
from pathlib import Path
from typing import Any, Dict

# SpeedRunResult, parse_transcript, evaluate_all_segments,
# categorize_violations, and ISSUE_TYPES come from the surrounding project.


def validate_fixes(
    original_result: SpeedRunResult,
    transcript_path: Path,
    current_rules: Dict
) -> Dict[str, Any]:
    """
    Re-run compliance check to validate fixes worked.
    Compares before/after violation counts.
    """
    # Re-run the same check
    segments = parse_transcript(transcript_path)
    segments, new_summary = evaluate_all_segments(segments, current_rules)

    # Compare
    comparison = {
        "before": {
            "total_violations": len(original_result.protocol_violations),
            "by_type": original_result.issues_by_type,
        },
        "after": {
            "total_violations": new_summary["total_violations"],
            "by_type": categorize_violations(new_summary["violations"]),
        },
        "delta": {},
        "validation_passed": False,
    }

    # Calculate deltas per type (negative means fewer violations)
    for issue_type in ISSUE_TYPES:
        before = comparison["before"]["by_type"].get(issue_type, 0)
        after = comparison["after"]["by_type"].get(issue_type, 0)
        comparison["delta"][issue_type] = after - before

    # Pass if violations decreased or stayed same
    # (a zero delta passes here; Failure Modes treats it as inconclusive)
    comparison["validation_passed"] = (
        comparison["after"]["total_violations"] <=
        comparison["before"]["total_violations"]
    )

    return comparison
```

---

## The Report Format

Always report the delta, not just pass/fail:

```markdown
## Fix Validation ✓

| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| **Total Violations** | 57 | 12 | -45 |
| **Avg Compliance** | 85% | 97% | +12% |
| **Free Energy** | 0.58 | 0.12 | -0.46 |

### By Issue Type

| Issue Type | Before | After | Delta |
|------------|--------|-------|-------|
| phoenix_update | 57 | 0 | -57 ↓ |
| hygiene_check | 23 | 12 | -11 ↓ |

**Result: VALIDATION PASSED**
```

---

## When to Apply

| Scenario | Validation |
|----------|------------|
| Fixer agent ran | Re-run speed run, compare violations |
| Bug fix deployed | Re-run tests, compare pass rate |
| Performance fix | Re-run benchmark, compare times |
| Config change | Re-run health check, compare status |

**Rule:** If you measured a problem, re-measure after fixing.
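
The scenarios above all share one shape: measure, fix, re-run the same measurement, report the delta. A minimal generic sketch of that shape follows. The names `Delta`, `validation_loop`, and `report` are hypothetical and not part of the speed-run code, and the failing-test counter is toy data used only to exercise the loop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Delta:
    """Before/after reading of a single metric (hypothetical helper)."""
    name: str
    before: float
    after: float

    @property
    def change(self) -> float:
        # Signed difference: negative means the metric went down.
        return self.after - self.before


def validation_loop(
    name: str,
    measure: Callable[[], float],
    apply_fix: Callable[[], None],
) -> Delta:
    """Measure, apply the fix, re-run the same measurement, return the delta."""
    before = measure()  # baseline captured before the fix
    apply_fix()
    after = measure()   # same measurement, re-run after the fix
    return Delta(name, before, after)


def report(delta: Delta) -> str:
    """Render the delta as evidence instead of a bare 'done'."""
    return f"{delta.name}: {delta.before:g} → {delta.after:g} ({delta.change:+g})"


# Toy usage: a stand-in "test suite" whose failure count the fix reduces.
failing = {"count": 57}
d = validation_loop(
    "Failing tests",
    measure=lambda: failing["count"],
    apply_fix=lambda: failing.update(count=12),
)
print(report(d))  # Failing tests: 57 → 12 (-45)
```

The sketch deliberately reports only the signed change and leaves the pass/fail judgment to the caller, since whether a lower or a higher number counts as improvement depends on the metric (violations and benchmark times versus pass rate).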

---

## Failure Modes

### Validation Failed (Violations Increased)
```
Before: 57 violations
After:  63 violations
Delta:  +6 ✗

Action: Fixer made things worse. Investigate.
```

### Validation Inconclusive (Same)
```
Before: 57 violations
After:  57 violations
Delta:  0 →

Action: Fix had no effect. Different approach needed.
```

### Validation Passed (Decreased)
```
Before: 57 violations
After:  12 violations
Delta:  -45 ↓ ✓

Action: Fix worked. Remaining 12 may need different treatment.
```

---

## Axiom Alignment

| Axiom | Alignment |
|-------|-----------|
| **A0 (Boundary)** | Validation defines success/failure boundary |
| **A2 (Life)** | Primitive verification beats ornamental "done" claims |
| **A3 (Navigation)** | Delta tells you which direction you moved |
| **A4 (Ergodicity)** | Catching failed fixes prevents compounding damage |

---

## Instances

### Positive Instance: Speed Run Fixer Validation
- **Context:** Ran fixers for phoenix/hygiene violations
- **Before:** 282 total violations
- **After:** Re-ran speed run
- **Validation:** Compared by type, reported delta per category
- **Outcome:** ✓ Clear evidence of fix effectiveness

### Negative Instance: "Fixed" Without Verification
- **Context:** Applied fix, reported "done"
- **Problem:** No re-measurement
- **Risk:** Fix may not have worked, may have regressed
- **Outcome:** ✗ Unknown state, false confidence

---

## Related

- [[threshold-triggered-automation]] - Validation after automated fixes
- [[test-before-ship]] - Validation before committing
- [[free-energy-alignment]] - F score is a form of validation

---

*proto-027 | Validation Loop | 2026-01-15*