Cradicle Explorer

/ patterns / validation-loop.md
validation-loop.md
  1  # Validation Loop
  2  
  3  *proto-027 | Operational pattern for verifying fixes actually worked*
  4  
  5  ---
  6  
  7  - **principle**
  8    - "After any fix, re-run the check to verify improvement. Report the delta."
  9    - "Trust but verify. The fix isn't done until validation passes."
 10  
 11  - **shape**
 12    - Capture baseline metrics before fix
 13    - Apply fix
 14    - Re-run same measurement
 15    - Compare before/after
 16    - Report delta (not just "done")
 17  
 18  ---
 19  
 20  ## The Insight
 21  
 22  "Fixed" is a claim. "Violations reduced from 57 to 0" is evidence.
 23  
 24  ```
 25  WEAK PATTERN:
 26  Fix applied → "Done"
 27                ↓
 28      How do we know it worked?
 29  
 30  VALIDATION LOOP:
 31  Measure (before) → Fix → Measure (after) → Report delta
 32                                                 ↓
 33                                      "Violations: 57 → 12 (-45)"
 34  ```
 35  
 36  The validation loop closes the feedback cycle. Without it, you're hoping.
 37  
 38  ---
 39  
 40  ## Implementation
 41  
 42  ```python
 43  def validate_fixes(
 44      original_result: SpeedRunResult,
 45      transcript_path: Path,
 46      current_rules: Dict
 47  ) -> Dict[str, Any]:
 48      """
 49      Re-run compliance check to validate fixes worked.
 50      Compares before/after violation counts.
 51      """
 52      # Re-run the same check
 53      segments = parse_transcript(transcript_path)
 54      segments, new_summary = evaluate_all_segments(segments, current_rules)
 55  
 56      # Compare
 57      comparison = {
 58          "before": {
 59              "total_violations": len(original_result.protocol_violations),
 60              "by_type": original_result.issues_by_type,
 61          },
 62          "after": {
 63              "total_violations": new_summary["total_violations"],
 64              "by_type": categorize_violations(new_summary["violations"]),
 65          },
 66          "delta": {},
 67          "validation_passed": False,
 68      }
 69  
 70      # Calculate deltas per type
 71      for issue_type in ISSUE_TYPES:
 72          before = comparison["before"]["by_type"].get(issue_type, 0)
 73          after = comparison["after"]["by_type"].get(issue_type, 0)
 74          comparison["delta"][issue_type] = after - before
 75  
 76      # Pass if violations decreased or stayed same
 77      comparison["validation_passed"] = (
 78          comparison["after"]["total_violations"] <=
 79          comparison["before"]["total_violations"]
 80      )
 81  
 82      return comparison
 83  ```
 84  
 85  ---
 86  
 87  ## The Report Format
 88  
 89  Always report the delta, not just pass/fail:
 90  
 91  ```markdown
 92  ## Fix Validation ✓
 93  
 94  | Metric | Before | After | Delta |
 95  |--------|--------|-------|-------|
 96  | **Total Violations** | 57 | 12 | -45 |
 97  | **Avg Compliance** | 85% | 97% | +12% |
 98  | **Free Energy** | 0.58 | 0.12 | -0.46 |
 99  
100  ### By Issue Type
101  
102  | Issue Type | Before | After | Delta |
103  |------------|--------|-------|-------|
104  | phoenix_update | 57 | 0 | -57 ↓ |
105  | hygiene_check | 23 | 12 | -11 ↓ |
106  
107  **Result: VALIDATION PASSED**
108  ```
109  
110  ---
111  
112  ## When to Apply
113  
114  | Scenario | Validation |
115  |----------|------------|
116  | Fixer agent ran | Re-run speed run, compare violations |
117  | Bug fix deployed | Re-run tests, compare pass rate |
118  | Performance fix | Re-run benchmark, compare times |
119  | Config change | Re-run health check, compare status |
120  
121  **Rule:** If you measured a problem, re-measure after fixing.
122  
123  ---
124  
125  ## Failure Modes
126  
127  ### Validation Failed (Violations Increased)
128  ```
129  Before: 57 violations
130  After: 63 violations
131  Delta: +6 ✗
132  
133  Action: Fixer made things worse. Investigate.
134  ```
135  
136  ### Validation Inconclusive (Same)
137  ```
138  Before: 57 violations
139  After: 57 violations
140  Delta: 0 →
141  
142  Action: Fix had no effect. Different approach needed.
143  ```
144  
145  ### Validation Passed (Decreased)
146  ```
147  Before: 57 violations
148  After: 12 violations
149  Delta: -45 ↓ ✓
150  
151  Action: Fix worked. Remaining 12 may need different treatment.
152  ```
153  
154  ---
155  
156  ## Axiom Alignment
157  
158  | Axiom | Alignment |
159  |-------|-----------|
160  | **A0 (Boundary)** | Validation defines success/failure boundary |
161  | **A2 (Life)** | Primitive verification beats ornamental "done" claims |
162  | **A3 (Navigation)** | Delta tells you which direction you moved |
163  | **A4 (Ergodicity)** | Catching failed fixes prevents compounding damage |
164  
165  ---
166  
167  ## Instances
168  
169  ### Positive Instance: Speed Run Fixer Validation
170  - **Context:** Ran fixers for phoenix/hygiene violations
171  - **Before:** 282 total violations
172  - **After:** Re-ran speed run
173  - **Validation:** Compared by type, reported delta per category
174  - **Outcome:** ✓ Clear evidence of fix effectiveness
175  
176  ### Negative Instance: "Fixed" Without Verification
177  - **Context:** Applied fix, reported "done"
178  - **Problem:** No re-measurement
179  - **Risk:** Fix may not have worked, may have regressed
180  - **Outcome:** ✗ Unknown state, false confidence
181  
182  ---
183  
184  ## Related
185  
186  - [[threshold-triggered-automation]] - Validation after automated fixes
187  - [[test-before-ship]] - Validation before committing
188  - [[free-energy-alignment]] - F score is a form of validation
189  
190  ---
191  
192  *proto-027 | Validation Loop | 2026-01-15*