error-detection-layers.md
# Error Detection Layers

*proto-010 | Belt, suspenders, AND a backup parachute*

---

- **principle**
    - "Multiple tiers of error catching - checkers (cheap) to tribes (expensive)."
    - "Catch errors at the lowest level possible."

- **shape**
    - Layer 1: Self-check (systematic)
    - Layer 2: AI peer review (different perspective)
    - Layer 3: [[first-officer-protocol|First Officer]] (pattern detection)
    - Layer 4: Human spot-check (last resort, highest trust cost)

---

## The Problem

> **You can't escalate what you don't know is wrong.**

The Rumsfeld matrix:
- Known knowns → Can verify
- Known unknowns → Can ask
- Unknown knowns → Implicit assumptions (hard)
- Unknown unknowns → The danger zone

**Most trust-damaging errors are unknown unknowns or unknown knowns.**

---

## Detection Layers

```
┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 4: HUMAN SPOT-CHECK (Last resort)                            │
│  - Random sampling by operator                                      │
│  - Pattern detection over time                                      │
│  - "You missed the scale" feedback                                  │
│  - HIGHEST TRUST COST when errors caught here                       │
└─────────────────────────────────────────────────────────────────────┘
                                   ↑
                  errors that escape all other layers
                                   ↑
┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 3: FIRST OFFICER (Pattern detection)                         │
│  - Watches output stream                                            │
│  - Detects: "Numbers without scales appearing repeatedly"           │
│  - Detects: "Claims made without file writes"                       │
│  - Cross-thread pattern matching                                    │
│  - Statistical anomaly detection                                    │
└─────────────────────────────────────────────────────────────────────┘
                                   ↑
                    errors with detectable patterns
                                   ↑
┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 2: AI PEER REVIEW (Second set of eyes)                       │
│  - Secondary Claude reviews primary output                          │
│  - Specific review lenses:                                          │
│    • Completeness: "Is anything missing?"                           │
│    • Consistency: "Does this contradict prior?"                     │
│    • Context: "Would reader understand without assumptions?"        │
│    • Execution: "Was this actually built or just discussed?"        │
│  - Divergence = flag for attention                                  │
│  - Agreement = higher confidence                                    │
└─────────────────────────────────────────────────────────────────────┘
                                   ↑
               errors catchable by different perspective
                                   ↑
┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 1: SELF-CHECK (Systematic, not ad-hoc)                       │
│  - Pre-output checklist                                             │
│  - "What assumptions am I making?"                                  │
│  - "What could I be wrong about?"                                   │
│  - Pattern-specific checks (scales, file writes, etc.)              │
└─────────────────────────────────────────────────────────────────────┘
                                   ↑
                 errors catchable by systematic review
                                   ↑
┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 0: PRIMARY OUTPUT                                            │
│  - The actual work product                                          │
│  - Contains both value and potential errors                         │
└─────────────────────────────────────────────────────────────────────┘
```

---

## Layer 1: Self-Check Protocol

### Pre-Output Checklist

Before sending any substantive response:

```markdown
□ NUMBERS: Every number has scale/units in parenthetical?
□ CLAIMS: Every "I did X" verified with actual action?
□ FILES: If discussed building, did I write the file?
□ CONTEXT: Would someone without our history understand?
□ ASSUMPTIONS: What am I assuming the human knows?
□ CONSISTENCY: Does this contradict anything I said before?
□ COMPLETENESS: Did I address all parts of their message?
```

### Self-Reflection Prompt

Ask internally:
- "What assumptions am I making right now?"
- "What could be wrong with this output?"
- "What would a reviewer flag?"

**Limitation:** Can't catch what I don't know to check for.

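
Two items on the pre-output checklist, NUMBERS and FILES, are mechanical enough to automate. The sketch below is one possible way to do that in Python, not an implementation that proto-010 prescribes. The function names `unscaled_numbers` and `unwritten_files` are hypothetical, and the regex is a crude heuristic: it assumes a number is "scaled" only when a parenthetical follows it directly or a unit is attached with no space (as in `40ms`).

```python
import re
from pathlib import Path

# Assumption: a bare number is digits with an optional decimal part,
# not glued to letters (so "40ms" is treated as already carrying a unit).
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\w)")
# Assumption: a scale/unit note is a simple, non-nested parenthetical.
PAREN = re.compile(r"\([^()]*\)")


def unscaled_numbers(text: str) -> list[str]:
    """NUMBERS check: return numbers with no parenthetical right after them."""
    paren_spans = [m.span() for m in PAREN.finditer(text)]
    flagged = []
    for m in NUMBER.finditer(text):
        # Numbers inside a parenthetical are part of a scale note; skip them.
        inside = any(start <= m.start() < end for start, end in paren_spans)
        # A number passes when "(" follows it, optionally after whitespace.
        followed = re.match(r"\s*\(", text[m.end():]) is not None
        if not inside and not followed:
            flagged.append(m.group())
    return flagged


def unwritten_files(claimed_paths: list[str]) -> list[str]:
    """FILES check: return claimed paths that do not exist on disk."""
    return [p for p in claimed_paths if not Path(p).exists()]


if __name__ == "__main__":
    draft = "Gravity well strength is 0.8 (on a 0-1 scale) and drift is 12."
    print(unscaled_numbers(draft))
```

On the sample draft, `0.8` passes because a parenthetical follows it, and `12` is flagged. The heuristic also flags harmless numbers such as the `1` in "Layer 1", so it fits better as a prompt for a second look than as a hard gate. The Limitation above still applies: this only catches patterns someone already thought to encode.
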

---

## Layer 2: AI Peer Review

### The Protocol

```
PRIMARY CLAUDE                    REVIEW CLAUDE
───────────────                   ─────────────
Generates response      →         Reviews with lens:
                                  - Completeness
                                  - Consistency
                                  - Context assumptions
                                  - Execution verification

                        ←         Returns:
                                  - PASS (no issues)
                                  - FLAG (potential issues)
                                  - BLOCK (definite problems)
```

### Review Lenses

| Lens | Question | Catches |
|------|----------|---------|
| **Completeness** | "Is anything missing that should be there?" | Missing scales, incomplete lists |
| **Consistency** | "Does this contradict prior statements?" | Contradictions, changed positions |
| **Context** | "Would reader need unstated knowledge?" | Implicit assumptions |
| **Execution** | "Was promised action actually taken?" | Said vs. did gap |
| **Adversarial** | "How could this be wrong?" | Blind spots, edge cases |

### When to Use

| Situation | Review Level |
|-----------|--------------|
| Simple response | Skip (overhead not worth it) |
| New pattern/document | Light review (completeness) |
| Architectural decision | Full review (all lenses) |
| High stakes output | Full review + adversarial |

### Cost-Benefit

```
REVIEW THOROUGHNESS
     ↑
HIGH │  Full review      │  Diminishing returns
     │  (all lenses)     │  here
     │  ─────────────────────────────────────────
     │  Light review     │  Sweet spot for
     │  (completeness)   │  most outputs
     │  ─────────────────────────────────────────
LOW  │  No review        │  Fast but risky
     │                   │
     └──────────────────────────────────────────→
                                        ERROR RISK
```

---

## Layer 3: First Officer Pattern Detection

The First Officer doesn't review individual outputs - it watches the **stream** for patterns:

### Detection Rules

| Pattern | Detection | Action |
|---------|-----------|--------|
| Numbers without scales | Count occurrences | Alert at threshold |
| Claims without file writes | Compare said vs. files | Alert immediately |
| Repeated topics | Track frequency | Flag as potential attractor |
| Contradictions | Compare statements over time | Alert on detection |
| Complexity creep | Monitor response length | Warn on inflation |

### Statistical Process Control

```
ERROR RATE TRACKING

Errors
  │
  │     UCL (Upper Control Limit)
  │  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  │           *
  │     *   *     *
  │  ─────────────────────── Mean
  │   *   *    *
  │              *
  │  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  │     LCL (Lower Control Limit)
  └────────────────────────────→ Time

When error rate exceeds UCL → systematic problem
When single error is severe → immediate flag
```

---

## Layer 4: Human Spot-Check

### The Last Resort

When errors reach the human:
- Highest trust cost
- But also: most valuable feedback
- Reveals blind spots in Layers 1-3

### Making It Efficient

Human shouldn't review everything. Instead:

```
SAMPLING STRATEGY

High stakes    → Full human review (always)
Medium stakes  → AI review + human spot-check (random)
Low stakes     → AI review only (trust the system)
Patterns       → When First Officer flags, human investigates
```

### Feedback Loop

```
Human catches error
        │
        ▼
Log in TRUST-ERROR-LOG
        │
        ▼
Root cause analysis
        │
        ├──→ Layer 1 gap? → Add to checklist
        ├──→ Layer 2 gap? → Add review lens
        └──→ Layer 3 gap? → Add detection rule
```

Every human-caught error should strengthen lower layers.

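
The sampling strategy, the First Officer's control-limit rule, and the feedback routing can be sketched together in a few lines of Python. Everything named here is an assumption made for illustration: the `Stakes` enum, the 10% default spot-check rate, and the choice of mean plus three population standard deviations for the UCL. The document names a UCL but gives neither a formula nor a sampling rate.

```python
import random
from enum import Enum
from statistics import mean, pstdev


class Stakes(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def exceeds_ucl(history: list[float], latest: float, sigmas: float = 3.0) -> bool:
    """First Officer rule: is the latest error rate above the upper control limit?

    Assumption: UCL = mean + sigmas * population standard deviation.
    """
    ucl = mean(history) + sigmas * pstdev(history)
    return latest > ucl


def needs_human_review(
    stakes: Stakes,
    first_officer_flag: bool = False,
    spot_check_rate: float = 0.1,  # hypothetical default, tune per operator
) -> bool:
    """Sampling strategy: decide whether an output goes to the human."""
    if stakes is Stakes.HIGH or first_officer_flag:
        return True  # always reviewed, or a flagged pattern to investigate
    if stakes is Stakes.MEDIUM:
        return random.random() < spot_check_rate  # random spot-check
    return False  # low stakes: AI review only


# Feedback loop: which fix follows from the layer where the gap was found.
LAYER_FIXES = {
    1: "Add to checklist",
    2: "Add review lens",
    3: "Add detection rule",
}


def route_fix(gap_layer: int) -> str:
    """Map the root-cause layer of a human-caught error to its corrective action."""
    return LAYER_FIXES[gap_layer]
```

For example, `needs_human_review(Stakes.LOW, first_officer_flag=exceeds_ucl(rates, latest))` sends a low-stakes output to the human only when the error rate has left its normal band. Keeping `route_fix` as a plain lookup keeps root cause analysis a human judgment, with the code recording only the consequence.
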

---

## The Gravity Well Error: Post-Mortem

**What happened:**
- Output numbers (gravity well strengths) without scales
- Human caught it

**Why each layer missed it:**

| Layer | Why It Missed |
|-------|---------------|
| Layer 1 (Self-check) | No explicit "scales" item in mental checklist |
| Layer 2 (AI review) | Not running - single instance |
| Layer 3 (First Officer) | Not implemented yet |
| Layer 4 (Human) | CAUGHT IT ✓ |

**Corrective actions:**

| Layer | Fix |
|-------|-----|
| Layer 1 | Added "NUMBERS: scales in parenthetical?" to checklist |
| Layer 2 | Design AI review protocol (this document) |
| Layer 3 | Add detection rule: "numbers without parenthetical context" |
| Layer 4 | Error logged, feeding back to improve lower layers |

---

## Implementation Status

| Layer | Status | Next Step |
|-------|--------|-----------|
| Layer 1 (Self-check) | 🚧 Partial | Formalize checklist |
| Layer 2 (AI review) | 📄 Designed | Need multi-instance infra |
| Layer 3 (First Officer) | 📄 Designed | Need implementation |
| Layer 4 (Human) | ⚡ Active | Optimize sampling |

---

## The Unknown Unknowns Problem

**Honest answer:** You can't fully solve unknown unknowns. But you can:

1. **Maximize coverage** - More lenses = more blind spots covered
2. **Diverse perspectives** - AI council has different blind spots than single instance
3. **Pattern detection** - Unknown unknowns often have detectable signatures
4. **Feedback loops** - Every caught error trains the system
5. **Humility** - Assume errors exist, design for detection

> **The goal isn't zero errors. The goal is catching errors before the human does.**

---

## The Promise

> **Errors happen. Catching them is the job.**
>
> Layer 1: Self-check before output
> Layer 2: AI peer review for fresh eyes
> Layer 3: First Officer for patterns
> Layer 4: Human for what escapes
>
> Every human-caught error feeds back to strengthen lower layers.
> The system learns from its failures.

---

## Related

- **axioms**
    - [[A0 Boundary Operation]] - each layer is a boundary for error detection
    - [[A1 Telos of Integration]] - layers integrate to provide coverage
    - [[A2 Recognition of Life]] - catch calcification (recurring errors)
- **protocols**
    - [[peer-review-protocol]] - Layer 2 implementation
        - shape:: "Workers check each other's output. Multiple perspectives reduce blindspots."
    - [[first-officer-protocol]] - Layer 3 implementation
        - shape:: "Per-thread metacognition. Compress state, track gravity wells, flag drift."
    - [[axiom-conformance-test]] - Layer 1 self-check
        - shape:: "Runtime test that estimates Free Energy (F)."
    - [[fractal-tribe-architecture]] - multi-level error catching
        - shape:: "Same pattern at every level. Checkers → workers → tribes."
    - [[model-allocation-strategy]] - Haiku checkers vs Opus synthesis
        - shape:: "Match model capability to task complexity."
- **enables**
    - [[trust-as-free-energy]] - errors caught early preserve trust
        - shape:: "Trust measured as inverse of accumulated deviation."

---

*proto-010 | Error Detection Layers | Catch Before Human Does*