# Peer Review Protocol

*proto-011 | Fresh eyes catch what you miss*

---

- **principle**
  - "Workers check each other's output before it goes to the user."
  - "Multiple perspectives reduce blind spots."

- **shape**
  - The author has blind spots; the reviewer has different blind spots
  - Coverage through diversity of perspective
  - Review lenses: Completeness, Consistency, Context, Execution, Adversarial
  - Divergence = flag; agreement = higher confidence

---

**Status:** ACTIVE

---

## Core Principle

> **The author has blind spots. A reviewer has different blind spots. Coverage through diversity.**

This is Layer 2 of the [[error-detection-layers|Error Detection system]].

---

## When to Trigger Peer Review

| Situation | Review Level |
|-----------|--------------|
| Simple response | Skip |
| New pattern/document | Light (completeness) |
| Multiple related files | Standard (all lenses) |
| Architectural decisions | Full + adversarial |
| End of major session | Full session review |

**Trigger command:** "Test AI peer review on [scope]"

---

## Review Lenses

| Lens | Question | Catches |
|------|----------|---------|
| **COMPLETENESS** | Is anything missing? | Missing scales, undefined terms |
| **CONSISTENCY** | Do files agree with each other? | Conflicting definitions, numbers |
| **CONTEXT** | Would a new reader understand? | Implicit assumptions, jargon |
| **EXECUTION** | Is claimed work actually done? | Said vs. did gap |
| **ADVERSARIAL** | How could this be wrong? | Edge cases, blind spots |

---

## Review Output Format

```markdown
## PEER REVIEW RESULTS

### PASS (No Issues)
- [Items that passed all lenses]

### FLAG (Potential Issues)
- [Issue]: [Lens] - [Description]

### BLOCK (Definite Problems)
- [Issue]: [Lens] - [Description]

### SUMMARY
- Total issues: X
- Severity: X FLAG, X BLOCK
- Recommendation: APPROVE / APPROVE WITH FIXES / REJECT
```

---

## Escalation Ladder

```
LEVEL 1: SINGLE REVIEWER
─────────────────────────
One Claude instance reviews
Good for: Routine checks, single documents
Catches: ~60-70% of issues

    │
    │ If issues found OR high stakes
    ▼

LEVEL 2: DUAL REVIEWER
─────────────────────────
Two Claude instances review independently
Compare findings
Good for: Important documents, multiple files
Catches: ~80-85% of issues

    │
    │ If reviewers disagree OR architectural
    ▼

LEVEL 3: TRIBAL REVIEW (Council)
─────────────────────────────────
3+ Claude instances review
Deliberation on findings
Good for: System-wide changes, axiom-level
Catches: ~90%+ of issues

    │
    │ If council can't resolve
    ▼

LEVEL 4: HUMAN + COUNCIL
─────────────────────────
Human reviews council findings
Final arbitration
Good for: Novel situations, fundamental questions
```

---

## Tribal Review Protocol

When escalating to Level 3:

### 1. Scope Definition
```markdown
## Tribal Review Request

**Scope:** [What's being reviewed]
**Files:** [List of files]
**Stakes:** [Why this matters]
**Question:** [What specifically to evaluate]
```

### 2. Architecture: First Officer as Real-Time Monitor

```
┌──────────────────────────────────────────────────────────────────┐
│                        OPUS FIRST OFFICER                        │
│                     (Watching continuously)                      │
│                                                                  │
│  For each reviewer output:                                       │
│  ├── Read finding                                                │
│  ├── Score independently (is this a real issue? 0-1)             │
│  ├── Track convergence (did others find same thing?)             │
│  └── Weight by reasoning quality                                 │
│                                                                  │
│  Real-time state:                                                │
│  ├── Convergence map (what issues are multiple reviewers seeing) │
│  ├── Divergence flags (where do reviewers disagree)              │
│  └── Quality scores (which reviewer is most rigorous)            │
└──────────────────────────────────────────────────────────────────┘
        │                    │                    │
        │                    │                    │
   ┌────┴────┐          ┌────┴────┐          ┌────┴────┐
   │ SONNET  │          │ SONNET  │          │ SONNET  │
   │Reviewer │          │Reviewer │          │Reviewer │
   │    1    │          │    2    │          │    3    │
   │         │          │         │          │         │
   │Complete-│          │Consist- │          │Adver-   │
   │ness     │          │ency     │          │sarial   │
   └─────────┘          └─────────┘          └─────────┘
```

**Key insight:** The First Officer scores findings independently; it doesn't just trust reviewer verdicts.

### 3. Independent Review Phase
- Each council member (Sonnet) reviews independently
- The First Officer (Opus) reads each output as it is produced
- The First Officer scores each finding: "Is this actually an issue? (0-1)"
- No communication between reviewers

### 4. Real-Time Convergence Tracking

The First Officer maintains:
```markdown
## Convergence State (Live)

| Finding | R1 | R2 | R3 | Opus Score | Status |
|---------|----|----|----|-----------:|--------|
| Issue X | FLAG | FLAG | - | 0.9 | CONVERGING |
| Issue Y | BLOCK | - | BLOCK | 0.95 | CONVERGING |
| Issue Z | FLAG | PASS | PASS | 0.3 | LIKELY FALSE POSITIVE |
```

### 5. Synthesis (End)

The First Officer produces:
```markdown
## Council Resolution

**Verdict:** [APPROVE / APPROVE WITH FIXES / REJECT]
**Confidence:** [weighted by convergence + Opus independent scoring]

### High-Confidence Issues (convergence + high Opus score)
- [Issue]: Found by [N] reviewers, Opus score: [X]

### Disputed Issues (divergence)
- [Issue]: R1 says X, R2 says Y, Opus assessment: [Z]

### Reviewer Quality
- R1: [quality score] - [notes]
- R2: [quality score] - [notes]
- R3: [quality score] - [notes]

### Required Fixes
1. [Highest confidence issue]
2. [Second highest]
...
```

---

## Integration with Trust_F

Peer review findings affect trust:

| Finding Source | Trust Impact |
|----------------|--------------|
| Human catches error | Full penalty (Trust_F += severity) |
| Peer review catches | Half penalty (caught before human) |
| Self-check catches | No penalty (caught before output) |

**The earlier you catch, the lower the trust cost.**

---

## Practical Implementation

### Single Review (Current)
```
User: "Test AI peer review on this session's output"

Claude: [Spawns Task agent with reviewer prompt]
        [Agent reviews files with all lenses]
        [Returns FLAG/BLOCK/PASS findings]
        [Primary instance fixes issues]
```

### Dual Review (When Needed)
```
User: "Run dual peer review on [scope]"

Claude: [Spawns two Task agents in parallel]
        [Each reviews independently]
        [Compare findings]
        [Flag disagreements for attention]
```

### Tribal Review (High Stakes)
```
User: "Convene tribal review on [scope]"

Claude: [Spawns 3+ Task agents]
        [Independent review phase]
        [Consolidate findings]
        [Deliberation if disagreement]
        [Council resolution]
```

---

## What We Learned Today

**Session peer review found 12 issues:**
- 2 BLOCK (critical)
- 10 FLAG (should fix)

**Key catches:**
- Three incompatible Trust_F formulas
- "EXECUTING" terminology misleading
- Consistency gaps between files
- Undefined terms (dyad, decay rate)
- Gravity wells not sorted

**Proof of value:** Fresh eyes caught what I missed.

---

## The Promise

> **You can't see your own blind spots. That's what makes them blind spots.**
>
> Peer review adds coverage through diversity.
> Tribal review adds confidence through convergence.
> The earlier you catch, the lower the trust cost.
> Multiple perspectives > single perspective.

---

## Related

- **axioms**
  - [[A1 Telos of Integration]] - reviewers integrate perspectives
  - [[A2 Recognition of Life]] - fresh eyes recognize what's dead (blind spots)
  - [[A3 Dynamic Pole Navigation]] - navigate between too much review and too little
- **protocols**
  - [[error-detection-layers]] - peer review is Layer 2
    - shape:: "Multiple tiers of error catching. Catch at the lowest level possible."
  - [[fractal-tribe-architecture]] - tribal review at the tribe level
    - shape:: "Same pattern at every level."
  - [[model-allocation-strategy]] - Sonnet for reviews, Opus for synthesis
    - shape:: "Match model capability to task complexity."
  - [[first-officer-protocol]] - FO does cross-review synthesis
- **enables**
  - [[trust-as-free-energy]] - catching errors early preserves trust
    - shape:: "Trust measured as inverse of accumulated deviation."

---

*proto-011 | Peer Review Protocol | Fresh Eyes Catch What You Miss*
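
---

As a footnote, the convergence tracking in the tribal review and the Trust_F penalty table can be sketched in code. This is a minimal illustration under assumed thresholds (an Opus score of at least 0.7 counts as "high", at most 0.4 as "low", and two raising reviewers count as convergence); the function names and numbers here are illustrative, not part of the protocol:

```python
# Sketch of the First Officer's convergence classification and the
# Trust_F penalty rule. Thresholds and names are assumptions for
# illustration, not defined by the protocol itself.

RAISED = {"FLAG", "BLOCK"}  # verdicts that count as "issue raised"

def classify_finding(verdicts, opus_score, converge_min=2, high=0.7, low=0.4):
    """Classify one finding from independent reviewer verdicts.

    verdicts   -- list of "FLAG", "BLOCK", "PASS", or None (reviewer silent)
    opus_score -- First Officer's independent 0-1 score that the issue is real
    """
    raised = sum(1 for v in verdicts if v in RAISED)
    if raised >= converge_min and opus_score >= high:
        return "CONVERGING"
    if raised <= 1 and opus_score <= low:
        return "LIKELY FALSE POSITIVE"
    return "DISPUTED"

# Trust_F multipliers from the integration table: human catch = full
# penalty, peer review catch = half, self-check catch = none.
TRUST_MULTIPLIER = {"human": 1.0, "peer": 0.5, "self": 0.0}

def trust_penalty(severity, caught_by):
    """Trust_F increment for an error of given severity, by who caught it."""
    return severity * TRUST_MULTIPLIER[caught_by]
```

With the example rows from the convergence table: `classify_finding(["FLAG", "FLAG", None], 0.9)` yields `CONVERGING`, while `classify_finding(["FLAG", "PASS", "PASS"], 0.3)` yields `LIKELY FALSE POSITIVE`.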