# Peer Review Protocol

*proto-011 | Fresh eyes catch what you miss*

---

- **principle**
  - "Workers check each other's output before it goes to the user."
  - "Multiple perspectives reduce blind spots."

- **shape**
  - The author has blind spots; the reviewer has different blind spots
  - Coverage through diversity of perspective
  - Review lenses: Completeness, Consistency, Context, Execution, Adversarial
  - Divergence = flag; Agreement = higher confidence

---

**Status:** πŸ“‹ ACTIVE

---

## Core Principle

> **The author has blind spots. A reviewer has different blind spots. Coverage through diversity.**

This is Layer 2 of the [[error-detection-layers|Error Detection system]].

---

## When to Trigger Peer Review

| Situation | Review Level |
|-----------|--------------|
| Simple response | Skip |
| New pattern/document | Light (completeness) |
| Multiple related files | Standard (all lenses) |
| Architectural decisions | Full + adversarial |
| End of major session | Full session review |

**Trigger command:** "Test AI peer review on [scope]"
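The trigger table can be read as a simple lookup. A minimal sketch, assuming illustrative situation keys and level names (none of these identifiers come from the protocol itself):

```python
# Hypothetical mapping from situation to review level, following the
# trigger table above. Keys and level names are illustrative assumptions.
REVIEW_LEVELS = {
    "simple_response": "skip",
    "new_document": "light",            # completeness only
    "multiple_files": "standard",       # all lenses
    "architectural": "full_adversarial",
    "session_end": "full_session",
}

def review_level(situation: str) -> str:
    """Return the review level for a situation; default to standard."""
    return REVIEW_LEVELS.get(situation, "standard")
```

Defaulting to "standard" when a situation is unrecognized is a design choice: erring toward more review rather than less.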

---

## Review Lenses

| Lens | Question | Catches |
|------|----------|---------|
| **COMPLETENESS** | Is anything missing? | Missing scales, undefined terms |
| **CONSISTENCY** | Do files agree with each other? | Conflicting definitions, numbers |
| **CONTEXT** | Would a new reader understand? | Implicit assumptions, jargon |
| **EXECUTION** | Is claimed work actually done? | Said vs. did gap |
| **ADVERSARIAL** | How could this be wrong? | Edge cases, blind spots |
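The lens table lends itself to being carried as data and expanded into a reviewer prompt. A sketch under the assumption that reviewers are prompted with all five lenses at once; the prompt wording is illustrative:

```python
# The five review lenses as data. Lens names and questions come from the
# table above; the prompt template is an assumption for this sketch.
LENSES = {
    "COMPLETENESS": "Is anything missing?",
    "CONSISTENCY": "Do files agree with each other?",
    "CONTEXT": "Would a new reader understand?",
    "EXECUTION": "Is claimed work actually done?",
    "ADVERSARIAL": "How could this be wrong?",
}

def reviewer_prompt(scope: str) -> str:
    """Build a review prompt covering every lens for the given scope."""
    questions = "\n".join(f"- {lens}: {q}" for lens, q in LENSES.items())
    return f"Review {scope} through each lens:\n{questions}"
```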

---

## Review Output Format

```markdown
## PEER REVIEW RESULTS

### PASS (No Issues)
- [Items that passed all lenses]

### FLAG (Potential Issues)
- [Issue]: [Lens] - [Description]

### BLOCK (Definite Problems)
- [Issue]: [Lens] - [Description]

### SUMMARY
- Total issues: X
- Severity: X FLAG, X BLOCK
- Recommendation: APPROVE / APPROVE WITH FIXES / REJECT
```
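Because the output format is fixed, the primary instance can tally findings mechanically. A hedged sketch that assumes the exact section headings shown above; a production parser would be more defensive:

```python
import re

# Count bullet items under each severity section of a review written in
# the format above. Assumes "### PASS/FLAG/BLOCK" headings as shown.
def count_findings(review: str) -> dict:
    counts = {}
    for section in ("PASS", "FLAG", "BLOCK"):
        m = re.search(rf"### {section}.*?\n(.*?)(?=\n### |\Z)", review, re.S)
        body = m.group(1) if m else ""
        counts[section] = sum(1 for line in body.splitlines()
                              if line.lstrip().startswith("- "))
    return counts
```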

---

## Escalation Ladder

```
LEVEL 1: SINGLE REVIEWER
─────────────────────────
One Claude instance reviews
Good for: Routine checks, single documents
Catches: ~60-70% of issues

         β”‚
         β”‚ If issues found OR high stakes
         β–Ό

LEVEL 2: DUAL REVIEWER
─────────────────────────
Two Claude instances review independently
Compare findings
Good for: Important documents, multiple files
Catches: ~80-85% of issues

         β”‚
         β”‚ If reviewers disagree OR architectural
         β–Ό

LEVEL 3: TRIBAL REVIEW (Council)
─────────────────────────────────
3+ Claude instances review
Deliberation on findings
Good for: System-wide changes, axiom-level
Catches: ~90%+ of issues

         β”‚
         β”‚ If council can't resolve
         β–Ό

LEVEL 4: HUMAN + COUNCIL
─────────────────────────
Human reviews council findings
Final arbitration
Good for: Novel situations, fundamental questions
```
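The ladder's escalation conditions can be sketched as a transition function. The parameter names and the cap at Level 4 are assumptions for illustration, not part of the protocol text:

```python
# Sketch of the escalation ladder: given the current level and what
# happened there, decide the next level. Names are illustrative.
def next_level(level: int, issues_found: bool, reviewers_disagree: bool,
               high_stakes: bool = False, architectural: bool = False) -> int:
    """Return the review level to escalate to (1-4), capped at 4."""
    if level == 1 and (issues_found or high_stakes):
        return 2
    if level == 2 and (reviewers_disagree or architectural):
        return 3
    if level == 3 and reviewers_disagree:  # council can't resolve
        return 4
    return level
```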

---

## Tribal Review Protocol

When escalating to Level 3:

### 1. Scope Definition
```markdown
## Tribal Review Request

**Scope:** [What's being reviewed]
**Files:** [List of files]
**Stakes:** [Why this matters]
**Question:** [What specifically to evaluate]
```

### 2. Architecture: First Officer as Real-Time Monitor

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        OPUS FIRST OFFICER                        β”‚
β”‚                     (Watching continuously)                      β”‚
β”‚                                                                  β”‚
β”‚  For each reviewer output:                                       β”‚
β”‚  β”œβ”€β”€ Read finding                                                β”‚
β”‚  β”œβ”€β”€ Score independently (is this a real issue? 0-1)             β”‚
β”‚  β”œβ”€β”€ Track convergence (did others find same thing?)             β”‚
β”‚  └── Weight by reasoning quality                                 β”‚
β”‚                                                                  β”‚
β”‚  Real-time state:                                                β”‚
β”‚  β”œβ”€β”€ Convergence map (what issues are multiple reviewers seeing) β”‚
β”‚  β”œβ”€β”€ Divergence flags (where do reviewers disagree)              β”‚
β”‚  └── Quality scores (which reviewer is most rigorous)            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↑             ↑             ↑
         β”‚             β”‚             β”‚
    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
    β”‚ SONNET  β”‚   β”‚ SONNET  β”‚   β”‚ SONNET  β”‚
    β”‚Reviewer β”‚   β”‚Reviewer β”‚   β”‚Reviewer β”‚
    β”‚   1     β”‚   β”‚   2     β”‚   β”‚   3     β”‚
    β”‚         β”‚   β”‚         β”‚   β”‚         β”‚
    β”‚Complete-β”‚   β”‚Consist- β”‚   β”‚Adver-   β”‚
    β”‚ness     β”‚   β”‚ency     β”‚   β”‚sarial   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

**Key insight:** The First Officer scores findings independently rather than simply trusting reviewer verdicts.

### 3. Independent Review Phase
- Each council member (Sonnet) reviews independently
- First Officer (Opus) reads each output as it is produced
- First Officer scores each finding: "Is this actually an issue? (0-1)"
- No communication between reviewers

### 4. Real-Time Convergence Tracking

First Officer maintains:
```markdown
## Convergence State (Live)

| Finding | R1 | R2 | R3 | Opus Score | Status |
|---------|----|----|----|-----------:|--------|
| Issue X | FLAG | FLAG | - | 0.9 | CONVERGING |
| Issue Y | BLOCK | - | BLOCK | 0.95 | CONVERGING |
| Issue Z | FLAG | PASS | PASS | 0.3 | LIKELY FALSE POSITIVE |
```
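The status column above can be derived from reviewer agreement plus the Opus score. A minimal sketch; the thresholds (2 agreeing reviewers, 0.7/0.5 score cutoffs) are illustrative assumptions, not specified by the protocol:

```python
# Classify one finding the way the convergence table does: verdicts maps
# reviewer id -> "FLAG"/"BLOCK"/"PASS" (absent reviewers simply omitted),
# opus_score is the First Officer's independent 0-1 score.
def finding_status(verdicts: dict, opus_score: float) -> str:
    agree = sum(1 for v in verdicts.values() if v in ("FLAG", "BLOCK"))
    if agree >= 2 and opus_score >= 0.7:
        return "CONVERGING"
    if agree <= 1 and opus_score < 0.5:
        return "LIKELY FALSE POSITIVE"
    return "DISPUTED"
```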

### 5. Synthesis (End)

First Officer produces:
```markdown
## Council Resolution

**Verdict:** [APPROVE / APPROVE WITH FIXES / REJECT]
**Confidence:** [weighted by convergence + Opus independent scoring]

### High-Confidence Issues (convergence + high Opus score)
- [Issue]: Found by [N] reviewers, Opus score: [X]

### Disputed Issues (divergence)
- [Issue]: R1 says X, R2 says Y, Opus assessment: [Z]

### Reviewer Quality
- R1: [quality score] - [notes]
- R2: [quality score] - [notes]
- R3: [quality score] - [notes]

### Required Fixes
1. [Highest confidence issue]
2. [Second highest]
...
```
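The verdict line can be derived from the scored findings. A sketch under loudly assumed cutoffs (a 0.7 Opus-score bar for treating a finding as high-confidence); the protocol itself does not fix these numbers:

```python
# Derive the council verdict from scored findings. Each finding is a
# (severity, opus_score) pair, severity in {"FLAG", "BLOCK"}. The 0.7
# confidence bar is an assumption for this sketch.
def council_verdict(findings: list) -> str:
    blocks = [s for sev, s in findings if sev == "BLOCK" and s >= 0.7]
    flags = [s for sev, s in findings if sev == "FLAG" and s >= 0.7]
    if blocks:
        return "REJECT"
    if flags:
        return "APPROVE WITH FIXES"
    return "APPROVE"
```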

---

## Integration with Trust_F

Peer review findings affect trust:

| Finding Source | Trust Impact |
|----------------|--------------|
| Human catches error | Full penalty (Trust_F += severity) |
| Peer review catches | Half penalty (caught before human) |
| Self-check catches | No penalty (caught before output) |

**The earlier you catch, the lower the trust cost.**
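The table reduces to a multiplier on severity. A sketch following the three rows above; Trust_F itself is defined elsewhere, and the key names here are illustrative:

```python
# Trust-cost rule from the table: the earlier the catch, the smaller
# the Trust_F penalty. Multipliers mirror the table rows.
PENALTY_MULTIPLIER = {
    "self_check": 0.0,   # caught before output: no penalty
    "peer_review": 0.5,  # caught before the human: half penalty
    "human": 1.0,        # human catches it: full penalty
}

def trust_penalty(severity: float, caught_by: str) -> float:
    """Amount added to Trust_F for an error of the given severity."""
    return severity * PENALTY_MULTIPLIER[caught_by]
```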

---

## Practical Implementation

### Single Review (Current)
```
User: "Test AI peer review on this session's output"

Claude: [Spawns Task agent with reviewer prompt]
        [Agent reviews files with all lenses]
        [Returns FLAG/BLOCK/PASS findings]
        [Primary instance fixes issues]
```

### Dual Review (When Needed)
```
User: "Run dual peer review on [scope]"

Claude: [Spawns two Task agents in parallel]
        [Each reviews independently]
        [Compare findings]
        [Flag disagreements for attention]
```
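The "compare findings" step of dual review is set arithmetic once each reviewer's findings are reduced to comparable identifiers. A minimal sketch, assuming findings are deduplicated to string ids (how that normalization happens is outside this sketch):

```python
# Compare two independent reviews: agreements gain confidence,
# disagreements get flagged for attention.
def compare_reviews(r1: set, r2: set) -> dict:
    return {
        "agreed": r1 & r2,    # found by both -> higher confidence
        "disputed": r1 ^ r2,  # found by only one -> flag for attention
    }
```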

### Tribal Review (High Stakes)
```
User: "Convene tribal review on [scope]"

Claude: [Spawns 3+ Task agents]
        [Independent review phase]
        [Consolidate findings]
        [Deliberation if disagreement]
        [Council resolution]
```

---

## What We Learned Today

**Session peer review found 12 issues:**
- 2 BLOCK (critical)
- 10 FLAG (should fix)

**Key catches:**
- Three incompatible Trust_F formulas
- "EXECUTING" terminology misleading
- Consistency gaps between files
- Undefined terms (dyad, decay rate)
- Gravity wells not sorted

**Proof of value:** Fresh eyes caught what I missed.

---

## The Promise

> **You can't see your own blind spots. That's what makes them blind spots.**
>
> Peer review adds coverage through diversity.
> Tribal review adds confidence through convergence.
> The earlier you catch, the lower the trust cost.
> Multiple perspectives > single perspective.

---

## Related

- **axioms**
  - [[A1 Telos of Integration]] - reviewers integrate perspectives
  - [[A2 Recognition of Life]] - fresh eyes recognize what's dead (blind spots)
  - [[A3 Dynamic Pole Navigation]] - navigate between too much review and too little
- **protocols**
  - [[error-detection-layers]] - peer review is Layer 2
    - shape:: "Multiple tiers of error catching. Catch at lowest level possible."
  - [[fractal-tribe-architecture]] - tribal review at tribe level
    - shape:: "Same pattern at every level."
  - [[model-allocation-strategy]] - Sonnet for reviews, Opus for synthesis
    - shape:: "Match model capability to task complexity."
  - [[first-officer-protocol]] - FO does cross-review synthesis
- **enables**
  - [[trust-as-free-energy]] - catching errors early preserves trust
    - shape:: "Trust measured as inverse of accumulated deviation."

---

*proto-011 | Peer Review Protocol | Fresh Eyes Catch What You Miss*