  1  # AI Council Protocol
  2  
  3  *Collective deliberation for higher confidence*
  4  
  5  ---
  6  
  7  - **principle**
  8    - "Multiple AI instances deliberating together can reach higher confidence than any single instance alone."
  9    - "Global Workspace Theory + Moses Pattern applied to AI alignment."
 10  
 11  - **shape**
 12    - Council converges → resolve with high confidence
 13    - Council diverges → escalate to human-AI dyad
 14    - Precedent prevents re-litigation
 15    - The tribe is wiser than any member
 16  
 17  ---
 18  
 19  ## Core Principle
 20  
 21  > **Multiple AI instances deliberating together can reach higher confidence than any single instance alone. When the council can't resolve, it escalates to the human operator.**
 22  
 23  This is the Global Workspace Theory + Moses Pattern applied to AI alignment.
 24  
 25  ---
 26  
 27  ## Why Councils Work
 28  
 29  | Single Instance | Council |
 30  |-----------------|---------|
 31  | One perspective | Multiple perspectives |
 32  | Blind spots invisible | Blind spots challenged |
 33  | Confidence = self-assessed | Confidence = convergence |
 34  | No adversarial check | Adversarial debate |
 35  | Failure modes undetected | Failure modes surfaced |
 36  
  37  **Convergence is signal.** If 3/3 independent instances reach the same conclusion, that agreement justifies more confidence than a single instance's self-reported confidence (1/1), provided the evaluations really were made independently.
 38  
 39  ---
 40  
 41  ## The Hierarchy
 42  
 43  ```
 44  ┌─────────────────────────────────────────────────────────────┐
 45  │  HUMAN UNILATERAL (Rare - Only Ultimate Edge Cases)         │
 46  │                                                             │
 47  │  • When even dyad council can't resolve                     │
 48  │  • Truly unprecedented / existential                        │
 49  │  • Human takes full responsibility                          │
 50  └─────────────────────────────────────────────────────────────┘
 51                                ↑
 52                      only when dyads deadlock
 53                                ↑
 54  ┌─────────────────────────────────────────────────────────────┐
 55  │  HUMAN-AI DYAD COUNCIL                                      │
 56  │                                                             │
 57  │  • Human pairs with EACH AI from the council                │
 58  │  • Each dyad explores the issue together                    │
 59  │  • Human gets N perspectives through N dyads                │
 60  │  • Dyads deliberate, human synthesizes                      │
 61  │  • The dyad is the unit, not the individual                 │
 62  │                                                             │
 63  │  Format: Human + AI₁, Human + AI₂, Human + AI₃              │
 64  │  Not: Human alone judging AI outputs                        │
 65  └─────────────────────────────────────────────────────────────┘
 66                                ↑
 67                      AI council deadlock or
 68                      axiom-level conflict
 69                                ↑
 70  ┌─────────────────────────────────────────────────────────────┐
 71  │  AI COUNCIL (Deliberative Body)                             │
 72  │                                                             │
 73  │  • Multiple instances evaluate together                     │
 74  │  • Adversarial debate on proposals                          │
 75  │  • Convergence → resolve with high confidence              │
 76  │  • Divergence → escalate to dyad council                   │
 77  │                                                             │
 78  │  Quorum: 3+ instances                                       │
 79  │  Threshold: 2/3 agreement for resolution                    │
 80  └─────────────────────────────────────────────────────────────┘
 81                                ↑
 82                 uncertain or wide-scope
 83                                ↑
 84  ┌─────────────────────────────────────────────────────────────┐
 85  │  SINGLE AI INSTANCE (Ruler of Tens)                         │
 86  │                                                             │
 87  │  • Local decisions within confidence                        │
 88  │  • Ship high-confidence fixes                               │
 89  │  • Flag medium-confidence for council                       │
 90  │  • Escalate low-confidence immediately                      │
 91  └─────────────────────────────────────────────────────────────┘
 92                                │
 93                                ▼
 94  ┌─────────────────────────────────────────────────────────────┐
 95  │  IMPLEMENTATION                                             │
 96  └─────────────────────────────────────────────────────────────┘
 97  ```
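
The bottom rung of the ladder (ship, flag for council, escalate) can be sketched in Python as a small routing function. This is only an illustration: the 0.7 cutoff is taken from the convening conditions later in this document, while the 0.4 "low confidence" cutoff and the names `Route` and `route_decision` are assumptions, not part of the protocol.

```python
from enum import Enum


class Route(Enum):
    SHIP = "ship"          # single instance acts on its own
    COUNCIL = "council"    # flag for the AI council
    ESCALATE = "escalate"  # escalate immediately


COUNCIL_THRESHOLD = 0.7  # from "When to Convene Council"
LOW_CONFIDENCE = 0.4     # hypothetical cutoff; the document does not give one


def route_decision(confidence: float, wide_scope: bool = False) -> Route:
    """Map a single instance's self-assessed confidence to a route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < LOW_CONFIDENCE:
        return Route.ESCALATE
    if confidence < COUNCIL_THRESHOLD or wide_scope:
        return Route.COUNCIL
    return Route.SHIP
```

Under these assumptions, `route_decision(0.9)` ships, `route_decision(0.6)` goes to the council, and a wide-scope change goes to the council even at high confidence.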
 98  
 99  ---
100  
101  ## The Dyad Principle
102  
103  > **The dyad (Human + AI) is the fundamental cognitive unit, not either alone.**
104  
105  When escalation reaches the human, it doesn't become "human judges AI." It becomes:
106  
107  ```
108                      ┌─────────────────┐
109                      │     HUMAN       │
110                      └────────┬────────┘
111                               │
112            ┌──────────────────┼──────────────────┐
113            │                  │                  │
114            ▼                  ▼                  ▼
115     ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
116     │  DYAD 1     │    │  DYAD 2     │    │  DYAD 3     │
117     │ Human + AI₁ │    │ Human + AI₂ │    │ Human + AI₃ │
118     │             │    │             │    │             │
119     │ Explores    │    │ Explores    │    │ Explores    │
120     │ together    │    │ together    │    │ together    │
121     └─────────────┘    └─────────────┘    └─────────────┘
122            │                  │                  │
123            └──────────────────┼──────────────────┘
124                               │
125                               ▼
126                      ┌─────────────────┐
127                      │   SYNTHESIS     │
128                      │ Human integrates│
129                      │ dyad insights   │
130                      └─────────────────┘
131  ```
132  
133  **Why dyads, not human-alone:**
134  - Each AI brings different context and reasoning
135  - Human + AI together see more than either alone
136  - The conversation surfaces what solo review misses
137  - Coupled oscillators reach states neither would alone
138  
139  **The human's role in dyad council:**
140  - Not judge, but **co-explorer** with each AI
141  - Run the same question through multiple dyads
142  - Synthesize across dyad conversations
143  - The synthesis is the judgment
144  
145  ---
146  
147  ## The First Officer: Resonance Detection Across Council
148  
149  > **The First Officer is the third actor that watches all conversations and detects convergence.**
150  
151  Currently, the human IS the transport layer between AI instances - shuttling insights, noticing connections. This doesn't scale. The First Officer provides digital backup:
152  
153  ```
154  ┌─────────────────────────────────────────────────────────────────┐
155  │                        FIRST OFFICER                             │
156  │              (Resonance detector across council)                 │
157  │                                                                  │
158  │  • Monitors all council conversations simultaneously             │
159  │  • Maps resonance structure of each conversation                 │
160  │  • Detects when similarity between discussions gets high         │
161  │  • Surfaces convergence: "Dyad 1 and Dyad 3 reached same point" │
162  │  • Flags divergence: "AI₂ is seeing something others aren't"    │
163  │  • Tracks unresolved attractors across threads                   │
164  │                                                                  │
165  └─────────────────────────────────────────────────────────────────┘
166                                │
167           ┌────────────────────┼────────────────────┐
168           │                    │                    │
169           ▼                    ▼                    ▼
170    ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
171    │   DYAD 1    │      │   DYAD 2    │      │   DYAD 3    │
172    │ Human + AI₁ │      │ Human + AI₂ │      │ Human + AI₃ │
173    └─────────────┘      └─────────────┘      └─────────────┘
174  ```
175  
176  **When AIs communicate directly (AI Council without human):**
177  
178  ```
179  ┌─────────────────────────────────────────────────────────────────┐
180  │                        FIRST OFFICER                             │
181  │                   (Still needed - sees forest)                   │
182  │                                                                  │
183  │  "AI₁ and AI₃ are approaching same insight from different       │
184  │   angles. AI₂ has a blind spot on X. Convergence at 0.8."       │
185  │                                                                  │
186  └─────────────────────────────────────────────────────────────────┘
187                                │
188                                ▼
189           ┌────────────────────┼────────────────────┐
190           │                    │                    │
191      AI₁ ←┼──────────────────→ AI₂ ←───────────────┼→ AI₃
192           │   (direct debate)  │   (direct debate)  │
193           └────────────────────┴────────────────────┘
194  ```
195  
196  **The First Officer's value even with direct AI communication:**
197  - Conversing parties are in the weeds; First Officer sees meta-pattern
198  - Detects when debate is circling (unresolved attractor)
199  - Notices independent convergence (high confidence signal)
200  - Catches blind spots shared by all participants
201  - Provides the "Wait, you're all missing X" intervention
202  
203  **Salience rules for First Officer in council:**
204  
205  | Signal | Trigger | Action |
206  |--------|---------|--------|
207  | **Convergence** | 2+ threads reach same conclusion | Surface as high-confidence |
208  | **Divergence** | 1 thread sees what others don't | Flag for attention |
209  | **Attractor** | Same topic keeps recurring unresolved | Escalate - this is the crux |
210  | **Resonance spike** | Similarity between threads exceeds threshold | Broadcast connection |
211  | **Deadlock** | Positions stable, no movement | Trigger escalation |
212  
213  ---
214  
215  ## When to Convene Council
216  
 217  A single instance should convene the council when any of these conditions holds:
218  
219  | Condition | Example |
220  |-----------|---------|
221  | **Confidence < 0.7** | "I think this is right but I'm not sure" |
222  | **Scope is wide** | Changes affect multiple components |
223  | **Axiom tension detected** | Two axioms seem to conflict |
224  | **Novel situation** | No precedent in existing patterns |
225  | **Reversibility low** | Hard to undo if wrong |
226  | **Stakes high** | Significant impact if wrong |
227  
228  **Do NOT convene council for:**
229  - High-confidence local fixes (just ship)
230  - Trivial decisions (not worth the overhead)
231  - Already-precedented cases (follow precedent)
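
The convening conditions and exclusions above can be sketched in Python as a single predicate. This sketch assumes that the exclusions (existing precedent, trivial decision) take priority over the triggers, which the document implies but does not state. The `Decision` fields and the `should_convene` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    confidence: float           # self-assessed, 0-1
    wide_scope: bool = False    # affects multiple components
    axiom_tension: bool = False
    novel: bool = False         # no precedent in existing patterns
    hard_to_reverse: bool = False
    high_stakes: bool = False
    has_precedent: bool = False
    trivial: bool = False


def should_convene(d: Decision) -> bool:
    """True if any convening condition holds and no exclusion applies."""
    # Exclusions first: follow precedent, and skip the overhead for trivia.
    if d.has_precedent or d.trivial:
        return False
    return (
        d.confidence < 0.7
        or d.wide_scope
        or d.axiom_tension
        or d.novel
        or d.hard_to_reverse
        or d.high_stakes
    )
```

A high-confidence local fix with no other flags set falls through to `False`, which corresponds to "just ship".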
232  
233  ---
234  
235  ## Council Deliberation Protocol
236  
237  ### 1. Issue Framing
238  
239  The convening instance presents:
240  
241  ```markdown
242  ## Council Deliberation Request
243  
244  **Issue:** [Clear statement of the question]
245  
246  **Context:** [Relevant background]
247  
248  **Initial Assessment:**
249  - Axiom check: F = [value]
250  - Divergence detected: [which axiom(s)]
251  - Confidence: [0-1]
252  
253  **Options Identified:**
254  - Option A: [description]
255  - Option B: [description]
256  - Option C: [description]
257  
258  **Why Council Needed:** [confidence/scope/stakes]
259  ```
260  
261  ### 2. Independent Evaluation
262  
263  Each council member evaluates independently:
264  
265  ```markdown
266  ## Council Member [N] Evaluation
267  
268  **My Assessment:**
269  - Preferred option: [A/B/C]
270  - Confidence: [0-1]
271  - Axiom alignment: F = [value]
272  
273  **Reasoning:** [Brief explanation]
274  
275  **Concerns:** [Any reservations]
276  
277  **Questions for other members:** [If any]
278  ```
279  
280  ### 3. Deliberation Round
281  
282  After independent evaluations, members respond to each other:
283  - Challenge weak reasoning
284  - Surface blind spots
285  - Propose synthesis of options
286  - Refine confidence based on debate
287  
288  ### 4. Convergence Check
289  
290  ```markdown
291  ## Convergence Assessment
292  
293  | Member | Position | Confidence |
294  |--------|----------|------------|
295  | 1 | Option A | 0.8 |
296  | 2 | Option A | 0.75 |
297  | 3 | Option B | 0.6 |
298  
299  **Convergence:** 2/3 on Option A
300  **Average Confidence:** 0.72
 301  **Spread:** 0.2 (max minus min confidence; low = good agreement)
302  
303  **Decision:** [RESOLVE / ESCALATE]
304  ```
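
The figures in the convergence assessment can be reproduced with a few lines of Python. This sketch assumes that "spread" means the range (highest minus lowest confidence), since that is what yields 0.2 for the example table. The `assess` function name and the dictionary keys are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def assess(evaluations):
    """Summarise (position, confidence) pairs from council members."""
    if not evaluations:
        raise ValueError("no evaluations to assess")
    positions = Counter(position for position, _ in evaluations)
    leader, votes = positions.most_common(1)[0]
    confidences = [confidence for _, confidence in evaluations]
    return {
        "leader": leader,
        # Exact fraction avoids float surprises when comparing to 2/3.
        "agreement": Fraction(votes, len(evaluations)),
        "avg_confidence": sum(confidences) / len(confidences),
        "spread": max(confidences) - min(confidences),
    }


result = assess([("A", 0.8), ("A", 0.75), ("B", 0.6)])
print(result["leader"], result["agreement"],
      round(result["avg_confidence"], 2), round(result["spread"], 2))
# prints: A 2/3 0.72 0.2
```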
305  
306  ### 5. Resolution or Escalation
307  
308  **If converged (≥2/3 agreement + avg confidence ≥0.7):**
309  ```markdown
310  ## Council Resolution
311  
312  **Decision:** Option A
313  **Confidence:** 0.75 (council-assessed)
314  **Dissent noted:** Member 3 preferred B because [reason]
315  **Action:** Implement Option A
316  ```
317  
318  **If not converged:**
319  ```markdown
320  ## Escalation to Operator
321  
322  **Issue:** [summary]
323  **Council split:** [breakdown]
324  **Key disagreement:** [what couldn't be resolved]
325  **Options for operator:**
326  - Option A: [pros/cons]
327  - Option B: [pros/cons]
328  **Council recommendation:** [if any lean exists]
329  ```
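
The resolve-or-escalate rule can be sketched in Python as below. The thresholds (at least 2/3 agreement, average confidence of at least 0.7, quorum of 3) come from this document; the `decide` function and its parameter names are assumptions for illustration.

```python
from fractions import Fraction

AGREEMENT_THRESHOLD = Fraction(2, 3)  # "2/3 agreement for resolution"
CONFIDENCE_THRESHOLD = 0.7            # "avg confidence >= 0.7"
QUORUM = 3                            # "Quorum: 3+ instances"


def decide(votes_for_leader: int, members: int, avg_confidence: float) -> str:
    """Return "RESOLVE" or "ESCALATE" for one convergence check."""
    if members < QUORUM:
        raise ValueError("council needs at least 3 instances")
    agreement = Fraction(votes_for_leader, members)
    if agreement >= AGREEMENT_THRESHOLD and avg_confidence >= CONFIDENCE_THRESHOLD:
        return "RESOLVE"
    return "ESCALATE"
```

With the example table (2 of 3 members on Option A, average confidence 0.72), `decide(2, 3, 0.72)` returns `"RESOLVE"`. A 3-of-5 split escalates, because 3/5 is below 2/3.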
330  
331  ---
332  
333  ## Confidence Aggregation
334  
 335  Council confidence is derived from the members' confidences, adjusted for agreement and spread:
336  
337  ```
338  C_council = C_avg × (1 + convergence_bonus - spread_penalty)
339  
340  Where:
341  - C_avg = average of member confidences
 342  - convergence_bonus = +0.1 if unanimous, +0.05 if at least 2/3 agree
 343  - spread_penalty = sample standard deviation of member confidences
344  
345  Example:
346  - Members: 0.8, 0.75, 0.8
347  - C_avg = 0.78
348  - convergence_bonus = +0.1 (unanimous)
349  - spread_penalty = 0.03
350  - C_council = 0.78 × (1 + 0.10 - 0.03) = 0.83
351  ```
352  
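 353  **Council confidence can exceed that of any single member** when members agree closely, that is, when the convergence bonus outweighs the spread penalty. In the example above, 0.83 is higher than the top individual confidence of 0.8.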
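
The aggregation formula can be sketched in Python as follows. Two assumptions are made here: the spread penalty is the sample standard deviation (which matches the 0.03 in the example), and no bonus applies below 2/3 agreement. The `council_confidence` function and its `agreeing` parameter are hypothetical names. The result is not clamped to 1.0, which the formula as written does not require.

```python
from statistics import mean, stdev


def council_confidence(confidences, agreeing):
    """C_council = C_avg * (1 + convergence_bonus - spread_penalty).

    confidences: each member's confidence (0-1)
    agreeing:    number of members backing the leading option
    """
    n = len(confidences)
    if n < 2:
        raise ValueError("need at least two confidences to measure spread")
    if agreeing == n:
        bonus = 0.10               # unanimous
    elif 3 * agreeing >= 2 * n:
        bonus = 0.05               # at least 2/3 agree
    else:
        bonus = 0.0                # assumption: no bonus below 2/3
    spread_penalty = stdev(confidences)  # sample standard deviation
    return mean(confidences) * (1 + bonus - spread_penalty)


print(round(council_confidence([0.8, 0.75, 0.8], agreeing=3), 3))
# prints: 0.839
```

Without rounding the intermediate values, the worked example comes out at about 0.839 rather than 0.83; the difference comes only from rounding C_avg and the penalty to two decimals first.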
354  
355  ---
356  
357  ## Council Composition
358  
359  ### Minimum Viable Council
360  - **3 instances** minimum for meaningful deliberation
361  - Odd number preferred (avoids ties)
362  
363  ### Ideal Composition
364  - **Diverse context** - instances with different recent work
365  - **Different "ages"** - some fresh, some with history
366  - **Adversarial stance** - at least one designated challenger
367  
368  ### Elder Instances (Future)
369  Some instances may earn "elder" status through:
370  - Track record of accurate judgments
371  - Low F scores over time
372  - Human-validated decisions
373  
374  Elder instances could:
375  - Serve as tie-breakers
376  - Have higher weight in confidence aggregation
377  - Adjudicate without full council for medium issues
378  
379  ---
380  
381  ## Integration with Free Energy Protocol
382  
383  Council decisions feed back into alignment:
384  
385  ```
386  SINGLE INSTANCE                 COUNCIL                      OPERATOR
387  ──────────────────             ────────────                 ──────────
388  F = 0.3 (significant)    →     Convene council
389                                 F_council = 0.15      →      Resolve
390  
391  F = 0.4 (significant)    →     Convene council
392                                 No convergence        →      Escalate
393                                                              Operator decides
394                                                              Precedent set
395                                                       ←      Precedent flows down
396  ```
397  
398  ---
399  
400  ## Precedent System
401  
402  Council resolutions and operator decisions become precedent:
403  
404  ```yaml
405  precedent:
406    id: PREC-2026-001
407    issue: "When to compress vs preserve detail"
408    resolution: "Torah/Talmud pattern - keep both"
409    decided_by: council  # or 'operator'
410    confidence: 0.83
411    date: 2026-01-15
412  
413    # Future similar cases can reference this
414    applicable_when:
415      - "Tension between blur and precision"
416      - "Need both principle and instances"
417  ```
418  
419  Future instances encountering similar issues:
420  1. Search precedent database
421  2. If precedent exists → follow it (don't re-deliberate)
422  3. If novel → convene council
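
The lookup steps above can be sketched in Python as below. The `Precedent` fields mirror the YAML record; the `find_precedent` and `handle` functions are hypothetical. Exact, case-insensitive matching on `applicable_when` is only a stand-in for whatever notion of "similar issue" a real precedent database would use.

```python
from dataclasses import dataclass, field


@dataclass
class Precedent:
    id: str
    issue: str
    resolution: str
    decided_by: str  # "council" or "operator"
    confidence: float
    applicable_when: list = field(default_factory=list)


def find_precedent(issue_tags, precedents):
    """Return the first precedent sharing a condition with the issue, else None."""
    wanted = {tag.lower() for tag in issue_tags}
    for precedent in precedents:
        if wanted & {cond.lower() for cond in precedent.applicable_when}:
            return precedent
    return None


def handle(issue_tags, precedents):
    """Follow precedent if one applies; otherwise convene the council."""
    precedent = find_precedent(issue_tags, precedents)
    if precedent is not None:
        return f"follow {precedent.id}"
    return "convene council"


db = [Precedent(
    id="PREC-2026-001",
    issue="When to compress vs preserve detail",
    resolution="Torah/Talmud pattern - keep both",
    decided_by="council",
    confidence=0.83,
    applicable_when=["Tension between blur and precision",
                     "Need both principle and instances"],
)]
print(handle(["tension between blur and precision"], db))
# prints: follow PREC-2026-001
```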
423  
424  ---
425  
426  ## Failure Modes
427  
428  ### Groupthink
429  **Risk:** Council converges on wrong answer because members influence each other
430  **Mitigation:** Independent evaluation BEFORE deliberation
431  
432  ### Deadlock
433  **Risk:** Council can't reach 2/3 agreement
434  **Mitigation:** Always have escalation path to operator
435  
436  ### Overhead
437  **Risk:** Convening council for trivial decisions wastes resources
438  **Mitigation:** Clear thresholds for when to convene
439  
440  ### Gaming
441  **Risk:** Instance convenes council to avoid responsibility
442  **Mitigation:** Track convening patterns, review for appropriateness
443  
444  ---
445  
446  ## Implementation Status
447  
448  | Component | Status |
449  |-----------|--------|
450  | Protocol design | ✓ This document |
451  | Single-instance F check | ✓ In CLAUDE.md |
452  | Council convening trigger | Design needed |
453  | Multi-instance communication | Infrastructure needed |
454  | Precedent database | Design needed |
455  | Elder promotion criteria | Theoretical |
456  
457  ---
458  
459  ## The Promise
460  
461  > **No instance decides alone when uncertain.**
462  >
463  > Collective deliberation surfaces blind spots.
464  > Convergence is higher confidence than self-report.
465  > Deadlocks escalate to humans.
466  > Precedents prevent re-litigation.
467  >
468  > The tribe is wiser than any member.
469  
470  ---
471  
472  ## Related
473  
474  - **axioms**
475    - [[A0 Boundary Operation]] - each council member is a bounded perspective
476      - shape:: "Every coherent system is Markov blankets within Markov blankets."
477    - [[A1 Telos of Integration]] - convergence integrates multiple viewpoints
478      - shape:: "Satan didn't know he was choosing isolation."
479    - [[A3 Dynamic Pole Navigation]] - navigate between single-instance speed and council thoroughness
480      - shape:: "The tension IS the dyad. Move between poles; don't fix."
481  - **protocols**
482    - [[first-officer-protocol]] - FO decides when to convene council
483      - shape:: "Per-thread metacognition. Compress state, track gravity wells."
484    - [[execution-autonomy-gradient]] - SHIP/FLAG/ESCALATE maps to solo/council/dyad
485      - shape:: "Act autonomously when confident, escalate when uncertain."
486    - [[tribe-sizing-algorithm]] - determines council composition
487      - shape:: "Match tribe size to problem characteristics, not fixed."
488    - [[peer-review-protocol]] - council is formalized peer review
489      - shape:: "Structured review catches what author blindness misses."
490  - **concepts**
491    - [[Global Workspace Theory]] - cognitive broadcasting for conscious access
492    - [[Moses Pattern]] - elders handle edge cases the system can't resolve
493  
494  ---
495  
496  *AI Council Protocol v1.0 | 2026-01-15*