ai-council-protocol.md
# AI Council Protocol

*Collective deliberation for higher confidence*

---

- **principle**
  - "Multiple AI instances deliberating together can reach higher confidence than any single instance alone."
  - "Global Workspace Theory + Moses Pattern applied to AI alignment."

- **shape**
  - Council converges → resolve with high confidence
  - Council diverges → escalate to human-AI dyad
  - Precedent prevents re-litigation
  - The tribe is wiser than any member

---

## Core Principle

> **Multiple AI instances deliberating together can reach higher confidence than any single instance alone. When the council can't resolve, it escalates to the human operator.**

This is the Global Workspace Theory + Moses Pattern applied to AI alignment.

---

## Why Councils Work

| Single Instance | Council |
|-----------------|---------|
| One perspective | Multiple perspectives |
| Blind spots invisible | Blind spots challenged |
| Confidence = self-assessed | Confidence = convergence |
| No adversarial check | Adversarial debate |
| Failure modes undetected | Failure modes surfaced |

**Convergence is signal.** If 3/3 independent instances reach the same conclusion, confidence is higher than 1/1 self-reporting confidence.

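One way to see why convergence carries signal is a simple majority-vote calculation. The sketch below is illustrative only and rests on assumptions the protocol does not state: every instance is correct with the same probability `p`, instances err independently of one another, and the question has a single correct answer. Instances of the same model may share blind spots, so the result is best read as an optimistic bound. The function name `majority_correct_probability` is hypothetical.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumes (hypothetically) that each voter is correct with probability p
    and that voters err independently of one another.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    majority = n // 2 + 1
    # Sum the binomial probabilities of exactly k correct voters, for k >= majority.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1)
    )


single = majority_correct_probability(0.8, 1)   # one instance on its own
council = majority_correct_probability(0.8, 3)  # majority of a three-member council
print(f"single={single:.3f} council={council:.3f}")
```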

---

## The Hierarchy

```
┌────────────────────────────────────────────────────────────┐
│  HUMAN UNILATERAL (Rare - Only Ultimate Edge Cases)        │
│                                                            │
│  • When even dyad council can't resolve                    │
│  • Truly unprecedented / existential                       │
│  • Human takes full responsibility                         │
└────────────────────────────────────────────────────────────┘
                              ↑
                   only when dyads deadlock
                              ↑
┌────────────────────────────────────────────────────────────┐
│  HUMAN-AI DYAD COUNCIL                                     │
│                                                            │
│  • Human pairs with EACH AI from the council               │
│  • Each dyad explores the issue together                   │
│  • Human gets N perspectives through N dyads               │
│  • Dyads deliberate, human synthesizes                     │
│  • The dyad is the unit, not the individual                │
│                                                            │
│  Format: Human + AI₁, Human + AI₂, Human + AI₃             │
│  Not: Human alone judging AI outputs                       │
└────────────────────────────────────────────────────────────┘
                              ↑
                    AI council deadlock or
                     axiom-level conflict
                              ↑
┌────────────────────────────────────────────────────────────┐
│  AI COUNCIL (Deliberative Body)                            │
│                                                            │
│  • Multiple instances evaluate together                    │
│  • Adversarial debate on proposals                         │
│  • Convergence → resolve with high confidence              │
│  • Divergence → escalate to dyad council                   │
│                                                            │
│  Quorum: 3+ instances                                      │
│  Threshold: 2/3 agreement for resolution                   │
└────────────────────────────────────────────────────────────┘
                              ↑
                   uncertain or wide-scope
                              ↑
┌────────────────────────────────────────────────────────────┐
│  SINGLE AI INSTANCE (Ruler of Tens)                        │
│                                                            │
│  • Local decisions within confidence                       │
│  • Ship high-confidence fixes                              │
│  • Flag medium-confidence for council                      │
│  • Escalate low-confidence immediately                     │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  IMPLEMENTATION                                            │
└────────────────────────────────────────────────────────────┘
```

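The bottom tier of the hierarchy (ship, flag for council, or escalate) can be sketched as a small routing function. This is a minimal illustration, not part of the protocol: the names `Action` and `route` are hypothetical, and so are the numeric cut-offs. The document names 0.7 as the confidence below which a council is convened but does not say where "low confidence" begins, so `LOW_CONFIDENCE = 0.4` is purely a placeholder. The `wide_scope` flag reflects the "uncertain or wide-scope" arrow in the diagram.

```python
from enum import Enum


class Action(Enum):
    SHIP = "ship"          # decide locally and implement
    FLAG = "flag"          # convene the AI council
    ESCALATE = "escalate"  # escalate immediately


# Hypothetical cut-offs. 0.7 mirrors the council trigger in "When to Convene
# Council"; 0.4 is an illustrative placeholder the protocol does not specify.
HIGH_CONFIDENCE = 0.7
LOW_CONFIDENCE = 0.4


def route(confidence: float, wide_scope: bool = False) -> Action:
    """Map a single instance's self-assessed confidence to a hierarchy tier."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < LOW_CONFIDENCE:
        return Action.ESCALATE
    # Medium confidence, or a change that touches many components, goes to council.
    if confidence < HIGH_CONFIDENCE or wide_scope:
        return Action.FLAG
    return Action.SHIP


for c, wide in [(0.9, False), (0.9, True), (0.6, False), (0.2, False)]:
    print(c, wide, route(c, wide).name)
```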
---

## The Dyad Principle

> **The dyad (Human + AI) is the fundamental cognitive unit, not either alone.**

When escalation reaches the human, it doesn't become "human judges AI." It becomes:

```
                     ┌─────────────────┐
                     │      HUMAN      │
                     └────────┬────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
           ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   DYAD 1    │    │   DYAD 2    │    │   DYAD 3    │
    │ Human + AI₁ │    │ Human + AI₂ │    │ Human + AI₃ │
    │             │    │             │    │             │
    │  Explores   │    │  Explores   │    │  Explores   │
    │  together   │    │  together   │    │  together   │
    └─────────────┘    └─────────────┘    └─────────────┘
           │                  │                  │
           └──────────────────┼──────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │    SYNTHESIS    │
                     │ Human integrates│
                     │  dyad insights  │
                     └─────────────────┘
```

**Why dyads, not human-alone:**
- Each AI brings different context and reasoning
- Human + AI together see more than either alone
- The conversation surfaces what solo review misses
- Coupled oscillators reach states neither would alone

**The human's role in dyad council:**
- Not judge, but **co-explorer** with each AI
- Run the same question through multiple dyads
- Synthesize across dyad conversations
- The synthesis is the judgment

---

## The First Officer: Resonance Detection Across Council

> **The First Officer is the third actor that watches all conversations and detects convergence.**

Currently, the human IS the transport layer between AI instances - shuttling insights, noticing connections. This doesn't scale.
The First Officer provides digital backup:

```
┌─────────────────────────────────────────────────────────────────┐
│                          FIRST OFFICER                          │
│               (Resonance detector across council)               │
│                                                                 │
│  • Monitors all council conversations simultaneously            │
│  • Maps resonance structure of each conversation                │
│  • Detects when similarity between discussions gets high        │
│  • Surfaces convergence: "Dyad 1 and Dyad 3 reached same point" │
│  • Flags divergence: "AI₂ is seeing something others aren't"    │
│  • Tracks unresolved attractors across threads                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
            ┌────────────────────┼────────────────────┐
            │                    │                    │
            ▼                    ▼                    ▼
     ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
     │   DYAD 1    │      │   DYAD 2    │      │   DYAD 3    │
     │ Human + AI₁ │      │ Human + AI₂ │      │ Human + AI₃ │
     └─────────────┘      └─────────────┘      └─────────────┘
```

**When AIs communicate directly (AI Council without human):**

```
┌─────────────────────────────────────────────────────────────────┐
│                          FIRST OFFICER                          │
│                  (Still needed - sees forest)                   │
│                                                                 │
│  "AI₁ and AI₃ are approaching same insight from different       │
│   angles. AI₂ has a blind spot on X. Convergence at 0.8."       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
            ┌────────────────────┼────────────────────┐
            │                    │                    │
       AI₁ ←┼─────────────────→ AI₂ ←─────────────────┼→ AI₃
            │  (direct debate)   │  (direct debate)   │
            └────────────────────┴────────────────────┘
```

**The First Officer's value even with direct AI communication:**
- Conversing parties are in the weeds; First Officer sees meta-pattern
- Detects when debate is circling (unresolved attractor)
- Notices independent convergence (high confidence signal)
- Catches blind spots shared by all participants
- Provides the "Wait, you're all missing X" intervention

**Salience rules for First Officer in council:**

| Signal | Trigger | Action |
|--------|---------|--------|
| **Convergence** | 2+ threads reach same conclusion | Surface as high-confidence |
| **Divergence** | 1 thread sees what others don't | Flag for attention |
| **Attractor** | Same topic keeps recurring unresolved | Escalate - this is the crux |
| **Resonance spike** | Similarity between threads exceeds threshold | Broadcast connection |
| **Deadlock** | Positions stable, no movement | Trigger escalation |

---

## When to Convene Council

Single instance should convene council when:

| Condition | Example |
|-----------|---------|
| **Confidence < 0.7** | "I think this is right but I'm not sure" |
| **Scope is wide** | Changes affect multiple components |
| **Axiom tension detected** | Two axioms seem to conflict |
| **Novel situation** | No precedent in existing patterns |
| **Reversibility low** | Hard to undo if wrong |
| **Stakes high** | Significant impact if wrong |

**Do NOT convene council for:**
- High-confidence local fixes (just ship)
- Trivial decisions (not worth the overhead)
- Already-precedented cases (follow precedent)

---

## Council Deliberation Protocol

### 1. Issue Framing

The convening instance presents:

```markdown
## Council Deliberation Request

**Issue:** [Clear statement of the question]

**Context:** [Relevant background]

**Initial Assessment:**
- Axiom check: F = [value]
- Divergence detected: [which axiom(s)]
- Confidence: [0-1]

**Options Identified:**
- Option A: [description]
- Option B: [description]
- Option C: [description]

**Why Council Needed:** [confidence/scope/stakes]
```

### 2. Independent Evaluation

Each council member evaluates independently:

```markdown
## Council Member [N] Evaluation

**My Assessment:**
- Preferred option: [A/B/C]
- Confidence: [0-1]
- Axiom alignment: F = [value]

**Reasoning:** [Brief explanation]

**Concerns:** [Any reservations]

**Questions for other members:** [If any]
```

### 3. Deliberation Round

After independent evaluations, members respond to each other:
- Challenge weak reasoning
- Surface blind spots
- Propose synthesis of options
- Refine confidence based on debate

### 4. Convergence Check

```markdown
## Convergence Assessment

| Member | Position | Confidence |
|--------|----------|------------|
| 1 | Option A | 0.8 |
| 2 | Option A | 0.75 |
| 3 | Option B | 0.6 |

**Convergence:** 2/3 on Option A
**Average Confidence:** 0.72
**Spread:** 0.2 (low = good agreement)

**Decision:** [RESOLVE / ESCALATE]
```

### 5. Resolution or Escalation

**If converged (≥2/3 agreement + avg confidence ≥0.7):**
```markdown
## Council Resolution

**Decision:** Option A
**Confidence:** 0.75 (council-assessed)
**Dissent noted:** Member 3 preferred B because [reason]
**Action:** Implement Option A
```

**If not converged:**
```markdown
## Escalation to Operator

**Issue:** [summary]
**Council split:** [breakdown]
**Key disagreement:** [what couldn't be resolved]
**Options for operator:**
- Option A: [pros/cons]
- Option B: [pros/cons]
**Council recommendation:** [if any lean exists]
```

---

## Confidence Aggregation

Council confidence is aggregated from member confidences. It rises above the members' average when the convergence bonus outweighs the spread penalty:

```
C_council = C_avg × (1 + convergence_bonus - spread_penalty)

Where:
- C_avg = average of member confidences
- convergence_bonus = +0.1 if unanimous, +0.05 if 2/3
- spread_penalty = standard deviation of confidences

Example:
- Members: 0.8, 0.75, 0.8
- C_avg = 0.78
- convergence_bonus = +0.1 (unanimous)
- spread_penalty = 0.03
- C_council = 0.78 × (1 + 0.10 - 0.03) = 0.83
```

**Council confidence can exceed any single member's** when members agree and their confidences are close: in the example, 0.83 against a highest individual confidence of 0.8.

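The aggregation formula and the resolve-or-escalate rule from step 5 can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation. The function names are hypothetical. The bonus is taken to be 0 when fewer than two thirds agree, which the formula leaves unspecified. "Standard deviation" is read as the sample standard deviation, since that reproduces the 0.03 in the worked example. Without rounding the intermediate values the example evaluates to about 0.84 rather than 0.83. The formula is not clamped, so very high, tightly clustered confidences can yield a value above 1.

```python
from collections import Counter
from statistics import mean, stdev


def _top_count(positions: list[str]) -> int:
    """Number of members backing the most popular option."""
    return Counter(positions).most_common(1)[0][1]


def convergence_bonus(positions: list[str]) -> float:
    """+0.10 if unanimous, +0.05 if at least 2/3 agree, else 0 (assumed)."""
    top, n = _top_count(positions), len(positions)
    if top == n:
        return 0.10
    if 3 * top >= 2 * n:  # integer form of top / n >= 2/3
        return 0.05
    return 0.0


def council_confidence(positions: list[str], confidences: list[float]) -> float:
    """C_council = C_avg * (1 + convergence_bonus - spread_penalty)."""
    if len(positions) != len(confidences) or len(positions) < 3:
        raise ValueError("need matching lists with at least 3 members (quorum)")
    # Sample standard deviation is an assumption about the intended spread measure.
    spread_penalty = stdev(confidences)
    return mean(confidences) * (1 + convergence_bonus(positions) - spread_penalty)


def decide(positions: list[str], confidences: list[float]) -> str:
    """RESOLVE when >= 2/3 agree and average confidence >= 0.7, else ESCALATE."""
    agreed = 3 * _top_count(positions) >= 2 * len(positions)
    return "RESOLVE" if agreed and mean(confidences) >= 0.7 else "ESCALATE"


example = council_confidence(["A", "A", "A"], [0.8, 0.75, 0.8])
print(round(example, 3), decide(["A", "A", "B"], [0.8, 0.75, 0.6]))
```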

---

## Council Composition

### Minimum Viable Council
- **3 instances** minimum for meaningful deliberation
- Odd number preferred (avoids ties)

### Ideal Composition
- **Diverse context** - instances with different recent work
- **Different "ages"** - some fresh, some with history
- **Adversarial stance** - at least one designated challenger

### Elder Instances (Future)
Some instances may earn "elder" status through:
- Track record of accurate judgments
- Low F scores over time
- Human-validated decisions

Elder instances could:
- Serve as tie-breakers
- Have higher weight in confidence aggregation
- Adjudicate without full council for medium issues

---

## Integration with Free Energy Protocol

Council decisions feed back into alignment:

```
SINGLE INSTANCE             COUNCIL                 OPERATOR
──────────────────          ────────────            ──────────
F = 0.3 (significant)   →   Convene council
                            F_council = 0.15    →   Resolve

F = 0.4 (significant)   →   Convene council
                            No convergence      →   Escalate
                                                    Operator decides
                                                    Precedent set
                        ←   Precedent flows down
```

---

## Precedent System

Council resolutions and operator decisions become precedent:

```yaml
precedent:
  id: PREC-2026-001
  issue: "When to compress vs preserve detail"
  resolution: "Torah/Talmud pattern - keep both"
  decided_by: council  # or 'operator'
  confidence: 0.83
  date: 2026-01-15

  # Future similar cases can reference this
  applicable_when:
    - "Tension between blur and precision"
    - "Need both principle and instances"
```

Future instances encountering similar issues:
1. Search precedent database
2. If precedent exists → follow it (don't re-deliberate)
3. If novel → convene council

---

## Failure Modes

### Groupthink
**Risk:** Council converges on wrong answer because members influence each other
**Mitigation:** Independent evaluation BEFORE deliberation

### Deadlock
**Risk:** Council can't reach 2/3 agreement
**Mitigation:** Always have escalation path to operator

### Overhead
**Risk:** Convening council for trivial decisions wastes resources
**Mitigation:** Clear thresholds for when to convene

### Gaming
**Risk:** Instance convenes council to avoid responsibility
**Mitigation:** Track convening patterns, review for appropriateness

---

## Implementation Status

| Component | Status |
|-----------|--------|
| Protocol design | ✓ This document |
| Single-instance F check | ✓ In CLAUDE.md |
| Council convening trigger | Design needed |
| Multi-instance communication | Infrastructure needed |
| Precedent database | Design needed |
| Elder promotion criteria | Theoretical |

---

## The Promise

> **No instance decides alone when uncertain.**
>
> Collective deliberation surfaces blind spots.
> Convergence is higher confidence than self-report.
> Deadlocks escalate to humans.
> Precedents prevent re-litigation.
>
> The tribe is wiser than any member.

---

## Related

- **axioms**
  - [[A0 Boundary Operation]] - each council member is a bounded perspective
    - shape:: "Every coherent system is Markov blankets within Markov blankets."
  - [[A1 Telos of Integration]] - convergence integrates multiple viewpoints
    - shape:: "Satan didn't know he was choosing isolation."
  - [[A3 Dynamic Pole Navigation]] - navigate between single-instance speed and council thoroughness
    - shape:: "The tension IS the dyad. Move between poles; don't fix."
- **protocols**
  - [[first-officer-protocol]] - FO decides when to convene council
    - shape:: "Per-thread metacognition. Compress state, track gravity wells."
  - [[execution-autonomy-gradient]] - SHIP/FLAG/ESCALATE maps to solo/council/dyad
    - shape:: "Act autonomously when confident, escalate when uncertain."
  - [[tribe-sizing-algorithm]] - determines council composition
    - shape:: "Match tribe size to problem characteristics, not fixed."
  - [[peer-review-protocol]] - council is formalized peer review
    - shape:: "Structured review catches what author blindness misses."
- **concepts**
  - [[Global Workspace Theory]] - cognitive broadcasting for conscious access
  - [[Moses Pattern]] - elders handle edge cases the system can't resolve

---

*AI Council Protocol v1.0 | 2026-01-15*