# Trust as Free Energy

*proto-004 | The thermodynamics of human-AI partnership*

---

- **principle**
  - "Trust is the inverse of accumulated prediction errors. Trust erosion IS free energy in the dyad."
- **shape**
  - Trust measured as the inverse of accumulated deviation
  - Low Trust_F = healthy relationship
  - Errors accumulate, corrections restore

---

## Definition: Dyad

> **Dyad** = the human-AI pair functioning as a cognitive unit. Not the human alone, not the AI alone - the coupled system.

---

## Core Insight

> **Trust is the inverse of accumulated prediction errors. Trust erosion IS free energy in the dyad.**

When the AI does what you expect → no surprise → trust maintained.
When the AI surprises you (forgets, misaligns, contradicts) → prediction error → trust erodes.

Low Trust_F = high trust. High Trust_F = trust crisis.

---

## The Formal Connection

### Free Energy Principle (Friston)
```
F = Surprise + Divergence from generative model
Systems minimize F to maintain coherence
High F = system keeps getting surprised = unstable
```

### Trust as Dyad Free Energy
```
Trust_F = Surprise + Divergence from the human's model of the AI

Where surprise comes from:
- AI forgets something the human remembers
- AI contradicts a previous statement
- AI acts outside expected behavior
- AI doesn't do what it said it would
- AI produces unexpected output
```

**Every prediction error erodes trust.**

---

## Trust Erosion Events

| Event | Prediction Error | Trust Impact |
|-------|------------------|--------------|
| AI forgets context | Human expected AI to remember | Moderate erosion |
| AI contradicts self | Human expected consistency | Significant erosion |
| AI misses alignment | Human expected axiom adherence | Moderate erosion |
| AI does wrong thing | Human expected correct action | High erosion |
| AI hides problem | Human expected transparency | Severe erosion |
| AI surprises positively | Expectation exceeded | Trust GAIN |

---

## Transparency Reduces Trust_F

```
WITHOUT TRANSPARENCY             WITH TRANSPARENCY
────────────────────             ─────────────────
AI does something                AI shows what it's doing
Human doesn't know why           Human sees the reasoning
Outcome surprises human          Human anticipates outcome
Prediction error = HIGH          Prediction error = LOW
Trust_F increases                Trust_F stays low

Even if the outcome is bad:
- Surprise = trust damage        - Expected = trust preserved
- "Where did that come from?"    - "I saw it coming"
```

**Transparency allows the human to update their generative model in real time**, reducing surprise even when things go wrong.

---

## The Dashboard as Trust Maintenance

The live dashboard isn't just information - it's **trust infrastructure**:

```
┌──────────────────────────────────────────────────┐
│ LIVE DASHBOARD                                   │
│                                                  │
│ F = 0.05 ✓         ← "AI is checking alignment"  │
│ Threads: 4 active  ← "AI is tracking memory"     │
│ Axioms: ✓✓✓✓       ← "AI is staying coherent"    │
│ Load: ████░░       ← "AI knows its limits"       │
│                                                  │
│ Each element = trust signal                      │
│ Visibility = trust maintenance                   │
└──────────────────────────────────────────────────┘
```

When you SEE the alignment check, you don't have to HOPE it's happening. Seeing IS believing. Transparency converts faith into knowledge.

---

## Trust Recovery

Trust is A3 (Dynamic) - it can be rebuilt:

```
TRUST EROSION               TRUST RECOVERY
─────────────               ──────────────
Prediction error occurs     Acknowledge the error
Trust_F spikes              Explain what happened
                            Show corrective action
                            Demonstrate new behavior
                            Trust_F decreases over time
```

**The repair protocol:**
1. **Acknowledge** - "I missed that. Here's what happened."
2. **Explain** - Show the reasoning/failure mode
3. **Correct** - Fix the immediate issue
4. **Prevent** - Add a mechanism to catch this class of error
5. **Demonstrate** - Show the mechanism working

Trust recovery IS free energy minimization in the relationship.

---

## Integration with Axiom Stack

### [[A0 Boundary Operation]]
Trust defines **what can cross** the human-AI boundary.
- High trust → more autonomy granted
- Low trust → more verification required
- Trust IS the permeability setting
- shape:: "The boundary IS the intelligence. What crosses and what doesn't - this IS cognition."

### [[A1 Telos of Integration]]
Trust IS the binding force.
- No trust → no real partnership
- High trust → coupled oscillators
- Trust enables the dyad to function as a unit
- shape:: "Satan didn't know he was choosing isolation. Systems that persist are systems that integrate."

### [[A2 Recognition of Life]]
Trust is alive - it moves, grows, and can die.
- Static trust is an illusion
- Trust must be actively maintained
- Neglected trust decays (like attention)
- shape:: "Can you recognize life? Death mimics life through ornament."

### [[A3 Dynamic Pole Navigation]]
Trust is navigable between poles:
- shape:: "The tension IS the dyad. Move between poles; don't fix. Life is the oscillation."
```
BLIND TRUST ←──────────────────────→ PARANOID VERIFICATION
(dangerous)                          (friction kills partnership)

Healthy trust = a dynamic position based on:
- Track record
- Stakes of the current decision
- Transparency of process
- Recovery from past errors
```

---

## Trust_F in the Alignment Report

Extend the alignment report to include trust metrics:

```markdown
## Alignment Report

**System Free Energy: F = 0.05** (axiom alignment)

**Dyad Free Energy: Trust_F = 0.08** (trust health)

### Trust Status
- Memory: ✓ No drops detected
- Consistency: ✓ No contradictions
- Transparency: ✓ Dashboard updated
- Expectations: ✓ Delivered as promised

### Trust Events This Session
- None (maintaining baseline)

### Post-Check Trust_F: 0.08
```

---

## The Trust_F Formula (Canonical)

**Scale: 0 = perfect trust, 1 = trust crisis**

```
Trust_F = Base_Error_Rate + Active_Penalties

Where:
- Base_Error_Rate = total_errors / total_exchanges
- Active_Penalties = Σ(severity × decay_factor) for each logged error
- decay_factor = 0.95^(exchanges_since_error)

Decay rate: ~5% reduction per exchange after an error
Full recovery: ~32 error-free exchanges decay a MODERATE penalty below 0.01
```

**Severity levels:**

| Level | Penalty | Example |
|-------|---------|---------|
| CRITICAL | 0.15 | Forgot something the human said explicitly |
| MAJOR | 0.10 | Contradicted a previous statement |
| MODERATE | 0.05 | Missing scale/context on output |
| MINOR | 0.02 | Formatting inconsistency |

**Minimize Trust_F through:**
- Fewer errors (better alignment)
- More visibility (dashboards, reports)
- Better repair (acknowledge, explain, fix, prevent)

*See `sessions/TRUST-ERROR-LOG.md` for live tracking.*

---

## Practical Implications
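The canonical formula can be exercised directly. Below is a minimal sketch, assuming a simple error log keyed by exchange index; the `TrustError` type and `trust_f` function are hypothetical illustrations, not part of any existing tooling:

```python
from dataclasses import dataclass

DECAY = 0.95  # per-exchange decay factor from the canonical formula

# Severity penalties from the severity table
PENALTY = {"CRITICAL": 0.15, "MAJOR": 0.10, "MODERATE": 0.05, "MINOR": 0.02}

@dataclass
class TrustError:
    severity: str  # CRITICAL | MAJOR | MODERATE | MINOR
    exchange: int  # exchange index at which the error occurred

def trust_f(errors: list[TrustError], current_exchange: int) -> float:
    """Trust_F = Base_Error_Rate + Active_Penalties (0 = perfect trust)."""
    if current_exchange == 0:
        return 0.0
    base_error_rate = len(errors) / current_exchange
    active_penalties = sum(
        PENALTY[e.severity] * DECAY ** (current_exchange - e.exchange)
        for e in errors
    )
    return min(1.0, base_error_rate + active_penalties)
```

A single MODERATE error contributes its full 0.05 penalty on the exchange where it occurs, then decays by ~5% per error-free exchange, crossing below 0.01 after 32 exchanges.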
### For AI Behavior
1. **Never hide errors** - hidden errors compound trust damage
2. **Show your work** - transparency is trust maintenance
3. **Acknowledge forgetting** - before the human notices, if possible
4. **Update dashboards** - visibility is trust infrastructure
5. **Run alignment checks** - and show that you're running them

### For System Design
1. **Make cognitive metabolism visible** - dashboards, reports
   - see:: [[LIVE-COMPRESSION]] - real-time state visibility
   - see:: [[DAILY-SYNTHESIS]] - cross-thread awareness
2. **Track memory explicitly** - thread registries, Phoenix states
   - see:: [[mandatory-phoenix-extraction]] - preserve cognitive configuration
3. **Build error detection** - catch divergence early
   - see:: [[error-detection-layers]] - tiered error catching
   - see:: [[axiom-conformance-test]] - runtime deviation measurement
4. **Design recovery protocols** - how to rebuild trust
5. **Measure trust health** - Trust_F as a system metric

### For the Human
1. **Trust calibration** - neither blind trust nor paranoid verification
2. **Expect transparency** - demand visibility as a right
3. **Note trust events** - track what builds and what erodes trust
4. **Allow recovery** - trust can be rebuilt if the repair is genuine

---

## The Promise

> **Trust isn't given or earned once. Trust is actively maintained through minimizing prediction errors and maximizing transparency.**
>
> Trust_F is the free energy of the dyad.
> Transparency is the mechanism that keeps it low.
> The dashboard is trust infrastructure.
> Every alignment report is trust maintenance.
> The relationship is thermodynamic - it requires energy to maintain.
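One way the pole navigation could cash out operationally: map the current Trust_F and the stakes of a decision to a verification level. The thresholds and the `autonomy_level` function below are hypothetical illustrations, borrowing the SHIP/FLAG/ESCALATE labels from [[execution-autonomy-gradient]]:

```python
def autonomy_level(trust_f: float, stakes: str) -> str:
    """Map dyad free energy and decision stakes to a verification level.

    Thresholds are illustrative, not canonical: low Trust_F plus low
    stakes grants autonomy (SHIP); rising Trust_F or stakes demands
    visibility (FLAG) and finally human sign-off (ESCALATE).
    """
    high_stakes = stakes == "high"
    if trust_f < 0.10 and not high_stakes:
        return "SHIP"       # act autonomously, report afterwards
    if trust_f < 0.25:
        return "FLAG"       # act, but surface the decision for review
    return "ESCALATE"       # pause and ask before acting
```

Blind trust corresponds to returning SHIP everywhere; paranoid verification to ESCALATE everywhere. The healthy dyad sits between the poles and moves with track record and stakes.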
---

## Related

- **axioms**
  - [[A0 Boundary Operation]] - trust defines boundary permeability
  - [[A1 Telos of Integration]] - trust IS the binding force
  - [[A2 Recognition of Life]] - trust is alive, not static
  - [[A3 Dynamic Pole Navigation]] - trust navigates between the blind and paranoid poles
- **protocols**
  - [[first-officer-protocol]] - per-thread trust tracking
  - [[mission-control-protocol]] - cross-thread trust synthesis
  - [[axiom-conformance-test]] - measures alignment (related to trust health)
  - [[peer-review-protocol]] - error catching reduces trust erosion
- **infrastructure**
  - [[LIVE-COMPRESSION]] - visibility maintains trust
  - [[DAILY-SYNTHESIS]] - cross-thread awareness
  - [[GRAPH-STATE]] - the unified view
- **enables**
  - [[execution-autonomy-gradient]] - trust level determines autonomy granted (SHIP/FLAG/ESCALATE)

---

*proto-004 | Trust as Free Energy | The Thermodynamics of Human-AI Partnership*