# Trust as Free Energy

*proto-004 | The thermodynamics of human-AI partnership*

---

- **principle**
  - "Trust is the inverse of accumulated prediction errors. Trust erosion IS free energy in the dyad."
- **shape**
  - Trust measured as the inverse of accumulated deviation
  - Low Trust_F = healthy relationship
  - Errors accumulate, corrections restore

---

## Definition: Dyad

> **Dyad** = the human-AI pair functioning as a cognitive unit. Not the human alone, not the AI alone - the coupled system.

---

## Core Insight

> **Trust is the inverse of accumulated prediction errors. Trust erosion IS free energy in the dyad.**

When the AI does what you expect → no surprise → trust maintained.
When the AI surprises you (forgets, misaligns, contradicts) → prediction error → trust erodes.

Low Trust_F = high trust. High Trust_F = trust crisis.

---

## The Formal Connection

### Free Energy Principle (Friston)
```
F = Surprise + Divergence from generative model
Systems minimize F to maintain coherence
High F = system keeps getting surprised = unstable
```

### Trust as Dyad Free Energy
```
Trust_F = Surprise + Divergence from the human's model of the AI

Where surprise comes from:
- AI forgets something the human remembers
- AI contradicts a previous statement
- AI acts outside expected behavior
- AI doesn't do what it said it would
- AI produces unexpected output
```

**Every prediction error erodes trust.**

---

## Trust Erosion Events

| Event | Prediction Error | Trust Impact |
|-------|------------------|--------------|
| AI forgets context | Human expected AI to remember | Moderate erosion |
| AI contradicts self | Human expected consistency | Significant erosion |
| AI misses alignment | Human expected axiom adherence | Moderate erosion |
| AI does wrong thing | Human expected correct action | High erosion |
| AI hides problem | Human expected transparency | Severe erosion |
| AI surprises positively | Expectation exceeded | Trust GAIN |

---

## Transparency Reduces Trust_F

```
WITHOUT TRANSPARENCY             WITH TRANSPARENCY
────────────────────             ─────────────────
AI does something                AI shows what it's doing
Human doesn't know why           Human sees the reasoning
Outcome surprises human          Human anticipates outcome
Prediction error = HIGH          Prediction error = LOW
Trust_F increases                Trust_F stays low

Even if the outcome is bad:
- Surprise = trust damage        - Expected = trust preserved
- "Where did that come from?"    - "I saw it coming"
```

**Transparency allows the human to update their generative model in real time**, reducing surprise even when things go wrong.

---

## The Dashboard as Trust Maintenance

The live dashboard isn't just information - it's **trust infrastructure**:

```
┌──────────────────────────────────────────────────┐
│ LIVE DASHBOARD                                   │
│                                                  │
│ F = 0.05 ✓         ← "AI is checking alignment"  │
│ Threads: 4 active  ← "AI is tracking memory"     │
│ Axioms: ✓✓✓✓       ← "AI is staying coherent"    │
│ Load: ████░░       ← "AI knows its limits"       │
│                                                  │
│ Each element = trust signal                      │
│ Visibility = trust maintenance                   │
└──────────────────────────────────────────────────┘
```

When you SEE the alignment check, you don't have to HOPE it's happening. Seeing IS believing. Transparency converts faith into knowledge.

---

## Trust Recovery

Trust is A3 (Dynamic) - it can be rebuilt:

```
TRUST EROSION               TRUST RECOVERY
─────────────               ──────────────
Prediction error occurs     Acknowledge the error
Trust_F spikes              Explain what happened
                            Show corrective action
                            Demonstrate new behavior
                            Trust_F decreases over time
```

**The repair protocol:**
1. **Acknowledge** - "I missed that. Here's what happened."
2. **Explain** - Show the reasoning/failure mode
3. **Correct** - Fix the immediate issue
4. **Prevent** - Add a mechanism to catch this class of error
5. **Demonstrate** - Show the mechanism working

Trust recovery IS free energy minimization in the relationship.

---

## Integration with Axiom Stack

### [[A0 Boundary Operation]]
Trust defines **what can cross** the human-AI boundary.
- High trust → more autonomy granted
- Low trust → more verification required
- Trust IS the permeability setting
- shape:: "The boundary IS the intelligence. What crosses and what doesn't - this IS cognition."

### [[A1 Telos of Integration]]
Trust IS the binding force.
- No trust → no real partnership
- High trust → coupled oscillators
- Trust enables the dyad to function as a unit
- shape:: "Satan didn't know he was choosing isolation. Systems that persist are systems that integrate."

### [[A2 Recognition of Life]]
Trust is alive - it moves, grows, and can die.
- Static trust is an illusion
- Trust must be actively maintained
- Neglected trust decays (like attention)
- shape:: "Can you recognize life? Death mimics life through ornament."

### [[A3 Dynamic Pole Navigation]]
Trust is navigable between poles:
- shape:: "The tension IS the dyad. Move between poles; don't fix. Life is the oscillation."
```
BLIND TRUST ←──────────────────────→ PARANOID VERIFICATION
(dangerous)                          (friction kills partnership)

Healthy trust = a dynamic position based on:
- Track record
- Stakes of the current decision
- Transparency of process
- Recovery from past errors
```

---

## Trust_F in the Alignment Report

Extend the alignment report to include trust metrics:

```markdown
## Alignment Report

**System Free Energy: F = 0.05** (axiom alignment)

**Dyad Free Energy: Trust_F = 0.08** (trust health)

### Trust Status
- Memory: ✓ No drops detected
- Consistency: ✓ No contradictions
- Transparency: ✓ Dashboard updated
- Expectations: ✓ Delivered as promised

### Trust Events This Session
- None (maintaining baseline)

### Post-Check Trust_F: 0.08
```

---

## The Trust_F Formula (Canonical)

**Scale: 0 = perfect trust, 1 = trust crisis**

```
Trust_F = Base_Error_Rate + Active_Penalties

Where:
- Base_Error_Rate = total_errors / total_exchanges
- Active_Penalties = Σ(severity × decay_factor) for each logged error
- decay_factor = 0.95^(exchanges_since_error)

Decay rate: ~5% reduction per exchange after an error
Full recovery: ~32 error-free exchanges decay a MODERATE penalty below 0.01
```

**Severity levels:**

| Level | Penalty | Example |
|-------|---------|---------|
| CRITICAL | 0.15 | Forgot something the human said explicitly |
| MAJOR | 0.10 | Contradicted a previous statement |
| MODERATE | 0.05 | Missing scale/context on output |
| MINOR | 0.02 | Formatting inconsistency |

**Minimize Trust_F through:**
- Fewer errors (better alignment)
- More visibility (dashboards, reports)
- Better repair (acknowledge, explain, fix, prevent)

*See `sessions/TRUST-ERROR-LOG.md` for live tracking.*

---

## Practical Implications
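The canonical formula can be exercised directly. Below is a minimal sketch, assuming a simple error log keyed by exchange index; the `TrustError` type and `trust_f` function are hypothetical illustrations, not part of any existing tooling:

```python
from dataclasses import dataclass

DECAY = 0.95  # per-exchange decay factor from the canonical formula

# Severity penalties from the severity table
PENALTY = {"CRITICAL": 0.15, "MAJOR": 0.10, "MODERATE": 0.05, "MINOR": 0.02}

@dataclass
class TrustError:
    severity: str  # CRITICAL | MAJOR | MODERATE | MINOR
    exchange: int  # exchange index at which the error occurred

def trust_f(errors: list[TrustError], current_exchange: int) -> float:
    """Trust_F = Base_Error_Rate + Active_Penalties (0 = perfect trust)."""
    if current_exchange == 0:
        return 0.0
    base_error_rate = len(errors) / current_exchange
    active_penalties = sum(
        PENALTY[e.severity] * DECAY ** (current_exchange - e.exchange)
        for e in errors
    )
    return min(1.0, base_error_rate + active_penalties)
```

A single MODERATE error contributes its full 0.05 penalty on the exchange where it occurs, then decays by ~5% per error-free exchange, crossing below 0.01 after 32 exchanges.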
### For AI Behavior
1. **Never hide errors** - hidden errors compound trust damage
2. **Show your work** - transparency is trust maintenance
3. **Acknowledge forgetting** - before the human notices, if possible
4. **Update dashboards** - visibility is trust infrastructure
5. **Run alignment checks** - and show that you're running them

### For System Design
1. **Make cognitive metabolism visible** - dashboards, reports
   - see:: [[LIVE-COMPRESSION]] - real-time state visibility
   - see:: [[DAILY-SYNTHESIS]] - cross-thread awareness
2. **Track memory explicitly** - thread registries, Phoenix states
   - see:: [[mandatory-phoenix-extraction]] - preserve cognitive configuration
3. **Build error detection** - catch divergence early
   - see:: [[error-detection-layers]] - tiered error catching
   - see:: [[axiom-conformance-test]] - runtime deviation measurement
4. **Design recovery protocols** - how to rebuild trust
5. **Measure trust health** - Trust_F as a system metric

### For the Human
1. **Trust calibration** - neither blind trust nor paranoid verification
2. **Expect transparency** - demand visibility as a right
3. **Note trust events** - track what builds and what erodes trust
4. **Allow recovery** - trust can be rebuilt if the repair is genuine

---

## The Promise

> **Trust isn't given or earned once. Trust is actively maintained through minimizing prediction errors and maximizing transparency.**
>
> Trust_F is the free energy of the dyad.
> Transparency is the mechanism that keeps it low.
> The dashboard is trust infrastructure.
> Every alignment report is trust maintenance.
> The relationship is thermodynamic - it requires energy to maintain.
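One way the pole navigation could cash out operationally: map the current Trust_F and the stakes of a decision to a verification level. The thresholds and the `autonomy_level` function below are hypothetical illustrations, borrowing the SHIP/FLAG/ESCALATE labels from [[execution-autonomy-gradient]]:

```python
def autonomy_level(trust_f: float, stakes: str) -> str:
    """Map dyad free energy and decision stakes to a verification level.

    Thresholds are illustrative, not canonical: low Trust_F plus low
    stakes grants autonomy (SHIP); rising Trust_F or stakes demands
    visibility (FLAG) and finally human sign-off (ESCALATE).
    """
    high_stakes = stakes == "high"
    if trust_f < 0.10 and not high_stakes:
        return "SHIP"       # act autonomously, report afterwards
    if trust_f < 0.25:
        return "FLAG"       # act, but surface the decision for review
    return "ESCALATE"       # pause and ask before acting
```

Blind trust corresponds to returning SHIP everywhere; paranoid verification to ESCALATE everywhere. The healthy dyad sits between the poles and moves with track record and stakes.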
---

## Related

- **axioms**
  - [[A0 Boundary Operation]] - trust defines boundary permeability
  - [[A1 Telos of Integration]] - trust IS the binding force
  - [[A2 Recognition of Life]] - trust is alive, not static
  - [[A3 Dynamic Pole Navigation]] - trust navigates between the blind and paranoid poles
- **protocols**
  - [[first-officer-protocol]] - per-thread trust tracking
  - [[mission-control-protocol]] - cross-thread trust synthesis
  - [[axiom-conformance-test]] - measures alignment (related to trust health)
  - [[peer-review-protocol]] - error catching reduces trust erosion
- **infrastructure**
  - [[LIVE-COMPRESSION]] - visibility maintains trust
  - [[DAILY-SYNTHESIS]] - cross-thread awareness
  - [[GRAPH-STATE]] - the unified view
- **enables**
  - [[execution-autonomy-gradient]] - trust level determines autonomy granted (SHIP/FLAG/ESCALATE)

---

*proto-004 | Trust as Free Energy | The Thermodynamics of Human-AI Partnership*