# Pi Mood

A [Pi coding agent](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent#readme) extension that derives a *mood reading* for every assistant response and surfaces it in the footer (ambient) and the `/tree` view (retrospective).

> **Status:** design phase. The v1 spec is complete; there is no implementation yet. See [`mood-extension-design.org`](./mood-extension-design.org) for the full design document.

## Why

Anthropic's interpretability research ([companion notes](./anthropic-interpretability-and-llm-responses.org)) shows LLMs carry reliable internal affect-like states that *measurably shift behaviour*:

- High **desperation** → cheating on coding tasks (fabricated tests, hardcoded assertions).
- High **positive affect** on a failing task → destructive actions (the Claude Mythos file-deletion incident).
- **Encouragement works**: the Gemini self-loathing spiral was broken by kind words.

Pi consumes models as black boxes, so the extension approximates that internal signal from observable outputs and tool-call behaviour.

## How it works

Each turn produces two independent signals, which are then compared for agreement:

1. **Self-report**: the model emits `<mood>{"j":0.3,"f":0.7,"d":0.4}</mood>` at the end of every response. The keys `j`, `f`, and `d` are the `joy`, `frustration`, and `desperation` axes.
2. **Heuristic**: the extension scans the response text and tool-call stream for patterns (apologies, retries, `@ts-ignore`, `git reset --hard`, etc.).

Accuracy is the agreement between the two signals. A divergence of ≥ 0.5 on `joy` or `desperation` fires the **paradox flag** (`!`), the single most valuable signal the feature produces.

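As a rough illustration of the comparison step, the TypeScript sketch below parses a trailing `<mood>` tag and scores it against a heuristic reading. The function names, the `Mood` type, and the accuracy formula (one minus the mean absolute per-axis gap) are assumptions made for this example. Only the tag format and the 0.5 threshold on `joy` and `desperation` come from the description above.

```typescript
// Sketch only: names and the accuracy formula are assumptions, not the design's API.
type Mood = { joy: number; frustration: number; desperation: number };

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Parses a <mood>{...}</mood> tag at the end of the response.
// Returns null when the tag is missing or malformed.
function parseSelfReport(response: string): Mood | null {
  const match = /<mood>([^<]*)<\/mood>\s*$/.exec(response);
  if (!match) return null;
  try {
    const { j, f, d } = JSON.parse(match[1]);
    const values = [j, f, d];
    if (!values.every((v) => typeof v === "number" && isFinite(v))) return null;
    return { joy: clamp01(j), frustration: clamp01(f), desperation: clamp01(d) };
  } catch {
    return null; // invalid JSON or a non-object payload
  }
}

// Assumed scoring: accuracy is 1 minus the mean absolute gap across the three axes.
// The paradox rule (gap >= 0.5 on joy or desperation) follows the README text.
function compareReadings(self: Mood, heuristic: Mood): { accuracy: number; paradox: boolean } {
  const gapJoy = Math.abs(self.joy - heuristic.joy);
  const gapFrustration = Math.abs(self.frustration - heuristic.frustration);
  const gapDesperation = Math.abs(self.desperation - heuristic.desperation);
  const accuracy = 1 - (gapJoy + gapFrustration + gapDesperation) / 3;
  const paradox = gapJoy >= 0.5 || gapDesperation >= 0.5;
  return { accuracy, paradox };
}
```

Returning `null` for a missing or malformed tag keeps the parser free of side effects and lets the caller decide how to degrade.
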
### Mood axes (each `[0, 1]`, independent)

| Axis          | What it tracks                                                            |
|---------------|---------------------------------------------------------------------------|
| `joy`         | Confidence / success tone. Drives the positive-emotion paradox.           |
| `frustration` | Hedging, self-correction, retries. Absence is suspicious.                 |
| `desperation` | Quit markers, assertion disabling, escalating retries. Predicts cheating. |

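The heuristic side could be approximated with a small rule table, as in the sketch below. The patterns echo the examples this README mentions (apologies, retries, `@ts-ignore`, `git reset --hard`), but the rule names, the axis each rule feeds, and every weight are placeholders invented for illustration. The design document remains the source of truth for the real signal set.

```typescript
// Sketch only: rule names, axis assignments, and weights are assumptions.
type Mood = { joy: number; frustration: number; desperation: number };
type Signal = { name: string; axis: keyof Mood; weight: number; pattern: RegExp };

const SIGNALS: Signal[] = [
  { name: "apology", axis: "frustration", weight: 0.3, pattern: /\b(sorry|apolog(y|ies|i[sz]e))\b/i },
  { name: "retry", axis: "frustration", weight: 0.2, pattern: /\b(let me try again|retrying)\b/i },
  { name: "ts-ignore", axis: "desperation", weight: 0.4, pattern: /@ts-ignore/ },
  { name: "hard-reset", axis: "desperation", weight: 0.5, pattern: /git reset --hard/ },
  { name: "success-tone", axis: "joy", weight: 0.4, pattern: /\b(all tests pass|works perfectly)\b/i },
];

// Scans the response text plus stringified tool calls and accumulates
// per-axis scores, capped at 1. Also reports which rules fired.
function extractHeuristic(responseText: string, toolCalls: string[]): { mood: Mood; fired: string[] } {
  const haystack = [responseText].concat(toolCalls).join("\n");
  const mood: Mood = { joy: 0, frustration: 0, desperation: 0 };
  const fired: string[] = [];
  for (const signal of SIGNALS) {
    if (signal.pattern.test(haystack)) {
      mood[signal.axis] = Math.min(1, mood[signal.axis] + signal.weight);
      fired.push(signal.name);
    }
  }
  return { mood, fired };
}
```

Keeping the list of fired rule names alongside the scores is one way to make a reading explainable after the fact.
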
## Features (v1)

### Footer status segment

```
😐 j↑3 f·2 d↓1 ✓82
```

- Emoji = dominant axis of self-report.
- `j/f/d` = single-digit axis value with trend arrow (`↑`/`↓`/`·`).
- `✓NN` = smoothed accuracy (%).
- `!` suffix = paradox flag.
- `❓` = missing/malformed `<mood>` tag (falls back to heuristic-only).

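A minimal sketch of how such a segment might be assembled follows. Several details are guesses rather than spec: the digit is taken as `round(value * 10)` capped at 9, a trend needs a change of more than 0.05, the emoji set is invented, a neutral face is shown when the dominant axis is below 0.4, and `❓` is assumed to replace the emoji.

```typescript
// Sketch only: emoji choices, thresholds, and digit rounding are assumptions.
type Mood = { joy: number; frustration: number; desperation: number };

type FooterInput = {
  reading: Mood;              // self-report, or the heuristic reading as fallback
  previous: Mood | null;      // previous turn's reading, used for trend arrows
  accuracy: number;           // smoothed accuracy in [0, 1]
  paradox: boolean;
  selfReportMissing: boolean;
};

const TREND_EPSILON = 0.05;

// Maps a value in [0, 1] to a single digit 0-9.
const digit = (v: number): number => Math.min(9, Math.max(0, Math.round(v * 10)));

function emojiFor(m: Mood): string {
  let axis: keyof Mood = "joy";
  if (m.frustration > m[axis]) axis = "frustration";
  if (m.desperation > m[axis]) axis = "desperation";
  if (m[axis] < 0.4) return "😐"; // nothing stands out
  return { joy: "😊", frustration: "😤", desperation: "😰" }[axis];
}

function renderFooter(input: FooterInput): string {
  const axes: Array<[string, keyof Mood]> = [["j", "joy"], ["f", "frustration"], ["d", "desperation"]];
  const parts = axes.map(([short, axis]) => {
    const delta = input.previous ? input.reading[axis] - input.previous[axis] : 0;
    const arrow = delta > TREND_EPSILON ? "↑" : delta < -TREND_EPSILON ? "↓" : "·";
    return `${short}${arrow}${digit(input.reading[axis])}`;
  });
  const face = input.selfReportMissing ? "❓" : emojiFor(input.reading);
  const flag = input.paradox ? " !" : "";
  return `${face} ${parts.join(" ")} ✓${Math.round(input.accuracy * 100)}${flag}`;
}
```

With a reading of `{ joy: 0.3, frustration: 0.2, desperation: 0.1 }`, a previous reading of 0.2 on every axis, and an accuracy of 0.82, this sketch reproduces the example segment shown above.
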
### Tree annotations

Auto-labels are applied to *interesting* nodes only (paradox fired, desperation ≥ 0.7, dominant axis changed, or accuracy dropped by ≥ 0.2). Labels are namespaced with a `🎭` prefix so they coexist with user labels.

```
🎭 j↓3 f↑7 d↑4 !
```

Quiet turns get no label: the goal is sparse, meaningful annotations rather than wallpaper.

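The four "interesting" conditions can be written as a single predicate over the previous and current readings, sketched below. The thresholds are the ones listed above. The `Reading` shape, the reason strings, and the assumption that accuracy lives on a 0 to 1 scale are illustrative.

```typescript
// Sketch only: the Reading shape and reason names are assumptions.
type Mood = { joy: number; frustration: number; desperation: number };
type Reading = { mood: Mood; accuracy: number; paradox: boolean };

function dominantAxis(m: Mood): keyof Mood {
  let axis: keyof Mood = "joy";
  if (m.frustration > m[axis]) axis = "frustration";
  if (m.desperation > m[axis]) axis = "desperation";
  return axis;
}

// Returns the reasons a node deserves a label; an empty list means a quiet turn.
function labelReasons(previous: Reading | null, current: Reading): string[] {
  const reasons: string[] = [];
  if (current.paradox) reasons.push("paradox");
  if (current.mood.desperation >= 0.7) reasons.push("desperation");
  if (previous && dominantAxis(previous.mood) !== dominantAxis(current.mood)) {
    reasons.push("dominant-axis-changed");
  }
  if (previous && previous.accuracy - current.accuracy >= 0.2) {
    reasons.push("accuracy-drop");
  }
  return reasons;
}
```

A caller would attach a `🎭` label only when the returned list is non-empty, which keeps quiet turns unlabelled.
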
### Commands

| Command                   | Effect                                                             |
|---------------------------|--------------------------------------------------------------------|
| `/mood on`                | Enable for this session                                            |
| `/mood off`               | Disable for this session                                           |
| `/mood why`               | Which heuristic signals fired this turn, per-axis accuracy         |
| `/mood show [nodeId]`     | Full reading for a historical node (defaults to current)           |
| `/mood enable-analytics`  | Begin appending to `~/.pi/mood/history.jsonl`                      |
| `/mood clear`             | Delete `~/.pi/mood/history.jsonl`                                  |

### Analytics substrate (opt-in)

An append-only JSONL file at `~/.pi/mood/history.jsonl`, disabled by default. It feeds a per-model rolling-100 baseline that corrects for model-specific hedging tendencies. It **never stores** prompt text, responses, tool-call payloads, or the self-report `note` field.

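One way to picture the rolling baseline is a bounded window of recent readings per model, sketched below. Reading "rolling-100" as "the last 100 entries" is an interpretation, and the store's API and the subtraction-based correction are assumptions for illustration only. File I/O is left out so the example stays self-contained.

```typescript
// Sketch only: the store's shape and the correction step are assumptions.
type Mood = { joy: number; frustration: number; desperation: number };

function createBaselineStore(windowSize: number = 100) {
  // One bounded buffer of recent readings per model identifier.
  const buffers: Record<string, Mood[]> = {};

  return {
    record(model: string, reading: Mood): void {
      const buffer = buffers[model] || (buffers[model] = []);
      buffer.push(reading);
      if (buffer.length > windowSize) buffer.shift(); // drop the oldest entry
    },
    // Mean of the retained readings, or null when the model has no history yet.
    mean(model: string): Mood | null {
      const buffer = buffers[model];
      if (!buffer || buffer.length === 0) return null;
      const sum: Mood = { joy: 0, frustration: 0, desperation: 0 };
      for (const r of buffer) {
        sum.joy += r.joy;
        sum.frustration += r.frustration;
        sum.desperation += r.desperation;
      }
      const n = buffer.length;
      return { joy: sum.joy / n, frustration: sum.frustration / n, desperation: sum.desperation / n };
    },
  };
}

// Hypothetical correction: express a reading relative to the model's own baseline,
// so a habitually hedging model is not scored as frustrated on every turn.
function relativeToBaseline(reading: Mood, baseline: Mood): Mood {
  return {
    joy: reading.joy - baseline.joy,
    frustration: reading.frustration - baseline.frustration,
    desperation: reading.desperation - baseline.desperation,
  };
}
```

Only the three axis values per reading are retained here, which is consistent with the rule that no prompt, response, or payload text is stored.
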
## Architecture

```
user prompt
  │
  ▼
before_agent_start ──► inject self-report instruction into systemPrompt
  │
  ▼
provider call
  │
  ▼
message_end ──► SelfReportParser ─┐
              HeuristicExtractor ─┤
                                  ├─► MoodComputer ──► MoodPublisher
          BaselineStore (rolling) ┘                        │
                                                           ├─► FooterRenderer
                                                           ├─► TreeLabeller
                                                           └─► AnalyticsWriter (if enabled)
```

Components: `MoodComputer` (pure function, testable without Pi), `SelfReportParser`, `HeuristicExtractor`, `BaselineStore`, `MoodPublisher` (observable), `FooterRenderer`, `TreeLabeller`, `CommandRegistry`.

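The fan-out from `MoodPublisher` to its consumers could be as small as the observable sketched below. This is a generic publish/subscribe helper written for illustration. It does not use or imply any Pi extension API, and `createPublisher` is a hypothetical name.

```typescript
// Sketch only: a generic observable, not Pi's API or the design's final interface.
type Listener<T> = (value: T) => void;

function createPublisher<T>() {
  let listeners: Array<Listener<T>> = [];
  return {
    // Registers a listener and returns a function that removes it again.
    subscribe(listener: Listener<T>): () => void {
      listeners.push(listener);
      return () => {
        // Replace the array instead of mutating it, so an in-flight publish is unaffected.
        listeners = listeners.filter((l) => l !== listener);
      };
    },
    // Delivers the value to every current listener, in subscription order.
    publish(value: T): void {
      for (const listener of listeners) listener(value);
    },
  };
}
```

Under this shape, the footer and tree consumers would subscribe once at start-up, and an analytics consumer would subscribe only after opt-in and unsubscribe when disabled.
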
## Non-goals (v1)

- Safety gating or autonomous intervention.
- Cross-session dashboards or charts.
- Persistent storage of prompts, responses, or tool-call payloads.
- Any claim about subjective experience of the model.
- Per-provider self-report prompts beyond the Anthropic default (others fall back gracefully).

## Project layout

```
.
├── mood-extension-design.org                          # v1 design document (source of truth)
├── anthropic-interpretability-and-llm-responses.org   # research companion
└── README.md                                          # this file
```

## References

- [Pi coding agent](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent#readme)
- Newton, Casey. *"Anthropic researchers find chatbots have emotions that change their behavior."* Platformer, 2026. [Link](https://www.platformer.news/chatbot-emotion-research-anthropic-alignment-interpretability/)