# Attention System Architecture

> "Attention is all you need" - both the transformer insight and literally true.

## The Core Insights

### 1. Attention has velocity, not just position

It's not enough to know WHERE you looked - the system needs to know WHERE YOU'RE HEADING. The trajectory is the predictive signal that allows the system to surface things BEFORE you ask for them.

### 2. Sessions aren't isolated - they're one distributed conversation

When you have multiple Claude sessions open (Claude App, Claude Code, multiple terminals), you're not having N separate conversations. You're having **ONE conversation with yourself**, mediated by Claude instances. Ideas planted in one session emerge in another. The sessions inform each other.

This is **cross-session attention** - tracking the unified thread of attention that weaves across fragmented interfaces.

### 3. Aha moments are attention crystallizing into insight

Two phenomenologically distinct types:

| Type | Pattern | Experience | Graph Effect |
|------|---------|------------|--------------|
| **Discovery** | weak → strong | Surprise, expansion | Graph expands - new territory |
| **Architectural** | strong → strong | "Resonant clunk", settling | Graph contracts - space collapses |

The "resonant clunk" is the sound of graph distance shrinking. Disparate things clicking together. That's why architectural insights feel like settling - something finds its place.

### 4. The body knows before the mind articulates

Biometric signals validate aha moments:
- **Discovery**: HR spike + GSR spike (arousal, surprise)
- **Architectural**: HRV increase (parasympathetic, settling)

When the Tobii arrives, pupil dilation adds another signal - cognitive load and interest.

## The Single Pane Principle

**The daily note is the single pane of attention.**

Everything surfaces to one place. No hunting across apps, notes, databases. If it's important enough to warrant your attention, it appears in the daily note. This is the Markov blanket around your cognitive state for the day.

```
┌──────────────────────────────────────────────────────────────────────┐
│                              DAILY NOTE                              │
│                      (Single Pane of Attention)                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ## Action Items              ← Decisions requiring attention        │
│  - [ ] Choose Overcast sync                                          │
│                                                                      │
│  ## Attention Surface         ← Auto-surfaced by system              │
│  - [0.8] Unresolved: concept X                                       │
│  - [0.6] Ahead on trajectory: topic Y                                │
│                                                                      │
│  ## Current Attention State   ← Where you are right now              │
│  - Focus: episode (listen)                                           │
│  - Direction: confidence 0.7, speed 0.03                             │
│  - Active wells: markov_blankets, architecture                       │
│  - Nagging: 2 unresolved items                                       │
│                                                                      │
│  ## Surfaced from Sessions    ← High-resonance from conversations    │
│  - "We were defining a pipeline..." #question                        │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
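As a toy illustration of the surfacing step (the real logic lives in the `DailyNoteIntegration` class described below; the date-based filename is an assumption):

```python
from datetime import date
from pathlib import Path


def surface_to_daily_note(notes_dir: str, score: float, text: str) -> None:
    """Append one surfaced item under '## Attention Surface' in today's note."""
    note = Path(notes_dir) / f"{date.today().isoformat()}.md"
    lines = note.read_text().splitlines() if note.exists() else ["## Attention Surface"]

    entry = f"- [{score:.1f}] {text}"
    if "## Attention Surface" in lines:
        lines.insert(lines.index("## Attention Surface") + 1, entry)  # directly under the heading
    else:
        lines += ["", "## Attention Surface", entry]

    note.write_text("\n".join(lines) + "\n")
```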
## System Components

### 1. Attention Tracker (`core/attention/tracker.py`)

Tracks the operator's attention through concept space.

```python
AttentionEvent       # A single gaze - looked at something
AttentionTrajectory  # Direction + velocity through concept space
AttentionState       # Current state: focus, trajectory, wells, unresolved
AttentionTracker     # Core tracker with history and trajectory computation
```

**Key methods:**
- `record(event)` - Record an attention event
- `get_state()` - Get current attention state
- `mark_unresolved(id)` - Add to nag list
- `resolve(id)` - Remove from nag list

### 2. Attention Loop (`core/attention/loop.py`)

The actuator that makes the system respond to attention.

```python
AttentionSource  # Configuration for an input source
SurfacingAction  # An action to take (add to daily, notify, etc.)
AttentionLoop    # The main processing loop
```

**The loop continuously** (sketched below):
1. Polls sources for new attention events
2. Records them in the tracker
3. Computes surfacing actions
4. Executes actions (update daily note)
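A minimal sketch of that cycle, assuming simplified `poll()` / `compute_actions()` / `execute()` method names (the real interfaces live in `core/attention/loop.py`):

```python
import asyncio


async def run_attention_loop(loop, poll_interval: float = 5.0):
    """Illustrative poll → record → surface cycle (simplified)."""
    while True:
        for source in loop.sources:           # 1. poll each configured source
            for event in source.poll():
                loop.tracker.record(event)    # 2. record in the tracker
        for action in loop.compute_actions(): # 3. decide what to surface
            loop.execute(action)              # 4. e.g. update the daily note
        await asyncio.sleep(poll_interval)
```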
### 3. Attention Sources

Currently implemented:
- **Podcast** (`podcast_pipeline.py`) - Parses Overcast OPML

Planned:
- Browser history
- Conversation highlighting
- File access patterns
- Calendar/meeting context

### 4. Daily Note Integration

The `DailyNoteIntegration` class handles:
- Surfacing items to the daily note
- Tracking what was surfaced when
- Attention state summary generation

### 5. Cross-Session Tracker (`core/attention/cross_session.py`)

Tracks attention across multiple concurrent sessions.

```python
SessionInfo            # Info about an active session
CrossSessionEvent      # Event bridging multiple sessions
UnifiedAttentionState  # Unified state across all sessions
CrossSessionTracker    # Detects cross-pollination, echoes, context switches
SessionStreamWatcher   # Watches session files and feeds the tracker
```

**Key capabilities:**
- Detects **topic bridges** (same topic appearing in multiple sessions)
- Tracks **context switches** (when you move between sessions)
- Identifies **cross-session attractors** (ideas hot across multiple sessions)
- Computes **unified trajectory** across all sessions
- Detects **echoes** (same idea appearing in multiple sessions)

**The insight:** When you're working in Session A and mention "pipeline", and Session B was just discussing "pipeline", that's not coincidence - it's your distributed attention weaving the sessions together.

### 6. Aha Detector (`core/attention/aha_detection.py`)

Detects moments when attention crystallizes into insight.

```python
AhaType            # DISCOVERY (weak→strong) or ARCHITECTURAL (strong→strong)
AhaCandidate       # A potential aha moment with confidence score
BiometricSnapshot  # HR, GSR, HRV data for validation
AhaDetector        # Detects from edge prediction, resonance, cross-session
AhaManager         # Surfaces ahas to daily note, log, audio, etc.
```

**Detection triggers:**
- Edge prediction connects distant concepts
- Resonance spike (concept suddenly much more resonant)
- Cross-session convergence (multiple sessions hit same topic)

**Surfacing targets:**
- `INLINE_MARKDOWN` - Inject into current content
- `AHA_LOG` - Append to dedicated log
- `AUDIO_ANNOUNCE` - TTS for ambient awareness
- `NAV_MARKER` - Add to navigation breadcrumb
- `DAILY_SUMMARY` - Include in daily note

### 7. Session Membrane (`core/attention/membrane.py`)

Semi-permeable boundary between sessions - allows selective context sharing.

```python
PermeabilityLevel   # CLOSED → ATTRACTORS → SUMMARIES → ALERTS → OPEN
CrossSessionSignal  # A piece of context that crosses the membrane
SessionMembrane     # Controls what crosses between sessions
```

**Permeability levels:**

| Level | What Crosses | Use Case |
|-------|--------------|----------|
| CLOSED | Nothing | Independent work |
| ATTRACTORS | Hot topic names only | Light awareness |
| SUMMARIES | Topic summaries | Context without detail |
| ALERTS | Real-time coherence alerts | Active cross-pollination |
| OPEN | Full context | Deep integration |

### 8. Eye Tracking (`core/attention/eye_tracking.py`)

Placeholder for Tobii integration - ground truth attention signal.

```python
GazePoint                   # Raw sample (x, y, pupil diameter)
Fixation                    # Sustained attention event with intensity
ScreenRegion                # Semantic target mapping
TobiiIntegration            # Hardware interface (placeholder)
EyeTrackingAttentionBridge  # Converts fixations to AttentionEvents
```

**Attention signal hierarchy:**
```
INFERRED                                                  DIRECT
◄────────────────────────────────────────────────►

Session topics    Podcast plays    Highlights    Eye gaze
(weak signal)     (medium)         (strong)      (ground truth)
```

### 9. Context Stream (`core/attention/context_stream.py`)

**Context is a river, not a bucket.**

Instead of fill-compact-fill sawtooth, maintain continuous flow:

```python
StreamPriority         # PINNED → ACTIVE → RECENT → DECAYING → EXITING
StreamItem             # Item with attention score, velocity, friction
ContextStream          # The continuous flow manager
AttentionStreamBridge  # Connects attention events to stream
```

**Stream vs Sawtooth:**
```
SAWTOOTH (old)                  STREAM (new)

Context ▲                       Context ▲
        │  ╱╲    ╱╲                     │ ════════════════════
        │ ╱  ╲  ╱  ╲                    │ items flow through
        │╱    ╲╱    ╲                   │ at attention-weighted
        │                               │ rates
        └──────────────►                └──────────────►
```

**Key dynamics** (sketched below):
- Items enter when attention lands on them
- Items exit when attention decays below threshold
- High attention → high friction → slow exit
- Low attention → low friction → fast exit
- Pinned items (unresolved) never exit
- Stream pressure accelerates exits when near capacity
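A toy sketch of those dynamics, assuming a simplified item with just a score and a pin flag (the real `StreamItem` / `ContextStream` carry richer velocity and friction state):

```python
from dataclasses import dataclass


@dataclass
class ToyStreamItem:
    item_id: str
    attention: float        # current attention score, 0-1
    pinned: bool = False    # unresolved items never exit


def decay_step(items, decay=0.9, exit_threshold=0.1, capacity=50):
    """One tick of the stream: decay attention, drop items below threshold."""
    # Near capacity, stream pressure raises the bar for staying in
    pressure = len(items) / capacity
    threshold = exit_threshold * (1.0 + pressure)

    survivors = []
    for item in items:
        item.attention *= decay  # high-attention items take longer to fall below threshold
        if item.pinned or item.attention >= threshold:
            survivors.append(item)
    return survivors
```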
### 10. Attention Compaction (`core/attention/compaction.py`)

Legacy batch compaction for daily notes (stream is preferred for live context):

```python
RetentionTier       # CORE → HOT → WARM → COOL → COLD
AttentionCompactor  # Makes compaction decisions
DailyNoteCompactor  # Applies compaction to daily notes
```

## The Attention Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                             ATTENTION ARCHITECTURE                               │
│                          "Attention is all you need"                             │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│   SOURCES                       PROCESSING                     OUTPUTS           │
│  ┌──────────┐                  ┌──────────┐                  ┌──────────┐        │
│  │ Podcast  │                  │ Attention│                  │  Daily   │        │
│  │  (OPML)  │─────────────────►│ Tracker  │─────────────────►│   Note   │        │
│  └──────────┘                  │          │                  └──────────┘        │
│  ┌──────────┐                  │ • Events │                  ┌──────────┐        │
│  │ Browser  │─────────────────►│ • Traj.  │─────────────────►│  God DB  │        │
│  │ History  │                  │ • Wells  │                  └──────────┘        │
│  └──────────┘                  │ • Nags   │                  ┌──────────┐        │
│  ┌──────────┐                  └────┬─────┘─────────────────►│  Notify  │        │
│  │ Claude   │───────┐               │                        └──────────┘        │
│  │ Sessions │       │               │                                            │
│  └──────────┘       │               ▼                                            │
│  ┌──────────┐       │          ┌──────────┐                                      │
│  │ Tobii    │───────┼─────────►│Trajectory│                                      │
│  │Eye Track │       │          │Calculator│                  ┌─────────────┐     │
│  └──────────┘       │          │          │                  │     Aha     │     │
│  ┌──────────┐       │          │• Position│─────────────────►│  Detector   │     │
│  │Biometric │       │          │• Velocity│                  │             │     │
│  │ (Watch)  │───────┼─────────►│• Predict │                  │ • Discovery │     │
│  └──────────┘       │          └──────────┘                  │ • Architect │     │
│                     │               │                        └──────┬──────┘     │
│                     │               │                               │            │
│                     │               ▼                               ▼            │
│                     │          ┌──────────┐                  ┌──────────┐        │
│                     │          │  Cross-  │                  │   Aha    │        │
│                     └─────────►│ Session  │─────────────────►│ Manager  │        │
│                                │ Tracker  │                  │          │        │
│                                │          │                  │• Inline  │        │
│                                │• Bridges │                  │• Log     │        │
│                                │• Echoes  │                  │• Audio   │        │
│                                │• Attract │                  │• Daily   │        │
│                                └────┬─────┘                  └──────────┘        │
│                                     │                                            │
│                                     ▼                                            │
│                                ┌──────────┐                                      │
│                                │ Session  │                                      │
│                                │ Membrane │                                      │
│                                │          │                                      │
│                                │ CLOSED   │                                      │
│                                │ ATTRACT  │                                      │
│                                │ SUMMARY  │                                      │
│                                │ ALERTS   │                                      │
│                                │ OPEN     │                                      │
│                                └──────────┘                                      │
│                                                                                   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Key Data Flows

1. **Attention → Trajectory**: Events compute where attention is heading
2. **Trajectory → Aha Detection**: Predicts connections, detects crystallization
3. **Sessions → Cross-Session**: Multiple sessions feed unified tracking
4. **Cross-Session → Aha Detection**: Convergence triggers architectural ahas
5. **Biometrics → Aha Validation**: Body confirms what mind discovers
6. **Aha → Daily Note**: Validated insights surface to single pane

## Key Concepts

### Attention Event

A single moment of attention:
- **target_id** - What you looked at (bullet UUID, episode ID, etc.)
- **target_type** - Category (bullet, episode, concept, link)
- **modality** - How you attended (read, listen, highlight, search)
- **intensity** - Depth of attention (0-1, highlight > read > glance)
- **source** - Where the attention came from (podcast, conversation, browser)

### Trajectory

The direction and speed through concept space (sketched below):
- **position** - Current centroid of recent attention (embedding coordinates)
- **velocity** - Direction and speed of movement
- **predictions** - Where you'll be in 1, 2, 5 seconds
- **ahead** - Concepts in the direction of travel
- **peripheral** - Concepts nearby but not focused
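A minimal sketch of the prediction step, assuming concepts are embedded as vectors (the real trajectory computation in `tracker.py` is richer):

```python
import numpy as np


def predict_positions(recent_embeddings, horizons=(1.0, 2.0, 5.0)):
    """Toy position / velocity / prediction in concept space."""
    points = np.asarray(recent_embeddings, dtype=float)  # one row per recent attention event
    position = points.mean(axis=0)                       # centroid of recent attention
    velocity = points[-1] - points[-2] if len(points) > 1 else np.zeros_like(position)
    return {
        "position": position,
        "velocity": velocity,
        "predictions": {t: position + velocity * t for t in horizons},  # 1, 2, 5 s ahead
    }
```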
### Gravity Wells

Topics that consistently attract attention:
- High mass = strong pull
- Large radius = influences related topics
- Slow decay = they persist over time
- Activation increases when you attend to them

### Nag List (Unresolved)

Items that persist until explicitly resolved:
- Open questions
- Identified gaps
- Predictions that need validation
- Decisions deferred

## Integration Points

### God Database

```python
# attention_log table
record_attention(bullet_uuid, session_id)
get_attention_log(bullet_uuid, limit)
```

### Resonance Engine

Attention feeds resonance scoring (toy example below):
- `access_frequency` - How often you've attended
- `recency` - Exponential decay from last attention
- `importance_markers` - Explicit flags from attention
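As a rough illustration of how those inputs could combine (the weights and half-life here are made up; the real formula lives in the resonance engine):

```python
import math
from datetime import datetime


def resonance_score(access_count: int, last_access: datetime,
                    important: bool, half_life_hours: float = 24.0) -> float:
    """Toy resonance: frequency + exponentially decayed recency + importance flag."""
    age_hours = (datetime.now() - last_access).total_seconds() / 3600
    recency = math.exp(-math.log(2) * age_hours / half_life_hours)  # halves every half-life
    frequency = min(access_count / 10.0, 1.0)                       # saturates at 10 accesses
    importance = 1.0 if important else 0.0
    return 0.4 * frequency + 0.4 * recency + 0.2 * importance
```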
### Flight Protocol

Attention behavior adapts by phase:
- **FLY_HIGH** - Surface diverse, serendipitous connections
- **RETAIN** - Highlight high-resonance items for marking
- **LAND** - Focus on specific items, surface dependencies
- **BIRTH** - Run compaction, synthesize
- **SAFETY** - Backup, prepare tomorrow's daily note

### Cognitive Fingerprint

Attention trains the fingerprint:
- New gravity wells emerge from consistent attention
- Pattern preferences adjust based on what resonates
- Altitude distribution reflects where you think

## Cross-Session Architecture

When you have multiple Claude sessions open, the system needs to track the **unified thread of attention** that weaves across them.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           CROSS-SESSION ATTENTION                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   SESSION A                  SESSION B                SESSION C            │
│   (Claude Code)              (Claude App)             (Claude Code)        │
│  ┌──────────┐               ┌──────────┐             ┌──────────┐          │
│  │ attention│               │ markov   │             │ podcast  │          │
│  │ system   │               │ blankets │             │ pipeline │          │
│  │          │               │          │             │          │          │
│  │ pipeline ├───────────────┤ pipeline │             │ attention│          │
│  └──────────┘    topic      └──────────┘             └────┬─────┘          │
│       │          bridge          │                        │                │
│       │                          │                        │                │
│       └──────────────────────────┴────────────────────────┘                │
│                                  │                                         │
│                           CROSS-SESSION                                    │
│                            ATTRACTORS                                      │
│                       ┌──────────────────┐                                 │
│                       │  • pipeline      │                                 │
│                       │  • attention     │                                 │
│                       │  (topics in 2+   │                                 │
│                       │   sessions)      │                                 │
│                       └──────────────────┘                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Cross-Session Events

The system detects several types of cross-session patterns:

| Event Type | Description | Example |
|------------|-------------|---------|
| `topic_bridge` | Same topic appears in multiple sessions | "pipeline" in Session A and B |
| `echo` | Same idea/content appears across sessions | Repeated insight surfaces |
| `context_switch` | Attention moves from one session to another | Switch from Code to App |

### Unified Trajectory

Instead of separate trajectories per session, the system computes a **unified trajectory** across all sessions. This represents the overall direction of your attention, regardless of which interface you're using.

```python
from core.attention import CrossSessionTracker

tracker = CrossSessionTracker()

# Register sessions as they appear
tracker.register_session("abc123", source="claude-code")
tracker.register_session("def456", source="claude-app")

# Record activity - detects cross-pollination automatically
events = tracker.record_activity(
    "abc123",
    topics=["attention", "pipeline"]
)

# Check for cross-session attractors
state = tracker.get_state()
print(f"Hot topics: {state.cross_session_attractors}")
```

### Integration with Session Streaming

The `SessionStreamWatcher` integrates with the existing session streaming infrastructure:

```python
from core.attention import create_cross_session_system

tracker, watcher = create_cross_session_system(
    sessions_dir="/Users/rcerf/repos/Sovereign_Estate/daily/sessions/"
)

# Process updates from all active sessions
cross_events = watcher.process_updates()

for event in cross_events:
    print(f"Cross-pollination: {event.content} bridged {event.source_session} → {event.target_session}")
```

## Usage

### Starting the Loop

```python
from core.attention import create_default_loop

loop = create_default_loop("/path/to/daily/notes/")
await loop.run()  # Runs continuously
```

### Recording Manual Attention

```python
from datetime import datetime

from core.attention import AttentionTracker, AttentionEvent

tracker = AttentionTracker()
tracker.record(AttentionEvent(
    timestamp=datetime.now(),
    target_id="bullet_123",
    target_type="bullet",
    modality="highlight",
    intensity=0.9,
    source="conversation"
))
```
### Marking Unresolved

```python
tracker.mark_unresolved("open_question_001")
# Later...
tracker.resolve("open_question_001")
```

### Cross-Session Tracking

```python
from core.attention import CrossSessionTracker, create_cross_session_system

# Create tracker and watcher
tracker, watcher = create_cross_session_system(
    sessions_dir="/Users/rcerf/repos/Sovereign_Estate/daily/sessions/"
)

# Watcher scans for active sessions
watcher.scan_sessions()

# Process updates and detect cross-pollination
events = watcher.process_updates()

# Check unified state
state = tracker.get_state()
print(f"Active sessions: {len(state.sessions)}")
print(f"Context switches: {len(state.switch_history)}")
print(f"Cross-session attractors: {state.cross_session_attractors}")
```

## File Locations

| Component | Path |
|-----------|------|
| Tracker | `core/attention/tracker.py` |
| Loop | `core/attention/loop.py` |
| Cross-Session | `core/attention/cross_session.py` |
| Membrane | `core/attention/membrane.py` |
| Aha Detection | `core/attention/aha_detection.py` |
| Eye Tracking | `core/attention/eye_tracking.py` |
| Compaction | `core/attention/compaction.py` |
| Podcast Pipeline | `core/attention/podcast_pipeline.py` |
| Daily Notes | `Sovereign_Estate/daily/` |
| Session Streams | `Sovereign_Estate/daily/sessions/` |
| Cooperative Eye Pattern | `Sovereign_Estate/patterns/cooperative-eye-principle.md` |
| Resonance Weights Pattern | `Sovereign_Estate/patterns/resonance-weighted-attention.md` |

## Usage Examples

### Aha Detection

```python
from core.attention import create_aha_system, AhaType

detector, manager = create_aha_system(
    daily_note_path="/path/to/daily/2026-01-13.md"
)

# Detect from edge prediction
candidate = detector.detect_from_edge_prediction(
    source_concepts=["attention_tracking"],
    target_concepts=["aha_detection"],
    prediction_confidence=0.85,
    graph_distance=4.2
)
# → Discovery aha: distant concepts connected

# Detect from resonance spike
candidate = detector.detect_from_resonance_spike(
    concept="markov_blankets",
    resonance_before=0.7,
    resonance_after=0.95,
    related_concepts=["attention", "membrane"]
)
# → Architectural aha: strong→stronger settling

# Detect from cross-session convergence
candidate = detector.detect_from_cross_session(
    topic="attention_is_all_you_need",
    session_ids=["abc123", "def456"],
    combined_intensity=0.82
)
# → Architectural aha: distributed cognition crystallizing
```

### Biometric Validation

```python
from core.attention import BiometricSnapshot

biometrics = BiometricSnapshot(
    timestamp=datetime.now(),
    heart_rate=85,
    heart_rate_baseline=72,   # 18% spike
    gsr=4.2,
    gsr_baseline=3.5,         # 20% spike
    hrv=65,
    hrv_baseline=55           # 18% increase
)

# Attach to candidate and validate
validated = detector.attach_biometrics(candidate.id, biometrics)
# For Discovery: needs HR + GSR spike
# For Architectural: needs HRV increase
```
### Session Membrane

```python
from core.attention import create_membrane_system, PermeabilityLevel

membrane = create_membrane_system(
    sessions_dir="/path/to/sessions/",
    permeability=PermeabilityLevel.ALERTS
)

# Check what should cross to current session
signals = membrane.get_signals_for_session("current_session_id")
for signal in signals:
    print(f"From {signal.source_session}: {signal.content}")
```

## Multi-Modal Attention Mesh

> **The body is distributed across space. So is attention.**

### The Hardware Stack

The attention mesh spans multiple devices, each contributing different sensing modalities:

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                            MULTI-MODAL ATTENTION MESH                            │
│                       "Every sensor is an attention antenna"                     │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│   SEATED MODE (Desk)                         AMBULATORY MODE (Mobile)            │
│  ┌─────────────────────┐                    ┌─────────────────────┐              │
│  │                     │                    │                     │              │
│  │  [iPad]     [iPad]  │  Room              │  AirPods Max        │  Audio       │
│  │    ◄          ►     │  Context           │  AirPods Pro 3      │  Spatial     │
│  │ (rear-L)  (rear-R)  │  3D Tracking       │     ┌─────┐         │  Head Track  │
│  │         ▼           │                    │     │     │         │              │
│  │    ┌─────────┐      │                    │     │  ⌚  │         │  Apple Watch │
│  │    │   YOU   │      │                    │     │     │         │  HR/HRV/GSR  │
│  │    └─────────┘      │                    │     └─────┘         │              │
│  │         ▲           │                    │                     │              │
│  │  ┌───┬───┬───┐      │                    │     ┌───────┐       │  Omi AI      │
│  │  │   │   │   │      │                    │     │   ◉   │       │  Always-on   │
│  │  │iPad│Tobii│iPhone │  Front Gaze        │     │  Omi  │       │  Audio       │
│  │  │cam │ 5  │cam     │  Parallax          │     └───────┘       │              │
│  │  └───┴───┴───┘      │                    │                     │              │
│  │                     │                    │     ┌───────┐       │  LED Glasses │
│  │    ┌─────────┐      │                    │     │ ≋≋≋≋  │       │  Peripheral  │
│  │    │ MacBook │      │  Side Gaze         │     │glasses│       │  Vision      │
│  │    │   cam   │      │  Fallback          │     └───────┘       │              │
│  │    └─────────┘      │                    │                     │              │
│  │                     │                    │     iPhone          │  Context     │
│  └─────────────────────┘                    │     (pocket)        │  Capture     │
│                                             │                     │              │
│                                             │  Meta Ray-Bans      │  First-Person│
│                                             │     ┌───────┐       │  Camera +    │
│                                             │     │ ◎   ◎ │       │  Meta AI     │
│                                             │     └───────┘       │              │
│                                             └─────────────────────┘              │
│                                                                                   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Device Inventory

| Device | Primary Signal | Secondary Signal | Mode |
|--------|---------------|------------------|------|
| **Tobii Eye Tracker 5** | IR gaze (x,y) | Pupil diameter, head pose | Seated |
| **iPhone** (below Tobii) | Camera gaze redundancy | Proximity context | Seated |
| **MacBook camera** | Parallax gaze (left screen) | Head position | Seated |
| **iPad camera** (left of MacBook) | Parallax gaze triangulation | | Seated |
| **iPad rear-left** | Room position, 3D mesh | Screen content (if visible) | Seated |
| **iPad rear-right** | Room position, stereo depth | Body posture | Seated |
| **Apple Watch** | HR, HRV | GSR, movement | Both |
| **AirPods Pro 3** | Spatial audio delivery | Head tracking, noise level | Both |
| **AirPods Max** | Spatial audio (focused) | Head tracking | Both |
| **Meta Ray-Bans** | First-person camera | Audio + Meta AI queries | Ambulatory |
| **Omi AI** | Always-on audio + webhooks | 250+ integrations, speaker ID | Ambulatory |
| **LED Glasses** | Peripheral vision display | (output only) | Ambulatory |

### Gaze Tracking Hierarchy

```
GROUND TRUTH                                                        INFERRED
◄────────────────────────────────────────────────────────────────►

Tobii IR        iPhone cam      Parallax        Head pose      Last known
(main screen)   (redundant)     (side screens)  (direction)    (decayed)
    │               │               │               │              │
    ▼               ▼               ▼               ▼              ▼
┌──────┐        ┌──────┐        ┌──────┐        ┌──────┐       ┌──────┐
│ 1.0  │        │ 0.9  │        │ 0.7  │        │ 0.4  │       │ 0.2  │
│trust │        │trust │        │trust │        │trust │       │trust │
└──────┘        └──────┘        └──────┘        └──────┘       └──────┘
```

When Tobii loses tracking (eyes leave main monitor), the system cascades (see the sketch after this list):

1. **iPhone camera** - Redundant front camera, validates Tobii loss
2. **MacBook + iPad parallax** - Triangulates gaze on left screens
3. **Head pose** - Infers attention direction from head rotation
4. **Decay** - Last known position with temporal decay
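A minimal sketch of that fallback cascade, assuming each source exposes a `latest()` estimate or `None` (names are illustrative only):

```python
SOURCE_TRUST = {
    "tobii": 1.0,
    "iphone_cam": 0.9,
    "parallax": 0.7,
    "head_pose": 0.4,
}


def best_gaze_estimate(sources: dict, last_known=None, decay_trust: float = 0.2):
    """Return (estimate, trust) from the highest-trust source that is reporting."""
    for name in sorted(SOURCE_TRUST, key=SOURCE_TRUST.get, reverse=True):
        estimate = sources[name].latest() if name in sources else None
        if estimate is not None:
            return estimate, SOURCE_TRUST[name]
    # Nothing reporting: fall back to last known position at low trust
    return last_known, decay_trust
```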
### Parallax Gaze Estimation

Using two cameras with known positions, we can estimate gaze point without IR:

```
                Target Point (on MacBook screen)
                        ●
                       /│\
                      / │ \
                     /  │  \
                    /   │   \
                   /    │    \
                  /     │     \
                 /      │      \
                /       │       \
               ▼        │        ▼
           MacBook     User     iPad
           Camera      Eyes     Camera
              ●─────────●─────────●
                  d1        d2
```

**Algorithm:**
1. Both cameras detect pupil centers
2. Known camera positions give baseline
3. Triangulate where gaze vectors intersect
4. Map intersection to screen coordinates

**Accuracy:** Lower than Tobii IR, but sufficient for coarse attention:
- Which screen (LEFT/CENTER/RIGHT)
- Which quadrant of screen
- Attention duration on region

### Room-Scale 3D Context

The rear iPads provide room-scale awareness:

```
┌─────────────────────────────────────────┐
│               ROOM VIEW                 │
│                                         │
│  [iPad-L]                   [iPad-R]    │
│     ◄────── 3D Mesh ─────────►          │
│           Stereo Depth                  │
│                                         │
│        ┌───────────────┐                │
│        │               │                │
│        │     USER      │                │
│        │    ┌───┐      │                │
│        │    │ ○ │ head │                │
│        │    └───┘      │                │
│        │     body      │                │
│        │               │                │
│        └───────────────┘                │
│                                         │
│   ┌─────────────────────────────┐       │
│   │       DESK / SCREENS        │       │
│   └─────────────────────────────┘       │
│                                         │
└─────────────────────────────────────────┘
```

**Captured signals:**
- **3D position** - Where you are in the room
- **Body posture** - Leaning, slouching, engaged
- **Head orientation** - Cross-validation with gaze
- **Screen content** (optional) - What's displayed (OCR if needed)
- **Object detection** - Coffee cup, phone, other attentional competitors

### Spatial Audio Attention Gradient

AirPods Pro 3 and AirPods Max deliver **attention gradients through 3D audio space**:

```
              ELEVATION (Abstraction Level)
                        ▲
                        │
         Strategic      │    ● Meta-insight
                        │      "Architecture settling"
                        │
                        │
         Tactical       │    ● Active thread
                        │      "Pipeline design"
                        │
                        │
         Operational    │    ● Immediate task
                        │      "Fix this bug"
                        │
            ───────────┼─────────────────────────────────►
                        │                        DISTANCE
                        │                        (Urgency)
          POSITION      │
          (Importance)  │    Close = urgent
                        │    Far = background
     Left = secondary   │
     Center = primary   │
     Right = future     │
```

**Mapping:**
- **Position (L/R)** → Importance gradient (center = most important)
- **Distance** → Urgency (close = needs attention soon)
- **Elevation** → Abstraction level (high = strategic, low = operational)

**Example audio cues:**
```python
attention_audio_map = {
    "urgent_decision":     AudioPosition(x=0,    y=0, z=0.5, distance=0.2),  # Center, close, tactical
    "background_insight":  AudioPosition(x=-0.7, y=0, z=0.8, distance=0.8),  # Left, far, strategic
    "immediate_task":      AudioPosition(x=0.3,  y=0, z=0.2, distance=0.1),  # Slight right, very close, operational
}
```
### LED Glasses Peripheral Display

Peripheral vision for cognitive state without demanding foveal attention:

```
┌─────────────────────────────────────────────────────────────┐
│                      LED GLASSES VIEW                        │
│                                                              │
│   LEFT EDGE                                   RIGHT EDGE     │
│   ┌──────┐                                    ┌──────┐       │
│   │▓▓▓▓▓▓│  Cognitive                         │░░░░░░│       │
│   │▓▓▓▓▓▓│  Load                              │░░░░░░│       │
│   │▓▓▓▓▓▓│  (red=high)      [ FOVEA ]         │░░░░░░│       │
│   │▓▓▓▓▓▓│                                    │░░░░░░│       │
│   │▓▓▓▓▓▓│  Attention                         │░░░░░░│       │
│   │▓▓▓▓▓▓│  State                             │░░░░░░│       │
│   └──────┘  (gradient)                        └──────┘       │
│                                                              │
│   Colors:                                                    │
│   ● Blue glow   = deep focus, flow state                     │
│   ● Amber pulse = pending decision, needs resolution         │
│   ● Green       = context aligned, proceeding well           │
│   ● Red edge    = cognitive overload warning                 │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```

**No foveal demand** - You never need to "look at" the glasses. The peripheral glow provides ambient awareness of your cognitive state.

### Mode Transition: Seated ↔ Ambulatory

The system detects mode transitions and hands off attention context:

```
 SEATED MODE                                AMBULATORY MODE
┌─────────────────┐                        ┌─────────────────┐
│                 │                        │                 │
│  Tobii          │                        │  AirPods        │
│  iPhone         │  ═══════════════════►  │  Watch          │
│  Cameras        │    Context Handoff     │  Omi            │
│  Screens        │                        │  LED Glasses    │
│                 │  ◄═══════════════════  │  iPhone         │
│                 │    Context Resume      │                 │
└─────────────────┘                        └─────────────────┘

Handoff includes:
- Active gravity wells
- Unresolved items (nag list)
- Attention trajectory
- Pending aha candidates
- Current cognitive altitude
```

**Transition triggers:**
- Watch detects walking motion
- Tobii loses tracking for extended period
- iPhone accelerometer confirms movement
- Explicit "going mobile" command

### Apple Watch Integration

Apple Watch provides continuous biometric validation:

```python
class WatchBiometrics:
    heart_rate: float              # Current HR
    heart_rate_variability: float  # HRV (parasympathetic indicator)
    hr_spike: float                # Fractional HR rise vs. baseline
    hrv_increase: float            # Fractional HRV rise vs. baseline
    arousal_high: bool             # Arousal flag
    # GSR requires an external sensor, or is inferred from conductance

    def detect_aha_signature(self) -> Optional[AhaType]:
        """
        Discovery: HR spike + arousal
        Architectural: HRV increase (settling)
        """
        if self.hr_spike > 0.15 and self.arousal_high:
            return AhaType.DISCOVERY
        if self.hrv_increase > 0.10:
            return AhaType.ARCHITECTURAL
        return None
```

### Meta Ray-Bans: First-Person Capture

Meta Ray-Bans add first-person camera and audio to the ambulatory stack (no display):

**Hardware:**
- Camera (photo + video capture)
- Microphone (voice commands, audio capture)
- Speakers (open-ear audio output)
- Meta AI integration (voice-activated)

**Integration with Attention Mesh:**
```
Meta Ray-Bans Camera → Object Detection  → Context Events
                     → OCR (reading text) → Attention Events
                     → Scene Recognition  → Environmental Context

Meta Ray-Bans Audio  → Meta AI queries → Knowledge retrieval
                     → Voice notes → Transcription → Attention Events
```
**Key use case:** When walking, Meta Ray-Bans capture what you're *looking at* without pulling out a phone. The first-person perspective provides:
- **Visual context** that Omi's audio alone cannot provide
- **Text capture** from signs, whiteboards, screens
- **Scene recognition** (where you are, what's around)
- **Quick AI queries** via Meta AI ("Hey Meta, what's this?")

**Complementary to Omi:**

| Device | Modality | Best For |
|--------|----------|----------|
| Meta Ray-Bans | Visual + AI queries | What you're looking at, quick questions |
| Omi AI | Continuous audio | Everything you say/hear, full conversations |

**Division of labor:**
- **Omi** = always-on passive capture (continuous transcription)
- **Meta** = active capture on demand (photos, AI queries, targeted audio)

Together they provide complete ambulatory capture - Omi for the audio stream, Meta for visual snapshots and AI assistance.

### Omi AI: Always-On Audio + Open Platform

The [Omi device](https://www.omi.me/) (arriving Saturday) is far more than just audio capture - it's an **open-source AI wearable platform** with extensive extensibility.

**Hardware Specs:**
- 150 mAh battery (10-14 hour runtime)
- Dual microphones with speaker recognition
- Bluetooth 5.1 + 2.4/5 GHz Wi-Fi
- Offline recording capability
- Latency: 500-2000ms live; 10-20s offline
- 25+ language support with translation

**Core Capabilities:**
- Continuous audio capture (voice + environment)
- Real-time transcription via Deepgram/Speechmatics/Soniox
- Automatic summaries, tasks, and memory creation
- Searchable memory with AI assistant
- Daily recap digests
- Speaker recognition profiles

**The Extensibility Stack:**

```
┌─────────────────────────────────────────────────────────────────┐
│                    OMI PLATFORM ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   HARDWARE          BACKEND              APPS/INTEGRATIONS      │
│  ┌─────────┐       ┌─────────────┐      ┌─────────────────┐     │
│  │  Omi    │       │  FastAPI    │      │  250+ plugins   │     │
│  │ Device  │─────► │  Firebase   │─────►│  • Notion       │     │
│  │         │  BLE  │  Pinecone   │      │  • Slack        │     │
│  └─────────┘       │  Redis      │      │  • Google Drive │     │
│       │            │  LangChain  │      │  • Zapier       │     │
│       │            └──────┬──────┘      │  • Discord      │     │
│       │                   │             │  • Custom apps  │     │
│       │                   ▼             └─────────────────┘     │
│       │            ┌─────────────┐                              │
│       │            │  Webhooks   │──────► Your endpoints        │
│       │            └─────────────┘                              │
│       │                                                         │
│       ▼            ┌─────────────┐                              │
│  ┌─────────┐       │   SDKs      │                              │
│  │  3rd    │       │  • Python   │                              │
│  │  Party  │─────► │  • Swift    │                              │
│  │ Devices │       │  • React    │                              │
│  └─────────┘       │    Native   │                              │
│  (Plaud, etc)      └─────────────┘                              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
**Webhook Integration (Critical for Sovereign OS):**
```python
# Real-time transcript delivery to custom endpoints
# 1. Create webhook at webhook.site or your own server
# 2. Paste URL into Omi App → Explore → Create App
# 3. Start speaking → Real-time transcript streams to your endpoint

# Example: Wire to Sovereign OS attention system
from fastapi import FastAPI

app = FastAPI()

# extract_concepts() and attention_tracker are provided elsewhere in Sovereign OS
@app.post("/omi/transcript")
async def receive_omi_transcript(data: dict):
    """Receive real-time transcripts from Omi."""
    transcript = data.get("transcript")
    speaker = data.get("speaker")
    timestamp = data.get("timestamp")

    # Convert to attention event
    event = AttentionEvent(
        timestamp=timestamp,
        target_id=extract_concepts(transcript),
        target_type="verbal_thought",
        modality="speech",
        intensity=0.8,
        source="omi"
    )
    attention_tracker.record(event)
```

**250+ Integrations Available:**
- **Notion** - Auto-sync memories and summaries
- **Slack** - Voice-to-Slack messages ("Send Slack message to #channel...")
- **Google Drive** - Store transcripts and summaries
- **Zapier** - 1000+ app automations triggered by conversations
- **Discord** - Post conversation summaries to channels
- **Custom Apps** - Build in ~1 minute with webhook URLs

**Developer Platform:**
- **Open Source** (MIT License) - [GitHub: BasedHardware/omi](https://github.com/BasedHardware/omi)
- **$10M bounty pool** for app developers
- **SDKs**: Python, Swift, React Native, Flutter
- **Third-party wearable integration** - Add Plaud AI, Limitless, or custom hardware
- **Local-only processing option** - Data sovereignty

**Privacy & Security:**
- SOC 2 + HIPAA compliant
- TLS encryption in transit; AES-256-GCM at rest
- Local-only processing available
- No AI training on user data
- One-click data deletion

**Integration with Attention Mesh:**
```
Omi Audio Stream → Webhook → Sovereign OS
                      │
                      ├─► Transcription → Concept Extraction
                      │                        │
                      │                        ▼
                      │                 Attention Events
                      │                        │
                      ├─► Speaker Recognition  │
                      │   (who's talking)      │
                      │                        ▼
                      └─► Memory Search  ◄── Cross-reference
                          (recall context)   with gaze data
```

**Key Insight:** Omi isn't just a capture device - it's a **platform** we can build on. The webhook system means Sovereign OS can receive real-time transcripts and convert verbal thoughts into attention events, exactly like we do with typed content when seated.

When walking, verbal thoughts become attention events - the same pipeline, different input modality.
### Data Flow Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                            ATTENTION MESH DATA FLOW                              │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│   SENSORS                        FUSION                        ATTENTION         │
│                                                                                   │
│  ┌─────────┐                                                                      │
│  │ Tobii   │──┐                                                                   │
│  └─────────┘  │              ┌──────────────┐              ┌──────────────┐      │
│  ┌─────────┐  │              │              │              │              │      │
│  │ iPhone  │──┼─────────────►│ Gaze Fusion  │─────────────►│  Attention   │      │
│  └─────────┘  │              │              │              │  Tracker     │      │
│  ┌─────────┐  │              │ • Tobii IR   │              │              │      │
│  │Parallax │──┘              │ • Parallax   │              │ • Events     │      │
│  │ Cameras │                 │ • Head pose  │              │ • Trajectory │      │
│  └─────────┘                 │ • Fallback   │              │ • Wells      │      │
│                              └──────────────┘              │ • Nags       │      │
│  ┌─────────┐                                               │              │      │
│  │ Watch   │──────────────────────────────────────────────►│              │      │
│  │ HR/HRV  │                  Biometric                    │              │      │
│  └─────────┘                  Validation                   └──────┬───────┘      │
│                                                                   │              │
│  ┌─────────┐                 ┌──────────────┐                     │              │
│  │AirPods  │─────────────────│  Spatial     │                     │              │
│  │Pro/Max  │                 │  Audio       │◄────────────────────┘              │
│  └─────────┘                 │  Engine      │   Attention                        │
│                              │              │   → Audio Position                 │
│  ┌─────────┐                 │ • Position   │                                    │
│  │  Omi    │─────────────────│ • Distance   │              ┌──────────────┐      │
│  │ Audio   │                 │ • Elevation  │              │    Daily     │      │
│  └─────────┘                 └──────────────┘─────────────►│    Note      │      │
│                                                   Surface  │              │      │
│  ┌─────────┐                 ┌──────────────┐              └──────────────┘      │
│  │  LED    │◄────────────────│  Cognitive   │                                    │
│  │Glasses  │                 │  State       │◄─────────────────────────          │
│  └─────────┘                 │  Display     │   Attention State                  │
│                              └──────────────┘   → Peripheral Glow                │
│                                                                                   │
│  ┌─────────┐                 ┌──────────────┐                                     │
│  │ Rear    │─────────────────│  Room        │   Context for                      │
│  │ iPads   │                 │  Context     │──────────► Body posture            │
│  └─────────┘                 │  Engine      │            Environment             │
│                              └──────────────┘            validation              │
│                                                                                   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Implementation Priority

| Component | Effort | Impact | Status |
|-----------|--------|--------|--------|
| Tobii → Talon bridge | Done | High | ✓ Operational |
| Multi-monitor detection | Done | Medium | ✓ Operational |
| Watch biometrics | Medium | High | Planned |
| Parallax gaze estimation | High | Medium | Planned |
| Spatial audio engine | Medium | High | Planned |
| Meta Ray-Bans integration | Medium | High | Planned |
| Omi webhook integration | Low | High | Awaiting hardware (Saturday) |
| LED glasses driver | Medium | Medium | Planned |
| Rear iPad mesh | High | Low | Future |
| EMG bracelet integration | Medium | High | Research phase |
### Sensor Fusion Architecture

> **No single sensor tells the truth. Fusion does.**

The attention mesh combines weak signals from multiple modalities to create robust attention inference:

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                      SENSOR FUSION FOR ATTENTION INFERENCE                       │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│   SIGNAL SOURCES                 FUSION LAYER               OUTPUT               │
│                                                                                   │
│  ┌─────────────┐                                                                  │
│  │ EEG (Omi)   │──┐            ┌──────────────────┐                               │
│  │α-suppression│  │            │                  │                               │
│  │ (0.4 weight)│  │            │    ATTENTION     │         ┌──────────────┐      │
│  └─────────────┘  │            │   CONFIDENCE     │         │              │      │
│                   │            │    ESTIMATOR     │         │  Attention   │      │
│  ┌─────────────┐  │            │                  │────────►│  State       │      │
│  │ EMG (wrist) │──┼───────────►│  Kalman filter   │         │  (0.0-1.0)   │      │
│  │  tension/   │  │            │  + Bayesian      │         │              │      │
│  │  arousal    │  │            │    inference     │         └──────────────┘      │
│  │ (0.2 weight)│  │            │                  │                               │
│  └─────────────┘  │            └────────┬─────────┘                               │
│                   │                     │                                         │
│  ┌─────────────┐  │                     │               ┌──────────────┐          │
│  │ Gaze        │──┤                     │               │              │          │
│  │ Tobii/cam   │  │                     └──────────────►│  Intent      │          │
│  │ (0.3 weight)│  │                                     │  Prediction  │          │
│  └─────────────┘  │                                     │  (what next) │          │
│                   │                                     └──────────────┘          │
│  ┌─────────────┐  │                                                               │
│  │ Speech      │──┘                                                               │
│  │ activity    │                                                                  │
│  │ (0.1 weight)│                                                                  │
│  └─────────────┘                                                                  │
│                                                                                   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

**Fusion Formula:**

```python
def compute_attention_confidence(signals: dict) -> float:
    """
    Fuse multiple weak signals into robust attention estimate.

    Each modality catches what others miss:
    - EEG: Internal focus (thinking without moving)
    - EMG: Engagement intent (about to act)
    - Gaze: External attention targets
    - Speech: Verbal engagement
    """
    weights = {
        'eeg_alpha': 0.4,       # Omi single-electrode (when available)
        'gaze_fixation': 0.3,   # Tobii/camera
        'emg_tension': 0.2,     # Bracelet baseline
        'speech_active': 0.1    # AirPods/Omi mic
    }

    confidence = sum(
        signals.get(k, 0.5) * w  # Default 0.5 (uncertain) if missing
        for k, w in weights.items()
    )

    # Boost confidence when signals agree
    agreement = compute_signal_agreement(signals)
    if agreement > 0.8:
        confidence *= 1.2  # 20% boost for corroboration

    return min(confidence, 1.0)
```
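The fusion layer above also names a Kalman filter. As a minimal sketch of what that smoothing step could look like (a scalar random-walk filter over the fused confidence, with made-up noise parameters):

```python
class ScalarKalman:
    """1-D Kalman filter for smoothing the fused attention confidence."""

    def __init__(self, process_var: float = 0.01, measurement_var: float = 0.1):
        self.x = 0.5              # state estimate (attention confidence)
        self.p = 1.0              # estimate variance
        self.q = process_var      # how fast true attention can drift
        self.r = measurement_var  # how noisy each fused reading is

    def update(self, measurement: float) -> float:
        # Predict: attention drifts as a random walk
        self.p += self.q
        # Update: blend the prediction with the new fused measurement
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (measurement - self.x)
        self.p *= (1 - k)
        return self.x
```

Each new `compute_attention_confidence()` value would be passed through `update()` before driving surfacing decisions.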
**EMG-Attention Correlates:**

| EMG Signal | Attention Proxy | Detection Method |
|------------|-----------------|------------------|
| Forearm tension baseline | Cognitive engagement | RMS amplitude above rest |
| Grip force variation | Concentration intensity | Force sensor variance |
| Micro-tremor frequency | Arousal/alertness | High-freq spectral content |
| Pre-movement potential | Intent to act | 50-100ms before motion |
| Relaxation signature | Disengagement | Sustained low amplitude |

**Why This Works:**

1. **Redundancy** - If one sensor fails, others compensate
2. **Corroboration** - Agreeing signals boost confidence
3. **Complementarity** - Each modality catches different states
4. **Graceful degradation** - System works with subset of sensors

**Research Basis (2024-2025):**

Recent studies validate EMG for attention/cognitive load detection:

1. **Facial EMG + Cognitive Load** (2025) - Wearable multi-modal approach integrating soft electrodes for facial EMG and EEG. Subject-specific EMG channels strongly correlate with subjective cognitive load scores. Key finding: personalized modeling is crucial for robust cognitive tracking.

2. **Trapezius EMG + Stress** (2024) - RMS shows positive correlation with perceived stress (r = 0.52), indicating higher perceived stress aligns with increased muscle activity. Increased task difficulty raised EMG RMS values due to cognitive load.

3. **Corrugator/Frontalis EMG** - Corrugator amplitude increased with higher cognitive distractor levels. Frontalis amplitude rose with additional task demands, supporting fEMG-cognitive load correlation.

4. **Multi-Modal Fusion** - PHYSIOPRINT-style systems derive workload measures in real time from multiple physiological signals (EEG, ECG, EOG, EMG), validating the sensor fusion approach.

**Open-Source EMG Options:**

| Device | Features | Price | Integration |
|--------|----------|-------|-------------|
| **[Gesto](https://medium.com/@ScottAmyx/gesto-open-source-gesture-control-platform-9518fe3b57b4)** | 4 bipolar EMG + accelerometer, Arduino/RPi, open GitHub | ~$100 | MIT license, no cameras |
| **uMyo** | 8 channels, dry electrodes, BLE | ~$150 | Python SDK |
| **MyoWare 2.0** | Single channel, Arduino shield | ~$40 | Simple analog |
| **[OpenBCI Cyton](https://github.com/csce585-mlsystems/EMG-Based-Hand-Recognition)** | 8 EEG/EMG channels, research-grade | ~$500 | Full SDK |
| **[sEMG TMA](https://github.com/Laknath1996/sEMG-Hand-Gesture-Recognition)** | Software-only, real-time gesture | Free | Python, ICASSP 2021 |

**Gesto Integration Sketch:**

```python
import numpy as np

from gesto import GestoDevice
from attention_system import AttentionTracker

class EMGAttentionBridge:
    """Bridge Gesto EMG to attention system."""

    def __init__(self):
        self.gesto = GestoDevice()
        self.baseline_rms = None
        self.tracker = AttentionTracker()

    def calibrate_baseline(self, duration_sec: int = 30):
        """Capture resting EMG baseline."""
        samples = self.gesto.read_for(duration_sec)
        self.baseline_rms = compute_rms(samples)

    def process_emg(self, samples: np.ndarray) -> dict:
        """Convert EMG to attention signals."""
        # compute_rms / compute_hf_power / compute_total_power /
        # detect_motor_preparation / normalize are signal-processing
        # helpers assumed to live alongside this bridge
        current_rms = compute_rms(samples)

        # Tension relative to baseline
        tension_ratio = current_rms / self.baseline_rms

        # Arousal from high-frequency content
        arousal = compute_hf_power(samples) / compute_total_power(samples)

        # Pre-movement detection
        pre_movement = detect_motor_preparation(samples)

        return {
            'emg_tension': normalize(tension_ratio, 0.5, 2.0),
            'emg_arousal': arousal,
            'emg_intent': 1.0 if pre_movement else 0.0
        }

    def stream_to_attention(self):
        """Continuous EMG → attention events."""
        for samples in self.gesto.stream():
            signals = self.process_emg(samples)
            self.tracker.update_from_emg(signals)
```

### Cursorless for the Real World

> **What if you could target things in physical space the way Cursorless targets code?**

[Cursorless](https://www.cursorless.org/docs/) revolutionized voice-controlled code editing by putting "hats" (colored markers) on code elements and using an Action + Target grammar. We can extend this to physical reality via Meta Ray-Bans.
**The Vision:**

```
     CODE WORLD                              PHYSICAL WORLD
┌────────────────────────────┐          ┌────────────────────────────┐
│                            │          │                            │
│  "chuck blue air"          │          │  "capture red shelf"       │
│       ↓                    │          │       ↓                    │
│  Delete the token with     │          │  Photo the object with     │
│  blue hat at position 'a'  │          │  red marker on shelf       │
│                            │          │                            │
│  ┌─────────────────────┐   │          │  ┌─────────────────────┐   │
│  │ def foo():          │   │          │  │  📖 [red]           │   │
│  │   x = "hello"[blue] │   │          │  │  📱 [green]         │   │
│  │   return x          │   │          │  │  ☕ [blue]          │   │
│  └─────────────────────┘   │          │  └─────────────────────┘   │
│                            │          │                            │
└────────────────────────────┘          └────────────────────────────┘

Grammar: ACTION + COLOR + TARGET         Grammar: ACTION + COLOR + TARGET
- chuck (delete)                         - capture (photo)
- take (select)                          - note (annotate)
- move (reposition)                      - track (add to attention)
                                         - recall (search memories)
```

**Implementation Path:**

1. **Capture** - Meta Ray-Bans take photo/video
2. **Detect** - YOLO/SAM identifies objects, assigns virtual "hats" (colors)
3. **Display** - AR overlay or audio description of targets
4. **Command** - Voice command using Cursorless-style grammar
5. **Execute** - System performs action on targeted object

**Cursorless Real-World Grammar:**

| Action | Code Equivalent | Physical Meaning |
|--------|----------------|------------------|
| `capture` | `take` (select) | Photo/video the target |
| `note` | `comment` | Add annotation to target |
| `track` | `reference` | Add to attention/gravity wells |
| `recall` | `search` | Find related memories/notes |
| `mark` | `bookmark` | Save location/context for later |
| `describe` | `hover` | Get AI description of target |
| `send` | `copy` | Share target to specific destination |

**Example Commands:**

```
"capture red book"        → Photo the book with red marker
"note green sign"         → Add voice note about the green-marked sign
"track blue whiteboard"   → Add whiteboard content to attention system
"recall this scene"       → Search memories related to current view
"describe yellow device"  → "Hey Meta, what is this?" on yellow target
```
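A minimal sketch of parsing that ACTION + COLOR + TARGET grammar (vocabulary and return shape are illustrative; the prototype below assumes a richer `parse_command`):

```python
ACTIONS = {"capture", "note", "track", "recall", "mark", "describe", "send"}
COLORS = {"red", "green", "blue", "yellow", "purple", "orange"}


def parse_real_world_command(utterance: str):
    """Parse e.g. 'capture red book' → ('capture', 'red', 'book')."""
    words = utterance.lower().split()
    if not words or words[0] not in ACTIONS:
        return None  # not a targeting command
    action, rest = words[0], words[1:]
    color = rest[0] if rest and rest[0] in COLORS else None
    target = " ".join(rest[1:] if color else rest) or None
    return action, color, target
```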
**Integration with EMG Bracelet:**

The EMG bracelet adds **silent confirmation** to the Cursorless-for-reality pipeline:

```
                        VOICE + EMG CONFIRMATION
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  1. Voice: "capture red"    →   System identifies red target       │
│                                                                    │
│  2. EMG: Subtle pinch       →   Confirms selection                 │
│                                                                    │
│  3. System executes photo   →   No need to say "yes" aloud         │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

Benefits:
- Silent confirmation in social situations
- Faster than voice confirmation
- Reduces false positives from ambient speech
- Enables "look + pinch" without voice at all
```

**The talon-gaze-ocr Connection:**

The existing [talon-gaze-ocr](https://github.com/wolfmanstout/talon-gaze-ocr) project combines gaze + OCR + Talon voice. This is the bridge:

```
 SEATED (existing)              AMBULATORY (proposed)
┌──────────────────┐           ┌──────────────────┐
│ Tobii gaze       │           │ Meta camera      │
│   ↓              │           │   ↓              │
│ Screen position  │           │ Object detection │
│   ↓              │           │   ↓              │
│ OCR at gaze point│           │ "Hats" assigned  │
│   ↓              │           │   ↓              │
│ Talon voice cmd  │           │ Voice + EMG cmd  │
│   ↓              │           │   ↓              │
│ Code action      │           │ Real-world action│
└──────────────────┘           └──────────────────┘
```

**Prototype Architecture:**

```python
from typing import Optional

import numpy as np

# MetaRayBansAPI, YOLOv8, TalonVoiceEngine, and Action are placeholder
# interfaces for this prototype sketch

class RealWorldCursorless:
    """Cursorless-style targeting for physical world."""

    def __init__(self):
        self.meta_glasses = MetaRayBansAPI()
        self.object_detector = YOLOv8()  # or SAM
        self.emg_bracelet = GestoDevice()
        self.voice = TalonVoiceEngine()

    def process_scene(self, image: np.ndarray) -> dict:
        """Detect objects and assign color hats."""
        objects = self.object_detector.detect(image)
        hats = {}
        colors = ['red', 'green', 'blue', 'yellow', 'purple', 'orange']
        for i, obj in enumerate(objects[:6]):  # Max 6 targets
            hats[colors[i]] = {
                'object': obj,
                'bbox': obj.bbox,
                'label': obj.label,
                'confidence': obj.confidence
            }
        return hats

    def await_command(self, hats: dict) -> Optional[Action]:
        """Listen for voice command + EMG confirmation."""
        # Voice recognition
        command = self.voice.listen()  # e.g., "capture red"
        action, color = self.parse_command(command)

        if color not in hats:
            return None

        # Await EMG confirmation (subtle pinch)
        if self.emg_bracelet.await_gesture('pinch', timeout=2.0):
            return Action(action, hats[color])

        return None

    def execute(self, action: Action):
        """Execute the action on the target."""
        if action.verb == 'capture':
            self.meta_glasses.capture_photo(focus=action.target['bbox'])
            self.save_to_attention_system(action.target)
        elif action.verb == 'note':
            note = self.voice.listen_for_note()
            self.attach_note(action.target, note)
        elif action.verb == 'track':
            self.attention_system.add_gravity_well(action.target)
```

### The Promise

**Seated:** Full gaze tracking with biometric validation. Know exactly where attention is, validate when insight arrives.

**Walking:** Audio-first attention with continuous capture. Thoughts spoken become events. Spatial audio guides attention without screens.

**Transition:** Seamless handoff. Context travels with you. The attention mesh adapts to your mode.

---

## Multi-Screen Session-Level Attention

> **"I can look at a session without interacting with it."**
>
> Mouse position is not attention. Eye position IS attention.
### The Setup

You have ~10 concurrent terminal sessions across 3 displays:

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           YOUR MULTI-SCREEN WORKSPACE                            │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│                  ┌─────────────────────────────────────┐                          │
│                  │        MAIN MONITOR (Tobii)         │                          │
│                  │  ┌─────────────┬─────────────┐      │                          │
│                  │  │   main_tl   │   main_tr   │      │                          │
│                  │  │  (session)  │  (session)  │      │                          │
│                  │  ├─────────────┼─────────────┤      │                          │
│                  │  │   main_bl   │   main_br   │      │                          │
│                  │  │  (session)  │  (session)  │      │                          │
│                  │  └─────────────┴─────────────┘      │                          │
│                  │        4 sessions, IR tracking      │                          │
│                  └─────────────────────────────────────┘                          │
│                                    │                                              │
│  ┌──────────────────────┐          │           ┌─────────────────┐                │
│  │  MacBook Pro (cam)   │          │           │   iPad (cam)    │                │
│  │ ┌────────┬────────┐  │          │           │  ┌─────┬─────┐  │                │
│  │ │ mbp_tl │ mbp_tr │  │        [YOU]         │  │ipad_l│ipad_r│ │                │
│  │ ├────────┼────────┤  │          │           │  │     │     │  │                │
│  │ │ mbp_bl │ mbp_br │  │          │           │  └─────┴─────┘  │                │
│  │ └────────┴────────┘  │          │           │   2 sessions    │                │
│  │     4 sessions       │          │           │                 │                │
│  │  parallax tracking   │◄─────────●──────────►│ parallax track  │                │
│  └──────────────────────┘     iPhone cam       └─────────────────┘                │
│                               (redundant)                                         │
│                                                                                   │
│  ┌──────────────────────────────────────────────────────────────────┐             │
│  │                      REAR iPads (validation)                     │             │
│  │  [iPad-L] ◄─────── head pose tracking ────────► [iPad-R]         │             │
│  │            confirm gaze direction via 3D mesh                    │             │
│  └──────────────────────────────────────────────────────────────────┘             │
│                                                                                   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Gaze Source Hierarchy

| Source | Trust | Coverage | Notes |
|--------|-------|----------|-------|
| **Tobii IR** | 1.0 | Main monitor | Ground truth when available |
| **iPhone cam** | 0.9 | Main monitor | Redundant, validates Tobii loss |
| **Parallax (MBP+iPad cams)** | 0.7 | Side screens | ~80cm baseline → ~1-2° accuracy |
| **Head pose (rear iPads)** | 0.4 | All screens | Direction confirmation |
| **Last known (decay)** | 0.2 | All screens | Temporal fallback |

### Parallax Gaze Estimation

With ~80cm baseline between MacBook and iPad cameras:

```
                Target Point (on MacBook screen)
                        ●
                       /│\
                      / │ \
                     /  │  \
                    /   │   \
                   /    │    \
                  /     │     \
                 /      │      \
                /       │       \
               ▼        │        ▼
           MacBook     User     iPad
           Camera      Eyes     Camera
              ●─────────●─────────●
                 ~80cm baseline
```

**Accuracy:** ~1-2° is sufficient to determine (triangulation sketch below):
- Which screen (LEFT/CENTER/RIGHT)
- Which quadrant of screen
- Which session has attention
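Camera-based gaze estimation is still on the roadmap (see Next Steps), but the triangulation core is just the closest-approach point of two rays. A geometry-only sketch, assuming each camera already yields an eye position and gaze direction in a shared coordinate frame:

```python
import numpy as np


def triangulate_gaze(p1, d1, p2, d2):
    """Closest point between two gaze rays p + t*d (least-squares midpoint)."""
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)

    # Solve [d1  -d2] [t1 t2]^T ≈ (p2 - p1) in the least-squares sense
    A = np.column_stack([d1, -d2])
    t1, t2 = np.linalg.lstsq(A, p2 - p1, rcond=None)[0]

    # Midpoint of the closest approach between the two rays
    return (p1 + t1 * d1 + p2 + t2 * d2) / 2.0


# e.g. MacBook camera at the origin, iPad camera ~0.8 m to its right
point = triangulate_gaze([0, 0, 0], [0.1, 0.0, 1.0], [0.8, 0, 0], [-0.2, 0.0, 1.0])
```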
### Fusion Algorithm

```python
def compute_fused_gaze(estimates: List[GazeEstimate]) -> FusedGaze:
    """
    Fuse multiple gaze estimates with trust-weighted averaging.

    When sources agree → boost confidence (corroboration)
    When sources disagree → reduce confidence, favor higher-trust source
    """
    # Weighted average by trust level
    total_weight = 0.0
    weighted_x = 0.0
    weighted_y = 0.0

    for estimate in estimates:
        trust = SOURCE_TRUST_LEVELS[estimate.source]
        weight = trust * estimate.confidence

        weighted_x += estimate.x * weight
        weighted_y += estimate.y * weight
        total_weight += weight

    x = weighted_x / total_weight
    y = weighted_y / total_weight

    # Base confidence: trust-weighted mean of the per-source confidences
    confidence = total_weight / sum(SOURCE_TRUST_LEVELS[e.source] for e in estimates)

    # Boost confidence when sources agree
    agreement = compute_source_agreement(estimates)
    if agreement > 0.8:
        confidence *= 1.2  # 20% boost for corroboration

    return FusedGaze(x=x, y=y, confidence=min(confidence, 1.0))
```

### Session Attention Tracking

The `SessionAttentionTracker` converts gaze data into per-session attention metrics:

```python
from core.attention.multi_screen_gaze import (
    create_multi_screen_system,
    start_system
)

# Create system
registry, fusion, tracker = create_multi_screen_system(talon_integration)
start_system(fusion, tracker)

# Get what you're looking at RIGHT NOW
current_session = tracker.get_active_session()
# → "main_tl" (even with no mouse/keyboard activity)

# Get attention ranking
for session_id, score in tracker.get_attention_ranking():
    print(f"{session_id}: {score:.2f}")
# → main_tl: 0.85
# → mbp_tr: 0.45
# → ipad_l: 0.12
# ...

# React to attention changes
def on_attention_change(new_session, old_session):
    print(f"Attention moved: {old_session} → {new_session}")

tracker.on_session_change(on_attention_change)
```

### Multi-Source Validation

When the system has multiple sources, it validates estimates:

```
Tobii says:     looking at main_tr
Parallax says:  looking at main_tr
Head pose says: facing right side of main monitor

→ High confidence (3 sources agree)
→ Boost confidence 20%
→ Session: main_tr (confidence: 0.96)
```

```
Tobii says:     lost tracking (eyes left main monitor)
Parallax says:  looking at mbp_tl
Head pose says: facing MacBook direction

→ Medium confidence (2 sources, no Tobii)
→ Session: mbp_tl (confidence: 0.72)
```
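The step from a fused gaze point to "which session has attention" is a simple region lookup. A sketch with a simplified region tuple (the real `SessionRegion` carries more metadata):

```python
def session_at(display_id: str, x: float, y: float, regions):
    """Map a fused gaze point (normalized 0-1 coords) to a session id, or None."""
    for session_id, region_display, x_min, y_min, x_max, y_max in regions:
        if region_display != display_id:
            continue
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return session_id
    return None


# e.g. top quadrants of the main monitor
regions = [("main_tl", "MAIN", 0.0, 0.0, 0.5, 0.5),
           ("main_tr", "MAIN", 0.5, 0.0, 1.0, 0.5)]
session_at("MAIN", 0.7, 0.2, regions)   # → "main_tr"
```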
### Calibration Procedure

To calibrate the multi-camera system:

1. **Camera positions** - Measure physical positions relative to your eyes
2. **Display positions** - Measure each display's position and size
3. **Gaze calibration per display**:
   - Look at corners of each screen
   - System learns mapping from camera images to screen coordinates
4. **Head pose baseline** - Record neutral head position

```python
# Configure displays (once, at setup)
registry.add_display(DisplayConfig(
    display_id=DisplayID.MAIN,
    name="Main Monitor",
    position_x=0.0, position_y=0.15, position_z=0.70,  # meters from eyes
    width_m=0.60, height_m=0.34,
    width_px=2560, height_px=1440,
    gaze_trust=1.0,
    gaze_sources=['tobii', 'iphone']
))

# Register session regions (can change dynamically)
registry.add_session(SessionRegion(
    session_id="claude_code_sovereign",
    display_id=DisplayID.MAIN,
    x_min=0.0, y_min=0.0, x_max=0.5, y_max=0.5,
    session_name="Claude Code - Sovereign OS"
))
```

### Integration with Attention System

The multi-screen gaze system feeds into the main attention architecture:

```
┌───────────────────┐
│ Multi-Screen Gaze │
│  Fusion Engine    │
└────────┬──────────┘
         │  FusedGaze (session_id, confidence)
         ▼
┌───────────────────┐
│ Session Attention │
│     Tracker       │
└────────┬──────────┘
         │  AttentionEvent (modality='gaze', target=session_id)
         ▼
┌───────────────────┐
│  Cross-Session    │
│     Tracker       │──► Unified trajectory across all sessions
└───────────────────┘
```

### File Locations

| Component | Path |
|-----------|------|
| Multi-screen gaze | `core/attention/multi_screen_gaze.py` |
| Eye tracking (Tobii) | `core/attention/eye_tracking.py` |
| Talon bridge | `integrations/talon/sovereign_gaze_bridge.py` |

---

## Next Steps

1. **Complete Podcast Sync** - Choose and implement Overcast sync method
2. **Embed Concepts** - Generate embeddings for trajectory calculation
3. **Biometric Integration** - Connect Apple Watch HR/HRV data
4. **Tobii Integration** - When hardware arrives, implement eye tracking
5. **Real-time Dashboard** - Live attention state + aha visualization
6. **Graph Contraction Metrics** - Measure "resonant clunk" quantitatively
7. **~~Parallax Gaze~~** ✓ - Implemented in `multi_screen_gaze.py`
8. **Spatial Audio Engine** - Map attention events to 3D audio positions
9. **Omi Integration** - Wire always-on audio to attention events
10. **LED Glasses Protocol** - Define peripheral vision display protocol
11. **EMG Bracelet Evaluation** - Test Gesto/uMyo for attention correlation
12. **Sensor Fusion Pipeline** - Implement Kalman filter for multi-modal attention
13. **Real-World Cursorless** - Prototype Meta glasses + voice + EMG targeting
14. **Personalized EMG Calibration** - Per-user baseline capture for tension/arousal
15. **Camera Gaze Calibration** - Implement actual camera-based gaze estimation
16. **Session Registry Persistence** - Save/load session layouts
17. **Attention-Based Context Switching** - Auto-focus Claude session being looked at

## The Unified Insight

The attention system tracks **where you're looking** and **where you're heading**.
The aha detector recognizes **when insight crystallizes**.
The cross-session tracker weaves the **unified thread** across interfaces.
The membrane controls **how context flows** between sessions.

Together: **attention is all you need** - for knowing what matters, when insight arrives, and how your distributed cognition converges.
---

*Sovereign OS Attention System v2.0*
*The single pane of attention architecture*
*"Attention is all you need"*

---

## Related

- [[pattern-detection-inventory]] - resonance: 67%
- [[daily-synthesis-2026-01-14]] - resonance: 52%
- [[2026-01-15-conversation-graph]] - resonance: 43%
- [[biomimetic-architecture]] - resonance: 32%
- [[consciousness-architecture]] - resonance: 33%