electron.md
1 --- 2 description: How to CLI-ify and automate any Electron Desktop Application via CDP 3 --- 4 5 # CLI-ifying Electron Applications (Skill Guide) 6 7 Based on the successful automation of **Cursor**, **Codex**, **Antigravity**, **ChatWise**, **Notion**, and **Discord** desktop apps, this guide serves as the standard operating procedure (SOP) for adapting ANY Electron-based application into an OpenCLI adapter. 8 9 ## Core Concept 10 11 Electron apps are essentially local Chromium browser instances. By exposing a debugging port (CDP — Chrome DevTools Protocol) at launch time, we can use the Browser Bridge to pierce through the UI layer, accessing and controlling all underlying state including React/Vue components and Shadow DOM. 12 13 > **Note:** Not all desktop apps are Electron. WeChat (native Cocoa) and Feishu/Lark (custom Lark Framework) embed Chromium but do NOT expose CDP. For those apps, use the AppleScript + clipboard approach instead (see [Non-Electron Pattern](#non-electron-pattern-applescript)). 14 15 ### Launching the Target App 16 ```bash 17 /Applications/AppName.app/Contents/MacOS/AppName --remote-debugging-port=9222 18 ``` 19 20 ### Verifying Electron 21 ```bash 22 # Check for Electron Framework in the app bundle 23 ls /Applications/AppName.app/Contents/Frameworks/Electron\ Framework.framework 24 # If this directory exists → Electron → CDP works 25 # If not → check for libEGL.dylib (embedded Chromium/CEF, CDP may not work) 26 ``` 27 28 ## The 5-Command Pattern (CDP / Electron) 29 30 Every new Electron adapter should implement these 5 commands in `clis/<app_name>/`: 31 32 ### 1. `status.ts` — Connection Test 33 ```typescript 34 export const statusCommand = cli({ 35 site: 'myapp', 36 name: 'status', 37 domain: 'localhost', 38 strategy: Strategy.UI, 39 browser: true, // Requires CDP connection 40 args: [], 41 columns: ['Status', 'Url', 'Title'], 42 func: async (page: IPage) => { 43 const url = await page.evaluate('window.location.href'); 44 const title = await page.evaluate('document.title'); 45 return [{ Status: 'Connected', Url: url, Title: title }]; 46 }, 47 }); 48 ``` 49 50 ### 2. `dump.ts` — Reverse Engineering Core 51 Modern app DOMs are huge and obfuscated. **Never guess selectors.** Dump first, then extract precise class names with AI or `grep`: 52 ```typescript 53 const dom = await page.evaluate('document.body.innerHTML'); 54 fs.writeFileSync('/tmp/app-dom.html', dom); 55 const snap = await page.snapshot({ interactive: false }); 56 fs.writeFileSync('/tmp/app-snapshot.json', JSON.stringify(snap, null, 2)); 57 ``` 58 59 ### 3. `send.ts` — Advanced Text Injection 60 Electron apps often use complex rich-text editors (Monaco, Lexical, ProseMirror). Setting `.value` directly is ignored by React state. 61 62 **Best practice:** Use `document.execCommand('insertText')` to perfectly simulate real user input, fully piercing React state: 63 ```javascript 64 const composer = document.querySelector('[contenteditable="true"]'); 65 composer.focus(); 66 document.execCommand('insertText', false, 'Hello'); 67 ``` 68 Then submit with `await page.pressKey('Enter')`. 69 70 ### 4. `read.ts` — Context Extraction 71 Don't extract the entire page text. Use `dump.ts` output to find the real "conversation container": 72 - Look for semantic selectors: `[role="log"]`, `[data-testid="conversation"]`, `[data-content-search-turn-key]` 73 - Format output as Markdown — readable by both humans and LLMs 74 75 ### 5. `new.ts` — Keyboard Shortcuts 76 Many GUI actions respond to native shortcuts rather than button clicks: 77 ```typescript 78 const isMac = process.platform === 'darwin'; 79 await page.pressKey(isMac ? 'Meta+N' : 'Control+N'); 80 await page.wait(1); // Wait for re-render 81 ``` 82 83 ## Environment Variable 84 ```bash 85 export OPENCLI_CDP_ENDPOINT="http://127.0.0.1:9222" 86 ``` 87 88 ## Non-Electron Pattern (AppleScript) 89 90 For native macOS apps (WeChat, Feishu) that don't expose CDP: 91 ```typescript 92 export const statusCommand = cli({ 93 site: 'myapp', 94 strategy: Strategy.PUBLIC, 95 browser: false, // No browser needed 96 func: async (page: IPage | null) => { 97 const output = execSync("osascript -e 'application \"MyApp\" is running'", { encoding: 'utf-8' }).trim(); 98 return [{ Status: output === 'true' ? 'Running' : 'Stopped' }]; 99 }, 100 }); 101 ``` 102 103 Core techniques: 104 - **status**: `osascript -e 'application "AppName" is running'` 105 - **send**: `pbcopy` → activate window → `Cmd+V` → `Enter` 106 - **read**: `Cmd+A` → `Cmd+C` → `pbpaste` 107 - **search**: Activate → `Cmd+F`/`Cmd+K` → `keystroke "query"` 108 109 ## Pitfalls & Gotchas 110 111 1. **Port conflicts (EADDRINUSE)**: Only one app per port. Use unique ports: Codex=9222, ChatGPT=9224, Cursor=9226, ChatWise=9228, Notion=9230, Discord=9232 112 2. **IPage abstraction**: OpenCLI wraps the browser page as `IPage` (`src/types.ts`). Use `page.pressKey()` and `page.evaluate()`, NOT direct DOM APIs 113 3. **Timing**: Always add `await page.wait(0.5)` to `1.0` after DOM mutations. Returning too early disconnects prematurely 114 4. **AppleScript requires Accessibility**: Terminal app must be granted permission in System Settings → Privacy & Security → Accessibility 115 116 ## Port Assignment Table 117 118 | App | Port | Mode | 119 |-----|------|------| 120 | Codex | 9222 | CDP | 121 | ChatGPT | 9224 | CDP / AppleScript | 122 | Cursor | 9226 | CDP | 123 | ChatWise | 9228 | CDP | 124 | Notion | 9230 | CDP | 125 | Discord App | 9232 | CDP |