DEMO_CREATE.md
# Demo Creation Guide

This document tracks knowledge gained while creating demo videos for code-action-quick.

## Infrastructure

### Available Tools
- **ffmpeg**: Screen recording with x11grab, window-specific capture via `-window_id`
- **edge-tts**: Microsoft Edge's online TTS service, installed via `pipx install edge-tts`; SSML timing support depends on the version (see the SSML section)
- **emacs --daemon**: Headless Emacs for scripted control via emacsclient

### Automated Recording Script
Location: `scripts/record-demo-auto.sh`

**Features:**
- Fully automated (no manual interaction needed)
- Uses a dedicated Emacs daemon for isolation
- Window-specific recording (no cropping needed)
- Configurable window size (default: 750x500)
- Resets the demo file via git checkout after recording

**Usage:**
```bash
./scripts/record-demo-auto.sh
```

**Output:**
- `docs/assets/demo-TIMESTAMP.mp4` - Video file
- `docs/assets/demo-TIMESTAMP.gif` - Animated GIF for README

### Key Technical Details

**Window-specific ffmpeg recording:**
```bash
# emacsclient prints the id as a quoted Lisp string, so strip the quotes
WINDOW_ID=$(emacsclient -e "(frame-parameter nil 'outer-window-id)" | tr -d '"')
ffmpeg -y -f x11grab -framerate 24 -window_id "$WINDOW_ID" -i :0.0 ...
```
Note: `-window_id` must come BEFORE `-i :0.0`

**Emacs frame pixel sizing:**
```elisp
(set-frame-size nil 750 500 t) ; t = pixel units, not characters
```

**Hide eglot modeline indicator:**
```elisp
(setq eglot--mode-line-format nil)
```

**Clean echo area before recording:**
```elisp
(message nil)
(with-current-buffer " *Echo Area 0*" (erase-buffer))
(with-current-buffer " *Echo Area 1*" (erase-buffer))
```

## Text-to-Speech Narration

### edge-tts Installation
```bash
pipx install edge-tts
~/.local/bin/edge-tts --list-voices
```

### Voice Selection

**Chosen voice:** `en-US-EmmaNeural` with `-10%` rate adjustment

Testing voices:
```bash
# Generate samples for comparison
for voice in AriaNeural AvaNeural EmmaNeural JennyNeural MichelleNeural; do
  edge-tts --voice "en-US-${voice}" --rate="-10%" \
    --text "This is code action quick. It shows LSP fixes in the modeline." \
    --write-media "sample-${voice}.mp3"
done
```

### Available Voices (US English - Female)
| Voice | Style |
|-------|-------|
| `en-US-AriaNeural` | Positive, Confident |
| `en-US-JennyNeural` | Friendly, Considerate, Comfort |
| `en-US-AvaNeural` | Expressive, Caring, Pleasant |
| `en-US-EmmaNeural` | Cheerful, Clear, Conversational |
| `en-US-MichelleNeural` | Friendly, Pleasant |
| `en-US-AnaNeural` | Cartoon, Cute |

### Available Voices (US English - Male)
| Voice | Style |
|-------|-------|
| `en-US-BrianNeural` | Approachable, Casual, Sincere |
| `en-US-AndrewNeural` | Warm, Confident, Authentic |
| `en-US-ChristopherNeural` | Reliable, Authority |
| `en-US-GuyNeural` | Passion |
| `en-US-RogerNeural` | Lively |

### Narration Script Structure

Build narration from segments with silence gaps for pacing:

```bash
# Generate individual segments
edge-tts --voice "$VOICE" --rate="-10%" \
  --text "Segment 1 intro text." \
  --write-media "01.mp3"

edge-tts --voice "$VOICE" --rate="-10%" \
  --text "Segment 2 action text." \
  --write-media "02.mp3"

# Generate silence gaps (2.5s each)
ffmpeg -y -f lavfi -i anullsrc=r=24000:cl=mono -t 2.5 \
  -c:a libmp3lame -q:a 9 silence.mp3

# Concatenate with silence gaps
cat > list.txt << EOF
file '/absolute/path/to/01.mp3'
file '/absolute/path/to/silence.mp3'
file '/absolute/path/to/02.mp3'
EOF

ffmpeg -y -f concat -safe 0 -i list.txt -c:a libmp3lame -q:a 2 output.mp3
```

**IMPORTANT:** Use absolute paths in concat list files. Relative paths are resolved against the list file's location, not the shell's working directory, so they break when the two differ.

### Subtitle Timing with VTT

edge-tts can generate VTT subtitle files with timing for each cue:

```bash
edge-tts --voice "en-US-EmmaNeural" --rate="-10%" \
  --text "Your narration text here." \
  --write-media "output.mp3" \
  --write-subtitles "output.vtt"
```

**Sample VTT output:**
```
1
00:00:00,050 --> 00:00:03,597
First, let's convert this comment to a doc comment.

2
00:00:03,597 --> 00:00:06,430
The lightbulb shows the available action.

3
00:00:06,430 --> 00:00:08,708
Pressing Control-c Control-a applies it.
```

Note that the numbered cues and comma millisecond separators in this sample are SRT-style; strict WebVTT starts with a `WEBVTT` header and uses `.` before the milliseconds. Any parser should accept both.

Use VTT timing to sync demo actions with narration:
- Parse the VTT to find when specific phrases are spoken
- Trigger Emacs actions at those timestamps
- This accounts for TTS pacing variations

### Keybinding Pronunciation

TTS engines struggle with Emacs keybinding notation:

| Written | TTS reads as | Better text |
|---------|--------------|-------------|
| `C-c C-a` | "see see see ay" | `Control-c Control-a` |
| `M-x` | "emm ex" | `Meta-x` or `Alt-x` |
| `C-c a` | silence or garbled | `Control-c a` |

**Tips:**
- Use hyphens, not commas: `Control-c Control-a` (a comma creates a long pause)
- Spell out "Control", not "Ctrl"
- Single letters at the end may be swallowed - use `Capital-A` if needed

### SSML Timing Support (Alternative)

Verify this against your installed edge-tts version first: custom SSML support has been removed in recent releases, in which case the segment-plus-silence approach above is the one to use.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <break time="2s"/>
    The comment above the function can become a doc comment.
    <break time="500ms"/>
    Code action quick shows the fix in the modeline.
    <break time="3s"/>
  </voice>
</speak>
```

**Key SSML tags:**
- `<break time="2s"/>` - pause for N seconds
- `<break time="500ms"/>` - pause in milliseconds
- `<prosody rate="slow">text</prosody>` - adjust speech rate
- `<prosody pitch="high">text</prosody>` - adjust pitch

### Generate Audio
```bash
~/.local/bin/edge-tts --voice en-US-AriaNeural --file narration.ssml --write-media narration.mp3
```

### Merge Video + Audio
```bash
ffmpeg -i video.mp4 -i narration.mp3 -c:v copy -c:a aac -shortest output.mp4
```

## Eglot Configuration for Demos

### Hide Modeline Clutter
```elisp
;; Hide [eglot:project-name] from modeline
(setq eglot--mode-line-format nil)
```

### Disable Document Highlighting
When the cursor moves programmatically, `eglot-highlight-symbol` creates distracting highlights:
```elisp
;; Disable document highlight (prevents flashing when cursor jumps)
(setq eglot-ignored-server-capabilities '(:documentHighlightProvider))
```

### Eglot Lifecycle Issues

**Problem:** `revert-buffer` disconnects eglot from the buffer.

**Solutions tried:**
- `(eglot-ensure)` - not sufficient, doesn't actually reconnect
- `(eglot-reconnect (eglot-current-server))` - works but slow

**Best approach:** Avoid `revert-buffer` during the demo.
Instead:
- Use `git checkout` to reset the file *before* opening it in Emacs
- Or accept that the demo shows cumulative changes

### Clean Echo Area
```elisp
(message nil)
(with-current-buffer " *Echo Area 0*" (erase-buffer))
(with-current-buffer " *Echo Area 1*" (erase-buffer))
(redisplay t)
```

## Demo Timing Synchronization

### Calculate Cumulative Timeline

Given segment durations and silence gaps, calculate when to trigger each action:

```
Segment 01: 10.3s (intro)
Silence 1:  2.5s
Segment 02: 8.8s - contains action cue at 6.4s
Silence 2:  2.5s
Segment 03: 8.1s - contains action cue at 5.4s
...

Timeline:
- Start segment 01: 0s
- Start segment 02: 12.8s (10.3 + 2.5)
- Action 1 trigger: 19.2s (12.8 + 6.4)
- Start segment 03: 24.1s (12.8 + 8.8 + 2.5)
- Action 2 trigger: 29.5s (24.1 + 5.4)
```

### Sync Actions with VTT Cues

1. Generate narration with `--write-subtitles output.vtt`
2. Find the timestamp when the key phrase is spoken (e.g., "applies it")
3. Set shell `sleep` timings to trigger the action at that moment
4. Account for cumulative offsets from previous segments + silences

Example timing block in recording script:
```bash
# Segment 02 starts at 12.8s, action cue "Control-c Control-a applies it" at 6.4s
# So trigger action at 12.8 + 6.4 = 19.2s from start
sleep 12.8  # Wait through intro (10.3s) plus the silence gap (2.5s)
# Move cursor at start of segment 02
emacsclient -e "(goto-char ...)"
sleep 6.4   # Wait until "applies it" phrase (cue offset within segment 02)
# Trigger action
emacsclient -e "(code-action-quick)"
```

## Demo Project

### Location
`test/fixtures/demo/src/main.rs`

### Current Content (3 Code Actions)
```rust
// Demo: code-action-quick in action
// This file has intentional issues that trigger LSP code actions

// A helper function that greets users
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    // Demo 1: Convert comment to doc comment (on greet function)

    // Demo 2: Missing import - HashMap needs std::collections::HashMap
    let mut map: HashMap<String, i32> = HashMap::new();
    map.insert(greet("world"), 42);

    // Demo 3: Unused variable (prefixing with _ fixes it)
    let unused_value = 123;

    println!("Map: {:?}", map);
}
```

**Code actions demonstrated:**
1. Convert `// comment` to `/// doc comment`
2. Import `std::collections::HashMap`
3. Prefix unused variable with `_`

## Recording Workflow

### Automated (Recommended)
```bash
cd /path/to/code-action-quick
./scripts/record-demo-auto.sh
```

### Manual Setup (for debugging)
1. Start Emacs: `emacs test/fixtures/demo/src/main.rs`
2. Enable eglot: `M-x eglot`
3. Wait for rust-analyzer to initialize
4. Enable mode: `M-x code-action-quick-mode`
5. Verify 💡 indicator appears

## Issues & Solutions

### Issue: ffmpeg "output format x11grab not known"
**Cause:** `-window_id` placed after `-i` or the output file, where ffmpeg no longer treats it as an input option
**Solution:** `-window_id "$ID"` must come before `-i :0.0`

### Issue: libx264 dimension error
**Cause:** Odd pixel dimensions
**Solution:** Round to even: `$((WIDTH / 2 * 2))`

### Issue: [eglot:project] in modeline
**Solution:** `(setq eglot--mode-line-format nil)`

### Issue: ffmpeg concat only gets first segment
**Cause:** Relative paths in concat.txt are resolved against the list file's location, not the directory ffmpeg runs from
**Solution:** Use absolute paths in concat file:
```
file '/absolute/path/to/01.mp3' # Not 'file 01.mp3'
```

### Issue: Distracting symbol highlights when cursor moves
**Cause:** `eglot-highlight-symbol` activates on cursor movement
**Solution:** `(setq eglot-ignored-server-capabilities '(:documentHighlightProvider))`

### Issue: Modeline indicator not appearing after buffer revert
**Cause:** `revert-buffer` disconnects eglot from the buffer
**Solution:** Don't use `revert-buffer` during the demo. Reset the file via git before starting.

### Issue: TTS mispronounces keybindings
**Cause:** `C-c C-a` is read as individual letters or garbled
**Solution:** Write as `Control-c Control-a` with hyphens (not commas - they cause pauses)

### Issue: Actions happen before narrator explains them
**Cause:** Fixed sleep timings don't account for actual speech timing
**Solution:** Use VTT subtitle output from edge-tts to find exact phrase timestamps, sync actions accordingly

## Iteration Log

### Attempt 1 - 2026-01-11
- Status: Infrastructure setup complete
- Created: demo project, recording script

### Attempt 2 - 2026-01-11
- Status: Working automated recording
- Window size: 750x500 pixels
- Duration: 35s for 3 code actions
- Output: MP4 + GIF

## TODO
- [x] Test the recording script
- [x] Determine optimal video duration (35s for 3 actions)
- [x] Window-specific recording (no cropping)
- [x] Add TTS narration track
- [x] Sync audio timing with video actions via VTT
- [ ] Final polish and publish

## Reusable Scripts

### Narration Generator Template
`scripts/generate-narration.sh`:
- Generates TTS segments with edge-tts
- Creates silence gaps between segments
- Concatenates to single MP3
- Reports duration for timing calculations

### Recording Script Template
`scripts/record-demo-auto.sh`:
- Starts isolated Emacs daemon
- Configures frame size and appearance
- Waits for LSP initialization
- Records window with ffmpeg
- Executes timed demo sequence
- Merges audio track
- Cleans up daemon
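
### Timing Helper Sketch (Hypothetical)

The timing arithmetic from "Calculate Cumulative Timeline" and the phrase lookup from "Sync Actions with VTT Cues" could be scripted roughly as below. This is a minimal sketch, not code from the repository: the function names `segment_starts`, `cue_start`, and `trigger_time` are made up for illustration, it assumes one fixed silence gap after every segment, and it assumes subtitle timestamps in `hh:mm:ss` form with either `,` or `.` before the milliseconds.

```shell
#!/usr/bin/env bash
# Hypothetical timing helpers; names and interfaces are illustrative only.

# segment_starts GAP DURATION...
# Print the start time of each narration segment, assuming one GAP-second
# silence after every segment.
segment_starts() {
  local gap="$1"
  shift
  printf '%s\n' "$@" | LC_ALL=C awk -v gap="$gap" '
    { printf "segment %02d starts at %.1fs\n", NR, start; start += $1 + gap }'
}

# cue_start FILE PHRASE
# Print the start time (in seconds) of the first subtitle cue containing PHRASE.
# Assumes hh:mm:ss timestamps with "," or "." before the milliseconds.
cue_start() {
  LC_ALL=C awk -v phrase="$2" '
    /-->/ { split($1, t, /[:,.]/)
            start = t[1] * 3600 + t[2] * 60 + t[3] + t[4] / 1000
            next }
    index($0, phrase) { printf "%.3f\n", start; exit }' "$1"
}

# trigger_time SEGMENT_START CUE_OFFSET
# Absolute time at which to fire an action that is cued inside a segment.
trigger_time() {
  LC_ALL=C awk -v s="$1" -v c="$2" 'BEGIN { printf "%.1f\n", s + c }'
}
```

With the durations from the timeline example, `segment_starts 2.5 10.3 8.8 8.1` reports starts of 0.0s, 12.8s, and 24.1s, and `trigger_time 12.8 6.4` gives 19.2. Segment durations could be measured with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 01.mp3` rather than typed in by hand.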