# Demo Creation Guide

This document tracks knowledge gained while creating demo videos for code-action-quick.

## Infrastructure

### Available Tools
- **ffmpeg**: Screen recording with x11grab, window-specific capture via `-window_id`
- **edge-tts**: Microsoft Azure TTS via `pipx install edge-tts` - supports SSML timing
- **emacs --daemon**: Headless Emacs for scripted control via emacsclient

### Automated Recording Script
Location: `scripts/record-demo-auto.sh`

**Features:**
- Fully automated (no manual interaction needed)
- Uses a dedicated Emacs daemon for isolation
- Window-specific recording (no cropping needed)
- Configurable window size (default: 750x500)
- Resets the demo file via `git checkout` after recording

**Usage:**
```bash
./scripts/record-demo-auto.sh
```

**Output:**
- `docs/assets/demo-TIMESTAMP.mp4` - Video file
- `docs/assets/demo-TIMESTAMP.gif` - Animated GIF for the README

### Key Technical Details

**Window-specific ffmpeg recording:**
```bash
WINDOW_ID=$(emacsclient -e "(frame-parameter nil 'outer-window-id)")
ffmpeg -y -f x11grab -framerate 24 -window_id "$WINDOW_ID" -i :0.0 ...
```
Note: `-window_id` must come BEFORE `-i :0.0`

**Emacs frame pixel sizing:**
```elisp
(set-frame-size nil 750 500 t)  ; t = pixel units, not characters
```

**Hide the eglot modeline indicator:**
```elisp
(setq eglot--mode-line-format nil)
```

**Clean the echo area before recording:**
```elisp
(message nil)
(with-current-buffer " *Echo Area 0*" (erase-buffer))
(with-current-buffer " *Echo Area 1*" (erase-buffer))
```

## Text-to-Speech Narration

### edge-tts Installation
```bash
pipx install edge-tts
~/.local/bin/edge-tts --list-voices
```

### Voice Selection

**Chosen voice:** `en-US-EmmaNeural` with a `-10%` rate adjustment

Testing voices:
```bash
# Generate samples for comparison
for voice in AriaNeural AvaNeural EmmaNeural JennyNeural MichelleNeural; do
    edge-tts --voice "en-US-${voice}" --rate="-10%" \
        --text "This is code action quick. It shows LSP fixes in the modeline." \
        --write-media "sample-${voice}.mp3"
done
```

### Available Voices (US English - Female)
| Voice | Style |
|-------|-------|
| `en-US-AriaNeural` | Positive, Confident |
| `en-US-JennyNeural` | Friendly, Considerate, Comfort |
| `en-US-AvaNeural` | Expressive, Caring, Pleasant |
| `en-US-EmmaNeural` | Cheerful, Clear, Conversational |
| `en-US-MichelleNeural` | Friendly, Pleasant |
| `en-US-AnaNeural` | Cartoon, Cute |

### Available Voices (US English - Male)
| Voice | Style |
|-------|-------|
| `en-US-BrianNeural` | Approachable, Casual, Sincere |
| `en-US-AndrewNeural` | Warm, Confident, Authentic |
| `en-US-ChristopherNeural` | Reliable, Authority |
| `en-US-GuyNeural` | Passion |
| `en-US-RogerNeural` | Lively |

### Narration Script Structure

Build the narration from segments with silence gaps for pacing:

```bash
# Generate individual segments
edge-tts --voice "$VOICE" --rate="-10%" \
    --text "Segment 1 intro text." \
    --write-media "01.mp3"

edge-tts --voice "$VOICE" --rate="-10%" \
    --text "Segment 2 action text." \
    --write-media "02.mp3"

# Generate silence gaps (2.5s each)
ffmpeg -y -f lavfi -i anullsrc=r=24000:cl=mono -t 2.5 \
    -c:a libmp3lame -q:a 9 silence.mp3

# Concatenate with silence gaps
cat > list.txt << EOF
file '/absolute/path/to/01.mp3'
file '/absolute/path/to/silence.mp3'
file '/absolute/path/to/02.mp3'
EOF

ffmpeg -y -f concat -safe 0 -i list.txt -c:a libmp3lame -q:a 2 output.mp3
```

**IMPORTANT:** Use absolute paths in concat list files. Relative paths break if the working directory differs.

### Subtitle Timing with VTT

edge-tts can generate VTT subtitle files with word-level timing:

```bash
edge-tts --voice "en-US-EmmaNeural" --rate="-10%" \
    --text "Your narration text here." \
    --write-media "output.mp3" \
    --write-subtitles "output.vtt"
```
**Sample VTT output** (note: WebVTT files begin with a `WEBVTT` header and use `.`, not `,`, before the milliseconds):
```
WEBVTT

1
00:00:00.050 --> 00:00:03.597
First, let's convert this comment to a doc comment.

2
00:00:03.597 --> 00:00:06.430
The lightbulb shows the available action.

3
00:00:06.430 --> 00:00:08.708
Pressing Control-c Control-a applies it.
```

Use VTT timing to sync demo actions with narration:
- Parse the VTT to find when specific phrases are spoken
- Trigger Emacs actions at those timestamps
- Accounts for TTS pacing variations
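
A minimal sketch of that lookup, assuming a plain WebVTT file whose timing lines contain ` --> `; `find_cue_start` is a made-up helper name for illustration, not part of edge-tts:

```bash
# find_cue_start FILE PHRASE
# Print the start timestamp of the first cue whose text contains PHRASE.
find_cue_start() {
    awk -v phrase="$2" '
        / --> / { start = $1 }                         # remember latest timing line
        index($0, phrase) && start != "" { print start; exit }
    ' "$1"
}
```

The printed timestamp (e.g. `00:00:06.430`) can then be converted to seconds and used when computing the `sleep` schedule for the recording script.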

### Keybinding Pronunciation

TTS engines struggle with Emacs keybinding notation:

| Written | TTS reads as | Better text |
|---------|--------------|-------------|
| `C-c C-a` | "see see see ay" | `Control-c Control-a` |
| `M-x` | "emm ex" | `Meta-x` or `Alt-x` |
| `C-c a` | silence or garbled | `Control-c a` |

**Tips:**
- Use hyphens, not commas: `Control-c Control-a` (a comma creates a long pause)
- Spell out "Control", not "Ctrl"
- Single letters at the end may be swallowed - use `Capital-A` if needed
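
These rewrites can be applied automatically before text reaches edge-tts. A rough sed-based sketch (`speakify` is a hypothetical helper; extend the substitution list as needed):

```bash
# speakify TEXT
# Rewrite Emacs keybinding notation into TTS-friendly words.
# (Hypothetical helper: naive global substitution, good enough for
# short narration lines that don't otherwise contain "C-" or "M-".)
speakify() {
    printf '%s\n' "$1" | sed \
        -e 's/C-/Control-/g' \
        -e 's/M-/Meta-/g'
}

speakify "Press C-c C-a to apply the fix"
# -> Press Control-c Control-a to apply the fix
```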

### SSML Timing Support (Alternative)
```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <break time="2s"/>
    The comment above the function can become a doc comment.
    <break time="500ms"/>
    Code action quick shows the fix in the modeline.
    <break time="3s"/>
  </voice>
</speak>
```

**Key SSML tags:**
- `<break time="2s"/>` - pause for N seconds
- `<break time="500ms"/>` - pause in milliseconds
- `<prosody rate="slow">text</prosody>` - adjust speech rate
- `<prosody pitch="high">text</prosody>` - adjust pitch

### Generate Audio
```bash
~/.local/bin/edge-tts --voice en-US-AriaNeural --file narration.ssml --write-media narration.mp3
```

### Merge Video + Audio
```bash
ffmpeg -i video.mp4 -i narration.mp3 -c:v copy -c:a aac -shortest output.mp4
```

## Eglot Configuration for Demos

### Hide Modeline Clutter
```elisp
;; Hide [eglot:project-name] from the modeline
(setq eglot--mode-line-format nil)
```

### Disable Document Highlighting
When the cursor moves programmatically, eglot's symbol highlighting creates distracting highlights:
```elisp
;; Disable document highlight (prevents flashing when the cursor jumps)
(setq eglot-ignored-server-capabilities '(:documentHighlightProvider))
```

### Eglot Lifecycle Issues

**Problem:** `revert-buffer` disconnects eglot from the buffer.

**Solutions tried:**
- `(eglot-ensure)` - not sufficient; it doesn't actually reconnect
- `(eglot-reconnect (eglot-current-server))` - works, but slow

**Best approach:** Avoid `revert-buffer` during the demo. Instead:
- Use `git checkout` to reset the file *before* opening it in Emacs
- Or accept that the demo shows cumulative changes
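
The git-based reset can be wrapped in a tiny helper that runs before the Emacs daemon ever opens the buffer (a sketch; `reset_demo_file` is a made-up name):

```bash
# reset_demo_file REPO FILE
# Discard local edits to FILE (path relative to REPO) so every take
# starts from the committed version of the demo source.
reset_demo_file() {
    git -C "$1" checkout -- "$2"
}
```

Calling this before starting the daemon means eglot connects exactly once to a clean buffer, and `revert-buffer` never enters the picture.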

### Clean Echo Area
```elisp
(message nil)
(with-current-buffer " *Echo Area 0*" (erase-buffer))
(with-current-buffer " *Echo Area 1*" (erase-buffer))
(redisplay t)
```

## Demo Timing Synchronization

### Calculate Cumulative Timeline

Given segment durations and silence gaps, calculate when to trigger each action:

```
Segment 01: 10.3s (intro)
Silence 1:   2.5s
Segment 02:  8.8s - contains action cue at 6.4s
Silence 2:   2.5s
Segment 03:  8.1s - contains action cue at 5.4s
...

Timeline:
- Start segment 01: 0s
- Start segment 02: 12.8s (10.3 + 2.5)
- Action 1 trigger: 19.2s (12.8 + 6.4)
- Start segment 03: 24.1s (12.8 + 8.8 + 2.5)
- Action 2 trigger: 29.5s (24.1 + 5.4)
```
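
The same arithmetic can be scripted so the timeline recomputes itself whenever a segment is re-recorded. A sketch using the example durations above (`segment_starts` is a made-up helper; real durations would be measured, e.g. with `ffprobe -v error -show_entries format=duration -of csv=p=0 FILE`):

```bash
# segment_starts GAP DUR1 DUR2 ...
# Print when each narration segment begins, inserting GAP seconds of
# silence after every segment.
segment_starts() {
    gap=$1; shift
    awk -v gap="$gap" 'BEGIN {
        t = 0
        for (i = 1; i < ARGC; i++) {
            printf "segment %02d starts at %.1fs\n", i, t
            t += ARGV[i] + gap
        }
    }' "$@"
}

segment_starts 2.5 10.3 8.8 8.1
# segment 01 starts at 0.0s
# segment 02 starts at 12.8s
# segment 03 starts at 24.1s
```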

### Sync Actions with VTT Cues

1. Generate narration with `--write-subtitles output.vtt`
2. Find the timestamp when the key phrase is spoken (e.g., "applies it")
3. Set shell `sleep` timings to trigger the action at that moment
4. Account for cumulative offsets from previous segments and silences

Example timing block in the recording script:
```bash
# Segment 02 starts at 12.8s, action cue "Control-c Control-a applies it" at 6.4s
# So trigger the action at 12.8 + 6.4 = 19.2s from the start
sleep 10  # Wait through the intro (10.3s)
# Move the cursor at the start of segment 02
emacsclient -e "(goto-char ...)"
sleep 7   # Wait until the "applies it" phrase
# Trigger the action
emacsclient -e "(code-action-quick)"
```

## Demo Project

### Location
`test/fixtures/demo/src/main.rs`

### Current Content (3 Code Actions)
```rust
// Demo: code-action-quick in action
// This file has intentional issues that trigger LSP code actions

// A helper function that greets users
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    // Demo 1: Convert comment to doc comment (on greet function)

    // Demo 2: Missing import - HashMap needs std::collections::HashMap
    let mut map: HashMap<String, i32> = HashMap::new();
    map.insert(greet("world"), 42);

    // Demo 3: Unused variable (prefixing with _ fixes it)
    let unused_value = 123;

    println!("Map: {:?}", map);
}
```

**Code actions demonstrated:**
1. Convert `// comment` to `/// doc comment`
2. Import `std::collections::HashMap`
3. Prefix the unused variable with `_`

## Recording Workflow

### Automated (Recommended)
```bash
cd /path/to/code-action-quick
./scripts/record-demo-auto.sh
```

### Manual Setup (for debugging)
1. Start Emacs: `emacs test/fixtures/demo/src/main.rs`
2. Enable eglot: `M-x eglot`
3. Wait for rust-analyzer to initialize
4. Enable the mode: `M-x code-action-quick-mode`
5. Verify the 💡 indicator appears

## Issues & Solutions

### Issue: ffmpeg "output format x11grab not known"
**Cause:** `-window_id` placed after the output file
**Solution:** `-window_id "$ID"` must come before `-i :0.0`

### Issue: libx264 dimension error
**Cause:** Odd pixel dimensions
**Solution:** Round down to even: `$((WIDTH / 2 * 2))`
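
The fix works because shell integer division truncates; ffmpeg can also do the rounding itself with the standard `trunc(iw/2)*2` scale expression:

```bash
# Integer division truncates, so /2*2 drops an odd trailing pixel.
WIDTH=751
HEIGHT=501
EVEN_W=$((WIDTH / 2 * 2))    # 750
EVEN_H=$((HEIGHT / 2 * 2))   # 500
echo "${EVEN_W}x${EVEN_H}"   # 750x500

# Equivalent fix inside ffmpeg, with no shell arithmetic:
# ffmpeg -i in.mp4 -vf 'scale=trunc(iw/2)*2:trunc(ih/2)*2' out.mp4
```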

### Issue: [eglot:project] in modeline
**Solution:** `(setq eglot--mode-line-format nil)`

### Issue: ffmpeg concat only gets the first segment
**Cause:** Relative paths in the concat list while ffmpeg runs from a different directory
**Solution:** Use absolute paths in the concat file:
```
file '/absolute/path/to/01.mp3'   # Not 'file 01.mp3'
```

### Issue: Distracting symbol highlights when the cursor moves
**Cause:** eglot's document highlighting activates on cursor movement
**Solution:** `(setq eglot-ignored-server-capabilities '(:documentHighlightProvider))`

### Issue: Modeline indicator not appearing after a buffer revert
**Cause:** `revert-buffer` disconnects eglot from the buffer
**Solution:** Don't use `revert-buffer` during the demo; reset the file via git before starting.

### Issue: TTS mispronounces keybindings
**Cause:** `C-c C-a` is read as individual letters, or garbled
**Solution:** Write it as `Control-c Control-a` with hyphens (not commas, which cause long pauses)

### Issue: Actions happen before the narrator explains them
**Cause:** Fixed sleep timings don't account for actual speech timing
**Solution:** Use the VTT subtitle output from edge-tts to find exact phrase timestamps, and sync actions accordingly

## Iteration Log

### Attempt 1 - 2026-01-11
- Status: Infrastructure setup complete
- Created: demo project, recording script

### Attempt 2 - 2026-01-11
- Status: Working automated recording
- Window size: 750x500 pixels
- Duration: 35s for 3 code actions
- Output: MP4 + GIF

## TODO
- [x] Test the recording script
- [x] Determine optimal video duration (35s for 3 actions)
- [x] Window-specific recording (no cropping)
- [x] Add TTS narration track
- [x] Sync audio timing with video actions via VTT
- [ ] Final polish and publish

## Reusable Scripts

### Narration Generator Template
`scripts/generate-narration.sh`:
- Generates TTS segments with edge-tts
- Creates silence gaps between segments
- Concatenates them into a single MP3
- Reports duration for timing calculations

### Recording Script Template
`scripts/record-demo-auto.sh`:
- Starts an isolated Emacs daemon
- Configures frame size and appearance
- Waits for LSP initialization
- Records the window with ffmpeg
- Executes the timed demo sequence
- Merges the audio track
- Cleans up the daemon