---
title: 'Pipeline Service'
category: 'automation'
last_verified: '2026-02-15'
related_files:
  - 'src/pipeline-service.js'
tags: ['nixos', 'service', 'pipeline', 'automation']
status: 'current'
---

# Pipeline Service Configuration for 333 Method

## Overview

The pipeline service (`src/pipeline-service.js`) runs the 333 Method pipeline continuously in a loop:

**Pipeline Flow:** SERPs → Assets → Scoring → Rescoring → Enrich → Proposals → Outreach → Replies

- Processes 5 sites per stage by default (configurable via `PIPELINE_BATCH_SIZE`)
- Runs with nice 19 (low CPU priority) to avoid system slowdown
- Pauses automatically when scheduled tasks need to run
- Auto-restarts on crash via systemd
- Respects `SKIP_STAGES` to skip specific pipeline stages

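The loop described above can be sketched as follows. This is a minimal illustration, not the code in `src/pipeline-service.js`: `runCycle`, the `handlers` map, and the no-op handlers are hypothetical names introduced here, and the sketch assumes a skipped stage is simply never invoked while every other stage still runs in order.

```javascript
// Stage names in the order given by the pipeline flow above.
const STAGES = ['serps', 'assets', 'scoring', 'rescoring', 'enrich', 'proposals', 'outreach', 'replies'];

// Hypothetical helper: run one pass over the stages.
// `handlers` maps a stage name to an async function that processes up to
// `batchSize` sites; `skip` holds the stage names taken from SKIP_STAGES.
async function runCycle(handlers, { batchSize = 5, skip = new Set() } = {}) {
  const ran = [];
  for (const stage of STAGES) {
    if (skip.has(stage)) continue; // skipped stages are never invoked
    await handlers[stage](batchSize);
    ran.push(stage);
  }
  return ran;
}

// Example: no-op handlers, with proposals and outreach skipped.
const noopHandlers = Object.fromEntries(STAGES.map((s) => [s, async () => {}]));
runCycle(noopHandlers, { skip: new Set(['proposals', 'outreach']) })
  .then((ran) => console.log(ran.join(' -> ')));
```
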
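---

## NixOS Configuration

### Option 1: User Service (Recommended)

Add this to your home-manager config. The `Unit` / `Service` / `Install` layout below is home-manager syntax; in a NixOS `configuration.nix`, `systemd.user.services` instead takes `description`, `after`, `wantedBy`, and `serviceConfig`, the same shape as the system service under "Alternative: System Service".
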
```nix
{ config, pkgs, ... }:

let
  projectDir = "/home/jason/code/333Method";
  nodejs = pkgs.nodejs_20;
in
{
  # Systemd service (continuous pipeline loop)
  systemd.user.services."333method-pipeline" = {
    Unit = {
      Description = "333 Method Pipeline Service";
      After = [ "network.target" ];
    };

    Service = {
      Type = "simple";
      WorkingDirectory = projectDir;

      # Load environment from .env file
      EnvironmentFile = "${projectDir}/.env";

      # Run pipeline service
      ExecStart = "${nodejs}/bin/node src/pipeline-service.js";

      # Auto-restart on failure
      Restart = "always";
      RestartSec = "10s";

      # Run at lowest CPU priority to prevent system slowdown
      Nice = 19;
      IOSchedulingClass = "idle";

      # Security hardening
      PrivateTmp = true;
      NoNewPrivileges = true;
      ProtectSystem = "strict";
      ProtectHome = "read-only";
      ReadWritePaths = [
        "${projectDir}/db"
        "${projectDir}/logs"
        "${projectDir}/screenshots"
        "${projectDir}/.browser-profiles"
      ];
    };

    Install = {
      WantedBy = [ "default.target" ];
    };
  };
}
```

---

## Setup Instructions

1. **Add the service configuration to your Nix config:**

   ```bash
   sudo nano /etc/nixos/configuration.nix
   # Add the systemd service configuration
   # (for a home-manager setup, edit your home-manager config instead)
   ```

2. **Rebuild:**

   ```bash
   sudo nixos-rebuild switch
   # or, for a home-manager config: home-manager switch
   ```

3. **Reload user systemd:**

   ```bash
   systemctl --user daemon-reload
   ```

4. **Enable and start the service:**

   ```bash
   systemctl --user enable 333method-pipeline
   systemctl --user start 333method-pipeline
   ```

5. **Enable linger (start on boot without login):**

   ```bash
   sudo loginctl enable-linger jason
   ```

---

## Managing the Service

### Check service status

```bash
systemctl --user status 333method-pipeline
```

### View service logs (real-time)

```bash
journalctl --user -u 333method-pipeline -f
```

### View recent logs

```bash
journalctl --user -u 333method-pipeline --since "1 hour ago"
```

### Restart service (after .env changes)

```bash
systemctl --user restart 333method-pipeline
```

### Stop service

```bash
systemctl --user stop 333method-pipeline
```

### Disable service

```bash
systemctl --user disable 333method-pipeline
```

---

## Configuration Options

All configuration is via environment variables in `.env`:

### Batch Processing

- `PIPELINE_BATCH_SIZE` - Sites per stage (default: 5)
- `PIPELINE_CYCLE_DELAY_MS` - Delay between cycles (default: 1000ms)
- `PIPELINE_PAUSE_CHECK_MS` - Pause check interval (default: 5000ms)

### Skip Stages

- `SKIP_STAGES` - Comma-separated list of stages to skip (e.g., `proposals,outreach`)

  **Use case:** Skip stages waiting for QA review/approval

  **Stage names:** serps, assets, scoring, rescoring, enrich, proposals, outreach, replies

  **Example:** `SKIP_STAGES=proposals,outreach` runs the stages through enrichment but leaves proposals and outreach unprocessed until the setting is removed

  **Important:** Restart the service after changing `SKIP_STAGES`:

  ```bash
  systemctl --user restart 333method-pipeline
  ```

### Browser Configuration

- `BROWSER_CONCURRENCY` - Concurrent browser instances for Assets stage (default: 3)
- `ENRICHMENT_CONCURRENCY` - Concurrent processing for Enrichment (default: 1)
- `CHROMIUM_PATH` - Override Chromium executable path

### Database & Storage

- `DATABASE_PATH` - SQLite database path (default: `./db/sites.db`)
- `SCREENSHOT_BASE_PATH` - Screenshot storage directory (default: `./screenshots`)

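To make the defaults above concrete, here is one way the variables could be read and validated. It is a sketch under assumptions: `loadConfig` and `intFromEnv` are hypothetical helpers, the returned property names are invented for illustration, and rejecting unknown stage names (rather than ignoring them) is a design choice made here, not documented behaviour of the service.

```javascript
// Stage names as listed under "Skip Stages".
const STAGE_NAMES = ['serps', 'assets', 'scoring', 'rescoring', 'enrich', 'proposals', 'outreach', 'replies'];

// Read a non-negative integer from `env`, falling back to the documented default.
function intFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical loader; the real service may parse its settings differently.
function loadConfig(env = process.env) {
  const skipStages = (env.SKIP_STAGES || '')
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter(Boolean);
  const unknown = skipStages.filter((s) => !STAGE_NAMES.includes(s));
  if (unknown.length > 0) {
    throw new Error(`Unknown stage(s) in SKIP_STAGES: ${unknown.join(', ')}`);
  }
  return {
    batchSize: intFromEnv(env, 'PIPELINE_BATCH_SIZE', 5),
    cycleDelayMs: intFromEnv(env, 'PIPELINE_CYCLE_DELAY_MS', 1000),
    pauseCheckMs: intFromEnv(env, 'PIPELINE_PAUSE_CHECK_MS', 5000),
    browserConcurrency: intFromEnv(env, 'BROWSER_CONCURRENCY', 3),
    enrichmentConcurrency: intFromEnv(env, 'ENRICHMENT_CONCURRENCY', 1),
    databasePath: env.DATABASE_PATH || './db/sites.db',
    screenshotBasePath: env.SCREENSHOT_BASE_PATH || './screenshots',
    chromiumPath: env.CHROMIUM_PATH || null,
    skipStages,
  };
}

console.log(loadConfig({ PIPELINE_BATCH_SIZE: '10', SKIP_STAGES: 'proposals, outreach' }));
```
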
---

## Pipeline Control

The pipeline service respects the `pipeline_control` table for dynamic pause/resume:

### Pause pipeline (for manual maintenance)

```sql
UPDATE pipeline_control SET paused = 1, paused_by = 'Manual maintenance' WHERE id = 1;
```

### Resume pipeline

```sql
UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1;
```

The service checks this table before each stage and pauses gracefully if needed.

**Note:** Scheduled tasks (cron jobs) automatically pause/resume the pipeline.

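The check-before-each-stage behaviour can be pictured as a polling wait. The sketch below is an assumption about the general shape, not the service's actual code: `waitWhilePaused` is a hypothetical helper, and `isPaused` stands in for whatever query reads `pipeline_control.paused`, injected so the example does not depend on a particular SQLite client. The poll interval corresponds to `PIPELINE_PAUSE_CHECK_MS`.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: block until the pipeline is no longer paused.
// `isPaused` is an async function that would read `pipeline_control.paused`.
async function waitWhilePaused(isPaused, { checkMs = 5000, sleepFn = sleep } = {}) {
  let checks = 0;
  while (await isPaused()) {
    checks += 1;
    await sleepFn(checkMs); // wait one pause-check interval before polling again
  }
  return checks; // number of polls that found the pipeline paused
}

// Example: paused for the first two polls, then resumed.
const answers = [true, true, false];
waitWhilePaused(async () => answers.shift(), { checkMs: 10 })
  .then((checks) => console.log(`resumed after ${checks} paused checks`));
```
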
---

## Monitoring

### View current stage

```bash
sqlite3 /home/jason/code/333Method/db/sites.db "SELECT current_stage, paused, paused_by FROM pipeline_control WHERE id = 1;"
```

### View pipeline metrics (last 24 hours)

```bash
sqlite3 /home/jason/code/333Method/db/sites.db "SELECT stage_name, COUNT(*) as runs, AVG(duration_ms)/1000 as avg_seconds FROM pipeline_metrics WHERE started_at > datetime('now', '-1 day') GROUP BY stage_name;"
```

### Check site status distribution

```bash
sqlite3 /home/jason/code/333Method/db/sites.db "SELECT status, COUNT(*) as count FROM sites GROUP BY status ORDER BY count DESC;"
```

### View recent errors

```bash
# Search the last 100 lines of today's log for errors
tail -n 100 /home/jason/code/333Method/logs/pipeline-$(date +%Y-%m-%d).log | grep -i error
```

---

## Troubleshooting

### Service won't start

```bash
# Check service status
systemctl --user status 333method-pipeline

# View detailed logs
journalctl --user -u 333method-pipeline -n 50

# Test manually
cd /home/jason/code/333Method
node src/pipeline-service.js
```

### Pipeline stuck on one stage

```bash
# Run from the project directory
cd /home/jason/code/333Method

# Check if paused
sqlite3 db/sites.db "SELECT paused, paused_by FROM pipeline_control WHERE id = 1;"

# Resume if needed
sqlite3 db/sites.db "UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1;"
systemctl --user restart 333method-pipeline
```

### High CPU usage

```bash
# Verify Nice priority is set
systemctl --user show 333method-pipeline | grep Nice

# Should show: Nice=19

# Reduce batch size (if .env already sets PIPELINE_BATCH_SIZE,
# edit that line instead of appending a duplicate)
echo "PIPELINE_BATCH_SIZE=3" >> /home/jason/code/333Method/.env
systemctl --user restart 333method-pipeline
```

### Environment variables not loading

```bash
# Check .env file exists and is readable
ls -la /home/jason/code/333Method/.env

# Restrict the file to owner read/write (the service runs as your user)
chmod 600 /home/jason/code/333Method/.env

# Confirm the unit references the file
systemctl --user show 333method-pipeline | grep EnvironmentFile
```

### Pipeline processes same sites repeatedly

This usually means sites are failing and staying at the same status:

```bash
# Check for failing sites
sqlite3 db/sites.db "SELECT status, COUNT(*) FROM sites WHERE error_message IS NOT NULL GROUP BY status;"

# View error messages
sqlite3 db/sites.db "SELECT domain, status, error_message FROM sites WHERE error_message IS NOT NULL LIMIT 10;"
```

---

## Best Practices

1. **Start with low batch size** - Use `PIPELINE_BATCH_SIZE=3` initially, increase if system handles it
2. **Monitor logs regularly** - Check `journalctl --user -u 333method-pipeline` for errors
3. **Use SKIP_STAGES for QA** - Skip proposals/outreach until you're ready to send
4. **Enable linger** - Ensure service starts on boot with `loginctl enable-linger`
5. **Regular backups** - Pipeline modifies database continuously, back up daily
6. **Watch disk space** - Screenshots accumulate quickly, monitor storage

---

## Integration with Cron System

The pipeline service and cron system work together:

- **Cron system** (`mmo-cron.timer`) - Runs every minute to execute scheduled tasks
- **Pipeline service** (`333method-pipeline`) - Continuous pipeline processing

When cron needs exclusive database access, it pauses the pipeline:

```sql
-- Cron task pauses pipeline
UPDATE pipeline_control SET paused = 1, paused_by = 'Scheduled task: backup' WHERE id = 1;

-- Do database backup...

-- Resume pipeline
UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1;
```

The pipeline service checks `pipeline_control.paused` before each stage and yields gracefully.

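A scheduled task that pauses the pipeline should resume it even when the task fails, otherwise the pipeline stays paused until someone clears the flag by hand. The sketch below shows one way to guarantee that with `try`/`finally`. `withPipelinePaused` is a hypothetical wrapper, and `run(sql, params)` is an assumed async function that executes a single statement; substitute the call your SQLite client provides. Because the service only checks the flag between stages, a real task may also need to wait for the in-flight stage to finish, which this sketch does not attempt.

```javascript
// Hypothetical wrapper for a scheduled task that needs exclusive database access.
async function withPipelinePaused(run, reason, task) {
  await run('UPDATE pipeline_control SET paused = 1, paused_by = ? WHERE id = 1', [reason]);
  try {
    return await task();
  } finally {
    // Always resume, even if the task throws.
    await run('UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1', []);
  }
}

// Example with a stand-in `run` that only records the statements it receives.
const statements = [];
const fakeRun = async (sql, params) => { statements.push({ sql, params }); };
withPipelinePaused(fakeRun, 'Scheduled task: backup', async () => 'backup done')
  .then((result) => console.log(result, statements.length));
```
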
---

## Security Considerations

The systemd service combines resource limits with security hardening:

- `Nice = 19` - Lowest CPU priority, so the pipeline yields to other processes
- `IOSchedulingClass = "idle"` - Only uses idle disk I/O
- `PrivateTmp = true` - Private /tmp directory
- `NoNewPrivileges = true` - Can't gain additional privileges
- `ProtectSystem = "strict"` - Read-only system files
- `ProtectHome = "read-only"` - Read-only home directory
- `ReadWritePaths` - Only the listed directories are writable

This prevents the pipeline from:

- Starving other processes of CPU or disk I/O
- Writing to files outside the directories listed in `ReadWritePaths` (reads are still allowed)
- Escalating privileges
- Interfering with other system services

---

## Logs

Pipeline service logs to:

1. **systemd journal** - `journalctl --user -u 333method-pipeline`
2. **Application logs** - `logs/pipeline-YYYY-MM-DD.log`

Both logs include:

- Stage execution metrics (succeeded/failed counts)
- Errors and warnings
- Pause/resume events
- Cycle completion times

Log retention:

- systemd journal: Governed by journald settings; by default it is capped by disk usage rather than age, unless `MaxRetentionSec` is set
- Application logs: 7 days (daily rotation)

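As an illustration of the 7-day application log retention, the sketch below selects which `pipeline-YYYY-MM-DD.log` files have aged out. `expiredLogs` is a hypothetical helper, not the service's rotation code, and whether the file exactly seven days old is kept (as it is here) is an assumption; the date arithmetic uses UTC for simplicity.

```javascript
// Return the log file names whose date is more than `keepDays` days before `today`.
// Files that do not match the pipeline-YYYY-MM-DD.log pattern are ignored.
function expiredLogs(fileNames, today, keepDays = 7) {
  const startOfToday = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate());
  const cutoff = startOfToday - keepDays * 24 * 60 * 60 * 1000;
  return fileNames.filter((name) => {
    const match = /^pipeline-(\d{4})-(\d{2})-(\d{2})\.log$/.exec(name);
    if (!match) return false;
    const stamp = Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
    return stamp < cutoff;
  });
}

const files = ['pipeline-2026-02-15.log', 'pipeline-2026-02-08.log', 'pipeline-2026-02-07.log', 'notes.txt'];
console.log(expiredLogs(files, new Date(Date.UTC(2026, 1, 15))));
```
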
---

## Performance Tuning

### For fast processing (high-end system)

```bash
# .env
PIPELINE_BATCH_SIZE=10
BROWSER_CONCURRENCY=5
ENRICHMENT_CONCURRENCY=2
```

### For preventing system lag (low-end system)

```bash
# .env
PIPELINE_BATCH_SIZE=3
BROWSER_CONCURRENCY=1
ENRICHMENT_CONCURRENCY=1
```

### For testing/development

```bash
# .env
PIPELINE_BATCH_SIZE=1
SKIP_STAGES=proposals,outreach
```

---

## Alternative: System Service

If you prefer a system-wide service:

```nix
{ config, pkgs, ... }:

{
  systemd.services."333method-pipeline" = {
    description = "333 Method Pipeline Service";
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];

    serviceConfig = {
      Type = "simple";
      User = "jason";
      Group = "users";
      WorkingDirectory = "/home/jason/code/333Method";
      ExecStart = "${pkgs.nodejs_20}/bin/node src/pipeline-service.js";
      Restart = "always";
      RestartSec = "10s";
      EnvironmentFile = "/home/jason/code/333Method/.env";
      Nice = 19;
      IOSchedulingClass = "idle";
      PrivateTmp = true;
      NoNewPrivileges = true;
      ProtectSystem = "strict";
      ProtectHome = "read-only";
      ReadWritePaths = [
        "/home/jason/code/333Method/db"
        "/home/jason/code/333Method/logs"
        "/home/jason/code/333Method/screenshots"
        "/home/jason/code/333Method/.browser-profiles"
      ];
    };
  };
}
```

Manage with `sudo systemctl` instead of `systemctl --user`.

---

## Recommendations

- **For desktop/laptop**: Use user service (Option 1)
- **For server**: Use system service
- **For development**: Run manually with `node src/pipeline-service.js`

The user service is generally preferred because:

- Easier permission management
- No need for sudo to manage
- Runs as your user without `User`/`Group` settings
- Kept separate from system services