---
title: 'Pipeline Service'
category: 'automation'
last_verified: '2026-02-15'
related_files:
  - 'src/pipeline-service.js'
tags: ['nixos', 'service', 'pipeline', 'automation']
status: 'current'
---

# Pipeline Service Configuration for 333 Method

## Overview

The pipeline service (`src/pipeline-service.js`) runs the 333 Method pipeline in a continuous loop:

**Pipeline Flow:** SERPs → Assets → Scoring → Rescoring → Enrich → Proposals → Outreach → Replies

- Processes 5 sites per stage by default (configurable via `PIPELINE_BATCH_SIZE`)
- Runs at nice 19 (lowest CPU priority) to avoid system slowdown
- Pauses automatically when scheduled tasks need to run
- Auto-restarts on crash via systemd
- Honors `SKIP_STAGES` to skip specific pipeline stages

---

## NixOS Configuration

### Option 1: User Service (Recommended)

Add this to your home-manager config. The capitalized `Unit`/`Service`/`Install` sections are home-manager's `systemd.user.services` syntax; in a NixOS `configuration.nix`, the same option takes `description`, `serviceConfig`, and `wantedBy`, as in the system service example under Alternative: System Service.

```nix
{ config, pkgs, ... }:

let
  projectDir = "/home/jason/code/333Method";
  nodejs = pkgs.nodejs_20;
in
{
  # Systemd service (continuous pipeline loop)
  systemd.user.services."333method-pipeline" = {
    Unit = {
      Description = "333 Method Pipeline Service";
      After = [ "network.target" ];
    };

    Service = {
      Type = "simple";
      WorkingDirectory = projectDir;

      # Load environment from .env file
      EnvironmentFile = "${projectDir}/.env";

      # Run pipeline service (script path is relative to WorkingDirectory)
      ExecStart = "${nodejs}/bin/node src/pipeline-service.js";

      # Auto-restart on failure
      Restart = "always";
      RestartSec = "10s";

      # Run at lowest CPU priority to prevent system slowdown
      Nice = 19;
      IOSchedulingClass = "idle";

      # Security hardening
      PrivateTmp = true;
      NoNewPrivileges = true;
      ProtectSystem = "strict";
      ProtectHome = "read-only";
      ReadWritePaths = [
        "${projectDir}/db"
        "${projectDir}/logs"
        "${projectDir}/screenshots"
        "${projectDir}/.browser-profiles"
      ];
    };

    Install = {
      WantedBy = [ "default.target" ];
    };
  };
}
```

---

## Setup Instructions

1. **Add the configuration to your NixOS or home-manager config:**

   ```bash
   sudo nano /etc/nixos/configuration.nix
   # Add the systemd service configuration
   ```

2. **Rebuild NixOS:**

   ```bash
   sudo nixos-rebuild switch
   # For a standalone home-manager setup, run `home-manager switch` instead
   ```

3. **Reload user systemd:**

   ```bash
   systemctl --user daemon-reload
   ```

4. **Enable and start the service:**

   ```bash
   systemctl --user enable 333method-pipeline
   systemctl --user start 333method-pipeline
   ```

5. **Enable linger (start on boot without login):**

   ```bash
   sudo loginctl enable-linger jason
   ```

---

## Managing the Service

### Check service status

```bash
systemctl --user status 333method-pipeline
```

### View service logs (real-time)

```bash
journalctl --user -u 333method-pipeline -f
```

### View recent logs

```bash
journalctl --user -u 333method-pipeline --since "1 hour ago"
```

### Restart service (after .env changes)

```bash
systemctl --user restart 333method-pipeline
```

### Stop service

```bash
systemctl --user stop 333method-pipeline
```

### Disable service

```bash
systemctl --user disable 333method-pipeline
```

---

## Configuration Options

All configuration is done through environment variables in `.env`:

### Batch Processing

- `PIPELINE_BATCH_SIZE` - Sites per stage (default: 5)
- `PIPELINE_CYCLE_DELAY_MS` - Delay between cycles (default: 1000ms)
- `PIPELINE_PAUSE_CHECK_MS` - Pause check interval (default: 5000ms)

### Skip Stages

- `SKIP_STAGES` - Comma-separated list of stages to skip (e.g., `proposals,outreach`)

**Use case:** Skip stages that are waiting for QA review or approval

**Stage names:** serps, assets, scoring, rescoring, enrich, proposals, outreach, replies

**Example:** `SKIP_STAGES=proposals,outreach` processes sites through enrichment but does not generate proposals or send outreach

**Important:** Restart the service after changing `SKIP_STAGES`:

```bash
systemctl --user restart 333method-pipeline
```

### Browser Configuration

- `BROWSER_CONCURRENCY` - Concurrent browser instances for the Assets stage (default: 3)
- `ENRICHMENT_CONCURRENCY` - Concurrent processing for the Enrich stage (default: 1)
- `CHROMIUM_PATH` - Override the Chromium executable path

### Database & Storage

- `DATABASE_PATH` - SQLite database path (default: `./db/sites.db`)
- `SCREENSHOT_BASE_PATH` - Screenshot storage directory (default: `./screenshots`)

---

## Pipeline Control

The pipeline service reads the `pipeline_control` table to support dynamic pause/resume:

### Pause pipeline (for manual maintenance)

```sql
UPDATE pipeline_control SET paused = 1, paused_by = 'Manual maintenance' WHERE id = 1;
```

### Resume pipeline

```sql
UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1;
```

The service checks this table before each stage and pauses gracefully if needed.

**Note:** Scheduled tasks (cron jobs) automatically pause/resume the pipeline.

---

## Monitoring

### View current stage

```bash
sqlite3 /home/jason/code/333Method/db/sites.db "SELECT current_stage, paused, paused_by FROM pipeline_control WHERE id = 1;"
```

### View pipeline metrics (last 24 hours)

```bash
sqlite3 /home/jason/code/333Method/db/sites.db "SELECT stage_name, COUNT(*) as runs, AVG(duration_ms)/1000 as avg_seconds FROM pipeline_metrics WHERE started_at > datetime('now', '-1 day') GROUP BY stage_name;"
```

### Check site status distribution

```bash
sqlite3 /home/jason/code/333Method/db/sites.db "SELECT status, COUNT(*) as count FROM sites GROUP BY status ORDER BY count DESC;"
```

### View recent errors

```bash
tail -n 100 /home/jason/code/333Method/logs/pipeline-$(date +%Y-%m-%d).log | grep -i error
```

---

## Troubleshooting

### Service won't start

```bash
# Check service status
systemctl --user status 333method-pipeline

# View detailed logs
journalctl --user -u 333method-pipeline -n 50

# Test manually
cd /home/jason/code/333Method
node src/pipeline-service.js
```

### Pipeline stuck on one stage

```bash
# Run from the project directory (paths below are relative)
cd /home/jason/code/333Method

# Check if paused
sqlite3 db/sites.db "SELECT paused, paused_by FROM pipeline_control WHERE id = 1;"

# Resume if needed
sqlite3 db/sites.db "UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1;"
systemctl --user restart 333method-pipeline
```

### High CPU usage

```bash
# Verify Nice priority is set
systemctl --user show 333method-pipeline | grep Nice

# Should show: Nice=19

# Reduce batch size (run from the project directory; this appends a new line,
# so remove any earlier PIPELINE_BATCH_SIZE entry from .env)
echo "PIPELINE_BATCH_SIZE=3" >> .env
systemctl --user restart 333method-pipeline
```

### Environment variables not loading

```bash
# Check .env file exists and is readable
ls -la /home/jason/code/333Method/.env

# Restrict the file to owner read/write
chmod 600 /home/jason/code/333Method/.env

# Verify the unit references the .env file
systemctl --user show 333method-pipeline | grep EnvironmentFile
```

### Pipeline processes same sites repeatedly

This usually means sites are failing and staying at the same status:

```bash
# Run from the project directory (paths below are relative)
# Check for failing sites
sqlite3 db/sites.db "SELECT status, COUNT(*) FROM sites WHERE error_message IS NOT NULL GROUP BY status;"

# View error messages
sqlite3 db/sites.db "SELECT domain, status, error_message FROM sites WHERE error_message IS NOT NULL LIMIT 10;"
```

---

## Best Practices

1. **Start with a low batch size** - Use `PIPELINE_BATCH_SIZE=3` initially and increase it if the system copes
2. **Monitor logs regularly** - Check `journalctl --user -u 333method-pipeline` for errors
3. **Use SKIP_STAGES for QA** - Skip proposals/outreach until you're ready to send
4. **Enable linger** - Ensure the service starts on boot with `loginctl enable-linger`
5. **Regular backups** - The pipeline modifies the database continuously, so back up daily
6. **Watch disk space** - Screenshots accumulate quickly, so monitor storage

---

## Integration with Cron System

The pipeline service and cron system work together:

- **Cron system** (`mmo-cron.timer`) - Runs every minute for scheduled tasks
- **Pipeline service** (`333method-pipeline`) - Continuous pipeline processing

When cron needs exclusive database access, it pauses the pipeline:

```sql
-- Cron task pauses pipeline
UPDATE pipeline_control SET paused = 1, paused_by = 'Scheduled task: backup' WHERE id = 1;

-- Do database backup...

-- Resume pipeline
UPDATE pipeline_control SET paused = 0, paused_by = NULL WHERE id = 1;
```

The pipeline service checks `pipeline_control.paused` before each stage and yields gracefully.

---

## Security Considerations

The systemd service includes resource limits and security hardening:

- `Nice = 19` - Lowest CPU priority (can't starve other processes)
- `IOSchedulingClass = "idle"` - Only uses idle disk I/O
- `PrivateTmp = true` - Private /tmp directory
- `NoNewPrivileges = true` - Can't gain additional privileges
- `ProtectSystem = "strict"` - Read-only system files
- `ProtectHome = "read-only"` - Read-only home directory
- `ReadWritePaths` - Only the listed directories are writable

This prevents the pipeline from:

- Consuming too many system resources
- Writing to files outside the directories listed in `ReadWritePaths`
- Escalating privileges
- Interfering with other system services

---

## Logs

The pipeline service logs to:

1. **systemd journal** - `journalctl --user -u 333method-pipeline`
2. **Application logs** - `logs/pipeline-YYYY-MM-DD.log`

Both logs include:

- Stage execution metrics (succeeded/failed counts)
- Errors and warnings
- Pause/resume events
- Cycle completion times

Log retention:

- systemd journal: System default (journald caps the journal by disk usage rather than age unless `MaxRetentionSec` is set)
- Application logs: 7 days (daily rotation)

---

## Performance Tuning

### For fast processing (high-end system)

```bash
# .env
PIPELINE_BATCH_SIZE=10
BROWSER_CONCURRENCY=5
ENRICHMENT_CONCURRENCY=2
```

### For preventing system lag (low-end system)

```bash
# .env
PIPELINE_BATCH_SIZE=3
BROWSER_CONCURRENCY=1
ENRICHMENT_CONCURRENCY=1
```

### For testing/development

```bash
# .env
PIPELINE_BATCH_SIZE=1
SKIP_STAGES=proposals,outreach
```

---

## Alternative: System Service

If you prefer a system-wide service:

```nix
{ config, pkgs, ... }:

{
  systemd.services."333method-pipeline" = {
    description = "333 Method Pipeline Service";
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];

    serviceConfig = {
      Type = "simple";
      User = "jason";
      Group = "users";
      WorkingDirectory = "/home/jason/code/333Method";
      ExecStart = "${pkgs.nodejs_20}/bin/node src/pipeline-service.js";
      Restart = "always";
      RestartSec = "10s";
      EnvironmentFile = "/home/jason/code/333Method/.env";
      Nice = 19;
      IOSchedulingClass = "idle";
      PrivateTmp = true;
      NoNewPrivileges = true;
      ProtectSystem = "strict";
      ProtectHome = "read-only";
      ReadWritePaths = [
        "/home/jason/code/333Method/db"
        "/home/jason/code/333Method/logs"
        "/home/jason/code/333Method/screenshots"
        "/home/jason/code/333Method/.browser-profiles"
      ];
    };
  };
}
```

Manage with `sudo systemctl` instead of `systemctl --user`.
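
Since the two install modes differ only in the command prefix, a small wrapper can keep the management commands identical for both. The following is a minimal sketch, not part of the project: the function name `pipeline_cmd` and the `PIPELINE_MODE` variable are assumptions made for illustration, and the function only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): print the management command
# for the chosen install mode instead of hard-coding `--user` or `sudo`.
# PIPELINE_MODE is an assumed variable: "user" (default) or "system".

pipeline_cmd() {
  local action="$1"
  local unit="333method-pipeline"
  local mode="${PIPELINE_MODE:-user}"

  if [ "$action" = "logs" ]; then
    # Follow the journal for the unit in the matching scope.
    if [ "$mode" = "system" ]; then
      echo "sudo journalctl -u $unit -f"
    else
      echo "journalctl --user -u $unit -f"
    fi
  elif [ "$mode" = "system" ]; then
    echo "sudo systemctl $action $unit"
  else
    echo "systemctl --user $action $unit"
  fi
}
```

For example, `PIPELINE_MODE=system pipeline_cmd restart` prints `sudo systemctl restart 333method-pipeline`. Printing first makes it easy to check the command before running it, for instance with `eval "$(pipeline_cmd status)"`.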

---

## Recommendations

- **For desktop/laptop**: Use user service (Option 1)
- **For server**: Use system service
- **For development**: Run manually with `node src/pipeline-service.js`

The user service is generally preferred because:

- Easier permission management
- No need for sudo to manage
- Automatically uses your user's environment
- More secure isolation from system services
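
When running manually for development, systemd is not there to read `.env`, so the variables have to be loaded into the shell first. The sketch below shows one way to do that; the `load_env` function is a hypothetical helper, and it assumes every line in `.env` is a plain `KEY=value` pair that is also valid shell syntax (systemd's `EnvironmentFile` parsing is not identical to the shell's).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): export the variables from a
# .env file into the current shell before a manual run.
# Assumes each line is a plain KEY=value pair that the shell can source.

load_env() {
  local env_file="$1"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 1
  fi
  set -a          # export every variable assigned while sourcing
  . "$env_file"
  set +a          # stop auto-exporting
}
```

From the project directory, `load_env ./.env && nice -n 19 node src/pipeline-service.js` then starts the pipeline with the same variables and the same CPU priority the service uses.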