/ docs / 90-archive / cron-batch-strategy.md
cron-batch-strategy.md
  1  ---
  2  title: 'Cron Batch Strategy'
  3  category: 'archive'
  4  last_verified: '2026-02-15'
  5  tags: ['cron', 'batch', 'strategy', 'scheduling', 'testing', 'api', 'ai', 'llm']
  6  status: 'archived'
  7  ---
  8  
  9  # Cron Job Batch Strategy
 10  
 11  ## Problem: Long-Running Tasks Block Other Jobs
 12  
 13  Cron jobs run **sequentially** (one at a time). If a task takes 30 minutes, nothing else runs during that time.
 14  
 15  **Example of blocking:**
 16  
 17  ```
 18  05:00 - Assets stage starts (takes 30 minutes)
 19  05:05 - Email sync is due, but BLOCKED (assets still running)
 20  05:10 - SMS sync is due, but BLOCKED
 21  05:15 - Scoring is due, but BLOCKED
 22  05:30 - Assets completes
 23  05:35 - Email sync finally runs (30 minutes late!)
 24  ```
 25  
 26  ## Solution: Time-Boxed Batches
 27  
 28  Instead of processing 100 sites in one go, each stage processes a **small batch** and returns control within 1-2 minutes.
 29  
 30  **Example with batches:**
 31  
 32  ```
 33  05:00 - Assets stage: 5 sites (2 minutes) ✅
 34  05:02 - Returns control
 35  05:05 - Email sync runs (on time!) ✅
 36  05:06 - Returns control
 37  05:10 - Scoring: 10 sites (1 minute) ✅
 38  05:11 - Returns control
 39  05:15 - Assets stage: next 5 sites ✅
 40  ```
 41  
 42  ## Current Batch Sizes
 43  
 44  These are configured in the `cron_jobs` table:
 45  
 46  | Stage          | Batch Size | Why                                        |
 47  | -------------- | ---------- | ------------------------------------------ |
 48  | **Assets**     | 5 sites    | Playwright is slow (~20-30s per site)      |
 49  | **Scoring**    | 10 sites   | Fast GPT-4o-mini calls (~5s per site)      |
 50  | **Rescoring**  | 5 sites    | Slower (below-fold screenshots + scoring)  |
 51  | **Enrichment** | 3 sites    | Very slow (browse multiple pages per site) |
 52  | **Proposals**  | 10 sites   | Fast AI generation                         |
 53  
 54  ## How It Works
 55  
 56  1. **Idempotent Stages**: Each stage queries `WHERE status = 'current_stage'`
 57  2. **Processes Batch**: Uses `LIMIT N` to process small batch
 58  3. **Updates Status**: Moves processed sites to next stage
 59  4. **Returns Quickly**: Finishes within 1-2 minutes
 60  5. **Next Run**: Picks up where it left off
 61  
 62  **Example SQL from assets stage:**
 63  
 64  ```sql
 65  SELECT id FROM sites
 66  WHERE status = 'found'
 67  AND error_message IS NULL
 68  LIMIT 5  -- Only process 5 sites
 69  ```
 70  
 71  ## Adjusting Batch Sizes
 72  
 73  ### Check Current Batch Size
 74  
 75  ```bash
 76  sqlite3 db/sites.db "SELECT task_key, handler_value FROM cron_jobs WHERE task_key LIKE 'run%';"
 77  ```
 78  
 79  ### Update Batch Size
 80  
 81  ```bash
 82  sqlite3 db/sites.db "UPDATE cron_jobs SET handler_value = 'npm run assets -- --limit 10' WHERE task_key = 'runAssets';"
 83  ```
 84  
 85  Or use the CLI:
 86  
 87  ```bash
 88  # Temporarily disable job
 89  npm run cron:disable runAssets
 90  
 91  # Update via SQL (no CLI command for editing yet)
 92  sqlite3 db/sites.db "UPDATE cron_jobs SET handler_value = 'npm run assets -- --limit 10' WHERE task_key = 'runAssets';"
 93  
 94  # Re-enable
 95  npm run cron:enable runAssets
 96  ```
 97  
 98  ## Batch Size Guidelines
 99  
100  ### Factors to Consider
101  
102  1. **Stage Speed**
103     - Assets: ~20-30s per site (Playwright screenshots)
104     - Scoring: ~5s per site (GPT-4o-mini vision)
105     - Enrichment: ~60s per site (browse multiple pages)
106     - Proposals: ~3s per site (LLM generation)
107  
108  2. **Cron Interval**
109     - Jobs run every **5 minutes**
110     - Each batch should complete in **1-2 minutes max**
111     - Leaves 3-4 minutes for other jobs
112  
113  3. **API Rate Limits**
114     - ZenRows: 20 concurrent (configured in rate-limiter)
115     - OpenRouter: 5 concurrent
116     - Twilio: 10 concurrent
117     - Batch size should respect these limits
118  
119  ### Recommended Ranges
120  
121  | Stage      | Min | Target | Max | Notes                              |
122  | ---------- | --- | ------ | --- | ---------------------------------- |
123  | Assets     | 3   | 5      | 10  | Limited by Playwright speed        |
124  | Scoring    | 5   | 10     | 20  | Limited by OpenRouter concurrency  |
125  | Rescoring  | 3   | 5      | 10  | Slower than scoring                |
126  | Enrichment | 1   | 3      | 5   | Very slow, browsing multiple pages |
127  | Proposals  | 5   | 10     | 20  | Fast LLM generation                |
128  | Outreach   | 10  | 20     | 50  | Fast API calls                     |
129  
130  ### When to Increase Batch Size
131  
132  ✅ **Increase if:**
133  
134  - Stage consistently finishes in <30 seconds
135  - Queue is backing up (100+ sites waiting)
136  - You have higher API rate limits
137  
138  ### When to Decrease Batch Size
139  
140  ❌ **Decrease if:**
141  
142  - Stage takes >2 minutes to complete
143  - Other jobs are delayed (check logs)
144  - Getting API rate limit errors
145  - Memory usage is high
146  
147  ## Monitoring Batch Performance
148  
149  ### Check Execution Times
150  
151  ```bash
152  # View recent logs for a stage
153  npm run cron:logs runAssets
154  
155  # Check average duration
156  sqlite3 db/sites.db "
157  SELECT
158    job_name,
159    AVG((julianday(finished_at) - julianday(started_at)) * 24 * 60) as avg_minutes,
160    COUNT(*) as runs
161  FROM cron_job_logs
162  WHERE job_name LIKE '%Stage%'
163    AND started_at >= datetime('now', '-7 days')
164    AND status = 'success'
165  GROUP BY job_name
166  ORDER BY avg_minutes DESC;
167  "
168  ```
169  
170  ### Check Queue Depth
171  
172  ```bash
173  sqlite3 db/sites.db "
174  SELECT
175    status,
176    COUNT(*) as count
177  FROM sites
178  GROUP BY status
179  ORDER BY count DESC;
180  "
181  ```
182  
183  ### Alert Thresholds
184  
185  - ⚠️ **Warning**: Stage takes >90 seconds
186  - 🚨 **Critical**: Stage takes >2 minutes
187  - 📊 **Queue backing up**: >100 sites at same status
188  
189  ## Alternative: SERP Stage Optimization
190  
191  The SERP stage is different - it doesn't use batches because:
192  
193  1. **Parallel Processing**: Uses `processBatch()` with concurrency
194  2. **Rate Limiter**: Bottleneck library queues excess requests
195  3. **Fast**: Completes 20 keywords in ~2 minutes
196  
197  **Current config:**
198  
199  ```javascript
200  ZENROWS_CONCURRENCY = 20; // Process 20 keywords in parallel
201  ```
202  
203  No batch size needed - it processes ALL pending keywords but limits concurrency.
204  
205  ## Best Practices
206  
207  1. **Start Conservative**: Begin with smaller batches, increase gradually
208  2. **Monitor First Week**: Check logs daily to tune batch sizes
209  3. **Leave Headroom**: Target 60-90 seconds, not 2+ minutes
210  4. **Test Changes**: Manually run stage with new limit before updating cron
211  5. **Document Changes**: Update this file when changing batch sizes
212  6. **Seasonal Adjustment**: Increase during low-traffic hours, decrease during peak
213  
214  ## Testing Batch Sizes
215  
216  ```bash
217  # Test assets stage with different limits
218  time npm run assets -- --limit 3
219  time npm run assets -- --limit 5
220  time npm run assets -- --limit 10
221  
222  # Check how many sites are waiting
223  sqlite3 db/sites.db "SELECT COUNT(*) FROM sites WHERE status = 'found';"
224  
225  # Calculate optimal batch size
226  # (Desired batch time in seconds) / (Avg time per site in seconds) = Batch size
227  # Example: 90s / 25s per site = 3.6 → use batch size of 3
228  ```
229  
230  ## Future Enhancements
231  
232  - [ ] Auto-tuning: Adjust batch size based on average execution time
233  - [ ] Queue depth monitoring: Increase batch if queue backs up
234  - [ ] Time-of-day optimization: Larger batches during off-peak hours
235  - [ ] Per-stage timeout enforcement: Kill jobs that exceed 3 minutes
236  - [ ] Parallel job execution for independent stages (email sync + scoring)
237  
238  ## Summary
239  
240  ✅ **Do:**
241  
242  - Use small batches (3-10 sites per run)
243  - Return control within 1-2 minutes
244  - Let cron orchestrate all jobs equally
245  - Monitor execution times weekly
246  
247  ❌ **Don't:**
248  
249  - Process 100+ sites in one run
250  - Block other jobs for >2 minutes
251  - Forget to adjust for API rate limits
252  - Set batch size without testing first
253  
254  The goal is **steady progress** across all stages, not completing one stage as fast as possible.