---
title: 'Keyword Rotation Fix'
category: 'archive'
last_verified: '2026-02-15'
tags: ['keyword', 'rotation', 'fix', 'cron', 'scheduling', 'testing', 'database', 'api']
status: 'archived'
---

# Keyword Rotation Fix

## Problem

`npm run serps -- --limit 10` was processing the same 10 keywords every time instead of rotating through the keyword list.

## Root Cause

The SERP stage query ordered keywords by:

```sql
ORDER BY k.priority DESC, k.search_count ASC
```

When many keywords have the same `priority` (e.g., 10) and `search_count` (e.g., 0), the database returns them in ID order, causing the same keywords to be selected repeatedly.

## Solution

Modified the SERP stage query to include `last_searched_at` in the sort order:

```sql
ORDER BY k.priority DESC, k.last_searched_at ASC NULLS FIRST, k.search_count ASC
```

**How it works:**

1. **Priority first** - High-priority keywords processed before low-priority
2. **Oldest search next** - Never-searched (NULL) or least-recently-searched keywords prioritized
3. **Search count last** - Tie-breaker for keywords with same priority and `last_searched_at`

This ensures each `--limit N` batch rotates through different keywords.
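
The three-level ordering above can be sketched as a small in-memory simulation. This is not the project's query code: the `keywords` array, `compareKeywords`, `selectBatch`, and `runBatch` are hypothetical names, and the sketch assumes the SERP stage stamps `last_searched_at` (and bumps `search_count`) after processing a keyword.

```javascript
// Hypothetical in-memory stand-in for the keywords table (not the real schema).
const keywords = Array.from({ length: 12 }, (_, i) => ({
  id: i + 1,
  keyword: `keyword-${i + 1}`,
  priority: 10,
  search_count: 0,
  last_searched_at: null, // null plays the role of SQL NULL (never searched)
}));

// Mirrors: ORDER BY priority DESC, last_searched_at ASC NULLS FIRST, search_count ASC
function compareKeywords(a, b) {
  if (a.priority !== b.priority) return b.priority - a.priority;
  if (a.last_searched_at !== b.last_searched_at) {
    if (a.last_searched_at === null) return -1; // NULLS FIRST
    if (b.last_searched_at === null) return 1;
    return a.last_searched_at - b.last_searched_at;
  }
  return a.search_count - b.search_count;
}

// Sort a copy of the rows and take the first `limit`, like LIMIT N.
function selectBatch(rows, limit) {
  return [...rows].sort(compareKeywords).slice(0, limit);
}

// Assumption: processing a keyword stamps last_searched_at and bumps search_count.
let clock = 0; // simple counter standing in for a timestamp
function runBatch(rows, limit) {
  const batch = selectBatch(rows, limit);
  for (const row of batch) {
    row.last_searched_at = ++clock;
    row.search_count += 1;
  }
  return batch.map((row) => row.id);
}

const first = runBatch(keywords, 5);
const second = runBatch(keywords, 5);
const third = runBatch(keywords, 5);
console.log(first, second, third);
// first:  [1, 2, 3, 4, 5]
// second: [6, 7, 8, 9, 10]
// third:  [11, 12, 1, 2, 3]
```

With 12 keywords at the same priority, the third batch picks up the two never-searched keywords first and then wraps around to the least recently searched ones, which is the rotation the new sort order is meant to produce.
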

## Cron Strategy Recommendations

### Option 1: Small Batches, High Frequency (RECOMMENDED)

**Best for:** Continuous pipeline flow, fast iteration, manageable costs

```bash
# Every 30 minutes, process 5 keywords (50 sites max)
*/30 * * * * cd /path/to/333Method && npm run serps -- --limit 5 >> logs/cron-serps.log 2>&1
```

**Benefits:**

- Rotates through all keywords systematically
- Low ZenRows API usage per run (~50 requests)
- Fast completion (5-10 minutes per batch; up to ~17 minutes if all 50 sites take 20 seconds each)
- Frequent updates to pipeline

**Calculation:**

- 5 keywords × 10 results = 50 sites per batch
- 50 sites × 20 seconds = ~17 minutes per run
- 48 runs/day × 50 sites = 2,400 sites/day max
- Daily ZenRows usage: ~2,400 requests at 48 runs/day, which exceeds the 1,000/day limit; at ~50 requests per run the limit allows at most 20 runs/day (2 runs/day uses only ~100)

### Option 2: Full Daily Run

**Best for:** Simplicity, low maintenance

```bash
# Once daily at 2 AM, process all keywords
0 2 * * * cd /path/to/333Method && npm run serps >> logs/cron-serps.log 2>&1
```

**Benefits:**

- Simple to manage
- Processes all keywords once per day
- No need to tune batch sizes

**Drawbacks:**

- Long runtime (could take hours for large keyword lists)
- Blocks other cron jobs during execution
- Exceeds daily ZenRows quota (1,000 requests/day) if you have >100 keywords

### Option 3: Medium Batches, Moderate Frequency

**Best for:** Balanced approach

```bash
# Every 2 hours, process 10 keywords (100 sites max)
0 */2 * * * cd /path/to/333Method && npm run serps -- --limit 10 >> logs/cron-serps.log 2>&1
```

**Benefits:**

- 12 runs/day × 100 sites = 1,200 sites/day
- Covers more keywords per run than Option 1 (10 vs 5), though fewer per day at these schedules (120 vs 240)
- Still manageable per-run duration (~33 minutes)

**Calculation:**

- 10 keywords × 10 results = 100 sites per batch
- 100 sites × 20 seconds = ~33 minutes per run
- Daily ZenRows usage:
  ~1,200 requests (about 20% over the 1,000/day limit)

## Recommended Approach

**Start with Option 1** (small batches, high frequency):

```bash
# Add to crontab
crontab -e

# Add this line
*/30 * * * * cd /home/jason/SyncThing.Code/333Method && npm run serps -- --limit 5 >> logs/cron-serps.log 2>&1
```

**Monitor and adjust:**

1. Check `npm run cron:logs serps` after a few runs
2. Verify keywords are rotating (different keywords each run)
3. Monitor ZenRows usage in dashboard
4. Increase `--limit` or frequency if you have unused quota
5. Decrease if you hit rate limits

## Removing the Keywords Stage

The `keywords` stage (`npm run keywords`) is now **obsolete** as a processing stage - it just lists keywords without processing them. Consider:

1. **Keep it as a management tool** - Useful for viewing keyword status:

   ```bash
   npm run keywords stats # View keyword statistics
   ```

2. **Remove `--limit` parameter** - It doesn't make sense for a non-processing stage

3. **Update documentation** - Clarify that SERP stage does the actual work

## Migration Path

**Immediate changes:**

1. ✅ SERP stage now rotates keywords properly
2. Update cron jobs to use `npm run serps -- --limit N` instead of `npm run keywords`
3. Remove any bare `npm run keywords` cron jobs (they only list keywords; the separate `npm run keywords generate` job is unaffected)

**Future cleanup:**

1. Simplify keywords stage to pure management (add/list/update)
2. Update README.md to clarify stage purposes
3.
   Consider renaming to `npm run keywords:manage` to avoid confusion

## Testing the Fix

**Verify rotation works:**

```bash
# Run first batch
npm run serps -- --limit 5

# Check which keywords were processed
sqlite3 db/sites.db "SELECT keyword, last_searched_at FROM keywords WHERE last_searched_at IS NOT NULL ORDER BY last_searched_at DESC LIMIT 5"

# Run second batch
npm run serps -- --limit 5

# Verify different keywords were processed
sqlite3 db/sites.db "SELECT keyword, last_searched_at FROM keywords WHERE last_searched_at IS NOT NULL ORDER BY last_searched_at DESC LIMIT 10"
```

You should see 10 different keywords in the output.

## Keyword Generation Cron Job

A separate cron job handles **keyword generation** (creating keyword combinations from business + region CSV files):

**Job Details:**

- **Name:** Generate Keyword Combinations
- **Key:** `generateKeywords`
- **Schedule:** Daily (1 day interval)
- **Handler:** `npm run keywords generate`
- **Runtime:** ~1-2 minutes for 166K keywords across 25 countries
- **Type:** Idempotent (safe to run repeatedly - skips duplicates via `upsertKeyword`)

**View job status:**

```bash
npm run cron:list | grep "Generate Keyword"
npm run cron:logs generateKeywords
```

**When it runs:**

- Daily regeneration of all keyword combinations
- Catches any new keyword files added to `data/{country}/`
- Updates existing keywords with new priorities or search volumes

**Note:** This is separate from SERP scraping. Keyword generation creates the combinations in the database, while SERP scraping (`npm run serps`) processes those keywords to find sites.
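
The idempotent behaviour listed in the job details above can be illustrated with a minimal in-memory sketch. It is not the project's `upsertKeyword`: the function `upsertKeywordSketch`, the `table` Map, and the sample business and region lists are all hypothetical, and the real job reads its inputs from CSV files and writes to the database.

```javascript
// Hypothetical inputs; the real job reads business and region CSV files.
const businesses = ["plumber", "electrician"];
const regions = ["sydney", "melbourne", "perth"];

// In-memory stand-in for the keywords table, keyed by keyword text.
const table = new Map();

// Hypothetical upsert: insert when missing, otherwise update in place.
function upsertKeywordSketch(keyword, priority) {
  const existing = table.get(keyword);
  if (existing) {
    existing.priority = priority; // refresh metadata, keep search history
    return "updated";
  }
  table.set(keyword, { keyword, priority, search_count: 0, last_searched_at: null });
  return "inserted";
}

// Build every business + region combination and upsert each one.
function generateKeywords(priority) {
  const counts = { inserted: 0, updated: 0 };
  for (const business of businesses) {
    for (const region of regions) {
      counts[upsertKeywordSketch(`${business} ${region}`, priority)] += 1;
    }
  }
  return counts;
}

const firstRun = generateKeywords(10);
const secondRun = generateKeywords(10);
console.log(firstRun, secondRun, table.size);
// firstRun:  { inserted: 6, updated: 0 }
// secondRun: { inserted: 0, updated: 6 }
// table.size: 6
```

Running the generator a second time adds no rows and leaves `search_count` and `last_searched_at` untouched, which is why a daily regeneration would not disturb the rotation order used by the SERP stage.
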