---
title: 'Keyword Rotation Fix'
category: 'archive'
last_verified: '2026-02-15'
tags: ['keyword', 'rotation', 'fix', 'cron', 'scheduling', 'testing', 'database', 'api']
status: 'archived'
---

# Keyword Rotation Fix

## Problem

`npm run serps -- --limit 10` was processing the same 10 keywords every time instead of rotating through the keyword list.

## Root Cause

The SERP stage query ordered keywords by:

```sql
ORDER BY k.priority DESC, k.search_count ASC
```

When many keywords share the same `priority` (e.g., 10) and `search_count` (e.g., 0), the sort cannot tell them apart, so the database returns them in the same order on every run (in practice, ID order). Each `--limit N` run therefore selected the same first N keywords.

## Solution

Modified the SERP stage query to include `last_searched_at` in the sort order:

```sql
ORDER BY k.priority DESC, k.last_searched_at ASC NULLS FIRST, k.search_count ASC
```

**How it works:**

1. **Priority first** - High-priority keywords are processed before low-priority ones
2. **Oldest search next** - Never-searched (NULL) keywords come first, then the least recently searched
3. **Search count last** - Tie-breaker for keywords with the same `priority` and `last_searched_at`

Once a keyword has been searched, its newer `last_searched_at` moves it to the back of its priority group, so each `--limit N` batch picks up different keywords.
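
The rotation can be reproduced in isolation. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory table. The reduced `keywords` schema, the generated sample keywords, and the `make_db` and `run_batch` helpers are assumptions for illustration, not the project's SERP stage code. It also assumes the stage stamps `last_searched_at` after processing a keyword and leaves `search_count` alone. `NULLS FIRST` requires SQLite 3.30 or newer.

```python
import sqlite3

# Sort orders copied from this document.
OLD_ORDER = "k.priority DESC, k.search_count ASC"
NEW_ORDER = "k.priority DESC, k.last_searched_at ASC NULLS FIRST, k.search_count ASC"


def make_db(n_keywords=12):
    """Create an in-memory table where every keyword ties on priority and search_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keywords ("
        " id INTEGER PRIMARY KEY, keyword TEXT, priority INTEGER,"
        " search_count INTEGER, last_searched_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO keywords (keyword, priority, search_count) VALUES (?, 10, 0)",
        [(f"kw{i:02d}",) for i in range(1, n_keywords + 1)],
    )
    return conn


def run_batch(conn, order_by, limit, tick):
    """Select one batch, then stamp last_searched_at with a logical clock value."""
    # order_by is interpolated only because it is one of the constants above.
    rows = conn.execute(
        f"SELECT k.id, k.keyword FROM keywords k ORDER BY {order_by} LIMIT ?",
        (limit,),
    ).fetchall()
    conn.executemany(
        "UPDATE keywords SET last_searched_at = ? WHERE id = ?",
        [(tick, row[0]) for row in rows],
    )
    return [row[1] for row in rows]


conn = make_db()
first = run_batch(conn, NEW_ORDER, limit=5, tick=1)
second = run_batch(conn, NEW_ORDER, limit=5, tick=2)
print("first batch: ", first)
print("second batch:", second)
print("overlap:", set(first) & set(second))  # empty: no keyword is repeated
```

Swapping `NEW_ORDER` for `OLD_ORDER` in the two `run_batch` calls shows the original behaviour: nothing in the sort key changes between runs, so the tied rows are typically returned in the same order and the batches overlap.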

## Cron Strategy Recommendations

### Option 1: Small Batches, High Frequency (RECOMMENDED)

**Best for:** Continuous pipeline flow, fast iteration, manageable costs

```bash
# Every 30 minutes, process 5 keywords (50 sites max)
*/30 * * * * cd /path/to/333Method && npm run serps -- --limit 5 >> logs/cron-serps.log 2>&1
```

**Benefits:**

- Rotates through all keywords systematically
- Low ZenRows API usage per run (~50 requests)
- Short runs (~17 minutes per batch at 20 seconds per site)
- Frequent updates to pipeline

**Calculation:**

- 5 keywords × 10 results = 50 sites per batch
- 50 sites × 20 seconds = 1,000 seconds ≈ 17 minutes per run
- 48 runs/day × 50 sites = 2,400 sites/day max
- Daily ZenRows usage: ~2,400 requests at 48 runs/day, which exceeds the 1,000/day limit; at 50 requests per run, at most 20 runs/day (1,000 ÷ 50) fit within it
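
The arithmetic above is easy to re-run for other batch sizes and frequencies. Below is a small sketch of that calculation; the `batch_budget` helper and `BatchBudget` type are hypothetical names, and the constants (10 results per keyword, 20 seconds per site, 1,000 requests/day, one request per site) are taken from the figures in this document rather than measured.

```python
from dataclasses import dataclass

RESULTS_PER_KEYWORD = 10  # sites returned per keyword, per this document
SECONDS_PER_SITE = 20     # processing time per site, per this document
DAILY_QUOTA = 1_000       # ZenRows requests/day, per this document


@dataclass
class BatchBudget:
    sites_per_run: int
    minutes_per_run: float
    requests_per_day: int
    within_quota: bool


def batch_budget(limit, runs_per_day):
    """Estimate per-run and per-day load, assuming one ZenRows request per site."""
    sites = limit * RESULTS_PER_KEYWORD
    requests = sites * runs_per_day
    return BatchBudget(
        sites_per_run=sites,
        minutes_per_run=sites * SECONDS_PER_SITE / 60,
        requests_per_day=requests,
        within_quota=requests <= DAILY_QUOTA,
    )


# Option 1: --limit 5 every 30 minutes (48 runs/day).
print(batch_budget(limit=5, runs_per_day=48))
```

Calling `batch_budget(limit=10, runs_per_day=12)` gives the corresponding figures for a 2-hour schedule with `--limit 10`.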

### Option 2: Full Daily Run

**Best for:** Simplicity, low maintenance

```bash
# Once daily at 2 AM, process all keywords
0 2 * * * cd /path/to/333Method && npm run serps >> logs/cron-serps.log 2>&1
```

**Benefits:**

- Simple to manage
- Processes all keywords once per day
- No need to tune batch sizes

**Drawbacks:**

- Long runtime (could take hours for large keyword lists)
- Blocks other cron jobs during execution
- Exceeds the daily ZenRows quota (1,000 requests/day) if you have more than 100 keywords (100 keywords × 10 results = 1,000 requests)

### Option 3: Medium Batches, Moderate Frequency

**Best for:** Balanced approach

```bash
# Every 2 hours, process 10 keywords (100 sites max)
0 */2 * * * cd /path/to/333Method && npm run serps -- --limit 10 >> logs/cron-serps.log 2>&1
```

**Benefits:**

- 12 runs/day × 100 sites = 1,200 sites/day
- Covers 120 keywords/day (12 runs × 10 keywords); note that a full rotation is slower than with Option 1, which covers 240 keywords/day
- Still manageable per-run duration (~33 minutes)

**Calculation:**

- 10 keywords × 10 results = 100 sites per batch
- 100 sites × 20 seconds = 2,000 seconds ≈ 33 minutes per run
- Daily ZenRows usage: ~1,200 requests (about 20% over the 1,000/day limit)

## Recommended Approach

**Start with Option 1** (small batches, high frequency):

```bash
# Add to crontab
crontab -e

# Add this line
*/30 * * * * cd /home/jason/SyncThing.Code/333Method && npm run serps -- --limit 5 >> logs/cron-serps.log 2>&1
```

**Monitor and adjust:**

1. Check `npm run cron:logs serps` after a few runs
2. Verify keywords are rotating (different keywords each run)
3. Monitor ZenRows usage in the dashboard
4. Increase `--limit` or frequency if you have unused quota
5. Decrease `--limit` or frequency if you hit rate limits or exceed the daily quota
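
When adjusting, it can help to work backwards from the quota to a schedule. The sketch below does that under the same assumptions as the calculations in this document (10 results per keyword, one ZenRows request per site, 1,000 requests/day); `max_runs_per_day` and `min_interval_minutes` are hypothetical helper names.

```python
import math

RESULTS_PER_KEYWORD = 10  # sites returned per keyword, per this document
DAILY_QUOTA = 1_000       # ZenRows requests/day, per this document


def max_runs_per_day(limit, quota=DAILY_QUOTA):
    """Largest number of daily runs that keeps total requests within the quota."""
    return quota // (limit * RESULTS_PER_KEYWORD)


def min_interval_minutes(limit, quota=DAILY_QUOTA):
    """Shortest whole-minute spacing between runs that stays within the quota."""
    runs = max_runs_per_day(limit, quota)
    if runs == 0:
        raise ValueError("a single run already exceeds the daily quota")
    return math.ceil(24 * 60 / runs)


for limit in (5, 10):
    print(limit, max_runs_per_day(limit), min_interval_minutes(limit))
```

Under these assumptions, `--limit 5` fits 20 runs/day (one every 72 minutes) and `--limit 10` fits 10 runs/day (one every 144 minutes). Cron cannot express a 72-minute step directly, so rounding up to a 2-hour schedule such as `0 */2 * * *` is one way to stay under the limit.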

## Removing the Keywords Stage

The `keywords` stage (`npm run keywords`) is now **obsolete** as a pipeline stage: it only lists keywords without processing them. Consider:

1. **Keep it as a management tool** - Useful for viewing keyword status:

   ```bash
   npm run keywords stats  # View keyword statistics
   ```

2. **Remove the `--limit` parameter** - It doesn't make sense for a non-processing stage

3. **Update documentation** - Clarify that the SERP stage does the actual work

## Migration Path

**Immediate changes:**

1. ✅ SERP stage now rotates keywords properly
2. Update cron jobs to use `npm run serps -- --limit N` instead of `npm run keywords`
3. Remove any cron jobs that run `npm run keywords` as a processing step (it only lists keywords); the `npm run keywords generate` job described under Keyword Generation Cron Job is a different command and is not affected

**Future cleanup:**

1. Simplify the keywords stage to pure management (add/list/update)
2. Update README.md to clarify stage purposes
3. Consider renaming to `npm run keywords:manage` to avoid confusion

## Testing the Fix

**Verify rotation works:**

```bash
# Run first batch
npm run serps -- --limit 5

# Check which keywords were processed
sqlite3 db/sites.db "SELECT keyword, last_searched_at FROM keywords WHERE last_searched_at IS NOT NULL ORDER BY last_searched_at DESC LIMIT 5"

# Run second batch
npm run serps -- --limit 5

# Verify different keywords were processed
sqlite3 db/sites.db "SELECT keyword, last_searched_at FROM keywords WHERE last_searched_at IS NOT NULL ORDER BY last_searched_at DESC LIMIT 10"
```

The second query should list 10 different keywords: the 5 from the first batch plus 5 new ones from the second.
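
The same check can be scripted by taking a snapshot after each run and comparing the two. This is a sketch only: `latest_batch` and `batches_rotated` are hypothetical helpers, the query mirrors the `sqlite3` commands above, and the demo uses an in-memory table with made-up keywords and timestamps in place of `db/sites.db`.

```python
import sqlite3


def latest_batch(conn, limit):
    """Return the set of `limit` keywords with the newest last_searched_at."""
    rows = conn.execute(
        "SELECT keyword FROM keywords"
        " WHERE last_searched_at IS NOT NULL"
        " ORDER BY last_searched_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return {row[0] for row in rows}


def batches_rotated(before, after):
    """True when the second batch is non-empty and shares no keyword with the first."""
    return bool(after) and before.isdisjoint(after)


# Demo on an in-memory table standing in for db/sites.db (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (keyword TEXT, last_searched_at TEXT)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [("plumber sydney", "2026-02-15 10:00:00"), ("roofer perth", "2026-02-15 10:00:20")],
)
first = latest_batch(conn, 2)   # snapshot after the first run
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [("dentist leeds", "2026-02-15 10:30:00"), ("florist austin", "2026-02-15 10:30:20")],
)
second = latest_batch(conn, 2)  # snapshot after the second run
print(batches_rotated(first, second))
```

Against the real database, the connection would be `sqlite3.connect("db/sites.db")`, with `latest_batch(conn, 5)` called once after each `npm run serps -- --limit 5`.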

## Keyword Generation Cron Job

A separate cron job handles **keyword generation** (creating keyword combinations from business + region CSV files):

**Job Details:**

- **Name:** Generate Keyword Combinations
- **Key:** `generateKeywords`
- **Schedule:** Daily (1 day interval)
- **Handler:** `npm run keywords generate`
- **Runtime:** ~1-2 minutes for 166K keywords across 25 countries
- **Type:** Idempotent (safe to run repeatedly - skips duplicates via `upsertKeyword`)

**View job status:**

```bash
npm run cron:list | grep "Generate Keyword"
npm run cron:logs generateKeywords
```

**When it runs:**

- Daily regeneration of all keyword combinations
- Catches any new keyword files added to `data/{country}/`
- Updates existing keywords with new priorities or search volumes
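
To illustrate why repeated runs are safe, here is a minimal sketch of an idempotent upsert in SQLite. It is not the project's `upsertKeyword` implementation: the `upsert_keyword` function, the reduced schema, and the sample keyword are assumptions, and the `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3


def upsert_keyword(conn, keyword, priority):
    """Insert a keyword, or refresh its priority if the keyword already exists."""
    conn.execute(
        "INSERT INTO keywords (keyword, priority) VALUES (?, ?)"
        " ON CONFLICT(keyword) DO UPDATE SET priority = excluded.priority",
        (keyword, priority),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords ("
    " keyword TEXT PRIMARY KEY, priority INTEGER,"
    " search_count INTEGER DEFAULT 0, last_searched_at TEXT)"
)

# Running the same generation step repeatedly leaves a single row.
for _ in range(3):
    upsert_keyword(conn, "plumber sydney", 10)

# A later run with a changed priority updates the row in place.
upsert_keyword(conn, "plumber sydney", 8)

print(conn.execute("SELECT keyword, priority, search_count FROM keywords").fetchall())
```

Because the conflict branch only touches `priority`, existing `search_count` and `last_searched_at` values survive regeneration, which keeps the rotation order intact in this sketch.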

**Note:** This is separate from SERP scraping. Keyword generation creates the combinations in the database, while SERP scraping (`npm run serps`) processes those keywords to find sites.