---
title: 'Bot Detection'
category: 'architecture'
last_verified: '2026-03-13'
related_files:
  - 'src/utils/stealth-browser.js'
tags: ['bot', 'detection', 'stealth', 'captcha', 'browser']
status: 'current'
---

# Bot Detection Avoidance

All Playwright usage goes through `src/utils/stealth-browser.js` for centralized bot-detection avoidance.

## Core Features

- Random modern user agents (generated via the `user-agents` npm package)
- Bezier curve mouse movements (no teleporting or straight lines)
- Human-like behaviors: realistic scrolling, typing, and clicking with delays
- Smart stealth level detection (aggressive for social media, minimal for prospect sites)
- Configurable timezone matching the IP location (prevents fingerprint inconsistencies)
- Cloudflare/Turnstile challenge detection and waiting
- Chromium launch flags that reduce automation signals (see Browser Flags for Cloudflare/Turnstile)
- `playwright-extra` with the `puppeteer-extra-plugin-stealth` plugin

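To illustrate the Bezier curve mouse movement idea, here is a minimal sketch of how a curved path between two points could be generated. It is not the code in `stealth-browser.js`: the names `cubicBezierPoint` and `mousePath`, the step count, and the control-point offsets are all assumptions chosen for illustration.

```javascript
// Hypothetical sketch; not the actual stealth-browser.js implementation.

// Evaluates a cubic Bezier curve at parameter t in [0, 1].
function cubicBezierPoint(p0, p1, p2, p3, t) {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Builds steps + 1 points from start to end along a randomly bent curve.
// `rand` is injectable so the path can be made deterministic in tests.
function mousePath(start, end, steps = 25, rand = Math.random) {
  const dx = end.x - start.x;
  const dy = end.y - start.y;
  // Places a control point at `fraction` of the way along the straight line,
  // then pushes it sideways (perpendicular) by up to 25% of the travel distance.
  const control = (fraction) => {
    const k = (rand() - 0.5) * 0.5;
    return {
      x: start.x + dx * fraction - dy * k,
      y: start.y + dy * fraction + dx * k,
    };
  };
  const c1 = control(0.3);
  const c2 = control(0.7);
  const path = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezierPoint(start, c1, c2, end, i / steps));
  }
  return path;
}
```

Each point could then be fed to Playwright's `page.mouse.move(x, y)` with a short pause between calls, so the cursor follows the curve instead of jumping to the target.
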
## Usage

```javascript
import {
  launchStealthBrowser,
  createStealthContext,
  humanClick,
  humanType,
  humanScroll,
  randomDelay,
  isSocialMediaUrl,
  waitForCloudflare,
} from './utils/stealth-browser.js';

// Placeholder target; replace with the page you want to visit
const url = 'https://example.com';

// Launch browser with specific stealth level
const browser = await launchStealthBrowser({ stealthLevel: 'minimal' });
const context = await createStealthContext(browser);
const page = await context.newPage();

// Navigate and wait for Cloudflare/Turnstile
await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
await waitForCloudflare(page, { timeout: 30000 });

// Use human-like actions
await humanScroll(page, { distance: 'viewport', smooth: true });
await humanClick(page, 'button.submit');
await humanType(page, 'input[name="email"]', 'test@example.com');
await randomDelay(300, 700);

// Release browser resources when finished
await browser.close();
```

## Stealth Levels

- `minimal` - Basic stealth with minimal delays (for prospect sites such as local businesses)
- `standard` - Full stealth plus human-like behaviors (balanced; the default)
- `aggressive` - Maximum delays and extra caution (for social media scraping)

## Smart Detection

Social media URLs (twitter.com, x.com, linkedin.com, facebook.com, instagram.com) automatically use `aggressive` stealth. Prospect sites use `minimal` stealth for speed.

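The selection rule above could be sketched as follows. This is an illustrative guess at the logic, not the actual `isSocialMediaUrl` implementation: `isSocialHost` and `pickStealthLevel` are hypothetical names, and only the domain list comes from this document. Matching on the parsed hostname rather than the raw string avoids false positives such as `notx.com` or a social domain appearing in a query string.

```javascript
// Hypothetical sketch; the real isSocialMediaUrl in stealth-browser.js may differ.
const SOCIAL_DOMAINS = ['twitter.com', 'x.com', 'linkedin.com', 'facebook.com', 'instagram.com'];

// True when the URL's hostname is a listed domain or one of its subdomains.
function isSocialHost(rawUrl) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return SOCIAL_DOMAINS.some(
    (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
  );
}

// Maps a URL to the stealth level described above.
function pickStealthLevel(rawUrl) {
  return isSocialHost(rawUrl) ? 'aggressive' : 'minimal';
}
```
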
## Configuration (.env)

- `TIMEZONE` - Browser timezone in IANA format; should match the IP location (default: `Australia/Sydney`)
- `ACCEPT_LANGUAGE` - Browser language preferences (default: `en-AU,en;q=0.9`)

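As a rough sketch of how these two variables might be turned into browser context settings: the helper name `resolveLocaleSettings` is hypothetical, and deriving `locale` from the first `Accept-Language` tag is an assumption rather than documented behavior. The returned keys (`timezoneId`, `locale`, `extraHTTPHeaders`) are options that Playwright's `browser.newContext()` accepts.

```javascript
// Hypothetical helper: resolves locale settings from an env-like object,
// falling back to the defaults listed in this document.
function resolveLocaleSettings(env = process.env) {
  const timezoneId = env.TIMEZONE || 'Australia/Sydney';
  const acceptLanguage = env.ACCEPT_LANGUAGE || 'en-AU,en;q=0.9';

  // Intl.DateTimeFormat throws a RangeError for an unknown IANA zone,
  // which catches typos before a browser is launched.
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: timezoneId });
  } catch (err) {
    throw new Error(`Invalid TIMEZONE "${timezoneId}": ${err.message}`);
  }

  // Assumption: use the first language tag (e.g. "en-AU") as the context locale.
  const locale = acceptLanguage.split(',')[0].split(';')[0].trim();

  return {
    timezoneId,
    locale,
    extraHTTPHeaders: { 'Accept-Language': acceptLanguage },
  };
}
```

Failing fast on an invalid zone is a deliberate choice here: a silently ignored `TIMEZONE` would reintroduce the fingerprint mismatch the setting is meant to prevent.
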
## Browser Flags for Cloudflare/Turnstile

The stealth browser launches Chromium with additional flags intended to reduce detection:

- `--disable-blink-features=AutomationControlled`
- `--disable-features=IsolateOrigins,site-per-process`
- `--disable-web-security`
- `--disable-features=BlockInsecurePrivateNetworkRequests`
- `--no-first-run`
- `--start-maximized`

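One detail worth checking: `--disable-features` appears twice in the list. Chromium is generally understood to keep only the last value when a switch is repeated, so assuming that holds, the first entry may be ignored unless the two are combined. The sketch below shows one way to merge them into a single comma-separated flag before passing the array as launch `args`. `mergeListFlags` is a hypothetical helper, not part of `stealth-browser.js`.

```javascript
// Flags as listed in this document.
const STEALTH_FLAGS = [
  '--disable-blink-features=AutomationControlled',
  '--disable-features=IsolateOrigins,site-per-process',
  '--disable-web-security',
  '--disable-features=BlockInsecurePrivateNetworkRequests',
  '--no-first-run',
  '--start-maximized',
];

// Hypothetical helper: merges repeated list-valued switches into one flag,
// keeping the position of the first occurrence and dropping duplicate values.
function mergeListFlags(flags, mergeable = ['--disable-features', '--enable-features']) {
  const values = new Map(); // switch name -> Set of feature names
  const entries = [];
  for (const flag of flags) {
    const eq = flag.indexOf('=');
    const name = eq === -1 ? flag : flag.slice(0, eq);
    if (eq === -1 || !mergeable.includes(name)) {
      entries.push({ literal: flag }); // pass through untouched
      continue;
    }
    if (!values.has(name)) {
      values.set(name, new Set());
      entries.push({ merged: name }); // placeholder, filled in below
    }
    for (const value of flag.slice(eq + 1).split(',')) {
      values.get(name).add(value);
    }
  }
  return entries.map(
    (entry) => entry.literal ?? `${entry.merged}=${[...values.get(entry.merged)].join(',')}`
  );
}
```
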
## CAPTCHA Handling

- **Cloudflare/Turnstile**: `waitForCloudflare(page)` waits up to 30s for challenges to resolve. It detects common blocking indicators and waits for them to clear.
- **NopeCHA extension**: Loaded for form outreach (`src/stages/form.js`) to auto-solve CAPTCHAs on contact forms. The extension is loaded through Chromium's `--load-extension` flag, passed via Playwright's launch arguments.

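The detect-then-wait behavior described for `waitForCloudflare` could look roughly like the polling loop below. This is a sketch under stated assumptions, not the real implementation: the indicator strings, the helper names `looksLikeChallenge` and `waitForChallengeToClear`, and the polling interval are all illustrative, and the actual function may inspect other signals (such as challenge iframes) as well. The only Playwright call used is `page.title()`.

```javascript
// Assumed indicator strings; the real list in stealth-browser.js is not shown here.
const CHALLENGE_TITLES = ['just a moment', 'attention required', 'checking your browser'];

// True when a page title resembles a known challenge or block page.
function looksLikeChallenge(title) {
  const normalized = String(title).toLowerCase();
  return CHALLENGE_TITLES.some((needle) => normalized.includes(needle));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls page.title() until no indicator matches or the timeout elapses.
// Resolves true if the page cleared, false if the timeout was reached.
async function waitForChallengeToClear(page, { timeout = 30000, interval = 500 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (!looksLikeChallenge(await page.title())) return true;
    await sleep(interval);
  }
  return false;
}
```

Returning a boolean instead of throwing lets the caller decide whether a timeout should abort the run or fall back to a retry with a higher stealth level.
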
## Testing Bot Detection

- bot.sannysoft.com - Comprehensive bot detection tests
- arh.antoinevastel.com/bots/areyouheadless - Headless detection
- pixelscan.net - Browser fingerprinting analysis

## Module Usage

- `capture.js` - Minimal stealth (prospect site screenshots)
- `form.js` - Minimal stealth (prospect form submissions)
- `enrich.js` - Smart detection (aggressive for socials, minimal for prospects)
- `x.js` - Aggressive stealth + persistent profiles (social media outreach)
- `linkedin.js` - Aggressive stealth + persistent profiles (social media outreach)

## Best Practices

1. **Always use the stealth browser** for all Playwright operations
2. **Match timezone to IP location** to avoid fingerprint inconsistencies
3. **Wait for Cloudflare** after navigation, before interacting with the page
4. **Use human-like actions** (`humanClick`, `humanType`, `humanScroll`) instead of direct Playwright methods
5. **Add random delays** between actions to mimic human behavior
6. **Test regularly** using bot detection sites to verify stealth effectiveness
7. **Use persistent profiles** for social media to avoid re-login

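For practice 5, a random delay helper can be as small as the sketch below. It is a guess at what a `randomDelay(min, max)` style function might do, not the code in `stealth-browser.js`: the names `randomIntBetween` and `randomDelaySketch` are hypothetical, and a uniform distribution over whole milliseconds is an assumption.

```javascript
// Hypothetical sketch of a randomDelay-style helper; the real one may differ.

// Returns a uniformly distributed integer in [minMs, maxMs], inclusive.
function randomIntBetween(minMs, maxMs) {
  if (!Number.isInteger(minMs) || !Number.isInteger(maxMs) || minMs < 0 || maxMs < minMs) {
    throw new RangeError(`Invalid delay range: ${minMs}..${maxMs}`);
  }
  return Math.floor(minMs + Math.random() * (maxMs - minMs + 1));
}

// Waits for a random duration in the range and resolves with the delay used.
function randomDelaySketch(minMs, maxMs) {
  const ms = randomIntBetween(minMs, maxMs);
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}
```

Resolving with the chosen delay makes it easy to log how long each pause actually took when tuning the ranges.
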
## Troubleshooting

### Bot Detection Failures

If you're getting detected as a bot:

1. Verify that `TIMEZONE` in `.env` matches the IP location
2. Check that the user agent is modern (Chrome 120+)
3. Test on bot detection sites
4. Increase the stealth level (minimal → standard → aggressive)
5. Add longer random delays between actions
6. Use persistent profiles for social media

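Step 2 can be automated with a small check on the user agent string, for example the value returned by `navigator.userAgent` in the page. The helpers below are hypothetical and only parse the `Chrome/<major>` token. Because of the word boundary in the pattern, a `HeadlessChrome/...` token does not match, so a headless user agent is reported as not modern, which is itself a useful warning sign.

```javascript
// Hypothetical helper: extracts the Chrome major version from a UA string.
// Returns null when no standalone "Chrome/<major>." token is present.
function chromeMajorVersion(userAgent) {
  const match = /\bChrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

// True when the UA advertises Chrome at or above the given major version.
function isModernChrome(userAgent, minMajor = 120) {
  const major = chromeMajorVersion(userAgent);
  return major !== null && major >= minMajor;
}
```
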
### Cloudflare/Turnstile Challenges

If challenges aren't resolving:

1. Increase the `waitForCloudflare` timeout (default: 30s)
2. Check that the browser flags are correct
3. Verify that the stealth plugin is loaded
4. Test with a headed browser to see what's happening
5. Check IP reputation (VPN/proxy issues)

### Performance Issues

If automation is too slow:

1. Use the `minimal` stealth level for non-social sites
2. Reduce random delay ranges
3. Skip unnecessary human-like actions for internal tools
4. Use headless mode where possible
5. Optimize page load timeouts