bot-detection.md
---
title: 'Bot Detection'
category: 'architecture'
last_verified: '2026-03-13'
related_files:
  - 'src/utils/stealth-browser.js'
tags: ['bot', 'detection', 'stealth', 'captcha', 'browser']
status: 'current'
---

# Bot Detection Avoidance

All Playwright usage goes through `src/utils/stealth-browser.js` for centralized bot-detection avoidance.

## Core Features

- Random modern user agents (generated via `user-agents` npm package)
- Bezier curve mouse movements (no teleporting or straight lines)
- Human-like behaviors: realistic scrolling, typing, clicking with delays
- Smart stealth level detection (aggressive for social media, minimal for prospect sites)
- Configurable timezone matching IP location (prevents fingerprint inconsistencies)
- Cloudflare/Turnstile challenge detection and waiting
- Enhanced browser flags to avoid detection
- `playwright-extra` with `puppeteer-extra-plugin-stealth` plugin

## Usage

```javascript
import {
  launchStealthBrowser,
  createStealthContext,
  humanClick,
  humanType,
  humanScroll,
  randomDelay,
  isSocialMediaUrl,
  waitForCloudflare,
} from './utils/stealth-browser.js';

// Placeholder target URL for this example
const url = 'https://example.com';

// Launch browser with specific stealth level
const browser = await launchStealthBrowser({ stealthLevel: 'minimal' });
const context = await createStealthContext(browser);
const page = await context.newPage();

// Navigate and wait for Cloudflare/Turnstile
await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
await waitForCloudflare(page, { timeout: 30000 });

// Use human-like actions
await humanScroll(page, { distance: 'viewport', smooth: true });
await humanClick(page, 'button.submit');
await humanType(page, 'input[name="email"]', 'test@example.com');
await randomDelay(300, 700);
```

## Stealth Levels

- `minimal` - Basic stealth, minimal delays (for prospect sites like local businesses)
- `standard` - Full stealth + human behaviors (balanced, default)
- `aggressive` - Maximum delays + extra caution (for social media scraping)

## Smart Detection

Social media URLs (twitter.com, x.com, linkedin.com, facebook.com, instagram.com) automatically use aggressive stealth. Prospect sites use minimal stealth for speed.

## Configuration (.env)

- `TIMEZONE` - Browser timezone (IANA format, should match IP location, default: Australia/Sydney)
- `ACCEPT_LANGUAGE` - Browser language preferences (default: en-AU,en;q=0.9)

## Browser Flags for Cloudflare/Turnstile

The stealth browser includes enhanced flags to bypass detection:

- `--disable-blink-features=AutomationControlled`
- `--disable-features=IsolateOrigins,site-per-process`
- `--disable-web-security`
- `--disable-features=BlockInsecurePrivateNetworkRequests`
- `--no-first-run`
- `--start-maximized`

## CAPTCHA Handling

- **Cloudflare/Turnstile**: `waitForCloudflare(page)` waits up to 30s for challenges to resolve. It detects common blocking indicators and waits for them to clear.
- **NopeCHA extension**: Loaded for form outreach (`src/stages/form.js`) to auto-solve CAPTCHAs on contact forms. The extension is loaded via Chromium's `--load-extension` launch flag, passed through Playwright.

## Testing Bot Detection

- bot.sannysoft.com - Comprehensive bot detection tests
- arh.antoinevastel.com/bots/areyouheadless - Headless detection
- pixelscan.net - Browser fingerprinting analysis

## Module Usage

- `capture.js` - Minimal stealth (prospect site screenshots)
- `form.js` - Minimal stealth (prospect form submissions)
- `enrich.js` - Smart detection (aggressive for socials, minimal for prospects)
- `x.js` - Aggressive stealth + persistent profiles (social media outreach)
- `linkedin.js` - Aggressive stealth + persistent profiles (social media outreach)

## Best Practices

1. **Always use stealth browser** for all Playwright operations
2. **Match timezone to IP location** to avoid fingerprint inconsistencies
3. **Wait for Cloudflare** after navigation before interacting with the page
4. **Use human-like actions** (`humanClick`, `humanType`, `humanScroll`) instead of direct Playwright methods
5. **Add random delays** between actions to mimic human behavior
6. **Test regularly** using bot detection sites to verify stealth effectiveness
7. **Use persistent profiles** for social media to avoid re-login

## Troubleshooting

### Bot Detection Failures

If you're getting detected as a bot:

1. Verify timezone matches IP location in `.env`
2. Check user agent is modern (Chrome 120+)
3. Test on bot detection sites
4. Increase stealth level (minimal → standard → aggressive)
5. Add longer random delays between actions
6. Use persistent profiles for social media

### Cloudflare/Turnstile Challenges

If challenges aren't resolving:

1. Increase `waitForCloudflare` timeout (default: 30s)
2. Check browser flags are correct
3. Verify stealth plugin is loaded
4. Test with headed browser to see what's happening
5. Check IP reputation (VPN/proxy issues)

### Performance Issues

If automation is too slow:

1. Use `minimal` stealth level for non-social sites
2. Reduce random delay ranges
3. Skip unnecessary human-like actions for internal tools
4. Use headless mode where possible
5. Optimize page load timeouts
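
## Smart Detection Sketch

The smart detection rule described above (aggressive stealth for social media URLs, minimal for prospect sites) could be implemented roughly as follows. This is a minimal sketch and not the actual code in `src/utils/stealth-browser.js`: the names `SOCIAL_MEDIA_DOMAINS`, `isSocialMediaUrlSketch`, and `pickStealthLevel` are hypothetical, and the hostname-suffix matching is an assumption about how the domain check might work.

```javascript
// Hypothetical sketch only; the real isSocialMediaUrl in
// src/utils/stealth-browser.js may be implemented differently.

// Domains this document lists as social media.
const SOCIAL_MEDIA_DOMAINS = [
  'twitter.com',
  'x.com',
  'linkedin.com',
  'facebook.com',
  'instagram.com',
];

// Returns true when the URL's hostname is a listed domain or a subdomain of one.
function isSocialMediaUrlSketch(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  // Match on the full hostname or a dot-separated suffix, so that
  // "netflix.com" is not mistaken for "x.com".
  return SOCIAL_MEDIA_DOMAINS.some(
    (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
  );
}

// Maps a URL to one of the documented stealth levels.
function pickStealthLevel(url) {
  return isSocialMediaUrlSketch(url) ? 'aggressive' : 'minimal';
}
```

Matching on the parsed hostname rather than a substring of the whole URL avoids false positives from paths or query strings that merely mention a social media domain. The result of `pickStealthLevel(url)` would be passed as the `stealthLevel` option shown in the Usage section.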