# 333 Method Automation

Automated SERP→outreach pipeline for local business sites. Scrapes search results, scores conversion potential, generates personalized proposals, and sends multi-channel outreach (SMS, Email, Contact Forms).

## Quick Start

```bash
# 1. Set up environment
nix-shell # Or ensure Node.js 20+ and dependencies are installed

# 2. Configure API keys in .env
cp .env.example .env

# 3. Initialize database
npm run init-db

# 4. Add a keyword and run the full pipeline
npm run keywords add "plumber seattle" 8
npm run all -- --limit 10

# Or use the POC pipeline for testing
npm run poc "plumber seattle" 10
```

## Dashboard

View real-time analytics and pipeline health at http://localhost:8501

### Start Dashboard

```bash
# First time setup
npm run dashboard:install

# Start dashboard
npm run dashboard

# Development mode (auto-reload on changes)
npm run dashboard:dev
```

### Features

- **Pipeline Health**: Error analysis, stuck sites, throughput tracking
- **Outreach Effectiveness**: Response rates, sales tracking, LLM cost analysis
- **Conversations**: Reply sentiment analysis and action items
- **Compliance**: Rate limit tracking for Resend, Twilio, and other platforms
- **System Health**: Cron jobs, code coverage, IP burnout detection, E2E tests

### Dashboard Logging

All dashboard errors and logs are automatically captured to `logs/dashboard-YYYY-MM-DD.log`:

```bash
# View dashboard logs
tail -f logs/dashboard-2026-02-08.log

# Check for errors
grep ERROR logs/dashboard-2026-02-08.log
```

**Error Logging in Streamlit Apps:**

The dashboard uses enhanced Python logging that captures:

- Streamlit internal errors and warnings
- Python exceptions and tracebacks
- Application-level logging (via Python's `logging` module)
- All stderr output

To add logging to custom Streamlit pages:

```python
import streamlit as st

from dashboard.utils.logging_config import configure_app_logging, log_exception

logger = configure_app_logging("my-page")

try:
    # Your code
    result = risky_operation()
    logger.info(f"Operation succeeded: {result}")
except Exception as e:
    log_exception(logger, f"Operation failed: {e}")
    st.error("An error occurred - check logs for details")
```

See [docs/DASHBOARD.md](docs/DASHBOARD.md) for a detailed usage guide.

## Reports

### Daily Progress Report

Generate automated daily progress reports for stakeholder tracking:

```bash
npm run report:daily
```

This generates a professional HTML report at `reports/daily-progress-YYYY-MM-DD.html` containing:

- **Git Activity**: Commits, files changed, lines added/removed
- **Database Changes**: New sites, outreaches, conversations by status
- **Code Quality**: Test coverage changes, new TODOs
- **System Health**: Recent errors from logs
- **Executive Summary**: Non-technical summary of what shipped

**Report Sections:**

1. **What Shipped** - Features, fixes, and improvements in the last 24 hours
2. **System Metrics** - Database stats, pipeline throughput, quality gates
3. **Recent Commits** - Git activity with commit messages
4. **System Health** - Error summary and known issues
5. **Next Steps** - Recommended actions and blockers

Well suited to CEO updates, investor reporting, and progress documentation.

**Viewing the Report:**

- Open directly in a browser: `file:///path/to/reports/daily-progress-YYYY-MM-DD.html`
- Or use: `xdg-open reports/daily-progress-$(date +%Y-%m-%d).html` (Linux)
- Print-friendly CSS is included for PDF export via browser print

## Multi-Country Support

The 333 Method supports **25 countries** with proper localization for Google searches, proposals, and contact validation.

### Supported Countries

**Top 25 Economies by GDP**: US, JP, DE, UK, FR, IT, CA, AU, ES, NL, KR, CH, SE, NO, AT, DK, BE, IE, SG, NZ, PL, IN, MX, ID, CN

### Features

Each country includes:

- **Google Domain**: Country-specific search domains (e.g., `google.com.au`, `google.co.uk`)
- **Geo-targeting**: ZenRows premium proxy support for accurate local results
- **Currency & Symbols**: Automatic currency formatting (£, $, €, ¥, etc.)
- **Date Formats**: Localized date formats (DD/MM/YYYY, MM/DD/YYYY, etc.)
- **Phone Validation**: Country-specific mobile number patterns for SMS prioritization
- **Localization**: AI proposals use local spelling, cultural norms, and business expectations
- **GDPR Compliance**: EU/UK countries include company verification to avoid emailing sole traders

### Configuration

Configure geo-targeting in `.env`:

```bash
ZENROWS_PREMIUM=true # Required for geo-targeting (premium plan only)
```

### Usage

Generate keywords for all countries or a specific country:

```bash
# Generate keywords for ALL 25 countries (adds missing keywords)
npm run keywords generate -- --limit 100

# Generate keywords for a specific country only
npm run keywords generate -- --country UK --limit 100

# Generate keywords for US only
npm run keywords generate -- --country US --limit 50
```

Or manually insert country-specific keywords:

```sql
-- Australian plumber
INSERT INTO keywords (keyword, country_code, google_domain)
VALUES ('plumber sydney', 'AU', 'google.com.au');

-- US plumber
INSERT INTO keywords (keyword, country_code, google_domain)
VALUES ('plumber seattle', 'US', 'google.com');
```

Proposals are automatically localized based on site country (spelling, currency, cultural norms).

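As a minimal sketch, the currency, date, and phone-pattern localization described in this section could be built on Node's built-in `Intl` APIs. The country table below covers a small illustrative subset of the 25 countries and is an assumption for this example, not the project's actual configuration:

```javascript
// Sketch: per-country localization helpers (illustrative subset, not the real config).
const COUNTRIES = {
  US: { locale: 'en-US', currency: 'USD', mobilePrefixes: [] },
  UK: { locale: 'en-GB', currency: 'GBP', mobilePrefixes: ['07'] },
  AU: { locale: 'en-AU', currency: 'AUD', mobilePrefixes: ['04'] },
  DE: { locale: 'de-DE', currency: 'EUR', mobilePrefixes: ['015', '016', '017'] },
};

// Currency formatting with the right symbol and grouping per locale.
function formatPrice(countryCode, amount) {
  const { locale, currency } = COUNTRIES[countryCode];
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

// Localized date order (DD/MM/YYYY vs MM/DD/YYYY) comes for free from the locale.
function formatDate(countryCode, date) {
  return new Intl.DateTimeFormat(COUNTRIES[countryCode].locale).format(date);
}

// Mobile numbers are preferred for SMS; check the national prefix after
// stripping separators.
function isLikelyMobile(countryCode, phone) {
  const digits = phone.replace(/[\s\-()]/g, '');
  return COUNTRIES[countryCode].mobilePrefixes.some((p) => digits.startsWith(p));
}

console.log(formatPrice('UK', 1500)); // "£1,500.00"
console.log(formatDate('DE', new Date(2026, 1, 8))); // day-first, e.g. "8.2.2026"
console.log(isLikelyMobile('AU', '0412 345 678')); // true
```

Real validation needs full per-country numbering rules; prefix checks only approximate the mobile-vs-landline split.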
**Phone Number Validation**: The system prioritizes mobile numbers based on country patterns (AU: 04xx, UK: 07xx, DE: 015x/016x/017x, etc.).

**GDPR Compliance**: EU/UK countries filter free emails and verify company registration to prevent sending to sole traders.

See `docs/MULTI-COUNTRY-PLAN.md` for implementation details.

## Keyword Validation

Validate and expand keyword lists using real search volume data from Google via the DataForSEO Labs API.

### Setup

1. Sign up at https://dataforseo.com/ (pay-as-you-go)
2. Add credentials to `.env`:

```bash
DATAFORSEO_LOGIN=your_email@example.com
DATAFORSEO_PASSWORD=your_api_password
KEYWORD_EXPANSION_LIMIT=50 # Max related keywords per seed
```

### How It Works

The system uses DataForSEO's Labs API with three endpoints:

1. **Top Searches** (`/top_searches/live`) - Get 500-1000 popular local keywords per country in ONE call
2. **Keyword Suggestions** (`/keyword_suggestions/live`) - Expand seed keywords with related terms
3. **Keyword Overview** (`/keyword_overview/live`) - Batch ALL keywords in ONE request to get search volumes

This approach is 76% cheaper than the old Keywords Data API ($51 vs $212 for all countries).

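The batch-everything-in-one-request idea above can be sketched generically. The batch size and the `postOverview` client below are assumptions for illustration, not the actual DataForSEO client:

```javascript
// Sketch: group all expanded keywords into large batches so search volumes
// come back in a handful of requests instead of one request per keyword.
function toBatches(keywords, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < keywords.length; i += batchSize) {
    batches.push(keywords.slice(i, i + batchSize));
  }
  return batches;
}

// One API call per batch rather than per keyword keeps costs roughly flat:
// const results = await Promise.all(toBatches(allKeywords).map(postOverview));
// (postOverview is a hypothetical API client function.)
```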
### Workflow

**Generate Search Volume Data** (one-time API cost)

```bash
# Generate CSV with ALL keywords and search volumes for all countries
npm run keywords generate-csv -- --type businesses
npm run keywords generate-csv -- --type regions

# Or for a specific country only
npm run keywords generate-csv -- --type businesses --country AU
```

This creates CSV files like `data/au/businesses-search-volume.csv` containing:

- Keyword text
- Monthly search volume (national)
- Competition level (0-100)
- CPC range (cost-per-click for ads)
- Related seed keyword

**Current Approach: Qualitative Filtering Only**

We don't use search volume cutoffs. Instead, keywords are filtered by:

- ✅ Place-specific removal (for businesses)
- ✅ Deterministic filters (jobs, education, products)
- ✅ Brand cleanup (LLM-based)
- ✅ Word count limits (≤3 words for businesses)
- ✅ Region junk removal (LLM-based)

This qualitative approach is more effective than arbitrary search volume thresholds. Long-tail keywords can still convert well for local services.

### Example Results

Test data for "plumber" in Australia:

- **plumber**: 49,500 searches/month, 46% competition, $25.07 CPC
- **plumbers near me**: 49,500 searches/month, 62% competition, $28.16 CPC
- **emergency plumber**: 6,600 searches/month, 76% competition, $36.58 CPC
- **electrician**: 40,500 searches/month, 39% competition, $14.39 CPC

### Cost Breakdown

**IMPORTANT:** Retries are disabled to prevent extra charges. Failed API calls will not retry.

- **Per Keyword Expansion**: $0.075 per API call (no retries)
- **Typical cost per country type**: ~$5-6 (63-80 keywords × $0.075)
- **Full project estimate**: 25 countries × 2 types × $5.50 = ~$275
- **Threshold Adjustments**: FREE (uses local CSV files)

**Cost savings:** Removed automatic retry logic that was causing 49% overhead ($7-8 per country type with retries vs $5-6 without). Failed requests now fail immediately instead of retrying and incurring extra charges.

Test with one country first to verify before processing all:

```bash
npm run keywords generate-csv -- --type businesses --country AU # ~$5-6
```

### Analysis Tools

After generating CSV files, use these analysis scripts to make informed decisions:

**Coverage Analysis** - See keyword counts across countries

```bash
npm run keywords:coverage # All countries
npm run keywords:coverage -- --country AU # Specific country
npm run keywords:coverage -- --type businesses
```

**Threshold Recommendations** - Find optimal cutoffs

```bash
npm run keywords:recommend -- data/au/businesses-search-volume.csv --scenarios
npm run keywords:recommend -- data/au/businesses-search-volume.csv --target 100
```

Shows scenarios (p50, p75, p90, p95) with keyword counts and search volume impact.

**Keyword Comparison** - Before/after filtering analysis

```bash
npm run keywords:compare -- data/au/businesses-search-volume.csv data/au/businesses.txt
npm run keywords:compare -- data/au/businesses-search-volume.csv data/au/businesses.txt --show-removed
```

Shows what was kept vs removed, search volume statistics, and missed opportunities.

## Architecture

See `docs/ARCHITECTURE.md` for detailed system design and `docs/TODO.md` for implementation status.

**Core Pipeline**: Keywords → SERPs → Assets → Scoring → Rescoring → Enrich → Proposals → Outreach → Replies

The pipeline is organized into 9 independent stages that can be run individually or together using `npm run all`. Each stage has its own CLI, statistics, and error handling.

**Reliability Features**:

- **Circuit Breakers**: All API calls (LLM providers, ZenRows, Twilio, Resend) are protected by circuit breakers that prevent cascading failures and excessive costs during outages. See `docs/CIRCUIT-BREAKER.md` for details.
- **Retry with Backoff**: Transient failures are automatically retried with exponential backoff.
- **Error Recovery**: Failed operations can be retried via stage-specific commands.

**Site Filtering**:

The pipeline automatically filters out business directories (Yelp, Yellow Pages, Craigslist, etc.) and social media platforms (Facebook, Twitter/X, LinkedIn, etc.) to prevent wasting API credits. Filtering uses a two-tier approach:

1. **Domain Blocklists** (`src/utils/site-filters.js`): Fast domain matching runs at the start of each stage (SERPs, Assets, Scoring, Rescoring, Enrich, Proposals). Blocklisted sites are set to `status='ignore'` with a descriptive `error_message`.

2. **LLM Fallback Detection**: The scoring prompts include an `is_business_directory` field that catches directories missed by the blocklist through visual/HTML analysis. These are also set to `status='ignore'`.

Sites with `status='ignore'` are automatically excluded from all downstream stages. Blocklists can be extended by adding domains to `src/utils/site-filters.js`.

### Stage Selection Criteria & Status Flow

Each stage processes records based on specific criteria. Understanding these criteria ensures no records get lost during processing.

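The retry-with-backoff behavior listed under Reliability Features follows a standard pattern; a minimal sketch is below. The attempt count and base delay are illustrative defaults, not the project's actual settings:

```javascript
// Sketch: retry a flaky async operation with exponential backoff.
async function withRetry(fn, { attempts = 3, baseMs = 500 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Wait baseMs, 2×baseMs, 4×baseMs, ... before the next attempt.
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
  throw lastErr; // All attempts exhausted: surface the last error.
}
```

A stage would wrap its API calls, e.g. `withRetry(() => scoreSite(site))` (a hypothetical call), letting transient failures recover while circuit breakers still guard against persistent outages.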
#### Sites Table Status Flow

**High-scoring sites (score > 82)**:

```
found → assets_captured → high_score ✓
```

**Low-scoring sites (score ≤ 82)**:

```
found → assets_captured → scored → rescored → enriched → proposals_drafted → [outreach_sent]
```

**Note**: `outreach_sent` is defined in the schema but not currently set by any stage. Sites remain at `proposals_drafted` after outreach delivery.

**Pipeline Flow Rules**:

- **High-scoring sites (score > 82)**: Marked as `high_score` after initial scoring (end of journey - no outreach needed)
- **Low-scoring sites (score ≤ 82)**: Must complete the full sequence: `scored` → `rescored` → `enriched` → `proposals_drafted`
- **Strict sequential processing**: Each stage only processes sites from the previous stage (no skipping)
- **Failing sites**: Sites that exceed retry limits are marked as `failing` for human review

#### Stage-by-Stage Selection Criteria

**1. Keywords Stage**

- **Selects from**: `keywords` table
- **Criteria**: `status = 'active'`
- **Ordering**: `priority DESC, search_count ASC`
- **Output**: Selected keywords for SERP scraping
- **Note**: The keywords table has columns: keyword, priority (1-10), status (`'active'`/`'inactive'`), search_count, zenrows_count, processed_count, low_scoring_count

**2. SERPs Stage**

- **Selects from**: `keywords` table
- **Criteria**: `status = 'active'`
- **Creates**: New sites with `status = 'found'`

**3. Assets Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status IN ('found', 'assets_captured')`
  - Then filters to sites missing `html_dom` OR with incomplete screenshots (needs all 6 files)
- **Success**: Sets `status = 'assets_captured'`
- **Error**: Keeps `status = 'found'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`

**4. Scoring Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'assets_captured'`
  - `score IS NULL OR error_message IS NOT NULL`
- **Success**:
  - Sets `status = 'high_score'` if score > 82 (end of journey)
  - Sets `status = 'scored'` if score ≤ 82 (continues to rescoring)
- **Error**: Keeps `status = 'assets_captured'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`

**5. Rescoring Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'scored'`
  - `score <= 82` (B- or below, configurable via `config.low_score_cutoff`)
  - `screenshot_path IS NOT NULL`
  - `rescored_at IS NULL OR error_message IS NOT NULL`
- **Success**: Sets `status = 'rescored'`
- **Error**: Keeps `status = 'scored'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`
- **Note**: Sites with `score > 82` were set to `high_score` at scoring and skip this stage

**6. Enrich Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'rescored'` (must complete rescoring first)
  - `enriched_at IS NULL OR error_message IS NOT NULL`
  - Then filters to sites without contact forms: `!contactsJson.primary_contact_form?.form_url`
- **Success**: Sets `status = 'enriched'`
- **Error**: Keeps `status = 'rescored'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`
- **Note**: Only low-scoring sites reach this stage (high-scoring sites stop at `high_score`)

**7. Proposals Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'enriched'` (must complete enrichment first)
  - `score >= minScore AND score <= maxScore` (default: 0-82)
  - `NOT EXISTS (SELECT 1 FROM outreaches WHERE site_id = sites.id)`
- **Success**: Creates `outreaches` records with `status = 'pending'` and sets site `status = 'proposals_drafted'`
- **Error**: Keeps `status = 'enriched'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`
- **Note**: Only processes low-scoring enriched sites (≤82 by default)

**8. Outreach Stage**

- **Selects from**: `outreaches` table (not sites)
- **Criteria**:
  - `status = 'pending'`
  - `contact_method IS NOT NULL`
  - `contact_uri IS NOT NULL`
- **Success**:
  - Sets outreach `status = 'sent'` or `'delivered'`
  - Updates site `status = 'outreach_sent'` once all outreaches for that site are sent
- **Error**: Sets outreach `status = 'failed'`

**9. Replies Stage**

- **Selects from**: `conversations` table
- **Criteria**: `processed_at IS NULL` (unless the `--all` flag is used)
- **Success**: Sets `processed_at = CURRENT_TIMESTAMP`

#### Record Handling Notes

**High-scoring sites (score > 82) are intentionally excluded**

- Sites with `score > 82` after initial scoring remain at `status = 'high_score'` indefinitely
- They skip rescoring (which only processes `score <= 82`)
- They cannot reach enrichment (requires `status = 'rescored'`)
- They cannot get proposals (only generated for sites with `score <= 82`)
- **This is by design**: The pipeline only targets low-scoring sites (≤82) for outreach opportunities

---

## Command Reference

### Stage-Based Pipeline (Recommended)

The stage-based architecture provides granular control over each pipeline phase. Each stage can be run independently with `--limit` and `--skip` flags.

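As a rough sketch, the `--skip` gating that `npm run all` applies across the nine stages might look like the planner below. It is illustrative only, not the project's actual orchestrator:

```javascript
// Sketch: derive a stage execution plan from a --skip list.
// Stage names mirror the nine pipeline stages documented above.
const STAGES = [
  'keywords', 'serps', 'assets', 'scoring', 'rescoring',
  'enrich', 'proposals', 'outreach', 'replies',
];

function planStages({ skip = [] } = {}) {
  const skipped = new Set(skip);
  return STAGES.filter((stage) => !skipped.has(stage));
}

console.log(planStages({ skip: ['keywords', 'serps'] }).join(' → '));
// assets → scoring → rescoring → enrich → proposals → outreach → replies
```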
#### Run Complete Pipeline

```bash
# Run all stages from keywords to replies
npm run all

# Run with limit per stage
npm run all -- --limit 10

# Skip specific stages
npm run all -- --skip keywords,serps

# Continue on errors
npm run all -- --force
```

#### Individual Stages

**1\. Keywords** - Keyword selection and prioritization

```bash
# Process active keywords
npm run keywords

# List all keywords with stats
npm run keywords list

# Generate keyword combinations for a country
npm run keywords generate -- --country UK --limit 10

# Add a new keyword
npm run keywords add "plumber seattle" 8 -- --country US

# Update keyword priority
npm run keywords priority <id> <priority>
```

**2\. SERPs** - Scrape search results for keywords

```bash
# Scrape SERPs for active keywords
npm run serps

# Limit keywords to process
npm run serps -- --limit 5

# View SERP statistics
npm run serps stats
```

**3\. Assets** - Capture screenshots for sites

```bash
# Capture screenshots for found sites
npm run assets

# Limit sites to capture
npm run assets -- --limit 10

# Backfill missing screenshots
npm run assets backfill 20

# View assets statistics
npm run assets stats
```

**4\. Scoring** - Initial AI conversion scoring

```bash
# Score captured sites
npm run scoring

# Limit sites to score
npm run scoring -- --limit 10

# View scoring statistics
npm run scoring stats
```

**5\. Rescoring** - Rescore low-scoring sites (B- and below)

```bash
# Rescore sites with B- or below
npm run rescoring

# Limit sites to rescore
npm run rescoring -- --limit 10

# View rescoring statistics
npm run rescoring stats
```

**6\. Enrich** - Enrich contact details from key pages

```bash
# Enrich sites without contact forms by browsing key pages
npm run enrich

# Limit sites to enrich
npm run enrich -- --limit 10

# View enrichment statistics
npm run enrich stats
```

**7\. Proposals** - Generate personalized proposals

```bash
# Generate proposals for low-scoring sites
npm run proposals

# Limit proposals to generate
npm run proposals -- --limit 10

# Regenerate proposals for specific sites
npm run proposals regenerate <siteId1> <siteId2>

# View proposals statistics
npm run proposals stats
```

**8\. Outreach** - Multi-channel outreach delivery

```bash
# Send pending outreaches (auto channel selection)
npm run outreach

# Limit outreaches to send
npm run outreach -- --limit 10

# Send via specific channel
npm run outreach sms
npm run outreach email
npm run outreach form
npm run outreach x
npm run outreach linkedin

# Retry failed outreaches
npm run outreach retry 10

# View outreach statistics
npm run outreach stats
```

**9\. Replies** - Process inbound replies

```bash
# Process new replies
npm run replies

# Show all replies (not just unprocessed)
npm run replies -- --all

# Limit replies to show
npm run replies -- --limit 20

# Process opt-out requests
npm run replies opt-outs

# View replies statistics
npm run replies stats
```

### POC Pipeline (SERP → Score)

- **Purpose**: Proof of concept / manual testing
- **Browser**: Visible (headed) - great for debugging
- **Scope**: Single keyword at a time
- **Output**: Summary report with grade distribution
- **Use case**: Manual verification, demos, troubleshooting
- **Benefits**:
  - **Visual debugging** - See what's happening in real time
  - **Quality verification** - Manually verify scoring on new business types
  - **Presentations/demos** - Nice visual output
  - **Troubleshooting** - When process.js fails, the POC helps diagnose
  - **No imports found** - Nothing depends on it, so the maintenance burden is low

```bash
# Process a keyword (scrape SERP, capture screenshots, score sites)
npm run poc "keyword" N
# Example: npm run poc "plumber sydney" 10

# Process N sites from queue
npm run process N

# Retry failed operations
npm run retry
```

### MVP Pipeline (Full End-to-End)

- **Purpose**: Production automation
- **Browser**: Headless (via captureWebsite)
- **Scope**: Batch-processes the entire keywords table
- **Output**: Basic logging
- **Use case**: Automated pipeline for scaling

```bash
# Full MVP pipeline: POC → Propose → Send
npm run mvp run "keyword"
npm run mvp run "keyword" --limit 5
npm run mvp run "keyword" --skip-poc --skip-outreach

# Individual stages
npm run mvp poc "keyword" # SERP + Score only
npm run mvp propose [min] [max] # Generate proposals (default: 0-82)
npm run mvp send [limit] # Send pending outreaches
```

### Proposal Generation

```bash
# Generate proposals for a specific site
npm run proposals generate <site_id>

# Bulk generate for N sites scoring B- to E (0-82)
npm run proposals bulk N

# Regenerate proposals marked for rework
npm run proposals rework

# Show pending outreaches
npm run proposals pending

# Analyze feedback patterns (default: PROPOSAL.md, 30 days)
node src/proposal-generator-v2.js analyze [prompt] [days]
```

### Outreach Approval Workflow

Batch review workflow using Google Sheets for QA collaboration:

```bash
# 1. Generate proposals
npm run proposals

# 2. Export pending outreaches to Google Sheets
npm run outreach:export

# 3. (QA reviews in Google Sheets, sets Action column: approve/rework/reject)

# 4. Import QA decisions back to database
npm run outreach:import <sheetId>

# 5. Show approval statistics
npm run outreach:status

# 6. Regenerate proposals marked for rework
node src/proposal-generator-v2.js rework

# 7. Send approved outreaches
npm run outreach
```

**Google Sheets Setup:**

1. Create a project at https://console.cloud.google.com/
2. Enable the Google Sheets API
3. Create a Service Account with the Editor role
4. Generate a JSON key and extract `client_email` + `private_key`
5. Add to `.env`: `GOOGLE_SHEETS_CLIENT_EMAIL`, `GOOGLE_SHEETS_PRIVATE_KEY`
6. (Optional) Create a folder in Google Drive, share it with the service account, and add `GOOGLE_SHEETS_FOLDER_ID` to `.env`

### Contact Prioritization

```bash
# Update contact URIs for a specific site
node src/contacts/prioritize.js update <site_id>

# Bulk update all pending outreaches (with optional limit)
node src/contacts/prioritize.js bulk [N]

# Show outreach readiness report
node src/contacts/prioritize.js report
```

### Outreach Channels

#### Email (Resend)

```bash
# Send single email
node src/outreach/email.js send <outreach_id>

# Bulk send approved emails (with optional limit)
# Note: Automatically syncs unsubscribes before sending
node src/outreach/email.js bulk [N]

# Manually unsubscribe an email
node src/outreach/email.js unsubscribe <outreach_id>

# Sync unsubscribes from Cloudflare Worker
npm run sync-unsubscribes

# Sync email tracking events (opens, clicks, bounces)
node src/utils/sync-email-events.js

# Test email configuration
node src/outreach/email.js test
```

**Email Tracking**: Resend automatically tracks email opens, clicks, bounces, and spam complaints. Set up the Cloudflare Worker webhook receiver (see [Cloudflare Worker Setup](#cloudflare-worker-setup) below) and run `sync-email-events.js` every 5 minutes via cron.

#### SMS (Twilio)

```bash
# Send single SMS
node src/outreach/sms.js send <outreach_id>

# Bulk send approved SMS (with optional limit)
node src/outreach/sms.js bulk [N]

# Test SMS configuration
node src/outreach/sms.js test
```

#### X / Twitter DMs (Playwright)

```bash
# Send single X DM (headed browser, manual review)
node src/outreach/x.js send <outreach_id>

# Bulk send X DMs
node src/outreach/x.js bulk [N]
```

#### LinkedIn Messages (Playwright)

```bash
# Send single LinkedIn message (headed browser, manual review)
node src/outreach/linkedin.js send <outreach_id>

# Bulk send LinkedIn messages
node src/outreach/linkedin.js bulk [N]
```

#### Contact Forms (Playwright)

```bash
# Submit contact form (interactive by default)
node src/outreach/form.js send <outreach_id>

# Run in headless mode (automated)
node src/outreach/form.js send <outreach_id> --headless

# Bulk submit forms
node src/outreach/form.js bulk [N]
```

### Browser Profiles (X & LinkedIn)

X and LinkedIn outreach use persistent browser profiles to avoid re-login on every run. Profiles store cookies and session data, rotating across multiple accounts using a least-recently-used (LRU) strategy.

```bash
# List all profiles
npm run profiles list

# List profiles for a specific platform
npm run profiles list x
npm run profiles list linkedin

# Show which profile will be used next (LRU)
npm run profiles next x

# Delete a specific profile
npm run profiles delete x profile-1
```

**First-time setup:** Run outreach 3 times per platform to create 3 rotating profiles. Each run opens a headed browser for manual login, then auto-saves the session for future reuse.

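The LRU rotation described above amounts to picking the profile with the oldest last-use timestamp. A minimal sketch (profile names and in-memory timestamps here are illustrative; the real implementation persists sessions on disk):

```javascript
// Sketch: pick the least-recently-used profile for the next outreach run.
// Never-used profiles (no lastUsed) sort first.
function nextProfile(profiles) {
  return [...profiles].sort((a, b) => (a.lastUsed ?? 0) - (b.lastUsed ?? 0))[0];
}

const xProfiles = [
  { name: 'x-profile-1', lastUsed: 300 },
  { name: 'x-profile-2', lastUsed: 100 },
  { name: 'x-profile-3', lastUsed: 200 },
];

console.log(nextProfile(xProfiles).name); // "x-profile-2"
```

Rotating this way spreads activity evenly across accounts, which helps avoid any single profile looking unusually active.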
**Configuration (.env):**

- `BROWSER_PROFILES_DIR` - Storage directory (default: `./.browser-profiles`)
- `X_PROFILE_COUNT` - Number of X profiles (default: 3)
- `LINKEDIN_PROFILE_COUNT` - Number of LinkedIn profiles (default: 3)

### Inbound Handling

#### SMS Replies (Twilio)

```bash
# Poll Twilio API for new inbound SMS messages
npm run inbound:sms
# (or: node src/inbound/sms.js poll)

# Process pending operator replies
npm run inbound:process-replies
```

**Setup**:

- Webhooks are handled by Cloudflare Workers (see [Cloudflare Worker Setup](#cloudflare-worker-setup))
- For local testing or as a backup, use polling (cron every 5 minutes)
- Inbound messages are stored in the `conversations` table and matched to outreach records by phone number
- Operator replies marked as `direction='outbound'` are automatically sent via `process-replies`

### Database & Maintenance

```bash
# Initialize/reset database
npm run init-db

# Apply database migrations
npm run db-migrate

# Apply migrations with force mode (for existing databases)
npm run db-migrate -- --force

# Backfill keywords from sites table
npm run backfill-keywords

# Recapture missing screenshots for N sites
npm run backfill-screenshots N

# Deduplicate domains (locale-aware: prefer exact country match)
npm run dedupe:locale:dry-run # Preview locale-aware deduplication
npm run dedupe:locale # Execute locale-aware deduplication

# Legacy deduplication (search volume only, no locale consideration)
npm run dedupe:dry-run # Preview search volume deduplication
npm run dedupe # Execute search volume deduplication

# Analyze competitors for a keyword
npm run competitors "keyword"
```

### Agent System

The multi-agent system provides autonomous development, testing, and maintenance.
See [docs/06-automation/agent-system.md](docs/06-automation/agent-system.md) for comprehensive documentation. 892 893 ```bash 894 # View agent status and health 895 npm run agent:list 896 897 # View pending tasks 898 npm run agent:tasks 899 900 # View agent logs 901 npm run agent:logs 902 903 # View tasks awaiting approval 904 npm run agent:approvals 905 906 # Approve a task 907 npm run agent:approve -- --task-id 123 --reviewer "Your Name" --decision approved 908 909 # View agent statistics 910 npm run agent:stats 911 912 # Reset circuit breakers (prepare for activation) 913 npm run agent:reset-breakers:dry-run # Preview what would be reset 914 npm run agent:reset-breakers # Reset breakers older than 30 minutes 915 npm run agent:reset-breakers:force # Force reset all breakers + cleanup old tasks 916 ``` 917 918 **Circuit Breaker Management:** 919 920 Circuit breakers protect against cascading failures by blocking agents with >30% failure rates. Auto-recovery happens after 30 minutes if failure rate drops. See [docs/06-automation/circuit-breaker-management.md](docs/06-automation/circuit-breaker-management.md) for details. 921 922 Before activating the agent system: 923 924 1. Run `npm run agent:reset-breakers:dry-run` to check status 925 2. Run `npm run agent:reset-breakers:force` to reset and cleanup 926 3. Run `npm run agent:list` to verify all agents are active 927 4. 
Enable cron: Set `AGENT_SYSTEM_ENABLED=true` in `.env` 928 929 ### Quality & Testing 930 931 ``` 932 # Run all unit tests 933 npm test 934 935 # Run integration tests only (requires RESEND_API_KEY in .env) 936 npm run test:integration 937 938 # Run all tests (unit + integration) 939 npm run test:all 940 941 # Watch mode for tests 942 npm run test:watch 943 944 # Run with coverage report (included in npm test) 945 npm test 946 947 # Lint code 948 npm run lint 949 npm run lint:fix # Auto-fix issues 950 951 # Format code 952 npm run format 953 npm run format:check 954 955 # Full quality check (runs all checks + Sage AI review) 956 npm run quality-check 957 958 # 🚀 UNIFIED AUTO-FIX (Recommended!) 959 # Runs ALL automated maintenance tasks in one go: 960 # - Prettier formatting 961 # - ESLint auto-fix 962 # - Security audit fixes (npm audit fix) 963 # - Dependency updates (patches + minors + npm outdated) 964 # - Sage AI quality fixes (requires claude CLI in PATH) 965 # - Documentation checks 966 # All fixes are committed to shared "autofix" branch for review 967 npm run autofix 968 969 # View what's in the autofix branch 970 npm run autofix:summary 971 972 # Individual auto-fix tasks (if you want granular control) 973 npm run sage:autofix # AI-powered quality fixes only 974 ``` 975 976 ### Maintenance 977 978 ```bash 979 # Quick health check (vulnerabilities, lint, tests) 980 npm run maint:quick 981 982 # Full audit (includes outdated packages) 983 npm run maint:audit 984 985 # Database integrity check and optimization 986 npm run maint:db 987 988 # Analyze CLAUDE.md for duplication (non-destructive) 989 npm run maint:claude 990 991 # Weekly maintenance (all of the above + cleanup) 992 ./scripts/weekly-maintenance.sh 993 ``` 994 995 **CLAUDE.md Optimization:** 996 997 The project includes tools to keep [CLAUDE.md](CLAUDE.md) optimized: 998 999 - **Analysis**: `npm run maint:claude` analyzes for duplication and generates a report 1000 - **Non-destructive**: Does not 
modify CLAUDE.md automatically
- **Weekly automation**: Set up with cron (see [docs/CRON-SETUP.md](docs/CRON-SETUP.md))
- **Reports**: Saved to `.claude-analysis/` (git-ignored)

```bash
# Run analysis
npm run maint:claude

# View analysis report
cat .claude-analysis/analysis-2026-02-03.md

# Set up weekly cron job (optional)
crontab -e
# Add: 0 2 * * 0 cd /path/to/project && ./scripts/weekly-maintenance.sh >> logs/weekly-maintenance.log 2>&1
```

**Git Hooks**: Pre-commit and pre-push hooks are installed automatically via `simple-git-hooks`:

- **Pre-commit**: Runs `npm run format && npm run lint` to ensure code quality
- **Pre-push**: Runs `npm test` to prevent pushing broken code
- Hooks are installed automatically when running `npm install`
- Skip hooks temporarily: `SKIP_SIMPLE_GIT_HOOKS=1 git commit`

**GitHub Actions**: Automated CI/CD workflows:

- **PR Quality Check**: Runs on all pull requests and pushes to main
- **Weekly Maintenance**: Scheduled every Monday at 9 AM UTC
  - Checks for vulnerabilities and outdated packages
  - Runs the full test suite with coverage
  - Creates a GitHub issue if problems are found

**Maintenance Schedule**:

- **Weekly**: Review Dependabot PRs, check vulnerabilities, review test coverage
- **Monthly**: Database maintenance, documentation audit, unused code review
- See [docs/MAINTENANCE.md](docs/MAINTENANCE.md) for the detailed maintenance plan

### Monitoring & Credits

```bash
# Check OpenRouter credit balance
npm run credits          # Quick check
npm run credits:verbose  # Detailed info with rate limits
npm run credits:monitor  # Run monitoring cron job (logs to DB, alerts if low)

# Pipeline monitoring
npm run monitor:status    # System health summary (alias: npm run watchdog:status)
npm run monitor:guardian  # Run process guardian manually

# API rate limit management
npm run rate-limits                           # Show current rate limit status
npm run rate-limits:clear                     # Clear all rate limits
npm run rate-limits:clear -- --clear zenrows  # Clear rate limits for a specific API
```

### Inbound (npm scripts)

```bash
# Poll for new inbound messages
npm run inbound:poll   # Poll all channels
npm run inbound:sms    # Poll Twilio for new inbound SMS
npm run inbound:email  # Poll the Resend API for inbound email replies

# Process operator replies
npm run inbound:process-replies  # Send pending operator replies

# View conversations
npm run inbound:inbox   # View inbox
npm run inbound:thread  # View a conversation thread
npm run inbound:stats   # Inbound statistics
```

### Pricing

```bash
# View pricing
npm run pricing:summary   # Summary of all country pricing
npm run pricing:get       # Get the price for a country/tier
npm run pricing:tier      # Show tier info

# Update pricing
npm run pricing:update    # Run weekly repricing (cultural/PPP-adjusted)
npm run pricing:override  # Override the price for a specific country
npm run pricing:export    # Export pricing data to JSON
```

### Cleanup & Deduplication

```bash
# Screenshot cleanup
npm run cleanup:screenshots  # Remove screenshots for ignored sites
npm run cleanup:screenshots:dry-run
npm run cleanup:uncropped    # Delete uncropped screenshots (saves disk)
npm run cleanup:uncropped:dry-run

# Site cleanup
npm run cleanup:reset-failing  # Reset failing sites back to their prior stage
npm run cleanup:reset-failing:dry-run

# Outreach deduplication
npm run dedupe:outreaches  # Deduplicate outreach records
npm run dedupe:outreaches:dry-run

# Contact validation
npm run validate-contacts  # Validate contact data
```

### Security

```bash
# Run all security checks
npm run security            # Full suite: audit + lint + snyk + semgrep
npm run security:fix        # Auto-fix everything possible

# Individual checks
npm run security:audit      # npm vulnerability audit
npm run security:audit:fix  # Auto-fix npm vulnerabilities
npm run security:lint       # Security-focused ESLint rules
npm run security:snyk       # Snyk vulnerability scan
npm run security:semgrep    # Semgrep static analysis
npm run security:scan       # Custom security scan
```

### Dependency Management

```bash
npm run deps:update          # Update minor/patch dependencies (tests, rolls back on failure)
npm run deps:update:patches  # Patch updates only (safest)
npm run deps:update:all      # All updates including majors (manual review needed)
npm run deps:update:dry-run  # Preview updates without applying
```

---

## Environment Variables

Create a `.env` file with:

```
# API Keys
ZENROWS_API_KEY=your_zenrows_key        # SERP scraping

# LLM Provider (OpenRouter — all LLM calls route through here)
OPENROUTER_API_KEY=your_openrouter_key  # Multi-model AI gateway

# Email Service (Resend)
RESEND_API_KEY=your_resend_key          # Email delivery
RESEND_TEST_API_KEY=your_test_key       # Optional: for integration tests

# SMS Service (Twilio)
TWILIO_ACCOUNT_SID=your_twilio_sid      # SMS delivery
TWILIO_AUTH_TOKEN=your_twilio_token
TWILIO_PHONE_NUMBER=+1234567890

# Sender Details (for email, contact forms, etc.)
SENDER_NAME=Your Name               # Sender name for outreach
SENDER_EMAIL=your@email.com         # Sender email address
SENDER_PHONE=+1234567890            # Sender phone for contact forms
SENDER_COMPANY=Your Company         # Company name

# Cloudflare Workers (for unsubscribes and email tracking)
UNSUBSCRIBE_WORKER_URL=https://unsubscribe-worker.YOUR-SUBDOMAIN.workers.dev
EMAIL_EVENTS_WORKER_URL=https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev
UNSUBSCRIBE_SECRET=your-secret-key  # HMAC secret for unsubscribe links

# Optional
DATABASE_PATH=db/sites.db           # Custom database location

# Browser Stealth Configuration (Bot Detection Avoidance)
STEALTH_LEVEL=standard              # minimal|standard|aggressive
ENABLE_HUMAN_BEHAVIORS=true         # Human-like delays and movements
ENABLE_BEZIER_MOUSE=true            # Bezier-curve mouse movements (no teleporting)
TIMEZONE=Australia/Sydney           # Browser timezone (should match IP location)

# Browser Profiles (X & LinkedIn persistent sessions)
BROWSER_PROFILES_DIR=./.browser-profiles  # Profile storage directory
X_PROFILE_COUNT=3                   # Number of X profiles to rotate
LINKEDIN_PROFILE_COUNT=3            # Number of LinkedIn profiles to rotate
```

---

## Database Schema

Main tables:

- **sites**: Domain, screenshots, scores, contacts
- **outreaches**: Proposal variants, delivery status, tracking
- **conversations**: Inbound/outbound message threads
- **config**: Global settings (sender email, templates)
- **unsubscribed_emails**: CAN-SPAM compliance (global unsubscribe list)
- **opt_outs**: TCPA/CAN-SPAM opt-outs (phone & email)
- **keywords**: Keyword tracking and scoring (created via migration, not in the base schema)
- **migrations**: Migration tracking (auto-created by the migration system)

See `db/schema.sql` for the base schema.
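As a sketch of how the `unsubscribed_emails` and `opt_outs` tables listed above gate outbound sends (a hypothetical helper with in-memory `Set` stand-ins for the tables; the real check runs against SQLite):

```javascript
// Hypothetical suppression check: block a send if the address is on the
// global unsubscribe list or has an email opt-out recorded.
// The two Sets stand in for the unsubscribed_emails / opt_outs tables.
function isSuppressed(email, { unsubscribed, optOuts }) {
  const normalized = email.trim().toLowerCase();
  return unsubscribed.has(normalized) || optOuts.has(normalized);
}

const unsubscribed = new Set(['old-lead@example.com']);
const optOuts = new Set(['no-thanks@example.com']);

console.log(isSuppressed('Old-Lead@example.com ', { unsubscribed, optOuts })); // true
console.log(isSuppressed('fresh-lead@example.com', { unsubscribed, optOuts })); // false
```

Normalizing case and whitespace before the lookup matters here: the same address can arrive in different forms from SERP scraping, contact forms, and manual entry.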
The keywords table is created via `db/migrations/add-keywords-table.sql` and `013-fix-keywords-schema.sql`.

### Database Migrations

The project uses an automated migration system to manage schema changes. Migrations are stored in `db/migrations/` and tracked in a `migrations` table to prevent duplicate execution.

**Running Migrations:**

```bash
# Apply all pending migrations
npm run db-migrate

# For existing databases with manually applied changes, use force mode
npm run db-migrate -- --force
```

**Creating New Migrations:**

1. Create a new `.sql` file in `db/migrations/` with a sortable prefix (e.g., `006-add-new-field.sql`)
2. Write your SQL statements (`ALTER TABLE`, `CREATE INDEX`, etc.)
3. Run `npm run db-migrate` to apply it

**Migration Safety:**

- Each migration runs in a transaction (automatically rolled back on failure)
- The `migrations` table tracks which migrations have been applied
- Force mode (`--force`) skips migrations that would fail due to existing schema (useful for existing databases)
- Always back up your database before running migrations: `cp db/sites.db db/backup/sites-backup.db`

---

## Cloudflare Worker Setup

The project uses Cloudflare Workers to handle unsubscribes and email-tracking events. This avoids running a 24/7 webhook server.
### Initial Setup

```bash
# Navigate to the cloudflare-worker directory
cd cloudflare-worker

# Install wrangler locally
npm install

# Log in to Cloudflare
npm run login

# Create R2 buckets
npx wrangler r2 bucket create unsubscribes
npx wrangler r2 bucket create email-events
```

### Deploy Unsubscribe Worker

```bash
# Set the UNSUBSCRIBE_SECRET (must match your .env)
npx wrangler secret put UNSUBSCRIBE_SECRET
# When prompted, paste your secret from .env

# Deploy the worker
npx wrangler deploy
# Note the worker URL: https://unsubscribe-worker.YOUR-SUBDOMAIN.workers.dev
```

### Deploy Email Events Worker

```bash
# Deploy the email-tracking webhook worker
npx wrangler deploy --config wrangler-resend.toml
# Note the worker URL: https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev
```

### Configure Resend Webhooks

1. Log into the [Resend Dashboard](https://resend.com/webhooks)
2. Click "Add Webhook"
3. Enter the webhook URL: `https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev/webhook/resend`
4. Enable the events: `email.opened`, `email.clicked`, `email.bounced`, `email.complained`, `email.delivered`
5. Save the webhook

### Enable Domain Tracking in Resend

1. Go to [Resend Domains](https://resend.com/domains)
2. Select your domain
3. Scroll to the "Tracking" section
4. Enable "Open Tracking" and "Click Tracking"
5. Save the changes

### Update .env

Add the worker URLs to your `.env`:

```
UNSUBSCRIBE_WORKER_URL=https://unsubscribe-worker.YOUR-SUBDOMAIN.workers.dev
EMAIL_EVENTS_WORKER_URL=https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev
```

### Set Up Cron Jobs

The 333 Method includes a **database-driven cron system** that manages all scheduled tasks (real-time syncs, pipeline stages, maintenance, quality checks).

#### Quick Setup

```bash
# 1. Initialize the cron jobs table
sqlite3 db/sites.db < db/migrations/029-create-cron-jobs-table.sql

# 2. Seed the database with default jobs
npm run cron:migrate

# 3. View all jobs
npm run cron:list

# 4. Configure systemd (or use crontab)
# systemd: /etc/systemd/system/333method-cron.timer (runs every 5 minutes)
# OR crontab: */5 * * * * cd /path/to/project && node src/cron.js
```

#### Management Commands

```bash
# List all jobs (with status and last run times)
npm run cron:list

# Enable/disable jobs
npm run cron:enable syncEmailEvents
npm run cron:disable runTests

# View execution logs
npm run cron:logs syncEmailEvents

# Show statistics
npm run cron:stats

# Add a new job
npm run cron:add -- \
  --name "My Task" \
  --key myTask \
  --handler "npm run my-script" \
  --type command \
  --interval 60 \
  --unit minutes

# Remove a job
npm run cron:remove myTask
```

#### Default Jobs

**Real-time (every 5 minutes)**:

- Sync email events (opens, clicks, bounces)
- Sync unsubscribes
- Poll inbound SMS
- Run assets stage (5 sites per batch)
- Run scoring stage (10 sites per batch)
- Run rescoring stage (5 sites per batch)
- Run enrichment stage (3 sites per batch)
- Run proposals stage (10 sites per batch)
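The pipeline-stage jobs above run as time-boxed batches. A minimal sketch of such a batch loop; the helper and its parameters are hypothetical, not the project's actual stage runner:

```javascript
// Sketch of a time-boxed batch: process sites until the batch size or the
// time budget is exhausted, then return control to the cron runner.
async function runBatch(sites, processSite, { batchSize = 5, budgetMs = 90_000 } = {}) {
  const deadline = Date.now() + budgetMs;
  let processed = 0;
  for (const site of sites.slice(0, batchSize)) {
    if (Date.now() >= deadline) break; // out of time: stop, cron picks up the rest later
    await processSite(site);
    processed++;
  }
  return processed;
}

// Demo with a no-op processor: three sites fit well inside the budget.
runBatch(['site-a', 'site-b', 'site-c'], async () => {}).then((n) =>
  console.log(`processed ${n} sites`)
);
```

Because every run is bounded by both a count and a deadline, one slow stage cannot starve the other jobs scheduled in the same 5-minute window.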
> **Note**: Pipeline stages use time-boxed batches to prevent blocking other jobs. Each batch completes in 1-2 minutes and returns control to the cron system. See [docs/CRON-BATCH-STRATEGY.md](docs/CRON-BATCH-STRATEGY.md) for details.

**Daily**:

- Database maintenance (`PRAGMA optimize`, integrity check)
- Security audit (npm audit)
- Check outdated dependencies

**Weekly**:

- Database vacuum and analyze
- Database backup
- Update Claude Code CLI
- Performance analysis

**Monthly**:

- Technical debt review (TODO.md tracking)
- Full security scan

See [docs/CRON-JOBS.md](docs/CRON-JOBS.md) for detailed documentation, systemd integration, and adding custom jobs.

---

## Development Workflow

### Adding New Features

1. **Write code** in `src/` or `scripts/`
2. **Add tests** in `tests/` (aim for >80% coverage)
3. **Update this README** with new commands/features
4. **Run quality checks**: `npm run quality-check`
5. **Commit changes**

### Running Tests

```bash
# While developing
npm run test:watch

# Before committing
npm run quality-check
```

### Code Quality Standards

- **ESLint**: Zero errors (warnings acceptable for complexity/await rules)
- **Prettier**: Enforced formatting
- **Tests**: 80%+ coverage target
- **Sage AI**: Review enabled for `quality-check`

### Logging and Log Rotation

All operational npm scripts automatically log to `./logs/` with daily rotation (7-day retention by default).
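The retention check can be sketched as follows (a hypothetical helper, assuming log filenames end in a `YYYY-MM-DD` date; the actual rotation logic lives in `src/cron/daily-log-rotation.js` and may differ):

```javascript
// Sketch of the 7-day retention check. Log filenames are assumed to end in
// a YYYY-MM-DD date, e.g. "serps-2026-02-08.log".
function isExpired(filename, today, retentionDays = 7) {
  const match = filename.match(/(\d{4}-\d{2}-\d{2})\.log$/);
  if (!match) return false; // leave unrecognized files alone
  const ageMs = new Date(today) - new Date(match[1]);
  return ageMs > retentionDays * 24 * 60 * 60 * 1000;
}

console.log(isExpired('serps-2026-02-01.log', '2026-02-09')); // true (8 days old)
console.log(isExpired('serps-2026-02-03.log', '2026-02-09')); // false (6 days old)
```

Keying retention off the date embedded in the filename, rather than filesystem mtime, means a restored backup or copied log still rotates on schedule.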
**Coverage:** 84 operational commands are logged, including:

- All pipeline stages (keywords, serps, assets, scoring, rescoring, enrich, proposals, outreach, replies)
- Database operations (init-db, migrations, backfill)
- Sync operations (email events, unsubscribes, inbound processing)
- Security scans, cron jobs, pricing updates
- **Dashboard operations** (Streamlit Python app)

**Log Files:**

```bash
# View logs for a specific script
cat logs/keywords-2026-02-08.log
cat logs/serps-2026-02-08.log
cat logs/outreach-2026-02-08.log
cat logs/dashboard-2026-02-08.log

# Tail logs in real time
tail -f logs/all-2026-02-08.log

# List all log types
ls logs/*.log | sed 's|logs/||' | sed 's|-2026-.*||' | sort | uniq
```

**Log Rotation:**

```bash
# Rotate logs manually (delete files older than 7 days)
npm run logs:rotate

# Dry run to see what would be deleted
npm run logs:rotate:dry-run

# Rotate with custom retention (30 days)
npm run logs:rotate:30d
```

**Automatic Rotation:**

Add to crontab for daily rotation at 2 AM:

```bash
0 2 * * * cd /path/to/project && node src/cron/daily-log-rotation.js
```

**Log Format:**

Each log entry includes:

- Timestamp (ISO 8601)
- Script name
- Log level (INFO, SUCCESS, WARN, ERROR)
- Message and optional data

Example:

```
[2026-02-08T07:22:23.322Z] [keywords] [INFO] Starting keyword processing
[2026-02-08T07:22:23.405Z] [keywords] [OUTPUT] Processing keyword: plumber seattle
[2026-02-08T07:22:25.123Z] [keywords] [SUCCESS] Completed successfully
```

---

## Project Structure

```
333/
├── src/
│   ├── all.js                    # Full pipeline orchestrator
│   ├── stages/                   # Stage-based pipeline modules
│   │   ├── keywords.js           # Keyword selection
│   │   ├── serps.js              # SERP scraping
│   │   ├── assets.js             # Screenshot capture
│   │   ├── scoring.js            # Initial AI scoring
│   │   ├── rescoring.js          # Rescore low-scoring sites
│   │   ├── proposals.js          # Proposal generation
│   │   ├── outreach.js           # Multi-channel delivery
│   │   └── replies.js            # Inbound reply processing
│   ├── cli/                      # CLI entry points for stages
│   │   ├── keywords.js
│   │   ├── serps.js
│   │   ├── assets.js
│   │   ├── scoring.js
│   │   ├── rescoring.js
│   │   ├── proposals.js
│   │   ├── outreach.js
│   │   └── replies.js
│   ├── scrape.js                 # SERP scraping (ZenRows)
│   ├── capture.js                # Screenshot capture (Playwright)
│   ├── score.js                  # Conversion scoring (AI)
│   ├── poc.js                    # POC pipeline orchestration
│   ├── process.js                # Queue-based processing
│   ├── mvp.js                    # MVP pipeline orchestration
│   ├── proposal-generator-v2.js  # Proposal generation
│   ├── competitor-analysis.js    # Competitor research
│   ├── retry-failed.js           # Error recovery
│   ├── contacts/
│   │   └── prioritize.js         # Contact method decision logic
│   ├── outreach/
│   │   ├── email.js              # Resend integration
│   │   ├── sms.js                # Twilio integration
│   │   ├── form.js               # Playwright form automation
│   │   ├── x.js                  # X/Twitter DM automation
│   │   └── linkedin.js           # LinkedIn message automation
│   ├── inbound/
│   │   └── sms.js                # Twilio webhook server
│   └── utils/
│       ├── logger.js             # Colored console logging
│       ├── error-handler.js      # Retry logic
│       ├── llm-provider.js       # LLM provider abstraction (OpenRouter/Anthropic)
│       ├── image-optimizer.js    # Screenshot optimization
│       ├── flag-parser.js        # CLI flag parsing
│       └── summary-generator.js  # Beautiful terminal summaries
├── tests/                        # Unit & integration tests
├── scripts/                      # Utility scripts
├── db/
│   ├── schema.sql                # Database schema
│   ├── migrations/               # Schema migrations
│   └── sites.db                  # SQLite database
├── docs/
│   ├── ARCHITECTURE.md           # System design
│   ├── TODO.md                   # Implementation roadmap
│   ├── FUNCTIONAL-SPEC.md        # Feature specifications
│   ├── MULTI-COUNTRY-PLAN.md     # Internationalization plan
│   ├── BEST-PRACTICES-EMAIL.md   # Email compliance (CAN-SPAM)
│   └── prompts/                  # AI prompts
└── .clinerules/                  # Cline automation rules
```

---

## Testing

```bash
# Run a specific test file
node --test tests/prioritize.test.js

# Run with coverage (c8, included in npm test)
npm test

# Current coverage: ~64% (target: 85%+)
```

---

## Troubleshooting

### Database locked errors

```bash
# Close all connections, then:
sqlite3 db/sites.db "PRAGMA optimize;"
```

### Playwright browser issues

```bash
# In nix-shell, Playwright uses the system chromium
# Outside nix-shell:
npx playwright install chromium
```

### Missing screenshots

```bash
npm run backfill-screenshots 10
```

### API rate limits

- **ZenRows SERP API**:
  - **Daily quota**: 1,000 requests/day
  - **Concurrency**: Configurable via the `ZENROWS_CONCURRENCY` env var (default: 20)
  - Plan limits: Free (5), Developer (10), Business (100), Enterprise (custom)
  - Reference: [ZenRows Concurrency Docs](https://docs.zenrows.com/universal-scraper-api/features/concurrency)
- **LLM providers**:
  - OpenRouter: Pay-per-use (GPT-4o-mini: ~$0.15/1M tokens)
  - Anthropic: Direct API pricing (Claude 3.5 Sonnet: $3/$15 per 1M input/output tokens)
- **Resend**: 100 emails/day (free tier)
- **Twilio**: Pay-per-message (~$0.0075/SMS)

---

## Unsubscribe System (CAN-SPAM Compliance)

The project includes a complete unsubscribe system for email compliance:

### Architecture

1. **Static HTML Page** (`public/unsubscribe.html`) - User-facing unsubscribe page
2. **Cloudflare Worker** (`cloudflare-worker/`) - Handles unsubscribe requests and stores them
3. **Local Sync Script** (`src/utils/sync-unsubscribes.js`) - Polls the worker and imports results into SQLite
4. **Global Unsubscribe List** (`unsubscribed_emails` table) - Blocks all future emails to unsubscribed addresses

### Setup

1. Deploy the Cloudflare Worker (see `cloudflare-worker/README.md` for instructions)
2. Upload `public/unsubscribe.html` to your static hosting
3. Set `UNSUBSCRIBE_WORKER_URL` in `.env`
4. Set `UNSUBSCRIBE_BASE_URL` in `.env` to point to your static page
5. Generate a secure `UNSUBSCRIBE_SECRET` (the same value in `.env` and the Cloudflare Worker)

### Usage

```bash
# Sync unsubscribes from the Cloudflare Worker
npm run sync-unsubscribes

# Manually unsubscribe an email
node src/outreach/email.js unsubscribe <outreach_id>

# Bulk send automatically syncs before sending
node src/outreach/email.js bulk 10
```

### How It Works

1. Each email includes a unique HMAC-secured unsubscribe link
2. The user clicks the link and is redirected to the static HTML page
3. The page POSTs to the Cloudflare Worker with the ID and token
4. The worker validates the token and appends the entry to `unsubscribes.json` in R2
5. The local sync script polls the worker and imports entries into the `unsubscribed_emails` table
6. Future sends automatically check the global unsubscribe list

### Security

- HMAC tokens prevent unauthorized unsubscribes
- Timing-safe comparison prevents timing attacks
- The same secret must be set in `.env` and in the Cloudflare Worker environment
- Generate a secret with: `openssl rand -hex 32`

---

## Support & Documentation

- **Architecture**: See `docs/ARCHITECTURE.md`
- **Implementation Status**: See `docs/TODO.md`
- **Compliance**: See `docs/BEST-PRACTICES-EMAIL.md` and `docs/BEST-PRACTICES-SMS.md`
- **Prompts**: See `docs/prompts/` for AI prompt templates

---

## License

MIT