# 333 Method Automation

Automated SERP→outreach pipeline for local business sites. Scrapes search results, scores conversion potential, generates personalized proposals, and sends multi-channel outreach (SMS, Email, Contact Forms).

## Quick Start

```bash
# 1. Set up environment
nix-shell  # Or ensure Node.js 20+ and dependencies are installed

# 2. Configure API keys in .env
cp .env.example .env

# 3. Initialize database
npm run init-db

# 4. Add a keyword and run the full pipeline
npm run keywords add "plumber seattle" 8
npm run all -- --limit 10

# Or use the POC pipeline for testing
npm run poc "plumber seattle" 10
```

## Dashboard

View real-time analytics and pipeline health at http://localhost:8501.

### Start Dashboard

```bash
# First time setup
npm run dashboard:install

# Start dashboard
npm run dashboard

# Development mode (auto-reload on changes)
npm run dashboard:dev
```

### Features

- **Pipeline Health**: Error analysis, stuck sites, throughput tracking
- **Outreach Effectiveness**: Response rates, sales tracking, LLM cost analysis
- **Conversations**: Reply sentiment analysis and action items
- **Compliance**: Rate limit tracking for Resend, Twilio, and other platforms
- **System Health**: Cron jobs, code coverage, IP burnout detection, E2E tests

### Dashboard Logging

All dashboard errors and logs are automatically captured to `logs/dashboard-YYYY-MM-DD.log`:

```bash
# View dashboard logs
tail -f logs/dashboard-2026-02-08.log

# Check for errors
grep ERROR logs/dashboard-2026-02-08.log
```

**Error Logging in Streamlit Apps:**

The dashboard uses enhanced Python logging that captures:

- Streamlit internal errors and warnings
- Python exceptions and tracebacks
- Application-level logging (via Python's `logging` module)
- All stderr output

To add logging to custom Streamlit pages:

```python
import streamlit as st

from dashboard.utils.logging_config import configure_app_logging, log_exception

logger = configure_app_logging("my-page")

try:
    # Your code
    result = risky_operation()
    logger.info(f"Operation succeeded: {result}")
except Exception as e:
    log_exception(logger, f"Operation failed: {e}")
    st.error("An error occurred - check logs for details")
```

See [docs/DASHBOARD.md](docs/DASHBOARD.md) for a detailed usage guide.

## Reports

### Daily Progress Report

Generate automated daily progress reports for stakeholder tracking:

```bash
npm run report:daily
```

This generates a professional HTML report at `reports/daily-progress-YYYY-MM-DD.html` containing:

- **Git Activity**: Commits, files changed, lines added/removed
- **Database Changes**: New sites, outreaches, conversations by status
- **Code Quality**: Test coverage changes, new TODOs
- **System Health**: Recent errors from logs
- **Executive Summary**: Non-technical summary of what shipped

**Report Sections:**

1. **What Shipped** - Features, fixes, and improvements from the last 24 hours
2. **System Metrics** - Database stats, pipeline throughput, quality gates
3. **Recent Commits** - Git activity with commit messages
4. **System Health** - Error summary and known issues
5. **Next Steps** - Recommended actions and blockers

Well suited to CEO updates, investor reporting, and progress documentation.

**Viewing the Report:**

- Open directly in a browser: `file:///path/to/reports/daily-progress-YYYY-MM-DD.html`
- Or use: `xdg-open reports/daily-progress-$(date +%Y-%m-%d).html` (Linux)
- Print-friendly CSS is included for PDF export via the browser's print dialog

## Multi-Country Support

The 333 Method supports **25 countries** with proper localization for Google searches, proposals, and contact validation.

### Supported Countries

**Top 25 Economies by GDP**: US, JP, DE, UK, FR, IT, CA, AU, ES, NL, KR, CH, SE, NO, AT, DK, BE, IE, SG, NZ, PL, IN, MX, ID, CN

### Features

Each country includes:

- **Google Domain**: Country-specific search domains (e.g., `google.com.au`, `google.co.uk`)
- **Geo-targeting**: ZenRows premium proxy support for accurate local results
- **Currency & Symbols**: Automatic currency formatting (£, $, €, ¥, etc.)
- **Date Formats**: Localized date formats (DD/MM/YYYY, MM/DD/YYYY, etc.)
- **Phone Validation**: Country-specific mobile number patterns for SMS prioritization
- **Localization**: AI proposals use local spelling, cultural norms, and business expectations
- **GDPR Compliance**: EU/UK countries include company verification to avoid emailing sole traders
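
The per-country localization above can be pictured as a lookup table plus small formatters. This is an illustrative sketch only; the field names and `COUNTRIES` structure are assumptions, not the project's actual schema.

```javascript
// Hypothetical per-country config records; illustrative values only.
const COUNTRIES = {
  AU: { googleDomain: 'google.com.au', currency: 'AUD', symbol: '$', dateFormat: 'DD/MM/YYYY' },
  UK: { googleDomain: 'google.co.uk', currency: 'GBP', symbol: '£', dateFormat: 'DD/MM/YYYY' },
  US: { googleDomain: 'google.com',    currency: 'USD', symbol: '$', dateFormat: 'MM/DD/YYYY' },
};

// Format a price with the country's currency symbol and thousands separators.
function formatPrice(amount, countryCode) {
  const c = COUNTRIES[countryCode];
  return `${c.symbol}${amount.toLocaleString('en-US')}`;
}
```

A proposal generator could then render `formatPrice(2500, 'UK')` as `£2,500` while the same template produces `$2,500` for US sites.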

### Configuration

Configure geo-targeting in `.env`:

```bash
ZENROWS_PREMIUM=true  # Required for geo-targeting (premium plan only)
```

### Usage

Generate keywords for all countries or a specific country:

```bash
# Generate keywords for ALL 25 countries (adds missing keywords)
npm run keywords generate -- --limit 100

# Generate keywords for a specific country only
npm run keywords generate -- --country UK --limit 100

# Generate keywords for US only
npm run keywords generate -- --country US --limit 50
```

Or manually insert country-specific keywords:

```sql
-- Australian plumber
INSERT INTO keywords (keyword, country_code, google_domain)
VALUES ('plumber sydney', 'AU', 'google.com.au');

-- US plumber
INSERT INTO keywords (keyword, country_code, google_domain)
VALUES ('plumber seattle', 'US', 'google.com');
```

Proposals are automatically localized based on site country (spelling, currency, cultural norms).

**Phone Number Validation**: The system prioritizes mobile numbers based on country patterns (AU: 04xx, UK: 07xx, DE: 015x/016x/017x, etc.).
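
The prefix patterns above can be expressed as simple per-country regexes. A minimal sketch, assuming national-format numbers; the real validation lives elsewhere in the codebase and presumably covers more countries and formats:

```javascript
// Illustrative mobile-number patterns matching the prefixes described above.
const MOBILE_PREFIXES = {
  AU: [/^04\d{8}$/],         // 04xx xxx xxx
  UK: [/^07\d{9}$/],         // 07xxx xxxxxx
  DE: [/^01[567]\d{7,9}$/],  // 015x / 016x / 017x
};

// True when the number looks like a mobile for the given country.
function isMobile(number, countryCode) {
  const digits = number.replace(/[\s\-()]/g, ''); // strip spaces, dashes, parens
  return (MOBILE_PREFIXES[countryCode] || []).some((re) => re.test(digits));
}
```

Outreach could then rank an AU contact list so `isMobile(...)` numbers get SMS first and landlines fall back to other channels.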

**GDPR Compliance**: EU/UK countries filter free emails and verify company registration to prevent sending to sole traders.

See `docs/MULTI-COUNTRY-PLAN.md` for implementation details.

## Keyword Validation

Validate and expand keyword lists using real search volume data from Google via the DataForSEO Labs API.

### Setup

1. Sign up at https://dataforseo.com/ (pay-as-you-go)
2. Add credentials to `.env`:

```bash
DATAFORSEO_LOGIN=your_email@example.com
DATAFORSEO_PASSWORD=your_api_password
KEYWORD_EXPANSION_LIMIT=50        # Max related keywords per seed
```

### How It Works

The system uses DataForSEO's Labs API with three endpoints:

1. **Top Searches** (`/top_searches/live`) - Get 500-1000 popular local keywords per country in ONE call
2. **Keyword Suggestions** (`/keyword_suggestions/live`) - Expand seed keywords with related terms
3. **Keyword Overview** (`/keyword_overview/live`) - Batch ALL keywords in ONE request to get search volumes

This approach is 76% cheaper than the old Keywords Data API ($51 vs $212 for all countries).
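
The savings come from batching: instead of one volume request per keyword, all expanded keywords are deduplicated and sent in as few overview calls as possible. A sketch of that batching step; the 1000-keywords-per-request cap is an assumption for illustration, not a documented DataForSEO limit:

```javascript
// Deduplicate keywords and split them into request-sized batches,
// so volumes come back in a handful of API calls instead of thousands.
function batchKeywords(keywords, maxPerRequest = 1000) {
  const unique = [...new Set(keywords.map((k) => k.trim().toLowerCase()))];
  const batches = [];
  for (let i = 0; i < unique.length; i += maxPerRequest) {
    batches.push(unique.slice(i, i + maxPerRequest));
  }
  return batches; // one /keyword_overview/live call per batch
}
```

For 2,500 expanded keywords this yields 3 requests rather than 2,500, which is where the bulk of the 76% cost reduction comes from.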

### Workflow

**Generate Search Volume Data** (one-time API cost)

```bash
# Generate CSV with ALL keywords and search volumes for all countries
npm run keywords generate-csv -- --type businesses
npm run keywords generate-csv -- --type regions

# Or for a specific country only
npm run keywords generate-csv -- --type businesses --country AU
```

This creates CSV files like `data/au/businesses-search-volume.csv` containing:

- Keyword text
- Monthly search volume (national)
- Competition level (0-100)
- CPC range (cost-per-click for ads)
- Related seed keyword

**Current Approach: Qualitative Filtering Only**

We don't use search volume cutoffs. Instead, keywords are filtered by:

- ✅ Place-specific removal (for businesses)
- ✅ Deterministic filters (jobs, education, products)
- ✅ Brand cleanup (LLM-based)
- ✅ Word count limits (≤3 words for businesses)
- ✅ Region junk removal (LLM-based)

This qualitative approach is more effective than arbitrary search volume thresholds: long-tail keywords can still convert well for local services.
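
The deterministic parts of this filtering reduce to plain string checks. A minimal sketch of the word-count and junk-term rules; the `JUNK_TERMS` list here is illustrative, not the project's actual list, and the LLM-based steps are out of scope:

```javascript
// Illustrative deterministic filters: ≤3 words, no jobs/education junk terms.
const JUNK_TERMS = ['jobs', 'salary', 'course', 'training', 'diy'];

function keepBusinessKeyword(keyword) {
  const words = keyword.trim().toLowerCase().split(/\s+/);
  if (words.length > 3) return false;                          // word count limit
  if (words.some((w) => JUNK_TERMS.includes(w))) return false; // junk-term removal
  return true;
}
```

Keywords that survive these cheap checks then go on to the LLM-based brand and region cleanup passes.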

### Example Results

Test data for "plumber" in Australia:

- **plumber**: 49,500 searches/month, 46% competition, $25.07 CPC
- **plumbers near me**: 49,500 searches/month, 62% competition, $28.16 CPC
- **emergency plumber**: 6,600 searches/month, 76% competition, $36.58 CPC
- **electrician**: 40,500 searches/month, 39% competition, $14.39 CPC

### Cost Breakdown

**IMPORTANT:** Retries are disabled to prevent extra charges. Failed API calls will not retry.

- **Per Keyword Expansion**: $0.075 per API call (no retries)
- **Typical cost per country type**: ~$5-6 (63-80 keywords × $0.075)
- **Full project estimate**: 25 countries × 2 types × $5.50 = ~$275
- **Threshold Adjustments**: FREE (uses local CSV files)

**Cost savings:** Removed automatic retry logic that was causing 49% overhead ($7-8 per country type with retries vs $5-6 without). Failed requests now fail immediately instead of retrying and incurring extra charges.
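
The estimates above are straightforward arithmetic, sketched here so the figures can be re-derived when the per-call price changes:

```javascript
// Back-of-envelope costs matching the breakdown above.
const COST_PER_CALL = 0.075; // dollars per keyword-expansion API call

function expansionCost(keywordCount) {
  return keywordCount * COST_PER_CALL;
}

const lowEnd = expansionCost(63);   // ≈ $4.73 per country type
const highEnd = expansionCost(80);  // = $6.00 per country type
const fullProject = 25 * 2 * 5.5;   // 25 countries × 2 types × ~$5.50 ≈ $275
```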

Test with one country first to verify before processing all:

```bash
npm run keywords generate-csv -- --type businesses --country AU  # ~$5-6
```

### Analysis Tools

After generating CSV files, use these analysis scripts to make informed decisions:

**Coverage Analysis** - See keyword counts across countries

```bash
npm run keywords:coverage                    # All countries
npm run keywords:coverage -- --country AU    # Specific country
npm run keywords:coverage -- --type businesses
```

**Threshold Recommendations** - Find optimal cutoffs

```bash
npm run keywords:recommend -- data/au/businesses-search-volume.csv --scenarios
npm run keywords:recommend -- data/au/businesses-search-volume.csv --target 100
```

Shows scenarios (p50, p75, p90, p95) with keyword counts and search volume impact.
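
Those scenario labels are percentiles over the search-volume column. A nearest-rank sketch; the real tool may interpolate or weight differently:

```javascript
// Nearest-rank percentile: the smallest value with at least p% of the data at or below it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

A p90 scenario would then keep only keywords whose volume is at or above `percentile(volumes, 90)`.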

**Keyword Comparison** - Before/after filtering analysis

```bash
npm run keywords:compare -- data/au/businesses-search-volume.csv data/au/businesses.txt
npm run keywords:compare -- data/au/businesses-search-volume.csv data/au/businesses.txt --show-removed
```

Shows what was kept vs removed, search volume statistics, and missed opportunities.

## Architecture

See `docs/ARCHITECTURE.md` for detailed system design and `docs/TODO.md` for implementation status.

**Core Pipeline**: Keywords → SERPs → Assets → Scoring → Rescoring → Enrich → Proposals → Outreach → Replies

The pipeline is organized into 9 independent stages that can be run individually or together via `npm run all`. Each stage has its own CLI, statistics, and error handling.

**Reliability Features**:

- **Circuit Breakers**: All API calls (LLM providers, ZenRows, Twilio, Resend) are protected by circuit breakers that prevent cascading failures and excessive costs during outages. See `docs/CIRCUIT-BREAKER.md` for details.
- **Retry with Backoff**: Transient failures are automatically retried with exponential backoff.
- **Error Recovery**: Failed operations can be retried via stage-specific commands.
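
The exponential-backoff schedule mentioned above can be sketched as a delay function; the base delay and cap here are illustrative defaults, not the project's configured values:

```javascript
// Exponential backoff: delay doubles per attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
// attempt 0 → 1s, 1 → 2s, 2 → 4s, 3 → 8s, ... capped at 30s
```

A retry wrapper would sleep `backoffDelayMs(attempt)` between attempts, so transient API hiccups resolve quickly while sustained outages back off toward the cap (where the circuit breaker takes over).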

**Site Filtering**:

The pipeline automatically filters out business directories (Yelp, Yellow Pages, Craigslist, etc.) and social media platforms (Facebook, Twitter/X, LinkedIn, etc.) to prevent wasting API credits. Filtering uses a two-tier approach:

1. **Domain Blocklists** (`src/utils/site-filters.js`): Fast domain matching runs at the start of each stage (SERPs, Assets, Scoring, Rescoring, Enrich, Proposals). Blocklisted sites are set to `status='ignore'` with a descriptive `error_message`.

2. **LLM Fallback Detection**: The scoring prompts include an `is_business_directory` field that catches directories missed by the blocklist through visual/HTML analysis. These are also set to `status='ignore'`.

Sites with `status='ignore'` are automatically excluded from all downstream stages. Blocklists can be extended by adding domains to `src/utils/site-filters.js`.
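
Tier-1 matching amounts to comparing a URL's hostname against the blocklist, including subdomains. A minimal sketch; the actual list in `src/utils/site-filters.js` is much longer than this sample:

```javascript
// Sample blocklist; the real one lives in src/utils/site-filters.js.
const BLOCKED_DOMAINS = ['yelp.com', 'yellowpages.com', 'craigslist.org', 'facebook.com'];

// Match the exact domain or any subdomain of it.
function isBlockedUrl(url) {
  const host = new URL(url).hostname.replace(/^www\./, '');
  return BLOCKED_DOMAINS.some((d) => host === d || host.endsWith('.' + d));
}
```

The suffix check matters: `m.facebook.com` should be caught just like `facebook.com`, while a business domain that merely contains a blocked string should not.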

### Stage Selection Criteria & Status Flow

Each stage processes records based on specific criteria. Understanding these criteria ensures no records get lost during processing.

#### Sites Table Status Flow

**High-scoring sites (score > 82)**:

```
found → assets_captured → high_score ✓
```

**Low-scoring sites (score ≤ 82)**:

```
found → assets_captured → scored → rescored → enriched → proposals_drafted → [outreach_sent]
```

**Note**: `outreach_sent` is defined in the schema but not currently set by any stage. Sites remain at `proposals_drafted` after outreach delivery.

**Pipeline Flow Rules**:

- **High-scoring sites (score > 82)**: Marked as `high_score` after initial scoring (end of journey - no outreach needed)
- **Low-scoring sites (score ≤ 82)**: Must complete the full sequence: `scored` → `rescored` → `enriched` → `proposals_drafted`
- **Strict sequential processing**: Each stage only processes sites from the previous stage (no skipping)
- **Failing sites**: Sites that exceed retry limits are marked as `failing` for human review
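
The flow rules above can be sketched as a branch at scoring time plus a strict sequence for low scorers; a simplified model of the documented statuses, not the pipeline's actual code:

```javascript
// Cutoff mirrors config.low_score_cutoff as documented (82).
const LOW_SCORE_CUTOFF = 82;

// Scoring-stage branch: high scorers end their journey, low scorers continue.
function statusAfterScoring(score) {
  return score > LOW_SCORE_CUTOFF ? 'high_score' : 'scored';
}

// Strict sequential flow for low-scoring sites; no stage may be skipped.
const LOW_SCORE_SEQUENCE = ['found', 'assets_captured', 'scored', 'rescored', 'enriched', 'proposals_drafted'];

function nextStatus(current) {
  const i = LOW_SCORE_SEQUENCE.indexOf(current);
  return i >= 0 && i < LOW_SCORE_SEQUENCE.length - 1 ? LOW_SCORE_SEQUENCE[i + 1] : null;
}
```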

#### Stage-by-Stage Selection Criteria

**1. Keywords Stage**

- **Selects from**: `keywords` table
- **Criteria**: `status = 'active'`
- **Ordering**: `priority DESC, search_count ASC`
- **Output**: Selected keywords for SERP scraping
- **Note**: The keywords table has columns: keyword, priority (1-10), status ('active'/'inactive'), search_count, zenrows_count, processed_count, low_scoring_count

**2. SERPs Stage**

- **Selects from**: `keywords` table
- **Criteria**: `status = 'active'`
- **Creates**: New sites with `status = 'found'`

**3. Assets Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status IN ('found', 'assets_captured')`
  - Then filters to sites missing `html_dom` OR incomplete screenshots (needs all 6 files)
- **Success**: Sets `status = 'assets_captured'`
- **Error**: Keeps `status = 'found'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`

**4. Scoring Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'assets_captured'`
  - `score IS NULL OR error_message IS NOT NULL`
- **Success**:
  - Sets `status = 'high_score'` if score > 82 (end of journey)
  - Sets `status = 'scored'` if score ≤ 82 (continues to rescoring)
- **Error**: Keeps `status = 'assets_captured'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`

**5. Rescoring Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'scored'`
  - `score <= 82` (B- or below, configurable via `config.low_score_cutoff`)
  - `screenshot_path IS NOT NULL`
  - `rescored_at IS NULL OR error_message IS NOT NULL`
- **Success**: Sets `status = 'rescored'`
- **Error**: Keeps `status = 'scored'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`
- **Note**: High-scoring sites (`status = 'high_score'`) never enter this stage

**6. Enrich Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'rescored'` (must complete rescoring first)
  - `enriched_at IS NULL OR error_message IS NOT NULL`
  - Then filters to sites without contact forms: `!contactsJson.primary_contact_form?.form_url`
- **Success**: Sets `status = 'enriched'`
- **Error**: Keeps `status = 'rescored'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`
- **Note**: Only low-scoring sites reach this stage (high-scoring sites end at `status = 'high_score'`)

**7. Proposals Stage**

- **Selects from**: `sites` table
- **Criteria**:
  - `status = 'enriched'` (must complete enrichment first)
  - `score >= minScore AND score <= maxScore` (default: 0-82)
  - `NOT EXISTS (SELECT 1 FROM outreaches WHERE site_id = sites.id)`
- **Success**: Creates `outreaches` records with `status = 'pending'`, sets site `status = 'proposals_drafted'`
- **Error**: Keeps `status = 'enriched'` with `error_message` set
- **Retry**: Processes records with `error_message IS NOT NULL`
- **Note**: Only processes low-scoring enriched sites (≤82 by default)

**8. Outreach Stage**

- **Selects from**: `outreaches` table (not sites)
- **Criteria**:
  - `status = 'pending'`
  - `contact_method IS NOT NULL`
  - `contact_uri IS NOT NULL`
- **Success**:
  - Sets outreach `status = 'sent'` or `'delivered'`
  - Updates site `status = 'outreach_sent'` when all outreaches for that site are sent
- **Error**: Sets outreach `status = 'failed'`

**9. Replies Stage**

- **Selects from**: `conversations` table
- **Criteria**: `processed_at IS NULL` (unless the `--all` flag is used)
- **Success**: Sets `processed_at = CURRENT_TIMESTAMP`

#### Record Handling Notes

**High-scoring sites (score > 82) are intentionally excluded**

- Sites with `score > 82` after initial scoring remain at `status = 'high_score'` indefinitely
- They skip rescoring (which only processes sites at `status = 'scored'` with `score <= 82`)
- They cannot reach enrichment (requires `status = 'rescored'`)
- They cannot get proposals (only processes `score <= 82`)
- **This is by design**: The pipeline only targets low-scoring sites (≤82) for outreach opportunities

---

## Command Reference

### Stage-Based Pipeline (Recommended)

The stage-based architecture provides granular control over each pipeline phase. Each stage can be run independently with `--limit` and `--skip` flags.

#### Run Complete Pipeline

```bash
# Run all stages from keywords to replies
npm run all

# Run with a limit per stage
npm run all -- --limit 10

# Skip specific stages
npm run all -- --skip keywords,serps

# Continue on errors
npm run all -- --force
```

#### Individual Stages

**1. Keywords** - Keyword selection and prioritization

```bash
# Process active keywords
npm run keywords

# List all keywords with stats
npm run keywords list

# Generate keyword combinations for a country
npm run keywords generate -- --country UK --limit 10

# Add a new keyword
npm run keywords add "plumber seattle" 8 -- --country US

# Update keyword priority
npm run keywords priority <id> <priority>
```

**2. SERPs** - Scrape search results for keywords

```bash
# Scrape SERPs for active keywords
npm run serps

# Limit keywords to process
npm run serps -- --limit 5

# View SERP statistics
npm run serps stats
```

**3. Assets** - Capture screenshots for sites

```bash
# Capture screenshots for found sites
npm run assets

# Limit sites to capture
npm run assets -- --limit 10

# Backfill missing screenshots
npm run assets backfill 20

# View assets statistics
npm run assets stats
```

**4. Scoring** - Initial AI conversion scoring

```bash
# Score captured sites
npm run scoring

# Limit sites to score
npm run scoring -- --limit 10

# View scoring statistics
npm run scoring stats
```

**5. Rescoring** - Rescore low-scoring sites (B- and below)

```bash
# Rescore sites with B- or below
npm run rescoring

# Limit sites to rescore
npm run rescoring -- --limit 10

# View rescoring statistics
npm run rescoring stats
```

**6. Enrich** - Enrich contact details from key pages

```bash
# Enrich sites without contact forms by browsing key pages
npm run enrich

# Limit sites to enrich
npm run enrich -- --limit 10

# View enrichment statistics
npm run enrich stats
```

**7. Proposals** - Generate personalized proposals

```bash
# Generate proposals for low-scoring sites
npm run proposals

# Limit proposals to generate
npm run proposals -- --limit 10

# Regenerate proposals for specific sites
npm run proposals regenerate <siteId1> <siteId2>

# View proposals statistics
npm run proposals stats
```

**8. Outreach** - Multi-channel outreach delivery

```bash
# Send pending outreaches (auto channel selection)
npm run outreach

# Limit outreaches to send
npm run outreach -- --limit 10

# Send via a specific channel
npm run outreach sms
npm run outreach email
npm run outreach form
npm run outreach x
npm run outreach linkedin

# Retry failed outreaches
npm run outreach retry 10

# View outreach statistics
npm run outreach stats
```

**9. Replies** - Process inbound replies

```bash
# Process new replies
npm run replies

# Show all replies (not just unprocessed)
npm run replies -- --all

# Limit replies to show
npm run replies -- --limit 20

# Process opt-out requests
npm run replies opt-outs

# View replies statistics
npm run replies stats
```

### POC Pipeline (SERP → Score)

- **Purpose**: Proof of concept / manual testing
- **Browser**: Visible (headed) - great for debugging
- **Scope**: Single keyword at a time
- **Output**: Beautiful summary report with grade distribution
- **Use case**: Manual verification, demos, troubleshooting
- **Benefits**:
  - **Visual debugging** - See what's happening in real time
  - **Quality verification** - Manually verify scoring on new business types
  - **Presentations/demos** - Nice visual output
  - **Troubleshooting** - When process.js fails, the POC helps diagnose
  - **Standalone** - Nothing depends on it, so low maintenance burden

```bash
# Process a keyword (scrape SERP, capture screenshots, score sites)
npm run poc "keyword" N
# Example: npm run poc "plumber sydney" 10

# Process N sites from the queue
npm run process N

# Retry failed operations
npm run retry
```

### MVP Pipeline (Full End-to-End)

- **Purpose**: Production automation
- **Browser**: Headless (via captureWebsite)
- **Scope**: Batch processes the entire keywords table
- **Output**: Basic logging
- **Use case**: Automated pipeline for scaling

```bash
# Full MVP pipeline: POC → Propose → Send
npm run mvp run "keyword"
npm run mvp run "keyword" --limit 5
npm run mvp run "keyword" --skip-poc --skip-outreach

# Individual stages
npm run mvp poc "keyword"           # SERP + Score only
npm run mvp propose [min] [max]     # Generate proposals (default: 0-82)
npm run mvp send [limit]            # Send pending outreaches
```

### Proposal Generation

```bash
# Generate proposals for a specific site
npm run proposals generate <site_id>

# Bulk generate for N sites scoring B- to E (0-82)
npm run proposals bulk N

# Regenerate proposals marked for rework
npm run proposals rework

# Show pending outreaches
npm run proposals pending

# Analyze feedback patterns (default: PROPOSAL.md, 30 days)
node src/proposal-generator-v2.js analyze [prompt] [days]
```

### Outreach Approval Workflow

Batch review workflow using Google Sheets for QA collaboration:

```bash
# 1. Generate proposals
npm run proposals

# 2. Export pending outreaches to Google Sheets
npm run outreach:export

# 3. (QA reviews in Google Sheets, sets the Action column: approve/rework/reject)

# 4. Import QA decisions back to the database
npm run outreach:import <sheetId>

# 5. Show approval statistics
npm run outreach:status

# 6. Regenerate proposals marked for rework
node src/proposal-generator-v2.js rework

# 7. Send approved outreaches
npm run outreach
```

**Google Sheets Setup:**

1. Create a project at https://console.cloud.google.com/
2. Enable the Google Sheets API
3. Create a Service Account with the Editor role
4. Generate a JSON key and extract `client_email` + `private_key`
5. Add to `.env`: `GOOGLE_SHEETS_CLIENT_EMAIL`, `GOOGLE_SHEETS_PRIVATE_KEY`
6. (Optional) Create a folder in Google Drive, share it with the service account, and add `GOOGLE_SHEETS_FOLDER_ID` to `.env`

### Contact Prioritization

```bash
# Update contact URIs for a specific site
node src/contacts/prioritize.js update <site_id>

# Bulk update all pending outreaches (with optional limit)
node src/contacts/prioritize.js bulk [N]

# Show outreach readiness report
node src/contacts/prioritize.js report
```

### Outreach Channels

#### Email (Resend)

```bash
# Send single email
node src/outreach/email.js send <outreach_id>

# Bulk send approved emails (with optional limit)
# Note: Automatically syncs unsubscribes before sending
node src/outreach/email.js bulk [N]

# Manually unsubscribe an email
node src/outreach/email.js unsubscribe <outreach_id>

# Sync unsubscribes from the Cloudflare Worker
npm run sync-unsubscribes

# Sync email tracking events (opens, clicks, bounces)
node src/utils/sync-email-events.js

# Test email configuration
node src/outreach/email.js test
```

**Email Tracking**: Resend automatically tracks email opens, clicks, bounces, and spam complaints. Set up the Cloudflare Worker webhook receiver (see [Cloudflare Worker Setup](#cloudflare-worker-setup) below) and run `sync-email-events.js` every 5 minutes via cron.

#### SMS (Twilio)

```bash
# Send single SMS
node src/outreach/sms.js send <outreach_id>

# Bulk send approved SMS (with optional limit)
node src/outreach/sms.js bulk [N]

# Test SMS configuration
node src/outreach/sms.js test
```

#### X / Twitter DMs (Playwright)

```bash
# Send single X DM (headed browser, manual review)
node src/outreach/x.js send <outreach_id>

# Bulk send X DMs
node src/outreach/x.js bulk [N]
```

#### LinkedIn Messages (Playwright)

```bash
# Send single LinkedIn message (headed browser, manual review)
node src/outreach/linkedin.js send <outreach_id>

# Bulk send LinkedIn messages
node src/outreach/linkedin.js bulk [N]
```

#### Contact Forms (Playwright)

```bash
# Submit contact form (interactive by default)
node src/outreach/form.js send <outreach_id>

# Run in headless mode (automated)
node src/outreach/form.js send <outreach_id> --headless

# Bulk submit forms
node src/outreach/form.js bulk [N]
```

### Browser Profiles (X & LinkedIn)

X and LinkedIn outreach use persistent browser profiles to avoid re-login on every run. Profiles store cookies and session data, rotating across multiple accounts using an LRU strategy.

```bash
# List all profiles
npm run profiles list

# List profiles for a specific platform
npm run profiles list x
npm run profiles list linkedin

# Show which profile will be used next (LRU)
npm run profiles next x

# Delete a specific profile
npm run profiles delete x profile-1
```

**First-time setup:** Run outreach 3 times per platform to create 3 rotating profiles. Each run opens a headed browser for manual login, then auto-saves the session for future reuse.
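
The LRU rotation reduces to picking the profile whose last use is oldest. A minimal sketch; the `lastUsedAt` field name is illustrative, not the project's actual profile metadata:

```javascript
// Pick the least-recently-used profile; never-used profiles (no timestamp) win first.
function nextProfile(profiles) {
  return [...profiles].sort((a, b) => (a.lastUsedAt || 0) - (b.lastUsedAt || 0))[0];
}
```

After each outreach run, the chosen profile's timestamp would be refreshed, so the three accounts take turns and no single session is hammered.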

**Configuration (.env):**

- `BROWSER_PROFILES_DIR` - Storage directory (default: `./.browser-profiles`)
- `X_PROFILE_COUNT` - Number of X profiles (default: 3)
- `LINKEDIN_PROFILE_COUNT` - Number of LinkedIn profiles (default: 3)

### Inbound Handling

#### SMS Replies (Twilio)

```bash
# Poll Twilio API for new inbound SMS messages
npm run inbound:sms
# (or: node src/inbound/sms.js poll)

# Process pending operator replies
npm run inbound:process-replies
```

**Setup**:

- Webhooks are handled by Cloudflare Workers (see [Cloudflare Worker Setup](#cloudflare-worker-setup))
- For local testing or as a backup, run the polling command above on a cron every 5 minutes
- Inbound messages are stored in the `conversations` table and matched to outreach records by phone number
- Operator replies marked as `direction='outbound'` are automatically sent via `process-replies`

### Database & Maintenance

```bash
# Initialize/reset database
npm run init-db

# Apply database migrations
npm run db-migrate

# Apply migrations with force mode (for existing databases)
npm run db-migrate -- --force

# Backfill keywords from the sites table
npm run backfill-keywords

# Recapture missing screenshots for N sites
npm run backfill-screenshots N

# Deduplicate domains (locale-aware: prefer exact country match)
npm run dedupe:locale:dry-run  # Preview locale-aware deduplication
npm run dedupe:locale          # Execute locale-aware deduplication

# Legacy deduplication (search volume only, no locale consideration)
npm run dedupe:dry-run  # Preview search volume deduplication
npm run dedupe          # Execute search volume deduplication

# Analyze competitors for a keyword
npm run competitors "keyword"
```
 888  
 889  ### Agent System
 890  
 891  The multi-agent system provides autonomous development, testing, and maintenance. See [docs/06-automation/agent-system.md](docs/06-automation/agent-system.md) for comprehensive documentation.
 892  
 893  ```bash
 894  # View agent status and health
 895  npm run agent:list
 896  
 897  # View pending tasks
 898  npm run agent:tasks
 899  
 900  # View agent logs
 901  npm run agent:logs
 902  
 903  # View tasks awaiting approval
 904  npm run agent:approvals
 905  
 906  # Approve a task
 907  npm run agent:approve -- --task-id 123 --reviewer "Your Name" --decision approved
 908  
 909  # View agent statistics
 910  npm run agent:stats
 911  
 912  # Reset circuit breakers (prepare for activation)
 913  npm run agent:reset-breakers:dry-run  # Preview what would be reset
 914  npm run agent:reset-breakers          # Reset breakers older than 30 minutes
 915  npm run agent:reset-breakers:force    # Force reset all breakers + cleanup old tasks
 916  ```
 917  
 918  **Circuit Breaker Management:**
 919  
 920  Circuit breakers protect against cascading failures by blocking agents with >30% failure rates. Auto-recovery happens after 30 minutes if failure rate drops. See [docs/06-automation/circuit-breaker-management.md](docs/06-automation/circuit-breaker-management.md) for details.
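
The documented thresholds can be sketched as a minimal breaker (an illustration of the rule above, not the agent system's actual implementation):

```javascript
// Block an agent once its failure rate exceeds 30%, and allow a retry
// after a 30-minute cooldown. Defaults mirror the documented behavior.
class CircuitBreaker {
  constructor({ failureThreshold = 0.3, cooldownMs = 30 * 60 * 1000, minSamples = 5 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.minSamples = minSamples;
    this.successes = 0;
    this.failures = 0;
    this.openedAt = null; // timestamp when the breaker tripped
  }

  record(ok, now = Date.now()) {
    ok ? this.successes++ : this.failures++;
    const total = this.successes + this.failures;
    if (total >= this.minSamples && this.failures / total > this.failureThreshold) {
      this.openedAt = now; // trip (or re-trip) the breaker
    }
  }

  // The agent may run while the breaker is closed, or once the cooldown
  // has elapsed (a half-open trial run).
  canRun(now = Date.now()) {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }
}
```

The `minSamples` floor prevents a single early failure from tripping the breaker before there is a meaningful failure rate to measure.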
 921  
 922  Before activating the agent system:
 923  
 924  1. Run `npm run agent:reset-breakers:dry-run` to check status
 925  2. Run `npm run agent:reset-breakers:force` to reset and cleanup
 926  3. Run `npm run agent:list` to verify all agents are active
 927  4. Enable cron: Set `AGENT_SYSTEM_ENABLED=true` in `.env`
 928  
 929  ### Quality & Testing
 930  
```bash
# Run all unit tests (c8 coverage report included)
 933  npm test
 934  
 935  # Run integration tests only (requires RESEND_API_KEY in .env)
 936  npm run test:integration
 937  
 938  # Run all tests (unit + integration)
 939  npm run test:all
 940  
 941  # Watch mode for tests
 942  npm run test:watch
 943  
 947  # Lint code
 948  npm run lint
 949  npm run lint:fix  # Auto-fix issues
 950  
 951  # Format code
 952  npm run format
 953  npm run format:check
 954  
 955  # Full quality check (runs all checks + Sage AI review)
 956  npm run quality-check
 957  
 958  # 🚀 UNIFIED AUTO-FIX (Recommended!)
 959  # Runs ALL automated maintenance tasks in one go:
 960  # - Prettier formatting
 961  # - ESLint auto-fix
 962  # - Security audit fixes (npm audit fix)
 963  # - Dependency updates (patches + minors + npm outdated)
 964  # - Sage AI quality fixes (requires claude CLI in PATH)
 965  # - Documentation checks
 966  # All fixes are committed to shared "autofix" branch for review
 967  npm run autofix
 968  
 969  # View what's in the autofix branch
 970  npm run autofix:summary
 971  
 972  # Individual auto-fix tasks (if you want granular control)
 973  npm run sage:autofix  # AI-powered quality fixes only
 974  ```
 975  
 976  ### Maintenance
 977  
 978  ```bash
 979  # Quick health check (vulnerabilities, lint, tests)
 980  npm run maint:quick
 981  
 982  # Full audit (includes outdated packages)
 983  npm run maint:audit
 984  
 985  # Database integrity check and optimization
 986  npm run maint:db
 987  
 988  # Analyze CLAUDE.md for duplication (non-destructive)
 989  npm run maint:claude
 990  
 991  # Weekly maintenance (all of the above + cleanup)
 992  ./scripts/weekly-maintenance.sh
 993  ```
 994  
 995  **CLAUDE.md Optimization:**
 996  
 997  The project includes tools to keep [CLAUDE.md](CLAUDE.md) optimized:
 998  
 999  - **Analysis**: `npm run maint:claude` analyzes for duplication and generates a report
1000  - **Non-destructive**: Does not modify CLAUDE.md automatically
1001  - **Weekly automation**: Set up with cron (see [docs/CRON-SETUP.md](docs/CRON-SETUP.md))
1002  - **Reports**: Saved to `.claude-analysis/` (git-ignored)
1003  
1004  ```bash
1005  # Run analysis
1006  npm run maint:claude
1007  
1008  # View analysis report
1009  cat .claude-analysis/analysis-2026-02-03.md
1010  
1011  # Set up weekly cron job (optional)
1012  crontab -e
1013  # Add: 0 2 * * 0 cd /path/to/project && ./scripts/weekly-maintenance.sh >> logs/weekly-maintenance.log 2>&1
1014  ```
1015  
1016  **Git Hooks**: Pre-commit and pre-push hooks are automatically installed via `simple-git-hooks`:
1017  
1018  - **Pre-commit**: Runs `npm run format && npm run lint` to ensure code quality
1019  - **Pre-push**: Runs `npm test` to prevent pushing broken code
1020  - Hooks are installed automatically when running `npm install`
1021  - Skip hooks temporarily: `SKIP_SIMPLE_GIT_HOOKS=1 git commit`
1022  
1023  **GitHub Actions**: Automated CI/CD workflows:
1024  
1025  - **PR Quality Check**: Runs on all pull requests and pushes to main
1026  - **Weekly Maintenance**: Scheduled every Monday at 9 AM UTC
1027    - Checks for vulnerabilities and outdated packages
1028    - Runs full test suite with coverage
1029    - Creates GitHub issue if problems found
1030  
1031  **Maintenance Schedule**:
1032  
1033  - **Weekly**: Review Dependabot PRs, check vulnerabilities, review test coverage
1034  - **Monthly**: Database maintenance, documentation audit, unused code review
1035  - See [docs/MAINTENANCE.md](docs/MAINTENANCE.md) for detailed maintenance plan
1036  
1037  ### Monitoring & Credits
1038  
1039  ```bash
1040  # Check OpenRouter credit balance
1041  npm run credits              # Quick check
1042  npm run credits:verbose      # Detailed info with rate limits
1043  npm run credits:monitor      # Run monitoring cron job (logs to DB, alerts if low)
1044  
1045  # Pipeline monitoring
1046  npm run monitor:status       # System health summary (alias: npm run watchdog:status)
1047  npm run monitor:guardian     # Run process guardian manually
1048  
1049  # API rate limit management
1050  npm run rate-limits          # Show current rate limit status
1051  npm run rate-limits:clear    # Clear all rate limits
1052  npm run rate-limits:clear -- --clear zenrows  # Clear specific API
1053  ```
1054  
1055  ### Inbound (npm scripts)
1056  
1057  ```bash
1058  # Poll for new inbound messages
1059  npm run inbound:poll         # Poll all channels
1060  npm run inbound:sms          # Poll Twilio for new inbound SMS
1061  npm run inbound:email        # Poll Resend API for inbound email replies
1062  
1063  # Process operator replies
1064  npm run inbound:process-replies  # Send pending operator replies
1065  
1066  # View conversations
1067  npm run inbound:inbox        # View inbox
1068  npm run inbound:thread       # View conversation thread
1069  npm run inbound:stats        # Inbound statistics
1070  ```
1071  
1072  ### Pricing
1073  
1074  ```bash
1075  # View pricing
1076  npm run pricing:summary      # Summary of all country pricing
1077  npm run pricing:get          # Get price for a country/tier
1078  npm run pricing:tier         # Show tier info
1079  
1080  # Update pricing
1081  npm run pricing:update       # Run weekly repricing (cultural/PPP-adjusted)
1082  npm run pricing:override     # Override price for a specific country
1083  npm run pricing:export       # Export pricing data to JSON
1084  ```
1085  
1086  ### Cleanup & Deduplication
1087  
1088  ```bash
1089  # Screenshot cleanup
1090  npm run cleanup:screenshots      # Remove screenshots for ignored sites
1091  npm run cleanup:screenshots:dry-run
1092  npm run cleanup:uncropped        # Delete uncropped screenshots (saves disk)
1093  npm run cleanup:uncropped:dry-run
1094  
1095  # Site cleanup
1096  npm run cleanup:reset-failing    # Reset failing sites back to prior stage
1097  npm run cleanup:reset-failing:dry-run
1098  
1099  # Outreach deduplication
1100  npm run dedupe:outreaches        # Deduplicate outreach records
1101  npm run dedupe:outreaches:dry-run
1102  
1103  # Contact validation
1104  npm run validate-contacts        # Validate contact data
1105  ```
1106  
1107  ### Security
1108  
1109  ```bash
1110  # Run all security checks
1111  npm run security             # Full suite: audit + lint + snyk + semgrep
1112  npm run security:fix         # Auto-fix everything possible
1113  
1114  # Individual checks
1115  npm run security:audit       # npm vulnerability audit
1116  npm run security:audit:fix   # Auto-fix npm vulnerabilities
1117  npm run security:lint        # Security-focused ESLint rules
1118  npm run security:snyk        # Snyk vulnerability scan
1119  npm run security:semgrep     # Semgrep static analysis
1120  npm run security:scan        # Custom security scan
1121  ```
1122  
1123  ### Dependency Management
1124  
1125  ```bash
1126  npm run deps:update          # Update minor/patch dependencies (tests, rolls back on failure)
1127  npm run deps:update:patches  # Patch updates only (safest)
1128  npm run deps:update:all      # All updates including majors (manual review needed)
1129  npm run deps:update:dry-run  # Preview updates without applying
1130  ```
1131  
1132  ---
1133  
1134  ## Environment Variables
1135  
1136  Create `.env` file with:
1137  
1138  ```
1139  # API Keys
1140  ZENROWS_API_KEY=your_zenrows_key     # SERP scraping
1141  
1142  # LLM Provider (OpenRouter — all LLM calls route through here)
1143  OPENROUTER_API_KEY=your_openrouter_key  # Multi-model AI gateway
1144  
1145  # Email Service (Resend)
1146  RESEND_API_KEY=your_resend_key       # Email delivery
1147  RESEND_TEST_API_KEY=your_test_key    # Optional: For integration tests
1148  
1149  # SMS Service (Twilio)
1150  TWILIO_ACCOUNT_SID=your_twilio_sid   # SMS delivery
1151  TWILIO_AUTH_TOKEN=your_twilio_token
1152  TWILIO_PHONE_NUMBER=+1234567890
1153  
1154  # Sender Details (for email, contact forms, etc.)
1155  SENDER_NAME=Your Name                # Sender name for outreach
1156  SENDER_EMAIL=your@email.com          # Sender email address
1157  SENDER_PHONE=+1234567890             # Sender phone for contact forms
1158  SENDER_COMPANY=Your Company          # Company name
1159  
1160  # Cloudflare Workers (for unsubscribes and email tracking)
1161  UNSUBSCRIBE_WORKER_URL=https://unsubscribe-worker.YOUR-SUBDOMAIN.workers.dev
1162  EMAIL_EVENTS_WORKER_URL=https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev
1163  UNSUBSCRIBE_SECRET=your-secret-key   # HMAC secret for unsubscribe links
1164  
1165  # Optional
1166  DATABASE_PATH=db/sites.db            # Custom database location
1167  
1168  # Browser Stealth Configuration (Bot Detection Avoidance)
1169  STEALTH_LEVEL=standard               # minimal|standard|aggressive
1170  ENABLE_HUMAN_BEHAVIORS=true          # Human-like delays and movements
1171  ENABLE_BEZIER_MOUSE=true             # Bezier curve mouse movements (no teleporting)
1172  TIMEZONE=Australia/Sydney            # Browser timezone (should match IP location)
1173  
1174  # Browser Profiles (X & LinkedIn persistent sessions)
1175  BROWSER_PROFILES_DIR=./.browser-profiles  # Profile storage directory
1176  X_PROFILE_COUNT=3                    # Number of X profiles to rotate
1177  LINKEDIN_PROFILE_COUNT=3             # Number of LinkedIn profiles to rotate
1178  ```
1179  
1180  ---
1181  
1182  ## Database Schema
1183  
1184  Main tables:
1185  
1186  - **sites**: Domain, screenshots, scores, contacts
1187  - **outreaches**: Proposal variants, delivery status, tracking
1188  - **conversations**: Inbound/outbound message threads
1189  - **config**: Global settings (sender email, templates)
1190  - **unsubscribed_emails**: CAN-SPAM compliance (global unsubscribe list)
1191  - **opt_outs**: TCPA/CAN-SPAM opt-outs (phone & email)
1192  - **keywords**: Keyword tracking and scoring (created via migration, not in base schema)
1193  - **migrations**: Migration tracking (auto-created by migration system)
1194  
1195  See `db/schema.sql` for base schema. Keywords table created via `db/migrations/add-keywords-table.sql` and `013-fix-keywords-schema.sql`.
1196  
1197  ### Database Migrations
1198  
1199  The project uses an automated migration system to manage schema changes. Migrations are stored in `db/migrations/` and tracked in a `migrations` table to prevent duplicate execution.
1200  
1201  **Running Migrations:**
1202  
```bash
1204  # Apply all pending migrations
1205  npm run db-migrate
1206  
1207  # For existing databases with manually applied changes, use force mode
1208  npm run db-migrate -- --force
1209  ```
1210  
1211  **Creating New Migrations:**
1212  
1213  1.  Create a new `.sql` file in `db/migrations/` with a sortable prefix (e.g., `006-add-new-field.sql`)
1214  2.  Write your SQL statements (ALTER TABLE, CREATE INDEX, etc.)
1215  3.  Run `npm run db-migrate` to apply it
1216  
1217  **Migration Safety:**
1218  
1219  - Each migration runs in a transaction (automatically rolled back on failure)
1220  - The `migrations` table tracks which migrations have been applied
1221  - Force mode (`--force`) skips migrations that would fail due to existing schema (useful for existing databases)
1222  - Always backup your database before running migrations: `cp db/sites.db db/backup/sites-backup.db`
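
The transaction-per-migration loop can be sketched as follows (the `db` object stands in for the project's SQLite wrapper; method names are illustrative):

```javascript
// Apply each pending migration inside its own transaction and record it in
// the migrations table, so a failed migration rolls back cleanly and is
// retried on the next run.
function runMigrations(db, migrations /* [{ name, sql }], sorted by name */) {
  db.exec(`CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)`);
  const applied = new Set(db.all(`SELECT name FROM migrations`).map((r) => r.name));
  const ran = [];
  for (const m of migrations) {
    if (applied.has(m.name)) continue; // already applied — skip
    db.exec('BEGIN');
    try {
      db.exec(m.sql);
      db.run(`INSERT INTO migrations (name) VALUES (?)`, [m.name]);
      db.exec('COMMIT');
      ran.push(m.name);
    } catch (err) {
      db.exec('ROLLBACK'); // leave the schema untouched on failure
      throw err;
    }
  }
  return ran;
}
```

Recording the migration name inside the same transaction as the schema change is what guarantees the tracking table and the schema never disagree.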
1223  
1224  ---
1225  
1226  ## Cloudflare Worker Setup
1227  
1228  The project uses Cloudflare Workers to handle unsubscribes and email tracking events. This avoids running a 24/7 webhook server.
1229  
1230  ### Initial Setup
1231  
```bash
1233  # Navigate to cloudflare-worker directory
1234  cd cloudflare-worker
1235  
1236  # Install wrangler locally
1237  npm install
1238  
1239  # Login to Cloudflare
1240  npm run login
1241  
1242  # Create R2 buckets
1243  npx wrangler r2 bucket create unsubscribes
1244  npx wrangler r2 bucket create email-events
1245  ```
1246  
1247  ### Deploy Unsubscribe Worker
1248  
```bash
1250  # Set the UNSUBSCRIBE_SECRET (must match your .env)
1251  npx wrangler secret put UNSUBSCRIBE_SECRET
1252  # When prompted, paste your secret from .env
1253  
1254  # Deploy worker
1255  npx wrangler deploy
1256  # Note the worker URL: https://unsubscribe-worker.YOUR-SUBDOMAIN.workers.dev
1257  ```
1258  
1259  ### Deploy Email Events Worker
1260  
```bash
1262  # Deploy the email tracking webhook worker
1263  npx wrangler deploy --config wrangler-resend.toml
1264  # Note the worker URL: https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev
1265  ```
1266  
1267  ### Configure Resend Webhooks
1268  
1269  1.  Log into [Resend Dashboard](https://resend.com/webhooks)
1270  2.  Click "Add Webhook"
1271  3.  Enter webhook URL: `https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev/webhook/resend`
1272  4.  Enable events: `email.opened`, `email.clicked`, `email.bounced`, `email.complained`, `email.delivered`
1273  5.  Save webhook
1274  
1275  ### Enable Domain Tracking in Resend
1276  
1277  1.  Go to [Resend Domains](https://resend.com/domains)
1278  2.  Select your domain
1279  3.  Scroll to "Tracking" section
1280  4.  Enable "Open Tracking" and "Click Tracking"
1281  5.  Save changes
1282  
1283  ### Update .env
1284  
1285  Add the worker URLs to your `.env`:
1286  
1287  ```
1288  UNSUBSCRIBE_WORKER_URL=https://unsubscribe-worker.YOUR-SUBDOMAIN.workers.dev
1289  EMAIL_EVENTS_WORKER_URL=https://resend-webhook-worker.YOUR-SUBDOMAIN.workers.dev
1290  ```
1291  
1292  ### Set Up Cron Jobs
1293  
1294  The 333 Method includes a **database-driven cron system** that manages all scheduled tasks (real-time syncs, pipeline stages, maintenance, quality checks).
1295  
1296  #### Quick Setup
1297  
1298  ```bash
1299  # 1. Initialize cron jobs table
1300  sqlite3 db/sites.db < db/migrations/029-create-cron-jobs-table.sql
1301  
1302  # 2. Seed database with default jobs
1303  npm run cron:migrate
1304  
1305  # 3. View all jobs
1306  npm run cron:list
1307  
1308  # 4. Configure systemd (or use crontab)
1309  # systemd: /etc/systemd/system/333method-cron.timer (runs every 5 minutes)
1310  # OR crontab: */5 * * * * cd /path/to/project && node src/cron.js
1311  ```
1312  
1313  #### Management Commands
1314  
1315  ```bash
1316  # List all jobs (with status and last run times)
1317  npm run cron:list
1318  
1319  # Enable/disable jobs
1320  npm run cron:enable syncEmailEvents
1321  npm run cron:disable runTests
1322  
1323  # View execution logs
1324  npm run cron:logs syncEmailEvents
1325  
1326  # Show statistics
1327  npm run cron:stats
1328  
1329  # Add new job
1330  npm run cron:add -- \
1331    --name "My Task" \
1332    --key myTask \
1333    --handler "npm run my-script" \
1334    --type command \
1335    --interval 60 \
1336    --unit minutes
1337  
1338  # Remove job
1339  npm run cron:remove myTask
1340  ```
1341  
1342  #### Default Jobs
1343  
1344  **Real-time (every 5 minutes)**:
1345  
1346  - Sync email events (opens, clicks, bounces)
1347  - Sync unsubscribes
1348  - Poll inbound SMS
1349  - Run assets stage (5 sites per batch)
1350  - Run scoring stage (10 sites per batch)
1351  - Run rescoring stage (5 sites per batch)
1352  - Run enrichment stage (3 sites per batch)
1353  - Run proposals stage (10 sites per batch)
1354  
1355  > **Note**: Pipeline stages use time-boxed batches to prevent blocking other jobs. Each batch completes in 1-2 minutes and returns control to the cron system. See [docs/CRON-BATCH-STRATEGY.md](docs/CRON-BATCH-STRATEGY.md) for details.
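
The time-boxed pattern the note describes can be sketched as follows (names and defaults are illustrative, not the project's actual stage code):

```javascript
// Process up to `batchSize` items, but stop early once the time budget is
// spent, so no single stage blocks the 5-minute cron cycle. Remaining items
// are picked up by the next cron tick.
async function runTimeBoxedBatch(items, processOne, { batchSize = 10, budgetMs = 90_000 } = {}) {
  const deadline = Date.now() + budgetMs;
  let processed = 0;
  for (const item of items.slice(0, batchSize)) {
    if (Date.now() >= deadline) break; // return control to the cron system
    await processOne(item);
    processed++;
  }
  return processed;
}
```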
1356  
1357  **Daily**:
1358  
1359  - Database maintenance (PRAGMA optimize, integrity check)
1360  - Security audit (npm audit)
1361  - Check outdated dependencies
1362  
1363  **Weekly**:
1364  
1365  - Database vacuum and analyze
1366  - Database backup
1367  - Update Claude Code CLI
1368  - Performance analysis
1369  
1370  **Monthly**:
1371  
1372  - Technical debt review (TODO.md tracking)
1373  - Full security scan
1374  
1375  See [docs/CRON-JOBS.md](docs/CRON-JOBS.md) for detailed documentation, systemd integration, and adding custom jobs.
1376  
1377  ---
1378  
1379  ## Development Workflow
1380  
1381  ### Adding New Features
1382  
1383  1.  **Write code** in `src/` or `scripts/`
1384  2.  **Add tests** in `tests/` (aim for >80% coverage)
1385  3.  **Update this README** with new commands/features
1386  4.  **Run quality checks**: `npm run quality-check`
1387  5.  **Commit changes**
1388  
1389  ### Running Tests
1390  
```bash
1392  # While developing
1393  npm run test:watch
1394  
1395  # Before committing
1396  npm run quality-check
1397  ```
1398  
1399  ### Code Quality Standards
1400  
1401  - **ESLint**: Zero errors (warnings acceptable for complexity/await)
1402  - **Prettier**: Enforced formatting
1403  - **Tests**: 80%+ coverage target
1404  - **Sage AI**: Review enabled for quality-check
1405  
1406  ### Logging and Log Rotation
1407  
1408  All operational npm scripts automatically log to `./logs/` with daily rotation (7-day retention by default).
1409  
**Coverage:** 84 operational commands are logged, including:
1411  
1412  - All pipeline stages (keywords, serps, assets, scoring, rescoring, enrich, proposals, outreach, replies)
1413  - Database operations (init-db, migrations, backfill)
1414  - Sync operations (email events, unsubscribes, inbound processing)
1415  - Security scans, cron jobs, pricing updates
1416  - **Dashboard operations** (Streamlit Python app)
1417  
1418  **Log Files:**
1419  
1420  ```bash
1421  # View logs for a specific script
1422  cat logs/keywords-2026-02-08.log
1423  cat logs/serps-2026-02-08.log
1424  cat logs/outreach-2026-02-08.log
1425  cat logs/dashboard-2026-02-08.log
1426  
1427  # Tail logs in real-time
1428  tail -f logs/all-2026-02-08.log
1429  
1430  # List all log types
ls logs/*.log | sed 's|logs/||' | sed -E 's|-[0-9]{4}-.*||' | sort -u
1432  ```
1433  
1434  **Log Rotation:**
1435  
1436  ```bash
1437  # Rotate logs manually (delete files older than 7 days)
1438  npm run logs:rotate
1439  
1440  # Dry-run to see what would be deleted
1441  npm run logs:rotate:dry-run
1442  
1443  # Rotate with custom retention (30 days)
1444  npm run logs:rotate:30d
1445  ```
1446  
1447  **Automatic Rotation:**
1448  
1449  Add to crontab for daily rotation at 2 AM:
1450  
1451  ```bash
1452  0 2 * * * cd /path/to/project && node src/cron/daily-log-rotation.js
1453  ```
1454  
1455  **Log Format:**
1456  
1457  Each log entry includes:
1458  
1459  - Timestamp (ISO 8601)
1460  - Script name
1461  - Log level (INFO, SUCCESS, WARN, ERROR)
1462  - Message and optional data
1463  
1464  Example:
1465  
1466  ```
1467  [2026-02-08T07:22:23.322Z] [keywords] [INFO] Starting keyword processing
1468  [2026-02-08T07:22:23.405Z] [keywords] [OUTPUT] Processing keyword: plumber seattle
1469  [2026-02-08T07:22:25.123Z] [keywords] [SUCCESS] Completed successfully
1470  ```
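
A formatter producing that entry shape could look like this (an illustrative sketch, not the project's actual logger):

```javascript
// Build one log line in the documented format:
// [ISO 8601 timestamp] [script] [LEVEL] message
function formatLogEntry(script, level, message, date = new Date()) {
  return `[${date.toISOString()}] [${script}] [${level}] ${message}`;
}
```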
1471  
1472  ---
1473  
1474  ## Project Structure
1475  
1476  ```
1477  333/
1478  ├── src/
1479  │   ├── all.js                 # Full pipeline orchestrator
1480  │   ├── stages/                # Stage-based pipeline modules
1481  │   │   ├── keywords.js        # Keyword selection
1482  │   │   ├── serps.js           # SERP scraping
1483  │   │   ├── assets.js          # Screenshot capture
1484  │   │   ├── scoring.js         # Initial AI scoring
1485  │   │   ├── rescoring.js       # Rescore low-scoring sites
1486  │   │   ├── proposals.js       # Proposal generation
1487  │   │   ├── outreach.js        # Multi-channel delivery
1488  │   │   └── replies.js         # Inbound reply processing
1489  │   ├── cli/                   # CLI entry points for stages
1490  │   │   ├── keywords.js
1491  │   │   ├── serps.js
1492  │   │   ├── assets.js
1493  │   │   ├── scoring.js
1494  │   │   ├── rescoring.js
1495  │   │   ├── proposals.js
1496  │   │   ├── outreach.js
1497  │   │   └── replies.js
1498  │   ├── scrape.js              # SERP scraping (ZenRows)
1499  │   ├── capture.js             # Screenshot capture (Playwright)
1500  │   ├── score.js               # Conversion scoring (AI)
1501  │   ├── poc.js                 # POC pipeline orchestration
1502  │   ├── process.js             # Queue-based processing
1503  │   ├── mvp.js                 # MVP pipeline orchestration
1504  │   ├── proposal-generator-v2.js  # Proposal generation
1505  │   ├── competitor-analysis.js    # Competitor research
1506  │   ├── retry-failed.js        # Error recovery
1507  │   ├── contacts/
1508  │   │   └── prioritize.js      # Contact method decision logic
1509  │   ├── outreach/
1510  │   │   ├── email.js           # Resend integration
1511  │   │   ├── sms.js             # Twilio integration
1512  │   │   ├── form.js            # Playwright form automation
1513  │   │   ├── x.js               # X/Twitter DM automation
1514  │   │   └── linkedin.js        # LinkedIn message automation
1515  │   ├── inbound/
1516  │   │   └── sms.js             # Twilio webhook server
1517  │   └── utils/
1518  │       ├── logger.js          # Colored console logging
1519  │       ├── error-handler.js   # Retry logic
1520  │       ├── llm-provider.js    # LLM provider abstraction (OpenRouter/Anthropic)
1521  │       ├── image-optimizer.js # Screenshot optimization
1522  │       ├── flag-parser.js     # CLI flag parsing
1523  │       └── summary-generator.js # Beautiful terminal summaries
1524  ├── tests/                     # Unit & integration tests
1525  ├── scripts/                   # Utility scripts
1526  ├── db/
1527  │   ├── schema.sql             # Database schema
1528  │   ├── migrations/            # Schema migrations
1529  │   └── sites.db               # SQLite database
1530  ├── docs/
1531  │   ├── ARCHITECTURE.md        # System design
1532  │   ├── TODO.md                # Implementation roadmap
1533  │   ├── FUNCTIONAL-SPEC.md     # Feature specifications
1534  │   ├── MULTI-COUNTRY-PLAN.md  # Internationalization plan
1535  │   ├── BEST-PRACTICES-EMAIL.md  # Email compliance (CAN-SPAM)
1536  │   └── prompts/               # AI prompts
1537  └── .clinerules/               # Cline automation rules
1538  ```
1539  
1540  ---
1541  
1542  ## Testing
1543  
```bash
1545  # Run specific test file
1546  node --test tests/prioritize.test.js
1547  
1548  # Run with coverage (c8, included in npm test)
1549  npm test
1550  
# Current coverage: ~64% (target: 80%+)
1552  ```
1553  
1554  ---
1555  
1556  ## Troubleshooting
1557  
1558  ### Database locked errors
1559  
```bash
1561  # Close all connections, then:
1562  sqlite3 db/sites.db "PRAGMA optimize;"
1563  ```
1564  
1565  ### Playwright browser issues
1566  
```bash
1568  # In nix-shell, Playwright uses system chromium
1569  # Outside nix-shell:
1570  npx playwright install chromium
1571  ```
1572  
1573  ### Missing screenshots
1574  
```bash
1576  npm run backfill-screenshots 10
1577  ```
1578  
1579  ### API rate limits
1580  
1581  - **ZenRows SERP API**:
1582    - **Daily quota**: 1,000 requests/day
1583    - **Concurrency**: Configurable via `ZENROWS_CONCURRENCY` env var (default: 20)
1584    - Plan limits: Free (5), Developer (10), Business (100), Enterprise (custom)
1585    - Reference: [ZenRows Concurrency Docs](https://docs.zenrows.com/universal-scraper-api/features/concurrency)
- **LLM Providers**:
  - OpenRouter: Pay-per-use (GPT-4o-mini: ~$0.15/1M tokens)
  - Anthropic: Direct API pricing (Claude 3.5 Sonnet: $3/$15 per 1M tokens)
- **Resend**: 100 emails/day (free tier)
- **Twilio**: Pay-per-message (~$0.0075/SMS)
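
Client-side, a cap like `ZENROWS_CONCURRENCY` can be enforced with a small promise limiter (an illustrative sketch, not the project's scraper code):

```javascript
// Cap the number of in-flight async tasks; extra tasks wait in a FIFO queue.
// `task` is any function returning a promise (e.g. one SERP request).
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task()
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // start the next queued task, if any
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```

Usage: `const limit = createLimiter(Number(process.env.ZENROWS_CONCURRENCY) || 20);` then wrap each request as `limit(() => fetchSerp(url))`.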
1591  
1592  ---
1593  
1594  ## Unsubscribe System (CAN-SPAM Compliance)
1595  
1596  The project includes a complete unsubscribe system for email compliance:
1597  
1598  ### Architecture
1599  
1600  1.  **Static HTML Page** (`public/unsubscribe.html`) - User-facing unsubscribe page
1601  2.  **Cloudflare Worker** (`cloudflare-worker/`) - Handles unsubscribe requests and stores them
1602  3.  **Local Sync Script** (`src/utils/sync-unsubscribes.js`) - Polls worker and imports to SQLite
1603  4.  **Global Unsubscribe List** (`unsubscribed_emails` table) - Blocks all future emails to unsubscribed addresses
1604  
1605  ### Setup
1606  
1607  1.  Deploy the Cloudflare Worker (see `cloudflare-worker/README.md` for instructions)
1608  2.  Upload `public/unsubscribe.html` to your static hosting
1609  3.  Set `UNSUBSCRIBE_WORKER_URL` in `.env`
1610  4.  Set `UNSUBSCRIBE_BASE_URL` in `.env` to point to your static page
1611  5.  Generate a secure `UNSUBSCRIBE_SECRET` (same in `.env` and Cloudflare Worker)
1612  
1613  ### Usage
1614  
```bash
1616  # Sync unsubscribes from Cloudflare Worker
1617  npm run sync-unsubscribes
1618  
1619  # Manually unsubscribe an email
1620  node src/outreach/email.js unsubscribe <outreach_id>
1621  
1622  # Bulk send automatically syncs before sending
1623  node src/outreach/email.js bulk 10
1624  ```
1625  
1626  ### How It Works
1627  
1628  1.  Each email includes a unique HMAC-secured unsubscribe link
1629  2.  User clicks link → redirected to static HTML page
1630  3.  Page POSTs to Cloudflare Worker with ID and token
1631  4.  Worker validates token and appends to `unsubscribes.json` in R2
1632  5.  Local script polls worker and imports to `unsubscribed_emails` table
1633  6.  Future sends automatically check global unsubscribe list
1634  
1635  ### Security
1636  
1637  - HMAC tokens prevent unauthorized unsubscribes
1638  - Timing-safe comparison prevents timing attacks
1639  - Same secret must be in `.env` and Cloudflare Worker environment
1640  - Generate secret with: `openssl rand -hex 32`
1641  
1642  ---
1643  
1644  ## Support & Documentation
1645  
1646  - **Architecture**: See `docs/ARCHITECTURE.md`
1647  - **Implementation Status**: See `docs/TODO.md`
1648  - **Compliance**: See `docs/BEST-PRACTICES-EMAIL.md` and `docs/BEST-PRACTICES-SMS.md`
1649  - **Prompts**: See `docs/prompts/` for AI prompt templates
1650  
1651  ---
1652  
1653  ## License
1654  
1655  MIT
1656  