---
title: 'Architecture'
category: 'architecture'
last_verified: '2026-03-13'
related_files:
  - 'src/stages/'
  - 'scripts/claude-orchestrator.sh'
  - 'scripts/claude-batch.js'
  - 'scripts/claude-store-wrapper.js'
  - 'scripts/monitoring-checks.sh'
  - 'src/utils/programmatic-scorer.js'
  - 'src/utils/circuit-breaker.js'
  - 'src/api/free-score-api.js'
  - 'website/scan.php'
tags: ['architecture', 'database', 'api', 'ai', 'llm', 'email', 'sms', 'scoring', 'orchestrator']
status: 'current'
replaces: []
---

# 333 Method Automation: Architecture

> **Note**: Sections below marked _(POC/MVP design)_ document historical design decisions. The **Current Architecture** section reflects the production system as of 2026-03-13.

## Current Architecture (2026-03-13)

Fully automated SERP→outreach pipeline. Node.js + Playwright + SQLite. All LLM work routes through `claude -p` (Claude Max subscription, $0 incremental cost) via the orchestrator.

### Pipeline Stages

9 independent stages (`src/stages/`):

```mermaid
flowchart LR
    KW["Keywords"] --> SERP["SERPs"]
    SERP -->|found| ASSET["Assets"]
    ASSET -->|assets_captured| SCORE["Scoring"]
    SCORE -->|prog_scored| RESCORE["Rescoring"]
    RESCORE -->|semantic_scored\nvision_scored| ENRICH["Enrich"]
    ENRICH -->|enriched| PROP["Proposals"]
    PROP -->|proposals_drafted| OUT["Outreach"]
    OUT -->|outreach_sent\noutreach_partial| DONE((" "))

    SCORE -->|"≥82"| HS["high_score"]
    RESCORE -->|"≥82"| HS

    IN["Inbound"] --> REPLY["Replies"]

    style HS fill:#4a4,stroke:#333,color:#fff
    style DONE fill:#333,stroke:#333
```

| Stage     | Entry Status                        | Exit Status                          | Key Flag                |
| --------- | ----------------------------------- | ------------------------------------ | ----------------------- |
| SERPs     | _(keyword)_                         | `found`                              | —                       |
| Assets    | `found`                             | `assets_captured`                    | —                       |
| Scoring   | `assets_captured`                   | `prog_scored`                        | `ENABLE_LLM_SCORING`    |
| Rescoring | `prog_scored`                       | `semantic_scored` / `vision_scored`  | `ENABLE_VISION`         |
| Enrich    | `semantic_scored` / `vision_scored` | `enriched`                           | `ENABLE_ENRICHMENT_LLM` |
| Proposals | `enriched`                          | `proposals_drafted`                  | —                       |
| Outreach  | `proposals_drafted`                 | `outreach_sent` / `outreach_partial` | `SKIP_STAGES`           |
| Replies   | _(inbound)_                         | _(processed)_                        | —                       |

**Terminal statuses**: `high_score` (≥82, doesn't need help), `ignored` (blocklisted/excluded), `failing` (permanent error).

### Pipeline Loops

The system runs **4 parallel loops**. Three run inside `src/pipeline-service.js`; the fourth is `scripts/claude-orchestrator.sh`.

**Terminology:**

- **Stage** — a named unit of code work in a pipeline-service loop. Processes N sites per cycle using local code or external APIs (Playwright, ZenRows, Twilio, Resend). Controlled by `SKIP_STAGES`.
- **Batch** — a named LLM job sent to the `claude_loop` orchestrator. Each batch is one `claude -p` invocation → JSON response for a specific task type. Batches run asynchronously via Claude Max (zero incremental cost).
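
As a rough illustration of how the entry statuses in the stage table and the `SKIP_STAGES` flag might combine, the sketch below maps a site's status to the stage that would pick it up next. The function names, the lowercase stage keys, and the comma-separated format of `SKIP_STAGES` are assumptions made for the example; the real logic in `src/pipeline-service.js` may be organized differently.

```javascript
// Entry statuses per stage, copied from the stage table above.
// The object layout and lowercase keys are illustrative assumptions.
const STAGE_ENTRY = {
  assets: ['found'],
  scoring: ['assets_captured'],
  rescoring: ['prog_scored'],
  enrich: ['semantic_scored', 'vision_scored'],
  proposals: ['enriched'],
  outreach: ['proposals_drafted'],
};

// Terminal statuses never re-enter the pipeline.
const TERMINAL = new Set(['high_score', 'ignored', 'failing']);

// Assumption: SKIP_STAGES is a comma-separated list of stage names.
function parseSkipStages(value = '') {
  return new Set(
    value
      .split(',')
      .map((name) => name.trim().toLowerCase())
      .filter(Boolean)
  );
}

// Returns the stage that should process a site next, or null when the
// status is terminal, unknown, or the matching stage is currently skipped.
function nextStage(status, skipStages = new Set()) {
  if (TERMINAL.has(status)) return null;
  const entry = Object.entries(STAGE_ENTRY).find(([, statuses]) =>
    statuses.includes(status)
  );
  if (!entry) return null;
  const [stage] = entry;
  return skipStages.has(stage) ? null : stage;
}
```

For example, `nextStage('prog_scored')` yields `'rescoring'`, while `nextStage('proposals_drafted', parseSkipStages('outreach'))` yields `null` because the outreach stage is skipped.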

| Loop            | Runs in                                   | Unit    | Stages / Batches | DB tracking column      |
| --------------- | ----------------------------------------- | ------- | ---------------- | ----------------------- |
| `browser_loop`  | `pipeline-service.js` `runBrowserLoop()`  | Stages  | Assets, Enrich | `last_browser_loop_at`  |
| `api_loop`      | `pipeline-service.js` `runApiLoop()`      | Stage   | SERPs | `last_api_loop_at`      |
| `outreach_loop` | `pipeline-service.js` `runOutreachLoop()` | Stages  | Outreach, Replies | `last_outreach_loop_at` |
| `claude_loop`   | `claude-orchestrator.sh --loop`           | Batches | `score_sites`, `score_semantic`, `enrich_sites`, `proposals_*`, `reword_*`, `proofread`, `classify_replies`, `reply_responses`, `extract_names`, `oversee`, `classify_errors` | _(systemd timer)_       |

`browser_loop` and `api_loop` are separated so Playwright work never blocks SERP scraping. `outreach_loop` is separated so 3-day cooldown idle time never starves reply processing.

### Scoring Architecture (3-Stage Model)

```mermaid
flowchart TD
    AC["assets_captured"] --> PROG["Programmatic Scoring\nsrc/utils/programmatic-scorer.js\n(rule-based DOM/regex, 7 factors)"]
    PROG -->|prog_scored| SEM["Semantic Scoring\norchestrator score_semantic (Haiku)\n(headline, value prop, USP)"]
    SEM -->|semantic_scored| NEXT["→ Enrich"]

    PROG -->|"ENABLE_LLM_SCORING=true"| LLM["LLM Scoring\norchestrator score_sites (Sonnet)\n(full CRO analysis)"]
    LLM -->|prog_scored| SEM

    SEM -->|"ENABLE_VISION=true"| VIS["Vision Rescoring\nOpenRouter GPT-4o-mini\n(below-fold screenshot + contacts)"]
    VIS -->|vision_scored| NEXT

    SEM -->|"score ≥ 82"| HS["high_score (skipped)"]
    VIS -->|"score ≥ 82"| HS

    style HS fill:#4a4,stroke:#333,color:#fff
```

**Default path** (`ENABLE_LLM_SCORING=false`, `ENABLE_VISION=false`):

1. **Programmatic** (`programmatic-scorer.js`) — rule-based DOM/regex for 7 structural factors → `prog_scored`
2. **Semantic** (orchestrator `score_semantic`, Haiku) — headline_quality, value_proposition, USP → `semantic_scored`
3. Zero OpenRouter cost; Haiku via Claude Max

**Optional paths**:

- `ENABLE_LLM_SCORING=true` — Sonnet full CRO scoring via orchestrator `score_sites` batch (replaces OpenRouter GPT-4o-mini)
- `ENABLE_VISION=true` — GPT-4o-mini below-fold vision rescoring → `vision_scored` (OpenRouter, ~$0.003/site)

**Threshold**: Sites scoring ≥ 82 (B- and above) → `high_score` (skipped). Sites < 82 proceed to enrichment/outreach.

### LLM Work (Claude Max Orchestrator)

All LLM calls run through `scripts/claude-orchestrator.sh` via `claude -p`. Output passes through `claude-store-wrapper.js` (JSON repair) → `claude-store.js` (DB persistence).

```mermaid
flowchart TD
    subgraph orch["claude-orchestrator.sh --loop"]
        direction TB
        CAN["Canary Check\n(node, claude-batch, claude CLI)"]

        subgraph opus["Opus Batches"]
            PE["proposals_email (5)"]
            PS["proposals_sms (10)"]
            RE["reword_email (10)"]
            RS["reword_sms (30)"]
            RF["reword_form (20)"]
            RL["reword_linkedin (15)"]
            RX["reword_x (20)"]
            RR["reply_responses (10)"]
            PR["proofread (50)"]
        end

        subgraph sonnet["Sonnet Batches"]
            SS["score_sites (10)"]
            OV["oversee (1)\n30min gate"]
        end

        subgraph haiku["Haiku Batches"]
            SC["score_semantic (20)"]
            ES["enrich_sites (5)"]
            CR["classify_replies (50)"]
            EN["extract_names (50)"]
            CE["classify_errors (1)\n4h gate"]
        end
    end

    orch --> WRAP["claude-store-wrapper.js\n(JSON repair)"]
    WRAP --> STORE["claude-store.js\n(DB persist)"]
```

**16 batch types** (run via `--loop` or individually via `--type <name>`):

| Batch              | Model  | Size | Purpose                                     |
| ------------------ | ------ | ---- | ------------------------------------------- |
| `proposals_email`  | Opus   | 5    | Generate email proposals                    |
| `proposals_sms`    | Opus   | 10   | Generate SMS proposals                      |
| `reword_email`     | Opus   | 10   | Improve email messaging                     |
| `reword_sms`       | Opus   | 30   | Improve SMS (<160 chars)                    |
| `reword_form`      | Opus   | 20   | Improve form submissions                    |
| `reword_linkedin`  | Opus   | 15   | Improve LinkedIn DMs                        |
| `reword_x`         | Opus   | 20   | Improve X/Twitter DMs                       |
| `score_semantic`   | Haiku  | 20   | Headline/value prop/USP scoring             |
| `score_sites`      | Sonnet | 10   | Full CRO scoring on HTML                    |
| `enrich_sites`     | Haiku  | 5    | Contact extraction from HTML                |
| `proofread`        | Opus   | 50   | QA approve/rework/reject messages           |
| `classify_replies` | Haiku  | 50   | Inbound intent + sentiment                  |
| `extract_names`    | Haiku  | 50   | First name from email addresses             |
| `reply_responses`  | Opus   | 10   | Sales funnel replies (time-critical)        |
| `oversee`          | Sonnet | 1    | Pipeline health check (30min gate)          |
| `classify_errors`  | Haiku  | 1    | Regex patterns for unknown errors (4h gate) |

**Conservation mode**: At ≥80% of 5h window or ≥90% weekly Claude Max usage:

- **Always runs** (time-critical/cheap): `reply_responses`, `classify_replies`, `extract_names`, `oversee`, `classify_errors`
- **Deferred**: `proposals_*`, `reword_*`, `score_*`, `enrich_sites`, `proofread`

**Canary check** (once per cycle): Verifies `node`, `claude-batch.js`, and `claude` CLI are executable before processing any batches. Catches glibc/PATH/symlink issues early.

**JSON repair pipeline** (`claude-store-wrapper.js`): Unwraps Claude envelope → strips markdown fences → escapes control chars → iterative quote repair (up to 20 passes) → NDJSON reconstruction → validation. Failures dump to `logs/orch-fail-{batch_type}.json`.
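
The first three repair steps (envelope unwrap, fence stripping, control-character escaping) can be sketched roughly as follows. This is not the code in `claude-store-wrapper.js`: the helper names are hypothetical, the envelope is assumed to be a JSON object carrying the model text in a `result` string field, and the quote-repair and NDJSON steps are omitted.

```javascript
// Built at runtime so this example does not contain a literal code fence.
const FENCE = '`'.repeat(3);
const FENCE_RE = new RegExp(FENCE + '(?:json)?\\s*\\n([\\s\\S]*?)\\n?' + FENCE);

// Assumption: the CLI envelope is a JSON object with the text in `result`.
function unwrapEnvelope(raw) {
  try {
    const parsed = JSON.parse(raw);
    if (parsed && typeof parsed.result === 'string') return parsed.result;
  } catch (err) {
    // Not valid JSON as a whole; treat the input as bare model output.
  }
  return raw;
}

// Returns the contents of the first fenced block, or the text unchanged.
function stripMarkdownFences(text) {
  const match = text.match(FENCE_RE);
  return match ? match[1] : text;
}

// Escapes raw control characters (e.g. literal newlines) that appear
// inside JSON string literals, where they are not allowed unescaped.
function escapeControlCharsInStrings(text) {
  let out = '';
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (!inString) {
      if (ch === '"') inString = true;
      out += ch;
    } else if (escaped) {
      escaped = false;
      out += ch;
    } else if (ch === '\\') {
      escaped = true;
      out += ch;
    } else if (ch === '"') {
      inString = false;
      out += ch;
    } else if (ch < ' ') {
      out += '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
    } else {
      out += ch;
    }
  }
  return out;
}

// Runs the three steps in order and parses the result; throws if the
// text is still not valid JSON after repair.
function repairAndParse(raw) {
  const text = stripMarkdownFences(unwrapEnvelope(raw));
  return JSON.parse(escapeControlCharsInStrings(text));
}
```

Keeping each step as a small pure function makes it straightforward to log which step a failing payload reached before it is dumped for inspection.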

### Monitoring (3 Tiers)

```mermaid
flowchart TD
    subgraph T1["Tier 1 — Cron (systemd timer)"]
        PG["Process Guardian\n(1min)"]
        PM["Pipeline Monitor\n(5min)"]
        SH["System Health\n(30min)"]
    end

    subgraph T2["Tier 2 — Agents (disabled)"]
        AG["AGENT_SYSTEM_ENABLED=false"]
    end

    subgraph T3["Tier 3 — Claude Code AFK"]
        MC["monitoring-checks.sh\n(30min)"]
    end

    subgraph ORC["Orchestrator Overseer"]
        OV2["oversee batch (Sonnet)\n30min gate"]
    end

    OV2 -->|actions| RST["RESTART_PIPELINE"]
    OV2 -->|actions| CLR["CLEAR_STALE_TASKS"]
    OV2 -->|actions| RES["RESET_STUCK_SITES"]

    style T2 fill:#555,stroke:#333,color:#999
```

- **Tier 1 — Cron** (`src/cron/`, systemd timer): Process Guardian (1min), Pipeline Monitor (5min), System Health (30min)
- **Tier 2 — Agents** (currently `AGENT_SYSTEM_ENABLED=false`)
- **Tier 3 — Claude Code AFK**: `bash scripts/monitoring-checks.sh` every 30min; finds blind spots, fixes code, commits
- **Orchestrator overseer**: Runs inside orchestrator loop (30min gate) — autonomous RESTART_PIPELINE / CLEAR_STALE_TASKS / RESET_STUCK_SITES. Replaced the former standalone `sonnet-overseer.js` cron job.
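
The overseer's 30-minute gate and fixed action set could be enforced along these lines. This is only a sketch: the function names, the millisecond timestamps, and the idea of filtering model output against an allow-list are assumptions for illustration, not taken from `claude-orchestrator.sh`.

```javascript
// Gate interval from the text above: the oversee batch runs at most
// once every 30 minutes.
const OVERSEE_GATE_MS = 30 * 60 * 1000;

// The three corrective actions named in this section.
const ALLOWED_ACTIONS = new Set([
  'RESTART_PIPELINE',
  'CLEAR_STALE_TASKS',
  'RESET_STUCK_SITES',
]);

// True when the batch has never run or the gate interval has elapsed.
// Timestamps are epoch milliseconds (an assumption of this sketch).
function gateOpen(lastRunAt, now = Date.now(), gateMs = OVERSEE_GATE_MS) {
  return lastRunAt == null || now - lastRunAt >= gateMs;
}

// Keeps only known actions so that unexpected model output cannot
// trigger anything outside the allow-list.
function filterActions(actions) {
  if (!Array.isArray(actions)) return [];
  return actions.filter((action) => ALLOWED_ACTIONS.has(action));
}
```

Under these assumptions, `filterActions(['RESTART_PIPELINE', 'DROP_TABLES'])` keeps only `RESTART_PIPELINE`.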

### Circuit Breakers & Auto-Skip

API calls in pipeline stages are wrapped with circuit breakers (`src/utils/circuit-breaker.js`):

```mermaid
flowchart LR
    API["API Call"] --> CB{"Circuit\nBreaker"}
    CB -->|CLOSED| OK["Normal"]
    CB -->|OPEN| SKIP["Stage Skipped"]
    CB -->|HALF_OPEN| TEST["Recovery Test"]

    ERR["5xx / Timeout\n429 / 402"] -->|threshold| CB
    RL["Rate Limit\nDetected"] --> RLS["rate-limit-scheduler\nauto-adds to SKIP_STAGES"]
    RLS --> SKIP

    OK -->|success| CLOSE["CLOSED"]
    TEST -->|success| CLOSE
    TEST -->|failure| OPEN["OPEN"]
```

| Breaker    | API                  | Timeout | Limit Type | Stages                                |
| ---------- | -------------------- | ------- | ---------- | ------------------------------------- |
| OpenRouter | AI scoring/proposals | 120s    | hourly     | scoring, rescoring, proposals         |
| ZenRows    | SERP scraping        | 300s    | daily      | serps                                 |
| Twilio     | SMS sending          | 30s     | hourly     | outreach                              |
| Resend     | Email sending        | 30s     | hourly     | outreach                              |
| ZeroBounce | Email validation     | 90s     | hourly     | _(fail-open, doesn't block outreach)_ |

**Auto-skip**: When a breaker opens (429/402), `rate-limit-scheduler.js` dynamically adds affected stages to `SKIP_STAGES`. When the rate limit window resets, stages are automatically re-enabled. No manual `.env` editing needed.

- State persists in `logs/rate-limits.json`
- Check: `npm run rate-limits`

### Inbound Sales Funnel (Free Website Scanner)

Complements the outbound pipeline with a self-service inbound channel. Prospects enter their own URL, get a free score, and self-select into paid products.

**Funnel flow:**

```mermaid
flowchart TD
    AD["Ad\n(Google / Facebook / LinkedIn)"] --> SCAN["BRAND_URL/scan\n(enter URL)"]
    SCAN -->|"POST /api/score"| API["Node.js Scoring API\n(programmatic scorer, $0/scan)"]
    API --> SCORE["Instant score + grade\n+ traffic lights"]
    SCORE --> EMAIL["Email capture\n(gates factor breakdown)"]
    EMAIL --> PEEK["Free peek\n(worst factor, detailed)"]
    PEEK --> QF["$47 Quick Fixes Report\n(tripwire)"]
    QF --> FULL["$297 Full CRO Audit\n(core product)"]
```

**Architecture:**

```mermaid
flowchart TD
    subgraph PHP["PHP Frontend (Hostinger)"]
        SCANPHP["scan.php — URL input + animated score reveal"]
        APIPHP["api.php — proxies to Node.js API, caches results"]
        MAINJS["main.js — scanner animations, email capture, PayPal"]
    end

    PHP -->|"POST /api/score"| NODE

    subgraph NODE["Node.js Scoring API (NixOS, systemd)"]
        FREEAPI["src/api/free-score-api.js"]
        FETCH["Fetches HTML via HTTP (not Playwright)"]
        SCORER["scoreWebsiteProgrammatically() — $0"]
        RATE["Rate limits: 10/IP/hour, CAPTCHA after 3"]
        CACHE["Cache: same URL → 24h"]
    end

    NODE --> DB

    subgraph DB["SQLite (free_scans table)"]
        COLS["scan_id, url, domain, email, score, grade, score_json\nutm_source/medium/campaign, converted_to, expires_at\nIndustry benchmarks from 23,990+ scored sites"]
    end
```

**Product ladder:**

| Tier              | Product                    | Price | Reveals                                                        |
| ----------------- | -------------------------- | ----- | -------------------------------------------------------------- |
| Free (pre-email)  | Score + grade + percentile | $0    | Overall score only                                             |
| Free (post-email) | Traffic lights + free peek | $0    | 10 factor indicators (red/amber/green) + weakest factor detail |
| Quick Fixes       | 3-5 page PDF               | $47   | All 10 factor scores + top 3 fixes with exact copy             |
| Full Audit        | Comprehensive report       | $297  | Vision analysis + screenshots + full action plan + competitors |

**Key files:**

- `src/api/free-score-api.js` — Scoring API endpoint (Express)
- `src/utils/programmatic-scorer.js` — Core scoring engine (reused, zero cost)
- `website/scan.php` — Scanner landing page (in the website repo)
- `website/assets/js/scanner.js` — Frontend scan flow (in the website repo)
- `db/migrations/095-free-scans.sql` — Schema migration

**Email drip (post-scan, no purchase):** 5 emails over 14 days via Resend. Day 0: recap, Day 2: free tip, Day 5: social proof, Day 7: results expiring, Day 14: auto re-scan.

---

## Original Design (POC/MVP — historical reference)

Target: 90% automated SERP→outreach pipeline for low-score local biz sites. Local Node.js + Playwright stack with Cline-assisted development.

## Functional Requirements

**SERP Scraping**:

- Input: Keyword (e.g. "plumber Seattle"), directory (Yelp/Angi first).
- Extract: Top 10 URLs + SERP contacts.
- Tool: ZenRows API (`$49/10K results`; free tier: 1K).

**Site Processing** (parallel tabs, max 15):

- Visit URL → wait full load (network idle).
- Capture: Domain, landing URL, keyword, above-fold screenshot (Playwright native), below-fold (scroll+clip), mobile above-fold, full rendered DOM.
- **Screenshot Optimization**: Apply intelligent cropping (DOM-based nav/footer removal + Sharp attention-based resizing) to reduce LLM token usage by 20-35%.
- **Contact Check**: Parse DOM for ≥2 primaries (email regex, tel:, form action or link to contact page). If <2: Inject operator panel + use Playwright's `page.pause()` for human intervention.

**LLM Scoring** (GPT-4o-mini via OpenRouter SDK):

- Prompt: Assets + "Score A+-F on \[factors: offer/CTA/USP/etc.\] + extract contacts if low".
- Post-SERP: Find top scorer → competitor for lows.

**Outreach Generation** (only for scores B- to E):

- LLM: 3x proposal variants + 3x subject variants (prompt includes competitor data, scores, ≤5 sales samples for keyword, ≤5 samples for country, best practices docs).
- **Channel Selection Logic**: Prioritize by engagement (SMS > Contact Form > Email > Social). Match variant style to channel (Variant 1=short/urgent for SMS, Variant 2=professional for forms, Variant 3=detailed for email).

**Dispatch & Store**:

- Send/track each variant via appropriate channel.
- SQLite: `outreaches` table with delivery tracking, click tracking, and sale flag.
- **Operator Tools**: DBeaver for data edits (POC/MVP), Streamlit dashboard (Phase 3).

**Edge Handling**:

- **Bot Evasion**: puppeteer-extra-plugin-stealth for anti-detection.
- **Parallelization**: Max 15 concurrent pages; monitor memory/tab limits.
- **Error Recovery**: Retry logic with exponential backoff for API failures; timeout handling (30s max per site).
- **Cline Integration**: Capture console errors via `page.on('console')` and `page.on('pageerror')` for automated debugging.
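
The error-recovery bullet above (exponential backoff plus a per-site timeout) might look something like this in plain Node.js. The helper names, the 500 ms base delay, the 30 s delay cap, and the retry count are all assumptions chosen for the sketch; only the 30 s per-site timeout comes from the text.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base * 2^attempt,
// capped at maxMs. Defaults are illustrative.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Calls `fn` until it succeeds or `retries` extra attempts are used up,
// waiting an exponentially growing delay between attempts.
async function withRetry(fn, { retries = 3, baseMs = 500, timeoutMs = 30000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(fn(attempt), timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

With the defaults, the waits between attempts are 500 ms, 1 s, and 2 s. Note that `withTimeout` only stops waiting; it does not cancel the underlying work, so a real implementation would also need to close the page or abort the request.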

# Overall Architecture

```mermaid
flowchart TD
    subgraph APP["Node.js Application (NixOS)"]
        SERP["SERP Scraping\nZenRows API → URL list"]
        subgraph BROWSER["Browser Automation — Playwright"]
            PAGE["Page Processing\nScreenshots + DOM capture"]
            SHARP["Screenshot Optimization\nSharp (resize/crop/compress)"]
        end
        subgraph SCORING["Scoring"]
            PROG2["Programmatic Scorer\n(DOM/regex, $0)"]
            CLAUDE["Claude Max Orchestrator\n(Haiku/Sonnet/Opus, $0)"]
            OR["OpenRouter\n(optional vision, ~$0.003/site)"]
        end
        subgraph OUTREACH2["Outreach"]
            SMS["Twilio SMS"]
            EMAILCH["Resend Email\n(direct fetch API)"]
            FORM["Form Automation\n(Playwright)"]
            SOCIAL["X / LinkedIn"]
        end
        STORE2["SQLite (better-sqlite3)"]
    end

    SERP --> BROWSER --> SCORING --> OUTREACH2 --> STORE2
```

# Tech Stack

- **Runtime**: Node.js v18+ (native ESM support)
- **Browser Automation**:
  - `playwright` (headed chromium, native screenshots)
  - `playwright-extra` + `puppeteer-extra-plugin-stealth` (anti-detection)
- **Image Processing**: `sharp` (resize, crop, compress for LLM optimization)
- **Database**:
  - `better-sqlite3` (faster than sql.js, DBeaver-compatible)
  - POC/MVP: DBeaver for read/write
  - Phase 3: Custom Streamlit dashboard
- **APIs**:
  - ZenRows (SERP scraping)
  - OpenRouter (GPT-4o-mini for scoring/proposals)
  - Twilio (SMS + webhooks)
  - Resend (SMTP for email channel - Phase 3)
- **Development**: VSCodium with Cline extension for automated debugging via console.log capture

# Execution Flow

1. `node src/poc.js "keyword"` → launches headed Playwright browser
2. ZenRows fetches SERP → extracts top 10 URLs
3. Process sites in batches (15 concurrent pages max):
   - Navigate to URL, wait for network idle
   - Capture 3 screenshots (desktop above/below, mobile above)
   - Optimize images (DOM-based cropping + Sharp resize)
   - Extract HTML DOM
4. Send to OpenRouter for scoring
5. Store results in SQLite
6. (MVP+) Generate proposals for low scorers → dispatch via channels
7. Console errors → logged for Cline to auto-fix

**Entry Point**: `src/main.js`:

```javascript
// playwright-extra wraps Playwright's chromium so the stealth plugin can be applied.
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

async function main(keyword) {
  // Headed browser; slowMo delays each Playwright operation by 100 ms.
  const browser = await chromium.launch({
    headless: false,
    slowMo: 100,
  });

  const context = await browser.newContext();

  // Process pipeline (scrapeSERP and processSites are not shown here).
  const sites = await scrapeSERP(keyword);
  await processSites(context, sites);

  await browser.close();
}
```

# Integration Landscape Diagram

```mermaid
flowchart LR
    subgraph POC["Phase 1: POC — $1.30 / 1K sites"]
        ZR["ZenRows\nSERP API"] --> PW["Playwright\nBrowser (×15)\nScreenshots + DOM\n+ Sharp optim"]
        PW --> ORS["OpenRouter\nGPT-4o-mini\nScoring + Vision"]
        ORS --> DB1["SQLite"]
    end

    subgraph MVP["Phase 2: MVP — $61 / 10K sites"]
        PROP["OpenRouter\nProposals\n(3 variants)"] --> CH["Contact Channels\n1. Twilio SMS\n2. Contact Form"]
        CH --> DB2["SQLite:\noutreaches\n+ conversations"]
    end

    subgraph FULL["Phase 3: Full System — $71/mo / 10K sites"]
        direction TB
        subgraph OUT["Outbound Channels"]
            R2["Resend Email"]
            SOC["Social Browser\n(LinkedIn/X)"]
        end
        subgraph IN["Inbound Handling"]
            INSMS["Inbound SMS\n(Twilio webhook)"]
            INEML["Inbound Email\n(Resend webhook)"]
        end
        INSMS --> CONV["SQLite:\nconversations\n+ reply templates"]
        INEML --> CONV
    end

    POC --> MVP --> FULL
```

# Data Flow

```mermaid
flowchart LR
    KW2["Keyword"] --> SERP2["SERP"] --> SITE["Site"] --> ASSET2["Asset"] --> SCORE2["Score"] --> PROPOSAL["Proposal"] --> OUTREACH3["Outreach"] --> CONVO["Conversation"]
```

---

# SQLite Schema

**Access via**: DBeaver (POC/MVP), Streamlit dashboard (Phase 3)

**Core Tables:**

- **sites** - Main site data with screenshots (cropped + uncropped), HTML DOM, conversion scores, processing status
- **keywords** - Keyword tracking with ZenRows count, processed count, low-scoring metrics, last scraped timestamp
- **outreaches** - Proposal variants (1-3) by channel with delivery tracking, click tracking, sale outcomes
- **conversations** - Threaded inbound/outbound messages linked to outreaches, with sentiment analysis
- **config** - Global settings (sender details, low score cutoff=82, templates, API keys)

**Key Features:**

- Screenshot storage: Both cropped (AI-optimized) and uncropped (full content) versions
- Conversion scores: JSON blob with detailed scoring + conversion_score (0-100) for filtering
- Processing pipeline: Status tracking (found → assets_captured → prog_scored → semantic_scored → enriched → proposals_drafted → outreach_sent)
- Multi-channel support: SMS, Email, Contact Form, LinkedIn, Facebook, Instagram
- Analytics ready: Indexes on key fields for performance (keyword, status, channel, timestamps)

See full schema: [db/schema.sql](db/schema.sql)

---

## LLM Calls in Pipeline (Current)

All LLM calls run through `claude -p` (Claude Max, $0 incremental) via the orchestrator, except vision rescoring which uses OpenRouter when `ENABLE_VISION=true`.

| #   | Stage                  | Model                    | Trigger                               | Output                                          |
| --- | ---------------------- | ------------------------ | ------------------------------------- | ----------------------------------------------- |
| 1   | Programmatic scoring   | _(none — rule-based)_    | Always (DOM/regex)                    | Score + grade + 7 factor_scores                 |
| 2   | Semantic scoring       | Haiku (`claude -p`)      | `score_semantic` orchestrator batch   | headline_quality, value_proposition, USP scores |
| 3   | CRO scoring (optional) | Sonnet (`claude -p`)     | `score_sites` orchestrator batch      | Full CRO factor analysis                        |
| 4   | Vision rescoring (opt) | GPT-4o-mini (OpenRouter) | `ENABLE_VISION=true`                  | Below-fold re-score + contacts                  |
| 5   | Enrichment             | Haiku (`claude -p`)      | `enrich_sites` orchestrator batch     | Contact extraction from HTML                    |
| 6   | Proposals              | Opus (`claude -p`)       | `proposals_*` orchestrator batch      | N personalized outreach messages                |
| 7   | Rewording              | Opus (`claude -p`)       | `reword_*` orchestrator batch         | Improved messaging (trust/proof framework)      |
| 8   | Proofreading           | Opus (`claude -p`)       | `proofread` orchestrator batch        | QA approve/rework/reject before send            |
| 9   | Reply classification   | Haiku (`claude -p`)      | `classify_replies` orchestrator batch | intent + sentiment                              |
| 10  | Name extraction        | Haiku (`claude -p`)      | `extract_names` orchestrator batch    | first name from email address                   |
| 11  | Reply responses        | Opus (`claude -p`)       | `reply_responses` orchestrator batch  | sales funnel reply message                      |
| 12  | Oversight              | Sonnet (`claude -p`)     | `oversee` (30min gate)                | corrective actions (restart, reset, clear)      |
| 13  | Error classification   | Haiku (`claude -p`)      | `classify_errors` (4h gate)           | regex patterns for unknown errors               |

**Cost breakdown** (current defaults, `ENABLE_VISION=false`):

- Per-site pipeline cost: ~$0 (programmatic scoring + Haiku via Claude Max)
- Proposals + rewording + proofreading + oversight: $0 incremental (Claude Max subscription)
- OpenRouter: $0 (only charged if vision rescoring re-enabled)

### Contact Extraction (Current)

1. **HTML regex** (`src/utils/html-contact-extractor.js`): emails, phones, forms — always runs
2. **Enrichment** (orchestrator `enrich_sites`, Haiku): contact extraction from About/Contact/Legal pages via stealth browser
3. **Vision rescoring** (`ENABLE_VISION=true`, optional): contacts from below-fold screenshots (OpenRouter)

### Email Sending

The Resend SDK was replaced with a direct `fetch()` call to `https://api.resend.com/emails` using a 20-second `AbortSignal.timeout()`. This fixes stale TCP keep-alive hangs on NixOS (the SDK had no abort support). Circuit breaker and rate limiting still apply.

### Prompt Files

| Prompt                                     | Used by                                    |
| ------------------------------------------ | ------------------------------------------ |
| `prompts/PROPOSAL.md`                      | Orchestrator proposals + rewording batches |
| `prompts/REPLIES.md`                       | Orchestrator reply responses               |
| `docs/05-outreach/email-best-practices.md` | Email proposal context                     |
| `docs/05-outreach/sms-best-practices.md`   | SMS proposal context                       |
| `prompts/SCORING.md`                       | Legacy LLM scoring (OpenRouter)            |
| `prompts/RESCORING.md`                     | Legacy rescoring (OpenRouter)              |

- **VISION.md** = Image-to-text extraction instructions
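
For reference, the direct `fetch()` approach described under Email Sending could be sketched as below. The function names and the message shape are hypothetical, and the project's real sender also wraps the call in its circuit breaker and rate limiter, which this sketch leaves out. It relies on the global `fetch` and `AbortSignal.timeout()` available in Node.js v18+.

```javascript
const RESEND_URL = 'https://api.resend.com/emails';

// Builds the fetch options for one email. AbortSignal.timeout() aborts
// the request if no response arrives within timeoutMs (20 s by default).
function buildResendRequest({ apiKey, from, to, subject, text }, timeoutMs = 20000) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ from, to, subject, text }),
    signal: AbortSignal.timeout(timeoutMs),
  };
}

// Sends the email and returns the parsed JSON response body.
// Throws on non-2xx responses and on timeout.
async function sendEmail(message) {
  const response = await fetch(RESEND_URL, buildResendRequest(message));
  if (!response.ok) {
    throw new Error(`Resend responded with HTTP ${response.status}`);
  }
  return response.json();
}
```

Because the signal is created per request, a hung connection fails on its own after the timeout instead of blocking the outreach loop.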