---
title: 'TODO'
category: 'other'
last_verified: '2026-02-15'
related_files:
  - 'src/scrape.js'
  - 'src/capture.js'
  - 'src/utils/image-optimizer.js'
  - 'src/score.js'
  - 'src/utils/error-handler.js'
tags: ['TODO', 'cron', 'scheduling', 'testing', 'security', 'database', 'api', 'ai']
status: 'current'
---

## Implementation Tasks

**How to use this file:**

- Tasks are organized by priority and effort (Quick Wins → MVP → Full System)
- Check this file before asking "what should we work on next?"
- Mark tasks complete with the green tick emoji ✅ as you finish them
- Add new tasks under the appropriate priority section

---

## OpenViking — Deferred Context Database (added 2026-03-14)

Integrate **OpenViking** (Apache-2.0, [github.com/volcengine/OpenViking](https://github.com/volcengine/OpenViking)) to reduce token usage via L0/L1/L2 tiered context loading.

**Pre-conditions before starting:**

1. Orchestrator context-size logging shows dynamic context regularly >200KB/call
2. Marketing knowledge base (brand guidelines, past campaigns, competitor intel) is populated

**Integration steps when ready:**

1. Add an `openviking` Docker container to `modules/containers.nix` (HTTP port 1933, Python SDK server)
2. Write a Node.js HTTP client in `src/agents/utils/context-loader.js` (~30 lines, replaces flat file reads)
3. Migrate `src/agents/contexts/*.md` + `prompts/` into the OpenViking schema
4. Add the env var `OPENVIKING_URL=http://localhost:1933`
5. The same NixOS container config deploys unchanged to the VPS — no migration friction
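
A minimal sketch of the step-2 client, assuming a plain HTTP GET interface; the `/contexts/:tier/:key` route and response shape are guesses, not the actual OpenViking API, and would need adjusting to the real SDK server's routes:

```javascript
// Hypothetical sketch of src/agents/utils/context-loader.js.
// ASSUMPTION: the OpenViking server exposes GET /contexts/:tier/:key
// returning plain text; verify against the real SDK server before use.
const BASE_URL = process.env.OPENVIKING_URL || 'http://localhost:1933';

// Build the request URL for a given tier (L0/L1/L2) and context key.
function buildContextUrl(base, tier, key) {
  return `${base}/contexts/${encodeURIComponent(tier)}/${encodeURIComponent(key)}`;
}

// Fetch one context document; replaces the old flat-file read.
// Uses the global fetch available in Node 18+.
async function loadContext(tier, key) {
  const res = await fetch(buildContextUrl(BASE_URL, tier, key));
  if (!res.ok) throw new Error(`context load failed: ${res.status}`);
  return res.text();
}
```
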

**Why deferred:** Current dynamic context is ~0KB (7 static .md files, 94KB total). Token savings of ~10-20% ≈ ~$1/day if the agent system were re-enabled — not worth the 6-8h setup. Break-even is at >200KB of dynamic context.

See `docs/02-architecture/software-inventory.md` for FOSS/audit status.

---

## Reminders / One-Off Actions

- 🏷️ **Plan: rename site statuses for clarity (added 2026-03-11)** — Current status names are confusing (`scored` / `rescored` imply repetition; `rescored` actually just means "scored with vision"; `assets_captured` is internal jargon). Proposed rename:
  - `found` → `found` (fine)
  - `assets_captured` → `scraped` (HTML + screenshots captured, ready for scoring)
  - `scored` → `prog_scored` (programmatic score done, awaiting the semantic LLM pass) — only used when ENABLE_VISION=true; skipped with vision off
  - `rescored` → `scored` (fully scored: programmatic + semantic factors merged, final score computed; previously also meant "vision rescored")
  - `enriched` → `enriched` (contacts found, ready for proposals)
  - `proposals_drafted` → `proposals_drafted` (fine)
  - `high_score` → `skipped_high_score` (makes clear it's a deliberate skip, not a terminal failure)
  - `ignore` → `ignored` (grammatically consistent with the other past-participle statuses)
  - Implementation notes: the rename requires a DB migration (`UPDATE sites SET status = ...` for each mapping) plus updates to every query in `src/stages/`, `claude-batch.js`, `claude-store.js`, `monitoring-checks.sh`, `npm run status`, and any hardcoded strings in tests. Not urgent — do it during a quiet period. Add the semantic_scored gate above as an interim fix first.
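
The migration above can be sketched as plain SQL generated from the mapping, which keeps it reviewable before running. One subtlety the mapping hides: statement order matters, because `scored` and `rescored` chain into each other.

```javascript
// Sketch of the status-rename migration described above.
const STATUS_RENAMES = {
  assets_captured: 'scraped',
  scored: 'prog_scored', // must run BEFORE rescored → scored (see below)
  rescored: 'scored',
  high_score: 'skipped_high_score',
  ignore: 'ignored',
};

// Emit one UPDATE per mapping, preserving object insertion order: renaming
// `scored` → `prog_scored` first prevents rows freshly renamed from
// `rescored` → `scored` from being renamed a second time.
function buildMigrationSql(renames) {
  return Object.entries(renames).map(
    ([from, to]) => `UPDATE sites SET status = '${to}' WHERE status = '${from}';`
  );
}
```
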

- 📊 **Recalibrate hybrid scorer once scored backlog clears (added 2026-03-09)** — Current R² of the programmatic-only scorer is -2.6 (expected — it's incomplete by design). Run `node scripts/calibrate-scorer.js 500` once a meaningful number of sites (~500+) have gone through the full split pipeline (programmatic + `score_semantic` Haiku batch). The calibration measures hybrid output vs LLM ground truth. Worth running when the orchestrator's `score_semantic` backlog has substantially drained. Check: `SELECT COUNT(*) FROM sites WHERE json_extract(score_json,'$.headline_quality') IS NOT NULL AND score IS NOT NULL;` — target 500+ before calibrating.

- 🧹 **Remove reword system after backlog drain (added 2026-03-09)** — Once all 4,901 pre-orchestrator approved-unsent messages have been reworded (check: `SELECT COUNT(*) FROM messages WHERE reworded_at IS NULL AND approval_status='approved' AND sent_at IS NULL AND direction='outbound' AND message_type='outreach'` returns 0), remove: reword batch types from `claude-orchestrator.sh`, `fetchRewordBatch()` from `claude-batch.js`, `storeReword()` from `claude-store.js`, and the reword prompts from the orchestrator. The `reworded_at` column can stay (harmless). New proposals generated by the orchestrator already use the trust/proof/importance framework — no rewording needed.

- 📞 **Twilio number status (updated 2026-03-04)** — Numbers active: AU/NZ (+61468089949), CA (+18254794242), UK (+447446944610). US uses the CA number (NANP +1 — indistinguishable to recipients). IE uses the UK number. Old US number +17658856535 is **CANCELLED** — remove it from the Twilio console. Future expansion: IN, ZA, SG (~$2/month each — purchase when targeting those markets).

- 🇺🇸 **Register US A2P 10DLC for SMS marketing** — Required before scaling US SMS outreach. Current state: US SMS enabled using CA +18254794242 (works for testing/low volume). For production scale: register a Brand + Campaign via the Twilio console (~$10/month). Without registration, carriers may filter US-bound messages. See [Twilio A2P docs](https://help.twilio.com/articles/1260800720410). Also see the Duguid defense note below re: TCPA consent requirements.

- ⚖️ **Check TCPA Duguid defense with a US lawyer** — Our system pulls specific phone numbers from a database (scraped from business websites), not random/sequential generation. Under _Facebook v. Duguid_ (2021), this may mean we're not an ATDS and TCPA automated consent requirements don't apply. Get a US telecom lawyer to confirm before scaling US SMS outreach. See BP Risk #1 for the full analysis with case law citations.

- 🏥 **Research regulated-industry outreach requirements** — Healthcare (dental, medical, etc.) and financial (mortgage broker, accountant, etc.) sites are currently ignored (`status='ignore'`). Research per-industry compliance requirements (HIPAA marketing rules, financial services advertising regulations) to determine if/how we can outreach to these industries in the future. Remove them from the `classifyIndustry()` ignore list once requirements are met.

- 📸 **Screenshot cleanup cron** — When `ENABLE_VISION` is re-enabled, implement a cron job to clean up old screenshots (currently unbounded storage growth). Low priority while vision is disabled.

- 🤖 **Reply-to-payment automation** — Automate the flow from a prospect replying "YES" (conversation status `qualified`) → PayPal payment link generation → send payment link → webhook monitors completion → report delivery. Currently a manual step. The `conversations` table schema already supports the full flow.

- 🌐 **auditandfix.com order page: URL params for prefill** — When sending prospects to the website (instead of a bare PayPal link), pass query params so the order page is pre-filled: `?domain=example.com.au&country=AU&currency=AUD&email=owner@example.com`. The page should use these to: (1) display the correct local price, (2) pre-fill the checkout form, (3) pass `custom_id` to PayPal so payment is linked back to the right conversation. Update SMS reply templates to use `https://auditandfix.com/order?domain=...&country=...` instead of direct `paypal.com` links. **Website-side change** — requires editing `auditandfix.com` PHP.
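
On the pipeline side, building that link with the standard URL API (rather than string concatenation) keeps query values correctly encoded; a small sketch, with field names assumed to match the params listed above:

```javascript
// Sketch: build the prefilled order URL described above.
// Using the WHATWG URL API so values like emails are percent-encoded.
function buildOrderUrl(site) {
  const url = new URL('https://auditandfix.com/order');
  url.searchParams.set('domain', site.domain);
  url.searchParams.set('country', site.country);
  url.searchParams.set('currency', site.currency);
  if (site.email) url.searchParams.set('email', site.email);
  return url.toString();
}
```
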

- 📣 **[Audit&Fix] Gather real customer testimonials** — The sales page (auditandfix.com) currently uses placeholder testimonials (Sarah T., James K., Michelle R.). Once the first real reports are delivered, reach out to customers for genuine feedback quotes. Plan: email a survey after report delivery, offering a follow-up benchmarking discount in exchange for a testimonial. Also set up Trustpilot or a Google Business Profile so reviews are verifiable. Update the `auditandfix.com/index.php` testimonials section and the star rating once real reviews exist.

- 🇵🇹 **Add Portugal (PT) to the countries list** — A Portuguese translation (`lang/pt.json`) already exists (Brazilian Portuguese). When expanding to Portugal, add PT to `src/config/countries.js` with EUR currency, European Portuguese locale tweaks if needed, and the GDPR compliance flag.

- 🔒 **WPCloud Anti-Spam email obfuscation** (LOW PRIORITY) — Some sites use this plugin to blur email addresses behind a multi-step reveal flow (blurred link → click → popup → wait → OK → email shown). The plugin is only available to wpcloud.com customers, so affected sites are rare. If email extraction yield from these sites becomes measurably low, investigate: (1) clicking the mailto link to trigger the reveal, (2) intercepting network requests during the reveal popup, or (3) falling back to form outreach for wpcloud.com-hosted sites. Example site: cabanonetgarage.com/en/contact-us/

- 💰 **Cancel the Hostinger VPS backup add-on** (~$36/year) once restic → Backblaze B2 is confirmed working.
  Whole-VPS snapshots from Hostinger are redundant once restic is in place and verified.
  Verify: `restic -r b2:method333-backups-prod snapshots` shows at least one successful snapshot,
  then cancel from Hostinger control panel → Billing → Add-ons.

- 📧 **Upgrade Resend to the paid plan ($20/month) once sending >100 emails/day** — The current free plan
  allows 3,000 emails/month (100/day). Once daily email volume exceeds 100, upgrade to the $20/month
  plan (50,000 emails/month). On the $20 plan, add a dedicated sending domain via Resend → Domains
  for improved deliverability. Note: do NOT use the free plan for bulk cold outreach — Resend's free
  tier is intended for transactional/notification emails, not marketing campaigns. Violating this
  risks account suspension. Check current send volume: `npm run status`.

---

## 🔥 High Priority — Claude Code AFK Pipeline (Proposals + Scoring QA)

**Goal**: Shift proposal generation and scoring QA from paid API calls (OpenRouter/Anthropic) into Claude Code AFK cycles, using the Claude Max subscription at zero incremental LLM cost.

**Context**: Claude Code already runs autonomous 30-minute AFK check cycles. Proposal generation and scoring QA are just analysis work — read site data, write text. Claude Code does this natively. This eliminates ~$45/month in API costs (Claude API $30 + OpenRouter proposals $15) and produces higher-quality output (Opus vs GPT-4o-mini/Haiku).

**Architecture:**

Each 30-minute AFK cycle adds a "pipeline work" phase after health checks:

```
AFK cycle (30 min):
  1. Run monitoring-checks.sh (2-3 min) — existing health monitoring
  2. PROPOSAL GENERATION (new):
     a. Query: sites at status='enriched' needing proposals (LIMIT 20)
     b. For each site: read score_json, contacts, keyword from DB
     c. Generate proposal text per PROPOSAL.md guidelines
     d. INSERT INTO messages table (direction='outbound', approval_status='pending')
     e. UPDATE sites SET status='proposals_drafted'
  3. SCORING QA (new, after the programmatic scorer is built):
     a. Query: 5 random recently-scored sites
     b. Read their HTML summary + programmatic score
     c. Spot-check: does the score feel right? Flag outliers for rescore.
  4. Sleep 30 min → repeat
```

**Throughput estimate:**

- Each proposal needs ~2KB of context (domain, score_json, contacts, keyword)
- Claude Code can handle ~20 proposals per cycle before context pressure
- 48 cycles/day × 20 = **~960 proposals/day at $0 incremental cost**
- Current need: ~100 proposals/day → easily covered, with capacity to spare

**Steady-state monthly cost impact:**

| Cost category                     | Before (API) | After (Claude Code)      | Savings    |
| --------------------------------- | ------------ | ------------------------ | ---------- |
| Claude API (proposals)            | $30/mo       | $0                       | $30        |
| OpenRouter (proposals/enrichment) | $15/mo       | ~$5/mo (enrichment only) | $10        |
| **Total LLM costs**               | **$45/mo**   | **~$5/mo**               | **$40/mo** |

**Implementation:**

- [ ] Add a "pipeline work" phase to the AFK cycle (after monitoring-checks.sh, before sleep)
- [ ] Write `scripts/pull-proposal-batch.sh` — queries the DB and outputs site data as JSON for Claude Code to process
- [ ] Define a proposal output format that Claude Code writes directly (SQL INSERT or a JSON file that a loader script imports)
- [ ] Add a scoring QA phase (after the programmatic scorer is built)
- [ ] Add proposal count tracking to AFK progress reports
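
The row-shaping step of the batch-pull script could look like the sketch below: turn raw `sites` rows into the compact JSON Claude Code reads each cycle. The column names mirror the ones mentioned above (`score_json`, `contacts_json`); the exact schema is an assumption.

```javascript
// Sketch for scripts/pull-proposal-batch.sh's row-shaping step.
// ASSUMPTION: rows come from `SELECT ... FROM sites WHERE status='enriched'`
// and carry score_json / contacts_json as JSON text columns.
function shapeProposalBatch(rows, limit = 20) {
  return rows.slice(0, limit).map((row) => ({
    siteId: row.id,
    domain: row.domain,
    keyword: row.keyword,
    score: JSON.parse(row.score_json || '{}'),     // per-factor breakdown
    contacts: JSON.parse(row.contacts_json || '[]'), // prioritized channels
  }));
}
```
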

**Not in scope:** Enrichment still uses Playwright + Haiku via the API (browser automation can't run inside the Claude Code context). This is ~$5/month and not worth optimizing.

---

## 🔥 High Priority — Programmatic (Rule-Based) Scoring

**Goal**: Replace LLM-based scoring with a deterministic rule engine that analyzes HTML/DOM directly — zero LLM cost, faster, explainable.

**Context**: LLM scoring (GPT-4o-mini via OpenRouter) cost $349 for 77,555 calls in the first 12 days — the single largest cost driver. With `SKIP_STAGES=scoring,rescoring` now set, new scoring is paused. But as the pipeline drains, we need a sustainable scoring approach that doesn't cost $0.004/site.

**What to score programmatically (from HTML/DOM):**

- Phone number visible above the fold (regex on `<header>`, first-viewport HTML)
- CTA button present above the fold (`<a>`, `<button>` with action words: call, get, book, contact, quote)
- Trust signals in HTML: testimonial sections, star ratings, review counts, accreditations
- Contact form present on the landing page
- Mobile-responsive meta tag (`viewport`)
- Page-load indicators: `<img>` count, inline CSS/JS bloat
- Business name/location in `<title>` or `<h1>` (local SEO signal)
- Social proof: mentions of "years experience", "projects completed", "clients served"
- SSL/HTTPS (from URL)
- Google Maps embed or address visible

**Scoring**: Each factor adds to or subtracts from a base score. Weight by conversion impact. Output: score 0-100, grade A+ to F, and a `score_json` with a per-factor breakdown — identical schema to the current LLM output, so downstream stages need no changes.
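
A minimal sketch of the factor-table shape this implies, with two of the checks above plus the weighted sum; the regexes, weights, and base score are illustrative placeholders, not the final rubric:

```javascript
// Sketch of the rule-engine core. Weights and patterns are illustrative.
const FACTORS = [
  { key: 'viewport_meta', weight: 10, test: (html) => /<meta[^>]+name=["']viewport["']/i.test(html) },
  { key: 'cta_above_fold', weight: 15, test: (html) => /<(a|button)[^>]*>[^<]*\b(call|get|book|contact|quote)\b/i.test(html) },
  { key: 'https', weight: 5, test: (html, url) => url.startsWith('https://') },
];

function scoreSite(html, url, base = 50) {
  const factors = {};
  let score = base;
  for (const f of FACTORS) {
    const hit = f.test(html, url);
    factors[f.key] = hit; // per-factor breakdown, as in score_json
    if (hit) score += f.weight;
  }
  // Clamp to the 0-100 range used by the current score_json schema.
  return { score: Math.max(0, Math.min(100, score)), factors };
}
```

Because every factor is a named boolean with a weight, each score is fully explainable: the breakdown says exactly which checks passed.
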

**Why high priority**: Reduces ongoing LLM scoring costs from ~$350/month to $0. Enables resuming the pipeline at scale without burning OpenRouter credits, and makes scoring auditable and debuggable.

**Implementation notes:**

- New module: `src/utils/programmatic-scorer.js`
- Input: HTML from the `assets_captured` stage (already stored)
- Output: same JSON schema as the current `score_json`
- Run an A/B test: compare programmatic scores vs LLM scores on a sample of already-scored sites to validate calibration
- If correlation is >80%, replace LLM scoring entirely

**Sub-tasks:**

- [ ] Define the scoring rubric (factors + weights) based on the current LLM score_json schema
- [ ] Implement `src/utils/programmatic-scorer.js` with factor-detection functions
- [ ] Add unit tests against known-scored sites from the DB
- [ ] Calibration: run on 500 already-scored sites, compare to LLM scores
- [ ] If validated: update `src/stages/scoring.js` to call the programmatic scorer when `ENABLE_LLM_SCORING=false`
- [ ] Update `.env.example` with the `ENABLE_LLM_SCORING` flag

---

## 🔥 High Priority — LLM-Powered Autoresponder

**Goal**: Automatically handle inbound replies using an LLM, so prospects get a timely, intelligent response without manual operator intervention. This is the critical link between outreach and revenue.

**Context**: Inbound SMS/email polling is already implemented (`src/inbound/sms.js`, `src/inbound/email.js`). Sentiment + intent classification is already running. The `conversations` table has `direction`, `replied_at`, and `intent` columns. What's missing is the LLM step that reads the conversation context and drafts (and optionally sends) a reply.

### Sub-tasks

- [ ] **Conversation context loader** — Given a `conversation_id`, fetch the full thread: original outreach proposal text, site metadata (domain, grade, score, industry), and all prior inbound/outbound messages. This is the LLM's context window.

- [ ] **Reply prompt** (`prompts/autoresponder.md`) — Persona: friendly web consultant. Goals by intent:
  - `interested` / `qualified` → confirm interest, send the PayPal payment link (or ask a qualifying question if there isn't enough info)
  - `objection` → handle the objection (price, timing, relevance) with a short, confident rebuttal
  - `not_interested` → graceful opt-out acknowledgement, honour STOP/unsubscribe
  - `autoresponder` / `out_of_office` → no reply (skip)
  - `unknown` → ask a clarifying question
  - Tone rules: concise, channel-appropriate (SMS ≤160 chars, email can be longer), no corporate jargon, no fake urgency

- [ ] **`src/inbound/autoresponder.js`** — Core module:
  - `shouldAutoRespond(conversation)` — returns false for: already replied (`replied_at IS NOT NULL`), opt-out, or an autoresponder intent
  - `generateReply(conversation)` — builds context, calls the LLM (Claude Haiku preferred for cost), returns draft reply text
  - `sendReply(conversation, text)` — writes a `direction='outbound'` row to `conversations`, then calls the existing `processPendingReplies()` in `src/inbound/sms.js` / `src/inbound/email.js`
  - Log to the `llm_usage` table with `stage='autoresponder'`
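
The gating predicate could be as small as the sketch below. Field names follow the `conversations` columns mentioned in this section; `opted_out` is an assumed column standing in for however opt-out is actually recorded.

```javascript
// Hedged sketch of shouldAutoRespond() from the bullet above.
// ASSUMPTION: `opted_out` is a boolean column; adjust to the real opt-out flag.
const SKIP_INTENTS = new Set(['autoresponder', 'out_of_office', 'not_interested']);

function shouldAutoRespond(conversation) {
  if (conversation.replied_at != null) return false; // already handled
  if (conversation.opted_out) return false;          // honour STOP/unsubscribe
  if (SKIP_INTENTS.has(conversation.intent)) return false;
  return conversation.direction === 'inbound';
}
```

Keeping this as a pure function over a plain row object makes the unit tests listed below trivial: no DB or LLM mock needed.
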

- [ ] **Human-in-the-loop mode** (default ON) — `AUTORESPONDER_AUTO_SEND=false` in `.env`. When false: generate the reply and insert it into `conversations` as `direction='outbound', replied_at=NULL` for operator review in the dashboard before sending. When true: send immediately.

- [ ] **Cron job** (`src/cron/autoresponder.js`) — Runs every 5 minutes. Finds conversations where `direction='inbound'`, no outbound reply exists, and the intent is not `autoresponder`/`not_interested`. Calls `autoresponder.js` for each. Respects a 72h cooldown (don't reply to something days old without operator review).

- [ ] **Dashboard integration** — "Pending autoresponder drafts" count in the Outreach Trust panel. The operator can approve/edit/discard drafts from the Human Review page.

- [ ] **Tests** — Unit tests for `shouldAutoRespond()`, `generateReply()` (mocked LLM), and the intent-to-action routing logic. Integration test with a real inbound SMS conversation in a test DB.

**Dependencies**: `src/inbound/sms.js`, `src/inbound/email.js`, `src/inbound/processor.js`, `src/payment/paypal.js` (for payment link generation), the `prompts/` directory, the `llm_usage` table.

**Estimated effort**: Claude: 4h, Human: 8h

---

## Known Issues / Bugs

**Test Suite Issues:**

- `scripts/test-deduplication.js`: `r.reason` null crash → added a null-coalescing guard
- `scripts/test-resend-webhook-sig.js`: `process.exit(1)` when the env var was missing → changed to `exit(0)` with a SKIP message (it's a manual integration test, not a unit test)
- The `npm test` suite (`tests/**/*.test.js`) was always clean; the failures were in `scripts/test-*.js` files picked up by a bare `node --test`

---

## ✅ POC (Priority 1)

**Goal**: Validate the approach with 1,000 sites; test screenshot optimization

#### ✅ Setup

- `npm init -y && npm i playwright playwright-extra puppeteer-extra-plugin-stealth better-sqlite3 sharp axios`
- Create `.env` file with API keys (ZenRows, OpenRouter)
- Create `db/schema.sql` with the enhanced schema + keywords table + config table
- Initialize the database: `node scripts/init-db.js`
- Database migrations system created

#### ✅ Core Pipeline

- `src/scrape.js`: ZenRows SERP scraping (`scrapeSERP` function)
- `src/capture.js`: Screenshot capture with Playwright
  - Desktop viewport (1440×900): above-fold + below-fold
  - Mobile viewport (390×844): above-fold only
  - Full rendered DOM via `page.content()`
  - Cropped + uncropped versions for all screenshots
  - Network-idle wait strategy + timeout handling
- `src/utils/image-optimizer.js`: Screenshot optimization with Sharp (**100% test coverage**)
  - Smart cropping (removes nav/footer margins)
  - Uncropped versions preserve full content (fit: 'inside')
  - JPEG optimization at quality 85
  - 91-96% file size reduction achieved
- `src/score.js`: OpenRouter GPT-4o-mini scoring
  - Loads the Conversion Scoring Prompt from file
  - Sends screenshots + DOM
  - Parses the JSON response, stores it in conversion_score_json
  - Resubmits the prompt for low scores
- `src/utils/error-handler.js`: Retry logic with exponential backoff (**100% test coverage, 30 tests**)
- `src/utils/logger.js`: Console logging for Cline monitoring (**100% test coverage**)
- `src/utils/keyword-manager.js`: Keyword tracking system
- `scripts/backfill-keywords.js`: Backfill existing keywords

#### ✅ Main Orchestration

- `src/poc.js`: Main POC pipeline (processes keyword → SERP → capture → score)
- `src/process.js`: Full processing pipeline with queue management
- Test: successfully processed multiple keywords with score tracking

#### ✅ Quality Infrastructure

- ESLint configuration (0 errors)
- Prettier formatting (enforced)
- Node.js native test framework (386 tests passing, 60.5% coverage)
- c8 coverage reporting (keywords stage at 98.66% coverage)
- Sage AI review integration
- Quality check script (`npm run quality-check`)
- Keywords stage comprehensive test suite (16 tests, 100% coverage)

#### ✅ New Features Added

- Screenshot backfill system (`scripts/backfill-screenshots.js` + unit tests)
- Config-based low-score threshold (default: 82)
- Comprehensive test suite (42 → 386 tests)
- Database indexes for performance
- Nix shell environment with command documentation
- Keywords table schema fix (migration 013: added priority, status, search_count columns)
- X and LinkedIn integration into the stage-based outreach pipeline

---

## ✅ MVP (Priority 2)

**Goal**: End-to-end outreach with SMS + contact forms

**Status**: 100% complete. All components implemented and integrated.

**What's working**: Proposals, contact prioritization, all outbound channels (SMS, email, forms), inbound handling (SMS & email), full orchestration pipeline

#### ✅ Proposal Generation

- `src/proposal-generator-v2.js`: Generates N unique proposals (one per contact) via OpenRouter
  - Loads prompts from `docs/prompts/PROPOSAL.md`
  - Includes competitor data, scores, contact channels
  - Generates a dynamic number of proposals based on contacts found (N contacts → N proposals)
  - Each proposal is tagged with a recommended channel and personalized to the contact
  - Channel-optimized (SMS: <160 chars; email: detailed; etc.)
  - V1 deprecated code removed

#### ✅ Contact Prioritization

- `src/contacts/prioritize.js`: Contact-method decision logic
  - Parses contacts_json from the database
  - Applies priority: SMS > Email > Form > X > LinkedIn
  - Matches variant to channel (see architecture docs)
  - Returns an array of {channel, uri, variant} objects
  - CLI commands: `update <site_id>`, `bulk [limit]`, `report`
  - Core logic tested (16/20 tests passing; the integration tests have isolation issues)
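
The priority ordering above boils down to a stable sort over a fixed channel ranking; a minimal sketch (the real module also maps variants and URIs):

```javascript
// Sketch of the SMS > Email > Form > X > LinkedIn ordering described above.
const CHANNEL_PRIORITY = ['sms', 'email', 'form', 'x', 'linkedin'];

function prioritizeContacts(contacts) {
  // Copy before sorting so the caller's array is left untouched.
  return [...contacts].sort(
    (a, b) => CHANNEL_PRIORITY.indexOf(a.channel) - CHANNEL_PRIORITY.indexOf(b.channel)
  );
}
```
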

#### ✅ Outreach Channels

- `src/outreach/email.js`: Resend SMTP integration
  - Plain text with minimal HTML for links/buttons
  - Includes unsubscribe link (CAN-SPAM)
  - Appends signature from the config table
  - Channel-optimized detailed proposals via email
- `src/outreach/sms.js`: Twilio SMS integration
  - Channel-optimized short proposals to mobile numbers
  - Stores delivery_status via Twilio webhooks
  - Handles Twilio errors gracefully
- `src/outreach/form.js`: Contact form automation
  - Playwright form filling with channel-optimized proposals
  - Best-guess field mapping
  - Injects operator panel (integrated into form.js)
  - Uses `await page.pause()` for human review
- `src/outreach/x.js`: X/Twitter DM automation
  - Playwright-based headed browser automation
  - Persistent browser profiles with LRU rotation
  - Aggressive stealth mode for bot-detection avoidance
  - Manual login support with session persistence
  - Integrated into the stage-based pipeline (`npm run outreach x`)
- `src/outreach/linkedin.js`: LinkedIn message automation
  - Playwright-based headed browser automation
  - Persistent browser profiles with LRU rotation
  - Aggressive stealth mode for bot-detection avoidance
  - Manual login support with session persistence
  - Integrated into the stage-based pipeline (`npm run outreach linkedin`)

#### ✅ Inbound Handling

- `src/inbound/email.js`: Resend API polling for inbound emails
  - Polls Cloudflare R2 for email.received events
  - Fetches full email content via the Resend Received Emails API
  - Parses the email body to remove quoted text
  - Detects sentiment (positive/neutral/objection) via keyword matching
  - Maps the sender email to an outreach
  - Stores in the conversations table with direction='inbound'
  - Processes pending operator replies from the conversations table
- `src/inbound/sms.js`: Twilio API polling
  - Polls the Twilio Messages API every 5 minutes: `GET /2010-04-01/Accounts/{AccountSid}/Messages.json`
  - Filters by DateSent > last_poll_time
  - Maps incoming SMS to outreach via phone number
  - Stores in the conversations table with direction='inbound'
  - Marks for operator review
  - Processes new replies in the conversations table to send operator SMS back to the prospect
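
The DateSent filter above can be sketched as a pure function over the fetched page of messages; the `dateSent`/`direction` field names follow Twilio's Node SDK message resource, and ISO-string timestamps are assumed:

```javascript
// Sketch of the poll filter described above: keep only inbound messages
// sent after the last poll time. ASSUMPTION: dateSent arrives as an
// ISO 8601 string (or anything `new Date()` can parse).
function filterNewInbound(messages, lastPollTime) {
  const cutoff = new Date(lastPollTime).getTime();
  return messages.filter(
    (m) => m.direction === 'inbound' && new Date(m.dateSent).getTime() > cutoff
  );
}
```
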

#### ✅ Orchestration

- Full MVP pipeline
  - Expands the POC pipeline to the full pipeline
  - Filters sites scoring B- to E
  - Generates proposals for each
  - Prioritizes contacts
  - Tracks in the outreaches table
  - Sends via available channels
  - Polls & handles new replies (from operator to prospect) in the conversations table
  - Implemented in `src/mvp.js` with commands: run, poc, propose, send, replies
  - Also available as a stage-based pipeline in `src/all.js` (recommended)

---

## Full System (Priority 3) - ~30-40h

**Goal**: Multi-channel, inbound handling, analytics dashboard, multi-country support

**Recommended order:**

1. Complete email inbound handling (builds on existing webhook infrastructure)
2. Implement analytics/tracking (uses existing data)
3. Build the Streamlit dashboard (visualizes existing metrics)
4. Add multi-country support (requires testing infrastructure)
5. Implement GDPR compliance (EU-specific, lower priority)

**Note**: Many tasks below can be done in parallel or deferred based on business needs.

#### Multi-Country Support (~6-10h)

See `../09-business/market-expansion.md` for the detailed implementation plan.

- ⏳ Update README with country support docs (deferred)
- ⏳ Create country-specific keyword files (deferred)
- **Phase 6: GDPR Compliance for EU/UK** ✅ (COMPLETED - 5h)
  - ✅ Created `src/utils/gdpr-verification.js` module (98.74% test coverage)
  - ✅ Free email provider filter list (75+ domains: gmail, outlook, gmx, web.de, btinternet, etc.)
  - ✅ Implemented DOM search for company type strings (searchCompanyTypes function)
  - ✅ Implemented DOM search for registration keywords (searchCompanyKeywords function)
  - ✅ GDPR configuration already exists in `countries.js` (companyTypes, companyKeywords, keyPageNames)
  - ✅ Created `tests/gdpr-verification.test.js` — all 6 tests passing
  - ✅ Database schema: migration 027 adds `company_proof`, `gdpr_verified`, `gdpr_verified_at` columns
  - ✅ Reporting view: `gdpr_verification_report` view created for verification statistics by country
  - ✅ Enrichment integration: calls `batchVerifyEmails()` for all EU/UK sites during the enrich stage
  - ✅ Automatic verification: runs after browsing contact pages, uses HTML content for company type/keyword search
  - ✅ Database storage: stores verification results in `company_proof` JSON, sets the `gdpr_verified` flag
  - ✅ Outreach filtering: migration 028 adds a 'gdpr_blocked' status to the outreaches table
  - ✅ Proposals stage integration: checks the gdpr_verified flag, blocks unverified emails automatically
  - ✅ Automatic skip: the outreach stage only processes 'pending' status (gdpr_blocked emails are skipped)
  - ✅ Country code fix: added GB→UK alias mapping for ISO 3166-1 alpha-2 codes
  - ✅ Sites with forms: modified enrichment to run GDPR verification on ALL sites (not just those without forms)
  - ✅ End-to-end testing: verified the complete flow with UK sites (enrichment → proposals → outreach)
  - ⏳ Browser automation for key pages (DEFERRED — enrichment already browses contact pages with company info)

#### ✅ Additional Outbound Channels (COMPLETED)

- Manual-assist headed browser automation
- Playwright pause for human login/review
- Persistent profile sessions with LRU rotation
- Integrated into the stage-based pipeline

#### Email Deliverability — Secondary Verification (MillionVerifier)

- ZeroBounce returns `unknown` for addresses it can't verify (greylisting, no MX response); these bounce at a ~38% rate
- These are currently **parked** as `retry_later` with `error_message='zb_unknown: parked pending secondary verification (MillionVerifier)'` and `retry_at = +30 days`
- **TODO**: Integrate [MillionVerifier](https://www.millionverifier.com) as a secondary validator (prepaid, ~$47/50k credits)
  - Call the MillionVerifier API only for `zb_status='unknown'` addresses (fail-closed: if also unknown → skip send)
  - Cache the result in a new `mv_status` column (migration needed)
  - On success: reset `retry_later` → `approved` for addresses MillionVerifier marks valid/catch-all
  - On toxic: mark `failed` permanently
  - See `src/utils/zerobounce.js` for the ZeroBounce pattern to follow
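
The fail-closed routing above can be sketched as a small decision function for addresses ZeroBounce marked `unknown`. The `mv_status` value names (`valid`, `catch_all`, `toxic`) are assumptions about MillionVerifier's result vocabulary and should be checked against its API docs.

```javascript
// Sketch of the secondary-verification routing for zb_status='unknown'
// addresses. ASSUMPTION: mvStatus vocabulary — verify against the
// MillionVerifier API before wiring this up.
function routeZbUnknown(mvStatus) {
  if (mvStatus === 'valid' || mvStatus === 'catch_all') return 'approved'; // reset retry_later
  if (mvStatus === 'toxic') return 'failed';                              // permanent
  return 'skip_send'; // still unknown → fail closed, don't send
}
```
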
457  
458  #### Tracking & Analytics
459  
460  - Resend built-in tracking (opens, clicks, bounces, complaints)
461  - Cloudflare Worker receives webhooks and stores in R2
462  - Local sync script polls worker and updates SQLite
463  - Tracking data: `opened_at`, `tracking_clicked_at`, `email_id` in outreaches table
464  - Setup: Deploy `resend-webhook-worker.js`, configure Resend webhook URL
465  - Sync: `node src/utils/sync-email-events.js` (run every 5 minutes via cron)
466  - **Sales Tracking Enhancements**
467    - Improve dashboard sales tracking with more detailed metrics
468    - Add revenue attribution by keyword
469    - Track time-to-conversion (first outreach → sale)
470    - Add sales funnel visualization (outreach → response → meeting → sale)
471    - Consider CRM integration for automated sale recording
472  - **Negative Sentiment Analysis**
473    - Analyze negative sentiment responses to identify common complaints
474    - Group objections by category (price, timing, not interested, already have solution)
475    - Track complaint frequency over time
476    - Use insights to refine proposal templates and reduce objections
477    - Dashboard visualization of top objection types
478  - Metrics collection with something like StatsD/Prometheus
479    - Feed these into analytics dashboard
480  - `src/analytics/`: Analytics calculations
481    - Response rate by channel, keyword, time-of-day
482    - Conversion funnel: sent → clicked → replied → sale
483    - Revenue per channel (from sale_amount field)
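A minimal sketch of the funnel rollup for `src/analytics/`. `tracking_clicked_at` and `sale_amount` are named above; `sent_at` and `replied_at` are assumed column names, not confirmed schema:

```javascript
// Sketch: sent -> clicked -> replied -> sale rollup over outreach rows.
// sent_at / replied_at column names are assumptions.
function conversionFunnel(rows) {
  const f = { sent: 0, clicked: 0, replied: 0, sales: 0, revenue: 0 };
  for (const r of rows) {
    if (r.sent_at) f.sent += 1;
    if (r.tracking_clicked_at) f.clicked += 1;
    if (r.replied_at) f.replied += 1;
    if (r.sale_amount > 0) { f.sales += 1; f.revenue += r.sale_amount; }
  }
  return f;
}
```

The same rollup, grouped by channel or keyword, gives response rate and revenue per channel.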
484  
485  #### Proposal Improvements
486  
487  - **Translate score_json fields for non-English proposals**
488    - Problem: `[primary_weakness]`, `[evidence]`, `[reasoning]` in templates come directly from the English LLM scoring output, producing mixed-language proposals (e.g. "Hauptproblem: no urgency messaging")
489    - Solution: After `extractTemplateFields()` in `generateTemplateProposal()`, call Haiku to translate `primaryWeakness` and `evidence` into the target language when `languageCode !== 'en'`
490    - Use a simple prompt: "Translate the following short phrase to {language}. Reply with ONLY the translation, no explanation: {text}"
491    - Cache by (text, languageCode) pair in-memory to avoid duplicate API calls within a run
492    - Cost: ~$0.0001/call × ~3 fields × non-English sites — negligible
493    - File: `src/utils/template-proposals.js` → `generateTemplateProposal()`
494    - Also consider: translate score_json weakness fields at scoring time and store translated versions in score_json for the detected locale
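The in-memory cache could look like the sketch below, with the Haiku call injected as `translateFn` (a hypothetical stand-in, not the actual API client):

```javascript
// Sketch: per-run translation cache keyed by (text, languageCode).
// translateFn stands in for the Haiku translation call described above.
function makeCachedTranslator(translateFn) {
  const cache = new Map();
  return async function translate(text, languageCode) {
    if (languageCode === 'en') return text; // English fields pass through
    const key = `${languageCode}\u0000${text}`;
    if (!cache.has(key)) cache.set(key, await translateFn(text, languageCode));
    return cache.get(key);
  };
}
```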
495  
496  - **Safe Conservative Recommendations** (DEFERRED until data quality validated)
497    - Add specific actionable recommendations to proposals once response rates and data quality are validated
498    - Focus on high-confidence recommendations that rarely backfire:
499      - Missing SSL certificate → "Add HTTPS security"
500      - No mobile viewport tag → "Fix mobile responsiveness"
501      - Missing alt text on images → "Improve accessibility"
502      - Slow page load (from scoring data) → "Optimize page speed"
503    - Extract recommendations from scoring JSON (already collected during scoring stage)
504    - Consider adding HTTP headers and SSL status collection during assets stage
505    - Update scoring rubric to explicitly evaluate SSL/HTTPS status
506    - Prerequisites:
507      - Establish baseline response rates with score-only proposals
508      - Validate data quality is sufficient for accurate recommendations
509      - Monitor complaint rates (<0.1% target) before adding recommendations
510      - A/B test score-only vs. score + recommendation to measure conversion lift
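When the prerequisites are met, the high-confidence rules above could be a small table; the `score` field names here are assumptions, not the real score_json schema:

```javascript
// Sketch: conservative recommendation rules keyed off scoring data.
// Field names (ssl, hasViewportTag, ...) are assumed, not the actual schema.
const RULES = [
  { when: s => s.ssl === false,        say: 'Add HTTPS security' },
  { when: s => !s.hasViewportTag,      say: 'Fix mobile responsiveness' },
  { when: s => s.imagesMissingAlt > 0, say: 'Improve accessibility' },
  { when: s => s.loadTimeMs > 3000,    say: 'Optimize page speed' },
];
const recommend = score => RULES.filter(r => r.when(score)).map(r => r.say);
```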
511  
512  #### Consolidation
513  
- Rearrange the various `npm run` and node scripts into a more sensible hierarchy
515  - Revamp the cron job system
516  
517  #### Enhanced Inbound Handling
518  
519  **Email Implementation Options**:
520  
- Resend webhook event types: `email.received`, `email.opened`, `email.clicked`, `email.complained`, `email.bounced`, `email.failed`, `email.suppressed`
522  
523  - Poll Cloudflare R2 for email.received events from the Resend Webhooks API
524  - Use Resend Received Emails API to get the actual email body
525  - Map sender email to sites.domain or original recipient
526  - Parse out quoted text, store original in raw_payload
527  - Detect sentiment (positive/objection) via simple keyword matching
528  - Store in conversations table with direction='inbound'
529  - CLI commands: `poll`, `process-replies`
530  - npm scripts: `npm run inbound:email`
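The keyword-matching sentiment step could be as simple as the sketch below; the keyword lists are illustrative, not the production sets:

```javascript
// Sketch: naive keyword-based sentiment tagging for inbound replies.
// Objection keywords are checked first so 'not interested' doesn't
// match the 'interested' positive keyword.
const POSITIVE = ['interested', 'sounds good', 'call me', 'tell me more'];
const OBJECTION = ['not interested', 'unsubscribe', 'stop', 'too expensive'];
function detectSentiment(body) {
  const text = body.toLowerCase();
  if (OBJECTION.some(k => text.includes(k))) return 'objection';
  if (POSITIVE.some(k => text.includes(k))) return 'positive';
  return 'neutral';
}
```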
531  
532  - Route SMS and Email to conversations table
533  - Thread by outreach_id
534  - Mark conversations for operator review
535  - CLI commands: `poll`, `process-replies`, `inbox`, `thread`, `stats`
536  - npm scripts: `npm run inbound:poll`, `npm run inbound:inbox`, `npm run inbound:stats`
537  
538  - `tests/inbound-email.test.js` - Unit tests for email functions
539  - `tests/inbound-processor.test.js` - Unit tests for processor functions
540  
541  #### ✅ Streamlit Dashboard (COMPLETED)
542  
543  - Tab 1: Overview (stats, sales $, pipeline funnel, top errors)
544  - Tab 2: Pipeline Health (error analysis, stuck sites, throughput)
545  - Tab 3: Outreach Effectiveness (response rates, delivery funnel, sales tracking)
546  - Tab 4: Conversations (sentiment analysis, unread messages, conversation threads)
547  - Tab 5: Compliance (opt-out stats, rate limits, platform health)
548  - Tab 6: System Health (database metrics, cron jobs, code coverage, API limits)
549  
550  ##### Cron Job Integration
551  
552  - **Real-time Sync Status** (5-minute tasks)
553    - Display last sync times for: email events, unsubscribes, inbound SMS
554    - Show counts: new emails opened, new unsubscribes, new SMS messages
555    - Alert if sync hasn't run in > 10 minutes
556  - **Pipeline Queue Monitoring** (15-minute task)
557    - Display pipeline processing stats (sites processed, success rate)
558    - Show current queue depth and estimated completion time
559    - Alert if queue is backing up (> 1000 pending sites)
560  - **Database Health Metrics** (daily task)
561    - Display database size, integrity check status
562    - Show table row counts (sites, outreaches, conversations)
563    - Chart database growth over time
564  - **Code Quality Dashboard** (daily tasks)
565    - Display test coverage percentage with trend chart
566    - Show ESLint warning/error counts
567    - Display Sage AI review summary (if enabled)
568    - Show last lint/format/test run timestamps
569  - **Security & Dependencies** (daily/weekly tasks)
570    - Display npm audit vulnerability counts by severity
571    - Show outdated packages count with update recommendations
572    - Alert on critical security issues
573    - Display API rate limit health checks
574  - Compliance Reporting
575    - Total opt-outs by channel
576    - Messages blocked by compliance checks
577    - Business hours violations prevented
578  - **Profit Forecast** (weekly task)
  - Based on ../09-business/profit-estimates.md
580  - **Database Operations** (weekly tasks)
581    - Show last backup timestamp and file size
582    - Display database vacuum stats (space saved)
583    - Chart database performance metrics over time
584    - Show index usage statistics
585  - **Technical Debt Tracking** (monthly task)
586    - Display incomplete TODO.md task count
587    - Show completed vs incomplete task ratio
588    - Trend chart of technical debt over time
589  - **Cron Task Overview**
590    - ✅ Move the TASKS array in cron.js to a sqlite table `cron_jobs` with: name, handler, interval_time, interval_unit, last_run, status, duration, and storage for status summary and details
591    - ✅ Created migration 029-create-cron-jobs-table.sql
592    - ✅ Created CLI manager (src/cli/cron-manager.js) for job management
593    - ✅ Created new cron runner (src/cron.js) that reads from database
594    - ✅ Migration script (scripts/migrate-cron-tasks.js) to seed existing tasks
595    - ✅ npm scripts for easy management (cron:list, cron:enable, cron:disable, cron:add, cron:remove, cron:logs, cron:stats)
596    - ✅ Comprehensive documentation (../06-automation/cron-system.md)
597    - ⏳ Surface this in the Streamlit dashboard (deferred)
598      - Allow changes/additions of jobs via web UI
599      - Show a dashboard with the status of each task (core pipeline and cron jobs) and drill down to the details
600    - ⏳ Color-coded status indicators in dashboard (success=green, failed=red, skipped=yellow) (deferred)
601    - ⏳ Ability to view logs for each task in dashboard (deferred)
602    - ⏳ Manual trigger button for enabled tasks in dashboard (deferred)
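With the TASKS array migrated to the `cron_jobs` table, the runner's due-check reduces to something like this sketch (column names come from the task above; the unit names and status values are assumptions):

```javascript
// Sketch: is a cron_jobs row due? Uses interval_time / interval_unit /
// last_run / status columns from the migration described above.
const UNIT_MS = { minute: 60e3, hour: 3600e3, day: 86400e3 }; // assumed units
function isDue(job, now = Date.now()) {
  if (job.status === 'disabled') return false;
  if (!job.last_run) return true; // never run: due immediately
  const interval = job.interval_time * UNIT_MS[job.interval_unit];
  return now - Date.parse(job.last_run) >= interval;
}
```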
603  
604  #### Asset Collection
605  
- In `npm run assets`, don't log `[Capture] [WARN] Request failed` for third-party domains
- Look for unusual patterns in the asset collection logs
608  
609  #### Tech Debt
610  
- Mock up all integrations, then add mocking-based tests to increase coverage by 8-12% (e.g. `scrape.js`, `inbound/sms.js`)
- Remove unused code, such as:
  - Legacy POC/MVP/Process pipelines
  - `screenshot_optimization_tests` table
- Move integration tests and unit tests into separate subfolders
617  - Refactor tests to use in-memory SQLite databases (`:memory:`) instead of file-based databases:
618    - Currently tests run sequentially (`--test-concurrency=1`) to prevent race conditions on shared DB files
619    - In-memory DBs would allow parallel test execution → ~5x faster test runs
620    - Need to update `src/utils/test-db.js` and all test files that use `initTestDb(filePath)`
621    - After refactor, remove `--test-concurrency=1` flag from `package.json` test commands
622  - replace all hardcoded references to the low score cutoff of 82 with a global constant in .env or `config` table
623  - consider removing the `config` table and just using .env (or move all secrets from .env to the `config` table if that's a safer place?)
624  - Fix ESLint Warnings
625    - `getCountryByCode()` should throw error instead of defaulting to AU
626    - `getCountryByGoogleDomain()` should throw error instead of defaulting to AU
627    - Remove `DEFAULT_COUNTRY` from `.env.example` (not used in code)
628  
629  #### Human Tasks
630  
- Review key documentation to identify logic flaws
632  
633  #### Nice To Have
634  
635  - Redo architecture diagrams in mermaid (and make sure they're accurate)
636  
637  #### Final Testing
638  
639  - End-to-end test: Keyword → SERP → Scoring → Outreach → Track response
640  - Load test: 100 sites in one run
641  - Documentation: README.md with setup instructions
642  
643  ---
644  
645  **When asking Claude "what's next?":**
646  
647  - Reference this file instead of asking user
648  - Suggest next logical task from priority list
649  - Consider dependencies (e.g., need analytics before dashboard)
650  
651  ---
652  
653  ## Infrastructure & Networking
654  
655  **Worker USB Node — Future Considerations:**
656  
657  1. **Yggdrasil overlay network (P3 — consider once worker fleet is stable)**
658     - **What:** Decentralised encrypted IPv6 mesh; each node gets a cryptographically derived address. NetBird/WireGuard tunnel runs _inside_ Yggdrasil, hiding the WireGuard handshake from ISPs.
659     - **Why:** Reduces AITA (AI threat intel correlation) by obscuring which IP is contacting the VPS. ISPs see Yggdrasil traffic rather than raw WireGuard.
660     - **Why not yet:** Adds a third daemon to the USB boot chain (Yggdrasil → NetBird → Docker), increases boot complexity and failure modes. The VPN kill switch already encrypts all traffic.
661     - **Simpler alternatives first:** Residential proxy exit on the VPS, or Mullvad/IVPN exit routing.
662     - **NixOS module:** `services.yggdrasil.enable = true` — one line when ready.
663     - **Effort:** ~1 day (NixOS config + peer coordination)
664     - **Priority:** P3 — evaluate 3-6 months after worker fleet is running stably
665  
666  ---
667  
668  ## Enrichment Quality — Third-Party API Services
669  
670  **Goal:** Improve contact/company data quality during enrichment using third-party APIs. Currently we do our own contact extraction (Playwright + LLM) and email validation (ZeroBounce). Evaluate whether paid services yield better data at acceptable cost.
671  
672  **Services to evaluate (Outscraper and alternatives):**
673  
674  1. **Contact enrichment (Outscraper "Contact Enrichment")**
675     - Does it find contacts we're missing with our Playwright + Haiku extraction?
676     - Compare: hit rate, accuracy, cost per lookup vs our current ~$0.003/site
677     - Alternatives: Apollo.io, Hunter.io, Clearbit, RocketReach
678  
679  2. **HubSpot company contacts finder (Outscraper)**
680     - Useful if targets have HubSpot presence — may surface decision-maker names/emails
681     - Probably niche; skip unless our verticals are HubSpot-heavy
682  
683  3. **Email validation (Outscraper vs ZeroBounce)**
684     - We already have ZeroBounce — is Outscraper cheaper/better?
685     - Compare: cost per validation, accuracy, speed, catch-all handling
686     - Probably not worth switching unless ZeroBounce pricing becomes a problem
687  
688  4. **Phone number validation (Outscraper or alternatives)**
689     - **Key question:** Do we suffer reputational/ban/cost penalties from Twilio for sending SMS to invalid numbers?
690     - If Twilio charges per attempt or flags accounts for high invalid rates, validation pays for itself
691     - Alternatives: Numverify, Abstract API, Twilio Lookup API (we already pay Twilio — check if Lookup is cheaper than third-party)
692  
693  5. **Company insights / company type detection (Outscraper "Company Insights")**
694     - **High priority for GDPR:** Could help determine if a business is a sole trader / partnership vs limited company
695     - Currently we do DOM-based `searchCompanyTypes()` + LLM classification — accuracy is unknown
696     - A structured API response with company type/registration would be more reliable
697     - Alternatives: Companies House API (UK, free), ABN Lookup (AU, free), OpenCorporates, Endole
698     - **Note:** Free government APIs (Companies House, ABN) should be tried first before paying for this
699  
700  **Priority:** P2 — the GDPR sole trader detection (#5) is the most impactful. The rest are nice-to-have unless our current extraction quality drops.
701  
702  **Next step:** Benchmark current enrichment hit rate and GDPR classification accuracy to establish a baseline before evaluating any of these.
703  
704  ---
705  
706  ## Future Enhancements
707  
708  **Post-Implementation Improvements:**
709  
710  1. **Periodic Re-validation:**
711     - Cron job to re-validate monthly
712     - Update keywords based on search trends
713     - Store in `scripts/cron/validate-keywords.js`
714  
715  2. **Historical Tracking:**
716     - Store search volume history in database
717     - Track keyword trends over time
718     - Alert on significant SV changes
719  
720  3. **Multi-API Support:**
721     - Add Google Ads API provider
722     - Add Keyword Tool API provider
723     - Allow switching via `KEYWORD_VALIDATION_PROVIDER` env var
724  
725  4. **Advanced Filtering:**
726     - Filter by competition level (avoid ultra-competitive keywords)
727     - Filter by CPC (cost-per-click) if running ads later
728     - Filter by keyword intent (informational vs transactional)
729  
730  5. **Regional Variations:**
731     - Different keyword counts per country based on population/GDP
732     - Already implemented via GDP-prioritized processing
733  
734  6. **Agent Workflow Management:**
735     - Consider Jira-like ticketing system for agent task management
736     - Proper workflow enforcement: Architect → PO → Developer → Architect → QA
737     - Task dependencies, approvals, status tracking
738     - May be overkill for current scale - evaluate when agent system matures
739     - Alternative: Extend agent_tasks table with workflow fields (reviewed_by, approved_by, approval_status)
740  
741  ---
742  
743  ## Pricing & Revenue Optimization
744  
745  **Post Multi-Currency Launch:**
746  
747  1. **FX Rate Monitoring & Consolidation (P2 - Medium Complexity)**
748     - **Goal:** Minimize FX losses, maximize USD realization from currency holdings
749     - **Components:**
750       - Cron job to track USD/AUD, USD/EUR, USD/GBP rates hourly
     - Alert when rate hits a favorable threshold (e.g., USD/AUD > 1.55)
752       - Automated consolidation: Convert currency balance when optimal
753       - Dashboard widget showing currency holdings and conversion opportunities
754     - **Business Value:** 2-5% improvement in USD realization vs immediate conversion
755     - **Implementation:**
756       - Create `src/cron/fx-rate-monitor.js`
757       - Integrate with Fixer.io API (already used for weekly repricing)
758       - PayPal API integration for balance queries and currency conversion
759       - Add `fx_consolidation_log` table for tracking conversions and realized gains
760     - **Effort:** ~1-2 days (requires PayPal API integration)
761     - **Priority:** P2 (implement 1-2 months after multi-currency launch is stable)
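The threshold-alert step of the hourly cron could be sketched as a pure function (pair names come from the plan above; alert delivery and the PayPal conversion call are out of scope):

```javascript
// Sketch: compare latest FX rates against configured thresholds and
// emit human-readable alert messages for the dashboard / notifier.
function fxAlerts(rates, thresholds) {
  // rates / thresholds: objects keyed by pair, e.g. { 'USD/AUD': 1.57 }
  return Object.keys(thresholds)
    .filter(pair => rates[pair] >= thresholds[pair])
    .map(pair => `${pair} at ${rates[pair]} (threshold ${thresholds[pair]}): consider consolidating`);
}
```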
762  
763  2. **Split Test Pricing in Proposals (P2 - Low Complexity)**
764     - **Goal:** Optimize proposal-to-payment conversion rate
765     - **Hypothesis:** Not showing price may increase engagement (reduces sticker shock), showing price may pre-qualify leads (reduces time waste)
766     - **Test Design:**
767       - Variant A: Proposals mention local currency price upfront
768       - Variant B: Proposals omit pricing (control - current implementation)
769       - Variant C: Proposals mention "starting from X" messaging
770     - **Metrics to Track:**
771       - Engagement rate (opens, clicks)
772       - Reply rate (interested vs declined)
773       - Payment conversion rate
774       - Time from proposal to payment
775       - Average deal value by variant
776     - **Business Value:** 5-15% improvement in conversion funnel optimization
777     - **Implementation:**
778       - Update `src/proposal-generator-v2.js` to conditionally include pricing
779       - Add `proposal_pricing_variant` field to conversations table
780       - Dashboard query to analyze conversion by variant
781     - **Effort:** ~4-6 hours (pricing infrastructure already exists)
782     - **Priority:** P2 (implement 30+ days after multi-currency launch)
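Variant assignment should be deterministic per site so follow-ups stay consistent; one sketch is hashing the site id (the hash choice is illustrative, not a prescribed method):

```javascript
// Sketch: stable pricing-variant assignment for the split test.
// A: price upfront, B: no price (control), C: "starting from" messaging.
const VARIANTS = ['A', 'B', 'C'];
function pricingVariant(siteId) {
  let h = 0;
  for (const ch of String(siteId)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return VARIANTS[h % VARIANTS.length];
}
```

The chosen letter would be written to the proposed `proposal_pricing_variant` field for later conversion analysis.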
783  
784  3. **Stripe Migration Evaluation (P3 - Research Phase)**
785     - **Goal:** Evaluate Stripe as alternative payment processor if PayPal limitations arise
786     - **When to Consider:**
787       - If PayPal FX fees exceed $1,000/month (3% × $33K+ monthly revenue)
788       - If adding subscription/recurring billing (Stripe Billing is superior)
789       - If need better exchange rates (Stripe closer to spot rates)
790       - If customer feedback indicates preference for card entry over PayPal login
791     - **Stripe Advantages:**
792       - Lower fees: 2.9% + $0.30 (no separate 1.5% cross-border fee)
793       - Better developer experience (webhooks, API design)
794       - Native subscription billing with prorations, trials, etc.
795       - Better exchange rates (typically 1-2% better than PayPal)
796       - More granular analytics and reporting
797     - **Stripe Disadvantages:**
798       - Customer must enter card details (vs convenient PayPal login)
799       - No buyer/seller protection like PayPal disputes process
800       - Less familiar brand in some international markets (PayPal more trusted)
801     - **Migration Path:**
802       - Add Stripe as secondary payment option (A/B test conversion rates)
803       - Track which processor customers prefer by region
804       - Gradually shift to preferred processor per market
805       - Maintain both for 3-6 months before potentially consolidating
806     - **Effort:** ~2-3 days implementation + 1-2 weeks testing
807     - **Priority:** P3 (evaluate 6-12 months post-launch or if revenue > $30K/month)
808  
809  ## Recurring Tasks
810  
811  These tasks should be performed on a regular schedule:
812  
813  - **Monthly — Contact extraction benchmark:** Re-run `npm run benchmark:contacts` to evaluate whether newer/cheaper LLMs outperform the current `ENRICHMENT_MODEL`. Check if model upgrade ROI is positive. Update `MODELS` in `scripts/benchmark-contact-extraction.js` and `MODEL_PRICING` in `src/utils/llm-usage-tracker.js` when new models launch.
814    - Last run: _(never — run it for the first time with `npm run benchmark:contacts -- --dry-run` to preview cost)_
815    - Reports saved to: `reports/contact-extraction-benchmark-YYYY-MM-DD.md`
816  
817  - **Monthly — Test/prod path isolation audit:** Check for mixing of data and logging between test and production:
818    1. Verify `src/utils/logger.js` respects `LOGS_DIR` env var (not just hardcoded `./logs`)
819    2. Grep test files for hardcoded prod paths (`./logs/`, `./db/sites.db`) — should use `/tmp/test-*` or `:memory:`
820    3. Grep source files for hardcoded test paths (`/tmp/test-`) that shouldn't be in prod code
821    4. Run `npm test` and confirm nothing written to `./logs/` or `./db/sites.db`
822    - Known issues found (2026-03-03): Logger ignores `LOGS_DIR` env var (line 25), `process.integration.test.js` hardcodes prod DB path