---
title: 'TODO'
category: 'other'
last_verified: '2026-02-15'
related_files:
  - 'src/scrape.js'
  - 'src/capture.js'
  - 'src/utils/image-optimizer.js'
  - 'src/score.js'
  - 'src/utils/error-handler.js'
tags: ['TODO', 'cron', 'scheduling', 'testing', 'security', 'database', 'api', 'ai']
status: 'current'
---

## Implementation Tasks

**How to use this file:**

- Tasks organized by priority and effort (Quick Wins → MVP → Full System)
- Check this file before asking "what should we work on next?"
- Mark tasks complete with the green tick emoji ✅ as you finish them
- Add new tasks under the appropriate priority section

---

## OpenViking — Deferred Context Database (added 2026-03-14)

Integrate **OpenViking** (Apache-2.0, [github.com/volcengine/OpenViking](https://github.com/volcengine/OpenViking)) to reduce token usage via L0/L1/L2 tiered context loading.

**Pre-conditions before starting:**

1. Orchestrator context size logging shows dynamic context regularly >200KB/call
2. Marketing knowledge base (brand guidelines, past campaigns, competitor intel) populated

**Integration steps when ready:**

1. Add `openviking` Docker container to `modules/containers.nix` (HTTP port 1933, Python SDK server)
2. Node.js HTTP client in `src/agents/utils/context-loader.js` (~30 lines, replaces flat file reads)
3. Migrate `src/agents/contexts/*.md` + `prompts/` into OpenViking schema
4. Add env var `OPENVIKING_URL=http://localhost:1933`
5. Same NixOS container config deploys unchanged to VPS — no migration friction

**Why deferred:** Current dynamic context is ~0KB (7 static .md files, 94KB total). Token savings ~10-20% = ~$1/day if the agent system were re-enabled. Not worth the 6-8h setup. Break-even at >200KB dynamic context.

See: `docs/02-architecture/software-inventory.md` for FOSS/audit status.
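Step 2 of the integration (the ~30-line Node.js client) could look roughly like the sketch below. This is a draft under loud assumptions: the `/context/<agent>?tier=` route, its query parameter, and the plain-text response are invented for illustration, since the OpenViking schema migration has not been designed yet. Only `OPENVIKING_URL` comes from this plan.

```javascript
// Hypothetical sketch of src/agents/utils/context-loader.js.
// ASSUMPTION: the /context/<agent>?tier=<L0|L1|L2> route is invented for
// illustration; the real endpoints must come from the OpenViking docs.
const OPENVIKING_URL = process.env.OPENVIKING_URL || 'http://localhost:1933';

const TIERS = ['L0', 'L1', 'L2'];

// Build the URL for a tiered context fetch (L0 = cheapest summary layer).
function contextUrl(agent, tier = 'L0') {
  if (!TIERS.includes(tier)) throw new Error(`unknown tier: ${tier}`);
  return `${OPENVIKING_URL}/context/${encodeURIComponent(agent)}?tier=${tier}`;
}

// Replaces flat-file reads: fetch context over HTTP, throwing on non-2xx so
// the caller can fall back to the static .md files if the container is down.
async function loadContext(agent, tier = 'L0') {
  const res = await fetch(contextUrl(agent, tier));
  if (!res.ok) throw new Error(`OpenViking returned ${res.status} for ${agent}`);
  return res.text();
}

module.exports = { contextUrl, loadContext };
```

Keeping the old `src/agents/contexts/*.md` file read as a fallback path would preserve the "no migration friction" property while the container bakes in.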
---

## Reminders / One-Off Actions

- 🏷️ **Plan: rename site statuses for clarity (added 2026-03-11)** — Current status names are confusing (`scored` / `rescored` imply repetition; `rescored` is actually just "scored with vision"; `assets_captured` is internal jargon). Proposed rename:
  - `found` → `found` (fine)
  - `assets_captured` → `scraped` (HTML+screenshots captured, ready for scoring)
  - `scored` → `prog_scored` (programmatic score done, awaiting semantic LLM pass) — only used when ENABLE_VISION=true; with vision off, skipped
  - `rescored` → `scored` (fully scored: programmatic + semantic factors merged, final score computed; previously also meant "vision rescored")
  - `enriched` → `enriched` (contacts found, ready for proposals)
  - `proposals_drafted` → `proposals_drafted` (fine)
  - `high_score` → `skipped_high_score` (makes clear it's a deliberate skip, not a terminal failure)
  - `ignore` → `ignored` (grammatically consistent with other past-participle statuses)
  - Implementation notes: the rename requires a DB migration (`UPDATE sites SET status = ...` for each mapping), plus updates to all queries in `src/stages/`, `claude-batch.js`, `claude-store.js`, `monitoring-checks.sh`, `npm run status`, and any hardcoded strings in tests. Not urgent — do during a quiet period. Add the semantic_scored gate above as an interim fix first.

- 📊 **Recalibrate hybrid scorer once scored backlog clears (added 2026-03-09)** — Current R² of the programmatic-only scorer is -2.6 (expected — it's incomplete by design). Run `node scripts/calibrate-scorer.js 500` once a meaningful number of sites (~500+) have gone through the full split pipeline (programmatic + `score_semantic` Haiku batch). The calibration measures hybrid output vs LLM ground truth. Worth running when the orchestrator's `score_semantic` backlog has substantially drained.
  Check: `SELECT COUNT(*) FROM sites WHERE json_extract(score_json,'$.headline_quality') IS NOT NULL AND score IS NOT NULL;` — target 500+ before calibrating.

- 🧹 **Remove reword system after backlog drain (added 2026-03-09)** — Once all 4,901 pre-orchestrator approved-unsent messages have been reworded (check: `SELECT COUNT(*) FROM messages WHERE reworded_at IS NULL AND approval_status='approved' AND sent_at IS NULL AND direction='outbound' AND message_type='outreach'` returns 0), remove: reword batch types from `claude-orchestrator.sh`, `fetchRewordBatch()` from `claude-batch.js`, `storeReword()` from `claude-store.js`, and reword prompts from the orchestrator. The `reworded_at` column can stay (harmless). New proposals generated by the orchestrator already use the trust/proof/importance framework — no rewording needed.

- 📞 **Twilio number status (updated 2026-03-04)** — Numbers active: AU/NZ (+61468089949), CA (+18254794242), UK (+447446944610). US uses the CA number (NANP +1 — indistinguishable to recipients). IE uses the UK number. Old US number +17658856535 is **CANCELLED** — remove it from the Twilio console. Future expansion: IN, ZA, SG (~$2/month each — purchase when targeting those markets).

- 🇺🇸 **Register US A2P 10DLC for SMS marketing** — Required before scaling US SMS outreach. Current state: US SMS enabled using CA +18254794242 (works for testing/low volume). For production scale: register a Brand + Campaign via the Twilio console (~$10/month). Without registration, carriers may filter US-bound messages. See [Twilio A2P docs](https://help.twilio.com/articles/1260800720410). Also see the Duguid defense note below re: TCPA consent requirements.

- ⚖️ **Check TCPA Duguid defense with US lawyer** — Our system pulls specific phone numbers from a database (scraped from business websites), not random/sequential generation. Under _Facebook v. Duguid_ (2021), this may mean we're not an ATDS and TCPA automated consent requirements don't apply.
  Get a US telecom lawyer to confirm before scaling US SMS outreach. See BP Risk #1 for the full analysis with case law citations.

- 🏥 **Research regulated industry outreach requirements** — Healthcare (dental, medical, etc.) and financial (mortgage broker, accountant, etc.) sites are currently ignored (`status='ignore'`). Research per-industry compliance requirements (HIPAA marketing rules, financial services advertising regulations) to determine if/how we can outreach to these industries in future. Remove them from the `classifyIndustry()` ignore list once requirements are met.

- 📸 **Screenshot cleanup cron** — When `ENABLE_VISION` is re-enabled, implement a cron job to clean up old screenshots (currently unbounded storage growth). Low priority while vision is disabled.

- 🤖 **Reply-to-payment automation** — Automate the flow from prospect reply "YES" (conversation status `qualified`) → PayPal payment link generation → send payment link → webhook monitors completion → report delivery. Currently a manual step. The `conversations` table schema already supports the full flow.

- 🌐 **auditandfix.com order page: URL params for prefill** — When sending prospects to the website (instead of a bare PayPal link), pass query params so the order page is pre-filled: `?domain=example.com.au&country=AU&currency=AUD&email=owner@example.com`. The page should use these to: (1) display the correct local price, (2) pre-fill the checkout form, (3) pass `custom_id` to PayPal so payment is linked back to the right conversation. Update SMS reply templates to use `https://auditandfix.com/order?domain=...&country=...` instead of direct `paypal.com` links. **Website-side change** — requires editing `auditandfix.com` PHP.

- 📣 **[Audit&Fix] Gather real customer testimonials** — The sales page (auditandfix.com) currently uses placeholder testimonials (Sarah T., James K., Michelle R.). Once the first real reports are delivered, reach out to customers for genuine feedback quotes.
  Plan: email survey after report delivery, offer a follow-up benchmarking discount in exchange for a testimonial. Also set up Trustpilot or a Google Business Profile so reviews are verifiable. Update the `auditandfix.com/index.php` testimonials section and the star rating once real reviews exist.

- 🇵🇹 **Add Portugal (PT) to countries list** — A Portuguese translation (`lang/pt.json`) already exists (Brazilian Portuguese). When expanding to Portugal, add PT to `src/config/countries.js` with EUR currency, European Portuguese locale tweaks if needed, and the GDPR compliance flag.

- 🔒 **WPCloud Anti-Spam email obfuscation** (LOW PRIORITY) — Some sites use this plugin to blur email addresses behind a multi-step reveal flow (blurred link → click → popup → wait → OK → email shown). The plugin is only available to wpcloud.com customers, so affected sites are rare. If email extraction yield from these sites becomes measurably low, investigate: (1) clicking the mailto link to trigger the reveal, (2) intercepting network requests during the reveal popup, or (3) falling back to form outreach for wpcloud.com-hosted sites. Example site: cabanonetgarage.com/en/contact-us/

- 💰 **Cancel Hostinger VPS backup add-on** (~$36/year) once restic → Backblaze B2 is confirmed working. Whole-VPS snapshots from Hostinger are redundant once restic is in place and verified. Verify: `restic -r b2:method333-backups-prod snapshots` shows at least one successful snapshot, then cancel from Hostinger control panel → Billing → Add-ons.

- 📧 **Upgrade Resend to paid plan ($20/month) once sending >500 emails/day** — The current free plan allows 3,000 emails/month (100/day). Once daily email volume exceeds 100, upgrade to the $20/month plan (50,000 emails/month). At the $20 plan, add a dedicated sending domain via Resend → Domains for improved deliverability.
  Note: do NOT use the free plan for bulk cold outreach — Resend's free tier is intended for transactional/notification emails, not marketing campaigns. Violating this risks account suspension. Check current send volume: `npm run status`.

---

## 🔥 High Priority — Claude Code AFK Pipeline (Proposals + Scoring QA)

**Goal**: Shift proposal generation and scoring QA from paid API calls (OpenRouter/Anthropic) into Claude Code AFK cycles, using the Claude Max subscription at zero incremental LLM cost.

**Context**: Claude Code already runs autonomous 30-minute AFK check cycles. Proposal generation and scoring QA are just analysis work — read site data, write text. Claude Code does this natively. This eliminates ~$45/month in API costs (Claude API $30 + OpenRouter proposals $15) and produces higher-quality output (Opus vs GPT-4o-mini/Haiku).

**Architecture:**

Each 30-minute AFK cycle adds a "pipeline work" phase after health checks:

```
AFK cycle (30 min):
  1. Run monitoring-checks.sh (2-3 min) — existing health monitoring
  2. PROPOSAL GENERATION (new):
     a. Query: sites at status='enriched' needing proposals (LIMIT 20)
     b. For each site: read score_json, contacts, keyword from DB
     c. Generate proposal text per PROPOSAL.md guidelines
     d. INSERT INTO messages table (direction='outbound', approval_status='pending')
     e. UPDATE sites SET status='proposals_drafted'
  3. SCORING QA (new, after programmatic scorer is built):
     a. Query: 5 random recently-scored sites
     b. Read their HTML summary + programmatic score
     c. Spot-check: does the score feel right? Flag outliers for rescore.
  4. Sleep 30 min → repeat
```

**Throughput estimate:**

- Each proposal needs ~2KB context (domain, score_json, contacts, keyword)
- Claude Code can handle ~20 proposals per cycle before context pressure
- 48 cycles/day × 20 = **~960 proposals/day at $0 incremental cost**
- Current need: ~100 proposals/day → easily covered with capacity to spare

**Steady-state monthly cost impact:**

| Cost category                     | Before (API) | After (Claude Code)      | Savings    |
| --------------------------------- | ------------ | ------------------------ | ---------- |
| Claude API (proposals)            | $30/mo       | $0                       | $30        |
| OpenRouter (proposals/enrichment) | $15/mo       | ~$5/mo (enrichment only) | $10        |
| **Total LLM costs**               | **$45/mo**   | **~$5/mo**               | **$40/mo** |

**Implementation:**

- [ ] Add "pipeline work" phase to AFK cycle (after monitoring-checks.sh, before sleep)
- [ ] Write `scripts/pull-proposal-batch.sh` — queries DB, outputs site data as JSON for Claude Code to process
- [ ] Define proposal output format that Claude Code writes directly (SQL INSERT or JSON file that a loader script imports)
- [ ] Add scoring QA phase (after programmatic scorer is built)
- [ ] Add proposal count tracking to AFK progress reports

**Not in scope:** Enrichment still uses Playwright + Haiku via API (browser automation can't run inside Claude Code context). This is ~$5/month and not worth optimizing.

---

## 🔥 High Priority — Programmatic (Rule-Based) Scoring

**Goal**: Replace LLM-based scoring with a deterministic rule engine that analyzes HTML/DOM directly — zero LLM cost, faster, explainable.

**Context**: LLM scoring (GPT-4o-mini via OpenRouter) cost $349 for 77,555 calls in the first 12 days — the single largest cost driver. With `SKIP_STAGES=scoring,rescoring` now set, new scoring is paused.
But as the pipeline drains, we need a sustainable scoring approach that doesn't cost $0.004/site.

**What to score programmatically (from HTML/DOM):**

- Phone number visible above the fold (regex on `<header>`, first viewport HTML)
- CTA button present above fold (`<a>`, `<button>` with action words: call, get, book, contact, quote)
- Trust signals in HTML: testimonial sections, star ratings, review counts, accreditations
- Contact form present on landing page
- Mobile-responsive meta tag (`viewport`)
- Page load indicators: `<img>` count, inline CSS/JS bloat
- Business name/location in `<title>` or `<h1>` (local SEO signal)
- Social proof: mentions of "years experience", "projects completed", "clients served"
- SSL/HTTPS (from URL)
- Google Maps embed or address visible

**Scoring**: Each factor adds/subtracts from a base score. Weight by conversion impact. Output: score 0-100, grade A+ to F, `score_json` with per-factor breakdown — identical schema to current LLM output so downstream stages need no changes.

**Why high priority**: Reduces ongoing LLM costs from ~$350/month → $0 for scoring. Enables resuming the pipeline at scale without burning OpenRouter credits. Also makes scoring auditable and debuggable.
**Implementation notes:**

- New module: `src/utils/programmatic-scorer.js`
- Input: HTML from `assets_captured` stage (already stored)
- Output: same JSON schema as current `score_json`
- Run A/B test: compare programmatic scores vs LLM scores on a sample of already-scored sites to validate calibration
- If correlation is >80%, replace LLM scoring entirely

**Sub-tasks:**

- [ ] Define scoring rubric (factors + weights) based on current LLM score_json schema
- [ ] Implement `src/utils/programmatic-scorer.js` with factor detection functions
- [ ] Add unit tests against known-scored sites from DB
- [ ] Calibration: run on 500 already-scored sites, compare to LLM scores
- [ ] If validated: update `src/stages/scoring.js` to call programmatic scorer when `ENABLE_LLM_SCORING=false`
- [ ] Update `.env.example` with `ENABLE_LLM_SCORING` flag

---

## 🔥 High Priority — LLM-Powered Autoresponder

**Goal**: Automatically handle inbound replies using an LLM, so prospects get a timely, intelligent response without manual operator intervention. This is the critical link between outreach and revenue.

**Context**: Inbound SMS/email polling is already implemented (`src/inbound/sms.js`, `src/inbound/email.js`). Sentiment + intent classification is already running. The `conversations` table has `direction`, `replied_at`, and `intent` columns. What's missing is the LLM step that reads the conversation context and drafts (and optionally sends) a reply.

### Sub-tasks

- [ ] **Conversation context loader** — Given a `conversation_id`, fetch the full thread: original outreach proposal text, site metadata (domain, grade, score, industry), all prior inbound/outbound messages. This is the LLM's context window.

- [ ] **Reply prompt** (`prompts/autoresponder.md`) — Persona: friendly web consultant.
  Goals by intent:
  - `interested` / `qualified` → confirm interest, send PayPal payment link (or ask a qualifying question if not enough info)
  - `objection` → handle the objection (price, timing, relevance) with a short, confident rebuttal
  - `not_interested` → graceful opt-out acknowledgement, honour STOP/unsubscribe
  - `autoresponder` / `out_of_office` → no reply (skip)
  - `unknown` → ask a clarifying question
  - Tone rules: concise, channel-appropriate (SMS ≤160 chars, email can be longer), no corporate jargon, no fake urgency

- [ ] **`src/inbound/autoresponder.js`** — Core module:
  - `shouldAutoRespond(conversation)` — returns false for: already replied, opt-out, autoresponder intent, `replied_at IS NOT NULL`
  - `generateReply(conversation)` — builds context, calls LLM (Claude Haiku preferred for cost), returns draft reply text
  - `sendReply(conversation, text)` — writes a `direction='outbound'` row to `conversations`, then calls the existing `processPendingReplies()` in `src/inbound/sms.js` / `src/inbound/email.js`
  - Log to `llm_usage` table with `stage='autoresponder'`

- [ ] **Human-in-the-loop mode** (default ON) — `AUTORESPONDER_AUTO_SEND=false` in `.env`. When false: generate the reply and insert it into `conversations` as `direction='outbound', replied_at=NULL` for operator review in the dashboard before sending. When true: send immediately.

- [ ] **Cron job** (`src/cron/autoresponder.js`) — Runs every 5 minutes. Finds conversations where `direction='inbound'`, no outbound reply exists, and intent is not `autoresponder`/`not_interested`. Calls `autoresponder.js` for each. Respects a 72h cooldown (don't reply to something days-old without operator review).

- [ ] **Dashboard integration** — "Pending autoresponder drafts" count in the Outreach Trust panel. Operator can approve/edit/discard drafts from the Human Review page.
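The `shouldAutoRespond()` guard described above might be sketched as follows. Only `direction`, `replied_at`, and `intent` are confirmed `conversations` columns in this plan; the `opted_out` and `received_at` field names are assumptions for illustration, and the skip list mirrors the cron job's intent filter rather than a settled design.

```javascript
// Draft of shouldAutoRespond() for src/inbound/autoresponder.js.
// ASSUMPTION: `opted_out` and `received_at` are illustrative field names;
// only direction/replied_at/intent are confirmed conversations columns.
const SKIP_INTENTS = new Set(['autoresponder', 'out_of_office', 'not_interested']);
const COOLDOWN_MS = 72 * 60 * 60 * 1000; // 72h: older threads go to operator review

function shouldAutoRespond(conversation, now = Date.now()) {
  if (conversation.direction !== 'inbound') return false;
  if (conversation.replied_at != null) return false; // already handled
  if (conversation.opted_out) return false; // honour STOP/unsubscribe
  if (SKIP_INTENTS.has(conversation.intent)) return false; // mirrors cron filter
  const age = now - new Date(conversation.received_at).getTime();
  return age <= COOLDOWN_MS;
}

module.exports = { shouldAutoRespond };
```

Taking `now` as a parameter keeps the 72h cooldown unit-testable without mocking the clock, which suits the planned `shouldAutoRespond()` unit tests.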
- [ ] **Tests** — Unit tests for `shouldAutoRespond()`, `generateReply()` (mocked LLM), and the intent-to-action routing logic. Integration test with a real inbound SMS conversation in a test DB.

**Dependencies**: `src/inbound/sms.js`, `src/inbound/email.js`, `src/inbound/processor.js`, `src/payment/paypal.js` (for payment link generation), `prompts/` directory, `llm_usage` table.

**Estimated effort**: Claude: 4h, Human: 8h

---

## Known Issues / Bugs

**Test Suite Issues:**

- `scripts/test-deduplication.js`: `r.reason` null crash → added null-coalescing guard
- `scripts/test-resend-webhook-sig.js`: `process.exit(1)` when env var missing → changed to `exit(0)` with SKIP message (manual integration test, not unit test)
- `npm test` suite (`tests/**/*.test.js`) was always clean; failures were in `scripts/test-*.js` picked up by bare `node --test`

---

## ✅ POC (Priority 1)

**Goal**: Validate approach with 1,000 sites, test screenshot optimization

#### ✅ Setup

- `npm init -y && npm i playwright playwright-extra puppeteer-extra-plugin-stealth better-sqlite3 sharp axios`
- Create `.env` file with API keys (ZenRows, OpenRouter)
- Create `db/schema.sql` with enhanced schema + keywords table + config table
- Initialize database: `node scripts/init-db.js`
- Database migrations system created

#### ✅ Core Pipeline

- `src/scrape.js`: ZenRows SERP scraping (`scrapeSERP` function)
- `src/capture.js`: Screenshot capture with Playwright
  - Desktop viewport (1440×900): above-fold + below-fold
  - Mobile viewport (390×844): above-fold only
  - Full rendered DOM via `page.content()`
  - Cropped + uncropped versions for all screenshots
  - Network idle wait strategy + timeout handling
- `src/utils/image-optimizer.js`: Screenshot optimization with Sharp (**100% test coverage**)
  - Smart cropping (remove nav/footer margins)
  - Uncropped versions preserve full content (fit: 'inside')
  - JPEG optimization at quality 85
  - 91-96% file size reduction achieved
- `src/score.js`: OpenRouter GPT-4o-mini scoring
  - Load Conversion Scoring Prompt from file
  - Send screenshots + DOM
  - Parse JSON response, store in conversion_score_json
  - Resubmit prompt for low scores
- `src/utils/error-handler.js`: Retry logic with exponential backoff (**100% test coverage, 30 tests**)
- `src/utils/logger.js`: Console logging for Cline monitoring (**100% test coverage**)
- `src/utils/keyword-manager.js`: Keyword tracking system
- `scripts/backfill-keywords.js`: Backfill existing keywords

#### ✅ Main Orchestration

- `src/poc.js`: Main POC pipeline (processes keyword → SERP → capture → score)
- `src/process.js`: Full processing pipeline with queue management
- Test: Successfully processed multiple keywords with score tracking

#### ✅ Quality Infrastructure

- ESLint configuration (0 errors)
- Prettier formatting (enforced)
- Node.js native test framework (386 tests passing, 60.5% coverage)
- c8 coverage reporting (keywords stage at 98.66% coverage)
- Sage AI review integration
- Quality check script (`npm run quality-check`)
- Keywords stage comprehensive test suite (16 tests, 100% coverage)

#### ✅ New Features Added

- Screenshot backfill system (`scripts/backfill-screenshots.js` + unit tests)
- Config-based low score threshold (default: 82)
- Comprehensive test suite (42 → 386 tests)
- Database indexes for performance
- Nix shell environment with command documentation
- Keywords table schema fix (migration 013: added priority, status, search_count columns)
- X and LinkedIn integration into stage-based outreach pipeline

---

## ✅ MVP (Priority 2)

**Goal**: End-to-end outreach with SMS + contact forms

**Status**: 100% complete.
All components implemented and integrated.

**What's working**: Proposals, contact prioritization, all outbound channels (SMS, Email, Forms), inbound handling (SMS & Email), full orchestration pipeline

#### ✅ Proposal Generation

- `src/proposal-generator-v2.js`: Generate N unique proposals (one per contact) via OpenRouter
  - Loads prompts from `docs/prompts/PROPOSAL.md`
  - Includes competitor data, scores, contact channels
  - Generates dynamic number of proposals based on contacts found (N contacts → N proposals)
  - Each proposal tagged with recommended channel and personalized to contact
  - Channel-optimized (SMS: <160 chars, email: detailed, etc.)
  - V1 deprecated code removed

#### ✅ Contact Prioritization

- `src/contacts/prioritize.js`: Contact method decision logic
  - Parse contacts_json from database
  - Apply priority: SMS > Email > Form > X > LinkedIn
  - Match variant to channel (see architecture docs)
  - Return array of {channel, uri, variant} objects
  - CLI commands: `update <site_id>`, `bulk [limit]`, `report`
  - Core logic tested (16/20 tests passing, integration tests have isolation issues)

#### ✅ Outreach Channels

- `src/outreach/email.js`: Resend SMTP integration
  - Plain text with minimal HTML for links/buttons
  - Include unsubscribe link (CAN-SPAM)
  - Append signature from config table
  - Channel-optimized detailed proposals via email
- `src/outreach/sms.js`: Twilio SMS integration
  - Channel-optimized short proposals to mobile numbers
  - Store delivery_status via Twilio webhooks
  - Handle Twilio errors gracefully
- `src/outreach/form.js`: Contact form automation
  - Playwright form filling with channel-optimized proposals
  - Best-guess field mapping
  - Inject operator panel (integrated into form.js)
  - Use `await page.pause()` for human review
- `src/outreach/x.js`: X/Twitter DM automation
  - Playwright-based
headed browser automation
  - Persistent browser profiles with LRU rotation
  - Aggressive stealth mode for bot detection avoidance
  - Manual login support with session persistence
  - Integrated into stage-based pipeline (`npm run outreach x`)
- `src/outreach/linkedin.js`: LinkedIn message automation
  - Playwright-based headed browser automation
  - Persistent browser profiles with LRU rotation
  - Aggressive stealth mode for bot detection avoidance
  - Manual login support with session persistence
  - Integrated into stage-based pipeline (`npm run outreach linkedin`)

#### ✅ Inbound Handling

`src/inbound/email.js`: Resend API polling for inbound emails

- Poll Cloudflare R2 for email.received events
- Fetch full email content via Resend Received Emails API
- Parse email body to remove quoted text
- Detect sentiment (positive/neutral/objection) via keyword matching
- Map sender email to outreach
- Store in conversations table with direction='inbound'
- Process pending operator replies from conversations table

`src/inbound/sms.js`: Twilio API polling

- Poll Twilio Messages API every 5 minutes: `GET /2010-04-01/Accounts/{AccountSid}/Messages.json`
- Filter by DateSent > last_poll_time
- Map incoming SMS to outreach via phone number
- Store in conversations table with direction='inbound'
- Mark for operator review
- Process new replies in conversations table to send SMS from operator back to prospect

#### ✅ Orchestration

- Full MVP pipeline
- Expand POC pipeline to full pipeline
- Filter sites scoring B- to E
- Generate proposals for each
- Prioritize contacts
- Track in outreaches table
- Send via available channels
- Poll & handle new replies (from operator to prospect) in conversations table
- Implemented in `src/mvp.js` with commands: run, poc, propose, send, replies
- Also available as stage-based pipeline
in `src/all.js` (recommended)

---

## Full System (Priority 3) - ~30-40h

**Goal**: Multi-channel, inbound handling, analytics dashboard, multi-country support

**Recommended order:**

1. Complete email inbound handling (builds on existing webhook infrastructure)
2. Implement analytics/tracking (uses existing data)
3. Build Streamlit dashboard (visualizes existing metrics)
4. Add multi-country support (requires testing infrastructure)
5. Implement GDPR compliance (EU-specific, lower priority)

**Note**: Many tasks below can be done in parallel or deferred based on business needs

#### Multi-Country Support (~6-10h)

See `../09-business/market-expansion.md` for detailed implementation plan.

- ⏳ Update README with country support docs (deferred)
- ⏳ Create country-specific keyword files (deferred)
- **Phase 6: GDPR Compliance for EU/UK** ✅ (COMPLETED - 5h)
  - ✅ Created `src/utils/gdpr-verification.js` module (98.74% test coverage)
  - ✅ Free email provider filter list (75+ domains: gmail, outlook, gmx, web.de, btinternet, etc.)
423 - ✅ Implemented DOM search for company type strings (searchCompanyTypes function) 424 - ✅ Implemented DOM search for registration keywords (searchCompanyKeywords function) 425 - ✅ GDPR configuration already exists in `countries.js` (companyTypes, companyKeywords, keyPageNames) 426 - ✅ Created `tests/gdpr-verification.test.js` - all 6 tests passing 427 - ✅ Database schema: Migration 027 adds `company_proof`, `gdpr_verified`, `gdpr_verified_at` columns 428 - ✅ Reporting view: `gdpr_verification_report` view created for verification statistics by country 429 - ✅ Enrichment integration: Calls `batchVerifyEmails()` for all EU/UK sites during enrich stage 430 - ✅ Automatic verification: Runs after browsing contact pages, uses HTML content for company type/keyword search 431 - ✅ Database storage: Stores verification results in `company_proof` JSON, sets `gdpr_verified` flag 432 - ✅ Outreach filtering: Migration 028 adds 'gdpr_blocked' status to outreaches table 433 - ✅ Proposals stage integration: Checks gdpr_verified flag, blocks unverified emails automatically 434 - ✅ Automatic skip: Outreach stage only processes 'pending' status (gdpr_blocked emails are skipped) 435 - ✅ Country code fix: Added GB→UK alias mapping for ISO 3166-1 alpha-2 codes 436 - ✅ Sites with forms: Modified enrichment to run GDPR verification on ALL sites (not just those without forms) 437 - ✅ End-to-end testing: Verified complete flow with UK sites (enrichment → proposals → outreach) 438 - ✅ Browser automation for key pages (DEFERRED - enrichment already browses contact pages with company info) 439 440 #### ✅ Additional Outbound Channels (COMPLETED) 441 442 - Manual-assist headed browser automation 443 - Playwright pause for human login/review 444 - Persistent profile sessions with LRU rotation 445 - Integrated into stage-based pipeline 446 447 #### Email Deliverability — Secondary Verification (MillionVerifier) 448 449 - ZeroBounce returns `unknown` for ~38% bounce-rate addresses it can't 
verify (greylisting, no MX response)
- These are currently **parked** as `retry_later` with `error_message='zb_unknown: parked pending secondary verification (MillionVerifier)'` and `retry_at = +30 days`
- **TODO**: Integrate [MillionVerifier](https://www.millionverifier.com) as secondary validator (prepay, ~$47/50k credits)
  - Call MillionVerifier API only for `zb_status='unknown'` addresses (fail-closed: if also unknown → skip send)
  - Cache result in new `mv_status` column (migration needed)
  - On success: reset `retry_later` → `approved` for addresses MillionVerifier marks valid/catch-all
  - On toxic: mark `failed` permanently
  - See `src/utils/zerobounce.js` for ZeroBounce pattern to follow

#### Tracking & Analytics

- Resend built-in tracking (opens, clicks, bounces, complaints)
  - Cloudflare Worker receives webhooks and stores in R2
  - Local sync script polls worker and updates SQLite
  - Tracking data: `opened_at`, `tracking_clicked_at`, `email_id` in outreaches table
  - Setup: Deploy `resend-webhook-worker.js`, configure Resend webhook URL
  - Sync: `node src/utils/sync-email-events.js` (run every 5 minutes via cron)
- **Sales Tracking Enhancements**
  - Improve dashboard sales tracking with more detailed metrics
  - Add revenue attribution by keyword
  - Track time-to-conversion (first outreach → sale)
  - Add sales funnel visualization (outreach → response → meeting → sale)
  - Consider CRM integration for automated sale recording
- **Negative Sentiment Analysis**
  - Analyze negative sentiment responses to identify common complaints
  - Group objections by category (price, timing, not interested, already have solution)
  - Track complaint frequency over time
  - Use insights to refine proposal templates and reduce objections
  - Dashboard visualization of top objection types
- Metrics collection with something like StatsD/Prometheus
  - Feed these into analytics
dashboard
- `src/analytics/`: Analytics calculations
  - Response rate by channel, keyword, time-of-day
  - Conversion funnel: sent → clicked → replied → sale
  - Revenue per channel (from sale_amount field)

#### Proposal Improvements

- **Translate score_json fields for non-English proposals**
  - Problem: `[primary_weakness]`, `[evidence]`, `[reasoning]` in templates come directly from the English LLM scoring output, producing mixed-language proposals (e.g. "Hauptproblem: no urgency messaging")
  - Solution: After `extractTemplateFields()` in `generateTemplateProposal()`, call Haiku to translate `primaryWeakness` and `evidence` into the target language when `languageCode !== 'en'`
  - Use a simple prompt: "Translate the following short phrase to {language}. Reply with ONLY the translation, no explanation: {text}"
  - Cache by (text, languageCode) pair in-memory to avoid duplicate API calls within a run
  - Cost: ~$0.0001/call × ~3 fields × non-English sites — negligible
  - File: `src/utils/template-proposals.js` → `generateTemplateProposal()`
  - Also consider: translate score_json weakness fields at scoring time and store translated versions in score_json for the detected locale

- **Safe Conservative Recommendations** (DEFERRED until data quality validated)
  - Add specific actionable recommendations to proposals once response rates and data quality are validated
  - Focus on high-confidence recommendations that rarely backfire:
    - Missing SSL certificate → "Add HTTPS security"
    - No mobile viewport tag → "Fix mobile responsiveness"
    - Missing alt text on images → "Improve accessibility"
    - Slow page load (from scoring data) → "Optimize page speed"
  - Extract recommendations from scoring JSON (already collected during scoring stage)
  - Consider adding HTTP headers and SSL status collection during assets stage
  - Update scoring rubric to explicitly evaluate SSL/HTTPS status
  - Prerequisites:
    - Establish baseline response rates with score-only proposals
    - Validate that data quality is sufficient for accurate recommendations
    - Monitor complaint rates (<0.1% target) before adding recommendations
    - A/B test score-only vs. score + recommendation to measure conversion lift

#### Consolidation

- Rearrange the various `npm run` and node scripts into a more sensible hierarchy
- Revamp the cron job system

#### Enhanced Inbound Handling

**Email Implementation Options:**

- Webhook event types: email.received, email.opened, email.clicked, email.complained, email.bounced, email.failed, email.suppressed

- Poll Cloudflare R2 for email.received events from the Resend Webhooks API
  - Use the Resend Received Emails API to get the actual email body
  - Map the sender email to sites.domain or the original recipient
  - Parse out quoted text; store the original in raw_payload
  - Detect sentiment (positive/objection) via simple keyword matching
  - Store in the conversations table with direction='inbound'
  - CLI commands: `poll`, `process-replies`
  - npm scripts: `npm run inbound:email`

- Route SMS and Email to the conversations table
  - Thread by outreach_id
  - Mark conversations for operator review
  - CLI commands: `poll`, `process-replies`, `inbox`, `thread`, `stats`
  - npm scripts: `npm run inbound:poll`, `npm run inbound:inbox`, `npm run inbound:stats`

- `tests/inbound-email.test.js` - unit tests for email functions
- `tests/inbound-processor.test.js` - unit tests for processor functions

#### ✅ Streamlit Dashboard (COMPLETED)

- Tab 1: Overview (stats, sales $, pipeline funnel, top errors)
- Tab 2: Pipeline Health (error analysis, stuck sites, throughput)
- Tab 3: Outreach Effectiveness (response rates, delivery funnel, sales tracking)
- Tab 4: Conversations (sentiment analysis, unread messages, conversation threads)
- Tab 5: Compliance (opt-out stats, rate limits, platform health)
- Tab 6: System Health (database metrics, cron jobs, code coverage, API limits)

##### Cron Job Integration

- **Real-time Sync Status** (5-minute tasks)
  - Display last sync times for: email events, unsubscribes, inbound SMS
  - Show counts: new emails opened, new unsubscribes, new SMS messages
  - Alert if a sync hasn't run in > 10 minutes
- **Pipeline Queue Monitoring** (15-minute task)
  - Display pipeline processing stats (sites processed, success rate)
  - Show current queue depth and estimated completion time
  - Alert if the queue is backing up (> 1000 pending sites)
- **Database Health Metrics** (daily task)
  - Display database size and integrity check status
  - Show table row counts (sites, outreaches, conversations)
  - Chart database growth over time
- **Code Quality Dashboard** (daily tasks)
  - Display test coverage percentage with a trend chart
  - Show ESLint warning/error counts
  - Display Sage AI review summary (if enabled)
  - Show last lint/format/test run timestamps
- **Security & Dependencies** (daily/weekly tasks)
  - Display npm audit vulnerability counts by severity
  - Show outdated package count with update recommendations
  - Alert on critical security issues
  - Display API rate limit health checks
- **Compliance Reporting**
  - Total opt-outs by channel
  - Messages blocked by compliance checks
  - Business-hours violations prevented
- **Profit Forecast** (weekly task)
  - Based on `../09-business/profit-estimates.md`
- **Database Operations** (weekly tasks)
  - Show last backup timestamp and file size
  - Display database vacuum stats (space saved)
  - Chart database performance metrics over time
  - Show index usage statistics
- **Technical Debt Tracking** (monthly task)
  - Display incomplete TODO.md task count
  - Show completed vs incomplete task ratio
  - Trend chart of technical debt over time
- **Cron Task Overview**
  - ✅ Moved the TASKS array in cron.js to a SQLite table `cron_jobs` with: name, handler, interval_time, interval_unit, last_run, status, duration, and storage for the status summary and details
  - ✅ Created migration 029-create-cron-jobs-table.sql
  - ✅ Created a CLI manager (src/cli/cron-manager.js) for job management
  - ✅ Created a new cron runner (src/cron.js) that reads from the database
  - ✅ Migration script (scripts/migrate-cron-tasks.js) to seed existing tasks
  - ✅ npm scripts for easy management (cron:list, cron:enable, cron:disable, cron:add, cron:remove, cron:logs, cron:stats)
  - ✅ Comprehensive documentation (../06-automation/cron-system.md)
  - ⏳ Surface this in the Streamlit dashboard (deferred)
    - Allow changes/additions of jobs via the web UI
    - Show a dashboard with the status of each task (core pipeline and cron jobs) and drill down into the details
  - ⏳ Color-coded status indicators in the dashboard (success=green, failed=red, skipped=yellow) (deferred)
  - ⏳ Ability to view logs for each task in the dashboard (deferred)
  - ⏳ Manual trigger button for enabled tasks in the dashboard (deferred)

#### Asset Collection

- In `npm run assets`, don't log "[Capture] [WARN] Request failed" for third-party domains
- Look for unusual patterns in the asset collection logs

#### Tech Debt

- Mock all integrations, then add mocking tests to increase coverage by 8-12% (e.g. scrape.js, inbound/sms.js)
- Remove unused code, such as:
  - legacy POC/MVP/Process pipelines
  - the `screenshot_optimization_tests` table
- Move integration tests into their own subfolder, and unit tests into theirs
- Refactor tests to use in-memory SQLite databases (`:memory:`) instead of file-based databases:
  - Currently tests run sequentially (`--test-concurrency=1`) to prevent race conditions on shared DB files
  - In-memory DBs would allow parallel test execution → ~5x faster test runs
  - Need to update `src/utils/test-db.js` and all test files that use `initTestDb(filePath)`
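One way the refactored helper could resolve paths, sketched below. This is only the path-resolution guard; the real `initTestDb` in `src/utils/test-db.js` would hand the result to the SQLite constructor. The `/tmp/test-` prefix check mirrors the isolation-audit convention used elsewhere in this file.

```javascript
// Sketch: default to ':memory:' (isolated, parallel-safe) and refuse
// anything that looks like a prod path. The real initTestDb would pass the
// resolved value to the SQLite driver.
function resolveTestDbPath(explicitPath) {
  if (!explicitPath) return ':memory:';
  if (explicitPath === ':memory:' || explicitPath.startsWith('/tmp/test-')) {
    return explicitPath;
  }
  // Fail loudly rather than silently touching ./db/sites.db from a test.
  throw new Error(`Refusing non-test DB path: ${explicitPath}`);
}
```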
  - After the refactor, remove the `--test-concurrency=1` flag from `package.json` test commands
- Replace all hardcoded references to the low-score cutoff of 82 with a global constant in .env or the `config` table
- Consider removing the `config` table and just using .env (or move all secrets from .env to the `config` table, if that's a safer place?)
- Fix ESLint warnings
  - `getCountryByCode()` should throw an error instead of defaulting to AU
  - `getCountryByGoogleDomain()` should throw an error instead of defaulting to AU
  - Remove `DEFAULT_COUNTRY` from `.env.example` (not used in code)

#### Human Tasks

- Review key documentation to identify logic flaws

#### Nice To Have

- Redo architecture diagrams in Mermaid (and make sure they're accurate)

#### Final Testing

- End-to-end test: Keyword → SERP → Scoring → Outreach → Track response
- Load test: 100 sites in one run
- Documentation: README.md with setup instructions

---

**When asking Claude "what's next?":**

- Reference this file instead of asking the user
- Suggest the next logical task from the priority list
- Consider dependencies (e.g., analytics is needed before the dashboard)

---

## Infrastructure & Networking

**Worker USB Node — Future Considerations:**

1. **Yggdrasil overlay network (P3 — consider once the worker fleet is stable)**
   - **What:** Decentralised encrypted IPv6 mesh; each node gets a cryptographically derived address. The NetBird/WireGuard tunnel runs _inside_ Yggdrasil, hiding the WireGuard handshake from ISPs.
   - **Why:** Reduces AITA (AI threat intel correlation) by obscuring which IP is contacting the VPS. ISPs see Yggdrasil traffic rather than raw WireGuard.
   - **Why not yet:** Adds a third daemon to the USB boot chain (Yggdrasil → NetBird → Docker), increasing boot complexity and failure modes. The VPN kill switch already encrypts all traffic.
   - **Simpler alternatives first:** Residential proxy exit on the VPS, or Mullvad/IVPN exit routing.
   - **NixOS module:** `services.yggdrasil.enable = true` — one line when ready.
   - **Effort:** ~1 day (NixOS config + peer coordination)
   - **Priority:** P3 — evaluate 3-6 months after the worker fleet is running stably

---

## Enrichment Quality — Third-Party API Services

**Goal:** Improve contact/company data quality during enrichment using third-party APIs. Currently we do our own contact extraction (Playwright + LLM) and email validation (ZeroBounce). Evaluate whether paid services yield better data at acceptable cost.

**Services to evaluate (Outscraper and alternatives):**

1. **Contact enrichment (Outscraper "Contact Enrichment")**
   - Does it find contacts we're missing with our Playwright + Haiku extraction?
   - Compare: hit rate, accuracy, cost per lookup vs our current ~$0.003/site
   - Alternatives: Apollo.io, Hunter.io, Clearbit, RocketReach

2. **HubSpot company contacts finder (Outscraper)**
   - Useful if targets have a HubSpot presence — may surface decision-maker names/emails
   - Probably niche; skip unless our verticals are HubSpot-heavy

3. **Email validation (Outscraper vs ZeroBounce)**
   - We already have ZeroBounce — is Outscraper cheaper/better?
   - Compare: cost per validation, accuracy, speed, catch-all handling
   - Probably not worth switching unless ZeroBounce pricing becomes a problem

4. **Phone number validation (Outscraper or alternatives)**
   - **Key question:** Do we suffer reputational/ban/cost penalties from Twilio for sending SMS to invalid numbers?
   - If Twilio charges per attempt or flags accounts for high invalid rates, validation pays for itself
   - Alternatives: Numverify, Abstract API, Twilio Lookup API (we already pay Twilio — check if Lookup is cheaper than third-party)
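If Twilio Lookup wins that comparison, the pre-send check could look like this sketch. The endpoint shape and the `valid` flag come from Twilio's public Lookup v2 docs; confirm current request format and pricing before relying on it. The logic fails closed: any error means "do not send".

```javascript
// Sketch: fail-closed phone validation via Twilio Lookup v2.
// fetchImpl is injectable so the decision logic is testable offline.
function isSendable(lookupBody) {
  return Boolean(lookupBody && lookupBody.valid === true);
}

async function lookupPhone(e164, fetchImpl = fetch) {
  const auth = Buffer.from(
    `${process.env.TWILIO_ACCOUNT_SID}:${process.env.TWILIO_AUTH_TOKEN}`
  ).toString('base64');
  try {
    const res = await fetchImpl(
      `https://lookups.twilio.com/v2/PhoneNumbers/${encodeURIComponent(e164)}`,
      { headers: { Authorization: `Basic ${auth}` } }
    );
    if (!res.ok) return null; // HTTP error → fail closed
    return await res.json();
  } catch {
    return null; // network failure → fail closed
  }
}
```

Usage: `if (isSendable(await lookupPhone(contact.phone))) { /* queue SMS */ }`.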
5. **Company insights / company type detection (Outscraper "Company Insights")**
   - **High priority for GDPR:** could help determine whether a business is a sole trader/partnership vs a limited company
   - Currently we do DOM-based `searchCompanyTypes()` + LLM classification — accuracy is unknown
   - A structured API response with company type/registration would be more reliable
   - Alternatives: Companies House API (UK, free), ABN Lookup (AU, free), OpenCorporates, Endole
   - **Note:** Try the free government APIs (Companies House, ABN) first before paying for this

**Priority:** P2 — the GDPR sole trader detection (#5) is the most impactful. The rest are nice-to-have unless our current extraction quality drops.

**Next step:** Benchmark the current enrichment hit rate and GDPR classification accuracy to establish a baseline before evaluating any of these.

---

## Future Enhancements

**Post-Implementation Improvements:**

1. **Periodic Re-validation:**
   - Cron job to re-validate monthly
   - Update keywords based on search trends
   - Store in `scripts/cron/validate-keywords.js`

2. **Historical Tracking:**
   - Store search volume history in the database
   - Track keyword trends over time
   - Alert on significant SV changes

3. **Multi-API Support:**
   - Add a Google Ads API provider
   - Add a Keyword Tool API provider
   - Allow switching via the `KEYWORD_VALIDATION_PROVIDER` env var

4. **Advanced Filtering:**
   - Filter by competition level (avoid ultra-competitive keywords)
   - Filter by CPC (cost-per-click) if running ads later
   - Filter by keyword intent (informational vs transactional)

5. **Regional Variations:**
   - Different keyword counts per country based on population/GDP
   - Already implemented via GDP-prioritized processing
6. **Agent Workflow Management:**
   - Consider a Jira-like ticketing system for agent task management
   - Proper workflow enforcement: Architect → PO → Developer → Architect → QA
   - Task dependencies, approvals, status tracking
   - May be overkill for the current scale - evaluate when the agent system matures
   - Alternative: extend the agent_tasks table with workflow fields (reviewed_by, approved_by, approval_status)

---

## Pricing & Revenue Optimization

**Post Multi-Currency Launch:**

1. **FX Rate Monitoring & Consolidation (P2 - Medium Complexity)**
   - **Goal:** Minimize FX losses, maximize USD realization from currency holdings
   - **Components:**
     - Cron job to track USD/AUD, USD/EUR, and USD/GBP rates hourly
     - Alert when a rate hits a favorable threshold (e.g., AUD/USD > 1.55)
     - Automated consolidation: convert a currency balance when optimal
     - Dashboard widget showing currency holdings and conversion opportunities
   - **Business Value:** 2-5% improvement in USD realization vs immediate conversion
   - **Implementation:**
     - Create `src/cron/fx-rate-monitor.js`
     - Integrate with the Fixer.io API (already used for weekly repricing)
     - PayPal API integration for balance queries and currency conversion
     - Add an `fx_consolidation_log` table for tracking conversions and realized gains
   - **Effort:** ~1-2 days (requires PayPal API integration)
   - **Priority:** P2 (implement 1-2 months after the multi-currency launch is stable)
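The core of the FX monitor is a threshold check. A sketch, assuming a Fixer.io-style `rates` object relative to a USD base (pass `payload.rates`); the EUR/GBP thresholds below are placeholders, not recommendations.

```javascript
// Sketch: which holdings have hit a favorable conversion rate.
// Threshold values are example placeholders; tune against real FX data.
const THRESHOLDS = { AUD: 1.55, EUR: 0.95, GBP: 0.82 };

function favorableConversions(rates, thresholds = THRESHOLDS) {
  return Object.entries(thresholds)
    .filter(([ccy, min]) => (rates[ccy] ?? 0) >= min)
    .map(([ccy]) => ({
      currency: ccy,
      rate: rates[ccy],
      threshold: thresholds[ccy],
    }));
}
```

The cron job would fetch rates hourly, call this, and alert (or trigger consolidation) for each hit.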
2. **Split Test Pricing in Proposals (P2 - Low Complexity)**
   - **Goal:** Optimize the proposal-to-payment conversion rate
   - **Hypothesis:** Not showing a price may increase engagement (reduces sticker shock); showing a price may pre-qualify leads (reduces time waste)
   - **Test Design:**
     - Variant A: proposals mention the local-currency price upfront
     - Variant B: proposals omit pricing (control - current implementation)
     - Variant C: proposals use "starting from X" messaging
   - **Metrics to Track:**
     - Engagement rate (opens, clicks)
     - Reply rate (interested vs declined)
     - Payment conversion rate
     - Time from proposal to payment
     - Average deal value by variant
   - **Business Value:** 5-15% improvement in the conversion funnel
   - **Implementation:**
     - Update `src/proposal-generator-v2.js` to conditionally include pricing
     - Add a `proposal_pricing_variant` field to the conversations table
     - Dashboard query to analyze conversion by variant
   - **Effort:** ~4-6 hours (pricing infrastructure already exists)
   - **Priority:** P2 (implement 30+ days after the multi-currency launch)

3. **Stripe Migration Evaluation (P3 - Research Phase)**
   - **Goal:** Evaluate Stripe as an alternative payment processor if PayPal limitations arise
   - **When to Consider:**
     - PayPal FX fees exceed $1,000/month (3% × $33K+ monthly revenue)
     - Adding subscription/recurring billing (Stripe Billing is superior)
     - Needing better exchange rates (Stripe is closer to spot rates)
     - Customer feedback indicates a preference for card entry over PayPal login
   - **Stripe Advantages:**
     - Lower fees: 2.9% + $0.30 (no separate 1.5% cross-border fee)
     - Better developer experience (webhooks, API design)
     - Native subscription billing with prorations, trials, etc.
     - Better exchange rates (typically 1-2% better than PayPal)
     - More granular analytics and reporting
   - **Stripe Disadvantages:**
     - Customers must enter card details (vs the convenient PayPal login)
     - No buyer/seller protection like PayPal's dispute process
     - Less familiar brand in some international markets (PayPal is more trusted)
   - **Migration Path:**
     - Add Stripe as a secondary payment option (A/B test conversion rates)
     - Track which processor customers prefer by region
     - Gradually shift to the preferred processor per market
     - Maintain both for 3-6 months before potentially consolidating
   - **Effort:** ~2-3 days implementation + 1-2 weeks testing
   - **Priority:** P3 (evaluate 6-12 months post-launch, or if revenue > $30K/month)

## Recurring Tasks

These tasks should be performed on a regular schedule:

- **Monthly — Contact extraction benchmark:** Re-run `npm run benchmark:contacts` to evaluate whether newer/cheaper LLMs outperform the current `ENRICHMENT_MODEL`. Check whether a model upgrade has positive ROI. Update `MODELS` in `scripts/benchmark-contact-extraction.js` and `MODEL_PRICING` in `src/utils/llm-usage-tracker.js` when new models launch.
  - Last run: _(never — run it for the first time with `npm run benchmark:contacts -- --dry-run` to preview cost)_
  - Reports saved to: `reports/contact-extraction-benchmark-YYYY-MM-DD.md`

- **Monthly — Test/prod path isolation audit:** Check for mixing of data and logging between test and production:
  1. Verify `src/utils/logger.js` respects the `LOGS_DIR` env var (not just the hardcoded `./logs`)
  2. Grep test files for hardcoded prod paths (`./logs/`, `./db/sites.db`) — they should use `/tmp/test-*` or `:memory:`
  3. Grep source files for hardcoded test paths (`/tmp/test-`) that shouldn't be in prod code
  4. Run `npm test` and confirm nothing is written to `./logs/` or `./db/sites.db`
  - Known issues found (2026-03-03): the logger ignores the `LOGS_DIR` env var (line 25), and `process.integration.test.js` hardcodes the prod DB path