---
title: 'Llm Integration'
category: 'integrations'
last_verified: '2026-02-15'
related_files:
  - 'src/score.js'
  - 'src/score.test.js'
tags: ['llm', 'integration', 'testing', 'security', 'api', 'ai', 'scoring']
status: 'current'
---

# LLM Integration Guide

## Overview

The 333 Method Automation uses **OpenRouter** to access OpenAI's GPT-4o-mini for website conversion scoring. All LLM calls are centralized in `src/score.js`.

## Current Configuration

### Provider: OpenRouter

- **Website**: https://openrouter.ai
- **API Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
- **Authentication**: Bearer token via `OPENROUTER_API_KEY` environment variable

### Model: openai/gpt-4o-mini

- **Cost**: ~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens
- **Context Window**: 128K tokens
- **Vision Support**: ✅ Yes (required for screenshot analysis)
- **JSON Mode**: ✅ Yes (structured output)

### Why GPT-4o-mini?

1. **Cost-Effective**: ~67x cheaper than GPT-4 Turbo on input tokens
2. **Vision Capable**: Can analyze screenshots (desktop + mobile)
3. **Fast**: Low latency for scoring operations
4. **Sufficient Quality**: Excellent for conversion analysis tasks
5. **JSON Mode**: Native structured output support

## LLM Call Locations

All LLM calls are in `src/score.js`:

### 1. Initial Scoring (`callScoringAPI`)

**Purpose**: Score website conversion factors from above-the-fold content

**Inputs**:

- Desktop screenshot (above-fold)
- Mobile screenshot (above-fold)
- HTML DOM (first 50K chars)
- Scoring rubric from `docs/prompts/CONVERSION-SCORING.md`

**Output**: JSON with:

- Individual factor scores (0-100)
- Weighted total score
- Letter grade (A+ to F)
- Detailed reasoning

**Parameters**:

```javascript
{
  model: 'openai/gpt-4o-mini',
  temperature: 0.3, // Low for consistency
  max_tokens: 2000, // Sufficient for detailed scoring
  response_format: { type: 'json_object' }
}
```

### 2. Resubmit Scoring (`callResubmitAPI`)

**Purpose**: Re-evaluate sites scoring B- or below with below-fold content

**Triggers When**: Initial grade is B-, C+, C, C-, D+, D, D-, or F

**Inputs**:

- Below-fold screenshot (uncropped preferred, fallback to cropped)
- HTML DOM (first 50K chars)
- Initial scoring results
- Resubmit rubric from `docs/prompts/CONVERSION-RESCORING.md`

**Output**: Updated JSON with potentially improved scores

**Parameters**:

```javascript
{
  model: 'openai/gpt-4o-mini',
  temperature: 0.3,
  max_tokens: 3000, // More tokens for detailed re-evaluation
  response_format: { type: 'json_object' }
}
```

## Image Handling

### Screenshot Encoding

- Format: JPEG (optimized by Sharp)
- Encoding: Base64
- Detail Level: `low` (85 tokens per image vs 255 for `high`)
- Why low detail: Saves 67% on image tokens while maintaining sufficient quality for conversion analysis

### Images Per Request

- **Initial Scoring**: 2 images (desktop + mobile, above-fold)
- **Resubmit Scoring**: 1 image (desktop below-fold)

## Error Handling & Retry Logic

### Automatic Retries

- **Max Retries**: 3 attempts
- **Backoff Strategy**: Exponential (base delay increases with each retry)
- **Retryable Errors**:
  - Network errors (ECONNRESET, ETIMEDOUT, etc.)
  - Rate limits (429 status)
  - Server errors (500-599 status)

### Implementation

```javascript
await retryWithBackoff(
  async () => {
    /* API call */
  },
  {
    maxRetries: 3,
    shouldRetry: isRetryableError,
    onRetry: (attempt, error) => {
      logger.warn(`Retry ${attempt + 1}/3: ${error.message}`);
    },
  }
);
```

## Token Usage Estimation

### Per Site (Initial Scoring)

- System prompt: ~3,500 tokens
- URL + Domain + HTML (50K chars): ~12,500 tokens
- Desktop screenshot (low detail): ~85 tokens
- Mobile screenshot (low detail): ~85 tokens
- **Total Input**: ~16,170 tokens (~$0.0024)
- **Output** (average): ~500 tokens (~$0.0003)
- **Cost per initial score**: ~$0.0027

### Per Site (With Resubmit)

- Additional input: ~16,000 tokens
- Additional output: ~700 tokens
- **Additional cost**: ~$0.0028
- **Total with resubmit**: ~$0.0055

### Volume Estimates

- **1,000 sites** (assuming 70% need resubmit):
  - 300 sites × $0.0027 = $0.81
  - 700 sites × $0.0055 = $3.85
  - **Total**: ~$4.66

- **10,000 sites** (assuming 70% need resubmit):
  - **Total**: ~$46.60

## Alternative Models (Future Consideration)

### Cost vs Quality Trade-offs

| Model            | Input Cost | Output Cost | Use Case                    |
| ---------------- | ---------- | ----------- | --------------------------- |
| gpt-4o-mini      | $0.15/1M   | $0.60/1M    | ✅ Current (optimal)        |
| gpt-3.5-turbo    | $0.50/1M   | $1.50/1M    | ❌ No vision support        |
| gpt-4-turbo      | $10/1M     | $30/1M      | ❌ 67x more expensive       |
| claude-3-haiku   | $0.25/1M   | $1.25/1M    | ⚠️ Alternative (vision)     |
| gemini-1.5-flash | $0.075/1M  | $0.30/1M    | ⚠️ Cheaper (not OpenRouter) |

### Recommendation

**Stick with gpt-4o-mini** for now because:

1. Best cost/performance ratio
2. Excellent vision quality
3. Native JSON mode
4. Proven reliability at scale

## Cost Optimization Tips

### Already Implemented ✅

1. **Low detail images**: Saves 67% on image tokens
2. **HTML truncation**: Only first 50K chars (saves ~80% of HTML tokens)
3. **Conditional resubmit**: Only rescore low performers
4. **Error retry logic**: Prevents wasted API calls
5. **Batch processing**: Limits concurrent requests

### Future Optimizations

1. **Prompt caching**: OpenRouter supports prompt caching (reduces cost by 50-90% for system prompts)
2. **Screenshot compression**: Further optimize JPEG quality without losing analysis accuracy
3. **HTML cleaning**: Remove scripts, styles, and comments before sending
4. **Smart resubmit**: Use a configurable threshold (currently B-)

## Monitoring & Debugging

### Enable Debug Logging

```bash
DEBUG=true npm run process -- --url=https://example.com
```

### View LLM Responses

Check logs for:

- `[Score] [INFO] Scoring website: {domain}`
- `[Score] [SUCCESS] Scored {domain}: {grade}`
- `[Score] [INFO] Score {grade} requires resubmit`

### Common Issues

**1. API Key Not Found**

```
Error: OPENROUTER_API_KEY not found in environment
```

Solution: Add the key to your `.env` file

**2. Rate Limits**

```
429 Too Many Requests
```

Solution: Reduce concurrency or wait for the rate limit to reset

**3. Parse Errors**

```
Failed to parse JSON response
```

Solution: Check `max_tokens` - it may need to be increased for complex sites

## Testing

### Unit Tests

Run scoring tests with:

```bash
npm test src/score.test.js
```

### Integration Tests

Full end-to-end with real API calls:

```bash
npm run test:integration
```

### Manual Testing

Process a single URL:

```bash
npm run process -- --url=https://www.loremipsum.de/
```

## Security Best Practices

1. **Never commit API keys**: Use a `.env` file (gitignored)
2. **Rotate keys regularly**: Create new keys every 90 days
3. **Monitor usage**: Check the OpenRouter dashboard for unusual activity
4. **Set spending limits**: Configure a max monthly spend in OpenRouter
5. **Use separate keys**: Different keys for dev/staging/production

## Resources

- **OpenRouter Dashboard**: https://openrouter.ai/dashboard
- **Model Pricing**: https://openrouter.ai/models
- **API Documentation**: https://openrouter.ai/docs
- **Scoring Prompts**: `docs/prompts/CONVERSION-SCORING.md`, `docs/prompts/CONVERSION-RESCORING.md`

## Summary

**Location**: All LLM calls are in `src/score.js`

**Model**: We're using **OpenAI's GPT-4o-mini via OpenRouter** - the most appropriate model for our use case:

- Vision capability for screenshot analysis
- Cost-effective at scale
- Fast response times
- JSON mode for structured output

**Cost**: ~$0.005 per site (with resubmit), ~$50 per 10K sites

**Status**: ✅ Production-ready with robust error handling and retry logic
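## Appendix: Illustrative Sketches

For reference, the initial-scoring request described under LLM Call Locations can be sketched as a payload builder plus a POST to the OpenRouter endpoint. This is a minimal sketch, not the actual code in `src/score.js`: the function names (`buildScoringPayload`, `postScoringRequest`) and argument names are illustrative, and the rubric text is assumed to be loaded separately from `docs/prompts/CONVERSION-SCORING.md`.

```javascript
// Illustrative sketch: build the OpenAI-compatible chat-completions payload
// for initial scoring (2 low-detail images + truncated HTML + rubric).
function buildScoringPayload({ rubric, html, desktopB64, mobileB64 }) {
  return {
    model: 'openai/gpt-4o-mini',
    temperature: 0.3, // low for consistency
    max_tokens: 2000,
    response_format: { type: 'json_object' },
    messages: [
      { role: 'system', content: rubric },
      {
        role: 'user',
        content: [
          // Only the first 50K chars of the DOM are sent
          { type: 'text', text: html.slice(0, 50000) },
          {
            type: 'image_url',
            image_url: { url: `data:image/jpeg;base64,${desktopB64}`, detail: 'low' },
          },
          {
            type: 'image_url',
            image_url: { url: `data:image/jpeg;base64,${mobileB64}`, detail: 'low' },
          },
        ],
      },
    ],
  };
}

// Illustrative sketch: POST the payload with Bearer auth and parse the
// JSON-mode response body out of the first choice.
async function postScoringRequest(payload) {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    const err = new Error(`OpenRouter request failed: ${res.status}`);
    err.status = res.status; // lets retry logic classify the failure
    throw err;
  }
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}
```

The resubmit call would follow the same shape with one below-fold image, `max_tokens: 3000`, and the rescoring rubric.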
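The retry behavior described under Error Handling & Retry Logic can be sketched as follows. The base delay and the exact error classification are assumptions based on the retryable-error list above, not the project's actual implementation:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Illustrative sketch of retryWithBackoff: retry up to maxRetries times,
// doubling the delay after each failed attempt (exponential backoff).
async function retryWithBackoff(
  fn,
  { maxRetries = 3, baseDelayMs = 1000, shouldRetry = () => true, onRetry = () => {} } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !shouldRetry(error)) throw error;
      onRetry(attempt, error);
      await sleep(baseDelayMs * 2 ** attempt); // 1s, 2s, 4s with defaults
    }
  }
}

// Illustrative classifier matching the retryable errors listed above:
// network errors, 429 rate limits, and 5xx server errors.
function isRetryableError(error) {
  if (['ECONNRESET', 'ETIMEDOUT'].includes(error.code)) return true;
  return error.status === 429 || (error.status >= 500 && error.status < 600);
}
```

Non-retryable failures (e.g. a 400 from a malformed payload) are rethrown immediately so a bad request does not burn all three retries.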
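The volume estimates above reduce to simple arithmetic over the two per-site costs. A small helper (illustrative only, using the ~$0.0027 and ~$0.0055 figures derived in Token Usage Estimation) makes the math reproducible:

```javascript
// Per-site costs from the Token Usage Estimation section (USD).
const COST_INITIAL_ONLY = 0.0027; // initial scoring only
const COST_WITH_RESUBMIT = 0.0055; // initial + resubmit

// Illustrative helper: total cost for `sites` sites, where `resubmitRate`
// is the fraction expected to score B- or below and trigger a resubmit.
function estimateCost(sites, resubmitRate = 0.7) {
  const resubmits = Math.round(sites * resubmitRate);
  const initialOnly = sites - resubmits;
  return initialOnly * COST_INITIAL_ONLY + resubmits * COST_WITH_RESUBMIT;
}
```

With the 70% resubmit assumption this reproduces the table above: `estimateCost(1000)` ≈ $4.66 and `estimateCost(10000)` ≈ $46.60.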