---
title: 'LLM Integration'
category: 'integrations'
last_verified: '2026-02-15'
related_files:
  - 'src/score.js'
  - 'src/score.test.js'
tags: ['llm', 'integration', 'testing', 'security', 'api', 'ai', 'scoring']
status: 'current'
---

# LLM Integration Guide

## Overview

The 333 Method Automation uses **OpenRouter** to access OpenAI's GPT-4o-mini for website conversion scoring. All LLM calls are centralized in `src/score.js`.

## Current Configuration

### Provider: OpenRouter

- **Website**: https://openrouter.ai
- **API Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
- **Authentication**: Bearer token via `OPENROUTER_API_KEY` environment variable

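As a sketch of how these settings fit together, the helper below builds the `fetch()` options for a call to this endpoint. The name `buildOpenRouterRequest` and the payload shape are illustrative assumptions, not code from `src/score.js`:

```javascript
// Minimal sketch: build fetch() options for the OpenRouter chat completions
// endpoint. Illustrative only; the real request code lives in src/score.js.
const OPENROUTER_URL = 'https://openrouter.ai/api/v1/chat/completions';

function buildOpenRouterRequest(apiKey, messages) {
  if (!apiKey) throw new Error('OPENROUTER_API_KEY not found in environment');
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`, // Bearer token auth
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'openai/gpt-4o-mini', messages }),
  };
}
```

A call would then look like `fetch(OPENROUTER_URL, buildOpenRouterRequest(process.env.OPENROUTER_API_KEY, messages))`.
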
### Model: openai/gpt-4o-mini

- **Cost**: ~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens
- **Context Window**: 128K tokens
- **Vision Support**: ✅ Yes (required for screenshot analysis)
- **JSON Mode**: ✅ Yes (structured output)

### Why GPT-4o-mini?

1. **Cost-Effective**: ~67x cheaper than GPT-4 Turbo on input tokens
2. **Vision Capable**: Can analyze screenshots (desktop + mobile)
3. **Fast**: Low latency for scoring operations
4. **Sufficient Quality**: Excellent for conversion analysis tasks
5. **JSON Mode**: Native structured output support

## LLM Call Locations

All LLM calls are in `src/score.js`:

### 1. Initial Scoring (`callScoringAPI`)

**Purpose**: Score website conversion factors from above-the-fold content

**Inputs**:

- Desktop screenshot (above-fold)
- Mobile screenshot (above-fold)
- HTML DOM (first 50K chars)
- Scoring rubric from `docs/prompts/CONVERSION-SCORING.md`

**Output**: JSON with:

- Individual factor scores (0-100)
- Weighted total score
- Letter grade (A+ to F)
- Detailed reasoning

**Parameters**:

```javascript
{
  model: 'openai/gpt-4o-mini',
  temperature: 0.3,        // Low for consistency
  max_tokens: 2000,        // Sufficient for detailed scoring
  response_format: { type: 'json_object' }
}
```
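
These inputs are combined into a single multimodal request. The sketch below shows one plausible way to assemble the message array; the function name, variable names, and message framing are assumptions for illustration, not the actual `src/score.js` code:

```javascript
// Illustrative sketch: system rubric plus one user turn carrying the
// truncated HTML and two low-detail base64 screenshots.
function buildScoringMessages(rubric, html, desktopB64, mobileB64) {
  const imagePart = (b64) => ({
    type: 'image_url',
    image_url: { url: `data:image/jpeg;base64,${b64}`, detail: 'low' },
  });
  return [
    { role: 'system', content: rubric },
    {
      role: 'user',
      content: [
        { type: 'text', text: html.slice(0, 50000) }, // first 50K chars only
        imagePart(desktopB64),
        imagePart(mobileB64),
      ],
    },
  ];
}
```
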

### 2. Resubmit Scoring (`callResubmitAPI`)

**Purpose**: Re-evaluate sites scoring B- or below with below-fold content

**Triggers When**: Initial grade is B-, C+, C, C-, D+, D, D-, or F

**Inputs**:

- Below-fold screenshot (uncropped preferred, fallback to cropped)
- HTML DOM (first 50K chars)
- Initial scoring results
- Resubmit rubric from `docs/prompts/CONVERSION-RESCORING.md`

**Output**: Updated JSON with potentially improved scores

**Parameters**:

```javascript
{
  model: 'openai/gpt-4o-mini',
  temperature: 0.3,
  max_tokens: 3000,        // More tokens for detailed re-evaluation
  response_format: { type: 'json_object' }
}
```
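
The trigger condition amounts to an ordered comparison on letter grades. A minimal sketch, with an assumed helper name (the real check lives in `src/score.js`):

```javascript
// Grades at or below the threshold (default B-) trigger the resubmit pass.
const GRADES = ['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'D-', 'F'];

function requiresResubmit(grade, threshold = 'B-') {
  return GRADES.indexOf(grade) >= GRADES.indexOf(threshold);
}
```

Keeping the threshold as a parameter also lines up with the configurable-threshold idea listed under future optimizations.
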

## Image Handling

### Screenshot Encoding

- Format: JPEG (optimized by Sharp)
- Encoding: Base64
- Detail Level: `low` (85 tokens per image vs 255 for `high`)
- Why low detail: Saves 67% on image tokens while maintaining sufficient quality for conversion analysis
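
The 67% figure follows directly from the per-image token counts above:

```javascript
// Token cost per image at each detail level, per the figures in this doc.
const IMAGE_TOKENS = { low: 85, high: 255 };

// Fraction saved by choosing low detail: 1 - 85/255 = 2/3, i.e. ~67%.
const lowDetailSavings = 1 - IMAGE_TOKENS.low / IMAGE_TOKENS.high;
```
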

### Images Per Request

- **Initial Scoring**: 2 images (desktop + mobile, above-fold)
- **Resubmit Scoring**: 1 image (desktop below-fold)

## Error Handling & Retry Logic

### Automatic Retries

- **Max Retries**: 3 attempts
- **Backoff Strategy**: Exponential (base delay increases with each retry)
- **Retryable Errors**:
  - Network errors (ECONNRESET, ETIMEDOUT, etc.)
  - Rate limits (429 status)
  - Server errors (500-599 status)

### Implementation

```javascript
await retryWithBackoff(
  async () => {
    /* API call */
  },
  {
    maxRetries: 3,
    shouldRetry: isRetryableError,
    onRetry: (attempt, error) => {
      logger.warn(`Retry ${attempt + 1}/3: ${error.message}`);
    },
  }
);
```
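
The snippet above shows only the call site. A self-contained sketch of what `retryWithBackoff` and `isRetryableError` plausibly look like, matching the retry policy described in this section (defaults and delay values are assumptions; the project's actual implementation may differ):

```javascript
// Sketch of an exponential-backoff retry helper matching the call site above.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(fn, { maxRetries = 3, baseDelayMs = 1000, shouldRetry = () => true, onRetry = () => {} } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !shouldRetry(error)) throw error;
      onRetry(attempt, error);
      await sleep(baseDelayMs * 2 ** attempt); // 1s, 2s, 4s, ...
    }
  }
}

// Matching retryability check: network errors, 429, and 5xx are transient.
function isRetryableError(error) {
  if (['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED'].includes(error.code)) return true;
  return error.status === 429 || (error.status >= 500 && error.status < 600);
}
```
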

## Token Usage Estimation

### Per Site (Initial Scoring)

- System prompt: ~3,500 tokens
- URL + Domain + HTML (50K chars): ~12,500 tokens
- Desktop screenshot (low detail): ~85 tokens
- Mobile screenshot (low detail): ~85 tokens
- **Total Input**: ~16,170 tokens (~$0.0024)
- **Output** (average): ~500 tokens (~$0.0003)
- **Cost per initial score**: ~$0.0027

### Per Site (With Resubmit)

- Additional input: ~16,000 tokens
- Additional output: ~700 tokens
- **Additional cost**: ~$0.0028
- **Total with resubmit**: ~$0.0055

### Volume Estimates

- **1,000 sites** (assuming 70% need resubmit):
  - 300 sites × $0.0027 = $0.81
  - 700 sites × $0.0055 = $3.85
  - **Total**: ~$4.66

- **10,000 sites** (assuming 70% need resubmit):
  - **Total**: ~$46.60
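
The volume arithmetic above can be wrapped in a small estimator. The constants are the per-site figures from this section; the function name is illustrative:

```javascript
// Reproduce the volume estimates from the per-site cost figures above.
const INITIAL_COST = 0.0027;  // USD per site, initial scoring only
const RESUBMIT_COST = 0.0055; // USD per site including the resubmit pass

function estimateBatchCost(sites, resubmitRate = 0.7) {
  const resubmits = Math.round(sites * resubmitRate);
  return (sites - resubmits) * INITIAL_COST + resubmits * RESUBMIT_COST;
}
```
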

## Alternative Models (Future Consideration)

### Cost vs Quality Trade-offs

| Model            | Input Cost | Output Cost | Use Case                    |
| ---------------- | ---------- | ----------- | --------------------------- |
| gpt-4o-mini      | $0.15/1M   | $0.60/1M    | ✅ Current (optimal)        |
| gpt-3.5-turbo    | $0.50/1M   | $1.50/1M    | ❌ No vision support        |
| gpt-4-turbo      | $10/1M     | $30/1M      | ❌ 67x more expensive       |
| claude-3-haiku   | $0.25/1M   | $1.25/1M    | ⚠️ Alternative (vision)     |
| gemini-1.5-flash | $0.075/1M  | $0.30/1M    | ⚠️ Cheaper (not OpenRouter) |

### Recommendation

**Stick with gpt-4o-mini** for now because:

1. Best cost/performance ratio
2. Excellent vision quality
3. Native JSON mode
4. Proven reliability at scale

## Cost Optimization Tips

### Already Implemented ✅

1. **Low detail images**: Saves 67% on image tokens
2. **HTML truncation**: Only first 50K chars (saves ~80% of HTML tokens)
3. **Conditional resubmit**: Only rescore low performers
4. **Error retry logic**: Prevents wasted API calls
5. **Batch processing**: Limits concurrent requests

### Future Optimizations

1. **Prompt caching**: OpenRouter supports prompt caching (reduces cost by 50-90% for system prompts)
2. **Screenshot compression**: Further optimize JPEG quality without losing analysis accuracy
3. **HTML cleaning**: Remove scripts, styles, and comments before sending
4. **Smart resubmit**: Use a configurable threshold (currently B-)
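
For the HTML-cleaning idea, a naive regex-based sketch is below. It is illustrative only and an assumption about how this optimization could work; a real implementation should prefer a proper HTML parser:

```javascript
// Naive sketch: strip scripts, styles, and comments, collapse whitespace,
// then truncate to the 50K-char budget. Illustrative, not production code.
function cleanHtml(html, maxChars = 50000) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, '')
    .replace(/<style\b[\s\S]*?<\/style>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/\s+/g, ' ')
    .slice(0, maxChars);
}
```
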

## Monitoring & Debugging

### Enable Debug Logging

```bash
DEBUG=true npm run process -- --url=https://example.com
```

### View LLM Responses

Check logs for:

- `[Score] [INFO] Scoring website: {domain}`
- `[Score] [SUCCESS] Scored {domain}: {grade}`
- `[Score] [INFO] Score {grade} requires resubmit`

### Common Issues

**1. API Key Not Found**

```
Error: OPENROUTER_API_KEY not found in environment
```

Solution: Add the key to your `.env` file

**2. Rate Limits**

```
429 Too Many Requests
```

Solution: Reduce concurrency or wait for the rate limit to reset

**3. Parse Errors**

```
Failed to parse JSON response
```

Solution: Check `max_tokens` - it may need to be increased for complex sites
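
One defensive pattern for the parse-error case is to wrap `JSON.parse` and flag responses that look truncated, since truncation is the usual symptom of `max_tokens` being too low. This helper is illustrative, not the project's actual code:

```javascript
// Illustrative guard around JSON.parse for LLM responses: a payload that
// doesn't end in '}' likely hit the max_tokens limit mid-object.
function parseScoreResponse(raw) {
  try {
    return JSON.parse(raw);
  } catch (err) {
    const truncated = raw.trim().length > 0 && !raw.trim().endsWith('}');
    throw new Error(
      `Failed to parse JSON response${truncated ? ' (looks truncated; try raising max_tokens)' : ''}: ${err.message}`
    );
  }
}
```
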

## Testing

### Unit Tests

Run scoring tests with:

```bash
npm test src/score.test.js
```

### Integration Tests

Full end-to-end with real API calls:

```bash
npm run test:integration
```

### Manual Testing

Process a single URL:

```bash
npm run process -- --url=https://www.loremipsum.de/
```

## Security Best Practices

1. **Never commit API keys**: Use `.env` file (gitignored)
2. **Rotate keys regularly**: Create new keys every 90 days
3. **Monitor usage**: Check OpenRouter dashboard for unusual activity
4. **Set spending limits**: Configure max monthly spend in OpenRouter
5. **Use separate keys**: Different keys for dev/staging/production

## Resources

- **OpenRouter Dashboard**: https://openrouter.ai/dashboard
- **Model Pricing**: https://openrouter.ai/models
- **API Documentation**: https://openrouter.ai/docs
- **Scoring Prompts**: `docs/prompts/CONVERSION-SCORING.md`, `docs/prompts/CONVERSION-RESCORING.md`

## Summary

**Location**: All LLM calls are in `src/score.js`

**Model**: **OpenAI's GPT-4o-mini via OpenRouter**, the most appropriate model for our use case:

- Vision capability for screenshot analysis
- Cost-effective at scale
- Fast response times
- JSON mode for structured output

**Cost**: ~$0.0055 per site (with resubmit), ~$47 per 10K sites

**Status**: ✅ Production-ready with robust error handling and retry logic