# Model Allocation Strategy

*proto-012 | Match model capability to task complexity*

---

- **principle**
  - "Match model capability to task complexity. Haiku for simple, Sonnet for medium, Opus for judgment."

- **shape**
  - Not every task needs your best model
  - Degrade intelligently based on task concreteness
  - Object-level work degrades (Haiku, Sonnet); metacognition requires Opus
  - The pyramid catches more than the peak alone

---

**Status:** 📄 DOCUMENTED

---

## Core Principle

> **Not every task needs your best model. Degrade intelligently based on task concreteness.**

Using Opus for everything is:
- Expensive (the highest per-token price of the three tiers, historically 10x+ Haiku's)
- Slow (higher latency)
- Wasteful (overkill for routine tasks)

Using Haiku for everything is:
- Cheap but misses nuance
- Fast but shallow
- Risky for complex judgment

**The solution:** A pyramid that matches model capability to task complexity.

---

## The Model Pyramid

```
                         OPUS
                    ┌─────────────┐
                    │ Synthesis   │  Cost: $$$
                    │ Novel       │  Latency: High
                    │ Adversarial │  Instances: 1-2
                    └──────┬──────┘
                           │
                        SONNET
              ┌────────────┼────────────┐
              │            │            │
         ┌────┴────┐  ┌────┴────┐  ┌────┴────┐
         │Substan- │  │Substan- │  │Substan- │  Cost: $$
         │tive     │  │tive     │  │tive     │  Latency: Medium
         │Review   │  │Review   │  │Review   │  Instances: 3-5
         └────┬────┘  └────┬────┘  └────┬────┘
              │            │            │
                         HAIKU
    ┌─────────┼────────────┼────────────┼─────────┐
    │         │            │            │         │
┌───┴───┐ ┌───┴───┐ ┌──────┴───┐ ┌──────┴───┐ ┌───┴───┐
│Routine│ │Routine│ │ Routine  │ │ Routine  │ │Routine│  Cost: $
│Check  │ │Check  │ │ Check    │ │ Check    │ │Check  │  Latency: Low
└───────┘ └───────┘ └──────────┘ └──────────┘ └───────┘  Instances: Many
```

---

## Task-to-Model Mapping

### Opus (Highest Capability)

| Task | Why Opus |
|------|----------|
| **First Officer synthesis** | Requires seeing patterns across multiple inputs |
| **Adversarial review** | Needs creativity to find non-obvious failures |
| **Architectural decisions** | Novel problem-solving, high stakes |
| **Axiom-level questions** | Philosophical nuance, edge cases |
| **Deadlock resolution** | When the council can't converge |

**Use when:** The task is novel, ambiguous, high-stakes, or requires synthesis.

### Sonnet (Balanced)

| Task | Why Sonnet |
|------|------------|
| **Substantive peer review** | Needs judgment but not synthesis |
| **Documentation writing** | Quality matters but the path is clear |
| **Consistency checking** | Thorough comparison, moderate complexity |
| **Pattern implementation** | Following established templates |
| **Standard council member** | Deliberation without meta-synthesis |

**Use when:** Substantive judgment is needed and established patterns exist.

### Haiku (Fastest/Cheapest)

| Task | Why Haiku |
|------|-----------|
| **Checklist validation** | Binary checks, no judgment needed |
| **Format verification** | Does the file have the required sections? |
| **Simple completeness** | Are all fields filled in? |
| **High-volume parallel** | Many quick checks simultaneously |
| **Routine status updates** | Mechanical, predictable |

**Use when:** The task is concrete, mechanical, and low-stakes, especially at high volume.
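
The three **Use when** rules can be read as one decision procedure. Below is a minimal Python sketch. It assumes a task can be summarized by a few boolean traits. `TaskTraits` and `route` are hypothetical names and are not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class TaskTraits:
    """Hypothetical task description; field names are illustrative."""
    novel: bool = False            # no established pattern or precedent
    ambiguous: bool = False        # multiple valid interpretations
    high_stakes: bool = False      # costly if wrong
    needs_synthesis: bool = False  # must integrate several inputs
    needs_judgment: bool = False   # substantive review, not a binary check


def route(task: TaskTraits) -> str:
    """Return the tier whose 'Use when' rule matches, most demanding rule first."""
    # Opus: novel, ambiguous, high-stakes, or synthesis-heavy work.
    if task.novel or task.ambiguous or task.high_stakes or task.needs_synthesis:
        return "opus"
    # Sonnet: substantive judgment within established patterns.
    if task.needs_judgment:
        return "sonnet"
    # Haiku: everything concrete and mechanical.
    return "haiku"


print(route(TaskTraits()))                      # haiku
print(route(TaskTraits(needs_judgment=True)))   # sonnet
print(route(TaskTraits(needs_synthesis=True)))  # opus
```

The sketch checks the Opus rule first, so any single high-risk trait wins. It therefore errs toward capability rather than cost.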

---

## Tribal Review Configuration

### Standard Review (Most Cases)
```
Reviewers: 3x Sonnet (completeness, consistency, context)
First Officer: Skip (Sonnet findings usually sufficient)
Cost: $$
```

### Important Review (Significant Changes)
```
Reviewers: 3x Sonnet (completeness, consistency, adversarial)
First Officer: 1x Opus (synthesis)
Cost: $$$
```

### Critical Review (Architectural/Axiom-level)
```
Pre-check: 5x Haiku (basic validation)
Reviewers: 3x Sonnet (all lenses)
First Officer: 1x Opus (synthesis + recommendations)
Arbitration: 1x Opus if deadlock
Cost: $$$$
```

### High-Volume Review (Many Small Items)
```
Reviewers: 10x Haiku (parallel checklist)
Escalation: 1x Sonnet if Haiku flags issues
First Officer: Skip unless escalation
Cost: $
```
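
Writing the four configurations as data makes them easy to compare and budget. This sketch rests on a few assumptions:

- `Stage`, `CONFIGS`, and `agent_calls` are hypothetical names.
- `conditional` marks stages that run only on escalation or deadlock.
- The High-Volume First Officer is assumed to be Opus, because the configuration above leaves its model unstated.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    role: str                  # "reviewer", "first_officer", ...
    model: str                 # "haiku" | "sonnet" | "opus"
    count: int = 1
    conditional: bool = False  # runs only on escalation or deadlock


CONFIGS = {
    "standard": [Stage("reviewer", "sonnet", 3)],
    "important": [Stage("reviewer", "sonnet", 3), Stage("first_officer", "opus")],
    "critical": [
        Stage("pre_check", "haiku", 5),
        Stage("reviewer", "sonnet", 3),
        Stage("first_officer", "opus"),
        Stage("arbitration", "opus", conditional=True),
    ],
    "high_volume": [
        Stage("reviewer", "haiku", 10),
        Stage("escalation", "sonnet", conditional=True),
        Stage("first_officer", "opus", conditional=True),  # model assumed
    ],
}


def agent_calls(config: list, worst_case: bool = False) -> Counter:
    """Count agent calls per model; conditional stages count only in the worst case."""
    calls = Counter()
    for stage in config:
        if stage.conditional and not worst_case:
            continue
        calls[stage.model] += stage.count
    return calls


print(agent_calls(CONFIGS["critical"]))
print(agent_calls(CONFIGS["critical"], worst_case=True))
```

`agent_calls(CONFIGS["critical"])` counts the guaranteed calls: 5 Haiku, 3 Sonnet, and 1 Opus. Passing `worst_case=True` adds the arbitration Opus call.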

---

## Degradation Rules

### Degrade DOWN when:
- The task is well-defined
- A checklist can capture the requirements
- A pattern or template exists
- The stakes are low if the result is wrong
- Speed matters more than depth

### Escalate UP when:
- Haiku or Sonnet finds unexpected issues
- The task requires novel judgment
- The stakes are high
- Multiple valid interpretations exist
- Synthesis across inputs is needed
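
The two rule lists can be combined into a one-step tier adjustment, sketched minimally below. The signal names are hypothetical labels for the bullets above. The sketch also assumes two policies the rules leave open: any single signal is enough to move, and escalation overrides degradation when both fire.

```python
# Signal names are hypothetical labels for the bullets above.
TIERS = ["haiku", "sonnet", "opus"]
ESCALATE = {"unexpected_issues", "novel_judgment", "high_stakes",
            "ambiguous", "needs_synthesis"}
DEGRADE = {"well_defined", "checklist_fits", "template_exists",
           "low_stakes", "speed_over_depth"}


def adjust_tier(current: str, signals: set) -> str:
    """Move one tier up on any escalation signal, else one tier down on any
    degradation signal; escalation wins when both are present."""
    i = TIERS.index(current)
    if signals & ESCALATE:
        return TIERS[min(i + 1, len(TIERS) - 1)]  # Opus is the ceiling here
    if signals & DEGRADE:
        return TIERS[max(i - 1, 0)]  # Haiku is the floor
    return current


print(adjust_tier("sonnet", {"well_defined", "low_stakes"}))  # haiku
print(adjust_tier("sonnet", {"well_defined", "ambiguous"}))   # opus
```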

---

## Cost-Benefit Analysis

| Configuration | Cost | Est. Coverage | Best For |
|---------------|------|---------------|----------|
| 3x Haiku | $ | 50% | High-volume screening |
| 3x Sonnet | $$ | 80% | Standard review |
| 1x Opus | $$$ | 75% | Quick expert opinion |
| 3x Sonnet + 1x Opus | $$$ | 90% | Important decisions |
| 5x Haiku + 3x Sonnet + 1x Opus | $$$$ | 95%+ | Critical architecture |

Coverage is the approximate share of issues a configuration catches. Treat the figures as rough heuristics.

**The insight:** 3x Sonnet often matches 1x Opus for detection, but Opus adds synthesis value that Sonnet can't provide.
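
The table supports a simple selection rule: pick the cheapest configuration whose estimated coverage meets a target. This sketch copies the table's figures. It scores cost as the number of `$` signs and reads "95%+" as 0.95. `cheapest_for` is a hypothetical helper.

```python
from typing import Optional

# (configuration, cost as number of "$" signs, estimated coverage) from the table.
TABLE = [
    ("3x Haiku", 1, 0.50),
    ("3x Sonnet", 2, 0.80),
    ("1x Opus", 3, 0.75),
    ("3x Sonnet + 1x Opus", 3, 0.90),
    ("5x Haiku + 3x Sonnet + 1x Opus", 4, 0.95),  # "95%+" read as 0.95
]


def cheapest_for(target: float) -> Optional[str]:
    """Return the lowest-cost configuration whose estimated coverage meets the
    target, preferring higher coverage on a cost tie; None if none qualifies."""
    candidates = [row for row in TABLE if row[2] >= target]
    if not candidates:
        return None
    name, _cost, _coverage = min(candidates, key=lambda row: (row[1], -row[2]))
    return name


print(cheapest_for(0.75))  # 3x Sonnet
print(cheapest_for(0.85))  # 3x Sonnet + 1x Opus
```

With these figures, `1x Opus` alone is never chosen, because `3x Sonnet` costs less and covers more. This echoes the insight above.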

---

## Integration with Peer Review Protocol

Update the `proto-011` escalation ladder as follows:

```
LEVEL 1: SINGLE REVIEWER
Model: Sonnet
Catches: ~70%

LEVEL 2: DUAL REVIEWER
Model: 2x Sonnet
Catches: ~80%

LEVEL 3: TRIBAL REVIEW
Model: 3x Sonnet + 1x Opus (First Officer)
Catches: ~90%

LEVEL 4: HUMAN + COUNCIL
Model: 3x Sonnet + 1x Opus + Human
Catches: ~95%+
```

---

## The First Officer is Always Opus

**Why:** Opus does metacognition. Sonnet does cognition.

```
COGNITION (Object Level)         METACOGNITION (Meta Level)
────────────────────────         ──────────────────────────
"What are the issues?"           "Are these findings valid?"
"Does this match?"               "Are reviewers converging?"
"How could this fail?"           "What's the pattern?"
                                 "Which reviewer is reliable?"

Sonnet can do this               Only Opus can do this
```

**The architectural principle:**
- Object-level work can degrade (Haiku, Sonnet)
- Metacognition requires Opus
- This isn't about "hard tasks"; it's about **level of abstraction**

**Exception:** For routine reviews, skip the First Officer entirely to save cost.
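
The cognition/metacognition split reduces to a role-based rule, sketched minimally below. The sketch assumes roles are plain strings. It also assumes arbitration counts as meta-level work alongside the First Officer. `META_ROLES` and `model_for_role` are hypothetical names.

```python
from typing import Optional

# Roles that judge other agents' findings (meta level). Treating arbitration
# as meta-level is an assumption.
META_ROLES = {"first_officer", "arbitration"}


def model_for_role(role: str, object_tier: str = "sonnet",
                   routine: bool = False) -> Optional[str]:
    """Meta-level roles always get Opus; object-level roles get object_tier,
    which can degrade to Haiku. Returns None when the First Officer is skipped."""
    if role == "first_officer" and routine:
        return None  # routine review: skip the First Officer to save cost
    if role in META_ROLES:
        return "opus"
    return object_tier


print(model_for_role("reviewer", object_tier="haiku"))  # haiku
print(model_for_role("first_officer"))                  # opus
print(model_for_role("first_officer", routine=True))    # None
```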

---

## Implementation

### Spawn with Model Parameter
```
Task agent with model: "haiku" | "sonnet" | "opus"
```

### Parallel Haiku Swarm
```
Spawn 5-10 Haiku agents simultaneously
Each checks one aspect
Aggregate findings
Escalate anomalies to Sonnet
```
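
The swarm maps naturally onto concurrent calls, sketched below with Python's `asyncio.gather`. `run_agent` is a hypothetical stand-in for however agents are actually spawned; here it simply looks for a `TODO:<aspect>` marker. The aspect names are illustrative.

```python
import asyncio

# Illustrative aspects; one Haiku agent checks each.
ASPECTS = ["required_sections", "fields_filled", "links_valid", "naming", "status_line"]


async def run_agent(model: str, aspect: str, doc: str) -> dict:
    """Hypothetical stand-in for spawning an agent with the given model.
    It flags an aspect only if the document contains 'TODO:<aspect>'."""
    await asyncio.sleep(0)  # placeholder for a real asynchronous call
    return {"model": model, "aspect": aspect, "flagged": f"TODO:{aspect}" in doc}


async def haiku_swarm(doc: str) -> list:
    """Run one Haiku check per aspect concurrently, then aggregate the flags.
    The returned aspects are the anomalies to escalate to Sonnet."""
    results = await asyncio.gather(*(run_agent("haiku", a, doc) for a in ASPECTS))
    return [r["aspect"] for r in results if r["flagged"]]


print(asyncio.run(haiku_swarm("Draft body. TODO:naming")))  # ['naming']
```

`gather` returns results in input order, so flagged aspects come back in the order of `ASPECTS`. This keeps the Sonnet escalation list deterministic.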

### Sequential Escalation
```
Haiku pass → Done (no issues)
Haiku flag → Sonnet review
Sonnet flag → Opus synthesis
Opus flag → Human decision
```
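
The escalation chain is a loop that stops at the first tier that does not raise a flag, sketched minimally below. `flags` is a hypothetical caller-supplied function. It stands in for running a tier's review and reporting whether that tier flagged the item.

```python
from typing import Callable

# Order of the chain above; "human" is the final decision-maker, not a model.
CHAIN = ["haiku", "sonnet", "opus", "human"]


def escalate(item: str, flags: Callable[[str, str], bool]) -> str:
    """Return the tier that closes the item.

    Each model tier reviews in turn; a pass ends the chain there, a flag hands
    the item to the next tier. If Opus also flags it, a human decides.
    """
    for tier in CHAIN[:-1]:
        if not flags(tier, item):
            return tier
    return CHAIN[-1]


# Hypothetical reviewer behaviour: Haiku and Sonnet flag, Opus passes.
print(escalate("example-item", lambda tier, _item: tier in {"haiku", "sonnet"}))  # opus
```

For example, `escalate(item, lambda tier, _: tier == "haiku")` returns `"sonnet"`, because Haiku flags the item and Sonnet passes it.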

---

## The Promise

> **Intelligence should match the problem, not exceed it.**
>
> Haiku for checklists.
> Sonnet for judgment.
> Opus for synthesis.
>
> Spend capability where it matters.
> Degrade gracefully on routine work.
> The pyramid catches more than the peak alone.

---

## Related

- **axioms**
  - [[A2 Recognition of Life]] - recognize which model is "alive" for this task
    - shape:: "Can you recognize life? Death mimics life through ornament."
  - [[A3 Dynamic Pole Navigation]] - navigate between cheap/capable based on context
    - shape:: "Life is the oscillation; death is fixing at either pole."
- **protocols**
  - [[first-officer-protocol]] - always Opus (metacognition)
    - shape:: "Per-thread metacognition. Compress state, track gravity wells, flag drift."
  - [[fractal-tribe-architecture]] - model allocation at each level
    - shape:: "Same pattern at every level. Checkers → workers → tribes → supervisors."
  - [[peer-review-protocol]] - which models review what
    - shape:: "Workers check each other's output before it goes to user."
  - [[error-detection-layers]] - Haiku checkers before Sonnet workers
    - shape:: "Multiple tiers of error catching. Catch errors at lowest level possible."
- **enables**
  - [[autonomous-exploration-tribes]] - tribe worker model selection

---

*proto-012 | Model Allocation Strategy | Match Capability to Complexity*