# PRD: `@YackShavingSkill` for Pi Coding Agent

**Version:** 3.0 (Unified Framework)
**Status:** Ready for Implementation
**Primary Goal:** Operationalize Karpathy-inspired coding guidelines into an active, composable skill framework that combats "induced complexity" through **Orchestration**, **Contextual Anchoring**, and **Structured Reflection**.

---

## 1. Concept & Vision

### The Problem: Induced Complexity
LLM coding agents suffer from "induced complexity"—a natural tendency toward over-engineering, scope creep, and "feature bloat." While standard `CLAUDE.md` files or static prompts offer *advice*, they fail to enforce it. Agents "forget" simplicity the moment they start writing code.

### The Solution: The Agent Counterweight
`@YackShavingSkill` is an **architecture, not just advice**. It transforms static Karpathy-inspired principles into an active operational system compatible with Pi's multi-agent lifecycle:
1. **Reflect:** Forces the agent to externalize its thinking (pre-computation) and anchor to concrete code patterns *before* touching files.
2. **Reference:** Links dynamically to a "Context Brain" (the `examples/` directory) to ensure style consistency.
3. **Verify:** Runs a Post-Hoc Verification Gate that mechanically checks the agent's output against its own pre-flight commitments.

---

## 2. System Architecture: The 3-Layer Defense System

We operationalize the skill into three composable layers. Each layer addresses a specific failure mode of the LLM.

### Layer 1: The Protocol (The Rules & Reflection)
**Function:** Enforces behavioral rigor through structured execution.
**Mechanism:** The **Complexity Orchestrator** calculates task difficulty and activates specific "Skill Modules" (see Section 3). The agent's progress is tracked via the **Session Journal**, a mandatory log file where the agent "thinks out loud" before and after coding.
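As a non-normative illustration of this protocol, the Orchestrator scores the task, activates skill modules, and brackets the work with journal entries. Every name below is hypothetical (nothing here is mandated by this PRD); it only sketches the control flow:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Layer 1 protocol: score -> activate -> journal -> work -> journal.
# Names and structures are illustrative, not part of the spec.

TIERS = [(3, "TRIVIAL"), (7, "STANDARD"), (10, "COMPLEX")]

def tier_for(score: int) -> str:
    """Map a 1-10 complexity score onto a severity tier (Section 4.2)."""
    for upper, name in TIERS:
        if score <= upper:
            return name
    raise ValueError(f"score out of range: {score}")

@dataclass
class Session:
    score: int
    journal: list = field(default_factory=list)

    def pre_flight(self, commitment: str) -> None:
        self.journal.append(("pre-flight", commitment))

    def post_flight(self, audit: str) -> None:
        self.journal.append(("post-flight", audit))

def run_task(score: int) -> Session:
    session = Session(score)
    tier = tier_for(score)
    if tier != "TRIVIAL":  # TRIVIAL tasks skip the Journal (Section 4.2)
        session.pre_flight("Simplicity goal + scope boundaries")
    # ... the agent writes code here ...
    if tier != "TRIVIAL":
        session.post_flight("Reflex audit + violation checklist")
    return session
```

The point of the sketch is that the Journal entries are produced by the protocol itself, not left to the agent's discretion.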
* **PRD v1 Integration:** Dynamic skill activation based on complexity scoring.
* **PRD v2 Integration:** The requirement to write Pre-Flight / Post-Flight Journal entries.

### Layer 2: The Context (The Patterns)
**Function:** Prevents "Style Drift" by grounding the agent in reality, not abstraction.
**Mechanism:** The `examples/` directory serves as the **Context Brain**. Instead of describing "simple code" in text, skill prompts dynamically inject relative links (e.g., *"Read `examples/patterns/simple-loop.md`"*) to point the agent to a **Gold Standard**. Conversely, **Anti-Patterns** show exactly what to avoid.
* **PRD v2 Integration:** The `examples/` directory structure and pattern matching.

### Layer 3: The Skills (The Artifacts)
**Function:** Produces concrete, checkable deliverables rather than loose suggestions.
**Mechanism:** Each activated Skill Module is a first-class Pi task that must produce a discrete **Artifact** (a checklist, a scope boundary, a verification matrix).
* **PRD v1 Integration:** The four specific skill modules (Think, Simplicity, Surgical, Goal) and their artifacts.

---

## 3. The 4 Core Skill Modules

Derived from the Karpathy principles, these modules produce specific artifacts.

### Skill A: Think Before Coding
**Purpose:** Force explicit reasoning and assumption validation before execution.
**Artifact Produced:** **Pre-Computation Block**
* A structured checklist containing:
  * Assumptions list with Confidence Scores (`HIGH`/`MEDIUM`/`LOW`).
  * Interpretations of the request and simpler alternatives considered.
  * Scope Declaration (files explicitly touched vs. off-limits).
* *Fires on:* All tasks above TRIVIAL complexity.

### Skill B: Simplicity First
**Purpose:** Actively detect and kill over-engineering.
**Artifact Produced:** **Simplicity Review**
* A self-assessment report answering:
  * "What is the simplest possible solution?"
  * "What flexibility/abstraction did I intentionally *not* add?"
  * "My Line-Count Budget: Target vs. Actual."
* *Fires on:* All non-trivial tasks.

### Skill C: Surgical Changes
**Purpose:** Governance over code scope.
**Artifact Produced:** **Change Boundary**
* A document detailing:
  * Files explicitly touched + rationale.
  * Files explicitly *not* touched.
  * **Orthogonal Issues:** Improvements noticed but flagged as "Out of Scope" to prevent scope creep.
  * **Orphan Tracking:** Imports/variables made unused by this specific change.
* *Fires on:* Any task modifying existing files.

### Skill D: Goal-Driven Execution
**Purpose:** Define and verify success criteria.
**Artifact Produced:** **Verification Matrix**
* A table mapping each subtask to explicit Pass/Fail criteria and the specific test cases required.
* *Fires on:* All tasks (even trivial ones get a minimal version).

---

## 4. The Complexity Orchestrator (Decision Engine)

The Orchestrator is a lightweight decision layer that runs during the Pi Plan phase. It calculates a **Complexity Score (1–10)** and determines which skills to activate.

### 4.1 Scoring Dimensions

| Dimension | Low (0) | Medium (1) | High (2) |
|---|---|---|---|
| **Scope Size** | 1 file | 2–3 files | >3 files (max 5) |
| **Ambiguity** | Clear, step-by-step request | Some context needed | Unclear requirements |
| **Risk Surface** | Internal / helper functions | Public API / shared logic | Critical path / user-facing |
| **Knowledge Gap** | None (agent knows the code) | Partial (agent needs to read) | Unknown (agent must explore) |

### 4.2 Severity Tiers & Skill Activation

| Tier | Score | Active Skills | Requirements |
|---|---|---|---|
| **TRIVIAL** | 1–3 | Goal-Driven (Skill D) only | Minimal Verification Matrix. No Journal required. |
| **STANDARD** | 4–7 | Goal (D) + Simplicity (B) + Think (A) + Surgical (C) | Full **Session Journal** (Pre/Post-Flight). Must reference a Gold Standard pattern. |
| **COMPLEX** | 8–10 | ALL skills + explicit user confirmation | Full Session Journal + **Expert Reviewer Agent** assigned to review the work. |

---

## 5. The Reflection Mechanism: Session Journal

To prevent the agent from "forgetting" its goals during implementation (a key failure mode), we implement the **Session Journal**. This is a structured log file (e.g., `SESSION_LOG.md`) where the agent is forced to "self-audit" at two distinct stages.

### 5.1 Pre-Flight Entry (Journal Entry #1)
*Generated during the Plan/Work transition.*
* **The Reflex Check:** A mandatory commitment block.
  * *Simplicity Goal:* "I will use [Simple Technique X]. I will NOT use [Over-Engineered Approach Y]."
  * *Scope Boundaries:* "I am strictly forbidding myself from touching [Out-of-Scope files]."
* **Contextual Retrieval:** A link to the specific `examples/` pattern the agent reads to anchor its style.

### 5.2 Post-Flight Entry (Journal Entry #2)
*Generated after coding ends, before Review.*
* **Reflex Audit:** A Pass/Fail judgment. Did the final code meet the Pre-Flight Simplicity Goal?
* **Violation Checklist:**
  * [ ] **Complexity Creep:** Did I add unused flags or hidden logic?
  * [ ] **Scope Bleed:** Did I touch Out-of-Scope files?
  * [ ] **Style Drift:** Did I deviate from the `examples/` structure I cited?
* **Verification Results:** The actual pass/fail outcomes of the Verification Matrix.

---

## 6. The Review Phase: The Gate

Triggered when the work is complete or the task is marked as ready for review.

1. **Review the Journal First:** The Reviewer Agent reads the **Session Journal**.
   If the Pre-Flight committed to 20 lines and the code has 200, the review fails *instantly* (Instant Fail) without further inspection.
2. **Pattern Check:** The Reviewer verifies that the agent actually cited and followed one of the `examples/` patterns.
3. **Diff Purity Check:** The Reviewer runs the **Verification Gate**: it reads the Post-Flight Journal/Change Boundary and diffs the actual code changes against the declared scope.
4. **Adherence Report:** If the review passes, an Adherence Report is emitted (listing which skills fired, which artifacts were produced, and zero violations found).

---

## 7. Repository Structure

We keep the framework modular so skills are reusable and context is easily expanded.

```text
@YackShavingSkill/
├── PRD.md                    # This file (the unified master)
├── CLAUDE.md                 # The top-level "master" rules (shorthand access)
│
├── skills/                   # The modular skill prompt layers (PRD v1)
│   ├── think.md              # Skill A: Think Before Coding
│   ├── simplicity.md         # Skill B: Simplicity First
│   ├── surgical.md           # Skill C: Surgical Changes
│   └── goal.md               # Skill D: Goal-Driven Execution
│
├── examples/                 # The "Context Brain" (PRD v2)
│   ├── anti-patterns/        # What to AVOID (negative references)
│   │   ├── bloated-loop.md   # Example of unnecessary loop complexity
│   │   └── god-object.md     # Example of over-merged responsibilities
│   └── patterns/             # What to MIMIC (Gold Standard)
│       ├── simple-loop.md    # Example of clean, simple loops
│       └── surgical-diff.md  # Example of minimal, bounded changes
│
└── templates/                # Reflection structures (PRD v2)
    └── session_journal.md    # Template for the journal entries
```

---

## 8. Success Metrics

We measure effectiveness by observing the outputs of the Session Journal and the Verification Matrix:
1. **Reflex Rate (Target: >80%):** The % of tasks where the agent successfully catches its own bloat during the Post-Flight entry.
2. **Style Drift Rate (Target: 0%):** The % of tasks where the final code drifts from the `examples/` pattern the agent cited. (Equivalently: 100% of tasks should reference and mimic a pattern; 0% drift is the canonical goal.)
3. **Surgical Purity (Target: >95%):** The % of code changes that are strictly within the "Scope Boundaries" declared in the Pre-Flight Journal.
4. **Gate Adherence:** The % of tasks where the Verification Gate detected *zero* violations.

---

## 9. Implementation Roadmap

Merged and streamlined for execution.

### Phase 1: The Core (Weeks 1–2)
* Define the 4 skill modules (`skills/`) as standalone prompt templates (PRD v1).
* Implement the Complexity Orchestrator's rule-based logic (score 1–10).
* Create the top-level `CLAUDE.md` master file.

### Phase 2: The Context (Week 3)
* Create the `examples/` repository with 2–3 initial Gold Standard patterns and Anti-Patterns (PRD v2).
* Build `templates/session_journal.md` and ensure skills dynamically reference it.

### Phase 3: The Mirror & The Gate (Weeks 4–5)
* **The Mirror:** Test the Journal templates with a Pi agent to ensure it generates structured reflections and doesn't hallucinate the patterns.
* **The Gate:** Integrate the Verification Gate into the Pi Reviewer Agent workflow. Ensure the Reviewer reads the Journal *before* looking at the code.

### Phase 4: Feedback Loop (Week 6+)
* Build a mechanism to track Adherence Reports.
* Expand the `examples/` directory based on real tasks encountered.
* Open-source the skill framework if successful.

---

*"The bottleneck is no longer coding — it's specification and review."*
*(Merged quote from v1 & v2)*

This framework doesn't try to make the model code better.
It makes the model **specify better (Pre-Flight), reference reality (Context), and review rigorously (Post-Gate)**, which is where the actual value lies.
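
As a closing, non-normative sketch, the Complexity Orchestrator of Section 4 reduces to a small pure function. One caveat: the four dimensions (each scored 0–2) sum to at most 8, and this PRD does not specify how that sum maps onto the advertised 1–10 scale, so the `1 + total` mapping below is an assumption, as are all names in the sketch:

```python
from dataclasses import dataclass

# Hypothetical implementation of the Section 4 decision engine.
# Each dimension is scored 0 (Low), 1 (Medium), or 2 (High) per Table 4.1.

@dataclass
class TaskSignals:
    scope_size: int     # 0: 1 file, 1: 2-3 files, 2: >3 files
    ambiguity: int      # 0: clear request, 1: some context needed, 2: unclear
    risk_surface: int   # 0: internal helpers, 1: public API, 2: critical path
    knowledge_gap: int  # 0: none, 1: partial, 2: must explore

def complexity_score(s: TaskSignals) -> int:
    """Sum the dimensions and shift onto a 1-10 scale (assumed mapping)."""
    total = s.scope_size + s.ambiguity + s.risk_surface + s.knowledge_gap
    return min(10, 1 + total)

def active_skills(score: int) -> list[str]:
    """Tier-to-skill activation per Table 4.2."""
    if score <= 3:  # TRIVIAL: Goal-Driven only
        return ["goal"]
    if score <= 7:  # STANDARD: all four skill modules
        return ["goal", "simplicity", "think", "surgical"]
    # COMPLEX: all skills plus explicit user confirmation
    return ["goal", "simplicity", "think", "surgical", "user-confirmation"]
```

For example, a 2–3 file task with unclear requirements, a critical-path risk surface, and unexplored code (`TaskSignals(1, 2, 2, 2)`) scores 8 and triggers the COMPLEX tier.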