# PRD: `@YackShavingSkill` for Pi Coding Agent
**Version:** 3.0 (Unified Framework)
**Status:** Ready for Implementation
**Primary Goal:** Operationalize Karpathy-inspired coding guidelines into an active, composable skill framework that combats "induced complexity" through **Orchestration**, **Contextual Anchoring**, and **Structured Reflection**.

---

## 1. Concept & Vision

### The Problem: Induced Complexity
LLM coding agents suffer from "induced complexity"—a natural tendency toward over-engineering, scope creep, and "feature bloat." While standard `CLAUDE.md` or static prompts offer *advice*, they fail to enforce it. Agents "forget" simplicity the moment they start writing code.

### The Solution: The Agent Counterweight
`@YackShavingSkill` is an **architecture, not just advice**. It transforms static Karpathy-inspired principles into an active operational system compatible with Pi's multi-agent lifecycle:
1.  **Reflect:** Forces the agent to externalize its thinking (pre-computation) and anchor to concrete code patterns *before* touching files.
2.  **Reference:** Links dynamically to a "Context Brain" (the `examples/` directory) to ensure style consistency.
3.  **Verify:** Runs a Post-Hoc Verification Gate that mechanically checks the agent's output against its own pre-flight commitments.

---

## 2. System Architecture: The 3-Layer Defense System

We operationalize the skill into three composable layers. Each layer addresses a specific failure mode of the LLM.

### Layer 1: The Protocol (The Rules & Reflection)
**Function:** Enforces behavioral rigor through structured execution.
**Mechanism:** The **Complexity Orchestrator** calculates task difficulty and activates specific "Skill Modules" (see Section 3). The agent's progress is tracked via the **Session Journal**, a mandatory log file where the agent "thinks out loud" before and after coding.
*   **PRD v1 Integration:** Dynamic skill activation based on complexity scoring.
*   **PRD v2 Integration:** The requirement to write Pre-Flight / Post-Flight Journal entries.

### Layer 2: The Context (The Patterns)
**Function:** Prevents "Style Drift" by grounding the agent in reality, not abstraction.
**Mechanism:** The `examples/` directory serves as the **Context Brain**. Instead of describing "simple code" in text, skill prompts dynamically inject relative links (e.g., *"Read `examples/patterns/simple-loop.md`"*) to point the agent to a **Gold Standard**. Conversely, **Anti-Patterns** show exactly what to avoid.
*   **PRD v2 Integration:** The `examples/` directory structure and pattern matching.

### Layer 3: The Skills (The Artifacts)
**Function:** Produces concrete, checkable deliverables rather than loose suggestions.
**Mechanism:** Each activated Skill Module is a first-class Pi task that must produce a discrete **Artifact** (a checklist, a scope boundary, a verification matrix).
*   **PRD v1 Integration:** The four specific skill modules (Think, Simplicity, Surgical, Goal) and their artifacts.

---

## 3. The 4 Core Skill Modules

Derived from the Karpathy principles, these modules produce specific artifacts.

### Skill A: Think Before Coding
**Purpose:** Force explicit reasoning and assumption validation before execution.
**Artifact Produced:** **Pre-Computation Block**
*   A structured checklist containing:
    *   Assumptions list with Confidence Scores (`HIGH`/`MEDIUM`/`LOW`).
    *   Interpretations of the request and simpler alternatives considered.
    *   Scope Declaration (files explicitly touched vs. off-limits).
*   *Fires on:* All tasks above TRIVIAL complexity.

### Skill B: Simplicity First
**Purpose:** Actively detect and kill over-engineering.
**Artifact Produced:** **Simplicity Review**
*   A self-assessment report answering:
    *   "What is the simplest possible solution?"
    *   "What flexibility/abstraction did I intentionally *not* add?"
    *   "My Line-Count Budget: Target vs. Actual."
*   *Fires on:* All non-trivial tasks.

### Skill C: Surgical Changes
**Purpose:** Governance over code scope.
**Artifact Produced:** **Change Boundary**
*   A document detailing:
    *   Files explicitly touched + rationale.
    *   Files explicitly *not* touched.
    *   **Orthogonal Issues:** Improvements noticed but flagged as "Out of Scope" to prevent scope creep.
    *   **Orphan Tracking:** Imports/variables that were made unused by this specific change.
*   *Fires on:* Any task modifying existing files.

### Skill D: Goal-Driven Execution
**Purpose:** Define and verify success criteria.
**Artifact Produced:** **Verification Matrix**
*   A table mapping each subtask to explicit Pass/Fail criteria and the specific test cases required.
*   *Fires on:* All tasks (even trivial ones get a minimal version).
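
To make the Verification Matrix concrete, here is a minimal sketch of what such an artifact could look like in code. The `MatrixRow` schema, its field names, and the toy check are illustrative assumptions; this PRD does not prescribe a storage format.

```python
# Hypothetical sketch of a Verification Matrix artifact (Skill D).
# Schema and names are assumptions, not part of this PRD.
from dataclasses import dataclass
from typing import Callable

@dataclass
class MatrixRow:
    subtask: str               # what was supposed to be done
    criterion: str             # explicit Pass/Fail criterion
    check: Callable[[], bool]  # concrete test for the criterion

def evaluate(matrix: list[MatrixRow]) -> dict[str, bool]:
    """Run every check and map each subtask to its Pass/Fail outcome."""
    return {row.subtask: row.check() for row in matrix}

# Even a trivial task gets a minimal one-row matrix.
matrix = [
    MatrixRow(
        subtask="Rename helper",
        criterion="Old name no longer referenced",
        check=lambda: "old_helper" not in "def new_helper(): ...",
    ),
]
results = evaluate(matrix)
```

The point is that each row carries an executable check, so the Review phase can consume pass/fail outcomes mechanically rather than re-deriving them from prose.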

---

## 4. The Complexity Orchestrator (Decision Engine)

The Orchestrator is a lightweight decision layer that runs during the Pi Plan phase. It rates the four dimensions below (0–2 each), combines them into a **Complexity Score (1–10)**, and uses that score to determine which skills to activate.

### 4.1 Scoring Dimensions
| Dimension | Low (0) | Medium (1) | High (2) |
|---|---|---|---|
| **Scope Size** | 1 file | 2–3 files | >3 files (max 5) |
| **Ambiguity** | Clear, step-by-step request | Some context needed | Unclear requirements |
| **Risk Surface** | Internal / helper functions | Public API / shared logic | Critical path / user-facing |
| **Knowledge Gap** | None (agent knows the code) | Partial (agent needs to read) | Unknown (agent must explore) |

### 4.2 Severity Tiers & Skill Activation
| Tier | Score | Active Skills | Requirements |
|---|---|---|---|
| **TRIVIAL** | 1–3 | Goal-Driven (Skill D) only | Minimal Verification Matrix. No Journal required. |
| **STANDARD** | 4–7 | Goal (D) + Simplicity (B) + Think (A) + Surgical (C) | Full **Session Journal** (Pre/Post-Flight). Must reference a Gold Standard pattern. |
| **COMPLEX** | 8–10 | ALL skills + explicit user confirmation | Full Session Journal + **Expert Reviewer Agent** assigned to review the work. |
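
As a sketch, the Orchestrator's decision logic might look like the following. The PRD does not specify how the raw 0–8 dimension sum maps onto the 1–10 scale, so the linear rescaling here is an assumption, as are the function and dimension names.

```python
# Hypothetical sketch of the Complexity Orchestrator.
# The linear rescale from the raw 0-8 sum to 1-10 is an assumption.

DIMENSIONS = ("scope_size", "ambiguity", "risk_surface", "knowledge_gap")

def complexity_score(ratings: dict[str, int]) -> int:
    """Sum the four 0-2 dimension ratings and rescale onto 1-10."""
    raw = sum(ratings[d] for d in DIMENSIONS)  # 0..8
    return 1 + round(raw * 9 / 8)              # 1..10

def tier(score: int) -> str:
    """Map a 1-10 score to its severity tier (Section 4.2)."""
    if score <= 3:
        return "TRIVIAL"
    return "STANDARD" if score <= 7 else "COMPLEX"

# Skills activated per tier, per the table above.
ACTIVE_SKILLS = {
    "TRIVIAL": ["goal"],
    "STANDARD": ["goal", "simplicity", "think", "surgical"],
    "COMPLEX": ["goal", "simplicity", "think", "surgical", "user_confirmation"],
}
```

For example, a one-file change with clear requirements, internal risk, and no knowledge gap sums to 0, scores 1, and lands in TRIVIAL; a task rated High on every dimension sums to 8 and scores 10.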

---

## 5. The Reflection Mechanism: Session Journal

To prevent the agent from "forgetting" its goals during implementation (a key failure mode), we implement the **Session Journal**. This is a structured log file (e.g., `SESSION_LOG.md`) where the agent is forced to "self-audit" at two distinct stages.

### 5.1 Pre-Flight Entry (Journal Entry #1)
*Generated during the Plan/Work transition.*
*   **The Reflex Check:** A mandatory commitment block.
    *   *Simplicity Goal:* "I will use [Simple Technique X]. I will NOT use [Over-Engineered Approach Y]."
    *   *Scope Boundaries:* "I am strictly forbidding myself from touching [Out-of-Scope files]."
*   **Contextual Retrieval:** A link to the specific `examples/` pattern the agent reads to anchor its style.

### 5.2 Post-Flight Entry (Journal Entry #2)
*Generated after coding ends, before Review.*
*   **Reflex Audit:** A Pass/Fail judgment. Did the final code meet the Pre-Flight Simplicity Goal?
*   **Violation Checklist:**
    *   [ ] **Complexity Creep:** Did I add unused flags or hidden logic?
    *   [ ] **Scope Bleed:** Did I touch Out-of-Scope files?
    *   [ ] **Style Drift:** Did I deviate from the `examples/` structure?
*   **Verification Results:** The actual pass/fail outcomes of the Verification Matrix.
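
A minimal `templates/session_journal.md` covering both entries might look like the fragment below; the exact headings and placeholders are illustrative, not prescribed by this PRD.

```markdown
# Session Journal — <task id>

## Pre-Flight
- **Simplicity Goal:** I will use <simple technique>. I will NOT use <over-engineered approach>.
- **Scope Boundaries:** Touching only: <files>. Off-limits: <files>.
- **Pattern Anchor:** examples/patterns/<pattern>.md

## Post-Flight
- **Reflex Audit:** PASS | FAIL — <one-line justification>
- **Violation Checklist:**
  - [ ] Complexity Creep
  - [ ] Scope Bleed
  - [ ] Style Drift
- **Verification Results:** <subtask>: PASS | FAIL
```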

---

## 6. The Review Phase: The Gate

Triggered when the work is complete or the task is marked as ready for review.

1.  **Review the Journal First:** The Reviewer Agent reads the **Session Journal**. If the Pre-Flight committed to 20 lines and the code has 200, the review fails *instantly*, without further inspection.
2.  **Pattern Check:** The Reviewer verifies that the agent actually cited and followed one of the `examples/` patterns.
3.  **Diff Purity Check:** The Reviewer runs the **Verification Gate**: it reads the Post-Flight Journal and Change Boundary, then diffs the actual code changes against the declared scope.
4.  **Adherence Report:** If the review passes, an Adherence Report is emitted, listing which skills fired and which artifacts were produced, and confirming that zero violations were found.
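
The Diff Purity Check can be sketched as a set comparison, assuming the declared scope is available as a list of file paths (the actual Change Boundary format, and how touched files are collected, e.g. from `git diff --name-only`, are left to the implementation).

```python
# Hypothetical sketch of the Diff Purity Check (Verification Gate).
# Assumes declared scope and touched files are plain path sets;
# the real artifact formats are implementation-defined.

def diff_purity(declared_scope: set[str], touched_files: set[str]) -> dict:
    """Compare actual changes against the Pre-Flight scope declaration."""
    out_of_scope = touched_files - declared_scope  # scope bleed -> fail
    untouched = declared_scope - touched_files     # declared but unchanged
    return {
        "pass": not out_of_scope,
        "scope_bleed": sorted(out_of_scope),
        "declared_but_untouched": sorted(untouched),
    }

result = diff_purity(
    declared_scope={"src/parser.py"},
    touched_files={"src/parser.py", "src/utils.py"},
)
```

Here the gate fails because `src/utils.py` was modified despite never being declared in scope; files that were declared but never touched are reported for information only and do not fail the gate.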

---

## 7. Repository Structure

We keep the framework modular so skills are reusable and context is easily expanded.

```text
@YackShavingSkill/
├── PRD.md                   # This file (the unified master)
├── CLAUDE.md                # The top-level "master" rules (shorthand access)
├── skills/                  # The modular skill prompt layers (PRD v1)
│   ├── think.md             # Skill A: Think Before Coding
│   ├── simplicity.md        # Skill B: Simplicity First
│   ├── surgical.md          # Skill C: Surgical Changes
│   └── goal.md              # Skill D: Goal-Driven Execution
├── examples/                # The "Context Brain" (PRD v2)
│   ├── anti-patterns/       # What to AVOID (negative references)
│   │   ├── bloated-loop.md  # Example of unnecessary loop complexity
│   │   └── god-object.md    # Example of over-merged responsibilities
│   └── patterns/            # What to MIMIC (Gold Standard)
│       ├── simple-loop.md   # Example of clean, simple loops
│       └── surgical-diff.md # Example of minimal, bounded changes
└── templates/               # Reflection structures (PRD v2)
    └── session_journal.md   # Template for the journal entries
```

---

## 8. Success Metrics

We measure effectiveness by observing the outputs of the Session Journal and the Verification Matrix:

1.  **Reflex Rate (Target: >80%):** The % of tasks where the agent successfully catches its own bloat during the Post-Flight entry.
2.  **Style Drift Rate (Target: 0%):** The % of tasks where the agent fails to reference and mimic a pattern from the `examples/` directory. (v2's 100% pattern-adherence target is ambitious; 0% drift is the canonical goal.)
3.  **Surgical Purity (Target: >95%):** The % of code changes that fall strictly within the "Scope Boundaries" declared in the Pre-Flight Journal.
4.  **Adherence Rate:** The % of tasks where the Verification Gate detected *zero* violations.

---

## 9. Implementation Roadmap

Merged and streamlined for execution.

### Phase 1: The Core (Weeks 1–2)
*   Define the 4 skill modules (`skills/`) as standalone prompt templates (PRD v1).
*   Implement the Complexity Orchestrator's rule-based logic (score 1–10).
*   Create the top-level `CLAUDE.md` master file.

### Phase 2: The Context (Week 3)
*   Create the `examples/` directory with 2–3 initial Gold Standard patterns and Anti-Patterns (PRD v2).
*   Build `templates/session_journal.md` and ensure skills dynamically reference it.

### Phase 3: The Mirror & The Gate (Weeks 4–5)
*   **The Mirror:** Test the Journal templates with a Pi agent to ensure the agent generates structured reflections and does not hallucinate patterns.
*   **The Gate:** Integrate the Verification Gate into the Pi Reviewer Agent workflow. Ensure the Reviewer reads the Journal *before* looking at the code.

### Phase 4: Feedback Loop (Week 6+)
*   Build a mechanism to track Adherence Reports.
*   Expand the `examples/` directory based on real tasks encountered.
*   Open-source the skill framework if successful.

---

*"The bottleneck is no longer coding — it's specification and review."*
*(Merged quote from v1 & v2)*

This framework doesn't try to make the model code better. It makes the model **specify better (Pre-Flight), reference reality (Context), and review rigorously (Post-Gate)**, which is where the actual value lies.