#+title: Pi-AgentStack -- Evaluation
#+author: HaQadosch
#+date: [2026-04-22 Wed]
#+startup: indent
#+options: toc:2 num:nil

* Question Being Answered

Is the =codejunkie99/agentic-stack= pattern the right tool for /my/
workflow, given what I already have (=PiCodingAgent=,
=pi-org-styleguide-reminder=, =Pi-OrgStyleGuide=, assorted =Pi-*=
projects)?

This document is the working notebook for that question. It is not
the decision -- the decision lives in [[file:decisions.org][decisions.org]] when ready.

* Seed Criteria

Initial criteria to refine. Each one should eventually resolve to
/yes/, /no/, or /it depends on X/.

** Workflow Fit

- Does the four-layer memory taxonomy (personal / working / semantic /
  episodic) map cleanly onto how I actually work, or am I twisting my
  workflow to fit it?
- Is the review-queue discipline (rationale required for graduation)
  something I will actually do, or will it become friction I bypass?
- Does progressive skill disclosure solve a real context-budget problem
  for me, or am I not yet hitting that ceiling?
- Can I name at least three of my existing patterns (org styleguide
  reminder, radicle ops, org review, ...) that would map cleanly to
  skills under this schema?

** Composition With Existing Work

- How does this compose with =PiCodingAgent/= -- does it replace, sit
  alongside, or subsume it?
- =Pi-OrgStyleGuide= and =pi-org-styleguide-reminder= already look
  skill-shaped. Are they a natural first import, or would forcing them
  into =skills/= distort them?
- Harness portability is the headline promise. Which harnesses do I
  actually use? If it is only Claude Code, portability is a non-goal
  and the abstraction tax is wasted.

** Operational Cost

- How much of the dream-cycle machinery (=auto_dream=, =cluster=,
  =validate=, =promote=, =decay=) do I actually want, vs. what is there
  because the author wanted it?
- What is the per-session overhead of the context budget
  (=MAX_CTX - 40000=)? With a 1M-context model, 40k reserved is noise.
  With a 128k-context model it is 30%+. (A back-of-the-envelope check
  is sketched at the end of the Running Assessment below.)
- Who maintains =permissions.md= and =skills/*/SKILL.md= over time? Am
  I signing up for a second codebase to maintain?

** Irreversibility

- If I adopt and then regret it, what is the exit cost? Can I pull
  =LESSONS.md= and =DECISIONS.md= out cleanly, or do they bind to
  internal formats?
- Does adoption close off simpler paths -- plain =CLAUDE.md= +
  =AGENTS.md= + a few markdown skill files -- that might be enough?

* Running Assessment

Fill in as thinking progresses. Date each entry so drift is visible.

** [2026-04-22] Initial read

The repo is more sophisticated than the origin thread sells it as. The
dream-cycle pipeline with rationale-required graduation is a genuine
idea, not just scaffolding. The thin conductor (=~25 lines=) and the
3-tier permissions file are directly useful even if nothing else is
adopted.

The four-layer memory taxonomy is the design decision that will either
carry the whole system or fail silently. It needs to match how I
actually accumulate context, not just how the author does.

Unknowns that block a decision: onboarding wizard behaviour,
=salience.py= scoring rules, adapter specifics for Claude Code (since
that is my primary harness), and whether existing =Pi-*= patterns
reshape into skills without distortion.
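
One number can be pinned down before reading any source: the cost of
the reserved context budget. A minimal sketch, assuming the budget is
literally =MAX_CTX - 40000=, i.e. a flat 40k-token reservation (to be
confirmed against the repo):

#+begin_src python
# Overhead of a flat 40k-token reservation at common context-window sizes.
# Assumes the budget is MAX_CTX - 40000; the exact rule needs checking.
RESERVED = 40_000

for max_ctx in (128_000, 200_000, 1_000_000):
    print(f"{max_ctx:>9,} ctx: {RESERVED / max_ctx:.1%} reserved")

# Expected output:
#   128,000 ctx: 31.2% reserved
#   200,000 ctx: 20.0% reserved
# 1,000,000 ctx: 4.0% reserved
#+end_src

At 128k the reservation eats roughly a third of every session before
any memory layer loads; at 1M it is noise. The criterion above turns
on which of those models I actually run.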
* Concerns and Frictions

- *Abstraction tax for single-harness use* :: The portability story
  justifies the machinery. If I only use Claude Code, I am paying for
  flexibility I will not exercise.
- *Review-queue burden* :: Batch review is the discipline, but it only
  pays off at a certain throughput. Below that, it is ceremony.
- *Onboarding opinionation* :: The wizard (=onboard*.py=) may lock in
  choices I would rather leave open. Needs inspection.
- *Two sources of truth for lessons* :: Between =LESSONS.md=,
  =DECISIONS.md=, and any =CLAUDE.md= or =AGENTS.md= at the project
  root, there is a risk of contradiction.
- *Decay without reinforcement* :: =decay.py= will quietly erode
  lessons that I still care about if the salience scorer miscounts
  what I am doing.

* Open Questions

- [ ] What does =salience.py= actually score on? Read the source.
- [ ] What does =onboard.py= do at install time? Read the source.
- [ ] How do adapters for Claude Code wire the =pre_tool_call= hook?
- [ ] Is there a /minimum viable adoption/ -- just =AGENTS.md= plus the
  four memory folders, no tools -- that captures 80% of the value? (A
  sketch of what that might look like closes this file.)
- [ ] How does the repo handle lesson /contradictions/? If I graduate
  rule A and later rule not-A, what happens?

* Exit Criteria

The evaluation is /complete/ when:

- Every criterion above resolves to yes / no / depends.
- Every open question has an answer or a decision to stop waiting for
  one.
- The running assessment contains at least one entry dated within the
  past week describing a concrete interaction with the pattern (not
  just reading about it).

At that point, move to [[file:decisions.org][decisions.org]] and record the decision.
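
* Sketch: Minimum Viable Adoption

For the /minimum viable adoption/ question above, a rough sketch of
what the stripped-down layout might look like. The folder names are
assumptions taken from the four-layer taxonomy (personal / working /
semantic / episodic), not verified against the =agentic-stack= repo.

#+begin_src python
from pathlib import Path

# Hypothetical minimum-viable layout: AGENTS.md plus one folder per
# memory layer, no tools. Layer names are guesses from the four-layer
# taxonomy, not confirmed against the upstream repo.
root = Path(".")
for layer in ("personal", "working", "semantic", "episodic"):
    (root / "memory" / layer).mkdir(parents=True, exist_ok=True)
(root / "AGENTS.md").touch()
#+end_src

Whether this captures 80% of the value is exactly the open question;
the sketch only makes the candidate concrete enough to try.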