#+title: Pi-AgentStack -- Evaluation
#+author: HaQadosch
#+date: [2026-04-22 Wed]
#+startup: indent
#+options: toc:2 num:nil

* Question Being Answered

Is the =codejunkie99/agentic-stack= pattern the right tool for /my/
workflow, given what I already have (=PiCodingAgent=,
=pi-org-styleguide-reminder=, =Pi-OrgStyleGuide=, assorted =Pi-*=
projects)?

This document is the working notebook for that question. It is not
the decision -- the decision lives in [[file:decisions.org][decisions.org]] when ready.

* Seed Criteria

Initial criteria to refine. Each one should eventually resolve to
/yes/, /no/, or /it depends on X/.

** Workflow Fit

- Does the four-layer memory taxonomy (personal / working / semantic /
  episodic) map cleanly onto how I actually work, or am I twisting
  my workflow to fit it?
- Is the review-queue discipline (rationale required for graduation)
  something I will actually do, or will it become friction I bypass?
- Does progressive skill disclosure solve a real context-budget
  problem for me, or am I not yet hitting that ceiling?
- Can I name at least three of my existing patterns (org styleguide
  reminder, radicle ops, org review, ...) that would map cleanly to
  skills under this schema?

** Composition With Existing Work

- How does this compose with =PiCodingAgent/= -- does it replace,
  sit alongside, or subsume it?
- =Pi-OrgStyleGuide= and =pi-org-styleguide-reminder= already look
  skill-shaped. Are they a natural first import, or would forcing
  them into =skills/= distort them?
- Harness portability is the headline promise. Which harnesses do I
  actually use? If it is only Claude Code, portability is a non-goal
  and the abstraction tax is wasted.

** Operational Cost

- How much of the dream-cycle machinery (=auto_dream=, =cluster=,
  =validate=, =promote=, =decay=) do I actually want, vs. what is
  there because the author wanted it?
- What is the per-session overhead of the context budget
  (=MAX_CTX - 40000=)? With a 1M-context model, 40k reserved is
  noise. With a 128k-context model it is 30%+.
- Who maintains =permissions.md= and =skills/*/SKILL.md= over time?
  Am I signing up for a second codebase to maintain?
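
The 30%+ figure is easy to check. A quick sketch of the overhead
arithmetic (the window sizes are illustrative; only the 40k
reservation comes from the repo's stated =MAX_CTX - 40000= budget):

#+begin_src python
# Fraction of the model's context window consumed by a fixed
# 40k-token reservation, for a few illustrative window sizes.
RESERVED = 40_000

def reserved_fraction(window_tokens: int) -> float:
    """Share of the window eaten by the fixed reservation."""
    return RESERVED / window_tokens

for window in (1_000_000, 200_000, 128_000):
    print(f"{window:>9} tokens: {reserved_fraction(window):.1%} reserved")
#+end_src

At 1M tokens the reservation costs 4%; at 128k it costs over 31%.
Same config, radically different per-session tax.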

** Irreversibility

- If I adopt and then regret, what is the exit cost? Can I pull
  =LESSONS.md= and =DECISIONS.md= out cleanly, or do they bind to
  internal formats?
- Does adoption close off simpler paths -- plain =CLAUDE.md= +
  =AGENTS.md= + a few markdown skill files -- that might be enough?

* Running Assessment

Fill in as thinking progresses. Date each entry so drift is visible.

** [2026-04-22] Initial read

The repo is more sophisticated than the origin thread sells it as.
The dream-cycle pipeline with rationale-required graduation is a
genuine idea, not just scaffolding. The thin conductor (=~25 lines=)
and the 3-tier permissions file are directly useful even if nothing
else is adopted.

The four-layer memory taxonomy is the design decision that will
either carry the whole system or fail silently. It needs to match
how I actually accumulate context, not just how the author does.

Unknowns that block a decision: onboarding wizard behaviour,
=salience.py= scoring rules, adapter specifics for Claude Code
(since that is my primary harness), and whether existing =Pi-*=
patterns reshape into skills without distortion.

* Concerns and Frictions

- *Abstraction tax for single-harness use* :: The portability story
  justifies the machinery. If I only use Claude Code, I am paying
  for flexibility I will not exercise.
- *Review-queue burden* :: Batch review is the discipline, but it
  only pays off at a certain throughput. Below that, it is ceremony.
- *Onboarding opinionation* :: The wizard (=onboard*.py=) may lock
  in choices I would rather leave open. Needs inspection.
- *Two sources of truth for lessons* :: Between =LESSONS.md=,
  =DECISIONS.md=, and any =CLAUDE.md= or =AGENTS.md= at the project
  root, there is a risk of contradiction.
- *Decay without reinforcement* :: =decay.py= will quietly erode
  lessons that I still care about if the salience scorer miscounts
  what I am doing.
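
To make the decay concern concrete: a minimal sketch of how an
exponential-decay scorer erodes a lesson that I still rely on but
that the scorer never sees me exercise. Every name here
(=decayed_score=, =half_life_days=) is hypothetical -- the actual
rules in =decay.py= and =salience.py= are exactly the unknowns
flagged above.

#+begin_src python
import math

def decayed_score(initial: float, days_since_reinforced: float,
                  half_life_days: float = 30.0) -> float:
    """Hypothetical exponential decay: the score halves every
    half-life unless the lesson is reinforced in the meantime."""
    return initial * math.exp(
        -math.log(2) * days_since_reinforced / half_life_days)

# A lesson still in use, but invisible to the scorer:
for days in (0, 30, 90, 180):
    print(f"day {days:>3}: score {decayed_score(1.0, days):.3f}")
#+end_src

Under these assumptions the score is 0.5 after one month and 0.125
after three; if the pruning threshold sits anywhere above ~0.1, a
perfectly good lesson is gone within a quarter of quiet use.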

* Open Questions

- [ ] What does =salience.py= actually score on? Read the source.
- [ ] What does =onboard.py= do at install time? Read the source.
- [ ] How do adapters for Claude Code wire the =pre_tool_call= hook?
- [ ] Is there a /minimum viable adoption/ -- just =AGENTS.md= plus
  the four memory folders, no tools -- that captures 80% of the
  value?
- [ ] How does the repo handle lesson /contradictions/? If I graduate
  rule A and later rule not-A, what happens?
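
The /minimum viable adoption/ question is cheap to prototype before
reading any repo source. A sketch of the no-tools variant -- the
folder names are my guess at the four-layer taxonomy, not the repo's
actual layout, and the =AGENTS.md= stub text is placeholder:

#+begin_src python
from pathlib import Path
import tempfile

def scaffold(root: Path) -> list[Path]:
    """Create the assumed four memory folders plus an AGENTS.md stub.
    No scripts, no dream cycle -- just files any harness can read."""
    created = []
    for name in ("memory/personal", "memory/working",
                 "memory/semantic", "memory/episodic"):
        d = root / name
        d.mkdir(parents=True, exist_ok=True)
        created.append(d)
    agents = root / "AGENTS.md"
    agents.write_text("# Agent instructions\n\n"
                      "Read memory/ before acting.\n")
    created.append(agents)
    return created

# Dry run in a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    for p in scaffold(Path(tmp)):
        print(p.relative_to(tmp))
#+end_src

If living with just this for a week captures most of the value, the
rest of the machinery has to justify itself on its own.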

* Exit Criteria

The evaluation is /complete/ when:

- Every criterion above resolves to yes / no / depends.
- Every open question has an answer or a decision to stop waiting
  for one.
- The running assessment contains at least one entry dated within
  the past week describing a concrete interaction with the pattern
  (not just reading about it).

At that point, move to [[file:decisions.org][decisions.org]] and record the decision.