#+title: codejunkie99/agentic-stack -- Walkthrough
#+author: HaQadosch
#+date: [2026-04-22 Wed]
#+startup: indent
#+options: toc:2 num:nil

* Source

- Repo :: [[https://github.com/codejunkie99/agentic-stack][codejunkie99/agentic-stack]]
- Origin thread :: [[https://x.com/Av1dlive/status/2044453102703841645][Av1dlive on X]] -- /AI Agent Stack Everyone Must Use in 2026 (Builder's Guide)/
- Licence :: Apache 2.0
- Stated purpose :: /Keep one portable memory-and-skills layer across coding-agent harnesses, so switching tools doesn't reset how your agent works./

These notes were captured on [2026-04-22]. The repo is in active development -- re-fetch before making implementation decisions.

* Core Thesis

You don't need to build your own model. You need to build the /infrastructure around it/. Three pieces are load-bearing:

- Memory :: persists across sessions and across harnesses.
- Skills :: reusable capability modules with triggers and constraints.
- Protocols :: hard-coded governance for what the agent can and cannot do.

Everything else -- the harness, the model call, the logging -- is thin glue. The bet is that the surrounding scaffolding is the moat, not the model.
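As a shape check, the three load-bearing pieces can be sketched as plain data. This is a hypothetical interface of my own, not the repo's actual API -- every name and field below is an assumption:

#+begin_src python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Persists across sessions and harnesses (hypothetical shape)."""
    semantic: list[str] = field(default_factory=list)  # distilled lessons
    episodic: list[str] = field(default_factory=list)  # raw experience log


@dataclass
class Skill:
    """Reusable capability module with triggers and constraints."""
    name: str
    triggers: list[str]      # phrases that cause the full skill body to load
    constraints: list[str]   # hard limits the skill must respect


@dataclass
class Protocols:
    """Hard-coded governance: what the agent can and cannot do."""
    never_allowed: frozenset[str]

    def permits(self, action: str) -> bool:
        # Governance is a lookup, not a judgment call by the model.
        return action not in self.never_allowed


@dataclass
class AgentStack:
    """Everything else -- harness, model call, logging -- is thin glue."""
    memory: Memory
    skills: list[Skill]
    protocols: Protocols
#+end_src

The point of the sketch: the model appears nowhere in the data model. The stack is defined entirely by what surrounds it.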
* Top-Level Repo Layout

#+begin_example
.agent/                    # Portable brain (the core component)
adapters/                  # Harness-specific integrations
docs/                      # Architecture & guides
examples/                  # Usage examples
Formula/                   # Homebrew formula
install.sh                 # macOS/Linux installer
install.ps1                # Windows installer
onboard.py                 # Wizard entry point
onboard_features.py        # Feature toggle logic
onboard_render.py          # Answer rendering
onboard_ui.py              # Terminal UI components
onboard_widgets.py         # Interactive prompts
onboard_write.py           # File I/O operations
test_claude_code_hook.py   # Hook validation suite
verify_codex_fixes.py      # Regression checks
requirements.txt
CHANGELOG.md  LICENSE  README.md  .env.example
#+end_example

The interesting bit is =.agent/= -- a self-contained directory that is meant to be portable across harnesses. The rest of the repo is installer, onboarding wizard, and adapter glue.

* The =.agent/= Directory

#+begin_example
.agent/
├── AGENTS.md                  # Root prompt -- always loaded
├── harness/
│   ├── conductor.py           # ~25 lines. Reads files, calls model, logs.
│   ├── context_budget.py      # Token allocator
│   ├── llm.py                 # Model adapter
│   ├── salience.py            # Scoring for episodic entries
│   ├── text.py
│   └── hooks/                 # pre_tool_call, post_execution
├── memory/
│   ├── personal/              # Stable user prefs (rarely changes)
│   ├── working/               # Current task, review queue (high churn)
│   ├── semantic/              # DECISIONS.md, LESSONS.md (distilled)
│   ├── episodic/              # AGENT_LEARNINGS.jsonl (raw log)
│   ├── candidates/            # Staged lessons awaiting review
│   ├── auto_dream.py          # Clusters episodic -> candidates
│   ├── validate.py            # Heuristic prefilter
│   ├── promote.py             # Candidate -> LESSONS.md
│   ├── decay.py               # Lessons lose salience without reinforcement
│   ├── cluster.py             # Groups similar candidates
│   ├── memory_search.py
│   ├── review_state.py
│   ├── archive.py
│   └── render_lessons.py      # Renders LESSONS.md from lessons.jsonl
├── skills/
│   ├── _index.md              # Always loaded (short)
│   ├── _manifest.jsonl        # Machine-readable triggers
│   ├── skillforge/            # Creates skills from observed patterns
│   ├── memory-manager/        # Reflection cycles
│   ├── git-proxy/             # Git with safety constraints
│   ├── debug-investigator/    # Reproduce, isolate, hypothesize, verify
│   └── deploy-checklist/      # Pre-deployment verification
├── protocols/
│   ├── permissions.md         # 3-tier allowlist
│   ├── delegation.md          # Sub-agent handoff contract
│   ├── hook_patterns.json
│   └── tool_schemas/          # Typed tool interfaces
└── tools/                     # Host-agent CLI
    ├── recall.py              # Surface graduated lessons for current intent
    ├── learn.py               # Teach a new lesson in one shot
    ├── show.py                # One-screen brain-state dashboard
    ├── list_candidates.py
    ├── graduate.py            # Promote candidate -> LESSONS.md
    ├── reject.py
    ├── reopen.py
    ├── memory_reflect.py      # Log a significant event
    ├── budget_tracker.py
    └── skill_loader.py
#+end_example

* The Four-Layer Memory Taxonomy

The repo distinguishes /four/ memory layers,
not one. Reading order is pinned in =AGENTS.md= (personal first, episodic last):

| Layer    | Lifetime    | What goes in it                     | Example file                 |
|----------+-------------+-------------------------------------+------------------------------|
| personal | Stable      | User conventions that rarely change | =PREFERENCES.md=             |
| working  | Per-task    | Live scratch + review queue         | =WORKSPACE.md=               |
| semantic | Durable     | Distilled, approved lessons         | =LESSONS.md=, =DECISIONS.md= |
| episodic | Append-only | Raw experience log                  | =AGENT_LEARNINGS.jsonl=      |

The taxonomy maps cleanly onto human cognitive science:

- Working memory :: scratch.
- Episodic memory :: experiences.
- Semantic memory :: facts and rules.
- Personal schemas :: preferences and conventions.

This is not branding -- the API surface follows the metaphor (=graduate=, =reject=, =reopen=, =decay=, =dream=).

* The Dream Cycle

Raw events do not become lessons directly. There is a three-stage pipeline with a *human-in-the-loop gate*:

1. =memory/auto_dream.py= clusters raw episodic events into candidate lessons.
2. =memory/validate.py= prefilters obvious junk with heuristics (too-short claims, exact duplicates).
3. The host agent (you) must run =graduate.py <id> --rationale "..."= to promote. *Rationale is required.*

The critical line from =AGENTS.md=:

#+begin_quote
Rationale is required for graduation -- rubber-stamped promotions are the exact failure mode this layer prevents.
#+end_quote

Review happens in *batches*, not one-by-one, because cross-candidate contradictions only surface when you see multiple at once.

* The Thin Conductor

=harness/conductor.py= is the entire runtime.
Full body (abridged):

#+begin_src python
import os

RESERVED = 40000
MAX_CTX = int(os.getenv("AGENT_MAX_CONTEXT", "128000"))

def run(user_input: str) -> str:
    context, used = build_context(user_input, budget=MAX_CTX - RESERVED)
    system = SYSTEM_PREAMBLE + context
    try:
        result = call_model(system, user_input)
        log_execution("conductor", user_input[:100], result[:500], True)
        return result
    except Exception as e:
        log_execution("conductor", user_input[:100], str(e), False)
        raise
#+end_src

Things worth noticing:

- No retry logic :: it raises on failure. Retry policy lives per-skill, not in the orchestrator.
- 40k reserved :: the context budget is =MAX_CTX - 40000= -- generous headroom for the response.
- Logs success /and/ failure :: failure is logged before the raise, so the trail is never lost.

The repo's stated rule: /"The harness is dumb on purpose. Reasoning lives in skills + the host agent."/

* Progressive Skill Disclosure

=skills/_index.md= is always loaded. It is tiny -- five skills, one sentence each, plus trigger phrases. Full =SKILL.md= bodies *only load when triggers match*. This is the context-economy trick: you can have 50 skills without paying the token cost for 49 of them on every turn.

Current shipped skills:

- skillforge :: Creates new skills from observed patterns.
- memory-manager :: Reads, scores, consolidates memory.
- git-proxy :: All git operations with safety constraints.
- debug-investigator :: Reproduce, isolate, hypothesize, verify.
- deploy-checklist :: Pre-deployment verification.

* Permissions: 3-Tier Allowlist

=protocols/permissions.md= splits tool calls into three tiers:

- Always allowed :: reads, tests, branches, memory writes, draft PRs, approved-domain HTTP.
- Requires approval :: merges, deploys, deletes, new dependencies, CI changes, migrations.
- Never allowed :: force-push to main, direct secret access, modifying =permissions.md= itself, disabling hooks, deleting memory entries (archive instead).

The =pre_tool_call= hook enforces this /before/ the call runs. The agent cannot edit the permissions file -- that is in the "Never allowed" tier. Humans edit it; the agent does not.

* Delegation Contract

=protocols/delegation.md= governs sub-agent handoff. Hard cap: *3 levels* of recursion.

Every delegation must specify:

1. Goal (one sentence).
2. Constraints (inherits parent permissions by default).
3. Return format (structured).
4. Budget (tokens, tool calls, wall time).

Memory isolation:

- Sub-agents *read* shared semantic + personal memory.
- Sub-agents *write* to their own =memory/episodic/= namespace.
- On return, the parent decides which sub-agent learnings to merge.

That is a git-staging-area pattern for agent learnings.

* Host-Agent CLI (=.agent/tools/=)

Daily-driver surface, roughly in order of usage:

- =recall.py "<intent>"= :: Surface graduated lessons relevant to the task at hand. /Run before deploy, migration, timestamp, debug, or refactor work./ This is how lessons cross harnesses.
- =learn.py "<rule>" --rationale "<why>"= :: Teach a new lesson in one shot (stage + graduate + render). For rules you already know.
- =show.py= :: One-screen dashboard of brain state -- episodes, candidates, lessons, failing skills, activity graph.
- =list_candidates.py= / =graduate.py= / =reject.py= / =reopen.py= :: Review protocol for patterns the dream cycle has staged.
- =memory_reflect.py <skill> <action> <outcome>= :: Log a significant event to =AGENT_LEARNINGS.jsonl=.

* Key Design Insights

- Four-layer memory taxonomy maps to cognitive science :: Not marketing.
  Sleep-consolidation research inspired the /dream cycle/: REM sleep moves episodic traces into semantic storage, which is exactly what =auto_dream.py= does.
- Rationale-required graduation is the core anti-slop mechanism :: Agents hallucinate most when allowed to promote their own outputs into durable memory without justification. Forcing =--rationale= creates a gate a silent loop cannot pass through. Worth stealing even if the rest is not adopted.
- 3-level delegation cap is a cheap structural fix :: Most frameworks collapse from unbounded recursion, not from any single bad decision. Capping at 3 does not require smarter agents -- just a depth counter in the handoff contract.
- Salience scoring is the piece the thread did not advertise :: Raw episodic memory is append-only. Without salience, top-k retrieval drowns in low-value events. =harness/salience.py= is where the stack gets tuned for a specific workflow.
- No retrieval augmentation :: The repo never mentions RAG, embeddings, or vector stores. With large-context models, the bottleneck has shifted from /retrieval/ to /curation/. The cron-driven compression does more work than any vector store would.

* Open Questions / Things Not Yet Understood

- What does the onboarding wizard (=onboard*.py=) actually do, and how opinionated is it? If it hard-codes choices, adopting the stack is less portable than advertised.
- How do the /adapters/ (=adapters/= directory) differ per harness? This is the seam at which Claude Code / Cursor / Windsurf integrate, and the real test of "portable".
- How is the =pre_tool_call= hook wired into a harness that does not natively expose hooks?
- What does =salience.py= actually score on? Without reading it, the top-k episodic retrieval is a black box.
- How does the repo handle /conflicting/ lessons -- when a new candidate contradicts an already-graduated rule?
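
* Appendix: Dream-Cycle Sketch

For reference, the three-stage pipeline from /The Dream Cycle/ section can be sketched end to end. This is a minimal sketch under my own assumptions -- the function signatures, the clustering-by-skill heuristic, and the 20-character threshold are all invented here, not read from the repo:

#+begin_src python
def auto_dream(episodes: list[dict]) -> list[dict]:
    """Stage 1 (cf. memory/auto_dream.py): cluster raw episodic events
    into candidate lessons. Here: naive grouping by skill name."""
    clusters: dict[str, list[dict]] = {}
    for ep in episodes:
        clusters.setdefault(ep["skill"], []).append(ep)
    return [
        {"id": skill, "claim": f"{skill}: seen {len(eps)} similar outcomes"}
        for skill, eps in clusters.items()
    ]

def validate(candidates: list[dict]) -> list[dict]:
    """Stage 2 (cf. memory/validate.py): heuristic prefilter --
    drop too-short claims and exact duplicates."""
    seen: set[str] = set()
    kept = []
    for c in candidates:
        if len(c["claim"]) < 20 or c["claim"] in seen:
            continue
        seen.add(c["claim"])
        kept.append(c)
    return kept

def graduate(candidate: dict, rationale: str) -> dict:
    """Stage 3 (cf. tools/graduate.py): human-gated promotion.
    Rationale is required -- a rubber stamp cannot pass this gate."""
    if not rationale.strip():
        raise ValueError("rationale is required for graduation")
    return {"lesson": candidate["claim"], "rationale": rationale}
#+end_src

Usage follows the pipeline order: =graduate(validate(auto_dream(episodes))[0], "confirmed across three runs")=. The structural point survives even in this toy form -- stages 1 and 2 can run unattended, but stage 3 cannot complete without a human-supplied argument.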