#+title: codejunkie99/agentic-stack -- Walkthrough
#+author: HaQadosch
#+date: [2026-04-22 Wed]
#+startup: indent
#+options: toc:2 num:nil

* Source

- Repo :: [[https://github.com/codejunkie99/agentic-stack][codejunkie99/agentic-stack]]
- Origin thread :: [[https://x.com/Av1dlive/status/2044453102703841645][Av1dlive on X]] -- /AI Agent Stack Everyone Must Use in 2026 (Builder's Guide)/
- Licence :: Apache 2.0
- Stated purpose :: /Keep one portable memory-and-skills layer across coding-agent harnesses, so switching tools doesn't reset how your agent works./

These notes were captured on [2026-04-22]. The repo is in active development -- re-fetch before making implementation decisions.

* Core Thesis

You don't need to build your own model. You need to build the /infrastructure around it/. Three pieces are load-bearing:

- Memory :: persists across sessions and across harnesses.
- Skills :: reusable capability modules with triggers and constraints.
- Protocols :: hard-coded governance for what the agent can and cannot do.

Everything else -- the harness, the model call, the logging -- is thin glue. The bet is that the surrounding scaffolding is the moat, not the model.
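As a shape check, the three load-bearing pieces can be sketched as plain data. This is a hypothetical interface of my own, not the repo's actual API -- every name and field below is an assumption:

#+begin_src python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Persists across sessions and harnesses (hypothetical shape)."""
    semantic: list[str] = field(default_factory=list)  # distilled lessons
    episodic: list[str] = field(default_factory=list)  # raw experience log


@dataclass
class Skill:
    """Reusable capability module with triggers and constraints."""
    name: str
    triggers: list[str]      # phrases that cause the full skill body to load
    constraints: list[str]   # hard limits the skill must respect


@dataclass
class Protocols:
    """Hard-coded governance: what the agent can and cannot do."""
    never_allowed: frozenset[str]

    def permits(self, action: str) -> bool:
        # Governance is a lookup, not a judgment call by the model.
        return action not in self.never_allowed


@dataclass
class AgentStack:
    """Everything else -- harness, model call, logging -- is thin glue."""
    memory: Memory
    skills: list[Skill]
    protocols: Protocols
#+end_src

The point of the sketch: the model appears nowhere in the data model. The stack is defined entirely by what surrounds it.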
* Top-Level Repo Layout

#+begin_example
.agent/                    # Portable brain (the core component)
adapters/                  # Harness-specific integrations
docs/                      # Architecture & guides
examples/                  # Usage examples
Formula/                   # Homebrew formula
install.sh                 # macOS/Linux installer
install.ps1                # Windows installer
onboard.py                 # Wizard entry point
onboard_features.py        # Feature toggle logic
onboard_render.py          # Answer rendering
onboard_ui.py              # Terminal UI components
onboard_widgets.py         # Interactive prompts
onboard_write.py           # File I/O operations
test_claude_code_hook.py   # Hook validation suite
verify_codex_fixes.py      # Regression checks
requirements.txt
CHANGELOG.md  LICENSE  README.md  .env.example
#+end_example

The interesting bit is =.agent/= -- a self-contained directory that is meant to be portable across harnesses. The rest of the repo is installer, onboarding wizard, and adapter glue.

* The =.agent/= Directory

#+begin_example
.agent/
├── AGENTS.md                  # Root prompt -- always loaded
├── harness/
│   ├── conductor.py           # ~25 lines. Reads files, calls model, logs.
│   ├── context_budget.py      # Token allocator
│   ├── llm.py                 # Model adapter
│   ├── salience.py            # Scoring for episodic entries
│   ├── text.py
│   └── hooks/                 # pre_tool_call, post_execution
├── memory/
│   ├── personal/              # Stable user prefs (rarely changes)
│   ├── working/               # Current task, review queue (high churn)
│   ├── semantic/              # DECISIONS.md, LESSONS.md (distilled)
│   ├── episodic/              # AGENT_LEARNINGS.jsonl (raw log)
│   ├── candidates/            # Staged lessons awaiting review
│   ├── auto_dream.py          # Clusters episodic -> candidates
│   ├── validate.py            # Heuristic prefilter
│   ├── promote.py             # Candidate -> LESSONS.md
│   ├── decay.py               # Lessons lose salience without reinforcement
│   ├── cluster.py             # Groups similar candidates
│   ├── memory_search.py
│   ├── review_state.py
│   ├── archive.py
│   └── render_lessons.py      # Renders LESSONS.md from lessons.jsonl
├── skills/
│   ├── _index.md              # Always loaded (short)
│   ├── _manifest.jsonl        # Machine-readable triggers
│   ├── skillforge/            # Creates skills from observed patterns
│   ├── memory-manager/        # Reflection cycles
│   ├── git-proxy/             # Git with safety constraints
│   ├── debug-investigator/    # Reproduce, isolate, hypothesize, verify
│   └── deploy-checklist/      # Pre-deployment verification
├── protocols/
│   ├── permissions.md         # 3-tier allowlist
│   ├── delegation.md          # Sub-agent handoff contract
│   ├── hook_patterns.json
│   └── tool_schemas/          # Typed tool interfaces
└── tools/                     # Host-agent CLI
    ├── recall.py              # Surface graduated lessons for current intent
    ├── learn.py               # Teach a new lesson in one shot
    ├── show.py                # One-screen brain-state dashboard
    ├── list_candidates.py
    ├── graduate.py            # Promote candidate -> LESSONS.md
    ├── reject.py
    ├── reopen.py
    ├── memory_reflect.py      # Log a significant event
    ├── budget_tracker.py
    └── skill_loader.py
#+end_example

* The Four-Layer Memory Taxonomy

The repo distinguishes /four/ memory layers,
not one. Reading order is pinned in =AGENTS.md= (personal first, episodic last):

| Layer    | Lifetime    | What goes in it                     | Example file                 |
|----------+-------------+-------------------------------------+------------------------------|
| personal | Stable      | User conventions that rarely change | =PREFERENCES.md=             |
| working  | Per-task    | Live scratch + review queue         | =WORKSPACE.md=               |
| semantic | Durable     | Distilled, approved lessons         | =LESSONS.md=, =DECISIONS.md= |
| episodic | Append-only | Raw experience log                  | =AGENT_LEARNINGS.jsonl=      |

The taxonomy maps cleanly onto human cognitive science:

- Working memory :: scratch.
- Episodic memory :: experiences.
- Semantic memory :: facts and rules.
- Personal schemas :: preferences and conventions.

This is not branding -- the API surface follows the metaphor (=graduate=, =reject=, =reopen=, =decay=, =dream=).

* The Dream Cycle

Raw events do not become lessons directly. There is a three-stage pipeline with a *human-in-the-loop gate*:

1. =memory/auto_dream.py= clusters raw episodic events into candidate lessons.
2. =memory/validate.py= prefilters obvious junk with heuristics (too-short claims, exact duplicates).
3. The host agent (you) must run =graduate.py <id> --rationale "..."= to promote. *Rationale is required.*

The critical line from =AGENTS.md=:

#+begin_quote
Rationale is required for graduation -- rubber-stamped promotions are the exact failure mode this layer prevents.
#+end_quote

Review happens in *batches*, not one-by-one, because cross-candidate contradictions only surface when you see multiple at once.

* The Thin Conductor

=harness/conductor.py= is the entire runtime.
Full body (abridged):

#+begin_src python
import os

RESERVED = 40000
MAX_CTX = int(os.getenv("AGENT_MAX_CONTEXT", "128000"))

def run(user_input: str) -> str:
    context, used = build_context(user_input, budget=MAX_CTX - RESERVED)
    system = SYSTEM_PREAMBLE + context
    try:
        result = call_model(system, user_input)
        log_execution("conductor", user_input[:100], result[:500], True)
        return result
    except Exception as e:
        log_execution("conductor", user_input[:100], str(e), False)
        raise
#+end_src

Things worth noticing:

- No retry logic :: it raises on failure. Retry policy lives per-skill, not in the orchestrator.
- 40k reserved :: the context budget is =MAX_CTX - 40000= -- generous headroom for the response.
- Logs success /and/ failure :: failure is logged before the raise, so the trail is never lost.

The repo's stated rule: /"The harness is dumb on purpose. Reasoning lives in skills + the host agent."/

* Progressive Skill Disclosure

=skills/_index.md= is always loaded. It is tiny -- five skills, one sentence each, plus trigger phrases. Full =SKILL.md= bodies *only load when triggers match*. This is the context-economy trick: you can have 50 skills without paying the token cost for 49 of them on every turn.

Current shipped skills:

- skillforge :: Creates new skills from observed patterns.
- memory-manager :: Reads, scores, consolidates memory.
- git-proxy :: All git operations with safety constraints.
- debug-investigator :: Reproduce, isolate, hypothesize, verify.
- deploy-checklist :: Pre-deployment verification.

* Permissions: 3-Tier Allowlist

=protocols/permissions.md= splits tool calls into three tiers:

- Always allowed :: reads, tests, branches, memory writes, draft PRs, approved-domain HTTP.
- Requires approval :: merges, deploys, deletes, new dependencies, CI changes, migrations.
- Never allowed :: force-push to main, direct secret access, modifying =permissions.md= itself, disabling hooks, deleting memory entries (archive instead).

The =pre_tool_call= hook enforces this /before/ the call runs. The agent cannot edit the permissions file -- that is in the "Never allowed" tier. Humans edit it; the agent does not.

* Delegation Contract

=protocols/delegation.md= governs sub-agent handoff. Hard cap: *3 levels* of recursion.

Every delegation must specify:

1. Goal (one sentence).
2. Constraints (inherits parent permissions by default).
3. Return format (structured).
4. Budget (tokens, tool calls, wall time).

Memory isolation:

- Sub-agents *read* shared semantic + personal memory.
- Sub-agents *write* to their own =memory/episodic/= namespace.
- On return, the parent decides which sub-agent learnings to merge.

That is a git-staging-area pattern for agent learnings.

* Host-Agent CLI (=.agent/tools/=)

Daily-driver surface, roughly in order of usage:

- =recall.py "<intent>"= :: Surface graduated lessons relevant to the task at hand. /Run before deploy, migration, timestamp, debug, or refactor work./ This is how lessons cross harnesses.
- =learn.py "<rule>" --rationale "<why>"= :: Teach a new lesson in one shot (stage + graduate + render). For rules you already know.
- =show.py= :: One-screen dashboard of brain state -- episodes, candidates, lessons, failing skills, activity graph.
- =list_candidates.py= / =graduate.py= / =reject.py= / =reopen.py= :: Review protocol for patterns the dream cycle has staged.
- =memory_reflect.py <skill> <action> <outcome>= :: Log a significant event to =AGENT_LEARNINGS.jsonl=.

* Key Design Insights

- Four-layer memory taxonomy maps to cognitive science :: Not marketing.
  Sleep-consolidation research inspired the /dream cycle/: REM sleep moves episodic traces into semantic storage, which is exactly what =auto_dream.py= does.
- Rationale-required graduation is the core anti-slop mechanism :: Agents hallucinate most when allowed to promote their own outputs into durable memory without justification. Forcing =--rationale= creates a gate a silent loop cannot pass through. Worth stealing even if the rest is not adopted.
- 3-level delegation cap is a cheap structural fix :: Most frameworks collapse from unbounded recursion, not from any single bad decision. Capping at 3 does not require smarter agents -- just a depth counter in the handoff contract.
- Salience scoring is the piece the thread did not advertise :: Raw episodic memory is append-only. Without salience, top-k retrieval drowns in low-value events. =harness/salience.py= is where the stack gets tuned for a specific workflow.
- No retrieval augmentation :: The repo never mentions RAG, embeddings, or vector stores. With large-context models, the bottleneck has shifted from /retrieval/ to /curation/. The cron-driven compression does more work than any vector store would.

* Open Questions / Things Not Yet Understood

- What does the onboarding wizard (=onboard*.py=) actually do, and how opinionated is it? If it hard-codes choices, adopting the stack is less portable than advertised.
- How do the /adapters/ (=adapters/= directory) differ per harness? This is the seam at which Claude Code / Cursor / Windsurf integrate, and the real test of "portable".
- How is the =pre_tool_call= hook wired into a harness that does not natively expose hooks?
- What does =salience.py= actually score on? Without reading it, the top-k episodic retrieval is a black box.
- How does the repo handle /conflicting/ lessons -- when a new candidate contradicts an already-graduated rule?
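
* Appendix: Dream-Cycle Sketch

For reference, the three-stage pipeline from /The Dream Cycle/ section can be sketched end to end. This is a minimal sketch under my own assumptions -- the function signatures, the clustering-by-skill heuristic, and the 20-character threshold are all invented here, not read from the repo:

#+begin_src python
def auto_dream(episodes: list[dict]) -> list[dict]:
    """Stage 1 (cf. memory/auto_dream.py): cluster raw episodic events
    into candidate lessons. Here: naive grouping by skill name."""
    clusters: dict[str, list[dict]] = {}
    for ep in episodes:
        clusters.setdefault(ep["skill"], []).append(ep)
    return [
        {"id": skill, "claim": f"{skill}: seen {len(eps)} similar outcomes"}
        for skill, eps in clusters.items()
    ]

def validate(candidates: list[dict]) -> list[dict]:
    """Stage 2 (cf. memory/validate.py): heuristic prefilter --
    drop too-short claims and exact duplicates."""
    seen: set[str] = set()
    kept = []
    for c in candidates:
        if len(c["claim"]) < 20 or c["claim"] in seen:
            continue
        seen.add(c["claim"])
        kept.append(c)
    return kept

def graduate(candidate: dict, rationale: str) -> dict:
    """Stage 3 (cf. tools/graduate.py): human-gated promotion.
    Rationale is required -- a rubber stamp cannot pass this gate."""
    if not rationale.strip():
        raise ValueError("rationale is required for graduation")
    return {"lesson": candidate["claim"], "rationale": rationale}
#+end_src

Usage follows the pipeline order: =graduate(validate(auto_dream(episodes))[0], "confirmed across three runs")=. The structural point survives even in this toy form -- stages 1 and 2 can run unattended, but stage 3 cannot complete without a human-supplied argument.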