<course title="Claude Code Cohort">
  <section title="01-before-we-start">
    <lesson title="01.01-where-were-going" name="Where We're Going">
      <description>Show a preview of exactly where we're going to end up and what we're covering on the way there.</description>
      (no videos)
    </lesson>
    <lesson title="01.02-repo-setup" name="Repo Setup">
      <description>Help everybody get set up with their repos.</description>
      (no videos)
    </lesson>
    <lesson title="01.03-how-to-take-this-course" name="How To Take This Course">
      <description>Describe how to get the codebase into the state you want for taking each lesson.</description>
      (no videos)
    </lesson>
    <lesson title="01.04-which-model-should-i-use" name="Which Model Should I Use?">
      <description>The model I've used is Opus 4.6 on 5x Max on medium effort. That's what I currently run and I'm very happy with it. You can run Sonnet 4.6 if you want, or if you're restricted to Pro. But I recommend you run Opus 4.6.</description>
      (no videos)
    </lesson>
    <lesson title="01.05-how-big-a-subscription-will-i-need" name="How Big A Subscription Will I Need?">
      <description>You will probably be able to get away with Pro, but if you hit any issues then you should upgrade to 5x Max.</description>
      (no videos)
    </lesson>
    <lesson title="01.06-navigating-the-discord" name="Navigating The Discord">
      <description>Show everyone the Discord, where all the channels are, and where they can ask for help.</description>
      (no videos)
    </lesson>
    <lesson title="01.07-office-hours" name="Office Hours">
      <description>Literally just a reference page for the Office Hours links.</description>
      (no videos)
    </lesson>
  </section>
  <section title="02-getting-to-know-claude-code">
    <lesson title="02.01-managing-your-claude-code-session">
      <video title="Explainer">
All right, so you've got Claude Code installed and ready to go. Let's open it up and check out the first few commands I want you to learn. For this entire course, I'm going to be running Claude from within VS Code. We'll talk about the relationship between Claude Code and VS Code later, but for now it's fine to just run Claude inside the integrated terminal. What you should see is a UI like this. I can't promise they haven't changed the UI since I recorded this, but something they will definitely keep is this enormous input box. We can say to Claude, "hello, how are you?", and press Return in the terminal to send it. It comes back with: "Hi, I'm doing well. Thanks for asking. How can I help you today?" In other words, so far this is much like any other AI chat application you've already used.

At this point I'd like you to run something: /terminal-setup. It should turn this nice shade of lilac when you've typed the command correctly. You can then press Enter to run it, and it will set up a couple of key bindings for you. In my case, because I'm on Windows Subsystem for Linux, I have to install this manually myself. But if you're on a different operating system, such as Windows or Mac, it will just be installed for you. Once it is installed, you'll be able to do something very nice: press Shift+Enter to add new lines to your input, which is really important when you want to express more complicated things. For instance, I can type "thank you", add a couple of new lines, and then write "very much", like this, for a sort of dramatic effect.

From here, let's look at a couple of really important commands that we'll be using every single time we use Claude Code. The first one is /usage, which shows your plan usage limits. Just to show you that again, because I did it quite fast: we type forward slash, then "usage", and I can even use Tab to complete the command. Then I run it with Return. This shows me how much Claude Code usage I have left in my current session and in my current week, and there's a separate one for the current week, Sonnet only. This will probably look different depending on which plan you're on and what Claude plans even look like in the future. For me, I'm on 5x Max, and this is what it looks like with 5x Max. I can then press Escape to cancel out of this.

Another really important command is /context. What /context does is show me a graph of all the different ways I'm using up my context window. We're currently using about 21,000 tokens out of a maximum of 200k tokens, so around 10%. We're going to talk a lot, lot more about context throughout this entire course, and this command gives you the ability to introspect the context you're using and how much you have left.

The next command you definitely need to know about is /clear, which clears the entire conversation history. /clear resets the context window back to zero, so it's kind of like you've started up a whole new chat with Claude Code: it's forgotten everything you talked about in your last chat.
Again, we're going to talk a lot about when to clear the context window, but this is how you do it. You can either cancel out of Claude Code by pressing Ctrl+C twice and start up a new Claude session, or you can clear the current session by just running /clear.

Finally, how do you interrupt Claude when it's running? Well, I'm going to ask it a question that will take a little longer to answer: "tell me about this codebase". And any time I want to, I can just press Escape to interrupt it. Once I've interrupted it, if I want it to go again, I can just say "go". That was me literally typing in the word "go" and sending it on its way again. If I want to interrupt it again, I can just press Escape, and it stops doing what it's doing.

OK, so we've learned how to write basic prompts in here. We've learned that Shift+Enter adds new lines. We've learned how to invoke commands with forward slash, using Tab to complete the command. We've learned how to run /usage to check our plan usage limits, how to run /context to visualize the current context, and how to clear the context with /clear. So I would say those are the basics done. Nice work, and I will see you in the next one.
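For quick reference, here is everything from this lesson as a cheat sheet. These commands and key bindings match the Claude Code build used while recording; names and bindings can change between versions.

```bash
# Session-management basics (as of the version shown in this course)
/terminal-setup   # one-time setup; adds the Shift+Enter newline keybinding
/usage            # show plan usage limits (session, week, week Sonnet-only)
/context          # visualize context window usage as a token breakdown
/clear            # wipe the conversation history; resets context to zero
# Shift+Enter     add a new line to the prompt (after /terminal-setup)
# Escape          interrupt Claude mid-response
# Ctrl+C twice    quit Claude Code entirely
```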
      </video>
    </lesson>
    <lesson title="02.02-prompting-in-the-terminal">
      <video title="Explainer">
Now that we understand the basics of Claude Code, I want to give you some nice little tips you can use to improve your prompting. You may want to reference individual files when you're using Claude Code. For instance, you may want to say, "tell me lots of information about this file", or "make some changes to this specific file". To do that, you can use the @ symbol and start typing a filename. For instance, I might want something inside the routes.ts file, and it's right here, so I can move through the autocomplete by pressing up and down on the keyboard. When I want to select something, I can press Tab. And of course, if I want to add another file, I can just go through that process again: I can type "app", then "db", and let's see what's in here. Maybe the DB schema is what I need, so I press Tab and complete it. When I then run this by pressing Return, these files are automatically read into the context window. This means Claude doesn't need to go and find those files itself; they've just been read automatically. This does cost you a little bit of upfront time, in other words, having to find these files yourself, but it's worth it because it gives Claude exactly what it needs to succeed.

One other tip you might not know about, even if you've been using Claude Code for a while: if you have a big prompt here and you want to, let's say, not send it now but send it later, you can press Ctrl+S and it will stash it. It has now stashed my prompt in Claude's memory, and I can submit another command, let's say "hello", and afterwards the stashed prompt is rehydrated and comes back to me. So again: I can stash with Ctrl+S, I can submit something like "I'm good, thanks", and when I submit, the stashed prompt is restored afterwards. This is really useful when you're giving Claude some feedback, let's say on some code it hasn't done very well, and then you realize you just need to tell it something else first: stash your current prompt, bring it back later. If you then realize you don't need that prompt at all, you can press Ctrl+C and get rid of it.

One other really sweet thing you can do is copy and paste images into Claude Code. I've got this lovely image of Lake Bled in Slovenia. I can right-click it, copy the image, go to Claude Code, and paste it in, and it can then reference that image. Annoyingly, I'm on Windows Subsystem for Linux, and this doesn't actually work on WSL. But if you try it for yourself and press Return, it should be able to answer "where am I in the world?" and tell you Lake Bled in Slovenia. Pretty cool.

So to summarize: we've covered referencing particular files with the @ symbol, we've covered stashing prompts with Ctrl+S, and we've covered pasting in images, which I'm kind of gutted not to be able to show you. Hopefully you were able to follow along clearly and nothing went wrong or did anything weird. Nice work, and I will see you in the next one.
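Here are this lesson's prompting tips in one place. The bindings are the ones demonstrated in the video; treat them as version-dependent.

```bash
# Prompting tips (bindings as demonstrated in the recorded version)
# @path/to/file   reference a file; Up/Down to browse matches, Tab to select.
#                 Referenced files are read into context as soon as you send.
# Ctrl+S          stash the current prompt; it is restored after your next send
# Ctrl+C          discard the current (or stashed) prompt
# paste an image  copy an image, then paste it into the input
#                 (reportedly broken on WSL at the time of recording)
```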
      </video>
    </lesson>
    <lesson title="02.03-claude-and-your-ide">
      <video title="Explainer">
Let's now talk about how Claude integrates with your IDE of choice. I'm using VS Code for this course, but you do not have to. You can use Cursor, you can use Windsurf, Antigravity, whatever fancy tools are available now. The way Claude integrates with it is via the /ide command, where it says you can manage the IDE integrations and show the status. If we run this, we can see that we are connected to Visual Studio Code. I've got this working because I have installed the Claude Code for VS Code extension. But if you run /ide in your terminal, it should show you how to install the integration for the IDE you're using. If you're following along, press Enter to confirm and get out of this.

The IDE integration is really for diff management; that's the thing that comes up most often. For instance, if we prompt Claude to remove the test watch command from package.json, we can see that it read one file, package.json, and has now made an update to it. Now, if we weren't connected to the IDE, this would show as a diff inside the terminal, which is a little bit awkward. But since we're connected to VS Code, we can close this and see the diff in VS Code instead. This is really nice because it's much, much richer: it lets us actually scroll and see what's going on in the file, and we can see that this is a decent edit. We can then either press "Accept Proposed Changes" up here, or click inside the editor and save the file. By saving the file, we agree to the changes.

This is the reason I'm running Claude Code inside VS Code. It's really nice to be able to dive in sometimes and just tweak some things about Claude Code's output. And if I need to review any diffs, it's so much nicer doing that in a proper IDE than in the terminal. There are more features and more ways that VS Code integrates with Claude Code, and your IDE probably does too, but this diff feature is the one I wanted to show off because it's the one I end up using 99% of the time. Hopefully you were able to follow along with that. Feel free to go back through the video if you weren't, and ask any questions on the Discord if you ran into trouble. Nice work, and I will see you in the next one.
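To make the diff-review step concrete, here is a hypothetical version of the kind of change you would be reviewing in the editor. The script names are invented for illustration; the repo's actual package.json is not shown in this lesson.

```diff
--- a/package.json  (hypothetical contents, for illustration only)
+++ b/package.json
   "scripts": {
     "dev": "react-router dev",
-    "test:watch": "vitest --watch",
     "test": "vitest run"
   },
```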
      </video>
    </lesson>
    <lesson title="02.04-going-forwards-and-backwards-in-time">
      <video title="Explainer">
One really important thing that Claude Code allows you to do is go backwards and forwards in the conversation. This is really important for when you're trying something out that maybe you later want to revert. For instance, in the previous video we made a change where we removed a script. Now, what I could say to Claude is just "revert that, please", and it will go and undo the change it just made by re-adding the script. And I'm just going to accept this by pressing Ctrl+S to save the diff.

But there's another option here: I can press Escape twice in quick succession to enter rewind mode. In rewind mode, I get to restore the code and/or the conversation to an earlier point. For instance, if I just want to revert the command I just ran, I can choose that entry. The one at the bottom is the current state; this one is the point at which I said "revert that please". So I select it with Enter. Now there are a few different options to choose from. The first is to restore the code and the conversation, in other words, to rewind my session to the point before this code edit was made. The second is to restore the conversation but keep the code as it is. The third is to restore the code but keep the conversation as it is. The one I choose most often is to restore the full code and conversation, so I'll do that now. And we can see that the code has been reverted: we're back to the point at which the test watch script was deleted. If I want to go back further, I can open up the terminal again and press Escape twice to rewind further. Let's go back to the point before we removed the test watch command at all. Another interesting option is the "summarize from here" command, but we'll talk about that later. I'm actually going to cancel this by choosing "never mind" and just sticking with the current state.

One other thing that's also really important: Claude actually persists its sessions locally. This means you can quit out of Claude by pressing Ctrl+C twice and then resume it in a bunch of different ways. You can resume the session by running the command you get from the output when you quit: claude --resume followed by the session UUID, which takes you back into the exact state you left previously. Another way of doing this: I press Ctrl+C twice again to exit and open up a new session just by typing claude; this one is a totally clear session. Inside here, I can run /resume to resume a previous conversation, and now I get to choose from all of the conversations I've had. I can even search through all of the sessions I've had in the repo if I want. Let's go back to the one we were just using, and I can press Return to go into it. And again, I'm back in that same session. So if your session is interrupted for any reason, you can always resume it, because Claude persists it locally.

So that's how Claude allows you to go backwards and forwards in the conversation: either by entering rewind mode with a double Escape and zooming through the history, or by cancelling out and restarting your session any time. I can even just resume the previous session by running claude --continue, and again, this pulls me into exactly where I was. So, very nice work, and I will see you in the next one.
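For reference, here are the resume-related commands from this lesson. The --resume and --continue flags exist in current Claude Code CLI builds, though the exact output and picker UI may differ by version.

```bash
# Resuming sessions (as shown in this lesson)
claude --resume <session-id>   # jump straight back into a specific session
claude --continue              # resume the most recent session in this repo
claude                         # fresh session; then run /resume to pick one
# Escape, Escape (in-session)  enter rewind mode to restore code/conversation
```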
      </video>
    </lesson>
    <lesson title="02.05-running-bash-commands" name="Running Bash Commands">
      <description>Backgrounding with Ctrl+B, running commands with !, suspending Claude and running commands in the terminal</description>
      <video title="Explainer">
One super important part of working with Claude Code is running bash commands. Bash commands turn your agent from just a passive code writer into something that can actually seek feedback loops, can actually work with your project, and can use all the power of bash at its disposal to find information and to do stuff. Now, we can just ask Claude, "OK, run the dev server for me", and because it's quite smart, it will understand: OK, I need to read the package.json, I need to find where the dev server is, and then I need to run it like this. We'll get into what's happening in this approval setup later. But suffice to say, there are times when you just know what command you want to run, you just want to run it, and you want the result to go into Claude's context.

So here's the way you do that. I've just restarted my Claude Code instance. I'm going to type an exclamation mark, and now I've entered bash mode. Anything I put in here will be treated as a bash command and actually run for Claude. So for instance, why don't I run npm run typecheck in here; it runs the command as soon as I press Enter. Now, because I haven't run npm install in my setup, I'm getting a lot of errors, and these errors are now present in Claude's context. So I can get it to help me solve these errors, and it will be able to see them and actually work with them. And there we go: it's figured out that zod is in package.json but not installed, and that I should run npm install.

Now, this works well for commands that essentially have a start and an end, but what about commands that are supposed to persist, like long-running dev servers? For that, we can run npm run dev in bash mode, and while it's running, we can press Ctrl+B to move it to the background. In this version of Claude Code, a message appears saying the command was manually backgrounded, with a task ID, and any output from that task goes into a local file. You can then see that underneath the status line there's a little background task showing. I can move down to it with the down arrow and press Return to see it. Now I can actually view the shell; I can see what's going on, and I can see it's on localhost:5175. I've got lots of options here: I can stop it with X if I want to, or I can just press left to go back, and now I'm back in my Claude Code instance. This is really useful when you're debugging a problem with your dev server, let's say, or with another long-running command, because Claude can see where all of the logs are being written. It can try something out in the UI, maybe, or send a curl request, and then it can actually see the output of the dev server.

So I've just reset my instance so we can see another feature, which is suspending Claude Code. If, for instance, I want to run something that I don't want Claude to see the results of, and I have some state inside Claude that I want to preserve, then I can use suspend. I press Ctrl+Z, and Claude Code has been suspended. This means I can now run any command I want. I can just echo foo in here, and this is not visible to Claude Code. If I want to bring Claude Code back, I can just run fg, and I get my Claude Code instance back with all of its state.

So this is great if you just want to say, "OK, I don't care about Claude Code right now", run whatever command you want, and then fg to bring it back. The decision tree really looks like this. If you want the output of the bash command you're running to be visible to Claude, use bash mode with the exclamation mark, and background long-running commands with Ctrl+B and manage those backgrounded tasks. Again, really, really useful for dev servers: I don't use it every single time, but when I do, it's usually for debugging some kind of dev server issue. But if you want the command to be totally hidden from Claude, you can just suspend Claude quickly with Ctrl+Z. Note that these shortcuts can vary by platform and terminal, so you might need to do something different on your setup. So those are all the different ways you can manage bash commands in Claude Code. Nice work, and I'll see you in the next one.
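Here is that decision tree as a terminal sketch. The script names are the ones spoken in the video, so check your own package.json; the keybindings match the recorded version of Claude Code.

```bash
# Output SHOULD be visible to Claude: use bash mode
! npm run typecheck    # "!" enters bash mode; output lands in Claude's context
! npm run dev          # long-running command...
# Ctrl+B               ...background it; logs go to a file Claude can read

# Output should be HIDDEN from Claude: suspend Claude Code
# Ctrl+Z               suspend Claude Code, keeping its state
echo foo               # run anything in your own shell, unseen by Claude
fg                     # bring Claude Code back exactly where it was
```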
      </video>
    </lesson>
    <lesson title="02.06-permissions">
      <video title="Explainer">
Whenever you're working with an agent, you always need to think about risk versus reward, especially in how much power you give over to it. If I were to give Claude infinite power, like many people have done, then I would be understandably nervous that it would do something crazy with that power. For instance, delete my entire file system by accident. To mitigate that, Claude Code has a very detailed permissions model, and by default it is very strict about what it allows the agent to actually go and do.

For instance, I'm going to tell Claude Code to run a bash command: echo hello. Because echo is an extremely safe command, Claude Code thinks this command is fine for it to run. But if I get it to do something else, such as run a type check on the project, then it behaves differently. We can see it's now requesting to run a command: pnpm type check. So we see a few things here. We see, first of all, the exact command it's going to run. We then see the reason it wants to run it: it wants to run a type check using react-router typegen and tsc. And we have three options. We can say: yes, you're allowed to do this this time. Or: yes, and from now on you're allowed to run any pnpm type check command inside this project. Or we say no, and in here we can do something clever. If, for instance, we don't want it to run this exact command, we can press Tab and give a reason. For instance, we might just want it to use npx tsc instead for some reason. And now it's asking us whether it can run npx tsc instead. So let's say we agree to this: yes, and you don't need to ask again inside this project.

So what actually happens? Well, first of all, for some reason the type check actually failed on this branch, which is, I think, my fault. As we can see, the results from the type check were reported back to the LLM. But crucially, our preferences were recorded in a file, just at the top here. This is inside the .claude folder, in settings.local.json. Here we have a permissions property inside this piece of JSON, and inside its allow array are all of the things we are allowing for this project. The syntax is really important to understand, because we can actually edit this ourselves if we want to allow things ahead of time in the repo. For instance, we can add a Bash rule for pnpm type check, and that means the pnpm type check command will now be automatically allowed by Claude Code. If instead we want to say that all pnpm commands are allowed, we can replace the specific command with a wildcard. We can also disallow Claude from doing things by using the deny array. For instance, we might never want it to run git push, so git push with a wildcard would be a safe one to always deny.

However, it's not just bash commands that Claude Code needs permission to run. Claude Code can also search the web and fetch web pages in order to back up what it's seeing locally. So, for instance, I can get it to fetch info about react-router typegen from the web, and it's now asking me to approve a web search for "react router typegen". If I say yes, and don't ask again for web search commands, then it's going to add that to settings.local.json up here.

And we can see that it's then going to fetch from reactrouter.com. So now it's fetched from there and given me a summary of how React Router typegen works, based on the actual documentation.

The final thing to say is that settings.local.json is git-ignored in my project, so it applies only to me. But if I wanted to share these with my team, let's say, and have a specific set of things that are always allowed within this repo as part of the project settings, then I would rename settings.local.json to settings.json. That file can then be shared with your team, or with any Claude Code instance running on your repo, and they will pick it up too. This can be incredibly powerful for anyone starting on your repo for the first time: they just run claude, and Claude then knows what's allowed for that repo, which can be a lot faster than having to manually set up the permissions yourself each time. We're going to touch on permissions more throughout the course, of course, but I just wanted to give you an intro so you understand what the approval flow looks like, and so you know you can edit these permissions if you like. Nice work, and I will see you in the next one.
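As a concrete sketch, here is roughly what a .claude/settings.local.json could look like after the approvals in this lesson. The "Tool(specifier)" rule strings follow Claude Code's documented permission syntax, but treat the exact entries as illustrative rather than a byte-for-byte copy of what Claude Code writes (JSON cannot carry comments, so all hedging lives here in the prose).

```json
{
  "permissions": {
    "allow": [
      "Bash(pnpm type check)",
      "Bash(pnpm:*)",
      "WebSearch",
      "WebFetch(domain:reactrouter.com)"
    ],
    "deny": [
      "Bash(git push:*)"
    ]
  }
}
```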
      </video>
    </lesson>
  </section>
  <section title="03-day-1-fundamentals">
    <lesson title="03.01-the-constraints-of-llms">
      <video title="Explainer">
Before we even touch Claude Code, we need to understand the constraints of LLMs, because they come with some extremely strange ones. A lot of people think of LLMs as a really enthusiastic junior developer who can work 24/7, but it really is a lot weirder than that. And understanding this stuff is so fundamental, because if you don't understand it, you're going to end up blaming the AI when in fact you're just working against how the models are supposed to work. In this video, I'm going to walk through each of these constraints and explain how they affect your usage of tools like Claude Code.

Let's start with the biggest constraint, which is the scaling laws around tokens and the context window. The way LLMs work is that they take input text like this, split it up, and then tokenize it into numbers. You don't really need to know why they do this; I've talked about it in separate videos. But all of the tokens the LLM can see are its context window. We've only got three tokens in this very small example, but a typical context window maxes out at around 200,000 tokens. Now, if we were to add another token to this context window, you might think: OK, all we've got to do is store one extra number in memory, right? But that's not quite right. Not only are we storing the actual token in memory, we're also storing all of its relationships to every other token. For four tokens, that means six relationships. For eight tokens, we've immediately scaled up to 28 relationships. And if I were to draw 100 tokens, that would mean roughly 5,000 arrows. In other words, every time we add a token, the number of things the LLM has to track and keep hold of scales quadratically. It's not like appending a character to a document; it's like adding a team to a football league, where the number of games played massively increases as you add teams.

What this means in practice is that as you use your coding agent and add more messages to the context window, you are really putting a huge strain on the LLM. As you fill up the context window, the model will start to struggle, and you end up in what I call the dumb zone. In the smart zone, early in the context window, the LLM has lots of memory to spare, its attention relationships are not very strained, it can reason really clearly, it can attend to all the information in its context window, and in general it makes smart decisions. But later, when it starts getting more strained, it moves into the dumb zone. This is where hallucinations start to creep in, reasoning gets a lot worse, and sometimes it will struggle to recall information that's just sitting there in its context window. Now, where the dumb zone actually starts is a matter of some controversy, and it will probably change over time, but I start getting paranoid around the 40% mark. In other words, in a 200,000-token window, I would start getting scared around the 80,000-token mark. Some might say this is too aggressive, and that maybe 60% or 70% is where the dumb zone starts, but everyone agrees that it exists, and whenever you're using LLMs, you need to bear it in mind. The smart zone and the dumb zone are the main constraint we're going to be working against throughout this entire course.
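To pin down the quadratic claim: among $n$ tokens, the number of pairwise relationships is the binomial coefficient, which reproduces the counts from the video:

$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{4}{2} = 6, \quad \binom{8}{2} = 28, \quad \binom{100}{2} = 4950 \approx 5000.$$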
But I want to mention some other constraints too, because they really inform how you use LLMs. One of the main failure modes I see is people trying to use LLMs as a database, expecting them to perfectly retrieve information from their pre-trained knowledge. It's very tempting, right? If I say "To be, or not to be, that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune", LLMs do kind of behave like a database sometimes, just spewing out information from their training data. However, we need to be super careful when we think of LLMs as a database or try to use them in that way. If we're training a relatively small model, we might take, let's say, 10 terabytes of training data, something like all of human knowledge. In reality, this data would be much, much bigger for the massive models, especially the ones we're using with coding agents. And then we compress all of that down into a set of parameters, a size that can fit on a GPU. In other words, the right mental model for LLMs as databases is that they've seen all of human knowledge, but they've got it as a kind of fuzzy JPEG; they've compressed it down to a point where it's hardly even visible to them. So when you ask an LLM a question, it's not referencing the training data directly, it's referencing this fuzzy JPEG that it has, which means its answers are, by design, unreliable.

This is not true, however, of stuff in its context window. For content in its context window, it has access to the entire raw information. So if you ask it questions about stuff it has in its context window, it's going to give you pretty reliable answers. However, of course, the more you stuff the context window, the more you push it into the dumb zone, and the less reliable those answers become, for the reasons we discussed earlier.

Another key constraint that emerges from the way LLMs are trained is the knowledge cut-off date. The pre-training process we talked about is prohibitively expensive, and the process of testing a model is often really elaborate and expensive too. So models are not kept up to date with the latest information once they're deployed. If a model's training data ends in January, it won't know anything that's happened in the world since January. For AI coding, this means it won't have any information about, say, the latest React version if that shipped after its cut-off. So "all of human knowledge" is really all of human knowledge up to a certain date. However, because LLMs are compressors anyway, and this knowledge is just a fuzzy JPEG, I tend to put less focus on knowledge cut-off dates than other educators do, because I distrust the LLM as a database so much that I wouldn't expect it to have reliable background knowledge in the first place. That's the mental model I'm coming in with.

Finally, the weirdest constraint is that LLMs are completely stateless. LLMs behave kind of like the guy from Memento, who every time he woke up would completely forget his entire life. In practice, this means that as you work with the LLM, adding more and more into the context, when you clear the context you completely reset back to nothing, and you then need to build it all back up again. So when you clear the context and start fresh, you're losing the tribal knowledge the LLM has built up over your codebase. This means that documentation, codebase quality, and organization become absolutely key, but we're going to explore that more in the steering section.
So those are the main constraints of LLMs. They have a smart zone and a dumb zone. They have seen all of the world's information, but it's as if it was scribbled on the back of a napkin somewhere and they can only half remember it. They were pre-trained at a certain point in time, which means their knowledge only goes up to a certain date anyway. And they are totally stateless, which means that every time you clear the context, you are essentially wiping their memory. All of their feelings, all of their experiences are just BOOM! Gone.

As developers, or as managers, these are the weirdest set of constraints we've ever had to work around. When you have a new starter joining your team, they might rock up at 10am or something, but at least they have a memory, at least they can learn over time, and at least they don't have such a narrow window in which to work before they just forget everything. During this course and throughout this cohort, we are going to be working with Claude Code to get the most out of these constraints, because when you work within them, you realize: ah, there are actually many techniques I can use to get the most out of these tools without bumping into their weirdnesses. Nice work, and I will see you in the next one.
      </video>
    </lesson>
    <lesson title="03.02-what-are-subagents">
      <video title="Explainer">
Now that we understand some of the constraints of LLMs, let's start talking about how Claude Code tries to mitigate them. Specifically, there's a super important strategy that Claude Code implements to squeeze more juice out of its context window.

I've got a little visualization of the context window here. We can imagine that each of these sections is a different task Claude Code needs to do in that session. This gray bit is the system prompt: the bit we saw in /context, the stuff that's always in there, the system prompt and the system tools, everything that's always being passed to the LLM to tell it how to behave like Claude Code. Then the first thing the agent might need to do is, let's say, explore the repo. It comes in with zero memory, right, so it needs to do a bit of exploration before it goes on to the green section, which in this case is implementation. And in this example, the dark gray is just empty space in the context window that hasn't been filled up yet.

Now, the dream for any kind of harness, any kind of Claude Code-like application, would be to make these chunks smaller, because the fewer tokens we spend on exploration, the more tokens we have available in the smart zone for the implementation. But of course, if we spend less effort on the exploration, we're probably going to end up with a worse exploration, which means the LLM will have less context about our repo, which probably means a worse implementation. So it's hard to see how you bridge this divide.

So Claude Code, and agent tools like it, employ a really smart solution. The agent that you talk to, the orchestrator agent, spawns a sub-agent. In other words, it creates a new context window and prompts that sub-agent to do the task. The sub-agent can then spend lots of tokens doing the task, all within its own smart zone, and then it reports a summary of its results back to the orchestrator agent. In other words, this is a delegation mechanism. It's kind of like the orchestrator is the lead developer, and it hands off a task to a junior developer: explore this repo for me, then report back your findings. The orchestrator can spawn multiple sub-agents too, so you can have multiple sub-agents doing work in parallel, and when they're all finished, they report back to the parent orchestrator. These sub-agents can be spawned with different system prompts and different models too, so we can even use a cheaper model for a relatively simple task like exploration. It's super common to see Claude Code spawn a Haiku sub-agent for exploration, for instance, which is really fast and still high quality.

So this is what sub-agents are: a context-saving mechanism for the orchestrator agent. And Claude Code uses them extremely aggressively, so you're going to see them everywhere when you use Claude Code. Nice work, and I will see you in the next one.
      </video>
    </lesson>
    <lesson title="03.03-codebase-exploration">
      <video title="Problem">
Of all the constraints we explored in the previous exercise, the most onerous, the strangest, and the one you have to think about first is the fact that the LLM is stateless. This statelessness means the LLM is dropped into your codebase every single time with no memory of exploring it before. This means that every time, the LLM needs to get up to speed with the codebase and explore it to understand its patterns, understand the way it's laid out, and understand even what the codebase does. And that makes exploration a foundational skill you need in order to get good with coding agents.

Fortunately, we have a big old repo here for you to explore. You should hopefully have had a chance to click around the app a little bit and understand what it's doing, at least at a basic level. But in this exercise, we're going to use Claude to explore the repo for us. I'm going to start by going into VS Code, as we've done before, and running claude. And I'm going to prompt it by saying: "tell me what the tech stack of this repo is and what its intended purpose is".

Now, at this point I'm going to pause, and we'll see what happens on my machine in the solution. But I would like you to run the exact same thing inside the project and look out for a few different things. First, I'd like you to note when you think a sub-agent is being used. You might be able to tell this from the user interface, so see if you can figure it out. Second, when the LLM responds, query it and ask it some more questions, and see what happens. By the end of this exercise, I do want you to have a really good understanding of the repo, so I've given you a bunch of questions you can ask down below, along with the example prompts after this transcript. Once that's done, head to the solution and I will break down exactly what's happening. So good luck, and I will see you in the solution.
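If you want somewhere to start, here are prompts in the spirit of this lesson. The first two are taken from the videos; the rest are illustrative suggestions rather than part of the original exercise.

```
tell me what the tech stack of this repo is and what its intended purpose is
explore how PPP works in this repo
explore how authentication and user sessions work in this repo
explore where the database schema lives and how migrations are run
```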
      </video>
      <video title="Solution">
All right, let's kick this off and see what happens. We can see, first of all, that it's searching for two patterns and reading six files, and you can press Ctrl+O to expand. If I press Ctrl+O to expand, I enter a kind of verbose mode where I can see all of the commands it has run: where it's calling bash commands, where it's reading certain files. It gives me a brief summary of the repo, including the tech stack. I'm going to toggle this UI off so we end up back at the default.

And what I can see here is that it did not spawn any sub-agents. So in my case, with this model, with this version of the repo, with this particular prompt, it did not spawn any explore sub-agents. What this meant is that we didn't actually read that many files; we only read six files throughout this. That's not going to give me a full breakdown of what's happening in this repo. So let's go to the bottom and prompt it a little deeper. I'm especially curious about the Purchasing Power Parity (PPP) implementation. And when I do this, I'm going to use a special word: explore. I'm going to say "explore how PPP works in this repo". And if I kick this off, we're going to see something really interesting. What we see now is that it's spawned an Explore sub-agent. You see here: Explore, with a kind of title inside the brackets. And this Explore agent is running a bunch of different tool calls and very aggressively searching for files. If I press Ctrl+O to expand, we can see, wow, it's reading a lot more files and searching for a lot more different things. I'll toggle that off and just let it run.

So what happened under the hood was that our orchestrator agent, the one we're talking to, spawned an explore sub-agent with a customized system prompt to explore the repo for us. We can see that inside the sub-agent's context window, it took 60 seconds to complete its task. It used 64,000 tokens, which is about 32% of its context window. That's pretty gnarly, you know, a lot of tokens burned. And it called 25 tools (tools being things like bash commands or file reads and writes). Then it came back with a summary to our parent orchestrator agent, and the orchestrator agent spat out this result for us. This looks like a really in-depth exploration of how PPP works. Honestly, almost as good as if we'd just written it ourselves.

So notice how powerful that sub-agent is, and how important our word choice was. Here we used the verb "explore", which triggered something in Claude Code's latent space: OK, I need to use the Explore sub-agent. Whereas in our previous prompt, we just said "tell me what the tech stack of this repo is". So that's a good hint: when you want a really in-depth exploration, actually use the word explore. Now you should take the opportunity to run any more exploration commands you want on the repo to really deepen your understanding, because soon we're going to be building features. Nice work, and I will see you in the next one.
      </video>
    </lesson>
    <lesson title="03.04-build-a-feature">
      <video title="Problem">
All right, we've explored the codebase and you understand the vague structure of what's going on. Now it's time to build our first feature. We're going to build a course review system where students can leave reviews on courses. I've chosen this feature because it's fairly meaty: you need to touch all areas of the codebase, but it's not that intense in terms of user interface. To give you an idea: we go into the dev UI, log in as a student (we are now Emma Wilson), and then go to, let's say, the Node.js course. The idea is that we want to be able to leave a review on this page as the user. We don't want a written review or anything like that; we really just need a star rating.

So the first thing you'll need to do is get Claude running inside VS Code again. Or, if you've got an existing Claude Code session, run /clear to clear the conversation history. From there, we're going to put together our initial prompt, which is just a couple of sentences about what we want to build. I've gone for this simple prompt: "I would like to create a course review system where students can review courses by leaving a star rating. We don't want to add written reviews, just a star rating. These reviews will then be visible everywhere that courses are visible. We want to show the average rating on the courses in the list page and on the course page itself." Your prompt may look different from mine. You may want to go further and add more detail, or even pull it back and keep it simpler.

Now we're going to pause here, but once you have sent that, I want you to start steering Claude. I want you to be watching it closely, seeing if it spawns an explore sub-agent, for instance, and trying to understand everything it's doing as it goes. It might start wanting to change files or ask you for permissions, which you should be prepared for. You should also make sure you're running /context to check on your context usage as you go. If you've built features with LLMs before, this will feel familiar, but the thing I want you to get out of this is the level of context paranoia that I have when I'm using LLMs. Remember, around 40% usage of the main orchestrator agent is when we should start getting a bit nervous. So let yourself be guided by what Claude does, put yourself in a more observational mode, give it a little bit of steering, but mostly we just want to observe the default behavior of Claude Code. Good luck, and I will see you in the solution.
      </video>
      <video title="Solution">
All right, let's fire this off and see what happens. We can see it's entered something called plan mode, which is fairly self-explanatory, but we're going to dive much deeper into that later in this section. And we can see it's kicked off an Explore sub-agent. Beautiful. Let's check in again once the Explore agent has finished.

OK, it's now finished with the explore phase. It had quite a meaty explore, which is nice. It's now reading some key files in the orchestrator agent. And now it has come back with a couple of questions. These are things which obviously were not present in my initial prompt, or not clear from it. I can navigate these using Tab or the arrow keys. The first question it's asking me: should only enrolled students be able to leave a star rating on a course? That makes sense to me, because only those who've actually paid for the course should be able to rate it; that means the reviews are going to be more reliable. So I'll press Return to select this. What star rating scale should we use? One to five stars feels most appropriate, so I press Return. And should the dashboard page also display average ratings on course cards? Now, I'm not actually sure what's shown on the dashboard page, so I'm going to ask the LLM. I press 4 to chat about this. This should let me quit out of the ask-questions flow, but it appears I've just gone into it again, so I press Escape to get out of there. I'm just going to ask it: "what is currently shown on the dashboard? Give me a list of all the things that are shown there, so I can work out whether adding the star ratings would clutter the UI." You notice I can kind of kick off another exploration this way, or it might already have the information in its context. I'll press Return and see what it does. It's given me a description of the dashboard code and what's actually shown there. Based on the fact that this seems like a private dashboard to me, I'm going to say: "let's not bother with the dashboard page; let's only put it on places where we're intending to sell the course." And I press Return.

Now that it's got all the information from the questions it asked me, it should be able to come up with a decent plan that it will then put into action. As we can see, it's kicked off a Plan sub-agent. This is another context-saving mechanism, where it dives deep into the files again in order to read through everything and design a detailed implementation plan that it's then going to follow. So let's wait for the Plan agent to complete and see what it produces.

OK, the Plan sub-agent has completed, and it's now creating the final plan. This plan is pretty meaty, and it's a multi-step plan. It includes the code we're probably going to add to the schema, then a rating service that follows existing service conventions. When I'm reading these plans, I generally just read the top-level items, making sure I understand all the steps that are going to be completed. So it's going to update the course list page, update the course detail page, touch some files, and then run some verification steps. It's also important to check the top of the plan file for the top-level context: students need a way to rate courses with a one-to-five star rating; only enrolled students can rate; average ratings should display on the course list page and on the course detail page; the dashboard is excluded. So this looks fine to me.

We can now scroll all the way to the bottom and check out our options. There are four. One: yes, go ahead, and auto-accept any edits. In other words, I trust that this plan looks great; just go ahead, I don't need to approve any file writes, feel free to write any code you fancy. Two: yes, go ahead, but I still need to manually approve anything you write. Three: clear the context, keep only the plan in context, and then automatically accept any edits. Or four: if I don't like the plan, I can tell Claude what to change, and it will update the plan. So I'm deciding between the option that clears the context and the option that keeps the current context and just barrels on. To make that decision, I need to check what our current context usage is, because I'm feeling a little bit paranoid about it. I can't exactly recall how to escape from here. It's either Escape (yes, Escape works; I was going to try Ctrl+C if that didn't), which quits us out into Claude, where we can run /context to check where we are. And there we go: we are at 36% context already. We have a ton of messages in the context, and we're pushing up into the bottom of the dumb zone. We might be able to get away with this, but my context paranoia is starting to creep in. So I'm going to go back to where we were by going down to the prompt and saying, "give me the chance to review the plan again". This should reopen the UI (yeah, here we go) where it gives me the option to clear context or not. Because we're already pretty high on context, I'm going to accept: yes, clear the context, and then automatically accept edits.

Because we've now bumped the context back down to only what's in the plan, it reads some of the key files again, checks for existing patterns, and actually kicks off another Explore agent. So that's the downside of clearing the context, right? You then have to run the explore agent again to catch up to where you were. But it tends to be worth it to avoid the dumb zone.

All right, we are finally at the implementation stage, and it's started creating a list of tasks for itself. Now it's started actually implementing, referencing the steps in the plan as it goes. It's added the course ratings table to the schema. It has created the rating service. And it's really starting to cook now, so I'm just going to let it run until it reaches a stopping point.

OK, we're now at a state where it's done four out of five tasks. One is currently in progress, and the thing it's trying to run is the database migration. This is the first time it's asked me for permission. Database migrations are something I always want personal control over, so I'm going to say yes, but I'm not going to give it license to always run them automatically. It's doing the same with the db migrate command, to apply the migration to my local database, and I'm going to say: yes, go for it. Now, I've hit a slight error in my local setup, where it needs to run a manual sqlite3 command and sqlite3 has not been found. So I'm walking through some steps with the LLM to fix my local setup; we can see it trying to run an arbitrary script to get this fixed. In the interest of keeping this video relatively short, I'm going to skip over this fix until I've actually managed to get it sorted.

All right, that is now fixed, and it's now asking me whether I want to run the database seed command, to seed this nice fresh database. OK, and after a couple more permission checks, it is now complete. If you ran into any issues with the setup or with the LLM there, please go and ping in the Discord. But hopefully the model you're using was smart enough to navigate around them.

So now let's check our context one more time, just to see where we're at. Nice: we ended up at 32% usage. That's comfortably within the smart zone, so it was worth clearing out all that early context so that our implementation stayed in the smart zone.

I'm going to open up a separate terminal inside VS Code and run pnpm dev. When I go to the app, I can see that if I log in as Emma Wilson and check out a course I'm enrolled in, then just under "your progress" I'm able to rate this course. And now that my rating has been saved, we can see it on the course up here. If I change it to a three, for instance, it changes on the global course rating straight away. That's quite nice. And if I switch to Olivia Martinez, who also has access to this course, and give it a five-star rating, we can see the average up here change to 4.5. So that, to me, is looking pretty good for a first pass.

What I'd now like you to do is go back to Claude and just type "commit". This is going to get Claude Code to add the relevant files to staging and then commit your code. We can see it's asked for permission to make the commit with its message, and I think: yes, that looks good.

All right, so we have built our first feature with Claude Code. What I want you to do right now is open up a notes app and write down anything you noticed about your session with Claude Code. Write down any unresolved questions you have, such as: what is plan mode? How do I successfully debug with the agent? How much should I be reviewing the code? We're going to come back to these questions throughout the course, and hopefully we'll get you some good answers for them. But well done. This was a long solution video and a long exercise, so hopefully the ones after this will feel a bit easier now we've built the foundations. Nice work, and I will see you in the next one.
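For reference, here is a condensed replay of the checkpoints from this walkthrough. The percentages are the ones observed in this particular recording; yours will differ.

```bash
/context    # mid-plan check: 36% used, nudging the dumb zone, so clear context
            # (approve the plan with "clear context + auto-accept edits")
/context    # post-implementation check: 32% used, comfortably in the smart zone
pnpm dev    # in a separate terminal: run the app and test the star ratings
            # then, back in Claude, type "commit" to stage and commit the work
```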
      </video>
    </lesson>
    <lesson title="03.05-showing-context-in-the-status-line">
      <video title="Explainer">
One thing that drives me crazy about Claude Code's default UI is how hard it is to monitor your context usage as you're working. If you look at other tools in the space, like Cursor or OpenCode, they're really, really clear about what your context percentage is at any time, and that makes it really simple to stay in the smart zone, because you can see when you're going over 40%. In the previous exercise, we had to exit out of plan mode and get out of our flow just to run /context; it was just gross. But fortunately, Claude Code gives us the ability to customize what we see in our UI via a status line, and I'm going to show you how to use a community package to get the context window usage into your view at all times. So, just like me, you can max out on context window paranoia.

The first thing I'd like you to do is clear your context window with /clear, and then use Shift+Tab to cycle between all of the different modes until you reach the default mode. So again: Shift+Tab, tapping between all of these different modes until you see the question-mark-for-shortcuts hint. Now, what I'd like you to do is go to the page below, where there should be a "copy as markdown" button, and copy the entire article into your clipboard so you can paste it into Claude Code. Once you paste it in, you should see something like "pasted text #1", with 80 lines; there might be more or fewer if I've edited the instructions since. Next, go ahead and press Return, and Claude will begin walking through the instructions in the article to set up this package for you. This will be set up inside your global Claude installation, not inside the project: your global Claude directory is where your personal user settings live, not the ones for your project.

I'm going to accept this first command, where it's making a directory, and then I'm going to allow it to read from .claude during this session. .claude is where my user settings for Claude are stored, so I'm going to allow that. It's now asking to make an edit from inside VS Code to add a statusLine entry to the settings.json up here. This looks good to me, so I'm going to press save, which accepts the diff. It then went through and wrote the status line's own settings JSON too. And actually, even before I've restarted Claude Code, I can see there is now a context window usage readout down here, just below my prompt. But just because the instructions tell me to, I'm going to cancel out of that and press Ctrl+C twice to exit. Now I'll open up a new session just by typing claude, and I should see that my context window starts at 0.0%. This context window number will update as we go through sessions and will allow you to keep a really, really close eye on what's happening inside your context window at any time. If you had any issues with the setup, please go to the Discord to figure out how to solve them. But if you're seeing what I see, then nice work, and I will see you in the next one.
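For reference, the edit made to your user-level settings typically looks something like this. A statusLine entry with type "command" is the documented shape in Claude Code's settings, but the exact command string depends on the package the article installs, so treat the one below as a placeholder.

```json
{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline-command.sh"
  }
}
```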
 95        </video>
 96      </lesson>
 97      <lesson title="03.06-what-is-plan-mode">
 98        <video title="Explainer">
OK, now that we've understood Claude Code's default mode and the default way it works when we're not trying to steer it or impose any practices on it, let's start imposing some practices on it. Specifically, I want to talk about plan mode, which we saw in the previous implementation exercise but haven't really discussed, so we don't yet know what it actually is. Now, in the default mode that we saw earlier, which I'm going to call execute mode, Claude Code can do one of four things. It can choose to write files, it can choose to read files, it can choose to run bash commands, and it can choose to run commands from MCP servers. If you're not sure what an MCP server is, don't worry, we'll touch on it later. Now, each of these four options is really useful, of course, but if you think about it, in planning mode the write is actually a lot less useful. We don't really need to be able to write files when we're just planning ahead, right? And so what plan mode does is it disables the LLM from being able to write files. In execute mode it has access to everything, but in plan mode it just has the ability to read, to run bash scripts, and to contact MCP servers. What this means is that plan mode essentially narrows its options and makes it so that it only does read-only actions on the codebase.
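If it helps to picture the difference, here's a conceptual sketch of the two modes as data. This is a mental model only, not Claude Code's actual internals:

```ts
// Conceptual sketch -- not Claude Code's real implementation.
type Tool = "writeFiles" | "readFiles" | "runBash" | "callMcpServers";

// Execute mode: the agent has access to everything.
const executeMode: Tool[] = ["writeFiles", "readFiles", "runBash", "callMcpServers"];

// Plan mode: the same agent minus the ability to write files, so every
// action it takes against the codebase is read-only.
const planMode: Tool[] = ["readFiles", "runBash", "callMcpServers"];
```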
Plan mode also comes with a new system prompt, which tells it to take the instructions it's been given, gather requirements, ask questions, and then produce a plan document at the end. Doing this upfront planning is really helpful for two reasons. First, it massively helps you as a developer further understand your requirements. It means that you can kick off a session with Claude with just a couple of sentences on what you might want to do, and then the LLM is able to interrogate your requirements with you by asking you questions, pushing you closer to what you actually want. If you have any professional experience working as a developer, then this should feel very familiar to you, especially if you did any client-facing work. The client will often come in with vague requirements that you then need to hammer out into an implementation, and that's what you're doing with Claude Code. Secondly, plan mode massively helps the AI itself, because it prompts it to go and do an explore to fetch all the context that it needs. It's often the case that you will prompt it, then it will do an explore phase to see if what you're saying actually matches up to the codebase, and then it will ask you clarifying questions and produce a plan. Claude Code is actually so good at this, and so trained at this, that it can enter plan mode itself, because it understands the benefits so well. It seems to do this more in some versions than others, or with some tasks than others, so it's not always reliable that it's going to enter plan mode. And so the best way to guarantee that it does is to go into Claude Code and press Shift+Tab until you enter plan mode and you see 'plan mode on' at the bottom. This means that most of my AI coding sessions look like this: I plan something, we then execute the plan together, then we test it to figure out if it's working, then we commit, clear the context, and begin a whole new plan. I found this, especially when I was starting out, to be the way to get the highest quality code out of Claude. So that's what plan mode is: just a stripped-down version of execute mode with a slightly different system prompt, tuned to create a plan at the end of it. And then once you're happy with the plan and you've iterated on it, you can go into execute mode. Nice work, and I'll see you in the next one.
100        </video>
101      </lesson>
102      <lesson title="03.07-the-plan-execute-clear-loop">
103        <video title="Problem">
So you should now have a pretty decent understanding of how Claude Code works. And now that we understand what plan mode is and we've set up the context percentage in our status line, we are ready to begin the plan-execute-clear loop. I have got another feature that I would like you to implement, and this one is slightly meatier. The idea is that students will be able to go onto a lesson here and leave comments on that lesson. These comments can then be answered by instructors or by other students. The idea is that a discussion is often just as valuable as the lesson content itself. But I've chosen this feature because it could spiral into something enormous, or it could stay relatively svelte. You could make it so students' comments are not visible to each other and only the instructor gets to see them. Or you could make it so that the instructor has an entire dashboard that they can view all of the comments on, where they can maybe edit the comments or hide them from other users. You could have an entire moderation system here; you could go as deep as you want. And so this is really a playground for you to experiment with plan mode and see what you can build. So what I'd like you to do is kick off a fresh session with Claude Code and then add in some kind of prompt here. For me, I'm going to give it: make it so that students can comment on lessons and instructors can moderate their comments. And then I'm going to press Shift+Tab twice to enter plan mode. The point of this exercise is to use a really, really small prompt to see how well Claude Code can take your simple prompt and turn it into a real fleshed-out plan. I want you to treat this like a sandbox lesson. You're really just experimenting, watching the context window go up, and seeing how Claude Code behaves when you plan with it. Write down your observations and notes, and if you have any questions or just really cool observations, then go to the Discord so we can all hear about them. Nice work, and I will see you in the solution.
105        </video>
106        <video title="Solution">
So I'm imagining that this solution video is going to be fairly long again, so I want you to treat this video as totally optional. The important thing is that you walk through the plan loop yourself, you take some observations, and you get a sense for what's going on. But I know lots of people like to follow along with what I'm doing, so I'll include this video for completeness. So I'm in plan mode, I have my one-sentence prompt, and I'm going to start going for it. So it's gonna kick off an explore again to understand the current structure, database schema, and existing patterns. It's actually running multiple parallel Explore agents, which is not something we've seen before in my run-throughs. These of course each have their own separate context, and so when they come back they bring their summaries back into the main orchestrator's window. I find it really interesting that the LLM knows that my prompt is super underspecified, so it needs to go and grab lots of information. Alright, it's now finished exploring, and it's now saying: let me now read the key files I need to reference for the plan. So it's now pulled in all the information it seems to need, and we're already nearly at 30% context window usage. So it's obviously taken in a lot of information here, and it's now initiating a Plan sub-agent too. In other words, this Plan sub-agent is having to do the work that we've kind of already done, in order to freshly explore the codebase so that it has enough information to actually go and write the plan. So with sub-agents, there is always a little bit of duplicated work going on, but the point is that you're saving tokens inside your orchestrator's context window, which is the goal. And that's essential because we need to keep the orchestrator in the smart zone. All righty, we now have a plan. Okie dokie. I wasn't really expecting it to come up with a plan this quickly; I was kind of expecting it to hammer things out with me. I find that the discussion that happens before the plan is actually really essential to making sense of the plan. So I'm really quite surprised that it created an entire plan just from one sentence and didn't try to interrogate me at all. I think what I'm going to do is cancel out of this and tell it to interrogate me about every single design decision inside here, so that we both understand what we're trying to build better. I'm now going to press Return to kick this off, and it should now actually ask me some proper questions. Here we go: should comments be editable by the author after posting? I think for this version, I'll keep it simple and just say no editing; that makes sense. What happens when a comment is deleted? I pretty much always prefer a soft delete over a hard delete, especially when we have comments that could potentially contain sensitive information, where you might need to reference the comment later if there's a dispute. Hard deleting it from the database just seems super unwise. I think comments should be for enrolled users only, because that makes sense: if you're part of the clique, then you want to be able to see other people's comments, but you don't want your comments to be public public. Okay, I'm happy with those, let's submit. OK, and we've got another round of these questions to walk through. Anyone enrolled, plus instructor, plus admin makes sense to be allowed to post comments.
I think author plus instructor plus admin is fine for moderation. Now this one, the soft delete UX, whether we show a placeholder or not, depends on whether we have threads, right? Because if we completely hide a message that has comments replying to it, the tree structure we're building here could be completely messed up. So I need to know from the agent whether we're doing a thread structure or a flat structure. Are we doing a thread structure or are we doing a flat structure? Because this will change this decision. I like replying in this context because it means you can batch multiple of these questions together and get the LLM to answer each one in turn. There we go: should comments be a flat, one-level chronological list, or threaded? I think for this, I just fancy a flat, simple chronological list. You could always add the others on top of it. So flat makes sense to me. Notice, by the way, just how in-depth I'm thinking about each of these discussions. This is really how I would think about planning a genuine feature. Yeah, since comments are flat, with no threading, soft-deleted comments don't need to preserve structure, but I think I still want to show a placeholder. This is a tricky one, the sort order, how we want comments to be sorted. I'm genuinely not sure here. I think it's probably safest to go oldest first, so that it reads like a natural conversation. That means that if people are replying to other people, then we're going to get a kind of natural flow there and it'll be easier to read. So I'll pick oldest first. I'm going to go with a 500 character max, that makes sense. And then they should appear below all other content too, so all of that makes sense. Notice just how many decisions we are needing to make here, and notice how much we've developed our understanding of the feature just through using plan mode. Think about how much context we would have lost if we'd just accepted that plan straight away, right? So now if we scroll to the top of this newly created plan, we can see it's created this nice design decisions section here. Beautiful. A simple chronological list, comments are immutable once posted, soft deletion with a placeholder, enrolled users only: all of the decisions are right here.
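For reference, a comments table matching those decisions (flat list, soft delete with a placeholder, 500-character cap) might look roughly like this in Drizzle. This is a sketch with illustrative names, assuming the repo's SQLite setup, not the plan's exact schema:

```ts
import { integer, sqliteTable, text } from "drizzle-orm/sqlite-core";

// Soft delete: rows are never removed; deletedAt is set instead, and the
// UI renders a "comment removed" placeholder for deleted comments.
export const lessonComments = sqliteTable("lesson_comments", {
  id: integer("id").primaryKey({ autoIncrement: true }),
  lessonId: integer("lesson_id").notNull(),
  authorId: integer("author_id").notNull(),
  body: text("body").notNull(), // the service layer enforces the 500-char max
  createdAt: integer("created_at", { mode: "timestamp" }).notNull(),
  deletedAt: integer("deleted_at", { mode: "timestamp" }), // null = visible
});
```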
It's worth mentioning too that if I've had this discussion with the LLM like this, and I just review the doc and its major design decisions, I don't tend to then go down and review all of the actual implementation stuff. At this point, I really want to see something on the screen that validates my ideas, and we can go back and improve the code later. And that's, again, the value of having these in-depth discussions with the LLM before you actually crack on. It just means that you can develop a shared understanding with the LLM, so that you don't need to review the finished plan in that much depth. So, I remember that we were at about 35% context window. I'm just going to double check that. Yeah, 35.7. Oh, look at that, it just jumped up to 36.5 in front of my eyes; must have been a caching thing. At this point, it's a really hard decision as to whether I clear my context and initiate this plan, or whether I keep all the juice from everything I've said to the LLM in the conversation. Because if we clear the context and the LLM just goes along based on the plan, it might miss some of the richness of the conversation we had before. But then again, we're getting really close to the dumb zone, so this is a really hard call. I think I'm going to risk it for a biscuit: I'm going to put accept edits on and say go for it. In other words, I'm keeping the context of all of the questions and I'm not clearing it before actually running. So this is gonna put the dumb zone theory to the test. It's not that quality falls off immediately at 40%; it's just that past 40% you might start noticing some artifacts that indicate you're in the dumb zone. In fact, I'm already being filled with regret, because if you just look at how many tasks it's creating here, I can already tell that our session is going to end pretty deep in the dumb zone. Maybe it's not too bad; I'm at 43.2% and rising. It's interesting to note just how relaxed the execution phase is once you've understood the plan really well with the LLM. Because the LLM is just working from all the data inside its context, working from the plan, it's really just rote work at this point. Okay, we've reached step seven, which is to verify with the type check. This of course is where things could go awry. We're at 47.6%, and if it discovers issues with the unit tests or with the type checking that need lots of feedback loops and lots of iterations to fix, then that could end up being pretty nasty. Okay, but it looks like all our tests are passing, which is great. Alright, we've finished on 48.2%, which is not too shabby. I'm now logged in as Olivia Martinez here, a student who's enrolled in this Node.js course, and I can go down here and say 'great stuff guys', put that in, and post the comment. And look at that, the comment gets added here. Very nice. So Olivia can delete her own comment, it seems. But if I log in as a separate user, let's try Liam Thompson, who's also enrolled here, then we should not be able to delete Olivia's comment. That looks correct. Let me just add one from Liam: 'hey guys', and post. And now Liam can delete his own comment, and it says 'comment removed'. Very nice. However, it seems that if I log in as the instructor here, Marcus Johnson, who actually owns this course, I can't see the comments. Now, if this wasn't already an 8-minute-long video, here's what I would do. I would go back to my session here and ask: how much context window have we already used? Are we already in the dumb zone? If we are in the dumb zone, which we are, I would clear the context, kick off a new plan to fix that bug, and go through it again. But if we were still early on in the smart zone, then I would say, sure, we've got enough budget left that we can probably fix this bug with all of the information still in context, and we don't need to go through that explore phase again. So that is the plan, execute, clear loop. You plan a feature with the LLM, or you plan a bug fix, by Shift+Tabbing into plan mode down here. And as we saw, it's incredibly powerful to get the LLM to grill you about all of the decisions it made in its plan. You then monitor the context window and decide, once you've completed plan mode, whether you want to keep all of that initial discussion context, or whether you just want to clear the context, keep the plan file, and go from there. Once you've executed it, you QA it to check if there are any bugs or anything weird about it. If there are, you kick off the loop again, or, if you can squeeze it in and you've got context left over, you can do it from within the same context window in the same session. So well done if you made it to the end there; I think we actually uncovered some really nice nuggets, and hopefully you're starting to understand my context paranoia and the decisions I'm making around when to clear the context and when not. Nice work, and I will see you in the next one.
108        </video>
109      </lesson>
110      <lesson title="03.08-compaction" name="Compaction">
111        <description>Talk about what happens if you don't clear and if you compact. Walk through a compacting session.</description>
112        <video title="Explainer">
Throughout this course so far, I've been leaving something unexplained, and it's time to open the lid and see what's actually in the box. I want to talk about what happens when your context window goes all the way through the dumb zone and actually goes all the way up to the context limit. In other words, you start your session in the smart zone, you do a good job in the smart zone, but let's just say you carry on, carry on, carry on through the dumb zone. What actually happens when you reach the end of the context window? In other words, you have spent 200,000 tokens in a single session. What is going to happen then? We can examine this by running Claude inside our project, of course, and running a /context command here. Now, this context command shows the context usage that we have here, currently sitting at 7%, very nice. But right at the end of the context we can see an auto-compact buffer here. So this is 33K tokens, so 16.5% of the context is reserved as an auto-compact buffer. Now, this is not being used by the context window, so it's not affecting the LLM. There are no tokens actually being put here. It's basically a stopgap. In other words, when we cross into the auto-compact region here, it's going to automatically run something called compact. What is compact, you may ask? Well, let me show you what it does. So I've gone back to a previous session here where we had used 49% of the context window. In this session we implemented the lesson comments table here. We added the comment service, we added a lesson page, add and delete comment actions, and we had comment section and comment card components. Now let's imagine for a second that I wanted to carry on this session and give it some feedback on its implementation. 49% feels a little bit freaky deaky here, so I might be tempted to clear the context. But clearing the context would mean I would need to explore the entire repo again; I would lose all of this nice context of what was actually implemented. My context window kind of looks like this at the moment, where I have this nice wodge of good context; I just want to fit it into a smaller space so that I can do some work in the smart zone. This is what compacting, in theory, does. It takes a large wodge of context here, with a bunch of repeated tool calls and maybe some stuff that doesn't really need to be there, and it just gives you a summary of what the most useful stuff is. And crucially, it uses an LLM to do this. So this process of reducing the large context into a small context does cost you tokens. But of course, exploring the repo again would also cost you tokens. So I'm going to try this out by going into the Claude Code session and running /compact here. I now get the option to pass in custom summarization instructions. This allows me to pass some information, and maybe some guidance, to the LLM that's doing the summarization. And I tend to use this to tell the LLM why I'm compacting, in other words, what I'm about to do after I compact. So for this one, I'm gonna say: I've just implemented a feature and I want to do some QA on it. That really is it; that's all the LLM needs in order to do a better job, I've found.
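For reference, the whole invocation is just the command followed by your instructions on one line, something like:

```
/compact I've just implemented a feature and I want to do some QA on it
```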
So let's actually run this and see what happens. We can see a 'compacting conversation' indicator come up. This does usually take a fair chunk of time, maybe a minute, maybe two, depending on the size of the conversation. Okay, and we can see it is now compacted. There's a little bug here where it still shows the original context in the status line, so that is deceiving. But we can see it also gives us a little printed summary here. It keeps some of the main files in context: it keeps the app's routes in context, it keeps this lesson.$lessonId route in context, and it keeps references to some files as well, so this file is not in context but it's still referenced. And it's also got the plan file referenced, so it knows that the plan file is there. We can also press Ctrl+O to see the full summary here. So here we go, this is the full summary of everything that was in the conversation. Along with the files and the references above, this is all the LLM has in its context, and we can see it's simply a markdown document. You see how little remains here; I mean, there really isn't that much. It's just a set of bullet points and some code samples saying what was in the conversation. We can see it preserves all the user messages, so anything that we might have said, I suppose. And we can also see that there's a pending task to QA the feature: the user's additional instruction says, 'I've just implemented a feature and I want to do QA on it.' So there's our intention being preserved in the compacted conversation. This is nice too: it also preserves the full transcript in a file, so if it needs to reference anything that was said in the previous transcript, it has a reference to it. We can then exit out of here by pressing Ctrl+O again. And I'm just gonna show you what's in the context by running /context. Zooming up to the top here, we can see we are now at 12% of tokens. So we've compacted all that big conversation into just 23K tokens. So this is what compacting does, and this is what would happen if you were to hit the auto-compact buffer. So a natural question becomes: why do we bother clearing the context at all? Why not just allow the context to grow until we hit the auto-compact buffer, then zoom down again, zoom up again, zoom down again, and just keep going like that? Well, the reason is that every time you hit the auto-compact buffer and compact, it leaves what I like to think of as a little sediment inside the context. When you then go up again and hit the auto-compact buffer again, you leave another little piece of sediment. And these layer up and up and up, and they affect the output in unpredictable ways. Whenever you start a session with an agent that has some of this sediment in its context, it's in a different state from the way you usually work with it. Whereas if you optimize your workflow to work with an agent that always starts with nothing in its context, then you find you end up with more predictable outputs each time. Not only that, but you spend fewer tokens, of course, because you're not spending tokens on compacting. You spend more time in the smart zone, because you've got more smart zone to work with, and less time in the dumb zone. So your code quality outputs tend to be higher. I have to caveat this: this is my mental model, and this is what I have found best results with. In other words, this is my opinion. It also happens to be the view of lots of people in the community, so this is kind of an agreed-upon idea. However, there are prominent people that say you should just, you know, continue working, hit the auto-compact buffer, and not have to worry about context ever.
So the question then becomes: when should you compact? I find myself compacting relatively rarely. It tends to be in cases like we discussed, where I have just finished a large session and I just want to add some extra feedback on top. I've also found this really useful when debugging or trying to solve a complex error. For instance, you fill up your context window with things that you've tried, and you don't want to lose them by clearing the context and just going back to nothing. So you compact, and you say to the LLM: okay, we tried these things, let's now try some more. But you should notice that the times I'm using compact are times where I'm working with the LLM directly. Our end goal with this course is to get to a place where you should not need to touch the LLM; it should be working relatively autonomously. And so relying on compact for your workflows means that you're also relying on you being there to tell the LLM when to compact. Which of course is useful, but not quite where we're going in this course. So that's what compacting is. It's a Claude Code mechanism for essentially making sure you can have an infinite conversation with Claude if you want to. Compacting multiple times over a conversation is considered an anti-pattern by me and lots of people in the community, because you build up these horrible little gunky layers of sediment in your context, from maybe unrelated conversations. And what you should be optimizing for is a clean context, not one that has a bunch of memories already in it. When I do compact, it's usually only once per conversation, and it's usually only in cases where I'm working on a difficult, long-running task, I want to stay in the smart zone, and I just want to give it a bit of extra feedback. In general, I organize my setup and my harness and all the things I use with Claude Code to make sure I never have to compact. And when I do compact, I usually feel bad about it afterwards. However, this is just my opinion and you may find different results. And they may even improve compacting in the future to the point where I actually like it and use it again. Nice work and I'll see you in the next one.
114        </video>
115      </lesson>
116    </section>
117    <section title="04-day-2-steering">
118      <lesson title="04.01-what-is-an-agents-md-file">
119        <video title="Explainer">
In one of the early lessons of this course, I talked about how LLMs just forget everything as soon as you clear the context. And immediately you probably ended up thinking: okay, that sounds terrible. That sounds like an awful constraint. Because I have preferences that I want to be able to teach my LLM, or I have certain ways of working or certain patterns that the LLM maybe is not very good at by default. It's going to be absolutely brutal to tell the agent every single time about what my preferences are before it goes and does its work. It would be great if there was some kind of memory mechanism for Claude Code, some way that Claude Code could learn my preferences, or at least my repo's preferences, over time, and then use that to improve its output. Fortunately there are multiple ways to solve this problem, some supported by Claude Code and some supported by the community. And in this section, we're gonna walk through how to use these to the best of your ability and which ones to pick. The first one we're going to look at is AGENTS.md, which is a simple open format for guiding coding agents. You can think of AGENTS.md as a README-for-agents: a dedicated, predictable place to provide the context and instructions to help AI coding agents work on your project. The appeal of AGENTS.md is how many places it is supported, or rather how many tools use it: Gemini CLI, Devin, Codex, Cursor. The very notable exception here is actually Claude Code. Claude Code doesn't use AGENTS.md and doesn't recognize it. Instead, it wants it as CLAUDE.md. My desperate hope is that Claude Code will start supporting AGENTS.md, because it's just so stupid that it doesn't. And so because I'm very hopeful, and maybe a bit naive, I'm just going to refer to it as AGENTS.md, whereas in fact we will be writing a CLAUDE.md. What we're going to do first is go into our project and run touch CLAUDE.md here, just to see how it works. And we end up with an empty CLAUDE.md file in the root of our repository. I'm going to say to it: always reply to me in pirate language. And then we're going to save this file and open up Claude inside my terminal here. I'll then say to it: hello, how are you doing today? And when we run this, we can see that it's now replying in pirate language. So we have successfully steered our coding agent. There's a really important thing to note here, though: I didn't at any point opt into this CLAUDE.md; it just was pulled into the conversation. In other words, things inside CLAUDE.md are global for the entire repository. This means that no matter what I do in this repo, Claude will now be responding to me in pirate language. And so this is the biggest downside with AGENTS.md or CLAUDE.md: it is global. So this means that our CLAUDE.md will be included in every conversation that we have with Claude. I'm going to copy and paste this about a thousand times here until I end up with an enormous file, just to show you the impact on the context window. OK, we now have a 2,000-line file where it just says 'always reply to me in pirate language'. I'm now gonna kick off a new Claude Code instance to make sure it picks up the changes in my CLAUDE.md, and we can see a warning here saying a large CLAUDE.md will affect performance. If I go into my context here by visualizing the current context usage, we can see that I have burned about 10% of my entire context here on this memory file, on the CLAUDE.md file. So everything you put in CLAUDE.md costs you tokens.
And I have, no joke, out there in the wild seen CLAUDE.mds that are not this stupid, but certainly have around 500 or 1,000 lines of stuff in them. So my default attitude with CLAUDE.md files is paranoia: not wanting to put too much stuff in there, not only because it's global, but also because it costs you tokens on every single request. You'll notice I have quite a lot of paranoia when it comes to working with AI agents. I think that's relatively healthy. I especially have paranoia about a specific command that Claude Code bundles, which is the /init command. If I run this here, you'll see what it does. It initializes a new CLAUDE.md file by looking at your repository, exploring it, and putting some of the codebase conventions inside a CLAUDE.md file at the root. We can see it's kicked off an Explore sub-agent here, so we'll wait for this to complete. You can try this too if you like; I sort of don't recommend you do it, for reasons we'll explain in a minute. OK, we can now see the edit that it has proposed to make, and the edit is a pretty large CLAUDE.md file. If we cancel out of Claude and then run a new Claude session, we can see by running /context just how much this is filling up our context window, and we should see that the memory file it's created is nearly a thousand tokens. Now, that might not sound like very much, that's just sort of 0.4% of the context window, but if you imagine that's on every single request, that is just gonna push you closer to the dumb zone and cost you tokens on every request. And a lot of this stuff is just stuff that the agent could discover very simply by itself, and stuff that will probably go out of date quite quickly, too. For instance, the stuff inside the package.json here is incredibly easy for it to discover: just check the package.json file. And if you ever change any of these scripts, you'll need to remember to go into CLAUDE.md and update it too. So for that reason, I really don't recommend you run /init, even though the UI will tell you to a bunch of different times. It's also worth saying that Claude often actually ignores the stuff inside CLAUDE.md. Claude Code injects a system reminder along with your CLAUDE.md file in the user message to the agent, which says that this context may or may not be relevant to your tasks, and you should not respond to it unless it's highly relevant. In other words, Claude will just ignore anything inside your CLAUDE.md file if it thinks that it's not relevant to its current task. This, by the way, is from an excellent article from HumanLayer on optimizing your CLAUDE.md file, which I'll link below. So even if you put stuff in there, Claude has the option to just ignore it if it wants to, which means that steering it is not always reliable. So now that you understand a bit more of what CLAUDE.md, or AGENTS.md, is, when should you actually use it? We're going to be getting into this and talking about it a lot more during this section, because it's not an easy question to answer. But anyway, let's summarize. Essentially, AGENTS.md and CLAUDE.md are the same thing, except that Claude Code never listens to AGENTS.md and will only listen to CLAUDE.md files. CLAUDE.md is often ignored, and that's kind of by design, because Claude Code actually tells the LLM: do not always use this. And finally, CLAUDE.md is global, so it's included on every single request you make to the agent. So you'd better make sure that the stuff you put in there is relevant to every single request you're going to make of the agent. So nice work folks, I will see you in the next one.
121        </video>
122      </lesson>
123      <lesson title="04.02-steering-an-agent-with-the-agents-md-file">
124        <video title="Problem">
All right, so we've learned what an AGENTS.md file is, but let's now use it to steer Claude Code to build some features. To save you a bit of time, I've put together a plan that you are going to implement. In other words, I've gone into plan mode and I've put together a bookmarks feature. Bookmarks are private to each student, they persist until manually removed, and they are inline only, so there's no dedicated bookmarks page. I'll make this available to you below so that you can copy and paste it directly into Claude, as I'll be doing in a second. Now, we mentioned last time that CLAUDE.md can be used to steer the agent. But what particular patterns do we want to steer it towards or against? Well, I've added something to my CLAUDE.md that is something I really hate to see from AI agents. It says here: when you have a function with more than one parameter of the same type, use an object parameter instead of positional parameters. In other words, this first one up here is the bad version, where we have an addUserToPost with userId as a string and then postId as a string. It is very, very easy to call this function with userId and postId switched. And so, in my opinion, it's much better to have an object as the parameter. You pass an object instead of positional parameters, because then you specifically need to name which one is the userId and which one is the postId. When I was building this dummy repo, I put lots and lots of these positional parameters in here. So now in this exercise, there's going to be a funny conflict, because the codebase mostly does the old pattern, but we're telling the agent to use the new, good one. This means that during its explore phase, it's probably going to see a lot of the old pattern, but its CLAUDE.md file will be telling it to do something different. So let's see which one wins. So to set this up, I'm going to open up a new Claude session here. I'm going to grab my plan here, copy it, and then just go and paste it directly into Claude Code here. And it's now kicking off, as it will be for you. I would like you to sit back and observe what you see from Claude Code here. Is it obeying the CLAUDE.md file, or is it obeying the stuff that's in the codebase? And while this goes on, see if you notice any patterns in the code it creates that you would like to steer against in the CLAUDE.md. We're really developing an instinct for how to steer the LLM, even if, as we discussed, the CLAUDE.md file is maybe not a perfect place to do the steering. So best of luck, follow the steps below if you get stuck, and I will see you in the solution.
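To make the rule concrete before you watch it run, here's the pattern in isolation (illustrative names, not code from the repo):

```ts
// Bad: two positional parameters of the same type are easy to swap.
function addUserToPost(userId: string, postId: string): void {
  console.log(`linking user ${userId} to post ${postId}`);
}

const user = "user_1";
const post = "post_1";
addUserToPost(post, user); // compiles fine, silently wrong

// Good: a single object parameter forces each argument to be named.
function addUserToPostSafe(opts: { userId: string; postId: string }): void {
  console.log(`linking user ${opts.userId} to post ${opts.postId}`);
}

addUserToPostSafe({ userId: user, postId: post }); // a swap would be visible at the call site
```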
126        </video>
127        <video title="Solution">
All right, let's walk through what it's done so far. It started by exploring the codebase. In fact, it didn't end up kicking off an Explore agent, I suppose because we'd already done some upfront work and because this plan was pretty detailed. It then added a bunch of tasks down here, and it's about to start kicking off the plan. So I'm gonna press Shift+Tab and just allow it to create all edits. Now I can already see some promising signs here. Ooh, yes, lovely. Inside the bookmark service, if we open it up by going to bookmarkService, we can see that it's using this opts pattern that it's read inside its CLAUDE.md. It has a userId of type number and a lessonId of type number here. And we can really tell it's doing the same thing as in the CLAUDE.md, because it's used the same parameter name. So we had opts inside here, which is just my preferred way of doing it, and this one has opts as well. We can see too that it's actually done this for all of the different functions here, so isLessonBookmarked and getBookmarkedLessonIds all have this same pattern. So I've now been sat here for a while, and I've let Claude Code complete all of its work. It added the schema, the migration, the bookmark service, and the lesson viewer. But one extremely important thing it didn't do: it did not add any tests to the bookmark service. This is pretty bizarre to me, because there are tests for a bunch of other services inside here, which is visible just from the file system; but I suppose because we didn't specify it in the plan, it didn't end up writing any. This would be a pretty high value instruction to add into a CLAUDE.md file. So let's actually go through, search for CLAUDE.md, and add it inside here. So here's what I'm going to add: anything marked as a service by the name of the file, for instance AuthTokenService.ts, should have tests written for it in an accompanying .test.ts file. This is a really high leverage instruction, because it's just a small hint to the agent that this is a convention we use. This is unlikely to rot away or go out of date, because it's not actually specifying or naming any particular files. It's really just a piece of jargon in our application that is unlikely to change. If you've ever worked on a complicated enterprise project, you know how sticky jargon can be, and I feel confident that the word 'service' will, in this application, in general mean a tested unit: a tested unit that provides a specific piece of functionality. Okay, now that I've updated this CLAUDE.md file and saved it, I'm going to go back into my Claude Code and I'm going to say: review your work with the updated CLAUDE.md in mind. The reason I'm specifically mentioning this is because if you make updates to CLAUDE.md while Claude Code is running, it won't pick them up until you go into a new session. I don't want to go into a new session, because I want all of the context that we've had before. I'm just about able to stay within the smart zone for this, and I just want it to pull in the most up-to-date version of the file so it understands what has changed and what I'm saying. There we go, it's picked it up. The updated CLAUDE.md adds the rule: services must have accompanying .test.ts files. It's looking for existing test patterns, and then it's gonna write some tests. All right, now it's writing the bookmark service tests.
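What it writes lands roughly in this shape. This is a sketch only: a stand-in store rather than the repo's real Drizzle queries, and it assumes the repo's test runner is Vitest:

```ts
// bookmarkService.ts -- the opts pattern from CLAUDE.md (illustrative)
const bookmarks = new Set<string>(); // stand-in for the real database table

export async function isLessonBookmarked(opts: {
  userId: number;
  lessonId: number;
}): Promise<boolean> {
  return bookmarks.has(`${opts.userId}:${opts.lessonId}`);
}

// bookmarkService.test.ts -- the accompanying .test.ts the new rule demands
import { describe, expect, it } from "vitest";
import { isLessonBookmarked } from "./bookmarkService";

describe("bookmarkService", () => {
  it("returns false for a lesson the user has not bookmarked", async () => {
    expect(await isLessonBookmarked({ userId: 1, lessonId: 2 })).toBe(false);
  });
});
```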
Okay, all 10 tests passed, the missing test file for bookmarkService.ts has been added, and all services now have accompanying .test.ts files. We've stayed just inside the smart zone, and we've successfully done a bit of steering. Now the LLM will remember both of these rules: the one about positional parameters, and the one that anything marked as a service must be tested. I've also just done a bit of QA on the feature, and I can see that it is now working too. So I add a bookmark here, it adds a bookmark to the section header, and nice, it seems to work. We've successfully built a feature and we've improved the LLM's output, potentially, for next time. Nice work, and I will see you in the next one.
129        </video>
130      </lesson>
131      <lesson title="04.03-progressive-disclosure">
132        <video title="Explainer">
Alright, I've taken the CLAUDE.md file that we had in the last lesson and I have fleshed it out a little bit more, adding a bunch of sediment inside here, basically as if a bunch of developers had gone in and added their own ideas into this file. If you've been using Claude with a bunch of teammates, then you probably have one that looks a little bit like this. Now, this is not exactly ideal, for reasons that we've already touched on. We have put a bunch of instructions into the global scope here, and not all of these are going to be relevant for every single request that we make of the LLM. All of these instructions are going to be competing with the instructions that we give it in the prompt, for instance in the plan. And if we're only touching front-end code that maybe doesn't have any positional parameters, that doesn't touch any services, maybe we don't even need to do any importing. Maybe we don't need to add a new ID to the database. Maybe we don't need to add a new timestamp to the database. You know, all of these instructions are actually pretty narrow in the kinds of sessions that are going to need them. I want you to imagine that each one of these instructions is one of these grey blobs in the context window. Now, it might be that only these little blobs here are actually relevant to front-end code. It might be that only these instructions are relevant for database code. Maybe these instructions are the ones that are relevant for React Router. Looking at all of these blobs, isn't it funny that we've got them all mixed together in one file? Wouldn't it make more sense if they were grouped together, since a session that needs one piece of React Router framework advice will probably need more React Router framework advice? So what if we grouped them like this instead, where each group was in a separate file, all located together, and each of the unique ones was in its own file? And then inside the CLAUDE.md file you would have just a set of links, which pointed you to the correct grouping when needed. So these would simply be separate markdown files, and CLAUDE.md would use markdown links to point you to the right place. Now, what we're organically discovering here is a really important concept that's going to recur throughout the rest of this course. And that idea is progressive disclosure. Progressive disclosure is actually an idea that comes from UI and UX design. The idea is that bad UI involves putting all available actions into one huge screen that the user can scroll through and choose from. If you've ever seen the menu of a bad website where it literally just has a hundred options up at the top in one single flat list, you know what I mean. Instead, you should reduce the number of choices that the user has to make upfront, and then allow them to find further information based on that small number of choices. So progressive disclosure of complexity is the name of the game. Instead of throwing all the complexity out at once, you just sort of give them a map and you allow them to navigate through it. This CLAUDE.md file chucks all the complexity in at once. And as it grows bigger and bigger, I would be much more likely to progressively disclose parts of it. In other words, take it out of the CLAUDE.md, put it in separate files that aren't immediately visible to the LLM, and then just link to it from the information that the LLM does have inside CLAUDE.md.
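Concretely, such a CLAUDE.md ends up as little more than a map. The paths and wording here are illustrative, not the repo's actual files:

```markdown
# CLAUDE.md

- Working on React Router routes or loaders? Read [docs/react-router.md](docs/react-router.md)
- Touching the database or Drizzle schemas? Read [docs/database.md](docs/database.md)
- Writing front-end components? Read [docs/frontend.md](docs/frontend.md)
```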
Now, this concept of progressive disclosure is gonna come up again and again and again throughout this course. It goes into software architecture and codebase design and all that stuff. But in the next couple of sessions, we're going to look at it in the context of steering. So nice work, and I will see you in the next one.
134        </video>
135      </lesson>
136      <lesson title="04.04-what-are-agent-skills">
137        <video title="Explainer">
So we've been talking about AGENTS.md and progressive disclosure, and we've been kind of not very kind to AGENTS.md. It is not great that AGENTS.md just puts everything into the global scope. That is no fun at all. So wouldn't it be great if there was a built-in solution, accepted by the community as the right way to do progressive disclosure when you're steering agents? And that solution is agent skills. This is an open format accepted across coding agents, including Claude Code. Anthropic were the ones that invented it, but they then gave it away. It is a simple open format for giving agents new capabilities and expertise. Agent skills are folders of instructions, scripts, and resources that agents can discover. The key word there is discover, because they are not forced to take this information in. They're not forced to see it, but it is in a place where they can easily discover it. Let's take our example from before to illustrate what we mean. Before, all of these React Router-style instructions were just forced on the agent, immediately. They were present in its context window. Same for all of these front-end ones, and same for all of these database ones. All of the instructions were right there. But what if instead we refactored these into their own files, and we gave a little bit of metadata to the agent to say: use this skill when you need help with the React Router framework, use this one when you need help writing front-end code, use this one when you need help writing database or Drizzle code? Then all the agent would see by default would be the name of the skill and the description of the skill. And then, if it chose to, inside the session, it could basically call the React Router skill and, bam, all of the instructions would become available to it. This is the idea behind skills. They allow you to progressively disclose instructions. And they allow you to kind of build a user interface for your agent of all the information it might need to know. So inside the repo, I've added a couple of example skills. These are skills that I genuinely found very useful while I was building out this application. If we open up the Explorer, we can see them inside .claude, then skills here. And there are two skills here: one for a better-sqlite3 rebuild, and the first one we're going to look at, pnpm-not-found. Notice how these skills, just like the CLAUDE.md, are inside the project folder. Just like CLAUDE.md, you can have them inside the project like this, or you can have them scoped globally to your user. In my setup I've got some user-scoped skills and some local skills. Of course, the ones that are scoped to the project are much easier to share with your teammates as well, so that's worth thinking about too. If we click into skill.md here, we can see that it follows a certain pattern. It has a piece of frontmatter at the top here, which has the name and the description. Just like we mapped out earlier, by default only these things are available to the agent, so it cannot see the rest of this file inside here. The description says: fix 'pnpm: command not found' errors by enabling Corepack. Use this when pnpm cannot be found, Corepack errors appear, or the package manager is missing. It turns out that LLMs really don't know about Corepack, and maybe you don't know about Corepack either.
But if you encounter errors like 'pnpm: command not found', you can literally just run corepack enable and then everything will wire itself together. So this is a perfect thing to put in a skill, because I don't want this in a global CLAUDE.md file: the error really only happens very rarely. But when it does happen, it's really high leverage for the LLM to know how to fix it. The other skill that I've got here is this better-sqlite3 rebuild. Use when seeing errors about the better-sqlite3 native module, a Node module version mismatch, or 'was compiled against a different Node.js version'. So it's a very clear description of when I wanted this skill invoked. It's again very similar to the pnpm skill that we had there. It's just trying to fix a relatively rare issue: if there's a Node module version mismatch, then it should run pnpm rebuild. These are perfect use cases for skills, because we're just steering the LLM in the right direction to avoid a rare issue. Now you may be thinking, if you look at this diagram, how possible is it to go even further in the progressive disclosure? Is it possible within the React Router skill to have other documents inside there that might speak to specific weirdnesses with React Router? Could you potentially take the entire React Router documentation and put it inside a skill? Well, you absolutely could. For instance, if you wanted to just create an extra little file inside better-sqlite3-rebuild, you could take this, imagine it's a more complex script than it is, put it inside a script.sh, and then you can just have the script in there and reference it from this file. The way you reference it is simply by using a markdown link here, which is very, very easy for the LLM to follow. So this is a great way of hiding complexity within your skill. Your skill might want to bundle scripts, it might even bundle images; you could bundle anything you can put on the file system inside a skill, but you don't have to put it inside the main skill.md file. So this is why skills are really so exciting: the progressive disclosure inside them means you can fit really high-leverage, high-power things in them, then bundle those up really simply and share them with any coding agent and your team. And so skills as a steering mechanism are really, really attractive. I want to note something interesting finally, which is that I think of there as being two types of skills: skills that you want the LLM to invoke, such as the ones that we've been looking at here, and skills that you personally want to invoke. You can invoke a skill in the same way that you invoke any command inside Claude. We can actually use the skills command to list all the available skills. You can see we've got some project skills here, and I've got a bunch of user skills up here, and we can press Escape to close this. To invoke a skill, all we've got to do is use one of them; so let's just invoke the better-sqlite3 rebuild skill. I just pressed Return here to kick it off. And it thinks it's a little bit weird kind of doing it in this conversation, but it sort of figures out what's going on, and it's going to run pnpm rebuild. Now, the difference between a user-invoked skill and an LLM-invoked skill is basically what you do with this description, because this description is a description to the agent of when to use this skill.
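To make that concrete, here's roughly what the pnpm skill's skill.md looks like, reconstructed from what's shown on screen (the exact wording in the repo may differ):

```markdown
---
name: pnpm-not-found
description: 'Fix "pnpm: command not found" errors by enabling Corepack. Use when pnpm cannot be found, Corepack errors appear, or the package manager is missing.'
---

# Fixing pnpm not found

LLMs tend not to know about Corepack. If `pnpm` is missing, run
`corepack enable`, then retry the original command.
```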
It turns out that if, for whatever reason, we didn't want to allow this skill to be used by the LLM, we can actually just omit the name and description here. This is kind of like, in our diagram, just deleting this whole section here. Now this React Router skill would only be invocable by the user. The LLM would have no reason to invoke it, because it can't see a reason for doing so: there's no 'use when' description. So I find user-invoked skills a really cool mechanism for steering the LLM to my preferences without needing to add an extra instruction, like a description, to the context window. So that's what agent skills are. They are a mechanism for steering the LLM using progressive disclosure. And I think of there as being two groups: user-invoked skills and LLM-invoked skills. Nice work, and I will see you in the next one.
139        </video>
140      </lesson>
141      <lesson title="04.05-a-skill-for-writing-skills">
142        <video title="Problem">
Now that we understand what skills are, let's actually go and write our own. And we're going to be doing this in my recommended way of writing skills, which is using a skill to write skills. I've added this write-a-skill skill to the repo here, which basically walks through the process for writing a skill and how I recommend you write one. This comes from my personal repository of skills, which you can install using the command below. What I'm going to get you to do is create a skill to migrate from one library to another. Specifically, we are going to be migrating from Zod to a library called Valibot. Now, usually what I do is just plan this migration using plan mode, and I would then probably execute it in a single session; it's not that big of a job. But let's imagine that this is only one repo in, let's say, a hundred repos that our organisation owns, and we want to move everything from Zod to Valibot. So we want a reusable skill that we can share across our organization to move us from Zod to Valibot. We're going to create this skill in our project directory, and then we're going to start a new session with Claude to just run the skill. So the way you're gonna do this is you're gonna start a new session here, invoke the skill by finding the write-a-skill skill here, and give it a prompt to say what you want to do. In my case, I'm gonna say: write a skill which translates all of the code in this repo from Zod to Valibot. Go and research Valibot's API and the differences between it and Zod, and create a really in-depth skill that takes you from one to the other. We don't just want it for this repo; we also want it for the other repos that our organization owns. You are then going to run this and see what happens, see what kind of quality of skill is created, and then use it to translate this repo from Zod to Valibot. Nice work, and I will see you in the solution.
144        </video>
145        <video title="Solution">
All right, let's run this and see what we get. As we can see, it's gone into a research mode where it's going to research Valibot's API and examine the Zod usage patterns in this repo in parallel. So it's kicked off a task here where it's going to do a bunch of different web searches. I'm going to allow it to do any web search it wants to. It's then requesting to fetch individual pages here. This one looked particularly high value: 'Migrate from Zod', yes, I think we'll definitely need that. So it really went pretty crazy with the Valibot API. It spent six minutes researching it. Damn. It's now going ahead and exploring the Zod usage in the repo. I sometimes get this bug from Claude Code where it says it's going to do something in parallel, but actually the tasks are done sequentially, which is very annoying because it just means more waiting around. Okay, it has now decided to make the skill directory. It is creating it within .claude/skills/zod-to-valibot. And it is kicking out a pretty big file here, which I'm just going to accept unseen for now, until we get in there. So where is it? Aha, I'm actually going to interrupt it, because it's actually put it inside my user .claude directory, and I actually want it local to this project. So it's now asking to run a command which, it looks like, is going to move it; so yes, that's fine. And it's going to create some extra files here for what looks like progressive disclosure. That's nice. While that's going on, let's take a look at it. Here's the zod-to-valibot file; let me bump it up a bit so you can see. So first of all, we can see the name and the metadata are up here. It says: migrate TypeScript codebases from Zod to Valibot. Use when the user mentions Valibot, wants to replace Zod, migrate validation schemas, or reduce bundle size with tree-shakable validation. Now, this description reads to me like it's a skill that the LLM is supposed to invoke, and it's giving the LLM rules for when it should invoke it. For me, I think of this as a user-invoked skill. So I don't see a reason to provide a description here, because we never really want the LLM to use it itself. It's a command that we always run. So I'll delete the front matter there; that makes most sense. Oh, very cool: we've got install Valibot here, npm install valibot; try the automated codemod first, then review the output for errors. That's really nice. Fix whatever the codemod missed, and use this progressively disclosed file, reference.md. Oh, that's nice. Migrate any validation wrappers, that's really cool. Update all of the imports. Verify by running the type checker, running the tests, and removing Zod from the dependencies. And then it's got a list of critical gotchas. That's so cool. Let's take a quick look inside reference.md here, where is it, just here. And we can see this looks like a really complete API reference. Wow, look at all this stuff in here. I mean, it's super duper in-depth. This is again a benefit of progressive disclosure here, because we can just stick all of this stuff in and it won't be seen until it's needed. We could probably even progressively disclose this further, by putting more of these groupings into further files referenced from here, kind of like a traditional documentation site. Let's also look at examples here. Very nice: just a whole load of different examples that go from Zod to Valibot. Okay, let's try running this now. I'm going to cancel out of this.
I'm going to run Claude again, and I'm simply going to run /zod-to-valibot, and let's see what happens. Let me bump this up a little bit so you can see it, and zoom us in here. So we can see again it's kicked off an explore phase, which is nice and healthy; that's what you want. And it now says it has a full picture of the migration. Let us start executing it. Okay, so it wants to install Valibot, which is a good start. I'm not feeling the need to particularly babysit this here; I'm fairly happy just to let it run. And in anticipation, I'm going to turn on accept edits using Shift+Tab, to make sure that all of the edits go through. Oh dear, I have some concerns about our context window usage here. We are creeping up to 60%. I'm hopeful that because the changes are relatively simple, we won't need to be too scared here; i.e. we can spend some time in the dumb zone, but because the work is so simple, we don't necessarily need to be too scared. We can even see it's using some sub-agents here, running three task agents at once to migrate a bunch of stuff in parallel. I'm a bit scared, too, that it's doing all of this without referencing the feedback loops at all; it hasn't run any types or tests yet. Certainly if I were doing this as a human developer, I would want to migrate a bit, then run the types and tests, then migrate another bit, and that kind of thing. So this whole one-big-chunk thing was maybe a bad idea. Okay, all 19 route files migrated. Now it's asking to remove Zod, which I'm fine with. And it's now running the type checker on this too. It looks like there's a little bit of an error in here. This is where I start to get really worried about the context window, because if there's any debugging or fixing to do here, we are at 67% already, and that is very scary. That's when I'm anticipating seeing some `any`s creep in. So at this point, I'm really watching the AI like a hawk to make sure that the code it's producing is okay. Okay, phew: it ran tsc again and there was no output, meaning that the type checking is passing. So now it's asking to run the test suite. Let's give that a go. All right, sweet relief: the 288 tests pass and the type checker is clean. Let's go and see what it did, because this looks like a huge amount of changes. Wow, look at all this. Here we can see a nice simple schema with z.object; let's go into the file so we can see what's changed. It's been changed into v.object with v.pipe, v.string and v.minLength. We have a discriminated union here which has been transformed into a v.variant. I would not have known how to migrate that personally, so it's pretty amazing that the skill knew how to do it.
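To make that concrete, here is a rough sketch of the kind of before-and-after the skill produces. The schema itself is hypothetical, but the z.object-to-v.object/v.pipe and discriminatedUnion-to-v.variant mappings are the ones we just saw in the diff:

```ts
import * as v from "valibot";

// Before (Zod), for comparison:
// const UserSchema = z.object({
//   name: z.string().min(3),
//   contact: z.discriminatedUnion("type", [
//     z.object({ type: z.literal("email"), address: z.string().email() }),
//     z.object({ type: z.literal("phone"), number: z.string() }),
//   ]),
// });

// After (Valibot): method chains become v.pipe(...), and
// discriminated unions become v.variant on the shared key.
const UserSchema = v.object({
  name: v.pipe(v.string(), v.minLength(3)),
  contact: v.variant("type", [
    v.object({
      type: v.literal("email"),
      address: v.pipe(v.string(), v.email()),
    }),
    v.object({ type: v.literal("phone"), number: v.string() }),
  ]),
});

// v.parse throws on invalid input, much like Zod's .parse:
const user = v.parse(UserSchema, {
  name: "Ada",
  contact: { type: "phone", number: "555-0100" },
});
```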
If you know Valibot like I do, then you can go through some of the files like I just did, look at them, and sort of sense-check them, make sure they look okay. Otherwise, I'm gonna do a little bit of QA on the site to make sure things are working fine. I mean, the loaders seem to be working okay, which is nice. We can still rate the course. We can do all the stuff we did before. You can feel free to QA a bit more in depth than I have, but this looks okay to me, especially since the types and the tests are both passing. So just like that, not only have we migrated our entire repo, we've got a reusable skill that we can take and apply to any of the other repos in our setup. This is, of course, just one use case for skills. They are incredibly powerful for powering up our AI and steering it in ways that match our preferences better. We've now seen both skills that I've used to help it recover from errors, and a user-invoked skill where we can transform its behavior and make it do a certain sequence of moves. And of course, feel free to nab that write-a-skill skill to write some skills of your own. Nice work, and I'll see you in the next one.
147        </video>
148      </lesson>
149      <lesson title="04.06-automatic-memory" name="Claude Code's Automatic Memory">
150        <description>Describe the mechanism by which Claude code adds automatic memories. Purely a knowledge-based lesson.</description>
151        <video title="Explainer">
152           So far we've been talking about skills, and we've been talking about CLAUDE.md, but we haven't talked about Claude's automatic memory function. This shipped relatively recently at the time of recording, and what it does is allow Claude to steer itself over time. We can discover this file by running a fresh instance of Claude and then using the /memory command. When we press return on this, we can see that we have user memory (this is your user CLAUDE.md), we have the project memory, which is checked in locally as CLAUDE.md, and then we also have an option to open the auto-memory folder, which is what we're going to run. Now, I've opened this in VS Code, and we can see that there are actually no files in this directory. So on this particular project, Claude isn't adding any extra memory files. However, I just ran this for a project that I actually work on for my job, and on that one it did surface a memory.md file. So we can see that for this project it has written its own steering documentation. It's pretty arcane and specific, honestly: it has some stuff about using npm install --force instead of npm --legacy-peer-deps. And then it does have some stuff on testing patterns, which appears useful: Effect tests use @effect/vitest with it.effect, DB tests use PGlite, vi.mock is hoisted. So this will be put into the context alongside your CLAUDE.md file. Now, this is still relatively new, so I don't honestly know how to feel about it yet. However, I would say that you should probably go in and check on this file every so often, just to see that it's not adding some weird crud in here. For instance, I don't think my LLM needs the hint that vi.mock is hoisted. I don't think it needs this thing about DB tests using PGlite; I think it can figure that out. And actually, push schema has already been deprecated in my repo, so I want to get rid of that; and soon we're going to be moving off @effect/platform-node anyway. So what I'd recommend you do is, every couple of weeks, check inside this memory.md file and see if there's stuff that might conflict with things you actually want your LLM to do. The worst thing that could happen is that this goes out of date with the actual state of your repo, which is, by the way, the thing I always worry about when I see these kinds of memory-style systems. So there we go: that's Claude's automatic memory, how to find it, and how to edit it. Nice work, and I'll see you in the next one.
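A footnote on that memory entry: "vi.mock is hoisted" refers to real Vitest behaviour, and a minimal sketch (with a hypothetical module and function) shows what the note is recording:

```ts
import { it, expect, vi } from "vitest";
import { fetchUser } from "./user-service"; // hypothetical module

// Vitest hoists this vi.mock call above the imports at transform
// time, so the factory applies even though it is written after the
// import. That is the gotcha the memory note is about.
vi.mock("./user-service", () => ({
  fetchUser: vi.fn(async () => ({ id: "1", name: "Test User" })),
}));

it("sees the mocked module", async () => {
  const user = await fetchUser();
  expect(user.name).toBe("Test User");
});
```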
153        </video>
154      </lesson>
155    </section>
156    <section title="05-day-3-planning">
157      <lesson title="05.01-how-to-tackle-massive-tasks">
158        <video title="Explainer">
159           Now, so far in the course, you've probably had, or been infected by, my context window paranoia. I'm constantly thinking about the context, constantly thinking about the smart and the dumb zone. With the current generation of LLM capabilities, what I advocate for is staying within the early part of the context window. And this is okay for tasks that can easily fit into the context window: small features and bug fixes. But what happens when you try to do something much, much larger? Something that bursts its way into the dumb zone, like the refactor that we did in the previous section, or something where it's just obvious up ahead that you're not going to be able to fit it into even a single context window, let alone the smart part of one. In other words, what do you do when you're faced with a task that's just too big? Well, you do it the way that devs have been doing it for decades: you take the big task and break it down into small tasks. That way we can do all of the work for this big task in the smart zone of the context. But the question then becomes: what kind of planning do you need to do to get this to work? Because so far our plans have only lasted the duration of a single context window. We've not considered what it might look like to have some kind of document, or set of documents, that spans multiple context windows. Fortunately, the community has kind of coalesced around a common set of patterns, specifically two documents in particular. The first document we need is some kind of description of the destination, the place that we're heading, because if we don't know where we're heading, then how on earth are we going to complete the task? Some people call this the spec, the specification for what we're building, and some people call this a PRD, a product requirements document. I'm using PRD here, but the name doesn't really matter. Now, writing really good PRDs is a really important skill by itself, and we're gonna be covering that in this section. Hammering out exactly what you want to build is super important, and it scales to really, really big tasks. Or you can write PRDs for relatively small, manageable tasks. And of course, this is where you as the human get to impose your taste on the LLM and hammer out exactly what you're building. It has exactly the same set of benefits as plan mode, which is that you understand the project more and the LLM understands the project more, except it can scale to much larger builds than just one context window. But there's a problem: if you only specify the destination, how does the LLM, or your system, know how to break it down into small chunks? In other words, without a description of the journey, the LLM might just try to tackle this all in one big context window. This is why you will often specify a plan.md file next to the PRD. In other words, you will use one session to create the massive destination, and then you might use the same session, or you might clear the context, and then create the journey as well. You break down the product requirements document into a set of phases, where you basically say: okay, in this phase we'll do phase one, in this phase we'll do phase two, then phase three, phase four, phase five. And those really are just the three ingredients in your prompt: you're just saying we're gonna do phase N, then we pass in the PRD, and we pass in the entire plan.
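Concretely, that prompt ends up looking something like this (the exact file names here are an assumption; we create the real ones in the next few lessons):

```
@PRD/instructor-analytics-dashboard.md
@plans/instructor-analytics-dashboard.md

Do phase 1.
```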
Passing in the whole plan is really useful because it means that these phases don't step on each other's toes. The LLM can see: okay, we're doing that bit in phase three, so I'm not going to do that bit in phase two. So learning how to build great PRDs, learning how to build great plans, and learning how to construct this prompt and best navigate your way through these massive builds: that is the topic for today's section. And it is a banger. Good luck, I'll see you in the next one.
160        </video>
161      </lesson>
162      <lesson title="05.02-write-great-prds-with-this-skill">
163        <video title="Problem">
164           By now we should understand why we want to use a PRD. A PRD is a way of describing the destination for a long, multi-context-window session. And the question probably immediately comes to you: how do I write one? What is the best format for the PRD? What should I include in it? And how do I even put one together? Now, a lot of these are open questions. You will have different tastes about what you want to include in your PRD. This is a great thing to be talking about within this cohort and in the discussions: what should you include, and what should you not include? But what I can give you is my best template that I've put together so far, one that I've used a bunch to create these multi-agent sessions and to ship a ton of features, and I have encoded it into a skill. And here is the skill, inside .claude/skills/write-a-prd/SKILL.md. You can break it down into essentially two sections. The first one is a list of steps that the agent is going to complete when the skill is invoked. We first ask the user for a long, detailed description of the problem. We then get it to explore the repo, so we manually say: do an exploration step. And then, and this is a really important one, we say: interview the user relentlessly about every aspect of this plan. We want it to really grill us about this plan. Walk down each branch of the design tree, resolving dependencies between decisions one by one. So it's going to aggressively question us about this design until we understand what's in the end state. And then, once it has a complete understanding of the problem, it uses the template below to write the PRD, which should be written to a local file. The second section, down below, is the PRD template, and this is essentially a problem statement, a solution, and a long list of user stories. There are a couple more sections at the bottom: implementation decisions, what's out of scope, and any further notes. But we'll look at this template a bit more in the solution. Now, what you're gonna do in this exercise is use this skill to write a PRD for a feature. This PRD is going to be for an instructor analytics dashboard. I'm being quite intentionally vague with my requirements here, because I want you to really hammer it out with the write-a-prd skill. Instructors might need to be able to see enrollments; they might need to see how much money they've earned in revenue from their courses. They might want to see completion rates or quiz pass rates as well, and they may want to see which lessons resulted in drop-off, or where their students stopped. So what I'm gonna do is open up a fresh Claude session and say "write a PRD", using tab to complete it so I don't submit early. Then I'm going to describe vaguely what I want to the agent, and the write-a-prd skill should take it from there. While you're walking through the skill, think about what this process gives you, whether you like it, and what you would look to add to it. And when you've got the finished PRD, take a review of it, see if you would like to add anything, and note down any thoughts you have about the whole process. You'll also be able to watch my solution to see how I walk through this, interrogate myself, figure out exactly what I want to build, and then review the resulting assets. So good luck, and I will see you in the solution.
165        </video>
166        <video title="Solution">
167           Okay, let's start by dictating out what I actually want from this PRD. I want an instructor analytics dashboard, primarily focused on revenue, and I would like to hammer out with you exactly what that entails and what it looks like. It should obviously only be accessible to instructors, although admins should be able to see it too. Instructors can only view analytics on their own courses. Instructors should probably have a single analytics dashboard which displays the information for all of their courses. Okay, let's go in with that and see what happens. And predictably, it's kicked off with an explore task. I'm just going to let it complete, so it can fill its context with everything in the repo that it needs. All right, it's finished exploring; now the questioning begins. Okay, the first question is: what are the key revenue metrics you want displayed on the dashboard? Total lifetime revenue, revenue per course, revenue over time, average order value, PPP discount impact. Whoa, whoa, whoa, whoa. Team versus individual breakdown, geographic breakdown by country. Wow. I think I'm gonna go with option two here: revenue totals plus revenue over time. This is a good question about time filters and the granularity of the revenue over time. I think fixed periods like last 7 days, last 30 days and last 12 months look good. Should the dashboard include enrollment metrics alongside revenue (total enrollments, enrollment rate over time), or should it be purely revenue focused? I think probably revenue plus enrollments plus ratings, because we want to see student satisfaction data. That's really important. Even in that first round of questions, I can feel the thing coming to life. It really is crazy how beneficial these conversations are, just really rubber-ducking with a smart agent, and one that knows your codebase too. Very cool. For the dashboard layout, how should the per-course data be presented? It shows all the instructor's courses on one page; how do you want to break that down? I was imagining, when I designed this, summary cards plus a table: some summary cards at the very top and then a detailed table with all of the breakdowns. I think that makes most sense, so I'm going to pick it. For revenue over time, we probably do need a line chart, so yeah, we'll probably pull in a charting library. For ratings, I think average plus count only makes sense. And we're done with that set of questions. That's good. More questions? Oh yeah. Okay, so fixed time periods: I think seven days, 30 days, 12 months. This is for filtering the revenue trend line, yep; that is an important one. For the admin view: should admins see a platform-wide analytics dashboard covering all instructors, or should they just be able to access any individual instructor's dashboard? A platform-wide analytics dashboard would be a massive increase in scope. We probably could do that as part of this PRD, but it wasn't quite what I was envisaging. I want the admins to be able to view an instructor's dashboard just so they can see any issues with it, and so they can act as the instructor for debugging purposes. So I think for now, let's just say they can view any individual instructor's dashboard. Should the summary cards at the top of the dashboard also respect the selected time period, or should they always show the all-time totals? I use Stripe quite a lot, and I have a feeling that Stripe respects the time filter.
So I'm going to go with what Stripe does in this case. Again, these are not easy decisions, and the LLM is asking us all of them right now. Oh my god, there's even more. Oh no, I thought it was over. Some of these questions, especially the UI ones, are really hard to answer, actually. And this is something that comes out during this planning stage: it's not necessarily simple to answer these in text. What I would really like is a prototype to walk through; that would be the highest-fidelity way to do it. So really, you're having to use the power of your imagination here. Okay, I'm gonna go for the top one, which is just three cards at the top: total revenue, total enrollments, and average rating. Which columns should appear in the per-course table below the summary cards? It's giving me the option between basic and detailed. I'm gonna go detailed, just to really max out that area. And then, how should admins access an instructor's analytics? They need a way to navigate to any instructor's dashboard. I think a link from the admin users page makes most sense there. Okay, we're getting through it. A few final questions on the chart details and some edge cases. You know, this is like a brutal grilling it's giving us here, and it's so unbelievably useful, because all of this conversation is going to end up inside the PRD. For the line chart, x-axis granularity: the last seven days should show daily points, but what about the other periods? Let's auto-scale it, so 12 months does monthly, all time is monthly, and 30 days is daily. That makes sense. The per-course table should also be filtered by the selected period, I think. This is a really good implementation question: should the time period filter be stored in the URL as a search param, or as client-side state? URL params mean the page is linkable and bookmarkable, which I love, so definitely a URL search param.
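As a rough sketch of what that decision looks like in a React Router loader (the route, param name and default here are assumptions, not the repo's actual code):

```ts
import type { LoaderFunctionArgs } from "react-router";

// Keeping the selected period in a search param (?period=30d) makes
// the dashboard linkable and bookmarkable, unlike client-side state.
const VALID_PERIODS = ["7d", "30d", "12m", "all"] as const;
type Period = (typeof VALID_PERIODS)[number];

export async function loader({ request }: LoaderFunctionArgs) {
  const url = new URL(request.url);
  const raw = url.searchParams.get("period");
  // Fall back to a sensible default when the param is missing or invalid.
  const period: Period = (VALID_PERIODS as readonly string[]).includes(raw ?? "")
    ? (raw as Period)
    : "30d";
  return { period };
}
```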
If an instructor has no courses or no purchases yet, I think an empty-state message makes sense. Again, the empty state of this dashboard is not something I even considered going in, so having a question about it is really useful. Just a couple more details to lock down. Yeah, come on, let me go, let me go. When an admin clicks View Analytics for an instructor on the admin users page, what should the URL look like? That's a great point. I guess this doesn't really matter; I'll just pick this one, I think. The line chart should show revenue only. Should the per-course table be sortable by clicking column headers? I think you probably do need it to be sortable, yeah. Notice all of these questions, and we're only at 15.6% context. You know, this is not a context-heavy operation, really. I suppose it can become one if it does a lot of exploration, or needs multiple rounds of exploration. Okay, I had to skip forward in time a little bit, but it's now ready to write the local file. So I'm just gonna accept this edit, and let's go and review the thing that we've just created. Now, here's a little secret for you: I very rarely actually go and review the PRDs. What I do is trust the conversation that I've had with the LLM; I kind of get the sense that we share the same design idea, that we share a concept for what we're creating. And so really, all I'm doing when I review this, if I give it a cursory glance, is checking that that understanding is correct. And I haven't really found a situation yet where it isn't, so all I'm doing here is double-checking the LLM's ability to summarise a conversation, which I know it can do really well. But for the sake of this, and for the sake of understanding what this PRD format does and how we might improve it, let's actually go and review it now. So, this problem statement is really good: there's no way for an instructor to see total revenue, revenue trends over time, or per-course breakdowns, and this makes it impossible for instructors to make informed decisions about pricing, marketing and course development. The problem statement gives an overall why, which is really important for the implementation, because if an agent has decisions to make while it's implementing, it can go back to the problem statement and understand why this feature even exists. Then we have: a dedicated analytics dashboard for instructors that provides a revenue-focused overview, blah, blah, blah. I think we understand the solution here, and this is a nice summary of it. And then this is the real juice: we have a numbered list of 22 user stories in this PRD. Writing user stories is something you may have done as a professional developer, and AI is really, really good at it. You can think of these as acceptance criteria for what the final thing should look like. This is where we describe in detail what our destination is. But there are many, many different ways to frame and phrase this. For instance, the Gherkin language under the Cucumber project does this quite well. It's a structured specification language for describing how features should behave, and since AI knows it quite well, you could use Gherkin here to specify exactly how these should work. We also number these user stories so that we can later reference those numbers in the plan as a shorthand, which I find works really, really well. For instance, number two is: I want to see my total enrollment count across all courses for a selected time period, so that I can gauge student interest.
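To show what I mean by stories doubling as acceptance criteria, here is a sketch of that story expressed as a test. The service and its API are hypothetical; the point is just that a numbered story translates directly into a check:

```ts
import { describe, it, expect } from "vitest";
// Hypothetical service module, for illustration only.
import { getInstructorSummary } from "./analytics-service";

// User story 2: "I want to see my total enrollment count across all
// courses for a selected time period so that I can gauge student interest."
describe("instructor analytics summary", () => {
  it("returns a total enrollment count for the selected period", async () => {
    const summary = await getInstructorSummary({
      instructorId: "instructor-1",
      period: "30d",
    });
    expect(summary.totalEnrollments).toBeGreaterThanOrEqual(0);
  });
});
```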
So for each story we capture the role (who's doing it), we capture what they want to do, and then we capture why they want to do it. Again, tying it back to the why, which is very important. Understanding why each decision was made is really important for anything implementing this, because if it has questions while implementing, it can go back to the why and hopefully use that to reason a bit better. Then, if we fold up the user stories, we can see a list of modules, first of all: a new analytics service, a couple of new routes for the instructor analytics, and a shared analytics dashboard component. Note that it's not providing implementation suggestions here; it's just saying these are the high-level modules we're gonna create. And then we've got a list of implementation decisions down below, or rather, technical decisions. This is really what we hashed out in the conversation with the LLM: table sorting is client-side; the per-course table respects the time filter; chart granularity is daily data points for 30 days and seven days, monthly data points for 12 months and all time. Then below that, what have we got? Some schema changes, and some testing decisions as well, which I really like; we're specifically saying what to test with our unit tests too. We've got some prior art. Then we've got some out-of-scope stuff, and this is actually really detailed. This is basically the negative space of all the positive decisions we made, where we said: okay, we do this and not this. So PPP discount impact analysis, team versus individual purchase breakdown: all of this stuff is ideas that you could take further and have their own PRDs created later. And finally, I just like having this further notes section at the bottom, for any stuff that doesn't fit into the sections above. And so there we go: we have successfully described the destination for this analytics dashboard. Hopefully, if you walked through a similar process to me, you ended up with a similarly detailed document. You might have slightly different emphases, and the sections might be slightly different lengths, but it will probably follow a template like this. And like me, you are probably now itching to see this come to life and actually build something from it. So nice work, I will see you in the next one.
168        </video>
169      </lesson>
170      <lesson title="05.03-split-features-across-multiple-context-windows-with-multi-phase-plans">
171        <video title="Problem">
172           All right, so at this point we've created our PRD, which is our destination, but we've not yet created the plan. I'm going to argue throughout this section that there's a good and a bad way to create this plan.md, or perhaps, let's say, a good way and a naive way. And we're gonna start by doing it the naive way. I would like you, inside the repo, to open up a new Claude Code session. You're then going to reference the PRD for the instructor analytics dashboard, using the @ symbol to pull it directly into context. And then I'm gonna say: turn this into a multi-phase plan and save it as a local Markdown file. That's all we're doing. I'm not gonna specify anything else; that's all it is, and we're gonna run it. As this is running, what I want you to do is note down how you might improve this, and note down the thing that it creates, because we're probably going to get quite divergent responses. And you should probably do some thinking about how big you want the tasks and the phases to be, because task sizing, and what you do within that context window, is really important. So walk it through, see what it creates, and I will see you in the solution.
173        </video>
174        <video title="Solution">
175           Okay, incredibly, it is actually already ready to go. What it did while I was recording the problem video is an explore on the codebase, and then it just wrote down plans/instructor-analytics-dashboard.md. The first thing I notice here is how quick it is to go and make changes and eagerly produce an asset, before even grilling me about it. With the write-a-prd skill, we were really spoiled by the fact that we had a really enriching conversation with the LLM, whereas this just spits something out immediately. But I'm gonna see what it's done, to see how good it is. We can see, first of all, that it's put it inside the plans directory. I'm just going to collapse all of these phases so that we can understand what it's doing. Okay, the thing I'm noticing is that it's gone for four phases, which is nice, but it's doing it in quite a horizontal style. The four phases are: first it creates the analytics service separately, meaning every single thing we need from the analytics service will be done in one phase, and all of the back-end architecture will be done there. Next, it creates a shared dashboard component and installs Recharts. But then it's not until phase three that it actually uses that UI in a route, and then the admin analytics route and the users page link are phase four. Inside each of these phases, okay, it's actually got all of the individual steps mapped out, so this is almost as if it's giving us a pure implementation guide, a user manual, for all of this stuff. And what I notice is that I'm a little bit scared by all of this implementation information actually inside the steps. The reason I'm scared is that I'm a developer by trade, and I know that you can't get all of the designs right before you actually go in and implement. Phase one you can maybe do, because you're just making changes to the codebase as it is right now, but if we look inside phase three, then yeah, this might end up referencing functions that don't end up existing in the implementation. So this makes me a little bit nervous too. I do like the acceptance criteria; that's really nice. I also don't like that it's not referencing the user stories in the PRD, because those user stories were ones that we sweated over, right? So all of these here, I would like to see actually referenced inside the plan itself, so that we can link them back to the why. But anyway, let's go back and see more of what it's up to. If we zoom down to the last one here: is there anything egregious inside? Again, it's still referencing previously created functions, when we're not even sure they're going to exist by that point. This is especially true if, during phase one, we step in as the human and decide: no, I actually don't want to implement it that way. In other words, these specific function references will go stale very, very quickly if we decide to change anything about the implementation in earlier phases. However, I do think it's got the sizing of the tasks quite nice. If I zoom back up to the top and fold them all up again, we can see roughly what we're up to. None of these look too large, except possibly phase one, where the analytics service and tests might be harder than expected. But I suppose, since we're only querying the database, maybe it won't be that tricky.
So we have created a relatively naive plan here. It's doing all of its work in horizontal layers, and we'll explain why that's bad in the next couple of videos. The plan is a bit too specific about the implementation; in other words, it's naming functions which might not exist by the time we get to the later phases. However, it is fairly nicely sized, and maybe we could even condense phases three and four, since that's going to be a lot of similar work. Hopefully you reached some of the same conclusions as me, and I would love to hear your thoughts in the Discord if you noticed anything different between how yours turned out and how mine did. Nice work, and I will see you in the next one.
176        </video>
177      </lesson>
178      <lesson title="05.04-what-are-tracer-bullets">
179        <video title="Explainer">
180           I'm really excited about this one, because this lesson gives me a chance to talk about one of my favourite discoveries from working with AI over the past few months. And that is that with two words, two well-chosen words, you can completely change the way that AI prioritizes tasks, and you can change the way that it plans. These two words come from the book The Pragmatic Programmer, by David Thomas and Andrew Hunt. This is a 20-year-old engineering text, and it has a ton of awesome insights, so I definitely recommend you pick up a copy. And the two words that we're gonna steal from it are: tracer bullets. Now, the first idea behind tracer bullets is that systems have layers. Layers might be different deployable units, like we have here: a database, an API, and a front end. Or, more likely, they're different services within the application, such as we have in our project. Either way, these layers are things that you need to integrate in order to build something that works. You might have a tiny feature which is front-end only, but more likely you're going to have features that touch the front end, the API, and the database, if they're going to be anything worth using. Now, what I have noticed, and in fact what we noticed in our previous exercise, is that LLMs tend to code horizontally. When they're designing the plan, they tend to think about it in terms of layers. If we have a look at the plan, we have the analytics service plus tests. Then phase two is an entire component. Then phase three is an entire route, and phase four is another entire route. This is kind of crazy, because we're not getting any feedback on whether the whole thing works until we reach one of the later phases. Sure, we've got some tests written for the analytics service in phase one, but those tests are kind of meaningless if the design of the service doesn't match up to what we need in phase three or four. We need early feedback, and we need feedback as often as possible. And this idea, early feedback and feedback as often as possible, is what comes from The Pragmatic Programmer and tracer bullets. To explain the metaphor: tracer bullets were what anti-aircraft gunners in World War II used to load into their guns. Every six rounds or so, instead of an ordinary bullet meant to hurt the planes, the gun would fire a special round that burned with a phosphorescent glow, and so you would see these streaks of tracer fire through the sky. It meant that the gunners could look up, see where their bullets were actually going, and adjust accordingly. In code, this looks like this: instead of phases that each span an entire layer, we have phases that go through every layer. Another name for this concept is vertical slices, where instead of coding the horizontal stuff, we make sure each phase in the plan touches every layer.
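A minimal sketch of what a tracer-bullet phase might look like in a React Router app; all of the names here are hypothetical, and each layer does the simplest thing that could possibly work:

```tsx
// Layer 1: service. The real version would query the database; the
// tracer bullet just needs one real number flowing through.
async function getTotalRevenue(instructorId: string): Promise<number> {
  return 0; // stubbed until the query is written
}

// Layer 2: route loader (the API surface).
export async function loader({ params }: { params: { instructorId: string } }) {
  return { totalRevenue: await getTotalRevenue(params.instructorId) };
}

// Layer 3: UI, a single summary card.
export function SummaryCard({ totalRevenue }: { totalRevenue: number }) {
  return <div>Total revenue: ${(totalRevenue / 100).toFixed(2)}</div>;
}
```

The point is not the code itself, but that one phase exercises the service, the route, and the UI together, so integration problems surface immediately.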
That way we, as the human, are able to go in, take a look, and provide feedback if needed. But it also means our design actually encompasses all the layers that we need, and so we're less likely to make mistakes in our architecture, because what we're building touches all the integration layers. And what makes this so cool to use with AI is that AI already knows what tracer bullets are; it's a relatively famous concept within software engineering. So all we need to do is say the words "use tracer bullets", maybe with a little bit of brief explanation, and the AI just gets it. I have found that this massively improves the way that AI builds software, and we're gonna use it throughout the rest of this course. Hopefully this clicks for you, but if you have any questions, then do ask them in the Discord. Nice work, and I will see you in the next one.
181        </video>
182      </lesson>
183      <lesson title="05.05-use-tracer-bullets-in-our-multi-phase-plan">
184        <video title="Problem">
185           All right, now we know about this golden tracer bullets approach, I want you to open up a new Claude session with zero context in it, and we're going to use a new skill that I just added to the repo, called prd-to-plan. I wrote this using my skill-writing skill, and it has copious mentions of tracer bullets in it. I would like you then, very simply, to go into this Claude Code instance and run /prd-to-plan, passing in the PRD and tabbing to auto-complete. We're going to run this skill, and we're going to see if the output is any better, or makes more sense, than the naive plan that we already created. Actually, as a bit of sanity checking, we should probably go inside the plans directory and delete the original plan first. To make this a fair test, we don't want the previous plan in the context for the LLM to read and be influenced by. Let's make this a clean run to test our new planning skill against the default setup for Claude. As you're going through this, I'd like you to notice the differences between this new plan and the previous plan, and see if this plan actually makes any improvements, or whether you could think about how to improve it further. Good luck, and I will see you in the solution.
186        </video>
187        <video title="Solution">
188           Okay, let's run it and see what happens. Predictably, it's doing an explore phase, which is fantastic. So it's exploring the codebase architecture, and we can even see that it's using the Sonnet 4.6 model, which is nice. I always love it when it uses a cheaper model for these explore phases, because it just seems like a good use of resources. So let's wait for it to complete. Okie dokie, it's now complete, and it's come back with a proposed phase breakdown. I've designed the skill so that it asks you for feedback on the phases before it commits to building the plan out. So phase one is a lot more interesting, and very, very meaty: we have an analytics service plus an instructor route plus summary cards. This is the definition of a vertical slice, right? We're doing the service plus the route plus some UI as well. We can see the way it's described that down below: phase one is the tracer bullet that wires up the full stack with the simplest possible data, three summary numbers. Each subsequent phase extends the service and dashboard incrementally. Now, that is exactly what I'm looking for. By the end of phase one, we will know if the whole thing works, essentially: whether we can connect the analytics service up to the instructor route, what weird caveats there are, and we will have discovered all of the weirdness to do with this feature. In other words, all of the unknown unknowns will have been flushed out. Now, I'm looking at this immediately, and I'm thinking five phases might be a bit too much. Phase one, I think, is packed in nicely; that's a good, meaty amount. If anything, it might be too big, especially if we encounter any issues, but I'm going to say it's okay. But I think we can group phase two and phase three together, since they're both UI concerns, and then maybe phase four and phase five can also be grouped together. They're definitely unrelated (admin access versus empty states), but they both touch the same area of the codebase, so it makes sense to do them in the same session, and they're both pretty small. So that is what I'm gonna say: group together phase two and phase three, and make another grouping for phase four and phase five, resulting in three phases total. Now, if I run that, it should give me a new plan. Okay, it's going ahead to write the plan file, so let's see what it cooks up. It has spat out plans/instructor-analytics-dashboard.md. Beautiful. And we can see it looks somewhat different from the previous plan. First of all, the phases down the bottom here are nicely laid out. We've got our three phases, and we can see a description of each phase: in other words, what to build, and the user stories they reference in the parent PRD. But we can see, too, that there's not a ton of implementation-like leakage inside here. The previous plan was almost an implementation, just in pseudocode, whereas this is a lot more just text, essentially: a description of the feature. Now, the reason for this is that in the skill itself I added some instructions to only make durable decisions that will definitely hold across all phases. If you look inside the prd-to-plan skill, we say: identify durable architectural decisions. Before slicing, identify high-level decisions that are unlikely to change throughout implementation.
In other words, we are planning what we can. You do want some implementation information in the plan itself, because it helps keep everything on rails and means that what comes out is more predictable. But what we're really saying here is: don't plan what you can't plan. Lots of stuff will only become apparent when you actually go in and implement it. So, to head back to the actual plan that we have: yeah, we can see the durable decisions. The instructor analytics lives at this route, in this particular file. In the schema, okay, there are no changes. In the service layer, there's a new analytics service. We're pulling in information about authentication (getting the current user ID) from the repo, and stuff like the dependencies that we're adding. Very nice. But of course, the proof of the plan is in the pudding; in other words, we should actually go and implement this to see how it works. And of course, your plan may look different to mine. So what did you notice about your plan that you liked, or maybe didn't like? Is the level of detail too high, or not high enough, do you think? Share it in the Discord; I would love to hear your thoughts. Nice work, and I will see you in the next one.
189        </video>
190      </lesson>
191      <lesson title="05.06-executing-our-multi-phase-plan">
192        <video title="Problem">
193           Okay, we now have our PRD inside PRD/instructor-analytics-dashboard, and we have our plan inside instructor-analytics-dashboard in the plans directory. Our job now is to open Claude Code, enter a new session, and prompt it like this: we pass in a couple of files, the PRD, of course, and then the plan, and finally we say: do phase one. As a final thing, I'm going to press Shift+Tab here to turn accept edits on. We're not doing plan mode here; we are just getting it to churn out some code immediately. Now, what I would like you to notice is what happens. Will it do an explore phase first? Probably, yes. What does the code look like? How is it doing? And once you reach the end of phase one, you get a choice. You can either clear the context and run the same prompt again, but say "do phase two" instead, or you can just go: okay, you've got all the information in your context, carry on to the next phase. If you're not sure, default to clearing the context at the end of each phase. I also recommend that you commit at the end of each phase too; that will give you a nice safe point to go back to, and that's what I'm going to be doing. Other than that, enjoy! This is a really, really fun way to build, because we've already done the legwork: we already know what's needed, we've specced it all out, we've got the plan, and now we just get to watch it roll down the hill. So good luck, and I will see you in the solution!
194        </video>
195        <video title="Solution">
196           Okay, let's run it. I'm assuming, of course, it's going to do an explore phase. Yep, there it is; as soon as I mention it, there it goes exploring. I'll check back in with you once it's complete. All righty, it has now completed the explore phase (it did that fairly quickly, in about a minute) and it's created a bunch of different tasks for itself. The first thing it's done is create the analytics service. I might just give this a cursory review. So we are up here inside the analytics service, in services. Let's check how big it made this and what it added. Yeah, we can see here that it's really not providing that many analytics. In other words, it's only put in enough features to satisfy the tracer bullet. When we finish, this is probably going to be a pretty big file, but for now Claude is happy to leave it small, because we specified it should be a tracer bullet. Fantastic. It's now running the tests too, which is good. So all 12 tests passed; now let me build the UI component. So it's only built 12 tests here. With the previous plan, it would have spent a lot of time building out the whole horizontal layer, but we now have just enough signal to know that, okay, we're on target, we can proceed. It's now building out the analytics dashboard, that's good, and now building out the route. We are approaching 32%, so we're doing pretty well at staying inside the smart zone, and we've covered a lot of ground. Okay, very cool: it's now given us a full description of what's been happening, which all looks good. I'm gonna say: commit this code. And now I can go in and actually QA this. So, it's looking pretty nice, if pretty sparse. I don't love that we've got "free" as the total revenue here; this is probably a bug from somewhere. We will look at how to do bug reports within these loops a bit later, but for now I'll just have to sit there and let it rankle. We are getting some money showing up at 12 months. And if I hide my face here, we can see that if we switch to Sarah Chen, who is the other instructor, then we see different information. The average rating is four, and she's got four enrollments, so that's looking great. Notice how quickly we're able to gain confidence in what we're building by building this tracer bullet early. Okay, I'm gonna call phase one complete, and now this is the decision I was talking about. 34.3% context... I mean, I would definitely clear the context at 40%, right? And maybe definitely at 35. So I think I am going to clear the context here and free it up for a new phase. I'm then just going to press up a couple of times until I get to my prompt with the plan and the PRD, and say "do phase two" instead of phase one. Now, of course, because we cleared, it's going to explore the phase one implementation again; it's gonna look at all the same stuff that we just cleared out of the context. That's what's annoying about clearing the context, right? We have to spend 30 seconds just catching up with what the phase one implementation actually was. Now, by the time we're actually going to implement, nice: we're only at 15.5% context, well inside the smart zone. So I think we've got the picture now as to what it does when it's implementing, so I'll join you once it's finished. Okay, we have now finished phase two. That's pretty cool.
We've got a new analytics service, a dashboard component, a route and some tests. And as we can see on the analytics dashboard, we now have a chart here, of sorts, and we have a per-course breakdown at the bottom. Now, we do actually have a showstopper bug here, which is that there are no lines on the chart. You can hover over the points, you know, just doing this or whatever, but you can't actually see the data. So I'm going to go back to Claude, and I'm gonna say: you can't see any lines on the chart; please add them. And I'm gonna run this and see what happens. Now, I feel confident doing this within the same context window, because we're only at about 30% here; we've got another 10% of context window to play with. If I felt that we were straying into the dumb zone, I would clear the context and commit the code, and then get it to start a new session based on just this bug, probably. Or I might batch a bunch of bug fixes together so that it does them all at once. Okay, it's proposed a fix, and if we go and take a look at it, we can see that it's actually managed to find a really good fix. So if we look at 30 days: yes, we've got a nice little line here. That's cool. Now, at this point I do have a lot of budget left, still, you know, 10 to 20% of context. I could do some more QA on this before I commit it, or I could just keep ploughing on with phase three and QA it all at the end. So I'm going to commit this, then clear, and then complete phase three. By the way, I do like using Claude Code to commit, because it writes really detailed commit messages that future Claude instances can go and read to understand what was done. Your commit history ends up being really, really important for Claude Code, because it's a really high-value signal: the code and the reason for the change are tied together really closely. So yeah, what's old is new again; commit messages are important again, all that. So I'm going to clear the context, because that commit worked fine, and then I'll press up and go and do phase three instead. And again, I will see you when it has completed this phase. Okay, all done. Here's a summary of what was implemented for phase three: admin access and empty states. We now have an admin route that enforces the admin user role and validates the instructor ID, okay. We've got a completed analytics service. Time to get QA-ing, I think. So I've gone to the application in the dev UI, and I've logged in as Alex Rivera, who is the admin. I've gone to the manage users page, and I can see that there's a View Analytics button next to the instructors. And look at this, that's really nice: I get a breadcrumb up here of all the places I can go, and I get to see exactly the same information as Sarah Chen does. If we look at Marcus Johnson here, I can view his analytics too. But if I change to, why don't we change to James Park here, then it says that only admins can access this page. I'm not really in love with this. I suppose this should probably be a 404 rather than a 401; in other words, we want to say "this page doesn't exist" instead of "this page does exist, and your attempt to hack in is not working". But either way, that's something that can go on a backlog later. I am happy enough with this to call it done, I think. So I'm gonna go back in, tell it to commit the code, and after that, I am done.
And what you probably notice here is how hands-off the implementation stage actually is.  There are certainly moments when we need to jump in and add our inputs and add our taste.  But for the most part, because we've written such a clear destination, and we've specified the journey...  the agent is basically just able to get on with it  and it understands where it's gonna go.  Now this approach scales to really, really big builds.  All we need is a bit of a longer, more in-depth PRD  and more time spent thinking about the journey there.  But once those two documents are in place and we've taken the time to think up a really  good plan that involves tracer bullets, early feedback, then we can really start cooking.  Nice work, and I will see you in the next one.
197        </video>
198      </lesson>
199    </section>
200    <section title="06-day-4-feedback-loops">
201      <lesson title="06.01-is-code-cheap">
202        <video title="Explainer">
203           I've got a thought experiment for you. There are a lot of people out there saying that the AI age is a new paradigm for software development: that we need to chuck out all of our assumptions about how code works and what good codebases look like, because agents are so fundamentally different from human developers that they need entirely different guardrails, entirely different setups. In other words, the old rules are for boomers, and now we need to find new ways of building software. And a lot of this comes down to the mantra that code is cheap. When you have an AI agent that can churn out tons of code, the production of code is now so much cheaper, so we don't need to worry that much about bugs, because the AI will be able to blast through and create code faster than we've ever seen before. And so the argument goes that software quality matters less, because you can always churn your way out of software quality issues: if you've got a bad codebase, the AI will just be able to fix it by blasting out more code. Now, I think this is incorrect. I think that software quality matters more now, and that AI agents are more sensitive than humans to software quality. And this massively affects how we should design our codebases and our systems to take best advantage of AI. But first, let me define what I mean by software quality: is the codebase easy to change? This definition comes from the book A Philosophy of Software Design, by John Ousterhout. I've probably absolutely murdered his name there, so sorry, John. But John makes the argument that if a codebase is easier to change, we can think of it as a better codebase. In other words: is related information grouped together in the codebase? How likely is it that a change you make will have unintended consequences that ripple outwards? And what are the feedback loops like? If you make a change in one place and accidentally cause a bug somewhere else, how likely are you to find out before you ship to production? Now, in a perfect world, we would always have perfect codebases. Our codebases would never be hard to change; they would always be well designed and really easy to change, with great feedback loops that let us catch bugs. But the truth is that most of the time, when people touch a codebase, they are making it worse, not better. If you thoughtlessly make a change in a codebase to fix a bug or build a new feature, and you're not thinking about continually improving the codebase or keeping it in this easy-to-change state, then you are probably making that codebase worse. This concept comes from The Pragmatic Programmer, and it's called software entropy: the idea that software tends towards getting worse, not better. And for anyone who has worked on any reasonable size of codebase, this will ring true. Software developers don't work in a vacuum. There are pressures on you, there are time constraints, there are times when you just need to ship something immediately and you don't really care about the long-term consequences. Or there are times when you're just having a bad day and you need to get your work done.
And on those days, you are probably going to be having a negative effect on the quality of your codebase, even though you're shipping features. We can think of this kind of like a kludge meter, slowly rising on every single commit. Now, you can intervene and do refactors and redesigns that help stop the kludge, such as a commit that introduces a new part of the test suite, or refactors the code to make it easier to change. But every commit that is not this thoughtful will be increasing your software entropy again. Now, guess what: in the AI age, AI massively increases the number of commits you can do. But currently, it does not do a great job at these entropy-saving commits. Figuring out a better software design to prevent software entropy is one of the hardest tasks a software engineer can do. It's harder than building new features. It's harder than fixing bugs. So it makes sense that AI is not very good at this yet. And in its current state, AI is a software entropy accelerator. This is especially true when you think of the three things that go into any AI coding session. You have, of course, the initial prompt: the thing you're trying to get the agent to do. You then have any steering mechanisms you're using, like skills or CLAUDE.md. But the biggest one, the most important one, is your codebase. Agents are much more likely to copy your codebase than any steering that you do, because the codebase is the source of truth. And so if there are bad patterns in your codebase, even if you're explicitly warning about them in your steering, it will tend to copy the codebase by default. This means that AI is maybe even more susceptible to kludge than human developers, because it will blindly copy whatever's in your codebase without applying its own taste. And an AI in a bad codebase has access to way fewer feedback loops, so it won't know when it's writing bad code. Humans, because they can develop memory about a bad codebase, can kind of survive it and work around it. Because they understand what weird patterns a codebase has, they can produce fewer bugs and higher-quality features within it. But AI, remember, is stateless, so again, it is more susceptible to a bad codebase than a human is. A codebase that is hard to change is a killer for an AI agent. I mean, it's a tale as old as time: you put garbage in, you're gonna get garbage out. So in my opinion, because software entropy is a fact of life, because AI is just not really capable at the moment of making these beautiful codebase-saving commits, and because it's so unusually sensitive to codebase quality, I don't think we can say that code is cheap. Cheap code will just accelerate you to the point where you've maxed out your kludge meter, where the codebase is complete garbage and impossible for an AI to change. I know that you have worked in codebases like that. I have worked in codebases like that. And the only way you can get around it is by banging your head against the wall repeatedly until you're able to make the change, or able to function within that codebase. AI, because it has no memory, has no chance. And so in the AI age, code quality matters more. And in this section, we're gonna talk about increasing the quality of your code through feedback loops,
which are the essential way to prevent AI  from increasing your software entropy.  Nice work, and I will see you in the next one.
204        </video>
205      </lesson>
206      <lesson title="06.02-steering-agents-to-use-feedback-loops-with-skills">
207        <video title="Explainer">
208           Now we know how important software quality is, how do we keep the agents on the right path?  How do we increase the quality of what the agent is producing, so we don't end up in a software entropy death spiral?  One thing we have already learned how to do is to proactively steer the agent using AGENTS.md files or skills, but that only increases the probability that an agent will choose the right path.  Wouldn't it be better if we could deterministically enforce something that improves the agent's quality?  This is where feedback loops come in.  The agent produces some code; we give feedback to the agent based on the code it's produced; and that helps the agent produce more code, which goes through the feedback loops again until it has produced something of sufficient quality.  Now, this is of course something that we as humans have been doing for a long time.  Great engineers don't trust their own instincts or the quality of the code they're producing, so they set up a lot of feedback loops to improve their own quality.  We're just taking the same tried and tested software practice and applying it to agents instead of humans.  The more feedback loops, and the better quality feedback loops, you can produce, the better the output is going to be.  A classic one is using a strongly typed language like TypeScript instead of a weakly typed language like JavaScript.  With TypeScript, you're going to get errors if you make a typo or use the wrong type, which is feedback the agent can use to write better code.  Another really important one is any kind of automated testing.  Automated tests actually run your code and check that the logic works how you expect it to.  The higher quality the test suite you can produce, the more signal you're giving the agent: better feedback, better quality code.  So tests and strong types are essential for getting high quality code out of agents.  Inside our repo, we've already got some type checking and tests set up.  We already have a script for the typecheck, which runs react-router typegen and tsc.  In other words, it gets the types ready and then checks the entire repo using the TypeScript CLI.  Then we've got a test command which runs vitest run, which runs essentially the entire test suite in the repository.  You may already have noticed that we have services inside our application which each have some kind of test file next to them; for instance, PurchaseService has PurchaseService.test.ts.
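For reference, the relevant scripts section of a package.json like this might look something like the sketch below.  The exact script names and flags are assumptions based on what's described in this video, not copied from the course repo:

```json
{
  "scripts": {
    "typecheck": "react-router typegen && tsc",
    "test": "vitest run"
  }
}
```

Anything the agent can run with a single command, like pnpm run typecheck or pnpm run test, is a feedback loop it can use.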
The question then becomes: how do we encourage the agent to use these feedback loops to improve its code?  How do we get it to work in a fashion where it asks for feedback on every piece of code it produces?  Now, of course, we could add these instructions into a CLAUDE.md file, so they're in every single session, but not every single session needs to run the feedback loops.  So what I prefer to do is create a skill instead.  In nearly every one of my repos, I have this Do Work skill, which I invoke every single time I make a change to the repo.  This Do Work skill is basically a list of instructions for the LLM to follow that mimic how I work when I'm contributing to a repo.  In the first part of the skill, I get it to plan out what we're going to do.  This is not always necessary, because sometimes I'll be working to a pre-written plan.  Then I'll get it to implement the change, then seek feedback on what it just implemented, then commit the code to the repo with a commit message.  This is the nicest way I've found to steer the LLM to use feedback loops, and it gives me a loop in the repo that I can then iterate on.  In the next couple of exercises, we're going to build our own and use it to build a feature.  Nice work, and I will see you in the next one.
209        </video>
210      </lesson>
211      <lesson title="06.03-building-a-do-work-skill">
212        <video title="Problem">
213           Alright, now we understand the benefits of a Do Work skill, let's actually implement our own.  We're going to go into the repo and use our skill-writing skill to write a skill.  To do that, you'll need to go into a Claude Code instance and invoke "write a skill" here.  Then we're going to say something like: write a skill that creates a Do Work skill that represents a unit of work within this repository.  The skill should instruct the agent to plan out the piece of work, then implement the work, then seek feedback via the feedback loops in pnpm run typecheck and pnpm run test.  The final step in the Do Work skill should be that it commits the code.  Once that's done, you should end up with a nice little Do Work skill in the repo, which we will then put into practice in a minute.  So best of luck, and I'll see you in the solution.
214        </video>
215        <video title="Solution">
216           Okie doke, let's run this and see what we end up with.  It has of course entered an explore phase, where it's exploring the repo structure and checking out all the stuff it needs to.  As usual, we'll check back in once it's complete.  Okay, it's come back, and it's trying to put the skill inside my user directory instead of the project.  I want this one to be in my project, because it's going to be tied to the structure of my project, asking for specific feedback loops.  So I'm going to select no here and press tab to give it some more information.  I'm going to say: put it inside the project directory instead.  Let's run this and see what happens.  Okay, cool, it's now doing it inside the project.  It's come back with a skill, inside do-work in the repo.  We can see it's doing this nice little workflow where it's understanding the task, reading any referenced plan or PRD, then planning the implementation, then implementing the plan.  I don't necessarily love that it's added all of this stuff about specific files in here.  I don't think the convention stuff belongs here.  You can be really, really concise with this Do Work skill; you don't have to be too in-depth.  Then it says validate, that's lovely, and it's running pnpm run typecheck and pnpm run test.  It's actually mentioning the commands by name.  That's important.  Now, we are repeating ourselves a little bit here: "Repeat until both pass cleanly.  Do not move on until both commands pass with zero errors."  In my experience, you don't need to be that harsh with AI.  The first instruction should be enough that it will just continually run the feedback loops until it's all good.  I also don't think we need this next section; I think we can just drop that.  It feels too much like the default behavior.  But notice here I'm taking a light hand in steering it, I'm not steering it too harshly.  "Once typecheck and test pass, commit the work."  I don't care whether it uses imperative mood or not.  In my experience, you don't need to tell it to stage only the files you changed.  And in fact, I generally don't want to give it any advice about the commit message.  I don't care about the commit message that much; even if it's super long, that's actually really beneficial.  There's one other thing I want to tweak in here too, just at the top: this "plan the implementation".  Now, I could run this Do Work skill without creating a multi-phase plan first.  I could run it on a task small enough that it would just be able to do it within a single context window.  But most of the time I'm going to be running Do Work on a large PRD or on a multi-phase plan.  So I'm going to say that this section is optional, because if we've already planned it, then we don't need to re-plan it here.  I'm then going to change the top line to: if the task has not already been planned, create a plan for it.  We could optimise this section further by introducing stuff about tracer bullets and creating good plans, but to be honest, I'm tempted just to delete all of this guff and have a really simple instruction here.  I really, really like concise skills like this.  We're now down to just 35 lines.
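To give you a rough idea of the shape we've landed on, here's a minimal sketch of what a trimmed-down Do Work skill might look like.  The exact headings and wording here are assumptions reconstructed from this video, not the actual file from the repo:

```markdown
<!-- sketch only: headings and wording are assumptions, not the real skill file -->
# Do Work

## 1. Understand the task
Read any referenced plan or PRD.

## 2. Plan the implementation
If the task has not already been planned, create a plan for it.

## 3. Implement the plan

## 4. Validate
Run `pnpm run typecheck` and `pnpm run test`. Repeat until both pass cleanly.

## 5. Commit
Once typecheck and test pass, commit the work.
```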
The really important part here is just the headings, this five-step workflow, which will take it through understanding the task, planning it, implementing it, validating it, and committing it.  Now, I'm really interested in how your skill differed from mine.  Did yours look more verbose?  What did you add in here that I didn't?  I would love your thoughts on this in the Discord.  But I bet you're now itching to test out this skill and see if it actually works, which we will do very soon.  Nice work, and I will see you in the next one.
217        </video>
218      </lesson>
219      <lesson title="06.04-using-our-do-work-skill">
220        <video title="Problem">
221           Okay, we have our Do Work skill cooking, and I have prepared a PRD for you to test it out on.  This is the in-app notifications PRD.  Instructors on the Cadence platform have no way to know when new students enroll in their courses.  They must manually check their student list or analytics dashboard to discover new enrollments, so they miss the opportunity to engage with new students promptly.  So we're going to create a notifications section for instructors.  I've created a PRD for you, and I've also created a plan too.  This is not a massive piece of work, so it actually only comes down to a single phase, which makes it perfect for testing our Do Work skill.  So let's open up a fresh Claude Code session.  We're going to invoke the Do Work skill, pass in the in-app notifications plan, and then pass in the in-app notifications PRD.  And then, just like we did before, we're going to say: do phase one.  Now, as this is happening, I would like you to observe whether the agent is actually following your Do Work skill, following the steps you laid down.  And we definitely want to check whether it's running the feedback loops too.  So good luck, write down anything you notice, and ask in the Discord if you run into any issues.  Good luck, and I'll see you in the solution.
222        </video>
223        <video title="Solution">
224           All right, let's rock and roll.  Now it's kicking off with an explore phase, which is good.  That's what it usually does.  But we also said that we wanted it to do that in the Do Work skill, so it's hard to tell whether it's following its own habits or whether it's actually listening to us because we put it in the Do Work skill.  Okay, we can see that it's not choosing to plan here.  It feels like it has full context.  It's now creating tasks and starting to implement.  So that I don't get in its way, I'm going to turn on accept-edits for this session with shift-tab, and just let it run.  Now it is really roaring through this task list, which is great.  And we can actually see that it's got "validate with typecheck and test" already in its list of tasks.  So this is good: our Do Work skill is doing a bit of work here.  Now it has completed all of its work, and it's asking to run pnpm run typecheck.  The version of Claude Code that I'm on is doing this weird thing where it's being a bit too eager with these permission requests.  I'm going to say yes-and-don't-ask-again for pnpm run, and then immediately go into .claude/settings.local.json and modify that entry to pnpm run typecheck instead, because if I just allow pnpm run with a wildcard, it will be allowed to run everything, including migration scripts and things like that.  While I'm here, I'm also going to add pnpm run test just below it, because I want it to be able to run that anytime.  And ha ha, look at this: it's now asking for approval for pnpm run test, so in theory it should not need to ask next time.
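For reference, after these tweaks (plus the git allowances we'll add in a moment), the allow list in .claude/settings.local.json might look something like the sketch below.  Treat the exact rule syntax as an assumption; check what your version of Claude Code actually writes to the file:

```json
{
  "permissions": {
    "allow": [
      "Bash(pnpm run typecheck)",
      "Bash(pnpm run test)",
      "Bash(git add:*)",
      "Bash(git commit:*)"
    ]
  }
}
```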
So there we go, we do have a failed test now, so something is not quite working.  The ordering test fails because both notifications get the same createdAt timestamp, and it says it will order by ID descending as a reliable tiebreaker.  I don't love this, because it's a weird dependency for the notification ordering to depend on how we're doing IDs.  However, because we have a feedback loop covering this, I feel a bit more okay about it.  In future, if we do change the setup of the IDs, that test will probably break before we head to production.  So let's see where it's got to now.  Okay, it is now committing the code.  A lot of people feel quite squeamish about allowing AI to commit to the repo, but you can always just roll back the commit, right?  Commits are totally immutable.  So in general, I like adding the git add wildcard and git commit wildcard into the permissions allow list, because otherwise I would have to manually tell it to commit afterwards.  Okay, let's do some QA here.  I'm going to log in as James Park.  I'm going to go to a course that I haven't purchased already, the Node.js course, which is run by Marcus Johnson, and I'm going to enroll now for 60 fake dollars.  So James has now enrolled.  I can now log in as Marcus Johnson.  And hey, would you look at that, I've got a notification: James Park enrolled in Building REST APIs with Node.js, and if I click on this, I can go to my student roster.  Now, as usual, I can see a couple of little paper cuts.  This bell is not quite visually aligned with the Cadence title up here.  But overall, I'm happy with the results, and on the backend it seems to be working fine.  So we have successfully used our Do Work skill.  This is exciting, because it's going to be a canvas that we can build and layer on as we increase the complexity of our workflow, and especially the complexity of our feedback loops.  Nice work, and I will see you in the next one.
225        </video>
226      </lesson>
227      <lesson title="06.05-fixing-agents-broken-formatting-with-pre-commit">
228        <video title="Explainer">
229           With this Do Work skill that we've been working on, you might notice that we are really encouraging the AI to use the feedback loops, but we're not enforcing that the feedback loops run on every single commit.  Wouldn't it be great if, before every commit, we could deterministically check that the repo is good, and if it's bad, report that back to the agent?  Well, guess what, there is such a thing, using Git hooks.  Git hooks allow you to run a script when certain events happen within Git.  For instance, here we have a pre-commit hook.  This runs automatically before you commit, so we can take something that might fail, like a typecheck or test, and put it inside the pre-commit hook.  Now, experienced developers are probably looking at this going: what?  You're recommending pre-commit hooks?  Are you crazy?  Pre-commit hooks are famously annoying for human developers, because human developers might want to push to the repo for all sorts of reasons, and they don't necessarily want to wait three minutes every time they commit for their types and tests to run.  But for an agent, for Claude, this doesn't matter, because Claude doesn't care how long it takes to run the tests.  It doesn't get frustrated by things taking a long time; it will just wait and see what happens.  So the friction that's really painful for a human developer is actually perfect for an agent.  In this lesson, we're going to set up a pre-commit hook that deterministically enforces our feedback loops.  All we're going to do is click on "Use with AI", copy as markdown, paste it inside a fresh Claude Code instance in the project, and then just say: implement this in my project.  I'm then going to shift-tab so accept-edits is on, and we should be good to go.  So it's detected that we've already got TypeScript and vitest, and it's now asking to install a couple of dependencies: husky and lint-staged.  Husky basically helps us manage our pre-commit hooks in a really simple way, and it's now asking to execute husky init, which is what you want.  lint-staged is a really nice tool that only runs linting on staged files.  This is going to be really, really useful in a second.  It's then written three lines to .lintstagedrc, the configuration file for lint-staged.  What this basically says is: on every single file that's staged, run Prettier on it.  Now, Prettier, if you don't know, is a code formatter that makes your entire repo conform to a set of formatting rules that you encode locally.  If you've never heard of Prettier, what are you doing with your life?  It's the most incredible thing, and it's been around for years.  Combined with lint-staged, we're making sure, deterministically, that the AI cannot commit unformatted code to the repo.  Okay, it's verified everything works by doing a quick dry run.  And crucially, it's also given us a file inside .husky/pre-commit, which runs the three things we've just been talking about: npx lint-staged, pnpm run typecheck, and pnpm run test.  This means that every time we commit, these three are going to be run.
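The generated hook is tiny.  Here's a sketch of roughly what .husky/pre-commit contains, per the walkthrough; the exact script names are assumptions:

```bash
# .husky/pre-commit (a sketch of the generated hook, not the exact file)
npx lint-staged       # format the staged files with Prettier
pnpm run typecheck    # block the commit on type errors
pnpm run test         # block the commit on failing tests
```

If any of these commands exits non-zero, Git aborts the commit and the agent sees the failure output, which is exactly the deterministic feedback we want.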
So now I'm going to get it to commit the code.  And if we zoom down, it looks like it's already completed its work.  It added all the relevant files, and it committed.  We can see that the pre-commit hook ran all three checks: lint-staged formatting, the typecheck, and a bunch of tests, and they all passed.  I'm now going to say to it: test that the pre-commit hooks work by trying to push a commit with a failing test.  I'm going to submit this and check that it works.  So it's now gone into the coupon service and updated a test, and we can see that it's staged the files and tried to commit, but the commit was blocked.  The hook caught the failing test: expected the value to have a length of 99.9 but got 5.  It's now gone back, and it's in fact going to restore the original test file, which is good.  So there we go, that is exactly what we want.  It caught the error, prevented us from committing, and told the AI how to fix it: a beautiful deterministic feedback loop that you should have in every single project.  Nice work, and I will see you in the next one.
230        </video>
231      </lesson>
232      <lesson title="06.06-what-is-red-green-refactor">
233        <video title="Explainer">
234           We now have our Do Work skill, and at the moment it's kind of vanilla.  It's not exactly optimised for what I think is the best way to build code.  For instance, we aren't giving it any kind of instruction on how to implement the plan.  The debate on how to implement code is almost as old as software development itself, and there are a ton of great techniques that allow you to not only build out the code, but do it in a safe way that increases the value of your feedback loops as you go.  We want to push this implementation step to increase our code quality, not diminish it, and the technique I'm going to teach you is red-green-refactor.  Time for me to wave another book at you: this is Extreme Programming Explained by Kent Beck.  Kent Beck is a prominent advocate for something called test-driven development.  The foundation of TDD is that tests are really important and should drive your entire development process.  This is even more true in the age of AI, where feedback loops are so important.  One of Kent's main techniques is this red-green-refactor loop while you're implementing.  It's very tempting when you're building stuff to just dive into implementation, and this is what AI will do every single time you ask it.  But red-green-refactor says you should write a failing test first.  And not only should you write the test, you should run the test to see that it fails.  This is phenomenal when you're bug fixing, because you prove the bug exists before you go to fix it in the green phase.  In the green phase, you write a minimal implementation to make CI go green again.  In other words, you just try to pass the test.  Then, in the refactor phase, you go and look at that code, refactor it, put it in a prettier state, and try to increase your code quality, all the time running the feedback loops to make sure you're keeping CI green.  By the way, when I say CI, I mean continuous integration; that's shorthand devs use to basically mean types and tests.  So why does writing a failing test first matter so much?  Well, for AI, it's incredible, because it gives the AI the ability to run and test the code even before it's written the code.  Not only that, but it forces the code the AI creates to be testable.  And if a piece of code is testable, then it's easy to write tests for, and so easy to change later, because the tests will catch any bugs.  Having a failing test in place first also means that the AI can instrument the code, add logs, and understand the code really deeply while it's implementing.  Personally, I have found that using red-green-refactor in my implementation phases increases the quality of AI's code so much.  I've also found that combining red-green-refactor with tracer bullets is an extremely good way to go.  With this setup, the agent creates one failing test, then the minimal thing to pass that test, then another failing test, then the minimal implementation needed to pass that one.  In other words, the first test is a tracer bullet test: it's testing one vertical slice of the entire setup.  Getting AI to work through and create just one test at a time has been so good, because it reduces the likelihood that AI is going to spray a bunch of tests in a horizontal layer into your codebase, many of which might not actually be testing a real thing.
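To make the red phase concrete, here's a hedged sketch of what a first tracer bullet test might look like with vitest.  The service and function names are invented for illustration; they're not from the course repo:

```typescript
// red: one failing test for one vertical slice, written before any implementation
// (redeemCoupon and CouponService are hypothetical names for illustration)
import { describe, expect, it } from "vitest";
import { redeemCoupon } from "./CouponService";

describe("redeemCoupon", () => {
  it("marks a valid coupon as redeemed", async () => {
    const result = await redeemCoupon("WELCOME-10");
    expect(result.redeemed).toBe(true);
  });
});
```

You run this, watch it fail (red), write the minimal redeemCoupon that makes it pass (green), and only then refactor with the test still passing.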
Finally, from your perspective as the user, actually seeing the failing test and then seeing it succeed is a really good way to feel confident about what the LLM is doing, because having a failing test and then making it pass is relatively hard for the LLM to fake.  Obviously, if it's feeling malicious, it can just create a crap test and then make that crap test pass with a crap implementation.  But in my experience, if it's steered well, it rarely does that.  And as it goes through this cycle, it's adding better and better tests to the repo, tests which cover existing code, making the codebase easier to change and easier to manipulate.  I'm really, really excited to show you this as part of this course.  So nice work, and we'll see you in the next one.
235        </video>
236      </lesson>
237      <lesson title="06.07-red-green-refactor">
238        <video title="Problem">
239           Okay, let's go into our Do Work skill and change the implementation step so that it uses red-green-refactor instead of just working through the plan step by step.  The way I recommend you do this is to go into a fresh Claude Code instance and use the skill-writing skill to add red-green-refactor into the implementation steps inside the Do Work skill.  We should encourage the agent to do one test at a time, in a tracer bullet style, and we should encourage it to do this only for back-end code, not for front-end code.  The reason I'm adding that stipulation is that our test suite only really works for back-end code; we don't currently have a front-end-facing test setup.  We'll cover the front end later, don't you worry, but for now I'm just going to focus on red-green-refactor in the back end.  Then, once you're happy with this skill, we're going to immediately go and test it out.  We have a PRD here: Coupon Redemption Notifications for Team Admins.  When a team admin purchases seats and distributes coupon codes to their team members, they have no way of knowing whether those coupons are actually redeemed.  So the solution is to extend the existing in-app notification system to alert all team admins when a coupon belonging to their team is redeemed.  This is a really small feature, we're just extending what we had before, so it should be a good place to test out the TDD approach, because there's some back-end code that needs changing.  Once you've finished writing your skill, you should go in and do the Do Work setup that we had before, putting in both the PRD and the plan.  The plan itself only has one phase, so you can just say: do phase one, and you should be good to go.  Watch Claude Code like a hawk, and you should see it go through the red-green-refactor loop.  It is just tremendously satisfying to watch.  Good luck, and I will see you in the solution.
240        </video>
241        <video title="Solution">
242           OK, let's start by modifying our Do Work skill.  The cool thing about TDD and red-green-refactor is that models really know what it is; you don't need to point them to external docs.  We can already see it has asked to modify the skill.  Okay, cool.  So it's given us quite a lot of text here.  Looking at the "for front-end code" section, this immediately looks really long to me.  I think we can cut that down to just say: implement directly, without TDD.  It knows what front-end code is, so we can pare it down to that little sentence.  I do like this next part; it already understands what back-end code is, so I don't need this extra pair of parentheses.  "For back-end code, use red-green-refactor."  Looking at the little list here, I can see that it says refactor and then repeat from step one.  I'm actually going to swap those two steps, so it repeats from step one to continually go red, green, red, green, and then does one refactor step at the end.  I'm also going to remove these parentheses; I don't think these examples are actually that helpful.  I think vaguer guidance is better.  We're just trying to tickle the latent space of the LLM, really, to say: do red-green-refactor, here's a list of things you should do.  Overall this looks good to me, so I'm going to commit this, with a message like "make changes to the do work skill", before I actually go and implement.
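After those tweaks, the implementation section of the skill ends up shaped roughly like the sketch below.  As before, the exact wording is an assumption reconstructed from this video, not the real file:

```markdown
<!-- sketch only: the real skill's wording will differ -->
## Implement the plan

For front-end code: implement directly, without TDD.

For back-end code, use red-green-refactor, one test at a time:
1. Write one failing test (red) and run it to see it fail.
2. Write the minimal implementation to make it pass (green).
3. Repeat from step 1 for the next slice of behaviour.
4. When all behaviour is covered, do one refactor pass, keeping the tests green.
```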
Okay, let's close this up.  I'm now going to run /clear inside my Claude Code instance to make sure I've got a clean context for the next bit.  I'm first going to pull in the PRD, then the plan, then go a couple of lines down and say: do work.  And then I'll tell it, just down here: do phase one.  Let's see how this goes.  It's nice that it starts off with two explore agents: it's exploring the notification system and exploring the coupon system, very cool.  Before I forget, I'm going to shift-tab so accept-edits is on.  And here we go, we've got our first TDD mention: "Let me create tasks and start implementing with TDD."  Okay, it's continuing to implement, and it's starting to use TDD.  "Let me write the first failing test."  Yay!  It is, I have to say, writing more than one test here; it's actually written multiple tests inside this file.  So I suppose it must be quite confident about the implementation; it's rushing ahead a little bit.  We can see it's referencing the red phase, and indeed the tests do fail.  So it's now heading towards the green phase, where it's going to actually implement them.  We can see it's adding code to the coupon service, which is expected.  And now, yep, all tests pass.  Now it's going to update the front end.  Okay, if we zoom down, we can see that once it had done everything, it validated with the typecheck and the tests, so it's running the feedback loops as a final check before it goes and commits.  Then it's adding the code, and it's asking me for permission because it looks like there's some dodgy stuff in there, a command substitution.  The feedback loops also ran one more time inside the pre-commit hook, so we are looking good.  Now let's QA.  I've logged in as Liam Thompson, who's just a student, and I can go into Introduction to TypeScript and buy more seats.  I'm going to be a big spender and buy five seats.  I end up in my team management section, where I can distribute these coupon codes.  So I'm going to take the first code here and just copy the link.  Then I go to the bottom left, log out of Liam Thompson, and sign up as a new user.  I'll go with Matthew Pocock as my name and put in an example email.  When I sign up, I should be able to go into the URL bar, paste the link that I got from Liam Thompson, and redeem this course.  I can then click enroll now, and I should be able to start learning from here.  Now, if I go back into Liam Thompson, then Liam Thompson, yes, has a notification: Matthew Pocock redeemed a coupon for Introduction to TypeScript, and there are four of five seats remaining.  I can see that by clicking on the notification and heading into here.  So I would say: not too shabby.  Now, because this is a relatively simple feature, we might have gotten the same quality without TDD.  But what I definitely feel, having done this for a few months now, is that TDD and red-green-refactor increase the likelihood that AI is going to one-shot the feature, as it just did, because AI is so patient at creating the tests, and because those tests create a huge network of feedback loops that the AI can rely on.  So I freaking love it, I'm a huge fan.  One thing I did notice that concerns me a bit is that it didn't follow our instruction to do tracer bullet tests.  So if I were working on a real application, I would go back into my skill and try to really emphasize the tracer bullet approach.  And if you like, you can do that now.  So, what did you notice that was different between my outcomes and yours?  How does your skill differ from the one I produced?  All good questions that can be asked right now on the Discord.  Nice work, and I will see you in the next one.
243        </video>
244      </lesson>
245    </section>
246    <section title="07-day-5-ralph">
247      <lesson title="07.01-what-is-ralph">
248        <video title="Explainer">
249           I want you to cast your mind back for a moment to the mental model we had about multi-phase plans.  The idea of multi-phase plans, of course, is to take a huge chunk of work and break it down into smaller chunks that fit inside the smart zone of a context window.  This means that we need to prompt the agent with three parts: the destination, the journey (in other words, the breakdown of all of the tasks), and an instruction to do phase N.  That is, phase one for the first one, then phase two, phase three, and so on.  Now, there's a problem with this setup, and the problem I see is the "do phase N" part.  That's a human having to sit with the LLM as it does all of this work and say: okay, now go on to the next phase.  In other words, this entire process requires a human in the loop (HITL).  And the truth is that not all of the tasks in a plan require a human in the loop.  Many of the tasks we need an agent to do can be done AFK, away from keyboard, where the human is not in the loop.  You may have felt this while you were monitoring the agent during its multi-phase plan.  You might have been thinking at certain points: I don't really need to be here.  Because we have the PRD, the destination all mapped out, and we've understood what the plan is, the agent can really do its own thinking, and we can QA it all at the end.  The only thing I'm really needed for is to point it to the right phase, and that does not feel like something you need a human for.  That can be done with a simple for loop.  This is what I had been feeling for a while.  I'd been using multi-phase plans up until December 2025, and they were okay but not amazing.  I'd been starting to feel like I wanted to step away from the keyboard a bit more and go and do other work while the agent was doing its thing.  In other words, I wanted the agent to be more autonomous, and without really knowing it, I was looking for a framework that would help me do that.  Around then, I discovered an article by Geoff Huntley describing Ralph Wiggum as a software engineer.  This article gathered a lot of hype, and it's gone into the vernacular of people doing work with Claude Code.  At the time of recording, a few weeks ago, Vercel said they Ralph Wiggum'd web streams to make them ten times faster.  You know, people are calling out the weird little kid from The Simpsons and saying: yes, we use this as an AI coding technique.  All that Ralph is, is a simple loop.  You essentially just run Claude in a loop with a single prompt, the same prompt over and over again, getting it to do a task until it's complete.  If this sounds way too simple, that's because it is.  Instead of saying "do phase N", we are essentially asking the agent to walk through the journey itself.  We give it the destination and our description of the journey, and then it just walks through until it completes.  This means that instead of the human being needed at the start of each phase, and at the end of the last one to QA it and make sure all the code is good, the human is really only needed at the very start and the very end.  You can kick off a Ralph loop and it will just loop and loop and loop until it's complete, and then it will ping you and you can go and QA the code.  This setup is how I'm doing most of my coding now.
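At its core, the loop really is just a few lines of bash.  Here's a minimal sketch, assuming a prompt.md in the current directory and a completion phrase the prompt asks the agent to output; the real scripts we'll look at shortly add more scaffolding than this:

```bash
#!/usr/bin/env bash
# a minimal Ralph loop: the same prompt, over and over, until the agent
# reports there is nothing left to do (phrase assumed; match your prompt)
for i in $(seq 1 20); do
  result=$(claude -p "$(cat prompt.md)")
  if [[ "$result" == *"no more tasks"* ]]; then
    echo "Ralph complete after $i iterations"
    exit 0
  fi
done
```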
The flow tends to be: I spend a lot of time creating the PRD and doing all of the planning, and then I walk away and come back to working code at the end, or mostly working code.  Even while I'm filming this now, I have a Ralph loop running in the background doing some work for me.  For me, getting agents to work autonomously has really been the goal of this course, and the goal of all of my work for the past few months.  Everything we've learned, from planning to feedback loops to steering, is geared towards getting the agent to produce good results autonomously.  Now, as a final note: Ralph is a stupid name, and in certain ways it's not even really a concept.  Ralph is really just a loop, and everyone has different implementations of that loop, different flavors of their Ralph loops.  And certainly I'm going to be teaching you something that's very different to the original vision proposed by Geoff Huntley.  He prefers a much looser version: just give the thing a PRD and let it run forever.  I prefer a version of Ralph where you control it a lot more, where you give it more guardrails and a specific plan to follow as you go along.  So there's no such thing as a right way to do Ralph; we're just trying to run an agent in a loop and get good results at the end.  And I bet you are now very excited to dive in, and I'm very excited to teach you it.  So nice work, and I will see you in the next one.
250        </video>
251      </lesson>
252      <lesson title="07.02-hitl-vs-afk-ralph">
253        <video title="Explainer">
254           You can think of there as being two types of Ralph loops, and if you go into the project repo, you can actually see them.  Inside the ralph folder in the root of the project, we have two scripts.  One of them is an afk script.  This is pretty complicated, and we'll walk through all of it later, but it essentially runs a for loop in bash that runs Claude Code autonomously multiple times until some objective is achieved.  The other script is the once script, which just runs Ralph once.  There's no looping here; all we do is watch Ralph do its work.  These autonomous loops take a while to set up, and you need experience with watching them and understanding what they're doing before you can feel safe letting them run autonomously.  So I think of there as being two types of Ralph loops.  There are human-in-the-loop Ralph loops, where you run this once script, where you're just learning it and getting up to speed with what Ralph is doing.  And then there's AFK Ralph, where you go over here and use the proper, full-fat script with everything inside.  You'll notice that both of these actually use a shared prompt.  So the human-in-the-loop Ralph and the AFK Ralph will behave exactly the same; it's just that the harness around them is slightly different.  In this section, we're going to start with human-in-the-loop Ralph, get our bearings, understand what it's doing, and learn how to best prompt it.  Then we're going to go AFK and start running it autonomously once we've got our feet wet and we're feeling comfortable.  That's also exactly what I would do if I was introducing this to a new repo.  So, in the next exercise, we're going to start with human-in-the-loop Ralph.  Nice work, I'll see you in the next one.
255        </video>
256      </lesson>
257      <lesson title="07.03-trying-hitl-ralph">
258        <video title="Problem">
259           Alright, in this exercise we are going to use the once.sh script.  Let's first walk through exactly what it's doing, starting from the bottom, because at the bottom we are calling claude, running it via the CLI, and passing in --permission-mode acceptEdits.  If we ran this by itself, we would just open Claude with accept-edits on.  That's all it does.  And then you can pass it a string, some kind of prompt, and that will get passed in as the prompt.  There are three parts to the prompt we're passing in.  The first thing is the plan and the PRD.  That's the way we're going to run this loop: we pass in the plan and PRD basically as a single string, so that it can reference the local files.  This means we can run the same Ralph loop on multiple plans and PRDs if we want to.  We're also passing in the previous five commits.  I've found that this is a really, really great way to keep Ralph on track and let it see exactly what's gone before.  It immediately orients it and allows it to see the plan and the PRD in a new light, and I've just found that it's an excellent way to keep it on the rails.  Finally, we are grabbing this prompt.md file, which is just over here, and passing that in as the prompt.  Now, this prompt.md file is going to feel pretty familiar, because it's fairly similar to our Do Work skill.  "A PRD and plan have been provided to you; read these to understand the task.  You have also been passed a file containing the last few commits; read these to understand the work that has been done.  If there are no more tasks to complete, output the promise: no more tasks."  I'm going to leave that last line as a mystery until we look at the AFK version of Ralph.  Then we get it to do an exploration to explore the repo, then an implementation to complete the task.  If you wanted to do any kind of red-green-refactor or TDD, this is where that information would go.  Then we run the feedback loops: before committing, run the feedback loops.  Then we make a git commit, and I do specify a few things that need to go into the commits.  We include the key decisions made, the files changed, and blockers and notes for the next iteration.  Since all the progress here will be pulled into the next iteration of the Ralph loop, it's really good to keep a running record of the most recent things that happened, so extra detail here is really useful.  And finally, we tell it in all caps: ONLY WORK ON A SINGLE TASK.  This is incredibly important, because we don't want it to continually keep choosing new tasks.  If it does that, it's just going to burn through phases and burn its way into the dumb zone.
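Put together, the once script is essentially the sketch below.  The variable names and exact prompt assembly are assumptions; the real ralph/once.sh in the repo may differ in the details:

```bash
#!/usr/bin/env bash
# ralph/once.sh, roughly: one human-in-the-loop Ralph iteration
PLAN="$1"
PRD="$2"
COMMITS=$(git log -n 5 --oneline)   # the previous five commits, to orient the agent

claude --permission-mode acceptEdits "Plan: $PLAN
PRD: $PRD

Previous commits:
$COMMITS

$(cat ralph/prompt.md)"
```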
So, like always, we have a new PRD to run through.  We're going to be looking at the Admin Analytics PRD.  Admins currently have no way to view platform-wide revenue or performance metrics.  The existing dashboard is scoped to individual instructors, meaning an admin must navigate to each instructor's analytics page separately to understand overall platform health.  So we're making a read-only Admin Analytics page at /admin/analytics.  There's also a plan, inside plans: the admin analytics plan, and this plan has three phases.  I think, yeah, three phases.  What I would like you to do is just do phase one with the human-in-the-loop version of Ralph.  That means we are going to run it in a fresh terminal.  You may need to chmod the script with +x to make it executable.  Then we're going to run ralph/once.sh, passing in the admin analytics plan and the admin analytics PRD, and then we are going to watch what happens.  At this point, this should be fairly similar to running your Do Work skill.  There might be some permission requests that you need to go through, but hopefully you shouldn't have to steer it much.  And remember, we're just going to do one phase of the plan; we're saving the later phases for later.  Make sure you write down anything you notice.  Good luck, and I'll see you in the solution.
260        </video>
261        <video title="Solution">
262           Okay, let's run it and see what happens.  The first thing we can see is that Claude has been run with all of the previous commits here.  If we zoom down a bit further, we can see the plan and PRD have been added into the prompt, and then finally we have the prompt from our prompt.md file.  Predictably, it's going into an explore phase, which is great.  And it's now decided it is going to implement phase one.  Where did that thingy just go?  Here we go: "Let me implement phase one.  I need to add platform-wide analytics, add the admin analytics route, add analytics to the admin sidebar," et cetera.  We can see it's continuing to churn out code and implement, and we're heading towards the dumb zone, but we're not necessarily that close to it.  It has added the new tests, and now it's running the feedback loops, and I'm absolutely fine with it running pnpm run test.  One thing to note here is that we as the humans are still needed to handle permission requests and make sure it's not doing anything too crazy, like rm -rf-ing our home directory.  For instance, here it's saying: should I proceed, because the command contains backticks for command substitution?  Yes, that's fine, you may proceed.  If we want to run this totally AFK, we will want to somehow manage these better, and we're going to talk about that in a bit.  Okay, a really nice clean run.  We are at 30% context.  We have added a bunch of new tests and the admin sidebar.  I'm going to head to the dev server and log in as Alex Rivera, the admin.  I'm going to go to my dashboard, and we should now see analytics.  So we now have 12 months, all time, seven days, very cool.  And we should, too, have the top-earning course here, which is, of course, Introduction to TypeScript.  Why wouldn't it be?  So this is exciting: the AI has produced really nice code without us really needing to do anything.  With the exception of a couple of permission requests, we would probably be able to run this in a loop pretty comfortably and continue adding commits to this repo.  So that is the idea of Ralph.  You just run the same thing again and again, and you get a commit at the end of it.  You then pass those commits into the next iteration of the loop, and the wonderful, beautiful cycle continues on and on.  Let me know if you have any questions or observations, or if your Ralph loop somehow didn't do as good a job as mine; let me know in the Discord server.  But if everything is good, nice work, and I'll see you in the next one.
263        </video>
264      </lesson>
265      <lesson title="07.04-sandboxing">
266        <video title="Explainer">
267           In the previous exercise, we saw that there are a lot of steps we still need to do in order to get Claude running AFK.  The main thing we need to handle is those permission requests, because there's no way that Claude is going to run completely AFK if it's constantly asking us whether it can run certain bash commands, or fetch from websites, or all that stuff.  Now, there is a way to programmatically run Claude without it asking us anything, and that is the --dangerously-skip-permissions flag.  The reason this is marked dangerous is because it's very dangerous.  Early on, when people were getting excited about AI coding tools, they referred to this as YOLO mode: you-only-live-once mode.  And it turned out that most of their home directories would also only live once, because if you run Claude Code with this flag, it has a tendency to just do weird things.  Claude, and of course all AI agents, are not deterministic.  They will do strange things on occasion, and those strange things sometimes involve deleting your home directory, or deleting important configuration files, and in general messing things up.  So if we're going to run Claude with --dangerously-skip-permissions, which is kind of what we want for AFK, we need to find a way to sandbox it so it can do the minimum amount of damage.  In other words, if we are not there to guide Claude, we are going to stuff it into the smallest possible box of options that still makes it useful.  Now, I don't think a perfect solution for that exists yet.  Claude itself has a sandbox command, which essentially tries to sandbox the built-in bash tool.  However, Claude Code can actually break out of that sandbox if it wants to, and so we can't really run it sensibly with dangerously-skip-permissions.  So for me, that sandbox is a non-starter; it's not going to work AFK yet.  It might change in the future.  The setup I've found that has the best set of trade-offs, but is still not perfect, is Docker Sandboxes.  Docker Sandboxes let you run AI coding agents in isolated environments on your machine.  This means that inside the sandbox, the agent literally can't reach the rest of your file system.  It can run commands, and it can run git commands too, but it can't use those bash commands to modify stuff in your system.  It isolates the agents in micro-VMs, each with its own Docker daemon.  One thing I don't like about this is that it can still reach the web, so I'm hoping at some point that Docker Sandboxes adds web isolation too.  But for now, this is about as good as you're going to get for running Ralph AFK.  To get started, you will need to download Docker Desktop, which, like me, you may already have, and you may need to update it to a later version.  I'll put a link below for Docker Desktop for Mac, for Windows, and for Linux.  Once you've got it downloaded, you should be able to run docker sandbox run claude, with a period after it.  That just means we're going to run Claude inside a Docker sandbox in this directory.  It then says it's going to create a new sandbox, Claude Cohort 003 project, and it pulls down the latest version of that sandbox.  Once that's done and it's all downloaded, we can open up Claude Code in here.  Now, because we are essentially running Claude Code in a new environment, it asks us to get set up as if we're getting set up for the first time.
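For reference, the two commands in play here look something like this.  The sandbox invocation comes straight from the walkthrough; treat the details as assumptions that may shift between Docker Sandboxes versions:

```bash
# run Claude Code inside a Docker sandbox, scoped to the current directory
docker sandbox run claude .

# the flag we're trying to contain; only ever use it inside a sandbox
# claude --dangerously-skip-permissions
```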
I'm going to choose dark mode, and I'm then going to log in with my subscription.  If you have a Console account or a third-party platform, you can log in with those too.  So I have successfully logged in; let's press Enter to continue.  Yes, Claude can make mistakes.  I have definitely learned that.  We need to walk through the quick safety check again, so yes, I do trust this folder.  And for this version of Claude Code, I'm going to recommend using medium effort.  Medium effort has been fine for me.  So at this point, we have access to a Claude Code instance inside a Docker container.  I'm going to ask it a question about our repo: are we using npm or pnpm?  That is correct, we are using pnpm, lovely.  Next, I'm going to ask it to grab me a file from my Downloads folder in my home directory.  I'm asking this because it's not something it should be able to do; it should be sandboxed into this folder.  And there we go, beautiful: "There's no Downloads directory in this environment.  It's running in a sandbox container, not on your local machine.  I don't have access to your local file system."  Isn't it great when Claude says exactly what you want it to?  That's fantastic.  So once you have that set up, and once you've done these tests, you're ready to move on to the next exercise.  I know from experience that every time I introduce Docker into a course, I get a ton of support requests about it: you know, Docker doesn't work on this environment, Docker doesn't work on that environment.  So if you have any trouble setting this up, or any questions whatsoever, ask me in the Discord.  But if you're good to go, nice work, and I'll see you in the next one.
268        </video>
269      </lesson>
270      <lesson title="07.05-setting-up-and-trying-afk-ralph">
271        <video title="Problem">
272           Okay, it's now time to start exploring the AFK script.  This looks significantly more complex than the once script, the human-in-the-loop script, but it really isn't.  Let me show you all the stuff that's the same first.  The first thing is that we're passing in the plan and PRD, just like before.  Down here, inside the loop, we are also passing in the commits: we log the last five commits, save them to a variable, and pass them in to docker sandbox run claude.  So these parts of the prompt, the previous commits, the plan and PRD, and the prompt itself, are exactly the same as in the human-in-the-loop script.  In fact, most of the code here is trying to get Claude to do something it doesn't usually do, which is that we're using the --print flag to run Claude as if it were a normal script.  Instead of opening the Claude Code UI, it's literally just going to sit there silently, produce some code, and then exit.  Now, I didn't like that at all; I wanted it to stream me updates as it was going.  So a lot of the logic here, like --output-format stream-json and all of this jq filter stuff, is basically there to grab the JSON chunks as they're streamed down and display them in the terminal.  Really, the main thing we need to look at is this for loop.  It basically says: for X number of iterations, do this step in the middle, in other words, run docker sandbox with this prompt.  And that number of iterations we pass in as an argument: run this Ralph loop, and stop when we reach this number of iterations.  We want this to prevent the fail case where, for some reason, our agent just loops forever.  I usually set it to a maximum of something like 20, or maybe 5 if I'm testing.  If we do reach the max number of iterations, the for loop just runs out and we are done.  But if the result contains the promise "no more tasks", we echo "Ralph complete after X iterations" to the console, and exit with a zero.  Now, we saw this before, didn't we?  We saw the "no more tasks" promise inside the prompt, and I told you we'd come back to it.  The prompt says: if there are no more tasks to complete, output the promise "no more tasks".  I've found that this is the most reliable way to get Ralph to stop itself once it's complete.  However, you will inevitably get some false positives; it's just the nature of the game.
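Stripped of the streaming and display logic, the core of afk.sh is the sketch below.  The variable names, the completion phrase, and especially how flags are forwarded into the sandboxed claude are assumptions based on this walkthrough; the real script also pipes stream-json output through jq, which is omitted here:

```bash
#!/usr/bin/env bash
# ralph/afk.sh, roughly: loop Claude in a Docker sandbox, up to N iterations
PLAN="$1"
PRD="$2"
MAX_ITERATIONS="${3:-20}"

for i in $(seq 1 "$MAX_ITERATIONS"); do
  COMMITS=$(git log -n 5 --oneline)
  result=$(docker sandbox run claude . \
    --print --dangerously-skip-permissions "Plan: $PLAN
PRD: $PRD

Previous commits:
$COMMITS

$(cat ralph/prompt.md)")

  if [[ "$result" == *"no more tasks"* ]]; then
    echo "Ralph complete after $i iterations"
    exit 0
  fi
done
```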
Now, running this exercise is going to be dead simple, because all we're going to do is run afk.sh instead of once.sh, passing in the same plan and the same PRD.  For the second argument, I'm going to set the max number of iterations, and I'm going to set it to 1 for this exercise.  Since we're really just testing AFK Ralph, we don't want to let it off the leash yet.  What you should see are messages streaming to the console, and the same behavior you saw in the human-in-the-loop version, now taken AFK.  Make sure you write down everything you see, and note down any questions you have or any weirdness you notice.  And if any setup issues surface here, head to the Discord.  So best of luck, and I will see you in the solution.
273        </video>
274        <video title="Solution">
275           All right, let's run it and see what happens.  The first thing it emits is that the sandbox exists, but the VM is not running.  The VM then starts successfully.  So it's starting the sandbox container and running it within the virtual machine.  Docker Sandboxes might look different for you, it might emit different logs, but you should see something like this.  Okay, and we can see that Claude Code is talking to us.  It is saying phase one is already complete, per the commit history.  Beautiful.  The next phase is phase two, the revenue-over-time chart.  We can see how useful the commit history is, actively prompting the agent to see what's come before.  Now it's just telling us what it's up to, really.  Notice it's not making any permission requests of us; it's just getting on with its work.  My dream is that someday Claude Code will add a feature that gives us actually nice UI out of this, because currently it's fairly ugly, right?  So where's it up to now?  We can see it's now adding the tests for the admin revenue time series, nicely following the plan we've laid down for it.  If we take a look at the code it's written, by opening up the Git source control here, we can see the changes it's making: adding stuff to the tests, adding stuff to the analytics service too.  One thing we don't get, which I haven't actually figured out a way to fix yet, is the live context display on this view.  With the loop in its current state, observability is actually quite hard.  It's hard to see exactly how many tokens are being used and whether it's succeeded or failed.  Capturing all that in some kind of mechanism would be really interesting.  Now, here we go, this is super interesting: the tests are all failing due to a better-sqlite3 native module version mismatch.  This is something I noticed when I was building the app, because I actually used Ralph to build the app; I used AFK Ralph.  I noticed that between my setup and the Docker setup, when I ran the install locally, it would build the native module differently.  So I added a skill inside .claude/skills for rebuilding better-sqlite3.  This is a very, very satisfying moment for me, because inside the Docker container, it is picking up the local, project-based skills, and those skills are steering its behavior to fix issues it's seeing.  This is one of the downsides of Docker Sandboxes: it is a different environment to your environment.  However, we can see that thanks to this skill, it was able to rebuild successfully, and it ran the tests again.  Okay, and phase two is complete.  The last thing it emits is a big description of all the things that happened and changed.  If we take a look inside the git commit history too, by running git log, we can see the commit that it added.  This is a really nice, detailed commit that includes the key decisions and the files changed, and even recommends what to do next.  It's almost as if it's prompting itself what to do next, which is just incredibly helpful.  I'm going to QA this by switching to Alex Rivera.  I'm going to go to the dashboard, go to analytics, and now we can see revenue over time.  If we switch to 12 months, it shows a different chart.  How about all time?  Yes, looking lovely.  So there you go: we have created a new feature completely AFK, with no permission requests, nothing asked of us.
It was able to figure out and steer itself with the skills that we provided, and use the feedback loops that we put in place to make sure its work was good. And all of that just from a PRD, a plan, a relatively simple prompt, and a bash loop. For me, I feel that this is the future of development, and it's very, very cool. Nice work, and I will see you in the next one.
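For reference, a rebuild step like the one that skill steers towards usually boils down to a single command. This is a minimal sketch assuming the project uses pnpm; the actual skill contents aren't shown on screen:

```bash
# Minimal sketch of the fix the better-sqlite3 rebuild skill encodes:
# recompile the native module against the Node version inside the container.
# (Exact skill contents are an assumption; they aren't shown in the video.)
pnpm rebuild better-sqlite3

# Then re-run the feedback loop to confirm the version mismatch is gone.
pnpm test
```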
276        </video>
277      </lesson>
278      <lesson title="07.06-using-backlogs-to-queue-tasks-for-ralph">
279        <video title="Explainer">
280           I have one final issue with our Ralph loop and the way we've got it set up, and that is what we are asking it to do. In our prompt here, we have "a PRD and a plan have been provided to you". Now, this is fine, but we need to manually specify which PRD and which plan we're passing in every single time we want to do any work. In other words, we as the human are kind of in the loop here, because we are doing task selection for Ralph. Most of the tasks that you do in a repo are not going to be building big features that require a PRD and multi-step plans. A lot of it is going to be development infrastructure, you know, like upgrading packages or changing the way dev scripts work. It's going to be bug fixes, like really critical bug fixes, maybe, that are actually really important to ship. Or maybe it's going to be polish or quick wins, or changing the wording of something on a website. All of these tasks then form a backlog, which you need to get done in some sort of order. Some of these tasks might be P1: really, really important and needing to get done right now. Some of them might be just P2 and, you know, need to be done at some point and are very, very useful to have. And others are just the nice-to-have tasks that pop up in any backlog. Now, if we think about Ralph as just like a machine that can take these tasks and complete them one by one, then our job really becomes backlog or queue management. We as a human are just trying to queue up these tasks one by one for Ralph to complete, at least in the model that we currently have. Inside our prompt we would manually need to pass a plan for each of these tasks. In other words, we are still in the loop. However, I don't think it has to be this way. I think that AI is smart enough that it can look at a backlog like this and figure out which task to do. In other words, AI can do this task prioritization and selection by itself. This means that instead of passing a PRD and a plan, we should instead pass it an entire backlog. Now, I'm not saying pass it hundreds of tasks at once. You probably need to do some manual prioritization or filtering. But I've found that Ralph can easily take in 20 tasks at once, figure out which one to do, and then churn out just that task. In this setup, then, the agent produces code which the human then reviews; the human adds stuff to the backlog, which the agent then sees and turns into more code. And the amazing thing about this is that it can happen in parallel: the agent can be coding while the human is reviewing previous commits, or just looking at the app in general and adding more things to the backlog. And the next question you're asking is, where should this backlog live? Should it live as markdown files in the repo? In my opinion, this backlog should live somewhere where human developers and human non-developers can look at it and read it. So this backlog should live in your task management service. Linear is one that's extremely popular because, I mean, what a beautiful UI Linear has. You could of course use a classic tool like Jira. But my favorite way of handling a backlog with Ralph is using plain old GitHub issues. My main application, this course video manager, is the video editor that I edit all of my videos on. And as you can see, I have closed 385 issues working on this backlog. A lot of these are quick wins, like adding a keyboard shortcut or adding an 'open in VS Code' action. Some of these are bug fixes.
The OBS virtual camera shouldn't reset when navigating between pages. In fact, let's just click into this and look at how much detail there is here. It's not a huge amount of detail. But as we can see, the Ralph loop has come back, said "fixed", and referenced it in a commit. So this setup has been incredible for me and, oh my god, I mean, I've said this so many times in this section, but I really am so excited for you to see this. In the next exercise, we are going to hook this up to GitHub. Good luck, and I will see you in the next one.
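If you want a feel for what a backlog like this looks like to an agent, the GitHub CLI can dump it straight to the terminal in structured form. A small sketch:

```bash
# Peek at the open backlog the way an agent would: structured JSON, not a
# rendered web page. The fields are standard `gh issue list --json` fields.
gh issue list --state open --limit 20 --json number,title,labels
```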
281        </video>
282      </lesson>
283      <lesson title="07.07-setting-up-our-repo-for-github-issues">
284        <video title="Explainer">
285           So let's hook up Ralph to our backlog. Now, in order to do this and in order to use GitHub with it, we're going to have to walk through a series of annoying steps. The reason for this is I don't want you to push issues to the same repo that all of your fellow students are using. So we've got a kind of unique constraint here, because lots of students are using the same repo. So I'm gonna give you a series of bash commands that you can run in order to move this repo to a new Git repository, but I still want you to retain the original repo. So we're just gonna create a single throwaway repo for a couple of exercises, just so we can see this working, essentially. But I still want you to have access to the original repo, because in the original repo you'll have commits and stuff for exercises which might follow on after this. I'd like you to go into the project and I'd like you to cd up a directory. You're going to run cp -r cohort-003-project, or whatever you cloned it down to, and you're going to put that in a new directory, cohort-003-project-fork. This is going to take all of the code that's in here and copy it over into there. Let's now run this; depending on the speed of your setup, it might take a longer or a shorter time. We are copying quite a lot over, because we're having to copy the entire git repo, the entire node_modules, all that stuff. Okay, mine is complete, so I'm going to now open up this cohort-003-project-fork inside a new window in VS Code. So there we go, we have a new folder of the repo, but it's still pointing at the original GitHub repo. We're going to initialize a new repository inside here. We're going to run rm -rf .git, so we're going to completely remove the Git repository stuff; now it's no longer a Git repository. We'll then run git init to initialize a new Git repo. Then we're going to add everything by saying git add and then a period. We'll say git commit with a message of "initial", and I think that should be enough. So now we have a new local Git repo for this fork, but we don't yet have an entry on GitHub to actually put issues on. So to get this working, we're going to run gh repo create --private --source=. In other words, we're creating a private repo from the stuff in this folder. This will only work if you have the GitHub CLI installed, and so I'll put a link below in order for you to set it up. So let's run this and see what happens. Just like that, we have created a repo on GitHub. The next thing we need to do is to run git push --set-upstream origin main, which will push our code to the main branch on the origin, which in this case is GitHub. Let's run it, and we can see we end up with a main branch up on origin. Now, if we go over to the fork that we've created, we can see all of our code in GitHub and our issues ready to go. So if you got to this point, congratulations, you are ready to move on and start plugging this into Ralph. If you didn't, if you had any setup issues, then go into the Discord to ask for help. Nice work and I will see you in the next one.
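Putting those steps together, the whole fork dance looks something like this (directory names match the ones used above; adjust to whatever you cloned):

```bash
# From the directory above your project clone:
cp -r cohort-003-project cohort-003-project-fork
cd cohort-003-project-fork

# Detach from the shared class repo and start fresh history.
rm -rf .git
git init
git add .
git commit -m "initial"

# Create a private GitHub repo from this folder (requires the GitHub CLI),
# then push and set the upstream.
gh repo create --private --source=.
git push --set-upstream origin main
```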
286        </video>
287      </lesson>
288      <lesson title="07.08-hooking-up-ralph-to-your-backlog">
289        <video title="Problem">
290           Now that our repo is ready to receive GitHub issues, it's time to hook those GitHub issues up to our Ralph prompt. The way we're going to do that is via the GitHub CLI, which is a really elegant way of talking to GitHub that LLMs seem to be really, really good at. So the first change is that we've added: GitHub issues are provided at the start of context; parse them to get the open issues with their bodies and comments. And that's an important note here: we are passing in the comments too. We're doing that in the script, where inside the loop we grab the issues with gh issue list, pulling the issue number, the title, the body, and its comments. Going back to the prompt, we have this task selection section, which we'll look at in a second. But the important thing is that after it makes the commit, we then get it to close the original GitHub issue if the task is complete. If the task is not complete for any reason, then it leaves a comment on the GitHub issue with what was done. This ends up being really useful, because the GitHub issue then becomes a running record of all of the work that has been done against it. But now let's go back to the task selection section up here, because this is really important. We are currently passing it all of the open issues in the repository, and so it is going to need to pick the next task based on a priority order. I have decided, and you may think differently, that this is my priority order. First of all, critical bug fixes must come first. The reason for this is that not only are they very impactful for end users, but buggy code might end up causing issues if we try to build features on top of it. Next, I'm gonna put development infrastructure. This might be a controversial one to have so high up the list, but I think development infrastructure, like upgrading types and tests and kind of getting things ready to build on top of, is really important, especially because our code quality requirements are so high. I've found that if you don't put this before new features, then it will tend to just build a bunch of stuff and never develop the infrastructure to actually test the stuff that it's working on. So I have this really, really high up on my list. And then, of course, number three is tracer bullets for new features. We briefly define what tracer bullets are, because this just gives it a little bit of extra context. We then say, down the bottom, you can have polish and quick wins, so, you know, just sort of small additions and features like that. And then at the bottom is refactors. There are probably other tasks that you could put into a priority order here, but these five have been pretty useful for me for dealing with a wide range of stuff in backlogs. This means that our backlog can contain bug fixes, it can contain developer infrastructure stuff, it can contain refactor requests and quick wins, and it can contain PRDs and plans for new features. Essentially, this makes Ralph really, really versatile for handling lots and lots of different types of backlog items. So our exercise, then, is to go into the admin analytics PRD, the one that's kind of nearly done at this point. I'm gonna select everything in this file and copy it. Then I'm going to go over to my issues here and I'm going to add a new issue. I'm going to call it Admin Analytics PRD, and I'm going to paste the PRD into here.
So yes, we are putting the PRD directly into a GitHub issue, and it is issue number one in this repo. Then I'm gonna go into the admin analytics plan here and do the same thing: create a new issue just for the plan. I'll pop up to the top and create a title, and this is gonna be Admin Analytics Plan. Then, instead of the source PRD being referenced as a local file, I'm going to reference it as #1 here. Then we'll submit this, and we should have two distinct issues. Now, before we run our Ralph script, there's one more piece of setup we need to do. Our Docker sandbox doesn't have access to our authentication login for the GitHub CLI, so we need to log in inside of the Docker sandbox. To do that, you will need to run docker sandbox run claude . and, because this is a new project, it will create a new sandbox for you. For some reason it then downloads the image again, even though I'm sure it has the up-to-date image locally. Although maybe Claude Code updated overnight and it needs to pull in the new version of Claude Code, but that seems unlikely. This is a frustrating downside of Docker sandbox: every time you create a new sandbox, you need to run the entire setup again. So let me pull this open a little bit. We're gonna choose dark mode. We're gonna log in with, in my case, my subscription. And then once we get to here, there's a little bit more to do still. We're going to type an exclamation mark here and we're going to run gh auth login. The exclamation mark puts us into bash mode, so this is going to run the command directly inside the Docker sandbox. This shows this little piece of UI here. You've got to copy this one-time code and then go to your browser, to github.com/login/device. So you basically need to complete the OAuth flow in the Docker sandbox. So I'll just do that and I'll see you in a second. And once that's complete, you can head back here and the GitHub auth login is complete. We can see I'm logged in as Matt Pocock, beautiful. Now I can press Ctrl-C a few times to get out of this, and now our AFK script is ready to run. So from here, I'm going to clear the terminal with clear, and then I'm going to run ralph afk, and I'm going to run it with five max iterations. I'm expecting the AFK script here to only run once, really, but I'm going to run it with five just to give it a bit of extra legroom, and hopefully we should see it stop. The hope here is that it triggers the "no more tasks" behaviour in the prompt: if all tasks are complete, output "no more tasks". And then, if we see this in the AFK script, it should exit with "Ralph complete after X iterations", because it's looking for this in the final result. So best of luck. We are about to run what I feel is nearly the final form of our Ralph loop. So make sure you note down what you see, ask any questions in the Discord, and I will see you in the solution.
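For the curious, the issue-fetching step described above can be sketched as a couple of lines at the top of each loop iteration. The full script isn't shown, so treat the variable names and shape as illustrative:

```bash
# At the start of each Ralph iteration: pull the open backlog, with bodies
# and comments, so the prompt sees the running record on every issue.
ISSUES=$(gh issue list --state open --json number,title,body,comments)

# The backlog then gets interpolated into the prompt handed to the agent.
echo "GitHub issues are provided at the start of context: $ISSUES"
```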
291        </video>
292        <video title="Solution">
293           Okay, let's run it and see what happens. Okay, it's exploring the repository structure. I imagine it's running an explore sub-agent. That means that the GitHub issue stuff resolved properly and it's looking at the work that's been done. Okay, it's gone a fair bit further. We can see that phases one and two are complete. In other words, it's seeing that from the commit history, not from the issue comments. It then starts implementing phase three, starting with the service function. It's adding some new tests and it's updating the route. Okay, all the tests are passing here and the type checks are passing, and now it's committing. And finally, it's decided to close the GitHub issue now that all three phases are complete. So from inside the Docker sandbox, thanks to our authentication login with GitHub, it is able to actually manipulate issues, which is very cool. Okay: the PRD issue is still open; since all the work described in it is now implemented, it's going to close it too. Phenomenal stuff. So, so good. And not only that, it says no open issues remaining and it outputs "no more tasks". It then comes back with a summary here, and this all looks good, and the Ralph loop is complete after one iteration. Now, it does not always manage to exit cleanly like this on the same iteration. Sometimes it will say, okay, I've done my tasks, let me go to a new iteration. And then in that iteration, it sees there are no more open issues and then says "no more tasks". I've found that that's a decent enough trade-off that I don't mind the extra little run just to check that there are no more tasks, since that's really cheap. But yeah, would you look at that? Our Ralph loop is pulling from GitHub issues. Incredible. Okay, let's QA this. I'm gonna sign in as Alex Rivera again. I'm gonna go up to dashboard, go to my analytics over here, and I should see a little bit more richness in the UI. Yeah, we can see we have a course breakdown here and we can filter by instructors. So we can filter to only see Sarah Chen's, we can filter to only see Marcus Johnson's. That should be responsive to the number of days I select too; so, seven days, yep, we can see Marcus Johnson only earned $60. Then, yeah, over 30 days, $60, but over 12 months he earned a decent amount too. Now, so far throughout the course I've been trailing how I like to do QA without really explaining it, and really, I've been waiting for this backlog-based setup to show you properly. So let's find something we want to modify. When we go initially onto the analytics page, it defaults to 30 days. Let's say that I actually wanted it to default to 12 months instead. I'm going to go to our GitHub issues on the fork and add a new issue here. Then we're just gonna say the analytics page for the admins should default to 12 months, not 30 days. Let's add a touch more detail down here: the admins need to view an entire swathe of time, rather than instructors, who are most likely just focused on their current month. So we really don't need to add much more than that. This is really just a little bit of polish, a little quick win, and I've submitted that now as issue three. So now we can head back down into our terminal and just go up in the terminal to run the same command again. Now, as we can see, it starts looking at issue three: the admin analytics page should default to 12 months instead of 30 days.
What's important to realize here is that there's no plan mode here. There's no back and forth with the agent. So your issue description needs to be really, really clear. However, because this is such a simple change, it was able to do it pretty quickly. We're still getting that better-sqlite3 native module issue; thank the Lord we have that skill. Alrighty, and the tests are passing. The type check is clean. Let's commit and close the issue. And we can see that on the issue it says fixed: changed the default time period for analytics from 30 days to 12 months. And in the terminal, it says "no more tasks", Ralph complete after one iteration. Let's close the issue. Beautiful stuff. And now a tiny bit of QA: when we go to a different page and go back to the analytics page, it defaults to 12 months. I really can't describe how cool it is to just be able to completely delegate little backlog tasks like this to an agent. What I'd recommend you do now, if you have time (this is an optional extra exercise), is to go through the entire site and just look for little things that you want to tweak. Maybe you don't like the icon on the Manage Courses item or something like that. Do a little bit of QA on the whole site and just put those in GitHub issues. Once you've done that, go back to your Ralph loop here and run it with a higher max iterations, maybe like 10 or something, and then sit back and watch as the bugs just fix themselves. Write down anything you notice and go into the Discord to see how other people got on too. Very, very, very nice work. I'll see you in the next one.
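Incidentally, you don't have to use the web UI for quick wins like that; the same issue can be filed from the terminal. A sketch, using the title and body from this example:

```bash
# File the polish task straight from the terminal so Ralph picks it up on
# its next iteration.
gh issue create \
  --title "Analytics page should default to 12 months, not 30 days" \
  --body "Admins need to view an entire swathe of time, rather than instructors, who are most likely focused on their current month."
```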
294        </video>
295      </lesson>
296      <lesson title="07.09-updating-our-prd-and-plan-skill-to-use-github" name="Updating Our PRD And Plan Skill To Use GitHub">
297        <description>Our PRD and plan skills aren't fit for purpose by the end of the Ralph section. We should update them to post to GitHub instead.</description>
298        <video title="Explainer">
299           This is a little addendum to the previous exercise: we still need to go back through and make it so that every time we write a PRD, or turn that PRD into a plan, we create GitHub issues instead of local markdown files. So this commit does exactly that. It simply says: once you have a complete understanding of the problem and solution, use the template below to write the PRD; the PRD should be submitted as a GitHub issue. Because Claude and models like it understand the GitHub CLI so well, we don't even need to prompt it to use the GitHub CLI. I've then done exactly the same thing to the PRD-to-plan skill inside here. So that should be just down at the bottom: instead of writing the plan file, create a GitHub issue containing the plan, using the template below. Exactly the same. These skills, then, the PRD creator and the PRD-to-plan, are essentially scripts that you run locally that ping the results up to GitHub; Ralph then runs through the backlog, chews through it, and you're good to go. So this pattern of working with an LLM locally to create a GitHub issue is just something I've found to be so powerful. It's great, obviously, for building features, but also great for issue triage, for ideas, for RFCs. There is a lot of meat on those bones. So there we go, addendum over. Nice work, I'll see you in the next one.
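Under the hood, the updated skill only needs one CLI call to do this. A minimal sketch, assuming a hypothetical PRD file path:

```bash
# What the updated skill effectively boils down to: submit the finished PRD
# as a GitHub issue rather than leaving it as a local markdown file.
# (The title and file path here are hypothetical.)
gh issue create \
  --title "PRD: Admin Analytics" \
  --body-file plans/admin-analytics-prd.md
```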
300        </video>
301      </lesson>
302    </section>
303    <section title="08-day-6-human-in-the-loop-patterns">
304      <lesson title="08.01-hitl-and-afk-tasks" name="HITL And AFK Tasks">
305        <description>An explainer where I talk about the designation of human-in-the-loop and AFK tasks. Some tasks, like prototyping or QA, are best done in HITL mode. Other tasks, like implementation and bug fixing, are best done AFK.</description>
306        <video title="Explainer">
307           Now that we've fairly successfully set up an autonomous agent to do our work for us, you might be thinking, can it do all of my work? But surely it can't do all of my work, right? There are probably some things that I need to do human-in-the-loop. Now, I would say your instinct is absolutely correct. And I would say that when you're planning out work to complete a feature, you should be thinking carefully about which bits are human-in-the-loop, in other words, where you need to be there, and which bits can be done autonomously. Now, obviously the dream is that we're just able to delegate all of our work AFK and we don't need to be there for any of it. But the truth is that lots of work needs to be done human-in-the-loop. We already know one of the things that needs to be done human-in-the-loop, which is planning. An AI can assist with creating a plan, but without the human, it doesn't have the source of truth to actually bounce off for what it's supposed to be creating. Another one we've seen so far in the course is QA. You need a human eventually to actually go in and test the thing that's being built. They'll be able to surface things which aren't present in the AI's feedback loops. Does it feel good to use? Is it fast enough? Does it actually serve the purpose that I built this thing for? You might be getting a sense for the things that you need humans in the loop for: anywhere that you need to apply taste to what you're doing. Now, what do I mean by taste? What I mean is human judgment, human feel. We need taste in planning because we need to work out together what we're building, and I need to make design choices that I feel are sensible. We need taste in QA because I need to look at the artifact that's created and give feedback on it based on my own tastes. And I feel that to create great products, we need to let humans and AI do the things that they're best at. The human should be there to give their taste, to give their opinion, to give the feel to the application, not only in how it performs externally to users, but in how it's built internally in its architecture. And the AI should be there to pick up the grunt work, to apply the human's taste to the canvas of the application code. So throughout this section, we're gonna be looking at ways that we can bring the human back into the loop in a productive way. Because if you delegate 100% to the AI, you're going to end up with a tasteless application, and often just something that plain doesn't work. So let's get into it. I'll see you in the next one.
308        </video>
309      </lesson>
310      <lesson title="08.02-dont-plan-kanban" name="Don't Plan, Kanban">
311        <description>- Show why kanbans are great
312    - Show the user my kanban skill (put it in the repo)
313    - HITL and AFK tasks
314    - Make Ralph only take AFK tasks
315    - QA plan at the end</description>
316        <video title="Explainer">
317           I've always had a little bit of discomfort when it comes to multi-phase plans with AI. And this discomfort comes from the fact that human developers don't work like that. When we understand the destination, we don't just plan everything out rigorously in a single document to create a journey. Instead, we create a Kanban board of different issues. Some issues might be blocked by others. In other words, this issue needs to be completed before either of these issues can be completed. And for this issue at the bottom to be completed, then this issue, this issue, and this issue all need to be completed. In other words, you have a kind of dependency graph of different issues and their interrelationships. This means that two developers can grab both of these issues at the top independently and at the same time. Once those are done, maybe they both grab these issues independently, and then one developer works on that one at the end. What I'm talking about here is basically a Kanban board, where once you figure out the destination, you take all of the tickets, put them on a board, and then developers grab them independently. This is much less prescriptive than a multi-phase plan and it's also much easier to add new things onto. For instance, if I want to grab another issue and just stick it inside here, then I can very easily just describe another blocking relationship. I haven't needed to change any other part of the plan; I've just added something else onto it. In my experience, this is much easier than editing multi-phase plans. It's also really, really easy to QA, because you just say, okay, this issue is now done, I'll just go and QA it and add all the QA feedback as a separate issue. So the graph grows naturally instead of you having to squeeze it into a multi-phase plan and figure out where to schedule it. In the repo, I have gotten rid of the prd2plan skill and I've added my own prd2issues skill. This breaks a PRD, which is probably already in GitHub, into independently grabbable GitHub issues using vertical-slice tracer bullets. So the first thing is, it locates the PRD wherever it is. Then it might need to explore the codebase, depending on what's in its context, or you might already have explored it. Then it drafts vertical slices, so it breaks the PRD into tracer bullets, and these slices might be human-in-the-loop or AFK. It says human-in-the-loop slices require human interaction, such as an architectural decision or a design review; AFK slices can be implemented and merged without human interaction; and prefer AFK over human-in-the-loop where possible. Finally, and I've found this is super nice, it says always create a final QA issue with a detailed manual QA plan for all items that require human verification. This absolutely rocks, because it means I can step away for a long time, and then when I come back I've got an issue in GitHub with a detailed QA plan covering everything that's been done and the stuff I need to verify. It then quizzes the user. It says, okay, we've got the numbered list here. Present the proposed breakdown and ask: does the granularity feel right? Are the dependency relationships correct? Should any be split? Are the correct slices marked as human-in-the-loop or AFK? And finally, it creates the GitHub issues using this fairly simple template here. The main thing is that it references the parent PRD by its issue number, and it has a blocked-by section below saying which issue numbers it's blocked by.
We're still going to be passing all of the issues into Ralph, so it should be able to see all of the blocked-by relationships just by looking at the text. Inside the Ralph prompt, I've also made a small adjustment: you will work on the AFK issues only, not the human-in-the-loop ones, and if all AFK tasks are complete, output "no more tasks". This should prevent Ralph from doing the human-in-the-loop tasks, though you may want to enforce this deterministically by hiding them with labels or something like that. For now, this has worked fine for me. So by using the Kanban approach instead of the plan approach, we get to independently pick out issues from the AI. In other words, we're parallelizing our work with the AI's work. We get to extend the Kanban board infinitely by adding more QA issues, and we should also get a really nice QA plan at the end. In the next exercise we're actually going to use this new setup and see how it does. Nice work, and I will see you in the next one.
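To make the template concrete, an issue created by a skill like this might be filed roughly as follows. The exact template wording isn't shown on screen, so treat the issue numbers and body as a sketch:

```bash
# A sketch of what the prd2issues skill files per slice: a reference to the
# parent PRD and an explicit blocked-by section that Ralph can read as
# plain text. (Issue numbers and wording are illustrative.)
gh issue create \
  --title "XP on lesson completion (AFK)" \
  --body "$(cat <<'EOF'
Parent PRD: #57

## Blocked by
- None

## Acceptance criteria
- Completing a lesson awards XP
- XP total is visible in the sidebar
EOF
)"
```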
318        </video>
319      </lesson>
320      <lesson title="08.03-using-the-kanban-skill" name="Adding QA Plans To Our Plan Skill">
321        <description>- Give them a PRD to break into issues
322    - Create a QA Plan (HITL)
323    - Create a feature and QA it with the QA plan</description>
324        <video title="Problem">
325           Okay, now you've had a taste of what the Kanban skill actually looks like, let's go and use it. I have a new PRD up here, which I will provide in a GitHub gist below. The idea of this PRD is gamification: we need XP, we need levels, we need streaks. What course platform is complete without these useless doodads? So we've got a whole bunch of user stories: I want to earn XP when I complete a lesson so that I feel rewarded for my effort. Your job is to copy what we get from the gist and paste it into a new issue, like I've done here. You're then going to note down the issue number of the PRD. In my case, it's 57; I'll copy that. Then I'm going to head into Claude here and I'm going to say PRD to issues, and I'm going to pass in #57. I just pressed return very eagerly there, but I will pause that until we get to the solution. This is your first chance to see how it actually works with the Kanban board, so write down whatever you notice about the approach and whether you like it or not. Actually creating the issues shouldn't take that long. I would like you to run a Ralph loop on the resulting issues afterwards. It should only pick up the AFK tasks, and once that's done, you should be left with a bunch of human-in-the-loop QA to review. So best of luck and I will see you in the solution.
326        </video>
327        <video title="Solution">
328           Alrighty, let's rock and roll. It's asking to review the GitHub issue, which of course I don't mind, and it's heading off exploring the repo. Okay, pretty swiftly it has returned with the potential list of issues. Whenever I'm checking these, I always check the first one just to make sure it's a proper tracer bullet and not some horizontal splurge. This one looks really good, because it's got XP on lesson completion plus sidebar level display. So it's doing the XP events table and migration, doing the tests for the XP service, and showing the UI, which is great; that's exactly what we want. Then it goes on to do streak tracking. That's good; that's like an extension of the original tracer bullet. Then quiz XP; again, another extension. Four and five look like really small pieces, so I'm tempted to merge those together. And then at number six we've got the full gamification verification, so a proper QA plan. So, okay, I'm gonna go down and I'm gonna say: merge four and five together, they look too small on their own. Then I'll submit this and it should just plunk those two together. It's now saying, shall I go ahead and create these as GitHub issues? Yes, they look good. I'm not going to read through these manually, because they're kind of just derived from our conversation anyway; we can just see them all in the issues template once they're done. And in fact, it's asking me for permission for each individual one. So I'm actually going to go up into settings.json, which doesn't appear to exist; I think it's just settings.local.json. And I'm going to add a bash permission for gh issue create and just allow it (there's a sketch of this below). That should hopefully speed things up a little bit. Now, of course, I'm using GitHub here, but you could definitely feel free to use any kind of task tracker that has a CLI or an MCP server. Alrighty, it has created all five issues. And if we head to the UI here, we can see that they're all here. We've got the PRD at 57, and then 58, 59, 60, 61, 62. Let's just take a look at one of them, just for an example. Yeah, this is the one we grouped together, which is the dashboard summary card and the module completion toast. It really is pretty thin; it's just a couple of lines here on what we need to build as part of this plan, and it says go and see the original PRD for full details, plus a bunch of acceptance criteria. Now let's go and take a look at the QA one, which is the last one here. Nice. It's basically got a big list of checkboxes for us to check all of the work that was completed. I really like adding the QA plan in, actually, and showing it to the LLM too, because it shows it what the user is gonna do at the end; it provides another source of "you must do this, you must follow these criteria". Okay, so we're all set up. I think we're ready to run the AFK loop. I'm going to run it with 10 max iterations. I've actually recently gone to running it with 100 as my default, which I know is slightly mad, but honestly I just haven't had any issues with it yet. I'm going to go full AFK for this one. I will be back once it's done its iterations, unless I hit any weird issues on the way, which hopefully I should not. Okay, I've come back some time later and I now have only two open issues, which are the original PRD here and the human-in-the-loop QA task. We can see that it did complete after four iterations, we're up to nearly 400 tests, and it completed each issue in order.
So I'm now going to walk through this QA plan and just see if everything works. I'm going to first log in as a student here. I'm going to go to the dashboard. Well, that's an early fix: I think it hasn't run the migration, so I'll just run it locally. I'm going to run pnpm db migrate. And now I should just be able to refresh the page, and yes, here we go. Okay, we can see lots of UI has been added here. At this point during the QA process, what I would do is start looking at things that I don't like. For instance, I don't like the fact that these stars and flames and so on seem to be displaying in light mode. I want them to look like the ones on the sidebar over to the left here. And I don't like that there's a weird kind of padding at the top of this board here. So I would start putting these into GitHub issues. Well, let's go ahead with QA. I'm going to go into this course here, I'm going to go continue learning, and I'm going to complete a lesson by pressing up next, and I should get 10 XP. Yes I do, very cool. So I'm going to check this off here, that looks nice. Then: complete the same lesson again, verify no duplicate XP. So I'll go back to the Pagination and Filtering lesson. I can then press up next again, and I don't get duplicate XP. Now I've got to go and find a quiz to pass for the first time and verify that 5 XP is awarded. To do that I had to log in as James Park here, and I found one on the API Fundamentals HTTP Methods quiz. Let's see if I can get this one right. Okay, which HTTP method blah blah blah. A 404 status code? No, that's wrong. Which status code indicates successful resource creation? I think that's correct. Submit, and I get 5 XP. Beautiful. Now, there's nothing duller in the world than watching someone just follow a QA script, so I recommend you do as much as you feel comfortable doing, until you feel like you've got the picture, and then finish up to save your sanity, because this is a toy project after all. Now, the important thing about this Kanban skill, and the thing that I really like about this process, is that AFK and human-in-the-loop tasks are first class. We can plan huge tranches of work while not quite knowing how all of our taste is going to be applied yet. You could even modify this skill to break the QA stages up into sections, so you're only QAing certain bits instead of large chunks. And of course you can delegate this to different members of your team if you have a testing specialist. I think that the AFK versus human-in-the-loop designation allows you to build humans into your dev loop in a really intuitive way, and I hope that this exercise has taught you the same too. Nice work and I'll see you in the next one.
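For the permission tweak mentioned above, the allow-list entry lives in .claude/settings.local.json. A minimal sketch (the rule below uses the standard Claude Code permissions format, but double-check it against the current docs, and merge by hand if the file already has settings, since this heredoc overwrites it):

```bash
# Allow `gh issue create` without a permission prompt by adding it to the
# local settings allow-list. Written as a heredoc purely for illustration;
# you can edit the JSON file directly instead.
mkdir -p .claude
cat > .claude/settings.local.json <<'EOF'
{
  "permissions": {
    "allow": ["Bash(gh issue create:*)"]
  }
}
EOF
```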
329        </video>
330      </lesson>
331      <lesson title="08.04-research" name="Research Vs 'Doc Rot'">
332        <description>Talk about when you should use local research files for caching explore phases. But beware of the dangers of 'doc rot'.
333    
334    Reframe research as a great HITL process that allows you to apply your taste.</description>
335        <video title="Explainer">
336           There's a certain type of work that you need to do with agents that can sometimes result in really expensive explore phases. For instance, you might need to integrate with an external service that you don't have the documentation for locally. Or maybe it's a service where the documentation doesn't exist or isn't public. And in those cases, because the explore phase is so expensive, you might need to think about caching it. Now, what do I mean by this? Well, I've shown you this diagram a bunch of different times in different ways. It's kind of like a context window with an explore phase, an implementation phase, and a testing phase. Now, if we imagine that this is a kind of Ralph loop, where it's split out over multiple context windows, the longer and more expensive this explore phase is, the more you benefit from reducing it. In other words, this might be long because it needs to go to an external documentation site and fetch all the docs, and maybe it can't quite find the thing it's looking for and so it needs to just continually search. And over multiple runs, let's say we run this 10 times over a massive Ralph loop, this is just gonna get really, really expensive. So, wouldn't it be great if we could reduce the size of these somewhat? Because if we reduce the size of these explore phases, then we're going to end up with a more efficient loop, less token spend, and it's just going to be faster and better overall, because we're going to spend more time in the smart zone. One of the best ways to do that is to start off, before you actually go into a Ralph loop, with a research phase. In other words, you look at everything you need to do for the PRD, you take all of the external documentation, and you cache it into an asset locally. In other words, this is a research.md file. It might be research about a specific external library or service that you need to integrate with. It might be research into a specific approach you want to take, like, I don't know, SSE versus WebSockets or something. Either way, you do a bit of upfront work and then all of the Ralph loops afterwards get to benefit from it. Now, I consider this research to be a human-in-the-loop task. Because, especially if you're, let's say, trying to choose an external library to do a piece of work, your taste is gonna be really, really important to influence the direction of the LLM's research. If the external service has multiple ways of doing the same thing, for instance, maybe it has some streams and maybe it has some webhooks, then your taste is going to be really important for guiding which one goes into the research.md. Now, this approach won't be necessary for all tasks, because not all tasks need a long, complicated explore phase. However, I would say that as your codebase grows, and it maybe gets harder to find the right things to actually pull into the context, then a research phase at that point would be really, really handy too. A crucial part of this is that this research should have a life cycle that's really closely monitored. We don't want this to stick around in the repo forever, because we might end up choosing a different implementation or refactoring away to a different service. Or, if this research is about our own code, then our code might change. So it's really important to audit these research files in the same way that you audit your steering files, the same way that you audit your CLAUDE.md or your skills.
Because a bunch of stale markdown files in your repo are going to really hurt your LLM's performance. So that's what research is: a way of caching explore phases to make sure that you don't pay for expensive explore phases at the start of a bunch of Ralph loops. You've got to very carefully manage the files that it creates, but if you do, then you're going to get really, really nice results. Nice work, and I will see you in the next one.
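One low-tech way to keep that audit honest is to list research docs alongside the date of their last commit. A small sketch; the plans/ path and the naming convention are just the ones used in this course:

```bash
# List research docs with the date each was last touched in git, oldest
# first, so stale candidates surface at the top of the audit.
shopt -s nullglob  # don't loop over a literal pattern if nothing matches
for f in plans/*research*.md; do
  echo "$(git log -1 --format='%as' -- "$f")  $f"
done | sort
```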
337        </video>
338      </lesson>
339      <lesson title="08.05-trying-out-research" name="Trying Out Research">
340        <description>Try out a research phase on a task.</description>
341        <video title="Problem">
342           So now we know what research is, let's actually try doing some research. In the application, I want to build a live presence indicator on lessons: you can basically see all of the little icons of people who are also taking the course at the same time. To make this work, we'll probably need to figure out the external service, right, or figure out how we handle this on the back end. There are many, many, many different approaches we could take. We could take a simple polling approach, or do WebSockets, or use an external service. Because we need to know when someone quits out of the lesson, so that we can remove them from the UI, or when someone comes in, so we can add them back. So what I'm going to ask you to do is go into Claude Code inside the project. And you're going to say something like this: I want to research several different approaches to create a live presence indicator. This live presence indicator is going to show up for students who are intrigued by who else is looking at the lesson at the same time. I want you to iterate with me towards a solid approach and ask me some good questions about where we should go. And then I want you to look on the web to back things up and kind of validate our assumptions. At the end, I want you to create a research document inside the plans directory. This document should be focused on the implementation and on providing research on the approach and how we intend to use it. I don't have a skill prepared for this one; I don't tend to use a skill. I prefer to just rely on Claude Code to get it done. The important thing in this prompt is that I'm asking it for several different approaches so that we can compare them. I'm then going to use my human judgment to figure out the right one and then create a markdown document about it. All we're gonna do in this lesson is just create the markdown document. We're not actually going to go and implement it, although you're very welcome to turn that into a PRD, turn that into a Kanban board, and then go. But since we've already done that, I don't want to be too repetitive. As you do this, really work with the agent, ask it questions. We are just kind of co-researching together. Write down anything you notice, and if you have any questions, ping the Discord. Nice work and I'll see you in the solution.
343        </video>
344        <video title="Solution">
345           All right, let's give this a go and see how it does. We have our traditional explore phase as it goes and explores the project tech stack. Okay, it's done its exploration. It's now found that we have React Router v7, we have a full-stack SSR setup, and, crucially, we have no existing real-time infrastructure. Now here is where we get to apply our taste. You notice these questions, like scale expectations, fidelity versus simplicity, and what should the UI show? We're really in kind of PRD-writing territory here, with it, you know, grilling me about my expectations. In terms of scale expectations, I'm imagining something like 20 students viewing the same lesson is probably realistic. I'm going to say that I do want true real-time. The UI should show "Matt, Sarah, and two others are here". As you say, the cursor scroll position is overkill. I'm definitely open to adding a third-party service, and I would probably prefer that to self-hosting. Presence data doesn't need to persist; it can be ephemeral. We don't need to worry about unauthenticated users counting towards presence. Okay, let's ping this off and see what it does now. So, for the first time during this course, it has gone and launched some background agents. Those agents are researching PartyKit, Pusher, Ably, and Liveblocks, and it's going to share its initial thinking on the trade-offs of those. I know about some of these, but I've never actually used any of them. In fact, while I've been waiting, it looks like the agents have completed. So Supabase Realtime and Liveblocks don't look right. Then we've got Pusher and Ably and PartyKit. It looks like it thinks Ably is the best. It also has a super generous free tier, which I suppose is what I care about most in what is really a toy application. I'm just going to answer its last few questions here. I agree that Ably looks really nice. I think I would prefer a managed API key service. I'm comfortable with that api.pusherauth.ts idea. And this is probably the most important one: let's make Ably the recommended approach for the research document. So with that, I feel comfortable running it and we can get it to work. This is nice: it's now kicking off the agent again to research Ably in more depth. So this is a really nice in-depth exploration that we're able to do without using skills, actually. Okay, it's come back from its final explore phase and it's written a markdown file for us. This is that live presence indicator doc in the plans directory. And let's just take a quick look at the length there. What are we talking? We're talking a decent, hefty, nearly 300-line file. Let's take just a quick look at the top and see what the top-level items are. We've got some requirements here, we've got the recommended approach, the implementation design, and integration points in the existing codebase; that's really nice. The alternatives-considered section we might not need, but then again it might be useful as a kind of architectural decision record. I suppose it's useful context for us to decide why we picked Ably. So this, then, is the asset that would go into the PRD and into any planning that we then go on to do from here. We've even still got, you know, just 18% context used here, so we could just go and write a PRD from this point too, directly from the research. That way, all of the issues in the Kanban board and everything else would rely on this research. And I hope you can see now, from having done it, that the human-in-the-loop part is really important.
What I would do for a production application is go into each service, kind of investigate in depth, and verify what the AI was saying to me too. Because as part of research you often need to winnow things down and figure out exactly what approach you want to take, or at least which one you want to prioritize now and maybe test out and prototype. I'd love to hear if you noticed anything different from that, or if you have any great ideas for how you want to use research in the future. It's a really, really great tool. I don't use it every single time, but when I do, it feels really effective. I suppose the last thing I should mention is why we're including it in the local file system instead of on, like, a GitHub issue. The reason is that I want this to be really easily discoverable by the AI that's doing the implementing. We can certainly reference this file directly in the PRD and in the issues so it knows to go and read it. But if we later come back and we need to fix some bugs with the approach, then having the original research in the repo is going to be really useful. However, as I said before, beware of doc rot. If we eventually move away from Ably, or Ably changes their underlying SDK, then this research is going to go totally out of date, and it's actively going to harm the agent working on this codebase instead of helping it. I tend to be really aggressive in deleting this stuff, because you can always just check in the git history and pull it out if you need it. So I would say, once you finish the Ralph loop that's working on this feature, and once you finish the QA, then you can just delete it from the repo. Very nice work and I will see you in the next one.
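Since git history makes deletion safe, the cleanup and the escape hatch look roughly like this. The file name is hypothetical; use whatever your research doc is actually called:

```bash
# Once the feature has shipped and QA is done, delete the research doc so
# it can't rot and mislead future agents. (Hypothetical file name.)
git rm plans/live-presence-indicator.md
git commit -m "Remove stale research doc for live presence"

# If you ever need it back, it's still in history:
git log --oneline -- plans/live-presence-indicator.md
git show <commit>:plans/live-presence-indicator.md
```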
346        </video>
347      </lesson>
348      <lesson title="08.06-prototyping" name="Prototyping">
349        <description>Reframe prototyping as a great way to apply taste to design.</description>
350        <video title="Explainer">
351           Whenever you're embarking on a big build of something, especially something that doesn't have a precedent in the codebase, it's often extremely useful to get an early prototype in. This is a technique that software developers have been following for decades. You're not quite sure what the client meant, so you provide them with a prototype they can play with, they can mess about with, and give you feedback on. If you think of plan mode as you and the AI trying to work out in text what you're trying to build, prototyping allows you to actually make it concrete. And along with research, I tend to think of prototyping as a really key part of building good applications, because you as the human get to impose your taste on the AI before it then goes into the Ralph loop. I tend to use prototyping a lot for front-end design. I tend to get it to create multiple options for me on a throwaway route. This also means that when the Ralph loop goes and actually implements this AFK, it's got this prototype to reference. This is also really great for testing new tools. For instance, I might want to just spin up a new library or something, maybe even in a throwaway repo. I want the LLM to really put the new library through its paces and maybe test something out. And then having that janky little prototype that I can feed into the actual implementation is really great. And of course, it's also really great for testing new services. This is where research and prototyping can work really well together: you research a bunch of possible options, and then maybe you create prototypes for all of those options. Then you, as the human, can QA the prototypes, offer feedback, maybe iterate on them a little bit, and then you end up with the best possible solution that can feed into a PRD and then break down into issues for the Ralph loop. What we're trying to do here is flush out the unknown unknowns really early on in the process. Now, there are lots of places where prototyping is not useful. Bug fixing is the obvious one: we know what the desired behaviour is, we just aren't meeting that behaviour. Extending existing features usually also means you don't need a prototype, because you're usually just composing existing functionality together to build something new. If we've got a hundred modals in the application, we don't need to prototype what a new modal might look like; we can just throw it together from all of the bits we've got already. However, if you do want to redesign an entire system, then a prototype can be an amazing way to do it. Now, for me, I don't tend to use a prototyping skill to create the prototype. I'll probably just invoke my Do Work skill and do a normal kind of human-in-the-loop run with it. The reason is I want to be close to the prototype; I want to be offering advice and applying my taste. And once that's done, I'll create the PRD leaning on information in the prototype, and then create the Kanban board too, again referencing the prototype locally. So this is really like a form of research, but instead of researching external stuff and saving explore phases, you're actually making the implementation step simpler, and I cannot recommend it enough as an approach. So prototyping, even before the PRD, or as part of building the PRD, is essential for applying your taste to products. It's great for design, it's great for testing out services, and it's even great for software architecture.
Nice work, and I will see you in the next one.
352        </video>
353      </lesson>
354      <lesson title="08.07-trying-out-prototyping" name="Trying Out Prototyping">
355        <description>After the brief introduction to prototyping, this gives people a chance to try it out.</description>
356        <video title="Problem">
357           Okay, let's try out prototyping by taking the research that we did before and putting it into practice. Our research is currently all theoretical. Before we actually implement this AFK, I would definitely want to do a prototype. Because, first of all, I want to know whether the time I'm going to be waiting for the AFK run is going to be worth it. I need to see this thing working and fix any weird bugs with the implementation before we actually go and implement it. So to get this going, I'm gonna go into my Claude instance and I'm going to kick off a prompt. The first thing I'm going to do is specify the Do Work skill. Then I'm going to add in plans/live-presence-indicator, so I'm going to add in the research. Then I'm gonna say: I want to build a prototype of the research here. I want this on a throwaway route that is only visible to developers. I want to create a local asset that the agent that ends up implementing this in the real app can actually use. I think that's probably enough; I imagine it's going to ask me some more questions as we go. The important point is that I've specified I want it on a throwaway route that's only visible to devs, and I'm explaining the purpose of why I'm creating the prototype. If I was interested in design here, I would also say: okay, maybe give me lots of different options; you know, give me five options for how we could actually design this on the front end. But I've already got a pretty solid idea of how I want it to look, and that's kind of reflected in the design, so I don't really need to see multiple of these; I just need to see it working. I'll give you this prompt as something you can copy and paste below. Actually seeing this working will mean you will need to go and sign up to Ably and check it out. Or, if you just want to watch me do it, then I'll be there in the solution. If you do do it yourself, then write down anything you see, and if you have any questions, then head to the Discord. Nice work and I will see you in the solution.
358        </video>
359        <video title="Solution">
360           Okay, let's run this and see what we get to. It is exploring; of course it is. A little side note while it's exploring: I do still like having a Do Work skill, just because it allows you to do these kinds of human-in-the-loop mini sprints. Even when coding the very simple stuff, I still want everything I get out of a Do Work skill, like the feedback loops and any TDD stuff I'm doing, and I still want to apply my rigorous code standards even to these prototypes. Because it means that when the implementation goes to actually copy them, it'll be copying genuinely production-ready code. Okay, it feels like it's ready to go without me needing to do any planning. So it says pnpm add ably, okay. It now feels confident enough to start adding files, so I'm just going to let it do its thing. It's already at the point where it feels confident enough to run the types and the tests, and it's now committing. It's actually done something really nice here, which is it's made the prototype components reusable and exported them from the prototype. In other words, it looks like it's designed to be pulled out of the prototype and dropped immediately into the real app. This again means that the AFK implementation of this is going to be so simple. It's also given me a how-to-test-it section, which is nice. I'm gonna go and grab myself an Ably API key, and I'm gonna come back and see if this does what it says it's gonna do. I actually found this pretty confusing, so I'm going to ask it: give me a detailed step-by-step guide on how to acquire an Ably API key with exactly the scopes that you need. It's a nice way of just making my life a little bit easier. Okay, that's what I need: getting an Ably key for presence, get the Ably key, set the right capabilities. Beautiful. Having some actual research here means that we don't need to go out and explore this; it's basically just churning stuff out from the research. Beautiful. So I've now followed these steps and I've got an Ably API key in my .env; that's lovely. Okay, and I have logged in and I'm on /dev/presence, and I've logged in as Emma Wilson, a student. It's telling me: open this page in multiple browser tabs as different users via the dev UI to see presence updates in real time. So in the tab on the right, I have logged in on an incognito tab, so hopefully they won't share the same session. And then I'm going to go in as Olivia Martinez and navigate to /dev/presence. I connect, and we see on the left-hand side that we have Olivia Martinez too. If I then come out of here and go back, then the presence disappears instantly. So there we go: our presence setup is genuinely working. At this point I might want to iterate on the prototype a little bit more. I might want to make it a bit more like what I imagine it's going to look like in the lesson. But to be honest, all of the unknown unknowns are flushed out at this point. We know that the service works okay, and we know that we can connect to it easily. All of the rest of the stuff, I'm confident, can be worked out AFK. So there we go, that's the power of prototyping: putting a little bit of human-in-the-loop work in early in order to validate your assumptions about an unknown unknown. Anyone who's an experienced developer will know the value of this already, but I think it's important to categorise it as human-in-the-loop work.
We could have potentially said, okay, just go off AFK  and make this happen, then I'll review it later.  But the idea that you can sit and iterate with a prototype and just get into a tight  feedback loop with the AI is so beneficial.  and then pays dividends when you go to actually implement it.  nice work and I'll see you in the next one.
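For context, the core of an Ably presence integration is only a few calls.  Here is a minimal sketch using the ably-js SDK; the channel name, clientId, and placeholder key are assumptions for illustration, not the prototype's actual code:

```typescript
import * as Ably from 'ably';

// Presence requires a clientId so Ably knows who "you" are.
// The key placeholder and channel name are hypothetical.
const client = new Ably.Realtime({
  key: 'YOUR_ABLY_API_KEY', // in practice, read this from your .env
  clientId: 'emma-wilson',
});
const channel = client.channels.get('dev-presence');

async function main() {
  // Announce ourselves to everyone else on the channel...
  await channel.presence.enter({ name: 'Emma Wilson' });

  // ...react to members arriving and leaving...
  channel.presence.subscribe('enter', (member) =>
    console.log(`${member.data.name} joined`),
  );
  channel.presence.subscribe('leave', (member) =>
    console.log(`${member.data.name} left`),
  );

  // ...and grab a snapshot of who is already present.
  const members = await channel.presence.get();
  console.log(`${members.length} member(s) online`);
}

main();
```

Closing the tab (or calling channel.presence.leave()) is what makes a member vanish from everyone else's list, which is exactly the behaviour verified in the two-tab test above.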
361        </video>
362      </lesson>
363      <lesson title="08.08-designing-codebases-ai-loves" name="Designing Codebases AI Loves">
364        <description>Talk about the philosophy of designing codebases that AI loves, including cognitive load concerns, gray boxing, etc. Maybe lean into a DDD angle?
365    
366    Reframe software architecture as a taste thing.</description>
367        <video title="Explainer">
368           There's one extremely important place where human-in-the-loop can make a crucial difference to the success of your codebase, and that is in defining software architecture.  The current generation of agents are not good at thinking about software architecture, and they generally design systems that are hard for themselves to use.  So organizing codebases for AI is going to be an essential skill, and probably one of the highest-impact things you can do to your codebase.  Let's look at what AI sees when it looks at a bad codebase.  A bad codebase looks like this: dozens and dozens of modules, each with maybe some exported functions and some functionality inside them.  To find the correct code when exploring this codebase, the AI has to go through and understand which modules are imported by which other modules; it has to traverse the entire import-export chain, or at least large chunks of it.  What this basically means is that the AI can't, at a glance, understand where to find specific functionality.  And these tiny chunks are really hard to get good tests out of.  We know the importance of good feedback loops to AI, and the only way to test modules like these is to write some tests for this module on its own, then this module on its own, then this module on its own.  That means we don't get a good sense of how all of the pieces of the codebase work together.  And if one of these individual modules changes, then we need to change the tests as well, because they're too coupled to the shape of that tiny little module.  Now you, as a human, will probably be able to navigate this codebase okay, because you can start to develop an instinct for the purpose of these chunks of code.  You'll know that these files over here relate to authentication, those over there are just the API layer, these front-end components pertain to one feature, and those to another.  But the AI doesn't have the same advantage as you: it can't develop memories about the codebase in the same way that you can.  So when it goes into a bad codebase, it has to recreate all of the relationships from scratch.  So how do we make this codebase better?  How do we make it so that AI can detect and understand the purpose of the code without needing to develop a memory, and without needing to traverse the entire graph?  Well, we can restructure the code into larger chunks, and we can bake the intention of the code into its structure.  This is a concept that John Ousterhout, in A Philosophy of Software Design, calls deep modules.  Deep modules have two parts: an interface, which is a thin little barrier at the top, and the implementation.  The interface is a thin layer between this massive module and the rest of the world.  On an interface you might define functions or methods: this is how you call this function, only these properties are available, and so on.  And then behind the interface, you stack a bunch of implementation.  In fact, the more implementation you can fit behind a simple interface, the better, because that means different parts of your system are relying only on this simple interface, which should rarely change.
And so you can write tests for these huge units individually.  These tests target the public interface of the module, and because the public interface won't change that much, these tests also won't change that much.  You can really experiment with the implementation while keeping really detailed tests.  So when you have a codebase structured like this, with a bunch of deep modules behind simple interfaces, the AI can just read the interfaces and understand very quickly what each module is supposed to do.  And from a high level, it can get a very fast sense of what the entire codebase's purpose is, and where it might need to change things to adjust an implementation.  There's another massive bonus to this approach too, which is that it decreases the cognitive load on the developer.  If you're a developer maintaining this system, you have to keep the internal map in your head.  And because AI is changing this code all the time, moving these boxes in, out, and around, and doing weird things with them, that cognitive load is really, really hard to maintain, and it often leads to devs feeling really knackered when they use AI.  This, by the way, is really important to mention: cognitive load is something that I think everyone is experiencing more of since they started using AI.  You want to move so much faster, and you want to continually push with AI to maximally parallelize yourself and do as much work as possible.  So any strategy that can decrease cognitive load will make your life a lot easier.  Organising your codebase into deep modules not only allows the AI to navigate it better; it also means that the map you have to keep in your head is simpler.  Not only that, but you as the developer get to focus on the interfaces, focus on how these modules are designed, and you can largely let AI take care of the implementations.  I find myself doing much less code review on the stuff inside these boxes, because I know that it's tested.  I usually find myself reading the tests more than I do the implementation.  So I think of these as grey boxes.  I can look inside them (they're not fully black boxes), but by default I want the AI to handle what's in the implementation while I help design the interfaces.  Let me give you an example to make this a bit more concrete.  For those who don't know, I record all of my content on a custom video editor that I've built, and as you can imagine, it is really, really complicated.  There is the entire front end for the video editor, which has a bunch of modules associated with it, many of which I had automated tests for individually: a reducer that managed some state on the front end, some of the very simple selectors which pull state down from it, that sort of front-end stuff.  Then there was an even more complex back-end API, spread out over several endpoints; there were actually about 20 files focused on the back-end API, and all of them were really interrelated.  The back-end API then contacted a CLI that I use to monitor the files being created by OBS, so that I can detect the silence in them, blah, blah, blah; there's basically a CLI that handles that stuff.  And then there were a couple of related modules for saving stuff in the local database I have.
So I was getting a lot of bugs in my video editor, and a lot of them were about how these pieces interacted together.  In other words, I would try to fix a bug on the back end, but it turned out to be the way the front end was calling the back-end API.  Or it was the back-end API not calling the CLI properly, rather than the CLI itself having a bug.  So what I needed was some way to turn all of this into a single testable unit.  Because then, instead of breaking my brain trying to figure out where the bug was between these four possible places, I could just wrap the whole thing in an integration test and test it end to end.  And then I wouldn't necessarily need to care so much about the internal structure of everything inside.  So that's what I did: I wrapped the entire thing in a service.  I essentially built two parts: an SDK that could be called from the front end, and an API handler on the back end that took messages from that SDK.  This turned the entire thing into a single deep module.  And now, whenever anything goes wrong with my video editor, I'm able to do TDD on the entire editor flow.  Now, identifying that this was a problem, and identifying that wrapping it in a service and deepening the module was the solution: I don't think AI is capable of that right now, certainly not without a human prompting it.  So that means that you, as a developer, need to develop the language to talk about these deep modules with AI.  Designing your codebase to work this way needs to be embedded in your planning process.  Nice work, and I will see you in the next one.
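To make "deep module" concrete in code, here is a hypothetical sketch; the names are illustrative, not from the course repo or the video editor.  The entire public surface is one method, and everything behind it can be rewritten without touching callers or their tests:

```typescript
// A deep module: one small exported interface, lots hidden behind it.
export interface SilenceDetector {
  // The only thing callers ever see.
  detect(
    samples: Float32Array,
    sampleRateHz: number,
  ): Array<{ startSec: number; endSec: number }>;
}

export function createSilenceDetector(thresholdDb = -40): SilenceDetector {
  // Everything below is implementation detail: callers (and their tests)
  // depend only on detect(), so all of this can be rewritten freely.
  const threshold = Math.pow(10, thresholdDb / 20); // dB to amplitude ratio

  return {
    detect(samples, sampleRateHz) {
      const ranges: Array<{ startSec: number; endSec: number }> = [];
      let start: number | null = null;
      for (let i = 0; i < samples.length; i++) {
        const silent = Math.abs(samples[i]) < threshold;
        if (silent && start === null) {
          start = i; // a silent stretch begins
        } else if (!silent && start !== null) {
          ranges.push({ startSec: start / sampleRateHz, endSec: i / sampleRateHz });
          start = null;
        }
      }
      if (start !== null) {
        ranges.push({
          startSec: start / sampleRateHz,
          endSec: samples.length / sampleRateHz,
        });
      }
      return ranges;
    },
  };
}
```

Tests written against detect() survive any rewrite of the loop inside it, which is exactly the stability property described above.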
369        </video>
370      </lesson>
371      <lesson title="08.09-the-improve-my-codebase-skill" name="The 'Improve My Codebase' Skill">
372        <description>Have a go at running the Improve My Codebase skill using the language that we've now learned - interfaces, modules, deepening etc.</description>
373        <video title="Problem">
374           So now we know that codebase architecture is important, and we should have a good idea that deep modules increase the quality of your codebase and make it easier for AI to navigate.  But how do we actually make that concrete?  How do we take a bad codebase and turn it into a good codebase?  Well, I have provided an Improve Codebase Architecture skill for you.  This skill walks through your codebase and looks for opportunities to improve it.  It starts by defining what a deep module is: a small interface hiding a large implementation.  It then tells the AI to explore the codebase, using the Agent tool with subagent type Explore to navigate the codebase naturally.  It says to explore organically and note where you experience friction.  Where does understanding one concept require bouncing between multiple small files?  Where are modules so shallow that the interface is nearly as complex as the implementation?  That's really nice.  This one is such a common one: where have pure functions been extracted just for testability, while the real bugs hide in how they're called?  This is something that LLMs do all the time; they say, okay, let's make this testable by pulling out this one small bit, but you still get bugs in the real implementation.  Where do tightly coupled modules create integration risk in the seams between them?  I used the writer skill to put this together, and it honestly is on a tear here; this is just perfect.  Once the exploration is done, it presents some candidates.  It doesn't propose any interface designs yet; it just says, okay, there's this cluster of shared concepts, they're coupled together, we should probably group them into a deep module.  Then the user picks a candidate: that's step number three.  Step number four is to design multiple interfaces.  This one spawns multiple subagents in parallel, and each produces a radically different interface.  I've found that this gives you the best opportunity for getting lots of diversity, lots of diverse options that can then be pulled together into the right design.  Then it should, just below here, give you a recommendation: which design it thinks is strongest and why, and, if elements from different designs would combine well, a proposed hybrid.  Finally, the user picks an interface (or accepts the recommendation), and then it creates a refactor RFC as a GitHub issue.  You can then turn that RFC directly into a Kanban board, breaking it down into issues, if you're happy with it.  Now, the way I usually run this is to identify something myself that might be a problem, so you can say: improve my codebase architecture, looking at this specific area.  But for this one, I'm just gonna run it clean.  I would like you to do the same as me, and let's compare our notes in the solution.  Good luck, and I will see you there.
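To illustrate that "extracted just for testability" smell, here is a hypothetical TypeScript sketch (not from the course repo): the pure function is fully unit-tested and correct, but the untested caller wires it up wrong:

```typescript
// Extracted for testability, and fully covered by unit tests.
export function scoreQuiz(correct: number, total: number): number {
  return total === 0 ? 0 : Math.round((correct / total) * 100);
}

// But the caller is untested, and this is where the bug hides:
// it passes the number of questions *answered*, not questions *correct*.
export function submitAttempt(answered: number, correct: number, total: number) {
  const score = scoreQuiz(answered, total); // bug: should pass `correct`
  return { score, passed: score >= 70 };
}
```

The skill flags this shape because the unit tests give false confidence; the deep-module fix is to test at the boundary callers actually use, so the wiring itself is exercised.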
375        </video>
376        <video title="Solution">
377           Okay, I'm going to run this; it is probably going to do its exploration, and I'll see you once its exploration is complete.  Okay, it has come back with a bunch of options.  The first one is a quiz subsystem: merge scoring, CRUD, and XP awarding into one deep module.  This is the one I was kind of hoping it would pick.  When I put this codebase together, I decided to leave in one service that had been badly coded.  In other words, this quiz scoring service uses a raw database connection instead of the actual DB used elsewhere.  It has anys all over it: quiz data, answers, et cetera.  And it has no tests accompanying it.  So it's literally as if it had been written by a contractor, someone foreign to the codebase.  And so I do think it's a good idea to wrap this in a deeper module in order to test it all.  Honestly, there are lots of decent candidates here; the student progress system, for example, would consolidate a bunch of stuff into a single module.  This stuff can be pretty hard to read if you're not used to some of this language, like side effects and transactional domain operations and monoliths.  But if you have any questions about the language, I recommend you ask the LLM directly what it means when it uses certain phrases.  Learning these phrases will actually benefit you in the long run, because then you can prompt the LLM with those phrases.  I'm actually gonna zoom to the bottom and ask the LLM: which of these would you recommend?  This is like asking the waiter which meal he would most prefer.  Okay, and it's recommending number one because, as we can see, it's the only place in the codebase mixing raw better-sqlite3 SQL with Drizzle ORM.  I like that.  And this is freaking hilarious: "you're already looking at it, because the quiz scoring service is open in your editor."  Okay, I guess so.  Cool.  So at this point I'll just say: yes, let's explore number one, and let's see what it comes back with.  It's going to pull in all of the current quiz files to understand the exact interfaces, and it's now spawning several different design agents: one with a minimal interface, one with a flexible interface, and a third with a caller-optimized interface.  Now, interface design is a really, really deep topic, and it's something you will get better at as you design more of these systems.  But seeing multiple options from the AI is really useful just for developing taste.  And you don't necessarily need to get it perfectly right, because the next time you run Improve Codebase Architecture, you can target the thing you've done already and make improvements to it.  So I'll wait for a minute until these three come back.  Okay, we have some designs to look at.  Before I actually look at them, I'm going to zoom all the way to the bottom and see: okay, it recommends design B.  So before I read the rest, let's go and look at design B.  This is essentially a function called createQuizModule where you pass in the database, you pass in the scoring strategy (so that's dynamic), and then you pass in the quiz-pass XP.  And then it has a few methods on it: getQuizForLesson, getBestAttempt, getLatestAttempt, attemptHistory, getStats, submitAttempt, saveQuiz, and deleteQuiz, I see.  And the XP amount is completely configurable outside of it, so it doesn't own the XP amount.
This means that whether you're an instructor creating or updating a quiz, or a student taking the quiz, you're essentially calling the same module, which I really like.  That means all of the quiz stuff is in one place.  An AI needing to fix anything to do with the quiz will put it inside the quiz module.  So I dig this; I think this is great.  I can see the recommendation at the bottom too: it says to take design B's factory and injection, but borrow one thing from design A, and use an "XP awarded" boolean rather than an "XP awarded" number or null.  The XP amount is an implementation detail that the caller doesn't need.  I agree with that.  It's now asking if I want to create a GitHub issue, and I do.  And just like that, we have an issue ready to go.  The quiz subsystem is split across three shallow, tightly coupled services that callers must manually orchestrate, and it's in that manual orchestration that the bugs come up.  If we zoom down, we can see that there is a proposed interface: this absorbs all quiz concerns behind around eight methods.  This looks super duper good to me.  I really like that it gives you a before and after of how the caller actually calls these functions.  Before, it would have to grab the quiz by the lesson ID, then grab the quiz with the questions, then get the best attempt; and to figure out the result, it would have to compute the result and then award XP.  Afterwards, it has just two calls in the loader, where it gets the quiz and, if there is a quiz, the best attempt; and in the action, it's now just one call instead of two.  So overall, this is a relatively small change: we're just combining more functionality into a larger testable unit.  And over time, as you do this more and more to your codebase, and as you bake this thinking into your planning, you'll end up with a codebase that's more testable, where the feedback loops are stronger, and that's more easily navigable.  And in theory, we could take this quiz module and just turn it into a grey box.  You as the user can now test it completely from the outside, and you don't need to worry about what's inside.  As long as the XP is awarded, as long as the quiz gets passed, and as long as all those functions work and test correctly, you're good.  So hopefully that makes the idea of what a good codebase looks like more concrete.  Feel free to grab that skill so that you can turn it loose on your own codebases and see what comes out.  Nice work, and I will see you in the next one.
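Here is a sketch of design B as I understand it from the walkthrough.  The method list comes from the transcript, but every signature and type below is a guess for illustration, not the actual generated design:

```typescript
// Type stubs so the sketch stands alone; the real shapes live in the repo.
type Quiz = { id: string; questions: unknown[] };
type Attempt = { score: number; passed: boolean };
type QuizStats = Record<string, number>;
type Database = unknown;
type ScoringStrategy = (answers: unknown[]) => number;

interface QuizModule {
  getQuizForLesson(lessonId: string): Promise<Quiz | null>;
  getBestAttempt(quizId: string, studentId: string): Promise<Attempt | null>;
  getLatestAttempt(quizId: string, studentId: string): Promise<Attempt | null>;
  attemptHistory(quizId: string, studentId: string): Promise<Attempt[]>;
  getStats(quizId: string): Promise<QuizStats>;
  // xpAwarded is a boolean: the amount stays an implementation detail.
  submitAttempt(input: {
    quizId: string;
    studentId: string;
    answers: unknown[];
  }): Promise<{ score: number; passed: boolean; xpAwarded: boolean }>;
  saveQuiz(quiz: Partial<Quiz>): Promise<Quiz>;
  deleteQuiz(quizId: string): Promise<void>;
}

// The factory injects everything the module must not own.
declare function createQuizModule(opts: {
  db: Database;             // the shared DB instance, no raw SQL inside
  scoring: ScoringStrategy; // pluggable scoring
  quizPassXp: number;       // XP config lives outside the module
}): QuizModule;

// After the refactor, the loader is two calls...
declare const quizModule: QuizModule;
const quiz = await quizModule.getQuizForLesson('lesson-123');
const best = quiz && (await quizModule.getBestAttempt(quiz.id, 'student-1'));

// ...and the action is one.
const result = await quizModule.submitAttempt({
  quizId: 'quiz-1',
  studentId: 'student-1',
  answers: [],
});
```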
378        </video>
379      </lesson>
380      <lesson title="08.10-adding-module-awareness-to-our-planprd-skill" name="Adding Module Awareness To Our Plan/PRD Skill">
381        <description>Services and interfaces, and being specific about interface design.</description>
382        <video title="Explainer">
383           Now we should understand a little bit more about what the cure for a bad codebase is: a human working with an AI, running the Improve Codebase Architecture skill, applying their judgment, and finding opportunities for deepening modules within the codebase.  So if that's the cure, how do you prevent a codebase from going bad in the first place?  Well, I would say that whenever you're planning new features, especially large features such as in a PRD, you need to build module awareness into what you're doing.  And that's what I've done inside the Write A PRD skill.  Before, we only had four sections here, but I've added an extra one just before section five.  I've told it to sketch out the major modules that you will need to build or modify to complete the implementation, and to actively look for opportunities to extract deep modules that can be tested in isolation.  A deep module encapsulates a lot of functionality behind a simple, testable interface which rarely changes.  And step four is to check with the user that these modules match their expectations, and to check which modules they want tests written for, and at what boundary.  So this brings module awareness into the PRD construction, and it then gets put into the implementation decisions, as in the example: we see the modules that will be built or modified, and the interfaces of those modules.  So it brings those modules and interfaces into the planning process.  If you fancy it, feel free to use this updated Write A PRD skill to try something out in the codebase, or jump in the Discord to discuss how you might improve this skill even further and bring some of your own architectural knowledge into how you like to build codebases.  Nice work, and I will see you in the next one.
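As a hypothetical example, the new PRD section might come out looking something like this; the content is invented to show the shape, not the skill's actual output:

```
## Modules and Interfaces

- quiz module (new, deep): absorbs scoring, CRUD, and XP awarding.
  - Interface: getQuizForLesson, getBestAttempt, submitAttempt, saveQuiz, deleteQuiz
  - Tests: integration tests at the public interface; none against internals
- lesson route (modified): replaces three service calls with the quiz module.
  - Tests: none new; covered by the quiz module boundary tests
```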
384        </video>
385      </lesson>
386    </section>
387  </course>