Proposal:_Use_Trunk-Based_Development.md
1 # <span class="c2">Proposal: use trunk-based development</span> 2 3 <span class="c0">Written by Oskar Thorén on September 16, 2017. Updated 4 September 23.</span> 5 6 <span class="c0"></span> 7 8 <span class="c0">We should practice trunk-based development in 9 status-react and status-go repos.</span> 10 11 <span class="c0"></span> 12 13 <span class="c0">Trunk-based development (TBD) is a source-control 14 branching model, where developers collaborate on code in a single branch 15 called ‘trunk’ (‘develop’ for us), resist any pressure to create other 16 long-lived development branches by employing documented techniques, 17 avoid merge hell, do not break the build, and live happily ever 18 after.</span> 19 20 ## <span class="c11 c12">Why?</span> 21 22 <span class="c0">1) We have a lot of issues related to breaking builds, 23 long-lived branches that other branches depend on, wasting precious time 24 on coordinating and doing manual builds from ad hoc branches to patch 25 things up.</span> 26 27 <span class="c0"></span> 28 29 <span class="c0">A breaking build on 'develop' (i.e. change that causes 30 the app to stop working in a major way) should be flaming emergency and 31 you stop everything to fix it, even if it means reverting 32 commits.</span> 33 34 <span class="c0"></span> 35 36 <span class="c0">2) We already have a great QA team, a PR based 37 workflow, a (semi-)-functioning CI, and feature flags. All of this is a 38 perfect fit for TBD.</span> 39 40 <span class="c0"></span> 41 42 <span class="c4">Objection: We merged a few bugs into develop with 43 changes related to wallet. If we use TBD such problems will increase. If 44 we merge want this not to happen we have to not merge stuff into develop 45 before merging, but this makes fast reviews (tens of minutes) not 46 possible.</span> 47 48 <span class="c4"></span> 49 50 <span class="c4">Reply: It is OK to push a few bugs under WIP if nothing 51 breaking like crazy. If it's a big/UX related change, QA can note known 52 regressions/limitations of things hidden behind flag. Once flag is about 53 to be turned on for release, you do more extensive checking. You don't 54 need auto testing for this.</span> 55 56 <span class="c0"></span> 57 58 <span class="c0">3) TBD encourages shipping code, breaking things into 59 smaller parts and getting things into trunk/develop as soon as 60 possible.</span> 61 62 <span class="c0"></span> 63 64 <span class="c4">Objection: Commit every 24h isn't a good policy because 65 it doesn't guarantee meaningful smaller parts.</span> 66 67 <span class="c4"></span> 68 69 <span class="c10">Reply: You don't need "commit every 24h", just if you 70 open a PR and it hasn't been merged in two days it is most likely (a) 71 not ready yet and/or (b) too big. Minimal utility can be for other 72 developers as 73 well.</span> 74 75 ## <span class="c11 c12">I thought we kind of did this already; examples?</span> 76 77 <span class="c0">- \`status-react\` has never depended on \`status-go\` 78 develop branch as far as I can tell. This, in addition to lack of CI for 79 status-go, leads to cascading effects for status-react.</span> 80 81 <span class="c0"></span> 82 83 <span class="c4">Objection: I don't see a big problem in this, most 84 cascading effects come from go-ethereum, which we don't control.</span> 85 86 <span class="c4"></span> 87 88 <span class="c4">Reply: It (status-go HEAD breaking for a month) can be 89 solved by only actually doing the upgrade when all breaking changes have 90 been incorporated. This too can be a flag and one commit. We have wasted 91 a lot of time on this status-react if you sum it all up I think, and 92 devs can't use bugfixes like crashing in debugger.</span> 93 94 <span class="c4"></span> 95 96 <span class="c4">Counter-reply: This will require a lot of coordination 97 on status-react and status-go side.</span> 98 99 <span class="c4"></span> 100 101 <span class="c4">What if we have this situation like we have with 102 \`upgrade-status-go\`, when we can’t merge it because app will be fucked 103 (now eth.sendTransaction calls are blocking), but you need some changes 104 from this branch (“like crashing in debugger”, etc). TBD definitely will 105 not work if we will use it on our side here, i doubt that it will be 106 possible to choose dependency by flag (and even if so, fix for crushing 107 debugger will not be available), so then flags should be</span> 108 109 <span class="c4">used on status-go side in order to achieve this. So we 110 still will have a loop between status-react and status-go: they have to 111 add flag (if possible), only then we can proceed.</span> 112 113 <span class="c4"></span> 114 115 <span class="c4">And then when it comes to using flags for bugs which 116 appears after they rebase on new version of go-ethereum and something is 117 not working, i hardly imagine how this is doable. I mean flag which will 118 choose between pieces of code before and after rebasing, it can be 119 impossible to separate these pieces at all.</span> 120 121 <span class="c4"></span> 122 123 <span class="c4">Counter-counter-reply: We can use branch by abstraction 124 as well <https://trunkbaseddevelopment.com/branch-by-abstraction/> and 125 possibly have the two in parallel. I guess it's a matter of extensive QA 126 on the upgrade-go-ethereum short lived branch and push bugfixes _to 127 trunk_. Google does TBD and has a single repo, and I'm guessing they 128 have worse dependency issues than we do, so it's not impossible. Then 129 it's a question whether that is _worth it_ and maybe we should have a 130 basic post mortem on what went wrong and how we can avoid this in 131 future. But it's certainly doable if we think it is worth it. That 132 doesn't mean it is definitely worth it in every possible circumstance 133 for our org/code base etc, but we should make trade-off explicit.</span> 134 135 <span class="c0"></span> 136 137 <span class="c0">- Long-lived PRs: 5+ open PRs that are older than a 138 week in both repos.</span> 139 140 <span class="c0"></span> 141 142 <span class="c4">Objection: On the other hand we keep not finished, not 143 completely meaningful piece of code outside codebase. Splitting things 144 in meaningful parts is not always straightforward.</span> 145 146 <span class="c4"></span> 147 148 <span class="c4">Reply: It's OK to keep WIP work in code base, it makes 149 it easier for someone to take over too. Of course it should always have 150 some utility to someone, not just random stuff. Utility can be for other 151 developers (yourself in future or other team members) too.</span> 152 153 <span class="c0"></span> 154 155 <span class="c0">- Long-lived branches: We use ‘epic/wallet’ when QA 156 flow and flag are already in place.</span> 157 158 <span class="c0"></span> 159 160 <span class="c4">Objection: We don't want WIP commits to go into develop 161 before QA, so we use epic branch to decrease number of testing 162 iterations.</span> 163 164 <span class="c4"></span> 165 166 <span class="c4">Reply: We can decrease number of testing iterations 167 just using flags. This is the exact same thing for QA especially if 168 everyone is running trunk and notice fatal errors immediately. The exact 169 involvement of QA for each WIP commit is up to QA and each PR.</span> 170 171 <span class="c0"></span> 172 173 <span class="c0">This is not any individual’s fault; it is a systemic 174 issue stemming from a lack of coherent vision and application of how we 175 do branching. Will there be issues? Sure, but nothing that can’t be 176 solved with disciplined application of known practices.</span> 177 178 ## <span class="c11 c12">What would this mean in practice for us?</span> 179 180 <span class="c0">- Ensure that \`develop\` is always working (tools we 181 have: CI, flags, incremental changes).</span> 182 183 <span class="c0"></span> 184 185 <span class="c4">Objection: Develop is fine, we also will need to be 186 able to build different versions of develop afaiu (based on 187 flags).</span> 188 189 <span class="c4"></span> 190 191 <span class="c4">Reply: Possibly. To start with all flags should be on 192 by default for nightlies, then for release builds almost all should be 193 off (depends on scope of release).</span> 194 195 <span class="c0"></span> 196 197 <span class="c0">- Be aggressive about not allowing long-lived branches 198 or PRs or weird branch dependencies.</span> 199 200 <span class="c0"></span> 201 202 <span class="c4">Comment on weird dependencies: we have several PRs 203 waiting for status-go upgrade, the PN one being biggest offender since 204 it is against a PR that was first opened a month ago and it requires a 205 lot of coordination despite not being directly tied to it, if breaking 206 things were hidden behind flag in status-go then this commit could've 207 happened anytime and PN branch would have debugger fix and PN public API 208 available to consume.</span> 209 210 <span class="c0"></span> 211 212 <span class="c0">- For releases, don’t do code freeze, instead 213 short-lived release branches + cherry pick from develop.</span> 214 215 <span class="c0"></span> 216 217 <span class="c4">Comment on releases: This should only be the concern of 218 the people actively working on creating release artifact, testing and 219 fixing regression. It has little impact on other devs WIP on develop. 220 </span> 221 222 <span class="c0"></span> 223 224 <span class="c15">For more details on TBD, read here: 225 </span><span class="c13 c15">[<https://trunkbaseddevelopment.com/>](https://www.google.com/url?q=https://trunkbaseddevelopment.com/&sa=D&ust=1507366105390000&usg=AFQjCNEm3ZdxQRxb4P3FHLexY1Corp8oPw)</span> 226 227 ## <span class="c11 c12"></span> 228 229 <span class="c0"></span> 230 231 ----- 232 233 ## <span class="c11 c12"></span> 234 235 ## <span class="c11 c12">General comments</span> 236 237 <span class="c4">Comment: I see serious advantages in this approach, 238 because actually we have suffered from epic branches. Though still some 239 drawbacks \[captured in objections\]</span> 240 241 <span class="c4"></span> 242 243 <span class="c11 c7">Objection: Potential flag-based mess in 244 code.</span> 245 246 <span class="c4"></span> 247 248 <span class="c4">Reply: Yes we should be aggressive in deleting this, 249 can put 'best before' dates on branches - this requires some discipline 250 for sure. Also should get rid of flags as soon as they are in 251 release.</span> 252 253 <span class="c4"></span> 254 255 <span class="c11 c7">Objection: Resolving conflicts, this thing can’t 256 disappear.</span> 257 258 <span class="c4"></span> 259 260 <span class="c4">Reply: Why? with short lived PRs everyone will be more 261 up to date to trunk-state of the world, the distance between developers 262 will be shorter and deltas smaller. See for example that huge 263 refactoring PR, and a few things that depend on it and have either been 264 merged beforehand or after, both requiring extra conflict resolution 265 because of lack of shared work.</span> 266 267 <span class="c4"></span> 268 269 <span class="c7 c11">Objection: Coordination between status-go and 270 status-react.</span> 271 272 <span class="c4"></span> 273 274 <span class="c4">Reply: maybe but I'm not sure this is true if we do it 275 right (perhaps it is too expensive to do it right). In Rich Hickey's 276 words, "I don't want to know" - status-react should be able to depend on 277 status-go develop/HEAD (possibly last 'good commit' if bug is 278 introduced) and then new _capabilities_ are introduced in status-go. 279 Things being breaking changes can exists concurrently and then be 280 switched off, like deprecation notice kind of. I really like Rich 281 Hickey's philosophy of 'change' not being a useful word, either you 282 accrete, add to something and _provide_ more, or you break something 283 by introducing a bug or _requiring_ more. I think this lesson can be 284 applied outside of a Clojure ecosystem, even with crazy stuff like 285 go-ethereum vendoring, but it requires discipline</span> 286 287 <span class="c4"></span> 288 289 <span class="c4">There is a general point though that you hint at: a lot 290 of complexity status-go side are things I'm blissfully unaware of, so 291 the trade-off discussion there is better had there. We can still do 292 things on status-react side, though.</span> 293 294 <span class="c4"></span> 295 296 <span class="c7">Objection: Also we will need to redefine manual testing 297 process,</span><span class="c4"> so imagine task like wallet. If we add 298 commits with not finished features to trunk, we will add bugs for sure 299 (especially if new changes touch older code), so we still need to test 300 wallet related stuff periodically, we can’t wait for 100% 301 implementation. So then QA folks should be able to build develop with 302 flags they choose. It should be relatively easily doable with jenkins. 303 But then when it comes to flags existing in both status-react and 304 status-go, or just in status-go, it means that they have to build 305 status-go first with these flags and then link it with status-react, 306 only then build app. So we need to provide such tools. Also the list of 307 flags should be defined somewhere and well explained.</span> 308 309 <span class="c4"></span> 310 311 <span class="c4">Reply: QA can already test WIP wallet stuff with 312 Jenkins PR, no? It is true we might want something like develop with WIP 313 and without WIP, at least as we get closer to release. I don't know 314 about flags in status-go or why this would directly impact status-react, 315 but maybe I'm missing something. Agreed regarding list of being well 316 defined.</span> 317 318 <span class="c4"></span> 319 320 <span class="c4">Concrete example from stuff I'm working on now (PR: 321 <https://github.com/status-im/status-react/pull/1872>) in terms of 322 splitting work: I can submit PR that targets develop, doesn't have iOS 323 bindings/ mock call \`Notify\` and prints fcm-token and payload. Then 324 tester instruction is similar but deliverable is: add contact and see 325 their fcm-token in Testfairy logs, download logs and do \`grep 326 XXX\`.</span> 327 328 <span class="c4"></span> 329 330 <span class="c4">Then next PR is dependent on status-go public API 331 changes, ideally this would be available for consumption straightaway 332 when API is updated in trunk but if not, wait until this is provided by 333 status-go and then submit iOS/Java bindings. Possibly short-lived 334 minimal branch for QA to test flow that binds to specific unsupported 335 status-go branch, but this is where you maybe get some cascading 336 effects.</span> 337 338 <span class="c4"></span> 339 340 <span class="c4">Counter-reply: We either have to always have already 341 compiled all invariants of flags for status-go (if don’t use only one 342 WIP flag), or to compile status-go as a step of status-react’s 343 build.</span> 344 345 <span class="c4"></span> 346 347 <span class="c4">The first case: There is some new feature in 348 status-react, like in that \`upgrade-status-go\` branch. Branch contains 349 changes related to new Whisper v5 api, and we definitely can abstract 350 this behind some protocol in cljs. So, we do this, test it with 351 status-go and find out that there is a breaking bug on status-go side. 352 We would commit this code to status-react’s trunk (it makes sense in 353 TBD, feature is not finished but we hide it anyway and don’t wait too 354 long for committing changes), but we can’t hide status-go related stuff 355 using the same flag, we need to add another flag in status-go first 356 which will hide new Whisper v5 api and provide v2 instead.</span> 357 358 <span class="c4"></span> 359 360 <span class="c4">The second case is similar, but related to that 361 go-ethereum rebasing thing: what if breaking change can’t be hidden by 362 flag at all. So we have short living branch in status-react, and we can 363 hide that feature but not status-go’s version on which it 364 depends.</span> 365 366 <span class="c4"></span> 367 368 <span class="c7">Objection: Seems obvious, the only reason we have 369 long-lived branches is due to being a new 370 team.</span><span class="c4"> I even wrote something like that before: 371 <https://hackmd.io/s/rkgqzRWIb>. I'm currently talking about status-go 372 as I've never had a chance to work to status-react to some 373 representative degree. I mean all our developers are middle to strong 374 but the team is weak just because we've just started working together on 375 the project nobody knows. This makes us do weird things and keep 376 long-lived branches.</span> 377 378 <span class="c4"></span> 379 380 <span class="c4">Reply: It might seem obvious yes, but we are not doing 381 it in practice. First step is getting everyone on the same page, I 382 think.</span> 383 384 <span class="c4"></span> 385 386 <span class="c4">Counter-reply: I thought everybody's already been on 387 the same page as I've already shared that document. The only thing 388 missing now is more aggressive approach to this requirement.</span> 389 390 <span class="c4"></span> 391 392 <span class="c7">Objection: Changes to stabilise everything shouldn't be 393 made at once,</span><span class="c4"> it's more flowing and gradual 394 process. Currently, everybody is occupied with the release and sees 395 broken and unreliable tests. Also, refactoring has come into play 396 recently. Once we have room to think about something else, we'll start 397 doing it. For example, everybody sees a lot of long-lived pull requests 398 and everybody understands that it shouldn't be so but we already have a 399 lot of more actual problems to tackle. From the status-go side we 400 already have more important things to tackle right now.</span> 401 402 <span class="c4"></span> 403 404 <span class="c4">Reply: As soon as release is out, I don't see why 405 status-react can't just depend on status-go develop straight away (even 406 if just a fixed commit). What downside would that have?</span> 407 408 <span class="c4"></span> 409 410 <span class="c4">Counter-reply: It can. status-react should just say: we 411 now depend on \`develop\`. Please, guys, keep it stable. This 412 requirement for us is already a good start.</span> 413 414 <span class="c4"></span> 415 416 <span class="c7">Objection: Let's start from status-react. Because these 417 problems are not so critical now for 418 status-go.</span><span class="c4"> They'll get more priority once we 419 have proper tests in place and the requirement from status-react to keep 420 status-go \`develop\` stable. Can open issue.</span> 421 422 <span class="c4"></span> 423 424 <span class="c10">Reply: OK. 425 </span><span class="c10 c13">[<https://github.com/status-im/status-go/issues/356>](https://www.google.com/url?q=https://github.com/status-im/status-go/issues/356&sa=D&ust=1507366105396000&usg=AFQjCNGtT8SN6Cm3tvdfddr3L6cYDSv3eA)</span><span class="c10"> 426 Then I guess the other thing is about long lived PRs, but that's more 427 team-specific.</span>