# Shrinking `.wasm` Code Size

This section will teach you how to optimize your `.wasm` build for a small code
size footprint, and how to identify opportunities to change your Rust source
such that less `.wasm` code is emitted.

## Why Care About Code Size?

When serving a `.wasm` file over the network, the smaller it is, the faster the
client can download it. Faster `.wasm` downloads lead to faster page load times,
and that leads to happier users.

However, it's important to remember that code size likely isn't the
be-all-end-all metric you're interested in, but rather something much more vague
and hard to measure like "time to first interaction". While code size plays a
large role in this measurement (you can't do anything if you don't even have all
the code yet!) it's not the only factor.

WebAssembly is typically served to users gzip'd, so you'll want to be sure to
compare differences in gzip'd size for transfer times over the wire. Also keep
in mind that the WebAssembly binary format is quite amenable to gzip
compression, often getting over 50% reductions in size.

Furthermore, WebAssembly's binary format is optimized for very fast parsing and
processing. Browsers nowadays have "baseline compilers" which parse WebAssembly
and emit compiled code as fast as wasm can come in over the network. This means
that [if you're using `instantiateStreaming`][hacks] the second the Web request
is done the WebAssembly module is probably ready to go. JavaScript, on the other
hand, can often take longer to not only parse but also get up to speed with JIT
compilation and such.

And finally, remember that WebAssembly typically executes faster than
JavaScript. You'll want to be sure to measure runtime comparisons between
JavaScript and WebAssembly to factor that into how important code size is.

All this to say: don't be dismayed immediately if your `.wasm` file is larger
than expected! Code size may end up being only one of many factors in the
end-to-end story. Comparisons between JavaScript and WebAssembly that only look
at code size are missing the forest for the trees.

[hacks]: https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/

## Optimizing Builds for Code Size

There are a bunch of configuration options we can use to get `rustc` to make
smaller `.wasm` binaries. In some cases, we are trading longer compile times for
smaller `.wasm` sizes. In other cases, we are trading runtime speed of the
WebAssembly for smaller code size. We should be cognizant of the trade-offs of
each option, and in the cases where we trade runtime speed for code size,
profile and measure to make an informed decision about whether the trade is
worth it.

### Compiling with Link Time Optimizations (LTO)

In `Cargo.toml`, add `lto = true` in the `[profile.release]` section:

```toml
[profile.release]
lto = true
```

This gives LLVM many more opportunities to inline and prune functions. Not only
will it make the `.wasm` smaller, but it will also make it faster at runtime!
The downside is that compilation will take longer.

### Tell LLVM to Optimize for Size Instead of Speed

LLVM's optimization passes are tuned to improve speed, not size, by default. We
can change the goal to code size by modifying the `[profile.release]` section in
`Cargo.toml` to this:

```toml
[profile.release]
opt-level = 's'
```

Or, to optimize even more aggressively for size, at further potential speed
costs:

```toml
[profile.release]
opt-level = 'z'
```

Note that, surprisingly enough, `opt-level = "s"` can sometimes result in
smaller binaries than `opt-level = "z"`. Always measure!
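
These options can be combined. As a starting point, a size-focused release
profile might look like the following (a sketch only; `lto` and `opt-level`
interact differently across codebases, so profile and measure for your own
crate):

```toml
[profile.release]
# Trade longer compile times for smaller, often faster, code.
lto = true
# Optimize aggressively for size; try 's' as well and compare.
opt-level = 'z'
```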

### Use the `wasm-opt` Tool

The [Binaryen][] toolkit is a collection of WebAssembly-specific compiler
tools. It goes much further than LLVM's WebAssembly backend does, and using its
`wasm-opt` tool to post-process a `.wasm` binary generated by LLVM can often get
another 15-20% savings on code size. It will often produce runtime speed-ups at
the same time!

```bash
# Optimize for size.
wasm-opt -Os -o output.wasm input.wasm

# Optimize aggressively for size.
wasm-opt -Oz -o output.wasm input.wasm

# Optimize for speed.
wasm-opt -O -o output.wasm input.wasm

# Optimize aggressively for speed.
wasm-opt -O3 -o output.wasm input.wasm
```

[Binaryen]: https://github.com/WebAssembly/binaryen

### Notes about Debug Information

One of the biggest contributors to wasm binary size can be debug information
and the `names` section of the wasm binary. The `wasm-pack` tool, however,
removes debuginfo by default. Additionally, `wasm-opt` removes the `names`
section by default unless `-g` is also specified.

This means that if you follow the above steps, the wasm binary should by
default contain neither debuginfo nor the names section. If, however, you are
otherwise preserving this debug information in the wasm binary, be sure to be
mindful of this!

## Size Profiling

If tweaking build configurations to optimize for code size isn't resulting in a
small enough `.wasm` binary, it is time to do some profiling to see where the
remaining code size is coming from.

> ⚡ Just like how we let time profiling guide our speed-up efforts, we want to
> let size profiling guide our code size shrinking efforts. Fail to do this and
> you risk wasting your own time!

### The `twiggy` Code Size Profiler

[`twiggy` is a code size profiler][twiggy] that supports WebAssembly as
input.
It analyzes a binary's call graph to answer questions like:

* Why was this function included in the binary in the first place?

* What is the *retained size* of this function? I.e., how much space would be
  saved if I removed it and all the functions that become dead code after its
  removal?

<style>
/* For whatever reason, the default mdbook fonts break with the
following box-drawing characters, hence the manual style. */
pre, code {
  font-family: "SFMono-Regular",Consolas,"Liberation Mono",Menlo,Courier,monospace;
}
</style>

```text
$ twiggy top -n 20 pkg/wasm_game_of_life_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────
          9158 ┊    19.65% ┊ "function names" subsection
          3251 ┊     6.98% ┊ dlmalloc::dlmalloc::Dlmalloc::malloc::h632d10c184fef6e8
          2510 ┊     5.39% ┊ <str as core::fmt::Debug>::fmt::he0d87479d1c208ea
          1737 ┊     3.73% ┊ data[0]
          1574 ┊     3.38% ┊ data[3]
          1524 ┊     3.27% ┊ core::fmt::Formatter::pad::h6825605b326ea2c5
          1413 ┊     3.03% ┊ std::panicking::rust_panic_with_hook::h1d3660f2e339513d
          1200 ┊     2.57% ┊ core::fmt::Formatter::pad_integral::h06996c5859a57ced
          1131 ┊     2.43% ┊ core::str::slice_error_fail::h6da90c14857ae01b
          1051 ┊     2.26% ┊ core::fmt::write::h03ff8c7a2f3a9605
           931 ┊     2.00% ┊ data[4]
           864 ┊     1.85% ┊ dlmalloc::dlmalloc::Dlmalloc::free::h27b781e3b06bdb05
           841 ┊     1.80% ┊ <char as core::fmt::Debug>::fmt::h07742d9f4a8c56f2
           813 ┊     1.74% ┊ __rust_realloc
           708 ┊     1.52% ┊ core::slice::memchr::memchr::h6243a1b2885fdb85
           678 ┊     1.45% ┊ <core::fmt::builders::PadAdapter<'a> as core::fmt::Write>::write_str::h96b72fb7457d3062
           631 ┊     1.35% ┊ universe_tick
           631 ┊     1.35% ┊ dlmalloc::dlmalloc::Dlmalloc::dispose_chunk::hae6c5c8634e575b8
           514 ┊     1.10% ┊ std::panicking::default_hook::{{closure}}::hfae0c204085471d5
           503 ┊     1.08% ┊ <&'a T as core::fmt::Debug>::fmt::hba207e4f7abaece6
```

[twiggy]: https://github.com/rustwasm/twiggy

### Manually Inspecting LLVM-IR

LLVM-IR is the final intermediate representation in the compiler toolchain
before LLVM generates WebAssembly. Therefore, it is very similar to the
WebAssembly that is ultimately emitted. More LLVM-IR generally means more
`.wasm` size, and if a function takes up 25% of the LLVM-IR, then it generally
will take up 25% of the `.wasm`. While these numbers only hold in general, the
LLVM-IR has crucial information that is not present in the `.wasm` (because of
WebAssembly's lack of a debugging format like DWARF): which subroutines were
inlined into a given function.

You can generate LLVM-IR with this `cargo` command:

```
cargo rustc --release -- --emit llvm-ir
```

Then, you can use `find` to locate the `.ll` file containing the LLVM-IR in
`cargo`'s `target` directory:

```
find target/release -type f -name '*.ll'
```

#### References

* [LLVM Language Reference Manual](https://llvm.org/docs/LangRef.html)

## More Invasive Tools and Techniques

Tweaking build configurations to get smaller `.wasm` binaries is pretty hands
off. When you need to go the extra mile, however, be prepared to use more
invasive techniques, like rewriting source code to avoid bloat. What follows is
a collection of get-your-hands-dirty techniques you can apply to get smaller
code sizes.

### Avoid String Formatting

`format!`, `to_string`, etc. can bring in a lot of code bloat. If possible,
only do string formatting in debug mode, and in release mode use static
strings.

### Avoid Panicking

This is definitely easier said than done, but tools like `twiggy` and manually
inspecting LLVM-IR can help you figure out which functions are panicking.

Panics do not always appear as a `panic!()` macro invocation. They arise
implicitly from many constructs, such as:

* Indexing a slice panics on out-of-bounds indices: `my_slice[i]`

* Division panics if the divisor is zero: `dividend / divisor`

* Unwrapping an `Option` or `Result`: `opt.unwrap()` or `res.unwrap()`

The first two can be translated into the third. Indexing can be replaced with
fallible `my_slice.get(i)` operations. Division can be replaced with
`checked_div` calls. Now we only have a single case to contend with.

Unwrapping an `Option` or `Result` without panicking comes in two flavors: safe
and unsafe.

The safe approach is to `abort` instead of panicking when encountering a `None`
or an `Err`:

```rust
#[inline]
pub fn unwrap_abort<T>(o: Option<T>) -> T {
    use std::process;
    match o {
        Some(t) => t,
        None => process::abort(),
    }
}
```

Ultimately, panics translate into aborts on `wasm32-unknown-unknown` anyway, so
this gives you the same behavior but without the code bloat.

Alternatively, the [`unreachable` crate][unreachable] provides an unsafe
[`unchecked_unwrap` extension method][unchecked_unwrap] for `Option` and
`Result` which tells the Rust compiler to *assume* that the `Option` is `Some`
or the `Result` is `Ok`. If that assumption does not hold, the behavior is
undefined. You really only want to use this unsafe approach when you 110%
*know* that the assumption holds, and the compiler just isn't smart enough to
see it. Even if you go down this route, you should have a debug build
configuration that still does the checking, and only use unchecked operations
in release builds.
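
As a sketch of that debug-checked, release-unchecked pattern (the helper name
here is hypothetical, and it uses the standard library's
`std::hint::unreachable_unchecked` rather than the `unreachable` crate):

```rust
use std::hint;

/// Hypothetical helper: still checks in debug builds, but compiles to an
/// unchecked access in release builds.
///
/// Safety: callers must guarantee that `o` is `Some`; if it is `None` in a
/// release build, the behavior is undefined.
#[inline]
pub unsafe fn unwrap_unchecked_debug<T>(o: Option<T>) -> T {
    match o {
        Some(t) => t,
        None => {
            // Debug builds still catch a violated assumption loudly...
            debug_assert!(false, "unwrapped a `None` value");
            // ...while release builds compile the check away entirely.
            unsafe { hint::unreachable_unchecked() }
        }
    }
}

fn main() {
    // `checked_div` turns a potential division panic into an `Option`,
    // which we unwrap under the (here, trivially true) assumption that
    // the divisor is non-zero.
    let quotient = unsafe { unwrap_unchecked_debug(10_u32.checked_div(2)) };
    assert_eq!(quotient, 5);
}
```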

[unreachable]: https://crates.io/crates/unreachable
[unchecked_unwrap]: https://docs.rs/unreachable/1.0.0/unreachable/trait.UncheckedOptionExt.html#tymethod.unchecked_unwrap

### Avoid Allocation or Switch to `wee_alloc`

Rust's default allocator for WebAssembly is a port of `dlmalloc` to Rust. It
weighs in at around ten kilobytes. If you can completely avoid dynamic
allocation, then you should be able to shed those ten kilobytes.

Completely avoiding dynamic allocation can be very difficult. But removing
allocation from hot code paths is usually much easier (and usually helps make
those hot code paths faster, as well). In these cases, [replacing the default
global allocator with `wee_alloc`][wee_alloc] should save you most (but not
quite all) of those ten kilobytes. `wee_alloc` is an allocator designed for
situations where you need *some* kind of allocator, but do not need a
particularly fast allocator, and will happily trade allocation speed for
smaller code size.

[wee_alloc]: https://github.com/rustwasm/wee_alloc

### Use Trait Objects Instead of Generic Type Parameters

When you create generic functions that use type parameters, like this:

```rust
fn whatever<T: MyTrait>(t: T) { ... }
```

Then `rustc` and LLVM will create a new copy of the function for each `T` type
that the function is used with. This presents many opportunities for compiler
optimizations based on which particular `T` each copy is working with, but
these copies add up quickly in terms of code size.

If you use trait objects instead of type parameters, like this:

```rust
fn whatever(t: Box<dyn MyTrait>) { ... }
// or
fn whatever(t: &dyn MyTrait) { ... }
// etc...
```

Then dynamic dispatch via virtual calls is used, and only a single version of
the function is emitted in the `.wasm`.
The downside is the loss of the compiler 315 optimization opportunities and the added cost of indirect, dynamically 316 dispatched function calls. 317 318 ### Use the `wasm-snip` Tool 319 320 [`wasm-snip` replaces a WebAssembly function's body with an `unreachable` 321 instruction.][snip] This is a rather heavy, blunt hammer for functions that kind 322 of look like nails if you squint hard enough. 323 324 Maybe you know that some function will never be called at runtime, but the 325 compiler can't prove that at compile time? Snip it! Afterwards, run `wasm-opt` 326 again with the `--dce` flag, and all the functions that the snipped function 327 transitively called (which could also never be called at runtime) will get 328 removed too. 329 330 This tool is particularly useful for removing the panicking infrastructure, 331 since panics ultimately translate into traps anyways. 332 333 [snip]: https://github.com/fitzgen/wasm-snip