# Shrinking `.wasm` Code Size

This section will teach you how to optimize your `.wasm` build for a small code
size footprint, and how to identify opportunities to change your Rust source
such that less `.wasm` code is emitted.

## Why Care About Code Size?

When serving a `.wasm` file over the network, the smaller it is, the faster the
client can download it. Faster `.wasm` downloads lead to faster page load times,
and that leads to happier users.

However, it's important to remember that code size is likely not the
end-all-be-all metric you're interested in, but rather a proxy for something
much more vague and hard to measure, like "time to first interaction". While
code size plays a large role in this measurement (you can't do anything if you
don't even have all the code yet!), it is not the only factor.

WebAssembly is typically served to users gzip'd, so you'll want to compare
differences in gzip'd size for transfer times over the wire. Also keep in mind
that the WebAssembly binary format is quite amenable to gzip compression, often
yielding size reductions of over 50%.

Furthermore, WebAssembly's binary format is optimized for very fast parsing and
processing. Browsers nowadays have "baseline compilers" which parse WebAssembly
and emit compiled code as fast as the wasm arrives over the network. This means
that [if you're using `instantiateStreaming`][hacks], the second the Web request
is done, the WebAssembly module is probably ready to go. JavaScript, on the
other hand, can often take longer to not only parse, but also to get up to speed
with JIT compilation and such.

And finally, remember that WebAssembly is also far more optimized than
JavaScript for execution speed. You'll want to measure runtime comparisons
between JavaScript and WebAssembly to factor that into how important code size
is.

All this is to say: don't despair immediately if your `.wasm` file is larger
than expected! Code size may end up being only one of many factors in the
end-to-end story. Comparisons between JavaScript and WebAssembly that only look
at code size are missing the forest for the trees.

[hacks]: https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/

## Optimizing Builds for Code Size

There are a bunch of configuration options we can use to get `rustc` to make
smaller `.wasm` binaries. In some cases, we are trading longer compile times for
smaller `.wasm` sizes. In other cases, we are trading runtime speed of the
WebAssembly for smaller code size. We should be cognizant of the trade-offs of
each option, and in the cases where we trade runtime speed for code size,
profile and measure to make an informed decision about whether the trade is
worth it.

### Compiling with Link Time Optimizations (LTO)

In `Cargo.toml`, add `lto = true` in the `[profile.release]` section:

```toml
[profile.release]
lto = true
```

This gives LLVM many more opportunities to inline and prune functions. Not only
will it make the `.wasm` smaller, but it will also make it faster at runtime!
The downside is that compilation will take longer.

### Tell LLVM to Optimize for Size Instead of Speed

LLVM's optimization passes are tuned to improve speed, not size, by default. We
can change the goal to code size by modifying the `[profile.release]` section in
`Cargo.toml` to this:

```toml
[profile.release]
opt-level = 's'
```

Or, to even more aggressively optimize for size, at further potential speed
costs:

```toml
[profile.release]
opt-level = 'z'
```

Note that, surprisingly enough, `opt-level = "s"` can sometimes result in
smaller binaries than `opt-level = "z"`. Always measure!

### Use the `wasm-opt` Tool

The [Binaryen][] toolkit is a collection of WebAssembly-specific compiler
tools. It goes much further than LLVM's WebAssembly backend does, and using its
`wasm-opt` tool to post-process a `.wasm` binary generated by LLVM can often get
another 15-20% savings on code size. It will often produce runtime speed ups at
the same time!

```bash
# Optimize for size.
wasm-opt -Os -o output.wasm input.wasm

# Optimize aggressively for size.
wasm-opt -Oz -o output.wasm input.wasm

# Optimize for speed.
wasm-opt -O -o output.wasm input.wasm

# Optimize aggressively for speed.
wasm-opt -O3 -o output.wasm input.wasm
```

[Binaryen]: https://github.com/WebAssembly/binaryen

### Notes about Debug Information

One of the biggest contributors to wasm binary size can be debug information and
the `names` section of the wasm binary. The `wasm-pack` tool, however, removes
debuginfo by default. Additionally, `wasm-opt` removes the `names` section by
default unless `-g` is also specified.

This means that if you follow the above steps, you should not have either
debuginfo or the `names` section in the wasm binary by default. If, however, you
are otherwise deliberately preserving this debug information in the wasm binary,
be sure to be mindful of its size cost!

## Size Profiling

If tweaking build configurations to optimize for code size isn't resulting in a
small enough `.wasm` binary, it is time to do some profiling to see where the
remaining code size is coming from.

> ⚡ Just like how we let time profiling guide our speed up efforts, we want to
> let size profiling guide our code size shrinking efforts. Fail to do this and
> you risk wasting your own time!

### The `twiggy` Code Size Profiler

[`twiggy` is a code size profiler][twiggy] that supports WebAssembly as
input. It analyzes a binary's call graph to answer questions like:

* Why was this function included in the binary in the first place?

* What is the *retained size* of this function? I.e. how much space would be
  saved if I removed it and all the functions that become dead code after its
  removal?

<style>
/* For whatever reason, the default mdbook fonts break with the
   following box-drawing characters, hence the manual style. */
pre, code {
  font-family: "SFMono-Regular",Consolas,"Liberation Mono",Menlo,Courier,monospace;
}
</style>

```text
$ twiggy top -n 20 pkg/wasm_game_of_life_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────
          9158 ┊    19.65% ┊ "function names" subsection
          3251 ┊     6.98% ┊ dlmalloc::dlmalloc::Dlmalloc::malloc::h632d10c184fef6e8
          2510 ┊     5.39% ┊ <str as core::fmt::Debug>::fmt::he0d87479d1c208ea
          1737 ┊     3.73% ┊ data[0]
          1574 ┊     3.38% ┊ data[3]
          1524 ┊     3.27% ┊ core::fmt::Formatter::pad::h6825605b326ea2c5
          1413 ┊     3.03% ┊ std::panicking::rust_panic_with_hook::h1d3660f2e339513d
          1200 ┊     2.57% ┊ core::fmt::Formatter::pad_integral::h06996c5859a57ced
          1131 ┊     2.43% ┊ core::str::slice_error_fail::h6da90c14857ae01b
          1051 ┊     2.26% ┊ core::fmt::write::h03ff8c7a2f3a9605
           931 ┊     2.00% ┊ data[4]
           864 ┊     1.85% ┊ dlmalloc::dlmalloc::Dlmalloc::free::h27b781e3b06bdb05
           841 ┊     1.80% ┊ <char as core::fmt::Debug>::fmt::h07742d9f4a8c56f2
           813 ┊     1.74% ┊ __rust_realloc
           708 ┊     1.52% ┊ core::slice::memchr::memchr::h6243a1b2885fdb85
           678 ┊     1.45% ┊ <core::fmt::builders::PadAdapter<'a> as core::fmt::Write>::write_str::h96b72fb7457d3062
           631 ┊     1.35% ┊ universe_tick
           631 ┊     1.35% ┊ dlmalloc::dlmalloc::Dlmalloc::dispose_chunk::hae6c5c8634e575b8
           514 ┊     1.10% ┊ std::panicking::default_hook::{{closure}}::hfae0c204085471d5
           503 ┊     1.08% ┊ <&'a T as core::fmt::Debug>::fmt::hba207e4f7abaece6
```

[twiggy]: https://github.com/rustwasm/twiggy

### Manually Inspecting LLVM-IR

LLVM-IR is the final intermediate representation in the compiler toolchain
before LLVM generates WebAssembly. Therefore, it is very similar to the
WebAssembly that is ultimately emitted. More LLVM-IR generally means more
`.wasm` size, and if a function takes up 25% of the LLVM-IR, then it generally
will take up 25% of the `.wasm`. While these numbers only hold in general, the
LLVM-IR has crucial information that is not present in the `.wasm` (because of
WebAssembly's lack of a debugging format like DWARF): which subroutines were
inlined into a given function.

You can generate LLVM-IR with this `cargo` command:

```
cargo rustc --release -- --emit llvm-ir
```

Then, you can use `find` to locate the `.ll` file containing the LLVM-IR in
`cargo`'s `target` directory:

```
find target/release -type f -name '*.ll'
```

#### References

* [LLVM Language Reference Manual](https://llvm.org/docs/LangRef.html)

## More Invasive Tools and Techniques

Tweaking build configurations to get smaller `.wasm` binaries is pretty hands
off. When you need to go the extra mile, however, you can use more invasive
techniques, like rewriting source code to avoid bloat. What follows is a
collection of get-your-hands-dirty techniques you can apply to get smaller code
sizes.

### Avoid String Formatting

`format!`, `to_string`, etc. can bring in a lot of code bloat. If possible,
only do string formatting in debug mode, and in release mode use static strings.

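For example, a lookup table of static strings can often stand in for a
`format!`-built message. The following is a minimal sketch; `status_text` is a
hypothetical helper invented for illustration, not part of any API:

```rust
// Sketch: returning static strings avoids the `core::fmt` machinery
// that `format!` would otherwise pull into the `.wasm` binary.
fn status_text(code: u16) -> &'static str {
    match code {
        200 => "OK",
        404 => "Not Found",
        _ => "Unknown",
    }
}

fn main() {
    // A static string instead of `format!("status: {}", code)`.
    println!("status: {}", status_text(404));
}
```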
### Avoid Panicking

This is definitely easier said than done, but tools like `twiggy` and manually
inspecting LLVM-IR can help you figure out which functions are panicking.

Panics do not always appear as a `panic!()` macro invocation. They arise
implicitly from many constructs, such as:

* Indexing a slice panics on out of bounds indices: `my_slice[i]`

* Division will panic if the divisor is zero: `dividend / divisor`

* Unwrapping an `Option` or `Result`: `opt.unwrap()` or `res.unwrap()`

The first two can be translated into the third. Indexing can be replaced with
fallible `my_slice.get(i)` operations. Division can be replaced with
`checked_div` calls. Now we only have a single case to contend with.

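As a sketch of that translation, both implicitly panicking operations can be
rewritten in terms of `Option` (the function name here is made up for
illustration):

```rust
// Sketch: `get` and `checked_div` return `Option` instead of
// panicking, funneling both failure modes into one case.
fn ratio_of_first_two(xs: &[u32]) -> Option<u32> {
    // `get` returns `None` on out-of-bounds instead of panicking.
    let a = *xs.get(0)?;
    let b = *xs.get(1)?;
    // `checked_div` returns `None` when the divisor is zero.
    a.checked_div(b)
}

fn main() {
    assert_eq!(ratio_of_first_two(&[10, 2]), Some(5));
    assert_eq!(ratio_of_first_two(&[10, 0]), None); // divisor is zero
    assert_eq!(ratio_of_first_two(&[10]), None);    // index out of bounds
}
```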
Unwrapping an `Option` or `Result` without panicking comes in two flavors: safe
and unsafe.

The safe approach is to `abort` instead of panicking when encountering a `None`
or an `Err`:

```rust
#[inline]
pub fn unwrap_abort<T>(o: Option<T>) -> T {
    use std::process;
    match o {
        Some(t) => t,
        None => process::abort(),
    }
}
```

Ultimately, panics translate into aborts in `wasm32-unknown-unknown` anyway, so
this gives you the same behavior but without the code bloat.

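The same idea extends to `Result`. This companion sketch (a hypothetical
helper, not from any crate) aborts on `Err` instead of panicking:

```rust
// Sketch: abort-on-error for `Result`, mirroring `unwrap_abort`.
// Aborting skips the panicking machinery and its formatting code.
#[inline]
pub fn unwrap_abort_ok<T, E>(r: Result<T, E>) -> T {
    use std::process;
    match r {
        Ok(t) => t,
        Err(_) => process::abort(),
    }
}

fn main() {
    let r: Result<u32, ()> = Ok(42);
    assert_eq!(unwrap_abort_ok(r), 42);
}
```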
Alternatively, the [`unreachable` crate][unreachable] provides an unsafe
[`unchecked_unwrap` extension method][unchecked_unwrap] for `Option` and
`Result` which tells the Rust compiler to *assume* that the `Option` is `Some`
or the `Result` is `Ok`. If that assumption does not hold, the behavior is
undefined. You really only want to use this unsafe approach when you 110%
*know* that the assumption holds, and the compiler just isn't smart enough to
see it. Even if you go down this route, you should have a debug build
configuration that still does the checking, and only use unchecked operations in
release builds.

[unreachable]: https://crates.io/crates/unreachable
[unchecked_unwrap]: https://docs.rs/unreachable/1.0.0/unreachable/trait.UncheckedOptionExt.html#tymethod.unchecked_unwrap

### Avoid Allocation or Switch to `wee_alloc`

Rust's default allocator for WebAssembly is a port of `dlmalloc` to Rust. It
weighs in somewhere around ten kilobytes. If you can completely avoid dynamic
allocation, then you should be able to shed those ten kilobytes.

Completely avoiding dynamic allocation can be very difficult. But removing
allocation from hot code paths is usually much easier (and usually helps make
those hot code paths faster, as well). In these cases, [replacing the default
global allocator with `wee_alloc`][wee_alloc] should save you most (but not
quite all) of those ten kilobytes. `wee_alloc` is an allocator designed for
situations where you need *some* kind of allocator, but do not need a
particularly fast allocator, and will happily trade allocation speed for smaller
code size.

[wee_alloc]: https://github.com/rustwasm/wee_alloc

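Swapping in the allocator is a small configuration change. The following sketch
is based on the crate's documented usage and assumes `wee_alloc` has been added
as a dependency in `Cargo.toml`:

```rust
// Sketch: install `wee_alloc` as the global allocator (assumes the
// `wee_alloc` crate is a dependency in Cargo.toml).
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;
```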
### Use Trait Objects Instead of Generic Type Parameters

When you create generic functions that use type parameters, like this:

```rust
fn whatever<T: MyTrait>(t: T) { ... }
```

Then `rustc` and LLVM will create a new copy of the function for each `T` type
that the function is used with. This presents many opportunities for compiler
optimizations based on which particular `T` each copy is working with, but these
copies add up quickly in terms of code size.

If you use trait objects instead of type parameters, like this:

```rust
fn whatever(t: Box<dyn MyTrait>) { ... }
// or
fn whatever(t: &dyn MyTrait) { ... }
// etc...
```

Then dynamic dispatch via virtual calls is used, and only a single version of
the function is emitted in the `.wasm`. The downside is the loss of the compiler
optimization opportunities and the added cost of indirect, dynamically
dispatched function calls.

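To make the contrast concrete, here is a small sketch (the trait and type names
are invented for illustration): the generic version is monomorphized once per
concrete type it is called with, while the `dyn` version is emitted exactly
once:

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Monomorphized: one copy of this function per concrete `T` used.
fn area_of_generic<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch: a single copy in the final `.wasm`.
fn area_of_dyn(shape: &dyn Shape) -> f64 {
    shape.area()
}

fn main() {
    let s = Square(3.0);
    assert_eq!(area_of_generic(&s), 9.0);
    assert_eq!(area_of_dyn(&s), 9.0);
}
```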
### Use the `wasm-snip` Tool

[`wasm-snip` replaces a WebAssembly function's body with an `unreachable`
instruction.][snip] This is a rather heavy, blunt hammer for functions that kind
of look like nails if you squint hard enough.

Maybe you know that some function will never be called at runtime, but the
compiler can't prove that at compile time? Snip it! Afterwards, run `wasm-opt`
again with the `--dce` flag, and all the functions that the snipped function
transitively called (which could also never be called at runtime) will get
removed too.

This tool is particularly useful for removing the panicking infrastructure,
since panics ultimately translate into traps anyway.

[snip]: https://github.com/fitzgen/wasm-snip