# Arti profiling methodology

This document describes basic tools for profiling Arti's CPU and memory
usage. Not all of these tools will make sense for every situation, and
we may want to switch them in the future. The main reason for recording
them here is so that we don't have to re-learn how to use them the next
time we need to do a big round of profiling tests.

## Building for profiling

When you're testing with `cargo build --locked --release`, set
`CARGO_PROFILE_RELEASE_DEBUG=true` to include extra debugging
information for better output.

## Profiling tools

Here I'll talk about a few tools for measuring CPU usage, memory usage,
and the like. For now, I'll assume you're on a reasonably modern Linux
environment: if you aren't, you'll have to do some stuff differently.

I'll talk about particular scenarios to profile in the next major
section.

### cargo flamegraph

[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) is a
pretty quick-and-easy event profiling visualization tool. It produces
nice SVG flamegraphs in a variety of pretty colors. As with all
flamegraphs, these are better for visualization than detailed
drill-down. On Linux, `cargo-flamegraph` uses
[`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) under the
hood.

To install, make sure you have a working version of `perf`
installed. Then run `cargo install flamegraph`.

Basic usage:

```
flamegraph {command}
```

Output: `flamegraph.svg`

Also consider using the `--reverse` flag to reverse the stack and see the
lowest-level functions that get the most use.

### tcmalloc and pprof

This can generate usage graphs showing who allocated your memory when.
(It can get a bit confusing in Rust.)

```
HEAPPROFILE=/tmp/heap.hprof \
LD_PRELOAD=/usr/lib64/libtcmalloc_and_profiler.so \
{command}
```

The heap profiler treats `HEAPPROFILE` as a filename prefix and writes a
numbered series of dumps named `/tmp/heap.hprof.{NNNN}.heap`, where
`{NNNN}` is a four-digit dump number. Pass the dump you want to inspect
to `pprof`:

```
pprof --pdf --inuse_space {binary} /tmp/heap.hprof.{NNNN}.heap > heap.pdf
```

You might need a longer timeout with this one: heap profiling adds
nontrivial overhead.

### valgrind --tool=massif

This tool can also generate usage graphs like pprof above.

`valgrind --tool=massif {command}`

It will generate a file called `massif.out.PID`. You can view it with the
`ms_print` tool (included with valgrind) or the `massif-visualizer` tool
(installed separately; highly recommended).

## Some commands to profile

These should generally run against a chutney network whenever possible;
the `ARTI_CONF` environment variable should be set to
e.g. `$(pwd)/chutney/net/nodes/arti.toml`.

### Bootstrapping a directory

`arti-testing bootstrap -c ${ARTI_CONF}`

(This test bootstraps only. It might make sense to do this one on the
real network, since its data is more complex. You need to start with an
empty set of state files for this to test bootstrapping instead of
loading existing state.)

### Large number of circuits, focusing on circuit construction

Bootstrap outside of benchmarking, then run:

`arti-bench -u 1 -d 1 -s 100 -C 20 -p 1 -c ${ARTI_CONF}`

(100 samples, 20 circuits per sample, 1 stream per circuit, only 1 byte
to upload or download.)

Note that this test won't necessarily tell you much about _path
construction_, since path construction on a large real network with
different weights, policies, and families is more complex than on a
chutney network.

(This currently just times out with chutney; I think the directory
changes too fast.)

### Running offline

Also consider these scenarios:

* Bootstrapping failure conditions
* Going offline
* Primary guards go down after bootstrap

(See `HowToBreak.md`)

### Data transfer

`arti-bench -s 20 -C 1 -p 1 {...}`

(No parallelism, 10 MB up and down.)

### Data transfer with many circuits

`arti-bench -s 1 -C 64 -p 1 -c ${ARTI_CONF}`

(Circuit parallelism only, 10 MB up and down.)

### Data transfer with many streams

`arti-bench -s 1 -C 1 -p 64 -c ${ARTI_CONF}`

(Stream parallelism only, 10 MB up and down.)

### Huge number of simultaneous connection attempts

`arti-bench -s 1 -C 16 -p 16 -c ${ARTI_CONF}`

(Stream and circuit parallelism.)

# TODO

arti-bench:
- Take a target address as a string.
- Allow `-p 0` to build a circuit only?
- Some way to build a path only?

Extract chutney boilerplate.

arti-testing:
- Ability to make connections aggressively simultaneous.