# Arti profiling methodology

This document describes basic tools for profiling Arti's CPU and memory
usage.  Not all of these tools will make sense for every situation, and
we may want to switch them in the future.  The main reason for recording
them here is so that we don't have to re-learn how to use them the next
time we need to do a big round of profiling tests.

## Building for profiling

When you build with `cargo build --locked --release`, set
`CARGO_PROFILE_RELEASE_DEBUG=true` in the environment so that the binary
includes debugging information; this lets the profiling tools produce
more useful output.
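
As a minimal sketch, the build step could be wrapped in a shell function.
The name `build_for_profiling` and the `DRY_RUN` switch are assumptions
made for illustration; only the environment variable and the cargo flags
come from the text above.

```shell
# Sketch only: `build_for_profiling` and DRY_RUN are hypothetical helpers.
# With DRY_RUN=1 the command line is printed instead of executed.
build_for_profiling() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "CARGO_PROFILE_RELEASE_DEBUG=true cargo build --locked --release"
    else
        # The variable applies to this one cargo invocation only.
        CARGO_PROFILE_RELEASE_DEBUG=true cargo build --locked --release
    fi
}
```

Run `build_for_profiling` from the top of the source tree; with cargo's
default settings the resulting binaries land under `target/release/`.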

## Profiling tools

Here I'll talk about a few tools for measuring CPU usage, memory usage,
and the like.  For now, I'll assume you're on a reasonably modern Linux
environment: if you aren't, some of the steps will differ.

I'll talk about particular scenarios to profile in the next major
section.

### cargo flamegraph

[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) is a
pretty quick-and-easy event profiling visualization tool.  It produces
nice SVG flamegraphs in a variety of pretty colors.  As with all
flamegraphs, these are better for visualization than detailed
drill-down.  On Linux, `cargo-flamegraph` uses
[`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) under the
hood.

To install, make sure you have a working version of `perf`
installed.  Then run `cargo install flamegraph`.

Basic usage:

```
flamegraph -- {command}
```

Output: `flamegraph.svg`

Also consider using the `--reverse` flag, to reverse the stack and see the
lowest-level functions that get the most use.
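
A small wrapper can produce both views in one go.  This is only a sketch:
the helper names `run` and `profile_cpu` and the `DRY_RUN` switch are
assumptions made here, while `-o`, `--reverse`, and the `--` separator
follow the cargo-flamegraph README.  Note that it runs the profiled
command twice.

```shell
# Sketch only: `run`, `profile_cpu`, and DRY_RUN are hypothetical helpers.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Usage: profile_cpu NAME COMMAND [ARGS...]
# Writes NAME.svg and NAME-reverse.svg in the current directory.
profile_cpu() {
    name=$1
    shift
    # Normal orientation: outermost callers at the base of each stack.
    run flamegraph -o "${name}.svg" -- "$@"
    # Reversed stacks: leaf functions at the base, to spot hot low-level code.
    run flamegraph --reverse -o "${name}-reverse.svg" -- "$@"
}
```

For example, `profile_cpu bootstrap {command}` would leave
`bootstrap.svg` and `bootstrap-reverse.svg` next to each other for
comparison.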

### tcmalloc and pprof

Together, these can generate usage graphs showing who allocated your
memory, and when.  (It can get a bit confusing in Rust.)

```
HEAPPROFILE=/tmp/heap.hprof \
 LD_PRELOAD=/usr/lib64/libtcmalloc_and_profiler.so \
 {command}
```

```
pprof --pdf --inuse_space {binary} /tmp/heap.hprof > heap.pdf
```

The library path varies between distributions.  The heap profiler treats
`HEAPPROFILE` as a filename prefix, so depending on your version the dump
you pass to `pprof` may be named something like
`/tmp/heap.hprof.0001.heap`.

You might need a longer timeout with this one; the overhead is nontrivial.

### valgrind --massif

Like pprof above, this tool can generate memory usage graphs.

`valgrind --tool=massif {command}`

It will generate a file called `massif.out.PID`, where `PID` is the
process ID of the profiled command.  You can view it with the `ms_print`
tool (included with valgrind) or the `massif-visualizer` tool (installed
separately, highly recommended).
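
Because the output filename depends on the process ID, it can help to
pick up the newest result automatically.  A minimal sketch, assuming the
default `massif.out.PID` naming; the helper name `latest_massif_output`
is made up for this example:

```shell
# Sketch only: `latest_massif_output` is a hypothetical helper.
# Prints the most recently modified massif.out.* file in the given
# directory (default: the current directory), or nothing if none exists.
latest_massif_output() {
    ls -t "${1:-.}"/massif.out.* 2>/dev/null | head -n 1
}
```

After a profiling run, `ms_print "$(latest_massif_output)" | less` shows
the text report for that run.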

## Some commands to profile

These should generally run against a chutney network whenever possible;
the `ARTI_CONF` environment variable should be set to the Arti
configuration for that network, e.g.
`$(pwd)/chutney/net/nodes/arti.toml`.
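
One way to set the variable, with a check that the file is actually
there, is sketched here.  The helper name `use_chutney_conf` is an
assumption for illustration; the path is the example path given above.

```shell
# Sketch only: `use_chutney_conf` is a hypothetical helper.
# Usage: use_chutney_conf [BASE_DIR]   (BASE_DIR defaults to $(pwd))
# Exports ARTI_CONF, or fails if the configuration file is missing.
use_chutney_conf() {
    conf="${1:-$(pwd)}/chutney/net/nodes/arti.toml"
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf (is the chutney network set up?)" >&2
        return 1
    fi
    export ARTI_CONF="$conf"
}
```

Failing early here avoids profiling a run that only reports a missing
configuration file.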

### Bootstrapping a directory

`arti-testing bootstrap -c ${ARTI_CONF}`

(This test bootstraps only.  It might make sense to do this one on the
real network, since its data is more complex.  You need to start with an
empty set of state files for this to test bootstrapping instead of
loading.)

### Large number of circuits, focusing on circuit construction

Bootstrap outside of benchmarking, then run:

`arti-bench -u 1 -d 1 -s 100 -C 20 -p 1 -c ${ARTI_CONF}`

(100 samples, 20 circuits per sample, 1 stream per circuit, only 1 byte
to upload or download.)

Note that this test won't necessarily tell you much about _path
construction_, since path construction on a large real network with
different weights, policies, and families is more complex than on a
chutney network.

(This just times out with chutney; the directory changes too fast, I think.)

### Running offline

Also consider these scenarios:

* Bootstrapping failure conditions
* Going offline
* Primary guards go down after bootstrap

(See `HowToBreak.md`.)

### Data transfer

`arti-bench -s 20 -C 1 -p 1 {...}`

(No parallelism, 10 MB up and down.)

### Data transfer with many circuits

`arti-bench -s 1 -C 64 -p 1 -c ${ARTI_CONF}`

(Circuit parallelism only, 10 MB up and down.)

### Data transfer with many streams

`arti-bench -s 1 -C 1 -p 64 -c ${ARTI_CONF}`

(Stream parallelism only, 10 MB up and down.)

### Huge number of simultaneous connection attempts

`arti-bench -s 1 -C 16 -p 16 -c ${ARTI_CONF}`

(Stream and circuit parallelism.)
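
The `arti-bench` scenarios whose command lines are given in full in this
document can be collected into one loop.  This is a sketch under
assumptions: the function name `run_bench_scenarios`, the scenario
labels, and the `DRY_RUN` switch are invented here; the flags are copied
from the commands in this document.

```shell
# Sketch only: `run_bench_scenarios`, the labels, and DRY_RUN are hypothetical.
# Requires ARTI_CONF to point at the Arti configuration file.
run_bench_scenarios() {
    : "${ARTI_CONF:?set ARTI_CONF to your arti.toml}"
    while read -r label args; do
        echo "== ${label}"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "arti-bench ${args} -c ${ARTI_CONF}"
        else
            # Word splitting of $args is intentional; stdin is redirected
            # so the benchmark cannot consume the scenario list.
            arti-bench ${args} -c "${ARTI_CONF}" </dev/null
        fi
    done <<'EOF'
circuit-construction -u 1 -d 1 -s 100 -C 20 -p 1
many-circuits -s 1 -C 64 -p 1
many-streams -s 1 -C 1 -p 64
many-connections -s 1 -C 16 -p 16
EOF
}
```

Running it once with `DRY_RUN=1` prints the exact command lines, which is
handy when you want to paste a single scenario into one of the profiling
tools.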

# TODO

arti-bench:
  - Take a target address as a string.
  - Allow -p 0 to build a circuit only?
  - Some way to build a path only?

Extract chutney boilerplate.

arti-testing:
  - Ability to make connections aggressively simultaneous.