<a id="top"></a>
# Authoring benchmarks

> [Introduced](https://github.com/catchorg/Catch2/issues/1616) in Catch2 2.9.0.

Writing benchmarks is not easy. Catch simplifies certain aspects, but you will
always need to take care of various details yourself. Understanding a few things
about the way Catch runs your code will be very helpful when writing your
benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.

- *User code*: user code is the code that the user provides to be measured.
- *Run*: one run is one execution of the user code. Sometimes also referred
  to as an _iteration_.
- *Sample*: one sample is one data point obtained by measuring the time it takes
  to perform a certain number of runs. One sample can consist of more than one
  run if the available clock does not have enough resolution to accurately
  measure a single run. All samples for a given benchmark execution are obtained
  with the same number of runs.

## Execution procedure

Now I can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
resolution is estimated. A few other environmental artifacts are also estimated
at this point, like the cost of calling the clock function, but they almost
never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate of
the number of runs that should be in each sample. This also has the potential
effect of bringing relevant code and data into the caches before the actual
measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
number of runs estimated in the previous step for each sample.

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand, since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.

## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK` macro.

Let's have a look at how a naive Fibonacci implementation could be benchmarked:
```c++
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
Now the most straightforward way to benchmark this function is to add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon
  after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running this already runs the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                                  samples       iterations    est run time
                                                mean          low mean      high mean
                                                std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                            100       416439   83.2878 ms
                                                       2 ns         2 ns         2 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 25                                            100       400776   80.1552 ms
                                                       3 ns         3 ns         3 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 30                                            100       396873   79.3746 ms
                                                      17 ns        17 ns        17 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 35                                            100       145169   87.1014 ms
                                                     468 ns       464 ns       473 ns
                                                      21 ns        15 ns        34 ns
```
116  
### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, by using the `BENCHMARK_ADVANCED` macro and adding a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features become available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and another time during the execution phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.

The callable object passed in to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this, one can set up a different instance
to be mutated by each run.

```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different runs
and reset it between runs, since that would pollute the measurements with the
resetting code.

It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable to `meter.measure` with an `int` argument:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for (auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T`
objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and
create an object in that storage. So if you want to measure the time it takes
for a certain constructor to run, you can just measure the time it takes to run
this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was
constructed there, it will be automatically destroyed, so nothing leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but
they do not destroy anything automatically. Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which is what you
can use to measure the destruction time.

### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
That helps keep the code natural.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>