<a id="top"></a>
# Authoring benchmarks

> [Introduced](https://github.com/catchorg/Catch2/issues/1616) in Catch2 2.9.0.

Writing benchmarks is not easy. Catch simplifies certain aspects, but you will
always need to take care of various details yourself. Understanding a few
things about the way Catch runs your code will be very helpful when writing
your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.

- *User code*: the code that the user provides to be measured.
- *Run*: one execution of the user code. Sometimes also referred to as an
  _iteration_.
- *Sample*: one data point, obtained by measuring the time it takes to perform
  a certain number of runs. One sample can consist of more than one run if the
  available clock does not have enough resolution to accurately measure a
  single run. All samples for a given benchmark execution are obtained with
  the same number of runs.

## Execution procedure

Now we can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
   resolution is estimated. A few other environmental artifacts are also
   estimated at this point, like the cost of calling the clock function, but
   they almost never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate
   of the number of runs that should go into each sample. This also has the
   potential effect of bringing relevant code and data into the caches before
   the actual measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
   number of runs estimated in the previous step for each sample.
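As a rough illustration of the estimation and measurement steps, here is how
the same idea could be sketched in plain C++ with `std::chrono`. This is a
simplified sketch, not Catch's actual implementation; the function names and
the run-doubling strategy are illustrative assumptions:

```c++
#include <chrono>
#include <cstdint>
#include <vector>

using Clock = std::chrono::steady_clock;

// Double the run count until one sample takes at least `min_sample_time`,
// so the clock's resolution no longer dominates the measurement.
// (Hypothetical helper, standing in for Catch's estimation step.)
template <typename Fn>
int estimate_runs_per_sample(Fn&& user_code, Clock::duration min_sample_time) {
    int runs = 1;
    for (;;) {
        auto start = Clock::now();
        for (int i = 0; i < runs; ++i) user_code();
        if (Clock::now() - start >= min_sample_time) return runs;
        runs *= 2;
    }
}

// Collect `num_samples` samples, each timing the same number of runs,
// and report the mean time per run (in nanoseconds) for each sample.
template <typename Fn>
std::vector<double> collect_samples(Fn&& user_code, int num_samples, int runs) {
    std::vector<double> samples;
    samples.reserve(num_samples);
    for (int s = 0; s < num_samples; ++s) {
        auto start = Clock::now();
        for (int i = 0; i < runs; ++i) user_code();
        std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;
        samples.push_back(elapsed.count() / runs);
    }
    return samples;
}
```

Note that every sample executes the user code `runs` times, which is why the
repeatability rule below matters.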
This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times,
and the number of times it will be executed during the estimation step cannot
be known beforehand, since it depends on the time it takes to execute the
code. User code that cannot be executed repeatedly will lead to bogus results
or crashes.

## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK`
macro.

Let's have a look at how a naive Fibonacci implementation could be
benchmarked:

```c++
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```

Now the most straightforward way to benchmark this function is to add a
`BENCHMARK` macro to our test case:

```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```

There are a few things to note:

- Because `BENCHMARK` expands to a lambda expression, a semicolon is needed
  after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the
  benchmark code.
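For comparison, here is roughly what measuring the same function by hand with
`std::chrono` might look like. `mean_time_ns` is a hypothetical helper, not
part of Catch; the `BENCHMARK` macro automates the run-count estimation and
statistics that this sketch leaves out:

```c++
#include <chrono>
#include <cstdint>

std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}

// Time `runs` executions of Fibonacci(20) and return the mean time
// per run, in nanoseconds.
double mean_time_ns(int runs) {
    using Clock = std::chrono::steady_clock;
    // volatile sink keeps the results observable, so the calls
    // are not optimized away
    volatile std::uint64_t sink = 0;
    auto start = Clock::now();
    for (int i = 0; i < runs; ++i) sink = Fibonacci(20);
    std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}
```

All of this bookkeeping (the sink, the run count, the unit conversion) is
exactly what the `BENCHMARK` macro takes care of.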
Running this already runs the benchmarks and outputs something similar to:

```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................

benchmark name                            samples    iterations    est run time
                                          mean       low mean      high mean
                                          std dev    low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                  100        416439      83.2878 ms
                                             2 ns          2 ns          2 ns
                                             0 ns          0 ns          0 ns

Fibonacci 25                                  100        400776      80.1552 ms
                                             3 ns          3 ns          3 ns
                                             0 ns          0 ns          0 ns

Fibonacci 30                                  100        396873      79.3746 ms
                                            17 ns         17 ns         17 ns
                                             0 ns          0 ns          0 ns

Fibonacci 35                                  100        145169      87.1014 ms
                                           468 ns        464 ns        473 ns
                                            21 ns         15 ns         34 ns
```

### Advanced benchmarking

The simplest use case shown above takes no arguments and just runs the user
code that needs to be measured. However, if you use the `BENCHMARK_ADVANCED`
macro and add a `Catch::Benchmark::Chronometer` argument after the macro, some
advanced features become available. The contents of a simple benchmark are
invoked once per run, while the block of an advanced benchmark is invoked
exactly twice: once during the estimation phase, and again during the
measurement phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be
measured. In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function.
This allows you to set up any
kind of state that might be required for the benchmark but should not be
included in the measurements, such as making a vector of random integers to
feed to a sorting algorithm.

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual
measurements by invoking the callable object passed in as many times as
necessary. Anything that needs to be done outside the measurement can be done
outside the call to `measure`.

The callable object passed to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run is passed
in, starting with 0. This is useful if you want to measure mutating code, for
example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this, one can set up a different
instance to be mutated by each run.

```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different
runs and reset it between runs, since that would pollute the measurements with
the resetting code.

It is also possible to provide just an argument name to the simple `BENCHMARK`
macro to get the same semantics as providing a callable to `meter.measure`
with an `int` argument:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that
still need special handling: constructors and destructors.
The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end
up measuring the time for construction and destruction together. And if you
use dynamic allocation instead, you end up including the time to allocate
memory in the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation, and in a way that
lets you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for (auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage
suitable for `T` objects. You can use the
`Catch::Benchmark::storage_for::construct` member function to call a
constructor and create an object in that storage. So if you want to measure
the time it takes for a certain constructor to run, you can just measure the
time it takes to run this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, any
object that was constructed there is automatically destroyed, so nothing
leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is
manual, but they do not destroy anything automatically.
Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which
is what you can use to measure the destruction time.

### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results so that the optimizer cannot
remove them: you can use the `volatile` keyword, or you can output the value
to standard output or to a file, both of which force the program to actually
generate the value somehow.

Catch adds a third option. The values returned by any function provided as
user code are guaranteed to be evaluated and not optimized out. This means
that if your user code consists of computing a certain value, you don't need
to bother with `volatile` or forced output; just `return` it from the
function. This helps keep the code natural.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

However, there is no other form of control over the optimizer whatsoever. It
is up to you to write a benchmark that actually measures what you want and
doesn't just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten
code to control optimization still works in Catch; and Catch makes return
values from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>