/ README.md
README.md
1 # Constantine - Fast, compact, hardened Pairing-Based Cryptography 2 3 [](https://opensource.org/licenses/Apache-2.0) 4 [](https://opensource.org/licenses/MIT) 5 \ 6 [](https://github.com/mratsim/constantine/actions?query=workflow%3A%22Constantine+CI%22+branch%3Amaster)\ 7 [](https://travis-ci.com/mratsim/constantine)\ 8 [](https://dev.azure.com/numforge/Constantine/_build?definitionId=4&branchName=master) 9 10 > “A cryptographic system should be secure even if everything about the system, except the key, is public knowledge.”\ 11 > — Auguste Kerckhoffs 12 13 This library provides [constant-time](https://en.wikipedia.org/wiki/Timing_attack) implementation of cryptographic protocols 14 with a particular focus on pairing-based cryptography as used in blockchains and zero-knowledge protocols. 15 16 The implementations are accompanied with SAGE code used as reference implementation and test vectors generators before writing highly optimized routines implemented in the [Nim language](https://nim-lang.org/) 17 18 > The library is in development state and high-level wrappers or example protocols are work-in-progress. 19 20 ## Table of Contents 21 22 <!-- TOC --> 23 24 - [Constantine - Fast, compact, hardened Pairing-Based Cryptography](#constantine---fast-compact-hardened-pairing-based-cryptography) 25 - [Table of Contents](#table-of-contents) 26 - [Target audience](#target-audience) 27 - [Protocols](#protocols) 28 - [Installation](#installation) 29 - [From C](#from-c) 30 - [From Nim](#from-nim) 31 - [Dependencies & Requirements](#dependencies--requirements) 32 - [Curves supported in the backend](#curves-supported-in-the-backend) 33 - [Security](#security) 34 - [Disclaimer](#disclaimer) 35 - [Security disclosure](#security-disclosure) 36 - [Performance](#performance) 37 - [In blockchain](#in-blockchain) 38 - [In zero-knowledge proofs](#in-zero-knowledge-proofs) 39 - [Measuring performance](#measuring-performance) 40 - [Ethereum BLS signatures over BLS12-381 G2](#ethereum-bls-signatures-over-bls12-381-g2) 41 - [BLS12-381 detailed benchmarks](#bls12-381-detailed-benchmarks) 42 - [BN254-Snarks Multi-Scalar-Multiplication benchmarks](#bn254-snarks-multi-scalar-multiplication-benchmarks) 43 - [Parallelism](#parallelism) 44 - [Why Nim](#why-nim) 45 - [Compiler caveats](#compiler-caveats) 46 - [Inline assembly](#inline-assembly) 47 - [Sizes: code size, stack usage](#sizes-code-size-stack-usage) 48 - [License](#license) 49 50 <!-- /TOC --> 51 52 ## Target audience 53 54 The library aims to be a fast, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proofs system. 55 56 The library focuses on following properties: 57 - constant-time (not leaking secret data via [side-channels](https://en.wikipedia.org/wiki/Side-channel_attack)) 58 - performance 59 - generated code size, datatype size and stack usage 60 61 in this order. 62 63 ## Protocols 64 65 Protocols are a set of routines, designed for specific goals or a combination thereof: 66 - confidentiality: only the intended receiver of a message can read it 67 - authentication: the other party in the communication is the expected part 68 - integrity: the received message has not been tampered with 69 - non-repudiation: the sender of a message cannot repudiated it 70 71 Protocols to address these goals, (authenticated) encryption, signature, traitor-tracing, etc 72 are designed.\ 73 Note: some goals might be mutually exclusive, for example "plausible deniability" and "non-repudiation". 74 75 ## Installation 76 77 ### From C 78 79 1. Install a C compiler, for example: 80 - Debian/Ubuntu `sudo apt update && sudo apt install build-essential` 81 - Archlinux `pacman -S base-devel` 82 83 2. Install nim, it is available in most distros package manager for Linux and Homebrew for MacOS 84 Windows binaries are on the official website: https://nim-lang.org/install_unix.html 85 - Debian/Ubuntu `sudo apt install nim` 86 - Archlinux `pacman -S nim` 87 88 3. Compile the bindings. 89 - Recommended: \ 90 `CC:clang nimble bindings` 91 - or `nimble bindings_no_asm`\ 92 to compile without assembly (otherwise it autodetects support) 93 - or with default compiler\ 94 `nimble bindings` 95 96 4. Ensure bindings work 97 - `nimble test_bindings` 98 99 5. Bindings location 100 - The bindings are put in `constantine/lib` 101 - The headers are in [constantine/include](./include) for example [Ethereum BLS signatures](./include/constantine_ethereum_bls_signatures.h) 102 103 6. Read the examples in [examples_c](./examples_c): 104 - Using the [Ethereum BLS signatures bindings from C](./examples_c/ethereum_bls_signatures.c) 105 - Testing Constantine BLS12-381 vs GMP [./examples_c/t_libctt_bls12_381.c](./examples_c/t_libctt_bls12_381.c) 106 107 The bindings currently provided are: 108 109 - Ethereum BLS signatures on BLS12-381 G2 110 Cryptographic suite: `BLS_SIG_BLS12381G2_XMD:SHA-256_SSWU_RO_POP_` 111 112 This scheme is also used in the following blockchains: 113 Algorand, Chia, Dfinity, Filecoin, Tezos, Zcash. 114 They may have their pubkeys on G1 and signatures on G2 like Ethereum or the other way around. 115 116 - BLS12-381 arithmetic: 117 - field arithmetic 118 - on Fr (i.e. modulo the 255-bit curve order) 119 - on Fp (i.e. modulo the 381-bit prime modulus) 120 - on Fp2 121 - elliptic curve arithmetic: 122 - on elliptic curve over Fp (EC G1) with affine, jacobian and homogenous projective coordinates 123 - on elliptic curve over Fp2 (EC G2) with affine, jacobian and homogenous projective coordinates 124 - currently not exposed: \ 125 scalar multiplication, multi-scalar multiplications \ 126 pairings and multi-pairings \ 127 are implemented but not exposed 128 - _All operations are constant-time unless explicitly mentioned_ vartime 129 130 - The Pasta curves: Pallas and Vesta 131 - field arithmetic 132 - on Fr (i.e. modulo the 255-bit curve order) 133 - on Fp (i.e. modulo the 255-bit prime modulus) 134 - elliptic curve arithmetic: 135 - on elliptic curve over Fp (EC G1) with affine, jacobian and homogenous projective coordinates 136 - currently not exposed: \ 137 scalar multiplication, multi-scalar multiplications \ 138 are implemented but not exposed 139 - _All operations are constant-time unless explicitly mentioned_ vartime 140 141 ### From Nim 142 143 You can install the developement version of the library through nimble with the following command 144 ``` 145 nimble install https://github.com/mratsim/constantine@#master 146 ``` 147 148 ## Dependencies & Requirements 149 150 For speed it is recommended to use Clang (see [Compiler-caveats](#Compiler-caveats)). 151 In particular GCC generates inefficient add-with-carry code. 152 153 Constantine requires at least: 154 - GCC 7 \ 155 Previous versions generated incorrect add-with-carry code. 156 - Clang 14 \ 157 On x86-64, inline assembly is used to workaround compilers having issues optimizing large integer arithmetic, 158 and also ensure constant-time code. \ 159 Constantine uses the intel assembly syntax to address issues with the default AT&T syntax and constants propagated in Clang. \ 160 Clang 14 added support for `-masm=intel`. \ 161 \ 162 On MacOS, Apple Clang does not support Intel assembly syntax, use Homebrew Clang instead or compile without assembly.\ 163 _Note that Apple is discontinuing Intel CPU throughough their product line so this will impact only older model and Mac Pro_ 164 165 On Windows, Constantine is tested with MinGW. The Microsoft Visual C++ Compiler is not configured. 166 167 Constantine has no dependencies, even on Nim standard library except: 168 - for testing 169 - jsony for parsing json test vectors 170 - the Nim standard library for unittesting, formatting and datetime. 171 - GMP for testing against GMP 172 - for benchmarking 173 - The Nim standard libreary for timing and formatting 174 - for Nvidia GPU backend: 175 - the LLVM runtime ("dev" version with headers is not needed) 176 - the CUDA runtime ("dev" version with headers is not needed) 177 - at compile-time 178 - we need the std/macros library to generate Nim code. 179 180 ## Curves supported in the backend 181 182 _The backend, unlike protocols, is not public. Here be dragons._ 183 184 At the moment the following curves are implemented, adding a new curve only requires adding the prime modulus 185 and its bitsize in [constantine/config/curves.nim](constantine/math/config/curves_declaration.nim). 186 187 The following curves are configured: 188 189 - Pairing-Friendly curves 190 - BN254_Nogami 191 - BN254_Snarks (Zero-Knowledge Proofs, Snarks, Starks, Zcash, Ethereum 1) 192 - BLS12-377 (Zexe) 193 - BLS12-381 (Algorand, Chia Networks, Dfinity, Ethereum 2, Filecoin, Zcash Sapling) 194 - BW6-671 (Celo, EY Blockchain) (Pairings are WIP)\ 195 BLS12-377 is embedded in BW6-761 for one layer proof composition in zk-SNARKS. 196 - Embedded curves 197 - Jubjub, a curve embedded in BLS12-381 scalar field to be used in zk-SNARKS circuits. 198 - Bandersnatch, a more efficient curve embedded in BLS12-381 scalar field to be used in zk-SNARKS circuits. 199 - Other curves 200 - Edwards25519, used in ed25519 and X25519 from TLS 1.3 protocol and the Signal protocol. \ 201 With Ristretto, it can be used in bulletproofs. 202 - The Pasta curves (Pallas and Vesta) for the Halo 2 proof system (Zcash). 203 204 ## Security 205 206 Hardening an implementation against all existing and upcoming attack vectors is an extremely complex task. 207 The library is provided as is, without any guarantees at least until: 208 - it gets audited 209 - formal proofs of correctness are produced 210 - formal verification of constant-time implementation is possible 211 212 Defense against common attack vectors are provided on a best effort basis. 213 Do note that Constantine has no external package dependencies hence it is not vulnerable to 214 supply chain attacks (unless they affect a compiler or the OS). 215 216 Attackers may go to great lengths to retrieve secret data including: 217 - Timing the time taken to multiply on an elliptic curve 218 - Analysing the power usage of embedded devices 219 - Detecting cache misses when using lookup tables 220 - Memory attacks like page-faults, allocators, memory retention attacks 221 222 This is would be incomplete without mentioning that the hardware, OS and compiler 223 actively hinder you by: 224 - Hardware: sometimes not implementing multiplication in constant-time. 225 - OS: not providing a way to prevent memory paging to disk, core dumps, a debugger attaching to your process or a context switch (coroutines) leaking register data. 226 - Compiler: optimizing away your carefully crafted branchless code and leaking server secrets or optimizing away your secure erasure routine which is deemed "useless" because at the end of the function the data is not used anymore. 227 228 A growing number of attack vectors is being collected for your viewing pleasure 229 at https://github.com/mratsim/constantine/wiki/Constant-time-arithmetics 230 231 ### Disclaimer 232 233 Constantine's authors do their utmost to implement a secure cryptographic library 234 in particular against remote attack vectors like timing attacks. 235 236 Please note that Constantine is provided as-is without guarantees. 237 Use at your own risks. 238 239 Thorough evaluation of your threat model, the security of any cryptographic library you are considering, 240 and the secrets you put in jeopardy is strongly advised before putting data at risk. 241 The author would like to remind users that the best code can only mitigate 242 but not protect against human failures which are the weakest link and largest 243 backdoors to secrets exploited today. 244 245 ### Security disclosure 246 247 TODO 248 249 ## Performance 250 251 High-performance is a sought out property. 252 Note that security and side-channel resistance takes priority over performance. 253 254 New applications of elliptic curve cryptography like zero-knowledge proofs or 255 proof-of-stake based blockchain protocols are bottlenecked by cryptography. 256 257 ### In blockchain 258 259 Ethereum 2 clients spent or use to spend anywhere between 30% to 99% of their processing time verifying the signatures of block validators on R&D testnets 260 Assuming we want nodes to handle a thousand peers, if a cryptographic pairing takes 1ms, that represents 1s of cryptography per block to sign with a target 261 block frequency of 1 every 6 seconds. 262 263 ### In zero-knowledge proofs 264 265 According to https://medium.com/loopring-protocol/zksnark-prover-optimizations-3e9a3e5578c0 266 a 16-core CPU can prove 20 transfers/second or 10 transactions/second. 267 The previous implementation was 15x slower and one of the key optimizations 268 was changing the elliptic curve cryptography backend. 269 It had a direct implication on hardware cost and/or cloud computing resources required. 270 271 ### Measuring performance 272 273 To measure the performance of Constantine 274 275 ```bash 276 git clone https://github.com/mratsim/constantine 277 278 # Default compiler 279 nimble bench_fp 280 281 # Arithmetic 282 CC=clang nimble bench_fp # Using Clang + Assembly (recommended) 283 CC=clang nimble bench_fp2 284 CC=clang nimble bench_fp12 285 286 # Scalar multiplication and pairings 287 CC=clang nimble bench_ec_g1_scalar_mul 288 CC=clang nimble bench_ec_g2_scalar_mul 289 CC=clang nimble bench_pairing_bls12_381 290 291 # And per-curve summaries 292 CC=clang nimble bench_summary_bn254_nogami 293 CC=clang nimble bench_summary_bn254_snarks 294 CC=clang nimble bench_summary_bls12_377 295 CC=clang nimble bench_summary_bls12_381 296 297 # The Ethereum BLS signature protocol 298 CC=clang nimble bench_ethereum_bls_signatures 299 300 # Multi-scalar multiplication 301 CC=clang nimble bench_ec_g1_msm_bls12_381 302 CC=clang nimble bench_ec_g1_msm_bn256_snarks 303 ``` 304 305 The full list of benchmarks is available in the [`benchmarks`](./benchmarks) folder. 306 307 As mentioned in the [Compiler caveats](#compiler-caveats) section, GCC is up to 2x slower than Clang due to mishandling of carries and register usage. 308 309 #### Ethereum BLS signatures (over BLS12-381 G2) 310 311  312 313 #### BLS12-381 detailed benchmarks 314 315 On my machine i9-11980HK (8 cores 2.6GHz, turbo 5GHz), for Clang + Assembly, **all being constant-time** (including scalar multiplication, square root and inversion). 316 317  318 319  320  321  322 323 #### BN254-Snarks Multi-Scalar-Multiplication benchmarks 324 325 On a i9-9980XE (18 cores,watercooled, overclocked, 4.1GHz all core turbo) 326 327  328 329 #### Parallelism 330 331 Constantine multithreaded primitives are powered by a highly tuned threadpool and stress-tested for: 332 - scheduler overhead 333 - load balancing with extreme imbalance 334 - nested data parallelism 335 - contention 336 - speculative/conditional parallelism 337 338 and provides the following paradigms: 339 - Future-based task-parallelism 340 - Data parallelism (nestable and awaitable for loops) 341 - including arbitrary parallel reductions 342 - Dataflow parallelism / Stream parallelism / Graph Parallelism / Pipeline parallelism 343 - Structured Parallelism 344 345 The threadpool parallel-for loops use lazy loop splitting and are fully adaptative to the workload being scheduled, the threads in-flight load and the hardware speed unlike most (all?) runtime, see: 346 - OpenMP woes depending on hardware and workload: https://github.com/zy97140/omp-benchmark-for-pytorch 347 - Raytracing ideal runtime, adapt to pixel compute load: \ 348 Most (all?) production runtime use scheduling A (split on number of threads like GCC OpenMP) or B (eager splitting, unable to adapt to actual work like LLVM/Intel OpenMP or Intel TBB) while Constantine uses C. 349 350 The threadpool provides efficient backoff strategy to conserve power based on: 351 - eventcounts / futexes, for low overhead backoff 352 - log-log iterated backoff, a provably optimal backoff strategy used for wireless communication to minimize communication in parallel for-loops 353 354 The research papers on high performance multithreading available in Weave repo: https://github.com/mratsim/weave/tree/7682784/research.\ 355 _Note: The threadpool is not backed by Weave but by an inspired runtime that has been significantly simplified for ease of auditing. In particular it uses shared-memory based work-stealing instead of channel-based work-requesting for load balancing as distributed computing is not a target, ..., yet._ 356 357 ## Why Nim 358 359 The Nim language offers the following benefits for cryptography: 360 - Compilation to machine code via C or C++ or alternatively compilation to Javascript. Easy FFI to those languages. 361 - Obscure embedded devices with proprietary C compilers can be targeted. 362 - WASM can be targeted. 363 - Performance reachable in C is reachable in Nim, easily. 364 - Rich type system: generics, dependent types, mutability-tracking and side-effect analysis, borrow-checking, compiler enforced distinct types (Miles != Meters, SecretBool != bool and SecretWord != uint64). 365 - Compile-time evaluation, including parsing hex string, converting them to BigInt or Finite Field elements and doing bigint operations. 366 - Assembly support either inline or ``__attribute__((naked))`` or a simple `{.compile: "myasm.S".}` away 367 - No GC if no GC-ed types are used (automatic memory management is set at the type level and optimized for latency/soft-realtime by default and can be totally deactivated). 368 - Procedural macros working directly on AST to 369 - create generic curve configuration, 370 - derive constants 371 - write a size-independent inline assembly code generator 372 - Upcoming proof system for formal verification via Z3 ([DrNim](https://nim-lang.org/docs/drnim.html), [Correct-by-Construction RFC](https://github.com/nim-lang/RFCs/issues/222)) 373 ## Compiler caveats 374 375 Unfortunately compilers and in particular GCC are not very good at optimizing big integers and/or cryptographic code even when using intrinsics like `addcarry_u64`. 376 377 Compilers with proper support of `addcarry_u64` like Clang, MSVC and ICC 378 may generate code up to 20~25% faster than GCC. 379 380 This is explained by the GMP team: https://gmplib.org/manual/Assembly-Carry-Propagation.html 381 and can be reproduced with the following C code. 382 383 See https://gcc.godbolt.org/z/2h768y 384 ```C 385 #include <stdint.h> 386 #include <x86intrin.h> 387 388 void add256(uint64_t a[4], uint64_t b[4]){ 389 uint8_t carry = 0; 390 for (int i = 0; i < 4; ++i) 391 carry = _addcarry_u64(carry, a[i], b[i], &a[i]); 392 } 393 ``` 394 395 GCC 396 ```asm 397 add256: 398 movq (%rsi), %rax 399 addq (%rdi), %rax 400 setc %dl 401 movq %rax, (%rdi) 402 movq 8(%rdi), %rax 403 addb $-1, %dl 404 adcq 8(%rsi), %rax 405 setc %dl 406 movq %rax, 8(%rdi) 407 movq 16(%rdi), %rax 408 addb $-1, %dl 409 adcq 16(%rsi), %rax 410 setc %dl 411 movq %rax, 16(%rdi) 412 movq 24(%rsi), %rax 413 addb $-1, %dl 414 adcq %rax, 24(%rdi) 415 ret 416 ``` 417 418 Clang 419 ```asm 420 add256: 421 movq (%rsi), %rax 422 addq %rax, (%rdi) 423 movq 8(%rsi), %rax 424 adcq %rax, 8(%rdi) 425 movq 16(%rsi), %rax 426 adcq %rax, 16(%rdi) 427 movq 24(%rsi), %rax 428 adcq %rax, 24(%rdi) 429 retq 430 ``` 431 ### Inline assembly 432 433 While using intrinsics significantly improve code readability, portability, auditability and maintainability, 434 Constantine use inline assembly on x86-64 to ensure performance portability despite poor optimization (for GCC) 435 and also to use dedicated large integer instructions MULX, ADCX, ADOX that compilers cannot generate. 436 437 The speed improvement on finite field arithmetic is up 60% with MULX, ADCX, ADOX on BLS12-381 (6 limbs). 438 439 Finally assembly is a requirement to ensure constant-time property and to avoid compilers turning careful 440 branchless code into branches, see [Fighting the compiler (wiki)](https://github.com/mratsim/constantine/wiki/Constant-time-arithmetics#fighting-the-compiler) 441 442 In summary, pure C/C++/Nim implies: 443 - a smart compiler might unravel the constant time bit manipulation and reintroduce branches. 444 - a significant performance cost with GCC (~50% slower than Clang). 445 - missed opportunities on recent CPUs that support MULX/ADCX/ADOX instructions (~60% faster than Clang). 446 - 2.4x perf ratio between using plain GCC vs GCC with inline assembly. 447 448 ## Sizes: code size, stack usage 449 450 Thanks to 10x smaller key sizes for the same security level as RSA, elliptic curve cryptography 451 is widely used on resource-constrained devices. 452 453 Constantine is actively optimize for code-size and stack usage. 454 Constantine does not use heap allocation. 455 456 At the moment Constantine is optimized for 32-bit and 64-bit CPUs. 457 458 When performance and code size conflicts, a careful and informed default is chosen. 459 In the future, a compile-time flag that goes beyond the compiler `-Os` might be provided. 460 ## License 461 462 Licensed and distributed under either of 463 464 * MIT license: [LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT 465 466 or 467 468 * Apache License, Version 2.0, ([LICENSE-APACHEv2](LICENSE-APACHEv2) or http://www.apache.org/licenses/LICENSE-2.0) 469 470 at your option. This file may not be copied, modified, or distributed except according to those terms. 471 472 This library has **no external dependencies**. 473 In particular GMP is used only for testing and differential fuzzing 474 and is not linked in the library.