/ README.md
README.md
  1  # Constantine - Fast, compact, hardened Pairing-Based Cryptography
  2  
  3  [![License: Apache](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
  4  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
  5  ![Stability: experimental](https://img.shields.io/badge/stability-experimental-orange.svg)\
  6  [![Github Actions CI](https://github.com/mratsim/constantine/workflows/Constantine%20CI/badge.svg)](https://github.com/mratsim/constantine/actions?query=workflow%3A%22Constantine+CI%22+branch%3Amaster)\
  7  [![Build Status: Travis](https://img.shields.io/travis/com/mratsim/constantine/master?label=Travis%20%28Linux%20ARM64%2FPowerPC64%29)](https://travis-ci.com/mratsim/constantine)\
  8  [![Build Status: Azure](https://img.shields.io/azure-devops/build/numforge/07a2a7a5-995a-45d3-acd5-f5456fe7b04d/4?label=Azure%20%28Linux%2032%2F64-bit%2C%20Windows%2032%2F64-bit%2C%20MacOS%2064-bit%29)](https://dev.azure.com/numforge/Constantine/_build?definitionId=4&branchName=master)
  9  
 10  > “A cryptographic system should be secure even if everything about the system, except the key, is public knowledge.”\
 11  >   — Auguste Kerckhoffs
 12  
 13  This library provides [constant-time](https://en.wikipedia.org/wiki/Timing_attack) implementation of cryptographic protocols
 14  with a particular focus on pairing-based cryptography as used in blockchains and zero-knowledge protocols.
 15  
 16  The implementations are accompanied with SAGE code used as reference implementation and test vectors generators before writing highly optimized routines implemented in the [Nim language](https://nim-lang.org/)
 17  
 18  > The library is in development state and high-level wrappers or example protocols are work-in-progress.
 19  
 20  ## Table of Contents
 21  
 22  <!-- TOC -->
 23  
 24  - [Constantine - Fast, compact, hardened Pairing-Based Cryptography](#constantine---fast-compact-hardened-pairing-based-cryptography)
 25    - [Table of Contents](#table-of-contents)
 26    - [Target audience](#target-audience)
 27    - [Protocols](#protocols)
 28    - [Installation](#installation)
 29      - [From C](#from-c)
 30      - [From Nim](#from-nim)
 31    - [Dependencies & Requirements](#dependencies--requirements)
 32    - [Curves supported in the backend](#curves-supported-in-the-backend)
 33    - [Security](#security)
 34      - [Disclaimer](#disclaimer)
 35      - [Security disclosure](#security-disclosure)
 36    - [Performance](#performance)
 37      - [In blockchain](#in-blockchain)
 38      - [In zero-knowledge proofs](#in-zero-knowledge-proofs)
 39      - [Measuring performance](#measuring-performance)
 40        - [Ethereum BLS signatures over BLS12-381 G2](#ethereum-bls-signatures-over-bls12-381-g2)
 41        - [BLS12-381 detailed benchmarks](#bls12-381-detailed-benchmarks)
 42        - [BN254-Snarks Multi-Scalar-Multiplication benchmarks](#bn254-snarks-multi-scalar-multiplication-benchmarks)
 43        - [Parallelism](#parallelism)
 44    - [Why Nim](#why-nim)
 45    - [Compiler caveats](#compiler-caveats)
 46      - [Inline assembly](#inline-assembly)
 47    - [Sizes: code size, stack usage](#sizes-code-size-stack-usage)
 48    - [License](#license)
 49  
 50  <!-- /TOC -->
 51  
 52  ## Target audience
 53  
 54  The library aims to be a fast, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proofs system.
 55  
 56  The library focuses on following properties:
 57  - constant-time (not leaking secret data via [side-channels](https://en.wikipedia.org/wiki/Side-channel_attack))
 58  - performance
 59  - generated code size, datatype size and stack usage
 60  
 61  in this order.
 62  
 63  ## Protocols
 64  
 65  Protocols are a set of routines, designed for specific goals or a combination thereof:
 66  - confidentiality: only the intended receiver of a message can read it
 67  - authentication: the other party in the communication is the expected part
 68  - integrity: the received message has not been tampered with
 69  - non-repudiation: the sender of a message cannot repudiated it
 70  
 71  Protocols to address these goals, (authenticated) encryption, signature, traitor-tracing, etc
 72  are designed.\
 73  Note: some goals might be mutually exclusive, for example "plausible deniability" and "non-repudiation".
 74  
 75  ## Installation
 76  
 77  ### From C
 78  
 79  1. Install a C compiler, for example:
 80      - Debian/Ubuntu `sudo apt update && sudo apt install build-essential`
 81      - Archlinux `pacman -S base-devel`
 82  
 83  2. Install nim, it is available in most distros package manager for Linux and Homebrew for MacOS
 84     Windows binaries are on the official website: https://nim-lang.org/install_unix.html
 85      - Debian/Ubuntu `sudo apt install nim`
 86      - Archlinux `pacman -S nim`
 87  
 88  3. Compile the bindings.
 89      - Recommended: \
 90        `CC:clang nimble bindings`
 91      - or `nimble bindings_no_asm`\
 92       to compile without assembly (otherwise it autodetects support)
 93      - or with default compiler\
 94        `nimble bindings`
 95  
 96  4. Ensure bindings work
 97      - `nimble test_bindings`
 98  
 99  5. Bindings location
100      - The bindings are put in `constantine/lib`
101      - The headers are in [constantine/include](./include) for example [Ethereum BLS signatures](./include/constantine_ethereum_bls_signatures.h)
102  
103  6. Read the examples in [examples_c](./examples_c):
104     - Using the [Ethereum BLS signatures bindings from C](./examples_c/ethereum_bls_signatures.c)
105     - Testing Constantine BLS12-381 vs GMP [./examples_c/t_libctt_bls12_381.c](./examples_c/t_libctt_bls12_381.c)
106  
107  The bindings currently provided are:
108  
109  - Ethereum BLS signatures on BLS12-381 G2
110    Cryptographic suite: `BLS_SIG_BLS12381G2_XMD:SHA-256_SSWU_RO_POP_`
111  
112    This scheme is also used in the following blockchains:
113    Algorand, Chia, Dfinity, Filecoin, Tezos, Zcash.
114    They may have their pubkeys on G1 and signatures on G2 like Ethereum or the other way around.
115  
116  - BLS12-381 arithmetic:
117    - field arithmetic
118      - on Fr (i.e. modulo the 255-bit curve order)
119      - on Fp (i.e. modulo the 381-bit prime modulus)
120      - on Fp2
121    - elliptic curve arithmetic:
122      - on elliptic curve over Fp (EC G1) with affine, jacobian and homogenous projective coordinates
123      - on elliptic curve over Fp2 (EC G2) with affine, jacobian and homogenous projective coordinates
124    - currently not exposed: \
125      scalar multiplication, multi-scalar multiplications \
126      pairings and multi-pairings \
127      are implemented but not exposed
128    - _All operations are constant-time unless explicitly mentioned_ vartime
129  
130  - The Pasta curves: Pallas and Vesta
131    - field arithmetic
132      - on Fr (i.e. modulo the 255-bit curve order)
133      - on Fp (i.e. modulo the 255-bit prime modulus)
134    - elliptic curve arithmetic:
135      - on elliptic curve over Fp (EC G1) with affine, jacobian and homogenous projective coordinates
136    - currently not exposed: \
137      scalar multiplication, multi-scalar multiplications \
138      are implemented but not exposed
139    - _All operations are constant-time unless explicitly mentioned_ vartime
140  
141  ### From Nim
142  
143  You can install the developement version of the library through nimble with the following command
144  ```
145  nimble install https://github.com/mratsim/constantine@#master
146  ```
147  
148  ## Dependencies & Requirements
149  
150  For speed it is recommended to use Clang (see [Compiler-caveats](#Compiler-caveats)).
151  In particular GCC generates inefficient add-with-carry code.
152  
153  Constantine requires at least:
154  - GCC 7 \
155    Previous versions generated incorrect add-with-carry code.
156  - Clang 14 \
157    On x86-64, inline assembly is used to workaround compilers having issues optimizing large integer arithmetic,
158    and also ensure constant-time code. \
159    Constantine uses the intel assembly syntax to address issues with the default AT&T syntax and constants propagated in Clang. \
160    Clang 14 added support for `-masm=intel`. \
161    \
162    On MacOS, Apple Clang does not support Intel assembly syntax, use Homebrew Clang instead or compile without assembly.\
163    _Note that Apple is discontinuing Intel CPU throughough their product line so this will impact only older model and Mac Pro_
164  
165  On Windows, Constantine is tested with MinGW. The Microsoft Visual C++ Compiler is not configured.
166  
167  Constantine has no dependencies, even on Nim standard library except:
168  - for testing
169    - jsony for parsing json test vectors
170    - the Nim standard library for unittesting, formatting and datetime.
171    - GMP for testing against GMP
172  - for benchmarking
173    - The Nim standard libreary for timing and formatting
174  - for Nvidia GPU backend:
175    - the LLVM runtime ("dev" version with headers is not needed)
176    - the CUDA runtime ("dev" version with headers is not needed)
177  - at compile-time
178    - we need the std/macros library to generate Nim code.
179  
180  ## Curves supported in the backend
181  
182  _The backend, unlike protocols, is not public. Here be dragons._
183  
184  At the moment the following curves are implemented, adding a new curve only requires adding the prime modulus
185  and its bitsize in [constantine/config/curves.nim](constantine/math/config/curves_declaration.nim).
186  
187  The following curves are configured:
188  
189  - Pairing-Friendly curves
190    - BN254_Nogami
191    - BN254_Snarks (Zero-Knowledge Proofs, Snarks, Starks, Zcash, Ethereum 1)
192    - BLS12-377 (Zexe)
193    - BLS12-381 (Algorand, Chia Networks, Dfinity, Ethereum 2, Filecoin, Zcash Sapling)
194    - BW6-671 (Celo, EY Blockchain) (Pairings are WIP)\
195    BLS12-377 is embedded in BW6-761 for one layer proof composition in zk-SNARKS.
196  - Embedded curves
197    - Jubjub, a curve embedded in BLS12-381 scalar field to be used in zk-SNARKS circuits.
198    - Bandersnatch, a more efficient curve embedded in BLS12-381 scalar field to be used in zk-SNARKS circuits.
199  - Other curves
200    - Edwards25519, used in ed25519 and X25519 from TLS 1.3 protocol and the Signal protocol. \
201      With Ristretto, it can be used in bulletproofs.
202    - The Pasta curves (Pallas and Vesta) for the Halo 2 proof system (Zcash).
203  
204  ## Security
205  
206  Hardening an implementation against all existing and upcoming attack vectors is an extremely complex task.
207  The library is provided as is, without any guarantees at least until:
208  - it gets audited
209  - formal proofs of correctness are produced
210  - formal verification of constant-time implementation is possible
211  
212  Defense against common attack vectors are provided on a best effort basis.
213  Do note that Constantine has no external package dependencies hence it is not vulnerable to
214  supply chain attacks (unless they affect a compiler or the OS).
215  
216  Attackers may go to great lengths to retrieve secret data including:
217  - Timing the time taken to multiply on an elliptic curve
218  - Analysing the power usage of embedded devices
219  - Detecting cache misses when using lookup tables
220  - Memory attacks like page-faults, allocators, memory retention attacks
221  
222  This is would be incomplete without mentioning that the hardware, OS and compiler
223  actively hinder you by:
224  - Hardware: sometimes not implementing multiplication in constant-time.
225  - OS: not providing a way to prevent memory paging to disk, core dumps, a debugger attaching to your process or a context switch (coroutines) leaking register data.
226  - Compiler: optimizing away your carefully crafted branchless code and leaking server secrets or optimizing away your secure erasure routine which is deemed "useless" because at the end of the function the data is not used anymore.
227  
228  A growing number of attack vectors is being collected for your viewing pleasure
229  at https://github.com/mratsim/constantine/wiki/Constant-time-arithmetics
230  
231  ### Disclaimer
232  
233  Constantine's authors do their utmost to implement a secure cryptographic library
234  in particular against remote attack vectors like timing attacks.
235  
236  Please note that Constantine is provided as-is without guarantees.
237  Use at your own risks.
238  
239  Thorough evaluation of your threat model, the security of any cryptographic library you are considering,
240  and the secrets you put in jeopardy is strongly advised before putting data at risk.
241  The author would like to remind users that the best code can only mitigate
242  but not protect against human failures which are the weakest link and largest
243  backdoors to secrets exploited today.
244  
245  ### Security disclosure
246  
247  TODO
248  
249  ## Performance
250  
251  High-performance is a sought out property.
252  Note that security and side-channel resistance takes priority over performance.
253  
254  New applications of elliptic curve cryptography like zero-knowledge proofs or
255  proof-of-stake based blockchain protocols are bottlenecked by cryptography.
256  
257  ### In blockchain
258  
259  Ethereum 2 clients spent or use to spend anywhere between 30% to 99% of their processing time verifying the signatures of block validators on R&D testnets
260  Assuming we want nodes to handle a thousand peers, if a cryptographic pairing takes 1ms, that represents 1s of cryptography per block to sign with a target
261  block frequency of 1 every 6 seconds.
262  
263  ### In zero-knowledge proofs
264  
265  According to https://medium.com/loopring-protocol/zksnark-prover-optimizations-3e9a3e5578c0
266  a 16-core CPU can prove 20 transfers/second or 10 transactions/second.
267  The previous implementation was 15x slower and one of the key optimizations
268  was changing the elliptic curve cryptography backend.
269  It had a direct implication on hardware cost and/or cloud computing resources required.
270  
271  ### Measuring performance
272  
273  To measure the performance of Constantine
274  
275  ```bash
276  git clone https://github.com/mratsim/constantine
277  
278  # Default compiler
279  nimble bench_fp
280  
281  # Arithmetic
282  CC=clang nimble bench_fp  # Using Clang + Assembly (recommended)
283  CC=clang nimble bench_fp2
284  CC=clang nimble bench_fp12
285  
286  # Scalar multiplication and pairings
287  CC=clang nimble bench_ec_g1_scalar_mul
288  CC=clang nimble bench_ec_g2_scalar_mul
289  CC=clang nimble bench_pairing_bls12_381
290  
291  # And per-curve summaries
292  CC=clang nimble bench_summary_bn254_nogami
293  CC=clang nimble bench_summary_bn254_snarks
294  CC=clang nimble bench_summary_bls12_377
295  CC=clang nimble bench_summary_bls12_381
296  
297  # The Ethereum BLS signature protocol
298  CC=clang nimble bench_ethereum_bls_signatures
299  
300  # Multi-scalar multiplication
301  CC=clang nimble bench_ec_g1_msm_bls12_381
302  CC=clang nimble bench_ec_g1_msm_bn256_snarks
303  ```
304  
305  The full list of benchmarks is available in the [`benchmarks`](./benchmarks) folder.
306  
307  As mentioned in the [Compiler caveats](#compiler-caveats) section, GCC is up to 2x slower than Clang due to mishandling of carries and register usage.
308  
309  #### Ethereum BLS signatures (over BLS12-381 G2)
310  
311  ![Bench Ethereum BLS signature](./media/ethereum_bls_signatures.png)
312  
313  #### BLS12-381 detailed benchmarks
314  
315  On my machine i9-11980HK (8 cores 2.6GHz, turbo 5GHz), for Clang + Assembly, **all being constant-time** (including scalar multiplication, square root and inversion).
316  
317  ![BLS12-381 perf summary](./media/bls12_381_perf_summary_i9-11980HK.png)
318  
319  ![BLS12-381 Multi-Scalar multiplication 1](./media/bls12_381_msm_i9-11980HK-8cores_1.png)
320  ![BLS12-381 Multi-Scalar multiplication 2](./media/bls12_381_msm_i9-11980HK-8cores_2.png)
321  ![BLS12-381 Multi-Scalar multiplication 3](./media/bls12_381_msm_i9-11980HK-8cores_3.png)
322  
323  #### BN254-Snarks Multi-Scalar-Multiplication benchmarks
324  
325  On a i9-9980XE (18 cores,watercooled, overclocked, 4.1GHz all core turbo)
326  
327  ![BN254-Snarks multi-scalar multiplication](./media/bn254_snarks_msm-i9-9980XE-18cores.png)
328  
329  #### Parallelism
330  
331  Constantine multithreaded primitives are powered by a highly tuned threadpool and stress-tested for:
332  - scheduler overhead
333  - load balancing with extreme imbalance
334  - nested data parallelism
335  - contention
336  - speculative/conditional parallelism
337  
338  and provides the following paradigms:
339  - Future-based task-parallelism
340  - Data parallelism (nestable and awaitable for loops)
341    - including arbitrary parallel reductions
342  - Dataflow parallelism / Stream parallelism / Graph Parallelism / Pipeline parallelism
343  - Structured Parallelism
344  
345  The threadpool parallel-for loops use lazy loop splitting and are fully adaptative to the workload being scheduled, the threads in-flight load and the hardware speed unlike most (all?) runtime, see:
346  - OpenMP woes depending on hardware and workload: https://github.com/zy97140/omp-benchmark-for-pytorch
347  - Raytracing ideal runtime, adapt to pixel compute load: ![load distribution](./media/parallel_load_distribution.png)\
348    Most (all?) production runtime use scheduling A (split on number of threads like GCC OpenMP) or B (eager splitting, unable to adapt to actual work like LLVM/Intel OpenMP or Intel TBB) while Constantine uses C.
349  
350  The threadpool provides efficient backoff strategy to conserve power based on:
351  - eventcounts / futexes, for low overhead backoff
352  - log-log iterated backoff, a provably optimal backoff strategy used for wireless communication to minimize communication in parallel for-loops
353  
354  The research papers on high performance multithreading available in Weave repo: https://github.com/mratsim/weave/tree/7682784/research.\
355  _Note: The threadpool is not backed by Weave but by an inspired runtime that has been significantly simplified for ease of auditing. In particular it uses shared-memory based work-stealing instead of channel-based work-requesting for load balancing as distributed computing is not a target, ..., yet._
356  
357  ## Why Nim
358  
359  The Nim language offers the following benefits for cryptography:
360  - Compilation to machine code via C or C++ or alternatively compilation to Javascript. Easy FFI to those languages.
361    - Obscure embedded devices with proprietary C compilers can be targeted.
362    - WASM can be targeted.
363  - Performance reachable in C is reachable in Nim, easily.
364  - Rich type system: generics, dependent types, mutability-tracking and side-effect analysis, borrow-checking, compiler enforced distinct types (Miles != Meters, SecretBool != bool and SecretWord != uint64).
365  - Compile-time evaluation, including parsing hex string, converting them to BigInt or Finite Field elements and doing bigint operations.
366  - Assembly support either inline or ``__attribute__((naked))`` or a simple `{.compile: "myasm.S".}` away
367  - No GC if no GC-ed types are used (automatic memory management is set at the type level and optimized for latency/soft-realtime by default and can be totally deactivated).
368  - Procedural macros working directly on AST to
369    - create generic curve configuration,
370    - derive constants
371    - write a size-independent inline assembly code generator
372  - Upcoming proof system for formal verification via Z3 ([DrNim](https://nim-lang.org/docs/drnim.html), [Correct-by-Construction RFC](https://github.com/nim-lang/RFCs/issues/222))
373  ## Compiler caveats
374  
375  Unfortunately compilers and in particular GCC are not very good at optimizing big integers and/or cryptographic code even when using intrinsics like `addcarry_u64`.
376  
377  Compilers with proper support of `addcarry_u64` like Clang, MSVC and ICC
378  may generate code up to 20~25% faster than GCC.
379  
380  This is explained by the GMP team: https://gmplib.org/manual/Assembly-Carry-Propagation.html
381  and can be reproduced with the following C code.
382  
383  See https://gcc.godbolt.org/z/2h768y
384  ```C
385  #include <stdint.h>
386  #include <x86intrin.h>
387  
388  void add256(uint64_t a[4], uint64_t b[4]){
389    uint8_t carry = 0;
390    for (int i = 0; i < 4; ++i)
391      carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
392  }
393  ```
394  
395  GCC
396  ```asm
397  add256:
398          movq    (%rsi), %rax
399          addq    (%rdi), %rax
400          setc    %dl
401          movq    %rax, (%rdi)
402          movq    8(%rdi), %rax
403          addb    $-1, %dl
404          adcq    8(%rsi), %rax
405          setc    %dl
406          movq    %rax, 8(%rdi)
407          movq    16(%rdi), %rax
408          addb    $-1, %dl
409          adcq    16(%rsi), %rax
410          setc    %dl
411          movq    %rax, 16(%rdi)
412          movq    24(%rsi), %rax
413          addb    $-1, %dl
414          adcq    %rax, 24(%rdi)
415          ret
416  ```
417  
418  Clang
419  ```asm
420  add256:
421          movq    (%rsi), %rax
422          addq    %rax, (%rdi)
423          movq    8(%rsi), %rax
424          adcq    %rax, 8(%rdi)
425          movq    16(%rsi), %rax
426          adcq    %rax, 16(%rdi)
427          movq    24(%rsi), %rax
428          adcq    %rax, 24(%rdi)
429          retq
430  ```
431  ### Inline assembly
432  
433  While using intrinsics significantly improve code readability, portability, auditability and maintainability,
434  Constantine use inline assembly on x86-64 to ensure performance portability despite poor optimization (for GCC)
435  and also to use dedicated large integer instructions MULX, ADCX, ADOX that compilers cannot generate.
436  
437  The speed improvement on finite field arithmetic is up 60% with MULX, ADCX, ADOX on BLS12-381 (6 limbs).
438  
439  Finally assembly is a requirement to ensure constant-time property and to avoid compilers turning careful
440  branchless code into branches, see [Fighting the compiler (wiki)](https://github.com/mratsim/constantine/wiki/Constant-time-arithmetics#fighting-the-compiler)
441  
442  In summary, pure C/C++/Nim implies:
443  - a smart compiler might unravel the constant time bit manipulation and reintroduce branches.
444  - a significant performance cost with GCC (~50% slower than Clang).
445  - missed opportunities on recent CPUs that support MULX/ADCX/ADOX instructions (~60% faster than Clang).
446  - 2.4x perf ratio between using plain GCC vs GCC with inline assembly.
447  
448  ## Sizes: code size, stack usage
449  
450  Thanks to 10x smaller key sizes for the same security level as RSA, elliptic curve cryptography
451  is widely used on resource-constrained devices.
452  
453  Constantine is actively optimize for code-size and stack usage.
454  Constantine does not use heap allocation.
455  
456  At the moment Constantine is optimized for 32-bit and 64-bit CPUs.
457  
458  When performance and code size conflicts, a careful and informed default is chosen.
459  In the future, a compile-time flag that goes beyond the compiler `-Os` might be provided.
460  ## License
461  
462  Licensed and distributed under either of
463  
464  * MIT license: [LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT
465  
466  or
467  
468  * Apache License, Version 2.0, ([LICENSE-APACHEv2](LICENSE-APACHEv2) or http://www.apache.org/licenses/LICENSE-2.0)
469  
470  at your option. This file may not be copied, modified, or distributed except according to those terms.
471  
472  This library has **no external dependencies**.
473  In particular GMP is used only for testing and differential fuzzing
474  and is not linked in the library.