optimizations.md
1 # Optimizations 2 3 This document lists the optimizations relevant to an elliptic curve or pairing-based cryptography library and whether Constantine has them implemented. 4 5 The optimizations can be of algebraic, algorithmic or "implementation details" nature. Using non-constant time code is always possible, it is listed if the speedup is significant. 6 7 ## Big Integers 8 9 - Conditional copy 10 - [x] Loop unrolling 11 - [x] x86: Conditional Mov 12 - [x] x86: Full Assembly implementation 13 - [ ] SIMD instructions 14 - Add/Sub 15 - [x] int128 16 - [x] add-with-carry, sub-with-borrow intrinsics 17 - [x] loop unrolling 18 - [x] x86: Full Assembly implementation 19 - Multiplication 20 - [x] int128 21 - [x] loop unrolling 22 - [x] Comba multiplication / product Scanning 23 - [ ] Karatsuba 24 - [ ] Karatsuba + Comba 25 - [x] x86: Full Assembly implementation 26 - [x] x86: MULX, ADCX, ADOX instructions 27 - [x] Fused Multiply + Shift-right by word (for Barrett Reduction and approximating multiplication by fractional constant) 28 - Squaring 29 - [x] Dedicated squaring functions 30 - [x] int128 31 - [x] loop unrolling 32 - [x] x86: Full Assembly implementation 33 - [x] x86: MULX, ADCX, ADOX instructions 34 35 ## Finite Fields & Modular Arithmetic 36 37 - Representation 38 - [x] Montgomery Representation 39 - [ ] Barret Reduction 40 - [ ] Unsaturated Representation 41 - [ ] Mersenne Prime (2ᵏ - 1), 42 - [ ] Generalized Mersenne Prime (NIST Prime P256: 2^256 - 2^224 + 2^192 + 2^96 - 1) 43 - [ ] Pseudo-Mersenne Prime (2^m - k for example Edwards25519: 2^255 - 19) 44 - [ ] Golden Primes (φ^2 - φ - 1 with φ = 2ᵏ for example Ed448-Goldilocks: 2^448 - 2^224 - 1) 45 - [ ] any prime modulus (lazy carry) 46 47 - Montgomery Reduction 48 - [x] int128 49 - [x] loop unrolling 50 - [x] x86: Full Assembly implementation 51 - [x] x86: MULX, ADCX, ADOX instructions 52 53 - Addition/substraction 54 - [x] int128 55 - [x] add-with-carry, sub-with-borrow intrinsics 56 - [x] loop unrolling 57 - [x] x86: Full Assembly implementation 58 - [x] Addition-chain for small constants 59 60 - Montgomery Multiplication 61 - [x] Fused multiply + reduce 62 - [x] int128 63 - [x] loop unrolling 64 - [x] x86: Full Assembly implementation 65 - [x] x86: MULX, ADCX, ADOX instructions 66 - [x] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning) 67 - [x] FIPS (Finely Integrated Operand Scanning) 68 69 - Montgomery Squaring 70 - [x] Dedicated squaring functions 71 - [x] Fused multiply + reduce 72 - [x] int128 73 - [x] loop unrolling 74 - [x] x86: Full Assembly implementation 75 - [x] x86: MULX, ADCX, ADOX instructions 76 - [ ] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning) 77 78 - Addition chains 79 - [x] unreduced squarings/multiplications in addition chains 80 81 - Exponentiation 82 - [x] variable-time exponentiation 83 - [x] fixed window optimization _(sliding windows are not constant-time)_ 84 - [ ] NAF recoding 85 - [ ] windowed-NAF recoding 86 - [ ] SIMD vectorized select in window algorithm 87 - [x] Montgomery Multiplication with no final substraction, 88 - Bos and Montgomery, https://eprint.iacr.org/2017/1057.pdf 89 - Colin D Walter, https://colinandmargaret.co.uk/Research/CDW_ELL_99.pdf 90 - Hachez and Quisquater, https://link.springer.com/content/pdf/10.1007%2F3-540-44499-8_23.pdf 91 - Gueron, https://eprint.iacr.org/2011/239.pdf 92 - [ ] Pippenger multi-exponentiation (variable-time) 93 - [ ] parallelized Pippenger 94 95 - Inversion (constant-time baseline, Little-Fermat inversion via a^(p-2)) 96 - [x] Constant-time binary GCD algorithm by Möller, algorithm 5 in https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf 97 - [x] Addition-chain for a^(p-2) 98 - [x] Constant-time binary GCD algorithm by Bernstein-Yang, https://eprint.iacr.org/2019/266 99 - [ ] Constant-time binary GCD algorithm by Pornin, https://eprint.iacr.org/2020/972 100 - [x] Constant-time binary GCD algorithm by BY with half-delta optimization by libsecp256k1, formally verified, https://eprint.iacr.org/2021/549 101 - [x] Simultaneous inversion 102 103 - Square Root (constant-time) 104 - [x] baseline sqrt via Little-Fermat for `p ≡ 3 (mod 4)` 105 - [x] baseline sqrt via Little-Fermat for `p ≡ 5 (mod 8)` 106 - [ ] baseline sqrt via Little-Fermat for `p ≡ 9 (mod 16)` 107 - [x] baseline sqrt via Tonelli-Shanks for any prime. 108 - [x] sqrt via addition-chain 109 - [x] Fused sqrt + testIfSquare (Euler Criterion or Legendre symbol or Kronecker symbol) 110 - [x] Fused sqrt + 1/sqrt 111 - [x] Fused sqrt + 1/sqrt + testIfSquare 112 113 ## Extension Fields 114 115 - [x] Lazy reduction via double-precision base fields 116 - [x] Sparse multiplication 117 - Fp2 118 - [x] complex multiplication 119 - [x] complex squaring 120 - [x] sqrt via the constant-time complex method (Adj et al) 121 - [x] sqrt using addition chain 122 - [x] fused complex method sqrt by rotating in complex plane 123 - Cubic extension fields 124 - [x] Toom-Cook polynomial multiplication (Chung-Hasan) 125 126 ## Elliptic curve 127 128 - Weierstrass curves: 129 - [x] Affine coordinates 130 - [x] Homogeneous projective coordinates 131 - [x] Projective complete formulae 132 - [x] Mixed addition 133 - [x] Jacobian projective coordinates 134 - [x] Jacobian complete formulae 135 - [x] Mixed addition 136 - [ ] Conjugate Mixed Addition 137 - [ ] Composites Double-Add 2P+Q, tripling, quadrupling, quintupling, octupling 138 139 - [x] scalar multiplication 140 - [x] fixed window optimization 141 - [ ] constant-time NAF recoding 142 - [ ] constant-time windowed-NAF recoding 143 - [ ] SIMD vectorized select in window algorithm 144 - [x] constant-time endomorphism acceleration 145 - [ ] using NAF recoding 146 - [x] using GLV-SAC recoding 147 - [x] constant-time windowed-endomorphism acceleration 148 - [ ] using wNAF recoding 149 - [x] using windowed GLV-SAC recoding 150 - [ ] SIMD vectorized select in window algorithm 151 - [ ] Fixed-base scalar mul 152 153 - [ ] Multi-scalar-mul 154 - [ ] Strauss multi-scalar-mul 155 - [ ] Bos-Coster multi-scalar-mul 156 - [ ] Pippenger multi-scalar-mul (variable-time) 157 - [ ] parallelized Pippenger 158 159 ## Pairings 160 161 - Frobenius maps 162 - [x] Sparse Frobenius coefficients 163 - [x] Coalesced Frobenius in towered Fields 164 - [x] Coalesced Frobenius powers 165 166 - Line functions 167 - [x] Homogeneous projective coordinates 168 - [x] D-Twist 169 - [x] Fused line add + elliptic curve add 170 - [x] Fused line double + elliptic curve double 171 - [x] M-Twist 172 - [x] Fused line add + elliptic curve add 173 - [x] Fused line double + elliptic curve double 174 - [x] 6-way sparse multiplication line * Gₜ element 175 - [ ] Jacobian projective coordinates 176 - [ ] D-Twist 177 - [ ] Fused line add + elliptic curve add 178 - [ ] Fused line double + elliptic curve double 179 - [ ] M-Twist 180 - [ ] Fused line add + elliptic curve add 181 - [ ] Fused line double + elliptic curve double 182 - [x] 6-way sparse multiplication line * Gₜ element 183 - [ ] Affine coordinates 184 - [ ] 7-way sparse multiplication line * Gₜ element 185 - [ ] Pseudo-8 sparse multiplication line * Gₜ element 186 187 - Miller Loop 188 - [x] NAF recoding 189 - [ ] Quadruple-and-add and Octuple-and-add 190 - [x] addition chains 191 192 - Final exponentiation 193 - [x] Cyclotomic squaring 194 - [x] Karabina's compressed cyclotomic squarings 195 - [x] Addition-chain for exponentiation by curve parameter 196 - [x] BN curves: Fuentes-Castañeda 197 - [ ] BN curves: Duquesne, Ghammam 198 - [ ] BLS curves: Ghamman, Fouotsa 199 - [x] BLS curves: Hayashida, Hayasaka, Teruya 200 201 - [x] Multi-pairing 202 - [ ] Line accumulation 203 - [ ] Parallel Multi-Pairing 204 205 ## Hash-to-curve 206 207 - Clear cofactor 208 - [x] BLS G1: Wahby-Boneh 209 - [ ] BLS G2: Scott et al 210 - [ ] BLS G2: Fuentes-Castañeda 211 - [x] BLS G2: Budroni et al, endomorphism accelerated 212 - [x] BN G2: Fuentes-Castañeda 213 - [ ] BW6-761 G1 214 - [ ] BW6-761 G2 215 216 - Subgroup check 217 - [ ] BLS G1: Bowe, endomorphism accelerated 218 - [ ] BLS G2: Bowe, endomorphism accelerated 219 - [x] BLS G1: Scott, endomorphism accelerated 220 - [x] BLS G2: Scott, endomorphism accelerated