/ docs / optimizations.md
optimizations.md
  1  # Optimizations
  2  
  3  This document lists the optimizations relevant to an elliptic curve or pairing-based cryptography library and whether Constantine has them implemented.
  4  
  5  The optimizations can be of algebraic, algorithmic or "implementation details" nature. Using non-constant time code is always possible, it is listed if the speedup is significant.
  6  
  7  ## Big Integers
  8  
  9  - Conditional copy
 10    - [x] Loop unrolling
 11    - [x] x86: Conditional Mov
 12    - [x] x86: Full Assembly implementation
 13    - [ ] SIMD instructions
 14  - Add/Sub
 15    - [x] int128
 16    - [x] add-with-carry, sub-with-borrow intrinsics
 17    - [x] loop unrolling
 18    - [x] x86: Full Assembly implementation
 19  - Multiplication
 20    - [x] int128
 21    - [x] loop unrolling
 22    - [x] Comba multiplication / product Scanning
 23    - [ ] Karatsuba
 24    - [ ] Karatsuba + Comba
 25    - [x] x86: Full Assembly implementation
 26    - [x] x86: MULX, ADCX, ADOX instructions
 27    - [x] Fused Multiply + Shift-right by word (for Barrett Reduction and approximating multiplication by fractional constant)
 28  - Squaring
 29    - [x] Dedicated squaring functions
 30    - [x] int128
 31    - [x] loop unrolling
 32    - [x] x86: Full Assembly implementation
 33    - [x] x86: MULX, ADCX, ADOX instructions
 34  
 35  ## Finite Fields & Modular Arithmetic
 36  
 37  - Representation
 38    - [x] Montgomery Representation
 39    - [ ] Barret Reduction
 40    - [ ] Unsaturated Representation
 41      - [ ] Mersenne Prime (2ᵏ - 1),
 42      - [ ] Generalized Mersenne Prime (NIST Prime P256: 2^256 - 2^224 + 2^192 + 2^96 - 1)
 43      - [ ] Pseudo-Mersenne Prime (2^m - k for example Edwards25519: 2^255 - 19)
 44      - [ ] Golden Primes (φ^2 - φ - 1 with φ = 2ᵏ for example Ed448-Goldilocks: 2^448 - 2^224 - 1)
 45      - [ ] any prime modulus (lazy carry)
 46  
 47  - Montgomery Reduction
 48    - [x] int128
 49    - [x] loop unrolling
 50    - [x] x86: Full Assembly implementation
 51    - [x] x86: MULX, ADCX, ADOX instructions
 52  
 53  - Addition/substraction
 54    - [x] int128
 55    - [x] add-with-carry, sub-with-borrow intrinsics
 56    - [x] loop unrolling
 57    - [x] x86: Full Assembly implementation
 58    - [x] Addition-chain for small constants
 59  
 60  - Montgomery Multiplication
 61    - [x] Fused multiply + reduce
 62    - [x] int128
 63    - [x] loop unrolling
 64    - [x] x86: Full Assembly implementation
 65    - [x] x86: MULX, ADCX, ADOX instructions
 66    - [x] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
 67    - [x] FIPS (Finely Integrated Operand Scanning)
 68  
 69  - Montgomery Squaring
 70    - [x] Dedicated squaring functions
 71    - [x] Fused multiply + reduce
 72    - [x] int128
 73    - [x] loop unrolling
 74    - [x] x86: Full Assembly implementation
 75    - [x] x86: MULX, ADCX, ADOX instructions
 76    - [ ] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
 77  
 78  - Addition chains
 79    - [x] unreduced squarings/multiplications in addition chains
 80  
 81  - Exponentiation
 82    - [x] variable-time exponentiation
 83    - [x] fixed window optimization _(sliding windows are not constant-time)_
 84    - [ ] NAF recoding
 85    - [ ] windowed-NAF recoding
 86    - [ ] SIMD vectorized select in window algorithm
 87    - [x] Montgomery Multiplication with no final substraction,
 88      - Bos and Montgomery, https://eprint.iacr.org/2017/1057.pdf
 89        - Colin D Walter, https://colinandmargaret.co.uk/Research/CDW_ELL_99.pdf
 90        - Hachez and Quisquater, https://link.springer.com/content/pdf/10.1007%2F3-540-44499-8_23.pdf
 91      - Gueron, https://eprint.iacr.org/2011/239.pdf
 92    - [ ] Pippenger multi-exponentiation (variable-time)
 93      - [ ] parallelized Pippenger
 94  
 95  - Inversion (constant-time baseline, Little-Fermat inversion via a^(p-2))
 96    - [x] Constant-time binary GCD algorithm by Möller, algorithm 5 in https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf
 97    - [x] Addition-chain for a^(p-2)
 98    - [x] Constant-time binary GCD algorithm by Bernstein-Yang, https://eprint.iacr.org/2019/266
 99    - [ ] Constant-time binary GCD algorithm by Pornin, https://eprint.iacr.org/2020/972
100    - [x] Constant-time binary GCD algorithm by BY with half-delta optimization by libsecp256k1, formally verified, https://eprint.iacr.org/2021/549
101    - [x] Simultaneous inversion
102  
103  - Square Root (constant-time)
104    - [x] baseline sqrt via Little-Fermat for `p ≡ 3 (mod 4)`
105    - [x] baseline sqrt via Little-Fermat for `p ≡ 5 (mod 8)`
106    - [ ] baseline sqrt via Little-Fermat for `p ≡ 9 (mod 16)`
107    - [x] baseline sqrt via Tonelli-Shanks for any prime.
108    - [x] sqrt via addition-chain
109    - [x] Fused sqrt + testIfSquare (Euler Criterion or Legendre symbol or Kronecker symbol)
110    - [x] Fused sqrt + 1/sqrt
111    - [x] Fused sqrt + 1/sqrt + testIfSquare
112  
113  ## Extension Fields
114  
115  - [x] Lazy reduction via double-precision base fields
116  - [x] Sparse multiplication
117  - Fp2
118    - [x] complex multiplication
119    - [x] complex squaring
120    - [x] sqrt via the constant-time complex method (Adj et al)
121    - [x] sqrt using addition chain
122    - [x] fused complex method sqrt by rotating in complex plane
123  - Cubic extension fields
124    - [x] Toom-Cook polynomial multiplication (Chung-Hasan)
125  
126  ## Elliptic curve
127  
128  - Weierstrass curves:
129    - [x] Affine coordinates
130    - [x] Homogeneous projective coordinates
131      - [x] Projective complete formulae
132      - [x] Mixed addition
133    - [x] Jacobian projective coordinates
134      - [x] Jacobian complete formulae
135      - [x] Mixed addition
136      - [ ] Conjugate Mixed Addition
137      - [ ] Composites Double-Add 2P+Q, tripling, quadrupling, quintupling, octupling
138  
139  - [x] scalar multiplication
140    - [x] fixed window optimization
141    - [ ] constant-time NAF recoding
142    - [ ] constant-time windowed-NAF recoding
143      - [ ] SIMD vectorized select in window algorithm
144    - [x] constant-time endomorphism acceleration
145      - [ ] using NAF recoding
146      - [x] using GLV-SAC recoding
147    - [x] constant-time windowed-endomorphism acceleration
148      - [ ] using wNAF recoding
149      - [x] using windowed GLV-SAC recoding
150      - [ ] SIMD vectorized select in window algorithm
151    - [ ] Fixed-base scalar mul
152  
153  - [ ] Multi-scalar-mul
154    - [ ] Strauss multi-scalar-mul
155    - [ ] Bos-Coster multi-scalar-mul
156    - [ ] Pippenger multi-scalar-mul (variable-time)
157      - [ ] parallelized Pippenger
158  
159  ## Pairings
160  
161  - Frobenius maps
162    - [x] Sparse Frobenius coefficients
163    - [x] Coalesced Frobenius in towered Fields
164    - [x] Coalesced Frobenius powers
165  
166  - Line functions
167    - [x] Homogeneous projective coordinates
168      - [x] D-Twist
169        - [x] Fused line add + elliptic curve add
170        - [x] Fused line double + elliptic curve double
171      - [x] M-Twist
172        - [x] Fused line add + elliptic curve add
173        - [x] Fused line double + elliptic curve double
174      - [x] 6-way sparse multiplication line * Gₜ element
175    - [ ] Jacobian projective coordinates
176      - [ ] D-Twist
177        - [ ] Fused line add + elliptic curve add
178        - [ ] Fused line double + elliptic curve double
179      - [ ] M-Twist
180        - [ ] Fused line add + elliptic curve add
181        - [ ] Fused line double + elliptic curve double
182      - [x] 6-way sparse multiplication line * Gₜ element
183    - [ ] Affine coordinates
184      - [ ] 7-way sparse multiplication line * Gₜ element
185      - [ ] Pseudo-8 sparse multiplication line * Gₜ element
186  
187  - Miller Loop
188    - [x] NAF recoding
189    - [ ] Quadruple-and-add and Octuple-and-add
190    - [x] addition chains
191  
192  - Final exponentiation
193    - [x] Cyclotomic squaring
194      - [x] Karabina's compressed cyclotomic squarings
195    - [x] Addition-chain for exponentiation by curve parameter
196    - [x] BN curves: Fuentes-Castañeda
197    - [ ] BN curves: Duquesne, Ghammam
198    - [ ] BLS curves: Ghamman, Fouotsa
199    - [x] BLS curves: Hayashida, Hayasaka, Teruya
200  
201  - [x] Multi-pairing
202    - [ ] Line accumulation
203    - [ ] Parallel Multi-Pairing
204  
205  ## Hash-to-curve
206  
207  - Clear cofactor
208    - [x] BLS G1: Wahby-Boneh
209    - [ ] BLS G2: Scott et al
210    - [ ] BLS G2: Fuentes-Castañeda
211    - [x] BLS G2: Budroni et al, endomorphism accelerated
212    - [x] BN G2: Fuentes-Castañeda
213    - [ ] BW6-761 G1
214    - [ ] BW6-761 G2
215  
216  - Subgroup check
217    - [ ] BLS G1: Bowe, endomorphism accelerated
218    - [ ] BLS G2: Bowe, endomorphism accelerated
219    - [x] BLS G1: Scott, endomorphism accelerated
220    - [x] BLS G2: Scott, endomorphism accelerated