Codex Storage Proofs for the MVP
================================

This document describes the storage proof system for the Codex 2023 Q4 MVP.


Repo organization
-----------------

- `README.md` - this document
- `circuit/` - the proof circuit (`circom` code)
- `reference/haskell/` - Haskell reference implementation of the proof input generation
- `reference/nim/proof_input/` - Nim reference implementation of the proof input generation
- `reference/nim/testvectors/` - Nim script to generate test vectors for the Poseidon2 sponge hash
- `test/` - tests for (some parts of) the circuit (using the `r1cs-solver` tool)
- `workflow/` - description and script for the full proof workflow


Setup
-----

We assume that a user dataset is split into `nSlots` (not necessarily
uniformly sized) "slots" of size `slotSize`, for example 10 GB, 100 GB, or even
1,000 GB (for the MVP we may choose smaller sizes). The slots of the same dataset
are spread over different storage nodes, but a single storage node can hold several
slots (of different sizes, and belonging to different datasets). The slots themselves
can optionally be erasure coded, but this does not change the proof system, only
its robustness.

We assume the slots are split into `nCells` fixed, uniformly sized
"cells" of size `cellSize`, for example 2048 bytes. Note that `nCells` can
depend on the particular slot. For the initial version, we assume
that these numbers are powers of two (especially `nCells`). In the worst case we
can simply pad the data to achieve this (it probably makes more sense to pad
_before_ the erasure coding, even though this increases the computational cost).
Note that we can simply calculate:

    nCells = slotSize / cellSize

We hash each cell independently, using the sponge construction with Poseidon2
(see below for details).
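As a quick sanity check, the cell-count arithmetic can be written out; the numbers below are just the example figures from the text (a 2048-byte cell and a 128 GB slot):

```python
# Illustrative setup arithmetic; the concrete sizes are examples, not fixed parameters.
cell_size = 2048            # bytes per cell
slot_size = 128 * 2**30     # a 128 GB slot

n_cells = slot_size // cell_size          # nCells = slotSize / cellSize
assert n_cells & (n_cells - 1) == 0, "nCells should be a power of two"
```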
The cells are then organized into `blockSize = 64kb` blocks, each block containing
`blockSize / cellSize = 32` cells. This is for compatibility with the networking
layer, which uses larger (currently 64kb) blocks. For each block, we compute a
block hash by building a depth `5 = log2(32)` complete Merkle tree, again using the
Poseidon2 hash, with the Merkle tree conventions described below.

Then on the set of block hashes in a slot (we have `slotSize / blockSize` of them),
we build another (big) Merkle tree, whose root identifies the slot; we call this
the "slot root", denoted by `slotRoot`.

Then for a given dataset, containing several slots, we can build a third binary
Merkle tree on top of its slot roots, resulting in the "dataset root" (note:
this is not the same as the SHA256 hash associated with the original dataset
uploaded by the user). Grafting these Merkle trees together we get a big dataset
Merkle tree; however, one should be careful about the padding conventions
(it makes sense to construct the dataset-level Merkle tree separately, as `nSlots`
may not be a power of two, and later maybe `nCells` and `nBlocks` won't be
powers of two either).

The dataset root is a commitment to the whole (erasure coded) dataset, and will
be posted on-chain, to ensure that the nodes really store this data and not something else.
Optionally, the slot roots can also be posted on-chain, but this seems somewhat
wasteful.


Hash function
-------------

For the MVP we will use the Poseidon2 hash function, specialized to state size
`t = 3` field elements, and to the alt-bn128 elliptic curve's scalar field, which
has size

    r = 21888242871839275222246405745257275088548364400416034343698204186575808495617

For more details about this curve, see [BN254 For The Rest Of Us](https://hackmd.io/@jpw/bn254).
Remark: we have `2^253 < r < 2^254`.
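The size bounds in the remark can be verified directly from the quoted modulus:

```python
# The BN254 scalar field modulus quoted above.
r = 21888242871839275222246405745257275088548364400416034343698204186575808495617

# Remark from the text: 2^253 < r < 2^254, i.e. r is a 254-bit prime field size.
assert 2**253 < r < 2**254
# Consequently a field element safely holds 31 bytes (248 bits) of data:
assert 2**248 < r
```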
For more details about the Poseidon2 hash, see

- [the Poseidon2 paper](https://eprint.iacr.org/2023/323)
- [the reference implementation](https://github.com/HorizenLabs/poseidon2)

### Data encoding convention

The Poseidon2 hash works not over sequences of bytes, but over field elements. Hence we
need to convert the data to be hashed into a sequence of field elements.

Note that the field size is approximately 254 bits, so a field element can store
31 bytes of data but cannot store 32 bytes. Hence we have to split up our data
into 31-byte "hash chunks", then encode these as field elements. We will use the
little-endian byte convention to get from 31 bytes to the _standard form_ of
a field element; that is, the 31 little-endian bytes encode the number `a`
where `0 <= a < 2^248 < r < 2^254` is the standard representation of a field element.
For padding the last field element we use the so-called `10*` strategy: that means
_always_ append a single `0x01` byte, and then pad with the minimum number
of zero bytes so that the final padded length is a multiple of 31.

It's probably better to choose the standard representation of field elements,
because most standard tools like circom use that; even though choosing the
Montgomery form would be a tiny bit more efficient, the difference is very
small, as the hashing will dominate the costs anyway.

### Compression and sponge

Poseidon2 offers essentially two API functions: a so-called _compression function_,
which takes 2 field elements and returns 1; this can be used to build binary
Merkle trees. And a more complex _sponge construction_ for linear hashing,
which can hash an arbitrary sequence of field elements into a single (or several)
field element(s).
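A minimal sketch of the byte-to-field encoding convention above (`10*` padding followed by little-endian 31-byte chunking); the function name is ours, for illustration:

```python
def bytes_to_field_elements(data: bytes) -> list[int]:
    """Encode bytes as field elements: always append a 0x01 byte, zero-pad
    to a multiple of 31 bytes, then read each 31-byte chunk as a
    little-endian integer (guaranteed < 2^248 < r)."""
    padded = data + b"\x01"                     # the mandatory `1` of `10*`
    padded += b"\x00" * (-len(padded) % 31)     # the minimal `0*` tail
    return [int.from_bytes(padded[i:i + 31], "little")
            for i in range(0, len(padded), 31)]
```

Note that the `0x01` byte is appended unconditionally, so even an input whose length is already a multiple of 31 grows by one chunk; this keeps the encoding injective.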
The sponge can have versions with different "rate" parameters, and the compression
function is more generally a parametric family of functions, which we call a _keyed
compression function_.

While we could always use a Merkle tree instead of the sponge, in the sponge
construction we can use `rate = 2` with a target of 128-bit security level, which
in practice means it is twice as fast; so we should use that for hashing the cells.

Conventions to decide:

- padding (especially important for `rate > 1`)
- conversion from bytes to field elements (see above)
- initialization vector

We propose again the padding strategy `10*` (but here we are talking about field
elements, not bytes!), and an initialization vector `(0,0,domSep)`, where `domSep`
(short for "domain separation"), the initial value for the "capacity" part of the
sponge, is defined as

    domSep := 2^64 + 256*t + rate


Parameters
----------

Parameters should be set based on benchmarks and other external constraints,
but we can already set some preliminary targets. If we use a slot Merkle tree
of depth between 16-32, and a sponge hash with a rate of 2, then the Merkle
inclusion proof cost will be approximately in balance with the cell hashing
cost if the cell size is 1024-2048 bytes (34-67 field elements).

If we use 2048 byte cells, then `nCells = 8192*8192 = 2^26` gives a slot size
of 128 GB, which looks like a good compromise target slot size (we don't want
it to be too small, because then the number of slot proofs per node would be
too big; but we don't want it to be too big either, because that's inflexible).
The maximum slot size with a cell size of 2048 and a depth of 32 (corresponding
to a 2D erasure code of size `2^16 x 2^16`, see below) is `2^(32+11) = 2^43 = 8 TB`.
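A worked sketch of the constants above: `domSep` from the sponge conventions, and the preliminary slot-size targets from this section (the sizes are targets, not fixed parameters):

```python
# Sponge domain separation constant for our parameters (t = 3, rate = 2).
t, rate = 3, 2
dom_sep = 2**64 + 256 * t + rate          # domSep := 2^64 + 256*t + rate

# Preliminary slot-size targets from the text.
cell_size = 2048                          # 2^11 bytes
n_cells   = 8192 * 8192                   # = 2^26
slot_size = n_cells * cell_size           # = 2^37 bytes = 128 GB target
max_slot  = 2**32 * cell_size             # depth-32 tree: 2^(32+11) = 2^43 = 8 TB
```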
### 2D erasure coding

We need to use 2D erasure coding inside a slot, because while 1D would be better,
the required K,N parameters of the EC would be too big. Furthermore, we don't
want too big a storage overhead, so the idea is to use `K ~= 2/3 * N`
(with `N` being a power of two). With such a setting, we can store the
original `K x K` matrix and the small `(N-K) x (N-K)` matrix, and reconstruct
the rest in real time. This particular parameter setting gives us a storage overhead
of `25%` (of course we can tune the parameters; this is just an example).

Note that the EC rate does not affect the proof system itself, only the required
number of samples to achieve a given target detection probability.

With the above setting (EC rate = 2/3), we need 117 samples to achieve 6 nines
of detection probability. Sampling a single 2048-byte cell with sponge rate 2
means approximately 34 Poseidon2 permutation calls, while computing the root of
a depth 26 Merkle path is another 26 such calls. Finally, computing the index
needs 2 calls plus an extra cost similar to approximately 3 calls, so we have
about 65 calls per sample, each about 250 constraints. So in total we have about
`117 * 65 * 250 ~= 1.9 million` constraints, equivalent to hashing about
`235 kb` of data with a binary Merkle tree or with a sponge of rate 1.


Sampling
--------

Given a slot of a dataset, we want to check `nSamples` randomly selected
cells, to see whether they are still stored by the node. We expect `nSamples`
to be in the range 20-200.
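The constraint estimate from the erasure-coding section above can be reproduced with simple arithmetic (all figures are the rough per-call costs quoted there):

```python
# Rough circuit-size estimate, using the figures quoted in the text.
n_samples   = 117   # samples needed for 6 nines of detection at EC rate 2/3
cell_perms  = 34    # permutation calls to sponge-hash one 2048-byte cell (rate 2)
path_perms  = 26    # one call per level of a depth-26 Merkle path
index_perms = 5     # ~2 calls for the index hash plus ~3 calls' worth of extra cost

perms_per_sample = cell_perms + path_perms + index_perms   # about 65
constraints = n_samples * perms_per_sample * 250           # ~250 constraints per call
```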
To be able to do this, we need to compute:

- the indices of the selected cells (or Merkle leaves)
- the hashes of the selected cells
- the Merkle paths from these hashes up to the slot root (Merkle proofs)
- the number of cells in the slot (this is because of our particular Merkle
  tree construction, but we will need it anyway to compute the indices)
- the (single) Merkle path from the slot root to the dataset root.

To be able to sample randomly, we need some external source of entropy; for
this, presumably a block hash of some public blockchain (eg. Ethereum) will
be used. It can be convenient to have the entropy in the form of a field element;
for this we can simply take the block hash `modulo r`. While this gives a
somewhat non-uniform probability distribution, in practice this should not
cause any issues. Alternatively, we could keep just 31 bytes of the block hash and
interpret those as a field element, giving 248 bits of uniform entropy.

We propose the following function to compute the indices of the selected cells:

    idx = H( entropy | slotRoot | counter ) `mod` nCells

where `counter` iterates over the range `[1..nSamples]`, `H` is our hash
function (currently the Poseidon2 sponge with `rate = 2` and the `10*` padding
strategy), and `|` denotes concatenation (in this case the input is just 3 field
elements).


Circuit
-------

For the MVP, we build a monolithic circuit / R1CS instance proving all
the samples in a single slot; then use Groth16 to prove it.
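The index derivation above can be sketched as follows. Note that the stand-in `H` below is NOT Poseidon2 (the real `H` is the Poseidon2 sponge with `rate = 2` and `10*` padding); it only illustrates the structure of the computation:

```python
R = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def H(elements: list[int]) -> int:
    # Placeholder field-element hash standing in for the Poseidon2 sponge.
    acc = 0
    for x in elements:
        acc = (acc * 31337 + x) % R
    return acc

def sample_indices(entropy: int, slot_root: int,
                   n_cells: int, n_samples: int) -> list[int]:
    # idx = H( entropy | slotRoot | counter ) mod nCells, counter in [1..nSamples]
    return [H([entropy, slot_root, counter]) % n_cells
            for counter in range(1, n_samples + 1)]
```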
Public inputs:

- the dataset root
- the slot index within the dataset
- the entropy (public randomness)

Private inputs:

- the slot root
- the number of cells in the slot
- the number of slots in the dataset
- the underlying data of the cells, as sequences of field elements
- the Merkle paths from the leaves (the cell hashes) to the slot root
- the Merkle path from the slot root to the dataset root

Computation:

- compute the cell indices from the slot root and the entropy
- compute the cell hashes from the cell data
- from the cell hashes, using the provided Merkle paths, recompute the slot root
- compare the reconstructed slot root to the slot root given as private input
- from the slot root, using the provided Merkle path, recompute and check the
  dataset root

Note that given the index of a leaf, we can compute the left-right zig-zag
of the corresponding Merkle path simply by looking at its binary decomposition.
So the Merkle paths only need to consist of lists of hashes.


Merkle tree conventions
-----------------------

We use the same "safe" Merkle tree construction across the codebase. This uses
a "keyed compression function", where the key depends on:

- whether we are in the bottommost layer or not
- whether the node we are dealing with has 1 or 2 children (odd or even node)

These are two bits, encoded as numbers in the set `{0,1,2,3}` (the lowest bit is
1 if it's the bottom layer, 0 otherwise; the next bit is 1 if it's an odd node,
0 if it's an even node).
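The two-bit key encoding above can be expressed as a one-liner (the function name is ours, for illustration):

```python
def compression_key(is_bottom_layer: bool, is_odd_node: bool) -> int:
    """Encode the two convention bits as a key in {0,1,2,3}:
    lowest bit = bottom layer, next bit = odd node."""
    return int(is_bottom_layer) | (int(is_odd_node) << 1)
```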
Furthermore:

- in the case of an odd node with leaf `x`, we apply the compression to the pair `(x,0)`
- in the case of a singleton input (the whole Merkle tree is built on a single field
  element), we also apply one compression
- the keyed compression is defined as applying the permutation to the triple
  `(x,y,key)`, and extracting the first component of the resulting triple

In the case of SHA256, we could use a compression function of the form
`SHA256(x|y|key)`, where `x,y` are 32 byte long hashes, and `key` is a single
byte. Since SHA256 already does some padding internally, this has the same
cost as computing just `SHA256(x|y)`.

Network blocks vs. cells
------------------------

The networking layer uses 64kb "blocks", while the proof layer uses 2kb "cells".
To make these compatible, we define the hash of a 64kb block as follows:

- split the 64kb data into 32 smaller 2kb cells;
- hash these cells (with the Poseidon2 sponge, rate=2, and `10*` padding);
- build a depth 5 complete binary Merkle tree on those hashes, with the above
  conventions. The resulting Merkle root will be the hash of the 64kb block.

Then we build a big Merkle tree on these block hashes, again with the above
conventions, resulting in the slot root.
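A minimal sketch of the "safe" Merkle tree construction with these conventions. The `compress` stand-in below is NOT the real Poseidon2 keyed compression (which applies the permutation to `(x,y,key)` and takes the first output component); only the tree-building logic follows the conventions described above:

```python
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def compress(x: int, y: int, key: int) -> int:
    # Placeholder keyed compression; NOT Poseidon2.
    return (x * 3 + y * 5 + key * 7 + 1) % P

def merkle_root(leaves: list[int]) -> int:
    """'Safe' Merkle root: key = bottom-layer bit | (odd-node bit << 1);
    odd nodes are compressed with 0 on the right; a singleton input
    still gets one compression."""
    assert leaves
    layer, bottom = leaves, True
    while len(layer) > 1 or bottom:
        nxt = []
        for i in range(0, len(layer), 2):
            odd = i + 1 >= len(layer)
            key = int(bottom) | (int(odd) << 1)
            right = 0 if odd else layer[i + 1]
            nxt.append(compress(layer[i], right, key))
        layer, bottom = nxt, False
    return layer[0]
```

With this sketch, a 64kb network block's hash would be `merkle_root` over its 32 cell hashes (a depth-5 complete tree), and the slot root would be `merkle_root` over the block hashes.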