Codex Storage Proofs for the MVP
================================

This document describes the storage proof system for the Codex 2023 Q4 MVP.


Repo organization
-----------------

- `README.md` - this document
- `circuit/` - the proof circuit (`circom` code)
- `reference/haskell/` - Haskell reference implementation of the proof input generation
- `reference/nim/proof_input/` - Nim reference implementation of the proof input generation
- `reference/nim/testvectors/` - Nim script to generate test vectors for the Poseidon2 sponge hash
- `test/` - tests for (some parts of) the circuit (using the `r1cs-solver` tool)
- `workflow/` - description and script for the full proof workflow


Setup
-----

We assume that a user dataset is split into `nSlots` number of (not necessarily
uniformly sized) "slots" of size `slotSize`, for example 10 GB or 100 GB or even
1,000 GB (for the MVP we may choose smaller sizes). The slots of the same dataset
are spread over different storage nodes, but a single storage node can hold several
slots (of different sizes, and belonging to different datasets). The slots themselves
can optionally be erasure coded, but this does not change the proof system, only
its robustness.

We assume the slots are split into `nCells` number of fixed, uniformly sized
"cells" of size `cellSize`, for example 2048 bytes. Note that `nCells` can
depend on the particular slot. For the initial version, we assume
that these numbers are powers of two (especially `nCells`). In the worst case we
can simply pad the data to achieve this (it probably makes more sense to pad
_before_ the erasure coding, even though this increases the computational cost).
Note that we can simply calculate:

    nCells = slotSize / cellSize

We hash each cell independently, using the sponge construction with Poseidon2
(see below for details).

The cells are then organized into `blockSize = 64kb` blocks, each block containing
`blockSize / cellSize = 32` cells. This is for compatibility with the networking
layer, which uses larger (currently 64kb) blocks. For each block, we compute a
block hash by building a depth `5 = log2(32)` complete Merkle tree, again using the
Poseidon2 hash, with the Merkle tree conventions described below.

Then on the set of block hashes in a slot (there are `slotSize / blockSize` of
them), we build another (big) Merkle tree, whose root will identify the slot;
we call it the "slot root", and denote it by `slotRoot`.

Then for a given dataset, containing several slots, we can build a third binary
Merkle tree on top of its slot roots, resulting in the "dataset root" (note:
this is not the same as the SHA256 hash associated with the original dataset
uploaded by the user). Grafting these Merkle trees together, we get a big dataset
Merkle tree; however, one should be careful about the padding conventions
(it makes sense to construct the dataset-level Merkle tree separately, as `nSlots`
may not be a power of two, and later maybe `nCells` and `nBlocks` won't be
powers of two either).

The dataset root is a commitment to the whole (erasure coded) dataset, and will
be posted on-chain, to ensure that the nodes really store the data and not something else.
Optionally, the slot roots can also be posted on-chain, but this seems somewhat
wasteful.
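
As a sanity check, the relationships above can be spelled out for concrete
example parameters (a small Python sketch; the specific sizes are illustrative,
taken from the examples in this document):

```python
import math

# illustrative parameters taken from the examples in this document
cellSize  = 2048         # bytes per cell
blockSize = 64 * 1024    # bytes per network block (64kb)
slotSize  = 2**37        # bytes per slot (128 GB, see "Parameters" below)

nCells        = slotSize // cellSize    # cells per slot
cellsPerBlock = blockSize // cellSize   # cells per network block
nBlocks       = slotSize // blockSize   # network blocks per slot

blockTreeDepth = int(math.log2(cellsPerBlock))  # per-block Merkle tree depth
slotTreeDepth  = int(math.log2(nBlocks))        # block-hash Merkle tree depth

assert nCells == 2**26 and cellsPerBlock == 32
assert blockTreeDepth == 5
assert blockTreeDepth + slotTreeDepth == 26     # total depth from cell to slot root
```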


Hash function
-------------

For the MVP we will use the Poseidon2 hash function, specialized to state size
`t = 3` field elements, and to the alt-bn128 elliptic curve's scalar field, which
has size

    r = 21888242871839275222246405745257275088548364400416034343698204186575808495617

For more details about this curve, see [BN254 For The Rest Of Us](https://hackmd.io/@jpw/bn254).
Remark: we have `2^253 < r < 2^254`.

For more details about the Poseidon2 hash, see

- [the Poseidon2 paper](https://eprint.iacr.org/2023/323)
- [the reference implementation](https://github.com/HorizenLabs/poseidon2)

### Data encoding convention

Poseidon2 operates not on sequences of bytes but on field elements. Hence we
need to convert the data to be hashed into a sequence of field elements.

Note that the field size is approximately 254 bits, so a field element can store
31 bytes of data but cannot store 32 bytes. Hence we have to split up our data
into 31 byte "hash chunks", then encode these as field elements. We use the
little-endian byte convention to get from 31 bytes to the _standard form_ of
a field element; that is, the 31 little-endian bytes encode the number `a`,
where `0 <= a < 2^248 < r < 2^254` is the standard representation of a field element.
For padding the last field element we use the so-called `10*` strategy: that means
_always_ append a single `0x01` byte, and then pad with the minimum number
of zero bytes so that the final padded length is a multiple of 31.
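
As an illustration, the chunking and `10*` padding described above can be
sketched in Python (the function name is ours, for illustration only):

```python
r = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def bytes_to_field_elements(data: bytes) -> list[int]:
    # 10* padding: always append a single 0x01 byte, then pad with the
    # minimum number of zero bytes to reach a multiple of 31
    padded = data + b"\x01"
    padded += b"\x00" * (-len(padded) % 31)
    # each 31-byte "hash chunk" is read as a little-endian integer; since
    # 2^248 < r, every chunk is a valid field element in standard form
    return [int.from_bytes(padded[i:i + 31], "little")
            for i in range(0, len(padded), 31)]

elems = bytes_to_field_elements(b"\x02" + b"\x00" * 30)  # exactly 31 bytes of data
assert elems == [2, 1]    # the padding adds a second chunk: the single 0x01 byte
assert all(0 <= x < 2**248 < r for x in elems)
```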

It's probably better to choose the standard representation of field elements,
because most standard tools like circom use that, even though choosing the
Montgomery form would be a tiny bit more efficient (the difference is very
small; the hashing will dominate the costs anyway).

### Compression and sponge

Poseidon2 offers essentially two API functions: a so-called _compression function_,
which takes 2 field elements and returns 1; this can be used to build binary
Merkle trees. And a more complex _sponge construction_ for linear hashing,
which can hash an arbitrary sequence of field elements into a single (or several)
field element(s).

The sponge can have versions with different "rate" parameters, and the compression
function is more generally a parametric family of functions, which I call a _keyed
compression function_.

While we could always use a Merkle tree instead of the sponge, in the sponge
construction we can use `rate = 2` with a target 128-bit security level, which
in practice means twice as fast hashing; so we should use that for hashing the cells.

Conventions to decide:

- padding (especially important for `rate > 1`)
- conversion from bytes to field elements (see above)
- initialization vector

We propose again the `10*` padding strategy (but here we are talking about field
elements, not bytes!), and an initialization vector `(0,0,domSep)`, where `domSep`
(short for "domain separation"), the initial value for the "capacity" part of the
sponge, is defined as

    domSep := 2^64 + 256*t + rate
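
For our concrete instance (`t = 3`, sponge `rate = 2`), this value is easy to
compute:

```python
t, rate = 3, 2                     # Poseidon2 state size and sponge rate
domSep  = 2**64 + 256 * t + rate   # domain separation constant

# the sponge starts from the initialization vector (0, 0, domSep)
iv = (0, 0, domSep)
assert domSep == 18446744073709552386
```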


Parameters
----------

Parameters should be set based on benchmarks and other external constraints,
but we can already set some preliminary targets. If we use a slot Merkle tree
of depth between 16 and 32, and a sponge hash with a rate of 2, then the Merkle
inclusion proof cost will be approximately in balance with the cell hashing
cost if the cell size is 1024-2048 bytes (34-67 field elements).

If we use 2048 byte cells, then `nCells = 8192*8192 = 2^26` gives a slot size
of 128 GB, which looks like a good compromise target slot size (we don't want
it to be too small, because then the number of slot proofs per node will be
too big; but we don't want it to be too big either, because that's inflexible).
The maximum slot size with a cell size of 2048 and a depth of 32 (corresponding
to a 2D erasure code of size `2^16 x 2^16`, see below) is `2^(32+11) = 2^43 = 8 TB`.

### 2D erasure coding

We need to use 2D erasure coding inside a slot because, while 1D would be better,
the required `K,N` parameters of the EC would be too big. Furthermore, we don't
want too big a storage overhead, so the idea is to use `K ~= 2/3 * N`
(with `N` being a power of two). With such a setting, we can store the
original `K x K` matrix and the small `(N-K) x (N-K)` matrix, and reconstruct
the rest in real time. This particular parameter setting gives us a storage overhead
of `25%` (of course we can tune the parameters, this is just an example).
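
The `25%` figure can be double-checked with exact arithmetic (a quick Python
check; only the ratio `K : N = 2 : 3` matters here, not the absolute sizes):

```python
from fractions import Fraction

# with K ~= 2/3 * N, only the ratio matters; take K = 2, N = 3
K, N = Fraction(2), Fraction(3)

# we store the K x K original matrix plus the (N-K) x (N-K) corner,
# so the overhead relative to the original data is (N-K)^2 / K^2
overhead = (N - K)**2 / K**2
assert overhead == Fraction(1, 4)   # i.e. 25% storage overhead
```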

Note that the EC rate does not affect the proof system itself, only the required
number of samples to achieve a given target detection probability.

With the above setting (EC rate = 2/3), we need 117 samples to achieve 6 nines
of detection probability. Sampling a single 2048 byte cell with sponge rate 2
means approx. 34 Poseidon2 permutation calls, while computing the root of
a depth 26 Merkle path is another 26 such calls. Finally, computing the index
needs 2 calls plus an extra cost similar to approx. 3 calls, so we have about
65 calls per sample, each about 250 constraints. In total we have about
`117 * 65 * 250 ~= 1.9 million` constraints, which is equivalent to hashing about
`235 kb` of data with a binary Merkle tree or with a sponge of rate 1.
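
The arithmetic behind this estimate, spelled out (a Python check using the
per-sample figures quoted above):

```python
nSamples           = 117               # for 6 nines of detection probability
callsPerSample     = 34 + 26 + 2 + 3   # cell hashing + Merkle path + index computation
constraintsPerCall = 250               # approx. constraints per Poseidon2 permutation

totalConstraints = nSamples * callsPerSample * constraintsPerCall
assert callsPerSample == 65
assert totalConstraints == 1_901_250   # ~ 1.9 million constraints

# a rate 1 sponge absorbs 31 bytes per permutation call, so this is
# equivalent to hashing roughly this much data:
equivalentBytes = (totalConstraints // constraintsPerCall) * 31
assert equivalentBytes == 235_755      # ~ 235 kb
```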


Sampling
--------

Given a slot of a dataset, we want to check `nSamples` randomly selected
cells, to see whether they are still stored by the node. We expect `nSamples`
to be in the range of 20-200.

To be able to do this, we need to compute:

- the indices of the selected cells (or Merkle leaves)
- the hashes of the selected cells
- the Merkle paths from these hashes up to the slot root (Merkle proofs)
- the number of cells in the slot (this is because of our particular Merkle
  tree construction, but we will need it anyway to compute the indices)
- the (single) Merkle path from the slot root to the dataset root.

To be able to sample randomly, we need some external source of entropy; for
this, presumably a block hash of some public blockchain (e.g. Ethereum) will
be used. It can be convenient to have the entropy in the form of a field element;
for this, we can simply take the block hash `modulo r`. While this gives a
somewhat non-uniform probability distribution, in practice this should not
cause any issues. Alternatively, we could just keep 31 bytes of the block hash
and interpret those as a field element, giving 248 bits of uniform entropy.

We propose the following function to compute the indices of the selected cells:

    idx = H( entropy | slotRoot | counter ) `mod` nCells

where `counter` iterates over the range `[1..nSamples]`, `H` is our hash
function (currently the Poseidon2 sponge with `rate = 2` and the `10*` padding
strategy), and `|` denotes concatenation (in this case the input is just 3 field
elements).
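
A sketch of this sampling function in Python; note that the real `H` is the
Poseidon2 sponge, for which we substitute a SHA256-based stand-in here purely
for illustration:

```python
import hashlib

r = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def H(elements: list[int]) -> int:
    # stand-in for the Poseidon2 sponge (rate = 2, 10* padding): concatenate
    # 32-byte little-endian encodings, hash with SHA256, and reduce mod r
    data = b"".join(x.to_bytes(32, "little") for x in elements)
    return int.from_bytes(hashlib.sha256(data).digest(), "little") % r

def sample_indices(entropy: int, slotRoot: int, nCells: int, nSamples: int) -> list[int]:
    # idx = H( entropy | slotRoot | counter ) mod nCells, counter in [1..nSamples]
    return [H([entropy, slotRoot, counter]) % nCells
            for counter in range(1, nSamples + 1)]

indices = sample_indices(entropy=12345, slotRoot=67890, nCells=2**26, nSamples=20)
assert len(indices) == 20 and all(0 <= i < 2**26 for i in indices)
```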


Circuit
-------

For the MVP, we build a monolithic circuit / R1CS instance proving all
the samples in a single slot; then we use Groth16 to prove it.

Public inputs:

- dataset root
- slot index within the dataset
- entropy (public randomness)

Private inputs:

- the slot root
- the number of cells in the slot
- the number of slots in the dataset
- the underlying data of the cells, as sequences of field elements
- the Merkle paths from the leaves (the cell hashes) to the slot root
- the Merkle path from the slot root to the dataset root

Computation:

- compute the cell indices from the slot root and the entropy
- compute the cell hashes from the cell data
- from the cell hashes, using the provided Merkle paths, recompute the slot root
- compare the reconstructed slot root to the slot root given as private input
- from the slot root, using the provided Merkle path, recompute and check the
  dataset root

Note that given the index of a leaf, we can compute the left-right zig-zag
of the corresponding Merkle path simply by looking at the binary decomposition
of the index. So the Merkle paths only need to consist of lists of hashes.
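
To illustrate, here is how a verifier recovers the root from a leaf, its index,
and the list of sibling hashes; the compression function is a toy stand-in for
the keyed Poseidon2 compression described below:

```python
def compress(x: int, y: int) -> int:
    # toy stand-in for the Poseidon2 keyed compression (the real circuit keys
    # the permutation as described under "Merkle tree conventions")
    return (x * 3 + y * 7 + 1) % (2**61 - 1)

def root_from_path(leaf: int, index: int, siblings: list[int]) -> int:
    # walk up the tree; bit k of the index tells whether the current node is a
    # left (0) or right (1) child, i.e. the "zig-zag" of the Merkle path
    h = leaf
    for sib in siblings:
        h = compress(sib, h) if index & 1 else compress(h, sib)
        index >>= 1
    return h

# check against a directly built 4-leaf tree
leaves = [10, 20, 30, 40]
level  = [compress(leaves[0], leaves[1]), compress(leaves[2], leaves[3])]
root   = compress(level[0], level[1])

# Merkle proof for leaf index 2 (leaf 30): siblings are leaf 40 and level[0]
assert root_from_path(30, 2, [40, level[0]]) == root
```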


Merkle tree conventions
-----------------------

We use the same "safe" Merkle tree construction across the codebase. This uses
a "keyed compression function", where the key depends on:

- whether we are in the bottommost layer or not
- whether the node we are dealing with has 1 or 2 children (odd or even node)

These are two bits, encoded as numbers in the set `{0,1,2,3}` (the lowest bit is
1 if it's the bottom layer, 0 otherwise; the next bit is 1 if it's an odd node,
0 if it's an even node). Furthermore:

- in case of an odd node with leaf `x`, we apply the compression to the pair `(x,0)`
- in case of a singleton input (the whole Merkle tree is built on a single field
  element), we also apply one compression
- the keyed compression is defined as applying the permutation to the triple
  `(x,y,key)`, and extracting the first component of the resulting triple

In case of SHA256, we could use a compression function of the form
`SHA256(x|y|key)`, where `x,y` are 32 byte long hashes, and `key` is a single
byte. Since SHA256 already does some padding internally, this has the same
cost as computing just `SHA256(x|y)`.
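
A sketch of this SHA256 variant (Python, using the standard `hashlib` library;
the function name is ours):

```python
import hashlib

def sha256_keyed_compress(x: bytes, y: bytes, key: int) -> bytes:
    # SHA256(x | y | key): x, y are 32-byte hashes, key is a single byte
    # encoding (isBottomLayer, isOddNode) as a number in {0,1,2,3}
    assert len(x) == 32 and len(y) == 32 and key in (0, 1, 2, 3)
    return hashlib.sha256(x + y + bytes([key])).digest()

a, b = bytes(32), bytes(range(32))
# different keys give different (domain-separated) results
assert sha256_keyed_compress(a, b, 0) != sha256_keyed_compress(a, b, 1)
```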


Network blocks vs. cells
------------------------

The networking layer uses 64kb "blocks", while the proof layer uses 2kb "cells".
To make these compatible, we define the hash of a 64kb block as follows:

- split the 64kb data into 32 smaller 2kb cells;
- hash these cells (with the Poseidon2 sponge, rate=2, and `10*` padding);
- build a depth 5 complete binary Merkle tree on those hashes, with the above
  conventions. The resulting Merkle root will be the hash of the 64kb block.

Then we build a big Merkle tree on these block hashes, again with the above
conventions, resulting in the slot root.
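
The block-hashing recipe above can be sketched as follows; both `cell_hash` and
`compress` are toy stand-ins for the real Poseidon2 sponge and keyed
compression, used only to show the tree shape:

```python
# toy stand-ins for the real primitives (Poseidon2 sponge / keyed compression)
def cell_hash(cell: bytes) -> int:
    return int.from_bytes(cell[:8].ljust(8, b"\x00"), "little") + len(cell)

def compress(x: int, y: int) -> int:
    return (x * 3 + y * 7 + 1) % (2**61 - 1)

def block_hash(block: bytes) -> int:
    assert len(block) == 64 * 1024
    # split the 64kb block into 32 cells of 2kb each and hash them
    hashes = [cell_hash(block[i:i + 2048]) for i in range(0, len(block), 2048)]
    # fold the 32 cell hashes into a depth 5 complete binary Merkle tree
    while len(hashes) > 1:
        hashes = [compress(hashes[i], hashes[i + 1])
                  for i in range(0, len(hashes), 2)]
    return hashes[0]

h = block_hash(bytes(64 * 1024))
```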