PROOFS.md
1 2 The cogs of a ZK proof 3 ====================== 4 5 The goal of this document is to describe all the different moving parts 6 (circuit, witness, public input, etc) of a ZK proof through a simple example, 7 hopefully giving some intuition about how these cogs fit together. 8 9 10 The example: Fibonacci sequence 11 ------------------------------- 12 13 ZK proofs (or more precisely: "succint arguments") are all about convicing 14 another party (the "verifier"), about the truth of some statement, without 15 telling them _everything_. 16 17 In our toy example, the statement will be the 1000th element of a Fibonacci 18 sequence. 19 20 A Fibonacci sequence is generated deterministically from its first two elements, 21 let's call them `U` and `V`. These are numbers. The sequence is then defined as 22 23 a[0] = U 24 a[1] = V 25 a[n] = a[n-1] + a[n-2] 26 27 In the standard Fibonacci sequence, we have `U=0` and `V=1` (or sometimes `U=V=1`, 28 which results in the same sequence but shifted by one). But we can consider 29 this generalized setting with `U` and `V` being arbitrary numbers. 30 31 The statement I want to convince you, is that for a given `U` (which I tell you, 32 say `U=53`), I know a value for `V` (which I don't tell you), such that `a[1000]` 33 is some given value (of course, I tell you this value too, otherwise there is 34 nothing to be convinced about). 35 36 Pretty simple so far. 37 38 39 The statement 40 ------------- 41 42 A proof always start with a statement. Usually, the statement is descibed by 43 a computer program (in our case, the program is computing the Fibonacci sequence). 44 A program normally has some inputs and some ouputs. Some of the inputs are 45 public (I tell you what they are), some others can be secret (I don't tell you what 46 they are). And the statement is _usually_ that if you run this program with the given 47 inputs, then the program runs normally (it doesn't throw an error) and you get the 48 given output. 49 50 Note: you can freely move things from outputs to inputs. In our case, the output 51 would be `a[1000]`, but equivalently, I can make another program where there is 52 a third input `W`, and the program just throws an error at the end if `a[1000] != W`. 53 Because of this, we often just say IO for input/output, and "public IO" for the 54 part of it I tell you, and "private IO" or "secret IO" for the part I don't tell 55 you. 56 57 So far we have: 58 59 - the statement (a computer program) 60 - the public IO 61 - and the private IO 62 63 The computer program is often called a "circuit", because for technical reasons 64 it often needs to be described as an "arithmetic circuit", which is something 65 similar to digital electronics circuits (but in math). This gets a bit more 66 complicated with zkVM-s, where the circuit itself is something like a CPU, and your 67 statement is an actual program running on that (virtual) CPU. 68 69 70 The witness 71 ----------- 72 73 How essentially all proof system works is to translate the statement you want 74 the prove into some math equations, and then prove that I know a solution for 75 those equations. Because of this, the prover needs to know not only the input 76 and the output, but all the intermediate results too - essentially a very detailed 77 log of all individual operations happening in the computer program describing 78 the statement. This huge logfile is called "the witness", and in practice it's 79 usually just a large array of numbers (more precisely, finite field elements). 80 81 Unfortunately, the word "witness" is a bit overloaded: Sometimes it refers only 82 to the private IO, and sometimes to the whole log. While these are kind of the same 83 from a holistic point of view, it still makes sense to distinguish them, and 84 I will call the whole execution log the witness. 85 86 Creating a proof always start by running the program which descibes statement, 87 and creating this very detailed log, the witness (sometimes also called a 88 "transcript"). This step is usually called the "witness generation". 89 90 91 Trusted setup 92 ------------- 93 94 Some, but not all, proof systems requires something called a "trusted setup". 95 This is some structured piece of information produced by an external party, 96 which has some secret encoded in it. It's important that the secret is 97 deleted (hence the word "trusted") after generating this setup, because anybody 98 in possesion of the secret can cheat and create fake proofs. 99 100 For example in the popular scheme called "KZG commitments", the secret is 101 a number (a field element) usually called `tau`, and the result of the 102 trusted setup is a sequence 103 104 [ g , tau*g , tau^2*g , ... , tau^N*g ] 105 106 where `g` is a fixed group element. Because of this, the procedure generating 107 this sequence is called a "powers-of-tau ceremony". Assuming the discrete 108 logarithm problem is hard in the group, it's not feasible to compute `tau` 109 itself given that sequence, but it's easy to compute `f(tau)*g` for a polynomial 110 `f` with degree at most `N`. 111 112 In practice such a trusted setup can be accomplished using multiparty computation 113 (MPC), so that if there is at least _one_ honest participant, then the resulting 114 sequence is safe to use. These events are called "ceremonies" (in early days 115 people actually had to do this in person, and at the end they ceremonially destroyed 116 the computer which contained the random secret `tau`). 117 118 There are two kinds of trusted setups: universal and circuit-specific. An universal 119 one can be used to prove many different statements, while a circuit-specific one 120 can be used to prove a single fixed statement, or circuit (but with different inputs). 121 122 For example Groth16 needs a circuit-specific trusted setup, Plonk+KZG only needs 123 a universal trusted setup, while Plonk+FRI does not need any such setup at all. 124 However usually systems with trusted setups are more efficient (for example Groth16 has 125 the smallest proofs among all known proof system). In case of Groth16, the circuit-specific 126 setup is generated from an already existing universal setup and the circuit. 127 128 129 The files 130 --------- 131 132 Using the popular [`circom`](https://docs.circom.io/) + [`snarkjs`](https://github.com/iden3/snarkjs) 133 ecosystem, all the above parts are in different files, which are produced by 134 different commands. 135 136 These are: 137 138 - `.circom` files contain the source code of your circuit (both the equations and 139 the actual program, interleaved together) 140 - `input.json` contains the circuit inputs (both public and private inputs) 141 - `.r1cs` file contains the circuit, that is, the statement you want to prove 142 (R1CS is short for Rank-1 Constraint System, a specific form of a "circuit"). 143 This is one output of the `circom` compiler, which reads the above source code and produces `.r1cs` files 144 - `.wtns` files contain the witness. This is generated by the so-called "witness generator", which is 145 another output of the `circom` compiler. Essentially the compiler separates the equations (goes into `.r1cs`) 146 and running the program (goes into the witness generator, which in this case can be either a WASM or a C++ program). 147 When you run the witness generator, it takes the `input.json` and produces the `.wtns` file 148 - `.ptau` files are containing universal trusted setups 149 - `.zkey` files are called a "prover key" (it contains everything the prover needs), and 150 in case of Groth16 it also corresponds to a circuit-specific trusted setup. 151 - the prover takes the `.zkey` and the `.wtns` files, and produces two outputs: a `proof.json` 152 containing the proof itself, and a `public.json` containing only the public input (so this 153 is copied from `input.json`, but the latter also contains the private inputs) 154 - from the `.zkey` file, a "verification key" can be extracted (this is again `.json` file), 155 which contains everything the verifier needs (in particular, it contains something like a 156 hash of the circuit. At least once this has to be checked against a source code, trusted setup 157 and resulting `.zkey`, so that you actually know which statement you verify! 158 However you don't want to do this agin and again, because the statement - the circuit - can be very big). 159 In practice the verifier key is often hardcoded in the on-chain verifier contract. 160 - then finally the verifier takes this verification key, the proof file, and the public input file 161 (all `.json` files here), and outputs either "valid" or "invalid" 162 163 The command line workflow of doing all this is described in the file 164 [README.md](README.md).