/ workflow / PROOFS.md
PROOFS.md
  1  
  2  The cogs of a ZK proof
  3  ======================
  4  
  5  The goal of this document is to describe all the different moving parts 
  6  (circuit, witness, public input, etc) of a ZK proof through a simple example,
  7  hopefully giving some intuition about how these cogs fit together.
  8  
  9  
 10  The example: Fibonacci sequence
 11  -------------------------------
 12  
 13  ZK proofs (or more precisely: "succint arguments") are all about convicing
 14  another party (the "verifier"), about the truth of some statement, without
 15  telling them _everything_.
 16  
 17  In our toy example, the statement will be the 1000th element of a Fibonacci
 18  sequence.
 19  
 20  A Fibonacci sequence is generated deterministically from its first two elements,
 21  let's call them `U` and `V`. These are numbers. The sequence is then defined as
 22  
 23      a[0] = U
 24      a[1] = V
 25      a[n] = a[n-1] + a[n-2]
 26  
 27  In the standard Fibonacci sequence, we have `U=0` and `V=1` (or sometimes `U=V=1`,
 28  which results in the same sequence but shifted by one). But we can consider 
 29  this generalized setting with `U` and `V` being arbitrary numbers.
 30  
 31  The statement I want to convince you, is that for a given `U` (which I tell you,
 32  say `U=53`), I know a value for `V` (which I don't tell you), such that `a[1000]`
 33  is some given value (of course, I tell you this value too, otherwise there is
 34  nothing to be convinced about).
 35  
 36  Pretty simple so far.
 37  
 38  
 39  The statement
 40  -------------
 41  
 42  A proof always start with a statement. Usually, the statement is descibed by
 43  a computer program (in our case, the program is computing the Fibonacci sequence).
 44  A program normally has some inputs and some ouputs. Some of the inputs are
 45  public (I tell you what they are), some others can be secret (I don't tell you what 
 46  they are). And the statement is _usually_ that if you run this program with the given 
 47  inputs, then the program runs normally (it doesn't throw an error) and you get the 
 48  given output.
 49  
 50  Note: you can freely move things from outputs to inputs. In our case, the output
 51  would be `a[1000]`, but equivalently, I can make another program where there is
 52  a third input `W`, and the program just throws an error at the end if `a[1000] != W`.
 53  Because of this, we often just say IO for input/output, and "public IO" for the
 54  part of it I tell you, and "private IO" or "secret IO" for the part I don't tell 
 55  you.
 56  
 57  So far we have:
 58  
 59  - the statement (a computer program)
 60  - the public IO
 61  - and the private IO
 62  
 63  The computer program is often called a "circuit", because for technical reasons
 64  it often needs to be described as an "arithmetic circuit", which is something
 65  similar to digital electronics circuits (but in math). This gets a bit more
 66  complicated with zkVM-s, where the circuit itself is something like a CPU, and your
 67  statement is an actual program running on that (virtual) CPU.
 68  
 69  
 70  The witness
 71  -----------
 72  
 73  How essentially all proof system works is to translate the statement you want
 74  the prove into some math equations, and then prove that I know a solution for
 75  those equations. Because of this, the prover needs to know not only the input
 76  and the output, but all the intermediate results too - essentially a very detailed
 77  log of all individual operations happening in the computer program describing 
 78  the statement. This huge logfile is called "the witness", and in practice it's
 79  usually just a large array of numbers (more precisely, finite field elements).
 80  
 81  Unfortunately, the word "witness" is a bit overloaded: Sometimes it refers only
 82  to the private IO, and sometimes to the whole log. While these are kind of the same
 83  from a holistic point of view, it still makes sense to distinguish them, and
 84  I will call the whole execution log the witness.
 85  
 86  Creating a proof always start by running the program which descibes statement,
 87  and creating this very detailed log, the witness (sometimes also called a 
 88  "transcript"). This step is usually called the "witness generation".
 89  
 90  
 91  Trusted setup
 92  -------------
 93  
 94  Some, but not all, proof systems requires something called a "trusted setup".
 95  This is some structured piece of information produced by an external party, 
 96  which has some secret encoded in it. It's important that the secret is
 97  deleted (hence the word "trusted") after generating this setup, because anybody 
 98  in possesion of the secret can cheat and create fake proofs. 
 99  
100  For example in the popular scheme called "KZG commitments", the secret is
101  a number (a field element) usually called `tau`, and the result of the 
102  trusted setup is a sequence
103  
104      [ g , tau*g , tau^2*g , ... , tau^N*g ]
105  
106  where `g` is a fixed group element. Because of this, the procedure generating
107  this sequence is called a "powers-of-tau ceremony". Assuming the discrete 
108  logarithm problem is hard in the group, it's not feasible to compute `tau` 
109  itself given that sequence, but it's easy to compute `f(tau)*g` for a polynomial
110  `f` with degree at most `N`.
111  
112  In practice such a trusted setup can be accomplished using multiparty computation
113  (MPC), so that if there is at least _one_ honest participant, then the resulting
114  sequence is safe to use. These events are called "ceremonies" (in early days
115  people actually had to do this in person, and at the end they ceremonially destroyed 
116  the computer which contained the random secret `tau`).
117  
118  There are two kinds of trusted setups: universal and circuit-specific. An universal
119  one can be used to prove many different statements, while a circuit-specific one
120  can be used to prove a single fixed statement, or circuit (but with different inputs).
121  
122  For example Groth16 needs a circuit-specific trusted setup, Plonk+KZG only needs 
123  a universal trusted setup, while Plonk+FRI does not need any such setup at all. 
124  However usually systems with trusted setups are more efficient (for example Groth16 has
125  the smallest proofs among all known proof system). In case of Groth16, the circuit-specific 
126  setup is generated from an already existing universal setup and the circuit.
127  
128  
129  The files
130  ---------
131  
132  Using the popular [`circom`](https://docs.circom.io/) + [`snarkjs`](https://github.com/iden3/snarkjs) 
133  ecosystem, all the above parts are in different files, which are produced by
134  different commands.
135  
136  These are:
137  
138  - `.circom` files contain the source code of your circuit (both the equations and 
139    the actual program, interleaved together)
140  - `input.json` contains the circuit inputs (both public and private inputs)
141  - `.r1cs` file contains the circuit, that is, the statement you want to prove
142    (R1CS is short for Rank-1 Constraint System, a specific form of a "circuit").
143    This is one output of the `circom` compiler, which reads the above source code and produces `.r1cs` files
144  - `.wtns` files contain the witness. This is generated by the so-called "witness generator", which is 
145    another output of the `circom` compiler. Essentially the compiler separates the equations (goes into `.r1cs`)
146    and running the program (goes into the witness generator, which in this case can be either a WASM or a C++ program).
147    When you run the witness generator, it takes the `input.json` and produces the `.wtns` file
148  - `.ptau` files are containing universal trusted setups
149  - `.zkey` files are called a "prover key" (it contains everything the prover needs), and 
150    in case of Groth16 it also corresponds to a circuit-specific trusted setup.
151  - the prover takes the `.zkey` and the `.wtns` files, and produces two outputs: a `proof.json` 
152    containing the proof itself, and a `public.json` containing only the public input (so this
153    is copied from `input.json`, but the latter also contains the private inputs)
154  - from the `.zkey` file, a "verification key" can be extracted (this is again `.json` file),
155    which contains everything the verifier needs (in particular, it contains something like a 
156    hash of the circuit. At least once this has to be checked against a source code, trusted setup
157    and resulting `.zkey`, so that you actually know which statement you verify!
158    However you don't want to do this agin and again, because the statement - the circuit - can be very big).
159    In practice the verifier key is often hardcoded in the on-chain verifier contract.
160  - then finally the verifier takes this verification key, the proof file, and the public input file
161    (all `.json` files here), and outputs either "valid" or "invalid"
162  
163  The command line workflow of doing all this is described in the file
164  [README.md](README.md).