# arti dirauth design sketch

## dirauth functions

 * receive relays' server descriptor submissions (and extra-info document submissions)
 * exchange server descriptor submissions with other dirauths
   (acting as a full normal dircache is one way to do this,
   but perhaps a more limited form of dircache is sufficient).
 * participate in the shared random protocol
 * perform some reachability tests for candidate relays
   - includes tracking reachability over time
     (and thus computing Stable and Guard flags)
 * generate a vote from
   - available descriptors
   - configuration (including relay-specific configuration provided
     by the Network Health team, mediated by dirauth local policy)
   - bandwidth measurements
 * exchange votes (and consensus signatures) with other dirauths, and make them publicly available
 * given votes, generate and sign the consensus
 * serve the consensus document
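
Tracking reachability over time is mostly bookkeeping of relay uptime runs. The sketch below is a minimal, hypothetical version of that bookkeeping, loosely modelled on the exponentially decaying run-length history that, as we understand it, C Tor keeps for its MTBF calculation. `StabilityHistory`, its methods, the explicit `decay` call and the use of plain seconds are all assumptions for illustration. The real Stable and Guard logic also uses weighted fractional uptime and thresholds derived from the whole relay population.

```rust
/// Hypothetical per-relay stability history (not an existing Arti type).
/// Completed uptime runs are summed with exponentially decaying weights,
/// so old behaviour gradually matters less than recent behaviour.
#[derive(Debug, Default)]
pub struct StabilityHistory {
    /// Decayed sum of the lengths (in seconds) of completed runs.
    weighted_run_length: f64,
    /// Decayed count of completed runs.
    total_run_weights: f64,
    /// Start time (in seconds) of the current run, if the relay is up.
    current_run_start: Option<u64>,
}

impl StabilityHistory {
    /// Record that a reachability test found the relay up at `now`.
    pub fn observe_up(&mut self, now: u64) {
        // Only start a new run if we were not already in one.
        if self.current_run_start.is_none() {
            self.current_run_start = Some(now);
        }
    }

    /// Record that the relay was found down at `now`,
    /// ending the current run (if there is one).
    pub fn observe_down(&mut self, now: u64) {
        if let Some(start) = self.current_run_start.take() {
            self.weighted_run_length += now.saturating_sub(start) as f64;
            self.total_run_weights += 1.0;
        }
    }

    /// Age the history; meant to be called periodically with a factor
    /// such as 0.95 (an illustrative value, not a normative one).
    pub fn decay(&mut self, alpha: f64) {
        self.weighted_run_length *= alpha;
        self.total_run_weights *= alpha;
    }

    /// Mean time between failures in seconds, treating any current run
    /// as if it ended at `now`.
    pub fn mtbf(&self, now: u64) -> f64 {
        let mut total = self.weighted_run_length;
        let mut weights = self.total_run_weights;
        if let Some(start) = self.current_run_start {
            total += now.saturating_sub(start) as f64;
            weights += 1.0;
        }
        // Avoid dividing by (nearly) zero when there is no history.
        if weights < 1e-4 {
            0.0
        } else {
            total / weights
        }
    }
}
```

The reachability tester would call `observe_up` and `observe_down` after each test. The vote calculator would only ever read `mtbf`, which keeps the history store and the flag computation decoupled.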

## principal components

 * dircache.
   (Also needed for Arti Relay; collaborate with that team.)
   - store information (eg descriptors, consensuses) locally
   - serve over BEGIN_DIR
   - serve over HTTP
   - download information as needed; see also
     https://spec.torproject.org/dir-spec/downloading-from-other-auths.html

 * reachability tester and relay status history
   - test relays' reachability
   - record enough history to calculate consensus uptime and MTBF measures

 * ingesters for relay-specific information
   - relay-specific configuration from Network Health
   - bandwidth scanner results

 * consensus algorithm implementation
   - We will not attempt to match the behaviour of C Tor exactly.
     Instead, we will provide this as a `.so` (or maybe an executable)
     and will arrange for C Tor dirauth to be able to use it
     (see transition plan).

 * vote calculator

 * component for generating K\_dirauth\_sign\_* and signing it with
   KS\_dirauth\_id\_*, capable of running offline.

The latter two (the vote calculator and the signing component)
don't need to be always-online.
We intend to separate them out so that (likely in the future)
they can work from a static data dump, or over a restricted protocol,
and so don't need full internet access.
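
One way to keep the vote calculator independent of the network is to make it a pure function of a snapshot of its inputs. The same code could then run online or against a static data dump. The sketch below is minimal and assumes that design. Every type and field in it (`VoteInputs`, `honour_network_health` and so on) is hypothetical, and real votes carry far more per-relay data (flags, versions, policies). For bandwidth it reads only the `node_id=$<fingerprint>` and `bw=<value>` keys of bandwidth-file relay lines, as we understand that format, and ignores everything else.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical summary of one relay's descriptor plus its reachability result.
pub struct RelayInfo {
    pub fingerprint: String,
    pub reachable: bool,
}

/// Hypothetical snapshot of everything the vote calculator reads.
/// Because it is plain data, it could be loaded from a static dump.
pub struct VoteInputs {
    pub relays: Vec<RelayInfo>,
    /// Relay-specific configuration from Network Health: relays to reject.
    pub network_health_rejects: HashSet<String>,
    /// Local policy: whether this dirauth applies the Network Health list.
    pub honour_network_health: bool,
    /// Raw relay lines from a bandwidth scanner's output file.
    pub bandwidth_lines: Vec<String>,
}

/// One (greatly simplified) entry in our vote.
#[derive(Debug, PartialEq)]
pub struct VoteEntry {
    pub fingerprint: String,
    pub running: bool,
    pub measured_bw: Option<u64>,
}

/// Extract (fingerprint, bw) from a bandwidth-file relay line,
/// ignoring all other keys. Returns None if either key is missing.
fn parse_bandwidth_line(line: &str) -> Option<(String, u64)> {
    let mut fingerprint = None;
    let mut bw = None;
    for field in line.split_whitespace() {
        match field.split_once('=') {
            Some(("node_id", v)) => fingerprint = v.strip_prefix('$').map(str::to_owned),
            Some(("bw", v)) => bw = v.parse::<u64>().ok(),
            _ => {} // other keys (nick, measured_at, ...) are ignored here
        }
    }
    Some((fingerprint?, bw?))
}

/// Compute vote entries from the snapshot alone: no network access.
pub fn calculate_vote(inputs: &VoteInputs) -> Vec<VoteEntry> {
    let measured: BTreeMap<String, u64> = inputs
        .bandwidth_lines
        .iter()
        .filter_map(|line| parse_bandwidth_line(line))
        .collect();

    // A BTreeMap keeps the output sorted by fingerprint, so the same
    // snapshot always produces the same vote.
    let mut entries = BTreeMap::new();
    for relay in &inputs.relays {
        // Network Health input only takes effect if local policy allows it.
        let rejected = inputs.honour_network_health
            && inputs.network_health_rejects.contains(&relay.fingerprint);
        if rejected {
            continue;
        }
        entries.insert(
            relay.fingerprint.clone(),
            VoteEntry {
                fingerprint: relay.fingerprint.clone(),
                running: relay.reachable,
                measured_bw: measured.get(&relay.fingerprint).copied(),
            },
        );
    }
    entries.into_values().collect()
}
```

With this shape, the online parts of the dirauth (reachability tester, ingesters) only have to produce a `VoteInputs` snapshot. Whether that snapshot arrives in-process, from a file, or over a restricted protocol does not affect the calculator.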

## deployment transition plan

Because of how the directory consensus protocol works,
if we change the consensus algorithm,
at least 1/3 of functioning dirauths, and probably more,
must change simultaneously.
(We go from \<1/3 new to \>2/3 new in one go.)
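
To see why, model each of the N dirauths as supporting only one of the two methods. Assume, as the ">2/3" above suggests, that a method can be used only if more than two-thirds of the dirauths support it. Let k be the number of dirauths that support only the new method, so N - k support only the old one:

```math
\begin{aligned}
\text{old method usable} &\iff N - k > \tfrac{2}{3}N \iff k < \tfrac{1}{3}N \\
\text{new method usable} &\iff k > \tfrac{2}{3}N
\end{aligned}
```

So for N/3 ≤ k ≤ 2N/3 neither method is usable. k has to jump from below N/3 to above 2N/3 in a single step, which means more than N/3 dirauths changing at once.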

We think it is probably not going to be feasible to reproduce
the consensus calculations of C Tor precisely in Arti.

A simultaneous switch is practical only if the switching dirauths
all implement both the old and the new consensus algorithm.
(This is what the consensus methods are for.)
But we can't switch all dirauths from C Tor to Arti on the same day.

Instead, we will
make the Arti implementation of the consensus protocol
available in a form that can be used by C Tor.
We'll adapt C Tor to be able to call that implementation,
making the choice based on the consensus method.
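
The sketch below is one hypothetical way the Arti consensus code could be exposed to C Tor as a C ABI in a `.so`, for example by building it as a Rust `cdylib`. The exported names (`arti_consensus_method_supported`, `arti_compute_consensus`, `arti_free_string`), the method number and the placeholder body are all assumptions. The point is the shape of the boundary. C Tor asks whether a consensus method belongs to Arti. If it does, C Tor passes the votes across as NUL-terminated strings and gets a newly allocated document back.

```rust
use std::ffi::{c_char, CStr, CString};
use std::ptr;

/// Hypothetical range of consensus method numbers implemented by Arti.
const ARTI_METHODS: std::ops::RangeInclusive<u32> = 100..=100;

/// Rust-side entry point (placeholder): a real implementation would parse
/// the votes and run the consensus algorithm for `method`.
fn compute_consensus(method: u32, votes: &[&str]) -> Result<String, String> {
    if !ARTI_METHODS.contains(&method) {
        return Err(format!("unsupported consensus method {method}"));
    }
    if votes.is_empty() {
        return Err("no votes".to_string());
    }
    // Placeholder output: only the opening lines of a consensus document.
    Ok(format!("network-status-version 3\nconsensus-method {method}\n"))
}

/// C Tor calls this to decide whether to hand `method` to Arti.
#[unsafe(no_mangle)]
pub extern "C" fn arti_consensus_method_supported(method: u32) -> bool {
    ARTI_METHODS.contains(&method)
}

/// Compute a consensus from `n_votes` NUL-terminated vote documents.
/// Returns NULL on failure; otherwise the caller must release the result
/// with `arti_free_string`.
///
/// # Safety
/// `votes` must point to `n_votes` valid NUL-terminated strings.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn arti_compute_consensus(
    method: u32,
    votes: *const *const c_char,
    n_votes: usize,
) -> *mut c_char {
    if votes.is_null() {
        return ptr::null_mut();
    }
    let raw = unsafe { std::slice::from_raw_parts(votes, n_votes) };
    let mut parsed = Vec::with_capacity(n_votes);
    for &p in raw {
        if p.is_null() {
            return ptr::null_mut();
        }
        match unsafe { CStr::from_ptr(p) }.to_str() {
            Ok(s) => parsed.push(s),
            Err(_) => return ptr::null_mut(), // not valid UTF-8
        }
    }
    match compute_consensus(method, &parsed).map(CString::new) {
        Ok(Ok(doc)) => doc.into_raw(),
        _ => ptr::null_mut(),
    }
}

/// Free a string returned by `arti_compute_consensus`.
///
/// # Safety
/// `s` must be NULL or a pointer previously returned by this library.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn arti_free_string(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}
```

Keeping both allocation and deallocation on the Rust side (via `arti_free_string`) avoids mixing allocators across the language boundary. An executable taking votes on stdin would be an alternative with the same split of responsibilities.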

When enough (C Tor) dirauths have the Arti consensus algorithm available,
the consensus method protocol will automatically switch
to the Arti consensus method.
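
The automatic switch comes from the way the consensus method is chosen. Below is a simplified model. It assumes, as we understand dir-spec, that the highest method listed by more than two-thirds of the votes is used. The method numbers in the usage are illustrative only. The real procedure has a fallback for when no method qualifies, which this sketch just reports as `None`.

```rust
/// Choose the consensus method: the highest method number that more than
/// two-thirds of the votes list as supported. Each inner Vec is the list
/// of methods advertised in one dirauth's vote.
pub fn select_consensus_method(votes: &[Vec<u32>]) -> Option<u32> {
    let n = votes.len();
    // Every method that appears in at least one vote, highest first.
    let mut candidates: Vec<u32> = votes.iter().flatten().copied().collect();
    candidates.sort_unstable_by(|a, b| b.cmp(a));
    candidates.dedup();
    candidates.into_iter().find(|m| {
        let support = votes.iter().filter(|v| v.contains(m)).count();
        // Integer form of "support > 2n/3", avoiding rounding issues.
        3 * support > 2 * n
    })
}
```

Dirauths running C Tor with the Arti plugin keep advertising the C Tor methods as well as the Arti one. So the network stays on a C Tor method until support for the Arti method crosses the threshold, and then moves over without any dirauth having to change software at that moment. With nine votes, for example, six supporting the Arti method is not enough (6 is not more than 2/3 of 9) but seven is.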

After that, C Tor dirauths without the Arti consensus algorithm
will effectively not participate until they are upgraded.
But pure-Arti dirauths (which can only perform the Arti consensus algorithm)
can then be deployed.

(In practice, during the transition, there may be
more than one relevant Arti consensus method
and possibly more than one relevant C Tor consensus method.)

### Rationale, dirauth upgrade impact

Arti dirauth is not going to be a drop-in replacement
for C Tor dirauth.
While we'll aim to minimise unnecessary changes,
it will interact with the operating system somewhat differently,
be configured somewhat differently,
and may involve complications around key management.

So the upgrade process for each dirauth
will involve human work by the operator,
and carries some risk.
It is likely to involve some downtime.

Attempting to do this near-simultaneously for all dirauths
poses a big coordination problem and risks a long outage.

Ideally dirauth upgrades would be staggered
to maximise availability and minimise risk.

### dirauth operator options

Each dirauth operator can choose
from the following options
(presented in order from least to most effort):

 1. Do nothing until the network consensus
    is using the Arti consensus method,
    at which point their dirauth ceases to be part of the consensus.
    Then, upgrade straight to Arti dirauth at the operator's convenience.

    The transition plan depends on fewer than
    1/3 of dirauth operators choosing this option,
    and ideally many fewer.

 2. Install the Arti dirauth plugin when it becomes available,
    and tell C Tor to load and use it.
    Eventually, when Arti consensuses are stable, upgrade to Arti dirauth.
    This dirauth will participate in the consensus
    throughout the transition.
    Low-latency communication with the operator, and quick responses from them,
    are not required.

 3. Install the Arti dirauth plugin,
    but initially configure it to run only in a testing mode
    (ie, don't advertise the Arti consensus method).
    Engage with the transition scheduling team
    (Arti team, Network Health team, interested dirauths)
    and be part of the coordinated configuration change
    to switch to the Arti consensus method.
    Eventually, when Arti consensuses are stable, upgrade to Arti dirauth.
    We need at least a handful of these;
    exactly how many depends on what options everyone picks.

 4. Switch over to Arti dirauth as soon as possible.
    These dirauths will not participate in consensuses
    until the consensus switches to the Arti method.

    These operators can provide valuable feedback on Arti dirauth,
    but having many dirauths in this state reduces network resilience,
    so ideally this would be a minority choice.
    Ideally we would have at least one dirauth operator in this category,
    so we can discover issues with Arti dirauth as soon as possible,
    but that's not essential for the transition plan.

dirauth operators may change their minds,
moving from one category to another,
but for simplicity we'll write as if
each dirauth is in a fixed category determined at the start.
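
The constraints on how operators spread across these categories come down to a little arithmetic. The sketch below makes simplifying assumptions:

- every dirauth votes;
- a method is used only when more than two-thirds of the votes support it;
- after Phase 2 but before Phase 3, categories 1 and 3 advertise only C Tor methods, category 2 advertises both, and category 4 advertises only the Arti method.

`Categories` and its methods are hypothetical names.

```rust
/// Hypothetical count of dirauths in each operator category (1-4).
pub struct Categories {
    pub c1: u32, // do nothing until the switch
    pub c2: u32, // plugin installed, Arti method advertised
    pub c3: u32, // plugin installed, Arti method advertised only from Phase 3
    pub c4: u32, // pure Arti dirauth as soon as possible
}

/// "More than two-thirds", in integer arithmetic.
fn supermajority(support: u32, total: u32) -> bool {
    3 * support > 2 * total
}

impl Categories {
    fn total(&self) -> u32 {
        self.c1 + self.c2 + self.c3 + self.c4
    }

    /// Before Phase 3: can the network still agree on a C Tor method?
    pub fn ctor_method_before_switch(&self) -> bool {
        supermajority(self.c1 + self.c2 + self.c3, self.total())
    }

    /// Before Phase 3: would the Arti method already win without category 3?
    /// Not necessarily fatal, but the switch would then happen before the
    /// coordinated Phase 3 configuration change.
    pub fn switches_prematurely(&self) -> bool {
        supermajority(self.c2 + self.c4, self.total())
    }

    /// From Phase 3 on: does the Arti method win?
    pub fn arti_method_after_switch(&self) -> bool {
        supermajority(self.c2 + self.c3 + self.c4, self.total())
    }

    /// All three conditions the transition plan relies on.
    pub fn plan_works(&self) -> bool {
        self.ctor_method_before_switch()
            && !self.switches_prematurely()
            && self.arti_method_after_switch()
    }
}
```

Under this model, category 1 must be strictly below 1/3 of dirauths, and category 4 must stay small enough that the C Tor method keeps its supermajority. Category 3 must be large enough that its configuration change is what tips the Arti method over the threshold.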

### Detailed schedule

 * Phase 1: software development.

   Discussions with dirauth operators and the Network Health team
   about requirements, planning, etc.

   Arti team develops:
     - Arti dirauth
     - Arti consensus method plugin for C Tor
     - C Tor configuration for using the Arti consensus method plugin

   dirauth operators provide feedback, additional testing, etc.

   There are likely to be updates to C Tor to tidy up
   some aspects of the Tor protocols which we don't want to reimplement.
   These will be released and deployed according to normal C Tor processes.

 * Milestone 1: Software available.

   The Arti project is shipping both
    1. Arti dirauth
    2. the Arti consensus method plugin and its support in C Tor
   as formal software deliverables,
   in a form suitable for production use by dirauth operators.

   Any necessary updates to C Tor dirauths (and maybe relays)
   for compatibility with Arti votes and consensuses
   have been deployed.

   Schedule determined by: software development timescale.

 * Phase 2: deployment of support for the Arti consensus method.

   dirauths in category 4 switch to Arti dirauth
   (and stop running C Tor entirely).
   Each of these dirauths will be down during its transition.

   dirauths in categories 2 and 3 install the Arti dirauth plugin,
   and configure their C Tor accordingly.

 * Milestone 2: Arti consensus method available.

   More than 2/3 of dirauths have the Arti consensus method available
   (ie, are in categories 2-4 and have completed their phase 2 setup).

   Schedule determined by: dirauth operators' deployment decisions.

 * Phase 3: switch to the Arti consensus method.

   dirauths in category 3 coordinate,
   and switch their configuration to advertise the Arti consensus method.

   The Tor network consensus switches over.
   Category 4 dirauths now participate in consensus;
   category 1 dirauths no longer participate in consensus.
   We monitor the network behaviour,
   ready to revert if we see problems.

   Schedule determined by:
   explicit decision by category 3 dirauth operators
   as advised by Arti experts, Network Health team, etc.

 * Milestone 3: we believe the Arti consensus method is stable.

   Schedule determined by:
   explicit decision by category 3 dirauth operators
   as advised by Arti experts, Network Health team, etc.

 * Phase 4: deployment of Arti dirauth.

   dirauths (in categories 1-3) install Arti dirauth and uninstall C Tor,
   on their own schedule.
   Each of these dirauths will be down during its transition;
   some coordination is advisable to reduce overall network impact.

 * Milestone 4: C Tor dirauth withdrawn.

   All (or nearly all) dirauths are running Arti dirauth
   (not C Tor with Arti plugin).
   C Tor dirauth can be desupported.
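
The schedule above is a linear sequence of phases, each ending at a milestone. The sketch below encodes it as a small state machine. The `Stage` and `Milestone` names are hypothetical. Only Milestone 2 gets a mechanical check here: the Arti method must be available to more than two-thirds of dirauths, which is what consensus method selection needs. The other milestones are human decisions, modelled as plain events.

```rust
/// Hypothetical encoding of the transition schedule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Phase1Development,
    Phase2DeploySupport,
    Phase3SwitchMethod,
    Phase4DeployArtiDirauth,
    /// Milestone 4 reached: C Tor dirauth can be desupported.
    CTorWithdrawn,
}

/// The milestones that end each phase.
pub enum Milestone {
    /// Milestone 1: software available.
    SoftwareAvailable,
    /// Milestone 2: `with_arti_method` of `total` dirauths can do the Arti method.
    ArtiMethodAvailable { with_arti_method: u32, total: u32 },
    /// Milestone 3: the Arti consensus method is believed stable.
    ArtiMethodStable,
    /// Milestone 4: all (or nearly all) dirauths run Arti dirauth.
    CTorDirauthsRetired,
}

impl Stage {
    /// Move to the next stage if `milestone` is the one that ends the
    /// current phase and its condition holds; otherwise stay put.
    pub fn advance(self, milestone: &Milestone) -> Stage {
        use Milestone::*;
        use Stage::*;
        match (self, milestone) {
            (Phase1Development, SoftwareAvailable) => Phase2DeploySupport,
            // "More than two-thirds", in integer arithmetic.
            (Phase2DeploySupport, ArtiMethodAvailable { with_arti_method, total })
                if 3 * with_arti_method > 2 * total =>
            {
                Phase3SwitchMethod
            }
            (Phase3SwitchMethod, ArtiMethodStable) => Phase4DeployArtiDirauth,
            (Phase4DeployArtiDirauth, CTorDirauthsRetired) => CTorWithdrawn,
            // Out-of-order or insufficient milestones leave the stage unchanged.
            (stage, _) => stage,
        }
    }
}
```

Making out-of-order milestones a no-op mirrors the plan's intent that each phase starts only once the previous milestone has actually been reached.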