# arti dirauth design sketch

## dirauth functions

* receive relays' server descriptor submissions (and extra-info docs submissions)
* exchange server submissions with other dirauths
  (acting as a full normal dircache is one way to do this,
  but perhaps a more limited form of dircache is sufficient).
* participate in the shared random protocol
* perform some reachability tests for candidate relays
  - includes tracking reachability over time
    (and thus computing Stable and Guard flags)
* generate a vote from
  - available descriptors
  - configuration (including relay-specific configuration provided
    by network health team, mediated by dirauth local policy)
  - bandwidth measurements
* exchange votes (and consensus signatures) with other dirauths (and make them publicly available)
* given votes, generate and sign consensus
* serve the consensus document

## principal components

* dircache.
  (Also needed for Arti Relay; collaborate with that team)
  - store information (eg descriptors, consensuses) locally
  - serve over BEGIN_DIR
  - serve over HTTP
  - Download information as needed; see also
    https://spec.torproject.org/dir-spec/downloading-from-other-auths.html

* reachability tester and relay status history
  - test relays' reachability
  - record enough history to calculate consensus uptime and MTBF measures

* ingesters for relay-specific information
  - relay-specific configuration from Network Health
  - bandwidth scanner results

* consensus algorithm implementation
  - We will not attempt to 100% match the behaviour of C Tor.
    Instead, we will provide this as a `.so` (or maybe an executable)
    and will arrange for C Tor dirauth to be able to use it
    (see transition plan).

* vote calculator

* component for generating K\_dirauth\_sign\_* and signing it with
  KS\_dirauth\_id\_*, capable of running offline.

The latter two don't need to be always-online.
We'll separate them out so that they can (likely in the future)
use a static data dump, or a restricted protocol,
so that they don't need full internet access.

## deployment transition plan

The directory consensus protocol means that
if we change the consensus algorithm
at least 1/3 of functioning dirauths, and probably more,
must change simultaneously.
(We go from \<1/3 new to \>2/3 new in one go.)

We think it is probably not going to be feasible to precisely reproduce
the consensus calculations from C Tor in Arti.

Such a simultaneous switch is practical only if the simultaneously-switching dirauths
all implement both the old and new consensus algorithm.
(This is what the consensus methods are for.)
We can't switch all dirauths from C Tor to Arti on the same day.

Instead, we will
make the Arti implementation of the consensus protocol
available in a form that can be used by C Tor.
We'll
adapt C Tor to be able to call that implementation,
making the choice based on the consensus method.

When enough (C Tor) dirauths have the Arti consensus algorithm available,
the consensus method protocol will automatically switch
to using the Arti consensus.

After that, C Tor dirauths without the Arti consensus algorithm
will effectively not participate, until they are upgraded.
But pure-Arti dirauths (which can only perform the Arti consensus algorithm)
can be deployed.

(In practice there may be, during the transition,
more than one relevant Arti consensus method
and possibly more than one relevant C Tor consensus method.)

### Rationale, dirauth upgrade impact

Arti dirauth is not going to be a drop-in replacement
for C Tor dirauth.
While we'll aim to minimise unnecessary changes,
it will interact with the operating system somewhat differently,
be configured somewhat differently,
and there will be possible complications involving key management.

So the upgrade process for each dirauth
will involve human work by the operator,
and carries some risk.
It is likely to involve some downtime.

Attempting to do this near-simultaneously for all dirauths
poses a big coordination problem and risks a long outage.

Ideally dirauth upgrades would be staggered,
to maximise availability and minimise risk.

### dirauth operator options

Each dirauth operator can choose
from the following options
(presented in order from least to most effort):

1. Do nothing until the network consensus
   is using the Arti consensus method,
   at which point their dirauth ceases to be part of the consensus.
   Then, upgrade straight to Arti dirauth at operator's convenience.

   The transition plan depends on no more than
   1/3 of dirauth operators choosing this option -
   ideally, fewer.

2. Install the Arti dirauth plugin when it becomes available,
   and tell C Tor to load/use it.
   Eventually, when Arti consensuses are stable, upgrade to Arti dirauth.
   This dirauth will participate in the consensus
   throughout the transition.
   Low-latency communication with and quick response by the operator
   is not required.

3. Install the Arti dirauth plugin,
   but initially configure it to run only in a testing mode -
   ie, don't advertise the Arti consensus method.
   Engage with the transition scheduling team
   (Arti team, Network Health team, interested dirauths)
   and be part of the coordinated configuration change
   to switch to the Arti consensus method.
   Eventually, when Arti consensuses are stable, upgrade to Arti dirauth.
   We need at least a handful of these,
   depending precisely on what options everyone picks.

4. Switch over to Arti dirauth as soon as possible.
   These dirauths will not participate in consensuses
   until the consensus switches to the Arti method.

   These operators can provide valuable feedback on Arti dirauth,
   but having many dirauths in this state reduces network resilience,
   so ideally this would be a minority choice.
   Ideally we would have at least one dirauth operator in this category,
   so we can discover issues with Arti dirauth as soon as possible,
   but that's not essential for the transition plan.

dirauth operators may change their mind,
moving from one category to another,
but for simplicity we'll write as if
each dirauth is in a fixed category determined at the start.

### Detailed schedule

* Phase 1: software development.

  Discussions with dirauth operators, Network Health team,
  about requirements, planning, etc.

  Arti team develops:
  - Arti dirauth
  - Arti consensus method plugin for C Tor
  - C Tor configuration for using Arti consensus method plugin

  dirauth operators provide feedback, additional testing, etc.

  There are likely to be updates to C Tor to tidy up
  some aspects of the Tor protocols which we don't want to reimplement.
  These will be released and deployed according to normal C Tor processes.

* Milestone 1: Software available.

  The Arti project is shipping both of the following
  as formal software deliverables,
  in a form suitable for production use by dirauth operators:
  1. Arti dirauth
  2. the Arti consensus method plugin and its support in C Tor

  Any necessary updates to C Tor dirauths (and maybe relays)
  for compatibility with Arti votes and consensuses
  have been deployed.

  Schedule determined by: software development timescale.

* Phase 2: deployment of support for the Arti consensus method.

  dirauths in category 4 switch to Arti dirauth
  (and stop running C Tor entirely).
  Each of these dirauths will be down during its transition.

  dirauths in categories 2 and 3 install the Arti dirauth plugin,
  and configure their C Tor accordingly.

* Milestone 2: Arti consensus method available.

  At least 2/3 of dirauths have the Arti consensus method available
  (ie, are in categories 2-4 and have completed their phase 2 setup).

  Schedule determined by: dirauth operators' deployment decisions.

* Phase 3: switch to the Arti consensus method.

  dirauths in category 3 coordinate,
  and switch their configuration to advertise the Arti consensus method.

  The Tor network consensus switches over.
  Category 4 dirauths now participate in consensus;
  category 1 dirauths no longer participate in consensus.
  We monitor the network behaviour,
  ready to revert if we see problems.

  Schedule determined by:
  explicit decision by category 3 dirauth operators
  as advised by Arti experts, Network Health team, etc.

* Milestone 3: we believe the Arti consensus method is stable.

  Schedule determined by:
  explicit decision by category 3 dirauth operators
  as advised by Arti experts, Network Health team, etc.

* Phase 4: deployment of Arti dirauth

  dirauths (in categories 1-3) install Arti dirauth and deinstall C Tor,
  on their own schedule.
  Each of these dirauths will be down during its transition;
  some coordination is advisable to reduce overall network impact.

* Milestone 4: C Tor dirauth withdrawn.

  All (or nearly all) dirauths are running Arti dirauth
  (not C Tor with Arti plugin).
  C Tor dirauth can be desupported.
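
### Illustrative sketch: operator categories and Milestone 2

To make the bookkeeping in the operator options and the schedule concrete, here is a minimal Rust sketch of the four operator categories, of which of them take part in the consensus before and after the switch, and of the Milestone 2 condition ("at least 2/3 of dirauths have the Arti consensus method available"). Every name in it (`Category`, `has_arti_method`, `participates`, `milestone2_reached`, `participants`) is hypothetical and invented for illustration; none is part of Arti or C Tor. It treats each dirauth as fixed in one category and as having completed its phase 2 setup, and it does not model the actual consensus method negotiation, which is defined by the directory specification.

```rust
/// The four operator options described above (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category {
    /// Option 1: stay on plain C Tor until after the switch.
    DoNothing,
    /// Option 2: C Tor with the Arti plugin loaded and in use.
    PluginEnabled,
    /// Option 3: C Tor with the plugin in testing mode until the coordinated switch.
    PluginCoordinated,
    /// Option 4: pure Arti dirauth as early as possible.
    ArtiEarly,
}

impl Category {
    /// Whether a dirauth in this category is able to compute the Arti consensus method.
    /// Assumes the operator has completed the phase 2 setup.
    pub fn has_arti_method(self) -> bool {
        self != Category::DoNothing
    }

    /// Whether a dirauth in this category takes part in the consensus,
    /// depending on whether the network is using the Arti consensus method.
    pub fn participates(self, arti_method_in_use: bool) -> bool {
        match self {
            // Plain C Tor only knows the old methods.
            Category::DoNothing => !arti_method_in_use,
            // C Tor plus plugin can do both old and new methods.
            Category::PluginEnabled | Category::PluginCoordinated => true,
            // Pure Arti can only do the Arti method.
            Category::ArtiEarly => arti_method_in_use,
        }
    }
}

/// Milestone 2 as worded in the schedule: at least 2/3 of dirauths
/// have the Arti consensus method available.
pub fn milestone2_reached(dirauths: &[Category]) -> bool {
    let available = dirauths.iter().filter(|c| c.has_arti_method()).count();
    !dirauths.is_empty() && 3 * available >= 2 * dirauths.len()
}

/// Number of dirauths taking part in the consensus.
pub fn participants(dirauths: &[Category], arti_method_in_use: bool) -> usize {
    dirauths
        .iter()
        .filter(|c| c.participates(arti_method_in_use))
        .count()
}
```

For example, calling `participants` on a made-up list of categories with `arti_method_in_use` set to `false` and then `true` shows how many dirauths would drop out of, or join, the consensus at the Phase 3 switch. Integer arithmetic (`3 * available >= 2 * total`) avoids rounding questions around the 2/3 threshold.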