# HS service IPTs and descriptor publication algorithms

## Code structure

There are three and a half main pieces:

 * IPT Establisher:
   One per IPT.
   Given a single IPT Relay, attempts to set up,
   verify, maintain, and report on the introduction point.
   Persistent (on-disk) state: none.

 * IPT Manager:
   One per HS.
   Selects IPTs, creates and destroys IPT Establishers,
   monitors their success/failure, etc.
   Persistent (on-disk) state:
   current set of IPT Relays;
   current list of IPTs and their (last) states, fault counters, etc.,
   including secret keys necessary to re-establish that IPT.
   Information about previously published
   descriptor contents (`PublishIptSet`)
   issued to the hsdir Publisher,
   that have not yet expired.

 * hsdir Publisher:
   One per HS.
   Identifies the hsdirs for the relevant time periods.
   Constructs descriptors according to the IPT Manager's instructions,
   and publishes them to the hsdirs.

 * `ipt_set`, persistent data structure,
   shared between Manager and Publisher.
   Persistent (on-disk) state:
   which IPTs are published where.

Output of the whole thing:
Stream of introduction requests,
done by passing an mpsc sender into the IPT Manager's constructor,
which is simply cloned and given to each IPT Establisher.
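
The sender-cloning arrangement described above can be sketched roughly as follows.
This is only an illustration:
it uses `std::sync::mpsc` as a stand-in for whatever mpsc channel the real code uses,
and `IntroRequest`, the struct fields, `add_establisher`, `deliver` and `demo_intro_stream`
are all hypothetical names invented for the sketch.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical stand-in for an introduction request.
#[derive(Debug)]
pub struct IntroRequest {
    /// Index of the IPT the request arrived on (illustrative only).
    pub ipt_index: usize,
}

/// Hypothetical establisher: holds its own clone of the shared sender.
pub struct IptEstablisher {
    ipt_index: usize,
    requests: Sender<IntroRequest>,
}

impl IptEstablisher {
    /// Simulate an introduction request arriving at this IPT.
    /// Returns false if the receiving end has gone away.
    pub fn deliver(&self) -> bool {
        let req = IntroRequest { ipt_index: self.ipt_index };
        self.requests.send(req).is_ok()
    }
}

/// Hypothetical manager: is given the sender in its constructor.
pub struct IptManager {
    requests: Sender<IntroRequest>,
    establishers: Vec<IptEstablisher>,
}

impl IptManager {
    pub fn new(requests: Sender<IntroRequest>) -> Self {
        IptManager { requests, establishers: Vec::new() }
    }

    /// Create an establisher and hand it a clone of the sender.
    pub fn add_establisher(&mut self) -> &IptEstablisher {
        let ipt_index = self.establishers.len();
        self.establishers.push(IptEstablisher {
            ipt_index,
            requests: self.requests.clone(),
        });
        &self.establishers[ipt_index]
    }
}

/// Build a manager with `n_ipts` establishers, deliver one request on each,
/// and return the IPT indices seen on the single output stream.
pub fn demo_intro_stream(n_ipts: usize) -> Vec<usize> {
    let (tx, rx) = channel();
    let mut manager = IptManager::new(tx);
    for _ in 0..n_ipts {
        manager.add_establisher().deliver();
    }
    // Drain whatever is queued, without blocking.
    rx.try_iter().map(|req: IntroRequest| req.ipt_index).collect()
}
```

The point of the shape is that the consumer holds one receiver,
regardless of how many IPT Establishers come and go.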

(Each IPT Establisher is told by the IPT Manager
when a descriptor mentioning that IPT is about to be published,
so that the IPT Establisher can reject introduction attempts
using an unpublished IPT.)

(There are too many possible IPTs
to maintain experience information about IPTs we used to use;
the list of experience information would grow to the size of the network.
And recording *all* our IPT experiences might
lead to distinguishability.)

We might of course also operate a completely ephemeral hidden service,
which doesn't store anything on disk
(and therefore gets a new K_hs_id each time it is started).

## IPT selection and startup for a new HS, overall behaviour

 * Select N suitable relays randomly to be IPTs

 * Attempt to establish and verify them, in parallel

 * Wait a short time
   and then publish a short-lifetime descriptor listing the ones
   set up so far (this gets us some working descriptors right away)

 * When we have all the IPTs set up, republish the descriptor.

(This behaviour follows from the detailed algorithm below.)

## Verification and monitoring (optional, probably not in v1)

After ESTABLISH_INTRO,
we attempt (via a 2nd set of circuits)
an INTRODUCE probe, to see if the IPT is working.

We do such probes periodically at random intervals.

NOTE: there is a behaviour/privacy risk here,
which should be properly considered before implementation.

## General operation, IPT selection

We maintain records of each still-possibly-relevant IPT.
(We distinguish "IPT",
an intended or established introduction point with particular keys etc.,
from an "IPT Relay", which is a relay at which we'll establish the IPT.)

When we lose our circuit to an IPT,
we look at the `ErrorKind` to try to determine
if the fault was local (and would therefore affect all relays and IPTs):

 * `TorAccessFailed`, `LocalNetworkError`, `ExternalToolFailed`
   and perhaps others:
   Return the IPT to `Establishing`.

 * Others: declare the IPT `Faulty`.

If we are doing verification, and
our verification probe fails,
but the circuit to the IPT appears to remain up:

 * If we didn't manage to build the test circuit to the IPT,
   check the `ErrorKind`, as above.

 * If we managed to build the test circuit to the IPT,
   but the probe failed (or the probe payload didn't arrive),
   declare the IPT `Faulty`.
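
The fault classification rules above could be written down roughly like this.
It is a sketch under assumptions:
the `ErrorKind` enum here is a small local stand-in containing only the kinds named in these notes
(plus a hypothetical `Other` catch-all), not the real error kind type,
and `IptState`, `ProbeFailure` and both function names are invented for illustration.

```rust
/// Local stand-in for the error kinds named in the notes.
/// `Other` is a hypothetical catch-all for every remaining kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    TorAccessFailed,
    LocalNetworkError,
    ExternalToolFailed,
    Other,
}

/// The two states an IPT can be moved to by these rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IptState {
    Establishing,
    Faulty,
}

/// Hypothetical description of a failed verification probe,
/// in the case where the circuit to the IPT itself stayed up.
#[derive(Debug, Clone, Copy)]
pub enum ProbeFailure {
    /// We could not build the test circuit to the IPT.
    TestCircuitFailed(ErrorKind),
    /// The test circuit was built, but the probe failed
    /// or its payload never arrived.
    ProbeFailed,
}

/// Decide the next state after losing the circuit to an IPT.
pub fn state_after_circuit_loss(kind: ErrorKind) -> IptState {
    match kind {
        // Probably a local fault, which would affect every relay and IPT:
        // retry rather than blaming this IPT.
        // (The notes say "and perhaps others"; only the named kinds are listed here.)
        ErrorKind::TorAccessFailed
        | ErrorKind::LocalNetworkError
        | ErrorKind::ExternalToolFailed => IptState::Establishing,
        // Anything else is attributed to the IPT.
        ErrorKind::Other => IptState::Faulty,
    }
}

/// Decide the next state after a failed verification probe.
pub fn state_after_probe_failure(failure: ProbeFailure) -> IptState {
    match failure {
        // Same classification as for a lost circuit.
        ProbeFailure::TestCircuitFailed(kind) => state_after_circuit_loss(kind),
        ProbeFailure::ProbeFailed => IptState::Faulty,
    }
}
```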

## Descriptor publication

The descriptor output from the IPT maintenance algorithm is
an updated (`postage::watch`) `IptSetStatus`:

```
use std::time::Instant;

enum IptSetStatus {
    Unknown,
    Certain(IptSetForDescriptor),
    Uncertain(IptSetForDescriptor),
}
struct IptSetForDescriptor {
    // List of introduction points for the descriptor.
    // (`Ipt` is a placeholder; the element type is not specified here.)
    ipts: Vec<Ipt>,
    expiry_time: Instant,
}
```

We run a publication algorithm separately for each hsdir.

We record for each hsdir what we have published.

We attempt publication in the following cases:

 * `Certain`, if: the IPT list has changed from what was published,
   and we haven't published a `Certain` set recently
 * `Uncertain`, if: nothing is published,
   or what is published will expire soon,
   or we haven't published since Arti was restarted
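
As a rough illustration, the per-hsdir decision above might look like the following.
Several things here are assumptions rather than part of these notes:
IPTs are represented as plain strings,
`Published` and `Thresholds` are hypothetical types,
the "recently" threshold is a parameter because no value is given for it,
`Unknown` is taken to mean "do not publish",
and a `Certain` set with nothing yet published is treated as a changed list.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub struct IptSetForDescriptor {
    /// Stand-in: IPTs are identified by plain strings in this sketch.
    pub ipts: Vec<String>,
    pub expiry_time: Instant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum IptSetStatus {
    Unknown,
    Certain(IptSetForDescriptor),
    Uncertain(IptSetForDescriptor),
}

/// Hypothetical record of what we last published to one hsdir.
pub struct Published {
    pub ipts: Vec<String>,
    pub expiry_time: Instant,
    /// When we last published a `Certain` set to this hsdir, if ever.
    pub last_certain: Option<Instant>,
    /// Whether we have published to this hsdir since the last restart.
    pub since_restart: bool,
}

/// Hypothetical tuning thresholds.
pub struct Thresholds {
    /// "Soon", for "what is published will expire soon".
    pub expires_soon: Duration,
    /// "Recently", for "published a `Certain` set recently" (value unspecified).
    pub certain_recently: Duration,
}

/// Decide whether to attempt publication to one hsdir right now.
pub fn should_publish(
    status: &IptSetStatus,
    published: Option<&Published>,
    now: Instant,
    t: &Thresholds,
) -> bool {
    match status {
        // Assumption: no case is listed for `Unknown`, so do nothing.
        IptSetStatus::Unknown => false,
        IptSetStatus::Certain(set) => match published {
            // Assumption: nothing published counts as "changed".
            None => true,
            Some(p) => {
                let changed = p.ipts != set.ipts;
                let certain_recently = p.last_certain.map_or(false, |at| {
                    now.saturating_duration_since(at) < t.certain_recently
                });
                changed && !certain_recently
            }
        },
        IptSetStatus::Uncertain(_) => match published {
            None => true,
            Some(p) => {
                let remaining = p.expiry_time.saturating_duration_since(now);
                remaining <= t.expires_soon || !p.since_restart
            }
        },
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the function,
keeps the decision easy to test.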

If a publication attempt fails,
we block further attempts
according to an exponential backoff schedule;
when the timer expires we reconsider
whether, and what, we want to publish.
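
A minimal sketch of such a backoff timer follows.
The actual schedule is not specified in these notes,
so the doubling factor, the cap, and the names `Backoff`, `on_failure` and `on_success`
are assumptions made for illustration;
a real schedule would probably also add random jitter, which is omitted here.

```rust
use std::time::Duration;

/// Hypothetical per-hsdir backoff state.
pub struct Backoff {
    /// Delay after the first failure.
    initial: Duration,
    /// Upper bound on the delay.
    max: Duration,
    /// Consecutive failures so far.
    failures: u32,
}

impl Backoff {
    pub fn new(initial: Duration, max: Duration) -> Self {
        Backoff { initial, max, failures: 0 }
    }

    /// Record a failed attempt and return how long to block further attempts.
    /// The delay is `initial * 2^failures`, capped at `max`.
    pub fn on_failure(&mut self) -> Duration {
        let factor = 2u32.saturating_pow(self.failures);
        let delay = self
            .initial
            .checked_mul(factor)
            .unwrap_or(self.max) // overflow: fall back to the cap
            .min(self.max);
        self.failures = self.failures.saturating_add(1);
        delay
    }

    /// Record a successful publication, resetting the schedule.
    pub fn on_success(&mut self) {
        self.failures = 0;
    }
}
```

When the returned delay has elapsed, the caller would re-run the publication decision
rather than blindly retrying the old upload, since the IPT set may have changed in the meantime.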

## Tuning parameters

TODO: Review these tuning parameters both for value and origin.
Some of these may be in `param-spec.txt` section "8. V3 onion service parameters".
Some of them may be in C Tor.

 * N, number of IPTs to try to maintain:
   configurable, default is 3, max is 20.
   (rend-spec-v3 2.5.4 NUM_INTRO_POINT)

 * k*N: Maximum number of IPTs including replaced faulty ones.
   (We may actually maintain more than this when we have *retiring* IPTs,
   but this doesn't expose us to IPT churn since attackers can't
   force us to retire IPTs.)

 * IPT replacement time: 4..7 days (uniform random)
   TODO: what is the right value here?  (Should we do time-based rotation at all?)

 * "Soon" for "if the published descriptor will expire soon":
   10 minutes.

 * Verification probe interval:
   descriptor expiry time minus 15 minutes.

 * Backoff schedule for hsdir publication.
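
For concreteness, a few of the values above could be captured like this.
The constant and function names are hypothetical,
the values are simply the ones listed above and remain subject to the TODO,
and the caller is assumed to supply the uniform random sample,
since no particular random number generator is implied here.

```rust
use std::time::Duration;

/// Default and maximum for N, the number of IPTs to try to maintain.
pub const DEFAULT_N_IPTS: u8 = 3;
pub const MAX_N_IPTS: u8 = 20;

/// "Soon", for "the published descriptor will expire soon".
pub const EXPIRES_SOON: Duration = Duration::from_secs(10 * 60);

/// Bounds of the IPT replacement time (4..7 days).
pub const MIN_REPLACEMENT: Duration = Duration::from_secs(4 * 24 * 60 * 60);
pub const MAX_REPLACEMENT: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Resolve the configured N, applying the default and the maximum.
/// (No lower bound is stated in these notes, so none is enforced here.)
pub fn checked_n_ipts(configured: Option<u8>) -> Result<u8, String> {
    let n = configured.unwrap_or(DEFAULT_N_IPTS);
    if n > MAX_N_IPTS {
        Err(format!("N = {} exceeds the maximum of {}", n, MAX_N_IPTS))
    } else {
        Ok(n)
    }
}

/// Map a uniform sample `u` in [0, 1] onto the replacement-time range.
/// Out-of-range samples are clamped; non-finite ones are treated as 0.
pub fn replacement_time(u: f64) -> Duration {
    let u = if u.is_finite() { u.clamp(0.0, 1.0) } else { 0.0 };
    MIN_REPLACEMENT + (MAX_REPLACEMENT - MIN_REPLACEMENT).mul_f64(u)
}
```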

## Load balancing (and maybe failover)

This is a sketch, only.
TODO: Look at what Onion Balance does before implementing this.

If it's desired to allow multiple Arti processes to serve a single HS:

The shards will have the IPT Establishers.

There will be one central IPT Manager
(perhaps with a failover).

Each shard will have an IPT Manager Stub
which receives instructions from,
and reports experiences to,
the central IPT Manager.