# Service-side PoW

This document describes the implementation plan for service-side PoW.
Specifically, it currently covers seed rotation and checking PoW solves, but
not yet the priority queue or the control-loop algorithm.

Related spec docs:

* [Onion service proof-of-work: Scheme v1, Equi-X and Blake2b][pow-v1]
* [HS PoW Common protocol][pow-common]

## What this code is intended to do

* We need some code to keep track of seeds for each TP.
* There needs to be some kind of periodic timer to update those seeds.
* The information about the current seed for each TP that we are publishing
  needs to get to the publisher.
* When processing requests, we need to access various pieces of state
  (including some that are fundamentally global per-onion-service) in order to
  filter out invalid PoW solves:
  * The used-nonces data structure
  * The verifier for the correct seed (this implies that access to the
    verifier should be keyed on `SeedHead`, since that's all we have access to
    when processing a request; we don't know the TP)
  * The total effort value, which must somehow be updated upon successful
    requests
* At some point after we have verified the solve, the code that dequeues
  requests in order to send them to the backend that generates responses needs
  to be able to see the effort associated with each request.

## General background

* Seeds are unique to a given time period (TP); that is, two different TPs
  must never have the same seed.
  * This is because the client uses `KP_hs_blinded_id` (which is per-TP) as an
    input to the PoW function, but the only key the client gives us to check
    the solution is the first 4 bytes of the seed. Thus, we must have some way
    of ensuring that we use the correct `KP_hs_blinded_id` when checking the
    PoW solve. We could try multiple blinded IDs if one of them fails, but
    that would make executing a DoS attack easier, and seems like not the best
    solution.
* We choose to have the expiration time of the seeds be the same for all of
  the TPs, to simplify the logic. However, this could create a linkability
  problem between HsDescs for different time periods for the same service, so
  we should consider whether that is a problem, and fix it if it might be,
  before stabilizing PoW.
* Seeds are rotated every "update period" (115 to 120 minutes long).
* When there's a new TP, we generate a new seed for that TP, but use the same
  expiration time as we do for all the other seeds.
* We need to ensure that, across the set of current and previous seeds for all
  TPs that we still have active descriptors for, the seed heads (the first 4
  bytes of each seed) are all unique. This ensures that when we receive an
  introduction request, we can tell which seed and TP are being used. A sketch
  of this rotation logic follows this list.
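To make the rotation and uniqueness requirements concrete, here is a minimal,
illustrative sketch. The `Seed`, `SeedHead`, and `TimePeriod` types here are
simplified stand-ins (the real code would use `ArrayVec<Seed, 2>` and proper
types), and `random_seed` is a hypothetical helper using the `rand` crate;
none of this is settled API.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins, for illustration only.
type TimePeriod = u64;
type SeedHead = [u8; 4];

#[derive(Clone)]
struct Seed([u8; 32]);

impl Seed {
    fn head(&self) -> SeedHead {
        self.0[..4].try_into().expect("seed is at least 4 bytes")
    }
}

/// Rotate in a fresh seed for `tp`, ensuring its head collides with no seed
/// head still live in any TP, so that a request's `SeedHead` identifies both
/// the seed and the TP unambiguously.
fn rotate_seed(seeds: &mut HashMap<TimePeriod, Vec<Seed>>, tp: TimePeriod) {
    // Collect every seed head that is still live (the current or previous
    // seed of any TP we still have an active descriptor for).
    let in_use: HashSet<SeedHead> = seeds
        .values()
        .flat_map(|s| s.iter().map(Seed::head))
        .collect();

    // Retry until we draw a seed with a unique head. With 4-byte heads and
    // only a handful of live seeds, a collision is vanishingly rare.
    let new_seed = loop {
        let candidate = random_seed();
        if !in_use.contains(&candidate.head()) {
            break candidate;
        }
    };

    // Keep at most two seeds per TP: the previous one and the new one.
    let entry = seeds.entry(tp).or_default();
    if entry.len() == 2 {
        entry.remove(0);
    }
    entry.push(new_seed);
}

// Hypothetical helper: the real implementation would use a CSPRNG.
fn random_seed() -> Seed {
    Seed(rand::random())
}
```

The real rotation code would also rebuild the verifier map and drop the
corresponding replay logs on each rotation; that bookkeeping is omitted here.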
## Proposed PoW module

I am describing a version of this where the `PowManager` doesn't have a
separate update loop, and instead piggybacks on the update loop in
`IptManager`. However, the version where `PowManager` has its own update loop
is just a small change, if we decide that's better.

This `pow` module lives in the `tor-hsservice` crate:

```rust
pub(crate) struct PowManager(RwLock<State>);

struct State {
    seeds: HashMap<TimePeriod, ArrayVec<Seed, 2>>,

    verifiers: HashMap<SeedHead, Verifier>,

    next_expiration_time: SystemTime,

    used_nonces: HashMap<SeedHead, Mutex<ReplayLog<replay::ProofOfWork>>>,

    total_effort: Mutex<Effort>,

    // This will also have:
    // * REND_HANDLED
    // * HAD_QUEUE
    // * MAX_TRIMMED_EFFORT
    // as per https://spec.torproject.org/hspow-spec/common-protocol.html#service-effort-periodic
}

// A type that can be serialized / deserialized to disk.
// We may just want to implement the serde traits on the PowManager directly
// instead, if that's easy.
// This will be a member of StateRecord in tor-hsservice/src/ipt_mgr/persist.rs.
pub(crate) struct PowManagerRecord {
    // seeds
    // expiration time
    // total effort
    // used_nonces are not in here, but will in the future be persisted via ReplayLog
}

// Both the IptManager and the Reactor will have an Arc<Mutex<PowManager>>.
impl PowManager {
    // Called from IptManager::new.
    // The sender/receiver pair will replace the existing rend_req_tx /
    // rend_req_rx in lib.rs.
    pub(crate) fn new(
        pow_replay_log_dir: InstanceRawSubdir,
    ) -> (Self, mpsc::Sender<RendRequest>, RendQueueReceiver);

    // Both called from tor-hsservice/src/ipt_mgr/persist.rs.
    pub(crate) fn to_record(&self) -> PowManagerRecord;
    // Upon loading from disk, we will delete stale replay logs from
    // replay_log_dir, using read_directory / parse_log_leafname / remove_file.
    pub(crate) fn from_record(
        record: PowManagerRecord,
        replay_log_dir: InstanceRawSubdir,
    ) -> Self;

    // Called from IptManager::idempotently_progress_things_now.
    // (It would be called from our own update loop instead, if we took that
    // path.) This will also handle deleting old ReplayLog files.
    pub(crate) fn rotate_seeds_if_expiring(&mut self, now: TrackingNow);

    // Called from the publisher's Reactor::upload_for_time_period.
    pub(crate) fn get_pow_params(&self, time_period: TimePeriod) -> PowParams;

    // This is called from RendQueueSender.
    fn check_solve(&self, solve: ProofOfWorkV1) -> Result<(), PowSolveError>;
}

pub(crate) enum PowSolveError {
    InvalidSeedHead,
    NonceAlreadyUsed,
    InvalidSolve,
    // Maybe some more detailed variants,
    // or maybe we don't want to provide detail.
}

// If PoW is not compiled in, this will be a dummy that is just an mpsc::Receiver.
pub(crate) struct RendQueueReceiver {
    rx: mpsc::Receiver<RendRequest>,

    // This may be something different / more complicated.
    queue: BinaryHeap<RendRequest>,

    pow_manager: Arc<PowManager>,
}

impl Stream for RendQueueReceiver {
    type Item = RendRequest;
    // ...
}
```
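To make the intended dequeuing behavior concrete, here is an illustrative
expansion of the `Stream` stub at the end of the block above: drain whatever
is waiting on the channel, verify solves, and yield the highest-effort request
first. The `pow_solve()` accessor and an `Ord` impl on `RendRequest` that
compares by verified effort are assumptions made up for this sketch, not
existing API.

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::{Stream, StreamExt};

impl Stream for RendQueueReceiver {
    type Item = RendRequest;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<RendRequest>> {
        let this = self.get_mut();
        let mut channel_closed = false;

        // Drain everything currently waiting on the channel into the heap,
        // verifying PoW solves as we go and dropping invalid requests.
        loop {
            match this.rx.poll_next_unpin(cx) {
                Poll::Ready(Some(req)) => {
                    // Hypothetical accessor: the PoW solve attached to the
                    // request, if the client supplied one.
                    if let Some(solve) = req.pow_solve() {
                        if this.pow_manager.check_solve(solve).is_err() {
                            continue; // Invalid solve: drop the request.
                        }
                    }
                    this.queue.push(req);
                }
                Poll::Ready(None) => {
                    channel_closed = true;
                    break;
                }
                Poll::Pending => break,
            }
        }

        // BinaryHeap is a max-heap, so if RendRequest's Ord compares by
        // verified effort, pop() yields the highest-effort request.
        match this.queue.pop() {
            Some(req) => Poll::Ready(Some(req)),
            None if channel_closed => Poll::Ready(None),
            None => Poll::Pending,
        }
    }
}
```

Effort accounting is omitted here; as described above, the dequeuing code also
needs to see the effort of each request it yields, and `total_effort` must be
updated on successful requests.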
## Threads and locking

We expect to have one PoW verification thread per IPT (or possibly more, on
powerful machines).

From the PoW verification threads, the following tasks must be performed:

* Checking PoW solves (requires a `&Verifier` for a given `SeedHead`)
* Updating used nonces (requires a `&mut ReplayLog`)
* Updating the total effort (requires `&mut Effort`)

From the update loop, the following tasks must be performed:

* Updating the list of which `Verifier` instances are valid (requires `&mut`
  on the data structure containing the `Verifier`s)
* Updating the list of which `ReplayLog`s are valid (requires `&mut` on the
  data structure containing the `ReplayLog`s)
* Resetting the total effort (requires `&mut Effort`)

Updating the `Effort`, or any particular `ReplayLog`, is a fundamentally
synchronous operation that must block. However, checking a PoW solve only
needs to block while the list of valid `Verifier`s is being updated.

The proposed design accomplishes this as follows:

* Everything that needs to be updated upon seed rotation is behind a `RwLock`.
  This allows concurrency between the verifier threads, while still letting us
  block to update the list of which `Verifier`s and `ReplayLog`s are valid.
* Each `ReplayLog` is in its own `Mutex`, allowing verification threads to
  exclusively update it when needed, without blocking unnecessarily.
* `Effort` is in a `Mutex` of its own, allowing verification threads to
  exclusively update it without blocking unnecessarily.

Verification threads should hold at most one `Mutex` lock at a time. That will
likely look like taking the lock on a `ReplayLog`, updating it, and releasing
it; then taking the lock on `Effort`, updating it, and releasing it.

Essentially, this design is:

* Put everything behind a `RwLock`.
* Within that lock, put the things that need to be updated by verification
  threads behind `Mutex`es.

### Alternatives

* Put the entire `PowManager` in a single `Mutex`.
  * This would force PoW verification to be limited by a global lock (which
    would be unacceptably slow), unless we implemented some separate mechanism
    for each verification thread to only lock the mutex to obtain an
    `Arc<Verifier>`, which would in turn force us to implement some other
    method to invalidate those on expiry. While that's definitely doable (for
    instance, by using a `TimerangeBound<Verifier>`), it seems more complex
    than using a `RwLock`.
  * This would also force updating the `ReplayLog` and updating the `Effort`
    to block each other, which, while only a small slowdown, would be better
    avoided.
* Put the entire `HashMap<SeedHead, ReplayLog>` in a `Mutex`.
  * This would probably work fine; it just reduces concurrency without any
    benefit.

## Making `ReplayLog` generic

Make `ReplayLog` generic over types that implement a `ReplayLogType` trait:

```rust
trait ReplayLogType {
    type Name;    // e.g. IptLocalId, Seed
    type Message; // e.g. Introduce2, Nonce

    fn format_filename(name: &Self::Name) -> String;
    // "Hash" here stands for ReplayLog's internal fixed-size hash output.
    fn hash_message(message: &Self::Message) -> Hash;
    fn parse_log_leafname(leaf: &OsStr) -> Result<(Self::Name, &str), Cow<'static, str>>;
}

struct IptReplayLog;
struct ProofOfWorkReplayLog;
```

Replace `IptLocalId` and `Introduce2` in `ReplayLog<T>` with `T::Name` and
`T::Message`.
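For illustration, here is one way `ProofOfWorkReplayLog` might implement the
trait. The filename scheme, the `Seed::as_bytes` / `Seed::try_from` and
`Nonce::as_bytes` accessors, the `hash_for_replay_log` helper, and the meaning
of the returned `&str` are all assumptions made up for this sketch (it also
assumes the `hex` crate); none of it is settled API.

```rust
use std::borrow::Cow;
use std::ffi::OsStr;

// Illustrative only: one possible impl, keeping one log file per seed.
impl ReplayLogType for ProofOfWorkReplayLog {
    type Name = Seed;
    type Message = Nonce;

    fn format_filename(name: &Seed) -> String {
        // Hypothetical naming scheme for PoW replay log files.
        format!("pow_{}.bin", hex::encode(name.as_bytes()))
    }

    fn hash_message(message: &Nonce) -> Hash {
        // Reuse whatever hashing ReplayLog already applies to Introduce2
        // messages; nonces are small, so this is cheap.
        hash_for_replay_log(message.as_bytes())
    }

    fn parse_log_leafname(leaf: &OsStr) -> Result<(Seed, &str), Cow<'static, str>> {
        let s = leaf.to_str().ok_or(Cow::from("not valid UTF-8"))?;
        let hex = s
            .strip_prefix("pow_")
            .and_then(|s| s.strip_suffix(".bin"))
            .ok_or(Cow::from("not a PoW replay log filename"))?;
        let bytes = hex::decode(hex).map_err(|_| Cow::from("invalid hex"))?;
        let seed = Seed::try_from(bytes.as_slice()).map_err(|_| Cow::from("wrong seed length"))?;
        // We assume the second element is the filename suffix.
        Ok((seed, ".bin"))
    }
}
```

Since `format_filename` and `parse_log_leafname` are inverses, a round-trip
property test would be cheap to add.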
It would also be good to add a method to `ReplayLog` to delete old log files:

```rust
// Not sure what the error type should be.
fn cleanup_log_files(&self, exempt: Vec<T::Name>) -> Result<(), _>;
```

That way, the `PowManager` code does not need to know any details about the
log directory.

[pow-v1]: https://spec.torproject.org/hspow-spec/v1-equix.html
[pow-common]: https://spec.torproject.org/hspow-spec/common-protocol.html