# Service-side PoW

This document describes the implementation plan for service-side PoW.
Specifically, it currently covers seed rotation and checking PoW solves, but
not yet the priority queue or the control-loop algorithm.

Related spec docs:

* [Onion service proof-of-work: Scheme v1, Equi-X and Blake2b][pow-v1]
* [HS PoW Common protocol][pow-common]

## What this code is intended to do

* We need some code to keep track of seeds for each TP.
* There needs to be some kind of periodic timer to update those seeds.
* The information about the current seed for each TP that we are publishing
  needs to get to the publisher.
* When processing requests, we need to access various pieces of state
  (including some that are fundamentally global per-onion-service) in order to
  filter out invalid PoW solves:
  * The used-nonces data structure
  * The verifier for the correct seed (this implies that access to the
    verifier should be keyed on `SeedHead`, since that's all we have access to
    when processing a request; we don't know the TP)
  * The total effort value, which must somehow be updated upon successful
    requests
* At some point after we have verified the solve, the code that dequeues
  requests in order to send them to the backend that generates responses needs
  to be able to see the effort associated with each request.

## General background

* Seeds are unique to a given time period (TP); that is, two different TPs
  must never have the same seed.
  * This is because the client uses `KP_hs_blinded_id` (which is per-TP) as an
    input to the PoW function, but the only key the client gives us to check
    the solution is the first 4 bytes of the seed. Thus, we must have some way
    of ensuring that we use the correct `KP_hs_blinded_id` when checking the
    PoW solve. We could try multiple blinded IDs if one of them fails, but
    that would make executing a DoS attack easier, and seems like not the best
    solution.
* We choose to have the expiration time of the seeds be the same for all of
  the TPs, to simplify the logic. However, this could create a linkability
  problem between HsDescs for different time periods for the same service, so
  we should consider whether that is a problem, and fix it if it might be,
  before stabilizing PoW.
* Seeds are rotated every "update period" (115 to 120 minutes long).
* When there's a new TP, we generate a new seed for that TP, but use the same
  expiration time as we do for all the other seeds.
* We need to ensure that, across the set of current and previous seeds for all
  TPs that we still have active descriptors for, the seed heads (the first 4
  bytes of each seed) are all unique. This ensures that when we receive an
  introduction request, we can tell which seed and TP are being used. A sketch
  of this rotation logic follows this list.
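To make the rotation and uniqueness requirements concrete, here is a minimal,
illustrative sketch. The `Seed`, `SeedHead`, and `TimePeriod` types here are
simplified stand-ins (the real code would use `ArrayVec<Seed, 2>` and proper
types), and `random_seed` is a hypothetical helper using the `rand` crate;
none of this is settled API.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins, for illustration only.
type TimePeriod = u64;
type SeedHead = [u8; 4];

#[derive(Clone)]
struct Seed([u8; 32]);

impl Seed {
    fn head(&self) -> SeedHead {
        self.0[..4].try_into().expect("seed is at least 4 bytes")
    }
}

/// Rotate in a fresh seed for `tp`, ensuring its head collides with no seed
/// head still live in any TP, so that a request's `SeedHead` identifies both
/// the seed and the TP unambiguously.
fn rotate_seed(seeds: &mut HashMap<TimePeriod, Vec<Seed>>, tp: TimePeriod) {
    // Collect every seed head that is still live (the current or previous
    // seed of any TP we still have an active descriptor for).
    let in_use: HashSet<SeedHead> = seeds
        .values()
        .flat_map(|s| s.iter().map(Seed::head))
        .collect();

    // Retry until we draw a seed with a unique head. With 4-byte heads and
    // only a handful of live seeds, a collision is vanishingly rare.
    let new_seed = loop {
        let candidate = random_seed();
        if !in_use.contains(&candidate.head()) {
            break candidate;
        }
    };

    // Keep at most two seeds per TP: the previous one and the new one.
    let entry = seeds.entry(tp).or_default();
    if entry.len() == 2 {
        entry.remove(0);
    }
    entry.push(new_seed);
}

// Hypothetical helper: the real implementation would use a CSPRNG.
fn random_seed() -> Seed {
    Seed(rand::random())
}
```

The real rotation code would also rebuild the verifier map and drop the
corresponding replay logs on each rotation; that bookkeeping is omitted here.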
## Proposed PoW module

I am describing a version of this where the `PowManager` doesn't have a
separate update loop, and instead piggybacks on the update loop in
`IptManager`. However, the version where `PowManager` has its own update loop
is just a small change, if we decide that's better.

This `pow` module lives in the `tor-hsservice` crate:

```rust
pub(crate) struct PowManager(RwLock<State>);

struct State {
    seeds: HashMap<TimePeriod, ArrayVec<Seed, 2>>,

    verifiers: HashMap<SeedHead, Verifier>,

    next_expiration_time: SystemTime,

    used_nonces: HashMap<SeedHead, Mutex<ReplayLog<replay::ProofOfWork>>>,

    total_effort: Mutex<Effort>,

    // This will also have:
    // * REND_HANDLED
    // * HAD_QUEUE
    // * MAX_TRIMMED_EFFORT
    // as per https://spec.torproject.org/hspow-spec/common-protocol.html#service-effort-periodic
}

// A type that can be serialized / deserialized to disk.
// We may just want to implement the serde traits on the PowManager directly
// instead, if that's easy.
// This will be a member of StateRecord in tor-hsservice/src/ipt_mgr/persist.rs.
pub(crate) struct PowManagerRecord {
    // seeds
    // expiration time
    // total effort
    // used_nonces are not in here, but will in the future be persisted via ReplayLog
}

// Both the IptManager and the Reactor will have an Arc<Mutex<PowManager>>.
impl PowManager {
    // Called from IptManager::new.
    // The sender/receiver pair will replace the existing rend_req_tx /
    // rend_req_rx in lib.rs.
    pub(crate) fn new(
        pow_replay_log_dir: InstanceRawSubdir,
    ) -> (Self, mpsc::Sender<RendRequest>, RendQueueReceiver);

    // Both called from tor-hsservice/src/ipt_mgr/persist.rs.
    pub(crate) fn to_record(&self) -> PowManagerRecord;
    // Upon loading from disk, we will delete stale replay logs from
    // replay_log_dir, using read_directory / parse_log_leafname / remove_file.
    pub(crate) fn from_record(
        record: PowManagerRecord,
        replay_log_dir: InstanceRawSubdir,
    ) -> Self;

    // Called from IptManager::idempotently_progress_things_now.
    // (It would be called from our own update loop instead, if we took that
    // path.) This will also handle deleting old ReplayLog files.
    pub(crate) fn rotate_seeds_if_expiring(&mut self, now: TrackingNow);

    // Called from the publisher's Reactor::upload_for_time_period.
    pub(crate) fn get_pow_params(&self, time_period: TimePeriod) -> PowParams;

    // This is called from RendQueueSender.
    fn check_solve(&self, solve: ProofOfWorkV1) -> Result<(), PowSolveError>;
}

pub(crate) enum PowSolveError {
    InvalidSeedHead,
    NonceAlreadyUsed,
    InvalidSolve,
    // Maybe some more detailed variants,
    // or maybe we don't want to provide detail.
}

// If PoW is not compiled in, this will be a dummy that is just an mpsc::Receiver.
pub(crate) struct RendQueueReceiver {
    rx: mpsc::Receiver<RendRequest>,

    // This may be something different / more complicated.
    queue: BinaryHeap<RendRequest>,

    pow_manager: Arc<PowManager>,
}

impl Stream for RendQueueReceiver {
    type Item = RendRequest;
    // ...
}
```
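To make the intended dequeuing behavior concrete, here is an illustrative
expansion of the `Stream` stub at the end of the block above: drain whatever
is waiting on the channel, verify solves, and yield the highest-effort request
first. The `pow_solve()` accessor and an `Ord` impl on `RendRequest` that
compares by verified effort are assumptions made up for this sketch, not
existing API.

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::{Stream, StreamExt};

impl Stream for RendQueueReceiver {
    type Item = RendRequest;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<RendRequest>> {
        let this = self.get_mut();
        let mut channel_closed = false;

        // Drain everything currently waiting on the channel into the heap,
        // verifying PoW solves as we go and dropping invalid requests.
        loop {
            match this.rx.poll_next_unpin(cx) {
                Poll::Ready(Some(req)) => {
                    // Hypothetical accessor: the PoW solve attached to the
                    // request, if the client supplied one.
                    if let Some(solve) = req.pow_solve() {
                        if this.pow_manager.check_solve(solve).is_err() {
                            continue; // Invalid solve: drop the request.
                        }
                    }
                    this.queue.push(req);
                }
                Poll::Ready(None) => {
                    channel_closed = true;
                    break;
                }
                Poll::Pending => break,
            }
        }

        // BinaryHeap is a max-heap, so if RendRequest's Ord compares by
        // verified effort, pop() yields the highest-effort request.
        match this.queue.pop() {
            Some(req) => Poll::Ready(Some(req)),
            None if channel_closed => Poll::Ready(None),
            None => Poll::Pending,
        }
    }
}
```

Effort accounting is omitted here; as described above, the dequeuing code also
needs to see the effort of each request it yields, and `total_effort` must be
updated on successful requests.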
## Threads and locking

We expect to have one PoW verification thread per IPT (or possibly more, on
powerful machines).

From the PoW verification threads, the following tasks must be performed:

* Checking PoW solves (requires a `&Verifier` for a given `SeedHead`)
* Updating used nonces (requires a `&mut ReplayLog`)
* Updating the total effort (requires `&mut Effort`)

From the update loop, the following tasks must be performed:

* Updating the list of which `Verifier` instances are valid (requires `&mut`
  on the data structure containing the `Verifier`s)
* Updating the list of which `ReplayLog`s are valid (requires `&mut` on the
  data structure containing the `ReplayLog`s)
* Resetting the total effort (requires `&mut Effort`)

Updating the `Effort`, or any particular `ReplayLog`, is a fundamentally
synchronous operation that must block. However, checking a PoW solve only
needs to block while the list of valid `Verifier`s is being updated.

The proposed design accomplishes this as follows:

* Everything that needs to be updated upon seed rotation is behind a `RwLock`.
  This allows concurrency between the verifier threads, while still letting us
  block to update the list of which `Verifier`s and `ReplayLog`s are valid.
* Each `ReplayLog` is in its own `Mutex`, allowing verification threads to
  exclusively update it when needed, without blocking unnecessarily.
* `Effort` is in a `Mutex` of its own, allowing verification threads to
  exclusively update it without blocking unnecessarily.

Verification threads should hold at most one `Mutex` lock at a time. That will
likely look like taking the lock on a `ReplayLog`, updating it, and releasing
it; then taking the lock on `Effort`, updating it, and releasing it.

Essentially, this design is:

* Put everything behind a `RwLock`.
* Within that lock, put the things that need to be updated by verification
  threads behind `Mutex`es.

### Alternatives

* Put the entire `PowManager` in a single `Mutex`.
  * This would force PoW verification to be limited by a global lock (which
    would be unacceptably slow), unless we implemented some separate mechanism
    for each verification thread to only lock the mutex to obtain an
    `Arc<Verifier>`, which would in turn force us to implement some other
    method to invalidate those on expiry. While that's definitely doable (for
    instance, by using a `TimerangeBound<Verifier>`), it seems more complex
    than using a `RwLock`.
  * This would also force updating the `ReplayLog` and updating the `Effort`
    to block each other, which, while only a small slowdown, would be better
    avoided.
* Put the entire `HashMap<SeedHead, ReplayLog>` in a `Mutex`.
  * This would probably work fine; it just reduces concurrency without any
    benefit.

## Making `ReplayLog` generic

Make `ReplayLog` generic over types that implement a `ReplayLogType` trait:

```rust
trait ReplayLogType {
    type Name;    // e.g. IptLocalId, Seed
    type Message; // e.g. Introduce2, Nonce

    fn format_filename(name: &Self::Name) -> String;
    // "Hash" here stands for ReplayLog's internal fixed-size hash output.
    fn hash_message(message: &Self::Message) -> Hash;
    fn parse_log_leafname(leaf: &OsStr) -> Result<(Self::Name, &str), Cow<'static, str>>;
}

struct IptReplayLog;
struct ProofOfWorkReplayLog;
```

Replace `IptLocalId` and `Introduce2` in `ReplayLog<T>` with `T::Name` and
`T::Message`.
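For illustration, here is one way `ProofOfWorkReplayLog` might implement the
trait. The filename scheme, the `Seed::as_bytes` / `Seed::try_from` and
`Nonce::as_bytes` accessors, the `hash_for_replay_log` helper, and the meaning
of the returned `&str` are all assumptions made up for this sketch (it also
assumes the `hex` crate); none of it is settled API.

```rust
use std::borrow::Cow;
use std::ffi::OsStr;

// Illustrative only: one possible impl, keeping one log file per seed.
impl ReplayLogType for ProofOfWorkReplayLog {
    type Name = Seed;
    type Message = Nonce;

    fn format_filename(name: &Seed) -> String {
        // Hypothetical naming scheme for PoW replay log files.
        format!("pow_{}.bin", hex::encode(name.as_bytes()))
    }

    fn hash_message(message: &Nonce) -> Hash {
        // Reuse whatever hashing ReplayLog already applies to Introduce2
        // messages; nonces are small, so this is cheap.
        hash_for_replay_log(message.as_bytes())
    }

    fn parse_log_leafname(leaf: &OsStr) -> Result<(Seed, &str), Cow<'static, str>> {
        let s = leaf.to_str().ok_or(Cow::from("not valid UTF-8"))?;
        let hex = s
            .strip_prefix("pow_")
            .and_then(|s| s.strip_suffix(".bin"))
            .ok_or(Cow::from("not a PoW replay log filename"))?;
        let bytes = hex::decode(hex).map_err(|_| Cow::from("invalid hex"))?;
        let seed = Seed::try_from(bytes.as_slice()).map_err(|_| Cow::from("wrong seed length"))?;
        // We assume the second element is the filename suffix.
        Ok((seed, ".bin"))
    }
}
```

Since `format_filename` and `parse_log_leafname` are inverses, a round-trip
property test would be cheap to add.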
It would also be good to add a method to `ReplayLog` to delete old log files:

```rust
// Not sure what the error type should be.
fn cleanup_log_files(&self, exempt: Vec<T::Name>) -> Result<(), _>;
```

That way, the `PowManager` code does not need to know any details about the
log directory.

[pow-v1]: https://spec.torproject.org/hspow-spec/v1-equix.html
[pow-common]: https://spec.torproject.org/hspow-spec/common-protocol.html