# Designing bandwidth rate-limits for Arti

> This is a draft from Feb 2024 to explore the space
> of implementing bandwidth limits in Arti.

We need a feature where some of our connections can be throttled
to use no more than a specified amount of bandwidth,
according to a token-bucket scheme.

We want separate upload and download token buckets,
with the same value for each.

The rate limit will be applied to channels,
since these are the only recommended
non-localhost connections we support.
Later, we may support limiting other things,
such as individual circuits, or other TCP streams.

The C tor implementation supports other options here;
I believe that we can do without them.

## Candidate design (rejected)

Our tor-rtcompat design lets us define a Runtime
that wraps another Runtime;
we can use this to define a runtime that wraps
the TcpProvider of an underlying runtime
so as to apply a rate limiter wrapper
to each TcpStream we make.

Ideally, we would not write or maintain our own rate-limiting tool.
Instead, we should use an off-the-shelf crate
and submit patches as needed in order to make it more suitable.

If it does turn out that we need to maintain a rate-limiting crate
we should learn from our experience and from other implementations.

I've looked over a few possibilities,
and it looks like the `async_speed_limit` crate
is the only viable option for now.
There is also an `aio_limited` crate, but it isn't maintained,
and a `stream-limiter` crate, but it isn't async.

I don't especially love `async_speed_limit`:
it does some things well,
but there are areas where I expect it to underperform,
and other areas where I think it's not strictly correct.
Nonetheless, it might be our fastest route to an MVP.
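
For reference, the token-bucket scheme with separate upload and download buckets
(described at the top of this document) could look roughly like the sketch below.
This is only an illustration under stated assumptions:
the names `TokenBucket`, `DirectionalBuckets`, and `take` are hypothetical,
and are not part of Arti or of `async_speed_limit`.
The current time is passed in explicitly so that the logic is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket (illustrative only; not an Arti API).
#[derive(Debug)]
pub struct TokenBucket {
    /// Tokens (bytes) added per second.
    rate: u64,
    /// Maximum number of tokens the bucket can hold.
    burst: u64,
    /// Tokens currently available.
    tokens: u64,
    /// Last time at which tokens were added.
    last_refill: Instant,
}

impl TokenBucket {
    /// Create a bucket that starts full.
    pub fn new(rate: u64, burst: u64, now: Instant) -> Self {
        Self { rate, burst, tokens: burst, last_refill: now }
    }

    /// Add the tokens earned since the last refill, capped at `burst`.
    fn refill(&mut self, now: Instant) {
        let elapsed: Duration = now.saturating_duration_since(self.last_refill);
        let added = elapsed.as_nanos().saturating_mul(u128::from(self.rate)) / 1_000_000_000;
        if added > 0 {
            let total = u128::from(self.tokens).saturating_add(added);
            self.tokens = total.min(u128::from(self.burst)) as u64;
            // Simplification: any fractional token earned so far is dropped.
            self.last_refill = now;
        }
    }

    /// Try to take `want` bytes of quota; return how many were granted.
    pub fn take(&mut self, want: u64, now: Instant) -> u64 {
        self.refill(now);
        let granted = want.min(self.tokens);
        self.tokens -= granted;
        granted
    }
}

/// Separate upload and download buckets, with the same value for each.
#[derive(Debug)]
pub struct DirectionalBuckets {
    pub upload: TokenBucket,
    pub download: TokenBucket,
}

impl DirectionalBuckets {
    pub fn new(rate: u64, burst: u64, now: Instant) -> Self {
        Self {
            upload: TokenBucket::new(rate, burst, now),
            download: TokenBucket::new(rate, burst, now),
        }
    }
}
```

A caller that gets back fewer bytes than it asked for would write (or read) only that many,
and then wait before asking again;
how that wait is scheduled is exactly the part that depends on the runtime.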

## Candidate design (improved)

We've decided that there's a decent chance
we'll want to apply rate limiting
in other places in our system, later on.
For example, we might want
to limit different outbound users of a channel differently.

Because of that, we should make our rate-limiting logic first-class.

We should implement this logic in a way
that hides our choice of implementation strategy.

Here is a sketch of a possible _minimal_ API.

```
// Here and below, please assume Send, Sync, and Pin have been added
// as appropriate.
//
// (We should not consider these APIs remotely stable until we've
// got tests that compile, since we may need to add these additional
// constraints.)

pub struct Limiter { ... }
pub struct LimitedAsyncIo<T> { ... }

// We don't have separate LimitedIoRead and LimitedIoWrite types;
// instead, the one thing does both (subject to trait bounds).
// If the caller has separate read and write objects, it needs two
// LimitedAsyncIo objects to wrap them. But there's still only one copy
// of the config, token buckets, etc.
impl<T: AsyncRead> AsyncRead for LimitedAsyncIo<T> {...}
impl<T: AsyncWrite> AsyncWrite for LimitedAsyncIo<T> {...}

impl<T> LimitedAsyncIo<T> {
    pub fn inner_mut(self: Pin<&mut Self>) -> Pin<&mut T> {...}
    // like Box<TcpThingy> and at least some unboxed !Unpin types
    pub fn inner(&self) -> &T {...}
    pub fn into_inner(self) -> T where T: Unpin /* presumably? */ {...}

    // (maybe, a function to inspect the current limit status?)
    // (maybe, a function to get the Limit? I hope we don't need that.)
}

pub struct TrafficRateLimit {
    max_bytes_per_sec: u64,
    max_bytes_burst: u64,
}

pub struct LimitConfig {
    upload_limit: TrafficRateLimit,
    download_limit: TrafficRateLimit,
}

impl Limiter {
    /// This might need to take a Runtime, a clock type, or who
    /// knows what else.
    /// Maybe we need a generalization of SleepProvider
    /// that provides its own Instant and Duration types.
    ///
    /// Ack; I think what we need is a generalization of a SleepProvider
    /// that defines its own Instant and Duration types.
    pub fn new(cfg: &LimitConfig) -> Result<Arc<Self>> { ... }

    pub fn reconfigure(&self, cfg: &LimitConfig) -> Result<(), ReconfigError> { ... }

    /// All `LimitedAsyncIo`s from the same `Limiter` interact,
    /// sharing the limit and drawing from the same quota.
    pub fn limit_async_io<T>(self: &Arc<Self>, io: T) -> LimitedAsyncIo<T> { ... }
}
```

In the future, we might want to have a more complex set of
interrelated limits. If we do, we can add a "group" or "key"
or "category" field. We might need to define a Limiter and a
SubLimiter or something too. I think it's okay to expect some churn
here if the functionality grows in this way.

### Stream/Sink APIs

We may someday want to add the ability to limit Stream/Sink objects
other than AsyncRead and AsyncWrite. To do so, we would define a cost
function on the members of the Stream/Sinks, to make them
comparable with our other reads and writes.

(This is not something we should build
until we have an application for it.)

```
pub trait LimitedObject {
    /// Return the "cost" in bytes to send or receive this object.
    fn cost(&self) -> u64;
}

#[derive(From,Into)]
pub struct FixedCost<T, const COST: u64>(T);
impl<T, const COST: u64> LimitedObject for FixedCost<T, COST> {
    fn cost(&self) -> u64 { COST }
}

/// Does this count against the upload limit or the download limit?
pub enum Direction {
    Upload, Download
}

pub struct LimitedStream<T> {...}
impl<T> Stream for LimitedStream<T>
    where T: Stream, T::Item: LimitedObject {...}

pub struct LimitedSink<T> {...}
impl<T,Item> Sink<Item> for LimitedSink<T>
    where T: Sink<Item>, Item: LimitedObject {...}

impl Limiter {
    pub fn limit_stream<T>(self: &Arc<Self>, stream: T, d: Direction) -> LimitedStream<T> {...}
    pub fn limit_sink<T>(self: &Arc<Self>, sink: T, d: Direction) -> LimitedSink<T> {...}
}
```

### Lower-level APIs

Conceivably we might want even lower-level APIs
to do things like:
  - checking the current token bucket levels (or equivalent)
  - draining the buckets directly
  - determining how long to pause before a given operation can be attempted

We should implement these carefully, if at all:
they are likely to depend a lot on our backend,
and possibly tie us into a particular backend.

We should not build any more here than we need
to implement our LimitedIo types.
If we expose them,
they should be behind an `experimental` feature
until we actually need them for something.

One proposed possibility (from diziet):

```

impl RawLimiter {
    // questions of details:
    //  Q. does it need to take Pin<&mut self> ?
    //  Q. is this a method on LimitedIo ?
    //     (You could have "just raw" with LimitedIo<()>)
    //     seems like it might be possible, but maybe a LimitedIo has a buffer?
    //  Q. separate read and write types?
    //     Probably not
    //  Q. should qty be a Range or something
    //     which causes this to return only when min is fulfilled?
    //     probably not.
    // As I propose here this API is equivalent to AsyncWrite except
    // that it doesn't get involved with `&[u8]` etc. so it hopefully doesn't impose
    // any different requirements on the innards/algorithms/whatever.
    async fn await_and_consume_quota_for_write(&mut self, qty: usize) -> usize;
}

```

## An alternative: Ask the OS!

When it's viable, an OS-based traffic-shaping approach
will always outperform what we can do in userspace.
At minimum, we should mention this in our documentation,
and link to resources for how to set it up.

Perhaps in the long term it would be neat
to ask the operating system to limit our traffic.
Unfortunately, I can't find any plausible user-space API
for this.
(For example, there isn't an RLIMIT_NETWORK_BW.)

We should probably discuss this in general terms
as a possibility,
once we build relays,
and maybe link to relevant resources.