# Designing bandwidth rate-limits for Arti

> This is a draft from Feb 2024 to explore the space
> of implementing bandwidth limits in Arti.


We need a feature where some of our connections can be throttled
to use no more than a specified amount of bandwidth,
according to a token-bucket scheme.

We want separate upload and download token buckets,
with the same value for each.
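
As a rough illustration of the token-bucket scheme (a sketch, not a proposed implementation):
each bucket holds up to `burst` bytes of credit and refills at `rate` bytes per second.
The names `TokenBucket` and `DirectionalBuckets` are hypothetical,
and the caller passes in the current time so that the behaviour is deterministic.

```rust
use std::time::{Duration, Instant};

/// A minimal token bucket: up to `burst` bytes of credit,
/// refilled at `rate` bytes per second.
#[derive(Debug, Clone)]
pub struct TokenBucket {
    rate: u64,
    burst: u64,
    tokens: u64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Create a bucket that starts full.
    pub fn new(rate: u64, burst: u64, now: Instant) -> Self {
        TokenBucket { rate, burst, tokens: burst, last_refill: now }
    }

    /// Add the credit earned since the last refill, capped at `burst`.
    /// (Simplification: fractional bytes of credit are dropped.)
    fn refill(&mut self, now: Instant) {
        let elapsed: Duration = now.saturating_duration_since(self.last_refill);
        let earned = elapsed.as_nanos().saturating_mul(u128::from(self.rate)) / 1_000_000_000;
        if earned > 0 {
            let total = u128::from(self.tokens).saturating_add(earned);
            self.tokens = total.min(u128::from(self.burst)) as u64;
            self.last_refill = now;
        }
    }

    /// Take up to `want` bytes of credit; returns how many were granted.
    pub fn take(&mut self, want: u64, now: Instant) -> u64 {
        self.refill(now);
        let granted = want.min(self.tokens);
        self.tokens -= granted;
        granted
    }
}

/// Separate upload and download buckets, configured with the same values.
pub struct DirectionalBuckets {
    pub upload: TokenBucket,
    pub download: TokenBucket,
}

impl DirectionalBuckets {
    pub fn new(rate: u64, burst: u64, now: Instant) -> Self {
        DirectionalBuckets {
            upload: TokenBucket::new(rate, burst, now),
            download: TokenBucket::new(rate, burst, now),
        }
    }
}
```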

The rate limit will be applied to channels,
since these are the only recommended
non-localhost connections we support.
Later, we may support limiting other things,
such as individual circuits, or other TCP streams.

The C tor implementation supports other options here;
I believe that we can do without them.

## Candidate design (rejected)

Our tor-rtcompat design lets us define a Runtime
that wraps another Runtime;
we can use this to define a runtime that wraps
the TcpProvider of an underlying runtime
so as to apply a rate limiter wrapper
to each TcpStream we make.

Ideally, we would not write or maintain our own rate-limiting tool.
Instead, we should use an off-the-shelf crate
and submit patches as needed in order to make it more suitable.

If it does turn out that we need to maintain a rate-limiting crate,
we should learn from our experience and from other implementations.

I've looked over a few possibilities,
and it looks like the `async_speed_limit` crate
is the only viable option for now.
There is also an `aio_limited` crate, but it isn't maintained,
and a `stream-limiter` crate, but it isn't async.

I don't especially love `async_speed_limit`:
it does some things well,
but there are areas where I expect it to underperform,
and other areas where I think it's not strictly correct.
Nonetheless, it might be our fastest route to an MVP.

## Candidate design (improved)

We've decided that there's a decent chance
we'll want to apply rate limiting
in other places in our system, later on.
For example, we might want
to limit different outbound users of a channel differently.

Because of that, we should make our rate-limiting logic first-class.

We should implement this logic in a way
that hides our choice of implementation strategy.

Here is a sketch of a possible _minimal_ API.

```
// Here and below, please assume Send, Sync, and Pin have been added
// as appropriate.
//
// (We should not consider these APIs remotely stable until we've
// got tests that compile, since we may need to add these additional
// constraints.)

pub struct Limiter { ... }
pub struct LimitedIo<T> { ... }

// We don't have separate LimitedIoRead and LimitedIoWrite types;
// instead, the one thing does both (subject to trait bounds).
// If the caller has separate read and write objects, it needs two LimitedIo
// objects to wrap them.  But there's still only one copy of the config,
// token buckets, etc.
impl<T: AsyncRead> AsyncRead for LimitedIo<T> {...}
impl<T: AsyncWrite> AsyncWrite for LimitedIo<T> {...}


impl<T> LimitedIo<T> {
    // (for things like Box<TcpThingy> and at least some unboxed !Unpin types)
    pub fn inner_pin_mut(self: Pin<&mut Self>) -> Pin<&mut T> {...}
    pub fn inner(&self) -> &T {...}
    // (presumably this needs `where T: Unpin`?)
    pub fn into_inner(self) -> T {...}

    // (maybe, a function to inspect the current limit status?)
    // (maybe, a function to get the Limit? I hope we don't need that.)
}


pub struct BucketConfig {
    max_bytes_per_sec: u64,
    max_bytes_burst: u64,
}

pub struct LimitConfig {
    upload_limit: BucketConfig,
    download_limit: BucketConfig,
}

impl Limiter {
    /// This might need to take a Runtime, a clock type, or who
    /// knows what else. Maybe we need a generalization of SleepProvider
    /// that provides its own Instant and Duration types.
    ///
    /// Ack; I think what we need is a generalization of a SleepProvider
    /// that defines its own Instant and Duration types.
    pub fn new(cfg: &LimitConfig) -> Result<Arc<Self>> { ... }

    pub fn reconfigure(&self, cfg: &LimitConfig) -> Result<(), ReconfigError> { ... }

    /// All `LimitedIo`s from the same `Limiter` interact,
    /// sharing the limit and drawing from the same quota.
    pub fn limit_async_io<T>(self: &Arc<Self>, io: T) -> LimitedIo<T> { ... }
}
```
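
To illustrate the intended sharing behaviour,
here is a simplified, synchronous stand-in for the API sketched above.
It rests on several assumptions that are not part of the design:
it uses `std::io::Read` in place of `AsyncRead`,
a single fixed download quota that is never refilled,
and a hypothetical `limit_io` method in place of `limit_async_io`.
The only point is that every wrapper made from one `Limiter` draws from the same quota.

```rust
use std::io::{self, Read};
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the sketched `Limiter`: one download quota,
/// never refilled, so that the sharing behaviour is easy to see.
pub struct Limiter {
    download_quota: Mutex<u64>,
}

/// Wraps a reader and charges every byte read to the shared `Limiter`.
pub struct LimitedIo<T> {
    limiter: Arc<Limiter>,
    inner: T,
}

impl Limiter {
    pub fn new(download_quota: u64) -> Arc<Self> {
        Arc::new(Limiter { download_quota: Mutex::new(download_quota) })
    }

    /// Every wrapper made from the same `Limiter` draws on the same quota.
    pub fn limit_io<T>(self: &Arc<Self>, io: T) -> LimitedIo<T> {
        LimitedIo { limiter: Arc::clone(self), inner: io }
    }
}

impl<T: Read> Read for LimitedIo<T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        // Holding the lock across the inner read keeps the sketch short;
        // a real implementation would want to avoid that.
        let mut quota = self.limiter.download_quota.lock().expect("limiter lock poisoned");
        let allowed = (*quota).min(buf.len() as u64) as usize;
        if allowed == 0 {
            // An async version would register a waker and return Pending here.
            return Err(io::ErrorKind::WouldBlock.into());
        }
        let n = self.inner.read(&mut buf[..allowed])?;
        *quota -= n as u64;
        Ok(n)
    }
}
```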


In the future, we might want to have a more complex set of
interrelated limits.  If we do, we can add a "group" or "key"
or "category" field.  We might need to define a Limiter and a
SubLimiter or something too.  I think it's okay to expect some churn
here if the functionality grows in this way.


### Stream/Sink APIs

We may someday want to add the ability to limit Stream/Sink objects
other than AsyncRead and AsyncWrite.  To do so, we would define a cost
function on the items of the Stream/Sinks, to make them
comparable with our other reads/writes.

(This is not something we should build
until we have an application for it.)

```
pub trait LimitedObject {
    /// Return the "cost" in bytes to send or receive this object.
    fn cost(&self) -> u64;
}

#[derive(From, Into)]
pub struct FixedCost<T, const COST: u64>(T);
impl<T, const COST: u64> LimitedObject for FixedCost<T, COST> {
    fn cost(&self) -> u64 { COST }
}

/// Does this count against the upload limit or the download limit?
pub enum Direction {
    Upload, Download
}

pub struct LimitedStream<T> {...}
impl<T> Stream for LimitedStream<T>
  where T: Stream, T::Item: LimitedObject {...}

pub struct LimitedSink<T> {...}
impl<T, Item> Sink<Item> for LimitedSink<T>
  where T: Sink<Item>, Item: LimitedObject {...}

impl Limiter {
    pub fn limit_stream<T>(self: &Arc<Self>, stream: T, d: Direction) -> LimitedStream<T> {...}
    pub fn limit_sink<T>(self: &Arc<Self>, sink: T, d: Direction) -> LimitedSink<T> {...}
}
```
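
As a rough, synchronous illustration of charging items by cost,
the sketch below uses a plain `Iterator` as a stand-in for `Stream`.
`LimitedIter`, `Next`, `try_next`, and `add_quota` are hypothetical names,
and the fixed cost of 512 is an arbitrary value chosen for the example.
An item that costs more than the remaining quota is held back, not dropped,
which is roughly what returning `Pending` would mean in an async version.

```rust
/// An object with a "cost" in bytes.
pub trait LimitedObject {
    fn cost(&self) -> u64;
}

/// Wrapper giving every value the same compile-time cost.
pub struct FixedCost<T, const COST: u64>(pub T);

impl<T, const COST: u64> LimitedObject for FixedCost<T, COST> {
    fn cost(&self) -> u64 {
        COST
    }
}

/// Variable-cost items: a byte vector is charged by its length.
impl LimitedObject for Vec<u8> {
    fn cost(&self) -> u64 {
        self.len() as u64
    }
}

/// Outcome of asking a limited iterator for its next item.
#[derive(Debug, PartialEq)]
pub enum Next<I> {
    Item(I),
    OutOfQuota,
    Done,
}

pub struct LimitedIter<S: Iterator> {
    inner: S,
    held: Option<S::Item>,
    quota: u64,
}

impl<S> LimitedIter<S>
where
    S: Iterator,
    S::Item: LimitedObject,
{
    pub fn new(inner: S, quota: u64) -> Self {
        LimitedIter { inner, held: None, quota }
    }

    /// Stand-in for the bucket refilling over time.
    pub fn add_quota(&mut self, n: u64) {
        self.quota = self.quota.saturating_add(n);
    }

    /// Yield the next item if its cost fits in the remaining quota.
    pub fn try_next(&mut self) -> Next<S::Item> {
        let item = match self.held.take().or_else(|| self.inner.next()) {
            Some(item) => item,
            None => return Next::Done,
        };
        let cost = item.cost();
        if cost > self.quota {
            // Keep the item so that it is delivered once quota is available.
            self.held = Some(item);
            return Next::OutOfQuota;
        }
        self.quota -= cost;
        Next::Item(item)
    }
}
```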


### Lower-level APIs


Conceivably we might want even lower-level APIs
to do things like:
 - checking the current token bucket levels (or equivalent)
 - draining the buckets directly
 - determining how long to pause before a given operation can be attempted
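
For a token-bucket backend, the three operations listed above might look roughly like this.
This is only a sketch under that assumption:
`RawBucket` and its methods are hypothetical names,
refilling is left out,
and a different backend might not be able to answer these questions in the same way.

```rust
use std::time::Duration;

/// Hypothetical low-level view of a single token bucket.
pub struct RawBucket {
    level: u64, // bytes currently available
    rate: u64,  // refill rate in bytes per second (nonzero)
    burst: u64, // maximum level
}

impl RawBucket {
    /// Create a bucket that starts full.
    pub fn new(rate: u64, burst: u64) -> Self {
        assert!(rate > 0, "rate must be nonzero");
        RawBucket { level: burst, rate, burst }
    }

    /// Check the current bucket level.
    pub fn level(&self) -> u64 {
        self.level
    }

    /// Drain up to `qty` bytes directly; returns how many were removed.
    pub fn drain(&mut self, qty: u64) -> u64 {
        let n = qty.min(self.level);
        self.level -= n;
        n
    }

    /// How long to pause before `qty` bytes could be taken at once,
    /// assuming nobody else drains the bucket in the meantime.
    /// Returns `None` if `qty` exceeds the burst size and can never be satisfied.
    pub fn time_until(&self, qty: u64) -> Option<Duration> {
        if qty > self.burst {
            return None;
        }
        let deficit = u128::from(qty.saturating_sub(self.level));
        let rate = u128::from(self.rate);
        // Round up, so that we never wake before the credit is there.
        let nanos = (deficit * 1_000_000_000 + rate - 1) / rate;
        Some(Duration::new(
            (nanos / 1_000_000_000) as u64,
            (nanos % 1_000_000_000) as u32,
        ))
    }
}
```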

We should implement these carefully, if at all:
they are likely to depend a lot on our backend,
and possibly tie us into a particular backend.

We should not build any more here than we need
to implement our LimitedIo types.
If we expose them,
they should be behind an `experimental` feature
until we actually need them for something.

One proposed possibility (from diziet):

```
impl RawLimiter {
    // questions of details:
    //    Q. does it need to take Pin<&mut self> ?
    //    Q. is this a method on LimitedIo ?
    //       (You could have "just raw" with LimitedIo<()>)
    //       seems like it might be possible, but maybe a LimitedIo has a buffer?
    //    Q. separate read and write types?
    //       Probably not
    //    Q. should qty be a Range or something
    //          which causes this to return only when min is fulfilled?
    //          probably not.
    // As I propose here this API is equivalent to AsyncWrite except
    // that it doesn't get involved with `&[u8]` etc.  so it hopefully doesn't impose
    // any different requirements on the innards/algorithms/whatever.
    async fn await_and_consume_quota_for_write(&mut self, qty: usize) -> usize { ... }
}
```


## An alternative: Ask the OS!

When it's viable, an OS-based traffic-shaping approach
will always outperform what we can do in userspace.
At minimum, we should mention this in our documentation,
and link to resources for how to set it up.

Perhaps in the long term it would be neat
to ask the operating system to limit our traffic.
Unfortunately, I can't find any plausible user-space API
for this.
(For example, there isn't an RLIMIT_NETWORK_BW.)

We should probably discuss this in general terms
as a possibility,
once we build relays,
and maybe link to relevant resources.