# Let's meditate on FFI and RPC designs.

> These are Nick's thoughts, as of 3 February 2023.
> They are not an official plan.

People want to call Arti in several ways.

The calling language might be...
   1. Sync Rust
   2. Async Rust
   3. Something else

The calling environment might be...
   1. In-process
   2. Out-of-process
   3. Via some mechanism that lets the application ignore whether it is in-process or out-of-process.

Premise: we would like to define as few APIs as possible.
But if we aren't careful, we'll wind up with 9 different APIs,
one for each of the `3×3` combinations above.
(C Tor is basically in this situation.)

So to that end,
let's try to define a single Rust API
that supports sync and async uses,
and which can be implemented remotely or in-process.
Let's also define a single C FFI API
that closely mirrors that Rust API,
and which can be implemented remotely or in-process.

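
To make the "single Rust API" idea a bit more concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption made up for illustration: the `UniformApi` trait, the `bootstrap_percent` operation, the `InProcess` and `Remote` types, and the `canned_transport` stand-in are not part of Arti. The trait returns futures, so async callers can await them, and a tiny `block_on` lets sync callers use the same trait.

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Boxed future, so the trait stays object-safe without extra crates.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical uniform API: every operation returns a future.
pub trait UniformApi {
    /// Bootstrap progress as a percentage (an invented example operation).
    fn bootstrap_percent(&self) -> BoxFuture<'_, Result<u8, String>>;
}

/// Hypothetical in-process backend; a real one would wrap a `TorClient`.
pub struct InProcess {
    pub percent: u8,
}

impl UniformApi for InProcess {
    fn bootstrap_percent(&self) -> BoxFuture<'_, Result<u8, String>> {
        let result: Result<u8, String> = Ok(self.percent);
        Box::pin(ready(result))
    }
}

/// Hypothetical remote backend; `round_trip` stands in for an RPC exchange.
pub struct Remote {
    pub round_trip: fn(&str) -> Result<String, String>,
}

impl UniformApi for Remote {
    fn bootstrap_percent(&self) -> BoxFuture<'_, Result<u8, String>> {
        // A real client would await network I/O here instead of calling eagerly.
        let result = (self.round_trip)("bootstrap_percent")
            .and_then(|reply| reply.trim().parse::<u8>().map_err(|e| e.to_string()));
        Box::pin(ready(result))
    }
}

/// Canned transport used only for illustration.
pub fn canned_transport(_method: &str) -> Result<String, String> {
    Ok("42\n".to_string())
}

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny executor: drives a future to completion on the current thread.
pub fn block_on<T>(mut fut: BoxFuture<'_, T>) -> T {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Sync entry point layered on the same async trait.
pub fn bootstrap_percent_blocking(api: &dyn UniformApi) -> Result<u8, String> {
    block_on(api.bootstrap_percent())
}
```

With this shape, `bootstrap_percent_blocking(&InProcess { percent: 100 })` and `bootstrap_percent_blocking(&Remote { round_trip: canned_transport })` go through the same trait, and the caller cannot tell which backend answered.
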
# Ooh look, a diagram!

```mermaid
   graph TD

   classDef UserCode fill:#008,color:#eee,stroke-width:3px,stroke-dasharray: 5 5;
   classDef API1 fill:#373,color:#eee;
   classDef API2 fill:#737,color:#eee;

   subgraph Legend
      UserCode[User Code]
      API1[Uniform Rust API]
      API2[Uniform C API]
      UserCode:::UserCode;
      API1:::API1;
      API2:::API2;
   end

   subgraph In-Process
     RPCServer(RPC Server)
     C_FFI(C FFI)
     Facade[Rust API]

     RPCServer --> Facade & TorClient
     C_FFI --> Facade
     Facade --> TorClient

     SyncRust:::UserCode -.-> Facade
     AsyncRust:::UserCode -.-> Facade;
     AsyncRust -.->|?| TorClient;

     Embedding:::UserCode -.-> C_FFI
   end

   ManualRPC[Manual RPC]
   ManualRPC:::UserCode -.-> RPCServer;

   subgraph Out-Of-Process
      RPCClient(RPC Client)
      RPC_FFI(C FFI)

      RPCClient ----> |Json over TLS?| RPCServer
      RPC_FFI --> RPCClient
      Embedding2[Embedding]
      SyncRust2[SyncRust]
      AsyncRust2[AsyncRust]
      Embedding2:::UserCode -.-> RPC_FFI
      SyncRust2:::UserCode -.-> RPCClient
      AsyncRust2:::UserCode -.-> RPCClient
   end

   class Facade API1;
   class RPCClient API1;
   class C_FFI API2;
   class RPC_FFI API2;

```

# Challenges

## 1: RPC is never invisible

Notoriously, RPC has failure modes that make it annoying
to use the same API for in-process and out-of-process calls.
Yet that is exactly what we're proposing to do here!

Can this possibly be wise?

> My thoughts: I think it's not too bad. Inasmuch as Tor is a protocol for being a proxy, callers already have to deal with the possibility that the network is down, that their operations will be slow or cancelled, and so on.  So having one more possible cause for slowness and/or cancellation should be No Big Deal.  Maybe?

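
One way to picture the "one more possible cause" argument is a single error type in which an RPC transport failure is just another retryable variant. The sketch below is purely illustrative: `CallError`, its variants, and `with_retries` are invented names, not anything Arti defines. It shows a retry helper that a caller might already need for Tor-level failures, and which absorbs RPC failures without a separate code path.

```rust
use std::time::Duration;

/// Hypothetical error type shared by in-process and out-of-process callers.
#[derive(Debug, Clone, PartialEq)]
pub enum CallError {
    /// The Tor network is unreachable or not bootstrapped.
    NetworkDown,
    /// The operation took longer than the caller allowed.
    TimedOut(Duration),
    /// The operation was cancelled and should not be retried.
    Cancelled,
    /// Only an out-of-process backend produces this: the link to Arti failed.
    RpcTransport(String),
}

impl CallError {
    /// Transport failures are grouped with the failures callers already handle.
    pub fn is_retryable(&self) -> bool {
        match self {
            CallError::NetworkDown | CallError::TimedOut(_) | CallError::RpcTransport(_) => true,
            CallError::Cancelled => false,
        }
    }
}

/// Runs `op` up to `max_attempts` times, stopping early on success or on a
/// non-retryable error. With zero attempts it reports `Cancelled`.
pub fn with_retries<T, F>(max_attempts: u32, mut op: F) -> Result<T, CallError>
where
    F: FnMut() -> Result<T, CallError>,
{
    let mut last = CallError::Cancelled;
    for _ in 0..max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() => last = err,
            Err(err) => return Err(err),
        }
    }
    Err(last)
}
```

Whether a transport failure really deserves the same treatment as a network failure is exactly the open question above; the sketch only shows that the caller-side code need not grow much if we decide that it does.
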
## 2: Wow, that's a lot of ways to call Arti!

If we want to add a new feature to our API, we will in the worst case need to add that feature to:
  * The abstract Rust API definition
  * The abstract C API definition
  * The "Rust API" module
  * The RPC Server
  * The RPC Client
  * Both "C FFI" wrappers

So it seems:
  * We need some way to automate all of this code generation as much as possible.
  * We need some way to keep the API small.

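
As a rough illustration of the automation point, a declarative macro can take one list of operations and generate several of the pieces that would otherwise be written by hand. This is only a sketch under invented names (`define_api!`, `Method`, and the two example operations are hypothetical); a real generator would also have to emit argument types, C declarations, and documentation.

```rust
/// Hypothetical macro: one declaration list feeds every layer that needs
/// to know which methods exist and what they are called on the wire.
macro_rules! define_api {
    ($( $variant:ident => $wire:literal ),+ $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Method { $( $variant ),+ }

        impl Method {
            /// Every method, e.g. for generating documentation or C headers.
            pub const ALL: &'static [Method] = &[ $( Method::$variant ),+ ];

            /// Name used by the RPC client when sending a request.
            pub fn wire_name(self) -> &'static str {
                match self { $( Method::$variant => $wire ),+ }
            }

            /// Reverse lookup, as an RPC server would do on an incoming request.
            pub fn from_wire(name: &str) -> Option<Method> {
                match name {
                    $( $wire => Some(Method::$variant), )+
                    _ => None,
                }
            }
        }
    };
}

// Invented example operations; adding a line here updates all three helpers.
define_api! {
    GetBootstrapStatus => "get_bootstrap_status",
    SetOption => "set_option",
}
```
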
That second point, keeping the API small, leads to our next challenge...

## 3: A big API is harder to export; a small API is harder to make type-safe

Up till now, we've been adding stuff to the `arti-client` crate without too much worry.  But some parts of our API (notably, our configuration and configuration builder logic) are pretty huge, inasmuch as they try to enable compile-time checking of configuration options and types.

As an alternative, we could introduce points of detail-hiding, like providing a "`set_option(name:&str, val:&str)`" API.  The more of our API we can make "string-ly typed" in this way, the smaller it would be, but the harder it would be to ensure good compile-time type checking.

Perhaps we can come up with some way to make things "jsonly-typed" instead?  It still seems like a regrettable kludge.

> My inclination: pursue the "jsonly-typed" option.  Look for ways to expose things as object trees.  Look for abstractions that let us minimize our API surface.

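
Here is one possible reading of "jsonly-typed", sketched with a hand-rolled value tree so that it needs no extra crates (a real design might well use an existing JSON value type instead). All names are hypothetical, including `Value`, `Kind`, `Options`, and the example option names. The point is that a single `set_option` covers every option, and type errors move from compile time to run time.

```rust
use std::collections::BTreeMap;

/// A small JSON-like value tree (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// The shape an option is expected to have.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Bool,
    Int,
    Str,
    List,
}

impl Value {
    pub fn kind(&self) -> Kind {
        match self {
            Value::Bool(_) => Kind::Bool,
            Value::Int(_) => Kind::Int,
            Value::Str(_) => Kind::Str,
            Value::List(_) => Kind::List,
        }
    }
}

/// Hypothetical option store: a schema of known names plus current values.
pub struct Options {
    schema: BTreeMap<&'static str, Kind>,
    values: BTreeMap<String, Value>,
}

impl Options {
    pub fn new(schema: &[(&'static str, Kind)]) -> Self {
        Options {
            schema: schema.iter().copied().collect(),
            values: BTreeMap::new(),
        }
    }

    /// One entry point for every option; mismatches are caught at run time.
    pub fn set_option(&mut self, name: &str, val: Value) -> Result<(), String> {
        let expected = match self.schema.get(name) {
            Some(kind) => *kind,
            None => return Err(format!("unknown option {name}")),
        };
        if val.kind() != expected {
            return Err(format!("{name}: expected {expected:?}, got {:?}", val.kind()));
        }
        self.values.insert(name.to_string(), val);
        Ok(())
    }

    pub fn get_option(&self, name: &str) -> Option<&Value> {
        self.values.get(name)
    }
}
```

Compared with the purely string-based `set_option(name:&str, val:&str)`, passing `Value::Str("9150".into())` for an integer option is rejected here, but only when the call runs, which is the trade-off described above.
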
## 4: Whither `TorClient`?

If the preferred way to use Arti in Rust is now via a uniform Rust API, then should we deprecate use of the `TorClient` API?  Or should we deprecate those parts of it that can't be made part of the "uniform Rust API"?

Or should we leave it sitting around permanently as yet another API surface, for those who only want to use Arti embedded from Rust?

> My inclination:  Massage TorClient so that most of it is used via the uniform Rust API; allow it to have additional features that are not in that uniform Rust API; say that using those features means you can only be in-process.

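
One way that inclination could look in Rust is a base trait for the uniform operations plus an extension trait that only the in-process type implements. The names below (`UniformOps`, `InProcessOps`, `LocalClient`, `RemoteClient`, and both operations) are invented for this sketch; `LocalClient` merely stands in for something like `TorClient`.

```rust
/// Operations available on every backend (hypothetical).
pub trait UniformOps {
    fn describe(&self) -> String;
}

/// Extra operations that only make sense in-process (hypothetical).
pub trait InProcessOps: UniformOps {
    fn open_circuit_count(&self) -> usize;
}

/// Stand-in for an in-process client.
pub struct LocalClient {
    pub circuits: usize,
}

/// Stand-in for an out-of-process client.
pub struct RemoteClient;

impl UniformOps for LocalClient {
    fn describe(&self) -> String {
        "in-process".to_string()
    }
}

impl InProcessOps for LocalClient {
    fn open_circuit_count(&self) -> usize {
        self.circuits
    }
}

// `RemoteClient` gets only the uniform operations.
impl UniformOps for RemoteClient {
    fn describe(&self) -> String {
        "out-of-process".to_string()
    }
}

/// Works with any backend.
pub fn portable_report(api: &dyn UniformOps) -> String {
    format!("backend: {}", api.describe())
}

/// Compiles only for backends that opted into the in-process extras.
pub fn embedded_report<A: InProcessOps>(api: &A) -> String {
    format!("{} with {} circuits", api.describe(), api.open_circuit_count())
}
```

Calling `embedded_report(&RemoteClient)` would fail to compile, so "using those features means you can only be in-process" becomes something the type checker enforces.
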
## 5: Oh yeah, that RPC protocol...

We're putting a lot of constraints into this system, in a way that has implications for our RPC design.  The more we say here, the less likely it is that "manual RPC" users will be able to use an off-the-shelf RPC tool to call Arti.

Do we care?  Do our users?

# How do you even prototype something like this?

I think I'd like to start by designing a few key operations in it, and looking at what they would imply for the API at all layers.  From that, we can probably figure out more about the general shape of the design, and which spots make more sense than others.

In parallel, I'd look for ways to have all of our implementations share as much code as possible.  For instance, could we use a single message-handling implementation under the hood, so that our in-process and out-of-process APIs differed only in whether they had to touch the network?  Could our C FFI layer be written to wrap a Rust trait that provides our API, so that we only need to implement that once?

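
For the last question, a C FFI layer that wraps a Rust trait might look roughly like the sketch below. All names are hypothetical (`Backend`, `EchoBackend`, `ArtiHandle`, and the `arti_*` functions are not an existing Arti interface), and the string-in, string-out calling convention is just an assumption. The FFI functions only know about `Box<dyn Backend>`, so an in-process backend and an RPC-client backend could share them unchanged.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical trait that each backend implements once.
pub trait Backend {
    fn handle(&self, request: &str) -> String;
}

/// Stand-in backend; a real one would run the request or forward it over RPC.
pub struct EchoBackend;

impl Backend for EchoBackend {
    fn handle(&self, request: &str) -> String {
        format!("echo: {request}")
    }
}

/// Opaque handle given to C callers.
pub struct ArtiHandle {
    backend: Box<dyn Backend>,
}

// A real export would also mark these functions `#[no_mangle]`
// (spelled `#[unsafe(no_mangle)]` in the 2024 edition).

/// Creates a handle; release it with `arti_handle_free`.
pub extern "C" fn arti_handle_new() -> *mut ArtiHandle {
    Box::into_raw(Box::new(ArtiHandle { backend: Box::new(EchoBackend) }))
}

/// Safety: `handle` must come from `arti_handle_new`, and `request` must be
/// a valid NUL-terminated string. Returns NULL on any error.
pub unsafe extern "C" fn arti_call(
    handle: *const ArtiHandle,
    request: *const c_char,
) -> *mut c_char {
    if handle.is_null() || request.is_null() {
        return std::ptr::null_mut();
    }
    let (handle, request) = unsafe { (&*handle, CStr::from_ptr(request)) };
    let reply = match request.to_str() {
        Ok(text) => handle.backend.handle(text),
        Err(_) => return std::ptr::null_mut(),
    };
    match CString::new(reply) {
        Ok(reply) => reply.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Safety: `s` must be NULL or a string returned by `arti_call`.
pub unsafe extern "C" fn arti_string_free(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

/// Safety: `handle` must be NULL or a pointer from `arti_handle_new`.
pub unsafe extern "C" fn arti_handle_free(handle: *mut ArtiHandle) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```
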
I'd also try to think about ways to specify and document this API so that we don't have to define every piece of functionality five or six times.