# P2P Data Sync for Mobile

## Introduction

How do we achieve user-friendly data synchronization in a p2p network for resource-restricted devices? In this paper we survey existing work and propose a solution combining several of the most promising technologies.

In a p2p network you often want to reliably transmit and replicate data across participants. This can be large files, or small messages that users exchange in a private chat. The participants can be humans, machines, or a combination thereof. P2P networks are one type of challenging network with a few special characteristics: nodes often churn a lot (low uptime), and there may be malicious nodes if trust hasn't been established beforehand. There are also considerations such as NAT traversal, which we treat as out of scope for this paper. Doing p2p on mobile and similar resource-restricted devices is extra challenging: churn is even more of an issue, and limited bandwidth, battery, storage, and computational resources all demand attention. Many of these considerations overlap with other types of challenging networks, such as VANETs, DTNs, and mesh networks.

In addition to synchronizing data, you might also want other properties, such as causal consistency for message ordering, privacy preservation through exploding (self-destructing) messages, censorship resistance, and so on.

This paper is aimed at introducing one part of a secure messaging stack.

## Related Work

Data synchronization in p2p networks has existed for a while. The most commonly used modern examples are probably BitTorrent and Git. With p2p cryptocurrencies like Bitcoin and Ethereum, we have seen an explosion in the amount of infrastructure being built to solve various use cases. In the Ethereum ecosystem there are projects like IPFS and Swarm that deal with content storage and distribution, along with their associated protocols.
Outside of the cryptocurrency and Web3 communities we have also seen many projects building decentralized networks and apps recently, with open associated protocols. There are projects like Briar with its Bramble Synchronization protocol, Tribler with its Dispersy Bundle Synchronization, Matrix, Secure Scuttlebutt, and a few others.

It's useful to compare these on a few dimensions to understand how they are similar and how they differ. There are many ways to slice a cake, and we'll focus on a few of the more relevant dimensions. Let's first give a brief overview of the main sources of inspiration.

### Server-based model

In a server-based model things are rather simple. The server is the source of truth, and you do updates synchronously with it: I have a device, I make a request to some server, and I send an update. This makes it easy to guarantee otherwise impossible attributes, such as sequential consistency. It comes at the cost of centralization, with drawbacks such as availability issues (whether deliberate or accidental, caused by the service provider or a third party), censorship, lack of ownership and control of your data, and surveillance. We mention it here only to provide contrast with the other models.

### Git

Git is a decentralized version control system. It was built out of the needs of Linux kernel development, which is inherently distributed, with many sources of truth that need to be reconciled. Essentially a Git repository is a DAG; you create branches and can pull down other remote references and merge them with yours. In case of diverging views, you either need to rebase (changing your own dependency tree) or create a merge commit. While you could use Git for things other than code, that isn't what it is designed for. Commits often operate on the same single piece of data (the codebase), which leads to things like merge conflicts.
So while the commits themselves are immutable, the thing you are manipulating is not, which is what leads to conflicts.

### BitTorrent

BitTorrent has quite a few parts to it, and it can act as a private group-based network or as an open structured network. A common mode of operation is having a tracker and then finding other nodes with which you share chunks of a single static file.

One interesting aspect of BitTorrent is its tit-for-tat incentive mechanism, which encourages pairwise contributions within the game of sharing a single static torrent file. This game doesn't extend outside of sharing a single file, which has led private trackers to use things like seed ratio as a reputation system in a centralized fashion, exposing them to censorship and coercion.

For its structured network, most BitTorrent implementations leverage a DHT of nodes, which allows you to figure out who is sharing a piece of data. It is worth noting that the object being shared is static: a torrent file, once uploaded, doesn't change, and its integrity/consistency is conveyed as a checksum that's usually communicated in the initial (centrally distributed) download.

### Matrix

Matrix is a tool for various forms of group chat, including on mobile. In its current mode Matrix operates a hybrid architecture: it is neither fully p2p nor client-server, instead using a federated or super-peer architecture. As a user, you connect to a homeserver, which then syncs data with other known homeservers. While there are plans to move to a p2p architecture where each node runs its own homeserver, there appear to be some unsolved issues, similar to the ones noted in other protocols.

Matrix distinguishes between syncable messages and ephemeral messages, which allows for lightweight typing notifications and so on.
It also maintains two DAGs, one of which is used for reconciling auth events, to ensure people who are banned from a chat room can't simply rejoin as unbanned. Since it uses a DAG, a chat context can be offline, such as on a submarine, for an extended period of time, then come online and sync up with the rest of the world. Because syncing happens through a homeserver, the user needs an active connection to one.

The super-peer architecture means there's only a simple TCP connection and no high-availability requirement for end users, which leads to a good user experience on mobile devices. However, it isn't pure peer-to-peer, and it suffers many of the same centralizing aspects as other server-based approaches, including security concerns, weaker censorship resistance, and so on.

### Secure Scuttlebutt (SSB)

SSB is a form of group-based private p2p network built on trusted relationships and append-only feeds. These feeds are single-master and offline/local by default. You can follow a feed to get updates from it, and by following a feed, and being followed, you become aware of more feeds. Essentially you see content 1 hop away, fetch and cache content 2 hops away, and are aware of content 3 hops away. Bootstrapping works by connecting to a known Pub server, which follows you back. By being followed back, you become aware of the connections that Pub has, and so on, in concentric circles, like a social graph owned by you. Feeds are unencrypted by default, and messaging works by posting to your own feed a message encrypted with the other person's public key. It's then a client detail how to turn this into a good user interface.

### Bramble Synchronization Protocol (BSP)

BSP is used in Briar, a messaging app for mobile devices. It is based on a friend-to-friend network, so it leverages trusted connections.
It uses a secure transport protocol to transfer data from one device to another securely. While BSP is made for mobile devices, it assumes devices are largely online. This has battery implications, and also makes it difficult in practice to run on iPhone devices.

BSP works by keeping a log for each group context and replicating data asynchronously. Depending on the specific settings, you then offer to send, or directly send, messages to the peers you are sharing with. If they receive a message, they acknowledge it; if they don't, the sender resends. Each peer keeps track of the state of other nodes to know when to resend messages.

Because it makes no special network assumptions, it works just as well over sneakernet (a USB stick).

Two drawbacks of this approach are:
- high churn is problematic, as can be verified with a simulation or a back-of-the-envelope calculation
- large group chats over a multicast network lead to a lot of overhead, due to each message being pairwise

### Dispersy Bundle Synchronization (DBS)

Dispersy is a form of anti-entropy sync that uses Bloom filters to advertise locally available bundles to random nodes in the network. The receiving node tests this Bloom filter against its own locally available bundles and requests missing messages. With each random interaction step, we get closer to a consistent world view among all nodes.

This approach is very bandwidth efficient, as Bloom filters are a very space-efficient way of conveying probabilistic information, and the effects of false positives can be mitigated. One problem with Bloom filters is that with many elements the false positive rate can shoot up quite quickly. To mitigate this, DBS uses a subset-of-elements approach: it cuts up the locally available bundles in clever ways, which keeps the false positive rate and the Bloom filter size the same.
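To see why the subset trick matters, here is a minimal, illustrative sketch (our own toy Bloom filter, not Dispersy's actual parameters or hashing): a fixed-size filter saturates over a large message set, while the same-size filter over a bounded bucket keeps its false-positive rate low.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: m bits, k index functions derived from SHA-256."""
    def __init__(self, m_bits=1024, k=7):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def maybe_contains(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

def false_positive_rate(bf, probes=2000):
    # Probe with items that were never added; any hit is a false positive.
    hits = sum(bf.maybe_contains(f"absent-{i}") for i in range(probes))
    return hits / probes

# One filter over all 10,000 message IDs: saturated, nearly useless.
big = BloomFilter()
for i in range(10_000):
    big.add(f"msg-{i}")

# Same-size filter over a 100-message bucket: the false-positive rate
# stays low, so the advertised payload can stay small and constant.
bucket = BloomFilter()
for i in range(100):
    bucket.add(f"msg-{i}")

print(f"saturated filter FPR: {false_positive_rate(big):.2f}")
print(f"per-bucket FPR:       {false_positive_rate(bucket):.3f}")
```

Because the per-bucket element count is bounded, the filter size can be fixed, which is what keeps the packet small.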
Keeping the payload small matters because, in challenging networks, you want packets small enough to allow for NAT puncturing with UDP; too big a payload leads to fragmentation.

DBS is used in Tribler for their megacaches, which essentially contain various forms of social information (similar peers, liked videos, etc.). It's important to note that this information doesn't have to be 100% up to date to be useful. There are two main downsides to this approach, as far as the author can tell:

- it is probabilistic and doesn't aim to provide guaranteed message ordering (i.e. causal consistency)
- you sync data with all peers, including potentially untrusted ones

### Swarm Feeds and chunks

Swarm is one of the legs of Web3 in the Ethereum world. There are many protocols related to Swarm, such as access control, PSS for routing, incentivization, etc. It's essentially storage for the so-called world computer. Swarm splits data into chunks that are spread over all the nodes in the network. It leverages a form of DHT, a structured network, where data placement is related to the address of the content. This means individual nodes don't have a say in what they store, which is a form of censorship resistance, as no one is really in charge of data storage.

All data in Swarm is immutable by default, so to deal with mutability Swarm has Feeds. A Feed is a mutable resource, similar to a pointer or DNS, and it essentially acts as a replicated log. This is useful for smartphones, since you can replicate your log to the network and thus get close to 100% availability guarantees. There are a few downsides to this approach:

- your log is by default accessible by anyone
- the receiver needs to actively go fetch your log to sync
- in a group context, there might be a lot of feeds to sync

These can be mitigated to various degrees by using things like PSS, topic negotiation, and so on.
There are trade-offs to all of these, though.

## Problem motivation

Why do we want to do p2p sync for mobile phones in the first place? There are three components to that question: the value of decentralization and peer-to-peer, why we'd want to reliably sync data at all, and why mobile phones and other resource-restricted devices.

For decentralization and p2p, there are various reasons, both technical and social/philosophical in nature. Technically, a user-run network can scale with the number of users. Data locality is also improved if you query data that's close to you, similar to distributed CDNs. Throughput is also improved when there are more places to get data from.

Socially and philosophically, there are several ways to think about it. Open and decentralized networks relate to the idea of open standards; compare the longevity of AOL with that of IRC or BitTorrent. One is run by a company and shut down when it stops being profitable; the others live on. Additionally, centralized control of data and infrastructure is increasingly becoming a liability [xx]. In a network where no one is in control, everyone is. It's ultimately a form of democratization, more similar to organic social structures that predate the big Internet companies. This leads to properties such as censorship resistance and coercion resistance, where we limit the impact a third party can have on a voluntary interaction between individuals or a group of people. Examples of this are plentiful in the world of Facebook, YouTube, and Twitter.

As for reliably syncing data at all: it's a requirement for many problem domains, and you don't get it by default in a p2p world, which is extra unreliable, with nodes permissionlessly joining and leaving the network. In some cases you can get away with only ephemeral data, but usually you want some kind of guarantee.
Reliability is a must for a good group chat experience, for example, where messages are expected to arrive in a timely fashion and in some reasonable order. The same is true for messages that represent financial transactions, and so on.

Why mobile phones? We live in an increasingly mobile world, and the devices people use daily are mostly mobile phones. It's important to provide the same, or at least similar, guarantees as more traditional p2p nodes running on a desktop computer or server. The alternative is to rely on gateways, which share many of the drawbacks of centralized control and are prone to censorship, control, and surveillance.

More generally, resource-restricted devices differ in their capabilities. Smartphones are one example, but others include desktops, routers, Raspberry Pis, POS systems, and so on. The number and diversity of such devices are exploding, and it's useful to be able to leverage them for various types of infrastructure. The alternative is to centralize on big cloud providers, which again lends itself to lack of democratization, censorship, etc.

### Minimal Requirements

In terms of minimal requirements, or design goals, for a solution, we propose the following.

1. MUST sync data reliably between devices.
By reliably we mean having the ability to deal with messages being out of order, dropped, duplicated, or delayed.

2. MUST NOT rely on any centralized services for reliability.
By centralized services we mean any single point of failure that isn't one of the endpoint devices.

3. MUST allow for mobile-friendly usage.
By mobile-friendly we mean devices that are resource-restricted, mostly offline, and often changing networks.

4. MAY use helper services in order to be more mobile-friendly.
Examples of helper services are decentralized file storage solutions such as IPFS and Swarm. These help with availability and latency of data for mostly-offline devices.

5.
MUST have the ability to provide causal consistency.
By causal consistency we mean the commonly accepted definition in the distributed systems literature: messages that are causally related achieve a partial ordering.

6. MUST support ephemeral messages that don't need replication.
That is, allow for messages that don't need to be reliably transmitted but still need to be transmitted between devices.

7. MUST allow for privacy-preserving messages and extreme data loss.
By privacy-preserving we mean things such as exploding (self-destructing) messages. By extreme data loss we mean the ability for two trusted devices to recover from a deliberate or accidental removal of data.

8. MUST be agnostic to whatever transport it is running on.
It should not rely on specific semantics of the transport it is running on, nor be tightly coupled with it. This means a transport can be swapped out without loss of reliability between devices.

We've already expanded on the rationale for 1-3. Let's briefly expand on the need for the other requirements. For 4, the reality is that mobile devices are largely offline, and you need somewhere to get data from. To get reasonable latency, this requires some additional form of helper service. It also ties into 3, allowing for a good UX on mobile, which means reliability and reasonable latency.

5 is a must to at least be able to provide causal ordering between events, though it is up to clients of the protocol whether to use this. 6 is useful for unimportant messages, such as typing notifications, discovery beacons, and so on. The alternative here is to use a separate protocol, which would make this data sync layer less pluggable and general across transports. It is similar to TCP and UDP both running on top of IP. This also ties into 8: the protocol should be able to use any underlying transport from device to device.
For 7, this is often a desirable feature for secure messaging applications. The idea is that you may want exploding messages, or to wipe your device if you are afraid of local seizure [xx]. This is an explicit requirement, as otherwise you get into hairy situations when, for example, repairing a DAG.

## Solution / System Model

### System components

Here's a brief overview of the components that together make up data sync.

1. A *node*, or device, synchronizes data with other nodes.
2. *Data*, or a message, is the minimal unit being synchronized, and it is referenced by a globally unique *message ID*.
3. A *group* is an independent sync scope consisting of multiple nodes synchronizing some messages.
4. A *log* is an ordered list of messages by a particular node in a sync scope.
5. A *feed* is a persistent identifier for the head of a log, which allows us to identify missing data.

Nodes belonging to the same group can choose which other nodes they synchronize with. A log is local by default, but can be replicated as part of communication with other nodes (passive replication) and, as an extension, actively replicated through decentralized file storage. A feed can either be implicit, in that it is the last message received by another node, or explicit, in the case of fetching updates from mostly-offline nodes. Assuming messages include message IDs from other nodes, by taking the logs of multiple nodes in a group together, we can optionally form an immutable graph.

### Overview

We propose a protocol heavily inspired by BSP, with some tweaks, some minor and some major. Let's first introduce BSP in more detail as specified, before diving into enhancements.

We synchronize messages between devices. Each device has a set of peers, of which it chooses a subset with which it wants to synchronize messages.
Each synchronization happens within a data group, and a data group contains a set of immutable messages. Each message has a message ID that identifies it.

There are the following message types: OFFER, ACK, SEND, REQUEST. ACK acknowledges a set of message IDs, whereas REQUEST requests them. OFFER offers a set of message IDs without sending them, whereas SEND sends the actual messages. Depending on its sharing policy, a client can choose to OFFER messages first or SEND them directly, depending on whether latency (roundtrips) or bandwidth (payloads) is at a premium.

**Tweak 1:** For payloads we propose using Protobuf, to allow for upgradability and implementation in multiple languages. For the wire protocol we use a slightly different one; see the next major section.

**Tweak 2:** One difference from BSP is that we allow multiple message types to occur in one packet, so you can ACK messages at the same time as you OFFER messages. This allows for fewer roundtrips, as well as piggybacking ACKs, which is useful if a node doesn't want to disclose that it received a message straight away.

A device keeps track of the peers it is synchronizing data with. When a device sends a message, it starts by appending it locally to its own log. It then tries to OFFER or SEND messages to each of the peers it is sharing with. This operates on a positive-retransmission-with-ACKs basis, similar to e.g. TCP/IP. It uses exponential backoff and keeps track of the send count to ensure it doesn't send too many messages to an offline node.

A receiving node might receive messages out of order. Inside the payload, a useful thing to have is a set of parent IDs: message dependencies that can be requested specifically before delivering the message. A node thus knows what data it is missing and can request it, by default from the sending node, but additionally in other ways.
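To illustrate (with names of our own invention, not the BSP or MVDS wire format), here is a sketch of a node handling a packet that piggybacks ACKs, OFFERs, and full messages, and holding back messages whose parent dependencies haven't been delivered yet:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    id: str
    parent_ids: tuple = ()   # message dependencies (hypothetical field name)
    body: str = ""

@dataclass
class Packet:
    # Tweak 2: one packet may carry several message types at once.
    acks: list = field(default_factory=list)      # message IDs the peer confirms
    offers: list = field(default_factory=list)    # message IDs the peer can send
    messages: list = field(default_factory=list)  # full SEND payloads

class Node:
    def __init__(self):
        self.delivered = {}   # id -> Message, all dependencies met
        self.pending = {}     # id -> Message, waiting on parents
        self.unacked = set()  # ids we sent and still expect ACKs for

    def handle(self, packet):
        # REQUEST offered messages we don't already have.
        requests = [mid for mid in packet.offers
                    if mid not in self.delivered and mid not in self.pending]
        self.unacked -= set(packet.acks)  # stop resending ACKed messages
        for msg in packet.messages:
            self.pending[msg.id] = msg
        self._deliver_ready()
        # The reply piggybacks ACKs for received messages; missing parents
        # are REQUESTed too.
        acks = [m.id for m in packet.messages]
        requests += [p for m in packet.messages for p in m.parent_ids
                     if p not in self.delivered and p not in self.pending]
        return Packet(acks=acks), requests

    def _deliver_ready(self):
        # Deliver only messages whose parents are all delivered (causal order).
        progress = True
        while progress:
            progress = False
            for mid, msg in list(self.pending.items()):
                if all(p in self.delivered for p in msg.parent_ids):
                    self.delivered[mid] = self.pending.pop(mid)
                    progress = True

# m2 depends on m1; m2 arrives first and is held back until m1 shows up.
n = Node()
reply, req = n.handle(Packet(messages=[Message("m2", parent_ids=("m1",))]))
assert "m2" not in n.delivered and req == ["m1"]
n.handle(Packet(messages=[Message("m1")]))
assert "m1" in n.delivered and "m2" in n.delivered
```

The loop in `_deliver_ready` re-scans because delivering one message can unblock others further down the dependency chain.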
Only once all message dependencies have been met does the message actually get delivered, as some order has then been guaranteed. This is useful when causal consistency is needed, but it isn't a strict requirement and may be domain dependent.

### Enhancement 1: Leveraging multicast

Another difference from BSP is that we allow for leveraging multicast protocols. These might have a topic construct, which you know multiple receivers are listening to. If you are sending on a topic that you know a set of participants is listening to, a client counts this as a send to each of those participants. With 100 participants on a topic, this means a 100x improvement in bandwidth usage. A mapping is kept between known participants and a topic to facilitate this.

As a specific example, in a group chat with 100 participants you can ACK a set of messages by publishing the ACK (message IDs) to a topic. As a result, all participants who are able to read this message know you received it. Additionally, they are now aware that these message IDs exist, and may choose to request them. However, this still has the drawback that each node sends ACKs to the same topic, which can lead to undesirable bandwidth consumption. This points to using ACKs less aggressively and leveraging more randomized and optimized approaches. Such additional enhancements for large group chats are possible; see below for more radical changes.

This is an optional feature, as it isn't guaranteed that the listeners of a topic are the intended recipients, which is a somewhat weaker assumption than friend-to-friend with mandatory trust establishment.

### Enhancement 2: Swarm for replicated log

One drawback mentioned earlier is that high churn is problematic. As a simple example, imagine a mobile node that is only online 10% of the time.
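Assuming independent, uncorrelated uptimes (a strong simplification), the back-of-the-envelope numbers look like this:

```python
# Two mostly-offline peers, each online 10% of the time, uncorrelated.
p_online = 0.10

# Direct sync needs both peers online at the same moment.
p_direct = p_online * p_online   # 1% of attempts hit an overlap window

# Expected attempts until a direct overlap (geometric distribution).
expected_attempts = 1 / p_direct

# With an always-on replicated log, each leg only needs ONE party online:
# the sender uploads when it is online, the receiver fetches when it is.
p_via_log = p_online

print(p_direct, expected_attempts, p_via_log)
```

The point: direct sync needs an online overlap, while a replicated log decouples the two legs, so each only requires one party to be online.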
This means 9 out of 10 sends will fail, and this cascades with failed ACK deliveries, leading to more resends, and so on. Additionally, with exponential backoff and devices possibly offline at non-overlapping times, the latency might be unacceptable.

The fundamental issue here is one of low availability. How do we solve this without reverting to centralized solutions? One approach is to replicate the log remotely, similar to what is done with Git, but ideally in a "no one owns this" way, which points to IPFS and Swarm. This enables an interesting property: you can essentially upload and forget, and, assuming incentives around storage policy and so on work out, you can be reasonably sure the data is still there for you even if you lose your local copy.

A replicated log can exist on Swarm or IPFS, assuming the message IDs map to the references used there. What Swarm Feeds provide is a way to point to the last message, without requiring individual nodes to send or offer a message to each peer individually. This means that as long as you know which log to pull from, you can query it at some interval. Additionally, Swarm Feeds use an adaptive algorithm that lets you seamlessly communicate the expected update frequency.

One downside is that this requires Internet access and a specific overlay network, currently devp2p. It is an optional enhancement, and there are many ways to replicate remote logs; the important thing is that the other node is aware of the chosen one.

It works as follows. A node commits a message to its local log, then syncs it to some decentralized file storage, such as Swarm or IPFS. This provides further confirmation that the data has been replicated, and acts as a form of ACK where Swarm is seen as a node. Recipients then know, through previous communication with the sending node, that they can look at this log.
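A minimal sketch of this flow, with a plain dict standing in for Swarm/IPFS chunk storage and a variable standing in for a Feed (real deployments add signing, encryption, and incentivization):

```python
import hashlib, json

cas = {}          # stands in for Swarm/IPFS: content hash -> bytes
feed_head = None  # stands in for a Swarm Feed: mutable pointer to latest chunk

def put_chunk(obj):
    # Content-addressed put: the reference is the hash of the content.
    data = json.dumps(obj, sort_keys=True).encode()
    ref = hashlib.sha256(data).hexdigest()
    cas[ref] = data
    return ref

def append_to_log(message, prev_ref):
    # Each chunk links to the previous one, forming a linked-list log.
    return put_chunk({"msg": message, "prev": prev_ref})

# Sender: commit locally, replicate, bump the feed -- then go offline.
feed_head = append_to_log("hello", None)
feed_head = append_to_log("world", feed_head)

# Receiver (online at some later time): walk the feed back until it
# reaches a chunk it already knows, then replay oldest-first.
def fetch_log(head_ref, known=frozenset()):
    out, ref = [], head_ref
    while ref is not None and ref not in known:
        entry = json.loads(cas[ref])
        out.append(entry["msg"])
        ref = entry["prev"]
    return list(reversed(out))

print(fetch_log(feed_head))
```

Passing the already-known head as `known` makes the fetch incremental, which is what lets a mostly-offline receiver catch up cheaply.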
This means the two nodes don't both need to be online, and they can leverage Swarm, or other similar solutions, with less latency. Another way of looking at it is as chunks forming a linked list. Even if a similar solution lacks a feeds-like mechanism (e.g. ENS or DNS), it can still be used for chunk storage, even though latency might be slightly higher.

Let's decompose the above into orthogonal concerns.

#### Content-addressed storage (CAS)

Swarm, with its chunks and references, is a content-addressed storage. This allows us to get and put data there, with various properties. There are also other content-addressed storages, such as IPFS and a local node's cache. What's important is that there's a way to refer to messages, and that it is based on a cryptographic hash function. This gives us two properties: collision resistance and preimage resistance. It allows us to uniquely refer to a piece of content, it can be leveraged by clients in terms of ordering of events, and it allows us to check the data integrity of a specific message.

A desirable property here is to be as agnostic as possible to the specific infrastructure used. How can we refer to the same piece of content across various stores? This is an open question, but tentatively multihash and multicodec can be used for this.

One way of solving this issue is by wrapping. A client might advertise that it has the capability to replicate messages on Swarm. Receiving nodes then know to look at a certain feed, which references chunk IDs; inside the chunks are the actual messages. So we have both chunk IDs (Swarm references) and, inside the chunks themselves, the actual messages with their message IDs. This ensures messages stay immutable and are not tightly coupled to a specific storage or distribution mechanism.

#### Update pointers

In the section above we suggest using Feeds. However, this is just one alternative.
Other alternatives are ENS, DNS, etc. We can also have an endpoint communicate the latest update through a push message. It's desirable that this notion be kept abstract.

What an update pointer does is (a) allow you to see the last updated data and, ideally, (b) reference the previous state, so you can fetch it via CAS, or at least be aware that it's missing.

### Enhancement 3: Dispersy Bloom filters

Another drawback mentioned in an earlier section is that large group data sync contexts over a multicast network lead to a lot of overhead, due to each message being pairwise. By leveraging multicast capability where it exists, we mitigate this somewhat. However, a simple back-of-the-envelope calculation shows that this might still be too much. If you have 100 nodes in a group context, a node sending a message over a topic results in 100 naive ACKs over the topic. These can be clumped together and so on, but it still might be an unacceptable bandwidth trade-off. Without relying on centralized intermediaries, this means relaxing ACKs somewhat, while still ensuring we keep reliability. One way of doing this is by using some of the ideas from DBS.

By using Bloom filters and repeated random interactions with other nodes in a data sync context to offer and acknowledge messages, we'll eventually reach a form of steady state, as this is a form of anti-entropy sync.

We can extend the message types as follows. In large group contexts, OFFER and ACK, both messages that carry a set of message IDs, can be replaced with BLOOM_AVAILABLE plus some additional subset description. This conveys the same information as OFFER and ACK, but probabilistically, in the following way. Say we have a data sync context with 100 nodes; one node A has some messages locally available, and it connects to a random node B. A sends a Bloom filter along with some additional information. B then tests each locally available message (for some range) against the Bloom filter sent.
Each test returns "maybe in set" or "not in set", and false positives here are fine. If a message is not in the set, B knows A is missing it, so it sends a specific REQUEST for those messages, along with its own Bloom filter; A then does the same in reverse. Messages in the "maybe" category can be seen as probabilistically ACKed, and no further action is needed from the receiving node. This information can be discarded, or used as probabilistic input to inform things like future send times.

#### Subset description
How is the subset described? What's most important is that there's a way to divide a set of messages into roughly equally sized buckets. This is important to ensure the Bloom filter length and false positive rate stay the same. A good way of doing this is using Lamport timestamps. These are non-unique but monotonic, and they roughly evenly divide up the space of messages. I.e. we can select a range 0..1000 and communicate this in the BLOOM_AVAILABLE message, and the recipient knows which subset of messages to look for. What happens if a byzantine node sends many messages with the same timestamp? In that case, the sending node selects a smaller subset partition range. The universe size for the Bloom filter is determined by the sender, and it simply hints to the receiver where to look in its own store.

We don't just want to arbitrarily divide the space; we want to send a request that's most likely to get the information we need. Similar to DBS, different heuristics can be used depending on whether a node has almost caught up or is just starting. For example, a node that just joined can use a modulo heuristic, in which case BLOOM_AVAILABLE would also carry a modulo and an offset field. As an example, if you expect there to be roughly 1000 messages, you might select modulo 10. This divides the space into 10 pieces, which can be requested in parallel, i.e.
the 1st, 11th, 21st message, and so on. This approximates a linear download without requiring any state. For more recent messages, a pivot heuristic is desirable: one that probabilistically biases towards picking more recent messages. For further analysis, we refer to the DBS paper.

#### Sync scope and location of subset description
A brief experimental note on sync scope and Lamport clocks. In some cases we might want untrusted nodes to help sync, and thus not rely on information inside the payload. One thing worth noticing here is where the Lamport clocks providing the global partial ordering are stored. By default, they sit inside a data sync context payload. There may be circumstances where it is desirable to expand this. As a concrete example: imagine you have several one-on-one chats, each its own data sync context, since the chats are encrypted. If these devices are mostly offline, and you don't leverage remote replicated logs, it may be desirable to let other nodes help you sync these messages. This is similar to having an encrypted torrent file, but still having a way to refer to roughly ordered chunks, so other nodes can seed it. It may be desirable to have nested sync contexts, where participants can choose to let other nodes help them sync in order to reduce latency. To do so, another level of Lamport clocks can be added in the header, and the sync scope ID can also be used to divide up the space. This design is experimental and belongs in future work.

Notice that this general approach might still have to be complemented with REQUESTs. If we want causal consistency, it's desirable to have a way to refer to specific message IDs in order to fetch those messages.

Also note that this would work without a specific structured overlay on top of the Internet, for example over a mesh network.

## Specification and wire protocol

What follows is the actual specification and wire protocol.
Note that this is the Minimal Version of Data Sync (MVDS). Both of these are in draft mode and currently being tested. They don't currently implement any of the enhancements listed above.

### Caveats for the minimal version (Status specific)

This section is specific to Status engineering concerns and outlines some aspects of the current state of MVDS (mid-May 2019).

1. On mode of replication: This version uses passive replication, where nodes share data they are already interested in. In future versions, this may be extended to use active replication, see enhancements 2 and 3.

2. On the role of mailservers: This does not replace mailservers outright; it amends them and makes them less critical. A simple litmus test is that reliability is not impacted by a mailserver outage. Specifically, it deals with correctness first, not latency and the mostly-offline case per se. It is compatible with mostly-offline enhancements (see requirement 3), and this will be a focus of future iterations, for example through the enhancements mentioned.

3. On parent IDs: This is a concern for the specific clients that use the data sync layer. Other approaches may also be used and are not actively discouraged. It is a desirable aspect though, as it allows us to leverage CAS and build up a DAG, as outlined above. As future iterations progress and example clients are implemented, this will likely become more clear.

4. Current implementation: Currently there's a reference implementation being integrated into `status-console-client` (Golang). The (developing) spec is the source of truth, though. In the future, it is expected that more clients will be integrated, for example in Javascript, Nim and others. For integration into the app, it is up to Core to leverage the code in `status-console-client`.
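To illustrate caveat 3, here is a minimal sketch of how a client could derive message IDs by hashing and use parent IDs to build up a DAG. The hashing scheme and function names are assumptions for illustration only; they are not the MVDS wire format:

```python
import hashlib

def message_id(body: bytes, parent_ids: list) -> bytes:
    """Derive a 32-byte MessageID by hashing the message body together
    with its parent IDs (content-addressable storage). Any node holding
    a message can recompute and verify its ID, and walk the parent
    links to build up a DAG of the conversation."""
    h = hashlib.sha256()
    for pid in parent_ids:
        h.update(pid)  # parents contribute to the ID, linking the DAG
    h.update(body)
    return h.digest()

# A tiny two-message DAG: the second message references the first,
# so its ID commits to the entire history behind it.
first = message_id(b"hello", [])
second = message_id(b"world", [first])
assert len(first) == 32 and len(second) == 32
assert first != second
```

Because a message's ID commits to its parents, a client receiving `second` without knowing `first` can issue a REQUEST for the missing parent by ID, which is the causal-consistency mechanism hinted at above.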
#### Mailservers and data sync upgrade path

In a sentence, data sync provides reliability/correctness, whereas mailservers are currently used to provide lower latency/more availability.

What's the contract for mailservers?
A node requests messages from a set of topics (roughly group context). This can be augmented with a time interval hint. Asynchronously, the mailserver will send expired messages. This is a set of messages, some of which the node can read and some it can't. There may be messages missing from this set. Without data sync, we currently assume and require mailservers to have high availability to pick up all the envelopes.

The upgrade path looks like this:
1. Ad hoc messaging + mailservers (need to be HA)
2. Data sync + mailservers (don't need to be HA)
3. Data sync + Swarm (e.g.)

That is, we can leave mailservers as they are. For dealing with missing data, data sync will ensure messages that are not ACKed are resent. Alternatively and additionally, the Dispersy enhancement and/or message dependencies can be used to ensure the receiver has received the relevant messages. During a Chaos Unicorn Day event, lack of mailservers would merely lead to message delays, not lost messages.

### Types

#### Custom Types

| Type        | Equivalent | Description                                               |
|-------------|------------|-----------------------------------------------------------|
| `MessageID` | `bytes32`  | 32 bytes of binary data obtained when hashing the message |

#### Message Types

We define `messages` as packets sent and received by MVDS nodes. They have been taken from the BSP spec.
##### `ACK`

Acknowledges receipt of the messages with the given IDs.

```python
{
    'ids': ['MessageID']
}
```

##### `OFFER`

Advertises that the sender has the messages with the given IDs available.

```python
{
    'id': ['MessageID']
}
```

##### `REQUEST`

Requests the messages with the given IDs from the recipient.

```python
{
    'id': ['MessageID']
}
```

##### `MESSAGE`

Carries the actual message payload for a given group context, along with a timestamp.

```python
{
    'group_id': 'bytes32',
    'timestamp': 'int64',
    'body': 'bytes'
}
```

### Protobuf

```
syntax = "proto3";

package mvds;

message Payload {
  Ack ack = 1;
  Offer offer = 2;
  Request request = 3;
  repeated Message messages = 4;
}

message Ack {
  repeated bytes id = 1;
}

message Message {
  bytes group_id = 1;
  int64 timestamp = 2;
  bytes body = 3;
}

message Offer {
  repeated bytes id = 1;
}

message Request {
  repeated bytes id = 1;
}
```

### Wire protocol

### Example Clients

There are many clients that might use data sync, and each client might have its own policy for sharing messages, etc. This is where the semantics of what is actually in a message live, which might differ in terms of guarantees and so on depending on the domain.

#### Private chat

A simple example is a private chat, which consists of two humans talking over an encrypted channel. Additionally, each human might have multiple devices. Even though it might just be two individuals, we still have a group context.

#### Private group chat

This is similar to a private chat, except you might have some semantics around who can join the chat. This can be specified in the form of a finite state machine, and message types such as INVITE can be protobuf messages.

#### Public (group) chat

This is a channel with a lot of participants, and it might not be end-to-end encrypted.
Additionally, it might be coordination-less, such that you can take a given string, hash it to a 'topic', and then join the chat.

Membership here might be more probabilistic, and the sharing policy more restricted/random, due to bandwidth constraints.

#### State channels

To enable layer 2 scaling solutions, such as state channels. These might have more stringent requirements than normal 'human' chat.

#### Transport properties

To communicate which transports a specific node supports. This can be leveraged by other clients in terms of where to look for updates, etc.

#### Tribute to talk

Setting a specific limit for a stake that has to be paid before messages are delivered to an end user.

#### Multisig coordination

Ensuring nodes get updates when a multisig signature is required. A private group chat with native support for multisig interactions.

#### And so on

As you can see, many different types of clients are possible. What they have in common is that they leverage data sync to various extents, and don't have to think about reliability. Instead, they can specify the semantics they care about. By specifying these in e.g. protobufs, with clear message types, roles, finite state machines and sharing policies, we enable implementation of these clients in multiple end user applications.

## Proof-Evaluation-Simulation

Various types of simulation are needed to verify the design, specifically with respect to parameters such as bandwidth and latency. Here we draft some example simulations.

### Simulation 1: 1-1 chat (basic)

Two nodes talking to each other, with 10% churn each. That is: a 10% probability of being online at any given time. When online, a node is online for X time. For simplicity, let's assume X is 5 minutes. This can be a parameter and be varied down to connection windows as short as 30s.

Send 5 messages each.

Answer the following questions:

1.
What's the bandwidth overhead?
Expressed as a multiplier of the 10 messages. E.g. if you try to send one message 3 times and get 1 ACK, that's a x4 multiplier.

2. What's the latency?
Expressed as ticks or absolute time until a node has received all messages sent by the other node. Alternatively: expressed as a distribution of average or median latency, along with P90 latency. I.e. what's the latency for the 90th percentile most delayed message?

### Simulation 2: 1-1 chat (basic, naive extension)

Answer the same questions as above, but where a mailserver with 90% reliability relays messages.

### Simulation 3: Large public chat (basic)

100 people sending 5 messages each.

Answer the same questions as above.

### Simulation 4: Large public chat (basic, multicast)

Same as above, but leveraging the multicast extension.

### Simulation 5: Large public chat (basic, multicast, naive extension)

Multicast + naive mailserver.

### Simulation 6: Large public chat (basic, dispersy extension)

Dispersy random pairwise Bloom filter.

### Simulation 7: Large public chat (basic, dispersy extension, naive extension)

Dispersy random pairwise Bloom filter, and mailserver.

### Notes

These simulations can probably be compacted differently, and it might make sense to vary or measure other variables.

## Future work

## Conclusion

## References

- [Bramble Synchronization Protocol](https://code.briarproject.org/briar/briar-spec/blob/master/protocols/BSP.md)
- [Minimal Viable Data Sync](https://github.com/status-im/mvds/)