discv5-rationale.md
1 # Node Discovery Protocol v5 - Rationale 2 3 **Protocol version v5.1** 4 5 Note that this specification is a work in progress and may change incompatibly without 6 prior notice. 7 8 This document explains the design requirements and security needs of Discovery v5. In 9 addition, the document tries to gather the various vulnerabilities and threats that 10 pertain to Kademlia-like p2p networks. Our aim is to make it plain which issues are 11 addressed and how they are mitigated, so that the design of the [wire protocol] may be 12 verified. 13 14 # Design Requirements 15 16 ## Basic Goals 17 18 #### 1.1.1 Replace the Discovery v4 Endpoint Proof 19 20 The existing mutual endpoint verification process is unreliable because either side may 21 forget about a previously performed endpoint proof. If node A assumes that node B already 22 knows about a recent PING/PONG interaction and sends FINDNODE, the request may fail. 23 Implementations of Discovery v4 may guard against this flaw using retries, but retrying is 24 really slow and usually not done. 25 26 #### 1.1.2 Require knowledge of destination node ID for communication 27 28 Make it expensive to obtain the logical node ID from discovery communications. In 29 Discovery v4, any node can provoke responses knowing IP alone, and obtain information 30 about a node without knowing its ID. This encourages sloppy implementations to not perform 31 proper validation of FINDNODE results and increases the risk of DHT misuse for DDoS 32 purposes. 33 34 #### 1.1.3 Support more than one node ID cryptosystem 35 36 Ensure the DHT can accommodate ENR's with multiple identity systems. This will allow 37 identity cryptosystems other than *secp256k1/keccak256*. 38 39 #### 1.1.4 Replace node information tuples with ENRs 40 41 ENRs include discovery information and more. These signed, versioned records fulfill 42 multiple requirements, such as permitting capability advertisement and transport 43 negotiation. 44 45 #### 1.1.5 Guard against Kademlia implementation flaws 46 47 Discovery v4 trusts other nodes to return neighbors according to an agreed distance 48 metric. Mismatches in implementation can make it hard for nodes to join the network, or 49 lead to network fragmentation. 50 51 #### 1.1.6 Secondary topic-based node index 52 53 The protocol must support discovery of nodes via an arbitrary topic identifier. Finding 54 nodes belonging to a topic should be as fast or faster than finding a node with a certain 55 ID. 56 57 #### 1.1.7 Change replay prevention 58 59 The use of timestamps as a replay prevention mechanism in Discovery v4 has led to many 60 complaints about connectivity when the host's clock was wrong. The protocol should be 61 independent of the clock. 62 63 #### 1.1.8 Message obfuscation 64 65 The protocol should obfuscate traffic to prevent accidental packet mangling or trivial 66 sniffing. It must also avoid inclusion of obvious markers to prevent naive blocking of 67 discovery traffic using hard-coded packet signatures. Defense against advanced traffic 68 analysis systems, e.g. using inter-packet timing is a secondary concern. 69 70 ## Security Goals 71 72 Individual potential vulnerabilities are identified below. These each represent their own 73 risk mitigation goal. 74 75 #### 1.2.1 Replay of the handshake 76 77 The handshake, if successfully replayed from an older session, would allow a malicious 78 node to occupy a former IP location, or pollute the routing table with old information. 79 80 #### 1.2.2 Replay NODES 81 82 A NODES response, if successfully replayed, would pollute the routing table with stale 83 information. 84 85 #### 1.2.3 Replay PONG 86 87 A PONG, if successfully replayed, could convince a node that a node is live and 88 participating when it isn't. 89 90 #### 1.2.4 Kademlia redirection 91 92 A FindNode response contains false endpoint information intended at directing traffic at a 93 victim / polluting the routing table. A topic query results in fake endpoint information, 94 directing traffic at a victim. 95 96 #### 1.2.5 Kademlia redirection + self-propagation 97 98 As 1.2.3 but the responses attempt to replicate the malicious node throughout the routing 99 table, to amplify the source of pollution and traffic. 100 101 #### 1.2.6 Unsolicited replies 102 103 A malicious node is attempting to spam a node with fake responses to typical requests. 104 These messages may be replayed from previous communications, or may be new messages with 105 spoofed source endpoints. The aim is to disrupt weak implementations or have their 106 information be received as authentic, to pollute the recipient's routing table. 107 108 #### 1.2.7 Amplification 109 110 Malicious requests of small message size are sent from spoofed source IPs to direct larger 111 response messages at the victim. 112 113 #### 1.2.8 Kademlia direct validation 114 115 Direct validation of a newly discovered node can be an attack vector. A malicious node may 116 supply false node information with the IP of a victim. Validation traffic is then directed 117 at the victim. 118 119 #### 1.2.9 Kademlia ID count per address validations 120 121 There are various attacks facilitated by being able to associate multiple fake (or even 122 real) malicious node ids with a single IP endpoint. One mitigation method that is 123 sometimes considered is to globally limit the number of logical node IDs that can be 124 associated with an IP address. However, this is an attack vector. A malicious actor can 125 supply many logical node ids for a single IP address and thus prevent the correct node 126 from being able to join the network. 127 128 #### 1.2.10 Sybil/Eclipse attacks 129 130 These attacks rely on being able to create many real nodes, or spoof many logical node IDs 131 for a small number of physical endpoints, to form a large, isolated area of the network 132 under the control of the malicious actor. The victim's discovery findings are directed 133 into that part of the network, either to manipulate their traffic or to fully isolate them 134 from the network. 135 136 ## Version Interoperability / Upgrade Paths 137 138 There are several considerations regarding the coexistence of v4 and v5 network members. 139 140 #### 1.3.1 Transition period during network formation 141 142 Discovery v4 clients should be able to serve as discovery v5 bootstrap nodes while the 143 number of new discovery v5 clients is still low. 144 145 #### 1.3.2 Circumvention of 1.1.2 with v4 PING 146 147 While a client supports both the old v4 and newer versions, it is possible for malicious 148 actors to pose as a v4 node and recover node IDs from arbitrary IP addresses. This should 149 somehow be avoided. 150 151 # Rationale 152 153 ## Why UDP? 154 155 The wire protocol specification mandates the use of UDP. This may seem restrictive, but 156 use of UDP communication is an important part of the design. While there is no single 157 reason which ultimately dictates this choice, there are many reasons why the system as a 158 whole will function a lot better in the context of UDP. 159 160 For discovery to work, all nodes must be able to communicate with each other on equal 161 footing. The network won't form properly if some nodes can only communicate with certain 162 other nodes. Incooperative NAT in between the node and the Internet can cause 163 communication failure. UDP is fundamentally easier to work with when it comes to NAT 164 traversal. No explicit hole-punching is required if the NAT setup is capable of full-cone 165 translation, i.e. a single packet sent to any other node establishes a port mapping which 166 allows packets from others to reach the node behind NAT. 167 168 Unlike other DHT systems such as IPFS, the node discovery protocol mandates a single wire 169 protocol to be implemented by everyone. This avoids communication failures due to 170 incompatible transports and strengthens the DHT because all participants are guaranteed to 171 be reachable on the declared endpoint. It is also fundamentally simpler to reason about 172 and implement: the protocol either works in a certain context or it doesn't. If the 173 protocol cannot be used because the networking environment doesn't support UDP, another 174 discovery mechanism must be chosen. 175 176 Another reason for UDP is communication latency: participants in the discovery protocol 177 must be able to communicate with a large number of other nodes within a short time frame 178 to establish and maintain the neighbor set and must perform regular liveness checks on 179 their neighbors. For the topic advertisement system, registrants collect tickets and must 180 use them as soon as the ticket expires to place an ad in a topic queue. 181 182 These protocol interactions are difficult to implement in a TCP setting where connections 183 require multiple round-trips before application data can be sent and the connection 184 lifecycle needs to be maintained. An implementation of the wire protocol on a TCP-based 185 transport would either need permanent connection to hundreds of nodes, in which case the 186 application would be short on file descriptors, or establish many short-lived TCP 187 connections per second to communicate with specific nodes. 188 189 Yet another useful property of UDP is that packets aren't required to reach their 190 destination --- intermediaries may drop arbitrary packets. This strengthens the protocol 191 because it must be designed to function even under bad connectivity. Implementations may 192 exploit the possibility of packet loss to their advantage. A participant can never tell 193 whether a certain request wasn't answered in time because the recipient chose to ignore it 194 or because their own connection isn't working. An implementation that tries to minimize 195 traffic or CPU overhead could simply drop a certain amount of packets at application level 196 to stay within self-imposed limits. 197 198 ## Why Kademlia? 199 200 Kademlia is a simple distributed hash table design proposed in 2002. It is commonly used 201 for file-sharing systems where content is stored by hash and distributed among 202 participants based on their 'proximity' according to the XOR distance metric. 203 204 Node discovery is a Kademlia-inspired system but doesn't store any files, only node 205 information is relayed. We chose Kademlia primarily because the algorithm is simple and 206 understandable while providing a distributed database that scales with the number of 207 participants. Our system also relies on the routing table to allow enumeration and random 208 traversal of the whole network, i.e. all participants can be found. Most importantly, 209 having a structured network with routing enables thinking about DHT 'address space' and 210 'regions of address space'. These concepts are used to build the [topic-based node index]. 211 212 Kademlia is often criticized as a naive design with obvious weaknesses. We believe that 213 most issues with simple Kademlia can be overcome by careful programming and the benefits 214 of a simple design outweigh the cost and risks of maintaining a more complex system. 215 216 ## Sybil and Eclipse Attacks 217 218 The well-known 'sybil attack' is based on the observation that creating node identities is 219 essentially free. In any system using a measure of proximity among node identities, an 220 adversary may place nodes close to a chosen node by generating suitable identities. For 221 basic node discovery through network enumeration, the 'sybil attack' poses no significant 222 challenge. Sybils are a serious issue for the topic-based node index, especially for 223 topics provided by few participants, because the index relies on node distance. 224 225 An 'eclipse attack' is usually based on generating sybil nodes with the goal of polluting 226 the victim node's routing table. Once the table is overtaken, the victim has no way to 227 find any other nodes but those controlled by the adversary. Even if creating sybil nodes 228 were somehow impossible, 'eclipsing' a node might still be achieved through other means 229 such as directing large amounts of traffic to the node. When the victim node is unable to 230 keep up regular communication with the rest of the network it may lose connection and be 231 forced into re-bootstrapping its routing table --- a situation in which it is most 232 vulnerable. 233 234 Both the 'sybil attack' and the 'eclipse attack' must be considered for any structured 235 overlay network, and there is no single optimal solution to fully protect against these 236 attacks. However, certain implementation decisions can make them more expensive or render 237 them ineffective. 238 239 As a general measure, implementations can place IP-based limits on the content of their 240 routing table. For example, limiting Kademlia table buckets to two nodes from every /24 IP 241 subnetwork and the whole table to 10 nodes per /24 IP subnetwork significantly increases 242 the number of hosts an attacker must control to overtake the routing table. Such limits 243 are effective because IPv4 addresses are a scarce resource. Subnetwork-based limits remain 244 effective even as IPv6 adoption progresses. 245 246 To counter being eclipsed via repeated contact by an adversary, implementations of the 247 Kademlia table should avoid taking on new members on incoming contact unless the table is 248 well-stocked from outbound queries. Readers of the original Kademlia paper may easily 249 assume that liveness checks on bucket members should be performed just when a new node 250 tries to enter the bucket, but doing so increases the risk of emptying the table through 251 DoS. We therefore recommend to perform liveness checks on a separate schedule which is 252 independent of incoming requests. Checks may also be paused or delayed when the node is 253 under high load. The number of past liveness checks performed on a bucket member is an 254 important indicator of its age: Implementations should favor long-lived nodes and may 255 relax liveness checks according to node age. 256 257 A well-researched countermeasure to sybil attacks is to make creation of identities 258 computationally expensive. While effective in theory, there are significant downsides to 259 this approach. Nodes on resource-constrained devices such as mobile phones may not be able 260 to solve the computational puzzle in time to join the network. Continuous advances in 261 hashing technology which speed up cryptocurrency proof-of-work algorithms show that this 262 way of securing the network requires constant adjustments to thresholds and can never beat 263 determined attackers. 264 265 Support for mixed ENR identity schemes, described later in this document, allows for an 266 escape hatch to introduce arbitrary optional constraints (including proof-of-work) on node 267 identities. Thus, while the issue is not directly addressed at wire protocol level, there 268 is no inherent blocker for solving it as the need arises. 269 270 ## Node Records and Their Properties 271 272 In Discovery v5, all node information is exchanged using [node records]. Records are 273 self-signed by the node they describe and contain arbitrary key-value pairs. They also 274 contain a sequence number to determine which copy of the record is newer when multiple 275 copies are available. When a node record is changed by its owner, the sequence number 276 increases. The new record 'syncs' to neighboring nodes because they will request it during 277 liveness revalidation. The record is also 'pushed' on to newly seen nodes as part of the 278 handshake. 279 280 Signing records prevents any intermediary node from changing the content of a record. Any 281 node's information is either available in the exact form it was published or not at all. 282 To make the system secure, proper validation of records is important. Implementations must 283 verify the signature of all received records. Implementations should also avoid sharing 284 records containing no usable IP addresses or ports and check that Internet hosts do not 285 attempt to share records containing LAN IP addresses. 286 287 ## On Encryption 288 289 An early draft of Discovery v5 integrated weak obfuscation based on XORing packet content 290 as an optional facility. As development of the protocol progressed, we understood that 291 traffic amplification, replay and packet authentication could all be solved by introducing 292 a real encryption scheme. The way the handshake and encryption works is primarily aimed at 293 these issues and is not supposed to ensure complete anonymity of DHT users. While it does 294 protect against passive observers, the handshake is not forward-secure and active protocol 295 participants can access node information by simply asking for it. 296 297 Node identities can use different kinds of keys depending on the identity scheme used in 298 the node record. This has implications on the handshake because it deals with the public 299 key used to derive the identity. Implementations of Discovery v5 must agree on the set of 300 supported identity schemes to keep the network interoperable and custom code to verify the 301 handshake is required for every new scheme. We believe this is an acceptable tradeoff 302 because introducing a new kind of node identity is a rare event. 303 304 Since the handshake performs complex cryptographic operations (ECDH, signature 305 verification) performance of the handshake is a big concern. Benchmarking the experimental 306 Go implementation shows that the handshake computation takes 500µs on a 2014-era laptop 307 using the default secp256k1/keccak256 identity scheme. That's a lot, but note the cost 308 amortizes because nodes commonly exchange multiple packets. Subsequent packets in the same 309 conversation can be decrypted and authenticated in just 2µs. The most common protocol 310 interaction is a FINDNODE or TOPICQUERY request on an unknown node with 4 NODES responses. 311 312 To put things into perspective: encryption and authentication in Discovery v5 is still a 313 significant improvement over the authentication scheme used in Discovery v4, which 314 performs secp256k1 signature 'recovery' (benchmark: ~170µs) on every packet. A FINDNODE 315 interaction with an unknown v4 node takes 7 packets (2x PING/PONG, FINDNODE, 2x NEIGHBORS) 316 and costs 1.2ms on each side for the crypto alone. In addition, the v5 handshake reduces 317 the risk of computational DoS because it costs as much to create as it costs to verify and 318 cannot be replayed. 319 320 ## On Amplification and Replay 321 322 Any openly accessible packet-based system must consider misuse of the protocol for traffic 323 amplification purposes. There are two possible avenues of attack: In the first, an 324 adversary who wishes to attack a third-party host may send packets with 'spoofed' source 325 IP address to a node, attempting to make the node send a larger response to the victim 326 endpoint. In the second, the adversary attempts to install a node record containing the 327 victim's endpoint in the DHT, causing other nodes to direct packets to the victim. 328 329 The handshake handles the first kind of attack by responding with a small WHOAREYOU packet 330 whenever any request is received from an unknown endpoint. This is safe because the 331 adversary's packet is always larger than the WHOAREYOU response, removing the incentive 332 for the attack. To make the countermeasure work, implementations must keep session secrets 333 not just per node ID, but also per node IP. 334 335 The second kind of attack--- installing the victim as a node ---is handled by requiring 336 that implementations mustn't answer queries with nodes whose liveness hasn't been 337 verified. When a node is added to the Kademlia table, it must pass at least one check on 338 the IP declared in the node record before it can be returned in a NODES response. 339 340 An adversary may also try to replay previously sent/seen packets to impersonate a node or 341 disturb the operation of the protocol. Session keys per node-ID/IP generally prevent 342 replay across sessions. The `request-id`, mirrored in response packets, prevents replay of 343 responses within a session. 344 345 ## The Topic Index 346 347 Using FINDNODE queries with appropriately chosen targets, the entire DHT can be sampled by 348 a random walk to find all other participants. When building a distributed application, it 349 is often desirable to restrict the search to participants which provide a certain service. 350 A simple solution to this problem would be to simply split up the network and require 351 participation in many smaller application-specific networks. However, such networks are 352 hard to bootstrap and also more vulnerable to attacks which could isolate nodes. 353 354 The topic index provides discovery by provided service in a different way. Nodes maintain 355 a single node table tracking their neighbors and advertise 'topics' on nodes found by 356 randomly walking the DHT. While the 'global' topic index can be also spammed, it makes 357 complete isolation a lot harder. To prevent nodes interested in a certain topic from 358 finding each other, the entire discovery network would have to be overpowered. 359 360 To make the index useful, searching for nodes by topic must be efficient regardless of the 361 number of advertisers. This is achieved by estimating the topic 'radius', i.e. the 362 percentage of all live nodes which are advertising the topic. Advertisement and search 363 activities are restricted to a region of DHT address space around the topic's 'center'. 364 365 We also want the index to satisfy another property: When a topic advertisement is placed, 366 it should last for a well-defined amount of time. This ensures nodes may rely on their 367 advertisements staying placed rather than worrying about keeping them alive. 368 369 Finally, the index should consume limited resources. Just as the node table is limited in 370 number and size of buckets, the size of the index data structure on each node is limited. 371 372 ### Why should advertisers wait? 373 374 Advertisers must wait a certain amount of time before they can be registered. Enforcing 375 this time limit prevents misuse of the topic index because any topic must be important 376 enough to outweigh the cost of waiting. Imagine a group phone call: announcing the 377 participants of the call using topic advertisement isn't a good use of the system because 378 the topic exists only for a short time and will have very few participants. The waiting 379 time prevents using the index for this purpose because the call might already be over 380 before everyone could get registered. 381 382 ### Dealing with Topic Spam 383 384 Our model is based on the following assumptions: 385 386 - Anyone can place their own advertisements under any topics and the rate of placing ads 387 is not limited globally. The number of active ads for any node is roughly proportional 388 to the resources (network bandwidth, mostly) spent on advertising. 389 - Honest actors whose purpose is to connect to other honest actors will spend an adequate 390 amount of efforts on registering and searching for ads, depending on the rate of newly 391 established connections they are targeting. If the given topic is used only by honest 392 actors, a few registrations per minute will be satisfactory, regardless of the size of 393 the subnetwork. 394 - Dishonest actors may want to place an excessive amount of ads just to disrupt the 395 discovery service. This will reduce the effectiveness of honest registration efforts by 396 increasing the topic radius and/or topic queue waiting times. If the attacker(s) can 397 place a comparable amount or more ads than all honest actors combined then the rate of 398 new (useful) connections established throughout the network will reduce proportionally 399 to the `honest / (dishonest + honest)` registration rates. 400 401 This adverse effect can be countered by honest actors increasing their registration and 402 search efforts. Fortunately, the rate of established connections between them will 403 increase proportionally both with increased honest registration and search efforts. If 404 both are increased in response to an attack, the required factor of increased efforts from 405 honest actors is proportional to the square root of the attacker's efforts. 406 407 ### Detecting a useless registration attack 408 409 In the case of a symmetrical protocol, where nodes are both searching and advertising 410 under the same topic, it is easy to detect when most of the found ads turn out to be 411 useless and increase both registration and query frequency. It is a bit harder but still 412 possible with asymmetrical (client-server) protocols, where only clients can easily detect 413 useless registrations, while advertisers (servers) do not have a direct way of detecting 414 when they should increase their advertising efforts. One possible solution is for servers 415 to also act as clients just to test the server capabilities of other advertisers. It is 416 also possible to implement a feedback system between trusted clients and servers. 417 418 # References 419 420 - Petar Maymounkov and David Mazières. 421 *Kademlia: A Peer-to-peer Information System Based on the XOR Metric.* 2002.\ 422 <https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf> 423 424 - Atul Singh, Tsuen-Wan “Johnny” Ngan, Peter Druschel, Dan S. Wallach. 425 *Eclipse Attacks on Overlay Networks: Threats and Defenses*. 2006.\ 426 <https://www.cs.rice.edu/~dwallach/pub/eclipse-infocom06.pdf> 427 428 - Ingmar Baumgart and Sebastian Mies. 429 *S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing.* 2007.\ 430 <https://telematics.tm.kit.edu/publications/Files/267/SKademlia_2007.pdf> 431 432 - Xin Sun, Ruben Torres and Sanjay Rao. *Feasiblity of DDoS Attacks with P2P Systems and 433 Prevention through Robust Membership Management.* 2007.\ 434 <https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1357&context=ecetr> 435 436 - Erik Hjelmvik, Wolfgang John. *Breaking and Improving Protocol Obfuscation.* 2010.\ 437 <https://internetstiftelsen.se/docs/hjelmvik_breaking.pdf> 438 439 - Adam Langley, Wan-Teh Chang. *QUIC Crypto*. 2016.\ 440 <https://docs.google.com/document/d/1g5nIXAIkN_Y-7XJW5K45IblHd_L2f5LTaDUDwvZ5L6g> 441 442 - W3C Credentials Community Group. *Decentralized Identifiers (DIDs) Spec.* 2017.\ 443 <https://w3c-ccg.github.io/did-spec> 444 445 - Seoung Kyun Kim, Zane Ma, Siddharth Murali, Joshua Mason, Andrew Miller, Michael Bailey. 446 *Measuring Ethereum Network Peers*. 2018.\ 447 <http://mdbailey.ece.illinois.edu/publications/imc18_ethereum.pdf> 448 449 - Yuval Marcus, Ethan Heilman, Sharon Goldberg. 450 *Low-Resource Eclipse Attacks on Ethereum’s Peer-to-Peer Network.* 2018.\ 451 <https://eprint.iacr.org/2018/236.pdf> 452 453 [wire protocol]: ./discv5-wire.md 454 [topic-based node index]: ./discv5-theory.md#topic-advertisement 455 [node records]: ../enr.md