Cradicle Explorer

/ discv5 / discv5-rationale.md

discv5-rationale.md

1 # Node Discovery Protocol v5 - Rationale
2
3 **Protocol version v5.1**
4
5 Note that this specification is a work in progress and may change incompatibly without
6 prior notice.
7
8 This document explains the design requirements and security needs of Discovery v5. In
9 addition, the document tries to gather the various vulnerabilities and threats that
10 pertain to Kademlia-like p2p networks. Our aim is to make it plain which issues are
11 addressed and how they are mitigated, so that the design of the [wire protocol] may be
12 verified.
13
14 # Design Requirements
15
16 ## Basic Goals
17
18 #### 1.1.1 Replace the Discovery v4 Endpoint Proof
19
20 The existing mutual endpoint verification process is unreliable because either side may
21 forget about a previously performed endpoint proof. If node A assumes that node B already
22 knows about a recent PING/PONG interaction and sends FINDNODE, the request may fail.
23 Implementations of Discovery v4 may guard against this flaw using retries, but retrying is
24 really slow and usually not done.
25
26 #### 1.1.2 Require knowledge of destination node ID for communication
27
28 Make it expensive to obtain the logical node ID from discovery communications. In
29 Discovery v4, any node can provoke responses knowing IP alone, and obtain information
30 about a node without knowing its ID. This encourages sloppy implementations to not perform
31 proper validation of FINDNODE results and increases the risk of DHT misuse for DDoS
32 purposes.
33
34 #### 1.1.3 Support more than one node ID cryptosystem
35
36 Ensure the DHT can accommodate ENR's with multiple identity systems. This will allow
37 identity cryptosystems other than *secp256k1/keccak256*.
38
39 #### 1.1.4 Replace node information tuples with ENRs
40
41 ENRs include discovery information and more. These signed, versioned records fulfill
42 multiple requirements, such as permitting capability advertisement and transport
43 negotiation.
44
45 #### 1.1.5 Guard against Kademlia implementation flaws
46
47 Discovery v4 trusts other nodes to return neighbors according to an agreed distance
48 metric. Mismatches in implementation can make it hard for nodes to join the network, or
49 lead to network fragmentation.
50
51 #### 1.1.6 Secondary topic-based node index
52
53 The protocol must support discovery of nodes via an arbitrary topic identifier. Finding
54 nodes belonging to a topic should be as fast or faster than finding a node with a certain
55 ID.
56
57 #### 1.1.7 Change replay prevention
58
59 The use of timestamps as a replay prevention mechanism in Discovery v4 has led to many
60 complaints about connectivity when the host's clock was wrong. The protocol should be
61 independent of the clock.
62
63 #### 1.1.8 Message obfuscation
64
65 The protocol should obfuscate traffic to prevent accidental packet mangling or trivial
66 sniffing. It must also avoid inclusion of obvious markers to prevent naive blocking of
67 discovery traffic using hard-coded packet signatures. Defense against advanced traffic
68 analysis systems, e.g. using inter-packet timing is a secondary concern.
69
70 ## Security Goals
71
72 Individual potential vulnerabilities are identified below. These each represent their own
73 risk mitigation goal.
74
75 #### 1.2.1 Replay of the handshake
76
77 The handshake, if successfully replayed from an older session, would allow a malicious
78 node to occupy a former IP location, or pollute the routing table with old information.
79
80 #### 1.2.2 Replay NODES
81
82 A NODES response, if successfully replayed, would pollute the routing table with stale
83 information.
84
85 #### 1.2.3 Replay PONG
86
87 A PONG, if successfully replayed, could convince a node that a node is live and
88 participating when it isn't.
89
90 #### 1.2.4 Kademlia redirection
91
92 A FindNode response contains false endpoint information intended at directing traffic at a
93 victim / polluting the routing table. A topic query results in fake endpoint information,
94 directing traffic at a victim.
95
96 #### 1.2.5 Kademlia redirection + self-propagation
97
98 As 1.2.3 but the responses attempt to replicate the malicious node throughout the routing
99 table, to amplify the source of pollution and traffic.
100
101 #### 1.2.6 Unsolicited replies
102
103 A malicious node is attempting to spam a node with fake responses to typical requests.
104 These messages may be replayed from previous communications, or may be new messages with
105 spoofed source endpoints. The aim is to disrupt weak implementations or have their
106 information be received as authentic, to pollute the recipient's routing table.
107
108 #### 1.2.7 Amplification
109
110 Malicious requests of small message size are sent from spoofed source IPs to direct larger
111 response messages at the victim.
112
113 #### 1.2.8 Kademlia direct validation
114
115 Direct validation of a newly discovered node can be an attack vector. A malicious node may
116 supply false node information with the IP of a victim. Validation traffic is then directed
117 at the victim.
118
119 #### 1.2.9 Kademlia ID count per address validations
120
121 There are various attacks facilitated by being able to associate multiple fake (or even
122 real) malicious node ids with a single IP endpoint. One mitigation method that is
123 sometimes considered is to globally limit the number of logical node IDs that can be
124 associated with an IP address. However, this is an attack vector. A malicious actor can
125 supply many logical node ids for a single IP address and thus prevent the correct node
126 from being able to join the network.
127
128 #### 1.2.10 Sybil/Eclipse attacks
129
130 These attacks rely on being able to create many real nodes, or spoof many logical node IDs
131 for a small number of physical endpoints, to form a large, isolated area of the network
132 under the control of the malicious actor. The victim's discovery findings are directed
133 into that part of the network, either to manipulate their traffic or to fully isolate them
134 from the network.
135
136 ## Version Interoperability / Upgrade Paths
137
138 There are several considerations regarding the coexistence of v4 and v5 network members.
139
140 #### 1.3.1 Transition period during network formation
141
142 Discovery v4 clients should be able to serve as discovery v5 bootstrap nodes while the
143 number of new discovery v5 clients is still low.
144
145 #### 1.3.2 Circumvention of 1.1.2 with v4 PING
146
147 While a client supports both the old v4 and newer versions, it is possible for malicious
148 actors to pose as a v4 node and recover node IDs from arbitrary IP addresses. This should
149 somehow be avoided.
150
151 # Rationale
152
153 ## Why UDP?
154
155 The wire protocol specification mandates the use of UDP. This may seem restrictive, but
156 use of UDP communication is an important part of the design. While there is no single
157 reason which ultimately dictates this choice, there are many reasons why the system as a
158 whole will function a lot better in the context of UDP.
159
160 For discovery to work, all nodes must be able to communicate with each other on equal
161 footing. The network won't form properly if some nodes can only communicate with certain
162 other nodes. Incooperative NAT in between the node and the Internet can cause
163 communication failure. UDP is fundamentally easier to work with when it comes to NAT
164 traversal. No explicit hole-punching is required if the NAT setup is capable of full-cone
165 translation, i.e. a single packet sent to any other node establishes a port mapping which
166 allows packets from others to reach the node behind NAT.
167
168 Unlike other DHT systems such as IPFS, the node discovery protocol mandates a single wire
169 protocol to be implemented by everyone. This avoids communication failures due to
170 incompatible transports and strengthens the DHT because all participants are guaranteed to
171 be reachable on the declared endpoint. It is also fundamentally simpler to reason about
172 and implement: the protocol either works in a certain context or it doesn't. If the
173 protocol cannot be used because the networking environment doesn't support UDP, another
174 discovery mechanism must be chosen.
175
176 Another reason for UDP is communication latency: participants in the discovery protocol
177 must be able to communicate with a large number of other nodes within a short time frame
178 to establish and maintain the neighbor set and must perform regular liveness checks on
179 their neighbors. For the topic advertisement system, registrants collect tickets and must
180 use them as soon as the ticket expires to place an ad in a topic queue.
181
182 These protocol interactions are difficult to implement in a TCP setting where connections
183 require multiple round-trips before application data can be sent and the connection
184 lifecycle needs to be maintained. An implementation of the wire protocol on a TCP-based
185 transport would either need permanent connection to hundreds of nodes, in which case the
186 application would be short on file descriptors, or establish many short-lived TCP
187 connections per second to communicate with specific nodes.
188
189 Yet another useful property of UDP is that packets aren't required to reach their
190 destination --- intermediaries may drop arbitrary packets. This strengthens the protocol
191 because it must be designed to function even under bad connectivity. Implementations may
192 exploit the possibility of packet loss to their advantage. A participant can never tell
193 whether a certain request wasn't answered in time because the recipient chose to ignore it
194 or because their own connection isn't working. An implementation that tries to minimize
195 traffic or CPU overhead could simply drop a certain amount of packets at application level
196 to stay within self-imposed limits.
197
198 ## Why Kademlia?
199
200 Kademlia is a simple distributed hash table design proposed in 2002. It is commonly used
201 for file-sharing systems where content is stored by hash and distributed among
202 participants based on their 'proximity' according to the XOR distance metric.
203
204 Node discovery is a Kademlia-inspired system but doesn't store any files, only node
205 information is relayed. We chose Kademlia primarily because the algorithm is simple and
206 understandable while providing a distributed database that scales with the number of
207 participants. Our system also relies on the routing table to allow enumeration and random
208 traversal of the whole network, i.e. all participants can be found. Most importantly,
209 having a structured network with routing enables thinking about DHT 'address space' and
210 'regions of address space'. These concepts are used to build the [topic-based node index].
211
212 Kademlia is often criticized as a naive design with obvious weaknesses. We believe that
213 most issues with simple Kademlia can be overcome by careful programming and the benefits
214 of a simple design outweigh the cost and risks of maintaining a more complex system.
215
216 ## Sybil and Eclipse Attacks
217
218 The well-known 'sybil attack' is based on the observation that creating node identities is
219 essentially free. In any system using a measure of proximity among node identities, an
220 adversary may place nodes close to a chosen node by generating suitable identities. For
221 basic node discovery through network enumeration, the 'sybil attack' poses no significant
222 challenge. Sybils are a serious issue for the topic-based node index, especially for
223 topics provided by few participants, because the index relies on node distance.
224
225 An 'eclipse attack' is usually based on generating sybil nodes with the goal of polluting
226 the victim node's routing table. Once the table is overtaken, the victim has no way to
227 find any other nodes but those controlled by the adversary. Even if creating sybil nodes
228 were somehow impossible, 'eclipsing' a node might still be achieved through other means
229 such as directing large amounts of traffic to the node. When the victim node is unable to
230 keep up regular communication with the rest of the network it may lose connection and be
231 forced into re-bootstrapping its routing table --- a situation in which it is most
232 vulnerable.
233
234 Both the 'sybil attack' and the 'eclipse attack' must be considered for any structured
235 overlay network, and there is no single optimal solution to fully protect against these
236 attacks. However, certain implementation decisions can make them more expensive or render
237 them ineffective.
238
239 As a general measure, implementations can place IP-based limits on the content of their
240 routing table. For example, limiting Kademlia table buckets to two nodes from every /24 IP
241 subnetwork and the whole table to 10 nodes per /24 IP subnetwork significantly increases
242 the number of hosts an attacker must control to overtake the routing table. Such limits
243 are effective because IPv4 addresses are a scarce resource. Subnetwork-based limits remain
244 effective even as IPv6 adoption progresses.
245
246 To counter being eclipsed via repeated contact by an adversary, implementations of the
247 Kademlia table should avoid taking on new members on incoming contact unless the table is
248 well-stocked from outbound queries. Readers of the original Kademlia paper may easily
249 assume that liveness checks on bucket members should be performed just when a new node
250 tries to enter the bucket, but doing so increases the risk of emptying the table through
251 DoS. We therefore recommend to perform liveness checks on a separate schedule which is
252 independent of incoming requests. Checks may also be paused or delayed when the node is
253 under high load. The number of past liveness checks performed on a bucket member is an
254 important indicator of its age: Implementations should favor long-lived nodes and may
255 relax liveness checks according to node age.
256
257 A well-researched countermeasure to sybil attacks is to make creation of identities
258 computationally expensive. While effective in theory, there are significant downsides to
259 this approach. Nodes on resource-constrained devices such as mobile phones may not be able
260 to solve the computational puzzle in time to join the network. Continuous advances in
261 hashing technology which speed up cryptocurrency proof-of-work algorithms show that this
262 way of securing the network requires constant adjustments to thresholds and can never beat
263 determined attackers.
264
265 Support for mixed ENR identity schemes, described later in this document, allows for an
266 escape hatch to introduce arbitrary optional constraints (including proof-of-work) on node
267 identities. Thus, while the issue is not directly addressed at wire protocol level, there
268 is no inherent blocker for solving it as the need arises.
269
270 ## Node Records and Their Properties
271
272 In Discovery v5, all node information is exchanged using [node records]. Records are
273 self-signed by the node they describe and contain arbitrary key-value pairs. They also
274 contain a sequence number to determine which copy of the record is newer when multiple
275 copies are available. When a node record is changed by its owner, the sequence number
276 increases. The new record 'syncs' to neighboring nodes because they will request it during
277 liveness revalidation. The record is also 'pushed' on to newly seen nodes as part of the
278 handshake.
279
280 Signing records prevents any intermediary node from changing the content of a record. Any
281 node's information is either available in the exact form it was published or not at all.
282 To make the system secure, proper validation of records is important. Implementations must
283 verify the signature of all received records. Implementations should also avoid sharing
284 records containing no usable IP addresses or ports and check that Internet hosts do not
285 attempt to share records containing LAN IP addresses.
286
287 ## On Encryption
288
289 An early draft of Discovery v5 integrated weak obfuscation based on XORing packet content
290 as an optional facility. As development of the protocol progressed, we understood that
291 traffic amplification, replay and packet authentication could all be solved by introducing
292 a real encryption scheme. The way the handshake and encryption works is primarily aimed at
293 these issues and is not supposed to ensure complete anonymity of DHT users. While it does
294 protect against passive observers, the handshake is not forward-secure and active protocol
295 participants can access node information by simply asking for it.
296
297 Node identities can use different kinds of keys depending on the identity scheme used in
298 the node record. This has implications on the handshake because it deals with the public
299 key used to derive the identity. Implementations of Discovery v5 must agree on the set of
300 supported identity schemes to keep the network interoperable and custom code to verify the
301 handshake is required for every new scheme. We believe this is an acceptable tradeoff
302 because introducing a new kind of node identity is a rare event.
303
304 Since the handshake performs complex cryptographic operations (ECDH, signature
305 verification) performance of the handshake is a big concern. Benchmarking the experimental
306 Go implementation shows that the handshake computation takes 500µs on a 2014-era laptop
307 using the default secp256k1/keccak256 identity scheme. That's a lot, but note the cost
308 amortizes because nodes commonly exchange multiple packets. Subsequent packets in the same
309 conversation can be decrypted and authenticated in just 2µs. The most common protocol
310 interaction is a FINDNODE or TOPICQUERY request on an unknown node with 4 NODES responses.
311
312 To put things into perspective: encryption and authentication in Discovery v5 is still a
313 significant improvement over the authentication scheme used in Discovery v4, which
314 performs secp256k1 signature 'recovery' (benchmark: ~170µs) on every packet. A FINDNODE
315 interaction with an unknown v4 node takes 7 packets (2x PING/PONG, FINDNODE, 2x NEIGHBORS)
316 and costs 1.2ms on each side for the crypto alone. In addition, the v5 handshake reduces
317 the risk of computational DoS because it costs as much to create as it costs to verify and
318 cannot be replayed.
319
320 ## On Amplification and Replay
321
322 Any openly accessible packet-based system must consider misuse of the protocol for traffic
323 amplification purposes. There are two possible avenues of attack: In the first, an
324 adversary who wishes to attack a third-party host may send packets with 'spoofed' source
325 IP address to a node, attempting to make the node send a larger response to the victim
326 endpoint. In the second, the adversary attempts to install a node record containing the
327 victim's endpoint in the DHT, causing other nodes to direct packets to the victim.
328
329 The handshake handles the first kind of attack by responding with a small WHOAREYOU packet
330 whenever any request is received from an unknown endpoint. This is safe because the
331 adversary's packet is always larger than the WHOAREYOU response, removing the incentive
332 for the attack. To make the countermeasure work, implementations must keep session secrets
333 not just per node ID, but also per node IP.
334
335 The second kind of attack--- installing the victim as a node ---is handled by requiring
336 that implementations mustn't answer queries with nodes whose liveness hasn't been
337 verified. When a node is added to the Kademlia table, it must pass at least one check on
338 the IP declared in the node record before it can be returned in a NODES response.
339
340 An adversary may also try to replay previously sent/seen packets to impersonate a node or
341 disturb the operation of the protocol. Session keys per node-ID/IP generally prevent
342 replay across sessions. The `request-id`, mirrored in response packets, prevents replay of
343 responses within a session.
344
345 ## The Topic Index
346
347 Using FINDNODE queries with appropriately chosen targets, the entire DHT can be sampled by
348 a random walk to find all other participants. When building a distributed application, it
349 is often desirable to restrict the search to participants which provide a certain service.
350 A simple solution to this problem would be to simply split up the network and require
351 participation in many smaller application-specific networks. However, such networks are
352 hard to bootstrap and also more vulnerable to attacks which could isolate nodes.
353
354 The topic index provides discovery by provided service in a different way. Nodes maintain
355 a single node table tracking their neighbors and advertise 'topics' on nodes found by
356 randomly walking the DHT. While the 'global' topic index can be also spammed, it makes
357 complete isolation a lot harder. To prevent nodes interested in a certain topic from
358 finding each other, the entire discovery network would have to be overpowered.
359
360 To make the index useful, searching for nodes by topic must be efficient regardless of the
361 number of advertisers. This is achieved by estimating the topic 'radius', i.e. the
362 percentage of all live nodes which are advertising the topic. Advertisement and search
363 activities are restricted to a region of DHT address space around the topic's 'center'.
364
365 We also want the index to satisfy another property: When a topic advertisement is placed,
366 it should last for a well-defined amount of time. This ensures nodes may rely on their
367 advertisements staying placed rather than worrying about keeping them alive.
368
369 Finally, the index should consume limited resources. Just as the node table is limited in
370 number and size of buckets, the size of the index data structure on each node is limited.
371
372 ### Why should advertisers wait?
373
374 Advertisers must wait a certain amount of time before they can be registered. Enforcing
375 this time limit prevents misuse of the topic index because any topic must be important
376 enough to outweigh the cost of waiting. Imagine a group phone call: announcing the
377 participants of the call using topic advertisement isn't a good use of the system because
378 the topic exists only for a short time and will have very few participants. The waiting
379 time prevents using the index for this purpose because the call might already be over
380 before everyone could get registered.
381
382 ### Dealing with Topic Spam
383
384 Our model is based on the following assumptions:
385
386 - Anyone can place their own advertisements under any topics and the rate of placing ads
387 is not limited globally. The number of active ads for any node is roughly proportional
388 to the resources (network bandwidth, mostly) spent on advertising.
389 - Honest actors whose purpose is to connect to other honest actors will spend an adequate
390 amount of efforts on registering and searching for ads, depending on the rate of newly
391 established connections they are targeting. If the given topic is used only by honest
392 actors, a few registrations per minute will be satisfactory, regardless of the size of
393 the subnetwork.
394 - Dishonest actors may want to place an excessive amount of ads just to disrupt the
395 discovery service. This will reduce the effectiveness of honest registration efforts by
396 increasing the topic radius and/or topic queue waiting times. If the attacker(s) can
397 place a comparable amount or more ads than all honest actors combined then the rate of
398 new (useful) connections established throughout the network will reduce proportionally
399 to the `honest / (dishonest + honest)` registration rates.
400
401 This adverse effect can be countered by honest actors increasing their registration and
402 search efforts. Fortunately, the rate of established connections between them will
403 increase proportionally both with increased honest registration and search efforts. If
404 both are increased in response to an attack, the required factor of increased efforts from
405 honest actors is proportional to the square root of the attacker's efforts.
406
407 ### Detecting a useless registration attack
408
409 In the case of a symmetrical protocol, where nodes are both searching and advertising
410 under the same topic, it is easy to detect when most of the found ads turn out to be
411 useless and increase both registration and query frequency. It is a bit harder but still
412 possible with asymmetrical (client-server) protocols, where only clients can easily detect
413 useless registrations, while advertisers (servers) do not have a direct way of detecting
414 when they should increase their advertising efforts. One possible solution is for servers
415 to also act as clients just to test the server capabilities of other advertisers. It is
416 also possible to implement a feedback system between trusted clients and servers.
417
418 # References
419
420 - Petar Maymounkov and David Mazières.
421 *Kademlia: A Peer-to-peer Information System Based on the XOR Metric.* 2002.\
422 <https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf>
423
424 - Atul Singh, Tsuen-Wan “Johnny” Ngan, Peter Druschel, Dan S. Wallach.
425 *Eclipse Attacks on Overlay Networks: Threats and Defenses*. 2006.\
426 <https://www.cs.rice.edu/~dwallach/pub/eclipse-infocom06.pdf>
427
428 - Ingmar Baumgart and Sebastian Mies.
429 *S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing.* 2007.\
430 <https://telematics.tm.kit.edu/publications/Files/267/SKademlia_2007.pdf>
431
432 - Xin Sun, Ruben Torres and Sanjay Rao. *Feasiblity of DDoS Attacks with P2P Systems and
433 Prevention through Robust Membership Management.* 2007.\
434 <https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1357&context=ecetr>
435
436 - Erik Hjelmvik, Wolfgang John. *Breaking and Improving Protocol Obfuscation.* 2010.\
437 <https://internetstiftelsen.se/docs/hjelmvik_breaking.pdf>
438
439 - Adam Langley, Wan-Teh Chang. *QUIC Crypto*. 2016.\
440 <https://docs.google.com/document/d/1g5nIXAIkN_Y-7XJW5K45IblHd_L2f5LTaDUDwvZ5L6g>
441
442 - W3C Credentials Community Group. *Decentralized Identifiers (DIDs) Spec.* 2017.\
443 <https://w3c-ccg.github.io/did-spec>
444
445 - Seoung Kyun Kim, Zane Ma, Siddharth Murali, Joshua Mason, Andrew Miller, Michael Bailey.
446 *Measuring Ethereum Network Peers*. 2018.\
447 <http://mdbailey.ece.illinois.edu/publications/imc18_ethereum.pdf>
448
449 - Yuval Marcus, Ethan Heilman, Sharon Goldberg.
450 *Low-Resource Eclipse Attacks on Ethereum’s Peer-to-Peer Network.* 2018.\
451 <https://eprint.iacr.org/2018/236.pdf>
452
453 [wire protocol]: ./discv5-wire.md
454 [topic-based node index]: ./discv5-theory.md#topic-advertisement
455 [node records]: ../enr.md