/ discv5 / discv5-rationale.md
discv5-rationale.md
  1  # Node Discovery Protocol v5 - Rationale
  2  
  3  **Protocol version v5.1**
  4  
  5  Note that this specification is a work in progress and may change incompatibly without
  6  prior notice.
  7  
  8  This document explains the design requirements and security needs of Discovery v5. In
  9  addition, the document tries to gather the various vulnerabilities and threats that
 10  pertain to Kademlia-like p2p networks. Our aim is to make it plain which issues are
 11  addressed and how they are mitigated, so that the design of the [wire protocol] may be
 12  verified.
 13  
 14  # Design Requirements
 15  
 16  ## Basic Goals
 17  
 18  #### 1.1.1 Replace the Discovery v4 Endpoint Proof
 19  
 20  The existing mutual endpoint verification process is unreliable because either side may
 21  forget about a previously performed endpoint proof. If node A assumes that node B already
 22  knows about a recent PING/PONG interaction and sends FINDNODE, the request may fail.
 23  Implementations of Discovery v4 may guard against this flaw using retries, but retrying is
 24  really slow and usually not done.
 25  
 26  #### 1.1.2 Require knowledge of destination node ID for communication
 27  
 28  Make it expensive to obtain the logical node ID from discovery communications. In
 29  Discovery v4, any node can provoke responses knowing IP alone, and obtain information
 30  about a node without knowing its ID. This encourages sloppy implementations to not perform
 31  proper validation of FINDNODE results and increases the risk of DHT misuse for DDoS
 32  purposes.
 33  
 34  #### 1.1.3 Support more than one node ID cryptosystem
 35  
 36  Ensure the DHT can accommodate ENR's with multiple identity systems. This will allow
 37  identity cryptosystems other than *secp256k1/keccak256*.
 38  
 39  #### 1.1.4 Replace node information tuples with ENRs
 40  
 41  ENRs include discovery information and more. These signed, versioned records fulfill
 42  multiple requirements, such as permitting capability advertisement and transport
 43  negotiation.
 44  
 45  #### 1.1.5 Guard against Kademlia implementation flaws
 46  
 47  Discovery v4 trusts other nodes to return neighbors according to an agreed distance
 48  metric. Mismatches in implementation can make it hard for nodes to join the network, or
 49  lead to network fragmentation.
 50  
 51  #### 1.1.6 Secondary topic-based node index
 52  
 53  The protocol must support discovery of nodes via an arbitrary topic identifier. Finding
 54  nodes belonging to a topic should be as fast or faster than finding a node with a certain
 55  ID.
 56  
 57  #### 1.1.7 Change replay prevention
 58  
 59  The use of timestamps as a replay prevention mechanism in Discovery v4 has led to many
 60  complaints about connectivity when the host's clock was wrong. The protocol should be
 61  independent of the clock.
 62  
 63  #### 1.1.8 Message obfuscation
 64  
 65  The protocol should obfuscate traffic to prevent accidental packet mangling or trivial
 66  sniffing. It must also avoid inclusion of obvious markers to prevent naive blocking of
 67  discovery traffic using hard-coded packet signatures. Defense against advanced traffic
 68  analysis systems, e.g. using inter-packet timing is a secondary concern.
 69  
 70  ## Security Goals
 71  
 72  Individual potential vulnerabilities are identified below. These each represent their own
 73  risk mitigation goal.
 74  
 75  #### 1.2.1 Replay of the handshake
 76  
 77  The handshake, if successfully replayed from an older session, would allow a malicious
 78  node to occupy a former IP location, or pollute the routing table with old information.
 79  
 80  #### 1.2.2 Replay NODES
 81  
 82  A NODES response, if successfully replayed, would pollute the routing table with stale
 83  information.
 84  
 85  #### 1.2.3 Replay PONG
 86  
 87  A PONG, if successfully replayed, could convince a node that a node is live and
 88  participating when it isn't.
 89  
 90  #### 1.2.4 Kademlia redirection
 91  
 92  A FindNode response contains false endpoint information intended at directing traffic at a
 93  victim / polluting the routing table. A topic query results in fake endpoint information,
 94  directing traffic at a victim.
 95  
 96  #### 1.2.5 Kademlia redirection + self-propagation
 97  
 98  As 1.2.3 but the responses attempt to replicate the malicious node throughout the routing
 99  table, to amplify the source of pollution and traffic.
100  
101  #### 1.2.6 Unsolicited replies
102  
103  A malicious node is attempting to spam a node with fake responses to typical requests.
104  These messages may be replayed from previous communications, or may be new messages with
105  spoofed source endpoints. The aim is to disrupt weak implementations or have their
106  information be received as authentic, to pollute the recipient's routing table.
107  
108  #### 1.2.7 Amplification
109  
110  Malicious requests of small message size are sent from spoofed source IPs to direct larger
111  response messages at the victim.
112  
113  #### 1.2.8 Kademlia direct validation
114  
115  Direct validation of a newly discovered node can be an attack vector. A malicious node may
116  supply false node information with the IP of a victim. Validation traffic is then directed
117  at the victim.
118  
119  #### 1.2.9 Kademlia ID count per address validations
120  
121  There are various attacks facilitated by being able to associate multiple fake (or even
122  real) malicious node ids with a single IP endpoint. One mitigation method that is
123  sometimes considered is to globally limit the number of logical node IDs that can be
124  associated with an IP address. However, this is an attack vector. A malicious actor can
125  supply many logical node ids for a single IP address and thus prevent the correct node
126  from being able to join the network.
127  
128  #### 1.2.10 Sybil/Eclipse attacks
129  
130  These attacks rely on being able to create many real nodes, or spoof many logical node IDs
131  for a small number of physical endpoints, to form a large, isolated area of the network
132  under the control of the malicious actor. The victim's discovery findings are directed
133  into that part of the network, either to manipulate their traffic or to fully isolate them
134  from the network.
135  
136  ## Version Interoperability / Upgrade Paths
137  
138  There are several considerations regarding the coexistence of v4 and v5 network members.
139  
140  #### 1.3.1 Transition period during network formation
141  
142  Discovery v4 clients should be able to serve as discovery v5 bootstrap nodes while the
143  number of new discovery v5 clients is still low.
144  
145  #### 1.3.2 Circumvention of 1.1.2 with v4 PING
146  
147  While a client supports both the old v4 and newer versions, it is possible for malicious
148  actors to pose as a v4 node and recover node IDs from arbitrary IP addresses. This should
149  somehow be avoided.
150  
151  # Rationale
152  
153  ## Why UDP?
154  
155  The wire protocol specification mandates the use of UDP. This may seem restrictive, but
156  use of UDP communication is an important part of the design. While there is no single
157  reason which ultimately dictates this choice, there are many reasons why the system as a
158  whole will function a lot better in the context of UDP.
159  
160  For discovery to work, all nodes must be able to communicate with each other on equal
161  footing. The network won't form properly if some nodes can only communicate with certain
162  other nodes. Incooperative NAT in between the node and the Internet can cause
163  communication failure. UDP is fundamentally easier to work with when it comes to NAT
164  traversal. No explicit hole-punching is required if the NAT setup is capable of full-cone
165  translation, i.e. a single packet sent to any other node establishes a port mapping which
166  allows packets from others to reach the node behind NAT.
167  
168  Unlike other DHT systems such as IPFS, the node discovery protocol mandates a single wire
169  protocol to be implemented by everyone. This avoids communication failures due to
170  incompatible transports and strengthens the DHT because all participants are guaranteed to
171  be reachable on the declared endpoint. It is also fundamentally simpler to reason about
172  and implement: the protocol either works in a certain context or it doesn't. If the
173  protocol cannot be used because the networking environment doesn't support UDP, another
174  discovery mechanism must be chosen.
175  
176  Another reason for UDP is communication latency: participants in the discovery protocol
177  must be able to communicate with a large number of other nodes within a short time frame
178  to establish and maintain the neighbor set and must perform regular liveness checks on
179  their neighbors. For the topic advertisement system, registrants collect tickets and must
180  use them as soon as the ticket expires to place an ad in a topic queue.
181  
182  These protocol interactions are difficult to implement in a TCP setting where connections
183  require multiple round-trips before application data can be sent and the connection
184  lifecycle needs to be maintained. An implementation of the wire protocol on a TCP-based
185  transport would either need permanent connection to hundreds of nodes, in which case the
186  application would be short on file descriptors, or establish many short-lived TCP
187  connections per second to communicate with specific nodes.
188  
189  Yet another useful property of UDP is that packets aren't required to reach their
190  destination --- intermediaries may drop arbitrary packets. This strengthens the protocol
191  because it must be designed to function even under bad connectivity. Implementations may
192  exploit the possibility of packet loss to their advantage. A participant can never tell
193  whether a certain request wasn't answered in time because the recipient chose to ignore it
194  or because their own connection isn't working. An implementation that tries to minimize
195  traffic or CPU overhead could simply drop a certain amount of packets at application level
196  to stay within self-imposed limits.
197  
198  ## Why Kademlia?
199  
200  Kademlia is a simple distributed hash table design proposed in 2002. It is commonly used
201  for file-sharing systems where content is stored by hash and distributed among
202  participants based on their 'proximity' according to the XOR distance metric.
203  
204  Node discovery is a Kademlia-inspired system but doesn't store any files, only node
205  information is relayed. We chose Kademlia primarily because the algorithm is simple and
206  understandable while providing a distributed database that scales with the number of
207  participants. Our system also relies on the routing table to allow enumeration and random
208  traversal of the whole network, i.e. all participants can be found. Most importantly,
209  having a structured network with routing enables thinking about DHT 'address space' and
210  'regions of address space'. These concepts are used to build the [topic-based node index].
211  
212  Kademlia is often criticized as a naive design with obvious weaknesses. We believe that
213  most issues with simple Kademlia can be overcome by careful programming and the benefits
214  of a simple design outweigh the cost and risks of maintaining a more complex system.
215  
216  ## Sybil and Eclipse Attacks
217  
218  The well-known 'sybil attack' is based on the observation that creating node identities is
219  essentially free. In any system using a measure of proximity among node identities, an
220  adversary may place nodes close to a chosen node by generating suitable identities. For
221  basic node discovery through network enumeration, the 'sybil attack' poses no significant
222  challenge. Sybils are a serious issue for the topic-based node index, especially for
223  topics provided by few participants, because the index relies on node distance.
224  
225  An 'eclipse attack' is usually based on generating sybil nodes with the goal of polluting
226  the victim node's routing table. Once the table is overtaken, the victim has no way to
227  find any other nodes but those controlled by the adversary. Even if creating sybil nodes
228  were somehow impossible, 'eclipsing' a node might still be achieved through other means
229  such as directing large amounts of traffic to the node. When the victim node is unable to
230  keep up regular communication with the rest of the network it may lose connection and be
231  forced into re-bootstrapping its routing table --- a situation in which it is most
232  vulnerable.
233  
234  Both the 'sybil attack' and the 'eclipse attack' must be considered for any structured
235  overlay network, and there is no single optimal solution to fully protect against these
236  attacks. However, certain implementation decisions can make them more expensive or render
237  them ineffective.
238  
239  As a general measure, implementations can place IP-based limits on the content of their
240  routing table. For example, limiting Kademlia table buckets to two nodes from every /24 IP
241  subnetwork and the whole table to 10 nodes per /24 IP subnetwork significantly increases
242  the number of hosts an attacker must control to overtake the routing table. Such limits
243  are effective because IPv4 addresses are a scarce resource. Subnetwork-based limits remain
244  effective even as IPv6 adoption progresses.
245  
246  To counter being eclipsed via repeated contact by an adversary, implementations of the
247  Kademlia table should avoid taking on new members on incoming contact unless the table is
248  well-stocked from outbound queries. Readers of the original Kademlia paper may easily
249  assume that liveness checks on bucket members should be performed just when a new node
250  tries to enter the bucket, but doing so increases the risk of emptying the table through
251  DoS. We therefore recommend to perform liveness checks on a separate schedule which is
252  independent of incoming requests. Checks may also be paused or delayed when the node is
253  under high load. The number of past liveness checks performed on a bucket member is an
254  important indicator of its age: Implementations should favor long-lived nodes and may
255  relax liveness checks according to node age.
256  
257  A well-researched countermeasure to sybil attacks is to make creation of identities
258  computationally expensive. While effective in theory, there are significant downsides to
259  this approach. Nodes on resource-constrained devices such as mobile phones may not be able
260  to solve the computational puzzle in time to join the network. Continuous advances in
261  hashing technology which speed up cryptocurrency proof-of-work algorithms show that this
262  way of securing the network requires constant adjustments to thresholds and can never beat
263  determined attackers.
264  
265  Support for mixed ENR identity schemes, described later in this document, allows for an
266  escape hatch to introduce arbitrary optional constraints (including proof-of-work) on node
267  identities. Thus, while the issue is not directly addressed at wire protocol level, there
268  is no inherent blocker for solving it as the need arises.
269  
270  ## Node Records and Their Properties
271  
272  In Discovery v5, all node information is exchanged using [node records]. Records are
273  self-signed by the node they describe and contain arbitrary key-value pairs. They also
274  contain a sequence number to determine which copy of the record is newer when multiple
275  copies are available. When a node record is changed by its owner, the sequence number
276  increases. The new record 'syncs' to neighboring nodes because they will request it during
277  liveness revalidation. The record is also 'pushed' on to newly seen nodes as part of the
278  handshake.
279  
280  Signing records prevents any intermediary node from changing the content of a record. Any
281  node's information is either available in the exact form it was published or not at all.
282  To make the system secure, proper validation of records is important. Implementations must
283  verify the signature of all received records. Implementations should also avoid sharing
284  records containing no usable IP addresses or ports and check that Internet hosts do not
285  attempt to share records containing LAN IP addresses.
286  
287  ## On Encryption
288  
289  An early draft of Discovery v5 integrated weak obfuscation based on XORing packet content
290  as an optional facility. As development of the protocol progressed, we understood that
291  traffic amplification, replay and packet authentication could all be solved by introducing
292  a real encryption scheme. The way the handshake and encryption works is primarily aimed at
293  these issues and is not supposed to ensure complete anonymity of DHT users. While it does
294  protect against passive observers, the handshake is not forward-secure and active protocol
295  participants can access node information by simply asking for it.
296  
297  Node identities can use different kinds of keys depending on the identity scheme used in
298  the node record. This has implications on the handshake because it deals with the public
299  key used to derive the identity. Implementations of Discovery v5 must agree on the set of
300  supported identity schemes to keep the network interoperable and custom code to verify the
301  handshake is required for every new scheme. We believe this is an acceptable tradeoff
302  because introducing a new kind of node identity is a rare event.
303  
304  Since the handshake performs complex cryptographic operations (ECDH, signature
305  verification) performance of the handshake is a big concern. Benchmarking the experimental
306  Go implementation shows that the handshake computation takes 500µs on a 2014-era laptop
307  using the default secp256k1/keccak256 identity scheme. That's a lot, but note the cost
308  amortizes because nodes commonly exchange multiple packets. Subsequent packets in the same
309  conversation can be decrypted and authenticated in just 2µs. The most common protocol
310  interaction is a FINDNODE or TOPICQUERY request on an unknown node with 4 NODES responses.
311  
312  To put things into perspective: encryption and authentication in Discovery v5 is still a
313  significant improvement over the authentication scheme used in Discovery v4, which
314  performs secp256k1 signature 'recovery' (benchmark: ~170µs) on every packet. A FINDNODE
315  interaction with an unknown v4 node takes 7 packets (2x PING/PONG, FINDNODE, 2x NEIGHBORS)
316  and costs 1.2ms on each side for the crypto alone. In addition, the v5 handshake reduces
317  the risk of computational DoS because it costs as much to create as it costs to verify and
318  cannot be replayed.
319  
320  ## On Amplification and Replay
321  
322  Any openly accessible packet-based system must consider misuse of the protocol for traffic
323  amplification purposes. There are two possible avenues of attack: In the first, an
324  adversary who wishes to attack a third-party host may send packets with 'spoofed' source
325  IP address to a node, attempting to make the node send a larger response to the victim
326  endpoint. In the second, the adversary attempts to install a node record containing the
327  victim's endpoint in the DHT, causing other nodes to direct packets to the victim.
328  
329  The handshake handles the first kind of attack by responding with a small WHOAREYOU packet
330  whenever any request is received from an unknown endpoint. This is safe because the
331  adversary's packet is always larger than the WHOAREYOU response, removing the incentive
332  for the attack. To make the countermeasure work, implementations must keep session secrets
333  not just per node ID, but also per node IP.
334  
335  The second kind of attack--- installing the victim as a node ---is handled by requiring
336  that implementations mustn't answer queries with nodes whose liveness hasn't been
337  verified. When a node is added to the Kademlia table, it must pass at least one check on
338  the IP declared in the node record before it can be returned in a NODES response.
339  
340  An adversary may also try to replay previously sent/seen packets to impersonate a node or
341  disturb the operation of the protocol. Session keys per node-ID/IP generally prevent
342  replay across sessions. The `request-id`, mirrored in response packets, prevents replay of
343  responses within a session.
344  
345  ## The Topic Index
346  
347  Using FINDNODE queries with appropriately chosen targets, the entire DHT can be sampled by
348  a random walk to find all other participants. When building a distributed application, it
349  is often desirable to restrict the search to participants which provide a certain service.
350  A simple solution to this problem would be to simply split up the network and require
351  participation in many smaller application-specific networks. However, such networks are
352  hard to bootstrap and also more vulnerable to attacks which could isolate nodes.
353  
354  The topic index provides discovery by provided service in a different way. Nodes maintain
355  a single node table tracking their neighbors and advertise 'topics' on nodes found by
356  randomly walking the DHT. While the 'global' topic index can be also spammed, it makes
357  complete isolation a lot harder. To prevent nodes interested in a certain topic from
358  finding each other, the entire discovery network would have to be overpowered.
359  
360  To make the index useful, searching for nodes by topic must be efficient regardless of the
361  number of advertisers. This is achieved by estimating the topic 'radius', i.e. the
362  percentage of all live nodes which are advertising the topic. Advertisement and search
363  activities are restricted to a region of DHT address space around the topic's 'center'.
364  
365  We also want the index to satisfy another property: When a topic advertisement is placed,
366  it should last for a well-defined amount of time. This ensures nodes may rely on their
367  advertisements staying placed rather than worrying about keeping them alive.
368  
369  Finally, the index should consume limited resources. Just as the node table is limited in
370  number and size of buckets, the size of the index data structure on each node is limited.
371  
372  ### Why should advertisers wait?
373  
374  Advertisers must wait a certain amount of time before they can be registered. Enforcing
375  this time limit prevents misuse of the topic index because any topic must be important
376  enough to outweigh the cost of waiting. Imagine a group phone call: announcing the
377  participants of the call using topic advertisement isn't a good use of the system because
378  the topic exists only for a short time and will have very few participants. The waiting
379  time prevents using the index for this purpose because the call might already be over
380  before everyone could get registered.
381  
382  ### Dealing with Topic Spam
383  
384  Our model is based on the following assumptions:
385  
386  - Anyone can place their own advertisements under any topics and the rate of placing ads
387    is not limited globally. The number of active ads for any node is roughly proportional
388    to the resources (network bandwidth, mostly) spent on advertising.
389  - Honest actors whose purpose is to connect to other honest actors will spend an adequate
390    amount of efforts on registering and searching for ads, depending on the rate of newly
391    established connections they are targeting. If the given topic is used only by honest
392    actors, a few registrations per minute will be satisfactory, regardless of the size of
393    the subnetwork.
394  - Dishonest actors may want to place an excessive amount of ads just to disrupt the
395    discovery service. This will reduce the effectiveness of honest registration efforts by
396    increasing the topic radius and/or topic queue waiting times. If the attacker(s) can
397    place a comparable amount or more ads than all honest actors combined then the rate of
398    new (useful) connections established throughout the network will reduce proportionally
399    to the `honest / (dishonest + honest)` registration rates.
400  
401  This adverse effect can be countered by honest actors increasing their registration and
402  search efforts. Fortunately, the rate of established connections between them will
403  increase proportionally both with increased honest registration and search efforts. If
404  both are increased in response to an attack, the required factor of increased efforts from
405  honest actors is proportional to the square root of the attacker's efforts.
406  
407  ### Detecting a useless registration attack
408  
409  In the case of a symmetrical protocol, where nodes are both searching and advertising
410  under the same topic, it is easy to detect when most of the found ads turn out to be
411  useless and increase both registration and query frequency. It is a bit harder but still
412  possible with asymmetrical (client-server) protocols, where only clients can easily detect
413  useless registrations, while advertisers (servers) do not have a direct way of detecting
414  when they should increase their advertising efforts. One possible solution is for servers
415  to also act as clients just to test the server capabilities of other advertisers. It is
416  also possible to implement a feedback system between trusted clients and servers.
417  
418  # References
419  
420  - Petar Maymounkov and David Mazières.
421    *Kademlia: A Peer-to-peer Information System Based on the XOR Metric.* 2002.\
422    <https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf>
423  
424  - Atul Singh, Tsuen-Wan “Johnny” Ngan, Peter Druschel, Dan S. Wallach.
425    *Eclipse Attacks on Overlay Networks: Threats and Defenses*. 2006.\
426    <https://www.cs.rice.edu/~dwallach/pub/eclipse-infocom06.pdf>
427  
428  - Ingmar Baumgart and Sebastian Mies.
429    *S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing.* 2007.\
430    <https://telematics.tm.kit.edu/publications/Files/267/SKademlia_2007.pdf>
431  
432  - Xin Sun, Ruben Torres and Sanjay Rao. *Feasiblity of DDoS Attacks with P2P Systems and
433    Prevention through Robust Membership Management.* 2007.\
434    <https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1357&context=ecetr>
435  
436  - Erik Hjelmvik, Wolfgang John. *Breaking and Improving Protocol Obfuscation.* 2010.\
437    <https://internetstiftelsen.se/docs/hjelmvik_breaking.pdf>
438  
439  - Adam Langley, Wan-Teh Chang. *QUIC Crypto*. 2016.\
440    <https://docs.google.com/document/d/1g5nIXAIkN_Y-7XJW5K45IblHd_L2f5LTaDUDwvZ5L6g>
441  
442  - W3C Credentials Community Group. *Decentralized Identifiers (DIDs) Spec.* 2017.\
443    <https://w3c-ccg.github.io/did-spec>
444  
445  - Seoung Kyun Kim, Zane Ma, Siddharth Murali, Joshua Mason, Andrew Miller, Michael Bailey.
446    *Measuring Ethereum Network Peers*. 2018.\
447    <http://mdbailey.ece.illinois.edu/publications/imc18_ethereum.pdf>
448  
449  - Yuval Marcus, Ethan Heilman, Sharon Goldberg.
450    *Low-Resource Eclipse Attacks on Ethereum’s Peer-to-Peer Network.* 2018.\
451    <https://eprint.iacr.org/2018/236.pdf>
452  
453  [wire protocol]: ./discv5-wire.md
454  [topic-based node index]: ./discv5-theory.md#topic-advertisement
455  [node records]: ../enr.md