# Node Discovery Protocol

This specification defines the Node Discovery protocol version 4, a Kademlia-like DHT that
stores information about Ethereum nodes. The Kademlia structure was chosen because it is
an efficient way to organize a distributed index of nodes and yields a topology of low
diameter.

The current protocol version is **4**. You can find a list of changes in past protocol
versions at the end of this document.

## Node Identities

Every node has a cryptographic identity, a key on the secp256k1 elliptic curve. The public
key of the node serves as its identifier or 'node ID'.

The 'distance' between two node keys is the bitwise exclusive or of the hashes of the
public keys, interpreted as a number.

    distance(n₁, n₂) = keccak256(n₁) XOR keccak256(n₂)

## Node Records

Participants in the Discovery Protocol are expected to maintain a [node record] \(ENR\)
containing up-to-date information. All records must use the "v4" identity scheme. Other
nodes may request the local record at any time by sending an [ENRRequest] packet.

To resolve the current record of any node public key, perform a Kademlia lookup using
[FindNode] packets. When the node is found, send ENRRequest to it and return the record
from the response.

## Kademlia Table

Nodes in the Discovery Protocol keep information about other nodes in their neighborhood.
Neighbor nodes are stored in a routing table consisting of 'k-buckets'. For each `i` in
`0 ≤ i < 256`, every node keeps a k-bucket of neighbors with distance
`2^i ≤ distance < 2^(i+1)` from itself.

The Node Discovery Protocol uses `k = 16`, i.e. every k-bucket contains up to 16 node
entries. The entries are sorted by time last seen: least-recently seen node at the head,
most-recently seen at the tail.

Whenever a new node N₁ is encountered, it can be inserted into the corresponding bucket.
If the bucket contains fewer than `k` entries, N₁ can simply be added as the last entry. If
the bucket already contains `k` entries, the least recently seen node in the bucket, N₂,
needs to be revalidated by sending a [Ping] packet. If no reply is received from N₂, it is
considered dead and removed, and N₁ is added to the tail of the bucket.

## Endpoint Proof

To prevent traffic amplification attacks, implementations must verify that the sender of a
query participates in the discovery protocol. The sender of a packet is considered
verified if it has sent a valid [Pong] response with matching ping hash within the last 12
hours.

## Recursive Lookup

A 'lookup' locates the `k` closest nodes to a node ID.

The lookup initiator starts by picking the `α` closest nodes to the target it knows of. The
initiator then sends concurrent [FindNode] packets to those nodes. `α` is a system-wide
concurrency parameter, such as 3. In the recursive step, the initiator resends FindNode to
nodes it has learned about from previous queries. Of the `k` nodes the initiator has heard
of closest to the target, it picks `α` that it has not yet queried and resends [FindNode]
to them. Nodes that fail to respond quickly are removed from consideration until and
unless they do respond.

If a round of FindNode queries fails to return a node any closer than the closest already
seen, the initiator resends FindNode to all of the `k` closest nodes it has not
already queried. The lookup terminates when the initiator has queried and gotten responses
from the `k` closest nodes it has seen.

## Wire Protocol

Node discovery messages are sent as UDP datagrams. The maximum size of any packet is 1280
bytes.

    packet = packet-header || packet-data

Every packet starts with a header:

    packet-header = hash || signature || packet-type
    hash = keccak256(signature || packet-type || packet-data)
    signature = sign(packet-type || packet-data)

The `hash` exists to make the packet format recognizable when running multiple protocols
on the same UDP port. It serves no other purpose.

Every packet is signed by the node's identity key. The `signature` is encoded as a byte
array of length 65 as the concatenation of the signature values `r`, `s` and the 'recovery
id' `v`.

The `packet-type` is a single byte defining the type of message. Valid packet types are
listed below. Data after the header is specific to the packet type and is encoded as an
RLP list. Implementations should ignore any additional elements in the `packet-data` list
as well as any extra data after the list.

### Ping Packet (0x01)

    packet-data = [version, from, to, expiration, enr-seq, ...]
    version = 4
    from = [sender-ip, sender-udp-port, sender-tcp-port]
    to = [recipient-ip, recipient-udp-port, 0]

The `expiration` field is an absolute UNIX time stamp. Packets containing a time stamp
that lies in the past are expired and may not be processed.

The `enr-seq` field is the current ENR sequence number of the sender. This field is
optional.

When a ping packet is received, the recipient should reply with a [Pong] packet. It may
also consider the sender for addition into the local table. Implementations should ignore
any mismatches in version.

If no communication with the sender has occurred within the last 12h, a ping should be
sent in addition to pong in order to receive an endpoint proof.

### Pong Packet (0x02)

    packet-data = [to, ping-hash, expiration, enr-seq, ...]

Pong is the reply to ping.

`ping-hash` should be equal to `hash` of the corresponding ping packet.
Implementations
should ignore unsolicited pong packets that do not contain the hash of the most recent
ping packet.

The `enr-seq` field is the current ENR sequence number of the sender. This field is
optional.

### FindNode Packet (0x03)

    packet-data = [target, expiration, ...]

A FindNode packet requests information about nodes close to `target`. The `target` is a
64-byte secp256k1 public key. When FindNode is received, the recipient should reply with
[Neighbors] packets containing the closest 16 nodes to target found in its local table.

To guard against traffic amplification attacks, Neighbors replies should only be sent if
the sender of FindNode has been verified by the endpoint proof procedure.

### Neighbors Packet (0x04)

    packet-data = [nodes, expiration, ...]
    nodes = [[ip, udp-port, tcp-port, node-id], ...]

Neighbors is the reply to [FindNode].

### ENRRequest Packet (0x05)

    packet-data = [expiration]

When a packet of this type is received, the node should reply with an ENRResponse packet
containing the current version of its [node record].

To guard against amplification attacks, the sender of ENRRequest should have replied to a
ping packet recently (just like for FindNode). The `expiration` field, a UNIX timestamp,
should be handled as for all other existing packets i.e. no reply should be sent if it
refers to a time in the past.

### ENRResponse Packet (0x06)

    packet-data = [request-hash, ENR]

This packet is the response to ENRRequest.

- `request-hash` is the hash of the entire ENRRequest packet being replied to.
- `ENR` is the node record.

The recipient of the packet should verify that the node record is signed by the public key
which signed the response packet.

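As a rough illustration of the header layout and the expiration rule from the Wire Protocol section, the following Python sketch splits a raw datagram into its header fields and checks a time stamp. The names `RawPacket`, `split_packet` and `is_expired` are hypothetical and not part of this specification. The sketch does not verify the hash or the signature (keccak256 and secp256k1 are not in the Python standard library), and it leaves `packet-data` as undecoded RLP bytes.

```python
import time
from typing import NamedTuple, Optional

MAX_PACKET_SIZE = 1280   # maximum datagram size stated in the Wire Protocol section
HASH_SIZE = 32           # keccak256 produces a 32-byte digest
SIGNATURE_SIZE = 65      # r || s || v
HEADER_SIZE = HASH_SIZE + SIGNATURE_SIZE + 1  # plus the single packet-type byte


class RawPacket(NamedTuple):
    """Hypothetical container for the fields of one discovery packet."""
    packet_hash: bytes
    signature: bytes
    packet_type: int
    packet_data: bytes   # still RLP-encoded; decoding is out of scope here


def split_packet(datagram: bytes) -> RawPacket:
    """Split a datagram into hash, signature, packet-type and packet-data.

    Only the sizes are checked. A real implementation would also recompute
    the hash and recover the signer's public key before trusting the packet.
    """
    if len(datagram) > MAX_PACKET_SIZE:
        raise ValueError("datagram exceeds the 1280-byte limit")
    if len(datagram) <= HEADER_SIZE:
        raise ValueError("datagram too short to hold header and packet-data")
    sig_end = HASH_SIZE + SIGNATURE_SIZE
    return RawPacket(
        packet_hash=datagram[:HASH_SIZE],
        signature=datagram[HASH_SIZE:sig_end],
        packet_type=datagram[sig_end],       # indexing bytes yields an int
        packet_data=datagram[HEADER_SIZE:],
    )


def is_expired(expiration: int, now: Optional[float] = None) -> bool:
    """Return True if the absolute UNIX time stamp lies in the past."""
    current = time.time() if now is None else now
    return expiration < current
```

A receiver could call `split_packet` on each incoming datagram, dispatch on `packet_type`, and drop the packet without replying when `is_expired` returns true for the decoded `expiration` field. The `now` parameter exists only to make the check testable with a fixed clock.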
# Change Log

## Known Issues in the Current Version

The `expiration` field present in all packets is supposed to prevent packet replay. Since
it is an absolute time stamp, the node's clock must be accurate to verify it correctly.
Since the protocol's launch in 2016 we have received countless reports about connectivity
issues related to the user's clock being wrong.

The endpoint proof is imprecise because the sender of FindNode can never be sure whether
the recipient has seen a recent enough pong. Geth handles it as follows: If no
communication with the recipient has occurred within the last 12h, initiate the procedure
by sending a ping. Wait for a ping from the other side, reply to it and then send
FindNode.

## EIP-868 (October 2019)

[EIP-868] adds the [ENRRequest] and [ENRResponse] packets. It also modifies [Ping] and
[Pong] to include the local ENR sequence number.

## EIP-8 (December 2017)

[EIP-8] mandated that implementations ignore mismatches in Ping version and any additional
list elements in `packet-data`.

[Ping]: #ping-packet-0x01
[Pong]: #pong-packet-0x02
[FindNode]: #findnode-packet-0x03
[Neighbors]: #neighbors-packet-0x04
[ENRRequest]: #enrrequest-packet-0x05
[ENRResponse]: #enrresponse-packet-0x06
[EIP-8]: https://eips.ethereum.org/EIPS/eip-8
[EIP-868]: https://eips.ethereum.org/EIPS/eip-868
[node record]: ./enr.md