idnits 2.17.1 draft-ietf-p2psip-self-tuning-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but does not include the phrase in its RFC 2119 key words list. -- The document date (July 16, 2012) is 4273 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-26) exists of draft-ietf-p2psip-base-21 ** Obsolete normative reference: RFC 5389 (Obsoleted by RFC 8489) == Outdated reference: A later version (-09) exists of draft-ietf-p2psip-concepts-04 Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 P2PSIP Working Group J. Maenpaa 3 Internet-Draft G. Camarillo 4 Intended status: Standards Track Ericsson 5 Expires: January 17, 2013 J. Hautakorpi 6 Nokia Siemens Networks 7 July 16, 2012 9 A Self-tuning Distributed Hash Table (DHT) for REsource LOcation And 10 Discovery (RELOAD) 11 draft-ietf-p2psip-self-tuning-06.txt 13 Abstract 15 REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) 16 signaling protocol that provides an overlay network service. Peers 17 in a RELOAD overlay network collectively run an overlay algorithm to 18 organize the overlay, and to store and retrieve data. This document 19 describes how the default topology plugin of RELOAD can be extended 20 to support self-tuning, that is, to adapt to changing operating 21 conditions such as churn and network size. 23 Status of this Memo 25 This Internet-Draft is submitted to IETF in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet- 31 Drafts is at http://datatracker.ietf.org/drafts/current/. 33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on January 17, 2013. 40 Copyright Notice 42 Copyright (c) 2012 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 58 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 59 3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 60 3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 5 61 3.2. Configuring Periodic Stabilization . . . . . . . . . . . . 6 62 3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . . 7 63 4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 7 64 5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 65 5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 9 66 5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . . 10 67 5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . . 11 68 5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 11 69 5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . . 11 70 5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 11 71 6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . . 12 72 6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 12 73 6.2. Determining Routing Table Size . . . . . . . . . . . . . . 13 74 6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 13 75 6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . . 14 76 6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . . 14 77 6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . . 15 78 6.6. Calculating the Stabilization Interval . . . . . . . . . . 16 79 7. Overlay Configuration Document Extension . . . . . . . . . . . 17 80 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 81 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 82 9.1. Message Extensions . . . . . . . . . . . . . . . . . . . . 18 83 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 84 10.1. Normative References . . . . . . . . . . . . . . . . . . . 18 85 10.2. Informative References . . . . . . . . . . . . . . . . . . 18 86 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 88 1. Introduction 90 REsource LOcation And Discovery (RELOAD) [I-D.ietf-p2psip-base] is a 91 peer-to-peer signaling protocol that can be used to maintain an 92 overlay network, and to store data in and retrieve data from the 93 overlay. For interoperability reasons, RELOAD specifies one overlay 94 algorithm, called chord-reload, that is mandatory to implement. This 95 document extends the chord-reload algorithm by introducing self- 96 tuning behavior. 98 DHT-based overlay networks are self-organizing, scalable and 99 reliable. However, these features come at a cost: peers in the 100 overlay network need to consume network bandwidth to maintain routing 101 state. Most DHTs use a periodic stabilization routine to counter the 102 undesirable effects of churn on routing. To configure the parameters 103 of a DHT, some characteristics such as churn rate and network size 104 need to be known in advance. These characteristics are then used to 105 configure the DHT in a static fashion by using fixed values for 106 parameters such as the size of the successor set, size of the routing 107 table, and rate of maintenance messages. The problem with this 108 approach is that it is not possible to achieve a low failure rate and 109 a low communication overhead by using fixed parameters. Instead, a 110 better approach is to allow the system to take into account the 111 evolution of network conditions and adapt to them. This document 112 extends the mandatory-to-implement chord-reload algorithm by making 113 it self-tuning. Two main advantages of self-tuning are that users no 114 longer need to tune every DHT parameter correctly for a given 115 operating environment and that the system adapts to changing 116 operating conditions. 118 The remainder of this document is structured as follows: Section 2 119 provides definitions of terms used in this document. Section 3 120 discusses alternative approaches to stabilization operations in DHTs, 121 including reactive stabilization, periodic stabilization, and 122 adaptive stabilization. Section 4 gives an introduction to the Chord 123 DHT algorithm. Section 5 describes how this document extends the 124 stabilization routine of the chord-reload algorithm. Section 6 125 describes how the stabilization rate and routing table size are 126 calculated in an adaptive fashion. 128 2. Terminology 130 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 131 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 132 document are to be interpreted as described in RFC 2119 [RFC2119]. 134 This document uses the terminology and definitions from the Concepts 135 and Terminology for Peer to Peer SIP [I-D.ietf-p2psip-concepts] 136 draft. 138 Chord Ring: The Chord DHT orders identifiers on an identifier circle 139 of size 2^numBitsInNodeId (numBitsInNodeId is the number of bits 140 in node identifiers). This identifier circle is called the Chord 141 ring. 143 DHT: Distributed Hash Tables (DHTs) are a class of decentralized 144 distributed systems that provide a lookup service similar to a 145 hash table. Given a key, any participating peer can retrieve the 146 value associated with that key. The responsibility for 147 maintaining the mapping from keys to values is distributed among 148 the peers. 150 Finger Table: A data structure with up to numBitsInNodeId entries 151 maintained by each peer in a Chord-based overlay. The ith entry 152 in the finger table of peer n contains the identity of the first 153 peer that succeeds n by at least 2^(numBitsInNodeId-i) on the 154 Chord ring. This peer is called the ith finger of peer n. As an 155 example, the first entry in the finger table of peer n contains a 156 peer half-way around the Chord ring from peer n. The purpose of 157 the finger table is to accelerate lookups. 159 log2(N): Logarithm of N with base 2. 161 n.id: Peer-ID of peer n. 163 Neighborhood Set: Consists of successor and predecessor lists. 165 numBitsInNodeId: Number of bits in a Node-ID. 167 O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means 168 that f(n) is less than some constant multiple of g(n). 170 Omega(g(n)): Informally, saying that some equation f(n) = 171 Omega(g(n)) means that f(n) is more than some constant multiple of 172 g(n). 174 Predecessor List: A data structure containing the predecessors of a 175 peer on the Chord ring. 177 Routing Table: The set of peers that a node can use to route overlay 178 messages. The routing table consists of the finger table, 179 successor list and predecessor list. 181 Successor List: A data structure containing the first r successors 182 of a peer on the Chord ring. 184 3. Introduction to Stabilization in DHTs 186 DHTs use stabilization routines to counter the undesirable effects of 187 churn on routing. The purpose of stabilization is to keep the 188 routing information of each peer in the overlay consistent with the 189 constantly changing overlay topology. There are two alternative 190 approaches to stabilization: periodic and reactive [rhea2004]. 191 Periodic stabilization can either use a fixed stabilization rate or 192 calculate the stabilization rate in an adaptive fashion. 194 3.1. Reactive vs. Periodic Stabilization 196 In reactive stabilization, a peer reacts to the loss of a peer in its 197 neighborhood set or to the appearance of a new peer that should be 198 added to its neighborhood set by sending a copy of its neighbor table 199 to all peers in the neighborhood set. Periodic recovery, in 200 contrast, takes place independently of changes in the neighborhood 201 set. In periodic recovery, a peer periodically shares its 202 neighborhood set with each or a subset of the members of that set. 204 The chord-reload algorithm [I-D.ietf-p2psip-base] supports both 205 reactive and periodic stabilization. It has been shown in [rhea2004] 206 that reactive stabilization works well for small neighborhood sets 207 (i.e., small overlays) and moderate churn. However, in large-scale 208 (e.g., 1000 peers or more [rhea2004]) or high-churn overlays, 209 reactive stabilization runs the risk of creating a positive feedback 210 cycle, which can eventually result in congestion collapse. In 211 [rhea2004], it is shown that a 1000-peer overlay under churn uses 212 significantly less bandwidth and has lower latencies when periodic 213 stabilization is used than when reactive stabilization is used. 214 Although in the experiments carried out in [rhea2004], reactive 215 stabilization performed well when there was no churn, its bandwidth 216 use was observed to jump dramatically under churn. At higher churn 217 rates and larger scale overlays, periodic stabilization uses less 218 bandwidth and the resulting lower contention for the network leads to 219 lower latencies. For this reason, most DHTs such as CAN [CAN], Chord 220 [Chord], Pastry [Pastry], Bamboo [rhea2004], etc. use periodic 221 stabilization [ghinita2006]. As an example, the first version of 222 Bamboo used reactive stabilization, which caused Bamboo to suffer 223 from degradation in performance under churn. To fix this problem, 224 Bamboo was modified to use periodic stabilization. 226 In Chord, periodic stabilization is typically done both for 227 successors and fingers. An alternative strategy is analyzed in 228 [krishnamurthy2008]. In this strategy, called the correction-on- 229 change maintenance strategy, a peer periodically stabilizes its 230 successors but does not do so for its fingers. Instead, finger 231 pointers are stabilized in a reactive fashion. The results obtained 232 in [krishnamurthy2008] imply that although the correction-on-change 233 strategy works well when churn is low, periodic stabilization 234 outperforms the correction-on-change strategy when churn is high. 236 3.2. Configuring Periodic Stabilization 238 When periodic stabilization is used, one faces the problem of 239 selecting an appropriate execution rate for the stabilization 240 procedure. If the execution rate of periodic stabilization is high, 241 changes in the system can be quickly detected, but at the 242 disadvantage of increased communication overhead. Alternatively, if 243 the stabilization rate is low and the churn rate is high, routing 244 tables become inaccurate and DHT performance deteriorates. Thus, the 245 problem is setting the parameters so that the overlay achieves the 246 desired reliability and performance even in challenging conditions, 247 such as under heavy churn. This naturally results in high cost 248 during periods when the churn level is lower than expected, or 249 alternatively, poor performance or even network partitioning in worse 250 than expected conditions. 252 In addition to selecting an appropriate stabilization interval, 253 regardless of whether periodic stabilization is used or not, an 254 appropriate size needs to be selected for the neighborhood set and 255 for the finger table. 257 The current approach is to configure overlays statically. This works 258 in situations where perfect information about the future is 259 available. In situations where the operating conditions of the 260 network are known in advance and remain static throughout the 261 lifetime of the system, it is possible to choose fixed optimal values 262 for parameters such as stabilization rate, neighborhood set size and 263 routing table size. However, if the operating conditions (e.g., the 264 size of the overlay and its churn rate) do not remain static but 265 evolve with time, it is not possible to achieve both a low lookup 266 failure rate and a low communication overhead by using fixed 267 parameters [ghinita2006]. 269 As an example, to configure the Chord DHT algorithm, one needs to 270 select values for the following parameters: size of successor list, 271 stabilization interval, and size of the finger table. To select an 272 appropriate value for the stabilization interval, one needs to know 273 the expected churn rate and overlay size. According to 275 [liben-nowell2002], a Chord network in a ring-like state remains in a 276 ring-like state as long as peers send Omega(log2^2(N)) messages 277 before N new peers join or N/2 peers fail. Thus, in a 500-peer 278 overlay churning at a rate such that one peer joins and one peer 279 leaves the network every 30 seconds, an appropriate stabilization 280 interval would be on the order of 93s. According to [Chord], the 281 size of the successor list and finger table should be on the order of 282 log2(N). Having a successor list of size O(log2(N)) makes it 283 unlikely that a peer will lose all of its successors, which would 284 cause the Chord ring to become disconnected. Thus, in a 500-peer 285 network each peer should maintain on the order of nine successors and 286 fingers. However, if the churn rate doubles and the network size 287 remains unchanged, the stabilization rate should double as well. 288 That is, the appropriate maintenance interval would now be on the 289 order of 46s. On the other hand, if the churn rate becomes e.g. six- 290 fold and the size of the network grows to 2000 peers, on the order of 291 eleven fingers and successors should be maintained and the 292 stabilization interval should be on the order of 42s. If one 293 continued using the old values, this could result in inaccurate 294 routing tables, network partitioning, and deteriorating performance. 296 3.3. Adaptive Stabilization 298 A self-tuning DHT takes into consideration the continuous evolution 299 of network conditions and adapts to them. In a self-tuning DHT, each 300 peer collects statistical data about the network and dynamically 301 adjusts its stabilization rate, neighborhood set size, and finger 302 table size based on the analysis of the data [ghinita2006]. 303 Reference [mahajan2003] shows that by using self-tuning, it is 304 possible to achieve high reliability and performance even in adverse 305 conditions with low maintenance cost. Adaptive stabilization has 306 been shown to outperform periodic stabilization in terms of both 307 lookup failures and communication overhead [ghinita2006]. 309 4. Introduction to Chord 311 Chord [Chord] is a structured P2P algorithm that uses consistent 312 hashing to build a DHT out of several independent peers. Consistent 313 hashing assigns each peer and resource a fixed-length identifier. 314 Peers use SHA-1 as the base hash fuction to generate the identifiers. 315 As specified in RELOAD base, the length of the identifiers is 316 numBitsInNodeId=128 bits. The identifiers are ordered on an 317 identifier circle of size 2^numBitsInNodeId. On the identifier 318 circle, key k is assigned to the first peer whose identifier equals 319 or follows the identifier of k in the identifier space. The 320 identifier circle is called the Chord ring. 322 Different DHTs differ significantly in performance when bandwidth is 323 limited. It has been shown that when compared to other DHTs, the 324 advantages of Chord include that it uses bandwidth efficiently and 325 can achieve low lookup latencies at little cost [li2004]. 327 A simple lookup mechanism could be implemented on a Chord ring by 328 requiring each peer to only know how to contact its current successor 329 on the identifier circle. Queries for a given identifier could then 330 be passed around the circle via the successor pointers until they 331 encounter the first peer whose identifier is equal to or larger than 332 the desired identifier. Such a lookup scheme uses a number of 333 messages that grows linearly with the number of peers. To reduce the 334 cost of lookups, Chord maintains also additional routing information; 335 each peer n maintains a data structure with up to numBitsInNodeId 336 entries, called the finger table. The first entry in the finger 337 table of peer n contains the peer half-way around the ring from peer 338 n. The second entry contains the peer that is 1/4th of the way 339 around, the third entry the peer that is 1/8th of the way around, 340 etc. In other words, the ith entry in the finger table at peer n 341 contains the identity of the first peer s that succeeds n by at least 342 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the ith 343 finger of peer n. The interval between two consecutive fingers is 344 called a finger interval. The ith finger interval of peer n covers 345 the range [n.id + 2^(numBitsInNodeId-i), n.id + 346 2^(numBitsInNodeId-i+1)) on the Chord ring. In an N-peer network, 347 each peer maintains information about O(log2(N)) other peers in its 348 finger table. As an example, if N=100000, it is sufficient to 349 maintain 17 fingers. 351 Chord needs all peers' successor pointers to be up to date in order 352 to ensure that lookups produce correct results as the set of 353 participating peers changes. To achieve this, peers run a 354 stabilization protocol periodically in the background. The 355 stabilization protocol of the original Chord algorithm uses two 356 operations: successor stabilization and finger stabilization. 357 However, the Chord algorithm of RELOAD base defines two additional 358 stabilization components, as will be discussed below. 360 To increase robustness in the event of peer failures, each Chord peer 361 maintains a successor list of size r, containing the peer's first r 362 successors. The benefit of successor lists is that if each peer 363 fails independently with probability p, the probability that all r 364 successors fail simultaneously is only p^r. 366 The original Chord algorithm maintains only a single predecessor 367 pointer. However, multiple predecessor pointers (i.e., a predecessor 368 list) can be maintained to speed up recovery from predecessor 369 failures. The routing table of a peer consists of the successor 370 list, finger table, and predecessor list. 372 5. Extending Chord-reload to Support Self-tuning 374 This section describes how the mandatory-to-implement chord-reload 375 algorithm defined in RELOAD base [I-D.ietf-p2psip-base] can be 376 extended to support self-tuning. 378 The chord-reload algorithm supports both reactive and periodic 379 recovery strategies. When the self-tuning mechanisms defined in this 380 document are used, the periodic recovery strategy MUST be used. 381 Further, chord-reload specifies that at least three predecessors and 382 three successors need to be maintained. When the self-tuning 383 mechanisms are used, the appropriate sizes of the successor list and 384 predecessor list are determined in an adaptive fashion based on the 385 estimated network size, as will be described in Section 6. 387 As specified in RELOAD base, each peer MUST maintain a stabilization 388 timer. When the stabilization timer fires, the peer MUST restart the 389 timer and carry out the overlay stabilization routine. Overlay 390 stabilization has four components in chord-reload: 392 1. Update the neighbor table. We refer to this as neighbor 393 stabilization. 394 2. Refreshing the finger table. We refer to this as finger 395 stabilization. 396 3. Adjusting finger table size. 397 4. Detecting partitioning. We refer to this as strong 398 stabilization. 400 As specified in RELOAD base [I-D.ietf-p2psip-base], a peer sends 401 periodic messages as part of the neighbor stabilization, finger 402 stabilization, and strong stabilization routines. In neighbor 403 stabilization, a peer periodically sends an Update request to every 404 peer in its Connection Table. The default time is every ten minutes. 405 In finger stabilization, a peer periodically searches for new peers 406 to include in its finger table. This time defaults to one hour. 407 This document specifies how the neighbor stabilization and finger 408 stabilization intervals can be determined in an adaptive fashion 409 based on the operating conditions of the overlay. The subsections 410 below describe how this document extends the four components of 411 stabilization. 413 5.1. Update Requests 415 As described in RELOAD base [I-D.ietf-p2psip-base], the neighbor and 416 finger stabilization procedures are implemented using Update 417 requests. RELOAD base defines three types of Update requests: 418 'peer_ready', 'neighbors', and 'full'. Regardless of the type, all 419 Update requests include an 'uptime' field. Since the self-tuning 420 extensions require information on the uptimes of peers in the routing 421 table, the sender of an Update request MUST include its current 422 uptime in seconds in the 'uptime' field. 424 When self-tuning is used, each peer decides independently the 425 appropriate size for the successor list, predecessor list and finger 426 table. Thus, the 'predecessors', 'successors', and 'fingers' fields 427 included in RELOAD Update requests are of variable length. As 428 specified in RELOAD [I-D.ietf-p2psip-base], variable length fields 429 are on the wire preceded by length bytes. In the case of the 430 successor list, predecessor list, and finger table, there are two 431 length bytes (allowing lengths up to 2^16-1). The number of NodeId 432 structures included in each field can be calculated based on the 433 length bytes since the size of a single NodeId structure is 16 bytes. 434 If a peer receives more entries than fit into its successor list, 435 predecessor list or finger table, the peer MUST ignore the extra 436 entries. If a peer receives less entries than it currently has in 437 its own data structure, the peer MUST NOT drop the extra entries from 438 its data structure. 440 5.2. Neighbor Stabilization 442 In the neighbor stabilization operation of chord-reload, a peer 443 periodically sends an Update request to every peer in its Connection 444 Table. In a small, low-churn overlay, the amount of traffic this 445 process generates is typically acceptable. However, in a large-scale 446 overlay churning at a moderate or high churn rate, the traffic load 447 may no longer be acceptable since the size of the connection table is 448 large and the stabilization interval relatively short. The self- 449 tuning mechanisms described in this document are especially designed 450 for overlays of the latter type. Therefore, when the self-tuning 451 mechanisms are used, each peer MUST send a periodic Update request 452 only to its first predecessor and first successor on the Chord ring. 454 The neighbor stabilization routine MUST be executed when the 455 stabilization timer fires. To begin the neighbor stabilization 456 routine, a peer MUST send an Update request to its first successor 457 and its first predecessor. The type of the Update request MUST be 458 'neighbors'. The Update request MUST include the successor and 459 predecessor lists of the sender. If a peer receiving such an Update 460 request learns from the predecessor and successor lists included in 461 the request that new peers can be included in its neighborhood set, 462 it MUST send Attach requests to the new peers. 464 After a new peer has been added to the predecessor or successor list, 465 an Update request of type 'peer_ready' MUST be sent to the new peer. 466 This allows the new peer to insert the sender into its neighborhood 467 set. 469 5.3. Finger Stabilization 471 Chord-reload specifies two alternative methods for searching for new 472 peers to the finger table. Both of the alternatives can be used with 473 the self-tuning extensions defined in this document. 475 Immediately after a new peer has been added to the finger table, a 476 Probe request MUST be sent to the new peer to fetch its uptime. The 477 requested_info field of the Probe request MUST be set to contain the 478 ProbeInformationType 'uptime' defined in RELOAD base 479 [I-D.ietf-p2psip-base]. 481 5.4. Adjusting Finger Table Size 483 The chord-reload algorithm defines how a peer can make sure that the 484 finger table is appropriately sized to allow for efficient routing. 485 Since the self-tuning mechanisms specified in this document produce a 486 network size estimate, this estimate can be directly used to 487 calculate the optimal size for the finger table. This mechanism MUST 488 be used instead of the one specified by chord-reload. A peer MUST 489 use the network size estimate to determine whether it needs to adjust 490 the size of its finger table each time when the stabilization timer 491 fires. The way this is done is explained in Section 6.2. 493 5.5. Detecting Partitioning 495 This document does not require any changes to the mechanism chord- 496 reload uses to detect network partitioning. 498 5.6. Leaving the Overlay 500 As specified in RELOAD base [I-D.ietf-p2psip-base], a leaving peer 501 SHOULD send a Leave request to all members of its neighbor table 502 prior to leaving the overlay. The overlay_specific_data field MUST 503 contain the ChordLeaveData structure. The Leave requests that are 504 sent to successors MUST contain the predecessor list of the leaving 505 peer. The Leave requests that are sent to the predecessors MUST 506 contain the successor list of the leaving peer. If a given successor 507 can identify better predecessors than are already included in its 508 predecessor lists by investigating the predecessor list it receives 509 from the leaving peer, it MUST send Attach requests to them. 510 Similarly, if a given predecessor identifies better successors by 511 investigating the successor list it receives from the leaving peer, 512 it MUST send Attach requests to them. 514 6. Self-tuning Chord Parameters 516 This section specifies how to determine an appropriate stabilization 517 rate and routing table size in an adaptive fashion. The proposed 518 mechanism is based on [mahajan2003], [liben-nowell2002], and 519 [ghinita2006]. To calculate an appropriate stabilization rate, the 520 values of three parameters MUST be estimated: overlay size N, failure 521 rate U, and join rate L. To calculate an appropriate routing table 522 size, the estimated network size N can be used. Peers in the overlay 523 MUST re-calculate the values of the parameters to self-tune the 524 chord-reload algorithm at the end of each stabilization period before 525 re-starting the stabilization timer. 527 6.1. Estimating Overlay Size 529 Techniques for estimating the size of an overlay network have been 530 proposed for instance in [mahajan2003], [horowitz2003], 531 [kostoulas2005], [binzenhofer2006], and [ghinita2006]. In Chord, the 532 density of peer identifiers in the neighborhood set can be used to 533 produce an estimate of the size of the overlay, N [mahajan2003]. 534 Since peer identifiers are picked randomly with uniform probability 535 from the numBitsInNodeId-bit identifier space, the average distance 536 between peer identifiers in the successor set is 537 (2^numBitsInNodeId)/N. 539 To estimate the overlay network size, a peer MUST compute the average 540 inter-peer distance d between the successive peers starting from the 541 most distant predecessor and ending to the most distant successor in 542 the successor list. The estimated network size MUST be calculated 543 as: 545 2^numBitsInNodeId 546 N = ------------------- 547 d 549 This estimate has been found to be accurate within 15% of the real 550 network size [ghinita2006]. Of course, the size of the neighborhood 551 set affects the accuracy of the estimate. 553 During the join process, a joining peer fills its routing table by 554 sending a series of Ping and Attach requests, as specified in RELOAD 555 base [I-D.ietf-p2psip-base]. Thus, a joining peer immediately has 556 enough information at its disposal to calculate an estimate of the 557 network size. 559 6.2. Determining Routing Table Size 561 As specified in RELOAD base, the finger table must contain at least 562 16 entries. When the self-tuning mechanisms are used, the size of 563 the finger table MUST be set to max(log2(N), 16) using the estimated 564 network size N. 566 The size of the successor list MUST be set to log2(N). An 567 implementation MAY place a lower limit on the size of the successor 568 list. As an example, the implementation might require the size of 569 the successor list to be always at least three. 571 A peer MAY choose to maintain a fixed-size predecessor list with only 572 three entries as specified in RELOAD base. However, it is 573 RECOMMENDED that a peer maintains log2(N) predecessors. 575 6.3. Estimating Failure Rate 577 A typical approach is to assume that peers join the overlay according 578 to a Poisson process with rate L and leave according to a Poisson 579 process with rate parameter U [mahajan2003]. The value of U can be 580 estimated using peer failures in the finger table and neighborhood 581 set [mahajan2003]. If peers fail with rate U, a peer with M unique 582 peer identifiers in its routing table should observe K failures in 583 time K/(M*U). Every peer in the overlay MUST maintain a history of 584 the last K failures. The current time MUST be inserted into the 585 history when the peer joins the overlay. The estimate of U MUST be 586 calculated as: 588 k 589 U = --------, 590 M * Tk 592 where M is the number of unique peer identifiers in the routing 593 table, Tk is the time between the first and the last failure in the 594 history, and k is the number of failures in the history. If k is 595 smaller than K, the estimate MUST be computed as if there was a 596 failure at the current time. It has been shown that an estimate 597 calculated in a similar manner is accurate within 17% of the real 598 value of U [ghinita2006]. 600 The size of the failure history K affects the accuracy of the 601 estimate of U. One can increase the accuracy by increasing K. 602 However, this has the side effect of decreasing responsiveness to 603 changes in the failure rate. On the other hand, a small history size 604 may cause a peer to overreact each time a new failure occurs. In 605 [ghinita2006], K is set 25% of the routing table size. Use of this 606 approach is RECOMMENDED. 608 6.3.1. Detecting Failures 610 A new failure MUST be inserted to the failure history in the 611 following cases: 613 1. A Leave request is received from a neigbhor. 614 2. A peer fails to reply to a Ping request sent in the situation 615 explained below. If no packets have been received on a 616 connection during the past 2*Tr seconds (where Tr is the 617 inactivity timer defined by ICE [I-D.ietf-mmusic-ice]), a RELOAD 618 Ping request MUST be sent to the remote peer. RELOAD mandates 619 the use of STUN [RFC5389] for keepalives. STUN keepalives take 620 the form of STUN Binding Indication transactions. As specified 621 in ICE [I-D.ietf-mmusic-ice], a peer sends a STUN Binding 622 Indication if there has been no packet sent on a connection for 623 Tr seconds. Tr is configurable and has a default of 15 seconds. 624 Although STUN Binding Indications do not generate a response, the 625 fact that a peer has failed can be learned from the lack of 626 packets (Binding Indications or application protocol packets) 627 received from the peer. If the remote peer fails to reply to the 628 Ping request, the sender MUST consider the remote peer to have 629 failed. 631 As an alternative to relying on STUN keepalives to detect peer 632 failure, a peer could send additional, frequent RELOAD messages to 633 every peer in its Connection Table. These messages could be Update 634 requests, in which case they would serve two purposes: detecting peer 635 failure and stabilization. However, as the cost of this approach can 636 be very high in terms of bandwidth consumption and traffic load, 637 especially in large-scale overlays experiencing churn, its use is NOT 638 RECOMMENDED. 640 6.4. Estimating Join Rate 642 Reference [ghinita2006] proposes that a peer can estimate the join 643 rate based on the uptime of the peers in its routing table. An 644 increase in peer join rate will be reflected by a decrease in the 645 average age of peers in the routing table. Thus, each peer MUST 646 maintain an array of the ages of the peers in its routing table 647 sorted in increasing order. Using this information, an estimate of 648 the global peer join rate L MUST be calculated as: 650 N 1 651 L = --- * ---------------, 652 4 Ages[rsize/4] 654 where Ages is an array containing the ages of the peers in the 655 routing table sorted in increasing order and rsize is the size of the 656 routing table. It is RECOMMENDED that only the ages of the 25% of 657 the youngest peers in the routing table (i.e., the 25th percentile) 658 are used to reduce the bias that a small number of peers with very 659 old ages can cause [ghinita2006]. It has been shown that the 660 estimate obtained by using this method is accurate within 22% of the 661 real join rate [ghinita2006]. Of course, the size of the routing 662 table affects the accuracy. 664 In order for this mechanism to work, peers need to exchange 665 information about the time they have been present in the overlay. 666 Peers receive the uptimes of their successors and predecessors during 667 the stabilization operations since all Update requests carry uptime 668 values. A joining peer learns the uptime of the admitting peer since 669 it receives an Update from the admitting peer during the join 670 procedure. Peers learn the uptimes of new fingers since they can 671 fetch the uptime using a Probe request after having attached to the 672 new finger. 674 6.5. Estimate Sharing 676 To improve the accuracy of network size, join rate, and leave rate 677 estimates, peers MUST share their estimates. When the stabilization 678 timer fires, a peer MUST select number-of-peers-to-probe random peers 679 from its finger table and send each of them a Probe request. The 680 targets of Probe requests are selected from the finger table rather 681 than from the neighbor table since neighbors are likely to make 682 similar errors when calculating their estimates. number-of-peers-to- 683 probe is a new element in the overlay configuration document. It is 684 defined in Section 7 and has a default value of 4. Both the Probe 685 request and the answer returned by the target peer MUST contain a new 686 message extension whose MessageExtensionType is 'self_tuning_data'. 687 This extension type is defined in Section 9.1. The 688 extension_contents field of the MessageExtension structure MUST 689 contain a SelfTuningData structure: 691 struct { 692 uint32 network_size; 693 uint32 join_rate; 694 uint32 leave_rate; 695 } SelfTuningData; 697 The contents of the SelfTuningData structure are as follows: 699 network_size 700 The latest network size estimate calculated by the sender. 702 join_rate 703 The latest join rate estimate calculated by the sender. 705 leave_rate 706 The latest leave rate estimate calculated by the sender. 708 The join and leave rates are expressed as joins or failures per 24 709 hours. As an example, if the global join rate estimate a peer has 710 calculated is 0.123 peers/s, it would include in the join_rate field 711 the value 10627 (24*60*60*0.123 = 10627.2). 713 The 'type' field of the MessageExtension structure MUST be set to 714 contain the value 'self_tuning_data'. The 'critical' field of the 715 structure MUST be set to False. 717 A peer MUST store all estimates it receives in Probe requests and 718 answers during a stabilization interval. When the stabilization 719 timer fires, the peer MUST calculate the estimates to be used during 720 the next stabilization interval by taking the 75th percentile of a 721 data set containing its own estimate and the received estimates. 723 6.6. Calculating the Stabilization Interval 725 According to [liben-nowell2002], a Chord network in a ring-like state 726 remains in a ring-like state as long as peers send Omega(log2^2(N)) 727 messages before N new peers join or N/2 peers fail. We can use the 728 estimate of peer failure rate, U, to calculate the time Tf in which 729 N/2 peers fail: 731 1 732 Tf = ------ 733 2*U 735 Based on this estimate, a stabilization interval Tstab-1 MUST be 736 calculated as: 738 Tf 739 Tstab-1 = ----------- 740 log2^2(N) 742 On the other hand, the estimated join rate L can be used to calculate 743 the time in which N new peers join the overlay. Based on the 744 estimate of L, a stabilization interval Tstab-2 MUST be calculated 745 as: 747 N 748 Tstab-2 = --------------- 749 L * log2^2(N) 751 Finally, the actual stabilization interval Tstab that MUST be used 752 can be obtained by taking the minimum of Tstab-1 and Tstab-2. 754 The results obtained in [maenpaa2009] indicate that making the 755 stabilization interval too small has the effect of making the overlay 756 less stable (e.g., in terms of detected loops and path failures). 757 Thus, a lower limit should be used for the stabilization period. 758 Based on the results in [maenpaa2009], a lower limit of 15s is 759 RECOMMENDED, since using a stabilization period smaller than this 760 will with a high probability cause too much traffic in the overlay. 762 7. Overlay Configuration Document Extension 764 This document extends the RELOAD overlay configuration document by 765 adding one new element, "number-of-peers-to-probe", inside each 766 "configuration" element. 768 self-tuning:number-of-peers-to-probe: The number of fingers to which 769 Probe requests are sent to obtain their network size, join rate, 770 and leave rate estimates. The default value is 4. 772 This new element is formally defined as follows: 774 namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning" 776 parameter &= element self-tuning:number-of-peers-to-probe { xsd: 777 unsignedInt } 779 This namespace is added into the