idnits 2.17.1 

draft-ietf-p2psip-self-tuning-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 2, 2014) is 3730 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5245 (Obsoleted by RFC 8445, RFC 8839)

  ** Obsolete normative reference: RFC 5389 (Obsoleted by RFC 8489)

  == Outdated reference: A later version (-09) exists of
     draft-ietf-p2psip-concepts-05


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	P2PSIP Working Group                                          J. Maenpaa
3	Internet-Draft                                              G. Camarillo
4	Intended status: Standards Track                                Ericsson
5	Expires: August 6, 2014                                 February 2, 2014

7	   Self-tuning Distributed Hash Table (DHT) for REsource LOcation And
8	                           Discovery (RELOAD)
9	                  draft-ietf-p2psip-self-tuning-10.txt

11	Abstract

13	   REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P)
14	   signaling protocol that provides an overlay network service.  Peers
15	   in a RELOAD overlay network collectively run an overlay algorithm to
16	   organize the overlay, and to store and retrieve data.  This document
17	   describes how the default topology plugin of RELOAD can be extended
18	   to support self-tuning, that is, to adapt to changing operating
19	   conditions such as churn and network size.

21	Status of This Memo

23	   This Internet-Draft is submitted in full conformance with the
24	   provisions of BCP 78 and BCP 79.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF).  Note that other groups may also distribute
28	   working documents as Internet-Drafts.  The list of current Internet-
29	   Drafts is at http://datatracker.ietf.org/drafts/current/.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   This Internet-Draft will expire on August 6, 2014.

38	Copyright Notice

40	   Copyright (c) 2014 IETF Trust and the persons identified as the
41	   document authors.  All rights reserved.

43	   This document is subject to BCP 78 and the IETF Trust's Legal
44	   Provisions Relating to IETF Documents
45	   (http://trustee.ietf.org/license-info) in effect on the date of
46	   publication of this document.  Please review these documents
47	   carefully, as they describe your rights and restrictions with respect
48	   to this document.  Code Components extracted from this document must
49	   include Simplified BSD License text as described in Section 4.e of
50	   the Trust Legal Provisions and are provided without warranty as
51	   described in the Simplified BSD License.

53	Table of Contents

55	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
56	   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   3
57	   3.  Introduction to Stabilization in DHTs . . . . . . . . . . . .   5
58	     3.1.  Reactive vs. Periodic Stabilization . . . . . . . . . . .   5
59	     3.2.  Configuring Periodic Stabilization  . . . . . . . . . . .   6
60	     3.3.  Adaptive Stabilization  . . . . . . . . . . . . . . . . .   7
61	   4.  Introduction to Chord . . . . . . . . . . . . . . . . . . . .   8
62	   5.  Extending Chord-reload to Support Self-tuning . . . . . . . .   9
63	     5.1.  Update Requests . . . . . . . . . . . . . . . . . . . . .  10
64	     5.2.  Neighbor Stabilization  . . . . . . . . . . . . . . . . .  10
65	     5.3.  Finger Stabilization  . . . . . . . . . . . . . . . . . .  11
66	     5.4.  Adjusting Finger Table Size . . . . . . . . . . . . . . .  11
67	     5.5.  Detecting Partitioning  . . . . . . . . . . . . . . . . .  11
68	     5.6.  Leaving the Overlay . . . . . . . . . . . . . . . . . . .  12
69	   6.  Self-tuning Chord Parameters  . . . . . . . . . . . . . . . .  12
70	     6.1.  Estimating Overlay Size . . . . . . . . . . . . . . . . .  12
71	     6.2.  Determining Routing Table Size  . . . . . . . . . . . . .  13
72	     6.3.  Estimating Failure Rate . . . . . . . . . . . . . . . . .  13
73	       6.3.1.  Detecting Failures  . . . . . . . . . . . . . . . . .  14
74	     6.4.  Estimating Join Rate  . . . . . . . . . . . . . . . . . .  15
75	     6.5.  Estimate Sharing  . . . . . . . . . . . . . . . . . . . .  15
76	     6.6.  Calculating the Stabilization Interval  . . . . . . . . .  16
77	   7.  Overlay Configuration Document Extension  . . . . . . . . . .  17
78	   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  18
79	   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  18
80	     9.1.  Message Extensions  . . . . . . . . . . . . . . . . . . .  18
81	   10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  18
82	   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  19
83	     11.1.  Normative References . . . . . . . . . . . . . . . . . .  19
84	     11.2.  Informative References . . . . . . . . . . . . . . . . .  19
85	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  21

87	1.  Introduction

89	   REsource LOcation And Discovery (RELOAD) [I-D.ietf-p2psip-base] is a
90	   peer-to-peer signaling protocol that can be used to maintain an
91	   overlay network, and to store data in and retrieve data from the
92	   overlay.  For interoperability reasons, RELOAD specifies one overlay
93	   algorithm, called chord-reload, that is mandatory to implement.  This
94	   document extends the chord-reload algorithm by introducing self-
95	   tuning behavior.

97	   Distributed Hash Table (DHT) based overlay networks are self-
98	   organizing, scalable and reliable.  However, these features come at a
99	   cost: peers in the overlay network need to consume network bandwidth
100	   to maintain routing state.  Most DHTs use a periodic stabilization
101	   routine to counter the undesirable effects of churn on routing.  To
102	   configure the parameters of a DHT, some characteristics such as churn
103	   rate and network size need to be known in advance.  These
104	   characteristics are then used to configure the DHT in a static
105	   fashion by using fixed values for parameters such as the size of the
106	   successor set, size of the routing table, and rate of maintenance
107	   messages.  The problem with this approach is that it is not possible
108	   to achieve a low failure rate and a low communication overhead by
109	   using fixed parameters.  Instead, a better approach is to allow the
110	   system to take into account the evolution of network conditions and
111	   adapt to them.  This document extends the mandatory-to-implement
112	   chord-reload algorithm by making it self-tuning.  Two main advantages
113	   of self-tuning are that users no longer need to tune every DHT
114	   parameter correctly for a given operating environment and that the
115	   system adapts to changing operating conditions.

117	   The remainder of this document is structured as follows: Section 2
118	   provides definitions of terms used in this document.  Section 3
119	   discusses alternative approaches to stabilization operations in DHTs,
120	   including reactive stabilization, periodic stabilization, and
121	   adaptive stabilization.  Section 4 gives an introduction to the Chord
122	   DHT algorithm.  Section 5 describes how this document extends the
123	   stabilization routine of the chord-reload algorithm.  Section 6
124	   describes how the stabilization rate and routing table size are
125	   calculated in an adaptive fashion.

127	2.  Terminology

129	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
130	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
131	   "OPTIONAL" in this document are to be interpreted as described in RFC
132	   2119 [RFC2119].

134	   This document uses the terminology and definitions from the Concepts
135	   and Terminology for Peer to Peer SIP [I-D.ietf-p2psip-concepts]
136	   draft.

138	   numBitsInNodeId:  Specifies the number of bits in a RELOAD Node-ID.

140	   DHT:  Distributed Hash Tables (DHTs) are a class of decentralized
141	      distributed systems that provide a lookup service similar to a
142	      regular hash table.  Given a key, any peer participating in the
143	      system can retrieve the value associated with that key.  The
144	      responsibility for maintaining the mapping from keys to values is
145	      distributed among the peers.

147	   Chord Ring:  The Chord DHT uses ring topology and orders identifiers
148	      on an identifier circle of size 2^numBitsInNodeId.  This
149	      identifier circle is called the Chord ring.  On the Chord ring,
150	      the responsibility for a key k is assigned to the node whose
151	      identifier equals to or immediately follows k.

153	   Finger Table:  A data structure with up to (but typically less than)
154	      numBitsInNodeId entries maintained by each peer in a Chord-based
155	      overlay.  The ith entry in the finger table of peer n contains the
156	      identity of the first peer that succeeds n by at least
157	      2^(numBitsInNodeId-i) on the Chord ring.  This peer is called the
158	      ith finger of peer n. As an example, the first entry in the finger
159	      table of peer n contains a peer half-way around the Chord ring
160	      from peer n. The purpose of the finger table is to accelerate
161	      lookups.

163	   n.id:  An abbreviation that is in this document used refer to the
164	      Node-ID of peer n.

166	   O(g(n)):  Informally, saying that some equation f(n) = O(g(n)) means
167	      that f(n) is less than some constant multiple of g(n).

169	   Omega(g(n)):  Informally, saying that some equation f(n) =
170	      Omega(g(n)) means that f(n) is more than some constant multiple of
171	      g(n).

173	   Predecessor List:  A data structure containing the first predecessors
174	      of a peer on the Chord ring.

176	   Successor List:  A data structure containing the first r successors
177	      of a peer on the Chord ring.

179	   Neighborhood Set:  A term used to refer to the set of peers included
180	      in the successor and predecessor lists of a given peer.

182	   Routing Table:  Contents of a given peer's routing table include the
183	      set of peers that the peer can use to route overlay messages.  The
184	      routing table is made up of the finger table, successor list and
185	      predecessor list.

187	3.  Introduction to Stabilization in DHTs

189	   DHTs use stabilization routines to counter the undesirable effects of
190	   churn on routing.  The purpose of stabilization is to keep the
191	   routing information of each peer in the overlay consistent with the
192	   constantly changing overlay topology.  There are two alternative
193	   approaches to stabilization: periodic and reactive [rhea2004].
194	   Periodic stabilization can either use a fixed stabilization rate or
195	   calculate the stabilization rate in an adaptive fashion.

197	3.1.  Reactive vs. Periodic Stabilization

199	   In reactive stabilization, a peer reacts to the loss of a peer in its
200	   neighborhood set or to the appearance of a new peer that should be
201	   added to its neighborhood set by sending a copy of its neighbor table
202	   to all peers in the neighborhood set.  Periodic recovery, in
203	   contrast, takes place independently of changes in the neighborhood
204	   set.  In periodic recovery, a peer periodically shares its
205	   neighborhood set with each or a subset of the members of that set.

207	   The chord-reload algorithm [I-D.ietf-p2psip-base] supports both
208	   reactive and periodic stabilization.  It has been shown in [rhea2004]
209	   that reactive stabilization works well for small neighborhood sets
210	   (i.e., small overlays) and moderate churn.  However, in large-scale
211	   (e.g., 1000 peers or more [rhea2004]) or high-churn overlays,
212	   reactive stabilization runs the risk of creating a positive feedback
213	   cycle, which can eventually result in congestion collapse.  In
214	   [rhea2004], it is shown that a 1000-peer overlay under churn uses
215	   significantly less bandwidth and has lower latencies when periodic
216	   stabilization is used than when reactive stabilization is used.
217	   Although in the experiments carried out in [rhea2004], reactive
218	   stabilization performed well when there was no churn, its bandwidth
219	   use was observed to jump dramatically under churn.  At higher churn
220	   rates and larger scale overlays, periodic stabilization uses less
221	   bandwidth and the resulting lower contention for the network leads to
222	   lower latencies.  For this reason, most DHTs such as CAN [CAN], Chord
223	   [Chord], Pastry [Pastry], Bamboo [rhea2004], etc. use periodic
224	   stabilization [ghinita2006].  As an example, the first version of
225	   Bamboo used reactive stabilization, which caused Bamboo to suffer
226	   from degradation in performance under churn.  To fix this problem,
227	   Bamboo was modified to use periodic stabilization.

229	   In Chord, periodic stabilization is typically done both for
230	   successors and fingers.  An alternative strategy is analyzed in
231	   [krishnamurthy2008].  In this strategy, called the correction-on-
232	   change maintenance strategy, a peer periodically stabilizes its
233	   successors but does not do so for its fingers.  Instead, finger
234	   pointers are stabilized in a reactive fashion.  The results obtained
235	   in [krishnamurthy2008] imply that although the correction-on-change
236	   strategy works well when churn is low, periodic stabilization
237	   outperforms the correction-on-change strategy when churn is high.

239	3.2.  Configuring Periodic Stabilization

241	   When periodic stabilization is used, one faces the problem of
242	   selecting an appropriate execution rate for the stabilization
243	   procedure.  If the execution rate of periodic stabilization is high,
244	   changes in the system can be quickly detected, but at the
245	   disadvantage of increased communication overhead.  Alternatively, if
246	   the stabilization rate is low and the churn rate is high, routing
247	   tables become inaccurate and DHT performance deteriorates.  Thus, the
248	   problem is setting the parameters so that the overlay achieves the
249	   desired reliability and performance even in challenging conditions,
250	   such as under heavy churn.  This naturally results in high cost
251	   during periods when the churn level is lower than expected, or
252	   alternatively, poor performance or even network partitioning in worse
253	   than expected conditions.

255	   In addition to selecting an appropriate stabilization interval,
256	   regardless of whether periodic stabilization is used or not, an
257	   appropriate size needs to be selected for the neighborhood set and
258	   for the finger table.

260	   The current approach is to configure overlays statically.  This works
261	   in situations where perfect information about the future is
262	   available.  In situations where the operating conditions of the
263	   network are known in advance and remain static throughout the
264	   lifetime of the system, it is possible to choose fixed optimal values
265	   for parameters such as stabilization rate, neighborhood set size and
266	   routing table size.  However, if the operating conditions (e.g., the
267	   size of the overlay and its churn rate) do not remain static but
268	   evolve with time, it is not possible to achieve both a low lookup
269	   failure rate and a low communication overhead by using fixed
270	   parameters [ghinita2006].

272	   As an example, to configure the Chord DHT algorithm, one needs to
273	   select values for the following parameters: size of successor list,
274	   stabilization interval, and size of the finger table.  To select an
275	   appropriate value for the stabilization interval, one needs to know
276	   the expected churn rate and overlay size.  According to
277	   [liben-nowell2002], a Chord network in a ring-like state remains in a
278	   ring-like state as long as peers send Omega(log2^2(N)) messages
279	   before N new peers join or N/2 peers fail.  Thus, in a 500-peer
280	   overlay churning at a rate such that one peer joins and one peer
281	   leaves the network every 30 seconds, an appropriate stabilization
282	   interval would be on the order of 93s. According to [Chord], the size
283	   of the successor list and finger table should be on the order of
284	   log2(N).  Having a successor list of size O(log2(N)) makes it
285	   unlikely that a peer will lose all of its successors, which would
286	   cause the Chord ring to become disconnected.  Thus, in a 500-peer
287	   network each peer should maintain on the order of nine successors and
288	   fingers.  However, if the churn rate doubles and the network size
289	   remains unchanged, the stabilization rate should double as well.
290	   That is, the appropriate maintenance interval would now be on the
291	   order of 46s. On the other hand, if the churn rate becomes e.g. six-
292	   fold and the size of the network grows to 2000 peers, on the order of
293	   eleven fingers and successors should be maintained and the
294	   stabilization interval should be on the order of 42s. If one
295	   continued using the old values, this could result in inaccurate
296	   routing tables, network partitioning, and deteriorating performance.

298	3.3.  Adaptive Stabilization

300	   A self-tuning DHT takes into consideration the continuous evolution
301	   of network conditions and adapts to them.  In a self-tuning DHT, each
302	   peer collects statistical data about the network and dynamically
303	   adjusts its stabilization rate, neighborhood set size, and finger
304	   table size based on the analysis of the data [ghinita2006].
305	   Reference [mahajan2003] shows that by using self-tuning, it is
306	   possible to achieve high reliability and performance even in adverse
307	   conditions with low maintenance cost.  Adaptive stabilization has
308	   been shown to outperform periodic stabilization in terms of both
309	   lookup failures and communication overhead [ghinita2006].

311	4.  Introduction to Chord

313	   Chord [Chord] is a structured P2P algorithm that uses consistent
314	   hashing to build a DHT out of several independent peers.  Consistent
315	   hashing assigns each peer and resource a fixed-length identifier.
316	   Peers use SHA-1 as the base hash fuction to generate the identifiers.
317	   As specified in RELOAD base, the length of the identifiers is
318	   numBitsInNodeId=128 bits.  The identifiers are ordered on an
319	   identifier circle of size 2^numBitsInNodeId.  On the identifier
320	   circle, key k is assigned to the first peer whose identifier equals
321	   or follows the identifier of k in the identifier space.  The
322	   identifier circle is called the Chord ring.

324	   Different DHTs differ significantly in performance when bandwidth is
325	   limited.  It has been shown that when compared to other DHTs, the
326	   advantages of Chord include that it uses bandwidth efficiently and
327	   can achieve low lookup latencies at little cost [li2004].

329	   A simple lookup mechanism could be implemented on a Chord ring by
330	   requiring each peer to only know how to contact its current successor
331	   on the identifier circle.  Queries for a given identifier could then
332	   be passed around the circle via the successor pointers until they
333	   encounter the first peer whose identifier is equal to or larger than
334	   the desired identifier.  Such a lookup scheme uses a number of
335	   messages that grows linearly with the number of peers.  To reduce the
336	   cost of lookups, Chord maintains also additional routing information;
337	   each peer n maintains a data structure with up to numBitsInNodeId
338	   entries, called the finger table.  The first entry in the finger
339	   table of peer n contains the peer half-way around the ring from peer
340	   n. The second entry contains the peer that is 1/4th of the way
341	   around, the third entry the peer that is 1/8th of the way around,
342	   etc.  In other words, the ith entry in the finger table at peer n
343	   contains the identity of the first peer s that succeeds n by at least
344	   2^(numBitsInNodeId-i) on the Chord ring.  This peer is called the ith
345	   finger of peer n. The interval between two consecutive fingers is
346	   called a finger interval.  The ith finger interval of peer n covers
347	   the range [n.id + 2^(numBitsInNodeId-i), n.id +
348	   2^(numBitsInNodeId-i+1)) on the Chord ring.  In an N-peer network,
349	   each peer maintains information about O(log2(N)) other peers in its
350	   finger table.  As an example, if N=100000, it is sufficient to
351	   maintain 17 fingers.

353	   Chord needs all peers' successor pointers to be up to date in order
354	   to ensure that lookups produce correct results as the set of
355	   participating peers changes.  To achieve this, peers run a
356	   stabilization protocol periodically in the background.  The
357	   stabilization protocol of the original Chord algorithm uses two
358	   operations: successor stabilization and finger stabilization.

360	   However, the Chord algorithm of RELOAD base defines two additional
361	   stabilization components, as will be discussed below.

363	   To increase robustness in the event of peer failures, each Chord peer
364	   maintains a successor list of size r, containing the peer's first r
365	   successors.  The benefit of successor lists is that if each peer
366	   fails independently with probability p, the probability that all r
367	   successors fail simultaneously is only p^r.

369	   The original Chord algorithm maintains only a single predecessor
370	   pointer.  However, multiple predecessor pointers (i.e., a predecessor
371	   list) can be maintained to speed up recovery from predecessor
372	   failures.  The routing table of a peer consists of the successor
373	   list, finger table, and predecessor list.

375	5.  Extending Chord-reload to Support Self-tuning

377	   This section describes how the mandatory-to-implement chord-reload
378	   algorithm defined in RELOAD base [I-D.ietf-p2psip-base] can be
379	   extended to support self-tuning.

381	   The chord-reload algorithm supports both reactive and periodic
382	   recovery strategies.  When the self-tuning mechanisms defined in this
383	   document are used, the periodic recovery strategy MUST be used.
384	   Further, chord-reload specifies that at least three predecessors and
385	   three successors need to be maintained.  When the self-tuning
386	   mechanisms are used, the appropriate sizes of the successor list and
387	   predecessor list are determined in an adaptive fashion based on the
388	   estimated network size, as will be described in Section 6.

390	   As specified in RELOAD base, each peer MUST maintain a stabilization
391	   timer.  When the stabilization timer fires, the peer MUST restart the
392	   timer and carry out the overlay stabilization routine.  Overlay
393	   stabilization has four components in chord-reload:

395	   1.  Update the neighbor table.  We refer to this as neighbor
396	       stabilization.

398	   2.  Refreshing the finger table.  We refer to this as finger
399	       stabilization.

401	   3.  Adjusting finger table size.

403	   4.  Detecting partitioning.  We refer to this as strong
404	       stabilization.

406	   As specified in RELOAD base [I-D.ietf-p2psip-base], a peer sends
407	   periodic messages as part of the neighbor stabilization, finger
408	   stabilization, and strong stabilization routines.  In neighbor
409	   stabilization, a peer periodically sends an Update request to every
410	   peer in its Connection Table.  The default time is every ten minutes.
411	   In finger stabilization, a peer periodically searches for new peers
412	   to include in its finger table.  This time defaults to one hour.
413	   This document specifies how the neighbor stabilization and finger
414	   stabilization intervals can be determined in an adaptive fashion
415	   based on the operating conditions of the overlay.  The subsections
416	   below describe how this document extends the four components of
417	   stabilization.

419	5.1.  Update Requests

421	   As described in RELOAD base [I-D.ietf-p2psip-base], the neighbor and
422	   finger stabilization procedures are implemented using Update
423	   requests.  RELOAD base defines three types of Update requests:
424	   'peer_ready', 'neighbors', and 'full'.  Regardless of the type, all
425	   Update requests include an 'uptime' field.  Since the self-tuning
426	   extensions require information on the uptimes of peers in the routing
427	   table, the sender of an Update request MUST include its current
428	   uptime in seconds in the 'uptime' field.

430	   When self-tuning is used, each peer decides independently the
431	   appropriate size for the successor list, predecessor list and finger
432	   table.  Thus, the 'predecessors', 'successors', and 'fingers' fields
433	   included in RELOAD Update requests are of variable length.  As
434	   specified in RELOAD [I-D.ietf-p2psip-base], variable length fields
435	   are on the wire preceded by length bytes.  In the case of the
436	   successor list, predecessor list, and finger table, there are two
437	   length bytes (allowing lengths up to 2^16-1).  The number of NodeId
438	   structures included in each field can be calculated based on the
439	   length bytes since the size of a single NodeId structure is 16 bytes.
440	   If a peer receives more entries than fit into its successor list,
441	   predecessor list or finger table, the peer MUST ignore the extra
442	   entries.  If a peer receives less entries than it currently has in
443	   its own data structure, the peer MUST NOT drop the extra entries from
444	   its data structure.

446	5.2.  Neighbor Stabilization

448	   In the neighbor stabilization operation of chord-reload, a peer
449	   periodically sends an Update request to every peer in its Connection
450	   Table.  In a small, low-churn overlay, the amount of traffic this
451	   process generates is typically acceptable.  However, in a large-scale
452	   overlay churning at a moderate or high churn rate, the traffic load
453	   may no longer be acceptable since the size of the connection table is
454	   large and the stabilization interval relatively short.  The self-
455	   tuning mechanisms described in this document are especially designed
456	   for overlays of the latter type.  Therefore, when the self-tuning
457	   mechanisms are used, each peer MUST send a periodic Update request
458	   only to its first predecessor and first successor on the Chord ring.

460	   The neighbor stabilization routine MUST be executed when the
461	   stabilization timer fires.  To begin the neighbor stabilization
462	   routine, a peer MUST send an Update request to its first successor
463	   and its first predecessor.  The type of the Update request MUST be
464	   'neighbors'.  The Update request MUST include the successor and
465	   predecessor lists of the sender.  If a peer receiving such an Update
466	   request learns from the predecessor and successor lists included in
467	   the request that new peers can be included in its neighborhood set,
468	   it MUST send Attach requests to the new peers.

470	   After a new peer has been added to the predecessor or successor list,
471	   an Update request of type 'peer_ready' MUST be sent to the new peer.
472	   This allows the new peer to insert the sender into its neighborhood
473	   set.

475	5.3.  Finger Stabilization

477	   Chord-reload specifies two alternative methods for searching for new
478	   peers to the finger table.  Both of the alternatives can be used with
479	   the self-tuning extensions defined in this document.

481	   Immediately after a new peer has been added to the finger table, a
482	   Probe request MUST be sent to the new peer to fetch its uptime.  The
483	   requested_info field of the Probe request MUST be set to contain the
484	   ProbeInformationType 'uptime' defined in RELOAD base
485	   [I-D.ietf-p2psip-base].

487	5.4.  Adjusting Finger Table Size

489	   The chord-reload algorithm defines how a peer can make sure that the
490	   finger table is appropriately sized to allow for efficient routing.
491	   Since the self-tuning mechanisms specified in this document produce a
492	   network size estimate, this estimate can be directly used to
493	   calculate the optimal size for the finger table.  This mechanism MUST
494	   be used instead of the one specified by chord-reload.  A peer MUST
495	   use the network size estimate to determine whether it needs to adjust
496	   the size of its finger table each time when the stabilization timer
497	   fires.  The way this is done is explained in Section 6.2.

499	5.5.  Detecting Partitioning

501	   This document does not require any changes to the mechanism chord-
502	   reload uses to detect network partitioning.

504	5.6.  Leaving the Overlay

506	   As specified in RELOAD base [I-D.ietf-p2psip-base], a leaving peer
507	   SHOULD send a Leave request to all members of its neighbor table
508	   prior to leaving the overlay.  The overlay_specific_data field MUST
509	   contain the ChordLeaveData structure.  The Leave requests that are
510	   sent to successors MUST contain the predecessor list of the leaving
511	   peer.  The Leave requests that are sent to the predecessors MUST
512	   contain the successor list of the leaving peer.  If a given successor
513	   can identify better predecessors than are already included in its
514	   predecessor lists by investigating the predecessor list it receives
515	   from the leaving peer, it MUST send Attach requests to them.
516	   Similarly, if a given predecessor identifies better successors by
517	   investigating the successor list it receives from the leaving peer,
518	   it MUST send Attach requests to them.

520	6.  Self-tuning Chord Parameters

522	   This section specifies how to determine an appropriate stabilization
523	   rate and routing table size in an adaptive fashion.  The proposed
524	   mechanism is based on [mahajan2003], [liben-nowell2002], and
525	   [ghinita2006].  To calculate an appropriate stabilization rate, the
526	   values of three parameters MUST be estimated: overlay size N, failure
527	   rate U, and join rate L. To calculate an appropriate routing table
528	   size, the estimated network size N can be used.  Peers in the overlay
529	   MUST re-calculate the values of the parameters to self-tune the
530	   chord-reload algorithm at the end of each stabilization period before
531	   re-starting the stabilization timer.

533	6.1.  Estimating Overlay Size

535	   Techniques for estimating the size of an overlay network have been
536	   proposed for instance in [mahajan2003], [horowitz2003],
537	   [kostoulas2005], [binzenhofer2006], and [ghinita2006].  In Chord, the
538	   density of peer identifiers in the neighborhood set can be used to
539	   produce an estimate of the size of the overlay, N [mahajan2003].
540	   Since peer identifiers are picked randomly with uniform probability
541	   from the numBitsInNodeId-bit identifier space, the average distance
542	   between peer identifiers in the successor set is (2^numBitsInNodeId)/
543	   N.

545	   To estimate the overlay network size, a peer MUST compute the average
546	   inter-peer distance d between the successive peers starting from the
547	   most distant predecessor and ending to the most distant successor in
548	   the successor list.  The estimated network size MUST be calculated
549	   as:

551	                         2^numBitsInNodeId
552	                    N = -------------------
553	                                d

555	   This estimate has been found to be accurate within 15% of the real
556	   network size [ghinita2006].  Of course, the size of the neighborhood
557	   set affects the accuracy of the estimate.

559	   During the join process, a joining peer fills its routing table by
560	   sending a series of Ping and Attach requests, as specified in RELOAD
561	   base [I-D.ietf-p2psip-base].  Thus, a joining peer immediately has
562	   enough information at its disposal to calculate an estimate of the
563	   network size.

565	6.2.  Determining Routing Table Size

567	   As specified in RELOAD base, the finger table must contain at least
568	   16 entries.  When the self-tuning mechanisms are used, the size of
569	   the finger table MUST be set to max(log2(N), 16) using the estimated
570	   network size N.

572	   The size of the successor list MUST be set to log2(N).  An
573	   implementation MAY place a lower limit on the size of the successor
574	   list.  As an example, the implementation might require the size of
575	   the successor list to be always at least three.

577	   A peer MAY choose to maintain a fixed-size predecessor list with only
578	   three entries as specified in RELOAD base.  However, it is
579	   RECOMMENDED that a peer maintains log2(N) predecessors.

581	6.3.  Estimating Failure Rate

583	   A typical approach is to assume that peers join the overlay according
584	   to a Poisson process with rate L and leave according to a Poisson
585	   process with rate parameter U [mahajan2003].  The value of U can be
586	   estimated using peer failures in the finger table and neighborhood
587	   set [mahajan2003].  If peers fail with rate U, a peer with M unique
588	   peer identifiers in its routing table should observe K failures in
589	   time K/(M*U).  Every peer in the overlay MUST maintain a history of
590	   the last K failures.  The current time MUST be inserted into the
591	   history when the peer joins the overlay.  The estimate of U MUST be
592	   calculated as:

594	                             k
595	                     U = --------,
596	                          M * Tk

598	   where M is the number of unique peer identifiers in the routing
599	   table, Tk is the time between the first and the last failure in the
600	   history, and k is the number of failures in the history.  If k is
601	   smaller than K, the estimate MUST be computed as if there was a
602	   failure at the current time.  It has been shown that an estimate
603	   calculated in a similar manner is accurate within 17% of the real
604	   value of U [ghinita2006].

606	   The size of the failure history K affects the accuracy of the
607	   estimate of U. One can increase the accuracy by increasing K.
608	   However, this has the side effect of decreasing responsiveness to
609	   changes in the failure rate.  On the other hand, a small history size
610	   may cause a peer to overreact each time a new failure occurs.  In
611	   [ghinita2006], K is set 25% of the routing table size.  Use of this
612	   approach is RECOMMENDED.

614	6.3.1.  Detecting Failures

616	   A new failure MUST be inserted to the failure history in the
617	   following cases:

619	   1.  A Leave request is received from a neigbhor.

621	   2.  A peer fails to reply to a Ping request sent in the situation
622	       explained below.  If no packets have been received on a
623	       connection during the past 2*Tr seconds (where Tr is the
624	       inactivity timer defined by ICE [RFC5245]), a RELOAD Ping request
625	       MUST be sent to the remote peer.  RELOAD mandates the use of STUN
626	       [RFC5389] for keepalives.  STUN keepalives take the form of STUN
627	       Binding Indication transactions.  As specified in ICE [RFC5245],
628	       a peer sends a STUN Binding Indication if there has been no
629	       packet sent on a connection for Tr seconds.  Tr is configurable
630	       and has a default of 15 seconds.  Although STUN Binding
631	       Indications do not generate a response, the fact that a peer has
632	       failed can be learned from the lack of packets (Binding
633	       Indications or application protocol packets) received from the
634	       peer.  If the remote peer fails to reply to the Ping request, the
635	       sender MUST consider the remote peer to have failed.

637	   As an alternative to relying on STUN keepalives to detect peer
638	   failure, a peer could send additional, frequent RELOAD messages to
639	   every peer in its Connection Table.  These messages could be Update
640	   requests, in which case they would serve two purposes: detecting peer
641	   failure and stabilization.  However, as the cost of this approach can
642	   be very high in terms of bandwidth consumption and traffic load,
643	   especially in large-scale overlays experiencing churn, its use is NOT
644	   RECOMMENDED.

646	6.4.  Estimating Join Rate

648	   Reference [ghinita2006] proposes that a peer can estimate the join
649	   rate based on the uptime of the peers in its routing table.  An
650	   increase in peer join rate will be reflected by a decrease in the
651	   average age of peers in the routing table.  Thus, each peer MUST
652	   maintain an array of the ages of the peers in its routing table
653	   sorted in increasing order.  Using this information, an estimate of
654	   the global peer join rate L MUST be calculated as:

656	                               N
657	                    L = ---------------,
658	                         Ages[rsize/2]

660	   where Ages is an array containing the ages of the peers in the
661	   routing table sorted in increasing order and rsize is the size of the
662	   routing table.  It has been shown that the estimate obtained by using
663	   this method is accurate within 22% of the real join rate
664	   [ghinita2006].  Of course, the size of the routing table affects the
665	   accuracy.

667	   In order for this mechanism to work, peers need to exchange
668	   information about the time they have been present in the overlay.
669	   Peers receive the uptimes of their successors and predecessors during
670	   the stabilization operations since all Update requests carry uptime
671	   values.  A joining peer learns the uptime of the admitting peer since
672	   it receives an Update from the admitting peer during the join
673	   procedure.  Peers learn the uptimes of new fingers since they can
674	   fetch the uptime using a Probe request after having attached to the
675	   new finger.

677	6.5.  Estimate Sharing

679	   To improve the accuracy of network size, join rate, and leave rate
680	   estimates, peers MUST share their estimates.  When the stabilization
681	   timer fires, a peer MUST select number-of-peers-to-probe random peers
682	   from its finger table and send each of them a Probe request.  The
683	   targets of Probe requests are selected from the finger table rather
684	   than from the neighbor table since neighbors are likely to make
685	   similar errors when calculating their estimates. number-of-peers-to-
686	   probe is a new element in the overlay configuration document.  It is
687	   defined in Section 7 and has a default value of 4.  Both the Probe
688	   request and the answer returned by the target peer MUST contain a new
689	   message extension whose MessageExtensionType is 'self_tuning_data'.
690	   This extension type is defined in Section 9.1.  The
691	   extension_contents field of the MessageExtension structure MUST
692	   contain a SelfTuningData structure:

694	               struct {
695	                 uint32                   network_size;
696	                 uint32                   join_rate;
697	                 uint32                   leave_rate;
698	               } SelfTuningData;

700	   The contents of the SelfTuningData structure are as follows:

702	   network_size

704	      The latest network size estimate calculated by the sender.

706	   join_rate

708	      The latest join rate estimate calculated by the sender.

710	   leave_rate

712	      The latest leave rate estimate calculated by the sender.

714	   The join and leave rates are expressed as joins or failures per 24
715	   hours.  As an example, if the global join rate estimate a peer has
716	   calculated is 0.123 peers/s, it would include in the join_rate field
717	   the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is,
718	   the value 10628.

720	   The 'type' field of the MessageExtension structure MUST be set to
721	   contain the value 'self_tuning_data'.  The 'critical' field of the
722	   structure MUST be set to False.

724	   A peer MUST store all estimates it receives in Probe requests and
725	   answers during a stabilization interval.  When the stabilization
726	   timer fires, the peer MUST calculate the estimates to be used during
727	   the next stabilization interval by taking the 75th percentile of a
728	   data set containing its own estimate and the received estimates.

730	6.6.  Calculating the Stabilization Interval

732	   According to [liben-nowell2002], a Chord network in a ring-like state
733	   remains in a ring-like state as long as peers send Omega(log2(N))^2
734	   messages before N new peers join or N/2 peers fail.  We can use the
735	   estimate of peer failure rate, U, to calculate the time Tf in which N
736	   /2 peers fail:

738	                                  1
739	                           Tf = ------
740	                                 2*U

742	   Based on this estimate, a stabilization interval Tstab-1 MUST be
743	   calculated as:

745	                                         Tf
746	                           Tstab-1 = -----------
747	                                      log2^2(N)

749	   On the other hand, the estimated join rate L can be used to calculate
750	   the time in which N new peers join the overlay.  Based on the
751	   estimate of L, a stabilization interval Tstab-2 MUST be calculated
752	   as:

754	                                             N
755	                            Tstab-2 = ---------------
756	                                       L * log2^2(N)

758	   Finally, the actual stabilization interval Tstab that MUST be used
759	   can be obtained by taking the minimum of Tstab-1 and Tstab-2.

761	   The results obtained in [maenpaa2009] indicate that making the
762	   stabilization interval too small has the effect of making the overlay
763	   less stable (e.g., in terms of detected loops and path failures).
764	   Thus, a lower limit should be used for the stabilization period.
765	   Based on the results in [maenpaa2009], a lower limit of 15s is
766	   RECOMMENDED, since using a stabilization period smaller than this
767	   will with a high probability cause too much traffic in the overlay.

769	7.  Overlay Configuration Document Extension

771	   This document extends the RELOAD overlay configuration document by
772	   adding one new element, "number-of-peers-to-probe", inside each
773	   "configuration" element.

775	   self-tuning:number-of-peers-to-probe:  The number of fingers to which
776	      Probe requests are sent to obtain their network size, join rate,
777	      and leave rate estimates.  The default value is 4.

779	   This new element is formally defined as follows:

781	   namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning"
782	   parameter &= element self-tuning:number-of-peers-to-probe {
783	   xsd:unsignedInt }

785	   This namespace is added into the <mandatory-extension> element in the
786	   overlay configuration file.

788	8.  Security Considerations

790	   In the same way as malicious or compromised peers implementing the
791	   RELOAD base protocol [I-D.ietf-p2psip-base] can advertise false
792	   network metrics or distribute false routing table information for
793	   instance in RELOAD Update messages, malicious peers implementing this
794	   specification may share false join rate, leave rate, and network size
795	   estimates.  For such attacks, the same security concerns apply as in
796	   the RELOAD base specification.  In addition, as long as the amount of
797	   malicious peers in the overlay remains modest, the statistical
798	   mechanisms applied in Section 6.5 (i.e., the use of 75th percentiles)
799	   to process the shared estimates a peer obtains help ensuring that
800	   estimates that are clearly different from (i.e., larger or smaller
801	   than) other received estimates will not significantly influence the
802	   process of adapting the stabilization interval and routing table
803	   size.

805	9.  IANA Considerations

807	9.1.  Message Extensions

809	   This document introduces one additional extension to the "RELOAD
810	   Extensions" Registry:

812	                  +------------------+-------+---------------+
813	                  | Extension Name   |  Code | Specification |
814	                  +------------------+-------+---------------+
815	                  | self_tuning_data |     3 |      RFC-AAAA |
816	                  +------------------+-------+---------------+

818	   The contents of the extension are defined in Section 6.5.

820	   Note to RFC Editor: please replace AAAA with the RFC number for this
821	   specification.

823	10.  Acknowledgments

825	   The authors would like to thank Jani Hautakorpi for his contributions
826	   to the document.  The authors would also like to thank Carlos
827	   Bernardos for his comments on the document.

829	11.  References

831	11.1.  Normative References

833	   [I-D.ietf-p2psip-base]
834	              Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and
835	              H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
836	              Base Protocol", draft-ietf-p2psip-base-26 (work in
837	              progress), February 2013.

839	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
840	              Requirement Levels", BCP 14, RFC 2119, March 1997.

842	   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
843	              (ICE): A Protocol for Network Address Translator (NAT)
844	              Traversal for Offer/Answer Protocols", RFC 5245, April
845	              2010.

847	   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
848	              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
849	              October 2008.

851	11.2.  Informative References

853	   [CAN]      Ratnasamy, S., Francis, P., Handley, M., Karp, R., and S.
854	              Schenker, "A Scalable Content-Addressable Network", In
855	              Proceedings of the 2001 Conference on Applications,
856	              Technologies, Architectures and Protocols for Computer
857	              Communications pp. 161-172, August 2001.

859	   [Chord]    Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.,
860	              Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A
861	              Scalable Peer-to-peer Lookup Service for Internet
862	              Applications", IEEE/ACM Transactions on Networking Volume
863	              11, Issue 1, pp. 17-32, February 2003.

865	   [I-D.ietf-p2psip-concepts]
866	              Bryan, D., Matthews, P., Shim, E., Willis, D., and S.
867	              Dawkins, "Concepts and Terminology for Peer to Peer SIP",
868	              draft-ietf-p2psip-concepts-05 (work in progress), July
869	              2013.

871	   [Pastry]   Rowstron, A. and P. Druschel, "Pastry: Scalable,
872	              Decentralized Object Location and Routing for Large-Scale
873	              Peer-to-Peer Systems", In Proceedings of the IFIP/ACM
874	              International Conference on Distribued Systems Platforms
875	              pp. 329-350, November 2001.

877	   [binzenhofer2006]
878	              Binzenhofer, A., Kunzmann, G., and R. Henjes, "A Scalable
879	              Algorithm to Monitor Chord-Based P2P Systems at Runtime",
880	              In Proceedings of the 20th IEEE International Parallel and
881	              Distributed Processing Symposium (IPDPS) pp. 1-8, April
882	              2006.

884	   [ghinita2006]
885	              Ghinita, G. and Y. Teo, "An Adaptive Stabilization
886	              Framework for Distributed Hash Tables", In Proceedings of
887	              the 20th IEEE International Parallel and Distributed
888	              Processing Symposium (IPDPS) pp. 29-38, April 2006.

890	   [horowitz2003]
891	              Horowitz, K. and D. Malkhi, "Estimating Network Size from
892	              Local Information", Information Processing Letters Volume
893	              88, Issue 5, pp. 237-243, December 2003.

895	   [kostoulas2005]
896	              Kostoulas, D., Psaltoulis, D., Gupta, I., Birman, K., and
897	              A. Demers, "Decentralized Schemes for Size Estimation in
898	              Large and Dynamic Groups", In Proceedings of the 4th IEEE
899	              International Symposium on Network Computing and
900	              Applications pp. 41-48, July 2005.

902	   [krishnamurthy2008]
903	              Krishnamurthy, S., El-Ansary, S., Aurell, E., and S.
904	              Haridi, "Comparing Maintenance Strategies for Overlays",
905	              In Proceedings of the 16th Euromicro Conference on
906	              Parallel, Distributed and Network-Based Processing pp.
907	              473-482, February 2008.

909	   [li2004]   Li, J., Strinbling, J., Gil, T., Morris, R., and M.
910	              Kaashoek, "Comparing the Performance of Distributed Hash
911	              Tables Under Churn", Peer-to-Peer Systems III, volume 3279
912	              of Lecture Notes in Computer Science Springer, pp. 87-99,
913	              February 2005.

915	   [liben-nowell2002]
916	              Liben-Nowell, D., Balakrishnan, H., and D. Karger,
917	              "Observations on the Dynamic Evolution of Peer-to-Peer
918	              Networks", In Proceedings of the 1st International
919	              Workshop on Peer-to-Peer Systems (IPTPS) pp. 22-33, March
920	              2002.

922	   [maenpaa2009]
923	              Maenpaa, J. and G. Camarillo, "A Study on Maintenance
924	              Operations in a Chord-Based Peer-to-Peer Session
925	              Initiation Protocol Overlay Network", In Proceedings of
926	              the 23rd IEEE International Parallel and Distributed
927	              Processing Symposium (IPDPS) pp. 1-9, May 2009.

929	   [mahajan2003]
930	              Mahajan, R., Castro, M., and A. Rowstron, "Controlling the
931	              Cost of Reliability in Peer-to-Peer Overlays", In
932	              Proceedings of the 2nd International Workshop on Peer-to-
933	              Peer Systems (IPTPS) pp. 21-32, February 2003.

935	   [rhea2004]
936	              Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz,
937	              "Handling Churn in a DHT", In Proceedings of the USENIX
938	              Annual Technical Conference pp. 127-140, June 2004.

940	Authors' Addresses

942	   Jouni Maenpaa
943	   Ericsson
944	   Hirsalantie 11
945	   Jorvas  02420
946	   Finland

948	   Email: Jouni.Maenpaa@ericsson.com

950	   Gonzalo Camarillo
951	   Ericsson
952	   Hirsalantie 11
953	   Jorvas  02420
954	   Finland

956	   Email: Gonzalo.Camarillo@ericsson.com