idnits 2.17.1 

draft-irtf-p2prg-survey-search-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 3817.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3794.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3801.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3807.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 49 characters in excess of 72.

  == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 3, 2007) is 6258 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: '21-29' is mentioned on line 173, but not defined

  == Missing Reference: '30-52' is mentioned on line 174, but not defined

  == Missing Reference: '53-71' is mentioned on line 181, but not defined

  == Missing Reference: '72-100' is mentioned on line 185, but not defined

  == Missing Reference: '101-112' is mentioned on line 195, but not defined

  == Missing Reference: '113-131' is mentioned on line 198, but not defined

  == Missing Reference: '132-139' is mentioned on line 200, but not defined

  == Missing Reference: '140-160' is mentioned on line 203, but not defined

  == Missing Reference: '161-171' is mentioned on line 204, but not defined

  == Missing Reference: '172-182' is mentioned on line 207, but not defined

  == Missing Reference: '183-197' is mentioned on line 211, but not defined

  == Missing Reference: '198-200' is mentioned on line 218, but not defined

  == Missing Reference: '201-222' is mentioned on line 219, but not defined

  == Missing Reference: '223-228' is mentioned on line 226, but not defined

  == Missing Reference: '229-247' is mentioned on line 229, but not defined

  == Missing Reference: '284-287' is mentioned on line 716, but not defined

  == Missing Reference: '378-380' is mentioned on line 2002, but not defined

  == Missing Reference: '49-51' is mentioned on line 2079, but not defined

  == Unused Reference: 'RFC2119' is defined on line 2485, but no explicit
     reference was found in the text

  == Unused Reference: '26' is defined on line 2566, but no explicit
     reference was found in the text

  == Unused Reference: '28' is defined on line 2573, but no explicit
     reference was found in the text

  == Unused Reference: '33' is defined on line 2590, but no explicit
     reference was found in the text

  == Unused Reference: '35' is defined on line 2597, but no explicit
     reference was found in the text

  == Unused Reference: '53' is defined on line 2657, but no explicit
     reference was found in the text

  == Unused Reference: '57' is defined on line 2669, but no explicit
     reference was found in the text

  == Unused Reference: '58' is defined on line 2672, but no explicit
     reference was found in the text

  == Unused Reference: '63' is defined on line 2689, but no explicit
     reference was found in the text

  == Unused Reference: '64' is defined on line 2693, but no explicit
     reference was found in the text

  == Unused Reference: '68' is defined on line 2706, but no explicit
     reference was found in the text

  == Unused Reference: '69' is defined on line 2711, but no explicit
     reference was found in the text

  == Unused Reference: '74' is defined on line 2727, but no explicit
     reference was found in the text

  == Unused Reference: '94' is defined on line 2794, but no explicit
     reference was found in the text

  == Unused Reference: '96' is defined on line 2801, but no explicit
     reference was found in the text

  == Unused Reference: '101' is defined on line 2818, but no explicit
     reference was found in the text

  == Unused Reference: '102' is defined on line 2822, but no explicit
     reference was found in the text

  == Unused Reference: '103' is defined on line 2824, but no explicit
     reference was found in the text

  == Unused Reference: '104' is defined on line 2828, but no explicit
     reference was found in the text

  == Unused Reference: '106' is defined on line 2835, but no explicit
     reference was found in the text

  == Unused Reference: '108' is defined on line 2842, but no explicit
     reference was found in the text

  == Unused Reference: '109' is defined on line 2846, but no explicit
     reference was found in the text

  == Unused Reference: '110' is defined on line 2852, but no explicit
     reference was found in the text

  == Unused Reference: '112' is defined on line 2859, but no explicit
     reference was found in the text

  == Unused Reference: '113' is defined on line 2865, but no explicit
     reference was found in the text

  == Unused Reference: '114' is defined on line 2868, but no explicit
     reference was found in the text

  == Unused Reference: '115' is defined on line 2872, but no explicit
     reference was found in the text

  == Unused Reference: '116' is defined on line 2877, but no explicit
     reference was found in the text

  == Unused Reference: '117' is defined on line 2881, but no explicit
     reference was found in the text

  == Unused Reference: '118' is defined on line 2884, but no explicit
     reference was found in the text

  == Unused Reference: '120' is defined on line 2891, but no explicit
     reference was found in the text

  == Unused Reference: '121' is defined on line 2895, but no explicit
     reference was found in the text

  == Unused Reference: '122' is defined on line 2899, but no explicit
     reference was found in the text

  == Unused Reference: '123' is defined on line 2905, but no explicit
     reference was found in the text

  == Unused Reference: '124' is defined on line 2908, but no explicit
     reference was found in the text

  == Unused Reference: '125' is defined on line 2910, but no explicit
     reference was found in the text

  == Unused Reference: '126' is defined on line 2913, but no explicit
     reference was found in the text

  == Unused Reference: '127' is defined on line 2916, but no explicit
     reference was found in the text

  == Unused Reference: '128' is defined on line 2919, but no explicit
     reference was found in the text

  == Unused Reference: '129' is defined on line 2922, but no explicit
     reference was found in the text

  == Unused Reference: '130' is defined on line 2926, but no explicit
     reference was found in the text

  == Unused Reference: '131' is defined on line 2929, but no explicit
     reference was found in the text

  == Unused Reference: '133' is defined on line 2934, but no explicit
     reference was found in the text

  == Unused Reference: '134' is defined on line 2937, but no explicit
     reference was found in the text

  == Unused Reference: '135' is defined on line 2940, but no explicit
     reference was found in the text

  == Unused Reference: '136' is defined on line 2944, but no explicit
     reference was found in the text

  == Unused Reference: '137' is defined on line 2947, but no explicit
     reference was found in the text

  == Unused Reference: '139' is defined on line 2953, but no explicit
     reference was found in the text

  == Unused Reference: '140' is defined on line 2957, but no explicit
     reference was found in the text

  == Unused Reference: '141' is defined on line 2960, but no explicit
     reference was found in the text

  == Unused Reference: '143' is defined on line 2967, but no explicit
     reference was found in the text

  == Unused Reference: '144' is defined on line 2971, but no explicit
     reference was found in the text

  == Unused Reference: '145' is defined on line 2974, but no explicit
     reference was found in the text

  == Unused Reference: '146' is defined on line 2977, but no explicit
     reference was found in the text

  == Unused Reference: '147' is defined on line 2980, but no explicit
     reference was found in the text

  == Unused Reference: '148' is defined on line 2983, but no explicit
     reference was found in the text

  == Unused Reference: '149' is defined on line 2987, but no explicit
     reference was found in the text

  == Unused Reference: '155' is defined on line 3007, but no explicit
     reference was found in the text

  == Unused Reference: '156' is defined on line 3011, but no explicit
     reference was found in the text

  == Unused Reference: '158' is defined on line 3018, but no explicit
     reference was found in the text

  == Unused Reference: '159' is defined on line 3021, but no explicit
     reference was found in the text

  == Unused Reference: '160' is defined on line 3024, but no explicit
     reference was found in the text

  == Unused Reference: '161' is defined on line 3027, but no explicit
     reference was found in the text

  == Unused Reference: '162' is defined on line 3031, but no explicit
     reference was found in the text

  == Unused Reference: '163' is defined on line 3034, but no explicit
     reference was found in the text

  == Unused Reference: '164' is defined on line 3038, but no explicit
     reference was found in the text

  == Unused Reference: '166' is defined on line 3045, but no explicit
     reference was found in the text

  == Unused Reference: '167' is defined on line 3049, but no explicit
     reference was found in the text

  == Unused Reference: '169' is defined on line 3054, but no explicit
     reference was found in the text

  == Unused Reference: '170' is defined on line 3057, but no explicit
     reference was found in the text

  == Unused Reference: '171' is defined on line 3061, but no explicit
     reference was found in the text

  == Unused Reference: '172' is defined on line 3064, but no explicit
     reference was found in the text

  == Unused Reference: '173' is defined on line 3067, but no explicit
     reference was found in the text

  == Unused Reference: '174' is defined on line 3070, but no explicit
     reference was found in the text

  == Unused Reference: '175' is defined on line 3074, but no explicit
     reference was found in the text

  == Unused Reference: '176' is defined on line 3078, but no explicit
     reference was found in the text

  == Unused Reference: '177' is defined on line 3081, but no explicit
     reference was found in the text

  == Unused Reference: '178' is defined on line 3084, but no explicit
     reference was found in the text

  == Unused Reference: '179' is defined on line 3087, but no explicit
     reference was found in the text

  == Unused Reference: '180' is defined on line 3091, but no explicit
     reference was found in the text

  == Unused Reference: '181' is defined on line 3094, but no explicit
     reference was found in the text

  == Unused Reference: '182' is defined on line 3097, but no explicit
     reference was found in the text

  == Unused Reference: '183' is defined on line 3100, but no explicit
     reference was found in the text

  == Unused Reference: '184' is defined on line 3103, but no explicit
     reference was found in the text

  == Unused Reference: '185' is defined on line 3106, but no explicit
     reference was found in the text

  == Unused Reference: '186' is defined on line 3109, but no explicit
     reference was found in the text

  == Unused Reference: '187' is defined on line 3112, but no explicit
     reference was found in the text

  == Unused Reference: '188' is defined on line 3114, but no explicit
     reference was found in the text

  == Unused Reference: '189' is defined on line 3118, but no explicit
     reference was found in the text

  == Unused Reference: '190' is defined on line 3121, but no explicit
     reference was found in the text

  == Unused Reference: '191' is defined on line 3124, but no explicit
     reference was found in the text

  == Unused Reference: '192' is defined on line 3127, but no explicit
     reference was found in the text

  == Unused Reference: '193' is defined on line 3130, but no explicit
     reference was found in the text

  == Unused Reference: '194' is defined on line 3133, but no explicit
     reference was found in the text

  == Unused Reference: '196' is defined on line 3140, but no explicit
     reference was found in the text

  == Unused Reference: '197' is defined on line 3143, but no explicit
     reference was found in the text

  == Unused Reference: '198' is defined on line 3146, but no explicit
     reference was found in the text

  == Unused Reference: '199' is defined on line 3149, but no explicit
     reference was found in the text

  == Unused Reference: '200' is defined on line 3151, but no explicit
     reference was found in the text

  == Unused Reference: '201' is defined on line 3153, but no explicit
     reference was found in the text

  == Unused Reference: '203' is defined on line 3161, but no explicit
     reference was found in the text

  == Unused Reference: '204' is defined on line 3165, but no explicit
     reference was found in the text

  == Unused Reference: '205' is defined on line 3169, but no explicit
     reference was found in the text

  == Unused Reference: '206' is defined on line 3173, but no explicit
     reference was found in the text

  == Unused Reference: '207' is defined on line 3175, but no explicit
     reference was found in the text

  == Unused Reference: '208' is defined on line 3178, but no explicit
     reference was found in the text

  == Unused Reference: '209' is defined on line 3182, but no explicit
     reference was found in the text

  == Unused Reference: '211' is defined on line 3190, but no explicit
     reference was found in the text

  == Unused Reference: '212' is defined on line 3193, but no explicit
     reference was found in the text

  == Unused Reference: '213' is defined on line 3197, but no explicit
     reference was found in the text

  == Unused Reference: '214' is defined on line 3200, but no explicit
     reference was found in the text

  == Unused Reference: '217' is defined on line 3211, but no explicit
     reference was found in the text

  == Unused Reference: '218' is defined on line 3214, but no explicit
     reference was found in the text

  == Unused Reference: '219' is defined on line 3216, but no explicit
     reference was found in the text

  == Unused Reference: '220' is defined on line 3219, but no explicit
     reference was found in the text

  == Unused Reference: '221' is defined on line 3223, but no explicit
     reference was found in the text

  == Unused Reference: '222' is defined on line 3228, but no explicit
     reference was found in the text

  == Unused Reference: '223' is defined on line 3231, but no explicit
     reference was found in the text

  == Unused Reference: '224' is defined on line 3235, but no explicit
     reference was found in the text

  == Unused Reference: '225' is defined on line 3238, but no explicit
     reference was found in the text

  == Unused Reference: '226' is defined on line 3241, but no explicit
     reference was found in the text

  == Unused Reference: '227' is defined on line 3243, but no explicit
     reference was found in the text

  == Unused Reference: '228' is defined on line 3246, but no explicit
     reference was found in the text

  == Unused Reference: '229' is defined on line 3249, but no explicit
     reference was found in the text

  == Unused Reference: '230' is defined on line 3252, but no explicit
     reference was found in the text

  == Unused Reference: '232' is defined on line 3259, but no explicit
     reference was found in the text

  == Unused Reference: '233' is defined on line 3262, but no explicit
     reference was found in the text

  == Unused Reference: '234' is defined on line 3265, but no explicit
     reference was found in the text

  == Unused Reference: '235' is defined on line 3268, but no explicit
     reference was found in the text

  == Unused Reference: '236' is defined on line 3271, but no explicit
     reference was found in the text

  == Unused Reference: '238' is defined on line 3279, but no explicit
     reference was found in the text

  == Unused Reference: '239' is defined on line 3283, but no explicit
     reference was found in the text

  == Unused Reference: '243' is defined on line 3299, but no explicit
     reference was found in the text

  == Unused Reference: '244' is defined on line 3302, but no explicit
     reference was found in the text

  == Unused Reference: '245' is defined on line 3306, but no explicit
     reference was found in the text

  == Unused Reference: '246' is defined on line 3309, but no explicit
     reference was found in the text

  == Unused Reference: '247' is defined on line 3313, but no explicit
     reference was found in the text

  == Unused Reference: '285' is defined on line 3426, but no explicit
     reference was found in the text

  == Unused Reference: '287' is defined on line 3431, but no explicit
     reference was found in the text

  == Unused Reference: '379' is defined on line 3731, but no explicit
     reference was found in the text


     Summary: 2 errors (**), 0 flaws (~~), 160 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	PEER-TO-PEER RESEARCH GROUP                                    J.Risson
2	                                                                T.Moors
3	Internet Draft                            University of New South Wales
4	Intended status: Informational                            March 3, 2007
5	Expires: September 2007

7	         Survey of Research towards Robust Peer-to-Peer Networks:
8	                              Search Methods
9	                   draft-irtf-p2prg-survey-search-01.txt

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that
14	   any applicable patent or other IPR claims of which he or she is
15	   aware have been or will be disclosed, and any of which he or she
16	   becomes aware will be disclosed, in accordance with Section 6 of
17	   BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html

35	   This Internet-Draft will expire on September 3, 2007.

37	Copyright Notice

39	   Copyright (C) The IETF Trust (2007).

41	Abstract

43	   The pace of research on peer-to-peer (P2P) networking in the last
44	   five years warrants a critical survey. P2P has the makings of a
45	   disruptive technology - it can aggregate enormous storage and
46	   processing resources while minimizing entry and scaling costs.

48	   Failures are common amongst massive numbers of distributed peers,
49	   though the impact of individual failures may be less than in
50	   conventional architectures. Thus the key to realizing P2P's potential
51	   in applications other than casual file sharing is robustness.

53	   P2P search methods are first couched within an overall P2P taxonomy.
54	   P2P indexes for simple key lookup are assessed, including those based
55	   on Plaxton trees, rings, tori, butterflies, de Bruijn graphs and skip
56	   graphs. Similarly, P2P indexes for keyword lookup, information
57	   retrieval and data management are explored. Finally, early efforts to
58	   optimize range, multi-attribute, join and aggregation queries over
59	   P2P indexes are reviewed. Insofar as they are available in the
60	   primary literature, robustness mechanisms and metrics are highlighted
61	   throughout. However, the low-level mechanisms that most affect
62	   robustness are not well isolated in the literature. Recommendations
63	   are given for future research.

65	Table of Contents

67	   1. Introduction
68	      1.1. Related Disciplines
69	      1.2. Structured and Unstructured Routing
70	      1.3. Indexes and Queries
71	   2. Index Types
72	      2.1. Local Index (Gnutella)
73	      2.2. Central Index (Napster)
74	      2.3. Distributed Index (Freenet)
75	   3. Semantic Free Index
76	      3.1. Origins
77	         3.1.1. Plaxton, Rajaraman, and Richa (PRR)
78	         3.1.2. Consistent Hashing
79	         3.1.3. Scalable Distributed Data Structures (LH*)
80	      3.2. Dependability
81	         3.2.1. Static Dependability
82	         3.2.2. Dynamic Dependability
83	         3.2.3. Ephemeral or Stable Nodes - O(log N) or O(1) Hops
84	         3.2.4. Simulation and Proof
85	      3.3. Latency
86	         3.3.1. Hop Count and the O(1)-Hop DHTs
87	         3.3.2. Proximity and the O(log N)-Hop DHTs
88	      3.4. Multicasting
89	         3.4.1. Multicasting vs Broadcasting
90	         3.4.2. Motivation for DHT-based Multicasting
91	         3.4.3. Design Issues
92	      3.5. Routing Geometries
93	         3.5.1. Plaxton Trees (Pastry, Tapestry)
94	         3.5.2. Rings (Chord, DKS)
95	         3.5.3. Tori (CAN)
96	         3.5.4. Butterflies (Viceroy)
97	         3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI)
98	         3.5.6. Skip Graphs
99	   4. Semantic Index
100	      4.1. Keyword Lookup
101	         4.1.1. Gnutella Enhancements
102	         4.1.2. Partition-by-Document, Partition-by-Keyword
103	         4.1.3. Partial Search, Exhaustive Search
104	      4.2. Information Retrieval
105	         4.2.1. Vector Model (PlanetP, FASD, eSearch)
106	         4.2.2. Latent Semantic Indexing (pSearch)
107	         4.2.3. Small Worlds
108	   5. Queries
109	      5.1. Range Queries
110	      5.2. Multi-Attribute Queries
111	      5.3. Join Queries
112	      5.4. Aggregation Queries
113	   6. Security Considerations
114	   7. IANA Considerations
115	   8. Conclusions
116	   9. Acknowledgments
117	   10. References
118	      10.1. Normative References
119	      10.2. Informative References
120	   Author's Addresses
121	   Intellectual Property Statement
122	   Disclaimer of Validity
123	   Copyright Statement
124	   Acknowledgment

126	1. Introduction

128	   Peer-to-peer (P2P) networks are those that exhibit three
129	   characteristics: self-organization, symmetric communication and
130	   distributed control [1]. A self-organizing P2P network "automatically
131	   adapts to the arrival, departure and failure of nodes" [2].
132	   Communication is symmetric in that peers act as both clients and
133	   servers. There is no centralized directory or control point. USENET
134	   servers and BGP peers have these traits [3], but the emphasis here is
135	   on the flurry of research since 2000. Leading examples include
136	   Gnutella [4], Freenet [5], Pastry [2], Tapestry [6], Chord [7], the
137	   Content Addressable Network (CAN) [8], pSearch [9] and Edutella [10].
138	   Some have suggested that peers are inherently unreliable [11]. Others
139	   have assumed well-connected, stable peers [12].

141	   This critical survey of P2P academic literature is warranted, given
142	   the intensity of recent research. At the time of writing, one
143	   research database lists over 5,800 P2P publications [13]. One vendor
144	   surveyed P2P products and deployments [14]. There is also a tutorial
145	   survey of leading P2P systems [15]. DePaoli and Mariani recently
146	   reviewed the dependability of some early P2P systems at a high level
147	   [16]. The need for a critical survey was flagged in the peer-to-peer
148	   research group of the Internet Research Task Force (IRTF) [17].

150	   P2P is potentially a disruptive technology with numerous
151	   applications, but this potential will not be realized unless it is
152	   demonstrated to be robust. A massively distributed search technique
153	   may yield numerous practical benefits for applications [18]. A P2P
154	   system has potential to be more dependable than architectures relying
155	   on a small number of centralized servers. It has potential to evolve
156	   better from small configurations - the capital outlays for high
157	   performance servers can be reduced and spread over time if a P2P
158	   assembly of general purpose nodes is used. A similar argument
159	   motivated the deployment of distributed databases - one thousand,
160	   off-the-shelf PC processors are more powerful and much less expensive
161	   than a large mainframe computer [19]. Storage and processing can be
162	   aggregated to achieve massive scale. Wasteful partitioning between
163	   servers or clusters can be avoided. As Gedik and Liu put it, if P2P
164	   is to find its way into applications other than casual file sharing,
165	   then reliability needs to be addressed [20].

167	   The taxonomy of Figure 1 divides the entire body of P2P research
168	   literature along four lines: search, storage, security and
169	   applications. This survey concentrates on search aspects. A P2P
170	   search network consists of an underlying index (Sections 2 to 4)
171	   and queries that propagate over that index (Section 5).

173	   Search [18, 21-29]
174	      Semantic-Free Indexes [2, 6, 7, 30-52]
175	         Plaxton Trees
176	         Rings
177	         Tori
178	         Butterflies
179	         de Bruijn Graphs
180	         Skip Graphs
181	      Semantic Indexes [4, 53-71]
182	         Keyword Lookup
183	         Peer Information Retrieval
184	         Peer Data Management
185	      Queries [20, 22, 23, 25, 32, 38, 41, 56, 72-100]
186	         Range Queries
187	         Multi-Attribute Queries
188	         Join Queries
189	         Aggregation Queries
190	         Continuous Queries
191	         Recursive Queries
192	         Adaptive Queries

194	   Storage
195	      Consistency & Replication [101-112]
196	         Eventual consistency
197	         Trade-offs
198	      Distribution [39, 42, 90, 92, 113-131]
199	         Epidemics, Bloom Filters
200	      Fault Tolerance [40, 105, 132-139]
201	         Erasure Coding
202	         Byzantine Agreement
203	      Locality [24, 43, 47, 140-160]
204	      Load Balancing [37, 86, 100, 107, 151, 161-171]

206	   Security
207	      Character [172-182]
208	         Identity
209	         Reputation and Trust
210	         Incentives
211	      Goals [25, 27, 71, 183-197]
212	         Availability
213	         Authenticity
214	         Anonymity
215	         Access Control
216	         Fair Trading

218	   Applications [1, 198-200]
219	      Memory [32, 90, 142, 201-222]
220	         File Systems
221	         Web
222	         Content Delivery Networks
223	         Directories
224	         Service Discovery
225	         Publish / Subscribe ...
226	      Intelligence [223-228]
227	         GRID
228	         Security...
229	      Communication [12, 92, 119, 229-247]
230	         Multicasting
231	         Streaming Media
232	         Mobility
233	         Sensors...

235	            Figure 1 Classification of P2P Research Literature.

237	   This survey is concerned with two questions. The first is "How do P2P
238	   search networks work?" This foundation is important given the pace
239	   and breadth of P2P research in the last five years. In Section 2,
240	   we classify indexes as local, centralized and distributed. Since
241	   distributed indexes are becoming dominant, they are given closer
242	   attention in Sections 3 and 4. Section 3 compares distributed P2P
243	   indexes for simple key lookup, in particular, their origins (Section
244	   3.1), dependability (Section 3.2), latency (Section 3.3), and
245	   their support for multicast (Section 3.4). It classifies those
246	   indexes according to their routing geometry (Section 3.5) - Plaxton
247	   trees, rings, tori, butterflies, de Bruijn graphs and skip graphs.
248	   Section 4 reviews distributed P2P indexes supporting keyword lookup
249	   (Section 4.1) and information retrieval (Section 4.2). Section 5
250	   probes the embryonic research on P2P queries, in particular, range
251	   queries (Section 5.1), multi-attribute queries (Section 5.2),
252	   join queries (Section 5.3) and aggregation queries (Section 5.4).

254	   The second question is "How robust are P2P search networks?" Insofar
255	   as it is available in the research literature, we tease out the
256	   robustness mechanisms and metrics throughout Sections 2 to 5.
257	   Unfortunately, robustness is often more sensitive to low-level design
258	   choices than it is to the broad P2P index structure, yet these
259	   underlying design choices are seldom isolated in the primary
260	   literature [248]. Furthermore, there has been little consensus on P2P
261	   robustness metrics (Section 3.2). Section 8 gives recommendations
262	   to address these important gaps.

264	1.1. Related Disciplines

266	   Peer-to-peer research draws upon numerous distributed systems
267	   disciplines. Networking researchers will recognize familiar issues of
268	   naming, routing and congestion control. P2P designs need to address
269	   routing and security issues across network region boundaries [152].
270	   Networking research has traditionally been host-centric. The web's
271	   Uniform Resource Identifiers are naturally tied to specific hosts,
272	   making object mobility a challenge [216].

274	   P2P work is data-centric [249]. P2P systems for dynamic object
275	   location and routing have borrowed heavily from the distributed
276	   systems corpus. Some have used replication, erasure codes and
277	   Byzantine agreement [111]. Others have used epidemics for durable
278	   peer group communication [39].

280	   Similarly, P2P research is set to benefit from database research
281	   [250]. Database researchers will recognize the need to reapply Codd's
282	   principle of physical data independence, that is, to decouple data
283	   indexes from the applications that use the data [23]. It was the
284	   invention of appropriate indexing mechanisms and query optimizations
285	   that enabled data independence. Database indexes like B+ trees have
286	   an analog in P2P's distributed hash tables (DHTs). Wide-area, P2P
287	   query optimization is a ripe, but challenging, area for innovation.

289	   More flexible distribution of objects comes with increased security
290	   risks. There are opportunities for security researchers to deliver
291	   new methods for availability, file authenticity, anonymity and access
292	   control [25]. Proactive and reactive mechanisms are needed to deal
293	   with large numbers of autonomous, distributed peers. To build robust
294	   systems from cooperating but self-interested peers, issues of
295	   identity, reputation, trust and incentives need to be tackled.
296	   Although it is beyond the scope of this paper, robustness against
297	   malicious attacks also ought to be addressed [195].

299	   Possibly the largest portion of P2P research has focused on basic
300	   routing structures [18], where research on algorithms comes to the
301	   fore. Should the overlay be "structured" or "unstructured"? Are the
302	   two approaches competing or complementary? Comparisons of the
303	   "structured" approaches (hypercubes, rings, toroids, butterflies, de
304	   Bruijn and skip graphs) have weighed the amount of routing state per
305	   peer and the number of links per peer against overlay hop-counts.
306	   While "unstructured" overlays initially used blind flooding and
307	   random walks, overheads usually trigger some structure, for example
308	   super-peers and clusters.

310	   P2P applications rely on cooperation between these disciplines.
311	   Applications have included file sharing, directories, content
312	   delivery networks, email, distributed computation, publish-subscribe
313	   middleware, multicasting, and distributed authentication. Which
314	   applications will be suited to which structures? Are there adaptable
315	   mechanisms which can decouple applications from the underlying data
316	   structures? What are the criteria for selection of applications
317	   amenable to a P2P design [1]?

319	   Robustness is emphasized throughout the survey. We are particularly
320	   interested in two aspects. The first, dependability, was a leading
321	   design goal for the original Internet [251]. It deserves the same
322	   status in P2P. The measures of dependability are well established:
323	   reliability, a measure of the mean-time-to-failure (MTTF);
324	   availability, a measure of both the MTTF and the mean-time-to-repair
325	   (MTTR); maintainability; and safety [252]. The second aspect is the
326	   ability to accommodate variation in outcome, which one could call
327	   adaptability. Its measures have yet to be defined. In the context of
328	   the Internet, it was only recently acknowledged as a first class
329	   requirement [253]. In P2P, it means planning for the tussles over
330	   resources and identity. It means handling different kinds of queries
331	   and accommodating changeable application requirements with minimal
332	   intervention. It means "organic scaling" [22], whereby the system
333	   grows gracefully, without a priori data center costs or architectural
334	   breakpoints.
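
As a worked illustration of the availability measure mentioned above,
the sketch below assumes the common steady-state formulation
availability = MTTF / (MTTF + MTTR). The survey itself does not state
this formula; the function name and the example figures are
hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming A = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical peer: fails on average every 99 hours and takes
# 1 hour to come back.
print(round(availability(99.0, 1.0), 4))
```

Under this assumption, shortening the repair time raises availability
even when the failure rate is unchanged, which is why both MTTF and
MTTR appear in the measure.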

336	   In the following section, we discuss one notable omission from the
337	   taxonomy of P2P networking in Figure 1 - routing.

339	1.2. Structured and Unstructured Routing

341	   P2P routing algorithms have been classified as "structured" or
342	   "unstructured". Peers in unstructured overlay networks join by
343	   connecting to any existing peers [254]. In structured overlays, the
344	   identifier of the joining peer determines the set of peers that it
345	   connects to [254]. Early instantiations of Gnutella were unstructured
346	   - keyword queries were flooded widely [255]. Napster [256] had
347	   decentralized content and a centralized index, so only partially
348	   satisfies the distributed control criteria for P2P systems. Early
349	   structured algorithms included Plaxton, Rajaraman and Richa (PRR)
350	   [30], Pastry [2], Tapestry [31], Chord [7] and the Content
351	   Addressable Network [8]. Mischke and Stiller recently classified P2P
352	   systems by the presence or absence of structure in routing tables and
353	   network topology [257].

355	   Some have cast unstructured and structured algorithms as competing
356	   alternatives. Unstructured approaches have been called "first
357	   generation", implicitly inferior to the "second generation"
358	   structured algorithms [2, 31]. When generic key lookups are required,
359	   these structured, key-based routing schemes can guarantee location of
360	   a target within a bounded number of hops [23]. The broadcasting
361	   unstructured approaches, however, may have large routing costs, or
362	   fail to find available content [22]. Despite the apparent advantages
363	   of structured P2P, several research groups are still pursuing
364	   unstructured P2P.

366	   There have been two main criticisms of structured systems [61]. The
367	   first relates to peer transience, which in turn affects robustness.
368	   Chawathe et al. opined that highly transient peers are not well
369	   supported by DHTs [61]. P2P systems often exhibit "churn", with peers
370	   continually arriving and departing. One objection to concerns about
371	   highly transient peers is that many applications use peers in well-
372	   connected parts of the network. The Tapestry authors analysed the
373	   impact of churn in a network of 1000 nodes [31]. Others opined that
374	   it is possible to maintain a robust DHT at relatively low cost [258].
375	   Very few papers have quantitatively compared the resilience of
376	   structured systems. Loguinov, Kumar et al. claimed that there were
377	   only two such works [24, 36].

379	   The second criticism of structured systems is that they do not
380	   support keyword searches and complex queries as well as unstructured
381	   systems. Given the current file-sharing deployments, keyword searches
382	   seem more important than exact-match key searches in the short term.
383	   Paraphrased, "most queries are for hay, not needles" [61].

385	   More recently, some have justifiably seen unstructured and structured
386	   proposals as complementary, and have devised hybrid models [259].
387	   Their starting point was the observation that unstructured flooding
388	   or random walks are inefficient for data that is not highly
389	   replicated across the P2P network. Structured graphs can find keys
390	   efficiently, irrespective of replication. Castro et al. proposed
391	   Structella, a hybrid of Gnutella built on top of Pastry [259].
392	   Another design used structured search for rare items and unstructured
393	   search for massively replicated items [54].

395	   However, the "structured versus unstructured routing" taxonomy is
396	   becoming less useful, for two reasons. Firstly, most "unstructured"
397	   proposals have evolved and incorporated structure. Consider the
398	   classic "unstructured" system, Gnutella [4]. For scalability, its
399	   peers are either ultrapeers or leaf nodes. This hierarchy is
400	   augmented with a query routing protocol whereby ultrapeers receive a
401	   hashed summary of the resource names available at leaf-nodes. Between
402	   ultrapeers, simple query broadcast is still used, though methods to
403	   reduce the query load here have been considered [260]. Secondly,
404	   there are emerging schema-based P2P designs [59], with super-node
405	   hierarchies and structure within documents. These are quite distinct
406	   from the structured DHT proposals.

408	1.3. Indexes and Queries

410	   Given that most, if not all, P2P designs today assume some structure,
411	   a more instructive taxonomy would describe the structure. In this
412	   survey, we use a database taxonomy in lieu of the networking
413	   taxonomy, as suggested by Hellerstein, Cooper and Garcia-Molina [23,
414	   261]. The structure is determined by the type of index (Sections 2,
415	   3 and 4). Queries feature in lieu of routing (Section 5). The
416	   DHT algorithms implement a "semantic-free index" [216]. They are
417	   oblivious of whether keys represent document titles, meta-data, or
418	   text. Gnutella-like and schema-based proposals have a "semantic
419	   index".

421	   Index engineering is at the heart of P2P search methods. It captures
422	   a broad range of P2P issues, as demonstrated by the Search/Index
423	   Links model [261]. As Manber put it, "the most important of the tools
424	   for information retrieval is the index - a collection of terms with
425	   pointers to places where information about documents can be
426	   found" [262]. Sen and Wang noted that a "P2P network" usually
427	   consists of connections between hosts for application-layer
428	   signaling, rather than for the data transfer itself [263].
429	   Similarly, we concentrate on the "signaled" indexes and queries.

431	   Our focus here is the dependability and adaptability of the search
432	   network. Static dependability is a measure of how well queries route
433	   around failures in a network that is normally fault-free. Dynamic
434	   dependability gives an indication of query success when nodes and
435	   data are continually joining and leaving the P2P system. An adaptable
436	   index accommodates change in the data and query distribution. It
437	   enables data independence, in that it facilitates changes to the data
438	   layout without requiring changes to the applications that use the
439	   data [23]. An adaptable P2P system can support rich queries for a
440	   wide range of applications. Some applications benefit from simple,
441	   semantic-free key lookups [264]. Others require more complex,
442	   Structured Query Language (SQL)-like queries to find documents with
443	   multiple keywords, or to aggregate or join query results from
444	   distributed relations [22].

446	2. Index Types

448	   A P2P index can be local, centralized or distributed. With a local
449	   index, a peer only keeps the references to its own data, and does not
450	   receive references for data at other nodes. The very early Gnutella
451	   design epitomized the local index (Section 2.1). In a centralized
452	   index, a single server keeps references to data on many peers. The
453	   classic example is Napster (Section 2.2). With distributed indexes,
454	   pointers towards the target reside at several nodes. One very early
455	   example is Freenet (Section 2.3). Distributed indexes are used in
456	   most P2P designs nowadays - they dominate this survey.

458	   P2P indexes can also be classified as non-forwarding and forwarding.
459	   When queries are guided by a non-forwarding index, they jump to the
460	   node containing the target data in a single hop. There have been
461	   semantic and semantic-free one-hop schemes [138, 265, 266]. Where
462	   scalability to a massive number of peers is required, these schemes
463	   have been extended to two-hops [267, 268]. More common are the
464	   forwarding P2Ps where the number of hops varies with the total number
465	   of peers, often logarithmically. The related tradeoffs between
466	   routing state, lookup latency, update bandwidth and peer churn are
467	   critical to total system dependability.

469	2.1. Local Index (Gnutella)

471	   P2Ps with a purely local data index are becoming rare. In such
472	   designs, peers flood queries widely and only index their own content.
473	   They enable rich queries - the search is not limited to a simple key
474	   lookup. However, they also generate a large volume of query traffic
475	   with no guarantee that a match will be found, even if it does exist
476	   on the network. For example, to find potential peers on the early
477	   instantiations of Gnutella, 'ping' messages were broadcast over the
478	   P2P network and the 'pong' responses were used to build the node
479	   index. Then small 'query' messages, each with a list of keywords,
480	   were broadcast to peers, which responded with matching filenames [4].

482	   There have been numerous attempts to improve the scalability of
483	   local-index P2P networks. Gnutella uses fixed time-to-live (TTL)
484	   rings, where the query's TTL is set less than 7-10 hops [4]. Small
485	   TTLs reduce the network traffic and the load on peers, but also
486	   reduce the chances of a successful query hit. One paper reported,
487	   perhaps a little too bluntly, that the fixed "TTL-based mechanism
488	   does not work" [67]. To address this TTL selection problem, they
489	   proposed an expanding ring, known elsewhere as iterative deepening
490	   [29]. It uses successively larger TTL counters until there is a
491	   match. The flooding, ring and expanding ring methods all increase
492	   network load with duplicated query messages. A random walk, whereby
493	   an unduplicated query wanders about the network, does indeed reduce
494	   the network load but massively increases the search latency. One
495	   solution is to replicate the query k times at each peer. Called
496	   random k-walkers, this technique can be coupled with TTL limits, or
497	   periodic checks with the query originator, to cap the query load
498	   [67]. Adamic, Lukose et al. suggested that the random walk searches
499	   be directed to nodes with higher degree, that is, with larger numbers
500	   of inter-peer connections [269]. They assumed that higher-degree
501	   peers are also capable of higher query throughputs. However, without
502	   some balancing design rule, such peers would be swamped with the
503	   entire P2P signaling traffic. In addition to the above approaches,
504	   there is the 'directed breadth-first' algorithm [29]. It forwards
505	   queries within a subset of peers selected according to heuristics on
506	   previous performance, like the number of successful query results.
507	   Another algorithm, called probabilistic flooding, has been modeled
508	   using percolation theory [270].
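
The fixed-TTL flood, the expanding ring and the random k-walkers
described above can be contrasted with a small sketch. It is
illustrative only: the overlay is a hypothetical five-peer chain, all
function and variable names are invented here, and the k walkers are
assumed to be launched by the query originator. It is not the
algorithm of any cited paper.

```python
import random
from collections import deque


def flood(peers, items, start, wanted, ttl):
    """Flood a query at most `ttl` hops from `start`.

    Returns (holder or None, number of query messages sent).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if wanted in items.get(node, ()):
            return node, messages
        if hops == ttl:
            continue  # TTL exhausted on this branch
        for peer in peers[node]:
            messages += 1  # duplicate deliveries still cost a message
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, hops + 1))
    return None, messages


def expanding_ring(peers, items, start, wanted, ttls=(1, 2, 4, 7)):
    """Re-flood with successively larger TTLs until there is a match."""
    total = 0
    for ttl in ttls:
        holder, sent = flood(peers, items, start, wanted, ttl)
        total += sent
        if holder is not None:
            return holder, ttl, total
    return None, ttls[-1], total


def k_walkers(peers, items, start, wanted, k=4, ttl=16, rng=None):
    """Launch k random walkers from `start`, each limited to `ttl` steps."""
    rng = rng or random.Random(0)
    walkers = [start] * k
    messages = 0
    for _ in range(ttl):
        for i, node in enumerate(walkers):
            if wanted in items.get(node, ()):
                return node, messages
            walkers[i] = rng.choice(peers[node])
            messages += 1
    for node in walkers:  # check where each walker finally stopped
        if wanted in items.get(node, ()):
            return node, messages
    return None, messages


# Hypothetical chain a-b-c-d-e; only peer "e" holds the item.
PEERS = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d"]}
ITEMS = {"e": {"song"}}
print(expanding_ring(PEERS, ITEMS, "a", "song"))
print(k_walkers(PEERS, ITEMS, "a", "song", k=4, ttl=64))
```

The message counters make the trade-off visible: each wider ring
repeats the traffic of the narrower ones, while the walkers send at
most k messages per step but may need many steps.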

510	   Several measurement studies have investigated locally indexed P2Ps.
511	   Jovanovic noted Gnutella's power law behaviour [70]. Sen and Wang
512	   compared the performance of Gnutella, FastTrack [271] and Direct
513	   Connect [263, 272, 273]. At the time, only Gnutella used local data
514	   indexes. All three schemes now use distributed data indexes, with
515	   hierarchy in the form of Ultrapeers (Gnutella), Super-Nodes
516	   (FastTrack) and Hubs (Direct Connect). It was found that a very small
517	   percentage of peers have a very high degree and that the total system
518	   dependability is at the mercy of such peers. While peer up-time and
519	   bandwidth were heavy-tailed, they did not fit well with the Zipf
520	   distribution. Fortunately for Internet Service Providers, measures
521	   aggregated by IP prefix and Autonomous System (AS) were more stable
522	   than for individual IP addresses. A study of University of Washington
523	   traffic found that Gnutella and Kazaa together contributed 43% of the
524	   university's total TCP traffic [274]. They also reported a heavy-
525	   tailed distribution, with 600 external peers (out of 281,026)
526	   delivering 26% of Kazaa bytes to internal peers. Furthermore, objects
527	   retrieved from the P2P network were typically three orders of
528	   magnitude larger than web objects - 300 objects contributed to almost
529	   half of the total outbound Kazaa bandwidth. Others reported
530	   Gnutella's topology mismatch, whereby only 2-5% of P2P connections
531	   link peers in the same AS, despite over 40% of peers being in the top
532	   10 ASes [65]. Together these studies underscore the significance of
533	   multimedia sharing applications. They motivate interesting caching
534	   and locality solutions to the topology mismatch problem.

536	   These same studies bear out one main dependability lesson: total
537	   system dependability may be sensitive to the dependability of high
538	   degree peers. The designers of Scamp translated this observation to
539	   the design heuristic, "have the degree of each node be of nearly
540	   equal size" [153]. They analyzed a system of N peers, with mean
541	   degree c.log(N), where link failures occur independently with
542	   probability e. If d>0 is fixed and c>(1+d)/(-log(e)) then the
543	   probability of graph disconnection goes to zero as N->infinity.
544	   Otherwise, if c<(1-d)/(-log(e)) then the probability of disconnection
545	   goes to one as N->infinity. They presented a localizer, which finds
546	   approximate minima to a global function of peer degree and arbitrary
547	   link costs using only local information. The Scamp overlay
548	   construction algorithms could support any of the flooding and walking
549	   routing schemes above, or other epidemic and multicasting schemes for
550	   that matter. Resilience to high churn rates was identified for future
551	   study.
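
The two conditions above straddle a critical multiplier of
1/(-log(e)). The sketch below simply evaluates that expression for a
few hypothetical link failure probabilities. It assumes natural
logarithms, which the text does not specify, and renames the failure
probability to eps to avoid confusion with Euler's number. The
function names are invented for illustration, and the result is
asymptotic in N, so the figures are indicative only.

```python
import math


def critical_c(eps: float) -> float:
    """Critical multiplier 1 / (-log(eps)) separating the two regimes."""
    if not 0.0 < eps < 1.0:
        raise ValueError("link failure probability must lie in (0, 1)")
    return 1.0 / -math.log(eps)


def mean_degree_above_threshold(n_peers: int, eps: float,
                                d: float = 0.1) -> float:
    """Mean degree c.log(N) with c = (1 + d) / (-log(eps)), d > 0.

    This is only the asymptotic condition; it gives no guarantee for
    any particular finite N.
    """
    if n_peers < 2 or d <= 0:
        raise ValueError("need at least two peers and d > 0")
    return (1.0 + d) * critical_c(eps) * math.log(n_peers)


for eps in (0.01, 0.1, 0.5):  # hypothetical failure probabilities
    print(eps, round(critical_c(eps), 3),
          round(mean_degree_above_threshold(10_000, eps), 1))
```

As expected from the formula, less reliable links (larger eps) call
for a larger mean degree to stay on the connected side of the
threshold.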

553	2.2. Central Index (Napster)

555	   Centralized schemes like Napster [256] are significant because they
556	   were the first to demonstrate the P2P scalability that comes from
557	   separating the data index from the data itself. Ultimately 36 million
558	   Napster users lost their service not because of technical failure,
559	   but because the single administration was vulnerable to the legal
560	   challenges of record companies [275].

562	   There has since been little research on P2P systems with central data
563	   indexes. Such systems have also been called 'hybrid' since the index
564	   is centralized but the data is distributed. Yang and Garcia-Molina
565	   devised a four-way classification of hybrid systems [276]: unchained
566	   servers, where users whose index is on one server do not see other
567	   servers' indexes; chained servers, where the server that receives a
568	   query forwards it to a list of servers if it does not own the index
569	   itself; full replication, where all centralized servers keep a
570	   complete index of all available metadata; and hashing, where keywords
571	   are hashed to the server where the associated inverted list is kept.
572	   The unchained architecture was used by Napster, but it has the
573	   disadvantage that users do not see all indexed data in the system.
574	   Strictly speaking, the other three options illustrate the distributed
575	   data index, not the central index. The chained architecture was
576	   recommended as the optimum for the music-swapping application at the
577	   time. The methods by which clients update the central index were
578	   classified as batch or incremental, with the optimum determined by
579	   the query-to-login ratio. Measurements were derived from a clone of
580	   Napster called OpenNap [277]. Another study of live Napster data
581	   reported wide variation in the availability of peers, a general
582	   unwillingness to share files (20-40% of peers share few or no files),
583	   and a common understatement of available bandwidth so as to
584	   discourage other peers from sharing one's link [202].

586	   Influenced by Napster's early demise, the P2P research community may
587	   have prematurely turned its back on centralized architectures.
588	   Chawathe, Ratnasamy et al. opined that Google and Yahoo demonstrate
589	   the viability of a centralized index. They argued that "the real
590	   barriers to Napster-like designs are not technical but legal and
591	   financial" [61]. Even this view may be a little too harsh on the
592	   centralized architectures - it implies that they always have an
593	   upfront capital hurdle that is steeper than for distributed
594	   architectures. The closer one looks at scalable 'centralized'
595	   architectures, the less the distinction with 'distributed'
596	   architectures seems to matter. For example, it is clear that Google's
597	   designers consider Google a distributed, not centralized, file system
598	   [278]. Google demonstrates the scale and performance possible on
599	   commodity hardware, but still has a centralized master that is
600	   critical to the operation of each Google cluster. Time may prove that
601	   the value of emerging P2P networks, regardless of the centralized-
602	   versus-distributed classification, is that they smooth the capital
603	   outlays and remove the single points of failure across the spectra of
604	   scale and geographic distribution.

606	2.3. Distributed Index (Freenet)

608	   An important early P2P proposal for a distributed index was Freenet
609	   [5, 71, 279]. While its primary emphasis was the anonymity of peers,
610	   it did introduce a novel indexing scheme. Files are identified by
611	   low-level "content-hash" keys and by "secure signed-subspace" keys
612	   which ensure that only a file owner can write to a file while anyone
613	   can read from it. To find a file, the requesting peer first checks
614	   its local table for the node with keys closest to the target. When
615	   that node receives the query, it too checks for either a match or
616	   another node with keys close to the target. Eventually, the query
617	   either finds the target or exceeds time-to-live (TTL) limits. The
618	   query response traverses the successful query path in reverse,
619	   depositing a new routing table entry (the requested key and the data
620	   holder) at each peer. The insert message similarly steps towards the
621	   target node, updating routing table entries as it goes, and finally
622	   stores the file there. Whereas early versions of Gnutella used
623	   breadth-first flooding, Freenet uses a more economic depth-first
624	   search [280].
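
The lookup described above can be sketched as a depth-first search
that prefers the neighbour whose known key is closest to the target
and leaves a new routing entry on the way back. This is a
simplification under stated assumptions: keys are small integers,
"closeness" is taken to be absolute difference, and the names lookup,
tables and store are hypothetical. It is not Freenet's actual
implementation.

```python
def lookup(tables, store, node, key, ttl, visited=None):
    """Depth-first, closest-key-first search with a hop limit.

    tables: node -> {known key: peer believed to hold that key}
    store:  node -> set of keys actually held
    Returns the holder of `key`, or None if the TTL runs out.
    """
    visited = set() if visited is None else visited
    visited.add(node)
    if key in store.get(node, ()):
        return node
    if ttl == 0:
        return None
    # Try routing entries in order of key distance to the target.
    entries = sorted(tables.get(node, {}).items(),
                     key=lambda entry: abs(entry[0] - key))
    for _known_key, peer in entries:
        if peer in visited:
            continue  # avoid loops; fall back to the next-best entry
        holder = lookup(tables, store, peer, key, ttl - 1, visited)
        if holder is not None:
            # The response retraces the path and deposits a new entry.
            tables[node][key] = holder
            return holder
    return None


# Hypothetical four-node overlay; node "d" holds key 22.
tables = {"a": {10: "b", 90: "c"}, "b": {20: "d"}, "c": {}, "d": {}}
store = {"d": {22}}
print(lookup(tables, store, "a", 22, ttl=5))
print(tables["a"])
```

After the first successful query, node "a" holds a direct entry for
the requested key, which illustrates how the index builds up over
time.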

626	   An initial assessment has been done of Freenet's robustness. It was
627	   shown that in a network of 1000 nodes, the median query path length
628	   stayed under 20 hops for a failure of 30% of nodes. While the Freenet
629	   designers considered this as evidence that the system is
630	   "surprisingly robust against quite large failures" [71], the same
631	   datapoint may well be outside meaningful operating bounds. How many
632	   applications are useful when the first quartile of queries have path
633	   lengths of several hundred hops in a network of only 1000 nodes, per
634	   Figure 4 of [71]? To date, there has been no analysis of Freenet's
635	   dynamic robustness. For example, how does it perform when nodes are
636	   continually arriving and departing?

638	   There have been both criticisms and extensions of the early Freenet
639	   work. Gnutella proponents acknowledged the merit in Freenet's
640	   avoidance of query broadcasting [281]. However, they were critical on
641	   two counts: the exact file name is needed to construct a query; and
642	   exactly one match is returned for each query. P2P designs using DHTs,
643	   per Section 3, share similar characteristics - a precise query
644	   yields a precise response. The similarity is not surprising since
645	   Freenet also uses a hash function to generate keys. However, the
646	   query routing used in the DHTs has firmer theoretical foundations.
647	   Another difference with DHTs is that Freenet will take time, when a
648	   new node joins the network, to build an index that facilitates
649	   efficient query routing. By the inventor's own admission, this is
650	   damaging for a user's first impressions [282]. It was proposed to
651	   download a copy of routing tables from seed nodes at startup, even
652	   though the new node might be far from the seed node. Freenet's slow
653	   startup motivated Mache, Gilbert et al. to amend the overlay after
654	   failed requests and to place additional index entries on successful
655	   requests - they claim almost an order of magnitude reduction in
656	   average query path length [280]. Clarke also highlighted the lack of
657	   locality or bandwidth information available for efficient query
658	   routing decisions [282]. He proposed that each node gather response
659	   times, connection times and proportion of successful requests for
660	   each entry in the query routing table. When searching for a key that
661	   is not in its own routing table, it was proposed to estimate response
662	   times from the routing metrics for the nearest known keys and
663	   consequently choose the node that can retrieve the data fastest. The
664	   response time heuristic assumed that nodes close in the key space
665	   have similar response times. This assumption stemmed from early
666	   deployment observations that Freenet peers seemed to specialize in
667	   parts of the keyspace - it has not been justified analytically.
668	   Kronfol drew attention to Freenet's inability to do keyword searches
669	   [283]. He suggested that peers cache lists of weighted keywords in
670	   order to route queries to documents, using Term Frequency Inverse
671	   Document Frequency (TFIDF) measures and inverted indexes (Section
672	   4.2.1). With these methods, a peer can route queries for simple
673	   keyword lists or more complicated conjunctions and disjunctions of
674	   keywords. Robustness analysis and simulation of Kronfol's proposal
675	   remain open.

677	   The vast majority of P2P proposals in following sections rely on a
678	   distributed index.

680	3. Semantic Free Index

682	   Many of today's distributed network indexes are semantic. The
683	   semantic index is human-readable. For example, it might associate
684	   information with other keywords, a document, a database key or even
685	   an administrative domain. It makes it easy to associate objects with
686	   particular network providers, companies or organizations, as
687	   evidenced in the Domain Name System (DNS). However, it can also
688	   trigger legal tussles and frustrate content replication and migration
689	   [216].

691	   Distributed Hash Tables (DHTs) have been proposed to provide
692	   semantic-free, data-centric references. DHTs enable one to find an
693	   object's persistent key in a very large, changing set of hosts. They
694	   are typically designed for [23]:

696	   a) low degree. If each node keeps routing information for only a
697	   small number of other nodes, the impact of high node arrival and
698	   departure rates is contained;
699	   b) low hop count. The hops and delay introduced by the extra
700	   indirection are minimized;

702	   c) greedy routing. Nodes independently calculate a short path to the
703	   target. At each hop, the query moves closer to the target; and

705	   d) robustness. A path to the target can be found even when links or
706	   nodes fail.

708	3.1. Origins

710	   To understand the origins of recent DHTs, one needs to look to three
711	   contributions from the 1990s. The first two - Plaxton, Rajaraman, and
712	   Richa (PRR) [30] and Consistent Hashing [49] - were published within
713	   one month of each other. The third, the Scalable Distributed Data
714	   Structure (SDDS) [52], was curiously ignored in significant
715	   structured P2P designs despite having some similar goals [2, 6, 7].
716	   It has been briefly referenced in other P2P papers [46, 284-287].

718	3.1.1. Plaxton, Rajaraman, and Richa (PRR)

720	   PRR is the most recent of the three. It influenced the designs of
721	   Pastry [2], Tapestry [6] and Chord [7]. The value of PRR is that it
722	   can locate objects using fixed-length routing tables [6]. Objects and
723	   nodes are assigned a semantic-free address, for example a 160 bit
724	   key. Every node is effectively the root of a spanning tree. A message
725	   routes toward an object by matching longer address suffixes, until it
726	   encounters either the object's root node or another node with a
727	   'nearby' copy. It can route around link and node failure by matching
728	   nodes with a related suffix. The scheme has several disadvantages
729	   [6]: global knowledge is needed to construct the overlay; an object's
730	   root node is a single point of failure; nodes cannot be inserted and
731	   deleted; there is no mechanism for queries to avoid congestion hot
732	   spots.
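
The suffix-matching idea above can be illustrated with a toy router.
It is a sketch under simplifying assumptions: identifiers are
four-digit strings rather than 160 bit keys, a global node list
stands in for the per-node routing tables, and each hop is assumed to
choose the node with the smallest improvement in shared suffix
length. All names are hypothetical.

```python
def shared_suffix(a: str, b: str) -> int:
    """Number of trailing digits that two identifiers have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def prr_route(nodes, start, target):
    """Hop toward `target`, matching a longer suffix at every step.

    `nodes` is a global list standing in for the routing tables.
    Returns the list of nodes visited, ending at `target` if reachable.
    """
    path = [start]
    current = start
    while current != target:
        have = shared_suffix(current, target)
        better = [n for n in nodes if shared_suffix(n, target) > have]
        if not better:
            break  # no node matches a longer suffix
        current = min(better,
                      key=lambda n: (shared_suffix(n, target), n))
        path.append(current)
    return path


# Hypothetical node identifiers.
NODES = ["1234", "5674", "9934", "0234", "4321"]
print(prr_route(NODES, "4321", "0234"))
```

In this example each hop resolves one more trailing digit, so the hop
count is bounded by the identifier length.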

734	3.1.2. Consistent Hashing

736	   Consistent Hashing [288] strongly influenced the designs of Chord [7]
737	   and Koorde [37]. Karger et al. introduced Consistent Hashing in the
738	   context of the web caching problem [49]. Web servers could
739	   conceivably use standard hashing to place objects across a network of
740	   caches. Clients could use the approach to find the objects. For
741	   normal hashing, most object references would be moved when caches are
742	   added or deleted. On the other hand, Consistent Hashing is "smooth" -
743	   when caches are added or deleted, the minimum number of object
744	   references move so as to maintain load balancing. Consistent Hashing
745	   also ensures that the total number of caches responsible for a
746	   particular object is limited. Whereas Litwin's Linear Hashing (LH*)
747	   scheme requires 'buckets' to be added one at a time in sequence [50],
748	   Consistent Hashing allows them to be added in any order [49]. There
749	   is an open Consistent Hashing problem pertaining to the fraction of
750	   items moved when a node is inserted [165]. Extended Consistent
751	   Hashing was recently proposed to randomize queries over the spread of
752	   caches to significantly reduce the load variance [289].
753	   Interestingly, Karger [49] referred to an older DHT algorithm by
754	   Devine that used "a novel autonomous location discovery algorithm
755	   that learns the buckets' locations instead of using a centralized
756	   directory" [51].
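
   The "smoothness" property described above can be illustrated with a
   minimal sketch. It assumes SHA-1 as the hash function, a 2^32-point
   ring, and one ring point per cache; the class and function names
   are hypothetical and are not taken from [49] or [288].

```python
import bisect
import hashlib


def point(name: str) -> int:
    """Map a string to a point on a 2**32 ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ConsistentHashRing:
    """Hypothetical ring with a single point per cache."""

    def __init__(self, caches=()):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> cache name
        for cache in caches:
            self.add(cache)

    def add(self, cache: str) -> None:
        p = point(cache)
        bisect.insort(self._points, p)
        self._owner[p] = cache

    def remove(self, cache: str) -> None:
        p = point(cache)
        self._points.remove(p)
        del self._owner[p]

    def lookup(self, key: str) -> str:
        """Return the first cache clockwise from the key's point."""
        i = bisect.bisect_left(self._points, point(key))
        return self._owner[self._points[i % len(self._points)]]


def moved_keys(keys, before, after):
    """Keys whose placement differs between two mappings."""
    return [k for k in keys if before[k] != after[k]]


keys = [f"object-{i}" for i in range(1000)]
ring = ConsistentHashRing([f"cache-{i}" for i in range(8)])
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-new")
after = {k: ring.lookup(k) for k in keys}
moved = moved_keys(keys, before, after)

# For comparison: "normal" hashing, modulo the number of caches.
mod_before = {k: point(k) % 8 for k in keys}
mod_after = {k: point(k) % 9 for k in keys}
mod_moved = moved_keys(keys, mod_before, mod_after)

print(len(moved), "keys moved with consistent hashing")
print(len(mod_moved), "keys moved with modulo hashing")
```

   In this sketch every moved key lands on the newly added cache,
   whereas modulo placement typically relocates most keys. Practical
   designs usually give each cache several ring points to even out
   the load; that refinement is omitted here.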

758	3.1.3. Scalable Distributed Data Structures (LH*)

760	   In turn, Devine's primary point of reference was Litwin's work on
761	   SDDSs and the associated LH* algorithm [52]. An SDDS satisfies three
762	   design requirements: files grow to new servers only when existing
763	   servers are well loaded; there is no centralized directory; the basic
764	   operations like insert, search and split never require atomic updates
765	   to multiple clients. Honicky and Miller suggested the first
766	   requirement could be considered a limitation since expansion to new
767	   servers is not under administrative control [286]. Litwin recently
768	   noted numerous similarities and differences between LH* and Chord
769	   [290]. He found that both implement key search. Although LH* refers
770	   to clients and servers, nodes can operate as peers in both. Chord
771	   'splits' nodes when a new node is inserted, while LH* schedules
772	   'splits' to avoid overload. Chord requests travel O(logN) hops, while
773	   LH* client requests need at most two hops to find the target. Chord
774	   stores a small number of 'fingers' at each node. LH* servers store
775	   N/2 to N addresses while LH* clients store 1 to N addresses. This
776	   tradeoff between hop count and the size of the index affects system
777	   robustness, and bears striking similarity to recent one- and two-hop
778	   P2P schemes in Section 2. The arrival and departure of LH* clients
779	   does not disrupt LH* server metadata at all. Given the size of the
780	   index, the arrival and departure of LH* servers is likely to cause
781	   more churn than that of Chord nodes. Unlike Chord, LH* has a single
782	   point of failure, the split coordinator. It can be replicated.
783	   Alternatively it can be removed in later LH* variants, though details
784	   have not been progressed for lack of practical need [290].

786	3.2. Dependability

788	   We make four overall observations about the dependability of DHTs.
789	   Dependability metrics fall into two categories: static dependability,
790	   a measure of performance before recovery mechanisms take over; and
791	   dynamic dependability, for the most likely case in massive networks
792	   where there is continual failure and recovery ("churn").

794	3.2.1. Static Dependability

796	   Observation A: Static dependability comparisons show that no O(log N)
797	   DHT geometry is significantly more dependable than the other O(log N)
798	   geometries.

800	   Gummadi et al. compared the tree, hypercube, butterfly, ring, XOR and
801	   hybrid geometries. In such geometries, nodes generally know about
802	   O(log N) neighbors and route to a destination in O(log N) hops, where
803	   N is the number of nodes in the overlay. Gummadi et al. asked "Why
804	   not the ring?". They concluded that only the ring and XOR geometries
805	   permit flexible choice of both neighbors and alternative routes [24].
806	   Loguinov et al. added the de Bruijn graph to their comparison [36].
807	   They concluded that the classical analyses, for example the
808	   probability that a particular node becomes disconnected, yield no
809	   major differences between the resilience of Chord, CAN and de Bruijn
810	   graphs. Using bisection width (the minimum edge count between two
811	   equal partitions) and path overlap (the likelihood that backup paths
812	   will encounter the same failed nodes or links as the primary path),
813	   they argued for the superior resilience of the de Bruijn graph. In
814	   short, ring, XOR and de Bruijn graphs all permit flexible choice of
815	   alternative paths, but only in de Bruijn are the alternate paths
816	   independent of each other [36].

818	3.2.2. Dynamic Dependability

820	   Observation B: Dynamic dependability comparisons show that DHT
821	   dependability is sensitive to the underlying topology maintenance
822	   algorithms.

824	   Li et al. give the best comparison to date of several leading DHTs
825	   during churn [291]. They relate the disparate configuration
826	   parameters of Tapestry, Chord, Kademlia, Kelips and OneHop to
827	   fundamental design choices. For each of these DHTs, they plotted the
828	   optimal performance in terms of lookup latency (milliseconds) and
829	   fraction of failed lookups. The results led to several important
830	   insights about the underlying algorithms, for example: increasing
831	   routing table size is more cost-effective than increasing the rate of
832	   periodic stabilization; learning about new nodes during the lookup
833	   process sometimes eliminates the need for stabilization; parallel
834	   lookups reduce latency due to timeouts more effectively than faster
835	   stabilization. Similarly, Zhuang et al. compared keep-alive
836	   algorithms for DHT failure detection [292]. Such algorithmic
837	   comparisons can significantly improve the dependability of DHT
838	   designs.

840	   In Figure 2, we propose a taxonomy for the topology maintenance
841	   algorithms that influence dependability. The algorithms can be
842	   classified by how nodes join and leave, how they first detect
843	   failures, how they share information about topology updates, and how
844	   they react when they receive information about topology updates.

846	   Normal Updates
847	      Joins (passive; active) [293]
848	      Leaves (passive; active) [293]

850	   Fault Detection [292]
851	      Maintenance
852	         Proactive (periodic or keep-alive probes)
853	         Reactive (correction-on-use, correction-on-failure) [294]
854	      Report
855	         Negative (all dead nodes, nodes recently failed);
856	         Positive (all live nodes; nodes recently recovered); [292]

858	   Topology Sharing: yes/ no [292]
859	         Multicast Tree (explicit, implicit) [267, 295]
860	         Gossip (timeouts; number of contacts) [39]

862	   Corrective Action
863	      Routing
864	         Rerouting actions
865	            (reroute once; route in parallel [291]; reject);
866	         Routing timeouts
867	            (TCP-style, virtual coordinates) [296]
868	      Topology
869	         Update action (evict/ replace/ tag node)
870	         Update timeliness (immediate, periodic [296], delayed [297])

872	         Figure 2 Topology Maintenance in Distributed Hash Tables.

874	3.2.3. Ephemeral or Stable Nodes - O(log N) or O(1) Hops

876	   Observation C: Most DHTs use O(log N) geometries to suit ephemeral
877	   nodes. The O(1) hop DHTs suit stable nodes and deserve more research
878	   attention.

880	   Most of the DHTs in Section 3.5 assume that nodes are ephemeral,
881	   with expected lifetimes of one to two hours. They therefore mostly
882	   use an O(log N) geometry. The common assumption is that maintenance
883	   of full routing tables in the O(1) hop DHTs will consume excessive
884	   bandwidth when nodes are continually joining and leaving. The
885	   corollary is that, when they run on stable infrastructure servers
886	   [298], most of the DHTs in Section 3.5 are less than optimal -
887	   lookups take many more hops than necessary, wasting latency and
888	   bandwidth budgets. The O(1) hop DHTs suit stable deployments and high
889	   lookup rates. For a churning 1024-node network, Li et al. concluded
890	   that OneHop is superior to Chord, Tapestry, Kademlia and Kelips in
891	   terms of latency and lookup success rate [291]. For a 3000-node
892	   network, they concluded that "OneHop is only preferable to Chord when
893	   the deployment scenario allows a communication cost greater than 20
894	   bytes per node per second" [291]. This apparent limitation needs to
895	   be put in context. They assumed that each node issues only one lookup
896	   every 10 minutes and has a lifetime of only 60 minutes. It seems
897	   reasonable to expect that in some deployments, nodes will have a
898	   lifetime of weeks or more, a maintenance bandwidth of tens of
899	   kilobits per second, and a load of hundreds of lookups per second.
900	   O(1) hop DHTs are superior in such situations. OneHop can scale at
901	   least to many tens of thousands of nodes [267]. The recent O(1) hop
902	   designs [267, 295] are vastly outnumbered by the O(log N) DHTs in
903	   Section 3.5. Research on the algorithms of Figure 2 will also yield
904	   improvements in the dependability of the O(1) hop DHTs.

906	3.2.4. Simulation and Proof

908	   Observation D: Although not yet a mature science, the study of DHT
909	   dependability is helped by recent simulation and formal development
910	   tools.

912	   While there are recent reference architectures [294, 298], much of
913	   the DHT literature in Section 3.5 does not lend itself to
914	   repeatable, comparative studies. The best comparative work to date
915	   [291] relies on the P2PSIM simulator [299]. At the time of writing,
916	   it supports more DHT geometries than any other simulator. As the
917	   study of DHTs matures, we can expect to see the simulation emphasis
918	   shift from geometric comparison to a comparison of the algorithms of
919	   Figure 2.

921	   P2P correctness proofs generally rely on less than complete formal
922	   specifications of system invariants and events [7, 45, 300]. Li and
923	   Plaxton expressed concern that "when many joins and leaves happen
924	   concurrently, it is not clear whether the neighbor tables will remain
925	   in a 'good' state" [47]. While acknowledging that guaranteeing
926	   consistency in a failure prone network is impossible, Lynch, Malkhi
927	   et al. sketched amendments to the Chord algorithm to guarantee
928	   atomicity [301]. More recently, Gilbert, Lynch et al. gave a new
929	   algorithm for atomic read/write memory in a churning distributed
930	   network, suggesting it to be a good match for P2P [302]. Lynch and
931	   Stoica show in an enhancement to Chord that lookups are provably
932	   correct when there is a limited rate of joins and failures [303].
933	   Fault Tolerant Active Rings is a protocol for active joins and leaves
934	   that was formally specified and proven using B-method tools [304]. A
935	   good starting point for a formal DHT development would be the
936	   numerous informal API specifications [22, 305, 306]. Such work could
937	   be informed by other efforts to formally specify routing invariants
938	   [307, 308].

940	3.3. Latency

942	   The key metrics for DHT latency are:

944	   1) Shortest-Path Distance and Diameter. In graph theory, the
945	      shortest-path distance is the minimum number of edges in any path
946	      between two vertices of the graph. Diameter is the largest of all
947	      shortest-path distances in a graph [309]. Networking synonyms for
948	      distance on a DHT are "hop count" and "lookup length".

950	   2) Latency and Latency Stretch. Two types of latency are relevant
951	      here - network-layer latency and overlay latency. Network-layer
952	      latency has been referred to as "proximity" or "locality" [24].
953	      Stretch is the cost of an overlay path between two nodes, divided
954	      by the cost of the direct network path between those nodes [310].
955	      Latency stretch is also known as the "relative delay penalty"
956	      [311].
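
   The metrics above can be made concrete with a short sketch. The
   six-node ring overlay and the latency figures are hypothetical
   values chosen only for illustration.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: shortest-path distance (hops) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Largest shortest-path distance (assumes a connected graph)."""
    return max(max(hop_distances(graph, n).values()) for n in graph)


def latency_stretch(overlay_path, latency):
    """Overlay path cost divided by the direct network path cost."""
    hops = zip(overlay_path, overlay_path[1:])
    overlay_cost = sum(latency[(a, b)] for a, b in hops)
    direct_cost = latency[(overlay_path[0], overlay_path[-1])]
    return overlay_cost / direct_cost


# Hypothetical overlay: six nodes, each linked to its ring neighbours.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Hypothetical network-layer latencies in milliseconds.
latency = {(0, 1): 10.0, (1, 2): 15.0, (0, 2): 20.0}

print(hop_distances(ring, 0)[3])            # 3 hops from node 0 to 3
print(diameter(ring))                       # 3
print(latency_stretch([0, 1, 2], latency))  # 25 / 20 = 1.25
```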

958	3.3.1. Hop Count and the O(1)-Hop DHTs

960	   Hop count gives an approximate indication of path latency. O(1)-hop
961	   DHTs have path latencies lower than the O(log N)-hop DHTs [291]. This
962	   significant advantage is often overlooked on account of concern about
963	   the messaging costs to maintain large routing tables (Section 3.2.3).
964	   Such concern is justified when the mean node lifetime is only a
965	   few hours and the mean lookup interval per node is more than a few
966	   seconds (the classic profile of a P2P file-sharing node). However,
967	   for a large, practical operating range (node lifetimes of days or
968	   more, lookup rates of over tens of lookups per second per node, up to
969	   ~100,000 nodes), the total messaging cost in O(1) hop DHTs is lower
970	   than in O(log N) DHTs [312]. Lookups and routing table maintenance
971	   contribute to the total messaging cost. If a deployment fits this
972	   operating range, then O(1)-hop DHTs will give lower path latencies
973	   and lower total messaging costs. An additional merit of the O(1)-hop
974	   DHTs is that they yield lower lookup failure rates than their O(log
975	   N)-hop counterparts [291].

977	   Low hop count can be achieved in two ways: each node has a large O(N)
978	   index of nodes; or the object references can be replicated on many
979	   nodes. Beehive [313], Kelips [39], LAND [310] and Tulip [314] are
980	   examples of the latter category. Beehive achieves O(1) hops on
981	   average and O(log N) hops in the worst case, by proactive replication
982	   of popular objects. Kelips replicates the 'file index'. It incurs
983	   O(sqrt(N)) storage costs for both the node index and the file index.
984	   LAND uses O(log N) reference pointers for each stored object and an
985	   O(log N) index to achieve a worst-case 1+e stretch, where 0<e. The
986	   Kelips-like Tulip [314] requires 2 hops per lookup. Each node
987	   maintains 2sqrt(N)log(N) links to other nodes and objects are
988	   replicated on O(sqrt(N)) nodes.
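
   To give a feel for the index sizes quoted above, the following
   sketch evaluates them for one overlay size. It ignores the
   constant factors hidden by the O() notation and assumes base-2
   logarithms, neither of which is specified in the text, so the
   figures are orders of magnitude only.

```python
import math


def routing_state(n: int) -> dict:
    """Approximate per-node index sizes, ignoring constant factors."""
    root = math.sqrt(n)
    log_n = math.log2(n)   # base 2 is an assumption
    return {
        "O(log N) geometry": log_n,
        "Kelips, O(sqrt(N))": root,
        "Tulip, 2sqrt(N)log(N)": 2 * root * log_n,
        "full O(N) index": float(n),
    }


sizes = routing_state(1024)
for name, entries in sizes.items():
    print(f"{name}: about {entries:.0f} entries")
```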

990	   The DHTs with a large O(N) node index can be divided into two groups:
991	   those for which the index is always O(N); and those for which the
992	   index opportunistically ranges from O(log N) to O(N). Linear Hashing
993	   (LH*) servers [52], OneHop [267] and 1h-Calot [295] fall into the
994	   former category. EpiChord [315] and Accordion [316] are examples of
995	   the latter.

997	3.3.2. Proximity and the O(log N)-Hop DHTs

999	   If one chooses not to use single-hop DHTs, hop count is a weak
1000	   indicator of end-to-end path latency. Some hops may incur large
1001	   delays because of intercontinental or satellite links. Consequently,
1002	   numerous DHT designs minimize path latency by considering the
1003	   proximity of nodes. Gummadi et al. classified the proximity methods
1004	   as follows [24]:

1006	   1) Proximity Neighbor Selection (PNS). The nodes in the routing table
1007	   are chosen based on the latency of the direct hop to those nodes. The
1008	   latency may be explicitly measured [317], or it may be estimated
1009	   using one of several synthetic coordinate systems [150, 154, 318]. As
1010	   a lower bound on PNS performance, Dabek et al. showed that lookups on
1011	   O(log N) DHTs take at least 1.5 times the average round trip time of
1012	   the underlying network [154].

1014	   2) Proximity Route Selection (PRS). At lookup time, the choice of the
1015	   next-hop node relies on the latency of the direct hop to that node.
1016	   PRS is less effective than PNS, though it may complement it [24].
1017	   Some of the routing geometries in Section 3.5 do not support PNS
1018	   and/or PRS [24].

1020	   3) Proximity Identifier Selection (PIS). Node identifiers indicate
1021	   geographic position. PIS frustrates load balancing, increases the
1022	   risk of correlated failures, and is not often used [24].
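
   The difference between PNS and PRS is when latency is consulted.
   A minimal sketch follows; the function names, the slot numbering
   and the round-trip times are hypothetical, and how candidate nodes
   are discovered is left out.

```python
def proximity_neighbor_selection(candidates_by_slot, rtt_ms):
    """PNS: when filling the routing table, keep the lowest-RTT
    candidate for each slot."""
    return {
        slot: min(candidates, key=lambda n: rtt_ms[n])
        for slot, candidates in candidates_by_slot.items()
        if candidates
    }


def proximity_route_selection(next_hops, rtt_ms):
    """PRS: at lookup time, pick the lowest-RTT node among the
    next hops that the geometry allows."""
    return min(next_hops, key=lambda n: rtt_ms[n])


# Hypothetical measurements (milliseconds) and candidate sets.
rtt_ms = {"a": 80.0, "b": 12.0, "c": 45.0}
candidates_by_slot = {0: ["a", "b"], 1: ["c"], 2: []}

table = proximity_neighbor_selection(candidates_by_slot, rtt_ms)
print(table)                                        # {0: 'b', 1: 'c'}
print(proximity_route_selection(["a", "c"], rtt_ms))  # c
```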

1024	   The proximity study by Gummadi et al. assumed recursive routing,
1025	   though they suggested that PNS would also be superior to PRS with
1026	   iterative routing [24]. Dabek et al. found that recursive lookups
1027	   take 0.6 times as long as iterative lookups [150].

1029	   Beyond the explicit use of proximity information, redundancy can help
1030	   to avoid slow paths and servers. One may increase the number of
1031	   replicas [150], use parallel lookups [291, 316], use alternate routes
1032	   on failure [150], or use multiple gateway nodes to enter the DHT
1033	   [317].

1035	3.4. Multicasting

1037	3.4.1. Multicasting vs Broadcasting

1039	   "Multicasting" here means sending a message to a subset of an
1040	   overlay's nodes. Nodes explicitly join and leave this subset, called
1041	   a "multicast group". "Broadcasting" here is a special case of
1042	   multicasting in which a message is sent to all nodes in the overlay.
1043	   Broadcasting relies on overlay membership messages - it does not need
1044	   extra group membership messaging. Castro et al. said multicasting on
1045	   structured overlays is either "flooding" (one overlay per group) or
1046	   "tree-based" (one tree per group) [319]. These are synonyms for
1047	   broadcasting and multicasting respectively.

1049	   The first DHT-based designs for multicasting were CAN multicast
1050	   [320], Scribe [241], Bayeux [242] and i3 [231]. They were based on
1051	   CAN [8], Pastry [2], Tapestry [31] and Chord [7] respectively. El-
1052	   Ansary et al. devised the first DHT-based broadcasting scheme [321].
1053	   It was based on Chord.

1055	   Multicast trees can be constructed using reverse-path forwarding or
1056	   forward-path forwarding. Scribe uses reverse-path forwarding [241].
1057	   Bayeux uses forward-path forwarding [242]. Borg, a multicast design
1058	   based on Pastry, uses a combination of forward-path and reverse-path
1059	   forwarding to minimize latency [237].

1061	3.4.2. Motivation for DHT-based Multicasting

1063	   Multicasting complements DHT search capability. DHTs naturally
1064	   support exact match queries. With multicasting, they can support more
1065	   complex queries. Multicasting also enables the dissemination and
1066	   collection of global information.

1068	   Consider, for example, aggregation queries like minimum, maximum,
1069	   count, sum and average (Section 5.4). A node at the root of a
1070	   dissemination tree might multicast such a query [322]. The leaf nodes
1071	   return local results towards the root node. Successive parents
1072	   aggregate the result so that eventually the root node can compute the
1073	   global result. Such queries may help to monitor the capacity and
1074	   health of the overlay itself.
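
   The aggregation step can be sketched as a recursive walk of the
   dissemination tree. The Node class and the sample values are
   hypothetical; a real overlay would exchange messages between
   parents and children rather than make local function calls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    value: float
    children: list = field(default_factory=list)


def aggregate(node):
    """Return (minimum, maximum, count, total) for node's subtree.

    Each parent combines its own value with its children's partial
    results, so only one summary travels up each tree edge.
    """
    lo, hi, count, total = node.value, node.value, 1, node.value
    for child in node.children:
        c_lo, c_hi, c_count, c_total = aggregate(child)
        lo, hi = min(lo, c_lo), max(hi, c_hi)
        count += c_count
        total += c_total
    return lo, hi, count, total


# Hypothetical tree: the root holds 4.0 and has two subtrees.
root = Node(4.0, [Node(1.0, [Node(7.0), Node(2.0)]), Node(6.0)])
lo, hi, count, total = aggregate(root)
average = total / count   # derived at the root from sum and count
print(lo, hi, count, total, average)
```

   Carrying the sum and the count upward, rather than partial
   averages, lets the root compute the global average exactly.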

1076	   Why bother with structured overlays for multicasting? In Section 2.1,
1077	   we saw that Gnutella can multicast complex queries without them
1078	   [4]. Castro et al. posed the question "Should we build Gnutella on a
1079	   structured overlay?" [259]. While acknowledging that their study was
1080	   preliminary, they did conclude that "we see no reason to build
1081	   Gnutella on top of an unstructured overlay" [259]. The supposedly
1082	   high maintenance costs of structured overlays were outweighed by
1083	   query cost savings. The structured overlay ensured that nodes were
1084	   only visited once during a complex query. It also helped to
1085	   accurately limit the total number of nodes visited. Pai et al.
1086	   acknowledged that multicast trees based on structured overlays
1087	   contribute to simple routing rules, low delay and low delay variation
1088	   [323]. However, they opted for unstructured, gossip-based
1089	   multicasting for reliability reasons: data loss near the tree root
1090	   affects all subtended nodes; interior node failures must be repaired
1091	   quickly; interior nodes are obliged to disseminate more than their
1092	   fair share of traffic, giving leaf nodes a "free ride". The most
1093	   promising research direction is to improve on the Bimodal
1094	   Multicasting approach [324]. It combines the bandwidth efficiency and
1095	   low latency of structured, best-effort multicasting trees with the
1096	   reliability of unstructured gossip protocols.

1098	3.4.3. Design Issues

1100	   None of the early structured overlay multicast designs addressed all
1101	   of the following issues [325]:

1103	   1) Heterogeneous Node Capacity. Nodes differ in their processing,
1104	      memory and network capacity. Multicast throughput is largely
1105	      determined by the node with smallest throughput [325]. To limit
1106	      the multicasting load on a node, one might cap its out-degree. If
1107	      the same node receives further join requests, it refers them to
1108	      its children ("pushdown") [240]. Bharambe et al. explored several
1109	      pushdown strategies but found them inadequate to deal with
1110	      heterogeneity [326]. They concluded that the heterogeneity issue
1111	      remains open, and should be addressed before deploying DHTs for
1112	      high-bandwidth multicasting applications. Independently, Zhang et
1113	      al. partially tackled heterogeneity by allowing nodes in their
1114	      CAM-Chord and CAM-Koorde designs to vary out-degree
1115	      according to the node's capacity [325]. However they made no
1116	      mention of the "pushdown" issue - they did not describe topology
1117	      maintenance when the out-degree limit is reached.

1119	   2) Reliability (Dynamic Membership). If a multicast tree is to be
1120	      resilient, it must survive dynamic membership. There are several
1121	      ways to deal with dynamic membership: ensure that the root node of
1122	      the multicasting tree does not handle all requests to join or
1123	      leave the multicast group [242]; use multiple interior-node-
1124	      disjoint trees to avoid single points of failure in tree
1125	      structures [322]; and split the root node into several replicas
1126	      and partition members across them [241]. For example, Bayeux
1127	      requires the root node to track all group membership changes
1128	      whereas Scribe does not [241]. CAN-multicast uses a single, well-
1129	      known host to bootstrap the join operations [320]. The earliest
1130	      DHT-based broadcasting work by El-Ansary et al. did not address
1131	      the issue of dynamic membership [321]. Ghodsi et al. addressed it
1132	      in a subsequent paper though, giving two broadcast algorithms that
1133	      accommodate routing table inconsistencies [327]. One algorithm
1134	      achieves a more optimal multicasting network at the expense of
1135	      greater correction overhead. Splitstream, based on Scribe and
1136	      Pastry, redundantly striped content across multiple interior-node-
1137	      disjoint multicast trees - if one interior node fails, then only
1138	      one stripe is lost [240].

1140	   3) Large Any-Source Multicast Groups. Any group member should be
1141	      allowed to send multicast messages. The group should scale to a
1142	      very large number of hosts. CAN-based multicast was the first
1143	      application-level multicast scheme to scale to groups of several
1144	      thousands of nodes without restricting the service model to a
1145	      single source [320]. Bayeux scales to large groups but has a
1146	      single root node for each multicast group. It supports the any-
1147	      source model only by having the root node operate as a reflector
1148	      for multiple senders [242].
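
   The out-degree cap and "pushdown" mentioned under issue 1 can be
   sketched as follows. The breadth-first policy shown is only one
   hypothetical pushdown strategy, chosen for brevity; it is not the
   strategy of any of the cited designs, which explored several.

```python
from collections import deque


class TreeNode:
    def __init__(self, name, max_children):
        self.name = name
        self.max_children = max_children   # out-degree cap
        self.children = []


def join(root, newcomer):
    """Attach newcomer below the first node, in breadth-first order,
    that still has spare out-degree. Returns the chosen parent."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if len(node.children) < node.max_children:
            node.children.append(newcomer)
            return node
        queue.extend(node.children)   # full: push the request down
    raise RuntimeError("no node has spare out-degree")


# Hypothetical capacities: node "a" can serve only one child.
root = TreeNode("root", max_children=2)
a, b = TreeNode("a", 1), TreeNode("b", 2)
c, d = TreeNode("c", 2), TreeNode("d", 2)
parents = [join(root, n).name for n in (a, b, c, d)]
print(parents)   # ['root', 'root', 'a', 'b']
```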

1150	3.5. Routing Geometries

1152	   In Sections 3.5.1 to 3.5.6, we introduce the main geometries for
1153	   simple key lookup and survey their robustness mechanisms.

1155	3.5.1. Plaxton Trees (Pastry, Tapestry)

1157	   Work began in March 2000 on a structured, fault-tolerant, wide-area
1158	   Dynamic Object Location and Routing (DOLR) system called Tapestry [6,
1159	   155]. While DHTs fix replica locations, a DOLR API enables
1160	   applications to control object placement [31]. Tapestry's basic
1161	   location and routing scheme follows Plaxton, Rajaraman and Richa
1162	   (PRR) [30], but it remedies PRR's robustness shortcomings described
1163	   in Section 3.1. Whereas each object has one root node in PRR,
1164	   Tapestry uses several to avoid a single point of failure. Unlike PRR,
1165	   it allows nodes to be inserted and deleted. Whereas PRR required a
1166	   total ordering of nodes, Tapestry uses 'surrogate routing' to
1167	   incrementally choose root nodes. The PRR algorithm does not address
1168	   congestion, but Tapestry can put object copies close to nodes
1169	   generating high query loads. PRR nodes only know of the nearest
1170	   replica, whereas Tapestry nodes enable selection from a set of
1171	   replicas (for example to retrieve the most up to date). To handle
1172	   routing faults, Tapestry uses TCP timeouts and UDP heartbeats for
1173	   detection, sequential secondary neighbours for rerouting, and a
1174	   'second chance' window so that recovery can occur without the
1175	   overhead of a full node insertion. Tapestry's dependability has been
1176	   measured on a testbed of about 100 machines and on simulations of
1177	   about 1000 nodes. Successful routing rates and maintenance bandwidths
1178	   were measured during instantaneous failures and ongoing churn [31].

1180	   Pastry, like Tapestry, uses Plaxton-like prefix routing [2]. As in
1181	   Tapestry, Pastry nodes maintain O(log N) neighbours and route to a
1182	   target in O(log N) hops. Pastry differs from Tapestry only in the
1183	   method by which it handles network locality and replication [2]. Each
1184	   Pastry node maintains a 'leaf set' and a 'routing table'. The leaf
1185	   set contains l/2 node IDs on either side of the local node ID in the
1186	   node ID space. The routing table, in row r column c, points to the
1187	   node ID with the same r-digit prefix as the local node, but with an
1188	   r+1 digit of c. A Pastry node periodically probes leaf set and
1189	   routing table nodes, with periodicity of Tls and Trt and a timeout
1190	   Tout. Mahajan, Castro et al. analysed the reliability versus
1191	   maintenance cost tradeoffs in terms of the parameters l, Tls, Trt,
1192	   and Tout [328]. They concluded that earlier concerns about excessive
1193	   maintenance cost in a churning P2P network were unfounded, but
1194	   suggested followup work for a wider range of reliability targets,
1195	   maintenance costs and probe periods. Rhea, Geels et al. concluded that
1196	   existing DHTs fail at high churn rates [329]. Building on a Pastry
1197	   implementation from Rice University, they found that most lookups
1198	   fail to complete when there is excessive churn. They conjectured that
1199	   short-lived nodes often leave the network with lookups that have not
1200	   yet timed out, but no evidence was provided to confirm the theory.
1201	   They identified three design issues that affect DHT performance under
1202	   churn: reactive versus periodic recovery of peers; lookup timeouts;
1203	   and choice of nearby neighbours. Since reactive recovery was found to
1204	   add traffic to already congested links, the authors used periodic
1205	   recovery in their design. For lookup timeouts, they advocated an
1206	   exponentially weighted moving average of each neighbour's response
1207	   time, over alternative fixed timeout or 'virtual coordinate' schemes.
1208	   For selection of nearby neighbours, they found that 'global sampling'
1209	   was more effective than simply sampling a 'neighbour's neighbours' or
1210	   'inverse neighbours'. Castro, Costa et al. have refuted the
1211	   suggestion that DHTs cannot cope with high churn rates [330]. By
1212	   implementing methods for continuous detection and repair, their
1213	   MSPastry implementation achieved shorter routing paths and a
1214	   maintenance overhead of less than half a message per second per node.
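
   The routing table layout described above (row r, column c) can be
   sketched as follows. The sketch assumes node IDs are equal-length
   strings of hexadecimal digits; the function names are hypothetical
   and the leaf set, locality and repair logic are omitted.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def routing_table_slot(local_id: str, other_id: str):
    """Return (row, column) for other_id in local_id's table.

    Row r is the length of the shared prefix; column c is the value
    of other_id's digit at position r. Returns None for equal IDs.
    """
    r = shared_prefix_len(local_id, other_id)
    if r == len(local_id):
        return None
    return r, int(other_id[r], 16)   # base 16 is an assumption


def next_hop(local_id: str, key: str, table: dict):
    """One prefix-routing step: table maps (row, column) to node ID.
    Returns None when the slot is empty."""
    slot = routing_table_slot(local_id, key)
    if slot is None:
        return local_id
    return table.get(slot)


print(routing_table_slot("65a1", "65b3"))            # (2, 11)
print(next_hop("65a1", "65bf", {(2, 11): "65b3"}))   # 65b3
```

   Each hop through such a table extends the prefix shared with the
   key by at least one digit, which is the source of the O(log N)
   hop count.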

1216	   There have been more recent proposals based on these early Plaxton-
1217	   like schemes. Kademlia uses a bit-wise exclusive or (XOR) metric for
1218	   the 'distance' between 160-bit node identifiers [45]. Each node keeps
1219	   a list of contact nodes for each section of the node space that is
1220	   between 2^i and 2^(i+1) from itself (0 <= i < 160). Longer-lived nodes are
1221	   deliberately given preference on this list - it has been found in
1222	   Gnutella that the longer a node has been active, the more likely it
1223	   is to remain active. Like Kademlia, Willow uses the XOR metric [32].
1224	   It implements a Tree Maintenance Protocol to 'zipper' together broken
1225	   segments of a tree. Where other schemes use DHT routing to
1226	   inefficiently add new peers, Willow can merge disjoint or broken
1227	   trees in O(log N) parallel operations.
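
   The XOR metric and the per-distance contact lists can be sketched
   briefly. Identifiers are modelled as Python integers and the
   function names are hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: bit-wise exclusive or of two IDs."""
    return a ^ b


def bucket_index(local_id: int, other_id: int) -> int:
    """Index i of the contact list for other_id, where
    2**i <= distance < 2**(i + 1)."""
    d = xor_distance(local_id, other_id)
    if d == 0:
        raise ValueError("a node does not list itself as a contact")
    return d.bit_length() - 1


print(bucket_index(0b1011, 0b1000))   # distance 3, so i = 1
print(bucket_index(0, 1 << 159))      # farthest half of the space
```

   Because XOR is symmetric, two nodes place each other in lists
   with the same index.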

1229	3.5.2. Rings (Chord, DKS)

1231	   Chord is the prototypical DHT ring, so we first sketch its operation.
1232	   Chord maps nodes and keys to an identifier ring [7, 34]. Chord
1233	   supports one main operation: find the node responsible for a key. It
1234	   uses Consistent Hashing (Section 3.1) to minimize disruption of keys
1235	   when nodes join and leave the network. However, Chord peers need only
1236	   track O(log N) other peers, not all peers as in the original
1237	   consistent hashing proposal [49]. It enables concurrent node
1238	   insertions and deletions, improving on PRR. Compared to Pastry, it
1239	   has a simpler join protocol. Each Chord peer tracks its predecessor,
1240	   a list of successors and a finger table. Using the finger table, each
1241	   hop is at least half the remaining distance around the ring to the
1242	   target node, giving an average lookup hop count of (1/2)log2(N).
1243	   Each Chord node runs a periodic stabilization routine that
1244	   updates predecessor and successor pointers to cater for newly added
1245	   nodes. All successors of a given node need to fail for the ring to
1246	   fail. Although a node departure could be treated the same as a
1247	   failure, a departing Chord node first notifies the predecessor and
1248	   successors, so as to improve performance.
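
   The finger-table lookup described above can be sketched on a small
   ring. For brevity the sketch builds every finger table from global
   knowledge of the membership, which real Chord nodes do not have;
   joins, stabilization and failures are omitted. The 6-bit
   identifier space and the node IDs are hypothetical.

```python
import bisect

M = 6   # identifier bits; the ring has 2**M positions (assumption)


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # Finger i of node n is the successor of n + 2**i.
        self.fingers = {
            n: [self.successor((n + 2 ** i) % 2 ** M) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident):
        """First node at or clockwise after ident."""
        i = bisect.bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def closest_preceding_finger(self, node, key):
        for finger in reversed(self.fingers[node]):
            if in_open(finger, node, key):
                return finger
        return node

    def lookup(self, start, key):
        """Return (node responsible for key, number of hops)."""
        node, hops = start, 0
        while True:
            succ = self.fingers[node][0]
            if in_half_open(key, node, succ):
                return succ, hops + 1
            node = self.closest_preceding_finger(node, key)
            hops += 1


ring = ChordRing([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(ring.lookup(8, 54))   # (56, 3): via nodes 42 and 51
```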

1250	   In their definitive paper, Chord's inventors critiqued its
1251	   dependability under churn [34]. They provided proofs on the behaviour
1252	   of the Chord network when nodes in a stable network fail, stressing
1253	   that such proofs are inadequate in the general case of a perpetually
1254	   churning network. An earlier paper had posed the question, "For
1255	   lookups to be successful during churn, how regularly do the Chord
1256	   stabilization routines need to run?" [331]. Stoica, Morris et al.
1257	   modeled a range of node join/departure rates and stabilization
1258	   periods for a Chord network of 1000 nodes. They measured the number
1259	   of timeouts (caused by a finger pointing to a departed node) and
1260	   lookup failures (caused by nodes that temporarily point to the wrong
1261	   successor during churn). They also modelled the 'lookup stretch', the
1262	   ratio of the Chord lookup time to optimal lookup time on the
1263	   underlying network. They demonstrated the latency advantage of
1264	   recursive lookups over iterative lookups, but there remains room for
1265	   delay reduction. For further work, the authors proposed to improve
1266	   resilience to network partitions, using a small set of known nodes or
1267	   'remembered' random nodes. To reduce the number of messages per
1268	   lookup, they suggested an increase in the size of each step around
1269	   the ring, accomplished via a larger number of fingers at each node.
1270	   Much of the paper assumed independent, equally likely node failures.
1271	   Analysis of correlated node failures, caused by massive site or
1272	   backbone failures, will be more important in some deployments. The
1273	   paper did not attempt to recommend a fixed optimal stabilization
1274	   rate. Liben-Nowell, Balakrishnan et al. had suggested that optimum
1275	   stabilization rate might evolve according to measurements of peers'
1276	   behaviour [331] - such a mechanism has yet to be devised.

1278	   Alima, El-Ansary et al. considered the communication costs of Chord's
1279	   stabilization routines, referred to as 'active correction', to be
1280	   excessive [332]. Two other robustness issues also motivated their
1281	   Distributed K-ary Search (DKS) design, which is similar to Chord.
1282	   Firstly, the total system should evolve for an optimum balance
1283	   between the number of peers, the lookup hopcount and the size of the
1284	   routing table. Secondly, lookups should be reliable - P2P algorithms
1285	   should be able to guarantee a successful lookup for key/value pairs
1286	   that have been inserted into the system. A similar lookup correctness
1287	   issue was raised elsewhere by one of Chord's authors, "Is it possible
1288	   to augment the data structure to work even when nodes (and their
1289	   associated finger lists) just disappear?" [333] Alima, El-Ansary et
1290	   al. asserted that P2Ps using active correction, like Chord, Pastry
1291	   and Tapestry, are unable to give such a guarantee. They propose an
1292	   alternate 'correction-on-use' scheme, whereby expired routing entries
1293	   are corrected by information piggybacking lookups and insertions. A
1294	   prerequisite is that lookup and insertion rates are significantly
1295	   higher than node arrival, departure and failure rates. Correct
1296	   lookups are guaranteed in the presence of simultaneous node arrivals
1297	   or up to f concurrent node departures, where f is configurable.

1299	3.5.3. Tori (CAN)

1301	   Ratnasamy, Francis et al. developed the Content-Addressable Network
1302	   (CAN), another early DHT widely referenced alongside Tapestry, Pastry
1303	   and Chord [8, 334]. It is arranged as a virtual d-dimensional
1304	   Cartesian coordinate space on a d-torus. Each node is responsible for
1305	   a zone in this coordinate space. The designers used a heuristic
1306	   thought to be important for large, churning P2P networks: keep the
1307	   number of neighbours independent of system size. Consequently, its
1308	   design differs significantly from Pastry, Tapestry and Chord. Whereas
1309	   they have O(logN) neighbours per node and O(logN) hops per lookup,
1310	   CAN has O(d) neighbours and O(dN^(1/d)) hop-count. When CAN's system-
1311	   wide parameter d is set to log(N), CAN converges to their profile. If
1312	   the number of nodes grows, a major rearrangement of the CAN network
1313	   may be required [151]. The CAN designers considered building on PRR,
1314	   but opted for the simple, low-state-per-node CAN algorithm instead.
1315	   They had reasoned that a PRR-based design would not perform well
1316	   under churn, given node departures and arrivals would affect a
1317	   logarithmic number of nodes [8].
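
   The state/hop-count tradeoff above can be illustrated with a small
   Python sketch. It assumes a simplified CAN-like space: a regular
   d-dimensional grid with wraparound and exactly one node per cell. The
   function names and the regular grid are assumptions made for
   illustration only; a real CAN splits zones unevenly as nodes join.

```python
from itertools import product


def torus_distance(a, b, side):
    """Hops between coordinates a and b on a ring of `side` cells."""
    delta = abs(a - b)
    return min(delta, side - delta)


def greedy_hops(src, dst, side):
    """Hop count when each hop moves one cell along one dimension."""
    return sum(torus_distance(s, t, side) for s, t in zip(src, dst))


def average_hops(d, side):
    """Average hop count from the origin to every cell of a side^d torus."""
    origin = (0,) * d
    cells = list(product(range(side), repeat=d))
    return sum(greedy_hops(origin, c, side) for c in cells) / len(cells)


def neighbour_count(d):
    """One neighbour in each direction of each dimension (for side > 2)."""
    return 2 * d


if __name__ == "__main__":
    # The same node count, N = 4096, arranged in 2, 3 and 4 dimensions.
    for d, side in ((2, 64), (3, 16), (4, 8)):
        print(d, neighbour_count(d), average_hops(d, side))
```

   Under these assumptions, 4096 nodes give an average of 32 hops with 4
   neighbours (d=2), 12 hops with 6 neighbours (d=3) and 8 hops with 8
   neighbours (d=4): raising d buys shorter paths at the cost of more
   per-node state.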

1319	   There have been preliminary assessments of CAN's resilience. When a
1320	   node leaves the CAN in an orderly fashion, it passes its own Virtual
1321	   ID (VID), its neighbours' VIDs and IP addresses, and its key/value
1322	   pairs to a takeover node. If a node leaves abruptly, its neighbours
1323	   send recovery messages towards the designated takeover node. CAN
1324	   ensures the recovery messages reach the takeover node, even if nodes
1325	   die simultaneously, by maintaining a VID chain with Chord's
1326	   stabilization algorithm. Some initial 'proof of concept' resilience
1327	   simulations were run using the Network Simulator (ns) [335] for up to
1328	   a few hundred nodes. Average hopcounts and lookup failure
1329	   probabilities were plotted against the total number of nodes, for
1330	   various node failure rates [8]. The CAN team documented several open
1331	   research questions pertaining to state/hopcount tradeoffs,
1332	   resilience, load, locality and heterogeneous peers [44, 334].

1334	3.5.4. Butterflies (Viceroy)

1336	   Viceroy approximates a butterfly network [46]. It generally has
1337	   constant degree like CAN. Like Chord, Tapestry and Pastry, it has
1338	   logarithmic diameter. It improves on these systems, inasmuch as its
1339	   diameter is better than CAN's and its degree is better than that of
1340	   Chord, Tapestry and Pastry. As with most DHTs, it utilizes Consistent
1341	   Hashing. When a peer joins the Viceroy network, it takes a random but
1342	   permanent 'identity' and selects its 'level' within the network. Each
1343	   peer maintains general ring pointers ('predecessor' and 'successor'),
1344	   level ring pointers ('nextonlevel' and 'prevonlevel') and butterfly
1345	   pointers ('left', 'right' and 'up'). When a peer departs, it normally
1346	   passes its key pairs to a successor, and notifies other peers to find
1347	   a replacement peer.

1349	   The Viceroy paper scoped out the issue of robustness. It explicitly
1350	   assumed that peers do not fail [46]. It assumed that join and leave
1351	   operations do not overlap, so as to avoid the complication of
1352	   concurrency mechanisms like locking. Kaashoek and Karger were
1353	   somewhat critical of Viceroy's complexity [37]. They also pointed to
1354	   its fault tolerance blindspot. Li and Plaxton suggested that such
1355	   constant-degree algorithms deserve further consideration [47]. They
1356	   offered several pros and cons. The limited degree may increase the
1357	   risk of a network partition, or inhibit use of local neighbours (for
1358	   the simple reason that there are fewer of them). On the other hand, it
1359	   may be easier to reason about the correctness of fixed-degree
1360	   networks. One of the Viceroy authors has since proposed constant-
1361	   degree peers in a two-tier, locality-aware DHT [310] - the lower
1362	   degree maintained by each lower-tier peer purportedly improves
1363	   network adaptability. Another Viceroy author has since explored an
1364	   alternative bounded-degree graph for P2P, namely the de Bruijn graph
1365	   [336].

1367	3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI)

1369	   De Bruijn graphs have had numerous refinements since their inception
1370	   [337, 338]. Schlumberger was the first to use them for networking
1371	   [339]. Two research teams independently devised the 'generalized' de
1372	   Bruijn graph that accommodates a flexible number of nodes in the
1373	   system [340, 341]. Rowley and Bose studied fault-tolerant rings
1374	   overlaid on the de Bruijn graph [342]. Lee, Liu et al. devised a two-
1375	   level de Bruijn hierarchy, whereby clusters of local nodes are
1376	   interconnected by a second-tier ring [343].

1378	   Many of the algorithms discussed previously are 'greedy' in that each
1379	   time a query is forwarded, it moves closer to the destination.
1380	   Unfortunately, greedy algorithms are generally suboptimal - for a
1381	   given degree, the routing distance is longer than necessary [344].
1382	   Unlike these earlier P2P designs, de Bruijn graphs of degree k
1383	   achieve an asymptotically optimal diameter log_k(n), where n is the
1384	   number of nodes in the system and k can be varied to improve
1385	   resilience. If there are O(log(n)) neighbours per node, the de Bruijn
1386	   hop count is O(log n/log log n). To illustrate de Bruijn's practical
1387	   advantage, consider a network with one million nodes of degree 20:
1388	   Chord has a diameter of 20, while de Bruijn has a diameter of 5 [36].
1389	   In 2003, there was a quick succession of de Bruijn proposals - D2B
1390	   [345], Koorde [37], Distance Halving [132, 336] and the Optimal
1391	   Diameter Routing Infrastructure (ODRI) [36].
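
   The diameter figures quoted above, and the shift-style routing that
   de Bruijn graphs permit, can be sketched in Python as follows. The
   sketch assumes a complete de Bruijn graph whose k^length nodes are
   labelled by base-k digit strings; the P2P proposals cited here must
   also cope with sparsely populated identifier spaces, which is not
   modelled. All function names are hypothetical.

```python
def to_digits(value, k, length):
    """Write `value` as `length` base-k digits, most significant first."""
    digits = []
    for _ in range(length):
        digits.append(value % k)
        value //= k
    return digits[::-1]


def de_bruijn_route(src, dst, k, length):
    """Route by shifting the destination's digits into the source label.

    Each hop drops the leading digit and appends one digit of `dst`,
    which corresponds to following one edge of the de Bruijn graph.
    """
    current = to_digits(src, k, length)
    target = to_digits(dst, k, length)
    # Reuse the longest suffix of the source that is a prefix of the target.
    overlap = 0
    for size in range(length, 0, -1):
        if current[length - size:] == target[:size]:
            overlap = size
            break
    path = [tuple(current)]
    for digit in target[overlap:]:
        current = current[1:] + [digit]
        path.append(tuple(current))
    return path


def diameter(n, k):
    """Smallest D with k**D >= n, that is, ceil(log_k n) in integers."""
    hops, reach = 0, 1
    while reach < n:
        reach *= k
        hops += 1
    return hops


if __name__ == "__main__":
    print(diameter(10**6, 20))           # de Bruijn, degree 20
    print(diameter(10**6, 2))            # log2(n), the Chord figure
    print(de_bruijn_route(0, 7, 2, 3))   # 000 -> 001 -> 011 -> 111
```

   For one million nodes, diameter(10**6, 20) evaluates to 5 and
   diameter(10**6, 2) to 20, matching the comparison in the text.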

1393	   Fraigniaud and Gauron began the D2B design by laying out an informal
1394	   problem statement: keys should be evenly distributed; lookup latency
1395	   should be small; traffic load should be evenly distributed; updates
1396	   of routing tables and redistribution of keys should be fast when
1397	   nodes join or leave the network. They defined a node's "congestion"
1398	   to be the probability that a lookup will traverse it. Apart from its
1399	   optimal de Bruijn diameter, they highlighted D2B's merits: a constant
1400	   expected update time when nodes join or leave (O(log n) with high
1401	   probability (w.h.p.)); and an expected node congestion of
1402	   O((log n)/n) (O(((log n)^2)/n) w.h.p.) [345]. D2B's resilience was
1403	   discussed only in passing.

1405	   Koorde extends Chord to attain the optimal de Bruijn degree/diameter
1406	   tradeoff above [37]. Unlike D2B, Koorde does not constrain the
1407	   selection of node identifiers. Also unlike D2B, it caters for
1408	   concurrent joins, by extension of Chord's functionality. Kaashoek and
1409	   Karger investigated Koorde's resilience to a rather harsh failure
1410	   scenario: "in order for a network to stay connected when all nodes
1411	   fail with probability of 1/2, some nodes must have degree omega(log
1412	   n)" [37]. They sketched a mechanism to increase Koorde's degree for
1413	   this more stringent fault tolerance, losing de Bruijn's constant
1414	   degree advantage. Similarly, to achieve a constant-factor load
1415	   balance, Koorde would have to sacrifice its degree optimality. They
1416	   suggested that the ability to trade the degree, and hence the
1417	   maintenance overhead, against the expected hop count may be important
1418	   for churning systems. They also identified an open problem: find a
1419	   load-balanced, degree-optimal DHT. Datta, Girdzijauskas et al. showed
1420	   that for arbitrary key distributions, de Bruijn graphs fail to meet
1421	   the dual goals of load balancing and search efficiency [346]. They
1422	   posed the question, "(Is there) a constant routing table sized DHT
1423	   which meets the conflicting goals of storage load balancing and
1424	   search efficiency for an arbitrary and changing key distribution?"

1426	   Distance Halving was also inspired by de Bruijn [336] and shares its
1427	   optimal diameter. Naor and Wieder argued for a two-step "continuous-
1428	   discrete" approach for its design. The correctness of its algorithms
1429	   is proven in a continuous setting. The algorithms are then mapped to
1430	   a discrete space. The source x and target y are points on the
1431	   continuous interval [0,1). Data items are hashed to this same
1432	   interval. <str> is a string which determines how messages leave any
1433	   point on the ring: if bit t of the string is 0, the left leg is
1434	   taken; if it is 1, the right leg is taken. <str> increases by one bit
1435	   each hop, giving a sequence by which to step around the ring. A
1436	   lookup has two phases. In the first, the lookup message containing
1437	   the source, target and the random string hops toward the midpoint of
1438	   the source and target. On each hop, the distance between <str>(x) and
1439	   <str>(y) is halved, by virtue of the specific 'left' and 'right'
1440	   functions. In the second phase, the message steps 'backward' from the
1441	   midpoint to the target, removing the last bit in <str> at each hop.
1442	   'Join' and 'leave' algorithms were outlined but there was no
1443	   consideration of recovery times or message load on churn. Using the
1444	   Distance Halving properties, the authors devised a caching scheme to
1445	   relieve congestion in a large P2P network. They have also modified
1446	   the algorithm to be more robust in the presence of random faults
1447	   [132].
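
   The halving property used in the first lookup phase can be checked
   with a short Python sketch. It assumes the 'left' and 'right' legs
   are the maps p -> p/2 and p -> p/2 + 1/2 on [0,1); this choice, and
   the function names, are assumptions made for illustration rather
   than a transcription of the cited algorithm. Exact fractions are
   used so that the distances can be compared without rounding.

```python
from fractions import Fraction


def left(p):
    """Left leg: map point p in [0,1) to p/2."""
    return p / 2


def right(p):
    """Right leg: map point p in [0,1) to p/2 + 1/2."""
    return p / 2 + Fraction(1, 2)


def walk(point, bits):
    """Apply the legs selected by `bits` (0 = left, 1 = right) in order."""
    trace = [point]
    for bit in bits:
        point = right(point) if bit else left(point)
        trace.append(point)
    return trace


if __name__ == "__main__":
    x, y = Fraction(1, 8), Fraction(7, 8)
    bits = [1, 0, 1]  # the shared string <str>, one bit per hop
    for a, b in zip(walk(x, bits), walk(y, bits)):
        print(a, b, abs(a - b))
```

   Because both points follow the same bit string, the gap between
   them shrinks from 3/4 to 3/8, 3/16 and 3/32 over three hops,
   whatever bits are chosen.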

1449	   Solid comparisons of DHT resilience are scarce, but Loguinov, Kumar
1450	   et al. give just that in their ODRI paper [36]. They compare Chord,
1451	   CAN and de Bruijn in terms of routing performance, graph expansion
1452	   and clustering. At the outset, they give the optimal diameter (the
1453	   maximum hopcount between any two nodes in the graph) and average
1454	   hopcount for graphs of fixed degree. De Bruijn graphs converge to
1455	   both optima, and outperform Chord and CAN on both counts. These
1456	   optima impact both delay and aggregate lookup load. They present two
1457	   clustering measures (edge expansion and node expansion) which are
1458	   interesting for resilience. Unfortunately, after decades of de Bruijn
1459	   research, they have no exact solution. De Bruijn was shown to be
1460	   superior in terms of path overlap - "de Bruijn automatically selects
1461	   backup paths that do not overlap with the best shortest path or with
1462	   each other" [36].

1464	3.5.6. Skip Graphs

1466	   Skip Graphs have been pursued by two research camps [38, 41]. They
1467	   augment the earlier Skip Lists [347, 348]. Unlike earlier balanced
1468	   trees, the Skip List is probabilistic - its insert and delete
1469	   operations do not require tree rearrangements and so are faster by a
1470	   constant factor. The Skip List consists of layers of ordered linked
1471	   lists. All nodes participate in the bottom layer 0 list. Some of
1472	   these nodes participate in the layer 1 list with some fixed
1473	   probability. A subset of layer 1 nodes participate in the layer 2
1474	   list, and so on. A lookup can proceed quickly through the list by
1475	   traversing the sparse upper layers until it is close to, or at, the
1476	   target. Unfortunately, nodes in the upper layers of a Skip List are
1477	   potential hot spots and single points of failure. Unlike Skip Lists,
1478	   Skip Graphs provide multiple lists at each level for redundancy, and
1479	   every node participates in one of the lists at each level.
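
   For readers unfamiliar with the underlying structure, a minimal
   single-machine Skip List is sketched below in Python. It shows only
   the layered, probabilistic lists described above; it is not a Skip
   Graph, and the class names, the promotion probability of 1/2 and
   the height limit of 16 are illustrative assumptions.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i] is the next node in layer i


class SkipList:
    def __init__(self, max_height=16, p=0.5, seed=None):
        self.max_height = max_height
        self.p = p
        self.rng = random.Random(seed)
        self.head = Node(None, max_height)  # sentinel present in every layer

    def _random_height(self):
        """Promote a node to the next layer with fixed probability p."""
        height = 1
        while height < self.max_height and self.rng.random() < self.p:
            height += 1
        return height

    def _predecessors(self, key):
        """Last node with a smaller key in each layer, found top-down."""
        preds = [self.head] * self.max_height
        node = self.head
        for layer in range(self.max_height - 1, -1, -1):
            while (node.forward[layer] is not None
                   and node.forward[layer].key < key):
                node = node.forward[layer]
            preds[layer] = node
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        nxt = preds[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # key already present
        node = Node(key, self._random_height())
        for layer in range(len(node.forward)):
            node.forward[layer] = preds[layer].forward[layer]
            preds[layer].forward[layer] = node

    def contains(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def keys(self):
        """Walk the bottom layer, which holds every key in order."""
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

   Insertion only splices pointers in the layers the new node joins, so
   no global rebalancing is needed. The sparse upper layers are also
   where the hot spots mentioned above arise.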

1481	   Each node in a Skip Graph has theta(log n) neighbours on average,
1482	   like some of the preceding DHTs. The Skip Graph's primary edge over
1483	   the DHTs is its support for prefix and proximity search. DHTs hash
1484	   objects to a random point in the graph. Consequently, they give no
1485	   guarantees over where the data is stored. Nor do they guarantee that
1486	   the path to the data will stay within a single administration as far
1487	   as possible [38]. Skip graphs, on the other hand, provide for
1488	   location-sensitive name searches. For example, to find the document
1489	   docname on the node user.company.com, the Skip Graph might step
1490	   through its ordered lists for the prefix com.company.user [38].
1491	   Alternatively, to find an object with a numeric identifier, an
1492	   algorithm might search the lowest layer of the Skip Graph for the
1493	   first digit, the next layer for the next digit, in the same vein
1494	   until all digits are resolved. Being ordered, Skip Graphs also
1495	   facilitate range searches. In each of these examples, the Skip Graph
1496	   can be arranged such that the path to the target, as far as possible,
1497	   stays within an administrative boundary. If one administration is
1498	   detached from the rest of the Skip Graph, routing can continue within
1499	   each of the partitions. Mechanisms have been devised to merge
1500	   disconnected segments [157], though at this stage, segments are
1501	   remerged one at a time. A parallel merge algorithm has been flagged
1502	   for future work.
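
   The name-ordering idea can be sketched in a few lines of Python. A
   single sorted list stands in for the bottom-level list of a Skip
   Graph; the distributed structure, the multiple levels and the
   routing are not modelled. The helper names and the example host
   names are hypothetical.

```python
from bisect import bisect_left


def name_id(hostname, docname=""):
    """Reverse the DNS labels so names sort by administrative domain."""
    labels = hostname.split(".")[::-1]
    if docname:
        labels.append(docname)
    return ".".join(labels)


def prefix_range(sorted_ids, prefix):
    """Return the identifiers that lie strictly under `prefix`.

    Identifiers sharing a prefix are contiguous in sorted order, so one
    binary search followed by a linear scan is enough.
    """
    bound = prefix + "."
    start = bisect_left(sorted_ids, bound)
    matches = []
    for ident in sorted_ids[start:]:
        if not ident.startswith(bound):
            break
        matches.append(ident)
    return matches


if __name__ == "__main__":
    ids = sorted([
        name_id("user.example.com", "report"),
        name_id("lab.example.com", "notes"),
        name_id("www.example.org", "index"),
    ])
    print(prefix_range(ids, "com.example"))
```

   Because all names under com.example are adjacent in the ordering, a
   search confined to that prefix never needs to touch entries held by
   another administration. The same ordering is what makes range
   searches possible.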

1504	   The advantages of Skip Graphs come at a cost. To be able to provide
1505	   range queries and data placement flexibility, Skip Graph nodes
1506	   require many more pointers than their DHT counterparts. An increased
1507	   number of pointers implies increased maintenance traffic. Another
1508	   shortcoming of at least one of the early proposals was that no
1509	   algorithm was given to assign keys to machines. Consequently, there
1510	   are no guarantees on system-wide load balancing or on the distance
1511	   between adjacent keys [100]. Aspnes, Kirsch et al. have recently
1512	   devised a scheme to reduce the inter-machine pointer count from
1513	   O(mlogm), where m is the number of data elements, to O(nlogn), where
1514	   n is the number of nodes [100]. They proposed a two-layer scheme -
1515	   one layer for the Skip Graph itself and the second 'bucket layer'.
1516	   Each machine is responsible for a number of buckets and each bucket
1517	   elects a representative key. Nodes locally adjust their load. They
1518	   accept additional keys if they are below their threshold or disperse
1519	   keys to nearby nodes if they are above threshold. There appear to be
1520	   numerous open issues: simulations have been done but analysis is
1521	   outstanding; mechanisms are required to handle the arrival and
1522	   departure of nodes; there were only brief hints as to how to handle
1523	   nodes with different capacities.

1525	4. Semantic Index

1527	   Semantic indexes capture object relationships. While the semantic-
1528	   free methods (DHTs) have firmer theoretic foundations and guarantee
1529	   that a key can be found if it exists, they do not on their own
1530	   capture the relationships between the document name and its content
1531	   or metadata. Semantic P2P designs do. However, since their design is
1532	   often driven by heuristics, they may not guarantee that scarce items
1533	   will be found.

1535	   So what might the semantically indexed P2Ps add to an already crowded
1536	   field of distributed information architectures? At one extreme there
1537	   are the distributed relational database management systems (RDBMSs),
1538	   with their strong consistency guarantees [284]. They provide strong
1539	   data independence, the flexibility of SQL queries and strong
1540	   transactional semantics - Atomicity, Consistency, Isolation and
1541	   Durability (ACID) [349]. They guarantee that the query response is
1542	   complete - all matching results are returned. The price is
1543	   performance. They scale to perhaps 1000 nodes, as evidenced in
1544	   Mariposa [350, 351], or require query caching front ends to constrain
1545	   the load [284]. Database research has "arguably been cornered into
1546	   traditional, high-end, transactional applications" [72]. Then there
1547	   are distributed file systems, like the Network File System (NFS) or
1548	   the Serverless Network File System (xFS), with little data
1549	   independence, low-level file retrieval interfaces and varied
1550	   consistency [284]. Today's eclectic mix of Content Distribution
1551	   Networks (CDNs) generally offload primary servers by redirecting web
1552	   requests to a nearby replica. Some intercept the HTTP requests at the
1553	   DNS level and then use consistent hashing to find a replica [23].
1554	   Since this same consistent hashing was a forerunner to the DHT
1555	   approaches above, CDNs are generally constrained to the same simple
1556	   key lookups.
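
   As a reminder of what such a 'simple key lookup' involves, the
   following Python sketch places replicas and request keys on a hash
   ring. It is a generic illustration of consistent hashing, not the
   mechanism of any particular CDN; the use of SHA-1, the class name
   and the replica labels are assumptions.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point on the ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, replicas):
        self.points = sorted((ring_position(r), r) for r in replicas)

    def lookup(self, key):
        """Return the first replica at or after the key's ring position."""
        index = bisect.bisect_left(self.points, (ring_position(key), ""))
        return self.points[index % len(self.points)][1]

    def remove(self, replica):
        """Drop a replica; only the keys it owned move to a successor."""
        self.points = [p for p in self.points if p[1] != replica]


if __name__ == "__main__":
    ring = HashRing(["replica-a", "replica-b", "replica-c"])
    print(ring.lookup("/index.html"))
```

   The lookup answers exactly one question, "which replica owns this
   key?", which is why systems built on it inherit the same
   restriction to exact-match keys.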

1558	   The opportunity for semantically indexed P2Ps, then, is to provide:

1560	   a) graduated data independence, consistency and query flexibility,
1561	   and

1563	   b) probabilistically complete query responses, across

1565	   c) very large numbers of low-cost, geographically distributed,
1566	   dynamic nodes.

1568	4.1. Keyword Lookup

1570	   P2P keyword lookup is best understood by considering the structure of
1571	   the underlying index and the algorithms by which queries are routed
1572	   over that index. Figure 3 summarizes the following paragraphs by
1573	   classifying the keyword query algorithms, index structures and
1574	   metrics. The research has largely focused on scalability, not
1575	   dependability. There have been very few studies that quantify the
1576	   impact of network churn. One exception is the work by Chawathe et al.
1577	   on the Gia system [61]. Gia's combination of algorithms from Figure 3
1578	   (receiver-based flow control, biased random walk and one-hop
1579	   replication) gave 2-4 orders of magnitude improvement in query
1580	   success rates in churning networks.

1582	   QUERY
1583	   Query routing
1584	     Flooding: Peers only index local files so queries must propagate
1585	       widely [4]
1586	     Policy-based: Choice of the next hop node: random; most/least
1587	       recently used; most files shared; most results [265, 352]
1588	     Random walks: Parallel [67] or biased random walks [61, 66]
1589	   Query forwarding
1590	     Iterative: Nodes perform iterative unicast searches of ultrapeers,
1591	       until the desired number of results is achieved. See Gnutella UDP
1592	       Extension for Scalable Searches (GUESS) [265, 353]
1594	     Recursive
1595	   Query flow control
1596	     Receiver-controlled: Receivers grant query tokens to senders, so
1597	       as to avoid overload [61]
1598	     Reactive: sender throttles queries when it notices receivers are
1599	       discarding packets [61, 66]
1600	     Dynamic Time To Live: In the Dynamic Query Protocol, the sender
1601	       adjusts the time-to-live on each iteration based on the number
1602	       of results received, the number of connections left, and the
1603	       number of nodes already theoretically reached by the search [354]

1605	   INDEX
1606	   Distribution
1607	     Compression: Leaf nodes periodically send ultrapeers compressed
1608	       query routing tables, as in the Query Routing Protocol [260]
1609	     One hop replication: Nodes maintain an index of content on their
1610	       nearest neighbors [61, 352]
1611	   Partitioning
1612	     By document [210]
1613	     By keyword: Use an inverted list to find a matching document,
1614	       either locally or at another peer [21]. Partition by keyword sets
1615	       [355]
1616	     By document and keyword: Also called Multi-Level Partitioning [21]

1618	   METRIC
1619	   Query load: Queries per second per node/link [65, 265]
1620	   Degree: The number of links per node [66, 352]. Early P2P networks
1621	     approximated power-law networks, where the number of nodes with L
1622	     links is proportional to L^(-k) where k is a constant [65]
1623	   Query delay: Reported in terms of time and hop count [61, 66]
1624	   Query success rate: The "Collapse Point" is the per-node query rate
1625	     at which the query success rate drops below 90% [61]. See also
1626	     [265, 352].

1628	                  Figure 3 Keyword Lookup in P2P Systems.

1630	4.1.1. Gnutella Enhancements

1632	   Perhaps the most widely referenced P2P system for simple keyword
1633	   match is Gnutella [4]. Gnutella queries contain a string of keywords.
1634	   Gnutella peers answer when they have files whose names contain all
1635	   the keywords. As discussed in Section 2.1, early versions of
1636	   Gnutella did not forward the document index. Queries were flooded and
1637	   peers searched their own local indexes for filename matches. An early
1638	   review highlighted numerous areas for improvement [65]. It was
1639	   estimated that the query traffic alone from 50,000 early-generation
1640	   Gnutella nodes would amount to 1.7% of the total U.S. internet
1641	   backbone traffic at December 2000 levels. It was speculated that high
1642	   degree Gnutella nodes would impede dependability. An unnecessarily
1643	   high percentage of Gnutella traffic crossed Autonomous System (AS)
1644	   boundaries - a locality mechanism may have found suitable nearby
1645	   peers.

1647	   Fortunately, there have since been numerous enhancements within the
1648	   Gnutella Developer Forum. At the time of writing, it has been
1649	   reported that Gnutella has almost 350,000 unique hosts, of which
1650	   nearly 90,000 accept incoming connections [356]. One of the main
1651	   improvements is that an index of filename keywords, called the Query
1652	   Routing Table (QRT), can now be forwarded from 'leaf peers' to their
1653	   'ultrapeers' [260]. Ultrapeers can then ensure that the leaves only
1654	   receive queries for which they have a match, dramatically reducing
1655	   the query traffic at the leaves. Ultrapeers can have connections to
1656	   many leaf nodes (~10-100) and a small number of other ultrapeers
1657	   (<10) [260]. Originally, a leaf node's QRT was not forwarded by the
1658	   parent ultrapeer to other ultrapeers. More recently, there has been a
1659	   proposal to distribute aggregated QRTs amongst ultrapeers [357]. To
1660	   further limit traffic, QRTs are compressed by hashing, according to
1661	   the Query Routing Protocol (QRP) specification [281]. This same
1662	   specification claims QRP may reduce Gnutella traffic by orders of
1663	   magnitude, but cautions that simulation is required before mass
1664	   deployment. A known shortcoming of QRP was that the extent of query
1665	   propagation was independent of the popularity of the search terms.
1666	   The Dynamic Query Protocol addressed this [358]. It required leaf
1667	   nodes to send single queries to high-degree ultrapeers which adjust
1668	   the queries' time-to-live (TTL) bounds according to the number of
1669	   received query results. An earlier proposal, called the Gnutella UDP
1670	   Extension for Scalable Searches (GUESS) [353], similarly aimed to
1671	   reduce the number of queries for widely distributed files. GUESS
1672	   reuses the non-forwarding idea (Section 2). A GUESS peer repeatedly
1673	   queries single ultrapeers with a TTL of 1, with a small timeout on
1674	   each query to limit load. It chooses the number of iterations and
1675	   selects ultrapeers so as to satisfy its search needs. For
1676	   adaptability, a small number of experimental Gnutella nodes have
1677	   implemented eXtensible Markup Language (XML) schemas for richer
1678	   queries [359, 360]. None of the above Gnutella proposals explicitly
1679	   assess robustness.

1681	   The broader research community has recently been leveraging aspects
1682	   of the Gnutella design. Lv, Ratnasamy et al. exposed one assumption
1683	   implicit in some of the early DHT work - that designs "such as
1684	   Gnutella are inherently not scalable, and therefore should be
1685	   abandoned" [66]. They argued that by making better use of the more
1686	   powerful peers, Gnutella's scalability issues could be alleviated.
1687	   Instead of its flooding mechanism, they used random walks. Their
1688	   preliminary design to bias random walks towards high capacity nodes
1689	   did not go as far as the ultrapeer proposals in that the indexes did
1690	   not move to the high capacity nodes. Chawathe, Ratnasamy et al. chose
1691	   to extend the Gnutella design with their Gia system, in response to
1692	   the perceived shortcomings of DHTs in Section 1.2 [61]. Compared to
1693	   the early Gnutella designs, they incorporated several novel features.
1694	   They devise a topology adaptation algorithm so that most peers are
1695	   attached to high-degree peers. They use a random walk search
1696	   algorithm, in lieu of flooding, and bias the query load towards
1697	   higher-degree peers. For 'one-hop replication', they require all
1698	   nodes to keep pointers to content on adjacent peers. To implement a
1699	   receiver-controlled token-based flow control, a peer must have a
1700	   token from its neighbouring peer before it sends a query to it.
1701	   Chawathe, Ratnasamy et al. show by simulations that the combination
1702	   of these features provides a scalability improvement of three to five
1703	   orders of magnitude over Gnutella "while retaining significant
1704	   robustness". The main robustness metrics they used were the 'collapse
1705	   point' query rate (the per node query rate at which the successful
1706	   query rate falls below 90%) and the average hop-count immediately
1707	   prior to collapse. Their comparison with Gnutella did not take into
1708	   account the Gnutella enhancements above - this was left as future
1709	   work. Castro, Costa and Rowstron argued that if Gnutella were built
1710	   on top of a structured overlay, then both the query and overlay
1711	   maintenance traffic could be reduced [259]. Yang, Vinograd et al.
1712	   explore various policies for peer selection in the GUESS protocol,
1713	   since the issue is left open in the original proposal [265]. For
1714	   example, the peer initiating the query could choose peers that have
1715	   been "most recently used" or that have the "most files shared".
1716	   Various policy pitfalls are identified. For example, good peers could
1717	   be overloaded, victims of their own success. Alternatively, malicious
1718	   peers could encourage the querying peer to try inactive peers. They
1719	   conclude that a "most results" policy gives the best balance of
1720	   robustness and efficiency. Like Castro, Costa and Rowstron, they
1721	   concentrated on the static network scenario. Cholvi, Felber et al.
1722	   very briefly describe how similar "least recently used" and "most
1723	   often used" heuristics can be used by a peer to select peer
1724	   'acquaintances' [352]. They were motivated by the congestion
1725	   associated with Gnutella's TTL-limited flooding. Recognizing that the
1726	   busiest peers can quickly become overloaded central hubs for the
1727	   entire network, they limit the number of acquaintances for any given
1728	   peer to 25. They sketch a mechanism to decrement a query's TTL
1729	   multiple times when it traverses "interested peers". In summary,
1730	   these Gnutella-related investigations are characterized by a bias for
1731	   high degree peers and very short directed query paths, a disdain for
1732	   flooding, and concern about excessive load on the 'better' peers.
1733	   Generally, the robustness analysis for dynamic networks (content
1734	   updates and node arrivals/departures) remains open.
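
   Two of the ideas recurring in this work, a walk biased towards
   high-capacity peers and one-hop replication, can be combined in a
   short Python sketch. It is a deterministic simplification and not
   the Gia algorithm itself: the walk always steps to the
   highest-capacity unvisited neighbour, token-based flow control and
   topology adaptation are omitted, and the data layout and function
   name are assumptions.

```python
def biased_walk(graph, capacity, content, start, keyword, max_hops=10):
    """Walk towards high-capacity peers, checking each one-hop index.

    graph:    peer -> list of neighbouring peers
    capacity: peer -> relative capacity used to bias the walk
    content:  peer -> set of keywords for locally stored files
    Returns (peer_holding_match, hops_taken) or (None, hops_taken).
    """
    current, visited, hops = start, {start}, 0
    while True:
        # One-hop replication: a peer answers for itself and its neighbours.
        for peer in [current] + graph[current]:
            if keyword in content.get(peer, set()):
                return peer, hops
        if hops == max_hops:
            return None, hops
        candidates = [p for p in graph[current] if p not in visited]
        if not candidates:
            return None, hops
        # Bias: step to the highest-capacity neighbour not yet visited.
        current = max(candidates, key=lambda p: capacity[p])
        visited.add(current)
        hops += 1


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d", "e"],
             "d": ["c"], "e": ["c"]}
    capacity = {"a": 1, "b": 1, "c": 10, "d": 1, "e": 1}
    content = {"e": {"song"}}
    print(biased_walk(graph, capacity, content, "a", "song"))
```

   In this toy topology the query started at peer a reaches the hub c
   in one hop, and c's one-hop index already covers peer e, so the
   match is found without visiting b, d or e.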

1736	4.1.2. Partition-by-Document, Partition-by-Keyword

1738	   One aspect of P2P keyword search systems has received particular
1739	   attention: should the index be partitioned by document or by keyword?
1740	   The issue affects scalability. To be partitioned by document, each
1741	   node has a local index of documents for which it is responsible.
1742	   Gnutella is a prime example. Queries are generally flooded in systems
1743	   partitioned by document. On the other hand, a peer may assume
1744	   responsibility for a set of keywords. The peer uses an inverted list
1745	   to find a matching document, either locally or at another peer. If
1746	   the query contains several keywords, inverted lists may need to be
1747	   retrieved from several different peers to find the intersection [21].
1748	   The initial assessment by Li, Loo et al. was that the partition-by-
1749	   document approach was superior [210]. For one scenario of a full-text
1750	   web search, they estimated the communications costs to be about six
1751	   times higher than the feasible budget. However, wanting to exploit
1752	   prior work on inverted list intersection, they studied the partition-
1753	   by-keyword strategy. They proposed several optimizations which put
1754	   the communication costs for a partition-by-keyword system within an
1755	   order of magnitude of feasibility. There had been a couple of prior
1756	   papers that suggested partitioned-by-keyword designs incorporate DHTs
1757	   to map keywords to peers [355, 361]. In Gnawali's Keyword-set Search
1758	   System (KSS), the index is partitioned by sets of keywords [355].
1759	   Terpstra, Behnel et al. point out that by keeping keyword pairs or
1760	   triples, the number of lists per document in KSS is squared or
1761	   cubed [362]. Shi, Guangwen et al. interpreted the approximations of
1762	   Li, Loo et al. to mean that neither approach is feasible on its own
1763	   [21]. Their Multi-Level Partitioning (MLP) scheme incorporates both
1764	   partitioning approaches. They arrange nodes into a group hierarchy,
1765	   with all nodes in the single 'level 0' group, and with the same nodes
1766	   sub-divided into k logical subgroups on 'level 1'. The subgroups are
1767	   again divided, level by level, until level l. The inverted index is
1768	   partitioned by document between groups and by keyword within groups.
1769	   MLP avoids the query flooding normally associated with systems
1770	   partitioned by document, since a small number of nodes in each group
1771	   process the query. It reduces the bandwidth overheads associated with
1772	   inverted list intersection in systems partitioned solely by keyword,
1773	   since groups can calculate the intersection independently over the
1774	   documents for which they are responsible. MLP was overlaid on
1775	   SkipNet, per Section 3.5.6. [38]. Some initial analyses of
1776	   communications costs and query latencies were provided.
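
   The combined partitioning can be illustrated with a minimal sketch.
   It is not the MLP implementation from [21]: it assumes a single
   level of subgroups, and the names TwoLevelIndex and stable_hash are
   hypothetical. Documents are assigned to a group by hashing the
   document identifier, and within each group a keyword's inverted
   list is placed on a node by hashing the keyword.

```python
import hashlib
from collections import defaultdict


def stable_hash(text: str) -> int:
    """Deterministic hash, so placement is repeatable across runs."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


class TwoLevelIndex:
    """Toy index: by document between groups, by keyword within a group."""

    def __init__(self, num_groups: int, nodes_per_group: int) -> None:
        self.num_groups = num_groups
        self.nodes_per_group = nodes_per_group
        # index[(group, node)][keyword] -> set of document ids
        self.index = defaultdict(lambda: defaultdict(set))

    def insert(self, doc_id: str, keywords: set) -> None:
        # Partition by document: one group owns the whole document.
        group = stable_hash(doc_id) % self.num_groups
        for kw in keywords:
            # Partition by keyword: one node per keyword inside the group.
            node = stable_hash(kw) % self.nodes_per_group
            self.index[(group, node)][kw].add(doc_id)

    def query(self, keywords: set) -> set:
        results = set()
        for group in range(self.num_groups):
            # Each group intersects only its own inverted lists, so the
            # lists exchanged are shorter than in a keyword-only system.
            lists = []
            for kw in keywords:
                node = stable_hash(kw) % self.nodes_per_group
                lists.append(self.index[(group, node)].get(kw, set()))
            if lists:
                results |= set.intersection(*lists)
        return results
```

   In this sketch a query touches at most one node per keyword in each
   group, rather than every node, and the final answer is the union of
   the per-group intersections.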

1778	4.1.3. Partial Search, Exhaustive Search

1780	   Much of the research above addresses partial keyword search. Daswani
1781	   et al. highlighted the open problem of efficient, comprehensive
1782	   keyword search [25]. How can exhaustive searches be achieved without
1783	   flooding queries to every peer in the network? Terpstra, Behnel et
1784	   al. couched the keyword search problem in rendezvous terms: dynamic
1785	   keyword queries need to 'meet' with static document lists [362].
1786	   Their Bitzipper scheme is partitioned by document. They improved on
1787	   full flooding by putting document metadata on 2sqrt(n) nodes and
1788	   forwarding queries through only 6sqrt(n) nodes. They reported that
1789	   Bitzipper nodes need only 1/166th of the bandwidth of full-flooding
1790	   Gnutella nodes for an exhaustive search. An initial comparison of
1791	   query load was given. There was little consideration of either static
1792	   or dynamic resilience, that is, of nodes failing, of documents
1793	   continually changing, or of nodes continually joining and leaving the
1794	   network.

1796	4.2. Information Retrieval

1798	   The field of Information Retrieval (IR) has matured considerably
1799	   since its inception in the 1950s [363]. A taxonomy for IR models has
1800	   been formalized [262]. It consists of four elements: a representation
1801	   of documents in a collection; a representation of user queries; a
1802	   framework describing relationships between document representations
1803	   and queries; and a ranking function that quantifies an ordering
1804	   amongst documents for a particular query. Three main issues motivate
1805	   current IR research - information relevance, query response time, and
1806	   user interaction with IR systems. The dominant IR trends for
1807	   searching large text collections are also threefold [262]. The size
1808	   of collections is increasing dramatically. More complicated search
1809	   mechanisms are being found to exploit document structure, to
1810	   accommodate heterogeneous document collections, and to deal with
1811	   document errors. Compression is in favour - it may be quicker to
1812	   search compact text or retrieve it from external devices. In a
1813	   distributed IR system, query processing has four parts. Firstly,
1814	   particular collections are targeted for the search. Secondly, queries
1815	   are sent to the targeted collections. Queries are then evaluated at
1816	   the individual collections. Finally results from the collections are
1817	   collated.

1819	   So how do P2P networks differ from distributed IR systems? Bawa,
1820	   Manku et al. presented four differences [62]. They suggested that a
1821	   P2P network is typically larger, with tens or hundreds of thousands
1822	   of nodes. It is usually more dynamic, with node lifetimes measured in
1823	   hours. They suggested that a P2P network is usually homogeneous, with
1824	   a common resource description language. It lacks the centralized
1825	   "mediators" found in many IR systems, that assume responsibility for
1826	   selecting collections, for rewriting queries, and for merging ranked
1827	   results. These distinctions are generally aligned with the peer
1828	   characteristics in Section 1. One might add that P2P nodes display
1829	   more symmetry - peers are often both information consumers and
1830	   producers. Daswani, Garcia-Molina et al. pointed out that, while
1831	   there are IR techniques for ranked keyword search at moderate scale,
1832	   research is required so that ranking mechanisms are efficient at the
1833	   larger scale targeted by P2P designs [25]. Joseph and Hoshiai
1834	   surveyed several P2P systems using metadata techniques from the IR
1835	   toolkit [60]. They described an assortment of IR techniques and P2P
1836	   systems, including various metadata formats, retrieval models, bloom
1837	   filters, DHTs and trust issues.

1839	   In the ensuing paragraphs, we survey P2P work that has incorporated
1840	   information retrieval models, particularly the Vector Model and the
1841	   Latent Semantic Indexing Model. We omit the P2P work based on
1842	   Bayesian models. Some have pointed to such work [60], but it made no
1843	   explicit mention of the model [364]. One early paper on P2P content-
1844	   based image retrieval also leveraged the Bayesian model [365]. For
1845	   the former two models, we briefly describe the design, then try to
1846	   highlight robustness aspects. On robustness, we are again stymied for
1847	   lack of prior work. Indeed, a search across all proceedings of the
1848	   Annual ACM Conference on Research and Development in Information
1849	   Retrieval for the words "reliable", "available", "dependable" or
1850	   "adaptable" did not return any results at the time of writing. In
1851	   contrast, a standard text on distributed database management systems
1852	   [366] contains a whole chapter on reliability. IR research
1853	   concentrates on performance measures. Common performance measures
1854	   include recall, the fraction of the relevant documents which has been
1855	   retrieved, and precision, the fraction of the retrieved documents
1856	   which is relevant [262]. Ideally, an IR system would have high recall
1857	   and high precision. Unfortunately techniques favouring one often
1858	   disadvantage the other [363].
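
   The two measures can be stated concretely. The sketch below follows
   the definitions given above; the function name and the convention
   of returning zero for an empty set are assumptions made for
   illustration.

```python
def recall_and_precision(retrieved: set, relevant: set) -> tuple:
    """Return (recall, precision) for one query.

    recall    = fraction of the relevant documents that were retrieved
    precision = fraction of the retrieved documents that are relevant
    """
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

   For example, retrieving four documents of which two are among three
   relevant ones gives a recall of 2/3 and a precision of 1/2.
   Retrieving more documents can only raise recall, but it tends to
   lower precision, which is the tension noted above.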

1860	4.2.1. Vector Model (PlanetP, FASD, eSearch)

1862	   The vector model [367] represents both documents and queries as term
1863	   vectors, where a term could be a word or a phrase. If a document or
1864	   query has a term, the weight of the corresponding dimension of the
1865	   vector is non-zero. The similarity of the document and query vectors
1866	   gives an indication of how well a document matches a particular
1867	   query.

1869	   The weighting calculation is critical across the retrieval models.
1870	   Amongst the numerous proposals for the probabilistic and vector
1871	   models, there are some commonly recurring weighting factors [363].
1872	   One is term frequency. The more a term is repeated in a document, the
1873	   more important the term is. Another is inverse document frequency.
1874	   Terms common to many documents give less information about the
1875	   content of a document. Then there is document length. Larger
1876	   documents can bias term frequencies, so weightings are sometimes
1877	   normalized against document length. The expression "TFIDF weighting"
1878	   refers to the collection of weighting calculations that incorporate
1879	   term frequency and inverse document frequency, not just to one. Two
1880	   weighting calculations have been particularly dominant - Okapi [368]
1881	   and pivoted normalization [369]. A distributed version of Google's
1882	   Pagerank algorithm has also been devised for a P2P environment [370].
1883	   It allows incremental, ongoing Pagerank calculations while documents
1884	   are inserted and deleted.
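
   As an illustration of these weighting factors, the sketch below
   uses one simple member of the TFIDF family, the raw term count
   multiplied by log(N/df), and compares vectors by cosine similarity,
   which also normalizes for vector length. It is not Okapi or pivoted
   normalization, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict) -> tuple:
    """Build a sparse TFIDF vector per document, plus the idf table."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))          # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)            # term frequency in this document
        vectors[doc_id] = {t: tf[t] * idf[t] for t in tf}
    return vectors, idf


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_terms: list, docs: dict) -> list:
    """Return (score, doc_id) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(t for t in query_terms if t in idf)
    q_vec = {t: q_tf[t] * idf[t] for t in q_tf}
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    return sorted(scored, reverse=True)
```

   Note that the idf table needs collection-wide statistics (N and the
   document frequencies). Obtaining these without a central index is
   one of the problems the P2P designs below have to address.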

1886	   A couple of early P2P systems leveraged the vector model. Building on
1887	   the vector model, PlanetP divided the ranking problem into two steps
1888	   [215]. In the first, peers are ranked for the probability that they
1889	   have matching documents. In the second, higher priority peers are
1890	   contacted and the matching documents are ranked. An Inverse Peer
1891	   Frequency, analogous to the Inverse Document Frequency, is used to
1892	   rank relevant peers. To further constrain the query traffic, PlanetP
1893	   contacts only the first group of m peers to retrieve a relevant set
1894	   of documents. In this way, it repeatedly contacts groups of m peers
1895	   until the top k document rankings are stable. While the PlanetP
1896	   designers first quantified recall and precision, they also considered
1897	   reliability. Each PlanetP peer has a global index with a list of all
1898	   other peers, their IP addresses, and their Bloom filters. This large
1899	   volume of shared information needs to be maintained. Klampanos and
1900	   Jose saw this as PlanetP's primary shortcoming [371]. Each Bloom
1901	   filter summarized the set of terms in the local index of each peer.
1902	   The time to propagate changes, be they new documents or peer
1903	   arrivals/departures, was studied by simulation for up to 1000 peers.
1904	   The reported propagation times were in the hundreds of seconds.
1905	   Design workarounds were required for PlanetP to be viable across
1906	   slower dial-up modem connections. For future work, the authors were
1907	   considering some sort of hierarchy to scale to larger numbers of
1908	   peers.
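
   The two-step ranking can be sketched as follows. This is an
   illustration under stated assumptions, not the PlanetP code: plain
   Python sets stand in for the Bloom filters (so there are no false
   positives), the Inverse Peer Frequency is assumed to take the form
   log(1 + N/N_t), which should be checked against [215], and the
   document score is a placeholder that simply counts matching query
   terms. All function and parameter names are hypothetical.

```python
import math


def rank_peers(query: set, peer_terms: dict) -> list:
    """Step 1: order peers by the summed IPF of the query terms they hold."""
    n = len(peer_terms)
    ipf = {}
    for term in query:
        holders = sum(1 for terms in peer_terms.values() if term in terms)
        if holders:
            # Assumed IPF form: terms held by fewer peers weigh more.
            ipf[term] = math.log(1 + n / holders)
    scores = {
        peer: sum(w for t, w in ipf.items() if t in terms)
        for peer, terms in peer_terms.items()
    }
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [p for p in ranked if scores[p] > 0]


def search(query: set, peer_terms: dict, peer_docs: dict, m: int, k: int) -> list:
    """Step 2: contact peers in groups of m until the top k is stable."""
    ranked = rank_peers(query, peer_terms)
    candidates = {}
    top_k = []
    for start in range(0, len(ranked), m):
        for peer in ranked[start:start + m]:
            for doc_id, terms in peer_docs[peer].items():
                score = len(query & terms)   # placeholder document score
                if score:
                    candidates[doc_id] = score
        new_top = sorted(candidates, key=lambda d: (-candidates[d], d))[:k]
        if new_top == top_k:
            break                            # ranking unchanged: stop early
        top_k = new_top
    return top_k
```

   The early exit is what constrains query traffic: low-ranked peers
   are never contacted once a further group of m peers fails to change
   the top k.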

1910	   A second early system using the vector model is the Fault-tolerant,
1911	   Adaptive, Scalable Distributed (FASD) search engine [283], which
1912	   extended the Freenet design (Section 2.3.) for richer queries. The
1913	   original Freenet design could find a document based on a globally
1914	   unique identifier. Kronfol's design added the ability to search, for
1915	   example, for documents about "apples AND oranges NOT bananas". It
1916	   uses a TFIDF weighting scheme to build a document's term vector. Each
1917	   peer calculates the similarity of the query vector and local
1918	   documents and forwards the query to the best downstream peer. Once
1919	   the best downstream peer returns a result, the second-best peer is
1920	   tried, and so on. Simulations with 1000 nodes gave an indication of
1921	   the query path lengths in various situations - when routing queries
1922	   in a network with constant rates of node and document insertion, when
1923	   bootstrapping the network in a "worst-case" ring topology, or when
1924	   failing randomly and specifically selected peers. Kronfol claimed
1925	   excellent average-case performance - less than 20 hops to retrieve
1926	   the same top n results as a centralized search engine. There were,
1927	   however, numerous cases where the worst-case path length was several
1928	   hundred hops in a network of only 1000 nodes.

1930	   In parallel, there have been some P2P designs based on the vector
1931	   model from the University of Rochester - pSearch [9, 372] and eSearch
1932	   [373]. The early pSearch paper suggested a couple of retrieval
1933	   models, one of which was the Vector Space Model, to search only the
1934	   nodes likely to have matching documents. To obtain approximate global
1935	   statistics for the TFIDF calculation, a spanning tree was constructed
1936	   across a subset of the peers. For the m top terms, the term-to-
1937	   document index was inserted into a Content-Addressable Network [334].
1938	   A variant which mapped terms to document clusters was also suggested.
1939	   eSearch is a hybrid of the partition-by-document and partition-by-
1940	   term approaches (Section 4.1.2.). eSearch nodes are primarily
1941	   partitioned by term. Each is responsible for the inverted lists for
1942	   some top terms. For each document in the inverted list, the node
1943	   stores the complete term list. To reduce the size of the index, the
1944	   complete term lists for a document are only kept on nodes that are
1945	   responsible for top terms in the document. eSearch uses the Okapi
1946	   term weighting to select top terms. It relies on the Chord DHT [34]
1947	   to associate terms with nodes storing the inverted lists. It also
1948	   uses automatic query expansion. This takes the significant terms from
1949	   the top document matches and automatically adds them to the user's
1950	   query to find additional relevant documents. The eSearch performance
1951	   was quantified in terms of search precision, the number of retrieved
1952	   documents, and various load-balancing metrics. Compared to the more
1953	   common proposals for partitioning by keywords, eSearch consumed 6.8
1954	   times the storage space to achieve faster search times.

1956	4.2.2. Latent Semantic Indexing (pSearch)

1958	   Another retrieval model used in P2P proposals is Latent Semantic
1959	   Indexing (LSI) [374]. Its key idea is to map both the document and
1960	   query vectors to a concept space with lower dimensions. The starting
1961	   point is a t*N weighting matrix, where t is the total number of
1962	   indexed terms, N is the total number of documents, and the matrix
1963	   elements could be TFIDF rankings. Using singular value decomposition,
1964	   this matrix is reduced to a smaller number of dimensions, while
1965	   retaining the more significant term-to-document mappings. Baeza-Yates
1966	   and Ribeiro-Neto suggested that LSI's value is a novel theoretic
1967	   framework, but that its practical performance advantage for real
1968	   document collections had yet to be proven [262]. pSearch incorporated
1969	   LSI [9]. By placing the indices for semantically similar documents
1970	   close in the network, Tang, Xu et al. touted significant bandwidth
1971	   savings relative to the early full-flooding variant of Gnutella
1972	   [372]. They plotted the number of nodes visited by a query. They also
1973	   explored the tradeoff with accuracy, the percentage match between the
1974	   documents returned by the distributed pSearch algorithm and those
1975	   from a centralized LSI baseline. In a more recent update to the
1976	   pSearch work, Tang, Dwarkadas et al. summarized LSI's shortcomings
1977	   [375]. Firstly, for large document collections, its retrieval quality
1978	   is inherently inferior to Okapi. Secondly, singular value
1979	   decomposition consumes excessive memory and computation time.
1980	   Consequently, the authors used Okapi for searching while retaining
1981	   LSI for indexing. With Okapi, they selected the next node to be
1982	   searched and selected documents on searched nodes. With LSI, they
1983	   ensured that similar documents are clustered near each other, thereby
1984	   optimizing the network search costs. When retrieving a small number
1985	   of top documents, the precision of LSI+Okapi approached that of
1986	   Okapi. However, if retrieving a large number of documents, the
1987	   LSI+Okapi precision is inferior. The authors want to improve this in
1988	   future work.

1990	4.2.3. Small Worlds

1992	   The "small world" concept originally described how people are
1993	   interconnected by short chains of acquaintances [376]. Kleinberg was
1994	   struck by the algorithmic lesson of the small world, namely "that
1995	   individuals using local information are collectively very effective
1996	   at constructing short paths between two points in a social network"
1997	   [377]. Small world networks have a small diameter and a large
1998	   clustering coefficient (a large number of connections amongst
1999	   relevant nodes) [378].

2001	   The small world idea has had a limited impact on peer-to-peer
2002	   algorithms. It has influenced only a few unstructured [62, 378-380]
2003	   and structured [344, 381] algorithms. The most promising work on
2004	   "small worlds" in P2P networks is that concerned with the
2005	   information retrieval metrics, precision and recall [62, 378, 380].

2007	5. Queries

2009	   Database research suggests directions for P2P research. Hellerstein
2010	   observed that, while work on fast P2P indexes is well underway, P2P
2011	   query optimization remains a promising topic for future research
2012	   [23]. Kossmann reviewed the state of the art of distributed query
2013	   processing, highlighting areas for future research: simulation and
2014	   query optimization for networks of tens of thousands of servers and
2015	   millions of clients; non-relational data types like XML, text and
2016	   images; and partial query responses since on the Internet "failure is
2017	   the rule rather than the exception" [19]. A primary motivation for
2018	   the P2P system, PIER, was to scale from the largest database systems
2019	   of a few hundred nodes to an Internet environment in which there are
2020	   over 160 million nodes [22]. Litwin and Sahri have also considered
2021	   ways to combine distributed hashing, more specifically the Scalable
2022	   Distributed Data Structures, with SQL databases, claiming to be first
2023	   to implement scalable distributed database partitioning [382].
2024	   Motivated by the lack of transparent distribution in current
2025	   distributed databases, they measure query execution times for
2026	   Microsoft SQL servers aggregated by means of an SDDS layer. One of
2027	   their starting assumptions was that it is too challenging to change
2028	   the SQL query optimizer.

2030	   Database research also suggests the approach to P2P research.
2031	   Researchers of database query optimization were divided between those
2032	   looking for optimal solutions in special cases and those using
2033	   heuristics to answer all queries [383]. Gribble et al. cast query
2034	   optimization in terms of the data placement problem, which is to
2035	   "distribute data and work so the full query workload is answered with
2036	   lowest cost under the existing bandwidth and resource constraints"
2037	   [250]. They pointed out that even the static version of this problem
2038	   is NP-complete in P2P networks. Consequently, research on massive,
2039	   dynamic P2P networks will likely progress using both strategies of
2040	   early database research - heuristics and special-case optimizations.

2042	   If P2P networks are going to be adaptable, if they are to support a
2043	   wide range of applications, then they need to accommodate many query
2044	   types [72]. Up to this point, we have reviewed queries for keys
2045	   (Section 3.) and keywords (Sections 4.1. and 4.2.). Unfortunately,
2046	   a major shortcoming of the DHTs in Section 3.5. is that they
2047	   primarily support exact-match, single-key queries. Skip Graphs
2048	   support range and prefix queries, but not aggregation queries. Here
2049	   we probe below the language syntax to identify the open research
2050	   issues associated with more expressive P2P queries [25].
2051	   Triantafillou and Pitoura observed the disparate P2P designs for
2052	   different types of queries and so outlined a unifying framework [76].
2053	   To classify queries, they considered the number of relations (single
2054	   or multiple), the number of attributes (single or multiple) and the
2055	   type of query operator. They described numerous operators: equality,
2056	   range, join and "special functions". The latter referred to
2057	   aggregation (like sum, count, average, minimum and maximum), grouping
2058	   and ordering. The following sections approximately fit their taxonomy
2059	   - range queries, multi-attribute queries, join queries and
2060	   aggregation queries. There has been some initial P2P work on other
2061	   query types - continuous queries [20, 22, 73], recursive queries [22,
2062	   74] and adaptive queries [23, 75]. For these, we defer to the primary
2063	   references.

2065	5.1. Range Queries

2067	   The support of efficient range predicates in P2P networks was
2068	   identified as an important open research issue by Huebsch et al.
2069	   [22]. Range partitioning has been important in parallel databases to
2070	   improve performance, so that a transaction commonly needs data from
2071	   only one disk or node [22]. One type of range search, longest prefix
2072	   match, is important because of its prevalence in routing schemes for
2073	   voice and data networks alike. In other applications, users may pose
2074	   broad, inexact queries, even though they require only a small number
2075	   of responses. Consequently techniques to locate similar ranges are
2076	   also important [77]. Various proposals for range searches over P2P
2077	   networks are summarized in Figure 4. Since the Scalable Distributed
2078	   Data Structure (SDDS) has been an important influence on contemporary
2079	   Distributed Hash Tables (DHTs) [49-51], we also include ongoing work
2080	   on SDDS range searches.

2082	   PEER-TO-PEER (P2P)
2083	   Locality Sensitive Hashing (Chord) [77]
2084	   Prefix Hash Trees (unspecified DHT) [78, 79]
2085	   Space Filling Curves (CAN) [80]
2086	   Space Filling Curves (Chord) [81]
2087	   Quadtrees (Chord) [82]
2088	   Skip Graphs [38, 41, 83, 100]
2089	   Mercury [84]
2090	   P-Grid [85, 86]

2092	   SCALABLE DISTRIBUTED DATA STRUCTURES (SDDS)
2093	   RP*   [87, 88]

2095	       Figure 4 Solutions for Range Queries on P2P and SDDS Indexes.

2097	   The papers on P2P range search can be divided into those that rely on
2098	   an underlying DHT (the first five entries in Figure 4) and those
2099	   that do not (the subsequent three entries). Bharambe, Agrawal et al.
2100	   argued that DHTs are inherently ill-suited to range queries [84]. The
2101	   very feature that makes for their good load balancing properties,
2102	   randomized hash functions, works against range queries. One possible
2103	   solution would be to hash ranges, but this can require a priori
2104	   partitioning. If the partitions are too large, partitions risk
2105	   overload. If they are too small, there may be too many hops.
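
   The conflict between randomized hashing and range queries can be
   seen in a small sketch. It assumes SHA-1 as the hash function and a
   simple modulo placement, neither of which is taken from a specific
   DHT, and the function names are hypothetical. The second function
   shows an order-preserving placement for contrast.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Randomized placement: hash the key, then reduce to a node id."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_for_range_hashed(low: int, high: int, num_nodes: int) -> set:
    """Nodes that must be contacted when every key is hashed separately."""
    return {node_for(str(k), num_nodes) for k in range(low, high + 1)}


def nodes_for_range_ordered(low: int, high: int, num_nodes: int,
                            key_space: int) -> set:
    """Nodes contacted when each node owns one contiguous slice of keys."""
    width = key_space // num_nodes
    return {min(k // width, num_nodes - 1) for k in range(low, high + 1)}
```

   With hashed placement, a query for 20 adjacent keys is scattered
   over many unrelated nodes. With ordered placement the same query
   touches one or two nodes, but nothing then prevents a popular slice
   of the key space from overloading its owner.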

2107	   Despite these potential shortcomings, there have been several range
2108	   query proposals based on DHTs. If hashing ranges to nodes, it is
2109	   entirely possible that overlapping ranges map to different nodes.
2110	   Gupta, Agrawal et al. rely on locality sensitive hashing to ensure
2111	   that, with high probability, similar ranges are mapped to the same
2112	   node [77]. They propose one particular family of locality sensitive
2113	   hash functions, called min-wise independent permutations. The number
2114	   of partitions per node and the path length were plotted against the
2115	   total numbers of peers in the system. For a network with 1000 nodes,
2116	   the hop-count distribution was very similar to that of the exact-
2117	   matching Chord scheme. Was it load-balanced? For the same network
2118	   with 50,000 partitions, there were over two orders of magnitude
2119	   variation in the number of partitions at each node (first and ninety-
2120	   ninth percentiles). The Prefix Hash Tree is a trie in which prefixes
2121	   are hashed onto any DHT. The preliminary analysis suggests efficient
2122	   doubly logarithmic lookup, balanced load and fault resilience [78,
2123	   79]. Andrzejak and Xu were perhaps the first to propose a mapping
2124	   from ranges to DHTs [80]. They use one particular Space Filling
2125	   Curve, the Hilbert curve, over a Content Addressable Network (CAN)
2126	   construction (Section 3.5.3.). They maintain two properties: nearby
2127	   ranges map to nearby CAN zones; if a range is split into two sub-
2128	   ranges, then the zones of the sub-ranges partition the zone of the
2129	   primary range. They plot path length and load proxy measures (the
2130	   total number of messages and nodes visited) for three algorithms to
2131	   propagate range queries: brute force, controlled flooding and
2132	   directed controlled flooding. Schmidt and Parashar also advocated
2133	   Space Filling Curves to achieve range queries over a DHT [81].
2134	   However they point out that, while Andrzejak and Xu use an inverse
2135	   Space Filling Curve to map a one-dimensional space to d-dimensional
2136	   zones, they map a d-dimensional space back to a one-dimensional
2137	   index. Such a construction gives the ability to search across
2138	   multiple attributes (Section 5.2.). Tanin, Harwood et al. suggested
2139	   quadtrees over Chord [82], and gave preliminary simulation results
2140	   for query response times.
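
   The mapping from a d-dimensional space to a one-dimensional index
   can be sketched for d = 2 with the Hilbert curve. This is the
   widely published iterative conversion, not code from [80] or [81].
   It assumes a square grid whose side n is a power of two, and the
   function names are hypothetical.

```python
def rotate(n: int, x: int, y: int, rx: int, ry: int) -> tuple:
    """Rotate or flip a quadrant so the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid to its position on the curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # offset of the quadrant entered
        x, y = rotate(n, x, y, rx, ry)
        s //= 2
    return d
```

   Consecutive index values always fall in neighbouring cells, so a
   contiguous run of the one-dimensional index covers a compact region
   of the two-dimensional attribute space. A two-attribute range query
   can therefore be translated into a small number of index intervals.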

2142	   Because DHTs are naturally constrained to exact-match, single-key
2143	   queries, researchers have considered other P2P indexes for range
2144	   searches. Several were based on Skip Graphs [38, 41] which, unlike
2145	   the DHTs, do not necessitate randomizing hash functions and are
2146	   therefore capable of range searches. Unfortunately, they are not load
2147	   balanced [83]. For example, in SkipNet [48], hashing was added to
2148	   balance the load - the Skip Graph could support range searches or
2149	   load balancing, but not both. One solution for load-balancing relies
2150	   on an increased number of 'virtual' servers [168] but, in their
2151	   search for a system that can both search for ranges and balance
2152	   loads, Bharambe, Agrawal et al. rejected the idea [84]. The virtual
2153	   servers work assumed load imbalance stems from hashing, that is, from
2154	   skewed data insertions and deletions. In some situations, the
2155	   imbalance is triggered by a skewed query load. In such circumstances,
2156	   additional virtual servers can increase the number of routing hops
2157	   and increase the number of pointers that a Skip Graph needs to
2158	   maintain. Ganesan, Bawa et al. devised an alternate method to balance
2159	   load [83]. They proposed two Skip Graphs, one to index the data
2160	   itself and the other to track load at each node in the system. Each
2161	   node is able to determine the load on its neighbours and the most
2162	   (least) loaded nodes in the system. They devise two algorithms:
2163	   NBRADJUST balances load on neighbouring nodes; using REORDER, empty
2164	   nodes can take over some of the tuples on heavily loaded nodes. Their
2165	   simulations focus on skewed storage load, rather than on skewed query
2166	   loads, but they surmise that the same approach could be used for the
2167	   latter.

2169	   Other proposals for range queries avoid both the DHT and the Skip
2170	   Graph. Bharambe, Agrawal et al. distinguish their Mercury design by
2171	   its support for multi-attribute range queries and its explicit load
2172	   balancing [84]. In Mercury, nodes are grouped into routing hubs, each
2173	   of which is responsible for various query attributes. While it does
2174	   not use hashing, Mercury is loosely similar to the DHT approaches:
2175	   nodes within hubs are arranged into rings, like Chord [34]; for
2176	   efficient routing within hubs, k long-distance links are used, like
2177	   Symphony [381]. Range lookups require O(((log n)^2)/k) hops. Random
2178	   sampling is used to estimate the average load on nodes and to find
2179	   the parts of the overlay that are lightly loaded. Whereas Symphony
2180	   assumed that nodes are responsible for ranges of approximately equal
2181	   size, Mercury's random sampling can determine the location of the
2182	   start of the range, even for non-uniform ranges [84]. P-Grid [42]
2183	   does provide for range queries, by virtue of the key ordering in its
2184	   tree structures. Ganesan, Bawa et al. critiqued its capabilities
2185	   [83]: P-Grid assumes fixed-capacity nodes; there was no formal
2186	   characterization of imbalance ratios or balancing costs; every P-Grid
2187	   node periodically contacts other nodes for load information.

2189	   The work on Scalable Distributed Data Structures (SDDSs) has
2190	   progressed in parallel with P2P work and has addressed range queries.
2191	   Like the DHTs above, the early SDDS Linear Hashing (LH*) schemes were
2192	   not order-preserving [52]. To facilitate range queries, Litwin,
2193	   Neimat et al. devised a Range Partitioning variant, RP* [87]. There
2194	   are options to dispense with the index, to add indexes to clients and
2195	   to add them to servers. In the variant without an index, every query
2196	   is issued via multicasting. The other variants also use some
2197	   multicasting. The initial RP* paper suggested scalability to
2198	   thousands of sites, but a more recent RP* simulation was capped at
2199	   140 servers [88]. In that work, Tsangou, Ndiaye et al. investigated
2200	   TCP and UDP mechanisms by which servers could return range query
2201	   results to clients. The primary metrics were search and response
2202	   times. Amongst the commercial parallel database management systems,
2203	   they reported that the largest seems only to scale to 32 servers (SQL
2204	   Server 2000). For future work, they planned to explore aggregation of
2205	   query results, rather than establishing a connection between the
2206	   client and every single server with a response.

2208	   All in all, it seems there are numerous open research questions on
2209	   P2P range queries. How realistic is the maintenance of global load
2210	   statistics considering the scale and dynamism of P2P networks?
2211	   Simulations at larger scales are required. Proposals should take into
2212	   account both the storage load (insert and delete messages) and the
2213	   query load (lookup messages). Simplifying assumptions need to be
2214	   attacked. For example, how well do the above solutions work in
2215	   networks with heterogeneous nodes, where the maximum message loads
2216	   and index sizes are node-dependent?

2218	5.2. Multi-Attribute Queries

2220	   There has been some work on multi-attribute P2P queries. As late as
2221	   September 2003, it was suggested that there has not been an efficient
2222	   solution [76].

2224	   Again, an early significant work on multi-attribute queries over
2225	   aggregated commodity nodes germinated amongst SDDSs. k-RP* [89] uses
2226	   the multi-dimensional binary search tree (or k-d tree where k
2227	   indicates the number of dimensions of the search index) [384]. It
2228	   builds on the RP* work from the previous section and inherits their
2229	   capabilities for range search and partial match. Like the other
2230	   SDDSs, k-RP* indexes can fit into RAM for very fast lookup. For
2231	   future work, Litwin and Neimat suggested a) a formal analysis of the
2232	   range search termination algorithm and the k-d paging algorithm, b) a
2233	   comparison with other multi-attribute data structures (quad-trees and
2234	   R-trees) and c) exploration of query processing, concurrency control
2235	   and transaction management for k-RP* files [89]. On the latter
2236	   point, others have considered transactions to be inconsequential to
2237	   the core problem of supporting more complex queries in P2P networks
2238	   [72].

2240	   In architecting their secure wide-area Service Discovery Service
2241	   (SDS), Hodes, Czerwinski et al. considered three possible designs for
2242	   multi-criteria search - Centralization, Mapping and Flooding [90].
2243	   These correlate to the index classifications of Section 2. - Central,
2244	   Distributed and Local. They discounted the centralized, Napster-like
2245	   index for its risk of a single point of failure. They considered the
2246	   hash-based mappings of Section 3. but concluded that it would not be
2247	   possible to adequately partition data. A document satisfying many
2248	   criteria would be wastefully stored in many partitions. They rejected
2249	   full flooding for its lack of scalability. Instead, they devised a
2250	   query filtering technique, reminiscent of Gnutella's query routing
2251	   protocol (Section 4.1.). Nodes push proactive summaries of their
2252	   data rather than waiting for a query. Summaries are aggregated and
2253	   stored throughout a server hierarchy, to guide subsequent queries.
2254	   Some initial prototype measurements were provided for total load on
2255	   the system, but not for load distribution. They put several issues
2256	   forward for future work. The indexing needs to be flexible to change
2257	   according to query and storage workloads. A mesh topology might
2258	   improve on their hierarchic topology since query misses would not
2259	   propagate to root servers. The choice is analogous to BGP meshes and
2260	   DNS trees.

2262	   More recently, Cai, Frank et al. devised the Multi-Attribute
2263	   Addressable Network (MAAN) [91]. They built on Chord to provide both
2264	   multi-attribute and range queries, claiming to be the first to
2265	   service both query types in a structured P2P system. Each MAAN node
2266	   has O(log N) neighbours, where N is the number of nodes. MAAN multi-
2267	   attribute range queries require O(log N+N*Smin) hops, where Smin is
2268	   the minimum range selectivity across all attributes. Selectivity is
2269	   the ratio of the query range to the entire identifier range. The
2270	   paper assumed that a locality preserving hash function would ensure
2271	   balanced load. Per Section 5.1., the arguments by Bharambe, Agrawal
2272	   et al. have highlighted the shortcomings of this assumption [84].
2273	   MAAN required the schema to be fixed and known in advance -
2274	   adaptable schemas were recommended for subsequent attention. The
2275	   authors also acknowledged that there is a selectivity breakpoint at
2276	   which full flooding becomes more efficient than their scheme. This
2277	   calls for a query resolution algorithm that adapts to the profile of
2278	   queries. Cai and Frank followed up with RDFPeers [55]. They
2279	   differentiate their work from other RDF proposals by a) guaranteeing
2280	   to find query results if they exist and b) removing the requirement
2281	   of prior definition of a fixed schema. They hashed <subject,
2282	   predicate, object> triples onto the MAAN and reported routing hop
2283	   metrics for their implementation. Load imbalance across nodes was
2284	   reduced to less than one order of magnitude, but the specific measure
2285	   was number of triples stored per node - skewed query loads were not
2286	   considered. They plan to improve load balancing with the virtual
2287	   servers of Section 5.1. [168].
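
   As a rough worked example of the MAAN hop bound, the sketch below
   evaluates log N + N*Smin for a hypothetical query. It assumes base-2
   logarithms and treats every constant hidden by the O() notation as
   1, so the result indicates scale only and is not a measurement from
   [91]. The function names and the example attributes are
   illustrative.

```python
import math


def selectivity(query_lo, query_hi, domain_lo, domain_hi):
    """Ratio of the query range to the entire identifier range."""
    return (query_hi - query_lo) / (domain_hi - domain_lo)


def maan_hop_estimate(n_nodes, selectivities):
    """Estimate log2(N) + N * Smin, taking all hidden constants as 1."""
    s_min = min(selectivities)
    return math.log2(n_nodes) + n_nodes * s_min


# Hypothetical query over two attributes on a 1024-node overlay.
cpu = selectivity(2.0, 3.0, 0.0, 10.0)      # 10% of the CPU-speed range
mem = selectivity(0, 512, 0, 4096)          # 12.5% of the memory range
print(maan_hop_estimate(1024, [cpu, mem]))
```

   Under these assumptions the N*Smin term (about 102 hops) dominates
   the log N term (10 hops), which shows why the cost grows quickly as
   the most selective attribute covers more of its range.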

2289	5.3. Join Queries

2291	   Two research teams have done some initial work on P2P join
2292	   operations. Harren, Hellerstein et al. initially described a three-
2293	   layer architecture - storage, DHT and query processing. They
2294	   implemented the join operation by modifying an existing Content
2295	   Addressable Network (CAN) simulator, reporting "significant hot-spots
2296	   in all dimensions: storage, processing and routing" [72]. They
2297	   progressed their design more recently in the context of PIER, a
2298	   distributed query engine based on CAN [22, 385]. They implemented two
2299	   equi-join algorithms. In their design, a key is constructed from the
2300	   "namespace" and the "resource ID". There is a namespace for each
2301	   relation and the resource ID is the primary key for base tuples in
2302	   that relation. Queries are multicast to all nodes in the two
2303	   namespaces (relations) to be joined. Their first algorithm is a DHT
2304	   version of the symmetric hash join. Each node in the two namespaces
2305	   finds the relevant tuples and hashes them to a new query namespace.
2306	   The resource ID in the new namespace is the concatenation of join
2307	   attributes. In the second algorithm, called "fetch matches", one of
2308	   the relations is already hashed on the join attributes. Each node in
2309	   the second namespace finds tuples matching the query and retrieves
2310	   the corresponding tuples from the first relation. They leveraged
2311	   two other techniques, namely the symmetric semi-join rewrite and the
2312	   Bloom filter rewrite, to reduce the high bandwidth overheads of the
2313	   symmetric hash join. For an overlay of 10,000 nodes, they simulated
2314	   the delay to retrieve tuples and the aggregate network bandwidth for
2315	   these four schemes. The initial prototype was on a cluster of 64 PCs,
2316	   but it has more recently been expanded to PlanetLab.
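
   The DHT version of the symmetric hash join can be sketched as
   follows. This is a single-process illustration, not PIER code: the
   node_for hash, the "query" namespace name, the "|" separator used
   to concatenate join attributes and the example relations are all
   assumptions.

```python
import hashlib
from collections import defaultdict


def node_for(namespace, resource_id, n_nodes):
    """Map a (namespace, resource ID) key onto one of n_nodes."""
    key = f"{namespace}/{resource_id}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_nodes


def dht_symmetric_hash_join(arrivals, r_attrs, s_attrs, n_nodes=8):
    """Join tuples arriving in any order; arrivals is [(side, tuple_dict)]."""
    # Per-node state in the query namespace: one hash table per relation.
    state = defaultdict(lambda: {"R": defaultdict(list),
                                 "S": defaultdict(list)})
    results = []
    for side, tup in arrivals:
        attrs = r_attrs if side == "R" else s_attrs
        # Resource ID in the query namespace: concatenated join attributes.
        rid = "|".join(str(tup[a]) for a in attrs)
        tables = state[node_for("query", rid, n_nodes)]
        tables[side][rid].append(tup)           # build own table
        other = "S" if side == "R" else "R"
        for match in tables[other][rid]:        # probe the other table
            results.append((tup, match) if side == "R" else (match, tup))
    return results


employees = [{"emp": "ann", "dept": 1}, {"emp": "bob", "dept": 2}]
depts = [{"dept": 1, "name": "ops"}, {"dept": 3, "name": "lab"}]
arrivals = [("R", employees[0]), ("S", depts[0]),
            ("S", depts[1]), ("R", employees[1])]
print(dht_symmetric_hash_join(arrivals, ["dept"], ["dept"]))
```

   Because each arriving tuple is both inserted into its own table and
   probed against the other, matches are produced regardless of the
   order in which the two relations are rehashed.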

2318	   Triantafillou and Pitoura considered multicasting to large numbers of
2319	   peers to be inefficient [76]. They therefore allocated a limited
2320	   number of special peers, called range guards. The domain of the join
2321	   attributes was divided, one partition per range guard. Join queries
2322	   were sent only to range guards, where the query was executed.
2323	   Efficient selection of range guards and a quantitative evaluation of
2324	   their proposal were left for future work.
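
   A minimal sketch of the range guard idea follows. Since the
   selection of range guards was left for future work, it assumes
   equal-width partitions of a numeric join attribute domain, and it
   runs every guard in one process. The function names and example
   tuples are hypothetical.

```python
from collections import defaultdict


def guard_for(value, lo, hi, n_guards):
    """Index of the range guard whose partition contains value."""
    if not lo <= value <= hi:
        raise ValueError("value outside the join attribute domain")
    width = (hi - lo) / n_guards
    return min(int((value - lo) / width), n_guards - 1)


def join_at_guards(r_tuples, s_tuples, attr, lo, hi, n_guards):
    """Route tuples to guards by join value, then join at each guard."""
    guards = defaultdict(lambda: ([], []))
    for tup in r_tuples:
        guards[guard_for(tup[attr], lo, hi, n_guards)][0].append(tup)
    for tup in s_tuples:
        guards[guard_for(tup[attr], lo, hi, n_guards)][1].append(tup)
    results = []
    for index in sorted(guards):               # each guard works locally
        r_part, s_part = guards[index]
        results.extend((r, s) for r in r_part for s in s_part
                       if r[attr] == s[attr])
    return results


orders = [{"price": 10, "order": "o1"}, {"price": 80, "order": "o2"}]
offers = [{"price": 80, "offer": "f1"}, {"price": 55, "offer": "f2"}]
print(join_at_guards(orders, offers, "price", 0, 100, 4))
```

   Tuples with equal join values always land on the same guard, so no
   guard needs to contact another to complete an equi-join.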

2326	5.4. Aggregation Queries

2328	   Aggregation queries invariably rely on tree structures to combine
2329	   results from a large number of nodes. Examples of aggregation queries
2330	   are Count, Sum, Maximum, Minimum, Average, Median and Top-K [92, 386,
2331	   387]. Figure 5 summarizes the tree and query characteristics that
2332	   affect dependability.

2334	   Tree type: Doesn't use DHT [92], use internal DHT trees [95], use
2335	      independent trees on top of DHTs
2336	   Tree repair: Periodic [93], exceptional [32]
2337	   Tree count: One per key, one per overlay [56]
2338	   Tree flexibility: Static [92], dynamic

2340	   Query interface: install, update, probe [98]
2341	   Query distribution: multicast [98], gossip [92]
2342	   Query applications: leader election, voting, resource location,
2343	      object placement and error recovery [98, 388]
2344	   Query semantics
2345	      Consistency: Best-effort, eventual [92], snapshot / interval /
2346	         single-site validity [99]
2347	      Timeliness [388]
2348	      Lifetime: Continuous [97, 99], single-shot
2349	      No. attributes: Single, multiple
2350	   Query types: Count, sum, maximum, minimum, average, median, top k
2351	      [92, 386, 387]

2353	          Figure 5 Aggregation Trees and Queries in P2P Networks.

2355	   Key: Astrolabe [92]; Cone [93]; Distributed Approximative System
2356	   Information Service (DASIS) [95]; Scalable Distributed Information
2357	   Management System (SDIMS) [98]; Self-Organized Metadata Overlay
2358	   (SOMO) [56]; Wildfire [99]; Willow [32]; Newscast [97]

2360	   The fundamental design choices for aggregation trees relate to how
2361	   the overlay uses DHTs, how it repairs itself when there are failures,
2362	   how many aggregation trees there are, and whether the tree is static
2363	   or dynamic (Figure 5). Astrolabe is one of the most influential P2P
2364	   designs included in Figure 5, yet it makes no use of DHTs [92]. Other
2365	   designs make use of the internal trees of Plaxton-like DHTs. Others
2366	   build independent tree structures on top of DHTs. Most of the designs
2367	   repair the aggregation tree with periodic mechanisms similar to those
2368	   used in the DHTs themselves. Willow is an exception [32]. It uses a
2369	   Tree Maintenance Protocol to "zip" disjoint aggregation trees
2370	   together when there are major failures. Yalagandula and Dahlin found
2371	   reconfigurations at the aggregation layer to be costly, suggesting
2372	   more research on techniques to reduce the cost and frequency of such
2373	   reconfigurations [98]. Many of the designs use multiple aggregation
2374	   trees, each rooted at the DHT node responsible for the aggregation
2375	   attribute. On the other hand, the Self-Organized Metadata Overlay
2376	   [56] uses a single tree and is vulnerable to a single point of
2377	   failure at its root.
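
   The way an aggregation tree combines results can be sketched as
   below. Each node merges its own value with the partial aggregates
   of its children, so only one small record travels up each edge.
   The Partial and TreeNode types are illustrative and do not
   reproduce the interface of any system in Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Partial:
    """Partial aggregate that can be merged in any order."""
    count: int
    total: float
    lo: float
    hi: float

    @staticmethod
    def of(value):
        return Partial(1, value, value, value)

    def merge(self, other):
        return Partial(self.count + other.count, self.total + other.total,
                       min(self.lo, other.lo), max(self.hi, other.hi))

    @property
    def average(self):
        return self.total / self.count


@dataclass
class TreeNode:
    value: float
    children: list = field(default_factory=list)

    def aggregate(self):
        """Combine this node's value with its children's partials."""
        acc = Partial.of(self.value)
        for child in self.children:
            acc = acc.merge(child.aggregate())
        return acc


root = TreeNode(4.0, [TreeNode(7.0, [TreeNode(1.0)]), TreeNode(10.0)])
summary = root.aggregate()
print(summary.count, summary.total, summary.lo, summary.hi, summary.average)
```

   Count, sum, minimum, maximum and average merge this way with
   constant-size state. Median and top-k generally need larger or
   approximate partial results, which this sketch does not cover.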

2379	   At the time of writing, researchers have just begun exploring the
2380	   performance of queries in the presence of churn. Most designs are for
2381	   best-effort queries. Bawa et al. devised a better consistency model,
2382	   called Single-Site Validity [99], to qualify the accuracy of results
2383	   when there is churn. Its price was a five-fold increase in the
2384	   message load, when compared to an efficient but best-effort Spanning
2385	   Tree. Gossip mechanisms are resilient to churn, but they delay
2386	   aggregation results and incur high message cost for aggregation
2387	   attributes with small read-to-write ratios.
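
   The trade-off for gossip can be illustrated with a pairwise
   averaging sketch. It is a generic push-pull scheme run in a single
   process under assumed parameters (16 nodes, a fixed random seed)
   and is not the specific protocol of any system cited above.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Push-pull averaging: paired nodes replace both values by their mean."""
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1                      # pick a peer other than i
            state[i] = state[j] = (state[i] + state[j]) / 2
    return state


loads = [float(i) for i in range(16)]
after = gossip_average(loads, rounds=30)
print(max(loads) - min(loads), max(after) - min(after))
```

   Each exchange preserves the sum of the two values, so the estimates
   drift toward the global average without any tree to repair.
   The cost is that every node sends a message in every round and
   that a usable answer only emerges after several rounds.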

2389	6. Security Considerations

2391	   An initial list of references to research on P2P security is given in
2392	   Figure 1, Section 1. This document addresses P2P search. P2P storage,
2393	   security and applications are recommended for further investigation
2394	   in Section 8.

2396	7. IANA Considerations

2398	   This document has no actions for IANA.

2400	8. Conclusions

2402	   Research on peer-to-peer networks can be divided into four categories
2403	   - search, storage, security and applications. This critical survey
2404	   has focused on search methods. While P2P networks have been
2405	   classified by the existence of an index (structured or unstructured)
2406	   or the location of the index (local, centralized and distributed),
2407	   this survey has shown that most have evolved to have some structure,
2408	   whether it is indexes at superpeers or indexes defined by DHT
2409	   algorithms. As for location, the distributed index is most common.
2410	   The survey has characterized indexes as semantic and semantic-free.
2411	   It has also critiqued P2P work on major query types. While much of it
2412	   addresses work from 2000 or later, we have traced important building
2413	   blocks from the 1990s.

2415	   The initial motivation in this survey was to answer the question,
2416	   "How robust are P2P search networks?" The question is key to the
2417	   deployment of P2P technology. Balakrishnan, Kaashoek et al. argued
2418	   that the P2P architecture is appealing: the startup and growth
2419	   barriers are low; they can aggregate enormous storage and processing
2420	   resources; "the decentralized and distributed nature of P2P systems
2421	   gives them the potential to be robust to faults or intentional
2422	   attacks" [18]. If P2P is to be a disruptive technology in
2423	   applications other than casual file sharing, then robustness needs to
2424	   be practically verified [20].

2426	   The best comparative research on P2P dependability has been done in
2427	   the context of Distributed Hash Tables (DHTs) [291]. The entire body
2428	   of DHT research can be distilled to four main observations about
2429	   dependability (Section 3.2.). Firstly, static dependability
2430	   comparisons show that no O(log N) DHT geometry is significantly more
2431	   dependable than the other O(log N) geometries.  Secondly, dynamic
2432	   dependability comparisons show that DHT dependability is sensitive to
2433	   the underlying topology maintenance algorithms (Figure 2). Thirdly,
2434	   most DHTs use O(log N) geometries to suit ephemeral nodes, whereas
2435	   the O(1) hop DHTs suit stable nodes - the latter deserve more
2436	   research attention. Fourthly, although not yet a mature science, the
2437	   study of DHT dependability is helped by recent simulation tools that
2438	   support multiple DHTs [299].

2440	   We make the following four suggestions for future P2P research:

2442	   1) Complete the companion P2P surveys for storage, security and
2443	   applications. A rough outline has been suggested in Figure 1, along
2444	   with references. The need for such surveys was highlighted within the
2445	   peer-to-peer research group of the Internet Research Task Force
2446	   (IRTF) [17].

2448	   2) P2P indexes are maturing. P2P queries are embryonic. Work on more
2449	   expressive queries over P2P indexes started to gain momentum in 2003,
2450	   but remains fraught with efficiency and load issues.

2452	   3) Isolate the low-level mechanisms affecting robustness. There is
2453	   limited value in comparing robustness of DHT geometries (like rings
2454	   versus de Bruijn graphs), when robustness is highly sensitive to
2455	   underlying topology maintenance algorithms (Figure 2).

2457	   4) Build consensus on robustness metrics and their acceptable ranges.
2458	   This paper has teased out numerous measures that impinge on
2459	   robustness, for example, the median query path length for a failure
2460	   of x% of nodes, bisection width, path overlap, the number of
2461	   alternatives available for the next hop, lookup latency, average live
2462	   bandwidth (bytes/node/sec), successful routing rates, the number of
2463	   timeouts (caused by a finger pointing to a departed node), lookup
2464	   failure rates (caused by nodes that temporarily point to the wrong
2465	   successor during churn) and clustering measures (edge expansion and
2466	   node expansion). Application-level robustness metrics need to drive a
2467	   consistent assessment of the underlying search mechanics.

2469	9. Acknowledgments

2471	   This document was adapted from a paper in Elsevier's Computer
2472	   Networks:

2474	      J.Risson & T.Moors, Survey of Research towards Robust Peer-to-Peer
2475	      Networks: Search Methods, Computer Networks 51(7)2007.

2477	   We thank Bill Yeager, Ali Ghodsi and several anonymous reviewers for
2478	   thorough comments that significantly improved the quality of earlier
2479	   drafts.

2481	10. References

2483	10.1. Normative References

2485	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2486	             Requirement Levels", BCP 14, RFC 2119, March 1997.

2488	10.2. Informative References

2490	[1]   M. Roussopoulos, M. Baker, D. Rosenthal, T. Giuli, P. Maniatis,
2491	      and J. Mogul, 2 P2P or Not 2 P2P?, The 3rd Int'l Workshop on
2492	      Peer-to-Peer Systems, February 26-27 2004.
2493	[2]   A. Rowstron and P. Druschel, Pastry:  Scalable, distributed
2494	      object location and routing for large-scale peer-to-peer systems,
2495	      IFIP/ACM Middleware 2001, Nov 2001.
2496	[3]   B. Yeager and B. Bhattacharjee, Peer-to-Peer Research Group
2497	      Charter, http://www.irtf.org/charters/p2prg.html (2003)
2498	[4]   T. Klingberg and R. Manfredi, Gnutella 0.6, (2002)
2499	[5]   I. Clarke, A Distributed Decentralised Information Storage and
2500	      Retrieval System, Undergraduate Thesis, 1999.
2501	[6]   B. Zhao, J. Kubiatowicz, and A. Joseph, Tapestry:  an
2502	      infrastructure for fault-tolerant wide-area location and routing,
2503	      Report No. UCB/CSD-01-1141 2001.
2504	[7]   I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, F.
2505	      Dabek, and H. Balakrishnan, Chord:  A scalable peer-to-peer
2506	      lookup service for internet applications, Proc.  ACM SIGCOMM 2001
2507	      2001, pp. 149-160.
2508	[8]   S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, A
2509	      scalable content-addressable network, Proc. of the conf. on
2510	      Applications, technologies, architectures and protocols for
2511	      computer communications, August 27-31 2001, pp. 161-172.
2512	[9]   C. Tang, Z. Xu, and M. Mahalingam, pSearch: information retrieval
2513	      in structured overlays, First Workshop on Hot Topics in Networks.
2514	      Also Computer Communication Review, Volume 33, Number 1, January
2515	      2003, Oct 28-29 2002.
2516	[10]  W. Nejdl, S. Decker, and W. Siberski, Edutella Project, RDF-based
2517	      Metadata Infrastructure for P2P Applications,
2518	      http://edutella.jxta.org/ (2003)
2519	[11]  K. Aberer and M. Hauswirth, Peer-to-peer information systems:
2520	      concepts and models, state-of-the-art, and future systems, ACM
2521	      SIGSOFT Software Engineering Notes, Proc. 8th European software
2522	      engineering conference held jointly with 9th ACM SIGSOFT
2523	      international symposium on foundations of software engineering 26
2524	      (5) (2001)
2525	[12]  L. Zhou and R. van Renesse, P6P: a peer-to-peer approach to
2526	      internet infrastructure, The 3rd Int'l Workshop on Peer-to-Peer
2527	      Systems, February 26-27 2004.

2529	[13]  Citeseer, Citeseer Scientific Literature Digital Library,
2530	      http://citeseer.ist.psu.edu/ (2004)
2531	[14]  D. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne,
2532	      B. Richard, S. Rollins, and Z. Xu, Peer-to-Peer Computing, HP
2533	      Technical Report, HPL-2002-57 2002.
2534	[15]  K. Aberer and M. Hauswirth, An overview on peer-to-peer
2535	      information systems, Workshop on Distributed Data and Structures
2536	      WDAS-2002 2002.
2537	[16]  F. DePaoli and L. Mariani, Dependability in Peer-to-Peer Systems,
2538	      IEEE Internet Computing 8 (4) (2004) 54-61.
2539	[17]  B. Yeager, Proposed research tracks, Email to the Internet
2540	      Research Task Force IRTF P2P Research Group, Nov 10 2003.
2541	[18]  H. Balakrishnan, M. F. Kaashoek, D. Karger, R. Morris, and I.
2542	      Stoica, Looking up data in P2P systems, Communications of the ACM
2543	      46 (2) (2003) 43-48.
2544	[19]  D. Kossmann, The state of the art in distributed query
2545	      processing, ACM Computing Surveys 32 (4) (2000) 422-469.
2546	[20]  B. Gedik and L. Liu, Reliable peer-to-peer information monitoring
2547	      through replication, Proc. 22nd Int'l Symp. on Reliable
2548	      Distributed Systems, 6-8 Oct 2003, pp. 56-65.
2549	[21]  S.-M. Shi, Y. Guangwen, D. Wang, J. Yu, S. Qu, and M. Chen,
2550	      Making peer-to-peer keyword searching feasible using multi-level
2551	      partitioning, The 3rd Int'l Workshop on Peer-to-Peer Systems,
2552	      February 26-27 2004.
2553	[22]  R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker,
2554	      and I. Stoica, Querying the Internet with PIER, Proc. 29th Int'l
2555	      Conf. on Very Large Databases VLDB'03, September 2003.
2556	[23]  J. M. Hellerstein, Toward network data independence, ACM SIGMOD
2557	      Record 32 (3) (2003) 34-40.
2558	[24]  K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, and
2559	      I. Stoica, The impact of DHT routing geometry on resilience and
2560	      proximity, Proc. 2003 conference on Applications, Technologies,
2561	      Architectures and Protocols for Computer Communications 2003, pp.
2562	      381-394.
2563	[25]  N. Daswani, H. Garcia-Molina, and B. Yang, Open Problems in Data-
2564	      sharing Peer-to-peer Systems, The 9th Int'l Conf. on Database
2565	      Theory (ICDT 2003), Siena, Italy, 8-10 January (2003)
2566	[26]  B. Cooper and H. Garcia-Molina, Studying search networks with
2567	      SIL, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03, 20-
2568	      21 February 2003.
2569	[27]  M. Bawa, Q. Sun, P. Vinograd, B. Yang, B. Cooper, A. Crespo, N.
2570	      Daswani, P. Ganesan, H. Garcia-Molina, S. Kamvar, S. Marti, and
2571	      M. Schlosser, Peer-to-peer research at Stanford, ACM SIGMOD
2572	      Record 32 (3) (2003) 23-28.
2573	[28]  B. Yang and H. Garcia-Molina, Improving search in peer-to-peer
2574	      networks, Proc. 22nd IEEE Int'l Conf. on Distributed Computing
2575	      Systems, July 2002.

2577	[29]  B. Yang and H. Garcia-Molina, Efficient search in peer-to-peer
2578	      networks, Proc. 22nd Int'l Conf. on Distributed Computing
2579	      Systems, July 2-5 2002.
2580	[30]  C. Plaxton, R. Rajaraman, and A. Richa, Accessing nearby copies
2581	      of replicated objects in a distributed environment, ACM Symp. on
2582	      Parallel Algorithms and Architectures (1997)
2583	[31]  B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J.
2584	      Kubiatowicz, Tapestry: A Resilient Global-Scale overlay for
2585	      Service Deployment, IEEE Journal on Selected Areas in
2586	      Communications 22 (1) (2004) 41-53.
2587	[32]  R. van Renesse and A. Bozdog, Willow: DHT, aggregation and
2588	      publish/subscribe in one protocol, The 3rd Int'l Workshop on
2589	      Peer-to-Peer Systems, February 26-27 2004.
2590	[33]  P. Ganesan, G. Krishna, and H. Garcia-Molina, Canon in G Major:
2591	      Designing DHTs with Hierarchical Structure, Proc. Int'l Conf. on
2592	      Distributed Computing Systems ICDCS 2004 2004.
2593	[34]  I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, F.
2594	      Dabek, and H. Balakrishnan, Chord:  a scalable peer-to-peer
2595	      lookup protocol for Internet applications, IEEE/ACM Trans. on
2596	      Networking 11 (1) (2003) 17-32.
2597	[35]  S. Rhea, T. Roscoe, and J. Kubiatowicz, Structured Peer-to-Peer
2598	      Overlays Need Application-Driven Benchmarks, Proc. 2nd Int'l
2599	      Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003.
2600	[36]  D. Loguinov, A. Kumar, and S. Ganesh, Graph-theoretic analysis of
2601	      structured peer-to-peer systems:  routing distances and fault
2602	      resilience, Proc. 2003 conference on Applications, Technologies,
2603	      Architectures and Protocols for Computer Communications, August
2604	      25-29 2003, pp. 395-406.
2605	[37]  F. Kaashoek and D. Karger, Koorde:  A simple degree-optimal hash
2606	      table, Second Int'l Workshop on Peer-to-Peer Systems IPTPS'03,
2607	      20-21 February 2003.
2608	[38]  N. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman,
2609	      SkipNet: A Scalable Overlay Network with Practical Locality
2610	      Properties, Proc. Fourth USENIX Symp. on Internet Technologies
2611	      and Systems USITS'03, March 2003.
2612	[39]  I. Gupta, K. Birman, P. Linga, A. Demers, and R. Van Renesse,
2613	      Kelips:  Building an efficient and stable P2P DHT through
2614	      increased memory and background overhead, Second Int'l Workshop
2615	      on Peer-to-Peer Systems IPTPS 03, Feb 20-21 2003.
2616	[40]  J. Cates, Robust and Efficient Data Management for a Distributed
2617	      Hash Table, Master's Thesis, May 2003.
2618	[41]  J. Aspnes and G. Shah, Skip graphs, Proc. 14th annual ACM-SIAM
2619	      symposium on discrete algorithms (2003) 384-393.
2620	[42]  K. Aberer, P. Cudre-Mauroux, A. Datta, Z. Despotovic, M.
2621	      Hauswirth, M. Punceva, and R. Schmidt, P-Grid:  a self-organizing
2622	      structured P2P system, ACM SIGMOD Record 32 (3) (2003) 29-33.

2624	[43]  B. Zhao, Y. Duan, L. Huang, A. Joseph, and J. Kubiatowicz,
2625	      Brocade: landmark routing on overlay networks, First Int'l
2626	      Workshop on Peer-to-Peer Systems IPTPS'02, March 2002.
2627	[44]  S. Ratnasamy, S. Shenker, and I. Stoica, Routing algorithms for
2628	      DHTs:  some open questions, Proc. First Int'l Workshop on Peer to
2629	      Peer Systems, IPTPS 2002, March 2002.
2630	[45]  P. Maymounkov and D. Mazieres, Kademlia:  A peer-to-peer
2631	      information system based on the XOR metric, Proc. First Int'l
2632	      Workshop on Peer to Peer Systems, IPTPS 2002, March 7-8 2002.
2633	[46]  D. Malkhi, M. Naor, and D. Ratajczak, Viceroy:  a scalable and
2634	      dynamic emulation of the butterfly, Proc. 21st annual symposium
2635	      on principles of distributed computing PODC, July 21-24 2002, pp.
2636	      183-192.
2637	[47]  X. Li and C. Plaxton, On name resolution in peer to peer
2638	      networks, Proc. ACM SIGACT Annual Workshop on Principles of
2639	      Mobile Computing POMC'02 2002, pp. 82-89.
2640	[48]  N. Harvey, J. Dunagan, M. B. Jones, S. Saroiu, M. Theimer, and A.
2641	      Wolman, SkipNet:  A Scalable overlay Network with Practical
2642	      Locality Properties, Microsoft Research Technical Report MSR-TR-
2643	      2002-92 (2002)
2644	[49]  D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levin, and
2645	      D. Lewin, Consistent hashing and random trees:  distributed
2646	      caching protocols for relieving hot spots on the World  Wide Web,
2647	      ACM Symp. on Theory of Computing (1997)
2648	[50]  W. Litwin, M. Neimat, and D. Schneider, LH* - a scalable,
2649	      distributed data structure, ACM Trans. on Database Systems (TODS)
2650	      21 (4) (1996) 480-525.
2651	[51]  R. Devine, Design and Implementation of DDH: A Distributed
2652	      Dynamic Hashing Algorithm, Proc.  4th Int'l Conf. on Foundations
2653	      of Data Organizations and Algorithms 1993.
2654	[52]  W. Litwin, M.-A. Neimat, and D. Schneider, LH* - Linear Hashing
2655	      for Distributed Files, Proc.  ACM Int'l Conf. on Mngt. of Data
2656	      SIGMOD, May 1993, pp. 327-336.
2657	[53]  C. Tempich, S. Staab, and A. Wranik, Remindin': semantic query
2658	      routing in peer-to-peer networks, Proc. 13th conference on World
2659	      Wide Web, New York, NY, USA, May 17-20 (2004) 640-649.
2660	[54]  B. T. Loo, R. Huebsch, I. Stoica, and J. M. Hellerstein, The case
2661	      for a hybrid P2P search infrastructure, The 3rd Int'l Workshop on
2662	      Peer-to-Peer Systems, February 26-27 2004.
2663	[55]  M. Cai and M. Frank, RDFPeers: a scalable distributed RDF
2664	      repository based on a structured peer-to-peer network, Proc. 13th
2665	      conference on World Wide Web, May 17-20 2004, pp. 650-657.
2666	[56]  Z. Zhang, S.-M. Shi, and J. Zhu, SOMO: Self-organized metadata
2667	      overlay for resource management in P2P DHTs, Second Int'l
2668	      Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003.
2669	[57]  B. Yang and H. Garcia-Molina, Designing a super-peer network,
2670	      Proc. 19th Int'l Conf. on Data Engineering ICDE, March 2003.

2672	[58]  I. Tatarinov, P. Mork, Z. Ives, J. Madhavan, A. Halevy, D. Suciu,
2673	      N. Dalvi, X. Dong, Y. Kadiyska, and G. Miklau, The Piazza peer
2674	      data management project, ACM SIGMOD Record 32 (3) (2003) 47-52.
2675	[59]  W. Nejdl, W. Siberski, and M. Sintek, Design Issues and
2676	      Challenges for RDF- and schema-based peer-to-peer systems, ACM
2677	      SIGMOD Record 32 (3) (2003) 41-46.
2678	[60]  S. Joseph and T. Hoshiai, Decentralized Meta-Data Strategies:
2679	      Effective Peer-to-Peer Search, IEICE Trans. Commun. E86-B (6
2680	      June) (2003) 1740-1753.
2681	[61]  Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker,
2682	      Making gnutella-like P2P systems scalable, Proc. 2003 conference
2683	      on Applications, Technologies, Architectures and Protocols for
2684	      Computer Communications, August 25-29 2003, pp. 407-418.
2685	[62]  M. Bawa, G. S. Manku, and P. Raghavan, SETS: search enhanced by
2686	      topic segmentation, Proc. 26th annual international ACM SIGIR
2687	      conference on Research and Development in Information Retrieval
2688	      2003, pp. 306-313.
2689	[63]  H. Sunaga, M. Takemoto, and T. Iwata, Advanced peer to peer
2690	      network platform for various services - SIONet Semantic
2691	      Information Oriented Network, Proc. Second Int'l Conf. on Peer to
2692	      Peer Computing, Sept 5-7 2002, pp. 169-170.
2693	[64]  M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, HyperCuP -
2694	      Hypercubes, Ontologies and P2P Networks, Springer Lecture Notes
2695	      on Computer Science, Agents and Peer-to-Peer Systems Vol. 2530
2696	      (2002)
2697	[65]  M. Ripeanu, A. Iamnitchi, and I. Foster, Mapping the Gnutella
2698	      network, IEEE Internet Computing 6 (1) (2002) 50-57.
2699	[66]  Q. Lv, S. Ratnasamy, and S. Shenker, Can Heterogeneity Make
2700	      Gnutella Scalable?, Proc. 1st Int'l Workshop on Peer-to-Peer
2701	      Systems IPTPS2002, March 7-8 2002.
2702	[67]  Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and
2703	      replication in unstructured peer to peer networks, Proc. 16th
2704	      international conference on supercomputing, June 22-26 2002, pp.
2705	      84-95.
2706	[68]  V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti, XML
2707	      schemas:  integration and translation:  A local search mechanism
2708	      for peer to peer networks, Proc. 11th ACM international
2709	      conference on Information and Knowledge management 2002, pp. 300-
2710	      307.
2711	[69]  O. Babaoglu, H. Meling, and A. Montresor, Anthill:  a framework for
2712	      the development of agent-based peer-to-peer systems, Proc.  IEEE
2713	      Int'l Conf. on Distributed Computer systems 2002, pp. 15-22.
2714	[70]  M. Jovanovic, Modeling large-scale peer-to-peer networks and a
2715	      case study of Gnutella, Master's Thesis 2001.
2716	[71]  I. Clarke, O. Sandberg, B. Wiley, and T. Hong, Freenet:  A
2717	      Distributed Anonymous Information Storage and Retrieval System.
2718	      Springer, New York, USA, 2001.

2720	[72]  J. Harren, J. Hellerstein, R. Huebsch, B. Loo, S. Shenker, and I.
2721	      Stoica, Complex queries in DHT-based peer-to-peer networks, Proc.
2722	      First Int'l Workshop on Peer to Peer Systems IPTPS 2002, March
2723	      2002.
2724	[73]  B. Gedik and L. Liu, PeerCQ: A Decentralized and Self-Configuring
2725	      Peer-to-Peer Information Monitoring System, Proc. 23rd Int'l
2726	      Conf. on Distributed Computing Systems ICDCS2003, May 19-22 2003.
2727	[74]  B. T. Loo, R. Huebsch, J. M. Hellerstein, T. Roscoe, and I.
2728	      Stoica, Analyzing P2P Overlays with Recursive Queries, Technical
2729	      Report, CSD-04-1301, January 14 2004.
2730	[75]  R. Avnur and J. Hellerstein, Eddies: continuously adaptive query
2731	      processing, Proc. 2000 ACM SIGMOD international conference on
2732	      Management of Data 2000, pp. 261-272.
2733	[76]  P. Triantafillou and T. Pitoura, Towards a unifying framework for
2734	      complex query processing over structured peer-to-peer data
2735	      networks, Proc. First Int'l Workshop on Databases, Information
2736	      Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp.
2737	      169-183.
2738	[77]  A. Gupta, D. Agrawal, and A. E. Abbadi, Approximate range
2739	      selection queries in peer-to-peer systems, Proc. First Biennial
2740	      Conf. on Innovative Data Systems Research CIDR 2003 2003.
2741	[78]  S. Ratnasamy, P. Francis, and M. Handley, Range queries in DHTs,
2742	      Technical Report IRB-TR-03-009, July 2003.
2743	[79]  S. Ramabhadran, S. Ratnasamy, J. Hellerstein, and S. Shenker,
2744	      Brief announcement: prefix hash tree, Proc. 23rd Annual ACM
2745	      SIGACT-SIGOPS Symp. on Principles of Distributed Computing, PODC
2746	      2004, July 25-28 2004, pp. 368-368.
2747	[80]  A. Andrzejak and Z. Xu, Scalable, efficient range queries for
2748	      grid information services, Proc. Second IEEE Int'l Conf. on Peer
2749	      to Peer Computing, September 2002.
2750	[81]  C. Schmidt and M. Parashar, Enabling flexible queries with
2751	      guarantees in P2P systems, IEEE Internet Computing 8 (3) (2004)
2752	      19-26.
2753	[82]  E. Tanin, A. Harwood, and H. Samet, Indexing distributed complex
2754	      data for complex queries, Proc. National Conf. on Digital
2755	      Government Research 2004, pp. 81-90.
2756	[83]  P. Ganesan, M. Bawa, and H. Garcia-Molina, Online Balancing of
2757	      Range-Partitioned Data with Applications to Peer-to-Peer Systems,
2758	      Proc. 30th Int'l Conf. on Very Large Data Bases VLDB 2004, 29
2759	      August - 3 September 2004.
2760	[84]  A. Bharambe, M. Agrawal, and S. Seshan, Mercury: Supporting
2761	      Scalable Multi-Attribute Range Queries, SIGCOMM'04, Aug 30-Sept 3
2762	      2004.
2763	[85]  K. Aberer, Scalable Data Access in P2P Systems Using Unbalanced
2764	      Search Trees, Workshop on Distributed Data and Structures WDAS-
2765	      2002 2002.

2767	[86]  K. Aberer, A. Datta, and M. Hauswirth, The Quest for Balancing
2768	      Peer Load in Structured Peer-to-Peer Systems, Technical Report
2769	      IC/2003/32 2003.
2770	[87]  W. Litwin, M.-A. Neimat, and D. Schneider, RP*: a family of
2771	      order-preserving scalable distributed data structures, Proc. 20th
2772	      Int'l Conf. on Very Large Data Bases VLDB'94, September 12-15
2773	      1994.
2774	[88]  M. Tsangou, S. Ndiaye, M. Seck, and W. Litwin, Range queries to
2775	      scalable distributed data structure RP*, Proc. Fifth Workshop on
2776	      Distributed Data and Structures, WDAS 2003, June 2003.
2777	[89]  W. Litwin and M.-A. Neimat, k-RP*s: a scalable distributed data
2778	      structure for high-performance multi-attributed access, Proc.
2779	      Fourth Int'l Conf. on Parallel and Distributed Information
2780	      Systems (1996) 120-131.
2781	[90]  T. Hodes, S. Czerwinski, B. Zhao, A. Joseph, and R. Katz, An
2782	      architecture for secure wide-area service discovery, Wireless
2783	      Networks 8 (2/3) (2002) 213-230.
2784	[91]  M. Cai, M. Frank, J. Chen, and P. Szekely, MAAN: A Multi-
2785	      Attribute Addressable Network for Grid Information Services,
2786	      Proc. Int'l Workshop on Grid Computing, November 2003.
2787	[92]  R. van Renesse, K. P. Birman, and W. Vogels, Astrolabe:  A robust
2788	      and scalable technology for distributed system monitoring,
2789	      management and data mining, ACM Trans. on Computer Systems 21 (2)
2790	      (2003) 164-206.
2791	[93]  R. Bhagwan, G. Varghese, and G. Voelker, Cone: Augmenting DHTs to
2792	      support distributed resource discovery, Technical Report, CS2003-
2793	      0755, July 2003.
2794	[94]  K. Albrecht, R. Arnold, and R. Wattenhofer, Join and Leave in
2795	      Peer-to-Peer Systems: The DASIS Approach, Technical Report 427,
2796	      Department of Computer Science, November 2003.
2797	[95]  K. Albrecht, R. Arnold, and R. Wattenhofer, Aggregating
2798	      information in peer-to-peer systems for improved join and leave,
2799	      Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer Computing, 25-27
2800	      August 2004.
2801	[96]  A. Montresor, M. Jelasity, and O. Babaoglu, Robust aggregation
2802	      protocol for large-scale overlay networks, Technical Report
2803	      UBLCS-2003-16, December 2003.
2804	[97]  M. Jelasity, W. Kowalczyk, and M. van Steen, An Approach to
2805	      Aggregation in Large and Fully Distributed Peer-to-Peer Overlay
2806	      Networks, Proc. 12th Euromicro Conf. on Parallel, Distributed
2807	      and Network based Processing PDP 2004, February 2004.
2808	[98]  P. Yalagandula and M. Dahlin, A scalable distributed information
2809	      management system, SIGCOMM'04, Aug 30-Sept 3 2004.
2810	[99]  M. Bawa, A. Gionis, H. Garcia-Molina, and R. Motwani, The price
2811	      of validity in dynamic networks, Proc. 2004 ACM SIGMOD Int'l
2812	      Conf. on the management of data 2004, pp. 515-526.

2814	[100] J. Aspnes, J. Kirsch, and A. Krishnamurthy, Load Balancing and
2815	      Locality in Range-Queriable Data Structures, Proc. 23rd Annual
2816	      ACM SIGACT-SIGOPS Symp. on Principles of Distributed Computing
2817	      PODC 2004, July 25-28 2004.
2818	[101] G. On, J. Schmitt, and R. Steinmetz, The effectiveness of
2819	      realistic replication strategies on quality of availability for
2820	      peer-to-peer systems, Proc. Third Int'l IEEE Conf. on Peer-to-
2821	      Peer Computing, Sept 1-3 2003, pp. 57-64.
2822	[102] D. Geels and J. Kubiatowicz, Replica management should be a game,
2823	      Proc. SIGOPS European Workshop, September 2003.
2824	[103] E. Cohen and S. Shenker, Replication strategies in unstructured
2825	      peer to peer networks, Proc. 2002 conference on applications,
2826	      technologies, architectures and protocols for computer
2827	      communications 2002, pp. 177-190.
2828	[104] E. Cohen and S. Shenker, P2P and multicast:  replication
2829	      strategies in unstructured peer to peer networks, Proc. 2002
2830	      conference on applications, technologies, architectures and
2831	      protocols for computer communications 2002, pp. 177-190.
2832	[105] H. Weatherspoon and J. Kubiatowicz, Erasure coding vs
2833	      replication:  a quantitative comparison, Proc. First Int'l
2834	      Workshop on Peer to Peer Systems IPTPS'02, March 2002.
2835	[106] D. Lomet, Replicated indexes for distributed data, Proc. Fourth
2836	      Int'l Conf. on Parallel and Distributed Information Systems,
2837	      December 18-20 1996, pp. 108-119.
2838	[107] V. Gopalakrishnan, B. Silaghi, B. Bhattacharjee, and P. Keleher,
2839	      Adaptive Replication in Peer-to-Peer Systems, Proc. 24th Int'l
2840	      Conf. on Distributed Computing Systems ICDCS 2004, March 23-26
2841	      2004.
2842	[108] S.-D. Lin, Q. Lian, M. Chen, and Z. Zhang, A practical
2843	      distributed mutual exclusion protocol in dynamic peer-to-peer
2844	      systems, The 3rd Int'l Workshop on Peer-to-Peer Systems, February
2845	      26-27 2004.
2846	[109] A. Adya, R. Wattenhofer, W. Bolosky, M. Castro, G. Cermak, R.
2847	      Chaiken, J. Douceur, J. Howell, J. Lorch, and M. Theimer,
2848	      Farsite: federated, available and reliable storage for an
2849	      incompletely trusted environment, ACM SIGOPS Operating Systems
2850	      Review, Special issue on Decentralized storage systems (2002) 1-
2851	      14.
2852	[110] A. Rowstron and P. Druschel, Storage management and caching in
2853	      PAST, a large-scale, persistent peer-to-peer storage utility,
2854	      Proceedings ACM SOSP'01, October 2001, pp. 188-201.
2855	[111] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon,
2856	      and J. Kubiatowicz, Maintenance-Free Global Data Storage, IEEE
2857	      Internet Computing 5 (5) (2001) 40-49.

2859	[112] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D.
2860	      Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells,
2861	      and B. Zhao, Oceanstore:  An Architecture for global-scale
2862	      persistent storage, Proc. Ninth Int'l Conf. on Architecture
2863	      Support for Programming Languages and Operating Systems ASPLOS
2864	      2000, November 2000, pp. 190-201.
2865	[113] K. Birman, The Surprising Power of Epidemic Communication,
2866	      Springer-Verlag Heidelberg Lecture Notes in Computer Science
2867	      Volume 2584/2003 (2003) 97-102.
2868	[114] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, Introducing
2869	      reliability in content-based publish-subscribe through epidemic
2870	      algorithms, Proc. 2nd international workshop on Distributed
2871	      event-based systems 2003, pp. 1-8.
2872	[115] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, Epidemic
2873	      Algorithms for Reliable Content-Based Publish-Subscribe:  An
2874	      Evaluation, The 24th Int'l Conf. on Distributed Computing Systems
2875	      (ICDCS-2004), Mar 23-26, Tokyo University of Technology,
2876	      Hachioji, Tokyo, Japan (2004).
2877	[116] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker,
2878	      H. Sturgis, D. Swinehart, and D. Terry, Epidemic algorithms for
2879	      replicated data management, Proc. Sixth ACM Symp. on Principles
2880	      of Distributed Computing 1987, pp. 1-12.
2881	[117] P. Eugster, R. Guerraoui, A. Kermarrec, and L. Massoulie,
2882	      Epidemic information dissemination in distributed systems, IEEE
2883	      Computer 37 (5) (2004) 60-67.
2884	[118] W. Vogels, R. v. Renesse, and K. Birman, The power of epidemics:
2885	      robust communication for large-scale distributed systems, ACM
2886	      SIGCOMM Computer Communication Review 33 (1) (2003) 131-135.
2887	[119] S. Voulgaris and M. van Steen, An epidemic protocol for managing
2888	      routing tables in very large peer to peer networks, Proc. 14th
2889	      IFIP/IEEE Workshop on Distributed Systems: Operations and
2890	      Management, October 2003.
2891	[120] I. Gupta, On the design of distributed protocols from
2892	      differential equations, Proc. 23rd Annual ACM SIGACT-SIGOPS Symp.
2893	      on Principles of Distributed Computing PODC 2004, July 25-28
2894	      2004, pp. 216-225.
2895	[121] I. Gupta, K. Birman, and R. van Renesse, Fighting fire with fire:
2896	      using randomized gossip to combat stochastic scalability limits,
2897	      Cornell University Dept of Computer Science Technical Report,
2898	      March 2001.
2899	[122] K. Birman and I. Gupta, Building Scalable Solutions to
2900	      Distributed Computing Problems using Probabilistic Components,
2901	      Submitted to the Int'l Conf. on Dependable Systems and Networks
2902	      DSN-2004, Dependable Computing and Communications Symp. DCCS,
2903	      June 28-July 1 2004.

2905	[123] A. Ganesh, A.-M. Kermarrec, and L. Massoulie, Peer-to-peer
2906	      membership management for gossip-based protocols, IEEE Trans. on
2907	      Computers 52 (2) (2003) 139-149.
2908	[124] N. Bailey, Epidemic Theory of Infectious Diseases and its
2909	      Applications, Second Edition, Hafner Press, 1975.
2910	[125] P. Eugster, R. Guerraoui, S. Handurukande, P. Kouznetsov, and
2911	      A.-M. Kermarrec, Lightweight probabilistic broadcast, ACM Trans.
2912	      on Computer Systems 21 (4) (2003) 341-374.
2913	[126] H. Weatherspoon and J. Kubiatowicz, Efficient heartbeats and
2914	      repair of softstate in decentralized object location and routing
2915	      systems, Proc. SIGOPS European Workshop, September 2002.
2916	[127] G. Koloniari and E. Pitoura, Content-based Routing of Path
2917	      Queries in Peer-to-Peer Systems, Proc. 9th Int'l Conf. on
2918	      Extending DataBase Technology EDBT, March 14-18 2004.
2919	[128] A. Mohan and V. Kalogeraki, Speculative routing and update
2920	      propagation: a kundali centric approach, IEEE Int'l Conf. on
2921	      Communications ICC'03, May 2002.
2922	[129] G. Koloniari, Y. Petrakis, and E. Pitoura, Content-Based Overlay
2923	      Networks for XML Peers Based on Multi-Level Bloom Filters, Proc.
2924	      First Int'l Workshop on Databases, Information Systems and Peer-
2925	      to-Peer Computing DBISP2P, Sept 7-8 2003, pp. 232-247.
2926	[130] G. Koloniari and E. Pitoura, Bloom-Based Filters for Hierarchical
2927	      Data, Proc. 5th Workshop on Distributed Data and Structures
2928	      (WDAS) (2003).
2929	[131] B. Bloom, Space/time trade-offs in hash coding with allowable
2930	      errors, Communications of the ACM 13 (7) (1970) 422-426.
2931	[132] M. Naor and U. Wieder, A Simple Fault Tolerant Distributed Hash
2932	      Table, Second Int'l Workshop on Peer-to-Peer Systems (IPTPS 03),
2933	      Berkeley, CA, USA, 20-21 February (2003).
2934	[133] P. Maymounkov and D. Mazieres, Rateless codes and big downloads,
2935	      Second Int'l Workshop on Peer-to-Peer Systems, IPTPS'03, February
2936	      20-21 2003.
2937	[134] M. Krohn, M. Freedman, and D. Mazieres, On-the-fly verification
2938	      of rateless erasure codes for efficient content distribution,
2939	      Proc. IEEE Symp. on Security and Privacy, May 2004.
2940	[135] J. Byers, J. Considine, M. Mitzenmacher, and S. Rost, Informed
2941	      content delivery across adaptive overlay networks, Proc. 2002
2942	      conference on applications, technologies, architectures and
2943	      protocols for computer communications 2002, pp. 47-60.
2944	[136] J. Plank, S. Atchley, Y. Ding, and M. Beck, Algorithms for High
2945	      Performance, Wide-Area Distributed File Downloads, Parallel
2946	      Processing Letters 13 (2) (2003) 207-223.
2947	[137] M. Castro, R. Rodrigues, and B. Liskov, BASE:  Using abstraction
2948	      to improve fault tolerance, ACM Trans. on Computer Systems 21 (3)
2949	      (2003) 236-269.
2950	[138] R. Rodrigues, B. Liskov, and L. Shrira, The design of a robust
2951	      peer-to-peer system, 10th ACM SIGOPS European Workshop, Sep 2002.

2953	[139] H. Weatherspoon, T. Moscovitz, and J. Kubiatowicz, Introspective
2954	      failure analysis: avoiding correlated failures in peer-to-peer
2955	      systems, Proc. Int'l Workshop on Reliable Peer-to-Peer
2956	      Distributed Systems, Oct 2002.
2957	[140] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, Vivaldi: A
2958	      Decentralized Network Coordinate System, SIGCOMM'04, Aug 30-Sept
2959	      3 2004.
2960	[141] E.-K. Lua, J. Crowcroft, and M. Pias, Highways: proximity
2961	      clustering for massively scalable peer-to-peer network routing,
2962	      Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer Computing, August
2963	      25-27 2004.
2964	[142] F. Fessant, S. Handurukande, A.-M. Kermarrec, and L. Massoulie,
2965	      Clustering in Peer-to-Peer File Sharing Workloads, The 3rd Int'l
2966	      Workshop on Peer-to-Peer Systems, February 26-27 2004.
2967	[143] T. S. E. Ng and H. Zhang, Predicting internet network distance
2968	      with coordinates-based approaches, IEEE Infocom 2002, The 21st
2969	      Annual Joint Conf. of the IEEE Computer and Communication
2970	      Societies, June 23-27 2002.
2971	[144] K. Hildrum, R. Krauthgamer, and J. Kubiatowicz, Object Location
2972	      in Realistic Networks, Proc. Sixteenth ACM Symp. on Parallel
2973	      Algorithms and Architectures (SPAA 2004), June 2004, pp. 25-35.
2974	[145] P. Keleher, S. Bhattacharjee, and B. Silaghi, Are Virtualized
2975	      Overlay Networks Too Much of a Good Thing?, First Int'l Workshop
2976	      on Peer-to-Peer Systems IPTPS, March 2002.
2977	[146] A. Mislove and P. Druschel, Providing administrative control and
2978	      autonomy in structured peer-to-peer overlays, The 3rd Int'l
2979	      Workshop on Peer-to-Peer Systems, February 26-27 2004.
2980	[147] D. Karger and M. Ruhl, Diminished Chord: A Protocol for
2981	      Heterogeneous SubGroup Formation in Peer-to-Peer Networks, The
2982	      3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004.
2983	[148] B. Awerbuch and C. Scheideler, Consistent, order-preserving data
2984	      management in distributed storage systems, Proc. Sixteenth ACM
2985	      Symp. on Parallel Algorithms and Architectures SPAA 2004, June
2986	      27-30 2004, pp. 44-53.
2987	[149] M. Freedman and D. Mazieres, Sloppy Hashing and Self-Organizing
2988	      Clusters, Proc. 2nd Int'l Workshop on Peer-to-Peer Systems IPTPS
2989	      '03, February 2003.
2990	[150] F. Dabek, J. Li, E. Sit, J. Robertson, F. Kaashoek, and R.
2991	      Morris, Designing a DHT for low latency and high throughput,
2992	      Proc. First Symp. on Networked Systems Design and Implementation
2993	      (NSDI'04), San Francisco, California, March 29-31 (2004) 85-98.
2994	[151] M. Ruhl, Efficient algorithms for new computational models,
2995	      Doctoral Dissertation, September 2003.
2996	[152] K. Sollins, Designing for scale and differentiation, Proc. ACM
2997	      SIGCOMM workshop on Future Directions in network architecture,
2998	      August 25-27 2003.

3000	[153] L. Massoulie, A. Kermarrec, and A. Ganesh, Network awareness and
3001	      failure resilience in self-organizing overlay networks, Proc.
3002	      22nd Int'l Symp. on Reliable Distributed Systems, SRDS'03, Oct 6-
3003	      8 2003, pp. 47-55.
3004	[154] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris,
3005	      Practical, distributed network coordinates, ACM SIGCOMM Computer
3006	      Communication Review 34 (1) (2004) 113-118.
3007	[155] K. Hildrum, J. Kubiatowicz, S. Rao, and B. Zhao, Distributed
3008	      object location in a dynamic network, Proc. 14th annual ACM
3009	      symposium on parallel algorithms and architectures 2002, pp. 41-
3010	      52.
3011	[156] X. Zhang, Q. Zhang, G. Song, and W. Zhu, A Construction of
3012	      Locality-Aware Overlay Network: mOverlay and its Performance,
3013	      IEEE Journal on Selected Areas in Communications 22 (1) (2004)
3014	      18-28.
3015	[157] N. Harvey, M. B. Jones, M. Theimer, and A. Wolman, Efficient
3016	      recovery from organization disconnects in Skipnet, Second Int'l
3017	      Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003.
3018	[158] M. Pias, J. Crowcroft, S. Wilbur, T. Harris, and S. Bhatti,
3019	      Lighthouses for scalable distributed location, Second Int'l
3020	      Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003.
3021	[159] K. Gummadi, S. Saroiu, S. Gribble, and D. King, Estimating
3022	      latency between arbitrary internet end hosts, Proc. SIGCOMM IMW
3023	      2002, November 2002.
3024	[160] Y. Liu, X. Liu, L. Xiao, L. Ni, and X. Zhang, Location-aware
3025	      topology matching in P2P systems, Proc. IEEE Infocom, Mar 7-11
3026	      2004.
3027	[161] G. S. Manku, Balanced binary trees for ID management and load
3028	      balance in distributed hash tables, Proc. 23rd Annual ACM SIGACT-
3029	      SIGOPS Symp. on Principles of Distributed Computing, PODC 2004,
3030	      July 25-28 2004, pp. 197-205.
3031	[162] J. Gao and P. Steenkiste, Design and Evaluation of a Distributed
3032	      Scalable Content Delivery System, IEEE Journal on Selected Areas
3033	      in Communications 22 (1) (2004) 54-66.
3034	[163] X. Wang, Y. Zhang, X. Li, and D. Loguinov, On zone-balancing of
3035	      peer-to-peer networks: analysis of random node join, Proc. joint
3036	      international conference on measurement and modeling of computer
3037	      systems, June 2004.
3038	[164] D. Karger and M. Ruhl, Simple efficient load balancing algorithms
3039	      for peer-to-peer systems, Proc. Sixteenth ACM Symp. on Parallel
3040	      Algorithms and Architectures SPAA 2004, June 27-30 2004.
3041	[165] D. Karger and M. Ruhl, Simple efficient load balancing algorithms
3042	      for peer-to-peer systems, The 3rd Int'l Workshop on Peer-to-Peer
3043	      Systems, February 26-27 2004.

3045	[166] M. Adler, E. Halperin, R. Karp, and V. Vazirani, A stochastic
3046	      process on the hypercube with applications to peer-to-peer
3047	      networks, Proc. 35th ACM symposium on Theory of Computing 2003,
3048	      pp. 575-584.
3049	[167] C. Baquero and N. Lopes, Towards peer to peer content indexing,
3050	      ACM SIGOPS Operating Systems Review 37 (4) (2003) 90-96.
3051	[168] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica,
3052	      Load balancing in structured P2P systems, Proc. 2nd Int'l
3053	      Workshop on Peer-to-Peer Systems, IPTPS'03, February 20-21 2003.
3054	[169] J. Byers, J. Considine, and M. Mitzenmacher, Simple Load
3055	      Balancing for Distributed Hash Tables, Second Int'l Workshop on
3056	      Peer-to-Peer Systems IPTPS 03, 20-21 February 2003.
3057	[170] P. Castro, J. Lee, and A. Misra, CLASH: A Protocol for Internet-
3058	      Scale Utility-Oriented Distributed Computing, Proc. 24th Int'l
3059	      Conf. on Distributed Computing Systems ICDCS 2004, March 23-26
3060	      2004.
3061	[171] A. Stavrou, D. Rubenstein, and S. Sahu, A Lightweight, Robust P2P
3062	      System to Handle Flash Crowds, IEEE Journal on Selected Areas in
3063	      Communications 22 (1) (2004) 6-17.
3064	[172] A. Selcuk, E. Uzun, and M. R. Pariente, A reputation-based trust
3065	      management system for P2P networks, Fourth Int'l Workshop on
3066	      Global and Peer-to-Peer Computing, April 20-21 2004.
3067	[173] T. Papaioannou and G. Stamoulis, Effective use of reputation in
3068	      peer-to-peer environments, Fourth Int'l Workshop on Global and
3069	      Peer-to-Peer Computing, April 20-21 2004.
3070	[174] M. Blaze, J. Feigenbaum, and J. Lacy, Trust and Reputation in P2P
3071	      networks,
3072	      http://www.neurogrid.net/twiki/bin/view/Main/ReputationAndTrust
3073	      (2003).
3074	[175] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and
3075	      F. Violante, A reputation-based approach for choosing reliable
3076	      resources in peer to peer networks, Proc. 9th conference on
3077	      computer and communications security 2002, pp. 207-216.
3078	[176] S. Marti, P. Ganesan, and H. Garcia-Molina, DHT routing using
3079	      social links, The 3rd Int'l Workshop on Peer-to-Peer Systems,
3080	      February 26-27 2004.
3081	[177] G. Caronni and M. Waldvogel, Establishing trust in distributed
3082	      storage providers, Proc. Third Int'l IEEE Conf. on Peer-to-Peer
3083	      Computing, 1-3 Sept 2003, pp. 128-133.
3084	[178] B. Sieka, A. Kshemkalyani, and M. Singhal, On the security of
3085	      polling protocols in peer-to-peer systems, Proc. Fourth IEEE
3086	      Int'l Conf. on Peer-to-Peer Computing, 25-27 August 2004.
3087	[179] M. Feldman, K. Lai, I. Stoica, and J. Chuang, Robust Incentive
3088	      Techniques for Peer-to-Peer Networks, ACM E-Commerce Conf. EC'04,
3089	      May 2004.

3091	[180] K. Anagnostakis and M. Greenwald, Exchange-based Incentive
3092	      Mechanism for Peer-to-Peer File Sharing, Proc. 24th Int'l Conf.
3093	      on Distributed Computing Systems ICDCS 2004, March 23-26 2004.
3094	[181] J. Shneidman and D. Parkes, Rationality and self-Interest in
3095	      peer to peer networks, Second Int'l Workshop on Peer-to-Peer
3096	      Systems IPTPS'03, February 20-21 2003.
3097	[182] C. Buragohain, D. Agrawal, and S. Suri, A game theoretic
3098	      framework for incentives in P2P systems, Proc. Third Int'l IEEE
3099	      Conf. on Peer-to-Peer Computing, 1-3 Sept 2003, pp. 48-56.
3100	[183] W. Josephson, E. Sirer, and F. Schneider, Peer-to-Peer
3101	      Authentication with a Distributed Single Sign-On Service, The 3rd
3102	      Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004.
3103	[184] A. Fiat and J. Saia, Censorship resistant peer to peer content
3104	      addressable networks, Proc. 13th annual ACM-SIAM symposium on
3105	      discrete algorithms 2002, pp. 94-103.
3106	[185] N. Daswani and H. Garcia-Molina, Query-flood DoS attacks in
3107	      gnutella, Proc. 9th ACM Conf. on Computer and Communications
3108	      Security 2002, pp. 181-192.
3109	[186] A. Singh and L. Liu, TrustMe: anonymous management of trust
3110	      relationships in decentralized P2P systems, Proc. Third Int'l
3111	      IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 2003.
3112	[187] A. Serjantov, Anonymizing censorship resistant systems, Proc.
3113	      Second Int'l Conf. on Peer to Peer Computing, March 2002.
3114	[188] S. Hazel and B. Wiley, Achord: A Variant of the Chord Lookup
3115	      Service for Use in Censorship Resistant Peer-to-Peer Publishing
3116	      Systems, Proc. Second Int'l Conf. on Peer to Peer Computing,
3117	      March 2002.
3118	[189] M. Freedman and R. Morris, Tarzan: a peer-to-peer anonymizing
3119	      network layer, Proc. 9th ACM Conf. on Computer and Communications
3120	      Security (2002) 193-206.
3121	[190] M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, Free-
3122	      Riding and Whitewashing in Peer-to-Peer Systems, 3rd Annual
3123	      Workshop on Economics and Information Security WEIS04, May 2004.
3124	[191] L. Ramaswamy and L. Liu, FreeRiding: a new challenge for peer-to-
3125	      peer file sharing systems, Proc. 2003 Hawaii Int'l Conf. on
3126	      System Sciences, P2P Track, HICSS2003, January 6-9 2003.
3127	[192] T.-W. Ngan, D. Wallach, and P. Druschel, Enforcing fair sharing
3128	      of peer-to-peer resources, Second Int'l Workshop on Peer-to-Peer
3129	      Systems, IPTPS'03, 20-21 February 2003.
3130	[193] L. Cox and B. D. Noble, Samsara: honor among thieves in peer-to-
3131	      peer storage, Proc. nineteenth ACM symposium on Operating System
3132	      Principles 2003, pp. 120-132.
3133	[194] M. Surridge and C. Upstill, Grid security: lessons for peer-to-
3134	      peer systems, Proc. Third Int'l IEEE Conf. on Peer-to-Peer
3135	      Computing, Sept 1-3 2003, pp. 2-6.

3137	[195] E. Sit and R. Morris, Security considerations for peer-to-peer
3138	      distributed hash tables, First Int'l Workshop on Peer-to-Peer
3139	      Systems, March 2002.
3140	[196] C. O'Donnell and V. Vaikuntanathan, Information leak in the Chord
3141	      lookup protocol, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer
3142	      Computing, 25-27 August 2004.
3143	[197] K. Berket, A. Essiari, and A. Muratas, PKI-Based Security for
3144	      Peer-to-Peer Information Sharing, Proc. Fourth IEEE Int'l Conf.
3145	      on Peer-to-Peer Computing, 25-27 August 2004.
3146	[198] B. Karp, S. Ratnasamy, S. Rhea, and S. Shenker, Spurring adoption
3147	      of DHTs with OpenHash, a public DHT service, The 3rd Int'l
3148	      Workshop on Peer-to-Peer Systems, February 26-27 2004.
3149	[199] J. Considine, M. Walfish, and D. G. Andersen, A pragmatic
3150	      approach to DHT adoption, Technical Report, December 2003.
3151	[200] G. Li, Peer to Peer Networks in Action, IEEE Internet Computing 6
3152	      (1) (2002) 37-39.
3153	[201] A. Mislove, A. Post, C. Reis, P. Willmann, P. Druschel, D.
3154	      Wallach, X. Bonnaire, P. Sens, J.-M. Busca, and L. Arantes-
3155	      Bezerra, POST:  A Secure, Resilient, Cooperative Messaging
3156	      System, 9th Workshop on Hot Topics in Operating Systems, HotOS,
3157	      May 2003.
3158	[202] S. Saroiu, P. Gummadi, and S. Gribble, A measurement study of
3159	      peer-to-peer file sharing systems, Proc. Multimedia Computing
3160	      and Networking 2002 MMCN'02, January 2002.
3161	[203] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, Ivy: a
3162	      read/write peer-to-peer file system, ACM SIGOPS Operating Systems
3163	      Review, Special issue on Decentralized storage systems, December
3164	      2002, pp. 31-44.
3165	[204] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, A read/write
3166	      peer-to-peer file system, Proc. 5th Symp. on Operating System
3167	      Design and Implementation (OSDI 2002), Boston, MA, December
3168	      (2002).
3169	[205] F. Annexstein, K. Berman, M. Jovanovic, and K. Ponnavaikko,
3170	      Indexing techniques for file sharing in scalable peer to peer
3171	      networks, 11th IEEE Int'l Conf. on Computer Communications and
3172	      Networks (2002) 10-15.
3173	[206] G. Kan and Y. Faybishenko, Introduction to Gnougat, First Int'l
3174	      Conf. on Peer-to-Peer Computing 2001, pp. 4-12.
3175	[207] R. Gold and D. Tidhar, Towards a content-based aggregation
3176	      network, Proc. First Int'l Conf. on Peer to Peer Computing 2001,
3177	      pp. 62-68.
3178	[208] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica,
3179	      Wide-area cooperative storage with CFS, Proc. 18th ACM symposium
3180	      on Operating System Principles 2001, pp. 202-215.

3182	[209] M. Freedman, E. Freudenthal, and D. Mazieres, Democratizing
3183	      content publication with coral, Proc. First Symp. on Networked
3184	      Systems Design and Implementation NSDI'04, March 29-31 2004, pp.
3185	      239-252.
3186	[210] J. Li, B. T. Loo, J. Hellerstein, F. Kaashoek, D. Karger, and R.
3187	      Morris, On the Feasibility of Peer-to-Peer Web Indexing and
3188	      Search, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03,
3189	      20-21 February 2003.
3190	[211] S. Iyer, A. Rowstron, and P. Druschel, Squirrel: a decentralized
3191	      peer-to-peer web cache, Proc. 21st annual symposium on principles
3192	      of distributed computing 2002, pp. 213-222.
3193	[212] M. Bawa, R. Bayardo, S. Rajagopalan, and E. Shekita, Make it
3194	      fresh, make it quick: searching a network of personal webservers,
3195	      Proc. 12th international conference on World Wide Web 2003, pp.
3196	      577-586.
3197	[213] B. T. Loo, S. Krishnamurthy, and O. Cooper, Distributed web
3198	      crawling over DHTs, Technical Report, CSD-04-1305, February 9
3199	      2004.
3200	[214] M. Junginger and Y. Lee, A self-organizing publish/subscribe
3201	      middleware for dynamic peer-to-peer networks, IEEE Network 18 (1)
3202	      (2004) 38-43.
3203	[215] F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen, PlanetP:
3204	      Using Gossiping to Build Content Addressable Peer-to-Peer
3205	      Information Sharing Communities, Proc. 12th international
3206	      symposium on High Performance Distributed Computing (HPDC), June
3207	      2002.
3208	[216] M. Walfish, H. Balakrishnan, and S. Shenker, Untangling the web
3209	      from DNS, Proc. First Symp. on Networked Systems Design and
3210	      Implementation NSDI'04, March 29-31 2004, pp. 225-238.
3211	[217] B. Awerbuch and C. Scheideler, Robust distributed name service,
3212	      The 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27
3213	      2004.
3214	[218] A. Iamnitchi, Resource Discovery in Large Resource-Sharing
3215	      Environments, Doctoral Dissertation 2003.
3216	[219] R. Cox, A. Muthitacharoen, and R. Morris, Serving DNS using a
3217	      Peer-to-Peer Lookup Service, First Int'l Workshop on Peer-to-Peer
3218	      Systems (IPTPS), March 2002.
3219	[220] A. Chander, S. Dawson, P. Lincoln, and D. Stringer-Calvert,
3220	      NEVRLATE:  scalable resource discovery, Second IEEE/ACM Int'l
3221	      Symp. on Cluster Computing and the Grid CCGRID2002 2002, pp. 56-
3222	      65.
3223	[221] M. Balazinska, H. Balakrishnan, and D. Karger, INS/Twine:  A
3224	      scalable Peer-to-Peer architecture for Intentional Resource
3225	      Discovery, Proc. First Int'l Conf. on Pervasive Computing (IEEE)
3226	      (2002).

3228	[222] J. Kangasharju, K. Ross, and D. Turner, Secure and resilient
3229	      peer-to-peer E-mail: design and implementation, Proc. Third Int'l
3230	      IEEE Conf. on Peer-to-Peer Computing, 1-3 Sept 2003.
3231	[223] V. Lo, D. Zappala, D. Zhou, Y. Liu, and S. Zhao, Cluster
3232	      computing on the fly: P2P scheduling of idle cycles in the
3233	      internet, The 3rd Int'l Workshop on Peer-to-Peer Systems,
3234	      February 26-27 2004.
3235	[224] A. Iamnitchi, I. Foster, and D. Nurmi, A peer-to-peer approach to
3236	      resource discovery in grid environments, IEEE High Performance
3237	      Distributed Computing 2002.
3238	[225] I. Foster and A. Iamnitchi, On Death, Taxes and the Convergence
3239	      of Peer-to-Peer and Grid Computing, Second Int'l Workshop on
3240	      Peer-to-Peer Systems IPTPS 03, 20-21 February 2003.
3241	[226] W. Hoschek, Peer-to-Peer Grid Databases for Web Service
3242	      Discovery, Concurrency - Practice and Experience (2002) 1-7.
3243	[227] K. Aberer, A. Datta, and M. Hauswirth, A decentralized public key
3244	      infrastructure for customer-to-customer e-commerce, Int'l Journal
3245	      of Business Process Integration and Management (2004).
3246	[228] S. Ajmani, D. Clarke, C.-H. Moh, and S. Richman, ConChord:
3247	      Cooperative SDSI Certificate Storage and Name Resolution, First
3248	      Int'l Workshop on Peer-to-Peer Systems IPTPS, March 2002.
3249	[229] E. Sit, F. Dabek, and J. Robertson, UsenetDHT: a low overhead
3250	      Usenet server, The 3rd Int'l Workshop on Peer-to-Peer Systems,
3251	      February 26-27 2004.
3252	[230] H.-Y. Hsieh and R. Sivakumar, On transport layer support for
3253	      peer-to-peer networks, The 3rd Int'l Workshop on Peer-to-Peer
3254	      Systems, February 26-27 2004.
3255	[231] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana,
3256	      Internet indirection infrastructure, Proc. 2002 conference on
3257	      applications, technologies, architectures and protocols for
3258	      computer communications, August 19-23 2002, pp. 73-86.
3259	[232] E. Halepovic and R. Deters, Building a P2P forum system with
3260	      JXTA, Proc. Second IEEE Int'l Conf. on Peer to Peer Computing
3261	      P2P'02, September 5-7 2002.
3262	[233] M. Wawrzoniak, L. Peterson, and T. Roscoe, Sophia: an Information
3263	      Plane for networked systems, ACM SIGCOMM Computer Communication
3264	      Review 34 (1) (2004) 15-20.
3265	[234] D. Tran, K. Hua, and T. Do, A Peer-to-Peer Architecture for Media
3266	      Streaming, IEEE Journal on Selected Areas in Communications 22
3267	      (1) (2004) 121-133.
3268	[235] V. Padmanabhan, H. Wang, and P. Chou, Supporting heterogeneity
3269	      and congestion control in peer-to-peer multicast streaming, The
3270	      3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004.
3271	[236] A. Nicolosi and D. Mazieres, Secure acknowledgment of multicast
3272	      messages in open peer-to-peer networks, The 3rd Int'l Workshop on
3273	      Peer-to-Peer Systems, February 26-27 2004.

3275	[237] R. Zhang and C. Hu, Borg: a hybrid protocol for scalable
3276	      application-level multicast in peer-to-peer networks, Proc. 13th
3277	      international workshop on network and operating systems for
3278	      digital audio and video 2003, pp. 172-179.
3279	[238] M. Sasabe, N. Wakamiya, M. Murata, and H. Miyahara, Scalable and
3280	      continuous media streaming on peer-to-peer networks, Proc. Third
3281	      Int'l IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 2003, pp.
3282	      92-99.
3283	[239] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, PROMISE:
3284	      peer-to-peer media streaming using CollectCast, Proc. eleventh
3285	      ACM international conference on multimedia 2003, pp. 45-54.
3286	[240] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron,
3287	      and A. Singh, SplitStream:  high-bandwidth multicast in
3288	      cooperative environments, Proc. 19th ACM symposium on operating
3289	      systems principles 2003, pp. 298-313.
3290	[241] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron, SCRIBE:
3291	      a large-scale and decentralized application-level multicast
3292	      infrastructure, IEEE Journal on Selected Areas in Communications
3293	      20 (8) (2002)
3294	[242] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz,
3295	      Bayeux: an architecture for scalable and fault-tolerant wide-area
3296	      data dissemination, Proc. 11th ACM international workshop on
3297	      network and operating systems support for digital audio and
3298	      video, Jan 2001.
3299	[243] R. Lienhart, M. Holliman, Y.-K. Chen, I. Kozintsev, and M. Yeung,
3300	      Improving media services on P2P networks, IEEE Internet Computing
3301	      6 (1) (2002) 58-67.
3302	[244] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L.
3303	      Yin, and F. Yu, Data Centric Storage in Sensornets with GHT, a
3304	      geographic hash table, Mobile Networks and Applications 8 (4)
3305	      (2003) 427-442.
3306	[245] M. Demirbas and H. Ferhatosmanoglu, Peer-to-peer spatial queries
3307	      in sensor networks, Proc. Third Int'l IEEE Conf. on Peer-to-Peer
3308	      Computing, 1-3 Sept 2003, pp. 32-39.
3309	[246] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and
3310	      S. Shenker, GHT:  a geographic hash table for data-centric
3311	      storage, Proc. First ACM Int'l Workshop on Wireless Sensor
3312	      Networks and Applications (Mobicom) 2002, pp. 78-87.
3313	[247] J. Hellerstein and W. Wang, Optimization of In-Network Data
3314	      Reduction, Proc. First Workshop on Data Management for Sensor
3315	      Networks DMSN 2004, August 30th 2004.
3316	[248] J. Li, J. Stribling, T. Gil, R. Morris, and F. Kaashoek,
3317	      Comparing the performance of distributed hash tables under churn,
3318	      The 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27
3319	      2004.

3321	[249] S. Shenker, The data-centric revolution in networking, Keynote
3322	      Speech, 29th Int'l Conf. on Very Large Data Bases, September 9-12
3323	      2003.
3324	[250] S. Gribble, A. Halevy, Z. Ives, M. Rodrig, and D. Suciu, What can
3325	      databases do for P2P?, Proc.  Fourth Int'l Workshop on Databases
3326	      and the Web, WebDB2001, May 24-25 2001.
3327	[251] D. Clark, The design philosophy of the DARPA internet protocols,
3328	      ACM SIGCOMM Computer Communication Review, Symp. proceedings on
3329	      communications architectures and protocols 18 (4) (1988)
3330	[252] J.-C. Laprie, Dependable Computing and Fault Tolerance:  Concepts
3331	      and Terminology, Twenty-Fifth Int'l Symp. on Fault-Tolerant
3332	      Computing, Highlights from Twenty-Five Years 1995, pp. 2-13.
3333	[253] D. Clark, J. Wroclawski, K. Sollins, and R. Braden, Tussle in
3334	      cyberspace:  defining tomorrow's internet, Conf. on Applications,
3335	      Technologies, Architectures and Protocols for Computer
3336	      Communications 2002, pp. 347-356.
3337	[254] L. O. Alima, A. Ghodsi, and S. Haridi, "A framework for
3338	      structured peer-to-peer overlay networks," in Global computing,
3339	      vol. 3267, Lecture Notes in Computer Science: Springer Berlin /
3340	      Heidelberg, 2005, pp. 223-249.
3341	[255] Clip2, The Gnutella Protocol Specification, http://www.clip2.com
3342	      (2000)
3343	[256] Napster, http://www.napster.com (1999)
3344	[257] J. Mischke and B. Stiller, A methodology for the design of
3345	      distributed search in P2P middleware, IEEE Network 18 (1) (2004)
3346	      30-37.
3347	[258] J. Li and K. Sollins, Implementing aggregation and broadcast over
3348	      distributed hash tables. Full report,
3349	      http://krs.lcs.mit.edu/regions/docs.html (November) (2003)
3350	[259] M. Castro, M. Costa, and A. Rowstron, Should we build Gnutella on
3351	      a structured overlay?, ACM SIGCOMM Computer Communication Review
3352	      34 (1) (2004) 131-136.
3353	[260] A. Singla and C. Rohrs, Ultrapeers: Another Step Towards Gnutella
3354	      Scalability,
3355	      http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20P
3356	      roposals/Ultrapeer/ Version 1.0, 26 November (2002)
3357	[261] B. Cooper and H. Garcia-Molina, Ad hoc, Self-Supervising Peer-to-
3358	      Peer Search Networks, Technical Report,
3359	      http://www.cc.gatech.edu/~cooperb/odin/ 2003.
3360	[262] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval.
3361	      Addison Wesley, Essex, England, 1999.
3362	[263] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large
3363	      networks, IEEE/ACM Trans. on Networking 12 (2) (2004) 219-232.
3364	[264] H. Balakrishnan, S. Shenker, and M. Walfish, Semantic-Free
3365	      Referencing in Linked Distributed Systems, Second Int'l Workshop
3366	      on Peer-to-Peer Systems IPTPS 03, 20-21 February 2003.

3368	[265] B. Yang, P. Vinograd, and H. Garcia-Molina, Evaluating GUESS and
3369	      non-forwarding peer-to-peer search, The 24th Int'l Conf. on
3370	      Distributed Computing Systems ICDCS'04, Mar 23-26 2004.
3371	[266] A. Gupta, B. Liskov, and R. Rodrigues, One Hop Lookups for Peer-
3372	      to-Peer Overlays, 9th Workshop on Hot Topics in Operating Systems
3373	      (HotOS), 18-21 May 2003.
3374	[267] A. Gupta, B. Liskov, and R. Rodrigues, Efficient routing for
3375	      peer-to-peer overlays, First symp. on Networked Systems Design
3376	      and Implementation (NSDI), Mar 29-31 2004, pp. 113-126.
3377	[268] A. Mizrak, Y. Cheng, V. Kumar, and S. Savage, Structured
3378	      superpeers: leveraging heterogeneity to provide constant-time
3379	      lookup, IEEE Workshop on Internet Applications, June 23-24 2003.
3380	[269] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman, Search in
3381	      power-law networks, Physical Review E, The American Physical
3382	      Society 64 (046135) (2001)
3383	[270] F. Banaei-Kashani and C. Shahabi, Criticality-based analysis and
3384	      design of unstructured peer-to-peer networks as "complex
3385	      systems", Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing and
3386	      the Grid 2003, pp. 351-358.
3387	[271] KaZaa, KaZaa Media Desktop, www.kazaa.com (2001)
3388	[272] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large
3389	      networks, Proc. second ACM SIGCOMM workshop on Internet
3390	      measurement, November 06-08 2002, pp. 137-150.
3391	[273] DirectConnect, http://www.neo-modus.com (2001)
3392	[274] S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy, An
3393	      analysis of Internet content delivery systems, ACM SIGOPS
3394	      Operating Systems Review 36 (2002) 315-327.
3395	[275] A. Loo, The Future of Peer-to-Peer Computing, Communications of
3396	      the ACM 46 (9) (2003) 56-61.
3397	[276] B. Yang and H. Garcia-Molina, Comparing Hybrid Peer-to-Peer
3398	      Systems (extended), 27th Int'l Conf. on Very Large Data Bases,
3399	      September 11-14 2001.
3400	[277] D. Scholl, OpenNap Home Page, http://opennap.sourceforge.net/
3401	      (2001)
3402	[278] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file system,
3403	      Proc. 19th ACM symposium on operating systems principles 2003,
3404	      pp. 29-43.
3405	[279] I. Clarke, S. Miller, T. Hong, O. Sandberg, and B. Wiley,
3406	      Protecting Free Expression Online with Freenet, IEEE Internet
3407	      Computing 6 (1) (2002)
3408	[280] J. Mache, M. Gilbert, J. Guchereau, J. Lesh, F. Ramli, and M.
3409	      Wilkinson, Request algorithms in Freenet-style peer-to-peer
3410	      systems, Proc. Second IEEE Int'l Conf. on Peer to Peer Computing
3411	      P2P'02, September 5-7 2002.
3412	[281] C. Rohrs, Query Routing for the Gnutella Networks,
3413	      http://www.limewire.com/developer/query_routing/keyword%20routing
3414	      .htm Version 1.0 (2002)

3416	[282] I. Clarke, Freenet's Next Generation Routing Protocol,
3417	      http://freenetproject.org/index.php?page=ngrouting, 20th July
3418	      2003.
3419	[283] A. Z. Kronfol, FASD: A fault-tolerant, adaptive scalable
3420	      distributed search engine, Master's Thesis
3421	      http://www.cs.princeton.edu/~akronfol/fasd/ 2002.
3422	[284] S. Gribble, E. Brewer, J. M. Hellerstein, and D. Culler,
3423	      Scalable, Distributed Data Structures for Internet Service
3424	      Construction, Proc. 4th Symp. on Operating Systems Design and
3425	      Implementation OSDI 2000, October 2000.
3426	[285] K. Aberer, Efficient Search in Unbalanced, Randomized Peer-to-
3427	      Peer Search Trees, EPFL Technical Report IC/2002/79 (2002)
3428	[286] R. Honicky and E. Miller, A fast algorithm for online placement
3429	      and reorganization of replicated data, Proc. 17th Int'l Parallel
3430	      and Distributed Processing Symp., April 2003.
3431	[287] G. S. Manku, Routing networks for distributed hash tables, Proc.
3432	      22nd annual ACM Symp. on Principles of Distributed Computing,
3433	      PODC 2003, July 13-16 2003, pp. 133-142.
3434	[288] D. Lewin, Consistent hashing and random trees: algorithms for
3435	      caching in distributed networks, Master's Thesis, Department of
3436	      Electrical Engineering and Computer Science, Massachusetts
3437	      Institute of Technology (1998)
3438	[289] S. Lei and A. Grama, Extended consistent hashing: a framework for
3439	      distributed servers, Proc. 24th Int'l Conf. on Distributed
3440	      Computing Systems ICDCS 2004, March 23-26 2004.
3441	[290] W. Litwin, Re: Chord & LH*, Email to Ion Stoica, March 23 2004.
3442	[291] J. Li, J. Stribling, R. Morris, F. Kaashoek, and T. Gil, A
3443	      performance vs. cost framework for evaluating DHT design
3444	      tradeoffs under churn, Proc. IEEE Infocom, Mar 13-17 2005.
3445	[292] S. Zhuang, D. Geels, I. Stoica, and R. Katz, On failure detection
3446	      algorithms in overlay networks, Proc. IEEE Infocom, Mar 13-17
3447	      2005.
3448	[293] X. Li, J. Misra, and C. G. Plaxton, Active and Concurrent
3449	      Topology Maintenance, The 18th Annual Conf. on Distributed
3450	      Computing (DISC 2004), Trippenhuis, Amsterdam, the Netherlands,
3451	      October 4 - October 7 (2004)
3452	[294] K. Aberer, L. O. Alima, A. Ghodsi, S. Girdzijauskas, M.
3453	      Hauswirth, and S. Haridi, The essence of P2P: a reference
3454	      architecture for overlay networks, Proc. of the 5th international
3455	      conference on peer-to-peer computing, Aug 31-Sep 2 2005.
3456	[295] C. Tang, M. Buco, R. Chang, S. Dwarkadas, L. Luan, E. So, and C.
3457	      Ward, Low traffic overlay networks with large routing tables,
3458	      Proc. of ACM Sigmetrics Int'l Conf. on Measurement and Modeling
3459	      of Comp. Sys., Jun 6-10 2005, pp. 14-25.
3460	[296] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling churn
3461	      in a DHT, Proc. of the USENIX Annual Technical Conference, June
3462	      2004.

3464	[297] C. Blake and R. Rodrigues, High Availability, Scalable Storage,
3465	      Dynamic Peer Networks:  Pick Two, 9th Workshop on Hot Topics in
3466	      Operating Systems (HotOS), Lihue, Hawaii, 18-21 May (2003)
3467	[298] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S.
3468	      Shenker, I. Stoica, and H. Yu, OpenDHT: a public DHT service and
3469	      its uses, Proc. of the conf. on Applications, technologies,
3470	      architectures and protocols for computer communications, Aug 22-
3471	      26 2005, pp. 73-84.
3472	[299] T. Gil, F. Kaashoek, J. Li, R. Morris, and J. Stribling, p2psim,
3473	      a simulator for peer-to-peer protocols,
3474	      http://www.pdos.lcs.mit.edu/p2psim/ (2003)
3475	[300] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao,
3476	      Distributed object location in a dynamic network, Theory of
3477	      Computing Systems (2004)
3478	[301] N. Lynch, D. Malkhi, and D. Ratajczak, Atomic data access in
3479	      distributed hash tables, Proc. Int'l Peer-to-Peer Symp., March 7-
3480	      8 2002.
3481	[302] S. Gilbert, N. Lynch, and A. Shvartsman, RAMBO II: Rapidly
3482	      Reconfigurable Atomic Memory for Dynamic Networks, Technical
3483	      Report, MIT-CSAIL-TR-890 2004.
3484	[303] N. Lynch and I. Stoica, MultiChord: A resilient namespace
3485	      management algorithm, Technical Memo MIT-LCS-TR-936 2004.
3486	[304] J. Risson, K. Robinson, and T. Moors, Fault tolerant active rings
3487	      for structured peer-to-peer overlays, Proc. of the 30th Annual
3488	      IEEE Conf. on Local Computer Networks, Nov 15-17 2005, pp. 18-25.
3489	[305] B. Awerbuch and C. Scheideler, Peer-to-peer systems for prefix
3490	      search, Proc. 22nd annual ACM Symp. on Principles of Distributed
3491	      Computing 2003, pp. 123-132.
3492	[306] F. Dabek, B. Zhao, P. Druschel, J. Kubiatowicz, and I. Stoica,
3493	      Towards a common API for structured P2P overlays, Proc. Second
3494	      Int'l Workshop on Peer to Peer Systems IPTPS 2003, February 2003.
3495	[307] N. Feamster and H. Balakrishnan, Towards a logic for wide-area
3496	      Internet routing, Proc. ACM SIGCOMM workshop on Future Directions
3497	      in Network Architecture, August 25-27 2003, pp. 289-300.
3498	[308] B. Ahlgren, M. Brunner, L. Eggert, R. Hancock, and S. Schmid,
3499	      Invariants: a new design methodology for network architectures,
3500	      Proc. ACM SIGCOMM workshop on Future Direction in Network
3501	      Architecture, August 30 2004, pp. 65-70.
3502	[309] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to
3503	      Algorithms, 2nd Edition. MIT Press, McGraw-Hill, Cambridge,
3504	      London, England, 2003.
3505	[310] I. Abraham, D. Malkhi, and O. Dobzinski, LAND: Stretch (1+epsilon)
3506	      Locality Aware Networks for DHTs, Proc. ACM-SIAM Symp. on
3507	      Discrete Algorithms SODA-04 2004.
3508	[311] S. Jain, R. Mahajan, and D. Wetherall, A study of the performance
3509	      potential of DHT-based overlays, Proc. of the 4th Usenix
3510	      symposium on internet technologies and systems (USITS), Mar 2003.

3512	[312] J. Risson, A. Harwood, and T. Moors, Stable high-capacity one-hop
3513	      distributed hash tables, Proc. of the IEEE Symposium on Computers
3514	      and Communications (ISCC'06), Jun 26-29 2006.
3515	[313] V. Ramasubramanian and E. Sirer, Beehive: O(1) Lookup Performance
3516	      for Power-Law Query Distributions in Peer-to-Peer Overlays, Proc.
3517	      First Symp. on Networked Systems Design and Implementation
3518	      (NSDI'04), San Francisco, California, March 29-31 (2004) 99-112.
3519	[314] I. Abraham, A. Badola, D. Bickson, D. Malkhi, S. Maloo, and S.
3520	      Ron, Practical locality-awareness for large scale information
3521	      sharing, Proc. 4th International Workshop on Peer-to-Peer
3522	      Systems, Feb 24-25 2005.
3523	[315] B. Leong, B. Liskov, and E. Demaine, Epichord: parallelizing the
3524	      Chord lookup algorithm with reactive routing state management,
3525	      Proc. of the 12th International Conference on Networks, Nov 2004.
3526	[316] J. Li, J. Stribling, R. Morris, and F. Kaashoek, Bandwidth-
3527	      efficient management of DHT routing tables, Proc. 2nd Symposium
3528	      on Networked Systems Design and Implementation, May 2-4 2005.
3529	[317] S. Rhea, B.-G. Chun, J. Kubiatowicz, and S. Shenker, Fixing the
3530	      embarrassing slowness of OpenDHT on PlanetLab, Proc. of the
3531	      Second USENIX Workshop on Real, Large Distributed Systems, Dec 13
3532	      2005.
3533	[318] M. Costa, M. Castro, A. Rowstron, and P. Key, PIC: Practical
3534	      Internet coordinates for distance estimation, Proc. of the 24th
3535	      international conference on distributed computing systems, Mar
3536	      2004.
3537	[319] M. Castro, M. B. Jones, A.-M. Kermarrec, A. Rowstron, M. Theimer,
3538	      H. Wang, and A. Wolman, An evaluation of scalable application-
3539	      level multicast built using peer-to-peer overlays, Proc. of the
3540	      22nd Annual Joint Conf. of the IEEE Comp. and Comm. Soc.
3541	      (INFOCOM), 30 Mar - 3 Apr 2003, pp. 1510-1520.
3542	[320] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, Application-
3543	      level multicast using content-addressable networks, Proc. of the
3544	      Third International Workshop on Networked Group Communication,
3545	      Nov 7-9 2001.
3546	[321] S. El-Ansary, L. Alima, P. Brand, and S. Haridi, Efficient
3547	      broadcast in structured P2P networks, Second Int'l Workshop on
3548	      Peer-to-Peer Systems (IPTPS 03), Berkeley, CA, USA, 20-21
3549	      February (2003)
3550	[322] J. Li, K. Sollins, and D.-Y. Lim, Implementing aggregation and
3551	      broadcast over Distributed Hash Tables, ACM SIGCOMM Computer
3552	      Communication Review 35 (1) (2005) 81-92.
3553	[323] V. Pai, K. Tamilmani, V. Sambamurthy, K. Kumar, and A. Mohr,
3554	      Chainsaw: eliminating trees from overlay multicast, Proc. 4th
3555	      Int'l Workshop on Peer-to-Peer Systems, February 24-25 2005.
3556	[324] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, and M. Budiu, Bimodal
3557	      Multicast, ACM Trans. on Computer Systems 17 (2) (1999) 41-88.

3559	[325] Z. Zhang, S. Chen, Y. Ling, and R. Chow, Resilient capacity-aware
3560	      multicasting based on overlay networks, Proc. of the 25th IEEE
3561	      Int'l Conf. on Distributed Computing Systems, 6-10 June 2005, pp.
3562	      565-574.
3563	[326] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang, The
3564	      impact of heterogeneous bandwidth constraints on DHT-based
3565	      multicast protocols, Proc. 4th Int'l Workshop on Peer-to-Peer
3566	      Systems, February 24-25 2005.
3567	[327] A. Ghodsi, L. O. Alima, S. El-Ansary, P. Brand, and S. Haridi,
3568	      Self-correcting broadcast in distributed hash tables, Proc. of
3569	      the 15th IASTED International Conf. on Parallel and Distributed
3570	      Computing and Systems, Nov 2003.
3571	[328] R. Mahajan, M. Castro, and A. Rowstron, Controlling the cost of
3572	      reliability in peer-to-peer overlays, Second Int'l Workshop on
3573	      Peer-to-Peer Systems IPTPS'03, February 20-21 2003.
3574	[329] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling churn
3575	      in a DHT, Report No. UCB/CSD-03-1299, University of California,
3576	      also Proc. USENIX Annual Technical Conference, June 2004.
3577	[330] M. Castro, M. Costa, and A. Rowstron, Performance and
3578	      dependability of structured peer-to-peer overlays, Microsoft
3579	      Research Technical Report MSR-TR-2003-94, December 2003. Also
3580	      Int'l Conf. on Dependable Systems and Networks, June 28-July 1
3581	      2004.
3582	[331] D. Liben-Nowell, H. Balakrishnan, and D. Karger, Analysis of the
3583	      evolution of peer-to-peer systems, Annual ACM Symp. on Principles
3584	      of Distributed Computing 2002, pp. 233-242.
3585	[332] L. Alima, S. El-Ansary, P. Brand, and S. Haridi, DKS(N,k,f): a
3586	      family of low communication, scalable and fault-tolerant
3587	      infrastructures for P2P applications, Proc. 3rd IEEE/ACM Int'l
3588	      Symp. on Cluster Computing and the Grid (2003) 344-350.
3589	[333] D. Karger and M. Ruhl, Finding nearest neighbours in growth-
3590	      restricted metrics, Proc. 34th annual ACM symposium on Theory of
3591	      computing 2002, pp. 741-750.
3592	[334] S. Ratnasamy, A Scalable Content-Addressable Network, Doctoral
3593	      Dissertation 2002.
3594	[335] S. McCanne and S. Floyd, The LBNL/UCB Network Simulator.
3595	[336] M. Naor and U. Wieder, Novel architectures for P2P applications:
3596	      the continuous-discrete approach, Proc. fifteenth annual ACM
3597	      Symp. on Parallel Algorithms and Architectures, SPAA 2003, June
3598	      7-9 2003, pp. 50-59.
3599	[337] N. G. de Bruijn, A combinatorial problem, Koninklijke
3600	      Nederlandse Akademie van Wetenschappen 49 (1946) 758-764.
3601	[338] J.-W. Mao, "The Coloring and Routing Problems on de Bruijn
3602	      Interconnection Networks," in Doctoral Dissertation, National Sun
3603	      Yat-sen University, 2003.
3604	[339] M. L. Schlumberger, De Bruijn communication networks, Doctoral
3605	      Dissertation 1974.

3607	[340] M. Imase and M. Itoh, Design to minimize diameter on building-
3608	      block network, IEEE Trans. on Computers C-30 (6) (1981) 439-442.
3609	[341] S. M. Reddy, D. K. Pradhan, and J. G. Kuhl, Directed graphs with
3610	      minimal diameter and maximal connectivity, Technical Report,
3611	      School of Engineering, Oakland University (1980)
3612	[342] R. A. Rowley and B. Bose, Fault-tolerant ring embedding in de
3613	      Bruijn networks, IEEE Trans. on Computers 42 (12) (1993) 1480-
3614	      1486.
3615	[343] K. Y. Lee, G. Liu, and H. F. Jordan, Hierarchical networks for
3616	      optical communications, Journal of Parallel and Distributed
3617	      Computing 60 (2000) 1-16.
3618	[344] M. Naor and U. Wieder, Know thy neighbor's neighbor:  better
3619	      routing for skip-graphs and small worlds, The 3rd Int'l Workshop
3620	      on Peer-to-Peer Systems, February 26-27 2004.
3621	[345] P. Fraigniaud and P. Gauron, The content-addressable networks
3622	      D2B, Technical Report 1349, Laboratoire de Recherche en
3623	      Informatique, January 2003.
3624	[346] A. Datta, S. Girdzijauskas, and K. Aberer, On de Bruijn routing
3625	      in distributed hash tables: there and back again, Proc. Fourth
3626	      IEEE Int'l Conf. on Peer-to-Peer Computing, 25-27 August 2004.
3627	[347] W. Pugh, Skip lists: a probabilistic alternative to balanced
3628	      trees, Proc. Workshop on Algorithms and Data Structures, August
3629	      17-19 1989, pp. 437-449.
3630	[348] W. Pugh, Skip lists: a probabilistic alternative to balanced
3631	      trees, Communications of the ACM 33 (6) (1990) 668-676.
3632	[349] J. Gray, The transaction concept: Virtues and limitations, Proc.
3633	      VLDB, September 1981.
3634	[350] B. T. Loo, J. M. Hellerstein, R. Huebsch, S. Shenker, and I.
3635	      Stoica, Enhancing P2P file-sharing with internet-scale query
3636	      processor, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB
3637	      2004, 29 August-3 September 2004.
3638	[351] M. Stonebraker, P. Aoki, W. Litwin, A. Pfeffer, A. Sah, J.
3639	      Sidell, C. Staelin, and A. Yu, Mariposa: a wide-area distributed
3640	      database system, The VLDB Journal - The Int'l Journal of Very
3641	      Large Data Bases (5) (1996) 48-63.
3642	[352] V. Cholvi, P. Felber, and E. Biersack, Efficient Search in
3643	      Unstructured Peer-to-Peer Networks, Proc. Symp. on Parallel
3644	      Algorithms and Architectures, July 2004.
3645	[353] S. Daswani and A. Fisk, Gnutella UDP Extension for Scalable
3646	      Searches (GUESS) v0.1,
3647	      http://www.limewire.org/fisheye/viewrep/~raw,r=1.2/limecvs/core/g
3648	      uess_01.html (2002)
3649	[354] A. Fisk, Gnutella Dynamic Query Protocol v0.1, Gnutella Developer
3650	      Forum (2003)
3651	[355] O. Gnawali, A Keyword Set Search System for Peer-to-Peer
3652	      Networks, Master's Thesis 2002.

3654	[356] Limewire, Limewire Host Count,
3655	      http://www.limewire.com/english/content/netsize.shtml (2004)
3656	[357] A. Fisk, Gnutella Ultrapeer Query Routing,
3657	      http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20P
3658	      roposals/search/Ultrapeer%20QRP/ v0.1 (2003)
3659	[358] A. Fisk, Gnutella Dynamic Query Protocol,
3660	      http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20P
3661	      roposals/search/Dynamic%20Querying/ v0.1 (2003)
3662	[359] S. Thadani, Meta Data searches on the Gnutella Network
3663	      (addendum), http://www.limewire.com/developer/MetaProposal2.htm
3664	      (2001)
3665	[360] S. Thadani, Meta Information Searches on the Gnutella Networks,
3666	      http://www.limewire.com/developer/metainfo_searches.html (2001)
3667	[361] P. Reynolds and A. Vahdat, Efficient peer-to-peer keyword
3668	      searching, ACM/IFIP/USENIX Int'l Middleware Conference, Middleware
3669	      2003, June 16-20 2003.
3670	[362] W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A.
3671	      Buchmann, Bit Zipper Rendezvous, optimal data placement for
3672	      general P2P queries, Proc. First Int'l Workshop on Peer-to-Peer
3673	      Computing and Databases, March 14 2004.
3674	[363] A. Singhal, Modern Information Retrieval: A Brief Overview, IEEE
3675	      Data Engineering Bulletin 24 (4) (2001) 35-43.
3676	[364] E. Cohen, A. Fiat, and H. Kaplan, Associative Search in Peer to
3677	      Peer Networks: Harnessing Latent Semantics, IEEE Infocom 2003,
3678	      The 22nd Annual Joint Conf. of the IEEE Computer and
3679	      Communications Societies, March 30-April 3 2003.
3680	[365] W. Muller and A. Henrich, Fast retrieval of high-dimensional
3681	      feature vectors in P2P networks using compact peer data
3682	      summaries, Proc. 5th ACM SIGMM international workshop on
3683	      Multimedia Information Retrieval, November 7 2003, pp. 79-86.
3684	[366] M. T. Ozsu and P. Valduriez, Principles of Distributed Database
3685	      Systems, 2nd ed. Prentice Hall, 1999.
3686	[367] G. Salton, A. Wong, and C. S. Yang, A vector space model for
3687	      automatic indexing, Communications of the ACM 18 (11) (1975) 613-
3688	      620.
3689	[368] S. E. Robertson, S. Walker, and M. Beaulieu, Okapi at TREC-7:
3690	      automatic ad hoc, filtering, VLC and interactive tracks, Proc.
3691	      Seventh Text REtrieval Conference, TREC-7, NIST Special
3692	      Publication 500-242, July 1999, pp. 253-264.
3693	[369] A. Singhal, J. Choi, D. Hindle, D. Lewis, and F. Pereira, AT&T at
3694	      TREC-7, Proc. Seventh Text REtrieval Conf. TREC-7, July 1999, pp.
3695	      253-264.
3696	[370] K. Sankaralingam, S. Sethumadhavan, and J. Browne, Distributed
3697	      Pagerank for P2P Systems, Proc. 12th international symposium on
3698	      High Performance Distributed Computing HPDC, June 22-24 2003.

3700	[371] I. Klampanos and J. Jose, An architecture for information
3701	      retrieval over semi-collaborated peer-to-peer networks, Proc.
3702	      2004 ACM symposium on applied computing 2004, pp. 1078-1083.
3703	[372] C. Tang, Z. Xu, and S. Dwarkadas, Peer-to-peer information
3704	      retrieval using self-organizing semantic overlay networks, Proc.
3705	      2003 conference on Applications, Technologies, Architectures and
3706	      Protocols for Computer Communications, August 25-29 2003, pp.
3707	      175-186.
3708	[373] C. Tang and S. Dwarkadas, Hybrid global-local indexing for
3709	      efficient peer-to-peer information retrieval, Proc. First Symp.
3710	      on Networked Systems Design and Implementation NSDI'04, March 29-
3711	      31 2004, pp. 211-224.
3712	[374] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A.
3713	      Harshman, L. A. Streeter, and K. E. Lochbaum, Information
3714	      retrieval using a singular value decomposition model of latent
3715	      semantic structure, Proc. 11th Annual Int'l ACM SIGIR Conf. on
3716	      Research and Development in Information Retrieval 1988, pp. 465-
3717	      480.
3718	[375] C. Tang, S. Dwarkadas, and Z. Xu, On scaling latent semantic
3719	      indexing for large peer-to-peer systems, The 27th Annual Int'l
3720	      ACM SIGIR Conf. SIGIR'04, ACM Special Interest Group on
3721	      Information Retrieval, July 2004.
3722	[376] S. Milgram, The small world problem, Psychology Today 1 (61)
3723	      (1967)
3724	[377] J. Kleinberg, The small-world phenomenon: An algorithmic
3725	      perspective, Proc. 32nd ACM Symp. on Theory of Computing (2000)
3726	[378] Y. Petrakis and E. Pitoura, "On constructing small worlds in
3727	      unstructured peer-to-peer systems," in Current trends in database
3728	      technology (Proc. First Int'l Workshop on Peer-to-Peer Computing
3729	      and Databases, Heraklion, Crete, Greece, March 14), vol. 3268,
3730	      Lecture Notes in Computer Science: Springer, 2004, pp. 415-424.
3731	[379] A. Iamnitchi, M. Ripeanu, and I. Foster, Locating Data in (Small
3732	      World?) P2P Scientific Collaborations, First Int'l Workshop on
3733	      Peer-to-Peer Systems (IPTPS), Cambridge, MA, March (2002)
3734	[380] Y. Ren, C. Sha, W. Qian, A. Zhou, B. Ooi, and K. Tan, Explore the
3735	      "small world phenomena" in pure P2P information sharing systems,
3736	      Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing and the Grid
3737	      (2003) 232-239.
3738	[381] G. S. Manku, M. Bawa, and P. Raghavan, Symphony:  Distributed
3739	      Hashing in a Small World, Proc. 4th USENIX Symp. on Internet
3740	      Technologies and Systems, March 26-28 2003.
3741	[382] W. Litwin and S. Sahri, Implementing SD-SQL Server: a Scalable
3742	      Distributed Database System, CERIA Research Report 2004-04-02,
3743	      April 2004.
3744	[383] M. Jarke and J. Koch, Query Optimization in Database Systems, ACM
3745	      Computing Surveys 16 (2) (1984) 111-152.

3747	[384] J. L. Bentley, Multidimensional binary search trees used for
3748	      associative searching, Communications of the ACM 18 (9) (1975)
3749	      509-517.
3750	[385] B. Chun, I. Stoica, J. Hellerstein, R. Huebsch, S. Jeffery, B. T.
3751	      Loo, S. Mardanbeigi, T. Roscoe, S. Rhea, and S. Shenker,
3752	      Querying at Internet Scale, Proc. 2004 ACM SIGMOD international
3753	      conference on management of data, demonstration session 2004, pp.
3754	      935-936.
3755	[386] P. Cao and Z. Wang, Efficient top-K query calculation in
3756	      distributed networks, Proc. 23rd Annual ACM SIGACT-SIGOPS Symp.
3757	      on Principles of Distributed Computing PODC 2004, July 25-28
3758	      2004, pp. 206-215.
3759	[387] D. Psaltoulis, I. Kostoulas, I. Gupta, K. Birman, and A. Demers,
3760	      Practical algorithms for size estimation in large and dynamic
3761	      groups, Proc. Twenty-Third Annual ACM SIGACT-SIGOPS Symp. on
3762	      Principles of Distributed Computing, PODC 2004, July 25-28 2004.
3763	[388] R. van Renesse, The importance of aggregation, Springer-Verlag
3764	      Lecture Notes in Computer Science  "Future Directions in
3765	      Distributed Computing".  A. Schiper, A. A. Shvartsman, H.
3766	      Weatherspoon, and B. Y. Zhao, editors. Springer-Verlag,
3767	      Heidelberg volume 2584 (2003)

3769	Authors' Addresses

3771	   John Risson
3772	   School of Elec Eng and Telecommunications
3773	   University of New South Wales
3774	   Sydney NSW 2052 Australia

3776	   Email: jr@tuffit.com

3778	   Tim Moors
3779	   School of Elec Eng and Telecommunications
3780	   University of New South Wales
3781	   Sydney NSW 2052 Australia

3783	   Email: t.moors@unsw.edu.au

3785	Intellectual Property Statement

3787	   The IETF takes no position regarding the validity or scope of any
3788	   Intellectual Property Rights or other rights that might be claimed to
3789	   pertain to the implementation or use of the technology described in
3790	   this document or the extent to which any license under such rights
3791	   might or might not be available; nor does it represent that it has
3792	   made any independent effort to identify any such rights.  Information
3793	   on the procedures with respect to rights in RFC documents can be
3794	   found in BCP 78 and BCP 79.

3796	   Copies of IPR disclosures made to the IETF Secretariat and any
3797	   assurances of licenses to be made available, or the result of an
3798	   attempt made to obtain a general license or permission for the use of
3799	   such proprietary rights by implementers or users of this
3800	   specification can be obtained from the IETF on-line IPR repository at
3801	   http://www.ietf.org/ipr.

3803	   The IETF invites any interested party to bring to its attention any
3804	   copyrights, patents or patent applications, or other proprietary
3805	   rights that may cover technology that may be required to implement
3806	   this standard.  Please address the information to the IETF at
3807	   ietf-ipr@ietf.org.

3809	Disclaimer of Validity

3811	   This document and the information contained herein are provided on an
3812	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
3813	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
3814	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
3815	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
3816	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
3817	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

3819	Copyright Statement

3821	   Copyright (C) The IETF Trust (2007).

3823	   This document is subject to the rights, licenses and restrictions
3824	   contained in BCP 78, and except as set forth therein, the authors
3825	   retain all their rights.

3827	Acknowledgment

3829	   Funding for the RFC Editor function is currently provided by the
3830	   Internet Society.