P2PSIP                                                       C. Jennings
Internet-Draft                                                     Cisco
Intended status:  Standards Track                            B. Lowekamp
Expires:  December 12, 2008                       SIPeerior Technologies
                                                             E. Rescorla
                                                       Network Resonance
                                                                S. Baset
                                                          H. Schulzrinne
                                                     Columbia University
                                                           June 10, 2008

                REsource LOcation And Discovery (RELOAD)
                      draft-bryan-p2psip-reload-04

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 12, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This document defines REsource LOcation And Discovery (RELOAD), a
   peer-to-peer (P2P) signaling protocol for use on the Internet.  A P2P
   signaling protocol provides its clients with an abstract storage and
   messaging service between a set of cooperating peers that form the
   overlay network.  RELOAD is designed to support a P2P Session
   Initiation Protocol (P2PSIP) network, but can be utilized by other
   applications with similar requirements by defining new usages that
   specify the kinds of data that must be stored for a particular
   application.  RELOAD defines a security model based on a certificate
   enrollment service that provides unique identities.  NAT traversal is
   a fundamental service of the protocol.  RELOAD also allows access
   from "client" nodes which do not need to route traffic or store data
   for others.

Table of Contents

   1.  Introduction
     1.1.  Basic Setting
     1.2.  Architecture
       1.2.1.  Usage Layer
       1.2.2.  Routing Layer
       1.2.3.  Storage
       1.2.4.  Topology Plugin
       1.2.5.  Forwarding Layer
     1.3.  SIP Usage
     1.4.  Security
     1.5.  Structure of This Document
   2.  Terminology
   3.  Overlay Management Overview
     3.1.  Security and Identification
       3.1.1.  Shared-Key Security
     3.2.  Clients
       3.2.1.  Client Routing
       3.2.2.  Client Behavior
     3.3.  Routing
       3.3.1.  Routing Alternatives
     3.4.  Connectivity Management
     3.5.  Overlay Algorithm Support
       3.5.1.  Support for Pluggable Overlay Algorithms
       3.5.2.  Joining, Leaving, and Maintenance Overview
     3.6.  First-Time Setup
       3.6.1.  Initial Configuration
       3.6.2.  Enrollment
   4.  Application Support Overview
     4.1.  Data Storage
       4.1.1.  Storage Permissions
       4.1.2.  Usages
       4.1.3.  Replication
     4.2.  Service Discovery
     4.3.  Application Connectivity
   5.  P2PSIP Integration Overview
   6.  Overlay Management Protocol
     6.1.  Message Routing
       6.1.1.  Request Origination
       6.1.2.  Message Receipt and Forwarding
       6.1.3.  Response Origination
     6.2.  Message Structure
       6.2.1.  Presentation Language
       6.2.2.  Forwarding Header
       6.2.3.  Message Contents Format
       6.2.4.  Signature
     6.3.  Overlay Topology
       6.3.1.  Topology Plugin Requirements
       6.3.2.  Methods and types for use by topology plugins
     6.4.  Forwarding Layer
       6.4.1.  Transports
       6.4.2.  Connection Management Methods
   7.  Data Storage Protocol
     7.1.  Data Signature Computation
     7.2.  Data Models
       7.2.1.  Single Value
       7.2.2.  Array
       7.2.3.  Dictionary
     7.3.  Data Storage Methods
       7.3.1.  Store
       7.3.2.  Fetch
       7.3.3.  Remove
       7.3.4.  Find
   8.  Certificate Store Usage
   9.  TURN Server Usage
   10. SIP Usage
     10.1. Registering AORs
     10.2. Looking up an AOR
     10.3. Forming a Direct Connection
     10.4. GRUUs
     10.5. SIP-REGISTRATION Kind Definition
   11. Diagnostic Usage
     11.1. Diagnostic Metrics for a P2PSIP Deployment
   12. Chord Algorithm
     12.1. Overview
     12.2. Routing
     12.3. Redundancy
     12.4. Joining
     12.5. Routing Connects
     12.6. Updates
       12.6.1. Sending Updates
       12.6.2. Receiving Updates
       12.6.3. Stabilization
     12.7. Route Query
     12.8. Leaving
   13. Enrollment and Bootstrap
     13.1. Discovery
     13.2. Overlay Configuration
     13.3. Credentials
       13.3.1. Self-Generated Credentials
     13.4. Joining the Overlay Peer
   14. Message Flow Example
   15. Security Considerations
     15.1. Overview
     15.2. Attacks on P2P Overlays
     15.3. Certificate-based Security
     15.4. Shared-Secret Security
     15.5. Storage Security
       15.5.1. Authorization
       15.5.2. Distributed Quota
       15.5.3. Correctness
       15.5.4. Residual Attacks
     15.6. Routing Security
       15.6.1. Background
       15.6.2. Admissions Control
       15.6.3. Peer Identification and Authentication
       15.6.4. Protecting the Signaling
       15.6.5. Residual Attacks
     15.7. SIP-Specific Issues
       15.7.1. Fork Explosion
       15.7.2. Malicious Retargeting
       15.7.3. Privacy Issues
   16. IANA Considerations
     16.1. Overlay Algorithm Types
     16.2. Data Kind-Id
     16.3. Data Model
     16.4. Message Codes
     16.5. Error Codes
     16.6. Route Log Extension Types
     16.7. reload: URI Scheme
       16.7.1. URI Registration
   17. Acknowledgments
   18. References
     18.1. Normative References
     18.2. Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   This document defines REsource LOcation And Discovery (RELOAD), a
   peer-to-peer (P2P) signaling protocol for use on the Internet.  It
   provides a generic, self-organizing overlay network service, allowing
   nodes to efficiently route messages to other nodes and to efficiently
   store and retrieve data in the overlay.  RELOAD provides several
   features that are critical for a successful P2P protocol for the
   Internet:

   Security Framework:  A P2P network will often be established among a
      set of peers that do not trust each other.  RELOAD leverages a
      central enrollment server to provide credentials for each peer,
      which can then be used to authenticate each operation.  This
      greatly reduces the possible attack surface.

   Usage Model:  RELOAD is designed to support a variety of
      applications, including P2P multimedia communications with the
      Session Initiation Protocol [I-D.ietf-p2psip-concepts].  RELOAD
      allows the definition of new application usages, each of which can
      define its own data types, along with the rules for their use.
      This allows RELOAD to be used with new applications through a
      simple documentation process that supplies the details for each
      application.

   NAT Traversal:  RELOAD is designed to function in environments where
      many, if not most, of the nodes are behind NATs or firewalls.
      Operations for NAT traversal are part of the base design,
      including using ICE to establish new RELOAD or application
      protocol connections as well as tunneling application protocols
      across the overlay.

   High Performance Routing:  The very nature of overlay algorithms
      requires peers participating in the P2P network to route requests
      on behalf of other peers in the network.  This imposes a load, in
      the form of bandwidth and processing power, on those intermediate
      peers.  RELOAD therefore defines a simple, lightweight forwarding
      header, minimizing the amount of effort required by intermediate
      peers.

   Pluggable Overlay Algorithms:  RELOAD has been designed with an
      abstract interface to the overlay layer to simplify implementing a
      variety of structured (DHT) and unstructured overlay algorithms.
      This specification also defines how RELOAD is used with Chord,
      which is mandatory to implement.  Specifying a default "must
      implement" overlay algorithm will allow interoperability, while
      the extensibility allows selection of overlay algorithms optimized
      for a particular application.

   These properties were designed specifically to meet the requirements
   for a P2P protocol to support SIP, and this document defines a SIP
   Usage of RELOAD.  However, RELOAD is not limited to usage by SIP and
   could serve as a tool for supporting other P2P applications with
   similar needs.  RELOAD is also based on the concepts introduced in
   [I-D.ietf-p2psip-concepts].

1.1.  Basic Setting

   In this section, we provide a brief overview of the operational
   setting for RELOAD.  See [I-D.ietf-p2psip-concepts] for more details.
   A RELOAD Overlay Instance consists of a set of nodes arranged in a
   partly connected graph.  Each node in the overlay is assigned a
   numeric Node-ID which, together with the specific overlay algorithm
   in use, determines its position in the graph and the set of nodes it
   connects to.  The figure below shows a trivial example which isn't
   drawn from any particular overlay algorithm, but was chosen for
   convenience of representation.

             +--------+              +--------+              +--------+
             | Node 10|--------------| Node 20|--------------| Node 30|
             +--------+              +--------+              +--------+
                 |                       |                       |
                 |                       |                       |
             +--------+              +--------+              +--------+
             | Node 40|--------------| Node 50|--------------| Node 60|
             +--------+              +--------+              +--------+
                 |                       |                       |
                 |                       |                       |
             +--------+              +--------+              +--------+
             | Node 70|--------------| Node 80|--------------| Node 90|
             +--------+              +--------+              +--------+
                                         |
                                         |
                                     +--------+
                                     | Node 85|
                                     |(Client)|
                                     +--------+

   Because the graph is not fully connected, when a node wants to send a
   message to another node, it may need to route it through the network.
   For instance, Node 10 can talk directly to nodes 20 and 40, but not
   to Node 70.  In order to send a message to Node 70, it would first
   send it to Node 40 with instructions to pass it along to Node 70.
   Different overlay algorithms will have different connectivity graphs,
   but the general idea behind all of them is to allow any node in the
   graph to efficiently reach every other node within a small number of
   hops.
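   To make the forwarding example concrete, the following Python sketch
   encodes the links of the figure above and finds a path with a plain
   breadth-first search.  It is only an illustration: the search assumes
   global knowledge of the graph, whereas a real overlay algorithm (such
   as Chord, defined later in this document) chooses each next hop from
   its own Routing Table.  EDGES, build_graph, and route are
   hypothetical names.

```python
from collections import deque

# Links drawn in the example figure (undirected); Node 85 hangs off Node 80.
EDGES = [
    (10, 20), (20, 30), (40, 50), (50, 60), (70, 80), (80, 90),
    (10, 40), (20, 50), (30, 60), (40, 70), (50, 80), (60, 90),
    (80, 85),
]


def build_graph(edges):
    """Build an adjacency map {node: set(neighbors)} from undirected links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def route(graph, src, dst):
    """Return a shortest hop sequence from src to dst, or None if unreachable."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor links back to src, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # Sorted iteration keeps the chosen path deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

   With the figure's links, route(build_graph(EDGES), 10, 70) returns
   [10, 40, 70], matching the path described in the text.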

   The RELOAD network is not only a messaging network.  It is also a
   storage network.  Records are stored under numeric addresses which
   occupy the same space as node identifiers.  Nodes are responsible for
   storing the data associated with some set of addresses as determined
   by their Node-IDs.  For instance, we might say that every node is
   responsible for storing any data value which has an address less than
   or equal to its own Node-ID, but greater than the next-lowest
   Node-ID.  Thus, Node 20 would be responsible for storing values with
   addresses 11-20.

   RELOAD also supports clients.  These are nodes which have Node-IDs
   but do not participate in routing or storage.  For instance, in the
   figure above Node 85 is a client.  It can route to the rest of the
   RELOAD network via Node 80, but no other node will route through it,
   and Node 90 is still responsible for all addresses from 81 to 90.  We
   refer to non-client nodes as peers.
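   The example storage rule and the treatment of clients can be
   sketched as follows.  Beyond what the text states, this sketch
   assumes that addresses greater than the highest Node-ID wrap around
   to the lowest peer, as on a ring; responsible_peer and the node
   lists are hypothetical.  Clients are simply left out of the peer
   list, which is why address 85 is assigned to Node 90 rather than to
   the client.

```python
import bisect

# Nodes from the example figure; Node 85 is a client and stores nothing.
ALL_NODES = [10, 20, 30, 40, 50, 60, 70, 80, 85, 90]
CLIENTS = {85}
PEERS = [n for n in ALL_NODES if n not in CLIENTS]


def responsible_peer(address, peer_ids):
    """Return the peer whose Node-ID is the smallest one >= address.

    This is the example rule from the text: a peer stores addresses that
    are <= its own Node-ID but > the next-lowest Node-ID.  Addresses above
    the highest Node-ID wrap to the lowest peer (an assumption).
    """
    peers = sorted(peer_ids)
    if not peers:
        raise ValueError("overlay has no peers")
    # bisect_left finds the first peer whose Node-ID is >= address.
    i = bisect.bisect_left(peers, address)
    return peers[i] if i < len(peers) else peers[0]
```

   For instance, responsible_peer(15, PEERS) returns 20, and
   responsible_peer(85, PEERS) returns 90.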

   Other applications (for instance, SIP) can be defined on top of
   RELOAD and use these two basic RELOAD services to provide their own
   services.

1.2.  Architecture

   Architecturally, RELOAD is divided into several layers, as shown in
   the following figure.

                    Application

               +-------+  +-------+
               | SIP   |  | XMPP  |  ...
               | Usage |  | Usage |
               +-------+  +-------+
             -------------------------------------- Message Routing API
               +------------------+   +---------+
               |                  |<->| Storage |
               |                  |   +---------+
               |      Routing     |        ^
               |       Layer      |        v
               |                  |   +---------+
               |                  |<->|Topology |
               |                  |   | Plugin  |
               +------------------+   +---------+
                         ^                 ^
                         v                 |
               +------------------+ <------+
               |    Forwarding    |
               |       Layer      |
               +------------------+
             -------------------------------------- Transport API
                +-------+  +------+
                |TLS    |  |DTLS  |  ...
                +-------+  +------+

   The major components of RELOAD are:

   Usage Layer:  Each application defines a RELOAD usage: a set of data
      kinds and behaviors which describe how to use the services
      provided by RELOAD.  These usages all talk to RELOAD through a
      common Message Routing API.

   Routing Layer:  The Routing Layer is responsible for routing messages
      through the overlay.  It also manages request state for the usages
      and forwards Store and Fetch operations to the Storage component.
      It talks directly to the Topology Plugin, which is responsible for
      implementing the specific topology defined by the overlay
      algorithm being used.

   Storage:  The Storage component is responsible for processing
      messages relating to the storage and retrieval of data.  It talks
      directly to the Topology Plugin and the Routing Layer in order to
      send and receive messages and manage data replication and
      migration.

   Topology Plugin:  The Topology Plugin is responsible for implementing
      the specific overlay algorithm being used.  It talks directly to
      the Routing Layer to send and receive overlay management messages,
      to the Storage component to manage data replication, and to the
      Forwarding Layer to control hop-by-hop message forwarding.

   Forwarding Layer:  The Forwarding Layer provides packet forwarding
      services between nodes.  It also handles setting up connections
      across NATs using ICE.

1.2.1.  Usage Layer

   The top layer, called the Usage Layer, has application usages, such
   as the SIP Location Usage, that use the abstract Message Routing API
   provided by RELOAD.  The goal of this layer is to implement
   application-specific usages of the generic overlay services provided
   by RELOAD.  The usage defines how a specific application maps its
   data into something that can be stored in the overlay, where to store
   the data, how to secure the data, and finally how applications can
   retrieve and use the data.

   The architecture diagram shows both a SIP usage and an XMPP usage.  A
   single application may require multiple usages; for example, a SIP
   application may also require a voicemail usage.  A usage may define
   multiple kinds of data that are stored in the overlay and may also
   rely on kinds originally defined by other usages.

   This draft also defines a Diagnostic Usage, which can be used to
   obtain diagnostic information about a peer in the overlay.  The
   Diagnostic Usage is useful both to administrators monitoring the
   overlay and to some overlay algorithms that base their decisions on
   the capabilities and current load of nodes in the overlay.

1.2.2.  Routing Layer

   The Routing Layer provides a generic message routing service for the
   overlay.  Each peer is identified by its location in the overlay as
   determined by its Node-ID.  A component which is a client of the
   Routing Layer can perform two basic functions:

   o  Send a message to a given peer, specified by Node-ID or
      Resource-ID.
   o  Receive messages that other peers sent to a Node-ID or Resource-ID
      for which this peer is responsible.

   All usages are clients of the Routing Layer and use RELOAD's services
   by sending and receiving messages from peers.  For instance, when a
   usage wants to store data, it does so by sending Store requests.
   Note that the Storage component and the Topology Plugin are
   themselves clients of the Routing Layer, because they need to send
   and receive messages from other peers.

   The Routing Layer provides a fairly generic interface that allows the
   Topology Plugin to control the overlay and resource operations and
   messages.  Since each overlay algorithm is defined and functions
   differently, we refer generically to the table of other peers
   (neighbors) that the overlay algorithm maintains and uses to route
   requests as a Routing Table.  The Routing Layer queries the overlay
   algorithm to determine the next hop, then encodes and sends the
   message itself.  Similarly, the overlay algorithm issues periodic
   update requests through the Routing Layer to maintain and update its
   Routing Table.
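   The division of labor described above, in which the Routing Layer
   asks the overlay algorithm for the next hop and then sends the
   message itself, can be sketched as a small interface.  Everything
   here is hypothetical: TopologyPlugin, RingPlugin, and RoutingLayer
   are illustrative names, and RingPlugin is a deliberately naive ring
   that always forwards to its successor rather than any algorithm
   defined in this document.  The send callback stands in for the
   Forwarding Layer.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional, Tuple


class TopologyPlugin(ABC):
    """Hypothetical interface the Routing Layer uses to consult the overlay algorithm."""

    @abstractmethod
    def is_responsible(self, target_id: int) -> bool:
        """Return True if this node should handle messages for target_id."""

    @abstractmethod
    def next_hop(self, target_id: int) -> int:
        """Return the Node-ID of the neighbor to forward toward target_id."""


class RingPlugin(TopologyPlugin):
    """Toy overlay: each peer knows only its predecessor and successor."""

    def __init__(self, node_id: int, predecessor: int, successor: int):
        self.node_id = node_id
        self.predecessor = predecessor
        self.successor = successor

    def is_responsible(self, target_id: int) -> bool:
        if self.predecessor < self.node_id:
            return self.predecessor < target_id <= self.node_id
        # Lowest peer (or the only peer): its range wraps past the top.
        return target_id > self.predecessor or target_id <= self.node_id

    def next_hop(self, target_id: int) -> int:
        # Naive: always walk the ring one step; real algorithms do better.
        return self.successor


class RoutingLayer:
    """Delivers locally or forwards via send(next_hop, message)."""

    def __init__(self, plugin: TopologyPlugin,
                 send: Callable[[int, object], None]):
        self.plugin = plugin
        self.send = send  # stand-in for the Forwarding Layer
        self.delivered: List[Tuple[int, object]] = []

    def route(self, target_id: int, message: object) -> Optional[int]:
        if self.plugin.is_responsible(target_id):
            # This node is responsible: hand the message up locally.
            self.delivered.append((target_id, message))
            return None
        hop = self.plugin.next_hop(target_id)
        self.send(hop, message)
        return hop
```

   Because RoutingLayer only calls is_responsible() and next_hop(), a
   different overlay algorithm can be substituted by supplying another
   TopologyPlugin subclass, which is the point of the pluggable design.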

424	1.2.3.  Storage

426	   One of the major functions of RELOAD is to allow nodes to store data
427	   in the overlay and to retrieve data stored by other nodes or by
428	   themselves.  The Storage component is responsible for processing data
429	   storage and retrieval messages from other peers.  For instance, the
430	   Storage component might receive a Store request for a given resource
431	   from the Routing Layer.  It would then store the data value(s) in its
   local data store and send a response to the Routing Layer for
433	   delivery to the requesting peer.

435	   The node's Node-ID determines the set of resources which it will be
   responsible for storing.  However, the exact mapping between these is
   determined by the overlay algorithm used by the overlay; therefore,
   the Storage component always queries the Topology Plugin to
   determine where a particular resource should be stored.
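As a sketch of that query, the following Python code assumes a Chord-like plugin in a 128-bit circular ID space, where a node is responsible for the Resource-IDs after its predecessor up to and including its own Node-ID.  `RingPlugin`, `StorageComponent`, and the status strings are illustrative names only; other overlay algorithms define responsibility differently.

```python
RING = 2 ** 128  # assumption: 128-bit circular identifier space, as in Chord


def in_interval(x: int, start: int, end: int) -> bool:
    """True if x lies in the clockwise half-open interval (start, end]."""
    if start < end:
        return start < x <= end
    # Interval wraps past zero; start == end covers the whole ring.
    return x > start or x <= end


class RingPlugin:
    """Hypothetical Topology Plugin: owns IDs after its predecessor up to itself."""

    def __init__(self, node_id: int, predecessor_id: int) -> None:
        self.node_id = node_id
        self.predecessor_id = predecessor_id

    def is_responsible(self, resource_id: int) -> bool:
        return in_interval(resource_id % RING, self.predecessor_id, self.node_id)


class StorageComponent:
    def __init__(self, plugin: RingPlugin) -> None:
        self.plugin = plugin
        self.data: dict = {}

    def handle_store(self, resource_id: int, value: bytes) -> str:
        # Ask the plugin rather than hard-coding the overlay's mapping.
        if not self.plugin.is_responsible(resource_id):
            return "not responsible"
        self.data.setdefault(resource_id, []).append(value)
        return "ok"
```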

441	1.2.4.  Topology Plugin

443	   RELOAD is explicitly designed to work with a variety of overlay
444	   algorithms.  In order to facilitate this, the overlay algorithm
445	   implementation is provided by a Topology Plugin so that each overlay
446	   can select an appropriate overlay algorithm that relies on the common
447	   RELOAD core protocols and code.

449	   The Topology Plugin is responsible for maintaining the overlay
450	   algorithm Routing Table, which is consulted by the Routing Layer
451	   before routing a message.  When connections are made or broken, the
452	   Forwarding Layer notifies the Topology Plugin, which adjusts the
453	   routing table as appropriate.  The Topology Plugin will also instruct
454	   the Forwarding Layer to form new connections as dictated by the
   requirements of the overlay algorithm's topology.

457	   As peers enter and leave, resources may be stored on different peers,
458	   so the Topology Plugin also keeps track of which peers are
   responsible for which resources.  In response to such membership
   changes, the Topology Plugin issues resource migration requests as
   appropriate, in order to ensure that other peers have whatever
   resources they are now responsible for.  The Topology Plugin is also
   responsible for providing redundant data storage to protect against
   loss of information in the event of a peer failure and to protect
   against compromised or subversive peers.
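Under a Chord-like assumption (a 128-bit ring where each peer owns the interval after its predecessor), the resources a successor should hand to a newly joined peer can be computed as follows.  The function name is hypothetical, and whether the old holder keeps its copy (for example, as a replica) is a policy question this sketch leaves open.

```python
from typing import Iterable, List

RING = 2 ** 128  # assumption: 128-bit circular ID space (Chord-style)


def resources_to_migrate(stored_ids: Iterable[int], old_predecessor: int,
                         new_peer: int) -> List[int]:
    """Run on the successor when new_peer joins between old_predecessor and it.

    Resource-IDs in (old_predecessor, new_peer] are now the new peer's
    responsibility and should be sent to it with Store requests.
    """
    span = (new_peer - old_predecessor) % RING
    return sorted(rid for rid in stored_ids
                  if 0 < (rid - old_predecessor) % RING <= span)
```

Measuring clockwise distance from the old predecessor with modular arithmetic handles intervals that wrap past zero without a special case.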

467	1.2.5.  Forwarding Layer

469	   The Forwarding Layer is responsible for getting a packet to the next
470	   peer, as determined by the Routing and Storage Layer.  The Forwarding
471	   Layer establishes and maintains the network connections as required
472	   by the Topology Plugin.  This layer is also responsible for setting
473	   up connections to other peers through NATs and firewalls using ICE,
474	   and it can elect to forward traffic using relays for NAT and firewall
475	   traversal.

477	   The Forwarding Layer sits on top of transport layer protocols which
478	   carry the actual traffic.  This specification defines how to use DTLS
479	   and TLS to carry RELOAD messages.

481	1.3.  SIP Usage

483	   The SIP Usage of RELOAD allows SIP user agents to provide a peer-to-
484	   peer telephony service without the requirement for permanent proxy or
485	   registration servers.  In such a network, the RELOAD overlay itself
486	   performs the registration and rendezvous functions ordinarily
487	   associated with such servers.

489	   The SIP Usage involves two basic functions:
490	   Registration:    SIP UAs can use the RELOAD data storage
491	      functionality to store a mapping from their AOR to their Node-Id
492	      in the overlay, and to retrieve the Node-Id of other UAs.
493	   Rendezvous:    Once a SIP UA has identified the Node-Id for an AOR it
494	      wishes to call, it can use the RELOAD message routing system to
495	      set up a direct connection which can be used to exchange SIP
496	      messages.

498	   For instance, Bob could register his Node-Id, "1234", under his AOR,
499	   "sip:bob@dht.example.com".  When Alice wants to call Bob, she queries
500	   the overlay for "sip:bob@dht.example.com" and gets back Node-Id 1234.
501	   She then uses the overlay to establish a direct connection with Bob
502	   and can use that direct connection to perform a standard SIP INVITE.

504	1.4.  Security

506	   RELOAD's security model is based on each node having one or more
507	   public key certificates.  In general, these certificates will be
508	   assigned by a central server which also assigns Node-Ids, although
509	   self-signed certificates can be used in closed networks.  These
510	   credentials can be leveraged to provide communications security for
511	   RELOAD messages.  RELOAD provides communications security at three
512	   levels:

514	   Connection Level:    Connections between peers are secured with TLS
515	      or DTLS.
516	   Message Level:    Each RELOAD message must be signed.
517	   Object Level:    Stored objects must be signed by the storing peer.

519	   These three levels of security work together to allow peers to verify
520	   the origin and correctness of data they receive from other peers,
521	   even in the face of malicious activity by other peers in the overlay.
522	   RELOAD also provides access control built on top of these
523	   communications security features.  Because the peer responsible for
524	   storing a piece of data can validate the signature on the data being
525	   stored, the responsible peer can determine whether a given operation
526	   is permitted or not.

   RELOAD also provides an admission control feature based on shared
   secrets and TLS-PSK.  In order to form a TLS connection to any node
   in the overlay, a new node needs to know the shared overlay key, thus
   restricting access to authorized users.

533	1.5.  Structure of This Document

535	   The remainder of this document is structured as follows.

537	   o  Section 2 provides definitions of terms used in this document.
538	   o  Section 3 provides an overview of the mechanisms used to establish
539	      and maintain the overlay.
540	   o  Section 4 provides an overview of the mechanism RELOAD provides to
541	      support other applications.
542	   o  Section 5 provides an overview of the SIP usage for RELOAD.
543	   o  Section 6 defines the protocol messages that RELOAD uses to
544	      establish and maintain the overlay.
545	   o  Section 7 defines the protocol messages that are used to store and
546	      retrieve data using RELOAD.
547	   o  Sections 8-10 define three Usages of RELOAD that provide
548	      certificate storage, SIP, and Diagnostics.
   o  Section 11 defines a specific Topology Plugin using Chord.
551	   o  Section 12 defines the mechanisms that new RELOAD nodes use to
552	      join the overlay for the first time.
553	   o  Section 13 provides an extended example.
554	   o  Sections 14 and 15 provide Security and IANA considerations.

556	2.  Terminology

558	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
559	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
560	   document are to be interpreted as described in RFC 2119 [RFC2119].

562	   We use the terminology and definitions from the Concepts and
563	   Terminology for Peer to Peer SIP [I-D.ietf-p2psip-concepts] draft
564	   extensively in this document.  Other terms used in this document are
565	   defined inline when used and are also defined below for reference.
566	   Terms which are new to this document (and perhaps should be added to
567	   the concepts document) are marked with a (*).

569	   DHT:  A distributed hash table.  A DHT is an abstract hash table
570	      service realized by storing the contents of the hash table across
571	      a set of peers.

573	   Overlay Algorithm:  An overlay algorithm defines the rules for
574	      determining which peers in an overlay store a particular piece of
575	      data and for determining a topology of interconnections amongst
576	      peers in order to find a piece of data.

578	   Overlay Instance:  A specific overlay algorithm and the collection of
579	      peers that are collaborating to provide read and write access to
580	      it.  There can be any number of overlay instances running in an IP
      network at a time, and each operates in isolation from the others.

583	   Peer:  A host that is participating in the overlay.  Peers are
584	      responsible for holding some portion of the data that has been
585	      stored in the overlay and also route messages on behalf of other
586	      hosts as required by the Overlay Algorithm.

588	   Client:  A host that is able to store data in and retrieve data from
589	      the overlay but which is not participating in routing or data
590	      storage for the overlay.

592	   Node:  We use the term "Node" to refer to a host that may be either a
593	      Peer or a Client.  Because RELOAD uses the same protocol for both
594	      clients and peers, much of the text applies equally to both.
595	      Therefore we use "Node" when the text applies to both Clients and
596	      Peers and the more specific term when the text applies only to
597	      Clients or only to Peers.

599	   Node-ID:  A 128-bit value that uniquely identifies a node.  Node-IDs
600	      0 and 2^128 - 1 are reserved and are invalid Node-IDs.  A value of
601	      zero is not used in the wire protocol but can be used to indicate
602	      an invalid node in implementations and APIs.  The Node-ID of
603	      2^128-1 is used on the wire protocol as a wildcard. (*)

   Resource:  An object or group of objects associated with a string
      identifier; see "Resource Name" below.

   Resource Name:  The (potentially) human readable name by which a
      resource is identified.  In unstructured P2P networks, the
      resource name is used directly as a Resource-ID.  In structured
      P2P networks the resource name can be mapped into a Resource-ID by
      using the string as the input to a hash function.  A SIP resource,
      for example, is often identified by its AOR. (*)

616	   Resource-ID:  A value that identifies some resources and which is
617	      used as a key for storing and retrieving the resource.  Often this
618	      is not human friendly/readable.  One way to generate a Resource-ID
619	      is by applying a mapping function to some other unique name (e.g.,
620	      user name or service name) for the resource.  The Resource-ID is
621	      used by the distributed database algorithm to determine the peer
622	      or peers that are responsible for storing the data for the
623	      overlay.  In structured P2P networks, resource-IDs are generally
624	      fixed length and are formed by hashing the resource identifier.
625	      In unstructured networks, resource identifiers may be used
626	      directly as resource-IDs and may have variable length.

628	   Connection Table:  The set of peers to which a node is directly
629	      connected.  This includes nodes with which Connect handshakes have
630	      been done but which have not sent any Updates. (*)

632	   Routing Table:  The set of peers which a node can use to route
633	      overlay messages.  In general, these peers will all be on the
634	      connection table but not vice versa, because some peers will have
635	      Connected but not sent updates.  Peers may send messages directly
636	      to peers which are on the connection table but may only route
637	      messages to other peers through peers which are on the routing
638	      table. (*)

640	   Destination List:  A list of IDs through which a message is to be
641	      routed.  A single Node-ID is a trivial form of destination list.
642	      (*)

644	   Usage:  A usage is an application that wishes to use the overlay for
645	      some purpose.  Each application wishing to use the overlay defines
646	      a set of data kinds that it wishes to use.  The SIP usage defines
647	      the location, certificate, STUN server and TURN server data kinds.
648	      (*)
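The reserved Node-ID values and the two ways of forming a Resource-ID described above can be summarized in a short Python sketch.  The function names are hypothetical, and using SHA-1 truncated to 128 bits (the width of a Node-ID) for structured overlays is an illustrative assumption; the actual hash is fixed by the overlay algorithm.

```python
import hashlib

NODE_ID_BITS = 128
INVALID_NODE_ID = 0                       # API-only marker, never on the wire
WILDCARD_NODE_ID = 2 ** NODE_ID_BITS - 1  # reserved; wildcard on the wire


def is_assignable_node_id(node_id: int) -> bool:
    """A real node may use any 128-bit value except the two reserved ones."""
    return INVALID_NODE_ID < node_id < WILDCARD_NODE_ID


def resource_id(resource_name: str, structured: bool = True) -> bytes:
    """Map a Resource Name to a Resource-ID.

    Structured overlays: fixed-length hash of the name (assumed SHA-1,
    truncated to 128 bits).  Unstructured overlays: the name itself,
    so the length varies.
    """
    name = resource_name.encode("utf-8")
    if structured:
        return hashlib.sha1(name).digest()[: NODE_ID_BITS // 8]
    return name
```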

650	3.  Overlay Management Overview

652	   The most basic function of RELOAD is as a generic overlay network.
653	   Nodes need to be able to join the overlay, form connections to other
654	   nodes, and route messages through the overlay to nodes to which they
655	   are not directly connected.  This section provides an overview of the
656	   mechanisms that perform these functions.

658	3.1.  Security and Identification

660	   Every node in the RELOAD overlay is identified by one or more Node-
661	   IDs.  The Node-ID is used for three major purposes:

663	   o  To address the node itself.
664	   o  To determine its position in the overlay topology when the overlay
665	      is structured.
666	   o  To determine the set of resources for which the node is
667	      responsible.

669	   Each node has a certificate [RFC3280] containing one or more Node-
670	   IDs, which are globally unique.

672	   The certificate serves multiple purposes:

674	   o  It entitles the user to store data at specific locations in the
675	      Overlay Instance.  Each data kind defines the specific rules for
676	      determining which certificates can access each resource-ID/kind-id
677	      pair.  For instance, some kinds might allow anyone to write at a
678	      given location, whereas others might restrict writes to the owner
679	      of a single certificate.
680	   o  It entitles the user to operate a node that has a Node-ID found in
681	      the certificate.  When the node forms a connection to another
682	      peer, it can use this certificate so that a node connecting to it
683	      knows it is connected to the correct node.  In addition, the node
684	      can sign messages, thus providing integrity and authentication for
685	      messages which are sent from the node.
686	   o  It entitles the user to use the user name found in the
687	      certificate.

689	   If a user has more than one device, typically they would get one
690	   certificate for each device.  This allows each device to act as a
691	   separate peer.

   RELOAD supports two certificate issuance models.  The first is based
   on a central enrollment process which allocates a unique name and
   Node-ID to the node and issues a certificate for a public/private key
   pair for the user.  All peers in a particular Overlay Instance have
   the enrollment server as a trust anchor and so can verify any other
   peer's certificate.

700	   In some settings, a group of users want to set up an overlay network
701	   but are not concerned about attack by other users in the network.
702	   For instance, users on a LAN might want to set up a short term ad hoc
703	   network without going to the trouble of setting up an enrollment
704	   server.  RELOAD supports the use of self-generated and self-signed
705	   certificates.  When self-signed certificates are used, the node also
   generates its own Node-ID and username.  The Node-ID is computed as a
   digest of the public key to prevent Node-ID theft.  However, this
   model is still subject to a number of known attacks (most notably
   Sybil attacks [Sybil]) and can only be safely used in closed networks
   where users are mutually trusting.
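A minimal sketch of deriving a Node-ID from a public key, and of a peer re-checking that derivation, follows.  The digest (SHA-256 truncated to 128 bits) and the function names are assumptions; this section only states that the Node-ID is a digest of the public key.

```python
import hashlib


def node_id_from_public_key(public_key_der: bytes) -> int:
    # Assumption: SHA-256 truncated to 128 bits; the text only says "a digest".
    return int.from_bytes(hashlib.sha256(public_key_der).digest()[:16], "big")


def node_id_matches_key(claimed_node_id: int, public_key_der: bytes) -> bool:
    """Recompute the digest to check that a self-signed certificate's
    Node-ID really belongs to the key in that certificate."""
    return claimed_node_id == node_id_from_public_key(public_key_der)
```

The check stops a node from claiming a Node-ID bound to someone else's key, but anyone can still generate many key pairs and thus many Node-IDs, which is why the Sybil caveat above still applies.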

712	3.1.1.  Shared-Key Security

714	   RELOAD also provides an admission control system based on shared
715	   keys.  In this model, the peers all share a single key which is used
716	   to authenticate the peer-to-peer connections via TLS-PSK/TLS-SRP.

718	3.2.  Clients

720	   RELOAD defines a single protocol that is used both as the peer
721	   protocol and the client protocol for the overlay.  This simplifies
722	   implementation, particularly for devices that may act in either role,
723	   and allows clients to inject messages directly into the overlay.

725	   We use the term "peer" to identify a node in the overlay that routes
726	   messages for nodes other than those to which it is directly
727	   connected.  Peers typically also have storage responsibilities.  We
728	   use the term "client" to refer to nodes that do not have routing or
729	   storage responsibilities.  When text applies to both peers and
730	   clients, we will simply refer to such a device as a "node."

732	   RELOAD's client support allows nodes that are not participating in
733	   the overlay as peers to utilize the same implementation and to
734	   benefit from the same security mechanisms as the peers.  Clients
735	   possess and use certificates that authorize the user to store data at
736	   its locations in the overlay.  The Node-ID in the certificate is used
737	   to identify the particular client as a member of the overlay and to
738	   authenticate its messages.

740	   The remainder of this section discusses how RELOAD supports clients
741	   in terms of routing issues specific to clients, minimum functionality
742	   requirements for clients, and alternatives for devices not capable of
743	   meeting those requirements.

745	3.2.1.  Client Routing

747	   There are two routing options by which a client may be located in an
748	   overlay.

750	   o  Establish a connection to the peer responsible for the client's
751	      Node-ID in the overlay.  Then requests may be sent from/to the
752	      client using its Node-ID in the same manner as if it were a peer,
753	      because the responsible peer in the overlay will handle the final
754	      step of routing to the client.
755	   o  Establish a connection with an arbitrary peer in the overlay
756	      (perhaps based on network proximity or an inability to establish a
757	      direct connection with the responsible peer).  In this case, the
758	      client will rely on RELOAD's Destination List feature to ensure
759	      reachability.  The client can initiate requests, and any node in
760	      the overlay that knows the Destination List to its current
      location can reach it, but the client is not reachable using only
      its Node-ID.  The Destination List required to
763	      reach it must be learnable via other mechanisms, such as being
764	      stored in the overlay by a usage, if the client is to receive
765	      incoming requests from other members of the overlay.
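The second option relies on each hop consuming the front of the Destination List.  The Python sketch below models one hop's decision; the function name, integer IDs, and action labels are hypothetical.  A message for client C, attached to an arbitrary peer P, carries the list [P, C]: it is routed through the overlay toward P, P strips its own entry and sends directly to C, and C delivers it locally.

```python
from typing import List, Set, Tuple


def handle_destination_list(my_id: int, dest_list: List[int],
                            connections: Set[int]) -> Tuple[str, int, List[int]]:
    """One hop's processing of a Destination List.

    Returns (action, target, remaining_list), where action is
    "deliver", "send_direct" or "route_toward".
    """
    remaining = list(dest_list)
    if remaining and remaining[0] == my_id:
        remaining.pop(0)  # this entry has been reached; strip it
    if not remaining:
        return ("deliver", my_id, [])
    target = remaining[0]
    if target in connections:
        return ("send_direct", target, remaining)  # e.g. peer P -> its client
    return ("route_toward", target, remaining)     # normal overlay routing
```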

767	3.2.2.  Client Behavior

769	   There are a wide variety of reasons a node may act as a client rather
770	   than as a peer [I-D.pascual-p2psip-clients].  This section outlines
771	   some of those scenarios and how the client's behavior changes based
772	   on its capabilities.

774	3.2.2.1.  Why Not Only Peers?

776	   For a number of reasons, a particular node may be forced to act as a
777	   client even though it is willing to act as a peer.  These include:

779	   o  The node does not have appropriate network connectivity---
780	      typically because it is behind an overly restrictive NAT, or it
781	      has a low-bandwidth network connection.
782	   o  The node may not have sufficient resources, such as computing
      power, storage space, or battery power.
   o  The overlay algorithm may dictate specific requirements for peer
      selection.  These may include requiring participation in the
      overlay to establish trustworthiness, controlling the number of
      peers in the overlay to avoid overly long routing paths, or
      ensuring a minimum application uptime before a node can join as a
      peer.

791	   The ultimate criteria for a node to become a peer are determined by
792	   the overlay algorithm and specific deployment.  A node acting as a
793	   client that has a full implementation of RELOAD and the appropriate
794	   overlay algorithm is capable of locating its responsible peer in the
   overlay and using Connect to establish a direct connection to that
796	   peer.  In that way, it may elect to be reachable under either of the
797	   routing approaches listed above.  Particularly for overlay algorithms
798	   that elect nodes to serve as peers based on trustworthiness or
799	   population, the overlay algorithm may require such a client to locate
800	   itself at a particular place in the overlay.

802	3.2.2.2.  Minimum Functionality Requirements for Clients

804	   A node may act as a client simply because it does not have the
805	   resources or even an implementation of the topology plugin required
   to act as a peer in the overlay.  In order to exchange RELOAD
   messages with a peer, a client must meet a minimum level of
   functionality.  Such a client must:

   o  Implement RELOAD's connection-management operations that are used
      to establish the connection with the peer.
   o  Implement RELOAD's data storage and retrieval methods (with client
      functionality).
   o  Be able to calculate Resource-IDs used by the overlay.
   o  Possess the security credentials required by the overlay it is
      using.

818	   A client speaks the same protocol as the peers, knows how to
819	   calculate Resource-IDs, and signs its requests in the same manner as
820	   peers.  While a client does not necessarily require a full
821	   implementation of the overlay algorithm, calculating the Resource-ID
822	   requires an implementation of the appropriate algorithm for the
823	   overlay.

825	   RELOAD does not support a separate protocol for clients that do not
826	   meet these functionality requirements.  Any such extension would
827	   either entail compromises on the features of RELOAD or require an
828	   entirely new protocol to reimplement the core features of RELOAD.
829	   Furthermore, for P2PSIP and many other applications, a native
830	   application-level protocol already exists that is sufficient for such
831	   a client, as described in the next section.

833	3.2.2.3.  Clients as Application-Level Agents

835	   SIP defines an extensive protocol for registration and security
836	   between a client and its registrar/proxy server(s).  Any SIP device
837	   can act as a client of a RELOAD-based P2PSIP overlay if it contacts a
838	   peer that implements the server-side functionality required by the
839	   SIP protocol.  In this case, the peer would be acting as if it were
840	   the user's peer, and would need the appropriate credentials for that
841	   user.

843	   Application-level support for clients is defined by a usage.  A usage
844	   offering support for application-level clients should specify how the
845	   security of the system is maintained when the data is moved between
846	   the application and RELOAD layers.

848	3.3.  Routing

850	   This section will discuss the requirements RELOAD's routing
851	   capabilities must meet, then describe the routing features in the
852	   protocol, and provide a brief overview of how they are used.  The
853	   section will conclude by discussing some alternative designs and the
854	   tradeoffs that would be necessary to support them.

856	   RELOAD's routing capabilities must meet the following requirements:

858	   NAT Traversal:    RELOAD must support establishing and using
859	      connections between nodes separated by one or more NATs, including
860	      locating peers behind NATs for those overlays allowing/requiring
861	      it.
862	   Clients:    RELOAD must support requests from and to clients that do
863	      not participate in overlay routing.
864	   Client promotion:  RELOAD must support clients that become peers at a
865	      later point as determined by the overlay algorithm and deployment.
866	   Low state:    RELOAD's routing algorithms must not require
867	      significant state to be stored on intermediate peers.
   Return routability in unstable topologies:  At some points in time,
      different nodes may have inconsistent information about the
      connectivity of the routing graph.  In all cases, the response to
      a request needs to be delivered to the node that sent the request
      and not to some other node.

874	   To meet these requirements, RELOAD's routing relies on two basic
875	   mechanisms:

877	   Via Lists:    The forwarding header used by all RELOAD messages
878	      contains both a Via List (built hop-by-hop as the message is
879	      routed through the overlay) and a Destination List (providing
880	      source-routing capabilities for requests and return-path routing
881	      for responses).
882	   Route_Query:    The Route_Query method allows a node to query a peer
883	      for the next hop it will use to route a message.  This method is
884	      useful for diagnostics and for iterative routing.

886	   The basic routing mechanism used by RELOAD is Symmetric Recursive.
887	   We will first describe symmetric routing and then discuss its
888	   advantages in terms of the requirements discussed above.

   Symmetric recursive routing requires that a message follow the path
   through the overlay to the destination without returning to the
892	   originating node:  each peer forwards the message closer to its
893	   destination.  The return path of the response is then the same path
894	   followed in reverse.  For example, a message following a route from A
895	   to Z through B and X:

897	   A         B         X         Z
898	   -------------------------------

900	   ---------->
901	   Dest=Z
902	             ---------->
903	             Via=A
904	             Dest=Z
905	                       ---------->
906	                       Via=A, B
907	                       Dest=Z

909	                       <----------
910	                      Dest=X, B, A
911	             <----------
912	               Dest=B, A
913	   <----------
914	        Dest=A

916	   Note that the preceding Figure does not indicate whether A is a
917	   client or peer---A forwards its request to B and the response is
918	   returned to A in the same manner regardless of A's role in the
919	   overlay.
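The first figure can be reproduced with a short Python simulation.  It assumes the simplest policy shown there: each forwarding peer appends its previous hop to the Via List, and the responder builds the return Destination List from its previous hop followed by the Via List in reverse.  The function name and the tuple format of the trace are hypothetical.

```python
from typing import List, Tuple

Hop = Tuple[str, str, str, List[str]]  # (sender, receiver, kind, list carried)


def symmetric_recursive(path: List[str]) -> List[Hop]:
    trace: List[Hop] = []
    via: List[str] = []
    # Request: each forwarder (not the originator) records where it came from.
    for i in range(len(path) - 1):
        if i > 0:
            via.append(path[i - 1])
        trace.append((path[i], path[i + 1], "request", list(via)))
    # Response: previous hop, then the Via List reversed, as the Destination List.
    dest = [path[-2]] + via[::-1]
    current = path[-1]
    while dest:
        nxt = dest[0]
        trace.append((current, nxt, "response", list(dest)))
        current, dest = nxt, dest[1:]  # each receiver strips its own entry
    return trace


for hop in symmetric_recursive(["A", "B", "X", "Z"]):
    print(hop)
```

No intermediate peer keeps any state: everything needed to return the response travels in the message itself.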

921	   This figure shows use of full via-lists by intermediate peers B and
922	   X. However, if B and/or X are willing to store state, then they may
923	   elect to truncate the lists, save that information internally (keyed
924	   by the transaction id), and return the response message along the
925	   path from which it was received when the response is received.  This
926	   option requires greater state on intermediate peers but saves a small
927	   amount of bandwidth and reduces the need for modifying the message
   en route.  Selection of this mode of operation is a choice for the
929	   individual peer---the techniques are mutually interoperable even on a
930	   single message.  The Figure below shows B using full via lists but X
931	   truncating them and saving the state internally.

933	   A         B         X         Z
934	   -------------------------------

936	   ---------->
937	   Dest=Z
938	             ---------->
939	             Via=A
940	             Dest=Z
941	                       ---------->
942	                       Dest=Z

944	                       <----------
945	                            Dest=X
946	               <----------
947	               Dest=B, A
948	   <----------
949	        Dest=A
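The behavior of X in the second figure (truncating the Via List and keeping the return path keyed by transaction id) might be sketched as follows.  `TruncatingPeer` and its method names are hypothetical; a real implementation would also need to expire entries for transactions whose responses never arrive.

```python
from typing import Dict, List


class TruncatingPeer:
    """Intermediate peer that stores return-path state instead of
    extending the Via List."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.pending: Dict[int, List[str]] = {}  # txn id -> return Dest List

    def on_request(self, txn_id: int, via: List[str], prev_hop: str) -> List[str]:
        # Remember how to get back: previous hop first, then Via List reversed.
        self.pending[txn_id] = [prev_hop] + via[::-1]
        return []  # Via List carried onward (truncated to empty)

    def on_response(self, txn_id: int, dest: List[str]) -> List[str]:
        if dest != [self.node_id]:
            raise ValueError("response not addressed to this peer")
        # Restore the saved path; its first entry is the next hop.
        return self.pending.pop(txn_id)
```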

951	   For debugging purposes, a Route Log attribute is available that
952	   stores information about each peer as the message is forwarded.

954	   RELOAD also supports a basic Iterative routing mode (where the
955	   intermediate peers merely return a response indicating the next hop,
956	   but do not actually forward the message to that next hop themselves).
957	   Iterative routing is implemented using the Route_Query method, which
   requests this behavior.  Note that iterative routing is selected only
   by the initiating node.  RELOAD does not allow an intermediate peer
   to answer a normal request by declining to route it recursively; the
   willingness to perform that operation is implicit in its role as a
   peer in the overlay.
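Iterative routing moves the forwarding loop into the initiating node.  In the Python sketch below, `route_query` is a hypothetical callable standing in for sending a Route_Query request to a peer and reading back its next hop (or `None` when that peer is responsible for the destination); the hop limit is an assumption added to stop routing loops.

```python
from typing import Callable, List, Optional


def iterative_route(first_hop: str, dest: str,
                    route_query: Callable[[str, str], Optional[str]],
                    max_hops: int = 32) -> List[str]:
    """Ask each peer in turn for its next hop until one is responsible."""
    path = [first_hop]
    current = first_hop
    for _ in range(max_hops):
        nxt = route_query(current, dest)  # one Route_Query round trip
        if nxt is None:                   # current peer is responsible
            return path
        path.append(nxt)
        current = nxt
    raise RuntimeError("hop limit exceeded; possible routing loop")


hops = {"B": "X", "X": "Z", "Z": None}
print(iterative_route("B", "Z", lambda peer, d: hops[peer]))  # ['B', 'X', 'Z']
```

Because the initiator sees every answer, it can log the path or try an alternative when a peer misbehaves; behind NATs, however, each query implies a new ICE-negotiated connection.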

964	3.3.1.  Routing Alternatives

966	   Significant discussion has been focused on the selection of a routing
967	   algorithm for P2PSIP.  This section discusses the motivations for
968	   selection of symmetric recursive routing for RELOAD and describes the
969	   extensions that would be required to support additional routing
970	   algorithms.

972	3.3.1.1.  Iterative vs Recursive

974	   Iterative routing has a number of advantages.  It is easier to debug,
975	   consumes fewer resources on intermediate peers, and allows the
976	   querying peer to identify and route around misbehaving peers
977	   [stoica-non-transitive-worlds05].  However, in the presence of NATs
978	   iterative routing is intolerably expensive because a new connection
979	   must be established for each hop (using ICE) [bryan-design-hotp2p08].

981	   Iterative routing is supported through the Route_Query mechanism and
982	   is primarily intended for debugging.  It is also the most reliable
   technique in the presence of non-transitive connectivity because the
984	   querying peer can evaluate the routing decisions made by the peers at
985	   each hop, consider alternatives, and detect at what point the
986	   forwarding path fails.  An algorithm to implement this approach is
987	   beyond the scope of this draft.

989	3.3.1.2.  Symmetric vs Forward response

991	   An alternative to the symmetric recursive routing method used by
992	   RELOAD is Forward-Only routing, where the response is routed to the
   requester as if it were a new message initiated by the responder (in
994	   the previous example, Z sends the response to A as if it were sending
995	   a request).  Forward-only routing requires no state in either the
996	   message or intermediate peers.

998	   The drawback of forward-only routing is that it does not work when
999	   the overlay is unstable.  For example, if A is in the process of
1000	   joining the overlay and is sending a Join request to Z, it is not yet
1001	   reachable via forward routing.  Even if it is established in the
1002	   overlay, if network failures produce temporary instability, A may not
1003	   be reachable (and may be trying to stabilize its network connectivity
1004	   via Connect messages).

1006	   Furthermore, forward-only responses are less likely to reach the
1007	   querying peer than symmetric recursive because the forward path is
1008	   more likely to have a failed peer than the request path (which was
1009	   just tested to route the request) [stoica-non-transitive-worlds05].

1011	   An extension to RELOAD that supports forward-only routing but relies
1012	   on symmetric responses as a fallback would be possible, but due to
1013	   the complexities of determining when to use forward-only and when to
   fall back to symmetric, we have chosen not to include it as an option
1015	   at this point.

3.3.1.3.  Direct Response

   Another routing option is Direct Response routing, in which the
   response is returned directly to the querying node.  In the previous
   example, if A encodes its IP address in the request, then Z can
   simply deliver the response directly to A.  In the absence of NATs or
   other connectivity issues, this is the optimal routing technique.

   The challenge of implementing direct response is the presence of
   NATs.  There are a number of complexities that must be addressed.  In
   this discussion, we will continue our assumption that A issued the
   request and Z is generating the response.

   o  The IP address listed by A may be unreachable, either due to NAT
      or firewall rules.  Therefore, a direct response technique must
      fall back to symmetric response [stoica-non-transitive-worlds05].
      The hop-by-hop ACKs used by RELOAD allow Z to determine when A has
      received the message (and the TLS negotiation will provide earlier
      confirmation that A is reachable), but this fallback requires a
      timeout that will increase the response latency whenever A is not
      reachable from Z.
   o  Whenever A is behind a NAT, it will have multiple candidate IP
      addresses, each of which must be advertised to ensure
      connectivity; therefore, Z will need to attempt multiple
      connections to deliver the response.
   o  One (or all) of A's candidate addresses may route from Z to a
      different device on the Internet.  In the worst case, these nodes
      may actually be running RELOAD on the same port.  Therefore,
      establishing a secure connection to authenticate A before
      delivering the response is absolutely necessary.  This step
      diminishes the efficiency of direct response because multiple
      round trips are required before the message can be delivered.
   o  If A is behind a NAT and does not have a connection already
      established with Z, there are only two ways the direct response
      will work.  The first is that A and Z are both behind the same
      NAT, in which case the NAT is not involved.  In the more common
      case, when Z is outside A's NAT, the response will only be
      received if A's NAT implements endpoint-independent filtering.  As
      the choice of filtering mode conflates application transparency
      with security [RFC4787], and no clear recommendation is available,
      the prevalence of this feature in future devices remains unclear.

   An extension to RELOAD that supports direct response routing but
   relies on symmetric responses as a fallback would be possible, but
   due to the complexities of determining when to use direct response
   and when to fall back to symmetric, and the reduced performance for
   responses to peers behind restrictive NATs, we have chosen not to
   include it as an option at this point.

3.3.1.4.  Relay Peers

   SEP [I-D.jiang-p2psip-sep] has proposed implementing a form of direct
   response by having A identify a peer, Q, that will be directly
   reachable by any other peer.  A uses Connect to establish a
   connection with Q and advertises Q's IP address in the request sent
   to Z.  Z sends the response to Q, which relays it to A.  This
   reduces the latency to two hops, plus the time for Z to negotiate a
   secure connection to Q.

   This technique relies on the relative population of nodes such as A
   that require relay peers and peers such as Q that are capable of
   serving as a relay peer.  It also requires nodes to be able to
   identify which category they are in.  This identification problem has
   turned out to be hard to solve and is still an open area of
   exploration.

   An extension to RELOAD that supports relay peers is possible, but due
   to the complexities of implementing such an alternative, we have not
   added such a feature to RELOAD at this point.

   A concept similar to relay peers, essentially choosing a relay peer
   at random, has previously been suggested to solve problems of
   pairwise non-transitivity [stoica-non-transitive-worlds05], but the
   deterministic filtering provided by NATs makes random relay peers no
   more likely to work than the responding peer.

3.3.1.5.  Symmetric Route Stability

   A common concern about symmetric recursive routing has been that one
   or more peers along the request path may fail before the response is
   received.  The significance of this problem essentially depends on
   the response latency of the overlay: an overlay that produces slow
   responses will be vulnerable to churn, whereas responses that are
   delivered very quickly are vulnerable only to failures that occur
   over that small interval.

   The other aspect of this issue is whether the request itself can be
   successfully delivered.  Assuming typical connection maintenance
   intervals, the time period between the last maintenance and the
   request being sent will be orders of magnitude greater than the delay
   between the request being forwarded and the response being received.
   Therefore, if the path was stable enough to be available to route the
   request, it is almost certainly going to remain available to route
   the response.

   An overlay that is unstable enough to suffer this type of failure
   frequently is unlikely to be able to support reliable functionality
   regardless of the routing mechanism.  However, independent of the
   stability of the return path, studies show that in the event of high
   churn, iterative routing is a better solution to ensure request
   completion [ng-analytical-churn-ieeep2p06]
   [stoica-non-transitive-worlds05].

   Finally, because RELOAD retries the end-to-end request, that retry
   will address the remaining issues of churn.

3.4.  Connectivity Management

   In order to provide efficient routing, a peer needs to maintain a set
   of direct connections to other peers in the Overlay Instance.  Due to
   the presence of NATs, these connections often cannot be formed
   directly.  Instead, we use the Connect request to establish a
   connection.  Connect uses ICE [I-D.ietf-mmusic-ice-tcp] to establish
   the connection.  It is assumed that the reader is familiar with ICE.

   Say that peer A wishes to form a direct connection to peer B.  It
   gathers ICE candidates and packages them up in a Connect request
   which it sends to B through the usual overlay routing procedures.  B
   does its own candidate gathering and sends back a response with its
   candidates.  A and B then do ICE connectivity checks on the candidate
   pairs.  The result is a connection between A and B.  At this point, A
   and B can add each other to their routing tables and send messages
   directly between themselves without going through other overlay
   peers.
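   The candidate-pair step can be made concrete with a small sketch.
   It assumes that A takes the ICE controlling role and uses the
   candidate-priority and pair-priority formulas from the ICE
   specification (RFC 5245); the names Candidate, pair_priority and
   check_list are hypothetical, and address-family matching, pair
   pruning and the check state machine are omitted.

```python
from dataclasses import dataclass
from itertools import product

# Recommended type preferences from the ICE specification.
TYPE_PREF = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    address: str
    kind: str                # "host", "srflx", "prflx" or "relay"
    local_pref: int = 65535  # assumption: a single-homed agent
    component: int = 1

    @property
    def priority(self) -> int:
        # ICE candidate priority formula.
        return ((2**24) * TYPE_PREF[self.kind]
                + (2**8) * self.local_pref
                + (256 - self.component))


def pair_priority(g: int, d: int) -> int:
    """g: controlling agent's candidate priority; d: controlled agent's."""
    return 2**32 * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def check_list(controlling: list, controlled: list) -> list:
    """Pair candidates of the same component and order the pairs for
    connectivity checks, highest pair priority first."""
    scored = [
        (pair_priority(a.priority, b.priority), a.address, b.address)
        for a, b in product(controlling, controlled)
        if a.component == b.component
    ]
    scored.sort(reverse=True)
    return [(local, remote) for _, local, remote in scored]


a = [Candidate("192.0.2.10", "host"), Candidate("198.51.100.7", "srflx")]
b = [Candidate("203.0.113.5", "host")]
print(check_list(a, b))
# [('192.0.2.10', '203.0.113.5'), ('198.51.100.7', '203.0.113.5')]
```

   Because a pair's priority is dominated by the lower of its two
   candidate priorities, host-to-host pairs are checked before pairs
   that involve a server-reflexive address.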

   There is one special case in which Connect cannot be used:  when a
   peer is joining the overlay and is not connected to any peers.  In
   order to support this case, some small number of "bootstrap nodes"
   need to be publicly accessible so that new peers can directly connect
   to them.  Section 13 contains more detail on this.

   In general, a peer needs to maintain connections to all of the peers
   near it in the Overlay Instance and to enough other peers to have
   efficient routing (the details depend on the specific overlay).  If a
   peer cannot form a connection to some other peer, this isn't
   necessarily a disaster; overlays can route correctly even without
   fully connected links.  However, a peer should try to maintain the
   specified link set and, if it detects that it has fewer direct
   connections, should form more as required.  This also implies that
   peers need to periodically verify that the connected peers are still
   alive and, if not, try to reform the connection or form an alternate
   one.

3.5.  Overlay Algorithm Support

   The Topology Plugin allows RELOAD to support a variety of overlay
   algorithms.  This draft defines a DHT based on Chord [Chord], which
   is mandatory to implement, but the base RELOAD protocol is designed
   to support a variety of overlay algorithms.

3.5.1.  Support for Pluggable Overlay Algorithms

   RELOAD defines three methods for overlay maintenance:  Join, Update,
   and Leave.  However, the contents of those messages, when they are
   sent, and their precise semantics are specified by the actual overlay
   algorithm; RELOAD merely provides a framework of commonly-needed
   methods that provides uniformity of notation (and ease of debugging)
   for a variety of overlay algorithms.

3.5.2.  Joining, Leaving, and Maintenance Overview

   When a new peer wishes to join the Overlay Instance, it must have a
   Node-ID that it is allowed to use.  It uses one of the Node-IDs in
   the certificate it received from the enrollment server.  The details
   of the joining procedure are defined by the overlay algorithm, but
   the general steps for joining an Overlay Instance are:

   o  Forming connections to some other peers.
   o  Acquiring the data values this peer is responsible for storing.
   o  Informing the other peers which were previously responsible for
      that data that this peer has taken over responsibility.

   The first thing the peer needs to do is form a connection to some
   "bootstrap node".  Because this is the first connection the peer
   makes, these nodes must have public IP addresses so that they can be
   connected to directly.  Once a peer has connected to one or more
   bootstrap nodes, it can form connections in the usual way by routing
   Connect messages through the overlay to other nodes.  Once a peer has
   connected to the overlay for the first time, it can cache the set of
   nodes it has connected to with public IP addresses for use as future
   bootstrap nodes.

   Once the peer has connected to a bootstrap node, it then needs to
   take up its appropriate place in the overlay.  This requires two
   major operations:

   o  Forming connections to other peers in the overlay to populate its
      Routing Table.
   o  Getting a copy of the data it is now responsible for storing and
      assuming responsibility for that data.

   The second operation is performed by contacting the Admitting Peer
   (AP), the node which is currently responsible for that section of the
   overlay.

   The details of this operation depend mostly on the overlay algorithm
   involved, but a typical case would be:

   1.  JP (Joining Peer) sends a Join request to AP (Admitting Peer)
       announcing its intention to join.
   2.  AP sends a Join response.
   3.  AP does a sequence of Stores to JP to give it the data it will
       need.
   4.  AP does Updates to JP and to other peers to tell them about its
       own routing table.  At this point, both JP and AP consider JP
       responsible for some section of the Overlay Instance.
   5.  JP makes its own connections to the appropriate peers in the
       Overlay Instance.

   After this process is completed, JP is a full member of the Overlay
   Instance and can process Store/Fetch requests.
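   For a Chord-like overlay, step 3 amounts to AP handing over the part
   of its range that now belongs to JP.  The sketch below assumes that
   a peer is responsible for the ring interval (predecessor, self], that
   AP is JP's successor, and a toy 16-bit ID space; in_interval and
   split_for_join are hypothetical names, not RELOAD's Chord procedures.

```python
ID_BITS = 16          # toy ID space; real Node-IDs are much longer
RING = 2 ** ID_BITS


def in_interval(x: int, start: int, end: int) -> bool:
    """True if x lies in the half-open ring interval (start, end]."""
    x, start, end = x % RING, start % RING, end % RING
    if start < end:
        return start < x <= end
    return x > start or x <= end  # wraps past zero (or covers the ring)


def split_for_join(store: dict, ap_pred: int, jp: int):
    """AP is JP's successor: JP takes over (ap_pred, jp], AP keeps the rest."""
    to_jp = {rid: v for rid, v in store.items()
             if in_interval(rid, ap_pred, jp)}
    kept = {rid: v for rid, v in store.items() if rid not in to_jp}
    return to_jp, kept


# AP (ID 40000) with predecessor 10000 holds three Resource-IDs; JP = 30000.
ap_store = {12000: "a", 25000: "b", 39000: "c"}
to_jp, kept = split_for_join(ap_store, ap_pred=10000, jp=30000)
print(sorted(to_jp), sorted(kept))  # [12000, 25000] [39000]
```

   The entries in to_jp are what AP would send to JP as Stores; after
   the Updates, AP remains responsible only for (jp, AP].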

3.6.  First-Time Setup

   Previous sections addressed how RELOAD works once a node has
   connected.  This section provides an overview of how users get
   connected to the overlay for the first time.  RELOAD is designed so
   that users can start with the name of the overlay they wish to join
   and perhaps a username and password, and leverage that into having a
   working peer with minimal user intervention.  This helps avoid the
   problems that have been experienced with conventional SIP clients
   where users are required to manually configure a large number of
   settings.

3.6.1.  Initial Configuration

   In the first phase of the process, the user starts out with the name
   of the overlay and uses this to download an initial set of overlay
   configuration parameters.  The user does a DNS SRV lookup on the
   overlay name to get the address of a configuration server.  It can
   then connect to this server with HTTPS to download a configuration
   document which contains the basic overlay configuration parameters as
   well as a set of bootstrap nodes which can be used to join the
   overlay.

3.6.2.  Enrollment

   If the overlay is using certificate enrollment, then a user needs to
   acquire a certificate before joining the overlay.  The certificate
   attests both to the user's name within the overlay and to the
   Node-IDs which they are permitted to operate.  In that case, the
   configuration document will contain the address of an enrollment
   server which can be used to obtain such a certificate.  The
   enrollment server may (and probably will) require some sort of
   username and password before issuing the certificate.  The enrollment
   server's ability to restrict attackers' access to certificates in the
   overlay is one of the cornerstones of RELOAD's security.

4.  Application Support Overview

   RELOAD is not intended to be used alone, but rather as a substrate
   for other applications.  These applications can use RELOAD for a
   variety of purposes:

   o  To store data in the overlay and retrieve data stored by other
      nodes.
   o  As a discovery mechanism for services such as TURN.
   o  To form direct connections which can be used to transmit
      application-level messages.

   This section provides an overview of these services.

4.1.  Data Storage

   RELOAD provides operations to Store, Fetch, and Remove data.  Each
   location in the Overlay Instance is referenced by a Resource-ID.
   However, each location may contain data elements corresponding to
   multiple kinds (e.g., certificate, SIP registration).  Similarly,
   there may be multiple elements of a given kind, as shown below:

                       +--------------------------------+
                       |            Resource-ID         |
                       |                                |
                       | +------------+  +------------+ |
                       | |   Kind 1   |  |   Kind 2   | |
                       | |            |  |            | |
                       | | +--------+ |  | +--------+ | |
                       | | | Value  | |  | | Value  | | |
                       | | +--------+ |  | +--------+ | |
                       | |            |  |            | |
                       | | +--------+ |  | +--------+ | |
                       | | | Value  | |  | | Value  | | |
                       | | +--------+ |  | +--------+ | |
                       | |            |  +------------+ |
                       | | +--------+ |                 |
                       | | | Value  | |                 |
                       | | +--------+ |                 |
                       | +------------+                 |
                       +--------------------------------+

   Each kind is identified by a kind-id, which is a code point assigned
   by IANA.  As part of the kind definition, protocol designers may
   define constraints, such as limits on size, on the values which may
   be stored.  For many kinds, the set may be restricted to a single
   value; some sets may be allowed to contain multiple identical items
   while others may only have unique items.  Note that a kind may be
   employed by multiple usages and new usages are encouraged to use
   previously defined kinds where possible.  We define the following
   data models in this document, though other usages can define their
   own structures:

   single value:  There can be at most one item in the set and any value
      overwrites the previous item.

   array:  Many values can be stored and addressed by a numeric index.

   dictionary:  The values stored are indexed by a key.  Often this key
      is one of the values from the certificate of the peer sending the
      Store request.
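   The three data models can be pictured as in-memory containers.  This
   is a minimal sketch only: the class and method names are
   hypothetical, RELOAD defines the models' semantics rather than any
   programming interface, and the kind-ids 1 and 2 in the example are
   made up.

```python
class SingleValue:
    """At most one item; each store overwrites the previous one."""

    def __init__(self):
        self.value = None

    def store(self, value):
        self.value = value


class Array:
    """Values addressed by a numeric index (a dict keeps the sketch short)."""

    def __init__(self):
        self.items = {}

    def store(self, index: int, value):
        self.items[index] = value

    def fetch(self, index: int):
        return self.items.get(index)


class Dictionary:
    """Values indexed by a key, e.g. a value from the writer's certificate."""

    def __init__(self):
        self.items = {}

    def store(self, key: str, value):
        self.items[key] = value

    def fetch(self, key: str):
        return self.items.get(key)


# One Resource-ID location holds one container per kind (kind-ids made up).
location = {1: SingleValue(), 2: Dictionary()}
location[1].store("first")
location[1].store("second")  # overwrites "first"
location[2].store("alice@example.org", "value-a")
print(location[1].value, location[2].fetch("alice@example.org"))
# second value-a
```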

   In order to protect stored data from tampering by other nodes, each
   stored value is digitally signed by the node which created it.  When
   a value is retrieved, the digital signature can be verified to detect
   tampering.

4.1.1.  Storage Permissions

   A major issue in peer-to-peer storage networks is minimizing the
   burden of becoming a peer, and in particular minimizing the amount of
   data which any peer is required to store for other nodes.  RELOAD
   addresses this issue by only allowing any given node to store data at
   a small number of locations in the overlay, with those locations
   being determined by the node's certificate.  When a peer uses a Store
   request to place data at a location authorized by its certificate, it
   signs that data with the private key that corresponds to its
   certificate.  Then the peer storing the data is able to verify that
   the peer issuing the request is authorized to make that request.
   Each data kind defines the exact rules for determining what
   certificate is appropriate.

   The most natural rule is that a certificate authorizes a user to
   store data keyed with their user name X.  This rule is used for all
   the kinds defined in this specification.  Thus, only a user with a
   certificate for "alice@example.org" could write to that location in
   the overlay.  However, other usages can define any rules they choose,
   including publicly writable values.
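   The user-name rule reduces to a simple check at the storing peer.
   The sketch below assumes that the certificate has already been
   validated and the signature verified (both omitted), and it uses
   SHA-1 truncated to 128 bits purely as an illustrative hash; the real
   hash and ID length are set by the overlay.  resource_id and
   store_authorized are hypothetical names.

```python
import hashlib

ID_BYTES = 16  # assumption: 128-bit Resource-IDs


def resource_id(resource_name: str) -> bytes:
    # Assumption for illustration: SHA-1 truncated to ID_BYTES bytes.
    return hashlib.sha1(resource_name.encode("utf-8")).digest()[:ID_BYTES]


def store_authorized(cert_user_names: list, target: bytes) -> bool:
    """User-name rule: the target Resource-ID must be derived from a user
    name in the signer's certificate (signature checking omitted)."""
    return any(resource_id(name) == target for name in cert_user_names)


alice_cert_names = ["alice@example.org"]
print(store_authorized(alice_cert_names, resource_id("alice@example.org")))
# True
print(store_authorized(alice_cert_names, resource_id("bob@example.org")))
# False
```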

   The digital signature over the data serves two purposes.  First, it
   allows the peer responsible for storing the data to verify that this
   Store is authorized.  Second, it provides integrity for the data.
   The signature is saved along with the data value (or values) so that
   any reader can verify the integrity of the data.  Of course, the
   responsible peer can "lose" the value, but it cannot undetectably
   modify it.

   The size requirements of the data being stored in the overlay are
   variable.  For instance, a SIP AoR and voicemail differ widely in the
   storage size.  RELOAD leaves it to the Usage to address the size
   imbalance of various kinds.

4.1.2.  Usages

   By itself, the distributed storage layer just provides infrastructure
   on which applications are built.  In order to do anything useful, a
   usage must be defined.  Each Usage specifies several things:

   o  Registers kind-id code points for any kinds that the Usage
      defines.
   o  Defines the data structure for each of the kinds.
   o  Defines access control rules for each kind.
   o  Provides a size limit for each kind.
   o  Defines how to form the Resource Name that is hashed to produce
      the Resource-ID where each kind is stored.
   o  Describes how values will be merged after a network partition.
      Unless otherwise specified, the default merging rule is to act as
      if all the values that need to be merged were stored and that the
      order they were stored in corresponds to the stored time values
      associated with (and carried in) their values.  Because the stored
      time values are those associated with the peer which did the
      writing, clock skew is generally not an issue.  If two nodes on
      different partitions with skewed clocks write to the same
      location, this can create merge conflicts.  However, because
      RELOAD deliberately segregates storage so that data from different
      users and peers is stored in different locations, and a single
      peer will typically only be in a single network partition, this
      case will generally not arise.
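   A minimal sketch of the default merging rule follows, assuming a
   dictionary-style kind in which each value carries its key and the
   writer's storage time.  StoredValue and merge are hypothetical
   names, and resolving equal storage times by argument order is an
   arbitrary choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredValue:
    key: str           # dictionary key; single-value kinds use one fixed key
    value: str
    storage_time: int  # writer's clock, carried inside the value


def merge(*partitions):
    """Replay all values as if stored in storage_time order, so the latest
    value for each key wins (on equal times, the later argument wins)."""
    replay = sorted((v for part in partitions for v in part),
                    key=lambda v: v.storage_time)
    merged = {}
    for v in replay:
        merged[v.key] = v.value
    return merged


left = [StoredValue("sip", "node-1234", storage_time=100)]
right = [StoredValue("sip", "node-5678", storage_time=250),
         StoredValue("cert", "cert-v1", storage_time=90)]
print(merge(left, right))  # {'cert': 'cert-v1', 'sip': 'node-5678'}
```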

   The kinds defined by a usage may also be applied to other usages.
   However, a need for different parameters, such as different size
   limits, would imply the need to create a new kind.

4.1.3.  Replication

   Replication in P2P overlays can be used to provide:

   persistence:    if the responsible peer crashes and/or if the storing
      peer leaves the overlay
   security:    to guard against DoS attacks by the responsible peer or
      routing attacks to that responsible peer
   load balancing:    to balance the load of queries for popular
      resources.

   A variety of schemes are used in P2P overlays to achieve some of
   these goals.  Common techniques include replicating on neighbors of
   the responsible peer, randomly locating replicas around the overlay,
   or replicating along the path to the responsible peer.

   The core RELOAD specification does not specify a particular
   replication strategy.  Instead, the first level of replication
   strategy is determined by the overlay algorithm, which can base the
   replication strategy on its particular topology.  For example, Chord
   places replicas on successor peers, which will take over
   responsibility should the responsible peer fail [Chord].

   If additional replication is needed, for example if data persistence
   is particularly important for a particular usage, then that usage may
   specify additional replication, such as implementing random
   replications by inserting a different well-known constant into the
   Resource Name used to store each replicated copy of the resource.
   Such replication strategies can be added independent of the
   underlying algorithm, and their usage can be determined based on the
   needs of the particular usage.
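   The constant-insertion technique can be sketched as follows.  The
   constants and the SHA-1/128-bit hashing are assumptions for
   illustration, not values defined by RELOAD; replica_ids is a
   hypothetical name.  Because the hash output looks random, each copy
   lands at an unrelated point in the ID space, and any reader who
   knows the constants can compute the same Resource-IDs and fetch from
   them in parallel.

```python
import hashlib

# Hypothetical well-known constants; a usage would define its own.
REPLICA_CONSTANTS = ["", "-replica1", "-replica2"]


def replica_ids(resource_name: str, id_bytes: int = 16) -> list:
    """One Resource-ID per copy: hash the Resource Name with each constant
    appended (SHA-1 truncation is an assumption for illustration)."""
    return [hashlib.sha1((resource_name + c).encode("utf-8")).digest()[:id_bytes]
            for c in REPLICA_CONSTANTS]


ids = replica_ids("sip:bob@dht.example.com")
print([i.hex()[:8] for i in ids])  # three unrelated-looking prefixes
```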

4.2.  Service Discovery

   RELOAD does not currently define a generic service discovery
   algorithm as part of the base protocol.  A variety of service
   discovery algorithms can be implemented as extensions to the base
   protocol, such as ReDIR [opendht-sigcomm05].

4.3.  Application Connectivity

   There is no requirement that a RELOAD usage must use RELOAD's
   primitives for establishing its own communication if it already
   possesses its own means of establishing connections.  For example,
   one could design a RELOAD-based resource discovery protocol which
   used HTTP to retrieve the actual data.

   For more common situations, however, where the overlay itself is used
   to establish a connection rather than an external authority such as
   DNS, RELOAD provides connectivity to applications using the same
   Connect method as is used for overlay maintenance.  For example, if a
   P2PSIP node wishes to establish a SIP dialog with another P2PSIP
   node, it will use Connect to establish a direct connection with the
   other node.  This new connection is separate from the peer protocol
   connection; it is a dedicated UDP or TCP flow used only for the SIP
   dialog.  Each usage specifies which types of connections can be
   initiated using Connect.

5.  P2PSIP Integration Overview

   The SIP Usage of RELOAD allows SIP user agents to provide a
   peer-to-peer telephony service without the requirement for permanent
   proxy or registration servers.  In such a network, the RELOAD overlay
   itself performs the registration and rendezvous functions ordinarily
   associated with such servers.

   The basic function of the SIP usage is to allow Alice to start with a
   SIP URI (e.g., "bob@dht.example.com") and end up with a connection
   which Alice's SIP UA can use to pass SIP messages back and forth to
   Bob's SIP UA.  The way this works is as follows:

   1.  Bob, operating Node-ID 1234, stores a mapping from his URI to his
       Node-ID in the overlay, i.e., "sip:bob@dht.example.com -> 1234".
   2.  Alice, operating Node-ID 5678, decides to call Bob.  She looks up
       "sip:bob@dht.example.com" in the overlay and retrieves "1234".
   3.  Alice uses the overlay to route a Connect message to Bob's peer.
       Bob responds with his own Connect and they set up a direct
       connection, as shown below.

   Alice       Peer1      Overlay     PeerN      Bob
   (5678)                                     (1234)
   -------------------------------------------------
   Connect ->
             Connect ->
                         Connect ->
                                      Connect ->
                                          <- Connect
                                   <- Connect
                      <- Connect
            <- Connect

   <------------------ ICE Checks ----------------->
   INVITE ----------------------------------------->
   <--------------------------------------------- OK
   ACK -------------------------------------------->
   <------------ ICE Checks for media ------------->
   <-------------------- RTP ---------------------->

   It is important to note that RELOAD's only role here is to set up the
   direct connection between Alice and Bob.  As soon as the ICE checks
   complete and the connection is established, ordinary SIP is used.
   In particular, the establishment of the media channel for the phone
   call happens via the usual SIP mechanisms, and RELOAD is not
   involved.  Media never goes over the overlay.  After the successful
   exchange of SIP messages, call peers run ICE connectivity checks for
   media.

   As well as allowing mappings from AORs to Node-IDs, the SIP Usage
   also allows mappings from AORs to other AORs.  For instance, if Bob
   wanted his phone calls temporarily forwarded to Charlie, he could
   store the mapping "sip:bob@dht.example.com ->
   sip:charlie@dht.example.com".  When Alice wants to call Bob, she
   retrieves this mapping and can then fetch Charlie's AOR to retrieve
   his Node-ID.
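   Alice's side of the lookup, including AOR-to-AOR forwarding, can be
   sketched as a loop.  resolve_aor and the fetch callback are
   hypothetical stand-ins for an overlay Fetch, and the hop limit and
   loop check are assumptions of this sketch rather than rules stated
   in the text.

```python
def resolve_aor(aor: str, fetch, max_hops: int = 5) -> str:
    """Follow AOR -> AOR mappings until an AOR maps to a Node-ID.

    fetch(aor) stands in for an overlay Fetch and returns ("node", node_id),
    ("aor", other_aor), or None if nothing is stored.
    """
    seen = set()
    for _ in range(max_hops):
        if aor in seen:
            raise LookupError(f"forwarding loop at {aor}")
        seen.add(aor)
        mapping = fetch(aor)
        if mapping is None:
            raise LookupError(f"no mapping stored for {aor}")
        kind, target = mapping
        if kind == "node":
            return target
        aor = target  # forwarded to another AOR: look that one up next
    raise LookupError("too many forwarding hops")


# Bob forwards to Charlie; "9012" is a hypothetical Node-ID for Charlie.
overlay = {
    "sip:bob@dht.example.com": ("aor", "sip:charlie@dht.example.com"),
    "sip:charlie@dht.example.com": ("node", "9012"),
}
print(resolve_aor("sip:bob@dht.example.com", overlay.get))  # 9012
```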

6.  Overlay Management Protocol

   This section defines the basic protocols used to create, maintain,
   and use the RELOAD overlay network.  We start by defining how
   messages are transmitted, received, and routed in an existing
   overlay, then define the message structure, and then finally define
   the messages used to join and maintain the overlay.

6.1.  Message Routing

   This section describes procedures used by nodes to route messages
   through the overlay.

6.1.1.  Request Origination

   In order to originate a message to a given Node-ID or resource-id, a
   node constructs an appropriate destination list.  The simplest such
   destination list is a single entry containing the Node-ID or
   resource-id.  The resulting message will use the normal overlay
   routing mechanisms to forward the message to that destination.  The
   node can also construct a more complicated destination list for
   source routing.

   Once the message is constructed, the node sends the message to some
   adjacent peer.  If the first entry on the destination list is
   directly connected, then the message MUST be routed down that
   connection.  Otherwise, the topology plugin MUST be consulted to
   determine the appropriate next hop.

   Parallel searches for the resource are a common solution to improve
   reliability in the face of churn or of subversive peers.  Parallel
   searches for usage-specified replicas are managed by the usage layer.
   However, a single request can also be routed through multiple
   adjacent peers, even when known to be sub-optimal, to improve
   reliability [vulnerabilities-acsac04].  Such parallel searches MAY be
   specified by the topology plugin.

   Because messages may be lost in transit through the overlay, RELOAD
   incorporates an end-to-end reliability mechanism.  When an
   originating node transmits a request, it MUST set a 3 second timer.
   If a response has not been received when the timer fires, the request
   is retransmitted with the same transaction identifier.  The request
   MAY be retransmitted up to 4 times (for a total of 5 messages).
   After the timer for the fifth transmission fires, the message SHALL
   be considered to have failed.  Note that this retransmission
   procedure is not followed by intermediate nodes.  They follow the
   hop-by-hop reliability procedure described in Section 6.4.1.2.
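   The originating node's retransmission loop can be sketched as below.
   send and wait_for_response are hypothetical hooks into a RELOAD
   stack, and the 64-bit transaction ID is an assumption of this
   sketch.  In the worst case a request is declared failed about
   5 x 3 = 15 seconds after the first transmission.

```python
import secrets

RETRANSMIT_TIMER = 3.0  # seconds, per the text
MAX_TRANSMISSIONS = 5   # the original request plus up to 4 retransmissions


def send_reliably(send, wait_for_response):
    """End-to-end reliability at the originating node only.

    send(txid) transmits the request; wait_for_response(txid, timeout)
    returns a response, or None when the timer fires.
    """
    txid = secrets.randbits(64)  # assumption: 64-bit ID, reused on every retry
    for _ in range(MAX_TRANSMISSIONS):
        send(txid)
        response = wait_for_response(txid, RETRANSMIT_TIMER)
        if response is not None:
            return response
    raise TimeoutError(f"transaction {txid:#x} failed")


# Stub that "answers" only on the third transmission.
attempts = []


def flaky_wait(txid, timeout):
    return "ok" if len(attempts) == 3 else None


print(send_reliably(attempts.append, flaky_wait), len(attempts))  # ok 3
```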

6.1.2.  Message Receipt and Forwarding

   When a peer receives a message, it first examines the overlay,
   version, and other header fields to determine whether the message is
   one it can process.  If any of these are incorrect (e.g., the message
   is for an overlay in which the peer does not participate), it is an
   error.  The peer SHOULD generate an appropriate error, but local
   policy can override this, in which case the message is silently
   dropped.

   Once the peer has determined that the message is correctly formatted,
   it examines the first entry on the destination list.  There are three
   possible cases here:

   o  The first entry on the destination list is an ID for which the
      peer is responsible.
   o  The first entry on the destination list is an ID for which
      another peer is responsible.
   o  The first entry on the destination list is a private ID which is
      being used for destination list compression.

   These cases are handled as discussed below.

6.1.2.1.  Responsible ID

   If the first entry on the destination list is an ID for which the
   node is responsible, there are several sub-cases.

   o  If the entry is a Resource-ID, then it MUST be the only entry on
      the destination list.  If there are other entries, the message
      MUST be silently dropped.  Otherwise, the message is destined for
      this node and it passes it up to the upper layers.
   o  If the entry is a Node-ID which belongs to this node and it is
      the only entry on the destination list, the message is destined
      for this node and is passed up to the upper layers.  Otherwise,
      the entry is removed from the destination list and the message is
      passed to the routing layer.  If the message is a response and
      there is state for the transaction ID, the state is reinserted
      into the destination list first.
   o  If the entry is a Node-ID which is not equal to this node, then
      the node MUST drop the message silently unless the Node-ID
      corresponds to a node which is directly connected to this node
      (i.e., a client).  In that case, it MUST forward the message to
      the destination node as described in the next section.

   Note that this implies that in order to address a message to "the
   peer that controls region X", a sender sends to Resource-ID X, not
   Node-ID X.
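   The sub-cases above can be summarized as a dispatch function.  This
   is a sketch under stated simplifications: Entry, handle_responsible
   and the action strings are hypothetical, private IDs are handled
   elsewhere, and reinsertion of per-transaction state for responses is
   omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str  # "resource" or "node" (private IDs are handled elsewhere)
    ident: int


def handle_responsible(dest: list, my_id: int, clients: set):
    """The first entry of dest is an ID this node is responsible for.
    Returns (action, destination list); action names are illustrative."""
    first = dest[0]
    if first.kind == "resource":
        # A Resource-ID must be the only entry on the destination list.
        return ("deliver", dest) if len(dest) == 1 else ("drop", dest)
    if first.ident == my_id:
        if len(dest) == 1:
            return ("deliver", dest)
        return ("route", dest[1:])  # remove our own entry, route onward
    if first.ident in clients:
        return ("forward_to_client", dest)
    return ("drop", dest)  # another Node-ID in our range: drop silently


print(handle_responsible([Entry("node", 5), Entry("node", 9)], 5, set()))
# ('route', [Entry(kind='node', ident=9)])
```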

1611	6.1.2.2.  Other ID

1613	   If neither of the other two cases applies, then the peer MUST forward
1614	   the message towards the first entry on the destination list.  This
1615	   means that it MUST select one of the peers to which it is connected
1616	   and which is likely to be responsible for the first entry on the
1617	   destination list.  If the first entry on the destination list is in
1618	   the peer's connection table, then it SHOULD forward the message to
1619	   that peer directly.  Otherwise, it consult the routing table to
1620	   forward the message.

1622	   Any intermediate peer which forwards a RELOAD message MUST arrange
1623	   that if it receives a response to that message the response can be
1624	   routed back through the set of nodes through which the request
1625	   passed.  This may be arranged in one of two ways:

1627	   o  The peer MAY add an entry to the via list in the forwarding header
1628	      that will enable it to determine the correct node.
1629	   o  The peer MAY keep per-transaction state which will allow it to
1630	      determine the correct node.

1632	   As an example of the first strategy, if node D receives a message
1633	   from node C with via list (A, B), then D would forward to the next
1634	   node (E) with via list (A, B, C).  Now, if E wants to respond to the
1635	   message, it reverses the via list to produce the destination list,
1636	   resulting in (D, C, B, A).  When D forwards the response to C, the
1637	   destination list will contain (C, B, A).
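   The via-list bookkeeping in this example can be reproduced in a few
   lines.  Node names are plain strings, and the function names are
   illustrative.  The sketch assumes what the example implies: each peer
   appends the node it received the message from, and the responder
   appends its own previous hop (D) before reversing the list.

```python
from __future__ import annotations


def forward_via_list(via_list: list[str], previous_hop: str) -> list[str]:
    """Via list a forwarding peer sends onward: it appends the node it heard from."""
    return via_list + [previous_hop]


def response_destination_list(via_list: list[str], previous_hop: str) -> list[str]:
    """Responder: append the last hop, then reverse to obtain the return path."""
    return list(reversed(via_list + [previous_hop]))


via_at_e = forward_via_list(["A", "B"], "C")       # D forwards the request to E
dest = response_destination_list(via_at_e, "D")    # E builds the response route
print(via_at_e, dest)                              # ['A', 'B', 'C'] ['D', 'C', 'B', 'A']
```

   When D forwards the response, it first removes its own entry, leaving
   (C, B, A).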

1639	   As an example of the second strategy, if node D receives a message
1640	   from node C with transaction ID X and via list (A, B), it could store
1641	   (X, C) in its state database and forward the message with the via
1642	   list unchanged.  When D receives the response, it consults its state
1643	   database for transaction id X, determines that the request came from
1644	   C, and forwards the response to C.
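   A minimal sketch of the per-transaction-state alternative follows.
   The class and method names are hypothetical.  The sketch uses a
   dictionary keyed by transaction ID and omits the expiry timers a real
   peer would need.

```python
from __future__ import annotations


class TransactionStateTable:
    """Per-transaction return state for an intermediate peer (illustrative)."""

    def __init__(self) -> None:
        self._came_from: dict[int, str] = {}

    def on_request(self, transaction_id: int, from_node: str,
                   via_list: list[str]) -> list[str]:
        # Remember the previous hop; the via list is forwarded unchanged.
        self._came_from[transaction_id] = from_node
        return via_list

    def on_response(self, transaction_id: int) -> str:
        # Look up and release the state.  A KeyError means no state is held
        # (e.g., it expired), so the response cannot be routed back.
        return self._came_from.pop(transaction_id)


d = TransactionStateTable()
assert d.on_request(0x58, "C", ["A", "B"]) == ["A", "B"]
assert d.on_response(0x58) == "C"
```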

1646	   Intermediate peers which modify the via list are not required to
1647	   simply add entries.  The only requirement is that the peer be able to
1648	   reconstruct the correct destination list on the return route.  RELOAD
1649	   provides explicit support for this functionality in the form of
1650	   private IDs, which can replace any number of via list entries.  For
1651	   instance, in the above example, Node D might send E a via list
1652	   containing only the private ID (I).  E would then use the destination
1653	   list (D, I) to send its return message.  When D processes this
1654	   destination list, it would detect that I is a private ID, recover the
1655	   via list (A, B, C), and reverse that to produce the correct
1656	   destination list (C, B, A) before sending it to C. This feature is
1657	   called List Compression.  I MAY either be a compressed version of the
1658	   original via list or an index into a state database containing the
1659	   original via list.
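   List Compression in the "index into a state database" form can be
   sketched as follows.  The private ID is modeled as an integer counter,
   and the class name is hypothetical.  A peer could instead use a
   compressed encoding of the via list itself, as noted above.

```python
from __future__ import annotations

import itertools


class ViaListCompressor:
    """List Compression via an index into a local state database (illustrative)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._saved: dict[int, list[str]] = {}

    def compress(self, via_list: list[str]) -> int:
        """Store the via list and return the private ID that replaces it."""
        private_id = next(self._ids)
        self._saved[private_id] = list(via_list)
        return private_id

    def expand(self, private_id: int) -> list[str]:
        """Recover the stored via list, reversed into a destination list."""
        return list(reversed(self._saved.pop(private_id)))


d = ViaListCompressor()
i = d.compress(["A", "B", "C"])   # D sends E a via list containing only i
# E replies with destination list (D, i); D removes its own entry, then:
assert d.expand(i) == ["C", "B", "A"]
```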

1661	   Note that if an intermediate peer exits the overlay, then on the
1662	   return trip the message cannot be forwarded and will be dropped.  The
1663	   ordinary timeout and retransmission mechanisms provide stability over
1664	   this type of failure.

1666	6.1.2.3.  Private ID

1668	   If the first entry on the destination list is a private ID (e.g., a
1669	   compressed via list), the peer MUST replace that entry with the
1670	   original via list that the private ID replaced (or indexes) and then
1671	   re-examine the destination list to determine which case now applies.

1673	6.1.3.  Response Origination

1675	   When a peer sends a response to a request, it MUST construct the
1676	   destination list by reversing the order of the entries on the via
1677	   list.  This has the result that the response traverses the same peers
1678	   as the request traversed, except in reverse order (symmetric
1679	   routing).  Note that this rule will need to be relaxed if other
1680	   routing algorithms are supported.

1682	6.2.  Message Structure

1684	   RELOAD is a message-oriented request/response protocol.  The messages
1685	   are encoded using binary fields.  All integers are represented in
1686	   network byte order.  The general philosophy behind the design was to
1687	   use Type, Length, Value fields to allow for extensibility.  However,
1688	   for the parts of a structure that are required in all messages, we
1689	   define these in fixed positions, because adding a type and length for
1690	   them is unnecessary and would only increase bandwidth and introduce
1691	   new potential for interoperability issues.

1693	   Each message has three parts, concatenated as shown below:

1695	      +-------------------------+
1696	      |    Forwarding Header    |
1697	      +-------------------------+
1698	      |    Message Contents     |
1699	      +-------------------------+
1700	      |       Signature         |
1701	      +-------------------------+

1703	   The contents of these parts are as follows:

1705	   Forwarding Header:  Each message has a generic header which is used
1706	      to forward the message between peers and to its final destination.
1707	      This header is the only information that an intermediate peer
1708	      (i.e., one that is not the target of a message) needs to examine.

1710	   Message Contents:  The message being delivered between the peers.
1711	      From the perspective of the forwarding layer, the contents are
1712	      opaque; however, they are interpreted by the higher layers.

1714	   Signature:  A digital signature over the message contents and parts
1715	      of the header of the message.  Note that this signature can be
1716	      computed without parsing the message contents.

1718	   The following sections describe the format of each part of the
1719	   message.

1721	6.2.1.  Presentation Language

1723	   Most of the structures defined in this document (with the exception
1724	   of the forwarding header defined in the next section) are defined
1725	   using a C-like syntax based on the presentation language used to
1726	   define TLS.  Advantages of this style include:

1728	   o  It is easy to write and familiar enough looking that most readers
1729	      can grasp it quickly.
1730	   o  The ability to define nested structures allows a separation
1731	      between high-level and low level message structures.
1732	   o  It has a straightforward wire encoding that allows quick
1733	      implementation, but the structures can be comprehended without
1734	      knowing the encoding.

1736	   This presentation is to some extent a placeholder.  We consider it an
1737	   open question which definition method and encodings the final
1738	   protocol will use.  We expect this to be a question for the WG to decide.

1740	   Several idiosyncrasies of this language are worth noting.

1742	   o  All lengths are denoted in bytes, not objects.
1743	   o  Variable length values are denoted like arrays with angle
1744	      brackets.
1745	   o  "select" is used to indicate variant structures.

1747	   For instance, "uint16 array<0..2^8-2>;" represents up to 254 bytes
1748	   but only up to 127 values of two bytes (16 bits) each.

1750	6.2.1.1.  Common Definitions

1752	   The following definitions are used throughout RELOAD and so are
1753	   defined here.  They also provide a convenient introduction to how to
1754	   read the presentation language.

1756	   An enum represents an enumerated type.  The values associated with
1757	   each possibility are represented in parentheses and the maximum value
1758	   is represented as a nameless value, for purposes of describing the
1759	   width of the containing integral type.  For instance, Boolean
1760	   represents a true or false:

1762	          enum { false (0), true(1), (255)} Boolean;

1764	   A boolean value is either a 1 or a 0 and is represented as a single
1765	   byte on the wire.

1767	   The NodeId, shown below, represents a single Node-ID.

1769	              typedef opaque       NodeId[16];

1771	   A NodeId is a fixed-length 128-bit structure represented as a series
1772	   of bytes, most significant byte first.  Note:  the use of "typedef"
1773	   here is an extension to the TLS language, but its meaning should be
1774	   relatively obvious.

1776	   A ResourceId, shown below, represents a single resource-id.

1778	              typedef opaque       ResourceId<0..2^8-1>;

1780	   Like a NodeId, a resource-id is an opaque string of bytes, but unlike
1781	   Node-IDs, resource-ids are variable length, up to 255 bytes (2040
1782	   bits) in length.  On the wire, each ResourceId is preceded by a
1783	   single length byte (allowing lengths up to 255).  Thus, the 3-byte
1784	   value "Foo" would be encoded as:  03 46 6f 6f.
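   The length-prefixed encoding of a ResourceId can be sketched as
   follows.  A NodeId, by contrast, is a fixed 16 bytes and carries no
   length byte.  The helper names are hypothetical.

```python
from __future__ import annotations


def encode_resource_id(value: bytes) -> bytes:
    """ResourceId<0..2^8-1>: a single length byte, then the bytes themselves."""
    if len(value) > 255:
        raise ValueError("a ResourceId is at most 255 bytes")
    return bytes([len(value)]) + value


def decode_resource_id(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Return the ResourceId starting at `offset` and the offset just past it."""
    length = buf[offset]
    end = offset + 1 + length
    if end > len(buf):
        raise ValueError("truncated ResourceId")
    return buf[offset + 1:end], end


encoded = encode_resource_id(b"Foo")
print(encoded.hex(" "))             # 03 46 6f 6f
```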

1786	   A more complicated example is IpAddressPort, which represents a
1787	   network address and can be used to carry either an IPv6 or IPv4
1788	   address:

1790	         enum {reserved(0), ipv4_address (1), ipv6_address (2), (255)}
1791	              AddressType;

1793	         struct  {
1794	           uint32                  addr;
1795	           uint16                  port;
1796	         } IPv4AddrPort;

1798	         struct  {
1799	           uint128                 addr;
1800	           uint16                  port;
1801	         } IPv6AddrPort;

1803	         struct {
1804	           AddressType             type;
1805	           uint8                   length;

1807	           select (type) {
1808	             case ipv4_address:
1809	                IPv4AddrPort       v4addr_port;

1811	             case ipv6_address:
1812	                IPv6AddrPort       v6addr_port;

1814	             /* This structure can be extended */
1815	           };
1816	         } IpAddressPort;

1818	   The first two fields in the structure are the same no matter what
1819	   kind of address is being represented:

1821	   type
1822	      the type of address (v4 or v6).

1824	   length
1825	      the length of the rest of the structure.

1827	   By having the type and the length appear at the beginning of the
1828	   structure regardless of the kind of address being represented, an
1829	   implementation which does not understand new address type X can still
1830	   parse the IpAddressPort field and then discard it if it is not
1831	   needed.

1833	   The rest of the IpAddressPort structure is either an IPv4AddrPort or
1834	   an IPv6AddrPort.  Both of these simply consist of an address
1835	   represented as an integer and a 16-bit port.  As an example, here is
1836	   the wire representation of the IPv4 address "192.0.2.1" with port
1837	   "6100".

1839	              01           ; type    = IPv4
1840	              06           ; length  = 6
1841	              c0 00 02 01  ; address = 192.0.2.1
1842	              17 d4        ; port    = 6100
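   The following sketch encodes and decodes IpAddressPort using Python's
   standard ipaddress and struct modules.  The function names are
   illustrative.  It also shows how the leading type and length bytes
   let a parser skip an address type it does not understand.  The IPv6
   case in the test uses the documentation prefix 2001:db8::/32.

```python
from __future__ import annotations

import ipaddress
import struct

IPV4_ADDRESS, IPV6_ADDRESS = 1, 2   # AddressType code points


def encode_ip_address_port(host: str, port: int) -> bytes:
    """type, length, then the address and 16-bit port in network byte order."""
    addr = ipaddress.ip_address(host)
    addr_type = IPV4_ADDRESS if addr.version == 4 else IPV6_ADDRESS
    body = addr.packed + struct.pack("!H", port)
    return struct.pack("!BB", addr_type, len(body)) + body


def decode_ip_address_port(buf: bytes, offset: int = 0):
    """Return ((host, port), next_offset), or (None, next_offset) if unknown."""
    addr_type, length = struct.unpack_from("!BB", buf, offset)
    body = buf[offset + 2:offset + 2 + length]
    next_offset = offset + 2 + length
    if addr_type == IPV4_ADDRESS and length == 6:
        return (str(ipaddress.IPv4Address(body[:4])),
                struct.unpack("!H", body[4:])[0]), next_offset
    if addr_type == IPV6_ADDRESS and length == 18:
        return (str(ipaddress.IPv6Address(body[:16])),
                struct.unpack("!H", body[16:])[0]), next_offset
    return None, next_offset   # unknown type: skipped using the length byte


print(encode_ip_address_port("192.0.2.1", 6100).hex(" "))
# 01 06 c0 00 02 01 17 d4
```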

1844	6.2.2.  Forwarding Header

1846	   The layout of the forwarding header is shown below.  We present this
1847	   as a bit diagram because it is mostly fixed and to show the
1848	   similarities with other packet headers.

1850	       0                   1                   2                   3
1851	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1852	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1853	      |1|     R       |       E       |       L       |       O       |
1854	   4  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1855	      |                           Overlay                             |
1856	   8  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1857	      |               |               |F|L|                           |
1858	      |     TTL       |   Reserved    |R|F|      Fragment Offset      |
1859	      |               |               |A|R|                           |
1860	      |               |               |G|G|                           |
1861	   12 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1862	      |               |                                               |
1863	      |    Version    |                    Length                     |
1864	      |               |                                               |
1865	   16 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1866	      |                        Transaction ID                         |
1867	      +                                                               +
1868	      |                                                               |
1869	   24 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1870	      |                               |                               |
1871	      |             Flags             |        Via List Length        |
1872	      |                               |                               |
1873	   28 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1874	      |                               |
1875	      |    Destination List Length    |
1876	      |                               |
1877	   30 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1878	      |                                                               |
1879	      //                           Via List                          //
1880	      |                                                               |
1881	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1882	      |                                                               |
1883	      //                          Destination List                   //
1884	      |                                                               |
1885	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1886	      |                                                               |
1887	      //                          Route Log                          //
1888	      |                                                               |
1889	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1891	   The first four bytes identify this message as a RELOAD message.  It
1892	   is easy to demultiplex from STUN by looking at the first bit, which
1893	   is 1 in RELOAD, whereas STUN messages always begin with two 0 bits.

1895	   The Overlay field is the 32 bit checksum/hash of the overlay being
1896	   used.  The variable length string representing the overlay name is
1897	   hashed with SHA-1 and the low order 32 bits are used.  The purpose of
1898	   this field is to allow nodes to participate in multiple overlays and
1899	   to detect accidental misconfiguration.  This is not a security
1900	   critical function.
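   The Overlay field computation can be sketched with Python's hashlib.
   Two points are assumptions, since the text does not specify them: the
   overlay name is encoded as UTF-8, and "low order 32 bits" means the
   last four bytes of the digest read as a big-endian integer.

```python
import hashlib


def overlay_field(overlay_name: str) -> int:
    """32-bit Overlay field: low-order 32 bits of SHA-1(overlay name)."""
    digest = hashlib.sha1(overlay_name.encode("utf-8")).digest()  # 20 bytes
    # Reading the digest as a big-endian integer, the low-order 32 bits
    # are its last four bytes.
    return int.from_bytes(digest[-4:], "big")


print(hex(overlay_field("abc")))   # 0x9cd0d89d
```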

1902	   TTL (time-to-live) is an 8 bit field indicating the number of
1903	   iterations, or hops, a message can experience before it is discarded.
1904	   The TTL value MUST be decremented by one at every hop along the route
1905	   the message traverses.  If the TTL is 0, the message MUST NOT be
1906	   propagated further and MUST be discarded.  The initial value of the
1907	   TTL should be TBD.

1909	   FRAG is a 1 bit field used to specify if this message is a fragment.

1911	              NOT-FRAGMENT    : 0x0
1912	              FRAGMENT        : 0x1

1914	   LFRG is a 1 bit field used to specify whether this is the last
1915	   fragment in a complete message.

1917	              NOT-LAST-FRAGMENT    : 0x0
1918	              LAST-FRAGMENT        : 0x1

1920	   [[Open Issue:  This is conceptually clear, but the details are still
1921	   lacking.  Need to define how the fragment offset and total length are
1922	   encoded in the header.  Right now we have 14 bits reserved with the
1923	   intention that they be used for fragmenting, though additional bytes
1924	   in the header might be needed for fragmentation.]]

1926	   Version is an 8 bit field that indicates the version of the RELOAD
1927	   protocol being used.

1929	              Version0.1       : 0x1

1931	   The message Length is the size of the message in bytes, including
1932	   the header.

1934	   The Transaction ID is a unique 64 bit number that identifies this
1935	   transaction and also serves as a salt to randomize the request and
1936	   the response.  Responses use the same Transaction ID as the request
1937	   they correspond to.  Transaction IDs are also used for fragment
1938	   reassembly.

1940	   The flags word contains control flags.  There is one currently
1941	   defined flag.

1943	              ROUTE-LOG       : 0x1

1945	   The ROUTE-LOG flag indicates that the route log should be included
1946	   (see Section 6.2.2.2).

1948	   The Destination List Length and the Via List Length contain the
1949	   lengths of the destination and via lists, respectively, in bytes.

1951	   The Via List contains the sequence of destinations through which the
1952	   message has passed.  The via list starts out empty and grows as the
1953	   message traverses each peer.

1955	   The Destination List contains a sequence of destinations which the
1956	   message should pass through.  The destination list is constructed by
1957	   the message originator.  The first element in the destination list is
1958	   where the message goes next.  The list shrinks as the message
1959	   traverses each listed peer.
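   The following sketch parses the fixed 30-byte part of the forwarding
   header and slices out the two lists.  It follows the bit diagram:
   Version is one byte, and FRAG, LFRG, and a 14-bit fragment offset
   share one 16-bit word, though the fragmentation details are still an
   open issue.  Only the first bit of the leading four bytes is checked,
   so the 0xd2 first byte in the test is arbitrary.  The route log is
   not parsed.  All names are illustrative.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# magic(4) overlay(4) TTL(1) reserved(1) FRAG|LFRG|offset(2) version(1)
# length(3) transaction ID(8) flags(2) via len(2) destination len(2) = 30
FIXED_PART = struct.Struct("!4sIBBHB3sQHHH")
ROUTE_LOG = 0x1


@dataclass
class ForwardingHeader:
    overlay: int
    ttl: int
    is_fragment: bool
    is_last_fragment: bool
    fragment_offset: int
    version: int
    length: int
    transaction_id: int
    flags: int
    via_list: bytes
    destination_list: bytes


def looks_like_reload(packet: bytes) -> bool:
    """RELOAD sets the first bit; STUN messages start with two zero bits."""
    return len(packet) > 0 and (packet[0] & 0x80) != 0


def parse_forwarding_header(packet: bytes) -> ForwardingHeader:
    if len(packet) < FIXED_PART.size or not looks_like_reload(packet):
        raise ValueError("not a RELOAD forwarding header")
    (_magic, overlay, ttl, _reserved, frag_bits, version, length,
     txid, flags, via_len, dest_len) = FIXED_PART.unpack_from(packet)
    via_end = FIXED_PART.size + via_len
    dest_end = via_end + dest_len
    if dest_end > len(packet):
        raise ValueError("list lengths exceed the packet")
    return ForwardingHeader(
        overlay=overlay, ttl=ttl,
        is_fragment=bool(frag_bits & 0x8000),
        is_last_fragment=bool(frag_bits & 0x4000),
        fragment_offset=frag_bits & 0x3FFF,
        version=version,
        length=int.from_bytes(length, "big"),
        transaction_id=txid, flags=flags,
        via_list=packet[FIXED_PART.size:via_end],
        destination_list=packet[via_end:dest_end])
```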

1961	6.2.2.1.  Destination and Via Lists

1963	   The destination list and via lists are sequences of Destination
1964	   values:

1966	         enum {reserved(0), peer(1), resource(2), compressed(3), (255) }
1967	              DestinationType;

1969	         select (destination_type) {
1970	           case peer:
1971	              NodeId               node_id;

1973	           case resource:
1974	              ResourceId           resource_id;

1976	           case compressed:
1977	              opaque               compressed_id;

1979	           /* This structure may be extended with new types */

1981	         } DestinationData;

1983	         struct {
1984	           DestinationType         type;
1985	           uint8                   length;
1986	           DestinationData         destination_data;
1987	         } Destination;

1989	   This is a TLV structure with the following contents:

1991	   type
1992	      The type of the DestinationData PDU.  This may be one of "peer",
1993	      "resource", or "compressed".

1995	   length
1996	      The length of the destination_data.

1998	   destination_data
1999	      The destination value itself, which is an encoded DestinationData
2000	      structure, depending on the value of "type".

2002	   Note:  This structure encodes a type, length, value.  The length
2003	      field specifies the length of the DestinationData values, which
2004	      allows the addition of new DestinationTypes.  This allows an
2005	      implementation which does not understand a given DestinationType
2006	      to skip over it.

2008	   A DestinationData can be one of three types:

2010	   peer
2011	      A Node-ID.

2013	   compressed
2014	      A compressed list of Node-IDs and/or resources.  Because this
2015	      value was compressed by one of the peers, it is only meaningful to
2016	      that peer and cannot be decoded by other peers.  Thus, it is
2017	      represented as an opaque string.

2019	   resource
2020	      The Resource-ID of the resource which is desired.  This type MUST
2021	      only appear in the final location of a destination list and MUST
2022	      NOT appear in a via list.  It is meaningless to try to route
2023	      through a resource.
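   The following is a sketch of encoding and decoding Destination TLVs.
   It treats a resource entry's destination_data as a full ResourceId,
   including that type's own length byte; this is a reading of the
   presentation language, not something the text states outright.  The
   outer length byte lets a decoder carry past unknown types.  The sketch
   does not enforce the placement rules for "resource" entries.  Names
   are illustrative.

```python
from __future__ import annotations

import struct

PEER, RESOURCE, COMPRESSED = 1, 2, 3   # DestinationType code points


def encode_destination(dest_type: int, value: bytes) -> bytes:
    """Encode one Destination as type (1 byte), length (1 byte), data."""
    if dest_type == PEER and len(value) != 16:
        raise ValueError("a NodeId is exactly 16 bytes")
    # A ResourceId carries its own length byte inside destination_data.
    data = bytes([len(value)]) + value if dest_type == RESOURCE else value
    if len(data) > 255:
        raise ValueError("destination_data exceeds the 8-bit length field")
    return struct.pack("!BB", dest_type, len(data)) + data


def decode_destinations(buf: bytes) -> list[tuple[int, bytes]]:
    """Decode a via or destination list; unknown types are kept as raw bytes."""
    out, offset = [], 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError("truncated Destination header")
        dest_type, length = struct.unpack_from("!BB", buf, offset)
        data = buf[offset + 2:offset + 2 + length]
        if len(data) != length:
            raise ValueError("truncated Destination")
        if dest_type == RESOURCE and data:
            data = data[1:1 + data[0]]   # strip the ResourceId length byte
        out.append((dest_type, data))
        offset += 2 + length            # the outer length lets us skip any type
    return out
```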

2025	6.2.2.2.  Route Logging

2027	   The route logging feature provides diagnostic information about the
2028	   path taken by the request so far and in this manner it is similar in
2029	   function to SIP's [RFC3261] Via header field.  If the ROUTE-LOG flag
2030	   is set in the Flags word, at each hop peers MUST append a route log
2031	   entry to the route log element in the header or reject the request.
2032	   The order of the route log entry elements in the message is
2033	   determined by the order in which the peers were traversed along the path.
2034	   The first route log entry corresponds to the peer at the first hop
2035	   along the path, and each subsequent entry corresponds to the peer at
2036	   the next hop along the path.  If the ROUTE-LOG flag is set in a
2037	   request, the route log MUST be copied into the response and the
2038	   ROUTE-LOG flag set so that the originator receives the ROUTE-LOG
2039	   data.

2041	   If the responder wishes to have a route log in the reverse direction,
2042	   it MAY set the ROUTE-LOG flag in its response as well.  Note,
2043	   however, that this means that the response will grow on the return
2044	   path, which may potentially mean that it gets dropped due to becoming
2045	   too large for some intermediate hop.  Thus, this option must be used
2046	   with care.

2048	   The route log is defined as follows:

2050	       enum { (255) } RouteLogExtensionType;

2052	       struct {
2053	         RouteLogExtensionType     type;
2054	         uint16                    length;

2056	         select (type){
2057	           /* Extension values go here */
2058	         } extension;
2059	       } RouteLogExtension;

2061	       enum { reserved(0), tcp_tls(1),  udp_dtls(2),  (255)}  Transport;

2063	       struct {
2064	         opaque                 version<0..2^8-1>;    /* A string */
2065	         Transport              transport;            /* TCP or UDP */
2066	         NodeId                 id;
2067	         uint32                 uptime;
2068	         IpAddressPort          address;
2069	         opaque                 certificate<0..2^16-1>;
2070	         RouteLogExtension      extensions<0..2^16-1>;
2071	       } RouteLogEntry;

2073	       struct {
2074	          RouteLogEntry         entries<0..2^16-1>;
2075	       } RouteLog;

2077	   The route log consists of an arbitrary number of RouteLogEntry
2078	   values, each representing one node through which the message has
2079	   passed.

2081	   Each RouteLogEntry consists of the following values:

2083	   version
2084	      A textual representation of the software version.

2086	   transport
2087	      The transport type, currently either "tcp_tls" or "udp_dtls".

2089	   id
2090	      The Node-ID of the peer.

2092	   uptime
2093	      The uptime of the peer in seconds.

2095	   address
2096	      The address and port of the peer.

2098	   certificate
2099	      The peer's certificate.  Note that this may be omitted by setting
2100	      the length to zero.

2102	   extensions
2103	      Extensions, if any.

2105	   Extensions are defined using a RouteLogExtension structure.  New
2106	   extensions are defined by defining a new code point for
2107	   RouteLogExtensionType and adding a new arm to the RouteLogExtension
2108	   structure.  The contents of that structure are:

2110	   type
2111	      The type of the extension.

2113	   length
2114	      The length of the rest of the structure.

2116	   extension
2117	      The extension value.
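   The following sketch shows one RouteLogEntry and the enclosing
   RouteLog.  It supports only IPv4 addresses and emits an empty
   extensions vector.  It assumes the version string is UTF-8 and that
   each enum (Transport, AddressType) is encoded in one byte, consistent
   with its (255) maximum.  Helper names are hypothetical.

```python
from __future__ import annotations

import ipaddress
import struct

TCP_TLS, UDP_DTLS = 1, 2   # Transport code points


def vector(data: bytes, length_bytes: int) -> bytes:
    """TLS-style variable-length vector: big-endian byte count, then data."""
    if len(data) >= 1 << (8 * length_bytes):
        raise ValueError("vector too long for its length field")
    return len(data).to_bytes(length_bytes, "big") + data


def ipv4_address_port(host: str, port: int) -> bytes:
    """IpAddressPort for IPv4 only (type 1, length 6), kept minimal here."""
    body = ipaddress.IPv4Address(host).packed + struct.pack("!H", port)
    return struct.pack("!BB", 1, len(body)) + body


def encode_route_log_entry(version: str, transport: int, node_id: bytes,
                           uptime: int, host: str, port: int,
                           certificate: bytes = b"") -> bytes:
    if len(node_id) != 16:
        raise ValueError("a NodeId is exactly 16 bytes")
    return (vector(version.encode("utf-8"), 1)   # opaque version<0..2^8-1>
            + struct.pack("!B", transport)         # Transport enum, 1 byte
            + node_id                              # NodeId, fixed 16 bytes
            + struct.pack("!I", uptime)            # uptime in seconds
            + ipv4_address_port(host, port)
            + vector(certificate, 2)               # empty = certificate omitted
            + vector(b"", 2))                      # no RouteLogExtensions


def encode_route_log(entries: list[bytes]) -> bytes:
    """RouteLog: entries in hop order behind a 16-bit byte count."""
    return vector(b"".join(entries), 2)
```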

2119	6.2.3.  Message Contents Format

2121	   The second major part of a RELOAD message is the contents part, which
2122	   is defined by MessageContents:

2124	          struct {
2125	            MessageCode            message_code;
2126	            opaque                 payload<0..2^24-1>;
2127	          } MessageContents;

2129	   The contents of this structure are as follows:

2131	   message_code
2132	      This indicates the message that is being sent.  The code space is
2133	      broken up as follows.

2135	      0  Reserved

2137	      1 .. 0x7fff  Requests and responses.  These code points are always
2138	         paired, with requests being odd and the corresponding response
2139	         being the request code plus 1.  Thus, "ping_request" (the Ping
2140	         request) has value 1 and "ping_answer" (the Ping response) has
2141	         value 2.

2143	      0xffff  Error

2145	   payload
2146	      The message body itself, represented as a variable-length string
2147	      of bytes.  The bytes themselves are dependent on the code value.
2148	      See the sections describing the various RELOAD methods (Join,
2149	      Update, Connect, Store, Fetch, etc.) for the definitions of the
2150	      payload contents.
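   The request/response pairing rule and the MessageContents layout can
   be sketched as follows.  Two points are assumptions: MessageCode is a
   uint16 (suggested by the 0xffff error code), and the
   payload<0..2^24-1> vector carries a 3-byte length.  Because 0x7fff is
   odd, its success code would fall outside the range, so the sketch
   rejects it.

```python
from __future__ import annotations

import struct

ERROR = 0xFFFF
PING_REQUEST, PING_ANSWER = 1, 2


def is_request(code: int) -> bool:
    """Requests are the odd codes in 1..0x7fff."""
    return 1 <= code <= 0x7FFF and code % 2 == 1


def success_code_for(request_code: int) -> int:
    """The success response code is the request code plus 1."""
    if not is_request(request_code) or request_code + 1 > 0x7FFF:
        raise ValueError(f"{request_code:#06x} has no paired response code")
    return request_code + 1


def encode_message_contents(message_code: int, payload: bytes) -> bytes:
    # Assumes MessageCode is a uint16; payload<0..2^24-1> has a 3-byte length.
    if len(payload) >= 1 << 24:
        raise ValueError("payload too long")
    return struct.pack("!H", message_code) + len(payload).to_bytes(3, "big") + payload
```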

2152	6.2.3.1.  Response Codes and Response Errors

2154	   A peer processing a request returns its status in the message_code
2155	   field.  If the request was a success, then the message code is the
2156	   response code that matches the request (i.e., the next code up).  The
2157	   response payload is then as defined in the request/response
2158	   descriptions.

2160	   If the request failed, then the message code is set to 0xffff (error)
2161	   and the payload MUST be an ErrorResponse, as shown below.

2166	          public struct {
2167	            uint16             error_code;
2168	            opaque             reason_phrase<0..2^8-1>;  /* String*/
2169	            opaque             error_info<0..65000>;
2170	          } ErrorResponse;

2172	   The contents of this structure are as follows:

2174	   error_code
2175	      A numeric error code indicating the error that occurred.

2177	   reason_phrase
2178	      A free form text string indicating the reason for the response.
2179	      The reason phrase SHOULD be as indicated in the error code list
2180	      below (e.g., "Moved Temporarily").  [[Open Issue:  These reason
2181	      phrases are pretty useless.  Like the rest of this error system,
2182	      they're a holdover from SIP.  Should we remove them?]]

2184	   error_info
2185	      Payload specific error information.  This MUST be empty (zero
2186	      length) except as specified below.

2188	   The following error code values are defined.  [[TODO:  These are
2189	   currently semi-aligned with SIP codes.  That's probably bad and we
2190	   need to fix it.]]

2192	   302 (Moved Temporarily):  The requesting peer SHOULD retry the
2193	      request at the new address specified in the 302 response message.

2195	   401 (Unauthorized):  The requesting peer needs to sign and provide a
2196	      certificate.  [[TODO:  The semantics here don't seem quite
2197	      right.]]

2199	   403 (Forbidden):  The requesting peer does not have permission to
2200	      make this request.

2202	   404 (Not Found):  The resource or peer cannot be found or does not
2203	      exist.

2205	   408 (Request Timeout):  A response to the request has not been
2206	      received in a suitable amount of time.  The requesting peer MAY
2207	      resend the request at a later time.

2209	   412 (Precondition Failed):  A request can't be completed because some
2210	      precondition was incorrect.  For instance, the wrong generation
2211	      counter was provided.

2213	   498 (Incompatible with Overlay):  A peer receiving the request is
2214	      using a different overlay, overlay algorithm, or hash algorithm.
2215	      [[Open Issue:  What is the best error number and reason phrase to
2216	      use?]]
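   The following sketch encodes an ErrorResponse and wraps it in
   MessageContents with code 0xffff.  It assumes error_info<0..65000>
   uses a 2-byte length (65000 fits in 16 bits), MessageCode is a
   uint16, and the reason phrase is UTF-8.  Function names are
   illustrative.

```python
from __future__ import annotations

import struct

ERROR = 0xFFFF


def encode_error_response(error_code: int, reason_phrase: str,
                          error_info: bytes = b"") -> bytes:
    """ErrorResponse: uint16 code, reason<0..2^8-1>, error_info<0..65000>."""
    reason = reason_phrase.encode("utf-8")
    if len(reason) > 255 or len(error_info) > 65000:
        raise ValueError("field too long")
    return (struct.pack("!H", error_code)
            + bytes([len(reason)]) + reason
            + struct.pack("!H", len(error_info)) + error_info)


def decode_error_response(buf: bytes) -> tuple[int, str, bytes]:
    error_code, reason_len = struct.unpack_from("!HB", buf, 0)
    reason = buf[3:3 + reason_len].decode("utf-8")
    (info_len,) = struct.unpack_from("!H", buf, 3 + reason_len)
    info = buf[5 + reason_len:5 + reason_len + info_len]
    return error_code, reason, info


def error_message_contents(error_code: int, reason_phrase: str,
                           error_info: bytes = b"") -> bytes:
    """MessageContents for a failed request: code 0xffff + ErrorResponse."""
    body = encode_error_response(error_code, reason_phrase, error_info)
    return struct.pack("!H", ERROR) + len(body).to_bytes(3, "big") + body
```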

2218	6.2.4.  Signature

2220	   The third part of a RELOAD message is the signature, represented by a
2221	   Signature structure.  The message signature is computed over the
2222	   payload and parts of the forwarding header.  The payload of a
2223	   Store may contain an additional signature computed over a StoreReq
2224	   structure.  All signatures are formatted using the Signature element.
2225	   This element is also used in other contexts where signatures are
2226	   needed.  The input structure to the signature computation varies
2227	   depending on the data element being signed.

2229	        enum {reserved(0), signer_identity_peer (1),
2230	              signer_identity_name (2), signer_identity_certificate (3),
2231	              (255)} SignerIdentityType;

2233	        select (identity_type) {
2234	          case signer_identity_peer:
2235	            NodeId               id;

2237	          case signer_identity_name:
2238	            opaque               name<0..2^16-1>;

2240	          case signer_identity_certificate:
2241	            opaque               certificate<0..2^16-1>;

2243	          /* This structure may be extended with new types */
2244	        } SignerIdentityValue;

2246	        struct {
2247	          SignerIdentityType     identity_type;
2248	          uint16                 length;
2249	          SignerIdentityValue    identity[SignerIdentity.length];
2250	        } SignerIdentity;

2252	        struct  {
2253	           SignatureAndHashAlgorithm     algorithm;
2254	           SignerIdentity                identity;
2255	           opaque                        signature_value<0..2^16-1>;
2256	        } Signature;

2258	   The signature construct contains the following values:

2260	   algorithm
2261	      The signature algorithm in use.  The algorithm definitions are
2262	      found in the IANA TLS SignatureAlgorithm Registry.

2264	   identity
2265	      The identity or certificate used to form the signature.

2267	   signature_value
2268	      The value of the signature.

2270	   A number of possible identity formats are permitted.  The current
2271	   possibilities are:  a Node-ID, a user name, and a certificate.

2273	   For signatures over messages the input to the signature is computed
2274	   over:

2276	      overlay + transaction_id + MessageContents + SignerIdentity

2278	   Where overlay and transaction_id come from the forwarding header and
2279	   + indicates concatenation.
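   The following sketch assembles the byte string that would be fed to
   the signature algorithm.  It does not perform the signing itself.  It
   assumes the overlay is a uint32, the transaction ID a uint64,
   MessageCode a uint16 with a 3-byte payload length, and that
   SignerIdentity is encoded as a type byte, a 2-byte length, and the
   value.  Given the TODO above, the inputs may change.  Names are
   illustrative.

```python
from __future__ import annotations

import struct

SIGNER_IDENTITY_PEER = 1


def encode_message_contents(message_code: int, payload: bytes) -> bytes:
    # Assumed: uint16 message_code, payload<0..2^24-1> with a 3-byte length.
    return struct.pack("!H", message_code) + len(payload).to_bytes(3, "big") + payload


def encode_signer_identity_peer(node_id: bytes) -> bytes:
    # identity_type (1 byte), length (uint16), then the NodeId.
    return struct.pack("!BH", SIGNER_IDENTITY_PEER, len(node_id)) + node_id


def message_signature_input(overlay: int, transaction_id: int,
                            message_contents: bytes,
                            signer_identity: bytes) -> bytes:
    """overlay + transaction_id + MessageContents + SignerIdentity."""
    return (struct.pack("!I", overlay) + struct.pack("!Q", transaction_id)
            + message_contents + signer_identity)
```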

2281	   [[TODO:  Check the inputs to this carefully.]]

2283	   The input to signatures over data values is different, and is
2284	   described in Section 7.1.

2286	6.3.  Overlay Topology

2288	   As discussed in previous sections, RELOAD does not itself implement
2289	   any overlay topology.  Rather, it relies on Topology Plugins, which
2290	   allow a variety of overlay algorithms to be used while maintaining
2291	   the same RELOAD core.  This section describes the requirements for
2292	   new topology plugins and the methods that RELOAD provides for overlay
2293	   topology maintenance.

2295	6.3.1.  Topology Plugin Requirements

2297	   When specifying a new overlay algorithm, at least the following need
2298	   to be described:

2300	   o  Joining procedures, including the contents of the Join message.
2301	   o  Stabilization procedures, including the contents of the Update
2302	      message, the frequency of topology probes and keepalives, and the
2303	      mechanism used to detect when peers have disconnected.
2304	   o  Exit procedures, including the contents of the Leave message.

2306	   o  The length of the Resource-IDs and Node-IDs.  For DHTs, the hash
2307	      algorithm to compute the hash of an identifier.
2308	   o  The procedures that peers use to route messages.
2309	   o  The replication strategy used to ensure data redundancy.

2311	6.3.2.  Methods and types for use by topology plugins

2313	   This section describes the methods that topology plugins use to join,
2314	   leave, and maintain the overlay.

2316	6.3.2.1.  Join

2318	   A new peer (one that already has credentials) uses the JoinReq
2319	   message to join the overlay.  The JoinReq is sent to the responsible
2320	   peer depending on the routing mechanism described in the topology
2321	   plugin.  This notifies the responsible peer that the new peer is
2322	   taking over some of the overlay and it needs to synchronize its
2323	   state.

2325	          struct {
2326	             NodeId                joining_peer_id;
2327	             opaque                overlay_specific_data<0..2^16-1>;
2328	          } JoinReq;

2330	   The minimal JoinReq contains only the Node-ID which the sending peer
2331	   wishes to assume.  Overlay algorithms MAY specify other data to
2332	   appear in this request.

2334	   If the request succeeds, the responding peer responds with a JoinAns
2335	   message, as defined below:

2337	          struct {
2338	             opaque                overlay_specific_data<0..2^16-1>;
2339	          } JoinAns;

2341	   If the request succeeds, the responding peer MUST follow up by
2342	   executing the right sequence of Stores and Updates to transfer the
2343	   appropriate section of the overlay space to the joining peer.  In
2344	   addition, overlay algorithms MAY define data to appear in the
2345	   response payload that provides additional info.

2347	6.3.2.2.  Leave

2349	   The LeaveReq message is used to indicate that a node is exiting the
2350	   overlay.  A node SHOULD send this message to each peer with which it
2351	   is directly connected prior to exiting the overlay.

2353	          public struct {
2354	             NodeId                leaving_peer_id;
2355	             opaque                overlay_specific_data<0..2^16-1>;
2356	          } LeaveReq;

2358	   The default LeaveReq contains only the Node-ID of the leaving peer.
2359	   Overlay algorithms MAY specify other data to appear in this request.

2361	   Upon receiving a Leave request, a peer MUST update its own routing
2362	   table, and send the appropriate Store/Update sequences to re-
2363	   stabilize the overlay.

2365	6.3.2.3.  Update

2367	   Update is the primary overlay-specific maintenance message.  It is
2368	   used by the sender to notify the recipient of the sender's view of
2369	   the current state of the overlay (its routing state) and it is up to
2370	   the recipient to take whatever actions are appropriate to deal with
2371	   the state change.

2373	   The contents of the UpdateReq message are completely overlay-
2374	   specific.  The UpdateAns response is expected to be either success or
2375	   an error.

2377	6.3.2.4.  Route_Query

2379	   The Route_Query request allows the sender to ask a peer where it
2380	   would route a message directed to a given destination.  In other
2381	   words, a RouteQuery for a destination X requests the Node-ID where
2382	   the receiving peer would next route to get to X.  A RouteQuery can
2383	   also request that the receiving peer initiate an Update request to
2384	   transfer its routing table.

2386	   One important use of the RouteQuery request is to support iterative
2387	   routing.  The sender selects one of the peers in its routing table
2388	   and sends it a RouteQuery message with the destination field set to
2389	   the Node-ID or Resource-ID it wishes to route to.  The receiving peer
2390	   responds with information about the peers to which the request would
2391	   be routed.  The sending peer MAY then Connect to that peer (or
2392	   peers) and repeat the RouteQuery.  Eventually, the sender gets a
2393	   response from a peer that is closest to the identifier in the
2394	   destination field, as determined by the topology plugin.  At that
2395	   point, the sender can send messages directly to that peer.
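   The iterative procedure above can be sketched as follows.  This is a
   non-normative Python illustration under stated assumptions: Node-IDs
   are small integers on a hypothetical 8-bit ring, each toy peer knows
   only its successor, and toy_route_query() is a hypothetical stand-in
   for sending a RouteQueryReq and reading the RouteQueryAns.  The real
   distance metric and routing state come from the topology plugin.

```python
from typing import Callable, List, Optional

RING_SIZE = 2 ** 8                      # hypothetical toy identifier space
PEERS: List[int] = [10, 50, 100, 200]   # hypothetical peers, sorted


def clockwise(a: int, b: int) -> int:
    """Clockwise distance from a to b on the toy ring."""
    return (b - a) % RING_SIZE


def is_responsible(peer: int, target: int) -> bool:
    """Toy rule: a peer owns the identifiers in (predecessor, peer]."""
    pred = PEERS[PEERS.index(peer) - 1]
    return 0 < clockwise(pred, target) <= clockwise(pred, peer)


def toy_route_query(peer: int, target: int) -> Optional[int]:
    """Stand-in for RouteQuery: None if `peer` is closest, else next hop."""
    if is_responsible(peer, target):
        return None
    return PEERS[(PEERS.index(peer) + 1) % len(PEERS)]


def iterative_route(start: int, target: int,
                    route_query: Callable[[int, int], Optional[int]],
                    max_hops: int = 32) -> int:
    """Repeat RouteQuery until a peer reports that it is closest."""
    current = start
    for _ in range(max_hops):
        next_hop = route_query(current, target)
        if next_hop is None or next_hop == current:
            return current
        # A real node would Connect to next_hop here before querying it.
        current = next_hop
    raise RuntimeError("iterative routing did not converge")
```

   For example, iterative_route(10, 120, toy_route_query) visits peers
   10, 50, and 100 before ending at peer 200.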

2397	6.3.2.4.1.  Request Definition

2399	   A RouteQueryReq message indicates the peer or resource that the
2400	   requesting peer is interested in.  It also contains a "send_update"
2401	   option allowing the requesting peer to request a full copy of the
2402	   other peer's routing table.

2404	          struct {
2405	            Boolean                send_update;
2406	            Destination            destination;
2407	            opaque                 overlay_specific_data<0..2^16-1>;
2408	          } RouteQueryReq;

2410	   The contents of the RouteQueryReq message are as follows:

2412	   send_update
2413	      A single byte.  This may be set to "true" to indicate that the
2414	      requester wishes the responder to initiate an Update request
2415	      immediately.  Otherwise, this value MUST be set to "false".

2417	   destination
2418	      The destination in which the requester is interested.  This may be
2419	      any valid destination object, including a Node-ID, a compressed ID,
2420	      or a Resource-ID.

2422	   overlay_specific_data
2423	      Other data as appropriate for the overlay.

2425	6.3.2.4.2.  Response Definition

2427	   A response to a successful RouteQueryReq request is a RouteQueryAns
2428	   message.  Its contents are completely overlay-specific.

2430	6.4.  Forwarding Layer

2432	   Each node maintains connections to a set of other nodes defined by
2433	   the topology plugin.

2435	6.4.1.  Transports

2437	   RELOAD can use multiple transports to send its messages.  Because ICE
2438	   is used to establish connections (see Section 6.4.2.1.3), RELOAD
2439	   nodes are able to detect which transports are offered by other nodes
2440	   and establish connections between each other.  Any transport protocol
2441	   needs to be able to establish a secure, authenticated connection, and
2442	   provide data origin authentication and message integrity for
2443	   individual data elements.  RELOAD currently supports two transport
2444	   protocols:

2446	   o  TLS [REF] over TCP
2447	   o  DTLS [RFC4347] over UDP

2449	   Note that although UDP does not strictly have "connections", both
2450	   TLS and DTLS have a handshake that establishes a stateful
2451	   association, and we simply refer to these associations as
2452	   "connections" for the purposes of this document.

2454	6.4.1.1.  Future Support for HIP

2456	   The P2PSIP Working Group has expressed interest in supporting a HIP-
2457	   based transport.  Such support would require specifying such details
2458	   as:

2460	   o  How to issue certificates which provide identities meaningful to
2461	      the HIP base exchange.  We anticipate that this would require a
2462	      mapping between ORCHIDs and NodeIds.
2463	   o  How to carry the HIP I1 and I2 messages.  We anticipate that this
2464	      would require defining a HIP Tunnel usage.
2465	   o  How to carry RELOAD messages over HIP.

2467	   We leave this work as a topic for another draft.

2469	6.4.1.2.  Reliability for Unreliable Transports

2471	   When RELOAD is carried over DTLS or another unreliable transport, it
2472	   needs to be used with a reliability and congestion control mechanism,
2473	   which is provided on a hop-by-hop basis and matches the semantics
2474	   TCP would provide.  The basic principle is that each message,
2475	   whether it carries a request or a response, will get an ACK and be
2476	   reliably retransmitted.  The receiver's job is very simple, limited
2477	   to just sending ACKs.  All the complexity is at the sender side.
2478	   This allows the sending implementation to trade off performance
2479	   versus implementation complexity without affecting the wire protocol.

2481	   In order to support unreliable transport, each message is wrapped in
2482	   a very simple framing layer (FramedMessage) which is used only on a
2483	   single hop.  This layer contains a sequence number which can then be
2484	   used for ACKs.

2486	6.4.1.2.1.  Framed Message Format

2488	   The definition of FramedMessage is:

2490	         enum {data (128), ack (129), (255)} FramedMessageType;

2492	         struct {
2493	           FramedMessageType       type;

2495	           select (type) {
2496	             case data:
2497	               uint24              sequence;
2498	               opaque              message<0..2^24-1>;

2500	             case ack:
2501	               uint24              ack_sequence;
2502	               uint32              received;
2503	           };
2504	         } FramedMessage;

2506	   The type field of the PDU is set to indicate whether the message is
2507	   data or an acknowledgement.  Note that these values have been set to
2508	   force the first bit to be high, thus allowing easy demultiplexing
2509	   with STUN.  All FramedMessageType values must be >= 128.
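   A receiver sharing one socket between STUN and RELOAD can demultiplex
   on the first octet, as the paragraph above describes.  The Python
   sketch below is illustrative only; it relies on the FramedMessageType
   values defined above and on the STUN rule that the two most
   significant bits of every STUN message are zero
   [I-D.ietf-behave-rfc3489bis].  The function name and its return
   labels are hypothetical.

```python
def classify_datagram(datagram: bytes) -> str:
    """Return "framed", "stun" or "unknown" based on the first octet."""
    if not datagram:
        raise ValueError("empty datagram")
    first = datagram[0]
    if first & 0x80:
        # FramedMessageType values are >= 128, so the high bit is set.
        return "framed"
    if (first & 0xC0) == 0:
        # STUN messages always start with two zero bits.
        return "stun"
    return "unknown"
```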

2511	   If the message is of type "data", then the remainder of the PDU is as
2512	   follows:

2514	   sequence
2515	      The sequence number of this message on this connection.

2517	   message
2518	      The original message that is being transmitted.

2520	   Each connection has its own sequence number.  Initially the value is
2521	   zero and it increments by exactly one for each message sent over that
2522	   connection.

2524	   When the receiver receives a message, it SHOULD immediately send an
2525	   ACK message.  The receiver MUST keep track of the 32 most recent
2526	   sequence numbers received on this association in order to generate
2527	   the appropriate ack.

2529	   If the PDU is of type "ack", the contents are as follows:

2531	   ack_sequence
2532	      The sequence number of the message being acknowledged.

2534	   received
2535	      A bitmask indicating whether or not each of the previous 32
2536	      packets has been received before the sequence number in
2537	      ack_sequence.  The high order bit represents the first packet in
2538	      the sequence space.
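   The following Python sketch shows one way to serialize and parse
   FramedMessage PDUs, assuming the usual TLS presentation-language
   conventions (network byte order, and a 3-octet length prefix for the
   message<0..2^24-1> vector).  Function names are hypothetical, and
   sequence-number wraparound is not handled.

```python
import struct
from typing import Tuple

DATA, ACK = 128, 129  # FramedMessageType values


def _u24(n: int) -> bytes:
    if not 0 <= n < 1 << 24:
        raise ValueError("value does not fit in uint24")
    return n.to_bytes(3, "big")


def encode_data(sequence: int, message: bytes) -> bytes:
    # type (1) | sequence (3) | message length (3) | message
    return bytes([DATA]) + _u24(sequence) + _u24(len(message)) + message


def encode_ack(ack_sequence: int, received: int) -> bytes:
    # type (1) | ack_sequence (3) | received bitmask (4)
    return bytes([ACK]) + _u24(ack_sequence) + struct.pack("!I", received)


def decode(pdu: bytes) -> Tuple[str, int, object]:
    kind = pdu[0]
    if kind == DATA:
        seq = int.from_bytes(pdu[1:4], "big")
        length = int.from_bytes(pdu[4:7], "big")
        body = pdu[7:7 + length]
        if len(body) != length:
            raise ValueError("truncated data PDU")
        return ("data", seq, body)
    if kind == ACK:
        if len(pdu) < 8:
            raise ValueError("truncated ack PDU")
        seq = int.from_bytes(pdu[1:4], "big")
        (received,) = struct.unpack("!I", pdu[4:8])
        return ("ack", seq, received)
    raise ValueError(f"unknown FramedMessageType {kind}")
```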

2540	   The received field bits in the ACK provide a very high degree of
2541	   redundancy, allowing the sender to determine which packets the
2542	   receiver received and to estimate packet loss rates.  If the sender
2543	   also keeps track of the time at which recent sequence numbers were
2544	   sent, the RTT can be estimated.
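   A minimal sketch of how a receiver could build the received bitmask
   and how a sender could read it back and take an RTT sample.  It
   assumes that bit 31 (the high-order bit) maps to sequence number
   ack_sequence - 32 and bit 0 maps to ack_sequence - 1, which is one
   reading of "the first packet in the sequence space"; it also ignores
   uint24 wraparound.  All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

WINDOW = 32  # the receiver tracks the 32 most recent sequence numbers


def build_received_mask(ack_sequence: int, received: Iterable[int]) -> int:
    """Receiver side: bit (i - 1) is set if ack_sequence - i was received."""
    seen = set(received)
    mask = 0
    for i in range(1, WINDOW + 1):
        if ack_sequence - i in seen:
            mask |= 1 << (i - 1)
    return mask


def acknowledged(ack_sequence: int, mask: int) -> Set[int]:
    """Sender side: every sequence number this ACK reports as received."""
    acked = {ack_sequence}
    for i in range(1, WINDOW + 1):
        if mask & (1 << (i - 1)):
            acked.add(ack_sequence - i)
    return acked


def rtt_sample(send_times: Dict[int, float], ack_sequence: int,
               now: float) -> Optional[float]:
    """RTT estimate from the recorded send time of the acknowledged PDU."""
    sent = send_times.get(ack_sequence)
    return None if sent is None else now - sent
```

   Sequence numbers the sender transmitted inside the window but that
   are absent from acknowledged() are candidates for loss accounting
   and retransmission.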

2546	6.4.1.2.2.  Retransmission and Flow Control

2548	   Because the receiver's role is limited to providing packet
2549	   acknowledgements, a wide variety of congestion control algorithms can
2550	   be implemented on the sender side while using the same basic wire
2551	   protocol.  It is RECOMMENDED that senders implement TFRC-SP [RFC4828]
2552	   and use the received bitmask to compute packet loss event rates.
2553	   Senders MUST implement a retransmission and congestion control
2554	   scheme no more aggressive than TFRC-SP.

2556	6.4.1.3.  Fragmentation and Reassembly

2558	   In order to allow transport over datagram protocols, RELOAD messages
2559	   may be fragmented.  If a message is too large for a peer to transmit
2560	   to the next peer, it MUST fragment the message.  Note that this
2561	   implies that intermediate peers may re-fragment messages if the
2562	   incoming and outgoing paths have different maximum datagram sizes.
2563	   Intermediate peers SHOULD NOT reassemble fragments.

2565	   Upon receipt of a fragmented message by the intended peer, the peer
2566	   holds the fragments in a holding buffer until the entire message has
2567	   been received.  The message is then reassembled into a single
2568	   unfragmented message and processed.  In order to mitigate denial of
2569	   service attacks, receivers SHOULD time out incomplete fragments.
2570	   [[TODO:  Describe algorithm]]
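   Because this document does not yet define the fragment header or the
   reassembly algorithm, the following Python is only an illustration of
   a holding buffer with a timeout.  It assumes, hypothetically, that
   each fragment carries a message identifier, a byte offset, and the
   total message length; none of these fields, the class names, or the
   default timeout value are specified by RELOAD.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class _Pending:
    total_length: int
    first_seen: float
    pieces: Dict[int, bytes] = field(default_factory=dict)  # offset -> data


class ReassemblyBuffer:
    """Holds fragments until a message is complete or times out."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout  # hypothetical default, in seconds
        self._pending: Dict[Hashable, _Pending] = {}

    def expire(self, now: float) -> int:
        """Drop incomplete messages older than the timeout; return count."""
        stale = [k for k, p in self._pending.items()
                 if now - p.first_seen > self.timeout]
        for k in stale:
            del self._pending[k]
        return len(stale)

    def add(self, message_id: Hashable, offset: int, total_length: int,
            data: bytes, now: Optional[float] = None) -> Optional[bytes]:
        """Store one fragment; return the whole message once complete."""
        now = time.monotonic() if now is None else now
        self.expire(now)
        entry = self._pending.setdefault(
            message_id, _Pending(total_length, now))
        if offset < 0 or offset + len(data) > entry.total_length:
            raise ValueError("fragment lies outside the message")
        entry.pieces[offset] = data
        covered = 0  # length of the contiguous prefix received so far
        for off in sorted(entry.pieces):
            if off > covered:
                return None  # there is still a gap
            covered = max(covered, off + len(entry.pieces[off]))
        if covered < entry.total_length:
            return None
        message = bytearray(entry.total_length)
        for off, piece in entry.pieces.items():
            message[off:off + len(piece)] = piece
        del self._pending[message_id]
        return bytes(message)
```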

2572	6.4.2.  Connection Management Methods

2574	   This section defines the methods RELOAD uses to form and maintain
2575	   connections between nodes in the overlay.  Three methods are defined:

2577	   Connect:    used to form connections between nodes.  When node A
2578	      wants to connect to node B, it sends a Connect message to node B
2579	      through the overlay.  The Connect contains A's ICE parameters.  B
2580	      responds with its ICE parameters and the two nodes perform ICE to
2581	      form a connection.
2582	   Ping:    is a simple request/response which is used to verify
2583	      connectivity (analogous to the UNIX ping command) along a path and
2584	      to gather a small amount of information about the resources held
2585	      by the target peer.
2586	   Tunnel:    in some cases, it will be too expensive for an application
2587	      layer protocol to set up a connection in order to send a small
2588	      number of messages.  The Tunnel message allows applications to
2589	      route individual application layer protocol messages through the
2590	      overlay.

2592	6.4.2.1.  Connect

2594	   A node sends a Connect request when it wishes to establish a direct
2595	   TCP or UDP connection to another node for the purposes of sending
2596	   RELOAD messages or application layer protocol messages, such as SIP.
2597	   Detailed procedures for the Connect and its response are described in
2598	   the following subsections.

2600	   A Connect in and of itself does not result in updating the routing
2601	   table of either node.  That function is performed by Updates.  If
2602	   node A has Connected to node B, but not received any Updates from B,
2603	   it MAY route messages which are directly addressed to B through that
2604	   channel but MUST NOT route messages through B to other peers via that
2605	   channel.  The process of Connecting is separate from the process of
2606	   becoming a peer (using Update) to prevent half-open states where a
2607	   node has started to form connections but is not really ready to act
2608	   as a peer.

2610	6.4.2.1.1.  Request Definition

2612	   A ConnectReq message contains the requesting peer's ICE connection
2613	   parameters formatted into a binary structure.

2615	         typedef opaque            IceCandidate<0..2^16-1>;

2617	         struct  {
2618	           opaque                  ufrag<0..2^8-1>;
2619	           opaque                  password<0..2^8-1>;
2620	           uint16                  application;
2621	           opaque                  role<0..2^8-1>;
2622	           IceCandidate            candidates<0..2^16-1>;
2623	         } ConnectReqAns;

2625	   The values contained in ConnectReq and ConnectAns are:

2627	   ufrag
2628	      The ICE username fragment.

2630	   password
2631	      The ICE password.

2633	   application
2634	      A 16-bit port number.  This port number represents the IANA
2635	      registered port of the protocol that is going to be sent on this
2636	      connection.  For SIP, this is 5060 or 5061, and for RELOAD is TBD.
2637	      By using the IANA registered port, we avoid the need for an
2638	      additional registry and allow RELOAD to be used to set up
2639	      connections for any existing or future application protocol.

2641	   role
2642	      An active/passive/actpass attribute from RFC 4145 [RFC4145].

2644	   candidates
2645	      One or more ICE candidate values.  Each candidate has an IP
2646	      address, IP address family, port, transport protocol, priority,
2647	      foundation, component ID, STUN type and related address.  The
2648	      candidates field is a list of string candidate values from ICE.

2650	   These values should be generated using the procedures described in
2651	   Section 6.4.2.1.3.
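   As an illustration of the binary encoding, the Python sketch below
   serializes a ConnectReqAns using TLS presentation-language rules:
   each variable-length vector is preceded by a big-endian length, in
   octets, whose width is set by the vector's maximum size.  It is not
   normative, and the function names are hypothetical.

```python
import struct
from typing import List


def _vector(data: bytes, length_octets: int) -> bytes:
    """Encode an opaque vector with a length prefix of the given width."""
    if len(data) >= 1 << (8 * length_octets):
        raise ValueError("vector too long for its length prefix")
    return len(data).to_bytes(length_octets, "big") + data


def encode_connect(ufrag: bytes, password: bytes, application: int,
                   role: bytes, candidates: List[bytes]) -> bytes:
    """Serialize the ConnectReqAns structure."""
    # Each IceCandidate is itself an opaque<0..2^16-1>; the outer
    # candidates vector then gets its own 2-octet byte-length prefix.
    body = b"".join(_vector(c, 2) for c in candidates)
    return (_vector(ufrag, 1) + _vector(password, 1)
            + struct.pack("!H", application)
            + _vector(role, 1)
            + _vector(body, 2))
```

   For example, encode_connect(b"uf", b"pw", 5060, b"actpass",
   [b"1 1 UDP 2130706431 192.0.2.1 5000 typ host"]) would describe a SIP
   connection with a single host candidate.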

2653	6.4.2.1.2.  Response Definition

2655	   If a peer receives a Connect request, it SHOULD process the request
2656	   and generate its own response, encoded as a ConnectReqAns.  It
2657	   should then begin ICE checks.  When a peer receives a Connect
2658	   response, it SHOULD parse the response and begin its own ICE checks.

2660	6.4.2.1.3.  Using ICE With RELOAD

2662	   This section describes the profile of ICE that is used with RELOAD.
2663	   RELOAD implementations MUST implement full ICE.  Because RELOAD
2664	   always tries to use TCP and then UDP as a fallback, there will be
2665	   multiple candidates of the same IP version, which requires full ICE.

2667	   In ICE as defined by [I-D.ietf-mmusic-ice], SDP is used to carry the
2668	   ICE parameters.  In RELOAD, this function is performed by a binary
2669	   encoding in the Connect method.  This encoding is more restricted
2670	   than the SDP encoding because the RELOAD environment is simpler:

2672	   o  Only a single media stream is supported.
2673	   o  In this case, the "stream" refers not to RTP or other types of
2674	      media, but rather to a connection for RELOAD itself or for SIP
2675	      signaling.
2676	   o  RELOAD only allows for a single offer/answer exchange.  Unlike the
2677	      usage of ICE within SIP, there is never a need to send a
2678	      subsequent offer to update the default candidates to match the
2679	      ones selected by ICE.

2681	   An agent follows the ICE specification as described in
2682	   [I-D.ietf-mmusic-ice] and [I-D.ietf-mmusic-ice-tcp] with the changes
2683	   and additional procedures described in the subsections below.

2685	6.4.2.1.4.  Collecting STUN Servers

2687	   ICE relies on the node having one or more STUN servers to use.  In
2688	   conventional ICE, it is assumed that nodes are configured with one or
2689	   more STUN servers through some out-of-band mechanism.  This is still
2690	   possible in RELOAD but RELOAD also learns STUN servers as it connects
2691	   to other peers.  Because all RELOAD peers implement ICE and use STUN
2692	   keepalives, every peer is a STUN server [I-D.ietf-behave-rfc3489bis].
2693	   Accordingly, any peer a node knows will be willing to be a STUN
2694	   server -- though of course it may be behind a NAT.

2696	   A peer on a well-provisioned wide-area overlay will be configured
2697	   with one or more bootstrap peers.  These peers make an initial list
2698	   of STUN servers.  However, as the peer forms connections with
2699	   additional peers, it acquires more peers it can use as STUN servers.

2701	   Because complicated NAT topologies are possible, a peer may need more
2702	   than one STUN server.  Specifically, a peer that is behind a single
2703	   NAT will typically observe only two IP addresses in its STUN checks:
2704	   its local address and its server reflexive address from a STUN server
2705	   outside its NAT.  However, if there are more NATs involved, it may
2706	   discover that it learns additional server reflexive addresses (which
2707	   vary based on where in the topology the STUN server is).  To maximize
2708	   the chance of achieving a direct connection, a peer SHOULD group
2709	   other peers by the peer-reflexive addresses it discovers through
2710	   them.  It SHOULD then select one peer from each group to use as a
2711	   STUN server for future connections.

2713	   Only peers to which the peer currently has connections may be used.
2714	   If the connection to that host is lost, it MUST be removed from the
2715	   list of STUN servers and a new server from the same group SHOULD be
2716	   selected.
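   The grouping rule above can be sketched in Python as follows.  This
   is an illustration only: peer identifiers and reflexive addresses are
   represented as plain strings (a hypothetical simplification), the
   first peer seen in a group is kept as its STUN server, and a
   replacement from the same group is picked when that peer's connection
   is lost.

```python
from collections import defaultdict
from typing import Dict, List


class StunServerSelector:
    """Keeps one connected peer per observed reflexive address."""

    def __init__(self) -> None:
        self._groups: Dict[str, List[str]] = defaultdict(list)
        self._chosen: Dict[str, str] = {}  # reflexive address -> peer

    def peer_connected(self, peer_id: str, reflexive_addr: str) -> None:
        self._groups[reflexive_addr].append(peer_id)
        self._chosen.setdefault(reflexive_addr, peer_id)

    def peer_disconnected(self, peer_id: str) -> None:
        # Only currently connected peers may act as STUN servers.
        for addr, members in list(self._groups.items()):
            if peer_id not in members:
                continue
            members.remove(peer_id)
            if self._chosen.get(addr) == peer_id:
                del self._chosen[addr]
                if members:
                    self._chosen[addr] = members[0]  # same-group replacement
            if not members:
                del self._groups[addr]

    def stun_servers(self) -> List[str]:
        return sorted(self._chosen.values())
```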

2718	6.4.2.1.5.  Gathering Candidates

2720	   When a node wishes to establish a connection for the purposes of
2721	   RELOAD signaling or SIP signaling (or any other application protocol
2722	   for that matter), it follows the process of gathering candidates as
2723	   described in Section 4 of ICE [I-D.ietf-mmusic-ice].  RELOAD utilizes
2724	   a single component, as does SIP.  Consequently, gathering for these
2725	   "streams" requires a single component.

2727	   An agent MUST implement ICE-TCP [I-D.ietf-mmusic-ice-tcp], and MUST
2728	   gather at least one UDP and one TCP host candidate for RELOAD and for
2729	   SIP.

2731	   The ICE specification assumes that an ICE agent is configured with,
2732	   or somehow knows of, TURN and STUN servers.  RELOAD provides a way
2733	   for an agent to learn these by querying the overlay, as described in
2734	   Section 6.4.2.1.4 and Section 9.

2736	   The agent SHOULD prioritize its TCP-based candidates over its UDP-
2737	   based candidates in the prioritization described in Section 4.1.2 of
2738	   ICE [I-D.ietf-mmusic-ice].
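   ICE computes candidate priority as 2^24 * (type preference) +
   2^8 * (local preference) + (256 - component ID).  One way to favor TCP
   candidates, sketched below in Python, is to give them a larger local
   preference than UDP candidates of the same type.  The specific
   local-preference values are hypothetical choices, not values mandated
   by RELOAD or ICE, and the sketch ignores the way
   [I-D.ietf-mmusic-ice-tcp] further folds TCP connection direction into
   the local preference.  RELOAD uses a single component, so the
   component ID is 1.

```python
HOST_TYPE_PREF = 126     # ICE's recommended type preference for host
TCP_LOCAL_PREF = 65535   # hypothetical: TCP ranked above UDP
UDP_LOCAL_PREF = 32767   # hypothetical
RELOAD_COMPONENT = 1     # RELOAD uses a single component


def candidate_priority(type_pref: int, local_pref: int,
                       component_id: int = RELOAD_COMPONENT) -> int:
    """ICE candidate priority formula."""
    if not 0 <= type_pref <= 126:
        raise ValueError("type preference must be in 0..126")
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must be in 0..65535")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be in 1..256")
    return (type_pref << 24) + (local_pref << 8) + (256 - component_id)
```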

2740	   The default candidate selection described in Section 4.1.3 of ICE is
2741	   ignored; defaults are not signaled or utilized by RELOAD.

2743	6.4.2.1.6.  Encoding the Connect Message

2745	   Section 4.3 of ICE describes procedures for encoding the SDP for
2746	   conveying RELOAD or SIP ICE candidates.  Instead of actually encoding
2747	   an SDP, the candidate information (IP address and port and transport
2748	   protocol, priority, foundation, component ID, type and related
2749	   address) is carried within the attributes of the Connect request or
2750	   its response.  Similarly, the username fragment and password are
2751	   carried in the Connect message or its response.  Section 6.4.2.1
2752	   describes the detailed attribute encoding for Connect.  The Connect
2753	   request and its response do not contain any default candidates or the
2754	   ice-lite attribute, as these features of ICE are not used by RELOAD.
2755	   The Connect request and its response also contain an application
2756	   attribute, whose value (the IANA-registered port of, for example,
2757	   SIP or RELOAD) indicates what protocol is to be run over the
2758	   connection.  The RELOAD Connect request MUST only be utilized to set
2759	   up connections for application protocols that can be multiplexed with STUN.

2761	   Since the Connect request contains the candidate information and
2762	   short term credentials, it is considered as an offer for a single
2763	   media stream that happens to be encoded in a format different from
2764	   SDP, but is otherwise considered a valid offer for the purposes of
2765	   following the ICE specification.  Similarly, the Connect response is
2766	   considered a valid answer for the purposes of following the ICE
2767	   specification.

2769	   Similarly, the node MUST implement the active, passive, and actpass
2770	   attributes from RFC 4145 [RFC4145].  However, here they refer
2771	   strictly to the role of active or passive for the purposes of TLS
2772	   handshaking.  The TCP connection directions are signaled as part of
2773	   the ICE candidate attribute.

2775	6.4.2.1.7.  Verifying ICE Support

2777	   An agent MUST skip the verification procedures in Sections 5.1 and
2778	   6.1 of ICE.  Since RELOAD requires full ICE from all agents, this
2779	   check is not required.

2781	6.4.2.1.8.  Role Determination

2783	   The roles of controlling and controlled as described in Section 5.2
2784	   of ICE are still utilized with RELOAD.  However, the offerer (the
2785	   entity sending the Connect request) will always be controlling, and
2786	   the answerer (the entity sending the Connect response) will always be
2787	   controlled.  The connectivity checks MUST still contain the ICE-
2788	   CONTROLLED and ICE-CONTROLLING attributes, however, even though the
2789	   role reversal capability for which they are defined will never be
2790	   needed with RELOAD.  This is to allow for a common codebase between
2791	   ICE for RELOAD and ICE for SDP.

2793	6.4.2.1.9.  Connectivity Checks

2795	   The processes of forming check lists in Section 5.7 of ICE,
2796	   scheduling checks in Section 5.8, and performing connectivity
2797	   checks in Section 7 are used with RELOAD without change.

2799	6.4.2.1.10.  Concluding ICE

2801	   The controlling agent MUST utilize regular nomination.  This is to
2802	   ensure consistent state on the final selected pairs without the need
2803	   for an updated offer, as RELOAD does not generate additional offer/
2804	   answer exchanges.

2806	   The procedures in Section 8 of ICE are followed to conclude ICE, with
2807	   the following exceptions:

2809	   o  The controlling agent MUST NOT attempt to send an updated offer
2810	      once the state of its single media stream reaches Completed.
2811	   o  Once the state of ICE reaches Completed, the agent can immediately
2812	      free all unused candidates.  This is because RELOAD does not have
2813	      the concept of forking, and thus the three second delay in Section
2814	      8.3 of ICE does not apply.

2816	6.4.2.1.11.  Subsequent Offers and Answers

2818	   An agent MUST NOT send a subsequent offer or answer.  Thus, the
2819	   procedures in Section 9 of ICE MUST be ignored.

2821	6.4.2.1.12.  Media Keepalives

2823	   STUN MUST be utilized for the keepalives described in Section 10 of
2824	   ICE.

2826	6.4.2.1.13.  Sending Media

2828	   The procedures of Section 11 of ICE apply to RELOAD as well.
2829	   However, in this case, the "media" takes the form of application
2830	   layer protocols (RELOAD or SIP, for example) over TLS or DTLS.
2831	   Consequently, once ICE processing completes, the agent will begin
2832	   TLS or DTLS procedures to establish a secure connection.  The nodes
2833	   MUST verify that the certificate presented in the handshake matches
2834	   the identity of the other peer as found in the Connect message.
2835	   Once the TLS or DTLS signaling is complete, the application protocol
2836	   is free to use the connection.

2838	   The concept of a previous selected pair for a component does not
2839	   apply to RELOAD, since ICE restarts are not possible with RELOAD.

2841	6.4.2.1.14.  Receiving Media

2843	   An agent MUST be prepared to receive packets for the application
2844	   protocol (TLS or DTLS carrying RELOAD, SIP or anything else) at any
2845	   time.  The jitter and RTP considerations in Section 11 of ICE do not
2846	   apply to RELOAD or SIP.

2848	6.4.2.2.  Ping

2850	   Ping is used to test connectivity along a path.  A ping can be
2851	   addressed to a specific Node-ID, the peer controlling a given
2852	   location (by using a resource ID) or to the broadcast Node-ID (all
2853	   1s).  In each case, the target Node-IDs respond with a simple
2854	   response containing some status information.

2856	6.4.2.2.1.  Request Definition

2858	   The PingReq message contains a list (potentially empty) of the pieces
2859	   of status information that the requester would like the responder to
2860	   provide.

2862	         enum { responsible_set(1), num_resources(2), (255)}
2863	              PingInformationType;

2865	         struct {
2866	           PingInformationType     requested_info<0..2^8-1>;
2867	         } PingReq;

2869	   The two currently defined values for PingInformationType are:

2871	   responsible_set
2872	      indicates that the peer should respond with the fraction of the
2873	      overlay for which the responding peer is responsible.

2875	   num_resources
2876	      indicates that the peer should respond with the number of
2877	      resources currently being stored by the peer.

2879	6.4.2.2.2.  Response Definition

2881	   A successful PingAns response contains the information elements
2882	   requested by the peer.

2884	          struct {
2885	            PingInformationType    type;

2887	            select (type) {
2888	              case responsible_set:
2889	                uint32             responsible_ppb;

2891	              case num_resources:
2892	                uint32             num_resources;

2894	              /* This type may be extended */

2896	            };
2897	          } PingInformation;

2899	          struct {
2900	            uint64                 response_id;
2901	            PingInformation        ping_info<0..2^16-1>;
2902	          } PingAns;

2904	   A PingAns message contains the following elements:

2906	   response_id
2907	      A randomly generated 64-bit response ID.  This is used to
2908	      distinguish Ping responses in cases where the Ping request is
2909	      multicast.

2911	   ping_info
2912	      A sequence of PingInformation structures, as defined above.

2914	   The value for each currently defined Ping information type is a
2915	   32-bit unsigned integer.  For type "responsible_set", the value
2916	   (responsible_ppb) is the fraction of the overlay for which the peer
2917	   is responsible, in parts per billion.  For type "num_resources", it
2918	   is the number of resources the peer is storing.

2920	   The responding peer SHOULD include any values that the requesting
2921	   peer requested and that it recognizes.  They SHOULD be returned in
2922	   the requested order.  Any other values MUST NOT be returned.
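   A Python sketch of building a PingAns from the types listed in a
   PingReq, following the rules above: recognized types are returned in
   the requested order, unrecognized ones are omitted, and
   responsible_ppb is the responsible fraction scaled to parts per
   billion.  It assumes one-octet PingInformationType values and network
   byte order; the function name and the way the fraction and resource
   count are supplied are hypothetical.

```python
import secrets
import struct
from typing import List, Optional

RESPONSIBLE_SET, NUM_RESOURCES = 1, 2  # PingInformationType values


def build_ping_ans(requested: List[int], responsible_fraction: float,
                   num_resources: int,
                   response_id: Optional[int] = None) -> bytes:
    if not 0.0 <= responsible_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if response_id is None:
        response_id = secrets.randbits(64)  # randomly generated 64-bit ID
    values = {
        RESPONSIBLE_SET: round(responsible_fraction * 1_000_000_000),
        NUM_RESOURCES: num_resources,
    }
    # Each PingInformation: type (1 octet) followed by a uint32 value.
    info = b"".join(struct.pack("!BI", t, values[t])
                    for t in requested if t in values)
    # response_id (uint64), then ping_info with a 2-octet length prefix.
    return struct.pack("!QH", response_id, len(info)) + info
```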

2924	6.4.2.3.  Tunnel

2926	   A node sends a Tunnel request when it wishes to exchange application-
2927	   layer protocol messages without the expense of establishing a direct
2928	   connection via Connect or when ICE is unable to establish a direct
2929	   connection via Connect and a TURN relay is not available.  The
2930	   application-level protocols that are routed via the Tunnel request
2931	   are defined by that application's usage.

2933	   Note:  The decision of whether to route application-level traffic
2934	      across the overlay or to open a direct connection requires careful
2935	      consideration of the overhead involved in each transaction.
2936	      Establishing a direct connection incurs a greater initial setup
2937	      cost, but after setup, communication is faster and imposes no
2938	      overhead on the overlay.  For example, for the SIP usage, an
2939	      INVITE request to establish a voice call might be routed over the
2940	      overlay, a SUBSCRIBE with regular updates would be better sent
2941	      over a Connect-established connection, and tunneling media would
2942	      impose too great a load on the overlay and likely perform
2943	      unacceptably.  However, there may be a tradeoff between locating
2944	      TURN servers and relying on Tunnel for packet routing.

2946	   When a usage requires the Tunnel method, it must specify the specific
2947	   application protocol(s) that will be Tunneled and for each protocol,
2948	   specify:

2950	   o  An application attribute that indicates the protocol being
2951	      tunneled.  This is the IANA-registered port of the application
2952	      protocol.
2953	   o  The conditions under which the application will be Tunneled over
2954	      the overlay rather than using a direct Connect.
2955	   o  A mechanism for moving future application-level communication from
2956	      Tunneling on the overlay to a direct Connection, or an explanation
2957	      why this is unnecessary.
2958	   o  A means of associating messages together as required for dialog-
2959	      oriented or request/response-oriented protocols.
2960	   o  How the Tunneled message (and associated responses) will be
2961	      delivered to the correct application.  This is particularly
2962	      important if there might be multiple instances of the application
2963	      on or behind a single peer.

2965	6.4.2.3.1.  Request Definition

2967	   The TunnelReq message contains the application PDU that the
2968	   requesting peer wishes to transmit, along with some control
2969	   information identifying the handling of the PDU.

2971	          struct  {
2972	            uint16                 application;
2973	            opaque                 dialog_id<0..2^8-1>;
2974	            opaque                 application_pdu<0..2^24-1>;
2975	          } TunnelReq;

2977	   The values contained in the TunnelReq are:

2979	   application
2980	      A 16-bit port number.  This port number represents the IANA
2981	      registered port of the protocol that is going to be carried in
2982	      this Tunnel request.  For SIP, this is 5060 or 5061, and for
2983	      RELOAD is TBD.  By using the IANA registered port, we avoid the
2984	      need for an additional registry and allow RELOAD to tunnel any
2985	      existing or future application protocol.

2987	   dialog_id
2988	      An arbitrary string providing an application-defined way of
2989	      associating related Tunneled messages.  This attribute may also
2990	      encode sequence information as required by the application
2991	      protocol.

2993	   application_pdu
2994	      An application PDU in the format specified by the application.
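   The following Python sketch encodes and decodes a TunnelReq under the
   same presentation-language conventions (a one-octet length for
   dialog_id and a three-octet length for application_pdu).  It is
   illustrative only, and the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_tunnel_req(application: int, dialog_id: bytes,
                      application_pdu: bytes) -> bytes:
    if len(dialog_id) > 0xFF:
        raise ValueError("dialog_id longer than 255 octets")
    if len(application_pdu) > 0xFFFFFF:
        raise ValueError("application_pdu longer than 2^24-1 octets")
    return (struct.pack("!HB", application, len(dialog_id)) + dialog_id
            + len(application_pdu).to_bytes(3, "big") + application_pdu)


def decode_tunnel_req(data: bytes) -> Tuple[int, bytes, bytes]:
    application, dlen = struct.unpack_from("!HB", data, 0)
    dialog_id = data[3:3 + dlen]
    off = 3 + dlen
    plen = int.from_bytes(data[off:off + 3], "big")
    pdu = data[off + 3:off + 3 + plen]
    if len(dialog_id) != dlen or len(pdu) != plen:
        raise ValueError("truncated TunnelReq")
    return application, dialog_id, pdu
```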

2996	6.4.2.3.2.  Response Definition

2998	   A TunnelAns message serves as confirmation that the message was
2999	   received by the destination peer.  It implies nothing about the
3000	   processing of the application.  If the application protocol specifies
3001	   an acknowledgement or confirmation, that must be sent with a separate
3002	   Tunnel request.  The TunnelAns message is empty (it has a zero-length
3003	   payload).

3005	7.  Data Storage Protocol

3007	   RELOAD provides a set of generic mechanisms for storing and
3008	   retrieving data in the Overlay Instance.  These mechanisms can be
3009	   used for new applications simply by defining new code points and a
3010	   small set of rules.  No new protocol mechanisms are required.

3012	   The basic unit of stored data is a single StoredData structure:

3014	         struct {
3015	           uint32                  length;
3016	           uint64                  storage_time;
3017	           uint32                  lifetime;
3018	           StoredDataValue         value;
3019	           Signature               signature;
3020	         } StoredData;

3022	   The contents of this structure are as follows:

3024	   length
3025	      The length of the rest of the structure in octets.

3027	   storage_time
3028	      The time when the data was stored in absolute time, represented in
3029	      seconds since the Unix epoch.  Any attempt to store a data value
3030	      with a storage time before that of a value already stored at this
3031	      location MUST generate a 412 error.  This prevents rollback
3032	      attacks.  Note that this does not require synchronized clocks:
3033	      the receiving peer uses the storage time in the previous store,
3034	      not its own clock.

3036	   lifetime
3037	      The validity period for the data, in seconds, starting from the
3038	      time of store.

3040	   value
3041	      The data value itself, as described in Section 7.2.

3043	   signature
3044	      A signature over the data value.  Section 7.1 describes the
3045	      signature computation.  The element is formatted as described in
3046	      Section 6.2.4.

3048	   Each resource-id specifies a single location in the Overlay Instance.
3049	   However, each location may contain multiple StoredData values
3050	   distinguished by kind-id.  The definition of a kind describes both
3051	   the data values which may be stored and the data model of the data.
3052	   Some data models allow multiple values to be stored under the same
3053	   kind-id.  Section 7.2 describes the available data models.
3054	   Thus, for instance, a given resource-id might contain a single-value
3055	   element stored under kind-id X and an array containing multiple
3056	   values stored under kind-id Y.

3058	7.1.  Data Signature Computation

3060	   Each StoredData element is individually signed.  However, the
3061	   signature also must be self-contained and cover the kind-id and
3062	   resource-id even though they are not present in the StoredData
3063	   structure.  The input to the signature algorithm is:

3065	      resource_id + kind + StoredData

3067	   Where these values are:

3069	   resource_id
3070	      The resource ID where this data is stored.

3072	   kind
3073	      The kind-id for this data.

3075	   StoredData
3076	      The contents of the stored data value, as described in the
3077	      previous sections.

3079	   [OPEN ISSUE:  Should we include the identity in the string that forms
3080	   the input to the signature algorithm?]

3082	   Once the signature has been computed, the signature is represented
3083	   using a signature element, as described in Section 6.2.4.
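   The following Python sketch shows one way the signature input could
   be assembled.  It is illustrative only: encoding the kind-id as a
   32-bit big-endian unsigned integer is an assumption, not something
   this section specifies, and the StoredData octets are assumed to
   exclude the signature field, since a signature cannot cover itself.
   The function name signature_input is hypothetical.

```python
import struct


def signature_input(resource_id: bytes, kind_id: int, stored_data: bytes) -> bytes:
    """Return resource_id + kind + StoredData, the octets handed to the signer.

    Assumptions (not taken from this section): kind_id is serialized as a
    32-bit big-endian unsigned integer, and stored_data is the encoded
    StoredData structure without its signature field.
    """
    if not 0 <= kind_id <= 0xFFFFFFFF:
        raise ValueError("kind_id does not fit the assumed uint32 encoding")
    # Prepending resource-id and kind-id makes the signature self-contained.
    return resource_id + struct.pack("!I", kind_id) + stored_data
```

   Because the resource-id and kind-id are part of the input, the same
   StoredData copied to another location or kind yields a different
   signature input, so a replayed value fails verification there.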

3085	7.2.  Data Models

3087	   The protocol currently defines the following data models:

3089	   o  single value
3090	   o  array
3091	   o  dictionary

3093	   These are represented with the StoredDataValue structure:

3095	         enum { reserved(0), single_value(1), array(2),
3096	                dictionary(3), (255)} DataModel;

3098	         struct {
3099	           Boolean                exists;
3100	           opaque                 value<0..2^32-1>;
3101	         } DataValue;

3103	         select (DataModel) {
3104	           case single_value:
3105	             DataValue             single_value_entry;

3107	           case array:
3108	             ArrayEntry            array_entry;

3110	           case dictionary:
3111	             DictionaryEntry       dictionary_entry;

3113	           /* This structure may be extended */
3114	         } StoredDataValue;

3116	   We now discuss the properties of each data model in turn:

3118	7.2.1.  Single Value

3120	   A single-value element is a simple, opaque sequence of bytes.  There
3121	   may be only one single-value element for each resource-id, kind-id
3122	   pair.

3124	   A single value element is represented as a DataValue, which contains
3125	   the following two values:

3127	   exists
3128	      This value indicates whether the value exists at all.  If it is
3129	      set to False, it means that no value is present.  If it is True,
3130	      that means that a value is present.  This gives the protocol a
3131	      mechanism for indicating nonexistence as opposed to emptiness.

3133	   value
3134	      The stored data.

3136	7.2.2.  Array

3138	   An array is a set of opaque values addressed by an integer index.
3139	   Arrays are zero based.  Note that arrays can be sparse.  For
3140	   instance, a Store of "X" at index 2 in an empty array produces an
3141	   array with the values [NA, NA, "X"].  Future attempts to fetch
3142	   elements at index 0 or 1 will return values with "exists" set to
3143	   False.

3145	   An array element is represented as an ArrayEntry:

3147	          struct {
3148	            uint32                  index;
3149	            DataValue               value;
3150	          } ArrayEntry;

3152	   The contents of this structure are:

3154	   index
3155	      The index of the data element in the array.

3157	   value
3158	      The stored data.

3160	7.2.3.  Dictionary

3162	   A dictionary is a set of opaque values indexed by an opaque key with
3163	   one value for each key.  A single dictionary entry is represented as
3164	   a DictionaryEntry:

3168	          typedef opaque           DictionaryKey<0..2^16-1>;

3170	          struct {
3171	            DictionaryKey          key;
3172	            DataValue              value;
3173	          } DictionaryEntry;

3175	   The contents of this structure are:

3176	   key
3177	      The dictionary key for this value.

3178	   value
3179	      The stored data.
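   As a non-normative illustration, the sketch below serializes a
   DictionaryEntry and the DataValue it contains using the vector rules
   of the TLS presentation language, in which a variable-length vector
   is prefixed by a length field just large enough to hold its maximum
   size.  Encoding Boolean as a single octet is an assumption, and the
   function names are hypothetical.

```python
import struct


def encode_data_value(exists: bool, value: bytes) -> bytes:
    """Encode a DataValue: Boolean exists, then opaque value<0..2^32-1>."""
    if len(value) > 0xFFFFFFFF:
        raise ValueError("value exceeds 2^32-1 octets")
    # Assumption: Boolean is one octet (0 = False, 1 = True).
    # A <0..2^32-1> vector carries a 4-octet length prefix.
    return struct.pack("!BI", 1 if exists else 0, len(value)) + value


def encode_dictionary_entry(key: bytes, exists: bool, value: bytes) -> bytes:
    """Encode a DictionaryEntry: DictionaryKey key, then DataValue value."""
    if len(key) > 0xFFFF:
        raise ValueError("DictionaryKey exceeds 2^16-1 octets")
    # DictionaryKey is opaque <0..2^16-1>, so it has a 2-octet length prefix.
    return struct.pack("!H", len(key)) + key + encode_data_value(exists, value)
```

   Note that an empty key is legal (the lower bound is 0), so the
   exists flag, not the key length, is what distinguishes a missing
   value from an empty one.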

3181	7.3.  Data Storage Methods

3183	   RELOAD provides several methods for storing and retrieving data:

3185	   o  Store values in the overlay
3186	   o  Fetch values from the overlay
3187	   o  Remove values from the overlay
3188	   o  Find the values stored at an individual peer

3190	   These methods are each described in the following sections.

3192	7.3.1.  Store

3194	   The Store method is used to store data in the overlay.  The format of
3195	   the Store request depends on the data model which is determined by
3196	   the kind.

3198	7.3.1.1.  Request Definition

3200	   A StoreReq message is a sequence of StoreKindData values, each of
3201	   which represents a sequence of stored values for a given kind.  The
3202	   same kind-id MUST NOT be used twice in a given store request.  Each
3203	   value is then processed in turn.  These operations MUST be atomic.
3204	   If any operation fails, the state MUST be rolled back to before the
3205	   request was received.

3207	   The store request is defined by the StoreReq structure:

3209	        struct {
3210	            KindId                 kind;
3211	            DataModel              data_model;
3212	            uint64                 generation_counter;
3213	            StoredData             values<0..2^32-1>;
3214	        } StoreKindData;

3216	        struct {
3217	            ResourceId             resource;
3218	            uint8                  replica_number;
3219	            StoreKindData          kind_data<0..2^32-1>;
3220	        } StoreReq;

3222	   A single Store request stores data of a number of kinds to a single
3223	   resource location.  The contents of the structure are:

3225	   resource
3226	      The resource to store at.

3228	   replica_number
3229	      The number of this replica.  When a storing peer saves replicas to
3230	      other peers each peer is assigned a replica number starting from 1
3231	      and sent in the Store message.  This field is set to 0 when a node
3232	      is storing its own data.  This allows peers to distinguish replica
3233	      writes from original writes.

3235	   kind_data
3236	      A series of elements, one for each kind of data to be stored.

3238	   If the replica number is zero, then the peer MUST check that it is
3239	   responsible for the resource and if not reject the request.  If the
3240	   replica number is nonzero, then the peer MUST check that it expects
3241	   to be a replica for the resource and if not reject the request.

3243	   Each StoreKindData element represents the data to be stored for a
3244	   single kind-id.  The contents of the element are:

3246	   kind
3247	      The kind-id.  Implementations SHOULD reject requests corresponding
3248	      to unknown kinds unless specifically configured otherwise.

3250	   data_model
3251	      The data model of the data.  The kind defines what this has to be
3252	      so this is redundant in the case where the software interpreting
3253	      the messages understands the kind.

3255	   generation_counter
3256	      The expected current state of the generation counter
3257	      (approximately the number of times this object has been written,
3258	      see below for details).

3260	   values
3261	      The value or values to be stored.  This may contain one or more
3262	      stored_data values depending on the data model associated with
3263	      each kind.

3265	   The peer MUST perform the following checks:

3267	   o  The kind_id is known and supported.
3268	   o  The data_model matches the kind_id.
3269	   o  The signatures over each individual data element (if any) are
3270	      valid.
3271	   o  Each element is signed by a credential which is authorized to
3272	      write this kind at this resource-id
3273	   o  For original (non-replica) stores, the peer MUST check that if the
3274	      generation-counter is non-zero, it equals the current value of the
3275	      generation-counter for this kind.  This feature allows the
3276	      generation counter to be used in a way similar to the HTTP Etag
3277	      feature.
3278	   o  The storage time values are greater than that of any value which
3279	      would be replaced by this Store.  [[OPEN ISSUE:  do peers need to
3280	      save the storage time of Removes to prevent reinsertion?]]

3282	   If all these checks succeed, the peer MUST attempt to store the data
3283	   values.  For non-replica stores, if the store succeeds and the data
3284	   is changed, then the peer MUST increase the generation counter by at
3285	   least one.  If there are multiple stored values in a single
3286	   StoreKindData, it is permissible for the peer to increase the
3287	   generation counter by only 1 for the entire kind-id, or by 1 or more
3288	   for each value.  All stored data values must have a generation
3289	   counter of 1 or greater, because the value 0 is used by other nodes
3290	   to indicate that they are indifferent to the generation counter's
3291	   current value.  For replica Stores, the peer MUST set the generation
3292	   counter to match the generation_counter in the message.  Replica
3293	   Stores MUST NOT use a generation counter of 0.
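   The generation counter rules above can be summarized in a short
   Python sketch for a single kind-id.  It assumes the counter is 0 when
   nothing has yet been stored for the kind, uses a hypothetical
   PreconditionFailed exception to stand in for the 412 error, and
   always increments by exactly one, which is the minimum the text
   allows.

```python
class PreconditionFailed(Exception):
    """Stands in for the 412 error; carries the stored generation counter."""

    def __init__(self, current: int) -> None:
        super().__init__(f"412: current generation counter is {current}")
        self.current = current


def next_generation(current: int, requested: int, is_replica: bool,
                    data_changed: bool) -> int:
    """Return the generation counter for one kind after a successful Store."""
    if is_replica:
        # Replica Stores copy the counter from the message and never use 0.
        if requested == 0:
            raise ValueError("replica Stores MUST NOT use a generation counter of 0")
        return requested
    # For original stores, 0 means "indifferent"; any other value must match.
    if requested != 0 and requested != current:
        raise PreconditionFailed(current)
    # Increase by at least one when the data changed; this sketch uses one.
    return current + 1 if data_changed else current
```

   Used this way, the counter behaves like an HTTP entity tag: a writer
   that echoes the last counter it saw is rejected if another writer
   has modified the kind in the meantime.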

3295	   The properties of stores for each data model are as follows:

3297	   Single-value:

3299	      A store of a new single-value element creates the element if it
3300	      does not exist and overwrites any existing value with the new
3301	      value.

3303	   Array:
3304	      A store of an array entry replaces (or inserts) the given value at
3305	      the location specified by the index.  Because arrays are sparse, a
3306	      store past the end of the array extends it with nonexistent values
3307	      (exists=False) as required.  A store at index 0xffffffff places
3308	      the new value at the end of the array regardless of the length of
3309	      the array.  The resulting StoredData has the correct index
3310	      value when it is subsequently fetched.

3312	   Dictionary:
3313	      A store of a dictionary entry replaces (or inserts) the given
3314	      value at the location specified by the dictionary key.
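   A minimal Python sketch of the array store semantics, including the
   sparse extension and the 0xffffffff append index, is shown below.
   Representing nonexistent entries (exists=False) as None in a Python
   list is an assumption made for brevity; the function name is
   hypothetical.

```python
from typing import List, Optional

APPEND_INDEX = 0xFFFFFFFF  # "store at the end of the array"


def array_store(array: List[Optional[bytes]], index: int, value: bytes) -> int:
    """Store value at index in a sparse array and return the index used.

    Nonexistent entries (exists=False) are represented as None.
    """
    if index == APPEND_INDEX:
        index = len(array)
    if index >= len(array):
        # Storing past the end pads the gap with nonexistent entries.
        array.extend([None] * (index + 1 - len(array)))
    array[index] = value
    return index
```

   Because a list materializes every padding entry, a storing peer
   facing very large indices would more plausibly keep a mapping from
   index to value and track the array length separately.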

3316	   The following figure shows the relationship between these structures
3317	   for an example store which stores the following values at resource
3318	   "1234":

3320	   o  The value "abc" in the single value slot for kind X
3321	   o  The value "foo" at index 0 in the array for kind Y
3322	   o  The value "bar" at index 1 in the array for kind Y

3324	                                     Store
3325	                                 resource=1234
3326	                                    /      \
3327	                                   /        \
3328	                       StoreKindData        StoreKindData
3329	                          kind=X               kind=Y
3330	                    model=Single-Value       model=Array
3331	                            |                    /\
3332	                            |                   /  \
3333	                        StoredData             /    \
3334	                            |                 /      \
3335	                            |           StoredData  StoredData
3336	                     StoredDataValue        |           |
3337	                      value="abc"           |           |
3338	                                            |           |
3339	                                   StoredDataValue  StoredDataValue
3340	                                         index=0      index=1
3341	                                      value="foo"    value="bar"

3343	7.3.1.2.  Response Definition

3345	   In response to a successful Store request the peer MUST return a
3346	   StoreAns message containing a series of StoreKindResponse elements
3347	   containing the current value of the generation counter for each
3348	   kind-id, as well as a list of the peers where the data was
3349	   replicated.

3351	         struct {
3352	           KindId                  kind;
3353	           uint64                  generation_counter;
3354	           NodeId                  replicas<0..2^16-1>;
3355	         } StoreKindResponse;

3357	         struct {
3358	           StoreKindResponse       kind_responses<0..2^16-1>;
3359	         } StoreAns;

3361	   The contents of each StoreKindResponse are:

3363	   kind
3364	      The kind-id being represented.

3366	   generation_counter
3367	      The current value of the generation counter for that kind-id.

3369	   replicas
3370	      The list of other peers at which the data was or will be
3371	      replicated.  In overlays and applications where the responsible
3372	      peer is intended to store redundant copies, this allows the
3373	      storing peer to independently verify that the replicas were in
3374	      fact stored by doing its own Fetch.

3376	   The response itself is just StoreKindResponse values packed end-to-
3377	   end.

3379	   If any of the generation counters in the request precede the
3380	   corresponding stored generation counter, then the peer MUST fail the
3381	   entire request and respond with a 412 error.  The error_info in the
3382	   ErrorResponse MUST be a StoreAns response containing the correct
3383	   generation counter for each kind and empty replicas lists.

3385	7.3.2.  Fetch

3387	   The Fetch request retrieves one or more data elements stored at a
3388	   given resource-id.  A single Fetch request can retrieve multiple
3389	   different kinds.

3391	7.3.2.1.  Request Definition

3393	         struct {
3394	           uint32           first;
3395	           uint32           last;
3396	         } ArrayRange;

3398	         struct {
3399	           KindId                  kind;
3400	           DataModel               model;
3401	           uint64                  generation;
3402	           uint16                  length;

3404	           select (model) {
3405	             case single_value: ;    /* Empty */

3407	             case array:
3408	                  ArrayRange       indices<0..2^16-1>;

3410	             case dictionary:
3411	                  DictionaryKey    keys<0..2^16-1>;

3413	             /* This structure may be extended */

3415	           } model_specifier;
3416	         } StoredDataSpecifier;

3418	         struct {
3419	           ResourceId              resource;
3420	           StoredDataSpecifier     specifiers<0..2^16-1>;
3421	         } FetchReq;

3423	   The contents of the Fetch requests are as follows:

3425	   resource
3426	      The resource ID to fetch from.

3428	   specifiers
3429	      A sequence of StoredDataSpecifier values, each specifying some of
3430	      the data values to retrieve.

3432	   Each StoredDataSpecifier specifies a single kind of data to retrieve
3433	   and (if appropriate) the subset of values that are to be retrieved.
3434	   The contents of the StoredDataSpecifier structure are as follows:

3436	   kind
3437	      The kind-id of the data being fetched.  Implementations SHOULD
3438	      reject requests corresponding to unknown kinds unless specifically
3439	      configured otherwise.

3441	   model
3442	      The data model of the data.  This must be checked against the
3443	      kind-id.

3445	   generation
3446	      The last generation counter that the requesting peer saw.  This
3447	      may be used to avoid unnecessary fetches or it may be set to zero.

3449	   length
3450	      The length of the rest of the structure, thus allowing
3451	      extensibility.

3453	   model_specifier
3454	      A reference to the data value being requested within the data
3455	      model specified for the kind.  For instance, if the data model is
3456	      "array", it might specify some subset of the values.

3458	   The model_specifier is as follows:

3460	   o  If the data is of data model single value, the specifier is empty.
3461	   o  If the data is of data model array, the specifier consists of a
3462	      list of ArrayRange elements, each of which contains two integers.
3463	      The first integer is the beginning of the range and the second is
3464	      the end of the range.  The value 0 indicates the first element
3465	      and 0xffffffff indicates the final element.  The beginning of
3466	      the range MUST be earlier in the array than the end.  The ranges
3467	      MUST be non-overlapping.
3468	   o  If the data is of data model dictionary then the specifier
3469	      contains a list of the dictionary keys being requested.  If no
3470	      keys are specified, then this is a wildcard fetch and all
3471	      key-value pairs are returned.  [[TODO:  We really need a way to
3472	      return only the keys.  We'll need to modify this.]]
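   The array specifier rules can be illustrated with the following
   Python sketch, which expands a list of (first, last) ArrayRange pairs
   into concrete indices for an array of a given length.  It is a
   sketch under stated assumptions: a range whose bounds are equal is
   accepted as a single-element range, and ranges that fall entirely
   past the current end resolve to nothing.  The names are hypothetical.

```python
from typing import List, Tuple

LAST = 0xFFFFFFFF  # denotes the final element of the array


def resolve_ranges(ranges: List[Tuple[int, int]], array_length: int) -> List[int]:
    """Expand ArrayRange (first, last) pairs into concrete array indices."""
    resolved = []
    for first, last in ranges:
        if first > last:
            raise ValueError(f"range start {first:#x} is after its end {last:#x}")
        # 0xffffffff is replaced by the index of the current final element.
        if first == LAST:
            first = array_length - 1
        if last == LAST:
            last = array_length - 1
        if last < 0 or first > last:
            continue  # the range selects nothing in the current array
        resolved.append((first, last))
    resolved.sort()
    for (_, prev_last), (next_first, _) in zip(resolved, resolved[1:]):
        if next_first <= prev_last:
            raise ValueError("ArrayRange elements MUST be non-overlapping")
    return [i for first, last in resolved for i in range(first, last + 1)]
```

   Explicit indices beyond the end of the array are passed through, so
   the caller can return them as values with "exists" set to False.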

3474	   The generation-counter is used to indicate the requester's expected
3475	   state of the storing peer.  If the generation-counter in the request
3476	   matches the stored counter, then the storing peer returns a response
3477	   with no StoredData values.

3479	   Note that because the certificate for a user is typically stored at
3480	   the same location as any data stored for that user, a requesting peer
3481	   which does not already have the user's certificate should request the
3482	   certificate in the Fetch as an optimization.

3484	7.3.2.2.  Response Definition

3486	   The response to a successful Fetch request is a FetchAns message
3487	   containing the data requested by the requester.

3489	          struct {
3490	            KindId                 kind;
3491	            uint64                 generation;
3492	            StoredData             values<0..2^32-1>;
3493	          } FetchKindResponse;

3495	          struct {
3496	            FetchKindResponse      kind_responses<0..2^32-1>;
3497	          } FetchAns;

3499	   The FetchAns structure contains a series of FetchKindResponse
3500	   structures.  There MUST be one FetchKindResponse element for each
3501	   kind-id in the request.

3503	   The contents of the FetchKindResponse structure are as follows:

3505	   kind
3506	      The kind-id that this structure is for.

3508	   generation
3509	      The generation counter for this kind.

3511	   values
3512	      The relevant values.  If the generation counter in the request
3513	      matches the generation-counter in the stored data, then no
3514	      StoredData values are returned.  Otherwise, all relevant data
3515	      values MUST be returned.  A nonexistent value is represented with
3516	      "exists" set to False.
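   The generation-counter shortcut for Fetch can be sketched in Python
   as follows.  The FetchKindResponse dataclass mirrors the structure
   above, but StoredData values are simplified to opaque byte strings,
   and stored_values is assumed to already contain any nonexistent
   placeholders the specifier selected.  The function answer_fetch is
   hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FetchKindResponse:
    kind: int
    generation: int
    values: List[bytes] = field(default_factory=list)


def answer_fetch(kind: int, stored_generation: int, stored_values: List[bytes],
                 requested_generation: int) -> FetchKindResponse:
    """Build the FetchKindResponse for one StoredDataSpecifier."""
    if requested_generation != 0 and requested_generation == stored_generation:
        # The requester already holds this state: return no StoredData values.
        return FetchKindResponse(kind, stored_generation)
    return FetchKindResponse(kind, stored_generation, list(stored_values))
```

   A requester that caches the generation from its previous FetchAns can
   therefore poll cheaply: an empty values list with an unchanged
   generation means its cached copy is current.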

3518	   There is one subtle point about signature computation on arrays.  If
3519	   the storing node uses the append feature (where the
3520	   index=0xffffffff), then the index in the StoredData that is returned
3521	   will not match that used by the storing node, which would break the
3522	   signature.  To avoid this issue, the index value in the ArrayEntry is
3523	   set to zero before the signature is computed.  This implies that
3524	   malicious storing nodes can reorder array entries without being
3525	   detected.  [[OPEN ISSUE:  We've considered a number of alternate
3526	   designs here that would preserve security against this attack if the
3527	   storing node did not use the append feature.  However, they are more
3528	   complicated for one or both sides.  If this attack is considered
3529	   serious, we can introduce one of them.]]
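   The effect of zeroing the index can be seen in the Python sketch
   below, which produces the octets an ArrayEntry contributes to the
   signature input.  The encoding (uint32 index, one-octet Boolean,
   4-octet value length) is an assumption, and the function name is
   hypothetical.

```python
import struct


def array_entry_signing_bytes(index: int, exists: bool, value: bytes) -> bytes:
    """Encode an ArrayEntry for signing, with the index forced to zero.

    The index argument is accepted but deliberately not encoded, so an
    entry stored with the append index (0xffffffff) and the same entry
    fetched back at its actual position yield identical signature input.
    """
    # Assumed encoding: uint32 index (always 0), one-octet Boolean,
    # 4-octet value length, then the value itself.
    return struct.pack("!IBI", 0, 1 if exists else 0, len(value)) + value
```

   The same property is what permits the reordering attack described
   above: moving a validly signed entry to a different index leaves its
   signature input unchanged.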

3531	7.3.3.  Remove

3533	   The Remove request is used to remove a stored element or elements
3534	   from the storing peer.  Any successful remove of an existing element
3535	   for a given kind MUST increment the generation counter by at least 1.

3537	         struct {
3538	           ResourceId              resource;
3539	           StoredDataSpecifier     specifiers<0..2^16-1>;
3540	         } RemoveReq;

3542	   A RemoveReq has exactly the same syntax as a Fetch request except
3543	   that each entry represents a set of values to be removed rather than
3544	   returned.  The same kind-id MUST NOT be used twice in a given
3545	   RemoveReq.  Each specifier is then processed in turn.  These
3546	   operations MUST be atomic.  If any operation fails, the state MUST be
3547	   rolled back to before the request was received.

3549	   Before processing the Remove request, the peer MUST perform the
3550	   following checks.

3552	   o  The kind-id is known.
3553	   o  The signature over the message is valid or (depending on overlay
3554	      policy) no signature is required.
3555	   o  The signer of the message has permissions which permit him to
3556	      remove this kind of data.  Although each kind defines its own
3557	      access control requirements, in general only the original signer
3558	      of the data should be allowed to remove it.
3559	   o  If the generation-counter is non-zero, it must equal the current
3560	      value of the generation-counter for this kind.  This feature
3561	      allows the generation counter to be used in a way similar to the
3562	      HTTP Etag feature.

3564	   Assuming that the request is permitted, the operations proceed as
3565	   follows.

3567	7.3.3.1.  Single Value

3569	   A Remove of a single value element causes it not to exist.  If no
3570	   such element exists, then this is a silent success.

3572	7.3.3.2.  Array

3574	   A Remove of an array element (or element range) replaces those
3575	   elements with null elements.  Note that this does not cause the array
3576	   to be packed.  An array which contains ["A", "B", "C"] and then has
3577	   element 0 removed produces an array containing [NA, "B", "C"].  Note,
3578	   however, that the removal of the final element of the array shortens
3579	   the array, so in the above case, the removal of element 2 makes the
3580	   array ["A", "B"].
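   The array Remove behavior can be sketched in Python as shown below,
   again representing nonexistent entries as None.  The text does not
   say whether removing the final element also trims nonexistent
   entries that precede it; this sketch assumes only the final slot is
   dropped.  The function name is hypothetical.

```python
from typing import Iterable, List, Optional


def array_remove(array: List[Optional[bytes]], indices: Iterable[int]) -> None:
    """Remove the given indices in place; None marks a nonexistent entry."""
    final_index = len(array) - 1
    removed_final = False
    for i in set(indices):
        if 0 <= i < len(array):
            array[i] = None  # no packing: later entries keep their indices
            removed_final = removed_final or i == final_index
    if removed_final:
        array.pop()  # removing the final element shortens the array
```

   Indices past the end are ignored, which matches the silent-success
   behavior described for the other data models.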

3582	7.3.3.3.  Dictionary

3584	   A Remove of a dictionary element (or elements) replaces those
3585	   elements with null elements.  If no such elements exist, then this is
3586	   a silent success.

3588	7.3.3.4.  Response Definition

3590	   The response to a successful Remove simply contains a list of the new
3591	   generation counters for each kind-id, using the same syntax as the
3592	   response to a Store request.  Note that if the generation counter
3593	   does not change, that means that the requested items did not exist.
3594	   However, if the generation counter does change, that does not mean
3595	   that the items existed.

3597	         struct {
3598	           StoreKindResponse          kind_responses<0..2^16-1>;
3599	         } RemoveAns;

3601	7.3.4.  Find

3603	   The Find request can be used to explore the Overlay Instance.  A Find
3604	   request for a resource-id R and a kind-id T retrieves the resource-id
3605	   (if any) of the resource of kind T known to the target peer which is
3606	   closest to R.  This method can be used to walk the Overlay Instance
3607	   by iteratively fetching R_(n+1) = nearest(R_n + 1).
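   The walk can be illustrated with the Python sketch below.  It
   replaces the routed Find with a hypothetical local function over a
   fixed set of resource-ids and assumes that "closest" means the first
   resource-id at or after R on a circular identifier space.  In a real
   overlay each step would be a Find routed to the peer responsible for
   R_n + 1, and that peer would only report resource-ids it knows.

```python
import bisect
from typing import Callable, List


def make_find(known_ids: List[int], id_space: int) -> Callable[[int], int]:
    """Hypothetical stand-in for Find: first known id at or after r, wrapping."""
    if not known_ids:
        raise ValueError("this sketch assumes at least one known resource-id")
    ids = sorted(known_ids)

    def find(r: int) -> int:
        pos = bisect.bisect_left(ids, r % id_space)
        return ids[pos % len(ids)]  # wrap around past the largest id

    return find


def walk(find: Callable[[int], int], start: int, id_space: int,
         limit: int = 10_000) -> List[int]:
    """Visit resource-ids via R_(n+1) = nearest(R_n + 1) until one repeats."""
    visited: List[int] = []
    seen = set()
    r = find(start)
    while r not in seen and len(visited) < limit:
        visited.append(r)
        seen.add(r)
        r = find((r + 1) % id_space)
    return visited
```

   The walk stops as soon as a resource-id repeats, which on a ring
   means it has gone all the way around.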

3609	7.3.4.1.  Request Definition

3611	   The FindReq message contains a resource-id and a series of kind-ids
3612	   identifying the resources the peer is interested in.

3614	      struct {
3615	        ResourceId                 resource;
3616	        KindId                     kinds<0..2^8-1>;
3617	      } FindReq;

3619	   The contents of the FindReq structure are as follows:

3622	   resource
3623	      The desired resource-id.

3625	   kinds
3626	      The desired kind-ids.  Each value MUST only appear once.

3628	7.3.4.2.  Response Definition

3630	   A response to a successful Find request is a FindAns message
3631	   containing the closest resource-id for each kind specified in the
3632	   request.

3634	     struct {
3635	       KindId                      kind;
3636	       ResourceId                  closest;
3637	     } FindKindData;

3639	     struct {
3640	       FindKindData                results<0..2^16-1>;
3641	     } FindAns;

3643	   If the processing peer is not responsible for the specified
3644	   resource-id, it SHOULD return a 404 error.

3646	   For each kind-id in the request the response MUST contain a
3647	   FindKindData indicating the closest resource-id for that kind-id
3648	   unless the kind is not allowed to be used with Find in which case a
3649	   FindKindData for that kind-id MUST NOT be included in the response.
3650	   If a kind-id is not known, then the corresponding resource-id MUST be
3651	   0.  Note that different kind-ids may have different closest resource-
3652	   ids.

3654	   The response is simply a series of FindKindData elements, one per
3655	   kind, concatenated end-to-end.  The contents of each element are:

3657	   kind
3658	      The kind-id.

3660	   closest
3661	      The closest resource ID to the specified resource ID.  This is 0
3662	      if no resource ID is known.

3664	   Note that the response does not contain the contents of the data
3665	   stored at these resource-ids.  If the requester wants this, it must
3666	   retrieve it using Fetch.

3668	7.3.4.3.  Defining New Kinds

3670	   A new kind MUST define:

3672	   o  The meaning of the data to be stored.
3673	   o  The kind-id.
3674	   o  The data model (single value, array, dictionary, etc.)
3675	   o  Access control rules for indicating what credentials are allowed
3676	      to read and write that kind-id at a given location.

3678	   While each kind MUST define what data model is used for its data,
3679	   that does not mean that it must define new data models.  Where
3680	   practical, kinds SHOULD use the built-in data models.  However, they
3681	   MAY define any new required data models.  The intention is that the
3682	   basic data model set be sufficient for most applications/usages.

3684	8.  Certificate Store Usage

3686	   The Certificate Store usage allows a peer to store its certificate in
3687	   the overlay, thus avoiding the need to send a certificate in each
3688	   message - a reference may be sent instead.

3690	   A user/peer MUST store its certificate at resource-ids derived from
3691	   two Resource Names:

3693	   o  The user names in the certificate.
3694	   o  The Node-IDs in the certificate.

3696	   Note that in the second case the certificate is not stored at the
3697	   peer's Node-ID but rather at a hash of the peer's Node-ID.  The
3698	   intention here (as is common throughout RELOAD) is to avoid making a
3699	   peer responsible for its own data.

3701	   A peer MUST ensure that the user's certificates are stored in the
3702	   Overlay Instance.  New certificates are stored at the end of the
3703	   list.  This structure allows users to store an old and a new
3704	   certificate that both have the same Node-ID, which allows for
3705	   migration of certificates when they are renewed.

3707	   Kind IDs  This usage defines the CERTIFICATE kind-id to store a peer
3708	      or user's certificate.

3710	   Data Model  The data model for CERTIFICATE data is array.

3712	   Access Control  The CERTIFICATE MUST contain a Node-ID or user name
3713	      which, when hashed, maps to the resource-id at which the value is
3714	      being stored.

3716	9.  TURN Server Usage

3718	   The TURN server usage allows a RELOAD peer to advertise that it is
3719	   prepared to be a TURN server.  When a node starts up, it joins the
3720	   overlay network and forms several connections in the process.  If
3721	   the ICE stage of any of these connections returns a reflexive
3722	   address that differs from the peer's perceived address, then the
3723	   peer is behind a NAT and is not a candidate for a TURN server.  A
3724	   peer whose IP address is in the private address space range is also
3725	   not a candidate.  Otherwise, the peer SHOULD assume it is a
3726	   potential TURN server and follow the procedures below.

3728	   If the node is a candidate for a TURN server it will insert some
3729	   pointers in the overlay so that other peers can find it.  The overlay
3730	   configuration file specifies a turnDensity parameter that indicates
3731	   how many times each TURN server should record itself in the overlay.
3732	   Typically this should be set to the reciprocal of the estimated
3733	   fraction of peers that will act as TURN servers.  For each value,
3734	   called d, between 1 and turnDensity, the peer forms a Resource Name
3735	   by concatenating its peer-ID and the value d.  This Resource Name is
3736	   hashed to form a Resource-ID.  The address of the peer is stored at
3737	   that Resource-ID using type TURN-SERVICE and the TurnServer object:

3739	         struct {
3740	           uint8                   iteration;
3741	           IpAddressAndPort        server_address;
3742	         } TurnServer;

3744	   The contents of this structure are as follows:

3746	   iteration
3747	      The d value.

3749	   server_address
3750	      The address at which the TURN server can be contacted.
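   The derivation of turnDensity and of the TURN-SERVICE resource-ids
   can be sketched in Python as follows.  Two details are assumptions
   not fixed by this section: d is appended to the peer-ID as decimal
   ASCII, and SHA-1 stands in for whatever hash function the overlay
   actually uses.  The upper bound of 255 follows from the iteration
   field being a uint8.  The function names are hypothetical.

```python
import hashlib
from typing import List


def turn_density_from_fraction(fraction: float) -> int:
    """turnDensity as the reciprocal of the estimated fraction of TURN servers."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return max(1, round(1 / fraction))


def turn_resource_ids(peer_id: bytes, turn_density: int) -> List[bytes]:
    """Resource-IDs for d = 1..turnDensity from the name peer-ID || d."""
    if not 1 <= turn_density <= 255:
        # The iteration field is a uint8, so d cannot exceed 255.
        raise ValueError("turnDensity must be between 1 and 255")
    ids = []
    for d in range(1, turn_density + 1):
        # Assumptions: d is appended as decimal ASCII, and SHA-1 stands in
        # for the overlay's actual hash function.
        name = peer_id + str(d).encode("ascii")
        ids.append(hashlib.sha1(name).digest())
    return ids
```

   For example, if about a quarter of peers are expected to serve TURN,
   turnDensity is 4 and each such server stores four TurnServer records,
   so the number of records is roughly equal to the number of peers.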

3752	   Note:  Correct functioning of this algorithm depends critically on
3753	      having turnDensity be an accurate estimate of the true density of
3754	      TURN servers.  If turnDensity is too high, then the process of
3755	      finding TURN servers becomes extremely expensive as multiple
3756	      candidate resource-ids must be probed.

3758	   Peers that provide this service need to support the TURN
3759	   extensions to STUN for media relay of both UDP and TCP traffic as
3760	   defined in [I-D.ietf-behave-turn] and [I-D.ietf-behave-tcp].

3762	   [[OPEN ISSUE:  This structure only works for TURN servers that have
3763	   public addresses.  It may be possible to use TURN servers that are
3764	   behind well-behaved NATs by first ICE connecting to them.  If we
3765	   decide we want to enable that, this structure will need to change to
3766	   either be a peer-id or include that as an option.]]

3768	   Kind IDs  This usage defines the TURN-SERVICE kind-id to indicate
3769	      that a peer is willing to act as a TURN server.  The Find command
3770	      MUST return results for the TURN-SERVICE kind-id.
3771	   Data Model  The TURN-SERVICE stores a single value for each
3772	      resource-id.
3773	   Access Control  If certificate-based access control is being used,
3774	      stored data of kind TURN-SERVICE MUST be authenticated by a
3775	      certificate which contains a peer-id which when hashed with the
3776	      iteration counter produces the resource-id being stored at.

3778	   Peers can find other servers by selecting a random Resource-ID and
3779	   then doing a Find request for the appropriate server type with that
3780	   Resource-ID.  The Find request gets routed to a random peer based on
3781	   the Resource-ID.  If that peer knows of any servers, they will be
3782	   returned.  The returned response may be empty if the peer does not
3783	   know of any servers, in which case the process gets repeated with
3784	   some other random Resource-ID.  As long as the ratio of servers
3785	   relative to peers is not too low, this approach will result in
3786	   finding a server relatively quickly.

3788	10.  SIP Usage

3790	   The SIP usage allows a RELOAD overlay to be used as a distributed SIP
3791	   registrar/proxy network.  This entails three primary operations:

3793	   o  Registering one's own AOR with the overlay.
3794	   o  Looking up a given AOR in the overlay.
3795	   o  Forming a direct connection to a given peer.

3797	10.1.  Registering AORs

3799	   In ordinary SIP, a UA registers its AOR and location with a
3800	   registrar.  In RELOAD, this registrar function is provided by the
3801	   overlay as a whole.  To register its location, a RELOAD peer stores a
3802	   SipRegistration structure under its own AOR.  This uses the SIP-
3803	   REGISTRATION kind-id, which is formally defined in Section 10.5.
3804	   Note:  GRUUs are handled via a separate mechanism, as described in
3805	      Section 10.4.

3807	   As a simple example, if Alice's AOR were "sip:alice@dht.example.com"
3808	   and her Node-ID were "1234", she might store the mapping
3809	   "sip:alice@dht.example.com -> 1234".  This would tell anyone who
3810	   wanted to call Alice to contact node "1234".

3812	   RELOAD peers MAY store two kinds of SIP mappings:

3814	   o  From AORs to destination lists (a single Node-ID is just a trivial
3815	      destination list.)
3816	   o  From AORs to other AORs.

3818	   The meaning of the first kind of mapping is "in order to contact me,
3819	   form a connection with this peer."  The meaning of the second kind of
3820	   mapping is "in order to contact me, dereference this AOR".  This
3821	   allows for forwarding.  For instance, if Alice wants calls to her to
3822	   be forwarded to her secretary, Sam, she might insert the following
3823	   mapping "sip:alice@dht.example.org -> sip:sam@dht.example.org".

3825	   The contents of a SipRegistration structure are as follows:

3827	          enum {sip_registration_uri (1), sip_registration_route (2),
3828	             (255)} SipRegistrationType;

3830	          select (SipRegistration.type) {
3831	            case sip_registration_uri:
3832	              opaque               uri<0..2^16-1>;

3834	            case sip_registration_route:
3835	              opaque               contact_prefs<0..2^16-1>;
3836	              Destination          destination_list<0..2^16-1>;

3838	            /* This type can be extended */

3840	          } SipRegistrationData;

3842	          struct {
3843	             SipRegistrationType   type;
3844	             uint16                length;
3845	             SipRegistrationData   data;
3846	         } SipRegistration;

3848	   The contents of the SipRegistration PDU are:

3850	   type
3851	      the type of the registration

3853	   length
3854	      the length of the rest of the PDU

3856	   data
3857	      the registration data

3859	   o  If the registration is of type "sip_registration_uri", then the
3860	      contents are an opaque string containing the URI.
3861	   o  If the registration is of type "sip_registration_route", then the
3862	      contents are an opaque string containing the callee's contact
3863	      preferences and a destination list for the peer.
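   The following Python sketch shows one plausible wire encoding of a
   SipRegistration.  It assumes the usual TLS-style presentation-language
   rules: the enum occupies one byte, `length` is a big-endian uint16, and
   each <0..2^16-1> vector carries a two-byte length prefix.  The
   Destination structure is defined elsewhere in this document, so the
   sketch takes the destination list as already-encoded bytes.  All
   helper names are hypothetical.

```python
import struct

SIP_REGISTRATION_URI = 1    # sip_registration_uri
SIP_REGISTRATION_ROUTE = 2  # sip_registration_route


def opaque16(data: bytes) -> bytes:
    """Encode a <0..2^16-1> vector: 2-byte big-endian length, then the bytes."""
    if len(data) > 0xFFFF:
        raise ValueError("vector too long for a 16-bit length prefix")
    return struct.pack("!H", len(data)) + data


def encode_sip_registration(reg_type: int, body: bytes) -> bytes:
    """type (1 byte), length of the rest of the PDU (uint16), then the data."""
    if len(body) > 0xFFFF:
        raise ValueError("registration data too long")
    return struct.pack("!BH", reg_type, len(body)) + body


def encode_uri_registration(uri: str) -> bytes:
    return encode_sip_registration(SIP_REGISTRATION_URI,
                                   opaque16(uri.encode("utf-8")))


def encode_route_registration(contact_prefs: bytes,
                              encoded_destinations: bytes) -> bytes:
    # encoded_destinations is assumed to be the already-serialized Destination
    # entries; the Destination structure is defined elsewhere in the document.
    return encode_sip_registration(SIP_REGISTRATION_ROUTE,
                                   opaque16(contact_prefs)
                                   + opaque16(encoded_destinations))
```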

3865	   RELOAD explicitly supports multiple registrations for a single AOR.
3866	   The registrations are stored in a Dictionary with the dictionary keys
3867	   being Node-IDs.  Consider, for instance, the case where Alice has two
3868	   peers:

3870	   o  her desk phone (1234)
3871	   o  her cell phone (5678)

3873	   Alice might store the following in the overlay at resource
3874	   "sip:alice@dht.example.com".

3876	   o  A SipRegistration of type "sip_registration_route" with dictionary
3877	      key "1234" and value "1234".
3878	   o  A SipRegistration of type "sip_registration_route" with dictionary
3879	      key "5678" and value "5678".

3881	   Note that this structure explicitly allows one Node-ID to forward to
3882	   another Node-ID.  For instance, Alice could set calls to her desk
3883	   phone to ring at her cell phone.  It's not clear that this is useful
3884	   in this case, but it may be useful if Alice has two AORs.

3886	   In order to prevent hijacking, registrations are subject to access
3887	   control rules.  Before a Store is permitted, the storing peer MUST
3888	   check that:

3890	   o  The certificate contains a username that is a SIP AOR that hashes
3891	      to the resource-id being stored at.
3892	   o  The certificate contains a Node-ID that is the same as the
3893	      dictionary key being stored at.
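   A minimal Python sketch of the two checks follows.  It assumes a
   hypothetical `Certificate` view that exposes one username and one
   Node-ID.  It also takes the overlay's hash function as a parameter
   rather than fixing one; with chord-128-2-16+ that function would be
   SHA-1 truncated to 128 bits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Certificate:
    """Hypothetical view of the certificate fields the check needs."""
    username: str  # a SIP AOR, e.g. "sip:alice@dht.example.com"
    node_id: int


def may_store_registration(cert: Certificate, resource_id: int,
                           dictionary_key: int,
                           overlay_hash: Callable[[str], int]) -> bool:
    """Return True only if both SIP-REGISTRATION access-control rules hold."""
    aor_matches = overlay_hash(cert.username) == resource_id  # rule 1
    key_matches = cert.node_id == dictionary_key              # rule 2
    return aor_matches and key_matches
```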

3895	   Note that these rules permit Alice to forward calls to Bob without
3896	   his permission.  However, they do not permit Alice to forward Bob's
3897	   calls to her.  See Section 15.7.2 for more on this point.

3899	10.2.  Looking up an AOR

3901	   When a RELOAD user wishes to call another user, starting with a non-
3902	   GRUU AOR, the caller follows this procedure.  (GRUUs are discussed
3903	   in Section 10.4.)

3905	   1.  Check to see if the domain part of the AOR matches the domain
3906	       name of an overlay of which he is a member.  If not, then this is
3907	       an external AOR, and he MUST do one of the following:
3908	       *  Fail the call.
3909	       *  Use ordinary SIP procedures.
3910	       *  Attempt to become a member of the overlay indicated by the
3911	          domain part (only possible if the enrollment procedure defined
3912	          in Section 13.1 indicates that this is a RELOAD overlay.)
3913	   2.  Perform a Fetch for kind SIP-REGISTRATION at the resource-id
3914	       corresponding to the AOR.  This Fetch SHOULD NOT indicate any
3915	       dictionary keys, which will result in fetching all the stored
3916	       values.

3918	   3.  If any of the results of the Fetch are non-GRUU AORs, then repeat
3919	       step 1 for that AOR.
3920	   4.  Once only GRUUs and destination lists remain, the peer removes
3921	       duplicate destination lists and GRUUs from the list and forms a
3922	       SIP connection to the appropriate peers as described in the
3923	       following sections.  If there are also external AORs, the peer
3924	       follows the appropriate procedure for contacting them as well.
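   The procedure can be sketched as a Python worklist.  `fetch` is a
   hypothetical callback that returns the SIP-REGISTRATION dictionary for
   an AOR, with each value modelled as ("uri", str) or ("route",
   Node-IDs).  Two points are simplifying assumptions: a GRUU is
   recognised by its "gr" parameter, and the AOR's domain is compared
   against a single overlay domain.  The visited set, which stops
   forwarding loops such as Alice -> Sam -> Alice, is an addition that
   the text does not require.

```python
from typing import Callable, Dict, List, Tuple

Registration = Tuple[str, object]  # ("uri", str) or ("route", sequence of Node-IDs)


def is_gruu(uri: str) -> bool:
    return ";gr=" in uri  # assumption: GRUUs are recognised by the gr parameter


def domain_of(aor: str) -> str:
    # assumption: AORs look like scheme:user@domain[;params]
    return aor.split("@", 1)[-1].split(";", 1)[0]


def resolve_aor(aor: str, overlay_domain: str,
                fetch: Callable[[str], Dict[int, Registration]]
                ) -> Tuple[List[tuple], List[str], List[str]]:
    """Expand an AOR into (destination lists, GRUUs, external AORs)."""
    routes: List[tuple] = []
    gruus: List[str] = []
    external: List[str] = []
    pending, visited = [aor], set()
    while pending:
        current = pending.pop()
        if current in visited:  # loop guard (not part of the text)
            continue
        visited.add(current)
        if domain_of(current) != overlay_domain:  # step 1: external AOR
            external.append(current)
            continue
        for kind, value in fetch(current).values():  # step 2: all dictionary keys
            if kind == "route":
                routes.append(tuple(value))
            elif is_gruu(value):
                gruus.append(value)
            else:  # step 3: another non-GRUU AOR, so repeat from step 1
                pending.append(value)
    # step 4: drop duplicate destination lists and GRUUs, keeping order
    return list(dict.fromkeys(routes)), list(dict.fromkeys(gruus)), external
```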

3926	10.3.  Forming a Direct Connection

3928	   Once the peer has translated the AOR into a set of destination lists,
3929	   it then uses the overlay to route Connect messages to each of those
3930	   peers.  The "application" field MUST be 5060 to indicate SIP.  If
3931	   certificate-based authentication is in use, the responding peer MUST
3932	   present a certificate with a Node-ID matching the terminal entry in
3933	   the route list.  Note that it is possible that the peers already have
3934	   a RELOAD connection between them.  This MUST NOT be used for SIP
3935	   messages.  However, if a SIP connection already exists, that MAY be
3936	   used.  Once the Connect succeeds, the peer sends SIP messages over
3937	   the connection as in normal SIP.
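   The connection rules above can be summarised in a short Python
   sketch.  The names are hypothetical.  `sip_connections` stands for
   whatever table an implementation keeps of SIP connections it has
   already formed.  Existing RELOAD connections are deliberately never
   consulted, because they MUST NOT carry SIP.

```python
from typing import Dict, Sequence, Tuple

SIP_APPLICATION = 5060  # value of the Connect "application" field for SIP


def plan_sip_transport(destination_list: Sequence[int],
                       sip_connections: Dict[int, object]) -> Tuple[str, object]:
    """Reuse an existing SIP connection to the terminal peer, else Connect.

    Existing RELOAD connections are deliberately not consulted here, because
    they MUST NOT be used for SIP messages.
    """
    target = destination_list[-1]
    if target in sip_connections:
        return ("reuse", sip_connections[target])
    return ("connect", {"application": SIP_APPLICATION,
                        "destination_list": list(destination_list)})


def responder_certificate_ok(cert_node_id: int,
                             destination_list: Sequence[int]) -> bool:
    """The responder's certificate Node-ID must match the last route entry."""
    return cert_node_id == destination_list[-1]
```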

3939	10.4.  GRUUs

3941	   GRUUs do not require storing data in the Overlay Instance.  Rather,
3942	   they are constructed by embedding a base64-encoded destination list
3943	   in the gr URI parameter of the GRUU.  The base64 encoding is done
3944	   with the alphabet specified in table 1 of RFC 4648 with the exception
3945	   that ~ is used in place of =.  An example GRUU is
3946	   "sip:alice@example.com;gr=MDEyMzQ1Njc4OTAxMjM0NTY3ODk~".  When a peer
3947	   needs to route a message to a GRUU in the same P2P network, it simply
3948	   uses the destination list and connects to that peer.
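   A minimal Python sketch of the GRUU encoding follows.  It assumes the
   destination list is already serialized to bytes, uses the standard
   base64 alphabet, and swaps "=" for "~" as described.
   `make_gruu` and `gruu_destination_list` are hypothetical helper names.
   Encoding the 20 ASCII bytes "01234567890123456789" reproduces the
   example GRUU above.

```python
import base64


def make_gruu(aor: str, destination_list: bytes) -> str:
    """Embed a serialized destination list in the gr parameter of a GRUU."""
    encoded = base64.b64encode(destination_list).decode("ascii")
    return f"{aor};gr={encoded.replace('=', '~')}"  # '~' replaces '=' padding


def gruu_destination_list(gruu: str) -> bytes:
    """Recover the destination list from a GRUU's gr parameter."""
    for param in gruu.split(";")[1:]:
        name, _, value = param.partition("=")
        if name == "gr":
            return base64.b64decode(value.replace("~", "="), validate=True)
    raise ValueError("URI has no gr parameter")
```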

3950	   Because a GRUU contains a destination list, it MAY have the same
3951	   contents as a destination list stored elsewhere in the resource
3952	   dictionary.

3954	   Anonymous GRUUs are done in roughly the same way but require either
3955	   that the enrollment server issue a different Node-ID for each
3956	   anonymous GRUU required or that a destination list be used that
3957	   includes a peer that compresses the destination list to stop the
3958	   Node-ID from being revealed.

3960	10.5.  SIP-REGISTRATION Kind Definition

3962	   The first mapping is provided using the SIP-REGISTRATION kind-id:

3964	   Kind IDs  The Resource Name for the SIP-REGISTRATION kind-id is the
3965	      AOR of the user.  The data stored is a SipRegistrationData, which
3966	      can contain either another URI or a destination list to the peer
3967	      which is acting for the user.

3969	   Data Model  The data model for the SIP-REGISTRATION kind-id is
3970	      dictionary.  The dictionary key is the Node-ID of the storing
3971	      peer.  This allows each peer (presumably corresponding to a single
3972	      device) to store a single route mapping.

3974	   Access Control  If certificate-based access control is being used,
3975	      stored data of kind-id SIP-REGISTRATION MUST be signed by a
3976	      certificate which (1) contains a user name matching the URI used
3977	      as the Resource Name for the resource-id and (2) contains a
3978	      Node-ID matching the storing dictionary key.

3980	   Data stored under the SIP-REGISTRATION kind is of type
3981	   SipRegistration.  This comes in two varieties:

3983	   sip_registration_uri
3984	      a URI which the user can be reached at.

3986	   sip_registration_route
3987	      a destination list which can be used to reach the user's peer.

3989	11.  Diagnostic Usage

3991	   The Diagnostic Usage allows a node to report various statistics about
3992	   itself that may be useful for diagnostics or performance management.
3993	   It can be used to discover information such as the software version,
3994	   uptime, routing table, stored resource-objects, and performance
3995	   statistics of a peer.  The usage defines several new kinds which can
3996	   be retrieved to get the statistics, and it also allows retrieval of
3997	   other kinds that a node stores.  In essence, the usage allows
3998	   querying a node's state, such as storage and network, to obtain the
3999	   relevant information.

4001	   The access control model for all kinds is a local policy defined by
4002	   the peer or the overlay policy.  The peer may be configured with a
4003	   list of users that it is willing to return the information for and
4004	   restrict access to users with that name.  Unless specific policy
4005	   overrides it, data SHOULD NOT be returned for users not on the list.
4006	   The access control can also be determined on a per kind basis - for
4007	   example, a node may be willing to return the software version to any
4008	   users while specific information about performance may not be
4009	   returned.
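   Because this access control is local policy, every implementation
   chooses its own structure.  One possible shape, sketched in Python
   with a hypothetical policy table, keeps a default user list plus
   per-kind overrides.  An override of None means any user may read that
   kind, as in the software-version example.

```python
from typing import Dict, Optional, Set

# Hypothetical local policy: kind name -> users allowed to read it,
# or None when the kind may be returned to any user.
KindPolicy = Dict[str, Optional[Set[str]]]


def may_return(policy: KindPolicy, default_users: Set[str],
               user: str, kind: str) -> bool:
    """Decide whether diagnostic data of `kind` may be returned to `user`."""
    if kind in policy:  # a per-kind rule overrides the default list
        allowed = policy[kind]
        return allowed is None or user in allowed
    return user in default_users  # otherwise only listed users get data
```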

4011	   The following kinds are defined:

4013	   ROUTING_TABLE_SIZE  A single value element containing an unsigned 32-
4014	      bit integer representing the number of peers in the peer's routing
4015	      table.

4017	   SOFTWARE_VERSION  A single value element containing a US-ASCII string
4018	      that identifies the manufacturer, model, and version of the
4019	      software.

4021	   MACHINE_UPTIME  A single value element containing an unsigned 64-bit
4022	      integer specifying the time the node has been up, in seconds.

4024	   APP_UPTIME  A single value element containing an unsigned 64-bit
4025	      integer specifying the time the p2p application has been up in
4026	      seconds.

4028	   MEMORY_FOOTPRINT  A single value element containing an unsigned 32-
4029	      bit integer representing the memory footprint of the peer program
4030	      in kilobytes.

4032	      Note:  What's a kilo byte? 1000 or 1024? -- Cullen
4033	      Note:  Good question. 1000 seems like not quite enough room but
4034	         1024 is too much? -- EKR

4036	   DATASIZE_STORED  An unsigned 64-bit integer representing the number
4037	      of bytes of data being stored by this node.

4039	   INSTANCES_STORED  An array element containing the number of instances
4040	      of each kind stored.  The array is indexed by kind-id.  Each entry
4041	      is an unsigned 64-bit integer.

4043	   MESSAGES_SENT_RCVD  An array element containing the number of
4044	      messages sent and received.  The array is indexed by method code.
4045	      Each entry in the array is a pair of unsigned 64-bit integers
4046	      (packed end to end) representing sent and received.

4048	   EWMA_BYTES_SENT  A single value element containing an unsigned 32-bit
4049	      integer representing an exponentially weighted average of bytes sent
4050	      per second by this peer.
4051	      sent = alpha x sent_present + (1 - alpha) x sent
4052	      where sent_present represents the bytes sent per second since the
4053	      last calculation and sent represents the last calculation of bytes
4054	      sent per second.  A suitable value for alpha is 0.8.  This value
4055	      is calculated every five seconds.

4057	   EWMA_BYTES_RCVD  A single value element containing an unsigned 32-bit
4058	      integer representing an exponentially weighted average of bytes
4059	      received per second by this peer.  Same calculation as above.
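   The averaging rule can be written out as follows.  This Python sketch
   makes two assumptions.  First, the byte count is sampled once per
   five-second interval.  Second, the average is rounded and clamped
   when it is reported, because the kinds carry unsigned 32-bit
   integers.  The function names are hypothetical.

```python
ALPHA = 0.8            # suggested weight for the newest sample
INTERVAL_SECONDS = 5   # the average is recomputed every five seconds


def ewma_update(previous_rate: float, bytes_in_interval: int,
                alpha: float = ALPHA) -> float:
    """sent = alpha * sent_present + (1 - alpha) * sent, in bytes per second."""
    present_rate = bytes_in_interval / INTERVAL_SECONDS
    return alpha * present_rate + (1 - alpha) * previous_rate


def report_value(rate: float) -> int:
    """Round and clamp to the unsigned 32-bit range carried by the kind."""
    return max(0, min(round(rate), 2 ** 32 - 1))
```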

4061	   [[TODO:  We would like some sort of bandwidth measurement, but we're
4062	   kind of unclear on the units and representation.]]

4064	11.1.  Diagnostic Metrics for a P2PSIP Deployment

4066	   (OPEN QUESTION:  any other metrics?)

4068	   Below, we sketch how these metrics can be used.  A peer can use
4069	   EWMA_BYTES_SENT and EWMA_BYTES_RCVD of another peer to infer whether
4070	   it is acting as a media relay.  It may then choose not to forward any
4071	   requests for media relay to this peer.  Similarly, among the various
4072	   candidates for filling its routing table, a peer may prefer a peer
4073	   with a large UPTIME value, small RTT, and small LAST_CONTACT value.

4075	12.  Chord Algorithm

4077	   This algorithm is assigned the name chord-128-2-16+ to indicate it is
4078	   based on Chord, uses SHA-1 truncated to 128 bits for the
4079	   hash function, stores 2 redundant copies of all data, and has finger
4080	   tables with at least 16 entries.
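   The hash can be sketched in Python as shown here.  The text does not
   say which 128 bits survive the truncation.  This sketch assumes the
   leading 16 bytes of the SHA-1 digest are kept and read as a
   big-endian integer.

```python
import hashlib

NODE_ID_BITS = 128


def chord_hash(data: bytes) -> int:
    """SHA-1 truncated to 128 bits, returned as an integer Resource-ID.

    Assumption: the leading 16 bytes of the 20-byte digest are kept.
    """
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest[:NODE_ID_BITS // 8], "big")
```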

4082	12.1.  Overview

4084	   The algorithm described here is a modified version of the Chord
4085	   algorithm.  Each peer keeps track of a finger table of 16 entries and
4086	   a neighborhood table of 6 entries.  The neighborhood table contains
4087	   the 3 peers before this peer and the 3 peers after it in the DHT
4088	   ring.  The first entry in the finger table contains the peer half-way
4089	   around the ring from this peer; the second entry contains the peer
4090	   that is 1/4 of the way around; the third entry contains the peer that
4091	   is 1/8th of the way around, and so on.  Fundamentally, the chord data
4092	   structure can be thought of as a doubly-linked list formed by knowing
4093	   the successor and predecessor peers in the neighborhood table,
4094	   sorted by the Node-ID.  As long as the successor peers are correct,
4095	   the DHT will return the correct result.  The pointers to the prior
4096	   peers are kept to enable insertion of new peers into the list
4097	   structure.  Keeping multiple predecessor and successor pointers makes
4098	   it possible to maintain the integrity of the data structure even when
4099	   consecutive peers simultaneously fail.  The finger table forms a skip
4100	   list, so that entries in the linked list can be found in O(log(N))
4101	   time instead of the typical O(N) time that a linked list would
4102	   provide.
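   The finger positions described above follow the pattern n + 2^(128-i)
   for entry i, with all arithmetic modulo 2^128.  A minimal Python
   sketch follows; the function name is hypothetical.

```python
from typing import List

NODE_ID_BITS = 128
RING_SIZE = 2 ** NODE_ID_BITS


def finger_targets(n: int, entries: int = 16) -> List[int]:
    """Ring positions for finger entries i = 1..entries.

    Entry 1 is half-way around the ring from n, entry 2 a quarter of the way,
    entry 3 an eighth, and so on.
    """
    return [(n + 2 ** (NODE_ID_BITS - i)) % RING_SIZE
            for i in range(1, entries + 1)]
```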

4104	   A peer, n, is responsible for a particular Resource-ID k if k is less
4105	   than or equal to n and k is greater than p, where p is the Node-ID of
4106	   the previous peer in the neighborhood table.  Because all arithmetic
4107	   is modulo 2^128, these comparisons must account for wraparound at zero.
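   The responsibility test is a half-open interval check (p, n] on the
   ring.  In this Python sketch, the handling of a single-peer ring (p
   equal to n) is an assumption that the text does not address.

```python
RING_SIZE = 2 ** 128


def is_responsible(n: int, p: int, k: int) -> bool:
    """True if peer n, whose predecessor is p, is responsible for Resource-ID k."""
    if n == p:  # assumed: a lone peer is responsible for the whole ring
        return True
    # Measure clockwise distances from p so the interval may wrap past zero.
    return 0 < (k - p) % RING_SIZE <= (n - p) % RING_SIZE
```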

4109	12.2.  Routing

4111	   If a peer is not responsible for a Resource-ID k, but is directly
4112	   connected to a node with Node-Id k, then it routes the message to
4113	   that node.  Otherwise, it routes the request to the peer in the
4114	   routing table that has the largest Node-ID that is in the interval
4115	   between the peer and k.
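   A Python sketch of this next-hop choice follows.  It reads "largest
   Node-ID in the interval" as the entry furthest clockwise from this
   peer that does not pass k.  The sketch returns None when no entry
   qualifies, a case the text leaves open.

```python
from typing import Iterable, Optional

RING_SIZE = 2 ** 128


def next_hop(self_id: int, k: int, routing_table: Iterable[int]) -> Optional[int]:
    """Pick the next peer for a message addressed to Resource-ID k."""
    table = set(routing_table)
    if k in table:  # directly connected to the node whose Node-ID is k
        return k
    limit = (k - self_id) % RING_SIZE  # clockwise distance from this peer to k
    best, best_dist = None, 0
    for node in table:
        dist = (node - self_id) % RING_SIZE
        if best_dist < dist < limit:  # inside (self, k) and further than best
            best, best_dist = node, dist
    return best
```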

4117	12.3.  Redundancy

4119	   When a peer receives a Store request for Resource-ID k, and it is
4120	   responsible for Resource-ID k, it stores the data and returns a
4121	   success response.  [[Open Issue:  should it delay sending this
4122	   success until it has successfully stored the redundant copies?]].  It
4123	   then sends a Store request to its successor in the neighborhood table
4124	   and to that peer's successor.  Note that these Store requests are
4125	   addressed to those specific peers, even though the Resource-ID they
4126	   are being asked to store is outside the range that they are
4127	   responsible for.  The peers receiving these check they came from an
4128	   appropriate predecessor in their neighborhood table and that they are
4129	   in a range that this predecessor is responsible for, and then they
4130	   store the data.  They do not themselves perform further Stores
4131	   because they can determine that they are not responsible for the
4132	   resource-ID.
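   Both sides of the replication rule are sketched in Python here.
   The receiving-side check assumes the receiver's predecessor list is
   ordered closest first, so the predecessor of the sending peer is the
   next entry in that list.  The function names are hypothetical.

```python
from typing import List

RING_SIZE = 2 ** 128
REPLICA_COUNT = 2  # "stores 2 redundant copies of all data"


def replica_targets(successors: List[int]) -> List[int]:
    """The responsible peer sends replicas to its successor and that peer's successor."""
    return successors[:REPLICA_COUNT]


def accept_replica(sender: int, k: int, predecessors: List[int]) -> bool:
    """Receiver check: sender is a near predecessor and is responsible for k.

    `predecessors` is ordered closest first: predecessors[0] is our immediate
    predecessor.
    """
    if sender not in predecessors[:REPLICA_COUNT]:
        return False  # only our two nearest predecessors may send us replicas
    i = predecessors.index(sender)
    if i + 1 >= len(predecessors):
        return False  # the sender's own predecessor is unknown, so its range is too
    low = predecessors[i + 1]  # the sender's predecessor
    return 0 < (k - low) % RING_SIZE <= (sender - low) % RING_SIZE
```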

4134	   Note that a malicious node can return a success response but not
4135	   store the data locally or in the replica set.  Requesting peers which
4136	   wish to ensure that the replication actually occurred SHOULD contact
4137	   each peer listed in the replicas field of the Store response and
4138	   retrieve a copy of the data.  [[TODO:  Do we want to have some
4139	   optimization in Fetch where they can retrieve just a digest instead
4140	   of the data values?]]

4142	12.4.  Joining

4144	   The join process for a joining party (JP) with Node-ID n is as
4145	   follows.

4147	   1.  JP connects to its chosen bootstrap node.
4148	   2.  JP uses a series of Pings to populate its routing table.
4149	   3.  JP sends Connect requests to initiate connections to each of the
4150	       peers in the connection table as well as to the desired finger
4151	       table entries.  Note that this does not populate their routing
4152	       tables, but only their connection tables, so JP will not get
4153	       messages that it is expected to route to other nodes.
4154	   4.  JP enters all the peers it contacted into its routing table.
4155	   5.  JP sends a Join to its immediate successor, the admitting peer
4156	       (AP) for Node-ID n.  The AP sends the response to the Join.
4157	   6.  AP does a series of Store requests to JP to store the data that
4158	       JP will be responsible for.
4159	   7.  AP sends JP an Update explicitly labeling JP as its predecessor.
4160	       At this point, JP is part of the ring and responsible for a
4161	       section of the overlay.  AP can now forget any data which is
4162	       assigned to JP and not AP.
4163	   8.  AP sends an Update to all of its neighbors with the new values of
4164	       its neighbor set (including JP).
4165	   9.  JP sends Updates to all the peers in its routing table.

4167	   In order to populate its routing table, JP sends a Ping via the
4168	   bootstrap node directed at resource-id n+1 (directly after its own
4169	   resource-id).  This allows it to discover its own successor.  Call
4170	   that node p0.  It then sends a ping to p0+1 to discover its successor
4171	   (p1).  This process can be repeated to discover as many successors as
4172	   desired.  The values for the peers preceding n will be found at a
4173	   later stage, when JP receives an Update.
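   The successor walk can be sketched in Python as follows.  `ping` is a
   hypothetical callback that sends a Ping via the bootstrap node to a
   Resource-ID and returns the Node-ID of the responding peer.  The
   stopping rule for a walk that wraps around a small ring is an
   assumption.

```python
from typing import Callable, List

RING_SIZE = 2 ** 128


def discover_successors(n: int, ping: Callable[[int], int],
                        count: int = 3) -> List[int]:
    """Find up to `count` successors of Node-ID n by pinging just past each one."""
    successors: List[int] = []
    target = (n + 1) % RING_SIZE  # directly after JP's own Node-ID
    for _ in range(count):
        peer = ping(target)
        if peer == n or peer in successors:  # wrapped all the way around
            break
        successors.append(peer)
        target = (peer + 1) % RING_SIZE  # next, ask for that peer's successor
    return successors
```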

4175	   In order to set up its finger table entry i, JP simply sends a
4176	   Connect to resource-id n+2^(numBitsInNodeId-i).  This will be
4177	   routed to a peer in approximately the right location around the ring.

4179	12.5.  Routing Connects

4181	   When a peer needs to Connect with a new peer in its neighborhood
4182	   table, it MUST source-route the Connect request through the peer from
4183	   which it learned the new peer's Node-ID.  Source-routing these
4184	   requests allows the overlay to recover from instability.

4186	   All other Connect requests, such as those for new finger table
4187	   entries, are routed conventionally through the overlay.

4189	   If a peer is unable to successfully Connect with a peer that should
4190	   be in its neighborhood, it MUST locate either a TURN server or
4191	   another peer in the overlay, but not in its neighborhood, through
4192	   which it can exchange messages with its neighbor peer.

4194	12.6.  Updates

4196	   A chord Update is defined as
4197	         enum { reserved (0), peer_ready(1), neighbors(2), full(3), (255) }
4198	              ChordUpdateType;

4200	         struct {
4201	           ChordUpdateType         type;

4203	           select(type){
4204	             case peer_ready:                   /* Empty */
4205	               ;

4207	             case neighbors:
4208	               NodeId              predecessors<0..2^16-1>;
4209	               NodeId              successors<0..2^16-1>;

4211	             case full:
4212	               NodeId              predecessors<0..2^16-1>;
4213	               NodeId              successors<0..2^16-1>;
4214	               NodeId              fingers<0..2^16-1>;
4215	           };
4216	         } ChordUpdate;

4218	   The "type" field contains the type of the update, which depends on
4219	   the reason the update was sent.

4221	   peer_ready:    this peer is ready to receive messages.  This message
4222	      is used to indicate that a node which has Connected is a peer and
4223	      can be routed through.  It is also used as a connectivity check to
4224	      non-neighbor peers.
4225	   neighbors:    this version is sent to members of the Chord neighbor
4226	      table.
4227	   full:    this version is sent to peers which request an Update with a
4228	      RouteQueryReq.

4230	   If the message is of type "neighbors", then the contents of the
4231	   message will be:

4233	   predecessors
4234	      The predecessor set of the Updating peer.

4236	   successors
4237	      The successor set of the Updating peer.

4239	   If the message is of type "full", then the contents of the message
4240	   will be:

4242	   predecessors
4243	      The predecessor set of the Updating peer.

4245	   successors
4246	      The successor set of the Updating peer.

4248	   fingers
4249	      The finger table of the Updating peer, in numerically ascending
4250	      order.
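   As an illustration, a ChordUpdate could be serialized as in the
   Python sketch below.  It assumes TLS-style encoding: the enum is one
   byte, each NodeId is 16 bytes in network byte order, and each
   <0..2^16-1> vector is preceded by its length in bytes.  The helper
   names are hypothetical.

```python
import struct
from typing import Iterable

PEER_READY, NEIGHBORS, FULL = 1, 2, 3  # ChordUpdateType values
NODE_ID_BYTES = 16                      # 128-bit Node-IDs


def node_id_vector(ids: Iterable[int]) -> bytes:
    """Encode NodeId ids<0..2^16-1> with a 2-byte length prefix counted in bytes."""
    body = b"".join(i.to_bytes(NODE_ID_BYTES, "big") for i in ids)
    if len(body) > 0xFFFF:
        raise ValueError("too many Node-IDs for one vector")
    return struct.pack("!H", len(body)) + body


def encode_chord_update(update_type: int, predecessors: Iterable[int] = (),
                        successors: Iterable[int] = (),
                        fingers: Iterable[int] = ()) -> bytes:
    out = struct.pack("!B", update_type)  # peer_ready carries nothing else
    if update_type in (NEIGHBORS, FULL):
        out += node_id_vector(predecessors) + node_id_vector(successors)
    if update_type == FULL:
        out += node_id_vector(sorted(fingers))  # numerically ascending order
    return out
```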

4252	   A peer MUST maintain an association (via Connect) to every member of
4253	   its neighbor set.  A peer MUST attempt to maintain at least three
4254	   predecessors and three successors.  However, it MUST send its entire
4255	   set in any Update message sent to neighbors.

4257	12.6.1.  Sending Updates

4259	   Every time a connection to a peer in the neighborhood set is lost (as
4260	   determined by connectivity pings or failure of some request), the
4261	   peer should remove the entry from its neighborhood table and replace
4262	   it with the best match it has from the other peers in its routing
4263	   table.  It then sends an Update to all its remaining neighbors.  The
4264	   update will contain all the Node-IDs of the current entries of the
4265	   table (after the failed one has been removed).  Note that when
4266	   replacing a successor the peer SHOULD delay the creation of new
4267	   replicas for 30 seconds after removing the failed entry from its
4268	   neighborhood table in order to allow a triggered update to inform it
4269	   of a better match for its neighborhood table.

4271	   If connectivity is lost to all three of the peers that succeed this
4272	   peer in the ring, then this peer should behave as if it is joining
4273	   the network and use Pings to find a peer and send it a Join.  If
4274	   connectivity is lost to all the peers in the finger table, this peer
4275	   should assume that it has been disconnected from the rest of the
4276	   network, and it should periodically try to join the DHT.

4278	12.6.2.  Receiving Updates

4280	   When a peer, N, receives an Update request, it examines the Node-IDs
4281	   in the UpdateReq and its neighborhood table and decides if this
4282	   UpdateReq would change its neighborhood table.  This is done by
4283	   taking the set of peers currently in the neighborhood table and
4284	   comparing them to the peers in the update request.  There are three
4285	   major cases:

4287	   o  The UpdateReq contains peers that would not change the neighbor
4288	      set because they match the neighborhood table.
4289	   o  The UpdateReq contains peers closer to N than those in its
4290	      neighborhood table.
4291	   o  The UpdateReq defines peers that indicate a neighborhood table
4292	      further away from N than some of its neighborhood table.  Note
4293	      that merely receiving peers further away does not demonstrate
4294	      this, since the update could be from a node far away from N.
4295	      Rather, the peers would need to bracket N.

4297	   In the first case, no change is needed.

4299	   In the second case, N MUST attempt to Connect to the new peers and if
4300	   it is successful it MUST adjust its neighbor set accordingly.  Note
4301	   that it can maintain the now inferior peers as neighbors, but it MUST
4302	   remember the closer ones.
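   One way to decide whether an UpdateReq offers closer peers is to merge
   the advertised Node-IDs with the current table and keep the nearest
   ones in each direction, as in the Python sketch below.  The sketch
   only computes the candidate table.  Per the text, N still has to
   Connect to any new peer before adopting it.  The function name is
   hypothetical.

```python
from typing import Iterable, List

RING_SIZE = 2 ** 128


def closest(n: int, candidates: Iterable[int], count: int = 3,
            clockwise: bool = True) -> List[int]:
    """Return the `count` Node-IDs nearest to n in one direction around the ring.

    clockwise=True yields successors, clockwise=False predecessors.  n itself
    is excluded, since a peer MUST NOT list itself as a neighbor.
    """
    def distance(x: int) -> int:
        return ((x - n) if clockwise else (n - x)) % RING_SIZE

    unique = {c for c in candidates if c != n}
    return sorted(unique, key=distance)[:count]
```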

4304	   The third case implies that a neighbor has disappeared, most likely
4305	   because it has simply been disconnected but perhaps because of
4306	   overlay instability.  N MUST Ping the questionable peers to discover
4307	   if they are indeed missing and if so, remove them from its
4308	   neighborhood table.

4310	   After any Pings and Connects are done, if the neighborhood table
4311	   changes, the peer sends an Update request to each of its neighbors
4312	   that was in either the old table or the new table.  These Update
4313	   requests are what end up filling in the predecessor/successor tables
4314	   of peers that this peer is a neighbor to.  A peer MUST NOT enter
4315	   itself in its successor or predecessor table and instead should leave
4316	   the entries empty.

4318	   If peer N which is responsible for a resource-id R discovers that the
4319	   replica set for R (the next two nodes in its successor set) has
4320	   changed, it MUST send a Store for any data associated with R to any
4321	   new node in the replica set.  It SHOULD NOT delete data from peers
4322	   which have left the replica set.

4324	   When a peer N detects that it is no longer in the replica set for a
4325	   resource R (i.e., there are three predecessors between N and R), it
4326	   SHOULD delete all data associated with R from its local store.

4328	12.6.3.  Stabilization

4330	   There are four components to stabilization:
4331	   1.  exchange Updates with all peers in its routing table to exchange
4332	       state

4334	   2.  search for better peers to place in its finger table
4335	   3.  search to determine if the current finger table size is
4336	       sufficiently large
4337	   4.  search to determine if the overlay has partitioned and needs to
4338	       recover

4340	   A peer MUST periodically send an Update request to every peer in its
4341	   routing table.  The purpose of this is to keep the predecessor and
4342	   successor lists up to date and to detect connection failures.  The
4343	   default time is about every ten minutes, but the enrollment server
4344	   SHOULD set this in the configuration document using the "chord-128-2-
4345	   16+-update-frequency" element (denominated in seconds.)  A peer
4346	   SHOULD randomly offset these Update requests so they do not occur all
4347	   at once.  If an Update request fails or times out, the peer MUST mark
4348	   that entry in the neighbor table invalid and attempt to reestablish a
4349	   connection.  If no connection can be established, the peer MUST
4350	   attempt to establish a new peer as its neighbor and do whatever
4351	   replica set adjustments are required.
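   The randomized periodic Update can be sketched in Python as shown
   here.  The uniform random offset within one period is an assumption;
   the text only asks that Updates not all occur at once.  The value 600
   seconds is the stated default, used when the configuration document
   supplies no "chord-128-2-16+-update-frequency" value.

```python
import math
import random
from typing import Dict, Iterable, Optional

DEFAULT_UPDATE_FREQUENCY = 600.0  # seconds: "about every ten minutes"


def update_offsets(peers: Iterable[int],
                   frequency: float = DEFAULT_UPDATE_FREQUENCY,
                   rng: Optional[random.Random] = None) -> Dict[int, float]:
    """Give each routing-table peer a random offset within one update period."""
    rng = rng or random.Random()
    return {peer: rng.uniform(0.0, frequency) for peer in peers}


def next_update_due(offset: float, now: float,
                    frequency: float = DEFAULT_UPDATE_FREQUENCY) -> float:
    """Earliest time >= now at which a peer with this offset is due an Update."""
    if now <= offset:
        return offset
    return offset + math.ceil((now - offset) / frequency) * frequency
```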

4353	   Periodically a peer should select a random entry i from the finger
4354	   table and do a Ping to resource-id n+2^(numBitsInNodeId-i).  The
4355	   purpose of this is to find a more accurate finger table entry if
4356	   there is one.  This is done less frequently than the connectivity
4357	   checks in the previous paragraph because forming new connections is
4358	   somewhat expensive and the cost needs to be balanced against the cost
4359	   of not having the most optimal finger table entries.  The default
4360	   time is about every hour, but the enrollment server SHOULD set this
4361	   in the configuration document using the "chord-128-2-16+-ping-
4362	   frequency" element (denominated in seconds).  If this returns a
4363	   different peer than the one currently in this entry of the finger
4364	   table, then a new connection should be formed to this peer and it
4365	   should replace the old peer in the finger table.

4367	   As an overlay grows, more than 16 entries may be required in the
4368	   finger table for efficient routing.  To determine if its finger table
4369	   is sufficiently large, once an hour the peer should perform a Ping to
4370	   determine whether growing its finger table by four entries would
4371	   result in it learning at least two peers that it does not already
4372	   have in its neighbor table.  If so, then the finger table SHOULD be
4373	   grown by four entries.  Similarly, if the peer observes that its
4374	   closest finger table entries are also in its neighbor table, it MAY
4375	   shrink its finger table to the minimum size of 16 entries.  [[OPEN
4376	   ISSUE:  there are a variety of algorithms to gauge the population of
4377	   the overlay and select an appropriate finger table size.  Need to
4378	   consider which is the best combination of effectiveness and
4379	   simplicity.]]

4381	   To detect that a partitioning has occurred and to heal the overlay, a
4382	   peer P MUST periodically repeat the discovery process used in the
4383	   initial join for the overlay to locate an appropriate bootstrap peer,
4384	   B. If an overlay has multiple mechanisms for discovery it should
4385	   randomly select a method to locate a bootstrap peer.  P should then
4386	   send a Ping for its own Node-ID routed through B. If a response is
4387	   received from a peer S', which is not P's successor, then the overlay
4388	   is partitioned and P should send a Connect to S' routed through B,
4389	   followed by an Update sent to S'.  (Note that S' may not be in P's
4390	   neighborhood table once the overlay is healed, but the connection
4391	   will allow S' to discover appropriate neighbor entries for itself via
4392	   its own stabilization.)

4394	12.7.  Route Query

4396	   For this topology plugin, the RouteQueryReq contains no additional
4397	   information.  The RouteQueryAns contains the single node ID of the
4398	   next peer to which the responding peer would have routed the request
4399	   message in recursive routing:

4401	       struct {
4402	          NodeId                  next_id;
4403	       } ChordRouteQueryAns;

4405	   The contents of this structure are as follows:

4407	   next_id
4408	      The peer to which the responding peer would route the message
4409	      in order to deliver it to the destination listed in the request.

4411	   If the requester set the send_update flag, the responder SHOULD
4412	   initiate an Update immediately after sending the RouteQueryAns.

4414	12.8.  Leaving

4416	   Peers SHOULD send a Leave request prior to exiting the Overlay
4417	   Instance.  Any peer which receives a Leave for a peer n in its
4418	   neighbor set must remove it from the neighbor set, update its replica
4419	   sets as appropriate (including Stores of data to new members of the
4420	   replica set) and send Updates containing its new predecessor and
4421	   successor tables.

4423	13.  Enrollment and Bootstrap
4424	13.1.  Discovery

4426	   When a peer first joins a new overlay, it starts with a discovery
4427	   process to find an enrollment server.  Related work to the approach
4428	   used here is described in [I-D.garcia-p2psip-dns-sd-bootstrapping]
4429	   and [I-D.matthews-p2psip-bootstrap-mechanisms].  The peer first
   determines the overlay name, which is provided by the user or by
   some other out-of-band provisioning mechanism.  If the name is an IP
   address, that address is used directly; otherwise, the peer MUST
   perform a DNS SRV query using a Service name of "p2p_enroll" and a
   protocol of TCP to find an enrollment server.

   If the overlay name ends in ".local", peers that implement
   [I-D.cheshire-dnsext-dns-sd] can also try a DNS-SD lookup with a
   Service name of "p2p_menroll" to find an enrollment server.  Peers
   that do so MAY use the user name as the Instance Identifier label.

4442	   Once an address for the enrollment servers is determined, the peer
4443	   forms an HTTPS connection to that IP address.  The certificate MUST
4444	   match the overlay name as described in [RFC2818].  The peer then
   performs a GET on the URL formed by appending the path
   "/p2psip/enroll" to the overlay name.  For example, if the overlay
   name is example.com, the URL is "https://example.com/p2psip/enroll".
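   The discovery and URL-construction steps above can be sketched in
   Python as follows.  The sketch only decides what to look up; it does
   not issue the DNS query or open the HTTPS connection, which would
   need a resolver and a TLS client.  The SRV owner name follows the
   RFC 2782 "_Service._Proto.Name" convention, the bracketing of IPv6
   literals comes from general URL syntax rather than from this
   document, and the ".local" DNS-SD path is not modeled.  Function
   names are hypothetical.

```python
import ipaddress


def enrollment_lookup(overlay_name: str) -> tuple[str, str]:
    """Decide how to locate the enrollment server for overlay_name.

    Returns ("address", ip) when the overlay name is a literal IP address,
    otherwise ("srv", owner_name) for the DNS SRV query to issue.
    """
    try:
        return ("address", str(ipaddress.ip_address(overlay_name)))
    except ValueError:
        # Not an IP literal: build the RFC 2782 owner name _Service._Proto.Name.
        return ("srv", f"_p2p_enroll._tcp.{overlay_name}")


def enrollment_url(overlay_name: str) -> str:
    """Build the enrollment URL by appending /p2psip/enroll to the name."""
    host = overlay_name
    try:
        if ipaddress.ip_address(overlay_name).version == 6:
            host = f"[{overlay_name}]"  # IPv6 literals need brackets in URLs
    except ValueError:
        pass  # a DNS name is used as-is
    return f"https://{host}/p2psip/enroll"


print(enrollment_lookup("example.com"))  # ('srv', '_p2p_enroll._tcp.example.com')
print(enrollment_lookup("192.0.2.1"))    # ('address', '192.0.2.1')
print(enrollment_url("example.com"))     # https://example.com/p2psip/enroll
```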

4449	   The result is an XML configuration file with the syntax described in
4450	   the following section.

4452	13.2.  Overlay Configuration

   This specification defines a new content type
   "application/p2p-overlay+xml" for a MIME entity that contains overlay
   information.  This information is fetched from the enrollment server,
   as described above.  An example document is shown below.

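      <?xml version="1.0" encoding="UTF-8"?>
      <overlay xmlns="urn:ietf:params:xml:ns:p2p:overlay"
               instance-name="example.com">
        <topology-plugin algorithm-name="chord-128-2-16+"/>
        <root-cert>
          [PEM encoded certificate here]
        </root-cert>
        <required-kinds name="SIP-REGISTRATION" max-count="10"
                        max-size="1000"/>
      </overlay>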

   The file MUST be a well-formed XML document, and it SHOULD contain an
   encoding declaration in the XML declaration.  If the charset
   parameter of the MIME content type declaration is present and it is
   different from the encoding declaration, the charset parameter takes
   precedence.  Every application conformant to this specification MUST
4477	   accept the UTF-8 character encoding to ensure minimal
4478	   interoperability.  The namespace for the elements defined in this
4479	   specification is urn:ietf:params:xml:ns:p2p:overlay.
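   The encoding-precedence rule above can be sketched in Python as
   follows.  The sketch uses the standard email package to read the
   charset parameter and a regular expression to read the XML encoding
   declaration.  It assumes UTF-8 when neither is present, and it does
   not handle byte-order-mark detection for UTF-16 documents.  The
   function name is hypothetical.

```python
import re
from email.message import Message

# Matches encoding="..." inside a leading XML declaration (ASCII-compatible
# encodings only; BOM-based UTF-16 detection is not handled here).
_XML_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def effective_encoding(content_type: str, body: bytes) -> str:
    """Pick the character encoding for a configuration document.

    The MIME charset parameter wins over the XML encoding declaration;
    with neither present, UTF-8 is assumed.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")  # None when the parameter is absent
    if isinstance(charset, str) and charset:
        return charset.lower()
    match = _XML_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

   A caller would then decode the body with
   body.decode(effective_encoding(content_type, body)) before further
   processing.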

4481	   The file can contain multiple "overlay" elements where each one
4482	   contains the configuration information for a different overlay.  Each
4483	   "overlay" has the following attributes:

4485	   instance-name:  name of the overlay

   expiration:  time in the future at which this overlay configuration
      is no longer valid and needs to be retrieved again.  This is
      expressed in seconds from the current time.

4491	   Inside each overlay element, the following elements can occur:

4493	   topology-plugin
4494	      This element has an attribute called algorithm-name that describes
4495	      the overlay-algorithm being used.

4497	   root-cert
4498	      This element contains a PEM encoded X.509v3 certificate that is
4499	      the root trust store used to sign all certificates in this
4500	      overlay.  There can be more than one of these.

4502	   required-kinds
4503	      This element indicates the kinds that members must support.  It
4504	      has three attributes:
4505	      *  name:  a string representing the kind.
4506	      *  max-count:  the maximum number of values which members of the
4507	         overlay must support.
4508	      *  max-size:  the maximum size of individual values.
4509	      For instance, the example above indicates that members must
4510	      support SIP-REGISTRATION with a maximum of 10 values of up to 1000
4511	      bytes each.  Multiple required-kinds elements MAY be present.

4513	   credential-server
4514	      This element contains the URL at which the credential server can
4515	      be reached in a "url" element.  This URL MUST be of type "https:".
4516	      More than one credential-server element may be present.

4518	   self-signed-permitted
4519	      This element indicates whether self-signed certificates are
4520	      permitted.  If it is set to "TRUE", then self-signed certificates
4521	      are allowed, in which case the credential-server and root-cert
4522	      elements may be absent.  Otherwise, it SHOULD be absent, but MAY
4523	      be set "FALSE".  This element also contains an attribute "digest"
4524	      which indicates the digest to be used to compute the Node-ID.
4525	      Valid values for this parameter are "SHA-1" and "SHA-256".

4527	   bootstrap-peer
4528	      This elements represents the address of one of the bootstrap
4529	      peers.  It has an attribute called "address" that represents the
4530	      IP address (either IPv4 or IPv6, since they can be distinguished)
4531	      and an attribute called "port" that represents the port.  More
4532	      than one bootstrap-peer element may be present.

4534	   multicast-bootstrap
4535	      This element represents the address of a multicast address and
4536	      port that may be used for bootstrap and that peers SHOULD listen
4537	      on to enable bootstrap.  It has an attributed called "address"
4538	      that represents the IP address and an attribute called "port" that
4539	      represents the port.  More than one "multicast-bootstrap" element
4540	      may be present.

4542	   clients-permitted
4543	      This element represents whether clients are permitted or whether
4544	      all nodes must be peers.  If it is set to "TRUE" or absent, this
4545	      indicates that clients are permitted.  If it is set to "FALSE"
4546	      then nodes MUST join as peers.

4548	   chord-128-2-16+-update-frequency
4549	      The update frequency for the Chord-128-2-16+ topology plugin (see
4550	      Section 12).

4552	   chord-128-2-16+-ping-frequency
4553	      The ping frequency for the Chord-128-2-16+ topology plugin (see
4554	      Section 12).

4556	   credential-server
4557	      Base URL for credential server.

4559	   shared-secret
4560	      If shared secret mode is used, this contains the shared secret.

4562	   [[TODO:  Do a RelaxNG grammar.]]
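   Until a formal grammar exists, the following Python sketch shows one
   way to read the elements described above with the standard
   xml.etree.ElementTree module.  The namespace, element names,
   attribute names, and the default for clients-permitted come from this
   section.  Other details are assumptions: "TRUE"/"FALSE" are treated
   as element content, "overlay" elements are located at any depth
   because the name of any wrapper element is not specified here, and
   the result layout is hypothetical.  No schema validation is
   performed.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:p2p:overlay"


def _q(tag: str) -> str:
    """Qualify a tag name with the overlay configuration namespace."""
    return f"{{{NS}}}{tag}"


def _flag(elem, default: bool) -> bool:
    """Interpret a TRUE/FALSE element body; absent elements use default."""
    if elem is None:
        return default
    return (elem.text or "").strip().upper() == "TRUE"


def parse_overlays(doc: bytes) -> list[dict]:
    root = ET.fromstring(doc)
    result = []
    # iter() also yields root itself, so this works whether "overlay" is
    # the document element or sits inside some wrapper element.
    for ov in root.iter(_q("overlay")):
        plugin = ov.find(_q("topology-plugin"))
        expiration = ov.get("expiration")
        result.append({
            "instance_name": ov.get("instance-name"),
            "expiration_s": int(expiration) if expiration is not None else None,
            "algorithm": plugin.get("algorithm-name") if plugin is not None else None,
            "required_kinds": [
                (k.get("name"), int(k.get("max-count")), int(k.get("max-size")))
                for k in ov.findall(_q("required-kinds"))
            ],
            "bootstrap_peers": [
                (b.get("address"), int(b.get("port")))
                for b in ov.findall(_q("bootstrap-peer"))
            ],
            "self_signed_permitted": _flag(
                ov.find(_q("self-signed-permitted")), default=False
            ),
            # Absent clients-permitted means clients are permitted.
            "clients_permitted": _flag(
                ov.find(_q("clients-permitted")), default=True
            ),
        })
    return result
```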

4564	13.3.  Credentials

4566	   If the configuration document contains a credential-server element,
4567	   credentials are required to join the Overlay Instance.  A peer which
4568	   does not yet have credentials MUST contact the credential server to
4569	   acquire them.

4571	   In order to acquire credentials, the peer generates an asymmetric key
4572	   pair and then generates a "Simple Enrollment Request" (as defined in
4573	   [I-D.ietf-pkix-2797-bis]) and sends this over HTTPS as defined in
4574	   [I-D.ietf-pkix-cmc-trans] to the URL in the credential-server
4575	   element.  The subjectAltName in the request MUST contain the required
4576	   user name.

4578	   The credential server MUST authenticate the request using the
4579	   provided user name and password.  If the authentication succeeds and
   the requested user name is acceptable, the server returns a
4581	   certificate.  The SubjectAltName field in the certificate contains
4582	   the following values:

4584	   o  One or more Node-IDs which MUST be cryptographically random
4585	      [RFC4086].  These MUST be chosen by the credential server in such
4586	      a way that they are unpredictable to the requesting user.  These
4587	      are of type URI and MUST contain RELOAD URIs as described in
4588	      Section 16.7 and MUST contain a Destination list with a single
4589	      entry of type "node_id".
4590	   o  The names this user is allowed to use in the overlay, using type
4591	      rfc822Name.

4593	   The certificate is returned in a "Simple Enrollment Response".
4594	   [[TODO:  REF]]

4596	   The client MUST check that the certificate returned was signed by one
4597	   of the certificates received in the "root-cert" list of the overlay
4598	   configuration data.  The peer then reads the certificate to find the
4599	   Node-IDs it can use.

4601	13.3.1.  Self-Generated Credentials

4603	   If the "self-signed-permitted" element is present and set to "TRUE",
4604	   then a node MUST generate its own self-signed certificate to join the
4605	   overlay.  The self-signed certificate MAY contain any user name of
   the user's choice.  Users SHOULD make some attempt to make it unique
4607	   but this document does not specify any mechanisms for that.

4609	   The Node-Id MUST be computed by applying the digest specified in the
4610	   self-signed-permitted element to the DER representation of the user's
4611	   public key.  When accepting a self-signed certificate, nodes MUST
4612	   check that the Node-ID and public keys match.  This prevents Node-ID
4613	   theft.
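   The Node-ID computation and check above can be sketched in Python
   with hashlib.  The DER-encoded public key is treated as opaque bytes.
   Extracting it from a certificate requires an X.509 library and is not
   shown.  This section does not say whether the digest is truncated to
   the overlay's Node-ID length, so the sketch returns the full digest.
   Function names are hypothetical.

```python
import hashlib

# Digest names allowed by the self-signed-permitted "digest" attribute,
# mapped to hashlib algorithm names.
_DIGESTS = {"SHA-1": "sha1", "SHA-256": "sha256"}


def self_signed_node_id(der_public_key: bytes, digest: str) -> bytes:
    """Compute a Node-ID as digest(DER-encoded public key)."""
    try:
        algorithm = _DIGESTS[digest]
    except KeyError:
        raise ValueError(f"unsupported digest: {digest!r}") from None
    return hashlib.new(algorithm, der_public_key).digest()


def node_id_matches(claimed: bytes, der_public_key: bytes, digest: str) -> bool:
    """On accepting a self-signed certificate, check Node-ID against key."""
    return claimed == self_signed_node_id(der_public_key, digest)
```

   Because the Node-ID is derived from the public key, a node cannot
   claim another node's Node-ID without also holding the matching
   private key.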

4615	   Once the node has constructed a self-signed certificate, it MAY join
4616	   the overlay.  Before storing its certificate in the overlay
4617	   (Section 8) it SHOULD look to see if the user name is already taken
4618	   and if so choose another user name.  Note that this only provides
4619	   protection against accidental name collisions.  Name theft is still
4620	   possible.  If protection against name theft is desired, then the
4621	   enrollment service must be used.

4623	13.4.  Joining the Overlay Peer

4625	   In order to join the overlay, the peer MUST contact a peer.
4626	   Typically this means contacting the bootstrap peers, since they are
4627	   guaranteed to have public IP addresses (the system should not
4628	   advertise them as bootstrap peers otherwise).  If the peer has cached
4629	   peers it SHOULD contact them first by sending a Ping request to the
4630	   known peer address with the destination Node-ID set to that peer's
4631	   Node-ID.

4633	   If no cached peers are available, then the peer SHOULD send a Ping
   request to an address and port found in a bootstrap-peer or
   multicast-bootstrap element in the configuration document.  This MAY
   be a multicast or anycast
4636	   address.  The Ping should use the wildcard Node-ID as the destination
4637	   Node-ID.

4639	   The responder peer that receives the Ping request SHOULD check that
4640	   the overlay name is correct and that the requester peer sending the
4641	   request has appropriate credentials for the overlay before responding
4642	   to the Ping request even if the response is only an error.

4644	   When the requester peer finally does receive a response from some
4645	   responding peer, it can note the Node-ID in the response and use this
4646	   Node-ID to start sending requests to join the Overlay Instance as
4647	   described in Section 6.3.

4649	   After a peer has successfully joined the overlay network, it SHOULD
4650	   periodically look at any peers to which it has managed to form direct
4651	   connections.  Some of these peers MAY be added to the cached-peers
4652	   list and used in future boots.  Peers that are not directly connected
4653	   MUST NOT be cached.  The RECOMMENDED number of peers to cache is 10.
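   The contact order and cache rules above can be sketched in Python as
   follows.  This section does not specify which directly connected
   peers to keep, so the sketch simply keeps the first ten.  WILDCARD is
   a hypothetical stand-in for the wildcard Node-ID, whose encoding is
   defined elsewhere.  Class and function names are also hypothetical.

```python
from dataclasses import dataclass

MAX_CACHED_PEERS = 10  # RECOMMENDED cache size from the text above
WILDCARD = None        # hypothetical stand-in for the wildcard Node-ID


@dataclass
class KnownPeer:
    node_id: str
    address: str
    port: int
    directly_connected: bool = True


def ping_plan(cached: list[KnownPeer], bootstrap: list[tuple[str, int]]):
    """Ordered (address, port, destination Node-ID) tuples to Ping.

    Cached peers are tried first, addressed to their own Node-IDs; the
    configured bootstrap addresses follow, addressed to the wildcard Node-ID.
    """
    plan = [(p.address, p.port, p.node_id) for p in cached]
    plan += [(address, port, WILDCARD) for address, port in bootstrap]
    return plan


def refresh_cache(peers: list[KnownPeer]) -> list[KnownPeer]:
    """Keep at most MAX_CACHED_PEERS peers, and only directly connected ones."""
    return [p for p in peers if p.directly_connected][:MAX_CACHED_PEERS]
```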

4655	14.  Message Flow Example

4657	   In the following example, we assume that JP has formed a connection
4658	   to one of the bootstrap peers.  JP then sends a Connect through that
4659	   peer to the admitting peer (AP) to initiate a connection.  When AP
4660	   responds, JP and AP use ICE to set up a connection and then set up
4661	   TLS.

4663	          JP        PPP       PP        AP        NP        NNP       BP
4664	           |         |         |         |         |         |         |
4665	           |         |         |         |         |         |         |
4666	           |         |         |         |         |         |         |
4667	           |Connect Dest=JP    |         |         |         |         |
4668	           |---------------------------------------------------------->|
4669	           |         |         |         |         |         |         |
4670	           |         |         |         |         |         |         |
4671	           |         |         |Connect Dest=JP    |         |         |
4672	           |         |         |<--------------------------------------|
4673	           |         |         |         |         |         |         |
4674	           |         |         |         |         |         |         |
4675	           |         |         |Connect Dest=JP    |         |         |
4676	           |         |         |-------->|         |         |         |
4677	           |         |         |         |         |         |         |
4678	           |         |         |         |         |         |         |
4679	           |         |         |ConnectAns         |         |         |
4680	           |         |         |<--------|         |         |         |
4681	           |         |         |         |         |         |         |
4682	           |         |         |         |         |         |         |
4683	           |         |         |ConnectAns         |         |         |
4684	           |         |         |-------------------------------------->|
4685	           |         |         |         |         |         |         |
4686	           |         |         |         |         |         |         |
4687	           |ConnectAns         |         |         |         |         |
4688	           |<----------------------------------------------------------|
4689	           |         |         |         |         |         |         |
4690	           |         |         |         |         |         |         |
4691	           |TLS      |         |         |         |         |         |
4692	           |.............................|         |         |         |
4693	           |         |         |         |         |         |         |
4694	           |         |         |         |         |         |         |
4695	           |         |         |         |         |         |         |
4696	           |         |         |         |         |         |         |

4698	   Once JP has connected to AP, it needs to populate its Routing Table.
4699	   In Chord, this means that it needs to populate its neighbor table and
4700	   its finger table.  To populate its neighbor table, it needs the
   successor of AP, NP.  It sends a Connect to the Resource-ID AP+1,
4702	   which gets routed to NP.  When NP responds, JP and NP use ICE and TLS
4703	   to set up a connection.

4705	          JP        PPP       PP        AP        NP        NNP       BP
4706	           |         |         |         |         |         |         |
4707	           |         |         |         |         |         |         |
4708	           |         |         |         |         |         |         |
4709	           |Connect AP+1       |         |         |         |         |
4710	           |---------------------------->|         |         |         |
4711	           |         |         |         |         |         |         |
4712	           |         |         |         |         |         |         |
4713	           |         |         |         |Connect AP+1       |         |
4714	           |         |         |         |-------->|         |         |
4715	           |         |         |         |         |         |         |
4716	           |         |         |         |         |         |         |
4717	           |         |         |         |ConnectAns         |         |
4718	           |         |         |         |<--------|         |         |
4719	           |         |         |         |         |         |         |
4720	           |         |         |         |         |         |         |
4721	           |ConnectAns         |         |         |         |         |
4722	           |<----------------------------|         |         |         |
4723	           |         |         |         |         |         |         |
4724	           |         |         |         |         |         |         |
4725	           |Connect  |         |         |         |         |         |
4726	           |-------------------------------------->|         |         |
4727	           |         |         |         |         |         |         |
4728	           |         |         |         |         |         |         |
4729	           |TLS      |         |         |         |         |         |
4730	           |.......................................|         |         |
4731	           |         |         |         |         |         |         |
4732	           |         |         |         |         |         |         |
4733	           |         |         |         |         |         |         |
4734	           |         |         |         |         |         |         |
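   The reason a Connect to AP+1 reaches NP can be sketched in Python
   under the usual Chord responsibility rule.  Under that rule, a
   Resource-ID belongs to the first peer whose ID equals or follows it
   on the ring.  The sketch is a hypothetical global view for
   illustration only, since a real peer knows only its own neighbors and
   fingers.  The IDs are illustrative values.

```python
import bisect

ID_SPACE = 2 ** 128


def responsible_peer(peer_ids: list[int], resource_id: int) -> int:
    """Return the first peer clockwise at or after resource_id on the ring."""
    ring = sorted(peer_ids)
    i = bisect.bisect_left(ring, resource_id % ID_SPACE)
    # Past the highest peer ID, responsibility wraps to the lowest one.
    return ring[i] if i < len(ring) else ring[0]


ap_id, np_id = 1000, 5000
print(responsible_peer([100, ap_id, np_id, 9000], ap_id + 1))  # 5000, i.e. NP
```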

4736	   JP also needs to populate its finger table (for Chord).  It issues a
4737	   Connect to a variety of locations around the overlay.  The diagram
   below shows it sending a Connect halfway around the Chord ring, to
   Resource-ID JP + 2^127 (written JP+2<<126 in the diagram).

4741	            JP        NP        XX        TP
4742	             |         |         |         |
4743	             |         |         |         |
4744	             |         |         |         |
4745	             |Connect JP+2<<126  |         |
4746	             |-------->|         |         |
4747	             |         |         |         |
4748	             |         |         |         |
4749	             |         |Connect JP+2<<126  |
4750	             |         |-------->|         |
4751	             |         |         |         |
4752	             |         |         |         |
4753	             |         |         |Connect JP+2<<126
4754	             |         |         |-------->|
4755	             |         |         |         |
4756	             |         |         |         |
4757	             |         |         |ConnectAns
4758	             |         |         |<--------|
4759	             |         |         |         |
4760	             |         |         |         |
4761	             |         |ConnectAns         |
4762	             |         |<--------|         |
4763	             |         |         |         |
4764	             |         |         |         |
4765	             |ConnectAns         |         |
4766	             |<--------|         |         |
4767	             |         |         |         |
4768	             |         |         |         |
4769	             |TLS      |         |         |
4770	             |.............................|
4771	             |         |         |         |
4772	             |         |         |         |
4773	             |         |         |         |
4774	             |         |         |         |
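   The finger locations can be computed as in the following Python
   sketch.  It uses the classic Chord target formula, node_id + 2^i
   modulo 2^128.  The set of fingers this plugin actually maintains is
   defined in Section 12, not here, and the JP value is illustrative.

```python
ID_BITS = 128
ID_SPACE = 1 << ID_BITS


def finger_target(node_id: int, i: int) -> int:
    """Resource-ID for the i-th finger: node_id + 2^i, modulo 2^128."""
    return (node_id + (1 << i)) % ID_SPACE


jp_id = 42
# True: the i = 127 finger lies halfway around the ring, at JP + 2<<126.
print(finger_target(jp_id, 127) == jp_id + (2 << 126))
targets = [finger_target(jp_id, i) for i in range(ID_BITS)]
```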

   Once JP has a reasonable set of connections, it is ready to take its
   place in the DHT.  It does this by sending a Join to AP.  AP does a
4778	   series of Store requests to JP to store the data that JP will be
4779	   responsible for.  AP then sends JP an Update explicitly labeling JP
4780	   as its predecessor.  At this point, JP is part of the ring and
4781	   responsible for a section of the overlay.  AP can now forget any data
4782	   which is assigned to JP and not AP.

4784	          JP        PPP       PP        AP        NP        NNP       BP
4785	           |         |         |         |         |         |         |
4786	           |         |         |         |         |         |         |
4787	           |         |         |         |         |         |         |
4788	           |JoinReq  |         |         |         |         |         |
4789	           |---------------------------->|         |         |         |
4790	           |         |         |         |         |         |         |
4791	           |         |         |         |         |         |         |
4792	           |JoinAns  |         |         |         |         |         |
4793	           |<----------------------------|         |         |         |
4794	           |         |         |         |         |         |         |
4795	           |         |         |         |         |         |         |
4796	           |StoreReq Data A    |         |         |         |         |
4797	           |<----------------------------|         |         |         |
4798	           |         |         |         |         |         |         |
4799	           |         |         |         |         |         |         |
4800	           |StoreAns |         |         |         |         |         |
4801	           |---------------------------->|         |         |         |
4802	           |         |         |         |         |         |         |
4803	           |         |         |         |         |         |         |
4804	           |StoreReq Data B    |         |         |         |         |
4805	           |<----------------------------|         |         |         |
4806	           |         |         |         |         |         |         |
4807	           |         |         |         |         |         |         |
4808	           |StoreAns |         |         |         |         |         |
4809	           |---------------------------->|         |         |         |
4810	           |         |         |         |         |         |         |
4811	           |         |         |         |         |         |         |
4812	           |UpdateReq|         |         |         |         |         |
4813	           |<----------------------------|         |         |         |
4814	           |         |         |         |         |         |         |
4815	           |         |         |         |         |         |         |
4816	           |UpdateAns|         |         |         |         |         |
4817	           |---------------------------->|         |         |         |
4818	           |         |         |         |         |         |         |
4819	           |         |         |         |         |         |         |
4820	           |         |         |         |         |         |         |
4821	           |         |         |         |         |         |         |
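   The choice of data that AP hands over in the Stores above can be
   sketched in Python.  The sketch assumes that AP was responsible for
   the ring interval (PP, AP] and that JP, joining between PP and AP,
   takes over (PP, JP].  Any copies AP keeps as a replica are not
   modeled.  Names and IDs are illustrative.

```python
ID_SPACE = 2 ** 128


def in_half_open(x: int, start: int, end: int) -> bool:
    """True if x lies in the ring interval (start, end]; assumes start != end."""
    return 0 < (x - start) % ID_SPACE <= (end - start) % ID_SPACE


def handover_ids(stored_ids: list[int], pp_id: int, jp_id: int) -> list[int]:
    """Resource-IDs that AP should Store to JP: those now in (PP, JP]."""
    return sorted(r for r in stored_ids if in_half_open(r, pp_id, jp_id))


print(handover_ids([150, 200, 250, 300], pp_id=100, jp_id=200))  # [150, 200]
```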

4823	   In Chord, JP's neighbor table needs to contain its own predecessors.
   It could not connect to them previously because Chord has no way to
   route directly to a node's predecessors.  However, now that it has
   received an Update from AP, it has AP's predecessors, which are also
4827	   its own, so it sends Connects to them.  Below it is shown connecting
4828	   to its closest predecessor, PP.

4830	          JP        PPP       PP        AP        NP        NNP       BP
4831	           |         |         |         |         |         |         |
4832	           |         |         |         |         |         |         |
4833	           |         |         |         |         |         |         |
4834	           |Connect Dest=PP    |         |         |         |         |
4835	           |---------------------------->|         |         |         |
4836	           |         |         |         |         |         |         |
4837	           |         |         |         |         |         |         |
4838	           |         |         |Connect Dest=PP    |         |         |
4839	           |         |         |<--------|         |         |         |
4840	           |         |         |         |         |         |         |
4841	           |         |         |         |         |         |         |
4842	           |         |         |ConnectAns         |         |         |
4843	           |         |         |-------->|         |         |         |
4844	           |         |         |         |         |         |         |
4845	           |         |         |         |         |         |         |
4846	           |ConnectAns         |         |         |         |         |
4847	           |<----------------------------|         |         |         |
4848	           |         |         |         |         |         |         |
4849	           |         |         |         |         |         |         |
4850	           |TLS      |         |         |         |         |         |
4851	           |...................|         |         |         |         |
4852	           |         |         |         |         |         |         |
4853	           |         |         |         |         |         |         |
4854	           |UpdateReq|         |         |         |         |         |
4855	           |------------------>|         |         |         |         |
4856	           |         |         |         |         |         |         |
4857	           |         |         |         |         |         |         |
4858	           |UpdateAns|         |         |         |         |         |
4859	           |<------------------|         |         |         |         |
4860	           |         |         |         |         |         |         |
4861	           |         |         |         |         |         |         |
4862	           |UpdateReq|         |         |         |         |         |
4863	           |---------------------------->|         |         |         |
4864	           |         |         |         |         |         |         |
4865	           |         |         |         |         |         |         |
4866	           |UpdateAns|         |         |         |         |         |
4867	           |<----------------------------|         |         |         |
4868	           |         |         |         |         |         |         |
4869	           |         |         |         |         |         |         |
4870	           |UpdateReq|         |         |         |         |         |
4871	           |-------------------------------------->|         |         |
4872	           |         |         |         |         |         |         |
4873	           |         |         |         |         |         |         |
4874	           |UpdateAns|         |         |         |         |         |
4875	           |<--------------------------------------|         |         |
4876	           |         |         |         |         |         |         |
4877	           |         |         |         |         |         |         |

4879	   Finally, now that JP has a copy of all the data and is ready to route
4880	   messages and receive requests, it sends Updates to everyone in its
4881	   Routing Table to tell them it is ready to go.  Below, it is shown
4882	   sending such an update to TP.

4884	            JP        NP        XX        TP
4885	             |         |         |         |
4886	             |         |         |         |
4887	             |         |         |         |
4888	             |Update   |         |         |
4889	             |---------------------------->|
4890	             |         |         |         |
4891	             |         |         |         |
4892	             |UpdateAns|         |         |
4893	             |<----------------------------|
4894	             |         |         |         |
4895	             |         |         |         |
4896	             |         |         |         |
4897	             |         |         |         |

4899	15.  Security Considerations

4901	15.1.  Overview

4903	   RELOAD provides a generic storage service, albeit one designed to be
4904	   useful for P2PSIP.  In this section we discuss security issues that
4905	   are likely to be relevant to any usage of RELOAD.  In Section 15.7 we
4906	   describe issues that are specific to SIP.

4908	   In any Overlay Instance, any given user depends on a number of peers
4909	   with which they have no well-defined relationship except that they
4910	   are fellow members of the Overlay Instance.  In practice, these other
4911	   nodes may be friendly, lazy, curious, or outright malicious.  No
4912	   security system can provide complete protection in an environment
4913	   where most nodes are malicious.  The goal of security in RELOAD is to
4914	   provide strong security guarantees of some properties even in the
4915	   face of a large number of malicious nodes and to allow the overlay to
4916	   function correctly in the face of a modest number of malicious nodes.

4918	   P2PSIP deployments require the ability to authenticate both peers and
4919	   resources (users) without the active presence of a trusted entity in
4920	   the system.  We describe two mechanisms.  The first mechanism is
4921	   based on public key certificates and is suitable for general
4922	   deployments.  The second is based on an overlay-wide shared symmetric
4923	   key and is suitable only for limited deployments in which the
4924	   relationship between admitted peers is not adversarial.

4926	15.2.  Attacks on P2P Overlays

4928	   The two basic functions provided by overlay nodes are storage and
4929	   routing:  some node is responsible for storing a peer's data and for
   allowing a peer to fetch other peers' data.  Some other set of nodes
4931	   are responsible for routing messages to and from the storing nodes.
4932	   Each of these issues is covered in the following sections.

4934	   P2P overlays are subject to attacks by subversive nodes that may
4935	   attempt to disrupt routing, corrupt or remove user registrations, or
4936	   eavesdrop on signaling.  The certificate-based security algorithms we
4937	   describe in this draft are intended to protect overlay routing and
4938	   user registration information in RELOAD messages.

4940	   To protect the signaling from attackers pretending to be valid peers
4941	   (or peers other than themselves), the first requirement is to ensure
4942	   that all messages are received from authorized members of the
4943	   overlay.  For this reason, RELOAD transports all messages over a
4944	   secure channel (TLS and DTLS are defined in this document) which
4945	   provides message integrity and authentication of the directly
4946	   communicating peer.  In addition, when the certificate-based security
4947	   system is used, messages and data are digitally signed with the
4948	   sender's private key, providing end-to-end security for
4949	   communications.

4951	15.3.  Certificate-based Security

4953	   This specification stores users' registrations and possibly other
   data in an overlay network.  This requires a solution for securing
   this data as well as for securing, as far as possible, the routing in
4956	   the overlay.  Both types of security are based on requiring that
4957	   every entity in the system (whether user or peer) authenticate
4958	   cryptographically using an asymmetric key pair tied to a certificate.

4960	   When a user enrolls in the Overlay Instance, they request or are
4961	   assigned a unique name, such as "alice@dht.example.net".  These names
4962	   are unique and are meant to be chosen and used by humans much like a
4963	   SIP Address of Record (AOR) or an email address.  The user is also
4964	   assigned one or more Node-IDs by the central enrollment authority.
   Both the name and the Node-ID(s) are placed in the certificate, along
4966	   with the user's public key.

4968	   Each certificate enables an entity to act in two sorts of roles:

4970	   o  As a user, storing data at specific Resource-IDs in the Overlay
4971	      Instance corresponding to the user name.

   o  As an overlay peer with the Node-ID(s) listed in the certificate.

4975	   Note that since only users of this Overlay Instance need to validate
4976	   a certificate, this usage does not require a global PKI.  Instead,
   certificates are signed by a central enrollment authority which acts
   as the certificate authority for the Overlay Instance.  This
   authority signs each peer's certificate.  Because each peer possesses
   the CA's certificate (which it receives on enrollment), it can verify
   the certificates of the other entities in the overlay without further
   communication.  Because the certificates contain the user/peer's
   public key, communications from the user/peer can be verified in
   turn.

   If self-signed certificates are used, then the security provided is
   significantly decreased, since attackers can mount Sybil attacks.  In
   addition, other participants cannot trust the user names in
   certificates (though they can trust the Node-IDs, because they are
   cryptographically verifiable).  This scheme is only appropriate for
   small deployments, such as a small office or an ad hoc overlay set up
   among participants in a meeting.  Some additional security can be
   provided by also using the shared-secret admission control scheme.

   Because all stored data is signed by its owner, the storing peer can
   verify that the storer is authorized to perform a store at that
   Resource-ID, and any consumer of the data can verify its provenance
   and integrity when retrieving it.

5000	   All implementations MUST implement certificate-based security.

5002	15.4.  Shared-Secret Security

5004	   RELOAD also supports a shared secret admission control scheme that
5005	   relies on a single key that is shared among all members of the
5006	   overlay.  It is appropriate for small groups that wish to form a
5007	   private network without complexity.  In shared secret mode, all the
5008	   peers share a single symmetric key which is used to key TLS-PSK
5009	   [RFC4279] or TLS-SRP [I-D.ietf-tls-srp] mode.  A peer which does not
5010	   know the key cannot form TLS connections with any other peer and
5011	   therefore cannot join the overlay.

   One natural approach to a shared-secret scheme is to use a
   user-entered password as the key.  The difficulty with this is that
   in TLS-PSK mode, such keys are very susceptible to offline dictionary
   attacks.  If passwords are used as the source of shared keys, then
   TLS-SRP is a superior choice because it is not subject to offline
   dictionary attacks.
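   The weakness of password-derived PSKs can be illustrated with a toy
   model.  This sketch is NOT the real TLS-PSK key schedule: it assumes
   the PSK is a plain SHA-256 hash of the password and uses HMAC-SHA256
   over a recorded transcript as a stand-in for the keyed value an
   eavesdropper observes.  The function names are hypothetical.  The
   point is that each guess is tested locally, with no peer contacted.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def psk_from_password(password: str) -> bytes:
    # Toy assumption: the PSK is simply a hash of the user-entered password.
    return hashlib.sha256(password.encode("utf-8")).digest()


def handshake_tag(psk: bytes, transcript: bytes) -> bytes:
    # Stand-in for the PSK-keyed verification value visible to an
    # eavesdropper; NOT the actual TLS-PSK computation.
    return hmac.new(psk, transcript, hashlib.sha256).digest()


def offline_dictionary_attack(transcript: bytes, observed_tag: bytes,
                              wordlist: Iterable[str]) -> Optional[str]:
    # Every guess is checked offline, so online rate limiting by peers
    # cannot slow the attacker down.
    for guess in wordlist:
        candidate = handshake_tag(psk_from_password(guess), transcript)
        if hmac.compare_digest(candidate, observed_tag):
            return guess
    return None
```

   SRP is designed so that a recorded exchange does not give the
   eavesdropper a value against which guesses can be tested in this way.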

5019	15.5.  Storage Security

5021	   When certificate-based security is used in RELOAD, any given
5022	   Resource-ID/kind-id pair (a slot) is bound to some small set of
5023	   certificates.  In order to write data in a slot, the writer must
5024	   prove possession of the private key for one of those certificates.
5025	   Moreover, all data is stored signed by the certificate which
5026	   authorized its storage.  This set of rules makes questions of
5027	   authorization and data integrity - which have historically been
5028	   thorny for overlays - relatively simple.

5030	   When shared-secret security is used, then all peers trust all other
5031	   peers, provided that they have demonstrated that they have the
5032	   credentials to join the overlay at all.  The following text therefore
5033	   applies only to certificate-based security.

5035	15.5.1.  Authorization

5037	   When a client wants to store some value in a slot, it first digitally
5038	   signs the value with its own private key.  It then sends a Store
5039	   request that contains both the value and the signature towards the
5040	   storing peer (which is defined by the Resource Name construction
5041	   algorithm for that particular kind of value).

5043	   When the storing peer receives the request, it must determine whether
5044	   the storing client is authorized to store in this slot.  In order to
5045	   do so, it executes the Resource Name construction algorithm for the
5046	   specified kind based on the user's certificate information.  It then
5047	   computes the Resource-ID from the Resource Name and verifies that it
5048	   matches the slot which the user is requesting to write to.  If it
5049	   does, the user is authorized to write to this slot, pending quota
5050	   checks as described in the next section.

5052	   For example, consider the certificate with the following properties:

5054	           User name: alice@dht.example.com
5055	           Node-ID:   013456789abcdef
5056	           Serial:    1234

5058	   If Alice wishes to Store a value of the "SIP Location" kind, the
5059	   Resource Name will be the SIP AOR "sip:alice@dht.example.com".  The
5060	   Resource-ID will be determined by hashing the Resource Name.  When a
5061	   peer receives a request to store a record at Resource-ID X, it takes
   the signing certificate and recomputes the Resource Name, in this
   case "sip:alice@dht.example.com".  If
   H("sip:alice@dht.example.com")=X, then the Store is authorized;
   otherwise it is not.  Note that the Resource Name construction
   algorithm may be different for other kinds.
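   The authorization check for the SIP Location example can be sketched
   as follows.  This assumes, for illustration only, that Resource-IDs
   are the first 128 bits of SHA-1 over the Resource Name; the real hash
   is fixed by the overlay algorithm.  The Resource Name construction
   function shown is hypothetical and applies only to this kind.

```python
import hashlib


def resource_id(resource_name: str) -> bytes:
    # Assumption: 128-bit Resource-IDs taken from a truncated SHA-1 digest.
    return hashlib.sha1(resource_name.encode("utf-8")).digest()[:16]


def sip_location_resource_name(cert_user_name: str) -> str:
    # Hypothetical construction for the "SIP Location" kind: the AOR is
    # derived from the user name in the signing certificate.
    return "sip:" + cert_user_name


def store_authorized(cert_user_name: str, requested_id: bytes) -> bool:
    """Recompute the Resource-ID from the certificate and compare it
    with the Resource-ID the Store request targets."""
    expected = resource_id(sip_location_resource_name(cert_user_name))
    return expected == requested_id
```

   A Store from Alice's certificate at resource_id("sip:alice@dht.
   example.com") passes this check; the same request signed with Bob's
   certificate does not.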

5068	15.5.2.  Distributed Quota

   Being a peer in an Overlay Instance carries with it the responsibility
5071	   to store data for a given region of the Overlay Instance.  However,
5072	   if clients were allowed to store unlimited amounts of data, this
5073	   would create unacceptable burdens on peers, as well as enabling
5074	   trivial denial of service attacks.  RELOAD addresses this issue by
5075	   requiring each usage to define maximum sizes for each kind of stored
5076	   data.  Attempts to store values exceeding this size MUST be rejected
5077	   (if peers are inconsistent about this, then strange artifacts will
5078	   happen when the zone of responsibility shifts and a different peer
5079	   becomes responsible for overlarge data).  Because each slot is bound
5080	   to a small set of certificates, these size restrictions also create a
5081	   distributed quota mechanism, with the quotas administered by the
5082	   central enrollment server.

5084	   Allowing different kinds of data to have different size restrictions
5085	   allows new usages the flexibility to define limits that fit their
5086	   needs without requiring all usages to have expansive limits.
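   A minimal sketch of the per-kind size check follows.  The limit
   values and the policy of rejecting unknown kinds are assumptions made
   for illustration; real limits are defined by each usage.  The second
   function shows why the size limits act as a distributed quota: a
   user's footprint for a kind is bounded by the number of slots it may
   write times the maximum size.

```python
# Hypothetical per-kind maximum value sizes in bytes (illustrative only).
MAX_VALUE_SIZE = {
    1: 2048,   # SIP-REGISTRATION
    3: 8192,   # CERTIFICATE
}


def store_size_ok(kind_id: int, value: bytes) -> bool:
    """Return True if a value of this kind may be stored.

    Every peer must apply the same table, so that data stays acceptable
    when responsibility for a region moves to a different peer.
    Rejecting unknown kinds is an assumption of this sketch.
    """
    limit = MAX_VALUE_SIZE.get(kind_id)
    return limit is not None and len(value) <= limit


def worst_case_quota(slots_per_user: int, kind_id: int) -> int:
    # Each slot is bound to a small set of certificates, so a user's total
    # storage for a kind is at most (writable slots) x (max value size).
    return slots_per_user * MAX_VALUE_SIZE[kind_id]
```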

5088	15.5.3.  Correctness

5090	   Because each stored value is signed, it is trivial for any retrieving
5091	   peer to verify the integrity of the stored value.  Some more care
5092	   needs to be taken to prevent version rollback attacks.  Rollback
5093	   attacks on storage are prevented by the use of store times and
5094	   lifetime values in each store.  A lifetime represents the latest time
5095	   at which the data is valid and thus limits (though does not
5096	   completely prevent) the ability of the storing node to perform a
5097	   rollback attack on retrievers.  In order to prevent a rollback attack
5098	   at the time of the Store request, we require that storage times be
5099	   monotonically increasing.  Storing peers MUST reject Store requests
5100	   with storage times smaller than or equal to those they are currently
5101	   storing.  In addition, a fetching node which receives a data value
5102	   with a storage time older than the result of the previous fetch knows
5103	   a rollback has occurred.
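   The two rollback defenses above (monotonic storage times at the
   storing peer, and comparison with earlier fetches at the retriever)
   can be sketched as below.  Signature verification and lifetime
   expiry are deliberately omitted, and the class and field names are
   hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Slot = Tuple[bytes, int]  # (Resource-ID, Kind-Id)


@dataclass
class StoredValue:
    storage_time: int  # assigned by the writer; must increase per Store
    value: bytes


class StoringPeer:
    def __init__(self) -> None:
        self.slots: Dict[Slot, StoredValue] = {}

    def store(self, slot: Slot, new: StoredValue) -> bool:
        current = self.slots.get(slot)
        # Reject storage times smaller than or equal to the current one.
        if current is not None and new.storage_time <= current.storage_time:
            return False
        self.slots[slot] = new
        return True


class FetchingNode:
    def __init__(self) -> None:
        self.last_seen: Dict[Slot, int] = {}

    def rollback_detected(self, slot: Slot, fetched: StoredValue) -> bool:
        previous = self.last_seen.get(slot)
        if previous is not None and fetched.storage_time < previous:
            return True  # older than what an earlier Fetch returned
        self.last_seen[slot] = fetched.storage_time
        return False
```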

5105	15.5.4.  Residual Attacks

5107	   The mechanisms described here provide a high degree of security, but
5108	   some attacks remain possible.  Most simply, it is possible for
5109	   storing nodes to refuse to store a value (i.e., reject any request).
5110	   In addition, a storing node can deny knowledge of values which it
5111	   previously accepted.  To some extent these attacks can be ameliorated
5112	   by attempting to store to/retrieve from replicas, but a retrieving
5113	   client does not know whether it should try this or not, since there
5114	   is a cost to doing so.

   The certificate-based authentication scheme prevents a single peer
   from forging data owned by other peers.  Furthermore, although a
   subversive peer can refuse to return data resources for which it is
   responsible, it cannot return forged data because it cannot provide
   authentication for such registrations.  Therefore, parallel searches
   for redundant registrations can mitigate most of the effects of a
   compromised peer.  The ultimate reliability of such an overlay is a
   statistical question based on the replication factor and the
   percentage of compromised peers.
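   As a rough illustration of that statistical question: because a
   compromised replica can withhold but not forge data, a parallel
   search fails only when every replica is compromised.  Under the
   simplifying assumption that each of r replicas is compromised
   independently with probability f, that probability is f^r.

```python
def p_all_replicas_compromised(fraction_compromised: float,
                               replicas: int) -> float:
    """Probability that every replica of a registration is held by a
    compromised peer, assuming independent compromise (a simplification
    that ignores correlated placement of attacker peers)."""
    if not 0.0 <= fraction_compromised <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= fraction <= 1 and replicas >= 1")
    return fraction_compromised ** replicas
```

   For example, with 10% of peers compromised and three replicas, this
   model gives roughly a 0.1% chance that no authentic copy is reachable.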

   In addition, when a kind is multivalued (e.g., an array data
5127	   model), the storing node can return only some subset of the values,
5128	   thus biasing its responses.  This can be countered by using single
5129	   values rather than sets, but that makes coordination between multiple
5130	   storing agents much more difficult.  This is a tradeoff that must be
5131	   made when designing any usage.

5133	15.6.  Routing Security

5135	   Because the storage security system guarantees (within limits) the
5136	   integrity of the stored data, routing security focuses on stopping
   the attacker from performing a DoS attack on the system by misrouting
   requests in the overlay.  There are a few obvious observations to
   make about this.  First, it is easy to ensure that an attacker is at
   least a valid peer in the Overlay Instance.  Second, this is a DoS
5141	   attack only.  Third, if a large percentage of the peers on the
5142	   Overlay Instance are controlled by the attacker, it is probably
5143	   impossible to perfectly secure against this.

5145	15.6.1.  Background

   In general, attacks on DHT routing are mounted by the attacker
   arranging to route traffic through one or two nodes it controls.  In
   the Eclipse attack [Eclipse], the attacker tampers with messages to
   and from nodes for which it is on-path with respect to a given victim
   node.  This allows it to pretend to be all the nodes that are
   reachable through it.  In the Sybil attack [Sybil], the attacker
   registers a large number of nodes and is therefore able to capture a
   large amount of the traffic through the DHT.

5156	   Both the Eclipse and Sybil attacks require the attacker to be able to
5157	   exercise control over her peer IDs.  The Sybil attack requires the
5158	   creation of a large number of peers.  The Eclipse attack requires
5159	   that the attacker be able to impersonate specific peers.  In both
5160	   cases, these attacks are limited by the use of centralized,
5161	   certificate-based admission control.

5163	15.6.2.  Admissions Control

   Admission to a RELOAD Overlay Instance is controlled by requiring
5166	   that each peer have a certificate containing its peer ID.  The
5167	   requirement to have a certificate is enforced by using certificate-
5168	   based mutual authentication on each connection.  Thus, whenever a
5169	   peer connects to another peer, each side automatically checks that
5170	   the other has a suitable certificate.  These peer IDs are randomly
5171	   assigned by the central enrollment server.  This has two benefits:

5173	   o  It allows the enrollment server to limit the number of peer IDs
5174	      issued to any individual user.
5175	   o  It prevents the attacker from choosing specific peer IDs.
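   Both properties can be sketched in a hypothetical enrollment server.
   The 128-bit Node-ID size and the cap of two Node-IDs per user are
   assumptions for illustration, and time-based rate limiting is not
   modeled.

```python
import secrets
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set

NODE_ID_BITS = 128  # assumption: 128-bit Node-IDs


class EnrollmentServer:
    """Hypothetical server that issues random Node-IDs with a per-user cap."""

    def __init__(self, max_ids_per_user: int = 2) -> None:  # illustrative cap
        self.max_ids_per_user = max_ids_per_user
        self.issued: DefaultDict[str, List[int]] = defaultdict(list)
        self.all_ids: Set[int] = set()

    def enroll(self, user_name: str) -> Optional[int]:
        # Property 1: limit the Node-IDs issued to any one user (Sybil).
        if len(self.issued[user_name]) >= self.max_ids_per_user:
            return None
        # Property 2: the server picks the Node-ID from a CSPRNG, so the
        # requester cannot choose its position in the overlay (Eclipse).
        while True:
            node_id = secrets.randbits(NODE_ID_BITS)
            if node_id not in self.all_ids:
                break
        self.all_ids.add(node_id)
        self.issued[user_name].append(node_id)
        return node_id
```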

5177	   The first property allows protection against Sybil attacks (provided
5178	   the enrollment server uses strict rate limiting policies).  The
5179	   second property deters but does not completely prevent Eclipse
5180	   attacks.  Because an Eclipse attacker must impersonate peers on the
5181	   other side of the attacker, he must have a certificate for suitable
5182	   peer IDs, which requires him to repeatedly query the enrollment
   server for new certificates, which will match only by chance.  From
5184	   the attacker's perspective, the difficulty is that if he only has a
5185	   small number of certificates the region of the Overlay Instance he is
5186	   impersonating appears to be very sparsely populated by comparison to
5187	   the victim's local region.
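   The cost of "matching by chance" can be estimated.  If each issued
   Node-ID is uniform and independent, and the attacker needs one inside
   a target region covering a fraction w of the ID space, the number of
   enrollments follows a geometric distribution with mean 1/w.  In an
   overlay of N peers with roughly uniform IDs, the gap next to a given
   victim is about 1/N of the space.  The helper names are hypothetical.

```python
import math


def expected_enrollments(region_fraction: float) -> float:
    """Mean number of random Node-IDs needed before one lands in the
    target region (geometric distribution)."""
    if not 0.0 < region_fraction <= 1.0:
        raise ValueError("region_fraction must be in (0, 1]")
    return 1.0 / region_fraction


def enrollments_for_confidence(region_fraction: float,
                               confidence: float) -> int:
    """Smallest k such that 1 - (1 - w)**k >= confidence."""
    if not 0.0 < region_fraction <= 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < w <= 1 and 0 < confidence < 1")
    if region_fraction == 1.0:
        return 1
    return math.ceil(math.log1p(-confidence) / math.log1p(-region_fraction))
```

   With N = 10,000, an attacker expects about 10,000 enrollments and
   needs roughly 6,900 for even odds, which strict rate limiting at the
   enrollment server makes expensive.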

5189	15.6.3.  Peer Identification and Authentication

5191	   In general, whenever a peer engages in overlay activity that might
5192	   affect the routing table it must establish its identity.  This
5193	   happens in two ways.  First, whenever a peer establishes a direct
5194	   connection to another peer it authenticates via certificate-based
5195	   mutual authentication.  All messages between peers are sent over this
5196	   protected channel and therefore the peers can verify the data origin
5197	   of the last hop peer for requests and responses without further
5198	   cryptography.

5200	   In some situations, however, it is desirable to be able to establish
5201	   the identity of a peer with whom one is not directly connected.  The
5202	   most natural case is when a peer Updates its state.  At this point,
5203	   other peers may need to update their view of the overlay structure,
5204	   but they need to verify that the Update message came from the actual
   peer rather than from an attacker.  To prevent such forgery, all
   overlay routing messages are signed by the peer that generated them.

5208	   [OPEN ISSUE:  this allows for replay attacks on requests.  There are
5209	   two basic defenses here.  The first is global clocks and loose anti-
5210	   replay.  The second is to refuse to take any action unless you verify
5211	   the data with the relevant node.  This issue is undecided.]
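   The first defense listed in the open issue (loosely synchronized
   clocks with loose anti-replay) might look like the sketch below.
   This only illustrates one option under discussion; the per-message
   timestamp, the 30-second window, and the use of (sender, transaction
   id) as the replay key are all assumptions, not part of RELOAD.

```python
import time
from typing import Callable, Dict, Tuple


class LooseAntiReplay:
    """Accept a signed message only if its timestamp is within `window`
    seconds of local time and its (sender, transaction id) pair has not
    already been accepted inside that window."""

    def __init__(self, window: float = 30.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.window = window
        self.clock = clock
        self.seen: Dict[Tuple[bytes, int], float] = {}

    def accept(self, sender: bytes, transaction_id: int,
               timestamp: float) -> bool:
        now = self.clock()
        # Forget entries whose timestamps have left the window; a replay
        # of one of them now fails the freshness check below.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t <= self.window}
        if abs(now - timestamp) > self.window:
            return False  # stale, or too far in the future
        key = (sender, transaction_id)
        if key in self.seen:
            return False  # duplicate inside the window: a replay
        self.seen[key] = timestamp
        return True
```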

5213	   [TODO:  I think we are probably going to end up with generic
5214	   signatures or at least optional signatures on all overlay messages.]

5216	15.6.4.  Protecting the Signaling

5218	   The goal here is to stop an attacker from knowing who is signaling
5219	   what to whom.  An attacker being able to observe the activities of a
   specific individual is unlikely given the randomization of IDs and
   routing based on the peers currently present, as discussed above.
   Furthermore, because messages can be routed using only the header
   information, the actual body of the RELOAD message can be encrypted
   during transmission.

5226	   There are two lines of defense here.  The first is the use of TLS or
5227	   DTLS for each communications link between peers.  This provides
5228	   protection against attackers who are not members of the overlay.  The
5229	   second line of defense, if certificate-based security is used, is to
5230	   digitally sign each message.  This prevents adversarial peers from
5231	   modifying messages in flight, even if they are on the routing path.

5233	15.6.5.  Residual Attacks

5235	   The routing security mechanisms in RELOAD are designed to contain
5236	   rather than eliminate attacks on routing.  It is still possible for
5237	   an attacker to mount a variety of attacks.  In particular, if an
5238	   attacker is able to take up a position on the overlay routing between
5239	   A and B it can make it appear as if B does not exist or is
   disconnected.  It can also advertise false network metrics in an
   attempt to reroute traffic.  However, these are primarily DoS attacks.

5243	   The certificate-based security scheme secures the namespace, but if
5244	   an individual peer is compromised or if an attacker obtains a
5245	   certificate from the CA, then a number of subversive peers can still
5246	   appear in the overlay.  While these peers cannot falsify responses to
5247	   resource queries, they can respond with error messages, effecting a
5248	   DoS attack on the resource registration.  They can also subvert
5249	   routing to other compromised peers.  To defend against such attacks,
5250	   a resource search must still consist of parallel searches for
5251	   replicated registrations.
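   A parallel search over replicas can be sketched as follows: query
   every replica concurrently, ignore errors and unverifiable answers,
   and return the first value whose signature checks out.  The `fetch`
   and `verify` callables are hypothetical placeholders for a RELOAD
   Fetch and for signature verification against the owner's
   certificate.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional


def parallel_fetch(replicas: Iterable[str],
                   fetch: Callable[[str], Optional[bytes]],
                   verify: Callable[[bytes], bool]) -> Optional[bytes]:
    """Return the first validly signed value from any replica, or None."""
    targets = list(replicas)
    if not targets:
        return None
    # Leaving the with-block waits for outstanding queries to finish.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(fetch, r) for r in targets]
        for fut in as_completed(futures):
            try:
                value = fut.result()
            except Exception:
                continue  # a subversive peer may answer with an error
            if value is not None and verify(value):
                return value  # forged data fails verification
    return None
```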

5253	15.7.  SIP-Specific Issues

5255	15.7.1.  Fork Explosion

5257	   Because SIP includes a forking capability (the ability to retarget to
5258	   multiple recipients), fork bombs are a potential DoS concern.

5260	   However, in the SIP usage of RELOAD, fork bombs are a much lower
5261	   concern because the calling party is involved in each retargeting
5262	   event and can therefore directly measure the number of forks and
5263	   throttle at some reasonable number.
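   Because the caller sees each retargeting event, the throttle can be
   a simple caller-side budget.  The sketch below is illustrative; the
   limit of 10 and the rule of refusing already-attempted targets (to
   break retargeting loops) are assumptions.

```python
from typing import Set


class ForkThrottle:
    """Caller-side budget on the number of distinct retargeting targets."""

    def __init__(self, max_forks: int = 10) -> None:  # hypothetical limit
        self.max_forks = max_forks
        self.attempted: Set[str] = set()

    def allow(self, target: str) -> bool:
        if target in self.attempted:
            return False  # already tried: avoid retargeting loops
        if len(self.attempted) >= self.max_forks:
            return False  # budget exhausted
        self.attempted.add(target)
        return True
```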

5265	15.7.2.  Malicious Retargeting

5267	   Another potential DoS attack is for the owner of an attractive number
5268	   to retarget all calls to some victim.  This attack is difficult to
5269	   ameliorate without requiring the target of a SIP registration to
5270	   authorize all stores.  The overhead of that requirement would be
5271	   excessive and in addition there are good use cases for retargeting to
   a peer without their explicit cooperation.

5274	15.7.3.  Privacy Issues

5276	   All RELOAD SIP registration data is public.  Methods of providing
5277	   location and identity privacy are still being studied.

5279	16.  IANA Considerations

5281	   This section contains the new code points registered by this
5282	   document.  The IANA policies are TBD.

5284	16.1.  Overlay Algorithm Types

5286	   IANA SHALL create/(has created) a "RELOAD Overlay Algorithm Type"
5287	   Registry.  Entries in this registry are strings denoting the names of
5288	   overlay algorithms.  The registration policy for this registry is
5289	   TBD.

5291	   The initial contents of this registry are:

5293	   chord-128-2-16+
5294	      The algorithm defined in Section 12 of this document.

5296	16.2.  Data Kind-Id

5298	   IANA SHALL create/(has created) a "RELOAD Data Kind-Id" Registry.
5299	   Entries in this registry are 32-bit integers denoting data kinds, as
5300	   described in Section 4.1.2.  The registration policy for this
5301	   registry is TBD.

5303	   The initial contents of this registry are:

5305	                     +--------------------+---------+
5306	                     | Kind               | Kind-Id |
5307	                     +--------------------+---------+
5308	                     | SIP-REGISTRATION   |       1 |
5309	                     | TURN_SERVICE       |       2 |
5310	                     | CERTIFICATE        |       3 |
5311	                     | ROUTING_TABLE_SIZE |       4 |
5312	                     | SOFTWARE_VERSION   |       5 |
5313	                     | MACHINE_UPTIME     |       6 |
5314	                     | APP_UPTIME         |       7 |
5315	                     | MEMORY_FOOTPRINT   |       8 |
                     | DATASIZE_STORED    |       9 |
                     | INSTANCES_STORED   |      10 |
5318	                     | MESSAGES_SENT_RCVD |      11 |
5319	                     | EWMA_BYTES_SENT    |      12 |
5320	                     | EWMA_BYTES_RCVD    |      13 |
5321	                     | LAST_CONTACT       |      14 |
5322	                     | RTT                |      15 |
5323	                     +--------------------+---------+

5325	16.3.  Data Model

5327	   IANA SHALL create/(has created) a "RELOAD Data Model" Registry.
5328	   Entries in this registry are 8-bit integers denoting data models, as
5329	   described in Section 7.2.  The registration policy for this registry
5330	   is TBD.

5332	                       +--------------+------------+
5333	                       | Data Model   | Identifier |
5334	                       +--------------+------------+
5335	                       | SINGLE_VALUE |          1 |
5336	                       | ARRAY        |          2 |
5337	                       | DICTIONARY   |          3 |
5338	                       +--------------+------------+

5340	16.4.  Message Codes

5342	   IANA SHALL create/(has created) a "RELOAD Message Code" Registry.
   Entries in this registry are 16-bit integers denoting message codes,
   as described in Section 6.2.3.  The registration policy for this
   registry is TBD.

5347	   The initial contents of this registry are:

5349	                  +-------------------+----------------+
5350	                  | Message Code Name |     Code Value |
5351	                  +-------------------+----------------+
5352	                  | reserved          |              0 |
5353	                  | ping_req          |              1 |
5354	                  | ping_ans          |              2 |
5355	                  | connect_req       |              3 |
5356	                  | connect_ans       |              4 |
5357	                  | tunnel_req        |              5 |
5358	                  | tunnel_ans        |              6 |
5359	                  | store_req         |              7 |
5360	                  | store_ans         |              8 |
5361	                  | fetch_req         |              9 |
5362	                  | fetch_ans         |             10 |
5363	                  | remove_req        |             11 |
5364	                  | remove_ans        |             12 |
5365	                  | find_req          |             13 |
5366	                  | find_ans          |             14 |
5367	                  | join_req          |             15 |
5368	                  | join_ans          |             16 |
5369	                  | leave_req         |             17 |
5370	                  | leave_ans         |             18 |
5371	                  | update_req        |             19 |
5372	                  | update_ans        |             20 |
5373	                  | route_query_req   |             21 |
5374	                  | route_query_ans   |             22 |
5375	                  | reserved          | 0x8000..0xfffe |
5376	                  | error             |         0xffff |
5377	                  +-------------------+----------------+
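   The initial registry contents can be mirrored in code.  In this
   table every request code is odd and its answer is the next value;
   the helper below relies on that observed pattern, which is an
   inference from the table rather than a stated rule, and excludes
   the (also odd) error code.

```python
from enum import IntEnum


class MessageCode(IntEnum):
    """Initial RELOAD Message Code registry contents (0 is reserved)."""
    PING_REQ = 1
    PING_ANS = 2
    CONNECT_REQ = 3
    CONNECT_ANS = 4
    TUNNEL_REQ = 5
    TUNNEL_ANS = 6
    STORE_REQ = 7
    STORE_ANS = 8
    FETCH_REQ = 9
    FETCH_ANS = 10
    REMOVE_REQ = 11
    REMOVE_ANS = 12
    FIND_REQ = 13
    FIND_ANS = 14
    JOIN_REQ = 15
    JOIN_ANS = 16
    LEAVE_REQ = 17
    LEAVE_ANS = 18
    UPDATE_REQ = 19
    UPDATE_ANS = 20
    ROUTE_QUERY_REQ = 21
    ROUTE_QUERY_ANS = 22
    ERROR = 0xFFFF


def answer_for(code: MessageCode) -> MessageCode:
    # Observed pattern: requests are odd, answers are request + 1.
    if code is MessageCode.ERROR or code % 2 == 0:
        raise ValueError(f"{code.name} is not a request code")
    return MessageCode(code + 1)


def is_reserved(value: int) -> bool:
    # 0 and 0x8000..0xfffe are reserved in the initial registry.
    return value == 0 or 0x8000 <= value <= 0xFFFE
```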

5379	   [[TODO - add IANA registration for p2p_enroll SRV and p2p_menroll]]

5381	16.5.  Error Codes

5383	   IANA SHALL create/(has created) a "RELOAD Error Code" Registry.
5384	   Entries in this registry are 16-bit integers denoting error codes.
   [[TODO:  Complete this once we decide on error code strategy.]]

5387	16.6.  Route Log Extension Types

   IANA SHALL create/(has created) a "RELOAD Route Log Extension Type"
   Registry.  This registry is currently empty.

5392	16.7.  reload: URI Scheme

5394	   This section describes the scheme for a reload:  URI, which can be
5395	   used to refer to either:

5397	   o  A peer.
5398	   o  A resource inside a peer.

   The reload:  URI is defined using a subset of the URI syntax
   specified in Appendix A of RFC 3986 [REF] and the associated URI
   Guidelines [REF:  RFC4395] per the following ABNF syntax:

5404	             RELOAD-URI = "reload://" destination "@" overlay "/"
5405	                          [specifier]

             destination = 1*HEXDIG
5408	             overlay = reg-name
5409	             specifier = 1*HEXDIG

5411	   The definitions of these productions are as follows:
5412	   destination:    a hex-encoded Destination List object.

5414	   overlay:    the name of the overlay.

   specifier:  a hex-encoded StoredDataSpecifier indicating the data
5417	      element.

   If no specifier is present, then this URI addresses the peer that
   can be reached via the indicated destination list at the indicated
   overlay name.  If a specifier is present, then the URI addresses the
   data value.
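   A parser for the ABNF above might look like the following sketch.
   The overlay production follows the RFC 3986 reg-name characters
   (unreserved, pct-encoded, sub-delims).  Requiring an even number of
   hex digits, so that each field decodes to whole octets, and matching
   only the lowercase scheme name are simplifying assumptions.  The
   names ReloadUri and parse_reload_uri are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_RELOAD_URI = re.compile(
    r"reload://(?P<destination>[0-9A-Fa-f]+)"
    r"@(?P<overlay>(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*)"
    r"/(?P<specifier>[0-9A-Fa-f]*)"
)


class ReloadUri(NamedTuple):
    destination: bytes          # encoded Destination List object
    overlay: str                # overlay name (reg-name)
    specifier: Optional[bytes]  # encoded StoredDataSpecifier, if present


def parse_reload_uri(uri: str) -> ReloadUri:
    m = _RELOAD_URI.fullmatch(uri)
    if m is None:
        raise ValueError(f"not a reload: URI: {uri!r}")
    spec = m.group("specifier")
    # bytes.fromhex raises ValueError on an odd number of hex digits.
    return ReloadUri(
        destination=bytes.fromhex(m.group("destination")),
        overlay=m.group("overlay"),
        specifier=bytes.fromhex(spec) if spec else None,
    )
```

   A URI with an empty specifier addresses the peer; a non-empty
   specifier addresses the data value.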

5424	16.7.1.  URI Registration

5426	   The following summarizes the information necessary to register the
5427	   reload:  URI.  [NOTE TO IANA/RFC-EDITOR:  Please replace XXXX with
5428	   the RFC number for this specification in the following list.]

5430	   URI Scheme Name:    reload
5431	   Status:    permanent
5432	   URI Scheme Syntax:    see Section 16.7.
5433	   URI Scheme Semantics:    The reload:  URI is intended to be used as a
5434	      reference to a RELOAD peer or resource.
   Encoding Considerations:    The reload:  URI is not intended to be
      human-readable text; therefore, it is encoded entirely in
      US-ASCII.
5438	   Applications/protocols that use this URI scheme:    The RELOAD
5439	      protocol described in RFC XXXX.

5441	      TBD for the rest of this template.

5443	17.  Acknowledgments

5445	   This draft is a merge of the "REsource LOcation And Discovery
5446	   (RELOAD)" draft by David A. Bryan, Marcia Zangrilli and Bruce B.
5447	   Lowekamp, the "Address Settlement by Peer to Peer" draft by Cullen
5448	   Jennings, Jonathan Rosenberg, and Eric Rescorla, the "Security
5449	   Extensions for RELOAD" draft by Bruce B. Lowekamp and James Deverick,
   the "A Chord-based DHT for Resource Lookup in P2PSIP" draft by Marcia
   Zangrilli and David A. Bryan, and the Peer-to-Peer Protocol (P2PP)
5452	   draft by Salman A. Baset, Henning Schulzrinne, and Marcin
5453	   Matuszewski.

5455	   Thanks to the many people who contributed including:  Michael Chen,
5456	   TODO - fill in.

5458	18.  References

5460	18.1.  Normative References

5462	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
5463	              Requirement Levels", BCP 14, RFC 2119, March 1997.

5465	   [I-D.ietf-mmusic-ice]
5466	              Rosenberg, J., "Interactive Connectivity Establishment
5467	              (ICE): A Protocol for Network Address Translator (NAT)
5468	              Traversal for Offer/Answer Protocols",
5469	              draft-ietf-mmusic-ice-16 (work in progress), June 2007.

5471	   [I-D.ietf-behave-rfc3489bis]
5472	              Rosenberg, J., "Session Traversal Utilities for (NAT)
5473	              (STUN)", draft-ietf-behave-rfc3489bis-06 (work in
5474	              progress), March 2007.

5476	   [I-D.ietf-behave-turn]
5477	              Rosenberg, J., "Obtaining Relay Addresses from Simple
5478	              Traversal Underneath NAT (STUN)",
5479	              draft-ietf-behave-turn-03 (work in progress), March 2007.

5481	   [I-D.ietf-pkix-cmc-trans]
5482	              Schaad, J. and M. Myers, "Certificate Management over CMS
5483	              (CMC) Transport Protocols", draft-ietf-pkix-cmc-trans-05
5484	              (work in progress), May 2006.

5486	   [I-D.ietf-pkix-2797-bis]
5487	              Myers, M. and J. Schaad, "Certificate Management Messages
5488	              over CMS", draft-ietf-pkix-2797-bis-04 (work in progress),
5489	              March 2006.

5491	   [RFC4279]  Eronen, P. and H. Tschofenig, "Pre-Shared Key Ciphersuites
5492	              for Transport Layer Security (TLS)", RFC 4279,
5493	              December 2005.

5495	   [I-D.ietf-tls-srp]
5496	              Taylor, D., "Using SRP for TLS Authentication",
5497	              draft-ietf-tls-srp-14 (work in progress), June 2007.

5499	   [I-D.ietf-mmusic-ice-tcp]
5500	              Rosenberg, J., "TCP Candidates with Interactive
              Connectivity Establishment (ICE)",
5502	              draft-ietf-mmusic-ice-tcp-03 (work in progress),
5503	              March 2007.

5505	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
5506	              A., and J. Peterson, "SIP: Session Initiation Protocol",
5507	              RFC 3261, June 2002.

5509	   [RFC3263]  Rosenberg, J. and H. Schulzrinne, "Session Initiation
5510	              Protocol (SIP): Locating SIP Servers", RFC 3263,
5511	              June 2002.

5513	   [RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer
5514	              Security", RFC 4347, April 2006.

5516	   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
5517	              (TFRC): The Small-Packet (SP) Variant", RFC 4828,
5518	              April 2007.

5520	18.2.  Informative References

5522	   [I-D.ietf-behave-tcp]
5523	              Guha, S., "NAT Behavioral Requirements for TCP",
5524	              draft-ietf-behave-tcp-07 (work in progress), April 2007.

5526	   [I-D.ietf-p2psip-concepts]
5527	              Bryan, D., "Concepts and Terminology for Peer to Peer
5528	              SIP", draft-ietf-p2psip-concepts-00 (work in progress),
5529	              July 2007.

5531	   [RFC4145]  Yon, D. and G. Camarillo, "TCP-Based Media Transport in
5532	              the Session Description Protocol (SDP)", RFC 4145,
5533	              September 2005.

5535	   [RFC4572]  Lennox, J., "Connection-Oriented Media Transport over the
5536	              Transport Layer Security (TLS) Protocol in the Session
5537	              Description Protocol (SDP)", RFC 4572, July 2006.

5539	   [RFC2617]  Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S.,
5540	              and P. Leach, "HTTP Authentication: Basic and Digest
5541	              Access Authentication", RFC 2617, June 1999.

5543	   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

5545	   [RFC4086]  Eastlake, D., Schiller, J., and S. Crocker, "Randomness
5546	              Requirements for Security", BCP 106, RFC 4086, June 2005.

5548	   [RFC3280]  Housley, R., Polk, W., Ford, W., and D. Solo, "Internet
5549	              X.509 Public Key Infrastructure Certificate and
5550	              Certificate Revocation List (CRL) Profile", RFC 3280,
5551	              April 2002.

5553	   [Sybil]    Douceur, J., "The Sybil Attack", IPTPS 02, March 2002.

   [Eclipse]  Singh, A., Ngan, T., Druschel, P., and D. Wallach,
5556	              "Eclipse Attacks on Overlay Networks: Threats and
5557	              Defenses", INFOCOM 2006, April 2006.

5559	   [I-D.cheshire-dnsext-multicastdns]
5560	              Cheshire, S. and M. Krochmal, "Multicast DNS",
5561	              draft-cheshire-dnsext-multicastdns-06 (work in progress),
5562	              August 2006.

5564	   [I-D.cheshire-dnsext-dns-sd]
5565	              Krochmal, M. and S. Cheshire, "DNS-Based Service
5566	              Discovery", draft-cheshire-dnsext-dns-sd-04 (work in
5567	              progress), August 2006.

5569	   [I-D.matthews-p2psip-bootstrap-mechanisms]
5570	              Cooper, E., "Bootstrap Mechanisms for P2PSIP",
5571	              draft-matthews-p2psip-bootstrap-mechanisms-00 (work in
5572	              progress), February 2007.

5574	   [I-D.garcia-p2psip-dns-sd-bootstrapping]
5575	              Garcia, G., "P2PSIP bootstrapping using DNS-SD",
5576	              draft-garcia-p2psip-dns-sd-bootstrapping-00 (work in
5577	              progress), October 2007.

5579	   [I-D.camarillo-hip-bone]
5580	              Camarillo, G., Nikander, P., and J. Hautakorpi, "HIP BONE:
5581	              Host Identity Protocol (HIP) Based Overlay Networking
5582	              Environment", draft-camarillo-hip-bone-00 (work in
5583	              progress), December 2007.

5585	   [I-D.pascual-p2psip-clients]
5586	              Pascual, V., Matuszewski, M., Shim, E., Zhang, H., and S.
5587	              Yongchao, "P2PSIP Clients",
5588	              draft-pascual-p2psip-clients-01 (work in progress),
5589	              February 2008.

5591	   [RFC4787]  Audet, F. and C. Jennings, "Network Address Translation
5592	              (NAT) Behavioral Requirements for Unicast UDP", BCP 127,
5593	              RFC 4787, January 2007.

5595	   [I-D.jiang-p2psip-sep]
5596	              Jiang, X. and H. Zhang, "Service Extensible P2P Peer
5597	              Protocol", draft-jiang-p2psip-sep-01 (work in progress),
5598	              February 2008.

5600	   [stoica-non-transitive-worlds05]
5601	              Freedman, M., Lakshminarayanan, K., Rhea, S., and I.
5602	              Stoica, "Non-Transitive Connectivity and DHTs",
5603	               WORLDS'05.

5605	   [stoica-geometry-sigcomm03]
5606	              Gummadi, K., Gummadi, R., Gribble, S., Ratnasamy, S.,
5607	              Shenker, S., and I. Stoica, "The Impact of DHT Routing
5608	              Geometry on Resilience and Proximity",  SIGCOMM'03.

5610	   [ng-analytical-churn-ieeep2p06]
5611	              Wu, D., Tian, Y., and K. Ng, "Analytical Study on
5612	              Improving DHT Lookup Performance under Churn",  IEEE
5613	              P2P'06.

5615	   [bryan-design-hotp2p08]
5616	              Bryan, D., Lowekamp, B., and M. Zangrilli, "The Design of
5617	              a Versatile, Secure P2PSIP Communications Architecture for
5618	              the Public Internet",  Hot-P2P'08.

5620	   [opendht-sigcomm05]
5621	              Rhea, S., Godfrey, B., Karp, B., Kubiatowicz, J.,
5622	              Ratnasamy, S., Shenker, S., Stoica, I., and H. Yu,
5623	              "OpenDHT: A Public DHT Service and its Uses", SIGCOMM'05.

5625	   [Chord]    Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.,
5626	              Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A
5627	              Scalable Peer-to-peer Lookup Protocol for Internet
5628	              Applications", IEEE/ACM Transactions on Networking Volume
5629	              11, Issue 1, 17-32, February 2003.

5631	   [vulnerabilities-acsac04]
5632	              Srivatsa, M. and L. Liu, "Vulnerabilities and Security
5633	              Threats in Structured Peer-to-Peer Systems: A Quantitative
5634	              Analysis", ACSAC 2004.

5636	Authors' Addresses

5638	   Cullen Jennings
5639	   Cisco
5640	   170 West Tasman Drive
5641	   MS: SJC-21/2
5642	   San Jose, CA  95134
5643	   USA

5645	   Phone:  +1 408 421-9990
5646	   Email:  fluffy@cisco.com

5648	   Bruce B. Lowekamp
5649	   SIPeerior Technologies
5650	   3000 Easter Circle
5651	   Williamsburg, VA  23188
5652	   USA

5654	   Phone:  +1 757 565 0101
5655	   Email:  lowekamp@sipeerior.com

5657	   Eric Rescorla
5658	   Network Resonance
5659	   2064 Edgewood Drive
5660	   Palo Alto, CA  94303
5661	   USA

5663	   Phone:  +1 650 320-8549
5664	   Email:  ekr@networkresonance.com

5666	   Salman A. Baset
5667	   Columbia University
5668	   1214 Amsterdam Avenue
5669	   New York, NY
5670	   USA

5672	   Email:  salman@cs.columbia.edu

5673	   Henning Schulzrinne
5674	   Columbia University
5675	   1214 Amsterdam Avenue
5676	   New York, NY
5677	   USA

5679	   Email:  hgs@cs.columbia.edu

5681	Full Copyright Statement

5683	   Copyright (C) The IETF Trust (2008).

5685	   This document is subject to the rights, licenses and restrictions
5686	   contained in BCP 78, and except as set forth therein, the authors
5687	   retain all their rights.

5689	   This document and the information contained herein are provided on an
5690	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
5691	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
5692	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
5693	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
5694	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
5695	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

5697	Intellectual Property

5699	   The IETF takes no position regarding the validity or scope of any
5700	   Intellectual Property Rights or other rights that might be claimed to
5701	   pertain to the implementation or use of the technology described in
5702	   this document or the extent to which any license under such rights
5703	   might or might not be available; nor does it represent that it has
5704	   made any independent effort to identify any such rights.  Information
5705	   on the procedures with respect to rights in RFC documents can be
5706	   found in BCP 78 and BCP 79.

5708	   Copies of IPR disclosures made to the IETF Secretariat and any
5709	   assurances of licenses to be made available, or the result of an
5710	   attempt made to obtain a general license or permission for the use of
5711	   such proprietary rights by implementers or users of this
5712	   specification can be obtained from the IETF on-line IPR repository at
5713	   http://www.ietf.org/ipr.

5715	   The IETF invites any interested party to bring to its attention any
5716	   copyrights, patents or patent applications, or other proprietary
5717	   rights that may cover technology that may be required to implement
5718	   this standard.  Please address the information to the IETF at
5719	   ietf-ipr@ietf.org.

5721	Acknowledgment

5723	   Funding for the RFC Editor function is provided by the IETF
5724	   Administrative Support Activity (IASA).