idnits 2.17.1 draft-irtf-samrg-sam-baseline-protocol-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 766 has weird spacing: '...tionary opti...' == Line 812 has weird spacing: '...ionInfo imp_i...' == Line 1371 has weird spacing: '... opaque alm_...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: Code points for the kinds defined in this document MUST not conflict with any defined code points for RELOAD. RELOAD defines exp_a_req, exp_a_ans for experimental purposes. This specification uses only these message types for all ALM messages. RELOAD defines the MessageContents data structure. The ALM mapping uses the fields as follows: -- The document date (January 29, 2013) is 4098 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '32' on line 808 == Unused Reference: 'AGU1984' is defined on line 1635, but no explicit reference was found in the text == Unused Reference: 'CASTRO2003' is defined on line 1647, but no explicit reference was found in the text == Unused Reference: 'HE2005' is defined on line 1655, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-mboned-auto-multicast' is defined on line 1662, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-p2psip-sip' is defined on line 1673, but no explicit reference was found in the text == Unused Reference: 'I-D.irtf-p2prg-rtc-security' is defined on line 1679, but no explicit reference was found in the text == Unused Reference: 'I-D.irtf-sam-hybrid-overlay-framework' is defined on line 1685, but no explicit reference was found in the text == Unused Reference: 'I-D.matuszewski-p2psip-security-overview' is defined on line 1696, but no explicit reference was found in the text == Unused Reference: 'RFC0792' is defined on line 1713, but no explicit reference was found in the text == Unused Reference: 'RFC1112' is defined on line 1716, but no explicit reference was found in the text == Unused Reference: 'RFC1930' is defined on line 1719, but no explicit reference was found in the text == Unused Reference: 'RFC3376' is defined on line 1726, but no explicit reference was found in the text == Unused Reference: 'RFC3552' is defined on line 1730, but no explicit reference was found in the text == Unused Reference: 'RFC3810' is defined on line 1734, but no explicit reference was found in the text == Unused Reference: 'RFC4286' is defined on line 1737, but no explicit reference was found in the text == Unused Reference: 'RFC4605' is defined on line 1740, but no explicit reference was found in the text == Unused Reference: 'RFC4607' is defined on line 1745, but no explicit reference was found in the text == Unused Reference: 'RFC5058' is defined on line 1748, but no explicit reference was found in the text == Unused Reference: 'SPLITSTREAM' is defined on line 1752, but no explicit reference was found in the text == Outdated reference: A later version (-18) exists of draft-ietf-mboned-auto-multicast-14 == Outdated reference: A later version (-26) exists of draft-ietf-p2psip-base-24 == Outdated reference: A later version (-21) exists of draft-ietf-p2psip-sip-08 == Outdated reference: A later version (-10) exists of draft-irtf-samrg-common-api-06 Summary: 0 errors (**), 0 flaws (~~), 28 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 SAM Research Group J. Buford 3 Internet-Draft Avaya Labs Research 4 Intended status: Informational M. Kolberg, Ed. 5 Expires: August 2, 2013 University of Stirling 6 January 29, 2013 8 Application Layer Multicast Extensions to RELOAD 9 draft-irtf-samrg-sam-baseline-protocol-02 11 Abstract 13 We define a RELOAD Usage for Application Layer Multicast as well as a 14 mapping to the RELOAD experimental message type to support ALM. The 15 ALM Usage is intended to support a variety of ALM control algorithms 16 in an overlay-independent way. Two example algorithms are defined, 17 based on Scribe and P2PCast. 19 Status of this Memo 21 This Internet-Draft is submitted in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF). Note that other groups may also distribute 26 working documents as Internet-Drafts. The list of current Internet- 27 Drafts is at http://datatracker.ietf.org/drafts/current/. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 This Internet-Draft will expire on August 2, 2013. 36 Copyright Notice 38 Copyright (c) 2013 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents 43 (http://trustee.ietf.org/license-info) in effect on the date of 44 publication of this document. Please review these documents 45 carefully, as they describe your rights and restrictions with respect 46 to this document. 48 Table of Contents 50 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 51 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 52 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 53 2.1. Overlay Network . . . . . . . . . . . . . . . . . . . . . 5 54 2.2. Overlay Multicast . . . . . . . . . . . . . . . . . . . . 5 55 2.3. Source Specific Multicast (SSM) . . . . . . . . . . . . . 5 56 2.4. Any Source Multicast (ASM) . . . . . . . . . . . . . . . . 6 57 2.5. Peer . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 58 3. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 6 59 3.1. Overlay . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 3.2. Overlay Multicast . . . . . . . . . . . . . . . . . . . . 6 61 3.3. RELOAD . . . . . . . . . . . . . . . . . . . . . . . . . . 7 62 3.4. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 63 3.5. Tree Topology . . . . . . . . . . . . . . . . . . . . . . 7 64 4. Architecture Extensions to RELOAD . . . . . . . . . . . . . . 7 65 5. RELOAD ALM Usage . . . . . . . . . . . . . . . . . . . . . . . 9 66 6. ALM Tree Control Signaling . . . . . . . . . . . . . . . . . . 9 67 7. ALM Messages Mapped to RELOAD . . . . . . . . . . . . . . . . 11 68 7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 11 69 7.2. Tree Lifecycle Messages . . . . . . . . . . . . . . . . . 12 70 7.2.1. Create Tree . . . . . . . . . . . . . . . . . . . . . 12 71 7.2.2. CreateTreeResponse . . . . . . . . . . . . . . . . . . 13 72 7.2.3. Join . . . . . . . . . . . . . . . . . . . . . . . . . 13 73 7.2.4. Join Accept (Join Response) . . . . . . . . . . . . . 14 74 7.2.5. Join Reject (Join Response) . . . . . . . . . . . . . 15 75 7.2.6. Join Confirm . . . . . . . . . . . . . . . . . . . . . 15 76 7.2.7. Join Confirm Response . . . . . . . . . . . . . . . . 16 77 7.2.8. Join Decline . . . . . . . . . . . . . . . . . . . . . 16 78 7.2.9. Join Decline Response . . . . . . . . . . . . . . . . 16 79 7.2.10. Leave . . . . . . . . . . . . . . . . . . . . . . . . 16 80 7.2.11. Leave Response . . . . . . . . . . . . . . . . . . . . 17 81 7.2.12. Re-Form or Optimize Tree . . . . . . . . . . . . . . . 17 82 7.2.13. Reform Response . . . . . . . . . . . . . . . . . . . 17 83 7.2.14. Heartbeat . . . . . . . . . . . . . . . . . . . . . . 17 84 7.2.15. Heartbeat Response . . . . . . . . . . . . . . . . . . 18 85 7.2.16. NodeQuery . . . . . . . . . . . . . . . . . . . . . . 18 86 7.2.17. NodeQuery Response . . . . . . . . . . . . . . . . . . 18 87 7.2.18. Push . . . . . . . . . . . . . . . . . . . . . . . . . 21 88 7.2.19. PushResponse . . . . . . . . . . . . . . . . . . . . . 21 89 8. Scribe Algorithm . . . . . . . . . . . . . . . . . . . . . . . 22 90 8.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 22 91 8.2. Create . . . . . . . . . . . . . . . . . . . . . . . . . . 23 92 8.3. Join . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 93 8.4. Leave . . . . . . . . . . . . . . . . . . . . . . . . . . 23 94 8.5. JoinConfirm . . . . . . . . . . . . . . . . . . . . . . . 24 95 8.6. JoinDecline . . . . . . . . . . . . . . . . . . . . . . . 24 96 8.7. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 24 97 9. P2PCast Algorithm . . . . . . . . . . . . . . . . . . . . . . 24 98 9.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 24 99 9.2. Message Mapping . . . . . . . . . . . . . . . . . . . . . 25 100 9.3. Create . . . . . . . . . . . . . . . . . . . . . . . . . . 26 101 9.4. Join . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 102 9.5. Leave . . . . . . . . . . . . . . . . . . . . . . . . . . 28 103 9.6. JoinConfirm . . . . . . . . . . . . . . . . . . . . . . . 28 104 9.7. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 28 105 10. ALMTree Kind . . . . . . . . . . . . . . . . . . . . . . . . . 28 106 11. Message Codes . . . . . . . . . . . . . . . . . . . . . . . . 29 107 11.1. ALMHeader Definition . . . . . . . . . . . . . . . . . . . 31 108 11.2. ALMMessageContents Definition . . . . . . . . . . . . . . 31 109 11.3. Response Codes . . . . . . . . . . . . . . . . . . . . . . 32 110 11.4. Algorithm Codes . . . . . . . . . . . . . . . . . . . . . 33 111 12. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 112 12.1. Create Tree . . . . . . . . . . . . . . . . . . . . . . . 33 113 12.2. Join Tree . . . . . . . . . . . . . . . . . . . . . . . . 34 114 12.3. Leave Tree . . . . . . . . . . . . . . . . . . . . . . . . 36 115 12.4. Push Data . . . . . . . . . . . . . . . . . . . . . . . . 36 116 13. Kind Definitions . . . . . . . . . . . . . . . . . . . . . . . 37 117 13.1. ALMTree Kind Definition . . . . . . . . . . . . . . . . . 37 118 14. RELOAD Configuration File Extensions . . . . . . . . . . . . . 38 119 15. Change History . . . . . . . . . . . . . . . . . . . . . . . . 38 120 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38 121 17. Security Considerations . . . . . . . . . . . . . . . . . . . 38 122 18. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 39 123 19. Informative References . . . . . . . . . . . . . . . . . . . . 39 124 Appendix A. Additional Stuff . . . . . . . . . . . . . . . . . . 41 125 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 41 127 1. Introduction 129 The concept of scalable adaptive multicast includes both scaling 130 properties and adaptability properties. Scalability is intended to 131 cover: 133 o large group size 135 o large numbers of small groups 137 o rate of group membership change 139 o admission control for QoS 141 o use with network layer QoS mechanisms 143 o varying degrees of reliability 145 o trees connect nodes over the global Internet 147 Adaptability includes 149 o use of different control mechanisms for different multicast trees 150 depending on initial application parameters or application classes 152 o changing multicast tree structure depending on changes in 153 application requirements, network conditions, and membership 155 Application Layer Multicast (ALM) has been demonstrated to be a 156 viable multicast technology where native multicast isn't available. 157 Many ALM designs have been proposed. This ALM Usage focuses on: 159 o ALM implemented in RELOAD-based overlays 161 o Support for a variety of ALM control algorithms 163 o Providing a basis for defining a separate hybrid-ALM RELOAD Usage 165 RELOAD [I-D.ietf-p2psip-base] has an application extension mechanism 166 in which a new type of application defines a Usage. A RELOAD Usage 167 defines a set of data types and rules for their use. In addition, 168 this document describes additional message types and a new ALM 169 algorithm plugin architectural component. 171 1.1. Requirements Language 173 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 174 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 175 document are to be interpreted as described in RFC 2119 [RFC2119]. 177 2. Definitions 179 We adopt the terminology defined in section 2 of 180 [I-D.ietf-p2psip-base], specifically the distinction between Node, 181 Peer, and Client. 183 2.1. Overlay Network 185 P P P P P 186 ..+....+....+...+.....+... 187 . +P 188 P+ . 189 . +P 190 ..+....+....+...+.....+... 191 P P P P P 193 Figure 1: Overlay Network Example 195 Overlay network - An application layer virtual or logical network in 196 which end points are addressable and that provides connectivity, 197 routing, and messaging between end points. Overlay networks are 198 frequently used as a substrate for deploying new network services, or 199 for providing a routing topology not available from the underlying 200 physical network. Many peer-to-peer systems are overlay networks 201 that run on top of the Internet. In Figure 1, "P" indicates overlay 202 peers, and peers are connected in a logical address space. The links 203 shown in the figure represent predecessor/successor links. Depending 204 on the overlay routing model, additional or different links may be 205 present. 207 2.2. Overlay Multicast 209 Overlay Multicast (OM): Hosts participating in a multicast session 210 form an overlay network and utilize unicast connections among pairs 211 of hosts for data dissemination. The hosts in overlay multicast 212 exclusively handle group management, routing, and tree construction, 213 without any support from Internet routers. This is also commonly 214 known as Application Layer Multicast (ALM) or End System Multicast 215 (ESM). We call systems which use proxies connected in an overlay 216 multicast backbone "proxied overlay multicast" or POM. 218 2.3. Source Specific Multicast (SSM) 220 SSM tree: The creator of the tree is the source. It sends data 221 messages to the tree root which are forwarded down the tree. 223 2.4. Any Source Multicast (ASM) 225 ASM tree: A node sending a data message sends the message to its 226 parent and its children. Each node receiving a data message from one 227 edge forwards it to remaining tree edges it is connected to. 229 2.5. Peer 231 Peer: an autonomous end system that is connected to the physical 232 network and participates in and contributes resources to overlay 233 construction, routing and maintenance. Some peers may also perform 234 additional roles such as connection relays, super nodes, NAT 235 traversal assistance, and data storage. 237 3. Assumptions 239 3.1. Overlay 241 Peers connect in a large-scale overlay, which may be used for a 242 variety of peer-to-peer applications in addition to multicast 243 sessions. Peers may assume additional roles in the overlay beyond 244 participation in the overlay and in multicast trees. We assume a 245 single structured overlay routing algorithm is used. Any of a 246 variety of multi-hop, one-hop, or variable-hop overlay algorithms 247 could be used. 249 Castro et al. [CASTRO2003]compared multi-hop overlays and found that 250 tree-based construction in a single overlay out-performed using 251 separate overlays for each multicast session. We use a single 252 overlay rather than separate overlays per multicast sessions. 254 An overlay multicast algorithm may leverage the overlay's mechanism 255 for maintaining overlay state in the face of churn. For example, a 256 peer may store a number of DHT (Distributed Hash Table) entries. 257 When the peer gracefully leaves the overlay, it transfers those 258 entries to the nearest peer. When another peer joins which is closer 259 to some of the entries than the current peer which holds those 260 entries, than those entries are migrated. Overlay churn affects 261 multicast trees as well; remedies include automatic migration of the 262 tree state and automatic re-join operations for dislocated children 263 nodes. 265 3.2. Overlay Multicast 267 The overlay supports concurrent multiple multicast trees. The limit 268 on number of concurrent trees depends on peer and network resources 269 and is not an intrinsic property of the overlay. 271 3.3. RELOAD 273 We use RELOAD [I-D.ietf-p2psip-base] as Peer-to-Peer overlay for data 274 storage and mechanism by which the peers interconnect and route 275 messages. RELOAD is a generic P2P overlay, and application support 276 is defined by profiles called Usages. 278 3.4. NAT 280 Some nodes in the overlay may be in a private address space and 281 behind firewalls. We use the RELOAD mechanisms for NAT traversal. 282 We permit clients to be leaf nodes in an ALM tree. 284 3.5. Tree Topology 286 All tree control messages are routed in the overlay. Two types of 287 data or media topologies are envisioned: 1) tree edges are paths in 288 the overlay, 2) tree edges are direct connections between a parent 289 and child peer in the tree, formed using the RELOAD AppAttach method. 291 4. Architecture Extensions to RELOAD 293 There are two changes as depicted in Figure 2. New ALM messages are 294 mapped to RELOAD Message Transport using the RELOAD experimental 295 message type. A plug-in for ALM algorithms handles the ALM state and 296 control. The ALM Algorithm is under control of the application via 297 the Group API [I-D.irtf-samrg-common-api]. 299 +---------+ 300 |Group API| 301 +---------+ 302 | 303 ------------------- Application ------------------------ 304 +-------+ | 305 | ALM | | 306 | Usage | | 307 +-------+ | 308 -------------- Messaging Service Boundary -------------- 309 | 310 +--------+ +-----------+---------+ +---------+ 311 | Storage|<---> | RELOAD | ALM |<-->| ALM Alg | 312 +--------+ | Message | Messages| +---------+ 313 ^ | Transport | | 314 | +-----------+---------+ 315 v | | 316 +-------------+ | 317 | Topology | | 318 | Plugin | | 319 +-------------+ | 320 ^ | 321 v v 322 +-------------------+ 323 | Forwarding& | 324 | Link Management | 325 +-------------------+ 327 ---------- Overlay Link Service Boundary -------------- 329 Figure 2: RELOAD Architecture Extensions 331 The ALM components interact with RELOAD as follows: 333 o ALM uses the RELOAD data storage functionality to store an ALMTree 334 instance when a new ALM tree is created in the overlay, and to 335 retrieve ALMTree instance(s) for existing ALM trees. 337 o ALM applications and management tools may use the RELOAD data 338 storage functionality to store diagnostic information about the 339 operation of trees, including average number of tree, delay from 340 source to leaf nodes, bandwidth use, packet loss rate. In 341 addition, diagnostic information may include statistics specific 342 to the tree root, or to any node in the tree. 344 5. RELOAD ALM Usage 346 Applications of RELOAD are restricted in the data types that can be 347 stored in the DHT. The profile of accepted data types for an 348 application is referred to as a Usage. RELOAD is designed so that 349 new applications can easily define new Usages. New RELOAD Usages are 350 needed for multicast applications since the data types in base RELOAD 351 and existing usages are not sufficient. 353 We define an ALM Usage in RELOAD. This ALM Usage is sufficient for 354 applications which require ALM functionality in the overlay. 355 Figure 2 shows the internal structure of the ALM Usage. This 356 contains the Group API ([I-D.irtf-samrg-common-api]) an ALM algorithm 357 plugin (e.g. Scribe) and the ALM messages which are then sent out to 358 the RELOAD network. 360 A RELOAD Usage is required [I-D.ietf-p2psip-base] to define the 361 following: 363 o Register Kind-Id points 365 o Define data structures for each kind 367 o Defines access control rules for each kind 369 o Defines the Resource Name used to hash to the Resource ID that 370 determines where the kind is stored 372 o Addresses restoration of values after recovery from a network 373 partition 375 o Defines the types of connections that can be initiated using 376 AppConnect 378 an ALM GroupID is a RELOAD Node-ID. The owner of an ALM group 379 creates a RELOAD Node-ID as specified in [I-D.ietf-p2psip-base]. 380 This means that a GroupID is used as a RELOAD Destination for overlay 381 routing purposes. 383 6. ALM Tree Control Signaling 385 Peers use the overlay to support ALM operations such as: 387 o Create tree 389 o Join 390 o Leave 392 o Re-Form or optimize tree 394 There are a variety of algorithms for peers to form multicast trees 395 in the overlay. We permit multiple such algorithms to be supported 396 in the overlay, since different algorithms may be more suitable for 397 certain application requirements, and since we wish to support 398 experimentation. Therefore, overlay messaging corresponding to the 399 set of overlay multicast operations must carry algorithm 400 identification information. 402 For example, for small groups, the join point might be directly 403 assigned by the rendezvous point, while for large trees the join 404 request might be propagated down the tree with candidate parents 405 forwarding their position directly to the new node. 407 Here is a simplistic algorithm for forming a multicast tree in the 408 overlay. Its main advantage is use of the overlay for routing both 409 control and data messages. The group creator doesn't have to be the 410 root of the tree or even in the tree. It doesn't consider per node 411 load, admission control, or alternative paths. 413 As stated earlier, multiple algorithms will co-exist in the overlay. 415 1. Peer which initiates multicast group: 417 groupID = create(); // allocate a unique groupId 418 // the root is the nearest 419 // peer in the overlay 420 // out of band advertisement or 421 // distribution of groupID, 422 // perhaps by publishing in DHT 424 2. Any joining peer: 426 // out of band discovery of groupID, perhaps by lookup in DHT 427 joinTree(groupID); // sends "join groupID" message 429 The overlay routes the join request using the overlay routing 430 mechanism toward the peer with the nearest id to the groupID. 431 This peer is the root. Peers on the path to the root join the 432 tree as forwarding points. 434 3. Leave Tree: 436 leaveTree(groupID) // removes this node from the tree 438 Propagates a leave message to each child node and to the parent 439 node. If the parent node is a forwarding node and this is its 440 last child, then it propagates a leave message to its parent. A 441 child node receiving a leave message from a parent sends a join 442 message to the groupID. 444 4. Message forwarding: 446 multicastMsg(groupID, msg); 448 5. For the message forwarding both Any Source Multicast (ASM) and 449 Source Specific Multicast (SSM) approaches may be used: 451 7. ALM Messages Mapped to RELOAD 453 7.1. Introduction 455 In this document we define messages for overlay multicast tree 456 creation, using an existing protocol (RELOAD) in the P2P-SIP WG 457 [I-D.ietf-p2psip-base] for a universal structured peer-to-peer 458 overlay protocol. RELOAD provides the mechanism to support a number 459 of overlay topologies. Hence the overlay multicast framework defined 460 in this draft can be used with P2P-SIP, and makes the SAM framework 461 is overlay agnostic. 463 As discussed in the SAM requirements draft 464 [I-D.muramoto-irtf-sam-generic-require], there are a variety of ALM 465 tree formation and tree maintenance algorithms. The intent of this 466 specification is to be algorithm agnostic, similar to how RELOAD is 467 overlay algorithm agnostic. We assume that all control messages are 468 propagated using overlay routed messages. 470 The message types needed for ALM behavior are divided into the 471 following categories: 473 o Tree life-cycle (create, join, leave, re-form, heartbeat) 475 o Peer region and multicast properties 477 The message codes are defined in Section 11 of this document. 478 Messages are mapped to the RELOAD experimental message type. 480 In the following sections the protocol messages as mapped to RELOAD 481 are discussed. Detailed example message flows are provided in 482 Section 12. 484 In the following descriptions we use the datatype Dictionary which is 485 a set of opaque values indexed by an opaque key with one value for 486 each key. A single dictionary entry is represented by a 487 DictionaryEntry as defined in Section 7.2.3 of the RELOAD draft 488 [I-D.ietf-p2psip-base]. The Dictionary datatype is defined as 489 follows: 491 struct { 492 DictionaryEntry elements<0..2^16-1>; 493 } Dictionary; 495 7.2. Tree Lifecycle Messages 497 Peers use the overlay to transmit ALM (application layer multicast) 498 operations defined in this section. 500 7.2.1. Create Tree 502 A new ALM tree is created in the overlay with the identity specified 503 by group_id. The common interpretation in a DHT based overlay of 504 group_id is that the peer with peer id closest to and less than the 505 group_id is the root of the tree. However, other overlay types are 506 supported. The tree has no children at the time it is created. 508 The group_id is generated from a well-known session key to be used by 509 other Peers to address the multicast tree in the overlay. The 510 generation of the group_id from the session_key MUST be done using 511 the overlay's id generation mechanism. 513 A successful Create Tree causes an ALMTree structure to be stored in 514 the overlay at the node G responsible for the group_id. This node G 515 performs the RELOAD-defined StoreReq operation as a side effect of 516 performing the Create Tree. If the StoreReq fails, the Create Tree 517 fails too. 519 After a successful Create Tree, peers can use the RELOAD Fetch method 520 to retrieve the ALMTree struct at address group_id. The ALMTree kind 521 is defined in Section 10. 523 struct { 524 node_id peer_id; 525 opaque session_key<0..2^32-1>; 526 node_id group_id; 527 Dictionary options; 528 } ALMTree; 530 peer_id: the overlay address of the peer that creates the multicast 531 tree. 533 session_key: a well-known string when hashed using the overlay's id 534 generation algorithm produces the group_id. 536 group_id: the overlay address of the root of the tree 538 options: name-value list of properties to be associated with the 539 tree, such as the maximum size of the tree, restrictions on peers 540 joining the tree, latency constraints, preference for distributed or 541 centralized tree formation and maintenance, heartbeat interval. 543 Tree creation is subject to access control since it involves a Store 544 operation. The NODE-MATCH access policy defined in section 7.3.2 of 545 RELOAD is used. 547 7.2.2. CreateTreeResponse 549 After receiving a CreateTree message from node S, the peer sends a 550 CreateTreeReponse to node S. 552 struct { 553 Dictionary options; 554 } CreateTreeResponse; 556 options: A node may provide algorithm-dependent parameters about the 557 created tree to the requesting node. 559 7.2.3. Join 561 Causes the distributed algorithm for peer join of a specific ALM 562 group to be invoked. The definition of the Join message is shown 563 below. If successful, the joining peer is notified of one or more 564 candidate parent peers in one or more JoinAccept messages. The 565 particular ALM join algorithm is not specified in this protocol. 567 RELOAD is a request-response protocol. Consequently, the messages 568 JoinAccept and JoinReject (defined below) are matching responses for 569 Join. If JoinReject is received, then no further action on this 570 request is carried out. If JoinAccept is received, then either a 571 JoinConfirm or a JoinDecline message (see below) is then sent. The 572 matching response for JoinConfirm is JoinConfirmResponse. The 573 matching response for JoinDecline is JoinDeclineResponse. 575 The following list shows the matching request-responses according to 576 the request-response mechanism defined in RELOAD. 578 Join -- JoinAccept: Node C sends a Join request to node P. If node 579 P accepts, it responds with JoinAccept. 581 Join -- JoinReject: Node C sends a Join request to node P. If node 582 P does not accept the join request, it responds with JoinReject. 584 JoinConfirm -- JoinConfirmResponse: If node P sent node C a 585 JoinAccept, then node C confirms with a JoinConfirm request. Node 586 P then responds with a JoinConfirmResponse. 588 JoinDecline -- JoinDeclineResponse: If node P sent node C a 589 JoinAccept, then node C declines with a JoinDecline request. Node 590 P then responds with a JoinDeclineResponse 592 Thus Join, JoinConfirm, and JoinDecline are treated as requests as 593 defined in RELOAD, are mapped to the RELOAD exp_a_req message, and 594 are therefore retransmitted until either retry limit is reached or a 595 matching response received. JoinAccept, JoinReject, 596 JoinConfirmResponse, and JoinDeclineResponse are treated as message 597 responses as defined above, and are mapped to the RELOAD exp_a_ans 598 message. 600 struct { 601 node_id peer_id; 602 node_id group_id; 603 Dictionary options; 604 } Join; 606 peer_id: overlay address of joining/leaving peer 608 group_id: the overlay address of the root of the tree 610 options: name-value list of options proposed by joining peer 612 7.2.4. Join Accept (Join Response) 614 Tells the requesting joining peer that the indicated peer is 615 available to act as its parent in the ALM tree specified by group_id, 616 with the corresponding options specified. A peer MAY receive more 617 than one JoinAccept from different candidate parent peers in the 618 group_id tree. The peer accepts a peer as parent using a JoinConfirm 619 message. A JoinAccept which receives neither a JoinConfirm or 620 JoinDecline message MUST expire. 622 struct { 623 node_id parent_peer_id; 624 node_id child_peer_id; 625 node_id group_id; 626 Dictionary options; 627 } JoinAccept; 629 parent_peer_id: overlay address of a peer which accepts the joining 630 peer 632 child_peer_id: overlay address of joining peer 634 group_id: the overlay address of the root of the tree 636 options: name-value list of options accepted by parent peer 638 7.2.5. Join Reject (Join Response) 640 A peer receiving a Join message responds with a JoinReject response 641 to indicate the request is rejected. 643 7.2.6. Join Confirm 645 A peer receiving a JoinAccept message which it wishes to accept MUST 646 explicitly accept it before the expiration of the JoinAccept using a 647 JoinConfirm message. The joining peer MUST include only those 648 options from the JoinAccept which it also accepts, completing the 649 negotiation of options between the two peers. 651 struct { 652 node_id child_peer_id; 653 node_id parent_peer_id; 654 node_id group_id; 655 Dictionary options; 656 } JoinConfirm; 658 child_peer_id: overlay address of joining peer which is a child of 659 the parent peer 661 parent_peer_id: overlay address of the peer which is the parent of 662 the joining peer 664 group_id: the overlay address of the root of the tree 666 options: name-value list of options accepted by both peers 668 7.2.7. Join Confirm Response 670 A peer receiving a JoinConfirm message responds with a 671 JoinConfirmResponse 673 7.2.8. Join Decline 675 A peer receiving a JoinAccept message which does not wish to accept 676 it MAY explicitly decline it using a JoinDecline message. 678 struct { 679 node_id peer_id; 680 node_id parent_peer_id; 681 node_id group_id; 682 } JoinDecline; 684 peer_id: overlay address of joining peer which declines the 685 JoinAccept 687 parent_peer_id: overlay address of the peer which issued a JoinAccept 688 to this peer 690 group_id: the overlay address of the root of the tree 692 7.2.9. Join Decline Response 694 A peer receiving a JoinConfirm message responds with a 695 JoinDeclineResponse 697 7.2.10. Leave 699 A peer which is part of an ALM tree identified by group_id which 700 intends to detach from either a child or parent peer SHOULD send a 701 Leave message to the peer it wishes to detach from. A peer receiving 702 a Leave message from a peer which is neither in its parent or child 703 lists SHOULD ignore the message. 705 struct { 706 node_id peer_id; 707 node_id group_id; 708 Dictionary options; 709 } Leave; 711 peer_id: overlay address of leaving peer 713 group_id: the overlay address of the root of the tree 715 options: name-value list of options 717 7.2.11. Leave Response 719 A peer receiving a Leave message responds with a LeaveResponse 721 7.2.12. Re-Form or Optimize Tree 723 This triggers a reorganization of either the entire tree or only a 724 sub-tree. It MAY include hints to specific peers of recommended 725 parent or child peers to reconnect to. A peer receiving this message 726 MAY ignore it, MAY propagate it to other peers in its subtree, and 727 MAY invoke local algorithms for selecting preferred parent and/or 728 child peers. 730 struct { 731 node_id group_id; 732 node_id peer_id; 733 Dictionary options; 734 } Reform; 736 group_id: the overlay address of the root of the tree 738 peer_id: if omitted, then the tree is reorganized starting from the 739 root, otherwise it is reorganized only at the sub-tree identified by 740 peer_id. 742 options: name-value list of options 744 7.2.13. Reform Response 746 A peer receiving a Reform message responds with a ReformResponse 748 struct { 749 Dictionary options; 750 } ReformResponse; 752 options: algorithm dependent information about the results of the 753 reform operation 755 7.2.14. Heartbeat 757 A child node signals to its adjacent parent nodes in the tree that it 758 is alive. If a parent node does not receive a Heartbeat message 759 within N heartbeat time intervals, it MUST treat this as an explicit 760 Leave message from the unresponsive peer. N is configurable. 762 struct { 763 node_id peer_id_1; 764 node_id peer_id_2; 765 node_id group_id; 766 Dictionary options; 767 } Heartbeat; 769 peer_id_1: source of heartbeat 771 peer_id_2: destination of heartbeat 773 group_id: overlay address of the root of the tree 775 options: an algorithm may use the heartbeat message to provide state 776 information to adjacent nodes in the tree 778 7.2.15. Heartbeat Response 780 A parent node responds to a Heartbeat message from a child node in a 781 tree that it has received the Heartbeat message. 783 7.2.16. NodeQuery 785 The NodeQuery message is used to obtain information about the state 786 and performance of the tree on a per node basis. A set of nodes 787 could be queried to construct a centralized view of the multicast 788 trees, similar to a web crawler. 790 struct { 791 node_id peer_id_1; 792 node_id peer_id_2; 793 } NodeQuery; 795 peer_id_1: source of query 797 peer_id_2: destination of query 799 7.2.17. NodeQuery Response 801 The response to a NodeQuery message contains a NodeStatistics 802 instance for this node. 804 public struct { 805 uint32 node_lifetime; 806 uint32 total_number_trees; 807 uint16 number_algorithms_supported; 808 uint8 algorithms_supported[32]; 809 TreeData max_tree_data; 810 uint16 active_number_trees; 811 TreeData tree_data<0..2^8-1>; 812 ImplementationInfo imp_info; 813 } NodeStatistics; 815 node_lifetime: time the node has been alive in seconds since last 816 restart 818 total_number_trees: total number of trees this node has been part 819 of during the node lifetime 821 number_algorithms_supported: value between 0..2^16-1 corresponding 822 to the number of algorithms supported 824 algorithms_supported: list of algorithms, each byte encoded using 825 the corresponding algorithm code 827 max_tree_data: data about tree with largest number of nodes that 828 this node was part of. NodeQuery can be used to crawl all the 829 nodes in an ALM tree to fill this field. This is intended to 830 support monitoring, algorithm design, and general experimentation 831 with ALM in RELOAD. 833 active_number_trees: current number of trees that the node is part 834 of 836 tree_data: details of each active tree, the number of such is 837 specified by the number_active_trees. 839 impl_info: information about the implementation of this usage 841 public struct { 842 uint32 tree_id; 843 uint8 algorithm; 844 NodeId tree_root; 845 uint8 number_parents; 846 NodeId parent<0..2^8-1>; 847 Uint16 number_children_nodes; 848 NodeId children<0..2^16-1>; 849 Uint32 path_length_to_root; 850 Uint32 path_delay_to_root; 851 Uint32 path_delay_to_child; 852 } TreeData; 854 tree_id: the id of the tree 856 algorithm: code identifying the multicast algorithm used by this 857 tree 859 tree_root: node_id of tree root, or 0 if unknown 861 number_parents: 0 .. 2^8-1 indicates number of parent nodes for 862 this node 864 parent: the RELOAD NodeId of each parent node 866 number_children_nodes: 0..2^16-1 indicates number of children 868 children: the RELOAD NodeId of each child node 870 path_length_to_root: number of overlay hops to the root of the 871 tree 873 path_delay_to_root: RTT in millisec. to root node 875 path_delay_to_child: last measured RTT in msec to child node with 876 largest RTT. 878 public struct { 879 uint32 join_confim_timeout; 880 uint32 heartbeat_interval; 881 uint32 heartbeat_reponse_timeout; 882 uint16 info_length; 883 uint8 info<0..2^16-1>; 884 } ImplementationInfo; 885 join_confirm_timeout: The default time for join confirm/decline, 886 intended to provide sufficient time for a join request to receive 887 all responses and confirm the best choice. Default value is 5000 888 msec. An implementation can change this value. 890 heartbeat interval: The heartbeat interval is 2000 msec. 892 heartbeat timeout interval: The heartbeat timeout is 5000 msec, 893 and is the max time between heartbeat reports from an adjacent 894 node in the tree at which point the heartbeat is missed. 896 info_length: length of the info field 898 info: implementation specific information, such as name of 899 implementation, build version, and implementation specific 900 features 902 7.2.18. Push 904 A peer sends arbitrary multicast data to other peers in the tree. 905 Nodes in the tree forward this message to adjacent nodes in the tree 906 in an algorithm dependent way. 908 struct { 909 node_id group_id; 910 uint8 priority; 911 uint32 length; 912 uint8 data<0..2^32-1>; 913 } Push; 915 group_id: overlay address of root of the ALM tree 917 priority: the relative priority of the message, highest priority is 918 255. A node may ignore this field 920 length: length of the data field in bytes 922 data: the data 924 7.2.19. PushResponse 926 After receiving a Push message from node S, the receiving peer sends 927 a PushReponse to node S. 929 struct { 930 Dictionary options; 931 } PushResponse; 933 options: A node may provide feedback to the sender about previous 934 push messages in some window, for example, the last N push messages. 935 The feedback could include, for each push message received, the 936 number of adjacent nodes which were forwarded the push message, and 937 the number of adjacent nodes from which a PushResponse was received. 939 8. Scribe Algorithm 941 8.1. Overview 943 Figure 3 shows a mapping between RELOAD ALM messages (as defined in 944 Section 5 of this draft) and Scribe messages as defined in 945 [CASTRO2002]. 947 +------------------+-------------------+-----------------+ 948 | Section in Draft |RELOAD ALM Message | Scribe Message | 949 +------------------+-------------------+-----------------+ 950 | 7.2.1 | CreateALMTree | Create | 951 +------------------+-------------------+-----------------+ 952 | 7.2.2 | Join | Join | 953 +------------------+-------------------+-----------------+ 954 | 7.2.3 | JoinAccept | | 955 +------------------+-------------------+-----------------+ 956 | 7.2.4 | JoinConfirm | | 957 +------------------+-------------------+-----------------+ 958 | 7.2.5 | JoinDecline | | 959 +------------------+-------------------+-----------------+ 960 | 7.2.6 | Leave | Leave | 961 +------------------+-------------------+-----------------+ 962 | 7.2.7 | Reform | | 963 +------------------+-------------------+-----------------+ 964 | 7.2.8 | Heartbeat | | 965 +------------------+-------------------+-----------------+ 966 | 7.2.9 | NodeQuery | | 967 +------------------+-------------------+-----------------+ 968 | 7.2.10 | Push | Multicast | 969 +------------------+-------------------+-----------------+ 970 | | Note 1 | deliver | 971 +------------------+-------------------+-----------------+ 972 | | Note 1 | forward | 973 +------------------+-------------------+-----------------+ 974 | | Note 1 | route | 975 +------------------+-------------------+-----------------+ 976 | | Note 1 | send | 977 +------------------+-------------------+-----------------+ 979 Figure 3: Mapping to Scibe Messages 981 Note 1: These Scribe messages are handled by RELOAD messages. 983 The following sections describe the Scribe algorithm in more detail. 985 8.2. Create 987 This message will create a group with group_id. This message will be 988 delivered to the node whose node_id is closest to the group_id. This 989 node becomes the rendezvous point and root for the new multicast 990 tree. Groups may have multiple sources of multicast messages. 992 CREATE : groups.add(msg.group_id) 994 group_id: the overlay address of the root of the tree 996 8.3. Join 998 To join a multicast tree a node sends a JOIN request with the 999 group_id as the key. This message gets routed by the overlay to the 1000 rendezvous point of the tree. If an intermediate node is already a 1001 forwarder for this tree, it will add the joining node as a child. 1002 Otherwise the node will create a child table for the group and adds 1003 the joining node. It will then send the JOIN request towards the 1004 rendevous point terminating the JOIN message from the child. 1006 To adapt the Scribe algorithm into the ALM Usage proposed here, after 1007 a JOIN request is accepted, a JOINAccept message is returned to the 1008 joining node. 1010 JOIN : if(checkAccept(msg)) { 1011 recvJoins.add(msg.source, msg.group_id) 1012 SEND(JOINAccept(node_id, msg.source, msg.group_id)) 1013 } 1015 8.4. Leave 1017 When leaving a multicast group a node will change its local state to 1018 indicate that it left the group. If the node has no children in its 1019 table it will send a LEAVE request to its parent, which will travel 1020 up the multicast tree and will stop at a node which has still 1021 children remaining after removing the leaving node. 1023 LEAVE : groups[msg.group_id].children.remove(msg.source) 1024 if (groups[msg.group].children = 0) 1025 SEND(msg,groups[msg.group_id].parent) 1027 8.5. JoinConfirm 1029 This message is not part of the Scribe protocol, but required by the 1030 basic protocol proposed in this draft. Thus the usage will send this 1031 message to confirm a joining node accepting its parent node. 1033 JOINConfirm: if(recvJoins.contains(msg.source,msg.group_id)){ 1034 if !(groups.contains(msg.group_id)) { 1035 groups.add(msg.group_id) 1036 SEND(msg,msg.group_id) 1037 } 1038 groups[msg.group_id].children.add(msg.source) 1039 recvJoins.del(msg.source, msg.group_id) 1040 } 1042 8.6. JoinDecline 1044 JOINDecline: if(recvJoins.contains(msg.source,msg.group_id)) 1045 recvJoins.del(msg.source, msg.group_id) 1047 8.7. Multicast 1049 A message to be multicast to a group is sent to the rendevous node 1050 from where it is forwarded down the tree. If a node is a member of 1051 the tree rather than just a forwarder it will pass the multicast data 1052 up to the application. 1054 MULTICAST : foreach(groups[msg.group_id].children as node_id) 1055 SEND(msg,node_id) 1056 if memberOf(msg.group_id) 1057 invokeMessageHandler(msg.group_id, msg) 1059 9. P2PCast Algorithm 1061 9.1. Overview 1063 P2PCast [P2PCAST]creates a forest of related trees to increase load 1064 balancing. P2PCast is independent on the underlying P2P substrate. 1065 Its goals and approach are similar to Splitstream [SPLITSTREAM](which 1066 assumes Pastry as the P2P overlay). In P2PCast the content provider 1067 splits the stream of data into f stripes. Each tree in the forest of 1068 multicast trees is an (almost) full tree of arity f. These trees are 1069 conceptually separate: every node of the system appears once in each 1070 tree, with the content provider being the source in all of them. To 1071 ensure that each peer contributes as much bandwidth as it receives, 1072 every node is a leaf in all the trees except for one, in which the 1073 node will serve as an internal node (proper tree of this node). The 1074 remainder of this section will assume f=2 for the discussion. This 1075 is to keep the complexity for the description down. However, the 1076 algorithm scales for any number f. 1078 P2PCast distinguishes the following types of nodes: 1080 o Incomplete Nodes: A node with less than f children in its proper 1081 stripe; 1083 o Only-Child Nodes: A node whose parent (in any multicast tree) is 1084 an incomplete node; 1086 o Complete Nodes: A node with exactly f children in its proper 1087 stripe 1089 o Special Node: A single node which is a leaf in all multicast trees 1090 of the forest 1092 9.2. Message Mapping 1094 Figure 4 shows a mapping between RELOAD ALM messages (as defined in 1095 Section 5 of this draft) and P2PCast messages as defined in 1096 [P2PCAST]. 1098 +------------------+-------------------+-----------------+ 1099 | Section in Draft |RELOAD ALM Message | P2PCast Message | 1100 +------------------+-------------------+-----------------+ 1101 | 7.2.1 | CreateALMTree | Create | 1102 +------------------+-------------------+-----------------+ 1103 | 7.2.2 | Join | Join | 1104 +------------------+-------------------+-----------------+ 1105 | 7.2.3 | JoinAccept | | 1106 +------------------+-------------------+-----------------+ 1107 | 7.2.4 | JoinConfirm | | 1108 +------------------+-------------------+-----------------+ 1109 | 7.2.5 | JoinDecline | | 1110 +------------------+-------------------+-----------------+ 1111 | 7.2.6 | Leave | Leave | 1112 +------------------+-------------------+-----------------+ 1113 | 7.2.7 | Reform | Takeon | 1114 | | | Substitute | 1115 | | | Search | 1116 | | | Replace | 1117 | | | Direct | 1118 | | | Update | 1119 +------------------+-------------------+-----------------+ 1120 | 7.2.8 | Heartbeat | | 1121 +------------------+-------------------+-----------------+ 1122 | 7.2.9 | NodeQuery | | 1123 +------------------+-------------------+-----------------+ 1124 | 7.2.10 | Push | Multicast | 1125 +------------------+-------------------+-----------------+ 1127 Figure 4: Mapping to P2PCast Messages 1129 The following sections describe the mapping of the P2PCast messages 1130 in more detail. 1132 9.3. Create 1134 This message will create a group with group_id. This message will be 1135 delivered to the node whose node_id is closest to the group_id. This 1136 node becomes the rendezvous point and root for the new multicast 1137 tree. The rendezvous point will maintain f subtrees. 1139 9.4. Join 1141 To join a multicast tree a joining node N sends a JOIN request to a 1142 random node A already part of the tree. Depending of the type of A 1143 the joining algorithm continues as follows: 1145 o Incomplete Nodes: Node A will arbitrarily select for which tree it 1146 wants to serve as an internal node, and adopt N in that tree. In 1147 the other tree node N will adopt node A as a child (taking node 1148 A's place in the tree) thus becoming an internal node in the 1149 stripe that node A didn't choose. 1151 o Only-Child Nodes: As this node has a parent which is an incomplete 1152 node, the joining node will be redirected to the parent node and 1153 will handle the request as detailed above. 1155 o Complete Nodes: The contacted node A must be a leaf in the other 1156 tree. If node A is a leaf node in Stripe 1, node N will become an 1157 internal node in Stripe 1, taking the place of node A, adopting it 1158 at the same time. To find a place for itself in the other stripe, 1159 node N starts a random walk down the subtree rooted at the sibling 1160 of node A (if node A is the root and thus does not have siblings, 1161 node N is sent directly to a leaf in that tree), which ends as 1162 soon as node N finds an incomplete node or a leaf. In this case 1163 node N is adopted by the incomplete node. 1165 o Special Node: as this node is a leaf in all subtrees, the joining 1166 node can adopt the node in one tree and become a child in the 1167 other. 1169 P2PCast uses defined messages for communication between nodes during 1170 reorganisation. To use P2PCast in this context, these messages are 1171 encapsulated by the message type REFORM. In doing so, the P2PCast 1172 message is to be included in the options parameter of REFORM. The 1173 following reorganisation messages are defined by P2PCast: 1175 TAKEON: To take another peer as a child 1177 SUBSTITUTE: To take the place of a child of some peer 1179 SEARCH: To obtain the child of a node in a particular stripe 1181 REPLACE: Different from SUBSTITUTE in that the node which makes us 1182 its child sheds off a random child 1184 DIRECT: To direct a node to its would-be parent 1186 UPDATE: A node sends its updated state to its children 1188 To adapt the P2PCast algorithm into the ALM Usage proposed here, 1189 after a JOIN request is accepted, a JOINAccept message is returned to 1190 the joining node (one for every subtree). 1192 9.5. Leave 1194 When leaving a multicast group a node will change its local state to 1195 indicate that it left the group. Distregarding the case where the 1196 leaving node is the root of the tree, the leaving node must be 1197 complete or incomplete in its proper tree. In the other trees the 1198 node is a leaf and can just disappear by notifying its parent. For 1199 the proper tree, if the node is incomplete, it is replaced by its 1200 child. However, if the node is complete, a bubble is created which 1201 is filled by a random child. If this child is incomplete, it can 1202 simply fill the gap. However, if it is complete, it needs to shed a 1203 random child. This child is directed to its sibling, which sheds a 1204 random child. This process ripples down the tree until the next-to- 1205 last level is reached. The shed node is then taken as a child by the 1206 parent of the deleted node in the other stripe. 1208 Again, for the reorganisation of the tree, the REFORM message type is 1209 used as defined in the previous section. 1211 9.6. JoinConfirm 1213 This message is not part of the P2PCast protocol, but required by the 1214 basic protocol defined in this draft. Thus the usage will send this 1215 message to confirm a joining node accepting its parent node. As with 1216 Join and JoinAccept, this will be carried out for every subtree. 1218 9.7. Multicast 1220 A message to be multicast to a group is sent to the rendezvous node 1221 from where it is forwarded down the tree by being split into k 1222 stripes. Each stripe is then sent via a subtree. If a receiving 1223 node is a member of the tree rather than just a forwarder it will 1224 pass the multicast data up to the application. 1226 10. ALMTree Kind 1228 an ALMTree Kind is defined per section 7.4.5 in RELOAD. An instance 1229 of the ALMTree kind is stored in the overlay for each ALM tree 1230 instance. It is stored at the address group_id. 1232 Meaning: The meaning of the fields is given in Section 7.2.1. 1234 Kind-Id: 0xf0000001 (This is a private-use code-point per section 1235 14.6 of RELOAD. 1237 Data model: 1239 struct { 1240 node_id peer_id; 1241 opaque session_key<0..2^32-1>; 1242 node_id group_id; 1243 Dictionary options; 1244 } ALMTree; 1246 Access control model: The node performing the store operation is 1247 required to have NODE-MATCH access. 1249 11. Message Codes 1251 All messages are mapped to the RELOAD experimental message type. The 1252 mapping is given in the following table. The format of the body of a 1253 message is given in Figure 5. 1255 +-------------------------+------------------+------------------+ 1256 | Message |RELOAD Code Point | ALM Message Code | 1257 +-------------------------+------------------+------------------+ 1258 | CreateALMTRee | exp_a_req | 00 | 1259 +-------------------------+------------------+------------------+ 1260 | CreateALMTreeResponse | exp_a_ans | 01 | 1261 +-------------------------+------------------+------------------+ 1262 | Join | exp_a_req | 02 | 1263 +-------------------------+------------------+------------------+ 1264 | JoinAccept | exp_a_ans | 03 | 1265 +-------------------------+------------------+------------------+ 1266 | JoinReject | exp_a_ans | 04 | 1267 +-------------------------+------------------+------------------+ 1268 | JoinConfirm | exp_a_req | 05 | 1269 +-------------------------+------------------+------------------+ 1270 | JoinConfirmResponse | exp_a_ans | 06 | 1271 +-------------------------+------------------+------------------+ 1272 | JoinDecline | exp_a_req | 07 | 1273 +-------------------------+------------------+------------------+ 1274 | JoinDeclineResponse | exp_a_ans | 08 | 1275 +-------------------------+------------------+------------------+ 1276 | Leave | exp_a_req | 09 | 1277 +-------------------------+------------------+------------------+ 1278 | LeaveResponse | exp_a_ans | x0A | 1279 +-------------------------+------------------+------------------+ 1280 | Reform | exp_a_req | x0B | 1281 +-------------------------+------------------+------------------+ 1282 | ReformResponse | exp_a_ans | x0C | 1283 +-------------------------+------------------+------------------+ 1284 | Heartbeat | exp_a_req | x0D | 1285 +-------------------------+------------------+------------------+ 1286 | HeartbeatResponse | exp_a_ans | x0E | 1287 +-------------------------+------------------+------------------+ 1288 | NodeQuery | exp_a_req | x0F | 1289 +-------------------------+------------------+------------------+ 1290 | NodeQueryResponse | exp_a_ans | x10 | 1291 +-------------------------+------------------+------------------+ 1292 | Push | exp_a_req | x11 | 1293 +-------------------------+------------------+------------------+ 1294 | PushResponse | exp_a_ans | x12 | 1295 +-------------------------+------------------+------------------+ 1297 Figure 5: RELOAD Message Code mapping 1299 For Data Kind-IDs, the RELOAD specification states: "Code points in 1300 the range 0xf0000001 to 0xfffffffe are reserved for private use". 1301 ALM Usage Kind-IDs will be defined in the private use range. 1303 All ALM Usage messages support the RELOAD Message Extension 1304 mechanism. 1306 Code points for the kinds defined in this document MUST not conflict 1307 with any defined code points for RELOAD. RELOAD defines exp_a_req, 1308 exp_a_ans for experimental purposes. This specification uses only 1309 these message types for all ALM messages. RELOAD defines the 1310 MessageContents data structure. The ALM mapping uses the fields as 1311 follows: 1313 o message_code: exp_a_req for requests and exp_a_ans for responses 1315 o message_body: contains one instance of ALMHeader followed by one 1316 instance of ALMMessageContents 1318 o extensions: unused 1320 11.1. ALMHeader Definition 1322 struct { 1323 uint32 sam_token; 1324 uint32 algorithm; 1325 uint8 version; 1326 } ALMHeader; 1328 The fields in ALMHeader are used as follows: 1330 sam_token: The first four bytes identify this message as an ALM 1331 message. This field MUST contain the value 0xd3414d42 (the string 1332 "SAMB" with the high bit of the first byte set. 1334 algorithm: The code of the multicast algorithm being used. Each 1335 multicast tree uses only one algorithm. Trees with different 1336 multicast algorithm can co-exist, and can share the same nodes. 1338 version: The version of the ALM protocol being used. This is a 1339 fixed point integer between 0.1 and 25.4 This document describes 1340 version 1.0 with a value of 0xa. 1342 11.2. ALMMessageContents Definition 1344 struct { 1345 uint16 alm_message_code; 1346 opaque alm_message_body; 1347 } ALMMessageContents; 1349 The fields in ALMMessageContents are used as follows: 1351 alm_message_code: This indicates the message being sent. The 1352 message codes are listed in Section 11. 1354 alm_message_body: The message body itself, represented as a 1355 variable-length string of bytes. The bytes themselves are 1356 dependent on the code value. See Section 8 and Section 9 1357 describing the various ALM methods for the definitions of the 1358 payload contents. 1360 11.3. Response Codes 1362 Response codes are defined in section 6.3.3.1 in RELOAD. This 1363 experimental specification maps to RELOAD ErrorResponse as follows: 1365 ErrorResponse.error_code = Error_Exp_A; 1367 Error_info contains an ALMErrorResponse instance. 1369 public struct { 1370 uint16 alm_error_code; 1371 opaque alm_error_info<0..2^16-1>; 1372 } ALMErrorResponse; 1374 alm_error_code: The following error code values are defined. Numeric 1375 values for these are defined in section X. 1377 Error_Unknown_Algorithm: The multicast algorithm is not known or 1378 not supported. 1380 Error_Child_Limit_Reached: The maximum number of children nodes 1381 has been reached for this node 1383 Error_Node_Bandwidth_Reached: The overall data bandwidth limit 1384 through this node has been reached 1386 Error_Node_Connection_Limit_Reached: The total number of 1387 connections to this node has been reached 1389 Error_Link_Capacity_Limit_Reached: The capacity of a link has been 1390 reached 1392 Error_Node_Memory_Capacity_Limit_Reached: An internal memory 1393 capacity of the node has been reached 1395 Error_Node_CPU_Capacity_Limit_Reached: An internal processing 1396 capacity of the node has been reached 1397 Error_Path_Limit_Reached: The maximum path length in hopcount over 1398 the multicast tree has been reached 1400 Error _Path_Delay_Limit_Reached: The maximum path length in 1401 message delay over the multicast tree has been reached 1403 Error_Tree_Fanout_Limit_Reached: The maximum fanout of a multicast 1404 tree has been reached 1406 Error_Tree_Depth_Limit_Reached: The maximum height of a multicast 1407 tree has been reached 1409 Error_Other: A human-readable description is placed in the 1410 alm_error_info field. 1412 11.4. Algorithm Codes 1414 ALM Algorithm Types: There are currently two types: SCRIBE and 1415 P2PCAST. 1417 0001 - SCRIBE 1419 0002 - P2PCAST 1421 0003 .. 0xFFFF undefined 1423 12. Examples 1425 All peers in the examples are assumed to have completed 1426 bootstrapping. "Pn" refers to peer N. "GroupID" refers to a peer 1427 responsible for storing the ALMTree instance with GroupID. 1429 12.1. Create Tree 1431 A node with "NODE-MATCH" rights sends a request CreateTree to the 1432 group-id node, which also has NODE-MATCH rights for its own address. 1433 The group-id node determines whether to create the new tree, and if 1434 so, performs a local StoreReq. If the CreateTree succeeds, the 1435 ALMTree instance can be retrieved using Fetch. An example message 1436 flow for ceating a tree is depicted in Figure 6. 1438 P1 P2 P3 P4 GroupID 1439 | | | | | 1440 | | | | | 1441 | | | | | 1442 | CreateTree | | | 1443 |------------------------------->| 1444 | | | | | 1445 | | | | | StoreReq 1446 | | | | |--+ 1447 | | | | | | 1448 | | | | | | 1449 | | | | |<-+ 1450 | | | | | StoreResponse 1451 | | | | |--+ 1452 | | | | | | 1453 | | | | | | 1454 | | | | |<-+ 1455 | | | | | 1456 | | | | | 1457 | | CreateTreeResponse | 1458 |<-------------------------------| 1459 | | | | | 1460 | | | | | 1461 | Fetch | | | 1462 |------------------------------->| 1463 | | | | | 1464 | | | | | 1465 | | FetchResponse | 1466 |<-------------------------------| 1467 | | | | | 1469 Figure 6: Message flow example for CreateTree. 1471 12.2. Join Tree 1473 P1 joins node GroupID as child node. P2 joins the tree as a child of 1474 P1. P4 joins the tree as a child of P1. The corresponding message 1475 flow is shown in Figure 7 1476 P1 P2 P3 P4 GroupID 1477 | | | | | 1478 | | | | | 1479 | Join | 1480 |------------------------------->| 1481 | | | | | 1482 | JoinAccept | 1483 |<-------------------------------| 1484 | | | | | 1485 | | | | | 1486 | |Join | 1487 | |----------------------->| 1488 | | | | | 1489 | Join| 1490 |<-------------------------------| 1491 | | | | | 1492 |JoinAccept | | | 1493 |------>| | | | 1494 | | | | | 1495 |JoinConfirm | | | 1496 |<------| | | | 1497 | | | | | 1498 | | | |Join | 1499 | | | |------>| 1500 | | | | Join | 1501 |<-------------------------------| 1502 | | | | | 1503 | Join | | | | 1504 |------>| | | | 1505 | | | | | 1506 | JoinAccept | | | 1507 |----------------------->| | 1508 | | | | | 1509 | | JoinAccept | | 1510 | |--------------->| | 1511 | | | | | 1512 | | | | | 1513 | | Join Confirm | | 1514 |<-----------------------| | 1515 | | | | | 1516 | | Join Decline | | 1517 | |<---------------| | 1518 | | | | | 1519 | | | | | 1521 Figure 7: Message flow example for tree Join. 1523 12.3. Leave Tree 1525 P1 P2 P3 P4 GroupID 1526 | | | | | 1527 | | | | | 1528 | | | Leave | | 1529 |<-----------------------| | 1530 | | | | | 1531 | LeaveResponse | | | 1532 |----------------------->| | 1533 | | | | | 1534 | | | | | 1536 Figure 8: Message flow example for Leave tree. 1538 12.4. Push Data 1540 The multicast data is pushed recursively P1 => GroupID => P1 => P2, 1541 P4 following the tree topology created in the Join example above. An 1542 example message flow is shown in Figure 9 1543 P1 P2 P3 P4 GroupID 1544 | | | | | 1545 | Push | | | | 1546 |------------------------------->| 1547 | | | | | 1548 | | | PushResponse| 1549 |<-------------------------------| 1550 | | | | | 1551 | | | | Push| 1552 |<-------------------------------| 1553 | | | | | 1554 | PushResponse | | | 1555 |------------------------------->| 1556 | | | | | 1557 |Push | | | | 1558 |------>| | | | 1559 | | | | | 1560 |PushResponse | | | 1561 |<------| | | | 1562 | | | | | 1563 | Push | | | | 1564 |----------------------->| | 1565 | | | | | 1566 | | PushResponse | | 1567 |<-----------------------| | 1568 | | | | | 1569 | | | | | 1570 | | | | | 1572 Figure 9: Message flow example for pushing data. 1574 13. Kind Definitions 1576 13.1. ALMTree Kind Definition 1578 This section defines the ALMTree kind. 1580 Kind IDs The Resource Name for the ALMTree Kind-ID is the session_key 1581 used to identify the ALM tree 1583 Data Model The data model is the ALMTree structure. 1585 Access Control NODE-MATCH 1587 14. RELOAD Configuration File Extensions 1589 There are no ALM parameters defined for the RELOAD configuration 1590 file. 1592 15. Change History 1594 o Version 02: Remove Hybrid ALM material. Define ALMTree kind. 1595 Define new RELOAD messages. Define RELOAD architecture 1596 extensions. Add Scribe as base algorithm for ALM usage. Define 1597 code points. Define preliminary ALM-specific security issues. 1599 o Version 03: Add P2Pcast Algorithm. 1601 o Version 04: Add mapping to RELOAD experimental message. Modified 1602 IANA considerations setion. New algorithm identification coding. 1603 New message coding. Added push message. Create Tree access 1604 policy changed to use NODE-MATCH. Create Tree StoreReq clarified. 1605 Updated the diagrams in the Examples section. Added a Push data 1606 example. Defined the ALMTree kind. 1608 16. IANA Considerations 1610 This memo includes no request to IANA. 1612 17. Security Considerations 1614 Overlays are vulnerable to DOS and collusion attacks. We are not 1615 solving overlay security issues. We assume the node authentication 1616 model as defined in [I-D.ietf-p2psip-base]. 1618 ALM Usage specific security issues: 1620 o Right to create GroupID at some node_id 1622 o Right to store Tree info at some Location in the DHT 1624 o Limit on # messages / sec and bandwidth use 1626 o Right to join an ALM tree 1628 18. Acknowledgement 1630 Marc Petit-Huguenin provided important comments on earlier versions 1631 of this draft. 1633 19. Informative References 1635 [AGU1984] Aguilar, L., "Datagram Routing for Internet Multicasting", 1636 ACM Sigcomm 84 1984, March 1984, 1637 . 1639 [CASTRO2002] 1640 Castro, M., Druschel, P., Kermarrec, A., and A. Rowstron, 1641 "Scribe: A large-scale and decentralized application-level 1642 multicast infrastructure", IEEE Journal on Selected Areas 1643 in Communications vol.20, No.8, October 2002, . 1647 [CASTRO2003] 1648 Castro, M., Jones, M., Kermarrec, A., Rowstron, A., 1649 Theimer, M., Wang, H., and A. Wolman, "An Evaluation of 1650 Scalable Application-level Multicast Built Using Peer-to- 1651 peer overlays", Proceedings of IEEE INFOCOM 2003, 1652 April 2003, . 1655 [HE2005] He, Q. and M. Ammar, "Dynamic Host-Group/Multi-Destination 1656 Routing for Multicast Sessions", J. Telecommunication 1657 Systems vol. 28, pp. 409-433, 2005, . 1662 [I-D.ietf-mboned-auto-multicast] 1663 Bumgardner, G., "Automatic Multicast Tunneling", 1664 draft-ietf-mboned-auto-multicast-14 (work in progress), 1665 June 2012. 1667 [I-D.ietf-p2psip-base] 1668 Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and 1669 H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) 1670 Base Protocol", draft-ietf-p2psip-base-24 (work in 1671 progress), January 2013. 1673 [I-D.ietf-p2psip-sip] 1674 Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., 1675 Schulzrinne, H., and T. Schmidt, "A SIP Usage for RELOAD", 1676 draft-ietf-p2psip-sip-08 (work in progress), 1677 December 2012. 1679 [I-D.irtf-p2prg-rtc-security] 1680 Schulzrinne, H., Marocco, E., and E. Ivov, "Security 1681 Issues and Solutions in Peer-to-peer Systems for Realtime 1682 Communications", draft-irtf-p2prg-rtc-security-05 (work in 1683 progress), September 2009. 1685 [I-D.irtf-sam-hybrid-overlay-framework] 1686 Buford, J., "Hybrid Overlay Multicast Framework", 1687 draft-irtf-sam-hybrid-overlay-framework-02 (work in 1688 progress), February 2008. 1690 [I-D.irtf-samrg-common-api] 1691 Waehlisch, M., Schmidt, T., and S. Venaas, "A Common API 1692 for Transparent Hybrid Multicast", 1693 draft-irtf-samrg-common-api-06 (work in progress), 1694 August 2012. 1696 [I-D.matuszewski-p2psip-security-overview] 1697 Yongchao, S., Matuszewski, M., and D. York, "P2PSIP 1698 Security Overview and Risk Analysis", 1699 draft-matuszewski-p2psip-security-overview-01 (work in 1700 progress), October 2009. 1702 [I-D.muramoto-irtf-sam-generic-require] 1703 Muramoto, E., "Requirements for Scalable Adaptive 1704 Multicast Framework in Non-GIG Networks", 1705 draft-muramoto-irtf-sam-generic-require-01 (work in 1706 progress), November 2006. 1708 [P2PCAST] Nicolosi, A. and S. Annapureddy, "P2PCast: A Peer-to-Peer 1709 Multicast Scheme for Streaming Data", Stanford Secure 1710 Computer Systems Group Report 2003, May 2003, . 1713 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1714 RFC 792, September 1981. 1716 [RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5, 1717 RFC 1112, August 1989. 1719 [RFC1930] Hawkinson, J. and T. Bates, "Guidelines for creation, 1720 selection, and registration of an Autonomous System (AS)", 1721 BCP 6, RFC 1930, March 1996. 1723 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1724 Requirement Levels", BCP 14, RFC 2119, March 1997. 1726 [RFC3376] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A. 1727 Thyagarajan, "Internet Group Management Protocol, Version 1728 3", RFC 3376, October 2002. 1730 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 1731 Text on Security Considerations", BCP 72, RFC 3552, 1732 July 2003. 1734 [RFC3810] Vida, R. and L. Costa, "Multicast Listener Discovery 1735 Version 2 (MLDv2) for IPv6", RFC 3810, June 2004. 1737 [RFC4286] Haberman, B. and J. Martin, "Multicast Router Discovery", 1738 RFC 4286, December 2005. 1740 [RFC4605] Fenner, B., He, H., Haberman, B., and H. Sandick, 1741 "Internet Group Management Protocol (IGMP) / Multicast 1742 Listener Discovery (MLD)-Based Multicast Forwarding 1743 ("IGMP/MLD Proxying")", RFC 4605, August 2006. 1745 [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for 1746 IP", RFC 4607, August 2006. 1748 [RFC5058] Boivie, R., Feldman, N., Imai, Y., Livens, W., and D. 1749 Ooms, "Explicit Multicast (Xcast) Concepts and Options", 1750 RFC 5058, November 2007. 1752 [SPLITSTREAM] 1753 Castro, M., Druschel, P., Nandi, A., Kermarrec, A., 1754 Rowstron, A., and A. Singh, "SplitStream: High-bandwidth 1755 multicast in a cooperative environment", SOSP'03,Lake 1756 Bolton, New York 2003, October 2003, . 1760 Appendix A. Additional Stuff 1762 This becomes an Appendix. 1764 Authors' Addresses 1766 John Buford 1767 Avaya Labs Research 1768 211 Mt. Airy Rd 1769 Basking Ridge, New Jersey 07920 1770 USA 1772 Phone: +1 908 848 5675 1773 Email: buford@avaya.com 1775 Mario Kolberg (editor) 1776 University of Stirling 1777 Dept. Computing Science and Mathematics 1778 Stirling, FK9 4LA 1779 UK 1781 Phone: +44 1786 46 7440 1782 Email: mkolberg@ieee.org 1783 URI: http://www.cs.stir.ac.uk/~mko