idnits 2.17.1 draft-irtf-samrg-sam-baseline-protocol-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1365 has weird spacing: '... opaque alm_...' -- The document date (July 23, 2013) is 3927 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '32' on line 841 == Missing Reference: 'RFC-to-be' is mentioned on line 1686, but not defined == Unused Reference: 'RFC3552' is defined on line 1797, but no explicit reference was found in the text == Outdated reference: A later version (-10) exists of draft-irtf-samrg-common-api-06 -- No information found for draft-muramoto-irtf-sam-generic-require - is the name correct? -- Obsolete informational reference (is this intentional?): RFC 5226 (Obsoleted by RFC 8126) Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 SAM Research Group J. Buford 3 Internet-Draft Avaya Labs Research 4 Intended status: Experimental M. Kolberg, Ed. 5 Expires: January 24, 2014 University of Stirling 6 July 23, 2013 8 Application Layer Multicast Extensions to RELOAD 9 draft-irtf-samrg-sam-baseline-protocol-06 11 Abstract 13 We define a RELOAD Usage for Application Layer Multicast as well as a 14 mapping to the RELOAD experimental message type to support ALM. The 15 ALM Usage is intended to support a variety of ALM control algorithms 16 in an overlay-independent way. Two example algorithms are defined, 17 based on Scribe and P2PCast. 19 Status of This Memo 21 This Internet-Draft is submitted in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF). Note that other groups may also distribute 26 working documents as Internet-Drafts. The list of current Internet- 27 Drafts is at http://datatracker.ietf.org/drafts/current/. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 This Internet-Draft will expire on January 24, 2014. 36 Copyright Notice 38 Copyright (c) 2013 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents 43 (http://trustee.ietf.org/license-info) in effect on the date of 44 publication of this document. Please review these documents 45 carefully, as they describe your rights and restrictions with respect 46 to this document. 48 Table of Contents 50 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 51 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 52 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 53 2.1. Overlay Network . . . . . . . . . . . . . . . . . . . . . 4 54 2.2. Overlay Multicast . . . . . . . . . . . . . . . . . . . . 5 55 2.3. Source Specific Multicast (SSM) . . . . . . . . . . . . . 5 56 2.4. Any Source Multicast (ASM) . . . . . . . . . . . . . . . 5 57 2.5. Peer . . . . . . . . . . . . . . . . . . . . . . . . . . 5 58 3. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 5 59 3.1. Overlay . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 3.2. Overlay Multicast . . . . . . . . . . . . . . . . . . . . 6 61 3.3. RELOAD . . . . . . . . . . . . . . . . . . . . . . . . . 6 62 3.4. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 63 3.5. Tree Topology . . . . . . . . . . . . . . . . . . . . . . 6 64 4. Architecture Extensions to RELOAD . . . . . . . . . . . . . . 7 65 5. RELOAD ALM Usage . . . . . . . . . . . . . . . . . . . . . . 8 66 6. ALM Tree Control Signaling . . . . . . . . . . . . . . . . . 9 67 7. ALM Messages Mapped to RELOAD . . . . . . . . . . . . . . . . 10 68 7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 10 69 7.2. Tree Lifecycle Messages . . . . . . . . . . . . . . . . . 11 70 7.2.1. Create Tree . . . . . . . . . . . . . . . . . . . . . 11 71 7.2.2. CreateTreeResponse . . . . . . . . . . . . . . . . . 12 72 7.2.3. Join . . . . . . . . . . . . . . . . . . . . . . . . 12 73 7.2.4. Join Accept (Join Response) . . . . . . . . . . . . . 14 74 7.2.5. Join Reject (Join Response) . . . . . . . . . . . . . 14 75 7.2.6. Join Confirm . . . . . . . . . . . . . . . . . . . . 14 76 7.2.7. Join Confirm Response . . . . . . . . . . . . . . . . 15 77 7.2.8. Join Decline . . . . . . . . . . . . . . . . . . . . 15 78 7.2.9. Join Decline Response . . . . . . . . . . . . . . . . 16 79 7.2.10. Leave . . . . . . . . . . . . . . . . . . . . . . . . 16 80 7.2.11. Leave Response . . . . . . . . . . . . . . . . . . . 17 81 7.2.12. Re-Form or Optimize Tree . . . . . . . . . . . . . . 17 82 7.2.13. Reform Response . . . . . . . . . . . . . . . . . . . 17 83 7.2.14. Heartbeat . . . . . . . . . . . . . . . . . . . . . . 17 84 7.2.15. Heartbeat Response . . . . . . . . . . . . . . . . . 18 85 7.2.16. NodeQuery . . . . . . . . . . . . . . . . . . . . . . 18 86 7.2.17. NodeQuery Response . . . . . . . . . . . . . . . . . 18 87 7.2.18. Push . . . . . . . . . . . . . . . . . . . . . . . . 21 88 7.2.19. PushResponse . . . . . . . . . . . . . . . . . . . . 21 89 8. Scribe Algorithm . . . . . . . . . . . . . . . . . . . . . . 22 90 8.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 22 91 8.2. Create . . . . . . . . . . . . . . . . . . . . . . . . . 23 92 8.3. Join . . . . . . . . . . . . . . . . . . . . . . . . . . 23 93 8.4. Leave . . . . . . . . . . . . . . . . . . . . . . . . . . 23 94 8.5. JoinConfirm . . . . . . . . . . . . . . . . . . . . . . . 23 95 8.6. JoinDecline . . . . . . . . . . . . . . . . . . . . . . . 23 96 8.7. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 24 97 9. P2PCast Algorithm . . . . . . . . . . . . . . . . . . . . . . 24 98 9.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 24 99 9.2. Message Mapping . . . . . . . . . . . . . . . . . . . . . 24 100 9.3. Create . . . . . . . . . . . . . . . . . . . . . . . . . 25 101 9.4. Join . . . . . . . . . . . . . . . . . . . . . . . . . . 25 102 9.5. Leave . . . . . . . . . . . . . . . . . . . . . . . . . . 26 103 9.6. JoinConfirm . . . . . . . . . . . . . . . . . . . . . . . 27 104 9.7. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 27 105 10. Message Format . . . . . . . . . . . . . . . . . . . . . . . 27 106 10.1. ALMHeader Definition . . . . . . . . . . . . . . . . . . 29 107 10.2. ALMMessageContents Definition . . . . . . . . . . . . . 29 108 10.3. Response Codes . . . . . . . . . . . . . . . . . . . . . 30 109 11. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 31 110 11.1. Create Tree . . . . . . . . . . . . . . . . . . . . . . 31 111 11.2. Join Tree . . . . . . . . . . . . . . . . . . . . . . . 32 112 11.3. Leave Tree . . . . . . . . . . . . . . . . . . . . . . . 33 113 11.4. Push Data . . . . . . . . . . . . . . . . . . . . . . . 33 114 12. Kind Definitions . . . . . . . . . . . . . . . . . . . . . . 34 115 12.1. ALMTree Kind Definition . . . . . . . . . . . . . . . . 34 116 13. RELOAD Configuration File Extensions . . . . . . . . . . . . 34 117 14. Change History . . . . . . . . . . . . . . . . . . . . . . . 35 118 15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 35 119 15.1. ALM Algorithm Types . . . . . . . . . . . . . . . . . . 35 120 15.2. Message Code Registration . . . . . . . . . . . . . . . 36 121 15.3. Error Code Registration . . . . . . . . . . . . . . . . 36 122 16. Security Considerations . . . . . . . . . . . . . . . . . . . 37 123 17. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 38 124 18. Informative References . . . . . . . . . . . . . . . . . . . 38 125 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 39 127 1. Introduction 129 The concept of scalable adaptive multicast includes both scaling 130 properties and adaptability properties. Scalability is intended to 131 cover: 133 o large group size 135 o large numbers of small groups 137 o rate of group membership change 139 o admission control for QoS 141 o use with network layer QoS mechanisms 143 o varying degrees of reliability 144 o trees connecting nodes over the global Internet 146 Adaptability includes 148 o use of different control mechanisms for different multicast trees 149 depending on initial application parameters or application classes 151 o changing multicast tree structure depending on changes in 152 application requirements, network conditions, and membership 154 Application Layer Multicast (ALM) has been demonstrated to be a 155 viable multicast technology where native multicast isn't available. 156 Many ALM designs have been proposed. This ALM Usage focuses on: 158 o ALM implemented in RELOAD-based overlays 160 o Support for a variety of ALM control algorithms 162 o Providing a basis for defining a separate hybrid-ALM RELOAD Usage 164 RELOAD [I-D.ietf-p2psip-base] has an application extension mechanism 165 in which a new type of application defines a Usage. A RELOAD Usage 166 defines a set of data types and rules for their use. In addition, 167 this document describes additional message types and a new ALM 168 algorithm plugin architectural component. 170 1.1. Requirements Language 172 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 173 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 174 document are to be interpreted as described in RFC 2119 [RFC2119]. 176 2. Definitions 178 We adopt the terminology defined in section 2 of 179 [I-D.ietf-p2psip-base], specifically the distinction between Node, 180 Peer, and Client. 182 2.1. Overlay Network 184 Overlay network - An application layer virtual or logical network in 185 which end points are addressable and that provides connectivity, 186 routing, and messaging between end points. Overlay networks are 187 frequently used as a substrate for deploying new network services, or 188 for providing a routing topology not available from the underlying 189 physical network. Many peer-to-peer systems are overlay networks 190 that run on top of the Internet. In Figure 1, "P" indicates overlay 191 peers, and peers are connected in a logical address space. The links 192 shown in the figure represent predecessor/successor links. Depending 193 on the overlay routing model, additional or different links may be 194 present. 196 P P P P P 197 ..+....+....+...+.....+... 198 . +P 199 P+ . 200 . +P 201 ..+....+....+...+.....+... 202 P P P P P 204 Figure 1: Overlay Network Example 206 2.2. Overlay Multicast 208 Overlay Multicast (OM): Hosts participating in a multicast session 209 form an overlay network and utilize unicast connections among pairs 210 of hosts for data dissemination [BUFORD2009], [KOLBERG2010], 211 [BUFORD2008]. The hosts in overlay multicast exclusively handle 212 group management, routing, and tree construction, without any support 213 from Internet routers. This is also commonly known as Application 214 Layer Multicast (ALM) or End System Multicast (ESM). We call systems 215 which use proxies connected in an overlay multicast backbone "proxied 216 overlay multicast" or POM. 218 2.3. Source Specific Multicast (SSM) 220 SSM tree: The creator of the tree is the source. It sends data 221 messages to the tree root which are forwarded down the tree. 223 2.4. Any Source Multicast (ASM) 225 ASM tree: A node sending a data message sends the message to its 226 parent and its children. Each node receiving a data message from one 227 edge forwards it to remaining tree edges it is connected to. 229 2.5. Peer 231 Peer: an autonomous end system that is connected to the physical 232 network and participates in and contributes resources to overlay 233 construction, routing and maintenance. Some peers may also perform 234 additional roles such as connection relays, super nodes, NAT 235 traversal assistance, and data storage. 237 3. Assumptions 238 3.1. Overlay 240 Peers connect in a large-scale overlay, which may be used for a 241 variety of peer-to-peer applications in addition to multicast 242 sessions. Peers may assume additional roles in the overlay beyond 243 participation in the overlay and in multicast trees. We assume a 244 single structured overlay routing algorithm is used. Any of a 245 variety of multi-hop, one-hop, or variable-hop overlay algorithms 246 could be used. 248 Castro et al. [CASTRO2003] compared multi-hop overlays and found 249 that tree-based construction in a single overlay out-performed using 250 separate overlays for each multicast session. We use a single 251 overlay rather than separate overlays per multicast sessions. 253 An overlay multicast algorithm may leverage the overlay's mechanism 254 for maintaining overlay state in the face of churn. For example, a 255 peer may store a number of DHT (Distributed Hash Table) entries. 256 When the peer gracefully leaves the overlay, it transfers those 257 entries to the nearest peer. When another peer joins which is closer 258 to some of the entries than the current peer which holds those 259 entries, than those entries are migrated. Overlay churn affects 260 multicast trees as well; remedies include automatic migration of the 261 tree state and automatic re-join operations for dislocated children 262 nodes. 264 3.2. Overlay Multicast 266 The overlay supports concurrent multiple multicast trees. The limit 267 on number of concurrent trees depends on peer and network resources 268 and is not an intrinsic property of the overlay. 270 3.3. RELOAD 272 We use RELOAD [I-D.ietf-p2psip-base] as the Peer-to-Peer overlay for 273 data storage and the mechanism by which the peers interconnect and 274 route messages. RELOAD is a generic P2P overlay, and application 275 support is defined by profiles called Usages. 277 3.4. NAT 279 Some nodes in the overlay may be in a private address space and 280 behind firewalls. We use the RELOAD mechanisms for NAT traversal. 281 We permit clients to be leaf nodes in an ALM tree. 283 3.5. Tree Topology 284 All tree control messages are routed in the overlay. Two types of 285 data or media topologies are envisioned: 1) tree edges are paths in 286 the overlay, 2) tree edges are direct connections between a parent 287 and child peer in the tree, formed using the RELOAD AppAttach method. 289 4. Architecture Extensions to RELOAD 291 There are two changes as depicted in Figure 2. New ALM messages are 292 mapped to RELOAD Message Transport using the RELOAD experimental 293 message type. A plug-in for ALM algorithms handles the ALM state and 294 control. The ALM Algorithm is under control of the application via 295 the Group API [I-D.irtf-samrg-common-api]. 297 +---------+ 298 |Group API| 299 +---------+ 300 | 301 ------------------- Application ------------------------ 302 +-------+ | 303 | ALM | | 304 | Usage | | 305 +-------+ | 306 -------------- Messaging Service Boundary -------------- 307 | 308 +--------+ +-----------+---------+ +---------+ 309 | Storage|<---> | RELOAD | ALM |<-->| ALM Alg | 310 +--------+ | Message | Messages| +---------+ 311 ^ | Transport | | 312 | +-----------+---------+ 313 v | | 314 +-------------+ | 315 | Topology | | 316 | Plugin | | 317 +-------------+ | 318 ^ | 319 v v 320 +-------------------+ 321 | Forwarding & | 322 | Link Management | 323 +-------------------+ 325 ---------- Overlay Link Service Boundary -------------- 327 Figure 2: RELOAD Architecture Extensions 329 The ALM components interact with RELOAD as follows: 331 o ALM uses the RELOAD data storage functionality to store an ALMTree 332 instance when a new ALM tree is created in the overlay, and to 333 retrieve ALMTree instance(s) for existing ALM trees. 335 o ALM applications and management tools may use the RELOAD data 336 storage functionality to store diagnostic information about the 337 operation of trees, including average number of tree, delay from 338 source to leaf nodes, bandwidth use, packet loss rate. In 339 addition, diagnostic information may include statistics specific 340 to the tree root, or to any node in the tree. 342 5. RELOAD ALM Usage 344 Applications of RELOAD are restricted in the data types that can be 345 stored in the DHT. The profile of accepted data types for an 346 application is referred to as a Usage. RELOAD is designed so that 347 new applications can easily define new Usages. New RELOAD Usages are 348 needed for multicast applications since the data types in base RELOAD 349 and existing usages are not sufficient. 351 We define an ALM Usage in RELOAD. This ALM Usage is sufficient for 352 applications which require ALM functionality in the overlay. Figure 353 2 shows the internal structure of the ALM Usage. This contains the 354 Group API ([I-D.irtf-samrg-common-api]) an ALM algorithm plugin (e.g. 355 Scribe) and the ALM messages which are then sent out to the RELOAD 356 network. 358 A RELOAD Usage is required [I-D.ietf-p2psip-base] to define the 359 following: 361 o Kind-Id and Code points 363 o data structures for each kind 365 o access control rules for each kind 367 o the Resource Name used to hash to the Resource ID that determines 368 where the kind is stored 370 o Addresses restoration of values after recovery from a network 371 partition 373 o the types of connections that can be initiated using AppConnect 375 an ALM GroupID is a RELOAD Node-ID. The owner of an ALM group 376 creates a RELOAD Node-ID as specified in [I-D.ietf-p2psip-base]. 377 This means that a GroupID is used as a RELOAD Destination for overlay 378 routing purposes. 380 6. ALM Tree Control Signaling 382 Peers use the overlay to support ALM operations such as: 384 o Create tree 386 o Join 388 o Leave 390 o Re-Form or optimize tree 392 There are a variety of algorithms for peers to form multicast trees 393 in the overlay. The approach presented here permits multiple such 394 algorithms to be supported in the overlay since different algorithms 395 may be more suitable for certain application requirements, and to 396 support experimentation. Therefore, overlay messaging corresponding 397 to the set of overlay multicast operations MUST carry algorithm 398 identification information. 400 For example, for small groups, the join point might be directly 401 assigned by the rendezvous point, while for large trees the join 402 request might be propagated down the tree with candidate parents 403 forwarding their position directly to the new node. 405 Here is a simplistic notation for forming a multicast tree in the 406 overlay. Its main advantage is the use of the overlay for routing 407 both control and data messages. The group creator does not have to 408 be the root of the tree or even in the tree. It does not consider 409 per node load, admission control, or alternative paths. After the 410 creation of a tree, the groupID is expected to be advertised or 411 distributed out of band, perhaps by publishing in the DHT. 412 Similarly, joining peers will discover the groupID out of band, 413 perhaps by a lookup in the tree. 415 As stated earlier, multiple algorithms will co-exist in the overlay. 417 1. Peer which initiates multicast group: 419 groupID = create(); // Allocate a unique groupId. 420 // The root is the nearest 421 // peer in the overlay. 423 2. Any joining peer: 425 joinTree(groupID); // sends "join groupID" message 427 The overlay routes the join request using the overlay routing 428 mechanism toward the peer with the nearest id to the groupID. 429 This peer is the root. Peers on the path to the root join the 430 tree as forwarding points. 432 3. Leave Tree: 434 leaveTree(groupID) // removes this node from the tree 436 Propagates a leave message to each child node and to the parent 437 node. If the parent node is a forwarding node and this is its 438 last child, then it propagates a leave message to its parent. A 439 child node receiving a leave message from a parent sends a join 440 message to the groupID. 442 4. Message forwarding: 444 multicastMsg(groupID, msg); 446 For the message forwarding both Any Source Multicast (ASM) and 447 Source Specific Multicast (SSM) approaches may be used. 449 7. ALM Messages Mapped to RELOAD 451 7.1. Introduction 453 In this document we define messages for overlay multicast tree 454 creation, using an existing protocol (RELOAD) in the P2P-SIP WG 455 [I-D.ietf-p2psip-base] for a universal structured peer-to-peer 456 overlay protocol. RELOAD provides the mechanism to support a number 457 of overlay topologies. Hence the overlay multicast framework defined 458 in this document can be used with P2P-SIP, and makes the SAM 459 framework overlay agnostic. 461 As discussed in the SAM requirements document 462 [I-D.muramoto-irtf-sam-generic-require], there are a variety of ALM 463 tree formation and tree maintenance algorithms. The intent of this 464 specification is to be algorithm agnostic, similar to how RELOAD is 465 overlay algorithm agnostic. We assume that all control messages are 466 propagated using overlay routed messages. 468 The message types needed for ALM behavior are divided into the 469 following categories: 471 o Tree life-cycle (create, join, leave, re-form, heartbeat) 472 o Peer region and multicast properties 474 The message codes are defined in Section 15.2 of this document. 475 Messages are mapped to the RELOAD experimental message type. 477 In the following sections the protocol messages as mapped to RELOAD 478 are discussed. Detailed example message flows are provided in 479 Section 11. 481 In the following descriptions we use the datatype Dictionary which is 482 a set of opaque values indexed by an opaque key with one value for 483 each key. A single dictionary entry is represented by a 484 DictionaryEntry as defined in Section 7.2.3 of the RELOAD document 485 [I-D.ietf-p2psip-base]. The Dictionary datatype is defined as 486 follows: 488 struct { 489 DictionaryEntry elements<0..2^16-1>; 490 } Dictionary; 492 7.2. Tree Lifecycle Messages 494 Peers use the overlay to transmit ALM (application layer multicast) 495 operations defined in this section. 497 7.2.1. Create Tree 499 A new ALM tree is created in the overlay with the identity specified 500 by group_id. The common interpretation in a DHT based overlay of 501 group_id is that the peer with peer id closest to and less than the 502 group_id is the root of the tree. However, other overlay types are 503 supported. The tree has no children at the time it is created. 505 The group_id is generated from a well-known session key to be used by 506 other peers to address the multicast tree in the overlay. The 507 generation of the group_id from the session_key MUST be done using 508 the overlay's id generation mechanism. 510 struct { 511 node_id peer_id; 512 opaque session_key<0..2^32-1>; 513 node_id group_id; 514 Dictionary options; 515 } ALMTree; 517 peer_id: the overlay address of the peer that creates the multicast 518 tree. 520 session_key: a well-known string that when hashed using the overlay's 521 id generation algorithm produces the group_id. 523 group_id: the overlay address of the root of the tree 525 options: name-value list of properties to be associated with the 526 tree, such as the maximum size of the tree, restrictions on peers 527 joining the tree, latency constraints, preference for distributed or 528 centralized tree formation and maintenance, heartbeat interval. 530 Tree creation is subject to access control since it involves a Store 531 operation. The NODE-MATCH access policy defined in section 7.3.2 of 532 RELOAD is used. 534 A successful Create Tree causes an ALMTree structure to be stored in 535 the overlay at the node G responsible for the group_id. This node G 536 performs the RELOAD-defined StoreReq operation as a side effect of 537 performing the Create Tree. If the StoreReq fails, the Create Tree 538 fails too. 540 After a successful Create Tree, peers can use the RELOAD Fetch method 541 to retrieve the ALMTree struct at address group_id. The ALMTree kind 542 is defined in Section 12.1. 544 7.2.2. CreateTreeResponse 546 After receiving a CreateTree message from node S, the peer sends a 547 CreateTreeReponse to node S. 549 struct { 550 Dictionary options; 551 } CreateTreeResponse; 553 options: A node may provide algorithm-dependent parameters about the 554 created tree to the requesting node. 556 7.2.3. Join 558 Causes the distributed algorithm for peer join of a specific ALM 559 group to be invoked. The definition of the Join message is shown 560 below. If successful, the joining peer is notified of one or more 561 candidate parent peers in one or more JoinAccept messages. The 562 particular ALM join algorithm is not specified in this protocol. 564 struct { 565 node_id peer_id; 566 node_id group_id; 567 Dictionary options; 568 } Join; 570 peer_id: overlay address of joining/leaving peer 572 group_id: the overlay address of the root of the tree 574 options: name-value list of options proposed by joining peer 576 RELOAD is a request-response protocol. Consequently, the messages 577 JoinAccept and JoinReject (defined below) are matching responses for 578 Join. If JoinReject is received, then no further action on this 579 request is carried out. If JoinAccept is received, then either a 580 JoinConfirm or a JoinDecline message (see below) is sent. The 581 matching response for JoinConfirm is JoinConfirmResponse. The 582 matching response for JoinDecline is JoinDeclineResponse. 584 The following list shows the matching request-responses according to 585 the request-response mechanism defined in RELOAD. 587 Join -- JoinAccept: Node C sends a Join request to node P. If 588 node P accepts, it responds with JoinAccept. 590 Join -- JoinReject: Node C sends a Join request to node P. If 591 node P does not accept the join request, it responds with 592 JoinReject. 594 JoinConfirm -- JoinConfirmResponse: If node P sent node C a 595 JoinAccept and node C confirms with a JoinConfirm request then 596 Node P then responds with a JoinConfirmResponse. 598 JoinDecline -- JoinDeclineResponse: If node P sent node C a 599 JoinAccept and node C declines with a JoinDecline request then 600 Node P then responds with a JoinDeclineResponse. 602 Thus Join, JoinConfirm, and JoinDecline are treated as requests as 603 defined in RELOAD, are mapped to the RELOAD exp_a_req message, and 604 are therefore retransmitted until either a retry limit is reached or 605 a matching response received. JoinAccept, JoinReject, 606 JoinConfirmResponse, and JoinDeclineResponse are treated as message 607 responses as defined above, and are mapped to the RELOAD exp_a_ans 608 message. 610 The Join behaviour can be described as follows: 612 if(checkAccept(msg)) { 613 recvJoins.add(msg.source, msg.group_id) 614 SEND(JOINAccept(node_id, msg.source, msg.group_id)) 615 } 617 7.2.4. Join Accept (Join Response) 619 Tells the requesting joining peer that the indicated peer is 620 available to act as its parent in the ALM tree specified by group_id, 621 with the corresponding options specified. A peer MAY receive more 622 than one JoinAccept from different candidate parent peers in the 623 group_id tree. The peer accepts a peer as parent using a JoinConfirm 624 message. A JoinAccept which receives neither a JoinConfirm or 625 JoinDecline message MUST expire. RELOAD implementations are able to 626 read a local configuration file for settings. It is assumed that 627 this file contains the timeout value to be used. 629 struct { 630 node_id parent_peer_id; 631 node_id child_peer_id; 632 node_id group_id; 633 Dictionary options; 634 } JoinAccept; 636 parent_peer_id: overlay address of a peer which accepts the joining 637 peer 639 child_peer_id: overlay address of joining peer 641 group_id: the overlay address of the root of the tree 643 options: name-value list of options accepted by parent peer 645 7.2.5. Join Reject (Join Response) 647 A peer receiving a Join message responds with a JoinReject response 648 to indicate the request is rejected. 650 7.2.6. Join Confirm 652 A peer receiving a JoinAccept message which it wishes to accept MUST 653 explicitly accept it before the expiration of a timer for the 654 JoinAccept message using a JoinConfirm message. The joining peer 655 MUST include only those options from the JoinAccept which it also 656 accepts, completing the negotiation of options between the two peers. 658 struct { 659 node_id child_peer_id; 660 node_id parent_peer_id; 661 node_id group_id; 662 Dictionary options; 663 } JoinConfirm; 665 child_peer_id: overlay address of joining peer which is a child of 666 the parent peer 668 parent_peer_id: overlay address of the peer which is the parent of 669 the joining peer 671 group_id: the overlay address of the root of the tree 673 options: name-value list of options accepted by both peers 675 The JoinConfirm message behaviour is decribed below: 677 if(recvJoins.contains(msg.source,msg.group_id)){ 678 if !(groups.contains(msg.group_id)) { 679 groups.add(msg.group_id) 680 SEND(msg,msg.group_id) 681 } 682 groups[msg.group_id].children.add(msg.source) 683 recvJoins.del(msg.source, msg.group_id) 684 } 686 7.2.7. Join Confirm Response 688 A peer receiving a JoinConfirm message responds with a 689 JoinConfirmResponse message. 691 7.2.8. Join Decline 693 A peer receiving a JoinAccept message which it does not wish to 694 accept it MAY explicitly decline it using a JoinDecline message. 696 struct { 697 node_id peer_id; 698 node_id parent_peer_id; 699 node_id group_id; 700 } JoinDecline; 702 peer_id: overlay address of joining peer which declines the 703 JoinAccept 705 parent_peer_id: overlay address of the peer which issued a JoinAccept 706 to this peer 708 group_id: the overlay address of the root of the tree 710 The behaviour of the JoinDecline message is described as follows: 712 if(recvJoins.contains(msg.source,msg.group_id)) 713 recvJoins.del(msg.source, msg.group_id) 715 7.2.9. Join Decline Response 717 A peer receiving a JoinConfirm message responds with a 718 JoinDeclineResponse message. 720 7.2.10. Leave 722 A peer which is part of an ALM tree identified by group_id which 723 intends to detach from either a child or parent peer SHOULD send a 724 Leave message to the peer it wishes to detach from. A peer receiving 725 a Leave message from a peer which is neither in its parent or child 726 lists SHOULD ignore the message. 728 struct { 729 node_id peer_id; 730 node_id group_id; 731 Dictionary options; 732 } Leave; 734 peer_id: overlay address of leaving peer 736 group_id: the overlay address of the root of the tree 738 options: name-value list of options 740 The behaviour of the Leave message can be described as: 742 groups[msg.group_id].children.remove(msg.source) 743 if (groups[msg.group].children = 0) 744 SEND(msg,groups[msg.group_id].parent) 746 7.2.11. Leave Response 748 A peer receiving a Leave message responds with a LeaveResponse 749 message. 751 7.2.12. Re-Form or Optimize Tree 753 This triggers a reorganization of either the entire tree or only a 754 sub-tree. It MAY include hints to specific peers of recommended 755 parent or child peers to reconnect to. A peer receiving this message 756 MAY ignore it, MAY propagate it to other peers in its subtree, and 757 MAY invoke local algorithms for selecting preferred parent and/or 758 child peers. 760 struct { 761 node_id group_id; 762 node_id peer_id; 763 Dictionary options; 764 } Reform; 766 group_id: the overlay address of the root of the tree 768 peer_id: if omitted, then the tree is reorganized starting from the 769 root, otherwise it is reorganized only at the sub-tree identified by 770 peer_id. 772 options: name-value list of options 774 7.2.13. Reform Response 776 A peer receiving a Reform message responds with a ReformResponse 778 struct { 779 Dictionary options; 780 } ReformResponse; 782 options: algorithm dependent information about the results of the 783 reform operation 785 7.2.14. Heartbeat 787 A child node signals to its adjacent parent nodes in the tree that it 788 is alive. If a parent node does not receive a Heartbeat message 789 within N heartbeat time intervals, it MUST treat this as an explicit 790 Leave message from the unresponsive peer. N is configurable. RELOAD 791 implementations are able to read a local configuration file for 792 settings. It is assumed that this file contains the value for N to 793 be used. 795 struct { 796 node_id peer_id_src; 797 node_id peer_id_dst; 798 node_id group_id; 799 Dictionary options; 800 } Heartbeat; 802 peer_id_src: source of heartbeat 804 peer_id_dst: destination of heartbeat 806 group_id: overlay address of the root of the tree 808 options: an algorithm may use the heartbeat message to provide state 809 information to adjacent nodes in the tree 811 7.2.15. Heartbeat Response 813 A parent node responds with a Heartbeat Response to a Heartbeat from 814 a child node indicating that it has received the Heartbeat message. 816 7.2.16. NodeQuery 818 The NodeQuery message is used to obtain information about the state 819 and performance of the tree on a per node basis. A set of nodes 820 could be queried to construct a centralized view of the multicast 821 trees, similar to a web crawler. 823 struct { 824 node_id peer_id_src; 825 node_id peer_id_dst; 826 } NodeQuery; 828 peer_id_src: source of query 830 peer_id_dst: destination of query 832 7.2.17. NodeQuery Response 834 The response to a NodeQuery message contains a NodeStatistics 835 instance for this node. 837 public struct { 838 uint32 node_lifetime; 839 uint32 total_number_trees; 840 uint16 number_algorithms_supported; 841 uint8 algorithms_supported[32]; 842 TreeData max_tree_data; 843 uint16 active_number_trees; 844 TreeData tree_data<0..2^8-1>; 845 ImplementationInfo imp_info; 846 } NodeStatistics; 848 node_lifetime: time the node has been alive in seconds since last 849 restart 851 total_number_trees: total number of trees this node has been part 852 of during the node lifetime 854 number_algorithms_supported: value between 0..2^16-1 corresponding 855 to the number of algorithms supported 857 algorithms_supported: list of algorithms, each byte encoded using 858 the corresponding algorithm code 860 max_tree_data: data about tree with largest number of nodes that 861 this node was part of. NodeQuery can be used to crawl all the 862 nodes in an ALM tree to fill this field. This is intended to 863 support monitoring, algorithm design, and general experimentation 864 with ALM in RELOAD. 866 active_number_trees: current number of trees that the node is part 867 of 869 tree_data: details of each active tree, the number of such is 870 specified by the number_active_trees. 872 impl_info: information about the implementation of this usage 874 public struct { 875 uint32 tree_id; 876 uint8 algorithm; 877 NodeId tree_root; 878 uint8 number_parents; 879 NodeId parent<0..2^8-1>; 880 Uint16 number_children_nodes; 881 NodeId children<0..2^16-1>; 882 Uint32 path_length_to_root; 883 Uint32 path_delay_to_root; 884 Uint32 path_delay_to_child; 885 } TreeData; 887 tree_id: the id of the tree 889 algorithm: code identifying the multicast algorithm used by this 890 tree 892 tree_root: node_id of tree root, or 0 if unknown 894 number_parents: 0 .. 2^8-1 indicates number of parent nodes for 895 this node 897 parent: the RELOAD NodeId of each parent node 899 number_children_nodes: 0..2^16-1 indicates number of children 901 children: the RELOAD NodeId of each child node 903 path_length_to_root: number of overlay hops to the root of the 904 tree 906 path_delay_to_root: RTT in millisec. to root node 908 path_delay_to_child: last measured RTT in msec to child node with 909 largest RTT. 911 public struct { 912 uint32 join_confim_timeout; 913 uint32 heartbeat_interval; 914 uint32 heartbeat_reponse_timeout; 915 uint16 info_length; 916 uint8 info<0..2^16-1>; 917 } ImplementationInfo; 919 join_confirm_timeout: The default time for join confirm/decline, 920 intended to provide sufficient time for a join request to receive 921 all responses and confirm the best choice. Default value is 5000 922 msec. An implementation can change this value. 924 heartbeat interval: The default heartbeat interval is 2000 msec. 925 Different interoperating implementations could use different 926 intervals. 928 heartbeat timeout interval: The default heartbeat timeout is 5000 929 msec, and is the max time between heartbeat reports from an 930 adjacent node in the tree at which point the heartbeat is missed. 932 info_length: length of the info field 934 info: implementation specific information, such as name of 935 implementation, build version, and implementation specific 936 features 938 7.2.18. Push 940 A peer sends arbitrary multicast data to other peers in the tree. 941 Nodes in the tree forward this message to adjacent nodes in the tree 942 in an algorithm dependent way. 944 struct { 945 node_id group_id; 946 uint8 priority; 947 uint32 length; 948 uint8 data<0..2^32-1>; 949 } Push; 951 group_id: overlay address of root of the ALM tree 953 priority: the relative priority of the message, highest priority is 954 255. A node may ignore this field 956 length: length of the data field in bytes 958 data: the data 960 In pseudocode the behaviour of Push can be described as: 962 foreach(groups[msg.group_id].children as node_id) 963 SEND(msg,node_id) 964 if memberOf(msg.group_id) 965 invokeMessageHandler(msg.group_id, msg) 967 7.2.19. PushResponse 969 After receiving a Push message from node S, the receiving peer sends 970 a PushReponse to node S. 972 struct { 973 Dictionary options; 975 } PushResponse; 977 options: A node may provide feedback to the sender about previous 978 push messages in some window, for example, the last N push messages. 979 The feedback could include, for each push message received, the 980 number of adjacent nodes which were forwarded the push message, and 981 the number of adjacent nodes from which a PushResponse was received. 983 8. Scribe Algorithm 985 8.1. Overview 987 Figure 3 shows a mapping between RELOAD ALM messages (as defined in 988 Section 5 of this document) and Scribe messages as defined in 989 [CASTRO2002]. 991 +---------+-------------------+-----------------+ 992 | Section |RELOAD ALM Message | Scribe Message | 993 +---------+-------------------+-----------------+ 994 | 7.2.1 | CreateALMTree | Create | 995 +---------+-------------------+-----------------+ 996 | 7.2.2 | Join | Join | 997 +---------+-------------------+-----------------+ 998 | 7.2.3 | JoinAccept | | 999 +---------+-------------------+-----------------+ 1000 | 7.2.4 | JoinConfirm | | 1001 +---------+-------------------+-----------------+ 1002 | 7.2.5 | JoinDecline | | 1003 +---------+-------------------+-----------------+ 1004 | 7.2.6 | Leave | Leave | 1005 +---------+-------------------+-----------------+ 1006 | 7.2.7 | Reform | | 1007 +---------+-------------------+-----------------+ 1008 | 7.2.8 | Heartbeat | | 1009 +---------+-------------------+-----------------+ 1010 | 7.2.9 | NodeQuery | | 1011 +---------+-------------------+-----------------+ 1012 | 7.2.10 | Push | Multicast | 1013 +---------+-------------------+-----------------+ 1014 | | Note 1 | deliver | 1015 +---------+-------------------+-----------------+ 1016 | | Note 1 | forward | 1017 +---------+-------------------+-----------------+ 1018 | | Note 1 | route | 1019 +---------+-------------------+-----------------+ 1020 | | Note 1 | send | 1021 +---------+-------------------+-----------------+ 1022 Figure 3: Mapping to Scribe Messages 1024 Note 1: These Scribe messages are handled by RELOAD messages. 1026 The following sections describe the Scribe algorithm in more detail. 1028 8.2. Create 1030 This message will create a group with group_id. This message MUST be 1031 delivered to the node whose node_id is closest to the group_id. This 1032 node becomes the rendezvous point and root for the new multicast 1033 tree. Groups MAY have multiple sources of multicast messages. 1035 8.3. Join 1037 To join a multicast tree a node SHOULD send a JOIN request with the 1038 group_id as the key. This message gets routed by the overlay to the 1039 rendezvous point of the tree. If an intermediate node is already a 1040 forwarder for this tree, it SHOULD add the joining node as a child. 1041 Otherwise the node SHOULD create a child table for the group and add 1042 the joining node. It SHOULD then send the JOIN request towards the 1043 rendevous point terminating the JOIN message from the child. 1045 To adapt the Scribe algorithm into the ALM Usage proposed here, after 1046 a JOIN request is accepted, a JOINAccept message MUST be returned to 1047 the joining node. 1049 8.4. Leave 1051 When leaving a multicast group a node SHOULD change its local state 1052 to indicate that it left the group. If the node has no children in 1053 its table it MUST send a LEAVE request to its parent, from where it 1054 SHOULD travel up the multicast tree and stop at a node which has 1055 still children remaining after removing the leaving node. 1057 8.5. JoinConfirm 1059 This message is not part of the Scribe protocol, but required by the 1060 basic protocol proposed in this document. Thus the usage MUST send 1061 this message to confirm a joining node accepting its parent node. 1063 8.6. JoinDecline 1065 Like JoinConfirm, this message is not part of the Scribe protocol. 1066 Thus the usage MUST send this message if a peer receiving a 1067 JoinAccept message wishes to decline it. 1069 8.7. Multicast 1071 A message to be multicast to a group MUST be sent to the rendevous 1072 node from where it is forwarded down the tree. If a node is a member 1073 of the tree rather than just a forwarder it SHOULD pass the multicast 1074 data up to the application. 1076 9. P2PCast Algorithm 1078 9.1. Overview 1080 P2PCast [P2PCAST] creates a forest of related trees to increase load 1081 balancing. P2PCast is independent of the underlying P2P substrate. 1082 Its goals and approach are similar to Splitstream [SPLITSTREAM] 1083 (which assumes Pastry as the P2P overlay). In P2PCast the content 1084 provider splits the stream of data into f stripes. Each tree in the 1085 forest of multicast trees is an (almost) full tree of arity f. These 1086 trees are conceptually separate: every node of the system appears 1087 once in each tree, with the content provider being the source in all 1088 of them. To ensure that each peer contributes as much bandwidth as 1089 it receives, every node is a leaf in all the trees except for one, in 1090 which the node will serve as an internal node (proper tree of this 1091 node). The remainder of this section will assume f=2 for the 1092 discussion. This is to keep the complexity for the description down. 1093 However, the algorithm scales for any number f. 1095 P2PCast distinguishes the following types of nodes: 1097 o Incomplete Nodes: A node with less than f children in its proper 1098 stripe; 1100 o Only-Child Nodes: A node whose parent (in any multicast tree) is 1101 an incomplete node; 1103 o Complete Nodes: A node with exactly f children in its proper 1104 stripe 1106 o Special Node: A single node which is a leaf in all multicast trees 1107 of the forest 1109 9.2. Message Mapping 1111 Figure 4 shows a mapping between RELOAD ALM messages (as defined in 1112 Section 5 of this document) and P2PCast messages as defined in 1113 [P2PCAST]. 1115 +---------+-------------------+-----------------+ 1116 | Section |RELOAD ALM Message | P2PCast Message | 1117 +---------+-------------------+-----------------+ 1118 | 7.2.1 | CreateALMTree | Create | 1119 +---------+-------------------+-----------------+ 1120 | 7.2.2 | Join | Join | 1121 +---------+-------------------+-----------------+ 1122 | 7.2.3 | JoinAccept | | 1123 +---------+-------------------+-----------------+ 1124 | 7.2.4 | JoinConfirm | | 1125 +---------+-------------------+-----------------+ 1126 | 7.2.5 | JoinDecline | | 1127 +---------+-------------------+-----------------+ 1128 | 7.2.6 | Leave | Leave | 1129 +---------+-------------------+-----------------+ 1130 | 7.2.7 | Reform | Takeon | 1131 | | | Substitute | 1132 | | | Search | 1133 | | | Replace | 1134 | | | Direct | 1135 | | | Update | 1136 +---------+-------------------+-----------------+ 1137 | 7.2.8 | Heartbeat | | 1138 +---------+-------------------+-----------------+ 1139 | 7.2.9 | NodeQuery | | 1140 +---------+-------------------+-----------------+ 1141 | 7.2.10 | Push | Multicast | 1142 +---------+-------------------+-----------------+ 1144 Figure 4: Mapping to P2PCast Messages 1146 The following sections describe the mapping of the P2PCast messages 1147 in more detail. 1149 9.3. Create 1151 This message will create a group with group_id. This message MUST be 1152 delivered to the node whose node_id is closest to the group_id. This 1153 node becomes the rendezvous point and root for the new multicast 1154 tree. The rendezvous point will maintain f subtrees. 1156 9.4. Join 1158 To join a multicast tree a joining node N MUST send a JOIN request to 1159 a random node A already part of the tree. Depending of the type of A 1160 the joining algorithm continues as follows: 1162 o Incomplete Nodes: Node A will arbitrarily select for which tree it 1163 wants to serve as an internal node, and adopt N in that tree. In 1164 the other tree node N will adopt node A as a child (taking node 1165 A's place in the tree) thus becoming an internal node in the 1166 stripe that node A didn't choose. 1168 o Only-Child Nodes: As this node has a parent which is an incomplete 1169 node, the joining node will be redirected to the parent node and 1170 will handle the request as detailed above. 1172 o Complete Nodes: The contacted node A must be a leaf in the other 1173 tree. If node A is a leaf node in Stripe 1, node N will become an 1174 internal node in Stripe 1, taking the place of node A, adopting it 1175 at the same time. To find a place for itself in the other stripe, 1176 node N starts a random walk down the subtree rooted at the sibling 1177 of node A (if node A is the root and thus does not have siblings, 1178 node N is sent directly to a leaf in that tree), which ends as 1179 soon as node N finds an incomplete node or a leaf. In this case 1180 node N is adopted by the incomplete node. 1182 o Special Node: as this node is a leaf in all subtrees, the joining 1183 node MAY adopt the node in one tree and become a child in the 1184 other. 1186 P2PCast uses defined messages for communication between nodes during 1187 reorganisation. To use P2PCast in this context, these messages are 1188 encapsulated by the message type REFORM. In doing so, the P2PCast 1189 message is to be included in the options parameter of REFORM. The 1190 following reorganisation messages are defined by P2PCast: 1192 TAKEON: To take another peer as a child 1194 SUBSTITUTE: To take the place of a child of some peer 1196 SEARCH: To obtain the child of a node in a particular stripe 1198 REPLACE: Different from SUBSTITUTE in that the node which makes us 1199 its child sheds off a random child 1201 DIRECT: To direct a node to its would-be parent 1203 UPDATE: A node sends its updated state to its children 1205 To adapt the P2PCast algorithm into the ALM Usage proposed here, 1206 after a JOIN request is accepted, a JOINAccept message MUST be 1207 returned to the joining node (one for every subtree). 1209 9.5. Leave 1211 When leaving a multicast group a node will change its local state to 1212 indicate that it left the group. Disregarding the case where the 1213 leaving node is the root of the tree, the leaving node must be 1214 complete or incomplete in its proper tree. In the other trees the 1215 node is a leaf and can just disappear by notifying its parent. For 1216 the proper tree, if the node is incomplete, it is replaced by its 1217 child. However, if the node is complete, a gap is created which is 1218 filled by a random child. If this child is incomplete, it can simply 1219 fill the gap. However, if it is complete, it needs to shed a random 1220 child. This child is directed to its sibling, which sheds a random 1221 child. This process ripples down the tree until the next-to-last 1222 level is reached. The shed node is then taken as a child by the 1223 parent of the deleted node in the other stripe. 1225 Again, for the reorganisation of the tree, the REFORM message type is 1226 used as defined in the previous section. 1228 9.6. JoinConfirm 1230 This message is not part of the P2PCast protocol, but required by the 1231 basic protocol defined in this document. Thus the usage MUST send 1232 this message to confirm a joining node accepting its parent node. As 1233 with Join and JoinAccept, this MUST be carried out for every subtree. 1235 9.7. Multicast 1237 A message to be multicast to a group MUST be sent to the rendezvous 1238 node from where it is forwarded down the tree by being split into k 1239 stripes. Each stripe is then sent via a subtree. If a receiving 1240 node is a member of the tree rather than just a forwarder it MAY pass 1241 the multicast data up to the application. 1243 10. Message Format 1245 All messages are mapped to the RELOAD experimental message type. The 1246 mapping is given in the following table. The message codes are given 1247 in Section 15.2. The format of the body of a message is given in 1248 Figure 5. 1250 +-------------------------+------------------+ 1251 | Message |RELOAD Code Point | 1252 +-------------------------+------------------+ 1253 | CreateALMTree | exp_a_req | 1254 +-------------------------+------------------+ 1255 | CreateALMTreeResponse | exp_a_ans | 1256 +-------------------------+------------------+ 1257 | Join | exp_a_req | 1258 +-------------------------+------------------+ 1259 | JoinAccept | exp_a_ans | 1260 +-------------------------+------------------+ 1261 | JoinReject | exp_a_ans | 1262 +-------------------------+------------------+ 1263 | JoinConfirm | exp_a_req | 1264 +-------------------------+------------------+ 1265 | JoinConfirmResponse | exp_a_ans | 1266 +-------------------------+------------------+ 1267 | JoinDecline | exp_a_req | 1268 +-------------------------+------------------+ 1269 | JoinDeclineResponse | exp_a_ans | 1270 +-------------------------+------------------+ 1271 | Leave | exp_a_req | 1272 +-------------------------+------------------+ 1273 | LeaveResponse | exp_a_ans | 1274 +-------------------------+------------------+ 1275 | Reform | exp_a_req | 1276 +-------------------------+------------------+ 1277 | ReformResponse | exp_a_ans | 1278 +-------------------------+------------------+ 1279 | Heartbeat | exp_a_req | 1280 +-------------------------+------------------+ 1281 | HeartbeatResponse | exp_a_ans | 1282 +-------------------------+------------------+ 1283 | NodeQuery | exp_a_req | 1284 +-------------------------+------------------+ 1285 | NodeQueryResponse | exp_a_ans | 1286 +-------------------------+------------------+ 1287 | Push | exp_a_req | 1288 +-------------------------+------------------+ 1289 | PushResponse | exp_a_ans | 1290 +-------------------------+------------------+ 1292 Figure 5: RELOAD Message Code mapping 1294 For Data Kind-IDs, the RELOAD specification states: "Code points in 1295 the range 0xf0000001 to 0xfffffffe are reserved for private use". 1296 ALM Usage Kind-IDs are defined in the private use range. 1298 All ALM Usage messages map to the RELOAD Message Extension mechanism. 1300 Code points for the kinds defined in this document MUST NOT conflict 1301 with any defined code points for RELOAD. RELOAD defines exp_a_req, 1302 exp_a_ans for experimental purposes. This specification uses only 1303 these message types for all ALM messages. RELOAD defines the 1304 MessageContents data structure. The ALM mapping uses the fields as 1305 follows: 1307 o message_code: exp_a_req for requests and exp_a_ans for responses 1308 o message_body: contains one instance of ALMHeader followed by one 1309 instance of ALMMessageContents 1311 o extensions: unused 1313 10.1. ALMHeader Definition 1315 struct { 1316 uint32 sam_token; 1317 uint16 alm_algorithm_id; 1318 uint8 version; 1319 } ALMHeader; 1321 The fields in ALMHeader are used as follows: 1323 sam_token: The first four bytes identify this message as an ALM 1324 message. This field MUST contain the value 0xd3414d42 (the string 1325 "SAMB" with the high bit of the first byte set. 1327 alm_algorithm_id: The ALM Algorith ID of the ALM algorithm being 1328 used. Each multicast tree uses only one algorithm. Trees with 1329 different ALM algorithms can co-exist, and can share the same 1330 nodes. ALM Algorithm ID codes are defined in Section 15.1 1332 version: The version of the ALM protocol being used. This is a 1333 fixed point integer between 0.1 and 25.4 This document describes 1334 version 1.0 with a value of 0xa. 1336 10.2. ALMMessageContents Definition 1338 struct { 1339 uint16 alm_message_code; 1340 opaque alm_message_body; 1341 } ALMMessageContents; 1343 The fields in ALMMessageContents are used as follows: 1345 alm_message_code: This indicates the message being sent. The 1346 message codes are listed in Section 15.2. 1348 alm_message_body: The message body itself, represented as a 1349 variable-length string of bytes. The bytes themselves are 1350 dependent on the code value. See Section 8 and Section 9 1351 describing the various ALM methods for the definitions of the 1352 payload contents. 1354 10.3. Response Codes 1356 Response codes are defined in section 6.3.3.1 in RELOAD. This 1357 specification maps to RELOAD ErrorResponse as follows: 1359 ErrorResponse.error_code = Error_Exp_A; 1361 Error_info contains an ALMErrorResponse instance. 1363 public struct { 1364 uint16 alm_error_code; 1365 opaque alm_error_info<0..2^16-1>; 1366 } ALMErrorResponse; 1368 alm_error_code: The following error code values are defined. Numeric 1369 values for these are defined in section Section 15.3. 1371 Error_Unknown_Algorithm: The multicast algorithm is not known or 1372 not supported. 1374 Error_Child_Limit_Reached: The maximum number of children nodes 1375 has been reached for this node 1377 Error_Node_Bandwidth_Reached: The overall data bandwidth limit 1378 through this node has been reached 1380 Error_Node_Conn_Limit_Reached: The total number of connections to 1381 this node has been reached 1383 Error_Link_Cap_Limit_Reached: The capacity of a link has been 1384 reached 1386 Error_Node_Mem_Limit_Reached: An internal memory capacity of the 1387 node has been reached 1389 Error_Node_CPU_Cap_Limit_Reached: An internal processing capacity 1390 of the node has been reached 1392 Error_Path_Limit_Reached: The maximum path length in hopcount over 1393 the multicast tree has been reached 1395 Error_Path_Delay_Limit_Reached: The maximum path length in message 1396 delay over the multicast tree has been reached 1398 Error_Tree_Fanout_Limit_Reached: The maximum fanout of a multicast 1399 tree has been reached 1400 Error_Tree_Depth_Limit_Reached: The maximum height of a multicast 1401 tree has been reached 1403 Error_Other: A human-readable description is placed in the 1404 alm_error_info field. 1406 11. Examples 1408 All peers in the examples are assumed to have completed 1409 bootstrapping. "Pn" refers to peer N. "GroupID" refers to a peer 1410 responsible for storing the ALMTree instance with GroupID. 1412 11.1. Create Tree 1414 A node with "NODE-MATCH" rights sends a request CreateTree to the 1415 group-id node, which also has NODE-MATCH rights for its own address. 1416 The group-id node determines whether to create the new tree, and if 1417 so, performs a local StoreReq. If the CreateTree succeeds, the 1418 ALMTree instance can be retrieved using Fetch. An example message 1419 flow for ceating a tree is depicted in Figure 6. 1421 P1 P2 P3 P4 GroupID 1422 | | | | | 1423 | | | | | 1424 | | | | | 1425 | CreateTree | | | 1426 |------------------------------->| 1427 | | | | | 1428 | | | | | StoreReq 1429 | | | | |--+ 1430 | | | | | | 1431 | | | | | | 1432 | | | | |<-+ 1433 | | | | | StoreResponse 1434 | | | | |--+ 1435 | | | | | | 1436 | | | | | | 1437 | | | | |<-+ 1438 | | | | | 1439 | | | | | 1440 | | CreateTreeResponse | 1441 |<-------------------------------| 1442 | | | | | 1443 | | | | | 1444 | Fetch | | | 1445 |------------------------------->| 1446 | | | | | 1447 | | | | | 1448 | | FetchResponse | 1449 |<-------------------------------| 1450 | | | | | 1452 Figure 6: Message flow example for CreateTree. 1454 11.2. Join Tree 1456 P1 joins node GroupID as child node. P2 joins the tree as a child of 1457 P1. P4 joins the tree as a child of P1. The corresponding message 1458 flow is shown in Figure 7. 1460 P1 P2 P3 P4 GroupID 1461 | | | | | 1462 | | | | | 1463 | Join | 1464 |------------------------------->| 1465 | | | | | 1466 | JoinAccept | 1467 |<-------------------------------| 1468 | | | | | 1469 | | | | | 1470 | |Join | 1471 | |----------------------->| 1472 | | | | | 1473 | Join| 1474 |<-------------------------------| 1475 | | | | | 1476 |JoinAccept | | | 1477 |------>| | | | 1478 | | | | | 1479 |JoinConfirm | | | 1480 |<------| | | | 1481 | | | | | 1482 | | | |Join | 1483 | | | |------>| 1484 | | | | Join | 1485 |<-------------------------------| 1486 | | | | | 1487 | Join | | | | 1488 |------>| | | | 1489 | | | | | 1490 | JoinAccept | | | 1491 |----------------------->| | 1492 | | | | | 1493 | | JoinAccept | | 1494 | |--------------->| | 1495 | | | | | 1496 | | | | | 1497 | | Join Confirm | | 1498 |<-----------------------| | 1499 | | | | | 1500 | | Join Decline | | 1501 | |<---------------| | 1502 | | | | | 1503 | | | | | 1505 Figure 7: Message flow example for tree Join. 1507 11.3. Leave Tree 1509 P1 P2 P3 P4 GroupID 1510 | | | | | 1511 | | | | | 1512 | | | Leave | | 1513 |<-----------------------| | 1514 | | | | | 1515 | LeaveResponse | | | 1516 |----------------------->| | 1517 | | | | | 1518 | | | | | 1520 Figure 8: Message flow example for Leave tree. 1522 11.4. Push Data 1524 The multicast data is pushed recursively P1 => GroupID => P1 => P2, 1525 P4 following the tree topology created in the Join example above. An 1526 example message flow is shown in Figure 9. 1528 P1 P2 P3 P4 GroupID 1529 | | | | | 1530 | Push | | | | 1531 |------------------------------->| 1532 | | | | | 1533 | | | PushResponse| 1534 |<-------------------------------| 1535 | | | | | 1536 | | | | Push| 1537 |<-------------------------------| 1538 | | | | | 1539 | PushResponse | | | 1540 |------------------------------->| 1541 | | | | | 1542 |Push | | | | 1543 |------>| | | | 1544 | | | | | 1545 |PushResponse | | | 1546 |<------| | | | 1547 | | | | | 1548 | Push | | | | 1549 |----------------------->| | 1550 | | | | | 1551 | | PushResponse | | 1552 |<-----------------------| | 1553 | | | | | 1554 | | | | | 1555 | | | | | 1557 Figure 9: Message flow example for pushing data. 1559 12. Kind Definitions 1561 12.1. ALMTree Kind Definition 1563 This section defines the ALMTree kind per section 7.4.5 in RELOAD. 1564 An instance of the ALMTree kind is stored in the overlay for each ALM 1565 tree instance. It is stored at the address group_id. 1567 Kind-Id: 0xf0000001 (This is a private-use code-point per section 1568 14.6 of RELOAD.) The Resource Name for the ALMTree Kind-ID is the 1569 session_key used to identify the ALM tree. 1571 Data Model The data model is the ALMTree structure. 1573 Access Control NODE-MATCH. The node performing the store operation 1574 is required to have NODE-MATCH access. 1576 Meaning: The meaning of the fields is given in Section 7.2.1. 1578 struct { 1579 node_id peer_id; 1580 opaque session_key<0..2^32-1>; 1581 node_id group_id; 1582 Dictionary options; 1583 } ALMTree; 1585 13. RELOAD Configuration File Extensions 1587 There are no ALM parameters defined for the RELOAD configuration 1588 file. 1590 14. Change History 1592 o Version 02: Remove Hybrid ALM material. Define ALMTree kind. 1593 Define new RELOAD messages. Define RELOAD architecture 1594 extensions. Add Scribe as base algorithm for ALM usage. Define 1595 code points. Define preliminary ALM-specific security issues. 1597 o Version 03: Add P2Pcast Algorithm. 1599 o Version 04: Add mapping to RELOAD experimental message. Modified 1600 IANA considerations section. Changed category of id from 1601 Informational to Experimental. New algorithm identification 1602 coding. New message coding. Added push message. Create Tree 1603 access policy changed to use NODE-MATCH. Create Tree StoreReq 1604 clarified. Updated the diagrams in the Examples section. Added a 1605 Push data example. Defined the ALMTree kind. 1607 Version 05: Updated references. Fixed typos. 1609 Version 06: Fixed typos. 1611 15. IANA Considerations 1613 This section contains the new code points registered by this 1614 document. [NOTE TO IANA/RFC-EDITOR: Please replace RFC-to-be with 1615 the RFC number for this specification in the following list. ] 1617 15.1. ALM Algorithm Types 1619 We request that IANA create a "SAM ALM Algorithm ID" Registry. 1620 Entries in this registry are 16-bit integers denoting Application 1621 Layer Multicast algorithms as described in section Section 10.1 of 1622 [RFC-to-be]. Code points in the range 0x3 to 0x7fff SHALL be 1623 registered via RFC 5226 [RFC5226] Expert Review. Code points in the 1624 range 0x7fff to 0xfffe are reserved for private use. The initial 1625 contents of this registry are: 1627 +----------------+-------------------+-----------+ 1628 | Algorithm Name | ALM Algorith ID | RFC | 1629 +----------------+-------------------+-----------+ 1630 | INVALID-ALG | 0 | RFC-to-be | 1631 | SCRIBE-SAM | 1 | RFC-to-be | 1632 | P2PCAST-SAM | 2 | RFC-to-be | 1633 | Reserved | 0x3..0xffff | RFC-to-be | 1634 +----------------+-------------------+-----------+ 1636 Figure 10 1638 These values have been made available for the purposes of 1639 experimentation. These values are not meant for vendor specific use 1640 of any sort and MUST NOT be used for operational deployments. 1642 15.2. Message Code Registration 1644 We request that IANA create a "SAM ALM Message Code" Registry. 1645 Entries in this registry are 16-bit integers denoting message codes 1646 as described in section Section 10.2 of [RFC-to-be]. Code points in 1647 the range 0x14 to 0x7fff SHALL be registered via RFC 5226 [RFC5226] 1648 Expert Review. Code points in the range 0x7fff to 0xfffe are 1649 reserved for private use. The initial contents of this registry are: 1651 +-------------------------+----------------------+-----------+ 1652 | Message Code Name | Message Code Value | RFC | 1653 +-------------------------+----------------------+-----------+ 1654 | InvalidMessageCode | 0 | RFC-to-be | 1655 | CreateALMTRee | 1 | RFC-to-be | 1656 | CreateALMTreeResponse | 2 | RFC-to-be | 1657 | Join | 3 | RFC-to-be | 1658 | JoinAccept | 4 | RFC-to-be | 1659 | JoinReject | 5 | RFC-to-be | 1660 | JoinConfirm | 6 | RFC-to-be | 1661 | JoinConfirmResponse | 7 | RFC-to-be | 1662 | JoinDecline | 8 | RFC-to-be | 1663 | JoinDeclineResponse | 9 | RFC-to-be | 1664 | Leave | 10 | RFC-to-be | 1665 | LeaveResponse | 11 | RFC-to-be | 1666 | Reform | 12 | RFC-to-be | 1667 | ReformResponse | 13 | RFC-to-be | 1668 | Heartbeat | 14 | RFC-to-be | 1669 | HeartbeatResponse | 15 | RFC-to-be | 1670 | NodeQuery | 16 | RFC-to-be | 1671 | NodeQueryResponse | 17 | RFC-to-be | 1672 | Push | 18 | RFC-to-be | 1673 | PushResponse | 19 | RFC-to-be | 1674 | Reserved | 0x14..0xffff | RFC-to-be | 1675 +-------------------------+----------------------+-----------+ 1677 Figure 11 1679 These values have been made available for the purposes of 1680 experimentation. These values are not meant for vendor specific use 1681 of any sort and MUST NOT be used for operational deployments. 1683 15.3. Error Code Registration 1684 We request that IANA create a "SAM ALM Error Code" Registry. Entries 1685 in this registry are 16-bit integers denoting error codes as 1686 described in section Section 10.3 of [RFC-to-be]. Code points in the 1687 range 0x14 to 0x7fff SHALL be registered via RFC 5226 [RFC5226] 1688 Expert Review. Code points in the range 0x7fff to 0xfffe are 1689 reserved for private use. The initial contents of this registry are: 1691 +----------------------------------+--------------+-----------+ 1692 | Error Code Name | Code Value | RFC | 1693 +----------------------------------+--------------+-----------+ 1694 | InvalidErrorCode | 0 | RFC-to-be | 1695 | Error_Unknown_Algorithm | 1 | RFC-to-be | 1696 | Error_Child_Limit_Reached | 2 | RFC-to-be | 1697 | Error_Node_Bandwidth_Reached | 3 | RFC-to-be | 1698 | Error_Node_Conn_Limit_Reached | 4 | RFC-to-be | 1699 | Error_Link_Cap_Limit_Reached | 5 | RFC-to-be | 1700 | Error_Node_Mem_Limit_Reached | 6 | RFC-to-be | 1701 | Error_Node_CPU_Cap_Limit_Reached | 7 | RFC-to-be | 1702 | Error_Path_Limit_Reached | 8 | RFC-to-be | 1703 | Error_Path_Delay_Limit_Reached | 9 | RFC-to-be | 1704 | Error_Tree_Fanout_Limit_Reached | 10 | RFC-to-be | 1705 | Error_Tree_Depth_Limit_Reached | 11 | RFC-to-be | 1706 | Error_Other | 12 | RFC-to-be | 1707 | Reserved | 0x0D..0xffff | RFC-to-be | 1708 +----------------------------------+--------------+-----------+ 1710 Figure 12 1712 These values have been made available for the purposes of 1713 experimentation. These values are not meant for vendor specific use 1714 of any sort and MUST NOT be used for operational deployments. 1716 16. Security Considerations 1718 Overlays are vulnerable to DOS and collusion attacks. We are not 1719 solving overlay security issues. We assume the node authentication 1720 model as defined in [I-D.ietf-p2psip-base]. 1722 ALM Usage specific security issues: 1724 o Right to create GroupID at some node_id 1726 o Right to store Tree info at some Location in the DHT 1728 o Limit on # messages / sec and bandwidth use 1730 o Right to join an ALM tree 1732 17. Acknowledgement 1734 Marc Petit-Huguenin, Michael Welzl, Joerg Ott, and Lars Eggert 1735 provided important comments on earlier versions of this document. 1737 18. Informative References 1739 [BUFORD2008] 1740 Buford, J. and H. Yu, "Peer-to-Peer Overlay Multicast", 1741 Encyclopedia of Wireless and Mobile Communications 2008, 1742 2008, . 1745 [BUFORD2009] 1746 Buford, J., Yu, H., and E. Lua, "P2P Networking and 1747 Applications (Chapter 9)", Morgan Kaufman 2009, 2009, 1748 . 1750 [CASTRO2002] 1751 Castro, M., Druschel, P., Kermarrec, A., and A. Rowstron, 1752 "Scribe: A large-scale and decentralized application-level 1753 multicast infrastructure", IEEE Journal on Selected Areas 1754 in Communications vol.20, No.8, October 2002, . 1758 [CASTRO2003] 1759 Castro, M., Jones, M., Kermarrec, A., Rowstron, A., 1760 Theimer, M., Wang, H., and A. Wolman, "An Evaluation of 1761 Scalable Application-level Multicast Built Using Peer-to- 1762 peer overlays", Proceedings of IEEE INFOCOM 2003, April 1763 2003, . 1766 [I-D.ietf-p2psip-base] 1767 Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and 1768 H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) 1769 Base Protocol", draft-ietf-p2psip-base-26 (work in 1770 progress), February 2013. 1772 [I-D.irtf-samrg-common-api] 1773 Waehlisch, M., Schmidt, T., and S. Venaas, "A Common API 1774 for Transparent Hybrid Multicast", draft-irtf-samrg- 1775 common-api-06 (work in progress), August 2012. 1777 [I-D.muramoto-irtf-sam-generic-require] 1778 Muramoto, E., "Requirements for Scalable Adaptive 1779 Multicast Framework in Non-GIG Networks", draft-muramoto- 1780 irtf-sam-generic-require-01 (work in progress), November 1781 2006. 1783 [KOLBERG2010] 1784 Kolberg, M., "Employing Multicast in P2P Networks", 1785 Handbook of Peer-to-Peer Networking (Ed. X.Shen, H. Yu, J. 1786 Buford, M. Akon) 2010, 2010, . 1789 [P2PCAST] Nicolosi, A. and S. Annapureddy, "P2PCast: A Peer-to-Peer 1790 Multicast Scheme for Streaming Data", Stanford Secure 1791 Computer Systems Group Report 2003, May 2003, . 1794 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1795 Requirement Levels", BCP 14, RFC 2119, March 1997. 1797 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 1798 Text on Security Considerations", BCP 72, RFC 3552, July 1799 2003. 1801 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1802 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1803 May 2008. 1805 [SPLITSTREAM] 1806 Castro, M., Druschel, P., Nandi, A., Kermarrec, A., 1807 Rowstron, A., and A. Singh, "SplitStream: High-bandwidth 1808 multicast in a cooperative environment", SOSP'03,Lake 1809 Bolton, New York 2003, October 2003, . 1813 Authors' Addresses 1815 John Buford 1816 Avaya Labs Research 1817 211 Mt. Airy Rd 1818 Basking Ridge, New Jersey 07920 1819 USA 1821 Phone: +1 908 848 5675 1822 Email: buford@avaya.com 1823 Mario Kolberg (editor) 1824 University of Stirling 1825 Dept. Computing Science and Mathematics 1826 Stirling FK9 4LA 1827 UK 1829 Phone: +44 1786 46 7440 1830 Email: mkolberg@ieee.org 1831 URI: http://www.cs.stir.ac.uk/~mko