idnits 2.17.1 draft-whetten-rmtp-ii-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-23) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 483 has weird spacing: '...ce with quali...' == Line 2050 has weird spacing: '... to the issui...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 8, 1998) is 9512 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 2971 -- Looks like a reference, but probably isn't: '1' on line 2973 == Missing Reference: 'T' is mentioned on line 1858, but not defined == Unused Reference: 'WMK94' is defined on line 3259, but no explicit reference was found in the text ** Downref: Normative reference to an Informational RFC: RFC 1301 (ref. 'AFK92') -- Possible downref: Non-RFC (?) normative reference: ref. 'BOGKS94' -- Possible downref: Non-RFC (?) normative reference: ref. 'FVLMZ96' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90' -- Possible downref: Non-RFC (?) normative reference: ref. 'LC98' -- Possible downref: Non-RFC (?) normative reference: ref. 'LG96' -- Possible downref: Non-RFC (?) normative reference: ref. 'LG97' -- Possible downref: Non-RFC (?) normative reference: ref. 'LLG96' -- Possible downref: Non-RFC (?) normative reference: ref. 'MRTW97' -- Possible downref: Non-RFC (?) normative reference: ref. 'NB96' -- Possible downref: Non-RFC (?) normative reference: ref. 'NB98' -- Possible downref: Non-RFC (?) normative reference: ref. 'NBT97' -- Possible downref: Non-RFC (?) normative reference: ref. 'NLJBC98' -- Possible downref: Non-RFC (?) normative reference: ref. 'PPV98' -- Possible downref: Non-RFC (?) normative reference: ref. 'PSK94' -- Possible downref: Non-RFC (?) normative reference: ref. 'PSLB97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Rizzo97' -- Possible downref: Non-RFC (?) normative reference: ref. 'SFLT98' -- Possible downref: Non-RFC (?) normative reference: ref. 'WMK94' -- Possible downref: Non-RFC (?) normative reference: ref. 'YGS95' Summary: 10 errors (**), 0 flaws (~~), 5 warnings (==), 24 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Draft: RMTP-II Specification B. Whetten, M. Basavaiah 3 Document: draft-whetten-rmtp-ii-00.txt GlobalCast Communications, Inc. 4 Expires: Six months S. Paul 5 Lucent Technologies, Bell Labs 6 T. Montgomery 7 West Virginia University 8 N. Rastogi, J. Conlan, T. Yeh 9 GlobalCast Communications, Inc. 10 April 8, 1998 12 THE RMTP-II PROTOCOL 14 Status of this Memo 16 This document is an Internet-Draft. Internet-Drafts are working 17 documents of the Internet Engineering Task Force (IETF), its areas, 18 and its working groups. Note that other groups may also distribute 19 working documents as Internet-Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is not appropriate to use Internet-Drafts as reference 24 material or to cite them other than as "work in progress". 26 To view the entire list of current Internet-Drafts, please check 27 the "1id-abstracts.txt" listing contained in the Internet-Drafts 28 Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net 29 (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au 30 (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu 31 (US West Coast). 33 Abstract 35 The Reliable Multicast Transport Protocol II, RMTP-II, is a reliable 36 multicast protocol, designed to reliably and efficiently send data 37 from a few senders to large groups of simultaneous recipients. It is 38 designed primarily for use over controlled network topologies. It 39 works over both symmetric networks, as well as over asymmetrical 40 network topologies such as those provided by satellite, cable modem 41 or Asymmetrical Digital Subscriber Line (ADSL) carriers. Before 42 sending, each sender must connect with a trusted Top Node, to receive 43 permission and control parameters for its data stream. The top node 44 provides network managers with a single point of control for the 45 senders, allowing them to monitor and control the traffic being sent. 47 Table of Contents 49 1 INTRODUCTION 4 50 2 ENTITIES 6 51 2.1 Node 6 52 2.2 RMTP-II Tree 6 53 2.3 Multicast Group Address 6 54 2.4 Data Channels 6 55 2.5 Control Channels 6 56 2.6 Data Stream 7 57 2.7 StreamID 7 58 2.8 Packet Sequence Numbers 7 59 2.9 Data Queue 7 60 2.10 RMTP-II Packets 7 61 3 NODE FUNCTIONS 10 62 3.1 Sender Node 10 63 3.2 Receiver Node 11 64 3.3 Top Node 12 65 3.4 Aggregator Node 13 66 3.5 Designated Receiver Node 13 67 4 OPERATION OF THE RMTP-II PROTOCOL 14 68 4.1 Basic Operation of the Protocol 14 69 4.2 Global Tree Parameters 15 70 4.3 Data and Control Paths 16 71 4.4 Data Transmission 16 72 4.5 Retransmission 16 73 4.6 Options 17 74 4.7 Tree Connections 18 75 4.8 Flow and Congestion Control 18 76 4.9 Membership 20 77 4.10 Fault Detection and Recovery 20 78 4.11 Quality of Service (QoS) 21 79 4.12 SNMP Support 22 80 4.13 Session Manager Support 22 81 4.14 Forward Error Correction 22 82 5 DETAILS: 23 83 5.1 RMTP-II Global Parameters 23 84 5.2 Tree Configuration 24 85 5.3 Tree Membership Algorithms 26 86 5.4 Data Transmission 28 87 5.5 Data Reception 32 88 5.6 HACK Generation 34 89 5.7 Retransmissions in response to a HACK 39 90 5.8 NACK Generation 40 91 5.9 NACK Response 41 92 5.10 Congestion Control 41 93 5.11 Failure Detection and Recovery 41 94 5.12 Control tree discovery 44 95 5.13 Control Tree Round Trip Times 46 96 6 RMTP-II PACKET FORMATS 48 97 6.1 RMTP-II Fixed Header 48 98 6.2 Data 50 99 6.3 Retransmission Packet 52 100 6.4 HACK 54 101 6.5 JoinConfirm 56 102 6.6 JoinStream 58 103 6.7 JoinAck 59 104 6.8 LeaveStream 60 105 6.9 Heartbeat 61 106 6.10 HeartbeatResponse 62 107 6.11 NullData 63 108 6.12 Eject 64 109 6.13 EOS 65 110 6.14 LeaveConfirm 66 111 6.15 RMTP-II Options Header 67 112 7 ACKNOWLEDGEMENTS 81 113 8 REFERENCES 83 115 1 Introduction 117 IP Multicast (RFC 1112) provides highly efficient delivery of 118 information to many receivers at once, with each packet going over 119 any given link no more than a single time. To date, there are no 120 standardized, reliable multicast protocols that provide the 121 equivalent of TCP for IP Multicast. It is widely recognized that, 122 unlike TCP, no single reliable multicast protocol can meet the needs 123 of all application types over all network types. 125 The Reliable Multicast Transport Protocol II, RMTP-II, is a reliable 126 multicast protocol which reliably and efficiently sends data from a 127 few senders to large groups of simultaneous recipients. RMTP-II 128 works over all types of networks, including symmetric networks as 129 well as asymmetrical networks such as those provided by satellite, 130 cable modem or Asymmetrical Digital Subscriber Line (ADSL) carriers. 131 RMTP-II requires some configuration of topology. 133 The objectives of RMTP-II are guaranteed reliability, high 134 throughput, and low end-to-end delay on any network topology, while 135 providing the network manager with control over transmission traffic. 137 RMTP-II is based on a hierarchical tree structure in which receivers 138 are grouped into local regions. In each region a special control 139 node is responsible for maintaining receiver membership and for 140 aggregating the acknowledgments from the receivers in its region and 141 forwarding them to the sender. The control node may also take 142 responsibility for retransmitting dropped packets to the local 143 receivers. 145 RMTP-II is a sender-reliable multicast protocol which uses a fully 146 distributed membership protocol to keep track of the current 147 membership of the tree. This allows senders to determine when 148 packets become stable and may be deleted. RMTP-II provides strong 149 guarantees on packet delivery and gives the senders a count of the 150 number of receivers that successfully received each packet. It also 151 supports optional negative acknowledgements of missed packets for 152 faster recovery of data and lower control traffic on low loss 153 networks. 155 The Internet is highly dependent on the TCP congestion control 156 mechanisms which allow all streams to share bandwidth fairly. 157 Widespread deployment of a transport protocol that does not interact 158 gracefully with the TCP protocol has the potential to do significant 159 damage to the Internet. This has been an obstacle to the 160 standardization of a reliable multicast protocol. RMTP-II is 161 designed to interact closely with congestion control algorithms and 162 provide these algorithms with constant information about the loss 163 rates and round trip times in the tree. The RMTP-II congestion 164 control algorithms are under development and will be presented in a 165 companion document. 167 RMTP-II complements its congestion control algorithms with additional 168 centralized management control over all RMTP-II streams running on 169 the network. RMTP requires senders to interact with a trusted Top 170 Node and accept configuration information from this node. The Top 171 Node provides the network manager with an SNMP interface for 172 monitoring and controlling the streams with which it is associated. 173 The network manager can place bandwidth limits on each stream and can 174 specify the congestion control parameters that each sender must use. 175 This provides graceful, controlled deployment of reliable multicast 176 that protects the network both through explicit management and 177 automatic congestion control policies. 179 RMTP-II requires some topology configuration which is left to an 180 external Session Manager. A Session Manager can be implemented as 181 either a centralized tool controlled by the network manager, as 182 static configuration files, or as a set of fully distributed 183 algorithms. The Session Manager is also responsible for optional 184 session advertisements and optional total group membership tracking. 186 RMTP-II provides the following additional features: 188 - Optional, integrated Forward Error Correction (FEC)is provided. 190 - Receivers may join or leave a stream already in progress. 192 - SNMP support is furnished all nodes. 194 - Senders and receivers may be ejected. 196 - Many-to-many multicast or special upgrades to routers is not 197 required. 199 - Time Bounded Reliability is provided for bounded recovery of data 200 for synchronous real-time streaming applications. 202 - A fault tolerant top node is supported. 204 - Dynamic fault detection and recovery are supported. 206 - The rate of generation of acknowledgement traffic can be 207 controlled. 209 Appendix A contains a design rationale for RMTP-II. 211 2 Entities 213 2.1 Node 215 A node is an entity with a network address which provides 216 transmission, control, or reception services for the network. 218 There are five types of nodes: sender nodes which transmit data, 219 receiver nodes which receive data, and three types of control nodes 220 which provide acknowledgment, aggregation and communication services 221 for control functions. The control nodes are: aggregator nodes which 222 aggregate and forward control information, designated receiver nodes 223 which function as aggregators and also receive and retransmit data, 224 and top nodes which perform aggregation and other control functions 225 at the top of the RMTP communication tree. Node functionality is 226 described later. 228 2.2 RMTP-II Tree 230 An RMTP-II tree is a collection of nodes connected into a tree-like 231 communication network with the top node at the root. The primary 232 function of the tree is to efficiently distribute acknowledgement 233 packets. 235 2.3 Multicast Group Address 237 A multicast group address is a pair consisting of an IP multicast 238 address and a UDP port number. Multicast group addresses are used for 239 sending data to receivers, and for sending control information to 240 local groups. 242 2.4 Data Channels 244 A data channel is a communication path used to send data streams. 245 Each data channel is associated with a multicast group address used 246 for sending and receiving the data streams. A sender sends a data 247 stream on a data channel. Receivers must join the data channel to 248 receive the data stream. 250 One RMTP-II Tree supports multiple data channels. A single data 251 channel can be used by one or more senders to send data. A receiver 252 must join all data channels corresponding to the streams it wishes to 253 receive. 255 2.5 Control Channels 257 A control channel is a communication path used to send control 258 information. A control channel may be associated with a unicast IP 259 address and port or may be associated with a multicast group address. 261 2.6 Data Stream 263 A data stream is a sequence of Data packets having a unique StreamID 264 and a data channel. 266 2.7 StreamID 268 A StreamID is a 16-bit number, chosen either by the application that 269 creates the stream or by RMTP-II. Senders and receivers use the 270 StreamID to distinguish data streams. A sender may specify a 271 StreamID in the range from 32768 (2^15) to 65535 (2^16)-1. Numbers 272 in the range from 0 to 32768 are reserved. If a sender specifies 0 273 as the StreamID, then RMTP-II assigns a unique StreamID in the range 274 from 1 to 32768. If a sender application specifies a StreamID that 275 is in use, RMTP-II will not allow the sender to join the RMTP tree. 277 2.8 Packet Sequence Numbers 279 A packet sequence number is a 32 bit number in the range from 1 280 through 2^32 - 1 which is used to specify the sequential order of a 281 Data packet in a stream of Data packets. A sender node assigns 282 consecutive sequence numbers to the Data packets provided by the 283 sender application for the data stream. Zero is reserved. 285 2.9 Data Queue 287 A data queue is a buffer, maintained by a sender or a designated 288 receiver node, for transmission and retransmission of the Data 289 packets provided by the sender application. New Data packets are 290 added to the data queue as they arrive from the application, up to a 291 specified buffer limit. The admission rate of packets to the network 292 is controlled by congestion control algorithms. Once a packet has 293 been received by its target recipients, it may be deleted from the 294 buffer under appropriate conditions. 296 2.10 RMTP-II Packets 298 All RMTP-II packets consist of a fixed header, optional option 299 headers, followed by data or control information. 301 Data is carried by Data packets and Retransmission packets. Control 302 information is carried in the following types of control packets: 303 HACK, NACK, JoinStream, JoinAck, JoinConfirm, LeaveStream, Heartbeat, 304 HeartbeatResponse, NullData, Eject, EOS, and LeaveConfirm. 306 2.10.1 Data Packet 307 The sender application provides Data packets to the sender node. The 308 sender node assigns consecutive sequence numbers to the Data packets 309 and multicasts the packets on the data channel. 311 Each Data packet has a StreamID and a sequence number which identify 312 the packet and allow a receiver application to reconstruct the data 313 stream from the Data packets. A receiver learns about Data packets 314 that have not yet arrived by receiving a packet later in the sequence 315 or from a NullData packet which mentions later packets. 317 A Data packet is said to be "stable", if it has been acknowledged as 318 received by all its target recipients. 320 2.10.2 Retransmission Packet 322 Retransmitted data is sent in Retransmission packets. A separate 323 packet type is used for retransmission to more easily enable use of 324 subtree multicast when this option is eventually supported by 325 routers. 327 2.10.3 HACK Packets 329 A HACK packet is unicast from a child node to its parent node to 330 indicate the status of the Data packets which have arrived and to 331 furnish statistics about the state of data reception at the node. 333 A HACK requests retransmission of Data packets that have not been 334 received. A HACK also acknowledges packets that have become stable. 335 A packet becomes stable when it has been received by sufficient, 336 critical nodes so that the sender is no longer required to hold the 337 packet for retransmission. 339 2.10.4 NACK 341 A NACK is a HACK that is used to request immediate recovery of lost 342 Data packets. 344 2.10.5 JoinStream Packet 346 All nodes except the top node send a JoinStream request to join a 347 data stream. The first JoinStream packet implicitly joins the RMTP- 348 II tree. 350 2.10.6 JoinConfirm Packet 352 A parent sends a JoinConfirm after it processes a JoinStream packet. 353 The JoinConfirm indicates the success or failure of the JoinStream 354 request. 356 2.10.7 JoinAck Packet 358 If a control node is unable to respond immediately to a JoinStream 359 request by a child with a JoinConfirm packet, it unicasts a JoinAck 360 packet to that child. 362 2.10.8 LeaveStream Packet 364 All nodes, except the top node, send a LeaveStream request when the 365 node leaves the data stream. 367 2.10.9 Heartbeat Packet 369 A control node periodically sends Heartbeat packets to notify its 370 child nodes that it is alive. 372 If a child node does not receive a Heartbeat packet within a 373 specified time limit, it detects that the parent has failed and joins 374 another parent node. 376 2.10.10 HeartbeatResponse Packet 378 All nodes, except the top node, unicast a HeartbeatResponse packet to 379 its parent node in response to a Heartbeat packet. The 380 HeartbeatResponse packet indicates that the child node is still 381 alive. 383 If a parent does not receive a HeartbeatResponse packet or a HACK 384 packet for a specified interval of time, then it detects that the 385 child node has failed and removes that child from its list of child 386 nodes. 388 2.10.11 NullData Packet 390 If a sender node has no data to send for a stream, it periodically 391 multicasts a NullData packet on the data channel. NullData packets 392 inform receivers about the state of the data stream and the sender. 394 A NullData packet contains the sequence number of the last packet 395 sent, which allows the receivers to determine missing packets. If a 396 receiver does not receive any Data or NullData packets from the 397 sender for a specified time, then the receiver will declare the 398 sender failed. 400 2.10.12 Eject Packet 402 If a child node is not operating normally, or a parent node restarts 403 after a failure and receives a packet from a child not in its child 404 list, then the parent node unicasts an Eject packet to the child 405 node. This child node can be another control node, a sender, or a 406 receiver. 408 2.10.13 LeaveConfirm Packet 410 A parent node unicasts a LeaveConfirm packet in response to a 411 LeaveStream packet from a child node. 413 3 Node Functions 415 3.1 Sender Node 417 Stream Variables 419 A sender node maintains the values of variables relating to the data 420 stream: the size of the data queue, the packet admission rate, the 421 sequence number of the lowest numbered unstable packet, and the 422 sequence number of the highest numbered packet in the data queue. 424 Join Stream 426 A sender node sends a JoinStream request to the top node for each 427 stream it intends to send. The sender provides a data channel with 428 the JoinStream request and either provides a unique StreamID or else 429 requests that the top node generate a unique StreamID. 431 Data Transmission 433 A sender multicasts Data packets to the receivers on the data channel 434 at a sending rate determined by the flow and congestion control 435 algorithms. If there is no data to send for a stream, the sender 436 periodically sends NullData packets to indicate that the data stream 437 is active. 439 Data Queue 441 When a sender receives a HACK from the top node, it uses the 442 information to delete packets from the data stream's data queue. A 443 packet is deleted from the queue if that packet is stable and all 444 lower numbered packets are also stable. 446 Data Retransmission 447 A HACK or NACK indicates which packets have not been received. A 448 sender retransmits the missed packets on the data channel. The time 449 interval between successive retransmissions for a data packet is 450 doubled for each retransmission, with an upper limit of 64 451 seconds(exponential backoff). 453 Data retransmission has higher priority than new data transmission. 454 The sending rate is controlled by the flow and congestion control 455 algorithms. 457 Tree Integrity 459 A sender gets Heartbeat packets from the top node's multicast control 460 channel and sends HeartbeatResponse packets to the top node on the 461 unicast control channel. This allows the sender and the top node to 462 determine whether the other is functioning. The NullData packets 463 also inform the receivers that the sender is functioning. 465 Receivers send HACKs in response to NullData packets, which informs 466 the sender of the status of the receivers. 468 3.2 Receiver Node 470 Join Stream 472 A receiver node sends a JoinStream request to its parent node for 473 each of the data streams that it wishes to receive. 475 Data Reception 477 A receiver node receives the data packets on the data channels and 478 delivers the data to its application. 480 Data Reliability 482 A receiver delivers the Data packets to the receiver application in 483 accordance with quality of service specified for the data stream. 485 A receiver sends HACKs to its parent to indicate the status of the 486 Data packets received and missed. 488 A receiver may also send NACKs to expedite the recovery of missing 489 Data packets. 491 Tree Integrity 493 A receiver receives Heartbeats from its parent on the parent's 494 multicast control channel to insure that the parent is operating. 496 A receiver sends HACKs and NACKs to its parent for each of its active 497 streams to indicate its availability. If none of the streams that a 498 receiver has joined are active, it periodically sends a 499 HeartbeatResponse packet to its parent. 501 3.3 Top Node 503 Tree Creation 505 The top node is assigned administratively and is the core of the 506 tree. 508 Sender Control 510 The top node controls transmission parameters and congestion control 511 parameters to each sender, and can change these dynamically while the 512 sender is transmitting. 514 Data Reliability 516 The top node aggregates HACKs received from its children and forwards 517 the aggregated HACK to the sender node. 519 Group Membership 521 The top node aggregates the membership count of the receivers, and 522 passes this to each sender. 524 Data Transmission 526 The top node may optionally accept unicast data from senders and 527 provide multicast transmission to the group. This is needed if the 528 sender node cannot multicast data. 530 Stream Identification 532 The top node allocates new StreamID values and guarantees the 533 uniqueness of the StreamIDs. If any sender requires a system 534 generated StreamID, the top node provides a unique value. If a 535 sender provides a StreamID value, the top node verifies that the 536 value is unique. If the value is not unique, the sender is not 537 allowed to join the RMTP-II tree. 539 Tree Integrity 541 The top node sends Heartbeats on its multicast control channel to 542 notify its child nodes and senders that it is available. 544 The child nodes of the top node respond to its Heartbeats with HACKs 545 or Heartbeat response packets. 547 The sender nodes respond to the top node Heartbeats with 548 HeartbeatResponse packets. 550 3.4 Aggregator Node 552 Data Reliability 554 An aggregator node aggregates HACKs from its child nodes and sends 555 them to its parent node on the unicast control channel. 557 NACK Forwarding 559 An aggregator node forwards NACKs from its child nodes to its parent 560 node. 562 Retransmission Forwarding 564 An aggregator node forwards retransmissions from its parent to its 565 children. 567 Group Membership 569 The aggregator node accumulates the membership count of its 570 receivers, and passes this count to its parent. An aggregator node 571 also maintains a list of its receivers. 573 Tree Integrity 575 An aggregator sends Heartbeats on its multicast control channel to 576 notify its child nodes of its availability. 578 An aggregator receives Heartbeats from its parent to insure that its 579 parent is available. 581 An aggregator receives HACKs or HeartbeatResponse packets from its 582 child nodes to insure that the child modes are available. 584 3.5 Designated Receiver Node 586 Aggregation 588 A designated receiver node has all the functionality of the 589 aggregator node. 591 Data Retransmission 592 A designated receiver receives data from all the data streams that 593 are joined by its child receiver nodes. A designated receiver 594 buffers the data for potential retransmission to its child receiver 595 nodes. 597 4 Operation of the RMTP-II protocol 599 4.1 Basic Operation of the Protocol 601 RMTP-II provides sequenced, reliable delivery of data from a few 602 senders to a large group of receivers. RMTP-II consists of a network 603 that has one or more sender nodes, many receiver nodes and one or 604 more control nodes. 606 The simplest configuration consists of a single sender and a top node 607 and receivers on multiple hosts connected to the network. An 608 implementation could allow multiple nodes of different types to run 609 in a single process, allowing a host to act as a sender and a 610 receiver, a sender and a top node, a receiver and a designated 611 receiver, or other combinations. 613 The figure below illustrates an RMTP tree with multiple control 614 nodes. 616 HACKs 617 -----------> (Top Node)----------------->(Sender node) 618 ^ ^^^ | 619 | / | \ | 620 HACKs | / | \ | 621 | / | \ | 622 | / | \ (Designated | 623 / | \ Receiver | 624 (Aggregator / | \ node) v 625 nodes) A N A N D N <------------| 626 ^^ ^^^ ^^ | Data 627 / | / | \ | \ | Channel 628 / | / | \ | \ | 629 / | / | \ | \ v 630 RN RN RN RN RN RN RN <------ 631 (Receiver Nodes) 633 A sender joins the RMTP tree and multicasts Data packets on the data 634 channel. 636 In the case of a symmetrical network whose control tree topology 637 congruent to the multicast routing tree topology, the data channel 638 may have the same path as the control channel. In this case, a data 639 packet would be multicast from the sender through the control nodes 640 to the receivers. 642 In the case of an asymmetrical network such as a one way satellite 643 network with a terrestrial return path, the receivers and designated 644 receivers would receive the multicast data directly, but the 645 aggregators and top nodes will only receive control traffic. 647 A receiver joins a data channel to receive data. A receiver 648 periodically informs its parent about the packets that it has or has 649 not received by unicasting a HACK packet to the parent. Each parent 650 node aggregates the HACKs from its child nodes and unicasts a single 651 aggregated HACK to its parent. The top node aggregates the HACKs 652 from its child nodes and unicasts a single HACK to the sender. 654 Each control node multicasts Heartbeat packets that inform their 655 child nodes that the parent node is still functioning. 657 A tree forms a loop from the sender to the receivers, and back to the 658 sender. Data and NullData regularly exercise the downward data 659 direction. Heartbeat packets exercise the downward control direction. 660 HACKs, NACKs, and HeartbeatResponse packets regularly exercise it in 661 the upward direction. This combination constantly checks that all of 662 the nodes in the tree are still functioning correctly, and initiates 663 fault recovery when required. 665 4.2 Global Tree Parameters 667 A collection of tree-wide parameters are set and controlled at the 668 top node. All the nodes of the RMTP tree acquire these parameters 669 when they join the RMTP tree for the first time and receive a 670 JoinConfirm packet. 672 These global parameters can be changed at the top node. The changes 673 are propagated to the nodes of the tree in optional fields of the 674 Heartbeat packets. 676 The top node sends the modified global parameters in Heartbeat 677 packets and waits for all of its child nodes to confirm the change. 679 A child node receives and applies the changed parameters and sends a 680 confirmation to its parent in a special HeartbeatResponse packet. 682 When a control node receives a Heartbeat packet with modified global 683 parameter options, it propagates the modified global parameters to 684 its child nodes in its Heartbeat packets. A control node sends the 685 confirmation HeartbeatResponse packet to its parent only after it 686 receives HeartbeatResponse confirmations from all its child nodes. 688 When the top node gets confirmations from all its child nodes, it is 689 guaranteed that all the nodes in the tree have updated their global 690 parameters. 692 4.3 Data and Control Paths 694 Data is multicast by a sender on the data channel. Data may be 695 retransmitted by the sender on the data channel or be multicast or 696 unicast on a local control channel by a designated receiver. 698 The control channels are organized into a tree with the top node at 699 the top of the tree and the receivers at the bottom of the tree. 700 Aggregators and designated receivers are always in the middle of the 701 tree. 703 4.4 Data Transmission 705 In order to receive Data packets from a sender, a receiver must join 706 the multicast group for the stream to be received. Each multicast 707 group is called a data channel. A receiver only joins the data 708 channels for the streams it will receive. 710 A designated receiver joins all of the multicast groups that its 711 descendants have joined. 713 A sender transmits Data packets to a group using IP Multicast. The 714 intermediate nodes in the RMTP-II tree structure are not responsible 715 for forwarding these Data packets. The standard IP Multicast routers 716 forward the packets to all receivers and designated receivers. 718 4.5 Retransmission 720 There are three types of retransmissions: local retransmission, 721 global retransmission, and subtree retransmission. 723 4.5.1 Local Retransmission 725 If a designated receiver determines from its child node's HACKs or 726 NACKs that a Data packet was missed, the designated receiver 727 retransmits the Data packet. The designated receiver multicasts the 728 retransmission on its multicast control channel. Each control node 729 child forwards a retransmission, as required, to its children. 731 4.5.2 Global Retransmission 732 If a sender node receives a HACK or NACK that indicates missing 733 packets, the sender performs global retransmission. The 734 unacknowledged packets are multicast as Retransmission packets on the 735 data channel. 737 4.5.3 Subtree Retransmission 739 On a network whose routers support subtree multicast transmission, a 740 designated receiver at the top of a subtree may multicast 741 retransmitted data to its subtree. In this case, it is not necessary 742 for the control nodes below the designated receiver in the subtree to 743 forward the retransmissions. Currently, no commercial routers 744 support this feature. 746 4.6 Options 748 4.6.1 Expedited Recovery 750 Expedited Recovery is an optional feature, which can be enabled on a 751 per-stream basis. If the NACK option is enabled for a Data stream, 752 an explicit request, called a negative acknowledgment, or NACK, may 753 be sent by a child node to its parent requesting retransmission of 754 missed data packets. Expedited recovery is used for data streams 755 that require very low latency for data recovery or for networks that 756 have low loss rates and wish to reduce the amount of control traffic 757 sent. 759 4.6.2 NACK Suppression 761 A receiver waits a random interval, within a specified range, before 762 sending a NACK. If the receiver receives the retransmitted data 763 before the NACK timer expires, the receiver cancels the NACK. This 764 reduces the chance that multiple receivers generate a NACK for the 765 same packet. 767 A designated receiver node multicasts a Data packet to its children 768 as soon as it gets a NACK request for that packet. 770 An aggregator node or a top node forwards only one NACK for a missing 771 Data packet within a specified period of time. Other NACKs for a 772 missing packet are ignored because the up-stream designated receiver 773 or Sender will multicast the retransmission. 775 4.6.3 Control Tree discovery 777 RMTP supports an option which allows the nodes in the RMTP tree to 778 acquire the addresses and location of its ancestors in the tree and 779 the addresses of its parent's siblings 780 If an RMTP-II node's parent fails, then the node can use the acquired 781 information to join an alternate control node. 783 4.6.4 Forward Error Correction 785 Option fields in RMTP-II can be used to implement Forward Error 786 Correction. RMTP-II supports both proactive FEC, in which a fixed 787 amount of parity is sent with the data, and reactive FEC, in which 788 parity is sent in retransmission packets to repair multiple 789 independent losses. See the Appendix - Forward Error Correction for 790 details. 792 4.6.5 Fault Tolerant Top Node 794 RMTP-II provides support for the implementation of a fault tolerant 795 top node. The top node uses the options field of its Heartbeat 796 packets to send the address of a backup top node to all of its 797 children. The top node sets this option when a backup top node joins 798 or leaves the RMTP tree. 800 4.6.6 Congestion Control 802 Congestion Control option fields are used to propagate the loss rate 803 and other congestion control parameters. RMTP-II proposes one 804 congestion control mechanism. See the Appendix - Congestion Control 805 for details. 807 4.6.7 Time Bounded Reliability 809 A bound for delivery time may be optionally specified for a stream. 810 Packets are delivered exactly once and in the same order in which 811 they were sent from the source, but if the time bound on a packet 812 expires it is dropped. See the Appendix - Time Bounded Reliability 813 for details. 815 4.7 Tree Connections 817 In order to join a tree, each node, except top node, sends a 818 JoinStream packet to its parent node and sets a Join timer for the 819 confirmation. 821 If the parent node can immediately confirm the JoinStream request, it 822 responds with a JoinConfirm packet. If the parent node needs more 823 time to process the JoinStream request, it responds with a JoinAck 824 packet. After it fully processes the JoinStream, it sends a 825 JoinConfirm packet. 827 The top node does additional processing for a JoinStream request from 828 a sender node. The top node validates the uniqueness of the StreamID 829 of the JoinStream request. If a sender requests a system generated 830 StreamID, the top node generates a unique StreamID. 832 4.8 Flow and Congestion Control 834 Flow and congestion control algorithms act to prevent the senders 835 from overloading the receivers. 837 RMTP-II uses a send-ahead mechanism to allow continuous transmission 838 of data without waiting for packet acknowledgments. This is used in 839 conjunction with flow and rate control algorithms. 841 Research in congestion control for large scale reliable multicast is 842 not yet mature. RMTP-II is currently designed to support a range of 843 window based and rate based flow and congestion control mechanisms. 844 Each of these mechanisms can be specified as an option and is 845 controlled by the top node of the tree. For general Internet use, 846 specific congestion control policies will eventually be mandated. 848 4.8.1 Fixed Rate Based Control 850 Fixed, rate-based flow control limits the transmission speed to a 851 predefined value. The flow control rate includes the global 852 multicast retransmissions from the sender. 854 4.8.2 Congestion Control 856 RMTP-II uses a Loss Tolerant Rate Controller algorithm (LTRC) for 857 congestion control. See Appendix B. This controller uses loss report 858 information from the receivers and perceived state at the sender to 859 perform congestion control. The goal of the controller is to be 860 responsive to congestion, but not overly reactive to spurious 861 independent loss. 863 LTRC allows the sender application to dynamically adjust the admit 864 rate between specified limits. A sender gets loss rate information 865 for the receivers from the HACK packets. The bottom level control 866 nodes calculate loss rate for each of its receiver child nodes from 867 the HACK packet information. The control nodes send the maximum loss 868 rate for the child nodes to its parent in a HACK packet. The next 869 level control nodes do not aggregate the loss rate, but send the 870 maximum loss rate for all the child nodes. 872 4.8.3 Ejection of Slow Receivers 874 Each sender node is responsible for ejecting slow receivers. 876 The sender application provides the minimum and maximum admit rate 877 limits. If the admit rate falls below the minimum rate, the sender 878 maintains the admit rate at the minimum rate and unicasts an Eject 879 packet to the top node. The Eject packet contains the loss rate 880 threshold for the stream and a flag that indicates whether all 881 receivers that exceed the threshold should be ejected or only the 882 receiver(s) with the maximum loss rate exceeding the threshold. The 883 top node sends one or more Eject packets to its children based on the 884 loss rate threshold and bit flag in the Eject packet. If the flag is 885 set then it will send Eject packet to all of its children having loss 886 more than the threshold else only to the child having maximum lass 887 over threshold. If the child node is a control node it uses the same 888 mechanism to send Eject packet to its children. The receiver(s) 889 having loss rates at or above the threshold get Eject packets and are 890 required to leave the stream. The receivers may rejoin the stream at 891 a later time. 893 4.9 Membership 895 RMTP provides a simple counted membership for each stream, available 896 at the top node and sender node. This counted membership is a simple 897 count of the receiver nodes that have joined a stream. 899 The detailed RMTP tree and stream membership information is 900 distributed across all the control nodes in the RMTP tree. An 901 external membership server can be used to acquire the full tree and 902 per stream membership from all the control nodes. 904 4.10 Fault Detection and Recovery 906 4.10.1 Sender node failure 908 A sender node that has no data to send will periodically send 909 NullData packets on the data channel. If a receiver fails to receive 910 Data packets or NullData packets for a stream sent by the sender, the 911 receiver detects a sender failure. 913 A sender node sends HeartbeatResponse packets to the top node. If 914 the top node fails to receive a HeartbeatResponse from a sender, it 915 detects the sender's failure. 917 4.10.2 Receiver node failure 919 A receiver node sends HACKs for each of the active streams that it 920 has joined. If none of the streams are active, then the receiver 921 sends HeartbeatResponse packets to its parent. 923 If a receiver's parent node does not receive a HACK or a 924 HeartbeatResponse within a specified time interval, the parent 925 detects the failure of the receiver and removes the child from its 926 child list. 928 4.10.3 Top node failure 930 The top node sends Heartbeat packets to its child nodes on its 931 multicast control channel. If a child node does not receive 932 Heartbeats from the top node, it detects a failure of the top node. 934 4.10.3.1 Fault Tolerant Top Node 936 RMTP-II provides for the implementation of a fault tolerant top node. 937 The top node sends the address of a backup top node to all of its 938 children in the Heartbeat packets. A backup top node monitors the 939 Heartbeat packets of the top node. If the top node fails, the backup 940 top node takes over. 942 If a child node of the top node detects a failure of the top node, it 943 reconnects to the backup top node. 945 4.10.4 Aggregator/Designated Receiver failure 947 Each aggregator and designated receiver node sends Heartbeat packets 948 to its child nodes on its multicast control channel. If the child 949 nodes do not receive any Heartbeats from the parent node, they detect 950 failure of the parent. 952 4.10.4.1 Recovery 954 When a child node detects failure of its parent node, it can try to 955 reconnect to a randomly chosen alternate aggregator, designated 956 receiver, or the top node of the RMTP-II tree. 958 4.11 Quality of Service (QoS) 960 RMTP-II supports three levels of QoS: 962 Reliable Unordered: packets are delivered exactly once, but packet 963 order is not guaranteed. 965 Reliable Source Ordered: Packets are delivered exactly once and in 966 the same order in which they were sent from the source. 968 Time Bounded Reliability: This is an optional QoS in which packets 969 are delivered exactly once and in the same order in which they 970 were sent from the source, but packets that cannot be delivered 971 within a specified time bound are dropped. See Appendix D - Time 972 Bounded Reliability. 974 4.12 SNMP Support 976 The control nodes, and the top node in particular, are designed to 977 interact with SNMP management tools. In cooperation with the top 978 node's centralized control of the senders, this allows network 979 managers to easily monitor and control the sessions being 980 transmitted. 982 All nodes of RMTP-II provide SNMP support. SNMP support is optional 983 for sender and receiver nodes, but is required for all control nodes. 984 See the Appendix - SNMP MIBs for details. 986 4.13 Session Manager Support 988 The Session Manager is an optional component associated with RMTP-II 989 which is responsible for providing each node with topology 990 configuration information. A session manager can be implemented 991 either as a set of static configuration files, as a fully distributed 992 algorithm, or as a centralized management tool that interfaces with 993 each of the nodes. 995 A session manager can optionally be responsible for providing total 996 group membership listings by aggregating the membership lists at each 997 control node. 999 A session manager can optionally provide an interface with directory 1000 services for the announcement of sessions. 1002 A session manager be responsible for partitioning the data at the 1003 sender, selecting the appropriate sets of layers at each receiver, 1004 and for reassembling the data at the receiver. This work is currently 1005 in progress. 1007 4.14 Forward Error Correction 1009 Recent work [NB96, NBT97, Rizzo97] has shown the benefits of 1010 incorporating reactive forward error correction (FEC) into reliable 1011 multicast protocols. This feature encodes data packets with FEC 1012 algorithms, but does not transmit the parity packets until a loss is 1013 detected. The parity packets are then transmitted and are able to 1014 repair different lost packets at different receivers. This is a 1015 powerful tool for providing scalability in the face of independent 1016 loss. When implemented, it is a simple matter to also provide 1017 proactive FEC which automatically transmits a certain percentage of 1018 parity packets along with the data. This is particularly useful when 1019 a high minimum error rate is expected, or when low latency is 1020 particularly important. This can be implemented as an option to 1021 RMTP- II. See Appendix D for FEC details. 1023 5 Details: 1025 5.1 RMTP-II Global Parameters 1027 5.1.1 The Tree-Wide Parameters 1029 The following global, or tree-wide, parameters are defined for an 1030 RMTP-II tree. These parameters are configured at the top node. The 1031 other nodes acquire the parameters when they connect to the RMTP-II 1032 tree. 1034 The parameters can be modified at the top node. The changes will be 1035 propagated to the rest of the tree. 1037 B: The branching factor B denotes the maximum number of children for 1038 any node in the tree. 1040 C: The constant C is a scaling coefficient for Thack. The constant C 1041 determines how quickly Thack increases when no data packets are 1042 being sent. 1044 R: The number R specifies the maximum number of HACKs which should 1045 be received, on average, for each Data packet sent, at any node. 1046 The stored value of R is divided by 100 to get an R value of the 1047 form dd.dd. 1049 Thack_max: The number Thack_max specifies the maximum time allowed 1050 between HACK transmissions for each receiver. 1052 Tjoin_response: The number Tjoin_response is the maximum time to 1053 wait for a response to a JoinStream request from the parent node. 1054 The response should be either a JoinAck or a JoinConfirm. 1056 Rjoin: The number Rjoin is the number of retries of the JoinStream 1057 request before declaring the parent unreachable. 1059 Thb: the number Thb is the time interval at which control nodes 1060 multicast Heartbeat packets. 1062 F: The failure threshold constant, F, determines the threshold time 1063 for failure detection. 1065 Tnulldata_max: The number Tnulldata_max is the maximum time interval 1066 for sending NullData packets. 1068 Optimistic: All the designated receivers should use optimistic HACK 1069 mechanism. 1071 RxMax: The number RxMax is the maximum number of retransmission that 1072 can be done for a data packet. 1074 Radmit_rate_min: This is the minimum admission rate that the sender 1075 can use for Data packets. 1077 Radmit_rate_max: This is the maximum admission rate that the sender 1078 can use for Data packets. 1080 5.1.2 Global Tree Parameter Distribution 1082 When a node joins an RMTP tree for the first time, it receives the 1083 global parameters from the options extension of the JoinConfirm 1084 packet. 1086 A node receives changes to global parameters in Heartbeat packets 1087 whose option flag, Global Parameters Modified, is set to 1. 1089 If a control node receives a Heartbeat packet containing modified 1090 variables, it sends Heartbeat packets to its children with the option 1091 flag, global parameters modified, set to 1. The control node 1092 continues sending the Heartbeat packet until it receives confirming 1093 HeartbeatResponse packets from all of its children. The control node 1094 then applies the new values of global parameters and sends the 1095 confirming HeartbeatResponse packet to its parent. 1097 When a receiver node receives a Heartbeat with the option flag, 1098 global parameter modified, set to 1, it applies the new parameter 1099 values and sends confirmation in the next HeartbeatResponse packet to 1100 its parent. 1102 Each time modifications are made to global parameters at the top 1103 node, it increments a 16-bit sequence number and inserts this 1104 sequence number in the Heartbeat packets it sends to its child nodes. 1105 The nodes of the RMTP tree identify global parameter modifications 1106 based on the sequence number in the Heartbeat packet. 1108 The top node does not allow additional modifications to the global 1109 parameters, until it gets confirmations from all of its children that 1110 the current modifications have been received. 1112 5.2 Tree Configuration 1113 A node requires information about the RMTP-II tree and the data 1114 stream it wishes to join. The mechanism for acquiring the 1115 configuration information is not part of the protocol, but is 1116 implemented by the associated Session Manager. The RMTP-II protocol 1117 specifies only the rules by which a node joins the tree. 1119 A node requires the following information to join an RMTP-II tree. 1120 The information depends on the type of the node. 1122 5.2.1 Top Node Configuration 1124 A top node requires: 1126 tree ID: a unique identifier for the RMTP-II tree 1128 UDP listen port: the number of the port on which the top node will 1129 listen for its children's control messages 1131 Local Multicast Control Channel: the address on which the top node 1132 sends Heartbeat packets to its children. 1134 5.2.2 Backup Top Node Configuration 1136 A backup top node requires: 1138 tree ID: a unique identifier for the RMTP-II tree 1140 UDP listen port: the number of the port on which the top node will 1141 listen for its children's control messages 1143 Local Multicast Control Channel: the address on which the top node 1144 sends Heartbeat packets to its children. 1146 5.2.3 Aggregator node Configuration 1148 An aggregator requires: 1150 tree ID: the unique identifier for the RMTP-II tree to join 1152 parent address: the address and port of the parent node to which the 1153 node should connect 1155 UDP listen port: the number of the port on which the node will listen 1156 for its children's control messages 1158 Local Multicast Control Channel: the address on which this node sends 1159 Heartbeat packets to its children. 1161 5.2.4 Designated Receiver node Configuration 1163 A designated receiver requires: 1165 tree ID: the unique identifier for the RMTP-II tree to join 1167 parent address: the address and port of the parent node to which the 1168 node should connect 1170 UDP listen port: the number of the port on which the node will 1171 listen for its children's control messages 1173 Local Multicast Control Channel: the address on which this node 1174 sends Heartbeat packets to its children. 1176 5.2.5 Sender or Receiver node 1178 A sender or receiver requires: 1180 tree ID: the unique identifier for the RMTP-II tree to join 1182 parent address: the address and port of the parent node to which the 1183 node should connect. 1185 StreamID: the unique identifier of the stream the node wants to send 1186 or receive 1188 data multicast group address: the multicast address on which the data 1189 stream is sent. 1191 5.3 Tree Membership Algorithms 1193 5.3.1 Join Algorithm 1195 A sender node makes a JoinStream request for every data stream that 1196 it intends to send. The parent for a sender is the top node. 1198 When a receiver node is created, it sends a separate JoinStream 1199 packet for each of the streams that it intends to receive. When a 1200 receiver node rejoins a stream after a parent failure, it may send a 1201 single JoinStream packet to join multiple streams. 1203 Aggregator and designated receiver nodes send a JoinStream packet 1204 with StreamID value zero to join the RMTP tree, without joining any 1205 stream. An aggregator or designated receiver later joins the streams 1206 that their children join. 1208 A node sends a JoinStream request to its parent node. It waits a 1209 time interval, Tjoin_response, for a response from the parent node. A 1210 parent node responds to a JoinStream request with a JoinAck packet or 1211 a JoinConfirm packet. If the parent node cannot process the request 1212 immediately, it responds with a JoinAck packet. Otherwise, it sends 1213 a JoinConfirm packet. 1215 If no response is received, the node retransmits the JoinStream 1216 request. The node retransmits the JoinStream request a maximum number 1217 of times, Rjoin. If a node does not receive a response for the 1218 JoinStream requests after Rjoin tries, it reports a Parent- 1219 Unreachable failure after Rjoin retries. 1221 If a node receives a JoinAck packet, it continues transmitting 1222 JoinStream requests at exponentially longer times, until it receives 1223 a JoinConfirm rather than a JoinAck. Every time that a node does not 1224 receive a response from its JoinStream request, it increments a local 1225 variable NumberFailures. Every time that it does not receive a 1226 JoinAck in response to its JoinStream request, it increments 1227 NumberFailures. If NumberFailures exceeds Rjoin, it reports a 1228 Parent- Unreachable failure. 1230 A node receives the tree's constant parameters and the parent's 1231 multicast control channel address from the JoinConfirm. 1233 When joining, the sender can request that RMTP-II provide the 1234 StreamID, in which case the top node generates a unique StreamID and 1235 sends it in the JoinConfirm packet. 1237 If a sender fails and than attempts to rejoin the tree, the top node 1238 will not allow this new sender to join until a period 6*F*Thb, the 1239 sender failure detect interval, has elapsed since the failure. 1241 A control node can proactively join the RMTP-II tree without joining 1242 a stream by issuing a JoinStream request with a StreamID value of 1243 zero. A control node can reactively join the RMTP-II tree when it 1244 receives a JoinStream request from any of its children. The control 1245 node joins all the streams that its children join. 1247 When a node issues a JoinStream request during a failure recovery, it 1248 sets the rejoin flag in the JoinStream request. During a rejoin, a 1249 node can specify all the streams that it wants to join in a single 1250 JoinStream request. 1252 5.3.2 Sender TimeStamp 1254 When a sender is created, it records the current time, expressed in 1255 elapsed seconds and microseconds since 00:00 Universal Coordinated 1256 Time, January 1, 1970. The sender uses the seconds part to 1257 initialize the value of TimeStamp. TimeStamp is a 32 bit number 1258 which is unique within approximately 136 years. 1260 The resolution of the TimeStamp is 1 second. Because this is UTC, 1261 there is no counter discontinuity when daylight savings occurs. 1263 The TimeStamp value is sent in every Data, NullData, Retransmission, 1264 and HACK packet for the data stream, and does not change over time 1265 for a stream. 1267 The value of TimeStamp allows nodes to distinguish different 1268 incarnations of the sender. A new TimeStamp on a data stream 1269 indicates that the sender has restarted. 1271 5.3.3 Leave Algorithm 1273 A node sends a LeaveStream request when it wishes to leave the data 1274 stream. The node waits for a period Tjoin_response to receive the 1275 LeaveConfirm packet from its parent node. If the LeaveConfirm packet 1276 is not received within the time Tjoin_response, then the node resends 1277 the LeaveStream request. The node sends the LeaveStream request a 1278 maximum of Rjoin times. 1280 A control node can leave a stream only if it has no children. 1282 To leave a stream, a sender node sends an EOS packet and waits for 1283 the EOS in the HACK before leaving the stream. 1285 5.4 Data Transmission 1287 An application provides Data packets to a sender node. The sender 1288 node assigns consecutive sequence numbers to the Data packets and 1289 buffers the Data packets in the data queue. 1291 A sender can use any sequence number other than 0 as the starting 1292 sequence number for the stream. Sequence number 0 indicates an 1293 inactive data stream and should not be used for other purposes. 1295 The sender initializes LastStable as StartSequenceNumber-1. 1296 LastStable is the sequence number of the last stable packet which the 1297 sender has removed from its data queue. All lower numbered packets 1298 are stable. The sender cannot retransmit any packet having sequence 1299 number less than or equal to LastStable. 1301 LastStable in conjunction with the sequence number is used by 1302 receivers to detect the first Data packet for the stream as well as 1303 missing packets. 1305 For example, suppose that a sender's StartSequenceNumber = 10 and the 1306 receiver's first Data packet has sequence number 15. LastStable = 9 1307 (10-1). The missing packets are 10, 11, 12, 13 and 14. 1309 5.4.1 Stable Packets 1311 If the lowest numbered Data packet in the data queue becomes stable, 1312 that is, all the target recipients have received the packet, then the 1313 sender deletes the packet from the data queue. The Sender receives 1314 stability information from the Stable Sequence Number of the HACKs it 1315 receives. 1317 5.4.2 Stream Parameters 1319 The sender maintains the following values for each of its data 1320 streams: 1322 Ndata_size: This is the data queue size, the number of Data packets 1323 the sender can buffer for transmission or retransmission. A 1324 sender can admit a packet to the data queue only if the number of 1325 packets in the data queue is less than Ndata_size. 1327 Radmit_rate_max: A sender uses Radmit_rate_max to control the 1328 maximum sending rate. The sending rate is determined by 1329 congestion control algorithms, and can fluctuate between 1330 Radmit_rate_min and Radmit_rate_max. 1332 Nseq_low: This is the sequence number of the lowest numbered packet 1333 in the data queue which is not yet stable. It is the sequence 1334 number of the oldest Data packet that can be retransmitted. 1336 Nseq_high: This is the sequence number of the highest numbered 1337 packet in the data queue. 1339 Tsend_alarm: This is the interval between multicasts of Data packets 1340 by the sender. 1342 Tnulldata_min: This is the minimum interval between the transmission 1343 of NullData packets by the sender. 1345 Tnulldata_alarm: This is the current interval between transmissions 1346 of NullData packets by the sender. 1348 5.4.3 Example: Packet Stability and Transmission 1350 Suppose that: Ndata_size equals 150, Nseq_low equals 200, Nseq_high 1351 equals 300, and all the Data packets numbered from 200 to 230 have 1352 gone stable. In this case, the sender will delete the stable packets 1353 from 200 to 230 from the data queue. The new value of Nseq_low will 1354 be 231. The sender can transmit Data packets until Nseq_high reaches 1355 380 (Nseq_low + Ndata_size). 1357 5.4.4 Example: Packet Rate 1359 Suppose the sender is using fixed size packets and Radmit_rate = 512 1360 Kbps = 65536 Bps, Ndata_size = 200 packets, PACKET_SIZE = 1024 bytes, 1361 Tsend_alarm = 50 ms = 20 ticks per second. 1363 The maximum number of packets that can be sent per interval 1364 Tsend_alarm is: (Radmit_rate*1024)/(8*(1000 / 1365 Tsend_alarm)*PACKET_SIZE) For this example the sender can at most 1366 send (512 * 1024)/(8 * (1000/50) * 1024) = 512/160 which is 1367 approximately 3 packets per interval 1369 5.4.5 NullData Packets 1371 If a sender has no Data packets to transmit for a data stream, it 1372 transmits NullData packets on the data channel. A NullData packet 1373 contains a sender time stamp, TimeStamp, the sequence number of the 1374 last packet, LastSequence, and the sequence number of the last stable 1375 packet, LastStable. 1377 NullData messages keep the stream alive, inform the receivers about 1378 the state of the sender, and detect packets missed by receivers. 1380 When there are no Data packet to send, a sender sets Tnulldata_alarm 1381 to Tnulldata_min. If Tnulldata_alarm expires, then the sender 1382 transmits a NullData packet and sets Tnulldata_alarm to twice its 1383 current value. The maximum value of Tnulldata_alarm is 1384 Tnulldata_max. This is done to utilize minimum bandwidth and insure 1385 timely recovery of missing Data packets. 1387 Example: Increase of Tnulldata_alarm 1388 Suppose that: 1389 Tnulldata_min = 2000 ms 1390 Tnulldata_max = 16000 ms 1391 Time of transmission of last Data packet = t1 1392 Let s = Tnulldata_min 1393 If there is no data to send, the sender transmits NullData packets at 1394 times t2 = t1+s, t3 = t2+2s, t4 = t3+4s, t5 = t4+8s, t6 = t5+16s 1396 |--|----|--------|----------------|----------------|--------------- 1397 t1 t2 t3 t4 t5 t6 1399 5.4.6 End Of Stream 1400 The end of a data stream is indicated by the flag EOS in the final 1401 Data packet of the data stream. The sender transmits a Data packet 1402 with the EOS bit set to 1 and waits for the packet to become stable. 1404 If a receiver receives a Data packet with EOS set to 1 and it is not 1405 missing any Data packets, it continuously generates HACK packets, 1406 based on Thack, with the EOS bit set to 1, until it gets an EOS 1407 packet from its parent confirming the end of stream. The receiver 1408 can then leave the stream. 1410 If a parent node receives a HACK with the EOS bit set to 1, it 1411 responds with an EOS packet to that child node confirming the end of 1412 stream. 1414 If a parent node has received HACKs for a data stream with the EOS 1415 bit set to 1 from all of its child nodes, then it sends a HACK with 1416 EOS set to 1 to its parent node. Once a sender indicates the end of 1417 the stream, it must wait until it gets a confirming EOS packet from 1418 the top node before it can leave the group. 1420 5.4.7 Computation of Feedback Round Trip Time 1422 The round trip time, RTT, is the difference between the time a Data 1423 packet is sent by the sender and the time that a HACK is received by 1424 the sender indicating that the packet was received or requesting 1425 retransmission of the packet. 1427 The sender measures the RTT for the first packet of a stream and 1428 every packet with sequence number equivalent to 1 mod H where 1430 H = Ceiling(B/R) 1431 B = MaxChildren 1432 R is the overhead ratio. 1434 For example, if H = 200, the sender will calculate RTT on packet 1435 number 1, then on packet 201, 401, 601. The sender will receive, on 1436 average, one HACK per 200 Data packets. 1438 5.4.8 Retransmission Timeout 1440 The Retransmission Timeout, RTO, is the time interval between 1441 retransmissions, in milliseconds. 1443 The RTO value is calculated by the sender node using the algorithm of 1444 [Jacobson88], [Jacobson90] which depends on both the smoothed RTT and 1445 the smoothed mean deviation. The algorithm can be implemented with 1446 integer arithmetic. 1448 M = measured RTT at the sender 1449 A = smoothed RTT 1450 D = smoothed mean deviation 1452 Initially, let 1454 g = 1/8, g is the gain for the RTT average. 1455 h = 1/4, h is the gain for deviation 1456 A = 0, current RTT estimator 1457 D = 3 seconds 1459 Err <= M - A 1460 A <= A + g*Err 1461 D <= D + h*(|Err| - D) 1462 RTO = A + 4D 1464 5.5 Data Reception 1466 5.5.1 New Stream 1468 When a receiver joins a data stream it receives a JoinConfirm packet 1469 which contains the StreamID, the TimeStamp, and the value of 1470 LastStable. 1472 If the data stream is not active, then LastStable equals 0. If the 1473 data stream is active, LastStable is the number of the stable packet 1474 with the greatest sequence number. The receiver can only receive 1475 packets with higher numbers than LastStable. The receiver calculates 1476 the starting sequence number, StartSequenceNumber, for the stream by 1477 adding 1 to the value LastStable. If the result equals 0 because of 1478 wrap-around, then the value is incremented to 1, since sequence 1479 number 0 is reserved. 1481 5.5.2 Quality of Service 1483 A receiver node delivers Data packets based on the quality of service 1484 value, QoS, specified for the data stream. If the QoS value is 1485 Source Ordered, then the Data packets must be delivered to the 1486 application in the order defined by the sequence numbers of the 1487 packets. If the QoS is Unordered Reliable, the packets are delivered 1488 to the application in the order they are received, but are never 1489 delivered more than once. 1491 An optional QoS level, Time Bounded Reliability, is detailed in 1492 Appendix - Time Bounded Reliability. 1494 5.5.3 Detection of Missing Packets 1495 Data, NullData, and Retransmission packets contain information that a 1496 receiver uses to detect missing packets. 1498 A NullData packet for a stream contains the sequence number of the 1499 last Data packet transmitted for the stream. A receiver determines 1500 whether any intervening packets in the sequence are missing. 1502 Each Data packet and Retransmission packet contains its own sequence 1503 number and the number of the last stable packet, LastStable. A 1504 receiver determines whether any packets have been missed between the 1505 last stable packet and the highest numbered packet received. 1507 5.5.4 Recovery of Missing Packets 1509 Both HACKs and NACKs are used for recovery of missing packets. 1511 If a receiver detects a missing Data packet, it sets to zero the bit 1512 of the HACK bitmap which corresponds to the sequence number of the 1513 missing Data packet. 1515 If the bit flag N of Data packets, NullData Packets, or 1516 Retransmission packets for a data stream is set to 1, then NACKs are 1517 enabled for the data stream. If a receiver detects a missing Data 1518 packet and NACKs are enabled for the stream, then the receiver sends 1519 a NACK based on the NACK generation algorithm. 1521 The HACK and NACK generation algorithms are described later. 1523 5.5.5 Stream failure detection 1525 A receiver detects the failure of a stream by the following 1526 mechanisms: 1528 Timeout Detection If a receiver does not receive any Data, NullData, 1529 or Retransmission packets within an interval 2*F*Tnulldata_max, then 1530 it detects the failure of the data stream 1532 Timestamp Detection If a receiver receives a Data packet whose 1533 TimeStamp value is greater than the current TimeStamp value for the 1534 data stream, then the receiver detects the failure of the stream. 1536 The receiver changes the current TimeStamp value to the TimeStamp 1537 value of the received packet. The receiver discards any Data packet 1538 with a TimeStamp value that is less than the current TimeStamp value. 1540 For example, suppose the current StreamID equals 0x1000 and TimeStamp 1541 equals 0x1234. The receiver receives a new Data packet with StreamID 1542 0x1000 and TimeStamp 0x1300. The new TimeStamp value is higher than 1543 the current value. The receiver detects that the stream has been 1544 abnormally terminated and restarted with the TimeStamp 0x1300. 1546 If a control node receives a HACK packet from a child node for a data 1547 stream which has a higher TimeStamp value than the current TimeStamp 1548 value for the data stream, then the node detects that the data stream 1549 has failed. The control node updates its information for the data 1550 stream with the new TimeStamp value and it ignores all HACK packets 1551 with TimeStamp value less than the new TimeStamp value. 1553 5.5.6 End of Stream 1555 If a receiver receives a confirming EOS packet from its parent for a 1556 data stream, it discards any further Data packets for that data 1557 stream. 1559 5.6 HACK Generation 1561 HACK packets are used to acknowledge received Data packets and to 1562 detect missed packets. The purpose of a HACK is to acknowledge the 1563 packets that are considered stable, and to request retransmission of 1564 packets that have not been received. 1566 The following fields of a HACK packet are used to determine the 1567 status of the received data: 1569 High Sequence Number: If the node is a receiver, the High Sequence 1570 Number is the sequence number of the highest numbered packet that 1571 the node has received. If the node is a control node, then the 1572 High Sequence Number is the sequence number of the highest 1573 numbered packet received by any of its child nodes. 1575 Low Sequence Number: If the node is a receiver, the Low Sequence 1576 Number is the sequence number of the lowest numbered packet that 1577 the node has not yet received. If the node is a control node, 1578 then the Low Sequence Number is the sequence number of the lowest 1579 numbered packet that has not yet been received by any of its child 1580 nodes. 1582 Stable Sequence Number: The stable sequence number is the sequence 1583 number of the highest contiguous stable packet known to the node. 1584 If the node is a receiver, this number is the Low Sequence Number 1585 minus 1. If the node is an aggregator or top node then it is the 1586 aggregated Low Sequence number minus 1. If the node is a 1587 designated receiver then the Stable Sequence Number has different 1588 meaning depending on the HACK mechanism used. 1590 Bitmap: The bitmap consists of a sequence of single bit 1591 "acknowledged" flags, corresponding to the sequence of packets 1592 with sequence numbers between Low Sequence Number and the High 1593 Sequence Number. A bit value 0 indicates that the corresponding 1594 packet is missing and requires retransmission. For reasons of 1595 efficiency, the bit corresponding to Low Sequence Number is at the 1596 bit position Low Sequence Number Modulo 32. The remaining bits 1597 are assigned to consecutive sequence numbers up to High Sequence 1598 Number. Any remaining initial or terminal bits are ignored. 1600 Example for details of HACK 1601 LSN, the Lowest Sequence Number not yet received by this node = 40 1602 HSN, the Highest Sequence Number received by this node = 72 1603 Bitmap[0] = 0xFF7EDC7F 1604 Bitmap[1] = FF800000 1605 Here is the bitmap array written as 0s and 1s: 1607 1 2 3 1608 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1609 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1610 |1 1 1 1|1 1 1 1|0 1 1 1|1 1 1 0|1 1 0 1|1 1 0 0|0 1 1 1|1 1 1 1| 1611 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1612 |1 1 1 1|1 1 1 1|1 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0| 1613 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1615 The bitmap begins at position LSN modulo 32, which is position 8. 1616 Bits 0-7 in Bitmap[0] are ignored. The bitmap extends for HSN - LSN 1617 = 32 positions. The final position is at position 8 of Bitmap[1]. 1618 Bits 9 - 31 of bitmap[1] are ignored. The bitmap indicates that the 1619 following packets are missing: 40, 47, 50, 54, 55, 56. 1621 The receiver nodes and control nodes use different algorithms for 1622 generating HACKs. 1624 5.6.1 Receiver node HACK generation 1626 Congestion caused by bursts of ACK or HACK traffic limits the 1627 scalability of ACK or HACK based protocols. RMTP resolves this 1628 problem by allowing the system to control the level of HACK overhead 1629 by dynamically adjusting the rate of control packets to match the 1630 rate of data packets, and by limiting control packet generation with 1631 the rotating HACK algorithm. 1633 Control rate of HACK 1635 Congestion due to control packet traffic is most likely to occur when 1636 data packets are sent at a high rate and the control channel has 1637 significantly lower bandwidth than the data channel. RMTP-II 1638 controls the rate by controlling the response ratio R = C/D. C is the 1639 number of control packets a parent receives from its children in 1640 response to the number D of data packets received. This ratio is 1641 controlled by limiting the acknowledgments returned by the children. 1643 The Rotating HACK Algorithm 1645 Each receiver is informed of its index number, M, in its parent's 1646 child list from the JoinConfirm packet received when the receiver 1647 joins the data stream. The receiver calculates the value of H 1648 defined by: H = Ceiling(B/R). B and R are tree wide constants. 1650 A receiver only sends a HACK for a packet whose sequence number 1651 Modulo H equals its M number modulo H. The value of H insures that 1652 the receiver child nodes of a single parent send a total of no more 1653 than R HACKs for each Data packet 1655 If a packet is dropped in transit, and the packet would have 1656 generated a HACK for the rotating HACK algorithm, then the first 1657 packet received with a sequence number greater than the missing 1658 trigger packet will cause a HACK to be generated. 1660 5.6.1.1 Example 1662 Assume a parent has B = 6 children, and the desired ratio of HACK 1663 packets received to Data packets sent is R = 2. Then the children 1664 will respond to every H = 6/2 = 3 Data packets. The children are 1665 separated into three groups by their M numbers (Modulo H), 0, 1, 2. 1666 The children with number 0 will respond to sequence numbers that 1667 equal 0 (Modulo 3). The children with number 1 will respond to 1668 sequence numbers that equal 1 (Modulo 3). The children with number 2 1669 will respond to sequence numbers that equal 2 (Modulo 3). The 6 1670 children are partitioned into three response classes of size 6/3 = 2. 1671 There will be B/H = 2 children responding to each Data packet. 1673 Sequence #: 101 102 103 104 105 106 107 108 1674 Sequence #, Mod 3: 2 0 1 2 0 1 2 0 1676 Receiver Index #: 1, 2, 3, 4, 5, 6 1677 Receiver Index (Modulo 3): 1, 2, 0, 1, 2, 0 1679 A small R value makes H large, and decreases the volume of control 1680 traffic. However, since the children do not send HACKs immediately, 1681 the time to reach a stable state is longer. 1683 5.6.1.2 Guaranteed HACK Generation 1685 When data traffic is low, a receiver may not send a packet for a long 1686 time. This could cause long waits for packet stability and could also 1687 make the receiver appear to have failed. The tree wide constant 1688 Thack_max guarantees that receivers respond in a timely manner. 1689 Thack_max specifies the maximum time that can elapse between HACKs 1690 while the stream is active. If the stream is active, a receiver 1691 sends at least one HACK within the interval Thack_max. 1693 The value of the variable Thack dynamically controls the time between 1694 HACKs. Thack varies as the average of the previous two inter-HACK 1695 times, Thack1 and Thack2. The values of Thack1 and Thack2 are 1696 initially set to Thack_max. Thack = Min((Thack1 + Thack2)*C, 1697 Thack_max) where C is a tree-wide constant that determines the 1698 responsiveness of Thack. 1700 The Thack timer is initialized when a receiver receives the first 1701 packet of a data stream. A new Thack value is calculated and the 1702 Thack timer is reset each time the receiver sends a HACK packet. 1704 If the data traffic load is high, H packets are received in time 1705 Thack or less and the rotating HACK algorithm determines the 1706 generation of HACKs. 1708 If the load is low, the Thack timer expires before H packets are 1709 received and Thack determines the generation of HACKs. Under low 1710 load conditions, the value of Thack grows in an approximately 1711 exponential manner to Thack_max. 1713 5.6.2 Control node HACK generation 1715 A control node has two functions in the HACK algorithm, HACK 1716 aggregation and HACK generation. Designated receivers use a modified 1717 HACK mechanism. 1719 Aggregator and top node HACK mechanism 1721 An aggregated HACK from a control node acknowledges that a packet has 1722 gone stable only if all descendants of the control node have received 1723 the packet. The Lowest Sequence Number refers to the lowest numbered 1724 packet that has not been received by at least one of the control 1725 node's child nodes. The Highest sequence number is the highest 1726 numbered packet that has been received by all of the control node's 1727 child nodes. The Stable Sequence Number is the aggregated Lowest 1728 Sequence Number minus 1. The parent bitmap is a logical AND of 1729 corresponding bits of all the child bitmaps. The bitmap indicates the 1730 packets missed by all of the control node's descendents. 1732 Example for HACK aggregation 1733 HACK information from receiver-1 1734 Lowest sequence number = 40 1735 Highest sequence number = 72 1736 Stable sequence number = 39 1737 Bitmap[0] = 0xFF7EDC7F Bitmap[1] = 0xFF800000 1738 missing packets: 40, 47, 50, 54, 55, 56 1740 HACK information from receiver-2 1741 Lowest sequence number = 38 1742 Highest sequence number = 74 1743 Stable sequence number = 37 1744 Bitmap[0] = 0xFDFEDD7F Bitmap[1] = 0xFF600000 1745 missing packets: 38, 47, 50, 54, 56, 72 1747 Aggregated HACK 1748 Lowest sequence number = 38 1749 Highest sequence number = 71 1750 Stable sequence number = 37 1751 Bitmap[0] = 0xFD7EDC7F Bitmap[1] = 0xFF600000 1752 The control node generates a HACK when it receives HACKs from all its 1753 child nodes. To guarantee HACK generation the control nodes sends a 1754 HACK if its Thack timer expires. The algorithm for Thack timer is 1755 identical to the receiver, refer to "Guaranteed HACK generation". 1757 receiver-1: 40 47 50 54 55 56 (72) 1758 receiver-2: 38 47 50 54 56 72 (74) 1759 aggregated: 38 40 47 50 54 55 56 (71) 1761 Packet 38 is missing from receiver-2 and is the lowest missing 1762 packet. If a packet is missing in either bitmap, it is missing in the 1763 aggregated HACK. Packet 71 is the highest packet received by both 1764 receivers. 1766 Designated receiver HACK mechanism 1768 The designated receiver has two kinds of HACK mechanisms: Pessimistic 1769 and Optimistic. 1771 Pessimistic HACK 1773 The pessimistic HACK mechanism is very similar to the aggregator's 1774 HACK mechanism. The Stable Sequence Number is the aggregated lowest 1775 Stable Sequence Number of all the child HACKs. The Lowest Sequence 1776 Number is the lowest missing packet above the Stable Sequence Number 1777 of the designated receiver. The Highest Sequence Number is the 1778 Highest Sequence Number received by the designated receiver. The 1779 bitmap only contains information about the data packets missed by the 1780 designated receiver. 1782 Optimistic HACK 1784 An optimistic HACK acknowledges that a packet is stable if the parent 1785 node receives it. It is the responsibility of the parent node to 1786 insure that every child receives the packet. A node generates an 1787 optimistic HACK based on its own list of received packets and uses 1788 the same rotating HACK algorithm used by the receiver nodes. 1790 With optimistic HACKs, the designated receiver still needs to 1791 aggregate HACKs from its child nodes because it needs to discard 1792 buffered data packets. It discards a packet as soon as all of its 1793 children have received it. 1795 The advantage of the Optimistic HACK mechanism is that the sender 1796 gets stability of packets before the packets are actually stable at 1797 the receivers. This decreases the buffering requirements at the 1798 receiver. The disadvantage of the Optimistic HACK mechanism is the 1799 resulting decrease in fault recovery. 1801 Consider a RMTP tree in which a receiver has a designated receiver as 1802 parent and a designated receiver as the next ancestor node. The 1803 parent node receives all the packets, but the receiver misses a 1804 packet. If the receiver's parent dies before it can retransmit the 1805 missing packet and the receiver rejoins the ancestor node, the 1806 receiver cannot recover the mission packet. This is because the 1807 packet has already gone stable at the ancestor and at the sender 1808 because of the optimistic HACK mechanism. 1810 5.7 Retransmissions in response to a HACK 1812 HACK packets prompt retransmission of missing packets. This 1813 retransmission can be done by the sender or by designated receiver 1814 nodes. 1816 A designated receiver node retransmits any missing packets for which 1817 it receives a request in a HACK and for which it is currently holding 1818 the requested packets. The retransmissions are multicast on the 1819 local control channel. When a designated receiver fulfills a 1820 retransmission request, it deletes that request from the request bit 1821 field in the HACK to prevent any of its parents from duplicating the 1822 retransmission. 1824 When a designated receiver retransmits a packet, it calculates a 1825 minimum period of time, Tmin_retransmit, before another 1826 retransmission request will be honored for this packet. The value of 1827 Tmin_retransmit is stored at the designated receiver along with the 1828 packet's sequence number and a count of the number of times the 1829 packet has been retransmitted. A request for retransmission which 1830 arrives at a designated receiver during the period of time 1831 Tmin_retransmit is ignored. 1833 Tmin_retransmit is calculated as the local RTT of the control channel 1834 multiplied by 2 to the power of the number of previous 1835 retransmissions of that packet. Tmin_retransmit can never exceed 1836 Tmax_retransmit, a fixed system constant. 1838 If the number of retransmissions reaches RXmax, the sender or a 1839 designated receiver will ignore further requests for retransmission. 1840 The designated receiver will mark the packet as stable. The sender 1841 will terminate the stream. The receiver will eventually get a Data 1842 Fail Indication. 1844 5.8 NACK Generation 1846 A negative acknowledgment, or NACK, is an explicit request by a 1847 receiver or a designated receiver to its parent node for a 1848 retransmission of missed data. Networks that operate under low loss 1849 conditions can use NACKs, combined with reduced HACK traffic, to get 1850 faster recovery of lost Data packets with less control traffic 1851 overhead. 1853 A NACK is a HACK packet with the NACK bit, N, set to 1. NACKs are 1854 allowed only for NACK enabled data streams. 1856 After detecting a missing packet, a receiver waits for a random time 1857 Z before generating a NACK. The time, Z, is random variable having a 1858 truncated exponential distribution on the interval [0, T]. Z is 1859 obtained from a uniformly distributed random variable, X, on the [0, 1860 1] by the formula: 1862 Z = T/b1*Ln(b2*X - 1) 1864 where 1865 N = 3, This is the expected number of NACKs, if all the Children miss 1866 the packet. 1867 T = (0.6)*RTT*b1/Ln(N) 1868 b1 = Ln(B) + 1 1869 b2 = Exp(b1) -1 1870 RTT is the local round trip time to the parent 1872 This backoff policy is discussed in [NB98], which also shows that 1873 this policy has a better response time than a uniformly distributed 1874 random variable and quantifies how the average latency varies as a 1875 function of the average worst case number of responses, N. 1877 If a receiver receives the missing Data packet before the NACK timer 1878 expires, it cancels its NACK. This reduces the chance that multiple 1879 receivers generate a NACK for the same packet and suppresses NACKs. 1881 5.9 NACK Response 1883 When a designated receiver receives a NACK, it immediately multicasts 1884 the retransmission on the control channel, unless it suppresses 1885 retransmission due to a recent HACK or NACK. Retransmissions due to 1886 a NACK are suppressed the same way as retransmissions due to a HACK. 1888 If a designated receiver gets a NACK requesting some packets that it 1889 has and some packets that it does not have, it clears the 1890 retransmission request bits for the packets it has, forwards the NACK 1891 to its parent. It retransmits the packets that is has available. If 1892 some packets are unavailable, the designated receiver sends a 1893 Heartbeat with the NACK Suppression Option to its children. The NACK 1894 Suppression option suppresses NACKs from the designated receiver's 1895 children. A single NACK Suppression packet can cover multiple 1896 packets. 1898 If an aggregator node receives a Heartbeat packet from its parent 1899 with the NACK Suppression option, it sends a Heartbeat packet to its 1900 children with bits set for all requested Data packets. 1902 When a designated receiver receives retransmissions from its parent, 1903 it forwards these retransmissions to its children. 1905 A top node or aggregator that receives a NACK acts just like a 1906 designated receiver that does not have the packets for 1907 retransmission. The sender for that stream is treated as the parent 1908 of the top node. 1910 5.10 Congestion Control 1912 The aggregators and designated receivers at the lowest level of the 1913 tree calculate the receiver loss averages based on the HACKs 1914 received. Using the HACK bitmap, the control node calculates the loss 1915 rate. Assuming the HACK bitmap covers 50 packets and 5 of the packets 1916 are missing, the loss rate for that receiver will be 10%. The control 1917 node reports the maximum of its children's loss rates to its parent. 1918 Finally, the top node passes the loss rate information to the sender. 1920 Congestion control is a required component of RMTP-II. See Appendix - 1921 Congestion Control for more detail. 1923 5.11 Failure Detection and Recovery 1925 The Heartbeat, HeartbeatResponse and NullData packets provide the 1926 primary failure detection mechanism in RMTP-II. 1928 5.11.1 Fault tolerant top node 1930 RMTP-II provides a fault-tolerant top node capability by allowing the 1931 configuration of a backup top node. The primary function of a backup 1932 top node is to monitor the top node's Heartbeat packets. If the top 1933 node fails, the backup top node takes over the responsibilities of 1934 the top node. 1936 A backup top node is created administratively when configuring the 1937 top node and the backup top node. The backup top node is configured 1938 as a top node. It joins with StreamID set to zero. The backup top 1939 node receives the top node's Heartbeat packets. The backup top node 1940 sends HeartbeatResponse packets to the top node to enable the top 1941 node to detect the failure of the backup top node. 1943 The top node sends the backup top node's address in the Heartbeat 1944 packets to it child nodes. An address of zero for the backup top 1945 node indicates that there is no backup top node available. 1947 5.11.2 Top Node Failure Detection 1949 The top node sends Heartbeat packets on its multicast control channel 1950 at time intervals Thb. If a child node does not receive a Heartbeat 1951 packet from the top node for a period of 2*F*Thb, it declares the top 1952 node failed. 1954 If the backup top node does not receive a Heartbeat for an interval 1955 F*Thb, then it declares the top node failed. 1957 If a child of the top node does not receive any Heartbeat from the 1958 top node for an interval 2*F*Thb, it detects the failure of the top 1959 node and the failure of the backup top node. The child node declares 1960 the top node failed and, because no backup exists, the child reports 1961 the parent unreachable and fails. 1963 5.11.3 Top Node Failure Recovery 1965 The backup top node detects the top node's failure within an interval 1966 that is half the interval required by the other children of top node. 1967 The backup top node takes over the top node's responsibility and 1968 sends Heartbeat packets on the top node's multicast control channel. 1970 When the top node's children start to receive the backup top node's 1971 Heartbeat packets, they validate the backup top node's identity and 1972 send a JoinStream request with rejoin flag set to 1 to reconnect to 1973 the backup top node. The backup top node acquires information about 1974 the children from their JoinStream requests. 1976 5.11.4 Parent Node Failure Detection by a Child 1978 Each control node sends Heartbeat packets on its multicast control 1979 channel every interval Thb. All the children of the control node 1980 receive the Heartbeat packets. If the child does not receive any 1981 Heartbeat from the parent in an interval F*Thb, it declare the parent 1982 failed. 1984 5.11.5 Node Failure Detection by a Parent 1986 A parent node receives its child node's HACKs and HeartbeatResponse 1987 packets. A child node sends either a HACK or a HeartbeatResponse 1988 packet every interval F*Thb. If a parent fails to receive a packet 1989 from a receiver child node within an interval 3*F*Thb, it detects the 1990 receiver's failure. 1992 If a parent fails to receive a packet from a control node child 1993 within an interval 6*F*Thb, it detects the failure of the control 1994 node child. 1996 If a the top node fails to receive a HeartbeatResponse from the 1997 backup top node for a period of 6*F*thb, it detects the failure of 1998 the sender node. 2000 The failure detection interval for a control node or sender node is 2001 longer than that of a receiver node because these nodes reside in 2002 controlled environments and the probability of their failure is much 2003 less than that of a receiver node. 2005 5.11.6 Child Node Failure Recovery 2007 If a control node detects the failure of a child, it sends an Eject 2008 packet with the Reason Code set to 1 to indicate that there was no 2009 response from the child. 2011 A control node may fail and restart before its children can detect 2012 the failure. If a control node receives a HACK or HeartbeatResponse 2013 packet from a child node that is not in its child list, it sends an 2014 Eject packet with Reason Code 2 to indicate that the parent 2015 restarted. 2017 If a child node has information about other control nodes, it can try 2018 to reconnect to an alternate parent node. The child node makes a 2019 JoinStream request with the rejoin flag set to 1 to join an alternate 2020 parent node. The child node picks a random parent node from the list 2021 of parents. 2023 If a child node receives an Eject packet from its parent, it must 2024 send a JoinStream request to establish its state at the parent. 2025 There is no guarantee that the JoinStream will succeed. If the 2026 parent rejects the JoinStream request, the child node can try to 2027 reconnect to an alternate parent node. 2029 5.11.6.1 Example 2030 Thb = 1 sec. 2031 F = 3 2032 Control node Heartbeat interval = 1 sec. 2033 Control node's failure detection interval = 3 sec. 2034 Receiver's HeartbeatResponse interval = 3 sec. 2035 Receiver child node's failure detection interval = 9 sec. 2036 Control child node's failure detection interval = 18 sec. 2037 If a control node fails, its child receiver nodes detect the parent's 2038 failure in 3 sec. The control node's parent detects the control 2039 node's failure in 18 sec. The receivers have 15 sec. to connect to 2040 an alternate parent node before losing their connection to the RMTP- 2041 II tree. 2043 5.12 Control tree discovery 2045 A Heartbeat packet from a control node may optionally include an 2046 ancestor list, a peer list, and a child list. 2048 An ancestor list contains the address of the control node's parent 2049 and the intervening parent nodes up to the top node. The list is 2050 ordered from the top node to the issuing node's parent. 2052 A peer list contains the addresses of the control node and its 2053 siblings under the parent node. 2055 A children list contains the addresses of the child nodes of the 2056 current node. 2058 A control-node only provides the addresses of the control child nodes 2059 from its children list. 2061 (Top Node) TN | Heartbeat from TN: Tree Info 2062 /^\ | ancestors: 2063 / | \ | peers: 2064 / | \ \ / children: AN1, AN2, AN3 2065 / | \ 2066 AN1 AN2 AN3 | Heartbeat from AN1: Tree Info 2067 /|\ | ancestors: TN 2068 / | \ | peers: AN1, AN2, AN3 2069 / | \ \ / children: AN4, AN5, AN6 2070 / | \ 2072 AN4 AN5 AN6 | Heartbeat from AN4: Tree Info 2073 | | ancestors: TN, AN1 2074 | | peers: AN4, AN5, AN6 2075 | \ / children: 2076 RN 2078 The top node, TN, is at the root of the tree and has no ancestor list 2079 and no peer list. The top node's children list contains the addresses 2080 of AN1, AN2, and AN3. 2082 The node AN1 receives the Heartbeat from the top node, TN, and 2083 receives: - an ancestor list with no entries, indicating its parent 2084 is 2085 the top node 2087 - a peer list with no entries, confirming that its parent is 2088 the top node 2090 - a Children list which contains the addresses of AN1 and its 2091 peers. 2093 The node AN1 sends the following lists in the Heartbeat packet to its 2094 child nodes: 2096 - an ancestor list containing the address of its parent, TN 2098 - a peer list containing the addresses of it and its siblings, AN 2100 - a children list containing the address of AN4, AN5, and AN6 2102 The node AN4 receives the following lists in the Heartbeat packet 2103 from AN1: 2105 - an ancestor list containing the address of TN 2107 - a peer list containing the addresses of AN1 and its peers. If AN1 2108 fails then AN4 will try to join one of its parent's peers AN2 and 2109 AN3. 2110 If the join is not successful, AN4 will try to connect to TN. 2112 - a children list containing the addresses AN4 and its siblings 2114 The node AN4 sends the following lists in the Heartbeat packet to its 2115 child nodes: 2117 - an ancestor list containing the addresses of TN, AN1, and AN4 2119 - a peer list containing the addresses of AN4 and its siblings, AN5 2120 and AN6 2122 - a children list with no entries; only control children are revealed 2124 The receiver node, RN, receives the following lists in the Heartbeat 2125 packet from AN4: 2127 - an ancestor list containing the addresses of TN, AN1, and AN4 2129 - a peer list containing the addresses of AN4 and its peers, AN5 and 2130 AN6 2132 - a children list with no entries 2134 If AN4 fails, RN can try to join its peers, AN5 or AN6. If AN4's 2135 peers don't respond, RN can attempt to the join AN4's parent. 2137 5.13 Control Tree Round Trip Times 2139 RMTP provides an optional mechanism to get the round trip time 2140 (HopRTT), between a node and its children, the round trip time from a 2141 node to its bottom-most descendent (TotalRTT-down), and the round 2142 trip time from a node to the top node (TotalRTT-up). The RTT Option 2143 is used in Heartbeat and HeartbeatResponse packets to calculate and 2144 propagate these RTTs. 2146 A child node gets information about HopRTT and TotalRTT-up from the 2147 Heartbeats of its parent. A parent node gets TotalRTT-down from the 2148 HeartbeatResponse of its children. 2150 5.13.1 Algorithm for RTT 2152 Each node uses the following values: 2154 ResponseDelay: This is the time difference, in milliseconds, between 2155 the time that a Heartbeat packet was received from a parent and 2156 the time the HeartbeatResponse packet sent to the parent. The 2157 child node sends this information in HeartbeatResponse to its 2158 parent. 2160 HopRTT: This is the maximum round trip time from a parent node to its 2161 children. The parent node calculates this from the TimeStamp and 2162 reception time of HeartbeatResponse. The children receive this 2163 from the parent in a Heartbeat packet. 2165 TotalRTT-down: This is the node's RTT to its most distance 2166 descendent. 2168 TotalRTT-up: This is the node's RTT to the top node. 2170 A parent node sends a first Heartbeat packet with the RTT option to 2171 its child nodes to calculate the RTT. It sets the value of TimeStamp 2172 in the RTT option to the sending time. Other fields of the RTT option 2173 are set to 0. 2175 The parent node receives HeartbeatResponse packets from its children 2176 with the following fields: 2178 TimeStamp: This is the original Heartbeat TimeStamp value. 2180 ResponseDelay: This is the time difference, in milliseconds, between 2181 the time the child received the Heartbeat packet and the time it 2182 sent the HeartbeatResponse packet. 2184 HopRTT: This field is 0 in HeartbeatResponse packets. 2186 TotalRTT-down: This is the total round trip time from the child to 2187 its most distant descendent. TotalRTT equals 0 for a node with no 2188 children. 2190 The parent node calculates the round trip time for each of its 2191 children: 2193 round trip time = reception time - TimeStamp - ResponseDelay 2195 The parent node sets the value of HopRTT: 2197 HopRTT = the maximum of child round trip times. HopRTT = 0 for a node 2198 with no children. 2200 The parent node calculates the distance to its most distance 2201 descendent: 2203 TotalRTT-down = maximum of the RTT to each child plus the TotalRTT- 2204 down values to that child. TotalRTT-down equals 0 for nodes with no 2205 children. 2207 The node is also the child of its parent and receives Heartbeat 2208 packets from it parent. If the Heartbeat has the RTT option and the 2209 TimeStamp value is non-zero, then the parent is requesting 2210 information to calculate the RTT. The node returns a 2211 HeartbeatResponse with the following values: 2213 TimeStamp: This is the parent Heartbeat TimeStamp value. 2215 ResponseDelay: This is the difference between the time the node 2216 received the Heartbeat and the time it sends the 2217 HeartbeatResponse. 2219 HopRTT: This field is 0 in HeartbeatResponse packets. 2221 TotalRTT-down: This is the total round trip time from the node to its 2222 most distant descendent, described above. 2224 If the Heartbeat has the RTT option and the TimeStamp value is zero, 2225 then the parent is providing the RTT information. The HopRTT and the 2226 TotalRTT-up fields have the relevant information. The node calculates 2227 its own TotalRTT-up value: 2229 TotalRTT-up = HopRTT plus parent's TotalRTT-up 2231 The node sends the new information in the next Heartbeat packet it 2232 sends. The following information is set in the RTT option of the 2233 Heartbeat packet. 2235 TimeStamp: This is set to 0. 2237 ResponseDelay: This is set to 0. 2239 HopRTT: This is the maximum of the round trip times from the parent 2240 to its children. 2242 TotalRTT-up: This is the parent's RTT to the top node, determined 2243 from Heartbeats from its parent, described above. TotalRTT-up 2244 equal 0 for the top node 2246 6 RMTP-II Packet Formats 2248 This section contains the packet formats for the RMTP-II packet 2249 types. Each field of the packet is shown and briefly discussed. The 2250 packet formats assume a 32-bit addressing scheme. 2252 6.1 RMTP-II Fixed Header 2254 1 2 3 2255 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2256 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2257 | VER |O Num|Res| TYPE | RMTP-II Tree ID | 2258 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2259 | RMTP-II Tree ID | 2260 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2261 | Data or Optional Fields | 2262 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2264 VER: Specifies the version number of the protocol, the first field 2265 to be checked by a receiver 2267 O: a number that specifies the number of options present in this 2268 packet. If no options are present, then this field is set to 0. 2270 TYPE: Packet type. 2271 Value Meaning 2272 0 Reserved 2273 1 Data 2274 2 Retransmission 2275 3 HACK 2276 4 JoinStream 2277 5 JoinAck 2278 6 JoinConfirm 2279 7 LeaveStream 2280 8 Heartbeat 2281 9 NullData 2282 10 Eject 2283 11 EOS 2284 12 HeartbeatResponse 2285 13 LeaveConfirm 2286 14-255 Reserved for Future Use 2288 Tree ID: Tree ID is a six-byte identifier of the RMTP-II tree. 2289 Currently Tree ID consists of the four byte IP address and a two 2290 byte UDP port of the top node. 2292 6.2 Data 2294 1 2 3 2295 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2296 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2297 | Sequence Number | 2298 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2299 | Last Stable Packet | 2300 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2301 | TimeStamp | 2302 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2303 | StreamID |N|E| Reserved | QOS | 2304 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2305 | Data Length | Data | 2306 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2308 Sequence Number: A sender stamps each Data packet that it sends with 2309 a sequence number. Sequence numbers are 32 bits in length and 2310 have a valid range of 1 to 2^32-1. RMTP-II does not use sequence 2311 number 0 and it is always ignored. Because this range is finite, 2312 all arithmetic and comparison with these numbers is Modulo 2^32. 2313 RMTP-II requires that no more than 2^31 packets can be created 2314 over a period equal to the maximum lifetime for a datagram packet 2315 in the network. This condition holds true when IP is used as the 2316 datagram service. 2318 Last Stable Packet: This field contains the sequence number of 2319 highest numbered stable packet at the sender. 2321 TimeStamp: This is a 32 bit time stamp with one second granularity 2322 used to indicate the time when the stream was started. 2324 StreamID: This is a 16 bit ID which uniquely identifies the stream. 2326 N: NACK enabled. This is a flag that is set to 1 if NACKs are 2327 enabled for this stream. 2329 E: This is a flag that is set to 1 to indicate the last packet of 2330 the stream. 2332 QOS: This specifies the desired quality of service for the Data 2333 packet. The valid values for the QoS field are: 2334 2 Unordered 2335 3 Source Ordered 2336 4 Time bound Reliability 2338 Data Length: This is the length of the data in bytes. 2340 Data: This holds the data to be delivered. 2342 6.3 Retransmission Packet 2344 1 2 3 2345 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2346 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2347 | Sequence Number | 2348 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2349 | Last Stable Packet | 2350 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2351 | TimeStamp | 2352 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2353 | StreamID |N|E|D|S| Reserved | QOS | 2354 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2355 | Data Length | Data | 2356 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2358 Sequence Number: This is the sequence number of the original data 2359 packet. 2361 Last Stable Packet: This field contains the sequence number of the 2362 last stable packet at the sender. 2364 TimeStamp: This is a 32 bit time stamp with one second granularity 2365 used to indicate the time when the stream was started. 2367 StreamID: This is a 16 bit ID which uniquely identifies the stream. 2369 N: This is a flag that is set to 1 if NACKs are enabled for this 2370 stream 2372 E: This is a flag that is set to 1 to indicate the last packet of a 2373 stream. 2375 D: This a flag that is set to 1 for a retransmission packet from a 2376 designated receiver. 2378 S: This flag is set to 1 to enable subtree multicast for 2379 retransmission. If the flag is set to 1, an intermediate control 2380 node will not forward the packet to its child nodes. 2382 QOS: This is a number that specifies the desired QoS for the Data 2383 packet. The valid values for the QoS field are: 2384 2 Unordered 2385 3 Source Ordered 2386 4 Time bound Reliability 2388 Data Length: This is the length of the data in bytes. 2390 Data: Data to be delivered. 2392 6.4 HACK 2394 1 2 3 2395 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2397 | TimeStamp | 2398 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2399 | Mcast Group Address | 2400 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2401 | Mcast Port | StreamID | 2402 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2403 | Child Index |E|N| Reserved | 2404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2405 | HACK Sequence Number | 2406 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2407 | High Sequence Number | 2408 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2409 | Low Sequence Number | 2410 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2411 | Stable Sequence Number | 2412 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2413 | Bitmap length | Receiver Count | 2414 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2415 | Bitmap | Padding | 2416 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2418 TimeStamp: This is a 32 bit time stamp with one second granularity 2419 used to indicate the time when the stream was started. 2421 McastGroup Address: This is the Multicast group address for this 2422 stream. 2424 Mcast Port: This is the UDP port for this stream. 2426 StreamID: This is a 16 bit ID which uniquely identifies the stream. 2428 Child Index: This is the index number of the child node in its 2429 parent's child list. 2431 E: This flag is set to 1 for the last HACK, to indicate all packets 2432 have been received. 2434 N: This flag is set to 1 to indicate that this is a NACK packet. 2436 HACK Sequence Number: This is the sequence number of the HACK 2437 packet. 2439 High Sequence Number: This is the sequence number of the highest 2440 numbered packet received. 2442 Low Sequence Number: This is the sequence number of the lowest 2443 numbered missing packet. 2445 Stable Sequence Number: The stable sequence number is the sequence 2446 number of the highest contiguous stable packet known to the node. 2448 Bitmap Length: This is the length of the bitmap in 32 bit words. 2450 Receiver Count: This is the number of receivers connected to this 2451 node. 2453 Bitmap: The bitmap of the missing packets between Low Sequence 2454 Number and High Sequence Number; starts at bit position Low 2455 Sequence Number modulo 32; Leading and trailing bits are ignored. 2457 6.5 JoinConfirm 2459 1 2 3 2460 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2461 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2462 | Child Index | Role | |C|R| HB_TTL | 2463 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2464 | Parent Heartbeat address | 2465 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2466 | Parent Heartbeat Port | Overhead | 2467 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2468 | Sequence No | Number of Senders | 2469 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2470 | Sender 1 Last Stable | 2471 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2472 | Sender 1 TimeStamp | 2473 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2474 | Sender 1 StreamID | Reserved | 2475 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2476 | Sender 2 Last Stable | 2477 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2478 | Sender 2 TimeStamp | 2479 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2480 | Sender 2 StreamID | Reserved | 2481 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2483 Child Index: This is the index of the child node in its parent's 2484 child list. 2486 Role: This identifies or acknowledges the role of the parent. 2488 C: This flag is set to 1, if the parent accepts the child, or to 0 2489 if the parent rejects the child. 2491 R: This flag is set to 1 to indicate confirmation for a Rejoin 2492 packet. 2494 HB_TTL: This is the time-to-live for the Heartbeat packet. 2496 Parent Heartbeat Address: This is the multicast address for the 2497 parent's local control channel where Heartbeats are sent. 2499 Parent Heartbeat Port: This is the UDP port for the parent's local 2500 control channel where Heartbeats are sent 2502 Overhead: This is the response constant that specifies the number of 2503 HACKs that the answering parent expects for each Data packet. 2505 Sequence No.: This is the sequence number of the JoinStream packet. 2507 Number of Senders: This is the number of known senders. 2509 Sender Last Stable: This is the number of the last stable packet for 2510 the data stream. 2512 Sender TimeStamp: This is a 32 bit time stamp with one second 2513 granularity used to indicate the time when the stream was started. 2515 StreamID: This is the identifier of the stream. 2517 6.6 JoinStream 2519 1 2 3 2520 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2521 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2522 | Orig. TTL |R| Res | Role | Res | 2523 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2524 | Sequence No | Number of Senders | 2525 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2526 | Sender 1 StreamID | Sender 1 Mcast Port | 2527 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2528 | Sender 1 Mcast Addr | 2529 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2530 | Sender 2 StreamID | Sender 2 Mcast Port | 2531 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2532 | Sender 2 Mcast Addr | 2533 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2534 |... 2536 Orig. TTL: This is the original TTL sent from the originator. This 2537 field allows receiving node to assess roughly the distance from 2538 the source. 2540 R: This flag is set to 1 to indicate a rejoin request. 2542 Role: This specifies the role of the joining node in the RMTP tree. 2544 Sequence No: This is the number of times the same request has been 2545 sent. 2547 Number of Senders: This is the number of known senders. 2549 Sender StreamID: This is the stream identifier for the sender's data 2550 stream. 2552 Sender Mcast Port: This is the multicast port for the sender's data 2553 channel. 2555 Sender Mcast Addr: This is the multicast address for the sender's 2556 data channel. 2558 6.7 JoinAck 2560 1 2 3 2561 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2562 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2563 | Orig. TTL | Res | Sequence Number | 2564 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2565 | Parent Heartbeat address | 2566 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2567 | Parent Heartbeat Port | HB_TTL | Reserved | 2568 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2570 Orig. TTL: This is the original TTL sent from the originator. This 2571 field allows a receiving node to roughly assess the distance from 2572 the source. 2574 Sequence No.: This indicates how many times the same packet has been 2575 sent. 2577 Parent Heartbeat Address: This is the Heartbeat and the parent 2578 control multicast address. 2580 Parent Heartbeat Port: This is the Heartbeat and the parent control 2581 UDP port. 2583 HB_TTL: This is the time-to-live for the Heartbeat packet. 2585 6.8 LeaveStream 2587 1 2 3 2588 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2589 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2590 | Orig. TTL | Sequence No | Role | Res | 2591 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2592 | StreamID | Mcast Port | 2593 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2594 | Mcast Group Address | 2595 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2596 Orig. TTL: This is the original time-to-live sent from the 2597 originator. 2599 Sequence no.: This indicates how many times the same packet has been 2600 sent. 2602 Role: This is the role of the node sending the leave packet. 2604 StreamID: This is the ID of the stream. 2606 Mcast UDP Port: This is the multicast port of this stream. 2608 McastGroup Address: This is the multicast group address of this 2609 stream. 2611 6.9 Heartbeat 2613 1 2 3 2614 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2615 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2616 | Parent Address | 2617 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2618 | Parent Port |N|T| Reserved | Role | 2619 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2621 Parent Address: This is the IP address of the parent. 2623 Parent Port: This is the UDP port number of the parent. 2625 N: This flag is set to 1 to indicate that the node must respond 2626 immediately. 2628 T: This flag is set to 1 to indicate that the backup top node is 2629 taking over as the top node for the tree. 2631 Role: This is the role of the parent in the RMTP tree. 2633 6.10 HeartbeatResponse 2635 1 2 3 2636 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2637 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2638 | Role | Reserved | 2639 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2640 | Child Identifier | 2641 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2643 Role: This is the role of the node in the RMTP tree. 2645 Child Identifier: This is a unique identifier for the child, the 2646 StreamID for a sender node, the Child Index for other nodes 2648 6.11 NullData 2650 1 2 3 2651 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2652 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2653 | Sequence Number | 2654 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2655 | Last Stable | 2656 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2657 | TimeStamp | 2658 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2659 |N| Reserved | StreamID | 2660 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2662 Sequence Number: This is the sequence number of the last Data packet 2663 sent. 2665 Last Stable: This is the sequence number of last stable Data packet 2666 at the Sender. 2668 TimeStamp: This is a 32-bit time stamp with one second granularity 2669 used to indicate reincarnation of the sender. 2671 N: This flag is set to 1 to enable NACKs for this stream. 2673 StreamID : This is the 32-bit stream identifier. 2675 6.12 Eject 2677 1 2 3 2678 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2679 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2680 | Reason Code | Reserved | 2681 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2683 Reason Code: This code indicates the reason for sending the eject 2684 packet 2686 6.13 EOS 2688 1 2 3 2689 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2690 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2691 | TimeStamp | 2692 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2693 | Mcast Group Address | 2694 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2695 | Mcast Port | StreamID | 2696 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2698 TimeStamp: This is a 32-bit time stamp with one second granularity 2699 used to indicate reincarnation of the sender. 2701 McastGroup Address: This is the multicast group address for this 2702 stream. 2704 Mcast Port: This is the UDP port number for this stream. 2706 StreamID: This is the unique 32-bit stream identifier. 2708 6.14 LeaveConfirm 2710 1 2 3 2711 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2712 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2713 | Sequence No | Reserved | StreamID | 2714 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2716 Sequence No.: This is the sequence number of LeaveConfirm packet. 2718 StreamID: This is the unique 32-bit stream identifier. 2720 6.15 RMTP-II Options Header 2722 RMTP-II Options take a general form for their first 32 bits. That 2723 form is: 2725 1 2 3 2726 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2727 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2728 | A | OTYPE | LENGTH | Option Specific Format | 2729 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2731 A: a 2-bit flag that indicates how the packet is processed if the 2732 option is not 2733 supported by the implementation. 2735 00: ignore the option, but process the packet 2736 01: ignore the packet 2737 10: leave the RMTP tree 2738 11: reserved 2740 The flag A is set to 00 to indicate that the option is to be 2741 skipped and the packet processed. A is set to 01 to indicate that 2742 the whole packet is to be discarded. A is set to 10 to indicate 2743 that this is an important option which modifies the functional 2744 characteristics of the RMTP protocol and so the node must exit the 2745 RMTP tree. 2747 OTYPE: A number that indicates the option type. The predefined values 2748 of OTYPE are: 2749 1 Sender Forward 2750 2 FEC 2751 3 Control tree discovery 2752 4 Global parameter 2753 5 fault tolerant backup top node information 2754 6 congestion control information 2755 7 Round-trip Time for control tree 2756 8 Time bounded reliability 2757 9 Sender status Update Option 2758 10 Update Sender Parameters Option 2759 11 NACK Suppression Option 2760 12-63 reserved for future use 2762 LENGTH: Specifies the length of this option block in 32 bit words. 2764 6.15.1 Sender Forward Option 2766 This option allows a sender which does not have multicast capability, 2767 to forward data to the top node which will multicast the data to the 2768 tree. 2770 1 2 3 2771 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2772 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2773 | A | OTYPE | LENGTH | RESERVED | 2774 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2776 A: set to 00 ignore option, process packet 2778 OTYPE: set to 1 2780 LENGTH: set to 1 2782 6.15.2 FEC Option 2784 See Appendix D for a description of the Forward Error Correction 2785 option. 2787 6.15.3 RMTP tree control node Info Option 2789 A control node may optionally include lists of control node addresses 2790 in its Heartbeat packet. 2792 1 2 3 2793 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2794 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2795 | A | OTYPE | LEN | reserved | 2796 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2797 | parent nodes | peer nodes | child nodes | reserved | 2798 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2799 | Node Address | 2800 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2801 | node port | 2802 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2803 | Node Address | 2804 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2805 | node port | 2806 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2807 +-+-+-+- +-+-+-+-+ 2808 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2809 | Node Address | 2810 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2811 | node port | 2812 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2814 A: set to 00 ignore option, process packet 2816 OTYPE: 3 2818 LEN: This is the number of 32-bit words in the option packet. 2820 Ancestor Nodes: This is the number of address-port pairs in 2821 ancestor's address list. The addresses are ordered from the top node 2822 down to the current node's parent. If the number is zero then this 2823 node does not have any parents and must be the top node. 2825 Peer Nodes: This is the number of addresses-port pairs in the peer 2826 address list. The peers list contains the addresses of the control 2827 node children of this node's parent. The top node has zero peer 2828 nodes. 2830 Child Nodes: This is the number of address-port pairs in the child 2831 control node list of this node. 2833 Node Address: These entries contain the IP address of the listed 2834 nodes. 2836 Node Port: The port number of the listed nodes 2838 6.15.4 Global Parameter Modifications Option 2840 This option enables modifications to the global parameters of the 2841 RMTP tree. This option is available for JoinConfirm, Heartbeat, and 2842 HeartbeatResponse packets. 2844 1 2 3 2845 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2846 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2847 | A | OTYPE | LENGTH | sequence number | 2848 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2849 | Branching factor | Thack constant | 2850 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2851 | RMax | R | 2852 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2853 | Thack_max | Tjoin_response | 2854 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2855 | Rjoin | Theartbeat | 2856 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2857 | Fthreshold | Tnulldata_max | 2858 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2859 |O| Reserved | 2860 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2862 A: set to 10 Exit the RMTP tree, this option modified the 2863 behavior of the timings. 2865 OTYPE: 4 2867 LENGTH: 6 (6 32-bit words) 2869 Sequence Number: This is the sequence number that identifies this 2870 update. This number is set by the top node. All the nodes of the RMTP 2871 tree use this sequence number to determine if this is a duplicate 2872 packet. The Sequence Number is the only field when the option is use 2873 in a HeartbeatResponse packet. 2875 Branching factor: The branching factor B denotes the maximum number 2876 of children for any node of the tree. 2878 Thack constant: The Thack constant is a scaling coefficient for 2879 Thack. 2881 R: This is a real number which specifies the number of HACKs which 2882 should be received, on average, for each Data packet sent, at any 2883 node. 2885 RxMax: This is the maximum number of retransmissions allowed for a 2886 data packet. 2888 Thack_max: This number specifies the maximum time allowed between 2889 HACK transmissions for each receiver. 2891 Tjoin_response: This number specifies the maximum time to wait for a 2892 response to a JoinStream request from the parent node. The response 2893 should be either a JoinAck or a JoinConfirm. 2895 Rjoin: This number specifies the number of retries of the JoinStream 2896 request before declaring the parent unreachable. 2898 Theartbeat: This number specifies the time interval Thb at which 2899 control nodes multicast Heartbeat packets. 2901 Fthreshold: This number specifies the threshold time for failure 2902 detection. 2904 Tnulldata_max: This number specifies the maximum time interval for 2905 sending NullData packets. If the receiver does not receive any data 2906 packet from the sender within 2*Tnulldata_max, it detects a sender 2907 failure. 2909 O: This is a bit flag that is set to 1 to indicate that all 2910 designated receivers should use the optimistic HACK mechanism. The 2911 flag is set to 0 to indicate the pessimistic HACK mechanism. 2913 6.15.5 Fault Tolerant Top Node Option 2915 This option enables a backup top node. 2917 1 2 3 2918 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2919 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2920 | A | OTYPE | LENGTH | Backup top node UDP Port | 2921 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2922 | Backup top node IP Address | 2923 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2925 A: set to 00 skip option, process packet 2926 set to 01 discard packet 2928 OTYPE: set to 5 fault tolerant backup top node information 2930 Lenth: set to 2 two 32-bit words in the packet 2932 Backup top node UDP Port: 2933 Set to 0 no backup top node 2934 Set to UDP port of backup top node, if the backup top node exists 2936 Backup top node IP Address: 2937 Set to 0 no backup top node 2938 Set to IP Address of backup top node, if the backup top node exists 2940 6.15.6 LTRC Congestion Control Option 2942 This option enables transmission of loss rate for congestion control. 2944 1 2 3 2945 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2946 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2947 | A | OTYPE | LENGTH | LossRate | 2948 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2950 A: set to 00 skip option, process packet 2951 set to 01 discard packet 2953 OTYPE: set to 6 LTRC Congestion Control 2955 LENGTH: set to 1 one 32-bit field 2957 LossRate: Set to loss rate value * 100. If the loss rate is , say, 2958 10.34%, then LossRate will be 1034 (10.34*100). 2960 6.15.7 Control Tree Round Trip Time (RTT) 2962 This option is used in heartbeat and HeartbeatResponse packets to 2963 calculate the roundtrip time and to propagate the RTT to all the 2964 nodes in the RMTP tree. 2966 1 2 3 2967 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2968 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2969 | A | OTYPE | LENGTH | Reserved | 2970 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2971 | TimeStamp[0] | 2972 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2973 | TimeStamp[1] | 2974 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2975 | ResponseDelay | 2976 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2977 | HopRTT | 2978 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2979 | TotalRTT | 2980 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2982 A: set to 00 skip option, process packet 2984 OTYPE: set to 7: Round Trip Time (RTT) 2986 LENGTH: set to 6 2988 TimeStamp: Set to the current time when a control node wants to 2989 calculate the RTT. When a child node receives a Heartbeat with this 2990 field set to a non-zero value, it sets the RTT option in the next 2991 HeartbeatResponse it sends. If this field is set to zero, then the 2992 control node is providing HopRTT-up and TotalRTT-up to its child 2993 nodes. 2995 ResponseDelay: This field is set only in HeartbeatResponse packets. 2996 ResponseDelay is difference, in milliseconds, between the time the 2997 heartbeat packet was received and the time the HeartbeatResponse is 2998 sent. 3000 HopRTT: The maximum round trip time from a parent node to child 3001 nodes. This is set only in heartbeat packets. 3003 TotalRTT: If the packet is a heartbeat, then this field is TotalRTT- 3004 up, the round trip time from top node to the child node. If this 3005 packet is HeartbeatResponse packet, then this field is TotalRTT-down, 3006 the maximum round trip time from the node sending the 3007 HeartbeatResponse to its most distance descendent. 3009 6.15.8 Time Bounded Reliability Option 3011 This option provides support for time bounded reliability and can be 3012 used by a sender in the Data packets. 3014 1 2 3 3015 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3016 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3017 | A | OTYPE | LENGTH | Bound | 3018 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3019 | Tsent_0 | 3020 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3021 | Tsent_1 | 3022 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3024 A: set to 10 leave RMTP tree 3026 OTYPE: set to 8 Time bounded reliability 3028 LENGTH: set to 3 three 32-bit words in the packet 3030 Bound: This is the maximum time in seconds allowed for recovery of 3031 the packet 3033 Tsent: This is the time at the sender when the packet was generated. 3034 The timestamp is expressed in elapsed seconds and microseconds since 3035 00:00 Universal Coordinated Time, January 1, 1970. Tsent_0 contains 3036 the seconds part of the timestamp and Tsent_1 contains the 3037 microseconds part of the timestamp. 3039 6.15.9 Sender Status Update Option 3041 The sender uses this option in the HeartbeatResponse packet to update 3042 top node about current status at the sender. 3044 1 2 3 3045 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3046 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3047 | A | OTYPE | LENGTH | Current Rate | 3048 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3050 A: 3051 set to 00 skip option, process packet 3053 OTYPE: set to 9 This is the Sender Status Update Option 3055 LENGTH: set to 1 There is one 32-bit word in the packet 3057 Current Rate: This is the current admit rate at the sender. 3059 6.15.10 Sender Parameters Update Option 3061 The top node uses this option in heartbeat, HeartbeatResponse, Join, 3062 and JoinConfirm packets to pass updated parameters to the sender. 3064 1 2 3 3065 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3066 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3067 | A | OTYPE | LENGTH | Minimum Rate | 3068 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3069 | Maximum Rate | Initial Rate | 3070 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3072 A: 3073 set to 10 leave the RMTP tree 3075 OTYPE: set to 10 This is an Update Sender Parameters Option. 3077 LENGTH: set to 2 There are two 32-bit words in the packet. 3079 Minimum Rate: This is the minimum admit rate for the sender. 3081 Maximum Rate: This is the maximum admit rate sender for the sender. 3083 Initial Rate: This is the starting rate. If currently active then use 3084 Initial Rate as the current rate. 3086 6.15.11 NACK Suppression Option 3088 This option is sent in Retransmission or Heartbeat packets by control 3089 nodes to support NACK suppression. 3091 1 2 3 3092 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3093 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3094 | A | OTYPE | LENGTH | Not Used | 3095 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3096 | Start Sequence Number | 3097 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3098 | End Sequence Number | 3099 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3100 | Suppression Bitmap | 3101 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3103 A: 3104 set to 00 skip option, process the packet 3105 set to 01 discard packet 3107 OTYPE: set to 11 NACK Suppression Option 3109 LENGTH: set to 3 + Bitmap length 3111 Start Sequence Number: This specifies the lowest sequence number of 3112 Data packet, requested for retransmission but not available at the 3113 control node. 3115 End Sequence Number: This specifies the highest sequence number of 3116 Data packet, requested for retransmission but not available at the 3117 control node. 3119 Suppression Bitmap: This is a bitmap of missing Data packets at the 3120 control node, requested by at least one child node for 3121 retransmission. 3123 7 ACKNOWLEDGEMENTS 3125 RMTP-II draws heavily on the rich work done by the reliable multicast 3126 research community over the last decade. The area of tree based 3127 reliable multicast started with RMTP [PSK94, PSLB97] and TMTP 3128 [YGS95]. RMTP was the first protocol to propose organizing receivers 3129 into a tree formation for local recovery and avoidance of ACK 3130 implosion. TMTP introduced a number of new concepts, most notably the 3131 efficiencies gained by combining local NACKs with global ACKs. The 3132 MTP family of protocols [AFK92, BOGKS94] were the first to introduce 3133 the notion of different types of nodes having different roles in the 3134 communication group. The SRM protocol [FVLMZ96] showed how to use 3135 multicast NACKs for efficient local recovery and NACK suppression. 3136 PGM [SFLT98] showed how to do NACK suppression without relying on 3137 many-many multicast. 3139 The optional forward error correction is taken from [NB96, NBT97, 3140 Rizzo97]. The NACK timer backoff algorithms are taken directly from 3141 [NB98]. 3143 LORAX [LLG96] showed how to have multiple senders share a common 3144 acknowledgement tree. Studies have been done comparing different 3145 classes of reliable multicast protocols, showing that local recovery 3146 is a key piece of a scaleable reliable multicast protocol [NLJBC98], 3147 and that the combination of NACKs and ACKs proposed originally in 3148 TMTP has the highest scalability of any class of reliable multicast 3149 protocols [LG96]. MFTP [MRTW97] was the first protocol to emphasize 3150 the importance of support for satellite and asymmetrical networks. 3152 Recent work [PPV98, LG97, LC98, SFLT98] has shown the benefits 3153 changing routers to better support reliable multicast. RMTP-II has 3154 drawn on all of these ideas, and owes much of its design to these 3155 works and many others. 3157 CONTRIBUTORS AND REVIEWERS 3158 Special thanks goes to the following individuals, who have greatly 3159 contributed to the design and review of RMTP-II. 3161 Ernst Biersack, Institut EURECOM 3162 Carsten Bormann, University of Bremen, TZI 3163 Rick Buskens, Lucent 3164 Jon Crowcroft, UC London 3165 Christophe Diot, INRIA 3166 Jamal Golestani, Lucent 3167 Brian Levine, UC Santa Cruz 3168 John Lin, Lucent 3169 Don Newell, Intel 3170 Jorg Nonnenmacher, Institut EURECOM 3171 Joerg Ott, University of Bremen, TZI 3172 Christos Papadopoulos, University of Washington, St. Louis 3173 Krishan Sabnani, Lucent 3174 Nils Seifert, Telligue 3175 Muhammad Siddiqui, Lucent 3176 Tony Speakman, Cisco 3177 Paul Stirpe, Reuters 3178 Gursel Taskale, Reuters 3179 Wei Wu, Reuters 3180 Qingming Wang, Lucent 3181 Rajendra Yavatkar, Intel 3183 8 REFERENCES 3185 [AFK92] S. Armstrong, A. Freier, and K. Marzullo, "Multicast 3186 Transport Protocol", DARPA RFC 1301, February 1992. 3188 [BOGKS94] C. Bormann, J. Ott, H.-C. Gehrcke, T. Kerschat and N. 3189 Seifert, "MTP-2: Towards Achieving the S.E.R.O. Properties for 3190 Multicast Transport'', International Conference on Computer 3191 Communications and Networks (ICCCN-94), 1994. 3193 [FVLMZ96] S. Floyd, V. Jacobson, C. Liu, S. McCanne, L. Zhang, "A 3194 Reliable Multicast Framework for Light-weight Sessions and 3195 Application Level Framing", ACM Transactions on Networking, November 3196 1996. 3198 [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", 3199 Computer Communications Review, vol. 18, no. 4, pp.314-s29. 3201 [Jacobson90] V. Jacobson, "Berkeley TCP Evolution from 4.3-Tahoe to 3202 4.3-Reno", Proceedings of the Eighteenth internet Engineering Task 3203 Force, p 365, (Sept), University of British Columbia, Vancouver, B.C. 3205 [LC98] D. Li and D. Cheriton. "OTERS: Exploiting the Routing 3206 Topology for High-Performance Reliable Multicast", work in progress. 3208 [LG96] B. Levine and J.J. Garcia-Luna-Aceves, "A Comparison of Known 3209 Classes of Reliable Multicast Protocols", Proc. International 3210 Conference on Network Protocols (ICNP-96), October 1996. 3212 [LG97] B. Levine and J.J. Garcia-Luna-Aceves, "Improving Internet 3213 Multicast with Routing Labels," IEEE International Conference on 3214 Network Protocols (ICNP-97), October 28 - 31, 1997. p. 241-250. 3216 [LLG96] B. Levine, D. Lavo, and J.J. Garcia-Luna-Aceves. "The Case 3217 for Concurrent Reliable Multicasting Using Shared ACK Trees", Proc. 3218 The ACM Multimedia '96 Conference, November 1996. 3220 [MRTW97] K. Miller, K. Robertson, A. Tweedly, and M. White. 3221 "StarBurst Multicast File Transfer Protocol (MFTP) Specification", 3222 , January 1997, work in progress. 3224 [NB96] J. Nonnenmacher and E.W. Biersack, "Reliable Multicast: Where 3225 to use Forward Error Correction", Proc. 5th. Workshop on Protocols 3226 for High Speed Networks, Sophia Antipolis, France, Oct. 1996. 3228 [NB98] J. Nonnenmacher and E. W. Biersack, "Optimal Multicast 3229 Feedback", Proc. IEEE INFOCOM 1998, March 1998. 3231 [NBT97] J. Nonnenmacher, E. W. Biersack, and Don Towsley. "Parity- 3232 Based Loss Recovery for Reliable Multicast Transmission", In Proc. of 3233 ACM SIGCOMM '97, Cannes, France, September 1997. 3235 [NLJBC98] J. Nonnenmacher, M. Lacher, M. Jung, E. W. Biersack and 3236 Georg Carle, "How bad is reliable multicast without local recovery?", 3237 In Proc. of IEEE INFOCOM'98, San Francisco, CA, USA, March 1998. 3239 [PPV98] C. Papadopoulos, G. Parulkar, and G. Varghese, "An Error 3240 Control Scheme for Large Scale Multicast Applications", INFOCOM '98, 3241 March 1998. 3243 [PSK94] S. Paul, K. Sabnani, and D. Kristol, "Multicast Transport 3244 Protocols for High Speed Networks", Proceedings of International 3245 Conference on Network Protocols (ICNP-94), 1994. 3247 [PSLB97] "Reliable Multicast Transport Protocol (RMTP)", S. Paul, K. 3248 K. Sabnani, J. C. Lin, and S. Bhattacharyya, IEEE Journal on Selected 3249 Areas in Communications, Vol. 15, No. 3, April 1997. 3251 [Rizzo97] L. Rizzo, "Effective erasure codes for reliable computer 3252 communications protocols", DEIT Technical Report LR-970115. 3254 [SFLT98] T. Speakman, D. Farinacci, S. Lin, and A. Tweedly, 3255 "Pragmatic Group Multicast (PGM) Transport Protocol Specification", 3256 Internet Draft , January 1998, work 3257 in progress. 3259 [WMK94] B. Whetten, T. Montgomery, S. Kaplan, "A High Performance 3260 Totally Ordered Multicast Protocol", in "Theory and Practice in 3261 Distributed Systems", Springer Verlag LCNS938, 1994. 3263 [YGS95] R. Yavatkar, J. Griffioen, and M. Sudan, "A Reliable 3264 Dissemination Protocol for Interactive Collaborative Applications," 3265 Proceedings of the ACM Multimedia '95 Conference, November 1995. 3267 AUTHORS' ADDRESSES 3269 Brian Whetten 3270 whetten@gcast.com 3272 Murali Basavaiah 3273 murali@gcast.com 3275 Sanjoy Paul 3276 sanjoy@dnrc.bell-labs.com 3278 Todd Montgomery 3279 tmont@cs.wvu.edu 3281 Naveen Rastogi 3282 naveen@gcast.com 3284 Jim Conlan 3285 jim@gcast.com 3287 Thomas Yeh 3288 yeh@gcast.com