Reliable Multicast Transport (RMT) WG                         B. Whetten
Internet Draft                                                  Talarian
Document: draft-ietf-rmt-track-arch-00.txt                       D. Chiu
                                                        Sun Microsystems
                                                                 S. Paul
                                                                   Edgix
                                                         Miriam Kadansky
                                                        Sun Microsystems
                                                          Gursel Taskale
                                                                Talarian
                                                           July 14, 2000

                          TRACK ARCHITECTURE
           A SCALEABLE REAL-TIME RELIABLE MULTICAST PROTOCOL

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts. Internet-Drafts
are draft documents valid for a maximum of six months and may be updated,
replaced, or become obsolete by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite
them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

1. Abstract

One of the protocol instantiations the RMT WG is chartered to create is a
TRee-based ACKnowledgement protocol (TRACK). Rather than create a set of
monolithic protocol specifications, the RMT WG has chosen to break the
reliable multicast protocols into Building Blocks (BBs) and Protocol
Instantiations (PIs). A Building Block is a specification of the
algorithms of a single component, with an abstract interface to other BBs
and PIs. A PI combines a set of BBs, adds the additional required
functionality not specified in any BB, and specifies the particular
instantiation of the protocol. For more information, see the Reliable
Multicast Transport Building Blocks and Reliable Multicast Design Space
documents [2][3].

The TRACK protocol instantiation (TRACK for short) is designed to reliably
and efficiently send data from a single sender to large groups of
simultaneous recipients in real time. In the industry, the term real-time
is understood to mean minimal latency, including network propagation and
processing delays. The TRACK PI provides functions similar to the NACK PI,
and adds support for a tree-based hierarchy of Repair Heads (RHs); in its
simplest form, this hierarchy consists of only the Sender acting as the
Repair Head. The hierarchy increases scalability by aggregating control
traffic and by retransmitting lost packets locally. In addition to using
negative acknowledgements (NACKs) and forward error correction (FEC) for
efficient reporting and retransmission of lost packets, TRACK also
provides tree-based ACKnowledgements (ACKs), which give the Sender
confirmation that data packets have been delivered to the Receivers. Like
the NACK PI, TRACK may also take advantage of Generic Router Assist where
available.

This document proposes a design rationale for the TRACK PI, an
architecture for TRACK, and a set of functional requirements that TRACK
places on other Building Blocks. This document is not a protocol
instantiation specification.

2. Conventions Used in this Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC-2119 [4].

3. Design Rationale and Protocol Requirements

This section discusses many of the requirements imposed on the design of
the TRACK PI, as well as the design rationale that guides decisions where
there is flexibility to choose among different potential designs.

3.1 Private and Public Networks

TRACK is designed to work in private networks, in controlled networks, and
in the public Internet. A controlled network typically has a single
administrative domain, has more homogeneous network bandwidth, and is more
easily managed and controlled. These networks have the fewest barriers to
IP multicast deployment and the most immediate need for reliable multicast
services. Deployment in the Internet requires a protocol to span multiple
administrative domains over vastly heterogeneous networks. The IETF is
specifically chartered with producing standards for the Internet, so this
must be the primary target network type. However, robust transport
protocols are grown, not created, and most of the short-term deployment
experience will likely come from controlled networks. Therefore, TRACK is
designed to support both.

3.2 Manual vs. Automatic Controls

Some networks can take advantage of manual or centralized tools for
configuring and controlling the usage of a reliable multicast group. In
the public Internet, such tools have to span multiple Autonomous Systems
(ASes) whose policies are inconsistent.
Hence, it is preferable to design tools that are fully distributed and
automatic. To address these requirements, TRACK supports both manual and
automatic algorithms for monitoring, management, and configuration.

3.3 Heterogeneous Networks

While the majority of controlled networks are symmetrical and support
many-to-many multicast, a protocol designed for the Internet must deal
with virtually all major network types. These include asymmetrical
networks, satellite networks, networks where only a single node may send
to a multicast group, and wireless networks. TRACK takes this into account
by not requiring any many-to-many multicast services. In addition, the
congestion control component used in TRACK will specifically deal with
the high bandwidth-delay product found in many satellite networks and the
high link-level loss rate found in some wireless networks. Finally, TRACK
does not assume that the topology used for sending control packets has any
congruence to the topology of the multicast address used for sending data
packets.

3.4 Use of Network Infrastructure

There is wide consensus that scaling a real-time reliable multicast
protocol requires some use of the network infrastructure (the routers and
servers inside the network). New software that supports the transport
layer would typically run in the routers, in the servers in the network,
or in both. Deploying router software (such as that in the Generic Router
Assist BB) is a powerful solution, but it typically requires very long
time cycles, is necessarily limited in functionality, and requires a
graceful upgrade path. Server software (such as the Repair Head control
tree) is much easier to deploy, but may require new hardware to be added
to the network.

In controlled networks, particularly during the first deployment phases of
reliable multicast, it is reasonable to deploy servers that support only a
single application, or even to use selected end clients themselves to
perform the functions necessary for scalability. For widely deployed
Internet infrastructure components, the server infrastructure is usually
dedicated to a single protocol, but supports all instances of that
protocol running across that piece of the network. Examples of this usage
model include DNS, DHCP, NNTP, and HTTP. Therefore, the control nodes used
in TRACK are designed to run both on dedicated network servers able to
support hundreds or thousands of simultaneous data sessions and on an end
user computer.

A number of extensions to IP multicast, such as subtree multicast, NACK
suppression, ACK aggregation, tree configuration discovery, and higher
fidelity congestion control reports, have been proposed that can run in
the routers. If deployed widely, these would make reliable multicast
protocols easier to configure and allow them to scale more readily. Some
or all of these features are being standardized as part of the Generic
Router Assist (GRA) component. TRACK is designed to take advantage of GRA
as it becomes available, but not to require it. Ubiquitous deployment of
GRA would likely reduce the number of dedicated TRACK servers needed for
large-scale deployments (i.e., more than 1000 Receivers), and improve the
performance of the protocol.

3.5 Targeted Application Types

Multicast applications can be divided into two classes: few-to-many and
many-to-many. Many-to-many applications include multi-user games, small
group conferencing, and computer supported collaborative work.
These applications typically treat all members in a group as peers,
require special semantics such as total ordering of messages from multiple
Senders, and often have moderate scalability requirements. Other
protocols, such as RMP, have been designed to support these many-to-many
applications.

In line with the charter for RMT, TRACK focuses on one-to-many bulk data
distribution applications, such as multicast file transfer, electronic
software distribution, real-time news and financial market data
distribution, "push" applications, audio/video/data streaming, distance
learning, and some types of server replication.

In order to meet these requirements, TRACK treats each Sender as an
independent entity, and provides no ordering or other shared state across
data sessions, although multiple data sessions can share the same control
infrastructure. The protocol is designed to scale to at least many
thousands of simultaneous Receivers. TRACK provides a strong but fully
distributed membership protocol, which supports scaling to many thousands
of simultaneous Receivers while providing confirmed delivery of messages.
Similar to TCP, TRACK continuously streams data to receivers, performing
acknowledgement and retransmission of older data packets at the same time
that new data packets are being sent. It also provides some special
support for real-time applications such as audio/video/data streaming and
live financial market data distribution.

Some real-time applications require jitter control for smooth playback.
This can be accomplished by using the unordered delivery option of TRACK
and performing jitter control in the application. Typically, this requires
the application to maintain a separate buffer to smooth out the per-packet
delay variations.

TRACK also supports a sender-controlled recovery window.
In each data packet, the sender may indicate to all receivers that data
older than a certain sequence number is no longer worth recovering (see
the section on "Delivery Semantics" for more details). This mechanism
helps the transport better support applications that distribute content
that ages quickly, such as stock quotes.

3.6 IETF Mandated Criteria

In addition to the requirements imposed by the targeted network and
application types, TRACK is designed to meet all of the requirements
proposed by the IETF in RFC2357.

- Congestion Control. TRACK includes provably safe and TCP-friendly
  congestion control algorithms that also scale to large groups.

- Well-controlled, Scaleable Behavior. TRACK includes carefully analyzed
  algorithms that manage and smooth the control traffic and
  retransmissions. These are key to avoiding NACK implosion, ACK
  implosion, and retransmission implosion (the local recovery pathology).

- Security. TRACK supports protection of the transport infrastructure
  through the use of lightweight authentication of control and data
  packets.

3.7 Graceful Evolution

Creating robust, universally applicable standard protocols takes a great
deal of time and protocol evolution. While TRACK is being written as a
standard, it will have to continue to evolve as real-world experience is
gained with the protocol, similar to how TCP has been tuned over almost 20
years of research and development. TRACK addresses this through its use of
Building Blocks, which allow particular algorithms to be broken out into
separate components with well-defined interfaces. This allows these
components to evolve, hopefully with few or no changes required to the
rest of the protocol.

TRACK also addresses evolution through its use of session parameters.
TRACK is presently dependent on a number of parameters that MUST be
configured throughout the tree for optimal operation. TRACK provides
mechanisms to automatically distribute these parameters to all members of
the group, and OPTIONALLY provides mechanisms to dynamically change some
of these parameters during group operation.

TRACK also provides SNMP management and monitoring tools. Over time,
deployment experience will show which values work best for most
deployments, leading to further refinements of the standard.

3.8 Algorithm Selection

The above design criteria apply to the general architecture of the
protocol. Additional criteria were used to select the optimal algorithms
for different sets of functions. These rationales are described below,
along with the relevant functions.

4. Architectural Overview

4.1 TRACK Entities

4.1.1 Node Types

TRACK divides the operation of the protocol among three major entities:
Sender, Receiver, and Repair Head. TRACK's Repair Head corresponds to the
Service Node described in the Tree-Building draft. It is assumed that
Senders and Receivers typically run as part of an application on an end
host client. Repair Heads MAY be components in the network
infrastructure, managed by different network managers as part of
different administrative domains, or MAY run on an end host client, in
which case they function as both Receivers and Repair Heads. In the
absence of automatic tree configuration, it is assumed that the
Infrastructure Repair Heads have relatively static configurations, which
consist of a list of nearby possible Repair Heads. Senders and Receivers,
on the other hand, are transient entities, which typically only exist for
the duration of a single data session.
In addition to these core components, applications that use TRACK are
expected to interface with other services that reside in other network
entities, such as multicast address allocation, session advertisement,
network management consoles, DHCP, DNS, server level multicast, and
multicast key management.

4.1.2 Multicast Group Address

A multicast group address is a pair consisting of an IP multicast address
and a UDP port number. It may optionally have a Time To Live (TTL) value,
although this value MUST only be used for providing a global scope to a
Data Session. The Data Multicast Address and the control multicast
addresses are both multicast group addresses.

4.1.3 Data Session

A Data Session is the unit of reliable delivery of TRACK. It consists of a
sequence of sequentially numbered Data packets, which are sent by a single
Sender over a single Data Multicast Address. They are delivered reliably,
with acknowledgements and retransmissions occurring over the Control Tree.
A Data Session is uniquely identified by the combination of a Session ID,
the sender's address and port, and the multicast address and port.

A given Data Session is received by a set of zero or more Receivers and a
set of zero or more Repair Heads. One or more Data Sessions MAY share the
same Data Multicast Address (although this is not recommended). Each
TRACK node can simultaneously participate in multiple Data Sessions. A
Receiver MUST join all the Data Multicast Addresses and Control Trees
corresponding to the Data Streams it wishes to receive.

4.1.4 Control Tree

A Control Tree is a hierarchical communication path used to send control
information from a set of Receivers, through zero or more Repair Heads
(RHs), to a Sender. Information from lower nodes is aggregated as it is
relayed to higher nodes closer to the Sender. Each Data Session uses a
Control Tree.
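The aggregation step described in 4.1.4 (together with the selective bitmaps and FEC parity decision mentioned in 4.2) can be illustrated with a short Python sketch. The draft defines neither the ACK fields nor an aggregation rule, so everything here is an assumption: the `Ack` record with its `ack_seq` (cumulative point) and `missing` fields, and the functions `aggregate_acks`, `to_bitmap`, and `parity_needed`, are hypothetical. The FEC part assumes a block erasure code, and sequence number wraparound is ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ack:
    """Hypothetical ACK contents; this is not the TRACK wire format.

    ack_seq: every Data packet with sequence number <= ack_seq has arrived.
    missing: sequence numbers above ack_seq known to be lost.
    """
    ack_seq: int
    missing: frozenset = field(default_factory=frozenset)


def aggregate_acks(child_acks):
    """Fold the latest ACK from each child into one ACK for the parent.

    A packet counts as acknowledged only when every child has acknowledged
    it, so the cumulative point is the minimum over children; a packet is
    requested if any child lacks it, so the missing sets are unioned.
    """
    if not child_acks:
        raise ValueError("no child ACKs to aggregate")
    ack_seq = min(a.ack_seq for a in child_acks)
    missing = frozenset().union(*(a.missing for a in child_acks))
    return Ack(ack_seq, missing)


def to_bitmap(ack_seq, missing):
    """Encode missing packets as a selective bitmap.

    Bit i is set when packet ack_seq + 1 + i is missing.
    """
    bitmap = 0
    for seq in missing:
        offset = seq - (ack_seq + 1)
        if offset < 0:
            raise ValueError(f"packet {seq} is already acknowledged")
        bitmap |= 1 << offset
    return bitmap


def parity_needed(child_acks, block_start, block_size):
    """Parity packets to send for one FEC block instead of retransmissions.

    Assumes an erasure code where any block_size packets rebuild the block,
    so the worst per-child loss count within the block is sufficient.
    """
    block = range(block_start, block_start + block_size)
    return max(sum(1 for seq in block if seq in a.missing) for a in child_acks)
```

For example, aggregating `Ack(10, frozenset({12}))` and `Ack(5, frozenset({6, 7}))` yields `ack_seq` 5 with missing packets 6, 7, and 12, which `to_bitmap` encodes as 67 (bits 0, 1, and 6). Note that `parity_needed` works from the individual child ACKs: the union of missing sets says which packets some child lacks, but not how many any single child lacks.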

Each RH in the control tree uses a separate multicast address for
communicating with its children. Optionally, these RH multicast addresses
may be the same as the multicast address of the Data Channel.

4.1.5 Session ID

A Session ID is a 32-bit number (to be formally defined in the Common
Packet Header BB) chosen either by the application that creates the
session or by TRACK. Senders and Receivers use the Session ID to
distinguish Data Streams. A Sender may specify a Session ID in the range
from (2^31) to (2^32)-1. Numbers in the range from 0 to (2^31)-1 are
reserved. If a Sender specifies 0 as the Session ID, TRACK randomly
assigns a Session ID in the range from 1 to (2^31)-1. If a Session ID is
selected that is already in use on a Control Tree, creation of the new
session fails, and a new Session ID must be selected.

A session is uniquely identified by its Session ID, its sender's
address/port, and its Data Multicast Address and port.

4.1.6 Packet Sequence Numbers

A packet sequence number is a 32-bit number in the range from 1 through
(2^32)-1, which is used to specify the sequential order of a Data packet
in a Data Stream. A Sender node assigns consecutive sequence numbers to
the Data packets provided by the Sender application. Zero is reserved to
indicate that the Data Session has not yet started.

4.1.7 Data Queue

A Data Queue is a buffer, maintained by a Sender or a Repair Head, for
transmission and retransmission of the Data packets provided by the Sender
application. New Data packets are added to the Data Queue as they arrive
from the sending application, up to a specified buffer limit. The
admission rate of packets to the network is controlled by flow and
congestion control algorithms. Once a packet has been received by the
Receivers of a Data Stream, it may be deleted from the buffer.
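The Session ID rules of 4.1.5 can be captured in a few lines. This is a minimal Python sketch: the function `select_session_id`, the exception `SessionIdInUse`, and the `in_use` set are hypothetical, since the draft does not say how a node learns which Session IDs are already active on a Control Tree.

```python
import secrets

SENDER_MIN = 2 ** 31       # lowest Session ID a Sender may specify
SENDER_MAX = 2 ** 32 - 1   # highest Session ID a Sender may specify
AUTO_MAX = 2 ** 31 - 1     # TRACK-assigned IDs fall in 1..AUTO_MAX

_rng = secrets.SystemRandom()


class SessionIdInUse(Exception):
    """The chosen Session ID is already in use on this Control Tree."""


def select_session_id(requested, in_use):
    """Apply the Session ID rules of 4.1.5.

    requested: 0 to let TRACK choose, else a value in SENDER_MIN..SENDER_MAX.
    in_use:    IDs already active on the Control Tree (hypothetical
               bookkeeping; how a node learns this set is not specified).
    """
    if requested == 0:
        # TRACK picks a random ID from the range reserved for itself.
        candidate = _rng.randint(1, AUTO_MAX)
    elif SENDER_MIN <= requested <= SENDER_MAX:
        candidate = requested
    else:
        raise ValueError(f"Session ID {requested} is reserved or out of range")
    if candidate in in_use:
        # The new session fails; the caller must select a new Session ID.
        raise SessionIdInUse(candidate)
    return candidate
```

Following the text, a collision makes the call fail rather than silently retry; a caller that passed 0 can simply call again to draw a different random ID.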
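The sequence numbering of 4.1.6 and the buffering of 4.1.7 fit together in a small Sender-side queue. This Python sketch is not the specified algorithm: it assumes that a packet counts as received once every immediate child has cumulatively acknowledged it, assumes at least one child, leaves the admission rate to the flow and congestion control algorithms, and refuses to wrap sequence numbers because this section does not define wraparound. `DataQueue`, `BufferFull`, and `on_ack` are hypothetical names.

```python
from collections import OrderedDict

MAX_SEQ = 2 ** 32 - 1   # valid sequence numbers: 1..(2^32)-1


class BufferFull(Exception):
    """The Data Queue has reached its buffer limit."""


class DataQueue:
    """Sender-side Data Queue sketch; assumes at least one child."""

    def __init__(self, limit, children):
        self.limit = limit
        self.last_seq = 0                      # 0: session not yet started
        self.packets = OrderedDict()           # seq -> payload, oldest first
        self.acked = {c: 0 for c in children}  # child -> cumulative ACK

    def enqueue(self, payload):
        """Give the packet the next consecutive sequence number and buffer it."""
        if len(self.packets) >= self.limit:
            raise BufferFull(self.limit)
        if self.last_seq == MAX_SEQ:
            # Wraparound is outside this section; stop instead of guessing.
            raise OverflowError("sequence number space exhausted")
        self.last_seq += 1
        self.packets[self.last_seq] = payload
        return self.last_seq

    def get(self, seq):
        """Return a buffered packet for retransmission, or None."""
        return self.packets.get(seq)

    def on_ack(self, child, ack_seq):
        """Record a child's cumulative ACK; drop packets every child has."""
        self.acked[child] = max(self.acked[child], ack_seq)
        stable = min(self.acked.values())
        while self.packets and next(iter(self.packets)) <= stable:
            self.packets.popitem(last=False)
        return stable
```

Keeping the packets in insertion order means the oldest buffered packet is always first, so releasing everything up to the stable point is a simple loop from the front.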

4.1.8 Packet Types

TRACK defines a set of packets, which can be implemented either on top of
UDP or directly on top of IP. All TRACK packets will conform to the Common
Packet Headers BB. Each TRACK packet definition consists of a fixed header
and zero or more option headers, followed by data or control information.

Data is carried in Data packets. The same packet type is used both to
transmit Data the first time and to retransmit lost packets. A bit in the
packet header is set when the packet is a retransmission. Each Data packet
has a Session ID and a sequence number, which identify the packet and
allow a receiver application to reconstruct the data stream from the Data
packets.

Receivers and Repair Heads unicast periodic status packets to their
parents. An ACK is sent regularly to indicate the status of the Data
packets that have arrived and to furnish congestion control statistics
about the state of data reception at the node. An ACK requests
retransmission of Data packets that have not been received, and also
acknowledges packets that have become stable. A NACK is an ACK that is
used to request immediate recovery of lost Data packets. ACKs and NACKs
have the same format, but ACKs are passed all the way up the tree, while
NACKs are only sent as far as needed to find a node that can provide all
the requested retransmissions. A child will also send an ACK in response
to a NullData or Heartbeat packet if it has not sent an ACK within a
certain time interval.

TRACK uses the Tree-Building draft as a reference for building its repair
tree. The following is a description of TRACK's implementation of tree
building that is consistent with that draft.

When a Receiver or Repair Head wishes to establish a repair service
relationship, it uses a Bind packet to bind to a parent Repair Head. A
parent sends an Accept or Reject after it processes a Bind packet. The
Reject message carries a reason code that explains the reason for the
rejection. For example, the reason may indicate that the parent is not yet
connected to the tree, so that the child can try again later (see open
issue). If the parent sends an Accept, this constitutes Joining a session.

When a Receiver or Repair Head wishes to leave a session, it sends a Leave
request to its parent. The parent replies with a LeaveConfirm packet, at
which time the child is allowed to leave.

A Repair Head or Sender periodically sends Heartbeat packets to notify its
child nodes that it is alive.

If a Sender has no data to send for a session, it periodically multicasts
a NullData packet on the Data Multicast Address. NullData packets inform
Receivers about the state of the Data Stream and the Sender.

If a child node is not operating normally, or if a parent node restarts
after a failure and receives a packet from a child not in its child list,
the parent node sends an Eject packet to the child node, causing the child
node to terminate its connection to the control tree.

4.2 Basic Operation of the Protocol

For each Data Session, TRACK provides sequenced, reliable delivery of data
from a single Sender to up to tens of thousands of Receivers. A TRACK Data
Session consists of a network that has exactly one Sender node, zero or
more Receiver nodes, and zero or more Repair Heads.

The figure below illustrates a TRACK Data Session with multiple Repair
Heads.

A Sender joins the TRACK tree and multicasts data packets on the Data
Multicast Address. All of the nodes in the session subscribe to the class
D IP multicast address and UDP port associated with the Data Multicast
Address.

There is no assumption of congruence between the topology of the Data
Multicast Address and the topology of the Control Tree.
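Subscribing to the Data Multicast Address, as described in 4.2, is ordinary IPv4 multicast group membership. The following sketch uses Python's standard `socket`, `struct`, and `ipaddress` modules; the helper names are hypothetical, only IPv4 is handled, and binding to all interfaces is an assumption that a real deployment may want to narrow.

```python
import ipaddress
import socket
import struct


def is_class_d(addr):
    """True if addr is an IPv4 class D (multicast, 224.0.0.0/4) address."""
    return ipaddress.IPv4Address(addr).is_multicast


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    if not is_class_d(group):
        raise ValueError(f"{group} is not a class D address")
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def open_data_socket(group, port, interface="0.0.0.0"):
    """Subscribe to a Data Multicast Address (IP multicast address, UDP port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface))
    return sock
```

For instance, `open_data_socket("239.1.2.3", 5000)` would return a socket receiving Data packets sent to that group and port. A Sender applying the optional TTL from 4.1.2 would set the `IP_MULTICAST_TTL` socket option on its sending socket instead.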

            -------> SD (Sender node) ----------->|
                     ^^^                          |
          ACKs      / | \         Control         |
          and      /  |  \        Tree            |
          NACKs   /   |   \                       |
                 /    |    \      (Repair         |
                /     |     \     Head            |
               /      |      \    nodes)          v
             RH       RH      RH <----------------|
             ^^      ^^^      ^^                  |  Data
            / |     / | \     | \                 |  Channel
           /  |    /  |  \    |  \                |
          /   |   /   |   \   |   \               v
         R    R  R    R    R  R    R <-------------
              (Receiver Nodes)

A Receiver joins a Data Multicast Address to receive data. A Receiver
periodically informs its parent about the packets that it has or has not
received by unicasting an ACK packet to the parent. Each parent node
aggregates the ACKs from its child nodes and (if it is not the Sender)
unicasts a single aggregated ACK to its parent. For lower-latency recovery
in low-loss networks, Receivers can also generate NACKs upon detecting
losses. These have the same format as an ACK, but are only passed up the
tree as far as necessary to find a Repair Head that can retransmit the
packet. The Repair Heads provide NACK suppression, which yields traffic
minimization benefits similar to those of ACK aggregation.

The Sender and each Repair Head have a multicast Local Control Channel to
their children. This is used for transmitting Heartbeat packets that
inform their child nodes that the parent node is still functioning. This
channel is also used to perform local retransmission of lost data packets
to just these children. TRACK will still operate correctly even if
multicast addresses are reused across multiple Data Sessions or multiple
Local Control Channels. It is NOT RECOMMENDED to use the same multicast
address for multiple Local Control Channels serving any given Data Session.

The communication path forms a loop from the Sender to the Receivers,
through the Repair Heads, and back to the Sender. Data and NullData
packets regularly exercise the downward data direction. Heartbeat packets
exercise the downward control direction.
ACKs, NACKs, and HeartbeatResponse packets regularly exercise the control
tree in the upward direction. This combination constantly checks that all
of the nodes in the tree are still functioning correctly, and initiates
fault recovery when required.

In addition to using ACKs, NACKs, and Repair Heads for scaleable loss
notification and retransmission, TRACK also supports the optional use of
Generic Router Assist (GRA) and integrated Forward Error Correction (FEC).
Two of the major functions of GRA are NACK suppression and dynamically
scoped local retransmission. These functions, if enabled, are deployed
independently between each parent and its children. For the purpose of
GRA NACK functions, each parent is considered to be a Sender and the
children of that parent are considered to be the Receivers.

Retransmission requests, both NACKs and ACKs, contain selective bitmaps
indicating which packets need to be retransmitted. If FEC is enabled,
these bitmaps provide enough information to determine the number of
parity packets to send instead of sending individual retransmissions.

4.3 Session Creation

Before a Data Session starts delivering data, the tree for the Data
Session needs to be created. This process binds each Receiver to either a
Repair Head or the Sender, and binds the participating Repair Heads into a
loop-free tree structure with the Sender as the root of the tree. This
process requires tree configuration knowledge, which can be provided by
some combination of manual and automatic configuration. The actual
algorithms for tree configuration will be part of the Automatic Tree
Configuration BB, and are discussed in the next section.

To start a Data Session, a Sender communicates to the Receivers, via
either an external service or the application itself, the Data Multicast
Address that will be used for the Data Session.
It may advertise other relevant session information, such as whether or not Repair Heads should be used, whether manual or automatic tree configuration should be used, the time at which the session will start, and other protocol constants. It may also advertise certain hints for the tree configuration algorithms and metrics. In this way, the Sender enforces a uniform set of Session Configuration Parameters on all members of the session.

After receiving this out-of-band communication, the Receivers join the Data Multicast Address and attempt to bind to either the Sender or a local Repair Head. The tree configuration algorithms are responsible for providing each Receiver with a list of one or more nodes to which it will attempt to bind. The Receiver attempts to bind to the first node in the list, and if this fails, it moves on to the next one. A Receiver binds to only a single Repair Head or Sender at a time for each Data Session.

When a Receiver binds to a Repair Head for a given Data Session, that Repair Head in turn binds to another Repair Head or to the Sender, depending on the list given to it by the tree configuration algorithms. The tree configuration algorithms are responsible for ensuring that the tree is formed without loops.

Once the Sender initiates tree building, it is also free to start sending Data packets on the Data Multicast Address. Repair Heads and Receivers may start receiving these packets, but do not request retransmission or deliver data to the application until they receive confirmation that they have successfully bound to the group.

Some of the Session Configuration Parameters MAY be changed dynamically by the Sender, by advertising the changed values in the NullData packets periodically sent through the tree.
If a given Session Configuration Parameter must be the same at all nodes in order to provide safe operation, it MUST NOT be dynamically changed once the Data Session has started.

4.4 Tree Configuration

TRACK is designed to work either with manual configuration of the tree or with optional automatic tree configuration. Tree configuration is responsible for providing each Receiver and Repair Head with a list of one or more appropriate parents to which it can attempt to bind.

The goals of automatic tree configuration are:

- allow Receivers to automatically locate their best Repair Head(s), and obtain the Local Control Channel multicast address.
- provide automatic configuration of the Repair Heads, using either Repair Heads that are servers operating in the network or dynamically selected Receivers.

These algorithms are specified in the Tree Configuration BB [16]. To make sure that TRACK can be standardized in a timely fashion, the automatic tree configuration algorithms need to be separate from the rest of the TRACK protocol, so that TRACK can be deployed even without these algorithms. When the algorithms from the Tree Configuration BB are not available, TRACK uses static configuration.

4.5 Data Transmission and Retransmission

Data is multicast by a Sender on the Data Multicast Address. Retransmissions of data packets may be multicast either by the Sender on the Data Multicast Address or by a Repair Head on its Local Control Channel. In order to provide NACK suppression and to work with proactive FEC, retransmissions are always multicast. If Generic Router Assist is enabled, the routers may provide NACK suppression and allow dynamically scoped retransmission to just the subset of Receivers and Repair Heads that have missed a packet.

A Repair Head joins all of the Data Multicast Addresses that any of its descendants have joined.
A Repair Head is responsible for receiving and buffering all data packets using the reliability semantics configured for a stream. As a simple-to-implement option, a Repair Head MAY also function as a Receiver, and pass these data packets to an attached application.

For additional fault tolerance, a Receiver MAY subscribe to the multicast address associated with the Local Control Channel of one or more Repair Heads in addition to the multicast address of its parent. In this case it does not bind to these Repair Heads, but processes Retransmission packets sent to their addresses. If the Receiver's Repair Head fails and the Receiver transfers to another Repair Head, this minimizes the number of data packets it needs to recover after binding to the new Repair Head.

There are two types of retransmission: local retransmission and dynamically scoped retransmission.

4.5.1 Local Retransmission

If a Repair Head or Sender determines from its child nodes' ACKs or NACKs that a Data packet was missed, it retransmits the Data packet or, if FEC is enabled, an FEC parity packet. The Repair Head or Sender multicasts the Retransmission packet on its multicast Local Control Channel. If a Repair Head receives a retransmission and knows that its children need this repair, it re-multicasts the retransmission to its children.

The scope of retransmission is considered part of the Local Control Channel's multicast address, and is derived during tree configuration.

4.5.2 Dynamically Scoped Retransmission

Dynamically Scoped Retransmission may be used on a network whose routers support dynamically scoped retransmissions through Generic Router Assist. Dynamically Scoped Retransmissions use soft state kept in the routers to constrain the Retransmission to only the children that have requested it through a NACK.
Dynamically Scoped Retransmissions are known to be susceptible to router topology changes. Therefore, only the first retransmission of a packet is sent via this mechanism. Thereafter, only the other retransmission mechanisms (multicast by the Sender on the Data Multicast Address, or local retransmission on a Local Control Channel) should be used. This allows the protocol to provide connectivity even during router topology changes, albeit with less efficiency.

4.6 Control Traffic Management

One of the largest challenges for scalable reliable multicast protocols has been controlling the potential explosion of control traffic. There is a fundamental tradeoff between the latency with which losses can be detected and repaired, and the amount of control traffic generated by the protocol. In conjunction with the dynamic global tree parameters, TRACK provides a set of algorithms that carefully control and manage this traffic, preventing control traffic explosion.

Despite their different names, ACKs and NACKs both function as selective acknowledgements of the window of contiguous sequence numbers that have not yet been fully acknowledged. The only difference between their packet headers is a single flag.

ACK packet frequency is controlled by a number of tree-wide parameters that limit the maximum rate at which ACKs are generated. The primary parameter is the ratio parameter, R, which sets the maximum number of ACK packets to be generated per data packet sent. The higher R is, the faster positive acknowledgements propagate all the way back to the sender, at the cost of more back-channel traffic.

ACKs MUST be enabled for any Data Session. NACKs SHOULD be supported by every implementation, and MAY be enabled for any given Data Session. If NACKs are enabled, then on detection of a lost packet, a Receiver waits a random interval before sending a NACK. If the Receiver receives the retransmitted data before the NACK timer expires, the Receiver cancels the NACK.
This reduces the chance that multiple Receivers generate a NACK for the same packet.

A Repair Head node multicasts a Data packet to its children as soon as it gets a NACK request for that packet, unless it has already retransmitted that packet within a configurable time period. If it does not have the missing packet, it forwards the NACK to its parent, and multicasts a control packet to its children to suppress any further NACKs for that packet from them. The Repair Head forwards only one NACK for a missing Data packet within a specified period of time. If more than one packet has been detected as missing before the NACK is sent, the NACK requests all of the missing packets.

NACKs are particularly good for providing real-time data distribution in networks with low loss rates and short to moderate RTTs. See [5] for a comparison of the tradeoffs between ACKs and NACKs for low-latency recovery of lost packets.

4.7 Integrated Forward Error Correction

Prior work [6][7][8] has shown the benefits of incorporating reactive forward error correction (FEC) into reliable multicast protocols. This feature encodes data packets with FEC algorithms, but does not transmit the parity packets until a loss is detected. The parity packets are then transmitted, and a single parity packet can repair different lost packets at different Receivers. This is a powerful tool for providing scalability in the face of independent loss. Once reactive FEC is implemented, it is simple to also provide proactive FEC, which automatically transmits a certain percentage of parity packets along with the data. This is particularly useful when a high minimum error rate is expected, or when low latency is particularly important. Both forms of FEC are optionally supported in TRACK.

FEC is organized around windows of packets.
TRACK Data packets include an FEC Window Offset field, which identifies the offset of a given packet within an FEC window. Combined with the FEC session configuration parameters, this allows Receivers to decode a combination of Data and parity packets to regenerate each window of Data packets. Proactive FEC packets are parity packets sent as global retransmissions at the same time that a window of Data packets is sent. Reactive FEC packets are sent by either a Repair Head or the Sender in response to requests for retransmission. If using reactive FEC, a Repair Head must first have all the packets in a window before it can respond to any request for retransmission. The ACK and NACK bitmaps, combined with the information in the headers of the Data packets, provide each Repair Head with enough information to determine which parity packets it must compute and send in response to requests for retransmission.

4.8 Flow and Congestion Control

Flow and congestion control algorithms act to prevent the Senders from overflowing the Receivers' buffers and to force them to share the network fairly and safely with other TCP and RM connections. TRACK uses a combination of a transmission window for flow control and the dynamic rate control algorithms specified in the Congestion Control (CC) BB for congestion control. These algorithms have been proven to meet all the requirements for flow and congestion control, including being safe for use in a general Internet environment and provably fair to TCP.

The Sender application provides the minimum and maximum rate limits as part of the global parameters. A Sender will not transmit at lower than the minimum rate (except possibly during short periods of time when certain slow receivers are being ejected), or higher than the maximum rate.
If a Receiver is not able to keep up with the minimum rate for a period of time, the CC BB algorithms will cause it to leave the group. Receivers that leave the group MAY attempt to rejoin the group at a later time, but SHOULD NOT attempt an immediate reconnection.

4.9 Notification of Confirmed Delivery

TRACK provides a simple membership count for each session. Each repair head aggregates the membership count of its subtree and propagates it up the tree to the sender. This count is piggybacked on the regular TRACK ACK and NACK packets.

Depending on late joins and on receiver and repair head failures, this count may fluctuate over the duration of the session.

Whether this counting is done is controlled by a session-wide configuration parameter.

A complete list of receiver membership can only be obtained if each repair head (including the sender) supports an SNMP interface that allows membership ids to be retrieved. Such SNMP support is optionally required for dedicated repair servers (but not required of regular receivers).

4.10 Fault Detection and Recovery

4.10.1 Sender node failure detection

A Sender node that has no data to send periodically sends NullData packets on the Data Multicast Address. If a Receiver or a Repair Head fails to receive Data or NullData packets for a session from the Sender, it detects a Sender failure.

4.10.2 Repair Head failure detection

Each Repair Head node sends Heartbeat packets to its child nodes on its multicast Local Control Channel. If the child nodes do not receive any Heartbeats from their parent Repair Head, they detect failure of the parent.

4.10.3 Receiver node failure detection

A Receiver node sends ACKs and (optionally) NACKs for each of the active sessions that it has joined.
If none of the sessions are active, then the Receiver sends HeartbeatResponse packets to its parent.

If a Receiver's parent node does not receive an ACK, NACK, or HeartbeatResponse packet within a specified time interval, the parent detects the failure of the Receiver and removes it from its child list.

4.10.4 Repair Head discovery

TRACK supports an option that allows the nodes in the TRACK tree to acquire the addresses and locations of their ancestors in the control tree, and the addresses of their parent's siblings. If a TRACK node's parent fails, the node can use this information to join an alternate control node.

4.10.5 Recovery

When a child node detects failure of its parent node, it can try to reconnect to an alternate Repair Head of the TRACK tree, or it can try to reconnect directly to the Sender.

4.11 Reliability Semantics

The reliability semantics TRACK provides are defined by the binding between a receiver and its repair head. When this binding is established, the repair head agrees to provide retransmission of missed packets for the receiver starting from a specific (receiver-requested) sequence number. At this time, the repair head MUST NOT have discarded any data packet from this sequence number onward.

Subsequently, a repair head needs to discard older packets from its buffer from time to time. The following two factors influence when to discard an old packet:
a) Stability - When all receivers immediately subordinate to the repair head have acknowledged receipt of a packet, that packet is considered stable. When the whole sub-tree of receivers below a repair head has received a packet, it is considered "strictly stable". TRACK provides no explicit support for this strict sense of stability (note that this form of reliability is also referred to as "pessimistic reliability").
b) Sender recovery window - Each data packet carries two sequence numbers: one is the sequence number of the current data packet, and the other is the sender-recommended sequence number from which recovery should start (smaller than the current sequence number). This pair of sequence numbers forms a sender-suggested recovery window.

A repair head MUST NOT discard any packet before it becomes stable. Per binding agreement or session-wide configuration, a repair head MAY be allowed to discard a packet when it moves outside of the sender recovery window.

When a repair head's buffer is full and none of the packets can be discarded (due to stability or recovery window requirements), newly arrived packets must be discarded and recovered later.

A receiver SHOULD NOT try to recover packets outside of the sender recovery window.

When a receiver loses its repair head due to a network partition or a repair head crash, the receiver MAY continue with the same reliability service if it manages to find and re-affiliate with another repair head. If the receiver fails to find an alternative repair head that can continue to provide reliability service where the previous repair head left off, the receiver MUST indicate failure to its application.

4.12 Ordering Semantics

TRACK offers two flavors of ordering semantics: Ordered or Unordered. One of these is selected on a per-session basis as part of the Session Configuration Parameters.

Unordered service provides a reliable stream of packets, without duplicates, and delivers them to the application in the order received. This allows the lowest-latency delivery for time-sensitive applications. It may also be used by applications that wish to provide their own jitter control.

Ordered service provides TCP semantics on delivery. All packets are delivered in the order sent, without duplicates.
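The difference between the two ordering flavors can be illustrated with a small receive-side buffer. The following Python sketch is illustrative only: the class name `DeliveryBuffer`, its `receive` method, and the assumption that sequence numbers are non-negative integers without wrap-around are hypothetical and not part of TRACK. In Unordered mode a packet is handed to the application as soon as it arrives unless it is a duplicate; in Ordered mode out-of-order packets are held until the gap before them is filled.

```python
from typing import Dict, List, Set, Tuple


class DeliveryBuffer:
    """Hypothetical receive-side delivery logic for TRACK Ordered/Unordered service.

    Assumes integer sequence numbers starting at `first_seq` with no wrap-around.
    """

    def __init__(self, ordered: bool, first_seq: int = 0) -> None:
        self.ordered = ordered
        self.next_seq = first_seq              # Ordered: next sequence number owed to the app
        self.pending: Dict[int, bytes] = {}    # Ordered: out-of-order packets held back
        self.seen: Set[int] = set()            # Unordered: sequence numbers already delivered

    def receive(self, seq: int, payload: bytes) -> List[Tuple[int, bytes]]:
        """Return the (seq, payload) pairs that may be delivered to the application now."""
        if not self.ordered:
            if seq in self.seen:
                return []                      # duplicate: suppress
            self.seen.add(seq)
            return [(seq, payload)]            # deliver in arrival order
        if seq < self.next_seq or seq in self.pending:
            return []                          # duplicate: already delivered or already held
        self.pending[seq] = payload
        ready: List[Tuple[int, bytes]] = []
        while self.next_seq in self.pending:   # release the contiguous run, in send order
            ready.append((self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return ready
```

For example, if packet 1 arrives before packet 0, an Unordered buffer delivers packet 1 immediately, while an Ordered buffer returns nothing until packet 0 arrives and then releases both. A real implementation would also need to bound the `seen` set (for example, by the sender recovery window), which this sketch does not do.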
4.13 SNMP Support

The Repair Heads and the Sender are designed to interact with SNMP management tools. This allows network managers to easily monitor and control the sessions being transmitted. SNMP MIBs are defined for all TRACK nodes. SNMP support is optional for Receiver nodes, but is required for all other nodes.

4.14 Late Join Semantics

TRACK offers three flavors of late join support:
a) No Recovery
   A receiver binds to a repair head after the session has started and agrees to the reliability service starting from the sequence number in the current data packet received from the sender.
b) Continuation
   This semantic is used when a receiver has lost its repair head and needs to re-affiliate. In this case, the receiver must indicate the oldest sequence number it needs to repair in order to continue the reliability service it had from the previous repair head. The binding occurs only if this is possible.
c) No Late Join
   For some applications (e.g., software distribution), it is important that a receiver receives either all data or no data. In this case, option (c) is used.

4.15 Application Signaling for Notification

TRACK provides two forms of application signaling for speedy acknowledgement:
a) End of stream - this is done when the application has finished sending all its data and wants to finish the session.
b) Synch - this is done when the application reaches a point in its data distribution where it wants to make sure all packets have been received before proceeding further. In this case the session is not ending.

In both cases, the application SHOULD be able to signal this through its transport API. In turn, TRACK carries the signal as a flag in its Data (or NullData) packets. For case (a), the flag is set in the last data packet of the session, and in additional NullData packets carrying the last sequence number.
For case (b), the flag is set in the data packet at which the application requires synch, and in additional NullData packets sent before any new data packets following the synch sequence number.

Upon receiving "end of stream", a receiver must try to recover data packets up to the indicated last sequence number and send its final ACK to its repair head. The receiver can then leave the repair head. When all the packets up to the last packet become stable, the repair head can leave.

Upon receiving "synch", the receivers and repair heads perform the same operations as for "end of stream", except that they keep their binding.

5. Functional Specification for TRACK Requirements of Building Blocks

Reference [2] provides a rationale for decomposing the RMT protocols into Building Blocks and Protocol Instantiations. This section provides a simple specification of the functions that TRACK requires from each of the Building Blocks. It also provides a basic description of the interfaces between these components.

Since the following overlaps with what is done in the BBs, all of section 5 is for discussion purposes only, and is not meant to replace what is specified in the supporting BBs. The BBs will define the actual algorithms.

5.1 NACK-based Reliability

This building block defines NACK-based loss detection/notification and recovery. The major issues it addresses are implosion prevention (suppression) and NACK semantics (i.e., how packets to be retransmitted should be specified, in the case of both selective and FEC loss repair).

The NACK suppression mechanisms used by TRACK are unicast NACKs with multicast confirmation and exponentially distributed timers. These suppression mechanisms need to minimize both delay and redundant messages. They may also need special weighting to work with Congestion Feedback.
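As an illustration of an exponentially distributed suppression timer, the Python sketch below draws each NACK delay from a truncated exponential distribution on [0, T] whose density grows as exp(lambda * t / T). This is one commonly used form of such a timer and is assumed here; it is not necessarily the exact algorithm of [9], and the congestion weighting bias is omitted. The names `nack_backoff` and `NackScheduler` and the parameters `max_delay` (T) and `lam` (lambda) are hypothetical. Because most draws fall near T, only a few Receivers fire early, and the first NACK or its multicast confirmation lets the others cancel their timers.

```python
import math
import random
from typing import Dict, List, Optional


def nack_backoff(max_delay: float, lam: float, rng: random.Random) -> float:
    """Sample a delay in [0, max_delay] with density proportional to exp(lam * t / max_delay).

    Inverse-CDF sampling: F(t) = (exp(lam * t / T) - 1) / (exp(lam) - 1), so
    t = (T / lam) * ln(1 + u * (exp(lam) - 1)) for u uniform in [0, 1).
    """
    if max_delay <= 0 or lam <= 0:
        raise ValueError("max_delay and lam must be positive")
    u = rng.random()
    return (max_delay / lam) * math.log1p(u * math.expm1(lam))


class NackScheduler:
    """Hypothetical per-Receiver table of pending NACK timers, keyed by sequence number."""

    def __init__(self, max_delay: float, lam: float, seed: Optional[int] = None) -> None:
        self.max_delay = max_delay
        self.lam = lam
        self.rng = random.Random(seed)
        self.deadlines: Dict[int, float] = {}

    def on_loss(self, seq: int, now: float) -> None:
        # Start a timer only once per missing packet; repeated detections do not restart it.
        if seq not in self.deadlines:
            self.deadlines[seq] = now + nack_backoff(self.max_delay, self.lam, self.rng)

    def on_repair_or_suppression(self, seq: int) -> None:
        # The packet itself or a suppression message for it arrived: cancel the timer.
        self.deadlines.pop(seq, None)

    def expired(self, now: float) -> List[int]:
        """Pop and return timers that have fired; the PI may group them into one NACK."""
        due = sorted(seq for seq, t in self.deadlines.items() if t <= now)
        for seq in due:
            del self.deadlines[seq]
        return due
```

Larger values of `lam` concentrate the delays closer to T, trading a slightly longer average delay for fewer redundant NACKs.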
5.1.1 NACK BB Algorithms

Exponential Back Off. When a packet is detected as lost, an exponentially distributed timer is set, based on the algorithms in [9]. This timer is biased by the input congestion weighting factor. If either the packet or an explicit suppression message with the same sequence number is detected before the timer goes off, the timer is cancelled.

NACK Generation. When a timer goes off, the protocol instantiation is notified to generate a NACK for that sequence number. The protocol instantiation may, at its discretion, group multiple NACK notifications into a single NACK packet. For TRACK, NACKs are implemented as a unicast packet with a multicast confirmation response.

Response to a Retransmission Request. When a Repair Head or other possible retransmission agent receives the first NACK from another group member for a given packet, it notifies the protocol instantiation to send either a data retransmission or, if it does not have the packet for retransmission, an optional suppression message. It then sets an embargo timeout, tied to the RTT to the furthest Receiver, during which other requests for the same packet are ignored. The length of this embargo doubles each time that a retransmission is sent. This algorithm should also work with requests for retransmission that come in the form of ACKs, as the algorithms and packet formats for both are identical, with the exception of the suppression mechanisms used.

GRA Signaling. A primary function of GRA is NACK elimination/suppression and subcasting of repairs. In order to do this, the transport needs to signal the GRA-enabled routers to turn on the appropriate algorithms. This algorithm has to deal with issues such as router topology changes. While not dealt with in detail here, this is a very subtle issue that will have to be handled carefully.
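The retransmission-request handling described above can be sketched as a small state machine keyed by sequence number. In this hypothetical Python sketch, `RepairResponder`, `on_request`, and the returned action strings are illustrative names, not part of the NACK BB. The initial embargo is assumed to equal the RTT to the furthest Receiver (`max_rtt`) and doubles after each retransmission; when the packet is not held locally, the sketch assumes a fixed embargo of one `max_rtt` after the suppression message, and leaves forwarding the request upstream to the protocol instantiation.

```python
from typing import Dict, Optional, Set


class RepairResponder:
    """Hypothetical Repair Head logic: answer the first request, then embargo repeats."""

    def __init__(self, max_rtt: float, buffered: Set[int]) -> None:
        self.max_rtt = max_rtt                     # assumed: RTT to the furthest Receiver
        self.buffered = buffered                   # sequence numbers available for repair
        self.embargo_until: Dict[int, float] = {}  # seq -> time before which requests are ignored
        self.embargo_len: Dict[int, float] = {}    # seq -> embargo length set by the last retransmission

    def on_request(self, seq: int, now: float) -> Optional[str]:
        """Handle a NACK (or ACK bitmap entry) for `seq` at time `now`.

        Returns "retransmit", "suppress", or None when the request falls inside an embargo.
        """
        if now < self.embargo_until.get(seq, float("-inf")):
            return None
        if seq in self.buffered:
            previous = self.embargo_len.get(seq)
            length = self.max_rtt if previous is None else 2 * previous  # doubles per retransmission
            self.embargo_len[seq] = length
            self.embargo_until[seq] = now + length
            return "retransmit"
        # Packet not held here: send an optional suppression message to the children.
        self.embargo_until[seq] = now + self.max_rtt
        return "suppress"
```

With `max_rtt` set to 1.0 seconds, requests for a buffered packet at t = 0, 0.5, 1.0, 2.5, and 3.0 produce retransmit, ignore, retransmit, ignore, retransmit, because the embargo grows from 1 to 2 to 4 seconds.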
5.1.2 NACK BB Parameters

Congestion Weighting. From Congestion Control BB. This is a weighting parameter for NACK suppression timers. The exact algorithms for this are still to be determined.

Loss Notification. From TRACK protocol instantiation. Notification at a Sender that a packet has been detected as lost, and the sequence number of that packet.

Retransmission Request. From TRACK protocol instantiation. When a NACK or ACK with a request for retransmission is received, this needs to be passed to the BB for handling retransmission requests.

GRA Enabled. From PI. Is GRA enabled in the network?

5.2 FEC Repair BB

This building block is concerned with packet-level FEC repair. It specifies the FEC codec selection and the FEC packet naming (indexing) for both reactive FEC and proactive FEC.

5.2.1 FEC BB Algorithms

FEC Input. Receive a window of packets (not necessarily all at once), and store pointers to them for use in the FEC Create Parity algorithm.

FEC Create Parity. Given a window of packets, create and return a parity packet. If the window is not yet full, first call the FEC Flush function. If no more parity packets can be generated for this window, then either return an error or return a parity packet that has already been generated. This uses one of a set of codecs, specified through the use of codepoints. For TRACK, it is expected that the codecs will operate over relatively small windows, to work with real-time applications and congestion control.

FEC Flush. For a window that needs a parity packet but is not yet full, FEC Flush creates all-zero packets for the rest of the packets in the window. No more calls to FEC Input can be made for this window after FEC Flush has been called.

FEC Decode. Given a set of received data and/or parity packets, decode the window using the specified FEC codec.
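To make the FEC Input, FEC Create Parity, FEC Flush, and FEC Decode functions concrete, the Python sketch below uses a single XOR parity packet per window, so FEC Maximum Parity is 1 and at most one lost packet per window can be repaired. This stand-in codec is chosen only for brevity; the FEC BB selects real codecs, which can produce several parity packets per window, by codepoint. The names `XorFecWindow`, `fec_decode`, and `xor_bytes` are hypothetical, and the sketch assumes every packet in a window has the same length.

```python
from typing import Dict, List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


class XorFecWindow:
    """Sender/Repair Head side of one FEC window with a single XOR parity packet."""

    def __init__(self, window_size: int, packet_len: int) -> None:
        self.window_size = window_size
        self.packet_len = packet_len
        self.packets: List[Optional[bytes]] = [None] * window_size
        self.flushed = False

    def fec_input(self, offset: int, packet: bytes) -> None:
        """FEC Input: store a data packet at its FEC Window Offset."""
        if self.flushed:
            raise RuntimeError("FEC Input is not allowed after FEC Flush")
        if len(packet) != self.packet_len:
            raise ValueError("all packets in a window must have the same length")
        self.packets[offset] = packet

    def fec_flush(self) -> None:
        """FEC Flush: fill the unused part of the window with all-zero packets."""
        self.packets = [p if p is not None else bytes(self.packet_len) for p in self.packets]
        self.flushed = True

    def fec_create_parity(self) -> bytes:
        """FEC Create Parity: XOR of every packet in the window, flushing first if needed.

        The parity packet's FEC Window Offset would be window_size.
        """
        if any(p is None for p in self.packets):
            self.fec_flush()
        parity = bytes(self.packet_len)
        for p in self.packets:
            parity = xor_bytes(parity, p)
        return parity


def fec_decode(received: Dict[int, bytes], window_size: int) -> Dict[int, bytes]:
    """FEC Decode: `received` maps FEC Window Offset -> packet; offset window_size is the parity."""
    missing = [i for i in range(window_size) if i not in received]
    if not missing:
        return {i: received[i] for i in range(window_size)}
    if len(missing) > 1 or window_size not in received:
        raise ValueError("a single XOR parity can repair at most one loss per window")
    lost = missing[0]
    repaired = received[window_size]
    for i in range(window_size):
        if i != lost:
            repaired = xor_bytes(repaired, received[i])
    window = {i: received[i] for i in range(window_size) if i != lost}
    window[lost] = repaired
    return dict(sorted(window.items()))
```

A codec with more than one parity packet per window (for example, a Reed-Solomon code) would follow the same four-function interface but replace the XOR arithmetic.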
5.2.2 FEC BB Parameters

Codec Code Point Index. Which codec is being used for encoding and decoding? This is a fixed parameter per data stream.

FEC Window Size. What is the number of packets in an FEC window? This is a fixed parameter per data stream.

FEC Maximum Parity. This is the maximum number of parity packets that can be generated over a given window size. This is a fixed parameter per data stream.

Data Packet Sequence Number. This is the sequence number of a data packet. This is input to FEC Input from the PI.

FEC Window Offset. For a given packet, what is the offset into an FEC window? This is associated with each Data packet that uses FEC. It is a header field on each Data packet sent. It is sequential over each packet in a window, unless a Flush occurs on a partially full window. In that case, the window offset of this last packet is set to FEC Window Size - 1. For parity packets, the FEC Window Offset starts at FEC Window Size, and goes up to FEC Window Size + FEC Maximum Parity - 1. This is returned from FEC Input and from FEC Create Parity.

5.3 Congestion Control BB

TRACK uses a source-based rate regulation algorithm, with a single rate provided to all the Receivers in the session.

The following set of algorithms and parameters is a subset of those needed for a full implementation, but gives an idea of what is required.

5.3.1 Congestion Control BB Algorithms

Initialization. A number of transport-wide parameters must be fed to each of the nodes in the group, such as minimum rate, maximum rate, data segment size, etc.

Receiver Measurements. The Receiver must keep track of its average loss rate and its RTT to the Sender. We will call these measures "congestion reports".

Receiver Feedback. The Receivers must feed these congestion reports back to the Sender, piggybacked on both NACKs and ACKs.

Hierarchical Aggregation.
Restricted worst-edge aggregation should be used to aggregate the congestion reports in the ACKs and/or NACKs being fed up the tree [10]. Every time that an ACK or NACK is generated, this algorithm should be called to fill in the appropriate fields. Every time an ACK or NACK is received, this algorithm should be called to process the congestion control fields in the packet. This algorithm must also be notified every time a child joins or leaves at a Repair Head or Sender.

Sender Rate Control. Based on the congestion reports received, the Sender must adjust its sending rate.

TCP Friendly Equation. Given values for RTT, DataSize, and LossRate, this generates a target throughput rate according to a modified version of the complex TCP model given in [11].

5.3.2 Congestion Control BB Parameters

Initialization Parameters. A set of different options, some of which can be permanent constants, while others are selected by either the Sender or a network manager.

Lost Packet. Every time a packet is detected as lost, the Senders must be notified of this.

RTT Measurement. Every time an RTT measurement is generated, either between Sender and Receiver(s), or between one level of the tree and another, the CC BB must be notified.

Highest Allowed Sequence Number (HASN). This is used to implement "receiver-driven" window control [13]. Each receiver can keep track of a congestion window and compute the HASN to be included in each ACK. A Repair Head aggregates the HASNs by computing the minimum value over all its children, and forwards that as its own HASN up the tree.

5.4 Generic Router Assist BB

The task of designing scalable RM protocols can be made easier by the presence of some specific support in routers.
In some application-specific cases, the increased benefits afforded by the addition of special router support can justify the resulting additional complexity and expense.

Functional components that can take advantage of router support include feedback aggregation/suppression (both for loss notification and congestion control) and constrained retransmission of repair packets.

The process of designing and deploying these mechanisms inside routers can be much slower than that required for end-host protocol mechanisms. Therefore, it would be highly advantageous to define these mechanisms in a generic way, so that multiple protocols can use them when they are available but need not depend on them.

This component has two halves: a signaling protocol and the actual router algorithms. The signaling protocol allows the transport protocol to request from the router the functions that it wishes to perform, and the router algorithms actually perform these functions.

An important component of the signaling protocol is some level of commonality between the packet headers of multiple protocols, which allows the router to recognize and interpret the headers. This is covered in the section on common packet headers, below.

5.4.1 GRA BB Algorithms

NACK Suppression. NACKs are sent towards the parent Repair Head or Sender with the Router Alert option set. GRA-enabled routers detect these packets and suppress redundant NACKs. Each such router then updates a soft state table so that it knows to retransmit the requested packet to the requesting children, using Dynamic Selective Retransmission. The NACK suppression algorithm needs to work with both ACKs and NACKs in order for Dynamic Selective Retransmission to work with TRACK. This means that GRA cannot suppress ACKs, but must still use them to update its state for retransmissions.
It also means that GRA must work with ACK and NACK selective bitmaps, not just NACKs that request a single packet.

Dynamic Selective Retransmission. When a retransmission occurs, each router forwards it only on the interfaces that have signaled, through the use of NACKs, that they need to see that packet.

Nearest Repair Head Hint. The router is made aware of the nearest Repair Heads, and is able to tell a child which is the best candidate for it to use. This must only be used as a hint to children.

Fine Grained Loss Reports. A major limitation of TFMCC is that it only gets 1-bit loss reports (i.e., a packet is lost, or it is not) from the routers. An 8- or 16-bit report, piggybacked onto data packets, carrying the cumulative loss detected across all interfaces of the GRA-enabled routers the data packet crossed, would allow TRACK to become much more responsive to changes in network conditions. These reports can only be used as hints.

Signaling Protocol. The functions for GRA need to be requested by the protocol ahead of time, and then the run-time packet headers need to be decipherable by the router.

5.4.2 GRA BB Parameters

GRA Enabled. Is GRA enabled in any of the routers in the network? Which functions does the deployed version of GRA support?

Packet Format. Which type of packet format is GRA to operate over? It is likely that different protocol instantiations will require differences in the packet headers they send to the router. This is tied to the common packet header BB, below.

5.5 Automatic Tree Configuration BB

TRACK takes advantage of hierarchical Repair Heads to greatly increase the theoretical scalability of the protocol. These Repair Heads are used to form a tree with the source at the root, the Receivers at the leaves of the tree, and the Repair Heads in the middle.
A Repair Head can be either dedicated server software for this task or
an application node that is performing dual duty.

The effectiveness of these agents in assisting data delivery depends
heavily on how well the logical tree they use to communicate matches
the underlying routing topology. The purpose of this building block is
to construct and manage the logical tree connecting the agents.
Ideally, this building block will perform these functions in a manner
that adapts to changes in session membership, routing topology, and
network availability.

5.5.1 Auto Tree BB Algorithms

These are discussed in Section 3.3. They are not yet mature enough to
be broken down into component parts.

5.5.2 Auto Tree BB Parameters

These are discussed in Section 3.3. The algorithms are not yet mature
enough to be broken down into the parameters they need.

5.6 Security

As specified in [12], the primary security requirement for a TRACK
protocol is protection of the transport infrastructure. This is
accomplished through lightweight group authentication of the control
packets and, optionally, the data packets sent to the group. These
algorithms use IPsec and shared symmetric keys. For TRACK, [12]
recommends one shared key for the Data Session and one for each Local
Control Channel. These keys are distributed through a separate key
manager component, which may be either centralized or distributed.
Each member of the group is responsible for contacting the key manager,
establishing a pair-wise security association with it, and obtaining
the appropriate keys. The TRACK protocol then provides options for
piggybacking key update messages on the Data Session and on each Local
Control Channel.
These key update messages can include either a new shared group key
(encrypted with the old group key) or a notification that the group
key(s) are being changed and that the group members should contact the
key manager to get the new key(s). The former typically occurs
periodically, while the latter may occur when a group member leaves.

The exact algorithms for this BB are presently the subject of research
within the IRTF Secure Multicast Group (SMuG). Solutions for these
requirements will be standardized within the IETF when ready.

5.7 Common Headers BB

As pointed out in the generic router support section, it is important
to have some level of commonality across packet headers. Common data
header formats may also be useful for other reasons. This building
block consists of recommendations on which packet header fields
protocols should make common across themselves. TRACK needs to
implement these recommendations in the TRACK PI.

5.7.1 Common Header BB Fields

GRA Signaling. The Retransmission, NACK, and ACK packet headers need to
provide a means of signaling their existence to GRA. For NACK and ACK
headers, the selective bitmap needs to be specified in a common way
across all protocols, so that the GRA component can interpret these
fields and determine the sequence numbers of the packets being
requested. For Retransmission packets, the sequence number of the
packet needs to be in a standard position so that GRA can interpret it.
For NACK, ACK, and Retransmission packets alike, the Session ID needs to
be specified in a standard way across protocols.

Data Packets. The identification of data packets within a stream should
be common across all protocols, both to aid commonality of application
semantics across protocols and to aid GRA signaling.
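To make the GRA Signaling requirement above concrete, the sketch below
shows one possible common encoding of an ACK or NACK selective bitmap.
It also shows how a GRA-enabled router could recover the requested
sequence numbers from that encoding. The layout is purely an assumption
for illustration. It consists of a base sequence number plus a bitmap,
where bit i (counted from the most significant bit of the first byte)
stands for sequence number base + i. This document does not define the
actual format, and the function names are hypothetical.

```python
from typing import List


def decode_selective_bitmap(base_seq: int, bitmap: bytes) -> List[int]:
    """Return the sequence numbers requested by a (hypothetical) selective bitmap.

    Assumed layout: bit i, counted from the most significant bit of the
    first byte, refers to sequence number base_seq + i; a set bit means
    that packet is being requested.
    """
    requested: List[int] = []
    for byte_index, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (0x80 >> bit):
                requested.append(base_seq + byte_index * 8 + bit)
    return requested


def encode_selective_bitmap(base_seq: int, missing: List[int], length: int) -> bytes:
    """Inverse of decode_selective_bitmap for a bitmap of `length` bytes."""
    bitmap = bytearray(length)
    for seq in missing:
        offset = seq - base_seq
        if not 0 <= offset < length * 8:
            raise ValueError(f"sequence number {seq} outside bitmap range")
        bitmap[offset // 8] |= 0x80 >> (offset % 8)
    return bytes(bitmap)
```

For example, encode_selective_bitmap(100, [100, 103, 109], 2) produces
the two bytes 0x90 0x40. Decoding those bytes with base 100 returns
[100, 103, 109]. Every protocol must agree on one such layout so that a
router can interpret the bitmap without protocol-specific logic.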
A Data Packet is identified by three fields: the Session ID, the
Sequence Number, and the FEC Window Offset. The Session ID may include
the multicast address and/or a unique ID. The Sequence Number starts at
1 and increments with each Data packet sent in the Session; sequence
numbers are always generated sequentially, without gaps. The FEC Window
Offset specifies the offset of a Data packet into an FEC window. For a
window of W generated packets, a maximum window size of M, and a
maximum parity size of P, with W never exceeding M, the packets are
numbered as follows. The first W-1 packets are numbered 0 through W-2.
Packet W is always numbered M-1, so that Receivers and Repair Heads can
detect a partially filled window. The P parity packets are numbered M
through M+P-1.

IP and UDP. It is easiest to implement protocols in application space
using UDP packets, but eventual kernel implementations will run TRACK
directly on top of IP. Other protocols share this requirement, and the
way this transition is made should be specified in common across all
protocols.

6. Security Considerations

7. References

1)  Bradner, S., "The Internet Standards Process -- Revision 3",
    BCP 9, RFC 2026, October 1996.
2)  Whetten, B., et al., "Reliable Multicast Transport Building Blocks
    for One-to-Many Bulk-Data Transfer", Internet Draft,
    draft-ietf-rmt-buildingblocks-02.txt, Work in Progress.
3)  Handley, M., et al., "The Reliable Multicast Design Space for Bulk
    Data Transfer", Internet Draft, draft-ietf-rmt-design-space-01.txt,
    Work in Progress.
4)  Bradner, S., "Key words for use in RFCs to Indicate Requirement
    Levels", BCP 14, RFC 2119, March 1997.
5)  Whetten, B., Taskale, G., "Overview of the Reliable Multicast
    Transport Protocol II (RMTP-II)", IEEE Networking, Special Issue on
    Multicast, February 2000.
6)  Nonnenmacher, J., Biersack, E., "Reliable Multicast: Where to use
    Forward Error Correction", Proc. 5th Workshop on Protocols for High
    Speed Networks, Sophia Antipolis, France, October 1996.
7)  Nonnenmacher, J., et al., "Parity-Based Loss Recovery for Reliable
    Multicast Transmission", Proc. ACM SIGCOMM '97, Cannes, France,
    September 1997.
8)  Rizzo, L., "Effective erasure codes for reliable computer
    communications protocols", DEIT Technical Report LR-970115.
9)  Nonnenmacher, J., Biersack, E., "Optimal Multicast Feedback", Proc.
    IEEE INFOCOM 1998, March 1998.
10) Whetten, B., Conlan, J., "A Rate Based Congestion Control Scheme for
    Reliable Multicast", GlobalCast Communications Technical White
    Paper, November 1998. http://www.talarian.com/rmtp-ii
11) Padhye, J., et al., "Modeling TCP Throughput: A Simple Model and its
    Empirical Validation", University of Massachusetts Technical Report
    CMPSCI TR 98-008.
12) Hardjono, T., Whetten, B., "Security Requirements for TRACK
    Protocols", Work in Progress.
13) Golestani, J., "Fundamental Observations on Multicast Congestion
    Control in the Internet", Bell Labs, Lucent Technologies, paper
    presented at the July 1998 RMRG meeting.
14) Kadansky, M., Chiu, D., Wesley, J., Provino, J., "Tree-based
    Reliable Multicast (TRAM)", draft-kadansky-tram-02.txt, Work in
    Progress.
15) Whetten, B., Basavaiah, M., Paul, S., Montgomery, T., "RMTP-II
    Specification", draft-whetten-rmtp-ii-00.txt, April 8, 1998, Work in
    Progress.
16) draft-ietf-rmt-bb-tree-config-00.txt

8. Acknowledgments

Special thanks go to the following individuals, who have contributed to
the design and review of this document.

Supratik Bhattacharyya, Sprint Labs
Seok Koh, ETRI Korea
Joseph Wesley, Sun Microsystems

9. Authors' Addresses

Brian Whetten
Talarian Corporation
333 Distel Circle
Los Altos CA 94022
whetten@talarian.com

Dah Ming Chiu
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803
dahming.chiu@sun.com

Sanjoy Paul
Edgix Corporation
130 W. 42nd Street, Suite 850
New York, NY 10036
sanjoy@edgix.com

Miriam Kadansky
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803
miriam.kadansky@east.sun.com

Gursel Taskale
Talarian Corporation
333 Distel Circle
Los Altos CA 94022
whetten@talarian.com

Full Copyright Statement

Copyright (C) The Internet Society, 2000. All Rights Reserved. This
document and translations of it may be copied and furnished to others,
and derivative works that comment on or otherwise explain it or assist
in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing the
copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into other languages.