INTERNET-DRAFT  PGM Reliable Transport Protocol            Tony Speakman
Expires 24 February 1999                                  Dino Farinacci
                                                              Steven Lin
                                                            Alex Tweedly

                                                           cisco Systems
                                                          24 August 1998

             PGM Reliable Transport Protocol Specification

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

Pragmatic General Multicast (PGM) is a reliable multicast transport
protocol for applications that require ordered, duplicate-free,
multicast data delivery from multiple sources to multiple receivers.
PGM guarantees that a receiver in the group either receives all data
packets from transmissions and retransmissions, or is able to detect
unrecoverable data packet loss. PGM is specifically intended as a
workable solution for multicast applications with basic reliability
requirements. Its central design goal is simplicity of operation with
due regard for scalability and network efficiency.

Revision History

January 1998  draft-speakman-pgm-spec-00.txt

   Original draft.

January 1998  draft-speakman-pgm-spec-01.txt

   Deleted reference to proprietary trademark.

August 1998   draft-speakman-pgm-spec-02.txt

   This revision benefited from general discussions in the forum of the
   Reliable Multicast IRTF as well as from individual discussion with
   Dan Leshchiner concerning source addressing and NAK elimination, with
   Chetan Rai concerning TPDU ordering and local retransmission, and
   with Jim Gemmell, Luigi Rizzo, and Lorenzo Vicisano concerning FEC.

   Clarified that RDATA from DLRs and NCFs from network elements must
   bear the ODATA source's source NLA.

   Added NAK elimination timer and corresponding procedures to network
   elements.

   Added procedures and packet formats to incorporate FEC.

   Changed all the packet type encodings to anticipate versioning and
   extension.

   Added work-in-progress items for RDATA delay at the source and
   minimum NAK back-off at receivers.

   Added work-in-progress items for SPMRs.

Table of Contents

   Abbreviations
   1.  Introduction and Overview
   2.  Architectural Description
   3.  Terms and Concepts
   4.  Procedures - General
   5.  Procedures - Sources
   6.  Procedures - Receivers
   7.  Procedures - Network Elements
   8.  Packet Formats
   9.  Options
   10. Security Considerations
   Appendix A - Forward Error Correction
   Appendix B - Congestion Avoidance
   Appendix C - Flow Control
   Work in Progress
   Acknowledgements
   References

Abbreviations

   ACK     Acknowledgement
   AFI     Address Family Indicator
   APDU    Application Protocol Data Unit
   ARQ     Automatic Repeat reQuest
   DLR     Designated Local Retransmitter
   GSI     Globally Unique Source Identifier
   FEC     Forward Error Correction
   MD5     Message-Digest Algorithm
   MTU     Maximum Transmission Unit
   NAK     Negative Acknowledgement
   NCF     NAK Confirmation
   NLA     Network Layer Address
   NNAK    Null Negative Acknowledgment
   ODATA   Original Data
   RDATA   Retransmitted Data
   RSN     Receive State Notification
   SPM     Source Path Message
   SPMR    SPM Request
   TG      Transmission Group
   TGSIZE  Transmission Group Size
   TPDU    Transport Protocol Data Unit
   TSI     Transport Session Identifier
   TSN     Transmit State Notification

1. Introduction and Overview

A variety of reliable protocols have been proposed for multicast data
delivery, each with an emphasis on particular types of applications,
network characteristics, or definitions of reliability ([1], [2], [3],
[4]). In this tradition, Pragmatic General Multicast (PGM) is a
reliable transport protocol for applications that require ordered,
duplicate-free, multicast data delivery from multiple sources to
multiple receivers.

PGM is specifically intended as a workable solution for multicast
applications with basic reliability requirements rather than as a
comprehensive solution for multicast applications with sophisticated
ordering, agreement, and robustness requirements. Its central design
goal is simplicity of operation with due regard for scalability and
network efficiency.

PGM has no notion of group membership. It simply provides reliable
multicast data delivery within a transmit window advanced by a source in
the absence of negative acknowledgments from any receiver. Reliable
delivery is provided within a source's transmit window from the time a
receiver joins the group until it departs.
PGM guarantees that a receiver in the group either receives all data
packets from transmissions and retransmissions, or is able to detect
unrecoverable data packet loss. PGM supports any number of sources
within a multicast group, each fully identified by a globally unique
Transport Session Identifier (TSI), but since these sources/sessions
operate entirely independently of each other, this specification is
phrased in terms of a single source and extends without modification to
multiple sources.

More specifically, PGM is not intended for use with applications that
depend either upon acknowledged delivery to a known group of recipients,
or upon total ordering amongst multiple sources.

Rather, PGM is best suited to those applications in which members may
join and leave at any time, and that are either insensitive to
unrecoverable data packet loss or are prepared to resort to application
recovery in that event. Through its optional extensions, PGM provides
specific mechanisms to support applications as disparate as stock and
news updates, data conferencing, and low-delay, real-time video
transfer.

In the following text, transport-layer originators of PGM data packets
are referred to as sources, transport-layer consumers of PGM data
packets are referred to as receivers, and network-layer entities in the
intervening network are referred to as network elements.

1.1. Summary of Operation

PGM runs over a datagram multicast protocol such as IP multicast [5].
In the normal course of data transfer, a source multicasts sequenced
data packets (ODATA), and receivers unicast selective negative
acknowledgements (NAKs) for data packets detected to be missing from the
expected sequence.
Network elements forward NAKs PGM-hop-by-PGM-hop to the source, and
confirm each hop by multicasting a NAK confirmation (NCF) in response on
the interface on which the NAK was received. Retransmissions (RDATA)
may be provided either by the source itself or by a Designated Local
Retransmitter (DLR) in response to a NAK, or by another receiver in
response to an NCF.

Since NAKs provide the sole mechanism for reliability, PGM is
particularly sensitive to their loss. To minimize NAK loss, PGM defines
a network-layer hop-by-hop procedure for reliable NAK forwarding.

Upon detection of a missing data packet, a receiver repeatedly unicasts
a NAK to the last-hop PGM network element on the distribution tree from
the source. A receiver repeats this NAK until it receives a NAK
confirmation (NCF) multicast to the group from that PGM network element.
That network element responds with an NCF to the first occurrence of the
NAK and any further retransmissions of that same NAK from any receiver.
In turn, the network element repeatedly forwards the NAK to the upstream
PGM network element on the reverse of the distribution path from the
source of the original data packet until it also receives an NCF from
that network element. Finally, the source itself receives and confirms
the NAK by multicasting an NCF to the group.

While NCFs are multicast to the group, they are not propagated by PGM
network elements since they act as hop-by-hop confirmations.

To avoid NAK implosion, PGM specifies procedures for subnet-based NAK
suppression amongst receivers and NAK elimination within network
elements. The usual result of this procedure is the propagation of just
one copy of a given selective NAK along the reverse of the distribution
path from any network with directly connected receivers to a source.
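
The NAK confirmation and elimination behaviour summarized above can be
sketched as follows. This is a minimal illustration rather than the
procedure specified later in this document: the class name, the method
name, and the tuple-shaped "actions" are hypothetical, and the repeat
timers that drive re-forwarding until an upstream NCF arrives are
omitted.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkElement:
    """Hypothetical model of one PGM network element's NAK handling."""

    # (tsi, sqn) -> set of interfaces on which a NAK for that packet arrived
    retransmit_state: dict = field(default_factory=dict)

    def on_nak(self, tsi, sqn, in_iface):
        """Return the list of actions taken for one received NAK."""
        # Every NAK, first copy or duplicate, is confirmed with an NCF
        # on the interface on which it was received.
        actions = [("NCF", in_iface, tsi, sqn)]
        key = (tsi, sqn)
        if key not in self.retransmit_state:
            # Only the first copy is forwarded toward the source.
            self.retransmit_state[key] = set()
            actions.append(("FORWARD_NAK", "upstream", tsi, sqn))
        # Remember where the request came from.
        self.retransmit_state[key].add(in_iface)
        return actions


ne = NetworkElement()
first = ne.on_nak("tsi-A", 7, "if0")   # confirmed and forwarded upstream
dup = ne.on_nak("tsi-A", 7, "if1")     # confirmed only; duplicate eliminated
```

In this sketch the second NAK for the same TSI and sequence number still
earns an NCF on its own interface but is not forwarded again, which is
how a single copy of a given NAK comes to travel toward the source.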

The net effect is that unicast NAKs return from a receiver to a source
on the reverse of the path on which ODATA was forwarded, that is, on the
reverse of the distribution tree from the source. More specifically,
they return through exactly the same sequence of PGM network elements
through which ODATA was forwarded, but in reverse. The reasons for
handling NAKs this way will become clear in the discussion of
constraining retransmissions, but first it is necessary to describe the
mechanisms for establishing the requisite source path state in PGM
network elements.

To establish source path state in PGM network elements, the basic data
transfer operation is augmented by Source Path Messages (SPMs) from a
source, periodically interleaved with ODATA. SPMs function primarily to
establish source path state for a given TSI in all PGM network elements
on the distribution tree from the source. PGM network elements use this
information to address returning unicast NAKs directly to the upstream
PGM network element toward the source, and thereby ensure that NAKs
return from a receiver to a source on the reverse of the distribution
path for the TSI.

SPMs also act to alert receivers that the oldest data in the transmit
window is about to be retired from the transmit window and will,
thereafter, not be available for retransmission from the source. SPMs
are sent by a source at least at the rate at which the transmit window
is advanced, and they serve to provoke further NAKs from receivers as
well as to maintain receive window state in the receivers.

As a further efficiency, PGM specifies procedures for the constraint of
retransmissions by network elements so that they reach only those group
members that missed the original transmission.
As NAKs traverse the reverse of the ODATA path (upward), they establish
retransmit state in the network elements which is used in turn to
constrain the (downward) forwarding of the corresponding RDATA.

Besides procedures for other receivers to provide retransmissions, PGM
also specifies options and procedures that permit designated local
retransmitters (DLRs) to announce their availability and to redirect
retransmission requests (NAKs) to themselves rather than to the original
source. In addition to these conventional procedures for loss recovery
through selective ARQ, Appendix A specifies Forward Error Correction
(FEC) procedures for sources to provide and receivers to request general
error correcting parity packets rather than selective retransmissions.

Finally, since PGM operates without regular return traffic from
receivers, conventional feedback mechanisms for transport flow and
congestion control cannot be applied. Appendix B specifies some
preliminary strategies for congestion avoidance to be modified and
proven or discarded as experience dictates. Appendix C specifies a
basic and optional flow control supplement native to PGM itself that
introduces a degree of receiver feedback, but it is entirely elective
and not meant as a replacement for reservation protocols or other
out-of-band resource and conference management strategies. In its basic
operation, therefore, PGM relies on a purely rate-limited transmission
strategy in the source to bound the bandwidth consumed by PGM transport
sessions and to define the transmit window maintained by the source.

PGM defines four basic packet types: three that flow downstream (SPMs,
DATA, NCFs), and one that flows upstream (NAKs).

1.2. Design Goals and Constraints

PGM has been designed to serve the broad range of multicast applications
that have relatively simple reliability requirements, and to do so in a
way that realizes the much advertised but often unrealized network
efficiencies of multicast data transfer. The usual impediments to
realizing these efficiencies are the implosion of negative and positive
acknowledgements from receivers to senders, retransmission latency from
the source, and the propagation of retransmissions to disinterested
receivers.

1.2.1. Reliability

Reliable data delivery across an unreliable network is conventionally
achieved through an end-to-end protocol in which a source (implicitly or
explicitly) solicits receipt confirmation from a receiver, and the
receiver responds positively or negatively. While the frequency of
negative acknowledgements is a function of the reliability of the
network and the receiver's resources (and so, potentially quite low),
the frequency of positive acknowledgements is fixed at no less than the
rate at which the transmit window is advanced, and is usually higher.

Negative acknowledgements primarily determine retransmissions and
reliability. Positive acknowledgements primarily determine transmit
buffer management.

When these principles are extended without modification to multicast
protocols, the result, at least for positive acknowledgements, is a
burden of positive acknowledgements transmitted to the source that
quickly threatens to overwhelm it as the number of receivers grows.
More succinctly, ACK implosion keeps ACK-based reliable multicast
protocols from scaling well.

One of the goals of PGM is to get as strong a definition of reliability
as possible from as simple a protocol as possible.
ACK implosion can be addressed in a variety of effective but complicated
ways, most of which require retransmission capability from nodes other
than the original source.

An alternative is to dispense with positive acknowledgements altogether,
and to resort to other strategies for buffer management while retaining
negative acknowledgements for retransmissions and reliability. The
approach taken in PGM is to retain negative acknowledgements, but to
dispense with positive acknowledgements and resort instead to timeouts
at the source to manage transmit resources.

The definition of reliability with PGM is a direct consequence of this
design decision. PGM guarantees that a receiver either receives all
data packets from transmissions and retransmissions, or is able to
detect unrecoverable data packet loss.

PGM includes strategies for repeatedly soliciting NAKs from receivers,
and for adding reliability to the NAKs themselves. By reinforcing the
NAK mechanism, PGM minimizes the probability that a receiver will detect
a missing data packet so late that the packet is unavailable for
retransmission either from the source, from another receiver, or from a
designated local retransmitter (DLR). Without ACKs and knowledge of
group membership, however, PGM cannot eliminate this possibility.

1.2.2. Group Membership

A second consequence of eliminating ACKs is that knowledge of group
membership is neither required nor provided by the protocol. Although a
source may receive some PGM packets (NAKs for instance) from some
receivers, the identity of the receivers does not figure in the
processing of those packets. Group membership may change during the
course of a PGM transport session without the knowledge of or
consequence to the source or the remaining receivers.

1.2.3. Efficiency

While PGM avoids the implosion of positive acknowledgements simply by
dispensing with ACKs, the implosion of negative acknowledgements is
addressed directly.

Receivers observe a random back-off before generating a NAK; if a
matching NCF arrives during that interval, the receiver suppresses the
NAK. In addition, PGM network elements eliminate duplicate NAKs
received on different interfaces on the same network element. The
combination of these two strategies usually results in the source
receiving just a single NAK for any given lost data packet.

Whether a retransmission is provided from another receiver, a DLR, or
the original source, it is important to constrain that retransmission to
only those network segments containing members that negatively
acknowledged the original transmission rather than propagating it
throughout the group. PGM specifies procedures for network elements to
use the pattern of NAKs to define a sub-tree within the group upon which
to forward the corresponding retransmission so that it reaches only
those receivers that missed it in the first place.

1.2.4. Simplicity

PGM is designed to achieve the greatest improvement in reliability (as
compared to the usual UDP) with the least complexity. As a result, PGM
does NOT address conference control, global ordering amongst multiple
sources in the group, nor recovery from network partitions.

1.2.5. Operability

PGM is designed to function, albeit with less efficiency, even in the
presence of network elements that have no knowledge of PGM. To that
end, all PGM data packets can be conventionally multicast routed by
non-PGM network elements with no loss of functionality, but with some
inefficiency in the propagation of RDATA and NCFs.

In addition, since NAKs are unicast to the last-hop PGM network element
and NCFs are multicast to the group, NAK/NCF operation is also
consistent across non-PGM network elements. However, since the NAK
suppression back-off delay is a protocol constant, and receivers rely on
the NCF to suppress NAKs, receivers must always have a PGM network
element as a first hop network element between themselves and every path
to every PGM source. If receivers are several hops removed from the
first PGM network element, the efficacy of NAK suppression may degrade.

1.3. Options

In addition to the basic data transfer operation described above, PGM
specifies several end-to-end options to address specific application
requirements. PGM specifies options to support fragmentation, sequence
number ranges, late joining, time-stamping, reception quality reports,
sequence number dropout, redirection, and Forward Error Correction
(FEC). Options may be appended to PGM packet headers only by their
original transmitters. While they may be interpreted by network
elements, options are neither added nor removed by network elements.

All options are receiver-significant (i.e., they must be interpreted by
receivers). Some options are also network-significant (i.e., they must
be interpreted by network elements).

Fragmentation may be used in conjunction with data packets to allow a
transport-layer entity at the source to break up application-layer data
packets into multiple PGM data packets to conform with the maximum
transmission unit (MTU) supported by the network layer. Fragmentation
is incompatible with the sequence number dropout option.

Sequence number ranges may be used in conjunction with NAKs to allow
receivers to negatively acknowledge a contiguous range of missing
sequence numbers in a single NAK.

Late joining allows a source to indicate whether or not receivers may
request all available retransmissions when they initially join a
particular transport session.

Time stamps may be used in conjunction with NAKs to allow receivers to
specify the interval in which the requested RDATA is relevant to them.
That interval is interpreted by both network elements and sources to
determine whether to continue with or abandon a given retransmission.

Reception quality reports may be used in conjunction with NAKs to allow
receivers to provide a reception quality metric for local interpretation
at the source for the purpose of congestion control.

Sequence number dropout may be used in conjunction with data packets to
allow sources and network elements to selectively eliminate PGM data
packets and convey the resulting sequence-number discontinuity to
receivers so that reliability can be preserved across the dropout.
Sequence number dropout is incompatible with the fragmentation option.

Redirection may be used in conjunction with NCFs to allow a DLR to
respond to normal NCFs with a redirecting NCF advertising its own
address as an alternative to the original source. Recipients of
redirecting NCFs may then direct subsequent NAKs to the DLR rather than
to the original source. In addition, receivers or network elements that
redirect NAKs to a DLR must also send a NULL NAK to provide congestion
feedback to the original source without also provoking a retransmission
from that source.

FEC techniques may be applied by receivers to use source-provided parity
packets rather than selective retransmissions to effect loss recovery.

2. Architectural Description

As an end-to-end transport protocol, PGM specifies packet formats and
procedures for sources to transmit and for receivers to receive data.
To enhance the efficiency of this data transfer, PGM also specifies
packet formats and procedures for network elements to improve the
reliability of NAKs and to constrain the propagation of retransmissions.
The division of these functions is described in this section and
expanded in detail in the sections that follow.

2.1. Source Functions

Data Transmission

   Sources multicast ODATA packets to the group within the transmit
   window at a given transmit rate.

Source Path State

   Sources multicast SPMs to the group, interleaved with ODATA if
   present, to establish source path state in PGM network elements.

NAK Reliability

   Sources multicast NCFs to the group in response to any NAKs they
   receive.

Data Retransmission

   Sources multicast RDATA packets to the group in response to NAKs
   received for data packets within the transmit window.

Transmit Window Advance

   Sources multicast SPMs to the group in preparation for advancing
   the transmit window. Sources may simply advance the window with
   the passage of time, or they may delay advancing the window until
   no NAKs for the expiring fraction of the window are received
   within a given SPM response interval.

2.2. Receiver Functions

Source Path State

   Receivers use SPMs to determine the last-hop PGM network element
   for a given TSI to which to direct their NAKs.

Data Reception

   Receivers receive ODATA within the transmit window and eliminate
   any duplicates.

Retransmission Requests

   Receivers unicast NAKs to the last-hop PGM network element for
   data packets within the receive window detected to be missing from
   the expected sequence. A receiver must repeatedly transmit a
   given NAK until it receives a matching NCF.

NAK Suppression

   Receivers suppress NAKs for which a matching NCF is received
   during the NAK transmit back-off interval.

Local Retransmission

   Receivers may multicast retransmissions of any data in their
   receive windows for which they receive a matching NCF.

Local Retransmission Suppression

   Receivers suppress retransmissions for which a matching
   retransmission is received during the retransmit back-off
   interval.

Receive Window Advance

   Receivers advance their receive windows as directed by an SPM
   unless they detect that they are missing data packets in the
   expiring fraction of the window. Receivers should expedite
   retransmission requests for missing data packets in the expiring
   fraction of the window.

   Receivers immediately advance their receive windows upon receipt
   of any PGM data packet within the receive window that advances the
   receive window.

2.3. Network Element Functions

Network elements forward ODATA without intervention.

Source Path State

   Network elements intercept SPMs and use them to establish source
   path state for the corresponding source and group before multicast
   forwarding them in the usual way.

NAK Reliability

   Network elements multicast NCFs to the group in response to any
   NAK they receive. For each NAK received, network elements create
   retransmit state recording the transport session identifier, the
   sequence number of the NAK, and the input interface on which the
   NAK was received.

Constrained NAK Forwarding

   Network elements repeatedly unicast forward only the first copy of
   any NAK they receive to the upstream PGM network element on the
   distribution path for the TSI and only until they receive an NCF
   in response.

NAK Elimination

   Network elements discard exact duplicates of any NAK for which
   they already have retransmit state (i.e., that has been forwarded
   either by themselves or a neighbouring PGM network element), and
   respond with a matching NCF.

Constrained RDATA Forwarding

   Network elements use NAKs to maintain retransmit state consisting
   of a list of interfaces upon which a given NAK was received, and
   they return the corresponding RDATA only on these interfaces.

NAK Anticipation

   If a network element hears an upstream NCF (i.e., on the upstream
   interface for the distribution tree for the TSI), it establishes
   retransmit state without outgoing interfaces in anticipation of
   responding to and eliminating duplicates of the NAK that may
   arrive from downstream.

3. Terms and Concepts

Before proceeding from the preceding overview to the detail in the
subsequent Procedures, this section presents some concepts and
definitions that make that detail more intelligible.

3.1. Transport Session Identifiers

Every PGM packet is identified by a:

   TSI    transport session identifier

TSIs must be globally unique, and only one source at a time may act as
the source for a transport session. (Note that retransmitters do not
change the TSI in any RDATA they transmit.) TSIs are composed of the
concatenation of a globally unique source identifier (GSI) and a
source-assigned source port.

Since all PGM packets originated by receivers are in response to PGM
packets originated by a source, receivers simply echo the TSI heard from
the source in any corresponding packets they originate.

Since all PGM packets originated by network elements are in response to
PGM packets originated by a receiver, network elements simply echo the
TSI heard from the receiver in any corresponding packets they originate.

3.2. Sequence Numbers

PGM uses a circular sequence number space from 0 through ((2**32) - 1)
to identify and order ODATA packets. Sources must number ODATA packets
in unit increments in the order in which the corresponding application
data is submitted for transmission.
Within a transmit or receive window (defined below), a sequence
number x is "less" or "older" than sequence number y if it numbers an
ODATA packet preceding ODATA packet y, and a sequence number y is
"greater" or "more recent" than sequence number x if it numbers an
ODATA packet subsequent to ODATA packet x.

3.3. Transmit Window

The description of the operation of PGM rests fundamentally on the
definition of the source-maintained transmit window. This definition
in turn is derived directly from the amount of transmitted data (in
seconds) a source retains for retransmission (TXW_SECS), and the
maximum transmit rate (in bytes/second) maintained by a source to
regulate its bandwidth utilization (TXW_MAX_RTE).

The size of the transmit window in seconds is simply TXW_SECS. The
size of the transmit window in bytes (TXW_BYTES) is (TXW_MAX_RTE *
TXW_SECS). The size of the transmit window in sequence numbers
(TXW_SQNS) is (TXW_BYTES / bytes-per-packet).

In terms of sequence numbers, the transmit window is the range of
sequence numbers consumed by the source for sequentially numbering
and transmitting the most recent TXW_SECS of ODATA packets. The
trailing (or left) edge of the transmit window (TXW_TRAIL) is defined
as the sequence number of the oldest data packet available for
retransmission from a source. The leading (or right) edge of the
transmit window (TXW_LEAD) is defined as the sequence number of the
most recent data packet a source has transmitted.

The size of the transmit window in sequence numbers (TXW_SQNS) (i.e.,
the difference between the leading and trailing edges) must be no
greater than half the PGM sequence number space less one.

The fraction of the transmit window size (in seconds of data) by
which the transmit window is advanced (TXW_ADV_SECS) is called the
window increment.
The trailing (oldest) such fraction of the transmit window itself is
called the increment window.

In terms of sequence numbers, the increment window is the range of
sequence numbers that will be the first to be expired from the
transmit window. The trailing (or left) edge of the increment window
is just TXW_TRAIL, the trailing (or left) edge of the transmit
window. The leading (or right) edge of the increment window (TXW_INC)
is defined as one less than the sequence number of the first data
packet transmitted by the source TXW_ADV_SECS after transmitting
TXW_TRAIL.

A data packet is described as being "in" the transmit or increment
window, respectively, if its sequence number is in the range defined
by the transmit or increment window, respectively.

The transmit window is advanced across the increment window by the
source when it increments TXW_TRAIL to TXW_INC. When the transmit
window is advanced across the increment window, the increment window
is emptied (i.e., TXW_TRAIL is momentarily equal to TXW_INC), begins
to refill immediately as transmission proceeds, is full again
TXW_ADV_SECS later (i.e., TXW_TRAIL is separated from TXW_INC by
TXW_ADV_SECS of data), at which point the transmit window is advanced
again, and so on.

Consider the following example:

   Assuming a constant transmit rate of 128kbps and a constant data
   packet size of 1500 bytes, if a source maintains the past 30
   seconds of data for retransmission and increments its transmit
   window in 5 second increments, then

      TXW_MAX_RTE  = 16kBps,
      TXW_ADV_SECS = 5 seconds,
      TXW_SECS     = 35 seconds,
      TXW_BYTES    = 560kB,
      TXW_SQNS     = 383 (rounded up),

   and the size of the increment window in sequence numbers
   (TXW_MAX_RTE * TXW_ADV_SECS / 1500) = 54 (rounded down).

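The arithmetic of this example can be checked with a short sketch.
The variable names other than the TXW_* quantities are assumptions
made for illustration. Note that the stated figures are reproduced
only if "k" is read as 1024. With k = 1000 the same formulas would
give 374 and 53 instead of 383 and 54.

```python
import math

KILO = 1024  # assumption: "k" means 1024, which matches the figures above

rate_bits_per_sec = 128 * KILO   # 128kbps
packet_bytes = 1500              # constant data packet size
txw_adv_secs = 5                 # TXW_ADV_SECS
txw_secs = 35                    # TXW_SECS (30 seconds retained + 5)

txw_max_rte = rate_bits_per_sec // 8                # bytes per second
txw_bytes = txw_max_rte * txw_secs                  # TXW_BYTES
txw_sqns = math.ceil(txw_bytes / packet_bytes)      # TXW_SQNS, rounded up
inc_sqns = (txw_max_rte * txw_adv_secs) // packet_bytes  # rounded down

print(txw_max_rte // KILO, "kBps")   # TXW_MAX_RTE
print(txw_bytes // KILO, "kB")       # TXW_BYTES
print(txw_sqns, inc_sqns)            # window sizes in sequence numbers
```
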
Continuing this example, the following is a diagram of the transmit
window and the increment window therein in terms of sequence numbers.

   TXW_TRAIL                                      TXW_LEAD
       |                                              |
       |                                              |
    |--|--------------- Transmit Window --------------|---|
    v  |                                              |   v
       v                                              v
... +-----+-----+-...-+------+------+-...-+-------+-------+ .....
n-1 |  n  | n+1 | ... | n+53 | n+54 | ... | n+381 | n+382 | n+383
... +-----+-----+-...-+------+------+-...-+-------+-------+ .....
                          ^
    ^                     |  ^
    |-- Increment Window -|--|
                          |
                          |
                       TXW_INC

So the values of the sequence numbers defining these windows are:

   TXW_TRAIL = n
   TXW_INC   = n+53
   TXW_LEAD  = n+382

NOTA BENE: In this example the window sizes in terms of sequence
numbers can be determined only because of the assumption of a
constant data packet size of 1500 bytes. When the data packet sizes
are variable, more or fewer sequence numbers may be consumed
transmitting the same amount (TXW_BYTES) of data.

So, for a given transport session identified by a TSI, a source
maintains:

   TXW_MAX_RTE  a maximum transmit rate in kBytes per second, the
                cumulative transmit rate of ODATA plus RDATA

   TXW_TRAIL    the sequence number defining the trailing edge of the
                transmit window, the sequence number of the oldest
                data packet available for retransmission

   TXW_LEAD     the sequence number defining the leading edge of the
                transmit window, the sequence number of the most
                recently transmitted ODATA packet

   TXW_INC      the sequence number defining the leading edge of the
                increment window, the sequence number of the most
                recently transmitted data packet amongst those that
                will expire upon the next increment of the transmit
                window

Happily, everything else in this section is a LOT easier to explain
than the transmit window.

3.4. Receive Window

The receive window at the receivers is determined entirely by PGM
packets from the source.

For a given transport session identified by a TSI, a receiver
maintains:

   RXW_TRAIL  the sequence number defining the trailing edge of the
              receive window, the sequence number (known from data
              packets and SPMs) of the oldest data packet available
              for retransmission from the source

   RXW_LEAD   the sequence number defining the leading edge of the
              receive window, the greatest sequence number of any
              received data packet

   RXW_INC    the sequence number defining the leading edge of the
              increment window, the greatest sequence number (known
              from SPMs) amongst the sequence numbers of those data
              packets that will expire upon the next increment of the
              receive window

The receive window is the range of sequence numbers a receiver is
expected to use to identify receivable ODATA.

The increment window is the range of sequence numbers that will be
the first to be made unavailable for retransmission by the source. It
is the range of the oldest sequence numbers from (and including)
RXW_TRAIL through RXW_INC.

A data packet is described as being "in" the receive or increment
window if its sequence number is in the receive or increment window.

The receive window is advanced by the receiver when it receives an
SPM that increments RXW_TRAIL. Receivers also advance their receive
windows upon receipt of any PGM data packet within the receive window
that advances the receive window.

3.5. Source Path State

To establish the retransmit state required to constrain RDATA, it's
essential that NAKs return from a receiver to a source on the reverse
of the distribution tree from the source. That is, they must return
through the same sequence of PGM network elements through which the
ODATA was forwarded, but in reverse. There are two reasons for this,
the less obvious one being by far the more important one.

The first and obvious reason is that RDATA is forwarded on the same
path as ODATA and so retransmit state must be established on this
path if it is to constrain the propagation of RDATA.

The second and less obvious reason is that in the absence of
retransmit state, PGM network elements do NOT forward RDATA, so the
default behaviour is to discard retransmissions. If retransmit state
is not properly established for interfaces on which ODATA went
missing, then receivers on those interfaces will continue to NAK for
lost data and ultimately experience unrecoverable data loss.

The principal function of SPMs is to provide the source path state
required for PGM network elements to forward NAKs from one PGM
network element to the next on the reverse of the distribution tree
for the TSI, establishing retransmit state each step of the way. This
source path state is simply the address of the upstream PGM network
element on the reverse of the distribution tree for the TSI. That
upstream PGM network element may be more than one actual hop away.
SPMs establish the identity of the upstream PGM network element on
the distribution tree for each TSI in each group in each PGM network
element, a sort of virtual PGM topology. So although NAKs are unicast
addressed, they are NOT unicast routed by PGM network elements in the
conventional sense. Instead PGM network elements use the source path
state established by SPMs to direct NAKs PGM-hop-by-PGM-hop toward
the source. The idea is to constrain NAKs to the pure PGM topology
spanning the more heterogeneous underlying topology of both PGM and
non-PGM network elements.

The result is retransmit state in every PGM network element between
the receiver and the source so that the corresponding RDATA is never
discarded by a PGM network element for lack of retransmit state.

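A minimal sketch of source path state, under stated assumptions, may
make the mechanism concrete. The class name, method names, and the use
of strings for network-layer addresses are hypothetical. The per-hop
rewriting of the path address in forwarded SPMs follows the network
element procedures later in this document.

```python
class SourcePathTable:
    """Hypothetical per-element table: TSI -> upstream PGM element."""

    def __init__(self):
        self._upstream = {}  # tsi -> NLA of the upstream PGM network element

    def on_spm(self, tsi, spm_path, out_iface_nlas):
        """Record the upstream element named in a received SPM.

        Returns one (tsi, path) pair per outgoing interface, with the
        path address replaced by that interface's own address, so the
        next PGM element downstream learns this element as its
        upstream neighbour.
        """
        self._upstream[tsi] = spm_path
        return [(tsi, nla) for nla in out_iface_nlas]

    def nak_next_hop(self, tsi):
        """Where to unicast a NAK for this TSI, or None if no SPM seen."""
        return self._upstream.get(tsi)
```

The lookup deliberately ignores conventional unicast routing toward
the source: the next hop for a NAK is whatever the most recent SPM for
that TSI advertised.
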
SPMs also maintain transmit window state in receivers by advertising
the trailing and leading edges of the transmit window (SPM_TRAIL and
SPM_LEAD) and the leading edge of the increment window (SPM_INC).
When SPM_INC is greater than SPM_TRAIL, the SPM is advertising an
imminent advance of the transmit window across the increment window.
When such an advance is not imminent, SPM_INC and SPM_TRAIL have the
same value. In the absence of data, SPMs may be used to close the
transmit window in time by advancing the transmit window until all
three values SPM_TRAIL, SPM_INC, and SPM_LEAD are equal.

3.6. Packet Contents

This section just provides enough short-hand to make the Procedures
intelligible. For the full details of packet contents, please refer
to Packet Formats below.

3.6.1. Source Path Messages

3.6.1.1. SPMs

SPMs are transmitted by sources to establish source-path state in PGM
network elements, and to provide transmit-window state in receivers.

SPMs are multicast to the group and contain:

   SPM_TSI    the source-assigned TSI for the session to which the
              SPM corresponds

   SPM_SQN    a sequence number assigned sequentially by the source
              in unit increments and scoped by SPM_TSI

              NOTA BENE: this is an entirely separate sequence from
              that used to number ODATA and RDATA.

   SPM_TRAIL  the sequence number defining the trailing edge of the
              source's transmit window (TXW_TRAIL)

   SPM_INC    the sequence number defining the leading edge of the
              source's increment window (TXW_INC)

   SPM_LEAD   the sequence number defining the leading edge of the
              source's transmit window (TXW_LEAD)

   SPM_PATH   the network-layer address (NLA) of the interface on the
              PGM network element on which the SPM is forwarded

3.6.2. Data Packets

3.6.2.1. ODATA - Original Data

ODATA packets are transmitted by sources to send application data to
receivers.

ODATA packets are multicast to the group and contain:

   OD_TSI     the globally unique source-assigned TSI

   OD_TRAIL   the sequence number defining the trailing edge of the
              source's transmit window (TXW_TRAIL)

              OD_TRAIL makes the protocol more robust in the face of
              lost SPMs. By including the trailing edge of the
              transmit window on every data packet, receivers that
              have missed any SPMs that advanced the transmit window
              can still detect the case, recover the application, and
              potentially resynchronize to the transport session.

   OD_SQN     a sequence number assigned sequentially by the source
              in unit increments and scoped by OD_TSI

3.6.2.2. RDATA - Retransmitted Data

RDATA packets are retransmitted data packets transmitted by sources
or DLRs in response to NAKs.

RDATA packets are multicast to the group and contain:

   RD_TSI     OD_TSI of the ODATA packet of which this is a
              retransmission

   RD_TRAIL   the sequence number defining the trailing edge of the
              source's transmit window (TXW_TRAIL), not necessarily
              the same as OD_TRAIL of the ODATA packet of which this
              is a retransmission

   RD_SQN     OD_SQN of the ODATA packet of which this is a
              retransmission

3.6.3. Negative Acknowledgements

3.6.3.1. NAKs - Negative Acknowledgments

NAKs are transmitted by receivers to request retransmission of
missing data packets.

NAKs are unicast (PGM-hop-by-PGM-hop) to the source and contain:

   NAK_TSI    OD_TSI of the ODATA packet for which retransmission is
              requested

   NAK_SQN    OD_SQN of the ODATA packet for which retransmission is
              requested

   NAK_SRC    the unicast NLA of the original source of the missing
              ODATA.

   NAK_GRP    the multicast group NLA

3.6.3.2. NNAKs - Null Negative Acknowledgments

NNAKs are transmitted by either receivers or network elements that
are redirecting their NAKs to a DLR to provide flow-control feed-back
to a source.

NNAKs are unicast (PGM-hop-by-PGM-hop) to the source and contain:

   NNAK_TSI   NAK_TSI of the corresponding re-directed NAK.

   NNAK_SQN   NAK_SQN of the corresponding re-directed NAK.

   NNAK_SRC   NAK_SRC of the corresponding re-directed NAK.

   NNAK_GRP   NAK_GRP of the corresponding re-directed NAK.

3.6.4. Negative Acknowledgement Confirmations

3.6.4.1. NCFs - NAK confirmations

NCFs are transmitted by network elements and sources in response to
NAKs.

NCFs are multicast to the group and contain:

   NCF_TSI    NAK_TSI of the NAK being confirmed

   NCF_SQN    NAK_SQN of the NAK being confirmed

   NCF_SRC    NAK_SRC of the NAK being confirmed

   NCF_GRP    NAK_GRP of the NAK being confirmed

3.6.5. Option Encodings

   OPT_FRAGMENT - Fragmentation

   OPT_RANGE    - Sequence Number Range

   OPT_JOIN     - Late Joining

   OPT_TIME     - Time Stamp

   OPT_RXQ      - Reception Quality Report

   OPT_DROP     - Sequence Number Dropout

   OPT_REDIRECT - Redirect

   OPT_PARITY   - Forward Error Correction

4. Procedures - General

Since SPMs, NCFs, and RDATA must be treated conditionally by PGM
network elements, they must be distinguished from other packets in
the chosen multicast network protocol if PGM network elements are to
extract them from the usual switching path.

The most obvious way for network elements to achieve this is to
examine every packet in the network protocol for the PGM transport
protocol and packet types. However, the overhead of this approach is
costly for high-performance, multi-protocol network elements. An
alternative, and a requirement for PGM over IP multicast, is that
SPMs, NCFs, and RDATA must be transmitted with the IP Router Alert
Option [6]. This option gives network elements a network-layer
indication that a packet should be extracted from IP switching for
more detailed processing.

5. Procedures - Sources

5.1.
Data Transmission

Since PGM relies on a purely rate-limited transmission strategy in
the source to bound the bandwidth consumed by PGM transport sessions,
an assortment of techniques is assembled here to make that strategy
as conservative and robust as possible. These techniques are the
minimum required of a PGM source, and others may be added as
experience dictates.

5.1.1. Maximum Cumulative Transmit Rate

A source must number ODATA packets in the order in which they are
submitted for transmission by the application. A source must transmit
ODATA packets in sequence and only within the transmit window
beginning with TXW_TRAIL at no greater a rate than TXW_MAX_RTE. Note
that TXW_MAX_RTE is the maximum cumulative transmit rate of SPMs,
ODATA and RDATA. The reason for calculating TXW_MAX_RTE in this way
is so that retransmissions will act to back off the rate at which
ODATA is transmitted.

5.1.2. Transmit Rate Regulation

To regulate its transmit rate, a source must use a token bucket
scheme or any other traffic management scheme that yields equivalent
behaviour. A token bucket [7] is characterized by a continually
sustainable data rate (the token rate) and the extent to which the
data rate may exceed the token rate for short periods of time (the
token bucket size). Over any arbitrarily chosen interval, the number
of bytes the source may transmit cannot exceed the token bucket size
plus the product of the token rate and the chosen interval.

In addition, a source must bound the maximum rate at which successive
packets may be transmitted using a leaky bucket scheme drained at a
maximum transmit rate, or equivalent mechanism.

5.1.3.
TPDU Ordering

To preserve the logic of PGM's transmit window, a source must
implement strict priority queueing of pending SPMs, pending RDATA,
and pending ODATA from three separate queues in that order, or
implement any mechanism that results in equivalent behaviour.

5.1.4. Ambient SPMs

Interleaved with ODATA and RDATA, a source must transmit SPMs at a
rate at least sufficient to maintain current source path state in PGM
network elements. Note that source path state in network elements
does not track underlying changes in the distribution tree from a
source until an SPM traverses the altered distribution tree. The
consequence is that NAKs may go unconfirmed both at receivers and
amongst network elements while changes in the underlying distribution
tree take place.

5.1.5. Heartbeat SPMs

In the absence of data to transmit, a source should transmit SPMs at
a decaying rate in order to assist early detection of lost data, to
maintain current source path state in PGM network elements, and to
maintain current receive window state in the receivers.

In this scheme [8], a source maintains an inter-heartbeat timer
IHB_TMR which times the interval between the most recent packet
(ODATA, RDATA, or SPM) transmission and the next heartbeat
transmission. IHB_TMR is initialized to a minimum interval IHB_MIN
after the transmission of any data packet. If IHB_TMR expires, the
source transmits a heartbeat SPM and initializes IHB_TMR to double
its previous value. The transmission of consecutive heartbeat SPMs
doubles IHB_TMR each time up to a maximum interval IHB_MAX. The
transmission of any data packet initializes IHB_TMR to IHB_MIN once
again. The effect is to provoke prompt detection of missing packets
in the absence of data to transmit, and to do so with minimal
bandwidth overhead.

5.2.
Negative Acknowledgement Confirmation

A source must immediately multicast an NCF in response to any NAK it
receives. The NCF is required since the alternative of responding
immediately with RDATA would not allow other PGM network elements on
the same subnet to do NAK anticipation, nor would it allow DLRs on
the same subnet to provide retransmissions. The generation of NCFs
should be rate-limited to protect against a denial of service in the
presence of a NAK storm.

5.3. Data Retransmission

A source must then multicast RDATA (while respecting TXW_MAX_RTE) in
response to any NAK it receives for data packets within the transmit
window. A source should transmit RDATA at priority over concurrent
ODATA. The effect of this priority is to back off the transmission of
ODATA in favour of RDATA.

5.4. Transmit Window Advance

5.4.1. Advancing across the Increment Window

A source must initiate SPM repetition in anticipation of advancing
the trailing edge of the transmit window from TXW_TRAIL to TXW_INC.
SPMs advise receivers that the range of sequence numbers between
SPM_TRAIL (TXW_TRAIL) and SPM_INC (TXW_INC) is about to be expired
from the transmit window (i.e., the range of sequence numbers that
are about to occupy the increment window). So if SPM repetition is
initiated SPM_RPT_IVL ahead of the expiry of the increment window,
the SPMs must advertise the range of sequence numbers that will
expire in SPM_RPT_IVL. SPM_RPT_IVL may be in the range (0,
TXW_ADV_SECS). SPM_RPT_IVL should be at least as large as the worst
case round trip delay to any receiver a source is required to reach.
SPM_RPT_RTE should be at least high enough to result in the
transmission of at least two SPMs within SPM_RPT_IVL.

A source may simultaneously continue ODATA and RDATA transmission,
TXW_MAX_RTE permitting.

A source must repeat SPMs at a rate of SPM_RPT_RTE for an interval of
at least SPM_RPT_IVL. Timer SPM_RPT_IVL_TMR is set to SPM_RPT_IVL
upon transmission of the first SPM of SPM repetition.

While SPM_RPT_IVL_TMR is running, a source should transmit RDATA
within the increment window at priority over both concurrent ODATA
and other RDATA outside of the increment window. The effect of this
priority is to back off the transmission of ODATA and other RDATA in
favour of retransmissions of data packets about to be retired from
the transmit window.

Once the transmit window is advanced across the increment window,
SPM_TRAIL and SPM_INC are both set to the new value of TXW_TRAIL
until the next window advancement.

5.4.2. Advancing with Data

There are two modes of operation for transmit window advancement. In
the first, TXW_MAX_RTE is calculated from both ODATA and RDATA, and
NAKs reset SPM_RPT_IVL_TMR.

While SPM_RPT_IVL_TMR is running, a source uses the receipt of a NAK
for ODATA within the increment window to reset timer SPM_RPT_IVL_TMR
to SPM_RPT_IVL so that transmit window advancement is delayed until
no NAKs for data in the increment window are seen for SPM_RPT_IVL
seconds. If the transmit window should fill in the meantime, further
transmissions would be suspended until the transmit window can be
advanced.

A source must advance the transmit window across the increment window
only upon expiry of SPM_RPT_IVL_TMR. This mode of operation is
intended for non-real-time, messaging applications based on the
receipt of complete data at the expense of delay.

5.4.3.
Advancing with Time

Alternatively, TXW_MAX_RTE may be calculated from ODATA only to
maintain a constant data rate by consuming extra bandwidth for
retransmissions, and SPM_RPT_IVL_TMR may be run down in real time,
advancing the transmit window without regard for whether NAKs for
data in the increment window are still being received. This mode of
operation is intended for real-time, streaming applications based on
the receipt of timely data at the expense of completeness.

6. Procedures - Receivers

6.1. Data Reception

Initial data reception

A receiver may initiate data reception beginning only with the first
OD_SQN it receives within the advertised transmit window. This
sequence number temporarily defines the trailing edge of the transmit
window from the receiver's perspective. That is, it is assigned to
RXW_TRAIL_INIT within the receiver, and until the trailing edge
sequence number advertised in subsequent packets (SPMs or ODATA or
RDATA) increments through RXW_TRAIL_INIT, the receiver must only
request retransmissions for sequence numbers subsequent to
RXW_TRAIL_INIT. Thereafter, it may request retransmissions anywhere
in the transmit window. This temporary restriction on retransmission
requests prevents receivers from requesting a potentially large
amount of history when they first begin to receive a given PGM
transport session.

Receiving and discarding data packets

Within a given transport session, a receiver must receive any ODATA
or RDATA packets within the receive window. A receiver must discard
any data packet that duplicates one already received in the transmit
window. A receiver must discard any data packet outside of the
receive window.

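The receive-or-discard decision can be sketched as below. This is an
illustration under assumptions, not a definitive implementation: the
function names and return strings are hypothetical, and a packet is
treated as inside the window when it lies fewer than (2**31 - 1)
sequence numbers ahead of RXW_TRAIL, which is the upper bound this
document places on the window size in sequence numbers. Modular
subtraction handles wrap-around in the circular sequence space.

```python
SQN_SPACE = 2**32
MAX_WINDOW = SQN_SPACE // 2 - 1  # "half the sequence number space less one"


def offset(sqn, trail):
    """Distance of sqn ahead of trail in the circular sequence space."""
    return (sqn - trail) % SQN_SPACE


def classify(sqn, rxw_trail, received):
    """Decide what to do with one ODATA or RDATA packet.

    `received` is the set of sequence numbers already held in the
    window; it is updated when the packet is accepted.
    """
    if offset(sqn, rxw_trail) >= MAX_WINDOW:
        return "discard: outside window"
    if sqn in received:
        return "discard: duplicate"
    received.add(sqn)
    return "accept"
```

For example, with RXW_TRAIL two below the top of the sequence space,
sequence number 1 is three packets ahead of the trailing edge and is
accepted, while the sequence number just behind RXW_TRAIL is
discarded as outside the window.
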
Contiguous data

Contiguous data consists of those data packets within the receive
window that have been received and are in the range from RXW_TRAIL up
to (but not including) the first missing sequence number in the
receive window. The most recently received data packet of contiguous
data defines the leading edge of contiguous data.

A receiver must deliver only contiguous data packets to the
application, and it must do so in the order defined by those data
packets' sequence numbers.

A receiver may maintain full copies of any packet in the receive
window for possible retransmission even after having delivered that
data to the application.

6.2. Source Path Messages

Receivers must receive and sequence SPMs for any TSI they are
receiving. For each TSI, receivers must use the most recent SPM to
determine the NLA of the upstream PGM network element for use in NAK
addressing. Note that a receiver cannot initiate retransmit requests
until it has received at least one SPM for the corresponding TSI.

SPMs in which SPM_INC is greater than SPM_TRAIL advertise an
impending transmit window advance, and receivers should expedite
retransmission requests for missing data packets in the expiring
fraction of the window.

6.3. Negative Acknowledgment

Detecting missing data packets

Receivers must detect gaps in the expected data sequence by comparing
the sequence number on the most recently received ODATA or RDATA
packet with the leading edge of contiguous data. If the receiver has
not received all intervening data packets, it must initiate selective
NAK generation for each intervening missing sequence number.
Receivers should temper the initiation of NAK generation to account
for simple mis-ordering introduced by the network.

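Contiguous data and gap detection can be sketched together. The
function names are hypothetical, and the sketch scans from RXW_TRAIL
rather than from the leading edge of contiguous data, which yields the
same list of missing sequence numbers. The tempering of NAK initiation
for mis-ordered packets is not modelled.

```python
SQN_SPACE = 2**32


def contiguous_lead(rxw_trail, received):
    """Last sequence number of contiguous data, or None if even
    RXW_TRAIL itself has not been received."""
    lead = None
    sqn = rxw_trail
    while sqn in received:
        lead = sqn
        sqn = (sqn + 1) % SQN_SPACE  # wrap around the circular space
    return lead


def missing_between(rxw_trail, latest_sqn, received):
    """Sequence numbers from rxw_trail through latest_sqn that have
    not been received; each would start selective NAK generation."""
    count = (latest_sqn - rxw_trail) % SQN_SPACE + 1
    candidates = ((rxw_trail + i) % SQN_SPACE for i in range(count))
    return [s for s in candidates if s not in received]
```

With packets 10, 11, 13, and 15 received and RXW_TRAIL at 10, only
10 and 11 may be delivered to the application, and 12 and 14 are the
sequence numbers to be NAKed.
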
Receivers must also detect gaps in the expected data sequence by
comparing SPM_LEAD of the most recently received SPM with the leading
edge of contiguous data. If the receiver has not received all
intervening data packets, it must initiate selective NAK generation
for each missing sequence number.

Generating NAKs

NAK generation requires that a receiver listen to NCFs for the same
transport session.

NAK generation also requires that a receiver observe four time out
intervals for any given NAK (i.e., per NAK_TSI and NAK_SQN).

The first time out interval, the NAK random back-off interval
NAK_RB_IVL, randomly delays the transmission of a given NAK from a
receiver. NAK_RB_IVL is counted down from the time a missing data
packet is detected. Expiry of NAK_RB_IVL causes transmission of the
NAK.

The second time out interval, the NAK repeat interval NAK_RPT_IVL,
limits the length of time for which a receiver will repeat a NAK
while waiting for a corresponding NCF. NAK_RPT_IVL is counted down
from the transmission of a NAK. Expiry of NAK_RPT_IVL cancels NAK
generation and indicates unrecoverable data loss (due to missing
NCF).

The third time out interval, the NAK RDATA interval NAK_RDATA_IVL,
limits the length of time for which a receiver will wait for the
RDATA corresponding to a confirmed NAK. NAK_RDATA_IVL is counted down
from the time a matching NCF is received. Expiry of NAK_RDATA_IVL
causes the receiver to select a new value of NAK_RB_IVL, and start
again.

The fourth time out interval, the NAK generation interval
NAK_GEN_IVL, limits the length of time for which a receiver will
retry a NAK while waiting for the corresponding RDATA. NAK_GEN_IVL is
counted down from the time a missing data packet is detected.
Expiry of NAK_GEN_IVL cancels NAK generation and indicates
unrecoverable data loss (due to missing RDATA).

NAK generation follows the detection of a missing data packet and is
the cycle of waiting for NAK_RB_IVL, listening for matching NCFs,
transmitting a NAK if a matching NCF is not heard, waiting
NAK_RDATA_IVL, and recommencing NAK generation if the matching data
is not received. During NAK_RB_IVL, a NAK is said to be pending.
During NAK_RDATA_IVL, a NAK is said to be outstanding.

Suspending NAK generation

Suspending NAK generation just means waiting for either NAK_RB_IVL or
NAK_RDATA_IVL to pass.

A receiver must suspend NAK generation if a duplicate of the NAK is
already pending from this receiver. A NAK is pending from this
receiver if NAK_RB_IVL for this NAK has been initiated in this
receiver but has not yet passed.

A receiver must suspend NAK generation if a duplicate of the NAK is
already outstanding from this or another receiver. A NAK is
outstanding from this or another receiver if NAK_RDATA_IVL for this
NAK has been initiated in this receiver but has not yet passed.

Backing off NAK transmission

Before transmitting a NAK, a receiver must wait some interval
NAK_RB_IVL chosen randomly and uniformly over NAK_BO_IVL during which
it listens for a matching NCF that may be transmitted in response to
the same NAK from another receiver.

NAK suppression

A receiver must suspend NAK generation and wait at least
NAK_RDATA_IVL before recommencing NAK generation if it hears a
matching NCF during NAK_RB_IVL. A matching NCF must match NCF_TSI
with NAK_TSI, and NCF_SQN with NAK_SQN.

Transmitting a NAK

Upon expiry of NAK_RB_IVL, a receiver must transmit a NAK to the
upstream PGM network element for the TSI specifying the transport
session identifier and missing sequence number.
It must repeat the NAK at a rate of NAK_RPT_RTE for an interval of
NAK_RPT_IVL until it receives a matching NCF. It must then wait
NAK_RDATA_IVL before recommencing NAK generation. If it hears a
matching NCF during NAK_RDATA_IVL, it must wait anew for
NAK_RDATA_IVL before recommencing NAK generation (i.e., NCFs restart
NAK_RDATA_IVL).

Receivers should transmit NAKs for data packets in the increment
window at priority over NAKs for data packets in the remainder of the
receive window.

Completion of NAK generation

NAK generation is complete only upon the reception of the matching
RDATA (or even ODATA) packet at any time during NAK generation.

Cancellation of NAK generation

NAK generation is canceled upon the advancing of the receive window
so as to exclude the matching sequence number of a pending or
outstanding NAK, or the expiry of NAK_GEN_IVL. Cancellation of NAK
generation indicates unrecoverable data loss.

Addressing NAKs

A receiver (unicast) addresses a NAK to the upstream PGM network
element for the TSI. It also records both the address of the source
of the corresponding ODATA and the address of the group in the NAK
header.

Receiving NCFs

A receiver must discard any NCFs it hears for data packets outside
the receive window.

If a receiver hears an NCF for a data packet in the receive window
for which it has no retransmit state, it should discard the NCF only
if it has already received the matching data packet. If it has not
already received the matching data packet, it should wait
NAK_RDATA_IVL and then commence NAK generation itself, beginning with
the random back off procedure.

6.4. Local Retransmission

Detecting retransmit requests

Receivers may detect retransmit requests from other receivers by
comparing the sequence number on any NCF received for any data packet
in the receive window.
If the receiver has received the corresponding data packet, it may
initiate RDATA generation for that packet.

Generating RDATA

RDATA generation requires that a receiver listen to NCFs and RDATA for
the same transport session.

RDATA generation also requires that a receiver observe a time out
interval for any given RDATA packet (i.e., per RDATA_TSI and RDATA_SQN).

The RDATA random back-off interval RDATA_RB_IVL randomly delays the
transmission of a given RDATA packet from a receiver. RDATA_RB_IVL is
counted down from the time the retransmit request is detected. Expiry
of RDATA_RB_IVL causes transmission of the RDATA packet.

During RDATA_RB_IVL, an RDATA packet is said to be pending.

Cancellation of RDATA generation

A receiver must cancel RDATA generation if a duplicate of the RDATA
packet is already pending from this receiver. An RDATA packet is
pending from this receiver if RDATA_RB_IVL for this RDATA packet has
been initiated in this receiver but has not yet passed.

RDATA generation is canceled upon the advancing of the receive window so
as to exclude the matching sequence number of a pending RDATA.

Backing off RDATA transmission

Before transmitting an RDATA packet, a receiver must wait some interval
RDATA_RB_IVL chosen randomly and uniformly over RDATA_BO_IVL during
which it listens for a matching RDATA packet that may be transmitted
from another receiver in response to the same NCF.

RDATA suppression

A receiver must cancel RDATA generation if it hears a matching RDATA
packet during RDATA_RB_IVL. A matching RDATA packet must match
RDATA_TSI and RDATA_SQN.

Transmitting an RDATA packet

Upon expiry of RDATA_RB_IVL, a receiver may multicast the RDATA packet
to the group. The RDATA packet, other than its type (and therefore its
checksum), must be an exact duplicate of the corresponding ODATA packet.

7. Procedures - Network Elements

7.1. Source Path State

Upon receipt of an SPM, a network element records the Source Path
Address SPM_PATH with the multicast routing information for the TSI. If
the receiving network element is on the same subnet as the forwarding
network element, this address will be the same as the address of the
immediately upstream network element on the distribution tree for the
TSI. If, however, non-PGM network elements intervene between the
forwarding and the receiving network elements, this address will be the
address of the first PGM network element across the intervening network
elements.

The network element then forwards the SPM on each outgoing interface for
that TSI. As it does so, it encodes the network address of the outgoing
interface in SPM_PATH in each copy of the SPM it forwards.

7.2. NAK Confirmation

Network elements must immediately transmit an NCF in response to any NAK
they receive. The NCF must be multicast to the group on the interface
on which the NAK was received.

      NOTA BENE: In order to avoid creating multicast routing state
      for PGM network elements across non-PGM-capable clouds, NCFs
      transmitted by network elements must bear the ODATA source's
      NLA, not the network element's NLA as might be expected.

The generation of NCFs should be rate-limited to protect against a
denial of service in the presence of a NAK storm.

Simultaneously, network elements must establish retransmit state for the
NAK if such state does not already exist, and add the interface on which
the NAK was received to the corresponding retransmit interface list if
the interface is not already listed.

7.3. Constrained NAK Forwarding

The NAK forwarding procedures for network elements are quite similar to
those for receivers, but three important differences should be noted.
First, network elements do NOT back off before forwarding a NAK (i.e.,
there is no NAK_BO_IVL) since the resulting delay of the NAK would
compound with each hop. Instead, NAK anticipation and elimination act
to prevent NAK storms from network elements.

Second, network elements do NOT retry confirmed NAKs (i.e., there is no
NAK_GEN_IVL) if RDATA is not seen; they simply discard the retransmit
state and rely on receivers to re-request the retransmission. This
approach keeps the retransmit state in the network elements relatively
ephemeral and responsive to underlying routing changes.

Third, note that ODATA does NOT cancel NAK forwarding in network
elements since it is switched by network elements without
transport-layer intervention.

NAK forwarding requires that a network element listen to NCFs for the
same transport session. NAK forwarding also requires that a network
element observe two time out intervals for any given NAK (i.e., per
NAK_TSI and NAK_SQN).

The first, the NAK repeat interval NAK_RPT_IVL, limits the length of
time for which a network element will repeat a NAK while waiting for a
corresponding NCF. NAK_RPT_IVL is counted down from the transmission of
a NAK. Expiry of NAK_RPT_IVL cancels NAK forwarding (due to missing
NCF).

The second, the NAK RDATA interval NAK_RDATA_IVL, limits the length of
time for which a network element will wait for the corresponding RDATA.
NAK_RDATA_IVL is counted down from the time a matching NCF is received.
Expiry of NAK_RDATA_IVL causes the network element to discard the
corresponding retransmit state (due to missing RDATA).

During NAK_RPT_IVL, a NAK is said to be pending. During NAK_RDATA_IVL,
a NAK is said to be outstanding.

A network element must forward NAKs only to the upstream PGM network
element for the TSI.
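
As an illustration only, the two intervals and the pending/outstanding
distinction described above can be sketched as a small per-(NAK_TSI,
NAK_SQN) state record. The class and method names, the use of seconds
as the time unit, and the example interval values are assumptions made
for this sketch; this specification does not define them. The sketch
models only the PENDING to OUTSTANDING transition and the two expiries,
and leaves out NAK repetition, RDATA handling, and interface lists.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NakPhase(Enum):
    PENDING = auto()      # NAK forwarded upstream, NAK_RPT_IVL running
    OUTSTANDING = auto()  # matching NCF seen, NAK_RDATA_IVL running
    DISCARDED = auto()    # an interval expired; state is dropped


@dataclass
class ForwardedNak:
    """Hypothetical per-(NAK_TSI, NAK_SQN) state in a network element."""
    tsi: bytes
    sqn: int
    nak_rpt_ivl: float    # seconds (assumed unit and value)
    nak_rdata_ivl: float  # seconds (assumed unit and value)
    phase: NakPhase = NakPhase.PENDING
    deadline: float = 0.0

    def on_nak_forwarded(self, now: float) -> None:
        # NAK_RPT_IVL is counted down from the transmission of the NAK.
        self.phase = NakPhase.PENDING
        self.deadline = now + self.nak_rpt_ivl

    def on_ncf(self, now: float) -> None:
        # NAK_RDATA_IVL is counted down from receipt of a matching NCF.
        if self.phase is NakPhase.PENDING:
            self.phase = NakPhase.OUTSTANDING
            self.deadline = now + self.nak_rdata_ivl

    def on_tick(self, now: float) -> None:
        # Expiry of NAK_RPT_IVL (missing NCF) or NAK_RDATA_IVL (missing
        # RDATA) ends forwarding and discards the state.
        if self.phase is not NakPhase.DISCARDED and now >= self.deadline:
            self.phase = NakPhase.DISCARDED


if __name__ == "__main__":
    nak = ForwardedNak(tsi=b"\x00" * 8, sqn=42,
                       nak_rpt_ivl=2.0, nak_rdata_ivl=5.0)
    nak.on_nak_forwarded(now=0.0)
    nak.on_ncf(now=1.0)
    nak.on_tick(now=6.0)
    print(nak.phase)
```

Keeping a single deadline per record reflects that a NAK is either
pending or outstanding at any moment, never both.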
A network element must repeat a NAK at a rate of NAK_RPT_RTE for an
interval of NAK_RPT_IVL until it receives a matching NCF. A matching
NCF must match NCF_TSI with NAK_TSI, and NCF_SQN with NAK_SQN.

Upon reception of the corresponding NCF, network elements must wait at
least NAK_RDATA_IVL for the corresponding RDATA. Receipt of the
corresponding RDATA at any time during NAK forwarding cancels NAK
forwarding and tears down the corresponding retransmit state in the
network element.

7.4. NAK elimination

Two NAKs duplicate each other if they bear the same NAK_TSI and NAK_SQN.
Network elements must discard all duplicates of a NAK that is pending.

Once a NAK is outstanding, network elements must discard all duplicates
of that NAK for NAK_ELIM_IVL. Upon expiry of NAK_ELIM_IVL, network
elements must suspend NAK elimination for that TSI/SQN until the first
duplicate of that NAK is seen after the expiry of NAK_ELIM_IVL. This
duplicate must be forwarded in the usual manner. Once this duplicate
NAK is outstanding, network elements must once again discard all
duplicates of that NAK for NAK_ELIM_IVL, and so on. NAK_RDATA_IVL must
be reset each time a NAK for the corresponding TSI/SQN is forwarded
(i.e., each time NAK_ELIM_IVL is reset). NAK_ELIM_IVL must be some
small fraction of NAK_RDATA_IVL.

NAK_ELIM_IVL acts to balance implosion prevention against retransmit
state liveness. That is, it results in the elimination of all but at
most one NAK per NAK_ELIM_IVL thereby allowing repeated NAKs to keep the
retransmit state alive in the PGM network elements.

7.5. NAK Anticipation

An unsolicited NCF is one that is received by a network element when the
network element has no corresponding pending or outstanding NAK.
Network elements must process unsolicited NCFs differently depending on
the interface on which they are received.
If the interface on which an NCF is received is the same interface the
network element would use to reach the upstream PGM network element, the
network element simply establishes retransmit state for NCF_TSI and
NCF_SQN without adding the interface to the retransmit interface list,
and discards the NCF. If the retransmit state already exists, the
network element just discards the NCF.

If the interface on which an NCF is received is not the same interface
the network element would use to reach the upstream PGM network element,
the network element does not establish retransmit state and just
discards the NCF.

Anticipated NAKs permit the elimination of any subsequent matching NAKs
from downstream. Upon establishing anticipated retransmit state,
network elements must eliminate subsequent NAKs only for a period of
NAK_ELIM_IVL. Upon expiry of NAK_ELIM_IVL, network elements must
suspend NAK elimination for that TSI/SQN until the first duplicate of
that NAK is seen after the expiry of NAK_ELIM_IVL. This duplicate must
be forwarded in the usual manner. Once this duplicate NAK is
outstanding, network elements must once again discard all duplicates of
that NAK for NAK_ELIM_IVL, and so on. NAK_RDATA_IVL must be reset each
time a NAK for the corresponding TSI/SQN is forwarded (i.e., each time
NAK_ELIM_IVL is reset). NAK_ELIM_IVL must be some small fraction of
NAK_RDATA_IVL.

7.6. NAK Shedding

Network elements may implement local procedures for withholding NAK
confirmations for receivers detected to be reporting excessive loss.
The result of these procedures would ultimately be unrecoverable data
loss in the receiver.

7.7. Addressing NAKs

A PGM network element uses the *contained* source and group addresses to
find the source/group multicast routing information, looks up the
corresponding upstream PGM network element's address, uses it to
re-address the (unicast) NAK, and unicasts it on the upstream interface
for the distribution tree for the TSI.

7.8. Constrained RDATA Forwarding

Network elements must maintain retransmit state for each interface on
which a given NAK is received at least once. Network elements must then
use this list of interfaces to constrain the forwarding of the
corresponding RDATA packet only to those interfaces in the list. An
RDATA packet corresponds to a NAK if it matches NAK_TSI and NAK_SQN.

Network elements must maintain this retransmit state only until either
the corresponding RDATA is received and forwarded, or NAK_RDATA_IVL
passes after forwarding the most recent instance of a given NAK.
Thereafter, the corresponding retransmit state must be discarded.

Network elements should discard and not forward RDATA packets for which
they have no retransmit state. Note that the consequence of this
procedure is that, while it constrains retransmissions to the interested
sub-set of the network, loss of retransmit state precipitates further
NAKs from neglected receivers.

8. Packet Formats

All of the packet formats described in this section are transport-layer
headers that must immediately follow the network-layer header in the
packet. Only data packet headers (ODATA and RDATA) may be followed in
the packet by application data. For each packet type, the source and
destination network-layer addresses (NLAs) are specified in addition to
the format and contents of the transport layer header. Recall from
General Procedures that, for PGM over IP multicast, SPMs, NCFs, and
RDATA must also bear the IP Router Alert Option.
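
As a concrete illustration of the Router Alert requirement recalled
above, the following sketch builds the four-octet IPv4 Router Alert
option. The octet values (option type 148, length 4, value 0) are taken
from RFC 2113, which is an outside reference assumed here and not one
cited in this section; the function and constant names are hypothetical.
How the option is attached to an outgoing packet is platform-specific
and is not shown.

```python
import struct

ROUTER_ALERT_TYPE = 0x94     # copied flag 1, class 0, option number 20
ROUTER_ALERT_LENGTH = 4      # total option length in octets
ROUTER_ALERT_EXAMINE = 0     # value 0: router shall examine packet


def router_alert_option() -> bytes:
    """Return the IPv4 Router Alert option as it appears on the wire."""
    return struct.pack("!BBH", ROUTER_ALERT_TYPE, ROUTER_ALERT_LENGTH,
                       ROUTER_ALERT_EXAMINE)


if __name__ == "__main__":
    print(router_alert_option().hex())
```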
For PGM over IP, the IP protocol number is 113.

In all packets the descriptions of Source Port, Destination Port,
Options, Checksum, Global Source ID (GSI), and TPDU Length are:

   Source Port:

      A random port number generated by the source. This port number
      must be unique within the source. Source Port together with
      Global Source ID forms the TSI.

   Destination Port:

      A globally well-known port number assigned to the given PGM
      application.

   Options:

      This field encodes binary indications of the presence and
      significance of any options.

      bit 0 set => One or more Option Extensions are present

      bit 1 set => One or more Options are network-significant

         Note that this bit is clear when OPT_FRAGMENT and/or OPT_JOIN
         are the only options present.

      bit 7 set => Packet is a parity packet (OPT_PARITY)

      All option extensions are encoded in extensions to the PGM header.

   Checksum:

      This field is the usual 1's complement of the 1's complement sum
      of the entire PGM packet including header.

      The checksum does not include a network-layer pseudo header for
      compatibility with network address translation. If the computed
      checksum is zero, it is transmitted as all ones. A value of zero
      in this field means the transmitter generated no checksum.

      Note that if any entity between a source and a receiver modifies
      the PGM header for any reason (such as editing the Previous
      Sequence Number field of OPT_DROP), it must either recompute the
      checksum or clear it. The checksum is mandatory on data packets
      (ODATA and RDATA) that do NOT also have OPT_DROP.

   Global Source ID:

      A globally unique source identifier. This ID must not change
      throughout the duration of the transport session. A recommended
      identifier is the low-order 48 bits of the MD5 [9] signature of
      the DNS name of the source.
      Global Source ID together with Source Port forms the TSI.

   TPDU Length:

      The length in octets of the PGM packet including the size of the
      header and any options.

The high-order two bits of the Type field encode a version number, 0x0
in this instance. The low-order nibble of the Type field encodes the
specific packet type. The intervening two bits (the low-order two bits
of the high-order nibble) are reserved and must be zero.

Within the low-order nibble of the Type field:

   values in the range 0x0 through 0x3 represent SPM-like packets (i.e.,
   session-specific, sourced by a source, periodic),

   values in the range 0x4 through 0x7 represent DATA-like packets
   (i.e., data and retransmissions thereof),

   values in the range 0x8 through 0xB represent NAK-like packets (i.e.,
   hop-by-hop reliable NAK forwarding procedures),

   and values in the range 0xC through 0xF represent RSN-like packets
   (i.e., session-specific, sourced by a receiver, asynchronous).

Address Family Indicators (AFIs) are as specified in [10].

8.1. Source Path Messages

SPMs are sent by a source to establish source path state in network
elements and to provide transmit window state to receivers.

The source NLA of an SPM is the unicast NLA of the entity that
originates the SPM.

The destination NLA of an SPM is a multicast group NLA.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Source Port           |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |    Options    |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Global Source ID                   ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...    Global Source ID       |           TPDU Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SPM's Sequence Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Trailing Edge Sequence Number                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Increment Sequence Number                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Leading Edge Sequence Number                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            NLA AFI            |           reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Path NLA                     ...   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
|                Option Extensions when present ...             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Source Port:

      SPM_SPORT

      Together with SPM_GSI forms SPM_TSI

   Destination Port:

      SPM_DPORT

   Type:

      SPM_TYPE = 0x00

   Global Source ID:

      SPM_GSI

      Together with SPM_SPORT forms SPM_TSI

   SPM's Sequence Number:

      SPM_SQN

      The sequence number assigned to the SPM by the source.

   Trailing Edge Sequence Number:

      SPM_TRAIL

      The sequence number defining the current trailing edge of the
      source's transmit window (TXW_TRAIL).

   Increment Sequence Number:

      SPM_INC

      The sequence number defining the current leading edge of the
      source's increment window (TXW_INC).

   Leading Edge Sequence Number:

      SPM_LEAD

      The sequence number defining the current leading edge of the
      source's transmit window (TXW_LEAD).

   Path NLA:

      SPM_PATH

      The NLA of the interface on the network element on which this SPM
      was forwarded. Initialized by a source to the source's NLA,
      rewritten by each PGM network element upon forwarding.

   Option Extensions:

      SPMs may bear OPT_JOIN.

8.2. Data Packets

Data packets carry application data from a source or a retransmitter to
receivers.

   ODATA:

      Original data packets transmitted by a source.

   RDATA:

      Retransmissions transmitted by a source or by a designated local
      retransmitter (DLR) in response to a NAK, or by a receiver in
      response to an NCF.

The source NLA of a data packet is the unicast NLA of the entity that
originates the data packet.

The destination NLA of a data packet is a multicast group NLA.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Source Port           |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |    Options    |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Global Source ID                   ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...    Global Source ID       |           TPDU Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Trailing Edge Sequence Number                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Data Packet Sequence Number                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Option Extensions when present ...             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+- ...

   Source Port:

      OD_SPORT, RD_SPORT

      Together with Global Source ID forms:

      OD_TSI, RD_TSI

   Destination Port:

      OD_DPORT, RD_DPORT

   Type:

      OD_TYPE = 0x04
      RD_TYPE = 0x05

   Global Source ID:

      OD_GSI, RD_GSI

      Together with Source Port forms:

      OD_TSI, RD_TSI

   Trailing Edge Sequence Number:

      OD_TRAIL, RD_TRAIL

      The sequence number defining the current trailing edge of the
      source's transmit window (TXW_TRAIL).
      In RDATA, this may not be the same as OD_TRAIL of the ODATA
      packet of which it is a retransmission.

   Data Packet Sequence Number:

      OD_SQN, RD_SQN

      The sequence number originally assigned to the ODATA packet by the
      source.

   Option Extensions:

      Data packets may bear OPT_FRAGMENT or OPT_DROP (not both)

   Data:

      Application data.

8.3. Negative Acknowledgements and Confirmations

   NAK:

      Negative Acknowledgements are sent by receivers to request the
      retransmission of an ODATA packet detected to be missing from the
      expected sequence.

   N-NAK:

      Null Negative Acknowledgements are sent by DLRs to provide flow
      control feedback to the source of ODATA for which the DLR has
      provided the corresponding RDATA.

The source NLA of a NAK is the unicast NLA of the entity that originates
the NAK.

The destination NLA of a NAK is initialized by the originator of the NAK
(a receiver) to the unicast NLA of the upstream PGM network element
known from SPMs. The destination NLA of a NAK is rewritten by each PGM
network element with the unicast NLA of the upstream PGM network element
to which this NAK is forwarded. On the final hop, the destination NLA
of a NAK is rewritten by the PGM network element with the unicast NLA of
the original source or the unicast NLA of a DLR.

   NCF:

      NAK Confirmations are sent by network elements and sources to
      confirm the receipt of a NAK.

The source NLA of an NCF is the unicast NLA of the entity that
originates the NCF.

The destination NLA of an NCF is a multicast group NLA.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Source Port           |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |    Options    |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Global Source ID                   ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...    Global Source ID       |           TPDU Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Requested Sequence Number                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            NLA AFI            |           reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Source NLA                    ...   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
|            NLA AFI            |           reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Multicast Group NLA                ...   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
|                Option Extensions when present ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ...

   Source Port:

      NAK_SPORT, NNAK_SPORT, NCF_SPORT

      Together with Global Source ID forms:

      NAK_TSI, NNAK_TSI, NCF_TSI

   Destination Port:

      NAK_DPORT, NNAK_DPORT, NCF_DPORT

   Type:

      NAK_TYPE = 0x08
      NNAK_TYPE = 0x09
      NCF_TYPE = 0x0A

   Global Source ID:

      NAK_GSI, NNAK_GSI, NCF_GSI

      Together with Source Port forms:

      NAK_TSI, NNAK_TSI, NCF_TSI

   Requested Sequence Number:

      NAK_SQN, NNAK_SQN

      NAK_SQN is the sequence number of the ODATA packet for which
      retransmission is requested.

      NNAK_SQN is the sequence number of the RDATA packet for which
      retransmission has been provided by a DLR.

      NCF_SQN

      NCF_SQN is NAK_SQN from the NAK being confirmed.

   Source NLA:

      NAK_SRC, NNAK_SRC, NCF_SRC

      The unicast NLA of the original source of the missing ODATA.

   Multicast Group NLA:

      NAK_GRP, NNAK_GRP, NCF_GRP

      The multicast group NLA.

   Option Extensions:

      NAKs may bear OPT_RANGE and/or OPT_TIME
      NCFs may bear OPT_RANGE and/or OPT_REDIRECT

9. Options

PGM specifies several end-to-end options to address specific application
requirements. PGM specifies options to support fragmentation, sequence
number ranges, late joining, time-stamping, reception quality reports,
sequence number dropout, and redirection.

Options may be appended to PGM packet headers only by their original
transmitters. While they may be interpreted by network elements,
options are neither added nor removed by network elements.

9.1. Option extension length - OPT_LENGTH

When option extensions are appended to the standard PGM header, the
extensions must be preceded by an option extension length field
specifying the total length of all option extensions.

In addition, the PGM packet length must be incremented by the total
length of all options, and the presence of the options must be encoded
in the Options field of the standard PGM header before the Checksum is
computed.

All network-significant options must be appended before any exclusively
receiver-significant options.

To provide an indication of the end of option extensions, OPT_END (0x80)
must be set in the Option Type field of the trailing option extension.

9.1.1. OPT_LENGTH - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |  Total length of all options  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type = 0x00

   Option Length = 4 octets

   Total length of all options

      The total length in octets of all option extensions including
      OPT_LENGTH.

9.2. Fragmentation Option - OPT_FRAGMENT

Fragmentation allows transport-layer entities at a source to break up
application protocol data units (APDUs) into multiple PGM data packets
(TPDUs) to conform with the MTU supported by the network layer. The
fragmentation option may be applied to ODATA and RDATA packets only.

      This option is incompatible with the sequence number dropout
      option since dropout is based upon application-layer
      information available only at the beginning of the APDU.
      Trailing fragments of such packets would not carry the
      information needed to apply the dropout algorithm and so would
      pass through filters designed to discard the APDU as a whole.

Architecturally, the accumulation of TPDUs into APDUs is applied to
TPDUs that have already been received, duplicate eliminated, and
contiguously sequenced by the receiver. Thus APDUs may be reassembled
across increments of the transmit window.

9.2.1. OPT_FRAGMENT - Packet Extension Contents

   OPT_FRAG_OFF   the offset of the fragment from the beginning of the
                  APDU

   OPT_FRAG_LEN   the total length of the original APDU

9.2.2. OPT_FRAGMENT - Procedures - Sources

A source fragments APDUs into a contiguous series of fragments no larger
than the MTU supported by the network layer. A source sequentially and
uniquely assigns OD_SQNs to these fragments in the order in which they
occur in the APDU.
A source then sets OPT_FRAG_OFF to the value of the offset of the
fragment in the original APDU (where the first byte of the APDU is at
offset 0, and OPT_FRAG_OFF numbers the first byte in the fragment), and
sets OPT_FRAG_LEN to the value of the total length of the original APDU.

9.2.3. OPT_FRAGMENT - Procedures - Receivers

Receivers detect and accumulate fragmented packets until they have
received an entire contiguous sequence of packets comprising an APDU.
This sequence begins with the fragment bearing OPT_FRAG_OFF of 0, and
terminates with the fragment whose length added to its OPT_FRAG_OFF is
OPT_FRAG_LEN.

9.2.4. OPT_FRAGMENT - Procedures - Network Elements

This option is not network-significant.

9.2.5. OPT_FRAGMENT - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Offset                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Length                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type = 0x01

   Option Length = 12 octets

   Offset

      The offset of the fragment from the beginning of the APDU
      (OPT_FRAG_OFF).

   Length

      The total length of the original APDU (OPT_FRAG_LEN).

9.3. Sequence Number Range Option - OPT_RANGE

Sequence number ranges may be used in conjunction with NAKs (and
corresponding NCFs) to allow receivers to negatively acknowledge a
contiguous range of missing sequence numbers in a single NAK.

In this section, a matching NCF must match NCF_TSI with NAK_TSI, NCF_SQN
with NAK_SQN, and NCF_OPT_RANGE_MAX with NAK_OPT_RANGE_MAX.
Corresponding ODATA/RDATA must match OD_TSI/RD_TSI with NAK_TSI, and
OD_SQN/RD_SQN with any value in the range from NAK_SQN through
NAK_OPT_RANGE_MAX, inclusive.

9.3.1. OPT_RANGE - Packet Extensions Contents

   OPT_RANGE_MAX   the largest sequence number in the range

9.3.2. OPT_RANGE - Procedures - Receivers

When a receiver detects a contiguous range of sequence numbers missing
from the receive window, it may request their retransmission
individually with one NAK for each sequence number in the range, or it
may request their retransmission collectively with one NAK, augmented by
OPT_RANGE, for the entire range.

To specify the range, the receiver must set NAK_SQN to the value of the
smallest sequence number in the range, and it must set OPT_RANGE_MAX to
the value of the largest sequence number in the range.

In addition, the following modifications to the Procedures for NAK and
NCF processing in receivers apply.

Receipt of corresponding ODATA/RDATA during NAK_BO_IVL or NAK_RPT_IVL
does NOT complete NAK generation unless the entire range of packets is
received.

The receipt of corresponding ODATA/RDATA during NAK_RDATA_IVL restarts
NAK_RDATA_IVL. Upon expiry of NAK_RDATA_IVL, a receiver must re-examine
the receive window to determine any remaining outstanding ranges of
missing packets.

9.3.3. OPT_RANGE - Procedures - Network Elements

Network elements must confirm NAK ranges with a corresponding NCF.
Other than that, the Procedures for confirming and forwarding NAKs, and
for constraining RDATA are unchanged for this option.

9.3.4. OPT_RANGE - Procedures - Sources

The following modifications to the Procedures for NAK and NCF processing
in sources apply.

A source must confirm a NAK range with a matching NCF if ANY fraction of
the specified range of packets is in the transmit window.
A source need only retransmit those packets corresponding to that
fraction of the range in the transmit window.

9.3.5. OPT_RANGE - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Maximum Sequence Number                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x02

Option Length = 8 octets

Maximum Sequence Number

   The largest sequence number in the range (OPT_RANGE_MAX).

9.4. Late Joining Option - OPT_JOIN

Late joining allows a source to bound the amount of retransmission
history receivers may request when they initially join a particular
transport session.

This option indicates that receivers that join a transport session in
progress may request retransmission of all data as far back as the given
minimum sequence number from the time they join the transport session.
The default is for receivers to receive data only from the first packet
they receive and onward.

9.4.1. OPT_JOIN - Packet Extensions Contents

OPT_JOIN_MIN    the minimum sequence number for retransmission

9.4.2. OPT_JOIN - Procedures - Receivers

If a PGM packet (ODATA, RDATA, or SPM) bears OPT_JOIN, a receiver may
initialize the trailing edge of the receive window (RXW_TRAIL_INIT) to
the given Minimum Sequence Number and proceed with normal data
reception.

9.4.3. OPT_JOIN - Procedures - Network Elements

This option is not network-significant.

9.4.4. OPT_JOIN - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Minimum Sequence Number                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x03

Option Length = 8 octets

Minimum Sequence Number

   The minimum sequence number defining the initial trailing edge of
   the receive window for a late joining receiver.

9.5. Time Stamp Option - OPT_TIME

Time stamps may be used in conjunction with NAKs to allow receivers to
specify the interval in which the requested RDATA is relevant to them.
That interval is interpreted by both network elements and sources to
determine whether to continue with or abandon a given retransmission.

9.5.1. OPT_TIME - Packet Extensions Contents

OPT_TIME_STAMP  absolute time interval in milliseconds

9.5.2. OPT_TIME - Procedures - Receivers

Receivers may append the Time Stamp option to a NAK to indicate the
absolute interval from the time of transmitting the NAK during which the
receiver can usefully receive the corresponding RDATA.

9.5.3. OPT_TIME - Procedures - Network Elements

Network elements should use the time stamp of a NAK to age the
associated retransmit state for the specified interval and discard it if
the corresponding RDATA has not already torn it down.

Network elements must eliminate a time-stamped NAK only if its time
stamp is smaller than the remaining time associated with the matching
retransmit state. Otherwise, such a NAK must be forwarded instead of
eliminated, and its time stamp must be used to replace the time stamp of
existing retransmit state.

9.5.4. OPT_TIME - Procedures - Sources

A source should abandon any attempt to retransmit RDATA in response to a
time-stamped NAK if that retransmission cannot be completed within the
specified interval.

9.5.5. OPT_TIME - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Time Stamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x04

Option Length = 8 octets

Time Stamp

   Absolute time interval in milliseconds (OPT_TIME_STAMP).

9.6. Reception Quality Option - OPT_RXQ

Reception quality reports may be used in conjunction with NAKs to allow
receivers to provide a reception quality metric to the source.

9.6.1. OPT_RXQ - Packet Extensions Contents

OPT_RXQ_METRIC  A reception quality metric defined by a source's local
                flow- and congestion-control procedures.

9.6.2. OPT_RXQ - Procedures - Receivers

Receivers may append the Reception Quality option to a NAK to indicate
the rate of packet loss detected at the receiver. Receivers must bias
the transmission of NAKs bearing OPT_RXQ by scaling NAK_BO_IVL with
respect to the reception quality metric. That is, as reception quality
deteriorates, NAK_BO_IVL should be reduced, and as reception quality
improves, NAK_BO_IVL should be increased.

The procedures for NAK suppression apply unchanged with the exception
that NAKs bearing OPT_RXQ are only suppressed by other matching NAKs
bearing OPT_RXQ and a worse reception quality metric.

9.6.3. OPT_RXQ - Procedures - Network Elements

Network elements must eliminate a NAK bearing OPT_RXQ only if its
reception quality metric is larger (worse) than the reception quality
metric associated with the matching retransmit state. Otherwise, such a
NAK must be forwarded instead of eliminated, and its reception quality
metric must be used to replace the reception quality metric of existing
retransmit state.

9.6.4. OPT_RXQ - Procedures - Sources

Sources may interpret reception quality reports in a local manner to
adjust their transmission rate.

9.6.5. OPT_RXQ - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Reception Quality Metric                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x05

Option Length = 8 octets

Reception Quality Metric

   TBD

9.7. Sequence Number Dropout Option - OPT_DROP

Sequence number dropout may be used in conjunction with data packets to
allow sources and network elements to selectively eliminate PGM data
packets and convey the resulting sequence-number discontinuity to
receivers so that sequencing can be preserved across the dropout.
Sequence number dropout is incompatible with the fragmentation option.

   This option is incompatible with fragmentation since dropout
   is based upon application-layer information available only at
   the beginning of the APDU. Trailing fragments of such packets
   would not carry sufficient information to which to apply the
   dropout algorithm and so would pass through filters designed
   to discard the APDU as a whole.

9.7.1. OPT_DROP - Packet Extensions Contents

OPT_DROP_PREV   the sequence number of the packet that should be
                regarded by the receiver as the logical predecessor to
                the packet bearing this option

9.7.2. OPT_DROP - Procedures - Sources

On a per-packet basis, a source may selectively permit intermediate
application-layer filters to be applied to a data packet by appending
OPT_DROP to ODATA/RDATA packets and setting the value of OPT_DROP_PREV
to OD_SQN/RD_SQN.

9.7.3. OPT_DROP - Procedures - Network Elements

Network elements may apply intermediate application-layer filters only
to ODATA/RDATA packets bearing OPT_DROP. If such a data packet passes
the filters, it must be forwarded out each interface with OPT_DROP_PREV
set to the value of the sequence number of the highest numbered data
packet within OD_TSI/RD_TSI that has already been forwarded on that
interface.

9.7.4. OPT_DROP - Procedures - Receivers

Receivers must do drop detection on packets bearing OPT_DROP by
verifying that they have also received the data packet numbered
OPT_DROP_PREV rather than checking for the numerical predecessor of
OD_SQN/RD_SQN. If a receiver has received OPT_DROP_PREV, then no drop
has occurred. If a receiver has not received OPT_DROP_PREV, then a
receiver must NAK only for OPT_DROP_PREV and no other intervening
sequence numbers.

9.7.5. OPT_DROP - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Previous Sequence Number                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x06

Option Length = 8 octets

Previous Sequence Number

   The sequence number of the packet that should be regarded by the
   receiver as the logical predecessor to the packet bearing this
   option (OPT_DROP_PREV).

9.8. Redirect Option - OPT_REDIRECT

Redirection may be used in conjunction with NCFs to allow a designated
local retransmitter (DLR) to respond to normal NCFs with a redirecting
NCF advertising its own address as an alternative to the original
source. Recipients of redirecting NCFs may then direct subsequent NAKs
to the DLR rather than to the original source. In addition, receivers
or network elements that redirect their NAKs to a DLR must send a NULL
NAK to provide congestion feedback to the original source without also
provoking a retransmission from that source.

9.8.1. OPT_REDIRECT - Packet Extensions Contents

OPT_REDIR_NLA   the DLR's own unicast network-layer address to which
                recipients of the redirecting NCF may direct subsequent
                NAKs for the corresponding TSI.

9.8.2. OPT_REDIRECT - Procedures - DLRs

A DLR must receive any PGM sessions for which it wishes to provide a
source of retransmissions. In addition to acting as an ordinary PGM
receiver, a DLR may then respond to NCFs sourced by neighbouring network
elements (or even by the source itself) by multicasting a repeat of that
NCF with TTL of 1 and OPT_REDIRECT providing its own network-layer
address.
The TTL constrains the redirecting NCF to the same subnet as the source
of the normal NCF. This is to ensure that DLRs provide retransmissions
only if they are directly on the reverse path to the original source.

Further, a DLR must act as an ordinary PGM source in responding to any
NAK it receives (i.e., directed to it). That is, it should respond
first with a normal NCF and then RDATA as usual.

   NOTA BENE: In order to propagate on exactly the same distribution
   tree as ODATA, RDATA packets transmitted by DLRs and other
   receivers must bear the ODATA source's NLA, not the DLR's or the
   receiver's NLA as might be expected.

9.8.3. OPT_REDIRECT - Procedures - Network Elements

Upon receiving a redirecting NCF, network elements should record the
redirecting information for the TSI, and may redirect subsequent NAKs
for the same TSI to the network address provided in the redirecting NCF
rather than to the network address those NAKs bear upon reception.
Note, however, that a redirecting NCF is NOT regarded as matching the
NAK that provoked it, so it does not complete the transmission of that
NAK. Only a normal matching NCF can complete the transmission of a NAK.

For subsequent NAKs, if the network element has recorded redirection
information for the corresponding TSI, it may change the destination
network address of those NAKs and attempt to transmit them to the DLR.
If, however, a corresponding NCF is not received from the DLR within
NAK_RPT_IVL, the network element must discard the redirecting
information for the TSI and re-attempt to forward the NAK as originally
addressed. In addition, for any NAK it redirects, a network element
must also unicast a NULL NAK toward the original source (i.e., the
source from which it is receiving session ODATA) so that the original
source's congestion avoidance procedures remain well informed.
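
The redirect-and-fall-back behaviour described above can be sketched
as follows. This is only an illustration under stated assumptions: the
class RedirectTable, its method names, and the use of plain strings for
TSIs and addresses are hypothetical and are not defined by this
specification. Timer handling for NAK_RPT_IVL is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class RedirectTable:
    """Per-TSI redirect information learned from redirecting NCFs.

    Hypothetical structure for illustration; not part of the spec.
    """
    dlr_by_tsi: dict = field(default_factory=dict)

    def record_redirect(self, tsi, dlr_nla):
        # A redirecting NCF was received: remember the DLR for this TSI.
        self.dlr_by_tsi[tsi] = dlr_nla

    def route_nak(self, tsi, original_dst):
        """Return the (destination, kind) sends for one outgoing NAK."""
        dlr = self.dlr_by_tsi.get(tsi)
        if dlr is None:
            return [(original_dst, "NAK")]
        # Redirected NAK goes to the DLR; a NULL NAK still goes toward
        # the original source as congestion feedback.
        return [(dlr, "NAK"), (original_dst, "NULL_NAK")]

    def on_ncf_timeout(self, tsi, original_dst):
        """No NCF from the DLR within NAK_RPT_IVL.

        Discard the redirect information and retry the NAK as
        originally addressed.
        """
        self.dlr_by_tsi.pop(tsi, None)
        return [(original_dst, "NAK")]
```

The caller is assumed to invoke on_ncf_timeout() when its NAK_RPT_IVL
timer fires without a matching normal NCF having arrived from the DLR.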

Network elements must treat NULL NAKs just as they would any other NAK
with the exception that they must not add the receiving interface to the
retransmit state. They must otherwise confirm and eliminate or forward
NULL NAKs in the usual way. A NULL NAK is forwarded only if matching
retransmit state has not already been created. If a NULL NAK is used to
initially create retransmit state, this fact must be recorded so that
any subsequent non-NULL NAK will not be eliminated, but rather will be
forwarded to provoke an actual retransmission.

9.8.4. OPT_REDIRECT - Procedures - Receivers

Upon receiving a redirecting NCF, receivers should record the
redirecting information for the TSI, and may redirect subsequent NAKs
for the same TSI to the network address provided in the redirecting NCF
rather than to the network address of the corresponding ODATA for which
the receiver is requesting retransmission. Note, however, that a
redirecting NCF is NOT regarded as matching the NAK that provoked it, so
it does not complete the transmission of that NAK. Only a normal
matching NCF can complete the transmission of a NAK.

For subsequent NAKs, if the receiver has recorded redirection
information for the corresponding TSI, it may change the destination
network address of those NAKs and attempt to transmit them to the DLR.
If, however, a corresponding NCF is not received within NAK_RPT_IVL, the
receiver must discard the redirecting information for the TSI and
re-attempt to forward the NAK to the original source of the missing
ODATA.

9.8.5. OPT_REDIRECT - Packet Extension Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  | Option Length |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            NLA AFI            |           reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         DLR's NLA ...                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

Option Type = 0x07

Option Length = 4 + NLA length

DLR's NLA

   The DLR's own unicast network address to which recipients of the
   redirecting NCF may direct subsequent NAKs.

10. Security Considerations

In addition to the usual problems of end-to-end authentication, PGM is
vulnerable to a number of security risks that are specific to the
mechanisms it uses to establish source path state, to establish
retransmit state, to forward NAKs, to identify DLRs, and to distribute
retransmissions. These mechanisms expose PGM network elements
themselves to security risks since network elements not only switch but
also interpret SPMs, NAKs, NCFs, and RDATA, all of which may
legitimately be transmitted by PGM sources, receivers, and DLRs. Short
of full authentication of all neighbouring sources, receivers, DLRs, and
network elements, the protocol is not impervious to abuse.

So putting aside the problems of rogue PGM network elements for the
moment, there are enough potential security risks to network elements
associated with sources, receivers, and DLRs alone. These risks include
denial of service through the exhausting of both CPU bandwidth and
memory, as well as loss of (retransmit) data connectivity through the
muddling of retransmit state.

False SPMs may cause PGM network elements to mis-direct NAKs intended
for the legitimate source with the result that the requested RDATA would
not be forthcoming.

False NAKs may cause PGM network elements to establish spurious
retransmit state that will expire only upon time-out and could lead to
memory exhaustion in the meantime.

False NCFs may cause PGM network elements to suspend NAK forwarding
prematurely (or to mis-direct NAKs in the case of redirecting NCFs)
resulting eventually in loss of RDATA.

False RDATA may cause PGM network elements to tear down legitimate
retransmit state resulting eventually in loss of legitimate RDATA.

The development of precautions for network elements to protect
themselves against incidental or unsophisticated versions of these
attacks is work in progress and includes:

   Damping of jitter in the value of either the source NLA of SPMs or
   the path NLA in SPMs. While the source NLA is expected to change
   seldom, the path NLA is expected to change occasionally as a
   consequence of changes in underlying multicast routing information.

   The extension of NAK shedding procedures to control the volume, not
   just the rate, of confirmed NAKs. In either case, these procedures
   assist network elements in surviving NAK attacks at the expense of
   maintaining service. More efficiently, network elements may use the
   knowledge of TSIs and their associated transmit windows gleaned from
   SPMs to control the proliferation of retransmit state.

   Matching of the source NLA of NCFs against the path NLA in SPMs (or
   the DLR's NLA in OPT_REDIRECT) to verify that the confirmation is at
   least apparently coming from the expected entity.

   A three-way handshake between network elements and DLRs that would
   permit a network element to ascertain with greater confidence that
   an alleged DLR is in fact on the same subnet, is identified by the
   alleged NLA, and is PGM conversant.

   Since PGM's Local Retransmission procedures allow any receiver to
   provide RDATA, the source NLA of RDATA may vary widely in value. At
   the expense of the efficiencies of local retransmission, a PGM
   network element could reduce its vulnerability to false RDATA by
   accepting RDATA only from the source, but as with all of these
   procedures, this is still no protection against full falsification
   of the network-layer header.

11. Appendix A - Forward Error Correction

11.1. Introduction

The following procedures incorporate packet-level Reed Solomon Erasure
correcting techniques as described in [11] and [12] into PGM. This
approach to Forward Error Correction (FEC) is based upon the computation
of h parity packets from k data packets for a total of n packets such
that a receiver can reconstruct the k data packets out of any k of the n
packets. More specifically, it is characteristic of the parity packets
that any x of them can be used to reconstruct any x of the original data
packets. The original k data packets are referred to as the
Transmission Group, and the total n packets as the FEC Block.

These procedures permit any combination of pro-active FEC or on-demand
FEC with conventional ARQ within a given TSI to provide any flavour of
layered or integrated FEC. Once provided by a source, the actual use of
FEC or ARQ for loss recovery in the session is entirely at the
discretion of the receivers. Note that receivers may still resort to
selective NAKs even when parity is available, and sources must still
provide selective retransmissions in response.
The two approaches can be used by the same or different receivers in a
single transport session without conflict.

Pro-active FEC refers to the technique of computing parity packets at
transmission time and transmitting them as a matter of course following
the data packets. Pro-active FEC is recommended for providing loss
recovery over simplex or asymmetric multicast channels over which
returning retransmit requests is either impossible or costly. It
provides increased reliability at the expense of bandwidth.

On-demand FEC refers to the technique of computing parity packets at
retransmission time and transmitting them only upon demand (i.e.,
receiver-based loss detection and retransmit request). On-demand FEC is
recommended for providing loss recovery of uncorrelated loss in very
large receiver populations in which the probability of any single packet
being lost is substantial. It provides equivalent reliability to
selective NAKs (ARQ) at the expense of no more and typically less
bandwidth.

Selective NAKs are NAKs that request the retransmission of specific
packets by sequence number corresponding to the sequence number of any
data packets detected to be missing from the expected sequence
(conventional ARQ). Selective NAKs are recommended for recovering
losses occurring in trailing partial transmission groups.

Parity NAKs are NAKs that request the transmission of a specific number
of parity packets by count corresponding to the count of the number of
data packets detected to be missing from a group of k data packets
(on-demand FEC).

The objective of these procedures is to incorporate these FEC techniques
into PGM so that:

   sources may provide parity packets either pro-actively or on-demand,
   interchangeably within the same TSI,

   receivers may use either selective or parity NAKs interchangeably
   within the same TSI,

   network elements may maintain retransmit state based on either
   selective or parity NAKs in the same data structure, altering only
   search, RDATA constraint, and deletion algorithms in either case,

   and only OPTION additions to the basic packet formats are required.

11.2. Overview

Advertising FEC parameters in the transport session

Sources add OPT_PARITY_PRM to SPMs to provide session-specific
parameters such as the number of packets (TGSIZE == k) in a transmission
group. This option lets receivers know how many packets are in a
transmission group, and it lets network elements sort retransmit state
by transmission group number. This option includes an indication of
whether pro-active and/or on-demand parity is available from the source.

Distinguishing parity packets from data packets

Sources send pro-active parity packets as ODATA and on-demand parity
packets as RDATA. A source must add OPT_PARITY to the ODATA/RDATA
packet header of parity packets to permit network elements and receivers
to distinguish them from data packets.

Data and parity packet numbering

Parity packets must be calculated over a fixed number k of data packets
known as the Transmission Group. Grouping of packets into transmission
groups effectively partitions a packet sequence number into a high-order
portion (TG_SQN) specifying the transmission group (TG), and a low-order
portion (PKT_SQN) specifying the packet number (PKT-NUM in the range 0
through k-1) within that group. So from an implementation point of
view, it's handy if k, the TG size, is a power of 2.
If so, then TG_SQN and PKT_SQN can be mapped side-by-side into the 32
bit SQN, and log2(TGSIZE) is the size in bits of PKT_SQN.

This mapping does not diminish the effective sequence number space since
parity packets are marked with OPT_PARITY that allows the sequence space
(PKT_SQN) to be reused to number the h parity packets for as long as h
is not greater than k.

In case h is greater than k, a source must add OPT_PARITY_GRP to any
parity packet numbered j greater than k-1 specifying the number m of the
group of k parity packets to which the packet belongs where m is just
the quotient from the integer division of j by k. Correspondingly,
PKT-NUM for such parity packets is just j modulo k.

Note that parity NAKs (and consequently their corresponding parity NCFs)
must also be distinguished by the addition of OPT_PARITY, and that in
these packets, PKT_SQN contains PKT-CNT, the number of missing packets,
rather than PKT-NUM, the number of a specific missing packet. More on
all this later.

11.3. Packet Contents

This section just provides enough short-hand to make the Procedures
intelligible. For the full details of packet contents, please refer to
Packet Formats below.

OPT_PARITY      indicated in pro-active (ODATA) and on-demand (RDATA)
                parity packets to distinguish them from data packets

OPT_PARITY_PRM  appended by sources to SPMs to specify session-specific
                parameters such as the transmission group size and the
                availability of pro-active and/or on-demand parity from
                the source

OPT_PARITY_GRP  the number of the group (greater than 0) of k parity
                packets to which the parity packet belongs when more
                than k parity packets are provided by the source

11.3.1. Parity NAKs

NAK_TG_SQN      the high-order portion of NAK_SQN specifying the
                transmission group for which parity packets are
                requested

NAK_PKT_CNT     the low-order portion of NAK_SQN specifying the number
                of missing data packets for which parity packets are
                requested

11.3.2. Parity NCFs

NCF_TG_SQN      the high-order portion of NCF_SQN specifying the
                transmission group for which parity packets were
                requested

NCF_PKT_CNT     the low-order portion of NCF_SQN specifying the number
                of missing data packets for which parity packets were
                requested

11.3.3. On-demand Parity

RDATA_TG_SQN    the high-order portion of RDATA_SQN specifying the
                transmission group to which the parity packet belongs

RDATA_PKT_SQN   the low-order portion of RDATA_SQN specifying the parity
                packet sequence number within the transmission group

11.3.4. Pro-active Parity

ODATA_TG_SQN    the high-order portion of ODATA_SQN specifying the
                transmission group to which the parity packet belongs

ODATA_PKT_SQN   the low-order portion of ODATA_SQN specifying the parity
                packet sequence number within the transmission group

11.4. Procedures - Sources

If a source elects to provide parity for a given transport session, it
must first provide the transmission group size PARITY_PRM_TGS in the
OPT_PARITY_PRM option of its SPMs. If a source elects to provide
pro-active parity for a given transport session, it must set
PARITY_PRM_PRO in the OPT_PARITY_PRM option of its SPMs. If a source
elects to provide on-demand parity for a given transport session, it
must set PARITY_PRM_OND in the OPT_PARITY_PRM option of its SPMs.

A source must send any pro-active parity packets for a given
transmission group only after it has first sent all of the corresponding
k data packets in that group. Pro-active parity packets must be sent as
ODATA with OPT_PARITY.

If a source elects to provide on-demand parity, it must respond to a
parity NAK for a transmission group with a parity NCF. Subsequently,
the source must then send the number of parity packets requested by that
parity NAK. On-demand parity packets must be sent as RDATA with
OPT_PARITY.

In either case, the source must be prepared to also respond to selective
NAKs in the usual way.

In the absence of data to transmit, a source should pad out the
transmission group with padded packets before calculating and providing
parity packets either pro-actively or on demand.

A source may consolidate requests for on-demand parity in the same
transmission group according to the following procedures. If the number
of pending (i.e., unsent) parity packets from a previous request for
on-demand parity packets is equal to or greater than NAK_PKT_CNT in a
subsequent NAK, that subsequent NAK must be confirmed but may otherwise
be ignored. If the number of pending (i.e., unsent) parity packets from
a previous request for on-demand parity packets is less than NAK_PKT_CNT
in a subsequent NAK, that subsequent NAK must be confirmed but the
source need only increase the number of pending parity packets to
NAK_PKT_CNT.

11.5. Procedures - Receivers

If a receiver elects to make use of parity packets for loss recovery, it
must first learn the transmission group size PARITY_PRM_TGS from
OPT_PARITY_PRM in the SPMs for the TSI. The transmission group size is
used by a receiver to determine the sequence number boundaries between
transmission groups.

Thereafter, if PARITY_PRM_PRO is also set in the SPMs for the TSI, a
receiver may use any pro-active parity packets it receives for loss
recovery, and if PARITY_PRM_OND is also set in the SPMs for the TSI, it
may solicit on-demand parity packets upon loss detection.
Parity packets are ODATA (pro-active) or RDATA (on-demand) packets
distinguished by OPT_PARITY which lets receivers know that
ODATA/RDATA_TG_SQN identifies the group of PARITY_PRM_TGS packets to
which the parity may be applied for loss recovery in the corresponding
transmission group, and that ODATA/RDATA_PKT_SQN is being reused to
number the parity packets within that group. Receivers order parity
packets and eliminate duplicates within a transmission group based on
ODATA/RDATA_PKT_SQN and on OPT_PARITY_GRP if present.

To solicit on-demand parity packets, a receiver must send parity NAKs
upon loss detection. For the purposes of soliciting on-demand parity,
loss detection occurs at transmission group boundaries, i.e., upon
receipt of the last data packet in a transmission group, upon receipt of
any data packet in any subsequent transmission group, or upon receipt of
any parity packet in the current or a subsequent transmission group.

A parity NAK is simply a NAK with OPT_PARITY and NAK_PKT_CNT set to the
count of the number of packets detected to be missing from the
transmission group specified by NAK_TG_SQN. Note that this constrains
the receiver to request no more parity packets than there are data
packets in the transmission group.

A receiver should bias the value of NAK_RB_IVL for parity NAKs inversely
proportional to NAK_PKT_CNT so that NAKs for larger losses are likely to
be scheduled ahead of NAKs for smaller losses in the same receiver
population.

A confirming NCF for a parity NAK is a parity NCF with NCF_PKT_CNT equal
to or greater than that specified by the parity NAK.

A receiver's NAK_RDATA_IVL timer is not cancelled until all requested
parity packets have been received.
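
As an illustration of how a receiver might form NAK_SQN for a parity
NAK, the sketch below splits and packs the 32 bit SQN under the
assumption that the transmission group size is a power of 2. The
function names are hypothetical. The text does not say how a count
equal to the full group size would be encoded in the low-order bits, so
the sketch rejects any count that does not fit rather than guess an
encoding.

```python
def pkt_bits(tgsize):
    """Number of low-order SQN bits used for PKT_SQN: log2(TGSIZE).

    Assumption: TGSIZE is a power of 2, as the overview suggests.
    """
    if tgsize < 1 or tgsize & (tgsize - 1):
        raise ValueError("sketch assumes TGSIZE is a power of 2")
    return tgsize.bit_length() - 1


def split_sqn(sqn, tgsize):
    """Split a 32 bit SQN into (transmission group number, PKT_SQN)."""
    bits = pkt_bits(tgsize)
    return (sqn & 0xFFFFFFFF) >> bits, sqn & (tgsize - 1)


def parity_nak_sqn(tg_number, missing_count, tgsize):
    """Pack a transmission group number and a missing-packet count
    (NAK_PKT_CNT) side by side into a 32 bit NAK_SQN.

    Counts that do not fit in log2(TGSIZE) bits are rejected because
    their encoding is not specified in the text.
    """
    bits = pkt_bits(tgsize)
    if not 0 < missing_count < tgsize:
        raise ValueError("count does not fit in the PKT_SQN field")
    return ((tg_number << bits) | missing_count) & 0xFFFFFFFF
```

For example, with a transmission group size of 8, sequence number 37
falls in group 4 at packet number 5, and a parity NAK for 3 missing
packets in group 4 carries NAK_SQN 35.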

In the absence of data (detected from SPMs bearing SPM_LEAD equal to
RXW_LEAD) on non-transmission-group boundaries, receivers should resort
to selective NAKs for any missing packets in that trailing transmission
group.

11.6. Procedures - Network Elements

Pro-active parity packets (ODATA with OPT_PARITY) are switched by
network elements without transport-layer intervention.

On-demand parity packets (RDATA with OPT_PARITY) necessitate modified
request, confirmation and retransmit constraint procedures for network
elements. In the context of these procedures, retransmit state is
maintained per NAK_TSI and NAK_TG_SQN, and in addition to recording the
interfaces on which corresponding NAKs have been received, records the
largest value of NAK_PKT_CNT seen in corresponding NAKs on each
interface. This value is referred to as the known packet count. The
largest of the known packet counts recorded for any interface in the
retransmit state for the transmission group is referred to as the
largest known packet count.

Upon receipt of a parity NAK, a network element responds with the
corresponding parity NCF. The corresponding parity NCF is just an NCF
formed in the usual way (i.e., a multicast copy of the NAK with the
packet type changed), but with the addition of OPT_PARITY and with
NCF_PKT_CNT set to the larger of NAK_PKT_CNT and the largest known
packet count. The network element then creates retransmit state in the
usual way with the following modifications.

If retransmit state for the receiving interface does not exist, the
network element must create it and additionally record NAK_PKT_CNT from
the parity NAK as the known packet count for the receiving interface.
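
The bookkeeping described so far can be sketched as below. The class
ParityRetransmitState and its method names are hypothetical; one
instance stands for the retransmit state of a single NAK_TSI and
NAK_TG_SQN pair. The sketch covers only the computation of NCF_PKT_CNT
and the creation of state for a new interface; handling of a NAK on an
interface that already has state is deliberately left out.

```python
class ParityRetransmitState:
    """Retransmit state for one (NAK_TSI, NAK_TG_SQN) pair.

    Hypothetical structure for illustration; not part of the spec.
    """

    def __init__(self):
        # interface identifier -> known packet count for that interface
        self.known = {}

    def largest_known(self):
        """Largest known packet count over all recorded interfaces."""
        return max(self.known.values(), default=0)

    def on_parity_nak(self, interface, nak_pkt_cnt):
        """Return NCF_PKT_CNT for the confirming parity NCF and create
        state for the receiving interface if none exists yet."""
        # NCF_PKT_CNT is the larger of NAK_PKT_CNT and the largest
        # known packet count at the time the NAK arrives.
        ncf_pkt_cnt = max(nak_pkt_cnt, self.largest_known())
        if interface not in self.known:
            self.known[interface] = nak_pkt_cnt
        return ncf_pkt_cnt
```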
2804 If retransmit state for the receiving interface already exists, the net- 2805 work element must eliminate the NAK only if NAK_ELIM_IVL has not expired 2806 and NAK_PKT_CNT is equal to or less than the known packet count for the 2807 receiving interface. If NAK_PKT_CNT is greater than this value, the 2808 network element must update the known packet count for the receiving 2809 interface with the larger NAK_PKT_CNT. 2811 Upon either adding a new interface or updating the known packet count 2812 for an existing interface, the network element must determine if 2813 NAK_PKT_CNT is greater than the largest known packet count. If so or if 2814 NAK_ELIM_IVL has expired, the network element must forward the parity 2815 NAK in the usual way with a value of NAK_PKT_CNT equal to the largest 2816 known packet count. 2818 Upon receipt of an on-demand parity packet, a network element must 2819 locate existing retransmit state for the corresponding RDATA_TSI and 2820 RDATA_TG_SQN. If no such retransmit state exists, the network element 2821 must discard the RDATA as usual. 2823 If corresponding retransmit state exists, the network element must for- 2824 ward the RDATA on all interfaces in the existing retransmit state, and 2825 decrement the known packet count for each by one. Any interfaces whose 2826 known packet count is thereby reduced to zero must be deleted from the 2827 retransmit state. If the number of interfaces is thereby reduced to 2828 zero, the retransmit state itself must be deleted. 2830 Network elements must use parity NCFs to anticipate NAKs in the usual 2831 way with the addition of recording NCF_PKT_CNT from the parity NCF with 2832 the anticipated state so that any subsequent NAKs received with 2833 NAK_PKT_CNT equal to or less than NCF_PKT_CNT will be eliminated, and 2834 any with NAK_PKT_CNT greater than NCF_PKT_CNT will be forwarded. 2836 11.7. Packet Formats 2838 11.7.1. 
OPT_PARITY_PRM - Packet Extension Format

OPT_PARITY_PRM may be appended only to SPMs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |                            P O|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Transmission Group Size                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x08

Option Length = 8 octets

P-bit (PARITY_PRM_PRO)

   Indicates when set that the source is providing pro-active parity
   packets.

O-bit (PARITY_PRM_OND)

   Indicates when set that the source is providing on-demand parity
   packets.

At least one of PARITY_PRM_PRO and PARITY_PRM_OND must be set.

Transmission Group Size (PARITY_PRM_TGS)

   The number of data packets in the transmission group over which
   the parity packets are calculated.

11.7.2. OPT_PARITY_GRP - Packet Extension Format

OPT_PARITY_GRP may be appended only to parity packets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |      Parity Group Number      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x09

Option Length = 4 octets

Parity Group Number (PRM_GROUP)

   The number of the group of k parity packets amongst the h parity
   packets within the transmission group to which the parity packet
   belongs where the first k parity packets are in group zero.
   PRM_GROUP must not be zero.

12. Appendix B - Congestion Avoidance

A source must implement a couple of strategies for congestion avoidance
derived in principle from the ones described in [13], but rephrased in
terms of transmit rates rather than window sizes, and adapted to account
for PGM's lack of ACKs.
As yet, neither of these adaptations has either the analytic basis or
the practical credentials of those described in [13], and they are
proposed here entirely as experimental strategies to be modified and
proven or discarded as experience dictates.

The first congestion avoidance strategy governs the rate at which a
source may increase its transmit rate up to TXW_MAX_RTE upon initially
starting transmission or restarting transmission after receiving a NAK.
Specifically, upon initial transmission or after receiving a NAK, a
source must reduce its transmit rate to TXW_INC_RTE << TXW_MAX_RTE, and
may double its transmit rate every TXW_INC_SECS only for as long as no
NAKs are received for TXW_INC_SECS and the resulting transmit rate is
less than TXW_MAX_RTE.

   A good choice for TXW_INC_RTE would be something conservative such
   as TXW_MAX_RTE/256 to allow for 8 left shifts to get back up to
   TXW_MAX_RTE.

   A good choice for TXW_INC_SECS would be the worst case round trip
   delay to any receiver a source is required to reach (see SPM_RPT_IVL
   below).

The second congestion avoidance strategy governs the rate at which a
source must reduce its maximum transmit rate in the face of congestion,
and the rate at which it may then increase its maximum transmit rate up
to TXW_MAX_RTE. More specifically, a source must apply a multiplicative
decrease in its maximum transmit rate in the face of congestion, and a
linear increase in its maximum transmit rate in the absence of
congestion. That is, upon receipt of a NAK, a source must reduce its
maximum transmit rate by half, and thereafter increase it linearly over
time only for as long as no NAKs are received and the transmit rate
does not exceed TXW_MAX_RTE.

   A good choice for "over time" is every TXW_INC_SECS.

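As an illustration only, the two strategies might be modelled as a
small rate controller. This is a sketch under stated assumptions, not
a normative algorithm: the class name RateController, the attribute
"ceiling" (standing for the current maximum transmit rate), and the
linear_step increment are hypothetical, since this document does not
fix the size of the linear increase. The sketch also assumes that the
caller invokes on_quiet_interval() once per TXW_INC_SECS in which no
NAK was received.

```python
from dataclasses import dataclass, field


@dataclass
class RateController:
    txw_max_rte: float   # TXW_MAX_RTE
    txw_inc_rte: float   # TXW_INC_RTE, e.g. TXW_MAX_RTE / 256
    linear_step: float   # assumed linear increment per TXW_INC_SECS
    rate: float = field(init=False)      # current transmit rate
    ceiling: float = field(init=False)   # current maximum transmit rate

    def __post_init__(self):
        # Initial transmission starts at TXW_INC_RTE.
        self.rate = self.txw_inc_rte
        self.ceiling = self.txw_max_rte

    def on_nak(self):
        # Second strategy: halve the maximum transmit rate.
        self.ceiling = max(self.ceiling / 2, self.txw_inc_rte)
        # First strategy: restart from TXW_INC_RTE.
        self.rate = self.txw_inc_rte

    def on_quiet_interval(self):
        # Called once per TXW_INC_SECS with no NAKs received.
        if self.rate < self.ceiling:
            # Exponential phase: double, but never beyond the ceiling.
            self.rate = min(self.rate * 2, self.ceiling)
        else:
            # Linear phase: raise the ceiling towards TXW_MAX_RTE.
            self.ceiling = min(self.ceiling + self.linear_step,
                               self.txw_max_rte)
            self.rate = self.ceiling


ctl = RateController(txw_max_rte=256.0, txw_inc_rte=1.0, linear_step=8.0)
for _ in range(8):
    ctl.on_quiet_interval()
rate_after_ramp = ctl.rate

ctl.on_nak()
state_after_nak = (ctl.rate, ctl.ceiling)
for _ in range(8):
    ctl.on_quiet_interval()
rate_after_recovery = ctl.rate
```

With TXW_INC_RTE set to TXW_MAX_RTE/256, eight quiet intervals bring
the rate from 1.0 back to 256.0. After a NAK the rate drops to 1.0 and
the ceiling to 128.0; seven doublings reach 128.0 and the eighth quiet
interval begins the linear phase at 136.0.
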
Upon receipt of a NAK, these two strategies will combine first to
reduce a source's transmit rate to TXW_INC_RTE from which it will
increase exponentially up to half the transmission rate in use when the
NAK was received, and thereafter to increase it linearly up to
TXW_MAX_RTE for as long as no further NAKs are received.

13. Appendix C - Flow Control

A degree of flow control native to PGM itself is provided through the
exchange of elective, periodic state notifications between sources
(Transmit State Notifications - TSNs) and receivers (Receive State
Notifications - RSNs). The goal of the flow control strategies in PGM
is to conservatively adapt a source's transmit rate so as to minimize
NAKs due to receiver overrun and to do so with as simple and efficient
an exchange of protocol packets as possible. These strategies are
intended to augment, not substitute for, source-based adaptive
strategies for rate-limiting transmissions based solely on the
frequency of NAKs.

Since PGM has no conference control mechanisms, these mechanisms simply
act to modify a source's transmit rate to suit the slowest receiver the
source is willing to accommodate. The use and frequency of TSNs and
RSNs are left to the discretion of the implementation.

TSNs enable a source to adapt its transmit rate as network and receiver
resources permit. A source may distinguish congestion from flow control
by noting that in the absence of RSNs, it is likely that most NAKs the
source may see are the result of congestion and not end-to-end flow
control problems. So a source may also reduce its transmit rate simply
in response to the pattern of NAKs it receives.

These mechanisms are entirely elective and not meant as a replacement
for reservation protocols or other out-of-band resource and conference
management strategies.
They are intended simply to provide a workable strategy in the absence
of anything more sophisticated. PGM's reliable data transfer service is
in no way dependent upon the use of TSNs and RSNs.

13.1. Architectural Description

To provide an optional mechanism for flow control, PGM specifies packet
formats and procedures for sources and receivers to exchange resource
state notifications.

13.1.1. Source Functions

A source may periodically multicast TSNs to the group to advertise its
transmit window and its minimum and current transmit rates.

In response to corresponding RSNs, a source must reduce its transmit
rate to at most the least rate specified in any RSN, and reflect this
reduced current rate in subsequent TSNs.

In the absence of corresponding RSNs, a source may conservatively
increase its transmit rate, and reflect this increased current rate in
subsequent TSNs.

To find the local maximum current transmit rate, a source may continue
to increase its current transmit rate until it receives RSNs (or NAKs)
in response, and then back off appropriately.

13.1.2. Receiver Functions

A receiver unicasts an RSN to a source in response to a TSN only if the
transmit rate advertised in the TSN exceeds the receiver's capacity. To
prevent RSN implosion, receivers must observe a random back off over an
interval three times the TSN period, and monitor TSNs in the meantime
for a reduction in the current transmit rate.

13.1.3. Network Element Functions

Network elements forward TSNs and RSNs without intervention.

13.2.
Terms and Concepts

For a given transport session identified by a TSI, a source maintains:

   TXW_MIN_RTE   a fixed minimum transmit rate in kBps, the minimum the
                 transmitter will consider maintaining, equal to or
                 less than TXW_MAX_RTE

The reduction of TXW_MAX_RTE to TXW_MIN_RTE is negotiated through
exchanges of TSNs and RSNs.

For a given transport session identified by a TSI, a receiver maintains:

   RXW_MAX_RTE   a fixed maximum reception rate in kBps, the maximum the
                 receiver will consider maintaining

The reduction of the current transmit rate (advertised in TSNs) to
RXW_MAX_RTE is negotiated through exchanges of TSNs and RSNs.

13.3. Packet Contents

This section just provides enough short-hand to make the Procedures
intelligible. For the full details of packet contents, please refer to
Packet Formats below.

13.3.1. Transmit State Notification (TSN)

TSNs are formed by adding OPT_TSN to SPMs and contain:

   TSN_TSI       (a.k.a. SPM_TSI) the source-assigned TSI for which
                 RSNs are solicited

   TSN_SQN       (a.k.a. SPM_SQN) a sequence number assigned
                 sequentially by the source in unit increments and
                 scoped by TSN_TSI

      NOTA BENE: this is an entirely separate sequence from that used
      to number ODATA and RDATA.

   TSN_TRAIL     (a.k.a. SPM_TRAIL) the source's TXW_TRAIL

   TSN_LEAD      (a.k.a. SPM_LEAD) the source's TXW_LEAD

   TSN_MIN_RTE   the source's TXW_MIN_RTE

   TSN_MAX_RTE   the source's TXW_MAX_RTE

13.3.2. Receive State Notification (RSN)

RSNs are unicast to the source and contain:

   RSN_TSI       TSN_TSI from the TSN to which this is a response

   RSN_SQN       TSN_SQN from the TSN to which this is a response

   RSN_TRAIL     TSN_TRAIL from the TSN to which this is a response

   RSN_MAX_RTE   the receiver's RXW_MAX_RTE

13.4. Procedures - Sources

13.4.1.
Data Transmission Initialization

Sources must sequence TSNs by assigning each a TSN_SQN using a number
sequence separate from that used to number data packets. In addition,
sources associate each TSN with a specific instance of the transmit
window by setting TSN_TRAIL to TXW_TRAIL.

A source may precede initial data transmission to a transport session by
sending TSNs at a rate of TSN_IDL_RTE for an interval of TSN_IDL_IVL.
TSNs are used by the source in this instance simply to provoke RSNs from
any receivers that may protest the advertised TSN_MAX_RTE. A source may
use this procedure to find the largest acceptable initial values for
TXW_MAX_RTE before initiating data transmission.

In the ordinary course of data transmission, a source may periodically
transmit TSNs and adjust the current transmit rate to establish the
optimum rate for the current population of tuned-in receivers.

Specifically, a source may increase the values in the TSN without
increasing them in fact until it provokes RSNs. It should then use the
values in the RSNs to back off to the highest acceptable values for
actual use.

Note, then, that a source may advertise higher values for TSN_MAX_RTE in
its TSNs than it actually uses, but it must never actually use higher
values for TXW_MAX_RTE than it advertises in its TSNs.

13.4.2. Transmit Resource Management

An RSN corresponds to a TSN if RSN_TSI matches TSN_TSI, RSN_SQN matches
TSN_SQN, and RSN_TRAIL matches TSN_TRAIL. That is, an RSN corresponds
to a TSN if it bears the same transport session, sequence, and transmit
window identifiers as the TSN.

Sources should respond to RSNs that correspond to the current TSN by
reducing TXW_MAX_RTE to the minimum values heard in any such RSN as long
as these values are no lower than TXW_MIN_RTE.

13.5. Procedures - Receivers

13.5.1.
Data Reception Initialization

TSNs must be sequenced by receivers based on a combination of TSN_SQN
(which numbers TSNs separately from data packets) and TSN_TRAIL, which
relates the TSN to a specific transmit window. TSNs bearing the same
TSN_TRAIL may be ordered relative to one another using TSN_SQN. The
highest numbered such TSN should be used to maintain the receiver's
notion of the transmit window and the current and maximum transmit
rates. Ordering of TSNs is particularly important for TSNs in which
transmit rates are increasing or decreasing.

For a given transport session identified by TSI, a receiver may precede
initial data reception by first receiving and accepting the values for
TXW_MAX_RTE in a matching TSN. Accepting this value implies that the
receiver is capable of receiving data at the rate of TXW_MAX_RTE.

If a receiver accepts the advertised value for TXW_MAX_RTE in a matching
TSN, it may initiate data reception in the transmit window provided by
the TSN.

If the TSN bears OPT_JOIN, the receiver initializes the trailing edge of
the receive window to TXW_TRAIL and proceeds with normal data reception.

If the TSN does not bear OPT_JOIN, the receiver may initiate data
reception beginning only with the first ODATA_SQN it receives within the
advertised transmit window. This sequence number temporarily defines
the trailing edge of the transmit window from the receiver's
perspective. That is, it is assigned to RXW_TRAIL_INIT within the
receiver, and until the trailing edge sequence number advertised in
subsequent packets (TSNs or ODATA or RDATA or SPMs) increments through
RXW_TRAIL_INIT, the receiver must only request retransmissions for
sequence numbers subsequent to RXW_TRAIL_INIT. Thereafter, it may
request retransmissions anywhere in the transmit window.
This temporary restriction on retransmission requests prevents
receivers from requesting a potentially large amount of history when
they first begin to receive a given PGM transport session.

13.5.2. Receive Resource Management

From a receiver's perspective, an acceptable TSN is one in which
TSN_MIN_RTE is equal to or less than RXW_MAX_RTE. The current value of
TSN_MAX_RTE may or may not be within the receiver's capacity.

If a receiver receives an unacceptable TSN, the receiver must neither
initiate nor continue data reception for the given transport session.
In addition, it must not respond to the TSN with an RSN, although it may
continue to receive and inspect TSNs for an acceptable one.

If a receiver receives an acceptable TSN, but the advertised values of
TSN_MAX_RTE exceed RXW_MAX_RTE, the receiver should respond with a
corresponding RSN advertising the maximum value RSN_MAX_RTE with which
it can operate. The receiver may simultaneously initiate or continue
data reception, and it should continue to respond to subsequent TSNs
with this RSN until it receives a TSN advertising a value of TSN_MAX_RTE
with which it can operate.

13.6. Packet Formats

13.6.1. OPT_TSN - Packet Extension Format

The source NLA of a TSN is the unicast address of the entity that
originates the TSN.

The destination NLA of a TSN is a multicast group NLA.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Minimum Transmit Rate                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Maximum Transmit Rate                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Option Type = 0x0A

Option Length = 12 octets

Minimum Transmit Rate (TSN_MIN_RTE)

   The minimum rate of transmission required for receivers to
   participate in the group (TXW_MIN_RTE).

Maximum Transmit Rate (TSN_MAX_RTE)

   The current rate of transmission required by receivers to
   participate in the group (TXW_MAX_RTE).

13.6.2. RSN - Receive State Notification

The source NLA of an RSN is the unicast address of the entity that
originates the RSN.

The destination NLA of an RSN is the unicast address of the source of
the corresponding TSN.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Source Port           |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Options    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Global Source ID   ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    ...
               Global Source ID    |           TPDU Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     RSN's Sequence Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Trailing Edge Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Receive Rate                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type:

   RSN_TYPE = 0x0C

Options

   RSNs may bear only OPT_JOIN.

RSN's Sequence Number (RSN_SQN)

   TSN_SQN from the corresponding TSN.

Trailing Edge Sequence Number (RSN_TRAIL)

   TSN_TRAIL from the corresponding TSN.

Receive Rate (RSN_MAX_RTE)

   The maximum rate of transmission the receiver can sustain
   (RXW_MAX_RTE).

Work in Progress

In addition to the explicitly speculative material in the foregoing,
work is also in progress on:

   Congestion avoidance through transmit rate control.

   Throughput control through shedding of lossy receivers.

   Reducing the latency of the alignment of source-path state with
   underlying multicast routing changes.

   Header compression.

   Strategies for securing PGM against the black-hole attacks outlined
   in Security Considerations.

   Heuristics for delaying the transmission of RDATA from a source to
   balance the tradeoff between the retransmit latency experienced by
   receivers and the overhead of duplicate RDATA packets experienced by
   the network.

   Heuristics for bounding the smallest random back-off interval for
   NAK generation to balance the tradeoff between the retransmit
   latency experienced by receivers and the overhead of unnecessary
   RDATA packets experienced by the network when the apparent loss is
   due to simple out-of-order delivery through the network.

   Addition of SPM Requests (SPMRs) to permit late-joining receivers to
   provoke an SPM from a source in a non-implosive way.

Acknowledgements

The design and specification of PGM has been substantially influenced by
reviews and revisions provided by several people who took the time to
read and critique this document. These include, in alphabetical order:

   Bob Albrightson     albright@cisco.com
   Joel Bion           jpbion@cisco.com
   Mark Bowles         bowles@tibco.com
   Jon Crowcroft       j.crowcroft@cs.ucl.ac.uk
   Steve Deering       deering@cisco.com
   Tugrul Firatli      tf@tibco.com
   Jim Gemmell         jgemmell@microsoft.com
   Dan Harkins         dharkins@cisco.com
   Dima Khoury         dkhoury@cisco.com
   Dan Leshchiner      dleshc@tibco.com
   Todd Montgomery     tmont@gcast.com
   Gerard Newman       gkn@network-alchemy.com
   Dave Oran           oran@cisco.com
   Denny Page          denny@tibco.com
   Ken Pillay          ken@cisco.com
   Chetan Rai          crai@cs.stanford.edu
   Yakov Rekhter       yakov@cisco.com
   Luigi Rizzo         luigi@iet.unipi.it
   Dave Rossetti       rossetti@cisco.com
   Paul Stirpe         paul.stirpe@reuters.com
   Lorenzo Vicisano    l.vicisano@cs.ucl.ac.uk
   Brian Whetten       whetten@gcast.com
   Kyle York           kyork@cisco.com

References

[1] B. Whetten, T. Montgomery, S. Kaplan, "A High Performance Totally
Ordered Multicast Protocol", in "Theory and Practice in Distributed
Systems", Springer Verlag LNCS 938, 1994

[2] S. Floyd, V. Jacobson, C. Liu, S. McCanne, L. Zhang, "A Reliable
Multicast Framework for Light-weight Sessions and Application Level
Framing", ACM Transactions on Networking, November 1996

[3] J. C. Lin, S. Paul, "RMTP: A Reliable Multicast Transport Protocol",
ACM SIGCOMM August 1996

[4] K. Miller, K. Robertson, A. Tweedly, M. White, "Multicast File
Transfer Protocol (MFTP) Specification", INTERNET DRAFT
draft-miller-mftp-spec-02, January 1997

[5] S. Deering, "Host Extensions for IP Multicasting", INTERNET RFC1112,
STD 5, August 1989

[6] D.
Katz, "IP Router Alert Option", INTERNET DRAFT
draft-katz-router-alert-04, January 1997

[7] C. Partridge, "Gigabit Networking", Addison Wesley 1994

[8] H. W. Holbrook, S. K. Singhal, D. R. Cheriton, "Log-Based
Receiver-Reliable Multicast for Distributed Interactive Simulation",
ACM SIGCOMM 1995

[9] R. Rivest, "The MD5 Message-Digest Algorithm", INTERNET RFC1321,
INFORMATIONAL, April 1992

[10] J. Reynolds, J. Postel, "Assigned Numbers", INTERNET RFC1700, STD
2, October 1994

[11] J. Nonnenmacher, E. Biersack, D. Towsley, "Parity-Based Loss
Recovery for Reliable Multicast Transmission", ACM SIGCOMM September
1997

[12] L. Rizzo, "Effective Erasure Codes for Reliable Computer
Communication Protocols", Computer Communication Review, April 1997

[13] V. Jacobson, "Congestion Avoidance and Control", ACM SIGCOMM August
1988

Authors' Addresses

   Tony Speakman
   speakman@cisco.com

   Dino Farinacci
   dino@cisco.com

   Steven Lin
   slin@cisco.com

   Alex Tweedly
   agt@cisco.com

   Cisco Systems, Inc.
   170 West Tasman Drive,
   San Jose, CA 95134