idnits 2.17.1

draft-nielsen-tsvwg-sctp-tlr-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC4960]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1915 has weird spacing: '... xx xx...'

  -- The document date (October 19, 2015) is 3083 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC5062' is defined on line 1523, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  ** Obsolete normative reference: RFC 7053 (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-tsvwg-sctp-ndata-04

  == Outdated reference: A later version (-27) exists of
     draft-tuexen-tsvwg-sctp-multipath-10

  == Outdated reference: A later version (-10) exists of
     draft-ietf-tcpm-rtorestart-08

  == Outdated reference: A later version (-16) exists of
     draft-ietf-tsvwg-sctp-failover-13

     Summary: 3 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

Network Working Group                                         K. Nielsen
Internet-Draft                                              R. De Santis
Intended status: Experimental                                   Ericsson
Expires: April 21, 2016                                     A. Brunstrom
                                                     Karlstad University
                                                               M. Tuexen
                                         Muenster Univ. of Appl. Science
                                                              R. Stewart
                                                           Netflix, Inc.
                                                        October 19, 2015

                  SCTP Tail Loss Recovery Enhancements
                  draft-nielsen-tsvwg-sctp-tlr-02.txt

Abstract

   Loss Recovery by means of T3-Retransmission has a significant
   detrimental impact on the delays experienced through an SCTP
   association. The throughput achievable over an SCTP association is
   also negatively impacted by the occurrence of T3-Retransmissions. The
   present SCTP Fast Recovery algorithms as specified by [RFC4960] are
   not able to adequately or timely recover losses in certain
   situations, thus resorting to loss recovery by lengthy
   T3-Retransmissions or by non-timely activation of Fast Recovery. In
   this document we specify a number of enhancements to the SCTP Loss
   Recovery algorithms which amend some of these deficiencies, with a
   particular focus on Loss Recovery for drops in Traffic Tails. The
   enhancements supplement the existing algorithms of [RFC4960] with
   proactive probing and timer driven activation of the Fast
   Retransmission algorithm, together with a number of enhancements of
   the Fast Retransmission algorithm itself.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 21, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
     1.1. The SCTP TLR Function . . . . . . . . . . . . . . . . . . 4
       1.1.1. Dependencies . . . . . . . . . . . . . . . . . . . . 5
     1.2. Relation to other work . . . . . . . . . . . . . . . . . 5
       1.2.1. Early Retransmit and RTO Restart . . . . . . . . . . 5
       1.2.2. TCP applicability . . . . . . . . . . . . . . . . . . 6
       1.2.3. Packet Re-ordering . . . . . . . . . . . . . . . . . 6
       1.2.4. Congestion Control . . . . . . . . . . . . . . . . . 7
       1.2.5. CMT-SCTP Applicability . . . . . . . . . . . . . . . 7
   2. Conventions and Terminology . . . . . . . . . . . . . . . . . 8
   3. Description of Algorithms . . . . . . . . . . . . . . . . . . 9
     3.1. SCTP Scoreboard and miss indication Counting Enhancement 9
       3.1.1. Multi-Path Considerations . . . . . . . . . . . . . . 11
     3.2. RFC6675 nextseg() Tail Loss Enhancements for SCTP FR . . 11
       3.2.1. Multi-Path Considerations . . . . . . . . . . . . . . 14
     3.3. SCTP-TLR Description . . . . . . . . . . . . . . . . . . 15
       3.3.1. Principles . . . . . . . . . . . .
. . . . . . . . . 15
       3.3.2. SCTP - TLR Statemachine . . . . . . . . . . . . . . . 19
       3.3.3. TLPP Transmission Rules . . . . . . . . . . . . . . . 24
       3.3.4. Masking of TLPP Recovered Losses . . . . . . . . . . 28
       3.3.5. Elimination of unnecessary DELAY-ACK delays . . . . . 30
   4. Confirmation of support for Immediate SACK . . . . . . . . . 31
   5. Socket API Considerations . . . . . . . . . . . . . . . . . . 31
   6. Security Considerations . . . . . . . . . . . . . . . . . . . 31
   7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 32
   8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
   9. Discussion and Evaluation of function . . . . . . . . . . . . 32
   10. References . . . . . . . . . . . . . . . . . . . . . . . . . 32
     10.1. Normative References . . . . . . . . . . . . . . . . . . 32
     10.2. Informative References . . . . . . . . . . . . . . . . . 33
   Appendix A. Unambiguous SACK . . . . . . . . . . . . . . . . . 35
     A.1. TSN Retransmission ID in Data Chunk Header . . . . . . . 35
       A.1.1. Sender side behaviour . . . . . . . . . . . . . . . . 36
       A.1.2. Receiver side behaviour . . . . . . . . . . . . . . . 36
     A.2. Unambiguous SACK Chunk . . . . . . . . . . . . . . . . . 36
       A.2.1. Receiver side behaviour . . . . . . . . . . . . . . . 40
     A.3. Unambiguous SACK return . . . . . . . . . . . . . . . . . 40
     A.4. Negotiation . . . . . . . . . . . . . . . . . . . . . . . 41
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41

1. Introduction

   Loss Recovery by means of T3-Retransmission has a significant impact
   on the delays experienced through, as well as the throughput
   achievable over, an SCTP association. Loss Recovery by Fast
   Retransmission is in many situations superior to T3-Retransmission
   from both a latency and a throughput perspective.

   The present SCTP Fast Retransmission algorithm, as specified by
   [RFC4960], is driven solely by the miss indication count, derived
   from returned SACKs, reaching DupThresh. As such it is not able to
   adequately or timely recover losses in traffic tails where a
   sufficient number of such SACKs may not be generated, thus resorting
   to loss recovery by T3-Retransmissions or by non-timely activation
   of Fast Recovery. Non-timely activation here refers to the situation
   where activation of Fast Recovery for packets lost within one data
   burst needs to await arrival of SACKs from a subsequent data burst.

   By drop in traffic tails (or tail drops) we refer generally and
   specifically to the following situations:

   1. Drops of the last SCTP packets of an SCTP association or, more
      generally, drops of packets at the end of an SCTP association
      which are not followed by more than DupThresh number of packets
      which are not dropped.

   2. Drops among packets sent at the end of bursts spaced by pauses
      of time equal to or greater than the T3-timeout (approximately).
      It is noted that such bursts (pauses in between bursts) may
      result from application limitations, from congestion control
      limitations or from receiver side limitations.

   3. Drops among packets sent so sparsely that each dropped packet
      constitutes a tail drop in that DupThresh number of packets would
      not be sent (would not be available for sending) prior to expiry
      of the T3-timeout.

   It should be noted that while the above traffic drop criteria
   describe drops among the forward data packets only, drops among
   forward data packets combined with drops of the returned SACKs may
   together result in an insufficient number of SACKs being returned to
   the traffic sender for the Fast Retransmission algorithm to be
   activated before the T3-timeout occurs.
   The tail traffic situations for which SCTP
   Fast Retransmission is not able to recover the losses are thus in
   general broader than the exact situations listed above. The
   improvements specified include enhancement of SCTP to deduce the miss
   indication counts from enhanced scoreboard information, thus removing
   some of the vulnerability of the present SCTP miss indication
   counting to loss of SACKs.

1.1. The SCTP TLR Function

   The function proposed for enhancement of the SCTP Loss Recovery
   operation for Traffic Tail Losses is divided in two parts:

   o  Enhancements of the SCTP Fast Retransmission (SCTP FR) algorithm
      by means of the following Tail Loss Recovery improving functions
      inspired by or specified by [RFC6675] for TCP:

      *  Miss indication counting for a missing (non-SACK'ed) TSN will
         be based on augmented scoreboard information such that the miss
         indications will be based not on the number of returned SACKs
         but on the number of SACK'ed SCTP packets carrying data chunks
         of higher TSNs. The mechanism is specified both in terms of
         packets, the book-keeping of which requires new logic, and in
         terms of a less implementation demanding byte based variant
         following the Islost() approach of [RFC6675]. We shall refer
         to this improvement as Extended miss indication Counting.

      *  Fast Recovery operation is extended to include the "last
         resort" retransmission, Nextseg 3) and Nextseg 4), operations
         of [RFC6675], thus supporting conditional proactive fast
         retransmissions of missing, but not yet classified as lost,
         TSNs within the Fast Recovery Exit Point.

   o  New SCTP Tail Loss Recovery State machine with proactive timer
      driven activation of (the enhanced) Fast Recovery operation.
      Timer driven activation of Fast Recovery is initiated for
      outstanding data whenever a certain time, shorter than the T3
      timeout, has elapsed from the transmittal of the lowest
      outstanding TSN and network responsiveness, in form of SACKs of
      packets ahead of the TSN, has been proven since the transmittal of
      the lowest outstanding TSN. The SCTP TLR mechanism implements a
      new timer, the Tail Loss Probe timer (PTO), and it works in part
      by:

      *  Forced activation of Fast Recovery when network responsiveness
         has been proven, and the PTO timer has expired, since
         transmittal of the lowest outstanding TSN, but additional
         traffic sent (SACKs of TSNs ahead of the TSN) has not served to
         activate Fast Recovery based on the Extended miss indication
         Counting.

      *  Probing for network responsiveness, by transmittal of a TLR
         probe packet, when no network responsiveness information (no
         SACKs have been received for any packets ahead of line of the
         TSN) is available at expiration of the PTO timer relative to
         the lowest outstanding TSN.

      *  Activation of T3-retransmission Loss Recovery only when the
         network remains unresponsive (no SACKs are received) also after
         transmittal, and subsequent timeout, of a TLR probe packet.

1.1.1. Dependencies

   The SCTP TLR procedures proposed apply as add-on supplements to any
   SCTP implementation based on [RFC4960]. The SCTP TLR procedures in
   their core are sender-side only and do not impact the SCTP receiver.

   Exploitation of the SCTP immediate SACK feature, [RFC7053], and usage
   of the new (to be defined) Unambiguous Selective Acknowledgement
   feature of SCTP require support of these SCTP extensions in both
   sender and receiver.

1.2. Relation to other work

1.2.1.
Early Retransmit and RTO Restart

   It is noted that the Early Retransmit algorithm, [RFC5827], addresses
   activation of Fast Recovery for a particular subset of the tail drop
   situations targeted by the SCTP TLR function. The solution proposed
   embeds (as a special case) the Early Retransmit algorithm in the
   delayed variant experimented with for TCP in [DUKKIPATI02], in which
   Early Retransmission is only activated provided a certain time has
   elapsed since the lowest outstanding TSN was transmitted. The delay
   adds robustness towards spurious retransmissions caused by "mild"
   packet re-ordering as documented for TCP in [DUKKIPATI02].

   It is further noted that, depending on the exact situation (e.g.,
   drop pattern, congestion window and amount of data in flight),
   T3-retransmission procedures need not be inferior to Fast
   Retransmission procedures. Rather, in some situations
   T3-retransmission will indeed be superior as T3-retransmissions allow
   for ramp up of the congestion window during the recovery process.

   The changes proposed in this document focus on improving the Loss
   Recovery operation of SCTP by enforcing timely activation of
   (improved) Fast Retransmission algorithms. With the purpose of
   reducing the latency of the TCP and SCTP Loss Recovery operation,
   [HURTIG] has taken the alternative approach of accelerating the
   activation of T3-retransmission processes when Fast Recovery is not
   able to kick in to recover the loss. [HURTIG] only addresses a
   subset of the Tail loss scenarios in scope of the work presented
   here. The ideas of [HURTIG] for accurate RTO restart are drawn on in
   the solution proposed here for accurate restart of the new tail loss
   probe timer (PTO-timer) as well as for accurate setting of the
   T3-timer under certain conditions, thus harvesting some of the same
   latency optimizations as [HURTIG].
   The same approach has recently been
   exploited for TCP by the invention of the TLPR function by the
   authors of [Rajiullah].

1.2.2. TCP applicability

   SCTP Loss Recovery operation in its core is based on the design of
   Loss Recovery for TCP with SACK enabled. The enhancements of SCTP
   Tail Loss Recovery proposed here are applicable for TCP.

   Note: The - to be determined - exploitation of the SCTP immediate
   SACK feature, [RFC7053], and the - to be determined - usage of the
   new unambiguous selective acknowledgement feature of SCTP may not be
   readily applicable to TCP at present. ISSUE: Need to follow up on
   [zimmermann02], [zimmermann03].

   It is noted that while the SCTP TLR algorithms and SCTP TLR state
   machine defined are inspired by the timer driven tail loss probe
   approach specified in [DUKKIPATI01] for TCP, the solution
   defined here differs in the approach taken. The approach here is a
   clean slate approach defining a new comprehensive SCTP TLR state
   machine as an add-on to the (at least conceptually) existing Fast
   Recovery and T3-Retransmission state machines of SCTP. Thereby
   the SCTP TLR algorithm is able to address all tail loss patterns,
   whereas the approach of [DUKKIPATI01] relies on a number of
   experimental mechanisms ([DUKKIPATI02], [MATHIS], [RFC5827]) defined
   for TCP in IETF or in Research, with ad hoc extension to support
   selected tail loss patterns by addition of the tail loss probe
   mechanism and the therefrom driven activation of the mechanisms.

1.2.3. Packet Re-ordering

   The solution proposed is an enhancement of the existing miss
   indication counting based Fast Recovery operation of SCTP, [RFC4960],
   and as such the solution inherits the fundamental vulnerability to
   packet re-ordering that the SCTP Fast Retransmission algorithm of
   [RFC4960] embeds.

   For deployment of SCTP in environments where the Fast Retransmission
   algorithm of [RFC4960] gives rise to spurious entering of Fast
   Recovery, it would be relevant to look into remedies which may detect
   such events and undo their effects, possibly following the approaches
   taken for TCP (and SCTP) in this area.

   OPEN ISSUE: In severe packet re-ordering situations where the second
   of two subsequently sent packets outraces the first packet in
   arrival by more than the PTO time, this may trigger the SCTP TLR
   function to enter spurious Fast Recovery. It is conjectured that
   this situation does not significantly increase the vulnerability of
   Loss Recovery to packet re-ordering. To be determined and evaluated.

1.2.4. Congestion Control

   Because the solution by its very nature prompts activation of Fast
   Recovery instead of T3-Retransmission Recovery, the benefit of the
   solution proposed versus the existing solution of [RFC4960] will
   depend on the CC operation not only during the recovery process but
   also after exit of the recovery process. In this context it is noted
   that the prior approach taken for TCP, [DUKKIPATI01], has been
   documented for a TCP implementation running CUBIC, e.g., see
   [zimmermann01], whereas SCTP runs a CC algorithm more similar to TCP
   Reno CC as defined by [RFC5681].

   The solution at present is defined within the constraints of the
   existing Congestion Control principles of SCTP as defined by
   [RFC4960]. It is anticipated that Congestion Control improvements are
   desirable for SCTP in general as well as for the functions defined
   here in particular.

1.2.5. CMT-SCTP Applicability

   The SCTP TLR specification in this document applies to an SCTP
   implementation following the [RFC4960] principles of using one shared
   SACK clock spanning the data transfer over multiple paths.
   It is
   noted that, by maintaining the common SACK clock principles of
   [RFC4960], the SCTP TLR mechanism specified here
   retains some of the vulnerabilities of [RFC4960] to spurious (or
   delayed) entering of Fast Recovery operation caused by path changes
   in inhomogeneous environments (change of data transfer among paths of
   significantly different RTTs). This choice is motivated by the fact
   that concurrent data transfer on multiple paths is the
   exception case in [RFC4960] MH SCTP and remains the exception also
   with the enhancements of [RFC4960] specified here.

   It is envisaged that the SCTP TLR mechanism specified is readily
   applicable also to an SCTP implementation supporting concurrent multi
   path transfer in line with the specification of [CMT-SCTP]. It is
   emphasized, though, that SCTP-TLR, when applied to [CMT-SCTP], needs
   some adjustments as it should be applied in a split manner following
   the principles of SFR of [CMT-SCTP].

2. Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   For the purposes of defining the SCTP TLR function, we use the
   following terms and concepts:

      "DupThresh": The number of miss indication counts on an
      outstanding TSN at which SCTP declares the TSN as
      lost and enters Fast Recovery for the TSN if not in Fast Recovery
      already.

      "Flight size": At any given time we define the "Flight size" to be
      the number of bytes that an SCTP sender considers to be in flight
      in the network from the sender to the receiver. It is noted that
      the bytes of a message which is considered lost and which has not
      been retransmitted are not contained in the Flight size.
      Further
      it is noted that the bytes of a message which has been
      retransmitted (once) will count either once or twice in the Flight
      size depending on whether SCTP considers the first transmission of
      the message as having been lost (dropped) in the network.

      "Outstanding TSN": A TSN (and the associated DATA chunk) that has
      been sent by the SCTP sender for which it has not yet received an
      acknowledgement and which the SCTP sender has not abandoned (e.g.,
      abandoned as a result of [RFC3758]).

      "highTSN": The highest outstanding TSN at this point in time.

      "lowTSN": The lowest outstanding TSN at this point in time.

      "Scoreboard": An SCTP sender needs to maintain a data structure to
      store various information on a per outstanding TSN basis. This
      includes the selective acknowledgment information, miss indication
      counts, byte counts and other information defined in [RFC4960], in
      this document and in other SCTP specifications. This data
      structure we refer to as "scoreboard". The specifics of the
      scoreboard data structure are out of scope for this document (as
      long as the implementation can perform all functions required by
      this specification).

3. Description of Algorithms

3.1. SCTP Scoreboard and miss indication Counting Enhancement

   Entering of Fast Recovery in SCTP, as specified by [RFC4960], is
   driven by miss indication counts. When a TSN has received
   DupThresh=3 miss indication counts, the TSN is declared lost and will
   be eligible for fast retransmission via the Fast Recovery procedure.

   Miss indication counts are in RFC4960 SCTP driven entirely by receipt
   of SACKs in accordance with the Highest TSN Newly Acknowledged
   algorithm (section 7.2.4 of [RFC4960]):

      Highest TSN Newly Acknowledged (HTNA): For each incoming SACK,
      miss indications are incremented only for missing TSNs prior to
      the highest TSN newly acknowledged in the SACK.
      A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK.

   An evident issue with the HTNA algorithm is that it is vulnerable to
   loss of SACKs. In many situations loss of SACKs will result only in
   a slightly delayed entering of Fast Recovery for a dropped TSN, but
   generally, when relying on the HTNA algorithm only, loss of SACKs
   will further broaden the traffic tail situations where Fast Recovery
   is either not activated in a timely manner or not activated at all
   because an insufficient number of SACKs is received.

   In order to make SCTP Fast Recovery more robust towards drop of
   SACKs, the following extension of the HTNA algorithm SHOULD be
   supported by an SCTP implementation:

      Newly Acked Packets ahead-of-line (NAPahol): For each incoming
      SACK, miss indications are incremented only for missing TSNs prior
      to the highest TSN newly acknowledged in the SACK. A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK. For each missing TSN thus potentially eligible for
      additional miss indication counts, the number of miss indications
      to be given shall follow the number of newly acknowledged packets
      ahead of line of the packet of the missing TSN.

   The solution is robust towards split SACK. The solution requires
   the SCTP implementation to keep track of the relationship between
   data chunks (TSN numbers) and packets. One solution is for the SCTP
   implementation to maintain a packet id as a monotonically
   incrementing packet sequence number to map chunks to packets and for
   each outstanding chunk to keep state of the packet id that the chunk
   was sent in as well as (incrementally updated) the packet ids of up
   to DupThresh-1 (=2) packets ahead of line for which chunks have been
   SACKed.
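
   The packet id book-keeping described above can be sketched as
   follows. This is a minimal, non-normative illustration in Python and
   not part of the specification: the Scoreboard class and its method
   names are hypothetical, TSN wrap-around and retransmissions are
   ignored, and a SACK is modeled simply as the list of TSNs it
   acknowledges.

```python
DUP_THRESH = 3  # DupThresh as used by RFC 4960 Fast Retransmission


class Scoreboard:
    """Hypothetical sender-side scoreboard doing NAPahol-style counting."""

    def __init__(self):
        self.next_packet_id = 0
        self.packet_of = {}  # outstanding TSN -> id of the packet it was sent in
        self.acked = set()   # TSNs acknowledged by some earlier SACK
        self.ahead = {}      # TSN -> ids of SACKed packets ahead of line of it

    def send_packet(self, tsns):
        """Record that the given TSNs were bundled into one new packet."""
        pid = self.next_packet_id
        self.next_packet_id += 1
        for tsn in tsns:
            self.packet_of[tsn] = pid
            self.ahead[tsn] = set()
        return pid

    def miss_count(self, tsn):
        """Miss indications = number of distinct SACKed packets ahead of line."""
        return len(self.ahead[tsn])

    def on_sack(self, sacked_tsns):
        """Process one SACK; return the missing TSNs that reached DupThresh."""
        newly = [t for t in sacked_tsns
                 if t in self.packet_of and t not in self.acked]
        if not newly:
            return []
        self.acked.update(newly)
        htna = max(newly)  # highest TSN newly acknowledged in this SACK
        for missing in self.packet_of:
            if missing in self.acked or missing >= htna:
                continue
            for t in newly:
                pid = self.packet_of[t]
                # Count each packet ahead of line once, even when its chunks
                # are acknowledged by different (split) SACKs.
                if (pid > self.packet_of[missing]
                        and len(self.ahead[missing]) < DUP_THRESH):
                    self.ahead[missing].add(pid)
        return sorted(t for t in self.packet_of
                      if t not in self.acked
                      and self.miss_count(t) >= DUP_THRESH)
```

   In this sketch a packet whose chunks are acknowledged across two
   SACKs contributes one miss indication only, and a SACK that newly
   acknowledges chunks from several packets (for example because an
   earlier SACK was lost) contributes one miss indication per packet.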

   For accurate PTO-timer management, using the restart principles of
   [HURTIG] and [Rajiullah], see Section 3.3, an SCTP TLR implementation
   is required to keep track of the time at which packets/TSNs are
   transmitted (or strictly speaking to be able to deduce the time since
   a packet/a TSN was last transmitted). An implementation may exploit
   timestamps for the generation of (part of) the packet id as well as
   for the mentioned time management, thereby limiting the additional
   overhead required for the packet id storage.

   As an alternative to the above accurate packet counting, an SCTP
   implementation MAY, to reduce implementation complexity, instead
   support the following byte counting based extension of the RFC4960
   HTNA algorithm:

      Highest Bytes Newly Acknowledged (HBNA): For each incoming SACK,
      miss indications are incremented only for missing TSNs prior to
      the highest TSN newly acknowledged in the SACK. A newly
      acknowledged DATA chunk is one not previously acknowledged in a
      SACK. For each missing TSN thus eligible for additional miss
      indication counts, the number of miss indications to be given
      shall follow the number of newly acknowledged bytes in the SACK
      ahead of line of the missing TSN in the following manner:

         Add-miss-indication-count(TSN) =
            Ceiling((Newly acknowledged bytes ahead of line(TSN))/PMTU)

   The HBNA approach as specified above is vulnerable to split of SACK.
   An implementation choice which is robust to split of SACK is to
   recalculate the total amount of selectively acknowledged bytes ahead
   of line of an outstanding TSN and update the miss indication count of
   the TSN as Ceiling((Selectively Acked bytes ahead of line
   (TSN))/PMTU). This more robust implementation choice however demands
   either maintenance of additional state per TSN, namely the
   Selectively Acked bytes ahead of line (TSN), or extensive repeated
   computations.
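
   The difference between the two byte based variants can be
   illustrated with a small, non-normative Python sketch. The function
   names are hypothetical, and the input is assumed to be the list of
   newly acknowledged bytes ahead of line of one missing TSN, one entry
   per received SACK.

```python
def ceil_div(num_bytes, pmtu):
    """Integer form of Ceiling(num_bytes / pmtu)."""
    return -(-num_bytes // pmtu)


def hbna_per_sack(newly_acked_bytes_per_sack, pmtu):
    """Basic HBNA: apply Ceiling() to each SACK and add up the increments."""
    return sum(ceil_div(b, pmtu) for b in newly_acked_bytes_per_sack)


def hbna_cumulative(newly_acked_bytes_per_sack, pmtu):
    """Split-robust variant: apply Ceiling() to the total SACKed bytes."""
    return ceil_div(sum(newly_acked_bytes_per_sack), pmtu)
```

   With a PMTU of 1400 bytes, one full packet ahead of line that is
   acknowledged by two SACKs of 700 bytes each yields two miss
   indications with hbna_per_sack() but one with hbna_cumulative(),
   which is the over-counting that a split SACK can cause.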
   The risk of split SACK may not be weighty enough to warrant
   such implementation complexity.

   The HBNA approach follows the approach taken for TCP, Islost(), in
   [RFC6675]. It is noted, however, that due to the message based
   approach of SCTP, a byte based approach generally will be less
   accurate as a measure for the number of packets received ahead of
   line than it is for byte stream based TCP.

3.1.1. Multi-Path Considerations

   In multi-homed [RFC4960] SCTP, data that potentially will be subject
   to fast retransmission may be in flight on multiple paths. This
   (exception) situation can occur as a result of a change of the data
   transfer path, which may come about, e.g., as a result of a
   switchback operation performed autonomously by SCTP or as a result of
   a management operation setting a new primary path. The situation can
   also occur as a result of destination directed data transfer where
   the destination address specified is different from the present data
   transfer path destination. In an [RFC4960] SCTP implementation,
   SACKs of data sent on one path will increase the miss indication
   counts of data with lower TSN in flight on a different path. As such,
   SACKs of data sent on one path may actually result in generation of
   (potentially spurious) loss event reactions on a different path.
   This fundamental aspect of [RFC4960] miss indication counting is not
   changed in this document. That is, it is not intended for the
   miss indication counting improvements defined above, i.e., the
   NAPahol and the HBNA mechanisms, to discriminate among the paths on
   which the SACK'ed data contributing to the miss indication counting
   has been sent.

3.2. RFC6675 nextseg() Tail Loss Enhancements for SCTP FR

   The Fast Retransmission algorithm for TCP as specified in [RFC6675]
   implements some differences compared to the Fast Retransmission
   algorithm specified for SCTP by [RFC4960].
   Of particular
   significance for recovery of losses in traffic tail scenarios is the
   fact that the [RFC6675] algorithm, once Fast Recovery has been
   activated, takes two "last resort" retransmission measures, step 3)
   and step 4) of Nextseg() of [RFC6675]. These measures facilitate the
   recovery of losses in situations where only an insufficient number of
   SACKs would be able to be generated to complete the Fast Recovery
   process without resorting to T3-timeout. For SCTP Fast Recovery we
   formulate the equivalent measures as follows:

      Last Resort Retransmission: If the following conditions are met:

      *  there are no outstanding TSNs eligible for fast retransmission
         due to DupThresh or more miss indications

      *  there is no new data available for transmission

      then an outstanding TSN less than or equal to the Fast Recovery
      Exit Point, for which there exist SACKs of chunks ahead of line
      of the TSN, may be retransmitted provided the CWND allows. The
      bytes of a TSN which is retransmitted in this manner are not
      subtracted from the Flight size prior to this action being taken
      nor as a result of this action. If the miss indication count of
      the TSN subsequently reaches the DupThresh value, the bytes of the
      TSN shall be subtracted from the Flight size. Once acknowledged
      the remaining contribution of this TSN in the Flight size (whether
      it be there counted once or twice at this point in time) is
      subtracted. A TSN which is retransmitted in this manner will be
      marked as ineligible for a subsequent fast retransmit (see
      considerations on Multiple Fast Retransmission operation in
      Section 3.3.1.3).

      An SCTP implementation which implements the Unambiguous SACK
      feature of Appendix A may implement a more accurate calculation of
      the flightsize when doing Last Resort Retransmission.
      That is,
      instead of subtracting the contribution from the retransmitted TSN
      from the flightsize once the acknowledgement of the TSN arrives,
      the SCTP implementation may distinguish whether the acknowledgment
      is for the original TSN or for the retransmitted TSN and, in case
      the acknowledgement is not for the retransmitted TSN, SCTP should
      delay the subtraction of the bytes of the retransmitted TSN from
      the flightsize until either an acknowledgement of the
      retransmitted TSN is received (see Appendix A) or until
      PTO2-T_latest(TSN) time has elapsed (see Section 3.3.1).

      Rescue: If all of the following conditions are met:

      *  there are no outstanding TSNs eligible for fast retransmission
         due to DupThresh or more miss indications

      *  there is no new data available for transmission and no data is
         outstanding on the association beyond the Fast Recovery Exit
         Point

      *  there are no outstanding TSNs eligible for Last Resort
         Retransmission

      *  the cumack has progressed since this entering of Fast Recovery

      and there exist non-SACKed, non fast retransmitted TSNs within
      the Fast Recovery Exit Point, then for this entry of Fast
      Recovery, provided the CWND allows, we allow for fast
      retransmission of one packet of consecutive outstanding non fast
      retransmitted TSNs up to PMTU size, the highest TSN of which MUST
      be the highest outstanding TSN within the Fast Recovery Exit
      Point. The bytes of a TSN which is retransmitted in this manner
      are not subtracted from the Flight size prior to this action being
      taken nor as a result of this action. If the miss indication count
      of the TSN subsequently reaches the DupThresh value, the bytes of
      the TSN shall be subtracted from the Flight size. Once
      acknowledged the remaining contribution of this TSN in the Flight
      size (whether it be there counted once or twice at this point in
      time) is subtracted.
      A TSN which is retransmitted in this manner will be marked as
      ineligible for a subsequent fast retransmit (see considerations
      on Multiple Fast Retransmission operation in Section 3.3.1.3).

   An implementation of the Rescue operation may be accomplished by
   maintaining a RescueRTX parameter as described for TCP in
   [RFC6675].

   An SCTP implementation which implements the Unambiguous SACK feature
   of Appendix A may implement a more accurate calculation of the
   flightsize when performing the Rescue operation. That is, instead
   of subtracting the contribution of the retransmitted TSN from the
   flightsize once the acknowledgement of the TSN arrives, the SCTP
   implementation may distinguish whether the acknowledgement is for
   the original TSN or for the retransmitted TSN. In case the
   acknowledgement is not for the retransmitted TSN, SCTP should delay
   the subtraction of the bytes of the retransmitted TSN from the
   flightsize until either an acknowledgement of the retransmitted TSN
   is received (see Appendix A) or until PTO2-T_latest(TSN) time has
   elapsed (see Section 3.3.1).

   DISCUSSION: In addition to the HTNA algorithm, [RFC4960] demands
   that additional miss indication counting be performed during Fast
   Recovery according to the following prescription (Section 7.2.4 of
   [RFC4960]):

   (#) If an endpoint is in Fast Recovery and a SACK arrives that
   advances the Cumulative TSN Ack Point, the miss indications are
   incremented for all TSNs reported missing in the SACK.

   It is noted that under special circumstances (#) makes SCTP Fast
   Recovery complete in situations where TCP Fast Recovery would only
   complete by virtue of measure 3) or 4) of [RFC6675]. As such,
   these measures are more critical for the TCP Fast Recovery
   operation than for the SCTP Fast Recovery operation.
   However, as documented by (OPEN ISSUE: to be filled in), the Last
   Resort Retransmission operation and the Rescue operation
   significantly improve the Loss Recovery operation for SCTP as well,
   both in the latency of the individual loss recovery operation and
   in the ability of the operation to complete without resorting to
   T3-timeout. Consequently this document prescribes that SCTP TLR
   implement these procedures. Conversely, even when measures 3) and
   4) of [RFC6675] are implemented, (#) gives benefits in terms of
   releasing flight size space, allowing Fast Recovery to progress.

   As the algorithm extension is limited by the existing congestion
   control algorithm of SCTP, these extensions of SCTP Fast Recovery do
   not compromise the TCP fairness of the SCTP Fast Recovery operation.

3.2.1. Multi-Path Considerations

   In multi-homed [RFC4960] SCTP, data that potentially will be subject
   to Fast Retransmission may be in flight on multiple paths. This
   (exceptional) situation can occur in particular when the data
   transfer path changes as a result of a switchback operation to a
   primary path. Here SACKs of data sent on one path (e.g., the new
   data transfer path) may result in generation of (potentially
   spurious) loss event reactions on a different path (the prior data
   transfer path). The [RFC4960] miss indication counting based on a
   common SACK clock is not changed in this document; nevertheless the
   protocol operation, here the operation of the Last Resort
   Retransmission and the Rescue operation in this situation, needs to
   be specified.

   The specification in this document is based on the following
   fundamental goals:

   o  An [RFC4960] SCTP implementation must appropriately react to loss
      events observed by means of miss indication counting, by
      performing appropriate adjustments of CWND and ssthresh, on all
      paths where such loss events are observed.

   o  The observation of a loss event on one path should not, for
      [RFC4960] SCTP MH, impact the congestion control operation on a
      different path.

   For the implementation of the Last Resort Retransmission and the
   Rescue operations for [RFC4960] MH SCTP the following
   specifications are given:

   o  For a TSN to be eligible for Last Resort Retransmission a loss
      event MUST have been observed on the path on which this TSN is in
      flight.

   o  For a TSN to be eligible for the Rescue operation a loss event
      MUST have been observed on the path on which this TSN is in
      flight.

   The above may be implemented by maintaining a Fast Recovery state
   and a Fast Recovery Exit Point on a per path basis with the
   following particulars:

   o  A path enters the Fast Recovery state based on loss event
      observation of TSNs in flight on the path.

   o  When a loss event is observed on a path, the Fast Recovery Exit
      Point on the path is set to the highest TSN in flight on the
      path.

   o  Fast Retransmission of TSNs in flight on the path terminates once
      the Fast Recovery Exit Point on the path has been reached (i.e.,
      has been cumulatively SACK'ed), at which point the Fast Recovery
      process on the path is terminated.

   o  The eligibility of a TSN for the Last Resort Retransmission and
      the Rescue operation shall follow the prescriptions given above
      with adherence to the Fast Recovery Exit Point set on the path on
      which the TSN is in flight.

   The retransmission of data chunks in itself is prescribed to happen
   on the present data transfer path of the association regardless of
   which path the data chunks were in flight on when they became
   eligible for Fast Retransmission. This follows [RFC4960] and the
   preceding [CARO02].
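
   The per path bookkeeping described above can be illustrated by the
   following minimal Python sketch. It is not part of the
   specification: the class name, the method names and the
   representation of the data in flight as a set of TSNs are
   assumptions made for illustration only. TSN wrap-around (serial
   number arithmetic) is ignored, and the sketch assumes that a loss
   event observed while the path already is in Fast Recovery leaves
   the existing exit point unchanged.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PathRecovery:
    """Hypothetical per path Fast Recovery bookkeeping (illustration only)."""
    in_flight: Set[int] = field(default_factory=set)  # TSNs in flight on this path
    exit_point: Optional[int] = None                  # Fast Recovery Exit Point, if set

    @property
    def in_fast_recovery(self) -> bool:
        return self.exit_point is not None

    def on_loss_event(self) -> None:
        """A loss event was observed for a TSN in flight on this path."""
        if not self.in_fast_recovery and self.in_flight:
            # The exit point is the highest TSN in flight on the path.
            self.exit_point = max(self.in_flight)

    def on_cum_ack(self, cum_tsn: int) -> None:
        """The Cumulative TSN Ack Point advanced to cum_tsn."""
        self.in_flight = {t for t in self.in_flight if t > cum_tsn}
        if self.in_fast_recovery and cum_tsn >= self.exit_point:
            # The exit point has been cumulatively SACK'ed.
            self.exit_point = None

    def path_allows_last_resort_or_rescue(self, tsn: int) -> bool:
        """Per path precondition only; the remaining conditions of
        Section 3.2 still have to be checked by the caller."""
        return (self.in_fast_recovery
                and tsn in self.in_flight
                and tsn <= self.exit_point)
```

   With two such objects, a loss event on one path sets an exit point
   there while the other path stays outside Fast Recovery, which is
   the separation the goals above ask for.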

   With the above per path modelling of the Fast Recovery operation,
   SCTP may have multiple Fast Recovery Exit Points at any given time
   (though at most one per path) and the Fast Recovery operation may
   terminate at different times on the different paths. Further it is
   noted that a path may be in Fast Recovery even if no data is in
   flight on the path or even if the only data in flight on the path is
   beyond the Fast Recovery Exit Point of the path. The latter can
   occur in the very peculiar case where fast retransmission of data
   declared lost on the path happens on a different path while the
   user performs a destination address directed data transfer on the
   path in question.

   An SCTP implementation may also fulfil the goals described above by
   other means than by maintaining a per path Fast Recovery Exit
   Point. For example, it might maintain a common association Fast
   Recovery Exit Point spanning multiple paths, but the implementation
   must still ensure appropriate per destination address congestion
   control operation.

3.3. SCTP-TLR Description

3.3.1. Principles

   The SCTP TLR function is based on the following principles.

3.3.1.1. Retransmission Timers Management

   This document is specified as if there is a single retransmission
   timer per destination transport address, but implementations MAY
   have a retransmission timer for each DATA chunk.

   This document specifies usage of a new PTO timer for SCTP TLR. The
   document is specified as if the PTO timer functions are implemented
   by means of the existing retransmission timer of [RFC4960] SCTP,
   i.e., under certain conditions the retransmission timer is activated
   with special PTO values rather than with the standard T3-timer
   value. The document is specified as if there is a single PTO timer
   per destination transport address, equivalently a single PTO timer
   per path.
   Implementations MAY choose to implement a PTO timer per DATA chunk.

   For an outstanding TSN we define the time T_latest(TSN) to be the
   time that has elapsed since the TSN was last sent. When a TSN is
   first sent, or when it is retransmitted, T_latest(TSN)=0. An SCTP
   TLR implementation must be able to deduce this value for any
   outstanding TSN.

3.3.1.2. Timer driven entering of Fast Recovery

   Timer driven entering of Fast Recovery in SCTP TLR is based on the
   following principles:

   o  Maintenance of a Tail Loss Probe timer (PTO) which in certain
      situations (generally when retransmission is not performed) is
      running on a path. At any given time the value of the PTO timer
      is related to the lowest TSN in flight on the path. The PTO timer
      value used will depend on the situation:

         By default the following timer value is used:

            PTO1: PTO=MIN(RTO, 1.5*SRTT+MAX(RTTVAR, DELAY_ACK))

         Whereas the following value is used:

            PTO2: PTO=MIN(RTO, 1.5*SRTT+RTTVAR)

         when it is known that subsequent SACKs not acknowledging the
         TSN for which the PTO is running will be (or will have been)
         returned immediately. For more details see Section 3.3.2.

         By design the probe timer is kept lower than or equal to the
         RTO, thereby aiming to prevent a potentially unnecessary and
         damaging RTO, and generally larger than an anticipated RTT,
         thereby preventing it from expiring prematurely. I.e., the
         timer only expires at a time where one would have expected to
         have received a SACK of the lowest TSN in flight were there no
         problems.

         A minimal PTO value, PTO_MIN, is applied to the above formulas
         (particularly important for PTO2). I.e., the effective PTO1 =
         MAX(PTO_MIN, PTO1) and the effective PTO2 = MAX(PTO_MIN,
         PTO2). The suggested value of PTO_MIN is 10 msec. In the
         following, when referring to PTO1 and PTO2 we refer to the
         effective PTO1 and PTO2 values.

         For an SCTP implementation which performs RTT measurements
         during the association set-up, the PTO set on the path on
         which the first data chunk is sent shall be initialized from
         the RTT measured on the path during the association set-up.
         If no such RTT measurement is performed or is available on the
         particular path in question, the PTO shall be initialized as
         RTO_INIT.

   o  PTO timer driven transmittal of a Tail Loss Probe Packet: When
      data is outstanding on a path, the PTO timer of the path expires
      and no SACKs of any chunks with higher TSN have arrived, a probe
      packet, denoted a Tail Loss Probe Packet (TLPP), is sent to probe
      for network responsiveness (i.e., for SACK of the TLPP) in order
      to potentially drive proactive entering of Fast Recovery.

      *  For an SCTP sender that supports the Immediate SACK feature
         [RFC7053], the I-bit MUST be set on chunks sent in a TLPP.

   o  PTO timer driven entering of Fast Recovery: The process is
      enforced when network responsiveness is proven (a SACK of data
      sent later than the lowest TSN in flight on the path is
      available) and (at least) PTO time has elapsed since transmittal
      of this lowest TSN in flight on the path.

   Comment: The lowest outstanding TSN on an association may under
   special circumstances not be in flight on any path of the
   association. This can happen when the lowest outstanding TSN has
   been declared lost but the transmittal of the TSN is prevented due
   to congestion window limitations (e.g., during Fast Recovery). In
   this case, as well as generally for TSNs that are being
   retransmitted due to fast retransmission or T3-timeout, no PTO timer
   is running on the TSN. Conversely, when the lowest outstanding TSN
   on a path is not subject to Fast Recovery or T3-Recovery, this
   lowest outstanding TSN is also in flight on the path.

3.3.1.3.
Fast-Recovery and Loss Detection

   Fast Recovery and miss indication counting for the SCTP TLR function
   MUST embed the enhancements described in Section 3.2. In addition
   SCTP TLR implements the following loss detection during Fast
   Recovery:

   o  If in Fast Recovery, then an outstanding TSN in flight on the
      path, with TSN lower than the Fast Recovery Exit Point on the
      path, is declared lost when the following conditions are
      satisfied:

      *  The TSN has not been fast retransmitted.

      *  T_latest(TSN) > PTO2.

      *  The TSN is lower than the highest outstanding SACK'ed TSN.

   When declared lost by this procedure, the bytes of the TSN are
   subtracted from the flight size and the TSN becomes eligible for
   fast retransmission as if it had been declared lost by reaching
   DupThresh miss indication counts.

   Such loss detection during SCTP TLR Fast Recovery shall at a minimum
   be done at receipt of a SACK as well as at times when the
   possibility to transmit new data is being evaluated. An
   implementation maintaining PTO timers on a per data chunk basis may
   make further evaluation based on timer expiration.

   Following [RFC4960] it is assumed that a data chunk should only be
   fast retransmitted once. I.e., subsequent retransmissions of the
   data chunk must proceed as T3-retransmissions. An SCTP TLR
   implementation MAY implement Multiple Fast Retransmission operation
   following the principles described in [CARO01] extended to include
   the Last Resort Retransmission and Rescue operations. This,
   however, is not covered by the specification given here.

3.3.1.4. T3-Recovery

   [RFC4960] does not explicitly specify that a T3-Recovery phase be
   supported for SCTP, nor does [RFC4960] explicitly demand that a
   data chunk which has been T3-retransmitted cannot undergo fast
   retransmission.
   It can be an advantage that a lost T3-retransmitted data chunk may
   be recovered by timely fast retransmission rather than by a
   subsequent, potentially backed-off T3-retransmission. For
   [RFC4960] MH SCTP, however, reliable implementation of such fast
   recovery of lost T3-retransmitted data is difficult to achieve given
   the usage of one common SACK clock, as new data on one path may
   trigger spurious fast retransmission of data that has been/is being
   T3-retransmitted on a different path. Here it is important to
   emphasize that concurrent T3-retransmission and new data
   transmission on different paths is the standard operation of MH SCTP
   [RFC4960]. (Implementations might mitigate such effects by only
   sending new data after completion of the T3-retransmission
   operation, and the implementation of SCTP-PF [SCTP-PF] would further
   decrease the likelihood of such concurrent data transfer occurring.)

   In this document we assume that an SCTP implementation follows
   either of the following implementation choices:

   o  A data chunk which has undergone T3-retransmission cannot
      subsequently be subject to Fast Retransmission, whether such
      entering of Fast Recovery is driven by miss indication counting
      alone or by the SCTP TLR mechanism. This implementation choice
      corresponds to implementing a T3-Recovery phase for SCTP
      equivalent to the RTO-recovery phase of TCP.

   o  A data chunk which has undergone T3-retransmission will be
      eligible for subsequent Fast Retransmission if such is driven by
      miss indication counts from SACKs of new data chunks sent after
      all data outstanding for T3-retransmission have been sent and the
      new data is sent on the same path as the T3-retransmission data.

   One implementation choice may be to follow the first implementation
   choice for SCTP MH and the second implementation choice for SCTP SH.
   Regardless of this implementation choice, in SCTP TLR a data chunk
   that has been subject to T3-retransmission SHOULD NOT be subject to
   the timer driven entering of Fast Recovery specified below. The
   motivation for this choice is that the SRTT may not be appropriately
   refreshed during the T3-retransmission process. OPEN ISSUE/TO DO:
   Ideally the PTO timer used after the exit of the T3-recovery phase
   should be updated based on a fresh RTT measurement, e.g., from the
   last acknowledged TSN. If no new SRTT calculation is made based on
   a scheduled RTT measurement, then the PTO timer values could be
   adjusted, if necessary, by the last measured RTT: 1.5*SRTT + RTTVAR
   --> MAX(1.5*RTT, 1.5*SRTT + RTTVAR).

3.3.2. SCTP-TLR State Machine

   The SCTP Tail Loss Recovery function defines 3 states: the SCTP TLR
   OPEN state, the SCTP TLR PROBE WAIT state and the SCTP TLR DELAY
   WAIT state. At any given time the SCTP transmission logic for the
   lowest outstanding TSN on a path will be in one of these 3 states,
   or the TSN is being recovered by means of Fast Recovery or
   T3-Recovery.

   Figure 1 illustrates the states and the state transitions.

   (to be inserted)

   Figure 1: Enhanced Loss Recovery State Machine Diagram

   In the following we describe the states and the actions taken.

3.3.2.1. SCTP TLR OPEN STATE

   This is the state the SCTP transmission logic is in on any path when
   no TSN is outstanding on the association, and it is the state when
   SCTP sends the first data on a path after idle/no TSN outstanding.
   More generally it is also the state the transmission logic is in
   when there are no gaps in the SACK scoreboard beyond the lowest
   outstanding TSN on the path.
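
   The PTO timer armed in the states of this section takes one of the
   two values defined in Section 3.3.1.2. As a non-normative
   illustration, the effective PTO1 and PTO2 values can be computed as
   in the following Python sketch. The function and parameter names
   are assumptions made for this example, all times are in seconds,
   and the default of 0.010 reflects the suggested PTO_MIN of 10 msec.

```python
from typing import Optional


def effective_pto(rto: float, srtt: float, rttvar: float,
                  delay_ack: Optional[float] = None,
                  pto_min: float = 0.010) -> float:
    """Return the effective PTO in seconds.

    With delay_ack given, PTO1 is computed; with delay_ack=None, PTO2
    is computed (SACKs are known to be returned immediately).
    """
    if delay_ack is None:
        raw = min(rto, 1.5 * srtt + rttvar)                   # PTO2
    else:
        raw = min(rto, 1.5 * srtt + max(rttvar, delay_ack))   # PTO1
    # PTO_MIN is applied to both formulas.
    return max(pto_min, raw)


def remaining_pto(pto: float, t_latest: float) -> float:
    """Timer value PTO - T_latest(TSN) used when the timer is re-armed
    on a TSN that was sent t_latest seconds ago. A result <= 0 means
    that the PTO time has already elapsed for that TSN."""
    return pto - t_latest
```

   For example, with RTO = 1 s, SRTT = 100 ms, RTTVAR = 20 ms and a
   delayed acknowledgement time of 200 ms (an assumed value), PTO1
   evaluates to 350 ms and PTO2 to 170 ms.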

   In this state SCTP is performing neither Fast Recovery nor
   T3-Recovery on the lowest TSN outstanding on the path and no SACKs
   of any chunks with higher TSN have arrived. In this state, when
   SCTP has outstanding data on the path, a PTO timer is running
   relative to the lowest TSN outstanding on the path.

   The PTO set on a (new) lowest outstanding TSN on the path in this
   state will follow PTO1 when fewer than 2 packets are outstanding
   beyond the TSN at the time when the timer is set. It will follow
   PTO2 when 2 or more packets are outstanding beyond the TSN when the
   PTO timer is set, or when the Immediate SACK feature is known to be
   supported by both sender and receiver (see Section 4) and the I-bit
   has been set on the TSN or on an outstanding TSN of higher number.

   In the OPEN state the following may happen:

   o  A SACK cumulatively acknowledging the lowest outstanding TSN and
      resulting in no gaps in the SACK scoreboard may arrive. In this
      case the state remains in OPEN state. If there still is
      outstanding data on the path, the PTO timer is set on the new
      lowest outstanding TSN. The PTO timer value set will be the
      value PTO - T_latest(TSN) where the PTO value is calculated
      either from PTO1 or PTO2 according to the evaluation criteria
      given above.

   o  A SACK with gap(s) may arrive, thus proving network
      responsiveness while still not cumulatively acknowledging all
      lower (than the SACK'ed gap) outstanding TSNs on the path. The
      SACK may or may not move the cumulative ACK point. This
      indicates that either packets are being re-ordered or the (new)
      lowest outstanding TSN on the path has been lost.

      *  If the SACK makes the miss indication count on the (new)
         lowest outstanding TSN reach DupThresh, the SCTP TLR OPEN
         state is terminated and Fast Recovery is started.

      *  If the DupThresh miss indication count is not reached on the
         (new) lowest outstanding TSN, the state will now transit to
         SCTP TLR DELAY WAIT state for potential entering of SCTP TLR
         driven Fast Recovery if the PTO timer expires before the (new)
         lowest outstanding TSN has been acknowledged, or for potential
         later entering of Fast Recovery by reaching DupThresh miss
         indication counts. When transiting to SCTP TLR DELAY WAIT the
         PTO timer relative to the (new) lowest outstanding TSN is
         reset to PTO2 - T_latest(TSN). In case PTO2 - T_latest(TSN)
         <= 0, the DELAY WAIT state is immediately terminated, the
         packet containing the lowest outstanding TSN is declared lost,
         and Fast Recovery is started.

   o  The PTO timer relative to the lowest outstanding TSN may expire,
      in which case SCTP TLR will send a TLPP, reset the PTO timer
      relative to the lowest outstanding TSN to a T3 timer and transit
      to SCTP TLR PROBE WAIT state. There it awaits either the expiry
      of the T3 timer relative to the lowest outstanding TSN (network
      is persistently unresponsive) or proof of network responsiveness
      and potential entering of SCTP TLR driven Fast Recovery, unless
      the network responsiveness proof comes in the form of cumulative
      acknowledgement of the TSN. The T3-value set relative to the
      lowest outstanding TSN when sending the TLPP and entering this
      state shall be:

      *  MAX(PTO1, RTO - T_latest(TSN)), when receiver side support for
         Immediate SACK has not been confirmed for the association, see
         Section 4.

      *  MAX(PTO2, RTO - T_latest(TSN)), when receiver side support for
         Immediate SACK has been confirmed for the association, see
         Section 4, and the SCTP sender itself deploys the Immediate
         SACK feature.

   For further details on the TLPP transmission see Section 3.3.3.

3.3.2.2.
SCTP TLR PROBE WAIT STATE

   In this state the lowest outstanding TSN has remained unSACK'ed for
   more than PTO time and no indication of network responsiveness has
   been received (no SACK of higher outstanding TSNs has arrived), thus
   resulting in the transmittal of a TLPP to probe for network
   responsiveness.

   The T3-value set relative to the lowest outstanding TSN when sending
   the TLPP and entering this state is:

   o  MAX(PTO1, RTO - T_latest(TSN)), when receiver side support for
      Immediate SACK has not been confirmed for the association, see
      Section 4.

   o  MAX(PTO2, RTO - T_latest(TSN)), when receiver side support for
      Immediate SACK has been confirmed for the association, see
      Section 4, and the SCTP sender itself deploys the Immediate SACK
      feature.

   For further details on the TLPP transmission see Section 3.3.3.
   Observe that in some special cases no TLPP is sent even if this
   state is entered; conceptually this is handled as if a TLPP had
   been sent.

   In the PROBE WAIT state the following may happen:

   o  SACKs may arrive that make the miss indication count on the
      lowest outstanding TSN/lowest TSN in flight reach DupThresh, in
      which case the PROBE WAIT state is terminated and Fast Recovery
      is started.

   o  A SACK cumulatively acknowledging all holes including the lowest
      outstanding TSN may bring the SCTP TLR state machine back to SCTP
      TLR OPEN state. In this case a new PTO timer will be started on
      the new lowest outstanding TSN following the PTO timer setting in
      the SCTP TLR OPEN state. In this situation "PTO restart
      principles" (i.e., yielding PTO-T_latest(TSN)) shall not be
      deployed. Spurious entering of PROBE WAIT state can happen if
      the PTO is too short; in such a situation it would not be prudent
      to deploy PTO restart principles when returning to OPEN state.
      OPEN ISSUE: Possibly PTO restart principles shall be refrained
      from until new RTT measurements are available.

   o  A SACK may arrive for a higher outstanding TSN with the lowest
      outstanding TSN on the path remaining unSACK'ed. This will
      result in declaration of the packet of the lowest outstanding TSN
      as lost and will make SCTP enter Fast Recovery.

   o  A SACK may arrive that acknowledges the lowest outstanding TSN
      and also acknowledges data of higher TSN than the new lowest
      outstanding TSN. In this case there is indication that either
      packet re-ordering has occurred or the new lowest outstanding TSN
      has been lost. The state will now transit to SCTP TLR DELAY WAIT
      state for potential entering of SCTP TLR driven Fast Recovery if
      the PTO timer expires before the new lowest outstanding TSN has
      been acknowledged. The PTO timer set on the new lowest
      outstanding TSN will be PTO2 - T_latest(TSN). In case PTO2 -
      T_latest(TSN) <= 0, the DELAY WAIT state is immediately
      terminated, the packet containing the lowest outstanding TSN is
      declared lost, and Fast Recovery is started.

   o  The T3-timer may expire. In this case the PROBE WAIT state will
      be terminated and T3-recovery will start on non-SACK'ed
      outstanding data.

3.3.2.3. SCTP TLR DELAY WAIT STATE

   In this state proof of network responsiveness has been received (in
   the form of a SACK of higher TSN than the lowest outstanding TSN)
   and the PTO timer relative to the lowest outstanding TSN is running
   for potential entering of SCTP TLR driven Fast Recovery.

   The PTO set on a new lowest outstanding TSN in this state will be
   according to PTO2 in the form of PTO2-T_latest(TSN).
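
   The SACK driven transitions out of the PROBE WAIT state listed in
   Section 3.3.2.2, including the PTO2 - T_latest(TSN) rule for
   entering DELAY WAIT, can be summarized by the following Python
   sketch. It is an illustration under simplifying assumptions, not a
   normative definition: the enumeration, the function name and the
   three boolean inputs are hypothetical, and the caller is assumed to
   have evaluated the SACK beforehand. Expiry of the T3-timer (which
   starts T3-recovery) is not a SACK event and is left out.

```python
from enum import Enum
from typing import Optional, Tuple


class Next(Enum):
    OPEN = "OPEN"
    PROBE_WAIT = "PROBE WAIT"
    DELAY_WAIT = "DELAY WAIT"
    FAST_RECOVERY = "Fast Recovery"


def probe_wait_on_sack(lowest_acked: bool, higher_sacked: bool,
                       dupthresh_reached: bool, pto2: float,
                       t_latest_new_lowest: float
                       ) -> Tuple[Next, Optional[float]]:
    """Return (next state, PTO timer value to arm or None).

    lowest_acked:      the SACK cumulatively acknowledges the lowest
                       outstanding TSN.
    higher_sacked:     the SACK acknowledges data above the (new)
                       lowest outstanding TSN, i.e. gaps remain.
    dupthresh_reached: the miss indication count on the lowest
                       outstanding TSN reached DupThresh.
    """
    if dupthresh_reached:
        return Next.FAST_RECOVERY, None
    if not lowest_acked:
        if higher_sacked:
            # Responsiveness proven, lowest TSN still missing: lost.
            return Next.FAST_RECOVERY, None
        return Next.PROBE_WAIT, None          # nothing changed
    if not higher_sacked:
        # No gaps left. The caller arms a full PTO1/PTO2 as in OPEN
        # state (no PTO restart), hence no value is returned here.
        return Next.OPEN, None
    remaining = pto2 - t_latest_new_lowest
    if remaining <= 0:
        # PTO2 already elapsed for the new lowest outstanding TSN.
        return Next.FAST_RECOVERY, None
    return Next.DELAY_WAIT, remaining
```

   The sketch returns no timer value for the OPEN case because the
   choice between PTO1 and PTO2 there depends on the amount of
   outstanding data, which is outside the inputs modelled here.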

   In the DELAY WAIT state the following may happen:

   o  SACKs may arrive that make the miss indication count on the
      lowest TSN in flight reach DupThresh. In this case the DELAY
      WAIT state is terminated and SCTP enters Fast Recovery.

   o  The PTO timer relative to the lowest outstanding TSN may expire.
      This will result in declaration of the packet of the lowest
      outstanding TSN as lost and will make SCTP enter Fast Recovery.

   o  A SACK cumulatively acknowledging all holes including the lowest
      outstanding TSN may arrive and bring the SCTP TLR state machine
      back to SCTP TLR OPEN state, and the PTO timer will be restarted
      on the new lowest outstanding TSN. The PTO timer value set will
      be the value PTO - T_latest(TSN) where the PTO value is
      calculated either from PTO1 or PTO2 according to the evaluation
      criteria given for the OPEN state.

   o  A SACK may arrive that acknowledges the lowest outstanding TSN
      and also acknowledges data of higher TSN than the new lowest
      outstanding TSN. In this case there is indication that either
      packet re-ordering has occurred or the new lowest outstanding TSN
      has been lost. The state will remain in SCTP TLR DELAY WAIT
      state for potential entering of SCTP TLR driven Fast Recovery if
      the PTO timer expires before the new lowest outstanding TSN has
      been acknowledged. The PTO timer set on the new lowest
      outstanding TSN will be PTO2 - T_latest(TSN). In case PTO2 -
      T_latest(TSN) <= 0, the DELAY WAIT state is terminated, the
      packet containing the lowest outstanding TSN is declared lost and
      Fast Recovery is started.

   o  A SACK may arrive that does not acknowledge the lowest
      outstanding TSN and still does not make the miss indication count
      reach the DupThresh value.
      In this situation no changes are made to the running PTO timer
      and the state will remain in SCTP TLR DELAY WAIT state for
      potential entering of SCTP TLR driven Fast Recovery if the PTO
      timer expires before the lowest outstanding TSN has been
      acknowledged.

3.3.2.4. Exit of Loss Recovery

   After exit of Fast Recovery or completion of T3-retransmission, if
   data is outstanding a PTO timer is started relative to the lowest
   outstanding TSN on the path and the state transits to either SCTP
   TLR OPEN state or SCTP TLR DELAY WAIT state depending on the status
   of the SACK scoreboard (i.e., whether gaps exist or not). The PTO
   timer set will follow the rules described above. PTO restart
   principles shall not be deployed in this situation as fresh RTT
   measurements might not be available. OPEN ISSUE: Possibly PTO
   restart principles shall be refrained from until new RTT
   measurements are available.

3.3.2.5. RTO-Restart Principles for the T3-timer

   When the lowest TSN in flight on a path is undergoing Fast Recovery
   or T3-retransmission a T3-timer is running on the path (relative to
   this lowest TSN in flight). For SCTP TLR the RTO-restart principles
   of [HURTIG] SHOULD unconditionally be applied to the T3-timer. Thus
   the T3-timer set on a path in this case SHOULD be the value RTO -
   T_latest(TSN) relative to the lowest TSN in flight on the path.

3.3.3. TLPP Transmission Rules

   The transmission of a Tail Loss Probe Packet (TLPP), done just prior
   to entering the SCTP TLR PROBE WAIT state from SCTP TLR OPEN state,
   is governed by the following details:

   o  A TLPP of new data is always preferred if such is available for
      transmission.
      If such exists, the TLPP sent is chosen as the lowest unsent TSNs
      that fit into one packet.

   o  Alternatively, if no new data is available for transmission,
      either due to application or receiver side limitations, the
      presently outstanding packet with the highest TSN is chosen as
      the TLPP.

   o  A TLPP of retransmission data counts twice in the in-flight data
      until acknowledged or detected as lost.

   o  The transmittal of a TLPP of sub-PMTU size is not blocked by
      Nagle-like bundling.

   The highest (new) outstanding TSN is chosen for probing in order to
   interface as well as possible with standard Fast Recovery, i.e., to
   create a loss pattern that corresponds as closely as possible to how
   the Fast Recovery algorithm retransmits, and is invoked to
   retransmit, lost packets.

   TLPP Transmission conditions:

   A TLPP is not sent unconditionally when SCTP enters PROBE WAIT state
   on a path.

   No explicit limit is applied to the number of TLPPs (i.e., the
   number of unacknowledged packets sent as TLPP) that may be
   outstanding at any given time, but in most situations the number
   will be effectively limited to a very few (very often only one) by
   the following rules based on latency and congestion control
   principles. Generally a TLPP will not be allowed to breach the CWND
   more than once per RTT, and further a TLPP is not sent if an already
   outstanding packet is considered to serve "good enough" from a
   network probing perspective. In addition special considerations
   are given for the transmittal of a TLPP consisting of retransmission
   data to ease loss masking detection (see Section 3.3.4). It is
   further noted that the frequency of TLPP transmittal is limited by
   how often a transition can happen out of and back into the PROBE
   WAIT state.
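
   The choice of TLPP content described at the start of this section
   (new data first, otherwise the outstanding packet with the highest
   TSN) can be sketched in Python as follows. This is an illustration
   only: the function name, the representation of unsent chunks as
   (TSN, size) pairs, the representation of outstanding packets as
   lists of TSNs and the payload budget parameter are assumptions of
   this example. The unsent list is assumed to contain only data that
   the receiver window permits to be sent.

```python
from typing import List, Optional, Tuple


def select_tlpp(unsent: List[Tuple[int, int]],
                outstanding_packets: List[List[int]],
                payload_budget: int) -> Optional[Tuple[str, List[int]]]:
    """Pick the TSNs to carry in a TLPP.

    unsent:              (tsn, size_in_bytes) of unsent chunks, in
                         ascending TSN order.
    outstanding_packets: TSNs of each outstanding packet.
    payload_budget:      bytes of chunk data that fit into one packet.

    Returns ("new", tsns), ("rtx", tsns), or None if there is nothing
    that could be sent as a probe.
    """
    if unsent:
        chosen: List[int] = []
        used = 0
        for tsn, size in unsent:
            # Always take the first chunk; stop when the next chunk
            # would no longer fit into the same packet.
            if chosen and used + size > payload_budget:
                break
            chosen.append(tsn)
            used += size
        return "new", chosen
    if outstanding_packets:
        # No new data: probe with the outstanding packet that holds
        # the highest TSN.
        return "rtx", max(outstanding_packets, key=max)
    return None
```

   A caller would then treat a returned "rtx" probe as counting twice
   in the in-flight data until it is acknowledged or detected as lost,
   as stated in the rules above.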

   The conditional transmission of a TLPP is specified as follows:

   o  If the highest outstanding TSN has been sent only a little while
      ago, this TSN effectively serves as a probe and no TLPP needs to
      be sent. This condition aims to prevent unnecessary
      retransmission of just sent data and unnecessary transmittal of
      small sub-PMTU packets of new data. The exact condition to apply
      is:

      *  If T_latest(highTSN) < gamma * SRTT

      then no TLPP is sent. gamma = 1/2 is recommended. A special
      condition arises when little data is outstanding and the SACK of
      the outstanding data may be lost by a single loss of SACK. In
      this case the transmittal of a TLPP will make the SACK return
      robust toward a single loss of SACK. For added robustness of the
      SACK return, an SCTP TLR implementation MAY disregard the above
      condition if only 2 packets are outstanding.

   o  If no TLPP is outstanding, a probe is sent regardless of the
      CWND.

   o  If a TLPP is outstanding, a probe is sent only if there is room
      in the CWND. Otherwise no TLPP is sent. I.e., the CWND is not
      breached when a TLPP is outstanding.

   o  If no new data exists, a probe of retransmission data is sent
      conditional on whether a TLPP of retransmission data is already
      outstanding. I.e.:

      *  If no TLPP of retransmission data is outstanding, send a TLPP
         consisting of the highest outstanding TSN.

      *  If a TLPP of retransmission data is outstanding, no TLPP is
         sent.

   The above rules on probes of retransmission data are defined to ease
   the detection of TLPP recovered losses by the algorithm described in
   Section 3.3.4.

3.3.3.1. Multi-Path Considerations for TLPP Transmission

   In multi-homed [RFC4960] SCTP, multiple paths may have a PTO timer
   running on data in flight.
E.g., two paths may be in SCTP OPEN state
1218    and SCTP will have two PTO timers running, each relative to the
1219    lowest outstanding TSN on the respective path. This (exceptional)
1220    situation can in particular occur when the data transfer path
1221    changes as a result of a switchback operation to a primary
1222    path. The handling of TLPP transmission for SCTP MH is described in
1223    the following. The underlying philosophy of the solution is, as far
1224    as possible, to have the SCTP TLR probing mechanism be undertaken on,
1225    and by, the data transfer path, thus avoiding as far as possible the
1226    conflicts that may arise due to concurrent data transfers on multiple
1227    paths. The rules are as follows:

1229    o  When the PTO timer kicks on a path in SCTP OPEN state and the TLPP
1230       selected by the rules above consists of new data, then, if the
1231       path is the present data transfer path of the association, the
1232       TLPP is sent, and it is sent on the data transfer path of the
1233       association. If in this situation the path is not the present
1234       data transfer path of the association, then:

1236       *  if there is no outstanding data on the present data transfer
1237          path, the TLPP of new data is sent there.

1239       *  if there is outstanding data on the data transfer path, the
1240          TLPP is not sent. Instead the potential transmittal of a TLPP
1241          is deferred to be driven by a later kick of the PTO timer on
1242          the data transfer path.

1244       The first situation, where data is available for transmittal on
1245       the data transfer path but has not been sent, is unlikely,
1246       but it might occur in some implementations.

1248    o  When the PTO timer kicks on a path in SCTP OPEN state and the TLPP
1249       selected by the rules above consists of a retransmission of the
1250       presently highest outstanding TSNs on the association, then if and
1251       only if these TSNs are outstanding on the path in question is the
1252       TLPP allowed to be sent.
The following guidelines are given for
1253       the path selection for the TLPP:

1255       *  An SCTP implementation which does not implement the Unambiguous
1256          SACK feature of Appendix A should send the TLPP on the path on
1257          which the TSNs are presently outstanding (i.e., on the path on
1258          which the PTO kicked).

1260       *  An SCTP implementation which implements the Unambiguous SACK
1261          feature of Appendix A may send the TLPP on the data transfer
1262          path of the association.

1264       The reason a TLPP of retransmitted data in the first case above is
1265       sent on the path on which the data was first sent, even if this
1266       path is not the present data transfer path (special corner case
1267       with change of data transfer path or destination address directed
1268       data transfer), is that the TLPP Loss Mask Detection mechanism
1269       (see Section 3.3.4) could not infer on which path to perform a
1270       congestion window reduction if the TLPP and original data are sent
1271       on different paths. An SCTP implementation which implements the
1272       Unambiguous SACK feature of Appendix A can better distinguish the
1273       SACK of the original TSN and the retransmitted TSN and can
1274       therefore operate differently. The choice of sending the TLPP on
1275       the data transfer path may be motivated by the fact that the Fast
1276       Recovery procedure, which the SACK of the TLPP may result in, would
1277       use the data transfer path. On the other hand, differences in the
1278       RTT of the different paths may make it suboptimal to send the TLPP
1279       on the data transfer path, and it can give rise to potential
1280       uncertainty in the TLPP Loss Recovery Mask detection and reaction
1281       process (see Section 3.3.4).

1283    It is emphasized that the deferral of the transmission of a TLPP does
1284    not prevent entering of the PROBE WAIT state on the path where the
1285    PTO kicked.

1287 3.3.4. Masking of TLPP Recovered Losses

1289    If a single SCTP packet is lost, there is a risk that the TLPP packet
1290    itself might repair the loss if that particular lost packet is used
1291    as probe. The masking problem is only present if the TLPP is based
1292    on retransmission data. The TLPP might mask the loss and thus
1293    interfere with the congestion control principle that requires
1294    CWND halving when a loss is detected.

1296    At present the solution in this document operates with the algorithm
1297    defined for this purpose in [DUKKIPATI01], adjusted for SCTP to
1298    rely on the D-SACK (duplicate TSN received) information available
1299    from the SCTP SACK or, alternatively, on the information available
1300    from the Unambiguous SACK of Appendix A. The solution operates
1301    with a conceptual TLPP Retransmission Episode, as follows:

1303    o  Once a TLPP packet consisting of retransmission data is sent, a
1304       TLPP Retransmission Episode is started.

1306    o  A TLPP Retransmission Episode is abruptly terminated if Fast
1307       Recovery or T3-Recovery is entered.

1309    o  For an SCTP implementation which does not implement the
1310       Unambiguous SACK feature of Appendix A, as well as for an SCTP
1311       association where the Unambiguous SACK feature of Appendix A is
1312       not in use, the TLPP Retransmission Episode terminates when an
1313       incoming SACK cumulatively acknowledges a sequence number higher
1314       than the sequence number of the TLPP probe with retransmission
1315       data. If at this point in time the number of times the TLPP TSN
1316       has been received, according to the D-SACK information received,
1317       is lower than the number of times the TLPP TSN has been sent, CWND
1318       halving is done on the unique path on which the retransmission
1319       TLPP TSN has been sent. Further, at this point in time the
1320       contribution from the TSN is subtracted from the flight size in
1321       accordance with the number of times the TSN has been sent.
1323    o  For an SCTP implementation which implements the Unambiguous SACK
1324       feature of Appendix A, the following actions are taken at the time
1325       of acknowledgement of the TSN used as TLPP:

1327       *  If the TLPP TSN is first cumulatively acknowledged in a SACK
1328          with CUMACK TSN = TLPP TSN and with no SACK (or CUMACK) of
1329          higher TSNs, then from the Unambiguous SACK information the
1330          SCTP sender can distinguish the following cases:

1332          +  The original TSN has not (yet) been received, the
1333             retransmission TSN (the TLPP) has been received.

1335             -  In this case the original TSN is judged as lost, CWND
1336                halving is performed on the path on which the original
1337                TSN was sent and the sent TSNs are subtracted from the
1338                flight size(s). This concludes the TLPP Retransmission
1339                Episode.

1341          +  Both the original transmission and the retransmission
1342             (the TLPP) have been received.

1344             -  In this case the sent TSNs are subtracted from the flight
1345                size(s). This concludes the TLPP Retransmission Episode.

1347          +  The original TSN has been received, the retransmission TSN
1348             (the TLPP) has not yet been received:

1350             -  In this case a special timer is started with value
1351                PTO-T_latest(TSN), and the bytes of the retransmitted TSN
1352                (the TLPP) remain in the flightsize of the path on which
1353                it was sent until one of the following happens,
1354                whichever happens first:

1356                o  An Unambiguous SACK of the TSN is received, in which
1357                   case the TSN is subtracted from the flightsize and the
1358                   timer is stopped. This concludes the TLPP
1359                   Retransmission Episode.

1361                o  A SACK of a higher TSN than the TLPP arrives with
1362                   Unambiguous SACK information indicating that the TLPP
1363                   has not been received. The path is then marked so
1364                   that if, when the timer kicks, the TSN has
1365                   still not been acknowledged, the TSN is judged as
1366                   lost, CWND halving is done and the TSN is subtracted
1367                   from the flightsize.
This then concludes the TLPP
1368                   Retransmission Episode.

1370                o  The timer kicks, the TSN is subtracted from the
1371                   flightsize (but no CWND halving is done). This
1372                   concludes the TLPP Retransmission Episode.

1374       *  If the TLPP TSN is first cumulatively acknowledged in a SACK
1375          with highest SACK'ed (or CUMACK'ed) TSN > TLPP TSN, then from
1376          the Unambiguous SACK information the SCTP sender can distinguish
1377          the same cases as above and take corresponding actions. One
1378          additional case can arise in this situation:

1380          +  Only one of the transmissions of the TSN has been received,
1381             but no clear Unambiguous SACK indication of which one was
1382             received is available from the SACK. This uncertainty can
1383             only result from situations where SACKs are lost,
1384             potentially in combination with more data chunks than
1385             the TSN itself being outstanding at the time when the TLPP
1386             was sent and some of this data arriving later at the receiver
1387             than the original TSN or the TLPP.

1389             -  In this case the original TSN is judged as having been
1390                received and it is subtracted from the flightsize of the
1391                path on which it was sent. The timer PTO-T_latest(TSN)
1392                is set, and a potential CWND reduction caused by
1393                loss of the TLPP is handled following the principles
1394                described above.

1396    DISCUSSION of Unambiguous SACK Case Handling: CWND halving is not
1397    prescribed for a potentially lost retransmitted TSN used as
1398    TLPP in all cases above, as there is no guarantee that a SACK
1399    confirming a potential arrival of the retransmitted TSN will arrive
1400    in time (i.e., this SACK may be lost). CWND halving is done if a SACK
1401    of a higher TSN than the TLPP TSN has arrived, PTO time has
1402    elapsed since the transmittal of the TLPP, and the TLPP itself
1403    cannot be determined to have been received from the Unambiguous SACK
1404    information.

1406 3.3.5. Elimination of unnecessary DELAY_ACK delays

1408    The negative impact of DELAY_ACK on the loss recovery delay is
1409    partially mitigated by setting the I-bit on the TLPP.

1411    OPEN ISSUES:

1413    o  It is to be determined if the Immediate SACK feature shall be
1414       relied on more aggressively. Possible options are:

1416       *  Immediate SACK flag to be set on all retransmitted TSNs.

1418       *  Immediate SACK flag to be set on all TSNs that are sent where
1419          the transmittal of an immediately following packet
1420          cannot be foreseen. This would effectively result in the
1421          I-bit being set on a sent TSN whenever any of the following is
1422          true:

1424          +  no more chunks can be sent right after this chunk due to
1425             CWND limitations.

1427          +  no more chunks can be sent right after this due to RCV
1428             window limitations.

1430          +  no more chunks can be sent right after this as no more
1431             chunks are available in the SND buffer.

1433          +  no more chunks can be sent right after this due to Nagle.
1434             (May depend on the exact Nagle-like implementation).

1436       For the second choice it would be relevant to use the PTO1 setting
1437       for the PTO timer on all TSNs sent with the I-bit set, when the
1438       receiver is known to support the Immediate SACK feature. The
1439       downside of this choice is that it very severely limits the
1440       effectiveness of the DELAY_ACK feature.

1442    o  Ideally the PTO timer relative to the lowest outstanding TSN
1443       should be adjusted to follow PTO2 when a subsequent packet is
1444       transmitted. The downside of this choice is the implementation
1445       impact of such detailed - potentially per packet transmission -
1446       logic. To be elaborated further.

1448 4. Confirmation of support for Immediate SACK

1450    Confirmation of receiver support of the Immediate SACK function
1451    [RFC7053] is established by an SCTP TLR sender by the following
1452    means:

1454    o  In case the data chunk of [RFC4960] is in use on the association,
1455       confirmation of [RFC7053] support by the SCTP receiver is assumed
1456       if the SCTP TLR sender receives a data chunk with the I-bit set.

1458    o  [TO BE CONFIRMED:] In case the I-data chunk of [SCTP-IDATA] is in
1459       use on the association, the SCTP sender can by [SCTP-IDATA] assume
1460       that the SCTP receiver supports [RFC7053].

1462 5. Socket API Considerations

1464    This section will describe how the socket API defined in [RFC6458] is
1465    extended to provide a way for the application to control the
1466    retransmission algorithms in operation in the SCTP layer.

1468    A socket option for control of the features is yet to be defined.

1470    Please note that this section is informational only.

1472 6. Security Considerations

1474    There are no new security considerations introduced by the functions
1475    defined in this document.

1477 7. Acknowledgements

1479    The authors acknowledge Henrik Jensen for his very significant
1480    contribution to the definition of, the implementation of, and the
1481    experiments with the function.

1483    The work draws heavily on prior work done for TCP, [DUKKIPATI01]
1484    in particular. The contributors of that work should be credited for
1485    many of the ideas put forward here for SCTP.

1487 8. IANA Considerations

1489    This document does not create any new registries or modify the rules
1490    for any existing registries managed by IANA.

1492 9. Discussion and Evaluation of function

1494    Experiments in progress. Details to be filled in.

1496    Right now we use this section to retain a number of issues that are
1497    to be further elaborated on:

1499    o  A significant number of spurious TLR probes have been observed in
1500       tests.
It is to be determined if this is inherent to the function
1501       or whether it may be improved with adjustment of the PTO timer
1502       calculations.

1504 10. References

1506 10.1. Normative References

1508    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1509               Requirement Levels", BCP 14, RFC 2119,
1510               DOI 10.17487/RFC2119, March 1997,
1511               .

1513    [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
1514               RFC 4960, DOI 10.17487/RFC4960, September 2007,
1515               .

1517    [RFC5061]  Stewart, R., Xie, Q., Tuexen, M., Maruyama, S., and M.
1518               Kozuka, "Stream Control Transmission Protocol (SCTP)
1519               Dynamic Address Reconfiguration", RFC 5061,
1520               DOI 10.17487/RFC5061, September 2007,
1521               .

1523    [RFC5062]  Stewart, R., Tuexen, M., and G. Camarillo, "Security
1524               Attacks Found Against the Stream Control Transmission
1525               Protocol (SCTP) and Current Countermeasures", RFC 5062,
1526               DOI 10.17487/RFC5062, September 2007,
1527               .

1529    [RFC7053]  Tuexen, M., Ruengeler, I., and R. Stewart, "SACK-
1530               IMMEDIATELY Extension for the Stream Control Transmission
1531               Protocol", RFC 7053, DOI 10.17487/RFC7053, November 2013,
1532               .

1534    [SCTP-IDATA]
1535               R. Stewart et al., "Stream Schedulers and User Message
1536               Interleaving for the Stream Control Transmission Protocol
1537               draft-ietf-tsvwg-sctp-ndata-04.txt", IETF Work In
1538               Progress, 07 2015.

1540 10.2. Informative References

1542    [CARO01]   A. Caro et al., "Retransmission Policies with Transport
1543               Layer Multihoming", ICON, 2003.

1545    [CARO02]   A. Caro et al., "Retransmission Schemes for End-to-end
1546               Failover with Transport Layer Multihoming", GLOBECOM, 11
1547               2004.

1549    [CMT-SCTP]
1550               Amer et al., P., "Load Sharing for the Stream Control
1551               Transmission Protocol (SCTP) draft-tuexen-tsvwg-sctp-
1552               multipath-10.txt", IETF Work In Progress, 5 2015.

1554    [DUKKIPATI01]
1555               Dukkipati, N., Cardwell, N., Cheng, Y., and M.
Mathis,
1556               "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
1557               Tail", Work Expired, 2 2013.

1559    [DUKKIPATI02]
1560               Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
1561               "Proportional Rate Reduction for TCP", Proceedings of the
1562               11th ACM SIGCOMM Conference on Internet Measurement, 11
1563               2011.

1565    [HURTIG]   P. Hurtig et al., "TCP and SCTP RTO Restart, draft-ietf-
1566               tcpm-rtorestart-08", IETF Work In Progress, 3 2015.

1568    [MATHIS]   Mathis, M., "FACK", ACM SIGCOMM Computer Communication
1569               Review 26,4, 10 1996.

1571    [Rajiullah]
1572               M. Rajiullah et al., "An Evaluation of Tail Loss
1573               Recovery Mechanisms for TCP", ACM SIGCOMM Computer
1574               Communication Review 45,1, 1 2015.

1576    [RFC3758]  Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
1577               Conrad, "Stream Control Transmission Protocol (SCTP)
1578               Partial Reliability Extension", RFC 3758,
1579               DOI 10.17487/RFC3758, May 2004,
1580               .

1582    [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
1583               Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
1584               .

1586    [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
1587               P. Hurtig, "Early Retransmit for TCP and Stream Control
1588               Transmission Protocol (SCTP)", RFC 5827,
1589               DOI 10.17487/RFC5827, May 2010,
1590               .

1592    [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
1593               Yasevich, "Sockets API Extensions for the Stream Control
1594               Transmission Protocol (SCTP)", RFC 6458,
1595               DOI 10.17487/RFC6458, December 2011,
1596               .

1598    [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
1599               and Y. Nishida, "A Conservative Loss Recovery Algorithm
1600               Based on Selective Acknowledgment (SACK) for TCP",
1601               RFC 6675, DOI 10.17487/RFC6675, August 2012,
1602               .

1604    [SCTP-PF]  Y. Nishida et al., "SCTP-PF: Quick Failover Algorithm in
1605               SCTP, draft-ietf-tsvwg-sctp-failover-13.txt", IETF Work In
1606               Progress, 09 2015.
1608    [zimmermann01]
1609               Zimmermann, A., "CUBIC for Fast Long-Distance Networks,
1610               draft-ietf-tcpm-cubic-00", IETF Work In Progress, 6 2015.

1612    [zimmermann02]
1613               Zimmermann, A., "The TCP Echo and TCP Echo Reply Option,
1614               draft-zimmermann-tcpm-echo-option-00", IETF Work In
1615               Progress, 6 2015.

1617    [zimmermann03]
1618               Zimmermann, A., "Using the TCP Echo Option for Spurious
1619               Retransmission Detection, draft-zimmermann-tcpm-spurious-
1620               rxmit-00", IETF Work In Progress, 7 2015.

1622 Appendix A. Unambiguous SACK

1624    When receiving a SACK of a TSN it is not possible to unambiguously
1625    determine if the receiver hereby acknowledges the first transmission
1626    of the TSN or possible subsequent retransmissions of the TSN, when
1627    such multiple transmissions of the same TSN have been made. The
1628    duplicate TSN information in the SCTP SACK chunk does help to provide
1629    information about how many times the same TSN has been received at
1630    the receiver side, but still it is not possible to unequivocally link
1631    the SACK information to the different transmissions of the same TSN.
1632    An additional source of ambiguity comes from the fact that packets
1633    may be duplicated in the network.

1635    Unambiguous SACK information is generally beneficial for many SCTP
1636    protocol aspects, e.g., for improved RTT measurements, for more
1637    accurate loss detection, maintenance of flightsize and congestion
1638    control operation.

1640    Providing fully accurate SACK information from receiver to sender side
1641    requires a reliable (and ordered) SACK feedback channel, thus
1642    overcoming the information gap that may arise from loss (or from re-
1643    ordering) of SACKs. The establishment of such a reliable feedback
1644    channel is not proposed, but the proposal implements measures that
1645    allow for some robustness towards information loss due to SACK loss.
1647    NOTE for AUTHORS: The solution is independent of a potential split
1648    of the SACK TSN Gap information into SACK and NR-SACK gaps respectively
1649    following [CMT-SCTP].

1651 A.1. TSN Retransmission ID in Data Chunk Header

1653    It is a prerequisite that the SCTP association deploys, and has
1654    negotiated usage of, the new I-data chunk of [SCTP-IDATA].

1656    We define a new 4-bit Retransmission ID (RTX ID) in the I-data Chunk
1657    header. The 4 bits consume 4 bits of the new reserved 16-bit field
1658    of the I-data chunk header. See Figure 1.

1660      0                   1                   2                   3
1661      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1662     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1663     |   Type = 64   |  Res  |I|U|B|E|            Length             |
1664     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1665     |                              TSN                              |
1666     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1667     |       Stream Identifier       |       Reserved        | RTX-ID|
1668     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1669     |                      Message Identifier                       |
1670     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1671     |    Payload Protocol Identifier / Fragment Sequence Number     |
1672     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1673     \                                                               \
1674     /                           User Data                           /
1675     \                                                               \
1676     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1678                  Figure 1: RTX-ID in I-DATA chunk format

1680 A.1.1. Sender side behaviour

1682    New data MUST be sent with RTX ID = 0. Whenever SCTP retransmits a
1683    data chunk it SHOULD step up the RTX ID. The highest RTX ID = 15 is
1684    used for all retransmissions of the same TSN beyond the 15-th
1685    retransmission or when the RTX ID last used for this TSN is 15. An
1686    SCTP sender MAY step the RTX ID up by more than one count when
1687    retransmitting a TSN in order to have all TSNs within the SCTP
1688    packet use one and the same RTX ID.

1690 A.1.2. Receiver side behaviour

1692    An SCTP receiver supporting this feature MUST process the RTX ID for
1693    all received TSNs in accordance with the prescriptions for
1694    Unambiguous SACK return below.

1696 A.2. Unambiguous SACK Chunk
1697      0                   1                   2                   3
1698      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1699     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1700     |   Type = x    |  Chunk Flags  |         Chunk Length          |
1701     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1702     |                Cumulative TSN RTX (CUMACK TSN)                |
1703     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1704     |          Advertised Receiver Window Credit (a_rwnd)           |
1705     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1706     | Number of Gap Ack Blocks = N  |  Reserved (future NR-SACK ?)  |
1707     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1708     |  NewlyCACK RTX ID Blocks = N  |   CACK Dupl TSN Blocks = N    |
1709     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1710     |  NewlySACK RTX ID Blocks = N  |   SACK Dupl TSN Blocks = N    |
1711     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1712     | Number of RTX SACK Blocks = N |           Reserved            |
1713     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1714     |           Highest CUMACK'ed TSN received duplicated           |
1715     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1716     |    Gap Ack Block #1 Start     |     Gap Ack Block #1 End      |
1717     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1718     /                                                               /
1719     \       format to be changed to cover more than 16-bits ?       \
1720     /                                                               /
1721     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1722     |    Gap Ack Block #N Start     |     Gap Ack Block #N End      |
1723     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1724     |                                                               |
1725     /                                                               /
1726     \       New Blocks in order set above ... to be filled in       \
1727     /                                                               /
1728     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1729     |                                                               |
1730     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1732                Figure 2: Unambiguous SACK chunk format

1734    Newly CACK RTX ID block:

1736       This block provides information on the newly acknowledged TSNs
1737       that were cumulatively acked in this SACK and for which the
1738       following hold:

1740       *  The TSN is newly acked in this SACK. I.e., the TSN has not
1741          been received before (or if it has been received before, it was
1742          since reneged).

1744       *  The newly acknowledged TSN was received with RTX ID different
1745          from zero.

1747       The RTX ID received with the TSN is returned in this block. The
1748       information returned in a CACK RTX ID block is a consecutive range
1749       of TSNs fulfilling the above for which identical RTX ID has been
1750       received. Proposed format is offset from CUMACK TSN (lower than
1751       CUMACK TSN), length of range and RTX ID.

1753    Newly SACK RTX ID block:

1755       This block provides information on the newly acknowledged TSNs
1756       that were selectively acknowledged in this SACK and for which the
1757       following hold:

1759       *  The TSN is newly acked in this SACK. I.e., the TSN has not
1760          been received before (or if it has been received before, it was
1761          since reneged).

1763       *  The newly acknowledged TSN was received with RTX ID different
1764          from zero.

1766       The RTX ID received with the TSN is returned in this block. The
1767       information returned in a SACK RTX ID block is a consecutive range
1768       of TSNs fulfilling the above for which identical RTX ID has been
1769       received. Proposed format is offset from CUMACK TSN (higher than
1770       CUMACK TSN), length of range and RTX ID - OR alternatively format
1771       of present SACK blocks with offset bounded by 16-bit to CUMACK
1772       TSN.
1774    Newly CACK Dupl TSN block:

1776       This block provides information on the TSNs received since the last
1777       returned SACK for which the following hold:

1779       *  The TSN is lower than or equal to the CUMACK TSN.

1781       *  The TSN is a duplicate, meaning that a data chunk with the same
1782          TSN, but possibly different RTX ID, has been received.

1784       The RTX ID received with the TSN is returned in this block. The
1785       information returned in a CACK Dupl TSN block is a consecutive
1786       range of TSNs fulfilling the above for which identical RTX ID has
1787       been received. Proposed format is offset from CUMACK TSN (lower
1788       than CUMACK TSN), length of range and RTX ID. The RTX ID may be
1789       zero.

1791    Newly SACK Dupl TSN block:

1793       This block provides information on the TSNs received since the last
1794       returned SACK for which the following hold:

1796       *  The TSN is higher than the CUMACK TSN.

1798       *  The TSN is a duplicate, meaning that a data chunk with the same
1799          TSN, but possibly different RTX ID, has been received.

1801       The RTX ID received with the TSN is returned in this block. The
1802       information returned in a SACK Dupl TSN block is a consecutive
1803       range of TSNs fulfilling the above for which identical RTX ID has
1804       been received. Proposed format is offset from CUMACK TSN (higher
1805       than CUMACK TSN), length of range and RTX ID - OR - format of
1806       present SACK blocks with offset bounded by 16-bit to CUMACK TSN.
1807       The RTX ID may be zero.

1809    Together with the existing SACK information, the Newly CACK/SACK RTX
1810    ID and the CACK/SACK Dupl TSN blocks provide unambiguous SACK
1811    information for all received TSNs, differentiating on the RTX ID
1812    received with the TSN. The information may be partially lost from
1813    the receiver to the sender if a SACK is lost. The RTX SACK Block and
1814    the Highest CUMACK Received Duplicated information are returned in
1815    order to provide means to recover part of the information that can be
1816    lost when a SACK is lost.
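
   NOTE (illustration only): The classification of received TSNs into
   the four block types above might be sketched as follows. This is a
   minimal sketch and not part of the specification. The function name
   build_rtx_blocks, the (tsn, rtx_id, is_duplicate) input tuples and
   the (first TSN, length of range, RTX ID) output tuples are
   hypothetical; the wire format of the blocks is still open in this
   document, and TSN wrap-around is ignored.

```python
from collections import defaultdict


def build_rtx_blocks(arrivals, cumack):
    """Group the TSNs received since the last SACK into block ranges.

    arrivals: iterable of (tsn, rtx_id, is_duplicate) tuples (assumed input).
    Returns a dict mapping block name to a list of
    (first_tsn, length, rtx_id) ranges of consecutive TSNs with equal RTX ID.
    """
    per_kind = defaultdict(set)
    for tsn, rtx_id, is_duplicate in arrivals:
        side = "CACK" if tsn <= cumack else "SACK"
        if is_duplicate:
            # Duplicates are reported whatever the RTX ID (it may be zero).
            per_kind[side + " Dupl TSN"].add((tsn, rtx_id))
        elif rtx_id != 0:
            # Newly acked TSN that arrived as a retransmission.
            per_kind["Newly " + side + " RTX ID"].add((tsn, rtx_id))
        # Newly acked TSNs with RTX ID 0 need no extra block.

    blocks = {}
    for kind, entries in per_kind.items():
        ranges = []
        # Sort by RTX ID first so that each range holds a single RTX ID.
        for tsn, rtx_id in sorted(entries, key=lambda e: (e[1], e[0])):
            last = ranges[-1] if ranges else None
            if last and last[2] == rtx_id and last[0] + last[1] == tsn:
                last[1] += 1  # extend the current consecutive range
            else:
                ranges.append([tsn, 1, rtx_id])
        blocks[kind] = [tuple(r) for r in ranges]
    return blocks
```

   For example, with CUMACK TSN 103, newly received TSNs 101 and 102
   carrying RTX ID 1 would form one Newly CACK RTX ID range, while a
   duplicate of TSN 100 with RTX ID 0 would be reported in a CACK Dupl
   TSN range.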
1818    RTX SACK block:

1820       This block provides information on the TSNs for which the
1821       following hold:

1823       *  The TSN has been received and has been selectively acked in
1824          prior SACKs (OPEN: alternatively in SACKs including this one).

1826       *  The TSN is higher than the CUMACK TSN.

1828       *  The TSN has been received only with RTX IDs different from
1829          zero.

1831       The information returned in an RTX block is a consecutive range of
1832       TSNs fulfilling the above. Proposed format is offset from CUMACK
1833       TSN (higher than CUMACK TSN) and length of range - OR - format of
1834       present SACK blocks with offset bounded by 16-bit to CUMACK
1835       TSN.

1837    Highest CUMACK'ed TSN received Duplicated:

1839       Here the highest TSN that fulfills the following conditions is
1840       inserted:

1842       *  The TSN has been received duplicated.

1844       *  The TSN is lower than or equal to the CUMACK TSN.

1846       When no duplicates have been seen, or when no duplicates have been
1847       seen in the last 2^31 window of TSNs that have been cumulatively
1848       acknowledged, CUMACK TSN + 1 is returned.

1850    By means of the RTX SACK block an SCTP sender may recover the
1851    information that a SACK'ed TSN does not represent the original TSN
1852    first sent, i.e., the TSN sent with RTX ID = 0.

1854    By means of the "Highest CUMACK'ed TSN received Duplicated" an SCTP
1855    sender may recover the information that more than one incarnation
1856    of a TSN has been received when the SACK which cumulatively
1857    acknowledged the arrival of the different incarnations of the TSN
1858    itself was lost. The particular example of special interest is the
1859    case where one and the same SACK would contain information on
1860    receipt of both the original TSN and a spurious retransmission of the
1861    TSN.
This can happen in scenarios where DELAY_ACK handling at the
1862    receiver side delays the return of SACK information and a SACK is
1863    lost, even if the original data and the spurious retransmission data
1864    were sent with reasonable spacing in time.

1866 A.2.1. Receiver side behaviour

1868    The RTX SACK Block and the Highest CUMACK information to be returned
1869    in SACKs require an SCTP receiver to keep track (state) of the
1870    following information on a per association basis:

1872    o  A list (or ranges) of TSNs that have been SACK'ed, but not yet
1873       cumulatively acknowledged, and for which RTX ID = 0 has not been
1874       seen. It is noted that the TSN data chunk itself may have been
1875       delivered to the application.

1877    o  The highest TSN lower than or equal to the CUMACK TSN for which a
1878       duplicate has been received.

1880 A.3. Unambiguous SACK return

1882    Whenever Unambiguous SACKs are in use on an association and SCTP
1883    receives a valid data chunk with RTX-ID different from zero, it shall
1884    not delay the return of the Unambiguous SACK. Otherwise Unambiguous
1885    SACKs are returned at any time when an [RFC4960] implementation would
1886    return a SACK.

1888    A window opener MUST include Unambiguous SACK information.

1890 A.4. Negotiation

1892    An SCTP receiver MUST NOT send an Unambiguous SACK chunk unless both
1893    peers have indicated their support of the Unambiguous SACK feature
1894    within the Supported Extensions Parameter as defined in [RFC5061].
1895    If Unambiguous SACK has been negotiated on an association,
1896    Unambiguous SACKs MUST be returned whenever an SCTP receiver would
1897    return SACK information. If Unambiguous SACK has not been negotiated
1898    on an association, the RTX-ID field in the chunk header of incoming
1899    data chunks MUST be ignored and [RFC4960] SACK format and return
1900    policies MUST be adhered to.

1902 Authors' Addresses

1904    Karen E. E.
Nielsen
1905    Ericsson
1906    Kistavaegen 25
1907    Stockholm 164 80
1908    Sweden

1910    Email: karen.nielsen@tieto.com

1912    Rafaelle De Santis
1913    Ericsson
1914    xx
1915    xx xx
1916    Italy

1918    Email: rafaele.de.santis@ericsson.com

1920    Anna Brunstrom
1921    Karlstad University
1922    Universitetsgatan 2
1923    Karlstad 651 88
1924    Sweden

1926    Email: anna.brunstrom@kau.se

1928    Michael Tuexen
1929    Muenster Univ. of Appl. Science
1930    Stegerwaldstrasse 39
1931    Steinfurt 48565
1932    Germany

1934    Email: tuexen@fh-muenster.de
1935    Randall Stewart
1936    Netflix, Inc.
1937    xx
1938    Chapin 29036 SC
1939    United States

1941    Email: randall@lakerest.net