Internet Engineering Task Force                       Sumitha Bhandarkar
INTERNET DRAFT                                     A. L. Narasimha Reddy
draft-ietf-tcpm-tcp-dcr-07.txt                      Texas A&M University
Expires: July 2006                                           Mark Allman
                                                               ICIR/ICSI
                                                           Ethan Blanton
                                                       Purdue University
                                                            January 2006

        Improving the Robustness of TCP to Non-Congestion Events

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document specifies Non-Congestion Robustness (NCR) for TCP. In
   the absence of explicit congestion notification from the network TCP
   uses loss as an indication of congestion. One of the ways TCP
   detects loss is using the arrival of three duplicate acknowledgments.
   However, this heuristic is not always correct, notably in the case
   when network paths reorder segments (for whatever reason), resulting
   in degraded performance. TCP-NCR is designed to mitigate this
   degraded performance by increasing the number of duplicate
   acknowledgments required to trigger loss recovery, based on the
   current state of the connection, in an effort to better disambiguate
   true segment loss from segment reordering. This document specifies
   the changes to TCP, as well as the costs and benefits of these
   modifications.

Table of Contents

   1.   Introduction
   2.   NCR Description
   3.   Algorithm
        3.1  Initialization
        3.2  Terminating Extended Limited Transmit and
             Preventing Bursts
        3.3  Extended Limited Transmit
        3.4  Entering Loss Recovery
   4.   Advantages
   5.   Disadvantages
   6.   Related Work
   7.   Security Considerations
   8.   Acknowledgements
   9.   IANA Considerations
   10.  Normative References
   11.  Informative References
   12.  Author's Addresses

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described
   in [RFC2119].

   Readers should be familiar with the TCP terminology (e.g.,
   FlightSize, Pipe, etc.) given in [RFC2581] and [RFC3517].

1. Introduction

   One strength of TCP [RFC793] lies in its ability to adjust its
   sending rate according to the perceived congestion in the network
   [Jac88,RFC2581]. In the absence of explicit notification of
   congestion from the network, TCP uses segment loss as an indication
   of congestion (i.e., assuming queue overflow). TCP receivers send
   cumulative acknowledgments (ACKs) indicating the next sequence number
   expected from the sender for arriving segments [RFC793]. When
   segments arrive out-of-order, duplicate ACKs are generated. As
   specified in [RFC2581], a TCP sender uses the arrival of three
   duplicate ACKs as an indication of segment loss. The TCP sender
   retransmits the lost segment and reduces the load imposed on the
   network, assuming the segment loss was caused by resource contention
   within the network path.
   The TCP sender does not assume loss on the first or second duplicate
   ACK, but waits for three duplicate ACKs to account for minor packet
   reordering. However, the use of this constant threshold of duplicate
   ACKs has several problems that can be mitigated with a dynamic
   threshold.

   The following is an example of TCP's behavior:

     + TCP A is the data sender and TCP B is the data receiver.

     + TCP A sends 10 segments each consisting of a single data byte
       (i.e., transmits bytes 1-10 in segments 1-10).

     + Assume segment 3 is dropped in the network.

     + TCP B cumulatively acknowledges segments 1 and 2, making the
       cumulative ACK transmitted to the sender 3 (the next expected
       sequence number). (Note: TCP B may generate one or two ACKs,
       depending on whether delayed ACKs [RFC1122,RFC2581] are
       employed.)

     + The arrival of segments 4-10 at TCP B will each trigger the
       transmission of a cumulative ACK for sequence number 3. (Note:
       [RFC2581] recommends that delayed ACKs not be used when the ACK
       is triggered by an out-of-order segment.)

     + When TCP A receives the third duplicate ACK (or fourth ACK
       overall) for sequence number 3, TCP A will retransmit
       segment 3 and reduce the sending rate by roughly half (see
       [RFC2581] for specifics on the congestion control state
       adjustments).

   Alternatively, suppose segment 3 was not dropped by the network, but
   rather delayed such that segment 3 arrives at TCP B after segment 10.
   The above scenario will play out in precisely the same manner
   insomuch as a retransmission of segment 3 will be triggered. In
   other words, TCP is not capable of disambiguating this reordering
   event from a segment loss, resulting in an unnecessary retransmission
   and rate reduction.
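
   The example above can be sketched in a few lines of Python. This is
   an illustration only, not part of the specification: the function
   names are hypothetical, one ACK is assumed per arriving segment (no
   delayed ACKs), and segment numbers stand in for sequence numbers
   because each segment carries a single byte.

```python
def cumulative_acks(arrival_order):
    """Cumulative ACK (next expected segment) sent after each arrival.

    Assumes one ACK per arriving segment and that segment numbers equal
    sequence numbers, as in the ten one-byte segments of the example.
    """
    received = set()
    next_expected = 1
    acks = []
    for segment in arrival_order:
        received.add(segment)
        # Advance the cumulative ACK point over any in-order data.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmit_trigger(acks, dup_thresh=3):
    """Return the sequence number retransmitted once dup_thresh duplicate
    ACKs have arrived, or None if the threshold is never reached."""
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == dup_thresh:
                return ack
        else:
            last_ack, duplicates = ack, 0
    return None


lost = [1, 2, 4, 5, 6, 7, 8, 9, 10]           # segment 3 dropped
reordered = [1, 2, 4, 5, 6, 7, 8, 9, 10, 3]   # segment 3 arrives last

print(fast_retransmit_trigger(cumulative_acks(lost)))
print(fast_retransmit_trigger(cumulative_acks(reordered)))
```

   With the fixed threshold of three, both calls report a retransmission
   of segment 3: the sender sees the same ACK stream up to the third
   duplicate ACK and cannot tell the two cases apart.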

   The following is the specific motivation behind making TCP robust to
   reordered segments:

     * A number of Internet measurement studies have shown that packet
       reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04].
       Further, the reordering can be well beyond that required for
       fast retransmit to be falsely triggered.

     * [BA02,ZKFP03] show the negative performance implications that
       packet reordering has on current TCP.

     * The requirement imposed by TCP for almost in-order packet
       delivery places a constraint on the design of future technology.
       Novel routing algorithms, network components, link-layer
       retransmission mechanisms and applications could all be looked
       at with a fresh perspective if TCP were to be more robust to
       segment reordering. For instance, high speed packet switches
       could cause resequencing of packets if TCP were more robust.
       There has been work proposed in the literature explicitly to
       ensure that packet ordering is maintained in such switches
       (e.g., [KM02]). Also, link-layer mechanisms that attempt to
       recover from packet corruption by retransmitting could be
       allowed to reorder packets and, hence, increase the chances of
       local loss repair rather than relying on TCP to repair the loss
       (and, needlessly reduce its sending rate). Additional examples
       include multi-path routing, high-delay satellite links and some
       of the schemes proposed for a differentiated services
       architecture. By making TCP more robust to non-congestion
       events, TCP-NCR may open the design space of the future Internet
       components.

   In this document we specify a set of TCP sender modifications to
   provide Non-Congestion Robustness (NCR) to TCP.
   In particular, these changes are built on top of TCP with selective
   acknowledgments (SACKs) [RFC2018] and the SACK-based loss recovery
   scheme given in [RFC3517], since SACK is widely deployed at this
   point ([MAF05] indicates that 68% of web servers and 88% of web
   clients utilize SACK as of spring, 2004).

   We note that the TCP-NCR algorithm provided in this document could be
   easily adapted to SCTP [RFC2960] since SCTP uses congestion control
   algorithms similar to TCP's (and, hence, has the same reordering
   robustness issues).

   As we note in several places in the remainder of this document, we
   consider TCP-NCR to be experimental in that more experience with the
   techniques is required before TCP-NCR should be used on a large scale
   on the Internet. We encourage implementation and experimentation
   with TCP-NCR in the hopes of gaining an understanding of its
   suitability for wide-scale deployment.

   The remainder of this document is organized as follows. Section 2
   provides a high-level description of the TCP-NCR mechanisms. In
   Section 3, we specify the TCP-NCR algorithm. Section 4 provides a
   brief overview of the benefits of TCP-NCR, while Section 5 discusses
   the drawbacks of TCP-NCR. Section 6 discusses related work. Section
   7 discusses security concerns.

2. NCR Description

   As discussed above, in the face of packet reordering, three duplicate
   ACKs may not be enough to disambiguate loss from reordering. In this
   section we provide a non-normative sketch of TCP-NCR. The detailed
   algorithms for implementing Non-Congestion Robustness for TCP are
   presented in the next section.

   The general idea behind TCP-NCR is to increase the threshold used to
   trigger a fast retransmission from the current fixed value of three
   duplicate ACKs [RFC2581] to approximately a congestion window of data
   having left the network (but, not less than the currently
   standardized value of three duplicate ACKs). Since cwnd represents
   the amount of data a TCP flow can transmit in one round-trip time
   (RTT), waiting to receive notice that cwnd bytes have left the
   network before deciding whether the root cause is loss or reordering
   imposes a delay of roughly one RTT on both the retransmission and the
   congestion control response. The appropriate choice for a new value
   of the threshold is essentially a tradeoff between making the best
   decision regarding the cause of the duplicate ACKs and
   responsiveness. The choice to trigger a retransmission only after a
   cwnd's worth of data is known to have left the network represents
   roughly the largest amount of time a TCP can wait before the (often
   costly) retransmission timeout may be triggered. Therefore, the
   algorithm described in this document attempts to make the best
   decision possible at the expense of timeliness.

   Simply increasing the threshold before retransmitting a segment can
   make TCP brittle to packet loss or ACK loss since such loss reduces
   the number of duplicate ACKs that will arrive at the sender from the
   receiver. For instance, if the cwnd is 10 segments and one segment
   is lost, a duplicate ACK threshold of 10 will never be met because
   duplicate ACKs corresponding to at most 9 segments will arrive at the
   sender. To offset the issue of loss, we extend TCP's Limited
   Transmit [RFC3042] scheme to allow for the sending of new data during
   the period when the TCP sender is disambiguating loss and reordering.
   This new data serves to increase the likelihood of enough duplicate
   ACKs arriving at the sender to trigger loss recovery if it is
   appropriate.

   At this point we note that TCP tightly couples reliability and
   congestion control -- when a segment is declared lost, a
   retransmission is triggered and a change to the sending rate is also
   made on the assumption that the drop is due to resource contention
   [RFC2581]. Therefore, by simply changing the retransmission trigger
   the congestion control response is also changed. However, we lack
   experience on the Internet as to whether delaying the point that a
   rate reduction takes place is appropriate for wide-scale deployment.
   Therefore, the Extended Limited Transmit mechanism proposed in this
   document offers two variants for experimentation.

   The first Extended Limited Transmit variant, Careful Limited
   Transmit, calls for the transmission of one previously unsent
   segment, in response to duplicate acknowledgements, for every two
   segments that are known to have left the network. This has the
   effect of halving the sending rate since normal TCP operation calls
   for the sending of one segment for every segment that has left the
   network. Further, the halving starts immediately and is not delayed
   until a retransmission is triggered. In the case of packet
   reordering (i.e., not segment loss) the congestion control state is
   restored to its previous state when reordering is determined.

   The second variant, Aggressive Limited Transmit, calls for
   transmitting one previously unsent data segment, in response to
   duplicate acknowledgements, for every segment known to have left the
   network. With this variant, while waiting to disambiguate the loss
   from a reordering event, ACK-clocked transmission continues at
   roughly the same rate as before the event started.
   Retransmission and the sending rate reduction happen per
   [RFC2581,RFC3517], albeit with the delayed threshold described above.
   While this approach delays legitimate rate reductions (possibly
   slightly and temporarily aggravating overall congestion on the
   network) the scheme has the advantage of not reducing the
   transmission rate in the face of segment reordering.

   It is an open question which of the two Extended Limited Transmit
   variants is best for use on the Internet.

3. Algorithm

   The TCP-NCR modifications make two fundamental changes to the way
   [RFC3517] currently operates, as follows.

   First, the trigger for retransmitting a segment is changed from three
   duplicate ACKs [RFC2581,RFC3517] to indications that a congestion
   window's worth of data has left the network. Second, TCP-NCR
   decouples initial congestion control decisions from retransmission
   decisions, in some cases delaying congestion control changes relative
   to TCP's current behavior defined in [RFC2581]. The algorithm
   provides two alternatives for extending Limited Transmit. The two
   variants of extended Limited Transmit are:

      Careful Limited Transmit:

          This variant calls for reducing the sending rate at
          approximately the same time [RFC2581] implementations reduce
          the congestion window, while at the same time withholding a
          retransmission (and the final congestion determination) for
          approximately one RTT.

      Aggressive Limited Transmit:

          This variant calls for maintaining the sending rate in the
          face of duplicate ACKs until TCP concludes a segment is lost
          and needs to be retransmitted (which TCP-NCR delays by one
          RTT when compared with current loss recovery schemes).

   A TCP-NCR implementation MUST use either Careful Limited Transmit or
   Aggressive Limited Transmit.
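
   As a rough illustration of how the two variants pace new data, the
   Python sketch below counts the previously unsent segments each
   variant would release while one window's worth of segments is
   reported as having left the network. The function name and the
   variant strings are hypothetical, and the sketch ignores ACK loss,
   application limits and the receiver's advertised window.

```python
def new_segments_released(departed_segments, variant):
    """Count new segments sent while `departed_segments` segments are
    reported, one duplicate ACK each, as having left the network."""
    sent = 0
    for nth_departure in range(1, departed_segments + 1):
        if variant == "aggressive":
            sent += 1                    # one new segment per departure
        elif variant == "careful":
            if nth_departure % 2 == 0:   # one new segment per two departures
                sent += 1
        else:
            raise ValueError("unknown variant: " + variant)
    return sent


cwnd = 10  # congestion window in segments when the duplicate ACKs begin
for variant in ("careful", "aggressive"):
    sent = new_segments_released(cwnd, variant)
    print(variant, "new segments:", sent, "total sent:", cwnd + sent)
```

   For a window of 10 segments the Careful variant releases 5 new
   segments and the Aggressive variant releases 10, so the data sent
   since the start of the window grows to roughly 1.5 and 2 times cwnd
   respectively.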

   A constant MUST be set depending on which variant of extended Limited
   Transmit is used, as follows:

      Careful Limited Transmit:

          LT_F = 2/3

      Aggressive Limited Transmit:

          LT_F = 1/2

   This constant reflects the fraction of outstanding data (including
   data sent during Extended Limited Transmit) that must be SACKed
   before a retransmission is triggered. Since Aggressive Limited
   Transmit sends a new segment for every segment known to have left the
   network, a total of roughly cwnd segments will be sent during
   Aggressive Limited Transmit and therefore ideally a total of roughly
   2*cwnd segments will be outstanding when a retransmission is
   triggered. The duplicate ACK threshold is then set to LT_F = 1/2 of
   2*cwnd (or about 1 RTT worth of data). The factor is different for
   Careful Limited Transmit because the sender only transmits one new
   segment for every two segments that are SACKed and therefore will
   ideally have a total of 1.5*cwnd segments outstanding when the
   retransmission is to be triggered. Hence, the required threshold is
   LT_F = 2/3 of 1.5*cwnd to delay the retransmission by roughly 1 RTT.

   There are situations whereby the sender cannot transmit new data
   during Extended Limited Transmit (e.g., lack of data from the
   application, receiver's advertised window limit). These situations
   can lead to the problems discussed in the last section when a TCP
   does not employ Extended Limited Transmit and is starved for ACKs.
   Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK
   arrival to be as robust as possible given the actual amount of data
   that has been transmitted, or roughly LT_F times the number of
   outstanding segments.

   The TCP-NCR modifications specified in this document lend themselves
   to incremental deployment. Only the TCP implementation on the sender
   side requires modification (assuming both hosts support SACK).
   The changes themselves are modest. However, as will be discussed
   below, availability of additional buffer space at the receiver will
   help maximize the benefits of using TCP-NCR but is not strictly
   necessary.

   The following algorithms depend on the notions provided by [RFC3517]
   and we assume the reader is familiar with the terminology given in
   [RFC3517]. The TCP-NCR algorithm can be adapted to alternate
   SACK-based loss recovery schemes. [BR04,BSRV04] outline
   non-SACK-based algorithms; however, we do not specify those
   algorithms in this document and do not recommend them due to both the
   complexity and security implications of having only a gross
   understanding of the number of outstanding segments in the network.

   A TCP connection using the Nagle algorithm [RFC896,RFC1122] MAY
   employ the TCP-NCR algorithm. If a TCP implementation does implement
   TCP-NCR the implementation MUST follow the various specifications
   provided in sections 3.1 - 3.4. If the Nagle algorithm is not being
   used there is no way to accurately calculate the number of
   outstanding segments in the network (and, therefore, no good way to
   derive an appropriate duplicate ACK threshold) without adding state
   to the TCP sender. A TCP connection that does not employ the Nagle
   algorithm SHOULD NOT use TCP-NCR. We envision that NCR could be
   adapted to an implementation that carefully tracks the sequence
   numbers transmitted in each segment. However, we leave this as
   future work.

3.1. Initialization

   When entering a period of loss / reordering detection and Extended
   Limited Transmit a TCP-NCR MUST initialize several state variables.
   A TCP MUST enter Extended Limited Transmit upon receiving the first
   ACK with a SACK block after the reception of an ACK that (a) did not
   contain SACK information and (b) did increase the connection's
   cumulative ACK point.
   The initializations are:

   (I.1) The TCP MUST save the current FlightSize.

             FlightSizePrev = FlightSize

   (I.2) The TCP MUST set a variable for tracking the number of
         segments for which an ACK does not trigger a transmission
         during Careful Limited Transmit.

             Skipped = 0

         (Note: Skipped is not used during Aggressive Limited
         Transmit.)

   (I.3) The TCP MUST set DupThresh (from [RFC3517]) based on the
         current FlightSize.

             DupThresh = max (LT_F * (FlightSize / SMSS),3)

         Note: We keep the lower bound of DupThresh = 3 from
         [RFC2581,RFC3517].

   In addition to the above steps, the incoming ACK MUST be processed
   with the E series of steps in section 3.3.

3.2. Terminating Extended Limited Transmit and Preventing Bursts

   Extended Limited Transmit MUST be terminated at the start of loss
   recovery as outlined in section 3.4.

   The arrival of an ACK that advances the cumulative ACK point while in
   Extended Limited Transmit, but before loss recovery is triggered,
   signals that a series of duplicate ACKs were caused by reordering and
   not congestion. Therefore, the receipt of an ACK that extends the
   cumulative ACK point MUST terminate Extended Limited Transmit. As
   described below (in (T.4)), an ACK that extends the cumulative ACK
   point and *also* contains SACK information will also trigger the
   beginning of a new Extended Limited Transmit phase.

   Upon the termination of Extended Limited Transmit, and especially
   when using the Careful variant, TCP-NCR may be in a situation where
   the entire cwnd is not being utilized and therefore TCP-NCR will be
   prone to transmitting a burst of segments into the network.
   Therefore, to mitigate this bursting when a TCP-NCR in the Extended
   Limited Transmit phase receives an ACK that updates the cumulative
   ACK point (regardless of whether the ACK contains SACK information),
   the following steps MUST be taken:

   (T.1) A TCP MUST reset cwnd to:

             cwnd = min (FlightSize + SMSS,FlightSizePrev)

         This step ensures that cwnd is not grossly larger than the
         amount of data outstanding --- a situation that would cause a
         line rate burst.

   (T.2) A TCP MUST set ssthresh to:

             ssthresh = FlightSizePrev

         This step provides TCP-NCR with a sense of "history". If step
         (T.1) reduces cwnd below FlightSizePrev this step ensures that
         TCP-NCR will slow start back to the operating point in effect
         before Extended Limited Transmit.

   (T.3) A TCP is now permitted to transmit previously unsent data as
         allowed by cwnd, FlightSize, application data availability and
         the receiver's advertised window.

   (T.4) When an incoming ACK extends the cumulative ACK point and also
         contains SACK information, the initializations in steps (I.2)
         and (I.3) from section 3.1 MUST be taken (but, step (I.1) MUST
         NOT be executed) to re-start Extended Limited Transmit. In
         addition, the series of steps in section 3.3 (the "E" steps)
         MUST be taken.

3.3. Extended Limited Transmit

   On each ACK containing SACK information that arrives after TCP-NCR
   has entered the Extended Limited Transmit phase (as outlined in
   section 3.1) and before Extended Limited Transmit terminates, the
   sender MUST use the following procedure.

   (E.1) The SetPipe () procedure from [RFC3517] MUST be used to set
         the "pipe" variable (which represents the number of bytes
         still considered "in the network"). Note: the current value
         of DupThresh MUST be used by SetPipe () to produce an accurate
         assessment of the amount of data still considered in the
         network.

   (E.2) If the comparison in equation (1) below holds and there are
         SMSS bytes of previously unsent data available for
         transmission then the sender MUST transmit one segment of SMSS
         bytes.

             (pipe + Skipped) <= (FlightSizePrev - SMSS)     (1)

         If the comparison in equation (1) does not hold or no new data
         can be transmitted (due to lack of data from the application
         or the advertised window limit), skip to step (E.6).

   (E.3) Pipe MUST be incremented by SMSS bytes.

   (E.4) If using Careful Limited Transmit, Skipped MUST be incremented
         by SMSS bytes to ensure that the next SMSS bytes of SACKed data
         processed does not trigger a Limited Transmit transmission
         (since the goal of Careful Limited Transmit is to send upon
         the reception of every second duplicate ACK).

   (E.5) A TCP MUST return to step (E.2) to ensure that as many bytes
         as appropriate are transmitted. This provides robustness to
         ACK loss that can be (largely) compensated for using SACK
         information.

   (E.6) DupThresh MUST be reset via:

             DupThresh = max (LT_F * (FlightSize / SMSS),3)

         where FlightSize is the total number of bytes that have not
         been cumulatively acknowledged (which is different from
         "pipe").

3.4. Entering Loss Recovery

   When a segment is deemed lost via the algorithms in [RFC3517],
   Extended Limited Transmit MUST be terminated, leaving the
   algorithms in [RFC3517] to govern TCP's behavior. One slight
   change to [RFC3517] MUST be made, however. In section 5, step
   (2) of [RFC3517] MUST be changed to:

       (2) ssthresh = cwnd = (FlightSizePrev / 2)

   This ensures that the congestion control modifications are made
   with respect to the amount of data in the network before
   FlightSize was increased by Extended Limited Transmit.

   Note: Once the algorithm in [RFC3517] takes over from Extended
   Limited Transmit the DupThresh value MUST be held constant until
   the loss recovery phase is terminated.
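
   The steps of sections 3.1 through 3.4 can be sketched as a small
   Python class. This is a minimal illustration under stated
   assumptions, not a reference implementation: the class, method and
   parameter names are hypothetical, the value of "pipe" is supplied by
   the caller in place of SetPipe () from [RFC3517], the receiver's
   advertised window is not modelled, and DupThresh is kept as an exact
   fraction because the text does not state how to round it.

```python
from dataclasses import dataclass
from fractions import Fraction

# LT_F constants for the two variants of Extended Limited Transmit.
LT_F = {"careful": Fraction(2, 3), "aggressive": Fraction(1, 2)}


def dup_thresh(lt_f, flight_size, smss):
    """Steps (I.3) and (E.6): max(LT_F * (FlightSize / SMSS), 3)."""
    return max(lt_f * Fraction(flight_size, smss), 3)


@dataclass
class NcrSender:
    variant: str               # "careful" or "aggressive"
    smss: int                  # sender maximum segment size, bytes
    cwnd: int                  # congestion window, bytes
    ssthresh: int              # slow start threshold, bytes
    flight_size: int           # bytes not yet cumulatively acknowledged
    flight_size_prev: int = 0
    skipped: int = 0
    dup_thresh: Fraction = Fraction(3)

    def enter_extended_limited_transmit(self):
        self.flight_size_prev = self.flight_size            # (I.1)
        self.restart_extended_limited_transmit()

    def restart_extended_limited_transmit(self):
        self.skipped = 0                                    # (I.2)
        self.dup_thresh = dup_thresh(                       # (I.3)
            LT_F[self.variant], self.flight_size, self.smss)

    def on_sack(self, pipe, unsent_bytes):
        """Steps (E.2)-(E.6); `pipe` is the caller's SetPipe() result (E.1).

        Returns the number of new SMSS-sized segments to transmit.
        """
        sent = 0
        while (pipe + self.skipped <= self.flight_size_prev - self.smss
               and unsent_bytes >= self.smss):              # (E.2)
            sent += 1
            unsent_bytes -= self.smss
            self.flight_size += self.smss   # new data is now outstanding
            pipe += self.smss                               # (E.3)
            if self.variant == "careful":
                self.skipped += self.smss                   # (E.4)
            # (E.5): loop back to the test in (E.2)
        self.dup_thresh = dup_thresh(                       # (E.6)
            LT_F[self.variant], self.flight_size, self.smss)
        return sent

    def on_cumulative_ack(self, flight_size, has_sack=False):
        """Section 3.2: the cumulative ACK point advanced (reordering)."""
        self.flight_size = flight_size
        self.cwnd = min(self.flight_size + self.smss,
                        self.flight_size_prev)              # (T.1)
        self.ssthresh = self.flight_size_prev               # (T.2)
        if has_sack:
            self.restart_extended_limited_transmit()        # (T.4)

    def on_loss(self):
        """Section 3.4: modified step (2) of [RFC3517], section 5."""
        self.ssthresh = self.cwnd = self.flight_size_prev // 2
```

   For example, with SMSS = 1000 bytes and FlightSize = 10000 bytes, a
   Careful sender starts with DupThresh = 20/3 segments, sends one new
   segment when the first SACK leaves pipe at 9000 bytes, and sends
   nothing if the next SACK again yields 9000 bytes, because Skipped has
   grown by one SMSS.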

4. Advantages

   The major advantages of TCP-NCR are two-fold. As discussed in
   section 1, TCP-NCR will open up the design space for network
   applications and components that are currently constrained by TCP's
   lack of robustness to packet reordering. The second advantage is in
   terms of an increase in TCP performance.

   [BR04] presents ns-2 [NS-2] simulations of a precursor to the
   TCP-NCR algorithm specified in this document, called TCP-DCR
   (Delayed Congestion Response). The paper shows that TCP-DCR aids
   performance in comparison to unmodified TCP in the presence of
   packet reordering. In addition, the extended version of [BR04]
   presents results based on emulations involving Linux (kernel
   2.4.24). These results show that the performance of TCP-DCR is
   similar to Linux's native implementation that seeks to "undo" wrong
   decisions based on DSACK [RFC2883] feedback (similar to the schemes
   outlined in [ZKFP03]), when packets are reordered by less than one
   RTT. The advantage of using TCP-DCR over the DSACK-based scheme is
   that the DSACK-based scheme tries to estimate the exact amount of
   reordering in the network using fairly complex algorithms, whereas
   TCP-DCR achieves similar results with less complicated
   modifications.

   In addition, [BR04,BSRV04] illustrate the ability of TCP-DCR to allow
   for the improvement of other parts of the system. For example, these
   papers show that increasing TCP's robustness to packet reordering
   allows for a novel wireless ARQ mechanism to be added at the
   link-layer. The added robustness of the link-layer to channel
   errors, in turn, increases TCP performance by not requiring TCP to
   retransmit packets that were dropped due to corruption (and, hence,
   also prevents TCP from needlessly reducing the sending rate when
   retransmitting these segments).

5. Disadvantages

   While we note that all of the changes outlined above are implemented
   in the sender, the receiver also potentially has a part to play. In
   particular, TCP-NCR increases the receiver's buffering requirement by
   up to an extra cwnd -- in the case of the TCP sender using Aggressive
   Limited Transmit and actual loss occurring in the network.
   Therefore, to maximize the benefits from TCP-NCR, receivers should
   advertise a large window to absorb the extra out-of-order traffic. In
   the case that the additional buffer requirements are not met, the use
   of the above algorithm takes into account the reduced advertised
   window---with a corresponding loss in robustness to packet
   reordering.

   In addition, using TCP-NCR could delay the delivery of data to the
   application by up to one RTT because the fast retransmission point is
   delayed by roughly one RTT in TCP-NCR. Applications that are
   sensitive to such delays should turn off the TCP-NCR option. For
   instance, a socket option could be introduced to allow applications
   to control whether NCR would be used for a particular connection.

   Finally, the use of TCP-NCR makes the recovery from congestion events
   sluggish in comparison to the standard reaction in [RFC2581]. [BR04,
   BSRV04] show (via simulation) that the delay in congestion response
   has minimal impact on the connection itself and the traffic sharing a
   bottleneck. [BBFS01] also indicates (again, via simulation) that
   "slowly responsive" congestion control may be safe for deployment in
   the Internet. These studies suggest that schemes that slightly delay
   congestion control decisions may be reasonable; however, further
   experimentation on the Internet is required to verify these results.

6. Related Work

   Over the past few years, several solutions have been proposed to
   improve the performance of TCP in the face of segment reordering.
   These schemes generally fall into one of two categories (with some
   overlap): mechanisms that try to prevent spurious retransmits from
   happening and mechanisms that try to detect spurious retransmits
   and "undo" the needless congestion control state changes that have
   been made.

   [BA02,ZKFP03] attempt to prevent segment reordering from triggering
   spurious retransmits by using various algorithms to approximate the
   duplicate ACK threshold required to disambiguate loss and
   reordering over a given network path at a given time. TCP-NCR
   similarly tries to prevent spurious retransmits. However, TCP-NCR
   takes a simplified approach compared to those in [BA02,ZKFP03] in
   that TCP-NCR simply delays retransmission by an amount based on the
   current cwnd (in comparison to standard TCP), while the other
   schemes use relatively complex algorithms in an attempt to derive a
   more precise value for DupThresh that depends on the current
   patterns of packet reordering. While TCP-NCR offers simplicity,
   the other schemes may offer more precision such that applications
   would not be forced to wait as long for their retransmissions.
   Future work could be undertaken to achieve robustness without
   needless delay.

   On the other hand, several schemes have been developed to detect
   and mitigate needless retransmissions after the fact.
   [RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect
   spurious retransmits and mitigate the changes these events made to
   the congestion control state. TCP-NCR could be used in conjunction
   with these algorithms, with TCP-NCR attempting to prevent spurious
   retransmits and some other scheme kicking in if the prevention
   fails. In addition, we note that TCP-NCR concentrates on
   preventing spurious fast retransmits, whereas some of the above
   algorithms also attempt to detect and mitigate spurious
   timeout-based retransmits.

7. Security Considerations

   We do not believe there are security implications involved with
   TCP-NCR over and above those for general TCP congestion control
   [RFC2581]. In particular, the Extended Limited Transmit algorithms
   specified in this document have been specifically designed not to
   be susceptible to the sorts of ACK splitting attacks to which
   general TCP congestion control is vulnerable (as discussed in
   [RFC3465]).

8. Acknowledgements

   Feedback from Lars Eggert, Ted Faber, Wesley Eddy, Gorry Fairhurst,
   Sally Floyd, Sara Landstrom, Nauzad Sadry, Pasi Sarolahti, Joe
   Touch, Nitin Vaidya, and the TCPM working group has contributed
   significantly to this document. Our thanks to all!

9. IANA Considerations

   This document requires no IANA assignments. The RFC Editor can
   safely remove this section.

10. Normative References

   [RFC793] J. Postel, "Transmission Control Protocol", RFC 793,
   September 1981.

   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP
   Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion
   Control", RFC 2581, April 1999.

   [RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's
   Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

   [RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A
   Conservative Selective Acknowledgment (SACK)-based Loss Recovery
   Algorithm for TCP", RFC 3517, April 2003.

11. Informative References

   [BA02] E. Blanton and M. Allman, "On Making TCP More Robust to
   Packet Reordering", ACM Computer Communication Review, January
   2002.

   [BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker,
   "Dynamic Behavior of Slowly Responsive Congestion Control
   Algorithms", Proceedings of ACM SIGCOMM, September 2001.
   [BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet
   reordering is not pathological network behavior", IEEE/ACM
   Transactions on Networking, December 1999.

   [BR04] Sumitha Bhandarkar and A. L. Narasimha Reddy, "TCP-DCR:
   Making TCP Robust to Non-Congestion Events", Proceedings of the
   Networking 2004 conference, May 2004. Extended version available
   as tech report TAMU-ECE-2003-04.

   [BSRV04] Sumitha Bhandarkar, Nauzad Sadry, A. L. Narasimha Reddy
   and Nitin Vaidya, "TCP-DCR: A Novel Protocol for Tolerating
   Wireless Channel Errors", to appear in IEEE Transactions on Mobile
   Computing.

   [GPL04] Ladan Gharai, Colin Perkins and Tom Lehman, "Packet
   Reordering, High Speed Networks and Transport Protocol
   Performance", ICCCN 2004, October 2004.

   [Jac88] V. Jacobson, "Congestion Avoidance and Control", Computer
   Communication Review, vol. 18, no. 4, pp. 314-329, August 1988.
   ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

   [JIDKT03] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D.
   Towsley, "Measurement and Classification of Out-of-Sequence Packets
   in a Tier-1 IP Backbone", Proceedings of IEEE INFOCOM, 2003.

   [KM02] I. Keslassy and N. McKeown, "Maintaining packet order in
   two-stage switches", Proceedings of IEEE INFOCOM, June 2002.

   [MAF05] A. Medina, M. Allman, and S. Floyd, "Measuring the
   Evolution of Transport Protocols in the Internet", ACM Computer
   Communication Review, 35(2), April 2005.

   [NS-2] ns-2 Network Simulator. http://www.isi.edu/nsnam/

   [Pax97] V. Paxson, "End-to-End Internet Packet Dynamics",
   Proceedings of ACM SIGCOMM, September 1997.

   [RFC896] J. Nagle, "Congestion Control in IP/TCP Internetworks",
   RFC 896, January 1984.

   [RFC1122] R. Braden, "Requirements for Internet Hosts -
   Communication Layers", RFC 1122, October 1989.

   [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis and Matt
   Podolsky, "An Extension to the Selective Acknowledgement (SACK)
   Option for TCP", RFC 2883, July 2000.

   [RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.
   Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, and V.
   Paxson, "Stream Control Transmission Protocol", RFC 2960, October
   2000.

   [RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
   Timer", RFC 2988, November 2000.

   [RFC3465] M. Allman, "TCP Congestion Control with Appropriate Byte
   Counting (ABC)", RFC 3465, February 2003.

   [RFC3522] R. Ludwig and M. Meyer, "The Eifel Detection Algorithm
   for TCP", RFC 3522, April 2003.

   [RFC3708] E. Blanton and M. Allman, "Using TCP Duplicate Selective
   Acknowledgement (DSACKs) and Stream Control Transmission Protocol
   (SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect
   Spurious Retransmissions", RFC 3708, February 2004.

   [RFC4015] R. Ludwig and A. Gurtov, "The Eifel Response Algorithm
   for TCP", RFC 4015, February 2005.

   [SK04] P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO): An
   Algorithm for Detecting Spurious Retransmission Timeouts with TCP
   and SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in
   progress), November 2004.

   [ZKFP03] M. Zhang, B. Karp, S. Floyd, and L. Peterson, "RR-TCP: A
   Reordering-Robust TCP with DSACK", Proceedings of the Eleventh IEEE
   International Conference on Networking Protocols (ICNP 2003),
   Atlanta, GA, November 2003.

12. Authors' Addresses

   Sumitha Bhandarkar
   Dept. of Elec. Engg.
   214 ZACH
   College Station, TX 77843-3128
   Phone: (512) 468-8078
   Email: sumitha@tamu.edu
   URL: http://students.cs.tamu.edu/sumitha/

   A. L. Narasimha Reddy
   Professor
   Dept. of Elec. Engg.
   315C WERC
   College Station, TX 77843-3128
   Phone: (979) 845-7598
   Email: reddy@ee.tamu.edu
   URL: http://ee.tamu.edu/~reddy/

   Mark Allman
   ICSI Center for Internet Research
   1947 Center Street, Suite 600
   Berkeley, CA 94704-1198
   Phone: (216) 243-7361
   Email: mallman@icir.org
   URL: http://www.icir.org/mallman/

   Ethan Blanton
   Purdue University Computer Science
   250 North University Street
   West Lafayette, IN 47907
   Email: eblanton@cs.purdue.edu

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.
Disclaimer of Validity

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
   ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006). This document is
   subject to the rights, licenses and restrictions contained in BCP
   78, and except as set forth therein, the authors retain all their
   rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.