idnits 2.17.1 draft-ietf-tcpm-tcp-dcr-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 18. -- Found old boilerplate from RFC 3978, Section 5.5 on line 764. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 741. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 748. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 754. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 16 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 17 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Abstract section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 2005) is 6791 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2988' is defined on line 672, but no explicit reference was found in the text ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675) -- Obsolete informational reference (is this intentional?): RFC 896 (Obsoleted by RFC 7805) -- Obsolete informational reference (is this intentional?): RFC 2960 (Obsoleted by RFC 4960) -- Obsolete informational reference (is this intentional?): RFC 2988 (Obsoleted by RFC 6298) Summary: 9 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Sumitha Bhandarkar 3 INTERNET DRAFT A. L. 
Narasimha Reddy 4 draft-ietf-tcpm-tcp-dcr-05.txt Texas A&M University 5 Expires: April 2005 Mark Allman 6 ICIR/ICSI 7 Ethan Blanton 8 Purdue University 9 September 2005 11 Improving the Robustness of TCP to Non-Congestion Events 13 Status of this Memo 15 By submitting this Internet-Draft, each author represents that any 16 applicable patent or other IPR claims of which he or she is aware 17 have been or will be disclosed, and any of which he or she becomes 18 aware will be disclosed, in accordance with Section 6 of BCP 79. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as Internet- 23 Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six months 26 and may be updated, replaced, or obsoleted by other documents at any 27 time. It is inappropriate to use Internet-Drafts as reference 28 material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt. 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract: 42 This document specifies Non-Congestion Robustness (NCR) for TCP. In 43 the absence of explicit congestion notification from the network, 44 TCP's loss recovery algorithms treat the receipt of three duplicate 45 acknowledgments as an implicit indication of congestion in the 46 network. This is not always correct, notably in the case when 47 network paths reorder segments (for whatever reason), resulting in 48 degraded performance. 
TCP-NCR is designed to mitigate this degraded 49 performance by increasing the number of duplicate acknowledgments 50 required to trigger loss recovery, based on the current state of the 51 connection, in an effort to better disambiguate true segment loss 52 from segment reordering. This document specifies the changes to TCP, 53 as well as the costs and benefits of these modifications. 55 Terminology 57 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 58 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 59 "OPTIONAL" in this document are to be interpreted as described 60 in [RFC2119]. 62 Readers should be familiar with the TCP terminology given in 63 [RFC2581] and [RFC3517]. 65 1. Introduction 67 One strength of TCP [RFC793] lies in its ability to adjust its 68 sending rate according to the perceived congestion in the network 69 [Jac88,RFC2581]. In the absence of explicit notification of 70 congestion from the network, TCP uses segment loss as an indication 71 of congestion (i.e., assuming queue overflow). TCP receivers send 72 cumulative acknowledgments (ACKs) indicating the next sequence number 73 expected from the sender for arriving segments [RFC793]. When 74 segments arrive out-of-order, duplicate ACKs are generated. As 75 specified in [RFC2581], a TCP sender uses the arrival of three 76 duplicate ACKs as an indication of segment loss. The TCP sender 77 retransmits the lost segment and reduces the load imposed on the 78 network, assuming the segment loss was caused by resource contention 79 within the network path. The TCP sender does not assume loss on the 80 first or second duplicate ACK, but waits for three duplicate ACKs to 81 account for mild reordering. However, the use of this constant 82 threshold of duplicate ACKs has several problems that can be 83 mitigated with a dynamic threshold. 85 The following is an example of TCP's behavior: 87 + TCP A is the data sender and TCP B is the data receiver. 
89 + TCP A sends 10 segments each consisting of a single data byte 90 (i.e., transmits bytes 1-10 in segments 1-10). 92 + Assume segment 3 is dropped in the network. 94 + TCP B cumulatively acknowledges segments 1 and 2, making the 95 cumulative ACK transmitted to the sender 3 (the next expected 96 sequence number). (Note: TCP B may generate one or two ACKs, 97 depending on whether delayed ACKs [RFC1122,RFC2581] are 98 employed.) 100 + The arrival of segments 4-10 at TCP B will each trigger the 101 transmission of a cumulative ACK for sequence number 3. (Note: 102 [RFC2581] recommends that delayed ACKs not be used when the ACK 103 is triggered by an out-of-order segment.) 105 + When TCP A receives the third duplicate ACK (or fourth ACK 106 overall) for sequence number 3, TCP A will retransmit 107 segment 3 and reduce the sending rate by roughly half (see 108 [RFC2581] for specifics on the congestion control state 109 adjustments). 111 Alternatively, suppose segment 3 was not dropped by the network, but 112 rather delayed such that segment 3 arrives after segment 10. The 113 above scenario will play out in precisely the same manner insomuch as 114 a retransmission of segment 3 will be triggered. In other words, TCP 115 is not capable of disambiguating this reordering event from a segment 116 loss. 118 The following is the specific motivation behind making TCP robust to 119 reordered segments: 121 * A number of Internet measurement studies have shown that packet 122 reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04]. 123 Further, the reordering can be well beyond that required for 124 fast retransmit to be falsely triggered. 126 * [BA02,ZKFP03] show the negative performance implications that 127 packet reordering has on current TCP. 129 * The requirement imposed by TCP for almost in-order packet 130 delivery places a constraint on the design of future technology. 
131 Novel routing algorithms, network components, link-layer 132 retransmission mechanisms and applications could all be looked 133 at with a fresh perspective if TCP were to be more robust to 134 segment reordering. For instance, high speed packet switches 135 could cause resequencing of packets if TCP were more robust. 136 There has been work proposed in the literature explicitly to 137 ensure that packet ordering is maintained in such switches 138 [KM02]. Also, link-layer mechanisms that attempt to recover 139 from packet corruption by retransmitting could be allowed to 140 reorder packets and, hence, increase the chances of local loss 141 repair rather than relying on TCP to repair the loss (and, 142 needlessly reduce its sending rate). Additional examples 143 include multi-path routing, high-delay satellite links and some 144 of the schemes proposed for differentiated services 145 architecture. By making TCP more robust to non-congestion 146 events, TCP-NCR may open the design space of the future Internet 147 components. 149 In this document we specify a set of TCP sender modifications to 150 provide Non-Congestion Robustness (NCR) to TCP. In particular, these 151 changes are built on top of TCP with selective acknowledgments 152 (SACKs) [RFC2018] and the SACK-based loss recovery scheme given in 153 [RFC3517], since SACK is widely deployed at this point ([MAF05] 154 indicates that 68% of web servers and 88% of web clients utilize SACK 155 as of spring, 2004). 157 Finally, we note that the TCP-NCR algorithm provided in this document 158 could be easily adapted to SCTP [RFC2960] since SCTP uses congestion 159 control algorithms similar to TCP's (and, hence, has the same 160 reordering robustness issues). 162 The remainder of this document is organized as follows. Section 2 163 provides a high-level description of the TCP-NCR mechanisms. In 164 Section 3, we specify the TCP-NCR algorithm. 
Section 4 provides a 165 brief overview of the benefits of TCP-NCR, while Section 5 discusses 166 the drawbacks of TCP-NCR. Section 6 discusses related work. Section 167 7 discusses security concerns. 169 2. NCR Description 171 As discussed above, in the face of packet reordering three duplicate 172 ACKs may not be enough to disambiguate loss from reordering. In this 173 section we provide a non-normative sketch of TCP-NCR. The detailed 174 algorithms for implementing Non-Congestion Robustness for TCP are 175 presented in the next section. 177 The general idea behind TCP-NCR is to increase the threshold used to 178 trigger a fast retransmission from the current fixed value of three 179 duplicate ACKs [RFC2581] to approximately a congestion window of data 180 having left the network (but, not less than the currently 181 standardized value of three duplicate ACKs). Since cwnd represents 182 the amount of data a TCP flow can transmit in one round-trip time 183 (RTT), waiting to receive notice that cwnd bytes have left the 184 network before deciding whether the root cause is loss or reordering 185 imposes a delay of roughly one RTT. The appropriate choice for a new 186 value of the threshold is essentially a tradeoff between making the 187 best decision regarding the cause of the duplicate ACKs and 188 responsiveness. The choice to trigger a retransmission only after a 189 cwnd's worth of data is known to have left the network represents 190 roughly the largest amount of time a TCP can wait before the (often 191 costly) retransmission timeout may be triggered. Therefore, the 192 algorithm described in this document attempts to make the best root 193 cause decision possible. 195 Simply increasing the threshold before retransmitting a segment can 196 make TCP brittle to packet loss or ACK loss since such loss reduces 197 the number of duplicate ACKs that will arrive at the sender from the 198 receiver. 
For instance, if the cwnd is 10 segments and one segment 199 is lost, a duplicate ACK threshold of 10 will never be met because 200 duplicate ACKs corresponding to at most 9 segments will arrive at the 201 sender. To offset the issue of loss, we extend TCP's Limited 202 Transmit [RFC3042] scheme to allow for the sending of new data during 203 the period when the TCP sender is disambiguating loss and reordering. 204 This new data serves to increase the likelihood of enough duplicate 205 ACKs arriving at the sender to trigger loss recovery if it is 206 appropriate. 208 At this point we note that TCP tightly couples reliability and 209 congestion control -- when a segment is declared lost, a 210 retransmission is triggered and a change to the sending rate is also 211 made on the assumption that the drop is due to resource contention 212 [RFC2581]. Therefore, by simply changing the retransmission trigger 213 the congestion control response is also changed. However, we lack 214 experience on the Internet as to whether delaying the point that a 215 rate reduction takes place is appropriate for wide-scale deployment. 216 Therefore, the Extended Limited Transmit mechanism proposed in this 217 document offers two variants for experimentation. 219 The first Extended Limited Transmit variant, Careful Limited 220 Transmit, calls for the transmission of one previously unsent 221 segment, in response to duplicate acknowledgements, for every two 222 segments that are known to have left the network. This has the 223 effect of halving the sending rate since normal TCP operation calls 224 for the sending of one segment for every segment that has left the 225 network. Further, the halving starts immediately and is not delayed 226 until a retransmission is triggered. In the case of packet 227 reordering (i.e., not segment loss) the congestion control state is 228 restored to its previous state when reordering is determined. 
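As a non-normative illustration of the halving effect described above, the following Python sketch counts how many previously unsent segments Careful Limited Transmit would release for a given number of segments known to have left the network, and compares that with normal ACK-clocked operation. The function names are hypothetical and are not part of this specification; the sketch counts whole segments only and ignores the byte-level accounting that the detailed algorithm uses.

```python
def careful_new_segments(departed: int) -> int:
    """New segments sent under Careful Limited Transmit (illustrative only).

    `departed` is the number of segments that duplicate ACKs have
    reported as having left the network.
    """
    sent = 0
    pending = 0  # departures not yet matched by a transmission
    for _ in range(departed):
        pending += 1
        if pending == 2:  # every second departure releases one new segment
            sent += 1
            pending = 0
    return sent


def normal_new_segments(departed: int) -> int:
    """Normal ACK-clocked TCP: one new segment per departed segment."""
    return departed
```

For 10 departed segments, this sketch releases 5 new segments where normal operation would release 10, which is the halving of the sending rate noted above.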
230 The second variant, Aggressive Limited Transmit, calls for 231 transmitting one previously unsent data segment, in response to 232 duplicate acknowledgements, for every segment known to have left the 233 network. With this variant, while waiting to disambiguate the loss 234 from a reordering event, ACK-clocked transmission continues at 235 roughly the same rate as before the event started. Retransmission 236 and the sending rate reduction happen per [RFC2581,RFC3517], albeit 237 with the delayed threshold described above. While this approach 238 delays legitimate rate reductions (possibly slightly and temporarily 239 aggravating overall congestion on the network) the scheme has the 240 advantage of not reducing the transmission rate in the face of 241 segment reordering. 243 It is an open question which of the two Extended Limited Transmit 244 variants is best for use on the Internet. 246 3. Algorithm 248 The TCP-NCR modifications make two fundamental changes to the way 249 [RFC3517] currently operates, as follows. 251 First, the trigger for retransmitting a segment is changed from three 252 duplicate ACKs [RFC2581,RFC3517] to indications that a congestion 253 window's worth of data has left the network. Second, TCP-NCR 254 decouples initial congestion control decisions from retransmission 255 decisions, in some cases delaying congestion control changes relative 256 to TCP's current behavior defined in [RFC2581]. The algorithm 257 provides two alternatives for extending Limited Transmit. The two 258 variants of extended Limited Transmit are: 260 Careful Limited Transmit: 262 This variant calls for reducing the sending rate at 263 approximately the same time [RFC2581] implementations reduce 264 the congestion window, while at the same time withholding a 265 retransmission (and the final congestion determination) for 266 approximately one RTT. 
268 Aggressive Limited Transmit: 270 This variant calls for maintaining the sending rate in the 271 face of duplicate ACKs until TCP concludes a segment is lost 272 and needs to be retransmitted (which TCP-NCR delays by one 273 RTT when compared with current loss recovery schemes). 275 A TCP-NCR implementation MUST use either Careful Limited Transmit or 276 Aggressive Limited Transmit. 278 A constant MUST be set depending on which variant of extended Limited 279 Transmit is used, as follows: 281 Careful Limited Transmit: 283 LT_F = 2/3 285 Aggressive Limited Transmit: 287 LT_F = 1/2 289 This constant reflects the fraction of outstanding data that must be 290 SACKed before a retransmission is triggered. Since Aggressive 291 Limited Transmit sends a new segment for every segment known to have 292 left the network, a total of roughly cwnd segments will be sent 293 during Aggressive Limited Transmit and therefore ideally a total of 294 2*cwnd segments will be outstanding. The duplicate ACK threshold is 295 then set to LT_F = 1/2 of 2*cwnd (or about 1 RTT worth of data). The 296 factor is different for Careful Limited Transmit because the sender 297 only transmits one new segment for every two segments that are SACKed 298 and therefore will ideally have a total of 1.5*cwnd segments 299 outstanding when the retransmission is to be triggered. Hence, the 300 required threshold is LT_F=2/3 of 1.5*cwnd to delay the 301 retransmission by roughly 1 RTT. 303 There are situations whereby the sender cannot transmit new data 304 during Extended Limited Transmit (e.g., lack of data from the 305 application, receiver's advertised window limit). These situations 306 can lead to the problems discussed in the last section when a TCP 307 does not employ Extended Limited Transmit and is starved for ACKs. 
308 Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK 309 arrival to be as robust as possible given the actual amount of data 310 that has been transmitted, or roughly LT_F times the number of 311 outstanding segments. 313 The TCP-NCR modifications specified in this document lend themselves 314 to incremental deployment. Only the TCP implementation on the sender 315 side requires modification. The changes themselves are modest. 316 However, as will be discussed below, availability of additional 317 buffer space at the receiver will help maximize the benefits of using 318 TCP-NCR but is not strictly necessary. 320 The following algorithms depend on the notions provided by [RFC3517] 321 and we assume the reader is familiar with the terminology given in 322 [RFC3517]. The TCP-NCR algorithm can be adapted to alternate SACK- 323 based loss recovery schemes. [BR04,BSRV04] outline non-SACK-based 324 algorithms; however, we do not specify those algorithms in this 325 document and do not recommend them due to both the complexity and 326 security implications of having only a gross understanding of the 327 number of outstanding segments in the network. 329 A TCP connection using the Nagle algorithm [RFC896,RFC1122] MAY 330 employ the TCP-NCR algorithm. If a TCP implementation does implement 331 TCP-NCR, the implementation MUST follow the various specifications 332 provided in sections 3.1 - 3.4. If the Nagle algorithm is not being 333 used, there is no way to accurately calculate the number of 334 outstanding segments in the network (and, therefore, no good way to 335 derive an appropriate duplicate ACK threshold) without adding state 336 to the TCP sender. A TCP connection that does not employ the Nagle 337 algorithm SHOULD NOT use TCP-NCR. We envision that NCR could be 338 adapted to an implementation that carefully tracks the sequence 339 numbers transmitted in each segment. However, we leave this as 340 future work. 342 3.1.
Initialization 344 When entering a period of loss / reordering detection and Extended 345 Limited Transmit a TCP-NCR MUST initialize several state variables. 346 A TCP MUST enter Extended Limited Transmit upon receiving the first 347 ACK with a SACK block after the reception of an ACK that (a) did not 348 contain SACK information and (b) did increase the connection's 349 cumulative ACK point. The initializations are: 351 (I.1) Save the current FlightSize. 353 FlightSizePrev = FlightSize 355 (I.2) Set a variable for tracking the number of segments for which 356 an ACK does not trigger a transmission during Careful Limited 357 Transmit. 359 Skipped = 0 361 (Note: Skipped is not used during Aggressive Limited 362 Transmit.) 364 (I.3) Set DupThresh (from [RFC3517]) based on the size of the 365 current FlightSize. 367 DupThresh = max (LT_F * (FlightSize / SMSS),3) 369 Note: We keep the lower bound of DupThresh = 3 from 370 [RFC2581,RFC3517]. 372 In addition to the above steps, the incoming ACK MUST be processed 373 with the E series of steps in section 3.3. 375 3.2. Terminating Extended Limited Transmit and Preventing Bursts 377 Extended Limited Transmit MUST be terminated at the start of loss 378 recovery as outlined in section 3.4. 380 The arrival of an ACK that advances the cumulative ACK point while in 381 Extended Limited Transmit, but before loss recovery is triggered 382 signals that a series of duplicate ACKs were caused by reordering and 383 not congestion. Therefore, the receipt of an ACK that extends the 384 cumulative ACK point MUST terminate Extended Limited Transmit. As 385 described below (in (T.4)), an ACK that extends the cumulative ACK 386 point and *also* contains SACK information will also trigger the 387 beginning of a new Extended Limited Transmit phase. 
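As a non-normative illustration, the initialization steps (I.1) through (I.3) of section 3.1 might be sketched in Python as follows. The class and function names are hypothetical and are not part of this specification. The text of step (I.3) does not say how a fractional DupThresh is to be rounded, so this sketch simply keeps the exact fraction; that choice is an assumption. FlightSize and SMSS are taken to be in bytes.

```python
from dataclasses import dataclass
from fractions import Fraction

# LT_F constants from section 3.
LT_F_CAREFUL = Fraction(2, 3)
LT_F_AGGRESSIVE = Fraction(1, 2)


@dataclass
class NcrState:
    flight_size_prev: int  # (I.1) FlightSize saved on entry, in bytes
    skipped: int           # (I.2) bytes whose ACKs did not trigger a send
    dup_thresh: Fraction   # (I.3) duplicate ACK threshold, in segments


def compute_dup_thresh(flight_size: int, smss: int, lt_f: Fraction) -> Fraction:
    """DupThresh = max (LT_F * (FlightSize / SMSS),3), kept unrounded."""
    return max(lt_f * Fraction(flight_size, smss), Fraction(3))


def enter_extended_limited_transmit(flight_size: int, smss: int,
                                    lt_f: Fraction) -> NcrState:
    """Apply (I.1), (I.2) and (I.3) when Extended Limited Transmit begins."""
    return NcrState(
        flight_size_prev=flight_size,
        skipped=0,
        dup_thresh=compute_dup_thresh(flight_size, smss, lt_f),
    )
```

With 20 segments of 1460 bytes outstanding, the sketch yields a DupThresh of 10 segments for Aggressive Limited Transmit and 40/3 (about 13.3) segments for Careful Limited Transmit. With only 4 segments outstanding, the lower bound of 3 applies to both variants.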
389 Upon the termination of Extended Limited Transmit, and especially 390 when using the Careful variant, TCP-NCR may be in a situation where 391 the entire cwnd is not being utilized and therefore TCP-NCR will be 392 prone to transmitting a burst of segments into the network. 393 Therefore, when a TCP-NCR in the Extended Limited Transmit phase 394 receives an ACK that updates the cumulative ACK point (regardless of 395 whether the ACK contains SACK information), the following steps MUST 396 be taken: 398 (T.1) cwnd = min (FlightSize + SMSS,FlightSizePrev) 400 This step ensures that cwnd is not grossly larger than the 401 amount of data outstanding --- a situation that would cause a 402 line rate burst. 404 (T.2) ssthresh = FlightSizePrev 406 This step provides TCP-NCR with a sense of "history". If step 407 (T.1) reduces cwnd below FlightSizePrev this step ensures that 408 TCP-NCR will slow start back to the operating point in effect 409 before Extended Limited Transmit. 411 (T.3) Transmit previously unsent data as allowed by cwnd, 412 FlightSize, application data availability and the receiver's 413 advertised window. 415 (T.4) When the ACK extends the cumulative ACK point and also 416 contains SACK information, the initializations in steps (I.2) 417 and (I.3) from section 3.1 MUST be taken (but, not step (I.1)) 418 to re-start Extended Limited Transmit. In addition, the 419 series of steps in section 3.3 (the "E" steps) MUST be taken. 421 3.3. Extended Limited Transmit 423 On each ACK containing SACK information that arrives after TCP-NCR 424 has entered the Extended Limited Transmit phase (as outlined in 425 section 3.1) and before Extended Limited Transmit terminates, the 426 sender MUST use the following procedure. 428 (E.1) Use the SetPipe () procedure from [RFC3517] to set the "pipe" 429 variable (which represents the number of bytes still considered 430 "in the network"). 
432 (E.2) If the comparison in equation (1) below holds and there are 433 SMSS bytes of previously unsent data available for 434 transmission then transmit one segment of SMSS bytes. 436 (pipe + Skipped) <= (FlightSizePrev - SMSS) (1) 438 If the comparison in equation (1) does not hold or no new data 439 can be transmitted (due to lack of data from the application 440 or the advertised window limit), skip to step (E.6). 442 (E.3) Increment pipe by SMSS bytes. 444 (E.4) If using Careful Limited Transmit, increment Skipped by SMSS 445 bytes to ensure that the next SMSS bytes of SACKed data 446 processed do not trigger a Limited Transmit transmission (since 447 the goal of Careful Limited Transmit is to send upon the 448 reception of every second duplicate ACK). 450 (E.5) Return to step (E.2) to ensure that as many bytes as 451 appropriate are transmitted. This provides robustness to ACK 452 loss that can be (largely) compensated for using SACK 453 information. 455 (E.6) Reset DupThresh via: 457 DupThresh = max (LT_F * (FlightSize / SMSS),3) 459 where FlightSize is the total number of bytes that have not 460 been cumulatively acknowledged (which is different from 461 "pipe"). 463 3.4. Entering Loss Recovery 465 When a segment is deemed lost via the algorithms in [RFC3517], 466 Extended Limited Transmit MUST be terminated, leaving the 467 algorithms in [RFC3517] to govern TCP's behavior. One slight 468 change to [RFC3517] MUST be made, however. In section 5, step 469 (2) of [RFC3517] MUST be changed to: 471 (2) ssthresh = cwnd = (FlightSizePrev / 2) 473 This ensures that the congestion control modifications are made 474 with respect to the amount of data in the network before 475 FlightSize was increased by Extended Limited Transmit. 477 4. Advantages 479 The major advantages of TCP-NCR are two-fold.
First, as discussed in 480 section 1, TCP-NCR will open up the design space for network 481 applications and components that are currently constrained by TCP's 482 lack of robustness to packet reordering. The second advantage is in 483 terms of an increase in TCP performance. 485 [BR04] presents ns-2 [NS-2] simulations of a precursor to the TCP- 486 NCR algorithm specified in this document, called TCP-DCR (Delayed 487 Congestion Response). The paper shows that TCP-DCR improves performance 488 in comparison to unmodified TCP in the presence of packet reordering. 489 In addition, the extended version of [BR04] presents results based on 490 emulations involving Linux (kernel 2.4.24). These results show that 491 the performance of TCP-DCR is similar to Linux's native 492 implementation that seeks to "undo" wrong decisions based on DSACK 493 [RFC2883] feedback (similar to the schemes outlined in [ZKFP03]), 494 when packets are reordered by less than one RTT. The advantage of 495 using TCP-DCR over the DSACK-based scheme is that the DSACK-based 496 scheme tries to estimate the exact amount of reordering in the 497 network using fairly complex algorithms, whereas TCP-DCR achieves 498 similar results with less complicated modifications. 500 In addition, [BR04,BSRV04] illustrate the ability of TCP-DCR to allow 501 for the improvement of other parts of the system. For example, these 502 papers show that increasing TCP's robustness to packet reordering 503 allows for a novel wireless ARQ mechanism to be added at the link- 504 layer. The added robustness of the link-layer to channel errors, in 505 turn, increases TCP performance by not requiring TCP to retransmit 506 packets that were dropped due to corruption (and, hence, also 507 prevents TCP from needlessly reducing the sending rate when 508 retransmitting these segments). 510 5. Disadvantages 512 While we note that all of the changes outlined above are implemented 513 in the sender, the receiver also potentially has a part to play.
In 514 particular, TCP-NCR increases the receiver's buffering requirement by 515 up to an extra cwnd -- in the case of the TCP sender using Aggressive 516 Limited Transmit and actual loss occurring in the network. 517 Therefore, to maximize the benefits from TCP-NCR, receivers should 518 advertise a large window to absorb the extra out-of-order traffic. In 519 the case that the additional buffer requirements are not met, the use 520 of the above algorithm takes into account the reduced advertised 521 window. 523 In addition, using TCP-NCR could delay the delivery of data to the 524 application by up to one RTT because the fast retransmission point is 525 delayed by roughly one RTT in TCP-NCR. Applications that are 526 sensitive to such delays should turn off the TCP-NCR option. For 527 instance, a socket option could be introduced to allow applications 528 to control whether NCR would be used for a particular connection. 530 Finally, the use of TCP-NCR makes the recovery from congestion events 531 sluggish in comparison to the standard reaction in [RFC2581]. [BR04, 532 BSRV04] show (via simulation) that the delay in congestion response 533 has minimal impact on the connection itself and the traffic sharing a 534 bottleneck. [BBFS01] also indicates (again, via simulation) that 535 "slowly responsive" congestion control may be safe for deployment in 536 the Internet. These studies suggest that schemes that slightly delay 537 congestion control decisions may be reasonable; however, further 538 experimentation on the Internet is required to verify these results. 540 6. Related Work 542 Over the past few years, several solutions have been proposed to 543 improve the performance of TCP in the face of segment reordering.
These schemes generally fall into one of two categories (with some
overlap): mechanisms that try to prevent spurious retransmits from
happening and mechanisms that try to detect spurious retransmits and
"undo" the needless congestion control state changes that have been
made.

[BA02,ZKFP03] attempt to prevent segment reordering from triggering
spurious retransmits by using various algorithms to approximate the
duplicate ACK threshold required to disambiguate loss and reordering
over a given network path at a given time. TCP-NCR similarly tries
to prevent spurious retransmits. However, TCP-NCR takes a simplified
approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply
delays retransmission by a fixed amount (in comparison to standard
TCP), while the other schemes use relatively complex algorithms in an
attempt to derive a more precise value for DupThresh that depends on
the network conditions. While TCP-NCR offers simplicity, the other
schemes may offer more precision such that applications would not be
forced to wait as long for their retransmissions. Future work could
be undertaken to achieve robustness without needless delay.

On the other hand, several schemes have been developed to detect and
mitigate needless retransmissions after the fact.
[RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect
spurious retransmits and mitigate the changes these events made to
the congestion control state. TCP-NCR could be used in conjunction
with these algorithms, with TCP-NCR attempting to prevent spurious
retransmits and some other scheme kicking in if the prevention
failed. In addition, we note that TCP-NCR concentrates on
preventing spurious fast retransmits, while some of the above
algorithms also attempt to detect and mitigate spurious timeout-based
retransmits.

7. Security Considerations

We do not believe there are security implications involved with
TCP-NCR over and above those for general TCP congestion control
[RFC2581]. In particular, the Extended Limited Transmit algorithms
specified in this document have been specifically designed not to be
susceptible to the sorts of ACK splitting attacks that TCP's general
congestion control is vulnerable to (as discussed in [RFC3465]).

8. Acknowledgements

Ted Faber, Wesley Eddy, Gorry Fairhurst, Sally Floyd, Nauzad Sadry,
Pasi Sarolahti, Joe Touch and Nitin Vaidya, as well as feedback from
the TCPM working group, have contributed significantly to this
document. Our thanks to all!

9. Normative References

[RFC793] J. Postel, "Transmission Control Protocol", RFC 793,
September 1981.

[RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.

[RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's
Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

[RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative
Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for
TCP", RFC 3517, April 2003.

10. Informative References

[BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet
Reordering", ACM Computer Communication Review, January 2002.

[BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker,
"Dynamic Behavior of Slowly Responsive Congestion Control
Algorithms", Proceedings of ACM SIGCOMM, September 2001.

[BPS99] J. Bennett, C. Partridge, and N.
Shectman, "Packet reordering is not pathological network behavior",
IEEE/ACM Transactions on Networking, December 1999.

[BR04] Sumitha Bhandarkar and A. L. Narasimha Reddy, "TCP-DCR: Making
TCP Robust to Non-Congestion Events", In the Proceedings of
Networking 2004 conference, May 2004. Extended version available as
tech report TAMU-ECE-2003-04.

[BSRV04] Sumitha Bhandarkar, Nauzad Sadry, A. L. Narasimha Reddy and
Nitin Vaidya, "TCP-DCR: A Novel Protocol for Tolerating Wireless
Channel Errors", To appear in IEEE Transactions on Mobile Computing.

[GPL04] Ladan Gharai, Colin Perkins and Tom Lehman, "Packet
Reordering, High Speed Networks and Transport Protocol Performance",
ICCCN 2004, October 2004.

[Jac88] V. Jacobson, "Congestion Avoidance and Control", Computer
Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

[JIDKT03] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D.
Towsley, "Measurement and Classification of Out-of-Sequence Packets
in a Tier-1 IP Backbone", Proceedings of IEEE INFOCOM, 2003.

[KM02] I. Keslassy and N. McKeown, "Maintaining packet order in
two-stage switches", Proceedings of the IEEE Infocom, June 2002.

[MAF05] A. Medina, M. Allman and S. Floyd, "Measuring the Evolution
of Transport Protocols in the Internet", ACM Computer Communication
Review, 35(2), April 2005.

[NS-2] ns-2 Network Simulator. http://www.isi.edu/nsnam/

[Pax97] V. Paxson, "End-to-End Internet Packet Dynamics", Proceedings
of ACM SIGCOMM, September 1997.

[RFC896] J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC
896, January 1984.

[RFC1122] R. Braden, "Requirements for Internet Hosts - Communication
Layers", RFC 1122, October 1989.

[RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis and Matt
Podolsky, "An Extension to the Selective Acknowledgement (SACK)
Option for TCP", RFC 2883, July 2000.

[RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.
Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang and V. Paxson,
"Stream Control Transmission Protocol", RFC 2960, October 2000.

[RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.

[RFC3465] M. Allman, "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, February 2003.

[RFC3522] R. Ludwig and M. Meyer, "The Eifel Detection Algorithm for
TCP", RFC 3522, April 2003.

[RFC3708] E. Blanton and M. Allman, "Using TCP Duplicate Selective
Acknowledgement (DSACKs) and Stream Control Transmission Protocol
(SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect
Spurious Retransmissions", RFC 3708, February 2004.

[RFC4015] R. Ludwig and A. Gurtov, "The Eifel Response Algorithm for
TCP", RFC 4015, February 2005.

[SK04] P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO): An
Algorithm for Detecting Spurious Retransmission Timeouts with TCP and
SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress),
November 2004.

[ZKFP03] M. Zhang, B. Karp, S. Floyd and L. Peterson, "RR-TCP: A
Reordering-Robust TCP with DSACK", in Proceedings of the Eleventh
IEEE International Conference on Network Protocols (ICNP 2003),
Atlanta, GA, November 2003.

11. Authors' Addresses

Sumitha Bhandarkar
Dept. of Elec. Engg.
214 ZACH
College Station, TX 77843-3128
Phone: (512) 468-8078
Email: sumitha@tamu.edu
URL: http://students.cs.tamu.edu/sumitha/

A. L. Narasimha Reddy
Professor
Dept. of Elec. Engg.
315C WERC
College Station, TX 77843-3128
Phone: (979) 845-7598
Email: reddy@ee.tamu.edu
URL: http://ee.tamu.edu/~reddy/

Mark Allman
ICSI Center for Internet Research
1947 Center Street, Suite 600
Berkeley, CA 94704-1198
Phone: (216) 243-7361
Email: mallman@icir.org
URL: http://www.icir.org/mallman/

Ethan Blanton
Purdue University Computer Science
250 North University Street
West Lafayette, IN 47907
Email: eblanton@cs.purdue.edu

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.