Internet Engineering Task Force                    Sumitha Bhandarkar
INTERNET DRAFT                                     A. L. Narasimha Reddy
draft-ietf-tcpm-tcp-dcr-04.txt                     Texas A&M University
Expires: November 2005                             Mark Allman
                                                   ICIR
                                                   Ethan Blanton
                                                   Purdue University
                                                   May 2005

Improving the Robustness of TCP to Non-Congestion Events

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract:

This document specifies Non-Congestion Robustness (NCR) for TCP. In
the absence of explicit congestion notification from the network,
TCP's loss recovery algorithms treat the receipt of three duplicate
acknowledgments as an implicit indication of congestion in the
network. This is not always correct, notably in the case when
network paths reorder segments (for whatever reason), resulting in
degraded performance.
TCP-NCR is designed to mitigate this degraded
performance by increasing the number of duplicate acknowledgments
required to trigger loss recovery, based on the current state of the
connection, in an effort to disambiguate true segment loss from
segment reordering. In addition, we specify an option, Aggressive
Limited Transmit, where the TCP sender does not reduce its sending
rate until a segment is actually retransmitted; this would delay the
reduction of the sending rate by roughly one round-trip time compared
to current TCP implementations. This document specifies the changes
to TCP, as well as the costs and benefits of these modifications.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in [RFC2119].

Readers should be familiar with the TCP terminology given in
[RFC2581] and [RFC3517].

1. Introduction

One strength of TCP [RFC793] lies in its ability to adjust its
sending rate according to the perceived congestion in the network
[Jac88,RFC2581]. In the absence of explicit notification of
congestion from the network, TCP uses segment loss as an indication
of congestion (i.e., assuming queue overflow). TCP receivers send
cumulative acknowledgments (ACKs) indicating the next sequence number
expected from the sender for arriving segments [RFC793]. When
segments arrive out-of-order, duplicate ACKs are generated. As
specified in [RFC2581], a TCP sender uses the arrival of three
duplicate ACKs as an indication of segment loss. The TCP sender
retransmits the lost segment and reduces the load imposed on the
network, assuming the segment loss was caused by resource contention
within the network path.
The TCP sender does not assume loss on the
first or second duplicate ACK, but waits for three duplicate ACKs to
account for mild reordering. However, the use of this constant
threshold of duplicate ACKs has several problems that can be
mitigated with a dynamic threshold.

The following is an example of TCP's behavior:

+ TCP A is the data sender and TCP B is the data receiver.

+ TCP A sends 10 segments each consisting of a single data byte
  (i.e., transmits bytes 1-10 in segments 1-10).

+ Assume segment 3 is dropped in the network.

+ TCP B cumulatively acknowledges segments 1 and 2, making the
  cumulative ACK transmitted to the sender 3 (the next expected
  sequence number). (Note: TCP B may generate one or two ACKs,
  depending on whether delayed ACKs [RFC1122,RFC2581] are
  employed.)

+ The arrival of segments 4-10 at TCP B will each trigger the
  transmission of a cumulative ACK for sequence number 3. (Note:
  [RFC2581] recommends that delayed ACKs not be used when the ACK
  is triggered by an out-of-order segment.)

+ When TCP A receives the third duplicate ACK (or fourth ACK
  overall) for sequence number 3, TCP A will retransmit
  segment 3 and reduce the sending rate by roughly half (see
  [RFC2581] for specifics on the congestion control state
  adjustments).

Alternatively, suppose segment 3 was not dropped by the network, but
rather delayed such that segment 3 arrives after segment 10. The
above scenario will play out in precisely the same manner, in that
a retransmission of segment 3 will be triggered. In other words, TCP
is not capable of disambiguating this reordering event from a segment
loss.

The following is the specific motivation behind making TCP robust to
reordered segments:

* A number of Internet measurement studies have shown that packet
  reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04].
  Further, the reordering can be well beyond that required for
  fast retransmit to be falsely triggered.

* [BA02,ZKFP03] show the negative performance implications that
  packet reordering has on current TCP.

* The requirement imposed by TCP for almost in-order packet
  delivery places a constraint on the design of future technology.
  Novel routing algorithms, network components, link-layer
  retransmission mechanisms and applications could all be looked
  at with a fresh perspective if TCP were to be more robust to
  segment reordering. For instance, high-speed packet switches
  could be allowed to resequence packets if TCP were more robust.
  There has been work proposed in the literature explicitly to
  ensure that packet ordering is maintained in such switches
  [KM02]. Also, link-layer mechanisms that attempt to recover
  from packet corruption by retransmitting could be allowed to
  reorder packets and, hence, increase the chances of local loss
  repair rather than relying on TCP to repair the loss (and
  needlessly reduce its sending rate). Additional examples
  include multi-path routing, high-delay satellite links and some
  of the schemes proposed for the differentiated services
  architecture. By making TCP more robust to non-congestion
  events, TCP-NCR may open up the design space for future Internet
  components.

In this document we specify a set of TCP sender modifications to
provide Non-Congestion Robustness (NCR) to TCP. In particular, these
changes are built on top of TCP with selective acknowledgments
(SACKs) [RFC2018] and the SACK-based loss recovery scheme given in
[RFC3517], since SACK is widely deployed at this point ([MAF05]
indicates that 68% of web servers and 88% of web clients utilize SACK
as of spring, 2004).

Finally, we note that the TCP-NCR algorithm provided in this document
could be easily adapted to SCTP [RFC2960] since SCTP uses congestion
control algorithms similar to TCP's (and, hence, has the same
reordering robustness issues).

The remainder of this document is organized as follows. Section 2
provides a high-level description of the TCP-NCR mechanisms. In
Section 3, we specify the TCP-NCR algorithm. Section 4 provides a
brief overview of the benefits of TCP-NCR, while Section 5 discusses
the drawbacks of TCP-NCR. Section 6 discusses related work. Section
7 discusses security concerns.

2. NCR Description

As discussed above, in the face of packet reordering three duplicate
ACKs may not be enough to disambiguate loss from reordering. In this
section we provide a non-normative sketch of TCP-NCR. The detailed
algorithms for implementing Non-Congestion Robustness for TCP are
presented in the next section.

The general idea behind TCP-NCR is to increase the threshold used to
trigger a fast retransmission from the current fixed value of three
duplicate ACKs [RFC2581] to approximately a congestion window of data
having left the network (but, not less than the currently
standardized value of three duplicate ACKs). Since cwnd represents
the amount of data a TCP flow can transmit in one round-trip time
(RTT), waiting to receive notice that cwnd bytes have left the
network before deciding whether the root cause is loss or reordering
imposes a delay of roughly one RTT. The appropriate choice for a new
value of the threshold is essentially a tradeoff between making the
best decision regarding the cause of the duplicate ACKs and
responsiveness.
The choice to trigger a retransmission only after a
cwnd's worth of data is known to have left the network represents
roughly the largest amount of time a TCP can wait before the (often
costly) retransmission timeout may be triggered. Therefore, the
algorithm described in this document attempts to make the best root
cause decision possible.

Simply increasing the threshold before retransmitting a segment can
make TCP brittle to packet loss or ACK loss since such loss reduces
the number of duplicate ACKs that will arrive at the sender from the
receiver. For instance, if the cwnd is 10 segments and one segment
is lost, a duplicate ACK threshold of 10 will never be met because
duplicate ACKs corresponding to at most 9 segments will arrive at the
sender. To offset the issue of loss, we extend TCP's Limited
Transmit [RFC3042] scheme to allow for the sending of new data during
the period when the TCP sender is disambiguating loss and reordering.
This new data serves to increase the likelihood of enough duplicate
ACKs arriving at the sender to trigger loss recovery if it is
appropriate.

At this point we note that TCP tightly couples reliability and
congestion control -- when a segment is declared lost, a
retransmission is triggered and a change to sending rate is also made
on the assumption that the drop is due to resource contention
[RFC2581]. Therefore, by simply changing the retransmission trigger
the congestion control response is also changed. However, we lack
experience on the Internet as to whether delaying the point that a
rate reduction takes place is appropriate for wide-scale deployment.
Therefore, the extended Limited Transmit mechanism proposed in this
document offers two variants for experimentation.
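
The brittleness argument above (a cwnd of 10 segments with one loss
can never produce 10 duplicate ACKs) comes down to simple counting.
The following Python sketch is illustrative only and is not part of
the specification. The function names are hypothetical, and the
sketch assumes one duplicate ACK per out-of-order segment that
arrives, with no ACK loss.

```python
def max_dupacks(cwnd_segments, lost_segments, new_segments_sent=0):
    """Upper bound on the duplicate ACKs a sender can observe.

    Assumption: every segment that arrives out of order triggers
    exactly one duplicate ACK and no ACKs are lost.
    """
    arrived = cwnd_segments - lost_segments + new_segments_sent
    return max(arrived, 0)


def threshold_reachable(dup_thresh, cwnd_segments, lost_segments,
                        new_segments_sent=0):
    """True if enough duplicate ACKs can arrive to meet dup_thresh."""
    return max_dupacks(cwnd_segments, lost_segments,
                       new_segments_sent) >= dup_thresh


# The example from the text: cwnd of 10 segments, one segment lost.
print(max_dupacks(10, 1))              # at most 9 duplicate ACKs
print(threshold_reachable(10, 10, 1))  # threshold of 10 is never met
# Sending even one new segment during disambiguation closes the gap.
print(threshold_reachable(10, 10, 1, new_segments_sent=1))
```

Under these assumptions, new data sent during the disambiguation
period is what makes a window-sized threshold attainable at all.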

The first Extended Limited Transmit variant, Careful Limited
Transmit, calls for the transmission of a previously unsent segment
for every two segments that are known to have left the network. This
has the effect of halving the sending rate since normal TCP operation
calls for the sending of one segment for every segment that has left
the network. Further, the halving starts immediately and is not
delayed until a retransmission is triggered. In the case of packet
reordering (i.e., not segment loss) the congestion control state is
restored to its previous state when reordering is determined.

The second variant, Aggressive Limited Transmit, calls for
transmitting a previously unsent data segment for every segment known
to have left the network. With this variant, while waiting to
disambiguate the loss from a reordering event, ACK-clocked
transmission continues at roughly the same rate as before the event
started. Retransmission and the sending rate reduction happen per
[RFC2581,RFC3517], albeit with the delayed threshold described above.
While this approach delays legitimate rate reductions (possibly
slightly and temporarily aggravating overall congestion on the
network) the scheme has the advantage of not reducing the
transmission rate in the face of segment reordering.

Which of the two Extended Limited Transmit variants is best for use
on the Internet is an open question.

3. Algorithm

The TCP-NCR modifications make two fundamental changes to the way
[RFC3517] currently operates, as follows.

First, the trigger for retransmitting a segment is changed from three
duplicate ACKs [RFC2581,RFC3517] to indications that a congestion
window's worth of data has left the network.
Second, TCP-NCR
decouples initial congestion control decisions from retransmission
decisions, in some cases delaying congestion control changes relative
to TCP's current behavior defined in [RFC2581]. The algorithm
provides two alternatives for extending Limited Transmit. The two
variants of extended Limited Transmit are:

Careful Limited Transmit:

    This variant calls for reducing the sending rate at
    approximately the same time [RFC2581] implementations reduce
    the congestion window, while at the same time withholding a
    retransmission (and the final congestion determination) for
    approximately one RTT.

Aggressive Limited Transmit:

    This variant calls for maintaining the sending rate in the
    face of duplicate ACKs until TCP concludes a segment is lost
    and needs to be retransmitted (which TCP-NCR delays by one
    RTT when compared with current loss recovery schemes).

A TCP-NCR implementation MUST use either Careful Limited Transmit or
Aggressive Limited Transmit.

A constant MUST be set depending on which variant of extended Limited
Transmit is used, as follows:

Careful Limited Transmit:

    LT_F = 2/3

Aggressive Limited Transmit:

    LT_F = 1/2

This constant reflects the fraction of outstanding data that must be
SACKed before a retransmission is triggered. Since Aggressive
Limited Transmit sends a new segment for every segment known to have
left the network, a total of roughly cwnd segments will be sent
during Aggressive Limited Transmit and therefore ideally a total of
2*cwnd segments will be outstanding. The duplicate ACK threshold is
then set to LT_F = 1/2 of 2*cwnd (or about 1 RTT worth of data).
The
factor is different for Careful Limited Transmit because the sender
only transmits one new segment for every two segments that are SACKed
and therefore will ideally have a total of 1.5*cwnd segments
outstanding when the retransmission is to be triggered. Hence, the
required threshold is LT_F = 2/3 of 1.5*cwnd to delay the
retransmission by roughly 1 RTT.

There are situations whereby the sender cannot transmit new data
during Extended Limited Transmit (e.g., lack of data from the
application, receiver's advertised window limit). These situations
can lead to the problems discussed in the last section when a TCP
does not employ Extended Limited Transmit and is starved for ACKs.
Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK
arrival to be as robust as possible given the actual amount of data
that has been transmitted, or roughly LT_F times the number of
outstanding segments.

The TCP-NCR modifications specified in this document lend themselves
to incremental deployment. Only the TCP implementation on the sender
side requires modification. The changes themselves are modest.
However, as will be discussed below, the availability of additional
buffer space at the receiver will help maximize the benefits of using
TCP-NCR but is not strictly necessary.

The following algorithms depend on the notions provided by [RFC3517]
and we assume the reader is familiar with the terminology given in
[RFC3517]. The TCP-NCR algorithm can be adapted to alternate
SACK-based loss recovery schemes. [BR04,BSRV04] outline
non-SACK-based algorithms; however, we do not specify those
algorithms in this document and do not recommend them due to both
the complexity and security implications of having only a gross
understanding of the number of outstanding segments in the network.
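
The LT_F arithmetic above can be checked numerically. The following
Python sketch is illustrative only; the names SEND_RATIO,
ideal_outstanding and dup_thresh_segments are hypothetical and do
not come from this document or [RFC3517]. It assumes the ideal case
in which the sender is never limited by application data or the
advertised window, and it applies the lower bound of three duplicate
ACKs noted in section 2.

```python
from fractions import Fraction

# LT_F constants for the two variants, as given in the text.
LT_F = {"careful": Fraction(2, 3), "aggressive": Fraction(1, 2)}

# New segments sent per segment known to have left the network.
SEND_RATIO = {"careful": Fraction(1, 2), "aggressive": Fraction(1, 1)}


def ideal_outstanding(cwnd_segments, variant):
    """Segments outstanding after one cwnd of data has left the network."""
    return cwnd_segments + SEND_RATIO[variant] * cwnd_segments


def dup_thresh_segments(outstanding_segments, variant):
    """LT_F times the outstanding segments, but never less than 3."""
    return max(LT_F[variant] * outstanding_segments, 3)


cwnd = 10
for variant in ("careful", "aggressive"):
    outstanding = ideal_outstanding(cwnd, variant)
    print(variant, outstanding, dup_thresh_segments(outstanding, variant))
```

In both variants the threshold works out to cwnd segments, which is
the "roughly 1 RTT" delay the text describes. When the sender cannot
transmit new data, the same function applied to the smaller actual
number of outstanding segments yields a proportionally smaller
threshold.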

A TCP connection using the Nagle algorithm [RFC896,RFC1122] MAY
employ the TCP-NCR algorithm. If a TCP implementation does implement
TCP-NCR the implementation MUST follow the various specifications
provided in sections 3.1 - 3.4. If the Nagle algorithm is not being
used there is no way to accurately calculate the number of
outstanding segments in the network (and, therefore, no good way to
derive an appropriate duplicate ACK threshold) without adding state
to the TCP sender. A TCP connection that does not employ the Nagle
algorithm SHOULD NOT use TCP-NCR. We envision that NCR could be
adapted to an implementation that carefully tracks the sequence
numbers transmitted in each segment. However, we leave this as
future work.

3.1. Initialization

When entering a period of loss / reordering detection and Extended
Limited Transmit a TCP-NCR MUST initialize several state variables.
A TCP MUST enter Extended Limited Transmit upon receiving the first
ACK with a SACK block after the reception of an ACK that (a) did not
contain SACK information and (b) did increase the connection's
cumulative ACK point. The initializations are:

(I.1) Save the current FlightSize.

          FlightSizePrev = FlightSize

(I.2) Set a variable for tracking the number of segments for which
      an ACK does not trigger a transmission during Careful Limited
      Transmit.

          Skipped = 0

      (Note: Skipped is not used during Aggressive Limited
      Transmit.)

(I.3) Set DupThresh (from [RFC3517]) based on the size of the
      current FlightSize.

          DupThresh = max (LT_F * (FlightSize / SMSS),3)

      Note: We keep the lower bound of DupThresh = 3 from
      [RFC2581,RFC3517].

In addition to the above steps, the incoming ACK MUST be processed
with the E series of steps in section 3.3.

3.2. Terminating Extended Limited Transmit and Preventing Bursts

Extended Limited Transmit MUST be terminated at the start of loss
recovery as outlined in section 3.4.

The arrival of an ACK that advances the cumulative ACK point while in
Extended Limited Transmit, but before loss recovery is triggered,
signals that a series of duplicate ACKs were caused by reordering and
not congestion. Therefore, the receipt of an ACK that extends the
cumulative ACK point MUST terminate Extended Limited Transmit. As
described below (in (T.4)), an ACK that extends the cumulative ACK
point and *also* contains SACK information will also trigger the
beginning of a new Extended Limited Transmit phase.

Upon the termination of Extended Limited Transmit, and especially
when using the Careful variant, TCP-NCR may be in a situation where
the entire cwnd is not being utilized and therefore TCP-NCR will be
prone to transmitting a burst of segments into the network.
Therefore, upon exiting Extended Limited Transmit the following steps
MUST be taken.

When a TCP-NCR in the Extended Limited Transmit phase receives an ACK
that updates the cumulative ACK point (regardless of whether the ACK
contains SACK information), the following steps MUST be taken:

(T.1) cwnd = min (FlightSize + SMSS,FlightSizePrev)

      This step ensures that cwnd is not grossly larger than the
      amount of data outstanding --- a situation that would cause a
      line rate burst.

(T.2) ssthresh = FlightSizePrev

      This step provides TCP-NCR with a sense of "history". If step
      (T.1) reduces cwnd below FlightSizePrev this step ensures that
      TCP-NCR will slow start back to the operating point in effect
      before Extended Limited Transmit.

(T.3) Transmit previously unsent data as allowed by cwnd,
      FlightSize, application data availability and the receiver's
      advertised window.

(T.4) When the ACK extends the cumulative ACK point and also
      contains SACK information, the initializations in steps (I.2)
      and (I.3) from section 3.1 MUST be taken (but, not step (I.1))
      to re-start Extended Limited Transmit. In addition, the
      series of steps in section 3.3 (the "E" steps) MUST be taken.

3.3. Extended Limited Transmit

On each ACK containing SACK information that arrives after TCP-NCR
has entered the Extended Limited Transmit phase (as outlined in
section 3.1) and before Extended Limited Transmit terminates, the
sender MUST use the following procedure.

(E.1) Use the SetPipe () procedure from [RFC3517] to set the "pipe"
      variable (which represents the number of bytes still considered
      "in the network").

(E.2) If the comparison in equation (1) below holds and there are
      SMSS bytes of previously unsent data available for
      transmission then transmit one segment of SMSS bytes.

          (pipe + Skipped) <= (FlightSizePrev - SMSS)          (1)

      If the comparison in equation (1) does not hold or no new data
      can be transmitted (due to lack of data from the application
      or the advertised window limit), skip to step (E.6).

(E.3) Increment pipe by SMSS bytes.

(E.4) If using Careful Limited Transmit, increment Skipped by SMSS
      bytes to ensure that the next SMSS bytes of SACKed data
      processed do not trigger a Limited Transmit transmission (since
      the goal of Careful Limited Transmit is to send upon the
      reception of every second duplicate ACK).

(E.5) Return to step (E.2) to ensure that as many bytes as
      appropriate are transmitted. This provides robustness to ACK
      loss that can be (largely) compensated for using SACK
      information.

(E.6) Reset DupThresh via:

          DupThresh = max (LT_F * (FlightSize / SMSS),3)

      where FlightSize is the total number of bytes that have not
      been cumulatively acknowledged (which is different from
      "pipe").
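
The steps in sections 3.1 through 3.3 can be sketched as follows.
This Python fragment is a minimal illustration under stated
assumptions, not a reference implementation. All names (NcrState,
enter_elt, on_sack, on_cumulative_ack, simulate) are hypothetical.
The SetPipe () procedure of [RFC3517] is not implemented; the caller
supplies the resulting "pipe" value, and the demo approximates it as
FlightSize minus the SACKed bytes, assuming no segment has yet been
deemed lost. The parameter sendable_bytes stands in for both
application data availability and the advertised window limit. Step
(T.3) and the restart in (T.4) are left to the caller.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class NcrState:
    smss: int                  # sender maximum segment size, bytes
    lt_f: Fraction             # 2/3 (Careful) or 1/2 (Aggressive)
    careful: bool              # True for Careful Limited Transmit
    cwnd: int
    ssthresh: int
    flight_size: int           # bytes not yet cumulatively ACKed
    flight_size_prev: int = 0
    skipped: int = 0
    dup_thresh: Fraction = Fraction(3)


def set_dup_thresh(s):
    # (I.3)/(E.6): DupThresh = max(LT_F * (FlightSize / SMSS), 3)
    s.dup_thresh = max(s.lt_f * Fraction(s.flight_size, s.smss), 3)


def enter_elt(s):
    s.flight_size_prev = s.flight_size   # (I.1)
    s.skipped = 0                        # (I.2)
    set_dup_thresh(s)                    # (I.3)


def on_sack(s, pipe, sendable_bytes):
    """Steps (E.2)-(E.6); 'pipe' is the caller's SetPipe() result (E.1).

    Returns the number of new SMSS-sized segments to transmit.
    """
    sent = 0
    while (pipe + s.skipped <= s.flight_size_prev - s.smss
           and sendable_bytes >= s.smss):            # (E.2), eq. (1)
        sent += 1
        sendable_bytes -= s.smss
        s.flight_size += s.smss   # new data is now outstanding
        pipe += s.smss                               # (E.3)
        if s.careful:
            s.skipped += s.smss                      # (E.4)
        # loop back to the test above                # (E.5)
    set_dup_thresh(s)                                # (E.6)
    return sent


def on_cumulative_ack(s, acked_bytes):
    """Steps (T.1) and (T.2) when the cumulative ACK point advances."""
    s.flight_size -= acked_bytes
    s.cwnd = min(s.flight_size + s.smss, s.flight_size_prev)  # (T.1)
    s.ssthresh = s.flight_size_prev                           # (T.2)


def simulate(careful, n_sacks, smss=1000, segments=10):
    """Feed n_sacks SACKs, each newly covering one segment."""
    lt_f = Fraction(2, 3) if careful else Fraction(1, 2)
    s = NcrState(smss=smss, lt_f=lt_f, careful=careful,
                 cwnd=segments * smss, ssthresh=segments * smss,
                 flight_size=segments * smss)
    enter_elt(s)
    sends = []
    for k in range(1, n_sacks + 1):
        pipe = s.flight_size - k * smss   # stand-in for SetPipe()
        sends.append(on_sack(s, pipe, sendable_bytes=10**6))
    return s, sends


print(simulate(careful=True, n_sacks=4)[1])    # one send per two SACKs
print(simulate(careful=False, n_sacks=4)[1])   # one send per SACK
```

In this simplified setting the Careful variant transmits on every
second SACK while the Aggressive variant transmits on every SACK,
and DupThresh grows with FlightSize as new segments are sent.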

3.4. Entering Loss Recovery

When a segment is deemed lost via the algorithms in [RFC3517],
Extended Limited Transmit MUST be terminated, leaving the
algorithms in [RFC3517] to govern TCP's behavior. One slight
change to [RFC3517] MUST be made, however. In section 5, step
(2) of [RFC3517] MUST be changed to:

(2) ssthresh = cwnd = (FlightSizePrev / 2)

This ensures that the congestion control modifications are made
with respect to the amount of data in the network before
FlightSize was increased by Extended Limited Transmit.

4. Advantages

The major advantages of TCP-NCR are two-fold. As discussed in
section 1, TCP-NCR will open up the design space for network
applications and components that are currently constrained by TCP's
lack of robustness to packet reordering. The second advantage is in
terms of an increase in TCP performance.

[BR04] presents ns-2 [NS-2] simulations of a precursor to the
TCP-NCR algorithm specified in this document, called TCP-DCR (Delayed
Congestion Response). The paper shows that TCP-DCR aids performance
in comparison to unmodified TCP in the presence of packet reordering.
In addition, the extended version of [BR04] presents results based on
emulations involving Linux (kernel 2.4.24). These results show that
the performance of TCP-DCR is similar to Linux's native
implementation that seeks to "undo" wrong decisions based on DSACK
[RFC2883] feedback (similar to the schemes outlined in [ZKFP03]),
when packets are reordered by less than one RTT. The advantage of
using TCP-DCR over the DSACK-based scheme is that the DSACK-based
scheme tries to estimate the exact amount of reordering in the
network using fairly complex algorithms, whereas TCP-DCR achieves
similar results with less complicated modifications.

In addition, [BR04,BSRV04] illustrate the ability of TCP-DCR to allow
for the improvement of other parts of the system. For example, these
papers show that increasing TCP's robustness to packet reordering
allows for a novel wireless ARQ mechanism to be added at the
link-layer. The added robustness of the link-layer to channel
errors, in turn, increases TCP performance by not requiring TCP to
retransmit packets that were dropped due to corruption (and, hence,
also prevents TCP from needlessly reducing the sending rate when
retransmitting these segments).

5. Disadvantages

While we note that all of the changes outlined above are implemented
in the sender, the receiver also potentially has a part to play. In
particular, TCP-NCR increases the receiver's buffering requirement by
up to an extra cwnd -- in the case of the TCP sender using Aggressive
Limited Transmit and actual loss occurring in the network.
Therefore, to maximize the benefits from TCP-NCR receivers should
advertise a large window to absorb the extra out-of-order traffic. In
the case that the additional buffer requirements are not met, the use
of the above algorithm takes into account the reduced advertised
window, resulting in slightly reduced robustness to reordering.

In addition, using TCP-NCR could delay the delivery of data to the
application by up to one RTT because the fast retransmission point is
delayed by roughly one RTT in TCP-NCR. Applications that are
sensitive to such delays should turn off the TCP-NCR option. For
instance, a socket option could be introduced to allow applications
to control whether NCR would be used for a particular connection.

Finally, the use of TCP-NCR makes the recovery from congestion events
sluggish in comparison to the standard reaction in [RFC2581].
[BR04,BSRV04] show (via simulation) that the delay in congestion
response has minimal impact on the connection itself and the traffic
sharing a bottleneck. [BBFS01] also indicates (again, via simulation)
that "slowly responsive" congestion control may be safe for deployment
in the Internet. These studies suggest that schemes that slightly
delay congestion control decisions may be reasonable; however, further
experimentation on the Internet is required to verify these results.

6. Related Work

Over the past few years, several solutions have been proposed to
improve the performance of TCP in the face of segment reordering.
These schemes generally fall into one of two categories (with some
overlap): mechanisms that try to prevent spurious retransmits from
happening and mechanisms that try to detect spurious retransmits and
"undo" the needless congestion control state changes that have been
taken.

[BA02,ZKFP03] attempt to prevent segment reordering from triggering
spurious retransmits by using various algorithms to approximate the
duplicate ACK threshold required to disambiguate loss and reordering
over a given network path at a given time. TCP-NCR similarly tries
to prevent spurious retransmits. However, TCP-NCR takes a simplified
approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply
delays retransmission by a fixed amount (in comparison to standard
TCP), while the other schemes use relatively complex algorithms in an
attempt to derive a more precise value for DupThresh that depends on
the network conditions. While TCP-NCR offers simplicity, the other
schemes may offer more precision such that applications would not be
forced to wait as long for their retransmissions. Future work could
be undertaken to achieve robustness without needless delay.
568 On the other hand, several schemes have been developed to detect and 569 mitigate needless retransmissions after the fact. 570 [RFC3522,RFC3708,BA02,LG04,SK04] present algorithms to detect 571 spurious retransmits and mitigate the changes these events made to 572 the congestion control state. TCP-NCR could be used in conjunction 573 with these algorithms, with TCP-NCR attempting to prevent spurious 574 retransmits and some other scheme kicking in if the prevention 575 failed. In addition, we note that TCP-NCR is concentrated on 576 preventing spurious fast retransmits and some of the above algorithms 577 also attempt to detect and mitigate spurious timeout-based 578 retransmits. 580 7. Security Considerations 582 We do not believe there are security implications involved with TCP- 583 NCR over and above those for general TCP congestion control 584 [RFC2581]. In particular, the Extended Limited Transmit algorithms 585 specified in this document have been specifically designed not to be 586 susceptible to the sorts of ACK splitting attacks TCP's general TCP 587 congestion control is vulnerable to (as discussed in [RFC3465]. 589 8. Acknowledgements 591 Ted Faber, Sally Floyd, Nauzad Sadry, Pasi Sarolahti and Nitin Vaidya 592 as well as feedback from from the TCPM working group have contributed 593 significantly to this document. Our thanks to all! 595 9. Normative References 597 [RFC793] J. Postel, "Transmission Control Protocol", RFC 793, 598 September 1981. 600 [RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP 601 selective acknowledgment options," Internet RFC 2018. 603 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 604 Requirement Levels", BCP 14, RFC 2119, March 1997. 606 [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion 607 Control", RFC 2581, April 1999. 609 [RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's 610 Loss Recovery Using Limited Transmit", RFC 3042, January 2001. 612 [RFC3517] E. 
Blanton, M. Allman, K. Fall and L. Wang, "A Conservative Selective
Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP",
RFC 3517, April 2003.

10. Informative References

[BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet
Reordering", ACM Computer Communication Review, January 2002.

[BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker,
"Dynamic Behavior of Slowly Responsive Congestion Control
Algorithms", Proceedings of ACM SIGCOMM, September 2001.

[BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet Reordering
is Not Pathological Network Behavior", IEEE/ACM Transactions on
Networking, December 1999.

[BR04] Sumitha Bhandarkar and A. L. Narasimha Reddy, "TCP-DCR: Making
TCP Robust to Non-Congestion Events", Proceedings of the Networking
2004 conference, May 2004. Extended version available as tech report
TAMU-ECE-2003-04.

[BSRV04] Sumitha Bhandarkar, Nauzad Sadry, A. L. Narasimha Reddy and
Nitin Vaidya, "TCP-DCR: A Novel Protocol for Tolerating Wireless
Channel Errors", to appear in IEEE Transactions on Mobile Computing.

[GPL04] Ladan Gharai, Colin Perkins and Tom Lehman, "Packet
Reordering, High Speed Networks and Transport Protocol Performance",
ICCCN 2004, October 2004.

[Jac88] V. Jacobson, "Congestion Avoidance and Control", Computer
Communication Review, vol. 18, no. 4, pp. 314-329, August 1988.
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

[JIDKT03] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D.
Towsley, "Measurement and Classification of Out-of-Sequence Packets
in a Tier-1 IP Backbone", Proceedings of IEEE INFOCOM, 2003.

[KM02] I. Keslassy and N. McKeown, "Maintaining Packet Order in
Two-Stage Switches", Proceedings of IEEE INFOCOM, June 2002.

[LG04] R. Ludwig and A. Gurtov, "The Eifel Response Algorithm for
TCP", Internet-Draft draft-ietf-tsvwg-tcp-eifel-response-06.txt (work
in progress),
September 2004.

[MAF05] A. Medina, M. Allman and S. Floyd, "Measuring the Evolution of
Transport Protocols in the Internet", ACM Computer Communication
Review, 35(2), April 2005.

[NS-2] ns-2 Network Simulator. http://www.isi.edu/nsnam/

[Pax97] V. Paxson, "End-to-End Internet Packet Dynamics", Proceedings
of ACM SIGCOMM, September 1997.

[RFC896] J. Nagle, "Congestion Control in IP/TCP Internetworks",
RFC 896, January 1984.

[RFC1122] R. Braden, "Requirements for Internet Hosts - Communication
Layers", RFC 1122, October 1989.

[RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis and Matt
Podolsky, "An Extension to the Selective Acknowledgement (SACK)
Option for TCP", RFC 2883, July 2000.

[RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.
Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang and V. Paxson,
"Stream Control Transmission Protocol", RFC 2960, October 2000.

[RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.

[RFC3465] M. Allman, "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, February 2003.

[RFC3522] R. Ludwig and M. Meyer, "The Eifel Detection Algorithm for
TCP", RFC 3522, April 2003.

[RFC3708] E. Blanton and M. Allman, "Using TCP Duplicate Selective
Acknowledgement (DSACKs) and Stream Control Transmission Protocol
(SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect
Spurious Retransmissions", RFC 3708, February 2004.

[SK04] P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO): An
Algorithm for Detecting Spurious Retransmission Timeouts with TCP and
SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress),
November 2004.

[ZKFP03] M. Zhang, B. Karp, S. Floyd and L. Peterson, "RR-TCP: A
Reordering-Robust TCP with DSACK", Proceedings of the Eleventh
IEEE International Conference on Network Protocols (ICNP 2003),
Atlanta, GA, November 2003.

11. Authors' Addresses

Sumitha Bhandarkar
Dept. of Elec. Engg.
214 ZACH
College Station, TX 77843-3128
Phone: (512) 468-8078
Email: sumitha@tamu.edu
URL: http://students.cs.tamu.edu/sumitha/

A. L. Narasimha Reddy
Professor
Dept. of Elec. Engg.
315C WERC
College Station, TX 77843-3128
Phone: (979) 845-7598
Email: reddy@ee.tamu.edu
URL: http://ee.tamu.edu/~reddy/

Mark Allman
ICSI Center for Internet Research
1947 Center Street, Suite 600
Berkeley, CA 94704-1198
Phone: (216) 243-7361
Email: mallman@icir.org
URL: http://www.icir.org/mallman/

Ethan Blanton
Purdue University Computer Sciences
250 North University Street
West Lafayette, IN 47907
Email: eblanton@cs.purdue.edu

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard.
Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the
Internet Society.