Internet Engineering Task Force                            Mark Allman
INTERNET DRAFT                                            NASA GRC/BBN
File: draft-allman-tcp-lossrec-00.txt                Hari Balakrishnan
                                                                   MIT
                                                           Sally Floyd
                                                                 ACIRI
                                                            June, 2000
                                               Expires: December, 2000


                 Enhancing TCP's Loss Recovery Using
               Early Duplicate Acknowledgment Response

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document proposes two new TCP mechanisms that can be used to
   more effectively recover lost segments when a connection's
   congestion window is small, or when a large number of segments are
   lost in a single transmission window.
   The first of these mechanisms, ``Limited Transmit'', calls for
   sending a new data segment in response to each of the first two
   duplicate acknowledgments that arrive at the sender. The second
   mechanism is to reduce, in certain special circumstances, the
   number of duplicate acknowledgments required to trigger a fast
   retransmission.

1 Introduction

   A number of researchers have pointed out that TCP's loss recovery
   strategies do not work well when the congestion window at a TCP
   sender is small. This can happen, for instance, because there is
   only a limited amount of data to send, or because of the limit
   imposed by the receiver-advertised window, or because of the
   constraints imposed by end-to-end congestion control over a
   connection with a small bandwidth-delay product
   [Mor97,BPS+98,Bal98,LK98,DMKM00]. When it detects a missing
   segment, TCP enters a loss recovery phase using one of two methods.
   First, if an acknowledgment (ACK) for a given segment is not
   received in a certain amount of time a retransmission timeout occurs
   and the segment is resent [PA00]. Second, the ``Fast Retransmit''
   algorithm resends a segment when three duplicate ACKs arrive at the
   sender [Jac88,RFC2581]. However, because duplicate ACKs from the
   receiver are also triggered by packet reordering in the Internet,
   the TCP sender waits for three duplicate ACKs in an attempt to
   disambiguate segment loss from packet reordering. Once in a loss
   recovery phase, a number of techniques can be used to retransmit
   lost segments, including slow start based recovery or Fast Recovery
   [RFC2581], NewReno [RFC2582], and loss recovery based on selective
   acknowledgments (SACKs) [RFC2018,FF96].

   TCP's retransmission timeout (RTO) is based on measured round-trip
   times (RTT) between the sender and receiver, as specified in [PA00].
   To prevent spurious retransmissions of segments that are only
   delayed and not lost, the minimum RTO is conservatively chosen to be
   1 second. Therefore, it behooves TCP senders to detect and recover
   from as many losses as possible without incurring a lengthy timeout
   during which the connection remains idle. However, if not enough
   duplicate ACKs arrive from the receiver, the Fast Retransmit
   algorithm is never triggered. This situation occurs when the
   congestion window is small or when a large number of segments in a
   window are lost. For instance, consider a congestion window (cwnd)
   of three segments. If one segment is dropped by the network, then
   at most two duplicate ACKs will arrive at the sender, assuming no
   ACK loss. Since three duplicate ACKs are required to trigger Fast
   Retransmit, a timeout will be required to resend the dropped packet.

   [BPS+98] shows that roughly 56% of retransmissions sent by a busy
   web server are sent after the RTO expires, while only 44% are
   handled by Fast Retransmit. In addition, only 4% of the RTO-based
   retransmissions could have been avoided with SACK, which of course
   has to continue to disambiguate reordering from genuine loss. In
   contrast, using the techniques outlined in this document and in
   [Bal98], 25% of the RTO-based retransmissions in that dataset would
   have likely been avoided. In addition, [All00] shows that for one
   particular web server the median transfer size is less than four
   segments, indicating that more than half of the connections will be
   forced to rely on the RTO to recover from any losses that occur.

   The next two sections of this document outline small changes to TCP
   senders that will decrease the reliance on the retransmission timer,
   and thereby improve TCP performance when Fast Retransmit is not
   triggered.
   In other circumstances, these changes neither degrade the
   performance of TCP nor interact adversely with other connections.

2 Modified Response to Duplicate ACKs

   When a TCP sender has previously unsent data queued for
   transmission, it SHOULD use the Limited Transmit algorithm, which
   calls for a TCP sender to transmit new data upon the arrival of a
   duplicate ACK when the following conditions are satisfied:

     * The receiver's advertised window allows the transmission of the
       segment.

     * The amount of outstanding data would remain less than the
       congestion window plus the duplicate ACK threshold used to
       trigger Fast Retransmit. In other words, the sender can only
       send two segments beyond the congestion window (cwnd).

   The congestion window (cwnd) MUST NOT be changed when these new
   segments are transmitted. Assuming that these new segments and the
   corresponding ACKs are not dropped, this procedure allows the sender
   to infer loss using the standard Fast Retransmit threshold of three
   duplicate ACKs [RFC2581]. This is more robust to reordered packets
   than it would be to retransmit an old packet on the first or second
   duplicate ACK.

   Note: If the connection is using selective acknowledgments
   [RFC2018], the data sender MUST NOT send new segments in response to
   duplicate ACKs that contain no new SACK information, as a
   misbehaving receiver can generate such ACKs to trigger inappropriate
   transmission of data segments. See [SCWA99] for a discussion of
   attacks by misbehaving receivers.

   Using Limited Transmit follows the ``conservation of packets''
   congestion control principle [Jac88]. Each of the first two
   duplicate ACKs indicates that a segment has left the network.
   Furthermore, the sender has not yet decided that a segment has been
   dropped and therefore has no reason to assume the current congestion
   control state is not accurate.
   Therefore, transmitting segments does not deviate from the spirit
   of TCP's congestion control principles.

   [BPS99] shows that packet reordering is not a rare network event.
   [RFC2581] does not provide for sending of data on the first two
   duplicate ACKs that arrive at the sender. This causes a burst of
   segments to be sent when an ACK for new data does arrive. Using
   Limited Transmit, data packets will be clocked out by incoming ACKs
   and therefore transmission will not be as bursty.

   Note: Limited Transmit is implemented in the ns simulator.
   Researchers wishing to investigate this mechanism further can do so
   by enabling ``singledup_'' for the given TCP connection.

3 Reduction of the Retransmission Threshold

   In some cases the TCP sender may not have new data queued and ready
   to be transmitted when the first two duplicate ACKs arrive. In this
   case, the Limited Transmit algorithm outlined in section 2 cannot be
   utilized. If there is a large amount of outstanding data in the
   network, not being able to transmit new segments when the first two
   duplicate ACKs arrive is not a problem, as Fast Retransmit will be
   triggered naturally. However, when the amount of outstanding data
   is small the sender will have to rely on the RTO to repair any lost
   segments.

   As an example, consider the case when cwnd is three segments and one
   of these segments is dropped by the network. If the other two
   segments arrive at the receiver and the corresponding ACKs are not
   dropped by the network, the sender will receive two duplicate ACKs,
   which is not enough to trigger the Fast Retransmit algorithm. The
   loss can therefore be repaired only after an RTO. However, the
   sender has enough information to infer that it cannot expect three
   duplicate ACKs when one segment is dropped.
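
   As a rough illustration of the example above (not part of the
   proposed mechanisms), the following Python sketch counts the
   duplicate ACKs a sender can expect for a given loss pattern. The
   function names are hypothetical, and the model is an assumption:
   one duplicate ACK per segment that arrives after the first hole,
   with no ACK loss, no delayed ACKs, and no reordering.

```python
DUP_ACK_THRESHOLD = 3  # standard Fast Retransmit threshold [RFC2581]


def expected_dup_acks(delivered):
    """Count the duplicate ACKs triggered by one window of segments.

    `delivered` lists, in send order, whether each outstanding segment
    reached the receiver.  Segments before the first loss advance the
    cumulative ACK; each delivered segment after the first loss elicits
    one duplicate ACK (assumed model, see the lead-in).
    """
    if all(delivered):
        return 0
    first_loss = delivered.index(False)
    return sum(1 for arrived in delivered[first_loss + 1:] if arrived)


def must_wait_for_rto(delivered, threshold=DUP_ACK_THRESHOLD):
    """True when a loss occurred but too few duplicate ACKs will arrive."""
    return not all(delivered) and expected_dup_acks(delivered) < threshold


# cwnd of three segments, first segment dropped: two duplicate ACKs.
print(expected_dup_acks([False, True, True]))        # 2
print(must_wait_for_rto([False, True, True]))        # True
# With four outstanding segments the standard threshold is reached.
print(must_wait_for_rto([False, True, True, True]))  # False
```

   Under these assumptions, any window of fewer than four outstanding
   segments leaves a single loss to be repaired by the RTO.
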

   The first mitigation of the above problem involves lowering the
   duplicate ACK threshold, when cwnd is small and when no unsent data
   segments are enqueued. In particular, if cwnd is less than 4
   segments and there are no unsent segments at the sender, the
   duplicate ACK threshold used to trigger Fast Retransmit is reduced
   to cwnd-1 duplicate ACKs (where cwnd is in terms of segments). In
   other words, when cwnd is small enough that losing one segment would
   not trigger Fast Retransmit, we reduce the duplicate ACK threshold
   to the number of duplicate ACKs expected if one segment is lost.
   This mitigation is clearly less robust in the face of reordered
   segments than the standard Fast Retransmit threshold of three
   duplicate ACKs. Research shows that a general reduction in the
   number of duplicate ACKs required to trigger fast retransmission of
   a segment to two (rather than three) leads to a reduction in the
   ratio of good to bad retransmits by a factor of three [Pax97].
   However, this analysis did not include the additional conditioning
   on the event that the cwnd was smaller than 4 segments.

   Note that persistent reordering of segments, coupled with an
   application that does not constantly send data, can result in large
   numbers of retransmissions. For instance, consider an application
   that sends data two segments at a time, followed by an idle period
   when no data is queued for delivery by TCP. If the network
   consistently reorders the two segments, the TCP sender will
   needlessly retransmit one out of every two unique segments
   transmitted when using the above algorithm. However, this would
   only be a problem for long-lived connections from applications that
   transmit in spurts.

   To combat this problem, a TCP connection SHOULD only send one
   retransmission using a duplicate ACK threshold of less than three.
   This allows for enhanced recovery for short connections and protects
   the network from longer connections that could possibly use this
   algorithm to send many needless retransmissions. We note that
   future research may allow this restriction to be relaxed, and refer
   the reader to Appendix A for a discussion of some alternate
   mechanisms. While not explicitly recommended by this document, we
   believe that these may prove useful depending on the results of
   further research.

4 Related Work

   Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC2481]
   may benefit connections with small congestion window sizes [SA00].
   ECN provides a method for indicating congestion to the end-host
   without dropping segments. While some segment drops may still
   occur, ECN may allow TCP to perform better with small cwnd sizes
   because the sender will be required to detect less segment loss
   [SA00].

5 Security Considerations

   The security implications of the changes proposed in this document
   are minimal. The potential security issues come from the subversion
   of end-to-end congestion control from "false" duplicate ACKs, where
   a "false" duplicate ACK is a duplicate ACK that does not actually
   acknowledge new data received at the TCP receiver. False duplicate
   ACKs could result from duplicate ACKs that are themselves duplicated
   in the network, or from misbehaving TCP receivers that send false
   duplicate ACKs to subvert end-to-end congestion control
   [SCWA99,RFC2581].

   When the TCP data receiver has agreed to use the SACK option, the
   TCP data sender has fairly strong protection against false duplicate
   ACKs. In particular, with SACK, a duplicate ACK that acknowledges
   new data arriving at the receiver reports the sequence numbers of
   that new data.
   Thus, with SACK, the TCP sender can verify that an
   arriving duplicate ACK acknowledges data that the TCP sender has
   actually sent, and for which no previous acknowledgment has been
   received, before sending new data as a result of that
   acknowledgment. For further protection, the TCP sender could keep a
   record of packet boundaries for transmitted data packets, and
   recognize at most one valid acknowledgment for each packet (e.g.,
   the first acknowledgment acknowledging the receipt of all of the
   sequence numbers in that packet).

   One could imagine some limited protection against false duplicate
   ACKs for a non-SACK TCP connection, where the TCP sender keeps a
   record of the number of packets transmitted, and recognizes at most
   one acknowledgment per packet to be used for triggering the sending
   of new data. However, this accounting of packets transmitted and
   acknowledged would require additional state and extra complexity at
   the TCP sender, and does not seem necessary.

   The most important protection against false duplicate ACKs comes
   from the limited potential of duplicate ACKs in subverting
   end-to-end congestion control. There are two separate cases to
   consider: when the TCP sender receives fewer than a threshold number
   of duplicate ACKs, and when the TCP sender receives at least a
   threshold number of duplicate ACKs.

   First we consider the case when the TCP sender receives fewer than a
   threshold number of duplicate ACKs. For example, the TCP receiver
   could send two duplicate ACKs after each regular ACK. One might
   imagine that the TCP sender would send at three times its allowed
   sending rate. However, using Limited Transmit as outlined in
   section 2, the sender is only allowed to exceed the congestion
   window by less than the duplicate ACK threshold, and thus would not
   send a new packet for each duplicate ACK received.
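
   The bound described above can be illustrated with a minimal sketch
   of the Limited Transmit check from section 2. This is not a
   definitive implementation: the function and parameter names are
   hypothetical, all quantities are counted in whole segments for
   brevity, and the window is assumed to be full when the run of
   duplicate ACKs begins.

```python
DUP_ACK_THRESHOLD = 3  # standard Fast Retransmit threshold [RFC2581]


def limited_transmit_allowed(outstanding, cwnd, rwnd, has_unsent_data,
                             sack_enabled=False, new_sack_info=True):
    """Decide whether one new segment may be sent on a duplicate ACK.

    All quantities are in segments (an assumption made for brevity).
    """
    if not has_unsent_data:
        return False
    # With SACK, ignore duplicate ACKs that carry no new SACK information.
    if sack_enabled and not new_sack_info:
        return False
    after_send = outstanding + 1
    # The receiver's advertised window must allow the segment.
    if after_send > rwnd:
        return False
    # Outstanding data must remain below cwnd plus the threshold.
    return after_send < cwnd + DUP_ACK_THRESHOLD


def segments_sent_on_dup_acks(dup_acks, cwnd, rwnd):
    """Count new segments released by one run of duplicate ACKs.

    cwnd is left unchanged, as section 2 requires.
    """
    outstanding = cwnd  # assume the window is full when the run starts
    sent = 0
    for _ in range(dup_acks):
        if limited_transmit_allowed(outstanding, cwnd, rwnd, True):
            outstanding += 1
            sent += 1
    return sent


print(segments_sent_on_dup_acks(dup_acks=2, cwnd=4, rwnd=64))   # 2
print(segments_sent_on_dup_acks(dup_acks=10, cwnd=4, rwnd=64))  # 2
```

   In this model, however many duplicate ACKs arrive in a run, at most
   two segments beyond cwnd are released, which is the limit the
   paragraph above relies on.
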

   We next consider the case when the TCP sender receives at least the
   threshold number of duplicate ACKs. This is an increased
   possibility with the reduction of the duplicate ACK threshold for
   the special case proposed in Section 3. However, in addition to
   retransmitting a packet when a threshold number of duplicate ACKs is
   received, the TCP sender also halves its congestion window, thus
   reinforcing the role of end-to-end congestion control. If the
   retransmitted packet is itself dropped, then it will only be
   retransmitted again after the retransmit timer expires. Thus, the
   potential drawback of a reduced threshold is not one of congestion
   collapse for the network. Instead, the potential drawback would be
   that of a single unnecessary retransmission, and an accompanying
   unnecessary reduction of the congestion window, for the TCP
   connection itself. This is not a security consideration, but a
   performance consideration for the TCP connection itself. We note
   that the reduced threshold would only apply when the TCP sender does
   not have additional data ready to transmit, so the performance
   penalty would be small.

References

   [All00] Mark Allman. A Server-Side View of WWW Characteristics.
       May, 2000. In preparation.

   [AP99] Mark Allman, Vern Paxson. On Estimating End-to-End Network
       Path Properties. ACM SIGCOMM, September 1999.

   [Bal98] Hari Balakrishnan. Challenges to Reliable Data Transport
       over Heterogeneous Wireless Networks. Ph.D. Thesis, University
       of California at Berkeley, August 1998.

   [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan,
       Mark Stemm, and Randy Katz. TCP Behavior of a Busy Web Server:
       Analysis and Improvements. Proc. IEEE INFOCOM Conf., San
       Francisco, CA, March 1998.

   [BPS99] Jon Bennett, Craig Partridge, Nicholas Shectman. Packet
       Reordering is Not Pathological Network Behavior.
       IEEE/ACM Transactions on Networking, December 1999.

   [DMKM00] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent
       Magret. End-to-end Performance Implications of Slow Links.
       Internet-Draft draft-ietf-pilc-slow-03.txt, March 2000 (work in
       progress).

   [FF96] Kevin Fall, Sally Floyd. Simulation-based Comparisons of
       Tahoe, Reno, and SACK TCP. ACM Computer Communication Review,
       July 1996.

   [Flo94] Sally Floyd. TCP and Explicit Congestion Notification. ACM
       Computer Communication Review, October 1994.

   [FMM+99] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky,
       Allyn Romanow. An Extension to the Selective Acknowledgement
       (SACK) Option for TCP. Internet-Draft draft-floyd-sack-00.txt,
       August 1999.

   [Jac88] Van Jacobson. Congestion Avoidance and Control. ACM
       SIGCOMM 1988.

   [LK98] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies: Analysis
       and Improvements. Proceedings of InfoCom, March 1998.

   [Mor97] Robert Morris. TCP Behavior with Many Flows. Proceedings
       of the Fifth IEEE International Conference on Network Protocols.
       October 1997.

   [PA00] Vern Paxson, Mark Allman. Computing TCP's Retransmission
       Timer, April 2000. Internet-Draft draft-paxson-tcp-rto-01.txt
       (work in progress).

   [Pax97] Vern Paxson. End-to-End Internet Packet Dynamics. ACM
       SIGCOMM, September 1997.

   [SA00] Jamal Hadi Salim and Uvaiz Ahmed. Performance Evaluation of
       Explicit Congestion Notification (ECN) in IP Networks.
       draft-hadi-jhsua-ecnperf-01.txt, March 2000 (work in progress).

   [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, Tom
       Anderson. TCP Congestion Control with a Misbehaving Receiver.
       ACM Computer Communications Review, October 1999.

   [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow.
       TCP Selective Acknowledgement Options. RFC 2018, October 1996.

   [RFC2481] K. K. Ramakrishnan, Sally Floyd.
       A Proposal to Add Explicit Congestion Notification (ECN) to IP.
       RFC 2481, January 1999.

   [RFC2581] Mark Allman, Vern Paxson, W. Richard Stevens. TCP
       Congestion Control. RFC 2581, April 1999.

   [RFC2582] Sally Floyd, Tom Henderson. The NewReno Modification to
       TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

Authors' Addresses:

   Mark Allman
   NASA Glenn Research Center/BBN Technologies
   Lewis Field
   21000 Brookpark Rd. MS 54-2
   Cleveland, OH 44135
   Phone: 216-433-6586
   Fax: 216-433-8705
   mallman@grc.nasa.gov
   http://roland.grc.nasa.gov/~mallman

   Hari Balakrishnan
   Laboratory for Computer Science
   545 Technology Square
   Massachusetts Institute of Technology
   Cambridge, MA 02139
   hari@lcs.mit.edu
   http://nms.lcs.mit.edu/~hari/

   Sally Floyd
   AT&T Center for Internet Research at ICSI (ACIRI)
   Phone: +1 (510) 666-2989
   floyd@aciri.org
   http://www.aciri.org/floyd/

Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold

   Decreasing the number of duplicate ACKs required to trigger Fast
   Retransmit, as suggested in section 3, has the drawback of making
   Fast Retransmit less robust in the face of minor network reordering.
   As outlined in section 3, this document only allows a TCP to use
   Fast Retransmit one time when the number of duplicate ACKs is less
   than three. This appendix suggests several methods by which this
   restriction may be removed. However, these methods need further
   research before they are suggested for use in shared networks.

   Using information provided by the DSACK option [FMM+99], a TCP
   sender can determine when its Fast Retransmit threshold is too low,
   causing needless retransmissions due to reordering in the network.
   Coupling the information provided by DSACKs with the algorithm
   outlined in section 3 may provide a further enhancement.
   Specifically, the proposed reduction in the duplicate ACK threshold
   would not be taken if the network path is known to be reordering
   segments.

   The next method is to detect needless retransmits based on the time
   between the retransmission and the next ACK received. As outlined
   in [AP99], if this time is less than half of the minimum RTT
   observed thus far, the retransmission was likely unnecessary. When
   using less than three duplicate ACKs as the threshold to trigger
   Fast Retransmit, a TCP sender could attempt to determine whether the
   retransmission was needed or not. In the case when it was
   unnecessary, the sender could refrain from further use of Fast
   Retransmit with a threshold of less than three duplicate ACKs. This
   method of detecting bad retransmits is not as robust as using
   DSACKs. However, the advantage is that this mechanism only requires
   sender-side implementation changes.

   A TCP sender can take measures to avoid a case where a large
   percentage of the unique segments transmitted are being needlessly
   retransmitted due to the use of a low duplicate ACK threshold (such
   as the one outlined in section 3). Specifically, the sender can
   limit the percentage of retransmits based on a duplicate ACK
   threshold of less than three. This allows the mechanism to be used
   throughout a long-lived connection while protecting the network
   from potentially wasteful, needless retransmissions. However, this
   solution does not attempt to address the underlying problem, but
   rather just limits the damage the algorithm can cause.

   Finally, [Bal98] outlines another solution to the problem of having
   no new segments to transmit into the network when the first two
   duplicate ACKs arrive. In response to these duplicate ACKs, a TCP
   sender transmits zero-byte segments to induce additional duplicate
   ACKs [Bal98].
   This method preserves the robustness of the standard Fast
   Retransmit algorithm at the cost of injecting segments into the
   network that do not deliver any data (and therefore potentially
   waste network resources).
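
   To make one of the alternatives in this appendix concrete, the
   following Python sketch combines the reduced threshold of section 3
   with the timing heuristic attributed to [AP99] above. It is only
   an illustration under stated assumptions: the class and method
   names are hypothetical, times are plain floating-point seconds, and
   clamping the reduced threshold to at least one duplicate ACK is our
   assumption (the text specifies only cwnd-1).

```python
STANDARD_THRESHOLD = 3  # standard Fast Retransmit threshold [RFC2581]


class DupAckThresholdPolicy:
    """Tracks whether the reduced duplicate ACK threshold may be used."""

    def __init__(self):
        self.reduced_allowed = True
        self.min_rtt = None  # minimum RTT observed thus far, in seconds

    def observe_rtt(self, rtt):
        """Record an RTT sample and keep the smallest one seen."""
        self.min_rtt = rtt if self.min_rtt is None else min(self.min_rtt, rtt)

    def threshold(self, cwnd_segments, unsent_segments):
        """Duplicate ACK threshold to use for Fast Retransmit."""
        if (self.reduced_allowed and cwnd_segments < 4
                and unsent_segments == 0):
            # cwnd-1 per section 3; the lower bound of 1 is an assumption.
            return max(1, cwnd_segments - 1)
        return STANDARD_THRESHOLD

    def on_ack_after_early_retransmit(self, elapsed):
        """Apply the timing heuristic to a reduced-threshold retransmit.

        `elapsed` is the time between the retransmission and the next
        ACK.  If it is less than half the minimum RTT, the retransmit
        was likely unnecessary and the reduced threshold is disabled.
        """
        unnecessary = self.min_rtt is not None and elapsed < self.min_rtt / 2
        if unnecessary:
            self.reduced_allowed = False
        return unnecessary


policy = DupAckThresholdPolicy()
policy.observe_rtt(0.2)
policy.observe_rtt(0.1)
print(policy.threshold(3, 0))                      # 2
print(policy.on_ack_after_early_retransmit(0.04))  # True
print(policy.threshold(3, 0))                      # 3
```

   Disabling the reduced threshold for the rest of the connection is
   one possible reaction; as noted above, these methods need further
   research before use in shared networks.
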