idnits 2.17.1 draft-ietf-tsvwg-tcp-eifel-alg-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 9 longer pages, the longest (page 2) being 66 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 10 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([WS95], [RFC2119]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC1122' is mentioned on line 356, but not defined == Unused Reference: 'RFC793' is defined on line 428, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Obsolete normative reference: RFC 1323 (Obsoleted by RFC 7323) ** Obsolete normative reference: RFC 2582 (Obsoleted by RFC 3782) -- Possible downref: Non-RFC (?) normative reference: ref. 'Gu01' -- Possible downref: Non-RFC (?) normative reference: ref. 'KP87' -- Possible downref: Non-RFC (?) normative reference: ref. 'LK00' ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298) ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) -- Possible downref: Non-RFC (?) normative reference: ref. 'WS95' Summary: 11 errors (**), 0 flaws (~~), 6 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Reiner Ludwig 3 INTERNET-DRAFT Michael Meyer 4 Expires: August 2002 Ericsson Research 5 February, 2002 7 The Eifel Detection Algorithm for TCP 8 10 Status of this memo 12 This document is an Internet-Draft and is in full conformance with 13 all provisions of Section 10 of RFC2026. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that other 17 groups may also distribute working documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or cite them other than as "work in progress". 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/lid-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html 30 Abstract 32 The Eifel detection algorithm allows the TCP sender to detect a 33 posteriori whether it has entered loss recovery unnecessarily. It 34 already determines this in the beginning of loss recovery when the 35 first new ACK arrives after the timeout-based or fast retransmit has 36 been sent. The algorithm requires that the TCP Timestamps option 37 defined in RFC1323 is enabled for a connection. The idea is to use 38 the timestamps echoed in returning ACKs to eliminate the 39 retransmission ambiguity in TCP. The Eifel detection algorithm 40 provides a basis for future TCP enhancements. This includes response 41 algorithms to back out of loss recovery by restoring a TCP sender's 42 congestion control state. 44 Terminology 46 The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, 47 SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this 48 document, are to be interpreted as described in [RFC2119]. 50 We use the term 'new ACK' to refer to an ACK that acknowledges 51 previously unacknowledged data. We use the term 'duplicate ACK' as 52 defined in [WS95]. Furthermore, we refer to the first-time 53 transmission of a data segment as the 'original transmit'. 55 1. Introduction 57 The retransmission ambiguity problem [KP87] is the TCP sender's 58 inability to distinguish whether the first new ACK that arrives after 59 a retransmit was sent in response to the original transmit or the 60 retransmit. This problem occurs after a timeout-based retransmit and 61 after a fast retransmit. The Eifel detection algorithm uses the TCP 62 Timestamps option defined in RFC1323 to eliminate the retransmission 63 ambiguity. It thereby allows the TCP sender to detect a posteriori 64 whether it has entered loss recovery unnecessarily. 66 This added capability of the TCP sender is useful in environments 67 where TCP's loss recovery and congestion control algorithms may often 68 get falsely triggered. This can be caused by packet reordering, 69 packet duplication, the loss of a flight of ACKs, or a sudden delay 70 increase in the data or the ACK path that results in a spurious 71 timeout. For example, such sudden delay increases can often occur in 72 wide-area wireless access networks due to handovers, resource 73 preemption, or because the mobile transmitter traverses through a 74 radio coverage hole (e.g., see [Gu01]). 76 Based on the Eifel detection algorithm, the TCP sender may then 77 choose to implement dedicated response algorithms. One goal of such a 78 response algorithm would be to alleviate the consequences of a 79 falsely triggered loss recovery. This may include restoring the TCP 80 sender's congestion control state, and/or avoiding the often 81 unnecessary go-back-N retransmits that typically occur after a 82 spurious timeout. Another goal would be to adapt protocol parameters 83 such as the duplicate acknowledgement threshold [RFC2581], and/or the 84 RTT estimators [RFC2988]. This is to reduce the risk of falsely 85 triggering TCP's loss recovery again as the connection progresses. 86 However, such response algorithms are outside the scope of this 87 document. Note: The original proposal, the "Eifel algorithm" [LK00], 88 comprises both a detection and a response algorithm. This document 89 only defines the detection part. 91 A key feature of the Eifel detection algorithm is that it already 92 detects upon the first new ACK that arrives during loss recovery 93 whether a fast retransmit or a timeout was spurious. This is crucial 94 to be able to avoid the mentioned go-back-N retransmits. Another 95 feature is that the Eifel detection algorithm is fairly robust 96 against the loss of ACKs. 98 2. Events that Falsely Trigger TCP Loss Recovery 100 The following events falsely trigger a TCP sender's loss recovery and 101 congestion control algorithms. This causes a so-called spurious 102 retransmit, and an unnecessary reduction of the TCP sender's 103 congestion window and slow start threshold [RFC2581]. 105 - Spurious timeout 107 - Packet reordering 109 - Packet duplication 111 The common understanding of a spurious timeout is a timeout that 112 would not have occurred had the sender "waited longer". This may be 113 caused by increased delay that suddenly occurs in the data and/or the 114 ACK path. That in turn might cause an ACK to arrive too late, i.e., 115 only after the TCP sender's retransmission timer has expired. Note 116 that such a late ACK could either be a new ACK that would acknowledge 117 the oldest outstanding segment, or it could be the third duplicate 118 ACK that would trigger a fast retransmit of the oldest outstanding 119 segment [RFC2581]. For the purpose of specifying the algorithm in 120 Section 3, we define the former case as SPUR_TO_NEWACK, and the 121 latter as SPUR_TO_DUPACK. However, in this document we stretch the 122 definition of a spurious timeout. We also speak of a spurious timeout 123 when the timeout occurred because an entire flight of ACKs was lost 124 while in fact the original transmit of the oldest outstanding segment 125 had arrived at the TCP receiver. This case is also considered to fall 126 under the definition of SPUR_TO_NEWACK. Even though such a timeout is 127 certainly unavoidable, the triggering of loss recovery was spurious 128 since that original transmit was not lost. Hence, we use the term 129 'spurious timeout' in the sense that entering the associated loss 130 recovery was unnecessary. Yet, we exclude from the definition of a 131 spurious timeout the case where the retransmit arrives at the TCP 132 receiver before the corresponding original transmit. 134 Packet reordering in the network may occur because IP [RFC791] does 135 not guarantee in-order delivery of packets. Additionally, a TCP 136 receiver generates a duplicate ACK for each segment that arrives out- 137 of-order. This results in a spurious fast retransmit if three or more 138 data segments arrive out-of-order at the TCP receiver, and at least 139 three of the resulting duplicate ACKs arrive at the TCP sender. We 140 define such a case as SPUR_FR. 142 Packet duplication may occur because a receiving IP does not (cannot) 143 remove packets that have been duplicated in the network. A TCP 144 receiver in turn also generates a duplicate ACK for each duplicate 145 segment. As with packet reordering, this results in a spurious fast 146 retransmit if duplication of data segments or ACKs results in three 147 or more duplicate ACKs to arrive at the TCP sender. This case is also 148 considered to fall under the definition of SPUR_FR. 150 The negative impact on TCP performance caused by packet reordering 151 and packet duplication is commonly the same: a single spurious 152 retransmit (the fast retransmit), and the unnecessary halving of the 153 TCP sender's congestion window as a result of the subsequent fast 154 recovery phase [RFC2581]. 156 The negative impact on TCP performance caused by a spurious timeout 157 is more severe. First, the timeout event itself causes a single 158 spurious retransmit, and unnecessarily forces the TCP sender into 159 slow start [RFC2581]. Then, as the connection progresses, a chain 160 reaction gets triggered that further decreases TCP's performance. 161 Since the timeout was spurious, at least some ACKs for original 162 transmits will typically arrive at the TCP sender before the ACK for 163 the retransmit arrives. (This is unless severe packet reordering 164 coincided with the spurious timeout in such a way that the ACK for 165 the retransmit is the first new ACK to arrive at the TCP sender.) 166 Those ACKs for original transmits will then trigger an implicit go- 167 back-N loss recovery at the TCP sender. Assuming that none of the 168 outstanding segments and none of the corresponding ACKs were lost, 169 all outstanding segments will get retransmitted unnecessarily. 170 Moreover, in some TCPs this may then cause a spurious fast 171 retransmit. This is because the spurious go-back-N retransmits will 172 arrive as duplicates at the TCP receiver, which in turn triggers a 173 series of duplicate ACKs. Note that this last spurious fast 174 retransmit could be avoided with the careful variant of 'bug fix' 175 [RFC2582]. 177 More detailed explanations including TCP trace plots that visualize 178 the effects of spurious timeouts and packet reordering can be found 179 in [LK00]. 181 3. The Eifel Detection Algorithm 183 3.1 The Idea 185 The goal of the Eifel detection algorithm is to allow the TCP sender 186 to detect a posteriori whether it has entered loss recovery 187 unnecessarily. Furthermore, the TCP sender should be able to make 188 this decision upon the first new ACK that arrives after the timeout- 189 based or fast retransmit has been sent. This in turn requires extra 190 information in every ACK by which the TCP sender can unambiguously 191 distinguish whether that first new ACK was sent in response to the 192 original transmit or the retransmit. Such extra information is 193 provided by the TCP Timestamps option [RFC1323]. Generally speaking, 194 timestamps are monotonously increasing "serial numbers" added into 195 every segment that are then echoed within the corresponding ACKs. 196 This is exploited by the Eifel detection algorithm in the following 197 way. 199 Given that timestamps are enabled for a connection the TCP sender 200 always stores the timestamp of the retransmit sent in the beginning 201 of loss recovery, i.e., the timestamp of the timeout-based or the 202 fast retransmit. If the timestamp of the first new ACK, arriving 203 after the mentioned retransmit was sent, is smaller then the stored 204 timestamp of that retransmit, then that ACK must have been sent in 205 response to an original transmit. Hence, the TCP sender must have 206 entered loss recovery unnecessarily. 208 The fact that the Eifel detection algorithm decides upon this first 209 new ACK is crucial to allow future response algorithms to avoid the 210 spurious go-back-N retransmits that typically occur after a spurious 211 timeout. Also, if loss recovery was entered unnecessarily, a window 212 worth of ACKs are outstanding that all carry a timestamp that is 213 smaller than the stored timestamp of the retransmit. The arrival of 214 any one of those ACKs suffices the Eifel detection algorithm to work. 215 Hence, the solution is fairly robust against ACK losses. Even the ACK 216 sent in response to the retransmit, i.e., the one that carries the 217 stored timestamp, may get lost. 219 Note: Also the DSACK option [RFC2883] can be used to detect a 220 posteriori whether the TCP sender has entered loss recovery 221 unnecessarily. However, the first ACK carrying a DSACK option usually 222 arrives at the TCP sender only after loss recovery has already 223 terminated. Thus, the DSACK option cannot be used to eliminate the 224 retransmission ambiguity. Consequently, it cannot be used to avoid 225 the mentioned spurious go-back-N retransmits. Moreover, a DSACK-based 226 detection algorithm is less robust against ACK losses. 228 3.2 The Algorithm 230 Given that the TCP Timestamps option [RFC1323] is enabled for a 231 connection, a TCP sender MAY use the Eifel detection algorithm as 232 defined in this subsection. 234 If the Eifel detection algorithm is used, the following steps MUST be 235 taken by the TCP sender only upon initiation of loss recovery, i.e., 236 when either the timeout-based retransmit or fast retransmit is sent. 237 Note: The Eifel detection algorithm should not be (re-)initiated 238 after loss recovery has already started. In particular, it may not be 239 (re-)initiated upon subsequent timeouts for the same segment, and not 240 upon retransmitting segments other than the oldest outstanding 241 segment. 243 (1) Set a "SpuriousRecovery" variable to FALSE. 245 (2) Set a "RetransmitTS" variable to the value of the 246 Timestamp Value field of the Timestamps option included 247 in the retransmit. A TCP sender must ensure that 248 RetransmitTS does not get overwritten as loss recovery 249 progresses, e.g., in case of a second timeout and 250 subsequent second retransmit of the same octet. 252 (3) Wait for the arrival of a valid ACK. If a valid ACK 253 arrives, then proceed to step (4). 255 (4) If a first or second duplicate ACK has arrived, then 256 return to step (3), else proceed to step (5). 258 (5) If a third duplicate ACK has arrived and the loss 259 recovery had been initiated with a timeout-based 260 retransmit, then set SpuriousRecovery to SPUR_TO_DUPACK 261 and proceed to step (DONE), else proceed to step (6). 263 (6) If a new ACK has arrived and the Timestamp Echo Reply 264 field of the ACK's Timestamps option is smaller than 265 RetransmitTS, then proceed to step (7), else proceed to 266 step (DONE). 268 (7) If the loss recovery had been initiated with a timeout- 269 based retransmit then set SpuriousRecovery to 270 SPUR_TO_NEWACK, else set SpuriousRecovery to SPUR_FR. 271 Proceed to step (DONE). 273 (DONE) No further processing. 275 The comparison operator "smaller than" in step (6) is conservative. 276 In theory, when the "timestamp clock" is slow or the network is fast, 277 RetransmitTS could at most be equal to the timestamp echoed by an ACK 278 sent in response to an original transmit. In that case, it is assumed 279 that the loss recovery was not falsely triggered. 281 3.3 Non-RFC1323-compliant TCP Receivers Mask Certain Spurious Timeouts 283 Not all TCP implementations strictly follow RFC1323. There are 284 differences concerning which timestamp a TCP receiver echoes when a 285 duplicate segment arrives. The relevant case to consider in this 286 context is a spurious timeout that occurred because an entire flight 287 of ACKs got lost. (Recall the definition of a spurious timeout in 288 Section 2.) The question is which timestamp the TCP receiver echoes 289 in response to the spurious retransmit that typically arrives as a 290 duplicate segment. RFC1323-compliant TCP receivers (e.g., LINUX 291 kernel 2.4) will echo the timestamp of the last segment that arrived 292 in-sequence, while some non-RFC1323-compliant TCP receivers will echo 293 the timestamp of the retransmit. 295 The Eifel detection algorithm will only detect such spurious timeouts 296 in case the TCP receiver is RFC1323-compliant in this respect. 298 3.4 Protecting Against Misbehaving TCP Receivers 300 A TCP receiver can easily make a genuine retransmit appear to the TCP 301 sender as a spurious retransmit by forging echoed timestamps. This 302 may pose a security concern. 304 Fortunately, there is a way to modify the Eifel detection algorithm 305 in a way that makes it robust against lying TCP receivers. The idea 306 is to use timestamps as a "segment's secret" that the TCP receiver 307 only gets to know if it receives the segment. Conversely, a TCP 308 receiver will not know the timestamp of a segment that was lost. 309 Hence, to "prove" that it received the original transmit of a segment 310 that the TCP sender retransmitted, the TCP receiver would need to 311 return the timestamp of that original transmit. The Eifel detection 312 algorithm could then be modified to only decide that loss recovery 313 had been unnecessarily entered if the first new ACK echoes the 314 timestamp of the original transmit. 316 Hence, implementers may choose to implement the algorithm with the 317 following modifications. 319 Step (2) is replaced with step (2'): 321 (2') Set a "RetransmitTS" variable to the value of the 322 Timestamp Value field of the Timestamps option that was 323 included in the original transmit corresponding to the 324 retransmit. Note: This step requires that the TCP sender 325 stores the timestamps of all outstanding original 326 transmits. 328 Step (4) is replaced with step (4'): 330 (4') If a first or second duplicate ACK has arrived, then 331 proceed to step (DONE), else proceed to step (6). 333 Step (6) is replaced with step (6'): 335 (6') If a new ACK has arrived and the Timestamp Echo Reply 336 field of the ACK's Timestamps option is equal to 337 RetransmitTS, then proceed to step (7), else proceed to 338 step (DONE). 340 These modifications come at a cost. First, spurious timeouts of type 341 SPUR_TO_DUPACK cannot be detected any longer. Also, the modified 342 algorithm is fairly sensitive against ACK losses since it relies on 343 the arrival of the new ACK that corresponds to the original transmit. 345 Note: The first new ACK that arrives after loss recovery had been 346 unnecessarily entered, typically echoes the timestamp of the original 347 transmit. This assumes that the ACK corresponding to the original 348 transmit was not lost, and the TCP receiver does not forge timestamps 349 but complies with RFC1323. In case of a spurious fast retransmit, 350 this is implied by the rules for generating ACKs for data segments 351 that fill in all or part of a gap in the sequence space (see section 352 4.2 of [RFC2581]) and by the rules for echoing timestamps in that 353 case (see rule (C) in section 3.4 of [RFC1323]). In case of a 354 spurious timeout, it is likely that the delay that has caused the 355 spurious timeout has also caused the TCP receiver's delayed ACK timer 356 [RFC1122] to expire before the original transmit arrives. Also, in 357 this case the rules for generating ACKs and the rules for echoing 358 timestamps (see rule (A) in section 3.4 of [RFC1323]) ensure that the 359 original transmit's timestamp is echoed. 361 A remaining problem is that a TCP receiver might guess a lost 362 segment's timestamp from observing the timestamps of segments that 363 arrived earlier. This could be avoided by having the TCP sender add a 364 "random counter" to the timestamp of every segment it sends, and then 365 increment that counter by a random value, e.g., between 1 and 100. 366 This ensures that timestamps remain monotonously increasing while 367 making it difficult for a TCP receiver to guess the timestamp of a 368 lost segment. 370 4. Security Considerations 372 There do not seem to be any security considerations associated with 373 the Eifel detection algorithm. This is because the Eifel detection 374 algorithm does not alter existing protocol state at the TCP sender 375 nor at the receiver. That is, no value of a state variable other than 376 one introduced by the algorithm itself (SpuriousRecovery, and 377 RetransmitTS) is changed. 379 Moreover, a variant of the Eifel detection algorithm has been 380 proposed in Section 3.4 that makes it robust against lying TCP 381 receivers. 383 Acknowledgments 385 Many thanks to Keith Sklower, Randy Katz, Stephan Baucke, Sally 386 Floyd, Vern Paxson, Mark Allman, Ethan Blanton, Andrei Gurtov, Pasi 387 Sarolahti, and Alexey Kuznetsov for useful discussions that 388 contributed to this work. 390 References 392 [RFC2581] M. Allman, V. Paxson, W. Stevens, TCP Congestion Control, 393 RFC 2581, April 1999. 395 [RFC2119] S. Bradner, Key words for use in RFCs to Indicate 396 Requirement Levels, RFC 2119, March 1997. 398 [RFC1323] V. Jacobson, R. Braden, D. Borman, TCP Extensions for High 399 Performance, RFC 1323, May 1992. 401 [RFC2582] S. Floyd, T. Henderson, The NewReno Modification to TCP's 402 Fast Recovery Algorithm, RFC 2582, April 1999. 404 [RFC2883] S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, A. Romanow, 405 An Extension to the Selective Acknowledgement (SACK) Option 406 for TCP, RFC 2883, July 2000. 408 [Gu01] A. Gurtov, Effect of Delays on TCP Performance, In 409 Proceedings of IFIP Personal Wireless Communications, 410 August 2001. 412 [KP87] P. Karn, C. Partridge, Improving Round-Trip Time Estimates 413 in Reliable Transport Protocols, In Proceedings of ACM 414 SIGCOMM 87. 416 [LK00] R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP 417 Robust Against Spurious Retransmissions, ACM Computer 418 Communication Review, Vol. 30, No. 1, January 2000, 419 available at http://www.acm.org/sigcomm/ccr/archive/2000/ 420 jan00/ccr-200001-ludwig.html (easier studied when 421 viewed/printed in color). 423 [RFC2988] V. Paxson, M. Allman, Computing TCP's Retransmission Timer, 424 RFC 2988, November 2000. 426 [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. 428 [RFC793] J. Postel, Transmission Control Protocol, RFC793, September 429 1981. 431 [WS95] G. R. Wright, W. R. Stevens, TCP/IP Illustrated, Volume 2 432 (The Implementation), Addison Wesley, January 1995. 434 Author's Address 436 Reiner Ludwig 437 Ericsson Research (EED) 438 Ericsson Allee 1 439 52134 Herzogenrath, Germany 440 Phone: +49 2407 575 719 441 Fax: +49 2407 575 400 442 Email: Reiner.Ludwig@eed.ericsson.se 444 Michael Meyer 445 Ericsson Research (EED) 446 Ericsson Allee 1 447 52134 Herzogenrath, Germany 448 Phone: +49 2407 575 654 449 Fax: +49 2407 575 400 450 Email: Michael.Meyer@eed.ericsson.se 452 This Internet-Draft expires in August 2002.