idnits 2.17.1

draft-paxson-tcpm-rfc2988bis-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 7 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 2010) is 5184 days in the past. Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2988' is mentioned on line 360, but not defined

  ** Obsolete undefined reference: RFC 2988 (Obsoleted by RFC 6298)

  == Missing Reference: 'JBB92' is mentioned on line 156, but not defined

  == Missing Reference: 'RFC1122' is mentioned on line 360, but not defined

  == Missing Reference: 'RFC5681' is mentioned on line 383, but not defined

  ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC
     5681)

  ** Obsolete normative reference: RFC 793 (ref. 'Pos81') (Obsoleted by RFC
     9293)


     Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Internet Engineering Task Force                                V. Paxson
2    INTERNET DRAFT                                          ICSI/UC Berkeley
3    File: draft-paxson-tcpm-rfc2988bis-00.txt                      M. Allman
4                                                                        ICSI
5                                                                      J. Chu
6                                                                      Google
7                                                               February 2010

9                      Computing TCP's Retransmission Timer

11   Status of this Memo

13      This Internet-Draft is submitted to IETF in full conformance with
14      the provisions of BCP 78 and BCP 79.

16      Internet-Drafts are working documents of the Internet Engineering
17      Task Force (IETF), its areas, and its working groups. Note that
18      other groups may also distribute working documents as Internet-
19      Drafts.

21      Internet-Drafts are draft documents valid for a maximum of six
22      months and may be updated, replaced, or obsoleted by other documents
23      at any time. It is inappropriate to use Internet-Drafts as
24      reference material or to cite them other than as "work in progress."

26      The list of current Internet-Drafts can be accessed at
27      http://www.ietf.org/ietf/1id-abstracts.txt.

29      The list of Internet-Draft Shadow Directories can be accessed at
30      http://www.ietf.org/shadow.html.

32      This Internet-Draft will expire on August 1, 2010.

34   Copyright Notice

36      Copyright (c) 2010 IETF Trust and the persons identified as the
37      document authors. All rights reserved.

39      This document is subject to BCP 78 and the IETF Trust's Legal
40      Provisions Relating to IETF Documents
41      (http://trustee.ietf.org/license-info) in effect on the date of
42      publication of this document. Please review these documents
43      carefully, as they describe your rights and restrictions with
44      respect to this document.
        Code Components extracted from this
45      document must include Simplified BSD License text as described in
46      Section 4.e of the Trust Legal Provisions and are provided without
47      warranty as described in the BSD License.

49   Abstract

51      This document defines the standard algorithm that Transmission
52      Control Protocol (TCP) senders are required to use to compute and
53      manage their retransmission timer. It expands on the discussion in
54      section 4.2.3.1 of RFC 1122 and upgrades the requirement of
55      supporting the algorithm from a SHOULD to a MUST.

57   1   Introduction

59      The Transmission Control Protocol (TCP) [Pos81] uses a retransmission
60      timer to ensure data delivery in the absence of any feedback from the
61      remote data receiver. The duration of this timer is referred to as
62      RTO (retransmission timeout). RFC 1122 [Bra89] specifies that the
63      RTO should be calculated as outlined in [Jac88].

65      This document codifies the algorithm for setting the RTO. In
66      addition, this document expands on the discussion in section 4.2.3.1
67      of RFC 1122 and upgrades the requirement of supporting the algorithm
68      from a SHOULD to a MUST. RFC 2581 [APS99] outlines the algorithm TCP
69      uses to begin sending after the RTO expires and a retransmission is
70      sent. This document does not alter the behavior outlined in RFC 2581
71      [APS99].

73      In some situations it may be beneficial for a TCP sender to be more
74      conservative than the algorithms detailed in this document allow.
75      However, a TCP MUST NOT be more aggressive than the following
76      algorithms allow.

78      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
79      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
80      document are to be interpreted as described in [Bra97].

82   2   The Basic Algorithm

84      To compute the current RTO, a TCP sender maintains two state
85      variables, SRTT (smoothed round-trip time) and RTTVAR (round-trip
86      time variation).
        In addition, we assume a clock granularity of G
87      seconds.

89      The rules governing the computation of SRTT, RTTVAR, and RTO are as
90      follows:

92      (2.1) Until a round-trip time (RTT) measurement has been made for a
93            segment sent between the sender and receiver, the sender SHOULD
94            set RTO <- 1 second, though the "backing off" on repeated
95            retransmission discussed in (5.5) still applies.

97            Note that the previous version of this document used an
98            initial RTO of 3 seconds [RFC2988]. A TCP implementation MAY
99            still use this value (or any other value > 1 second). This
100           change in the lower bound on the initial RTO is discussed in
101           further detail in Appendix A.

103     (2.2) When the first RTT measurement R is made, the host MUST set

105              SRTT <- R
106              RTTVAR <- R/2
107              RTO <- SRTT + max (G, K*RTTVAR)

109           where K = 4.

111     (2.3) When a subsequent RTT measurement R' is made, a host MUST set

113              RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|
114              SRTT <- (1 - alpha) * SRTT + alpha * R'

116           The value of SRTT used in the update to RTTVAR is its value
117           before updating SRTT itself using the second assignment. That
118           is, updating RTTVAR and SRTT MUST be computed in the above
119           order.

121           The above SHOULD be computed using alpha=1/8 and beta=1/4 (as
122           suggested in [JK88]).

124           After the computation, a host MUST update
125           RTO <- SRTT + max (G, K*RTTVAR)

127     (2.4) Whenever RTO is computed, if it is less than 1 second then the
128           RTO SHOULD be rounded up to 1 second.

130           Traditionally, TCP implementations use coarse grain clocks to
131           measure the RTT and trigger the RTO, which imposes a large
132           minimum value on the RTO. Research suggests that a large
133           minimum RTO is needed to keep TCP conservative and avoid
134           spurious retransmissions [AP99].
              Therefore, this
135           specification requires a large minimum RTO as a conservative
136           approach, while at the same time acknowledging that at some
137           future point, research may show that a smaller minimum RTO is
138           acceptable or superior.

140     (2.5) A maximum value MAY be placed on RTO provided it is at least 60
141           seconds.

143  3   Taking RTT Samples

145     TCP MUST use Karn's algorithm [KP87] for taking RTT samples. That
146     is, RTT samples MUST NOT be made using segments that were
147     retransmitted (and thus for which it is ambiguous whether the reply
148     was for the first instance of the packet or a later instance). The
149     only case when TCP can safely take RTT samples from retransmitted
150     segments is when the TCP timestamp option [JBB92] is employed, since
151     the timestamp option removes the ambiguity regarding which instance
152     of the data segment triggered the acknowledgment.

154     Traditionally, TCP implementations have taken one RTT measurement at
155     a time (typically once per RTT). However, when using the timestamp
156     option, each ACK can be used as an RTT sample. RFC 1323 [JBB92]
157     suggests that TCP connections utilizing large congestion windows
158     should take many RTT samples per window of data to avoid aliasing
159     effects in the estimated RTT. A TCP implementation MUST take at
160     least one RTT measurement per RTT (unless that is not possible per
161     Karn's algorithm).

163     For fairly modest congestion window sizes research suggests that
164     timing each segment does not lead to a better RTT estimator [AP99].
165     Additionally, when multiple samples are taken per RTT the alpha and
166     beta defined in section 2 may keep an inadequate RTT history. A
167     method for changing these constants is currently an open research
168     question.

170  4   Clock Granularity

172     There is no requirement for the clock granularity G used for
173     computing RTT measurements and the different state variables.
174     However, if the K*RTTVAR term in the RTO calculation equals zero,
175     the variance term MUST be rounded to G seconds (i.e., use the
176     equation given in step 2.3).

178        RTO <- SRTT + max (G, K*RTTVAR)

180     Experience has shown that finer clock granularities (<= 100 msec)
181     perform somewhat better than more coarse granularities.

183     Note that [Jac88] outlines several clever tricks that can be used to
184     obtain better precision from coarse granularity timers. These
185     changes are widely implemented in current TCP implementations.

187  5   Managing the RTO Timer

189     An implementation MUST manage the retransmission timer(s) in such a
190     way that a segment is never retransmitted too early, i.e. less than
191     one RTO after the previous transmission of that segment.

193     The following is the RECOMMENDED algorithm for managing the
194     retransmission timer:

196     (5.1) Every time a packet containing data is sent (including a
197           retransmission), if the timer is not running, start it running
198           so that it will expire after RTO seconds (for the current value
199           of RTO).

201     (5.2) When all outstanding data has been acknowledged, turn off the
202           retransmission timer.

204     (5.3) When an ACK is received that acknowledges new data, restart the
205           retransmission timer so that it will expire after RTO seconds
206           (for the current value of RTO).

208     When the retransmission timer expires, do the following:

210     (5.4) Retransmit the earliest segment that has not been acknowledged
211           by the TCP receiver.

213     (5.5) The host MUST set RTO <- RTO * 2 ("back off the timer"). The
214           maximum value discussed in (2.5) above may be used to provide an
215           upper bound to this doubling operation.

217     (5.6) Start the retransmission timer, such that it expires after RTO
218           seconds (for the value of RTO after the doubling operation
219           outlined in 5.5).

221     (5.7) If the timer expires awaiting the ACK of a SYN segment and the
222           TCP implementation is using an RTO less than 3 seconds, the RTO
223           MUST be re-initialized to 3 seconds when data transmission
224           begins (i.e., after the three-way handshake completes).

226           This represents a change from the previous version of this
227           document [RFC2988] and is discussed in Appendix A.

229     Note that after retransmitting, once a new RTT measurement is
230     obtained (which can only happen when new data has been sent and
231     acknowledged), the computations outlined in section 2 are performed,
232     including the computation of RTO, which may result in "collapsing"
233     RTO back down after it has been subject to exponential backoff
234     (rule 5.5).

236     Note that a TCP implementation MAY clear SRTT and RTTVAR after
237     backing off the timer multiple times as it is likely that the
238     current SRTT and RTTVAR are bogus in this situation. Once SRTT and
239     RTTVAR are cleared they should be initialized with the next RTT
240     sample taken per (2.2) rather than using (2.3).

242  6   Security Considerations

244     This document requires a TCP to wait for a given interval before
245     retransmitting an unacknowledged segment. An attacker could cause a
246     TCP sender to compute a large value of RTO by adding delay to a
247     timed packet's latency, or that of its acknowledgment. However,
248     the ability to add delay to a packet's latency often coincides with
249     the ability to cause the packet to be lost, so it is difficult to
250     see what an attacker might gain from such an attack that could cause
251     more damage than simply discarding some of the TCP connection's
252     packets.

254     The Internet to a considerable degree relies on the correct
255     implementation of the RTO algorithm (as well as those described in
256     RFC 2581) in order to preserve network stability and avoid
257     congestion collapse.
        An attacker could cause TCP endpoints to
258     respond more aggressively in the face of congestion by forging
259     acknowledgments for segments before the receiver has actually
260     received the data, thus lowering RTO to an unsafe value. But to do
261     so requires spoofing the acknowledgments correctly, which is
262     difficult unless the attacker can monitor traffic along the path
263     between the sender and the receiver. In addition, even if the
264     attacker can cause the sender's RTO to reach too small a value, it
265     appears the attacker cannot leverage this into much of an attack
266     (compared to the other damage they can do if they can spoof packets
267     belonging to the connection), since the sending TCP will still back
268     off its timer in the face of an incorrectly transmitted packet's
269     loss due to actual congestion.

271  7   IANA Considerations

273     None

275  Acknowledgments

277     The RTO algorithm described in this memo was originated by Van
278     Jacobson in [Jac88].

280     Much of the data that motivated changing the initial RTO from 3
281     seconds to 1 second came from Robert Love, Andre Broido and Mike
282     Belshe.

284  Normative References

286     [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
287             Control", RFC 2581, April 1999.

289     [Bra89] Braden, R., "Requirements for Internet Hosts --
290             Communication Layers", STD 3, RFC 1122, October 1989.

292     [Bra97] Bradner, S., "Key words for use in RFCs to Indicate
293             Requirement Levels", BCP 14, RFC 2119, March 1997.

295     [Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
296             September 1981.

298  Non-Normative References

300     [AP99]  Allman, M. and V. Paxson, "On Estimating End-to-End Network
301             Path Properties", SIGCOMM 99.

303     [Chu09] Chu, J., "Tuning TCP Parameters for the 21st Century",
304             http://www.ietf.org/proceedings/75/slides/tcpm-1.pdf, July
305             2009.

307     [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
308             Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.

310     [JK88]  Jacobson, V. and M. Karels, "Congestion Avoidance and
311             Control", ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

313     [KP87]  Karn, P. and C. Partridge, "Improving Round-Trip Time
314             Estimates in Reliable Transport Protocols", SIGCOMM 87.

316  Authors' Addresses

318     Vern Paxson
319     ICSI
320     1947 Center Street
321     Suite 600
322     Berkeley, CA 94704-1198

324     Phone: 510-666-2882
325     EMail: vern@icir.org
326     http://www.icir.org/vern/

328     Mark Allman
329     ICSI
330     1947 Center Street
331     Suite 600
332     Berkeley, CA 94704-1198

334     Phone: 440-235-1792
335     EMail: mallman@icir.org
336     http://www.icir.org/mallman/

338     H.K. Jerry Chu
339     Google, Inc.
340     1600 Amphitheatre Parkway
341     Mountain View, CA 94043

343     Phone: 650-253-3010
344     EMail: hkchu@google.com

346  Appendix A

348     Choosing a reasonable initial RTO requires balancing two
349     competing considerations:

351     1. The initial RTO should be sufficiently large to cover most of the
352        end-to-end paths to avoid spurious retransmissions and their
353        associated negative performance impact.

355     2. The initial RTO should be small enough to ensure a timely
356        recovery from packet loss occurring before an RTT sample is
357        taken.

359     Traditionally, TCP has used 3 seconds as the initial RTO
360     [RFC1122,RFC2988]. This document calls for lowering this value to 1
361     second for the following reasons:

363     - Modern networks are simply faster than the state-of-the-art was
364       at the time the initial RTO of 3 seconds was defined.

366     - Studies have found that the round-trip times of more than 97.5% of
367       the connections observed in a large-scale analysis were less than
368       1 second [Chu09], suggesting that 1 second meets criterion 1 above.

370     - In addition, the studies have observed retransmission rates within the
371       three-way handshake of roughly 2%. This shows that reducing the
372       initial RTO benefits a non-negligible set of connections.

374     - However, roughly 2.5% of the connections studied in [Chu09] have
375       an RTT longer than 1 second. For those connections, a 1 second
376       initial RTO guarantees a retransmission during connection establishment
377       (needed or not).

379       When this happens, this document calls for reverting to an initial
380       RTO of 3 seconds for the data transmission phase. Therefore, the
381       implications of the spurious retransmission are modest: (1) an
382       extra SYN is transmitted into the network, and (2) according to
383       [RFC5681] the initial congestion window will be limited to 1
384       segment. While (2) clearly puts such connections at a
385       disadvantage, this document at least resets the RTO such that the
386       connection will not continually run into problems with a short
387       timeout. (Of course, if the RTT is more than three seconds, the
388       connection will still encounter difficulties. But that is not a new
389       issue for TCP.)

391     In addition, we note that when using timestamps the TCP will be
392     able to take an RTT sample even in the presence of a spurious
393     retransmission, hence avoiding concern (2) above.
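
As an illustration only, the estimator rules (2.1) through (2.5) and the timer rules (5.5) and (5.7) might be sketched in Python as follows. The class name, the method names, the 1 ms clock granularity, and the 60-second cap are assumptions made for this sketch; none of them appears in the draft, and a real TCP stack would also need the timer management of rules (5.1) through (5.4) and (5.6).

```python
from dataclasses import dataclass
from typing import Optional

K = 4                   # multiplier on RTTVAR, rule (2.2)
ALPHA = 1 / 8           # SRTT gain, rule (2.3)
BETA = 1 / 4            # RTTVAR gain, rule (2.3)
MIN_RTO = 1.0           # seconds, rule (2.4)
MAX_RTO = 60.0          # seconds; assumed cap, rule (2.5) allows any value >= 60
INITIAL_RTO = 1.0       # seconds, rule (2.1)
SYN_FALLBACK_RTO = 3.0  # seconds, rule (5.7)


@dataclass
class RtoEstimator:
    """Hypothetical helper that tracks SRTT, RTTVAR, and RTO for one connection."""

    g: float = 0.001                 # assumed clock granularity G, in seconds
    srtt: Optional[float] = None     # None until the first RTT sample arrives
    rttvar: Optional[float] = None
    rto: float = INITIAL_RTO
    syn_timed_out: bool = False

    def on_rtt_sample(self, r: float) -> None:
        """Apply rule (2.2) to the first sample and rule (2.3) to later ones.

        The caller is assumed to honor Karn's algorithm (section 3) and not
        pass samples taken from retransmitted segments.
        """
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR must be updated first, using the old value of SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        rto = self.srtt + max(self.g, K * self.rttvar)
        # Rules (2.4) and (2.5): enforce the floor and the optional ceiling.
        self.rto = min(max(rto, MIN_RTO), MAX_RTO)

    def on_timeout(self, awaiting_syn_ack: bool = False) -> None:
        """Rule (5.5): double the RTO, bounded by the maximum."""
        if awaiting_syn_ack:
            self.syn_timed_out = True
        self.rto = min(self.rto * 2, MAX_RTO)

    def on_handshake_complete(self) -> None:
        """Rule (5.7): after a SYN timeout, start data transfer at >= 3 s."""
        if self.syn_timed_out and self.rto < SYN_FALLBACK_RTO:
            self.rto = SYN_FALLBACK_RTO
```

Under these assumptions, a first sample of 0.5 seconds gives SRTT = 0.5, RTTVAR = 0.25, and RTO = 0.5 + max(G, 4 * 0.25) = 1.5 seconds. A connection whose SYN timed out once at the 1 second initial RTO backs off to 2 seconds and is then reset to 3 seconds when the handshake completes.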