TCPM Working Group                                             J. Touch
Internet Draft                                                  USC/ISI
Intended status: Standards Track                          July 16, 2012
Expires: January 2013

                 Automating the Initial Window in TCP
                 draft-touch-tcpm-automatic-iw-03.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   The Initial Window (IW) provides the starting point for TCP's
   feedback-based congestion control algorithm. Its value has grown
   over time to improve performance and to reflect the increased
   capability of Internet devices. This document describes a mechanism
   to adjust the IW over long timescales, to make future changes safer
   to deploy and to potentially avoid reexamination of this value in
   the future.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Design Considerations
   4. Proposed IW Algorithm
   5. Discussion
   6. Observations
   7. Security Considerations
   8. IANA Considerations
   9. Conclusions
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Acknowledgments

1. Introduction

   TCP's congestion control algorithm uses an initial window value
   (IW), both as a starting point for new connections and after one RTO
   or more [RFC2581][RFC2861]. This value has evolved over time: it was
   originally one maximum segment size (MSS), and was later increased
   to the lesser of four MSS or 4,380 bytes [RFC3390][RFC5681]. For
   typical Internet connections with a maximum transmission unit (MTU)
   of 1500 bytes, this permits three segments of 1,460 bytes each.
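
   The arithmetic behind the "three segments of 1,460 bytes" figure can
   be checked with a short sketch. It uses the upper bound given in RFC
   3390, min(4*MSS, max(2*MSS, 4380 bytes)). The function names are
   illustrative only, and the 20-byte IPv4 and 20-byte TCP header sizes
   (no options) are assumptions made for this example.

```python
def mss_from_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS in bytes, assuming IPv4 and TCP headers without options."""
    return mtu - ip_header - tcp_header


def rfc3390_initial_window(mss: int) -> int:
    """Upper bound on the initial window in bytes, per RFC 3390."""
    return min(4 * mss, max(2 * mss, 4380))


mss = mss_from_mtu(1500)            # 1500 - 20 - 20 = 1460 bytes
iw = rfc3390_initial_window(mss)    # min(5840, max(2920, 4380)) = 4380 bytes
segments = iw // mss                # 4380 // 1460 = 3 full-sized segments
```

   For small segments the 4*MSS term dominates (four segments), and for
   very large segments the 2*MSS term does (two segments).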

   The IW value was implied in the original TCP congestion control
   description and was documented as a standard in 1997
   [RFC2001][Ja88]. The value was last updated in 1998 experimentally,
   and moved to the standards track in 2002 [RFC2414][RFC3390]. There
   have been recent proposals to update the IW based on further
   increases in host and router capabilities and network capacity, some
   focusing on specific values (e.g., IW=10), and others prescribing a
   schedule for increases over time (e.g., IW=6 for 2011, increasing by
   1-2 MSS per year).

   This document proposes that TCP can objectively measure when an IW
   is too large, and that such feedback should be used over long
   timescales to adjust the IW automatically. The result should be
   safer to deploy and might avoid the need to repeatedly revisit IW
   size over time.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

   In this document, the characters ">>" preceding an indented line(s)
   indicate a compliance requirement statement using the key words
   listed above. This convention aids reviewers in quickly identifying
   or finding the explicit compliance requirements of this RFC.

3. Design Considerations

   TCP's IW value has existed statically for over two decades, so any
   solution to adjusting the IW dynamically should have similarly
   stable, non-invasive effects on the performance and complexity of
   TCP. In order to be fair, the IW should be similar for most machines
   on the public Internet.
   Finally, a desirable goal is to develop a self-correcting
   algorithm, so that IW values that cause network problems can be
   avoided. To that end, we propose the following list of design goals:

   o  Have little to no impact on TCP in the absence of loss, i.e.,
      it should not increase the complexity of default packet
      processing in the normal case.

   o  Adapt to network feedback over long timescales, avoiding values
      that persistently cause network problems.

   o  Decrease the IW in the presence of sustained loss of IW segments,
      as determined over a number of different connections.

   o  Increase the IW in the absence of sustained loss of IW segments,
      as determined over a number of different connections.

   o  Operate conservatively, i.e., tend towards leaving the IW the
      same in the absence of sufficient information, and give greater
      consideration to IW segment loss than IW segment success.

   We expect that, without other context, a good IW algorithm will
   converge to a single value, but this is not required. An endpoint
   with additional context or information, or deployed in a constrained
   environment, can always use a different value. Specifically,
   information from previous connections, or sets of connections with a
   similar path, can already be used as context for such decisions
   [RFC2140].

   However, if a given IW value persistently causes packet loss during
   the initial burst of packets, it is clearly inappropriate and could
   be inducing unnecessary loss in other competing connections. This
   might happen for sites behind very slow boxes with small buffers,
   which may or may not be the first hop.

4. Proposed IW Algorithm

   Below is a simple description of the proposed IW algorithm.
   It relies on the following parameters:

   o  MinIW = 3 MSS or 4,380 bytes (as per [RFC3390])

   o  MaxIW = 10 MSS

   o  MulDecr = 0.5

   o  AddIncr = 2 MSS

   o  Threshold = 0.05

   We assume that the minimum IW (MinIW) should be as currently
   specified [RFC3390]. The maximum IW can be set to a fixed value
   [Ch10], or set based on a schedule if trusted time references are
   available [Al10]; here we prefer a fixed value. We also propose to
   use an AIMD algorithm, with increases and decreases as noted.

   Although these parameters are somewhat arbitrary, their initial
   values are not important except that the algorithm is AIMD and the
   MaxIW should not exceed that recommended for other systems on the
   Internet. Current proposals, including default current operation,
   are degenerate cases of the algorithm below for given parameters -
   notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling the
   automatic part of the algorithm.

   The proposed algorithm is as follows:

   0. On boot:

      IW = MaxIW; # assume this is in bytes, and an even number of MSS

   1. Upon starting a new connection:

      CWND = IW;
      conncount++;
      IWnotchecked = 1; # true

   2. During a connection's SYN-ACK processing, if SYN-ACK includes
      ECN, treat as if the IW is too large:

      if (IWnotchecked && (synackecn == 1)) {
         losscount++;
         IWnotchecked = 0; # never check again
      }

   3. During a connection, if retransmission occurs, check the seqno of
      the outgoing packet (in bytes) to see if the resent segment fixes
      an IW loss:

      # offset of the resent segment from the ISN, in bytes
      if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) {
         losscount++;
         IWnotchecked = 0; # never do this entire "if" again
      } else {
         IWnotchecked = 0; # you're beyond the IW so stop checking
      }

   4. Once every 1000 connections, as a separate process (i.e., not as
      part of processing a given connection):

      if (conncount > 1000) {
         if (losscount/conncount > Threshold) {
            # the fraction of connections with IW losses is too high
            IW = IW * MulDecr;
         } else {
            IW = IW + AddIncr;
         }
         conncount = 0; # start a new evaluation interval
         losscount = 0;
      }

   We recognize that this algorithm can yield a false positive when the
   sequence number wraps around. This can be avoided using either PAWS
   [RFC1323] context or 64-bit internal sequence numbers (as in TCP-AO
   [RFC5925]). Alternately, false positives can be allowed since they
   are expected to be infrequent and thus will not affect the overall
   statistics of the algorithm.

   The following additional constraints are imposed:

   >> The automatic IW algorithm MUST initialize to MaxIW, in the
   absence of other context information.

   If there are too few connections to make a decision, or if there is
   otherwise insufficient information to increase the IW, then the
   MaxIW defaults to the current recommended value.

   >> An implementation MAY allow the MaxIW to grow beyond the
   currently recommended Internet default, but not more than 2 segments
   per calendar year.

   If an endpoint has a persistent history of successfully transmitting
   IW segments without loss, then it is allowed to probe the Internet
   to determine if larger IW values have similar success. This probing
   is limited and requires a trusted time source, otherwise the MaxIW
   remains constant.

   >> An implementation MUST adjust the IW based on loss statistics at
   least once every 1000 connections.

   An endpoint needs to be sufficiently reactive to IW loss.

   >> An implementation MUST decrease the IW by at least one MSS when
   indicated during an evaluation interval.

   An endpoint that detects loss needs to decrease its IW by at least
   one MSS, otherwise it is not participating in an automatic reactive
   algorithm.
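
   Steps 0 through 4 above can be sketched in runnable form as follows.
   This is a minimal illustration under stated assumptions, not a
   reference implementation: the names IWState, Connection, and
   in_initial_window are hypothetical, the MSS is assumed to be 1,460
   bytes, the result is clamped to [MinIW, MaxIW], and the counters are
   reset after each evaluation. The sequence-number offset is computed
   modulo 2^32 so that an ISN near the top of the sequence space is
   handled; this does not remove the wraparound false positive noted
   above for connections that send more than 2^32 bytes before their
   first retransmission.

```python
from dataclasses import dataclass

MSS = 1460             # assumed segment size in bytes
MIN_IW = 3 * MSS       # MinIW
MAX_IW = 10 * MSS      # MaxIW
MUL_DECR = 0.5         # MulDecr
ADD_INCR = 2 * MSS     # AddIncr
THRESHOLD = 0.05       # Threshold
INTERVAL = 1000        # connections per evaluation interval
SEQ_MOD = 2 ** 32      # TCP sequence numbers are 32 bits wide


def in_initial_window(isn: int, seqno: int, iw: int) -> bool:
    """True if seqno is less than iw bytes past isn, modulo 2^32."""
    return (seqno - isn) % SEQ_MOD < iw


@dataclass
class Connection:
    iw: int                      # IW in effect when the connection started
    iw_not_checked: bool = True  # IWnotchecked


@dataclass
class IWState:
    iw: int = MAX_IW             # step 0: start at MaxIW
    conncount: int = 0
    losscount: int = 0

    def new_connection(self) -> Connection:
        """Step 1: count the connection and hand it the current IW."""
        self.conncount += 1
        return Connection(iw=self.iw)

    def on_synack_ecn(self, conn: Connection) -> None:
        """Step 2: ECN on the SYN-ACK counts as an IW loss."""
        if conn.iw_not_checked:
            self.losscount += 1
            conn.iw_not_checked = False

    def on_retransmit(self, conn: Connection, isn: int, seqno: int) -> None:
        """Step 3: only the first retransmission is examined."""
        if not conn.iw_not_checked:
            return
        if in_initial_window(isn, seqno, conn.iw):
            self.losscount += 1
        conn.iw_not_checked = False  # counted, or already beyond the IW

    def evaluate(self) -> None:
        """Step 4: periodic AIMD update, run outside packet processing."""
        if self.conncount < INTERVAL:
            return  # too few connections; leave the IW unchanged
        if self.losscount / self.conncount > THRESHOLD:
            new_iw = int(self.iw * MUL_DECR)
        else:
            new_iw = self.iw + ADD_INCR
        # Assumption: keep the result within [MinIW, MaxIW].
        self.iw = max(MIN_IW, min(MAX_IW, new_iw))
        self.conncount = 0
        self.losscount = 0
```

   With these parameters, an interval in which 10% of connections lose
   an IW segment halves the IW from 14,600 to 7,300 bytes, and each
   subsequent clean interval adds 2,920 bytes until MaxIW is reached.
   The sketch does not round the IW to a multiple of 2 MSS, and once
   the IW has reached MinIW the clamp prevents any further decrease.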

   >> An implementation MUST increase by no more than 2 MSS per
   evaluation interval.

   An endpoint that does not experience IW loss needs to probe the
   network incrementally.

   >> An implementation SHOULD use an IW that is an integer multiple of
   2 MSS.

   The IW should remain a multiple of 2 MSS segments, to enable
   efficient ACK compression without incurring unnecessary timeouts.

   >> An implementation MUST decrease the IW if more than 95% of
   connections have IW losses.

   Again, this is to ensure an implementation is sufficiently reactive.

   >> An implementation MAY group IW values and statistics within
   subsets of connections. Such grouping MAY use any information about
   connections to form groups except loss statistics.

   There are some TCP connections which might not be counted at all,
   such as those to/from loopback addresses, or those within the same
   subnet as that of a local interface (for which congestion control is
   sometimes disabled anyway). This may also include connections that
   terminate before the IW is full, i.e., as a separate check at the
   time of the connection closing.

   The period over which the IW is updated is intended to be a long
   timescale, e.g., a month or so, or 1,000 connections, whichever is
   longer. An implementation might check the IW once a month, and
   simply not update the IW or clear the connection counts in months
   where the number of connections is too small.

5. Discussion

   There are numerous parameters to the above algorithm that are
   compliant with the given requirements; this is intended to allow
   variation in configuration and implementation while ensuring that
   all such algorithms are reactive and safe.

   This algorithm continues to assume segments because that is the
   basis of most TCP implementations. It might be useful to consider
   revising the specifications to allow byte-based congestion given
   sufficient experience.
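
   As one example of the implementation latitude described here,
   Section 4 notes that connections to loopback addresses or within
   the subnet of a local interface might not be counted at all. A
   possible filter is sketched below using Python's standard ipaddress
   module. The function name and the idea of passing the local subnets
   in as a list are assumptions made for this illustration; the draft
   does not specify how such connections are identified.

```python
import ipaddress
from typing import Iterable


def counts_toward_iw_stats(remote: str, local_subnets: Iterable[str]) -> bool:
    """Return False for peers the IW statistics might ignore.

    remote is the peer address; local_subnets lists the subnets of the
    host's own interfaces in CIDR form (hypothetical input).
    """
    addr = ipaddress.ip_address(remote)
    if addr.is_loopback:
        return False
    for subnet in local_subnets:
        network = ipaddress.ip_network(subnet)
        # Only compare addresses of the same IP version.
        if addr.version == network.version and addr in network:
            return False
    return True
```

   A caller would consult this check before incrementing the connection
   count, so that excluded connections affect neither the loss count
   nor the connection count.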

   The algorithm checks for IW losses only during the first IW after a
   connection start; it does not check for IW losses elsewhere that the
   IW is used, e.g., during slow-start restarts.

   >> An implementation MAY detect IW losses during slow-start restarts
   in addition to losses during the first IW of a connection. In this
   case, the implementation MUST count each restart as a "connection"
   for the purposes of connection counts and periodic rechecking of the
   IW value.

   False positives can occur during some kinds of segment reordering,
   e.g., reordering that triggers spurious retransmissions even without
   a true segment loss. These are not expected to be sufficiently
   common to dominate the algorithm and its conclusions.

   This mechanism does require additional per-connection state, which
   is already common in some implementations and is useful for other
   reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism
   also benefits from persistent state kept across reboots, as would
   other state sharing mechanisms (e.g., TCP Control Block Sharing
   [RFC2140]). The mechanism is inspired by RFC 2140's use of
   information across connections.

   The receive window (RWIN) is not involved in this calculation. The
   size of RWIN is determined by receiver resources, and provides space
   to accommodate segment reordering. It is not involved with
   congestion control, which is the focus of this document and its
   management of the IW.

6. Observations

   The IW may not converge to a single, global value. It also may not
   converge at all, but rather may oscillate by a few MSS as it
   repeatedly probes the Internet for larger IWs and fails. Both
   properties are consistent with TCP behavior during each individual
   connection.

   This mechanism assumes that losses during the IW are due to IW size.
   Persistent errors that drop packets for other reasons, e.g., OS
   bugs, can cause false positives.
   Again, this is consistent with TCP's basic assumption that loss is
   caused by congestion and requires backoff. This algorithm treats the
   IW of new connections as a long-timescale backoff system.

7. Security Considerations

   This algorithm presents an opportunity for an intelligent attack to
   reduce the IW of a given system, by repeatedly dropping packets
   during the IW only. However, an intermediary that can drop packets
   in a controlled manner can already impact the performance of a
   connection, and can reduce the congestion window of an ongoing
   connection in ways that impact performance more than just dropping
   during the IW.

8. IANA Considerations

   This document has no IANA considerations. This section should be
   removed prior to publication.

9. Conclusions

10. References

10.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
             Initial Window", RFC 3390 (Standards Track), Oct. 2002.

   [RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion
             Control", RFC 5681 (Standards Track), Sep. 2009.

10.2. Informative References

   [Al10]    Allman, M., "Initial Congestion Window Specification",
             (work in progress), draft-allman-tcpm-bump-initcwnd-00,
             Nov. 2010.

   [Ch10]    Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing
             TCP's Initial Window", (work in progress),
             draft-ietf-tcpm-initcwnd-04, Jun. 2012.

   [Ja88]    Jacobson, V., Karels, M., "Congestion Avoidance and
             Control", Proc. Sigcomm 1988.

   [RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions for
             High Performance", RFC 1323, May 1992.

   [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
             Retransmit, and Fast Recovery Algorithms", RFC 2001
             (Standards Track), Jan. 1997.

   [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140
             (Informational), Apr. 1997.

   [RFC2414] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
             Initial Window", RFC 2414 (Experimental), Sept. 1998.

   [RFC2581] Allman, M., Paxson, V., Stevens, W., "TCP Congestion
             Control", RFC 2581 (Standards Track), Apr. 1999.

   [RFC2861] Handley, M., Padhye, J., Floyd, S., "TCP Congestion Window
             Validation", RFC 2861 (Experimental), June 2000.

   [RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication
             Option", RFC 5925 (Standards Track), June 2010.

11. Acknowledgments

   Mark Allman and Aki Nyrjinen contributed to the development of this
   algorithm. Members of the TCPM mailing list also participated in
   providing useful feedback.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Joe Touch
   USC/ISI
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695 U.S.A.

   Phone: +1 (310) 448-9151
   Email: touch@isi.edu