idnits 2.17.1

draft-ietf-tcpm-hystartplusplus-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (25 July 2021) is 999 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC1191' is mentioned on line 103, but not defined

  == Missing Reference: 'RFC4821' is mentioned on line 103, but not defined

  == Missing Reference: 'RFC1122' is mentioned on line 111, but not defined

  ** Downref: Normative reference to an Experimental RFC: RFC 3465

  -- Obsolete informational reference (is this intentional?): RFC 8312
     (Obsoleted by RFC 9438)

     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                 P. Balasubramanian
3    Internet-Draft                                                  Y. Huang
4    Intended status: Standards Track                                M. Olson
5    Expires: 26 January 2022                                       Microsoft
6                                                                25 July 2021

8                     HyStart++: Modified Slow Start for TCP
9                       draft-ietf-tcpm-hystartplusplus-03

11   Abstract

13      This document describes HyStart++, a simple modification to the slow
14      start phase of TCP congestion control algorithms.  Traditional slow
15      start can overshoot the ideal send rate and cause large
16      packet loss within a round-trip time, which results in poor
17      performance.  HyStart++ uses a delay increase heuristic to exit slow
18      start early while also mitigating poor performance which can result
19      from false positives.

21   Status of This Memo

23      This Internet-Draft is submitted in full conformance with the
24      provisions of BCP 78 and BCP 79.

26      Internet-Drafts are working documents of the Internet Engineering
27      Task Force (IETF).  Note that other groups may also distribute
28      working documents as Internet-Drafts.  The list of current Internet-
29      Drafts is at https://datatracker.ietf.org/drafts/current/.

31      Internet-Drafts are draft documents valid for a maximum of six months
32      and may be updated, replaced, or obsoleted by other documents at any
33      time.  It is inappropriate to use Internet-Drafts as reference
34      material or to cite them other than as "work in progress."

36      This Internet-Draft will expire on 26 January 2022.

38   Copyright Notice

40      Copyright (c) 2021 IETF Trust and the persons identified as the
41      document authors.  All rights reserved.

43      This document is subject to BCP 78 and the IETF Trust's Legal
44      Provisions Relating to IETF Documents (https://trustee.ietf.org/
45      license-info) in effect on the date of publication of this document.
46      Please review these documents carefully, as they describe your rights
47      and restrictions with respect to this document.  Code Components
48      extracted from this document must include Simplified BSD License text
49      as described in Section 4.e of the Trust Legal Provisions and are
50      provided without warranty as described in the Simplified BSD License.
52   Table of Contents

54      1.  Introduction
55      2.  Terminology
56      3.  Definitions
57      4.  HyStart++ Algorithm
58        4.1.  Summary
59        4.2.  Algorithm Details
60        4.3.  Tuning constants
61      5.  Deployments and Performance Evaluations
62      6.  Security Considerations
63      7.  IANA Considerations
64      8.  References
65        8.1.  Normative References
66        8.2.  Informative References
67      Authors' Addresses

69   1.  Introduction

71      [RFC5681] describes the slow start congestion control algorithm for
72      TCP.  The slow start algorithm is used when the congestion window
73      (cwnd) is less than the slow start threshold (ssthresh).  During slow
74      start, in the absence of packet loss signals, TCP increases cwnd
75      exponentially to probe the network capacity.  This fast growth can
76      overshoot the ideal sending rate and cause significant packet loss
77      which cannot always be recovered efficiently, impairing flow
78      completion time.

80      HyStart++ first uses delay increase as a signal to exit slow start
81      before any packet loss occurs.  This is one of two algorithms
82      specified in [HyStart].  After the HyStart delay algorithm finds an
83      exit point, a novel Conservative Slow Start (CSS) phase is used to
84      determine whether the slow start exit was spurious.  This provides
85      protection against jitter and prevents performance problems that
86      result from early slow start exit due to false positives.  HyStart++
87      reduces packet loss and retransmissions, and improves goodput in lab
88      measurements as well as real world deployments.

90   2.  Terminology

92      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
93      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
94      document are to be interpreted as described in [RFC2119].

96   3.  Definitions

98      We repeat here some definitions from [RFC5681] to aid the reader.

100     SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
101     largest segment that the sender can transmit.  This value can be
102     based on the maximum transmission unit of the network, the path MTU
103     discovery [RFC1191, RFC4821] algorithm, RMSS (see next item), or
104     other factors.  The size does not include the TCP/IP headers and
105     options.

107     RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the
108     largest segment the receiver is willing to accept.  This is the value
109     specified in the MSS option sent by the receiver during connection
110     startup.  Or, if the MSS option is not used, it is 536 bytes
111     [RFC1122].  The size does not include the TCP/IP headers and options.

113     RECEIVER WINDOW (rwnd): The most recently advertised receiver window.

115     CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount
116     of data a TCP can send.  At any given time, a TCP MUST NOT send data
117     with a sequence number higher than the sum of the highest
118     acknowledged sequence number and the minimum of cwnd and rwnd.

120  4.  HyStart++ Algorithm

122  4.1.  Summary

124     [HyStart] specifies two algorithms (a "Delay Increase" algorithm and
125     an "Inter-Packet Arrival" algorithm) to be run in parallel to detect
126     that the sending rate has reached capacity.  In practice, the Inter-
127     Packet Arrival algorithm does not perform well and is not able to
128     detect congestion early, primarily due to ACK compression.  The idea
129     of the Delay Increase algorithm is to look for RTT spikes, which
130     suggest that the bottleneck buffer is filling up.

132     In HyStart++, a TCP sender uses traditional slow start and then uses
133     the "Delay Increase" algorithm to trigger an exit from slow start.
134     But instead of going straight from slow start to congestion
135     avoidance, the sender spends a number of RTTs in a Conservative Slow
136     Start (CSS) phase to determine whether the exit was spurious.  During
137     CSS, the congestion window is grown exponentially as in regular
138     slow start, but with a smaller exponential base, resulting in less
139     aggressive growth.  If the RTT shrinks at any time during CSS, it is
140     concluded that the RTT spike was not related to congestion caused by
141     the connection sending too fast (i.e., the exit was spurious), and the
142     connection resumes slow start.  If the RTT inflation persists
143     throughout CSS, the connection enters congestion avoidance.

145  4.2.  Algorithm Details

147     We assume that Appropriate Byte Counting (as described in [RFC3465])
148     is in use and L is the cwnd increase limit as discussed in RFC 3465.

150     A round is chosen to be approximately the Round-Trip Time (RTT).  We
151     recommend that rounds be measured using sequence numbers.  A round can
152     be approximated using sequence numbers as follows:

154        Define windowEnd as a sequence number initialized to SND.UNA

156        When windowEnd is ACKed, the current round ends and windowEnd is
157        set to SND.NXT

159     At the start of each round during standard slow start ([RFC5681]) and
160     CSS:

162        lastRoundMinRTT = currentRoundMinRTT

164        currentRoundMinRTT = infinity

166        rttSampleCount = 0

168     For each arriving ACK in slow start, where N is the number of
169     previously unacknowledged bytes acknowledged in the arriving ACK:

171        Update the cwnd

173        -  cwnd = cwnd + min (N, L * SMSS)

175        Keep track of minimum observed RTT

177        -  currentRoundMinRTT = min(currentRoundMinRTT, currRTT)

179        -  where currRTT is the RTT sampled from the latest incoming ACK

181        -  rttSampleCount += 1

183        For rounds where cwnd is at or above LOW_CWND * SMSS and
184        N_RTT_SAMPLE RTT samples have been obtained, check if delay
185        increase triggers slow start exit
186        -  if (cwnd >= (LOW_CWND * SMSS) AND rttSampleCount >=
187           N_RTT_SAMPLE)

189           o  RttThresh = clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8,
190              MAX_RTT_THRESH)

192           o  if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh))

194              +  cssBaselineMinRtt = currentRoundMinRTT

196              +  exit slow start and enter CSS

198     CSS lasts at most CSS_ROUNDS rounds.  If the transition into CSS
199     happens in the middle of a round, that partial round counts towards
200     the limit.

202     For each arriving ACK in CSS, where N is the number of previously
203     unacknowledged bytes acknowledged in the arriving ACK:

205        Update the cwnd

207        -  cwnd = cwnd + (min (N, L * SMSS) / CSS_GROWTH_DIVISOR)

209        Keep track of minimum observed RTT

211        -  currentRoundMinRTT = min(currentRoundMinRTT, currRTT)

213        -  where currRTT is the sampled RTT from the incoming ACK

215        -  rttSampleCount += 1

217        For CSS rounds where N_RTT_SAMPLE RTT samples have been obtained,
218        check if the current round's minRTT drops below the baseline,
219        indicating that the HyStart exit was spurious.
221        -  if (currentRoundMinRTT < cssBaselineMinRtt)

223           o  cssBaselineMinRtt = infinity

225           o  resume slow start including HyStart++

227     If CSS_ROUNDS rounds are complete, enter congestion avoidance.

229     *  ssthresh = cwnd

231     If loss or ECN-marking is observed anytime during standard slow start
232     or CSS, enter congestion avoidance.

234     *  ssthresh = cwnd

236  4.3.  Tuning constants

238     It is RECOMMENDED that a HyStart++ implementation use the following
239     constants:

241     *  LOW_CWND = 16

243     *  MIN_RTT_THRESH = 4 msec

245     *  MAX_RTT_THRESH = 16 msec

247     *  N_RTT_SAMPLE = 8

249     *  CSS_GROWTH_DIVISOR = 4

251     *  CSS_ROUNDS = 5

253     These constants have been determined with lab measurements and real
254     world deployments.  An implementation MAY tune them for different
255     network characteristics.

257     Using smaller values of LOW_CWND will cause the algorithm to kick in
258     before the last round RTT can be measured, particularly if the
259     implementation uses an initial cwnd of 10 MSS.  Higher values will
260     delay the detection of delay increase and reduce the ability of
261     HyStart++ to prevent overshoot problems.

263     The delay increase sensitivity is determined by MIN_RTT_THRESH and
264     MAX_RTT_THRESH.  Smaller values of MIN_RTT_THRESH may cause spurious
265     exits from slow start.  Larger values of MAX_RTT_THRESH may result in
266     slow start not exiting until loss is encountered for connections on
267     large RTT paths.

269     A TCP implementation is required to take at least one RTT sample each
270     round.  Using lower values of N_RTT_SAMPLE will lower the accuracy of
271     the measured RTT for the round; higher values will improve accuracy
272     at the cost of more processing.

274     CSS_GROWTH_DIVISOR MUST be at least 2.  A value
275     of 1 results in the same aggressive behavior as regular slow start.
276     Values larger than 4 will cause the algorithm to be less aggressive
277     and may be less performant.
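
   As an illustration only, the per-ACK processing of Section 4.2 with
   the constants recommended above might be sketched in Python as
   follows.  This is not part of the specification.  The class name
   HyStartPP, its method names, the SMSS of 1460 bytes, L = 2, the
   initial window of 10 segments, the use of milliseconds for RTT
   values, and the integer division in CSS are all assumptions made for
   the example.  Round tracking via windowEnd is left to the caller,
   which is assumed to call on_round_end() when windowEnd is ACKed.

```python
import math
from dataclasses import dataclass

# Constants recommended in Section 4.3 (RTT thresholds in milliseconds).
LOW_CWND = 16
MIN_RTT_THRESH = 4.0
MAX_RTT_THRESH = 16.0
N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5

SLOW_START, CSS, CONG_AVOID = "slow start", "CSS", "congestion avoidance"


@dataclass
class HyStartPP:
    smss: int = 1460                 # assumed SMSS in bytes
    abc_limit: int = 2               # assumed value of L from RFC 3465
    cwnd: int = 10 * 1460            # assumed initial window of 10 segments
    ssthresh: float = math.inf
    phase: str = SLOW_START
    last_round_min_rtt: float = math.inf
    current_round_min_rtt: float = math.inf
    css_baseline_min_rtt: float = math.inf
    rtt_sample_count: int = 0
    css_rounds: int = 0

    def on_round_end(self) -> None:
        """Call when windowEnd is ACKed (round tracking is the caller's job)."""
        if self.phase == CSS:
            self.css_rounds += 1     # a partial first round counts too
            if self.css_rounds >= CSS_ROUNDS:
                self.enter_congestion_avoidance()
                return
        self.last_round_min_rtt = self.current_round_min_rtt
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0

    def on_ack(self, acked_bytes: int, rtt_ms: float) -> None:
        """Process one ACK of acked_bytes new bytes with RTT sample rtt_ms."""
        if self.phase == CONG_AVOID:
            return                   # congestion avoidance is out of scope
        increase = min(acked_bytes, self.abc_limit * self.smss)
        if self.phase == CSS:
            increase //= CSS_GROWTH_DIVISOR   # integer division is our choice
        self.cwnd += increase
        self.current_round_min_rtt = min(self.current_round_min_rtt, rtt_ms)
        self.rtt_sample_count += 1
        if self.rtt_sample_count < N_RTT_SAMPLE:
            return
        if self.phase == SLOW_START:
            if self.cwnd < LOW_CWND * self.smss:
                return
            # clamp(MIN_RTT_THRESH, lastRoundMinRTT / 8, MAX_RTT_THRESH)
            rtt_thresh = min(max(self.last_round_min_rtt / 8, MIN_RTT_THRESH),
                             MAX_RTT_THRESH)
            if self.current_round_min_rtt >= self.last_round_min_rtt + rtt_thresh:
                self.css_baseline_min_rtt = self.current_round_min_rtt
                self.css_rounds = 0
                self.phase = CSS
        elif self.current_round_min_rtt < self.css_baseline_min_rtt:
            self.css_baseline_min_rtt = math.inf   # the exit was spurious
            self.phase = SLOW_START

    def enter_congestion_avoidance(self) -> None:
        """Also meant to be called on loss or ECN marking."""
        self.ssthresh = self.cwnd
        self.phase = CONG_AVOID
```

   As a worked example of the threshold arithmetic: with
   lastRoundMinRTT = 40 msec, RttThresh = clamp(4, 40 / 8, 16) = 5 msec,
   so the sketch leaves slow start once a round with at least
   N_RTT_SAMPLE samples has a minimum RTT of 45 msec or more.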
279     Smaller values of CSS_ROUNDS may miss detecting jitter and larger
280     values may limit performance.

282     An implementation SHOULD use HyStart++ only for the initial slow
283     start (when ssthresh is at its initial, arbitrarily high value per
284     [RFC5681]) and fall back to using traditional slow start for the
285     remainder of the connection lifetime.  This is acceptable because
286     subsequent slow starts will use the discovered ssthresh value to exit
287     slow start and avoid the overshoot problem.  An implementation MAY
288     use HyStart++ to grow the restart window ([RFC5681]) after a long
289     idle period.

291  5.  Deployments and Performance Evaluations

293     At the time of writing, HyStart++ draft 01 has been enabled by default
294     for all TCP connections in Windows for two years.  The original HyStart
295     has been enabled by default for all TCP connections using the default
296     congestion control module CUBIC ([RFC8312]) for a decade.

298     In lab measurements with Windows TCP, HyStart++ shows both goodput
299     improvements as well as reductions in packet loss and
300     retransmissions.  For example, across a variety of tests on a 100 Mbps
301     link with a bottleneck buffer sized to the bandwidth-delay product,
302     HyStart++ reduces bytes retransmitted by 50% and retransmission
303     timeouts by 36%.

305     In an A/B test for HyStart++ draft 01 across a large Windows device
306     population, out of 52 billion TCP connections, 0.7% of connections
307     move from 1 RTO to 0 RTOs and another 0.7% of connections move from 2
308     RTOs to 1 RTO with HyStart++.  This test did not focus on send-heavy
309     connections and the impact on send-heavy connections is likely much
310     higher.  We plan to conduct more such production experiments to
311     gather more data in the future.

313  6.  Security Considerations

315     HyStart++ enhances slow start and inherits the general security
316     considerations discussed in [RFC5681].

318  7.  IANA Considerations

320     This document has no actions for IANA.

322  8.  References

324  8.1.  Normative References

326     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
327                Requirement Levels", BCP 14, RFC 2119,
328                DOI 10.17487/RFC2119, March 1997.

331     [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte
332                Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
333                2003.

335     [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
336                Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

339  8.2.  Informative References

341     [HyStart]  Ha, S. and I. Rhee, "Hybrid Slow Start for High-Bandwidth
342                and Long-Distance Networks",
343                DOI 10.1145/1851182.1851192, International Workshop on
344                Protocols for Fast Long-Distance Networks, 2008.

348     [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
349                R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
350                RFC 8312, DOI 10.17487/RFC8312, February 2018.

353  Authors' Addresses

355     Praveen Balasubramanian
356     Microsoft
357     One Microsoft Way
358     Redmond, WA 98052
359     United States of America

361     Phone: +1 425 538 2782
362     Email: pravb@microsoft.com

364     Yi Huang
365     Microsoft

367     Phone: +1 425 703 0447
368     Email: huanyi@microsoft.com

370     Matt Olson
371     Microsoft

373     Phone: +1 425 538 8598
374     Email: maolson@microsoft.com