TCPM Working Group                                          G. Fairhurst
Internet-Draft                                           A. Sathiaseelan
Obsoletes: 2861 (if approved)                                         R.
                                                                  Secchi
Intended status: Experimental                     University of Aberdeen
Expires: November 23, 2015                                  May 22, 2015

              Updating TCP to support Rate-Limited Traffic
                       draft-ietf-tcpm-newcwv-11

Abstract

   This document provides a mechanism to address issues that arise when
   TCP is used to support traffic that exhibits periods where the
   sending rate is limited by the application rather than the congestion
   window. It provides an experimental update to TCP that allows a TCP
   sender to restart quickly following a rate-limited interval. This
   method is expected to benefit applications that send rate-limited
   traffic using TCP, while also providing an appropriate response if
   congestion is experienced.

   It also evaluates the Experimental specification of TCP Congestion
   Window Validation, CWV, defined in RFC 2861, and concludes that RFC
   2861 sought to address important issues, but failed to deliver a
   widely used solution. This document therefore recommends that the
   status of RFC 2861 is moved from Experimental to Historic, and that
   it is replaced by the current specification.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 23, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Implementation of new CWV
     1.2. Standards Status of this Document
   2. Reviewing experience with TCP-CWV
   3. Terminology
   4. A New Congestion Window Validation method
     4.1. Initialisation
     4.2. Estimating the validated capacity supported by a path
     4.3. Preserving cwnd during a rate-limited period.
     4.4. TCP congestion control during the non-validated phase
       4.4.1. Response to congestion in the non-validated phase
       4.4.2. Sender burst control during the non-validated phase
       4.4.3. Adjustment at the end of the Non-Validated Period (NVP)
     4.5. Examples of Implementation
       4.5.1. Implementing the pipeACK measurement
       4.5.2. Implementing detection of the cwnd-limited condition
   5. Determining a safe period to preserve cwnd
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgments
   9. Author Notes
     9.1. Other related work
   10. Revision notes
   11. References
     11.1. Normative References
     11.2. Informative References
   Authors' Addresses

1. Introduction

   TCP is used for traffic with a range of application behaviours. The
   TCP congestion window (cwnd) controls the number of unacknowledged
   packets/bytes that a TCP flow may have in the network at any time, a
   value known as the FlightSize [RFC5681]. FlightSize is a measure of
   the volume of data that is in flight. A bulk application will always
   have data available to transmit. The rate at which it sends is
   therefore limited by the maximum permitted by the receiver advertised
   window and the sender congestion window (cwnd). The FlightSize of a
   bulk flow increases with the cwnd, and tracks the volume of data
   acknowledged in the last Round Trip Time (RTT).

   In contrast, a rate-limited application will experience periods when
   the sender is either idle or is unable to send at the maximum rate
   permitted by the cwnd. In this case, the volume of data sent
   (FlightSize) can change significantly from one RTT to another, and
   can be much less than the cwnd. Hence, it is possible that the
   FlightSize could significantly exceed the recently used capacity.
   The update in this document targets the operation of TCP in such
   rate-limited cases.

   Standard TCP [RFC5681] states that a TCP sender SHOULD set cwnd to no
   more than the Restart Window (RW) before beginning transmission, if
   the TCP sender has not sent data in an interval exceeding the
   retransmission timeout, i.e., when an application becomes idle.
   [RFC2861] noted that this TCP behaviour was not always observed in
   current implementations. Experiments [Bis08] confirm this to still
   be the case.

   Congestion Window Validation, CWV, introduced the terminology of
   "application limited periods". [RFC2861] describes any time that an
   application limits the sending rate, rather than being limited by the
   transport, as "rate-limited". This update improves support for
   applications that vary their transmission rate, either with (short)
   idle periods between transmission or by changing the rate at which
   the application sends. These applications are characterised by the
   TCP FlightSize often being less than cwnd. Many Internet
   applications exhibit this behaviour, including web browsing,
   HTTP-based adaptive streaming, applications that support
   query/response type protocols, network file sharing, and live video
   transmission. Many such applications currently avoid using
   long-lived (persistent) TCP connections (e.g., [RFC2616] servers
   typically support persistent HTTP connections, but do not enable
   this by default). Such applications often instead either use a
   succession of short TCP transfers or use UDP.

   Standard TCP does not impose additional restrictions on the growth of
   the congestion window when a TCP sender is unable to send at the
   maximum rate allowed by the cwnd. In this case, the rate-limited
   sender may grow a cwnd far beyond that corresponding to the current
   transmit rate, resulting in a value that does not reflect current
   information about the state of the network path the flow is using.
   Use of such an invalid cwnd may result in reduced application
   performance and/or could significantly contribute to network
   congestion.

   [RFC2861] proposed a solution to these issues in an experimental
   method known as CWV.
   CWV was intended to help reduce cases where TCP accumulated an
   invalid (inappropriately large) cwnd. The use and drawbacks of
   using the CWV algorithm in RFC 2861 with an application are
   discussed in Section 2.

   Section 3 defines relevant terminology.

   Section 4 specifies an alternative to CWV that seeks to address the
   same issues, but does so in a way that is expected to mitigate the
   impact on an application that varies its sending rate. The updated
   method applies to the rate-limited conditions (including both
   application-limited and idle senders).

   The goals of this update are:

   o  To not change the behaviour of a TCP sender that performs bulk
      transfers that fully use the cwnd.

   o  To provide a method that co-exists with Standard TCP and other
      flows that use this updated method.

   o  To reduce transfer latency for applications that change their rate
      over short intervals of time.

   o  To avoid a TCP sender growing a large "non-validated" cwnd, when
      it has not recently sent using this cwnd.

   o  To remove the incentive for ad-hoc application or network stack
      methods (such as "padding") solely to maintain a large cwnd for
      future transmission.

   o  To incentivise the use of long-lived connections, rather than a
      succession of short-lived flows, benefiting both the flows and
      other flows sharing the network path when actual congestion is
      encountered.

   Section 5 describes the rationale for selecting the safe period to
   preserve the cwnd.

1.1. Implementation of new CWV

   The method specified in Section 4 of this document is a sender-side
   only change to the TCP congestion control behaviour.

   The method creates a new protocol state, and requires a sender to
   determine when the cwnd is validated or non-validated to control the
   entry and exit from this state (Section 4.3).
   It defines how a TCP sender manages the growth of the cwnd using the
   set of rules defined in Section 4.

   Implementation of this specification requires an implementor to
   define a method to measure the available capacity using the pipeACK
   samples. The details of this measurement are
   implementation-specific. An example is provided in Section 4.5.1,
   but other methods are permitted. A sender also needs to provide a
   method to determine when it becomes cwnd-limited. Implementation of
   this may require consideration of other TCP methods (see
   Section 4.5.2).

   A sender is also recommended to provide a method that controls the
   maximum burst size (Section 4.4.2). However, implementors are
   allowed flexibility in how this method is implemented and the choice
   of an appropriate method is expected to depend on the way in which
   the sender stack implements other TCP methods (such as TCP Segment
   Offload, TSO).

1.2. Standards Status of this Document

   The document updates and obsoletes the methods described in
   [RFC2861]. It recommends a set of mechanisms, including the use of
   pacing during a non-validated period. The updated mechanisms are
   intended to have a less aggressive congestion impact than would be
   exhibited by a standard TCP sender.

   The specification in this draft is classified as "Experimental"
   pending experience with deployed implementations of the methods.

2. Reviewing experience with TCP-CWV

   [RFC2861] described a simple modification to the TCP congestion
   control algorithm that decayed the cwnd after the transition to a
   "sufficiently-long" idle period. This used the slow-start threshold
   (ssthresh) to save information about the previous value of the
   congestion window. The approach relaxed the standard TCP behaviour
   [RFC5681] for an idle session, intended to improve application
   performance.
   CWV also modified the behaviour when a sender transmitted at a rate
   less than allowed by cwnd.

   [RFC2861] proposed two sets of responses, one after an
   "application-limited" period and one after an "idle period".
   Although this distinction was argued, in practice differentiating
   the two conditions was found problematic in actual networks (e.g.,
   [Bis10]). While CWV offers predictable performance for long on-off
   periods (>>1 RTT), or slowly varying rate-based traffic, the
   performance could be unpredictable for variable-rate traffic and
   depended both upon whether an accurate RTT had been obtained and the
   pattern of application traffic relative to the measured RTT.

   Many applications can and often do vary their transmission over a
   wide range of rates. Using [RFC2861] such applications often
   experienced varying performance, which made it hard for application
   developers to predict the TCP latency even when using a path with
   stable network characteristics. We argue that an attempt to classify
   application behaviour as application-limited or idle is problematic
   and also inappropriate. This document therefore explicitly avoids
   trying to differentiate these two cases, instead treating all
   rate-limited traffic uniformly.

   [RFC2861] has been implemented in some mainstream operating systems
   as the default behaviour [Bis08]. Analysis (e.g., [Bis10] [Fai12])
   has shown that a TCP sender using CWV is able to use available
   capacity on a shared path after an idle period. This can benefit
   variable-rate applications, especially over long delay paths, when
   compared to the slow-start restart specified by standard TCP.
   However, CWV would only benefit an application if the idle period
   were less than several Retransmission Time Out (RTO) intervals
   [RFC6298], since the behaviour would otherwise be the same as for
   standard TCP, which resets the cwnd to the TCP Restart Window after
   this period.

   To enable better performance for variable-rate applications with TCP,
   some operating systems have chosen to support non-standard methods,
   or applications have resorted to "padding" streams by sending dummy
   data to maintain their sending rate when they have no data to
   transmit. Although transmitting redundant data across a network path
   provides good evidence that the path can sustain data at the offered
   rate, padding also consumes network capacity and reduces the
   opportunity for congestion-free statistical multiplexing. For
   variable-rate flows, the benefits of statistical multiplexing can be
   significant and it is therefore a goal to find a viable alternative
   to padding streams.

   Experience with [RFC2861] suggests that although the CWV method
   benefited the network in a rate-limited scenario (reducing the
   probability of network congestion), the behaviour was too
   conservative for many common rate-limited applications. This
   mechanism did not therefore offer the desirable increase in
   application performance for rate-limited applications and it is
   unclear whether applications actually use this mechanism in the
   general Internet.

   It is therefore concluded that CWV, as defined in [RFC2861], was
   often a poor solution for many rate-limited applications. It had the
   correct motivation, but had the wrong approach to solving this
   problem.

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The document assumes familiarity with the terminology of TCP
   congestion control [RFC5681].

   The following additional terminology is used in this document:

   cwnd-limited: A TCP flow that has sent the maximum number of segments
   permitted by the cwnd, where the application utilises the allowed
   sending rate (see Section 4.5.2).

   pipeACK sample: A measure of the volume of data acknowledged by the
   network within an RTT.

   pipeACK variable: A variable that measures the available capacity
   using the set of pipeACK samples.

   pipeACK Sampling Period: The maximum period that a measured pipeACK
   sample may influence the pipeACK variable.

   Non-validated phase: The phase where the cwnd reflects a previous
   measurement of the available path capacity.

   Non-validated period, NVP: The maximum period for which cwnd is
   preserved in the non-validated phase.

   Rate-limited: A TCP flow that does not consume more than one half of
   cwnd, and hence operates in the non-validated phase. This includes
   periods when an application is either idle or chooses to send at a
   rate less than the maximum permitted by the cwnd.

   Validated phase: The phase where the cwnd reflects a current estimate
   of the available path capacity.

4. A New Congestion Window Validation method

   This section proposes an update to the TCP congestion control
   behaviour during a rate-limited interval. This new method
   intentionally does not differentiate between times when the sender
   has become idle or chooses to send at a rate less than the maximum
   allowed by the cwnd.

   The period where actual usage is less than allowed by cwnd is named
   the non-validated phase. The update allows an application in the
   non-validated phase to resume transmission at a previous rate without
   incurring the delay of slow-start.
   However, if the TCP sender experiences congestion using the
   preserved cwnd, it is required to immediately reset the cwnd to an
   appropriate value specified by the method. If a sender does not take
   advantage of the preserved cwnd within the Non-validated period,
   NVP, the value of cwnd is reduced, ensuring the value better
   reflects the capacity that was recently actually used.

   It is expected that this update will satisfy the requirements of many
   rate-limited applications and at the same time provide an appropriate
   method for use in the Internet. New-CWV reduces the incentive for
   an application to send "padding" data simply to keep transport
   congestion state.

   The method is specified in the following subsections and is expected
   to encourage applications and TCP stacks to use standards-based
   congestion control methods. It may also encourage the use of
   long-lived connections where this offers benefit (such as persistent
   HTTP).

4.1. Initialisation

   A sender starts a TCP connection in the validated phase and
   initialises the pipeACK variable to the "undefined" value. This
   value inhibits use of the value in cwnd calculations.

4.2. Estimating the validated capacity supported by a path

   [RFC6675] defines a variable, FlightSize, that indicates the
   instantaneous amount of data that has been sent, but not cumulatively
   acknowledged. In this method a new variable "pipeACK" is introduced
   to measure the acknowledged size of the network pipe. This is used
   to determine if the sender has validated the cwnd. pipeACK differs
   from FlightSize in that it is evaluated over a window of acknowledged
   data, rather than reflecting the amount of data outstanding.

   A sender determines a pipeACK sample by measuring the volume of data
   that was acknowledged by the network over the period of a measured
   Round Trip Time (RTT).
   Using the variables defined in [RFC6675], a value could be measured
   by caching the value of HighACK and after one RTT measuring the
   difference between the cached HighACK value and the current HighACK
   value. A sender MAY count TCP DupACKs that acknowledge new data
   when collecting the pipeACK sample. Other equivalent methods may be
   used.

   A sender is not required to continuously update the pipeACK variable
   after each received ACK, but SHOULD perform a pipeACK sample at least
   once per RTT when it has sent unacknowledged segments.

   The pipeACK variable MAY consider multiple pipeACK samples over the
   pipeACK Sampling Period. The value of the pipeACK variable MUST NOT
   exceed the maximum (highest value) within the sampling period. This
   specification defines the pipeACK Sampling Period as Max(3*RTT, 1
   second). This period enables a sender to compensate for large
   fluctuations in the sending rate, where there may be pauses in
   transmission, and allows the pipeACK variable to reflect the largest
   recently measured pipeACK sample.

   When no measurements are available (e.g., a sender that has been idle
   will have sent no data and received no ACKs), the pipeACK variable is
   set to the "undefined" value. This value is used to inhibit entering
   the non-validated phase until the first new measurement of a pipeACK
   sample. (Section 4.5 provides examples of implementation.)

   The pipeACK variable MUST NOT be updated during TCP Fast Recovery.
   That is, the sender stops collecting pipeACK samples during loss
   recovery. The method RECOMMENDS that the TCP SACK option [RFC2018]
   is enabled and the method defined in [RFC6675] is used to recover
   missing segments. This allows the sender to more accurately
   determine the number of missing bytes during the loss recovery phase,
   and using this method will result in a more appropriate cwnd
   following loss.

4.3.
Preserving cwnd during a rate-limited period.

   The updated method creates a new TCP sender phase that captures
   whether the cwnd reflects a validated or non-validated value. The
   phases are defined as:

   o  Validated phase: pipeACK >= (1/2)*cwnd, or pipeACK is undefined.
      This is the normal phase, where cwnd is expected to be an
      approximate indication of the capacity currently available along
      the network path, and the standard methods are used to increase
      cwnd (currently [RFC5681]).

   o  Non-validated phase: pipeACK < (1/2)*cwnd. This is the phase
      where the cwnd has a value based on a previous measurement of the
      available capacity, and the usage of this capacity has not been
      validated in the pipeACK Sampling Period. That is, when it is not
      known whether the cwnd reflects the currently available capacity
      along the network path. The mechanisms to be used in this phase
      seek to determine a safe value for cwnd and an appropriate
      reaction to congestion.

   Note: A threshold is needed to determine whether a sender is in the
   validated or non-validated phase. A standard TCP sender in
   slow-start is permitted to double its FlightSize from one RTT to the
   next. This motivated the choice of a threshold value of 1/2. This
   threshold ensures a sender does not further increase the cwnd as long
   as the FlightSize is less than (1/2*cwnd). Furthermore, a sender
   with a FlightSize less than (1/2*cwnd) may in the next RTT be
   permitted by the cwnd to send at a rate that more than doubles the
   FlightSize, and hence this case needs to be regarded as non-validated
   and a sender therefore needs to employ additional mechanisms while in
   this phase.

4.4. TCP congestion control during the non-validated phase

   A TCP sender implementing this specification MUST enter the
   non-validated phase when the pipeACK is less than (1/2)*cwnd.
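
   The phase test defined in Section 4.3 can be sketched as follows.
   This is a minimal illustration, not part of the specification: the
   function name is_validated is hypothetical, the "undefined" pipeACK
   value is represented by None, and both quantities are assumed to be
   byte counts.

```python
from typing import Optional


def is_validated(pipe_ack: Optional[int], cwnd: int) -> bool:
    """Return True for the validated phase, False for the non-validated phase.

    pipe_ack is the pipeACK variable in bytes; None stands for the
    "undefined" value (an assumption of this sketch).
    """
    if pipe_ack is None:
        # An undefined pipeACK inhibits entry to the non-validated phase.
        return True
    # pipeACK >= (1/2)*cwnd, written without division to stay in integers.
    return 2 * pipe_ack >= cwnd
```

   For example, with cwnd = 10000 bytes, a pipeACK of 5000 bytes is
   still in the validated phase, whereas 4999 bytes is not.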

   A TCP sender that enters the non-validated phase preserves the cwnd
   (i.e., the cwnd only increases after a sender fully uses the cwnd in
   this phase, otherwise the cwnd neither grows nor reduces). The phase
   is concluded when the sender transmits sufficient data so that
   pipeACK > (1/2)*cwnd (i.e., the sender is no longer rate-limited), or
   when the sender receives an indication of congestion.

   After a fixed period of time (the non-validated period, NVP), the
   sender adjusts the cwnd (Section 4.4.3). The NVP SHOULD NOT exceed 5
   minutes. Section 5 discusses the rationale for choosing a safe value
   for this period.

   The behaviour in the non-validated phase is specified as:

   o  A sender determines whether to increase the cwnd based upon
      whether it is cwnd-limited (see Section 4.5.2):

      *  A sender that is cwnd-limited MAY use the standard TCP method
         to increase cwnd (i.e., a TCP sender that fully utilises the
         cwnd is permitted to increase cwnd for each received ACK using
         standard methods).

      *  A sender that is not cwnd-limited MUST NOT increase the cwnd
         when ACK packets are received in this phase (i.e., needs to
         avoid growing the cwnd when it has not recently sent using the
         current size of cwnd).

   o  If the sender receives an indication of congestion while in the
      non-validated phase (i.e., detects loss), the sender MUST exit the
      non-validated phase (reducing the cwnd as defined in
      Section 4.4.1).

   o  If the Retransmission Time Out (RTO) expires while in the
      non-validated phase, the sender MUST exit the non-validated phase.
      It then resumes using the standard TCP RTO mechanism [RFC5681].

   o  A sender with a pipeACK variable greater than (1/2)*cwnd SHOULD
      enter the validated phase. (A rate-limited sender will not
      normally be impacted by whether it is in a validated or
      non-validated phase, since it will normally not increase
      FlightSize to use the entire cwnd.
      However, a change to the validated phase will release the sender
      from constraints on the growth of cwnd, and result in using the
      standard congestion response.)

   The cwnd-limited behaviour may be triggered during a transient
   condition that occurs when a sender is in the non-validated phase and
   receives an ACK that acknowledges received data, the cwnd was fully
   utilised, and more data is awaiting transmission than may be sent
   with the current cwnd. The sender MAY then use the standard method
   to increase the cwnd. (Note, if the sender succeeds in sending these
   new segments, the updated cwnd and pipeACK variables will eventually
   result in a transition to the validated phase.)

4.4.1. Response to congestion in the non-validated phase

   Reception of congestion feedback while in the non-validated phase is
   interpreted as an indication that it was inappropriate for the sender
   to use the preserved cwnd. The sender is therefore required to
   quickly reduce the rate to avoid further congestion. Since the cwnd
   does not have a validated value, a new cwnd value needs to be
   selected based on the utilised rate.

   A sender that detects a packet-drop MUST record the current
   FlightSize in the variable LossFlightSize and MUST calculate a safe
   cwnd for loss recovery using the method below:

      cwnd = (Max(pipeACK,LossFlightSize))/2.

   The pipeACK value is not updated during loss recovery (Section 4.2).
   If there is a valid pipeACK value, the new cwnd is adjusted to
   reflect that a non-validated cwnd may be larger than the actual
   FlightSize, or recently used FlightSize (recorded in pipeACK). The
   updated cwnd therefore prevents overshoot by a sender significantly
   increasing its transmission rate during the recovery period.

   At the end of the recovery phase, the TCP sender MUST reset the cwnd
   using the method below:

      cwnd = (Max(pipeACK,LossFlightSize) - R)/2.

   Where R is the volume of data that was successfully retransmitted
   during the recovery phase. This corresponds to segments
   retransmitted and considered lost by the pipe estimation algorithm at
   the end of recovery. It does not include the additional cost of
   multiple retransmissions of the same data. The loss of segments
   indicates that the path capacity was exceeded by at least R, and
   hence the calculated cwnd is reduced by at least R before the window
   is halved.

   The calculated cwnd value MUST NOT be reduced below 1 TCP Maximum
   Segment Size (MSS).

   After completing the loss recovery phase, the sender MUST
   re-initialise the pipeACK variable to the "undefined" value. This
   ensures that standard TCP methods are used immediately after
   completing loss recovery until a new pipeACK value can be determined.

   ssthresh is adjusted using the standard TCP method.

   Note: The adjustment by reducing cwnd by the volume of data not sent
   (R) follows the method proposed for Jump Start [Liu07]. The
   inclusion of the term R makes the adjustment more conservative than
   standard TCP. This is required, since a sender in the non-validated
   state may increase the rate more than a standard TCP would have done
   relative to what was sent in the last RTT (i.e., more than doubled
   the number of segments in flight relative to what it sent in the last
   RTT). The additional reduction after congestion is beneficial when
   the LossFlightSize has significantly overshot the available path
   capacity incurring significant loss (e.g., following a change of path
   characteristics or when additional traffic has taken a larger share
   of the network bottleneck during a period when the sender transmits
   less).

   Note: The pipeACK value is only valid during a non-validated phase,
   and therefore this does not exceed cwnd/2.
   If LossFlightSize and R were small, then this can result in the
   final cwnd after loss recovery being at most one quarter of the cwnd
   on detection of congestion. This reduction is conservative, and
   pipeACK is then reset to undefined, hence cwnd updates after a
   congestion event do not depend upon the pipeACK history before
   congestion was detected.

4.4.2. Sender burst control during the non-validated phase

   TCP congestion control allows a sender to accumulate a cwnd that
   would allow it to send a burst of segments with a total size up to
   the difference between the FlightSize and cwnd. Such bursts can
   impact other flows that share a network bottleneck and/or may induce
   congestion when buffering is limited.

   Various methods have been proposed to control the sender burstiness
   [Hug01], [All05]. For example, TCP can limit the number of new
   segments it sends per received ACK. This is effective when a flow of
   ACKs is received, but cannot be used to control a sender that has
   not sent appreciable data in the previous RTT [All05].

   This document recommends using a method to avoid line-rate bursts
   after an idle or rate-limited interval when there is less reliable
   information about the capacity of the network path: A TCP sender in
   the non-validated phase SHOULD control the maximum burst size, e.g.,
   using a rate-based pacing algorithm in which a sender paces out the
   cwnd over its estimate of the RTT, or some other method, to prevent
   many segments being transmitted contiguously at line-rate. The most
   appropriate method(s) to implement pacing depend on the design of the
   TCP/IP stack, speed of interface and whether hardware support (such
   as TCP Segment Offload, TSO) is used. The present document does not
   recommend any specific method.

4.4.3.
Adjustment at the end of the Non-Validated Period (NVP)

   An application that remains in the non-validated phase for a period
   greater than the NVP is required to adjust its congestion control
   state.  If the sender exits the non-validated phase after this
   period, it MUST update the ssthresh:

      ssthresh = max(ssthresh, 3*cwnd/4).

   (This adjustment of ssthresh ensures that the sender records that it
   has safely sustained the present rate.  The change is beneficial to
   rate-limited flows that encounter occasional congestion, and could
   otherwise suffer an unwanted additional delay in recovering the
   sending rate.)

   The sender MUST then update cwnd to be not greater than:

      cwnd = max((1/2)*cwnd, IW).

   Where IW is the appropriate TCP initial window, used by the TCP
   sender (e.g., [RFC5681]).

   Note: This adjustment ensures that the sender responds conservatively
   after remaining in the non-validated phase for more than the
   non-validated period.  In this case, it reduces the cwnd by a factor
   of two from the preserved value.  This adjustment is helpful when
   flows accumulate but do not use a large cwnd, and seeks to mitigate
   the impact when these flows later resume transmission.  This could
   for instance mitigate the impact if multiple high-rate application
   flows were to become idle over an extended period of time and then
   were simultaneously awakened by an external event.

4.5.  Examples of Implementation

   This section provides informative examples of implementation methods.
   Implementations may choose to use other methods that comply with the
   normative requirements.

4.5.1.  Implementing the pipeACK measurement

   A pipeACK sample may be measured once each RTT.  This reduces the
   sender processing burden for calculating after each acknowledgement
   and also reduces storage requirements at the sender.

   Since application behaviour can be bursty using CWV, it may be
   desirable to implement a maximum filter to accumulate the measured
   values so that the pipeACK variable records the largest pipeACK
   sample within the pipeACK Sampling Period.  One simple way to
   implement this is to divide the pipeACK Sampling Period into several
   (e.g., 5) equal length measurement periods.  The sender then records
   the start time for each measurement period and the highest measured
   pipeACK sample.  At the end of the measurement period, any
   measurement(s) that are older than the pipeACK Sampling Period are
   discarded.  The pipeACK variable is then assigned the largest of the
   set of the highest measured values.

      pipeACK sample (Bytes)
      ^
      |   +----------+----------+           +----------+---......
      |   | Sample A | Sample B | No        | Sample C | Sample D
      |   |          |          | Sample    |          |
      |   | |\ 5     |          |           |          |
      |   | | |      |          |           |    /\ 4  |
      |   | | |      |  |\ 3    |           |   |  \   |
      |   | | \      | |  \---  |           |  /    \  |     /| 2
      |   |/   \------|       - |           | /      \------/ \...
      +//-+----------+---------\+----/ /----+/---------+-------------> Time

              <------------------------------------------------|
                          Sampling Period                 Current Time

              Figure 1: Example of measuring pipeACK samples

   Figure 1 shows an example of how measurement samples may be
   collected.  At the time represented by the figure new samples are
   being accumulated into sample D.  Three previous samples also fall
   within the pipeACK Sampling Period: A, B, and C.  There was also a
   period of inactivity between samples B and C during which no
   measurements were taken (because no new data segments were
   acknowledged).  The current value of the pipeACK variable will be 5,
   the maximum across all samples.

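   As an informal illustration only, the maximum filter described above
   could be sketched in Python as follows.  The class name
   PipeAckFilter, its method names, the use of seconds and bytes as
   units, and the choice to age out a measurement period by its recorded
   start time are assumptions made for this sketch; they are not part of
   the specification.

```python
from collections import deque


class PipeAckFilter:
    """Maximum filter over the pipeACK Sampling Period (illustrative only)."""

    def __init__(self, sampling_period, num_bins=5):
        # Assumption: times are in seconds and samples are in bytes.
        self.sampling_period = sampling_period
        self.bin_length = sampling_period / num_bins
        # Each entry is [start_time, highest_sample] for one measurement period.
        self.bins = deque()

    def add_sample(self, now, sample_bytes):
        """Record a pipeACK sample measured at time `now`."""
        if not self.bins or now - self.bins[-1][0] >= self.bin_length:
            # Start a new measurement period and record its start time.
            self.bins.append([now, sample_bytes])
        else:
            # Keep only the highest sample seen in the current period.
            self.bins[-1][1] = max(self.bins[-1][1], sample_bytes)
        self._expire(now)

    def _expire(self, now):
        # Discard measurement periods older than the Sampling Period.
        while self.bins and now - self.bins[0][0] > self.sampling_period:
            self.bins.popleft()

    def value(self, now):
        """Return the pipeACK variable, or None for the "undefined" value."""
        self._expire(now)
        if not self.bins:
            return None
        return max(peak for _, peak in self.bins)
```

   With a 5-second Sampling Period and samples resembling those in
   Figure 1, such a filter would report 5 while Sample A is retained and
   4 once Sample A has aged out.  Periods with no acknowledged data
   simply contribute no entry.
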
   After one further measurement period, Sample A will be discarded,
   since it then is older than the pipeACK Sampling Period, and the
   pipeACK variable will be recalculated.  Its value will be the larger
   of Sample C or the final value accumulated in Sample D.

   Note: the pipeACK Sampling Period and the NVP do not necessarily
   require a new timer to be implemented.  An alternative is to record a
   timestamp when the sender enters the non-validated phase.  Each time
   a sender transmits a new segment, this timestamp may be used to
   determine if the NVP has expired.  If the period expires, the sender
   takes into account how many units of the NVP have passed and makes
   one reduction (as defined in Section 4.4.3) for each NVP.

4.5.2.  Implementing detection of the cwnd-limited condition

   A sender needs to implement a method that detects the cwnd-limited
   condition (see Section 4.4).  This detects a condition where a sender
   in the non-validated phase receives an ACK, but the size of cwnd
   prevents sending more new data.

   In simple terms, this condition is true only when the FlightSize of a
   TCP sender is equal to or larger than the current cwnd.  However, an
   implementation also needs to consider constraints on the way in which
   the cwnd variable can be used.  For instance, implementations need to
   support other TCP methods such as the Nagle Algorithm and TCP Segment
   Offload (TSO) that also use cwnd to control transmission.  These
   other methods can result in a sender becoming cwnd-limited when the
   cwnd is nearly, rather than completely, equal to the FlightSize.

5.  Determining a safe period to preserve cwnd

   This section documents the rationale for selecting the maximum period
   that cwnd may be preserved, known as the NVP.

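   The timestamp-based handling of the NVP noted in Section 4.5.1,
   together with the adjustment in Section 4.4.3, might be sketched as
   follows.  This is a non-normative Python sketch: the function name,
   its parameters, the use of integer byte counts, and the treatment of
   a period that has exactly elapsed as expired are assumptions made for
   illustration.

```python
NVP_SECONDS = 300.0  # the five-minute Non-Validated Period


def apply_nvp_adjustment(cwnd, ssthresh, iw, entered_nvp_at, now,
                         nvp=NVP_SECONDS):
    """Return (cwnd, ssthresh) after one reduction per elapsed NVP.

    cwnd, ssthresh and iw are byte counts; entered_nvp_at is the
    timestamp recorded on entering the non-validated phase.
    """
    # Number of whole NVPs that have passed since the recorded timestamp.
    elapsed_periods = max(0, int((now - entered_nvp_at) // nvp))
    for _ in range(elapsed_periods):
        # ssthresh = max(ssthresh, 3*cwnd/4), using the cwnd before halving.
        ssthresh = max(ssthresh, 3 * cwnd // 4)
        # cwnd = max((1/2)*cwnd, IW)
        cwnd = max(cwnd // 2, iw)
    return cwnd, ssthresh
```

   For example, with cwnd = 40000 bytes, ssthresh = 10000 bytes, and an
   IW of 4380 bytes, two elapsed NVPs would leave cwnd at 10000 bytes
   and ssthresh at 30000 bytes in this sketch.  The caller would be
   expected to check this each time a new segment is transmitted.
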
   Limiting the period that cwnd may be preserved avoids undesirable
   side effects that would result if the cwnd were to be kept
   unnecessarily high for an arbitrarily long period, which was a part
   of the problem that CWV originally attempted to address.  The period
   a sender may safely preserve the cwnd is a function of the period
   that a network path is expected to sustain the capacity reflected by
   cwnd.  There is no ideal choice for this time.

   A period of five minutes was chosen for this NVP.  This is a
   compromise that was larger than the idle intervals of common
   applications, but not significantly larger than the period for which
   the capacity of an Internet path may commonly be regarded as stable.
   The capacity of wired networks is usually relatively stable for
   periods of several minutes, and load stability increases with the
   capacity.  This suggests that cwnd may be preserved for at least a
   few minutes.

   There are cases where the TCP throughput exhibits significant
   variability over a time less than five minutes.  Examples could
   include wireless topologies, where the TCP rate may fluctuate on the
   order of a few seconds as a consequence of medium access protocol
   instabilities.  Mobility changes may also impact TCP performance over
   short time scales.  Senders that observe such rapid changes in the
   path characteristic may also experience increased congestion with the
   new method; however, such variation would likely also impact TCP's
   behaviour when supporting interactive and bulk applications.

   Routing algorithms may change the network path that is used by a
   transport.  Although a change of path can in turn disrupt the RTT
   measurement and may result in a change of the capacity available to a
   TCP connection, we assume these path changes do not usually occur
   frequently (compared to a time frame of a few minutes).

   The value of five minutes is therefore expected to be sufficient for
   most current applications.  Simulation studies (e.g., [Bis11]) also
   suggest that for many practical applications, the performance using
   this value will not be significantly different to that observed using
   a non-standard method that does not reset the cwnd after idle.

   Finally, other TCP sender mechanisms have used a 5-minute timer, and
   there could be simplifications in some implementations by reusing the
   same interval.  TCP defines a default user timeout of 5 minutes
   [RFC0793], i.e., how long transmitted data may remain unacknowledged
   before a connection is forcefully closed.

6.  Security Considerations

   General security considerations concerning TCP congestion control are
   discussed in [RFC5681].  This document describes an algorithm that
   updates one aspect of the congestion control procedures, and so the
   considerations described in RFC 5681 also apply to this algorithm.

7.  IANA Considerations

   There are no IANA considerations.

8.  Acknowledgments

   This document was produced by the TCP Maintenance and Minor
   Extensions (tcpm) working group.

   The authors acknowledge the contributions of Dr I Biswas and Dr Ziaul
   Hossain in supporting the evaluation of CWV and for their help in
   developing the mechanisms proposed in this draft.  We also
   acknowledge comments received from the Internet Congestion Control
   Research Group, in particular Yuchung Cheng, Mirja Kuehlewind, Joe
   Touch, and Mark Allman.  This work was part-funded by the European
   Community under its Seventh Framework Programme through the Reducing
   Internet Transport Latency (RITE) project (ICT-317700).

9.  Author Notes

   RFC-Editor note: please remove this section prior to publication.

9.1.  Other related work

   RFC-Editor note: please remove this section prior to publication.

   There are several issues to be discussed more widely:

   o  There are potential interactions with the Experimental update in
      RFC 6928 that raises the TCP initial window to ten segments.  Do
      these cases need to be elaborated?

         This relates to the Experimental specification for increasing
         the TCP IW defined in RFC 6928.

         The two methods have different functions and different
         responses to loss/congestion.

         RFC 6928 proposes an experimental update to TCP that would
         increase the IW to ten segments.  This would allow faster
         opening of the cwnd, and also a large (same size) restart
         window.  This approach is based on the assumption that many
         forward paths can sustain bursts of up to ten segments without
         (appreciable) loss.  Such a significant increase in cwnd must
         be matched with an equally large reduction of cwnd if loss/
         congestion is detected, and such a congestion indication is
         likely to require future use of IW=10 to be disabled for this
         path for some time.  This guards against the unwanted behaviour
         of a series of short flows continuously flooding a network path
         without network congestion feedback.

         In contrast, this document proposes an update with a rationale
         that relies on recent previous path history to select an
         appropriate cwnd after restart.

         The behaviour differs in three ways:

         1) For applications that send little initially, new-cwv may
         constrain more than RFC 6928, but would not require the
         connection to reset any path information when a restart
         incurred loss.  In contrast, new-cwv would allow the TCP
         connection to preserve the cached cwnd; any loss would impact
         cwnd, but not impact other flows.

         2) For applications that utilise more capacity than provided by
         a cwnd of 10 segments, this method would permit a larger
         restart window compared to a restart using the method in RFC
         6928.  This is justified by the recent path history.

         3) new-cwv is intended to also be used for rate-limited
         applications, where the application sends, but does not seek to
         fully utilise the cwnd.  In this case, new-cwv constrains the
         cwnd to that justified by the recent path history.  The
         performance trade-offs are hence different, and it would be
         possible to enable new-cwv when also using the method in RFC
         6928, and yield benefits.

   o  There is potential overlap with the Laminar proposal
      (draft-mathis-tcpm-tcp-laminar)

         The current draft was intended as a standards-track update to
         TCP, rather than a new transport variant.  At least, it would
         be good to understand how the two interact and whether there is
         a possibility of a single method.

   o  There is potential performance loss in loss of a short burst
      (off list with M Allman)

         A sender can transmit several segments then become idle.  If
         the first set of segments are all acknowledged, the ssthresh
         collapses to a small value (no new data is sent by the idle
         sender).  Loss of the later data results in congestion (e.g.,
         maybe a RED drop or some other cause, rather than the maximum
         rate of this flow).  When the sender performs loss recovery it
         may have an appreciable pipeACK and cwnd, but a very low
         FlightSize - the Standard algorithm therefore results in an
         unusually low cwnd ((1/2)*FlightSize).

         A constant rate flow would have maintained a FlightSize
         appropriate to pipeACK (cwnd, if it is a bulk flow).

         This could be fixed by adding a new state variable?  It could
         also be argued this is a corner case (e.g., loss of only the
         last segments would have resulted in RTO), but the impact could
         be significant.

   o  There is potential interaction with TCP Control Block Sharing (M
      Welzl)

         An application that is non-validated can accumulate a cwnd that
         is larger than the actual capacity.  Is this a fair value to
         use in TCB sharing?

         We propose that TCB sharing should use the pipeACK in place of
         cwnd when a TCP sender is in the non-validated phase.  This
         value better reflects the capacity that the flow has utilised
         in the network path.

10.  Revision notes

   RFC-Editor note: please remove this section prior to publication.

   Draft 03 was submitted to ICCRG to receive comments and feedback.

   Draft 04 contained the first set of clarifications after feedback:

   o  Changed name to application limited and used the term rate-limited
      in all places.

   o  Added justification and many minor changes suggested on the list.

   o  Added text to tie-in with more accurate ECN marking.

   o  Added ref to Hug01

   Draft 05 contained various updates:

   o  New text to redefine how to measure the acknowledged pipe,
      differentiating this from the FlightSize, and hence avoiding
      previous issues with infrequent large bursts of data not being
      validated.  A key new feature is that pipeACK only triggers
      leaving the NVP after the size of the pipe has been acknowledged.
      This removed the need for hysteresis.

   o  Reduction values were changed to 1/2, following analysis of
      suggestions from ICCRG.  This also sets the "target" cwnd as twice
      the used rate for the non-validated case.

   o  Introduced a symbolic name (NVP) to denote the 5 minute period.

   Draft 06 contained various updates:

   o  Required reset of pipeACK after congestion.

   o  Added comment on the effect of congestion after a short burst (M.
      Allman).

   o  Correction of minor typos.

   WG draft 00 contained various updates:

   o  Updated initialisation of pipeACK to maximum value.

   o  Added note on intended status still to be determined.

   WG draft 01 contained:

   o  Added corrections from Richard Scheffenegger.

   o  Raffaello Secchi added to the mechanism, based on implementation
      experience.

   o  Removed the requirement for the method to use the TCP SACK option

   o  Although it may be desirable to use SACK, this is not essential to
      the algorithm.

   o  Added the notion of the sampling period to accommodate large rate
      variations and ensure that the method is stable.  This algorithm
      is to be validated through implementation.

   WG draft 02 contained:

   o  Clarified language around pipeACK variable and pipeACK sample -
      Feedback from Aris Angelogiannopoulos.

   WG draft 03 contained:

   o  Editorial corrections - Feedback from Anna Brunstrom.

   o  An adjustment to the procedure at the start and end of loss
      recovery to align the two equations.

   o  Further clarification of the "undefined" value of the pipeACK
      variable.

   WG draft 04 contained:

   o  Editorial corrections.

   o  Introduced the "cwnd-limited" term.

   o  An adjustment to the procedure at the start of a cwnd-limited
      phase - the new text is intended to ensure that new-cwv is not
      unnecessarily more conservative than standard TCP when the flow is
      cwnd-limited.  This resolves two issues: first it prevents
      pathologies in which pipeACK increases slowly and erratically.  It
      also ensures that performance of bulk applications is not
      significantly impacted when using the method.

   o  Clearly identifies that pacing (or equivalent) is required during
      the NVP to control burstiness.  New section added.

   WG draft 05 contained:

   o  Clarification to first two bullets in Section 4.4 describing
      cwnd-limited, to explain these are really alternates to the same
      case.

   o  Section giving implementation examples was restructured to clarify
      there are two methods described.

   o  Cross-references to sections updated - thanks to comments from
      Martin Winbjoerk and Tim Wicinski.

   WG draft 06 contained:

   o  The section giving implementation examples was restructured to
      clarify there are two methods described.

   o  Justification of design decisions.

   o  Re-organised text to improve clarity of argument.

   WG draft 07 contained:

   o  Updated publication date.

   o  Text on noting that cwnd shouldn't ever be made negative.

   o  Updated text on ECN to clarify the process where R is a reduction
      based on ECN marks.

   WG draft 08 contained:

   o  Removed description of how to use Accurate ECN feedback.  It is
      not clear that this document should specify a usage of a mechanism
      that has not been fully defined.  Accurate ECN may lead to
      different congestion responses and these will need to be defined
      in the CC specifications for using Accurate ECN.

   WG draft 09 contained:

   o  Removed update to RFC 5681 - the status of the present document is
      Experimental, and hence this document does not update RFC 5681.

   WG draft 10 contained edits following WGLC:

   o  Section 1.1 Implementation of new CWV: New section added to
      introduce the places where there is implementation flexibility.

   o  Section 4.4: Clarified that the MUST is to satisfy the goal to
      avoid a TCP sender growing a large "non-validated" cwnd, when it
      has not recently sent using the current size of cwnd, and fixed
      format of bullet 2 in 4.4.

   o  Section 4.5.2: rewritten section text.

   WG draft 11 contained edits following IETF LC:

   o  Updated text in section 1.1.

   o  Updated text in response to AD, Gen-ART, & Sec reviews.

   o  LC call comments from Mirja Kuehlewind

11.  References

11.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2861]  Handley, M., Padhye, J., and S.
              Floyd, "TCP Congestion Window Validation", RFC 2861,
              June 2000.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              June 2011.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP", RFC
              6675, August 2012.

11.2.  Informative References

   [All05]    Allman, M. and E. Blanton, "Notes on burst mitigation for
              transport protocols", March 2005.

   [Bis08]    Biswas, I. and G. Fairhurst, "A Practical Evaluation of
              Congestion Window Validation Behaviour, 9th Annual
              Postgraduate Symposium in the Convergence of
              Telecommunications, Networking and Broadcasting (PGNet),
              Liverpool, UK", June 2008.

   [Bis10]    Biswas, I., Sathiaseelan, A., Secchi, R., and G.
              Fairhurst, "Analysing TCP for Bursty Traffic, Int'l J. of
              Communications, Network and System Sciences, 7(3)", June
              2010.

   [Bis11]    Biswas, I., "PhD Thesis, Internet congestion control for
              variable rate TCP traffic, School of Engineering,
              University of Aberdeen", June 2011.

   [Fai12]    Sathiaseelan, A., Secchi, R., Fairhurst, G., and I.
              Biswas, "Enhancing TCP Performance to support
              Variable-Rate Traffic, 2nd Capacity Sharing Workshop, ACM
              CoNEXT, Nice, France", December 2012.

   [Hug01]    Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
              Slow-Start Restart After Idle (Work-in-Progress)",
              December 2001.

   [Liu07]    Liu, D., Allman, M., Jiny, S., and L. Wang, "Congestion
              Control without a Startup Phase, 5th International
              Workshop on Protocols for Fast Long-Distance Networks
              (PFLDnet), Los Angeles, California, USA", February 2007.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

Authors' Addresses

   Godred Fairhurst
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk

   Arjuna Sathiaseelan
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: arjuna@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk

   Raffaello Secchi
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: raffaello@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk