TCPM Working Group                                          G. Fairhurst
Internet-Draft                                           A. Sathiaseelan
Obsoletes: 2861 (if approved)                                  R. Secchi
Intended status: Experimental                     University of Aberdeen
Expires: October 15, 2015                                 April 13, 2015


              Updating TCP to support Rate-Limited Traffic
                       draft-ietf-tcpm-newcwv-10

Abstract

   This document provides a mechanism to address issues that arise when
   TCP is used to support traffic that exhibits periods where the
   sending rate is limited by the application rather than the congestion
   window.  It provides an experimental update to TCP that allows a TCP
   sender to restart quickly following a rate-limited interval.  This
   method is expected to benefit applications that send rate-limited
   traffic using TCP, while also providing an appropriate response if
   congestion is experienced.

   It also evaluates the Experimental specification of TCP Congestion
   Window Validation, CWV, defined in RFC 2861, and concludes that RFC
   2861 sought to address important issues, but failed to deliver a
   widely used solution.  This document therefore recommends that the
   status of RFC 2861 is moved from Experimental to Historic, and that
   it is replaced by the current specification.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 15, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Implementation of new CWV
     1.2.  Standards Status of this Document
   2.  Reviewing experience with TCP-CWV
   3.  Terminology
   4.  A New Congestion Window Validation method
     4.1.  Initialisation
     4.2.  Estimating the validated capacity supported by a path
     4.3.  Preserving cwnd during a rate-limited period.
     4.4.  TCP congestion control during the non-validated phase
       4.4.1.  Response to congestion in the non-validated phase
       4.4.2.  Sender burst control during the non-validated phase
       4.4.3.  Adjustment at the end of the non-validated phase
     4.5.  Examples of Implementation
       4.5.1.  Implementing the pipeACK measurement
       4.5.2.  Implementing detection of the cwnd-limited condition
   5.  Determining a safe period to preserve cwnd
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  Author Notes
     9.1.  Other related work
   10. Revision notes
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   TCP is used to support a range of application behaviours.  The TCP
   congestion window (cwnd) controls the number of unacknowledged
   packets/bytes that a TCP flow may have in the network at any time, a
   value known as the FlightSize [RFC5681].  A bulk application will
   always have data available to transmit.  The rate at which it sends
   is therefore limited by the maximum permitted by the receiver
   advertised window and the sender congestion window (cwnd).  In
   contrast, a rate-limited application will experience periods when the
   sender is either idle or is unable to send at the maximum rate
   permitted by the cwnd.  The update in this document targets the
   operation of TCP in such rate-limited cases.

   Standard TCP [RFC5681] states that a TCP sender SHOULD set cwnd to no
   more than the Restart Window (RW) before beginning transmission, if
   the TCP sender has not sent data in an interval exceeding the
   retransmission timeout, i.e., when an application becomes idle.
   [RFC2861] noted that this TCP behaviour was not always observed in
   current implementations.  Experiments [Bis08] confirm this to still
   be the case.

   Congestion Window Validation, CWV, introduced the terminology of
   "application limited periods".  This document describes any time that
   an application limits the sending rate, rather than being limited by
   the transport, as "rate-limited".
   This update improves support for
   applications that vary their transmission rate, either with (short)
   idle periods between transmission or by changing the rate at which
   the application sends.  These applications are characterised by the
   TCP FlightSize often being less than cwnd.  Many Internet
   applications exhibit this behaviour, including web browsing,
   HTTP-based adaptive streaming, applications that support
   query/response type protocols, network file sharing, and live video
   transmission.  Many such applications currently avoid using
   long-lived (persistent) TCP connections (e.g., [RFC2616] servers
   typically support persistent HTTP connections, but do not enable this
   by default).  Such applications often instead either use a succession
   of short TCP transfers or use UDP.

   Standard TCP does not impose additional restrictions on the growth of
   the congestion window when a TCP sender is unable to send at the
   maximum rate allowed by the cwnd.  In this case, the rate-limited
   sender may grow a cwnd far beyond that corresponding to the current
   transmit rate, resulting in a value that does not reflect current
   information about the state of the network path the flow is using.
   Use of such an invalid cwnd may result in reduced application
   performance and/or could significantly contribute to network
   congestion.

   [RFC2861] proposed a solution to these issues in an experimental
   method known as CWV.  CWV was intended to help reduce cases where TCP
   accumulated an invalid (inappropriately large) cwnd.  The use and
   drawbacks of using the CWV algorithm in RFC 2861 with an application
   are discussed in Section 2.

   Section 3 defines relevant terminology.

   Section 4 specifies an alternative to CWV that seeks to address the
   same issues, but does this in a way that is expected to mitigate the
   impact on an application that varies its sending rate.
   The updated
   method applies to the rate-limited conditions (including both
   application-limited and idle senders).

   The goals of this update are:

   o  To not change the behaviour of a TCP sender that performs bulk
      transfers that consume the cwnd.

   o  To provide a method that co-exists with Standard TCP and other
      flows that use this updated method.

   o  To reduce transfer latency for applications that change their rate
      over short intervals of time.

   o  To avoid a TCP sender growing a large "non-validated" cwnd, when
      it has not recently sent using this cwnd.

   o  To remove the incentive for ad-hoc application or network stack
      methods (such as "padding") solely to maintain a large cwnd for
      future transmission.

   o  To incentivise the use of long-lived connections, rather than a
      succession of short-lived flows, benefiting both flows and network
      when actual congestion is encountered.

   Section 5 describes the rationale for selecting the safe period to
   preserve the cwnd.

1.1.  Implementation of new CWV

   The method specified in this document is a sender-side only change to
   the TCP congestion control behaviour.

   The method creates a new protocol state, and requires a sender to
   determine when the cwnd is validated or non-validated to control the
   entry and exit from this state (see Section 4.3).  It specifies how a
   TCP sender manages the growth of the cwnd using the set of rules
   defined in Section 4.

   Implementation of this specification requires an implementor to
   define a method to provide a measure for the volume of recently
   acknowledged data (pipeACK).  The details of this measurement are
   implementation-specific.  An example is provided in Section 4.5.1,
   but other methods are permitted.  A sender also needs to provide a
   method to determine when it becomes cwnd-limited.
   Implementation of
   this may require consideration of other TCP methods (see
   Section 4.5.2).

   A sender is also recommended to provide a method that controls the
   maximum burst size (see Section 4.4.2).  However, implementors are
   allowed flexibility in how this method is implemented and the choice
   of an appropriate method is expected to depend on the way in which
   the sender stack implements other TCP methods (such as TCP Segment
   Offload, TSO).

1.2.  Standards Status of this Document

   This document was produced by the TCP Maintenance and Minor
   Extensions (tcpm) working group.

   The document updates and obsoletes the methods described in
   [RFC2861].  It recommends a set of mechanisms, including the use of
   pacing during a non-validated period.  The updated mechanisms are
   intended to have a less aggressive congestion impact than would be
   exhibited by a standard TCP sender.

   The specification in this draft is classified as "Experimental"
   pending experience with deployed implementations of the methods.

2.  Reviewing experience with TCP-CWV

   [RFC2861] described a simple modification to the TCP congestion
   control algorithm that decayed the cwnd after the transition to a
   "sufficiently-long" idle period.  This used the slow-start threshold
   (ssthresh) to save information about the previous value of the
   congestion window.  The approach relaxed the standard TCP behaviour
   [RFC5681] for an idle session, with the intent of improving
   application performance.  CWV also modified the behaviour where a
   sender transmitted at a rate less than allowed by cwnd.

   [RFC2861] proposed two sets of responses, one after an
   "application-limited" period and one after an "idle" period.
   Although this distinction was argued, in practice differentiating the
   two conditions was found problematic in actual networks (e.g.,
   [Bis10]).
   Although this offered
   predictable performance for long on-off periods (>>1 RTT) or slowly
   varying rate-based traffic, the performance could be unpredictable
   for variable-rate traffic and depended both upon whether an accurate
   RTT had been obtained and upon the pattern of application traffic
   relative to the measured RTT.

   Many applications can and often do vary their transmission over a
   wide range of rates.  Using [RFC2861], such applications often
   experienced varying performance, which made it hard for application
   developers to predict the TCP latency even when using a path with
   stable network characteristics.  We argue that an attempt to classify
   application behaviour as application-limited or idle is problematic
   and also inappropriate.  This document therefore explicitly avoids
   trying to differentiate these two cases, instead treating all rate-
   limited traffic uniformly.

   [RFC2861] has been implemented in some mainstream operating systems
   as the default behaviour [Bis08].  Analysis (e.g., [Bis10] [Fai12])
   has shown that a TCP sender using CWV is able to use available
   capacity on a shared path after an idle period.  This can benefit
   variable-rate applications, especially over long delay paths, when
   compared to the slow-start restart specified by standard TCP.
   However, CWV would only benefit an application if the idle period
   were less than several Retransmission Time Out (RTO) intervals
   [RFC6298], since the behaviour would otherwise be the same as for
   standard TCP, which resets the cwnd to the TCP Restart Window after
   this period.

   To enable better performance for variable-rate applications with TCP,
   some operating systems have chosen to support non-standard methods,
   or applications have resorted to "padding" streams by sending dummy
   data to maintain their sending rate when they have no data to
   transmit.
   Although transmitting redundant data across a network path
   provides good evidence that the path can sustain data at the offered
   rate, padding also consumes network capacity and reduces the
   opportunity for congestion-free statistical multiplexing.  For
   variable-rate flows, the benefits of statistical multiplexing can be
   significant and it is therefore a goal to find a viable alternative
   to padding streams.

   Experience with [RFC2861] suggests that although the CWV method
   benefited the network in a rate-limited scenario (reducing the
   probability of network congestion), the behaviour was too
   conservative for many common rate-limited applications.  This
   mechanism did not therefore offer the desirable increase in
   application performance for rate-limited applications and it is
   unclear whether applications actually use this mechanism in the
   general Internet.

   It is therefore concluded that CWV, as defined in [RFC2861], was
   often a poor solution for many rate-limited applications.  It had the
   correct motivation, but had the wrong approach to solving this
   problem.

3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The document assumes familiarity with the terminology of TCP
   congestion control [RFC5681].

   The following terminology is used in this document:

   cwnd-limited: A TCP flow that has sent the maximum number of segments
   permitted by the cwnd, where the application utilises the allowed
   sending rate (see Section 4.5.2).

   pipeACK sample: A measure of the volume of data acknowledged by the
   network within an RTT.

   pipeACK variable: A variable that measures the available capacity
   using the set of pipeACK samples.

   pipeACK Sampling Period: The maximum period that a measured pipeACK
   sample may influence the pipeACK variable.

   Non-validated phase: The phase where the cwnd reflects a previous
   measurement of the available path capacity.

   Non-validated period, NVP: The maximum period for which cwnd is
   preserved in the non-validated phase.

   Rate-limited: A TCP flow that does not consume more than one half of
   cwnd, and hence operates in the non-validated phase.  This includes
   periods when an application is either idle or chooses to send at a
   rate less than the maximum permitted by the cwnd.

   Validated phase: The phase where the cwnd reflects a current estimate
   of the available path capacity.

4.  A New Congestion Window Validation method

   This section proposes an update to the TCP congestion control
   behaviour during a rate-limited interval.  This new method
   intentionally does not differentiate between times when the sender
   has become idle or chooses to send at a rate less than the maximum
   allowed by the cwnd.

   The period where actual usage is less than allowed by cwnd is named
   the non-validated phase.  The update allows an application in the
   non-validated phase to resume transmission at a previous rate without
   incurring the delay of slow-start.  However, if the TCP sender
   experiences congestion using the preserved cwnd, it is required to
   immediately reset the cwnd to an appropriate value specified by the
   method.  If a sender does not take advantage of the preserved cwnd
   within the Non-validated period, NVP, the value of cwnd is reduced,
   ensuring the value better reflects the capacity that was recently
   actually used.

   It is expected that this update will satisfy the requirements of many
   rate-limited applications and at the same time provide an appropriate
   method for use in the Internet.
   New-CWV reduces the incentive for
   an application to send "padding" data simply to keep transport
   congestion state.

   The method is specified in the following subsections and is expected
   to encourage applications and TCP stacks to use standards-based
   congestion control methods.  It may also encourage the use of long-
   lived connections where this offers benefit (such as persistent
   HTTP).

4.1.  Initialisation

   A sender starts a TCP connection in the validated phase and
   initialises the pipeACK variable to the "undefined" value.  This
   value inhibits use of the value in cwnd calculations.

4.2.  Estimating the validated capacity supported by a path

   [RFC6675] defines a variable, FlightSize, that indicates the
   instantaneous amount of data that has been sent, but not cumulatively
   acknowledged.  In this method a new variable "pipeACK" is introduced
   to measure the acknowledged size of the network pipe.  This is used
   to determine if the sender has validated the cwnd. pipeACK differs
   from FlightSize in that it is evaluated over a window of acknowledged
   data, rather than reflecting the amount of data outstanding.

   A sender determines a pipeACK sample by measuring the volume of data
   that was acknowledged by the network over the period of a measured
   Round Trip Time (RTT).  Using the variables defined in [RFC6675], a
   value could be measured by caching the value of HighACK and after one
   RTT measuring the difference between the cached HighACK value and the
   current HighACK value.  Other equivalent methods may be used.

   A sender is not required to continuously update the pipeACK variable
   after each received ACK, but SHOULD perform a pipeACK sample at least
   once per RTT when it has sent unacknowledged segments.

   The pipeACK variable MAY consider multiple pipeACK samples over the
   pipeACK Sampling Period.
   The value of the pipeACK variable MUST NOT
   exceed the maximum (highest value) within the sampling period.  This
   specification defines the pipeACK Sampling Period as Max(3*RTT, 1
   second).  This period enables a sender to compensate for large
   fluctuations in the sending rate, where there may be pauses in
   transmission, and allows the pipeACK variable to reflect the largest
   recently measured pipeACK sample.

   When no measurements are available, the pipeACK variable is set to
   the "undefined" value.  This value is used to inhibit entering the
   non-validated phase until the first new measurement of a pipeACK
   sample.

   The pipeACK variable MUST NOT be updated during TCP Fast Recovery.
   That is, the sender stops collecting pipeACK samples during loss
   recovery.  The method RECOMMENDS that the TCP SACK option [RFC2018]
   is enabled and the method defined in [RFC6675] is used to recover
   missing segments.  This allows the sender to more accurately
   determine the number of missing bytes during the loss recovery phase,
   and using this method will result in a more appropriate cwnd
   following loss.

4.3.  Preserving cwnd during a rate-limited period.

   The updated method creates a new TCP sender phase that captures
   whether the cwnd reflects a validated or non-validated value.  The
   phases are defined as:

   o  Validated phase: pipeACK >= (1/2)*cwnd, or pipeACK is undefined.
      This is the normal phase, where cwnd is expected to be an
      approximate indication of the capacity currently available along
      the network path, and the standard methods are used to increase
      cwnd (currently [RFC5681]).

   o  Non-validated phase: pipeACK < (1/2)*cwnd.  This is the phase
      where the cwnd has a value based on a previous measurement of the
      available capacity, and the usage of this capacity has not been
      validated in the pipeACK Sampling Period.
      That is, it is not
      known whether the cwnd reflects the currently available capacity
      along the network path.  The mechanisms to be used in this phase
      seek to determine a safe value for cwnd and an appropriate
      reaction to congestion.

   Note: A threshold is needed to determine whether a sender is in the
   validated or non-validated phase.  A standard TCP sender in slow-
   start is permitted to double its FlightSize from one RTT to the next.
   This motivated the choice of a threshold value of 1/2.  This
   threshold ensures a sender does not further increase the cwnd as long
   as the FlightSize is less than (1/2)*cwnd.  Furthermore, a sender
   with a FlightSize less than (1/2)*cwnd may in the next RTT be
   permitted by the cwnd to send at a rate that more than doubles the
   FlightSize, and hence this case needs to be regarded as non-validated
   and a sender therefore needs to employ additional mechanisms while in
   this phase.

4.4.  TCP congestion control during the non-validated phase

   A TCP sender implementing this specification MUST enter the non-
   validated phase when the pipeACK is less than (1/2)*cwnd.

   A TCP sender that enters the non-validated phase SHOULD preserve the
   cwnd (i.e., the cwnd neither grows nor reduces while the sender
   remains in this phase).  If the sender receives an indication of
   congestion, it uses the method described below.  The phase is
   concluded after a fixed period of time (the NVP, as explained in
   Section 4.4.3) or when the sender transmits sufficient data so that
   pipeACK > (1/2)*cwnd (i.e., the sender is no longer rate-limited).
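
   The pipeACK variable of Section 4.2 and the phase test of Section 4.3
   can be illustrated with a minimal, informative sketch.  It assumes
   that sizes are counted in bytes and times in seconds, and that the
   "undefined" value is modelled as None.  The names PipeAckFilter,
   sampling_period and in_validated_phase are hypothetical and do not
   come from this specification.  The sketch does not model the rule
   that sampling stops during Fast Recovery.

```python
from collections import deque

UNDEFINED = None  # hypothetical stand-in for the "undefined" pipeACK value


def sampling_period(rtt):
    """pipeACK Sampling Period of Section 4.2: Max(3*RTT, 1 second)."""
    return max(3.0 * rtt, 1.0)


class PipeAckFilter:
    """Maximum filter over the pipeACK samples in the sampling period."""

    def __init__(self):
        # (time in seconds, bytes acknowledged in one RTT), oldest first.
        self.samples = deque()

    def add_sample(self, now, bytes_acked):
        self.samples.append((now, bytes_acked))

    def value(self, now, rtt):
        # Drop samples that are older than the pipeACK Sampling Period.
        cutoff = now - sampling_period(rtt)
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
        if not self.samples:
            return UNDEFINED  # no measurements available
        return max(size for _, size in self.samples)


def in_validated_phase(pipeack, cwnd):
    """Validated phase: pipeACK >= (1/2)*cwnd, or pipeACK is undefined."""
    return pipeack is UNDEFINED or 2 * pipeack >= cwnd


f = PipeAckFilter()
f.add_sample(0.0, 4000)
f.add_sample(0.5, 6000)
pipeack = f.value(0.6, rtt=0.1)  # the sampling period is 1 second here
print(pipeack, in_validated_phase(pipeack, cwnd=20000))
```

   With these inputs the largest recent sample is 6000 bytes, which is
   less than half of the 20000-byte cwnd, so the sender in this sketch
   would be in the non-validated phase.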

   The behaviour in the non-validated phase is specified as:

   o  A sender determines whether to increase the cwnd based upon
      whether it is cwnd-limited (see Section 4.5.2):

      *  A sender that is cwnd-limited MAY use the standard TCP method
         to increase cwnd (i.e., a TCP sender that fully utilises the
         cwnd is permitted to increase cwnd for each received ACK using
         standard methods).

      *  A sender that is not cwnd-limited MUST NOT increase the cwnd
         when ACK packets are received in this phase (i.e., needs to
         avoid growing the cwnd when it has not recently sent using the
         current size of cwnd).

   o  If the sender receives an indication of congestion while in the
      non-validated phase (i.e., detects loss), the sender MUST exit the
      non-validated phase (reducing the cwnd as defined in
      Section 4.4.1).

   o  If the Retransmission Time Out (RTO) expires while in the non-
      validated phase, the sender MUST exit the non-validated phase.  It
      then resumes using the standard TCP RTO mechanism [RFC5681].

   o  A sender with a pipeACK variable greater than (1/2)*cwnd SHOULD
      enter the validated phase.  (A rate-limited sender will not
      normally be impacted by whether it is in a validated or non-
      validated phase, since it will normally not consume the entire
      cwnd.  However, a change to the validated phase will release the
      sender from constraints on the growth of cwnd, and restore the use
      of the standard congestion response.)

   The cwnd-limited behaviour may be triggered during a transient
   condition that occurs when a sender is in the non-validated phase and
   receives an ACK that acknowledges received data while the cwnd was
   fully utilised and more data is awaiting transmission than may be
   sent with the current cwnd.  The sender is then allowed to use the
   standard method to increase the cwnd.
   (Note, if the sender succeeds
   in sending these new segments, the updated cwnd and pipeACK variables
   will eventually result in a transition to the validated phase.)

4.4.1.  Response to congestion in the non-validated phase

   Reception of congestion feedback while in the non-validated phase is
   interpreted as an indication that it was inappropriate for the sender
   to use the preserved cwnd.  The sender is therefore required to
   quickly reduce the rate to avoid further congestion.  Since the cwnd
   does not have a validated value, a new cwnd value needs to be
   selected based on the utilised rate.

   A sender that detects a packet-drop MUST record the current
   FlightSize in the variable LossFlightSize and MUST calculate a safe
   cwnd for loss recovery using the method below:

      cwnd = (Max(pipeACK,LossFlightSize))/2.

   The pipeACK value is not updated during loss recovery (Section 4.2).
   If there is a valid pipeACK value, the new cwnd is adjusted to
   reflect that a non-validated cwnd may be larger than the actual
   FlightSize, or recently used FlightSize (recorded in pipeACK).  The
   updated cwnd therefore prevents overshoot by a sender significantly
   increasing its transmission rate during the recovery period.

   At the end of the recovery phase, the TCP sender MUST reset the cwnd
   using the method below:

      cwnd = (Max(pipeACK,LossFlightSize) - R)/2.

   Where R is the volume of data that was successfully retransmitted
   during the recovery phase.  This counts segments retransmitted and
   considered lost by the pipe estimation algorithm at the end of
   recovery.  It does not include the additional cost of multiple
   retransmissions of the same data.

   The calculated cwnd value MUST NOT be reduced below 1 MSS.

   After completing the loss recovery phase, the sender MUST re-
   initialise the pipeACK variable to the "undefined" value.
   This
   ensures that standard TCP methods are used immediately after
   completing loss recovery until a new pipeACK value can be determined.

   ssthresh is adjusted using the standard TCP method.

   Note: The adjustment by reducing cwnd by the volume of data not sent
   (R) follows the method proposed for Jump Start [Liu07].  The
   inclusion of the term R makes the adjustment more conservative than
   standard TCP.  This is required, since a sender in the non-validated
   phase may increase the rate more than a standard TCP would have done
   relative to what was sent in the last RTT (i.e., more than doubled
   the number of segments in flight relative to what it sent in the last
   RTT).  The additional reduction after congestion is beneficial when
   the LossFlightSize has significantly overshot the available path
   capacity incurring significant loss (e.g., following a change of path
   characteristics or when additional traffic has taken a larger share
   of the network bottleneck during a period when the sender transmits
   less).

   Note: The pipeACK value is only valid during a non-validated phase,
   and therefore does not exceed cwnd/2.  If LossFlightSize and R were
   small, then this can result in the final cwnd after loss recovery
   being 1/4 of the cwnd on detection of congestion.  This reduction is
   conservative, and pipeACK is reset to undefined.  Subsequent updates
   to cwnd do not therefore reflect pipeACK history before any
   congestion event.

4.4.2.  Sender burst control during the non-validated phase

   TCP congestion control allows a sender to accumulate a cwnd that
   would allow it to send a burst of segments with a total size up to
   the difference between the FlightSize and cwnd.  Such bursts can
   impact other flows that share a network bottleneck and/or may induce
   congestion when buffering is limited.

   Various methods have been proposed to control the sender burstiness
   [Hug01], [All05].
   For example, TCP can limit the number of new
   segments it sends per received ACK.  This is effective when a flow of
   ACKs is received, but cannot be used to control a sender that has
   not sent appreciable data in the previous RTT [All05].

   This document recommends using a method to avoid line-rate bursts
   after an idle or rate-limited interval when there is less reliable
   information about the capacity of the network path: A TCP sender in
   the non-validated phase SHOULD control the maximum burst size, e.g.,
   using a rate-based pacing algorithm in which a sender paces out the
   cwnd over its estimate of the RTT, or some other method, to prevent
   many segments being transmitted contiguously at line-rate.  The most
   appropriate method(s) to implement pacing depend on the design of the
   TCP/IP stack, speed of interface and whether hardware support (such
   as TCP Segment Offload, TSO) is used.  The present document does not
   recommend any specific method.

4.4.3.  Adjustment at the end of the non-validated phase

   An application that remains in the non-validated phase for a period
   greater than the NVP is required to adjust its congestion control
   state.  If the sender exits the non-validated phase after this
   period, it MUST update the ssthresh:

      ssthresh = max(ssthresh, 3*cwnd/4).

   (This adjustment of ssthresh ensures that the sender records that it
   has safely sustained the present rate.  The change is beneficial to
   rate-limited flows that encounter occasional congestion, and could
   otherwise suffer an unwanted additional delay in recovering the
   sending rate.)

   The sender MUST then update cwnd to be not greater than:

      cwnd = max((1/2)*cwnd, IW).

   Where IW is the appropriate TCP initial window, used by the TCP
   sender (e.g., [RFC5681]).
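
   The cwnd calculations of Sections 4.4.1 and 4.4.3 can be illustrated
   with a short, informative sketch.  It assumes byte counts held as
   integers (so the divisions round down) and a defined pipeACK value.
   The function names are hypothetical and are not part of this
   specification.  In adjust_after_nvp the cwnd is set to the stated
   upper bound, although a sender is permitted to choose a smaller
   value.

```python
def cwnd_on_loss(pipeack, loss_flight_size, mss):
    """cwnd = Max(pipeACK, LossFlightSize)/2, not below 1 MSS."""
    return max(max(pipeack, loss_flight_size) // 2, mss)


def cwnd_after_recovery(pipeack, loss_flight_size, retransmitted, mss):
    """cwnd = (Max(pipeACK, LossFlightSize) - R)/2, not below 1 MSS.

    `retransmitted` is R, the volume of data successfully retransmitted
    during the recovery phase.
    """
    return max((max(pipeack, loss_flight_size) - retransmitted) // 2, mss)


def adjust_after_nvp(cwnd, ssthresh, iw):
    """Adjustment of Section 4.4.3; returns the new (cwnd, ssthresh)."""
    # ssthresh is computed first, from the preserved (unreduced) cwnd.
    ssthresh = max(ssthresh, (3 * cwnd) // 4)
    cwnd = max(cwnd // 2, iw)
    return cwnd, ssthresh


MSS = 1460  # example segment size in bytes (an assumption)
print(cwnd_on_loss(20000, 30000, MSS))
print(cwnd_after_recovery(20000, 30000, 4000, MSS))
print(adjust_after_nvp(cwnd=40000, ssthresh=20000, iw=10 * MSS))
```

   In this example the LossFlightSize of 30000 bytes exceeds pipeACK, so
   the cwnd for loss recovery is 15000 bytes, and it becomes 13000 bytes
   once 4000 retransmitted bytes are subtracted at the end of recovery.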

Note: This adjustment ensures that the sender responds conservatively
after remaining in the non-validated phase for more than the
non-validated period. In this case, it reduces the cwnd by a factor
of two from the preserved value. This adjustment is helpful when
flows accumulate but do not use a large cwnd, and seeks to mitigate
the impact when these flows later resume transmission. This could,
for instance, mitigate the impact if multiple high-rate application
flows were to become idle over an extended period of time and then
were simultaneously awakened by an external event.

4.5. Examples of Implementation

This section provides informative examples of implementation methods.
Implementations may choose to use other methods that comply with the
normative requirements.

4.5.1. Implementing the pipeACK measurement

A pipeACK sample may be measured once each RTT. This reduces the
sender processing burden compared with calculating a sample after
each acknowledgement and also reduces storage requirements at the
sender.

Since application behaviour can be bursty using CWV, it may be
desirable to implement a maximum filter to accumulate the measured
values so that the pipeACK variable records the largest pipeACK
sample within the pipeACK Sampling Period. One simple way to
implement this is to divide the pipeACK Sampling Period into several
(e.g., 5) equal-length measurement periods. The sender then records
the start time for each measurement period and the highest measured
pipeACK sample. At the end of the measurement period, any
measurement(s) that are older than the pipeACK Sampling Period are
discarded. The pipeACK variable is then assigned the largest of the
set of the highest measured values.

+----------+----------+           +----------+---......
| Sample A | Sample B | No        | Sample C | Sample D
|          |          | Sample    |          |
| |\ 5     |          |           |          |
| | |      |          |           |   /\ 4   |
| | |      |  |\ 3    |           |   | \    |
| | \      | |  \---  |           |  /   \   |    /| 2
|/  \------|        - |           | /     \------/ \...
+----------+---------\+----/ /----+/---------+-------------> Time

<------------------------------------------------|
               Sampling Period              Current Time

Figure 1: Example of measuring pipeACK samples

Figure 1 shows an example of how measurement samples may be
collected. At the time represented by the figure, new samples are
being accumulated into Sample D. Three previous samples also fall
within the pipeACK Sampling Period: A, B, and C. There was also a
period of inactivity between Samples B and C during which no
measurements were taken. The current value of the pipeACK variable
will be 5, the maximum across all samples.

After one further measurement period, Sample A will be discarded,
since it is then older than the pipeACK Sampling Period, and the
pipeACK variable will be recalculated. Its value will be the larger
of Sample C or the final value accumulated in Sample D.

Note: The pipeACK Sampling Period and the NVP do not necessarily
require a new timer to be implemented. An alternative is to record a
timestamp when the sender enters the non-validated phase. Each time
a sender transmits a new segment, this timestamp may be used to
determine if the NVP has expired. If the period expires, the sender
may take into account how many units of the NVP have passed and make
one reduction (as defined in Section 4.4.3) for each NVP.

4.5.2. Implementing detection of the cwnd-limited condition

A sender needs to implement a method that detects the cwnd-limited
condition (see Section 4.4). This detects a condition where a sender
receives an ACK in the non-validated phase, but the size of cwnd
prevents sending more new data.
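
One possible way to express this check is sketched below. This is an
illustration only: the function name is_cwnd_limited and the slack
parameter are hypothetical and do not appear in this specification.
A slack of zero treats the sender as cwnd-limited only when
FlightSize has reached cwnd; a small positive slack (for instance,
one segment) is an assumption that lets an implementation also report
the condition when segmentation constraints stop the sender slightly
short of cwnd.

```python
def is_cwnd_limited(flight_size, cwnd, slack=0):
    """Return True when cwnd prevents the sender from sending more
    new data. All values are in bytes; 'slack' is a hypothetical
    tolerance for a cwnd that is nearly, but not completely, used."""
    return flight_size + slack >= cwnd
```

With the default slack, a sender with 9000 bytes in flight and a cwnd
of 10000 bytes is not reported as cwnd-limited; with a slack of 1460
bytes, it is.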

In simple terms, this condition is true only when the FlightSize of a
TCP sender is equal to or larger than the current cwnd. However, an
implementation also needs to consider constraints on the way in which
the cwnd variable can be used. For instance, implementations need to
support other TCP methods such as the Nagle Algorithm and TCP Segment
Offload (TSO) that also use cwnd to control transmission. These
other methods can result in a sender becoming cwnd-limited when the
cwnd is nearly, rather than completely, equal to the FlightSize.

5. Determining a safe period to preserve cwnd

This section documents the rationale for selecting the maximum period
that cwnd may be preserved, known as the non-validated period, NVP.

Limiting the period that cwnd may be preserved avoids undesirable
side effects that would result if the cwnd were to be kept
unnecessarily high for an arbitrarily long period, which was a part
of the problem that CWV originally attempted to address. The period
a sender may safely preserve the cwnd is a function of the period
that a network path is expected to sustain the capacity reflected by
cwnd. There is no ideal choice for this time.

A period of five minutes was chosen for this NVP. This is a
compromise that was larger than the idle intervals of common
applications, but not sufficiently larger than the period for which
the capacity of an Internet path may commonly be regarded as stable.
The capacity of wired networks is usually relatively stable for
periods of several minutes, and load stability increases with the
capacity. This suggests that cwnd may be preserved for at least a
few minutes.

There are cases where the TCP throughput exhibits significant
variability over a time less than five minutes.
Examples could
include wireless topologies, where the TCP rate may fluctuate on the
order of a few seconds as a consequence of medium access protocol
instabilities. Mobility changes may also impact TCP performance over
short time scales. Senders that observe such rapid changes in the
path characteristics may also experience increased congestion with
the new method; however, such variation would likely also impact
TCP's behaviour when supporting interactive and bulk applications.

Routing algorithms may modify the network path, disrupting the RTT
measurement and changing the capacity available to a TCP connection;
however, such changes do not usually occur within a time frame of a
few minutes.

The value of five minutes is therefore expected to be sufficient for
most current applications. Simulation studies (e.g., [Bis11]) also
suggest that for many practical applications, the performance using
this value will not be significantly different to that observed using
a non-standard method that does not reset the cwnd after idle.

Finally, other TCP sender mechanisms have used a 5 minute timer, and
there could be simplifications in some implementations by reusing the
same interval. TCP defines a default user timeout of 5 minutes
[RFC0793], i.e., how long transmitted data may remain unacknowledged
before a connection is forcefully closed.

6. Security Considerations

General security considerations concerning TCP congestion control are
discussed in [RFC5681]. This document describes an algorithm that
updates one aspect of the congestion control procedures, and so the
considerations described in RFC 5681 also apply to this algorithm.

7. IANA Considerations

There are no IANA considerations.

8. Acknowledgments

The authors acknowledge the contributions of Dr I Biswas and Mr Ziaul
Hossain in supporting the evaluation of CWV and for their help in
developing the mechanisms proposed in this draft. We also
acknowledge comments received from the Internet Congestion Control
Research Group, in particular Yuchung Cheng, Mirja Kuehlewind, Joe
Touch, and Mark Allman. This work was part-funded by the European
Community under its Seventh Framework Programme through the Reducing
Internet Transport Latency (RITE) project (ICT-317700).

9. Author Notes

RFC-Editor note: please remove this section prior to publication.

9.1. Other related work

RFC-Editor note: please remove this section prior to publication.

There are several issues to be discussed more widely:

o  There are potential interactions with the Experimental update in
   RFC 6928 that raises the TCP initial window to ten segments. Do
   these cases need to be elaborated?

      This relates to the Experimental specification for increasing
      the TCP IW defined in RFC 6928.

      The two methods have different functions and different
      responses to loss/congestion.

      RFC 6928 proposes an experimental update to TCP that would
      increase the IW to ten segments. This would allow faster
      opening of the cwnd, and also a large (same size) restart
      window. This approach is based on the assumption that many
      forward paths can sustain bursts of up to ten segments without
      (appreciable) loss. Such a significant increase in cwnd must
      be matched with an equally large reduction of cwnd if loss/
      congestion is detected, and such a congestion indication is
      likely to require future use of IW=10 to be disabled for this
      path for some time. This guards against the unwanted behaviour
      of a series of short flows continuously flooding a network path
      without network congestion feedback.

      In contrast, this document proposes an update with a rationale
      that relies on recent previous path history to select an
      appropriate cwnd after restart.

      The behaviour differs in three ways:

      1) For applications that send little initially, new-cwv may
      constrain more than RFC 6928, but would not require the
      connection to reset any path information when a restart
      incurred loss. In contrast, new-cwv would allow the TCP
      connection to preserve the cached cwnd; any loss would impact
      cwnd, but not impact other flows.

      2) For applications that utilise more capacity than provided by
      a cwnd of 10 segments, this method would permit a larger
      restart window compared to a restart using the method in RFC
      6928. This is justified by the recent path history.

      3) new-cwv is also intended to be used for rate-limited
      applications, where the application sends, but does not seek to
      fully utilise the cwnd. In this case, new-cwv constrains the
      cwnd to that justified by the recent path history. The
      performance trade-offs are hence different, and it would be
      possible to enable new-cwv when also using the method in RFC
      6928, and yield benefits.

o  There is potential overlap with the Laminar proposal
   (draft-mathis-tcpm-tcp-laminar)

      The current draft was intended as a standards-track update to
      TCP, rather than a new transport variant. At least, it would
      be good to understand how the two interact and whether there is
      a possibility of a single method.

o  There is potential performance loss in loss of a short burst
   (off list with M Allman)

      A sender can transmit several segments then become idle. If
      the first set of segments are all acknowledged, the ssthresh
      collapses to a small value (no new data is sent by the idle
      sender). Loss of the later data results in congestion (e.g.,
      maybe a RED drop or some other cause, rather than the maximum
      rate of this flow).
      When the sender performs loss recovery it
      may have an appreciable pipeACK and cwnd, but a very low
      FlightSize - the standard algorithm therefore results in an
      unusually low cwnd ((1/2)*FlightSize).

      A constant rate flow would have maintained a FlightSize
      appropriate to pipeACK (cwnd, if it is a bulk flow).

      Could this be fixed by adding a new state variable? It could
      also be argued this is a corner case (e.g., loss of only the
      last segments would have resulted in RTO), but the impact could
      be significant.

o  There is potential interaction with TCP Control Block Sharing
   (M Welzl)

      An application that is non-validated can accumulate a cwnd that
      is larger than the actual capacity. Is this a fair value to
      use in TCB sharing?

      We propose that TCB sharing should use the pipeACK in place of
      cwnd when a TCP sender is in the non-validated phase. This
      value better reflects the capacity that the flow has utilised
      in the network path.

10. Revision notes

RFC-Editor note: please remove this section prior to publication.

Draft 03 was submitted to ICCRG to receive comments and feedback.

Draft 04 contained the first set of clarifications after feedback:

o  Changed name to application limited and used the term rate-limited
   in all places.

o  Added justification and many minor changes suggested on the list.

o  Added text to tie-in with more accurate ECN marking.

o  Added ref to Hug01

Draft 05 contained various updates:

o  New text to redefine how to measure the acknowledged pipe,
   differentiating this from the FlightSize, and hence avoiding
   previous issues with infrequent large bursts of data not being
   validated. A key new feature is that pipeACK only triggers
   leaving the NVP after the size of the pipe has been acknowledged.
   This removed the need for hysteresis.

o  Reduction values were changed to 1/2, following analysis of
   suggestions from ICCRG.
   This also sets the "target" cwnd as twice
   the used rate for the non-validated case.

o  Introduced a symbolic name (NVP) to denote the 5 minute period.

Draft 06 contained various updates:

o  Required reset of pipeACK after congestion.

o  Added comment on the effect of congestion after a short burst (M.
   Allman).

o  Correction of minor typos.

WG draft 00 contained various updates:

o  Updated initialisation of pipeACK to maximum value.

o  Added note on intended status still to be determined.

WG draft 01 contained:

o  Added corrections from Richard Scheffenegger.

o  Raffaello Secchi added to the mechanism, based on implementation
   experience.

o  Removed the requirement for the method to use the TCP SACK option.

o  Although it may be desirable to use SACK, this is not essential to
   the algorithm.

o  Added the notion of the sampling period to accommodate large rate
   variations and ensure that the method is stable. This algorithm
   is to be validated through implementation.

WG draft 02 contained:

o  Clarified language around pipeACK variable and pipeACK sample -
   Feedback from Aris Angelogiannopoulos.

WG draft 03 contained:

o  Editorial corrections - Feedback from Anna Brunstrom.

o  An adjustment to the procedure at the start and end of loss
   recovery to align the two equations.

o  Further clarification of the "undefined" value of the pipeACK
   variable.

WG draft 04 contained:

o  Editorial corrections.

o  Introduced the "cwnd-limited" term.

o  An adjustment to the procedure at the start of a cwnd-limited
   phase - the new text is intended to ensure that new-cwv is not
   unnecessarily more conservative than standard TCP when the flow is
   cwnd-limited. This resolves two issues: first, it prevents
   pathologies in which pipeACK increases slowly and erratically.
   It also ensures that performance of bulk applications is not
   significantly impacted when using the method.

o  Clearly identifies that pacing (or equivalent) is required during
   the NVP to control burstiness. New section added.

WG draft 05 contained:

o  Clarification to first two bullets in Section 4.4 describing
   cwnd-limited, to explain these are really alternatives for the
   same case.

o  Section giving implementation examples was restructured to clarify
   there are two methods described.

o  Cross References to sections updated - thanks to comments from
   Martin Winbjoerk and Tim Wicinski.

WG draft 06 contained:

o  The section giving implementation examples was restructured to
   clarify there are two methods described.

o  Justification of design decisions.

o  Re-organised text to improve clarity of argument.

WG draft 07 contained:

o  Updated publication date.

o  Text noting that cwnd shouldn't ever be made negative.

o  Updated text on ECN to clarify the process where R is a reduction
   based on ECN marks.

WG draft 08 contained:

o  Removed description of how to use Accurate ECN feedback. It is
   not clear that this document should specify a usage of a mechanism
   that has not been fully defined. Accurate ECN may lead to
   different congestion responses and these will need to be defined
   in the CC specifications for using Accurate ECN.

WG draft 09 contained:

o  Removed update to RFC 5681 - the status of the present document is
   Experimental, and hence this document does not update RFC 5681.

WG draft 10 contained edits following WGLC:

o  Section 1.1 Implementation of new CWV: New section added to
   introduce the places where there is implementation flexibility.

o  Section 4.4: Clarified that the MUST is to satisfy the goal to
   avoid a TCP sender growing a large "non-validated" cwnd, when it
   has not recently sent using the current size of cwnd, and fixed
   format of bullet 2 in 4.4.

o  Section 4.5.2: rewritten section text.

11. References

11.1. Normative References

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
           793, September 1981.

[RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
           Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2861]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion
           Window Validation", RFC 2861, June 2000.

[RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
           Control", RFC 5681, September 2009.

[RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
           and Y. Nishida, "A Conservative Loss Recovery Algorithm
           Based on Selective Acknowledgment (SACK) for TCP", RFC
           6675, August 2012.

11.2. Informative References

[All05]    Allman, M. and E. Blanton, "Notes on burst mitigation for
           transport protocols", March 2005.

[Bis08]    Biswas, I. and G. Fairhurst, "A Practical Evaluation of
           Congestion Window Validation Behaviour", 9th Annual
           Postgraduate Symposium in the Convergence of
           Telecommunications, Networking and Broadcasting (PGNet),
           Liverpool, UK, June 2008.

[Bis10]    Biswas, I., Sathiaseelan, A., Secchi, R., and G.
           Fairhurst, "Analysing TCP for Bursty Traffic", Int'l J. of
           Communications, Network and System Sciences, 7(3), June
           2010.

[Bis11]    Biswas, I., "Internet congestion control for variable rate
           TCP traffic", PhD Thesis, School of Engineering,
           University of Aberdeen, June 2011.

[Fai12]    Sathiaseelan, A., Secchi, R., Fairhurst, G., and I.
           Biswas, "Enhancing TCP Performance to support
           Variable-Rate Traffic", 2nd Capacity Sharing Workshop, ACM
           CoNEXT, Nice, France, 10th December 2012.

[Hug01]    Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
           Slow-Start Restart After Idle (Work-in-Progress)",
           December 2001.

[Liu07]    Liu, D., Allman, M., Jin, S., and L. Wang, "Congestion
           Control without a Startup Phase", 5th International
           Workshop on Protocols for Fast Long-Distance Networks
           (PFLDnet), Los Angeles, California, USA, February 2007.

[RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
           Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
           Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
           "Computing TCP's Retransmission Timer", RFC 6298, June
           2011.

Authors' Addresses

Godred Fairhurst
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: gorry@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk

Arjuna Sathiaseelan
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: arjuna@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk

Raffaello Secchi
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: raffaello@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk