idnits 2.17.1 draft-ietf-tcpm-newcwv-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document obsoletes RFC2861, but the abstract doesn't seem to directly say this. It does mention RFC2861 though, so this could be OK. -- The draft header indicates that this document updates RFC5681, but the abstract doesn't seem to directly say this. It does mention RFC5681 though, so this could be OK. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC5681, updated by this document, for RFC5378 checks: 2006-01-26) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 23, 2014) is 3680 days in the past. Is this intentional? 
Checking references for intended status: Experimental ---------------------------------------------------------------------------- ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Obsolete normative reference: RFC 2861 (Obsoleted by RFC 7661) -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 TCPM Working Group G. Fairhurst 3 Internet-Draft A. Sathiaseelan 4 Obsoletes: 2861 (if approved) R. Secchi 5 Updates: 5681 (if approved) University of Aberdeen 6 Intended status: Experimental March 23, 2014 7 Expires: September 24, 2014 9 Updating TCP to support Rate-Limited Traffic 10 draft-ietf-tcpm-newcwv-06 12 Abstract 14 This document proposes an update to RFC 5681 to address issues that 15 arise when TCP is used to support traffic that exhibits periods where 16 the sending rate is limited by the application rather than the 17 congestion window. It provides an experimental update to TCP that 18 allows a TCP sender to restart quickly following a rate- 19 limited interval. This method is expected to benefit applications 20 that send rate-limited traffic using TCP, while also providing an 21 appropriate response if congestion is experienced. 23 It also evaluates the Experimental specification of TCP Congestion 24 Window Validation, CWV, defined in RFC 2861, and concludes that RFC 25 2861 sought to address important issues, but failed to deliver a 26 widely used solution. This document therefore recommends that the 27 status of RFC 2861 is moved from Experimental to Historic, and that 28 it is replaced by the current specification. 
30 Status of This Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at http://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on September 24, 2014. 47 Copyright Notice 49 Copyright (c) 2014 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document. Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 65 1.1. Standards Status of this Document . . . . . . . . . . . . 4 66 2. Reviewing experience with TCP-CWV . . . . . . . . . . . . . . 5 67 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 68 4.1. Initialisation . . . . . . . . . . . . . . . . . . . . . 8 69 4.2. Estimating the validated capacity supported by a path . . 8 70 4.3. Preserving cwnd during a rate-limited period. . . . . . . 9 71 4.4. TCP congestion control during the non-validated phase . . 9 72 4.4.1. 
Response to congestion in the non-validated phase . . 11 73 4.4.2. Sender burst control during the non-validated phase . 12 74 4.4.3. Adjustment at the end of the non-validated phase . . 13 75 4.5. Examples of Implementation . . . . . . . . . . . . . . . 13 76 4.5.1. Implementing the pipeACK measurement . . . . . . . . 13 77 4.5.2. Implementing detection of the cwnd-limited condition 15 78 5. Determining a safe period to preserve cwnd . . . . . . . . . 15 79 6. Security Considerations . . . . . . . . . . . . . . . . . . . 16 80 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 81 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 16 82 9. Author Notes . . . . . . . . . . . . . . . . . . . . . . . . 16 83 9.1. Other related work . . . . . . . . . . . . . . . . . . . 16 84 9.2. Revision notes . . . . . . . . . . . . . . . . . . . . . 19 85 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 86 10.1. Normative References . . . . . . . . . . . . . . . . . . 21 87 10.2. Informative References . . . . . . . . . . . . . . . . . 22 88 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 90 1. Introduction 92 TCP is used to support a range of application behaviours. The TCP 93 congestion window (cwnd) controls the number of unacknowledged 94 packets/bytes that a TCP flow may have in the network at any time, a 95 value known as the FlightSize [RFC5681]. A bulk application will 96 always have data available to transmit. The rate at which it sends 97 is therefore limited by the maximum permitted by the receiver 98 advertised window and the sender congestion window (cwnd). In 99 contrast, a rate-limited application will experience periods when the 100 sender is either idle or is unable to send at the maximum rate 101 permitted by the cwnd. The update in this document targets the 102 operation of TCP in such rate-limited cases. 
104 Standard TCP [RFC5681] states that a TCP sender SHOULD set cwnd to no 105 more than the Restart Window (RW) before beginning transmission, if 106 the TCP sender has not sent data in an interval exceeding the 107 retransmission timeout, i.e., when an application becomes idle. 108 [RFC2861] noted that this TCP behaviour was not always observed in 109 current implementations. Experiments [Bis08] confirm this to still 110 be the case. 112 CWV introduced the terminology of "application limited periods". 113 This document describes any time that an application limits the 114 sending rate, rather than being limited by the transport, as "rate- 115 limited". This update improves support for applications that vary 116 their transmission rate, either with (short) idle periods between 117 transmission or by changing the rate the application sends. These 118 applications are characterised by the TCP FlightSize often being less 119 than cwnd. Many Internet applications exhibit this behaviour, 120 including web browsing, http-based adaptive streaming, applications 121 that support query/response type protocols, network file sharing, and 122 live video transmission. Many such applications currently avoid 123 using long-lived (persistent) TCP connections (e.g. [RFC2616] servers 124 typically support persistent HTTP connections, but short server 125 timeouts often prevent using them). Such applications often instead 126 either use a succession of short TCP transfers or use UDP. 128 Standard TCP does not impose additional restrictions on the growth of 129 the congestion window when a TCP sender is unable to send at the 130 maximum rate allowed by the cwnd. In this case the rate-limited 131 sender may grow a cwnd far beyond that corresponding to the current 132 transmit rate, resulting in a value that does not reflect current 133 information about the state of the network path the flow is using. 
134 Use of such an invalid cwnd may result in reduced application 135 performance and/or could significantly contribute to network 136 congestion. 138 [RFC2861] proposed a solution to these issues in an experimental 139 method known as Congestion Window Validation (CWV). CWV was intended 140 to help reduce cases where TCP accumulated an invalid cwnd. The use 141 and drawbacks of using the CWV algorithm in RFC 2861 with an 142 application are discussed in Section 2. 144 Section 3 defines relevant terminology. 146 Section 4 specifies an alternative to CWV that seeks to address the 147 same issues, but does this in a way that is expected to mitigate the 148 impact on an application that varies its sending rate. The updated 149 method applies to the rate-limited conditions (including both an 150 application-limited and idle sender). 152 The goals of this update are: 154 o To not change the behaviour of a TCP sender that performs bulk 155 transfers that consume the cwnd. 157 o To provide a method that co-exists with Standard TCP and other 158 flows that use this updated method. 160 o To reduce transfer latency for applications that change their rate 161 over short intervals of time. 163 o To avoid a TCP sender growing a large "non-validated" cwnd, when 164 it has not recently sent using this cwnd. 166 o To remove the incentive for ad-hoc application or network stack 167 methods (such as "padding") solely to maintain a large cwnd for 168 future transmission. 170 o To incentivise the use of long-lived connections, rather than a 171 succession of short-lived flows, benefiting both flows and network 172 when actual congestion is encountered. 174 Section 5 describes the rationale for selecting the safe period to 175 preserve the cwnd. 177 1.1. Standards Status of this Document 179 This document was produced by the TCP Maintenance and Minor 180 Extensions (tcpm) working group. 182 The document updates and obsoletes the methods described in 183 [RFC2861]. 
It recommends a set of mechanisms, including the use of 184 pacing during a non-validated period. The updated mechanisms are 185 intended to have a less aggressive congestion impact than would be 186 exhibited by a standard TCP sender. 188 The specification in this draft is classified as "Experimental" 189 pending experience with deployed implementations of the methods. 191 2. Reviewing experience with TCP-CWV 193 [RFC2861] described a simple modification to the TCP congestion 194 control algorithm that decayed the cwnd after the transition to a 195 "sufficiently-long" idle period. This used the slow-start threshold 196 (ssthresh) to save information about the previous value of the 197 congestion window. The approach relaxed the standard TCP behaviour 198 [RFC5681] for an idle session, intended to improve application 199 performance. CWV also modified the behaviour where a sender 200 transmitted at a rate less than allowed by cwnd. 202 [RFC2861] proposed two sets of responses, one after an "application- 203 limited" period and one after an "idle" period. Although this distinction 204 was argued, in practice differentiating the two conditions was found 205 problematic in actual networks (e.g. [Bis10]). While this offers 206 predictable performance for long on-off periods (>>1 RTT), or slowly 207 varying rate-based traffic, the performance could be unpredictable 208 for variable-rate traffic and depended both upon whether an accurate 209 RTT had been obtained and the pattern of application traffic relative 210 to the measured RTT. 212 Many applications can and often do vary their transmission over a 213 wide range of rates. Using [RFC2861] such applications often 214 experienced varying performance, which made it hard for application 215 developers to predict the TCP latency even when using a path with 216 stable network characteristics. 
We argue that an attempt to classify 217 application behaviour as application-limited or idle is problematic 218 and also inappropriate. 
This document therefore explicitly avoids 219 trying to differentiate these two cases, instead treating all rate- 220 limited traffic uniformly. 222 [RFC2861] has been implemented in some mainstream operating systems 223 as the default behaviour [Bis08]. Analysis (e.g. [Bis10] [Fai12]) 224 has shown that a TCP sender using CWV is able to use available 225 capacity on a shared path after an idle period. This can benefit 226 variable-rate applications, especially over long delay paths, when 227 compared to the slow-start restart specified by standard TCP. 228 However, CWV would only benefit an application if the idle period 229 were less than several Retransmission Time Out (RTO) intervals 230 [RFC6298], since the behaviour would otherwise be the same as for 231 standard TCP, which resets the cwnd to the TCP Restart Window after 232 this period. 234 To enable better performance for variable-rate applications with TCP, 235 some operating systems have chosen to support non-standard methods, 236 or applications have resorted to "padding" streams to maintain their 237 sending rate when they have no data to transmit. Although 238 transmitting redundant data across a network path provides good 239 evidence that the path can sustain data at the offered rate, padding 240 also consumes network capacity and reduces the opportunity for 241 congestion-free statistical multiplexing. For variable-rate flows, 242 the benefits of statistical multiplexing can be significant and it is 243 therefore a goal to find a viable alternative to padding streams. 245 Experience with [RFC2861] suggests that although the CWV method 246 benefited the network in a rate-limited scenario (reducing the 247 probability of network congestion), the behaviour was too 248 conservative for many common rate-limited applications. 
This 249 mechanism did not therefore offer the desirable increase in 250 application performance for rate-limited applications and it is 251 unclear whether applications actually use this mechanism in the 252 general Internet. 254 It is therefore concluded that CWV, as defined in [RFC2861], was 255 often a poor solution for many rate-limited applications. It had the 256 correct motivation, but had the wrong approach to solving this 257 problem. 259 3. Terminology 261 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 262 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 263 document are to be interpreted as described in [RFC2119]. 265 The document assumes familiarity with the terminology of TCP 266 congestion control [RFC5681]. 268 The following terminology is used in this document: 270 cwnd-limited: A TCP flow that has sent the maximum number of segments 271 permitted by the cwnd, where the application utilises the allowed 272 sending rate (see Section 4.5.2). 274 pipeACK sample: A measure of the volume of data acknowledged by the 275 network within an RTT. 277 pipeACK variable: A variable that measures the available capacity 278 using the set of pipeACK samples. 280 pipeACK Sampling Period: The maximum period that a measured pipeACK 281 sample may influence the pipeACK variable. 283 Non-validated phase: The phase where the cwnd reflects a previous 284 measurement of the available path capacity. 286 Non-validated period, NVP: The maximum period for which cwnd is 287 preserved in the non-validated phase. 289 Rate-limited: A TCP flow that does not consume more than one half of 290 cwnd, and hence operates in the non-validated phase. This includes 291 periods when an application is either idle or chooses to send at a 292 rate less than the maximum permitted by the cwnd. 294 Validated phase: The phase where the cwnd reflects a current estimate 295 of the available path capacity. 297 4. 
A New Congestion Window Validation method 299 This section proposes an update to the TCP congestion control 300 behaviour during a rate-limited interval. This new method 301 intentionally does not differentiate between times when the sender 302 has become idle and times when it chooses to send at a rate less than the maximum 303 allowed by the cwnd. 305 The period where actual usage is less than allowed by cwnd is named 306 the non-validated phase. The update allows an application in the 307 non-validated phase to resume transmission at a previous rate without 308 incurring the delay of slow-start. However, if the TCP sender 309 experiences congestion using the preserved cwnd, it is required to 310 immediately reset the cwnd to an appropriate value specified by the 311 method. If a sender does not take advantage of the preserved cwnd 312 within the NVP, the value of cwnd is reduced, ensuring the value 313 better reflects the capacity that was recently actually used. 315 It is expected that this update will satisfy the requirements of many 316 rate-limited applications and at the same time provide an appropriate 317 method for use in the Internet. Some applications use dummy packets 318 (aka "padding") to maintain a sending rate when an application has 319 no data to send. Although this ensures the path continues to 320 support the rate permitted by the cwnd, it wastes network capacity 321 sending useless data. New-CWV reduces this incentive for an 322 application to send data simply to keep transport congestion state. 324 The method is specified in the following subsections and is expected to 325 encourage applications and TCP stacks to use standards-based 326 congestion control methods. It may also encourage the use of long- 327 lived connections where this offers benefit (such as persistent 328 http). 330 4.1. 
Initialisation 332 A sender starts a TCP connection in the validated phase and 333 initialises the pipeACK variable to the "undefined" value. 
This 334 value inhibits use of the value in cwnd calculations. 336 4.2. Estimating the validated capacity supported by a path 338 [RFC6675] defines a variable, FlightSize, that indicates the 339 instantaneous amount of data that has been sent, but not cumulatively 340 acknowledged. In this method a new variable "pipeACK" is introduced 341 to measure the acknowledged size of the network pipe. This is used 342 to determine if the sender has validated the cwnd. pipeACK differs 343 from FlightSize in that it is evaluated over a window of acknowledged 344 data, rather than reflecting the amount of data outstanding. 346 A sender determines a pipeACK sample by measuring the volume of data 347 that was acknowledged by the network over the period of a measured 348 Round Trip Time (RTT). Using the variables defined in [RFC6675], a 349 value could be measured by caching the value of HighACK and after one 350 RTT measuring the difference between the cached HighACK value and the 351 current HighACK value. Other equivalent methods may be used. 353 A sender is not required to continuously update the pipeACK variable 354 after each received ACK, but SHOULD perform a pipeACK sample at least 355 once per RTT when it has sent unacknowledged segments. 357 The pipeACK variable MAY consider multiple pipeACK samples over the 358 pipeACK Sampling Period. The value of the pipeACK variable MUST NOT 359 exceed the maximum (highest value) within the sampling period. This 360 specification defines the pipeACK Sampling Period as Max(3*RTT, 1 361 second). This period enables a sender to compensate for large 362 fluctuations in the sending rate, where there may be pauses in 363 transmission, and allows the pipeACK variable to reflect the largest 364 recently measured pipeACK sample. 366 When no measurements are available, the pipeACK variable is set to 367 the "undefined value". 
This value is used to inhibit entering the 368 non-validated phase until the first new measurement of a pipeACK 369 sample. 371 The pipeACK variable MUST NOT be updated during TCP Fast Recovery. 372 That is, the sender stops collecting pipeACK samples during loss 373 recovery. The method RECOMMENDS that the TCP SACK option [RFC2018] 374 is enabled and the method defined in [RFC6675] is used to recover 375 missing segments. This allows the sender to more accurately 376 determine the number of missing bytes during the loss recovery phase, 377 and using this method will result in a more appropriate cwnd 378 following loss. 380 4.3. Preserving cwnd during a rate-limited period. 382 The updated method creates a new TCP sender phase that captures 383 whether the cwnd reflects a validated or non-validated value. The 384 phases are defined as: 386 o Validated phase: pipeACK >=(1/2)*cwnd, or pipeACK is undefined. 387 This is the normal phase, where cwnd is expected to be an 388 approximate indication of the capacity currently available along 389 the network path, and the standard methods are used to increase 390 cwnd (currently [RFC5681]). 392 o Non-validated phase: pipeACK <(1/2)*cwnd. This is the phase where 393 the cwnd has a value based on a previous measurement of the 394 available capacity, and the usage of this capacity has not been 395 validated in the pipeACK Sampling Period. That is, when it is not 396 known whether the cwnd reflects the currently available capacity 397 along the network path. The mechanisms to be used in this phase 398 seek to determine a safe value for cwnd and an appropriate 399 reaction to congestion. 401 Note: A threshold is needed to determine whether a sender is in the 402 validated or non-validated phase. We start by noting that a standard 403 TCP sender in slow-start is permitted to double its FlightSize from 404 one RTT to the next. This motivated the choice of a threshold value 405 of 1/2. 
This threshold ensures a sender does not further increase 406 the cwnd as long as the FlightSize is less than (1/2*cwnd). 407 Furthermore, a sender with a FlightSize less than (1/2*cwnd) may in 408 the next RTT be permitted by the cwnd to send at a rate that more 409 than doubles the FlightSize, and hence this case needs to be regarded 410 as non-validated and a sender therefore needs to employ additional 411 mechanisms while in this phase. 413 4.4. TCP congestion control during the non-validated phase 415 A TCP sender MUST enter the non-validated phase when the pipeACK is 416 less than (1/2)*cwnd. 418 A TCP sender that enters the non-validated phase SHOULD preserve the 419 cwnd (i.e., this neither grows nor reduces while the sender remains 420 in this phase). If the sender receives an indication of congestion 421 (loss or Explicit Congestion Notification, ECN, mark [RFC3168]) it 422 uses the method described below. The phase is concluded after a 423 fixed period of time (the NVP, as explained in Section 4.4.3) or when 424 the sender transmits sufficient data so that pipeACK > (1/2)*cwnd 425 (i.e. the sender is no longer rate-limited). 427 The behaviour in the non-validated phase is specified as: 429 o A sender determines whether to increase the cwnd based upon 430 whether it is cwnd-limited (see Section 4.5.2): 434 * A sender that is cwnd-limited MAY use the standard TCP method 435 to increase cwnd (i.e. a TCP sender that fully utilises the 436 cwnd is permitted to increase cwnd for each received ACK using 437 standard methods). 439 * A sender that is not cwnd-limited MUST NOT increase the cwnd 440 when ACK packets are received in this phase. 442 o If the sender receives an indication of congestion while in the 443 non-validated phase (i.e., detects loss, or an ECN mark), the 444 sender MUST exit the non-validated phase (reducing the cwnd as 445 defined in Section 4.4.1). 
447 o If the Retransmission Time Out (RTO) expires while in the non- 448 validated phase, the sender MUST exit the non-validated phase. It 449 then resumes using the standard TCP RTO mechanism [RFC5681]. 451 o A sender with a pipeACK variable greater than (1/2)*cwnd SHOULD 452 enter the validated phase. (A rate-limited sender will not 453 normally be impacted by whether it is in a validated or non- 454 validated phase, since it will normally not consume the entire 455 cwnd. However a change to the validated phase will release the 456 sender from constraints on the growth of cwnd, and restore the use 457 of the standard congestion response.) 459 The cwnd-limited behaviour may be triggered during a transient 460 condition that occurs when a sender is in the non-validated phase and 461 receives an ACK that acknowledges received data, the cwnd was fully 462 utilised, and more data is awaiting transmission than may be sent 463 with the current cwnd. The sender is then allowed to use the 464 standard method to increase the cwnd. (Note, if the sender succeeds 465 in sending these new segments, the updated cwnd and pipeACK variables 466 will eventually result in a transition to the validated phase.) 468 4.4.1. Response to congestion in the non-validated phase 470 Reception of congestion feedback while in the non-validated phase is 471 interpreted as an indication that it was inappropriate for the sender 472 to use the preserved cwnd. The sender is therefore required to 473 quickly reduce the rate to avoid further congestion. Since the cwnd 474 does not have a validated value, a new cwnd value must be selected 475 based on the utilised rate. 477 A sender that detects a packet-drop, or receives an indication of an 478 ECN marked packet, MUST record the current FlightSize in the variable 479 LossFlightSize and MUST calculate a safe cwnd for loss recovery using 480 the method below: 482 cwnd = (Max(pipeACK,LossFlightSize))/2. 
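The phase test of Section 4.3 and the safe cwnd calculation above can be sketched as follows. This is a minimal illustration rather than a normative implementation: the function names `is_non_validated` and `safe_recovery_cwnd` are hypothetical, all quantities are assumed to be byte counts, an undefined pipeACK is represented by `None`, and integer division is assumed for the halving.

```python
from typing import Optional


def is_non_validated(pipe_ack: Optional[int], cwnd: int) -> bool:
    """Return True when the sender is in the non-validated phase.

    The phase rule is pipeACK < (1/2)*cwnd. An undefined pipeACK
    (modelled here as None, an assumption of this sketch) always
    counts as the validated phase.
    """
    if pipe_ack is None:
        return False
    return 2 * pipe_ack < cwnd


def safe_recovery_cwnd(pipe_ack: int, loss_flight_size: int) -> int:
    """cwnd = (Max(pipeACK, LossFlightSize))/2 on detecting congestion.

    loss_flight_size is the FlightSize recorded when the loss or ECN
    mark was detected. Integer division is an assumption of this sketch.
    """
    return max(pipe_ack, loss_flight_size) // 2
```

For example, with cwnd = 10000 bytes and pipeACK = 4000 bytes the sender is in the non-validated phase, and a loss detected with 3000 bytes in flight would yield a recovery cwnd of 2000 bytes.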
484 The pipeACK value is not updated during loss recovery (Section 4.2). If 485 there is a valid pipeACK value, the new cwnd is adjusted to reflect 486 that a non-validated cwnd may be larger than the actual FlightSize, 487 or recently used FlightSize (recorded in pipeACK). The updated cwnd 488 therefore prevents overshoot by a sender significantly increasing its 489 transmission rate during the recovery period. 491 At the end of the recovery phase, the TCP sender MUST reset the cwnd 492 using the method below: 494 cwnd = (Max(pipeACK,LossFlightSize) - R)/2. 496 Where R is the volume of data that was retransmitted during the 497 recovery phase. 499 If the sender implements a method that allows it to identify the 500 number of ECN-marked segments within a window that were observed by 501 the receiver, the sender SHOULD use the method above, further 502 reducing R by the number of marked segments. 504 After completing the loss recovery phase, the sender MUST re- 505 initialise the pipeACK variable to the "undefined" value. This 506 ensures that standard TCP methods are used immediately after 507 completing loss recovery until a new pipeACK value can be determined. 509 ssthresh is adjusted using the standard TCP method. 511 Note: The adjustment by reducing cwnd by the volume of data not sent 512 (R) follows the method proposed for Jump Start [Liu07]. The 513 inclusion of the term R makes the adjustment more conservative than 514 standard TCP. This is required, since a sender in the non-validated 515 state may increase the rate more than a standard TCP would have done 516 relative to what was sent in the last RTT (i.e., more than doubled 517 the number of segments in flight relative to what it sent in the last 518 RTT). The additional reduction after congestion is beneficial when 519 the LossFlightSize has significantly overshot the available path 520 capacity incurring significant loss (e.g. 
following a change of path 521 characteristics or when additional traffic has taken a larger share 522 of the network bottleneck during a period when the sender transmits 523 less). 525 Note: The pipeACK value is only valid during a non-validated phase, 526 and therefore does not exceed cwnd/2. If LossFlightSize and R were 527 small, then this can result in the final cwnd after loss recovery 528 being not more than 1/4 of the cwnd on detection of congestion. This 529 reduction is conservative compared to standard TCP. pipeACK is reset 530 to undefined after completing loss recovery. Subsequent updates to 531 cwnd do not therefore reflect pipeACK history before any congestion 532 event. 534 4.4.2. Sender burst control during the non-validated phase 536 TCP congestion control allows a sender to accumulate a cwnd that 537 would allow it to send a burst of segments with a total size up to 538 the difference between the FlightSize and cwnd. Such bursts can 539 impact other flows that share a network bottleneck and/or may induce 540 congestion when buffering is limited. 542 Various methods have been proposed to control the sender burstiness 543 [Hug01], [All05]. For example, TCP can limit the number of new 544 segments it sends per received ACK. This is effective when a flow of 545 ACKs is received, but cannot be used to control a sender that has 546 not sent appreciable data in the previous RTT [All05]. 548 This document recommends using a method to avoid line-rate bursts 549 after an idle or rate-limited interval when there is less reliable 550 information about the capacity of the network path: A TCP sender in 551 the non-validated phase SHOULD control the maximum burst size, e.g. 552 using a rate-based pacing algorithm in which a sender paces out the 553 cwnd over its estimate of the RTT, or some other method, to prevent 554 many segments being transmitted contiguously at line-rate. 
The most 555 appropriate method(s) to implement pacing depend on the design of the 556 TCP/IP stack, speed of interface and whether hardware support (such 557 as TCP Segment Offload, TSO) is used. The present document does not 558 recommend any specific method. 560 4.4.3. Adjustment at the end of the non-validated phase 562 An application that remains in the non-validated phase for a period 563 greater than the NVP is required to adjust its congestion control 564 state. If the sender exits the non-validated phase after this 565 period, it MUST update the ssthresh: 567 ssthresh = max(ssthresh, 3*cwnd/4). 569 (This adjustment of ssthresh ensures that the sender records that it 570 has safely sustained the present rate. The change is beneficial to 571 rate-limited flows that encounter occasional congestion, and could 572 otherwise suffer an unwanted additional delay in recovering the 573 sending rate.) 575 The sender MUST then update cwnd to be not greater than: 577 cwnd = max((1/2)*cwnd, IW). 579 Where IW is the appropriate TCP initial window, used by the TCP 580 sender (e.g. [RFC5681]). 582 Note: This adjustment ensures that the sender responds conservatively 583 after remaining in the non-validated phase for more than the non- 584 validated period. In this case, it reduces the cwnd by a factor of 585 two from the preserved value. This adjustment is helpful when flows 586 accumulate but do not use a large cwnd, and seeks to mitigate the 587 impact when these flows later resume transmission. This could for 588 instance mitigate the impact if multiple high-rate application flows 589 were to become idle over an extended period of time and then were 590 simultaneously awakened by some external event. 592 4.5. Examples of Implementation 594 This section provides informative examples of implementation methods. 595 Implementations may choose to use other methods that comply with the 596 normative requirements. 598 4.5.1. 
Implementing the pipeACK measurement 600 A pipeACK sample may be measured once each RTT. This reduces the 601 sender processing burden for calculating after each acknowledgement 602 and also reduces storage requirements at the sender. 604 Since application behaviour can be bursty using CWV, it may be 605 desirable to implement a maximum filter to accumulate the measured 606 values so that the pipeACK variable records the largest pipeACK 607 sample within the pipeACK Sampling Period. One simple way to 608 implement this is to divide the pipeACK Sampling Period into several 609 (e.g. 5) equal length measurement periods. The sender then records 610 the start time for each measurement period and the highest measured 611 pipeACK sample. At the end of the measurement period, any 612 measurement(s) that are older than the pipeACK Sampling Period are 613 discarded. The pipeACK variable is then assigned the largest of the 614 set of the highest measured values.
616 +----------+----------+           +----------+---......
617 | Sample A | Sample B | No        | Sample C | Sample D
618 |          |          | Sample    |          |
619 | |\ 5     |          |           |          |
620 | | |      |          |           |   /\ 4   |
621 | | |      |  |\ 3    |           |   | \    |
622 | |  \     | |  \---  |           |   /  \   |    /| 2
623 |/    \------|       - |           |  /    \------/  \...
624 +----------+---------\+----/ /----+/---------+-------------> Time
626 <------------------------------------------------|
627                Sampling Period              Current Time
629 Figure 1: Example of measuring pipeACK samples 631 Figure 1 shows an example of how measurement samples may be 632 collected. At the time represented by the figure new samples are 633 being accumulated into sample D. Three previous samples also fall 634 within the pipeACK Sampling Period: A, B, and C. There was also a 635 period of inactivity between samples B and C during which no 636 measurements were taken. The current value of the pipeACK variable 637 will be 5, the maximum across all samples.
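The maximum filter described above can be illustrated with a short, informal Python sketch. It is not part of the specification. The class name PipeAckFilter, its method names, the use of a deque, and the choice to age a measurement period by its start time are all assumptions made for illustration; None stands in for the "undefined" pipeACK value.

```python
from collections import deque


class PipeAckFilter:
    """Illustrative maximum filter over the pipeACK Sampling Period.

    Hypothetical helper: the name and interface are assumptions made for
    this sketch and are not defined by the specification.
    """

    def __init__(self, sampling_period, num_periods=5):
        self.sampling_period = sampling_period
        # Length of one measurement period (Sampling Period / e.g. 5).
        self.period_length = sampling_period / num_periods
        # Each entry is [start_time, highest_sample_in_that_period].
        self.periods = deque()

    def add_sample(self, now, sample):
        """Record one pipeACK sample (e.g. measured once per RTT)."""
        if not self.periods or now - self.periods[-1][0] >= self.period_length:
            # Start a new measurement period and record its start time.
            self.periods.append([now, sample])
        else:
            # Keep only the highest sample within the current period.
            self.periods[-1][1] = max(self.periods[-1][1], sample)
        self._expire(now)

    def _expire(self, now):
        # Discard measurements older than the pipeACK Sampling Period
        # (assumption: age is taken from the period's start time).
        while self.periods and now - self.periods[0][0] > self.sampling_period:
            self.periods.popleft()

    def value(self, now):
        """Return the pipeACK variable, or None if it is undefined."""
        self._expire(now)
        if not self.periods:
            return None
        return max(highest for _, highest in self.periods)
```

With a Sampling Period of 50 time units and samples mirroring Figure 1 (A=5 at t=0, B=3 at t=10, C=4 at t=30, D=2 at t=40), value(45) yields 5. Once Sample A has aged out, value(55) yields 4.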
639 After one further measurement period, Sample A will be discarded, 640 since it then is older than the pipeACK Sampling Period, and the 641 pipeACK variable will be recalculated. Its value will be the larger 642 of Sample C or the final value accumulated in Sample D. 644 Note that the pipeACK Sampling Period and the NVP period do not 645 necessarily require a new timer to be implemented. An alternative is 646 to record a timestamp when the sender enters the NVP. Each time a 647 sender transmits a new segment, this timestamp may be used to 648 determine if the NVP period has expired. If the period expires, the 649 sender may take into account how many units of the NVP period have 650 passed and make one reduction (as defined in Section 4.4.3) for each 651 NVP period. 653 4.5.2. Implementing detection of the cwnd-limited condition 655 A method is required to detect the cwnd-limited condition (see 656 Section 4.4). This is used to detect a condition where a sender in 657 the non-validated phase receives an ACK, but the size of cwnd 658 prevents sending more new data. 660 In simple terms this condition is true only when the TCP sender's 661 FlightSize is equal to or larger than the cwnd. However, an 662 implementation must consider other constraints on the way in which 663 the cwnd variable is used, for instance the need to support methods such 664 as the Nagle Algorithm and TCP Segment Offload (TSO). This can 665 result in a sender becoming cwnd-limited when the cwnd is nearly, 666 rather than completely, equal to the FlightSize. 668 5. Determining a safe period to preserve cwnd 670 This section documents the rationale for selecting the maximum period 671 that cwnd may be preserved, known as the non-validated period, NVP.
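As an informal illustration of how the NVP is applied, the timestamp-based check noted in Section 4.5.1 can be combined with the adjustment in Section 4.4.3. The Python sketch below is not normative. The function names nvp_adjust and on_transmit, the use of bytes for cwnd, ssthresh and IW, and the 300-second default are assumptions for this sketch. Phase transitions and the strict "greater than the NVP" boundary are deliberately simplified.

```python
def nvp_adjust(cwnd, ssthresh, iw):
    """Apply one Section 4.4.3 adjustment (values in bytes, integer maths)."""
    # Record that the sender has safely sustained the present rate.
    ssthresh = max(ssthresh, 3 * cwnd // 4)
    # Respond conservatively: halve cwnd, but not below the initial window.
    cwnd = max(cwnd // 2, iw)
    return cwnd, ssthresh


def on_transmit(now, nvp_start, cwnd, ssthresh, iw, nvp=300.0):
    """Check the NVP timestamp when a new segment is about to be sent.

    Hypothetical helper: 'nvp_start' is the timestamp recorded when the
    sender entered the non-validated phase. One reduction is made for
    each whole NVP that has passed, and the timestamp is advanced by the
    periods consumed so that no reduction is applied twice.
    """
    elapsed_periods = int((now - nvp_start) // nvp)
    for _ in range(elapsed_periods):
        cwnd, ssthresh = nvp_adjust(cwnd, ssthresh, iw)
    return cwnd, ssthresh, nvp_start + elapsed_periods * nvp
```

For example, with cwnd = 40000, ssthresh = 10000 and IW = 4380 bytes, a transmission 301 seconds after entering the non-validated phase gives cwnd = 20000 and ssthresh = 30000. A transmission after 650 seconds would instead apply two reductions, giving cwnd = 10000.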
673 Limiting the period that cwnd may be preserved avoids undesirable 674 side effects that would result if the cwnd were to be kept 675 unnecessarily high for an arbitrarily long period, which was a part of 676 the problem that CWV originally attempted to address. The period a 677 sender may safely preserve the cwnd is a function of the period that 678 a network path is expected to sustain the capacity reflected by cwnd. 679 There is no ideal choice for this time. 681 A period of five minutes was chosen for this NVP. This is a 682 compromise that was larger than the idle intervals of common 683 applications, but not sufficiently larger than the period for which 684 the capacity of an Internet path may commonly be regarded as stable. 685 The capacity of wired networks is usually relatively stable for 686 periods of several minutes, and load stability increases with the 687 capacity. This suggests that cwnd may be preserved for at least a 688 few minutes. 690 There are cases where the TCP throughput exhibits significant 691 variability over a time less than five minutes. Examples could 692 include wireless topologies, where the TCP rate may fluctuate 693 on the order of a few seconds as a consequence of medium access 694 protocol instabilities. Mobility changes may also impact TCP 695 performance over short time scales. Senders that observe such rapid 696 changes in the path characteristics may also experience increased 697 congestion with the new method; however, such variation would likely 698 also impact TCP's behaviour when supporting interactive and bulk 699 applications. 701 Routing algorithms may modify the network path, disrupting the RTT 702 measurement and changing the capacity available to a TCP connection; 703 however, such changes do not often occur within a time frame of a few 704 minutes. 706 The value of five minutes is therefore expected to be sufficient for 707 most current applications. Simulation studies (e.g.
[Bis11]) also 708 suggest that for many practical applications, the performance using 709 this value will not be significantly different to that observed using 710 a non-standard method that does not reset the cwnd after idle. 712 Finally, other TCP sender mechanisms have used a 5 minute timer, and 713 there could be simplifications in some implementations by reusing the 714 same interval. TCP defines a default user timeout of 5 minutes 715 [RFC0793], i.e. how long transmitted data may remain unacknowledged 716 before a connection is forcefully closed. 718 6. Security Considerations 720 General security considerations concerning TCP congestion control are 721 discussed in [RFC5681]. This document describes an algorithm that 722 updates one aspect of the congestion control procedures, and so the 723 considerations described in RFC 5681 also apply to this algorithm. 725 7. IANA Considerations 727 There are no IANA considerations. 729 8. Acknowledgments 731 The authors acknowledge the contributions of Dr I Biswas and Mr Ziaul 732 Hossain in supporting the evaluation of CWV and for their help in 733 developing the mechanisms proposed in this draft. We also 734 acknowledge comments received from the Internet Congestion Control 735 Research Group, in particular Yuchung Cheng, Mirja Kuehlewind, Joe 736 Touch, and Mark Allman. This work was part-funded by the European 737 Community under its Seventh Framework Programme through the Reducing 738 Internet Transport Latency (RITE) project (ICT-317700). 740 9. Author Notes 742 RFC-Editor note: please remove this section prior to publication. 744 9.1. Other related work 746 RFC-Editor note: please remove this section prior to publication. 748 There are several issues to be discussed more widely: 750 o There are potential interactions with the Experimental update in 751 [RFC6928] that raises the TCP initial window to ten segments. Do 752 these cases need to be elaborated?
754 This relates to the Experimental specification for increasing 755 the TCP IW defined in RFC 6928. 757 The two methods have different functions and different responses 758 to loss/congestion. 760 RFC 6928 proposes an experimental update to TCP that would 761 increase the IW to ten segments. This would allow faster 762 opening of the cwnd, and also a large (same size) restart 763 window. This approach is based on the assumption that many 764 forward paths can sustain bursts of up to ten segments without 765 (appreciable) loss. Such a significant increase in cwnd must 766 be matched with an equally large reduction of cwnd if loss/ 767 congestion is detected, and such a congestion indication is 768 likely to require future use of IW=10 to be disabled for this 769 path for some time. This guards against the unwanted behaviour 770 of a series of short flows continuously flooding a network path 771 without network congestion feedback. 773 In contrast, this document proposes an update with a rationale 774 that relies on recent previous path history to select an 775 appropriate cwnd after restart. 777 The behaviour differs in three ways: 779 1) For applications that send little initially, new-cwv may 780 constrain more than RFC 6928, but would not require the 781 connection to reset any path information when a restart 782 incurred loss. In contrast, new-cwv would allow the TCP 783 connection to preserve the cached cwnd; any loss would impact 784 cwnd, but not impact other flows. 786 2) For applications that utilise more capacity than provided by 787 a cwnd of 10 segments, this method would permit a larger 788 restart window compared to a restart using the method in RFC 789 6928. This is justified by the recent path history. 791 3) new-cwv is intended to also be used for rate-limited 792 applications, where the application sends, but does not seek to 793 fully utilise the cwnd. In this case, new-cwv constrains the 794 cwnd to that justified by the recent path history.
The 795 performance trade-offs are hence different, and it would be 796 possible to enable new-cwv when also using the method in RFC 797 6928, and yield benefits. 799 o There is potential overlap with the Laminar proposal (draft- 800 mathis-tcpm-tcp-laminar) 802 The current draft was intended as a standards-track update to 803 TCP, rather than a new transport variant. At least, it would 804 be good to understand how the two interact and whether there is 805 a possibility of a single method. 807 o There is potential performance loss in loss of a short burst 808 (off list with M Allman) 810 A sender can transmit several segments then become idle. If 811 the first segments are all ACK'ed the ssthresh collapses to a 812 small value (no new data is sent by the idle sender). Loss of 813 the later data results in congestion (e.g. maybe a RED drop or 814 some other cause, rather than the maximum rate of this flow). 815 When the sender performs loss recovery it may have an 816 appreciable pipeACK and cwnd, but a very low FlightSize - the 817 Standard algorithm results in an unusually low cwnd ((1/2)* 818 FlightSize). 820 A constant rate flow would have maintained a FlightSize 821 appropriate to pipeACK (cwnd if it is a bulk flow). 823 This could be fixed by adding a new state variable? It could 824 also be argued this is a corner case (e.g. loss of only the 825 last segments would have resulted in RTO), the impact could be 826 significant. 828 o There is potential interaction with TCP Control Block Sharing(M 829 Welzl) 831 An application that is non-validated can accumulate a cwnd that 832 is larger than the actual capacity. Is this a fair value to 833 use in TCB sharing? 835 We propose that TCB sharing should use the pipeACK in place of 836 cwnd when a TCP sender is in the Non-validated phase. This 837 value better reflects the capacity that the flow has utilised 838 in the network path. 840 9.2. 
Revision notes 842 RFC-Editor note: please remove this section prior to publication. 844 Draft 03 was submitted to ICCRG to receive comments and feedback. 846 Draft 04 contained the first set of clarifications after feedback: 848 o Changed name to application limited and used the term rate-limited 849 in all places. 851 o Added justification and many minor changes suggested on the list. 853 o Added text to tie in with more accurate ECN marking. 855 o Added ref to Hug01. 857 Draft 05 contained various updates: 859 o New text to redefine how to measure the acknowledged pipe, 860 differentiating this from the FlightSize, and hence avoiding 861 previous issues with infrequent large bursts of data not being 862 validated. A key new feature is that pipeACK only triggers 863 leaving the NVP after the size of the pipe has been acknowledged. 864 This removed the need for hysteresis. 866 o Reduction values were changed to 1/2, following analysis of 867 suggestions from ICCRG. This also sets the "target" cwnd as twice 868 the used rate for the non-validated case. 870 o Introduced a symbolic name (NVP) to denote the 5 minute period. 872 Draft 06 contained various updates: 874 o Required reset of pipeACK after congestion. 876 o Added comment on the effect of congestion after a short burst (M. 877 Allman). 879 o Correction of minor typos. 881 WG draft 00 contained various updates: 883 o Updated initialisation of pipeACK to maximum value. 885 o Added note on intended status still to be determined. 887 WG draft 01 contained: 889 o Added corrections from Richard Scheffenegger. 891 o Raffaello Secchi added to the mechanism, based on implementation 892 experience. 894 o Removed the requirement for the method to use the TCP SACK option. 896 o Although it may be desirable to use SACK, this is not essential to 897 the algorithm. 899 o Added the notion of the sampling period to accommodate large rate 900 variations and ensure that the method is stable.
This algorithm is 901 to be validated through implementation. 903 WG draft 02 contained: 905 o Clarified language around pipeACK variable and pipeACK sample - 906 Feedback from Aris Angelogiannopoulos. 908 WG draft 03 contained: 910 o Editorial corrections - Feedback from Anna Brunstrom. 912 o An adjustment to the procedure at the start and end of loss 913 recovery to align the two equations. 915 o Further clarification of the "undefined" value of the pipeACK 916 variable. 918 WG draft 04 contained: 920 o Editorial corrections. 922 o Introduced the "cwnd-limited" term. 924 o An adjustment to the procedure at the start of a cwnd-limited 925 phase - the new text is intended to ensure that new-cwv is not 926 unnecessarily more conservative than standard TCP when the flow is 927 cwnd-limited. This resolves two issues: first it prevents 928 pathologies in which pipeACK increases slowly and erratically. It 929 also ensures that performance of bulk applications is not 930 significantly impacted when using the method. 932 o Clearly identifies that pacing (or equivalent) is required during 933 the NVP to control burstiness. New section added. 935 WG draft 05 contained: 937 o Clarification to first two bullets in Section 4.4 describing cwnd- 938 limited, to explain these are really alternates to the same case. 940 o Section giving implementation examples was restructured to clarify 941 there are two methods described. 943 o Cross References to sections updated - thanks to comments from 944 Martin Winbjoerk and Tim Wicinski. 946 WG draft 06 contained: 948 o The section giving implementation examples was restructured to 949 clarify there are two methods described. 951 o Justification of design decisions. 953 o Re-organised text to improve clarity of argument. 955 10. References 957 10.1. Normative References 959 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 960 793, September 1981. 962 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A.
Romanow, "TCP 963 Selective Acknowledgment Options", RFC 2018, October 1996. 965 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 966 Requirement Levels", BCP 14, RFC 2119, March 1997. 968 [RFC2861] Handley, M., Padhye, J., and S. Floyd, "TCP Congestion 969 Window Validation", RFC 2861, June 2000. 971 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 972 of Explicit Congestion Notification (ECN) to IP", RFC 973 3168, September 2001. 975 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 976 Control", RFC 5681, September 2009. 978 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 979 and Y. Nishida, "A Conservative Loss Recovery Algorithm 980 Based on Selective Acknowledgment (SACK) for TCP", RFC 981 6675, August 2012. 983 10.2. Informative References 985 [All05] Allman, M. and E. Blanton, "Notes on burst mitigation for 986 transport protocols", March 2005. 988 [Bis08] Biswas, I. and G. Fairhurst, "A Practical Evaluation of 989 Congestion Window Validation Behaviour, 9th Annual 990 Postgraduate Symposium in the Convergence of 991 Telecommunications, Networking and Broadcasting (PGNet), 992 Liverpool, UK", June 2008. 994 [Bis10] Biswas, I., Sathiaseelan, A., Secchi, R., and G. 995 Fairhurst, "Analysing TCP for Bursty Traffic, Int'l J. of 996 Communications, Network and System Sciences, 7(3)", June 997 2010. 999 [Bis11] Biswas, I., "PhD Thesis, Internet congestion control for 1000 variable rate TCP traffic, School of Engineering, 1001 University of Aberdeen", June 2011. 1003 [Fai12] Sathiaseelan, A., Secchi, R., Fairhurst, G., and I. 1004 Biswas, "Enhancing TCP Performance to support Variable- 1005 Rate Traffic, 2nd Capacity Sharing Workshop, ACM CoNEXT, 1006 Nice, France, 10th December 2012.", June 2008. 1008 [Hug01] Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP 1009 Slow-Start Restart After Idle (Work-in-Progress)", 1010 December 2001. 1012 [Liu07] Liu, D., Allman, M., Jiny, S., and L. 
Wang, "Congestion 1013 Control without a Startup Phase, 5th International 1014 Workshop on Protocols for Fast Long-Distance Networks 1015 (PFLDnet), Los Angeles, California, USA", February 2007. 1017 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1018 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1019 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1021 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 1022 "Computing TCP's Retransmission Timer", RFC 6298, June 1023 2011. 1025 [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, 1026 "Increasing TCP's Initial Window", RFC 6928, April 2013. 1028 Authors' Addresses 1030 Godred Fairhurst 1031 University of Aberdeen 1032 School of Engineering 1033 Fraser Noble Building 1034 Aberdeen, Scotland AB24 3UE 1035 UK 1037 Email: gorry@erg.abdn.ac.uk 1038 URI: http://www.erg.abdn.ac.uk 1040 Arjuna Sathiaseelan 1041 University of Aberdeen 1042 School of Engineering 1043 Fraser Noble Building 1044 Aberdeen, Scotland AB24 3UE 1045 UK 1047 Email: arjuna@erg.abdn.ac.uk 1048 URI: http://www.erg.abdn.ac.uk 1050 Raffaello Secchi 1051 University of Aberdeen 1052 School of Engineering 1053 Fraser Noble Building 1054 Aberdeen, Scotland AB24 3UE 1055 UK 1057 Email: raffaello@erg.abdn.ac.uk 1058 URI: http://www.erg.abdn.ac.uk