idnits 2.17.1

draft-ietf-tcpm-newcwv-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC2861, but the
     abstract doesn't seem to directly say this.  It does mention RFC2861
     though, so this could be OK.

  -- The draft header indicates that this document updates RFC5681, but the
     abstract doesn't seem to directly say this.  It does mention RFC5681
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC5681, updated by this document, for
     RFC5378 checks: 2006-01-26)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 16, 2013) is 3782 days in the past.  Is this
     intentional?
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2861 (Obsoleted by RFC 7661)

  ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675)

  ** Downref: Normative reference to an Experimental RFC: RFC 6928

     Summary: 4 errors (**), 0 flaws (~~), 1 warning (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

TCPM Working Group                                          G. Fairhurst
Internet-Draft                                           A. Sathiaseelan
Obsoletes: 2861 (if approved)                                  R. Secchi
Updates: 5681 (if approved)                       University of Aberdeen
Intended status: Standards Track                       December 16, 2013
Expires: June 19, 2014

              Updating TCP to support Rate-Limited Traffic
                       draft-ietf-tcpm-newcwv-04

Abstract

   This document proposes an update to RFC 5681 to address issues that arise when TCP is used to support traffic that exhibits periods where the sending rate is limited by the application rather than the congestion window. It updates TCP to allow a TCP sender to restart quickly following either an idle or rate-limited interval. This method is expected to benefit applications that send rate-limited traffic using TCP, while also providing an appropriate response if congestion is experienced.

   It also evaluates the Experimental specification of TCP Congestion Window Validation, CWV, defined in RFC 2861, and concludes that RFC 2861 sought to address important issues, but failed to deliver a widely used solution. This document therefore recommends that the status of RFC 2861 is moved from Experimental to Historic, and that it is replaced by the current specification.

   NOTE: The standards status of this WG document is under review for consideration as either Experimental (EXP) or Proposed Standard (PS). This decision will be made later as the document is finalised.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 19, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Reviewing experience with TCP-CWV
   3.  Terminology
   4.  An updated TCP response to idle and application-limited periods
     4.1.
           A method for preserving cwnd during the idle and application-limited periods
     4.2.  Initialisation
     4.3.  The nonvalidated phase
     4.4.  TCP congestion control during the nonvalidated phase
       4.4.1.  Response to congestion in the nonvalidated phase
       4.4.2.  Sender burst control during the nonvalidated phase
       4.4.3.  Adjustment at the end of the nonvalidated phase
       4.4.4.  Examples of Implementation
   5.  Determining a safe period to preserve cwnd
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  Author Notes
     9.1.  Other related work
     9.2.  Revision notes
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   TCP is used to support a range of application behaviours. The TCP congestion window (cwnd) controls the number of unacknowledged packets/bytes that a TCP flow may have in the network at any time, a value known as the FlightSize [RFC5681]. A bulk application will always have data available to transmit. The rate at which it sends is therefore limited by the maximum permitted by the receiver advertised window and the sender congestion window (cwnd).
   In contrast, a rate-limited application will experience periods when the sender is either idle or is unable to send at the maximum rate permitted by the cwnd. This latter case is called rate-limited. The focus of this document is on the operation of TCP in such an idle or rate-limited case.

   Standard TCP [RFC5681] requires the cwnd to be reset to the restart window (RW) when an application becomes idle. [RFC2861] noted that this TCP behaviour was not always observed in current implementations. Recent experiments [Bis08] confirm this to still be the case.

   Standard TCP does not impose additional restrictions on the growth of the cwnd when a TCP sender is rate-limited. A rate-limited sender may therefore grow a cwnd far beyond that corresponding to the current transmit rate, resulting in a value that does not reflect current information about the state of the network path the flow is using. Use of such an invalid cwnd may result in reduced application performance and/or could significantly contribute to network congestion.

   [RFC2861] proposed a solution to these issues in an experimental method known as Congestion Window Validation (CWV). CWV was intended to help reduce cases where TCP accumulated an invalid cwnd. The use and drawbacks of using the CWV algorithm in RFC 2861 with an application are discussed in Section 2.

   Section 3 defines relevant terminology.

   Section 4 specifies an alternative to CWV that seeks to address the same issues, but does this in a way that is expected to mitigate the impact on an application that varies its sending rate. The method described applies to both a rate-limited and an idle condition. Section 5 describes the rationale for selecting the safe period to preserve the cwnd.

2.
Reviewing experience with TCP-CWV

   RFC 2861 described a simple modification to the TCP congestion control algorithm that decayed the cwnd after the transition to a "sufficiently-long" idle period. This used the slow-start threshold (ssthresh) to save information about the previous value of the congestion window. The approach relaxed the standard TCP behaviour [RFC5681] for an idle session, with the intention of improving application performance. CWV also modified the behaviour for a rate-limited session where a sender transmitted at a rate less than allowed by cwnd.

   RFC 2861 has been implemented in some mainstream operating systems as the default behaviour [Bis08]. Analysis (e.g. [Bis10] [Fai12]) has shown that a TCP sender using CWV is able to use available capacity on a shared path after an idle period. This can benefit some applications, especially over long delay paths, when compared to the slow-start restart specified by standard TCP. However, CWV would only benefit an application if the idle period were less than several Retransmission Time Out (RTO) intervals [RFC6298], since the behaviour would otherwise be the same as for standard TCP, which resets the cwnd to the TCP Restart Window (RW) after this period.

   Experience with RFC 2861 suggests that although the CWV method benefited the network in a rate-limited scenario (reducing the probability of network congestion), the behaviour was too conservative for many common rate-limited applications. This mechanism did not therefore offer the desirable increase in application performance for rate-limited applications and it is unclear whether applications actually use this mechanism in the general Internet.

   It is therefore concluded that CWV, as defined in RFC 2861, was often a poor solution for many rate-limited applications.
   It had the correct motivation, but had the wrong approach to solving this problem.

3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

   The document assumes familiarity with the terminology of TCP congestion control [RFC5681].

   The following new terminology is introduced:

   cwnd-limited: A TCP flow that sends the number of segments permitted by the cwnd, where the application utilises the allowed sending rate.

   pipeACK sample: A measure of the volume of data acknowledged by the network within an RTT.

   pipeACK variable: A variable that measures the available capacity using the set of pipeACK samples.

   pipeACK Sampling Period: The maximum period that a measured pipeACK sample may influence the pipeACK variable.

   Non-validated phase: The phase where the cwnd reflects a previous measurement of the available path capacity.

   Non-validated period, NVP: The maximum period for which cwnd is preserved in the non-validated phase.

   Rate-limited: A TCP flow that does not consume more than one half of cwnd, and hence operates in the non-validated phase.

   Validated phase: The phase where the cwnd reflects a current estimate of the available path capacity.

4.  An updated TCP response to idle and application-limited periods

   This section proposes an update to the TCP congestion control behaviour during a rate-limited period. The new method permits a TCP sender to preserve the cwnd when an application becomes idle or when the sender is unable to send at the maximum rate permitted by the cwnd, for up to the non-validated period (NVP, see section 5). The period where actual usage is less than allowed by cwnd is named the non-validated phase.
   This method allows an application to resume transmission at a previous rate without incurring the delay of slow-start. However, if the TCP sender experiences congestion using the preserved cwnd, it is required to immediately reset the cwnd to an appropriate value specified by the method. If a sender does not take advantage of the preserved cwnd within the NVP, the value of cwnd is reduced, ensuring the value better reflects the capacity that was recently actually used.

   It is expected that this update will satisfy the requirements of many rate-limited applications and at the same time provide an appropriate method for use in the Internet. It also reduces the incentive for an application to send data simply to keep transport congestion state. (This is sometimes known as "padding".)

   The new method does not differentiate between times when the sender has become idle or rate-limited. This is partly a response to recognition that some applications wish to transmit at a rate less than allowed by the sender cwnd, and that it can be hard to make a distinction between rate-limited and idle behaviour. This is expected to encourage applications and TCP stacks to use standards-based congestion control methods. It may also encourage the use of long-lived connections where this offers benefit (such as persistent HTTP).

   The method is specified in the following subsections.

4.1.  A method for preserving cwnd during the idle and application-limited periods

   [RFC5681] defines a variable, FlightSize, that indicates the amount of outstanding data in the network. This is assumed to be equal to the value of Pipe calculated based on the pipe algorithm [RFC3517].
   In RFC 5681 this value is used during loss recovery, whereas in this method a new variable "pipeACK" is introduced to measure the acknowledged size of the pipe, which is used to determine if the sender has validated the cwnd.

   A sender determines a pipeACK sample by measuring the volume of data that was acknowledged by the network over the period of a measured Round Trip Time (RTT). Using the variables defined in [RFC3517], a value could be measured by caching the value of HighACK and after one RTT measuring the difference between the cached HighACK value and the current HighACK value. Other equivalent methods may be used.

   A sender is not required to continuously update the pipeACK variable after each received ACK, but SHOULD perform a pipeACK sample at least once per RTT when it has sent unacknowledged segments.

   The pipeACK variable MAY consider multiple pipeACK samples over the pipeACK Sampling Period. The value of the pipeACK variable MUST NOT exceed the maximum (highest value) within the sampling period. This specification defines the pipeACK Sampling Period as Max(3*RTT, 1 second). This period enables a sender to compensate for large fluctuations in the sending rate, where there may be pauses in transmission, and allows the pipeACK variable to reflect the largest recently measured pipeACK sample.

   When no measurements are available, the pipeACK variable is set to the "undefined value". This value is used to inhibit entering the nonvalidated phase until the first new measurement of a pipeACK sample.

   The method RECOMMENDS that the TCP SACK option [RFC3517] is enabled. This allows the sender to more accurately determine the number of missing bytes during the loss recovery phase, and using this method will result in a higher cwnd following loss.

4.2.
Initialisation

   A sender starts a TCP connection in the Validated phase and initialises the pipeACK variable to the "undefined" value. This value inhibits use of the value in cwv calculations.

4.3.  The nonvalidated phase

   The updated method creates a new TCP sender phase that captures whether the cwnd reflects a validated or non-validated value. The phases are defined as:

   o  Validated phase: pipeACK >= (1/2)*cwnd, or pipeACK is undefined. This is the normal phase, where cwnd is expected to be an approximate indication of the capacity currently available along the network path, and the standard methods are used to increase cwnd (currently [RFC5681]). The rule for transitioning to the non-validated phase is specified in section 4.4.

   o  Non-validated phase: pipeACK < (1/2)*cwnd. This is the phase where the cwnd has a value based on a previous measurement of the available capacity, and the usage of this capacity has not been validated in the pipeACK Sampling Period. That is, when it is not known whether the cwnd reflects the currently available capacity along the network path. The mechanisms to be used in this phase seek to determine a safe value for cwnd and an appropriate reaction to congestion. These mechanisms are specified in section 4.4.

   The value 1/2 was selected to reduce the effects of variations in the pipeACK variable, and to allow the sender some flexibility in when it sends data.

4.4.  TCP congestion control during the nonvalidated phase

   A TCP sender MUST enter the non-validated phase when the pipeACK is less than (1/2)*cwnd.

   A TCP sender that enters the non-validated phase will preserve the cwnd (i.e., this neither grows nor reduces while the sender remains in this phase). If the sender receives an indication of congestion (loss or Explicit Congestion Notification, ECN, mark [RFC3168]) it uses the method described below.
   The phase is concluded after a fixed period of time (the NVP, as explained in section 4.4.3) or when the sender transmits sufficient data so that pipeACK > (1/2)*cwnd (i.e. it is no longer rate-limited).

   The behaviour in the non-validated phase is specified as:

   o  A cwnd-limited sender uses the standard TCP method to increase cwnd (i.e. a TCP sender that fully utilises the cwnd is permitted to increase cwnd for each received ACK).

   o  A sender that is not cwnd-limited MUST NOT increase the cwnd when ACK packets are received in this phase.

   o  If the sender receives an indication of congestion while in the non-validated phase (i.e. detects loss, or an ECN mark), the sender MUST exit the non-validated phase (reducing the cwnd as defined in section 4.4.1).

   o  If the Retransmission Time Out (RTO) expires while in the non-validated phase, the sender MUST exit the non-validated phase. It then resumes using the Standard TCP RTO mechanism [RFC5681]. (The resulting reduction of cwnd described in section 4.4.3 is appropriate, since any accumulated path history is considered unreliable.)

   o  A sender with a pipeACK variable greater than (1/2)*cwnd SHOULD enter the validated phase. (A rate-limited sender will not normally be impacted by whether it is in a validated or non-validated phase, since it will normally not consume the entire cwnd. However a change to the validated phase will release the sender from constraints on the growth of cwnd, and restore the use of the standard congestion response.)

4.4.1.  Response to congestion in the nonvalidated phase

   Reception of congestion feedback while in the non-validated phase is interpreted as an indication that it was inappropriate for the sender to use the preserved cwnd. The sender is therefore required to quickly reduce the rate to avoid further congestion.
   Since the cwnd does not have a validated value, a new cwnd value must be selected based on the utilised rate.

   A sender that detects a packet-drop, or receives an indication of an ECN marked packet, MUST record the current FlightSize in the variable LossFlightSize and MUST calculate a safe cwnd for loss recovery using the method below:

      cwnd = (Max(pipeACK,LossFlightSize))/2.

   If there is a valid pipeACK value, the new cwnd is adjusted to reflect that a nonvalidated cwnd may be larger than the actual FlightSize, or recently used FlightSize (recorded in pipeACK). The updated cwnd therefore prevents overshoot by a sender significantly increasing its transmission rate during the recovery period.

   At the end of the recovery phase, the TCP sender MUST reset the cwnd using the method below:

      cwnd = (Max(pipeACK,LossFlightSize) - R)/2.

   Here, R is the volume of data that was retransmitted during the recovery phase. This follows the method proposed for Jump Start [Liu07]. The inclusion of the term R makes the adjustment more conservative than standard TCP. (This is required, since the sender may have sent more segments than a Standard TCP sender would have done. The additional reduction is beneficial when the LossFlightSize significantly overshoots the available path capacity incurring significant loss, for instance an intense traffic burst following a non-validated period.)

   If the sender implements a method that allows it to identify the number of ECN-marked segments within a window that were observed by the receiver, the sender SHOULD use the method above, further reducing R by the number of marked segments.

   The sender MUST also re-initialise the pipeACK variable to the "undefined" value. This ensures that standard TCP methods are used immediately after completing loss recovery until a new pipeACK value can be determined.
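
   The two cwnd calculations in this section can be sketched as follows. This is a minimal Python illustration and not a definitive implementation. The function names, the use of byte counts with integer division, and the treatment of an undefined pipeACK (None) as zero are assumptions made for the example; they are not part of the specification.

```python
from typing import Optional, Tuple


def cwnd_on_congestion(pipe_ack: Optional[int], flight_size: int) -> Tuple[int, int]:
    """Return (cwnd, LossFlightSize) when loss or an ECN mark is detected.

    Implements cwnd = Max(pipeACK, LossFlightSize) / 2.
    """
    loss_flight_size = flight_size  # record the current FlightSize
    # Assumption: an undefined pipeACK contributes nothing to the maximum.
    used = max(pipe_ack or 0, loss_flight_size)
    return used // 2, loss_flight_size


def cwnd_after_recovery(pipe_ack: Optional[int], loss_flight_size: int,
                        retransmitted: int) -> int:
    """Return cwnd at the end of the recovery phase.

    Implements cwnd = (Max(pipeACK, LossFlightSize) - R) / 2, where R is the
    volume of data retransmitted during recovery. No lower bound is applied
    here because the text above does not state one.
    """
    used = max(pipe_ack or 0, loss_flight_size)
    return (used - retransmitted) // 2


# Example: pipeACK of 20000 bytes, 8000 bytes in flight when loss is detected,
# and 4000 bytes retransmitted during recovery.
cwnd, lfs = cwnd_on_congestion(20000, 8000)
final_cwnd = cwnd_after_recovery(20000, lfs, 4000)
```

   In this example the larger pipeACK value, rather than the smaller FlightSize at the time of loss, determines the cwnd used for recovery.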

   ssthresh is adjusted using the standard TCP method.

4.4.2.  Sender burst control during the nonvalidated phase

   TCP congestion control allows a sender to accumulate a cwnd that would allow it to send a burst of segments with a total size up to the difference between the FlightSize and cwnd. Such bursts can impact other flows that share a network bottleneck and/or may induce congestion when buffering is limited.

   Various methods have been proposed to control the sender burstiness [Hug01], [All05]. For example, TCP can limit the number of new segments it sends per received ACK. This is effective when a flow of ACKs is received, but cannot be used to control a sender that has not sent appreciable data in the previous RTT [All05].

   This document recommends using a method to avoid line-rate bursts after an idle or rate-limited period when there is less reliable information about the capacity of the network path: A TCP sender in the non-validated phase SHOULD control the maximum burst size, e.g. using a rate-based pacing algorithm in which a sender paces out the cwnd over its estimate of the RTT, or some other method, to prevent many segments being transmitted contiguously at line-rate. The most appropriate method(s) to implement pacing depend on the design of the TCP/IP stack, speed of interface and whether hardware support (such as TCP Segment Offload, TSO) is used. The present document does not recommend any specific method.

4.4.3.  Adjustment at the end of the nonvalidated phase

   An application that remains in the non-validated phase for a period greater than the NVP is required to adjust its congestion control state. If the sender exits the non-validated phase after this period, it MUST update the ssthresh:

      ssthresh = max(ssthresh, 3*cwnd/4).

   (This adjustment of ssthresh ensures that the sender records that it has safely sustained the present rate. The change is beneficial to rate-limited flows that encounter occasional congestion, and could otherwise suffer an unwanted additional delay in recovering the sending rate.)

   The sender MUST then update cwnd to be not greater than:

      cwnd = max(1/2*cwnd, IW).

   Here, IW is the appropriate TCP initial window used by the TCP sender (e.g. [RFC5681]).

   (This adjustment ensures that the sender responds conservatively at the end of the non-validated phase by reducing the cwnd to better reflect the current rate of the sender. The cwnd update does not take into account the FlightSize or pipeACK value because these values only reflect historical data and do not reflect the current sending rate.)

4.4.4.  Examples of Implementation

   This section is intended to provide informative examples of implementation methods. Implementations may choose to use other methods that comply with the normative requirements.

   A pipeACK sample may be measured once each RTT. This reduces the sender processing burden of calculating a sample after each acknowledgement and also reduces storage requirements at the sender.

   Since application behaviour can be bursty using CWV, it may be desirable to implement a maximum filter to accumulate the measured values so that the pipeACK variable records the largest pipeACK sample within the pipeACK Sampling Period. One simple way to implement this is to divide the pipeACK Sampling Period into several (e.g. 5) equal length measurement periods. The sender then records the start time for each measurement period and the highest measured pipeACK sample. At the end of the measurement period, any measurement(s) that are older than the pipeACK Sampling Period are discarded.
   The pipeACK variable is then assigned the largest of the set of the highest measured values.

   +----------+----------+           +----------+---......
   | Sample A | Sample B | No        | Sample C | Sample D
   |          |          | Sample    |          |
   | |\ 5     |          |           |          |
   | | |      |          |           |    /\ 4  |
   | | |      |  |\ 3    |           |    | \   |
   | | \      |  | \---  |           |    /  \  |   /| 2
   |/   \------|     -   |           |   /    \------/ \...
   +----------+---------\+----/ /----+/---------+-------------> Time

   <------------------------------------------------|
                   Sampling Period               Current Time

              Figure 1: Example of measuring pipeACK samples

   Figure 1 shows an example of how measurement samples may be collected. At the time represented by the figure new samples are being accumulated into sample D. Three previous samples also fall within the pipeACK Sampling Period: A, B, and C. There was also a period of inactivity between samples B and C during which no measurements were taken. The current value of the pipeACK variable will be 5, the maximum across all samples.

   After one further measurement period, Sample A will be discarded, since it is then older than the pipeACK Sampling Period, and the pipeACK variable will be recalculated. Its value will be the larger of Sample C or the final value accumulated in Sample D.

   Note that the NVP does not necessarily require a new timer to be implemented. An alternative is to record a timestamp when the sender enters the non-validated phase. Each time a sender transmits a new segment, this timestamp may be used to determine if the NVP has expired. If the period expires, the sender may take into account how many units of the NVP have passed and make one reduction (as defined in section 4.4.3) for each NVP.

   A method is required to detect the cwnd-limited condition. In simple terms this method is true only when the TCP sender's FlightSize is equal to or larger than the cwnd.
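
   The informative examples in this section can be sketched in Python as follows. This is one possible illustration under stated assumptions, not a definitive implementation. The names PipeAckFilter, apply_nvp_expiry and is_cwnd_limited are hypothetical, times are integer milliseconds, sizes are bytes with integer division, and the age of a measurement period is measured from its recorded start time.

```python
from typing import List, Optional, Tuple

NVP_MS = 5 * 60 * 1000  # non-validated period of five minutes (section 5)


class PipeAckFilter:
    """Maximum filter over pipeACK samples held in a few measurement periods."""

    def __init__(self, num_periods: int = 5) -> None:
        self.num_periods = num_periods
        # Each entry is (start time in ms, highest sample seen), oldest first.
        self.periods: List[Tuple[int, int]] = []

    @staticmethod
    def sampling_period_ms(rtt_ms: int) -> int:
        # pipeACK Sampling Period = Max(3*RTT, 1 second)
        return max(3 * rtt_ms, 1000)

    def add_sample(self, now_ms: int, rtt_ms: int, sample: int) -> None:
        window = self.sampling_period_ms(rtt_ms)
        length = window / self.num_periods
        if self.periods and now_ms - self.periods[-1][0] < length:
            # Still inside the newest measurement period: keep its maximum.
            start, highest = self.periods[-1]
            self.periods[-1] = (start, max(highest, sample))
        else:
            # Start a new measurement period and record its start time.
            self.periods.append((now_ms, sample))
        self._discard_old(now_ms, window)

    def _discard_old(self, now_ms: int, window: int) -> None:
        # Assumption: a period is too old when its start time falls outside
        # the pipeACK Sampling Period.
        self.periods = [(s, v) for s, v in self.periods if now_ms - s <= window]

    def value(self, now_ms: int, rtt_ms: int) -> Optional[int]:
        """Return the pipeACK variable, or None for the "undefined" value."""
        self._discard_old(now_ms, self.sampling_period_ms(rtt_ms))
        if not self.periods:
            return None
        return max(v for _, v in self.periods)


def apply_nvp_expiry(cwnd: int, ssthresh: int, iw: int,
                     entered_ms: int, now_ms: int) -> Tuple[int, int]:
    """Apply one end-of-phase adjustment for each whole NVP that has elapsed
    since the recorded timestamp, so no separate timer is needed."""
    for _ in range((now_ms - entered_ms) // NVP_MS):
        ssthresh = max(ssthresh, 3 * cwnd // 4)
        cwnd = max(cwnd // 2, iw)
    return cwnd, ssthresh


def is_cwnd_limited(flight_size: int, cwnd: int) -> bool:
    """Simple form of the cwnd-limited test."""
    return flight_size >= cwnd
```

   Feeding the filter the samples of Figure 1 (5, 3, a gap, 4, then 2) with an RTT of 100 ms gives a pipeACK variable of 5 while Sample A is inside the Sampling Period and 4 once it has been discarded. The last function is only the simple form of the cwnd-limited test.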
   However, an implementation must consider other constraints on the way in which the cwnd variable is used, for instance the need to support methods such as the Nagle Algorithm and TCP Segment Offload (TSO). This can result in a sender becoming cwnd-limited when the cwnd is nearly, rather than completely, equal to the FlightSize.

5.  Determining a safe period to preserve cwnd

   This section documents the rationale for selecting the maximum period that cwnd may be preserved, known as the non-validated period, NVP.

   Limiting the period that cwnd may be preserved avoids undesirable side effects that would result if the cwnd were to be kept unnecessarily high for an arbitrarily long period, which was a part of the problem that CWV originally attempted to address. The period a sender may safely preserve the cwnd is a function of the period that a network path is expected to sustain the capacity reflected by cwnd. There is no ideal choice for this time.

   A period of five minutes was chosen for this NVP. This is a compromise that was larger than the idle intervals of common applications, but not sufficiently larger than the period for which the capacity of an Internet path may commonly be regarded as stable. The capacity of wired networks is usually relatively stable for periods of several minutes, and load stability increases with the capacity. This suggests that cwnd may be preserved for at least a few minutes.

   There are cases where the TCP throughput exhibits significant variability over a time less than five minutes. Examples could include wireless topologies, where TCP rate variations may fluctuate on the order of a few seconds as a consequence of medium access protocol instabilities. Mobility changes may also impact TCP performance over short time scales.
   Senders that observe such rapid changes in the path characteristic may also experience increased congestion with the new method; however, such variation would likely also impact TCP's behaviour when supporting interactive and bulk applications.

   Routing algorithms may modify the network path, disrupting the RTT measurement and changing the capacity available to a TCP connection; however, such changes do not often occur within a time frame of a few minutes.

   The value of five minutes is therefore expected to be sufficient for most current applications. Simulation studies (e.g. [Bis11]) also suggest that for many practical applications, the performance using this value will not be significantly different to that observed using a non-standard method that does not reset the cwnd after idle.

   Finally, other TCP sender mechanisms have used a 5 minute timer, and there could be simplifications in some implementations by reusing the same interval. TCP defines a default user timeout of 5 minutes [RFC0793], i.e. how long transmitted data may remain unacknowledged before a connection is forcefully closed.

6.  Security Considerations

   General security considerations concerning TCP congestion control are discussed in [RFC5681]. This document describes an algorithm that updates one aspect of the congestion control procedures, and so the considerations described in RFC 5681 also apply to this algorithm.

7.  IANA Considerations

   There are no IANA considerations.

8.  Acknowledgments

   The authors acknowledge the contributions of Dr I Biswas and Mr Ziaul Hossain in supporting the evaluation of CWV and for their help in developing the mechanisms proposed in this draft. We also acknowledge comments received from the Internet Congestion Control Research Group, in particular Yuchung Cheng, Mirja Kuehlewind, and Joe Touch.
This work was part-funded by the European Community under its Seventh Framework Programme through the Reducing Internet Transport Latency (RITE) project (ICT-317700).

9. Author Notes

RFC-Editor note: please remove this section prior to publication.

9.1. Other related work

RFC-Editor note: please remove this section prior to publication.

There are several issues to be discussed more widely:

o  There are potential interactions with the Experimental update in [RFC6928] that raises the TCP Initial Window to ten segments. Do these cases need to be elaborated?

   This relates to the Experimental specification for increasing the TCP IW defined in RFC 6928.

   The two methods have different functions and different responses to loss/congestion.

   RFC 6928 proposes an experimental update to TCP that would increase the IW to ten segments. This would allow faster opening of the cwnd, and also a large (same size) restart window. This approach is based on the assumption that many forward paths can sustain bursts of up to ten segments without (appreciable) loss. Such a significant increase in cwnd must be matched with an equally large reduction of cwnd if loss/congestion is detected, and such a congestion indication is likely to require future use of IW=10 to be disabled for this path for some time. This guards against the unwanted behaviour of a series of short flows continuously flooding a network path without network congestion feedback.

   In contrast, this document proposes an update with a rationale that relies on recent previous path history to select an appropriate cwnd after restart.

   The behaviour differs in three ways:

   1) For applications that send little initially, new-cwv may constrain more than RFC 6928, but would not require the connection to reset any path information when a restart incurred loss.
   In contrast, new-cwv would allow the TCP connection to preserve the cached cwnd; any loss would impact cwnd, but not impact other flows.

   2) For applications that utilise more capacity than provided by a cwnd of 10 segments, this method would permit a larger restart window compared to a restart using the method in RFC 6928. This is justified by the recent path history.

   3) new-cwv is also intended to be used for rate-limited applications, where the application sends, but does not seek to fully utilise the cwnd. In this case, new-cwv constrains the cwnd to that justified by the recent path history. The performance trade-offs are hence different, and it would be possible to enable new-cwv when also using the method in RFC 6928, and yield benefits.

o  There is potential overlap with the Laminar proposal (draft-mathis-tcpm-tcp-laminar).

   The current draft was intended as a standards-track update to TCP, rather than a new transport variant. At least, it would be good to understand how the two interact and whether there is a possibility of a single method.

o  There is potential performance loss following loss of a short burst (off list with M Allman).

   A sender can transmit several segments then become idle. If the first segments are all ACK'ed, the ssthresh collapses to a small value (no new data is sent by the idle sender). Loss of the later data is treated as congestion (e.g., it may be a RED drop or have some other cause, rather than this flow reaching its maximum rate). When the sender performs loss recovery, it may have an appreciable pipeACK and cwnd, but a very low FlightSize - the Standard algorithm results in an unusually low cwnd (1/2 FlightSize).

   A constant rate flow would have maintained a FlightSize appropriate to pipeACK (cwnd if it is a bulk flow).

   Could this be fixed by adding a new state variable? It could also be argued that this is a corner case (e.g.
   loss of only the last segments would have resulted in RTO), but the impact could be significant.

o  There is potential interaction with TCP Control Block Sharing (M Welzl).

   An application that is non-validated can accumulate a cwnd that is larger than the actual capacity. Is this a fair value to use in TCB sharing?

   We propose that TCB sharing should use the pipeACK in place of cwnd when a TCP sender is in the non-validated phase. This value better reflects the capacity that the flow has utilised in the network path.

9.2. Revision notes

RFC-Editor note: please remove this section prior to publication.

Draft 03 was submitted to ICCRG to receive comments and feedback.

Draft 04 contained the first set of clarifications after feedback:

o  Changed name to application limited and used the term rate-limited in all places.

o  Added justification and many minor changes suggested on the list.

o  Added text to tie-in with more accurate ECN marking.

o  Added ref to Hug01.

Draft 05 contained various updates:

o  New text to redefine how to measure the acknowledged pipe, differentiating this from the FlightSize, and hence avoiding previous issues with infrequent large bursts of data not being validated. A key new feature is that pipeACK only triggers leaving the NVP after the size of the pipe has been acknowledged. This removed the need for hysteresis.

o  Reduction values were changed to 1/2, following analysis of suggestions from ICCRG. This also sets the "target" cwnd as twice the used rate for the non-validated case.

o  Introduced a symbolic name (NVP) to denote the five-minute period.

Draft 06 contained various updates:

o  Required reset of pipeACK after congestion.

o  Added comment on the effect of congestion after a short burst (M. Allman).

o  Correction of minor typos.

WG draft 00 contained various updates:

o  Updated initialisation of pipeACK to maximum value.

o  Added note on intended status still to be determined.

WG draft 01 contained:

o  Added corrections from Richard Scheffenegger.

o  Raffaello Secchi added to the mechanism, based on implementation experience.

o  Removed the requirement for the TCP SACK option [RFC3517] to be enabled - although it may be desirable to use SACK, this is not essential to the algorithm.

o  Added the notion of the sampling period to accommodate large rate variations and ensure that the method is stable. This algorithm is to be validated through implementation.

WG draft 02 contained:

o  Clarified language around pipeACK variable and pipeACK sample - Feedback from Aris Angelogiannopoulos.

WG draft 03 contained:

o  Editorial corrections - Feedback from Anna Brunstrom.

o  An adjustment to the procedure at the start and end of loss recovery to align the two equations.

o  Further clarification of the "undefined" value of the pipeACK variable.

WG draft 04 contained:

o  Editorial corrections.

o  Introduced the "cwnd-limited" term.

o  An adjustment to the procedure at the start of a cwnd-limited phase - the new text is intended to ensure that new-cwv is not unnecessarily more conservative than standard TCP when the flow is cwnd-limited. This resolves two issues: first, it prevents pathologies in which pipeACK increases slowly and erratically; second, it ensures that the performance of bulk applications is not significantly impacted when using the method.

o  Clearly identified that pacing (or equivalent) is required during the NVP to control burstiness. New section added.

10. References

10.1. Normative References

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2861]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion Window Validation", RFC 2861, June 2000.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A Conservative Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP", RFC 3517, April 2003.

[RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

[RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, June 2011.

[RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, "Increasing TCP's Initial Window", RFC 6928, April 2013.

10.2. Informative References

[All05]    "Notes on burst mitigation for transport protocols", March 2005.

[Bis08]    Biswas and Fairhurst, "A Practical Evaluation of Congestion Window Validation Behaviour", 9th Annual Postgraduate Symposium in the Convergence of Telecommunications, Networking and Broadcasting (PGNet), Liverpool, UK, June 2008.

[Bis10]    Biswas, Sathiaseelan, Secchi, and Fairhurst, "Analysing TCP for Bursty Traffic", Int'l J. of Communications, Network and System Sciences, 7(3), June 2010.

[Bis11]    Biswas, "Internet congestion control for variable rate TCP traffic", PhD Thesis, School of Engineering, University of Aberdeen, June 2011.

[Fai12]    Sathiaseelan, Secchi, Fairhurst, and Biswas, "Enhancing TCP Performance to support Variable-Rate Traffic", 2nd Capacity Sharing Workshop, ACM CoNEXT, Nice, France, 10 December 2012.

[Hug01]    Hughes, Touch, and Heidemann, "Issues in TCP Slow-Start Restart After Idle (Work-in-Progress)", December 2001.

[Liu07]    Liu, Allman, Jin, and Wang, "Congestion Control without a Startup Phase", 5th International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), Los Angeles, California, USA, February 2007.

Authors' Addresses

Godred Fairhurst
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: gorry@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk

Arjuna Sathiaseelan
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: arjuna@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk

Raffaello Secchi
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen, Scotland AB24 3UE
UK

Email: raffaello@erg.abdn.ac.uk
URI: http://www.erg.abdn.ac.uk