idnits 2.17.1 

draft-ietf-tcpm-newcwv-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC2861, but the
     abstract doesn't seem to directly say this.  It does mention RFC2861
     though, so this could be OK.

  -- The draft header indicates that this document updates RFC5681, but the
     abstract doesn't seem to directly say this.  It does mention RFC5681
     though, so this could be OK.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC5681, updated by this document, for
     RFC5378 checks: 2006-01-26)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 23, 2014) is 3680 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2861 (Obsoleted by RFC 7661)

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	TCPM Working Group                                          G. Fairhurst
3	Internet-Draft                                           A. Sathiaseelan
4	Obsoletes: 2861 (if approved)                                  R. Secchi
5	Updates: 5681 (if approved)                       University of Aberdeen
6	Intended status: Experimental                             March 23, 2014
7	Expires: September 24, 2014

9	              Updating TCP to support Rate-Limited Traffic
10	                       draft-ietf-tcpm-newcwv-06

12	Abstract

14	   This document proposes an update to RFC 5681 to address issues that
15	   arise when TCP is used to support traffic that exhibits periods where
16	   the sending rate is limited by the application rather than the
17	   congestion window.  It provides an experimental update to TCP that
18	   allows a TCP sender to restart quickly following a rate-limited
19	   interval.  This method is expected to benefit applications
20	   that send rate-limited traffic using TCP, while also providing an
21	   appropriate response if congestion is experienced.

23	   It also evaluates the Experimental specification of TCP Congestion
24	   Window Validation, CWV, defined in RFC 2861, and concludes that RFC
25	   2861 sought to address important issues, but failed to deliver a
26	   widely used solution.  This document therefore recommends that the
27	   status of RFC 2861 is moved from Experimental to Historic, and that
28	   it is replaced by the current specification.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on September 24, 2014.

47	Copyright Notice

49	   Copyright (c) 2014 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (http://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	     1.1.  Standards Status of this Document
66	   2.  Reviewing experience with TCP-CWV
67	   3.  Terminology
68	     4.1.  Initialisation
69	     4.2.  Estimating the validated capacity supported by a path
70	     4.3.  Preserving cwnd during a rate-limited period.
71	     4.4.  TCP congestion control during the non-validated phase
72	       4.4.1.  Response to congestion in the non-validated phase
73	       4.4.2.  Sender burst control during the non-validated phase
74	       4.4.3.  Adjustment at the end of the non-validated phase
75	     4.5.  Examples of Implementation
76	       4.5.1.  Implementing the pipeACK measurement
77	       4.5.2.  Implementing detection of the cwnd-limited condition
78	   5.  Determining a safe period to preserve cwnd
79	   6.  Security Considerations
80	   7.  IANA Considerations
81	   8.  Acknowledgments
82	   9.  Author Notes
83	     9.1.  Other related work
84	     9.2.  Revision notes
85	   10. References
86	     10.1.  Normative References
87	     10.2.  Informative References
88	   Authors' Addresses

90	1.  Introduction

92	   TCP is used to support a range of application behaviours.  The TCP
93	   congestion window (cwnd) controls the number of unacknowledged
94	   packets/bytes that a TCP flow may have in the network at any time, a
95	   value known as the FlightSize [RFC5681].  A bulk application will
96	   always have data available to transmit.  The rate at which it sends
97	   is therefore limited by the maximum permitted by the receiver
98	   advertised window and the sender congestion window (cwnd).  In
99	   contrast, a rate-limited application will experience periods when the
100	   sender is either idle or is unable to send at the maximum rate
101	   permitted by the cwnd.  The update in this document targets the
102	   operation of TCP in such rate-limited cases.

104	   Standard TCP [RFC5681] states that a TCP sender SHOULD set cwnd to no
105	   more than the Restart Window (RW) before beginning transmission, if
106	   the TCP sender has not sent data in an interval exceeding the
107	   retransmission timeout, i.e., when an application becomes idle.
108	   [RFC2861] noted that this TCP behaviour was not always observed in
109	   current implementations.  Experiments [Bis08] confirm this to still
110	   be the case.

112	   CWV introduced the terminology of "application limited periods".
113	   This document describes any time that an application limits the
114	   sending rate, rather than being limited by the transport, as "rate-
115	   limited".  This update improves support for applications that vary
116	   their transmission rate, either with (short) idle periods between
117	   transmission or by changing the rate the application sends.  These
118	   applications are characterised by the TCP FlightSize often being less
119	   than cwnd.  Many Internet applications exhibit this behaviour,
120	   including web browsing, http-based adaptive streaming, applications
121	   that support query/response type protocols, network file sharing, and
122	   live video transmission.  Many such applications currently avoid
123	   using long-lived (persistent) TCP connections (e.g. [RFC2616] servers
124	   typically support persistent HTTP connections, but short server
125	   timeouts often prevent their use).  Such applications often instead
126	   either use a succession of short TCP transfers or use UDP.

128	   Standard TCP does not impose additional restrictions on the growth of
129	   the congestion window when a TCP sender is unable to send at the
130	   maximum rate allowed by the cwnd.  In this case the rate-limited
131	   sender may grow a cwnd far beyond that corresponding to the current
132	   transmit rate, resulting in a value that does not reflect current
133	   information about the state of the network path the flow is using.
134	   Use of such an invalid cwnd may result in reduced application
135	   performance and/or could significantly contribute to network
136	   congestion.

138	   [RFC2861] proposed a solution to these issues in an experimental
139	   method known as Congestion Window Validation (CWV).  CWV was intended
140	   to help reduce cases where TCP accumulated an invalid cwnd.  The use
141	   and drawbacks of using the CWV algorithm in RFC 2861 with an
142	   application are discussed in Section 2.

144	   Section 3 defines relevant terminology.

146	   Section 4 specifies an alternative to CWV that seeks to address the
147	   same issues, but does this in a way that is expected to mitigate the
148	   impact on an application that varies its sending rate.  The updated
149	   method applies to the rate-limited conditions (including both an
150	   application-limited and idle sender).

152	   The goals of this update are:

154	   o  To not change the behaviour of a TCP sender that performs bulk
155	      transfers that consume the cwnd.

157	   o  To provide a method that co-exists with Standard TCP and other
158	      flows that use this updated method.

160	   o  To reduce transfer latency for applications that change their rate
161	      over short intervals of time.

163	   o  To avoid a TCP sender growing a large "non-validated" cwnd, when
164	      it has not recently sent using this cwnd.

166	   o  To remove the incentive for ad-hoc application or network stack
167	      methods (such as "padding") solely to maintain a large cwnd for
168	      future transmission.

170	   o  To incentivise the use of long-lived connections, rather than a
171	      succession of short-lived flows, benefiting both flows and network
172	      when actual congestion is encountered.

174	   Section 5 describes the rationale for selecting the safe period to
175	   preserve the cwnd.

177	1.1.  Standards Status of this Document

179	   This document was produced by the TCP Maintenance and Minor
180	   Extensions (tcpm) working group.

182	   The document updates and obsoletes the methods described in
183	   [RFC2861].  It recommends a set of mechanisms, including the use of
184	   pacing during a non-validated period.  The updated mechanisms are
185	   intended to have a less aggressive congestion impact than would be
186	   exhibited by a standard TCP sender.

188	   The specification in this draft is classified as "Experimental"
189	   pending experience with deployed implementations of the methods.

191	2.  Reviewing experience with TCP-CWV

193	   [RFC2861] described a simple modification to the TCP congestion
194	   control algorithm that decayed the cwnd after the transition to a
195	   "sufficiently-long" idle period.  This used the slow-start threshold
196	   (ssthresh) to save information about the previous value of the
197	   congestion window.  The approach relaxed the standard TCP behaviour
198	   [RFC5681] for an idle session, intended to improve application
199	   performance.  CWV also modified the behaviour where a sender
200	   transmitted at a rate less than allowed by cwnd.

202	   [RFC2861] proposed two sets of responses, one after an "application-
203	   limited" period and one after an "idle period".  Although this
204	   distinction was argued, in practice differentiating the two
205	   conditions was found problematic in actual networks (e.g., [Bis10]).
206	   While this offers predictable performance for long on-off periods
207	   (>>1 RTT) or slowly varying rate-based traffic, the performance could
208	   be unpredictable for variable-rate traffic and depended both upon
209	   whether an accurate RTT had been obtained and the pattern of
210	   application traffic relative to the measured RTT.

212	   Many applications can and often do vary their transmission over a
213	   wide range of rates.  Using [RFC2861], such applications often
214	   experienced varying performance, which made it hard for application
215	   developers to predict the TCP latency even when using a path with
216	   stable network characteristics.  We argue that an attempt to classify
217	   application behaviour as application-limited or idle is problematic
218	   and also inappropriate.  This document therefore explicitly avoids
219	   trying to differentiate these two cases, instead treating all rate-
220	   limited traffic uniformly.

222	   [RFC2861] has been implemented in some mainstream operating systems
223	   as the default behaviour [Bis08].  Analysis (e.g. [Bis10] [Fai12])
224	   has shown that a TCP sender using CWV is able to use available
225	   capacity on a shared path after an idle period.  This can benefit
226	   variable-rate applications, especially over long delay paths, when
227	   compared to the slow-start restart specified by standard TCP.
228	   However, CWV would only benefit an application if the idle period
229	   were less than several Retransmission Time Out (RTO) intervals
230	   [RFC6298], since the behaviour would otherwise be the same as for
231	   standard TCP, which resets the cwnd to the TCP Restart Window after
232	   this period.

234	   To enable better performance for variable-rate applications with TCP,
235	   some operating systems have chosen to support non-standard methods,
236	   or applications have resorted to "padding" streams to maintain their
237	   sending rate when they have no data to transmit.  Although
238	   transmitting redundant data across a network path provides good
239	   evidence that the path can sustain data at the offered rate, padding
240	   also consumes network capacity and reduces the opportunity for
241	   congestion-free statistical multiplexing.  For variable-rate flows,
242	   the benefits of statistical multiplexing can be significant and it is
243	   therefore a goal to find a viable alternative to padding streams.

245	   Experience with [RFC2861] suggests that although the CWV method
246	   benefited the network in a rate-limited scenario (reducing the
247	   probability of network congestion), the behaviour was too
248	   conservative for many common rate-limited applications.  This
249	   mechanism did not therefore offer the desirable increase in
250	   application performance for rate-limited applications and it is
251	   unclear whether applications actually use this mechanism in the
252	   general Internet.

254	   It is therefore concluded that CWV, as defined in [RFC2861], was
255	   often a poor solution for many rate-limited applications.  It had the
256	   correct motivation, but had the wrong approach to solving this
257	   problem.

259	3.  Terminology

261	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
262	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
263	   document are to be interpreted as described in [RFC2119].

265	   The document assumes familiarity with the terminology of TCP
266	   congestion control [RFC5681].

268	   The following terminology is used in this document:

270	   cwnd-limited: A TCP flow that has sent the maximum number of segments
271	   permitted by the cwnd, where the application utilises the allowed
272	   sending rate (see Section 4.5.2).

274	   pipeACK sample: A measure of the volume of data acknowledged by the
275	   network within an RTT.

277	   pipeACK variable: A variable that measures the available capacity
278	   using the set of pipeACK samples.

280	   pipeACK Sampling Period: The maximum period that a measured pipeACK
281	   sample may influence the pipeACK variable.

283	   Non-validated phase: The phase where the cwnd reflects a previous
284	   measurement of the available path capacity.

286	   Non-validated period, NVP: The maximum period for which cwnd is
287	   preserved in the non-validated phase.

289	   Rate-limited: A TCP flow that does not consume more than one half of
290	   cwnd, and hence operates in the non-validated phase.  This includes
291	   periods when an application is either idle or chooses to send at a
292	   rate less than the maximum permitted by the cwnd.

294	   Validated phase: The phase where the cwnd reflects a current estimate
295	   of the available path capacity.

297	4.  A New Congestion Window Validation method

299	   This section proposes an update to the TCP congestion control
300	   behaviour during a rate-limited interval.  This new method
301	   intentionally does not differentiate between times when the sender
302	   has become idle or chooses to send at a rate less than the maximum
303	   allowed by the cwnd.

305	   The period where actual usage is less than allowed by cwnd is named
306	   the non-validated phase.  The update allows an application in the
307	   non-validated phase to resume transmission at a previous rate without
308	   incurring the delay of slow-start.  However, if the TCP sender
309	   experiences congestion using the preserved cwnd, it is required to
310	   immediately reset the cwnd to an appropriate value specified by the
311	   method.  If a sender does not take advantage of the preserved cwnd
312	   within the NVP, the value of cwnd is reduced, ensuring the value
313	   better reflects the capacity that was recently actually used.

315	   It is expected that this update will satisfy the requirements of many
316	   rate-limited applications and at the same time provide an appropriate
317	   method for use in the Internet.  Some applications use dummy packets
318	   (aka "padding") to maintain a sending rate when an application has
319	   no data to send.  Although this ensures the path continues to
320	   support the rate permitted by the cwnd, it wastes network capacity
321	   sending useless data.  New-CWV reduces this incentive for an
322	   application to send data simply to keep transport congestion state.

324	   The method is specified in the following subsections and is expected
325	   to encourage applications and TCP stacks to use standards-based
326	   congestion control methods.  It may also encourage the use of long-
327	   lived connections where this offers benefit (such as persistent
328	   http).

330	4.1.  Initialisation

332	   A sender starts a TCP connection in the validated phase and
333	   initialises the pipeACK variable to the "undefined" value.  This
334	   value inhibits use of the value in cwnd calculations.

336	4.2.  Estimating the validated capacity supported by a path

338	   [RFC6675] defines a variable, FlightSize, that indicates the
339	   instantaneous amount of data that has been sent, but not cumulatively
340	   acknowledged.  In this method a new variable "pipeACK" is introduced
341	   to measure the acknowledged size of the network pipe.  This is used
342	   to determine if the sender has validated the cwnd. pipeACK differs
343	   from FlightSize in that it is evaluated over a window of acknowledged
344	   data, rather than reflecting the amount of data outstanding.

346	   A sender determines a pipeACK sample by measuring the volume of data
347	   that was acknowledged by the network over the period of a measured
348	   Round Trip Time (RTT).  Using the variables defined in [RFC6675], a
349	   value could be measured by caching the value of HighACK and after one
350	   RTT measuring the difference between the cached HighACK value and the
351	   current HighACK value.  Other equivalent methods may be used.
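
   The HighACK-caching approach described above can be sketched as
   follows.  This is an illustrative sketch only, not part of the
   specification: the class name PipeAckSampler and its method names
   are hypothetical, timestamps are assumed to be in seconds, and
   sequence-number wraparound is ignored for brevity.

```python
class PipeAckSampler:
    """Takes one pipeACK sample per measured RTT from the cumulative ACK point.

    Hypothetical helper: names are illustrative, not taken from the draft.
    Sequence-number wraparound is deliberately not handled.
    """

    def __init__(self, high_ack: int, now: float) -> None:
        self.cached_high_ack = high_ack  # HighACK cached at start of interval
        self.interval_start = now        # time at which HighACK was cached

    def on_ack(self, high_ack: int, now: float, rtt: float):
        """Return a pipeACK sample in bytes once one RTT has elapsed, else None."""
        if now - self.interval_start < rtt:
            return None                  # still inside the current RTT interval
        sample = high_ack - self.cached_high_ack
        self.cached_high_ack = high_ack  # start the next measurement interval
        self.interval_start = now
        return sample
```

   A caller would invoke on_ack() for each received ACK and feed any
   non-None result into the pipeACK variable.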

353	   A sender is not required to continuously update the pipeACK variable
354	   after each received ACK, but SHOULD perform a pipeACK sample at least
355	   once per RTT when it has sent unacknowledged segments.

357	   The pipeACK variable MAY consider multiple pipeACK samples over the
358	   pipeACK Sampling Period.  The value of the pipeACK variable MUST NOT
359	   exceed the maximum (highest value) within the sampling period.  This
360	   specification defines the pipeACK Sampling Period as Max(3*RTT, 1
361	   second).  This period enables a sender to compensate for large
362	   fluctuations in the sending rate, where there may be pauses in
363	   transmission, and allows the pipeACK variable to reflect the largest
364	   recently measured pipeACK sample.

366	   When no measurements are available, the pipeACK variable is set to
367	   the "undefined value".  This value is used to inhibit entering the
368	   non-validated phase until the first new measurement of a pipeACK
369	   sample.
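
   One way to maintain the pipeACK variable is to keep the samples seen
   within the pipeACK Sampling Period and report their maximum.  The
   sketch below assumes this maximum filter (the text above permits,
   but does not require, it); the class name PipeAck and the use of
   None for the "undefined" value are illustrative choices.

```python
from collections import deque

UNDEFINED = None  # illustrative stand-in for the "undefined" pipeACK value


class PipeAck:
    """Tracks pipeACK as the largest sample within the sampling period."""

    def __init__(self) -> None:
        self.samples = deque()  # (timestamp in seconds, sample in bytes)

    def add_sample(self, now: float, sample_bytes: int) -> None:
        self.samples.append((now, sample_bytes))

    def value(self, now: float, rtt: float):
        """Return the pipeACK variable, or UNDEFINED if no sample is current."""
        period = max(3 * rtt, 1.0)  # pipeACK Sampling Period: Max(3*RTT, 1 second)
        # Discard samples that are older than the sampling period.
        while self.samples and now - self.samples[0][0] > period:
            self.samples.popleft()
        if not self.samples:
            return UNDEFINED
        return max(size for _, size in self.samples)
```

   With an RTT of 100 ms the sampling period is 1 second, so a sample
   taken at t=0 no longer influences the variable at t=1.25.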

371	   The pipeACK variable MUST NOT be updated during TCP Fast Recovery.
372	   That is, the sender stops collecting pipeACK samples during loss
373	   recovery.  The method RECOMMENDS that the TCP SACK option [RFC2018]
374	   is enabled and the method defined in [RFC6675] is used to recover
375	   missing segments.  This allows the sender to more accurately
376	   determine the number of missing bytes during the loss recovery phase,
377	   and using this method will result in a more appropriate cwnd
378	   following loss.

380	4.3.  Preserving cwnd during a rate-limited period.

382	   The updated method creates a new TCP sender phase that captures
383	   whether the cwnd reflects a validated or non-validated value.  The
384	   phases are defined as:

386	   o  Validated phase: pipeACK >=(1/2)*cwnd, or pipeACK is undefined.
387	      This is the normal phase, where cwnd is expected to be an
388	      approximate indication of the capacity currently available along
389	      the network path, and the standard methods are used to increase
390	      cwnd (currently [RFC5681]).

392	   o  Non-validated phase: pipeACK <(1/2)*cwnd.  This is the phase where
393	      the cwnd has a value based on a previous measurement of the
394	      available capacity, and the usage of this capacity has not been
395	      validated in the pipeACK Sampling Period.  That is, when it is not
396	      known whether the cwnd reflects the currently available capacity
397	      along the network path.  The mechanisms to be used in this phase
398	      seek to determine a safe value for cwnd and an appropriate
399	      reaction to congestion.

401	   Note: A threshold is needed to determine whether a sender is in the
402	   validated or non-validated phase.  We start by noting that a standard
403	   TCP sender in slow-start is permitted to double its FlightSize from
404	   one RTT to the next.  This motivated the choice of a threshold value
405	   of 1/2.  This threshold ensures a sender does not further increase
406	   the cwnd as long as the FlightSize is less than (1/2*cwnd).
407	   Furthermore, a sender with a FlightSize less than (1/2*cwnd) may in
408	   the next RTT be permitted by the cwnd to send at a rate that more
409	   than doubles the FlightSize, and hence this case needs to be regarded
410	   as non-validated and a sender therefore needs to employ additional
411	   mechanisms while in this phase.
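
   The phase definitions in this section reduce to a single comparison.
   The following sketch assumes pipeACK and cwnd are expressed in bytes
   and that None represents the "undefined" pipeACK value; the function
   name is hypothetical.

```python
def sender_phase(pipe_ack, cwnd: int) -> str:
    """Classify the sender using the 1/2 threshold defined in this section."""
    # Undefined pipeACK, or pipeACK >= (1/2)*cwnd, means the cwnd is validated.
    if pipe_ack is None or 2 * pipe_ack >= cwnd:
        return "validated"
    return "non-validated"
```

   For example, with cwnd = 20000 bytes, a pipeACK of 10000 bytes is
   validated, whereas a pipeACK of 9999 bytes is non-validated.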

413	4.4.  TCP congestion control during the non-validated phase

415	   A TCP sender MUST enter the non-validated phase when the pipeACK is
416	   less than (1/2)*cwnd.

418	   A TCP sender that enters the non-validated phase SHOULD preserve the
419	   cwnd (i.e., this neither grows nor reduces while the sender remains
420	   in this phase).  If the sender receives an indication of congestion
421	   (loss or Explicit Congestion Notification, ECN, mark [RFC3168]) it
422	   uses the method described below.  The phase is concluded after a
423	   fixed period of time (the NVP, as explained in Section 4.4.3) or when
424	   the sender transmits sufficient data so that pipeACK > (1/2)*cwnd
425	   (i.e. the sender is no longer rate-limited).

427	   The behaviour in the non-validated phase is specified as:

429	   o  A sender determines whether to increase the cwnd based upon
430	      whether it is cwnd-limited (see Section 4.5.2):

434	      *  A sender that is cwnd-limited MAY use the standard TCP method
435	         to increase cwnd (i.e. a TCP sender that fully utilises the
436	         cwnd is permitted to increase cwnd for each received ACK using
437	         standard methods).

439	      *  A sender that is not cwnd-limited MUST NOT increase the cwnd
440	         when ACK packets are received in this phase.

442	   o  If the sender receives an indication of congestion while in the
443	      non-validated phase (i.e., detects loss, or an ECN mark), the
444	      sender MUST exit the non-validated phase (reducing the cwnd as
445	      defined in Section 4.4.1).

447	   o  If the Retransmission Time Out (RTO) expires while in the non-
448	      validated phase, the sender MUST exit the non-validated phase.  It
449	      then resumes using the standard TCP RTO mechanism [RFC5681].

451	   o  A sender with a pipeACK variable greater than (1/2)*cwnd SHOULD
452	      enter the validated phase.  (A rate-limited sender will not
453	      normally be impacted by whether it is in a validated or non-
454	      validated phase, since it will normally not consume the entire
455	      cwnd.  However a change to the validated phase will release the
456	      sender from constraints on the growth of cwnd, and restore the use
457	      of the standard congestion response.)

459	   The cwnd-limited behaviour may be triggered during a transient
460	   condition that occurs when a sender is in the non-validated phase and
461	   receives an ACK that acknowledges received data, the cwnd was fully
462	   utilised, and more data is awaiting transmission than may be sent
463	   with the current cwnd.  The sender is then allowed to use the
464	   standard method to increase the cwnd.  (Note, if the sender succeeds
465	   in sending these new segments, the updated cwnd and pipeACK variables
466	   will eventually result in a transition to the validated phase.)
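
   The rules above can be summarised as two small decision functions.
   This is a sketch under stated assumptions: the function names are
   hypothetical, the cwnd-limited test itself (Section 4.5.2) is taken
   as an input, and expiry of the NVP (Section 4.4.3) is not modelled.

```python
def may_increase_cwnd(non_validated: bool, cwnd_limited: bool) -> bool:
    """Return True if the standard cwnd increase may be applied on this ACK."""
    if not non_validated:
        return True       # validated phase: standard methods [RFC5681] apply
    return cwnd_limited   # non-validated phase: only if cwnd-limited


def leaves_non_validated_phase(congestion: bool, rto_expired: bool,
                               pipe_ack: int, cwnd: int) -> bool:
    """Return True if an event-driven exit condition listed above is met."""
    # Congestion indication, RTO expiry, or pipeACK above (1/2)*cwnd.
    return congestion or rto_expired or 2 * pipe_ack > cwnd
```

   Note that a congestion or RTO exit also changes cwnd, whereas an
   exit because pipeACK exceeds (1/2)*cwnd leaves cwnd unchanged.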

468	4.4.1.  Response to congestion in the non-validated phase

470	   Reception of congestion feedback while in the non-validated phase is
471	   interpreted as an indication that it was inappropriate for the sender
472	   to use the preserved cwnd.  The sender is therefore required to
473	   quickly reduce the rate to avoid further congestion.  Since the cwnd
474	   does not have a validated value, a new cwnd value must be selected
475	   based on the utilised rate.

477	   A sender that detects a packet-drop, or receives an indication of an
478	   ECN marked packet, MUST record the current FlightSize in the variable
479	   LossFlightSize and MUST calculate a safe cwnd for loss recovery using
480	   the method below:

482	           cwnd = (Max(pipeACK,LossFlightSize))/2.

484	   The pipeACK value is not updated during loss recovery (Section 4.2).
485	   If there is a valid pipeACK value, the new cwnd is adjusted to
486	   reflect that a non-validated cwnd may be larger than the actual
487	   FlightSize, or recently used FlightSize (recorded in pipeACK).  The
488	   updated cwnd therefore prevents overshoot by a sender significantly
489	   increasing its transmission rate during the recovery period.

491	   At the end of the recovery phase, the TCP sender MUST reset the cwnd
492	   using the method below:

494	           cwnd = (Max(pipeACK,LossFlightSize) - R)/2.

496	   Where R is the volume of data that was retransmitted during the
497	   recovery phase.

499	   If the sender implements a method that allows it to identify the
500	   number of ECN-marked segments within a window that were observed by
501	   the receiver, the sender SHOULD use the method above, further
502	   reducing R by the number of marked segments.

504	   After completing the loss recovery phase, the sender MUST re-
505	   initialise the pipeACK variable to the "undefined" value.  This
506	   ensures that standard TCP methods are used immediately after
507	   completing loss recovery until a new pipeACK value can be determined.

509	   ssthresh is adjusted using the standard TCP method.
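
   The two cwnd formulas in this section can be sketched as below.  All
   quantities are assumed to be in bytes and integer division is used.
   Falling back to LossFlightSize alone when pipeACK is undefined is an
   assumption of this sketch, as are the function names.

```python
def _base(pipe_ack, loss_flight_size: int) -> int:
    # Assumption: with no valid pipeACK, only LossFlightSize is used.
    if pipe_ack is None:
        return loss_flight_size
    return max(pipe_ack, loss_flight_size)


def cwnd_on_congestion(pipe_ack, loss_flight_size: int) -> int:
    """cwnd for loss recovery: Max(pipeACK, LossFlightSize) / 2."""
    return _base(pipe_ack, loss_flight_size) // 2


def cwnd_after_recovery(pipe_ack, loss_flight_size: int,
                        retransmitted: int) -> int:
    """cwnd at the end of recovery: (Max(pipeACK, LossFlightSize) - R) / 2."""
    return (_base(pipe_ack, loss_flight_size) - retransmitted) // 2
```

   As a worked case, a sender with cwnd = 20000, pipeACK = 9000,
   LossFlightSize = 4000 and R = 0 ends recovery with cwnd = 4500,
   which is below one quarter of the cwnd at the time of congestion.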

511	   Note: The adjustment by reducing cwnd by the volume of data not sent
512	   (R) follows the method proposed for Jump Start [Liu07].  The
513	   inclusion of the term R makes the adjustment more conservative than
514	   standard TCP.  This is required, since a sender in the non-validated
515	   state may increase the rate more than a standard TCP would have done
516	   relative to what was sent in the last RTT (i.e., more than doubled
517	   the number of segments in flight relative to what it sent in the last
518	   RTT).  The additional reduction after congestion is beneficial when
519	   the LossFlightSize has significantly overshot the available path
520	   capacity incurring significant loss (e.g. following a change of path
521	   characteristics or when additional traffic has taken a larger share
522	   of the network bottleneck during a period when the sender transmits
523	   less).

525	   Note: The pipeACK value is only valid during a non-validated phase,
526	   and therefore does not exceed cwnd/2.  If LossFlightSize and R were
527	   small, then this can result in the final cwnd after loss recovery
528	   being not more than 1/4 of the cwnd on detection of congestion.  This
529	   reduction is conservative compared to standard TCP.  pipeACK is reset
530	   to undefined after completing loss recovery.  Subsequent updates to
531	   cwnd do not therefore reflect pipeACK history before any congestion
532	   event.

534	4.4.2.  Sender burst control during the non-validated phase

536	   TCP congestion control allows a sender to accumulate a cwnd that
537	   would allow it to send a burst of segments with a total size up to
538	   the difference between the FlightSize and cwnd.  Such bursts can
539	   impact other flows that share a network bottleneck and/or may induce
540	   congestion when buffering is limited.

542	   Various methods have been proposed to control the sender burstiness
543	   [Hug01], [All05].  For example, TCP can limit the number of new
544	   segments it sends per received ACK.  This is effective when a flow of
545	   ACKs is received, but cannot be used to control a sender that has
546	   not sent appreciable data in the previous RTT [All05].

548	   This document recommends using a method to avoid line-rate bursts
549	   after an idle or rate-limited interval when there is less reliable
550	   information about the capacity of the network path: A TCP sender in
551	   the non-validated phase SHOULD control the maximum burst size, e.g.
552	   using a rate-based pacing algorithm in which a sender paces out the
553	   cwnd over its estimate of the RTT, or some other method, to prevent
554	   many segments being transmitted contiguously at line-rate.  The most
555	   appropriate method(s) to implement pacing depend on the design of the
556	   TCP/IP stack, speed of interface and whether hardware support (such
557	   as TCP Segment Offload, TSO) is used.  The present document does not
558	   recommend any specific method.
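
As an illustration of the rate-based pacing option mentioned above, the following is a minimal sketch, not a recommended method. It assumes cwnd and the segment size are expressed in bytes and that a smoothed RTT estimate is available; the function names and parameters are hypothetical.

```python
def pacing_interval(cwnd_bytes: int, mss_bytes: int, srtt_s: float) -> float:
    """Gap in seconds between segments when cwnd is spread over one RTT."""
    if cwnd_bytes <= 0 or mss_bytes <= 0 or srtt_s <= 0:
        raise ValueError("cwnd, MSS and RTT must all be positive")
    # Number of full-sized segments that cwnd permits (at least one).
    segments = max(1, cwnd_bytes // mss_bytes)
    return srtt_s / segments


def send_times(start_s: float, cwnd_bytes: int, mss_bytes: int,
               srtt_s: float) -> list:
    """Transmission times for one cwnd of segments, evenly spaced."""
    gap = pacing_interval(cwnd_bytes, mss_bytes, srtt_s)
    segments = max(1, cwnd_bytes // mss_bytes)
    return [start_s + i * gap for i in range(segments)]
```

With a cwnd of ten 1460-byte segments and a 100 ms RTT estimate, this sketch spaces transmissions about 10 ms apart instead of sending all ten segments contiguously at line rate.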

560	4.4.3.  Adjustment at the end of the non-validated phase

562	   An application that remains in the non-validated phase for a period
563	   greater than the NVP is required to adjust its congestion control
564	   state.  If the sender exits the non-validated phase after this
565	   period, it MUST update the ssthresh:

567	         ssthresh = max(ssthresh, 3*cwnd/4).

569	   (This adjustment of ssthresh ensures that the sender records that it
570	   has safely sustained the present rate.  The change is beneficial to
571	   rate-limited flows that encounter occasional congestion, and could
572	   otherwise suffer an unwanted additional delay in recovering the
573	   sending rate.)

575	   The sender MUST then update cwnd to be not greater than:

577	            cwnd = max((1/2)*cwnd, IW).

579	   Where IW is the appropriate TCP initial window, used by the TCP
580	   sender (e.g. [RFC5681]).

582	   Note: This adjustment ensures that the sender responds conservatively
583	   after remaining in the non-validated phase for more than the non-
584	   validated period.  In this case, it reduces the cwnd by a factor of
585	   two from the preserved value.  This adjustment is helpful when flows
586	   accumulate but do not use a large cwnd, and seeks to mitigate the
587	   impact when these flows later resume transmission.  This could for
588	   instance mitigate the impact if multiple high-rate application flows
589	   were to become idle over an extended period of time and then were
590	   simultaneously awakened by some external event.
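
The two updates in this section can be sketched as follows. This is an illustrative example only: it assumes cwnd, ssthresh and IW are held as integer byte counts and uses integer division, and the function name is hypothetical.

```python
def adjust_after_nvp(cwnd: int, ssthresh: int, iw: int) -> tuple:
    """Return (cwnd, ssthresh) after the non-validated period expires."""
    # Record that the present rate was safely sustained.
    new_ssthresh = max(ssthresh, (3 * cwnd) // 4)
    # Halve the preserved cwnd, but never go below the initial window.
    new_cwnd = max(cwnd // 2, iw)
    return new_cwnd, new_ssthresh
```

Note that ssthresh is computed from the cwnd value held before the reduction, matching the order of the two steps above.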

592	4.5.  Examples of Implementation

594	   This section provides informative examples of implementation methods.
595	   Implementations may choose to use other methods that comply with the
596	   normative requirements.

598	4.5.1.  Implementing the pipeACK measurement

600	   A pipeACK sample may be measured once each RTT.  This reduces the
601	   sender processing burden for calculating after each acknowledgement
602	   and also reduces storage requirements at the sender.

604	   Since application behaviour can be bursty using CWV, it may be
605	   desirable to implement a maximum filter to accumulate the measured
606	   values so that the pipeACK variable records the largest pipeACK
607	   sample within the pipeACK Sampling Period.  One simple way to
608	   implement this is to divide the pipeACK Sampling Period into several
609	   (e.g. 5) equal length measurement periods.  The sender then records
610	   the start time for each measurement period and the highest measured
611	   pipeACK sample.  At the end of the measurement period, any
612	   measurement(s) that are older than the pipeACK Sampling Period are
613	   discarded.  The pipeACK variable is then assigned the largest of the
614	   set of the highest measured values.

616	     +----------+----------+           +----------+---......
617	     | Sample A | Sample B | No        | Sample C | Sample D
618	     |          |          | Sample    |          |
619	     | |\ 5     |          |           |          |
620	     | | |      |          |           |  /\ 4    |
621	     | | |      |  |\ 3    |           |  | \     |
622	     | | \      | |  \---  |           |  /  \    |   /| 2
623	     |/   \------|       - |           | /    \------/ \...
624	     +----------+---------\+----/ /----+/---------+-------------> Time

626	     <------------------------------------------------|
627	                         Sampling Period          Current Time

629	   Figure 1: Example of measuring pipeACK samples

631	   Figure 1 shows an example of how measurement samples may be
632	   collected.  At the time represented by the figure new samples are
633	   being accumulated into Sample D.  Three previous samples also fall
634	   within the pipeACK Sampling Period: A, B, and C.  There was also a
635	   period of inactivity between samples B and C during which no
636	   measurements were taken.  The current value of the pipeACK variable
637	   will be 5, the maximum across all samples.

639	   After one further measurement period, Sample A will be discarded,
640	   since it then is older than the pipeACK Sampling Period and the
641	   pipeACK variable will be recalculated.  Its value will be the larger
642	   of Sample C or the final value accumulated in Sample D.
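
The maximum filter described above could be sketched as shown below. This is one possible arrangement, not the authors' implementation: the class and method names are hypothetical, time is passed in explicitly as seconds, and None is used here to stand for the "undefined" pipeACK value.

```python
from collections import deque


class PipeAckFilter:
    """Keeps the largest pipeACK sample seen within the Sampling Period."""

    def __init__(self, sampling_period_s: float, num_periods: int = 5):
        self.sampling_period = sampling_period_s
        self.period_length = sampling_period_s / num_periods
        # Each entry is [start_time, highest_sample] for one measurement period.
        self.periods = deque()

    def add_sample(self, now: float, sample: int) -> None:
        last = self.periods[-1] if self.periods else None
        if last is None or now - last[0] >= self.period_length:
            # Start a new measurement period at the current time.
            self.periods.append([now, sample])
        else:
            # Still in the current period: keep only its highest sample.
            last[1] = max(last[1], sample)
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Discard measurements older than the pipeACK Sampling Period.
        while self.periods and now - self.periods[0][0] > self.sampling_period:
            self.periods.popleft()

    def value(self, now: float):
        """Current pipeACK, or None when no recent sample exists."""
        self._expire(now)
        if not self.periods:
            return None
        return max(p[1] for p in self.periods)
```

Feeding this sketch the samples of Figure 1 (5, 3, a gap, 4, then 2, one per measurement period) gives a pipeACK of 5, falling to 4 once Sample A ages out.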

644	   Note that the pipeACK Sampling Period and the NVP period do not
645	   necessarily require a new timer to be implemented.  An alternative is
646	   to record a timestamp when the sender enters the NVP.  Each time a
647	   sender transmits a new segment, this timestamp may be used to
648	   determine if the NVP period has expired.  If the period expires, the
649	   sender may take into account how many units of the NVP period have
650	   passed and make one reduction (as defined in Section 4.4.3) for each
651	   NVP period.
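
A timer-free check of this kind might look like the sketch below. It assumes a monotonic clock in seconds, byte-valued cwnd, ssthresh and IW, and a 300-second NVP; all names are hypothetical, and counting whole elapsed periods with floor division is an illustrative choice.

```python
NVP_SECONDS = 300.0  # non-validated period (five minutes)


def nvp_periods_elapsed(entered_at: float, now: float,
                        nvp: float = NVP_SECONDS) -> int:
    """Whole NVP periods that have passed since the recorded timestamp."""
    elapsed = now - entered_at
    return int(elapsed // nvp) if elapsed > 0 else 0


def on_transmit(cwnd: int, ssthresh: int, iw: int,
                entered_at: float, now: float) -> tuple:
    """Apply one Section 4.4.3 reduction per expired NVP period."""
    for _ in range(nvp_periods_elapsed(entered_at, now)):
        ssthresh = max(ssthresh, (3 * cwnd) // 4)
        cwnd = max(cwnd // 2, iw)
    return cwnd, ssthresh
```

The check runs only when the sender is about to transmit a new segment, so no separate timer is armed while the connection is idle.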

653	4.5.2.  Implementing detection of the cwnd-limited condition

655	   A method is required to detect the cwnd-limited condition (see
656	   Section 4.4).  This is used to detect a condition where a sender in
657	   the non-validated phase receives an ACK, but the size of cwnd
658	   prevents sending more new data.

660	   In simple terms this condition is true only when the TCP sender's
661	   FlightSize is equal to or larger than the cwnd.  However, an
662	   implementation must consider other constraints on the way in which
663	   the cwnd variable is used, for instance the need to support methods such
664	   as the Nagle Algorithm and TCP Segment Offload (TSO).  This can
665	   result in a sender becoming cwnd-limited when the cwnd is nearly,
666	   rather than completely, equal to the FlightSize.
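
A simple test for this condition is sketched below. The slack parameter is an assumption introduced for illustration (for example, one segment size), to model a sender that is treated as cwnd-limited when FlightSize is nearly equal to cwnd; the text above does not prescribe any particular value.

```python
def is_cwnd_limited(flight_size: int, cwnd: int, slack: int = 0) -> bool:
    """True when cwnd, rather than the application, limits sending.

    flight_size and cwnd are in bytes. slack is a hypothetical tolerance
    that allows for effects such as the Nagle Algorithm or TSO leaving
    FlightSize slightly below cwnd.
    """
    return flight_size + slack >= cwnd
```

With slack set to zero this reduces to the plain comparison of FlightSize against cwnd.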

668	5.  Determining a safe period to preserve cwnd

670	   This section documents the rationale for selecting the maximum period
671	   that cwnd may be preserved, known as the non-validated period, NVP.

673	   Limiting the period that cwnd may be preserved avoids undesirable
674	   side effects that would result if the cwnd were to be kept
675	   unnecessarily high for an arbitrarily long period, which was a part of
676	   the problem that CWV originally attempted to address.  The period a
677	   sender may safely preserve the cwnd is a function of the period that
678	   a network path is expected to sustain the capacity reflected by cwnd.
679	   There is no ideal choice for this time.

681	   A period of five minutes was chosen for this NVP.  This is a
682	   compromise that was larger than the idle intervals of common
683	   applications, but not sufficiently larger than the period for which
684	   the capacity of an Internet path may commonly be regarded as stable.
685	   The capacity of wired networks is usually relatively stable for
686	   periods of several minutes, and load stability increases with the
687	   capacity.  This suggests that cwnd may be preserved for at least a
688	   few minutes.

690	   There are cases where the TCP throughput exhibits significant
691	   variability over a time less than five minutes.  Examples could
692	   include wireless topologies, where TCP rate variations may fluctuate
693	   on the order of a few seconds as a consequence of medium access
694	   protocol instabilities.  Mobility changes may also impact TCP
695	   performance over short time scales.  Senders that observe such rapid
696	   changes in the path characteristic may also experience increased
697	   congestion with the new method; however, such variation would likely
698	   also impact TCP's behaviour when supporting interactive and bulk
699	   applications.

701	   Routing algorithms may modify the network path, disrupting the RTT
702	   measurement and changing the capacity available to a TCP connection;
703	   however, such changes do not often occur within a time frame of a few
704	   minutes.

706	   The value of five minutes is therefore expected to be sufficient for
707	   most current applications.  Simulation studies (e.g. [Bis11]) also
708	   suggest that for many practical applications, the performance using
709	   this value will not be significantly different to that observed using
710	   a non-standard method that does not reset the cwnd after idle.

712	   Finally, other TCP sender mechanisms have used a 5 minute timer, and
713	   there could be simplifications in some implementations by reusing the
714	   same interval.  TCP defines a default user timeout of 5 minutes
715	   [RFC0793], i.e. how long transmitted data may remain unacknowledged
716	   before a connection is forcefully closed.

718	6.  Security Considerations

720	   General security considerations concerning TCP congestion control are
721	   discussed in [RFC5681].  This document describes an algorithm that
722	   updates one aspect of the congestion control procedures, and so the
723	   considerations described in RFC 5681 also apply to this algorithm.

725	7.  IANA Considerations

727	   There are no IANA considerations.

729	8.  Acknowledgments

731	   The authors acknowledge the contributions of Dr I Biswas and Mr Ziaul
732	   Hossain in supporting the evaluation of CWV and for their help in
733	   developing the mechanisms proposed in this draft.  We also
734	   acknowledge comments received from the Internet Congestion Control
735	   Research Group, in particular Yuchung Cheng, Mirja Kuehlewind, Joe
736	   Touch, and Mark Allman.  This work was part-funded by the European
737	   Community under its Seventh Framework Programme through the Reducing
738	   Internet Transport Latency (RITE) project (ICT-317700).

740	9.  Author Notes

742	   RFC-Editor note: please remove this section prior to publication.

744	9.1.  Other related work

746	   RFC-Editor note: please remove this section prior to publication.

748	   There are several issues to be discussed more widely:

750	      o There are potential interactions with the Experimental update in
751	      [RFC6928] that raises the TCP initial Window to ten segments.  Do
752	      these cases need to be elaborated?

754	         This relates to the Experimental specification for increasing
755	         the TCP IW defined in RFC 6928.

757	         The two methods have different functions and different response
758	         to loss/congestion.

760	         RFC 6928 proposes an experimental update to TCP that would
761	         increase the IW to ten segments.  This would allow faster
762	         opening of the cwnd, and also a large (same size) restart
763	         window.  This approach is based on the assumption that many
764	         forward paths can sustain bursts of up to ten segments without
765	         (appreciable) loss.  Such a significant increase in cwnd must
766	         be matched with an equally large reduction of cwnd if loss/
767	         congestion is detected, and such a congestion indication is
768	         likely to require future use of IW=10 to be disabled for this
769	         path for some time.  This guards against the unwanted behaviour
770	         of a series of short flows continuously flooding a network path
771	         without network congestion feedback.

773	         In contrast, this document proposes an update with a rationale
774	         that relies on recent previous path history to select an
775	         appropriate cwnd after restart.

777	         The behaviour differs in three ways:

779	         1) For applications that send little initially, new-cwv may
780	         constrain more than RFC 6928, but would not require the
781	         connection to reset any path information when a restart
782	         incurred loss.  In contrast, new-cwv would allow the TCP
783	         connection to preserve the cached cwnd; any loss would impact
784	         cwnd, but not impact other flows.

786	         2) For applications that utilise more capacity than provided by
787	         a cwnd of 10 segments, this method would permit a larger
788	         restart window compared to a restart using the method in RFC
789	         6928.  This is justified by the recent path history.

791	         3) new-CWV is intended also to be used for rate-limited
792	         applications, where the application sends, but does not seek to
793	         fully utilise the cwnd.  In this case, new-cwv constrains the
794	         cwnd to that justified by the recent path history.  The
795	         performance trade-offs are hence different, and it would be
796	         possible to enable new-cwv when also using the method in RFC
797	         6928, and yield benefits.

799	      o There is potential overlap with the Laminar proposal (draft-
800	      mathis-tcpm-tcp-laminar)

802	         The current draft was intended as a standards-track update to
803	         TCP, rather than a new transport variant.  At least, it would
804	         be good to understand how the two interact and whether there is
805	         a possibility of a single method.

807	      o There is potential performance loss in loss of a short burst
808	      (off list with M Allman)

810	         A sender can transmit several segments then become idle.  If
811	         the first segments are all ACK'ed the ssthresh collapses to a
812	         small value (no new data is sent by the idle sender).  Loss of
813	         the later data results in congestion (e.g. maybe a RED drop or
814	         some other cause, rather than the maximum rate of this flow).
815	         When the sender performs loss recovery it may have an
816	         appreciable pipeACK and cwnd, but a very low FlightSize - the
817	         Standard algorithm results in an unusually low cwnd ((1/2)*
818	         FlightSize).

820	         A constant rate flow would have maintained a FlightSize
821	         appropriate to pipeACK (cwnd if it is a bulk flow).

823	         This could be fixed by adding a new state variable?  It could
824	         also be argued this is a corner case (e.g. loss of only the
825	         last segments would have resulted in RTO), but the impact could be
826	         significant.

828	      o There is potential interaction with TCP Control Block Sharing (M
829	      Welzl)

831	         An application that is non-validated can accumulate a cwnd that
832	         is larger than the actual capacity.  Is this a fair value to
833	         use in TCB sharing?

835	         We propose that TCB sharing should use the pipeACK in place of
836	         cwnd when a TCP sender is in the Non-validated phase.  This
837	         value better reflects the capacity that the flow has utilised
838	         in the network path.

840	9.2.  Revision notes

842	   RFC-Editor note: please remove this section prior to publication.

844	   Draft 03 was submitted to ICCRG to receive comments and feedback.

846	   Draft 04 contained the first set of clarifications after feedback:

848	   o  Changed name to application limited and used the term rate-limited
849	      in all places.

851	   o  Added justification and many minor changes suggested on the list.

853	   o  Added text to tie-in with more accurate ECN marking.

855	   o  Added ref to Hug01

857	   Draft 05 contained various updates:

859	   o  New text to redefine how to measure the acknowledged pipe,
860	      differentiating this from the FlightSize, and hence avoiding
861	      previous issues with infrequent large bursts of data not being
862	      validated.  A key new feature is that pipeACK only triggers
863	      leaving the NVP after the size of the pipe has been acknowledged.
864	      This removed the need for hysteresis.

866	   o  Reduction values were changed to 1/2, following analysis of
867	      suggestions from ICCRG.  This also sets the "target" cwnd as twice
868	      the used rate for non-validated case.

870	   o  Introduced a symbolic name (NVP) to denote the 5 minute period.

872	   Draft 06 contained various updates:

874	   o  Required reset of pipeACK after congestion.

876	   o  Added comment on the effect of congestion after a short burst (M.
877	      Allman).

879	   o  Correction of minor Typos.

881	   WG draft 00 contained various updates:

883	   o  Updated initialisation of pipeACK to maximum value.

885	   o  Added note on intended status still to be determined.

887	   WG draft 01 contained:

889	   o  Added corrections from Richard Scheffenegger.

891	   o  Raffaello Secchi added to the mechanism, based on implementation
892	      experience.

894	   o  Removed the requirement for the method to use the TCP SACK option.

896	   o  Although it may be desirable to use SACK, this is not essential to
897	      the algorithm.

899	   o  Added the notion of the sampling period to accommodate large rate
900	      variations and ensure that the method is stable.  This algorithm
901	      to be validated through implementation.

903	   WG draft 02 contained:

905	   o  Clarified language around pipeACK variable and pipeACK sample -
906	      Feedback from Aris Angelogiannopoulos.

908	   WG draft 03 contained:

910	   o  Editorial corrections - Feedback from Anna Brunstrom.

912	   o  An adjustment to the procedure at the start and end of loss
913	      recovery to align the two equations.

915	   o  Further clarification of the "undefined" value of the pipeACK
916	      variable.

918	   WG draft 04 contained:

920	   o  Editorial corrections.

922	   o  Introduced the "cwnd-limited" term.

924	   o  An adjustment to the procedure at the start of a cwnd-limited
925	      phase - the new text is intended to ensure that new-cwv is not
926	      unnecessarily more conservative than standard TCP when the flow is
927	      cwnd-limited.  This resolves two issues: first it prevents
928	      pathologies in which pipeACK increases slowly and erratically.  It
929	      also ensures that performance of bulk applications is not
930	      significantly impacted when using the method.

932	   o  Clearly identifies that pacing (or equivalent) is required during
933	      the NVP to control burstiness.  New section added.

935	   WG draft 05 contained:

937	   o  Clarification to first two bullets in Section 4.4 describing cwnd-
938	      limited, to explain these are really alternates to the same case.

940	   o  Section giving implementation examples was restructured to clarify
941	      there are two methods described.

943	   o  Cross References to sections updated - thanks to comments from
944	      Martin Winbjoerk and Tim Wicinski.

946	   WG draft 06 contained:

948	   o  The section giving implementation examples was restructured to
949	      clarify there are two methods described.

951	   o  Justification of design decisions.

953	   o  Re-organised text to improve clarity of argument.

955	10.  References

957	10.1.  Normative References

959	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
960	              793, September 1981.

962	   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
963	              Selective Acknowledgment Options", RFC 2018, October 1996.

965	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
966	              Requirement Levels", BCP 14, RFC 2119, March 1997.

968	   [RFC2861]  Handley, M., Padhye, J., and S. Floyd, "TCP Congestion
969	              Window Validation", RFC 2861, June 2000.

971	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
972	              of Explicit Congestion Notification (ECN) to IP", RFC
973	              3168, September 2001.

975	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
976	              Control", RFC 5681, September 2009.

978	   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
979	              and Y. Nishida, "A Conservative Loss Recovery Algorithm
980	              Based on Selective Acknowledgment (SACK) for TCP", RFC
981	              6675, August 2012.

983	10.2.  Informative References

985	   [All05]    Allman, M. and E. Blanton, "Notes on burst mitigation for
986	              transport protocols", March 2005.

988	   [Bis08]    Biswas, I. and G. Fairhurst, "A Practical Evaluation of
989	              Congestion Window Validation Behaviour, 9th Annual
990	              Postgraduate Symposium in the Convergence of
991	              Telecommunications, Networking and Broadcasting (PGNet),
992	              Liverpool, UK", June 2008.

994	   [Bis10]    Biswas, I., Sathiaseelan, A., Secchi, R., and G.
995	              Fairhurst, "Analysing TCP for Bursty Traffic, Int'l J. of
996	              Communications, Network and System Sciences, 7(3)", June
997	              2010.

999	   [Bis11]    Biswas, I., "PhD Thesis, Internet congestion control for
1000	              variable rate TCP traffic, School of Engineering,
1001	              University of Aberdeen", June 2011.

1003	   [Fai12]    Sathiaseelan, A., Secchi, R., Fairhurst, G., and I.
1004	              Biswas, "Enhancing TCP Performance to support Variable-
1005	              Rate Traffic, 2nd Capacity Sharing Workshop, ACM CoNEXT,
1006	              Nice, France, 10th December 2012.", June 2008.

1008	   [Hug01]    Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
1009	              Slow-Start Restart After Idle (Work-in-Progress)",
1010	              December 2001.

1012	   [Liu07]    Liu, D., Allman, M., Jiny, S., and L. Wang, "Congestion
1013	              Control without a Startup Phase, 5th International
1014	              Workshop on Protocols for Fast Long-Distance Networks
1015	              (PFLDnet), Los Angeles, California, USA", February 2007.

1017	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
1018	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
1019	              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

1021	   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
1022	              "Computing TCP's Retransmission Timer", RFC 6298, June
1023	              2011.

1025	   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
1026	              "Increasing TCP's Initial Window", RFC 6928, April 2013.

1028	Authors' Addresses

1030	   Godred Fairhurst
1031	   University of Aberdeen
1032	   School of Engineering
1033	   Fraser Noble Building
1034	   Aberdeen, Scotland  AB24 3UE
1035	   UK

1037	   Email: gorry@erg.abdn.ac.uk
1038	   URI:   http://www.erg.abdn.ac.uk

1040	   Arjuna Sathiaseelan
1041	   University of Aberdeen
1042	   School of Engineering
1043	   Fraser Noble Building
1044	   Aberdeen, Scotland  AB24 3UE
1045	   UK

1047	   Email: arjuna@erg.abdn.ac.uk
1048	   URI:   http://www.erg.abdn.ac.uk

1050	   Raffaello Secchi
1051	   University of Aberdeen
1052	   School of Engineering
1053	   Fraser Noble Building
1054	   Aberdeen, Scotland  AB24 3UE
1055	   UK

1057	   Email: raffaello@erg.abdn.ac.uk
1058	   URI:   http://www.erg.abdn.ac.uk