Network Working Group                                     F. Baker, Ed.
Internet-Draft                                             Cisco Systems
Obsoletes: 2309 (if approved)                         G. Fairhurst, Ed.
Intended status: Best Current Practice            University of Aberdeen
Expires: August 18, 2014                               February 14, 2014

         IETF Recommendations Regarding Active Queue Management
                   draft-ietf-aqm-recommendation-02

Abstract

   This memo presents recommendations to the Internet community concerning measures to improve and preserve Internet performance. It presents a strong recommendation for testing, standardization, and widespread deployment of active queue management (AQM) in network devices, to improve the performance of today's Internet. It also urges a concerted effort of research, measurement, and ultimate deployment of AQM mechanisms to protect the Internet from flows that are not sufficiently responsive to congestion notification.

   The note largely repeats the recommendations of RFC 2309, updated after fifteen years of experience and new research.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 18, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  The Need For Active Queue Management
   3.  Managing Aggressive Flows
   4.  Conclusions and Recommendations
     4.1.  Operational deployments SHOULD use AQM procedures
     4.2.  Signaling to the transport endpoints
       4.2.1.  AQM and ECN
     4.3.  AQM algorithms deployed SHOULD NOT require operational tuning
     4.4.  AQM algorithms SHOULD respond to measured congestion, not application profiles
     4.5.  AQM algorithms SHOULD NOT be dependent on specific transport protocol behaviours
     4.6.  Interactions with congestion control algorithms
     4.7.  The need for further research
   5.  IANA Considerations
   6.  Security Considerations
   7.  Privacy Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Change Log
   Authors' Addresses

1. Introduction

   The Internet protocol architecture is based on a connectionless end-to-end packet service using the Internet Protocol, whether IPv4 [RFC0791] or IPv6 [RFC2460]. The advantages of its connectionless design, flexibility and robustness, have been amply demonstrated. However, these advantages are not without cost: careful design is required to provide good service under heavy load. In fact, lack of attention to the dynamics of packet forwarding can result in severe service degradation or "Internet meltdown". This phenomenon was first observed during the early growth phase of the Internet in the mid 1980s [RFC0896] [RFC0970], and is technically called "congestive collapse".

   The original fix for Internet meltdown was provided by Van Jacobson.
   Beginning in 1986, Jacobson developed the congestion avoidance mechanisms that are now required in TCP implementations [Jacobson88] [RFC1122]. These mechanisms operate in Internet hosts to cause TCP connections to "back off" during congestion. We say that TCP flows are "responsive" to congestion signals (i.e., marked or dropped packets) from the network. It is primarily these TCP congestion avoidance algorithms that prevent the congestive collapse of today's Internet. Similar algorithms are specified for other non-TCP transports.

   However, that is not the end of the story. Considerable research has been done on Internet dynamics since 1988, and the Internet has grown. It has become clear that the TCP congestion avoidance mechanisms [RFC5681], while necessary and powerful, are not sufficient to provide good service in all circumstances. Basically, there is a limit to how much control can be accomplished from the edges of the network. Some mechanisms are needed in network devices to complement the endpoint congestion avoidance mechanisms. These mechanisms may be implemented in network devices that include routers, switches, and other network middleboxes.

   It is useful to distinguish between two classes of algorithms related to congestion control: "queue management" versus "scheduling" algorithms. To a rough approximation, queue management algorithms manage the length of packet queues by marking or dropping packets when necessary or appropriate, while scheduling algorithms determine which packet to send next and are used primarily to manage the allocation of bandwidth among flows. While these two mechanisms are closely related, they address different performance issues.

   This memo highlights two performance issues:

   The first issue is the need for an advanced form of queue management that we call "Active Queue Management" (AQM). Section 2 summarizes the benefits that active queue management can bring. A number of AQM procedures are described in the literature, with different characteristics. This document does not recommend any of them in particular, but does make recommendations that ideally would affect the choice of procedure used in a given implementation.

   The second issue, discussed in Section 3 of this memo, is the potential for future congestive collapse of the Internet due to flows that are unresponsive, or not sufficiently responsive, to congestion indications. Unfortunately, there is currently no consensus solution for controlling congestion caused by such aggressive flows; significant research and engineering will be required before any solution will be available. It is imperative that this work be energetically pursued, to ensure the future stability of the Internet.

   Section 4 concludes the memo with a set of recommendations to the Internet community concerning these topics.

   The discussion in this memo applies to "best-effort" traffic, which is to say, traffic generated by applications that accept the occasional loss, duplication, or reordering of traffic in flight. It also applies to other traffic, such as real-time traffic that can adapt its sending rate to reduce loss and/or delay. For elastic traffic [RFC1633], it is most effective when the adaptation occurs on time scales of a single Round-Trip Time (RTT) or a small number of RTTs.
   [RFC2309] resulted from past discussions of end-to-end performance, Internet congestion, and Random Early Discard (RED) in the End-to-End Research Group of the Internet Research Task Force (IRTF). This update results from experience with this and other algorithms, and the AQM discussion within the IETF [AQM-WG].

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. The Need For Active Queue Management

   The traditional technique for managing the queue length in a network device is to set a maximum length (in terms of packets) for each queue, accept packets for the queue until the maximum length is reached, then reject (drop) subsequent incoming packets until the queue length decreases because a packet from the queue has been transmitted. This technique is known as "tail drop", since the packet that arrived most recently (i.e., the one on the tail of the queue) is dropped when the queue is full. This method has served the Internet well for years, but it has two important drawbacks:

   1. Lock-Out

      In some situations tail drop allows a single connection or a few flows to monopolize queue space, preventing other connections from getting room in the queue. This "lock-out" phenomenon is often the result of synchronization or other timing effects.

   2. Full Queues

      The tail drop discipline allows queues to maintain a full (or almost full) status for long periods of time, since tail drop signals congestion (via a packet drop) only when the queue has become full. It is important to reduce the steady-state queue size, and this is perhaps the most important goal for queue management.

      The naive assumption might be that there is a simple tradeoff between delay and throughput, and that the recommendation that queues be maintained in a "non-full" state essentially translates to a recommendation that low end-to-end delay is more important than high throughput. However, this does not take into account the critical role that packet bursts play in Internet performance. For example, even though TCP constrains the congestion window of a flow, packets often arrive at network devices in bursts [Leland94]. If the queue is full or almost full, an arriving burst will cause multiple packets to be dropped. This can result in a global synchronization of flows throttling back, followed by a sustained period of lowered link utilization, reducing overall throughput.

      The point of buffering in the network is to absorb data bursts and to transmit them during the (hopefully) ensuing bursts of silence. This is essential to permit the transmission of bursty data. Normally-small queues are preferred in network devices, with sufficient queue capacity to absorb the bursts. The counter-intuitive result is that maintaining normally-small queues can result in higher throughput as well as lower end-to-end delay. In summary, queue limits should not reflect the steady-state queues we want maintained in the network; instead, they should reflect the size of bursts that a network device needs to absorb.

   Besides tail drop, two alternative queue disciplines that can be applied when a queue becomes full are "random drop on full" and "drop front on full". Under the random drop on full discipline, a network device drops a randomly selected packet from the queue (which can be an expensive operation, since it naively requires an O(N) walk through the packet queue) when the queue is full and a new packet arrives. Under the "drop front on full" discipline [Lakshman96], the network device drops the packet at the front of the queue when the queue is full and a new packet arrives. Both of these solve the lock-out problem, but neither solves the full-queues problem described above.
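   As a non-normative illustration, the following Python sketch contrasts the tail drop and "drop front on full" disciplines described above. The queue limit, packet representation, and function names are assumptions made for this example only:

      from collections import deque

      QUEUE_LIMIT = 100  # maximum queue length, in packets (example)

      def enqueue_tail_drop(queue, packet):
          # Tail drop: reject the arriving packet when the queue is full.
          if len(queue) >= QUEUE_LIMIT:
              return False          # arriving packet is dropped
          queue.append(packet)
          return True

      def enqueue_drop_front(queue, packet):
          # Drop front on full [Lakshman96]: make room by discarding the
          # packet at the head of the queue, then accept the new arrival.
          if len(queue) >= QUEUE_LIMIT:
              queue.popleft()       # oldest queued packet is dropped
          queue.append(packet)
          return True

   Both functions signal congestion only once the queue is already full, which is why neither addresses the full-queues problem.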
   We know in general how to solve the full-queues problem for "responsive" flows, i.e., those flows that throttle back in response to congestion notification. In the current Internet, dropped packets provide the critical mechanism of congestion notification to hosts. The solution to the full-queues problem is for network devices to drop packets before a queue becomes full, so that hosts can respond to congestion before buffers overflow. We call such a proactive approach AQM. By dropping packets before buffers overflow, AQM allows network devices to control when and how many packets to drop.
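   As a non-normative sketch of this proactive approach, the following Python fragment drops (or marks) arriving packets with a probability that grows as a smoothed queue estimate rises between two thresholds, in the spirit of RED; the thresholds, filter weight, and maximum probability are illustrative assumptions, not recommended values:

      import random

      MIN_TH, MAX_TH = 5, 15   # queue thresholds, in packets (example)
      MAX_P = 0.1              # maximum early-drop probability (example)
      W_Q = 0.002              # weight of the average-queue filter

      avg_queue = 0.0

      def should_drop(current_queue_len):
          # Early decision: signal congestion BEFORE the buffer
          # overflows, so responsive flows can back off while room
          # remains to absorb bursts.
          global avg_queue
          avg_queue = (1 - W_Q) * avg_queue + W_Q * current_queue_len
          if avg_queue < MIN_TH:
              return False                       # no congestion signal
          if avg_queue >= MAX_TH:
              return True                        # persistent congestion
          frac = (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
          return random.random() < frac * MAX_P  # probabilistic signal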
   In summary, an active queue management mechanism can provide the following advantages for responsive flows.

   1. Reduce number of packets dropped in network devices

      Packet bursts are an unavoidable aspect of packet networks [Willinger95]. If all the queue space in a network device is already committed to "steady state" traffic or if the buffer space is inadequate, then the network device will have no ability to buffer bursts. By keeping the average queue size small, AQM will provide greater capacity to absorb naturally-occurring bursts without dropping packets.

      Furthermore, without AQM, more packets will be dropped when a queue does overflow. This is undesirable for several reasons. First, with a shared queue and the tail drop discipline, this can result in unnecessary global synchronization of flows, resulting in lowered average link utilization, and hence lowered network throughput. Second, unnecessary packet drops represent a possible waste of network capacity on the path before the drop point.

      While AQM can manage queue lengths and reduce end-to-end latency even in the absence of end-to-end congestion control, it will be able to reduce packet drops only in an environment that continues to be dominated by end-to-end congestion control.

   2. Provide a lower-delay interactive service

      By keeping a small average queue size, AQM will reduce the delays experienced by flows. This is particularly important for interactive applications such as short Web transfers, Telnet traffic, or interactive audio-video sessions, whose subjective (and objective) performance is better when the end-to-end delay is low.

   3. Avoid lock-out behavior

      AQM can prevent lock-out behavior by ensuring that there will almost always be a buffer available for an incoming packet. For the same reason, AQM can prevent a bias against low capacity, but highly bursty, flows.

      Lock-out is undesirable because it constitutes a gross unfairness among groups of flows. However, we stop short of calling this benefit "increased fairness", because general fairness among flows requires per-flow state, which is not provided by queue management. For example, in a network device using AQM with only FIFO scheduling, two TCP flows may receive very different shares of the network capacity simply because they have different round-trip times [Floyd91], and a flow that does not use congestion control may receive more capacity than a flow that does. To achieve general fairness, a router may instead maintain per-flow state, using a per-flow scheduling algorithm such as Fair Queueing (FQ) [Demers90] or a class-based scheduling algorithm such as CBQ [Floyd95].

      In contrast, AQM is needed even for network devices that use per-flow scheduling algorithms such as FQ or class-based scheduling algorithms such as CBQ. This is because per-flow scheduling algorithms by themselves do not control the overall queue size or the size of individual queues. AQM is needed to control the overall average queue sizes, so that arriving bursts can be accommodated without dropping packets. In addition, AQM should be used to control the queue size for each individual flow or class, so that they do not experience unnecessarily high delay. Therefore, AQM should be applied across the classes or flows as well as within each class or flow.

   In short, scheduling algorithms and queue management should be seen as complementary, not as replacements for each other.

   An AQM method may use Explicit Congestion Notification (ECN) [RFC3168] to mark packets, instead of dropping them, under mild or moderate congestion (see Section 4.2.1).

   It is also important to differentiate between the choice of buffer size for a queue in a switch/router or other network device, and the threshold(s) and other parameters that determine how and when an AQM algorithm operates. On the one hand, the optimum buffer size is a function of operational requirements and should generally be sized to be sufficient to buffer the largest normal traffic burst that is expected. This size depends on the number and burstiness of traffic arriving at the queue and the rate at which traffic leaves the queue. Different types of traffic and deployment scenarios will lead to different requirements. On the other hand, the choice of AQM algorithm and associated parameters is a function of the way in which congestion is experienced and the required reaction to achieve acceptable performance. The latter is the primary topic of the following sections.

3. Managing Aggressive Flows

   One of the keys to the success of the Internet has been the congestion avoidance mechanisms of TCP. Because TCP "backs off" during congestion, a large number of TCP connections can share a single, congested link in such a way that link bandwidth is shared reasonably equitably among similarly situated flows. The equitable sharing of bandwidth among flows depends on all flows running compatible congestion avoidance algorithms, i.e., methods conformant with the current TCP specification [RFC5681].

   We call a flow "TCP-friendly" when it has a congestion response that approximates the average response expected of a TCP flow. One example of a TCP-friendly scheme is the TCP-Friendly Rate Control (TFRC) algorithm [RFC5348]. In this document, the term is used more generally to describe this and other algorithms that meet these goals.
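   As a rough, non-normative illustration, a widely used approximation of the average response expected of a TCP flow is the simplified TCP throughput equation (a more complete form is used by TFRC [RFC5348]). The Python sketch below computes the ceiling rate that a TCP-friendly flow would be expected to stay at or below; the constant and the example inputs are illustrative:

      import math

      def tcp_friendly_rate(packet_size_bytes, rtt_s, loss_rate):
          # Simplified steady-state TCP throughput approximation:
          #   rate ~= (s / RTT) * sqrt(3/2) / sqrt(p)
          # where s is the packet size, RTT the round-trip time, and
          # p the loss-event rate.
          return (packet_size_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)

      # Example: 1500-byte packets, 100 ms RTT, 1% loss
      # -> roughly 184,000 bytes/s (about 1.5 Mbit/s).
      print(tcp_friendly_rate(1500, 0.1, 0.01))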
   It is convenient to divide flows into three classes: (1) TCP-friendly flows, (2) unresponsive flows, i.e., flows that do not slow down when congestion occurs, and (3) flows that are responsive but are not TCP-friendly. The last two classes contain more aggressive flows that pose significant threats to Internet performance, as we now discuss.

   1. TCP-Friendly flows

      A TCP-friendly flow responds to congestion notification within a small number of path Round-Trip Times (RTTs), and in steady state it uses no more capacity than a conformant TCP running under comparable conditions (drop rate, RTT, packet size, etc.). This is described in the remainder of the document.

   2. Non-Responsive Flows

      The User Datagram Protocol (UDP) [RFC0768] provides a minimal, best-effort transport to applications and upper-layer protocols (both simply called "applications" in the remainder of this document) and does not itself provide mechanisms to prevent congestion collapse and establish a degree of fairness [RFC5405].

      There is a growing set of UDP-based applications whose congestion avoidance algorithms are inadequate or nonexistent (i.e., the flow does not throttle its sending rate when it experiences congestion). Examples include some UDP streaming applications for packet voice and video, and some multicast bulk data transport. If no action is taken, such unresponsive flows could lead to a new congestive collapse [RFC2309].

      In general, UDP-based applications need to incorporate effective congestion avoidance mechanisms [RFC5405]. Further research and development of ways to accomplish congestion avoidance for presently unresponsive applications continue to be important. Network devices need to be able to protect themselves against unresponsive flows, and mechanisms to accomplish this must be developed and deployed. Deployment of such mechanisms would provide an incentive for all applications to become responsive, either by using a congestion-controlled transport (e.g., TCP, SCTP, or DCCP) or by incorporating their own congestion control in the application [RFC5405].

   3. Non-TCP-friendly Transport Protocols

      A second threat is posed by transport protocol implementations that are responsive to congestion but, either deliberately or through faulty implementation, are not TCP-friendly. Such applications may gain an unfair share of the available network capacity.

      For example, the popularity of the Internet has caused a proliferation in the number of TCP implementations. Some of these may fail to implement the TCP congestion avoidance mechanisms correctly because of poor implementation. Others may deliberately be implemented with congestion avoidance algorithms that are more aggressive in their use of capacity than other TCP implementations; this would allow a vendor to claim to have a "faster TCP". The logical consequence of such implementations would be a spiral of increasingly aggressive TCP implementations, leading back to the point where there is effectively no congestion avoidance and the Internet is chronically congested.

      Another example could be an RTP/UDP video flow that uses an adaptive codec, but responds incompletely to indications of congestion or responds over an excessively long time period.
      Such flows are unlikely to be responsive to congestion signals in a timeframe comparable to a small number of end-to-end transmission delays. However, over a longer timescale, perhaps seconds in duration, they could moderate their speed, or increase their speed if they determine capacity to be available.

      Tunneled traffic aggregates carrying multiple (short) TCP flows can be more aggressive than standard bulk TCP. Applications (e.g., web browsers and peer-to-peer file-sharing) have exploited this by opening multiple connections to the same endpoint.

   The projected increase in the fraction of total Internet traffic for more aggressive flows in classes 2 and 3 clearly poses a threat to future Internet stability. There is an urgent need for measurements of current conditions and for further research into the ways of managing such flows. This raises many difficult issues in identifying and isolating unresponsive or non-TCP-friendly flows at an acceptable overhead cost. Finally, there is as yet little measurement or simulation evidence available about the rate at which these threats are likely to be realized, or about the expected benefit of algorithms for managing such flows.

   Another topic requiring consideration is the appropriate granularity of a "flow" when considering a queue management method. There are a few "natural" answers: 1) a transport (e.g., TCP or UDP) flow (source address/port, destination address/port, and Differentiated Services Code Point (DSCP)); 2) a source/destination host pair (IP addresses and DSCP); 3) a given source host or a given destination host. We suggest that the source/destination host pair gives the most appropriate granularity in many circumstances. However, it is possible that different vendors/providers could set different granularities for defining a flow (as a way of "distinguishing" themselves from one another), or that different granularities could be chosen for different places in the network. It may be the case that the granularity is less important than the fact that a network device needs to be able to deal with more unresponsive flows at *some* granularity. The granularity of flows for congestion management is, at least in part, a question of policy that needs to be addressed in the wider IETF community.
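   As a non-normative illustration of these granularities, the Python sketch below derives the three candidate flow keys from the fields named above; the packet field names are assumptions for this example:

      def flow_keys(pkt):
          # pkt is assumed to carry the parsed IP/transport fields.
          transport_flow = (pkt["src_ip"], pkt["src_port"],
                            pkt["dst_ip"], pkt["dst_port"], pkt["dscp"])
          host_pair      = (pkt["src_ip"], pkt["dst_ip"], pkt["dscp"])
          per_host       = (pkt["src_ip"],)   # or (pkt["dst_ip"],)
          return transport_flow, host_pair, per_host

   A queue management method that accounts for drops or queue occupancy per key can apply the same logic at any of these granularities.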
4. Conclusions and Recommendations

   The IRTF, in publishing [RFC2309], and the IETF, in subsequent discussion, have developed a set of specific recommendations regarding the implementation and operational use of AQM procedures. This document updates these to include:

   1. Network devices SHOULD implement some AQM mechanism to manage queue lengths, reduce end-to-end latency, and avoid lock-out phenomena within the Internet.

   2. Deployed AQM algorithms SHOULD support Explicit Congestion Notification (ECN) as well as loss to signal congestion to endpoints.

   3. The algorithms that the IETF recommends SHOULD NOT require operational (especially manual) configuration or tuning.

   4. AQM algorithms SHOULD respond to measured congestion, not application profiles.

   5. AQM algorithms SHOULD NOT interpret specific transport protocol behaviours.

   6. Transport protocol congestion control algorithms SHOULD maximize their use of available capacity (when there is data to send) without incurring undue loss or undue round-trip delay.

   7. Research, engineering, and measurement efforts are needed regarding the design of mechanisms to deal with flows that are unresponsive to congestion notification or are responsive, but are more aggressive than present TCP.

   These recommendations are expressed using the word "SHOULD". This is in recognition that there may be use cases that have not been envisaged in this document in which the recommendation does not apply. However, care should be taken in concluding that one's use case falls in that category; during the life of the Internet, such use cases have rarely, if ever, been observed and reported. To the contrary, available research [Papagiannaki] says that even high-speed links in network cores that are normally very stable in depth and behavior experience occasional issues that need moderation.

4.1. Operational deployments SHOULD use AQM procedures

   AQM procedures are designed to minimize the delay induced in the network by queues that have filled as a result of host behavior. Marking and loss behaviors provide a signal that buffers within network devices are becoming unnecessarily full, and that the sender would do well to moderate its behavior.

4.2. Signaling to the transport endpoints

   There are a number of ways a network device may signal to the endpoint that the network is becoming congested and trigger a reduction in rate. The signalling methods include:

   o  Delaying transport segments (packets) in flight, such as in a queue.

   o  Dropping transport segments (packets) in transit.

   o  Marking transport segments (packets), such as using Explicit Congestion Notification (ECN) [RFC3168] [RFC4301] [RFC4774] [RFC6040] [RFC6679].

   The use of scheduling mechanisms, such as priority queuing, classful queuing, and fair queuing, is often effective in networks to help a network serve the needs of a range of applications. Network operators can use these methods to manage traffic passing a choke point. This is discussed in [RFC2474] and [RFC2475].

   Increased network latency can be used as an implicit signal of congestion. For example, in TCP, additional delay can affect ACK clocking and reduce the rate of transmission of new data. In RTP, network latency impacts the RTCP-reported RTT, and increased latency can trigger a sender to adjust its rate. Methods such as LEDBAT [RFC6817] assume increased latency as a primary signal of congestion.
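   As a non-normative sketch of such a latency-based response, the following Python fragment adjusts a sender's rate toward a target queueing delay, in the spirit of LEDBAT [RFC6817]; the target, gain, and rate bounds are illustrative assumptions:

      TARGET_DELAY_S = 0.100   # target queueing delay (example)
      GAIN = 0.5               # rate-adjustment gain (example)
      MIN_RATE, MAX_RATE = 10_000.0, 10_000_000.0  # bytes/s (example)

      def adjust_rate(current_rate, base_delay_s, measured_delay_s):
          # Queueing delay is estimated as the measured one-way delay
          # minus the smallest delay seen so far (the base delay).
          queueing_delay = measured_delay_s - base_delay_s
          # Increase the rate when below target, decrease when above.
          error = (TARGET_DELAY_S - queueing_delay) / TARGET_DELAY_S
          new_rate = current_rate * (1.0 + GAIN * error)
          return max(MIN_RATE, min(MAX_RATE, new_rate))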
   It is essential that all Internet hosts respond to loss [RFC5681] [RFC5405] [RFC4960] [RFC4340]. Packet dropping by network devices that are under load has two effects: it protects the network, which is the primary reason that network devices drop packets. The detection of loss also provides a signal to a reliable transport (e.g., TCP or SCTP) that there is potential congestion, using a pragmatic heuristic: "when the network discards a message in flight, it may imply the presence of faulty equipment or media in a path, and it may imply the presence of congestion. To be conservative, a transport must assume the latter." Unreliable transports (e.g., using UDP) need to similarly react to loss [RFC5405].

   Network devices SHOULD use an AQM algorithm to determine the packets that are marked or discarded due to congestion.

   Loss also has an effect on the efficiency of a flow and can significantly impact some classes of application. In reliable transports, the dropped data must be subsequently retransmitted. While other applications/transports may adapt to the absence of lost data, this still implies inefficient use of available capacity, and the dropped traffic can affect other flows. Hence, loss is not entirely positive; it is a necessary evil.

4.2.1. AQM and ECN

   Explicit Congestion Notification (ECN) [RFC4301] [RFC4774] [RFC6040] [RFC6679] is a network-layer function that allows a transport to receive network congestion information from a network device without incurring the unintended consequences of loss. ECN includes both transport mechanisms and functions implemented in network devices; the latter rely upon AQM to decide whether to ECN-mark.

   Congestion for ECN-capable transports is signalled by a network device setting the "Congestion Experienced" (CE) codepoint in the IP header. This codepoint is noted by the remote receiving endpoint and signalled back to the sender using a transport protocol mechanism, allowing the sender to trigger timely congestion control. The decision to set the CE codepoint requires an AQM algorithm configured with a threshold. Non-ECN-capable flows (the default) are dropped under congestion.

   Network devices SHOULD use an AQM algorithm that marks ECN-capable traffic when making decisions about the response to congestion. Network devices need to implement this method by marking ECN-capable traffic or by dropping non-ECN-capable traffic.

   Safe deployment of ECN requires that network devices drop excessive traffic, even when marked as originating from an ECN-capable transport. This is a necessary safety precaution because: (1) a non-conformant, broken, or malicious receiver could conceal an ECN mark and not report this to the sender; (2) a non-conformant, broken, or malicious sender could ignore a reported ECN mark, as it could ignore a loss without using ECN; (3) a malfunctioning or non-conforming network device may similarly "hide" an ECN mark. In normal operation, such cases should be very uncommon.

   Network devices SHOULD use an algorithm to drop excessive traffic, even when marked as originating from an ECN-capable transport.
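   As a non-normative sketch, the following Python fragment combines an AQM decision (any algorithm that returns True under congestion, such as the earlier early-drop example) with the ECN behaviour described above; the function and field names are assumptions:

      def on_congestion_signal(pkt, aqm_signals_congestion):
          # When AQM signals congestion, ECN-capable packets (ECT(0) or
          # ECT(1) codepoints) are marked CE instead of being dropped;
          # non-ECN-capable packets (the default) are dropped.
          if not aqm_signals_congestion:
              return "forward"
          if pkt["ecn"] in ("ECT(0)", "ECT(1)"):
              pkt["ecn"] = "CE"        # mark Congestion Experienced
              return "forward"
          return "drop"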
4.3. AQM algorithms deployed SHOULD NOT require operational tuning

   A number of AQM algorithms have been proposed. Many require some form of tuning or setting of parameters for initial network conditions. This can make these algorithms difficult to use in operational networks.

   AQM algorithms need to consider both "initial conditions" and "operational conditions". The former includes values that exist before any experience is gathered about the use of the algorithm, such as the configured speed of the interface, support for full-duplex communication, the interface MTU, and other properties of the link. The latter includes information observed from monitoring the size of the queue, experienced queueing delay, rate of packet discard, etc.

   This document therefore specifies that AQM algorithms that are proposed for deployment in the Internet have the following properties:

   o  SHOULD NOT require tuning of initial or configuration parameters. An algorithm needs to provide a default behaviour that auto-tunes to a reasonable performance for typical network operational conditions. This is expected to ease deployment and operation. Initial conditions, such as the interface rate and MTU size or other values derived from these, MAY be required by an AQM algorithm.

   o  MAY support further manual tuning that could improve performance in a specific deployed network. Algorithms that lack such variables are acceptable, but if such variables exist, they SHOULD be externalized (made visible to the operator). Guidance needs to be provided on the cases where auto-tuning is unlikely to achieve satisfactory performance and to identify the set of parameters that can be tuned. This is expected to enable the algorithm to be deployed in networks that have specific characteristics (variable/larger delay; networks where capacity is impacted by interactions with lower-layer mechanisms, etc.).

   o  MAY provide logging and alarm signals to assist in identifying if an algorithm using manual or auto-tuning is functioning as expected (e.g., this could be based on an internal consistency check between input, output, and mark/drop rates over time). This is expected to encourage deployment by default and allow operators to identify potential interactions with other network functions.

   Hence, self-tuning algorithms are to be preferred. Algorithms recommended for general Internet deployment by the IETF need to be designed so that they do not require operational (especially manual) configuration or tuning.

4.4. AQM algorithms SHOULD respond to measured congestion, not application profiles

   Not all applications transmit packets of the same size. Although applications may be characterised by particular profiles of packet size, this should not be used as the basis for AQM (see the next section). Other methods exist (e.g., Differentiated Services queueing and Pre-Congestion Notification (PCN) [RFC5559]) that can be used to differentiate and police classes of application. Network devices may combine AQM with these traffic classification mechanisms and perform AQM only on specific queues within a network device.

   An AQM algorithm should not deliberately try to prejudice the size of packet that performs best (i.e., preferentially drop/mark based only on packet size). Procedures for selecting packets to mark/drop SHOULD observe the actual or projected time that a packet is in a queue (bytes at a rate being an analog to time). When an AQM algorithm decides whether to drop (or mark) a packet, it is RECOMMENDED that the size of the particular packet should not be taken into account [Byte-pkt].
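   As a non-normative illustration of observing queueing time rather than packet size, the Python sketch below estimates the time a packet will spend in a queue from the backlog in bytes and the drain rate, following "bytes at a rate being an analog to time"; the delay threshold is an illustrative assumption:

      DELAY_THRESHOLD_S = 0.020   # example queueing-delay threshold

      def projected_queue_delay_s(backlog_bytes, drain_rate_bytes_s):
          # Time for the data already queued to drain; this is the
          # delay an arriving packet would experience, independent of
          # that packet's own size.
          return backlog_bytes / drain_rate_bytes_s

      def congested(backlog_bytes, drain_rate_bytes_s):
          return (projected_queue_delay_s(backlog_bytes,
                                          drain_rate_bytes_s)
                  > DELAY_THRESHOLD_S)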
   Applications (or transports) generally know the packet size that they are using and can hence make their judgments about whether to use small or large packets based on the data they wish to send and the expected impact on the delay or throughput, or other performance parameter. When a transport or application responds to a dropped or marked packet, the size of the rate reduction should be proportionate to the size of the packet that was sent [Byte-pkt].

   An AQM-enabled system MAY instantiate different instances of an AQM algorithm to be applied within the same traffic class. Traffic classes may be differentiated based on an Access Control List (ACL), the packet Diffserv Code Point (DSCP) [RFC2474], the setting of the ECN field [RFC3168] [RFC4774], or an equivalent codepoint at a lower layer. This recommendation goes beyond what is defined in RFC 3168, by allowing that an implementation MAY use more than one instance of an AQM algorithm to handle both ECN-capable and non-ECN-capable packets.

4.5. AQM algorithms SHOULD NOT be dependent on specific transport protocol behaviours

   In deploying AQM, network devices need to support a range of Internet traffic and SHOULD NOT make implicit assumptions about the characteristics desired by the set of transports/applications the network supports. That is, AQM methods should be opaque to the choice of transport and application.

   AQM algorithms are often evaluated by considering TCP [RFC0793] with a limited number of applications. Although TCP is the predominant transport in the Internet today, this no longer represents a sufficient selection of traffic for verification. There is significant use of UDP [RFC0768] in voice and video services, and some applications find utility in SCTP [RFC4960] and DCCP [RFC4340]. Hence, AQM algorithms should also demonstrate operation with transports other than TCP and need to consider a variety of applications. Selection of AQM algorithms also needs to consider the use of tunnel encapsulations that may carry traffic aggregates.

   AQM algorithms SHOULD NOT target or derive implicit assumptions about the characteristics desired by specific transports/applications. Transports and applications need to respond to the congestion signals provided by AQM (i.e., dropping or ECN-marking) in a timely manner (within a few RTTs at the latest).

4.6. Interactions with congestion control algorithms

   Applications and transports need to react to received implicit or explicit signals that indicate the presence of congestion. This section identifies issues that can impact the design of transport protocols when using paths that use AQM.

   Transport protocols and applications need timely signals of congestion. The time taken to detect and respond to congestion is increased when network devices queue packets in buffers. It can be difficult to detect tail losses at a higher layer, and this may sometimes require transport timers or probe packets to detect and respond to such loss. Loss patterns may also impact timely detection; e.g., the time may be reduced when network devices do not drop long runs of packets from the same flow.

   A common objective is to deliver data from its source endpoint to its destination in the least possible time. When speaking of TCP performance, the terms "knee" and "cliff" are defined by [Jain94]. They respectively refer to the minimum congestion window that maximises throughput and the maximum congestion window that avoids loss. An application that transmits at the rate determined by this window maximizes the rate or throughput. For the sender, exceeding the cliff is ineffective, as it (by definition) induces loss; operating at a point close to the cliff has a negative impact on other traffic and applications, triggering operator activities, such as those discussed in [RFC6057]. Operating below the knee reduces the throughput, since the sender fails to use available network capacity. As a result, any elastic transport congestion control algorithm designed to minimise delivery time should seek to use an effective window at or above the knee and well below the cliff. The choice of an appropriate rate can significantly impact the loss and delay experienced not only by a flow, but by other flows that share the same queue.
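   As a non-normative illustration of these terms, the Python sketch below classifies a congestion window against the knee and cliff, approximating the knee by the path bandwidth-delay product and the cliff by the bandwidth-delay product plus the bottleneck buffer; both approximations are assumptions made for this example:

      def classify_window(cwnd_bytes, bottleneck_bytes_s, rtt_s,
                          buffer_bytes):
          knee = bottleneck_bytes_s * rtt_s   # path BDP (approximation)
          cliff = knee + buffer_bytes         # loss begins beyond this
          if cwnd_bytes < knee:
              return "below knee: available capacity left unused"
          if cwnd_bytes < cliff:
              return "between knee and cliff: queueing delay grows"
          return "at/above cliff: induces loss"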
Choice of an appropriate rate can 751 significantly impact the loss and delay experienced not only by a 752 flow, but by other flows that share the same queue. 754 Some applications may send less than permitted by the congestion 755 control window (or rate). Examples include multimedia codecs that 756 stream at some natural rate (or set of rates) or an application that 757 is naturally interactive (e.g., some web applications, gaming, 758 transaction-based protocols). Such applications may have different 759 objectives. They may not wish to maximise throughput, but may desire 760 a lower loss rate or bounded delay. 762 The correct operation of an AQM-enabled network device MUST NOT rely 763 upon specific transport responses to congestion signals. 765 4.7. The need for further research 767 The second recommendation of [RFC2309] called for further research 768 into the interaction between network queues and host applications, 769 and the means of signaling between them. This research has occurred, 770 and we as a community have learned a lot. However, we are not done. 772 We have learned that the problems of congestion, latency and buffer- 773 sizing have not gone away, and are becoming more important to many 774 users. A number of self-tuning AQM algorithms have been found that 775 offer significant advantages for deployed networks. There is also 776 renewed interest in deploying AQM and the potential of ECN. 778 In 2013, an obvious example of further research is the need to 779 consider the use of Map/Reduce applications in data centers; do we 780 need to extend our taxonomy of TCP/SCTP sessions to include not only 781 "mice" and "elephants", but "lemmings". Where "Lemmings" are flash 782 crowds of "mice" that the network inadvertently try to signal to as 783 if they were elephant flows, resulting in head of line blocking in 784 data center applications. 786 Examples of other required research include: 788 o Research into new AQM and scheduling algorithms. 790 o Research into the use of and deployment of ECN alongside AQM. 792 o Tools for enabling AQM (and ECN) deployment and measuring the 793 performance. 795 o Methods for mitigating the impact of non-conformant and malicious 796 flows. 798 o Research to understand the implications of using new network and 799 transport methods on applications. 801 Hence, this document therefore reiterates the call of RFC 2309: we 802 need continuing research as applications develop. 804 5. IANA Considerations 806 This memo asks the IANA for no new parameters. 808 6. Security Considerations 810 While security is a very important issue, it is largely orthogonal to 811 the performance issues discussed in this memo. 813 Many deployed network devices use queueing methods that allow 814 unresponsive traffic to capture network capacity, denying access to 815 other traffic flows. This could potentially be used as a denial-of- 816 service attack. This threat could be reduced in network devices 817 deploy AQM or some form of scheduling. We note, however, that a 818 denial-of-service attack may create unresponsive traffic flows that 819 may be indistinguishable from other traffic flows (e.g. tunnels 820 carrying aggregates of short flows, high-rate isochronous 821 applications). New methods therefore may remain vulnerable, and this 822 document recommends that ongoing research should consider ways to 823 mitigate such attacks. 825 7. Privacy Considerations 827 This document, by itself, presents no new privacy issues. 829 8. 
8. Acknowledgements

   The original recommendation in [RFC2309] was written by the End-to-End Research Group, which is to say Bob Braden, Dave Clark, Jon Crowcroft, Bruce Davie, Steve Deering, Deborah Estrin, Sally Floyd, Van Jacobson, Greg Minshall, Craig Partridge, Larry Peterson, KK Ramakrishnan, Scott Shenker, John Wroclawski, and Lixia Zhang. This is an edited version of that document, with much of its text and arguments unchanged.

   The need for an updated document was agreed to in the tsvarea meeting at IETF 86. This document was reviewed on the aqm@ietf.org list. Comments came from Colin Perkins, Richard Scheffenegger, Dave Taht, and many others.

   Gorry Fairhurst was in part supported by the European Community under its Seventh Framework Programme through the Reducing Internet Transport Latency (RITE) project (ICT-317700).

9. References

9.1. Normative References

   [Byte-pkt]   Briscoe, B. and J. Manner, "Byte and Packet Congestion Notification", Work in Progress, draft-ietf-tsvwg-byte-pkt-congest, July 2013.

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3168]    Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

   [RFC4301]    Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

   [RFC4774]    Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, November 2006.

   [RFC5405]    Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines for Application Designers", BCP 145, RFC 5405, November 2008.

   [RFC5681]    Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

   [RFC6040]    Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, November 2010.

   [RFC6679]    Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, August 2012.

9.2. Informative References

   [AQM-WG]     "IETF AQM WG".

   [Demers90]   Demers, A., Keshav, S., and S. Shenker, "Analysis and Simulation of a Fair Queueing Algorithm", Internetworking: Research and Experience, SIGCOMM Symposium proceedings on Communications Architectures and Protocols, 1990.

   [Floyd91]    Floyd, S., "Connections with Multiple Congested Gateways in Packet-Switched Networks Part 1: One-way Traffic", Computer Communications Review, October 1991.

   [Floyd95]    Floyd, S. and V. Jacobson, "Link-sharing and Resource Management Models for Packet Networks", IEEE/ACM Transactions on Networking, August 1995.

   [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", SIGCOMM Symposium proceedings on Communications Architectures and Protocols, August 1988.

   [Jain94]     Jain, R., Ramakrishnan, K., and D-M. Chiu, "Congestion avoidance scheme for computer networks", US Patent 5377327, December 1994.

   [Lakshman96] Lakshman, TV., Neidhardt, A., and T. Ott, "The Drop From Front Strategy in TCP Over ATM and Its Interworking with Other Control Features", IEEE Infocomm, 1996.

   [Leland94]   Leland, W., Taqqu, M., Willinger, W., and D. Wilson, "On the Self-Similar Nature of Ethernet Traffic (Extended Version)", IEEE/ACM Transactions on Networking, February 1994.
Wilson, "On 919 the Self-Similar Nature of Ethernet Traffic (Extended 920 Version)", IEEE/ACM Transactions on Networking , February 921 1994. 923 [Papagiannaki] 924 Sprint ATL, KAIST, University of Minnesota, Sprint ATL, 925 and Intel ResearchIETF, "Analysis of Point-To-Point Packet 926 Delay In an Operational Network", IEEE Infocom 2004, March 927 2004, . 929 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 930 August 1980. 932 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 933 1981. 935 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 936 793, September 1981. 938 [RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", 939 RFC 896, January 1984. 941 [RFC0970] Nagle, J., "On packet switches with infinite storage", RFC 942 970, December 1985. 944 [RFC1122] Braden, R., "Requirements for Internet Hosts - 945 Communication Layers", STD 3, RFC 1122, October 1989. 947 [RFC1633] Braden, B., Clark, D., and S. Shenker, "Integrated 948 Services in the Internet Architecture: an Overview", RFC 949 1633, June 1994. 951 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 952 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 953 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 954 S., Wroclawski, J., and L. Zhang, "Recommendations on 955 Queue Management and Congestion Avoidance in the 956 Internet", RFC 2309, April 1998. 958 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 959 (IPv6) Specification", RFC 2460, December 1998. 961 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 962 "Definition of the Differentiated Services Field (DS 963 Field) in the IPv4 and IPv6 Headers", RFC 2474, December 964 1998. 966 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 967 and W. Weiss, "An Architecture for Differentiated 968 Services", RFC 2475, December 1998. 970 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 971 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 973 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 974 4960, September 2007. 976 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 977 Friendly Rate Control (TFRC): Protocol Specification", RFC 978 5348, September 2008. 980 [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) 981 Architecture", RFC 5559, June 2009. 983 [RFC6057] Bastian, C., Klieber, T., Livingood, J., Mills, J., and R. 984 Woundy, "Comcast's Protocol-Agnostic Congestion Management 985 System", RFC 6057, December 2010. 987 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 988 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 989 December 2012. 991 [Willinger95] 992 Willinger, W., Taqqu, M., Sherman, R., Wilson, D., and V. 993 Jacobson, "Self-Similarity Through High-Variability: 994 Statistical Analysis of Ethernet LAN Traffic at the Source 995 Level", SIGCOMM Symposium proceedings on Communications 996 architectures and protocols , August 1995. 998 Appendix A. Change Log 1000 Initial Version: March 2013 1002 Minor update of the algorithms that the IETF recommends SHOULD NOT 1003 require operational (especially manual) configuration or tuningdate: 1005 April 2013 1007 Major surgery. This draft is for discussion at IETF-87 and expected 1008 to be further updated. 1009 July 2013 1011 -00 WG Draft - Updated transport recommendations; revised deployment 1012 configuration section; numerous minor edits. 
   Oct 2013

   -01 WG Draft - Updated transport recommendations; revised deployment configuration section; numerous minor edits. Jan 2014 - Feedback from WG.

   -02 WG Draft - Minor edits Feb 2014 - Mainly language fixes.

Authors' Addresses

   Fred Baker (editor)
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com

   Godred Fairhurst (editor)
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk