Network Working Group                                     F. Baker, Ed.
Internet-Draft                                             Cisco Systems
Obsoletes: 2309 (if approved)                          G. Fairhurst, Ed.
Intended status: BCP                               University of Aberdeen
Expires: April 17, 2014                                 October 17, 2013

         IETF Recommendations Regarding Active Queue Management
                    draft-ietf-aqm-recommendation-00

Abstract

   This memo presents recommendations to the Internet community
   concerning measures to improve and preserve Internet performance.
   It presents a strong recommendation for testing, standardization,
   and widespread deployment of active queue management (AQM) in
   network devices, to improve the performance of today's Internet.
   It also urges a concerted effort of research, measurement, and
   ultimate deployment of AQM mechanisms to protect the Internet from
   flows that are not sufficiently responsive to congestion
   notification.

   The note largely repeats the recommendations of RFC 2309, updated
   after fifteen years of experience and new research.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on April 17, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  The Need For Active Queue Management
   3.  Managing Aggressive Flows
   4.  Conclusions and Recommendations
     4.1.  Operational deployments SHOULD use AQM procedures
     4.2.  Signaling to the transport endpoints
       4.2.1.  AQM and ECN
     4.3.  AQM algorithms deployed SHOULD NOT require operational
           tuning
     4.4.  AQM algorithms SHOULD respond to measured congestion, not
           application profiles.
     4.5.  AQM algorithms SHOULD NOT be dependent on specific
           transport protocol behaviours
     4.6.  Interactions with congestion control algorithms
     4.7.  The need for further research
   5.  IANA Considerations
   6.  Security Considerations
   7.  Privacy Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Change Log
   Authors' Addresses

1.  Introduction

   The Internet protocol architecture is based on a connectionless
   end-to-end packet service using the Internet Protocol, whether IPv4
   [RFC0791] or IPv6 [RFC2460].  The advantages of its connectionless
   design, flexibility and robustness, have been amply demonstrated.
   However, these advantages are not without cost: careful design is
   required to provide good service under heavy load.  In fact, lack
   of attention to the dynamics of packet forwarding can result in
   severe service degradation or "Internet meltdown".
   This phenomenon was first observed during the early growth phase of
   the Internet of the mid 1980s [RFC0896][RFC0970], and is
   technically called "congestive collapse".

   The original fix for Internet meltdown was provided by Van
   Jacobson.  Beginning in 1986, Jacobson developed the congestion
   avoidance mechanisms that are now required in TCP implementations
   [Jacobson88] [RFC1122].  These mechanisms operate in Internet hosts
   to cause TCP connections to "back off" during congestion.  We say
   that TCP flows are "responsive" to congestion signals (i.e., marked
   or dropped packets) from the network.  It is primarily these TCP
   congestion avoidance algorithms that prevent the congestive
   collapse of today's Internet.

   However, that is not the end of the story.  Considerable research
   has been done on Internet dynamics since 1988, and the Internet has
   grown.  It has become clear that the TCP congestion avoidance
   mechanisms [RFC5681], while necessary and powerful, are not
   sufficient to provide good service in all circumstances.
   Basically, there is a limit to how much control can be accomplished
   from the edges of the network.  Some mechanisms are needed in the
   network devices to complement the endpoint congestion avoidance
   mechanisms.  These mechanisms may be implemented in network devices
   that include routers, switches, and other network middleboxes.

   It is useful to distinguish between two classes of algorithms
   related to congestion control: "queue management" versus
   "scheduling" algorithms.  To a rough approximation, queue
   management algorithms manage the length of packet queues by marking
   or dropping packets when necessary or appropriate, while scheduling
   algorithms determine which packet to send next and are used
   primarily to manage the allocation of bandwidth among flows.  While
   these two classes of mechanisms are closely related, they address
   different performance issues.

   This memo highlights two performance issues:

   The first issue is the need for an advanced form of queue
   management that we call "active queue management."  Section 2
   summarizes the benefits that active queue management can bring.  A
   number of Active Queue Management (AQM) procedures are described in
   the literature, with different characteristics.  This document does
   not recommend any of them in particular, but does make
   recommendations that ideally would affect the choice of procedure
   used in a given implementation.

   The second issue, discussed in Section 3 of this memo, is the
   potential for future congestive collapse of the Internet due to
   flows that are unresponsive, or not sufficiently responsive, to
   congestion indications.  Unfortunately, there is no consensus
   solution to controlling congestion caused by such aggressive flows;
   significant research and engineering will be required before any
   solution will be available.  It is imperative that this work be
   energetically pursued, to ensure the future stability of the
   Internet.

   Section 4 concludes the memo with a set of recommendations to the
   Internet community concerning these topics.

   The discussion in this memo applies to "best-effort" traffic, which
   is to say, traffic generated by applications that accept the
   occasional loss, duplication, or reordering of traffic in flight.
   It also applies to other traffic, such as real-time traffic that
   can adapt its sending rate to reduce loss and/or delay.
   It is most effective for elastic traffic [RFC1633] when the
   adaptation occurs on time scales of a single RTT or a small number
   of RTTs.

   [RFC2309] resulted from past discussions of end-to-end performance,
   Internet congestion, and Random Early Discard (RED) in the End-to-
   End Research Group of the Internet Research Task Force (IRTF).
   This update results from experience with this and other algorithms,
   and the AQM discussion within the IETF [AQM-WG].

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

2.  The Need For Active Queue Management

   The traditional technique for managing the queue length in a
   network device is to set a maximum length (in terms of packets) for
   each queue, accept packets for the queue until the maximum length
   is reached, then reject (drop) subsequent incoming packets until
   the queue decreases because a packet from the queue has been
   transmitted.  This technique is known as "tail drop", since the
   packet that arrived most recently (i.e., the one on the tail of the
   queue) is dropped when the queue is full (illustrated in the sketch
   below).  This method has served the Internet well for years, but it
   has two important drawbacks.

   1.  Lock-Out

       In some situations tail drop allows a single connection or a
       few flows to monopolize queue space, preventing other
       connections from getting room in the queue.  This "lock-out"
       phenomenon is often the result of synchronization or other
       timing effects.

   2.  Full Queues

       The tail drop discipline allows queues to maintain a full (or,
       almost full) status for long periods of time, since tail drop
       signals congestion (via a packet drop) only when the queue has
       become full.  It is important to reduce the steady-state queue
       size, and this is perhaps the most important goal for queue
       management.

       The naive assumption might be that there is a simple tradeoff
       between delay and throughput, and that the recommendation that
       queues be maintained in a "non-full" state essentially
       translates to a recommendation that low end-to-end delay is
       more important than high throughput.  However, this does not
       take into account the critical role that packet bursts play in
       Internet performance.  Even though TCP constrains the
       congestion window of a flow, packets often arrive at network
       devices in bursts [Leland94].  If the queue is full or almost
       full, an arriving burst will cause multiple packets to be
       dropped.  This can result in a global synchronization of flows
       throttling back, followed by a sustained period of lowered link
       utilization, reducing overall throughput.

       The point of buffering in the network is to absorb data bursts
       and to transmit them during the (hopefully) ensuing bursts of
       silence.  This is essential to permit the transmission of
       bursty data.  Queues that are normally small are preferred in
       network devices, with sufficient queue capacity to absorb
       bursts.  The counter-intuitive result is that maintaining
       normally-small queues can result in higher throughput as well
       as lower end-to-end delay.  In summary, queue limits should not
       reflect the steady-state queues we want maintained in the
       network; instead, they should reflect the size of bursts that a
       network device needs to absorb.

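   As an illustration of the tail-drop discipline, the following
   minimal sketch (Python; the class and method names are invented for
   exposition and are not drawn from any deployed implementation)
   shows queue admission under a fixed maximum length:

      from collections import deque

      class TailDropQueue:
          """Illustrative FIFO queue with the tail-drop discipline."""

          def __init__(self, max_packets):
              self.max_packets = max_packets  # fixed limit, in packets
              self.queue = deque()

          def enqueue(self, packet):
              # Accept packets until the maximum length is reached...
              if len(self.queue) < self.max_packets:
                  self.queue.append(packet)
                  return True
              # ...then drop the most recent arrival (the "tail").
              return False

          def dequeue(self):
              # Transmitting a packet frees space for later arrivals.
              return self.queue.popleft() if self.queue else None

   Note how congestion is signalled only implicitly, by the drop that
   occurs once the queue is already full; nothing in this discipline
   acts to reduce the steady-state queue size.
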
   Besides tail drop, two alternative queue disciplines that can be
   applied when a queue becomes full are "random drop on full" or
   "drop front on full".  Under the random drop on full discipline, a
   network device drops a randomly selected packet from the queue
   (which can be an expensive operation, since it naively requires an
   O(N) walk through the packet queue) when the queue is full and a
   new packet arrives.  Under the "drop front on full" discipline
   [Lakshman96], the network device drops the packet at the front of
   the queue when the queue is full and a new packet arrives.  Both of
   these solve the lock-out problem, but neither solves the full-
   queues problem described above.

   We know in general how to solve the full-queues problem for
   "responsive" flows, i.e., those flows that throttle back in
   response to congestion notification.  In the current Internet,
   dropped packets provide a critical mechanism indicating congestion
   notification to hosts.  The solution to the full-queues problem is
   for network devices to drop packets before a queue becomes full, so
   that hosts can respond to congestion before buffers overflow.  We
   call such a proactive approach AQM.  By dropping packets before
   buffers overflow, AQM allows network devices to control when and
   how many packets to drop.  A simplified sketch of such an early-
   drop policy follows.

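   For contrast with the tail-drop sketch above, this much-simplified,
   RED-flavoured sketch (Python; the thresholds, EWMA weight, and
   class name are invented for illustration, and this is not a
   recommendation of any particular algorithm) drops probabilistically
   before the buffer overflows:

      import random
      from collections import deque

      class EarlyDropQueue:
          """Illustrative AQM: drop with increasing probability as
          the average queue grows, well before the hard limit."""

          def __init__(self, max_packets, min_th, max_th,
                       max_p=0.1, w=0.002):
              self.max_packets = max_packets
              self.min_th = min_th  # no early drops below this average
              self.max_th = max_th  # drop probability reaches max_p
              self.max_p = max_p
              self.w = w            # EWMA weight for the average queue
              self.avg = 0.0
              self.queue = deque()

          def enqueue(self, packet):
              # Track a smoothed (average) queue size, not the
              # instantaneous one, so short bursts are absorbed
              # rather than dropped.
              self.avg = (1 - self.w) * self.avg + self.w * len(self.queue)
              if len(self.queue) >= self.max_packets:
                  return False      # forced (tail) drop
              if self.avg > self.max_th:
                  return False      # early drop
              if self.avg > self.min_th:
                  p = self.max_p * (self.avg - self.min_th) \
                      / (self.max_th - self.min_th)
                  if random.random() < p:
                      return False  # probabilistic early drop
              self.queue.append(packet)
              return True

   Because drops begin while the average queue is still small,
   responsive flows back off early, the steady-state queue stays
   short, and headroom remains to absorb bursts.
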
   In summary, an active queue management mechanism can provide the
   following advantages for responsive flows.

   1.  Reduce the number of packets dropped in network devices

       Packet bursts are an unavoidable aspect of packet networks
       [Willinger95].  If all the queue space in a network device is
       already committed to "steady state" traffic or if the buffer
       space is inadequate, then the network device will have no
       ability to buffer bursts.  By keeping the average queue size
       small, AQM will provide greater capacity to absorb naturally-
       occurring bursts without dropping packets.

       Furthermore, without AQM, more packets will be dropped when a
       queue does overflow.  This is undesirable for several reasons.
       First, with a shared queue and the tail drop discipline, this
       can result in unnecessary global synchronization of flows,
       resulting in lowered average link utilization, and hence
       lowered network throughput.  Second, unnecessary packet drops
       represent a possible waste of network capacity on the path
       before the drop point.

       While AQM can manage queue lengths and reduce end-to-end
       latency even in the absence of end-to-end congestion control,
       it will be able to reduce packet drops only in an environment
       that continues to be dominated by end-to-end congestion
       control.

   2.  Provide a lower-delay interactive service

       By keeping a small average queue size, AQM will reduce the
       delays experienced by flows.  This is particularly important
       for interactive applications such as short Web transfers,
       Telnet traffic, or interactive audio-video sessions, whose
       subjective (and objective) performance is better when the
       end-to-end delay is low.

   3.  Avoid lock-out behavior

       AQM can prevent lock-out behavior by ensuring that there will
       almost always be a buffer available for an incoming packet.
       For the same reason, AQM can prevent a bias against low
       capacity, but highly bursty, flows.

       Lock-out is undesirable because it constitutes a gross
       unfairness among groups of flows.  However, we stop short of
       calling this benefit "increased fairness", because general
       fairness among flows requires per-flow state, which is not
       provided by queue management.  For example, in a network device
       using AQM with only FIFO scheduling, two TCP flows may receive
       very different shares of the network capacity simply because
       they have different round-trip times [Floyd91], and a flow that
       does not use congestion control may receive more capacity than
       a flow that does.  To achieve general fairness, a router may
       instead maintain per-flow state and use a per-flow scheduling
       algorithm such as Fair Queueing (FQ) [Demers90] or a
       class-based scheduling algorithm such as CBQ [Floyd95].

   In contrast, AQM is needed even for network devices that use
   per-flow scheduling algorithms such as FQ or class-based scheduling
   algorithms, such as CBQ.  This is because per-flow scheduling
   algorithms by themselves do not control the overall queue size or
   the size of individual queues.  AQM is needed to control the
   overall average queue sizes, so that arriving bursts can be
   accommodated without dropping packets.  In addition, AQM should be
   used to control the queue size for each individual flow or class,
   so that they do not experience unnecessarily high delay.
   Therefore, AQM should be applied across the classes or flows as
   well as within each class or flow.

   In short, scheduling algorithms and queue management should be seen
   as complementary, not as replacements for each other.

3.  Managing Aggressive Flows

   One of the keys to the success of the Internet has been the
   congestion avoidance mechanisms of TCP.  Because TCP "backs off"
   during congestion, a large number of TCP connections can share a
   single, congested link in such a way that link bandwidth is shared
   reasonably equitably among similarly situated flows.  The equitable
   sharing of bandwidth among flows depends on all flows running
   compatible congestion avoidance algorithms, i.e., methods
   conformant with the current TCP specification [RFC5681].

   We call a flow "TCP-friendly" when it has a congestion response
   that approximates the average response expected of a TCP flow.  One
   example method of a TCP-friendly scheme is the TCP-Friendly Rate
   Control algorithm [RFC5348].  In this document, the term is used
   more generally to describe this and other algorithms that meet
   these goals.

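   As a rough quantitative guide (a simplified form of the response
   function used by TFRC [RFC5348], ignoring the retransmission-
   timeout term; the notation below is illustrative), the steady-state
   rate X of a TCP-friendly sender with segment size s, round-trip
   time R, and loss-event rate p is approximately, in LaTeX notation:

      X \approx \frac{s}{R \sqrt{2p/3}}

   A flow sending no faster than X for the p and R it experiences is
   TCP-friendly in the sense used here; the more aggressive flows
   discussed below exceed this rate.
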
   It is convenient to divide flows into three classes: (1)
   TCP-friendly flows, (2) unresponsive flows, i.e., flows that do not
   slow down when congestion occurs, and (3) flows that are responsive
   but are not TCP-friendly.  The last two classes contain more
   aggressive flows that pose significant threats to Internet
   performance, which we will now discuss.

   1.  TCP-Friendly flows

       A TCP-friendly flow responds to congestion notification within
       a small number of path Round Trip Times (RTT), and in
       steady-state it uses no more capacity than a conformant TCP
       running under comparable conditions (drop rate, RTT, MTU,
       etc.).  This is described in the remainder of the document.

   2.  Non-Responsive Flows

       The User Datagram Protocol (UDP) [RFC0768] provides a minimal,
       best-effort transport to applications and upper-layer protocols
       (both simply called "applications" in the remainder of this
       document) and does not itself provide mechanisms to prevent
       congestion collapse and establish a degree of fairness
       [RFC5405].

       There is a growing set of UDP-based applications whose
       congestion avoidance algorithms are inadequate or nonexistent
       (i.e., a flow that does not throttle its sending rate when it
       experiences congestion).  Examples include some UDP streaming
       applications for packet voice and video, and some multicast
       bulk data transport.  If no action is taken, such unresponsive
       flows could lead to a new congestive collapse [RFC2309].

       In general, UDP-based applications need to incorporate
       effective congestion avoidance mechanisms [RFC5405].  Further
       research and development of ways to accomplish congestion
       avoidance for presently unresponsive applications continue to
       be important.  Network devices need to be able to protect
       themselves against unresponsive flows, and mechanisms to
       accomplish this must be developed and deployed.  Deployment of
       such mechanisms would provide an incentive for all applications
       to become responsive by either using a congestion-controlled
       transport (e.g., TCP, SCTP, DCCP) or by incorporating their own
       congestion control in the application [RFC5405].

   3.  Non-TCP-friendly Transport Protocols

       A second threat is posed by transport protocol implementations
       that are responsive to congestion, but, either deliberately or
       through faulty implementation, are not TCP-friendly.  Such
       applications may gain an unfair share of the available network
       capacity.

       For example, the popularity of the Internet has caused a
       proliferation in the number of TCP implementations.  Some of
       these may fail to implement the TCP congestion avoidance
       mechanisms correctly because of poor implementation.  Others
       may deliberately be implemented with congestion avoidance
       algorithms that are more aggressive in their use of capacity
       than other TCP implementations; this would allow a vendor to
       claim to have a "faster TCP".  The logical consequence of such
       implementations would be a spiral of increasingly aggressive
       TCP implementations, leading back to the point where there is
       effectively no congestion avoidance and the Internet is
       chronically congested.

       Another example could be an RTP/UDP video flow that uses an
       adaptive codec, but responds incompletely to indications of
       congestion, or responds only over an excessively long time
       period.  Such flows are unlikely to be responsive to congestion
       signals in a time frame comparable to a small number of
       end-to-end transmission delays.  However, over a longer
       timescale, perhaps seconds in duration, they could moderate
       their speed, or increase their speed if they determine capacity
       to be available.

       Tunneled traffic aggregates carrying multiple (short) TCP flows
       can be more aggressive than standard bulk TCP.  Applications
       (e.g., web browsers and peer-to-peer file-sharing) have
       exploited this by opening multiple connections to the same
       endpoint.

   The projected increase in the fraction of total Internet traffic
   for more aggressive flows in classes 2 and 3 clearly poses a threat
   to future Internet stability.  There is an urgent need for
   measurements of current conditions and for further research into
   the ways of managing such flows.  This raises many difficult issues
   in identifying and isolating unresponsive or non-TCP-friendly flows
   at an acceptable overhead cost.  Finally, there is as yet little
   measurement or simulation evidence available about the rate at
   which these threats are likely to be realized, or about the
   expected benefit of algorithms for managing such flows.

   Another topic requiring consideration is the appropriate
   granularity of a "flow" when considering a queue management method.
   There are a few "natural" answers: 1) a transport (e.g., TCP or
   UDP) flow (source address/port, destination address/port, DSCP); 2)
   a source/destination host pair (IP addresses, DSCP); 3) a given
   source host or a given destination host.  We suggest that the
   source/destination host pair gives the most appropriate granularity
   in many circumstances (the sketch below shows how these
   granularities differ in practice).  However, it is possible that
   different vendors/providers could set different granularities for
   defining a flow (as a way of "distinguishing" themselves from one
   another), or that different granularities could be chosen for
   different places in the network.  It may be the case that the
   granularity is less important than the fact that a network device
   needs to be able to deal with more unresponsive flows at *some*
   granularity.  The granularity of flows for congestion management
   is, at least in part, a question of policy that needs to be
   addressed in the wider IETF community.

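   As a concrete illustration of these three granularities (a Python
   sketch; the packet-field names are hypothetical):

      def flow_key(pkt, granularity):
          """Return the state key a queue manager would track, for
          each of the three 'natural' flow granularities above."""
          if granularity == "transport":   # 1) per transport flow
              return (pkt["src_ip"], pkt["src_port"],
                      pkt["dst_ip"], pkt["dst_port"], pkt["dscp"])
          if granularity == "host-pair":   # 2) per src/dst host pair
              return (pkt["src_ip"], pkt["dst_ip"], pkt["dscp"])
          if granularity == "host":        # 3) per source host
              return pkt["src_ip"]
          raise ValueError("unknown granularity")

      # At host-pair granularity, all connections between two hosts
      # aggregate into a single accountable unit.
      pkt = {"src_ip": "192.0.2.1", "src_port": 5001,
             "dst_ip": "198.51.100.7", "dst_port": 443, "dscp": 0}
      print(flow_key(pkt, "host-pair"))
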
4.  Conclusions and Recommendations

   The IRTF, in publishing [RFC2309], and the IETF in subsequent
   discussion, have developed a set of specific recommendations
   regarding the implementation and operational use of AQM procedures.
   This document updates these to include:

   1.  Network devices SHOULD implement some AQM mechanism to manage
       queue lengths, reduce end-to-end latency, and avoid lock-out
       phenomena within the Internet.

   2.  Deployed AQM algorithms SHOULD support Explicit Congestion
       Notification (ECN) as well as loss to signal congestion to
       endpoints.

   3.  The algorithms that the IETF recommends SHOULD NOT require
       operational (especially manual) configuration or tuning.

   4.  AQM algorithms SHOULD respond to measured congestion, not
       application profiles.

   5.  AQM algorithms SHOULD NOT interpret specific transport protocol
       behaviours.

   6.  Transport protocol congestion control algorithms SHOULD
       maximize their use of available capacity (when there is data to
       send) without incurring undue loss or undue round trip delay.

   7.  Research, engineering, and measurement efforts are needed
       regarding the design of mechanisms to deal with flows that are
       unresponsive to congestion notification or are responsive, but
       are more aggressive than present TCP.

   These recommendations are expressed using the word "SHOULD".  This
   is in recognition that there may be use cases that have not been
   envisaged in this document in which the recommendation does not
   apply.  However, care should be taken in concluding that one's use
   case falls in that category; during the life of the Internet, such
   use cases have been rarely if ever observed and reported on.  To
   the contrary, available research [Papagiannaki] says that even
   high-speed links in network cores that are normally very stable in
   depth and behavior experience occasional issues that need
   moderation.

4.1.  Operational deployments SHOULD use AQM procedures

   AQM procedures are designed to minimize delay induced in the
   network by queues that have filled as a result of host behavior.
   Marking and loss behaviors provide a signal that buffers within
   network devices are becoming unnecessarily full, and that the
   sender would do well to moderate its behavior.

4.2.  Signaling to the transport endpoints

   There are a number of ways a network device may signal to the end
   point that the network is becoming congested and trigger a
   reduction in rate.  The signalling methods include:

   o  Delaying data segments in flight, such as in a queue.

   o  Dropping data segments in transit.

   o  Marking data segments, such as using Explicit Congestion
      Notification [RFC3168] [RFC4301] [RFC4774] [RFC6040] [RFC6679].

   The use of scheduling mechanisms, such as priority queuing,
   classful queuing, and fair queuing, is often effective in networks
   to help a network serve the needs of a range of applications.
   Network operators can use these methods to manage traffic passing a
   choke point.  This is discussed in [RFC2474] and [RFC2475].

   Increased network latency can be used as an implicit signal of
   congestion.  For example, in TCP, additional delay can affect ACK
   clocking and has the effect of reducing the rate of transmission of
   new data.  In RTP, network latency impacts the RTCP-reported RTT,
   and increased latency can trigger a sender to adjust its rate.
   Methods such as LEDBAT [RFC6817] assume increased latency as a
   primary signal of congestion.

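   To illustrate delay used as a congestion signal, here is a
   simplified, LEDBAT-flavoured sketch (Python; the target and gain
   follow the spirit of [RFC6817], but the code is an invented
   illustration, not the specified algorithm):

      def ledbat_style_adjustment(current_delay, base_delay,
                                  target=0.1, gain=1.0):
          """React to delay as a congestion signal: shrink the
          sending window as queueing delay approaches the target
          (100 ms here), grow it while delay stays below."""
          queuing_delay = current_delay - base_delay
          # off_target is positive while queueing delay is under the
          # target, and negative (forcing a decrease) once the queue
          # has built beyond it.
          off_target = (target - queuing_delay) / target
          return gain * off_target  # scaled window adjustment

   No packet is lost in this exchange; the building queue itself is
   the signal, which is why such methods yield before loss-based
   flows.
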
   It is essential that all Internet hosts respond to loss [RFC5681]
   [RFC5405] [RFC2960] [RFC4340].  Packet dropping by network devices
   that are under load has two effects: It protects the network, which
   is the primary reason that network devices drop packets.  The
   detection of loss also provides a signal to a reliable transport
   (e.g., TCP, SCTP) that there is potential congestion, using a
   pragmatic heuristic: "when the network discards a message in
   flight, it may imply the presence of faulty equipment or media in a
   path, and it may imply the presence of congestion.  To be
   conservative, a transport must assume the latter."  Unreliable
   transports (e.g., using UDP) need to similarly react to loss
   [RFC5405].

   Network devices SHOULD use an AQM algorithm to determine which
   packets are affected by congestion.

   Loss also has an effect on the efficiency of a flow and can
   significantly impact some classes of application.  In reliable
   transports the dropped data must be subsequently retransmitted.
   While other applications/transports may adapt to the absence of
   lost data, this still implies inefficient use of available capacity
   and the dropped traffic can affect other flows.  Hence, loss is not
   entirely positive; it is a necessary evil.

4.2.1.  AQM and ECN

   Explicit Congestion Notification (ECN) [RFC3168] [RFC4301]
   [RFC4774] [RFC6040] [RFC6679] is a network-layer function that
   allows a transport to receive network congestion information from a
   network device without incurring the unintended consequences of
   loss.  ECN includes both transport mechanisms and functions
   implemented in network devices; the latter rely upon using AQM to
   decide whether to set an ECN mark.

   Congestion for ECN-capable transports is signalled by a network
   device setting the "Congestion Experienced (CE)" codepoint in the
   IP header.  This codepoint is noted by the remote receiving end
   point and signalled back to the sender using a transport protocol
   mechanism, allowing the sender to trigger timely congestion
   control.  The decision to set the CE codepoint requires an AQM
   algorithm configured with a threshold.  Non-ECN-capable flows (the
   default) are dropped under congestion, as sketched below.

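   The following fragment (Python; the packet-field names are
   hypothetical) sketches this decision: when the AQM algorithm
   selects a packet as part of its congestion response, ECN-capable
   packets are CE-marked and non-ECN-capable packets are dropped.  The
   codepoint values are those defined by [RFC3168]:

      # IP ECN codepoints (two bits in the IP header), per RFC 3168.
      NOT_ECT = 0b00   # not ECN-capable transport
      ECT_1   = 0b01   # ECN-capable transport
      ECT_0   = 0b10   # ECN-capable transport
      CE      = 0b11   # congestion experienced

      def on_congestion_signal(packet):
          """Apply the AQM decision to one packet that the algorithm
          has selected as part of its response to congestion."""
          if packet["ecn"] in (ECT_0, ECT_1):
              packet["ecn"] = CE   # mark instead of dropping
              return "marked"
          return "dropped"         # non-ECN-capable (default): drop

   A deployment would still need the drop-on-excess behaviour
   described next, since marking alone cannot restrain senders that
   ignore CE.
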
   Network devices SHOULD use an AQM algorithm that marks ECN-capable
   traffic when making decisions about the response to congestion.
   Network devices need to implement this method by marking
   ECN-capable traffic or by dropping non-ECN-capable traffic.

   Safe deployment of ECN requires that network devices drop excessive
   traffic, even when marked as originating from an ECN-capable
   transport.  This is necessary because (1) a non-conformant, broken,
   or malicious receiver could conceal an ECN mark and not report this
   to the sender; (2) a non-conformant, broken, or malicious sender
   could ignore a reported ECN mark, as it could ignore a loss without
   using ECN; and (3) a malfunctioning or non-conforming network
   device may similarly "hide" an ECN mark.  In normal operation such
   cases should be very uncommon.

   Network devices SHOULD use an algorithm to drop excessive traffic,
   even when marked as originating from an ECN-capable transport.

4.3.  AQM algorithms deployed SHOULD NOT require operational tuning

   A number of AQM algorithms have been proposed.  Many require some
   form of tuning or setting of parameters for initial network
   conditions.  This can make these algorithms difficult to use in
   operational networks.

   This document therefore recommends that AQM algorithms proposed for
   deployment in the Internet:

   o  SHOULD NOT require tuning of initial or configuration
      parameters.  An algorithm needs to provide a default behaviour
      that auto-tunes to a reasonable performance for typical network
      conditions.  This is expected to ease deployment and operation.

   o  MAY support further manual tuning that could improve performance
      in a specific deployed network.  Algorithms that lack such
      variables are acceptable, but if such variables exist, they
      SHOULD be externalized.  Guidance needs to be provided on the
      cases where autotuning is unlikely to achieve satisfactory
      performance and to identify the set of parameters that can be
      tuned.  This is expected to enable the algorithm to be deployed
      in networks that have specific characteristics (variable/larger
      delay; networks where capacity is impacted by interactions with
      lower-layer mechanisms; etc.).

   o  MAY provide logging and alarm signals to assist in identifying
      if an algorithm using manual or auto-tuning is functioning as
      expected (e.g., this could be based on an internal consistency
      check between input, output, and mark/drop rates over time, as
      sketched below).  This is expected to encourage deployment by
      default and allow operators to identify potential interactions
      with other network functions.

   Hence, self-tuning algorithms are to be preferred.  Algorithms
   recommended for general Internet deployment by the IETF need to be
   designed so that they do not require operational (especially
   manual) configuration or tuning.

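   As one way such a consistency check might look (an entirely
   hypothetical sketch; the counter names and tolerance are invented
   for illustration):

      def consistency_alarm(counters, tolerance=0.01):
          """Raise an alarm when the packet-conservation identity
          input = output + drops + still-queued stops holding, which
          can indicate a misbehaving or mis-tuned AQM instance."""
          accounted = (counters["packets_out"]
                       + counters["packets_dropped"]
                       + counters["packets_queued"])
          imbalance = abs(counters["packets_in"] - accounted)
          if imbalance > tolerance * max(counters["packets_in"], 1):
              return f"ALARM: {imbalance} packets unaccounted for"
          return "ok"

      sample = {"packets_in": 1_000_000, "packets_out": 989_500,
                "packets_dropped": 10_000, "packets_queued": 500}
      print(consistency_alarm(sample))   # -> "ok"
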
4.4.  AQM algorithms SHOULD respond to measured congestion, not
      application profiles.

   Not all applications transmit packets of the same size.  Although
   applications may be characterised by particular profiles of packet
   size, this should not be used as the basis for AQM (see the next
   section).  Other methods exist, e.g., Differentiated Services
   queueing and Pre-Congestion Notification (PCN) [RFC5559], that can
   be used to differentiate and police classes of application.
   Network devices may combine AQM with these traffic classification
   mechanisms and perform AQM only on specific queues within a network
   device.

   An AQM algorithm should not deliberately bias its behaviour towards
   the packet size that performs best (i.e., preferentially drop/mark
   based only on packet size).  Procedures for selecting packets to
   mark/drop SHOULD observe the actual or projected time that a packet
   is in a queue (bytes at a rate being an analog to time).  When an
   AQM algorithm decides whether to drop (or mark) a packet, it is
   RECOMMENDED that the size of the particular packet should not be
   taken into account [Byte-pkt].

   Applications (or transports) generally know the packet size that
   they are using and can hence make their judgements about whether to
   use small or large packets based on the data they wish to send and
   the expected impact on the delay or throughput, or other
   performance parameter.  When a transport or application responds to
   a dropped or marked packet, the size of the rate reduction should
   be proportionate to the size of the packet that was sent
   [Byte-pkt].

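   A sketch of the byte/time equivalence noted above (Python; the
   5 ms target delay is an invented example, not a recommended value):

      def queue_delay_seconds(queued_bytes, link_rate_bps):
          """Projected time a newly arriving packet would wait:
          bytes already queued, divided by the drain rate."""
          return (queued_bytes * 8) / link_rate_bps

      def should_drop(queued_bytes, link_rate_bps, target=0.005):
          # The decision observes queueing time only; the size of the
          # packet under consideration deliberately plays no part
          # [Byte-pkt].
          return queue_delay_seconds(queued_bytes,
                                     link_rate_bps) > target

      # 62,500 bytes queued on a 100 Mb/s link equal 5 ms of delay.
      print(queue_delay_seconds(62_500, 100_000_000))  # 0.005
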
4.5.  AQM algorithms SHOULD NOT be dependent on specific transport
      protocol behaviours

   In deploying AQM, network devices need to support a range of
   Internet traffic and SHOULD NOT make implicit assumptions about the
   characteristics desired by the set of transports/applications the
   network supports.  That is, AQM methods should be opaque to the
   choice of transport and application.

   AQM algorithms are often evaluated by considering TCP [RFC0793]
   with a limited number of applications.  Although TCP is the
   predominant transport in the Internet today, this no longer
   represents a sufficient selection of traffic for verification.
   There is significant use of UDP [RFC0768] in voice and video
   services, and some applications find utility in SCTP [RFC4960] and
   DCCP [RFC4340].  Hence, AQM algorithms should also demonstrate
   operation with transports other than TCP and need to consider a
   variety of applications.  Selection of AQM algorithms also needs to
   consider the use of tunnel encapsulations that may carry traffic
   aggregates.

   AQM algorithms SHOULD NOT target or derive implicit assumptions
   about the characteristics desired by specific transports/
   applications.  Transports and applications need to respond to the
   congestion signals provided by AQM (i.e., dropping or ECN-marking)
   in a timely manner (within a few RTTs at the latest).

4.6.  Interactions with congestion control algorithms

   Applications and transports need to react to received implicit or
   explicit signals that indicate the presence of congestion.  This
   section identifies issues that can impact the design of transport
   protocols when using paths that use AQM.

   Transport protocols and applications need timely signals of
   congestion.  The time taken to detect and respond to congestion is
   increased when network devices queue packets in buffers.  It can be
   difficult to detect tail losses at a higher layer, and this may
   sometimes require transport timers or probe packets to detect and
   respond to such loss.  Loss patterns may also impact timely
   detection, e.g., the time may be reduced when network devices do
   not drop long runs of packets from the same flow.

   A common objective is to deliver data from its source end point to
   its destination in the least possible time.  When speaking of TCP
   performance, the terms "knee" and "cliff" are defined by [Jain94].
   They respectively refer to the minimum congestion window that
   maximises throughput and the maximum congestion window that avoids
   loss.  An application that transmits at the rate determined by this
   window has the effect of maximizing the rate or throughput.  For
   the sender, exceeding the cliff is ineffective, as it (by
   definition) induces loss; operating at a point close to the cliff
   has a negative impact on other traffic and applications, triggering
   operator activities, such as those discussed in [RFC6057].
   Operating below the knee reduces the throughput, since the sender
   fails to use available network capacity.  As a result, the behavior
   of any elastic transport congestion control algorithm designed to
   minimise delivery time should seek to use an effective window at or
   above the knee and well below the cliff.  Choice of an appropriate
   rate can significantly impact the loss and delay experienced not
   only by a flow, but by other flows that share the same queue.

   Some applications may send less than permitted by the congestion
   control window (or rate).  Examples include multimedia codecs that
   stream at some natural rate (or set of rates) or an application
   that is naturally interactive (e.g., some web applications, gaming,
   transaction-based protocols).  Such applications may have different
   objectives.  They may not wish to maximise throughput, but may
   desire a lower loss rate or bounded delay.

   The correct operation of an AQM-enabled network device MUST NOT
   rely upon specific transport responses to congestion signals.

4.7.  The need for further research

   The second recommendation of [RFC2309] called for further research
   into the interaction between network queues and host applications,
   and the means of signaling between them.  This research has
   occurred, and we as a community have learned a lot.  However, we
   are not done.

   We have learned that the problems of congestion, latency, and
   buffer-sizing have not gone away, and are becoming more important
   to many users.  A number of self-tuning AQM algorithms have been
   found that offer significant advantages for deployed networks.
   There is also renewed interest in deploying AQM and the potential
   of ECN.

   In 2013, an obvious example of further research is the need to
   consider the use of Map/Reduce applications in data centers; do we
   need to extend our taxonomy of TCP/SCTP sessions to include not
   only "mice" and "elephants", but "lemmings"?  "Lemmings" are flash
   crowds of "mice" that the network inadvertently tries to signal to
   as if they were elephant flows, resulting in head-of-line blocking
   in data center applications.

   Examples of other required research include:

   o  Research into new AQM and scheduling algorithms.

   o  Research into the use and deployment of ECN alongside AQM.

   o  Tools for enabling AQM (and ECN) deployment and measuring the
      performance.

   o  Methods for mitigating the impact of non-conformant and
      malicious flows.

   This document therefore reiterates the call of RFC 2309: we need
   continuing research as applications develop.

5.  IANA Considerations

   This memo asks the IANA for no new parameters.

6.  Security Considerations

   While security is a very important issue, it is largely orthogonal
   to the performance issues discussed in this memo.

   Many deployed network devices use queueing methods that allow
   unresponsive traffic to capture network capacity, denying access to
   other traffic flows.  This could potentially be used as a
   denial-of-service attack.  This threat could be reduced when
   network devices deploy AQM or some form of scheduling.  We note,
   however, that a denial-of-service attack may create unresponsive
   traffic flows that may be indistinguishable from other traffic
   flows (e.g., tunnels carrying aggregates of short flows, high-rate
   isochronous applications).  New methods therefore may remain
   vulnerable, and this document recommends that ongoing research
   should consider ways to mitigate such attacks.

7.  Privacy Considerations

   This document, by itself, presents no new privacy issues.

8.  Acknowledgements

   The original recommendation in [RFC2309] was written by the
   End-to-End Research Group, which is to say Bob Braden, Dave Clark,
   Jon Crowcroft, Bruce Davie, Steve Deering, Deborah Estrin, Sally
   Floyd, Van Jacobson, Greg Minshall, Craig Partridge, Larry
   Peterson, KK Ramakrishnan, Scott Shenker, John Wroclawski, and
   Lixia Zhang.  This is an edited version of that document, with much
   of its text and arguments unchanged.

   The need for an updated document was agreed to in the tsvarea
   meeting at IETF 86.  This document was reviewed on the aqm@ietf.org
   list.  Comments came from Colin Perkins, Richard Scheffenegger, and
   Dave Taht.

   Gorry Fairhurst was in part supported by the European Community
   under its Seventh Framework Programme through the Reducing Internet
   Transport Latency (RITE) project (ICT-317700).

9.  References

9.1.  Normative References

   [Byte-pkt] "Byte and Packet Congestion Notification", Work in
              Progress, draft-ietf-tsvwg-byte-pkt-congest, July 2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
              Explicit Congestion Notification (ECN) Field", BCP 124,
              RFC 4774, November 2006.

   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage
              Guidelines for Application Designers", BCP 145,
              RFC 5405, November 2008.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, November 2010.
   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon,
              P., and K. Carlberg, "Explicit Congestion Notification
              (ECN) for RTP over UDP", RFC 6679, August 2012.

9.2.  Informative References

   [AQM-WG]   "IETF AQM WG".

   [Demers90] Demers, A., Keshav, S., and S. Shenker, "Analysis and
              Simulation of a Fair Queueing Algorithm,
              Internetworking: Research and Experience", SIGCOMM
              Symposium proceedings on Communications architectures
              and protocols, 1990.

   [Floyd91]  Floyd, S., "Connections with Multiple Congested Gateways
              in Packet-Switched Networks Part 1: One-way Traffic",
              Computer Communications Review, October 1991.

   [Floyd95]  Floyd, S. and V. Jacobson, "Link-sharing and Resource
              Management Models for Packet Networks", IEEE/ACM
              Transactions on Networking, August 1995.

   [Jacobson88]
              Jacobson, V., "Congestion Avoidance and Control",
              SIGCOMM Symposium proceedings on Communications
              architectures and protocols, August 1988.

   [Jain94]   Jain, R., Ramakrishnan, K., and D-M. Chiu, "Congestion
              avoidance scheme for computer networks", US Patent
              Office 5377327, December 1994.

   [Lakshman96]
              Lakshman, TV., Neidhardt, A., and T. Ott, "The Drop From
              Front Strategy in TCP Over ATM and Its Interworking with
              Other Control Features", IEEE Infocomm, 1996.

   [Leland94] Leland, W., Taqqu, M., Willinger, W., and D. Wilson, "On
              the Self-Similar Nature of Ethernet Traffic (Extended
              Version)", IEEE/ACM Transactions on Networking,
              February 1994.

   [Papagiannaki]
              Papagiannaki, K. et al. (Sprint ATL, KAIST, University
              of Minnesota, and Intel Research), "Analysis of
              Point-To-Point Packet Delay In an Operational Network",
              IEEE Infocom 2004, March 2004.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              August 1980.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC0896]  Nagle, J., "Congestion control in IP/TCP internetworks",
              RFC 896, January 1984.

   [RFC0970]  Nagle, J., "On packet switches with infinite storage",
              RFC 970, December 1985.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC1633]  Braden, B., Clark, D., and S. Shenker, "Integrated
              Services in the Internet Architecture: an Overview",
              RFC 1633, June 1994.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B.,
              Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
              Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
              K., Shenker, S., Wroclawski, J., and L. Zhang,
              "Recommendations on Queue Management and Congestion
              Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.
   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340,
              March 2006.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, September 2008.

   [RFC5559]  Eardley, P., "Pre-Congestion Notification (PCN)
              Architecture", RFC 5559, June 2009.

   [RFC6057]  Bastian, C., Klieber, T., Livingood, J., Mills, J., and
              R. Woundy, "Comcast's Protocol-Agnostic Congestion
              Management System", RFC 6057, December 2010.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)",
              RFC 6817, December 2012.

   [Willinger95]
              Willinger, W., Taqqu, M., Sherman, R., Wilson, D., and
              V. Jacobson, "Self-Similarity Through High-Variability:
              Statistical Analysis of Ethernet LAN Traffic at the
              Source Level", SIGCOMM Symposium proceedings on
              Communications architectures and protocols, August 1995.

Appendix A.  Change Log

   Initial Version: March 2013

   Minor update of "the algorithms that the IETF recommends SHOULD NOT
   require operational (especially manual) configuration or tuning".
   April 2013

   Major surgery.  This draft is for discussion at IETF-87 and
   expected to be further updated.  July 2013

   -00 WG Draft - Updated transport recommendations; revised
   deployment configuration section; numerous minor edits.  Oct 2013

Authors' Addresses

   Fred Baker (editor)
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com

   Godred Fairhurst (editor)
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk