idnits 2.17.1 draft-hoeiland-joergensen-aqm-fq-codel-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 8 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 490 has weird spacing: '...st_head flowc...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (November 10, 2014) is 3454 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '
' and
     '' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 719

  == Unused Reference: 'RFC2119' is defined on line 671, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC0896' is defined on line 674, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC0970' is defined on line 677, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2309' is defined on line 680, but no explicit
     reference was found in the text

  == Unused Reference: 'SFQ' is defined on line 694, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC  896 (Obsoleted by RFC 7805)

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)


     Summary: 4 errors (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	AQM working group                                 T. Hoeiland-Joergensen
3	Internet-Draft                                       Karlstad University
4	Intended status: Informational                               P. McKenney
5	Expires: May 14, 2015                        IBM Linux Technology Center
6	                                                                 D. Taht
7	                                                                Teklibre
8	                                                               J. Gettys
9	                                                              E. Dumazet
10	                                                            Google, Inc.
11	                                                       November 10, 2014

13	                            FlowQueue-Codel
14	               draft-hoeiland-joergensen-aqm-fq-codel-01

16	Abstract

18	   This memo presents the FQ-CoDel hybrid packet scheduler/AQM
19	   algorithm, a critical tool for fighting bufferbloat and reducing
20	   latency across the Internet.

22	   FQ-CoDel mixes packets from multiple flows and reduces the impact of
23	   head of line blocking from bursty traffic.  It provides isolation for
24	   low-rate traffic such as DNS, web, and videoconferencing traffic.  It
25	   improves utilisation across the networking fabric, especially for
26	   bidirectional traffic, by keeping queue lengths short; and it can be
27	   implemented in a memory- and CPU-efficient fashion across a wide
28	   range of hardware.

30	Status of This Memo

32	   This Internet-Draft is submitted in full conformance with the
33	   provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF).  Note that other groups may also distribute
37	   working documents as Internet-Drafts.  The list of current Internet-
38	   Drafts is at http://datatracker.ietf.org/drafts/current/.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress."

45	   This Internet-Draft will expire on May 14, 2015.

47	Copyright Notice

49	   Copyright (c) 2014 IETF Trust and the persons identified as the
50	   document authors.  All rights reserved.

52	   This document is subject to BCP 78 and the IETF Trust's Legal
53	   Provisions Relating to IETF Documents
54	   (http://trustee.ietf.org/license-info) in effect on the date of
55	   publication of this document.  Please review these documents
56	   carefully, as they describe your rights and restrictions with respect
57	   to this document.  Code Components extracted from this document must
58	   include Simplified BSD License text as described in Section 4.e of
59	   the Trust Legal Provisions and are provided without warranty as
60	   described in the Simplified BSD License.

62	1.  Introduction

64	   The FQ-CoDel algorithm is a combined packet scheduler and AQM
65	   developed as part of the bufferbloat-fighting community effort.  It
66	   is based on a modified Deficit Round Robin (DRR) queue scheduler,
67	   with the CoDel AQM algorithm operating on each sub-queue.  This
68	   document describes the combined algorithm; reference implementations
69	   are available for ns2 and ns3 and it is included in the mainline
70	   Linux kernel as the FQ-CoDel queueing discipline.

72	   The rest of this document is structured as follows: This section
73	   gives some concepts and terminology used in the rest of the document,
74	   and gives a short informal summary of the FQ-CoDel algorithm.
75	   Section 2 gives an overview of the CoDel algorithm.  Section 3 covers
76	   the DRR portion.  Section 4 defines the parameters and data
77	   structures employed by FQ-CoDel.  Section 5 describes the working of
78	   the algorithm in detail.  Section 6 describes implementation
79	   considerations, and section 7 lists some of the limitations of using
80	   flow queueing.  Finally section 11 concludes.

82	1.1.  Terminology and concepts

84	   Flow: A flow is typically identified by a 5-tuple of source IP,
85	   destination IP, source port, destination port, and protocol.  It can
86	   also be identified by a superset or subset of those parameters, or by
87	   mac address, or other means.

89	   Queue: A queue of packets represented internally in FQ-CoDel.  In
90	   most instances each flow gets its own queue; however because of the
91	   possibility of hash collisions, this is not always the case.  In an
92	   attempt to avoid confusion, the word 'queue' is used to refer to the
93	   internal data structure, and 'flow' to refer to the actual stream of
94	   packets being delivered to the FQ-CoDel algorithm.

96	   Scheduler: A mechanism to select which queue a packet is dequeued
97	   from.

99	   CoDel AQM: The Active Queue Management algorithm employed by FQ-
100	   CoDel.

102	   DRR: Deficit round-robin scheduling.

104	   Quantum: The maximum amount of bytes to be dequeued from a queue at
105	   once.

107	1.2.  Informal summary of FQ-CoDel

109	   FQ-CoDel is a _hybrid_ of DRR [DRR] and CODEL [CODEL2012], with an
110	   optimisation for sparse flows similar to SQF [SQF2012] and DRR++
111	   [DRRPP].  We call this "Flow Queueing" rather than "Fair Queueing" as
112	   flows that build a queue are treated differently than flows that do
113	   not.

115	   FQ-CoDel stochastically classifies incoming packets into different
116	   sub-queues by hashing the 5-tuple of IP protocol number and source
117	   and destination IP and port numbers, perturbed with a random number
118	   selected at initiation time (although other flow classification
119	   schemes can optionally be configured instead).  Each queue is managed
120	   by the CoDel queueing discipline.  Packet ordering within a queue is
121	   preserved, since queues have FIFO ordering.

123	   The FQ-CoDel algorithm consists of two logical parts: the scheduler
124	   which selects which queue to dequeue a packet from, and the CoDel AQM
125	   which works on each of the queues.  The subtleties of FQ-CoDel are
126	   mostly in the scheduling part, whereas the interaction between the
127	   scheduler and the CoDel algorithm are fairly straight forward:

129	   At initialisation, each queue is set up to have a separate set of
130	   CoDel state variables.  By default, 1024 queues are created.  The
131	   current implementation supports anywhere from one to 64K separate
132	   queues, and each queue maintains the state variables throughout its
133	   lifetime, and so acts the same as the non-FQ CoDel variant would.
134	   This means that with only one queue, FQ-CoDel behaves essentially the
135	   same as CoDel by itself.

137	   On dequeue, FQ-CoDel selects a queue from which to dequeue by a two-
138	   tier round-robin scheme, in which each queue is allowed to dequeue up
139	   to a configurable quantum of bytes for each iteration.  Deviations
140	   from this quantum is maintained as a deficit for the queue, which
141	   serves to make the fairness scheme byte-based rather than a packet-
142	   based.  The two-tier round-robin mechanism distinguishes between
143	   "new" queues (which don't build up a standing queue) and "old"
144	   queues, that have queued enough data to be around for more than one
145	   iteration of the round-robin scheduler.

147	   This new/old queue distinction has a particular consequence for
148	   queues that don't build up more than a quantum of bytes before being
149	   visited by the scheduler: Such queues are removed from the list, and
150	   then re-added as a new queue each time a packet arrives for it, and
151	   so will get priority over queues that do not empty out each round
152	   (except for a minor modification to protect against starvation,
153	   detailed below).  Exactly how much data a flow has to send to keep
154	   its queue in this state is somewhat difficult to reason about,
155	   because it depends on both the egress link speed and the number of
156	   concurrent flows.  However, in practice many things that are
157	   beneficial to have prioritised for typical internet use (ACKs, DNS
158	   lookups, interactive SSH, HTTP requests, ARP, ICMP, VoIP) _tend_ to
159	   fall in this category, which is why FQ-CoDel performs so well for
160	   many practical applications.  However, the implicitness of the
161	   prioritisation means that for applications that require guaranteed
162	   priority (for instance multiplexing the network control plane over
163	   the network itself), explicit classification is still needed.

165	   This scheduling scheme has some subtlety to it, which is explained in
166	   detail in the remainder of this document.

168	2.  CoDel

170	   CoDel is described in the the ACM Queue paper, CODEL [CODEL2012], and
171	   Van Jacobson's IETF84 presentation CODELDRAFT [CODELDRAFT].  The
172	   basic idea is to control queue length, maintaining sufficient
173	   queueing to keep the outgoing link busy, but avoiding building up the
174	   queue beyond that point.  This is done by preferentially dropping
175	   packets that remain in the queue for "too long".

177	   When each new packet arrives, its arrival time is stored with it.
178	   Later, when it is that packet's turn to be dequeued, CoDel computes
179	   its sojourn time (the current time minus the arrival time).  If the
180	   sojourn time for packets being dequeued exceeds the _target_ time for
181	   a time period of at least the (current value of) _interval_, one or
182	   more packets will be dropped (or marked, if ECN is enabled) in order
183	   to signal the source endpoint to reduce its send rate.  If the
184	   sojourn still remains above the target time, the value of interval be
185	   lowered, and additional packet drops will occur on a schedule
186	   computed from an inverse-square-root control law until either (1) the
187	   queue becomes empty or (2) a packet is encountered with a sojourn
188	   time that is less than the target time.  Upon exiting the dropping
189	   mode, CoDel caches the last calculated interval (applying varying
190	   amounts of hysteresis to it), to be used as the starting point on
191	   subsequent re-entries into dropping mode.

193	   The _target_ time is normally set to about five milliseconds, and the
194	   initial _interval_ is normally set to about 100 milliseconds.  This
195	   approach has proven to be quite effective in a wide variety of
196	   situations.

198	   CoDel drops packets at the head of a queue, rather than at the tail.

200	3.  Flow Queueing

202	   FQ-CoDel's DRR scheduler is byte-based, employing a deficit round-
203	   robin mechanism between queues.  This works by keeping track of the
204	   current byte _deficit_ of each queue.  This deficit is initialised to
205	   the configurable quantum; each time a queue gets a dequeue
206	   opportunity, it gets to dequeue packets, decreasing the deficit by
207	   the packet size for each packet, until the deficit runs into the
208	   negative, at which point it is increased by one quantum, and the
209	   dequeue opportunity ends.

211	   This means that if one queue contains packets of size quantum/3, and
212	   another contains quantum-sized packets, the first queue will dequeue
213	   three packets each time it gets a turn, whereas the second only
214	   dequeues one.  This means that flows that send small packets are not
215	   penalised by the difference in packet sizes; rather, the DRR scheme
216	   approximates a (single-)byte-based fairness queueing.  The size of
217	   the quantum determines the scheduling granularity, with the tradeoff
218	   from too small a quantum being scheduling overhead.  For small
219	   bandwidths, lowering the quantum from the default MTU size can be
220	   advantageous.

222	   Unlike DRR there are two sets of flows - a "new" list for flows that
223	   have not built a queue recently, and an "old" list for flow-building
224	   queues.

226	4.  FQ-CoDel Parameters and Data Structures

228	   This section goes into the parameters and data structures in FQ-
229	   CoDel.

231	4.1.  Parameters

233	4.1.1.  Interval

235	   The _interval_ parameter has the same semantics as CoDel and is used
236	   to ensure that the measured minimum delay does not become too stale.
237	   The minimum delay MUST be experienced in the last epoch of length
238	   interval.  It SHOULD be set on the order of the worst-case RTT
239	   through the bottleneck to give end-points sufficient time to react.

241	   The default interval value is 100 ms.

243	4.1.2.  Target

245	   The _target_ parameter has the same semantics as CoDel.  It is the
246	   acceptable minimum standing/persistent queue delay for each FQ-CoDel
247	   Queue.  This minimum delay is identified by tracking the local
248	   minimum queue delay that packets experience.

250	   The default target value is 5 ms, but this value SHOULD be tuned to
251	   be at least the transmission time of a single MTU-sized packet at the
252	   prevalent egress link speed (which for e.g. 1Mbps and MTU 1500 is
253	   ~15ms).  It should otherwise be set to on the order of 5-10% of the
254	   configured interval.

256	4.1.3.  Packet limit

258	   Routers do not have infinite memory, so some packet limit MUST be
259	   enforced.

261	   The _limit_ parameter is the hard limit on the real queue size,
262	   measured in number of packets.  This limit is a global limit on the
263	   number of packets in all queues; each individual queue does not have
264	   an upper limit.  When the limit is reached and a new packet arrives
265	   for enqueue, a packet is dropped from the head of the largest queue
266	   (measured in bytes) to make room for the new packet.

268	   The default packet limit is 10240 packets, which is suitable for up
269	   to 10GigE speeds.  In practice, the hard limit is rarely, if ever,
270	   hit, as drops are performed by the CoDel algorithm long before the
271	   limit is hit.  For platforms that are severely memory constrained, a
272	   lower limit can be used.

274	4.1.4.  Quantum

276	   The _quantum_ parameter is the number of bytes each queue gets to
277	   dequeue on each round of the scheduling algorithm.  The default is
278	   set to 1514 bytes which corresponds to the Ethernet MTU plus the
279	   hardware header length of 14 bytes.

281	   In TSO-enabled systems, where a "packet" consists of an offloaded
282	   packet train, it can presently be as large as 64K bytes.  In GRO-
283	   enabled systems, up to 17 times the TCP max segment size (or 25K
284	   bytes).

286	4.1.5.  Flows

288	   The _flows_ parameter sets the number of sub-queues into which the
289	   incoming packets are classified.  Due to the stochastic nature of
290	   hashing, multiple flows may end up being hashed into the same slot.

292	   This parameter can be set only at load time since memory has to be
293	   allocated for the hash table in the current implementation.

295	   The default value is 1024.

297	4.1.6.  ECN

299	   ECN is _enabled_ by default.  Rather than do anything special with
300	   misbehaved ECN flows, FQ-CoDel relies on the packet scheduling system
301	   to minimise their impact, thus unresponsive packets in a flow being
302	   marked with ECN can grow to the overall packet limit, but will not
303	   otherwise affect the performance of the system.

305	   It can be disabled by specifying the _noecn_ parameter.

307	4.2.  Data structures

309	4.2.1.  Internal sub-queues

311	   The main data structure of FQ-CoDel is the array of sub-queues, which
312	   is instantiated to the number of queues specified by the _flows_
313	   parameter at instantiation time.  Each sub-queue consists simply of
314	   an ordered list of packets with FIFO semantics, two state variables
315	   tracking the queue deficit and total number of bytes enqueued, and
316	   the set of CoDel state variables.  Other state variables to track
317	   queue statistics can also be included: for instance, the Linux
318	   implementation keeps a count of dropped packets.

320	   Queue space is shared: there's a global limit on the number of
321	   packets the queues can hold, but not one per queue.

323	4.2.2.  New and old queues lists

325	   FQ-CoDel maintains two lists of active queues, called "new" and "old"
326	   queues.  Each list is an ordered list containing references to the
327	   array of sub-queues.  When a packet is added to a queue that is not
328	   currently active, that queue becomes active by being added to the
329	   list of new queues.  Later on, it is moved to the list of old queues,
330	   from which it is removed when it is no longer active.  This behaviour
331	   is the source of some subtlety in the packet scheduling at dequeue
332	   time, explained below.

334	5.  The FQ-CoDel scheduler and AQM interactions

336	   This section describes the operation of the FQ-CoDel scheduler and
337	   AQM.  It is split into two parts explaining the enqueue and dequeue
338	   operations.

340	5.1.  Enqueue

342	   The packet enqueue mechanism consists of three stages: classification
343	   into a sub-queue, timestamping and bookkeeping, and optionally
344	   dropping a packet when the total number of enqueued packets goes over
345	   the maximum.

347	   When a packet is enqueued, it is first classified into the
348	   appropriate sub-queue.  By default, this is done by hashing on the
349	   5-tuple of IP protocol, and source and destination IP and port
350	   numbers, permuted by a random value selected at initialisation time,
351	   and taking the hash value modulo the number of sub-queues.  However,
352	   an implementation MAY also specify a configurable classification
353	   scheme along a wide variety of other possible parameters such as mac
354	   address, diffserv, firewall and flow specific markings, etc. (the
355	   Linux implementation does so in the form of the 'tc filter' command).

357	   If a custom filter fails, classification failure results in the
358	   packet being dropped and no further action taken.  By design the
359	   standard filter cannot fail.

361	   Additionally, the default hashing algorithm presently deployed does
362	   decapsulation of some common packet types (6in4, IPIP, GRE 0), mixes
363	   IPv6 IP addresses thoroughly, and uses Jenkins hash on the result.

365	   Once the packet has been successfully classified into a sub-queue, it
366	   is handed over to the CoDel algorithm for timestamping.  It is then
367	   added to the tail of the selected queue, and the queue's byte count
368	   is updated by the packet size.  Then, if the queue is not currently
369	   active (i.e. if it is not in either the list of new or the list of
370	   old queues), it is added to the end of the list of new queues, and
371	   its deficit is initiated to the configured quantum.  Otherwise it is
372	   added to the old queue list.

374	   Finally, the total number of enqueued packets is compared with the
375	   configured limit, and if it is _above_ this value (which can happen
376	   since a packet was just enqueued), a packet is dropped from the head
377	   of the queue with the largest current byte count.  Note that this in
378	   most cases means that the packet that gets dropped is different from
379	   the one that was just enqueued, and may even be from a different
380	   queue.

382	5.2.  Dequeue

384	   Most of FQ-CoDel's work is done at packet dequeue time.  It consists
385	   of three parts: selecting a queue from which to dequeue a packet,
386	   actually dequeuing it (employing the CoDel algorithm in the process),
387	   and some final bookkeeping.

389	   For the first part, the scheduler first looks at the list of new
390	   queues; for each queue in that list, if that queue has a negative
391	   deficit (i.e. it has already dequeued at least a quantum of bytes),
392	   its deficit is increased by one quantum, and the queue is put onto
393	   the end of the list of old queues, and the routine selects the next
394	   queue and starts again.

396	   Otherwise, that queue is selected for dequeue.  If the list of new
397	   queues is empty, the scheduler proceeds down the list of old queues
398	   in the same fashion (checking the deficit, and either selecting the
399	   queue for dequeuing, or increasing the deficit and putting the queue
400	   back at the end of the list).

402	   After having selected a queue from which to dequeue a packet, the
403	   CoDel algorithm is invoked on that queue.  This applies the CoDel
404	   control law, and may discard one or more packets from the head of
405	   that queue, before returning the packet that should be dequeued (or
406	   nothing if the queue is or becomes empty while being handled by the
407	   CoDel algorithm).

409	   Finally, if the CoDel algorithm did not return a packet, the queue is
410	   empty, and the scheduler does one of two things: if the queue
411	   selected for dequeue came from the list of new queues, it is moved to
412	   the end of the list of old queues.  If instead it came from the list
413	   of old queues, that queue is removed from the list, to be added back
414	   (as a new queue) the next time a packet arrives that hashes to that
415	   queue.  Then (since no packet was available for dequeue), the whole
416	   dequeue process is restarted from the beginning.

418	   If, instead, the scheduler _did_ get a packet back from the CoDel
419	   algorithm, it updates the byte deficit for the selected queue before
420	   returning the packet as the result of the dequeue operation.

422	   The step that moves an empty queue from the list of new queues to the
423	   list of old queues before it is removed is crucial to prevent
424	   starvation.  Otherwise the queue could reappear (the next time a
425	   packet arrives for it) before the list of old queues is visited; this
426	   can go on indefinitely even with a small number of active flows, if
427	   the flow providing packets to the queue in question transmits at just
428	   the right rate.  This is prevented by first moving the queue to the
429	   list of old queues, forcing a pass through that, and thus preventing
430	   starvation.

432	   The resulting migration of queues between the different states is
433	   summarised in the following state diagram:

435	 +-----------------+                +--------------------+
436	 |                 |     Empty      |                    |
437	 |     Empty       |<---------------+        Old         +-----+
438	 |                 |                |                    |     |
439	 +-------+---------+                +--------------------+     |
440	         |                             ^              ^        |Quantum
441	         |Arrival                      |              |        |Exceeded
442	         v                             |              |        |
443	 +-----------------+                   |              |        |
444	 |                 |     Empty or      |              |        |
445	 |      New        +-------------------+              +--------+
446	 |                 |  Quantum exceeded
447	 +-----------------+

449	6.  Implementation considerations

451	6.1.  Probability of hash collisions

453	   Since the Linux FQ-CoDel implementation by default uses 1024 hash
454	   buckets, the probability that (say) 100 VoIP sessions will all hash
455	   to the same bucket is something like ten to the power of minus 300.
456	   Thus, the probability that at least one of the VoIP sessions will
457	   hash to some other queue is very high indeed.

459	   Conversely, the probability that each of the 100 VoIP sessions will
460	   get its own queue is given by (1023!/(924!*1024^99)) or about 0.007;
461	   so not all that probable.  The probability rises sharply, however, if
462	   we are willing to accept a few collisions.  For example, there is
463	   about an 86% probability that no more than two of the 100 VoIP
464	   sessions will be involved in any given collision, and about a 99%
465	   probability that no more than three of the VoIP sessions will be
466	   involved in any given collision.  These last two results were
467	   computed using Monte Carlo simulations: Oddly enough, the mathematics
468	   for VoIP-session collision exactly matches that of hardware cache
469	   overflow.

471	6.2.  Memory Overhead

473	   FQ-CoDel can be implemented with a very low memory footprint (less
474	   than 64 bytes per queue on 64 bit systems).  These are the data
475	   structures used in the Linux implementation:

477	struct codel_vars {
478	    u32             count;
479	    u32             lastcount;
480	    bool            dropping;
481	    u16             rec_inv_sqrt;
482	    codel_time_t    first_above_time;
483	    codel_time_t    drop_next;
484	    codel_time_t    ldelay;
485	};

487	struct fq_codel_flow {
488	    struct sk_buff    *head;
489	    struct sk_buff    *tail;
490	    struct list_head  flowchain;
491	    int               deficit;
492	    u32               dropped; /* number of drops (or ECN marks) on this flow */
493	    struct codel_vars cvars;
494	};

496	   The master table managing all queues looks like this:

498	  struct fq_codel_sched_data {
499	      struct tcf_proto *filter_list;  /* optional external classifier */
500	      struct fq_codel_flow *flows;    /* Flows table [flows_cnt] */
501	      u32             *backlogs;      /* backlog table [flows_cnt] */
502	      u32             flows_cnt;      /* number of flows */
503	      u32             perturbation;   /* hash perturbation */
504	      u32             quantum;        /* psched_mtu(qdisc_dev(sch)); */
505	      struct codel_params cparams;
506	      struct codel_stats cstats;
507	      u32             drop_overlimit;
508	      u32             new_flow_count;

510	      struct list_head new_flows;     /* list of new flows */
511	      struct list_head old_flows;     /* list of old flows */
512	  };

514	6.3.  Per-Packet Timestamping

516	   The CoDel portion of the algorithm requires per-packet timestamps be
517	   stored along with the packet.  While this approach works well for
518	   software-based routers, it may be impossible to retrofit devices that
519	   do most of their processing in silicon and lack space or mechanism
520	   for timestamping.

522	   Also, while perfect resolution is not needed, timestamping functions
523	   in the core OS need to be efficient as they are called at least once
524	   on each packet enqueue and dequeue.

526	6.4.  Other forms of "Fair Queueing"

528	   Much of the scheduling portion of FQ-CoDel is derived from DRR and is
529	   substantially similar to DRR++. SFQ-based versions have also been
530	   produced and tested in ns2.  Other forms of Fair Queueing, such as
531	   WFQ or QFQ, have not been thoroughly explored.

533	6.5.  Differences between CoDel and FQ-CoDel behaviour

535	   CoDel can be applied to a single queue system as a straight AQM,
536	   where it converges towards an "ideal" drop rate (i.e.  one that
537	   minimises delay while keeping a high link utilisation), and then
538	   optimises around that control point.

540	   The scheduling of FQ-CoDel mixes packets of competing flows, which
541	   acts to pace bursty flows to better fill the pipe.  Additionally, a
542	   new flow gets substantial "credit" over other flows until CoDel finds
543	   an ideal drop rate for it.  However, for a new flow that exceeds the
544	   configured quantum, more time passes before all of its data is
545	   delivered (as packets from it, too, are mixed across the other
546	   existing queue-building flows).  Thus, FQ-CoDel takes longer (as
547	   measured in time) to converge towards an ideal drop rate for a given
548	   new flow, but does so within fewer delivered _packets_ from that
549	   flow.

551	   Finally, the flow isolation FQ-CoDel provides means that the CoDel
552	   drop mechanism operates on the flows actually building queues, which
553	   results in packets being dropped more accurately from the largest
554	   flows than CoDel alone manages.  Additionally, flow isolation
555	   radically improves the transient behaviour of the network when
556	   traffic or link characteristics change (e.g. when new flows start up
557	   or the link bandwidth changes); while CoDel itself can take a while
558	   to respond, fq_codel doesn't miss a beat.

560	7.  Limitations of flow queueing

562	   While FQ-CoDel has been shown in many scenarios to offer significant
563	   performance gains, there are some scenarios where the scheduling
564	   algorithm in particular is not a good fit.  This section documents
565	   some of the known cases which either may require tweaking the default
566	   behaviour, or where alternatives to flow queueing should be
567	   considered.

569	7.1.  Fairness between things other than flows

571	   In some parts of the network, enforcing flow-level fairness may not
572	   be desirable, or some other level of fairness may be more important.
573	   An example of this can be an Internet Service Provider that may be
574	   more interested in ensuring fairness between customers than between
575	   flows.  Or a hosting or transit provider that wishes to ensure
576	   fairness between connecting Autonomous Systems or networks.  Another
577	   issue can be that the number of simultaneous flows experienced at a
578	   particular link can be too high for flow-based fairness queueing to
579	   be effective.

581	   Whatever the reason, in a scenario where fairness between flows is
582	   not desirable, reconfiguring FQ-CoDel to match on a different
583	   characteristic can be a way forward.  The implementation in Linux can
584	   leverage the powerful packet matching mechanism of the _tc_ subsystem
585	   to use any available packet field to partition packets into virtual
586	   queues, to for instance match on address or subnet source/destination
587	   pairs, application layer characteristics, etc.

589	   Furthermore, as commonly deployed today, FQ-CoDel is used with three
590	   or more tiers of classification: priority, best effort and
591	   background, based on diffserv markings.  Some products do more
592	   detailed classification, including deep packet inspection and
593	   destination-specific filters to achieve their desired result.

595	7.2.  Flow bunching by opaque encapsulation

597	   Where possible, FQ-CoDel will attempt to decapsulate packets before
598	   matching on the header fields for the flow hashing.  However, for
599	   some encapsulation techniques, most notably encrypted VPNs, this is
600	   not possible.  If several flows are bunched into one such
601	   encapsulated tunnel, they will be seen as one flow by the FQ-CoDel
602	   algorithm.  This means that they will share a queue, and drop
603	   behaviour, and so flows inside the encapsulation will not benefit
604	   from the implicit prioritisation of FQ-CoDel, but will continue to
605	   benefit from the reduced overall queue length from the CoDel
606	   algorithm operating on the queue.  In addition, when such an
607	   encapsulated bunch competes against other flows, it will count as one
608	   flow, and not assigned a share of the bandwidth based on how many
609	   flows are inside the encapsulation.

611	   Depending on the application, this may or may not be desirable
612	   behaviour.  In cases where it is not, changing FQ-CoDel's matching to
613	   not be flow-based (as detailed in the previous subsection above) can
614	   be a way to mitigate this.

616	7.3.  Low-priority congestion control algorithms

618	   Because of the flow isolation that FQ-CoDel provides, low-priority
619	   congestion control algorithms (or, in general, algorithms that try to
620	   voluntarily use up less than their fair share of bandwidth) can be
621	   re-prioritised.  Because a flow experiences very little added latency
622	   when the link is congested, such algorithms lack the signal to back
623	   off that added latency previously afforded them.  As such, existing
624	   algorithms tend to revert to loss-based congestion control, and will
625	   consume the fair share of bandwidth afforded to them by the FQ-CoDel
626	   scheduler.  However, low-priority congestion control mechanisms may
627	   be able to take steps to continue to be low priority, for instance by
628	   taking into account the vastly reduced level of delay afforded by an
629	   AQM, or by using a coupled approach to observing the behaviour of
630	   multiple flows.

632	8.  Security Considerations

634	   There are no specific security exposures associated with FQ-CoDel.
635	   Some exposures present in current FIFO systems are in fact reduced
636	   (e.g. simple minded packet floods).

638	9.  IANA Considerations

640	   This document has no actions for IANA.

642	10.  Acknowledgements

644	   Our deepest thanks to Eric Dumazet (author of FQ-CoDel), Kathie
645	   Nichols, Van Jacobson, and all the members of the bufferbloat.net
646	   effort.

648	11.  Conclusions

650	   FQ-CoDel is a very general, efficient, nearly parameterless active
651	   queue management approach combining flow queueing with CoDel.  It is
652	   a critical tool in solving bufferbloat.

654	   FQ-CoDel's default settings SHOULD be modified for other special-
655	   purpose networking applications, such as for exceptionally slow
656	   links, for use in data centres, or on links with inherent delay
657	   greater than 800ms (e.g. satellite links).

659	   On-going projects are: improving FQ-CoDel with more SFQ-like
660	   behaviour for lower bandwidth systems, improving the control law,
661	   optimising sparse packet drop behaviour, etc..

663	   In addition to the Linux kernel sources, ns2 and ns3 models are
664	   available.  Refinements (such as NFQCODEL [1]) are being tested in
665	   the CeroWrt effort.

667	12.  References

669	12.1.  Normative References

671	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
672	              Requirement Levels", BCP 14, RFC 2119, March 1997.

674	   [RFC0896]  Nagle, J., "Congestion control in IP/TCP internetworks",
675	              RFC 896, January 1984.

677	   [RFC0970]  Nagle, J., "On packet switches with infinite storage", RFC
678	              970, December 1985.

680	   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
681	              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
682	              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
683	              S., Wroclawski, J., and L. Zhang, "Recommendations on
684	              Queue Management and Congestion Avoidance in the
685	              Internet", RFC 2309, April 1998.

687	   [CODELDRAFT]
688	              Nichols, K., Jacobson, V., McGregor, A., and J. Iyengar,
689	              "Controlling Queue Delay", October 2014,
690	              .

692	12.2.  Informative References

694	   [SFQ]      McKenney, P., "Stochastic Fairness Queuing", September
695	              1990, .

698	   [CODEL2012]
699	              Nichols, K. and V. Jacobson, "Controlling Queue Delay",
700	              July 2012, .

702	   [SQF2012]  Bonald, T., Muscariello, L., and N. Ostallo, "On the
703	              impact of TCP and per-flow scheduling on Internet
704	              Performance - IEEE/ACM transactions on Networking", April
705	              2012, .

708	   [DRR]      Shreedhar, M. and G. Varghese, "Efficient Fair Queueing
709	              Using Deficit Round Robin", June 1996,
710	              .

713	   [DRRPP]    MacGregor, and Shi, "Deficits for Bursty Latency-critical
714	              Flows: DRR++", 2000, .

717	12.3.  URIs

719	   [1] http://www.bufferbloat.net/projects/cerowrt/wiki/nfq_codel

721	Authors' Addresses

723	   Toke Hoeiland-Joergensen
724	   Karlstad University
725	   Dept. of Computer Science
726	   Karlstad  65188
727	   Sweden

729	   Email: toke.hoiland-jorgensen@kau.se

731	   Paul McKenney
732	   IBM Linux Technology Center
733	   1385 NW Amberglen Parkway
734	   Hillsboro, OR  97006
735	   USA

737	   Email: paulmck@linux.vnet.ibm.com
738	   URI:   http://www2.rdrop.com/~paulmck/

740	   Dave Taht
741	   Teklibre
742	   2104 W First street
743	   Apt 2002
744	   FT Myers, FL  33901
745	   USA

747	   Email: d+ietf@teklibre.com
748	   URI:   http://www.teklibre.com/

750	   Jim Gettys
751	   Google, Inc.
752	   21 Oak Knoll Road
753	   Carlisle, MA  01741
754	   USA

756	   Email: jg@freedesktop.org
757	   URI:   https://en.wikipedia.org/wiki/Jim_Gettys
758	   Eric Dumazet
759	   Google, Inc.
760	   1600 Amphitheater Pkwy
761	   Mountain View, Ca  94043
762	   USA

764	   Email: edumazet@gmail.com