idnits 2.17.1 

draft-ietf-tsvwg-aqm-dualq-coupled-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 10, 2017) is 2542 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '0' on line 926

  -- Looks like a reference, but probably isn't: '1' on line 926

  == Outdated reference: A later version (-07) exists of
     draft-ietf-tcpm-cubic-04

  == Outdated reference: A later version (-10) exists of
     draft-ietf-tcpm-dctcp-05

  == Outdated reference: A later version (-08) exists of
     draft-ietf-tsvwg-ecn-experimentation-00

  == Outdated reference: A later version (-29) exists of
     draft-ietf-tsvwg-ecn-l4s-id-00

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-00

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Transport Area working group (tsvwg)                      K. De Schepper
3	Internet-Draft                                           Nokia Bell Labs
4	Intended status: Experimental                            B. Briscoe, Ed.
5	Expires: November 11, 2017                                 O. Bondarenko
6	                                                     Simula Research Lab
7	                                                                I. Tsang
8	                                                         Nokia Bell Labs
9	                                                            May 10, 2017

11	  DualQ Coupled AQM for Low Latency, Low Loss and Scalable Throughput
12	                 draft-ietf-tsvwg-aqm-dualq-coupled-00

14	Abstract

16	   Data Centre TCP (DCTCP) was designed to provide predictably low
17	   queuing latency, near-zero loss, and throughput scalability using
18	   explicit congestion notification (ECN) and an extremely simple
19	   marking behaviour on switches.  However, DCTCP does not co-exist with
20	   existing TCP traffic---throughput starves.  So, until now, DCTCP
21	   could only be deployed where a clean-slate environment could be
22	   arranged, such as in private data centres.  This specification
23	   defines `DualQ Coupled Active Queue Management (AQM)' to allow
24	   scalable congestion controls like DCTCP to safely co-exist with
25	   classic Internet traffic.  The Coupled AQM ensures that a flow runs
26	   at about the same rate whether it uses DCTCP or TCP Reno/Cubic, but
27	   without inspecting transport layer flow identifiers.  When tested in
28	   a residential broadband setting, DCTCP achieved sub-millisecond
29	   average queuing delay and zero congestion loss under a wide range of
30	   mixes of DCTCP and `Classic' broadband Internet traffic, without
31	   compromising the performance of the Classic traffic.  The solution
32	   also reduces network complexity and eliminates network configuration.

34	Status of This Memo

36	   This Internet-Draft is submitted in full conformance with the
37	   provisions of BCP 78 and BCP 79.

39	   Internet-Drafts are working documents of the Internet Engineering
40	   Task Force (IETF).  Note that other groups may also distribute
41	   working documents as Internet-Drafts.  The list of current Internet-
42	   Drafts is at http://datatracker.ietf.org/drafts/current/.

44	   Internet-Drafts are draft documents valid for a maximum of six months
45	   and may be updated, replaced, or obsoleted by other documents at any
46	   time.  It is inappropriate to use Internet-Drafts as reference
47	   material or to cite them other than as "work in progress."
48	   This Internet-Draft will expire on November 11, 2017.

50	Copyright Notice

52	   Copyright (c) 2017 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (http://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

67	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
68	     1.1.  Problem and Scope . . . . . . . . . . . . . . . . . . . .   2
69	     1.2.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   5
70	     1.3.  Features  . . . . . . . . . . . . . . . . . . . . . . . .   5
71	   2.  DualQ Coupled AQM Algorithm . . . . . . . . . . . . . . . . .   6
72	     2.1.  Coupled AQM . . . . . . . . . . . . . . . . . . . . . . .   7
73	     2.2.  Dual Queue  . . . . . . . . . . . . . . . . . . . . . . .   8
74	     2.3.  Traffic Classification  . . . . . . . . . . . . . . . . .   8
75	     2.4.  Normative Requirements  . . . . . . . . . . . . . . . . .   8
76	   3.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
77	   4.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
78	     4.1.  Overload Handling . . . . . . . . . . . . . . . . . . . .  10
79	   5.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  11
80	   6.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  11
81	     6.1.  Normative References  . . . . . . . . . . . . . . . . . .  11
82	     6.2.  Informative References  . . . . . . . . . . . . . . . . .  11
83	   Appendix A.  Example DualQ Coupled PI2 Algorithm  . . . . . . . .  14
84	   Appendix B.  Example DualQ Coupled Curvy RED Algorithm  . . . . .  17
85	   Appendix C.  Guidance on Controlling Throughput Equivalence . . .  23
86	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  24

88	1.  Introduction

90	1.1.  Problem and Scope

92	   Latency is becoming the critical performance factor for many (most?)
93	   applications on the public Internet, e.g. interactive Web, Web
94	   services, voice, conversational video, interactive video, interactive
95	   remote presence, instant messaging, online gaming, remote desktop,
96	   cloud-based applications, and video-assisted remote control of
97	   machinery and industrial processes.  In the developed world, further
98	   increases in access network bit-rate offer diminishing returns,
99	   whereas latency is still a multi-faceted problem.  In the last decade
100	   or so, much has been done to reduce propagation time by placing
101	   caches or servers closer to users.  However, queuing remains a major
102	   component of latency.

104	   The Diffserv architecture provides Expedited Forwarding [RFC3246], so
105	   that low latency traffic can jump the queue of other traffic.
106	   However, on access links dedicated to individual sites (homes, small
107	   enterprises or mobile devices), often all traffic at any one time
108	   will be latency-sensitive.  Then Diffserv is of little use.  Instead,
109	   we need to remove the causes of any unnecessary delay.

111	   The bufferbloat project has shown that excessively-large buffering
112	   (`bufferbloat') has been introducing significantly more delay than
113	   the underlying propagation time.  These delays appear only
114	   intermittently--only when a capacity-seeking (e.g.  TCP) flow is long
115	   enough for the queue to fill the buffer, making every packet in other
116	   flows sharing the buffer sit through the queue.

118	   Active queue management (AQM) was originally developed to solve this
119	   problem (and others).  Unlike Diffserv, which gives low latency to
120	   some traffic at the expense of others, AQM controls latency for _all_
121	   traffic in a class.  In general, AQMs introduce an increasing level
122	   of discard from the buffer the longer the queue persists above a
123	   shallow threshold.  This gives sufficient signals to capacity-seeking
124	   (aka. greedy) flows to keep the buffer empty for its intended
125	   purpose: absorbing bursts.  However, RED [RFC2309] and other
126	   algorithms from the 1990s were sensitive to their configuration and
127	   hard to set correctly.  So, AQM was not widely deployed.

129	   More recent state-of-the-art AQMs, e.g.
130	   fq_CoDel [I-D.ietf-aqm-fq-codel], PIE [I-D.ietf-aqm-pie], Adaptive
131	   RED [ARED01], are easier to configure, because they define the
132	   queuing threshold in time not bytes, so it is invariant for different
133	   link rates.  However, no matter how good the AQM, the sawtoothing
134	   rate of TCP will either cause queuing delay to vary or cause the link
135	   to be under-utilized.  Even with a perfectly tuned AQM, the
136	   additional queuing delay will be of the same order as the underlying
137	   speed-of-light delay across the network.  Flow-queuing can isolate
138	   one flow from another, but it cannot isolate a TCP flow from the
139	   delay variations it inflicts on itself, and it has other problems -
140	   it overrides the flow rate decisions of variable rate video
141	   applications, it does not recognise the flows within IPSec VPN
142	   tunnels and it is relatively expensive to implement.

144	   It seems that further changes to the network alone will now yield
145	   diminishing returns.  Data Centre TCP (DCTCP [I-D.ietf-tcpm-dctcp])
146	   teaches us that a small but radical change to TCP is needed to cut
147	   two major outstanding causes of queuing delay variability:

149	   1.  the `sawtooth' varying rate of TCP itself;

151	   2.  the smoothing delay deliberately introduced into AQMs to permit
152	       bursts without triggering losses.

154	   The former causes a flow's round trip time (RTT) to vary from about 1
155	   to 2 times the base RTT between the machines in question.  The latter
156	   delays the system's response to change by a worst-case
157	   (transcontinental) RTT, which could be hundreds of times the actual
158	   RTT of typical traffic from localized CDNs.

160	   Latency is not our only concern:

162	   3.  It was known when TCP was first developed that it would not scale
163	       to high bandwidth-delay products.

165	   Given regular broadband bit-rates over WAN distances are
166	   already [RFC3649] beyond the scaling range of `classic' TCP Reno,
167	   `less unscalable' Cubic [I-D.ietf-tcpm-cubic] and
168	   Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been
169	   successfully deployed.  However, these are now approaching their
170	   scaling limits.  Unfortunately, fully scalable TCPs such as DCTCP
171	   cause `classic' TCP to starve itself, which is why they have been
172	   confined to private data centres or research testbeds (until now).

174	   This document specifies a `DualQ Coupled AQM' extension that solves
175	   the problem of coexistence between scalable and classic flows,
176	   without having to inspect flow identifiers.  The AQM is not like
177	   flow-queuing approaches [I-D.ietf-aqm-fq-codel] that classify packets
178	   by flow identifier into numerous separate queues in order to isolate
179	   sparse flows from the higher latency in the queues assigned to
180	   heavier flow.  In contrast, the AQM exploits the behaviour of
181	   scalable congestion controls like DCTCP so that every packet in every
182	   flow sharing the queue for DCTCP-like traffic can be served with very
183	   low latency.

185	   This AQM extension can be combined with any single qeueu AQM that
186	   generates a statistical or deterministic mark/drop probability driven
187	   by the queue dynamics.  In many cases it simplifies the basic control
188	   algorithm, and requires little extra processing.  Therefore it is
189	   believed the Coupled AQM would be applicable and easy to deploy in
190	   all types of buffers; buffers in cost-reduced mass-market residential
191	   equipment; buffers in end-system stacks; buffers in carrier-scale
192	   equipment including remote access servers, routers, firewalls and
193	   Ethernet switches; buffers in network interface cards, buffers in
194	   virtualized network appliances, hypervisors, and so on.

196	   The overall L4S architecture is described in
197	   [I-D.ietf-tsvwg-l4s-arch].  The supporting papers [PI216] and
198	   [DCttH15] give the full rationale for the AQM's design, both
199	   discursively and in more precise mathematical form.

201	1.2.  Terminology

203	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
204	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
205	   document are to be interpreted as described in [RFC2119].  In this
206	   document, these words will appear with that interpretation only when
207	   in ALL CAPS.  Lower case uses of these words are not to be
208	   interpreted as carrying RFC-2119 significance.

210	   The DualQ Coupled AQM uses two queues for two services.  Each of the
211	   following terms identifies both the service and the queue that
212	   provides the service:

214	   Classic (denoted by subscript C):  The `Classic' service is intended
215	      for all the behaviours that currently co-exist with TCP Reno (TCP
216	      Cubic, Compound, SCTP, etc).

218	   Low-Latency, Low-Loss and Scalable (L4S, denoted by subscript L):
219	      The `L4S' service is intended for a set of congestion controls
220	      with scalable properties such as DCTCP (e.g.
221	      Relentless [Mathis09]).

223	   Either service can cope with a proportion of unresponsive or less-
224	   responsive traffic as well (e.g.  DNS, VoIP, etc), just as a single
225	   queue AQM can.  The DualQ Coupled AQM behaviour is similar to a
226	   single FIFO queue with respect to unresponsive and overload traffic.

228	1.3.  Features

230	   The AQM couples marking and/or dropping across the two queues such
231	   that a flow will get roughly the same throughput whichever it uses.
232	   Therefore both queues can feed into the full capacity of a link and
233	   no rates need to be configured for the queues.  The L4S queue enables
234	   scalable congestion controls like DCTCP to give stunningly low and
235	   predictably low latency, without compromising the performance of
236	   competing 'Classic' Internet traffic.  Thousands of tests have been
237	   conducted in a typical fixed residential broadband setting.  Typical
238	   experiments used base round trip delays up to 100ms between the data
239	   centre and home network, and large amounts of background traffic in
240	   both queues.  For every L4S packet, the AQM kept the average queuing
241	   delay below 1ms (or 2 packets if serialization delay is bigger for
242	   slow links), and no losses at all were introduced by the AQM.
243	   Details of the extensive experiments will be made available [PI216]
244	   [DCttH15].

246	   Subjective testing was also conducted using a demanding panoramic
247	   interactive video application run over a stack with DCTCP enabled and
248	   deployed on the testbed.  Each user could pan or zoom their own high
249	   definition (HD) sub-window of a larger video scene from a football
250	   match.  Even though the user was also downloading large amounts of
251	   L4S and Classic data, latency was so low that the picture appeared to
252	   stick to their finger on the touchpad (all the L4S data achieved the
253	   same ultra-low latency).  With an alternative AQM, the video
254	   noticeably lagged behind the finger gestures.

256	   Unlike Diffserv Expedited Forwarding, the L4S queue does not have to
257	   be limited to a small proportion of the link capacity in order to
258	   achieve low delay.  The L4S queue can be filled with a heavy load of
259	   capacity-seeking flows like DCTCP and still achieve low delay.  The
260	   L4S queue does not rely on the presence of other traffic in the
261	   Classic queue that can be 'overtaken'.  It gives low latency to L4S
262	   traffic whether or not there is Classic traffic, and the latency of
263	   Classic traffic does not suffer when a proportion of the traffic is
264	   L4S.  The two queues are only necessary because DCTCP-like flows
265	   cannot keep latency predictably low and keep utilization high if they
266	   are mixed with legacy TCP flows,

268	   The experiments used the Linux implementation of DCTCP that is
269	   deployed in private data centres, without any modification despite
270	   its known deficiencies.  Nonetheless, certain modifications will be
271	   necessary before DCTCP is safe to use on the Internet, which are
272	   recorded in Appendix A of [I-D.ietf-tsvwg-ecn-l4s-id].  However, the
273	   focus of this specification is to get the network service in place.
274	   Then, without any management intervention, applications can exploit
275	   it by migrating to scalable controls like DCTCP, which can then
276	   evolve _while_ their benefits are being enjoyed by everyone on the
277	   Internet.

279	2.  DualQ Coupled AQM Algorithm

281	   There are two main aspects to the algorithm:

283	   o  the Coupled AQM that addresses throughput equivalence between
284	      Classic (e.g.  Reno, Cubic) flows and L4S (e.g.  DCTCP) flows

286	   o  the Dual Queue structure that provides latency separation for L4S
287	      flows to isolate them from the typically large Classic queue.

289	2.1.  Coupled AQM

291	   In the 1990s, the `TCP formula' was derived for the relationship
292	   between TCP's congestion window, cwnd, and its drop probability, p.
293	   To a first order approximation, cwnd of TCP Reno is inversely
294	   proportional to the square root of p.  TCP Cubic implements a Reno-
295	   compatibility mode, which is the only relevant mode for typical RTTs
296	   under 20ms, while the throughput of a single flow is less than about
297	   500Mb/s.  Therefore we can assume that Cubic traffic behaves similar
298	   to Reno (but with a slightly different constant of proportionality),
299	   and we shall use the term 'Classic' for the collection of Reno and
300	   Cubic in Reno mode.

302	   In our supporting paper [PI216], we derive the equivalent rate
303	   equation for DCTCP, for which cwnd is inversely proportional to p
304	   (not the square root), where in this case p is the ECN marking
305	   probability.  DCTCP is not the only congestion control that behaves
306	   like this, so we use the term 'L4S' traffic for all similar
307	   behaviour.

309	   In order to make a DCTCP flow run at roughly the same rate as a Reno
310	   TCP flow (all other factors being equal), we make the drop or marking
311	   probability for Classic traffic, p_C distinct from the marking
312	   probability for L4S traffic, p_L (in contrast to RFC3168 which
313	   requires them to be the same).  We make the Classic drop probability
314	   p_C proportional to the square of the L4S marking probability p_L.
315	   This is because we need to make the Reno flow rate equal the DCTCP
316	   flow rate, so we have to square the square root of p_C in the Reno
317	   rate equation to make it the same as the straight p_L in the DCTCP
318	   rate equation.

320	   There is a really simple way to implement the square of a probability
321	   - by testing the queue against two random numbers not one.  This is
322	   the approach adopted in Appendix A and Appendix B.

324	   Stating this as a formula, the relation between Classic drop
325	   probability, p_C, and L4S marking probability, p_L needs to take the
326	   form:

328	       p_C = ( p_L / k )^2                  (1)

330	   where k is the constant of proportionality.  Optionally, k can be
331	   expressed as a power of 2, so k=2^k', where k' is another constant.
332	   Then implementations can avoid costly division by shifting p_L by k'
333	   bits to the right.

335	2.2.  Dual Queue

337	   Classic traffic builds a large queue, so a separate queue is provided
338	   for L4S traffic, and it is scheduled with strict priority.
339	   Nonetheless, coupled marking ensures that giving priority to L4S
340	   traffic still leaves the right amount of spare scheduling time for
341	   Classic flows to each get equivalent throughput to DCTCP flows (all
342	   other factors such as RTT being equal).  The algorithm achieves this
343	   without having to inspect flow identifiers.

345	2.3.  Traffic Classification

347	   Both the Coupled AQM and DualQ mechanisms need an identifier to
348	   distinguish L4S and C packets.  A separate draft
349	   [I-D.ietf-tsvwg-ecn-l4s-id] recommends using the ECT(1) codepoint of
350	   the ECN field as this identifier, having assessed various
351	   alternatives.

353	   Given L4S work is currently on the experimental track, but the
354	   definition of the ECN field is on the standards track [RFC3168],
355	   another standards track document has proved necessary to make the
356	   ECT(1) codepoint available for experimentation
357	   [I-D.ietf-tsvwg-ecn-experimentation].

359	2.4.  Normative Requirements

361	   In the Dual Queue, L4S packets MUST be given priority over Classic,
362	   although strict priority MAY not be appropriate.

364	   All L4S traffic MUST be ECN-capable, although some Classic traffic
365	   MAY also be ECN-capable.

367	   Whatever identifier is used for L4S traffic, it will still be
368	   necessary to agree on the meaning of an ECN marking on L4S traffic,
369	   relative to a drop of Classic traffic.  In order to prevent
370	   starvation of Classic traffic by scalable L4S traffic (e.g.  DCTCP)
371	   the drop probability of Classic traffic MUST be proportional to the
372	   square of the marking probability of L4S traffic, In other words, the
373	   power to which p_L is raised in Eqn. (1) MUST be 2.

375	   The constant of proportionality, k, in Eqn (1) determines the
376	   relative flow rates of Classic and L4S flows when the AQM concerned
377	   is the bottleneck (all other factors being equal). k does not have to
378	   be standardized because differences do not prevent interoperability.
379	   However, k has to take some value, and each operator can make that
380	   choice.

382	   A value of k=2 is currently RECOMMENDED as the default for Internet
383	   access networks.  Assuming scalable congestion controls for the
384	   Internet will be as aggressive as DCTCP, this will ensure their
385	   congestion window will be roughly the same as that of a standards
386	   track TCP congestion control (Reno) [RFC5681] and other so-called
387	   TCP-friendly controls such as TCP Cubic in its TCP-friendly mode.

389	   The requirements for scalable congestion controls on the Internet
390	   (termed the TCP Prague requirements) [I-D.ietf-tsvwg-ecn-l4s-id] are
391	   not necessarily final.  If the aggressiveness of DCTCP is not defined
392	   as the benchmark for scalable controls on the Internet, the
393	   recommended value of k will also be subject to change.

395	   Whatever value is recommended, the choice of k is a matter of
396	   operator policy, and operators MAY choose a different value using
397	   Table 1 and the guidelines in Appendix C.

399	   Typically, access network operators isolate customers from each other
400	   with some form of layer-2 multiplexing (TDM in DOCSIS, CDMA in 3G) or
401	   L3 scheduling (WRR in broadband), rather than relying on TCP to share
402	   capacity between customers [RFC0970].  In such cases, the choice of k
403	   will solely affect relative flow rates within each customer's access
404	   capacity, not between customers.  Also, k will not affect relative
405	   flow rates at any times when all flows are Classic or all L4S, and it
406	   will not affect small flows.

408	   Example DualQ Coupled AQM algorithms called PI2 and Curvy RED are
409	   given in Appendix A and Appendix B.  Either example AQM can be used
410	   to couple packet marking and dropping across a dual Q.  Curvy RED
411	   requires less operations per packet than RED and can be used if the
412	   range of RTTs is limited.  PI2 is a simplification of PIE with stable
413	   Proportional-Integral control for both Classic and L4S congestion
414	   controls.  Nonetheless, it would be possible to control the queues
415	   with other alternative AQMs, as long as the above normative
416	   requirements (those expressed in capitals) are observed, which are
417	   intended to be independent of the specific AQM.

419	   {ToDo: Add management and monitoring requirements}

421	3.  IANA Considerations

423	   This specification contains no IANA considerations.

425	4.  Security Considerations
426	4.1.  Overload Handling

428	   Where the interests of users or flows might conflict, it could be
429	   necessary to police traffic to isolate any harm to performance.  This
430	   is a policy issue that needs to be separable from a basic AQM, but an
431	   AQM does need to handle overload.  A trade-off needs to be made
432	   between complexity and the risk of either class harming the other.
433	   It is an operator policy to define what must happen if the service
434	   time of the classic queue becomes too great.  In the following
435	   subsections three optional non-exclusive overload protections are
436	   defined.  Their objective is for the overload behaviour of the DualQ
437	   AQM to be similar to a single queue AQM.  The example implementation
438	   in Appendix A implements the 'delay on overload' policy.  Other
439	   overload protections can be envisaged:

441	   Minimum throughput service:   By replacing the priority scheduler
442	      with a weighted round robin scheduler, a minimum throughput
443	      service can be guaranteed for Classic traffic.  Typically the
444	      scheduling weight of the Classic queue will be small (e.g. 5%) to
445	      avoid interference with the coupling but big enough to avoid
446	      complete starvation of Classic traffic.

448	   Delay on overload:  To control milder overload of responsive traffic,
449	      particularly when close to the maximum congestion signal, delay
450	      can be used as an alternative congestion control mechanism.  The
451	      Dual Queue Coupled AQM can be made to behave like a single First-
452	      In First-Out (FIFO) queue with different service times by
453	      replacing the priority scheduler with a very simple scheduler that
454	      could be called a "time-shifted FIFO", which is the same as the
455	      Modifier Earliest Deadline First (MEDF) scheduler of [MEDF].  The
456	      scheduler adds T_m to the queue delay of the next L4S packet,
457	      before comparing it with the queue delay of the next Classic
458	      packet, then it selects the packet with the greater adjusted queue
459	      delay.  Under regular conditions, this time-shifted FIFO scheduler
460	      behaves just like a strict priority scheduler.  But under moderate
461	      or high overload it prevents starvation of the Classic queue,
462	      because the time-shift defines the maximum extra queuing delay
463	      (T_m) of Classic packets relative to L4S.

465	   Drop on overload:  On severe overload, e.g. due to non responsive
466	      traffic, queues will typically overflow and packet drop will be
467	      unavoidable.  It is important to avoid unresponsive ECN traffic
468	      (either Classic or L4S) driving the AQM to 100% drop and mark
469	      probability.  Congestion controls that have a minimum congestion
470	      window will become unresponsive to ECN marking when the marking
471	      probability is high.  This situation can be avoided by applying
472	      the drop probability to all packets of all traffic types when it
473	      exceeds a certain threshold or by limiting the drop and marking
474	      probabilities to a lower maximum value (up to where fairnes
475	      between the different traffic types is still guaranteed) and rely
476	      on delay to control temporary high congestion and eventually queue
477	      overflow.  If the classic drop probability is applied to all types
478	      of traffic when it is higher than a threshold probability the
479	      queueing delay can be controlled up to any overload situation, and
480	      no further measures are required.  If a maximum classic and
481	      coupled L4S probability of less than 100% is used, both queues
482	      need scheduling opportunities and should eventually experience
483	      drop.  This can be achieved with a scheduler that guarantees a
484	      minimum throughput for each queue, such as a weighted round robin
485	      or time-shifted FIFO scheduler.  In that case a common queue limit
486	      can be configured that will drop packets of both types of traffic.

488	   To keep the throughput of both L4S and Classic flows equal over the
489	   full load range, a different control strategy needs to be defined
490	   above the point where one congestion control first saturates to a
491	   probability of 100% (if k>1, L4S will saturate first).  Possible
492	   strategies include: also dropping L4S; increasing the queueing delay
493	   for both; or ensuring that L4S traffic still responds to marking
494	   below a window of 2 segments (see [I-D.ietf-tsvwg-ecn-l4s-id]).

496	5.  Acknowledgements

498	   Thanks to Anil Agarwal for detailed review comments and suggestions
499	   on how to make our explanation clearer.

501	   The authors' contributions are part-funded by the European Community
502	   under its Seventh Framework Programme through the Reducing Internet
503	   Transport Latency (RITE) project (ICT-317700).  The views expressed
504	   here are solely those of the authors.

506	6.  References

508	6.1.  Normative References

510	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
511	              Requirement Levels", BCP 14, RFC 2119,
512	              DOI 10.17487/RFC2119, March 1997,
513	              <http://www.rfc-editor.org/info/rfc2119>.

515	6.2.  Informative References

517	   [ARED01]   Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An
518	              Algorithm for Increasing the Robustness of RED's Active
519	              Queue Management", ACIRI Technical Report , August 2001,
520	              <http://www.icir.org/floyd/red.html>.

522	   [CoDel]    Nichols, K. and V. Jacobson, "Controlling Queue Delay",
523	              ACM Queue 10(5), May 2012,
524	              <http://queue.acm.org/issuedetail.cfm?issue=2208917>.

526	   [CRED_Insights]
527	              Briscoe, B., "Insights from Curvy RED (Random Early
528	              Detection)", BT Technical Report TR-TUB8-2015-003, July
529	              2015,
530	              <http://www.bobbriscoe.net/projects/latency/credi_tr.pdf>.

532	   [DCttH15]  De Schepper, K., Bondarenko, O., Briscoe, B., and I.
533	              Tsang, "`Data Centre to the Home': Ultra-Low Latency for
534	              All", 2015, <http://www.bobbriscoe.net/projects/latency/
535	              dctth_preprint.pdf>.

537	              (Under submission)

539	   [I-D.ietf-aqm-fq-codel]
540	              Hoeiland-Joergensen, T., McKenney, P.,
541	              dave.taht@gmail.com, d., Gettys, J., and E. Dumazet, "The
542	              FlowQueue-CoDel Packet Scheduler and Active Queue
543	              Management Algorithm", draft-ietf-aqm-fq-codel-06 (work in
544	              progress), March 2016.

546	   [I-D.ietf-aqm-pie]
547	              Pan, R., Natarajan, P., Baker, F., and G. White, "PIE: A
548	              Lightweight Control Scheme To Address the Bufferbloat
549	              Problem", draft-ietf-aqm-pie-10 (work in progress),
550	              September 2016.

552	   [I-D.ietf-tcpm-cubic]
553	              Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
554	              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
555	              draft-ietf-tcpm-cubic-04 (work in progress), February
556	              2017.

558	   [I-D.ietf-tcpm-dctcp]
559	              Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
560	              and G. Judd, "Datacenter TCP (DCTCP): TCP Congestion
561	              Control for Datacenters", draft-ietf-tcpm-dctcp-05 (work
562	              in progress), March 2017.

564	   [I-D.ietf-tsvwg-ecn-experimentation]
565	              Black, D., "Explicit Congestion Notification (ECN)
566	              Experimentation", draft-ietf-tsvwg-ecn-experimentation-00
567	              (work in progress), November 2016.

569	   [I-D.ietf-tsvwg-ecn-l4s-id]
570	              Schepper, K., Briscoe, B., and I. Tsang, "Identifying
571	              Modified Explicit Congestion Notification (ECN) Semantics
572	              for Ultra-Low Queuing Delay", draft-ietf-tsvwg-ecn-l4s-
573	              id-00 (work in progress), November 2016.

575	   [I-D.ietf-tsvwg-l4s-arch]
576	              Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
577	              Low Loss, Scalable Throughput (L4S) Internet Service:
578	              Architecture", draft-ietf-tsvwg-l4s-arch-00 (work in
579	              progress), November 2016.

581	   [I-D.sridharan-tcpm-ctcp]
582	              Sridharan, M., Tan, K., Bansal, D., and D. Thaler,
583	              "Compound TCP: A New TCP Congestion Control for High-Speed
584	              and Long Distance Networks", draft-sridharan-tcpm-ctcp-02
585	              (work in progress), November 2008.

587	   [Mathis09]
588	              Mathis, M., "Relentless Congestion Control", PFLDNeT'09 ,
589	              May 2009, <http://www.hpcc.jp/pfldnet2009/
590	              Program_files/1569198525.pdf>.

592	   [MEDF]     Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a
593	              simple scheduling algorithm for two real-time transport
594	              service classes with application in the UTRAN", Proc. IEEE
595	              Conference on Computer Communications (INFOCOM'03) Vol.2
596	              pp.1116-1122, March 2003.

598	   [PI216]    De Schepper, K., Bondarenko, O., Briscoe, B., and I.
599	              Tsang, "PI2: A Linearized AQM for both Classic and
600	              Scalable TCP", ACM CoNEXT'16 , December 2016,
601	              <https://riteproject.files.wordpress.com/2015/10/
602	              pi2_conext.pdf>.

604	              (To appear)

606	   [RFC0970]  Nagle, J., "On Packet Switches With Infinite Storage",
607	              RFC 970, DOI 10.17487/RFC0970, December 1985,
608	              <http://www.rfc-editor.org/info/rfc970>.

610	   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
611	              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
612	              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
613	              S., Wroclawski, J., and L. Zhang, "Recommendations on
614	              Queue Management and Congestion Avoidance in the
615	              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
616	              <http://www.rfc-editor.org/info/rfc2309>.

618	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
619	              of Explicit Congestion Notification (ECN) to IP",
620	              RFC 3168, DOI 10.17487/RFC3168, September 2001,
621	              <http://www.rfc-editor.org/info/rfc3168>.

623	   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
624	              J., Courtney, W., Davari, S., Firoiu, V., and D.
625	              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
626	              Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002,
627	              <http://www.rfc-editor.org/info/rfc3246>.

629	   [RFC3649]  Floyd, S., "HighSpeed TCP for Large Congestion Windows",
630	              RFC 3649, DOI 10.17487/RFC3649, December 2003,
631	              <http://www.rfc-editor.org/info/rfc3649>.

633	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
634	              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
635	              <http://www.rfc-editor.org/info/rfc5681>.

637	Appendix A.  Example DualQ Coupled PI2 Algorithm

639	   As a first concrete example, the pseudocode below gives the DualQ
640	   Coupled AQM algorithm based on the PI2 Classic AQM, we used and
641	   tested.  For this example only the pseudo code is given.  An open
642	   source implementation for Linux is available at:
643	   https://github.com/olgabo/dualpi2.

645	  1:  dualpi2_enqueue(lq, cq, pkt) {  % Test limit and classify lq or cq
646	  2:    stamp(pkt)                       % attach arrival time to packet
647	  3:    if ( lq.len() + cq.len() > limit )
648	  4:      drop(pkt)                          % drop packet if q is full
649	  5:    else {
650	  6:      if ( ecn(pkt) modulo 2 == 0 )   % ECN bits = not-ect or ect(0)
651	  7:        cq.enqueue(pkt)
652	  8:      else                                 % ECN bits = ect(1) or ce
653	  9:        lq.enqueue(pkt)
654	  10:   }
655	  11: }

657	      Figure 1: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM

659	  1:  dualpi2_dequeue(lq, cq) {  % Couples L4S & Classic queues, lq & cq
660	  2:    while ( lq.len() + cq.len() > 0 )
661	  3:      if ( lq.time() + tshift >= cq.time() ) {
662	  4:        lq.dequeue(pkt)
663	  5:        if ( (pkt.time() > T) or (p * k > rand()) )  % coupling here
664	  6:          mark(pkt)
665	  7:        return(pkt)                % return the packet and stop here
666	  8:      } else {
667	  9:        cq.dequeue(pkt)
668	  10:       if ( p > max(rand(), rand()) ) % only square part of (p/k)^2
669	  11:         if ( ecn(pkt) == 0 )                 % ECN field = not-ect
670	  12:           drop(pkt)                      % squared drop, redo loop
671	  13:         else {
672	  14:           mark(pkt)                                 % squared mark
673	  15:           return(pkt)            % return the packet and stop here
674	  16:         }
675	  17:       else
676	  18:         return(pkt)              % return the packet and stop here
677	  19:     }
678	  20:   }
679	  21:   return(NULL)                              % no packet to dequeue
680	  22: }

682	      Figure 2: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM

684	  1:  dualpi2_update(lq, cq) {                  % Update p every Tupdate
685	  2:    curq = cq.time()   % use queuing time of first-in Classic packet
686	  3:    alpha_U = alpha * Tupdate    % done once when parameters are set
687	  4:    beta_U = beta * Tupdate      % done once when parameters are set
688	  5:    p = p + alpha_U * (curq - target) + beta_U * (curq - prevq)
689	  6:    prevq = curq
690	  7:  }

692	     Figure 3: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM

694	   When packets arrive, first a common queue limit is checked as shown
695	   in line 3 of the enqueuing pseudocode in Figure 1.  Note that the
696	   limit is deliberately tested before enqueue to avoid any bias against
697	   larger packets (so the actual buffer has to be one packet larger than
698	   limit).  If limit is not exceeded, the packet will be classified and
699	   enqueued to the Classic or L4S queue dependent on the least
700	   significant bit of the ECN field in the IP header (line 6).  Packets
701	   with a codepoint having an LSB of 0 (Not-ECT and ECT(0)) will be
702	   enqueued in the Classic queue.  Otherwise, ECT(1) and CE packets will
703	   be enqueued in the L4S queue.

705	   The pseudocode in Figure 2 summarises the per packet dequeue
706	   implementation of the DualPI2 code.  Line 3 implements the time-
707	   shifted FIFO scheduling.  It takes the packet that waited the
708	   longest, biased by a time-shift of tshift for the Classic traffic.
709	   If an L4S packet is scheduled, lines 5 and 6 mark the packet if
710	   either the L4S threshold T is exceeded, or if a random marking
711	   decision is drawn according to k times the probability p (maintained
712	   by the dualpi2_update() function discussed below).  The coupling
713	   factor is applied here for determining the L4S marking probability so
714	   that Classic TCP control is independent from the L4S coupling factor.
715	   If a Classic packet is scheduled, lines 10 to 16 drop or mark the
716	   packet based on 2 random decisions resulting in the squared
717	   probability p^2 (hence the name PI2 for Classic traffic).  Note that
718	   p is not reduced here by the factor k, as p has already been
719	   multiplied by the factor k when it was used to mark the L4S traffic.
720	   The coupling factor gives Classic TCP and DCTCP traffic equal
721	   throughput; Because L4S marking is factored up by k, the dynamic gain
722	   parameters alpha and beta also have to be factored up by k for the
723	   L4S queue, which is necessary to ensure that Classic TCP and DCTCP
724	   controls have the same stability.

726	   The probability p is kept up to date by the core PI algorithm in
727	   Figure 3 which is executed every Tupdate ([I-D.ietf-aqm-pie] now
728	   recommends 16ms, but in our testing so far we have used the earlier
729	   recommendation of 32ms).  Note that p solely depends on the queuing
730	   time in the Classic queue.  In line 2, the current queuing delay is
731	   evaluated by inspecting the timestamp of the next packet to schedule
732	   in the Classic queue.  The function cq.time() subtracts the time
733	   stamped at enqueue from the current time and implicitly takes the
734	   current queuing delay as 0 if the queue is empty.  Line 3 and 4 only
735	   need to be executed when the configuration parameters are changed.
736	   Alpha and beta in Hz are gain factors per 1 second.  If a briefer
737	   update time is configured, alpha_U and beta_U (_U = per Tupdate) also
738	   have to be reduced, to ensure that the same response is given over
739	   time.  As such, a smaller Tupdate will only result in a response with
740	   smaller and finer steps, not a more aggressive response.  The new
741	   probability is calculated in line 5, where target is the target
742	   queuing delay, as defined in [I-D.ietf-aqm-pie].  In corner cases, p
743	   can overflow the range [0,1] so the resulting value of p has to be
744	   bounded (omitted from the pseudocode).  Unlike PIE, alpha_U and
745	   beta_U are not tuned dependent on p, every Tupdate.  Instead, in PI2
746	   alpha_U and beta_U can be constants because the squaring applied to
747	   Classic traffic tunes them inherently, as explained in [PI216].

749	   In our experiments so far (building on experiments with PIE) on
750	   broadband access links ranging from 4 Mb/s to 200 Mb/s with base RTTs
751	   from 5 ms to 100 ms, PI2 achieves good results with the following
752	   parameters:

754	      tshift = 40ms
755	      T = max(1ms, serialization time of 2 MTU)

757	      target = 20ms

759	      Tupdate = 32ms

761	      k = 2

763	      alpha = 10Hz (alpha*k = 20Hz for L4S)

765	      beta = 100Hz (beta*k = 200Hz for L4S)

767	Appendix B.  Example DualQ Coupled Curvy RED Algorithm

769	   As another example, the pseudocode below gives the Curvy RED based
770	   DualQ Coupled AQM algorithm we used and tested.  Although we designed
771	   the AQM to be efficient in integer arithmetic, to aid understanding
772	   it is first given using real-number arithmetic.  Then, one possible
773	   optimization for integer arithmetic is given, also in pseudocode.  To
774	   aid comparison, the line numbers are kept in step between the two by
775	   using letter suffixes where the longer code needs extra lines.

777	   1:  dualq_dequeue(lq, cq) {  % Couples L4S & Classic queues, lq & cq
778	   2:    if ( lq.dequeue(pkt) ) {
779	   3a:     p_L = cq.sec() / 2^S_L
780	   3b:     if ( lq.byt() > T )
781	   3c:       mark(pkt)
782	   3d:     elif ( p_L > maxrand(U) )
783	   4:        mark(pkt)
784	   5:      return(pkt)                % return the packet and stop here
785	   6:    }
786	   7:    while ( cq.dequeue(pkt) ) {
787	   8a:     alpha = 2^(-f_C)
788	   8b:     Q_C = alpha * pkt.sec() + (1-alpha)* Q_C    % Classic Q EWMA
789	   9a:     sqrt_p_C = Q_C / 2^S_C
790	   9b:     if ( sqrt_p_C > maxrand(2*U) )
791	   10:       drop(pkt)                        % Squared drop, redo loop
792	   11:     else
793	   12:       return(pkt)              % return the packet and stop here
794	   13:   }
795	   14:   return(NULL)                           % no packet to dequeue
796	   15: }

798	   16: maxrand(u) {                % return the max of u random numbers
799	   17:     maxr=0
800	   18:     while (u-- > 0)
801	   19:         maxr = max(maxr, rand())               % 0 <= rand() < 1
802	   20:     return(maxr)
803	   21: }

805	   Figure 4: Example Dequeue Pseudocode for DualQ Coupled Curvy RED AQM

807	   Packet classification code is not shown, as it is no different from
808	   Figure 1.  Potential classification schemes are discussed in
809	   Section 2.  Overload protection code will be included in a future
810	   draft {ToDo}.

812	   At the outer level, the structure of dualq_dequeue() implements
813	   strict priority scheduling.  The code is written assuming the AQM is
814	   applied on dequeue (Note 1) . Every time dualq_dequeue() is called,
815	   the if-block in lines 2-6 determines whether there is an L4S packet
816	   to dequeue by calling lq.dequeue(pkt), and otherwise the while-block
817	   in lines 7-13 determines whether there is a Classic packet to
818	   dequeue, by calling cq.dequeue(pkt).  (Note 2)

820	   In the lower priority Classic queue, a while loop is used so that, if
821	   the AQM determines that a classic packet should be dropped, it
822	   continues to test for classic packets deciding whether to drop each
823	   until it actually forwards one.  Thus, every call to dualq_dequeue()
824	   returns one packet if at least one is present in either queue,
825	   otherwise it returns NULL at line 14.  (Note 3)

827	   Within each queue, the decision whether to drop or mark is taken as
828	   follows (to simplify the explanation, it is assumed that U=1):

830	   L4S:  If the test at line 2 determines there is an L4S packet to
831	      dequeue, the tests at lines 3a and 3c determine whether to mark
832	      it.  The first is a simple test of whether the L4S queue (lq.byt()
833	      in bytes) is greater than a step threshold T in bytes (Note 4).
834	      The second test is similar to the random ECN marking in RED, but
835	      with the following differences: i) the marking function does not
836	      start with a plateau of zero marking until a minimum threshold,
837	      rather the marking probability starts to increase as soon as the
838	      queue is positive; ii) marking depends on queuing time, not bytes,
839	      in order to scale for any link rate without being reconfigured;
840	      iii) marking of the L4S queue does not depend on itself, it
841	      depends on the queuing time of the _other_ (Classic) queue, where
842	      cq.sec() is the queuing time of the packet at the head of the
843	      Classic queue (zero if empty); iv) marking depends on the
844	      instantaneous queuing time (of the other Classic queue), not a
845	      smoothed average; v) the queue is compared with the maximum of U
846	      random numbers (but if U=1, this is the same as the single random
847	      number used in RED).

849	      Specifically, in line 3a the marking probability p_L is set to the
850	      Classic queueing time qc.sec() in seconds divided by the L4S
851	      scaling parameter 2^S_L, which represents the queuing time (in
852	      seconds) at which marking probability would hit 100%. Then in line
853	      3d (if U=1) the result is compared with a uniformly distributed
854	      random number between 0 and 1, which ensures that marking
855	      probability will linearly increase with queueing time.  The
856	      scaling parameter is expressed as a power of 2 so that division
857	      can be implemented as a right bit-shift (>>) in line 3 of the
858	      integer variant of the pseudocode (Figure 5).

860	   Classic:  If the test at line 7 determines that there is at least one
861	      Classic packet to dequeue, the test at line 9b determines whether
862	      to drop it.  But before that, line 8b updates Q_C, which is an
863	      exponentially weighted moving average (Note 5) of the queuing time
864	      in the Classic queue, where pkt.sec() is the instantaneous
865	      queueing time of the current Classic packet and alpha is the EWMA
866	      constant for the classic queue.  In line 8a, alpha is represented
867	      as an integer power of 2, so that in line 8 of the integer code
868	      the division needed to weight the moving average can be
869	      implemented by a right bit-shift (>> f_C).

871	      Lines 9a and 9b implement the drop function.  In line 9a the
872	      averaged queuing time Q_C is divided by the Classic scaling
873	      parameter 2^S_C, in the same way that queuing time was scaled for
874	      L4S marking.  This scaled queuing time is given the variable name
875	      sqrt_p_C because it will be squared to compute Classic drop
876	      probability, so before it is squared it is effectively the square
877	      root of the drop probability.  The squaring is done by comparing
878	      it with the maximum out of two random numbers (assuming U=1).
879	      Comparing it with the maximum out of two is the same as the
880	      logical `AND' of two tests, which ensures drop probability rises
881	      with the square of queuing time (Note 6).  Again, the scaling
882	      parameter is expressed as a power of 2 so that division can be
883	      implemented as a right bit-shift in line 9 of the integer
884	      pseudocode.

886	   The marking/dropping functions in each queue (lines 3 & 9) are two
887	   cases of a new generalization of RED called Curvy RED, motivated as
888	   follows.  When we compared the performance of our AQM with fq_CoDel
889	   and PIE, we came to the conclusion that their goal of holding queuing
890	   delay to a fixed target is misguided [CRED_Insights].  As the number
891	   of flows increases, if the AQM does not allow TCP to increase queuing
892	   delay, it has to introduce abnormally high levels of loss.  Then loss
893	   rather than queuing becomes the dominant cause of delay for short
894	   flows, due to timeouts and tail losses.

896	   Curvy RED constrains delay with a softened target that allows some
897	   increase in delay as load increases.  This is achieved by increasing
898	   drop probability on a convex curve relative to queue growth (the
899	   square curve in the Classic queue, if U=1).  Like RED, the curve hugs
900	   the zero axis while the queue is shallow.  Then, as load increases,
901	   it introduces a growing barrier to higher delay.  But, unlike RED, it
902	   requires only one parameter, the scaling, not three.  The diadvantage
903	   of Curvy RED is that it is not adapted to a wide range of RTTs.
904	   Curvy RED can be used as is when the RTT range to support is limited
905	   otherwise an adaptation mechanism is required.

907	   There follows a summary listing of the two parameters used for each
908	   of the two queues:

910	   Classic:

912	      S_C :   The scaling factor of the dropping function scales Classic
913	         queuing times in the range [0, 2^(S_C)] seconds into a dropping
914	         probability in the range [0,1].  To make division efficient, it
915	         is constrained to be an integer power of two;

917	      f_C :  To smooth the queuing time of the Classic queue and make
918	         multiplication efficient, we use a negative integer power of
919	         two for the dimensionless EWMA constant, which we define as
920	         2^(-f_C).

922	   L4S :

924	      S_L (and k):   As for the Classic queue, the scaling factor of the
925	         L4S marking function scales Classic queueing times in the range
926	         [0, 2^(S_L)] seconds into a probability in the range [0,1].
927	         Note that S_L = S_C + k, where k is the coupling between the
928	         queues (Section 2.1).  So S_L and k count as only one
929	         parameter;

931	      T :  The queue size in bytes at which step threshold marking
932	         starts in the L4S queue.

934	   {ToDo: These are the raw parameters used within the algorithm.  A
935	   configuration front-end could accept more meaningful parameters and
936	   convert them into these raw parameters.}

938	   From our experiments so far, recommended values for these parameters
939	   are: S_C = -1; f_C = 5; T = 5 * MTU for the range of base RTTs
940	   typical on the public Internet.  [CRED_Insights] explains why these
941	   parameters are applicable whatever rate link this AQM implementation
942	   is deployed on and how the parameters would need to be adjusted for a
943	   scenario with a different range of RTTs (e.g. a data centre) {ToDo
944	   incorporate a summary of that report into this draft}. The setting of
945	   k depends on policy (see Section 2.4 and Appendix C respectively for
946	   its recommended setting and guidance on alternatives).

948	   There is also a cUrviness parameter, U, which is a small positive
949	   integer.  It is likely to take the same hard-coded value for all
950	   implementations, once experiments have determined a good value.  We
951	   have solely used U=1 in our experiments so far, but results might be
952	   even better with U=2 or higher.

954	   Note that the dropping function at line 9 calls maxrand(2*U), which
955	   gives twice as much curviness as the call to maxrand(U) in the
956	   marking function at line 3.  This is the trick that implements the
957	   square rule in equation (1) (Section 2.1).  This is based on the fact
958	   that, given a number X from 1 to 6, the probability that two dice
959	   throws will both be less than X is the square of the probability that
960	   one throw will be less than X.  So, when U=1, the L4S marking
961	   function is linear and the Classic dropping function is squared.  If
962	   U=2, L4S would be a square function and Classic would be quartic.
963	   And so on.

965	   The maxrand(u) function in lines 16-21 simply generates u random
966	   numbers and returns the maximum (Note 7).  Typically, maxrand(u)
967	   could be run in parallel out of band.  For instance, if U=1, the
968	   Classic queue would require the maximum of two random numbers.  So,
969	   instead of calling maxrand(2*U) in-band, the maximum of every pair of
970	   values from a pseudorandom number generator could be generated out-
971	   of-band, and held in a buffer ready for the Classic queue to consume.

973	   1:  dualq_dequeue(lq, cq) {  % Couples L4S & Classic queues, lq & cq
974	   2:     if ( lq.dequeue(pkt) ) {
975	   3:        if ((lq.byt() > T) || ((cq.ns() >> (S_L-2)) > maxrand(U)))
976	   4:           mark(pkt)
977	   5:        return(pkt)              % return the packet and stop here
978	   6:     }
979	   7:     while ( cq.dequeue(pkt) ) {
980	   8:         Q_C += (pkt.ns() - Q_C) >> f_C           % Classic Q EWMA
981	   9:        if ( (Q_C >> (S_C-2) ) > maxrand(2*U) )
982	   10:          drop(pkt)                     % Squared drop, redo loop
983	   11:       else
984	   12:          return(pkt)           % return the packet and stop here
985	   13:    }
986	   14:    return(NULL)                           % no packet to dequeue
987	   15: }

989	   Figure 5: Optimised Example Dequeue Pseudocode for Coupled DualQ AQM
990	                         using Integer Arithmetic

992	   Notes:

994	   1.  The drain rate of the queue can vary if it is scheduled relative
995	       to other queues, or to cater for fluctuations in a wireless
996	       medium.  To auto-adjust to changes in drain rate, the queue must
997	       be measured in time, not bytes or packets [CoDel].  In our Linux
998	       implementation, it was easiest to measure queuing time at
999	       dequeue.  Queuing time can be estimated when a packet is enqueued
1000	       by measuring the queue length in bytes and dividing by the recent
1001	       drain rate.

1003	   2.  An implementation has to use priority queueing, but it need not
1004	       implement strict priority.

1006	   3.  If packets can be enqueued while processing dequeue code, an
1007	       implementer might prefer to place the while loop around both
1008	       queues so that it goes back to test again whether any L4S packets
1009	       arrived while it was dropping a Classic packet.

1011	   4.  In order not to change too many factors at once, for now, we keep
1012	       the marking function for DCTCP-only traffic as similar as
1013	       possible to DCTCP.  However, unlike DCTCP, all processing is at
1014	       dequeue, so we determine whether to mark a packet at the head of
1015	       the queue by the byte-length of the queue _behind_ it.  We plan
1016	       to test whether using queuing time will work in all
1017	       circumstances, and if we find that the step can cause
1018	       oscillations, we will investigate replacing it with a steep
1019	       random marking curve.

1021	   5.  An EWMA is only one possible way to filter bursts; other more
1022	       adaptive smoothing methods could be valid and it might be
1023	       appropriate to decrease the EWMA faster than it increases.

1025	   6.  In practice at line 10 the Classic queue would probably test for
1026	       ECN capability on the packet to determine whether to drop or mark
1027	       the packet.  However, for brevity such detail is omitted.  All
1028	       packets classified into the L4S queue have to be ECN-capable, so
1029	       no dropping logic is necessary at line 3.  Nonetheless, L4S
1030	       packets could be dropped by overload code (see Section 4.1).

1032	   7.  In the integer variant of the pseudocode (Figure 5) real numbers
1033	       are all represented as integers scaled up by 2^32.  In lines 3 &
1034	       9 the function maxrand() is arranged to return an integer in the
1035	       range 0 <= maxrand() < 2^32.  Queuing times are also scaled up by
1036	       2^32, but in two stages: i) In lines 3 and 8 queuing times
1037	       cq.ns() and pkt.ns() are returned in integer nanoseconds, making
1038	       the values about 2^30 times larger than when the units were
1039	       seconds, ii) then in lines 3 and 9 an adjustment of -2 to the
1040	       right bit-shift multiplies the result by 2^2, to complete the
1041	       scaling by 2^32.

1043	Appendix C.  Guidance on Controlling Throughput Equivalence

1045	                     +---------------+------+-------+
1046	                     | RTT_C / RTT_L | Reno | Cubic |
1047	                     +---------------+------+-------+
1048	                     |             1 | k=1  | k=0   |
1049	                     |             2 | k=2  | k=1   |
1050	                     |             3 | k=2  | k=2   |
1051	                     |             4 | k=3  | k=2   |
1052	                     |             5 | k=3  | k=3   |
1053	                     +---------------+------+-------+

1055	   Table 1: Value of k for which DCTCP throughput is roughly the same as
1056	                Reno or Cubic, for some example RTT ratios

1058	   To determine the appropriate policy, the operator first has to judge
1059	   whether it wants DCTCP flows to have roughly equal throughput with
1060	   Reno or with Cubic (because, even in its Reno-compatibility mode,
1061	   Cubic is about 1.4 times more aggressive than Reno).  Then the
1062	   operator needs to decide at what ratio of RTTs it wants DCTCP and
1063	   Classic flows to have roughly equal throughput.  For example choosing
1064	   the recommended value of k=0 will make DCTCP throughput roughly the
1065	   same as Cubic, _if their RTTs are the same_.

1067	   However, even if the base RTTs are the same, the actual RTTs are
1068	   unlikely to be the same, because Classic (Cubic or Reno) traffic
1069	   needs a large queue to avoid under-utilization and excess drop,
1070	   whereas L4S (DCTCP) does not.  The operator might still choose this
1071	   policy if it judges that DCTCP throughput should be rewarded for
1072	   keeping its own queue short.

1074	   On the other hand, the operator will choose one of the higher values
1075	   for k, if it wants to slow DCTCP down to roughly the same throughput
1076	   as Classic flows, to compensate for Classic flows slowing themselves
1077	   down by causing themselves extra queuing delay.

1079	   The values for k in the table are derived from the formulae, which
1080	   was developed in [DCttH15]:

1082	       2^k = 1.64 (RTT_reno / RTT_dc)                  (2)
1083	       2^k = 1.19 (RTT_cubic / RTT_dc )                (3)

1085	   For localized traffic from a particular ISP's data centre, we used
1086	   the measured RTTs to calculate that a value of k=3 would achieve
1087	   throughput equivalence, and our experiments verified the formula very
1088	   closely.

1090	Authors' Addresses

1092	   Koen De Schepper
1093	   Nokia Bell Labs
1094	   Antwerp
1095	   Belgium

1097	   Email: koen.de_schepper@nokia.com
1098	   URI:   https://www.bell-labs.com/usr/koen.de_schepper

1100	   Bob Briscoe (editor)
1101	   Simula Research Lab

1103	   Email: ietf@bobbriscoe.net
1104	   URI:   http://bobbriscoe.net/
1105	   Olga Bondarenko
1106	   Simula Research Lab
1107	   Lysaker
1108	   Norway

1110	   Email: olgabnd@gmail.com
1111	   URI:   https://www.simula.no/people/olgabo

1113	   Ing-jyh Tsang
1114	   Nokia Bell Labs
1115	   Antwerp
1116	   Belgium

1118	   Email: ing-jyh.tsang@nokia.com