idnits 2.17.1 

draft-morton-tsvwg-lightweight-fair-queueing-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (3 July 2019) is 1757 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Transport Working Group                                        J. Morton
3	Internet-Draft                                                  P. Heist
4	Intended status: Informational                               3 July 2019
5	Expires: 4 January 2020

7	                       Lightweight Fair Queueing
8	            draft-morton-tsvwg-lightweight-fair-queueing-00

10	Abstract

12	   This note presents Lightweight Fair Queueing (LFQ), a fair queueing
13	   algorithm with a small code footprint, low memory requirements, no
14	   multiply operations, only two physical queues, and only one set of
15	   AQM state.  LFQ provides throughput fairness, sparse flow
16	   prioritization and ordering guarantees, making it suitable for a
17	   mixture of traffic flow types.

19	Status of This Memo

21	   This Internet-Draft is submitted in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at https://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on 4 January 2020.

36	Copyright Notice

38	   Copyright (c) 2019 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
43	   license-info) in effect on the date of publication of this document.
44	   Please review these documents carefully, as they describe your rights
45	   and restrictions with respect to this document.  Code Components
46	   extracted from this document must include Simplified BSD License text
47	   as described in Section 4.e of the Trust Legal Provisions and are
48	   provided without warranty as described in the Simplified BSD License.

50	Table of Contents

52	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
53	   2.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   2
54	   3.  The Algorithm . . . . . . . . . . . . . . . . . . . . . . . .   3
55	     3.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .   3
56	     3.2.  Declarations  . . . . . . . . . . . . . . . . . . . . . .   4
57	     3.3.  Pseudo-code . . . . . . . . . . . . . . . . . . . . . . .   6
58	     3.4.  Simulator . . . . . . . . . . . . . . . . . . . . . . . .   9
59	   4.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
60	   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
61	   6.  Informative References  . . . . . . . . . . . . . . . . . . .   9
62	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  10

64	1.  Introduction

66	   Flow isolation is a powerful tool for congestion management in
67	   today's Internet.  Early implementations, such as SFQ [SFQ], aimed
68	   simply to have inter-flow induced latency dependent on the number of
69	   flows, rather than the total length of the queue.  Today, DRR++
70	   [DRRPP] explicitly shares throughput capacity between flows, and
71	   prioritises "sparse" flows that use less than their fair share, on
72	   the grounds that these are probably latency-sensitive traffic.  This
73	   reduces the inter-flow induced latency to near zero for sparse flows,
74	   regardless of the number of saturating flows.

76	   Unfortunately, the relatively complex algorithms and considerable
77	   dynamic state of a DRR++ queue set with individual AQM (Active Queue
78	   Management) [RFC7567] instances has proved disheartening to hardware
79	   implementors, and thus to deployment on high-capacity links.
80	   Ordinary CPE devices implementing DRR++ in software work well up to
81	   100Mbps or so.  A scheme involving only a small number of queues and
82	   AQM instances might be more suitable for the 1Gbps and up category.

84	   This note therefore presents LFQ, a fair queueing algorithm suitable
85	   for implementation in hardware, making fair queueing possible on
86	   high-throughput routers and low-cost middleboxes.

88	2.  Background

90	   LFQ is inspired by DRR++'s facility for identifying "sparse" flows
91	   and giving them strict priority over "saturating" flows.  DRR++ does
92	   this by maintaining separate lists of queues (each queue containing
93	   one flow) meeting "sparseness" criteria or not.

95	   Queues are first placed into the sparse list when they become non-
96	   empty, then moved to the saturating list when their deficit exceeds a
97	   threshold called "quantum".  Every queue's deficit is incremented by
98	   the packet size when packets are delivered from it, and decremented
99	   by the quantum when they come up in the list rotation.  Queues are
100	   removed from the saturating list only when they are found empty for a
101	   full rotation.

103	   This "sparseness" heuristic over observed per-flow queue occupancy
104	   characteristics is relatively robust, compared to relying on the
105	   correct behaviour of each source's congestion control algorithms and/
106	   or explicit traffic marking.  This is especially relevant with the
107	   recent development of high-fidelity congestion signalling schemes,
108	   such as DCTCP [RFC8257] and SCE (Some Congestion Experienced), whose
109	   expected congestion-signal response is markedly different from
110	   previous standards.

112	   In fq_codel [RFC8290] and Cake [CAKE], AQM is applied individually to
113	   each DRR++ flow, thus avoiding unnecessary leakage of AQM action from
114	   flows requiring it to well-behaved traffic which does not.  This
115	   arrangement has been shown to work well in practice, and is widely
116	   deployed as part of the Linux kernel, including in many CPE devices.
117	   However the per-queue AQM state dominates the memory requirements of
118	   DRR++.

120	   LFQ attempts to retain most of these characteristics while
121	   simplifying implementation requirements considerably.  This still
122	   requires identifying individual traffic flows and keeping some per-
123	   flow state, but there is no longer an individual queue per state nor
124	   any lists of such queues.  Instead there are only two queues and only
125	   one set of AQM state.  The operations required are believed to be
126	   amenable to hardware implementation.

128	3.  The Algorithm

130	3.1.  Overview

132	   Unlike conventional fair queueing, with Lightweight Fair Queueing,
133	   packets are not distributed to queues by a flow mapping, but by a
134	   sparseness metric associated with that mapping.  Thus, the number of
135	   queues is reduced to two.

137	   The number of flows which can be handled is far greater, however,
138	   being limited by the number of flow buckets indexed by the flow hash.
139	   An implementation might define a flow as traffic to one subscriber,
140	   and provide a perfect mapping between subscribers and buckets.
141	   Alternatively it might provide a stochastic mapping based on the
142	   traditional 5-tuple of addresses, port numbers, and protocol number.

144	   The per-flow state is just two integers (one signed, one unsigned)
145	   and one binary flag, in contrast to DRR++ which requires a whole
146	   queue and a set of AQM state per flow.  These integers are B,
147	   tracking the backlog of the flow in packets, and D, tracking a
148	   deficit value analogous to that used in DRR++.  The range of D is
149	   [-MTU..+MTU], so for a typical 1500-byte MTU, a 12-bit register
150	   suffices.  The binary flag K indicates whether packets should be
151	   skipped for the rest of this pass through the queue.  This small per-
152	   flow state makes tracking a large number of flows practical.

154	   The two queues provided are SQ and BQ:

156	   SQ is the "sparse queue" which handles flows classed as sparse,
157	   including the first packets in newly active flows.  This queue tends
158	   to remain short and drain quickly, which are ideal characteristics
159	   for latency-sensitive traffic, and young flows still establishing
160	   connections or probing for capacity.  This queue does not maintain
161	   AQM state nor apply AQM signals.

163	   BQ is the "bulk queue" which handles all traffic not classed as
164	   sparse, including at least the second and subsequent packets in a
165	   burst.  BQ has not only the typical "head" and "tail", but also a
166	   "scan" pointer which iterates over the packets in the queue from head
167	   to tail.  Packets are delivered from the "scan" position, not from
168	   the "head"; this is key to the capacity-sharing mechanism.  A full
169	   set of AQM state is maintained on BQ, and applied to all traffic
170	   delivered from it.

172	   In case of queue overflow, packets are removed from the "head" of BQ
173	   to make room for the new arrivals; this head-dropping behaviour
174	   minimises the delay before the lost packets can be retransmitted.

176	   This simplification of state and algorithm has some drawbacks in
177	   terms of resultant behaviour.  The sharing of link capacity between
178	   flows will not be as smooth as with DRR++, and the relatively coarse
179	   provision of AQM may result in a noticeable degradation of congestion
180	   signalling.

182	3.2.  Declarations

184	   The following queues are defined:

186	    -------------------------------
187	   -->                  | | | |  -->
188	    -------------------------------
189	    SQ: the Sparse Queue, containing packets from flows with no more
190	    than one packet in the queue at a time (no AQM for this queue).

192	                         /--[AQM]-->
193	                         |
194	    ---------------------|---------
195	   -->     | | | | | | | | | | | |
196	    -------------------------------
197	    BQ: the Bulk Queue, containing packets from flows that build up a
198	    multi-packet backlog (AQM managed queue), showing scan pointer.

200	   The following constants and variables are defined:

202	   *  B: the flow backlog, in packets

204	   *  D: the flow deficit, in bytes

206	   *  K: the flow scan skip flag

208	   *  N: the number of flow buckets (each bucket containing a value of
209	      B, D, and K)

211	   *  S: the size of a packet

213	   *  T: the packet's timestamp, for later use by AQM

215	   *  H: the packet's flow hash, cached

217	   *  MTU: the MTU of the link

219	   *  MAXSIZE: the maximum size for all packets in the queue

221	   *  NOW: the current timestamp

223	   *  FLOWS: all flow buckets

225	   Finally, the hash function FH() maps a packet to a flow bucket:

227	                                        +-------+
228	                                   /--- | B D K |
229	                                  /     +-------+
230	                                 /
231	                       +------+ /       +-------+
232	   ----- Packet -----> | FH() | ------- | B D K |
233	                       +------+ \       +-------+
234	                                 \
235	                                  \
236	                                   \--- ... N

238	3.3.  Pseudo-code

240	   In the following pseudo-code:

242	   *  Lowercase is used for internal variables, and uppercase for
243	      constants, variables and queues defined in Section 3.2.

245	   *  The send() function applies AQM logic before actually transmitting
246	      the packet given; if it drops the packet, dequeue() is immediately
247	      re-called.

249	   The following functions and variables are defined for both the sparse
250	   and bulk queues:

252	   *  The push() function adds a packet to the tail of the specified
253	      queue.

255	   *  The pop() function removes and returns the packet from the head of
256	      the specified queue.  BQ's scan pointer must point to the same
257	      packet afterwards, if it is still present, otherwise to the head
258	      of the queue.

260	   *  The .size variable (BQ.size and SQ.size) refers to the sum of the
261	      sizes of all packets in the queue, and may be maintained during
262	      push(), pop(), and pull().

264	   *  The .head variable is the current head pointer for the queue.

266	   The following functions and variables are defined only for BQ:

268	   *  The pull() function removes and returns the packet at the scan
269	      pointer.

271	   *  The scan() function returns the packet at the scan pointer without
272	      removing it.

274	   *  The head() function returns the packet at the head of the
275	      specified queue without removing it.

277	   *  The .scan variable is the current scan pointer for the queue.

279	   The logic for the enqueue operation is as follows:

281	   enqueue(packet p) {
282	       while (SQ.size + BQ.size + S > MAXSIZE) {
283	           ; Queue overflow - drop from BQ head, then from SQ
284	           dp := pop(BQ)
285	           if (!dp)
286	               dp := pop(SQ)
287	           bkt := dp.H
288	           bkt.B -= 1
289	       }

291	       bkt := FH(p)
292	       p.T = NOW
293	       p.H = bkt
294	       if (bkt.B == 0 && bkt.D >= 0 && !bkt.K)
295	           push(SQ, p)
296	       else
297	           push(BQ, p)
298	       bkt.B += 1
299	   }

301	   The logic for the dequeue operation is as follows:

303	   dequeue() {
304	       ; SQ gets strict priority
305	       p := pop(SQ)
306	       if (p) {
307	           send(p)
308	           bkt := p.H
309	           bkt.B -= 1
310	           bkt.D -= S
311	           if (bkt.D < 0) {
312	               bkt.K = true
313	               bkt.D += MTU
314	           }
315	           return
316	       }

318	       ; Process BQ if SQ was empty
319	       while (head(BQ)) {
320	           p := scan(BQ)
321	           if (!p) {
322	               ; Scan has reached tail of queue
323	               forall(f in FLOWS where f.B == 0 && !f.K)
324	                   f.D = 0
325	               forall(f in FLOWS where f.K)
326	                   f.K = false
327	               BQ.scan = BQ.head
328	               p := scan(BQ)
329	           }

331	           bkt := p.H
332	           if (!bkt.K) {
333	               ; Packet eligible for immediate delivery
334	               send(p)
335	               pull(BQ)
336	               bkt.B -= 1
337	               bkt.D -= S
338	               if (bkt.D < 0) {
339	                   bkt.K = true
340	                   bkt.D += MTU
341	               }
342	               return
343	           } else {
344	               ; Packet to stay in queue
345	               BQ.scan = BQ.scan.next
346	           }
347	       }
348	   }

350	3.4.  Simulator

352	   A discrete time simulator for LFQ has been implemented, which acts as
353	   a supporting demonstration, verification of the algorithm's
354	   effectiveness and a test bed for exploration [LFQSIM].

356	4.  Security Considerations

358	   As with all FQ algorithms, an attacker may degrade service by
359	   flooding the queue with traffic that hashes into random buckets, or
360	   obtain enhanced service by using multiple flows where one would
361	   normally suffice.  The latter may be mitigated by a flow mapping for
362	   individual hosts, or subscribers, rather than the 5-tuple.

364	5.  IANA Considerations

366	   There are no IANA considerations.

368	6.  Informative References

370	   [CAKE]     Hoiland-Jorgensen, T., Taht, D., and J. Morton, "Piece of
371	              CAKE: A Comprehensive Queue Management Solution for Home
372	              Gateways", May 2018, <https://arxiv.org/abs/1804.07617>.

374	   [DRRPP]    MacGregor, M.H. and W. Shi, "Deficits for Bursty Latency-
375	              critical Flows: DRR++", September 2000,
376	              <http://ieeexplore.ieee.org/xpls/
377	              abs_all.jsp?arnumber=875803>.

379	   [LFQSIM]   "Lightweight Fair Queueing Simulator GitHub Repository",
380	              July 2019, <https://github.com/heistp/lfqsim/>.

382	   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
383	              Recommendations Regarding Active Queue Management",
384	              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015,
385	              <https://www.rfc-editor.org/info/rfc7567>.

387	   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
388	              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
389	              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
390	              October 2017, <https://www.rfc-editor.org/info/rfc8257>.

392	   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
393	              J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
394	              and Active Queue Management Algorithm", RFC 8290,
395	              DOI 10.17487/RFC8290, January 2018,
396	              <https://www.rfc-editor.org/info/rfc8290>.

398	   [SFQ]      McKenney, P.E., "Stochastic Fairness Queueing", June 2002,
399	              <http://www2.rdrop.com/~paulmck/scalability/paper/
400	              sfq.2002.06.04.pdf>.

402	Authors' Addresses

404	   Jonathan Morton
405	   Kokkonranta 21
406	   FI-31520 Pitkajarvi
407	   Finland

409	   Phone: +358 44 927 2377
410	   Email: chromatix99@gmail.com

412	   Peter G. Heist
413	   Redacted
414	   463 11 Liberec 30
415	   Czech Republic

417	   Email: pete@heistp.net