idnits 2.17.1 draft-morton-tsvwg-lightweight-fair-queueing-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (3 July 2019) is 1757 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- No issues found here. Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Transport Working Group J. Morton 3 Internet-Draft P. Heist 4 Intended status: Informational 3 July 2019 5 Expires: 4 January 2020 7 Lightweight Fair Queueing 8 draft-morton-tsvwg-lightweight-fair-queueing-00 10 Abstract 12 This note presents Lightweight Fair Queueing (LFQ), a fair queueing 13 algorithm with a small code footprint, low memory requirements, no 14 multiply operations, only two physical queues, and only one set of 15 AQM state. LFQ provides throughput fairness, sparse flow 16 prioritization and ordering guarantees, making it suitable for a 17 mixture of traffic flow types. 19 Status of This Memo 21 This Internet-Draft is submitted in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF). Note that other groups may also distribute 26 working documents as Internet-Drafts. The list of current Internet- 27 Drafts is at https://datatracker.ietf.org/drafts/current/. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 This Internet-Draft will expire on 4 January 2020. 36 Copyright Notice 38 Copyright (c) 2019 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 43 license-info) in effect on the date of publication of this document. 44 Please review these documents carefully, as they describe your rights 45 and restrictions with respect to this document. Code Components 46 extracted from this document must include Simplified BSD License text 47 as described in Section 4.e of the Trust Legal Provisions and are 48 provided without warranty as described in the Simplified BSD License. 50 Table of Contents 52 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 53 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 2 54 3. The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 3 55 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 3 56 3.2. Declarations . . . . . . . . . . . . . . . . . . . . . . 4 57 3.3. Pseudo-code . . . . . . . . . . . . . . . . . . . . . . . 6 58 3.4. Simulator . . . . . . . . . . . . . . . . . . . . . . . . 9 59 4. Security Considerations . . . . . . . . . . . . . . . . . . . 9 60 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 61 6. Informative References . . . . . . . . . . . . . . . . . . . 9 62 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 10 64 1. Introduction 66 Flow isolation is a powerful tool for congestion management in 67 today's Internet. Early implementations, such as SFQ [SFQ], aimed 68 simply to have inter-flow induced latency dependent on the number of 69 flows, rather than the total length of the queue. Today, DRR++ 70 [DRRPP] explicitly shares throughput capacity between flows, and 71 prioritises "sparse" flows that use less than their fair share, on 72 the grounds that these are probably latency-sensitive traffic. This 73 reduces the inter-flow induced latency to near zero for sparse flows, 74 regardless of the number of saturating flows. 76 Unfortunately, the relatively complex algorithms and considerable 77 dynamic state of a DRR++ queue set with individual AQM (Active Queue 78 Management) [RFC7567] instances has proved disheartening to hardware 79 implementors, and thus to deployment on high-capacity links. 80 Ordinary CPE devices implementing DRR++ in software work well up to 81 100Mbps or so. A scheme involving only a small number of queues and 82 AQM instances might be more suitable for the 1Gbps and up category. 84 This note therefore presents LFQ, a fair queueing algorithm suitable 85 for implementation in hardware, making fair queueing possible on 86 high-throughput routers and low-cost middleboxes. 88 2. Background 90 LFQ is inspired by DRR++'s facility for identifying "sparse" flows 91 and giving them strict priority over "saturating" flows. DRR++ does 92 this by maintaining separate lists of queues (each queue containing 93 one flow) meeting "sparseness" criteria or not. 95 Queues are first placed into the sparse list when they become non- 96 empty, then moved to the saturating list when their deficit exceeds a 97 threshold called "quantum". Every queue's deficit is incremented by 98 the packet size when packets are delivered from it, and decremented 99 by the quantum when they come up in the list rotation. Queues are 100 removed from the saturating list only when they are found empty for a 101 full rotation. 103 This "sparseness" heuristic over observed per-flow queue occupancy 104 characteristics is relatively robust, compared to relying on the 105 correct behaviour of each source's congestion control algorithms and/ 106 or explicit traffic marking. This is especially relevant with the 107 recent development of high-fidelity congestion signalling schemes, 108 such as DCTCP [RFC8257] and SCE (Some Congestion Experienced), whose 109 expected congestion-signal response is markedly different from 110 previous standards. 112 In fq_codel [RFC8290] and Cake [CAKE], AQM is applied individually to 113 each DRR++ flow, thus avoiding unnecessary leakage of AQM action from 114 flows requiring it to well-behaved traffic which does not. This 115 arrangement has been shown to work well in practice, and is widely 116 deployed as part of the Linux kernel, including in many CPE devices. 117 However the per-queue AQM state dominates the memory requirements of 118 DRR++. 120 LFQ attempts to retain most of these characteristics while 121 simplifying implementation requirements considerably. This still 122 requires identifying individual traffic flows and keeping some per- 123 flow state, but there is no longer an individual queue per state nor 124 any lists of such queues. Instead there are only two queues and only 125 one set of AQM state. The operations required are believed to be 126 amenable to hardware implementation. 128 3. The Algorithm 130 3.1. Overview 132 Unlike conventional fair queueing, with Lightweight Fair Queueing, 133 packets are not distributed to queues by a flow mapping, but by a 134 sparseness metric associated with that mapping. Thus, the number of 135 queues is reduced to two. 137 The number of flows which can be handled is far greater, however, 138 being limited by the number of flow buckets indexed by the flow hash. 139 An implementation might define a flow as traffic to one subscriber, 140 and provide a perfect mapping between subscribers and buckets. 141 Alternatively it might provide a stochastic mapping based on the 142 traditional 5-tuple of addresses, port numbers, and protocol number. 144 The per-flow state is just two integers (one signed, one unsigned) 145 and one binary flag, in contrast to DRR++ which requires a whole 146 queue and a set of AQM state per flow. These integers are B, 147 tracking the backlog of the flow in packets, and D, tracking a 148 deficit value analogous to that used in DRR++. The range of D is 149 [-MTU..+MTU], so for a typical 1500-byte MTU, a 12-bit register 150 suffices. The binary flag K indicates whether packets should be 151 skipped for the rest of this pass through the queue. This small per- 152 flow state makes tracking a large number of flows practical. 154 The two queues provided are SQ and BQ: 156 SQ is the "sparse queue" which handles flows classed as sparse, 157 including the first packets in newly active flows. This queue tends 158 to remain short and drain quickly, which are ideal characteristics 159 for latency-sensitive traffic, and young flows still establishing 160 connections or probing for capacity. This queue does not maintain 161 AQM state nor apply AQM signals. 163 BQ is the "bulk queue" which handles all traffic not classed as 164 sparse, including at least the second and subsequent packets in a 165 burst. BQ has not only the typical "head" and "tail", but also a 166 "scan" pointer which iterates over the packets in the queue from head 167 to tail. Packets are delivered from the "scan" position, not from 168 the "head"; this is key to the capacity-sharing mechanism. A full 169 set of AQM state is maintained on BQ, and applied to all traffic 170 delivered from it. 172 In case of queue overflow, packets are removed from the "head" of BQ 173 to make room for the new arrivals; this head-dropping behaviour 174 minimises the delay before the lost packets can be retransmitted. 176 This simplification of state and algorithm has some drawbacks in 177 terms of resultant behaviour. The sharing of link capacity between 178 flows will not be as smooth as with DRR++, and the relatively coarse 179 provision of AQM may result in a noticeable degradation of congestion 180 signalling. 182 3.2. Declarations 184 The following queues are defined: 186 ------------------------------- 187 --> | | | | --> 188 ------------------------------- 189 SQ: the Sparse Queue, containing packets from flows with no more 190 than one packet in the queue at a time (no AQM for this queue). 192 /--[AQM]--> 193 | 194 ---------------------|--------- 195 --> | | | | | | | | | | | | 196 ------------------------------- 197 BQ: the Bulk Queue, containing packets from flows that build up a 198 multi-packet backlog (AQM managed queue), showing scan pointer. 200 The following constants and variables are defined: 202 * B: the flow backlog, in packets 204 * D: the flow deficit, in bytes 206 * K: the flow scan skip flag 208 * N: the number of flow buckets (each bucket containing a value of 209 B, D, and K) 211 * S: the size of a packet 213 * T: the packet's timestamp, for later use by AQM 215 * H: the packet's flow hash, cached 217 * MTU: the MTU of the link 219 * MAXSIZE: the maximum size for all packets in the queue 221 * NOW: the current timestamp 223 * FLOWS: all flow buckets 225 Finally, the hash function FH() maps a packet to a flow bucket: 227 +-------+ 228 /--- | B D K | 229 / +-------+ 230 / 231 +------+ / +-------+ 232 ----- Packet -----> | FH() | ------- | B D K | 233 +------+ \ +-------+ 234 \ 235 \ 236 \--- ... N 238 3.3. Pseudo-code 240 In the following pseudo-code: 242 * Lowercase is used for internal variables, and uppercase for 243 constants, variables and queues defined in Section 3.2. 245 * The send() function applies AQM logic before actually transmitting 246 the packet given; if it drops the packet, dequeue() is immediately 247 re-called. 249 The following functions and variables are defined for both the sparse 250 and bulk queues: 252 * The push() function adds a packet to the tail of the specified 253 queue. 255 * The pop() function removes and returns the packet from the head of 256 the specified queue. BQ's scan pointer must point to the same 257 packet afterwards, if it is still present, otherwise to the head 258 of the queue. 260 * The .size variable (BQ.size and SQ.size) refers to the sum of the 261 sizes of all packets in the queue, and may be maintained during 262 push(), pop(), and pull(). 264 * The .head variable is the current head pointer for the queue. 266 The following functions and variables are defined only for BQ: 268 * The pull() function removes and returns the packet at the scan 269 pointer. 271 * The scan() function returns the packet at the scan pointer without 272 removing it. 274 * The head() function returns the packet at the head of the 275 specified queue without removing it. 277 * The .scan variable is the current scan pointer for the queue. 279 The logic for the enqueue operation is as follows: 281 enqueue(packet p) { 282 while (SQ.size + BQ.size + S > MAXSIZE) { 283 ; Queue overflow - drop from BQ head, then from SQ 284 dp := pop(BQ) 285 if (!dp) 286 dp := pop(SQ) 287 bkt := dp.H 288 bkt.B -= 1 289 } 291 bkt := FH(p) 292 p.T = NOW 293 p.H = bkt 294 if (bkt.B == 0 && bkt.D >= 0 && !bkt.K) 295 push(SQ, p) 296 else 297 push(BQ, p) 298 bkt.B += 1 299 } 301 The logic for the dequeue operation is as follows: 303 dequeue() { 304 ; SQ gets strict priority 305 p := pop(SQ) 306 if (p) { 307 send(p) 308 bkt := p.H 309 bkt.B -= 1 310 bkt.D -= S 311 if (bkt.D < 0) { 312 bkt.K = true 313 bkt.D += MTU 314 } 315 return 316 } 318 ; Process BQ if SQ was empty 319 while (head(BQ)) { 320 p := scan(BQ) 321 if (!p) { 322 ; Scan has reached tail of queue 323 forall(f in FLOWS where f.B == 0 && !f.K) 324 f.D = 0 325 forall(f in FLOWS where f.K) 326 f.K = false 327 BQ.scan = BQ.head 328 p := scan(BQ) 329 } 331 bkt := p.H 332 if (!bkt.K) { 333 ; Packet eligible for immediate delivery 334 send(p) 335 pull(BQ) 336 bkt.B -= 1 337 bkt.D -= S 338 if (bkt.D < 0) { 339 bkt.K = true 340 bkt.D += MTU 341 } 342 return 343 } else { 344 ; Packet to stay in queue 345 BQ.scan = BQ.scan.next 346 } 347 } 348 } 350 3.4. Simulator 352 A discrete time simulator for LFQ has been implemented, which acts as 353 a supporting demonstration, verification of the algorithm's 354 effectiveness and a test bed for exploration [LFQSIM]. 356 4. Security Considerations 358 As with all FQ algorithms, an attacker may degrade service by 359 flooding the queue with traffic that hashes into random buckets, or 360 obtain enhanced service by using multiple flows where one would 361 normally suffice. The latter may be mitigated by a flow mapping for 362 individual hosts, or subscribers, rather than the 5-tuple. 364 5. IANA Considerations 366 There are no IANA considerations. 368 6. Informative References 370 [CAKE] Hoiland-Jorgensen, T., Taht, D., and J. Morton, "Piece of 371 CAKE: A Comprehensive Queue Management Solution for Home 372 Gateways", May 2018, . 374 [DRRPP] MacGregor, M.H. and W. Shi, "Deficits for Bursty Latency- 375 critical Flows: DRR++", September 2000, 376 . 379 [LFQSIM] "Lightweight Fair Queueing Simulator GitHub Repository", 380 July 2019, . 382 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 383 Recommendations Regarding Active Queue Management", 384 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 385 . 387 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 388 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 389 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 390 October 2017, . 392 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 393 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 394 and Active Queue Management Algorithm", RFC 8290, 395 DOI 10.17487/RFC8290, January 2018, 396 . 398 [SFQ] McKenney, P.E., "Stochastic Fairness Queueing", June 2002, 399 . 402 Authors' Addresses 404 Jonathan Morton 405 Kokkonranta 21 406 FI-31520 Pitkajarvi 407 Finland 409 Phone: +358 44 927 2377 410 Email: chromatix99@gmail.com 412 Peter G. Heist 413 Redacted 414 463 11 Liberec 30 415 Czech Republic 417 Email: pete@heistp.net