idnits 2.17.1 draft-morton-tsvwg-cheap-nasty-queueing-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (4 November 2019) is 1635 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- No issues found here. Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Transport Working Group J. Morton 3 Internet-Draft P. Heist 4 Intended status: Informational 4 November 2019 5 Expires: 7 May 2020 7 Cheap Nasty Queueing 8 draft-morton-tsvwg-cheap-nasty-queueing-01 10 Abstract 12 This note presents Cheap Nasty Queueing (CNQ), a queueing algorithm 13 intended as a bare-minimum functionality standard for hardware 14 implementations. It provides stateless or single-instance AQM and 15 basic sparse-flow prioritisation. 17 Status of This Memo 19 This Internet-Draft is submitted in full conformance with the 20 provisions of BCP 78 and BCP 79. 22 Internet-Drafts are working documents of the Internet Engineering 23 Task Force (IETF). Note that other groups may also distribute 24 working documents as Internet-Drafts. The list of current Internet- 25 Drafts is at https://datatracker.ietf.org/drafts/current/. 27 Internet-Drafts are draft documents valid for a maximum of six months 28 and may be updated, replaced, or obsoleted by other documents at any 29 time. It is inappropriate to use Internet-Drafts as reference 30 material or to cite them other than as "work in progress." 32 This Internet-Draft will expire on 7 May 2020. 34 Copyright Notice 36 Copyright (c) 2019 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 41 license-info) in effect on the date of publication of this document. 42 Please review these documents carefully, as they describe your rights 43 and restrictions with respect to this document. Code Components 44 extracted from this document must include Simplified BSD License text 45 as described in Section 4.e of the Trust Legal Provisions and are 46 provided without warranty as described in the Simplified BSD License. 48 Table of Contents 50 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 51 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 2 52 3. The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 3 53 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 3 54 3.2. Declarations . . . . . . . . . . . . . . . . . . . . . . 4 55 3.3. Pseudo-code . . . . . . . . . . . . . . . . . . . . . . . 5 56 4. Security Considerations . . . . . . . . . . . . . . . . . . . 8 57 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 58 6. Informative References . . . . . . . . . . . . . . . . . . . 8 59 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 61 1. Introduction 63 Flow isolation is a powerful tool for congestion management in 64 today's Internet. Unfortunately, the relatively complex algorithms 65 and considerable dynamic state of a DRR++ queue set with individual 66 AQM (Active Queue Management) [RFC7567] instances has proved 67 disheartening to hardware implementors, and thus to deployment on 68 high-capacity links and in consumer-grade hardware. 70 This note therefore presents CNQ, a queueing algorithm suitable for 71 implementation in low-cost hardware, providing the absolute minimum 72 functionality to improve perceived network performance over that of a 73 dumb FIFO. 75 2. Background 77 CNQ is inspired by DRR++'s facility for identifying "sparse" flows 78 and giving them strict priority over "saturating" flows. DRR++ does 79 this by maintaining separate lists of queues (each queue containing 80 one flow) meeting "sparseness" criteria or not. 82 Queues are first placed into the sparse list when they become non- 83 empty, then moved to the saturating list when their deficit exceeds a 84 threshold called "quantum". Every queue's deficit is incremented by 85 the packet size when packets are delivered from it, and decremented 86 by the quantum when they come up in the list rotation. Queues are 87 removed from the saturating list only when they are found empty for a 88 full rotation. 90 This "sparseness" heuristic over observed per-flow queue occupancy 91 characteristics is relatively robust, compared to relying on the 92 correct behaviour of each source's congestion control algorithms and/ 93 or explicit traffic marking. This is especially relevant with the 94 recent development of high-fidelity congestion signalling schemes, 95 such as DCTCP [RFC8257] and SCE (Some Congestion Experienced), whose 96 expected congestion-signal response is markedly different from 97 previous standards. 99 In fq_codel [RFC8290] and Cake [CAKE], AQM is applied individually to 100 each DRR++ flow, thus avoiding unnecessary leakage of AQM action from 101 flows requiring it to well-behaved traffic which does not. This 102 arrangement has been shown to work well in practice, and is widely 103 deployed as part of the Linux kernel, including in many CPE devices. 104 However the per-queue AQM state dominates the memory requirements of 105 DRR++. 107 CNQ attempts to retain some of these characteristics while 108 simplifying implementation requirements considerably. This still 109 requires identifying individual traffic flows and keeping some per- 110 flow state, but there is no longer an individual queue per state nor 111 any lists of such queues. Instead there are only two queues and at 112 most one set of AQM state. The operations required are believed to 113 be amenable to low-cost hardware implementation. 115 3. The Algorithm 117 3.1. Overview 119 Unlike conventional fair queueing, with Cheap Nasty Queueing, packets 120 are not distributed to queues by a flow mapping, but by a sparseness 121 metric associated with that mapping. Thus, the number of queues is 122 reduced to two. 124 The number of flows which can be handled is far greater, however, 125 being limited by the number of flow buckets indexed by the flow hash. 126 An implementation might define a flow as traffic to one subscriber, 127 and provide a perfect mapping between subscribers and buckets. 128 Alternatively it might provide a stochastic mapping based on the 129 traditional 5-tuple of addresses, port numbers, and protocol number. 130 The latter would be appropriate for low-cost consumer hardware, in 131 which the notion of a "subscriber" is neither well-defined nor 132 useful. 134 The per-flow state is just one unsigned integer, in contrast to DRR++ 135 which requires a whole queue and a set of AQM state per flow. This 136 integer is B, tracking the backlog of the flow in packets. This 137 small per-flow state makes tracking a large number of flows 138 practical. 140 The two queues provided are SQ and BQ: 142 SQ is the "sparse queue" which handles flows classed as sparse, 143 including the first packets in newly active flows. This queue tends 144 to remain short and drain quickly, which are ideal characteristics 145 for latency-sensitive traffic, and young flows still establishing 146 connections or probing for capacity. This queue does not maintain 147 AQM state nor apply AQM signals, and will contain at most one packet 148 from each flow at any one time. 150 BQ is the "bulk queue" which handles all traffic not classed as 151 sparse, including at least the second and subsequent packets in a 152 burst. An AQM algorithm is applied to all traffic delivered from it. 154 To prevent well-paced traffic from dominating the queue by keeping 155 exactly one packet in SQ at all times, a dummy packet is sent into BQ 156 in parallel with every packet enqueued in SQ, and the B value for the 157 flow effectively tracks the number of packets (including dummies) in 158 BQ. A flow is therefore considered sparse IFF the interval between 159 its packets is longer than the sojourn time of packets in BQ. This 160 can be a much stricter criterion than for true derivatives of DRR++ 161 such as LFQ. 163 The maximum throughput of a sparse flow is thus defined by the size 164 of the packets composing that flow, divided by the sojourn time of 165 BQ, under the assumption that the flow is well paced. For a typical 166 example where the packet size is 1500 bytes and the sojourn time of 167 BQ is controlled to 5ms by Codel AQM action, flows of up to about 168 2.5Mbps throughput may be treated as sparse. 170 In case of queue overflow, packets are removed from the "head" of BQ 171 to make room for the new arrivals; this head-dropping behaviour 172 minimises the delay before the lost packets can be retransmitted. 174 This simplification of state and algorithm has some drawbacks in 175 terms of resultant behaviour compared to DRR++. The sharing of link 176 capacity between flows is dependent mainly on the RTT-fair properties 177 of the flows' own congestion control, in response to congestion 178 signalling from the single AQM. However, this should be seen as an 179 improvement in performance for sparse flows as compared to a plain 180 FIFO queue. 182 3.2. Declarations 184 The following queues are defined: 186 ------------------------------- 187 --> | | | | --> 188 ------------------------------- 189 SQ: the Sparse Queue, containing packets from flows with no more 190 than one packet in the queue at a time (no AQM for this queue). 192 ------------------------------- 193 --> | | | | | | | | | | | --> 194 ------------------------------- 195 BQ: the Bulk Queue, containing packets from flows that build up a 196 multi-packet backlog (AQM managed queue). 198 The following constants and variables are defined: 200 * B: the flow backlog, in packets 202 * N: the number of flow buckets (each bucket containing a value of 203 B) 205 * S: the size of a packet 207 * T: the packet's timestamp, for later use by AQM 209 * H: the packet's flow hash, cached 211 * MAXSIZE: the maximum size for all packets in the queue 213 * NOW: the current timestamp 215 Finally, the hash function FH() maps a packet to a flow bucket: 217 +---+ 218 /--- | B | 219 / +---+ 220 / 221 +------+ / +---+ 222 ----- Packet -----> | FH() | ------- | B | 223 +------+ \ +---+ 224 \ 225 \ 226 \--- ... N 228 3.3. Pseudo-code 230 In the following pseudo-code: 232 * Lowercase is used for internal variables, and uppercase for 233 constants, variables and queues defined in Section 3.2. 235 * The send() function transmits the packet. 237 * The aqm_action() function updates the AQM state (if any) based on 238 the current sojourn time, and returns an action code indicating 239 whether a CE or SCE mark (or no mark) should be applied. This 240 function may be stateless and merely return results from a 241 threshold function or probability ramp, or it may implement Codel 242 or similar stateful AQMs, or a hybrid of the two for separate CE 243 and SCE marking strategies. 245 The following functions and variables are defined for both the sparse 246 and bulk queues: 248 * The push() function adds a packet to the tail of the specified 249 queue. 251 * The pop() function removes and returns the packet from the head of 252 the specified queue. 254 * The .size variable (BQ.size and SQ.size) refers to the sum of the 255 sizes of all packets in the queue, and may be maintained during 256 push(), pop(). 258 * The .head variable is the current head pointer for the queue. 260 The logic for the enqueue operation is as follows: 262 enqueue(packet p) { 263 while (SQ.size + BQ.size + S > MAXSIZE) { 264 ; Queue overflow - drop from BQ head, then from SQ 265 dp := pop(BQ) 266 if (!dp) 267 dp := pop(SQ) 268 bkt := dp.H 269 bkt.B -= 1 270 } 272 bkt := FH(p) 273 p.T = NOW 274 p.H = bkt 275 if (bkt.B == 0) { 276 push(SQ, p) 277 dp := zero-length dummy packet 278 dp.T = NOW 279 dp.H = bkt 280 push(BQ, dp) 281 bkt.B += 2 282 } else { 283 push(BQ, p) 284 bkt.B += 1 285 } 286 } 288 The logic for the dequeue operation is as follows: 290 dequeue() { 291 ; SQ gets strict priority 292 p := pop(SQ) 293 if (p) { 294 send(p) 295 bkt := p.H 296 bkt.B -= 1 297 return 298 } 300 ; Process BQ if SQ was empty 301 repeat { 302 p := pop(BQ) 303 if (!p) { 304 ; Queue is empty 305 return 306 } 308 bkt := p.H 309 bkt.B -= 1 310 if (p.S == 0) { 311 ; Dummy packet for sparseness metric - drop 312 continue 313 } 315 ; Apply AQM logic based on sojourn time 316 t := NOW - p.T 318 ; drop unresponsive traffic 319 if (t > 500ms) 320 continue 322 switch(aqm_action(t)) { 323 case MARK_CE: 324 ; legacy congestion signalling 325 if (t.ECN == Not-ECT) 326 continue 327 ; RFC-3168 328 if (t.ECN == ECT || t.ECN == SCE) 329 t.ECN = CE ; and update IP header checksum 330 break 332 case MARK_SCE: 333 ; Some Congestion Experienced 334 if (t.ECN == ECT) 335 t.ECN = SCE ; and update IP header checksum 336 break 338 default: 339 ; no marking request 340 break 341 } 343 send(p) 344 return 345 } 346 } 348 4. Security Considerations 350 This is a very weak FQ algorithm, not much better than a dumb FIFO - 351 but still better. 353 5. IANA Considerations 355 There are no IANA considerations. 357 6. Informative References 359 [CAKE] Hoiland-Jorgensen, T., Taht, D., and J. Morton, "Piece of 360 CAKE: A Comprehensive Queue Management Solution for Home 361 Gateways", May 2018, . 363 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 364 Recommendations Regarding Active Queue Management", 365 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 366 . 368 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 369 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 370 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 371 October 2017, . 373 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 374 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 375 and Active Queue Management Algorithm", RFC 8290, 376 DOI 10.17487/RFC8290, January 2018, 377 . 379 Authors' Addresses 381 Jonathan Morton 382 Kokkonranta 21 383 FI-31520 Pitkajarvi 384 Finland 386 Phone: +358 44 927 2377 387 Email: chromatix99@gmail.com 389 Peter G. Heist 390 Redacted 391 463 11 Liberec 30 392 Czech Republic 394 Email: pete@heistp.net