idnits 2.17.1 draft-hoeiland-joergensen-aqm-fq-codel-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 8 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 518 has weird spacing: '...st_head flowc...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (March 3, 2014) is 3707 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'IETF84' is mentioned on line 203, but not defined -- Looks like a reference, but probably isn't: '1' on line 658 == Unused Reference: 'RFC0896' is defined on line 619, but no explicit reference was found in the text == Unused Reference: 'RFC0970' is defined on line 622, but no explicit reference was found in the text == Unused Reference: 'RFC2119' is defined on line 625, but no explicit reference was found in the text == Unused Reference: 'RFC2309' is defined on line 628, but no explicit reference was found in the text == Unused Reference: 'SFQ' is defined on line 646, but no explicit reference was found in the text ** Obsolete normative reference: RFC 896 (Obsoleted by RFC 7805) ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567) Summary: 3 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 AQM and Packet Scheduling T. Hoeiland-Joergensen 3 Internet-Draft Karlstad University 4 Intended status: Informational P. McKenney 5 Expires: September 4, 2014 IBM Linux Technology Center 6 D. Taht 7 Teklibre 8 J. Gettys 9 Bell Labs 10 E. Dumazet 11 Google, Inc. 12 March 3, 2014 14 FlowQueue-Codel 15 draft-hoeiland-joergensen-aqm-fq-codel-00 17 Abstract 19 This memo presents the FQ-CoDel hybrid packet scheduler/AQM 20 algorithm, a critical tool for fighting bufferbloat and reducing 21 latency across the Internet. 23 FQ-CoDel mixes packets from multiple flows and reduces the impact of 24 head of line blocking from bursty traffic. It provides isolation for 25 low-rate traffic such as DNS, web, and videoconferencing traffic. It 26 improves utilisation across the networking fabric, especially for 27 bidirectional traffic, by keeping queue lengths short; and it can be 28 implemented in a memory- and CPU-efficient fashion across a wide 29 range of hardware. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at http://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on September 4, 2014. 48 Copyright Notice 50 Copyright (c) 2014 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 66 1.1. Terminology and concepts . . . . . . . . . . . . . . . . 3 67 1.2. Informal summary of FQ-CoDel . . . . . . . . . . . . . . 4 68 2. CoDel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 69 3. Flow Queueing . . . . . . . . . . . . . . . . . . . . . . . . 5 70 4. FQ-CoDel Parameters and Data Structures . . . . . . . . . . . 6 71 4.1. Parameters . . . . . . . . . . . . . . . . . . . . . . . 6 72 4.1.1. Interval . . . . . . . . . . . . . . . . . . . . . . 6 73 4.1.2. Target . . . . . . . . . . . . . . . . . . . . . . . 6 74 4.1.3. Packet limit . . . . . . . . . . . . . . . . . . . . 7 75 4.1.4. Quantum . . . . . . . . . . . . . . . . . . . . . . . 7 76 4.1.5. Flows . . . . . . . . . . . . . . . . . . . . . . . . 7 77 4.1.6. ECN . . . . . . . . . . . . . . . . . . . . . . . . . 7 78 4.2. Data structures . . . . . . . . . . . . . . . . . . . . . 8 79 4.2.1. Internal sub-queues . . . . . . . . . . . . . . . . . 8 80 4.2.2. New and old queues lists . . . . . . . . . . . . . . 8 81 5. The FQ-CoDel scheduler and AQM interactions . . . . . . . . . 8 82 5.1. Enqueue . . . . . . . . . . . . . . . . . . . . . . . . . 8 83 5.2. Dequeue . . . . . . . . . . . . . . . . . . . . . . . . . 9 84 6. Implementation considerations . . . . . . . . . . . . . . . . 11 85 6.1. Probability of hash collisions . . . . . . . . . . . . . 11 86 6.2. Memory Overhead . . . . . . . . . . . . . . . . . . . . . 11 87 6.3. Per-Packet Timestamping . . . . . . . . . . . . . . . . . 12 88 6.4. Other forms of "Fair Queueing" . . . . . . . . . . . . . 12 89 6.5. Differences between CoDel and FQ-CoDel behaviour . . . . 12 90 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 91 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 92 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 93 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 13 94 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 95 11.1. Normative References . . . . . . . . . . . . . . . . . . 14 96 11.2. Informative References . . . . . . . . . . . . . . . . . 14 97 11.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 14 98 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14 100 1. Introduction 102 The FQ-CoDel algorithm is a combined packet scheduler and AQM 103 developed as part of the bufferbloat-fighting community effort. It 104 is based on a modified Deficit Round Robin (DRR) queue scheduler, 105 with the CoDel AQM algorithm operating on each sub-queue. This 106 document describes the combined algorithm; reference implementations 107 are available for ns2 and ns3 and it is included in the mainline 108 Linux kernel as the FQ-CoDel queueing discipline. 110 The rest of this document is structured as follows: The rest of this 111 section gives some concepts and terminology used in the rest of the 112 document, and gives a short informal summary of the FQ-CoDel 113 algorithm. Section 2 gives an overview of the CoDel algorithm. 114 Section 3 covers the DRR portion. Section 4 defines the parameters 115 and data structures employed by FQ-CoDel. Section 5 describes the 116 working of the algorithm in detail. Section 6 describes 117 implementation considerations. Section 10 concludes. 119 1.1. Terminology and concepts 121 Flow: A flow is typically identified by a 5-tuple of source IP, 122 destination IP, source port, destination port, and protocol. It can 123 also be identified by a superset or subset of those parameters, or by 124 mac address, or other means. 126 Queue: A queue of packets represented internally in FQ-CoDel. In 127 most instances each flow gets its own queue; however because of the 128 possibility of hash collisions, this is not always the case. In an 129 attempt to avoid confusion, the word 'queue' is used to refer to the 130 internal data structure, and 'flow' to refer to the actual stream of 131 packets being delivered to the FQ-CoDel algorithm. 133 Scheduler: A mechanism to select which queue a packet is dequeued 134 from. 136 CoDel AQM: The Active Queue Management algorithm employed by FQ- 137 CoDel. 139 DRR: Deficit round-robin scheduling. 141 Quantum: The maximum amount of bytes to be dequeued from a queue at 142 once. 144 1.2. Informal summary of FQ-CoDel 146 FQ-CoDel is a _hybrid_ of DRR [DRR] and CODEL [CODEL2012], with an 147 optimisation for sparse flows similar to SQF [SQF2012]. We call this 148 "Flow Queueing" rather than "Fair Queueing" as flows that build a 149 queue are treated differently than flows that do not. 151 FQ-CoDel stochastically classifies incoming packets into different 152 sub-queues by hashing the 5-tuple of IP protocol number and source 153 and destination IP and port numbers, perturbed with a random number 154 selected at initiation time. Each queue is managed by the CoDel 155 queueing discipline. Packet ordering within a queue is preserved, 156 since queues have FIFO ordering. 158 The FQ-CoDel algorithm consists of two logical parts: the scheduler 159 which selects which queue to dequeue a packet from, and the CoDel AQM 160 which works on each of the queues. The subtleties of FQ-CoDel are 161 mostly in the scheduling part, whereas the interaction between the 162 scheduler and the CoDel algorithm are fairly straight forward: 164 At initialisation, each queue is set up to have a separate set of 165 CoDel state variables. By default, 1024 queues are created. The 166 current implementation supports anywhere from one to 64K separate 167 queues, and each queue maintains the state variables throughout its 168 lifetime, and so acts the same as the non-FQ CoDel variant would. 169 This means that with only one queue, FQ-CoDel behaves essentially the 170 same as CoDel by itself. 172 On dequeue, FQ-CoDel selects a queue from which to dequeue by a two- 173 tier round-robin scheme, in which each queue is allowed to dequeue up 174 to a configurable quantum of bytes for each iteration. Deviations 175 from this quantum is maintained as a deficit for the queue, which 176 serves to make the fairness scheme byte-based rather than a packet- 177 based. The two-tier round-robin mechanism distinguishes between 178 "new" queues (which don't build up a standing queue) and "old" 179 queues, that have queued enough data to be around for more than one 180 iteration of the round-robin scheduler. 182 This new/old queue distinction has a particular consequence for 183 queues that don't build up more than a quantum of bytes before being 184 visited by the scheduler: Such queues are removed from the list, and 185 then re-added as a new queue each time a packet arrives for it, and 186 so will get priority over queues that do not empty out each round 187 (except for a minor modification to protect against starvation, 188 detailed below). Exactly how much data a flow has to send to keep 189 its queue in this state is somewhat difficult to reason about, 190 because it depends on both the egress link speed and the number of 191 concurrent flows. However, in practice many things that are 192 beneficial to have prioritised for typical internet use (ACKs, DNS 193 lookups, interactive SSH, HTTP requests, ARP, ICMP, VoIP) _tend_ to 194 fall in this category, which is why FQ-CoDel performs so well for 195 many practical applications. 197 This scheduling scheme has some subtlety to it, which is explained in 198 detail in the remainder of this document. 200 2. CoDel 202 CoDel is described in the the ACM Queue paper, CODEL [CODEL2012], and 203 Van Jacobson's IETF84 presentation [IETF84]. The basic idea is to 204 control queue length, maintaining sufficient queueing to keep the 205 outgoing link busy, but avoiding building up the queue beyond that 206 point. This is done by preferentially dropping packets that remain 207 in the queue for "too long". 209 When each new packet arrives, its arrival time is stored with it. 210 Later, when it is that packet's turn to be dequeued, CoDel computes 211 its sojourn time (the current time minus the arrival time). If the 212 sojourn time for packets being dequeued exceeds the _target_ time for 213 a time period of at least the (current value of) _interval_, one or 214 more packets will be dropped (or marked, if ECN is enabled) in order 215 to signal the source endpoint to reduce its send rate. If the 216 sojourn still remains above the target time, the value of interval be 217 lowered, and additional packet drops will occur on a schedule 218 computed from an inverse-square-root control law until either (1) the 219 queue becomes empty or (2) a packet is encountered with a sojourn 220 time that is less than the target time. Upon exiting the dropping 221 mode, CoDel caches the last calculated interval (applying varying 222 amounts of hysteresis to it), to be used as the starting point on 223 subsequent re-entries into dropping mode. 225 The _target_ time is normally set to about five milliseconds, and the 226 initial _interval_ is normally set to about 100 milliseconds. This 227 approach has proven to be quite effective in a wide variety of 228 situations. 230 CoDel drops packets at the head of a queue, rather than at the tail. 232 3. Flow Queueing 234 FQ-CoDel's DRR scheduler is byte-based, employing a deficit round- 235 robin mechanism between queues. This works by keeping track of the 236 current byte _deficit_ of each queue. This deficit is initialised to 237 the configurable quantum; each time a queue gets a dequeue 238 opportunity, it gets to dequeue packets, decreasing the deficit by 239 the packet size for each packet, until the deficit runs into the 240 negative, at which point it is increased by one quantum, and the 241 dequeue opportunity ends. 243 This means that if one queue contains packets of size quantum/3, and 244 another contains quantum-sized packets, the first queue will dequeue 245 three packets each time it gets a turn, whereas the second only 246 dequeues one. This means that flows that send small packets are not 247 penalised by the difference in packet sizes; rather, the DRR scheme 248 approximates a (single-)byte-based fairness queueing. The size of 249 the quantum determines the scheduling granularity, with the tradeoff 250 from too small a quantum being scheduling overhead. For small 251 bandwidths, lowering the quantum from the default MTU size can be 252 advantageous. 254 Unlike DRR there are two sets of flows - a "new" list for flows that 255 have not built a queue recently, and an "old" list for flow-building 256 queues. 258 4. FQ-CoDel Parameters and Data Structures 260 This section goes into the parameters and data structures in FQ- 261 CoDel. 263 4.1. Parameters 265 4.1.1. Interval 267 The _interval_ parameter has the same semantics as CoDel and is used 268 to ensure that the measured minimum delay does not become too stale. 269 The minimum delay MUST be experienced in the last epoch of length 270 interval. It SHOULD be set on the order of the worst-case RTT 271 through the bottleneck to give end-points sufficient time to react. 273 The default interval value is 100 ms. 275 4.1.2. Target 277 The _target_ parameter has the same semantics as CoDel. It is the 278 acceptable minimum standing/persistent queue delay for each FQ-CoDel 279 Queue. This minimum delay is identified by tracking the local 280 minimum queue delay that packets experience. 282 The default target value is 5 ms, but this value SHOULD be tuned to 283 be at least the transmission time of a single MTU-sized packet at the 284 prevalent egress link speed (which for e.g. 1Mbps and MTU 1500 is 285 ~15ms). 287 4.1.3. Packet limit 289 Routers do not have infinite memory, so some packet limit MUST be 290 enforced. 292 The _limit_ parameter is the hard limit on the real queue size, 293 measured in number of packets. This limit is a global limit on the 294 number of packets in all queues; each individual queue does not have 295 an upper limit. When the limit is reached and a new packet arrives 296 for enqueue, a packet is dropped from the head of the largest queue 297 (measured in bytes) to make room for the new packet. 299 The default packet limit is 10240 packets, which is suitable for up 300 to 10GigE speeds. 302 4.1.4. Quantum 304 The _quantum_ parameter is the number of bytes each queue gets to 305 dequeue on each round of the scheduling algorithm. The default is 306 set to 1514 bytes which corresponds to the Ethernet MTU plus the 307 hardware header length of 14 bytes. 309 In TSO-enabled systems, where a "packet" consists of an offloaded 310 packet train, it can presently be as large as 64K bytes. In GRO- 311 enabled systems, up to 17 times the TCP max segment size (or 25K 312 bytes). 314 4.1.5. Flows 316 The _flows_ parameter sets the number of sub-queues into which the 317 incoming packets are classified. Due to the stochastic nature of 318 hashing, multiple flows may end up being hashed into the same slot. 320 This parameter can be set only at load time since memory has to be 321 allocated for the hash table in the current implementation. 323 The default value is 1024. 325 4.1.6. ECN 327 ECN is _enabled_ by default. Rather than do anything special with 328 misbehaved ECN flows, FQ-CoDel relies on the packet scheduling system 329 to minimise their impact, thus unresponsive packets in a flow being 330 marked with ECN can grow to the overall packet limit, but will not 331 otherwise affect the performance of the system. 333 It can be disabled by specifying the _noecn_ parameter. 335 4.2. Data structures 337 4.2.1. Internal sub-queues 339 The main data structure of FQ-CoDel is the array of sub-queues, which 340 is instantiated to the number of queues specified by the _flows_ 341 parameter at instantiation time. Each sub-queue consists simply of 342 an ordered list of packets with FIFO semantics, two state variables 343 tracking the queue deficit and total number of bytes enqueued, and 344 the set of CoDel state variables. Other state variables to track 345 queue statistics can also be included: for instance, the Linux 346 implementation keeps a count of dropped packets. 348 Queue space is shared: there's a global limit on the number of 349 packets the queues can hold, but not one per queue. 351 4.2.2. New and old queues lists 353 FQ-CoDel maintains two lists of active queues, called "new" and "old" 354 queues. Each list is an ordered list containing references to the 355 array of sub-queues. When a packet is added to a queue that is not 356 currently active, that queue becomes active by being added to the 357 list of new queues. Later on, it is moved to the list of old queues, 358 from which it is removed when it is no longer active. This behaviour 359 is the source of some subtlety in the packet scheduling at dequeue 360 time, explained below. 362 5. The FQ-CoDel scheduler and AQM interactions 364 This section describes the operation of the FQ-CoDel scheduler and 365 AQM. It is split into two parts explaining the enqueue and dequeue 366 operations. 368 5.1. Enqueue 370 The packet enqueue mechanism consists of three stages: classification 371 into a sub-queue, timestamping and bookkeeping, and optionally 372 dropping a packet when the total number of enqueued packets goes over 373 the maximum. 375 When a packet is enqueued, it is first classified into the 376 appropriate sub-queue. By default, this is done by hashing on the 377 5-tuple of IP protocol, and source and destination IP and port 378 numbers, permuted by a random value selected at initialisation time, 379 and taking the hash value modulo the number of sub-queues. However, 380 an implementation MAY also specify a configurable classification 381 scheme along a wide variety of other possible parameters such as mac 382 address, diffserv, firewall and flow specific markings, etc. (the 383 Linux implementation does so in the form of the 'tc filter' command). 385 If a custom filter fails, classification failure results in the 386 packet being dropped and no further action taken. By design the 387 standard filter cannot fail. 389 Additionally, the default hashing algorithm presently deployed does 390 decapsulation of some common packet types (6in4, IPIP, GRE 0), mixes 391 IPv6 IP addresses thoroughly, and uses Jenkins hash on the result. 393 Once the packet has been successfully classified into a sub-queue, it 394 is handed over to the CoDel algorithm for timestamping. It is then 395 added to the tail of the selected queue, and the queue's byte count 396 is updated by the packet size. Then, if the queue is not currently 397 active (i.e. if it is not in either the list of new or the list of 398 old queues), it is added to the end of the list of new queues, and 399 its deficit is initiated to the configured quantum. Otherwise it is 400 added to the old queue list. 402 Finally, the total number of enqueued packets is compared with the 403 configured limit, and if it is _above_ this value (which can happen 404 since a packet was just enqueued), a packet is dropped from the head 405 of the queue with the largest current byte count. Note that this in 406 most cases means that the packet that gets dropped is different from 407 the one that was just enqueued, and may even be from a different 408 queue. 410 5.2. Dequeue 412 Most of FQ-CoDel's work is done at packet dequeue time. It consists 413 of three parts: selecting a queue from which to dequeue a packet, 414 actually dequeuing it (employing the CoDel algorithm in the process), 415 and some final bookkeeping. 417 For the first part, the scheduler first looks at the list of new 418 queues; for each queue in that list, if that queue has a negative 419 deficit (i.e. it has already dequeued at least a quantum of bytes), 420 its deficit is increased by one quantum, and the queue is put onto 421 the end of the list of old queues, and the routine selects the next 422 queue and starts again. 424 Otherwise, that queue is selected for dequeue. If the list of new 425 queues is empty, the scheduler proceeds down the list of old queues 426 in the same fashion (checking the deficit, and either selecting the 427 queue for dequeuing, or increasing the deficit and putting the queue 428 back at the end of the list). 430 After having selected a queue from which to dequeue a packet, the 431 CoDel algorithm is invoked on that queue. This applies the CoDel 432 control law, and may discard one or more packets from the head of 433 that queue, before returning the packet that should be dequeued (or 434 nothing if the queue is or becomes empty while being handled by the 435 CoDel algorithm). 437 Finally, if the CoDel algorithm did not return a packet, the queue is 438 empty, and the scheduler does one of two things: if the queue 439 selected for dequeue came from the list of new queues, it is moved to 440 the end of the list of old queues. If instead it came from the list 441 of old queues, that queue is removed from the list, to be added back 442 (as a new queue) the next time a packet arrives that hashes to that 443 queue. Then (since no packet was available for dequeue), the whole 444 dequeue process is restarted from the beginning. 446 If, instead, the scheduler _did_ get a packet back from the CoDel 447 algorithm, it updates the byte deficit for the selected queue before 448 returning the packet as the result of the dequeue operation. 450 The step that moves an empty queue from the list of new queues to the 451 list of old queues before it is removed is crucial to prevent 452 starvation. This is because otherwise the queue can reappear (the 453 next time a packet arrives for it) before the list of old queues is 454 visited; this can go on indefinitely even with a small number of 455 active flows, if the flow providing packets to the queue in question 456 transmits at just the right rate. This is prevented by first moving 457 the queue to the list of old queues, forcing a pass through that, and 458 thus preventing starvation. 460 The resulting migration of queues between the different states is 461 summarised in the following state diagram: 463 +-----------------+ +--------------------+ 464 | | Empty | | 465 | Empty |<---------------+ Old +-----+ 466 | | | | | 467 +-------+---------+ +--------------------+ | 468 | ^ ^ |Quantum 469 |Arrival | | |Exceeded 470 v | | | 471 +-----------------+ | | | 472 | | Empty or | | | 473 | New +-------------------+ +--------+ 474 | | Quantum exceeded 475 +-----------------+ 477 6. Implementation considerations 479 6.1. Probability of hash collisions 481 Since the Linux FQ-CoDel implementation by default uses 1024 hash 482 buckets, the probability that (say) 100 VoIP sessions will all hash 483 to the same bucket is something like ten to the power of minus 300. 484 Thus, the probability that at least one of the VoIP sessions will 485 hash to some other queue is very high indeed. 487 Conversely, the probability that each of the 100 VoIP sessions will 488 get its own queue is given by (1023!/(924!*1024^99)) or about 0.007; 489 so not all that probable. The probability rises sharply, however, if 490 we are willing to accept a few collisions. For example, there is 491 about an 86% probability that no more than two of the 100 VoIP 492 sessions will be involved in any given collision, and about a 99% 493 probability that no more than three of the VoIP sessions will be 494 involved in any given collision. These last two results were 495 computed using Monte Carlo simulations: Oddly enough, the mathematics 496 for VoIP-session collision exactly matches that of hardware cache 497 overflow. 499 6.2. Memory Overhead 501 FQ-CoDel can be implemented with a very low memory footprint (less 502 than 64 bytes per queue on 64 bit systems). These are the data 503 structures used in the Linux implementation: 505 struct codel_vars { 506 u32 count; 507 u32 lastcount; 508 bool dropping; 509 u16 rec_inv_sqrt; 510 codel_time_t first_above_time; 511 codel_time_t drop_next; 512 codel_time_t ldelay; 513 }; 515 struct fq_codel_flow { 516 struct sk_buff *head; 517 struct sk_buff *tail; 518 struct list_head flowchain; 519 int deficit; 520 u32 dropped; /* number of drops (or ECN marks) on this flow */ 521 struct codel_vars cvars; 522 }; 524 The master table managing all queues looks like this: 526 struct fq_codel_sched_data { 527 struct tcf_proto *filter_list; /* optional external classifier */ 528 struct fq_codel_flow *flows; /* Flows table [flows_cnt] */ 529 u32 *backlogs; /* backlog table [flows_cnt] */ 530 u32 flows_cnt; /* number of flows */ 531 u32 perturbation; /* hash perturbation */ 532 u32 quantum; /* psched_mtu(qdisc_dev(sch)); */ 533 struct codel_params cparams; 534 struct codel_stats cstats; 535 u32 drop_overlimit; 536 u32 new_flow_count; 538 struct list_head new_flows; /* list of new flows */ 539 struct list_head old_flows; /* list of old flows */ 540 }; 542 6.3. Per-Packet Timestamping 544 The CoDel portion of the algorithm requires per-packet timestamps be 545 stored along with the packet. While this approach works well for 546 software-based routers, it cannot be easily retrofitted to devices 547 that do most of their processing in silicon. 549 Also, timestamping functions in the core OS need to be very 550 efficient. 552 6.4. Other forms of "Fair Queueing" 554 Much of the scheduling portion of FQ-CoDel is derived from DRR. SFQ- 555 based versions have also been produced and tested in ns2. Other 556 forms of Fair Queueing, such as WFQ or QFQ, have not been thoroughly 557 explored. 559 6.5. Differences between CoDel and FQ-CoDel behaviour 561 CoDel can be applied to a single queue system as a straight AQM, 562 where it converges towards an "ideal" drop rate (i.e. one that 563 minimises delay while keeping a high link utilisation), and then 564 optimises around that control point. 566 The scheduling of FQ-CoDel mixes packets of competing flows, which 567 acts to pace bursty flows to better fill the pipe. Additionally, a 568 new flow gets substantial "credit" over other flows until CoDel finds 569 an ideal drop rate for it. However, for a new flow that exceeds the 570 configured quantum, more time passes before all of its data is 571 delivered (as packets from it, too, are mixed across the other 572 existing queue-building flows). Thus, FQ-CoDel takes longer (as 573 measured in time) to converge towards an ideal drop rate for a given 574 new flow, but does so within fewer delivered _packets_ from that 575 flow. 577 Finally, FQ-CoDel drops packets from the largest flows sooner and 578 more accurately than CoDel alone, and it is more responsive to 579 changes in bandwidth, and in number of flows, than CoDel alone. 581 7. Security Considerations 583 There are no specific security exposures associated with FQ-CoDel. 584 Some exposures present in current FIFO systems are in fact reduced 585 (e.g. simple minded packet floods). 587 8. IANA Considerations 589 This document has no actions for IANA. 591 9. Acknowledgements 593 Our deepest thanks to Eric Dumazet (author of FQ-CoDel), Kathie 594 Nichols, Van Jacobson, and all the members of the bufferbloat.net 595 effort. 597 10. Conclusions 599 FQ-CoDel is a very general, efficient, nearly parameterless active 600 queue management approach combining flow queueing with CoDel. It is 601 a critical tool in solving bufferbloat. 603 FQ-CoDel's default settings SHOULD be modified for other special- 604 purpose networking applications, such as for exceptionally slow 605 links, for use in data centres, or on links with inherent delay 606 greater than 800ms (e.g. satellite links). 608 On-going projects are: improving FQ-CoDel with more SFQ-like 609 behaviour for lower bandwidth systems, improving the control law, 610 optimising sparse packet drop behaviour, etc.. 612 In addition to the Linux kernel sources, ns2 and ns3 models are 613 available. Refinements (such as NFQCODEL [1]) are being tested in 614 the CeroWrt effort. 616 11. References 617 11.1. Normative References 619 [RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", 620 RFC 896, January 1984. 622 [RFC0970] Nagle, J., "On packet switches with infinite storage", RFC 623 970, December 1985. 625 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 626 Requirement Levels", BCP 14, RFC 2119, March 1997. 628 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 629 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 630 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 631 S., Wroclawski, J., and L. Zhang, "Recommendations on 632 Queue Management and Congestion Avoidance in the 633 Internet", RFC 2309, April 1998. 635 11.2. Informative References 637 [CODEL2012] 638 Nichols, K. and V. Jacobson, "Controlling Queue Delay", 639 July 2012, . 641 [DRR] Shreedhar, M. and G. Varghese, "Efficient Fair Queueing 642 Using Deficit Round Robin", June 1996, 643 . 646 [SFQ] McKenney, P., "Stochastic Fairness Queuing", September 647 1990, . 650 [SQF2012] Bonald, T., Muscariello, L., and N. Ostallo, "On the 651 impact of TCP and per-flow scheduling on Internet 652 Performance - IEEE/ACM transactions on Networking", April 653 2012, . 656 11.3. URIs 658 [1] http://www.bufferbloat.net/projects/cerowrt/wiki/nfq_codel 660 Authors' Addresses 661 Toke Hoeiland-Joergensen 662 Karlstad University 663 Dept. of Computer Science 664 Karlstad 65188 665 Sweden 667 Email: toke.hoiland-jorgensen@kau.se 669 Paul McKenney 670 IBM Linux Technology Center 671 1385 NW Amberglen Parkway 672 Hillsboro, OR 97006 673 USA 675 Email: paulmck@linux.vnet.ibm.com 676 URI: http://www2.rdrop.com/~paulmck/ 678 Dave Taht 679 Teklibre 680 2104 W First street 681 Apt 2002 682 FT Myers, FL 33901 683 USA 685 Email: d@taht.net 686 URI: http://www.teklibre.com/ 688 Jim Gettys 689 Bell Labs 690 21 Oak Knoll Road 691 Carlisle, MA 01741 692 USA 694 Email: jg@freedesktop.org 695 URI: https://en.wikipedia.org/wiki/Jim_Gettys 697 Eric Dumazet 698 Google, Inc. 699 1600 Amphitheater Pkwy 700 Mountain View, Ca 94043 701 USA 703 Email: edumazet@gmail.com