Internet Draft                             R. Pan, P. Natarajan, F. Baker
Active Queue Management              B. VerSteeg, M. Prabhu, C. Piglione
Working Group                                    V. Subramanian, G. White
Intended Status: Standards Track
Expires: September 27, 2015                                March 26, 2015

             PIE: A Lightweight Control Scheme To Address the
                            Bufferbloat Problem

                          draft-ietf-aqm-pie-01

Abstract

   Bufferbloat is a phenomenon where excess buffers in the network
   cause high latency and jitter. As more and more interactive
   applications (e.g., voice over IP, real-time video streaming and
   financial transactions) run in the Internet, high latency and jitter
   degrade application performance. There is a pressing need to design
   intelligent queue management schemes that can control latency and
   jitter, and hence provide desirable quality of service to users.

   We present here a lightweight design, PIE (Proportional Integral
   controller Enhanced), that can effectively control the average
   queueing latency to a target value.
   Simulation results, theoretical analysis and Linux testbed results
   show that PIE can ensure low latency and achieve high link
   utilization under various congestion situations. The design does not
   require per-packet timestamps, so it incurs very small overhead and
   is simple enough to implement in both hardware and software.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Design Goals
   4. The Basic PIE Scheme
      4.1 Random Dropping
      4.2 Drop Probability Calculation
      4.3 Departure Rate Estimation
   5. Design Enhancements
      5.1 Turning PIE on and off
      5.2 Auto-tuning of PIE's control parameters
      5.3 Handling Bursts
      5.4 De-randomization
   6. Implementation and Discussions
   7. Future Research
   8. Incremental Deployment
   9. IANA Considerations
   10. References
      10.1 Normative References
      10.2 Informative References
      10.3 Other References
   Authors' Addresses
   11. The PIE Pseudocode
1. Introduction

   The explosion of smart phones, tablets and video traffic in the
   Internet brings about a unique set of challenges for congestion
   control. To avoid packet drops, many service providers or data
   center operators require vendors to put in as much buffer as
   possible. With the rapid decrease in memory chip prices, these
   requests are easily accommodated to keep customers happy. However,
   this solution of large buffers fails to take into account the nature
   of TCP, the dominant transport protocol running in the Internet. TCP
   continuously increases its sending rate and causes network buffers
   to fill up. TCP cuts its rate only when it receives a packet drop or
   mark that is interpreted as a congestion signal. However, drops and
   marks usually occur when network buffers are full or almost full. As
   a result, excess buffers, initially designed to avoid packet drops,
   lead to highly elevated queueing latency and jitter. It is a
   delicate balancing act to design a queue management scheme that not
   only allows short-term bursts to pass smoothly, but also controls
   the average latency when long-term congestion persists.

   Active queue management (AQM) schemes, such as Random Early Discard
   (RED), have been around for well over a decade. AQM schemes could
   potentially solve the aforementioned problem. RFC 2309 [RFC2309]
   strongly recommends the adoption of AQM schemes in the network to
   improve the performance of the Internet. RED is implemented in a
   wide variety of network devices, both in hardware and software.
   Unfortunately, because RED needs careful tuning of its parameters
   for various network conditions, most network operators do not turn
   RED on. In addition, RED is designed to control the queue length,
   which affects delay only implicitly. It does not control latency
   directly. Hence, the Internet today still lacks an effective design
   that can control buffer latency and improve the quality of
   experience for latency-sensitive applications.

   Recently, a new trend has emerged to control queueing latency
   directly to address the bufferbloat problem [CoDel]. While following
   this new trend, PIE also aims to keep the benefits of RED, such as
   ease of implementation and scalability to high speeds. Similar to
   RED, PIE randomly drops a packet at the onset of congestion. The
   congestion detection, however, is based on the queueing latency
   instead of the queue length as in RED. Furthermore, PIE also uses
   the latency trend, i.e., whether the latency is increasing or
   decreasing, to help determine the congestion level. The design
   parameters of PIE are chosen via stability analysis. While these
   parameters can be fixed to work in various traffic conditions, they
   could be made self-tuning to optimize system performance.

   Separately, we assume any delay-based AQM scheme would be applied
   over a Fair Queueing (FQ) structure or its approximate design, Class
   Based Queueing (CBQ). FQ is one of the most studied scheduling
   algorithms since it was first proposed in 1985 [RFC970]. CBQ has
   been a standard feature in most network devices today [CBQ]. These
   designs help flows/classes achieve max-min fairness and help
   mitigate the bias against long flows with long round trip times
   (RTT). Any AQM scheme that is built on top of FQ or CBQ could
   benefit from these advantages. Furthermore, we believe that these
   advantages, such as per-flow/class fairness, are orthogonal to the
   AQM design, whose primary goal is to control latency for a given
   queue. For flows that are classified into the same class and put
   into the same queue, we need to ensure that their latency is better
   controlled and their fairness is no worse than under the standard
   DropTail or RED design.

   In October 2013, CableLabs' DOCSIS 3.1 specification [DOCSIS_3.1]
   mandated that cable modems implement a specific variant of the PIE
   design as the active queue management algorithm. In addition to
   cable-specific improvements, the PIE design in DOCSIS 3.1
   [DOCSIS-PIE] improves the original design in several areas:
   de-randomization of coin tosses, enhanced burst protection and an
   expanded range of auto-tuning.

   The previous version of this draft described the overall design
   goals, system elements and implementation details of PIE. It also
   included various design considerations, such as how auto-tuning can
   be done. This draft incorporates the aforementioned DOCSIS-PIE
   improvements and integrates them into the PIE design. We also
   discuss a pure enqueue-based design where all the operations can be
   triggered by a packet arrival.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Design Goals

   We explore a queue management framework where we aim to improve the
   performance of interactive and delay-sensitive applications. Our
   design follows the general guidelines set by the AQM working group
   document "IETF Recommendations Regarding Active Queue Management"
   [AQM-GOAL]. More specifically, our design has the following basic
   criteria.

   * First, we directly control queueing latency instead of controlling
     queue length. Queue sizes change with queue draining rates and
     various flows' round trip times. Delay bloat is the real issue
     that we need to address, as it impairs real-time applications. If
     latency can be controlled, bufferbloat is not an issue. As a
     matter of fact, we would allow more buffering for sporadic bursts
     as long as the latency is under control.

   * Secondly, we aim to attain high link utilization. The goal of low
     latency shall be achieved without suffering link under-utilization
     or losing network efficiency. An early congestion signal could
     cause TCP to back off and avoid queue buildup. On the other hand,
     however, TCP's rate reduction could result in link
     under-utilization. There is a delicate balance between achieving
     high link utilization and low latency.

   * Furthermore, the scheme should be simple to implement and easily
     scalable in both hardware and software. The wide adoption of RED
     over a variety of network devices is a testament to the power of
     simple random early dropping/marking. We strive to maintain
     similar design simplicity.

   * Finally, the scheme should ensure system stability for various
     network topologies and scale well with an arbitrary number of
     streams. Design parameters shall be set automatically. Users only
     need to set performance-related parameters such as the target
     queue delay, not design parameters.

   In the following, we will elaborate on the design of PIE and its
   operation.
4. The Basic PIE Scheme

   As illustrated in Figure 1, our scheme conceptually comprises three
   simple components: a) random dropping at enqueueing; b) periodic
   drop probability updates; c) dequeueing rate estimation. The
   following sections describe these components in further detail and
   explain how they interact with each other.

4.1 Random Dropping

   Like any state-of-the-art AQM scheme, PIE drops packets randomly
   according to a drop probability, p, that is obtained from the
   drop-probability-calculation component:

   * upon a packet arrival

        randomly drop the packet with probability p.

                 Random Drop
                     /               --------------
             -------/  ----------->  | | | | | |  -------------->
                    /|\               --------------
                     |                 Queue Buffer       |
                     |                      |             | departure
                     |                      | queue       | bytes
                     |                      | length      |
                     |                     \|/           \|/
                     |            -----------------   -----------------
                     |            |     Drop      |   |   Departure   |
                     -----<-------|  Probability  |<--|     Rate      |
                                  |  Calculation  |   |  Estimation   |
                                  -----------------   -----------------

                         Figure 1. The PIE Structure

4.2 Drop Probability Calculation

   The PIE algorithm periodically updates the drop probability as
   follows:

   * estimate the current queueing delay using Little's law:

        est_del = qlen/depart_rate;

   * calculate drop probability p as:

        p = p + alpha*(est_del-target_del) + beta*(est_del-est_del_old);

        est_del_old = est_del.

   Here, the current queue length is denoted by qlen. The draining rate
   of the queue, depart_rate, is obtained from the
   departure-rate-estimation block. The variables est_del and
   est_del_old represent the current and previous estimates of the
   queueing delay. The target latency value is expressed as target_del.
   The update interval is denoted as Tupdate.

   Note that the calculation of the drop probability is based not only
   on the current estimate of the queueing delay, but also on the
   direction in which the delay is moving, i.e., whether the delay is
   getting longer or shorter. This direction can simply be measured as
   the difference between est_del and est_del_old. This is the classic
   Proportional Integral controller design, adopted here for
   controlling queueing latency. The controller parameters, in units of
   Hz, are designed using feedback loop analysis where TCP's behavior
   is modeled using the results of well-studied prior art [TCP-Models].

   We would like to point out that this type of controller has been
   studied before for controlling the queue length [PI, QCN]. PIE
   adopts the Proportional Integral controller for controlling delay
   and makes the scheme auto-tuning. The theoretical analysis of PIE is
   under paper submission and its reference will be included in this
   draft once it becomes available. Nonetheless, we will discuss the
   intuition behind these parameters in Section 5.
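   To make the update rule above concrete, the following C sketch shows
   one possible, non-normative rendering of the periodic calculation.
   It is purely illustrative: the structure and variable names (struct
   pie_state, pie_update, and so on) are hypothetical and do not appear
   in the normative pseudocode at the end of this draft.

      #include <stddef.h>

      /* Illustrative sketch of the Section 4.2 update; all names here
       * are hypothetical and the code is not normative.              */
      struct pie_state {
          double p;            /* drop probability                    */
          double est_del_old;  /* previous queueing delay estimate, s */
      };

      /* Called once every Tupdate seconds. */
      static void pie_update(struct pie_state *s,
                             size_t qlen_bytes,   /* current backlog  */
                             double depart_rate,  /* bytes/s, Sec 4.3 */
                             double target_del,   /* target delay, s  */
                             double alpha,        /* gain, Hz         */
                             double beta)         /* gain, Hz         */
      {
          /* Little's law: delay = backlog / drain rate. */
          double est_del = (depart_rate > 0.0)
                               ? (double)qlen_bytes / depart_rate
                               : s->est_del_old;

          /* The alpha term accumulates the offset from the target into
           * p over successive updates (integral action); the beta term
           * reacts to the delay trend (proportional action).          */
          s->p += alpha * (est_del - target_del)
                + beta  * (est_del - s->est_del_old);

          /* Keep p a valid probability. */
          if (s->p < 0.0) s->p = 0.0;
          if (s->p > 1.0) s->p = 1.0;

          s->est_del_old = est_del;
      }

   Note that, as stated above, alpha and beta carry units of Hz (1/s),
   so their products with delay errors expressed in seconds yield
   dimensionless increments to p.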
4.3 Departure Rate Estimation

   The draining rate of a queue in the network often varies, either
   because other queues share the same link or because the link
   capacity fluctuates. Rate fluctuation is particularly common in
   wireless networks. Hence, we decide to measure the departure rate
   directly as follows:

   * we are in a measurement cycle if we have enough data in the queue:

        qlen > dq_threshold

   * if in a measurement cycle, upon a packet departure:

        dq_count = dq_count + deque_pkt_size;

   * if dq_count > dq_threshold then

        depart_rate = dq_count/(now-start);

        dq_count = 0;

        start = now;

   We only measure the departure rate when there is sufficient data in
   the buffer, i.e., when the queue length is over a certain threshold,
   dq_threshold. Short, non-persistent bursts of packets result in
   empty queues from time to time, which would make the measurement
   less accurate. The variable dq_count represents the number of bytes
   departed since the last measurement. Once dq_count exceeds
   dq_threshold, we obtain a measurement sample. The threshold is
   recommended to be set to 16KB, assuming a typical packet size of
   around 1KB or 1.5KB. This threshold is long enough to obtain an
   average draining rate, yet short enough to reflect sudden changes in
   the draining rate. Note that this threshold is not crucial for the
   system's stability.
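   The following C sketch illustrates the byte-count based measurement
   cycle described above. It is illustrative only; the structure and
   function names (struct rate_estimator, on_departure) are hypothetical
   and the normative behavior is given by the pseudocode at the end of
   this draft.

      #include <stdbool.h>
      #include <stddef.h>

      /* Illustrative sketch of the Section 4.3 measurement cycle.
       * No per-packet timestamps are needed; only byte counts.       */
      struct rate_estimator {
          bool   in_cycle;     /* inside a measurement cycle          */
          size_t dq_count;     /* bytes departed since cycle start    */
          double start;        /* cycle start time, seconds           */
          double depart_rate;  /* latest rate sample, bytes/second    */
      };

      #define DQ_THRESHOLD ((size_t)(16 * 1024))   /* recommended 16KB */

      /* Called on every packet departure. */
      static void on_departure(struct rate_estimator *re,
                               size_t pkt_size,    /* bytes dequeued  */
                               size_t qlen_bytes,  /* remaining queue */
                               double now)         /* current time, s */
      {
          if (!re->in_cycle) {
              /* Only measure while enough data is queued, so idle or
               * bursty periods do not pollute the estimate.          */
              if (qlen_bytes >= DQ_THRESHOLD) {
                  re->in_cycle = true;
                  re->dq_count = 0;
                  re->start    = now;
              }
              return;
          }

          re->dq_count += pkt_size;
          if (re->dq_count >= DQ_THRESHOLD) {
              /* One sample per ~16KB drained: rate = bytes / time. */
              double elapsed = now - re->start;
              if (elapsed > 0.0)
                  re->depart_rate = (double)re->dq_count / elapsed;
              re->in_cycle = false;  /* wait for enough backlog again */
          }
      }

   The pseudocode at the end of this draft additionally smooths the
   per-cycle dequeue time with a 1/4-3/4 moving average before it is
   used in the delay estimate.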
5. Design Enhancements

   The above three components form the basis of the PIE algorithm.
   There are several enhancements that we add to further augment the
   performance of the basic algorithm. For clarity, we include them
   here in this section.

5.1 Turning PIE on and off

   Traffic naturally fluctuates in a network. We would not want to
   unnecessarily drop packets due to a spurious uptick in queueing
   latency. If PIE is not active, we only turn it on when the buffer
   occupancy is over a certain threshold, which we set to 1/3 of the
   queue buffer size. If PIE is on, we turn it off when congestion is
   over, i.e., when the drop probability, the queue length and the
   estimated queue delay all reach 0.

5.2 Auto-tuning of PIE's control parameters

   While the formal analysis can be found in [HPSR], we would like to
   discuss the intuition behind the key control parameters of PIE.
   Although the PIE algorithm sets them automatically, they are not
   meant to be magic numbers. We hope to give enough explanation here
   to help demystify them so that users can experiment and explore on
   their own.

   As is obvious from the above, the crucial equation in the PIE
   algorithm is

      p = p + alpha*(est_del-target_del) + beta*(est_del-est_del_old).

   The value of alpha determines how the deviation of the current
   latency from the target value affects the drop probability. The beta
   term exerts additional adjustment depending on whether the latency
   is trending up or down. Note that the drop probability is reached
   incrementally, not in a single step. To avoid big swings in
   adjustments, which often lead to instability, we would like to tune
   p in small increments. Suppose that p is in the range of 1%. Then we
   would want the single-step adjustment contributed by the alpha and
   beta terms to be small, say around 0.1%. If p is in a higher range,
   say above 10%, then the situation warrants a larger single-step
   adjustment, for example 1%. There could be several such tuning
   regions, extendable all the way down to 0.001% if needed. Finally,
   the drop probability is only stabilized when the latency is stable,
   i.e., est_del equals est_del_old, and the latency is equal to
   target_del. The relative weight between alpha and beta determines
   the final balance between latency offset and latency jitter.

   The update interval, Tupdate, also plays a key role in stability.
   Given the same alpha and beta values, the faster the update is, the
   higher the loop gain will be. Since it does not appear explicitly in
   the above equation, it can easily be overlooked. Notice also that
   alpha and beta have units of Hz.

5.3 Handling Bursts

   Although we aim to control the average latency of a congested queue,
   the scheme should allow short-term bursts to pass through without
   hurting them. In this section, we discuss how PIE manages bursts
   when it is active.

   Bursts are well tolerated in the basic scheme for the following
   reasons: first, the drop probability is updated periodically. Any
   short-term burst that occurs within this period can pass through
   without incurring extra drops, as it would not trigger a new drop
   probability calculation. Secondly, PIE's drop probability
   calculation is done incrementally. A single update only leads to a
   small incremental change in the probability. So if it happens that a
   burst does occur at the exact instant that the probability is being
   calculated, the incremental nature of the calculation ensures that
   its impact is kept small.

   Nonetheless, we would like to give users precise control over burst
   tolerance. We introduce a parameter, max_burst, that is similar to
   the burst tolerance in the token bucket design. By default, the
   parameter is set to 150ms. Users can certainly modify it according
   to their application scenarios. The burst allowance is added into
   the basic PIE design as follows:

   * if PIE_active == FALSE

        burst_allowance = max_burst;

   * upon packet arrival

        if burst_allowance > 0 enqueue packet;

   * upon probability update when PIE_active == TRUE

        burst_allowance = burst_allowance - Tupdate;

   The burst allowance, denoted by burst_allowance, is initialized to
   max_burst. As long as burst_allowance is above zero, an incoming
   packet is enqueued, bypassing the random drop process. At each
   update interval, the value of burst_allowance is decremented by the
   update period, Tupdate. When the congestion goes away, defined as p
   equal to 0 and both the current and previous samples of estimated
   delay less than target_del, we reset burst_allowance to max_burst.
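   A minimal C sketch of this burst-allowance bookkeeping is shown
   below. It is illustrative only: the names (struct burst_state,
   bypass_random_drop, on_prob_update, reset_burst) are hypothetical,
   and the normative handling is given by the pseudocode at the end of
   this draft.

      #include <stdbool.h>

      /* Illustrative sketch of the Section 5.3 burst allowance.      */
      #define MAX_BURST 0.150   /* seconds, the 150ms default         */

      struct burst_state {
          double burst_allowance;   /* remaining burst credit, s      */
      };

      /* Arrival path: while credit remains, bypass random drop.      */
      static bool bypass_random_drop(const struct burst_state *b)
      {
          return b->burst_allowance > 0.0;
      }

      /* Called on every periodic probability update (every Tupdate
       * seconds) while PIE is active: burn down the credit.          */
      static void on_prob_update(struct burst_state *b, double tupdate)
      {
          if (b->burst_allowance > 0.0)
              b->burst_allowance -= tupdate;
      }

      /* Called when congestion clears (p equals 0 and the last two
       * delay samples are below target_del), or when PIE is not
       * active: restore the full credit.                             */
      static void reset_burst(struct burst_state *b)
      {
          b->burst_allowance = MAX_BURST;
      }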
5.4 De-randomization

   Although PIE adopts random dropping to achieve latency control, coin
   tosses could introduce outlier situations where packets are dropped
   too close to one another or too far apart. This would cause the
   actual drop percentage to deviate from the intended drop probability
   p. PIE introduces a de-randomization mechanism to avoid such
   scenarios. We keep a variable called accu_prob, which is reset to 0
   after a drop. Upon a packet arrival, accu_prob is incremented by the
   current drop probability, p. If accu_prob is less than a low
   threshold, e.g., 0.85, we enqueue the arriving packet; on the other
   hand, if accu_prob is more than a high threshold, e.g., 8.5, we
   force a packet drop. We only randomly drop a packet if accu_prob
   falls in between the two thresholds. Since accu_prob is reset to 0
   after a drop, another drop will not happen until 0.85/p packets
   later. This avoids dropping packets too close to one another. In the
   other extreme, if 8.5/p packets have been enqueued without incurring
   a drop, PIE forces a drop, which prevents the actual drop rate from
   falling far below the desired value. Further analysis can be found
   in [AQM-DOCSIS].

6. Implementation and Discussions

   PIE can be applied to existing hardware or software solutions. In
   this section, we discuss the implementation cost of the PIE
   algorithm. There are three steps involved in PIE, as discussed in
   Section 4. We examine their complexity as follows.

   Upon packet arrival, the algorithm simply drops a packet randomly
   based on the drop probability p. This step is straightforward and
   requires no packet header examination or manipulation. In addition,
   since no per-packet overhead, such as a timestamp, is required,
   there is no extra memory requirement. Furthermore, the input side of
   a queue is typically under software control while the output side of
   a queue is hardware based. Hence, a drop at enqueueing can be
   readily retrofitted into existing hardware or software
   implementations.

   The drop probability calculation is done in the background and
   occurs every Tupdate interval. Given modern high-speed links, this
   period translates into once every tens, hundreds or even thousands
   of packets. Hence the calculation occurs at a much slower time scale
   than packet processing, at least an order of magnitude slower. The
   calculation of the drop probability involves multiplications using
   alpha and beta. Since the algorithm is not sensitive to the precise
   values of alpha and beta, we can choose values, e.g., alpha = 0.25
   and beta = 2.5, so that the multiplications can be done using simple
   adds and shifts. As no complicated functions are required, PIE can
   be easily implemented in both hardware and software. The state
   requirement is only two variables per queue: est_del and
   est_del_old. Hence the memory overhead is small.

   In the departure rate estimation, PIE uses a counter to keep track
   of the number of bytes departed in the current interval. This
   counter is incremented per packet departure. Every Tupdate, PIE
   calculates latency using the departure rate, which can be
   implemented using a multiplication. Note that many network devices
   already keep track of an interface's departure rate. In this case,
   PIE might be able to reuse this information and simply skip the
   third step of the algorithm, hence incurring no extra cost. We also
   understand that in some software implementations timestamps are
   added for other purposes. In this case, we can make use of the
   timestamps, bypass the departure rate estimation, and directly use
   the timestamp information in the drop probability calculation.

   In some platforms, the enqueueing and dequeueing functions belong to
   different modules that are independent of each other. In such
   situations, a pure enqueue-based design is preferred, as depicted in
   Figure 2. The departure rate is deduced from the number of packets
   enqueued and the queue length. The design is based on the following
   key observation: over a certain time interval, the number of
   departed packets equals the number of enqueued packets minus the
   increase in queue length. In this design, everything can be
   triggered by a packet arrival, including the background update
   process. The design complexity here is similar to the original
   design.

                 Random Drop
                     /               --------------
             -------/  ----------->  | | | | | |  -------------->
                    /|\     |         --------------
                     |      |           Queue Buffer
                     |      |                |
                     |      |                | queue
                     |      |                | length
                     |     \|/              \|/
                     |     ------------------------------
                     |     |       Departure Rate       |
                     -----<|     & Drop Probability     |
                           |        Calculation         |
                           ------------------------------

                Figure 2. The Enqueue-based PIE Structure
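   The C sketch below illustrates the rate deduction used by such an
   enqueue-based design. It is a non-normative sketch under stated
   assumptions: the structure and names (struct enq_rate_state,
   on_enqueue, the fixed measurement interval) are hypothetical and
   actual implementations may organize this differently.

      #include <stddef.h>

      /* Illustrative sketch of the enqueue-based deduction of Figure 2:
       * departed bytes over an interval = enqueued bytes minus the
       * growth in queue length.  Everything runs on the enqueue path. */
      struct enq_rate_state {
          size_t enq_bytes;       /* bytes enqueued since interval start */
          size_t qlen_at_start;   /* backlog at interval start, bytes    */
          double interval_start;  /* interval start time, seconds        */
          double depart_rate;     /* latest estimate, bytes per second   */
      };

      /* Called on every packet arrival (after the enqueue decision). */
      static void on_enqueue(struct enq_rate_state *s,
                             size_t pkt_size,     /* bytes enqueued    */
                             size_t qlen_bytes,   /* current backlog   */
                             double now,          /* current time, s   */
                             double interval)     /* measurement span  */
      {
          s->enq_bytes += pkt_size;

          if (now - s->interval_start >= interval) {
              /* departed = enqueued - (current backlog - old backlog) */
              long growth   = (long)qlen_bytes - (long)s->qlen_at_start;
              long departed = (long)s->enq_bytes - growth;
              if (departed < 0)
                  departed = 0;

              double elapsed = now - s->interval_start;
              if (elapsed > 0.0)
                  s->depart_rate = (double)departed / elapsed;

              /* Start a new interval. */
              s->enq_bytes      = 0;
              s->qlen_at_start  = qlen_bytes;
              s->interval_start = now;
          }
      }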
   In summary, the state requirement for PIE is limited and the
   computation overheads are small. Hence, PIE is simple to implement.
   In addition, since PIE does not require any user configuration, it
   does not impose any new cost on existing network management system
   solutions. SFQ can be combined with PIE to provide further latency
   improvement for various flows with different priorities. However,
   SFQ requires extra queueing and scheduling structures. Whether the
   performance gain can justify the design overhead needs to be further
   investigated.

7. Future Research

   What is presented in this document is the design of the PIE
   algorithm, which effectively controls the average queueing latency
   to a target value. We foresee the following areas that can be
   further studied. The current design is auto-tuning based on the drop
   probability levels. Future research can be done on adjusting the
   drop probability more smoothly while keeping the design simple.
   Another area of further study is an integrated solution for
   transitioning between the burst tolerance mode and the early drop
   mode.

   Our design is separated into a data path and a control path. If the
   control path is implemented in software, any further improvement in
   the control path can be easily accommodated.

8. Incremental Deployment

   One nice property of AQM designs is that they can be independently
   designed and operated without any interoperability requirement.

   Although all network nodes cannot be changed at once to adopt
   latency-based AQM schemes, we envision a gradual adoption that would
   eventually lead to end-to-end low-latency service for real-time
   applications.

9. IANA Considerations

   There are no actions for IANA.

10. References

10.1 Normative References

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2 Informative References

   [RFC970]     Nagle, J., "On Packet Switches With Infinite Storage",
                RFC 970, December 1985.

   [RFC2309]    Braden, B., et al., "Recommendations on Queue
                Management and Congestion Avoidance in the Internet",
                RFC 2309, April 1998.

10.3 Other References

   [AQM-GOAL]   Baker, F. and Fairhurst, G., "IETF Recommendations
                Regarding Active Queue Management",
                draft-ietf-aqm-recommendation-11.

   [CoDel]      Nichols, K. and Jacobson, V., "Controlling Queue
                Delay", ACM Queue, ACM Publishing,
                doi:10.1145/2209249.2209264.

   [CBQ]        Cisco White Paper, http://www.cisco.com/en/US/docs
                /12_0t/12_0tfeature/guide/cbwfq.html.

   [DOCSIS_3.1] http://www.cablelabs.com/wp-content/uploads/specdocs
                /CM-SP-MULPIv3.1-I01-131029.pdf.

   [DOCSIS-PIE] White, G. and Pan, R., "A PIE-Based AQM for DOCSIS
                Cable Modems", IETF draft-white-aqm-docsis-pie-00.

   [HPSR]       Pan, R., Natarajan, P., Piglione, C., Prabhu, M.S.,
                Subramanian, V., Baker, F. and VerSteeg, B., "PIE: A
                Lightweight Control Scheme to Address the Bufferbloat
                Problem", IEEE HPSR 2013.

   [AQM-DOCSIS] http://www.cablelabs.com/wp-content/uploads/2014/06
                /DOCSIS-AQM_May2014.pdf.

   [TCP-Models] Misra, V., Gong, W. and Towsley, D., "Fluid-based
                Analysis of a Network of AQM Routers Supporting TCP
                Flows with an Application to RED", SIGCOMM 2000.

   [PI]         Hollot, C.V., Misra, V., Towsley, D. and Gong, W., "On
                Designing Improved Controllers for AQM Routers
                Supporting TCP Flows", Infocom 2001.

   [QCN]        "Data Center Bridging - Congestion Notification",
                http://www.ieee802.org/1/pages/802.1au.html.

Authors' Addresses

   Rong Pan
   Cisco Systems
   3625 Cisco Way,
   San Jose, CA 95134, USA
   Email: ropan@cisco.com

   Preethi Natarajan
   Cisco Systems
   725 Alder Drive,
   Milpitas, CA 95035, USA
   Email: prenatar@cisco.com

   Fred Baker
   Cisco Systems
   725 Alder Drive,
   Milpitas, CA 95035, USA
   Email: fred@cisco.com

   Bill Ver Steeg
   Cisco Systems
   5030 Sugarloaf Parkway
   Lawrenceville, GA 30044, USA
   Email: versteb@cisco.com

   Mythili Prabhu*
   Akamai Technologies
   3355 Scott Blvd
   Santa Clara, CA 95054, USA
   Email: mythili@akamai.com

   Chiara Piglione*
   Broadcom Corporation
   3151 Zanker Road
   San Jose, CA 95134, USA
   Email: chiara@broadcom.com

   Vijay Subramanian*
   PLUMgrid, Inc.
   350 Oakmead Parkway, Suite 250
   Sunnyvale, CA 94085, USA
   Email: vns@plumgrid.com

   Greg White
   CableLabs
   858 Coal Creek Circle
   Louisville, CO 80027, USA
   Email: g.white@cablelabs.com

   * Formerly at Cisco Systems

11. The PIE Pseudocode

   Configurable Parameters:
      - QDELAY_REF. AQM Latency Target (default: 16ms)
      - BURST_ALLOWANCE. AQM Max Burst Allowance (default: 150ms)

   Internal Parameters:
      - Weights in the drop probability calculation (1/s):
        alpha (default: 1/8), beta (default: 1 + 1/4)
      - DQ_THRESHOLD (in bytes, default: 2^14, i.e., a power of 2)
      - T_UPDATE: the period of the drop probability calculation
        (default: 16ms)
      - QUEUE_SMALL = (1/3) * buffer limit in bytes

   Table which stores status variables (ending with "_"):
      - active_: INACTIVE/ACTIVE
      - burst_count_: current burst count
      - drop_prob_: the current packet drop probability, reset to 0
      - accu_prob_: accumulated drop probability, reset to 0
      - qdelay_old_: the previous queue delay estimate, reset to 0
      - last_timestamp_: timestamp of the previous status update
      - dq_count_, measurement_start_, in_measurement_,
        avg_dq_time_: variables for measuring avg_dq_rate_

   Public/system functions:
      - queue_. Holds the pending packets.
      - drop(packet). Drops/discards a packet.
      - now(). Returns the current time.
      - random(). Returns a uniform random value in the range 0 ~ 1.
      - queue_.is_full(). Returns true if queue_ is full.
      - queue_.byte_length(). Returns current queue_ length in bytes.
      - queue_.enque(packet). Adds packet to the tail of queue_.
      - queue_.deque(). Returns the packet from the head of queue_.
      - packet.size(). Returns the size of packet.
   ============================

   enque(Packet packet) {
      if (queue_.is_full()) {
          drop(packet);
          PIE->accu_prob_ = 0;
      } else if (PIE->active_ == TRUE && drop_early() == TRUE
                 && PIE->burst_count_ <= 0) {
          drop(packet);
          PIE->accu_prob_ = 0;
      } else {
          queue_.enque(packet);
      }

      //If the queue is over a certain threshold, turn on PIE
      if (PIE->active_ == INACTIVE
          && queue_.byte_length() >= QUEUE_SMALL) {
          PIE->active_ = ACTIVE;
          PIE->qdelay_old_ = 0;
          PIE->drop_prob_ = 0;
          PIE->in_measurement_ = TRUE;
          PIE->dq_count_ = 0;
          PIE->avg_dq_time_ = 0;
          PIE->last_timestamp_ = now();
          PIE->burst_count_ = BURST_ALLOWANCE;
          PIE->accu_prob_ = 0;
          PIE->measurement_start_ = now();
      }

      //If the queue has been idle for a while, turn off PIE
      //reset counters when accessing the queue after some idle
      //period if PIE was active before
      if (PIE->drop_prob_ == 0 && PIE->qdelay_old_ == 0
          && queue_.byte_length() == 0) {
          PIE->active_ = INACTIVE;
          PIE->in_measurement_ = FALSE;
      }
   }

   ============================

   drop_early() {

      //PIE is active but the queue is not congested, return ENQUE
      if ( (PIE->qdelay_old_ < QDELAY_REF/2 && PIE->drop_prob_ < 20%)
           || (queue_.byte_length() <= 2 * MEAN_PKTSIZE) ) {
          return ENQUE;
      }

      //Random drop
      PIE->accu_prob_ += PIE->drop_prob_;
      if (PIE->accu_prob_ < 0.85)
          return ENQUE;
      if (PIE->accu_prob_ >= 8.5)
          return DROP;
      double u = random();
      if (u < PIE->drop_prob_) {
          PIE->accu_prob_ = 0;
          return DROP;
      } else {
          return ENQUE;
      }
   }

   ============================
   //update periodically, every T_UPDATE = 16ms
   status_update(state) {
      if ( (now() - PIE->last_timestamp_) >= T_UPDATE &&
           PIE->active_ == ACTIVE) {
          //can be implemented using an integer multiply;
          //DQ_THRESHOLD is a power-of-2 value
          qdelay = queue_.byte_length() * PIE->avg_dq_time_/DQ_THRESHOLD;

          if (PIE->drop_prob_ < 0.1%) {
              PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)/128
                                 + beta*(qdelay - PIE->qdelay_old_)/128;
          } else if (PIE->drop_prob_ < 1%) {
              PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)/16
                                 + beta*(qdelay - PIE->qdelay_old_)/16;
          } else if (PIE->drop_prob_ < 10%) {
              PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)/2
                                 + beta*(qdelay - PIE->qdelay_old_)/2;
          } else {
              PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)
                                 + beta*(qdelay - PIE->qdelay_old_);
          }

          //bound drop probability
          if (PIE->drop_prob_ < 0)
              PIE->drop_prob_ = 0;
          if (PIE->drop_prob_ > 1)
              PIE->drop_prob_ = 1;

          PIE->qdelay_old_ = qdelay;
          PIE->last_timestamp_ = now();

          if (PIE->burst_count_ > 0) {
              PIE->burst_count_ = PIE->burst_count_ - T_UPDATE;
          }
      }
   }

   ============================

   deque(Packet packet) {

      //dequeue rate estimation
      if (PIE->in_measurement_ == TRUE) {
          PIE->dq_count_ = packet.size() + PIE->dq_count_;
          //finish the measurement cycle once enough bytes have departed
          if (PIE->dq_count_ >= DQ_THRESHOLD) {
              dq_time = now() - PIE->measurement_start_;
              if (PIE->avg_dq_time_ == 0) {
                  PIE->avg_dq_time_ = dq_time;
              } else {
                  PIE->avg_dq_time_ = dq_time*1/4 + PIE->avg_dq_time_*3/4;
              }
              PIE->in_measurement_ = FALSE;
          }
      }

      //start a measurement cycle if we have enough data in the queue
      if (queue_.byte_length() >= DQ_THRESHOLD &&
          PIE->in_measurement_ == FALSE) {
          PIE->in_measurement_ = TRUE;
          PIE->measurement_start_ = now();
          PIE->dq_count_ = 0;
      }
   }
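   Section 6 notes that with alpha = 0.25 and beta = 2.5 the
   probability update reduces to adds and shifts, and the pseudocode
   above notes that the delay estimate can use an integer multiply
   because DQ_THRESHOLD is a power of two. The following C sketch shows
   one hypothetical fixed-point rendering of that idea; the scaling
   factor, names and representation below are illustrative assumptions
   and are not part of the normative pseudocode.

      #include <stdint.h>

      /* Illustrative fixed-point sketch of the adds-and-shifts claim
       * in Section 6.  The probability is kept as a 32-bit integer
       * where PROB_MAX represents p = 1.0; delays are in microseconds.
       * These scaling choices are hypothetical, not normative.        */
      #define PROB_MAX UINT32_MAX

      static uint32_t pie_fixed_update(uint32_t drop_prob,    /* scaled p */
                                       int64_t qdelay_us,     /* current  */
                                       int64_t qdelay_old_us, /* previous */
                                       int64_t target_us)     /* target   */
      {
          /* Hypothetical factor converting a delay error in
           * microseconds into the scaled probability units, chosen so
           * that a 1-second error maps to roughly PROB_MAX.           */
          const int64_t PROB_SCALE = (int64_t)(PROB_MAX / 1000000ULL);

          int64_t err   = (qdelay_us - target_us)     * PROB_SCALE;
          int64_t trend = (qdelay_us - qdelay_old_us) * PROB_SCALE;

          /* alpha = 1/4 and beta = 2 + 1/2: with non-negative operands
           * these divisions and the doubling reduce to shifts and adds. */
          int64_t delta = err / 4 + trend * 2 + trend / 2;

          int64_t p = (int64_t)drop_prob + delta;

          /* Clamp to [0, 1] in the scaled representation. */
          if (p < 0)                 p = 0;
          if (p > (int64_t)PROB_MAX) p = PROB_MAX;
          return (uint32_t)p;
      }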