Internet Draft                         R. Pan, P. Natarajan, F. Baker
Active Queue Management             B. VerSteeg, M. Prabhu, C. Piglione
Working Group                               V. Subramanian, G. White
Expires: April 30, 2015                               October 27, 2014


            PIE: A Lightweight Control Scheme To Address the
                           Bufferbloat Problem

                          draft-ietf-aqm-pie-00

Abstract

   Bufferbloat is a phenomenon where excess buffers in the network
   cause high latency and jitter.  As more and more interactive
   applications (e.g., voice over IP, real-time video streaming and
   financial transactions) run in the Internet, high latency and jitter
   degrade application performance.  There is a pressing need to design
   intelligent queue management schemes that can control latency and
   jitter, and hence provide desirable quality of service to users.

   We present here a lightweight design, PIE (Proportional Integral
   controller Enhanced), that can effectively control the average
   queueing latency to a target value.  Simulation results, theoretical
   analysis and Linux testbed results show that PIE can ensure low
   latency and achieve high link utilization under various congestion
   situations.
   The design does not require per-packet timestamps, so it incurs very
   little overhead and is simple enough to implement in both hardware
   and software.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Design Goals
   4. The Basic PIE Scheme
      4.1 Random Dropping
      4.2 Drop Probability Calculation
      4.3 Departure Rate Estimation
   5. Design Enhancement
      5.1 Turning PIE on and off
      5.2 Auto-tuning of PIE's control parameters
      5.3 Handling Bursts
      5.4 De-randomization
   6. Implementation and Discussions
   7. Incremental Deployment
   8. IANA Considerations
   9. References
      9.1 Normative References
      9.2 Informative References
      9.3 Other References
   Authors' Addresses
   10. The PIE pseudo Code

1. Introduction

   The explosion of smart phones, tablets and video traffic in the
   Internet brings about a unique set of challenges for congestion
   control.  To avoid packet drops, many service providers or data
   center operators require vendors to put in as much buffer as
   possible.
   With the rapid decrease in memory chip prices, these requests are
   easily accommodated to keep customers happy.  However, this large-
   buffer solution fails to take into account the nature of TCP, the
   dominant transport protocol in the Internet.  TCP continuously
   increases its sending rate and causes network buffers to fill up; it
   cuts its rate only when it receives a packet drop or mark, which it
   interprets as a congestion signal.  However, drops and marks usually
   occur only when network buffers are full or almost full.  As a
   result, excess buffers, initially designed to avoid packet drops,
   lead to highly elevated queueing latency and jitter.  It is a
   delicate balancing act to design a queue management scheme that not
   only allows short-term bursts to pass smoothly, but also controls
   the average latency when long-term congestion persists.

   Active queue management (AQM) schemes, such as Random Early
   Detection (RED), have been around for well over a decade and could
   potentially solve the aforementioned problem.  RFC 2309 [RFC2309]
   strongly recommends the adoption of AQM schemes in the network to
   improve the performance of the Internet.  RED is implemented in a
   wide variety of network devices, both in hardware and software.
   Unfortunately, because RED needs careful tuning of its parameters
   for various network conditions, most network operators do not turn
   it on.  In addition, RED is designed to control the queue length,
   which affects delay only implicitly; it does not control latency
   directly.  Hence, the Internet today still lacks an effective design
   that can control buffer latency and improve the quality of
   experience for latency-sensitive applications.

   Recently, a new trend has emerged to control queueing latency
   directly in order to address the bufferbloat problem [CoDel].  While
   following this trend, PIE also aims to keep the benefits of RED,
   such as ease of implementation and scalability to high speeds.
   Similar to RED, PIE randomly drops a packet at the onset of
   congestion.  The congestion detection, however, is based on the
   queueing latency instead of the queue length as in RED.
   Furthermore, PIE also uses the latency trend, i.e., whether the
   latency is increasing or decreasing, to help determine congestion
   levels.  The design parameters of PIE are chosen via stability
   analysis.  While these parameters can be fixed to work in various
   traffic conditions, they could be made self-tuning to optimize
   system performance.

   Separately, we assume any delay-based AQM scheme would be applied
   over a Fair Queueing (FQ) structure or its approximate design, Class
   Based Queueing (CBQ).  FQ is one of the most studied scheduling
   algorithms since it was first proposed in 1985 [RFC970].  CBQ is a
   standard feature in most network devices today [CBQ].  These designs
   help flows/classes achieve max-min fairness and help mitigate the
   bias against long flows with long round-trip times (RTT).  Any AQM
   scheme that is built on top of FQ or CBQ could benefit from these
   advantages.  Furthermore, we believe that advantages such as per-
   flow/class fairness are orthogonal to the AQM design, whose primary
   goal is to control latency for a given queue.
   For flows that are
   classified into the same class and put into the same queue, we need
   to ensure that their latency is better controlled and their fairness
   is no worse than under the standard DropTail or RED design.

   In October 2013, CableLabs' DOCSIS 3.1 specification [DOCSIS_3.1]
   mandated that cable modems implement a specific variant of the PIE
   design as the active queue management algorithm.  In addition to
   cable-specific improvements, the PIE design in DOCSIS 3.1
   [DOCSIS-PIE] improves the original design in several areas:
   de-randomization of coin tosses, enhanced burst protection and an
   expanded range of auto-tuning.

   The previous draft of PIE described the overall design goals, system
   elements and implementation details of PIE, and included various
   design considerations, such as how auto-tuning can be done.  This
   draft incorporates the aforementioned DOCSIS-PIE improvements and
   integrates them into the PIE design.  We also discuss a pure
   enqueue-based design, where all operations can be triggered by a
   packet arrival.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Design Goals

   We explore a queue management framework where we aim to improve the
   performance of interactive and delay-sensitive applications.  The
   design of our scheme follows a few basic criteria.

      * First, we directly control queueing latency instead of
        controlling queue length.  Queue sizes change with queue
        draining rates and various flows' round-trip times.  Delay
        bloat is the real issue that we need to address, as it impairs
        real-time applications.  If latency can be controlled,
        bufferbloat is not an issue.  As a matter of fact, we would
        allow more buffering for sporadic bursts as long as the latency
        is under control.

      * Secondly, we aim to attain high link utilization.  The goal of
        low latency shall be achieved without suffering link under-
        utilization or losing network efficiency.  An early congestion
        signal could cause TCP to back off and avoid the queue building
        up.  On the other hand, however, TCP's rate reduction could
        result in link under-utilization.  There is a delicate balance
        between achieving high link utilization and low latency.

      * Furthermore, the scheme should be simple to implement and
        easily scalable in both hardware and software.  The wide
        adoption of RED over a variety of network devices is a
        testament to the power of simple random early dropping/marking.
        We strive to maintain similar design simplicity.

      * Finally, the scheme should ensure system stability for various
        network topologies and scale well with an arbitrary number of
        streams.  Design parameters shall be set automatically.  Users
        only need to set performance-related parameters such as the
        target queue delay, not design parameters.

   In the following, we will elaborate on the design of PIE and its
   operation.

4. The Basic PIE Scheme

   As illustrated in Fig. 1, our scheme conceptually comprises three
   simple components: a) random dropping at enqueueing; b) periodic
   drop probability updates; c) dequeueing rate estimation.  The
   following sections describe these components in further detail and
   explain how they interact with each other.
4.1 Random Dropping

   Like any state-of-the-art AQM scheme, PIE drops packets randomly
   according to a drop probability, p, that is obtained from the
   drop-probability-calculation component:

      * upon a packet arrival

           randomly drop the packet with probability p.


              Random Drop
                  /               --------------
          -------/  ------------->| | | | |    |-------------->
                 /|\              | | | | |    |
                  |               --------------
                  |               Queue Buffer   |
                  |                    |         | Departure bytes
                  |                    |queue    |
                  |                    |length   |
                  |                    |         |
                  |                   \|/       \|/
                  |          -----------------    -------------------
                  |          |     Drop      |    |                 |
                  -----<-----|  Probability  |<---| Departure Rate  |
                             |  Calculation  |    |   Estimation    |
                             -----------------    -------------------

                       Figure 1.  The PIE Structure

4.2 Drop Probability Calculation

   The PIE algorithm periodically updates the drop probability as
   follows:

      * estimate the current queueing delay using Little's law:

           est_del = qlen/depart_rate;

      * calculate the drop probability p as:

           p = p + alpha*(est_del-target_del) +
               beta*(est_del-est_del_old);

           est_del_old = est_del.

   Here, the current queue length is denoted by qlen.  The draining
   rate of the queue, depart_rate, is obtained from the departure-rate-
   estimation block.  The variables est_del and est_del_old represent
   the current and previous estimates of the queueing delay.  The
   target latency value is expressed as target_del.  The update
   interval is denoted as Tupdate.

   Note that the drop probability calculation is based not only on the
   current estimate of the queueing delay, but also on the direction in
   which the delay is moving, i.e., whether the delay is getting longer
   or shorter.  This direction can simply be measured as the difference
   between est_del and est_del_old.  This is the classic Proportional
   Integral controller design, adopted here for controlling queueing
   latency.  The controller parameters, in units of Hz, are designed
   using feedback loop analysis, where TCP's behavior is modeled using
   the results of well-studied prior art [TCP-Models].

   We would like to point out that this type of controller has been
   studied before for controlling the queue length [PI, QCN].  PIE
   adopts the Proportional Integral controller for controlling delay
   and makes the scheme auto-tuning.  The theoretical analysis of PIE
   is under paper submission and its reference will be included in this
   draft once it becomes available.  Nonetheless, we discuss the
   intuition behind these parameters in Section 5.
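   As an illustration, the following minimal C sketch performs one
   periodic update of the drop probability for a single queue.  It is a
   sketch only: the names (pie_update, ALPHA, BETA, TARGET_DEL) are
   illustrative rather than part of the PIE specification, the
   constants simply mirror the defaults listed in Section 10, and the
   enhancements of Section 5 (auto-tuning, burst allowance) are
   omitted.

      #include <stdio.h>

      #define ALPHA      0.125    /* Hz, weight on the latency offset */
      #define BETA       1.25     /* Hz, weight on the latency trend  */
      #define TARGET_DEL 0.016    /* seconds, target queueing delay   */

      static double drop_prob   = 0.0;
      static double est_del_old = 0.0;

      /* Called once every Tupdate (e.g., 16ms).  qlen is in bytes,
       * depart_rate in bytes per second. */
      static void pie_update(double qlen, double depart_rate)
      {
          /* Little's law: current queueing delay estimate. */
          double est_del = qlen / depart_rate;

          /* Proportional Integral update of the drop probability. */
          drop_prob += ALPHA * (est_del - TARGET_DEL)
                     + BETA  * (est_del - est_del_old);

          /* Keep the probability within [0, 1]. */
          if (drop_prob < 0.0) drop_prob = 0.0;
          if (drop_prob > 1.0) drop_prob = 1.0;

          est_del_old = est_del;
      }

      int main(void)
      {
          /* Example: a 64KB standing queue draining at 1.25MB/s. */
          for (int i = 0; i < 5; i++) {
              pie_update(64.0 * 1024, 1.25e6);
              printf("update %d: drop_prob = %.4f\n", i, drop_prob);
          }
          return 0;
      }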
4.3 Departure Rate Estimation

   The draining rate of a queue in the network often varies, either
   because other queues share the same link or because the link
   capacity fluctuates.  Rate fluctuation is particularly common in
   wireless networks.  Hence, we decide to measure the departure rate
   directly as follows.

      * we are in a measurement cycle if we have enough data in the
        queue:

           qlen > dq_threshold

      * if in a measurement cycle, upon a packet departure:

           dq_count = dq_count + deque_pkt_size;

      * if dq_count > dq_threshold then

           depart_rate = dq_count/(now-start);
           dq_count = 0;
           start = now;

   We only measure the departure rate when there is sufficient data in
   the buffer, i.e., when the queue length is over a certain threshold,
   dq_threshold.  Short, non-persistent bursts of packets result in
   empty queues from time to time; measuring during those periods would
   make the estimate less accurate.  The parameter dq_count represents
   the number of bytes departed since the last measurement.  Once
   dq_count exceeds dq_threshold, we obtain a measurement sample.  The
   threshold is recommended to be set to 16KB, assuming a typical
   packet size of around 1KB or 1.5KB.  This threshold gives us a long
   enough period to obtain an average draining rate while still
   reflecting sudden changes in the draining rate quickly.  Note that
   this threshold is not crucial for the system's stability.

5. Design Enhancement

   The above three components form the basis of the PIE algorithm.
   There are several enhancements that we add to further augment the
   performance of the basic algorithm.  For clarity, we include them in
   this section.

5.1 Turning PIE on and off

   Traffic naturally fluctuates in a network.  We would not want to
   drop packets unnecessarily due to a spurious uptick in queueing
   latency.  If PIE is not active, we turn it on only when the buffer
   occupancy is over a certain threshold, which we set to 1/3 of the
   queue buffer size.  If PIE is on, we turn it off when congestion is
   over, i.e., when the drop probability, the queue length and the
   estimated queue delay all reach 0.

5.2 Auto-tuning of PIE's control parameters

   While the formal analysis can be found in [HPSR], we would like to
   discuss the intuition behind the key control parameters of PIE.
   Although the PIE algorithm sets them automatically, they are not
   meant to be magic numbers.  We hope to give enough explanation here
   to demystify them so that users can experiment and explore on their
   own.

   As is obvious from the above, the crucial equation in the PIE
   algorithm is

      p = p + alpha*(est_del-target_del) + beta*(est_del-est_del_old).

   The value of alpha determines how the deviation of the current
   latency from the target value affects the drop probability.  The
   beta term exerts additional adjustment depending on whether the
   latency is trending up or down.  Note that the drop probability is
   reached incrementally, not in a single step.  To avoid big swings in
   adjustment, which often lead to instability, we would like to tune p
   in small increments.  Suppose that p is in the range of 1%.  Then we
   would want alpha and beta to produce a small adjustment in each
   step, say on the order of 0.1%.  If p is in a higher range, say
   above 10%, the situation warrants a larger single-step adjustment,
   for example 1%.  There could be several such tuning regions,
   extendable all the way down to 0.001% if needed.  Finally, the drop
   probability only stabilizes when the latency is stable, i.e.,
   est_del equals est_del_old, and the latency value equals target_del.
   The relative weight between alpha and beta determines the final
   balance between latency offset and latency jitter.

   The update interval, Tupdate, also plays a key role in stability.
   Given the same alpha and beta values, the faster the update, the
   higher the loop gain.  Because Tupdate does not appear explicitly in
   the above equation, it can easily be overlooked.  Notice also that
   alpha and beta have units of Hz.
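   The scaling idea above can be expressed as a small helper that
   shrinks the single-step adjustment when the drop probability itself
   is small.  The following C sketch mirrors the regions used in the
   Section 10 pseudo code; the function name pie_scale_increment is
   illustrative, and further regions (down to 0.001% and below) can be
   added in the same pattern.

      /* Scale the raw PI increment according to the current drop
       * probability so that p always moves in small relative steps. */
      double pie_scale_increment(double drop_prob, double increment)
      {
          if (drop_prob < 0.001)        /* p below 0.1% */
              return increment / 128.0;
          else if (drop_prob < 0.01)    /* p below 1%   */
              return increment / 16.0;
          else if (drop_prob < 0.1)     /* p below 10%  */
              return increment / 2.0;
          else
              return increment;
      }

   In the periodic update, the raw increment alpha*(est_del-target_del)
   + beta*(est_del-est_del_old) would be passed through this scaling
   before being added to p.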
5.3 Handling Bursts

   Although we aim to control the average latency of a congested queue,
   the scheme should allow short-term bursts to pass through without
   hurting them.  This section discusses how PIE manages bursts when it
   is active.

   Bursts are well tolerated in the basic scheme for the following
   reasons.  First, the drop probability is updated periodically.  Any
   short-term burst that occurs within this period can pass through
   without incurring extra drops, as it does not trigger a new drop
   probability calculation.  Secondly, PIE's drop probability
   calculation is done incrementally.  A single update only leads to a
   small incremental change in the probability.  So if a burst does
   occur at the exact instant that the probability is being calculated,
   the incremental nature of the calculation ensures that its impact is
   kept small.

   Nonetheless, we would like to give users precise control over
   bursts.  We introduce a parameter, max_burst, that is similar to the
   burst tolerance in the token bucket design.  By default, the
   parameter is set to 150ms.  Users can certainly modify it according
   to their application scenarios.  The burst allowance is added into
   the basic PIE design as follows:

      * if PIE_active == FALSE

           burst_allowance = max_burst;

      * upon packet arrival

           if burst_allowance > 0 enqueue packet;

      * upon probability update when PIE_active == TRUE

           burst_allowance = burst_allowance - Tupdate;

   The burst allowance, denoted by burst_allowance, is initialized to
   max_burst.  As long as burst_allowance is above zero, an incoming
   packet is enqueued, bypassing the random drop process.  During each
   update instance, the value of burst_allowance is decremented by the
   update period, Tupdate.  When the congestion goes away, defined by
   us as p equal to 0 and both the current and previous samples of the
   estimated delay being less than target_del, we reset burst_allowance
   to max_burst.

5.4 De-randomization

   Although PIE adopts random dropping to achieve latency control, coin
   tosses could introduce outlier situations where packets are dropped
   too close to each other or too far apart.  This would cause the real
   drop percentage to deviate from the intended drop probability p.
   PIE introduces a de-randomization mechanism to avoid such scenarios.
   We keep a variable called accu_prob, which is reset to 0 after a
   drop.  Upon a packet arrival, accu_prob is incremented by the amount
   of the drop probability, p.  If accu_prob is less than a low
   threshold, e.g. 0.85, we enqueue the arriving packet; on the other
   hand, if accu_prob is more than a high threshold, e.g. 8.5, we force
   a packet drop.  We only drop a packet randomly if accu_prob falls
   between the two thresholds.  Since accu_prob is reset to 0 after
   each drop, another drop will not happen until 0.85/p packets later.
   This avoids packets being dropped too close to each other.  In the
   other extreme case, where 8.5/p packets have been enqueued without
   incurring a drop, PIE forces a drop, which prevents the actual drop
   rate from falling far below the intended one.  Further analysis can
   be found in [AQM-DOCSIS].
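   The following C sketch shows one way the de-randomization decision
   could look at enqueue time.  It is illustrative only: the function
   name derand_drop_decision and the use of rand() are assumptions made
   for the example, while the 0.85 and 8.5 thresholds are the ones
   suggested above.

      #include <stdlib.h>

      static double accu_prob = 0.0;   /* accumulated drop probability */

      /* Returns 1 if the arriving packet should be dropped. */
      int derand_drop_decision(double drop_prob)
      {
          int drop;

          accu_prob += drop_prob;

          if (accu_prob < 0.85)        /* too soon after the last drop */
              drop = 0;
          else if (accu_prob >= 8.5)   /* overdue: force a drop        */
              drop = 1;
          else                         /* otherwise drop at random     */
              drop = ((double)rand() / RAND_MAX) < drop_prob;

          if (drop)
              accu_prob = 0.0;         /* reset after every drop       */

          return drop;
      }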
6. Implementation and Discussions

   PIE can be applied to existing hardware or software solutions.  In
   this section, we discuss the implementation cost of the PIE
   algorithm.  There are three steps involved in PIE, as discussed in
   Section 4.  We examine their complexities as follows.

   Upon packet arrival, the algorithm simply drops a packet randomly
   based on the drop probability p.  This step is straightforward and
   requires no packet header examination or manipulation.  Besides,
   since no per-packet overhead, such as a timestamp, is required,
   there is no extra memory requirement.  Furthermore, the input side
   of a queue is typically under software control while the output
   side of a queue is hardware based.  Hence, a drop at enqueueing can
   be readily retrofitted into existing hardware or software
   implementations.

   The drop probability calculation is done in the background and
   occurs every Tupdate interval.  Given modern high speed links, this
   period translates into once every tens, hundreds or even thousands
   of packets.  Hence the calculation occurs at a much slower time
   scale than packet processing time, at least an order of magnitude
   slower.  The calculation of the drop probability involves
   multiplications using alpha and beta.  Since the algorithm is not
   sensitive to the precise values of alpha and beta, the
   multiplications can be done using simple adds and shifts.  As no
   complicated functions are required, PIE can be easily implemented in
   both hardware and software.  The state requirement is only two
   variables per queue: est_del and est_del_old.  Hence the memory
   overhead is small.

   For the departure rate estimation, PIE uses a counter to keep track
   of the number of bytes departed in the current interval.  This
   counter is incremented per packet departure.  Every Tupdate, PIE
   calculates the latency using the departure rate, which can be
   implemented using a multiplication.  Note that many network devices
   keep track of an interface's departure rate.  In this case, PIE
   might be able to reuse this information, simply skip the third step
   of the algorithm and hence incur no extra cost.  We also understand
   that in some software implementations timestamps are added for other
   purposes.  In this case, we can make use of the timestamps, bypass
   the departure rate estimation and directly use the timestamp
   information in the drop probability calculation.

   On some platforms, the enqueueing and dequeueing functions belong to
   different modules that are independent of each other.  In such
   situations, a pure enqueue-based design is preferred, as depicted in
   Figure 2.  The departure rate is deduced from the number of packets
   enqueued and the queue length.  The design is based on the following
   key observation: over a certain time interval, the number of
   departed packets equals the number of enqueued packets minus the
   number of extra packets left in the queue.  In this design,
   everything can be triggered by a packet arrival, including the
   background update process.  The design complexity here is similar to
   the original design.

              Random Drop
                  /                  --------------
          -------/  ---------------->| | | | |    |-------------->
                 /|\                 | | | | |    |
                  |    |             --------------
                  |    |             Queue Buffer
                  |    |                  |
                  |    |                  |queue
                  |    |                  |length
                  |    |                  |
                  |   \|/                \|/
                  |   ------------------------------
                  |   |      Departure Rate &       |
                  --<-|      Drop Probability       |
                      |        Calculation          |
                      ------------------------------

                Figure 2.  The Enqueue-based PIE Structure

   In summary, the state requirement for PIE is limited and the
   computation overhead is small.  Hence, PIE is simple to implement.
   In addition, since PIE does not require any user configuration, it
   does not impose any new cost on existing network management system
   solutions.  SFQ can be combined with PIE to provide further latency
   improvement for various flows with different priorities.  However,
   SFQ requires extra queueing and scheduling structures.  Whether the
   performance gain can justify the design overhead needs to be further
   investigated.
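   To illustrate the observation behind the enqueue-based design, the
   following C sketch deduces a departure rate sample from the bytes
   enqueued during an interval and the change in queue length.  The
   names (enq_bytes, qlen_old, pie_enqueue_rate_sample) are
   illustrative and not part of the draft.

      static double enq_bytes = 0.0;  /* bytes enqueued this interval   */
      static double qlen_old  = 0.0;  /* queue length at interval start */

      /* Called for every packet that is actually enqueued. */
      void pie_count_enqueue(double pkt_bytes)
      {
          enq_bytes += pkt_bytes;
      }

      /* Called once per update interval of length interval_sec;
       * qlen_now is the current queue length in bytes.
       * Returns the deduced departure rate in bytes per second. */
      double pie_enqueue_rate_sample(double qlen_now, double interval_sec)
      {
          /* departed bytes = enqueued bytes - growth of the queue */
          double departed = enq_bytes - (qlen_now - qlen_old);
          double rate     = departed > 0.0 ? departed / interval_sec
                                           : 0.0;

          /* Start the next measurement interval. */
          enq_bytes = 0.0;
          qlen_old  = qlen_now;
          return rate;
      }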
7. Incremental Deployment

   One nice property of the AQM design is that it can be independently
   designed and operated without any interoperability requirement.

   Although all network nodes cannot be changed at once to adopt
   latency-based AQM schemes, we envision a gradual adoption that would
   eventually lead to end-to-end low-latency service for real-time
   applications.

8. IANA Considerations

   There are no actions for IANA.

9. References

9.1 Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2 Informative References

   [RFC970]      Nagle, J., "On Packet Switches With Infinite Storage",
                 RFC 970, December 1985.

9.3 Other References

   [CoDel]       Nichols, K. and Jacobson, V., "Controlling Queue
                 Delay", ACM Queue, ACM Publishing,
                 doi:10.1145/2209249.2209264.

   [CBQ]         Cisco White Paper, "http://www.cisco.com/en/US/docs/
                 12_0t/12_0tfeature/guide/cbwfq.html".

   [DOCSIS_3.1]  http://www.cablelabs.com/wp-content/uploads/specdocs/
                 CM-SP-MULPIv3.1-I01-131029.pdf.

   [DOCSIS-PIE]  White, G. and Pan, R., "A PIE-Based AQM for DOCSIS
                 Cable Modems", IETF draft-white-aqm-docsis-pie-00.

   [HPSR]        Pan, R., Natarajan, P., Piglione, C., Prabhu, M.S.,
                 Subramanian, V., Baker, F., and VerSteeg, B., "PIE: A
                 Lightweight Control Scheme to Address the Bufferbloat
                 Problem", IEEE HPSR 2013.

   [AQM-DOCSIS]  http://www.cablelabs.com/wp-content/uploads/2014/06/
                 DOCSIS-AQM_May2014.pdf.

   [TCP-Models]  Misra, V., Gong, W., and Towsley, D., "Fluid-based
                 Analysis of a Network of AQM Routers Supporting TCP
                 Flows with an Application to RED", SIGCOMM 2000.

   [PI]          Hollot, C.V., Misra, V., Towsley, D., and Gong, W.,
                 "On Designing Improved Controllers for AQM Routers
                 Supporting TCP Flows", INFOCOM 2001.

   [QCN]         "Data Center Bridging - Congestion Notification",
                 http://www.ieee802.org/1/pages/802.1au.html.

Authors' Addresses

   Rong Pan
   Cisco Systems
   3625 Cisco Way
   San Jose, CA 95134, USA
   Email: ropan@cisco.com

   Preethi Natarajan
   Cisco Systems
   725 Alder Drive
   Milpitas, CA 95035, USA
   Email: prenatar@cisco.com

   Fred Baker
   Cisco Systems
   725 Alder Drive
   Milpitas, CA 95035, USA
   Email: fred@cisco.com

   Bill Ver Steeg
   Cisco Systems
   5030 Sugarloaf Parkway
   Lawrenceville, GA 30044, USA
   Email: versteb@cisco.com

   Mythili Prabhu*
   Akamai Technologies
   3355 Scott Blvd
   Santa Clara, CA 95054
   Email: mythili@akamai.com

   Chiara Piglione*
   Broadcom Corporation
   3151 Zanker Road
   San Jose, CA 95134
   Email: chiara@broadcom.com

   Vijay Subramanian*
   PLUMgrid, Inc.
   350 Oakmead Parkway, Suite 250
   Sunnyvale, CA 94085
   Email: vns@plumgrid.com

   Greg White
   CableLabs
   858 Coal Creek Circle
   Louisville, CO 80027, USA
   Email: g.white@cablelabs.com

   * Formerly at Cisco Systems
10. The PIE pseudo Code

   Configurable Parameters:
     - QDELAY_REF. AQM latency target (default: 16ms)
     - BURST_ALLOWANCE. AQM max burst allowance before random drop
       takes effect (default: 150ms)

   Internal Parameters:
     - Weights in the drop probability calculation (1/s):
       alpha (default: 1/8), beta (default: 1+1/4)
     - DQ_THRESHOLD (in bytes, default: 2^14, a power of 2)
     - T_UPDATE: the period of the drop probability calculation
       (default: 16ms)
     - QUEUE_SMALL = (1/3) * buffer limit in bytes

   Table which stores status variables (ending with "_"):
     - active_: INACTIVE/ACTIVE
     - burst_count_: the current burst allowance
     - drop_prob_: the current packet drop probability; reset to 0
     - accu_prob_: the accumulated drop probability; reset to 0
     - qdelay_old_: the previous queue delay estimate; reset to 0
     - qlen_old_: the previous sample of the queue length
     - last_timestamp_: the time of the last drop probability update
     - dq_count_, measurement_start_, in_measurement_,
       avg_dq_time_: variables for measuring the average dequeue time

   Public/system functions:
     - queue_.               Holds the pending packets
     - drop(packet).         Drops/discards a packet
     - now().                Returns the current time
     - random().             Returns a uniform random value in the
                             range 0 ~ 1
     - queue_.is_full().     Returns true if queue_ is full
     - queue_.byte_length(). Returns the current queue_ length in bytes
     - queue_.enque(packet). Adds packet to the tail of queue_
     - queue_.deque().       Returns the packet from the head of queue_
     - packet.size().        Returns the size of packet

   ============================

   enque(Packet packet) {
       if (queue_.is_full()) {
           drop(packet);
       } else if (PIE->active_ == TRUE && PIE->burst_count_ <= 0
                  && drop_early() == TRUE) {
           drop(packet);
           PIE->accu_prob_ = 0;        //reset after a drop (Section 5.4)
       } else {
           queue_.enque(packet);
       }

       //If the queue is over a certain threshold, turn on PIE
       if (PIE->active_ == INACTIVE
           && queue_.byte_length() >= QUEUE_SMALL) {
           PIE->active_ = ACTIVE;
           PIE->qdelay_old_ = 0;
           PIE->drop_prob_ = 0;
           PIE->in_measurement_ = TRUE;
           PIE->dq_count_ = 0;
           PIE->avg_dq_time_ = 0;
           PIE->last_timestamp_ = now();
           PIE->burst_count_ = BURST_ALLOWANCE;
       }

       //If the queue has been idle for a while, turn off PIE;
       //reset counters when accessing the queue after some idle
       //period if PIE was active before
       if (PIE->drop_prob_ == 0 && PIE->qdelay_old_ == 0
           && queue_.byte_length() == 0) {
           PIE->drop_prob_ = 0;
           PIE->active_ = INACTIVE;
           PIE->in_measurement_ = FALSE;
       }
   }

   ===========================

   drop_early() {

       //PIE is active but the queue is not congested: enqueue
       if ( (PIE->qdelay_old_ < QDELAY_REF/2 && PIE->drop_prob_ < 20%)
            || (queue_.byte_length() <= 2 * MEAN_PKTSIZE) ) {
           return ENQUE;
       }

       //De-randomized drop (Section 5.4)
       PIE->accu_prob_ += PIE->drop_prob_;
       if (PIE->accu_prob_ < 0.85)      //too soon after the last drop
           return ENQUE;
       if (PIE->accu_prob_ >= 8.5)      //overdue, force a drop
           return DROP;

       //Random drop
       double u = random();
       if (u < PIE->drop_prob_) {
           return DROP;
       } else {
           return ENQUE;
       }
   }

   ============================
   //update periodically, T_UPDATE = 16ms
   status_update(state) {
       if ( (now() - PIE->last_timestamp_) >= T_UPDATE) {
           //can be implemented using an integer multiply;
           //DQ_THRESHOLD is a power-of-2 value
           qdelay = queue_.byte_length() * PIE->avg_dq_time_/DQ_THRESHOLD;

           if (PIE->drop_prob_ < 0.1%) {
               PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)/128
                                  + beta*(qdelay - PIE->qdelay_old_)/128;
           } else if (PIE->drop_prob_ < 1%) {
               PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)/16
                                  + beta*(qdelay - PIE->qdelay_old_)/16;
           } else if (PIE->drop_prob_ < 10%) {
               PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)/2
                                  + beta*(qdelay - PIE->qdelay_old_)/2;
           } else {
               PIE->drop_prob_ += alpha*(qdelay - QDELAY_REF)
                                  + beta*(qdelay - PIE->qdelay_old_);
           }

           PIE->qdelay_old_ = qdelay;
           PIE->last_timestamp_ = now();

           if (PIE->burst_count_ > 0) {
               PIE->burst_count_ = PIE->burst_count_ - T_UPDATE;
           }
       }
   }
   ==========================

   deque(Packet packet) {

       //dequeue rate estimation
       if (PIE->in_measurement_ == TRUE) {
           PIE->dq_count_ = packet.size() + PIE->dq_count_;
           if (PIE->dq_count_ >= DQ_THRESHOLD) {
               dq_time = now() - PIE->measurement_start_;
               if (PIE->avg_dq_time_ == 0) {
                   PIE->avg_dq_time_ = dq_time;
               } else {
                   //exponentially weighted moving average
                   PIE->avg_dq_time_ = dq_time*1/4 + PIE->avg_dq_time_*3/4;
               }
               PIE->in_measurement_ = FALSE;
           }
       }

       //start a new measurement cycle if there is enough
       //data in the queue
       if (queue_.byte_length() >= DQ_THRESHOLD &&
           PIE->in_measurement_ == FALSE) {
           PIE->in_measurement_ = TRUE;
           PIE->measurement_start_ = now();
           PIE->dq_count_ = 0;
       }
   }