Internet Engineering Task Force                             N. Kuhn, Ed.
Internet-Draft                                          Telecom Bretagne
Intended status: Informational                        P. Natarajan, Ed.
Expires: May 30, 2016                                      Cisco Systems
                                                        N. Khademi, Ed.
                                                      University of Oslo
                                                                      D.
                                                                     Ros
                                           Simula Research Laboratory AS
                                                       November 27, 2015

                     AQM Characterization Guidelines
                   draft-ietf-aqm-eval-guidelines-09

Abstract

Unmanaged large buffers in today's networks have given rise to a slew of performance issues.  These performance issues can be addressed by some form of Active Queue Management (AQM) mechanism, optionally in combination with a packet scheduling scheme such as fair queuing.  The IETF Active Queue Management and Packet Scheduling working group was formed to standardize AQM schemes that are robust, easily implementable, and successfully deployable in today's networks.  This document describes various criteria for performing precautionary characterizations of AQM proposals.  This document also helps in ascertaining whether any given AQM proposal should be taken up for standardization by the AQM WG.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 30, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Reducing the latency and maximizing the goodput
   1.2. Guidelines for AQM evaluation
   1.3. Requirements Language
   1.4. Glossary
2. End-to-end metrics
   2.1. Flow completion time
   2.2. Flow start up time
   2.3. Packet loss
   2.4. Packet loss synchronization
   2.5. Goodput
   2.6. Latency and jitter
   2.7. Discussion on the trade-off between latency and goodput
3. Generic setup for evaluations
   3.1. Topology and notations
   3.2. Buffer size
   3.3. Congestion controls
4. Methodology, Metrics, AQM Comparisons, Packet Sizes, Scheduling and ECN
   4.1. Methodology
   4.2. Comments on metrics measurement
   4.3. Comparing AQM schemes
      4.3.1. Performance comparison
      4.3.2. Deployment comparison
   4.4. Packet sizes and congestion notification
   4.5. Interaction with ECN
   4.6. Interaction with Scheduling
5. Transport Protocols
   5.1. TCP-friendly sender
      5.1.1. TCP-friendly sender with the same initial congestion window
      5.1.2. TCP-friendly sender with different initial congestion windows
   5.2. Aggressive transport sender
   5.3. Unresponsive transport sender
   5.4. Less-than Best Effort transport sender
6. Round Trip Time Fairness
   6.1. Motivation
   6.2. Recommended tests
   6.3. Metrics to evaluate the RTT fairness
7. Burst Absorption
   7.1. Motivation
   7.2. Recommended tests
8. Stability
   8.1. Motivation
   8.2. Recommended tests
      8.2.1. Definition of the congestion level
      8.2.2. Mild congestion
      8.2.3. Medium congestion
      8.2.4. Heavy congestion
      8.2.5. Varying the congestion level
      8.2.6. Varying available capacity
   8.3. Parameter sensitivity and stability analysis
9. Various Traffic Profiles
   9.1. Traffic mix
   9.2. Bi-directional traffic
10. Multi-AQM Scenario
   10.1. Motivation
   10.2. Details on the evaluation scenario
11. Implementation cost
   11.1. Motivation
   11.2. Recommended discussion
12. Operator Control and Auto-tuning
   12.1. Motivation
   12.2. Recommended discussion
13. Conclusion
14. Acknowledgements
15. Contributors
16. IANA Considerations
17. Security Considerations
18. References
   18.1. Normative References
   18.2. Informative References
Authors' Addresses

1. Introduction

Active Queue Management (AQM) [RFC7567] addresses the concerns arising from using unnecessarily large and unmanaged buffers to improve network and application performance.  Several AQM algorithms have been proposed in the past years, most notably Random Early Detection (RED), BLUE, and Proportional Integral controller (PI), and more recently CoDel [NICH2012] and PIE [PAN2013].
In general, these algorithms actively interact with the Transmission Control Protocol (TCP) and any other transport protocol that deploys a congestion control scheme to manage the amount of data they keep in the network.  The available buffer space in the routers and switches should be large enough to accommodate the short-term buffering requirements.  AQM schemes aim at reducing buffer occupancy, and therefore the end-to-end delay.  Some of these algorithms, notably RED, have also been widely implemented in some network devices.  However, the potential benefits of the RED scheme have not been realized, since RED is reported to be usually turned off.  The main reason for this reluctance to use RED in today's deployments is its sensitivity to the operating conditions in the network and the difficulty of tuning its parameters.

A buffer is a physical volume of memory in which a queue or set of queues are stored.  When speaking of a specific queue in this document, "buffer occupancy" refers to the amount of data (measured in bytes or packets) that are in the queue, and the "maximum buffer size" refers to the maximum buffer occupancy.  In real implementations of switches, a global memory is often shared between the available devices, and thus the maximum buffer size may vary over time.

Bufferbloat [BB2011] is the consequence of deploying large unmanaged buffers on the Internet -- the buffering has often been measured to be ten or a hundred times larger than needed.  Large buffer sizes in combination with TCP and/or unresponsive flows increase end-to-end delay.  This results in poor performance for latency-sensitive applications such as real-time multimedia (e.g., voice, video, gaming, etc.).
The degree to which this affects modern networking equipment, especially consumer-grade equipment, produces problems even with commonly used web services.  Active queue management is thus essential to control queuing delay and decrease network latency.

The Active Queue Management and Packet Scheduling Working Group (AQM WG) was chartered to address the problems with large unmanaged buffers in the Internet.  Specifically, the AQM WG is tasked with standardizing AQM schemes that not only address concerns with such buffers, but also are robust under a wide variety of operating conditions.

In order to ascertain whether the WG should undertake standardizing an AQM proposal, the WG requires guidelines for assessing AQM proposals.  This document provides the necessary characterization guidelines.  [RFC7567] separately describes the AQM algorithm implemented in a router from the scheduling of packets sent by the router.  The rest of this memo refers to the AQM as a dropping/marking policy, as a separate feature from any interface scheduling scheme.  This document may be complemented with another one on guidelines for assessing combinations of packet scheduling and AQM.  We note that such a document will inherit all the guidelines from this document, plus any additional scenarios relevant for packet scheduling, such as flow starvation evaluation or the impact of the number of hash buckets.

1.1. Reducing the latency and maximizing the goodput

The trade-off between reducing the latency and maximizing the goodput is intrinsically linked to each AQM scheme and is key to evaluating its performance.  This trade-off MUST be considered in a variety of scenarios to ensure the safety of an AQM deployment.  Whenever possible, solutions ought to aim at both maximizing goodput and minimizing latency.

1.2. Guidelines for AQM evaluation

The guidelines help to quantify the performance of AQM schemes in terms of latency reduction, goodput maximization and the trade-off between these two.  The guidelines also discuss methods to understand the various aspects associated with safely deploying and operating the AQM scheme, and to weigh the ease of development, deployment and operation of the AQM scheme against the potential gain in performance from the introduction of the proposed scheme.

This memo details generic characterization scenarios against which any AQM proposal should be evaluated, irrespective of whether or not an AQM is standardized by the IETF.  This document recommends the relevant scenarios and metrics to be considered.  The document presents central aspects of an AQM algorithm that must be considered whatever the context, such as burst absorption capacity, RTT fairness or resilience to fluctuating network conditions.

These guidelines do not cover every possible aspect of a particular algorithm.  In addition, it is worth noting that the proposed criteria are not bound to a particular evaluation toolset.  These guidelines do not present context-dependent scenarios (such as 802.11 WLANs, data centers or rural broadband networks).  To keep the guidelines generic, a number of potential router components and algorithms (such as DiffServ) are omitted.

1.3. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.4. Glossary

o AQM: [RFC7567] separately describes the Active Queue Management (AQM) algorithm implemented in a router from the scheduling of packets sent by the router.
The rest of this memo refers to the AQM as a dropping/marking policy, as a separate feature from any interface scheduling scheme.

o buffer: a physical volume of memory in which a queue or set of queues are stored.

o buffer occupancy: the amount of data stored in a buffer, measured in bytes or packets.

o buffer size: the maximum buffer occupancy, that is, the maximum amount of data that may be stored in a buffer, measured in bytes or packets.

o goodput: the number of bits per unit of time forwarded to the correct destination, minus any bits lost or retransmitted [RFC2647].

o SQRT: the square root function.

o ROUND: the round function.

2. End-to-end metrics

End-to-end delay is the result of propagation delay, serialization delay, service delay in a switch, medium-access delay and queuing delay, summed over the network elements along the path.  AQM schemes may reduce the queuing delay by providing signals to the sender on the emergence of congestion, but any impact on the goodput must be carefully considered.  This section presents the metrics that could be used to better quantify (1) the reduction of latency, (2) the maximization of goodput and (3) the trade-off between these two.  This section provides normative requirements for metrics that can be used to assess the performance of an AQM scheme.

Some metrics listed in this section are not suited to every type of traffic detailed in the rest of this document.  It is therefore not necessary to measure all of the following metrics: the chosen metric may not be relevant to the context of the evaluation scenario (e.g., latency vs. goodput trade-off in application-limited traffic scenarios).  Guidance is provided for each metric.

2.1. Flow completion time

The flow completion time is an important performance metric for the end-user when the flow size is finite.
Since an AQM scheme may drop/mark packets, the flow completion time is directly linked to the dropping/marking policy of the AQM scheme.  This metric helps to better assess the performance of an AQM depending on the flow size.  The Flow Completion Time (FCT) is related to the flow size (Fs) and the goodput for the flow (G) as follows:

FCT [s] = Fs [Byte] / ( G [Bit/s] / 8 [Bit/Byte] )

If this metric is used to evaluate the performance of web transfers, it is suggested to consider instead the time needed to download all the objects that compose the web page, as this makes more sense in terms of user experience than assessing the time needed to download each object.

2.2. Flow start up time

The flow start up time is the time between when the request is sent by the client and when the server starts to transmit data.  The number of packets dropped by an AQM may seriously affect the waiting period during which the data transfer has not started.  This metric specifically focuses on operations such as DNS lookups, TCP opens or SSL handshakes.

2.3. Packet loss

Packet loss can occur en route, and this can impact the end-to-end performance measured at the receiver.

The tester SHOULD evaluate the loss experienced at the receiver using one of the two following metrics:

o the packet loss ratio: this metric is to be frequently measured during the experiment.  The long-term loss ratio is of interest for steady-state scenarios only;

o the interval between consecutive losses: the time between two losses is to be measured.

The packet loss ratio can be assessed by simply evaluating the loss ratio as a function of the number of lost packets and the total number of packets sent.
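As an illustration (not part of these guidelines), these two loss metrics might be computed from per-packet records as sketched below; the record layout and variable names are assumptions for illustration only:

```python
# Illustrative sketch: computing the two loss metrics above from
# per-packet records.  "sent" and "received" hold the sequence numbers
# of transmitted and non-duplicate received packets; "loss_times"
# holds the send timestamps (in seconds) of the lost packets.

def loss_ratio(sent, received):
    """Packet loss ratio: number of lost packets over packets sent."""
    lost = set(sent) - set(received)
    return len(lost) / len(sent)

def loss_gaps(loss_times):
    """Intervals between consecutive losses (the 'gaps'), in seconds."""
    ordered = sorted(loss_times)
    return [b - a for a, b in zip(ordered, ordered[1:])]

sent = [1, 2, 3, 4, 5, 6, 7, 8]
received = [1, 2, 4, 5, 7, 8]            # packets 3 and 6 were lost
print(loss_ratio(sent, received))        # -> 0.25
print(loss_gaps([10.0, 12.5, 13.0]))     # -> [2.5, 0.5]
```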
This might not be easily done in laboratory testing, for which these guidelines advise the tester:

o to check that, for every packet sent, a corresponding packet was received within a reasonable time, as explained in [RFC2680];

o to keep a count of all packets sent, and a count of the non-duplicate packets received, as explained in Section 10 of [RFC2544].

The interval between consecutive losses, which is also called a gap, is a metric of interest for VoIP traffic and, as a result, has been further specified in [RFC3611].

2.4. Packet loss synchronization

One goal of an AQM algorithm is to help to avoid global synchronization of flows sharing a bottleneck buffer on which the AQM operates ([RFC2309], [RFC7567]).  The "degree" of packet-loss synchronization between flows SHOULD be assessed, with and without the AQM under consideration.

As discussed, e.g., in [HASS2008], loss synchronization among flows may be quantified by several slightly different metrics that capture different aspects of the same issue.  However, in real-world measurements the choice of metric could be imposed by practical considerations -- e.g., whether fine-grained information on packet losses in the bottleneck is available or not.  For the purpose of AQM characterization, a good candidate metric is the global synchronization ratio, measuring the proportion of flows losing packets during a loss event.  [JAY2006] used this metric in real-world experiments to characterize synchronization along arbitrary Internet paths; the full methodology is described in [JAY2006].

If an AQM scheme is evaluated using real-life network environments, it is worth pointing out that some network events, such as a failed link restoration, may cause synchronized losses between active flows and thus confuse the meaning of this metric.

2.5. Goodput

The goodput has been defined in Section 3.17 of [RFC2647] as the number of bits per unit of time forwarded to the correct destination interface, minus any bits lost or retransmitted.  This definition implies that the test setup needs to be qualified to assure that it is not generating losses on its own.

Measuring the end-to-end goodput provides an appreciation of how well an AQM scheme improves transport and application performance.  The measured end-to-end goodput is linked to the dropping/marking policy of the AQM scheme -- e.g., the fewer the packet drops, the fewer packets need retransmission, minimizing the impact of AQM on transport and application performance.  Additionally, an AQM scheme may resort to Explicit Congestion Notification (ECN) marking as an initial means to control delay.  Again, marking packets instead of dropping them reduces the number of packet retransmissions and increases goodput.  End-to-end goodput values help to evaluate the effectiveness of an AQM scheme in minimizing packet drops that impact application performance and to estimate how well the AQM scheme works with ECN.

The measurement of the goodput allows the tester to evaluate the extent to which an AQM is able to maintain a high bottleneck utilization.  This metric should also be obtained frequently during an experiment, as the long-term goodput is relevant for steady-state scenarios only and may not necessarily reflect how the introduction of an AQM actually impacts the link utilization during a certain period of time.  Fluctuations in the values obtained from these measurements may depend on factors other than the introduction of an AQM, such as link layer losses due to external noise or corruption, fluctuating bandwidths (802.11 WLANs), heavy congestion levels or the transport layer's rate reduction by congestion control mechanisms.

2.6. Latency and jitter

The latency, or the one-way delay metric, is discussed in [RFC2679].  There is a consensus on an adequate metric for jitter, which represents the one-way delay variation for packets of the same flow: the Packet Delay Variation (PDV), detailed in [RFC5481], serves all use cases well.

The end-to-end latency includes components other than just the queuing delay, such as the signal processing delay, the transmission delay and the processing delay.  Moreover, jitter is caused by variations in queuing and processing delay (e.g., scheduling effects).  The introduction of an AQM scheme would impact these metrics (end-to-end latency and jitter), and therefore they should be considered in the end-to-end evaluation of performance.

2.7. Discussion on the trade-off between latency and goodput

The metrics presented in this section may be considered as explained in the rest of this document, in order to discuss and quantify the trade-off between latency and goodput.

With regard to the goodput, and in addition to the long-term stationary goodput value, it is RECOMMENDED to take measurements every multiple of the minimum RTT (minRTT) between A and B.  It is suggested to take measurements at least every K x minRTT (to smooth out the fluctuations), with K=10.  Higher values for K are encouraged whenever it is more appropriate for the presentation of the results.  The value for K may depend on the network path's characteristics.  The measurement period MUST be disclosed for each experiment, and when results/values are compared across different AQM schemes, the comparisons SHOULD use exactly the same measurement periods.  With regard to latency, it is RECOMMENDED to take the samples on a per-packet basis whenever possible, depending on the features provided by hardware/software and the impact of sampling itself on the hardware performance.
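As an illustration (not part of these guidelines), the two sampling procedures above -- goodput sampled over windows of K x minRTT, and per-packet latency samples -- might be sketched as follows; the record layout and names are assumptions for illustration only:

```python
# Illustrative sketch: periodic goodput sampling over windows of
# K x minRTT, alongside per-packet latency samples.  "packets" holds
# (arrival_time_s, size_bytes, one_way_delay_s) tuples, sorted by
# arrival time, for packets delivered to the receiver.

K = 10  # as suggested above: one goodput sample at least every K x minRTT

def goodput_samples(packets, min_rtt, k=K):
    """One goodput value (bit/s) per window of k x min_rtt seconds."""
    window = k * min_rtt
    if not packets:
        return []
    start = packets[0][0]
    n_windows = int((packets[-1][0] - start) // window) + 1
    bits = [0.0] * n_windows
    for t, size, _ in packets:
        bits[int((t - start) // window)] += 8 * size
    return [b / window for b in bits]

def latency_samples(packets):
    """Per-packet one-way delay samples (seconds), e.g., to build a CDF."""
    return [delay for _, _, delay in packets]

pkts = [(0.00, 1500, 0.020), (0.25, 1500, 0.030), (0.60, 1500, 0.040)]
print(goodput_samples(pkts, min_rtt=0.05))  # two 0.5 s windows -> [48000.0, 24000.0]
print(latency_samples(pkts))                # -> [0.02, 0.03, 0.04]
```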
It is generally RECOMMENDED to provide at least 10 samples per RTT.

From each of these sets of measurements, the cumulative distribution function (CDF) of the considered metrics SHOULD be computed.  If the considered scenario introduces dynamically varying parameters, the temporal evolution of the metrics could also be generated.  For each scenario, the following graph may be generated: the x-axis shows the queuing delay (that is, the average per-packet delay in excess of the minimum RTT), and the y-axis the goodput.  Ellipses are computed as detailed in [WINS2014]: "We take each individual [...] run [...] as one point, and then compute the 1-epsilon elliptic contour of the maximum-likelihood 2D Gaussian distribution that explains the points. [...] we plot the median per-sender throughput and queueing delay as a circle. [...] The orientation of an ellipse represents the covariance between the throughput and delay measured for the protocol."  This graph contributes to a better understanding of (1) the delay/goodput trade-off for a given congestion control mechanism (Section 5), and (2) how the goodput and average queuing delay vary as a function of the traffic load (Section 8.2).

3. Generic setup for evaluations

This section presents the topology that can be used for each of the following scenarios and the corresponding notations, and discusses various assumptions that have been made in the document.

3.1. Topology and notations

+---------+                                       +-----------+
|senders A|                                       |receivers B|
+---------+                                       +-----------+

+--------------+                                  +--------------+
|traffic class1|                                  |traffic class1|
|--------------|                                  |--------------|
| SEN.Flow1.1  +---------+            +-----------+  REC.Flow1.1 |
|      +       |         |            |           |      +       |
|      |       |         |            |           |      |       |
|      +       |         |            |           |      +       |
| SEN.Flow1.X  +-----+   |            |  +--------+  REC.Flow1.X |
+--------------+     |   |            |  |        +--------------+
       +          +-+---+---+     +--+--+---+            +
       |          |Router L |     |Router R |            |
       |          |---------|     |---------|            |
       |          | AQM     |     |         |            |
       |          | BuffSize|     | BuffSize|            |
       |          | (Bsize) +-----+ (Bsize) |            |
       |          +-----+--++     ++-+------+            |
       +                |  |       |  |                  +
+--------------+        |  |       |  |         +--------------+
|traffic classN|        |  |       |  |         |traffic classN|
|--------------|        |  |       |  |         |--------------|
| SEN.FlowN.1  +---------+  |       |  +-----------+  REC.FlowN.1 |
|      +       |            |       |           |      +       |
|      |       |            |       |           |      |       |
|      +       |            |       |           |      +       |
| SEN.FlowN.Y  +------------+       +-------------+  REC.FlowN.Y |
+--------------+                                  +--------------+

                Figure 1: Topology and notations

Figure 1 is a generic topology where:

o senders with different traffic characteristics (i.e., traffic profiles) can be introduced;

o the timing of each flow could be different (i.e., when each flow starts and stops);

o each traffic profile can comprise a varying number of flows;

o each link is characterized by a pair (one-way delay, capacity);

o flows are generated at A and sent to B, sharing a bottleneck (the link between routers L and R);

o the tester SHOULD consider both scenarios of asymmetric and symmetric bottleneck links in terms of bandwidth.
In case of 521 asymmetric link, the capacity from senders to receivers is higher 522 than the one from receivers to senders; the symmetric link 523 scenario provides a basic understanding of the operation of the 524 AQM mechanism whereas the asymmetric link scenario evaluates an 525 AQM mechanism in a more realistic setup; 527 o in asymmetric link scenarios, the tester SHOULD study the bi- 528 directional traffic between A and B (downlink and uplink) with the 529 AQM mechanism deployed on one direction only. The tester MAY 530 additionally consider a scenario with AQM mechanism being deployed 531 on both directions. In each scenario, the tester SHOULD 532 investigate the impact of drop policy of the AQM on TCP ACK 533 packets and its impact on the performance. 535 Although this topology may not perfectly reflect actual topologies, 536 the simple topology is commonly used in the world of simulations and 537 small testbeds. It can be considered as adequate to evaluate AQM 538 proposals, similarly to the topology proposed in 539 [I-D.irtf-iccrg-tcpeval]. Testers ought to pay attention to the 540 topology that has been used to evaluate an AQM scheme when comparing 541 this scheme with a newly proposed AQM scheme. 543 3.2. Buffer size 545 The size of the buffers should be carefully chosen, and MAY be set to 546 the bandwidth-delay product; the bandwidth being the bottleneck 547 capacity and the delay the largest RTT in the considered network. 548 The size of the buffer can impact the AQM performance and is a 549 dimensioning parameter that will be considered when comparing AQM 550 proposals. 552 If a specific buffer size is required, the tester MUST justify and 553 detail the way the maximum queue size is set. Indeed, the maximum 554 size of the buffer may affect the AQM's performance and its choice 555 SHOULD be elaborated for a fair comparison between AQM proposals. 556 While comparing AQM schemes the buffer size SHOULD remain the same 557 across the tests. 559 3.3. 
Congestion controls 561 This document considers running three different congestion control 562 algorithms between A and B 563 o Standard TCP congestion control: the base-line congestion control 564 is TCP NewReno with SACK, as explained in [RFC5681]. 566 o Aggressive congestion controls: a base-line congestion control for 567 this category is TCP Cubic [I-D.ietf-tcpm-cubic]. 569 o Less-than Best Effort (LBE) congestion controls: an LBE congestion 570 control 'results in smaller bandwidth and/or delay impact on 571 standard TCP than standard TCP itself, when sharing a bottleneck 572 with it.' [RFC6297] 574 Other transport congestion controls can OPTIONALLY be evaluated in 575 addition. Recent transport layer protocols are not mentioned in the 576 following sections, for the sake of simplicity. 578 4. Methodology, Metrics, AQM Comparisons, Packet Sizes, Scheduling and 579 ECN 581 4.1. Methodology 583 One key objective behind formulating the guidelines is to help 584 ascertain whether a specific AQM is not only better than drop-tail 585 (with BDP-sized buffer) but also safe to deploy. Testers therefore 586 need to provide a reference document for their proposal discussing 587 performance and deployment compared to those of drop-tail. 589 A description of each test setup SHOULD be detailed to allow this 590 test to be compared with other tests. This also allows others to 591 replicate the tests if needed. This test setup SHOULD detail 592 software and hardware versions. The tester could make its data 593 available. 595 The proposals SHOULD be evaluated on real-life systems, or they MAY 596 be evaluated with event-driven simulations (such as ns-2, ns-3, 597 OMNET, etc). The proposed scenarios are not bound to a particular 598 evaluation toolset. 600 The tester is encouraged to make the detailed test setup and the 601 results publicly available. 603 4.2. 
Comments on metrics measurement

Section 2 presents the end-to-end metrics that ought to be used to evaluate the trade-off between latency and goodput. In addition to the end-to-end metrics, queue-level metrics (normally collected at the device operating the AQM) provide a better understanding of the AQM behavior under study and of the impact of its internal parameters. Whenever possible (e.g., depending on the features provided by the hardware/software), these guidelines advise considering queue-level metrics, such as link utilization, queuing delay, queue size, or packet drop/mark statistics, in addition to the AQM-specific parameters. However, the evaluation MUST be primarily based on externally observed end-to-end metrics.

These guidelines do not detail how these metrics should be measured, since the way they are measured is expected to depend on the evaluation toolset.

4.3. Comparing AQM schemes

This document recognizes that these guidelines may be used for comparing AQM schemes.

AQM schemes need to be compared against both performance and deployment categories. In addition, this section details how best to achieve a fair comparison of AQM schemes by avoiding certain pitfalls.

4.3.1. Performance comparison

AQM schemes MUST be compared against all the generic scenarios presented in this memo. AQM schemes MAY be compared for specific network environments such as data centers, home networks, etc. If an AQM scheme has parameter(s) that were externally tuned for optimization or other purposes, these values MUST be disclosed.

AQM schemes belong to different varieties, such as queue-length based schemes (e.g., RED) or queueing-delay based schemes (e.g., CoDel, PIE). AQM schemes expose different control knobs associated with different semantics.
For example, while both PIE and CoDel are queueing-delay based schemes and each exposes a knob to control the queueing delay (PIE's "queueing delay reference" vs. CoDel's "queueing delay target"), the two tuning parameters have different semantics, resulting in different control points. Such differences between AQM schemes can be easily overlooked while making comparisons.

This document RECOMMENDS the following procedures for a fair performance comparison between AQM schemes:

1. comparable control parameters and comparable input values: carefully identify the set of parameters that control similar behavior between the two AQM schemes and ensure that these parameters have comparable input values. For example, to compare how well a queue-length based AQM scheme controls queueing delay vs. a queueing-delay based AQM scheme, a tester can identify the parameters of the schemes that control queue delay and ensure that their input values are comparable. Similarly, to compare how well two AQM schemes accommodate packet bursts, the tester can identify burst-related control parameters and ensure they are configured with similar values. Additionally, it would be preferable if an AQM proposal listed such parameters and discussed how each relates to network characteristics such as capacity, average RTT, etc.

2. compare over a range of input configurations: there could be situations where the set of control parameters that affect a specific behavior have different semantics between the two AQM schemes. As mentioned above, PIE's tuning parameters for controlling queue delay have different semantics from those used in CoDel. In such situations, the schemes need to be compared over a range of input configurations; for example, compare PIE vs. CoDel over a range of target-delay input configurations.

4.3.2.
Deployment comparison

AQM schemes MUST be compared against deployment criteria such as parameter sensitivity (Section 8.3), auto-tuning (Section 12), or implementation cost (Section 11).

4.4. Packet sizes and congestion notification

An AQM scheme may take packet size into account while generating congestion signals; [RFC7141] discusses the motivations behind this. For example, control packets such as DNS requests/responses and TCP SYNs/ACKs are small, but their loss can severely impact application performance. An AQM scheme may therefore be biased towards small packets by dropping them with a smaller probability compared to larger packets. However, such an AQM scheme is unfair to data senders generating larger packets. Data senders, malicious or otherwise, are motivated to take advantage of such an AQM scheme by transmitting smaller packets, which could result in unsafe deployments and unhealthy transport and/or application designs.

An AQM scheme SHOULD adhere to the recommendations outlined in [RFC7141], and SHOULD NOT provide undue advantage to flows with smaller packets [RFC7567].

4.5. Interaction with ECN

Deployed AQM algorithms SHOULD implement Explicit Congestion Notification (ECN) as well as loss to signal congestion to endpoints [RFC7567]. ECN [RFC3168] is an alternative that allows AQM schemes to signal congestion to receivers without dropping packets. The benefits of providing ECN support for an AQM scheme are described in [WELZ2015]. Section 3 of [WELZ2015] describes the expected operation of routers enabling ECN. AQM schemes SHOULD NOT drop or remark packets solely because the ECT(0) or ECT(1) codepoints are used, and when ECN-capable SHOULD set a CE mark on ECN-capable packets in the presence of incipient congestion.
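The mark-or-drop behavior described above can be illustrated with a small sketch (not from this document; the function name and congestion predicate are assumptions), using the ECN field codepoints defined in [RFC3168]:

```python
# Illustrative sketch: how an AQM scheme might combine its congestion
# decision with ECN, per the RFC 3168 codepoints. The congestion verdict
# itself is assumed to come from the AQM control law.

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # ECN field values

def signal_congestion(ecn_field, congested):
    """Return ('forward', new_ecn_field) or ('drop', None).

    Packets are never dropped merely for carrying ECT(0)/ECT(1);
    ECN-capable packets are CE-marked instead of dropped.
    """
    if not congested:
        return ("forward", ecn_field)   # no congestion signal needed
    if ecn_field in (ECT0, ECT1):
        return ("forward", CE)          # ECN-capable: CE-mark instead of drop
    return ("drop", None)               # not ECN-capable: drop to signal

# Example: an ECT(0) packet under incipient congestion is CE-marked;
# a Not-ECT packet is dropped instead.
print(signal_congestion(ECT0, congested=True))     # ('forward', 3)
print(signal_congestion(NOT_ECT, congested=True))  # ('drop', None)
```

This is only the signaling decision; when and how the `congested` verdict is computed is exactly what distinguishes one AQM scheme from another.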
If the tested AQM scheme can support ECN [RFC7567], the testers MUST discuss and describe the support of ECN. Since these guidelines can be used to evaluate the performance of the tested AQM with and without ECN marking, they could also be used to quantify the benefit of enabling ECN.

4.6. Interaction with Scheduling

A network device may use per-flow or per-class queuing with a scheduling algorithm to either prioritize certain applications or classes of traffic, limit the rate of transmission, or provide isolation between different traffic flows within a common class [RFC7567].

Scheduling and AQM jointly impact the end-to-end performance. Therefore, the AQM proposal MUST discuss the feasibility of combining scheduling with the AQM algorithm. For instance, this discussion MAY explain whether the dropping policy is applied when packets are being enqueued or dequeued.

This document does not propose guidelines to assess the performance of scheduling algorithms. Indeed, as opposed to characterizing AQM schemes, which relates to their capacity to control the queuing delay in a queue, characterizing scheduling schemes relates to the scheduling itself and its interaction with the AQM scheme. As one example, the scheduler may create sub-queues and the AQM scheme may be applied to each of the sub-queues, and/or the AQM could be applied to the whole queue. Also, schedulers such as FQ-CoDel [HOEI2015] or FavorQueue [ANEL2014] might introduce flow prioritization. In these cases, specific scenarios should be proposed to ascertain that these scheduling schemes not only help in tackling bufferbloat, but are also robust under a wide variety of operating conditions. This is out of the scope of this document, which focuses on dropping and/or marking AQM schemes.

5.
Transport Protocols

Network and end-devices need to be configured with a reasonable amount of buffer space to absorb transient bursts. In some situations, network providers tend to configure devices with large buffers to avoid packet drops triggered by a full buffer and to maximize the link utilization for standard loss-based TCP traffic.

AQM algorithms are often evaluated by considering the Transmission Control Protocol (TCP) [RFC0793] with a limited number of applications. TCP is a widely deployed transport. It fills up available buffers until a sender transferring a bulk flow with TCP receives a signal (packet drop) that reduces the sending rate. The larger the buffer, the higher the buffer occupancy, and therefore the higher the queuing delay. An efficient AQM scheme sends out early congestion signals to TCP to bring the queuing delay under control.

Not all endpoints (or applications) using TCP use the same flavor of TCP. A variety of senders generate different classes of traffic, which may not react to congestion signals (aka non-responsive flows [RFC7567]) or may not reduce their sending rate as expected (aka transport flows that are less responsive than TCP [RFC7567], also called "aggressive flows"). In these cases, AQM schemes seek to control the queuing delay.

This section provides guidelines to assess the performance of an AQM proposal for various traffic profiles -- different types of senders (with different TCP congestion control variants, unresponsive, aggressive).

5.1. TCP-friendly sender

5.1.1. TCP-friendly sender with the same initial congestion window

This scenario helps to evaluate how an AQM scheme reacts to a TCP-friendly transport sender. A single long-lived, non application-limited, TCP NewReno flow, with an Initial congestion Window (IW) set to 3 packets, transfers data between sender A and receiver B.
Other TCP-friendly congestion control schemes, such as TCP-friendly rate control [RFC5348], MAY also be considered.

For each TCP-friendly transport considered, the graph described in Section 2.7 could be generated.

5.1.2. TCP-friendly sender with different initial congestion windows

This scenario can be used to evaluate how an AQM scheme adapts to a traffic mix consisting of TCP flows with different values of the IW.

For this scenario, two types of flows MUST be generated between sender A and receiver B:

o A single long-lived, non application-limited TCP NewReno flow;

o A single application-limited TCP NewReno flow, with an IW set to 3 or 10 packets. The size of the data transferred must be strictly higher than 10 packets and should be lower than 100 packets.

The transmission of the non application-limited flow must start first, and the transmission of the application-limited flow must start only after the non application-limited flow has reached its steady state.

For each of these scenarios, the graph described in Section 2.7 could be generated for each class of traffic (application-limited and non application-limited). The completion time of the application-limited TCP flow could be measured.

5.2. Aggressive transport sender

This scenario helps testers to evaluate how an AQM scheme reacts to a transport sender that is more aggressive than a single TCP-friendly sender. We define 'aggressiveness' as a higher-than-standard increase factor upon a successful transmission and/or a lower-than-standard decrease factor upon an unsuccessful transmission (e.g., in the case of congestion controls based on the Additive-Increase Multiplicative-Decrease (AIMD) principle, a larger AI and/or MD factor). A single long-lived, non application-limited, TCP Cubic flow transfers data between sender A and receiver B.
Other aggressive congestion control schemes MAY also be considered.

For each flavor of aggressive transport, the graph described in Section 2.7 could be generated.

5.3. Unresponsive transport sender

This scenario helps testers to evaluate how an AQM scheme reacts to a transport sender that is less responsive than TCP. Note that faulty transport implementations on an end host and/or faulty network elements en route that "hide" congestion signals in packet headers [RFC7567] may also lead to a similar situation, such that the AQM scheme needs to adapt to unresponsive traffic. To this end, these guidelines propose the two following scenarios.

The first scenario can be used to evaluate queue build-up. It considers unresponsive flow(s) whose sending rate is greater than the bottleneck link capacity between routers L and R. This scenario consists of a long-lived, non application-limited UDP flow transmitting data between sender A and receiver B. Graphs described in Section 2.7 could be generated.

The second scenario can be used to evaluate if the AQM scheme is able to keep the responsive fraction under control. This scenario considers a mixture of TCP-friendly and unresponsive traffic. It consists of a long-lived UDP flow from an unresponsive application and a single long-lived, non application-limited (unlimited data available to the transport sender from the application layer) TCP NewReno flow that transmit data between sender A and receiver B. As opposed to the first scenario, the rate of the UDP traffic should not be greater than the bottleneck capacity, but should be higher than half of the bottleneck capacity. For each type of traffic, the graph described in Section 2.7 could be generated.

5.4.
Less-than Best Effort transport sender

This scenario helps to evaluate how an AQM scheme reacts to LBE congestion controls that 'results in smaller bandwidth and/or delay impact on standard TCP than standard TCP itself, when sharing a bottleneck with it.' [RFC6297]. A potentially fateful interaction when AQM and LBE techniques are combined has been shown in [GONG2014]; this scenario helps to evaluate whether the coexistence of the proposed AQM and LBE techniques may be possible.

A single long-lived, non application-limited TCP NewReno flow transfers data between sender A and receiver B. Other TCP-friendly congestion control schemes MAY also be considered. Single long-lived, non application-limited LEDBAT [RFC6817] flows transfer data between sender A and receiver B. We recommend setting the target delay and gain values of LEDBAT to 5 ms and 10, respectively [TRAN2014]. Other LBE congestion control schemes, such as any of those listed in [RFC6297], MAY also be considered.

For each of the TCP-friendly and LBE transports, the graph described in Section 2.7 could be generated.

6. Round Trip Time Fairness

6.1. Motivation

An AQM scheme's congestion signals (via drops or ECN marks) must reach the transport sender so that a responsive sender can initiate its congestion control mechanism and adjust the sending rate. This procedure is thus dependent on the end-to-end path RTT. When the RTT varies, the onset of congestion control is impacted, which in turn impacts the ability of an AQM scheme to control the queue. It is therefore important to assess AQM schemes for a set of RTTs between A and B (e.g., from 5 ms to 200 ms).
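As an illustrative aside (not part of the guidelines), the classic square-root throughput model for standard TCP suggests why RTT matters for fairness: at a shared loss rate, a flow's rate scales inversely with its RTT, so the goodput ratio between two flows sharing a bottleneck tends toward the inverse of their RTT ratio. The parameter values below are assumptions for the sake of the sketch:

```python
from math import sqrt

def tcp_rate(mss_bytes, rtt_s, loss_rate, c=sqrt(3 / 2)):
    """Approximate steady-state TCP rate (bytes/s) from the classic
    square-root model: rate = C * MSS / (RTT * sqrt(p))."""
    return c * mss_bytes / (rtt_s * sqrt(loss_rate))

# Two flows with the same MSS and loss rate (assumed values) but
# different path RTTs, e.g., 5 ms vs. 100 ms:
r_short = tcp_rate(1460, 0.005, 0.01)  # short-RTT flow
r_long = tcp_rate(1460, 0.100, 0.01)   # long-RTT flow

# The predicted goodput ratio is the inverse of the RTT ratio
# (100 ms / 5 ms = 20) -- the kind of imbalance a goodput-ratio
# metric between RTT categories is designed to expose.
print(round(r_short / r_long))  # -> 20
```

This model ignores timeouts and queueing effects; it only motivates why an AQM scheme's behavior should be assessed across a range of RTTs.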
The asymmetry in terms of difference in intrinsic RTT between various paths sharing the same bottleneck SHOULD be considered, so that the fairness between the flows can be discussed: in this scenario, a flow traversing a shorter-RTT path may react faster to congestion and recover faster from it than another flow on a longer-RTT path. The introduction of AQM schemes may potentially improve this type of fairness.

Introducing an AQM scheme may cause unfairness between the flows, even if the RTTs are identical. This potential unfairness SHOULD be investigated as well.

6.2. Recommended tests

The RECOMMENDED topology is detailed in Figure 1.

To evaluate the RTT fairness, for each run, two flows are divided into two categories: Category I, whose RTT between sender A and receiver B SHOULD be 100 ms, and Category II, whose RTT between sender A and receiver B should be in the range [5 ms; 560 ms] inclusive. The maximum value for the RTT represents the RTT of a satellite link, which, according to Section 2 of [RFC2488], should be at least 558 ms.

A set of evaluated flows MUST use the same congestion control algorithm: all the generated flows could be single long-lived, non application-limited TCP NewReno flows.

6.3. Metrics to evaluate the RTT fairness

The outputs that MUST be measured are: (1) the cumulative average goodput of the flow from Category I, goodput_Cat_I (Section 2.5); (2) the cumulative average goodput of the flow from Category II, goodput_Cat_II (Section 2.5); (3) the ratio goodput_Cat_II/goodput_Cat_I; and (4) the average packet drop rate for each category (Section 2.3).

7. Burst Absorption

"AQM mechanisms need to control the overall queue sizes, to ensure that arriving bursts can be accommodated without dropping packets" [RFC7567].

7.1. Motivation

An AQM scheme can face bursts of packet arrivals due to various reasons.
Dropping one or more packets from a burst can result in performance penalties for the corresponding flows, since dropped packets have to be retransmitted. Performance penalties can result in failing to meet SLAs and can be a disincentive to AQM adoption.

The ability to accommodate bursts translates to a larger queue length and hence more queuing delay. On the one hand, it is important that an AQM scheme quickly brings bursty traffic under control. On the other hand, a peak in the packet drop rate used to bring a packet burst quickly under control could result in multiple drops per flow and severely impact transport and application performance. Therefore, an AQM scheme ought to bring bursts under control by balancing both aspects -- (1) minimizing queuing delay spikes and (2) minimizing performance penalties for ongoing flows in terms of packet drops.

An AQM scheme that maintains short queues allows some remaining space in the buffer for bursts of arriving packets. The tolerance to bursts of packets depends upon the number of packets in the queue, which is directly linked to the AQM algorithm. Moreover, an AQM scheme may implement a feature controlling the maximum size of accepted bursts, which can depend on the buffer occupancy or the currently estimated queuing delay. The impact of the buffer size on the burst allowance may be evaluated.

7.2. Recommended tests

For this scenario, the tester MUST evaluate how the AQM performs with the following traffic generated from sender A to receiver B:

o Web traffic with IW10;

o Bursty video frames;

o Constant Bit Rate (CBR) UDP traffic;

o A single non application-limited bulk TCP flow as background traffic.

Figure 2 presents the various cases for the traffic that MUST be generated between sender A and receiver B.
+----+--------------------------------------------+
|Case|                Traffic Type                |
|    +-----+------------+----+--------------------+
|    |Video|Web (IW 10) | CBR| Bulk TCP Traffic   |
+----+-----+------------+----+--------------------+
|I   |  0  |     1      |  1 |         0          |
+----+-----+------------+----+--------------------+
|II  |  0  |     1      |  1 |         1          |
+----+-----+------------+----+--------------------+
|III |  1  |     1      |  1 |         0          |
+----+-----+------------+----+--------------------+
|IV  |  1  |     1      |  1 |         1          |
+----+-----+------------+----+--------------------+

Figure 2: Bursty traffic scenarios

A new web page download could start after the previous web page download is finished. Each web page could be composed of at least 50 objects, and the size of each object should be at least 1 kB. 6 parallel TCP connections SHOULD be generated to download the objects, each connection having an initial congestion window set to 10 packets.

For each of these scenarios, the graph described in Section 2.7 could be generated for each application. Metrics such as end-to-end latency, jitter, and flow completion time MAY be generated. For the cases of frame generation of bursty video traffic as well as the choice of web traffic pattern, these details and their presentation are left to the testers.

8. Stability

8.1. Motivation

The safety of an AQM scheme is directly related to its stability under varying operating conditions such as varying traffic profiles and fluctuating network conditions. Since operating conditions can vary often, the AQM needs to remain stable under these conditions without the need for additional external tuning.

Network devices can experience varying operating conditions depending on factors such as time of the day, deployment scenario, etc. For example:

o Traffic and congestion levels are higher during peak hours than off-peak hours.
o In the presence of a scheduler, the draining rate of a queue can vary depending on the occupancy of other queues: a low load on a high-priority queue implies a higher draining rate for the lower-priority queues.

o The available capacity can vary over time (e.g., a lossy channel, a link supporting traffic in a higher diffserv class).

When the target context is not a stable environment, the ability of an AQM scheme to maintain its control over the queuing delay and buffer occupancy can be challenged. This document proposes guidelines to assess the behavior of AQM schemes under varying congestion levels and varying draining rates.

8.2. Recommended tests

Note that the traffic profiles explained below comprise non application-limited TCP flows. For each of the below scenarios, the graphs described in Section 2.7 SHOULD be generated, and the goodput of the various flows should be cumulated. For Section 8.2.5 and Section 8.2.6, the results SHOULD also be presented on a per-phase basis.

Wherever the notion of time is explicitly mentioned in this subsection, time 0 starts from the moment all TCP flows have already reached their congestion avoidance phase.

8.2.1. Definition of the congestion level

In these guidelines, the congestion levels are represented by the projected packet drop rate, had a drop-tail queue been chosen instead of an AQM scheme. When the bottleneck is shared among non application-limited TCP flows, l_r, the loss rate projection, can be expressed as a function of N, the number of bulk TCP flows, and S, the sum of the bandwidth-delay product and the maximum buffer size, both expressed in packets, based on Eq.
3 of [MORR2000]:

l_r = 0.76 * N^2 / S^2

N = S * SQRT(1/0.76) * SQRT(l_r)

These guidelines use the loss rate to define the different congestion levels, but they do not stipulate that, in other circumstances, measuring the congestion level gives an accurate estimate of the loss rate, or vice versa.

8.2.2. Mild congestion

This scenario can be used to evaluate how an AQM scheme reacts to a light load of incoming traffic resulting in mild congestion -- packet drop rates around 0.1%. The number of bulk flows required to achieve this congestion level, N_mild, is then:

N_mild = ROUND (0.036*S)

8.2.3. Medium congestion

This scenario can be used to evaluate how an AQM scheme reacts to incoming traffic resulting in medium congestion -- packet drop rates around 0.5%. The number of bulk flows required to achieve this congestion level, N_med, is then:

N_med = ROUND (0.081*S)

8.2.4. Heavy congestion

This scenario can be used to evaluate how an AQM scheme reacts to incoming traffic resulting in heavy congestion -- packet drop rates around 1%. The number of bulk flows required to achieve this congestion level, N_heavy, is then:

N_heavy = ROUND (0.114*S)

8.2.5. Varying the congestion level

This scenario can be used to evaluate how an AQM scheme reacts to incoming traffic resulting in various levels of congestion during the experiment. In this scenario, the congestion level varies over a large time-scale. The following phases may be considered: phase I - mild congestion during 0-20s; phase II - medium congestion during 20-40s; phase III - heavy congestion during 40-60s; phase I again, and so on.

8.2.6. Varying available capacity

This scenario can be used to help characterize how the AQM behaves and adapts to bandwidth changes.
The experiments are not meant to reflect the exact conditions of Wi-Fi environments, since it is hard to design repetitive experiments or accurate simulations for such scenarios.

To emulate varying draining rates, the bottleneck capacity between nodes 'Router L' and 'Router R' varies over the course of the experiment as follows:

o Experiment 1: the capacity varies between two values over a large time-scale. As an example, the following phases may be considered: phase I - 100 Mbps during 0-20s; phase II - 10 Mbps during 20-40s; phase I again, and so on.

o Experiment 2: the capacity varies between two values over a short time-scale. As an example, the following phases may be considered: phase I - 100 Mbps during 0-100ms; phase II - 10 Mbps during 100-200ms; phase I again, and so on.

The tester MAY choose a phase time-interval value different from what is stated above, if the network path's conditions (such as the bandwidth-delay product) necessitate it. In this case, the choice of such a time-interval value SHOULD be stated and elaborated.

The tester MAY additionally evaluate the two mentioned scenarios (short-term and long-term capacity variations) during, and/or including, the TCP slow-start phase.

More realistic fluctuating capacity patterns MAY be considered. The tester MAY choose to incorporate realistic scenarios with regard to common fluctuations of bandwidth in state-of-the-art technologies.

The scenario consists of TCP NewReno flows between sender A and receiver B. To better assess the impact of draining rates on the AQM behavior, the tester MUST compare its performance with that of drop-tail and SHOULD provide a reference document for their proposal discussing performance and deployment compared to those of drop-tail.
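The flow counts given in Sections 8.2.2-8.2.4 follow directly from solving the [MORR2000] relation of Section 8.2.1 for N at the target loss rates. A small sketch shows the arithmetic (the value of S is an assumption for illustration):

```python
from math import sqrt

def flows_for_loss_rate(l_r, s_packets):
    """Number of bulk TCP flows N needed to project a drop-tail loss
    rate l_r, per l_r = 0.76 * N^2 / S^2  =>  N = S * sqrt(l_r / 0.76)."""
    return round(s_packets * sqrt(l_r / 0.76))

# S is the BDP plus the maximum buffer size, in packets (assumed here).
S = 100
print(flows_for_loss_rate(0.001, S))  # mild   ~ ROUND(0.036*S) -> 4
print(flows_for_loss_rate(0.005, S))  # medium ~ ROUND(0.081*S) -> 8
print(flows_for_loss_rate(0.01, S))   # heavy  ~ ROUND(0.114*S) -> 11
```

The constants 0.036, 0.081, and 0.114 in Sections 8.2.2-8.2.4 are simply sqrt(l_r/0.76) evaluated at l_r = 0.001, 0.005, and 0.01, respectively.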
Burst traffic, such as presented in Section 7.2, could also be considered to assess the impact of varying available capacity on the burst absorption of the AQM.

8.3. Parameter sensitivity and stability analysis

The control law used by an AQM is the primary means by which the queuing delay is controlled. Hence, understanding the control law is critical to understanding the behavior of the AQM scheme. The control law could include several input parameters whose values affect the AQM scheme's output behavior and its stability. Additionally, AQM schemes may auto-tune parameter values in order to maintain stability under different network conditions (such as different congestion levels, draining rates, or network environments). The stability of these auto-tuning techniques is also important to understand.

Transports operating under the control of AQM experience the effect of multiple control loops that react over different timescales. It is therefore important that proposed AQM schemes are seen to be stable when they are deployed at multiple points of potential congestion along an Internet path. The pattern of congestion signals (loss or ECN marking) arising from AQM methods also needs to not adversely interact with the dynamics of the transport protocols that they control.

AQM proposals SHOULD provide background material showing a control-theoretic analysis of the AQM control law and the input parameter space within which the control law operates as expected, or could use another way to discuss the stability of the control law. For parameters that are auto-tuned, the material SHOULD include a stability analysis of the auto-tuning mechanism(s) as well. Such analysis helps in better understanding an AQM control law and the network conditions/deployments under which the AQM is stable.

9.
Various Traffic Profiles

This section provides guidelines to assess the performance of an AQM proposal for various traffic profiles, such as traffic with different applications or bi-directional traffic.

9.1. Traffic mix

This scenario can be used to evaluate how an AQM scheme reacts to a traffic mix consisting of different applications such as:

o Bulk TCP transfer

o Web traffic

o VoIP

o Constant Bit Rate (CBR) UDP traffic

o Adaptive video streaming

Various traffic mixes can be considered. These guidelines RECOMMEND examining at least the following example: 1 bi-directional VoIP flow; 6 web page downloads (such as detailed in Section 7.2); 1 CBR flow; 1 adaptive video flow; 5 bulk TCP flows. Any other combinations could be considered and should be carefully documented.

For each scenario, the graph described in Section 2.7 could be generated for each class of traffic. Metrics such as end-to-end latency, jitter, and flow completion time MAY be reported.

9.2. Bi-directional traffic

Control packets such as DNS requests/responses and TCP SYNs/ACKs are small, but their loss can severely impact the application performance. The scenario proposed in this section will help in assessing whether the introduction of an AQM scheme increases the loss probability of these important packets.

For this scenario, traffic MUST be generated in both downlink and uplink, such as defined in Section 3.1. These guidelines RECOMMEND considering a mild congestion level and the traffic presented in Section 8.2.2 in both directions. In this case, the metrics reported MUST be the same as in Section 8.2 for each direction.

The traffic mix presented in Section 9.1 MAY also be generated in both directions.

10. Multi-AQM Scenario

10.1.
Motivation

Transports operating under the control of AQM experience the effect of multiple control loops that react over different timescales. It is therefore important that proposed AQM schemes are seen to be stable when they are deployed at multiple points of potential congestion along an Internet path. The pattern of congestion signals (loss or ECN marking) arising from AQM methods also needs to not adversely interact with the dynamics of the transport protocols that they control.

10.2. Details on the evaluation scenario

+---------+                                +-----------+
|senders A|---+                        +---|receivers A|
+---------+   |                        |   +-----------+
         +----+----+  +---------+  +---+-----+
         |Router L |--|Router M |--|Router R |
         |AQM      |  |AQM      |  |No AQM   |
         +---------+  +---+-----+  +---+-----+
+---------+               |            |   +-----------+
|senders B|---------------+            +---|receivers B|
+---------+                                +-----------+

Figure 3: Topology for the Multi-AQM scenario

This scenario can be used to evaluate how having AQM schemes in sequence impacts the induced latency reduction, the induced goodput maximization, and the trade-off between these two. The topology presented in Figure 3 could be used. The AQM schemes introduced in Router L and Router M should be the same; any other configurations could be considered. For this scenario, it is recommended to consider a mild congestion level, with the number of flows specified in Section 8.2.2 equally shared among senders A and B. Any other relevant combination of congestion levels could be considered. We recommend measuring the metrics presented in Section 8.2.

11. Implementation cost

11.1. Motivation

Successful deployment of AQM is directly related to its cost of implementation. Network devices may need hardware or software implementations of the AQM mechanism.
Depending on a device's 1281 capabilities and limitations, the device may or may not be able to 1282 implement some or all parts of the AQM logic. 1284 AQM proposals SHOULD provide pseudo-code for the complete AQM scheme, 1285 highlighting generic implementation-specific aspects of the scheme 1286 such as "drop-tail" vs. "drop-head", inputs (e.g., current queuing 1287 delay, queue length), computations involved, need for timers, etc. 1288 This helps to identify costs associated with implementing the AQM 1289 scheme on a particular hardware or software device. This also 1290 facilitates discussions about which kinds of devices can easily 1291 support the AQM and which cannot. 1293 11.2. Recommended discussion 1295 AQM proposals SHOULD highlight parts of their AQM logic that are 1296 device dependent and discuss if and how AQM behavior could be 1297 impacted by the device. For example, a queueing-delay based AQM 1298 scheme requires the current queuing delay as input from the device. If 1299 the device already maintains this value, then it can be trivial to 1300 implement the AQM logic on the device. If the device provides 1301 indirect means to estimate the queuing delay (for example: 1302 timestamps, dequeuing rate), then the AQM behavior is sensitive to 1303 how precise the queuing delay estimations are for that device. 1304 Highlighting the sensitivity of an AQM scheme to queuing delay 1305 estimations helps implementers to identify appropriate means of 1306 implementing the mechanism on a device. 1308 12. Operator Control and Auto-tuning 1309 12.1. Motivation 1311 One of the biggest hurdles of RED deployment was, and remains, its parameter 1312 sensitivity to operating conditions -- how difficult it is to tune 1313 RED parameters for a deployment to achieve acceptable benefit from 1314 using RED. Fluctuating congestion levels and network conditions add 1315 to the complexity. Incorrect parameter values lead to poor 1316 performance.
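The RED tuning difficulty described above can be made concrete with a short sketch. The following Python fragment is purely illustrative: it omits the count-based spreading of drops used by full RED, and the class name and default parameter values are arbitrary. It shows the four interdependent parameters an operator would have to set, whose suitable values depend on link speed, RTT and traffic mix:

```python
class REDSketch:
    """Simplified RED drop-probability computation (illustrative only;
    omits the count-based spreading of drops used by full RED)."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, w_q=0.002):
        # Four interdependent parameters: queue-length thresholds,
        # maximum drop probability, and the averaging weight.
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.w_q = max_p, w_q
        self.avg = 0.0  # EWMA of the queue length

    def on_enqueue(self, qlen):
        # Update the average queue length and return the current
        # drop probability for the arriving packet.
        self.avg = (1.0 - self.w_q) * self.avg + self.w_q * qlen
        return self.drop_prob()

    def drop_prob(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp between the two thresholds.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
```

Even in this reduced form, a small w_q makes the average react slowly to bursts while the threshold pair must be re-matched to every bandwidth-delay product, which is the sensitivity this section refers to.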
1318 Any AQM scheme is likely to have parameters whose values affect the 1319 control law and behavior of the AQM. Exposing all these parameters 1320 as control parameters to a network operator (or user) can easily 1321 result in an unsafe AQM deployment. Unexpected AQM behavior ensues 1322 when parameter values are set improperly. A minimal number of 1323 control parameters minimizes the number of ways a user can break a 1324 system where an AQM scheme is deployed. Fewer control parameters 1325 make the AQM scheme more user-friendly and easier to deploy and 1326 debug. 1328 [RFC7567] states "AQM algorithms SHOULD NOT require tuning of initial 1329 or configuration parameters in common use cases." A scheme ought to 1330 expose only those parameters that control the macroscopic AQM 1331 behavior, such as the queue delay threshold or the queue length threshold. 1333 Additionally, the safety of an AQM scheme is directly related to its 1334 stability under varying operating conditions such as varying traffic 1335 profiles and fluctuating network conditions, as described in 1336 Section 8. Operating conditions vary often and hence the AQM needs 1337 to remain stable under these conditions without the need for 1338 additional external tuning. If AQM parameters require tuning under 1339 these conditions, then the AQM must adapt the necessary parameter 1340 values itself by employing auto-tuning techniques. 1342 12.2. Recommended discussion 1344 In order to understand an AQM's deployment considerations and 1345 performance under a specific environment, AQM proposals SHOULD 1346 describe the parameters that control the macroscopic AQM behavior, 1347 and identify any parameters that require tuning to operational 1348 conditions. It could also be worth discussing whether, even if an AQM 1349 scheme does not adequately auto-tune its parameters, the resulting 1350 performance, while not optimal, remains reasonable.
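As an illustration of exposing only macroscopic behavior, the following sketch (not taken from any specific AQM proposal; the class name, gains and control law are hypothetical, loosely in the PI style used by schemes such as PIE [PAN2013]) exposes a single target-delay parameter and keeps its control gains internal; following the discussion above, a real proposal would justify or auto-tune those internal values:

```python
import random

class DelayBasedAQM:
    """Illustrative queuing-delay-based AQM sketch.

    Only the macroscopic parameter (target queuing delay) is exposed;
    the control gains alpha/beta are internal fixed values that a
    real proposal would need to justify or auto-tune.
    """

    def __init__(self, target_delay_s=0.015):
        self.target = target_delay_s          # exposed macroscopic parameter
        self.alpha, self.beta = 0.125, 1.25   # internal, not operator-facing
        self.drop_prob = 0.0
        self.prev_delay = 0.0

    def on_timer(self, cur_delay_s):
        # PI-style control law: react both to the offset from the
        # target delay and to the trend of the queuing delay.
        p = (self.alpha * (cur_delay_s - self.target)
             + self.beta * (cur_delay_s - self.prev_delay))
        self.drop_prob = min(max(self.drop_prob + p, 0.0), 1.0)
        self.prev_delay = cur_delay_s

    def should_drop(self):
        # Probabilistic drop decision on enqueue ("drop-tail" style;
        # a "drop-head" variant would remove from the queue front).
        return random.random() < self.drop_prob
```

With this split, an operator only decides what latency target is acceptable, while the scheme's stability under varying conditions rests on the internal gains, which is exactly what a proposal's recommended discussion should address.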
1352 If there are any fixed parameters within the AQM, their setting 1353 SHOULD be discussed and justified, to help understand whether a fixed 1354 parameter value is applicable for a particular environment. 1356 If an AQM scheme is evaluated with parameter(s) that were externally 1357 tuned for optimization or other purposes, these values MUST be 1358 disclosed. 1360 13. Conclusion 1362 Figure 4 lists the scenarios and their requirements. 1364 +------------------------------------------------------------------+ 1365 |Scenario |Sec. |Requirement | 1366 +------------------------------------------------------------------+ 1367 +------------------------------------------------------------------+ 1368 |Interaction with ECN | 4.5 |MUST be discussed if supported | 1369 +------------------------------------------------------------------+ 1370 |Interaction with Scheduling| 4.6 |Feasibility MUST be discussed | 1371 +------------------------------------------------------------------+ 1372 |Transport Protocols |5. | | 1373 | TCP-friendly sender | 5.1 |Scenario MUST be considered | 1374 | Aggressive sender | 5.2 |Scenario MUST be considered | 1375 | Unresponsive sender | 5.3 |Scenario MUST be considered | 1376 | LBE sender | 5.4 |Scenario MAY be considered | 1377 +------------------------------------------------------------------+ 1378 |Round Trip Time Fairness | 6.2 |Scenario MUST be considered | 1379 +------------------------------------------------------------------+ 1380 |Burst Absorption | 7.2 |Scenario MUST be considered | 1381 +------------------------------------------------------------------+ 1382 |Stability |8. | | 1383 | Varying congestion levels | 8.2.5|Scenario MUST be considered | 1384 | Varying available capacity| 8.2.6|Scenario MUST be considered | 1385 | Parameters and stability | 8.3 |This SHOULD be discussed | 1386 +------------------------------------------------------------------+ 1387 |Various Traffic Profiles |9. 
| | 1388 | Traffic mix | 9.1 |Scenario is RECOMMENDED | 1389 | Bi-directional traffic | 9.2 |Scenario MAY be considered | 1390 +------------------------------------------------------------------+ 1391 |Multi-AQM | 10.2 |Scenario MAY be considered | 1392 +------------------------------------------------------------------+ 1393 |Implementation Cost | 11.2 |Pseudo-code SHOULD be provided | 1394 +------------------------------------------------------------------+ 1395 |Operator Control | 12.2 |Tuning SHOULD NOT be required | 1396 +------------------------------------------------------------------+ 1398 Figure 4: Summary of the scenarios and their requirements 1400 14. Acknowledgements 1402 This work has been partially supported by the European Community 1403 under its Seventh Framework Programme through the Reducing Internet 1404 Transport Latency (RITE) project (ICT-317700). 1406 15. Contributors 1408 Many thanks to S. Akhtar, A.B. Bagayoko, F. Baker, R. Bless, D. 1409 Collier-Brown, G. Fairhurst, J. Gettys, T. Hoiland-Jorgensen, K. 1410 Kilkki, C. Kulatunga, W. Lautenschlager, A.C. Morton, R. Pan, G. 1411 Skinner, D. Taht and M. Welzl for detailed and wise feedback on 1412 this document. 1414 16. IANA Considerations 1416 This memo includes no request to IANA. 1418 17. Security Considerations 1420 Some security considerations for AQM are identified in [RFC7567]. This 1421 document, by itself, presents no new privacy or security issues. 1423 18. References 1425 18.1. Normative References 1427 [I-D.ietf-tcpm-cubic] 1428 Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1429 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1430 draft-ietf-tcpm-cubic-00 (work in progress), June 2015. 1432 [I-D.irtf-iccrg-tcpeval] 1433 Hayes, D., Ros, D., Andrew, L., and S. Floyd, "Common TCP 1434 Evaluation Suite", draft-irtf-iccrg-tcpeval-01 (work in 1435 progress), July 2014.
1437 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 1438 RFC 793, DOI 10.17487/RFC0793, September 1981, 1439 . 1441 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1442 Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. 1444 [RFC2488] Allman, M., Glover, D., and L. Sanchez, "Enhancing TCP 1445 Over Satellite Channels using Standard Mechanisms", 1446 BCP 28, RFC 2488, DOI 10.17487/RFC2488, January 1999, 1447 . 1449 [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for 1450 Network Interconnect Devices", RFC 2544, 1451 DOI 10.17487/RFC2544, March 1999, 1452 . 1454 [RFC2647] Newman, D., "Benchmarking Terminology for Firewall 1455 Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999, 1456 . 1458 [RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way 1459 Delay Metric for IPPM", RFC 2679, DOI 10.17487/RFC2679, 1460 September 1999, . 1462 [RFC2680] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way 1463 Packet Loss Metric for IPPM", RFC 2680, 1464 DOI 10.17487/RFC2680, September 1999, 1465 . 1467 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1468 of Explicit Congestion Notification (ECN) to IP", 1469 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1470 . 1472 [RFC3611] Friedman, T., Ed., Caceres, R., Ed., and A. Clark, Ed., 1473 "RTP Control Protocol Extended Reports (RTCP XR)", 1474 RFC 3611, DOI 10.17487/RFC3611, November 2003, 1475 . 1477 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1478 Friendly Rate Control (TFRC): Protocol Specification", 1479 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1480 . 1482 [RFC5481] Morton, A. and B. Claise, "Packet Delay Variation 1483 Applicability Statement", RFC 5481, DOI 10.17487/RFC5481, 1484 March 2009, . 1486 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1487 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1488 . 1490 [RFC6297] Welzl, M. and D.
Ros, "A Survey of Lower-than-Best-Effort 1491 Transport Protocols", RFC 6297, DOI 10.17487/RFC6297, June 1492 2011, . 1494 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 1495 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 1496 DOI 10.17487/RFC6817, December 2012, 1497 . 1499 [RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion 1500 Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141, February 2014. 1502 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 1503 Recommendations Regarding Active Queue Management", 1504 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1505 . 1507 18.2. Informative References 1509 [ANEL2014] 1510 Anelli, P., Diana, R., and E. Lochin, "FavorQueue: a 1511 Parameterless Active Queue Management to Improve TCP 1512 Traffic Performance", Computer Networks vol. 60, 2014. 1514 [BB2011] "BufferBloat: what's wrong with the internet?", ACM 1515 Queue vol. 9, 2011. 1517 [GONG2014] 1518 Gong, Y., Rossi, D., Testa, C., Valenti, S., and D. Taht, 1519 "Fighting the bufferbloat: on the coexistence of AQM and 1520 low priority congestion control", Computer Networks, 1521 Elsevier, vol. 60, pp. 115-128, 2014. 1523 [HASS2008] 1524 Hassayoun, S. and D. Ros, "Loss Synchronization and Router 1525 Buffer Sizing with High-Speed Versions of TCP", IEEE 1526 INFOCOM Workshops, 2008. 1528 [HOEI2015] 1529 Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 1530 J., and E. Dumazet, "FlowQueue-Codel", IETF (Work-in- 1531 Progress), January 2015. 1533 [JAY2006] Jay, P., Fu, Q., and G. Armitage, "A preliminary analysis 1534 of loss synchronisation between concurrent TCP flows", 1535 Australian Telecommunication Networks and Application 1536 Conference (ATNAC), 2006. 1538 [MORR2000] 1539 Morris, R., "Scalable TCP congestion control", IEEE 1540 INFOCOM, 2000. 1542 [NICH2012] 1543 Nichols, K. and V. Jacobson, "Controlling Queue Delay", 1544 ACM Queue, 2012.
1546 [PAN2013] Pan, R., Natarajan, P., Piglione, C., Prabhu, MS., 1547 Subramanian, V., Baker, F., and B. VerSteeg, "PIE: A 1548 lightweight control scheme to address the bufferbloat 1549 problem", IEEE HPSR, 2013. 1551 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1552 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1553 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1554 S., Wroclawski, J., and L. Zhang, "Recommendations on 1555 Queue Management and Congestion Avoidance in the 1556 Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998. 1558 [TRAN2014] 1559 Trang, S., Kuhn, N., Lochin, E., Baudoin, C., Dubois, E., 1560 and P. Gelard, "On The Existence Of Optimal LEDBAT 1561 Parameters", IEEE ICC 2014 - Communication QoS, 1562 Reliability and Modeling Symposium, 2014. 1564 [WELZ2015] 1565 Welzl, M. and G. Fairhurst, "The Benefits to Applications 1566 of using Explicit Congestion Notification (ECN)", IETF 1567 (Work-in-Progress), June 2015. 1569 [WINS2014] 1570 Winstein, K., "Transport Architectures for an Evolving 1571 Internet", PhD thesis, Massachusetts Institute of 1572 Technology, 2014. 1574 Authors' Addresses 1576 Nicolas Kuhn (editor) 1577 Telecom Bretagne 1578 2 rue de la Chataigneraie 1579 Cesson-Sevigne 35510 1580 France 1582 Phone: +33 2 99 12 70 46 1583 Email: nicolas.kuhn@cnes.fr 1584 Preethi Natarajan (editor) 1585 Cisco Systems 1586 510 McCarthy Blvd 1587 Milpitas, California 1588 United States 1590 Email: prenatar@cisco.com 1592 Naeem Khademi (editor) 1593 University of Oslo 1594 Department of Informatics, PO Box 1080 Blindern 1595 N-0316 Oslo 1596 Norway 1598 Phone: +47 2285 24 93 1599 Email: naeemk@ifi.uio.no 1601 David Ros 1602 Simula Research Laboratory AS 1603 P.O. Box 134 1604 Lysaker, 1325 1605 Norway 1607 Phone: +33 299 25 21 21 1608 Email: dros@simula.no