Network Working Group                                     F. Baker, Ed.
Internet-Draft                                            Cisco Systems
Obsoletes: 2309 (if approved)                         G. Fairhurst, Ed.
Intended status: Best Current Practice           University of Aberdeen
Expires: August 27, 2015                              February 23, 2015

        IETF Recommendations Regarding Active Queue Management
                  draft-ietf-aqm-recommendation-10

Abstract

This memo presents recommendations to the Internet community
concerning measures to improve and preserve Internet performance.  It
presents a strong recommendation for testing, standardization, and
widespread deployment of active queue management (AQM) in network
devices, to improve the performance of today's Internet.  It also
urges a concerted effort of research, measurement, and ultimate
deployment of AQM mechanisms to protect the Internet from flows that
are not sufficiently responsive to congestion notification.

This memo replaces the recommendations of RFC 2309 based on fifteen
years of experience and new research.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 27, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008.  The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.

Table of Contents

1.  Introduction
    1.1.  Congestion Collapse
    1.2.  Active Queue Management to Manage Latency
    1.3.  Document Overview
    1.4.  Changes to the recommendations of RFC2309
    1.5.  Requirements Language
2.  The Need For Active Queue Management
    2.1.  AQM and Multiple Queues
    2.2.  AQM and Explicit Congestion Notification (ECN)
    2.3.  AQM and Buffer Size
3.  Managing Aggressive Flows
4.  Conclusions and Recommendations
    4.1.  Operational deployments SHOULD use AQM procedures
    4.2.  Signaling to the transport endpoints
          4.2.1.  AQM and ECN
    4.3.  AQM algorithms deployed SHOULD NOT require operational
          tuning
    4.4.  AQM algorithms SHOULD respond to measured congestion, not
          application profiles
    4.5.  AQM algorithms SHOULD NOT be dependent on specific
          transport protocol behaviours
    4.6.  Interactions with congestion control algorithms
    4.7.  The need for further research
5.  IANA Considerations
6.  Security Considerations
7.  Privacy Considerations
8.  Acknowledgements
9.  References
    9.1.  Normative References
    9.2.  Informative References
Appendix A.  Change Log
Authors' Addresses

1.  Introduction

The Internet protocol architecture is based on a connectionless end-
to-end packet service using the Internet Protocol, whether IPv4
[RFC0791] or IPv6 [RFC2460].  The advantages of its connectionless
design, flexibility and robustness, have been amply demonstrated.
However, these advantages are not without cost: careful design is
required to provide good service under heavy load.  In fact, lack of
attention to the dynamics of packet forwarding can result in severe
service degradation or "Internet meltdown".  This phenomenon,
technically called "congestion collapse", was first observed during
the early growth phase of the Internet in the mid-1980s
[RFC0896][RFC0970], and was a key focus of RFC2309.

Although wide-scale congestion collapse is not common in the
Internet, the presence of localised congestion collapse is by no
means rare.  It is therefore important to continue to avoid
congestion collapse.

Since 1998, when RFC2309 was written, the Internet has come to carry
a wide variety of traffic.  In the current Internet, low latency is
extremely important for many interactive and transaction-based
applications.
The same type of technology that RFC2309 advocated for
combating congestion collapse is also effective at limiting delays to
reduce the interaction delay (latency) experienced by applications
[Bri15].  High or unpredictable latency can impact the performance of
the control loops used by end-to-end protocols (including congestion
control algorithms used by TCP).  There is now also a focus on
reducing network latency using the same technology.

The mechanisms described in this document may be implemented in
network devices on the path between end-points, including routers,
switches, and other network middleboxes.  The methods may also be
implemented in the networking stacks within endpoint devices that
connect to the network.

1.1.  Congestion Collapse

The original fix for Internet meltdown was provided by Van Jacobson.
Beginning in 1986, Jacobson developed the congestion avoidance
mechanisms [Jacobson88] that are now required for implementations of
the Transmission Control Protocol (TCP) [RFC0793] [RFC1122].
([RFC7414] provides a roadmap to help identify TCP-related
documents.)  These mechanisms operate in Internet hosts to cause TCP
connections to "back off" during congestion.  We say that TCP flows
are "responsive" to congestion signals (i.e., packets that are
dropped or marked with explicit congestion notification [RFC3168]).
It is primarily these TCP congestion avoidance algorithms that
prevent the congestion collapse of today's Internet.  Similar
algorithms are specified for other non-TCP transports.

However, that is not the end of the story.  Considerable research has
been done on Internet dynamics since 1988, and the Internet has
grown.  It has become clear that the congestion avoidance mechanisms
[RFC5681], while necessary and powerful, are not sufficient to
provide good service in all circumstances.
Basically, there is a
limit to how much control can be accomplished from the edges of the
network.  Some mechanisms are needed in network devices to complement
the endpoint congestion avoidance mechanisms.

1.2.  Active Queue Management to Manage Latency

Internet latency has become a focus of attention to increase the
responsiveness of Internet applications and protocols.  One major
source of delay is the build-up of queues in network devices.
Queueing occurs whenever the arrival rate of data at the ingress to a
device exceeds the current egress rate.  Such queueing is normal in a
packet-switched network and is often necessary to absorb bursts in
transmission and perform statistical multiplexing of traffic, but
excessive queueing can lead to unwanted delay, reducing the
performance of some Internet applications.

RFC 2309 introduced the concept of "Active Queue Management" (AQM), a
class of technologies that, by signaling to common congestion-
controlled transports such as TCP, manages the size of queues that
build in network buffers.  RFC 2309 also describes a specific AQM
algorithm, Random Early Detection (RED), and recommends that this be
widely implemented and used by default in routers.

With an appropriate set of parameters, RED is an effective algorithm.
However, dynamically predicting this set of parameters was found to
be difficult.  As a result, RED has not been enabled by default, and
its present use in the Internet is limited.  Other AQM algorithms
have been developed since RFC2309 was published, some of which are
self-tuning within a range of applicability.
Hence, while this memo
continues to recommend the deployment of AQM, it no longer recommends
that RED or any other specific algorithm be used by default; instead,
it provides recommendations on how to select appropriate algorithms,
and recommends that a selected algorithm be able to automate any
required tuning for common deployment scenarios.

Deploying AQM in the network can significantly reduce the latency
across an Internet path, and since RFC2309 was written, this has
become a key motivation for using AQM in the Internet.  In the
context of AQM, it is useful to distinguish between two related
classes of algorithms: "queue management" versus "scheduling"
algorithms.  To a rough approximation, queue management algorithms
manage the length of packet queues by marking or dropping packets
when necessary or appropriate, while scheduling algorithms determine
which packet to send next and are used primarily to manage the
allocation of bandwidth among flows.  While these two mechanisms are
closely related, they address different performance issues and
operate on different timescales.  Both may be used in combination.

1.3.  Document Overview

The discussion in this memo applies to "best-effort" traffic, which
is to say, traffic generated by applications that accept the
occasional loss, duplication, or reordering of traffic in flight.  It
also applies to other traffic, such as real-time traffic that can
adapt its sending rate to reduce loss and/or delay.  It is most
effective when the adaptation occurs on time scales of a single Round
Trip Time (RTT) or a small number of RTTs, for elastic traffic
[RFC1633].

Two performance issues are highlighted:

The first issue is the need for an advanced form of queue management
that we call "Active Queue Management", AQM.  Section 2 summarizes
the benefits that active queue management can bring.
A number of AQM
procedures are described in the literature, with different
characteristics.  This document does not recommend any of them in
particular, but does make recommendations that ideally would affect
the choice of procedure used in a given implementation.

The second issue, discussed in Section 4 of this memo, is the
potential for future congestion collapse of the Internet due to flows
that are unresponsive, or not sufficiently responsive, to congestion
indications.  Unfortunately, while scheduling can mitigate some of
the side-effects of sharing a network queue with an unresponsive
flow, there is currently no consensus solution to controlling the
congestion caused by such aggressive flows.  Methods such as
congestion exposure (ConEx) [RFC6789] offer a framework [CONEX] that
can update network devices to alleviate these effects.  Significant
research and engineering will be required before any solution will be
available.  It is imperative that work to mitigate the impact of
unresponsive flows is energetically pursued, to ensure acceptable
performance and the future stability of the Internet.

Section 4 concludes the memo with a set of recommendations to the
Internet community on the use of AQM and recommendations for defining
AQM algorithms.

1.4.  Changes to the recommendations of RFC2309

This memo replaces the recommendations in [RFC2309], which resulted
from past discussions of end-to-end performance, Internet congestion,
and RED in the End-to-End Research Group of the Internet Research
Task Force (IRTF).  It follows experience with this and other
algorithms, and the AQM discussion within the IETF [AQM-WG].

Whereas RFC2309 described AQM in terms of the length of a queue, this
memo uses AQM to refer to any method that allows network devices to
control either the queue length and/or the mean time that a packet
spends in a queue.
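
The difference between controlling queue length and controlling the
time a packet spends in a queue can be sketched in a few lines of
code.  This is a hypothetical illustration only; the threshold values
and function names are invented for this sketch and are not part of
any recommended algorithm:

```python
import collections

# Hypothetical example thresholds; real AQM algorithms derive or
# adapt such values rather than hard-coding them.
LENGTH_THRESHOLD_PKTS = 100   # trigger based on queue length
DELAY_THRESHOLD_SEC = 0.005   # trigger based on sojourn time (5 ms)

queue = collections.deque()

def enqueue(packet, now):
    # Record the arrival time so the sojourn time can be measured
    # when the packet is later dequeued.
    queue.append((packet, now))

def dequeue(now):
    # Returns (packet, congested): 'congested' is True when either
    # the length-based or the delay-based condition indicates that
    # the AQM should signal congestion.
    packet, arrival = queue.popleft()
    sojourn = now - arrival
    congested = (len(queue) >= LENGTH_THRESHOLD_PKTS or
                 sojourn >= DELAY_THRESHOLD_SEC)
    return packet, congested
```

A length-based condition reacts only to the backlog in packets,
while the sojourn-time condition directly measures the quantity this
memo is equally concerned with: the mean time a packet spends queued.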

This memo also explicitly obsoletes the recommendation that Random
Early Detection (RED) was to be used as the default AQM mechanism for
the Internet.  This is replaced by a detailed set of recommendations
for selecting an appropriate AQM algorithm.  As in RFC2309, this memo
also motivates the need for continued research, but clarifies the
research with examples appropriate at the time that this memo is
published.

1.5.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

2.  The Need For Active Queue Management

Active Queue Management (AQM) is a method that allows network devices
to control the queue length or the mean time that a packet spends in
a queue.  Although AQM can be applied across a range of deployment
environments, the recommendations in this document are directed to
use in the general Internet.  It is expected that the principles and
guidance are also applicable to a wide range of environments, but may
require tuning for specific types of link/network (e.g., to
accommodate the traffic patterns found in data centres, the
challenges of wireless infrastructure, or the higher delay
encountered on satellite Internet links).  The remainder of this
section identifies the need for AQM and the advantages of deploying
AQM methods.

The traditional technique for managing the queue length in a network
device is to set a maximum length (in terms of packets) for each
queue, accept packets for the queue until the maximum length is
reached, then reject (drop) subsequent incoming packets until the
queue decreases because a packet from the queue has been transmitted.
This technique is known as "tail drop", since the packet that arrived
most recently (i.e., the one on the tail of the queue) is dropped
when the queue is full.  This method has served the Internet well for
years, but it has four important drawbacks:

1.  Full Queues

    The tail drop discipline allows queues to maintain a full (or,
    almost full) status for long periods of time, since tail drop
    signals congestion (via a packet drop) only when the queue has
    become full.  It is important to reduce the steady-state queue
    size, and this is perhaps the most important goal for queue
    management.

    The naive assumption might be that there is a simple tradeoff
    between delay and throughput, and that the recommendation that
    queues be maintained in a "non-full" state essentially translates
    to a recommendation that low end-to-end delay is more important
    than high throughput.  However, this does not take into account
    the critical role that packet bursts play in Internet
    performance.  For example, even though TCP constrains the
    congestion window of a flow, packets often arrive at network
    devices in bursts [Leland94].  If the queue is full or almost
    full, an arriving burst will cause multiple packets to be dropped
    from the same flow.  Bursts of loss can result in a global
    synchronization of flows throttling back, followed by a sustained
    period of lowered link utilization, reducing overall throughput
    [Flo94], [Zha90].

    The goal of buffering in the network is to absorb data bursts and
    to transmit them during the (hopefully) ensuing bursts of
    silence.  This is essential to permit transmission of bursts of
    data.  Normally small queues are preferred in network devices,
    with sufficient queue capacity to absorb the bursts.  The
    counter-intuitive result is that maintaining normally-small
    queues can result in higher throughput as well as lower end-to-
    end delay.

    In summary, queue limits should not reflect the steady state
    queues we want to be maintained in the network; instead, they
    should reflect the size of bursts that a network device needs to
    absorb.

2.  Lock-Out

    In some situations tail drop allows a single connection or a few
    flows to monopolize the queue space, starving other connections
    and preventing them from getting room in the queue [Flo92].

3.  Mitigating the Impact of Packet Bursts

    Large bursts of packets can delay other packets, disrupting the
    control loop (e.g., the pacing of flows by the TCP ACK-Clock),
    and reducing the performance of flows that share a common
    bottleneck.

4.  Control loop synchronization

    Congestion control, like other end-to-end mechanisms, introduces
    a control loop between hosts.  Sessions that share a common
    network bottleneck can therefore become synchronised, introducing
    periodic disruption (e.g., jitter/loss).  "Lock-out" is often
    also the result of synchronization or other timing effects.

Besides tail drop, two alternative queue management disciplines that
can be applied when a queue becomes full are "random drop on full"
and "head drop on full".  When a new packet arrives at a full queue
using the random drop on full discipline, the network device drops a
randomly selected packet from the queue (which can be an expensive
operation, since it naively requires an O(N) walk through the packet
queue).  When a new packet arrives at a full queue using the head
drop on full discipline, the network device drops the packet at the
front of the queue [Lakshman96].  Both of these solve the lock-out
problem, but neither solves the full-queues problem described above.

We know in general how to solve the full-queues problem for
"responsive" flows, i.e., those flows that throttle back in response
to congestion notification.
In the current Internet, dropped packets
provide a critical mechanism indicating congestion notification to
hosts.  The solution to the full-queues problem is for network
devices to drop or ECN-mark packets before a queue becomes full, so
that hosts can respond to congestion before buffers overflow.  We
call such a proactive approach AQM.  By dropping or ECN-marking
packets before buffers overflow, AQM allows network devices to
control when and how many packets to drop.

In summary, an active queue management mechanism can provide the
following advantages for responsive flows.

1.  Reduce number of packets dropped in network devices

    Packet bursts are an unavoidable aspect of packet networks
    [Willinger95].  If all the queue space in a network device is
    already committed to "steady state" traffic or if the buffer
    space is inadequate, then the network device will have no ability
    to buffer bursts.  By keeping the average queue size small, AQM
    will provide greater capacity to absorb naturally-occurring
    bursts without dropping packets.

    Furthermore, without AQM, more packets will be dropped when a
    queue does overflow.  This is undesirable for several reasons.
    First, with a shared queue and the tail drop discipline, this can
    result in unnecessary global synchronization of flows, resulting
    in lowered average link utilization, and hence lowered network
    throughput.  Second, unnecessary packet drops represent a waste
    of network capacity on the path before the drop point.

    While AQM can manage queue lengths and reduce end-to-end latency
    even in the absence of end-to-end congestion control, it will be
    able to reduce packet drops only in an environment that continues
    to be dominated by end-to-end congestion control.

2.  Provide a lower-delay interactive service

    By keeping a small average queue size, AQM will reduce the delays
    experienced by flows.
This is particularly important for
    interactive applications such as short web transfers, POP/IMAP,
    DNS, terminal traffic (telnet, ssh, mosh, RDP, etc.), gaming, or
    interactive audio-video sessions, whose subjective (and
    objective) performance is better when the end-to-end delay is
    low.

3.  Avoid lock-out behavior

    AQM can prevent lock-out behavior by ensuring that there will
    almost always be a buffer available for an incoming packet.  For
    the same reason, AQM can prevent a bias against low capacity, but
    highly bursty, flows.

    Lock-out is undesirable because it constitutes a gross unfairness
    among groups of flows.  However, we stop short of calling this
    benefit "increased fairness", because general fairness among
    flows requires per-flow state, which is not provided by queue
    management.  For example, in a network device using AQM with only
    FIFO scheduling, two TCP flows may receive very different shares
    of the network capacity simply because they have different round-
    trip times [Floyd91], and a flow that does not use congestion
    control may receive more capacity than a flow that does.  AQM can
    therefore be combined with a scheduling mechanism that divides
    network traffic between multiple queues (Section 2.1).

4.  Reduce the probability of control loop synchronization

    The probability of network control loop synchronization can be
    reduced if network devices introduce randomness in the AQM
    functions that trigger congestion avoidance at the sending host.

2.1.  AQM and Multiple Queues

A network device may use per-flow or per-class queuing with a
scheduling algorithm to either prioritize certain applications or
classes of traffic, limit the rate of transmission, or to provide
isolation between different traffic flows within a common class.
For
example, a router may maintain per-flow state to achieve general
fairness by a per-flow scheduling algorithm such as various forms of
Fair Queueing (FQ) [Dem90] [Sut99], including Weighted Fair Queuing
(WFQ), Stochastic Fairness Queueing (SFQ) [McK90], and Deficit Round
Robin (DRR) [Shr96], [Nic12], and/or a Class-Based Queue scheduling
algorithm such as CBQ [Floyd95].  Hierarchical queues may also be
used, e.g., as part of a Hierarchical Token Bucket (HTB) or
Hierarchical Fair Service Curve (HFSC) [Sto97].  These methods are
also used to realize a range of Quality of Service (QoS) behaviours
designed to meet the needs of traffic classes (e.g., using the
integrated or differentiated service models).

AQM is needed even for network devices that use per-flow or per-class
queuing, because scheduling algorithms by themselves do not control
the overall queue size or the size of individual queues.  AQM
mechanisms might need to control the overall queue sizes, to ensure
that arriving bursts can be accommodated without dropping packets.
AQM should also be used to control the queue size for each individual
flow or class, so that they do not experience unnecessarily high
delay.  Using a combination of AQM and scheduling between multiple
queues has been shown to offer good results in experimental and some
types of operational use.

In short, scheduling algorithms and queue management should be seen
as complementary, not as replacements for each other.

2.2.  AQM and Explicit Congestion Notification (ECN)

An AQM method may use Explicit Congestion Notification (ECN)
[RFC3168] instead of dropping to mark packets under mild or moderate
congestion.  ECN-marking can allow a network device to signal
congestion at a point before a transport experiences congestion loss
or additional queuing delay [ECN-Benefit].  Section 4.2.1 describes
some of the benefits of using ECN with AQM.
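
The mark-instead-of-drop decision can be sketched as follows.  This
is an illustrative fragment, assuming the AQM algorithm has already
decided that a given packet should carry a congestion signal; the
codepoint values are those defined by RFC 3168, while the function
name is invented for this sketch:

```python
# ECN field codepoints in the IP header (RFC 3168).
NOT_ECT = 0b00  # transport is not ECN-capable
ECT_1 = 0b01    # ECN-Capable Transport
ECT_0 = 0b10    # ECN-Capable Transport
CE = 0b11       # Congestion Experienced

def signal_congestion(ecn_field):
    # Given the ECN field of a packet that the AQM has selected to
    # carry a congestion signal, return the action to take: packets
    # from ECN-capable transports are marked CE instead of dropped.
    if ecn_field in (ECT_0, ECT_1, CE):
        return ("mark", CE)
    return ("drop", None)
```

A Not-ECT packet must still be dropped, since a drop is the only
congestion signal its transport can observe.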

2.3.  AQM and Buffer Size

It is important to differentiate between the choice of buffer size
for a queue in a switch/router or other network device, and the
threshold(s) and other parameters that determine how and when an AQM
algorithm operates.  The optimum buffer size is a function of
operational requirements and should generally be sized to be
sufficient to buffer the largest normal traffic burst that is
expected.  This size depends on the number and burstiness of traffic
arriving at the queue and the rate at which traffic leaves the queue.

One objective of AQM is to minimize the effect of lock-out, where one
flow prevents other flows from effectively gaining capacity.  This
need can be illustrated by a simple example of drop-tail queuing when
a new TCP flow injects packets into a queue that happens to be almost
full.  A TCP flow's congestion control algorithm [RFC5681] increases
the flow rate to maximize its effective window.  This builds a queue
in the network, inducing latency for the flow and for other flows
that share this queue.  Once a drop-tail queue fills, there will also
be loss.  A new flow, sending its initial burst, has an enhanced
probability of filling the remaining queue and dropping packets.  As
a result, the new flow can be prevented from effectively sharing the
queue for a period of many RTTs.  In contrast, AQM can minimize the
mean queue depth, thereby reducing the probability that competing
sessions can materially prevent each other from performing well.

AQM frees a designer from having to limit the buffer space assigned
to a queue to achieve acceptable performance, allowing allocation of
sufficient buffering to satisfy the needs of the particular traffic
pattern.  Different types of traffic and deployment scenarios will
lead to different requirements.
The choice of AQM algorithm and
associated parameters is therefore a function of the way in which
congestion is experienced and the required reaction to achieve
acceptable performance.  This latter is the primary topic of the
following sections.

3.  Managing Aggressive Flows

One of the keys to the success of the Internet has been the
congestion avoidance mechanisms of TCP.  Because TCP "backs off"
during congestion, a large number of TCP connections can share a
single, congested link in such a way that link bandwidth is shared
reasonably equitably among similarly situated flows.  The equitable
sharing of bandwidth among flows depends on all flows running
compatible congestion avoidance algorithms, i.e., methods conformant
with the current TCP specification [RFC5681].

In this document a flow is known as "TCP-friendly" when it has a
congestion response that approximates the average response expected
of a TCP flow.  One example of a TCP-friendly scheme is the
TCP-Friendly Rate Control algorithm [RFC5348].  In this document, the
term is used more generally to describe this and other algorithms
that meet these goals.

There are a variety of types of network flow.  Some convenient
classes that describe flows are: (1) TCP-friendly flows, (2)
unresponsive flows, i.e., flows that do not slow down when congestion
occurs, and (3) flows that are responsive but are less responsive to
congestion than TCP.  The last two classes contain more aggressive
flows that can pose significant threats to Internet performance.

1.  TCP-Friendly flows

    A TCP-friendly flow responds to congestion notification within a
    small number of path Round Trip Times (RTT), and in steady-state
    it uses no more capacity than a conformant TCP running under
    comparable conditions (drop rate, RTT, packet size, etc.).  This
    is described in the remainder of the document.

2.
Non-Responsive Flows 550 A flow that does not adjust its rate in response to congestion 551 notification within a small number of path RTTs can also use 552 more capacity than a conformant TCP running under comparable 553 conditions. There is a growing set of applications whose 554 congestion avoidance algorithms are inadequate or nonexistent 555 (i.e., a flow that does not throttle its sending rate when it 556 experiences congestion). 558 The User Datagram Protocol (UDP) [RFC0768] provides a minimal, 559 best-effort transport to applications and upper-layer protocols 560 (both simply called "applications" in the remainder of this 561 document) and does not itself provide mechanisms to prevent 562 congestion collapse and establish a degree of fairness [RFC5405]. 563 Examples that use UDP include some streaming applications for 564 packet voice and video, and some multicast bulk data transport. 565 Other traffic, when aggregated, may also become unresponsive to 566 congestion notification. If no action is taken, such 567 unresponsive flows could lead to a new congestion collapse 568 [RFC2914]. Some applications can even increase their traffic 569 volume in response to congestion (e.g., by adding forward error 570 correction when loss is experienced), with the possibility that 571 they contribute to congestion collapse. 573 In general, applications need to incorporate effective congestion 574 avoidance mechanisms [RFC5405]. Research continues to be needed 575 to identify and develop ways to accomplish congestion avoidance 576 for presently unresponsive applications. Network devices need to 577 be able to protect themselves against unresponsive flows, and 578 mechanisms to accomplish this must be developed and deployed. 579 Deployment of such mechanisms would provide an incentive for all 580 applications to become responsive by either using a congestion- 581 controlled transport (e.g., TCP, SCTP [RFC4960], and DCCP 582 [RFC4340])
or by incorporating their own congestion control in 583 the application [RFC5405], [RFC6679]. 585 3. Transport Flows that are less responsive than TCP 587 A second threat is posed by transport protocol implementations 588 that are responsive to congestion, but, either deliberately or 589 through faulty implementation, reduce less than a TCP flow would 590 have done in response to congestion. This covers a spectrum of 591 behaviours between (1) and (2). If applications are not 592 sufficiently responsive to congestion signals, they may gain an 593 unfair share of the available network capacity. 595 For example, the popularity of the Internet has caused a 596 proliferation in the number of TCP implementations. Some of 597 these may fail to implement the TCP congestion avoidance 598 mechanisms correctly because of poor implementation. Others may 599 deliberately be implemented with congestion avoidance algorithms 600 that are more aggressive in their use of capacity than other TCP 601 implementations; this would allow a vendor to claim to have a 602 "faster TCP". The logical consequence of such implementations 603 would be a spiral of increasingly aggressive TCP implementations, 604 leading back to the point where there is effectively no 605 congestion avoidance and the Internet is chronically congested. 607 Another example could be an RTP/UDP video flow that uses an 608 adaptive codec, but responds incompletely to indications of 609 congestion or responds over an excessively long time period. 610 Such flows are unlikely to be responsive to congestion signals in 611 a timeframe comparable to a small number of end-to-end 612 transmission delays. However, over a longer timescale, perhaps 613 seconds in duration, they could moderate their speed, or increase 614 their speed if they determine capacity to be available. 616 Tunneled traffic aggregates carrying multiple (short) TCP flows 617 can be more aggressive than standard bulk TCP. 
Applications 618 (e.g., web browsers primarily supporting HTTP 1.1 and peer-to- 619 peer file-sharing) have exploited this by opening multiple 620 connections to the same endpoint. 622 Lastly, some applications (e.g., web browsers primarily 623 supporting HTTP 1.1) open a large number of successive short TCP 624 flows for a single session. This can lead to each individual 625 flow spending the majority of time in the exponential TCP slow- 626 start phase, rather than in TCP congestion avoidance. The 627 resulting traffic aggregate can therefore be much less responsive 628 than a single standard TCP flow. 630 The projected increase in the fraction of total Internet traffic for 631 more aggressive flows in classes 2 and 3 could pose a threat to the 632 performance of the future Internet. There is therefore an urgent 633 need for measurements of current conditions and for further research 634 into the ways of managing such flows. This raises many difficult 635 issues in finding methods with an acceptable overhead cost that can 636 identify and isolate unresponsive flows or flows that are less 637 responsive than TCP. Finally, there is as yet little measurement or 638 simulation evidence available about the rate at which these threats 639 are likely to be realized, or about the expected benefit of 640 algorithms for managing such flows. 642 Another topic requiring consideration is the appropriate granularity 643 of a "flow" when considering a queue management method. There are a 644 few "natural" answers: 1) a transport (e.g., TCP or UDP) flow (source 645 address/port, destination address/port, protocol); 2) Differentiated 646 Services Code Point, DSCP; 3) a source/destination host pair (IP 647 address); 4) a given source host or a given destination host, or 648 various combinations of the above; 5) a subscriber or site receiving 649 the Internet service (enterprise or residential).
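The candidate flow granularities listed above can be sketched as alternative aggregation keys (a hypothetical illustration; the field names and function are not taken from this memo):

```python
def flow_key(pkt, granularity):
    """Return the aggregation key for a packet under several candidate
    flow granularities (sketch; packet field names are hypothetical)."""
    if granularity == "transport":   # (1) a transport flow: the 5-tuple
        return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
    if granularity == "dscp":        # (2) Differentiated Services Code Point
        return pkt["dscp"]
    if granularity == "host-pair":   # (3) source/destination host pair
        return (pkt["src"], pkt["dst"])
    raise ValueError("unknown granularity: " + granularity)

pkt = {"src": "192.0.2.1", "sport": 5000, "dst": "198.51.100.2",
       "dport": 443, "proto": "tcp", "dscp": 0}
print(flow_key(pkt, "host-pair"))  # ('192.0.2.1', '198.51.100.2')
```

Coarser keys aggregate more traffic per queue-management state entry; finer keys isolate individual flows at higher cost, which is part of the policy trade-off discussed next.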
651 The source/destination host pair gives an appropriate granularity in 652 many circumstances. However, different vendors/providers use 653 different granularities for defining a flow (as a way of 654 "distinguishing" themselves from one another), and different 655 granularities may be chosen for different places in the network. It 656 may be the case that the granularity is less important than the fact 657 that a network device needs to be able to deal with more unresponsive 658 flows at *some* granularity. The granularity of flows for congestion 659 management is, at least in part, a question of policy that needs to 660 be addressed in the wider IETF community. 662 4. Conclusions and Recommendations 664 The IRTF, in publishing [RFC2309], and the IETF in subsequent 665 discussion, have developed a set of specific recommendations regarding 666 the implementation and operational use of AQM procedures. The 667 recommendations provided by this document are summarised as: 669 1. Network devices SHOULD implement some AQM mechanism to manage 670 queue lengths, reduce end-to-end latency, and avoid lock-out 671 phenomena within the Internet. 673 2. Deployed AQM algorithms SHOULD support Explicit Congestion 674 Notification (ECN) as well as loss to signal congestion to 675 endpoints. 677 3. AQM algorithms SHOULD NOT require tuning of initial or 678 configuration parameters in common use cases. 680 4. AQM algorithms SHOULD respond to measured congestion, not 681 application profiles. 683 5. AQM algorithms SHOULD NOT interpret specific transport protocol 684 behaviours. 686 6. Transport protocol congestion control algorithms SHOULD maximize 687 their use of available capacity (when there is data to send) 688 without incurring undue loss or undue round trip delay. 690 7.
Research, engineering, and measurement efforts are needed 691 regarding the design of mechanisms to deal with flows that are 692 unresponsive to congestion notification or are responsive, but 693 are more aggressive than present TCP. 695 These recommendations are expressed using the word "SHOULD". This is 696 in recognition that there may be use cases that have not been 697 envisaged in this document in which the recommendation does not 698 apply. Therefore, care should be taken in concluding that one's use 699 case falls in that category; during the life of the Internet, such 700 use cases have been rarely, if ever, observed and reported. To the 701 contrary, available research [Choi04] says that even high-speed links 702 in network cores that are normally very stable in depth and behavior 703 experience occasional issues that need moderation. The 704 recommendations are detailed in the following sections. 706 4.1. Operational deployments SHOULD use AQM procedures 708 AQM procedures are designed to minimize the delay and buffer 709 exhaustion induced in the network by queues that have filled as a 710 result of host behavior. Marking and loss behaviors provide a signal 711 that buffers within network devices are becoming unnecessarily full, 712 and that the sender would do well to moderate its behavior. 714 The use of scheduling mechanisms, such as priority queuing, classful 715 queuing, and fair queuing, is often effective in networks to help a 716 network serve the needs of a range of applications. Network 717 operators can use these methods to manage traffic passing a choke 718 point. This is discussed in [RFC2474] and [RFC2475]. When 719 scheduling is used, AQM should be applied across the classes or flows 720 as well as within each class or flow: 722 o AQM mechanisms need to control the overall queue sizes, to ensure 723 that arriving bursts can be accommodated without dropping packets.
725 o AQM mechanisms need to allow combination with other mechanisms, 726 such as scheduling, to allow implementation of policies for 727 providing fairness between different flows. 729 o AQM should be used to control the queue size for each individual 730 flow or class, so that they do not experience unnecessarily high 731 delay. 733 4.2. Signaling to the transport endpoints 735 There are a number of ways a network device may signal to the end 736 point that the network is becoming congested and trigger a reduction 737 in rate. The signalling methods include: 739 o Delaying transport segments (packets) in flight, such as in a 740 queue. 742 o Dropping transport segments (packets) in transit. 744 o Marking transport segments (packets), such as using Explicit 745 Congestion Notification (ECN) [RFC3168] [RFC4301] [RFC4774] [RFC6040] 746 [RFC6679]. 748 Increased network latency is used as an implicit signal of 749 congestion. For example, in TCP, additional delay can affect ACK clocking 750 and has the result of reducing the rate of transmission of new data. 751 In the Real Time Protocol (RTP), network latency impacts the RTCP- 752 reported RTT, and increased latency can trigger a sender to adjust its 753 rate. Methods such as Low Extra Delay Background Transport (LEDBAT) 754 [RFC6817] assume increased latency as a primary signal of congestion. 755 Appropriate use of delay-based methods and the implications of AQM 756 presently remain an area for further research. 758 It is essential that all Internet hosts respond to loss [RFC5681], 759 [RFC5405], [RFC4960], [RFC4340]. Packet dropping by network devices that 760 are under load has two effects: it protects the network, which is the 761 primary reason that network devices drop packets.
The detection of 762 loss also provides a signal to a reliable transport (e.g., TCP, SCTP) 763 that there is potential congestion using a pragmatic heuristic: "when 764 the network discards a message in flight, it may imply the presence 765 of faulty equipment or media in a path, and it may imply the presence 766 of congestion. To be conservative, a transport must assume it may be 767 the latter." Applications using unreliable transports (e.g., using 768 UDP) need to react similarly to loss [RFC5405]. 770 Network devices SHOULD use an AQM algorithm to measure local 771 congestion and to determine the packets to mark or drop so that the 772 congestion is managed. 774 In general, dropping multiple packets from the same session in the 775 same RTT is ineffective, and can reduce throughput. Also, dropping 776 or marking packets from multiple sessions simultaneously can have the 777 effect of synchronizing them, resulting in increasing peaks and 778 troughs in the subsequent traffic load. Hence, AQM algorithms SHOULD 779 randomize dropping in time, to reduce the probability that congestion 780 indications are only experienced by a small proportion of the active 781 flows. 783 Loss due to dropping also has an effect on the efficiency of a flow 784 and can significantly impact some classes of application. In 785 reliable transports, the dropped data must be subsequently 786 retransmitted. While other applications/transports may adapt to the 787 absence of lost data, this still implies inefficient use of available 788 capacity, and the dropped traffic can affect other flows. Hence, 789 congestion signalling by loss is not entirely positive; it is a 790 necessary evil. 792 4.2.1. AQM and ECN 794 Explicit Congestion Notification (ECN) [RFC4301] [RFC4774] [RFC6040] 795 [RFC6679] is a network-layer function that allows a transport to 796 receive network congestion information from a network device without 797 incurring the unintended consequences of loss.
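One way an AQM might randomize its drop decision in time, and prefer CE-marking for ECN-capable traffic, is illustrated by the following RED-style sketch (not a recommended or standardized algorithm; thresholds and names are hypothetical):

```python
import random

def aqm_decision(avg_queue, min_th, max_th, max_p, ecn_capable,
                 rng=random.random):
    """RED-style sketch: as the smoothed queue depth grows between two
    thresholds, mark (if ECN-capable) or drop with rising probability."""
    if avg_queue < min_th:
        return "forward"
    if avg_queue >= max_th:
        return "mark" if ecn_capable else "drop"
    # Probability grows linearly between the thresholds, so congestion
    # signals are spread randomly in time across the active flows.
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng() < p:
        return "mark" if ecn_capable else "drop"
    return "forward"

print(aqm_decision(2, 5, 15, 0.1, ecn_capable=False))  # forward
print(aqm_decision(20, 5, 15, 0.1, ecn_capable=True))  # mark
```

Because the decision is probabilistic rather than tied to a full buffer, congestion indications are distributed across flows instead of synchronizing them, in line with the recommendation above.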
ECN includes both 798 transport mechanisms and functions implemented in network devices; 799 the latter rely upon AQM to decide whether and when to ECN- 800 mark. 802 Congestion for ECN-capable transports is signalled by a network 803 device setting the "Congestion Experienced (CE)" codepoint in the IP 804 header. This codepoint is noted by the remote receiving end point 805 and signalled back to the sender using a transport protocol 806 mechanism, allowing the sender to trigger timely congestion control. 807 The decision to set the CE codepoint requires an AQM algorithm 808 configured with a threshold. Non-ECN-capable flows (the default) are 809 dropped under congestion. 811 Network devices SHOULD use an AQM algorithm that marks ECN-capable 812 traffic when making decisions about the response to congestion. 814 Network devices need to implement this method by marking ECN-capable 815 traffic or by dropping non-ECN-capable traffic. 817 Safe deployment of ECN requires that network devices drop excessive 818 traffic, even when marked as originating from an ECN-capable 819 transport. This is a necessary safety precaution because: 821 1. A non-conformant, broken, or malicious receiver could conceal an 822 ECN mark and not report this to the sender; 824 2. A non-conformant, broken, or malicious sender could ignore a 825 reported ECN mark, as it could ignore a loss without using ECN; 827 3. A malfunctioning or non-conforming network device may "hide" an 828 ECN mark (or fail to correctly set the ECN codepoint at an egress 829 of a network tunnel). 831 In normal operation, such cases should be very uncommon; however, 832 overload protection is desirable to protect traffic from 833 misconfigured or malicious use of ECN (e.g., a denial-of-service 834 attack that generates ECN-capable traffic that is unresponsive to CE- 835 marking). 837 An AQM algorithm that supports ECN needs to define the threshold and 838 algorithm for ECN-marking.
This threshold MAY differ from that used 839 for dropping packets that are not marked as ECN-capable, and SHOULD 840 be configurable. 842 Network devices SHOULD use an algorithm to drop excessive traffic 843 (e.g., at some level above the threshold for CE-marking), even when 844 the packets are marked as originating from an ECN-capable transport. 846 4.3. AQM algorithms deployed SHOULD NOT require operational tuning 848 A number of AQM algorithms have been proposed. Many require some 849 form of tuning or setting of parameters for initial network 850 conditions. This can make these algorithms difficult to use in 851 operational networks. 853 AQM algorithms need to consider both "initial conditions" and 854 "operational conditions". The former includes values that exist 855 before any experience is gathered about the use of the algorithm, 856 such as the configured speed of the interface, support for full-duplex 857 communication, the interface MTU, and other properties of the link. The 858 latter includes information observed from monitoring the size of the 859 queue, experienced queueing delay, rate of packet discard, etc. 861 This document therefore specifies that AQM algorithms that are 862 proposed for deployment in the Internet have the following 863 properties: 865 o AQM algorithm deployment SHOULD NOT require tuning in common use 866 cases. An algorithm needs to provide a default behaviour that 867 auto-tunes to a reasonable performance for typical network 868 operational conditions. This is expected to ease deployment and 869 operation. Initial conditions, such as the interface rate and MTU 870 size or other values derived from these, MAY be required by an AQM 871 algorithm. 873 o An algorithm MAY support further manual tuning that could improve performance 874 in a specific deployed network. Algorithms that lack such 875 variables are acceptable, but if such variables exist, they SHOULD 876 be externalized (made visible to the operator).
Guidance needs to 877 be provided on the cases where auto-tuning is unlikely to achieve 878 acceptable performance and to identify the set of parameters that 879 can be tuned. For example, the expected response of an algorithm 880 may need to be configured to accommodate the largest expected path 881 RTT, since this value cannot be known at initialization. This 882 guidance is expected to enable the algorithm to be deployed in 883 networks that have specific characteristics (paths with variable/ 884 larger delay; networks where capacity is impacted by interactions 885 with lower-layer mechanisms, etc.). 887 o An algorithm MAY provide logging and alarm signals to assist in identifying if 888 it is functioning as 889 expected under manual or auto-tuning (e.g., this could be based on an internal consistency 890 check between input, output, and mark/drop rates over time). This 891 is expected to encourage deployment by default and allow operators 892 to identify potential interactions with other network functions. 894 Hence, self-tuning algorithms are to be preferred. Algorithms 895 recommended for general Internet deployment by the IETF need to be 896 designed so that they do not require operational (especially manual) 897 configuration or tuning. 899 4.4. AQM algorithms SHOULD respond to measured congestion, not 900 application profiles. 902 Not all applications transmit packets of the same size. Although 903 applications may be characterized by particular profiles of packet 904 size, this should not be used as the basis for AQM (see next section). 905 Other methods exist, e.g., Differentiated Services queueing and Pre- 906 Congestion Notification (PCN) [RFC5559], that can be used to 907 differentiate and police classes of application. Network devices may 908 combine AQM with these traffic classification mechanisms and perform 909 AQM only on specific queues within a network device.
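The combination of traffic classification with per-queue AQM can be sketched as follows (illustrative only; the class, names, and depth test below are hypothetical placeholders, not a real AQM algorithm):

```python
from collections import defaultdict

class TinyAqmQueue:
    """A stand-in for one AQM-managed queue; the simple depth test is
    a placeholder for a real AQM algorithm's mark/drop decision."""
    def __init__(self, limit=20):
        self.limit = limit
        self.queue = []

    def enqueue(self, packet):
        if len(self.queue) >= self.limit:
            return "drop"
        self.queue.append(packet)
        return "queued"

# Classify on the DSCP, then run an independent AQM instance per queue,
# so congestion in one traffic class is managed separately from others.
queues = defaultdict(TinyAqmQueue)
for dscp in (0, 46, 0):
    queues[dscp].enqueue({"dscp": dscp})
print({dscp: len(q.queue) for dscp, q in queues.items()})  # {0: 2, 46: 1}
```

Each queue carries only the traffic selected by the classifier, so the AQM instance attached to it responds to the congestion actually measured for that class.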
911 An AQM algorithm should not deliberately try to prejudice the size of 912 packet that performs best (i.e., preferentially drop/mark based only 913 on packet size). Procedures for selecting packets to mark/drop 914 SHOULD observe the actual or projected time that a packet is in a 915 queue (bytes at a rate being an analog to time). When an AQM 916 algorithm decides whether to drop (or mark) a packet, it is 917 RECOMMENDED that the size of the particular packet should not be 918 taken into account [RFC7141]. 920 Applications (or transports) generally know the packet size that they 921 are using and can hence make their judgments about whether to use 922 small or large packets based on the data they wish to send and the 923 expected impact on the delay or throughput, or other performance 924 parameter. When a transport or application responds to a dropped or 925 marked packet, the size of the rate reduction should be proportionate 926 to the size of the packet that was sent [RFC7141]. 928 An AQM-enabled system MAY instantiate different instances of an AQM 929 algorithm to be applied within the same traffic class. Traffic 930 classes may be differentiated based on an Access Control List (ACL), 931 the packet Differentiated Services Code Point (DSCP) [RFC2474], 932 enabling use of the ECN field (i.e., any of ECT(0), ECT(1), or 933 CE) [RFC3168] [RFC4774], a multi-field (MF) classifier that combines 934 the values of a set of protocol fields (e.g., IP address, transport, 935 ports), or an equivalent codepoint at a lower layer. This 936 recommendation goes beyond what is defined in RFC 3168, by allowing 937 that an implementation MAY use more than one instance of an AQM 938 algorithm to handle both ECN-capable and non-ECN-capable packets. 940 4.5.
AQM algorithms SHOULD NOT be dependent on specific transport 941 protocol behaviours 943 In deploying AQM, network devices need to support a range of Internet 944 traffic and SHOULD NOT make implicit assumptions about the 945 characteristics desired by the set of transports/applications the 946 network supports. That is, AQM methods should be opaque to the 947 choice of transport and application. 949 AQM algorithms are often evaluated by considering TCP [RFC0793] with 950 a limited number of applications. Although TCP is the predominant 951 transport in the Internet today, this no longer represents a 952 sufficient selection of traffic for verification. There is 953 significant use of UDP [RFC0768] in voice and video services, and 954 some applications find utility in SCTP [RFC4960] and DCCP [RFC4340]. 955 Hence, AQM algorithms should also demonstrate operation with 956 transports other than TCP and need to consider a variety of 957 applications. Selection of AQM algorithms also needs to consider use 958 of tunnel encapsulations that may carry traffic aggregates. 960 AQM algorithms SHOULD NOT target or derive implicit assumptions about 961 the characteristics desired by specific transports/applications. 962 Transports and applications need to respond to the congestion signals 963 provided by AQM (i.e., dropping or ECN-marking) in a timely manner 964 (within a few RTTs at the latest). 966 4.6. Interactions with congestion control algorithms 968 Applications and transports need to react to received implicit or 969 explicit signals that indicate the presence of congestion. This 970 section identifies issues that can impact the design of transport 971 protocols when using paths that use AQM. 973 Transport protocols and applications need timely signals of 974 congestion. The time taken to detect and respond to congestion is 975 increased when network devices queue packets in buffers.
It can be 976 difficult to detect tail losses at a higher layer, and this may 977 sometimes require transport timers or probe packets to detect and 978 respond to such loss. Loss patterns may also impact timely 979 detection, e.g., the time may be reduced when network devices do not 980 drop long runs of packets from the same flow. 982 A common objective of an elastic transport congestion control 983 protocol is to allow an application to deliver the maximum rate of 984 data without inducing excessive delays when packets are queued in 985 buffers within the network. To achieve this, a transport should try 986 to operate at a rate below the inflexion point of the load/delay curve 987 (the bend of what is sometimes called a "hockey-stick" curve) 988 [Jain94]. When the congestion window allows the load to approach 989 this bend, the end-to-end delay starts to rise - a result of 990 congestion, as packets probabilistically arrive at non-overlapping 991 times. On the one hand, a transport that operates above this point 992 can experience congestion loss and could also trigger operator 993 activities, such as those discussed in [RFC6057]. On the other hand, 994 a flow may achieve both near-maximum throughput and low latency when 995 it operates close to this knee point, with minimal contribution to 996 router congestion. Choice of an appropriate rate/congestion window 997 can therefore significantly impact the loss and delay experienced by 998 a flow and will impact other flows that share a common network queue. 1000 Some applications may send less than permitted by the congestion 1001 control window (or rate). Examples include multimedia codecs that 1002 stream at some natural rate (or set of rates) or an application that 1003 is naturally interactive (e.g., some web applications, interactive 1004 server-based gaming, transaction-based protocols). Such applications 1005 may have different objectives.
They may not wish to maximize 1006 throughput, but may desire a lower loss rate or bounded delay. 1008 The correct operation of an AQM-enabled network device MUST NOT rely 1009 upon specific transport responses to congestion signals. 1011 4.7. The need for further research 1013 The second recommendation of [RFC2309] called for further research 1014 into the interaction between network queues and host applications, 1015 and the means of signaling between them. This research has occurred, 1016 and we as a community have learned a lot. However, we are not done. 1018 We have learned that the problems of congestion, latency, and buffer- 1019 sizing have not gone away, and are becoming more important to many 1020 users. A number of self-tuning AQM algorithms have been found that 1021 offer significant advantages for deployed networks. There is also 1022 renewed interest in deploying AQM and the potential of ECN. 1024 Traffic patterns can depend on the network deployment scenario, and 1025 Internet research therefore needs to consider the implications of a 1026 diverse range of application interactions. At the time of writing 1027 (in 2015), an obvious example of further research is the need to 1028 consider the many-to-one communication patterns found in data 1029 centers, known as incast [Ren12] (e.g., produced by Map/Reduce 1030 applications). Such analysis needs to study not only each 1031 application traffic type, but should also include combinations of 1032 types of traffic. 1034 Research also needs to consider the need to extend our taxonomy of 1035 transport sessions to include not only "mice" and "elephants", but 1036 also "lemmings", where "lemmings" are flash crowds of "mice" that the 1037 network inadvertently tries to signal to as if they were elephant 1038 flows, resulting in head-of-line blocking in a data center deployment 1039 scenario. 1041 Examples of other required research include: 1043 o Research into new AQM and scheduling algorithms.
1045 o Appropriate use of delay-based methods and the implications of 1046 AQM. 1048 o Research into suitable algorithms for marking ECN-capable packets 1049 that do not require operational configuration or tuning for common 1050 use. 1052 o Experience in the deployment of ECN alongside AQM. 1054 o Tools for enabling AQM (and ECN) deployment and measuring the 1055 performance. 1057 o Methods for mitigating the impact of non-conformant and malicious 1058 flows. 1060 o Research to understand the implications of using new network and 1061 transport methods on applications. 1063 This document therefore reiterates the call of RFC 2309: we 1064 need continuing research as applications develop. 1066 5. IANA Considerations 1068 This memo asks the IANA for no new parameters. 1070 6. Security Considerations 1072 While security is a very important issue, it is largely orthogonal to 1073 the performance issues discussed in this memo. 1075 This recommendation requires algorithms to be independent of specific 1076 transport or application behaviors. Therefore, a network device does 1077 not require visibility or access to upper-layer protocol information 1078 to implement an AQM algorithm. This ability to operate in an 1079 application-agnostic fashion is therefore an example of a privacy- 1080 enhancing feature. 1082 Many deployed network devices use queueing methods that allow 1083 unresponsive traffic to capture network capacity, denying access to 1084 other traffic flows. This could potentially be used as a denial-of- 1085 service attack. This threat could be reduced in network devices that 1086 deploy AQM or some form of scheduling. We note, however, that a 1087 denial-of-service attack that results in unresponsive traffic flows 1088 may be indistinguishable from other traffic flows (e.g., tunnels 1089 carrying aggregates of short flows, high-rate isochronous 1090 applications).
New methods therefore may remain vulnerable, and this 1091 document recommends that ongoing research should consider ways to 1092 mitigate such attacks. 1094 7. Privacy Considerations 1096 This document, by itself, presents no new privacy issues. 1098 8. Acknowledgements 1100 The original version of this document describing best current 1101 practice was based on the informational text of [RFC2309]. This was 1102 written by the End-to-End Research Group, which is to say Bob Braden, 1103 Dave Clark, Jon Crowcroft, Bruce Davie, Steve Deering, Deborah 1104 Estrin, Sally Floyd, Van Jacobson, Greg Minshall, Craig Partridge, 1105 Larry Peterson, KK Ramakrishnan, Scott Shenker, John Wroclawski, and 1106 Lixia Zhang. Although there are important differences, many of the 1107 key arguments in the present document remain unchanged from those in 1108 RFC 2309. 1110 The need for an updated document was agreed to in the tsvarea meeting 1111 at IETF 86. This document was reviewed on the aqm@ietf.org list. 1112 Comments were received from Colin Perkins, Richard Scheffenegger, 1113 Dave Taht, John Leslie, David Collier-Brown and many others. 1115 Gorry Fairhurst was in part supported by the European Community under 1116 its Seventh Framework Programme through the Reducing Internet 1117 Transport Latency (RITE) project (ICT-317700). 1119 9. References 1121 9.1. Normative References 1123 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1124 Requirement Levels", BCP 14, RFC 2119, March 1997. 1126 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1127 of Explicit Congestion Notification (ECN) to IP", RFC 1128 3168, September 2001. 1130 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1131 Internet Protocol", RFC 4301, December 2005. 1133 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 1134 Explicit Congestion Notification (ECN) Field", BCP 124, 1135 RFC 4774, November 2006. 1137 [RFC5405] Eggert, L. and G. 
Fairhurst, "Unicast UDP Usage Guidelines 1138 for Application Designers", BCP 145, RFC 5405, November 1139 2008. 1141 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1142 Control", RFC 5681, September 2009. 1144 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 1145 Notification", RFC 6040, November 2010. 1147 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 1148 and K. Carlberg, "Explicit Congestion Notification (ECN) 1149 for RTP over UDP", RFC 6679, August 2012. 1151 [RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion 1152 Notification", BCP 41, RFC 7141, February 2014. 1154 9.2. Informative References 1156 [AQM-WG] "IETF AQM WG", . 1158 [Bri15] Briscoe, Bob., Brunstrom, Anna., Petlund, Andreas., Hayes, 1159 David., Ros, David., Tsang, Ing-Jyh., Gjessing, Stein., 1160 Fairhurst, Gorry., Griwodz, Carsten., and Michael. Welzl, 1161 "Reducing Internet Latency: A Survey of Techniques and 1162 their Merit, IEEE Communications Surveys & Tutorials", 1163 2015. 1165 [CONEX] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) 1166 Concepts, Abstract Mechanism and Requirements", IETF 1167 (Work-in-Progress) draft-ietf-conex-abstract-mech, March 1168 2014. 1170 [Choi04] Choi, Baek-Young., Moon, Sue., Zhang, Zhi-Li., 1171 Papagiannaki, K., and C. Diot, "Analysis of Point-To-Point 1172 Packet Delay In an Operational Network", March 2004. 1174 [Dem90] Demers, A., Keshav, S., and S. Shenker, "Analysis and 1175 Simulation of a Fair Queueing Algorithm, Internetworking: 1176 Research and Experience", SIGCOMM Symposium proceedings on 1177 Communications architectures and protocols , 1990. 1179 [ECN-Benefit] 1180 Welzl, M. and G. Fairhurst, "The Benefits to Applications 1181 of using Explicit Congestion Notification (ECN)", IETF 1182 (Work-in-Progress) , February 2014. 1184 [Flo92] Floyd, S. and V. Jacobson, "On Traffic Phase Effects in 1185 Packet-Switched Gateways", 1992. 1187 [Flo94] Floyd, S. and V.
Jacobson, "The Synchronization of 1188 Periodic Routing Messages", 1189 http://ee.lbl.gov/papers/sync_94.pdf, 1994. 1191 [Floyd91] Floyd, S., "Connections with Multiple Congested Gateways 1192 in Packet-Switched Networks Part 1: One-way Traffic", 1193 Computer Communications Review, October 1991. 1195 [Floyd95] Floyd, S. and V. Jacobson, "Link-sharing and Resource 1196 Management Models for Packet Networks", IEEE/ACM 1197 Transactions on Networking, August 1995. 1199 [Jacobson88] 1200 Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 1201 Symposium proceedings on Communications architectures and 1202 protocols, August 1988. 1204 [Jain94] Jain, R., Ramakrishnan, K., and D. Chiu, 1205 "Congestion avoidance scheme for computer networks", US 1206 Patent Office 5377327, December 1994. 1208 [Lakshman96] 1209 Lakshman, TV., Neidhardt, A., and T. Ott, "The Drop From 1210 Front Strategy in TCP Over ATM and Its Interworking with 1211 Other Control Features", IEEE INFOCOM, 1996. 1213 [Leland94] 1214 Leland, W., Taqqu, M., Willinger, W., and D. Wilson, "On 1215 the Self-Similar Nature of Ethernet Traffic (Extended 1216 Version)", IEEE/ACM Transactions on Networking, February 1217 1994. 1219 [McK90] McKenney, PE. and G. Varghese, "Stochastic Fairness 1220 Queuing", 1221 http://www2.rdrop.com/~paulmck/scalability/paper/ 1222 sfq.2002.06.04.pdf, 1990. 1224 [Nic12] Nichols, K., "Controlling Queue Delay", Communications of 1225 the ACM Vol. 55 No. 11, pp. 42-50, July 2012. 1227 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 1228 August 1980. 1230 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1231 1981. 1233 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 1234 793, September 1981. 1236 [RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", 1237 RFC 896, January 1984. 1239 [RFC0970] Nagle, J., "On packet switches with infinite storage", RFC 1240 970, December 1985.
1242 [RFC1122] Braden, R., "Requirements for Internet Hosts - 1243 Communication Layers", STD 3, RFC 1122, October 1989. 1245 [RFC1633] Braden, R., Clark, D., and S. Shenker, "Integrated 1246 Services in the Internet Architecture: an Overview", RFC 1247 1633, June 1994. 1249 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1250 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1251 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1252 S., Wroclawski, J., and L. Zhang, "Recommendations on 1253 Queue Management and Congestion Avoidance in the 1254 Internet", RFC 2309, April 1998. 1256 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 1257 (IPv6) Specification", RFC 2460, December 1998. 1259 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1260 "Definition of the Differentiated Services Field (DS 1261 Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1262 1998. 1264 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 1265 and W. Weiss, "An Architecture for Differentiated 1266 Services", RFC 2475, December 1998. 1268 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC 1269 2914, September 2000. 1271 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1272 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 1274 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 1275 4960, September 2007. 1277 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1278 Friendly Rate Control (TFRC): Protocol Specification", RFC 1279 5348, September 2008. 1281 [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) 1282 Architecture", RFC 5559, June 2009. 1284 [RFC6057] Bastian, C., Klieber, T., Livingood, J., Mills, J., and R. 1285 Woundy, "Comcast's Protocol-Agnostic Congestion Management 1286 System", RFC 6057, December 2010. 1288 [RFC6789] Briscoe, B., Woundy, R., and A.
Cooper, "Congestion 1289 Exposure (ConEx) Concepts and Use Cases", RFC 6789, 1290 December 2012. 1292 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 1293 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 1294 December 2012. 1296 [RFC7414] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. 1297 Zimmermann, "A Roadmap for Transmission Control Protocol 1298 (TCP) Specification Documents", RFC 7414, February 2015. 1300 [Ren12] Ren, Y., Zhao, Y., and P. Liu, "A survey on TCP Incast in 1301 data center networks", International Journal of 1302 Communication Systems, Volume 27, Issue 8, pages 1303 1160-117, 2012. 1305 [Shr96] Shreedhar, M. and G. Varghese, "Efficient Fair Queueing 1306 Using Deficit Round Robin", IEEE/ACM Transactions on 1307 Networking Vol 4, No. 3, July 1996. 1309 [Sto97] Stoica, I. and H. Zhang, "A Hierarchical Fair Service 1310 Curve algorithm for Link sharing, real-time and priority 1311 services", ACM SIGCOMM, 1997. 1313 [Sut99] Suter, B., "Buffer Management Schemes for Supporting TCP 1314 in Gigabit Routers with Per-flow Queueing", IEEE Journal 1315 on Selected Areas in Communications Vol. 17 Issue 6, 1316 pp. 1159-1169, June 1999. 1318 [Willinger95] 1319 Willinger, W., Taqqu, M., Sherman, R., Wilson, D., and V. 1320 Jacobson, "Self-Similarity Through High-Variability: 1321 Statistical Analysis of Ethernet LAN Traffic at the Source 1322 Level", SIGCOMM Symposium proceedings on Communications 1323 architectures and protocols, August 1995. 1325 [Zha90] Zhang, L. and D. Clark, "Oscillating Behavior of Network 1326 Traffic: A Case Study Simulation", 1327 http://groups.csail.mit.edu/ana/Publications/Zhang-DDC- 1328 Oscillating-Behavior-of-Network-Traffic-1990.pdf, 1990. 1330 Appendix A. Change Log 1332 RFC-Editor please remove this appendix before publication.
1334 Initial Version: March 2013 1335 Minor update of the algorithms that the IETF recommends SHOULD NOT 1336 require operational (especially manual) configuration or tuning 1337 April 2013 1339 Major surgery. This draft is for discussion at IETF-87 and expected 1340 to be further updated. 1341 July 2013 1343 -00 WG Draft - Updated transport recommendations; revised deployment 1344 configuration section; numerous minor edits. 1345 Oct 2013 1347 -01 WG Draft - Updated transport recommendations; revised deployment 1348 configuration section; numerous minor edits. 1349 Jan 2014 - Feedback from WG. 1351 -02 WG Draft - Minor edits Feb 2014 - Mainly language fixes. 1353 -03 WG Draft - Minor edits Feb 2014 - Comments from David Collier- 1354 Brown and David Taht. 1356 -04 WG Draft - Minor edits May 2014 - Comments during WGLC: Provided 1357 some introductory subsections to help people (with subsections and 1358 better text). - Written more on the role of scheduling. - Clarified 1359 that the ECN mark threshold needs to be configurable. - Reworked the 1360 "knee" para. Various updates in response to feedback. 1362 -05 WG Draft - Minor edits June 2014 - New text added to address 1363 further comments, and improve introduction - adding context, 1364 reference to ConEx, linking between sections, added text on 1365 synchronization. 1367 -06 WG Draft - Minor edits July 2014 - Reorganised the introduction 1368 following WG feedback to better explain how this relates to the 1369 original goals of RFC2309. Added item on packet bursts. Various 1370 minor corrections incorporated - no change to main 1371 recommendations. 1373 -07 WG Draft - Minor edits July 2014 - Replaced ID REF by RFC 7141. 1374 Changes made to introduction following inputs from Wes Eddy and 1375 John Leslie. Corrections and additions proposed by Bob Briscoe. 1377 -08 WG Draft - Minor edits August 2014 - Review comments from John 1378 Leslie and Bob Briscoe.
Text corrections including: updated 1379 Acknowledgements (RFC2309 ref); s/congestive/congestion/g; changed 1380 the bolder language from RFC2309 to reflect a more considered 1381 perceived threat to Internet performance; modified the category 1382 that is not-TCP-like to be "less responsive to congestion than 1383 TCP" and more clearly noted that this represents a range of 1384 behaviours. 1386 -09 WG Draft - Minor edits Jan 2015 - Edits following LC comments. 1388 -10 WG Draft - Minor edits Feb 2015 - Update following IESG Review 1390 o Gorry's Unresolved to-do list 1392 o ---------------- 1394 o DISCUSS: This sentence above could be understood in different 1395 ways. For example, that any configuration is wrong. The ability 1396 to activate AQM is a good thing IMO. The section 4.3 title is 1397 closer to what you intend to say: "AQM algorithms deployed SHOULD 1398 NOT require operational tuning" The issue is that you only define 1399 what you mean by "operational configuration" in section 4.3 1401 o ---------------- 1403 o DISCUSS1: OLD: 3. The algorithms that the IETF recommends SHOULD 1404 NOT require operational (especially manual) configuration or 1405 tuning. NEW: AQM algorithm deployment SHOULD NOT require tuning 1406 of initial or configuration parameters. Gorry proposes: AQM 1407 algorithms specified by the IETF SHOULD NOT require tuning of 1408 initial or configuration parameters. 1410 o ------------------------- 1412 o DISCUSS2: OLD: 4.3 AQM algorithms deployed SHOULD NOT require 1413 operational tuning NEW: 4.3 AQM algorithm deployment SHOULD NOT 1414 require tuning 1416 o --------------------------------- 1418 o COMMENT: I see relatively little mention, including here, that 1419 the time scales for queue management and for application 1420 responsiveness to congestion signals are wildly 1421 different, e.g. one is measured in usecs, the other is bounded by 1422 RTT.
Queue sizing and policing around aberrant events, for example 1423 micro-AS loops driven by a prefix withdrawal, aren't really dealt with 1424 at all. 1426 o - GF: This is true, although we hint at the same issue with paths 1427 with satellite delay (when RTT >> AQM loop response time). Do 1428 we have ideas of what to write? 1430 o --------------------------------- 1431 o GEN-ART COMMENT: "Ensuring that mechanisms do not interact badly: 1432 Given that a number of different mechanisms are being developed 1433 and potentially may all be deployed in various quantities in 1434 routers, etc., along the path that a packet takes, ensuring that 1435 this does not lead to instability or other interactions should 1436 also be a target of research. A number of applications now have 1437 flow control mechanisms that may be deployed as an adjunct to TCP 1438 so that a single path may have multiple nested end-to-end feedback 1439 loops (notably HTTP/2, just about to be standardized!) and it 1440 would be very wise to ensure that adding AQM into the loop does 1441 not lead to problems." - GF: What is this comment about? I know 1442 the problems of flow control interaction in HTTP/2 with TCP, but 1443 I don't understand how that applies and what new text this needs: 1444 Ideas? 1446 Authors' Addresses 1448 Fred Baker (editor) 1449 Cisco Systems 1450 Santa Barbara, California 93117 1451 USA 1453 Email: fred@cisco.com 1455 Godred Fairhurst (editor) 1456 University of Aberdeen 1457 School of Engineering 1458 Fraser Noble Building 1459 Aberdeen, Scotland AB24 3UE 1460 UK 1462 Email: gorry@erg.abdn.ac.uk 1463 URI: http://www.erg.abdn.ac.uk