Network Working Group                                     F. Baker, Ed.
Internet-Draft                                             Cisco Systems
Obsoletes: 2309 (if approved)                         G. Fairhurst, Ed.
Intended status: Best Current Practice            University of Aberdeen
Expires: August 18, 2014                               February 14, 2014

         IETF Recommendations Regarding Active Queue Management
                   draft-ietf-aqm-recommendation-02

Abstract

   This memo presents recommendations to the Internet community concerning measures to improve and preserve Internet performance. It presents a strong recommendation for testing, standardization, and widespread deployment of active queue management (AQM) in network devices, to improve the performance of today's Internet. It also urges a concerted effort of research, measurement, and ultimate deployment of AQM mechanisms to protect the Internet from flows that are not sufficiently responsive to congestion notification.

   The note largely repeats the recommendations of RFC 2309, updated after fifteen years of experience and new research.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 18, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  The Need For Active Queue Management
   3.  Managing Aggressive Flows
   4.  Conclusions and Recommendations
     4.1.  Operational deployments SHOULD use AQM procedures
     4.2.  Signaling to the transport endpoints
       4.2.1.  AQM and ECN
     4.3.  AQM algorithms deployed SHOULD NOT require operational tuning
     4.4.  AQM algorithms SHOULD respond to measured congestion, not application profiles
     4.5.  AQM algorithms SHOULD NOT be dependent on specific transport protocol behaviours
     4.6.  Interactions with congestion control algorithms
     4.7.  The need for further research
   5.  IANA Considerations
   6.  Security Considerations
   7.  Privacy Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Change Log
   Authors' Addresses

1. Introduction

   The Internet protocol architecture is based on a connectionless end-to-end packet service using the Internet Protocol, whether IPv4 [RFC0791] or IPv6 [RFC2460]. The advantages of its connectionless design, flexibility and robustness, have been amply demonstrated. However, these advantages are not without cost: careful design is required to provide good service under heavy load. In fact, lack of attention to the dynamics of packet forwarding can result in severe service degradation or "Internet meltdown". This phenomenon was first observed during the early growth phase of the Internet in the mid 1980s [RFC0896] [RFC0970], and is technically called "congestive collapse".

   The original fix for Internet meltdown was provided by Van Jacobson.
   Beginning in 1986, Jacobson developed the congestion avoidance mechanisms that are now required in TCP implementations [Jacobson88] [RFC1122]. These mechanisms operate in Internet hosts to cause TCP connections to "back off" during congestion. We say that TCP flows are "responsive" to congestion signals (i.e., marked or dropped packets) from the network. It is primarily these TCP congestion avoidance algorithms that prevent the congestive collapse of today's Internet. Similar algorithms are specified for other non-TCP transports.

   However, that is not the end of the story. Considerable research has been done on Internet dynamics since 1988, and the Internet has grown. It has become clear that the TCP congestion avoidance mechanisms [RFC5681], while necessary and powerful, are not sufficient to provide good service in all circumstances. Basically, there is a limit to how much control can be accomplished from the edges of the network. Some mechanisms are needed in network devices to complement the endpoint congestion avoidance mechanisms. These mechanisms may be implemented in network devices that include routers, switches, and other network middleboxes.

   It is useful to distinguish between two classes of algorithms related to congestion control: "queue management" versus "scheduling" algorithms. To a rough approximation, queue management algorithms manage the length of packet queues by marking or dropping packets when necessary or appropriate, while scheduling algorithms determine which packet to send next and are used primarily to manage the allocation of bandwidth among flows. While these two mechanisms are closely related, they address different performance issues.

   This memo highlights two performance issues:

   The first issue is the need for an advanced form of queue management that we call "Active Queue Management" (AQM). Section 2 summarizes the benefits that active queue management can bring. A number of AQM procedures are described in the literature, with different characteristics. This document does not recommend any of them in particular, but does make recommendations that ideally would affect the choice of procedure used in a given implementation.

   The second issue, discussed in Section 3 of this memo, is the potential for future congestive collapse of the Internet due to flows that are unresponsive, or not sufficiently responsive, to congestion indications. Unfortunately, there is currently no consensus solution for controlling congestion caused by such aggressive flows; significant research and engineering will be required before any solution will be available. It is imperative that this work be energetically pursued, to ensure the future stability of the Internet.

   Section 4 concludes the memo with a set of recommendations to the Internet community concerning these topics.

   The discussion in this memo applies to "best-effort" traffic, which is to say, traffic generated by applications that accept the occasional loss, duplication, or reordering of traffic in flight. It also applies to other traffic, such as real-time traffic that can adapt its sending rate to reduce loss and/or delay. For elastic traffic [RFC1633], it is most effective when the adaptation occurs on time scales of a single Round-Trip Time (RTT) or a small number of RTTs.
   [RFC2309] resulted from past discussions of end-to-end performance, Internet congestion, and Random Early Discard (RED) in the End-to-End Research Group of the Internet Research Task Force (IRTF). This update results from experience with this and other algorithms, and the AQM discussion within the IETF [AQM-WG].

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. The Need For Active Queue Management

   The traditional technique for managing the queue length in a network device is to set a maximum length (in terms of packets) for each queue, accept packets for the queue until the maximum length is reached, then reject (drop) subsequent incoming packets until the queue length decreases because a packet from the queue has been transmitted. This technique is known as "tail drop", since the packet that arrived most recently (i.e., the one on the tail of the queue) is dropped when the queue is full. This method has served the Internet well for years, but it has two important drawbacks:

   1. Lock-Out

      In some situations tail drop allows a single connection or a few flows to monopolize queue space, preventing other connections from getting room in the queue. This "lock-out" phenomenon is often the result of synchronization or other timing effects.

   2. Full Queues

      The tail drop discipline allows queues to maintain a full (or almost full) status for long periods of time, since tail drop signals congestion (via a packet drop) only when the queue has become full. It is important to reduce the steady-state queue size, and this is perhaps the most important goal for queue management.

      The naive assumption might be that there is a simple tradeoff between delay and throughput, and that the recommendation that queues be maintained in a "non-full" state essentially translates to a recommendation that low end-to-end delay is more important than high throughput. However, this does not take into account the critical role that packet bursts play in Internet performance. For example, even though TCP constrains the congestion window of a flow, packets often arrive at network devices in bursts [Leland94]. If the queue is full or almost full, an arriving burst will cause multiple packets to be dropped. This can result in a global synchronization of flows throttling back, followed by a sustained period of lowered link utilization, reducing overall throughput.

      The point of buffering in the network is to absorb data bursts and to transmit them during the (hopefully) ensuing bursts of silence. This is essential to permit the transmission of bursty data. Normally-small queues are preferred in network devices, with sufficient queue capacity to absorb the bursts. The counter-intuitive result is that maintaining normally-small queues can result in higher throughput as well as lower end-to-end delay. In summary, queue limits should not reflect the steady-state queues we want maintained in the network; instead, they should reflect the size of bursts that a network device needs to absorb.

   Besides tail drop, two alternative queue disciplines that can be applied when a queue becomes full are "random drop on full" and "drop front on full". Under the random drop on full discipline, a network device drops a randomly selected packet from the queue (which can be an expensive operation, since it naively requires an O(N) walk through the packet queue) when the queue is full and a new packet arrives. Under the "drop front on full" discipline [Lakshman96], the network device drops the packet at the front of the queue when the queue is full and a new packet arrives. Both of these solve the lock-out problem, but neither solves the full-queues problem described above.
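   As a non-normative illustration, the following Python sketch contrasts the tail drop and "drop front on full" disciplines described above. The queue limit, packet representation, and function names are assumptions made for this example only:

      from collections import deque

      QUEUE_LIMIT = 100  # maximum queue length, in packets (example)

      def enqueue_tail_drop(queue, packet):
          # Tail drop: reject the arriving packet when the queue is full.
          if len(queue) >= QUEUE_LIMIT:
              return False          # arriving packet is dropped
          queue.append(packet)
          return True

      def enqueue_drop_front(queue, packet):
          # Drop front on full [Lakshman96]: make room by discarding the
          # packet at the head of the queue, then accept the new arrival.
          if len(queue) >= QUEUE_LIMIT:
              queue.popleft()       # oldest queued packet is dropped
          queue.append(packet)
          return True

   Both functions signal congestion only once the queue is already full, which is why neither addresses the full-queues problem.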
   We know in general how to solve the full-queues problem for "responsive" flows, i.e., those flows that throttle back in response to congestion notification. In the current Internet, dropped packets provide the critical mechanism of congestion notification to hosts. The solution to the full-queues problem is for network devices to drop packets before a queue becomes full, so that hosts can respond to congestion before buffers overflow. We call such a proactive approach AQM. By dropping packets before buffers overflow, AQM allows network devices to control when and how many packets to drop.
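   As a non-normative sketch of this proactive approach, the following Python fragment drops (or marks) arriving packets with a probability that grows as a smoothed queue estimate rises between two thresholds, in the spirit of RED; the thresholds, filter weight, and maximum probability are illustrative assumptions, not recommended values:

      import random

      MIN_TH, MAX_TH = 5, 15   # queue thresholds, in packets (example)
      MAX_P = 0.1              # maximum early-drop probability (example)
      W_Q = 0.002              # weight of the average-queue filter

      avg_queue = 0.0

      def should_drop(current_queue_len):
          # Early decision: signal congestion BEFORE the buffer
          # overflows, so responsive flows can back off while room
          # remains to absorb bursts.
          global avg_queue
          avg_queue = (1 - W_Q) * avg_queue + W_Q * current_queue_len
          if avg_queue < MIN_TH:
              return False                       # no congestion signal
          if avg_queue >= MAX_TH:
              return True                        # persistent congestion
          frac = (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
          return random.random() < frac * MAX_P  # probabilistic signal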
   In summary, an active queue management mechanism can provide the following advantages for responsive flows.

   1. Reduce number of packets dropped in network devices

      Packet bursts are an unavoidable aspect of packet networks [Willinger95]. If all the queue space in a network device is already committed to "steady state" traffic or if the buffer space is inadequate, then the network device will have no ability to buffer bursts. By keeping the average queue size small, AQM will provide greater capacity to absorb naturally-occurring bursts without dropping packets.

      Furthermore, without AQM, more packets will be dropped when a queue does overflow. This is undesirable for several reasons. First, with a shared queue and the tail drop discipline, this can result in unnecessary global synchronization of flows, resulting in lowered average link utilization, and hence lowered network throughput. Second, unnecessary packet drops represent a possible waste of network capacity on the path before the drop point.

      While AQM can manage queue lengths and reduce end-to-end latency even in the absence of end-to-end congestion control, it will be able to reduce packet drops only in an environment that continues to be dominated by end-to-end congestion control.

   2. Provide a lower-delay interactive service

      By keeping a small average queue size, AQM will reduce the delays experienced by flows. This is particularly important for interactive applications such as short Web transfers, Telnet traffic, or interactive audio-video sessions, whose subjective (and objective) performance is better when the end-to-end delay is low.

   3. Avoid lock-out behavior

      AQM can prevent lock-out behavior by ensuring that there will almost always be a buffer available for an incoming packet. For the same reason, AQM can prevent a bias against low capacity, but highly bursty, flows.

      Lock-out is undesirable because it constitutes a gross unfairness among groups of flows. However, we stop short of calling this benefit "increased fairness", because general fairness among flows requires per-flow state, which is not provided by queue management. For example, in a network device using AQM with only FIFO scheduling, two TCP flows may receive very different shares of the network capacity simply because they have different round-trip times [Floyd91], and a flow that does not use congestion control may receive more capacity than a flow that does. To achieve general fairness, a router may instead maintain per-flow state, using a per-flow scheduling algorithm such as Fair Queueing (FQ) [Demers90] or a class-based scheduling algorithm such as CBQ [Floyd95].

      In contrast, AQM is needed even for network devices that use per-flow scheduling algorithms such as FQ or class-based scheduling algorithms such as CBQ. This is because per-flow scheduling algorithms by themselves do not control the overall queue size or the size of individual queues. AQM is needed to control the overall average queue sizes, so that arriving bursts can be accommodated without dropping packets. In addition, AQM should be used to control the queue size for each individual flow or class, so that they do not experience unnecessarily high delay. Therefore, AQM should be applied across the classes or flows as well as within each class or flow.

   In short, scheduling algorithms and queue management should be seen as complementary, not as replacements for each other.

   An AQM method may use Explicit Congestion Notification (ECN) [RFC3168] to mark packets, instead of dropping them, under mild or moderate congestion (see Section 4.2.1).

   It is also important to differentiate between the choice of buffer size for a queue in a switch/router or other network device, and the threshold(s) and other parameters that determine how and when an AQM algorithm operates. On the one hand, the optimum buffer size is a function of operational requirements and should generally be sized to be sufficient to buffer the largest normal traffic burst that is expected. This size depends on the number and burstiness of traffic arriving at the queue and the rate at which traffic leaves the queue. Different types of traffic and deployment scenarios will lead to different requirements. On the other hand, the choice of AQM algorithm and associated parameters is a function of the way in which congestion is experienced and the required reaction to achieve acceptable performance. The latter is the primary topic of the following sections.

3. Managing Aggressive Flows

   One of the keys to the success of the Internet has been the congestion avoidance mechanisms of TCP. Because TCP "backs off" during congestion, a large number of TCP connections can share a single, congested link in such a way that link bandwidth is shared reasonably equitably among similarly situated flows. The equitable sharing of bandwidth among flows depends on all flows running compatible congestion avoidance algorithms, i.e., methods conformant with the current TCP specification [RFC5681].

   We call a flow "TCP-friendly" when it has a congestion response that approximates the average response expected of a TCP flow. One example of a TCP-friendly scheme is the TCP-Friendly Rate Control (TFRC) algorithm [RFC5348]. In this document, the term is used more generally to describe this and other algorithms that meet these goals.
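   As a rough, non-normative illustration, a widely used approximation of the average response expected of a TCP flow is the simplified TCP throughput equation (a more complete form is used by TFRC [RFC5348]). The Python sketch below computes the ceiling rate that a TCP-friendly flow would be expected to stay at or below; the constant and the example inputs are illustrative:

      import math

      def tcp_friendly_rate(packet_size_bytes, rtt_s, loss_rate):
          # Simplified steady-state TCP throughput approximation:
          #   rate ~= (s / RTT) * sqrt(3/2) / sqrt(p)
          # where s is the packet size, RTT the round-trip time, and
          # p the loss-event rate.
          return (packet_size_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)

      # Example: 1500-byte packets, 100 ms RTT, 1% loss
      # -> roughly 184,000 bytes/s (about 1.5 Mbit/s).
      print(tcp_friendly_rate(1500, 0.1, 0.01))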
   It is convenient to divide flows into three classes: (1) TCP-friendly flows, (2) unresponsive flows, i.e., flows that do not slow down when congestion occurs, and (3) flows that are responsive but are not TCP-friendly. The last two classes contain more aggressive flows that pose significant threats to Internet performance, as we now discuss.

   1. TCP-Friendly flows

      A TCP-friendly flow responds to congestion notification within a small number of path Round-Trip Times (RTTs), and in steady state it uses no more capacity than a conformant TCP running under comparable conditions (drop rate, RTT, packet size, etc.). This is described in the remainder of the document.

   2. Non-Responsive Flows

      The User Datagram Protocol (UDP) [RFC0768] provides a minimal, best-effort transport to applications and upper-layer protocols (both simply called "applications" in the remainder of this document) and does not itself provide mechanisms to prevent congestion collapse and establish a degree of fairness [RFC5405].

      There is a growing set of UDP-based applications whose congestion avoidance algorithms are inadequate or nonexistent (i.e., the flow does not throttle its sending rate when it experiences congestion). Examples include some UDP streaming applications for packet voice and video, and some multicast bulk data transport. If no action is taken, such unresponsive flows could lead to a new congestive collapse [RFC2309].

      In general, UDP-based applications need to incorporate effective congestion avoidance mechanisms [RFC5405]. Further research and development of ways to accomplish congestion avoidance for presently unresponsive applications continue to be important. Network devices need to be able to protect themselves against unresponsive flows, and mechanisms to accomplish this must be developed and deployed. Deployment of such mechanisms would provide an incentive for all applications to become responsive, either by using a congestion-controlled transport (e.g., TCP, SCTP, or DCCP) or by incorporating their own congestion control in the application [RFC5405].

   3. Non-TCP-friendly Transport Protocols

      A second threat is posed by transport protocol implementations that are responsive to congestion but, either deliberately or through faulty implementation, are not TCP-friendly. Such applications may gain an unfair share of the available network capacity.

      For example, the popularity of the Internet has caused a proliferation in the number of TCP implementations. Some of these may fail to implement the TCP congestion avoidance mechanisms correctly because of poor implementation. Others may deliberately be implemented with congestion avoidance algorithms that are more aggressive in their use of capacity than other TCP implementations; this would allow a vendor to claim to have a "faster TCP". The logical consequence of such implementations would be a spiral of increasingly aggressive TCP implementations, leading back to the point where there is effectively no congestion avoidance and the Internet is chronically congested.

      Another example could be an RTP/UDP video flow that uses an adaptive codec, but responds incompletely to indications of congestion or responds over an excessively long time period.
      Such flows are unlikely to be responsive to congestion signals in a timeframe comparable to a small number of end-to-end transmission delays. However, over a longer timescale, perhaps seconds in duration, they could moderate their speed, or increase their speed if they determine capacity to be available.

      Tunneled traffic aggregates carrying multiple (short) TCP flows can be more aggressive than standard bulk TCP. Applications (e.g., web browsers and peer-to-peer file-sharing) have exploited this by opening multiple connections to the same endpoint.

   The projected increase in the fraction of total Internet traffic for more aggressive flows in classes 2 and 3 clearly poses a threat to future Internet stability. There is an urgent need for measurements of current conditions and for further research into the ways of managing such flows. This raises many difficult issues in identifying and isolating unresponsive or non-TCP-friendly flows at an acceptable overhead cost. Finally, there is as yet little measurement or simulation evidence available about the rate at which these threats are likely to be realized, or about the expected benefit of algorithms for managing such flows.

   Another topic requiring consideration is the appropriate granularity of a "flow" when considering a queue management method. There are a few "natural" answers: 1) a transport (e.g., TCP or UDP) flow (source address/port, destination address/port, and Differentiated Services Code Point (DSCP)); 2) a source/destination host pair (IP addresses and DSCP); 3) a given source host or a given destination host. We suggest that the source/destination host pair gives the most appropriate granularity in many circumstances. However, it is possible that different vendors/providers could set different granularities for defining a flow (as a way of "distinguishing" themselves from one another), or that different granularities could be chosen for different places in the network. It may be the case that the granularity is less important than the fact that a network device needs to be able to deal with more unresponsive flows at *some* granularity. The granularity of flows for congestion management is, at least in part, a question of policy that needs to be addressed in the wider IETF community.
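   As a non-normative illustration of these granularities, the Python sketch below derives the three candidate flow keys from the fields named above; the packet field names are assumptions for this example:

      def flow_keys(pkt):
          # pkt is assumed to carry the parsed IP/transport fields.
          transport_flow = (pkt["src_ip"], pkt["src_port"],
                            pkt["dst_ip"], pkt["dst_port"], pkt["dscp"])
          host_pair      = (pkt["src_ip"], pkt["dst_ip"], pkt["dscp"])
          per_host       = (pkt["src_ip"],)   # or (pkt["dst_ip"],)
          return transport_flow, host_pair, per_host

   A queue management method that accounts for drops or queue occupancy per key can apply the same logic at any of these granularities.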
4. Conclusions and Recommendations

   The IRTF, in publishing [RFC2309], and the IETF, in subsequent discussion, have developed a set of specific recommendations regarding the implementation and operational use of AQM procedures. This document updates these to include:

   1. Network devices SHOULD implement some AQM mechanism to manage queue lengths, reduce end-to-end latency, and avoid lock-out phenomena within the Internet.

   2. Deployed AQM algorithms SHOULD support Explicit Congestion Notification (ECN) as well as loss to signal congestion to endpoints.

   3. The algorithms that the IETF recommends SHOULD NOT require operational (especially manual) configuration or tuning.

   4. AQM algorithms SHOULD respond to measured congestion, not application profiles.

   5. AQM algorithms SHOULD NOT interpret specific transport protocol behaviours.

   6. Transport protocol congestion control algorithms SHOULD maximize their use of available capacity (when there is data to send) without incurring undue loss or undue round-trip delay.

   7. Research, engineering, and measurement efforts are needed regarding the design of mechanisms to deal with flows that are unresponsive to congestion notification or are responsive, but are more aggressive than present TCP.

   These recommendations are expressed using the word "SHOULD". This is in recognition that there may be use cases that have not been envisaged in this document in which the recommendation does not apply. However, care should be taken in concluding that one's use case falls in that category; during the life of the Internet, such use cases have rarely, if ever, been observed and reported. To the contrary, available research [Papagiannaki] says that even high-speed links in network cores that are normally very stable in depth and behavior experience occasional issues that need moderation.

4.1. Operational deployments SHOULD use AQM procedures

   AQM procedures are designed to minimize the delay induced in the network by queues that have filled as a result of host behavior. Marking and loss behaviors provide a signal that buffers within network devices are becoming unnecessarily full, and that the sender would do well to moderate its behavior.

4.2. Signaling to the transport endpoints

   There are a number of ways a network device may signal to the endpoint that the network is becoming congested and trigger a reduction in rate. The signalling methods include:

   o  Delaying transport segments (packets) in flight, such as in a queue.

   o  Dropping transport segments (packets) in transit.

   o  Marking transport segments (packets), such as using Explicit Congestion Notification (ECN) [RFC3168] [RFC4301] [RFC4774] [RFC6040] [RFC6679].

   The use of scheduling mechanisms, such as priority queuing, classful queuing, and fair queuing, is often effective in networks to help a network serve the needs of a range of applications. Network operators can use these methods to manage traffic passing a choke point. This is discussed in [RFC2474] and [RFC2475].

   Increased network latency can be used as an implicit signal of congestion. For example, in TCP, additional delay can affect ACK clocking and reduce the rate of transmission of new data. In RTP, network latency impacts the RTCP-reported RTT, and increased latency can trigger a sender to adjust its rate. Methods such as LEDBAT [RFC6817] assume increased latency as a primary signal of congestion.
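   As a non-normative sketch of such a latency-based response, the following Python fragment adjusts a sender's rate toward a target queueing delay, in the spirit of LEDBAT [RFC6817]; the target, gain, and rate bounds are illustrative assumptions:

      TARGET_DELAY_S = 0.100   # target queueing delay (example)
      GAIN = 0.5               # rate-adjustment gain (example)
      MIN_RATE, MAX_RATE = 10_000.0, 10_000_000.0  # bytes/s (example)

      def adjust_rate(current_rate, base_delay_s, measured_delay_s):
          # Queueing delay is estimated as the measured one-way delay
          # minus the smallest delay seen so far (the base delay).
          queueing_delay = measured_delay_s - base_delay_s
          # Increase the rate when below target, decrease when above.
          error = (TARGET_DELAY_S - queueing_delay) / TARGET_DELAY_S
          new_rate = current_rate * (1.0 + GAIN * error)
          return max(MIN_RATE, min(MAX_RATE, new_rate))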
   It is essential that all Internet hosts respond to loss [RFC5681] [RFC5405] [RFC4960] [RFC4340]. Packet dropping by network devices that are under load has two effects: it protects the network, which is the primary reason that network devices drop packets. The detection of loss also provides a signal to a reliable transport (e.g., TCP or SCTP) that there is potential congestion, using a pragmatic heuristic: "when the network discards a message in flight, it may imply the presence of faulty equipment or media in a path, and it may imply the presence of congestion. To be conservative, a transport must assume the latter." Unreliable transports (e.g., using UDP) need to similarly react to loss [RFC5405].

   Network devices SHOULD use an AQM algorithm to determine the packets that are marked or discarded due to congestion.

   Loss also has an effect on the efficiency of a flow and can significantly impact some classes of application. In reliable transports, the dropped data must be subsequently retransmitted. While other applications/transports may adapt to the absence of lost data, this still implies inefficient use of available capacity, and the dropped traffic can affect other flows. Hence, loss is not entirely positive; it is a necessary evil.

4.2.1. AQM and ECN

   Explicit Congestion Notification (ECN) [RFC4301] [RFC4774] [RFC6040] [RFC6679] is a network-layer function that allows a transport to receive network congestion information from a network device without incurring the unintended consequences of loss. ECN includes both transport mechanisms and functions implemented in network devices; the latter rely upon AQM to decide whether to ECN-mark.

   Congestion for ECN-capable transports is signalled by a network device setting the "Congestion Experienced" (CE) codepoint in the IP header. This codepoint is noted by the remote receiving endpoint and signalled back to the sender using a transport protocol mechanism, allowing the sender to trigger timely congestion control. The decision to set the CE codepoint requires an AQM algorithm configured with a threshold. Non-ECN-capable flows (the default) are dropped under congestion.

   Network devices SHOULD use an AQM algorithm that marks ECN-capable traffic when making decisions about the response to congestion. Network devices need to implement this method by marking ECN-capable traffic or by dropping non-ECN-capable traffic.

   Safe deployment of ECN requires that network devices drop excessive traffic, even when marked as originating from an ECN-capable transport. This is a necessary safety precaution because: (1) a non-conformant, broken, or malicious receiver could conceal an ECN mark and not report this to the sender; (2) a non-conformant, broken, or malicious sender could ignore a reported ECN mark, as it could ignore a loss without using ECN; (3) a malfunctioning or non-conforming network device may similarly "hide" an ECN mark. In normal operation, such cases should be very uncommon.

   Network devices SHOULD use an algorithm to drop excessive traffic, even when marked as originating from an ECN-capable transport.
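   As a non-normative sketch, the following Python fragment combines an AQM decision (any algorithm that returns True under congestion, such as the earlier early-drop example) with the ECN behaviour described above; the function and field names are assumptions:

      def on_congestion_signal(pkt, aqm_signals_congestion):
          # When AQM signals congestion, ECN-capable packets (ECT(0) or
          # ECT(1) codepoints) are marked CE instead of being dropped;
          # non-ECN-capable packets (the default) are dropped.
          if not aqm_signals_congestion:
              return "forward"
          if pkt["ecn"] in ("ECT(0)", "ECT(1)"):
              pkt["ecn"] = "CE"        # mark Congestion Experienced
              return "forward"
          return "drop"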
4.3. AQM algorithms deployed SHOULD NOT require operational tuning

   A number of AQM algorithms have been proposed. Many require some form of tuning or setting of parameters for initial network conditions. This can make these algorithms difficult to use in operational networks.

   AQM algorithms need to consider both "initial conditions" and "operational conditions". The former includes values that exist before any experience is gathered about the use of the algorithm, such as the configured speed of the interface, support for full-duplex communication, the interface MTU, and other properties of the link. The latter includes information observed from monitoring the size of the queue, experienced queueing delay, rate of packet discard, etc.

   This document therefore specifies that AQM algorithms that are proposed for deployment in the Internet have the following properties:

   o  SHOULD NOT require tuning of initial or configuration parameters. An algorithm needs to provide a default behaviour that auto-tunes to a reasonable performance for typical network operational conditions. This is expected to ease deployment and operation. Initial conditions, such as the interface rate and MTU size or other values derived from these, MAY be required by an AQM algorithm.

   o  MAY support further manual tuning that could improve performance in a specific deployed network. Algorithms that lack such variables are acceptable, but if such variables exist, they SHOULD be externalized (made visible to the operator). Guidance needs to be provided on the cases where auto-tuning is unlikely to achieve satisfactory performance and to identify the set of parameters that can be tuned. This is expected to enable the algorithm to be deployed in networks that have specific characteristics (variable/larger delay; networks where capacity is impacted by interactions with lower-layer mechanisms, etc.).

   o  MAY provide logging and alarm signals to assist in identifying if an algorithm using manual or auto-tuning is functioning as expected (e.g., this could be based on an internal consistency check between input, output, and mark/drop rates over time). This is expected to encourage deployment by default and allow operators to identify potential interactions with other network functions.

   Hence, self-tuning algorithms are to be preferred. Algorithms recommended for general Internet deployment by the IETF need to be designed so that they do not require operational (especially manual) configuration or tuning.

4.4. AQM algorithms SHOULD respond to measured congestion, not application profiles

   Not all applications transmit packets of the same size. Although applications may be characterised by particular profiles of packet size, this should not be used as the basis for AQM (see the next section). Other methods exist (e.g., Differentiated Services queueing and Pre-Congestion Notification (PCN) [RFC5559]) that can be used to differentiate and police classes of application. Network devices may combine AQM with these traffic classification mechanisms and perform AQM only on specific queues within a network device.

   An AQM algorithm should not deliberately try to prejudice the size of packet that performs best (i.e., preferentially drop/mark based only on packet size). Procedures for selecting packets to mark/drop SHOULD observe the actual or projected time that a packet is in a queue (bytes at a rate being an analog to time). When an AQM algorithm decides whether to drop (or mark) a packet, it is RECOMMENDED that the size of the particular packet should not be taken into account [Byte-pkt].
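   As a non-normative illustration of observing queueing time rather than packet size, the Python sketch below estimates the time a packet will spend in a queue from the backlog in bytes and the drain rate, following "bytes at a rate being an analog to time"; the delay threshold is an illustrative assumption:

      DELAY_THRESHOLD_S = 0.020   # example queueing-delay threshold

      def projected_queue_delay_s(backlog_bytes, drain_rate_bytes_s):
          # Time for the data already queued to drain; this is the
          # delay an arriving packet would experience, independent of
          # that packet's own size.
          return backlog_bytes / drain_rate_bytes_s

      def congested(backlog_bytes, drain_rate_bytes_s):
          return (projected_queue_delay_s(backlog_bytes,
                                          drain_rate_bytes_s)
                  > DELAY_THRESHOLD_S)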
   Applications (or transports) generally know the packet size that they are using and can hence make their judgments about whether to use small or large packets based on the data they wish to send and the expected impact on the delay or throughput, or other performance parameter. When a transport or application responds to a dropped or marked packet, the size of the rate reduction should be proportionate to the size of the packet that was sent [Byte-pkt].

   An AQM-enabled system MAY instantiate different instances of an AQM algorithm to be applied within the same traffic class. Traffic classes may be differentiated based on an Access Control List (ACL), the packet Diffserv Code Point (DSCP) [RFC2474], the setting of the ECN field [RFC3168] [RFC4774], or an equivalent codepoint at a lower layer. This recommendation goes beyond what is defined in RFC 3168, by allowing that an implementation MAY use more than one instance of an AQM algorithm to handle both ECN-capable and non-ECN-capable packets.

4.5. AQM algorithms SHOULD NOT be dependent on specific transport protocol behaviours

   In deploying AQM, network devices need to support a range of Internet traffic and SHOULD NOT make implicit assumptions about the characteristics desired by the set of transports/applications the network supports. That is, AQM methods should be opaque to the choice of transport and application.

   AQM algorithms are often evaluated by considering TCP [RFC0793] with a limited number of applications. Although TCP is the predominant transport in the Internet today, this no longer represents a sufficient selection of traffic for verification. There is significant use of UDP [RFC0768] in voice and video services, and some applications find utility in SCTP [RFC4960] and DCCP [RFC4340]. Hence, AQM algorithms should also demonstrate operation with transports other than TCP and need to consider a variety of applications. Selection of AQM algorithms also needs to consider the use of tunnel encapsulations that may carry traffic aggregates.

   AQM algorithms SHOULD NOT target or derive implicit assumptions about the characteristics desired by specific transports/applications. Transports and applications need to respond to the congestion signals provided by AQM (i.e., dropping or ECN-marking) in a timely manner (within a few RTTs at the latest).

4.6. Interactions with congestion control algorithms

   Applications and transports need to react to received implicit or explicit signals that indicate the presence of congestion. This section identifies issues that can impact the design of transport protocols when using paths that use AQM.

   Transport protocols and applications need timely signals of congestion. The time taken to detect and respond to congestion is increased when network devices queue packets in buffers. It can be difficult to detect tail losses at a higher layer, and this may sometimes require transport timers or probe packets to detect and respond to such loss. Loss patterns may also impact timely detection; e.g., the time may be reduced when network devices do not drop long runs of packets from the same flow.

   A common objective is to deliver data from its source endpoint to its destination in the least possible time. When speaking of TCP performance, the terms "knee" and "cliff" are defined by [Jain94]. They respectively refer to the minimum congestion window that maximises throughput and the maximum congestion window that avoids loss. An application that transmits at the rate determined by this window maximizes the rate or throughput. For the sender, exceeding the cliff is ineffective, as it (by definition) induces loss; operating at a point close to the cliff has a negative impact on other traffic and applications, triggering operator activities, such as those discussed in [RFC6057]. Operating below the knee reduces the throughput, since the sender fails to use available network capacity. As a result, any elastic transport congestion control algorithm designed to minimise delivery time should seek to use an effective window at or above the knee and well below the cliff. The choice of an appropriate rate can significantly impact the loss and delay experienced not only by a flow, but by other flows that share the same queue.
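   As a non-normative illustration of these terms, the Python sketch below classifies a congestion window against the knee and cliff, approximating the knee by the path bandwidth-delay product and the cliff by the bandwidth-delay product plus the bottleneck buffer; both approximations are assumptions made for this example:

      def classify_window(cwnd_bytes, bottleneck_bytes_s, rtt_s,
                          buffer_bytes):
          knee = bottleneck_bytes_s * rtt_s   # path BDP (approximation)
          cliff = knee + buffer_bytes         # loss begins beyond this
          if cwnd_bytes < knee:
              return "below knee: available capacity left unused"
          if cwnd_bytes < cliff:
              return "between knee and cliff: queueing delay grows"
          return "at/above cliff: induces loss"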
Choice of an appropriate rate can 751 significantly impact the loss and delay experienced not only by a 752 flow, but by other flows that share the same queue. 754 Some applications may send less than permitted by the congestion 755 control window (or rate). Examples include multimedia codecs that 756 stream at some natural rate (or set of rates) or an application that 757 is naturally interactive (e.g., some web applications, gaming, 758 transaction-based protocols). Such applications may have different 759 objectives. They may not wish to maximise throughput, but may desire 760 a lower loss rate or bounded delay. 762 The correct operation of an AQM-enabled network device MUST NOT rely 763 upon specific transport responses to congestion signals. 765 4.7. The need for further research 767 The second recommendation of [RFC2309] called for further research 768 into the interaction between network queues and host applications, 769 and the means of signaling between them. This research has occurred, 770 and we as a community have learned a lot. However, we are not done. 772 We have learned that the problems of congestion, latency and buffer- 773 sizing have not gone away, and are becoming more important to many 774 users. A number of self-tuning AQM algorithms have been found that 775 offer significant advantages for deployed networks. There is also 776 renewed interest in deploying AQM and the potential of ECN. 778 In 2013, an obvious example of further research is the need to 779 consider the use of Map/Reduce applications in data centers; do we 780 need to extend our taxonomy of TCP/SCTP sessions to include not only 781 "mice" and "elephants", but "lemmings". Where "Lemmings" are flash 782 crowds of "mice" that the network inadvertently try to signal to as 783 if they were elephant flows, resulting in head of line blocking in 784 data center applications. 786 Examples of other required research include: 788 o Research into new AQM and scheduling algorithms. 790 o Research into the use of and deployment of ECN alongside AQM. 792 o Tools for enabling AQM (and ECN) deployment and measuring the 793 performance. 795 o Methods for mitigating the impact of non-conformant and malicious 796 flows. 798 o Research to understand the implications of using new network and 799 transport methods on applications. 801 Hence, this document therefore reiterates the call of RFC 2309: we 802 need continuing research as applications develop. 804 5. IANA Considerations 806 This memo asks the IANA for no new parameters. 808 6. Security Considerations 810 While security is a very important issue, it is largely orthogonal to 811 the performance issues discussed in this memo. 813 Many deployed network devices use queueing methods that allow 814 unresponsive traffic to capture network capacity, denying access to 815 other traffic flows. This could potentially be used as a denial-of- 816 service attack. This threat could be reduced in network devices 817 deploy AQM or some form of scheduling. We note, however, that a 818 denial-of-service attack may create unresponsive traffic flows that 819 may be indistinguishable from other traffic flows (e.g. tunnels 820 carrying aggregates of short flows, high-rate isochronous 821 applications). New methods therefore may remain vulnerable, and this 822 document recommends that ongoing research should consider ways to 823 mitigate such attacks. 825 7. Privacy Considerations 827 This document, by itself, presents no new privacy issues. 829 8. 
8. Acknowledgements

   The original recommendation in [RFC2309] was written by the End-to-End Research Group, which is to say Bob Braden, Dave Clark, Jon Crowcroft, Bruce Davie, Steve Deering, Deborah Estrin, Sally Floyd, Van Jacobson, Greg Minshall, Craig Partridge, Larry Peterson, KK Ramakrishnan, Scott Shenker, John Wroclawski, and Lixia Zhang. This is an edited version of that document, with much of its text and arguments unchanged.

   The need for an updated document was agreed to in the tsvarea meeting at IETF 86. This document was reviewed on the aqm@ietf.org list. Comments came from Colin Perkins, Richard Scheffenegger, Dave Taht, and many others.

   Gorry Fairhurst was in part supported by the European Community under its Seventh Framework Programme through the Reducing Internet Transport Latency (RITE) project (ICT-317700).

9. References

9.1. Normative References

   [Byte-pkt]   Briscoe, B. and J. Manner, "Byte and Packet Congestion Notification", Work in Progress, draft-ietf-tsvwg-byte-pkt-congest, July 2013.

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3168]    Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

   [RFC4301]    Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

   [RFC4774]    Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, November 2006.

   [RFC5405]    Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines for Application Designers", BCP 145, RFC 5405, November 2008.

   [RFC5681]    Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.

   [RFC6040]    Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, November 2010.

   [RFC6679]    Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, August 2012.

9.2. Informative References

   [AQM-WG]     "IETF AQM WG".

   [Demers90]   Demers, A., Keshav, S., and S. Shenker, "Analysis and Simulation of a Fair Queueing Algorithm", Internetworking: Research and Experience, SIGCOMM Symposium proceedings on Communications Architectures and Protocols, 1990.

   [Floyd91]    Floyd, S., "Connections with Multiple Congested Gateways in Packet-Switched Networks Part 1: One-way Traffic", Computer Communications Review, October 1991.

   [Floyd95]    Floyd, S. and V. Jacobson, "Link-sharing and Resource Management Models for Packet Networks", IEEE/ACM Transactions on Networking, August 1995.

   [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", SIGCOMM Symposium proceedings on Communications Architectures and Protocols, August 1988.

   [Jain94]     Jain, R., Ramakrishnan, K., and D-M. Chiu, "Congestion avoidance scheme for computer networks", US Patent 5377327, December 1994.

   [Lakshman96] Lakshman, TV., Neidhardt, A., and T. Ott, "The Drop From Front Strategy in TCP Over ATM and Its Interworking with Other Control Features", IEEE Infocomm, 1996.

   [Leland94]   Leland, W., Taqqu, M., Willinger, W., and D. Wilson, "On the Self-Similar Nature of Ethernet Traffic (Extended Version)", IEEE/ACM Transactions on Networking, February 1994.
Wilson, "On 919 the Self-Similar Nature of Ethernet Traffic (Extended 920 Version)", IEEE/ACM Transactions on Networking , February 921 1994. 923 [Papagiannaki] 924 Sprint ATL, KAIST, University of Minnesota, Sprint ATL, 925 and Intel ResearchIETF, "Analysis of Point-To-Point Packet 926 Delay In an Operational Network", IEEE Infocom 2004, March 927 2004, . 929 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 930 August 1980. 932 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 933 1981. 935 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 936 793, September 1981. 938 [RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", 939 RFC 896, January 1984. 941 [RFC0970] Nagle, J., "On packet switches with infinite storage", RFC 942 970, December 1985. 944 [RFC1122] Braden, R., "Requirements for Internet Hosts - 945 Communication Layers", STD 3, RFC 1122, October 1989. 947 [RFC1633] Braden, B., Clark, D., and S. Shenker, "Integrated 948 Services in the Internet Architecture: an Overview", RFC 949 1633, June 1994. 951 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 952 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 953 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 954 S., Wroclawski, J., and L. Zhang, "Recommendations on 955 Queue Management and Congestion Avoidance in the 956 Internet", RFC 2309, April 1998. 958 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 959 (IPv6) Specification", RFC 2460, December 1998. 961 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 962 "Definition of the Differentiated Services Field (DS 963 Field) in the IPv4 and IPv6 Headers", RFC 2474, December 964 1998. 966 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 967 and W. Weiss, "An Architecture for Differentiated 968 Services", RFC 2475, December 1998. 970 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 971 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 973 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 974 4960, September 2007. 976 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 977 Friendly Rate Control (TFRC): Protocol Specification", RFC 978 5348, September 2008. 980 [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) 981 Architecture", RFC 5559, June 2009. 983 [RFC6057] Bastian, C., Klieber, T., Livingood, J., Mills, J., and R. 984 Woundy, "Comcast's Protocol-Agnostic Congestion Management 985 System", RFC 6057, December 2010. 987 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 988 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 989 December 2012. 991 [Willinger95] 992 Willinger, W., Taqqu, M., Sherman, R., Wilson, D., and V. 993 Jacobson, "Self-Similarity Through High-Variability: 994 Statistical Analysis of Ethernet LAN Traffic at the Source 995 Level", SIGCOMM Symposium proceedings on Communications 996 architectures and protocols , August 1995. 998 Appendix A. Change Log 1000 Initial Version: March 2013 1002 Minor update of the algorithms that the IETF recommends SHOULD NOT 1003 require operational (especially manual) configuration or tuningdate: 1005 April 2013 1007 Major surgery. This draft is for discussion at IETF-87 and expected 1008 to be further updated. 1009 July 2013 1011 -00 WG Draft - Updated transport recommendations; revised deployment 1012 configuration section; numerous minor edits. 
   Oct 2013

   -01 WG Draft - Updated transport recommendations; revised deployment configuration section; numerous minor edits. Jan 2014 - Feedback from WG.

   -02 WG Draft - Minor edits Feb 2014 - Mainly language fixes.

Authors' Addresses

   Fred Baker (editor)
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com

   Godred Fairhurst (editor)
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk