Network Working Group                                     F. Baker, Ed.
Internet-Draft                                             Cisco Systems
Obsoletes: 2309 (if approved)                          G. Fairhurst, Ed.
Intended status: BCP                               University of Aberdeen
Expires: April 17, 2014                                 October 17, 2013

         IETF Recommendations Regarding Active Queue Management
                    draft-ietf-aqm-recommendation-00

Abstract

   This memo presents recommendations to the Internet community
   concerning measures to improve and preserve Internet performance.
   It presents a strong recommendation for testing, standardization,
   and widespread deployment of active queue management (AQM) in
   network devices, to improve the performance of today's Internet.
   It also urges a concerted effort of research, measurement, and
   ultimate deployment of AQM mechanisms to protect the Internet from
   flows that are not sufficiently responsive to congestion
   notification.

   The note largely repeats the recommendations of RFC 2309, updated
   after fifteen years of experience and new research.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on April 17, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  The Need For Active Queue Management
   3.  Managing Aggressive Flows
   4.  Conclusions and Recommendations
     4.1.  Operational deployments SHOULD use AQM procedures
     4.2.  Signaling to the transport endpoints
       4.2.1.  AQM and ECN
     4.3.  AQM algorithms deployed SHOULD NOT require operational
           tuning
     4.4.  AQM algorithms SHOULD respond to measured congestion, not
           application profiles.
     4.5.  AQM algorithms SHOULD NOT be dependent on specific
           transport protocol behaviours
     4.6.  Interactions with congestion control algorithms
     4.7.  The need for further research
   5.  IANA Considerations
   6.  Security Considerations
   7.  Privacy Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Change Log
   Authors' Addresses

1.  Introduction

   The Internet protocol architecture is based on a connectionless
   end-to-end packet service using the Internet Protocol, whether IPv4
   [RFC0791] or IPv6 [RFC2460].  The advantages of its connectionless
   design, flexibility and robustness, have been amply demonstrated.
   However, these advantages are not without cost: careful design is
   required to provide good service under heavy load.  In fact, lack
   of attention to the dynamics of packet forwarding can result in
   severe service degradation or "Internet meltdown".
   This phenomenon was first observed during the early growth phase of
   the Internet of the mid 1980s [RFC0896][RFC0970], and is
   technically called "congestive collapse".

   The original fix for Internet meltdown was provided by Van
   Jacobson.  Beginning in 1986, Jacobson developed the congestion
   avoidance mechanisms that are now required in TCP implementations
   [Jacobson88] [RFC1122].  These mechanisms operate in Internet hosts
   to cause TCP connections to "back off" during congestion.  We say
   that TCP flows are "responsive" to congestion signals (i.e., marked
   or dropped packets) from the network.  It is primarily these TCP
   congestion avoidance algorithms that prevent the congestive
   collapse of today's Internet.

   However, that is not the end of the story.  Considerable research
   has been done on Internet dynamics since 1988, and the Internet has
   grown.  It has become clear that the TCP congestion avoidance
   mechanisms [RFC5681], while necessary and powerful, are not
   sufficient to provide good service in all circumstances.
   Basically, there is a limit to how much control can be accomplished
   from the edges of the network.  Some mechanisms are needed in the
   network devices to complement the endpoint congestion avoidance
   mechanisms.  These mechanisms may be implemented in network devices
   that include routers, switches, and other network middleboxes.

   It is useful to distinguish between two classes of algorithms
   related to congestion control: "queue management" versus
   "scheduling" algorithms.  To a rough approximation, queue
   management algorithms manage the length of packet queues by marking
   or dropping packets when necessary or appropriate, while scheduling
   algorithms determine which packet to send next and are used
   primarily to manage the allocation of bandwidth among flows.  While
   these two classes of mechanisms are closely related, they address
   different performance issues.

   This memo highlights two performance issues:

   The first issue is the need for an advanced form of queue
   management that we call "active queue management."  Section 2
   summarizes the benefits that active queue management can bring.  A
   number of Active Queue Management (AQM) procedures are described in
   the literature, with different characteristics.  This document does
   not recommend any of them in particular, but does make
   recommendations that ideally would affect the choice of procedure
   used in a given implementation.

   The second issue, discussed in Section 3 of this memo, is the
   potential for future congestive collapse of the Internet due to
   flows that are unresponsive, or not sufficiently responsive, to
   congestion indications.  Unfortunately, there is no consensus
   solution to controlling congestion caused by such aggressive flows;
   significant research and engineering will be required before any
   solution will be available.  It is imperative that this work be
   energetically pursued, to ensure the future stability of the
   Internet.

   Section 4 concludes the memo with a set of recommendations to the
   Internet community concerning these topics.

   The discussion in this memo applies to "best-effort" traffic, which
   is to say, traffic generated by applications that accept the
   occasional loss, duplication, or reordering of traffic in flight.
   It also applies to other traffic, such as real-time traffic that
   can adapt its sending rate to reduce loss and/or delay.
   It is most effective for elastic traffic [RFC1633] when the
   adaptation occurs on time scales of a single RTT or a small number
   of RTTs.

   [RFC2309] resulted from past discussions of end-to-end performance,
   Internet congestion, and Random Early Discard (RED) in the End-to-
   End Research Group of the Internet Research Task Force (IRTF).
   This update results from experience with this and other algorithms,
   and the AQM discussion within the IETF [AQM-WG].

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

2.  The Need For Active Queue Management

   The traditional technique for managing the queue length in a
   network device is to set a maximum length (in terms of packets) for
   each queue, accept packets for the queue until the maximum length
   is reached, then reject (drop) subsequent incoming packets until
   the queue decreases because a packet from the queue has been
   transmitted.  This technique is known as "tail drop", since the
   packet that arrived most recently (i.e., the one on the tail of the
   queue) is dropped when the queue is full (illustrated in the sketch
   below).  This method has served the Internet well for years, but it
   has two important drawbacks.

   1.  Lock-Out

       In some situations tail drop allows a single connection or a
       few flows to monopolize queue space, preventing other
       connections from getting room in the queue.  This "lock-out"
       phenomenon is often the result of synchronization or other
       timing effects.

   2.  Full Queues

       The tail drop discipline allows queues to maintain a full (or,
       almost full) status for long periods of time, since tail drop
       signals congestion (via a packet drop) only when the queue has
       become full.  It is important to reduce the steady-state queue
       size, and this is perhaps the most important goal for queue
       management.

       The naive assumption might be that there is a simple tradeoff
       between delay and throughput, and that the recommendation that
       queues be maintained in a "non-full" state essentially
       translates to a recommendation that low end-to-end delay is
       more important than high throughput.  However, this does not
       take into account the critical role that packet bursts play in
       Internet performance.  Even though TCP constrains the
       congestion window of a flow, packets often arrive at network
       devices in bursts [Leland94].  If the queue is full or almost
       full, an arriving burst will cause multiple packets to be
       dropped.  This can result in a global synchronization of flows
       throttling back, followed by a sustained period of lowered link
       utilization, reducing overall throughput.

       The point of buffering in the network is to absorb data bursts
       and to transmit them during the (hopefully) ensuing bursts of
       silence.  This is essential to permit the transmission of
       bursty data.  Queues that are normally small are preferred in
       network devices, with sufficient queue capacity to absorb
       bursts.  The counter-intuitive result is that maintaining
       normally-small queues can result in higher throughput as well
       as lower end-to-end delay.  In summary, queue limits should not
       reflect the steady-state queues we want maintained in the
       network; instead, they should reflect the size of bursts that a
       network device needs to absorb.

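   As an illustration of the tail-drop discipline, the following
   minimal sketch (Python; the class and method names are invented for
   exposition and are not drawn from any deployed implementation)
   shows queue admission under a fixed maximum length:

      from collections import deque

      class TailDropQueue:
          """Illustrative FIFO queue with the tail-drop discipline."""

          def __init__(self, max_packets):
              self.max_packets = max_packets  # fixed limit, in packets
              self.queue = deque()

          def enqueue(self, packet):
              # Accept packets until the maximum length is reached...
              if len(self.queue) < self.max_packets:
                  self.queue.append(packet)
                  return True
              # ...then drop the most recent arrival (the "tail").
              return False

          def dequeue(self):
              # Transmitting a packet frees space for later arrivals.
              return self.queue.popleft() if self.queue else None

   Note how congestion is signalled only implicitly, by the drop that
   occurs once the queue is already full; nothing in this discipline
   acts to reduce the steady-state queue size.
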
   Besides tail drop, two alternative queue disciplines that can be
   applied when a queue becomes full are "random drop on full" or
   "drop front on full".  Under the random drop on full discipline, a
   network device drops a randomly selected packet from the queue
   (which can be an expensive operation, since it naively requires an
   O(N) walk through the packet queue) when the queue is full and a
   new packet arrives.  Under the "drop front on full" discipline
   [Lakshman96], the network device drops the packet at the front of
   the queue when the queue is full and a new packet arrives.  Both of
   these solve the lock-out problem, but neither solves the full-
   queues problem described above.

   We know in general how to solve the full-queues problem for
   "responsive" flows, i.e., those flows that throttle back in
   response to congestion notification.  In the current Internet,
   dropped packets provide a critical mechanism indicating congestion
   notification to hosts.  The solution to the full-queues problem is
   for network devices to drop packets before a queue becomes full, so
   that hosts can respond to congestion before buffers overflow.  We
   call such a proactive approach AQM.  By dropping packets before
   buffers overflow, AQM allows network devices to control when and
   how many packets to drop.  A simplified sketch of such an early-
   drop policy follows.

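   For contrast with the tail-drop sketch above, this much-simplified,
   RED-flavoured sketch (Python; the thresholds, EWMA weight, and
   class name are invented for illustration, and this is not a
   recommendation of any particular algorithm) drops probabilistically
   before the buffer overflows:

      import random
      from collections import deque

      class EarlyDropQueue:
          """Illustrative AQM: drop with increasing probability as
          the average queue grows, well before the hard limit."""

          def __init__(self, max_packets, min_th, max_th,
                       max_p=0.1, w=0.002):
              self.max_packets = max_packets
              self.min_th = min_th  # no early drops below this average
              self.max_th = max_th  # drop probability reaches max_p
              self.max_p = max_p
              self.w = w            # EWMA weight for the average queue
              self.avg = 0.0
              self.queue = deque()

          def enqueue(self, packet):
              # Track a smoothed (average) queue size, not the
              # instantaneous one, so short bursts are absorbed
              # rather than dropped.
              self.avg = (1 - self.w) * self.avg + self.w * len(self.queue)
              if len(self.queue) >= self.max_packets:
                  return False      # forced (tail) drop
              if self.avg > self.max_th:
                  return False      # early drop
              if self.avg > self.min_th:
                  p = self.max_p * (self.avg - self.min_th) \
                      / (self.max_th - self.min_th)
                  if random.random() < p:
                      return False  # probabilistic early drop
              self.queue.append(packet)
              return True

   Because drops begin while the average queue is still small,
   responsive flows back off early, the steady-state queue stays
   short, and headroom remains to absorb bursts.
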
   In summary, an active queue management mechanism can provide the
   following advantages for responsive flows.

   1.  Reduce the number of packets dropped in network devices

       Packet bursts are an unavoidable aspect of packet networks
       [Willinger95].  If all the queue space in a network device is
       already committed to "steady state" traffic or if the buffer
       space is inadequate, then the network device will have no
       ability to buffer bursts.  By keeping the average queue size
       small, AQM will provide greater capacity to absorb naturally-
       occurring bursts without dropping packets.

       Furthermore, without AQM, more packets will be dropped when a
       queue does overflow.  This is undesirable for several reasons.
       First, with a shared queue and the tail drop discipline, this
       can result in unnecessary global synchronization of flows,
       resulting in lowered average link utilization, and hence
       lowered network throughput.  Second, unnecessary packet drops
       represent a possible waste of network capacity on the path
       before the drop point.

       While AQM can manage queue lengths and reduce end-to-end
       latency even in the absence of end-to-end congestion control,
       it will be able to reduce packet drops only in an environment
       that continues to be dominated by end-to-end congestion
       control.

   2.  Provide a lower-delay interactive service

       By keeping a small average queue size, AQM will reduce the
       delays experienced by flows.  This is particularly important
       for interactive applications such as short Web transfers,
       Telnet traffic, or interactive audio-video sessions, whose
       subjective (and objective) performance is better when the
       end-to-end delay is low.

   3.  Avoid lock-out behavior

       AQM can prevent lock-out behavior by ensuring that there will
       almost always be a buffer available for an incoming packet.
       For the same reason, AQM can prevent a bias against low
       capacity, but highly bursty, flows.

       Lock-out is undesirable because it constitutes a gross
       unfairness among groups of flows.  However, we stop short of
       calling this benefit "increased fairness", because general
       fairness among flows requires per-flow state, which is not
       provided by queue management.  For example, in a network device
       using AQM with only FIFO scheduling, two TCP flows may receive
       very different shares of the network capacity simply because
       they have different round-trip times [Floyd91], and a flow that
       does not use congestion control may receive more capacity than
       a flow that does.  To achieve general fairness, a router may
       instead maintain per-flow state and use a per-flow scheduling
       algorithm such as Fair Queueing (FQ) [Demers90] or a
       class-based scheduling algorithm such as CBQ [Floyd95].

   In contrast, AQM is needed even for network devices that use
   per-flow scheduling algorithms such as FQ or class-based scheduling
   algorithms, such as CBQ.  This is because per-flow scheduling
   algorithms by themselves do not control the overall queue size or
   the size of individual queues.  AQM is needed to control the
   overall average queue sizes, so that arriving bursts can be
   accommodated without dropping packets.  In addition, AQM should be
   used to control the queue size for each individual flow or class,
   so that they do not experience unnecessarily high delay.
   Therefore, AQM should be applied across the classes or flows as
   well as within each class or flow.

   In short, scheduling algorithms and queue management should be seen
   as complementary, not as replacements for each other.

3.  Managing Aggressive Flows

   One of the keys to the success of the Internet has been the
   congestion avoidance mechanisms of TCP.  Because TCP "backs off"
   during congestion, a large number of TCP connections can share a
   single, congested link in such a way that link bandwidth is shared
   reasonably equitably among similarly situated flows.  The equitable
   sharing of bandwidth among flows depends on all flows running
   compatible congestion avoidance algorithms, i.e., methods
   conformant with the current TCP specification [RFC5681].

   We call a flow "TCP-friendly" when it has a congestion response
   that approximates the average response expected of a TCP flow.  One
   example method of a TCP-friendly scheme is the TCP-Friendly Rate
   Control algorithm [RFC5348].  In this document, the term is used
   more generally to describe this and other algorithms that meet
   these goals.

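   As a rough quantitative guide (a simplified form of the response
   function used by TFRC [RFC5348], ignoring the retransmission-
   timeout term; the notation below is illustrative), the steady-state
   rate X of a TCP-friendly sender with segment size s, round-trip
   time R, and loss-event rate p is approximately, in LaTeX notation:

      X \approx \frac{s}{R \sqrt{2p/3}}

   A flow sending no faster than X for the p and R it experiences is
   TCP-friendly in the sense used here; the more aggressive flows
   discussed below exceed this rate.
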
   It is convenient to divide flows into three classes: (1)
   TCP-friendly flows, (2) unresponsive flows, i.e., flows that do not
   slow down when congestion occurs, and (3) flows that are responsive
   but are not TCP-friendly.  The last two classes contain more
   aggressive flows that pose significant threats to Internet
   performance, which we will now discuss.

   1.  TCP-Friendly flows

       A TCP-friendly flow responds to congestion notification within
       a small number of path Round Trip Times (RTT), and in
       steady-state it uses no more capacity than a conformant TCP
       running under comparable conditions (drop rate, RTT, MTU,
       etc.).  This is described in the remainder of the document.

   2.  Non-Responsive Flows

       The User Datagram Protocol (UDP) [RFC0768] provides a minimal,
       best-effort transport to applications and upper-layer protocols
       (both simply called "applications" in the remainder of this
       document) and does not itself provide mechanisms to prevent
       congestion collapse and establish a degree of fairness
       [RFC5405].

       There is a growing set of UDP-based applications whose
       congestion avoidance algorithms are inadequate or nonexistent
       (i.e., a flow that does not throttle its sending rate when it
       experiences congestion).  Examples include some UDP streaming
       applications for packet voice and video, and some multicast
       bulk data transport.  If no action is taken, such unresponsive
       flows could lead to a new congestive collapse [RFC2309].

       In general, UDP-based applications need to incorporate
       effective congestion avoidance mechanisms [RFC5405].  Further
       research and development of ways to accomplish congestion
       avoidance for presently unresponsive applications continue to
       be important.  Network devices need to be able to protect
       themselves against unresponsive flows, and mechanisms to
       accomplish this must be developed and deployed.  Deployment of
       such mechanisms would provide an incentive for all applications
       to become responsive by either using a congestion-controlled
       transport (e.g., TCP, SCTP, DCCP) or by incorporating their own
       congestion control in the application [RFC5405].

   3.  Non-TCP-friendly Transport Protocols

       A second threat is posed by transport protocol implementations
       that are responsive to congestion, but, either deliberately or
       through faulty implementation, are not TCP-friendly.  Such
       applications may gain an unfair share of the available network
       capacity.

       For example, the popularity of the Internet has caused a
       proliferation in the number of TCP implementations.  Some of
       these may fail to implement the TCP congestion avoidance
       mechanisms correctly because of poor implementation.  Others
       may deliberately be implemented with congestion avoidance
       algorithms that are more aggressive in their use of capacity
       than other TCP implementations; this would allow a vendor to
       claim to have a "faster TCP".  The logical consequence of such
       implementations would be a spiral of increasingly aggressive
       TCP implementations, leading back to the point where there is
       effectively no congestion avoidance and the Internet is
       chronically congested.

       Another example could be an RTP/UDP video flow that uses an
       adaptive codec, but responds incompletely to indications of
       congestion, or responds only over an excessively long time
       period.  Such flows are unlikely to be responsive to congestion
       signals in a time frame comparable to a small number of
       end-to-end transmission delays.  However, over a longer
       timescale, perhaps seconds in duration, they could moderate
       their speed, or increase their speed if they determine capacity
       to be available.

       Tunneled traffic aggregates carrying multiple (short) TCP flows
       can be more aggressive than standard bulk TCP.  Applications
       (e.g., web browsers and peer-to-peer file-sharing) have
       exploited this by opening multiple connections to the same
       endpoint.

   The projected increase in the fraction of total Internet traffic
   for more aggressive flows in classes 2 and 3 clearly poses a threat
   to future Internet stability.  There is an urgent need for
   measurements of current conditions and for further research into
   the ways of managing such flows.  This raises many difficult issues
   in identifying and isolating unresponsive or non-TCP-friendly flows
   at an acceptable overhead cost.  Finally, there is as yet little
   measurement or simulation evidence available about the rate at
   which these threats are likely to be realized, or about the
   expected benefit of algorithms for managing such flows.

   Another topic requiring consideration is the appropriate
   granularity of a "flow" when considering a queue management method.
   There are a few "natural" answers: 1) a transport (e.g., TCP or
   UDP) flow (source address/port, destination address/port, DSCP); 2)
   a source/destination host pair (IP addresses, DSCP); 3) a given
   source host or a given destination host.  We suggest that the
   source/destination host pair gives the most appropriate granularity
   in many circumstances (the sketch below shows how these
   granularities differ in practice).  However, it is possible that
   different vendors/providers could set different granularities for
   defining a flow (as a way of "distinguishing" themselves from one
   another), or that different granularities could be chosen for
   different places in the network.  It may be the case that the
   granularity is less important than the fact that a network device
   needs to be able to deal with more unresponsive flows at *some*
   granularity.  The granularity of flows for congestion management
   is, at least in part, a question of policy that needs to be
   addressed in the wider IETF community.

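   As a concrete illustration of these three granularities (a Python
   sketch; the packet-field names are hypothetical):

      def flow_key(pkt, granularity):
          """Return the state key a queue manager would track, for
          each of the three 'natural' flow granularities above."""
          if granularity == "transport":   # 1) per transport flow
              return (pkt["src_ip"], pkt["src_port"],
                      pkt["dst_ip"], pkt["dst_port"], pkt["dscp"])
          if granularity == "host-pair":   # 2) per src/dst host pair
              return (pkt["src_ip"], pkt["dst_ip"], pkt["dscp"])
          if granularity == "host":        # 3) per source host
              return pkt["src_ip"]
          raise ValueError("unknown granularity")

      # At host-pair granularity, all connections between two hosts
      # aggregate into a single accountable unit.
      pkt = {"src_ip": "192.0.2.1", "src_port": 5001,
             "dst_ip": "198.51.100.7", "dst_port": 443, "dscp": 0}
      print(flow_key(pkt, "host-pair"))
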
4.  Conclusions and Recommendations

   The IRTF, in publishing [RFC2309], and the IETF in subsequent
   discussion, have developed a set of specific recommendations
   regarding the implementation and operational use of AQM procedures.
   This document updates these to include:

   1.  Network devices SHOULD implement some AQM mechanism to manage
       queue lengths, reduce end-to-end latency, and avoid lock-out
       phenomena within the Internet.

   2.  Deployed AQM algorithms SHOULD support Explicit Congestion
       Notification (ECN) as well as loss to signal congestion to
       endpoints.

   3.  The algorithms that the IETF recommends SHOULD NOT require
       operational (especially manual) configuration or tuning.

   4.  AQM algorithms SHOULD respond to measured congestion, not
       application profiles.

   5.  AQM algorithms SHOULD NOT interpret specific transport protocol
       behaviours.

   6.  Transport protocol congestion control algorithms SHOULD
       maximize their use of available capacity (when there is data to
       send) without incurring undue loss or undue round trip delay.

   7.  Research, engineering, and measurement efforts are needed
       regarding the design of mechanisms to deal with flows that are
       unresponsive to congestion notification or are responsive, but
       are more aggressive than present TCP.

   These recommendations are expressed using the word "SHOULD".  This
   is in recognition that there may be use cases that have not been
   envisaged in this document in which the recommendation does not
   apply.  However, care should be taken in concluding that one's use
   case falls in that category; during the life of the Internet, such
   use cases have been rarely if ever observed and reported on.  To
   the contrary, available research [Papagiannaki] says that even
   high-speed links in network cores that are normally very stable in
   depth and behavior experience occasional issues that need
   moderation.

4.1.  Operational deployments SHOULD use AQM procedures

   AQM procedures are designed to minimize delay induced in the
   network by queues that have filled as a result of host behavior.
   Marking and loss behaviors provide a signal that buffers within
   network devices are becoming unnecessarily full, and that the
   sender would do well to moderate its behavior.

4.2.  Signaling to the transport endpoints

   There are a number of ways a network device may signal to the end
   point that the network is becoming congested and trigger a
   reduction in rate.  The signalling methods include:

   o  Delaying data segments in flight, such as in a queue.

   o  Dropping data segments in transit.

   o  Marking data segments, such as using Explicit Congestion
      Notification [RFC3168] [RFC4301] [RFC4774] [RFC6040] [RFC6679].

   The use of scheduling mechanisms, such as priority queuing,
   classful queuing, and fair queuing, is often effective in networks
   to help a network serve the needs of a range of applications.
   Network operators can use these methods to manage traffic passing a
   choke point.  This is discussed in [RFC2474] and [RFC2475].

   Increased network latency can be used as an implicit signal of
   congestion.  For example, in TCP, additional delay can affect ACK
   clocking and has the effect of reducing the rate of transmission of
   new data.  In RTP, network latency impacts the RTCP-reported RTT,
   and increased latency can trigger a sender to adjust its rate.
   Methods such as LEDBAT [RFC6817] assume increased latency as a
   primary signal of congestion.

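   To illustrate delay used as a congestion signal, here is a
   simplified, LEDBAT-flavoured sketch (Python; the target and gain
   follow the spirit of [RFC6817], but the code is an invented
   illustration, not the specified algorithm):

      def ledbat_style_adjustment(current_delay, base_delay,
                                  target=0.1, gain=1.0):
          """React to delay as a congestion signal: shrink the
          sending window as queueing delay approaches the target
          (100 ms here), grow it while delay stays below."""
          queuing_delay = current_delay - base_delay
          # off_target is positive while queueing delay is under the
          # target, and negative (forcing a decrease) once the queue
          # has built beyond it.
          off_target = (target - queuing_delay) / target
          return gain * off_target  # scaled window adjustment

   No packet is lost in this exchange; the building queue itself is
   the signal, which is why such methods yield before loss-based
   flows.
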
   It is essential that all Internet hosts respond to loss [RFC5681]
   [RFC5405] [RFC2960] [RFC4340].  Packet dropping by network devices
   that are under load has two effects: It protects the network, which
   is the primary reason that network devices drop packets.  The
   detection of loss also provides a signal to a reliable transport
   (e.g., TCP, SCTP) that there is potential congestion, using a
   pragmatic heuristic: "when the network discards a message in
   flight, it may imply the presence of faulty equipment or media in a
   path, and it may imply the presence of congestion.  To be
   conservative, a transport must assume the latter."  Unreliable
   transports (e.g., using UDP) need to similarly react to loss
   [RFC5405].

   Network devices SHOULD use an AQM algorithm to determine which
   packets are affected by congestion.

   Loss also has an effect on the efficiency of a flow and can
   significantly impact some classes of application.  In reliable
   transports the dropped data must be subsequently retransmitted.
   While other applications/transports may adapt to the absence of
   lost data, this still implies inefficient use of available capacity
   and the dropped traffic can affect other flows.  Hence, loss is not
   entirely positive; it is a necessary evil.

4.2.1.  AQM and ECN

   Explicit Congestion Notification (ECN) [RFC3168] [RFC4301]
   [RFC4774] [RFC6040] [RFC6679] is a network-layer function that
   allows a transport to receive network congestion information from a
   network device without incurring the unintended consequences of
   loss.  ECN includes both transport mechanisms and functions
   implemented in network devices; the latter rely upon using AQM to
   decide whether to set an ECN mark.

   Congestion for ECN-capable transports is signalled by a network
   device setting the "Congestion Experienced (CE)" codepoint in the
   IP header.  This codepoint is noted by the remote receiving end
   point and signalled back to the sender using a transport protocol
   mechanism, allowing the sender to trigger timely congestion
   control.  The decision to set the CE codepoint requires an AQM
   algorithm configured with a threshold.  Non-ECN-capable flows (the
   default) are dropped under congestion, as sketched below.

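   The following fragment (Python; the packet-field names are
   hypothetical) sketches this decision: when the AQM algorithm
   selects a packet as part of its congestion response, ECN-capable
   packets are CE-marked and non-ECN-capable packets are dropped.  The
   codepoint values are those defined by [RFC3168]:

      # IP ECN codepoints (two bits in the IP header), per RFC 3168.
      NOT_ECT = 0b00   # not ECN-capable transport
      ECT_1   = 0b01   # ECN-capable transport
      ECT_0   = 0b10   # ECN-capable transport
      CE      = 0b11   # congestion experienced

      def on_congestion_signal(packet):
          """Apply the AQM decision to one packet that the algorithm
          has selected as part of its response to congestion."""
          if packet["ecn"] in (ECT_0, ECT_1):
              packet["ecn"] = CE   # mark instead of dropping
              return "marked"
          return "dropped"         # non-ECN-capable (default): drop

   A deployment would still need the drop-on-excess behaviour
   described next, since marking alone cannot restrain senders that
   ignore CE.
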
   Network devices SHOULD use an AQM algorithm that marks ECN-capable
   traffic when making decisions about the response to congestion.
   Network devices need to implement this method by marking
   ECN-capable traffic or by dropping non-ECN-capable traffic.

   Safe deployment of ECN requires that network devices drop excessive
   traffic, even when marked as originating from an ECN-capable
   transport.  This is necessary because (1) a non-conformant, broken,
   or malicious receiver could conceal an ECN mark and not report this
   to the sender; (2) a non-conformant, broken, or malicious sender
   could ignore a reported ECN mark, as it could ignore a loss without
   using ECN; and (3) a malfunctioning or non-conforming network
   device may similarly "hide" an ECN mark.  In normal operation such
   cases should be very uncommon.

   Network devices SHOULD use an algorithm to drop excessive traffic,
   even when marked as originating from an ECN-capable transport.

4.3.  AQM algorithms deployed SHOULD NOT require operational tuning

   A number of AQM algorithms have been proposed.  Many require some
   form of tuning or setting of parameters for initial network
   conditions.  This can make these algorithms difficult to use in
   operational networks.

   This document therefore recommends that AQM algorithms proposed for
   deployment in the Internet:

   o  SHOULD NOT require tuning of initial or configuration
      parameters.  An algorithm needs to provide a default behaviour
      that auto-tunes to a reasonable performance for typical network
      conditions.  This is expected to ease deployment and operation.

   o  MAY support further manual tuning that could improve performance
      in a specific deployed network.  Algorithms that lack such
      variables are acceptable, but if such variables exist, they
      SHOULD be externalized.  Guidance needs to be provided on the
      cases where autotuning is unlikely to achieve satisfactory
      performance and to identify the set of parameters that can be
      tuned.  This is expected to enable the algorithm to be deployed
      in networks that have specific characteristics (variable/larger
      delay; networks where capacity is impacted by interactions with
      lower-layer mechanisms; etc.).

   o  MAY provide logging and alarm signals to assist in identifying
      if an algorithm using manual or auto-tuning is functioning as
      expected (e.g., this could be based on an internal consistency
      check between input, output, and mark/drop rates over time, as
      sketched below).  This is expected to encourage deployment by
      default and allow operators to identify potential interactions
      with other network functions.

   Hence, self-tuning algorithms are to be preferred.  Algorithms
   recommended for general Internet deployment by the IETF need to be
   designed so that they do not require operational (especially
   manual) configuration or tuning.

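   As one way such a consistency check might look (an entirely
   hypothetical sketch; the counter names and tolerance are invented
   for illustration):

      def consistency_alarm(counters, tolerance=0.01):
          """Raise an alarm when the packet-conservation identity
          input = output + drops + still-queued stops holding, which
          can indicate a misbehaving or mis-tuned AQM instance."""
          accounted = (counters["packets_out"]
                       + counters["packets_dropped"]
                       + counters["packets_queued"])
          imbalance = abs(counters["packets_in"] - accounted)
          if imbalance > tolerance * max(counters["packets_in"], 1):
              return f"ALARM: {imbalance} packets unaccounted for"
          return "ok"

      sample = {"packets_in": 1_000_000, "packets_out": 989_500,
                "packets_dropped": 10_000, "packets_queued": 500}
      print(consistency_alarm(sample))   # -> "ok"
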
4.4.  AQM algorithms SHOULD respond to measured congestion, not
      application profiles.

   Not all applications transmit packets of the same size.  Although
   applications may be characterised by particular profiles of packet
   size, this should not be used as the basis for AQM (see the next
   section).  Other methods exist, e.g., Differentiated Services
   queueing and Pre-Congestion Notification (PCN) [RFC5559], that can
   be used to differentiate and police classes of application.
   Network devices may combine AQM with these traffic classification
   mechanisms and perform AQM only on specific queues within a network
   device.

   An AQM algorithm should not deliberately bias its behaviour towards
   the packet size that performs best (i.e., preferentially drop/mark
   based only on packet size).  Procedures for selecting packets to
   mark/drop SHOULD observe the actual or projected time that a packet
   is in a queue (bytes at a rate being an analog to time).  When an
   AQM algorithm decides whether to drop (or mark) a packet, it is
   RECOMMENDED that the size of the particular packet should not be
   taken into account [Byte-pkt].

   Applications (or transports) generally know the packet size that
   they are using and can hence make their judgements about whether to
   use small or large packets based on the data they wish to send and
   the expected impact on the delay or throughput, or other
   performance parameter.  When a transport or application responds to
   a dropped or marked packet, the size of the rate reduction should
   be proportionate to the size of the packet that was sent
   [Byte-pkt].

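   A sketch of the byte/time equivalence noted above (Python; the
   5 ms target delay is an invented example, not a recommended value):

      def queue_delay_seconds(queued_bytes, link_rate_bps):
          """Projected time a newly arriving packet would wait:
          bytes already queued, divided by the drain rate."""
          return (queued_bytes * 8) / link_rate_bps

      def should_drop(queued_bytes, link_rate_bps, target=0.005):
          # The decision observes queueing time only; the size of the
          # packet under consideration deliberately plays no part
          # [Byte-pkt].
          return queue_delay_seconds(queued_bytes,
                                     link_rate_bps) > target

      # 62,500 bytes queued on a 100 Mb/s link equal 5 ms of delay.
      print(queue_delay_seconds(62_500, 100_000_000))  # 0.005
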
4.5.  AQM algorithms SHOULD NOT be dependent on specific transport
      protocol behaviours

   In deploying AQM, network devices need to support a range of
   Internet traffic and SHOULD NOT make implicit assumptions about the
   characteristics desired by the set of transports/applications the
   network supports.  That is, AQM methods should be opaque to the
   choice of transport and application.

   AQM algorithms are often evaluated by considering TCP [RFC0793]
   with a limited number of applications.  Although TCP is the
   predominant transport in the Internet today, this no longer
   represents a sufficient selection of traffic for verification.
   There is significant use of UDP [RFC0768] in voice and video
   services, and some applications find utility in SCTP [RFC4960] and
   DCCP [RFC4340].  Hence, AQM algorithms should also demonstrate
   operation with transports other than TCP and need to consider a
   variety of applications.  Selection of AQM algorithms also needs to
   consider the use of tunnel encapsulations that may carry traffic
   aggregates.

   AQM algorithms SHOULD NOT target or derive implicit assumptions
   about the characteristics desired by specific transports/
   applications.  Transports and applications need to respond to the
   congestion signals provided by AQM (i.e., dropping or ECN-marking)
   in a timely manner (within a few RTTs at the latest).

4.6.  Interactions with congestion control algorithms

   Applications and transports need to react to received implicit or
   explicit signals that indicate the presence of congestion.  This
   section identifies issues that can impact the design of transport
   protocols when using paths that use AQM.

   Transport protocols and applications need timely signals of
   congestion.  The time taken to detect and respond to congestion is
   increased when network devices queue packets in buffers.  It can be
   difficult to detect tail losses at a higher layer, and this may
   sometimes require transport timers or probe packets to detect and
   respond to such loss.  Loss patterns may also impact timely
   detection, e.g., the time may be reduced when network devices do
   not drop long runs of packets from the same flow.

   A common objective is to deliver data from its source end point to
   its destination in the least possible time.  When speaking of TCP
   performance, the terms "knee" and "cliff" are defined by [Jain94].
   They respectively refer to the minimum congestion window that
   maximises throughput and the maximum congestion window that avoids
   loss.  An application that transmits at the rate determined by this
   window has the effect of maximizing the rate or throughput.  For
   the sender, exceeding the cliff is ineffective, as it (by
   definition) induces loss; operating at a point close to the cliff
   has a negative impact on other traffic and applications, triggering
   operator activities, such as those discussed in [RFC6057].
   Operating below the knee reduces the throughput, since the sender
   fails to use available network capacity.  As a result, the behavior
   of any elastic transport congestion control algorithm designed to
   minimise delivery time should seek to use an effective window at or
   above the knee and well below the cliff.  Choice of an appropriate
   rate can significantly impact the loss and delay experienced not
   only by a flow, but by other flows that share the same queue.

   Some applications may send less than permitted by the congestion
   control window (or rate).  Examples include multimedia codecs that
   stream at some natural rate (or set of rates) or an application
   that is naturally interactive (e.g., some web applications, gaming,
   transaction-based protocols).  Such applications may have different
   objectives.  They may not wish to maximise throughput, but may
   desire a lower loss rate or bounded delay.

   The correct operation of an AQM-enabled network device MUST NOT
   rely upon specific transport responses to congestion signals.

4.7.  The need for further research

   The second recommendation of [RFC2309] called for further research
   into the interaction between network queues and host applications,
   and the means of signaling between them.  This research has
   occurred, and we as a community have learned a lot.  However, we
   are not done.

   We have learned that the problems of congestion, latency, and
   buffer-sizing have not gone away, and are becoming more important
   to many users.  A number of self-tuning AQM algorithms have been
   found that offer significant advantages for deployed networks.
   There is also renewed interest in deploying AQM and the potential
   of ECN.

   In 2013, an obvious example of further research is the need to
   consider the use of Map/Reduce applications in data centers; do we
   need to extend our taxonomy of TCP/SCTP sessions to include not
   only "mice" and "elephants", but "lemmings"?  "Lemmings" are flash
   crowds of "mice" that the network inadvertently tries to signal to
   as if they were elephant flows, resulting in head-of-line blocking
   in data center applications.

   Examples of other required research include:

   o  Research into new AQM and scheduling algorithms.

   o  Research into the use and deployment of ECN alongside AQM.

   o  Tools for enabling AQM (and ECN) deployment and measuring the
      performance.

   o  Methods for mitigating the impact of non-conformant and
      malicious flows.

   This document therefore reiterates the call of RFC 2309: we need
   continuing research as applications develop.

5.  IANA Considerations

   This memo asks the IANA for no new parameters.

6.  Security Considerations

   While security is a very important issue, it is largely orthogonal
   to the performance issues discussed in this memo.

   Many deployed network devices use queueing methods that allow
   unresponsive traffic to capture network capacity, denying access to
   other traffic flows.  This could potentially be used as a
   denial-of-service attack.  This threat could be reduced when
   network devices deploy AQM or some form of scheduling.  We note,
   however, that a denial-of-service attack may create unresponsive
   traffic flows that may be indistinguishable from other traffic
   flows (e.g., tunnels carrying aggregates of short flows, high-rate
   isochronous applications).  New methods therefore may remain
   vulnerable, and this document recommends that ongoing research
   should consider ways to mitigate such attacks.

7.  Privacy Considerations

   This document, by itself, presents no new privacy issues.

8.  Acknowledgements

   The original recommendation in [RFC2309] was written by the
   End-to-End Research Group, which is to say Bob Braden, Dave Clark,
   Jon Crowcroft, Bruce Davie, Steve Deering, Deborah Estrin, Sally
   Floyd, Van Jacobson, Greg Minshall, Craig Partridge, Larry
   Peterson, KK Ramakrishnan, Scott Shenker, John Wroclawski, and
   Lixia Zhang.  This is an edited version of that document, with much
   of its text and arguments unchanged.

   The need for an updated document was agreed to in the tsvarea
   meeting at IETF 86.  This document was reviewed on the aqm@ietf.org
   list.  Comments came from Colin Perkins, Richard Scheffenegger, and
   Dave Taht.

   Gorry Fairhurst was in part supported by the European Community
   under its Seventh Framework Programme through the Reducing Internet
   Transport Latency (RITE) project (ICT-317700).

9.  References

9.1.  Normative References

   [Byte-pkt] "Byte and Packet Congestion Notification", Work in
              Progress, draft-ietf-tsvwg-byte-pkt-congest, July 2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
              Explicit Congestion Notification (ECN) Field", BCP 124,
              RFC 4774, November 2006.

   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage
              Guidelines for Application Designers", BCP 145,
              RFC 5405, November 2008.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, November 2010.
   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon,
              P., and K. Carlberg, "Explicit Congestion Notification
              (ECN) for RTP over UDP", RFC 6679, August 2012.

9.2.  Informative References

   [AQM-WG]   "IETF AQM WG".

   [Demers90] Demers, A., Keshav, S., and S. Shenker, "Analysis and
              Simulation of a Fair Queueing Algorithm,
              Internetworking: Research and Experience", SIGCOMM
              Symposium proceedings on Communications architectures
              and protocols, 1990.

   [Floyd91]  Floyd, S., "Connections with Multiple Congested Gateways
              in Packet-Switched Networks Part 1: One-way Traffic",
              Computer Communications Review, October 1991.

   [Floyd95]  Floyd, S. and V. Jacobson, "Link-sharing and Resource
              Management Models for Packet Networks", IEEE/ACM
              Transactions on Networking, August 1995.

   [Jacobson88]
              Jacobson, V., "Congestion Avoidance and Control",
              SIGCOMM Symposium proceedings on Communications
              architectures and protocols, August 1988.

   [Jain94]   Jain, R., Ramakrishnan, K., and D-M. Chiu, "Congestion
              avoidance scheme for computer networks", US Patent
              Office 5377327, December 1994.

   [Lakshman96]
              Lakshman, TV., Neidhardt, A., and T. Ott, "The Drop From
              Front Strategy in TCP Over ATM and Its Interworking with
              Other Control Features", IEEE Infocomm, 1996.

   [Leland94] Leland, W., Taqqu, M., Willinger, W., and D. Wilson, "On
              the Self-Similar Nature of Ethernet Traffic (Extended
              Version)", IEEE/ACM Transactions on Networking,
              February 1994.

   [Papagiannaki]
              Papagiannaki, K. et al. (Sprint ATL, KAIST, University
              of Minnesota, and Intel Research), "Analysis of
              Point-To-Point Packet Delay In an Operational Network",
              IEEE Infocom 2004, March 2004.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              August 1980.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC0896]  Nagle, J., "Congestion control in IP/TCP internetworks",
              RFC 896, January 1984.

   [RFC0970]  Nagle, J., "On packet switches with infinite storage",
              RFC 970, December 1985.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC1633]  Braden, B., Clark, D., and S. Shenker, "Integrated
              Services in the Internet Architecture: an Overview",
              RFC 1633, June 1994.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B.,
              Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
              Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
              K., Shenker, S., Wroclawski, J., and L. Zhang,
              "Recommendations on Queue Management and Congestion
              Avoidance in the Internet", RFC 2309, April 1998.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.
   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340,
              March 2006.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, September 2008.

   [RFC5559]  Eardley, P., "Pre-Congestion Notification (PCN)
              Architecture", RFC 5559, June 2009.

   [RFC6057]  Bastian, C., Klieber, T., Livingood, J., Mills, J., and
              R. Woundy, "Comcast's Protocol-Agnostic Congestion
              Management System", RFC 6057, December 2010.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)",
              RFC 6817, December 2012.

   [Willinger95]
              Willinger, W., Taqqu, M., Sherman, R., Wilson, D., and
              V. Jacobson, "Self-Similarity Through High-Variability:
              Statistical Analysis of Ethernet LAN Traffic at the
              Source Level", SIGCOMM Symposium proceedings on
              Communications architectures and protocols, August 1995.

Appendix A.  Change Log

   Initial Version: March 2013

   Minor update of "the algorithms that the IETF recommends SHOULD NOT
   require operational (especially manual) configuration or tuning".
   April 2013

   Major surgery.  This draft is for discussion at IETF-87 and
   expected to be further updated.  July 2013

   -00 WG Draft - Updated transport recommendations; revised
   deployment configuration section; numerous minor edits.  Oct 2013

Authors' Addresses

   Fred Baker (editor)
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com

   Godred Fairhurst (editor)
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen, Scotland  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk
   URI:   http://www.erg.abdn.ac.uk