Network Working Group                                     F. Baker, Ed.
Internet-Draft                                            Cisco Systems
Obsoletes: 2309 (if approved)                         G. Fairhurst, Ed.
Intended status: Best Current Practice           University of Aberdeen
Expires: August 19, 2014                              February 15, 2014

        IETF Recommendations Regarding Active Queue Management
                   draft-ietf-aqm-recommendation-03

Abstract

This memo presents recommendations to the Internet community concerning measures to improve and preserve Internet performance.  It presents a strong recommendation for testing, standardization, and widespread deployment of active queue management (AQM) in network devices, to improve the performance of today's Internet.  It also urges a concerted effort of research, measurement, and ultimate deployment of AQM mechanisms to protect the Internet from flows that are not sufficiently responsive to congestion notification.

The note largely repeats the recommendations of RFC 2309, updated after fifteen years of experience and new research.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 19, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors.  All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  The Need For Active Queue Management
   3.  Managing Aggressive Flows
   4.  Conclusions and Recommendations
     4.1.  Operational deployments SHOULD use AQM procedures
     4.2.  Signaling to the transport endpoints
       4.2.1.  AQM and ECN
     4.3.  AQM algorithms deployed SHOULD NOT require operational tuning
     4.4.  AQM algorithms SHOULD respond to measured congestion, not application profiles
     4.5.  AQM algorithms SHOULD NOT be dependent on specific transport protocol behaviours
     4.6.  Interactions with congestion control algorithms
     4.7.  The need for further research
   5.  IANA Considerations
   6.  Security Considerations
   7.  Privacy Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Change Log
   Authors' Addresses

1.  Introduction

The Internet protocol architecture is based on a connectionless end-to-end packet service using the Internet Protocol, whether IPv4 [RFC0791] or IPv6 [RFC2460].  The advantages of its connectionless design (flexibility and robustness) have been amply demonstrated.  However, these advantages are not without cost: careful design is required to provide good service under heavy load.  In fact, lack of attention to the dynamics of packet forwarding can result in severe service degradation or "Internet meltdown".  This phenomenon was first observed during the early growth phase of the Internet in the mid 1980s [RFC0896][RFC0970], and is technically called "congestive collapse".

The original fix for Internet meltdown was provided by Van Jacobson.  Beginning in 1986, Jacobson developed the congestion avoidance mechanisms that are now required in TCP implementations [Jacobson88] [RFC1122].  These mechanisms operate in Internet hosts to cause TCP connections to "back off" during congestion.  We say that TCP flows are "responsive" to congestion signals (i.e., marked or dropped packets) from the network.  It is primarily these TCP congestion avoidance algorithms that prevent the congestive collapse of today's Internet.  Similar algorithms are specified for other non-TCP transports.

However, that is not the end of the story.  Considerable research has been done on Internet dynamics since 1988, and the Internet has grown.  It has become clear that the TCP congestion avoidance mechanisms [RFC5681], while necessary and powerful, are not sufficient to provide good service in all circumstances.
Basically, there is a limit to how much control can be accomplished from the edges of the network.  Some mechanisms are needed in the network devices to complement the endpoint congestion avoidance mechanisms.  These mechanisms may be implemented in network devices that include routers, switches, and other network middleboxes.

It is useful to distinguish between two classes of algorithms related to congestion control: "queue management" versus "scheduling" algorithms.  To a rough approximation, queue management algorithms manage the length of packet queues by marking or dropping packets when necessary or appropriate, while scheduling algorithms determine which packet to send next and are used primarily to manage the allocation of bandwidth among flows.  While these two mechanisms are closely related, they address different performance issues.

This memo highlights two performance issues:

The first issue is the need for an advanced form of queue management that we call "Active Queue Management" (AQM).  Section 2 summarizes the benefits that active queue management can bring.  A number of AQM procedures are described in the literature, with different characteristics.  This document does not recommend any of them in particular, but does make recommendations that ideally would affect the choice of procedure used in a given implementation.

The second issue, discussed in Section 3 of this memo, is the potential for future congestive collapse of the Internet due to flows that are unresponsive, or not sufficiently responsive, to congestion indications.  Unfortunately, there is currently no consensus solution for controlling congestion caused by such aggressive flows; significant research and engineering will be required before any solution becomes available.  It is imperative that this work be energetically pursued, to ensure the future stability of the Internet.
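The division of labour drawn above between queue management and scheduling can be sketched as a toy Python model.  This is purely illustrative; the class and method names are the author's own, not taken from this memo or from any implementation:

```python
from collections import deque

class TailAwareQueue:
    """Toy model separating the two roles discussed above.

    - Queue management: decide whether an arriving packet is accepted
      or dropped (here: a trivial length threshold, i.e., tail drop).
    - Scheduling: decide which queued packet to send next (here:
      plain FIFO).
    """
    def __init__(self, limit):
        self.limit = limit
        self.q = deque()

    def enqueue(self, pkt):
        # Queue-management decision: drop when the length limit is hit.
        if len(self.q) >= self.limit:
            return False          # packet dropped
        self.q.append(pkt)
        return True

    def dequeue(self):
        # Scheduling decision: FIFO always picks the oldest packet.
        return self.q.popleft() if self.q else None

q = TailAwareQueue(limit=2)
accepted = [q.enqueue(p) for p in ("a", "b", "c")]
print(accepted)        # third packet is tail-dropped: [True, True, False]
print(q.dequeue())     # FIFO releases the oldest packet first: "a"
```

A real AQM replaces the fixed length test in `enqueue` with a probabilistic or delay-based mark/drop decision, while a scheduler such as FQ replaces the FIFO policy in `dequeue`; the two decisions remain independent, which is the point of the distinction.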
Section 4 concludes the memo with a set of recommendations to the Internet community concerning these topics.

The discussion in this memo applies to "best-effort" traffic, which is to say, traffic generated by applications that accept the occasional loss, duplication, or reordering of traffic in flight.  It also applies to other traffic, such as real-time traffic that can adapt its sending rate to reduce loss and/or delay.  It is most effective when the adaptation occurs on time scales of a single Round-Trip Time (RTT) or a small number of RTTs, for elastic traffic [RFC1633].

[RFC2309] resulted from past discussions of end-to-end performance, Internet congestion, and Random Early Discard (RED) in the End-to-End Research Group of the Internet Research Task Force (IRTF).  This update results from experience with this and other algorithms, and the AQM discussion within the IETF [AQM-WG].

1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2.  The Need For Active Queue Management

The traditional technique for managing the queue length in a network device is to set a maximum length (in terms of packets) for each queue, accept packets for the queue until the maximum length is reached, then reject (drop) subsequent incoming packets until the queue length decreases because a packet from the queue has been transmitted.  This technique is known as "tail drop", since the packet that arrived most recently (i.e., the one at the tail of the queue) is dropped when the queue is full.  This method has served the Internet well for years, but it has two important drawbacks:

1.  Lock-Out

    In some situations tail drop allows a single connection or a few flows to monopolize the queue space, starving other connections and preventing them from getting room in the queue.  This "lock-out" phenomenon is often the result of synchronization or other timing effects.

2.  Full Queues

    The tail drop discipline allows queues to maintain a full (or almost full) status for long periods of time, since tail drop signals congestion (via a packet drop) only when the queue has become full.  It is important to reduce the steady-state queue size, and this is perhaps the most important goal for queue management.

    The naive assumption might be that there is a simple tradeoff between delay and throughput, and that the recommendation that queues be maintained in a "non-full" state essentially translates to a recommendation that low end-to-end delay is more important than high throughput.  However, this does not take into account the critical role that packet bursts play in Internet performance.  For example, even though TCP constrains the congestion window of a flow, packets often arrive at network devices in bursts [Leland94].  If the queue is full or almost full, an arriving burst will cause multiple packets to be dropped.  This can result in a global synchronization of flows throttling back, followed by a sustained period of lowered link utilization, reducing overall throughput.

    The point of buffering in the network is to absorb data bursts and to transmit them during the (hopefully) ensuing bursts of silence.  This is essential to permit the transmission of bursty data.  Normally small queues are preferred in network devices, with sufficient queue capacity to absorb the bursts.  The counter-intuitive result is that maintaining normally-small queues can result in higher throughput as well as lower end-to-end delay.
    In summary, queue limits should not reflect the steady-state queues we want to be maintained in the network; instead, they should reflect the size of bursts that a network device needs to absorb.

Besides tail drop, two alternative queue disciplines that can be applied when a queue becomes full are "random drop on full" and "drop front on full".  Under the random drop on full discipline, a network device drops a randomly selected packet from the queue (which can be an expensive operation, since it naively requires an O(N) walk through the packet queue) when the queue is full and a new packet arrives.  Under the "drop front on full" discipline [Lakshman96], the network device drops the packet at the front of the queue when the queue is full and a new packet arrives.  Both of these solve the lock-out problem, but neither solves the full-queues problem described above.

We know in general how to solve the full-queues problem for "responsive" flows, i.e., those flows that throttle back in response to congestion notification.  In the current Internet, dropped packets provide a critical mechanism for indicating congestion to hosts.  The solution to the full-queues problem is for network devices to drop packets before a queue becomes full, so that hosts can respond to congestion before buffers overflow.  We call such a proactive approach AQM.  By dropping packets before buffers overflow, AQM allows network devices to control when and how many packets to drop.

In summary, an active queue management mechanism can provide the following advantages for responsive flows.

1.  Reduce number of packets dropped in network devices

    Packet bursts are an unavoidable aspect of packet networks [Willinger95].  If all the queue space in a network device is already committed to "steady state" traffic or if the buffer space is inadequate, then the network device will have no ability to buffer bursts.  By keeping the average queue size small, AQM will provide greater capacity to absorb naturally-occurring bursts without dropping packets.

    Furthermore, without AQM, more packets will be dropped when a queue does overflow.  This is undesirable for several reasons.  First, with a shared queue and the tail drop discipline, this can result in unnecessary global synchronization of flows, resulting in lowered average link utilization, and hence lowered network throughput.  Second, unnecessary packet drops represent a waste of network capacity on the path before the drop point.

    While AQM can manage queue lengths and reduce end-to-end latency even in the absence of end-to-end congestion control, it will be able to reduce packet drops only in an environment that continues to be dominated by end-to-end congestion control.

2.  Provide a lower-delay interactive service

    By keeping a small average queue size, AQM will reduce the delays experienced by flows.  This is particularly important for interactive applications such as short Web transfers, POP/IMAP, Telnet traffic, or interactive audio-video sessions, whose subjective (and objective) performance is better when the end-to-end delay is low.

3.  Avoid lock-out behavior

    AQM can prevent lock-out behavior by ensuring that there will almost always be a buffer available for an incoming packet.  For the same reason, AQM can prevent a bias against low-capacity, but highly bursty, flows.

    Lock-out is undesirable because it constitutes a gross unfairness among groups of flows.  However, we stop short of calling this benefit "increased fairness", because general fairness among flows requires per-flow state, which is not provided by queue management.  For example, in a network device using AQM with only FIFO scheduling, two TCP flows may receive very different shares of the network capacity simply because they have different round-trip times [Floyd91], and a flow that does not use congestion control may receive more capacity than a flow that does.  For example, a router may maintain per-flow state to achieve general fairness by a per-flow scheduling algorithm such as Fair Queueing (FQ) [Demers90], or a Class-Based Queue scheduling algorithm such as CBQ [Floyd95].

In contrast, AQM is needed even for network devices that use per-flow scheduling algorithms such as FQ or class-based scheduling algorithms such as CBQ.  This is because per-flow scheduling algorithms by themselves do not control the overall queue size or the size of individual queues.  AQM is needed to control the overall average queue sizes, so that arriving bursts can be accommodated without dropping packets.  In addition, AQM should be used to control the queue size for each individual flow or class, so that they do not experience unnecessarily high delay.  Therefore, AQM should be applied across the classes or flows as well as within each class or flow.

In short, scheduling algorithms and queue management should be seen as complementary, not as replacements for each other.

An AQM method may use Explicit Congestion Notification (ECN) [RFC3168] instead of dropping, to mark packets under mild or moderate congestion (see Section 4.2.1).

It is also important to differentiate the choice of buffer size for a queue in a switch/router or other network device from the threshold(s) and other parameters that determine how and when an AQM algorithm operates.
On the one hand, the optimum buffer size is a function of operational requirements and should generally be sized to be sufficient to buffer the largest normal traffic burst that is expected.  This size depends on the number and burstiness of traffic arriving at the queue and the rate at which traffic leaves the queue.  Different types of traffic and deployment scenarios will lead to different requirements.  AQM frees a designer from having to limit buffer space to achieve acceptable performance, allowing allocation of sufficient buffering to satisfy the needs of the particular traffic pattern.  On the other hand, the choice of AQM algorithm and associated parameters is a function of the way in which congestion is experienced and the required reaction to achieve acceptable performance.  This latter topic is the primary focus of the following sections.

3.  Managing Aggressive Flows

One of the keys to the success of the Internet has been the congestion avoidance mechanisms of TCP.  Because TCP "backs off" during congestion, a large number of TCP connections can share a single, congested link in such a way that link bandwidth is shared reasonably equitably among similarly situated flows.  The equitable sharing of bandwidth among flows depends on all flows running compatible congestion avoidance algorithms, i.e., methods conformant with the current TCP specification [RFC5681].

We call a flow "TCP-friendly" when it has a congestion response that approximates the average response expected of a TCP flow.  One example method of a TCP-friendly scheme is the TCP-Friendly Rate Control algorithm [RFC5348].  In this document, the term is used more generally to describe this and other algorithms that meet these goals.
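The notion of "no more capacity than a conformant TCP under comparable conditions" can be illustrated with a simplified, steady-state TCP throughput model (a simpler relative of the full equation used in [RFC5348]).  The sketch below is an illustration only; the function name is the author's own and the model omits timeouts and other refinements:

```python
from math import sqrt

def tcp_friendly_rate(mss_bytes, rtt_s, loss_rate):
    """Simplified steady-state TCP throughput model:

        rate ~= MSS / (RTT * sqrt(2p/3))   [bytes per second]

    A flow that, under the same MSS, RTT and loss rate p, sends no
    faster than this is (loosely) "TCP-friendly" in the sense above.
    """
    if loss_rate <= 0:
        raise ValueError("model is only defined for a positive loss rate")
    return mss_bytes / (rtt_s * sqrt(2.0 * loss_rate / 3.0))

# 1500-byte packets, 100 ms RTT, 1% loss: roughly 1.5 Mbit/s.
rate = tcp_friendly_rate(1500, 0.100, 0.01)
print(f"{rate * 8 / 1e6:.2f} Mbit/s")
```

The inverse dependence on both RTT and the square root of the loss rate is what makes the later observation about FIFO queues true: two conformant TCP flows with different RTTs obtain very different shares of the same bottleneck.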
It is convenient to divide flows into three classes: (1) TCP-friendly flows, (2) unresponsive flows, i.e., flows that do not slow down when congestion occurs, and (3) flows that are responsive but are not TCP-friendly.  The last two classes contain more aggressive flows that pose significant threats to Internet performance, which we will now discuss.

1.  TCP-Friendly flows

    A TCP-friendly flow responds to congestion notification within a small number of path Round-Trip Times (RTTs), and in steady state it uses no more capacity than a conformant TCP running under comparable conditions (drop rate, RTT, packet size, etc.).  This is described in the remainder of the document.

2.  Non-Responsive Flows

    The User Datagram Protocol (UDP) [RFC0768] provides a minimal, best-effort transport to applications and upper-layer protocols (both simply called "applications" in the remainder of this document) and does not itself provide mechanisms to prevent congestion collapse and establish a degree of fairness [RFC5405].

    There is a growing set of UDP-based applications whose congestion avoidance algorithms are inadequate or nonexistent (i.e., a flow that does not throttle its sending rate when it experiences congestion).  Examples include some UDP streaming applications for packet voice and video, and some multicast bulk data transport.  If no action is taken, such unresponsive flows could lead to a new congestive collapse [RFC2309].

    In general, UDP-based applications need to incorporate effective congestion avoidance mechanisms [RFC5405].  Further research and development of ways to accomplish congestion avoidance for presently unresponsive applications continue to be important.  Network devices need to be able to protect themselves against unresponsive flows, and mechanisms to accomplish this must be developed and deployed.
    Deployment of such mechanisms would provide an incentive for all applications to become responsive by either using a congestion-controlled transport (e.g., TCP, SCTP, DCCP) or by incorporating their own congestion control in the application [RFC5405].

3.  Non-TCP-friendly Transport Protocols

    A second threat is posed by transport protocol implementations that are responsive to congestion, but, either deliberately or through faulty implementation, are not TCP-friendly.  Such applications may gain an unfair share of the available network capacity.

    For example, the popularity of the Internet has caused a proliferation in the number of TCP implementations.  Some of these may fail to implement the TCP congestion avoidance mechanisms correctly because of poor implementation.  Others may deliberately be implemented with congestion avoidance algorithms that are more aggressive in their use of capacity than other TCP implementations; this would allow a vendor to claim to have a "faster TCP".  The logical consequence of such implementations would be a spiral of increasingly aggressive TCP implementations, leading back to the point where there is effectively no congestion avoidance and the Internet is chronically congested.

    Another example could be an RTP/UDP video flow that uses an adaptive codec, but responds incompletely to indications of congestion or responds over an excessively long time period.  Such flows are unlikely to be responsive to congestion signals in a timeframe comparable to a small number of end-to-end transmission delays.  However, over a longer timescale, perhaps seconds in duration, they could moderate their speed, or increase their speed if they determine capacity to be available.

    Tunneled traffic aggregates carrying multiple (short) TCP flows can be more aggressive than standard bulk TCP.  Applications (e.g., web browsers and peer-to-peer file-sharing) have exploited this by opening multiple connections to the same endpoint.

The projected increase in the fraction of total Internet traffic for more aggressive flows in classes 2 and 3 clearly poses a threat to future Internet stability.  There is an urgent need for measurements of current conditions and for further research into ways of managing such flows.  This raises many difficult issues in identifying and isolating unresponsive or non-TCP-friendly flows at an acceptable overhead cost.  Finally, there is as yet little measurement or simulation evidence available about the rate at which these threats are likely to be realized, or about the expected benefit of algorithms for managing such flows.

Another topic requiring consideration is the appropriate granularity of a "flow" when considering a queue management method.  There are a few "natural" answers: 1) a transport (e.g., TCP or UDP) flow (source address/port, destination address/port, Differentiated Services Code Point (DSCP)); 2) a source/destination host pair (IP addresses, DSCP); 3) a given source host or a given destination host.  We suggest that the source/destination host pair gives the most appropriate granularity in many circumstances.  However, it is possible that different vendors/providers could set different granularities for defining a flow (as a way of "distinguishing" themselves from one another), or that different granularities could be chosen for different places in the network.  It may be the case that the granularity is less important than the fact that a network device needs to be able to deal with unresponsive flows at *some* granularity.  The granularity of flows for congestion management is, at least in part, a question of policy that needs to be addressed in the wider IETF community.

4.  Conclusions and Recommendations

The IRTF, in publishing [RFC2309], and the IETF in subsequent discussion, have developed a set of specific recommendations regarding the implementation and operational use of AQM procedures.  This document updates these to include:

1.  Network devices SHOULD implement some AQM mechanism to manage queue lengths, reduce end-to-end latency, and avoid lock-out phenomena within the Internet.

2.  Deployed AQM algorithms SHOULD support Explicit Congestion Notification (ECN) as well as loss to signal congestion to endpoints.

3.  The algorithms that the IETF recommends SHOULD NOT require operational (especially manual) configuration or tuning.

4.  AQM algorithms SHOULD respond to measured congestion, not application profiles.

5.  AQM algorithms SHOULD NOT interpret specific transport protocol behaviours.

6.  Transport protocol congestion control algorithms SHOULD maximize their use of available capacity (when there is data to send) without incurring undue loss or undue round-trip delay.

7.  Research, engineering, and measurement efforts are needed regarding the design of mechanisms to deal with flows that are unresponsive to congestion notification or are responsive, but are more aggressive than present TCP.

These recommendations are expressed using the word "SHOULD".  This is in recognition that there may be use cases that have not been envisaged in this document in which the recommendation does not apply.  However, care should be taken in concluding that one's use case falls into that category; during the life of the Internet, such use cases have been rarely if ever observed and reported on.  To the contrary, available research [Papagiannaki] shows that even high-speed links in network cores that are normally very stable in depth and behavior experience occasional issues that need moderation.

4.1.  Operational deployments SHOULD use AQM procedures

AQM procedures are designed to minimize the delay induced in the network by queues that have filled as a result of host behavior.  Marking and loss behaviors provide a signal that buffers within network devices are becoming unnecessarily full, and that the sender would do well to moderate its behavior.

4.2.  Signaling to the transport endpoints

There are a number of ways a network device may signal to the endpoint that the network is becoming congested and trigger a reduction in rate.  The signalling methods include:

o  Delaying transport segments (packets) in flight, such as in a queue.

o  Dropping transport segments (packets) in transit.

o  Marking transport segments (packets), such as using Explicit Congestion Notification (ECN) [RFC3168] [RFC4301] [RFC4774] [RFC6040] [RFC6679].

The use of scheduling mechanisms, such as priority queuing, classful queuing, and fair queuing, is often effective in networks to help a network serve the needs of a range of applications.  Network operators can use these methods to manage traffic passing a choke point.  This is discussed in [RFC2474] and [RFC2475].

Increased network latency can be used as an implicit signal of congestion.  For example, in TCP additional delay can affect ACK clocking and has the result of reducing the rate of transmission of new data.  In RTP, network latency impacts the RTCP-reported RTT, and increased latency can trigger a sender to adjust its rate.  Methods such as LEDBAT [RFC6817] treat increased latency as a primary signal of congestion.

It is essential that all Internet hosts respond to loss [RFC5681] [RFC5405] [RFC4960] [RFC4340].  Packet dropping by network devices that are under load has two effects: It protects the network, which is the primary reason that network devices drop packets.
The detection of 552 loss also provides a signal to a reliable transport (e.g. TCP, SCTP) 553 that there is potential congestion using a pragmatic heuristic; "when 554 the network discards a message in flight, it may imply the presence 555 of faulty equipment or media in a path, and it may imply the presence 556 of congestion. To be conservative, a transport must assume it may be 557 the latter." Unreliable transports (e.g. using UDP) need to 558 similarly react to loss [RFC5405] 560 Network devices SHOULD use an AQM algorithm to determine the packets 561 that are marked or discarded due to congestion. 563 Loss also has an effect on the efficiency of a flow and can 564 significantly impact some classes of application. In reliable 565 transports the dropped data must be subsequently retransmitted. 566 While other applications/transports may adapt to the absence of lost 567 data, this still implies inefficient use of available capacity and 568 the dropped traffic can affect other flows. Hence, congestion 569 signalling by loss is not entirely positive; it is a necessary evil. 571 4.2.1. AQM and ECN 573 Explicit Congestion Notification (ECN) [RFC4301] [RFC4774] [RFC6040] 574 [RFC6679] is a network-layer function that allows a transport to 575 receive network congestion information from a network device without 576 incurring the unintended consequences of loss. ECN includes both 577 transport mechanisms and functions implemented in network devices, 578 the latter rely upon using AQM to decider whether to ECN-mark. 580 Congestion for ECN-capable transports is signalled by a network 581 device setting the "Congestion Experienced (CE)" codepoint in the IP 582 header. This codepoint is noted by the remote receiving end point 583 and signalled back to the sender using a transport protocol 584 mechanism, allowing the sender to trigger timely congestion control. 585 The decision to set the CE codepoint requires an AQM algorithm 586 configured with a threshold. 
Non-ECN capable flows (the default) are 587 dropped under congestion. 589 Network devices SHOULD use an AQM algorithm that marks ECN-capable 590 traffic when making decisions about the response to congestion. 591 Network devices need to implement this method by marking ECN-capable 592 traffic or by dropping non-ECN-capable traffic. 594 Safe deployment of ECN requires that network devices drop excessive 595 traffic, even when marked as originating from an ECN-capable 596 transport. This is a necessary safety precaution because: (1) a non- 597 conformant, broken or malicious receiver could conceal an ECN mark, 598 and not report this to the sender; (2) a non-conformant, broken or 599 malicious sender could ignore a reported ECN mark, as it could ignore 600 a loss without using ECN; and (3) a malfunctioning or non-conforming 601 network device may similarly "hide" an ECN mark. In normal operation 602 such cases should be very uncommon; however, overload protection is 603 desirable to protect traffic from misconfigured or malicious use of 604 ECN. 606 Network devices SHOULD use an algorithm to drop excessive traffic, 607 even when marked as originating from an ECN-capable transport. 609 4.3. AQM algorithms deployed SHOULD NOT require operational tuning 611 A number of AQM algorithms have been proposed. Many require some 612 form of tuning or setting of parameters for initial network 613 conditions. This can make these algorithms difficult to use in 614 operational networks. 616 AQM algorithms need to consider both "initial conditions" and 617 "operational conditions". The former includes values that exist 618 before any experience is gathered about the use of the algorithm, 619 such as the configured speed of the interface, support for full duplex 620 communication, the interface MTU, and other properties of the link. 621 The latter includes information observed from monitoring the size of the 622 queue, experienced queueing delay, rate of packet discard, etc.
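The per-packet behaviour recommended in Section 4.2.1 — mark ECN-capable traffic, drop non-ECN-capable traffic, and drop even ECN-capable traffic under overload — can be sketched as a toy decision function. This is a minimal illustration only: the function name, the fixed thresholds, and the linear probability ramp are invented for this sketch and do not correspond to any specific standardized AQM algorithm, which would instead auto-tune such values (Section 4.3).

```python
import random

# Illustrative thresholds only; a deployable AQM algorithm would
# auto-tune these rather than rely on fixed configured values.
TARGET_DELAY = 0.005    # queueing delay (s) below which no action is taken
OVERLOAD_DELAY = 0.100  # delay (s) beyond which even ECN-capable packets drop

def aqm_decision(queue_delay, ecn_capable):
    """Return 'forward', 'mark', or 'drop' for one packet.

    queue_delay: measured (or projected) sojourn time of the packet (s)
    ecn_capable: True if the packet carries an ECT codepoint
    """
    if queue_delay <= TARGET_DELAY:
        return "forward"
    if queue_delay >= OVERLOAD_DELAY:
        # Safety valve: drop excessive traffic even when ECN-capable,
        # guarding against concealed or ignored ECN marks.
        return "drop"
    # Congestion-signalling probability grows with the excess delay.
    p = (queue_delay - TARGET_DELAY) / (OVERLOAD_DELAY - TARGET_DELAY)
    if random.random() < p:
        # ECN-capable traffic is CE-marked; non-ECN-capable traffic
        # (the default) is dropped under congestion.
        return "mark" if ecn_capable else "drop"
    return "forward"
```

Note that the sketch signals based on the time a packet spends in the queue, not on queue length in packets or on packet size, consistent with the recommendations of Section 4.4.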
624 This document therefore specifies that AQM algorithms that are 625 proposed for deployment in the Internet have the following 626 properties: 628 o SHOULD NOT require tuning of initial or configuration parameters. 629 An algorithm needs to provide a default behaviour that auto-tunes 630 to reasonable performance for typical network operational 631 conditions. This is expected to ease deployment and operation. 632 Initial conditions, such as the interface rate and MTU size or 633 other values derived from these, MAY be required by an AQM 634 algorithm. 636 o MAY support further manual tuning that could improve performance 637 in a specific deployed network. Algorithms that lack such 638 variables are acceptable, but if such variables exist, they SHOULD 639 be externalized (made visible to the operator). Guidance needs to 640 be provided on the cases where auto-tuning is unlikely to achieve 641 satisfactory performance and to identify the set of parameters 642 that can be tuned. This is expected to enable the algorithm to be 643 deployed in networks that have specific characteristics (variable/ 644 larger delay; networks where capacity is impacted by interactions 645 with lower layer mechanisms, etc.). 647 o MAY provide logging and alarm signals to assist in identifying if 648 an algorithm using manual or auto-tuning is functioning as 649 expected (e.g., this could be based on an internal consistency 650 check between input, output, and mark/drop rates over time). This 651 is expected to encourage deployment by default and allow operators 652 to identify potential interactions with other network functions. 654 Hence, self-tuning algorithms are to be preferred. Algorithms 655 recommended for general Internet deployment by the IETF need to be 656 designed so that they do not require operational (especially manual) 657 configuration or tuning. 659 4.4. AQM algorithms SHOULD respond to measured congestion, not 660 application profiles.
662 Not all applications transmit packets of the same size. Although 663 applications may be characterized by particular profiles of packet 664 size, this should not be used as the basis for AQM (see next section). 665 Other methods exist, e.g. Differentiated Services queueing, Pre- 666 Congestion Notification (PCN) [RFC5559], that can be used to 667 differentiate and police classes of application. Network devices may 668 combine AQM with these traffic classification mechanisms and perform 669 AQM only on specific queues within a network device. 671 An AQM algorithm should not deliberately try to prejudice the size of 672 packet that performs best (i.e., preferentially drop/mark based only 673 on packet size). Procedures for selecting packets to mark/drop 674 SHOULD observe the actual or projected time that a packet is in a 675 queue (bytes at a rate being an analog to time). When an AQM 676 algorithm decides whether to drop (or mark) a packet, it is 677 RECOMMENDED that the size of the particular packet should not be 678 taken into account [Byte-pkt]. 680 Applications (or transports) generally know the packet size that they 681 are using and can hence make their judgments about whether to use 682 small or large packets based on the data they wish to send and the 683 expected impact on the delay or throughput, or other performance 684 parameter. When a transport or application responds to a dropped or 685 marked packet, the size of the rate reduction should be proportionate 686 to the size of the packet that was sent [Byte-pkt]. 688 An AQM-enabled system MAY instantiate different instances of an AQM 689 algorithm to be applied within the same traffic class. Traffic 690 classes may be differentiated based on an Access Control List (ACL), 691 the packet DiffServ Code Point (DSCP) [RFC2474], setting of the ECN 692 field [RFC3168] [RFC4774], a multi-field (MF) classifier that combines 693 the values of a set of protocol fields (e.g.
IP address, transport, 694 ports) or an equivalent codepoint at a lower layer. This 695 recommendation goes beyond what is defined in RFC 3168, by allowing 696 that an implementation MAY use more than one instance of an AQM 697 algorithm to handle both ECN-capable and non-ECN-capable packets. 699 4.5. AQM algorithms SHOULD NOT be dependent on specific transport 700 protocol behaviours 702 In deploying AQM, network devices need to support a range of Internet 703 traffic and SHOULD NOT make implicit assumptions about the 704 characteristics desired by the set of transports/applications that the 705 network supports. That is, AQM methods should be opaque to the 706 choice of transport and application. 708 AQM algorithms are often evaluated by considering TCP [RFC0793] with 709 a limited number of applications. Although TCP is the predominant 710 transport in the Internet today, this no longer represents a 711 sufficient selection of traffic for verification. There is 712 significant use of UDP [RFC0768] in voice and video services, and 713 some applications find utility in SCTP [RFC4960] and DCCP [RFC4340]. 714 Hence, AQM algorithms should also demonstrate operation with 715 transports other than TCP and need to consider a variety of 716 applications. Selection of AQM algorithms also needs to consider use 717 of tunnel encapsulations that may carry traffic aggregates. 719 AQM algorithms SHOULD NOT target or derive implicit assumptions about 720 the characteristics desired by specific transports/applications. 721 Transports and applications need to respond to the congestion signals 722 provided by AQM (i.e. dropping or ECN-marking) in a timely manner 723 (within a few RTTs at the latest). 725 4.6. Interactions with congestion control algorithms 727 Applications and transports need to react to received implicit or 728 explicit signals that indicate the presence of congestion.
This 729 section identifies issues that can impact the design of transport 730 protocols when using paths that use AQM. 732 Transport protocols and applications need timely signals of 733 congestion. The time taken to detect and respond to congestion is 734 increased when network devices queue packets in buffers. It can be 735 difficult to detect tail losses at a higher layer and this may 736 sometimes require transport timers or probe packets to detect and 737 respond to such loss. Loss patterns may also impact timely 738 detection, e.g. the time may be reduced when network devices do not 739 drop long runs of packets from the same flow. 741 A common objective is to deliver data from its source end point to 742 its destination in the least possible time. When speaking of TCP 743 performance, the terms "knee" and "cliff" are defined by [Jain94]. 744 They respectively refer to the minimum congestion window that 745 maximizes throughput and the maximum congestion window that avoids 746 loss. An application that transmits at the rate determined by this 747 window has the effect of maximizing the rate or throughput. For the 748 sender, exceeding the cliff is ineffective, as it (by definition) 749 induces loss; operating at a point close to the cliff has a negative 750 impact on other traffic and applications, triggering operator 751 activities, such as those discussed in [RFC6057]. Operating below 752 the knee reduces the throughput, since the sender fails to use 753 available network capacity. As a result, the behavior of any elastic 754 transport congestion control algorithm designed to minimize delivery 755 time should seek to use an effective window at or above the knee and 756 well below the cliff. Choice of an appropriate rate can 757 significantly impact the loss and delay experienced not only by a 758 flow, but by other flows that share the same queue. 760 Some applications may send less than permitted by the congestion 761 control window (or rate).
Examples include multimedia codecs that 762 stream at some natural rate (or set of rates) or an application that 763 is naturally interactive (e.g., some web applications, gaming, 764 transaction-based protocols). Such applications may have different 765 objectives. They may not wish to maximize throughput, but may desire 766 a lower loss rate or bounded delay. 768 The correct operation of an AQM-enabled network device MUST NOT rely 769 upon specific transport responses to congestion signals. 771 4.7. The need for further research 773 The second recommendation of [RFC2309] called for further research 774 into the interaction between network queues and host applications, 775 and the means of signaling between them. This research has occurred, 776 and we as a community have learned a lot. However, we are not done. 778 We have learned that the problems of congestion, latency and buffer- 779 sizing have not gone away, and are becoming more important to many 780 users. A number of self-tuning AQM algorithms have been found that 781 offer significant advantages for deployed networks. There is also 782 renewed interest in deploying AQM and the potential of ECN. 784 In 2013, an obvious example of further research is the need to 785 consider the use of Map/Reduce applications in data centers; do we 786 need to extend our taxonomy of TCP/SCTP sessions to include not only 787 "mice" and "elephants", but also "lemmings"? "Lemmings" are flash crowds 788 of "mice" that the network inadvertently tries to signal to as if they 789 were elephant flows, resulting in head of line blocking in data 790 center applications. 792 Examples of other required research include: 794 o Research into new AQM and scheduling algorithms. 796 o Research into the use of and deployment of ECN alongside AQM. 798 o Tools for enabling AQM (and ECN) deployment and measuring the 799 performance. 801 o Methods for mitigating the impact of non-conformant and malicious 802 flows.
804 o Research to understand the implications of using new network and 805 transport methods on applications. 807 This document therefore reiterates the call of RFC 2309: we 808 need continuing research as applications develop. 810 5. IANA Considerations 812 This memo asks the IANA for no new parameters. 814 6. Security Considerations 816 While security is a very important issue, it is largely orthogonal to 817 the performance issues discussed in this memo. 819 Many deployed network devices use queueing methods that allow 820 unresponsive traffic to capture network capacity, denying access to 821 other traffic flows. This could potentially be used as a denial-of- 822 service attack. This threat could be reduced where network devices 823 deploy AQM or some form of scheduling. We note, however, that a 824 denial-of-service attack may create unresponsive traffic flows that 825 may be indistinguishable from other traffic flows (e.g. tunnels 826 carrying aggregates of short flows, high-rate isochronous 827 applications). New methods therefore may remain vulnerable, and this 828 document recommends that ongoing research should consider ways to 829 mitigate such attacks. 831 7. Privacy Considerations 833 This document, by itself, presents no new privacy issues. 835 8. Acknowledgements 837 The original recommendation in [RFC2309] was written by the End-to- 838 End Research Group, which is to say Bob Braden, Dave Clark, Jon 839 Crowcroft, Bruce Davie, Steve Deering, Deborah Estrin, Sally Floyd, 840 Van Jacobson, Greg Minshall, Craig Partridge, Larry Peterson, KK 841 Ramakrishnan, Scott Shenker, John Wroclawski, and Lixia Zhang. This 842 is an edited version of that document, with much of its text and 843 arguments unchanged. 845 The need for an updated document was agreed to in the tsvarea meeting 846 at IETF 86. This document was reviewed on the aqm@ietf.org list. 847 Comments came from Colin Perkins, Richard Scheffenegger, Dave Taht, 848 and many others.
850 Gorry Fairhurst was in part supported by the European Community under 851 its Seventh Framework Programme through the Reducing Internet 852 Transport Latency (RITE) project (ICT-317700). 854 9. References 856 9.1. Normative References 858 [Byte-pkt] 859 Briscoe, B. and J. Manner, "Byte and Packet Congestion 860 Notification", Work in Progress, draft-ietf-tsvwg-byte- 861 pkt-congest, July 2013. 863 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 864 Requirement Levels", BCP 14, RFC 2119, March 1997. 866 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 867 of Explicit Congestion Notification (ECN) to IP", RFC 868 3168, September 2001. 870 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 871 Internet Protocol", RFC 4301, December 2005. 873 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 874 Explicit Congestion Notification (ECN) Field", BCP 124, 875 RFC 4774, November 2006. 877 [RFC5405] Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines 878 for Application Designers", BCP 145, RFC 5405, November 879 2008. 881 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 882 Control", RFC 5681, September 2009. 884 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 885 Notification", RFC 6040, November 2010. 887 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 888 and K. Carlberg, "Explicit Congestion Notification (ECN) 889 for RTP over UDP", RFC 6679, August 2012. 891 9.2. Informative References 893 [AQM-WG] "IETF AQM WG", . 895 [Demers90] 896 Demers, A., Keshav, S., and S. Shenker, "Analysis and 897 Simulation of a Fair Queueing Algorithm", Internetworking: 898 Research and Experience, SIGCOMM Symposium proceedings on 899 Communications architectures and protocols, 1990. 901 [Floyd91] Floyd, S., "Connections with Multiple Congested Gateways 902 in Packet-Switched Networks Part 1: One-way Traffic", 903 Computer Communications Review, October 1991.
905 [Floyd95] Floyd, S. and V. Jacobson, "Link-sharing and Resource 906 Management Models for Packet Networks", IEEE/ACM 907 Transactions on Networking, August 1995. 909 [Jacobson88] 910 Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 911 Symposium proceedings on Communications architectures and 912 protocols, August 1988. 914 [Jain94] Jain, R., Ramakrishnan, K., and D-M. Chiu, 915 "Congestion avoidance scheme for computer networks", US 916 Patent Office 5377327, December 1994. 918 [Lakshman96] 919 Lakshman, TV., Neidhardt, A., and T. Ott, "The Drop From 920 Front Strategy in TCP Over ATM and Its Interworking with 921 Other Control Features", IEEE Infocomm, 1996. 923 [Leland94] 924 Leland, W., Taqqu, M., Willinger, W., and D. Wilson, "On 925 the Self-Similar Nature of Ethernet Traffic (Extended 926 Version)", IEEE/ACM Transactions on Networking, February 927 1994. 929 [Papagiannaki] 930 Papagiannaki, K., et al., "Analysis of Point-To-Point Packet 932 Delay In an Operational Network", IEEE Infocom 2004, March 933 2004, . 935 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 936 August 1980. 938 [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 939 1981. 941 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 942 793, September 1981. 944 [RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", 945 RFC 896, January 1984. 947 [RFC0970] Nagle, J., "On packet switches with infinite storage", RFC 948 970, December 1985. 950 [RFC1122] Braden, R., "Requirements for Internet Hosts - 951 Communication Layers", STD 3, RFC 1122, October 1989. 953 [RFC1633] Braden, B., Clark, D., and S. Shenker, "Integrated 954 Services in the Internet Architecture: an Overview", RFC 955 1633, June 1994.
957 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 958 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 959 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 960 S., Wroclawski, J., and L. Zhang, "Recommendations on 961 Queue Management and Congestion Avoidance in the 962 Internet", RFC 2309, April 1998. 964 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 965 (IPv6) Specification", RFC 2460, December 1998. 967 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 968 "Definition of the Differentiated Services Field (DS 969 Field) in the IPv4 and IPv6 Headers", RFC 2474, December 970 1998. 972 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 973 and W. Weiss, "An Architecture for Differentiated 974 Services", RFC 2475, December 1998. 976 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 977 Congestion Control Protocol (DCCP)", RFC 4340, March 2006. 979 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 980 4960, September 2007. 982 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 983 Friendly Rate Control (TFRC): Protocol Specification", RFC 984 5348, September 2008. 986 [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) 987 Architecture", RFC 5559, June 2009. 989 [RFC6057] Bastian, C., Klieber, T., Livingood, J., Mills, J., and R. 990 Woundy, "Comcast's Protocol-Agnostic Congestion Management 991 System", RFC 6057, December 2010. 993 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 994 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 995 December 2012. 997 [Willinger95] 998 Willinger, W., Taqqu, M., Sherman, R., Wilson, D., and V. 999 Jacobson, "Self-Similarity Through High-Variability: 1000 Statistical Analysis of Ethernet LAN Traffic at the Source 1001 Level", SIGCOMM Symposium proceedings on Communications 1002 architectures and protocols , August 1995. 1004 Appendix A. 
Change Log 1006 Initial Version: March 2013 1008 Minor update of the algorithms that the IETF recommends SHOULD NOT 1009 require operational (especially manual) configuration or tuning. 1011 April 2013 1013 Major surgery. This draft is for discussion at IETF-87 and expected 1014 to be further updated. 1015 July 2013 1017 -00 WG Draft - Updated transport recommendations; revised deployment 1018 configuration section; numerous minor edits. 1019 Oct 2013 1021 -01 WG Draft - Updated transport recommendations; revised deployment 1022 configuration section; numerous minor edits. 1023 Jan 2014 - Feedback from WG. 1025 -02 WG Draft - Minor edits Feb 2014 - Mainly language fixes. 1027 -03 WG Draft - Minor edits Feb 2014 - Comments from David Collier- 1028 Brown and David Taht. 1030 Authors' Addresses 1032 Fred Baker (editor) 1033 Cisco Systems 1034 Santa Barbara, California 93117 1035 USA 1037 Email: fred@cisco.com 1038 Godred Fairhurst (editor) 1039 University of Aberdeen 1040 School of Engineering 1041 Fraser Noble Building 1042 Aberdeen, Scotland AB24 3UE 1043 UK 1045 Email: gorry@erg.abdn.ac.uk 1046 URI: http://www.erg.abdn.ac.uk