Internet Engineering Task Force                    Sally Floyd, Editor
INTERNET DRAFT                                                   ACIRI
draft-floyd-cong-04.txt                                      June 2000
Expires: December 2000

                     Congestion Control Principles

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

1. Introduction.

The goal of this document is to explain the need for congestion control in the Internet, and to discuss what constitutes correct congestion control. One specific goal is to illustrate the dangers of neglecting to apply proper congestion control. A second goal is to discuss the role of the IETF in standardizing new congestion control protocols.

This document draws heavily from earlier RFCs, in some cases reproducing entire sections of the text of earlier documents [RFC2309, RFC2357]. We have also borrowed heavily from earlier publications addressing the need for end-to-end congestion control [FF99].

2. Current standards on congestion control

IETF standards concerning end-to-end congestion control focus either on specific protocols (e.g., TCP [RFC2581], reliable multicast protocols [RFC2357]) or on the syntax and semantics of communications between the end nodes and routers about congestion information (e.g., Explicit Congestion Notification [RFC2481]) or desired quality-of-service (diff-serv). The role of end-to-end congestion control is also discussed in an Informational RFC on "Recommendations on Queue Management and Congestion Avoidance in the Internet" [RFC2309]. RFC 2309 recommends the deployment of active queue management mechanisms in routers, and the continuation of design efforts towards mechanisms in routers to deal with flows that are unresponsive to congestion notification. We freely borrow from RFC 2309 some of its general discussion of end-to-end congestion control.
In contrast to the RFCs discussed above, this document is a more general discussion of the principles of congestion control. One of the keys to the success of the Internet has been the congestion avoidance mechanisms of TCP. While TCP is still the dominant transport protocol in the Internet, it is not ubiquitous, and there is an increasing number of applications that, for one reason or another, choose not to use TCP. Such traffic includes not only multicast traffic, but also unicast traffic such as streaming multimedia that does not require reliability, and traffic such as DNS or routing messages that consists of short transfers deemed critical to the operation of the network. Much of this traffic does not use any form of either bandwidth reservation or end-to-end congestion control. The continued use of end-to-end congestion control by best-effort traffic is critical for maintaining the stability of the Internet.

This document also discusses the general role of the IETF in the standardization of new congestion control protocols.

Congestion control principles for differentiated services or integrated services are not addressed in this document. Some categories of integrated or differentiated services include a guarantee by the network of end-to-end bandwidth, and as such do not require end-to-end congestion control mechanisms.

3. The development of end-to-end congestion control.

3.1. Preventing congestion collapse.

The Internet protocol architecture is based on a connectionless end-to-end packet service using the IP protocol. The advantages of its connectionless design, flexibility and robustness, have been amply demonstrated. However, these advantages are not without cost: careful design is required to provide good service under heavy load.
In fact, lack of attention to the dynamics of packet forwarding can result in severe service degradation or "Internet meltdown". This phenomenon was first observed during the early growth phase of the Internet in the mid 1980s [RFC896], and is technically called "congestion collapse".

The original specification of TCP [RFC793] included window-based flow control as a means for the receiver to govern the amount of data sent by the sender. This flow control was used to prevent overflow of the receiver's data buffer space available for that connection. [RFC793] reported that segments could be lost due either to errors or to network congestion, but did not include dynamic adjustment of the flow-control window in response to congestion.

The original fix for Internet meltdown was provided by Van Jacobson. Beginning in 1986, Jacobson developed the congestion avoidance mechanisms that are now required in TCP implementations [Jacobson88, RFC2581]. These mechanisms operate in the hosts to cause TCP connections to "back off" during congestion. We say that TCP flows are "responsive" to congestion signals (i.e., dropped packets) from the network. It is these TCP congestion avoidance algorithms that prevent the congestion collapse of today's Internet.

However, that is not the end of the story. Considerable research has been done on Internet dynamics since 1988, and the Internet has grown. It has become clear that the TCP congestion avoidance mechanisms [RFC2581], while necessary and powerful, are not sufficient to provide good service in all circumstances. In addition to the development of new congestion control mechanisms [RFC2357], router-based mechanisms are in development that complement the endpoint congestion avoidance mechanisms.
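The "back off" behavior described above can be sketched in simplified form. The following is an illustrative model only, not the full algorithm of [RFC2581] (which adds slow-start, retransmission timeouts, and fast retransmit/recovery); the class and its parameter names are hypothetical:

```python
# Simplified sketch of a TCP-style "responsive" flow: the congestion
# window grows by one segment per loss-free round trip and is halved
# when the network signals congestion via a dropped packet.

class ResponsiveFlow:
    def __init__(self):
        self.cwnd = 1.0  # congestion window, in segments

    def on_rtt(self, congestion_signal):
        if congestion_signal:                     # e.g., a packet drop
            self.cwnd = max(1.0, self.cwnd / 2)   # multiplicative decrease
        else:
            self.cwnd += 1.0                      # additive increase

flow = ResponsiveFlow()
for _ in range(10):
    flow.on_rtt(congestion_signal=False)
print(flow.cwnd)   # 11.0 after ten loss-free round trips
flow.on_rtt(congestion_signal=True)
print(flow.cwnd)   # 5.5: the flow "backs off" when congestion is signaled
```

It is this halving in response to congestion, applied by every flow, that keeps the aggregate load on a congested link from growing without bound.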
A major issue that still needs to be addressed is the potential for future congestion collapse of the Internet due to flows that do not use responsible end-to-end congestion control. RFC 896 [RFC896] suggested in 1984 that gateways should detect and `squelch' misbehaving hosts: "Failure to respond to an ICMP Source Quench message, though, should be regarded as grounds for action by a gateway to disconnect a host. Detecting such failure is non-trivial but is a worthwhile area for further research." Current papers still propose that routers detect and penalize flows that are not employing acceptable end-to-end congestion control [FF99].

3.2. Fairness

In addition to a concern about congestion collapse, there is a concern about `fairness' for best-effort traffic. Because TCP "backs off" during congestion, a large number of TCP connections can share a single, congested link in such a way that bandwidth is shared reasonably equitably among similarly situated flows. The equitable sharing of bandwidth among flows depends on the fact that all flows are running compatible congestion control algorithms. For TCP, this means congestion control algorithms conformant with the current TCP specification [RFC793, RFC1122, RFC2581].

The issue of fairness among competing flows has become increasingly important for several reasons. First, using window scaling [RFC1323], individual TCPs can use high bandwidth even over high-propagation-delay paths. Second, with the growth of the web, Internet users increasingly want high-bandwidth and low-delay communications, rather than the leisurely transfer of a long file in the background. The growth of best-effort traffic that does not use TCP underscores this concern about fairness between competing best-effort traffic in times of congestion.
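The window-scaling point above can be made concrete: a sender can keep at most one window of data in flight per round-trip time, so sustaining rate B over a path with round-trip time R requires a window of roughly B*R bytes. A small sketch with illustrative numbers:

```python
# Why window scaling [RFC1323] matters on high bandwidth-delay paths:
# a sender can have at most one window of data in flight per RTT, so
# achievable rate <= window / RTT.

def window_needed(bandwidth_bps, rtt_s):
    """Bytes of window needed to fill a path of the given bandwidth and RTT."""
    return bandwidth_bps / 8 * rtt_s

# A 100 Mbps path with a 100 ms RTT needs a window of about 1.25 MB...
need = window_needed(100e6, 0.100)
print(int(need))                       # 1250000 bytes

# ...but without window scaling, TCP's 16-bit window field tops out
# at 65535 bytes, capping the achievable rate on that path:
max_unscaled = 65535
print(max_unscaled / 0.100 * 8 / 1e6)  # roughly 5.2 Mbps
```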
The popularity of the Internet has caused a proliferation in the number of TCP implementations. Some of these may fail to implement the TCP congestion avoidance mechanisms correctly because of poor implementation [RFC2525]. Others may deliberately be implemented with congestion avoidance algorithms that are more aggressive in their use of bandwidth than other TCP implementations; this would allow a vendor to claim to have a "faster TCP". The logical consequence of such implementations would be a spiral of increasingly aggressive TCP implementations, or increasingly aggressive transport protocols, leading back to the point where there is effectively no congestion avoidance and the Internet is chronically congested.

There is a well-known way to achieve more aggressive performance without even changing the transport protocol, by changing the level of granularity: open multiple connections to the same place, as has been done in the past by some Web browsers. Thus, instead of a spiral of increasingly aggressive transport protocols, we would instead have a spiral of increasingly aggressive web browsers, or increasingly aggressive applications.

This raises the issue of the appropriate granularity of a "flow", where we define a `flow' as the level of granularity appropriate for the application of both fairness and congestion control. From RFC 2309: "There are a few `natural' answers: 1) a TCP or UDP connection (source address/port, destination address/port); 2) a source/destination host pair; 3) a given source host or a given destination host. We would guess that the source/destination host pair gives the most appropriate granularity in many circumstances. The granularity of flows for congestion management is, at least in part, a policy question that needs to be addressed in the wider IETF community."
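The three `natural' granularities quoted above can be viewed as alternative aggregation keys under which packets are grouped for fairness and congestion control. A small sketch (the packet field names are hypothetical):

```python
# Three possible definitions of a "flow", following the RFC 2309 quote:
# packets sharing a key belong to the same flow.

def connection_key(pkt):   # 1) a TCP or UDP connection (the 4-tuple)
    return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])

def host_pair_key(pkt):    # 2) a source/destination host pair
    return (pkt["src"], pkt["dst"])

def dest_host_key(pkt):    # 3) a given destination host
    return pkt["dst"]

pkts = [
    {"src": "10.0.0.1", "sport": 1025, "dst": "10.0.0.2", "dport": 80},
    {"src": "10.0.0.1", "sport": 1026, "dst": "10.0.0.2", "dport": 80},
]
# Two parallel connections, but a single flow at host-pair granularity:
print(len({connection_key(p) for p in pkts}))  # 2
print(len({host_pair_key(p) for p in pkts}))   # 1
```

At host-pair granularity, a browser opening many parallel connections gains no congestion-control advantage over a single connection, which is one motivation for that choice of granularity.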
Again borrowing from RFC 2309, we use the term "TCP-compatible" for a flow that behaves under congestion like a flow produced by a conformant TCP. A TCP-compatible flow is responsive to congestion notification, and in steady-state uses no more bandwidth than a conformant TCP running under comparable conditions (drop rate, RTT, MTU, etc.).

It is convenient to divide flows into three classes: (1) TCP-compatible flows, (2) unresponsive flows, i.e., flows that do not slow down when congestion occurs, and (3) flows that are responsive but are not TCP-compatible. The last two classes contain more aggressive flows that pose significant threats to Internet performance, as we discuss below.

In addition to steady-state fairness, the fairness of the initial slow-start is also a concern. One concern is the transient effect on other flows of a flow with an overly aggressive slow-start procedure. Slow-start performance is particularly important for the many flows that are short-lived and have only a small amount of data to transfer.

3.3. Optimizing performance regarding throughput, delay, and loss.

In addition to the prevention of congestion collapse and concerns about fairness, a third reason for a flow to use end-to-end congestion control can be to optimize its own performance regarding throughput, delay, and loss. In some circumstances, for example in environments of high statistical multiplexing, the delay and loss rate experienced by a flow are largely independent of its own sending rate. However, in environments with lower levels of statistical multiplexing or with per-flow scheduling, the delay and loss rate experienced by a flow are in part a function of the flow's own sending rate. Thus, a flow can use end-to-end congestion control to limit the delay or loss experienced by its own packets.
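The notion above of using "no more bandwidth than a conformant TCP running under comparable conditions" is often made quantitative with a rough steady-state model of TCP throughput, the widely cited formula of Mathis et al. (a deterministic approximation, quoted here for illustration rather than as part of this document's recommendations):

```latex
% Rough steady-state throughput T of a conformant TCP, in bytes per
% second, for segment size MSS, round-trip time R, and packet drop
% rate p:
T \approx \frac{\mathit{MSS}}{R}\,\sqrt{\frac{3}{2p}}
```

A flow whose steady-state sending rate stays at or below this bound for the path's measured R and p is, in this rough sense, TCP-compatible.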
We would note, however, that in an environment like the current best-effort Internet, concerns regarding congestion collapse and fairness with competing flows limit the range of congestion control behaviors available to a flow.

4. The role of the standards process

The standardization of a transport protocol includes not only standardization of aspects of the protocol that could affect interoperability (e.g., information exchanged by the end-nodes), but also standardization of mechanisms deemed critical to performance (e.g., in TCP, reduction of the congestion window in response to a packet drop). At the same time, implementation-specific details and other aspects of the transport protocol that do not affect interoperability and do not significantly interfere with performance do not require standardization. Areas of TCP that do not require standardization include the details of TCP's Fast Recovery procedure after a Fast Retransmit [RFC2582]. The appendix uses examples from TCP to discuss in more detail the role of the standards process in the development of congestion control.

4.1. The development of new transport protocols.

In addition to addressing the danger of congestion collapse, the standardization process for new transport protocols takes care to avoid a congestion control `arms race' among competing protocols. As an example, in RFC 2357 [RFC2357] the TSV Area Directors and their Directorate outline criteria for the publication as RFCs of Internet-Drafts on reliable multicast transport protocols. From [RFC2357]: "A particular concern for the IETF is the impact of reliable multicast traffic on other traffic in the Internet in times of congestion, in particular the effect of reliable multicast traffic on competing TCP traffic....
The challenge to the IETF is to encourage research and implementations of reliable multicast, and to enable the needs of applications for reliable multicast to be met as expeditiously as possible, while at the same time protecting the Internet from the congestion disaster or collapse that could result from the widespread use of applications with inappropriate reliable multicast mechanisms."

The list of technical criteria that must be addressed by RFCs on new reliable multicast transport protocols includes the following: "Is there a congestion control mechanism? How well does it perform? When does it fail? Note that congestion control mechanisms that operate on the network more aggressively than TCP will face a great burden of proof that they don't threaten network stability."

It is reasonable to expect that these concerns about the effect of new transport protocols on competing traffic will apply not only to reliable multicast protocols, but to unreliable unicast, reliable unicast, and unreliable multicast traffic as well.

4.2. Application-level issues that affect congestion control

The specific issue of a browser opening multiple connections to the same destination has been addressed by RFC 2616 [RFC2616], which states in Section 8.1.4 that "Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy."

4.3. New developments in the standards process

The most obvious developments in the IETF that could affect the evolution of congestion control are the development of integrated and differentiated services [RFC2212, RFC2475] and of Explicit Congestion Notification (ECN) [RFC2481]. However, other less dramatic developments are likely to affect congestion control as well.
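The ECN mechanism [RFC2481] mentioned above can be sketched at the router. This is a simplified model with hypothetical thresholds; real deployments use RED-style average queue lengths rather than instantaneous ones, and later specifications define the codepoints in detail:

```python
# Simplified router behavior with ECN [RFC2481]: under moderate
# congestion, mark ECN-capable packets instead of dropping them;
# when the buffer is full there is no choice but to drop.

QUEUE_LIMIT = 100      # packets the buffer can hold
MARK_THRESHOLD = 50    # hypothetical onset of "moderate congestion"

def enqueue(queue_len, ecn_capable):
    """Return the fate of an arriving packet: 'enqueue', 'mark', or 'drop'."""
    if queue_len >= QUEUE_LIMIT:
        return "drop"                        # buffer overflow: must drop
    if queue_len >= MARK_THRESHOLD:
        return "mark" if ecn_capable else "drop"
    return "enqueue"

print(enqueue(10, True))    # enqueue
print(enqueue(60, True))    # mark: congestion signaled without loss
print(enqueue(60, False))   # drop: a non-ECN flow still sees a drop
print(enqueue(100, True))   # drop: under heavy congestion, marking cannot help
```

The last case is the point made later in this document: ECN replaces drops with marks only in times of moderate congestion, so it does not by itself remove the need for end-to-end congestion control.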
One such effort is the development of Endpoint Congestion Management [BS00], which enables multiple concurrent flows from a sender to the same receiver to share congestion control state. By allowing multiple connections to the same destination to act as one flow in terms of end-to-end congestion control, a Congestion Manager could allow individual connections slow-starting to take advantage of previous information about the congestion state of the end-to-end path. Further, the use of a Congestion Manager could remove the congestion control dangers of multiple flows being opened between the same source/destination pair, and could perhaps be used to allow a browser to open many simultaneous connections to the same destination.

5. A description of congestion collapse

This section discusses congestion collapse from undelivered packets in some detail, and shows how unresponsive flows could contribute to congestion collapse in the Internet. This section draws heavily on material from [FF99].

Informally, congestion collapse occurs when an increase in the network load results in a decrease in the useful work done by the network. As discussed in Section 3, congestion collapse was first reported in the mid 1980s [RFC896], and was largely due to TCP connections unnecessarily retransmitting packets that were either in transit or had already been received at the receiver. We call the congestion collapse that results from the unnecessary retransmission of packets classical congestion collapse. Classical congestion collapse is a stable condition that can result in throughput that is a small fraction of normal [RFC896]. Problems with classical congestion collapse have generally been corrected by the timer improvements and congestion control mechanisms in modern implementations of TCP [Jacobson88].
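Among the "timer improvements" referred to above is exponential backoff of the retransmission timer, which keeps a sender from bombarding an already-congested network with duplicates. A sketch of the backoff behavior only (illustrative; [Jacobson88] additionally specifies how the base timeout is estimated from the smoothed RTT and its variance, and the cap value here is hypothetical):

```python
# Retransmission-timer backoff, one of the fixes for classical
# congestion collapse: each successive retransmission of the same
# segment doubles the retransmission timeout, up to a cap.

def rto_after_backoffs(initial_rto, n_retransmits, max_rto=64.0):
    """Retransmission timeout (seconds) after n successive retransmits."""
    rto = initial_rto
    for _ in range(n_retransmits):
        rto = min(rto * 2, max_rto)   # exponential backoff, capped
    return rto

print(rto_after_backoffs(1.0, 0))    # 1.0
print(rto_after_backoffs(1.0, 3))    # 8.0
print(rto_after_backoffs(1.0, 10))   # 64.0 (capped)
```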
A second form of potential congestion collapse occurs due to undelivered packets. Congestion collapse from undelivered packets arises when bandwidth is wasted by delivering packets through the network that are dropped before reaching their ultimate destination. This is probably the largest unresolved danger with respect to congestion collapse in the Internet today. Different scenarios can result in different degrees of congestion collapse, in terms of the fraction of the congested links' bandwidth used for productive work. The danger of congestion collapse from undelivered packets is due primarily to the increasing deployment of open-loop applications not using end-to-end congestion control. Even more destructive would be best-effort applications that *increase* their sending rate in response to an increased packet drop rate (e.g., automatically using an increased level of FEC).

Table 1 gives the results from a scenario with congestion collapse from undelivered packets, where scarce bandwidth is wasted by packets that never reach their destination. The simulation uses a scenario with three TCP flows and one UDP flow competing over a congested 1.5 Mbps link. The access links for all nodes are 10 Mbps, except that the access link to the receiver of the UDP flow is 128 Kbps, only 9% of the bandwidth of the shared link. When the UDP source rate exceeds 128 Kbps, most of the UDP packets will be dropped at the output port to that final link.
     UDP
   Arrival      UDP       TCP      Total
    Rate      Goodput   Goodput   Goodput
   --------------------------------------
     0.7        0.7       98.5      99.2
     1.8        1.7       97.3      99.1
     2.6        2.6       96.0      98.6
     5.3        5.2       92.7      97.9
     8.8        8.4       87.1      95.5
    10.5        8.4       84.8      93.2
    13.1        8.4       81.4      89.8
    17.5        8.4       77.3      85.7
    26.3        8.4       64.5      72.8
    52.6        8.4       38.1      46.4
    58.4        8.4       32.8      41.2
    65.7        8.4       28.5      36.8
    75.1        8.4       19.7      28.1
    87.6        8.4       11.3      19.7
   105.2        8.4        3.4      11.8
   131.5        8.4        2.4      10.7

   Table 1. A simulation with three TCP flows and one UDP flow.

Table 1 shows the UDP arrival rate from the sender, the UDP goodput (defined as the bandwidth delivered to the receiver), the TCP goodput (as delivered to the TCP receivers), and the aggregate goodput on the congested 1.5 Mbps link. Each rate is given as a percentage of the bandwidth of the congested link. As the UDP source rate increases, the TCP goodput decreases roughly linearly, and the UDP goodput is nearly constant. Thus, as the UDP flow increases its offered load, its only effect is to hurt the TCP and aggregate goodput. On the congested link, the UDP flow ultimately `wastes' the bandwidth that could have been used by the TCP flow, and reduces the goodput in the network as a whole down to a small fraction of the bandwidth of the congested link.

The simulations in Table 1 illustrate both unfairness and congestion collapse. As [FF99] discusses, compatible congestion control is not the only way to provide fairness; per-flow scheduling at the congested routers is an alternative mechanism that guarantees fairness. However, as discussed in [FF99], per-flow scheduling cannot be relied upon to prevent congestion collapse.

There are only two alternatives for eliminating the danger of congestion collapse from undelivered packets.
The first alternative for preventing congestion collapse from undelivered packets is the use of effective end-to-end congestion control by the end nodes. More specifically, the requirement would be that a flow avoid a pattern of significant losses at links downstream from the first congested link on the path. (Here, we would consider any link a `congested link' if any flow is using bandwidth that would otherwise be used by other traffic on the link.) Given that an end-node is generally unable to distinguish between a path with one congested link and a path with multiple congested links, the most reliable way for a flow to avoid a pattern of significant losses at a downstream congested link is for the flow to use end-to-end congestion control, and reduce its sending rate in the presence of loss.

A second alternative for preventing congestion collapse from undelivered packets would be a guarantee by the network that packets accepted at a congested link in the network will be delivered all the way to the receiver [RFC2212, RFC2475]. We note that the choice between the first alternative of end-to-end congestion control and the second alternative of end-to-end bandwidth guarantees does not have to be an either/or decision; congestion collapse can be prevented by the use of effective end-to-end congestion control by some of the traffic, and the use of end-to-end bandwidth guarantees from the network for the rest of the traffic.

6. Forms of end-to-end congestion control

This document has discussed concerns about congestion collapse and about fairness with TCP for new forms of congestion control.
This does not mean, however, that concerns about congestion collapse and fairness with TCP necessitate that all best-effort traffic deploy congestion control based on TCP's Additive-Increase Multiplicative-Decrease (AIMD) algorithm of halving the sending rate in response to each packet drop. This section separately discusses the implications of these two concerns of congestion collapse and fairness with TCP.

6.1. End-to-end congestion control for avoiding congestion collapse.

The avoidance of congestion collapse from undelivered packets requires that flows avoid a scenario of a high sending rate, multiple congested links, and a persistent high packet drop rate at the downstream link. Because congestion collapse from undelivered packets consists of packets that waste valuable bandwidth only to be dropped downstream, this form of congestion collapse is not possible in an environment where each flow traverses only one congested link, or where only a small number of packets are dropped at links downstream of the first congested link. Thus, any form of congestion control that successfully avoids a high sending rate in the presence of a high packet drop rate should be sufficient to avoid congestion collapse from undelivered packets.

We would note that the addition of Explicit Congestion Notification (ECN) to the IP architecture would not, in and of itself, remove the danger of congestion collapse for best-effort traffic. ECN allows routers to set a bit in packet headers as an indication of congestion to the end-nodes, rather than being forced to rely on packet drops to indicate congestion. However, with ECN, packet-marking would replace packet-dropping only in times of moderate congestion. In particular, when congestion is heavy, and a router's buffers overflow, the router has no choice but to drop arriving packets.

6.2.
End-to-end congestion control for fairness with TCP.

The concern expressed in [RFC2357] about fairness with TCP places a significant though not crippling constraint on the range of viable end-to-end congestion control mechanisms for best-effort traffic. An environment with per-flow scheduling at all congested links would isolate flows from each other, and eliminate the need for congestion control mechanisms to be TCP-compatible. An environment with differentiated services, where flows marked as belonging to a certain diff-serv class would be scheduled in isolation from best-effort traffic, could allow the emergence of an entire diff-serv class of traffic where congestion control was not required to be TCP-compatible. Similarly, a pricing-controlled environment, or a diff-serv class with its own pricing paradigm, could supersede the concern about fairness with TCP. However, for the current Internet environment, where other best-effort traffic could compete in a FIFO queue with TCP traffic, the absence of fairness with TCP could lead to one flow `starving out' another flow in a time of high congestion, as was illustrated in Table 1 above.

However, the list of TCP-compatible congestion control procedures is not limited to AIMD with the same increase/decrease parameters as TCP. Other TCP-compatible congestion control procedures include rate-based variants of AIMD; AIMD with different sets of increase/decrease parameters that give the same steady-state behavior; equation-based congestion control, where the sender adjusts its sending rate in response to information about the long-term packet drop rate; layered multicast, where receivers subscribe and unsubscribe from layered multicast groups; and possibly other forms that we have not yet begun to consider.

7.
Acknowledgements

Much of this document draws directly on previous RFCs addressing end-to-end congestion control. It attempts to summarize ideas that have been discussed for many years by many people. In particular, acknowledgement is due to the members of the End-to-End Research Group, the Reliable Multicast Research Group, and the Transport Area Directorate. This document has also benefited from discussion and feedback from the Transport Area Working Group. Particular thanks are due to Mark Allman for feedback on an earlier version of this document.

8. References

[BS00] Hari Balakrishnan and Srinivasan Seshan, The Congestion Manager, draft-balakrishnan-cm-02.txt, internet draft, work in progress, March 2000.

[DMKM00] S. Dawkins, G. Montenegro, M. Kojo, and V. Magret, End-to-end Performance Implications of Slow Links, draft-ietf-pilc-slow-03.txt, internet draft, work in progress, March 2000.

[FF99] S. Floyd and K. Fall, Promoting the Use of End-to-End Congestion Control in the Internet, IEEE/ACM Transactions on Networking, August 1999. URL "http://www-nrg.ee.lbl.gov/floyd/end2end-paper.html".

[HPF00] M. Handley, J. Padhye, and S. Floyd, TCP Congestion Window Validation, draft-handley-tcp-cwv-02.txt, internet draft, work in progress, March 2000.

[Jacobson88] V. Jacobson, Congestion Avoidance and Control, ACM SIGCOMM '88, August 1988.

[RFC793] J. Postel, Transmission Control Protocol, RFC 793, September 1981.

[RFC896] J. Nagle, Congestion Control in IP/TCP Internetworks, RFC 896, January 1984.

[RFC1122] R. Braden, Ed., Requirements for Internet Hosts -- Communication Layers, STD 3, RFC 1122, October 1989.

[RFC1323] V. Jacobson, R. Braden, and D. Borman, TCP Extensions for High Performance, RFC 1323, May 1992.

[RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997.
[RFC2212] Shenker, S., Partridge, C., and Guerin, R., Specification
of Guaranteed Quality of Service, RFC 2212, September 1997.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge,
C., Peterson, L., Ramakrishnan, K.K., Shenker, S., Wroclawski, J.,
and Zhang, L., Recommendations on Queue Management and Congestion
Avoidance in the Internet, RFC 2309, April 1998.

[RFC2357] Mankin, A., Romanow, A., Bradner, S., and Paxson, V., IETF
Criteria for Evaluating Reliable Multicast Transport and Application
Protocols, RFC 2357, June 1998.

[RFC2414] Allman, M., Floyd, S., and Partridge, C., Increasing TCP's
Initial Window, RFC 2414, Experimental, September 1998.

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and Weiss, W., An Architecture for Differentiated Services, RFC
2475, December 1998.

[RFC2481] Ramakrishnan, K. and Floyd, S., A Proposal to add Explicit
Congestion Notification (ECN) to IP, RFC 2481, January 1999.

[RFC2525] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner,
J., Heavens, I., Lahey, K., Semke, J., and Volz, B., Known TCP
Implementation Problems, RFC 2525, March 1999.

[RFC2581] Allman, M., Paxson, V., and Stevens, W., TCP Congestion
Control, RFC 2581, April 1999.

[RFC2582] Floyd, S. and Henderson, T., The NewReno Modification to
TCP's Fast Recovery Algorithm, RFC 2582, April 1999.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and Berners-Lee, T., Hypertext Transfer
Protocol -- HTTP/1.1, RFC 2616, June 1999.

[SCWA99] Savage, S., Cardwell, N., Wetherall, D., and Anderson, T.,
TCP Congestion Control with a Misbehaving Receiver, ACM Computer
Communications Review, October 1999.

[TCPB98] Hari Balakrishnan, Venkata N. Padmanabhan, Srinivasan
Seshan, Mark Stemm, and Randy H.
Katz, TCP Behavior of a Busy Internet Server: Analysis and
Improvements, IEEE Infocom, March 1998. Available from:
"http://www.cs.berkeley.edu/~hari/papers/infocom98.ps.gz".

[TCPF98] Lin, D. and Kung, H.T., TCP Fast Recovery Strategies:
Analysis and Improvements, IEEE Infocom, March 1998. Available from:
"http://www.eecs.harvard.edu/networking/papers/infocom-tcp-final-198.pdf".

9. TCP-Specific Issues

In this section we discuss some of the particulars of TCP congestion
control, to illustrate one realization of the congestion control
principles, including some of the details that arise when
incorporating them into a production transport protocol.

9.1. Slow-start.

The TCP sender cannot open a new connection by sending a large burst
of data (e.g., a receiver's advertised window) all at once. The TCP
sender is limited by a small initial value for the congestion
window. During slow-start, the TCP sender can increase its sending
rate by at most a factor of two in one roundtrip time. Slow-start
ends when congestion is detected, or when the sender's congestion
window is greater than the slow-start threshold ssthresh.

An issue that potentially affects global congestion control, and
therefore has been explicitly addressed in the standards process, is
the increase in the value of the initial window [RFC2414,RFC2581].

Issues that have not been addressed in the standards process, and
are generally considered not to require standardization, include the
use (or non-use) of rate-based pacing, and mechanisms for ending
slow-start early, before the congestion window reaches ssthresh.
Such mechanisms result in slow-start behavior that is as
conservative as, or more conservative than, standard TCP.

9.2. Additive Increase, Multiplicative Decrease.
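The window rules discussed in this section, together with the
slow-start doubling of Section 9.1, can be summarized in a short
sketch. This is an illustrative simplification, not the
specification: it counts the window in whole packets, whereas real
TCPs count bytes, cap the window by the receiver's advertised
window, and follow the precise rules of [RFC2581].

```python
def on_ack(cwnd, ssthresh):
    """Congestion window growth when an acknowledgement arrives."""
    if cwnd < ssthresh:
        # Slow-start: one more packet per ACK, so the window
        # can at most double in one roundtrip time.
        return cwnd + 1
    # Congestion avoidance (additive increase): roughly one
    # additional packet per roundtrip time.
    return cwnd + 1.0 / cwnd

def on_congestion(cwnd):
    """Multiplicative decrease: halve the window, floor of one packet."""
    return max(cwnd / 2.0, 1.0)
```

For example, starting from a window of one packet with ssthresh set
to eight, successive roundtrip times take the window through 2, 4,
and 8 packets, after which growth becomes linear until a congestion
indication halves the window.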
In the absence of congestion, the TCP sender increases its
congestion window by at most one packet per roundtrip time. In
response to a congestion indication, the TCP sender decreases its
congestion window by half. (More precisely, the new congestion
window is half of the minimum of the congestion window and the
receiver's advertised window.)

An issue that potentially affects global congestion control, and
therefore would be likely to be explicitly addressed in the
standards process, would be a proposed addition of congestion
control for the return stream of `pure acks'.

An issue that has not been addressed in the standards process, and
is generally not considered to require standardization, would be a
change to the congestion window to apply as an upper bound on the
number of bytes presumed to be in the pipe, instead of applying as a
sliding window starting from the cumulative acknowledgement.
(Clearly, the receiver's advertised window applies as a sliding
window starting from the cumulative acknowledgement field, because
packets received above the cumulative acknowledgement are held in
TCP's receive buffer, and have not been delivered to the
application. However, the congestion window applies to the number of
packets outstanding in the pipe, and does not necessarily have to
include packets that have been received out-of-order by the TCP
receiver.)

9.3. Retransmit timers.

The TCP sender sets a retransmit timer to infer that a packet has
been dropped in the network. When the retransmit timer expires, the
sender infers that a packet has been lost, sets ssthresh to half of
the current window, and goes into slow-start, retransmitting the
lost packet.
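This reaction to a timer expiry, together with the timer backoff
described next, might be sketched as follows. The function and
variable names are hypothetical, and the sketch is simplified:
[RFC2581] specifies the precise rules, including how ssthresh is
computed from the amount of outstanding data.

```python
def on_retransmit_timeout(cwnd, rto, packet_was_retransmitted):
    """Sketch of a TCP sender's reaction when the retransmit timer expires.

    Returns (new_cwnd, new_ssthresh, new_rto), with windows counted
    in packets for simplicity.
    """
    # Set ssthresh to half of the current window (floor of two packets) ...
    ssthresh = max(cwnd // 2, 2)
    # ... and drop back into slow-start from a one-packet window,
    # retransmitting the lost packet (retransmission itself omitted here).
    cwnd = 1
    # If the timer expired for a packet that had already been
    # retransmitted, also back off the timer, doubling the next
    # retransmit timeout interval.
    if packet_was_retransmitted:
        rto = 2 * rto
    return cwnd, ssthresh, rto
```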
If the retransmit timer expires because no acknowledgement has been
received for a retransmitted packet, the retransmit timer is also
"backed off", doubling the value of the next retransmit timeout
interval.

An issue that potentially affects global congestion control, and
therefore would be likely to be explicitly addressed in the
standards process, might be a modified mechanism for setting the
retransmit timer that could significantly increase the number of
retransmit timers that expire prematurely, when the acknowledgement
has not yet arrived at the sender but in fact no packets have been
dropped. This could be of concern to the Internet standards process
because retransmit timers that expire prematurely could lead to an
increase in the number of packets unnecessarily transmitted on a
congested link.

9.4. Fast Retransmit and Fast Recovery.

After seeing three duplicate acknowledgements, the TCP sender infers
a packet loss. The TCP sender sets ssthresh to half of the current
window, reduces the congestion window to at most half of the
previous window, and retransmits the lost packet.

An issue that potentially affects global congestion control, and
therefore would be likely to be explicitly addressed in the
standards process, might be a proposal (if there were one) for
inferring a lost packet after only one or two duplicate
acknowledgements. If poorly designed, such a proposal could lead to
an increase in the number of packets unnecessarily transmitted on a
congested path.

An issue that has not been addressed in the standards process, and
would not be expected to require standardization, would be a
proposal to send a "new" or presumed-lost packet in response to a
duplicate or partial acknowledgement, if allowed by the congestion
window.
An example of this would be sending a new packet in response to a
single duplicate acknowledgement, to keep the `ack clock' going in
case no further acknowledgements arrive. Such a proposal is an
example of a beneficial change that does not involve
interoperability and does not affect global congestion control, and
that therefore could be implemented by vendors without requiring the
intervention of the IETF standards process. (This issue has in fact
been addressed in [DMKM00], which suggests that "researchers may
wish to experiment with injecting new traffic into the network when
duplicate acknowledgements are being received, as described in
[TCPB98] and [TCPF98].")

9.5. Other aspects of TCP congestion control.

Other aspects of TCP congestion control that have not been discussed
in the sections above include TCP's recovery from an idle or
application-limited period [HPF00].

10. Security Considerations

This document has been about the risks associated with congestion
control, or with the absence of congestion control. Section 3.2
discusses the potential for unfairness if competing flows don't use
compatible congestion control mechanisms, and Section 5 considers
the dangers of congestion collapse if flows don't use end-to-end
congestion control.

Because this document does not propose any specific congestion
control mechanisms, it is not necessary to present specific security
measures associated with congestion control. However, we would note
that there is a range of security considerations associated with
congestion control that should be considered in IETF documents.

For example, individual congestion control mechanisms should be as
robust as possible against attempts by individual end-nodes to
subvert end-to-end congestion control [SCWA99].
This is a particular concern in multicast congestion control,
because of the far-reaching distribution of the traffic and the
greater opportunities for individual receivers to fail to report
congestion.

RFC 2309 also discusses the potential dangers to the Internet of
unresponsive flows, that is, flows that don't reduce their sending
rate in the presence of congestion, and describes the need for
mechanisms in the network to deal with flows that are unresponsive
to congestion notification. We would note that there is still a need
for research, engineering, measurement, and deployment in these
areas.

Because the Internet aggregates very large numbers of flows, the
risk to the whole infrastructure from subverting the congestion
control of a few individual flows is limited. Rather, the risk to
the infrastructure would come from the widespread deployment of many
end-nodes subverting end-to-end congestion control.

AUTHORS' ADDRESSES

Sally Floyd
AT&T Center for Internet Research at ICSI (ACIRI)
Phone: +1 (510) 642-4274 x189
Email: floyd@aciri.org
URL: http://www.aciri.org/floyd/

This draft was created in June 2000.
It expires December 2000.