idnits 2.17.1 draft-kksjf-ecn-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-19) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 18 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 19 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 170: '...ms followed at the end-systems MUST be...' RFC 2119 keyword, line 198: '...cket, the router MAY instead set the C...' RFC 2119 keyword, line 530: '...capsulating ('outside') header MUST be...' RFC 2119 keyword, line 532: '... the outside header SHOULD be a 1....' RFC 2119 keyword, line 536: '...e outside header MUST be ORed with the...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 1998) is 9410 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 2001' is mentioned on line 267, but not defined ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581) == Unused Reference: 'Floyd97' is defined on line 661, but no explicit reference was found in the text == Unused Reference: 'FRED' is defined on line 673, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ97' -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLT98' -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN' -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98' -- Possible downref: Non-RFC (?) normative reference: ref. 'K98' -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90' ** Downref: Normative reference to an Informational RFC: RFC 1141 -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96' ** Obsolete normative reference: RFC 2001 (Obsoleted by RFC 2581) ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567) -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90' Summary: 14 errors (**), 0 flaws (~~), 6 warnings (==), 15 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force K. K. Ramakrishnan 3 INTERNET DRAFT AT&T Labs Research 4 draft-kksjf-ecn-01.txt Sally Floyd 5 LBNL 6 July 1998 7 Expires: January 1999 9 A Proposal to add Explicit Congestion Notification (ECN) to IP 11 Status of this Memo 13 This document is an Internet-Draft. Internet-Drafts are working 14 documents of the Internet Engineering Task Force (IETF), its areas, 15 and its working groups. Note that other groups may also distribute 16 working documents as Internet-Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six months 19 and may be updated, replaced, or obsoleted by other documents at any 20 time. It is inappropriate to use Internet-Drafts as reference 21 material or to cite them other than as "work in progress." 23 To view the entire list of current Internet-Drafts, please check the 24 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 25 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern 26 Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific 27 Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). 29 Abstract 31 This note describes a proposed addition of ECN (Explicit Congestion 32 Notification) to IP. TCP is currently the dominant transport 33 protocol used in the Internet. We begin by describing TCP's use of 34 packet drops as an indication of congestion. Next we argue that with 35 the addition of active queue management (e.g., RED) to the Internet 36 infrastructure, where routers detect congestion before the queue 37 overflows, routers are no longer limited to packet drops as an 38 indication of congestion, but could instead set a Congestion 39 Experienced (CE) bit in the packet header, for ECN-capable transport 40 protocols.
We describe when the CE bit would be set in the routers, 41 and describe what modifications would be needed to TCP to make it 42 ECN-capable. Modifications to other transport protocols (e.g., 43 unreliable unicast or multicast, reliable multicast, other reliable 44 unicast transport protocols) could be considered as those protocols 45 are developed and advance through the standards process. 47 1. Introduction 49 TCP's congestion control and avoidance algorithms are based on the 50 notion that the network is a black-box [Jacobson88, Jacobson90]. The 51 network's state of congestion or otherwise is determined by end- 52 systems probing for the network state, by gradually increasing the 53 load on the network (by increasing the window of packets that are 54 outstanding in the network) until the network becomes congested and a 55 packet is lost. Treating the network as a "black-box" and treating 56 loss as an indication of congestion in the network is appropriate for 57 pure best-effort data carried by TCP which has little or no 58 sensitivity to delay or loss of individual packets. In addition, 59 TCP's congestion management algorithms have techniques built-in (such 60 as Fast Retransmit and Fast Recovery) to minimize the impact of 61 losses from a throughput perspective. 63 However, these mechanisms are not intended to help applications that 64 are in fact sensitive to the delay or loss of one or more individual 65 packets. Interactive traffic such as telnet, web-browsing, and 66 transfer of audio and video data can be sensitive to packet losses 67 (using an unreliable data delivery transport such as UDP) or to the 68 increased latency of the packet caused by the need to retransmit the 69 packet after a loss (for reliable data delivery such as TCP). 71 Since TCP determines the appropriate congestion window to use by 72 gradually increasing the window size until it experiences a dropped 73 packet, this causes the queues at the bottleneck router to build up. 
74 With most packet drop policies at the router that are not sensitive 75 to the load placed by each individual flow, this means that some of 76 the packets of latency-sensitive flows are going to be dropped. 77 Active queue management mechanisms detect congestion before the queue 78 overflows, and provide an indication of this congestion to the end 79 nodes. The advantages of active queue management are discussed in 80 RFC 2309 [RFC2309]. Active queue management avoids some of the bad 81 properties of dropping on queue overflow, including the undesirable 82 synchronization of loss across multiple flows. More importantly, 83 active queue management means that transport protocols with 84 congestion control (e.g., TCP) do not have to rely on buffer overflow 85 as the only indication of congestion. This can reduce unnecessary 86 queueing delay for all traffic sharing that queue. 88 Active queue management mechanisms may use one of several methods for 89 indicating congestion to end-nodes. One is to use packet drops, as is 90 currently done. However, active queue management allows the router to 91 separate policies of queueing or dropping packets from the policies 92 for indicating congestion. Thus, active queue management allows 93 routers to use the Congestion Experienced (CE) bit in a packet header 94 as an indication of congestion, instead of relying solely on packet 95 drops. 97 2. Assumptions and General Principles 99 In this section, we describe some of the important design principles 100 and assumptions that guided the design choices in this proposal. 102 (1) Congestion may persist over different time-scales. The time 103 scales that we are concerned with are congestion events that may last 104 longer than a round-trip time. 105 (2) The number of packets in an individual flow (e.g., TCP connection 106 or an exchange using UDP) may range from a small number of packets to 107 quite a large number. 
We are interested in managing the congestion 108 caused by flows that send enough packets so that they are still 109 active when network feedback reaches them. 110 (3) New mechanisms for congestion control and avoidance need to co- 111 exist and cooperate with existing mechanisms for congestion control. 112 In particular, new mechanisms have to co-exist with TCP's current 113 methods of adapting to congestion and with routers' current practice 114 of dropping packets in periods of congestion. 115 (4) Because ECN is likely to be adopted gradually, accommodating 116 migration is essential. Some routers may still only drop packets to 117 indicate congestion, and some end-systems may not be ECN-capable. The 118 most viable strategy is one that accommodates incremental deployment 119 without having to resort to "islands" of ECN-capable and non-ECN- 120 capable environments. 121 (5) Asymmetric routing is likely to be a normal occurrence in the 122 Internet. The path (sequence of links and routers) followed by data 123 packets may be different from the path followed by the acknowledgment 124 packets in the reverse direction. 125 (6) Routers process the "regular" headers in IP packets more 126 efficiently than they process the header information in IP options. 127 This suggests keeping congestion experienced information in the 128 regular headers of an IP packet. 129 (7) It must be recognized that not all end-systems will cooperate in 130 mechanisms for congestion control. However, new mechanisms shouldn't 131 make it easier for TCP applications to disable TCP congestion 132 control. The benefit of lying about participating in new mechanisms 133 such as ECN-capability should be small. 135 3. Random Early Detection (RED) 137 Random Early Detection (RED) is a mechanism for active queue 138 management that has been proposed to detect incipient congestion 139 [FJ93], and is currently being deployed in the Internet backbone 140 [RFC2309]. 
Although RED is meant to be a general mechanism using one 141 of several alternatives for congestion indication, in the current 142 environment of the Internet RED is restricted to using packet drops 143 as a mechanism for congestion indication. RED drops packets based on 144 the average queue length exceeding a threshold, rather than only when 145 the queue overflows. However, when RED drops packets before the 146 queue actually overflows, RED is not forced by memory limitations to 147 discard the packet. 149 RED could set a Congestion Experienced (CE) bit in the packet header 150 instead of dropping the packet, if such a bit was provided in the IP 151 header and understood by the transport protocol. The use of the CE 152 bit would allow the receiver(s) to receive the packet, avoiding the 153 potential for excessive delays due to retransmissions after packet 154 losses. We use the term 'CE packet' to denote a packet that has the 155 CE bit set. 157 4. Explicit Congestion Notification in IP 159 We propose that the Internet provide a congestion indication for 160 incipient congestion (as in RED and earlier work [RJ90]) where the 161 notification can sometimes be through marking packets rather than 162 dropping them. This would require an ECN field in the IP header with 163 two bits. The ECN-Capable Transport (ECT) bit would be set by the 164 data sender to indicate that the end-points of the transport protocol 165 are ECN-capable. The CE bit would be set by the router to indicate 166 congestion to the end nodes. Routers that have a packet arriving at 167 a full queue would drop the packet, just as they do now. 169 Upon the receipt by an ECN-Capable transport of a single CE packet, 170 the congestion control algorithms followed at the end-systems MUST be 171 essentially the same as the congestion control response to a *single* 172 dropped packet. 
For example, for TCP the source TCP halves its 173 congestion window "cwnd" in response to an ECN indication received by 174 the data receiver. 176 One reason for requiring that the congestion-control response to the 177 CE packet be essentially the same as the response to a dropped packet 178 is to accommodate the incremental deployment of ECN in both end- 179 systems and in routers. Some routers may drop ECN-Capable packets 180 (e.g., using the same RED policies for congestion detection) while 181 other routers set the CE bit, for equivalent levels of congestion. 182 Similarly, a router might drop a non-ECN-Capable packet but set the 183 CE bit in an ECN-Capable packet, for equivalent levels of congestion. 184 Different congestion control responses to a CE bit indication and to 185 a packet drop could result in unfair treatment for different flows. 187 An additional requirement is that the end-systems should react to 188 congestion at most once per window of data (i.e., at most once per 189 roundtrip time), to avoid reacting multiple times to multiple 190 indications of congestion within a roundtrip time. 192 For a router, the CE bit of an ECN-Capable packet should only be set 193 if the router would otherwise have dropped the packet as an 194 indication of congestion to the end nodes. When the router's buffer 195 is not yet full and the router is prepared to drop a packet to inform 196 end nodes of incipient congestion, the router should first check to 197 see if the ECT bit is set in that packet's IP header. If so, then 198 instead of dropping the packet, the router MAY instead set the CE bit 199 in the IP header. 201 An environment where all end nodes were ECN-Capable could allow new 202 criteria to be developed for setting the CE bit, and new congestion 203 control mechanisms for end-node reaction to CE packets. However, 204 this is a research issue, and as such is not addressed in this 205 document. 
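The router behavior described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not part of the proposal: the names `Packet`, `router_action`, `queue_full`, and `aqm_signals_congestion` are hypothetical, the active queue management decision (e.g., RED's average-queue test) is reduced to a boolean input, and the router is assumed to always exercise the MAY by marking ECN-Capable packets instead of dropping them.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical stand-in for the two-bit ECN field in the IP header.
    ect: bool = False  # ECN-Capable Transport bit, set by the data sender
    ce: bool = False   # Congestion Experienced bit, set by routers


def router_action(pkt: Packet, queue_full: bool,
                  aqm_signals_congestion: bool) -> str:
    """Return "drop" or "forward"; may set pkt.ce as a side effect.

    The CE bit is never cleared here: a packet that arrives with CE
    already set keeps it.
    """
    if queue_full:
        # A packet arriving at a full queue is dropped, just as today.
        return "drop"
    if aqm_signals_congestion:
        # The router would otherwise drop this packet to signal
        # incipient congestion, so check the ECT bit first.
        if pkt.ect:
            pkt.ce = True      # mark instead of dropping
            return "forward"
        return "drop"          # non-ECN-Capable packet: drop as before
    return "forward"
```

For example, `router_action(Packet(ect=True), False, True)` returns "forward" and sets the CE bit on that packet, while the same call with `Packet(ect=False)` returns "drop".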
207 When a CE packet is received by a router, the CE bit is left 208 unchanged, and the packet transmitted as usual. When severe 209 congestion has occurred and the router's queue is full, then the 210 router has no choice but to drop some packet when a new packet 211 arrives. We anticipate that such packet losses will become 212 relatively infrequent when a majority of end-systems become ECN- 213 Capable and participate in TCP or other compatible congestion control 214 mechanisms. In an adequately-provisioned network in such an ECN- 215 Capable environment, packet losses should occur primarily during 216 transients or in the presence of non-cooperating sources. 218 We expect that routers will set the CE bit in response to incipient 219 congestion as indicated by the average queue size, using the RED 220 algorithms suggested in [FJ93, RFC2309]. To the best of our 221 knowledge, this is the only proposal currently under discussion in 222 the IETF for routers to drop packets proactively, before the buffer 223 overflows. However, this document does not attempt to specify a 224 particular mechanism for active queue management, leaving that 225 endeavor, if needed, to other areas of the IETF. While ECN is 226 inextricably tied up with active queue management at the router, the 227 reverse does not hold; active queue management mechanisms have been 228 developed and deployed independently from ECN, using packet drops as 229 indications of congestion in the absence of ECN in the IP 230 architecture. 232 5. Support from the Transport Protocol 234 ECN requires support from the transport protocol, in addition to the 235 functionality given by the ECN field in the IP packet header. The 236 transport protocol might require negotiation between the endpoints 237 during setup to determine that all of the endpoints are ECN-capable, 238 so that the sender can set the ECT bit in transmitted packets. 
239 Second, the transport protocol must be capable of reacting 240 appropriately to the receipt of CE packets. This reaction could be 241 in the form of the data receiver informing the data sender of the 242 received CE packet (e.g., TCP), of the data receiver unsubscribing to 243 a layered multicast group (e.g., RLM [MJV96]), or of some other 244 action that ultimately reduces the arrival rate of that flow to that 245 receiver. 247 This document only addresses the addition of ECN Capability to TCP, 248 leaving issues of ECN and other transport protocols to further 249 research. For TCP, ECN requires three new mechanisms: negotiation 250 between the endpoints during setup to determine if they are both 251 ECN-capable; an ECN-Echo flag in the TCP header so that the data 252 receiver can inform the data sender when a CE packet has been 253 received; and a Congestion Window Reduced (CWR) flag in the TCP 254 header so that the data sender can inform the data receiver that the 255 congestion window has been reduced. The support required from other 256 transport protocols is likely to be different, particularly for 257 unreliable or reliable multicast transport protocols, and will have 258 to be determined as other transport protocols are brought to the IETF 259 for standardization. 261 5.1. TCP 263 The following sections describe in detail the proposed use of ECN in 264 TCP. This proposal is described in essentially the same form in 265 [Floyd94]. We assume that the source TCP uses the standard congestion 266 control algorithms of Slow-start, Fast Retransmit and Fast Recovery 267 [RFC2001]. 269 5.1.1. TCP Initialization 271 In the TCP connection setup phase, the source and destination TCPs 272 exchange information about their desire and/or capability to use ECN. 273 As a result of the negotiation, the TCP sender sets the ECT bit in 274 the IP header to indicate to the network that the transport is 275 capable and willing to participate in ECN for this packet.
This will 276 indicate to the routers that they may mark this packet with the CE 277 bit, if they would like to use that as a method of congestion 278 notification. If the TCP connection does not wish to use ECN 279 notification for a particular packet, the sending TCP sets the ECT 280 bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE 281 bit in the received packet. 283 The TCP mechanism for negotiating ECN-Capability uses the ECN-Echo 284 flag in the TCP header. (This was called the ECN Notify flag in some 285 earlier documents.) Bit 9 in the Reserved field of the TCP header is 286 assigned to the ECN-Echo flag. 288 When a node sends a TCP SYN packet, it may set the ECN-Echo flag in 289 the TCP header. For a SYN packet, the ECN-Echo flag is defined as an 290 indication that the sending TCP is ECN-Capable, rather than as a 291 return indication of congestion. More precisely, a SYN packet with 292 the ECN-Echo flag set indicates that that sending TCP implementation 293 will respond to incoming data packets that have the CE bit set in the 294 IP header by setting the ECN-Echo flag in outgoing TCP 295 Acknowledgement (ACK) packets. 297 Similarly, for a SYN-ACK packet, the ECN-Echo flag in the TCP header 298 is defined as an indication that the TCP transmitting the SYN-ACK 299 packet is ECN-Capable. 301 5.1.2. The TCP Sender 303 For a TCP connection using ECN, data packets are transmitted with the 304 ECT bit set in the IP header (set to a "1"). If the sender receives 305 an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag 306 set in the TCP header), then the sender knows that congestion was 307 encountered in the network on the path from the sender to the 308 receiver. The indication of congestion should be treated just as a 309 congestion loss in non-ECN-Capable TCP. That is, the TCP source 310 halves the congestion window "cwnd" and reduces the slow start 311 threshold "ssthresh". 
The sending TCP does NOT increase the 312 congestion window in response to the receipt of an ECN-Echo ACK 313 packet. 315 A critical condition is that TCP does not react to congestion 316 indications more than once every window of data (or more loosely, 317 more than once every round-trip time). That is, the TCP sender's 318 congestion window should be reduced only once in response to a series 319 of dropped and/or CE packets from a single window of data. 321 The recommended method for implementing this is as follows. Assume 322 that at time "t" the source TCP reacts to an ECN-Echo ACK packet by 323 reducing its congestion window. The source TCP notes the packets 324 that are outstanding at that time (i.e., packets that have not yet 325 been acknowledged). Until all these packets are acknowledged, the 326 source TCP does not react to another ECN indication of congestion. 327 However, if during this period a packet is retransmitted as a result 328 of a retransmission timeout or the receipt of the required number 329 (e.g., 3) of duplicate acknowledgments, then the source TCP will 330 react to subsequent ECN indications of congestion. 332 [Floyd94] discusses this further, and [Floyd98] includes a validation 333 test illustrating a wide range of ECN scenarios. These scenarios 334 include the following: an ECN followed by another ECN, a Fast 335 Retransmit, or a Retransmit Timeout; and a Retransmit Timeout or a 336 Fast Retransmit followed by an ECN. 338 When the TCP sender reduces its congestion window in response to an 339 ECN-Echo ACK packet, there is no need for the sender to slow-start 340 (as in Tahoe TCP in response to a packet drop) or to stop sending 341 packets for a period of time to allow the queue to dissipate (as in 342 Reno TCP for roughly half a round-trip time during Fast Recovery).
343 The CE packet in the forward direction does not indicate the imminent 344 possibility of buffer overflow requiring an urgent source action to 345 reduce the load dramatically. Incoming acknowledgements that 346 continue to arrive can "clock out" outgoing packets as allowed by the 347 reduced congestion window. 349 TCP follows existing algorithms for sending data packets in response 350 to incoming ACKs, multiple duplicate acknowledgements, or retransmit 351 timeouts [RFC2001]. 353 5.1.3. The TCP Receiver 355 When TCP receives a CE data packet at the destination end-system, the 356 TCP data receiver sets the ECN-Echo flag in the TCP header of the 357 subsequent ACK packet. If there is any ACK withholding implemented, 358 as in current "delayed-ACK" TCP implementations where the TCP 359 receiver can send an ACK for two arriving data packets, then the 360 ECN-Echo flag in the ACK packet will be set to the OR of the CE bits 361 of all of the data packets being acknowledged. That is, if any of 362 the received data packets are CE packets, then the returning ACK has 363 the ECN-Echo flag set. 365 To provide robustness against the possibility of a dropped ACK packet 366 carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo 367 flag in a series of ACK packets. To enable the TCP receiver to 368 determine when to stop setting the ECN-Echo flag, we introduce a 369 second new flag in the TCP header, the Congestion Window Reduced 370 (CWR) flag. The CWR flag is assigned to Bit 8 in the Reserved field 371 of the TCP header. 373 When an ECN-Capable TCP reduces its congestion window for any reason 374 (because of a retransmit timeout, a Fast Retransmit, or in response 375 to an ECN Notification), the TCP sets the CWR flag in the TCP header 376 of the first data packet sent after the window reduction. If that 377 data packet is dropped in the network, then the sending TCP will have 378 to reduce the congestion window again and retransmit the dropped 379 packet. 
Thus, the Congestion Window Reduced message is reliably 380 delivered to the data receiver. 382 After a TCP receiver sends an ACK packet with the ECN-Echo flag set, 383 that TCP receiver continues to set the ECN-Echo flag in ACK packets 384 until it receives a CWR packet (a packet with the CWR flag set). 385 After the receipt of the CWR packet, acknowledgements for subsequent 386 non-CE data packets do not have the ECN-Echo flag set. If another CE 387 packet is received by the data receiver, the receiver would once 388 again send ACK packets with the ECN-Echo flag set. While the receipt 389 of a CWR packet does not guarantee that the data sender received the 390 ECN-Echo message, this does guarantee that the data sender reduced 391 its congestion window at some point *after* it sent the data packet 392 for which the CE bit was set. 394 We have already specified that a TCP sender reduces its congestion 395 window at most once per window of data. This mechanism requires some 396 care to make sure that the sender reduces its congestion window at 397 most once per ECN indication, and that multiple ECN messages over 398 several successive windows of data are properly reported to the ECN 399 sender. This is discussed further in [Floyd98]. 401 5.1.4. Congestion on the ACK-path 403 For the current generation of TCP congestion control algorithms, pure 404 acknowledgement packets (i.e., packets that do not contain any 405 accompanying data) should be sent with the ECT bit off. 406 Current TCP receivers have no mechanisms for reducing traffic on the 407 ACK-path in response to congestion notification. Mechanisms for 408 responding to congestion on the ACK-path are left as an area 409 for future research. (One simple possibility would be for the sender 410 to reduce its congestion window when it receives a pure ACK packet 411 with the CE bit set).
For current TCP implementations, a single 412 dropped ACK generally has only a very small effect on the TCP's 413 sending rate. 415 6. Summary of changes required in IP and TCP 417 Two bits need to be specified in the IP header: the ECN-Capable 418 Transport (ECT) bit and the Congestion Experienced (CE) bit. The ECT 419 bit set to "0" indicates that the transport protocol will ignore the 420 CE bit. This is the default value for the ECT bit. The ECT bit set 421 to "1" indicates that the transport protocol is willing and able to 422 participate in ECN. 424 The default value for the CE bit is "0". The router sets the CE bit 425 to "1" to indicate congestion to the end nodes. The CE bit in a 426 packet header should never be reset by a router from "1" to "0". 428 TCP requires three changes: a negotiation phase during setup to 429 determine if both end nodes are ECN-capable, and two new flags in the 430 TCP header, taken from the "reserved" flags in the TCP flags field. The 431 ECN-Echo flag is used by the data receiver to inform the data sender 432 of a received CE packet. The Congestion Window Reduced flag is used 433 by the data sender to inform the data receiver that the congestion 434 window has been reduced. 436 7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN 438 Since the ATM and Frame Relay mechanisms for congestion indication 439 have typically been defined without any notion of average queue size 440 as the basis for determining that an intermediate node is congested, 441 we believe that they provide a very noisy signal. The TCP-sender 442 reaction specified in this draft for ECN is NOT the appropriate 443 reaction for such a noisy signal of congestion notification. It is 444 our expectation that ATM's EFCI and Frame Relay's FECN mechanisms 445 would be phased out over time within the ATM network.
However, if 446 the routers that interface to the ATM network have a way of 447 maintaining the average queue at the interface, and use it to come to 448 a reliable determination that the ATM subnet is congested, they may 449 use the ECN notification that is defined here. 451 8. Non-compliance by the End Nodes 453 This section discusses concerns about the vulnerability of ECN to 454 non-compliant end-nodes (i.e., end nodes that set the ECT bit in 455 transmitted packets but do not respond to received CE packets). We 456 argue that the addition of ECN to the IP architecture would not 457 significantly increase the current vulnerability of the architecture 458 to unresponsive flows. 460 Even for non-ECN environments, there are serious concerns about the 461 damage that can be done by non-compliant or unresponsive flows (that 462 is, flows that do not respond to congestion control indications by 463 reducing their arrival rate at the congested link). For example, an 464 end-node could "turn off congestion control" by not reducing its 465 congestion window in response to packet drops. This is a concern for 466 the current Internet. It has been argued that routers will have to 467 deploy mechanisms to detect and differentially treat packets from 468 non-compliant flows. It has also been argued that techniques such as 469 end-to-end per-flow scheduling and isolation of one flow from 470 another, differentiated services, or end-to-end reservations could 471 remove some of the more damaging effects of unresponsive flows. 473 It has been argued that dropping packets in itself may be an adequate 474 deterrent for non-compliance, and that the use of ECN removes this 475 deterrent. We would argue in response that (1) ECN-capable routers 476 preserve packet-dropping behavior in times of high congestion; and 477 (2) even in times of high congestion, dropping packets in itself is 478 not an adequate deterrent for non-compliance. 
480 First, ECN-Capable routers will only mark packets (as opposed to 481 dropping them) when the packet marking rate is reasonably low. During 482 periods where the average queue size exceeds an upper threshold, and 483 therefore the potential packet marking rate would be high, our 484 recommendation is that routers drop packets rather than set the CE 485 bit in packet headers. 487 During the periods of low or moderate packet marking rates when ECN 488 would be deployed, there would be little deterrent effect on 489 unresponsive flows of dropping rather than marking those packets. For 490 example, delay-insensitive flows using reliable delivery might have 491 an incentive to increase rather than to decrease their sending rate 492 in the presence of dropped packets. Similarly, delay-sensitive flows 493 using unreliable delivery might increase their use of FEC in response 494 to an increased packet drop rate, increasing rather than decreasing 495 their sending rate. For the same reasons, we do not believe that 496 packet dropping itself is an effective deterrent for non-compliance 497 even in an environment of high packet drop rates. 499 Several methods have been proposed to identify and restrict non- 500 compliant or unresponsive flows. The addition of ECN to the network 501 environment would not in any way increase the difficulty of designing 502 and deploying such mechanisms. If anything, the addition of ECN to 503 the architecture would make the job of identifying unresponsive flows 504 slightly easier. For example, in an ECN-Capable environment routers 505 are not limited to information about packets that are dropped or have 506 the CE bit set at that router itself; in such an environment routers 507 could also take note of arriving CE packets that indicate congestion 508 encountered by that packet earlier in the path. 510 9.
Non-compliance in the Network 512 The breakdown of effective congestion control could be caused not 513 only by a non-compliant end-node, but also by the loss of the 514 congestion indication in the network itself. As one example, a rogue 515 or broken router could "erase" the CE bit in arriving CE packets, 516 thus preventing that indication of congestion from reaching 517 downstream receivers. This could result in the failure of congestion 518 control for that flow and a resulting increase in congestion in the 519 network, ultimately resulting in subsequent packets dropped for this 520 flow as the average queue size increased at the congested gateway. 521 Concerns regarding the loss of congestion indications from 522 encapsulated, dropped, or corrupted packets are discussed below. 524 9.1. Encapsulated packets 526 Some care is required to handle the CE and ECT bits appropriately 527 when packets are encapsulated and de-encapsulated for tunnels. When 528 a packet is encapsulated, the following rules apply regarding the ECT 529 bit. First, if the ECT bit in the encapsulated ('inside') header is 530 a 0, then the ECT bit in the encapsulating ('outside') header MUST be 531 a 0. If the ECT bit in the inside header is a 1, then the ECT bit in 532 the outside header SHOULD be a 1. 534 When a packet is de-encapsulated, the following rules apply regarding 535 the CE bit. If the ECT bit is a 1 in both the inside and the outside 536 header, then the CE bit in the outside header MUST be ORed with the 537 CE bit in the inside header. (That is, in this case a CE bit of 1 in 538 the outside header must be copied to the inside header.) If the ECT 539 bit in either header is a 0, then the CE bit in the outside header is 540 ignored. 542 9.2. Dropped or Corrupted Packets 544 An additional issue concerns a packet that has the CE bit set at one 545 router and is dropped by a subsequent router. 
For the proposed use 546 for ECN in this paper (that is, for a transport protocol such as TCP 547 for which a dropped data packet is an indication of congestion), end 548 nodes detect dropped data packets, and the congestion response of the 549 end nodes to a dropped data packet is at least as strong as the 550 congestion response to a received CE packet. 552 However, transport protocols such as TCP do not necessarily detect 553 all packet drops, such as the drop of a "pure" ACK packet; for 554 example, TCP does not reduce the arrival rate of subsequent ACK 555 packets in response to an earlier dropped ACK packet. Any proposal 556 for extending ECN-Capability to such packets would have to address 557 concerns raised by CE packets that were later dropped in the network. 559 Similarly, if a CE packet is dropped later in the network due to 560 corruption (bit errors), the end nodes should still invoke congestion 561 control, just as TCP would today in response to a dropped data 562 packet. This issue of corrupted CE packets would have to be 563 considered in any proposal for the network to distinguish between 564 packets dropped due to corruption, and packets dropped due to 565 congestion or buffer overflow. 567 10. A summary of related work. 569 [Floyd94] considers the advantages and drawbacks of adding ECN to the 570 TCP/IP architecture. As shown in the simulation-based comparisons, 571 one advantage of ECN is to avoid unnecessary packet drops for short 572 or delay-sensitive TCP connections. A second advantage of ECN is in 573 avoiding some unnecessary retransmit timeouts in TCP. This paper 574 discusses in detail the integration of ECN into TCP's congestion 575 control mechanisms. The possible disadvantages of ECN discussed in 576 the paper are that a non-compliant TCP connection could falsely 577 advertise itself as ECN-capable, and that a TCP ACK packet carrying 578 an ECN-Echo message could itself be dropped in the network. 
The 579 first of these two issues is discussed in Section 8 of this document, 580 and the second is addressed by the proposal in Section 5.1.3 for a 581 CWR flag in the TCP header. 583 [CKLTZ97] reports on an experimental implementation of ECN in IPv6. 584 The experiments include an implementation of ECN in an existing 585 implementation of RED for FreeBSD. A number of experiments were run 586 to demonstrate the control of the average queue size in the router, 587 the performance of ECN for a single TCP connection at a congested 588 router, and fairness with multiple competing TCP connections. One 589 conclusion of the experiments is that dropping a packet from a bulk- 590 data transfer degrades performance much more severely than marking a 591 packet. 593 Because the experimental implementation in [CKLTZ97] predates some of 594 the developments in this document, the implementation does not 595 conform to this document in all respects. For example, in the 596 experimental implementation the CWR flag is not used, but instead the 597 TCP receiver sends the ECN-Echo bit on a single ACK packet. 599 [K98] and [CKLT98] build on [CKLTZ97] to further analyze the benefits 600 of ECN for TCP. The conclusions are that ECN TCP gets moderately 601 better throughput than non-ECN TCP; that ECN TCP flows are fair 602 towards non-ECN TCP flows; and that ECN TCP is robust with two-way 603 traffic, congestion in both directions, and with multiple congested 604 gateways. Experiments with many short web transfers show that, while 605 most of the short connections have similar transfer times with or 606 without ECN, a small percentage of the short connections have very 607 high transfer times for the non-ECN experiments as compared to the 608 ECN experiments.
This increased transfer time is particularly 609 dramatic for those short connections that have their first packet 610 dropped in the non-ECN experiments, and that therefore have to wait 611 six seconds for the retransmit timer to expire. 613 The ECN Web Page [ECN] has pointers to other implementations of ECN 614 in progress. 616 11. Conclusions 618 Given the current effort to implement RED, we believe this is the 619 right time for router vendors to examine how to implement congestion 620 avoidance mechanisms that do not depend on packet drops alone. With 621 the increased deployment of applications and transports sensitive to 622 the delay and loss of a single packet, depending on packet loss as a 623 normal congestion notification mechanism appears to be insufficient 624 (or at the very least, non-optimal). 626 12. Acknowledgements 628 A number of people have made contributions to this internet-draft. 629 In particular, we would like to thank Kenjiro Cho for the proposal 630 for the TCP mechanism for negotiating ECN-Capability, Steve Blake and 631 Kevin Fall for the material on IPv4 Header Checksum Recalculation, 632 and Steve Bellovin, Jim Bound, Brian Carpenter, Paul Ferguson, 633 Stephen Kent, Greg Minshall, and Vern Paxson for discussions of 634 security issues. We also thank the Internet End-to-End Research 635 Group for ongoing discussions of these issues. 637 13. References 639 [CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L., 640 "Implementing Explicit Congestion Notification (ECN) in TCP over 641 IPv6", UCLA Technical Report, December 1997, URL 642 "http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz". 644 [CKLT98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L., 645 "Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the 646 L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn- 647 ietf.ps". 649 [ECN] "The ECN Web Page", URL "http://www- 650 nrg.ee.lbl.gov/floyd/ecn.html". 
652 [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways 653 for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 654 N.4, August 1993, p. 397-413. URL 655 "ftp://ftp.ee.lbl.gov/papers/early.pdf". 657 [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM 658 Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. 659 URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z". 661 [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support 662 End-to-End Congestion Control", Technical report, February 1997. URL 663 "ftp://ftp.ee.lbl.gov/papers/collapse.ps". 665 [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", 666 URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- 667 ecn. 669 [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) 670 benefits for TCP", Master's thesis, UCLA, 1998, URL 671 "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz". 673 [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", 674 SIGCOMM '97, September 1997. URL 675 "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078". 677 [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. 678 ACM SIGCOMM '88, pp. 314-329. URL 679 "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z". 681 [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance 682 Algorithm", Message to end2end-interest mailing list, April 1990. 683 URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". 685 [RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the 686 Internet Checksum", RFC 1141, January 1990. 688 [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven 689 Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. 691 [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast 692 Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997. 694 [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. 695 Estrin, S. Floyd, V. Jacobson, G.
Minshall, C. Partridge, L. 696 Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang, 697 "Recommendations on Queue Management and Congestion Avoidance in the 698 Internet", RFC 2309, April 1998. 700 [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for 701 Congestion Avoidance in Computer Networks", ACM Transactions on 702 Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. 704 14. Security Considerations 706 Security considerations have been discussed in Section 9. 708 15. IPv4 Header Checksum Recalculation 710 IPv4 header checksum recalculation is an issue with some high-end 711 router architectures using an output-buffered switch, since most if 712 not all of the header manipulation is performed on the input side of 713 the switch, while the ECN decision would need to be made local to the 714 output buffer. This is not an issue for IPv6, since there is no IPv6 715 header checksum. The IPv4 TOS octet is the last byte of a 16-bit 716 half-word. 718 RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 719 checksum after the TTL field is decremented. The incremental 720 updating of the IPv4 checksum after the CE bit was set would work as 721 follows: Let HC be the original header checksum, and let HC' be the 722 new header checksum after the CE bit has been set. Then for header 723 checksums calculated with one's complement subtraction, HC' would be 724 recalculated as follows: 725 HC' = HC - 1 if HC > 1; 726 HC' = 0x0000 if HC = 1. 727 For header checksums calculated on two's complement machines, HC' 728 would be recalculated as follows after the CE bit was set: 729 HC' = HC - 1 if HC > 0; 730 HC' = 0xFFFE if HC = 0. 732 16. The motivation for the ECT bit. 734 The need for the ECT bit is motivated by the fact that ECN will be 735 deployed incrementally in an Internet where some transport protocols 736 and routers understand ECN and some do not.
With the ECT bit, the 737 router can drop packets from flows that are not ECN-capable, but can 738 *instead* set the CE bit in flows that *are* ECN-capable. Because 739 the ECT bit allows an end node to have the CE bit set in a packet 740 *instead* of having the packet dropped, an end node might have some 741 incentive to deploy ECN. 743 If there were no ECT indication, then the router would have to set the 744 CE bit for packets from both ECN-capable and non-ECN-capable flows. 745 In this case, there would be no incentive for end-nodes to deploy 746 ECN, and no viable path of incremental deployment from a non-ECN 747 world to an ECN-capable world. Consider the first stages of such an 748 incremental deployment, where a subset of the flows are ECN-capable. 749 At the onset of congestion, when the packet dropping/marking rate 750 would be low, routers would only set CE bits, rather than dropping 751 packets. However, only those flows that are ECN-capable would 752 understand and respond to CE packets. The result is that the ECN- 753 capable flows would back off, and the non-ECN-capable flows would be 754 unaware of the ECN signals and would continue to open their 755 congestion windows. 757 In this case, there are two possible outcomes: (1) the ECN-capable 758 flows back off, the non-ECN-capable flows get all of the bandwidth, 759 and congestion remains mild, or (2) the ECN-capable flows back off, 760 the non-ECN-capable flows don't, and congestion increases until the 761 router transitions from setting the CE bit to dropping packets. 762 While this second outcome evens out the fairness, the ECN-capable 763 flows would still receive little benefit from being ECN-capable, 764 because the increased congestion would drive the router to packet- 765 dropping behavior. 767 A flow that advertised itself as ECN-Capable but does not respond to 768 CE bits is functionally equivalent to a flow that turns off 769 congestion control, as discussed in Sections 8 and 9.
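The router behavior that the ECT bit makes possible, described at the start of this section, can be illustrated with a minimal sketch. It assumes that active queue management has already selected the packet for marking or dropping, and it follows the Section 8 recommendation that routers drop rather than mark once the average queue size exceeds an upper threshold. The names Packet, handle_selected_packet, avg_queue, and max_threshold are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical view of the two ECN bits in an IP header."""
    ect: bool = False  # ECN-Capable Transport bit; default is "0"
    ce: bool = False   # Congestion Experienced bit; default is "0"


def handle_selected_packet(pkt, avg_queue, max_threshold):
    """Decide the fate of a packet already selected by active queue
    management for marking or dropping.

    Returns "mark" (CE bit set, packet forwarded) or "drop".
    avg_queue and max_threshold are assumed names for the average
    queue size and its upper threshold.
    """
    # A flow that is not ECN-capable would ignore the CE bit,
    # so the only usable congestion signal is a drop.
    if not pkt.ect:
        return "drop"
    # In times of high congestion, packet-dropping behavior is
    # preserved even for ECN-capable flows.
    if avg_queue > max_threshold:
        return "drop"
    # ECN-capable flow under low or moderate congestion: set CE
    # instead of dropping. A router never resets CE from 1 to 0.
    pkt.ce = True
    return "mark"
```

For example, handle_selected_packet(Packet(ect=True), 5.0, 15.0) marks the packet, while the same call with ect=False, or with an average queue size above the threshold, drops it.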
771 Thus, in a world where a subset of the flows are ECN-capable, but 772 where ECN-capable flows have no mechanism for indicating that fact to 773 the routers, there would be less effective and less fair congestion 774 control in the Internet, resulting in a strong incentive for end 775 nodes not to deploy ECN. 777 17. Why use two bits in the IP header? 779 Given the need for an ECT indication in the IP header, there still 780 remains the question of whether the ECT (ECN-Capable Transport) and 781 CE (Congestion Experienced) indications should be overloaded on a 782 single bit. This overloaded-one-bit alternative, explored in 783 [Floyd94], would involve a single bit with two values. One value, 784 "ECT and not CE", would represent an ECN-Capable Transport, and the 785 other value, "CE or not ECT", would represent either Congestion 786 Experienced or a non-ECN-Capable transport. 788 There is only one inherent functional difference between the one-bit 789 and two-bit implementations. This functional difference concerns 790 packets that traverse multiple congested routers. Consider a CE 791 packet that arrives at a second congested router, and is selected by 792 the active queue management at that router for either marking or 793 dropping. In the one-bit implementation, the second congested router 794 has no choice but to drop the CE packet, because it cannot 795 distinguish between a CE packet and a non-ECT packet. In the two-bit 796 implementation, the second congested router has the choice of either 797 dropping the CE packet, or of leaving it alone with the CE bit set. 799 Another difference between the one-bit and two-bit implementations 800 comes from the fact that with the one-bit implementation, receivers 801 in a single flow cannot distinguish between CE and non-ECT packets.
802 Thus, in the one-bit implementation an ECN-capable data sender would 803 have to unambiguously indicate to the receiver or receivers whether 804 each packet had been sent as ECN-Capable or as non-ECN-Capable. One 805 possibility would be for the sender to indicate in the transport 806 header whether the packet was sent as ECN-Capable. A second 807 possibility that would involve a functional limitation for the one- 808 bit implementation would be for the sender to unambiguously indicate 809 that it was going to send *all* of its packets as ECN-Capable or as 810 non-ECN-Capable. For a multicast transport protocol, this 811 unambiguous indication would have to be apparent to receivers joining 812 an on-going multicast session. 814 Another advantage of the two-bit approach is that it is somewhat more 815 robust. The most critical issue, discussed in Section 8, is that the 816 default indication should be that of a non-ECN-Capable transport. In 817 a two-bit implementation, this requirement for the default value 818 simply means that the ECT bit should be `OFF' by default. In the 819 one-bit implementation, this means that the single overloaded bit 820 should by default be in the "CE or not ECT" position. This is less 821 clear and straightforward, and possibly more open to incorrect 822 implementations either in the end nodes or in the routers. 824 In summary, while the one-bit implementation could be a possible 825 implementation, it has the following significant limitations relative 826 to the two-bit implementation. First, the one-bit implementation has 827 more limited functionality for the treatment of CE packets at a 828 second congested router. 
Second, the one-bit implementation requires 829 either that extra information be carried in the transport header of 830 packets from ECN-Capable flows (to convey the functionality of the 831 second bit elsewhere, namely in the transport header), or that 832 senders in ECN-Capable flows accept the limitation that receivers 833 must be able to determine a priori which packets are ECN-Capable and 834 which are not ECN-Capable. Third, the one-bit implementation is 835 possibly more open to errors from faulty implementations that choose 836 the wrong default value for the ECN bit. We believe that the use of 837 the extra bit in the IP header for the ECT-bit is extremely valuable 838 to overcome these limitations. 840 AUTHORS' ADDRESSES 842 K. K. Ramakrishnan 843 AT&T Labs. Research 844 Phone: +1 (973) 360-8766 845 Email: kkrama@research.att.com 846 URL: http://www.research.att.com/info/kkrama 848 Sally Floyd 849 Lawrence Berkeley National Laboratory 850 Phone: +1 (510) 486-7518 851 Email: floyd@ee.lbl.gov 852 URL: http://www-nrg.ee.lbl.gov/floyd/ 854 This draft was created in July 1998. 855 It expires January 1999.