Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        AT&T Labs Research
draft-kksjf-ecn-02.txt                                       Sally Floyd
                                                                    LBNL
                                                          September 1998
                                                     Expires: March 1999

     A Proposal to add Explicit Congestion Notification (ECN) to IP

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This note describes a proposed addition of ECN (Explicit Congestion
Notification) to IP. TCP is currently the dominant transport
protocol used in the Internet. We begin by describing TCP's use of
packet drops as an indication of congestion. Next we argue that with
the addition of active queue management (e.g., RED) to the Internet
infrastructure, where routers detect congestion before the queue
overflows, routers are no longer limited to packet drops as an
indication of congestion.
Routers could instead set a Congestion
Experienced (CE) bit in the packet header of packets from ECN-capable
transport protocols. We describe when the CE bit would be set in the
routers, and describe what modifications would be needed to TCP to
make it ECN-capable. Modifications to other transport protocols
(e.g., unreliable unicast or multicast, reliable multicast, other
reliable unicast transport protocols) could be considered as those
protocols are developed and advance through the standards process.

1. Introduction

TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by
end-systems probing for the network state, by gradually increasing the
load on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
loss as an indication of congestion in the network is appropriate for
pure best-effort data carried by TCP which has little or no
sensitivity to delay or loss of individual packets. In addition,
TCP's congestion management algorithms have techniques built-in (such
as Fast Retransmit and Fast Recovery) to minimize the impact of
losses from a throughput perspective.

However, these mechanisms are not intended to help applications that
are in fact sensitive to the delay or loss of one or more individual
packets. Interactive traffic such as telnet, web-browsing, and
transfer of audio and video data can be sensitive to packet losses
(using an unreliable data delivery transport such as UDP) or to the
increased latency of the packet caused by the need to retransmit the
packet after a loss (for reliable data delivery such as TCP).

Since TCP determines the appropriate congestion window to use by
gradually increasing the window size until it experiences a dropped
packet, this causes the queues at the bottleneck router to build up.
With most packet drop policies at the router that are not sensitive
to the load placed by each individual flow, this means that some of
the packets of latency-sensitive flows are going to be dropped.
Active queue management mechanisms detect congestion before the queue
overflows, and provide an indication of this congestion to the end
nodes. The advantages of active queue management are discussed in
RFC 2309 [RFC2309]. Active queue management avoids some of the bad
properties of dropping on queue overflow, including the undesirable
synchronization of loss across multiple flows. More importantly,
active queue management means that transport protocols with
congestion control (e.g., TCP) do not have to rely on buffer overflow
as the only indication of congestion. This can reduce unnecessary
queueing delay for all traffic sharing that queue.

Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router
to separate policies of queueing or dropping packets from the
policies for indicating congestion. Thus, active queue management
allows routers to use the Congestion Experienced (CE) bit in a packet
header as an indication of congestion, instead of relying solely on
packet drops.

2. Assumptions and General Principles

In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.

(1) Congestion may persist over different time-scales. The time
scales that we are concerned with are congestion events that may last
longer than a round-trip time.
(2) The number of packets in an individual flow (e.g., TCP connection
or an exchange using UDP) may range from a small number of packets to
quite a large number. We are interested in managing the congestion
caused by flows that send enough packets so that they are still
active when network feedback reaches them.
(3) New mechanisms for congestion control and avoidance need to
co-exist and cooperate with existing mechanisms for congestion control.
In particular, new mechanisms have to co-exist with TCP's current
methods of adapting to congestion and with routers' current practice
of dropping packets in periods of congestion.
(4) Because ECN is likely to be adopted gradually, accommodating
migration is essential. Some routers may still only drop packets to
indicate congestion, and some end-systems may not be ECN-capable.
The most viable strategy is one that accommodates incremental
deployment without having to resort to "islands" of ECN-capable and
non-ECN-capable environments.
(5) Asymmetric routing is likely to be a normal occurrence in the
Internet. The path (sequence of links and routers) followed by data
packets may be different from the path followed by the acknowledgment
packets in the reverse direction.
(6) Many routers process the "regular" headers in IP packets more
efficiently than they process the header information in IP options.
This suggests keeping congestion experienced information in the
regular headers of an IP packet.
(7) It must be recognized that not all end-systems will cooperate in
mechanisms for congestion control. However, new mechanisms shouldn't
make it easier for TCP applications to disable TCP congestion
control. The benefit of lying about participating in new mechanisms
such as ECN-capability should be small.

3. Random Early Detection (RED)

Random Early Detection (RED) is a mechanism for active queue
management that has been proposed to detect incipient congestion
[FJ93], and is currently being deployed in the Internet backbone
[RFC2309]. Although RED is meant to be a general mechanism using one
of several alternatives for congestion indication, in the current
environment of the Internet RED is restricted to using packet drops
as a mechanism for congestion indication. RED drops packets based on
the average queue length exceeding a threshold, rather than only when
the queue overflows. However, when RED drops packets before the
queue actually overflows, RED is not forced by memory limitations to
discard the packet.

RED could set a Congestion Experienced (CE) bit in the packet header
instead of dropping the packet, if such a bit were provided in the IP
header and understood by the transport protocol. The use of the CE
bit would allow the receiver(s) to receive the packet, avoiding the
potential for excessive delays due to retransmissions after packet
losses. We use the term 'CE packet' to denote a packet that has the
CE bit set.

4. Explicit Congestion Notification in IP

We propose that the Internet provide a congestion indication for
incipient congestion (as in RED and earlier work [RJ90]) where the
notification can sometimes be through marking packets rather than
dropping them. This would require an ECN field in the IP header with
two bits. The ECN-Capable Transport (ECT) bit would be set by the
data sender to indicate that the end-points of the transport protocol
are ECN-capable. The CE bit would be set by the router to indicate
congestion to the end nodes. Routers that have a packet arriving at
a full queue would drop the packet, just as they do now.
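
As an illustration only, the two-bit ECN field described above can be
modeled as a pair of flags. This is a minimal Python sketch. The
names EcnField, sender_mark and describe are hypothetical, and the
numeric bit values are placeholders chosen for the example, since this
section does not say where the two bits sit in the IP header.

```python
from enum import IntFlag


class EcnField(IntFlag):
    """Two-bit ECN field; the numeric values are placeholders (assumption)."""
    NONE = 0
    ECT = 0b10  # set by the data sender: the transport is ECN-capable
    CE = 0b01   # set by a router: congestion experienced


def sender_mark(ecn_capable: bool) -> EcnField:
    """Value the data sender places in the ECN field of an outgoing packet."""
    return EcnField.ECT if ecn_capable else EcnField.NONE


def describe(field: EcnField) -> str:
    """Human-readable summary of the two bits."""
    ect = "ECT=1" if field & EcnField.ECT else "ECT=0"
    ce = "CE=1" if field & EcnField.CE else "CE=0"
    return f"{ect} {ce}"


if __name__ == "__main__":
    pkt = sender_mark(ecn_capable=True)
    print(describe(pkt))                # as sent by an ECN-capable sender
    print(describe(pkt | EcnField.CE))  # after a router marks congestion
```

Keeping the two bits separate mirrors the division of labor in the
text: only the sender writes ECT, and only routers write CE.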

Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
dropped packet. For example, for TCP the source TCP halves its
congestion window "cwnd" in response to an ECN indication received by
the data receiver.

One reason for requiring that the congestion-control response to the
CE packet be essentially the same as the response to a dropped packet
is to accommodate the incremental deployment of ECN in both
end-systems and in routers. Some routers may drop ECN-Capable packets
(e.g., using the same RED policies for congestion detection) while
other routers set the CE bit, for equivalent levels of congestion.
Similarly, a router might drop a non-ECN-Capable packet but set the
CE bit in an ECN-Capable packet, for equivalent levels of congestion.
Different congestion control responses to a CE bit indication and to
a packet drop could result in unfair treatment for different flows.

An additional requirement is that the end-systems should react to
congestion at most once per window of data (i.e., at most once per
round-trip time), to avoid reacting multiple times to multiple
indications of congestion within a round-trip time.

For a router, the CE bit of an ECN-Capable packet should only be set
if the router would otherwise have dropped the packet as an
indication of congestion to the end nodes. When the router's buffer
is not yet full and the router is prepared to drop a packet to inform
end nodes of incipient congestion, the router should first check to
see if the ECT bit is set in that packet's IP header. If so, then
instead of dropping the packet, the router MAY instead set the CE bit
in the IP header.
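
The router rule in the preceding paragraph can be sketched as a small
decision function. This is a hedged Python illustration, not a
specification. The names Packet, Action and router_action are
hypothetical, the flag aqm_wants_drop stands in for whatever active
queue management decision the router makes (e.g., RED), and
marking_enabled models the "MAY" in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK = "mark"
    DROP = "drop"


@dataclass
class Packet:
    ect: bool = False  # ECN-Capable Transport bit, set by the sender
    ce: bool = False   # Congestion Experienced bit, set by routers


def router_action(pkt: Packet, queue_full: bool, aqm_wants_drop: bool,
                  marking_enabled: bool = True) -> Action:
    """Decide what to do with an arriving packet.

    aqm_wants_drop: the router would drop this packet to signal
    incipient congestion even though the buffer is not yet full.
    """
    if queue_full:
        return Action.DROP      # full queue: drop, just as routers do now
    if not aqm_wants_drop:
        return Action.FORWARD   # no congestion signal for this packet
    if pkt.ect and marking_enabled:
        pkt.ce = True           # mark instead of dropping
        return Action.MARK
    return Action.DROP          # not ECN-capable, or marking disabled


if __name__ == "__main__":
    p = Packet(ect=True)
    print(router_action(p, queue_full=False, aqm_wants_drop=True), p.ce)
```

Note that the CE bit is only ever set on a packet the router was
already prepared to drop, which keeps the congestion signal equivalent
for ECN-capable and non-ECN-capable flows.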

An environment where all end nodes were ECN-Capable could allow new
criteria to be developed for setting the CE bit, and new congestion
control mechanisms for end-node reaction to CE packets. However,
this is a research issue, and as such is not addressed in this
document.

When a CE packet is received by a router, the CE bit is left
unchanged, and the packet transmitted as usual. When severe
congestion has occurred and the router's queue is full, then the
router has no choice but to drop some packet when a new packet
arrives. We anticipate that such packet losses will become
relatively infrequent when a majority of end-systems become
ECN-Capable and participate in TCP or other compatible congestion
control mechanisms. In an adequately-provisioned network in such an
ECN-Capable environment, packet losses should occur primarily during
transients or in the presence of non-cooperating sources.

We expect that routers will set the CE bit in response to incipient
congestion as indicated by the average queue size, using the RED
algorithms suggested in [FJ93, RFC2309]. To the best of our
knowledge, this is the only proposal currently under discussion in
the IETF for routers to drop packets proactively, before the buffer
overflows. However, this document does not attempt to specify a
particular mechanism for active queue management, leaving that
endeavor, if needed, to other areas of the IETF. While ECN is
inextricably tied up with active queue management at the router, the
reverse does not hold; active queue management mechanisms have been
developed and deployed independently from ECN, using packet drops as
indications of congestion in the absence of ECN in the IP
architecture.

5. Support from the Transport Protocol

ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header.
First, the transport protocol might require negotiation between the
endpoints during setup to determine that all of the endpoints are
ECN-capable, so that the sender can set the ECT bit in transmitted
packets. Second, the transport protocol must be capable of reacting
appropriately to the receipt of CE packets. This reaction could be
in the form of the data receiver informing the data sender of the
received CE packet (e.g., TCP), of the data receiver unsubscribing to
a layered multicast group (e.g., RLM [MJV96]), or of some other
action that ultimately reduces the arrival rate of that flow to that
receiver.

This document only addresses the addition of ECN Capability to TCP,
leaving issues of ECN and other transport protocols to further
research. For TCP, ECN requires three new mechanisms: negotiation
between the endpoints during setup to determine if they are both
ECN-capable; an ECN-Echo flag in the TCP header so that the data
receiver can inform the data sender when a CE packet has been
received; and a Congestion Window Reduced (CWR) flag in the TCP header
so that the data sender can inform the data receiver that the
congestion window has been reduced. The support required from other
transport protocols is likely to be different, particularly for
unreliable or reliable multicast transport protocols, and will have to
be determined as other transport protocols are brought to the IETF for
standardization.

5.1. TCP

The following sections describe in detail the proposed use of ECN in
TCP. This proposal is described in essentially the same form in
[Floyd94]. We assume that the source TCP uses the standard
congestion control algorithms of Slow-start, Fast Retransmit and Fast
Recovery [RFC2001].

This proposal specifies two new flags in the Reserved field of the
TCP header.
The TCP mechanism for negotiating ECN-Capability uses
the ECN-Echo flag in the TCP header. (This was called the ECN Notify
flag in some earlier documents.) Bit 9 in the Reserved field of the
TCP header is designated as the ECN-Echo flag.

To enable the TCP receiver to determine when to stop setting the
ECN-Echo flag, we introduce a second new flag in the TCP header, the
Congestion Window Reduced (CWR) flag. The CWR flag is assigned to
Bit 8 in the Reserved field of the TCP header.

The use of these flags is described in the sections below.

5.1.1. TCP Initialization

In the TCP connection setup phase, the source and destination TCPs
exchange information about their desire and/or capability to use ECN.
Subsequent to the completion of this negotiation, the TCP sender sets
the ECT bit in the IP header of packets to indicate to the network
that the transport is capable and willing to participate in ECN for
this packet. This will indicate to the routers that they may mark
this packet with the CE bit, if they would like to use that as a
method of congestion notification. If the TCP connection does not
wish to use ECN notification for a particular packet, the sending TCP
sets the ECT bit equal to 0 (i.e., not set), and the TCP receiver
ignores the CE bit in the received packet.

When a node sends a TCP SYN packet, it may set the ECN-Echo and CWR
flags in the TCP header. For a SYN packet, the setting of both the
ECN-Echo and CWR flags is defined as an indication that the sending
TCP is ECN-Capable, rather than as an indication of congestion or of
response to congestion. More precisely, a SYN packet with both the
ECN-Echo and CWR flags set indicates that the TCP implementation
transmitting the SYN packet will respond to incoming data packets
that have the CE bit set in the IP header by setting the ECN-Echo
flag in outgoing TCP Acknowledgement (ACK) packets.
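
The SYN-side signalling above can be illustrated with a short Python
sketch. It assumes that "Bit 8" and "Bit 9" are counted from the most
significant bit of the 16-bit TCP header word that carries the data
offset, the Reserved field and the control flags, as in the header
diagram of RFC 793. The helper names bit_mask, make_ecn_syn_flags and
is_ecn_capable_syn are hypothetical.

```python
def bit_mask(bit_index: int) -> int:
    """Mask for a bit of the 16-bit offset/reserved/flags word.

    Bit 0 is the most significant bit (assumed numbering, see lead-in).
    """
    return 1 << (15 - bit_index)


CWR = bit_mask(8)        # Congestion Window Reduced flag
ECN_ECHO = bit_mask(9)   # ECN-Echo flag
ACK = bit_mask(11)       # existing TCP ACK flag
SYN = bit_mask(14)       # existing TCP SYN flag


def make_ecn_syn_flags() -> int:
    """Flags for a SYN that advertises ECN capability."""
    return SYN | ECN_ECHO | CWR


def is_ecn_capable_syn(flags: int) -> bool:
    """True if an initial SYN carries both ECN-Echo and CWR."""
    is_initial_syn = bool(flags & SYN) and not (flags & ACK)
    return is_initial_syn and bool(flags & ECN_ECHO) and bool(flags & CWR)


if __name__ == "__main__":
    flags = make_ecn_syn_flags()
    print(hex(flags), is_ecn_capable_syn(flags))
```

On a SYN the two flags act purely as a capability announcement; they
carry no congestion information at that point.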

When a node sends a SYN-ACK packet, it may set the ECN-Echo flag, but
it does not set the CWR flag. For a SYN-ACK packet, the pattern of
the ECN-Echo flag set and the CWR flag not set in the TCP header is
defined as an indication that the TCP transmitting the SYN-ACK packet
is ECN-Capable.

There is the question of why we chose to have the TCP sending the SYN
set two ECN-related flags in the Reserved field of the TCP header for
the SYN packet, while the responding TCP sending the SYN-ACK sets
only one ECN-related flag in the SYN-ACK packet. This asymmetry is
necessary for the robust negotiation of ECN-capability with deployed
TCP implementations. There exists at least one TCP implementation in
which TCP receivers set the Reserved field of the TCP header in ACK
packets (and hence the SYN-ACK) simply to reflect the Reserved field
of the TCP header in the received data packet. Because the TCP SYN
packet sets the ECN-Echo and CWR flags to indicate ECN-capability,
while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP
correctly interprets a receiver's reflection of its own flags in the
Reserved field as an indication that the receiver is not ECN-capable.

5.1.2. The TCP Sender

For a TCP connection using ECN, data packets are transmitted with the
ECT bit set in the IP header (set to a "1"). If the sender receives
an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag
set in the TCP header), then the sender knows that congestion was
encountered in the network on the path from the sender to the
receiver. The indication of congestion should be treated just as a
congestion loss in non-ECN-Capable TCP. That is, the TCP source
halves the congestion window "cwnd" and reduces the slow start
threshold "ssthresh". The sending TCP does NOT increase the
congestion window in response to the receipt of an ECN-Echo ACK
packet.
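
The sender's reaction to an ECN-Echo ACK, combined with the earlier
requirement to react at most once per window of data, can be sketched
as follows. This is a minimal Python illustration under stated
assumptions: the class EcnSender and its fields are hypothetical, the
"recover" bookkeeping is just one possible way to detect that a full
window has passed, and the one-segment floor on cwnd and the choice
ssthresh = cwnd are example values, not requirements of this document.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    mss: int = 1460             # segment size in bytes (example value)
    cwnd: int = 10 * 1460       # congestion window in bytes
    ssthresh: int = 64 * 1024   # slow start threshold in bytes
    snd_nxt: int = 0            # next sequence number to be sent
    recover: int = -1           # snd_nxt at the last window reduction
    cwr_pending: bool = False   # set CWR on the next data packet sent

    def on_ack(self, ack_no: int, ecn_echo: bool) -> bool:
        """Process an ACK; return True if the window was reduced."""
        if not ecn_echo:
            return False  # ordinary ACK processing is out of scope here
        if ack_no <= self.recover:
            return False  # already reacted for this window of data
        self.cwnd = max(self.cwnd // 2, self.mss)  # halve cwnd
        self.ssthresh = self.cwnd                  # reduce ssthresh
        self.recover = self.snd_nxt                # start of a new window
        self.cwr_pending = True
        return True


if __name__ == "__main__":
    s = EcnSender(mss=1000, cwnd=8000, snd_nxt=8000)
    print(s.on_ack(1000, ecn_echo=True), s.cwnd)  # first echo: reduce
    print(s.on_ack(2000, ecn_echo=True), s.cwnd)  # same window: ignore
```

The sketch never grows the window on an ECN-Echo ACK, and a second
echo from the same window of data leaves cwnd untouched.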

A critical condition is that TCP does not react to congestion
indications more than once every window of data (or more loosely,
more than once every round-trip time). That is, the TCP sender's
congestion window should be reduced only once in response to a series
of dropped and/or CE packets from a single window of data. In
addition, the TCP source should not decrease the slow-start
threshold, ssthresh, if it has been decreased within the last round
trip time. However, if any retransmitted packets are dropped or have
the CE bit set, then this is interpreted by the source TCP as a new
instance of congestion.

[Floyd94] discusses this further, and [Floyd98] includes a validation
test in the ns simulator illustrating a wide range of ECN scenarios.
These scenarios include the following: an ECN followed by another
ECN, a Fast Retransmit, or a Retransmit Timeout; and a Retransmit
Timeout or a Fast Retransmit followed by an ECN.

When the TCP sender reduces its congestion window in response to an
ECN-Echo ACK packet, there is no need for the sender to slow-start
(as in Tahoe TCP in response to a packet drop) or to stop sending
packets for a period of time to allow the queue to dissipate (as in
Reno TCP for roughly half a round-trip time during Fast Recovery).
The CE packet in the forward direction does not indicate the imminent
possibility of buffer overflow requiring an urgent source action to
reduce the load dramatically. Incoming acknowledgements that
continue to arrive can "clock out" outgoing packets as allowed by the
reduced congestion window.

TCP follows existing algorithms for sending data packets in response
to incoming ACKs, multiple duplicate acknowledgements, or retransmit
timeouts [RFC2001].

5.1.3. The TCP Receiver

When TCP receives a CE data packet at the destination end-system, the
TCP data receiver sets the ECN-Echo flag in the TCP header of the
subsequent ACK packet. If there is any ACK withholding implemented,
as in current "delayed-ACK" TCP implementations where the TCP
receiver can send an ACK for two arriving data packets, then the
ECN-Echo flag in the ACK packet will be set to the OR of the CE bits
of all of the data packets being acknowledged. That is, if any of the
received data packets are CE packets, then the returning ACK has the
ECN-Echo flag set.

To provide robustness against the possibility of a dropped ACK packet
carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo
flag in a series of ACK packets. The TCP receiver uses the CWR flag
to determine when to stop setting the ECN-Echo flag.

When an ECN-Capable TCP reduces its congestion window for any reason
(because of a retransmit timeout, a Fast Retransmit, or in response
to an ECN Notification), the TCP sets the CWR flag in the TCP header
of the first data packet sent after the window reduction. If that
data packet is dropped in the network, then the sending TCP will have
to reduce the congestion window again and retransmit the dropped
packet. Thus, the Congestion Window Reduced message is reliably
delivered to the data receiver.

After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
that TCP receiver continues to set the ECN-Echo flag in ACK packets
until it receives a CWR packet (a packet with the CWR flag set).
After the receipt of the CWR packet, acknowledgements for subsequent
non-CE data packets do not have the ECN-Echo flag set. If another CE
packet is received by the data receiver, the receiver would once
again send ACK packets with the ECN-Echo flag set.
While the receipt
of a CWR packet does not guarantee that the data sender received the
ECN-Echo message, this does guarantee that the data sender reduced
its congestion window at some point *after* it sent the data packet
for which the CE bit was set.

We have already specified that a TCP sender reduces its congestion
window at most once per window of data. This mechanism requires some
care to make sure that the sender reduces its congestion window at
most once per ECN indication, and that multiple ECN messages over
several successive windows of data are properly reported to the ECN
sender. This is discussed further in [Floyd98].

5.1.4. Congestion on the ACK-path

For the current generation of TCP congestion control algorithms, pure
acknowledgement packets (i.e., packets that do not contain any
accompanying data) should be sent with the ECT bit off. Current TCP
receivers have no mechanisms for reducing traffic on the ACK-path in
response to congestion notification. Mechanisms for responding to
congestion on the ACK-path are left as an area for future
research. (One simple possibility would be for the sender to reduce
its congestion window when it receives a pure ACK packet with the CE
bit set.) For current TCP implementations, a single dropped ACK
generally has only a very small effect on the TCP's sending rate.

6. Summary of changes required in IP and TCP

Two bits need to be specified in the IP header, the ECN-Capable
Transport (ECT) bit and the Congestion Experienced (CE) bit. The ECT
bit set to "0" indicates that the transport protocol will ignore the
CE bit. This is the default value for the ECT bit. The ECT bit set
to "1" indicates that the transport protocol is willing and able to
participate in ECN.

The default value for the CE bit is "0". The router sets the CE bit
to "1" to indicate congestion to the end nodes.
The CE bit in a packet header should never be reset by a router from
"1" to "0".

TCP requires three changes: a negotiation phase during setup to
determine if both end nodes are ECN-capable, and two new flags in the
TCP header, from the "reserved" flags in the TCP flags field. The
ECN-Echo flag is used by the data receiver to inform the data sender
of a received CE packet. The Congestion Window Reduced flag is used
by the data sender to inform the data receiver that the congestion
window has been reduced.

7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN

Since the ATM and Frame Relay mechanisms for congestion indication
have typically been defined without any notion of average queue size
as the basis for determining that an intermediate node is congested,
we believe that they provide a very noisy signal. The TCP-sender
reaction specified in this draft for ECN is NOT the appropriate
reaction for such a noisy signal of congestion notification. It is
our expectation that ATM's EFCI and Frame Relay's FECN mechanisms
would be phased out over time within the ATM network. However, if
the routers that interface to the ATM network have a way of
maintaining the average queue at the interface, and use it to come to
a reliable determination that the ATM subnet is congested, they may
use the ECN notification that is defined here.

We emphasize that a *single* packet with the CE bit set in an IP
packet causes the transport layer to respond, in terms of congestion
control, as it would to a packet drop. As such, the CE bit is not a
good match to a transient signal such as one based on the
instantaneous queue size. However, experiments in techniques at
layer 2 (e.g., in ATM switches or Frame Relay switches) should be
encouraged.
For example, using a scheme such as RED (where packet marking is
based on the average queue length exceeding a threshold), layer 2
devices could provide a reasonably reliable indication of congestion.
When all the layer 2 devices in a path set that layer's own
Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN bit
in Frame Relay) in this reliable manner, then the interface router to
the layer 2 network could copy the state of that layer 2 Congestion
Experienced bit into the CE bit in the IP header. We recognize that
this is not the current practice, nor is it in current standards.
However, encouraging experimentation in this manner may provide the
information needed to enable evolution of existing layer 2 mechanisms
to provide a more reliable means of congestion indication, when they
use a single bit for indicating congestion.

8. Non-compliance by the End Nodes

This section discusses concerns about the vulnerability of ECN to
non-compliant end-nodes (i.e., end nodes that set the ECT bit in
transmitted packets but do not respond to received CE packets). We
argue that the addition of ECN to the IP architecture would not
significantly increase the current vulnerability of the architecture
to unresponsive flows.

Even for non-ECN environments, there are serious concerns about the
damage that can be done by non-compliant or unresponsive flows (that
is, flows that do not respond to congestion control indications by
reducing their arrival rate at the congested link). For example, an
end-node could "turn off congestion control" by not reducing its
congestion window in response to packet drops. This is a concern for
the current Internet. It has been argued that routers will have to
deploy mechanisms to detect and differentially treat packets from
non-compliant flows.
It has also been argued that techniques such as end-to-end per-flow
scheduling and isolation of one flow from another, differentiated
services, or end-to-end reservations could remove some of the more
damaging effects of unresponsive flows.

It has been argued that dropping packets in itself may be an adequate
deterrent for non-compliance, and that the use of ECN removes this
deterrent. We would argue in response that (1) ECN-capable routers
preserve packet-dropping behavior in times of high congestion; and
(2) even in times of high congestion, dropping packets in itself is
not an adequate deterrent for non-compliance.

First, ECN-Capable routers will only mark packets (as opposed to
dropping them) when the packet marking rate is reasonably low.
During periods where the average queue size exceeds an upper
threshold, and therefore the potential packet marking rate would be
high, our recommendation is that routers drop packets rather than set
the CE bit in packet headers.

During the periods of low or moderate packet marking rates when ECN
would be deployed, there would be little deterrent effect on
unresponsive flows of dropping rather than marking those packets.
For example, delay-insensitive flows using reliable delivery might
have an incentive to increase rather than to decrease their sending
rate in the presence of dropped packets. Similarly, delay-sensitive
flows using unreliable delivery might increase their use of FEC in
response to an increased packet drop rate, increasing rather than
decreasing their sending rate. For the same reasons, we do not
believe that packet dropping itself is an effective deterrent for
non-compliance even in an environment of high packet drop rates.

Several methods have been proposed to identify and restrict
non-compliant or unresponsive flows.
The addition of ECN to the network environment would not in any way
increase the difficulty of designing and deploying such mechanisms.
If anything, the addition of ECN to the architecture would make the
job of identifying unresponsive flows slightly easier. For example,
in an ECN-Capable environment routers are not limited to information
about packets that are dropped or have the CE bit set at that router
itself; in such an environment routers could also take note of
arriving CE packets that indicate congestion encountered by that
packet earlier in the path.

9. Non-compliance in the Network

The breakdown of effective congestion control could be caused not
only by a non-compliant end-node, but also by the loss of the
congestion indication in the network itself. As one example, a rogue
or broken router could "erase" the CE bit in arriving CE packets,
thus preventing that indication of congestion from reaching
downstream receivers. This could result in the failure of congestion
control for that flow and a resulting increase in congestion in the
network, ultimately resulting in subsequent packets dropped for this
flow as the average queue size increased at the congested gateway.
Concerns regarding the loss of congestion indications from
encapsulated, dropped, or corrupted packets are discussed below.

9.1. Encapsulated packets

Some care is required to handle the CE and ECT bits appropriately
when packets are encapsulated and de-encapsulated for tunnels.

When a packet is encapsulated, the following rules apply regarding
the ECT bit. First, if the ECT bit in the encapsulated ('inside')
header is a 0, then the ECT bit in the encapsulating ('outside')
header MUST be a 0. If the ECT bit in the inside header is a 1, then
the ECT bit in the outside header SHOULD be a 1.

When a packet is de-encapsulated, the following rules apply regarding
the CE bit.
If the ECT bit is a 1 in both the inside and the outside header, then
the CE bit in the outside header MUST be ORed with the CE bit in the
inside header. (That is, in this case a CE bit of 1 in the outside
header must be copied to the inside header.) If the ECT bit in
either header is a 0, then the CE bit in the outside header is
ignored. This requirement for the treatment of de-encapsulated
packets does not currently apply to IPsec tunnels.

A specific example of the use of ECN with encapsulation occurs when a
flow wishes to use ECN-capability to avoid the danger of an
unnecessary packet drop for the encapsulated packet as a result of
congestion at an intermediate node in the tunnel. This functionality
can be supported by copying the ECN codepoint in the inner IP header
to the outer IP header upon encapsulation, and using the ECN
codepoint in the outer IP header to set the ECN codepoint in the
inner IP header upon decapsulation. This effectively allows routers
along the tunnel to cause the CE bit to be set in the ECN field of
the unencapsulated IP header of an ECN-capable packet when such
routers experience congestion.

9.2. IPsec Tunnel Considerations

The IPsec protocol, as defined in [ESP, AH], does not include the IP
header's ECN field in any of its cryptographic calculations (in the
case of tunnel mode, the outer IP header's ECN field is not
included). Hence modification of the ECN field by a network node has
no effect on IPsec's end-to-end security, because it cannot cause any
IPsec integrity check to fail. As a consequence, IPsec does not
provide any defense against an adversary's modification of the ECN
field (i.e., a man-in-the-middle attack), as the adversary's
modification will also have no effect on IPsec's end-to-end security.
In some environments, the ability to modify the ECN field without
affecting IPsec integrity checks may constitute a covert channel; if
it is necessary to eliminate such a channel or reduce its bandwidth,
then the outer IP header's ECN field can be zeroed at the tunnel
ingress and egress nodes.

The IPsec protocol currently requires that the inner header's ECN
field not be changed by IPsec decapsulation processing at a tunnel
egress node. This ensures that an adversary's modifications to the
ECN field cannot be used to launch theft- or denial-of-service
attacks across an IPsec tunnel endpoint, as any such modifications
will be discarded at the tunnel endpoint. This document makes no
change to that IPsec requirement. As a consequence of the current
specification of the IPsec protocol, we suggest that experiments with
ECN not be carried out for flows that will undergo IPsec tunneling at
the present time.

If the IPsec specifications are modified in the future to permit a
tunnel egress node to modify the ECN field in an inner IP header
based on the ECN field value in the outer header (e.g., copying part
or all of the outer ECN field to the inner ECN field), or to permit
the ECN field of the outer IP header to be zeroed during
encapsulation, then experiments with ECN may be used in combination
with IPsec tunneling.

This discussion of ECN and IPsec tunnel considerations draws heavily
on related discussions and documents from the Differentiated Services
Working Group.

9.3. Dropped or Corrupted Packets

An additional issue concerns a packet that has the CE bit set at one
router and is dropped by a subsequent router.
For the proposed use for ECN in this paper (that is, for a transport
protocol such as TCP for which a dropped data packet is an indication
of congestion), end nodes detect dropped data packets, and the
congestion response of the end nodes to a dropped data packet is at
least as strong as the congestion response to a received CE packet.

However, transport protocols such as TCP do not necessarily detect
all packet drops, such as the drop of a "pure" ACK packet; for
example, TCP does not reduce the arrival rate of subsequent ACK
packets in response to an earlier dropped ACK packet. Any proposal
for extending ECN-Capability to such packets would have to address
concerns raised by CE packets that were later dropped in the network.

Similarly, if a CE packet is dropped later in the network due to
corruption (bit errors), the end nodes should still invoke congestion
control, just as TCP would today in response to a dropped data
packet. This issue of corrupted CE packets would have to be
considered in any proposal for the network to distinguish between
packets dropped due to corruption, and packets dropped due to
congestion or buffer overflow.

10. A summary of related work.

[Floyd94] considers the advantages and drawbacks of adding ECN to the
TCP/IP architecture. As shown in the simulation-based comparisons,
one advantage of ECN is to avoid unnecessary packet drops for short
or delay-sensitive TCP connections. A second advantage of ECN is in
avoiding some unnecessary retransmit timeouts in TCP. This paper
discusses in detail the integration of ECN into TCP's congestion
control mechanisms. The possible disadvantages of ECN discussed in
the paper are that a non-compliant TCP connection could falsely
advertise itself as ECN-capable, and that a TCP ACK packet carrying
an ECN-Echo message could itself be dropped in the network.
The first of these two issues is discussed in Section 8 of this
document, and the second is addressed by the proposal in Section
5.1.3 for a CWR flag in the TCP header.

[CKLTZ97] reports on an experimental implementation of ECN in IPv6.
The experiments include an implementation of ECN in an existing
implementation of RED for FreeBSD. A number of experiments were run
to demonstrate the control of the average queue size in the router,
the performance of ECN for a single TCP connection at a congested
router, and fairness with multiple competing TCP connections. One
conclusion of the experiments is that dropping a packet from a
bulk-data transfer degrades performance much more severely than
marking a packet.

Because the experimental implementation in [CKLTZ97] predates some of
the developments in this document, the implementation does not
conform to this document in all respects. For example, in the
experimental implementation the CWR flag is not used, but instead the
TCP receiver sends the ECN-Echo bit on a single ACK packet.

[K98] and [CKLT98] build on [CKLTZ97] to further analyze the benefits
of ECN for TCP. The conclusions are that ECN TCP gets moderately
better throughput than non-ECN TCP; that ECN TCP flows are fair
towards non-ECN TCP flows; and that ECN TCP is robust with two-way
traffic, congestion in both directions, and with multiple congested
gateways. Experiments with many short web transfers show that, while
most of the short connections have similar transfer times with or
without ECN, a small percentage of the short connections have very
high transfer times for the non-ECN experiments as compared to the
ECN experiments.
This increased transfer time is particularly dramatic for those short
connections that have their first packet dropped in the non-ECN
experiments, and that therefore have to wait six seconds for the
retransmit timer to expire.

The ECN Web Page [ECN] has pointers to other implementations of ECN
in progress.

11. Conclusions

Given the current effort to implement RED, we believe this is the
right time for router vendors to examine how to implement congestion
avoidance mechanisms that do not depend on packet drops alone. With
the increased deployment of applications and transports sensitive to
the delay and loss of a single packet, depending on packet loss as a
normal congestion notification mechanism appears to be insufficient
(or at the very least, non-optimal).

12. Acknowledgements

Many people have made contributions to this internet-draft. In
particular, we would like to thank Kenjiro Cho for the proposal for
the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the
proposal of the CWR bit, Steve Blake for material on IPv4 Header
Checksum Recalculation, Jamal Hadi Salim for discussions of ECN
issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul
Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for
discussions of security issues. We also thank the Internet
End-to-End Research Group for ongoing discussions of these issues.

13. References

[AH] S. Kent and R. Atkinson, "IP Authentication Header", Internet
Draft, July 1998.

[CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
"Implementing Explicit Congestion Notification (ECN) in TCP over
IPv6", UCLA Technical Report, December 1997, URL
"http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz".

[CKLT98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
"Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the
L.A.
IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn-ietf.ps".

[ECN] "The ECN Web Page", URL
"http://www-nrg.ee.lbl.gov/floyd/ecn.html".

[ESP] S. Kent and R. Atkinson, "IP Encapsulating Security Payload",
Internet Draft, July 1998.

[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
N.4, August 1993, p. 397-413. URL
"ftp://ftp.ee.lbl.gov/papers/early.pdf".

[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".

[Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support
End-to-End Congestion Control", Technical report, February 1997. URL
"ftp://ftp.ee.lbl.gov/papers/collapse.ps".

[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
URL "http://www-mash.cs.berkeley.edu/ns/", test
tcl/test/test-all-ecn.

[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
benefits for TCP", Master's thesis, UCLA, 1998, URL
"http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
SIGCOMM '97, September 1997. URL
"http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".

[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
ACM SIGCOMM '88, pp. 314-329. URL
"ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".

[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
Algorithm", Message to end2end-interest mailing list, April 1990.
URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

[RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the
Internet Checksum", RFC 1141, January 1990.

[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

[RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

[RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D.
Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L.
Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang,
"Recommendations on Queue Management and Congestion Avoidance in the
Internet", RFC 2309, April 1998.

[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
Congestion Avoidance in Computer Networks", ACM Transactions on
Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

14. Security Considerations

Security considerations have been discussed in Section 9.

15. IPv4 Header Checksum Recalculation

IPv4 header checksum recalculation is an issue with some high-end
router architectures using an output-buffered switch, since most if
not all of the header manipulation is performed on the input side of
the switch, while the ECN decision would need to be made local to the
output buffer. This is not an issue for IPv6, since there is no IPv6
header checksum. The IPv4 TOS octet is the last byte of a 16-bit
half-word.

RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
checksum after the TTL field is decremented. The incremental
updating of the IPv4 checksum after the CE bit was set would work as
follows: Let HC be the original header checksum, and let HC' be the
new header checksum after the CE bit has been set. Then for header
checksums calculated with one's complement subtraction, HC' would be
recalculated as follows:

     HC' = { HC - 1     HC > 1
           { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC'
would be recalculated as follows after the CE bit was set:

     HC' = { HC - 1     HC > 0
           { 0xFFFE     HC = 0

16. The motivation for the ECT bit.

The need for the ECT bit is motivated by the fact that ECN will be
deployed incrementally in an Internet where some transport protocols
and routers understand ECN and some do not. With the ECT bit, the
router can drop packets from flows that are not ECN-capable, but can
*instead* set the CE bit in flows that *are* ECN-capable. Because
the ECT bit allows an end node to have the CE bit set in a packet
*instead* of having the packet dropped, an end node might have some
incentive to deploy ECN.

If there were no ECT indication, then the router would have to set
the CE bit for packets from both ECN-capable and non-ECN-capable
flows. In this case, there would be no incentive for end-nodes to
deploy ECN, and no viable path of incremental deployment from a
non-ECN world to an ECN-capable world. Consider the first stages of
such an incremental deployment, where a subset of the flows are
ECN-capable. At the onset of congestion, when the packet
dropping/marking rate would be low, routers would only set CE bits,
rather than dropping packets. However, only those flows that are
ECN-capable would understand and respond to CE packets. The result
is that the ECN-capable flows would back off, and the
non-ECN-capable flows would be unaware of the ECN signals and would
continue to open their congestion windows.

In this case, there are two possible outcomes: (1) the ECN-capable
flows back off, the non-ECN-capable flows get all of the bandwidth,
and congestion remains mild, or (2) the ECN-capable flows back off,
the non-ECN-capable flows don't, and congestion increases until the
router transitions from setting the CE bit to dropping packets.
While this second outcome evens out the fairness, the ECN-capable
flows would still receive little benefit from being ECN-capable,
because the increased congestion would drive the router to
packet-dropping behavior.

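As an illustration only, the per-packet choice that the ECT bit makes
possible can be sketched in Python. The sketch assumes that the
router's active queue management has already selected the packet for
a congestion indication and that the marking rate is low enough for
marking to be appropriate. The Packet type and the function name
notify_congestion are hypothetical and are not defined by this
document.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical view of the two ECN bits in an IP header."""
    ect: bool         # ECN-Capable Transport bit (default "0")
    ce: bool = False  # Congestion Experienced bit (default "0")


def notify_congestion(pkt: Packet) -> str:
    """Signal congestion for a packet already selected by the queue
    management: mark ECN-capable packets, drop all others."""
    if pkt.ect:
        pkt.ce = True   # set CE *instead* of dropping
        return "mark"
    return "drop"       # non-ECN-capable flows only understand drops


capable = Packet(ect=True)
legacy = Packet(ect=False)
print(notify_congestion(capable), capable.ce)  # mark True
print(notify_congestion(legacy), legacy.ce)    # drop False
```

Without an ECT indication the router would have no basis for the
branch on pkt.ect and would have to apply a single action to packets
from every flow.
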
A flow that advertises itself as ECN-Capable but does not respond to
CE bits is functionally equivalent to a flow that turns off
congestion control, as discussed in Sections 8 and 9.

Thus, in a world where a subset of the flows are ECN-capable, but
where ECN-capable flows have no mechanism for indicating that fact to
the routers, there would be less effective and less fair congestion
control in the Internet, resulting in a strong incentive for end
nodes not to deploy ECN.

17. Why use two bits in the IP header?

Given the need for an ECT indication in the IP header, there still
remains the question of whether the ECT (ECN-Capable Transport) and
CE (Congestion Experienced) indications should be overloaded on a
single bit. This overloaded-one-bit alternative, explored in
[Floyd94], would involve a single bit with two values. One value,
"ECT and not CE", would represent an ECN-Capable Transport, and the
other value, "CE or not ECT", would represent either Congestion
Experienced or a non-ECN-Capable transport.

There is only one inherent functional difference between the one-bit
and two-bit implementations. This functional difference concerns
packets that traverse multiple congested routers. Consider a CE
packet that arrives at a second congested router, and is selected by
the active queue management at that router for either marking or
dropping. In the one-bit implementation, the second congested router
has no choice but to drop the CE packet, because it cannot
distinguish between a CE packet and a non-ECT packet. In the two-bit
implementation, the second congested router has the choice of either
dropping the CE packet, or of leaving it alone with the CE bit set.

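The difference at a second congested router can be illustrated with a
small Python sketch. It lists the actions open to a router for a
packet that its active queue management has selected. The function
names and action labels are hypothetical, and the sketch assumes that
dropping is always permitted, since marking is only an alternative to
a drop.

```python
def one_bit_actions(ect_and_not_ce: bool) -> set:
    """One overloaded bit: True is "ECT and not CE",
    False is "CE or not ECT"."""
    if ect_and_not_ce:
        return {"drop", "mark"}
    # A CE packet and a non-ECT packet look identical here,
    # so the router can only drop.
    return {"drop"}


def two_bit_actions(ect: bool, ce: bool) -> set:
    """Separate ECT and CE bits."""
    if not ect:
        return {"drop"}
    if ce:
        # Already marked upstream: may be left alone with CE set.
        return {"drop", "keep_ce"}
    return {"drop", "mark"}


# A CE packet from an ECN-capable flow at a second congested router:
print(sorted(one_bit_actions(False)))                # ['drop']
print(sorted(two_bit_actions(ect=True, ce=True)))    # ['drop', 'keep_ce']
```
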
Another difference between the one-bit and two-bit implementations
comes from the fact that with the one-bit implementation, receivers
in a single flow cannot distinguish between CE and non-ECT packets.
Thus, in the one-bit implementation an ECN-capable data sender would
have to unambiguously indicate to the receiver or receivers whether
each packet had been sent as ECN-Capable or as non-ECN-Capable. One
possibility would be for the sender to indicate in the transport
header whether the packet was sent as ECN-Capable. A second
possibility that would involve a functional limitation for the
one-bit implementation would be for the sender to unambiguously
indicate that it was going to send *all* of its packets as
ECN-Capable or as non-ECN-Capable. For a multicast transport
protocol, this unambiguous indication would have to be apparent to
receivers joining an on-going multicast session.

Another advantage of the two-bit approach is that it is somewhat more
robust. The most critical issue, discussed in Section 8, is that the
default indication should be that of a non-ECN-Capable transport. In
a two-bit implementation, this requirement for the default value
simply means that the ECT bit should be "OFF" by default. In the
one-bit implementation, this means that the single overloaded bit
should by default be in the "CE or not ECT" position. This is less
clear and straightforward, and possibly more open to incorrect
implementations either in the end nodes or in the routers.

In summary, while the one-bit implementation could be a possible
implementation, it has the following significant limitations relative
to the two-bit implementation. First, the one-bit implementation has
more limited functionality for the treatment of CE packets at a
second congested router.
Second, the one-bit implementation requires either that extra
information be carried in the transport header of packets from
ECN-Capable flows (to convey the functionality of the second bit
elsewhere, namely in the transport header), or that senders in
ECN-Capable flows accept the limitation that receivers must be able
to determine a priori which packets are ECN-Capable and which are not
ECN-Capable. Third, the one-bit implementation is possibly more open
to errors from faulty implementations that choose the wrong default
value for the ECN bit. We believe that the use of the extra bit in
the IP header for the ECT-bit is extremely valuable to overcome these
limitations.

AUTHORS' ADDRESSES

K. K. Ramakrishnan
AT&T Labs. Research
Phone: +1 (973) 360-8766
Email: kkrama@research.att.com
URL: http://www.research.att.com/info/kkrama

Sally Floyd
Lawrence Berkeley National Laboratory
Phone: +1 (510) 486-7518
Email: floyd@ee.lbl.gov
URL: http://www-nrg.ee.lbl.gov/floyd/

This draft was created in September 1998.
It expires March 1999.