idnits 2.17.1 draft-kksjf-ecn-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 24 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 1998) is 9325 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 2001' is mentioned on line 293, but not defined ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581) == Missing Reference: 'RFC 1455' is mentioned on line 1044, but not defined ** Obsolete undefined reference: RFC 1455 (Obsoleted by RFC 2474) == Unused Reference: 'Floyd97' is defined on line 819, but no explicit reference was found in the text == Unused Reference: 'FRED' is defined on line 831, but no explicit reference was found in the text == Unused Reference: 'RFC1455' is defined on line 857, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ97' -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ98' -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN' -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd97' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98' -- Possible downref: Non-RFC (?) normative reference: ref. 'K98' -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90' -- Possible downref: Non-RFC (?) normative reference: ref. 
'MJV96' ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Downref: Normative reference to an Informational RFC: RFC 1141 ** Obsolete normative reference: RFC 1349 (Obsoleted by RFC 2474) ** Obsolete normative reference: RFC 1455 (Obsoleted by RFC 2474) ** Obsolete normative reference: RFC 2001 (Obsoleted by RFC 2581) ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567) -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90' Summary: 17 errors (**), 0 flaws (~~), 8 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force K. K. Ramakrishnan 2 INTERNET DRAFT AT&T Labs Research 3 draft-kksjf-ecn-03.txt Sally Floyd 4 LBNL 5 October 1998 6 Expires: April 1999 8 A Proposal to add Explicit Congestion Notification (ECN) to IP 10 Status of this Memo 12 This document is an Internet-Draft. Internet-Drafts are working 13 documents of the Internet Engineering Task Force (IETF), its areas, 14 and its working groups. Note that other groups may also distribute 15 working documents as Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six months 18 and may be updated, replaced, or obsoleted by other documents at any 19 time. It is inappropriate to use Internet- Drafts as reference 20 material or to cite them other than as "work in progress." 22 To view the entire list of current Internet-Drafts, please check the 23 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 24 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern 25 Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific 26 Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). 28 Abstract 30 This note describes a proposed addition of ECN (Explicit Congestion 31 Notification) to IP. 
TCP is currently the dominant transport 32 protocol used in the Internet. We begin by describing TCP's use of 33 packet drops as an indication of congestion. Next we argue that with 34 the addition of active queue management (e.g., RED) to the Internet 35 infrastructure, where routers detect congestion before the queue 36 overflows, routers are no longer limited to packet drops as an 37 indication of congestion. Routers could instead set a Congestion 38 Experienced (CE) bit in the packet header of packets from ECN-capable 39 transport protocols. We describe when the CE bit would be set in the 40 routers, and describe what modifications would be needed to TCP to 41 make it ECN-capable. Modifications to other transport protocols 42 (e.g., unreliable unicast or multicast, reliable multicast, other 43 reliable unicast transport protocols) could be considered as those 44 protocols are developed and advance through the standards process. 46 1. Conventions and Acronyms 48 The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, 49 SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this 50 document, are to be interpreted as described in [B97]. 52 2. Introduction 54 TCP's congestion control and avoidance algorithms are based on the 55 notion that the network is a black-box [Jacobson88, Jacobson90]. The 56 network's state of congestion or otherwise is determined by end- 57 systems probing for the network state, by gradually increasing the 58 load on the network (by increasing the window of packets that are 59 outstanding in the network) until the network becomes congested and a 60 packet is lost. Treating the network as a "black-box" and treating 61 loss as an indication of congestion in the network is appropriate for 62 pure best-effort data carried by TCP which has little or no 63 sensitivity to delay or loss of individual packets. 
In addition, 64 TCP's congestion management algorithms have techniques built-in (such 65 as Fast Retransmit and Fast Recovery) to minimize the impact of 66 losses from a throughput perspective. 68 However, these mechanisms are not intended to help applications that 69 are in fact sensitive to the delay or loss of one or more individual 70 packets. Interactive traffic such as telnet, web-browsing, and 71 transfer of audio and video data can be sensitive to packet losses 72 (using an unreliable data delivery transport such as UDP) or to the 73 increased latency of the packet caused by the need to retransmit the 74 packet after a loss (for reliable data delivery such as TCP). 76 Since TCP determines the appropriate congestion window to use by 77 gradually increasing the window size until it experiences a dropped 78 packet, this causes the queues at the bottleneck router to build up. 79 With most packet drop policies at the router that are not sensitive 80 to the load placed by each individual flow, this means that some of 81 the packets of latency-sensitive flows are going to be dropped. 82 Active queue management mechanisms detect congestion before the queue 83 overflows, and provide an indication of this congestion to the end 84 nodes. The advantages of active queue management are discussed in 85 RFC 2309 [RFC2309]. Active queue management avoids some of the bad 86 properties of dropping on queue overflow, including the undesirable 87 synchronization of loss across multiple flows. More importantly, 88 active queue management means that transport protocols with 89 congestion control (e.g., TCP) do not have to rely on buffer overflow 90 as the only indication of congestion. This can reduce unnecessary 91 queueing delay for all traffic sharing that queue. 93 Active queue management mechanisms may use one of several methods for 94 indicating congestion to end-nodes. One is to use packet drops, as is 95 currently done. 
However, active queue management allows the router to 96 separate policies of queueing or dropping packets from the policies 97 for indicating congestion. Thus, active queue management allows 98 routers to use the Congestion Experienced (CE) bit in a packet header 99 as an indication of congestion, instead of relying solely on packet 100 drops. 102 3. Assumptions and General Principles 104 In this section, we describe some of the important design principles 105 and assumptions that guided the design choices in this proposal. 107 (1) Congestion may persist over different time-scales. The time 108 scales that we are concerned with are congestion events that may last 109 longer than a round-trip time. 110 (2) The number of packets in an individual flow (e.g., TCP connection 111 or an exchange using UDP) may range from a small number of packets to 112 quite a large number. We are interested in managing the congestion 113 caused by flows that send enough packets so that they are still 114 active when network feedback reaches them. 115 (3) New mechanisms for congestion control and avoidance need to co- 116 exist and cooperate with existing mechanisms for congestion control. 117 In particular, new mechanisms have to co-exist with TCP's current 118 methods of adapting to congestion and with routers' current practice 119 of dropping packets in periods of congestion. 120 (4) Because ECN is likely to be adopted gradually, accommodating 121 migration is essential. Some routers may still only drop packets to 122 indicate congestion, and some end-systems may not be ECN-capable. The 123 most viable strategy is one that accommodates incremental deployment 124 without having to resort to "islands" of ECN-capable and non-ECN- 125 capable environments. 126 (5) Asymmetric routing is likely to be a normal occurrence in the 127 Internet. 
The path (sequence of links and routers) followed by data 128 packets may be different from the path followed by the acknowledgment 129 packets in the reverse direction. 130 (6) Many routers process the "regular" headers in IP packets more 131 efficiently than they process the header information in IP options. 132 This suggests keeping congestion experienced information in the 133 regular headers of an IP packet. 134 (7) It must be recognized that not all end-systems will cooperate in 135 mechanisms for congestion control. However, new mechanisms shouldn't 136 make it easier for TCP applications to disable TCP congestion 137 control. The benefit of lying about participating in new mechanisms 138 such as ECN-capability should be small. 140 4. Random Early Detection (RED) 142 Random Early Detection (RED) is a mechanism for active queue 143 management that has been proposed to detect incipient congestion 144 [FJ93], and is currently being deployed in the Internet backbone 145 [RFC2309]. Although RED is meant to be a general mechanism using one 146 of several alternatives for congestion indication, in the current 147 environment of the Internet RED is restricted to using packet drops 148 as a mechanism for congestion indication. RED drops packets based on 149 the average queue length exceeding a threshold, rather than only when 150 the queue overflows. However, when RED drops packets before the 151 queue actually overflows, RED is not forced by memory limitations to 152 discard the packet. 154 RED could set a Congestion Experienced (CE) bit in the packet header 155 instead of dropping the packet, if such a bit was provided in the IP 156 header and understood by the transport protocol. The use of the CE 157 bit would allow the receiver(s) to receive the packet, avoiding the 158 potential for excessive delays due to retransmissions after packet 159 losses. We use the term 'CE packet' to denote a packet that has the 160 CE bit set. 162 5. 
Explicit Congestion Notification in IP 164 We propose that the Internet provide a congestion indication for 165 incipient congestion (as in RED and earlier work [RJ90]) where the 166 notification can sometimes be through marking packets rather than 167 dropping them. This would require an ECN field in the IP header with 168 two bits. The ECN-Capable Transport (ECT) bit would be set by the 169 data sender to indicate that the end-points of the transport protocol 170 are ECN-capable. The CE bit would be set by the router to indicate 171 congestion to the end nodes. Routers that have a packet arriving at 172 a full queue would drop the packet, just as they do now. 174 Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. 175 Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE 176 bit. The IPv4 TOS octet corresponds to the Traffic Class octet in 177 IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6 178 Traffic Class octet are intended to be superseded by the DS 179 (Differentiated Services) Field [RFC-DIFFSERV?]. Bits 6 and 7 are 180 listed in [RFC-DIFFSERV?] as Currently Unused. Section 19 gives a 181 brief history of the TOS octet. 183 Because of the unstable history of the TOS octet, the use of the ECN 184 field as specified in this document cannot be guaranteed to be 185 backwards compatible with all past uses of these two bits. The 186 potential dangers of this lack of backwards compatibility are 187 discussed in Section 19. 189 Upon the receipt by an ECN-Capable transport of a single CE packet, 190 the congestion control algorithms followed at the end-systems MUST be 191 essentially the same as the congestion control response to a *single* 192 dropped packet. For example, for ECN-Capable TCP the source TCP is 193 required to halve its congestion window for any window of data 194 containing either a packet drop or an ECN indication. 
However, we 195 would like to point out some notable exceptions in the reaction of 196 the source TCP, related to following the shorter-time-scale details 197 of particular implementations of TCP. For TCP's response to an ECN 198 indication, we do not recommend such behavior as the slow-start of 199 Tahoe TCP in response to a packet drop, or Reno TCP's wait of roughly 200 half a round-trip time during Fast Recovery. 202 One reason for requiring that the congestion-control response to the 203 CE packet be essentially the same as the response to a dropped packet 204 is to accommodate the incremental deployment of ECN in both end- 205 systems and in routers. Some routers may drop ECN-Capable packets 206 (e.g., using the same RED policies for congestion detection) while 207 other routers set the CE bit, for equivalent levels of congestion. 208 Similarly, a router might drop a non-ECN-Capable packet but set the 209 CE bit in an ECN-Capable packet, for equivalent levels of congestion. 210 Different congestion control responses to a CE bit indication and to 211 a packet drop could result in unfair treatment for different flows. 213 An additional requirement is that the end-systems should react to 214 congestion at most once per window of data (i.e., at most once per 215 roundtrip time), to avoid reacting multiple times to multiple 216 indications of congestion within a roundtrip time. 218 For a router, the CE bit of an ECN-Capable packet should only be set 219 if the router would otherwise have dropped the packet as an 220 indication of congestion to the end nodes. When the router's buffer 221 is not yet full and the router is prepared to drop a packet to inform 222 end nodes of incipient congestion, the router should first check to 223 see if the ECT bit is set in that packet's IP header. If so, then 224 instead of dropping the packet, the router MAY instead set the CE bit 225 in the IP header. 
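The router rule above can be illustrated with a short sketch. This is only an illustration, not text from the proposal: the function name, the two boolean inputs, and the choice to always mark ECT packets (the proposal says the router MAY) are assumptions made for the example. The masks assume the RFC 791 convention that bit 0 of the TOS octet is the most significant bit, so bit 6 (ECT) is 0x02 and bit 7 (CE) is 0x01.

```python
# Hypothetical sketch of the per-packet router decision described above.
# Assumes bit 0 of the TOS octet is the most significant bit (RFC 791 style).
ECT_MASK = 0x02  # bit 6: ECN-Capable Transport
CE_MASK = 0x01   # bit 7: Congestion Experienced


def handle_arrival(tos, queue_full, aqm_wants_to_signal):
    """Return (action, tos) for one arriving packet.

    action is "drop" or "forward". aqm_wants_to_signal stands in for
    whatever active queue management test (e.g., RED) the router runs.
    """
    if queue_full:
        # A full queue leaves no choice: drop, ECN-capable or not.
        return "drop", tos
    if aqm_wants_to_signal:
        if tos & ECT_MASK:
            # Assumed policy: mark instead of dropping (the router MAY do so).
            return "forward", tos | CE_MASK
        # Non-ECN-capable packet: the drop is the congestion indication.
        return "drop", tos
    # No congestion signal needed; an existing CE bit is left unchanged.
    return "forward", tos
```

For example, `handle_arrival(0x02, False, True)` forwards the packet with both ECT and CE set, while the same call with `tos=0x00` drops it.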
227 An environment where all end nodes were ECN-Capable could allow new 228 criteria to be developed for setting the CE bit, and new congestion 229 control mechanisms for end-node reaction to CE packets. However, 230 this is a research issue, and as such is not addressed in this 231 document. 233 When a CE packet is received by a router, the CE bit is left 234 unchanged, and the packet transmitted as usual. When severe 235 congestion has occurred and the router's queue is full, then the 236 router has no choice but to drop some packet when a new packet 237 arrives. We anticipate that such packet losses will become 238 relatively infrequent when a majority of end-systems become ECN- 239 Capable and participate in TCP or other compatible congestion control 240 mechanisms. In an adequately-provisioned network in such an ECN- 241 Capable environment, packet losses should occur primarily during 242 transients or in the presence of non-cooperating sources. 244 We expect that routers will set the CE bit in response to incipient 245 congestion as indicated by the average queue size, using the RED 246 algorithms suggested in [FJ93, RFC2309]. To the best of our 247 knowledge, this is the only proposal currently under discussion in 248 the IETF for routers to drop packets proactively, before the buffer 249 overflows. However, this document does not attempt to specify a 250 particular mechanism for active queue management, leaving that 251 endeavor, if needed, to other areas of the IETF. While ECN is 252 inextricably tied up with active queue management at the router, the 253 reverse does not hold; active queue management mechanisms have been 254 developed and deployed independently from ECN, using packet drops as 255 indications of congestion in the absence of ECN in the IP 256 architecture. 258 6. Support from the Transport Protocol 260 ECN requires support from the transport protocol, in addition to the 261 functionality given by the ECN field in the IP packet header. 
First, the 262 transport protocol might require negotiation between the endpoints 263 during setup to determine that all of the endpoints are ECN-capable, 264 so that the sender can set the ECT bit in transmitted packets. 265 Second, the transport protocol must be capable of reacting 266 appropriately to the receipt of CE packets. This reaction could be 267 in the form of the data receiver informing the data sender of the 268 received CE packet (e.g., TCP), of the data receiver unsubscribing to 269 a layered multicast group (e.g., RLM [MJV96]), or of some other 270 action that ultimately reduces the arrival rate of that flow to that 271 receiver. 273 This document only addresses the addition of ECN Capability to TCP, 274 leaving issues of ECN and other transport protocols to further 275 research. For TCP, ECN requires three new mechanisms: negotiation 276 between the endpoints during setup to determine if they are both 277 ECN-capable; an ECN-Echo flag in the TCP header so that the data 278 receiver can inform the data sender when a CE packet has been 279 received; and a Congestion Window Reduced (CWR) flag in the TCP 280 header so that the data sender can inform the data receiver that the 281 congestion window has been reduced. The support required from other 282 transport protocols is likely to be different, particularly for 283 unreliable or reliable multicast transport protocols, and will have 284 to be determined as other transport protocols are brought to the IETF 285 for standardization. 287 6.1. TCP 289 The following sections describe in detail the proposed use of ECN in 290 TCP. This proposal is described in essentially the same form in 291 [Floyd94]. We assume that the source TCP uses the standard congestion 292 control algorithms of Slow-start, Fast Retransmit and Fast Recovery 293 [RFC 2001]. 295 This proposal specifies two new flags in the Reserved field of the 296 TCP header. 
The TCP mechanism for negotiating ECN-Capability uses 297 the ECN-Echo flag in the TCP header. (This was called the ECN Notify 298 flag in some earlier documents.) Bit 9 in the Reserved field of the 299 TCP header is designated as the ECN-Echo flag. The location of the 300 6-bit Reserved field in the TCP header is shown in Figure 3 of RFC 301 793 [RFC793]. 303 To enable the TCP receiver to determine when to stop setting the 304 ECN-Echo flag, we introduce a second new flag in the TCP header, the 305 Congestion Window Reduced (CWR) flag. The CWR flag is assigned to 306 Bit 8 in the Reserved field of the TCP header. 308 The use of these flags is described in the sections below. 310 6.1.1. TCP Initialization 312 In the TCP connection setup phase, the source and destination TCPs 313 exchange information about their desire and/or capability to use ECN. 314 Subsequent to the completion of this negotiation, the TCP sender sets 315 the ECT bit in the IP header of data packets to indicate to the 316 network that the transport is capable and willing to participate in 317 ECN for this packet. This will indicate to the routers that they may 318 mark this packet with the CE bit, if they would like to use that as a 319 method of congestion notification. If the TCP connection does not 320 wish to use ECN notification for a particular packet, the sending TCP 321 sets the ECT bit equal to 0 (i.e., not set), and the TCP receiver 322 ignores the CE bit in the received packet. 324 When a node sends a TCP SYN packet, it may set the ECN-Echo and CWR 325 flags in the TCP header. For a SYN packet, the setting of both the 326 ECN-Echo and CWR flags is defined as an indication that the sending 327 TCP is ECN-Capable, rather than as an indication of congestion or of 328 response to congestion. 
More precisely, a SYN packet with both the 329 ECN-Echo and CWR flags set indicates that the TCP implementation 330 transmitting the SYN packet will participate in ECN as both a sender 331 and receiver. As a receiver, it will respond to incoming data 332 packets that have the CE bit set in the IP header by setting the 333 ECN-Echo flag in outgoing TCP Acknowledgement (ACK) packets. As a 334 sender, it will respond to incoming packets that have the ECN-Echo 335 flag set by reducing the congestion window when appropriate. 337 When a node sends a SYN-ACK packet, it may set the ECN-Echo flag, but 338 it does not set the CWR flag. For a SYN-ACK packet, the pattern of 339 the ECN-Echo flag set and the CWR flag not set in the TCP header is 340 defined as an indication that the TCP transmitting the SYN-ACK packet 341 is ECN-Capable. 343 There is the question of why we chose to have the TCP sending the SYN 344 set two ECN-related flags in the Reserved field of the TCP header for 345 the SYN packet, while the responding TCP sending the SYN-ACK sets 346 only one ECN-related flag in the SYN-ACK packet. This asymmetry is 347 necessary for the robust negotiation of ECN-capability with deployed 348 TCP implementations. There exists at least one TCP implementation in 349 which TCP receivers set the Reserved field of the TCP header in ACK 350 packets (and hence the SYN-ACK) simply to reflect the Reserved field 351 of the TCP header in the received data packet. Because the TCP SYN 352 packet sets the ECN-Echo and CWR flags to indicate ECN-capability, 353 while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP 354 correctly interprets a receiver's reflection of its own flags in the 355 Reserved field as an indication that the receiver is not ECN-capable. 357 6.1.2. The TCP Sender 359 For a TCP connection using ECN, data packets are transmitted with the 360 ECT bit set in the IP header (set to a "1"). 
If the sender receives 361 an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag 362 set in the TCP header), then the sender knows that congestion was 363 encountered in the network on the path from the sender to the 364 receiver. The indication of congestion should be treated just as a 365 congestion loss in non-ECN-Capable TCP. That is, the TCP source 366 halves the congestion window "cwnd" and reduces the slow start 367 threshold "ssthresh". The sending TCP does NOT increase the 368 congestion window in response to the receipt of an ECN-Echo ACK 369 packet. 371 A critical condition is that TCP does not react to congestion 372 indications more than once every window of data (or more loosely, 373 more than once every round-trip time). That is, the TCP sender's 374 congestion window should be reduced only once in response to a series 375 of dropped and/or CE packets from a single window of data. In 376 addition, the TCP source should not decrease the slow-start 377 threshold, ssthresh, if it has been decreased within the last round 378 trip time. However, if any retransmitted packets are dropped or have 379 the CE bit set, then this is interpreted by the source TCP as a new 380 instance of congestion. 382 After the source TCP reduces its congestion window in response to a 383 CE packet, incoming acknowledgements that continue to arrive can 384 "clock out" outgoing packets as allowed by the reduced congestion 385 window. If the congestion window consists of only one MSS (maximum 386 segment size), and the sending TCP receives an ECN-Echo ACK packet, 387 then the sending TCP should in principle still cut its congestion 388 window in half. However, the value of the congestion window is 389 bounded below by a value of one MSS. If the sending TCP were to 390 continue to send, using a congestion window of 1 MSS, this results in 391 the transmission of one packet per round-trip time. 
We believe it is 392 desirable to still reduce the sending rate of the TCP sender even 393 further, on receipt of an ECN-Echo packet when the congestion window 394 is one. We use the retransmit timer as a means to reduce the rate 395 further in this circumstance. Therefore, the sending TCP should also 396 reset the retransmit timer on receiving the ECN-Echo packet when the 397 congestion window is one. The sending TCP will then be able to send 398 a new packet when the retransmit timer expires. 400 [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] 401 discusses the validation test in the ns simulator, which illustrates 402 a wide range of ECN scenarios. These scenarios include the following: 403 an ECN followed by another ECN, a Fast Retransmit, or a Retransmit 404 Timeout; a Retransmit Timeout or a Fast Retransmit followed by an 405 ECN, and a congestion window of one packet followed by an ECN. 407 TCP follows existing algorithms for sending data packets in response 408 to incoming ACKs, multiple duplicate acknowledgements, or retransmit 409 timeouts [RFC2001]. 411 6.1.3. The TCP Receiver 413 When TCP receives a CE data packet at the destination end-system, the 414 TCP data receiver sets the ECN-Echo flag in the TCP header of the 415 subsequent ACK packet. If there is any ACK withholding implemented, 416 as in current "delayed-ACK" TCP implementations where the TCP 417 receiver can send an ACK for two arriving data packets, then the 418 ECN-Echo flag in the ACK packet will be set to the OR of the CE bits 419 of all of the data packets being acknowledged. That is, if any of 420 the received data packets are CE packets, then the returning ACK has 421 the ECN-Echo flag set. 423 To provide robustness against the possibility of a dropped ACK packet 424 carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo 425 flag in a series of ACK packets. The TCP receiver uses the CWR flag 426 to determine when to stop setting the ECN-Echo flag. 
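The receiver rules above (echo the OR of the CE bits covered by each ACK, and keep setting ECN-Echo until a CWR packet arrives) can be sketched as a small state machine. This is a hedged illustration rather than a definitive implementation: the class and method names are hypothetical, and the handling of a packet that carries both CWR and CE (CWR ends the old run, CE starts a new one) is an assumption of the sketch.

```python
# Hypothetical sketch of the TCP data receiver's ECN-Echo logic.
class EcnReceiver:
    def __init__(self):
        # True from the arrival of a CE packet until a CWR packet arrives.
        self.echoing = False
        # OR of the CE bits of data packets not yet covered by an ACK.
        self.ce_since_ack = False

    def on_data(self, ce, cwr):
        """Record one arriving data packet with its CE bit and CWR flag."""
        if cwr:
            # The sender reports a window reduction: stop the current run.
            self.echoing = False
        if ce:
            # Assumption: CE on the same packet starts a fresh run.
            self.echoing = True
            self.ce_since_ack = True

    def build_ack(self):
        """Return the ECN-Echo flag for the next (possibly delayed) ACK."""
        echo = self.echoing or self.ce_since_ack
        self.ce_since_ack = False
        return echo
```

With delayed ACKs, calling `on_data` for two packets before `build_ack` yields an ACK whose ECN-Echo flag is set if either packet was a CE packet.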
428 When an ECN-Capable TCP reduces its congestion window for any reason 429 (because of a retransmit timeout, a Fast Retransmit, or in response 430 to an ECN Notification), the TCP sets the CWR flag in the TCP header 431 of the first data packet sent after the window reduction. If that 432 data packet is dropped in the network, then the sending TCP will have 433 to reduce the congestion window again and retransmit the dropped 434 packet. Thus, the Congestion Window Reduced message is reliably 435 delivered to the data receiver. 437 After a TCP receiver sends an ACK packet with the ECN-Echo flag set, 438 that TCP receiver continues to set the ECN-Echo flag in ACK packets 439 until it receives a CWR packet (a packet with the CWR flag set). 440 After the receipt of the CWR packet, acknowledgements for subsequent 441 non-CE data packets do not have the ECN-Echo flag set. If another CE 442 packet is received by the data receiver, the receiver would once 443 again send ACK packets with the ECN-Echo flag set. While the receipt 444 of a CWR packet does not guarantee that the data sender received the 445 ECN-Echo message, this does indicate that the data sender reduced its 446 congestion window at some point *after* it sent the data packet for 447 which the CE bit was set. 449 We have already specified that a TCP sender reduces its congestion 450 window at most once per window of data. This mechanism requires some 451 care to make sure that the sender reduces its congestion window at 452 most once per ECN indication, and that multiple ECN messages over 453 several successive windows of data are properly reported to the ECN 454 sender. This is discussed further in [Floyd98]. 456 6.1.4. Congestion on the ACK-path 458 For the current generation of TCP congestion control algorithms, pure 459 acknowledgement packets (i.e., packets that do not contain any 460 accompanying data) should be sent with the ECT bit off. 
Current TCP 461 receivers have no mechanisms for reducing traffic on the ACK-path in 462 response to congestion notification. Mechanisms for responding to 463 congestion on the ACK-path are areas for current and future research. 464 (One simple possibility would be for the sender to reduce its 465 congestion window when it receives a pure ACK packet with the CE bit 466 set). For current TCP implementations, a single dropped ACK generally 467 has only a very small effect on the TCP's sending rate. 469 7. Summary of changes required in IP and TCP 471 Two bits need to be specified in the IP header, the ECN-Capable 472 Transport (ECT) bit and the Congestion Experienced (CE) bit. The ECT 473 bit set to "0" indicates that the transport protocol will ignore the 474 CE bit. This is the default value for the ECT bit. The ECT bit set 475 to "1" indicates that the transport protocol is willing and able to 476 participate in ECN. 478 The default value for the CE bit is "0". The router sets the CE bit 479 to "1" to indicate congestion to the end nodes. The CE bit in a 480 packet header should never be reset by a router from "1" to "0". 482 TCP requires three changes, a negotiation phase during setup to 483 determine if both end nodes are ECN-capable, and two new flags in the 484 TCP header, from the "reserved" flags in the TCP flags field. The 485 ECN-Echo flag is used by the data receiver to inform the data sender 486 of a received CE packet. The Congestion Window Reduced flag is used 487 by the data sender to inform the data receiver that the congestion 488 window has been reduced. 490 8. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN 492 Since the ATM and Frame Relay mechanisms for congestion indication 493 have typically been defined without any notion of average queue size 494 as the basis for determining that an intermediate node is congested, 495 we believe that they provide a very noisy signal. 
The TCP-sender 496 reaction specified in this draft for ECN is NOT the appropriate 497 reaction for such a noisy signal of congestion notification. It is 498 our expectation that ATM's EFCI and Frame Relay's FECN mechanisms 499 would be phased out over time within the ATM network. However, if 500 the routers that interface to the ATM network have a way of 501 maintaining the average queue at the interface, and use it to come to 502 a reliable determination that the ATM subnet is congested, they may 503 use the ECN notification that is defined here. 505 We emphasize that a *single* packet with the CE bit set in an IP 506 packet causes the transport layer to respond, in terms of congestion 507 control, as it would to a packet drop. As such, the CE bit is not a 508 good match to a transient signal such as one based on the 509 instantaneous queue size. However, experiments in techniques at 510 layer 2 (e.g., in ATM switches or Frame Relay switches) should be 511 encouraged. For example, using a scheme such as RED (where packet 512 marking is based on the average queue length exceeding a threshold), 513 layer 2 devices could provide a reasonably reliable indication of 514 congestion. When all the layer 2 devices in a path set that layer's 515 own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN 516 bit in Frame Relay) in this reliable manner, then the interface 517 router to the layer 2 network could copy the state of that layer 2 518 Congestion Experienced bit into the CE bit in the IP header. We 519 recognize that this is not the current practice, nor is it in current 520 standards. However, encouraging experimentation in this manner may 521 provide the information needed to enable evolution of existing layer 522 2 mechanisms to provide a more reliable means of congestion 523 indication, when they use a single bit for indicating congestion. 525 9. 
Non-compliance by the End Nodes 527 This section discusses concerns about the vulnerability of ECN to 528 non-compliant end-nodes (i.e., end nodes that set the ECT bit in 529 transmitted packets but do not respond to received CE packets). We 530 argue that the addition of ECN to the IP architecture would not 531 significantly increase the current vulnerability of the architecture 532 to unresponsive flows. 534 Even for non-ECN environments, there are serious concerns about the 535 damage that can be done by non-compliant or unresponsive flows (that 536 is, flows that do not respond to congestion control indications by 537 reducing their arrival rate at the congested link). For example, an 538 end-node could "turn off congestion control" by not reducing its 539 congestion window in response to packet drops. This is a concern for 540 the current Internet. It has been argued that routers will have to 541 deploy mechanisms to detect and differentially treat packets from 542 non-compliant flows. It has also been argued that techniques such as 543 end-to-end per-flow scheduling and isolation of one flow from 544 another, differentiated services, or end-to-end reservations could 545 remove some of the more damaging effects of unresponsive flows. 547 It has been argued that dropping packets in itself may be an adequate 548 deterrent for non-compliance, and that the use of ECN removes this 549 deterrent. We would argue in response that (1) ECN-capable routers 550 preserve packet-dropping behavior in times of high congestion; and 551 (2) even in times of high congestion, dropping packets in itself is 552 not an adequate deterrent for non-compliance. 554 First, ECN-Capable routers will only mark packets (as opposed to 555 dropping them) when the packet marking rate is reasonably low. 
During 556 periods where the average queue size exceeds an upper threshold, and 557 therefore the potential packet marking rate would be high, our 558 recommendation is that routers drop packets rather than set the CE 559 bit in packet headers. 561 During the periods of low or moderate packet marking rates when ECN 562 would be deployed, there would be little deterrent effect on 563 unresponsive flows of dropping rather than marking those packets. For 564 example, delay-insensitive flows using reliable delivery might have 565 an incentive to increase rather than to decrease their sending rate 566 in the presence of dropped packets. Similarly, delay-sensitive flows 567 using unreliable delivery might increase their use of FEC in response 568 to an increased packet drop rate, increasing rather than decreasing 569 their sending rate. For the same reasons, we do not believe that 570 packet dropping itself is an effective deterrent for non-compliance 571 even in an environment of high packet drop rates. 573 Several methods have been proposed to identify and restrict non- 574 compliant or unresponsive flows. The addition of ECN to the network 575 environment would not in any way increase the difficulty of designing 576 and deploying such mechanisms. If anything, the addition of ECN to 577 the architecture would make the job of identifying unresponsive flows 578 slightly easier. For example, in an ECN-Capable environment routers 579 are not limited to information about packets that are dropped or have 580 the CE bit set at that router itself; in such an environment routers 581 could also take note of arriving CE packets that indicate congestion 582 encountered by that packet earlier in the path. 584 10. Non-compliance in the Network 586 The breakdown of effective congestion control could be caused not 587 only by a non-compliant end-node, but also by the loss of the 588 congestion indication in the network itself.
This could happen 589 through a rogue or broken router that set the ECT bit in a packet 590 from a non-ECN-capable transport, or "erased" the CE bit in arriving 591 packets. As one example, a rogue or broken router that "erased" the 592 CE bit in arriving CE packets would prevent that indication of 593 congestion from reaching downstream receivers. This could result in 594 the failure of congestion control for that flow and a resulting 595 increase in congestion in the network, ultimately resulting in 596 subsequent packets dropped for this flow as the average queue size 597 increased at the congested gateway. 599 The actions of a rogue or broken router could also result in an 600 unnecessary indication of congestion to the end-nodes. These actions 601 can include a router dropping a packet or setting the CE bit in the 602 absence of congestion. From a congestion control point of view, 603 setting the CE bit in the absence of congestion by a non-compliant 604 router would be no different than a router dropping a packet 605 unnecessarily. By "erasing" the ECT bit of a packet that is later 606 dropped in the network, a router's actions could result in an 607 unnecessary packet drop for that packet later in the network. 609 Concerns regarding the loss of congestion indications from 610 encapsulated, dropped, or corrupted packets are discussed below. 612 10.1. Encapsulated packets 614 Some care is required to handle the CE and ECT bits appropriately 615 when packets are encapsulated and de-encapsulated for tunnels. 617 When a packet is encapsulated, the following rules apply regarding 618 the ECT bit. First, if the ECT bit in the encapsulated ('inside') 619 header is a 0, then the ECT bit in the encapsulating ('outside') 620 header MUST be a 0. If the ECT bit in the inside header is a 1, then 621 the ECT bit in the outside header SHOULD be a 1. 623 When a packet is de-encapsulated, the following rules apply regarding 624 the CE bit.
If the ECT bit is a 1 in both the inside and the outside 625 header, then the CE bit in the outside header MUST be ORed with the 626 CE bit in the inside header. (That is, in this case a CE bit of 1 in 627 the outside header must be copied to the inside header.) If the ECT 628 bit in either header is a 0, then the CE bit in the outside header is 629 ignored. This requirement for the treatment of de-encapsulated 630 packets does not currently apply to IPsec tunnels. 632 A specific example of the use of ECN with encapsulation occurs when a 633 flow wishes to use ECN-capability to avoid the danger of an 634 unnecessary packet drop for the encapsulated packet as a result of 635 congestion at an intermediate node in the tunnel. This functionality 636 can be supported by copying the ECN field in the inner IP header to 637 the outer IP header upon encapsulation, and using the ECN field in 638 the outer IP header to set the ECN field in the inner IP header upon 639 decapsulation. This effectively allows routers along the tunnel to 640 cause the CE bit to be set in the ECN field of the unencapsulated IP 641 header of an ECN-capable packet when such routers experience 642 congestion. 644 10.2. IPsec Tunnel Considerations 646 The IPsec protocol, as defined in [RFC-ESP?, RFC-AH?], does not 647 include the IP header's ECN field in any of its cryptographic 648 calculations (in the case of tunnel mode, the outer IP header's ECN 649 field is not included). Hence modification of the ECN field by a 650 network node has no effect on IPsec's end-to-end security, because it 651 cannot cause any IPsec integrity check to fail. As a consequence, 652 IPsec does not provide any defense against an adversary's 653 modification of the ECN field (i.e., a man-in-the-middle attack), as 654 the adversary's modification will also have no effect on IPsec's 655 end-to-end security. 
In some environments, the ability to modify the 656 ECN field without affecting IPsec integrity checks may constitute a 657 covert channel; if it is necessary to eliminate such a channel or 658 reduce its bandwidth, then the outer IP header's ECN field can be 659 zeroed at the tunnel ingress and egress nodes. 661 The IPsec protocol currently requires that the inner header's ECN 662 field not be changed by IPsec decapsulation processing at a tunnel 663 egress node. This ensures that an adversary's modifications to the 664 ECN field cannot be used to launch theft- or denial-of-service 665 attacks across an IPsec tunnel endpoint, as any such modifications 666 will be discarded at the tunnel endpoint. This document makes no 667 change to that IPsec requirement. As a consequence of the current 668 specification of the IPsec protocol, we suggest that experiments with 669 ECN not be carried out for flows that will undergo IPsec tunneling at 670 the present time. 672 If the IPsec specifications are modified in the future to permit a 673 tunnel egress node to modify the ECN field in an inner IP header 674 based on the ECN field value in the outer header (e.g., copying part 675 or all of the outer ECN field to the inner ECN field), or to permit 676 the ECN field of the outer IP header to be zeroed during 677 encapsulation, then experiments with ECN may be used in combination 678 with IPsec tunneling. 680 This discussion of ECN and IPsec tunnel considerations draws heavily 681 on related discussions and documents from the Differentiated Services 682 Working Group. 684 10.3. Dropped or Corrupted Packets 686 An additional issue concerns a packet that has the CE bit set at one 687 router and is dropped by a subsequent router. 
For the proposed use 688 for ECN in this paper (that is, for a transport protocol such as TCP 689 for which a dropped data packet is an indication of congestion), end 690 nodes detect dropped data packets, and the congestion response of the 691 end nodes to a dropped data packet is at least as strong as the 692 congestion response to a received CE packet. 694 However, transport protocols such as TCP do not necessarily detect 695 all packet drops, such as the drop of a "pure" ACK packet; for 696 example, TCP does not reduce the arrival rate of subsequent ACK 697 packets in response to an earlier dropped ACK packet. Any proposal 698 for extending ECN-Capability to such packets would have to address 699 concerns raised by CE packets that were later dropped in the network. 701 Similarly, if a CE packet is dropped later in the network due to 702 corruption (bit errors), the end nodes should still invoke congestion 703 control, just as TCP would today in response to a dropped data 704 packet. This issue of corrupted CE packets would have to be 705 considered in any proposal for the network to distinguish between 706 packets dropped due to corruption, and packets dropped due to 707 congestion or buffer overflow. 709 11. A summary of related work. 711 [Floyd94] considers the advantages and drawbacks of adding ECN to the 712 TCP/IP architecture. As shown in the simulation-based comparisons, 713 one advantage of ECN is to avoid unnecessary packet drops for short 714 or delay-sensitive TCP connections. A second advantage of ECN is in 715 avoiding some unnecessary retransmit timeouts in TCP. This paper 716 discusses in detail the integration of ECN into TCP's congestion 717 control mechanisms. The possible disadvantages of ECN discussed in 718 the paper are that a non-compliant TCP connection could falsely 719 advertise itself as ECN-capable, and that a TCP ACK packet carrying 720 an ECN-Echo message could itself be dropped in the network. 
The 721 first of these two issues is discussed in Section 9 of this document, 722 and the second is addressed by the proposal in Section 6.1.3 for a 723 CWR flag in the TCP header. 725 [CKLTZ97] reports on an experimental implementation of ECN in IPv6. 726 The experiments include an implementation of ECN in an existing 727 implementation of RED for FreeBSD. A number of experiments were run 728 to demonstrate the control of the average queue size in the router, 729 the performance of ECN for a single TCP connection at a congested 730 router, and fairness with multiple competing TCP connections. One 731 conclusion of the experiments is that dropping packets from a bulk- 732 data transfer can degrade performance much more severely than marking 733 packets. 735 Because the experimental implementation in [CKLTZ97] predates some of 736 the developments in this document, the implementation does not 737 conform to this document in all respects. For example, in the 738 experimental implementation the CWR flag is not used, but instead the 739 TCP receiver sends the ECN-Echo bit on a single ACK packet. 741 [K98] and [CKLTZ98] build on [CKLTZ97] to further analyze the 742 benefits of ECN for TCP. The conclusions are that ECN TCP gets 743 moderately better throughput than non-ECN TCP; that ECN TCP flows are 744 fair towards non-ECN TCP flows; and that ECN TCP is robust with two- 745 way traffic, congestion in both directions, and with multiple 746 congested gateways. Experiments with many short web transfers show 747 that, while most of the short connections have similar transfer times 748 with or without ECN, a small percentage of the short connections have 749 very long transfer times for the non-ECN experiments as compared to 750 the ECN experiments.
This increased transfer time is particularly 751 dramatic for those short connections that have their first packet 752 dropped in the non-ECN experiments, and that therefore have to wait 753 six seconds for the retransmit timer to expire. 755 The ECN Web Page [ECN] has pointers to other implementations of ECN 756 in progress. 758 12. Conclusions 760 Given the current effort to implement RED, we believe this is the 761 right time for router vendors to examine how to implement congestion 762 avoidance mechanisms that do not depend on packet drops alone. With 763 the increased deployment of applications and transports sensitive to 764 the delay and loss of a single packet (e.g., realtime traffic, short 765 web transfers), depending on packet loss as a normal congestion 766 notification mechanism appears to be insufficient (or at the very 767 least, non-optimal). 769 13. Acknowledgements 771 Many people have made contributions to this internet-draft. In 772 particular, we would like to thank Kenjiro Cho for the proposal for 773 the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the 774 proposal of the CWR bit, Steve Blake for material on IPv4 Header 775 Checksum Recalculation, Jamal Hadi Salim for discussions of ECN 776 issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul 777 Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for 778 discussions of security issues. We also thank the Internet End-to- 779 End Research Group for ongoing discussions of these issues. 781 14. References 783 [RFC-AH?] S. Kent and R. Atkinson, "IP Authentication Header", 784 Internet Draft , July 1998. 786 [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement 787 Levels", BCP 14, RFC 2119, March 1997. 
789 [CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L., 790 "Implementing Explicit Congestion Notification (ECN) in TCP over 791 IPv6", UCLA Technical Report, December 1997, URL 792 "http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz". 794 [CKLTZ98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L., 795 "Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the 796 L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn- 797 ietf.ps". 799 [RFC-DIFFSERV?] Kathleen Nichols, Steven Blake, Fred Baker, and David 800 L. Black, "Definition of the Differentiated Services Field (DS 801 Field) in the IPv4 and IPv6 Headers", Internet draft draft-ietf- 802 diffserv-header-04.txt in last call, October 1998. 804 [ECN] "The ECN Web Page", URL "http://www- 805 nrg.ee.lbl.gov/floyd/ecn.html". 807 [RFC-ESP?] S. Kent and R. Atkinson, "IP Encapsulating Security 808 Payload", Internet Draft , July 1998. 810 [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways 811 for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 812 N.4, August 1993, p. 397-413. URL 813 "ftp://ftp.ee.lbl.gov/papers/early.pdf". 815 [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM 816 Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. 817 URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z". 819 [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support 820 End-to-End Congestion Control", Technical report, February 1997. URL 821 "ftp://ftp.ee.lbl.gov/papers/collapse.ps". 823 [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", 824 URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- 825 ecn. 827 [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) 828 benefits for TCP", Master's thesis, UCLA, 1998, URL 829 "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz". 
831 [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", 832 SIGCOMM '97, September 1997. URL 833 "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078". 835 [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. 836 ACM SIGCOMM '88, pp. 314-329. URL 837 "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z". 839 [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance 840 Algorithm", Message to end2end-interest mailing list, April 1990. 841 URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". 843 [MJV96], S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven 844 Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. 846 [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981. 848 [RFC793] J. Postel, Transmission Control Protocol, RFC 793, September 849 1981. 851 [RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the 852 Internet Checksum", RFC 1141, January 1990. 854 [RFC1349] P. Almquist, "Type of Service in the Internet Protocol 855 Suite", RFC 1349, July 1992. 857 [RFC1455] D. Eastlake, "Physical Link Security Type of Service", RFC 858 1455, May 1993. 860 [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast 861 Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997. 863 [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. 864 Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. 865 Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang, 866 "Recommendations on Queue Management and Congestion Avoidance in the 867 Internet", RFC 2309, April 1998. 869 [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for 870 Congestion Avoidance in Computer Networks", ACM Transactions on 871 Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. 873 15. Security Considerations 875 Security considerations have been discussed in Section 9. 877 16. 
IPv4 Header Checksum Recalculation 879 IPv4 header checksum recalculation is an issue with some high-end 880 router architectures using an output-buffered switch, since most if 881 not all of the header manipulation is performed on the input side of 882 the switch, while the ECN decision would need to be made local to the 883 output buffer. This is not an issue for IPv6, since there is no IPv6 884 header checksum. The IPv4 TOS octet is the last byte of a 16-bit 885 half-word. 887 RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 888 checksum after the TTL field is decremented. The incremental 889 updating of the IPv4 checksum after the CE bit was set would work as 890 follows: Let HC be the original header checksum, and let HC' be the 891 new header checksum after the CE bit has been set. Then for header 892 checksums calculated with one's complement subtraction, HC' would be 893 recalculated as follows: 894 HC' = { HC - 1, if HC > 1 895 { 0x0000, if HC = 1 896 For header checksums calculated on two's complement machines, HC' 897 would be recalculated as follows after the CE bit was set: 898 HC' = { HC - 1, if HC > 0 899 { 0xFFFE, if HC = 0 901 17. The motivation for the ECT bit. 903 The need for the ECT bit is motivated by the fact that ECN will be 904 deployed incrementally in an Internet where some transport protocols 905 and routers understand ECN and some do not. With the ECT bit, the 906 router can drop packets from flows that are not ECN-capable, but can 907 *instead* set the CE bit in flows that *are* ECN-capable. Because the 908 ECT bit allows an end node to have the CE bit set in a packet 909 *instead* of having the packet dropped, an end node might have some 910 incentive to deploy ECN. 912 If there were no ECT indication, then the router would have to set the 913 CE bit for packets from both ECN-capable and non-ECN-capable flows.
914 In this case, there would be no incentive for end-nodes to deploy 915 ECN, and no viable path of incremental deployment from a non-ECN 916 world to an ECN-capable world. Consider the first stages of such an 917 incremental deployment, where a subset of the flows are ECN-capable. 918 At the onset of congestion, when the packet dropping/marking rate 919 would be low, routers would only set CE bits, rather than dropping 920 packets. However, only those flows that are ECN-capable would 921 understand and respond to CE packets. The result is that the ECN- 922 capable flows would back off, and the non-ECN-capable flows would be 923 unaware of the ECN signals and would continue to open their 924 congestion windows. 926 In this case, there are two possible outcomes: (1) the ECN-capable 927 flows back off, the non-ECN-capable flows get all of the bandwidth, 928 and congestion remains mild, or (2) the ECN-capable flows back off, 929 the non-ECN-capable flows don't, and congestion increases until the 930 router transitions from setting the CE bit to dropping packets. 931 While this second outcome evens out the fairness, the ECN-capable 932 flows would still receive little benefit from being ECN-capable, 933 because the increased congestion would drive the router to packet- 934 dropping behavior. 936 A flow that advertises itself as ECN-Capable but does not respond to 937 CE bits is functionally equivalent to a flow that turns off 938 congestion control, as discussed in Section 9. 940 Thus, in a world where a subset of the flows are ECN-capable, but 941 where ECN-capable flows have no mechanism for indicating that fact to 942 the routers, there would be less effective and less fair congestion 943 control in the Internet, resulting in a strong incentive for end 944 nodes not to deploy ECN. 946 18. Why use two bits in the IP header?
948 Given the need for an ECT indication in the IP header, there still 949 remains the question of whether the ECT (ECN-Capable Transport) and 950 CE (Congestion Experienced) indications should be overloaded on a 951 single bit. This overloaded-one-bit alternative, explored in 952 [Floyd94], would involve a single bit with two values. One value, 953 "ECT and not CE", would represent an ECN-Capable Transport, and the 954 other value, "CE or not ECT", would represent either Congestion 955 Experienced or a non-ECN-Capable transport. 957 One difference between the one-bit and two-bit implementations 958 concerns packets that traverse multiple congested routers. Consider 959 a CE packet that arrives at a second congested router, and is 960 selected by the active queue management at that router for either 961 marking or dropping. In the one-bit implementation, the second 962 congested router has no choice but to drop the CE packet, because it 963 cannot distinguish between a CE packet and a non-ECT packet. In the 964 two-bit implementation, the second congested router has the choice of 965 either dropping the CE packet, or of leaving it alone with the CE bit 966 set. 968 Another difference between the one-bit and two-bit implementations 969 comes from the fact that with the one-bit implementation, receivers 970 in a single flow cannot distinguish between CE and non-ECT packets. 971 Thus, in the one-bit implementation an ECN-capable data sender would 972 have to unambiguously indicate to the receiver or receivers whether 973 each packet had been sent as ECN-Capable or as non-ECN-Capable. One 974 possibility would be for the sender to indicate in the transport 975 header whether the packet was sent as ECN-Capable. A second 976 possibility that would involve a functional limitation for the one- 977 bit implementation would be for the sender to unambiguously indicate 978 that it was going to send *all* of its packets as ECN-Capable or as 979 non-ECN-Capable. 
For a multicast transport protocol, this 980 unambiguous indication would have to be apparent to receivers joining 981 an on-going multicast session. 983 Another advantage of the two-bit approach is that it is somewhat more 984 robust. The most critical issue, discussed in Section 8, is that the 985 default indication should be that of a non-ECN-Capable transport. In 986 a two-bit implementation, this requirement for the default value 987 simply means that the ECT bit should be `OFF' by default. In the 988 one-bit implementation, this means that the single overloaded bit 989 should by default be in the "CE or not ECT" position. This is less 990 clear and straightforward, and possibly more open to incorrect 991 implementations either in the end nodes or in the routers. 993 In summary, while the one-bit implementation could be a possible 994 implementation, it has the following significant limitations relative 995 to the two-bit implementation. First, the one-bit implementation has 996 more limited functionality for the treatment of CE packets at a 997 second congested router. Second, the one-bit implementation requires 998 either that extra information be carried in the transport header of 999 packets from ECN-Capable flows (to convey the functionality of the 1000 second bit elsewhere, namely in the transport header), or that 1001 senders in ECN-Capable flows accept the limitation that receivers 1002 must be able to determine a priori which packets are ECN-Capable and 1003 which are not ECN-Capable. Third, the one-bit implementation is 1004 possibly more open to errors from faulty implementations that choose 1005 the wrong default value for the ECN bit. We believe that the use of 1006 the extra bit in the IP header for the ECT-bit is extremely valuable 1007 to overcome these limitations. 1009 19. Historical definitions for the IPv4 TOS octet 1011 RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP 1012 header. 
In RFC 791, bits 6 and 7 of the ToS octet are listed as 1013 "Reserved for Future Use", and are shown set to zero. The first two 1014 fields of the ToS octet were defined as the Precedence and Type of 1015 Service (TOS) fields. 1017 0 1 2 3 4 5 6 7 1018 +-----+-----+-----+-----+-----+-----+-----+-----+ 1019 | PRECEDENCE | TOS | 0 | 0 | RFC 791 1020 +-----+-----+-----+-----+-----+-----+-----+-----+ 1022 RFC 1122 included bits 6 and 7 in the TOS field, though it did not 1023 discuss any specific use for those two bits: 1025 0 1 2 3 4 5 6 7 1026 +-----+-----+-----+-----+-----+-----+-----+-----+ 1027 | PRECEDENCE | TOS | RFC 1122 1028 +-----+-----+-----+-----+-----+-----+-----+-----+ 1030 The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: 1032 0 1 2 3 4 5 6 7 1033 +-----+-----+-----+-----+-----+-----+-----+-----+ 1034 | PRECEDENCE | TOS | MBZ | RFC 1349 1035 +-----+-----+-----+-----+-----+-----+-----+-----+ 1037 Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary 1038 Cost". In addition to the Precedence and Type of Service (TOS) 1039 fields, the last field, MBZ (for "must be zero"), was defined as 1040 currently unused. RFC 1349 stated that "The originator of a datagram 1041 sets [the MBZ] field to zero (unless participating in an Internet 1042 protocol experiment which makes use of that bit)." 1044 RFC 1455 [RFC1455] defined an experimental standard that used all 1045 four bits in the TOS field to request a guaranteed level of link 1046 security. 1048 RFC 1349 is obsoleted by "Definition of the Differentiated Services 1049 Field (DS Field) in the IPv4 and IPv6 Headers" [RFC-DIFFSERV?], in 1050 which bits 6 and 7 of the DS field are listed as Currently Unused 1051 (CU).
The first six bits of the DS field are defined as the 1052 Differentiated Services CodePoint (DSCP): 1054 0 1 2 3 4 5 6 7 1055 +-----+-----+-----+-----+-----+-----+-----+-----+ 1056 | DSCP | CU | 1057 +-----+-----+-----+-----+-----+-----+-----+-----+ 1059 Because of this unstable history, the definition of the ECN field in 1060 this document cannot be guaranteed to be backwards compatible with 1061 all past uses of these two bits. The damage that could be done by a 1062 non-ECN-capable router would be to "erase" the CE bit for an ECN- 1063 capable packet that arrived at the router with the CE bit set, or set 1064 the CE bit even in the absence of congestion. This has been 1065 discussed in Section 10 on "Non-compliance in the Network". 1067 The damage that could be done in an ECN-capable environment by a 1068 non-ECN-capable end-node transmitting packets with the ECT bit set 1069 has been discussed in Section 9 on "Non-compliance by the End Nodes". 1071 AUTHORS' ADDRESSES 1073 K. K. Ramakrishnan 1074 AT&T Labs. Research 1075 Phone: +1 (973) 360-8766 1076 Email: kkrama@research.att.com 1077 URL: http://www.research.att.com/info/kkrama 1079 Sally Floyd 1080 Lawrence Berkeley National Laboratory 1081 Phone: +1 (510) 486-7518 1082 Email: floyd@ee.lbl.gov 1083 URL: http://www-nrg.ee.lbl.gov/floyd/ 1085 This draft was created in October 1998. 1086 It expires April 1999.
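As an illustration of the incremental update described in Section 16 ("IPv4 Header Checksum Recalculation"), the sketch below compares the two's complement rule (HC' = HC - 1, or 0xFFFE when HC = 0) against a full recomputation of the header checksum. It is a sketch under stated assumptions, not a reference implementation: the helper names, the sample addresses and header fields are hypothetical, and the bit positions (CE as the low-order bit of the TOS octet, ECT as the bit next to it) are assumed from the bit 6 and bit 7 layout discussed in this document.

```python
import struct

ECT_BIT = 0x02  # assumed: bit 6 of the TOS octet
CE_BIT = 0x01   # assumed: bit 7 (low-order bit) of the TOS octet


def make_header(ident: int, tos: int) -> bytes:
    """Build a hypothetical 20-byte IPv4 header with a zero checksum field."""
    src = bytes([192, 0, 2, 1])
    dst = bytes([198, 51, 100, 7])
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, tos, 40, ident, 0x4000, 64, 6, 0, src, dst)


def header_checksum(header: bytes) -> int:
    """Full recomputation: one's complement of the one's complement sum."""
    words = list(struct.unpack("!%dH" % (len(header) // 2), header))
    words[5] = 0  # the checksum field itself is treated as zero
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def checksum_after_setting_ce(hc: int) -> int:
    """Incremental update on a two's complement machine (Section 16).

    Valid only when the CE bit changes from 0 to 1.
    """
    return hc - 1 if hc > 0 else 0xFFFE


before = make_header(0x1C46, ECT_BIT)
after = make_header(0x1C46, ECT_BIT | CE_BIT)
hc = header_checksum(before)
# The incremental result should agree with the full recomputation.
print(hex(checksum_after_setting_ce(hc)), hex(header_checksum(after)))
```

Setting the CE bit adds one to the 16-bit word that holds the TOS octet, so the one's complement sum grows by one and the checksum shrinks by one; the special case covers the wrap-around when the original checksum is zero.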