idnits 2.17.1 draft-ietf-tsvwg-ecn-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 54 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 55 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There is 1 instance of too long lines in the document, the longest one being 5 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. 
If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 2001' is mentioned on line 482, but not defined ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581) == Missing Reference: 'RFC 2983' is mentioned on line 988, but not defined == Missing Reference: 'RFC 2474' is mentioned on line 1373, but not defined == Missing Reference: 'RFC 2475' is mentioned on line 1374, but not defined == Missing Reference: 'RFC 1455' is mentioned on line 2488, but not defined ** Obsolete undefined reference: RFC 1455 (Obsoleted by RFC 2474) == Unused Reference: 'FRED' is defined on line 1757, but no explicit reference was found in the text == Unused Reference: 'RFC1455' is defined on line 1800, but no explicit reference was found in the text == Unused Reference: 'RFC1701' is defined on line 1803, but no explicit reference was found in the text == Unused Reference: 'RFC1702' is defined on line 1806, but no explicit reference was found in the text == Unused Reference: 'RFC 2119' is defined on line 1812, but no explicit reference was found in the text == Unused Reference: 'RFC2408' is defined on line 1824, but no explicit reference was found in the text == Unused Reference: 'RFC2409' is defined on line 1828, but no explicit reference was found in the text == Unused Reference: 'RFC2475' is defined on line 1835, but no explicit reference was found in the text == Unused Reference: 'RFC2983' is defined on line 1849, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2402 (ref. 
'AH') (Obsoleted by RFC 4302, RFC 4305) -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN' ** Obsolete normative reference: RFC 2406 (ref. 'ESP') (Obsoleted by RFC 4303, RFC 4305) -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94' -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98' -- Possible downref: Non-RFC (?) normative reference: ref. 'FF99' -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED' ** Downref: Normative reference to an Informational RFC: RFC 1701 (ref. 'GRE') -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88' -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90' -- Possible downref: Non-RFC (?) normative reference: ref. 'K98' -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96' ** Downref: Normative reference to an Informational RFC: RFC 2702 (ref. 'MPLS') ** Downref: Normative reference to an Informational RFC: RFC 2637 (ref. 'PPTP') ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293) ** Downref: Normative reference to an Informational RFC: RFC 1141 ** Obsolete normative reference: RFC 1349 (Obsoleted by RFC 2474) ** Obsolete normative reference: RFC 1455 (Obsoleted by RFC 2474) -- Duplicate reference: RFC1701, mentioned in 'RFC1701', was also mentioned in 'GRE'. ** Downref: Normative reference to an Informational RFC: RFC 1701 ** Downref: Normative reference to an Informational RFC: RFC 1702 -- Duplicate reference: RFC2119, mentioned in 'RFC 2119', was also mentioned in 'B97'. ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567) ** Obsolete normative reference: RFC 2401 (Obsoleted by RFC 4301) ** Obsolete normative reference: RFC 2407 (Obsoleted by RFC 4306) ** Obsolete normative reference: RFC 2409 (ref. 'RFC2408') (Obsoleted by RFC 4306) -- Duplicate reference: RFC2409, mentioned in 'RFC2409', was also mentioned in 'RFC2408'. 
** Obsolete normative reference: RFC 2409 (Obsoleted by RFC 4306) ** Downref: Normative reference to an Informational RFC: RFC 2475 ** Obsolete normative reference: RFC 2481 (Obsoleted by RFC 3168) ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681) ** Downref: Normative reference to an Informational RFC: RFC 2884 ** Downref: Normative reference to an Informational RFC: RFC 2983 -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90' -- Possible downref: Non-RFC (?) normative reference: ref. 'SCWA99' Summary: 27 errors (**), 0 flaws (~~), 18 warnings (==), 17 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force K. K. Ramakrishnan 3 INTERNET DRAFT TeraOptic Networks 4 draft-ietf-tsvwg-ecn-01.txt Sally Floyd 5 ACIRI 6 D. Black 7 EMC 8 January, 2001 9 Expires: July, 2001 11 The Addition of Explicit Congestion Notification (ECN) to IP 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet- Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 
34 Abstract 36 This document specifies the incorporation of ECN (Explicit Congestion 37 Notification) to TCP and IP, including ECN's use of two bits in the 38 IP header. We begin by describing TCP's use of packet drops as an 39 indication of congestion. Next we explain that with the addition of 40 active queue management (e.g., RED) to the Internet infrastructure, 41 where routers detect congestion before the queue overflows, routers 42 are no longer limited to packet drops as an indication of congestion. 43 Routers can instead set the Congestion Experienced (CE) bit in the IP 44 header of packets from ECN-capable transports. We describe when the 45 CE bit is to be set in routers, and describe modifications needed to 46 TCP to make it ECN-capable. Modifications to other transport 47 protocols (e.g., unreliable unicast or multicast, reliable multicast, 48 other reliable unicast transport protocols) could be considered as 49 those protocols are developed and advance through the standards 50 process. 52 We also describe in this document the issues involving the use of ECN 53 within IP tunnels, and within IPsec tunnels in particular. 55 One of the guiding principles for this document is that all the 56 mechanisms specified here are incrementally deployable. 58 Table of Contents 59 1. Introduction 60 2. Conventions and Acronyms 61 3. Assumptions and General Principles 62 4. Active Queue Management (AQM) 63 5. Explicit Congestion Notification in IP 64 5.1. ECN as an Indication of Persistent Congestion 65 5.2. Dropped or Corrupted Packets 66 6. Support from the Transport Protocol 67 6.1. TCP 68 6.1.1. TCP Initialization 69 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field 70 6.1.2. The TCP Sender 71 6.1.3. The TCP Receiver 72 6.1.4. Congestion on the ACK-path 73 6.1.5. Retransmitted TCP packets 74 6.1.6. TCP Window Probes. 75 7. Non-compliance by the End Nodes 76 8. Non-compliance in the Network 77 8.1. Complications Introduced by Split Paths 78 9. 
Encapsulated Packets 79 9.1. IP packets encapsulated in IP 80 9.1.1. The Limited-functionality and Full-functionality Options 81 9.1.2. Changes to the ECN Field within an IP Tunnel. 82 9.2. IPsec Tunnels 83 9.2.1. Negotiation between Tunnel Endpoints 84 9.2.1.1. ECN Tunnel Security Association Database Field 85 9.2.1.2. ECN Tunnel Security Association Attribute 86 9.2.1.3. Changes to IPsec Tunnel Header Processing 87 9.2.2. Changes to the ECN Field within an IPsec Tunnel. 88 9.2.3. Comments for IPsec Support 89 9.3. IP packets encapsulated in non-IP packet headers. 90 10. Issues Raised by Monitoring and Policing Devices 91 11. Evaluations of ECN 92 12. Summary of changes required in IP and TCP 93 13. Conclusions 94 14. Acknowledgements 95 15. References 96 16. Security Considerations 97 17. IPv4 Header Checksum Recalculation 98 18. Possible Changes to the ECN Field in the Network 99 18.1. Possible Changes to the IP Header 100 18.1.1. Erasing the Congestion Indication 101 18.1.2. Falsely Reporting Congestion 102 18.1.3. Disabling ECN-Capability 103 18.1.4. Falsely Indicating ECN-Capability 104 18.1.5. Changes with No Functional Effect 105 18.2. Information carried in the Transport Header 106 18.3. Split Paths 107 19. Implications of Subverting End-to-End Congestion Control 108 19.1. Implications for the Network and for Competing Flows 109 19.2. Implications for the Subverted Flow 110 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control 111 20. The Motivation for the ECT bit. 112 21. Why use Two Bits in the IP Header? 113 22. Historical Definitions for the IPv4 TOS Octet 114 23. IANA Considerations 116 RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare 117 this with draft-ietf-tsvwg-ecn-00, compare the following: 118 "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-00.troff" 119 "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff" 120 Changes from draft-ietf-tsvwg-ecn-00: 121 * Deleted Section 6.1.1.2. 
on "Robust TCP Initialization with no 122 response to the SYN", and modified the paragraph in the Conclusions 123 referring to this. 124 * Added Section 23 on IANA Considerations. 125 * Added two paragraphs to Section 18.2 on denial-of-service attacks. 126 * Added some text about the ECN nonce being a research issue. 127 * Moved two paragraphs about setting the CWR bit from Section 6.1.3 to 128 Section 6.1.2. 129 * Various small changes: 130 Adding several small clarifying sentences in Section 12, 22. 131 Small clarification to text in Section 19.2. 132 Deleted a few unnecessary sentences in Section 9. 133 Updated some references to Section X. 134 Added more references to RFC 2780. 135 Deleted references to internet-drafts. 136 Clarified terminology for "non-ECN-setup SYN packet", including the 137 following: "Receivers MUST correctly handle all forms of the non-ECN- 138 setup SYN and SYN-ACK packets." 140 1. Introduction 142 TCP's congestion control and avoidance algorithms are based on the 143 notion that the network is a black-box [Jacobson88, Jacobson90]. The 144 network's state of congestion or otherwise is determined by end-sys- 145 tems probing for the network state, by gradually increasing the load 146 on the network (by increasing the window of packets that are out- 147 standing in the network) until the network becomes congested and a 148 packet is lost. Treating the network as a "black-box" and treating 149 loss as an indication of congestion in the network is appropriate for 150 pure best-effort data carried by TCP, with little or no sensitivity 151 to delay or loss of individual packets. In addition, TCP's conges- 152 tion management algorithms have techniques built-in (such as Fast 153 Retransmit and Fast Recovery) to minimize the impact of losses, from 154 a throughput perspective. However, these mechanisms are not intended 155 to help applications that are in fact sensitive to the delay or loss 156 of one or more individual packets. 
Interactive traffic such as tel- 157 net, web-browsing, and transfer of audio and video data can be sensi- 158 tive to packet losses (especially when using an unreliable data 159 delivery transport such as UDP) or to the increased latency of the 160 packet caused by the need to retransmit the packet after a loss (with 161 the reliable data delivery semantics provided by TCP). 163 Since TCP determines the appropriate congestion window to use by 164 gradually increasing the window size until it experiences a dropped 165 packet, this causes the queues at the bottleneck router to build up. 166 With most packet drop policies at the router that are not sensitive 167 to the load placed by each individual flow (e.g., tail-drop on queue 168 overflow), this means that some of the packets of latency-sensitive 169 flows may be dropped. In addition, such drop policies lead to syn- 170 chronization of loss across multiple flows. 172 Active queue management mechanisms detect congestion before the queue 173 overflows, and provide an indication of this congestion to the end 174 nodes. Thus, active queue management can reduce unnecessary queueing 175 delay for all traffic sharing that queue. The advantages of active 176 queue management are discussed in RFC 2309 [RFC2309]. Active queue 177 management avoids some of the bad properties of dropping on queue 178 overflow, including the undesirable synchronization of loss across 179 multiple flows. More importantly, active queue management means that 180 transport protocols with mechanisms for congestion control (e.g., 181 TCP) do not have to rely on buffer overflow as the only indication of 182 congestion. 184 Active queue management mechanisms may use one of several methods for 185 indicating congestion to end-nodes. One is to use packet drops, as is 186 currently done. However, active queue management allows the router to 187 separate policies of queueing or dropping packets from the policies 188 for indicating congestion. 
Thus, active queue management allows 189 routers to use the Congestion Experienced (CE) bit in a packet header 190 as an indication of congestion, instead of relying solely on packet 191 drops. This has the potential of reducing the impact of loss on 192 latency-sensitive flows. 194 This document is intended to obsolete RFC 2481, "A Proposal to add 195 Explicit Congestion Notification (ECN) to IP", which defined ECN as 196 an Experimental Protocol for the Internet Community. 198 RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This 199 document obsoletes three subsequent internet-drafts on ECN, "IPsec 200 Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP 201 with ECN: The Treatment of Retransmitted Data Packets". This 202 document is intended largely to merge the earlier documents all into 203 a single document, for greater clarity, in preparation to becoming a 204 Proposed Standard. 206 2. Conventions and Acronyms 208 The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, 209 SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this 210 document, are to be interpreted as described in [B97]. 212 3. Assumptions and General Principles 214 In this section, we describe some of the important design principles 215 and assumptions that guided the design choices in this proposal. 217 * Because ECN is likely to be adopted gradually, accommodating migra- 218 tion is essential. Some routers may still only drop packets to indi- 219 cate congestion, and some end-systems may not be ECN-capable. The 220 most viable strategy is one that accommodates incremental deployment 221 without having to resort to "islands" of ECN-capable and non-ECN- 222 capable environments. 223 * New mechanisms for congestion control and avoidance need to co- 224 exist and cooperate with existing mechanisms for congestion control. 
225 In particular, new mechanisms have to co-exist with TCP's current 226 methods of adapting to congestion and with routers' current practice 227 of dropping packets in periods of congestion. 228 * Congestion may persist over different time-scales. The time scales 229 that we are concerned with are congestion events that may last longer 230 than a round-trip time. 231 * The number of packets in an individual flow (e.g., TCP connection 232 or an exchange using UDP) may range from a small number of packets to 233 quite a large number. We are interested in managing the congestion 234 caused by flows that send enough packets so that they are still 235 active when network feedback reaches them. 236 * Asymmetric routing is likely to be a normal occurrence in the 237 Internet. The path (sequence of links and routers) followed by data 238 packets may be different from the path followed by the acknowledgment 239 packets in the reverse direction. 240 * Many routers process the "regular" headers in IP packets more effi- 241 ciently than they process the header information in IP options. This 242 suggests keeping congestion experienced information in the regular 243 headers of an IP packet. 244 * It must be recognized that not all end-systems will cooperate in 245 mechanisms for congestion control. However, new mechanisms shouldn't 246 make it easier for TCP applications to disable TCP congestion con- 247 trol. The benefit of lying about participating in new mechanisms 248 such as ECN-capability should be small. 250 4. Active Queue Management (AQM) 252 Random Early Detection (RED) is one mechanism for Active Queue Man- 253 agement (AQM) that has been proposed to detect incipient congestion 254 [FJ93], and is currently being deployed in the Internet [RFC2309]. 
255 AQM is meant to be a general mechanism using one of several alterna- 256 tives for congestion indication, but in the absence of ECN, AQM is 257 restricted to using packet drops as a mechanism for congestion indi- 258 cation. AQM drops packets based on the average queue length exceed- 259 ing a threshold, rather than only when the queue overflows. However, 260 because AQM may drop packets before the queue actually overflows, AQM 261 is not always forced by memory limitations to discard the packet. 263 AQM can set a Congestion Experienced (CE) bit in the packet header 264 instead of dropping the packet, when such a bit is provided in the IP 265 header and understood by the transport protocol. The use of the CE 266 bit with ECN allows the receiver(s) to receive the packet, avoiding 267 the potential for excessive delays due to retransmissions after 268 packet losses. We use the term 'CE packet' to denote a packet that 269 has the CE bit set. 271 5. Explicit Congestion Notification in IP 273 This document specifies that the Internet provide a congestion indi- 274 cation for incipient congestion (as in RED and earlier work [RJ90]) 275 where the notification can sometimes be through marking packets 276 rather than dropping them. This uses an ECN field in the IP header 277 with two bits. The ECN-Capable Transport (ECT) bit is set by the 278 data sender to indicate that the end-points of the transport protocol 279 are ECN-capable. The CE bit is set by the router to indicate conges- 280 tion to the end nodes. Routers that have a packet arriving at a full 281 queue drop the packet, just as they do in the absence of ECN. 283 Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. 284 Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE 285 bit. The IPv4 TOS octet corresponds to the Traffic Class octet in 286 IPv6. 
The definitions for the IPv4 TOS octet [RFC791] and the IPv6 287 Traffic Class octet have been superseded by the six-bit DS (Differen- 288 tiated Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed 289 in [RFC2474] as Currently Unused, and are specified in RFC 2780 as 290 approved for experimental use for ECN. Section 22 gives a brief his- 291 tory of the TOS octet.
293         0     1     2     3     4     5     6     7
294      +-----+-----+-----+-----+-----+-----+-----+-----+
295      |              DS FIELD             | ECN FIELD |
296      |                                   |           |
297      |                DSCP               | ECT | CE  |
298      +-----+-----+-----+-----+-----+-----+-----+-----+
300        DSCP: differentiated services codepoint
301        ECN:  Explicit Congestion Notification
303         Figure 1: The Differentiated Services and ECN Fields in IP.
305 Because of the unstable history of the TOS octet, the use of the ECN 306 field as specified in this document cannot be guaranteed to be back- 307 wards compatible with all past uses of these two bits. The potential 308 dangers of this lack of backwards compatibility are discussed in Sec- 309 tion 22. 311 Upon the receipt by an ECN-Capable transport of a single CE packet, 312 the congestion control algorithms followed at the end-systems MUST be 313 essentially the same as the congestion control response to a *single* 314 dropped packet. For example, for ECN-Capable TCP the source TCP is 315 required to halve its congestion window for any window of data con- 316 taining either a packet drop or an ECN indication. 318 One reason for requiring that the congestion-control response to the 319 CE packet be essentially the same as the response to a dropped packet 320 is to accommodate the incremental deployment of ECN in both end-sys- 321 tems and in routers. Some routers may drop ECN-Capable packets 322 (e.g., using the same AQM policies for congestion detection) while 323 other routers set the CE bit, for equivalent levels of congestion.
324 Similarly, a router might drop a non-ECN-Capable packet but set the 325 CE bit in an ECN-Capable packet, for equivalent levels of congestion. 326 If there were different congestion control responses to a CE bit 327 indication than to a packet drop, this could result in unfair treat- 328 ment for different flows. 330 An additional goal is that the end-systems should react to congestion 331 at most once per window of data (i.e., at most once per round-trip 332 time), to avoid reacting multiple times to multiple indications of 333 congestion within a round-trip time. 335 For a router, the CE bit of an ECN-Capable packet should only be set 336 if the router would otherwise have dropped the packet as an indica- 337 tion of congestion to the end nodes. When the router's buffer is not 338 yet full and the router is prepared to drop a packet to inform end 339 nodes of incipient congestion, the router should first check to see 340 if the ECT bit is set in that packet's IP header. If so, then 341 instead of dropping the packet, the router MAY instead set the CE bit 342 in the IP header. 344 An environment where all end nodes were ECN-Capable could allow new 345 criteria to be developed for setting the CE bit, and new congestion 346 control mechanisms for end-node reaction to CE packets. However, 347 this is a research issue, and as such is not addressed in this docu- 348 ment. 350 When a CE packet (i.e., a packet that has the CE bit set) is received 351 by a router, the CE bit is left unchanged, and the packet is trans- 352 mitted as usual. When severe congestion has occurred and the router's 353 queue is full, then the router has no choice but to drop some packet 354 when a new packet arrives. We anticipate that such packet losses 355 will become relatively infrequent when a majority of end-systems 356 become ECN-Capable and participate in TCP or other compatible conges- 357 tion control mechanisms. 
In an ade- 358 quately-provisioned network in an ECN-Capable environment, packet losses should occur primarily 359 during transients or in the presence of non-cooperating sources. 361 We expect that routers will set the CE bit in response to incipient 362 congestion as indicated by the average queue size, using the RED 363 algorithms suggested in [FJ93, RFC2309]. To the best of our knowl- 364 edge, this is the only proposal currently under discussion in the 365 IETF for routers to drop packets proactively, before the buffer over- 366 flows. However, this document does not attempt to specify a particu- 367 lar mechanism for active queue management, leaving that endeavor, if 368 needed, to other areas of the IETF. While ECN is inextricably tied 369 up with the need to have a reasonable active queue management mecha- 370 nism at the router, the reverse does not hold; active queue manage- 371 ment mechanisms have been developed and deployed independently of ECN, 372 using packet drops as indications of congestion in the absence of ECN 373 in the IP architecture. 375 5.1. ECN as an Indication of Persistent Congestion 377 We emphasize that a *single* packet with the CE bit set in an IP 378 packet causes the transport layer to respond, in terms of congestion 379 control, as it would to a packet drop. The instantaneous queue size 380 is likely to see considerable variations even when the router does 381 not experience persistent congestion. As such, it is important that 382 transient congestion at a router, reflected by the instantaneous 383 queue size reaching a threshold much smaller than the capacity of the 384 queue, not trigger a reaction at the transport layer. Therefore, the 385 CE bit should not be set by a router based on the instantaneous queue 386 size.
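The distinction between the average and the instantaneous queue size can be sketched in a few lines. The following is a simplified, RED-style illustration only, not the algorithm of [FJ93] and not anything this document specifies: the class name, the parameter values (weight, thresholds, capacity), and the choice to signal congestion with probability 1 once the average reaches the upper threshold are all assumptions made for the example. It does follow the rules stated above: a full queue always drops, and the CE bit is set only on a packet that carries ECT and would otherwise have been dropped.

```python
import random

ECT = 0x02  # bit 6 of the IPv4 TOS octet (bit 0 is the most significant bit)
CE = 0x01   # bit 7 of the IPv4 TOS octet


class AvgQueueMarker:
    """Hypothetical RED-like marker driven by an averaged queue length."""

    def __init__(self, weight=0.002, min_th=5.0, max_th=15.0,
                 max_p=0.1, capacity=30, rng=None):
        # All defaults are illustrative assumptions, not recommended values.
        self.weight = weight
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p
        self.capacity = capacity
        self.rng = rng or random.Random()
        self.avg = 0.0

    def on_arrival(self, queue_len, tos):
        """Return (action, tos) for a packet arriving at a queue of queue_len."""
        # Exponentially weighted moving average of the queue length.
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len

        # A full queue leaves no choice: drop, with or without ECN.
        if queue_len >= self.capacity:
            return "drop", tos

        # Below the lower threshold the average shows no persistent congestion.
        if self.avg < self.min_th:
            return "enqueue", tos

        # Signalling probability grows with the average queue length.
        if self.avg >= self.max_th:
            p = 1.0  # assumption of this sketch
        else:
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        if self.rng.random() >= p:
            return "enqueue", tos

        # The router would have dropped this packet; mark it instead if ECT is set.
        if tos & ECT:
            return "mark", tos | CE
        return "drop", tos
```

With these assumed parameters, a single arrival at a queue of 20 packets moves the average only to 0.04, so no congestion is signalled; the same queue length sustained over a couple of thousand arrivals pushes the average past the upper threshold, and ECT packets are then marked while non-ECT packets are dropped.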
388 For example, since the ATM and Frame Relay mechanisms for congestion 389 indication have typically been defined without an associated notion 390 of average queue size as the basis for determining that an intermedi- 391 ate node is congested, we believe that they provide a very noisy sig- 392 nal. The TCP-sender reaction specified in this document for ECN is 393 NOT the appropriate reaction for such a noisy signal of congestion 394 notification. However, if the routers that interface to the ATM net- 395 work have a way of maintaining the average queue at the interface, 396 and use it to come to a reliable determination that the ATM subnet is 397 congested, they may use the ECN notification that is defined here. 399 We continue to encourage experiments in techniques at layer 2 (e.g., 400 in ATM switches or Frame Relay switches) to take advantage of ECN. 401 For example, using a scheme such as RED (where packet marking is 402 based on the average queue length exceeding a threshold), layer 2 403 devices could provide a reasonably reliable indication of congestion. 404 When all the layer 2 devices in a path set that layer's own Conges- 405 tion Experienced bit (e.g., the EFCI bit for ATM, the FECN bit in 406 Frame Relay) in this reliable manner, then the interface router to 407 the layer 2 network could copy the state of that layer 2 Congestion 408 Experienced bit into the CE bit in the IP header. We recognize that 409 this is not the current practice, nor is it in current standards. 410 However, encouraging experimentation in this manner may provide the 411 information needed to enable evolution of existing layer 2 mechanisms 412 to provide a more reliable means of congestion indication, when they 413 use a single bit for indicating congestion. 415 5.2. 
Dropped or Corrupted Packets 417 For the proposed use for ECN in this document (that is, for a trans- 418 port protocol such as TCP for which a dropped data packet is an indi- 419 cation of congestion), end nodes detect dropped data packets, and the 420 congestion response of the end nodes to a dropped data packet is at 421 least as strong as the congestion response to a received CE packet. 422 To ensure the reliable delivery of the congestion indication of the 423 CE bit, the ECT bit MUST NOT be set in a packet unless the loss of 424 that packet in the network would be detected by the end nodes and 425 interpreted as an indication of congestion. 427 Transport protocols such as TCP do not necessarily detect all packet 428 drops, such as the drop of a "pure" ACK packet; for example, TCP does 429 not reduce the arrival rate of subsequent ACK packets in response to 430 an earlier dropped ACK packet. Any proposal for extending ECN-Capa- 431 bility to such packets would have to address issues such as the case 432 of an ACK packet that was marked with the CE bit but was later 433 dropped in the network. We believe that this aspect is still the sub- 434 ject of research, so this document specifies that at this time, 435 "pure" ACK packets MUST NOT indicate ECN-Capability. 437 Similarly, if a CE packet is dropped later in the network due to cor- 438 ruption (bit errors), the end nodes should still invoke congestion 439 control, just as TCP would today in response to a dropped data 440 packet. This issue of corrupted CE packets would have to be consid- 441 ered in any proposal for the network to distinguish between packets 442 dropped due to corruption, and packets dropped due to congestion or 443 buffer overflow. In particular, the ubiquitous deployment of ECN 444 would not, in and of itself, be a sufficient development to allow 445 end-nodes to interpret packet drops as indications of corruption 446 rather than congestion. 448 6. 
Support from the Transport Protocol 450 ECN requires support from the transport protocol, in addition to the 451 functionality given by the ECN field in the IP packet header. The 452 transport protocol might require negotiation between the endpoints 453 during setup to determine that all of the endpoints are ECN-capable, 454 so that the sender can set the ECT bit in transmitted packets. Sec- 455 ond, the transport protocol must be capable of reacting appropriately 456 to the receipt of CE packets. This reaction could be in the form of 457 the data receiver informing the data sender of the received CE packet 458 (e.g., TCP), of the data receiver unsubscribing to a layered multi- 459 cast group (e.g., RLM [MJV96]), or of some other action that ulti- 460 mately reduces the arrival rate of that flow on that congested link. 462 This document only addresses the addition of ECN Capability to TCP, 463 leaving issues of ECN in other transport protocols to further 464 research. For TCP, ECN requires three new pieces of functionality: 465 negotiation between the endpoints during connection setup to deter- 466 mine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP 467 header so that the data receiver can inform the data sender when a CE 468 packet has been received; and a Congestion Window Reduced (CWR) flag 469 in the TCP header so that the data sender can inform the data 470 receiver that the congestion window has been reduced. The support 471 required from other transport protocols is likely to be different, 472 particularly for unreliable or reliable multicast transport proto- 473 cols, and will have to be determined as other transport protocols are 474 brought to the IETF for standardization. 476 6.1. TCP 478 The following sections describe in detail the proposed use of ECN in 479 TCP. This proposal is described in essentially the same form in 480 [Floyd94]. 
We assume that the source TCP uses the standard congestion
481 control algorithms of Slow-start, Fast Retransmit and Fast Recovery
482 [RFC 2001].

484 This proposal specifies two new flags in the Reserved field of the
485 TCP header. The TCP mechanism for negotiating ECN-Capability uses
486 the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved
487 field of the TCP header is designated as the ECN-Echo flag. The
488 location of the 6-bit Reserved field in the TCP header is shown in
489 Figure 3 of RFC 793 [RFC793] (and is reproduced below for complete-
490 ness). This specification of the ECN Field leaves the Reserved field
491 as a 4-bit field using bits 4-7.

493 To enable the TCP receiver to determine when to stop setting the ECN-
494 Echo flag, we introduce a second new flag in the TCP header, the CWR
495 flag. The CWR flag is assigned to Bit 8 in the Reserved field of the
496 TCP header.

498   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
499 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
500 |               |                       | U | A | P | R | S | F |
501 | Header Length |        Reserved       | R | C | S | S | Y | I |
502 |               |                       | G | K | H | T | N | N |
503 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

505 Figure 2: The old definition of bytes 13 and 14 of the TCP
506 header.

508   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
509 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
510 |               |               | C | E | U | A | P | R | S | F |
511 | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
512 |               |               | R | E | G | K | H | T | N | N |
513 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

515 Figure 3: The new definition of bytes 13 and 14 of the TCP
516 Header.

518 Thus, ECN uses the ECT and CE flags in the IP header (as shown in
519 Figure 1) for signaling between routers and connection endpoints, and
520 uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
521 3) for TCP-endpoint to TCP-endpoint signaling.
For a TCP connection,
522 a typical sequence of events in an ECN-based reaction to congestion
523 is as follows:
524 * The ECT bit is set in packets transmitted by the sender to indi-
525 cate that ECN is supported by the transport entities for these
526 packets.
527 * An ECN-capable router detects impending congestion and detects
528 that the ECT bit is set in the packet it is about to drop.
529 Instead of dropping the packet, the router chooses to set the CE
530 bit in the IP header and forwards the packet.
531 * The receiver receives the packet with the CE bit set, and sets
532 the ECN-Echo flag in its next TCP ACK sent to the sender.

534 * The sender receives the TCP ACK with ECN-Echo set, and reacts to
535 the congestion as if a packet had been dropped.
536 * The sender sets the CWR flag in the TCP header of the next
537 packet sent to the receiver to acknowledge its receipt of and
538 reaction to the ECN-Echo flag.

540 The negotiation for using ECN by the TCP transport entities and the
541 use of the ECN-Echo and CWR flags is described in more detail in the
542 sections below.

544 6.1.1 TCP Initialization

546 In the TCP connection setup phase, the source and destination TCPs
547 exchange information about their willingness to use ECN. Subsequent
548 to the completion of this negotiation, the TCP sender sets the ECT
549 bit in the IP header of data packets to indicate to the network that
550 the transport is capable and willing to participate in ECN for this
551 packet. This indicates to the routers that they may mark this packet
552 with the CE bit, if they would like to use that as a method of con-
553 gestion notification. If the TCP connection does not wish to use ECN
554 notification for a particular packet, the sending TCP sets the ECT
555 bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE
556 bit in the received packet.

558 For this discussion, we designate the initiating host as Host A and
559 the responding host as Host B.
We call a SYN packet with the ECE and 560 CWR flags set an "ECN-setup SYN packet", and we call a SYN packet 561 with at least one of the ECE and CWR flags not set a "non-ECN-setup 562 SYN packet". Similarly, we call a SYN-ACK packet with only the ECE 563 flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and 564 we call a SYN-ACK packet with any other configuration of the ECE and 565 CWR flags a "non-ECN-setup SYN-ACK packet". 567 Before a TCP connection can use ECN, Host A sends an ECN-setup SYN 568 packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN 569 packet, the setting of both ECE and CWR in the ECN-setup SYN packet 570 is defined as an indication that the sending TCP is ECN-Capable, 571 rather than as an indication of congestion or of response to conges- 572 tion. More precisely, an ECN-setup SYN packet indicates that the TCP 573 implementation transmitting the SYN packet will participate in ECN as 574 both a sender and receiver. Specifically, as a receiver, it will 575 respond to incoming data packets that have the CE bit set in the IP 576 header by setting ECE in outgoing TCP Acknowledgement (ACK) packets. 577 As a sender, it will respond to incoming packets that have ECE set by 578 reducing the congestion window and setting CWR when appropriate. An 579 ECN-setup SYN packet does not commit the TCP sender to setting the 580 ECT bit in any or all of the packets it may transmit. However, the 581 commitment to respond appropriately to incoming packets with the CE 582 bit set remains even if the TCP sender in a later transmission, 583 within this TCP connection, sends a SYN packet without ECE and CWR 584 set. 586 When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag 587 but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an 588 indication that the TCP transmitting the SYN-ACK packet is ECN-Capa- 589 ble. 
As with the SYN packet, an ECN-setup SYN-ACK packet does not
590 commit the TCP host to setting the ECT bit in transmitted packets.

592 The following rules apply to the sending of ECN-setup packets:

594 * If a host has received an ECN-setup SYN packet, then it MAY send an
595 ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup
596 SYN-ACK packet.
597 * A host MUST NOT set ECT on data packets unless it has sent at least
598 one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at
599 least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no
600 non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has
601 received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK
602 packet, then it SHOULD NOT set ECT on data packets.
603 * If a host ever sets the ECT bit on a data packet, then that host
604 MUST correctly set/clear the CWR TCP bit on all subsequent packets in
605 the connection.
606 * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK
607 packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN-
608 ACK packet, then if that host receives TCP data packets with ECT and
609 CE bits set in the IP header, then that host MUST process these pack-
610 ets as specified for an ECN-capable connection.
    * A host that is not
611 willing to use ECN on a TCP connection SHOULD clear both the ECE and
612 CWR flags in all non-ECN-setup SYN and/or SYN-ACK packets that it
613 sends to indicate this unwillingness. Receivers MUST correctly han-
614 dle all forms of the non-ECN-setup SYN and SYN-ACK packets.

616 6.1.1.1. Robust TCP Initialization with an Echoed Reserved Field

618 There is the question of why we chose to have the TCP sending the SYN
619 set two ECN-related flags in the Reserved field of the TCP header for
620 the SYN packet, while the responding TCP sending the SYN-ACK sets
621 only one ECN-related flag in the SYN-ACK packet.
This asymmetry is
622 necessary for the robust negotiation of ECN-capability with some
623 deployed TCP implementations. There exists at least one faulty TCP
624 implementation in which TCP receivers set the Reserved field of the
625 TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
626 the Reserved field of the TCP header in the received data packet.
627 Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi-
628 cate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo
629 flag, the sending TCP correctly interprets a receiver's reflection of
630 its own flags in the Reserved field as an indication that the
631 receiver is not ECN-capable. The sending TCP is not misled by a
632 faulty TCP implementation sending a SYN-ACK packet that simply
633 reflects the Reserved field of the incoming SYN packet.

635 6.1.2. The TCP Sender

637 For a TCP connection using ECN, new data packets are transmitted with
638 the ECT bit set in the IP header (set to a "1"). If the sender
639 receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with
640 the ECN-Echo flag set in the TCP header), then the sender knows that
641 congestion was encountered in the network on the path from the sender
642 to the receiver. The indication of congestion should be treated just
643 as a congestion loss in non-ECN-Capable TCP. That is, the TCP source
644 halves the congestion window "cwnd" and reduces the slow start
645 threshold "ssthresh". The sending TCP SHOULD NOT increase the con-
646 gestion window in response to the receipt of an ECN-Echo ACK packet.

648 TCP should not react to congestion indications more than once every
649 window of data (or more loosely, more than once every round-trip
650 time). That is, the TCP sender's congestion window should be reduced
651 only once in response to a series of dropped and/or CE packets from a
652 single window of data.
In addition, the TCP source should not 653 decrease the slow-start threshold, ssthresh, if it has been decreased 654 within the last round trip time. However, if any retransmitted pack- 655 ets are dropped, then this is interpreted by the source TCP as a new 656 instance of congestion. 658 After the source TCP reduces its congestion window in response to a 659 CE packet, incoming acknowledgements that continue to arrive can 660 "clock out" outgoing packets as allowed by the reduced congestion 661 window. If the congestion window consists of only one MSS (maximum 662 segment size), and the sending TCP receives an ECN-Echo ACK packet, 663 then the sending TCP should in principle still reduce its congestion 664 window in half. However, the value of the congestion window is 665 bounded below by a value of one MSS. If the sending TCP were to con- 666 tinue to send, using a congestion window of 1 MSS, this results in 667 the transmission of one packet per round-trip time. It is necessary 668 to still reduce the sending rate of the TCP sender even further, on 669 receipt of an ECN-Echo packet when the congestion window is one. We 670 use the retransmit timer as a means of reducing the rate further in 671 this circumstance. Therefore, the sending TCP MUST reset the 672 retransmit timer on receiving the ECN-Echo packet when the congestion 673 window is one. The sending TCP will then be able to send a new 674 packet only when the retransmit timer expires. 676 When an ECN-Capable TCP sender reduces its congestion window for any 677 reason (because of a retransmit timeout, a Fast Retransmit, or in 678 response to an ECN Notification), the TCP sender sets the CWR flag in 679 the TCP header of the first new data packet sent after the window 680 reduction. If that data packet is dropped in the network, then the 681 sending TCP will have to reduce the congestion window again and 682 retransmit the dropped packet. 
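As a rough illustration of the sender behavior described so far in this section, the following Python sketch models the reaction to ECN-Echo: halve cwnd at most once per window of data, fall back to resetting the retransmit timer when cwnd is already one MSS, and set CWR on the first new data packet after a reduction. The class `EcnSender`, its field names, the segment-counted window, and the choice of setting ssthresh to the new cwnd are all assumptions made for this example; it is not a TCP implementation.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Toy model of a sender's reaction to ECN-Echo (all names hypothetical)."""
    cwnd: int = 10             # congestion window, counted in segments of one MSS
    ssthresh: int = 64         # slow-start threshold, in segments
    snd_una: int = 0           # oldest unacknowledged segment number
    snd_nxt: int = 0           # next new segment number to send
    recover: int = 0           # snd_nxt at the time of the last reduction
    cwr_pending: bool = False  # CWR goes on the next new data packet
    rto_resets: int = 0        # stand-in for resetting a real retransmit timer

    def on_ack(self, ack: int, ece: bool) -> None:
        """Process a cumulative ACK; `ece` is the ECN-Echo flag it carried."""
        self.snd_una = max(self.snd_una, ack)
        if not ece:
            return
        if self.snd_una < self.recover:
            # Data sent before the last reduction is still outstanding:
            # react at most once per window of data.
            return
        if self.cwnd <= 1:
            # cwnd is bounded below by one MSS, so slow down further by
            # resetting the retransmit timer instead of shrinking cwnd.
            # (Whether CWR is also sent in this case is not modelled.)
            self.rto_resets += 1
            return
        self.cwnd = max(self.cwnd // 2, 1)
        self.ssthresh = self.cwnd  # assumption: ssthresh follows the new cwnd
        self.recover = self.snd_nxt
        self.cwr_pending = True

    def send_new_data(self) -> dict:
        """Build the flags for the next new (not retransmitted) data packet."""
        pkt = {"seq": self.snd_nxt, "ect": True, "cwr": self.cwr_pending}
        self.cwr_pending = False   # CWR only on the first packet after reduction
        self.snd_nxt += 1
        return pkt
```

The `recover` marker is one simple way to approximate "once per window of data"; an implementation could equally use a round-trip timer.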
684 We ensure that the "Congestion Window Reduced" information is reli- 685 ably delivered to the TCP receiver. This comes about from the fact 686 that if the new data packet carrying the CWR flag is dropped, then 687 the TCP sender will have to again reduce its congestion window, and 688 send another new data packet with the CWR flag set. Thus, the CWR 689 bit in the TCP header SHOULD NOT be set on retransmitted packets. 690 When the TCP data sender is ready to set the CWR bit after reducing 691 the congestion window, it SHOULD set the CWR bit only on the first 692 new data packet that it transmits. 694 [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] 695 discusses the validation test in the ns simulator, which illustrates 696 a wide range of ECN scenarios. These scenarios include the following: 697 an ECN followed by another ECN, a Fast Retransmit, or a Retransmit 698 Timeout; a Retransmit Timeout or a Fast Retransmit followed by an 699 ECN; and a congestion window of one packet followed by an ECN. 701 TCP follows existing algorithms for sending data packets in response 702 to incoming ACKs, multiple duplicate acknowledgements, or retransmit 703 timeouts [RFC2581]. TCP also follows the normal procedures for 704 increasing the congestion window when it receives ACK packets without 705 the ECN-Echo bit set [RFC2581]. 707 6.1.3. The TCP Receiver 709 When TCP receives a CE data packet at the destination end-system, the 710 TCP data receiver sets the ECN-Echo flag in the TCP header of the 711 subsequent ACK packet. If there is any ACK withholding implemented, 712 as in current "delayed-ACK" TCP implementations where the TCP 713 receiver can send an ACK for two arriving data packets, then the ECN- 714 Echo flag in the ACK packet will be set to the OR of the CE bits of 715 all of the data packets being acknowledged. That is, if any of the 716 received data packets are CE packets, then the returning ACK has the 717 ECN-Echo flag set. 
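The OR rule for delayed ACKs can be sketched as follows, combined with the rule given in the next paragraphs that the receiver keeps setting ECN-Echo until a CWR packet arrives. This is a minimal sketch under stated assumptions: the class `EcnReceiver`, the `ack_every` parameter, and the dictionary returned for an ACK are hypothetical names chosen for illustration.

```python
class EcnReceiver:
    """Toy model of ECN-Echo generation with delayed ACKs (hypothetical names)."""

    def __init__(self, ack_every: int = 2):
        self.ack_every = ack_every  # one ACK per this many data packets
        self.pending = 0            # data packets received but not yet ACKed
        self.echo = False           # True while ECN-Echo must be set on ACKs

    def on_data(self, ce: bool, cwr: bool = False):
        """Handle one in-window data packet; return an ACK dict or None."""
        if cwr:
            self.echo = False       # sender reports that it reduced its window
        if ce:
            self.echo = True        # a CE packet (re)starts the echo
        self.pending += 1
        if self.pending < self.ack_every:
            return None             # ACK withheld (delayed ACK)
        self.pending = 0
        # Because `echo` latches on any CE packet, the flag on this ACK is
        # the OR of the CE bits of all packets being acknowledged.
        return {"ece": self.echo}
```

With `ack_every=2`, a CE mark on either of the two acknowledged packets yields an ACK with ECN-Echo set.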
719 To provide robustness against the possibility of a dropped ACK packet 720 carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in 721 a series of ACK packets sent subsequently. The TCP receiver uses the 722 CWR flag received from the TCP sender to determine when to stop set- 723 ting the ECN-Echo flag. 725 After a TCP receiver sends an ACK packet with the ECN-Echo bit set, 726 that TCP receiver continues to set the ECN-Echo flag in all the ACK 727 packets it sends (whether they acknowledge CE data packets or non-CE 728 data packets) until it receives a CWR packet (a packet with the CWR 729 flag set). After the receipt of the CWR packet, acknowledgements for 730 subsequent non-CE data packets do not have the ECN-Echo flag set. If 731 another CE packet is received by the data receiver, the receiver 732 would once again send ACK packets with the ECN-Echo flag set. While 733 the receipt of a CWR packet does not guarantee that the data sender 734 received the ECN-Echo message, this does suggest that the data sender 735 reduced its congestion window at some point *after* it sent the data 736 packet for which the CE bit was set. 738 We have already specified that a TCP sender is not required to reduce 739 its congestion window more than once per window of data. Some care 740 is required if the TCP sender is to avoid unnecessary reductions of 741 the congestion window when a window of data includes both dropped 742 packets and (marked) CE packets. This is illustrated in [Floyd98]. 744 6.1.4. Congestion on the ACK-path 746 For the current generation of TCP congestion control algorithms, pure 747 acknowledgement packets (e.g., packets that do not contain any accom- 748 panying data) should be sent with the ECT bit off. Current TCP 749 receivers have no mechanisms for reducing traffic on the ACK-path in 750 response to congestion notification. Mechanisms for responding to 751 congestion on the ACK-path are areas for current and future research. 
752 (One simple possibility would be for the sender to reduce its conges- 753 tion window when it receives a pure ACK packet with the CE bit set). 754 For current TCP implementations, a single dropped ACK generally has 755 only a very small effect on the TCP's sending rate. 757 6.1.5. Retransmitted TCP packets 759 This document specifies that for ECN-capable TCP implementations, the 760 ECT bit (ECN-Capable Transport) in the IP header MUST NOT be set on 761 retransmitted data packets, and that the TCP data receiver SHOULD 762 ignore the ECN field on arriving data packets that are outside of the 763 receiver's current window. This is for greater security against 764 denial-of-service attacks, as well as for robustness of the ECN con- 765 gestion indication with packets that are dropped later in the net- 766 work. 768 First, we note that if the TCP sender were to set the ECT bit on a 769 retransmitted packet, then if an unnecessarily-retransmitted packet 770 was later dropped in the network, the end nodes would never receive 771 the indication of congestion from the router setting the CE bit. 772 Thus, setting the ECT bit on retransmitted data packets is not con- 773 sistent with the robust delivery of the congestion indication even 774 for packets that are later dropped in the network. 776 In addition, an attacker capable of spoofing the IP source address of 777 the TCP sender could send data packets with arbitrary sequence num- 778 bers, with both the ECT and CE bits set in the IP header. On receiv- 779 ing this spoofed data packet, the TCP data receiver would determine 780 that the data does not lie in the current receive window, and return 781 a duplicate acknowledgement. We define an out-of-window packet at 782 the TCP data receiver as a data packet that lies outside the 783 receiver's current window. 
On receiving an out-of-window packet, the
784 TCP data receiver has to decide whether or not to treat the CE bit in
785 the packet header as a valid indication of congestion, and therefore
786 whether to return ECN-Echo indications to the TCP data sender. If
787 the TCP data receiver ignored the CE bit in an out-of-window packet,
788 then the TCP data sender would not receive this possibly-legitimate
789 indication of congestion from the network, resulting in a violation
790 of end-to-end congestion control. On the other hand, if the TCP data
791 receiver honors the CE indication in the out-of-window packet, and
792 reports the indication of congestion to the TCP data sender, then the
793 malicious node that created the spoofed, out-of-window packet has
794 successfully "attacked" the TCP connection by forcing the data sender
795 to unnecessarily reduce (halve) its congestion window. To prevent
796 such a denial-of-service attack, we specify that a legitimate TCP
797 data sender MUST NOT set the ECT bit on retransmitted data packets,
798 and that the TCP data receiver SHOULD ignore the CE bit on out-of-
799 window packets.

801 One drawback of not setting ECT on retransmitted packets is that it
802 denies ECN protection for retransmitted packets. However, for an ECN-capable
803 TCP connection in a fully-ECN-capable environment with mild conges-
804 tion, packets should rarely be dropped due to congestion in the first
805 place, and so instances of retransmitted packets should rarely arise.
806 If packets are being retransmitted, then there are already packet
807 losses (from corruption or from congestion) that ECN has been unable
808 to prevent.
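The two rules above (no ECT on retransmitted data packets, and CE ignored on out-of-window packets) might be checked as in the following Python sketch. The function names are hypothetical, and the window test is an assumption: it borrows the segment acceptability test of RFC 793 as one plausible reading of "lies outside the receiver's current window", restricted to segments that carry data while the window is open.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def _offset(seq: int, base: int) -> int:
    """Distance from `base` forward to `seq` in sequence space."""
    return (seq - base) % SEQ_MOD


def in_window(seq: int, length: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if the first or last byte of the segment lies in the window."""
    if length == 0 or rcv_wnd == 0:
        return False  # simplification: only data segments, open window
    first_ok = _offset(seq, rcv_nxt) < rcv_wnd
    last_ok = _offset(seq + length - 1, rcv_nxt) < rcv_wnd
    return first_ok or last_ok


def honor_ce(ect: bool, ce: bool, seq: int, length: int,
             rcv_nxt: int, rcv_wnd: int) -> bool:
    """Receiver side: treat CE as congestion only for in-window data."""
    return ect and ce and in_window(seq, length, rcv_nxt, rcv_wnd)


def ect_for_data_packet(is_retransmission: bool) -> bool:
    """Sender side: ECT is never set on a retransmitted data packet."""
    return not is_retransmission
```

For example, with `rcv_nxt=1000` and `rcv_wnd=5000`, a spoofed CE packet covering bytes 900 to 999 is out of window, so `honor_ce` returns `False` and no ECN-Echo is triggered.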
810 We note that if the router sets the CE bit for an ECN-capable data 811 packet within a TCP connection, then the TCP connection is guaranteed 812 to receive that indication of congestion, or to receive some other 813 indication of congestion within the same window of data, even if this 814 packet is dropped or reordered in the network. We consider two 815 cases, when the packet is later retransmitted, and when the packet is 816 not later retransmitted. 818 In the first case, if the packet is either dropped or delayed, and at 819 some point retransmitted by the data sender, then the retransmission 820 is a result of a Fast Retransmit or a Retransmit Timeout for either 821 that packet or for some prior packet in the same window of data. In 822 this case, because the data sender already has retransmitted this 823 packet, we know that the data sender has already responded to an 824 indication of congestion for some packet within the same window of 825 data as the original packet. Thus, even if the first transmission of 826 the packet is dropped in the network, or is delayed, if it had the CE 827 bit set, and is later ignored by the data receiver as an out-of-win- 828 dow packet, this is not a problem, because the sender has already 829 responded to an indication of congestion for that window of data. 831 In the second case, if the packet is never retransmitted by the data 832 sender, then this data packet is the only copy of this data received 833 by the data receiver, and therefore arrives at the data receiver as 834 an in-window packet, regardless of how much the packet might be 835 delayed or reordered. In this case, if the CE bit is set on the 836 packet within the network, this will be treated by the data receiver 837 as a valid indication of congestion. 839 6.1.6. TCP Window Probes. 841 When the TCP data receiver advertises a zero window, the TCP data 842 sender sends window probes to determine if the receiver's window has 843 increased. 
Window probe packets do not contain any user data except 844 for the sequence number, which is a byte. If a window probe packet 845 is dropped in the network, this loss is not detected by the receiver. 846 Therefore, the TCP data sender MUST NOT set either the ECT or CWR 847 bits on window probe packets. 849 However, because window probes use exact sequence numbers, they can- 850 not be easily spoofed in denial-of-service attacks. Therefore, if a 851 window probe arrives with ECT and CE set, then the receiver SHOULD 852 respond to the ECN indications. 854 7. Non-compliance by the End Nodes 856 This section discusses concerns about the vulnerability of ECN to 857 non-compliant end-nodes (i.e., end nodes that set the ECT bit in 858 transmitted packets but do not respond to received CE packets). We 859 argue that the addition of ECN to the IP architecture will not sig- 860 nificantly increase the current vulnerability of the architecture to 861 unresponsive flows. 863 Even for non-ECN environments, there are serious concerns about the 864 damage that can be done by non-compliant or unresponsive flows (that 865 is, flows that do not respond to congestion control indications by 866 reducing their arrival rate at the congested link). For example, an 867 end-node could "turn off congestion control" by not reducing its con- 868 gestion window in response to packet drops. This is a concern for the 869 current Internet. It has been argued that routers will have to 870 deploy mechanisms to detect and differentially treat packets from 871 non-compliant flows [RFC2309,FF99]. It has also been suggested that 872 techniques such as end-to-end per-flow scheduling and isolation of 873 one flow from another, differentiated services, or end-to-end reser- 874 vations could remove some of the more damaging effects of unrespon- 875 sive flows. 
877 It might seem that dropping packets in itself is an adequate deter-
878 rent for non-compliance, and that the use of ECN removes this deter-
879 rent. We would argue in response that (1) ECN-capable routers pre-
880 serve packet-dropping behavior in times of high congestion; and (2)
881 even in times of high congestion, dropping packets in itself is not
882 an adequate deterrent for non-compliance.

884 First, ECN-Capable routers will only mark packets (as opposed to
885 dropping them) when the packet marking rate is reasonably low. During
886 periods where the average queue size exceeds an upper threshold, and
887 therefore the potential packet marking rate would be high, our recom-
888 mendation is that routers drop packets rather than set the CE bit in
889 packet headers.

891 During the periods of low or moderate packet marking rates when ECN
892 would be deployed, there would be little deterrent effect on unre-
893 sponsive flows of dropping rather than marking those packets. For
894 example, delay-insensitive flows using reliable delivery might have
895 an incentive to increase rather than to decrease their sending rate
896 in the presence of dropped packets. Similarly, delay-sensitive flows
897 using unreliable delivery might increase their use of FEC in response
898 to an increased packet drop rate, increasing rather than decreasing
899 their sending rate. For the same reasons, we do not believe that
900 packet dropping itself is an effective deterrent for non-compliance
901 even in an environment of high packet drop rates, when all flows are
902 sharing the same packet drop rate.

904 Several methods have been proposed to identify and restrict non-com-
905 pliant or unresponsive flows. The addition of ECN to the network
906 environment would not in any way increase the difficulty of designing
907 and deploying such mechanisms. If anything, the addition of ECN to
908 the architecture would make the job of identifying unresponsive flows
909 slightly easier.
For example, in an ECN-Capable environment routers 910 are not limited to information about packets that are dropped or have 911 the CE bit set at that router itself; in such an environment, routers 912 could also take note of arriving CE packets that indicate congestion 913 encountered by that packet earlier in the path. 915 8. Non-compliance in the Network 917 This section considers the issues when a router is operating, possi- 918 bly maliciously, to modify either of the bits in the ECN field. In 919 this section we represent the ECN field in the IP header by the tuple 920 (ECT bit, CE bit). 922 By tampering with the bits in the ECN field, an adversary (or a bro- 923 ken router) could do one or more of the following: falsely report 924 congestion, disable ECN-Capability for an individual packet, erase 925 the ECN congestion indication, or falsely indicate ECN-Capability. 926 Section 18 systematically examines the various cases by which the ECN 927 field could be modified. The important criterion considered in 928 determining the consequences of such modifications is whether it is 929 likely to lead to poorer behavior in any dimension (throughput, 930 delay, fairness or functionality) than if a router were to drop a 931 packet. 933 The first two possible changes, falsely reporting congestion or dis- 934 abling ECN-Capability for an individual packet, are no worse than if 935 the router were to simply drop the packet. From a congestion control 936 point of view, setting the CE bit in the absence of congestion by a 937 non-compliant router would be no worse than a router dropping a 938 packet unnecessarily. By "erasing" the ECT bit of a packet that is 939 later dropped in the network, a router's actions could result in an 940 unnecessary packet drop for that packet later in the network. 
942 However, as discussed in Section 18, a router that erases the ECN
943 congestion indication or falsely indicates ECN-Capability could
944 potentially do more damage to the flow than if it had simply dropped
945 the packet. A rogue or broken router that "erased" the CE bit in
946 arriving CE packets would prevent that indication of congestion from
947 reaching downstream receivers. This could result in the failure of
948 congestion control for that flow and a resulting increase in conges-
949 tion in the network, ultimately resulting in subsequent packets
950 dropped for this flow as the average queue size increased at the con-
951 gested gateway.

953 Section 19 considers the potential repercussions of subverting end-
954 to-end congestion control by either falsely indicating ECN-Capabil-
955 ity, or by erasing the congestion indication in ECN (the CE-bit). We
956 observe in Section 19 that the consequence of subverting ECN-based
957 congestion control may lead to potential unfairness, but this is
958 likely to be no worse than the subversion of either ECN-based or
959 packet-based congestion control by the end nodes.

961 8.1. Complications Introduced by Split Paths

963 If a router or other network element has access to all of the packets
964 of a flow, then that router could do no more damage to a flow by
965 altering the ECN field than it could by simply dropping all of the
966 packets from that flow. However, in some cases, a malicious or bro-
967 ken router might have access to only a subset of the packets from a
968 flow. The question is as follows: can this router, by altering the
969 ECN field in this subset of the packets, do more damage to that flow
970 than if it had simply dropped that set of the packets?
972 This is also discussed in detail in Section 18, which concludes as
973 follows: It is true that the adversary that has access only to a
974 subset of packets in an aggregate might, by subverting ECN-based con-
975 gestion control, be able to deny the benefits of ECN to the other
976 packets in the aggregate. While this is undesirable, this is not a
977 sufficient concern to result in disabling ECN.

979 9. Encapsulated Packets

981 9.1. IP packets encapsulated in IP

983 The encapsulation of IP packet headers in tunnels is used in many
984 places, including IPsec and IP in IP [RFC2003]. This section consid-
985 ers issues related to interactions between ECN and IP tunnels, and
986 specifies two alternative solutions. This discussion is complemented
987 by RFC 2983's discussion of interactions between Differentiated Ser-
988 vices and IP tunnels of various forms [RFC 2983], as Differentiated
989 Services uses the remaining six bits of the IP header octet that is
990 used by ECN (see Figure 1 in Section 5).

992 Some IP tunnel modes are based on adding a new "outer" IP header that
993 encapsulates the original, or "inner" IP header and its associated
994 packet. In many cases, the new "outer" IP header may be added and
995 removed at intermediate points along a connection, enabling the net-
996 work to establish a tunnel without requiring endpoint participation.
997 We denote tunnels that specify that the outer header be discarded at
998 tunnel egress as "simple tunnels".

1000 ECN uses the ECT and CE flags in the IP header for signaling between
1001 routers and connection endpoints. ECN interacts with IP tunnels
1002 based on the treatment of these flags in the IP header. In simple IP
1003 tunnels the octet containing these flags is copied or mapped from the
1004 inner IP header to the outer IP header at IP tunnel ingress, and the
1005 outer header's copy of this field is discarded at IP tunnel egress.
1006 If the outer header were to be simply discarded without taking care 1007 to deal with the ECN related flags, and an ECN-capable router were to 1008 set the CE (Congestion Experienced) bit within a packet in a simple 1009 IP tunnel, this indication would be discarded at tunnel egress, los- 1010 ing the indication of congestion. 1012 Thus, the use of ECN over simple IP tunnels would result in routers 1013 attempting to use the outer IP header to signal congestion to end- 1014 points, but those congestion warnings never arriving because the 1015 outer header is discarded at the tunnel egress point. This problem 1016 was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec- 1017 ommended that ECN not be used with the older simple IPsec tunnels in 1018 order to avoid this behavior and its consequences. When ECN becomes 1019 widely deployed, then simple tunnels likely to carry ECN-capable 1020 traffic will have to be changed. 1022 From a security point of view, the use of ECN in the outer header of 1023 an IP tunnel might raise security concerns because an adversary could 1024 tamper with the ECN information that propagates beyond the tunnel 1025 endpoint. Based on an analysis in Sections 18 and 19 of these con- 1026 cerns and the resultant risks, our overall approach is to make sup- 1027 port for ECN an option for IP tunnels, so that an IP tunnel can be 1028 specified or configured either to use ECN or not to use ECN in the 1029 outer header of the tunnel. Thus, in environments or tunneling pro- 1030 tocols where the risks of using ECN are judged to outweigh its bene- 1031 fits, the tunnel can simply not use ECN in the outer header. Then 1032 the only indication of congestion experienced at routers within the 1033 tunnel would be through packet loss. 
1035 The result is that there are two viable options for the behavior of 1036 ECN-capable connections over an IP tunnel, especially IPsec tunnels: 1037 * A limited-functionality option in which ECN is preserved in the 1038 inner header, but disabled in the outer header. The only mecha- 1039 nism available for signaling congestion occurring within the tun- 1040 nel in this case is dropped packets. 1041 * A full-functionality option that supports ECN in both the inner 1042 and outer headers, and propagates congestion warnings from nodes 1043 within the tunnel to endpoints. 1045 Support for these options requires varying amounts of changes to IP 1046 header processing at tunnel ingress and egress. A small subset of 1047 these changes sufficient to support only the limited-functionality 1048 option would be sufficient to eliminate any incompatibility between 1049 ECN and IP tunnels. 1051 One goal of this document is to give guidance about the tradeoffs 1052 between the limited-functionality and full-functionality options. A 1053 full discussion of the potential effects of an adversary's modifica- 1054 tions of the CE and ECT bits is given in Sections 18 and 19. 1056 9.1.1. The Limited-functionality and Full-functionality Options 1058 The limited-functionality option for ECN encapsulation in IP tunnels 1059 is for the ECT bit in the outside (encapsulating) header to be off 1060 (i.e., set to 0), regardless of the value of the ECT bit in the 1061 inside (encapsulated) header. With this option, the ECN field in the 1062 inner header is not altered upon de-capsulation. The disadvantage of 1063 this approach is that the flow does not have ECN support for that 1064 part of the path that is using IP tunneling, even if the encapsulated 1065 packet (from the original TCP sender) is ECN-Capable. 
That is, if 1066 the encapsulated packet arrives at a congested router that is ECN- 1067 capable, and the router can decide to drop or mark the packet as an 1068 indication of congestion to the end nodes, the router will not be 1069 permitted to set the CE bit in the packet header, but instead will 1070 have to drop the packet. 1072 The full-functionality option for ECN encapsulation is to copy the 1073 ECT bit of the inside header to the outside header on encapsulation, 1074 and to OR the CE bit from the outer header with the CE bit of the 1075 inside header on decapsulation. That is, for full ECN support the 1076 encapsulation and decapsulation processing involves the following: 1077 At tunnel ingress, the full-functionality option copies the value of 1078 ECT (bit 6) in the inner header to the outer header. CE (bit 7) is 1079 set to 0 in the outer header. Upon decapsulation at the tunnel 1080 egress, the full-functionality option sets CE to 1 in the inner 1081 header if the value of ECT (bit 6) in the inner header is 1, and the 1082 value of CE (bit 7) in the outer header is 1. Otherwise, no change 1083 is made to this field of the inner header. 1085 With the full-functionality option, a flow can take advantage of ECN 1086 in those parts of the path that might use IP tunneling. The disad- 1087 vantage of the full-functionality option from a security perspective 1088 is that the IP tunnel cannot protect the flow from certain modifica- 1089 tions to the ECN bits in the IP header within the tunnel. The poten- 1090 tial dangers from modifications to the ECN bits in the IP header are 1091 described in detail in Sections 18 and 19. 1093 (1) An IP tunnel MUST modify the handling of the DS field octet at 1094 IP tunnel endpoints by implementing either the limited-functional- 1095 ity or the full-functionality option. 
1096 (2) Optionally, an IP tunnel MAY enable the endpoints of an IP 1097 tunnel to negotiate the choice between the limited-functionality 1098 and the full-functionality option for ECN in the tunnel. 1100 The minimum required to make ECN usable with IP tunnels is the lim- 1101 ited-functionality option, which prevents ECN from being enabled in 1102 the outer header of an IPsec tunnel. Full support for ECN requires 1103 the use of the full-functionality option. If there are no optional 1104 mechanisms for the tunnel endpoints to negotiate a choice between the 1105 limited-functionality or full-functionality option, there can be a 1106 pre-existing agreement between the tunnel endpoints about whether to 1107 support the limited-functionality or the full-functionality ECN 1108 option. 1110 In addition, it is RECOMMENDED that packets with ECT and CE both set 1111 to 1 in the outer header be dropped if they arrive at the tunnel 1112 egress point for a tunnel that uses the limited-functionality option, 1113 or for a tunnel that uses the full-functionality option but for which 1114 the ECT bit in the inner header is set to zero. This is motivated by 1115 backwards compatibility and to ensure that no unauthorized modifica- 1116 tions of the ECN field take place, and is discussed further in the 1117 next Section (9.1.2). 1119 9.1.2. Changes to the ECN Field within an IP Tunnel. 1121 The presence of a copy of the ECN field in the inner header of an IP 1122 tunnel mode packet provides an opportunity for detection of unautho- 1123 rized modifications to the ECT bit in the outer header. Comparison 1124 of the ECT bits in the inner and outer headers falls into two cate- 1125 gories for implementations that conform to this document: 1126 * If the IP tunnel uses the full-functionality option, then the 1127 values of the ECT bits in the inner and outer headers should be 1128 identical. 
1129 * If the tunnel uses the limited-functionality option, then the 1130 ECT bit in the outer header should be 0. 1132 Receipt of a packet not satisfying the appropriate condition could be 1133 a cause of concern. 1135 Consider the case of an IP tunnel where the tunnel ingress point has 1136 not been updated to this document's requirements, while the tunnel 1137 egress point has been updated to support ECN. In this case, the IP 1138 tunnel is not explicitly configured to support the full-functionality 1139 ECN option. However, the tunnel ingress point is behaving identically 1140 to a tunnel ingress point that supports the full-functionality 1141 option. If packets from an ECN-capable connection use this tunnel, 1142 ECT will be set to 1 in the outer header at the tunnel ingress point. 1143 Congestion within the tunnel may then result in ECN-capable routers 1144 setting CE in the outer header. Because the tunnel has not been 1145 explicitly configured to support the full-functionality option, the 1146 tunnel egress point expects the ECT bit in the outer header to be 0. 1147 When an ECN-capable tunnel egress point receives a packet with the 1148 ECT bit in the outer header set to 1, in a tunnel that has not been 1149 configured to support the full-functionality option, that packet 1150 should be processed, according to whether the CE bit is set, as follows. 1151 It is RECOMMENDED that such packets, with the ECT bit in the outer 1152 header set to 1 on a tunnel that has not been configured to support 1153 the full-functionality option, be dropped at the egress point if CE 1154 is set to 1 in the outer header but 0 in the inner header, and for- 1155 warded otherwise. 1157 An IP tunnel cannot provide protection against erasure of congestion 1158 indications based on resetting the value of the CE bit in packets for 1159 which ECT is set in the outer header.
The erasure of congestion 1160 indications may impact the network and other flows in ways that would 1161 not be possible in the absence of ECN. It is important to note that 1162 erasure of congestion indications can only be performed to congestion 1163 indications placed by nodes within the tunnel; the copy of the CE bit 1164 in the inner header preserves congestion notifications from nodes 1165 upstream of the tunnel ingress. If erasure of congestion notifica- 1166 tions is judged to be a security risk that exceeds the congestion 1167 management benefits of ECN, then tunnels could be specified or con- 1168 figured to use the limited-functionality option. 1170 9.2. IPsec Tunnels 1172 IPsec supports secure communication over potentially insecure network 1173 components such as intermediate routers. IPsec protocols support two 1174 operating modes, transport mode and tunnel mode, that span a wide 1175 range of security requirements and operating environments. Transport 1176 mode security protocol header(s) are inserted between the IP (IPv4 or 1177 IPv6) header and higher layer protocol headers (e.g., TCP), and hence 1178 transport mode can only be used for end-to-end security on a connec- 1179 tion. IPsec tunnel mode is based on adding a new "outer" IP header 1180 that encapsulates the original, or "inner" IP header and its associ- 1181 ated packet. Tunnel mode security headers are inserted between these 1182 two IP headers. In contrast to transport mode, the new "outer" IP 1183 header and tunnel mode security headers can be added and removed at 1184 intermediate points along a connection, enabling security gateways to 1185 secure vulnerable portions of a connection without requiring endpoint 1186 participation in the security protocols. 
An important aspect of tun- 1187 nel mode security is that in the original specification, the outer 1188 header is discarded at tunnel egress, ensuring that security threats 1189 based on modifying the IP header do not propagate beyond that tunnel 1190 endpoint. Further discussion of IPsec can be found in [RFC2401]. 1192 The IPsec protocol as originally defined in [ESP, AH] required that 1193 the inner header's ECN field not be changed by IPsec decapsulation 1194 processing at a tunnel egress node; this would have ruled out the 1195 possibility of full-functionality mode for ECN. At the same time, 1196 this would ensure that an adversary's modifications to the ECN field 1197 cannot be used to launch theft- or denial-of-service attacks across 1198 an IPsec tunnel endpoint, as any such modifications will be discarded 1199 at the tunnel endpoint. 1201 In principle, permitting the use of ECN functionality in the outer 1202 header of an IPsec tunnel raises security concerns because an adver- 1203 sary could tamper with the information that propagates beyond the 1204 tunnel endpoint. Based on an analysis (included in Sections 18 and 1205 19) of these concerns and the associated risks, our overall approach 1206 has been to provide configuration support for IPsec changes to remove 1207 the conflict with ECN. 1209 In particular, in tunnel mode the IPsec tunnel MUST support either 1210 the limited-functionality or the full-functionality mode outlined in 1211 Section 9.1.1. 1213 This makes permission to use ECN functionality in the outer header of 1214 an IPsec tunnel a configurable part of the corresponding IPsec Secu- 1215 rity Association (SA), so that it can be disabled in situations where 1216 the risks are judged to outweigh the benefits. 
The result is that an 1217 IPsec security administrator is presented with two alternatives for 1218 the behavior of ECN-capable connections within an IPsec tunnel, the 1219 limited-functionality alternative and full-functionality alternative 1220 described earlier. All IPsec implementations MUST implement either 1221 the limited-functionality or the full-functionality alternative in 1222 order to eliminate incompatibility between ECN and IPsec tunnels, but 1223 implementers MAY choose to implement either alternative. 1225 In addition, this document specifies how the endpoints of an IPsec 1226 tunnel could negotiate enabling ECN functionality in the outer head- 1227 ers of that tunnel based on security policy. The ability to negoti- 1228 ate ECN usage between tunnel endpoints would enable a security admin- 1229 istrator to disable ECN in situations where she believes the risks 1230 (e.g., of lost congestion notifications) outweigh the benefits of 1231 ECN. 1233 The IPsec protocol, as defined in [ESP, AH], does not include the IP 1234 header's ECN field in any of its cryptographic calculations (in the 1235 case of tunnel mode, the outer IP header's ECN field is not 1236 included). Hence modification of the ECN field by a network node has 1237 no effect on IPsec's end-to-end security, because it cannot cause any 1238 IPsec integrity check to fail. As a consequence, IPsec does not pro- 1239 vide any defense against an adversary's modification of the ECN field 1240 (i.e., a man-in-the-middle attack), as the adversary's modification 1241 will also have no effect on IPsec's end-to-end security. In some 1242 environments, the ability to modify the ECN field without affecting 1243 IPsec integrity checks may constitute a covert channel; if it is nec- 1244 essary to eliminate such a channel or reduce its bandwidth, then the 1245 IPsec tunnel should be run in limited-functionality mode. 1247 9.2.1. 
Negotiation between Tunnel Endpoints 1249 This section describes the detailed changes to enable usage of ECN 1250 over IPsec tunnels, including the negotiation of ECN support between 1251 tunnel endpoints. This is supported by three changes to IPsec: 1252 * An optional Security Association Database (SAD) field indicating 1253 whether tunnel encapsulation and decapsulation processing allows 1254 or forbids ECN usage in the outer IP header. 1255 * An optional Security Association Attribute that enables negotia- 1256 tion of this SAD field between the two endpoints of an SA that 1257 supports tunnel mode. 1258 * Changes to tunnel mode encapsulation and decapsulation process- 1259 ing to allow or forbid ECN usage in the outer IP header based on 1260 the value of the SAD field. When ECN usage is allowed in the 1261 outer IP header, ECT is set in the outer header for ECN-capable 1262 connections and congestion notifications (indicated by the CE bit) 1263 from such connections are propagated to the inner header at tunnel 1264 egress. 1266 If negotiation of ECN usage is implemented, then the SAD field SHOULD 1267 also be implemented. On the other hand, negotiation of ECN usage is 1268 OPTIONAL in all cases, even for implementations that support the SAD 1269 field. The encapsulation and decapsulation processing changes are 1270 REQUIRED, but MAY be implemented without the other two changes by 1271 assuming that ECN usage is always forbidden. The full-functionality 1272 alternative for ECN usage over IPsec tunnels consists of the SAD 1273 field and the full version of encapsulation and decapsulation pro- 1274 cessing changes, with or without the OPTIONAL negotiation support. 1275 The limited-functionality alternative consists of a subset of the 1276 encapsulation and decapsulation changes that always forbids ECN 1277 usage. 1279 These changes are covered further in the following three subsections. 1281 9.2.1.1. 
ECN Tunnel Security Association Database Field 1283 Full ECN functionality adds a new field to the SAD (see [RFC2401]): 1285 ECN Tunnel: allowed or forbidden. 1287 Indicates whether ECN-capable connections using this SA in tunnel 1288 mode are permitted to receive ECN congestion notifications for 1289 congestion occurring within the tunnel. The allowed value enables 1290 ECN congestion notifications. The forbidden value disables such 1291 notifications, causing all congestion to be indicated via dropped 1292 packets. 1294 [OPTIONAL. The value of this field SHOULD be assumed to be 1295 "forbidden" in implementations that do not support it.] 1297 If this attribute is implemented, then the SA specification in a 1298 Security Policy Database (SPD) entry MUST support a corresponding 1299 attribute, and this SPD attribute MUST be covered by the SPD adminis- 1300 trative interface (currently described in Section 4.4.1 of 1301 [RFC2401]). 1303 9.2.1.2. ECN Tunnel Security Association Attribute 1305 A new IPsec Security Association Attribute is defined to enable the 1306 support for ECN congestion notifications based on the outer IP header 1307 to be negotiated for IPsec tunnels (see [RFC2407]). This attribute 1308 is OPTIONAL, although implementations that support it SHOULD also 1309 support the SAD field defined in Section 9.2.1.1.
1311 Attribute Type
1313        class               value           type
1314        -------------------------------------------------
1315        ECN Tunnel          10              Basic
1317 The IPsec SA Attribute value 10 has been allocated by IANA to indi- 1318 cate that the ECN Tunnel SA Attribute is being negotiated; the type 1319 of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class 1320 Values are used to conduct the negotiation. See [RFC2407, RFC2408, 1321 RFC2409] for further information including encoding formats and 1322 requirements for negotiating this SA attribute.
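As an illustration of the Basic attribute described above, the following Python sketch encodes and decodes the ECN Tunnel attribute. It assumes the Type/Value form for data attributes given in [RFC2408], in which the high-order (AF) bit of the first 16-bit word is set to 1, the remaining 15 bits carry the attribute type, and the second 16-bit word carries the value. The function and constant names are hypothetical, and the class values used (Allowed = 1, Forbidden = 2) are those listed under "Class Values" in this section.

```python
import struct
from typing import Optional

# Illustrative sketch only; names are hypothetical.
AF_TYPE_VALUE = 0x8000  # assumed: AF bit set selects the Type/Value form
ECN_TUNNEL = 10         # IPsec SA attribute value for ECN Tunnel
ALLOWED = 1             # class value "Allowed"
FORBIDDEN = 2           # class value "Forbidden"


def encode_ecn_tunnel(value: int) -> bytes:
    """Encode the ECN Tunnel attribute as two 16-bit words, network order."""
    if value not in (ALLOWED, FORBIDDEN):
        raise ValueError("unsupported ECN Tunnel class value")
    return struct.pack("!HH", AF_TYPE_VALUE | ECN_TUNNEL, value)


def decode_ecn_tunnel(attribute: Optional[bytes]) -> int:
    """Return the class value; an absent attribute defaults to Forbidden."""
    if attribute is None:
        return FORBIDDEN
    type_word, value = struct.unpack("!HH", attribute)
    if type_word != (AF_TYPE_VALUE | ECN_TUNNEL):
        raise ValueError("not an ECN Tunnel attribute")
    return value
```

Treating a missing attribute as Forbidden mirrors the default stated for this attribute, so a peer that never proposes ECN Tunnel is handled the same way as one that explicitly forbids it.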
1324 Class Values 1326 ECN Tunnel 1328 Specifies whether ECN functionality is allowed to 1329 be used with Tunnel Encapsulation Mode. 1330 This affects tunnel encapsulation and decapsulation processing - 1331 see Section 9.2.1.3. 1333 RESERVED 0 1334 Allowed 1 1335 Forbidden 2 1337 Values 3-61439 are reserved to IANA. Values 61440-65535 are for 1338 private use. 1340 If unspecified, the default shall be assumed to be Forbidden. 1342 ECN Tunnel is a new SA attribute, and hence initiators that use it 1343 can expect to encounter responders that do not understand it, and 1344 therefore reject proposals containing it. For backwards compatibil- 1345 ity with such implementations initiators SHOULD always also include a 1346 proposal without the ECN Tunnel attribute to enable such a responder 1347 to select a transform or proposal that does not contain the ECN Tun- 1348 nel attribute. RFC 2407 currently requires responders to reject all 1349 proposals if any proposal contains an unknown attribute; this 1350 requirement is expected to be changed to require a responder not to 1351 select proposals or transforms containing unknown attributes. 1353 9.2.1.3. 
Changes to IPsec Tunnel Header Processing 1355 For full ECN support, the encapsulation and decapsulation processing 1356 for the IPv4 TOS field and the IPv6 Traffic Class field is changed 1357 from that specified in [RFC2401] to the following:
1359                     <-- How Outer Hdr Relates to Inner Hdr -->
1360                     Outer Hdr at                 Inner Hdr at
1361 IPv4                Encapsulator                 Decapsulator
1362   Header fields:    --------------------         ------------
1363     DS Field        copied from inner hdr (5)    no change
1364     ECN Field       constructed (7)              constructed (8)
1366 IPv6
1367   Header fields:
1368     DS Field        copied from inner hdr (6)    no change
1369     ECN Field       constructed (7)              constructed (8)
1371 (5)(6) If the packet will immediately enter a domain for which the 1372 DSCP value in the outer header is not appropriate, that value MUST 1373 be mapped to an appropriate value for the domain [RFC2474]. Also 1374 see [RFC2475] for further information. 1376 (7) If the value of the ECN Tunnel field in the SAD entry for this 1377 SA is "allowed" and the value of ECT (bit 0) is 1 in the inner 1378 header, set ECT to 1 in the outer header, else set ECT to 0 in the 1379 outer header. Set CE (bit 1) to 0 in the outer header. 1381 (8) If the value of the ECN Tunnel field in the SAD entry for this 1382 SA is "allowed" and the value of ECT (bit 0) in the inner header 1383 is 1, then set the CE bit (bit 1) in the inner header to the logi- 1384 cal OR of the CE bit in the inner header with the CE bit in the 1385 outer header, else make no change to the ECN field. 1387 Notes (5) and (6) are identical here; both numbers are retained to 1388 match usage in [RFC2401], where the two notes differ. 1390 The above description applies to implementations that support the ECN 1391 Tunnel field in the SAD; such implementations MUST implement this 1392 processing instead of the processing of the IPv4 TOS octet and IPv6 1393 Traffic Class octet defined in [RFC2401]. This constitutes the full- 1394 functionality alternative for ECN usage with IPsec tunnels.
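The construction of the ECN field in notes (7) and (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a normative implementation: the function names are hypothetical, the SAD field is modeled as a boolean, and the masks assume that the ECN field occupies the two low-order bits of the octet, with ECT = 0x02 and CE = 0x01 (the bit 6 and bit 7 positions used in Section 9.1.1). Handling of the DS field is omitted.

```python
# Illustrative sketch only; names and masks are assumptions.
ECT = 0x02  # ECN-Capable Transport flag (assumed mask)
CE = 0x01   # Congestion Experienced flag (assumed mask)


def outer_ecn_at_encapsulation(inner_octet, ecn_tunnel_allowed):
    """Note (7): construct the ECN field of the outer header."""
    if ecn_tunnel_allowed and (inner_octet & ECT):
        return ECT  # ECT = 1; CE is always set to 0 in the outer header
    return 0        # ECT = 0, CE = 0


def inner_octet_at_decapsulation(inner_octet, outer_octet, ecn_tunnel_allowed):
    """Note (8): construct the ECN field of the inner header."""
    if ecn_tunnel_allowed and (inner_octet & ECT):
        # Logical OR of the inner CE bit with the outer CE bit.
        return inner_octet | (outer_octet & CE)
    return inner_octet  # no change to the ECN field
```

With the SAD field modeled as "forbidden" (False), both functions reduce to clearing the outer ECN field and leaving the inner header untouched, so a single code path can serve either alternative.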
1396 An implementation that does not support the ECN Tunnel field in the 1397 SAD MUST implement this processing by assuming that the value of the 1398 ECN Tunnel field of the SAD is "forbidden" for every SA. In this 1399 case, the processing of the ECN field reduces to: 1401 (7) Set the ECN field (ECT and CE bits) to zero in the outer 1402 header. 1403 (8) Make no change to the ECN field in the inner header. 1405 This constitutes the limited-functionality alternative for ECN usage 1406 with IPsec tunnels. 1408 For backwards compatibility, packets with ECT and CE both set to 1 in 1409 the outer header SHOULD be dropped if they arrive on an SA that is 1410 using the limited-functionality option, or on an SA that is using the 1411 full-functionality option (and may therefore set the ECT flag in the 1412 outer header to 1) when the packet's ECT flag is set to 0 in the inner 1413 header. 1415 9.2.2. Changes to the ECN Field within an IPsec Tunnel. 1417 If the ECN Field is changed inappropriately within an IPsec tunnel, 1418 and this change is detected at the tunnel egress, then the receipt of 1419 a packet not satisfying the appropriate condition for its SA is an 1420 auditable event. An implementation MAY create audit records with 1421 per-SA counts of incorrect packets over some time period rather than 1422 creating an audit record for each erroneous packet. Any such audit 1423 record SHOULD contain the headers from at least one erroneous packet, 1424 but need not contain the headers from every packet represented by the 1425 entry. 1427 9.2.3. Comments for IPsec Support 1429 Substantial comments were received on two areas of this document dur- 1430 ing review by the IPsec working group. This section describes these 1431 comments and explains why the proposed changes were not incorporated. 1433 The first comment indicated that per-node configuration is easier to 1434 implement than per-SA configuration.
After serious thought and 1435 despite some initial encouragement of per-node configuration, it no 1436 longer seems to be a good idea. The concern is that as ECN-awareness 1437 is progressively deployed in IPsec, many ECN-aware IPsec implementa- 1438 tions will find themselves communicating with a mixture of ECN-aware 1439 and ECN-unaware IPsec tunnel endpoints. In such an environment with 1440 per-node configuration, the only reasonable thing to do is forbid ECN 1441 usage for all IPsec tunnels, which is not the desired outcome. 1443 In the second area, several reviewers noted that SA negotiation is 1444 complex, and adding to it is non-trivial. One reviewer suggested 1445 using ICMP after tunnel setup as a possible alternative. The addi- 1446 tion to SA negotiation in this document is OPTIONAL and will remain 1447 so; implementers are free to ignore it. The authors believe that the 1448 assurance it provides can be useful in a number of situations. In 1449 practice, if this is not implemented, it can be deleted at a subse- 1450 quent stage in the standards process. Extending ICMP to negotiate 1451 ECN after tunnel setup is more complex than extending SA attribute 1452 negotiation. Some tunnels do not permit traffic to be addressed to 1453 the tunnel egress endpoint, hence the ICMP packet would have to be 1454 addressed to somewhere else, scanned for by the egress endpoint, and 1455 discarded there or at its actual destination. In addition, ICMP 1456 delivery is unreliable, and hence there is a possibility of an ICMP 1457 packet being dropped, entailing the invention of yet another 1458 ack/retransmit mechanism. It seems better simply to specify an 1459 OPTIONAL extension to the existing SA negotiation mechanism. 1461 9.3. IP packets encapsulated in non-IP packet headers. 1463 A different set of issues are raised, relative to ECN, when IP pack- 1464 ets are encapsulated in tunnels with non-IP packet headers. 
This 1465 occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. 1466 For these protocols, there is no conflict with ECN; it is just that 1467 ECN cannot be used within the tunnel unless an ECN codepoint can be 1468 specified for the header of the encapsulating protocol. Earlier work 1469 considered a preliminary proposal for incorporating ECN into MPLS, 1470 and proposals for incorporating ECN into GRE, L2TP, or PPTP will be 1471 considered as the need arises. 1473 10. Issues Raised by Monitoring and Policing Devices 1475 One possibility is that monitoring and policing devices (or more 1476 informally, "penalty boxes") will be installed in the network to mon- 1477 itor whether best-effort flows are appropriately responding to con- 1478 gestion, and to preferentially drop packets from flows determined not 1479 to be using adequate end-to-end congestion control procedures. 1481 We recommend that any "penalty box" that detects a flow or an aggre- 1482 gate of flows that is not responding to end-to-end congestion control 1483 first change from marking to dropping packets from that flow, before 1484 taking any additional action to restrict the bandwidth available to 1485 that flow. Thus, initially, the router may drop packets in which the 1486 router would otherwise have set the CE bit. This could include 1487 dropping those arriving packets for that flow that are ECN-Capable 1488 and that already have the CE bit set. In this way, any congestion 1489 indications seen by that router for that flow will be guaranteed to 1490 also be seen by the end nodes, even in the presence of malicious or 1491 broken routers elsewhere in the path.
If we assume that the first 1492 action taken at any "penalty box" for an ECN-capable flow will be to 1493 drop packets instead of marking them, then there is no way that an 1494 adversary that subverts ECN-based end-to-end congestion control can 1495 cause a flow to be characterized as being non-cooperative and placed 1496 into a more severe action within the "penalty box". 1498 The monitoring and policing devices that are actually deployed could 1499 fall short of the `ideal' monitoring device described above, in that 1500 the monitoring is applied not to a single flow, but to an aggregate 1501 of flows (e.g., those sharing a single IPsec tunnel). In this case, 1502 the switch from marking to dropping would apply to all of the flows 1503 in that aggregate, denying the benefits of ECN to the other flows in 1504 the aggregate also. At the highest level of aggregation, another 1505 form of the disabling of ECN happens even in the absence of monitor- 1506 ing and policing devices, when ECN-Capable RED queues switch from 1507 marking to dropping packets as an indication of congestion when the 1508 average queue size has exceeded some threshold. 1510 If there were serious operational problems with routers inappropri- 1511 ately erasing the CE bit in packet headers, this could be addressed 1512 to some extent by including a one-bit ECN nonce in packet headers. 1513 Routers would erase the nonce when they set the CE bit [SCWA99]. 1514 Routers that erased the CE bit would face additional difficulty in 1515 reconstructing the original nonce, and thus repeated erasure of the 1516 CE bit would be more likely to be detected by the end-nodes. (This 1517 could in fact be done without adding any extra bits for ECN in the IP 1518 header, by using the ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1) 1519 as the two values for the nonce, and by defining the codepoint 1520 (ECT=0, CE=1) to mean exactly the same as the codepoint (ECT=1, 1521 CE=0).) 
However, at this point the potential danger of misbehaving 1522 routers does not seem of sufficient concern to warrant this addi- 1523 tional complication of adding an ECN nonce to protect against the 1524 erasure of the CE bit. Additional research is also needed to better 1525 understand the value of such a nonce and appropriate means of gener- 1526 ating sequences of nonce values that an adversary will find suffi- 1527 ciently difficult to reconstruct. 1529 An ECN nonce would also address the problem of misbehaving transport 1530 receivers lying to the transport sender about whether or not the CE 1531 bit was set in a packet. However, another possibility is for the 1532 data sender to test for a misbehaving receiver directly, by occasion- 1533 ally sending a data packet with ECT and CE set, to see if the 1534 receiver reports receiving the CE bit. Of course, if these packets 1535 encountered congestion in the network, the router would make no 1536 change in the packets, because the CE bit would already be set. 1537 Thus, for packets sent with the ECT and CE bits set, the TCP end- 1538 nodes could not determine if some router intended to set the CE bit 1539 in these packets. For this reason, sending packets with the ECT and 1540 CE bits would have to be done very sparingly. In addition, the TCP 1541 sender would have to remember which packets were sent with the ECT 1542 and CE bits set, so that it doesn't react to them as if there was 1543 congestion in the network. We believe that further research is 1544 needed on possible transport-based mechanisms for verifying that the 1545 transport receiver does not lie to the transport sender about the 1546 receipt of congestion indications. 1548 11. Evaluations of ECN 1550 This section discusses some of the related work evaluating the use of 1551 ECN. The ECN Web Page [ECN] has pointers to other papers, as well as 1552 to implementations of ECN. 
1554 [Floyd94] considers the advantages and drawbacks of adding ECN to the 1555 TCP/IP architecture. As shown in the simulation-based comparisons, 1556 one advantage of ECN is to avoid unnecessary packet drops for short 1557 or delay-sensitive TCP connections. A second advantage of ECN is in 1558 avoiding some unnecessary retransmit timeouts in TCP. This paper 1559 discusses in detail the integration of ECN into TCP's congestion con- 1560 trol mechanisms. The possible disadvantages of ECN discussed in the 1561 paper are that a non-compliant TCP connection could falsely advertise 1562 itself as ECN-capable, and that a TCP ACK packet carrying an ECN-Echo 1563 message could itself be dropped in the network. The first of these 1564 two issues is discussed in the appendix of this document, and the 1565 second is addressed by the addition of the CWR flag in the TCP 1566 header. 1568 Experimental evaluations of ECN include [RFC2884,K98]. The conclu- 1569 sions of [K98] and [RFC2884] are that ECN TCP gets moderately better 1570 throughput than non-ECN TCP; that ECN TCP flows are fair towards non- 1571 ECN TCP flows; and that ECN TCP is robust with two-way traffic (with 1572 congestion in both directions) and with multiple congested gateways. 1573 Experiments with many short web transfers show that, while most of 1574 the short connections have similar transfer times with or without 1575 ECN, a small percentage of the short connections have very long 1576 transfer times for the non-ECN experiments as compared to the ECN 1577 experiments. 1579 12. Summary of changes required in IP and TCP 1581 This document specified two bits in the IP header, the ECN-Capable 1582 Transport (ECT) bit and the Congestion Experienced (CE) bit, to be 1583 used for ECN. The ECT bit set to "0" indicates that the transport 1584 protocol will ignore the CE bit. This is the default value for the 1585 ECT bit. 
The ECT bit set to "1" indicates that the transport proto- 1586 col is willing and able to participate in ECN. 1588 The default value for the CE bit is "0". The router sets the CE bit 1589 to "1" to indicate congestion to the end nodes. The CE bit in a 1590 packet header MUST NOT be reset by a router from "1" to "0". 1592 When viewed in terms of code points, this document has defined three 1593 code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but 1594 not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code 1595 point of (ECT=0, CE=1) is not defined in this document. One possi- 1596 bility would be for this code point to be used, some time in the 1597 future, for some other function for non-ECN-capable packets. A sec- 1598 ond possibility would be for this code point to be used as an ECN 1599 nonce, as described earlier in the document. A third possibility 1600 would be for the code point (ECT=0, CE=1) to be used to indicate that 1601 the packet is ECN-capable for an alternate semantics for the Conges- 1602 tion Experienced indication. However, at this time the code point 1603 (ECT=0, CE=1) remains undefined. 1605 TCP requires three changes for ECN: a setup phase and two new flags 1606 in the TCP header. The ECN-Echo flag is used by the data receiver to 1607 inform the data sender of a received CE packet. The Congestion Win- 1608 dow Reduced (CWR) flag is used by the data sender to inform the data 1609 receiver that the congestion window has been reduced. 1611 When ECN (Explicit Congestion Notification [RFC2481]) is used, it is 1612 required that congestion indications generated within an IP tunnel 1613 not be lost at the tunnel egress. We specified a minor modification 1614 to the IP protocol's handling of the ECN field during encapsulation 1615 and decapsulation to allow flows that will undergo IP tunneling to 1616 use ECN.

Two options for ECN in tunnels were specified:

1) A limited-functionality option that does not use ECN inside the IP
   tunnel, by turning the ECT bit in the outer header off, and not
   altering the inner header at the time of decapsulation.

2) The full-functionality option, which copies the ECT bit of the
   inner header to the encapsulating header. At decapsulation, if the
   ECT bit is set in the inner header, the CE bit on the outer header
   is ORed with the CE bit of the inner header to update the CE bit
   of the packet.

All IP tunnels MUST implement one of the two alternative approaches
described above. For IPsec tunnels, this document also defines an
optional IPsec Security Association (SA) attribute that enables
negotiation of ECN usage within IPsec tunnels and an optional field
in the Security Association Database to indicate whether ECN is
permitted in tunnel mode on an SA. The required changes to IPsec
tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the
IPsec architecture and specifies some aspects of its implementation.
The new IPsec SA attribute is in addition to those already defined in
Section 4.5 of [RFC2407].

This document is intended to obsolete RFC 2481, "A Proposal to add
Explicit Congestion Notification (ECN) to IP", which defined ECN as
an Experimental Protocol for the Internet Community. The rest of
this section describes the relationship between this document and its
predecessor.

RFC 2481 included a brief discussion of the use of ECN with
encapsulated packets, and noted that for the IPsec specifications at
the time (January 1999), flows could not safely use ECN if they were
to traverse IPsec tunnels. RFC 2481 also described the changes that
could be made to IPsec tunnel specifications to make them compatible
with ECN.
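
The two tunnel options specified earlier in this section can be
sketched in Python as follows. This is a minimal illustration, not a
definitive implementation: the function names encapsulate and
decapsulate are hypothetical, headers are reduced to (ECT, CE)
tuples, and the outer CE bit is assumed to start at 0 on
encapsulation, which the summary above does not state.

```python
def encapsulate(inner, full_functionality):
    """Return the outer-header (ECT, CE) for a packet entering a tunnel.

    Assumption: the outer CE bit starts at 0 at the tunnel ingress.
    """
    inner_ect, _inner_ce = inner
    if full_functionality:
        return (inner_ect, 0)  # copy the ECT bit of the inner header
    return (0, 0)              # limited option: ECT off in outer header


def decapsulate(inner, outer, full_functionality):
    """Return the (ECT, CE) of the packet leaving the tunnel."""
    inner_ect, inner_ce = inner
    _outer_ect, outer_ce = outer
    if full_functionality and inner_ect == 1:
        # OR the outer CE bit into the inner CE bit.
        return (inner_ect, inner_ce | outer_ce)
    # Limited option, or inner packet not ECN-capable: inner header
    # is left unaltered.
    return (inner_ect, inner_ce)
```

With the full-functionality option, a congestion mark set on the outer
header inside the tunnel survives decapsulation; with the
limited-functionality option, the inner header is delivered exactly as
it entered.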

This document also incorporates work that was done after RFC 2481.
First was to describe the changes to IPsec tunnels in detail, and
extensively discuss the security implications of ECN (now included as
Sections 18 and 19 of this document). Second was to extend the
discussion of IPsec tunnels to include all IP tunnels. Because older
IP tunnels are not compatible with a flow's use of ECN, the
deployment of ECN in the Internet will create strong pressure for
older IP tunnels to be updated to an ECN-compatible version, using
either the limited-functionality or the full-functionality option.

This document does not address the issue of including ECN in non-IP
tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary
document about adding ECN support to MPLS was not advanced.

A third new piece of work after RFC 2481 was to describe the ECN
procedure with retransmitted data packets, namely that the ECT bit
should not be set on retransmitted data packets. The motivation for
this additional specification is to eliminate a possible avenue for
denial-of-service attacks on an existing TCP connection. Some prior
deployments of ECN-capable TCP might not conform to the (new)
requirement not to set the ECT bit on retransmitted packets; we do
not believe this will cause significant problems in practice.

This document also expands slightly on the specification of the use
of SYN packets for the negotiation of ECN. While some prior
deployments of ECN-capable TCP might not conform to the requirements
specified in this document, we do not believe that this will lead to
any performance or compatibility problems for TCP connections with a
combination of TCP implementations at the endpoints.

13. Conclusions

Given the current effort to implement AQM, we believe this is the
right time to deploy congestion avoidance mechanisms that do not
depend on packet drops alone. With the increased deployment of
applications and transports sensitive to the delay and loss of a
single packet (e.g., realtime traffic, short web transfers),
depending on packet loss as a normal congestion notification
mechanism appears to be insufficient (or at the very least,
non-optimal).

We examined the consequence of modifications of the ECN field within
the network, analyzing all the opportunities for an adversary to
change the ECN field. In many cases, the change to the ECN field is
no worse than dropping a packet. However, we noted that some changes
have the more serious consequence of subverting end-to-end congestion
control. Even then, the potential damage is limited, and is similar
to the threat posed by end-systems intentionally failing to cooperate
with end-to-end congestion control.

14. Acknowledgements

Many people have made contributions to this work and this document,
including many that we have not managed to directly acknowledge in
this document. In addition, we would like to thank Kenjiro Cho for
the proposal for the TCP mechanism for negotiating ECN-Capability,
Kevin Fall for the proposal of the CWR bit, Steve Blake for material
on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for
discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian
Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern
Paxson for discussions of security issues. We also thank the
Internet End-to-End Research Group for ongoing discussions of these
issues.

Email discussions with a number of people, including Alexey
Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed
the issues raised by non-conformant equipment in the Internet that
does not respond to TCP SYN packets with the ECE and CWR flags set.
We thank Mark Handley, Jitentra Padhye, and others for discussions on
the TCP initialization procedures.

The discussion of ECN and IP tunnel considerations draws heavily on
related discussions and documents from the Differentiated Services
Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh,
for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen
for proposing modifications to RFC 2407 that improve the usability of
negotiating the ECN Tunnel SA attribute.

15. References

[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402,
November 1998.

[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.

[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html".
Reference for informational purposes only.

[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload",
RFC 2406, November 1998.

[FJ93] Floyd, S. and V. Jacobson, "Random Early Detection gateways
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
N.4, August 1993, p. 397-413.

[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
URL "http://www-mash.cs.berkeley.edu/ns/", test
tcl/test/test-all-ecn. Reference for informational purposes only.

[FF99] Floyd, S. and K. Fall, "Promoting the Use of End-to-End
Congestion Control in the Internet", IEEE/ACM Transactions on
Networking, August 1999.

[FRED] Lin, D. and R. Morris, "Dynamics of Random Early Detection",
SIGCOMM '97, September 1997.

[GRE] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
Routing Encapsulation (GRE)", RFC 1701, October 1994.

[Jacobson88] Jacobson, V., "Congestion Avoidance and Control", Proc.
ACM SIGCOMM '88, pp. 314-329.

[Jacobson90] Jacobson, V., "Modified TCP Congestion Avoidance
Algorithm", Message to end2end-interest mailing list, April 1990.
URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
benefits for TCP", Master's thesis, UCLA, 1998, URL
"http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

[L2TP] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G. and
B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August
1999.

[MJV96] McCanne, S., Jacobson, V. and M. Vetterli, "Receiver-driven
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

[MPLS] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J.
McManus, "Requirements for Traffic Engineering Over MPLS", RFC 2702,
September 1999.

[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W.
and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637,
July 1999.

[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
1981.

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.

[RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the
Internet Checksum", RFC 1141, January 1990.

[RFC1349] Almquist, P., "Type of Service in the Internet Protocol
Suite", RFC 1349, July 1992.

[RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC
1455, May 1993.

[RFC1701] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
Routing Encapsulation (GRE)", RFC 1701, October 1994.

[RFC1702] Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
Routing Encapsulation over IPv4 networks", RFC 1702, October 1994.

[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.

[RFC2309] Braden, B., et al., "Recommendations on Queue Management
and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[RFC2407] Piper, D., "The Internet IP Security Domain of
Interpretation for ISAKMP", RFC 2407, November 1998.

[RFC2408] Maughan, D., Schertler, M., Schneider, M. and J. Turner,
"Internet Security Association and Key Management Protocol (ISAKMP)",
RFC 2408, November 1998.

[RFC2409] Harkins, D. and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition
of the Differentiated Services Field (DS Field) in the IPv4 and IPv6
Headers", RFC 2474, December 1998.

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and
W. Weiss, "An Architecture for Differentiated Services", RFC 2475,
December 1998.

[RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit
Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[RFC2581] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.

[RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
Explicit Congestion Notification (ECN) in IP Networks", RFC 2884,
July 2000.

[RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983,
October 2000.

[RFC2780] S. Bradner and V.
Paxson, "IANA Allocation Guidelines For Values In the Internet
Protocol and Related Headers", RFC 2780, March 2000.

[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
Congestion Avoidance in Computer Networks", ACM Transactions on
Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom
Anderson, "TCP Congestion Control with a Misbehaving Receiver", ACM
Computer Communications Review, October 1999.

16. Security Considerations

Security considerations have been discussed in Sections 7, 8, 18, and
19.

17. IPv4 Header Checksum Recalculation

IPv4 header checksum recalculation is an issue with some high-end
router architectures using an output-buffered switch, since most if
not all of the header manipulation is performed on the input side of
the switch, while the ECN decision would need to be made local to the
output buffer. This is not an issue for IPv6, since there is no IPv6
header checksum. The IPv4 TOS octet is the last byte of a 16-bit
half-word.

RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
checksum after the TTL field is decremented. The incremental
updating of the IPv4 checksum after the CE bit was set would work as
follows: Let HC be the original header checksum, and let HC' be the
new header checksum after the CE bit has been set. Then for header
checksums calculated with one's complement subtraction, HC' would be
recalculated as follows:

     HC' = { HC - 1     HC > 1
           { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC'
would be recalculated as follows after the CE bit was set:

     HC' = { HC - 1     HC > 0
           { 0xFFFE     HC = 0

18. Possible Changes to the ECN Field in the Network

This section discusses in detail possible changes to the ECN field in
the network, such as falsely reporting congestion, disabling
ECN-Capability for an individual packet, erasing the ECN congestion
indication, or falsely indicating ECN-Capability. We represent the
ECN bits in the IP header by the tuple (ECT bit, CE bit).

18.1. Possible Changes to the IP Header

18.1.1. Erasing the Congestion Indication

First, we consider the changes that a router could make that would
result in effectively erasing the congestion indication after it had
been set by a router upstream. The convention followed is:
(ECT, CE) of received packet -> (ECT, CE) of packet transmitted.

     (1, 1) -> (1, 0): erase only the CE bit that was set.
     (1, 1) -> (0, 0): erase both the ECT bit and the CE bit.
     (1, 1) -> (0, 1): erase the ECT bit.

The first change turns off the CE bit after it has been set by some
upstream router along the path. The consequence for the upstream
router is that there is a potential for congestion to build for a
time, because the congestion indication does not reach the source.
However, the packet would be received and acknowledged.

The potential effect of erasing the congestion indication is complex,
and is discussed in depth in Section 19 below. Note that the effect
of erasing the congestion indication is different from dropping a
packet in the network. When a data packet is dropped, the drop is
detected by the TCP sender, and interpreted as an indication of
congestion. Similarly, if a sufficient number of consecutive
acknowledgement packets are dropped, causing the cumulative
acknowledgement field not to be advanced at the sender, the sender is
limited by the congestion window from sending additional packets, and
ultimately the retransmit timer expires.

In contrast, a systematic erasure of the CE bit by a downstream
router can have the effect of causing a queue buildup at an upstream
router, including the possible loss of packets due to buffer
overflow. There is a potential of unfairness in that another flow
that goes through the congested router could react to the CE bit set
while the flow that has the CE bit erased could see better
performance. The limitations on this potential unfairness are
discussed in more detail in Section 19 below.

The second change is to turn off both the ECT and the CE bits, thus
erasing the congestion indication and disabling ECN-Capability at the
same time. The third change turns off only the ECT bit, disabling
ECN-Capability.

Within an IP tunnel using the full-functionality option, the third
change would not erase the congestion indication, but would only
disable ECN-Capability for that packet within the rest of the tunnel.
However, when performed outside of an IP tunnel, the third change
would also effectively erase the congestion indication, because an
ECN field of (0, 1) is undefined.

The `erasure' of the congestion indication is only effective if the
packet does not end up being marked or dropped again by a downstream
router. With the first change, the packet remains ECN-Capable, and
could be either marked or dropped by a downstream router as an
indication of congestion. With the second and third changes, the
packet is no longer ECN-capable, and can therefore be dropped but not
marked by a downstream router as an indication of congestion.

18.1.2. Falsely Reporting Congestion

     (1, 0) -> (1, 1)

This change is to set the CE bit when the ECT bit was already set,
even though there was no congestion. This change does not affect the
treatment of that packet along the rest of the path.
In particular, a router does not examine the CE bit in deciding
whether to drop or mark an arriving packet.

However, this could result in the application unnecessarily invoking
end-to-end congestion control, and reducing its arrival rate. By
itself, this is no worse (for the application or for the network)
than if the tampering router had actually dropped the packet.

18.1.3. Disabling ECN-Capability

     (1, 0) -> (0, *)

This change is to turn off the ECT bit of a packet that does not have
the CE bit set. (Section 18.1.1 discussed the case of turning off
the ECT bit of a packet that does have the CE bit set.) This means
that if the packet later encounters congestion (e.g., by arriving at
a RED queue with a moderate average queue size), it will be dropped
instead of being marked. By itself, this is no worse (for the
application) than if the tampering router had actually dropped the
packet. The saving grace in this particular case is that there is no
congested router upstream expecting a reaction from setting the CE
bit.

18.1.4. Falsely Indicating ECN-Capability

This change would incorrectly label a packet as ECN-Capable. The
packet may have been sent either by an ECN-Capable transport or a
transport that is not ECN-Capable.

     (0, *) -> (1, 0);
     (0, *) -> (1, 1);

If the packet later encounters moderate congestion at an ECN-Capable
router, the router could set the CE bit instead of dropping the
packet. If the transport protocol in fact is not ECN-Capable, then
the transport will never receive this indication of congestion, and
will not reduce its sending rate in response. The potential
consequences of falsely indicating ECN-capability are discussed
further in Section 19 below.

If the packet never later encounters congestion at an ECN-Capable
router, then the first of these two changes would have no effect.
The second change, however, would have the effect of giving false
reports of congestion to a monitoring device along the path. If the
transport protocol is ECN-Capable, then the second of these two
changes (when, for example, (0,0) was changed to (1,1)) could also
have an effect at the transport level, by combining falsely
indicating ECN-Capability with falsely reporting congestion. For an
ECN-capable transport, this would cause the transport to
unnecessarily react to congestion. In this particular case, the
router that is incorrectly changing the ECN field could have dropped
the packet. Thus for this case of an ECN-capable transport, the
consequence of this change to the ECN field is no worse than dropping
the packet.

18.1.5. Changes with No Functional Effect

     (0, *) -> (0, *)

The CE bit is ignored in a packet that does not have the ECT bit set.
Thus, this change would have no effect, in terms of ECN.

18.2. Information carried in the Transport Header

For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
ECN-capable at the TCP level, conveying this information in the TCP
header at the time the connection is set up. This document does not
consider potential dangers introduced by changes in the transport
header within the network. In the case of IPsec tunnels, the IPsec
tunnel protects the transport header.

Another issue concerns TCP packets with a spoofed IP source address
carrying invalid ECN information in the transport header. For
completeness, we examine here some possible ways that a node spoofing
the IP source address of another node could use the two ECN flags in
the TCP header to launch a denial-of-service attack.
However, these attacks would require an ability for the attacker to
use valid TCP sequence numbers, and any attacker with this ability
and with the ability to spoof IP source addresses could damage the
TCP connection without using the ECN flags. Therefore, ECN does not
add any new vulnerabilities in this respect.

An acknowledgement packet with a spoofed IP source address of the TCP
data receiver could include the ECE bit set. If accepted by the TCP
data sender as a valid packet, this spoofed acknowledgement packet
could result in the TCP data sender unnecessarily halving its
congestion window. However, to be accepted by the data sender, such
a spoofed acknowledgement packet would have to have the correct
32-bit sequence number as well as a valid acknowledgement number. An
attacker that could successfully send such a spoofed acknowledgement
packet could also send a spoofed RST packet, or do other equally
damaging operations to the TCP connection.

Packets with a spoofed IP source address of the TCP data sender could
include the CWR bit set. Again, to be accepted, such a packet would
have to have a valid sequence number. In addition, such a spoofed
packet would have a limited performance impact. Spoofing a data
packet with the CWR bit set could result in the TCP data receiver
sending fewer ECE packets than it would otherwise, if the data
receiver was sending ECE packets when it received the spoofed CWR
packet.

18.3. Split Paths

In some cases, a malicious or broken router might have access to only
a subset of the packets from a flow. The question is as follows:
can this router, by altering the ECN field in this subset of the
packets, do more damage to that flow than if it had simply dropped
that set of packets?

We will classify the packets in the flow as A packets and B packets,
and assume that the adversary only has access to A packets. Assume
that the adversary is subverting end-to-end congestion control along
the path traveled by A packets only, by either falsely indicating
ECN-Capability upstream of the point where congestion occurs, or
erasing the congestion indication downstream. Consider also that
there exists a monitoring device that sees both the A and B packets,
and will "punish" both the A and B packets if the total flow is
determined not to be properly responding to indications of
congestion. Another key characteristic that we believe is likely to
be true is that the monitoring device, before `punishing' the A&B
flow, will first drop packets instead of setting the CE bit, and will
drop arriving packets of that flow that already have the ECT and CE
bits set. If the end nodes are in fact using end-to-end congestion
control, they will see all of the indications of congestion seen by
the monitoring device, and will begin to respond to these indications
of congestion. Thus, the monitoring device is successful in
providing the indications to the flow at an early stage.

It is true that the adversary that has access only to the A packets
might, by subverting ECN-based congestion control, be able to deny
the benefits of ECN to the other packets in the A&B aggregate. While
this is unfortunate, this is not a reason to disable ECN within an
IPsec tunnel.

A variant of falsely reporting congestion occurs when there are two
adversaries along a path, where the first adversary falsely reports
congestion, and the second adversary `erases' those reports. (Unlike
packet drops, ECN congestion reports can be `reversed' later in the
network by a malicious or broken router.)
While this would be transparent to the end node, it is possible that
a monitoring device between the first and second adversaries would
see the false indications of congestion. Keep in mind our
recommendation in this document, that before `punishing' a flow for
not responding appropriately to congestion, the router will first
switch to dropping rather than marking as an indication of
congestion, for that flow. When this includes dropping arriving
packets from that flow that have the CE bit set, this ensures that
these indications of congestion are being seen by the end nodes.
Thus, there is no additional harm that we are able to postulate as a
result of multiple conflicting adversaries.

19. Implications of Subverting End-to-End Congestion Control

This section focuses on the potential repercussions of subverting
end-to-end congestion control by either falsely indicating
ECN-Capability, or by erasing the congestion indication in ECN (the
CE bit). Subverting end-to-end congestion control by either of these
two methods can have consequences both for the application and for
the network. We discuss these separately below.

The first method to subvert end-to-end congestion control, that of
falsely indicating ECN-Capability, effectively subverts end-to-end
congestion control only if the packet later encounters congestion
that results in the setting of the CE bit. In this case, the
transport protocol (which may not be ECN-capable) does not receive
the indication of congestion from these downstream congested routers.

The second method to subvert end-to-end congestion control, `erasing'
the (set) CE bit in a packet, effectively subverts end-to-end
congestion control only when the CE bit in the packet was set earlier
by a congested router.
In this case, the transport protocol does not receive the indication
of congestion from the upstream congested routers.

Either of these two methods of subverting end-to-end congestion
control can potentially introduce more damage to the network (and
possibly to the flow itself) than if the adversary had simply dropped
packets from that flow. However, as we discuss later in this section
and in Section 7, this potential damage is limited.

19.1. Implications for the Network and for Competing Flows

The CE bit of the ECN field is only used by routers as an indication
of congestion during periods of *moderate* congestion. ECN-capable
routers should drop rather than mark packets during heavy congestion
even if the router's queue is not yet full. For example, for routers
using active queue management based on RED, the router should drop
rather than mark packets that arrive while the average queue size
exceeds the RED queue's maximum threshold.

One consequence for the network of subverting end-to-end congestion
control is that flows that do not receive the congestion indications
from the network might increase their sending rate until they drive
the network into heavier congestion. Then, the congested router
could begin to drop rather than mark arriving packets. For flows
that are not isolated by some form of per-flow scheduling or other
per-flow mechanisms, but are instead aggregated with other flows in a
single queue in an undifferentiated fashion, this packet-dropping at
the congested router would apply to all flows that share that queue.
Thus, the consequences would be to increase the level of congestion
in the network.
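
The distinction drawn at the start of this subsection between marking
under moderate congestion and dropping under heavy congestion can be
sketched in Python as follows. This is only an illustration under
stated assumptions: the function name red_action and its parameters
are hypothetical, the minimum threshold min_th comes from the general
RED design [FJ93] rather than from this section, and RED's
probabilistic early decision is reduced to a boolean argument so that
the sketch stays deterministic.

```python
def red_action(avg_queue, min_th, max_th, ect, early_hit):
    """Return "forward", "mark", or "drop" for an arriving packet.

    avg_queue  average queue size maintained by the router
    min_th     RED minimum threshold (assumed; not named in this section)
    max_th     RED maximum threshold
    ect        ECT bit of the arriving packet (0 or 1)
    early_hit  True if RED's probabilistic test selected this packet
    """
    if avg_queue > max_th:
        # Heavy congestion: drop rather than mark, even for ECT packets.
        return "drop"
    if avg_queue >= min_th and early_hit:
        # Moderate congestion: mark ECN-capable packets, drop the rest.
        return "mark" if ect == 1 else "drop"
    return "forward"
```

In this sketch an ECN-capable packet is marked only while the average
queue size lies between the two thresholds; once the maximum threshold
is exceeded, every arriving packet is dropped regardless of its ECT
bit.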

In some cases, the increase in the level of congestion will lead to a
substantial buffer buildup at the congested queue that will be
sufficient to drive the congested queue from the packet-marking to
the packet-dropping regime. This transition could occur either
because of buffer overflow, or because of the active queue management
policy described above that drops packets when the average queue is
above RED's maximum threshold. At this point, all flows, including
the subverted flow, will begin to see packet drops instead of packet
marks, and a malicious or broken router will no longer be able to
`erase' these indications of congestion in the network. If the end
nodes are deploying appropriate end-to-end congestion control, then
the subverted flow will reduce its arrival rate in response to
congestion. When the level of congestion is sufficiently reduced,
the congested queue can return from the packet-dropping regime to the
packet-marking regime. The steady-state pattern could be one of the
congested queue oscillating between these two regimes.

In other cases, the consequences of subverting end-to-end congestion
control will not be severe enough to drive the congested link into
sufficiently heavy congestion that packets are dropped instead of
being marked. In this case, the implications for competing flows in
the network will be a slightly increased rate of packet marking or
dropping, and a corresponding decrease in the bandwidth available to
those flows. This can be a stable state if the arrival rate of the
subverted flow is sufficiently small, relative to the link bandwidth,
that the average queue size at the congested router remains under
control. In particular, the subverted flow could have a limited
bandwidth demand on the link at this router, while still getting more
than its "fair" share of the link.
This limited demand could be due to a limited demand from the data
source; a limitation from the TCP advertised window; a
lower-bandwidth access pipe; or other factors. Thus the subversion
of ECN-based congestion control can still lead to unfairness, which
we believe is appropriate to note here.

The threat to the network posed by the subversion of ECN-based
congestion control in the network is essentially the same as the
threat posed by an end-system that intentionally fails to cooperate
with end-to-end congestion control. The deployment of mechanisms in
routers to address this threat is an open research question, and is
discussed further in Section 10.

Let us take the example described in Section 18.1.1, where the CE bit
that was set in a packet is erased: {(1, 1) -> (1, 0)}. The
consequence for the congested upstream router that set the CE bit is
that this congestion indication does not reach the end nodes for that
flow. The source (even one which is completely cooperative and not
malicious) is thus allowed to continue to increase its sending rate
(if it is a TCP flow, by increasing its congestion window). The flow
potentially achieves better throughput than the other flows that also
share the congested router, especially if there are no policing
mechanisms or per-flow queueing mechanisms at that router. Consider
the behavior of the other flows, especially if they are cooperative:
that is, the flows that do not experience subverted end-to-end
congestion control. They are likely to reduce their load (e.g., by
reducing their window size) on the congested router, thus benefiting
our subverted flow. This results in unfairness.
As we discussed above, this unfairness could either be transient (because the congested queue is driven into the packet-dropping regime), oscillatory (because the congested queue oscillates between the packet-marking and the packet-dropping regimes), or a more moderate but persistent stable state (because the congested queue is never driven to the packet-dropping regime).

The results would be similar if the subverted flow were intentionally avoiding end-to-end congestion control. One difference is that a flow that is intentionally avoiding end-to-end congestion control at the end nodes can avoid end-to-end congestion control even when the congested queue is in packet-dropping mode, by refusing to reduce its sending rate in response to packet drops in the network. Thus the problems for the network from the subversion of ECN-based congestion control are less severe than the problems caused by the intentional avoidance of end-to-end congestion control in the end nodes. It is also the case that it is considerably more difficult to control the behavior of the end nodes than it is to control the behavior of the infrastructure itself. This is not to say that the problems for the network posed by the network's subversion of ECN-based congestion control are small; just that they are dwarfed by the problems for the network posed by the subversion of either ECN-based or other currently known packet-based congestion control mechanisms by the end nodes.

19.2. Implications for the Subverted Flow

When a source indicates that it is ECN-capable, there is an expectation that the routers in the network that are capable of participating in ECN will use the CE bit for indication of congestion.
There is the potential benefit of using ECN in reducing the amount of packet loss (in addition to the reduced queueing delays because of active queue management policies). When the packet flows through a tunnel where the nodes that the tunneled packets traverse are untrusted in some way, the expectation is that IPsec will protect the flow from subversion that results in undesirable consequences.

In many cases, a subverted flow will benefit from the subversion of end-to-end congestion control for that flow in the network, by receiving more bandwidth than it would have otherwise, relative to competing non-subverted flows. If the congested queue reaches the packet-dropping stage, then the subversion of end-to-end congestion control might or might not be of overall benefit to the subverted flow, depending on that flow's relative tradeoffs between throughput, loss, and delay.

One form of subverting end-to-end congestion control is to falsely indicate ECN-capability by setting the ECT bit. This has the consequence of downstream congested routers setting the CE bit in vain. However, as described in Section 9.1.2, if the ECT bit is changed in an IP tunnel, this can be detected at the egress point of the tunnel, as long as the inner header was not changed within the tunnel.

The second form of subverting end-to-end congestion control is to erase the congestion indication, either by erasing the CE bit directly, or by erasing the ECT bit when the CE bit is already set. In this case, it is the upstream congested routers that set the CE bit in vain.

If the ECT bit is erased within an IP tunnel, then this can be detected at the egress point of the tunnel, as long as the inner header was not changed within the tunnel.
If the CE bit is set upstream of the IP tunnel, then any erasure of the outer header's CE bit within the tunnel will have no effect because the inner header preserves the set value of the CE bit. However, if the CE bit is set within the tunnel, and erased either within or downstream of the tunnel, this is not necessarily detected at the egress point of the tunnel.

With this subversion of end-to-end congestion control, an end-system transport does not respond to the congestion indication. Along with the increased unfairness for the non-subverted flows described in the previous section, the congested router's queue could continue to build, resulting in packet loss at the congested router - which is a means for indicating congestion to the transport in any case. In the interim, the flow might experience higher queueing delays, possibly along with an increased bandwidth relative to other non-subverted flows. But transports do not inherently make assumptions of consistently experiencing carefully managed queueing in the path. We believe that these forms of subverting end-to-end congestion control are no worse for the subverted flow than if the adversary had simply dropped the packets of that flow itself.

19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control

We have shown that, in many cases, a malicious or broken router that is able to change the bits in the ECN field can do no more damage than if it had simply dropped the packet in question. However, this is not true in all cases, in particular in the cases where the broken router subverted end-to-end congestion control by either falsely indicating ECN-Capability or by erasing the ECN congestion indication (in the CE-bit).
While there are many ways that a router can harm a flow by dropping packets, a router cannot subvert end-to-end congestion control by dropping packets. As an example, a router cannot subvert TCP congestion control by dropping data packets, acknowledgement packets, or control packets.

Even though packet-dropping cannot be used to subvert end-to-end congestion control, there *are* non-ECN-based methods for subverting end-to-end congestion control that a broken or malicious router could use. For example, a broken router could duplicate data packets, thus effectively negating the effects of end-to-end congestion control along some portion of the path. (For a router that duplicated packets within an IPsec tunnel, the security administrator can cause the duplicate packets to be discarded by configuring anti-replay protection for the tunnel.) This duplication of packets within the network would have similar implications for the network and for the subverted flow as those described in Sections 18.1.1 and 18.1.4 above.

20. The Motivation for the ECT bit.

The need for the ECT bit is motivated by the fact that ECN will be deployed incrementally in an Internet where some transport protocols and routers understand ECN and some do not. With the ECT bit, the router can drop packets from flows that are not ECN-capable, but can *instead* set the CE bit in packets that *are* ECN-capable. Because the ECT bit allows an end node to have the CE bit set in a packet *instead* of having the packet dropped, an end node might have some incentive to deploy ECN.

If there were no ECT indication, then the router would have to set the CE bit for packets from both ECN-capable and non-ECN-capable flows.
In this case, there would be no incentive for end-nodes to deploy ECN, and no viable path of incremental deployment from a non-ECN world to an ECN-capable world. Consider the first stages of such an incremental deployment, where a subset of the flows are ECN-capable. At the onset of congestion, when the packet dropping/marking rate would be low, routers would only set CE bits, rather than dropping packets. However, only those flows that are ECN-capable would understand and respond to CE packets. The result is that the ECN-capable flows would back off, and the non-ECN-capable flows would be unaware of the ECN signals and would continue to open their congestion windows.

In this case, there are two possible outcomes: (1) the ECN-capable flows back off, the non-ECN-capable flows get all of the bandwidth, and congestion remains mild, or (2) the ECN-capable flows back off, the non-ECN-capable flows don't, and congestion increases until the router transitions from setting the CE bit to dropping packets. While this second outcome evens out the fairness, the ECN-capable flows would still receive little benefit from being ECN-capable, because the increased congestion would drive the router to packet-dropping behavior.

A flow that advertises itself as ECN-Capable but does not respond to CE bits is functionally equivalent to a flow that turns off congestion control, as discussed earlier in this document.

Thus, in a world where a subset of the flows are ECN-capable, but where ECN-capable flows have no mechanism for indicating that fact to the routers, there would be less effective and less fair congestion control in the Internet, resulting in a strong incentive for end nodes not to deploy ECN.

21. Why use Two Bits in the IP Header?

Given the need for an ECT indication in the IP header, there still remains the question of whether the ECT (ECN-Capable Transport) and CE (Congestion Experienced) indications should have been overloaded on a single bit. This overloaded-one-bit alternative, explored in [Floyd94], would have involved a single bit with two values. One value, "ECT and not CE", would represent an ECN-Capable Transport, and the other value, "CE or not ECT", would represent either Congestion Experienced or a non-ECN-Capable transport.

One difference between the one-bit and two-bit implementations concerns packets that traverse multiple congested routers. Consider a CE packet that arrives at a second congested router, and is selected by the active queue management at that router for either marking or dropping. In the one-bit implementation, the second congested router has no choice but to drop the CE packet, because it cannot distinguish between a CE packet and a non-ECT packet. In the two-bit implementation, the second congested router has the choice of either dropping the CE packet, or of leaving it alone with the CE bit set.

Another difference between the one-bit and two-bit implementations comes from the fact that with the one-bit implementation, receivers in a single flow cannot distinguish between CE and non-ECT packets. Thus, in the one-bit implementation an ECN-capable data sender would have to unambiguously indicate to the receiver or receivers whether each packet had been sent as ECN-Capable or as non-ECN-Capable. One possibility would be for the sender to indicate in the transport header whether the packet was sent as ECN-Capable.
A second possibility that would involve a functional limitation for the one-bit implementation would be for the sender to unambiguously indicate that it was going to send *all* of its packets as ECN-Capable or as non-ECN-Capable. For a multicast transport protocol, this unambiguous indication would have to be apparent to receivers joining an on-going multicast session.

Another concern that was described earlier (and recommended in this document) is that transports (particularly TCP) should not mark pure ACK packets or retransmitted packets as being ECN-Capable. A pure ACK packet from a non-ECN-capable transport could be dropped, without necessarily having an impact on the transport from a congestion control perspective (because subsequent ACKs are cumulative). An ECN-capable transport reacting to the CE bit set in a pure ACK packet by reducing the window would be at a disadvantage in comparison to a non-ECN-capable transport. For this reason (and for reasons described earlier in relation to retransmitted packets), it is desirable to have the ECN-Capable bit indication on a per-packet basis.

Another advantage of the two-bit approach is that it is somewhat more robust. The most critical issue, discussed in Section 8, is that the default indication should be that of a non-ECN-Capable transport. In a two-bit implementation, this requirement for the default value simply means that the ECT bit should be `OFF' by default. In the one-bit implementation, this means that the single overloaded bit should by default be in the "CE or not ECT" position. This is less clear and straightforward, and possibly more open to incorrect implementations either in the end nodes or in the routers.
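
The contrast between the two encodings at a congested router can be sketched as follows. This is an illustrative sketch only, not part of the specification: the names Action, two_bit_action, and one_bit_action are hypothetical, the masks assume the bit numbering used in Section 22 (bit 6 is ECT, bit 7 is CE, with bit 0 the most significant bit of the octet), and treating the undefined (ECT=0, CE=1) combination as not ECN-capable is an assumption of the sketch.

```python
from enum import Enum

# Assumed masks within the octet, following the bit numbering of Section 22
# (bit 0 is the most significant bit): bit 6 is ECT, bit 7 is CE.
ECT_MASK = 0x02
CE_MASK = 0x01


class Action(Enum):
    DROP = "drop"
    SET_CE = "set CE"
    FORWARD_UNCHANGED = "forward unchanged"


def two_bit_action(octet: int) -> Action:
    """Treatment of a packet that active queue management has selected
    for marking or dropping, when ECT and CE are separate bits."""
    ect = bool(octet & ECT_MASK)
    ce = bool(octet & CE_MASK)
    if not ect:
        # Not ECN-capable (the all-zero default): a drop is the only signal.
        # Assumption: the undefined (ECT=0, CE=1) case is handled the same way.
        return Action.DROP
    if ce:
        # Already marked upstream: the router may leave the mark in place.
        # (It also keeps the option of dropping the packet instead.)
        return Action.FORWARD_UNCHANGED
    return Action.SET_CE


def one_bit_action(ect_and_not_ce: bool) -> Action:
    """Same decision with a single overloaded bit whose two values are
    "ECT and not CE" (True) and "CE or not ECT" (False)."""
    if ect_and_not_ce:
        return Action.SET_CE  # flips the bit to "CE or not ECT"
    # A CE packet and a non-ECT packet look identical, so the router must drop.
    return Action.DROP
```

With two bits, a packet that was already marked upstream (0x03 under the assumed masks) can pass a second congested router with its mark intact, while the one-bit encoding forces a drop for the same packet. The all-zero octet yields a drop in both schemes, which matches the requirement that the default indicate a non-ECN-Capable transport.
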

In summary, while the one-bit implementation could be a possible implementation, it has the following significant limitations relative to the two-bit implementation. First, the one-bit implementation has more limited functionality for the treatment of CE packets at a second congested router. Second, the one-bit implementation requires either that extra information be carried in the transport header of packets from ECN-Capable flows (to convey the functionality of the second bit elsewhere, namely in the transport header), or that senders in ECN-Capable flows accept the limitation that receivers must be able to determine a priori which packets are ECN-Capable and which are not ECN-Capable. Third, the one-bit implementation is possibly more open to errors from faulty implementations that choose the wrong default value for the ECN bit. We believe that the use of the extra bit in the IP header for the ECT-bit is extremely valuable to overcome these limitations.

22. Historical Definitions for the IPv4 TOS Octet

RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP header. In RFC 791, bits 6 and 7 of the ToS octet are listed as "Reserved for Future Use", and are shown set to zero. The first two fields of the ToS octet were defined as the Precedence and Type of Service (TOS) fields.

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
   +-----+-----+-----+-----+-----+-----+-----+-----+

RFC 1122 included bits 6 and 7 in the TOS field, though it did not discuss any specific use for those two bits:

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |             TOS             |  RFC 1122
   +-----+-----+-----+-----+-----+-----+-----+-----+

The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |          TOS          | MBZ |  RFC 1349
   +-----+-----+-----+-----+-----+-----+-----+-----+

Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary Cost". In addition to the Precedence and Type of Service (TOS) fields, the last field, MBZ (for "must be zero") was defined as currently unused. RFC 1349 stated that "The originator of a datagram sets [the MBZ] field to zero (unless participating in an Internet protocol experiment which makes use of that bit)."

RFC 1455 [RFC1455] defined an experimental standard that used all four bits in the TOS field to request a guaranteed level of link security.

RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an experimental use of the two-bit CU field.
RFC 2780 updated the definition of the DS Field to only encompass the first six bits of this octet rather than all eight bits; these first six bits are defined as the Differentiated Services CodePoint (DSCP):

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |               DSCP                |    CU     |  RFCs 2474,
   +-----+-----+-----+-----+-----+-----+-----+-----+  2780

Because of this unstable history, the definition of the ECN field in this document cannot be guaranteed to be backwards compatible with all past uses of these two bits.

Prior to RFC 2474, routers were not permitted to modify bits in either the DSCP or ECN field of packets forwarded through them, and hence routers that comply only with RFCs prior to 2474 should have no effect on ECN. For end nodes, bit 7 (the ECN CE bit) must be transmitted as zero for any implementation compliant only with RFCs prior to 2474. Such nodes may transmit bit 6 (the ECN ECT bit) as one for the "Minimize Monetary Cost" provision of RFC 1349 or the experiment authorized by RFC 1455; neither this aspect of RFC 1349 nor the experiment in RFC 1455 was widely implemented or used. The damage that could be done by a broken, non-conformant router would be to "erase" the CE bit for an ECN-capable packet that arrived at the router with the CE bit set, or set the CE bit even in the absence of congestion. This has been discussed in the section on "Non-compliance in the Network".

The damage that could be done in an ECN-capable environment by a non-ECN-capable end-node transmitting packets with the ECT bit set has been discussed in the section on "Non-compliance by the End Nodes".

23. IANA Considerations

The bits for ECT and CE in the ECN Field of the IP header and the bits for CWR and ECE in the TCP header are specified by the Standards Action of this RFC, as is required by RFC 2780.
We would note that this RFC does not define the codepoint of (ECT=0, CE=1) for the ECT and CE bits.

IANA allocated the IPSEC Security Association Attribute value 10 for the ECN Tunnel use described in Section 9.2.1.2 above at the request of David Black in November 1999. If this draft is approved for publication as an RFC, IANA should change the Reference for this allocation from David Black's request to this RFC based on its RFC number.

AUTHORS' ADDRESSES

   K. K. Ramakrishnan
   TeraOptic Networks, Inc.
   Phone: +1 (408) 666-8650
   Email: kk@teraoptic.com

   Sally Floyd
   Phone: +1 (510) 666-2989
   ACIRI
   Email: floyd@aciri.org
   URL: http://www.aciri.org/floyd/

   David L. Black
   EMC Corporation
   42 South St.
   Hopkinton, MA 01748
   Phone: +1 (508) 435-1000 x75140
   Email: black_david@emc.com

This draft was created in January 2001.
It expires July 2001.