Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        TeraOptic Networks
draft-ietf-tsvwg-ecn-00.txt                                  Sally Floyd
                                                                   ACIRI
                                                                D. Black
                                                                     EMC
                                                          November, 2000
Expires: May, 2001

        The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document specifies the incorporation of ECN (Explicit
   Congestion Notification) into TCP and IP, including ECN's use of two
   bits in the IP header's DS field. We begin by describing TCP's use
   of packet drops as an indication of congestion. Next we explain that
   with the addition of active queue management (e.g., RED) to the
   Internet infrastructure, where routers detect congestion before the
   queue overflows, routers are no longer limited to packet drops as an
   indication of congestion. Routers can instead set the Congestion
   Experienced (CE) bit in the IP header of packets from ECN-capable
   transports. We describe when the CE bit is to be set in routers, and
   describe modifications needed to TCP to make it ECN-capable.
   Modifications to other transport protocols (e.g., unreliable unicast
   or multicast, reliable multicast, other reliable unicast transport
   protocols) could be considered as those protocols are developed and
   advance through the standards process.

   We also describe in this document the issues involving the use of
   ECN within IP tunnels, and within IPsec tunnels in particular.

   One of the guiding principles for this document is that all the
   mechanisms specified here are incrementally deployable.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms
   3. Assumptions and General Principles
   4. Active Queue Management (AQM)
   5. Explicit Congestion Notification in IP
   5.1. ECN as an indication of persistent congestion
   5.2. Dropped or Corrupted Packets
   6. Support from the Transport Protocol
   6.1. TCP
   6.1.1. TCP Initialization
   6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field
   6.1.1.2. Robust TCP Initialization with no response to the SYN
   6.1.2. The TCP Sender
   6.1.3. The TCP Receiver
   6.1.4. Congestion on the ACK-path
   6.1.5. Retransmitted TCP packets
   6.1.6. TCP Window Probes.
   7. Non-compliance by the End Nodes
   8. Non-compliance in the Network
   8.1. Complications Introduced by Split Paths
   9. Encapsulated Packets
   9.1. IP packets encapsulated in IP
   9.1.1. The limited-functionality and full-functionality options within
   9.1.2. Changes to the ECN Field within an IP Tunnel.
   9.2. IPsec Tunnels
   9.2.1. Negotiation between Tunnel Endpoints
   9.2.1.1. ECN Tunnel Security Association Database Field
   9.2.1.2. ECN Tunnel Security Association Attribute
   9.2.1.3. Changes to IPsec Tunnel Header Processing
   9.2.2. Changes to the ECN Field within an IPsec Tunnel.
   9.2.3. Comments for IPsec Support
   9.3. IP packets encapsulated in non-IP packet headers.
   10. Issues Raised by Monitoring and Policing Devices
   11. Evaluations of ECN
   12. Summary of changes required in IP and TCP
   13. Conclusions
   14. Acknowledgements
   15. References
   16. Security Considerations
   17. IPv4 Header Checksum Recalculation
   18. Possible Changes to the ECN Field in the Network
   18.1. Possible Changes to the IP Header
   18.1.1. Erasing the Congestion Indication
   18.1.2. Falsely Reporting Congestion
   18.1.3. Disabling ECN-Capability
   18.1.4. Falsely Indicating ECN-Capability
   18.1.5. Changes with No Functional Effect
   18.2. Information carried in the Transport Header
   18.3. Split Paths
   19. Implications of Subverting End-to-End Congestion Control
   19.1. Implications for the Network and for Competing Flows
   19.2. Implications for the Subverted Flow
   19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control
   20. The motivation for the ECT bit.
   21. Why use two bits in the IP header?
   22. Historical definitions for the IPv4 TOS octet

1. Introduction

   TCP's congestion control and avoidance algorithms are based on the
   notion that the network is a black box [Jacobson88, Jacobson90].
   End systems determine the network's state of congestion by probing:
   they gradually increase the load on the network (by increasing the
   window of packets that are outstanding in the network) until the
   network becomes congested and a packet is lost. Treating the network
   as a "black box" and treating loss as an indication of congestion in
   the network is appropriate for pure best-effort data carried by TCP,
   which has little or no sensitivity to the delay or loss of individual
   packets. In addition, TCP's congestion management algorithms have
   built-in techniques (such as Fast Retransmit and Fast Recovery) to
   minimize the impact of losses on throughput. However, these
   mechanisms are not intended to help applications that are in fact
   sensitive to the delay or loss of one or more individual packets.
   Interactive traffic such as telnet, web browsing, and the transfer of
   audio and video data can be sensitive to packet losses (especially
   when using an unreliable data delivery transport such as UDP) or to
   the increased latency caused by the need to retransmit a packet after
   a loss (with the reliable data delivery semantics provided by TCP).

   Since TCP determines the appropriate congestion window to use by
   gradually increasing the window size until it experiences a dropped
   packet, the queues at the bottleneck router build up. With most
   packet drop policies at the router that are not sensitive to the load
   placed by each individual flow (e.g., tail-drop on queue overflow),
   this means that some of the packets of latency-sensitive flows may be
   dropped.
   In addition, such drop policies lead to synchronization of loss
   across multiple flows.

   Active queue management mechanisms detect congestion before the
   queue overflows, and provide an indication of this congestion to the
   end nodes. Thus, active queue management can reduce unnecessary
   queueing delay for all traffic sharing that queue. The advantages of
   active queue management are discussed in RFC 2309 [RFC2309]. Active
   queue management avoids some of the bad properties of dropping on
   queue overflow, including the undesirable synchronization of loss
   across multiple flows. More importantly, active queue management
   means that transport protocols with mechanisms for congestion
   control (e.g., TCP) do not have to rely on buffer overflow as the
   only indication of congestion.

   Active queue management mechanisms may use one of several methods
   for indicating congestion to end nodes. One is to use packet drops,
   as is currently done. However, active queue management allows the
   router to separate policies of queueing or dropping packets from the
   policies for indicating congestion. Thus, active queue management
   allows routers to use the Congestion Experienced (CE) bit in a packet
   header as an indication of congestion, instead of relying solely on
   packet drops. This has the potential of reducing the impact of loss
   on latency-sensitive flows.

2. Conventions and Acronyms

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [B97].

3. Assumptions and General Principles

   In this section, we describe some of the important design principles
   and assumptions that guided the design choices in this proposal.

   *  Because ECN is likely to be adopted gradually, accommodating
      migration is essential.
      Some routers may still only drop packets to indicate congestion,
      and some end systems may not be ECN-capable. The most viable
      strategy is one that accommodates incremental deployment without
      having to resort to "islands" of ECN-capable and non-ECN-capable
      environments.
   *  New mechanisms for congestion control and avoidance need to
      coexist and cooperate with existing mechanisms for congestion
      control. In particular, new mechanisms have to coexist with TCP's
      current methods of adapting to congestion and with routers'
      current practice of dropping packets in periods of congestion.
   *  Congestion may persist over different time scales. The time
      scales that we are concerned with are congestion events that may
      last longer than a round-trip time.
   *  The number of packets in an individual flow (e.g., a TCP
      connection or an exchange using UDP) may range from a small number
      of packets to quite a large number. We are interested in managing
      the congestion caused by flows that send enough packets so that
      they are still active when network feedback reaches them.
   *  Asymmetric routing is likely to be a normal occurrence in the
      Internet. The path (sequence of links and routers) followed by
      data packets may be different from the path followed by the
      acknowledgment packets in the reverse direction.
   *  Many routers process the "regular" headers in IP packets more
      efficiently than they process the header information in IP
      options. This suggests keeping congestion experienced information
      in the regular headers of an IP packet.
   *  It must be recognized that not all end systems will cooperate in
      mechanisms for congestion control. However, new mechanisms
      shouldn't make it easier for TCP applications to disable TCP
      congestion control. The benefit of lying about participating in
      new mechanisms such as ECN-capability should be small.

4. Active Queue Management (AQM)

   Random Early Detection (RED) is one mechanism for Active Queue
   Management (AQM) that has been proposed to detect incipient
   congestion [FJ93], and is currently being deployed in the Internet
   [RFC2309]. AQM is meant to be a general mechanism using one of
   several alternatives for congestion indication, but in the absence
   of ECN, AQM is restricted to using packet drops as a mechanism for
   congestion indication. AQM drops packets based on the average queue
   length exceeding a threshold, rather than only when the queue
   overflows. However, because AQM may drop packets before the queue
   actually overflows, AQM is not always forced by memory limitations
   to discard the packet.

   AQM can set a Congestion Experienced (CE) bit in the packet header
   instead of dropping the packet, when such a bit is provided in the
   IP header and understood by the transport protocol. The use of the
   CE bit with ECN allows the receiver(s) to receive the packet,
   avoiding the potential for excessive delays due to retransmissions
   after packet losses. We use the term 'CE packet' to denote a packet
   that has the CE bit set.

5. Explicit Congestion Notification in IP

   This document specifies that the Internet provide a congestion
   indication for incipient congestion (as in RED and earlier work
   [RJ90]) where the notification can sometimes be through marking
   packets rather than dropping them. This uses an ECN field in the IP
   header with two bits. The ECN-Capable Transport (ECT) bit is set by
   the data sender to indicate that the end-points of the transport
   protocol are ECN-capable. The CE bit is set by the router to
   indicate congestion to the end nodes. Routers that have a packet
   arriving at a full queue drop the packet, just as they do in the
   absence of ECN.

   Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
   Bit 6 is designated as the ECT bit, and bit 7 is designated as the
   CE bit. The IPv4 TOS octet corresponds to the Traffic Class octet in
   IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6
   Traffic Class octet have been superseded by the DS (Differentiated
   Services) Field [RFC2474]. Bits 6 and 7 are listed in [RFC2474] as
   Currently Unused. Section 22 gives a brief history of the TOS octet.

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |                                   | ECN FIELD |
      |               DSCP                |           |
      |                                   | ECT | CE  |
      +-----+-----+-----+-----+-----+-----+-----+-----+

        DSCP: differentiated services codepoint
        ECN:  Explicit Congestion Notification

        Figure 1: The Differentiated Services Field in IP.

   Because of the unstable history of the TOS octet, the use of the ECN
   field as specified in this document cannot be guaranteed to be
   backwards compatible with all past uses of these two bits. The
   potential dangers of this lack of backwards compatibility are
   discussed in Section 22.

   Upon the receipt by an ECN-Capable transport of a single CE packet,
   the congestion control algorithms followed at the end systems MUST
   be essentially the same as the congestion control response to a
   *single* dropped packet. For example, for ECN-Capable TCP the source
   TCP is required to halve its congestion window for any window of
   data containing either a packet drop or an ECN indication.

   One reason for requiring that the congestion-control response to the
   CE packet be essentially the same as the response to a dropped
   packet is to accommodate the incremental deployment of ECN in both
   end systems and routers. Some routers may drop ECN-Capable packets
   (e.g., using the same AQM policies for congestion detection) while
   other routers set the CE bit, for equivalent levels of congestion.
   Similarly, a router might drop a non-ECN-Capable packet but set the
   CE bit in an ECN-Capable packet, for equivalent levels of congestion.
   If the congestion control response to a CE indication differed from
   the response to a packet drop, this could result in unfair treatment
   for different flows.

   An additional goal is that the end systems should react to
   congestion at most once per window of data (i.e., at most once per
   round-trip time), to avoid reacting multiple times to multiple
   indications of congestion within a round-trip time.

   For a router, the CE bit of an ECN-Capable packet should only be set
   if the router would otherwise have dropped the packet as an
   indication of congestion to the end nodes. When the router's buffer
   is not yet full and the router is prepared to drop a packet to
   inform end nodes of incipient congestion, the router should first
   check to see if the ECT bit is set in that packet's IP header. If
   so, then instead of dropping the packet, the router MAY set the CE
   bit in the IP header.

   An environment where all end nodes were ECN-Capable could allow new
   criteria to be developed for setting the CE bit, and new congestion
   control mechanisms for end-node reaction to CE packets. However,
   this is a research issue, and as such is not addressed in this
   document.

   When a CE packet (i.e., a packet that has the CE bit set) is
   received by a router, the CE bit is left unchanged, and the packet
   is transmitted as usual. When severe congestion has occurred and the
   router's queue is full, then the router has no choice but to drop
   some packet when a new packet arrives. We anticipate that such
   packet losses will become relatively infrequent when a majority of
   end systems become ECN-Capable and participate in TCP or other
   compatible congestion control mechanisms.
   In an adequately provisioned network in an ECN-Capable environment,
   packet losses should occur primarily during transients or in the
   presence of non-cooperating sources.

   We expect that routers will set the CE bit in response to incipient
   congestion as indicated by the average queue size, using the RED
   algorithms suggested in [FJ93, RFC2309]. To the best of our
   knowledge, this is the only proposal currently under discussion in
   the IETF for routers to drop packets proactively, before the buffer
   overflows. However, this document does not attempt to specify a
   particular mechanism for active queue management, leaving that
   endeavor, if needed, to other areas of the IETF. While ECN is
   inextricably tied up with the need to have a reasonable active queue
   management mechanism at the router, the reverse does not hold;
   active queue management mechanisms have been developed and deployed
   independently of ECN, using packet drops as indications of
   congestion in the absence of ECN in the IP architecture.

5.1. ECN as an indication of persistent congestion

   We emphasize that a *single* IP packet with the CE bit set causes
   the transport layer to respond, in terms of congestion control, as it
   would to a packet drop. The instantaneous queue size is likely to
   see considerable variations even when the router does not experience
   persistent congestion. As such, it is important that transient
   congestion at a router, reflected by the instantaneous queue size
   reaching a threshold much smaller than the capacity of the queue,
   not trigger a reaction at the transport layer. Therefore, the CE bit
   should not be set by a router based on the instantaneous queue size.
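   To illustrate the difference between average and instantaneous
   queue size, together with the mark-instead-of-drop rule of Section 5,
   the following hypothetical Python sketch models one router queue. It
   assumes a RED-style exponentially weighted moving average in the
   spirit of [FJ93], with an assumed weight w_q, assumed thresholds
   min_th and max_th, and an assumed maximum probability max_p, and it
   omits RED's count-based probability correction. It is not a
   specification of any particular AQM mechanism. The masks follow
   Figure 1: counting bit 0 as the most significant bit of the TOS
   octet, ECT (bit 6) is 0x02 and CE (bit 7) is 0x01.

```python
import random
from dataclasses import dataclass, field

# ECN field in the IPv4 TOS octet as defined in this draft (Figure 1):
# bit 6 is ECT and bit 7 is CE, counting bit 0 as the most significant
# bit, so ECT has the value 0x02 and CE has the value 0x01.
ECT_MASK = 0x02
CE_MASK = 0x01


@dataclass
class RedQueue:
    """Hypothetical, simplified RED-style queue (no count-based correction)."""

    capacity: int          # buffer size, in packets
    min_th: float = 5.0    # assumed lower threshold on the average queue
    max_th: float = 15.0   # assumed upper threshold on the average queue
    max_p: float = 0.1     # assumed signaling probability as avg nears max_th
    w_q: float = 0.002     # assumed EWMA weight for the average queue size
    avg: float = 0.0       # current average queue size, in packets
    rng: random.Random = field(default_factory=lambda: random.Random(1))

    def update_avg(self, instantaneous_len: int) -> float:
        # Exponentially weighted moving average of the queue length:
        # a short burst moves the average only slightly.
        self.avg = (1 - self.w_q) * self.avg + self.w_q * instantaneous_len
        return self.avg

    def signal_congestion(self) -> bool:
        # Decide from the *average*, never the instantaneous, queue size.
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return self.rng.random() < p

    def on_arrival(self, tos: int, queue_len: int) -> tuple[str, int]:
        """Return (action, tos) for a packet arriving at a queue of queue_len."""
        self.update_avg(queue_len)
        if queue_len >= self.capacity:
            return "drop", tos              # full buffer: drop, ECT or not
        if not self.signal_congestion():
            return "enqueue", tos           # an existing CE bit is left as-is
        if tos & ECT_MASK:
            return "mark", tos | CE_MASK    # this sketch always takes the MAY
        return "drop", tos                  # not ECN-Capable: drop as today
```

   With these assumed defaults, ten back-to-back arrivals that each find
   19 packets already queued raise the average only to about 0.38
   packets, so nothing is marked or dropped; only a sustained backlog
   moves the average past min_th. Marking a packet whose CE bit is
   already set leaves the octet unchanged, matching the rule that
   routers do not alter an existing CE indication.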
   For example, since the ATM and Frame Relay mechanisms for congestion
   indication have typically been defined without an associated notion
   of average queue size as the basis for determining that an
   intermediate node is congested, we believe that they provide a very
   noisy signal. The TCP-sender reaction specified in this document for
   ECN is NOT the appropriate reaction for such a noisy signal of
   congestion notification. However, if the routers that interface to
   the ATM network have a way of maintaining the average queue at the
   interface, and use it to come to a reliable determination that the
   ATM subnet is congested, they may use the ECN notification that is
   defined here.

   We continue to encourage experiments in techniques at layer 2 (e.g.,
   in ATM switches or Frame Relay switches) to take advantage of ECN.
   For example, using a scheme such as RED (where packet marking is
   based on the average queue length exceeding a threshold), layer 2
   devices could provide a reasonably reliable indication of
   congestion. When all the layer 2 devices in a path set that layer's
   own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN
   bit in Frame Relay) in this reliable manner, then the interface
   router to the layer 2 network could copy the state of that layer 2
   Congestion Experienced bit into the CE bit in the IP header. We
   recognize that this is not the current practice, nor is it in
   current standards. However, encouraging experimentation in this
   manner may provide the information needed to enable evolution of
   existing layer 2 mechanisms to provide a more reliable means of
   congestion indication, when they use a single bit for indicating
   congestion.

5.2. Dropped or Corrupted Packets

   For the proposed use for ECN in this document (that is, for a
   transport protocol such as TCP for which a dropped data packet is an
   indication of congestion), end nodes detect dropped data packets,
   and the congestion response of the end nodes to a dropped data
   packet is at least as strong as the congestion response to a
   received CE packet. To ensure the reliable delivery of the
   congestion indication of the CE bit, the ECT bit MUST NOT be set in
   a packet unless the loss of that packet in the network would be
   detected by the end nodes and interpreted as an indication of
   congestion.

   Transport protocols such as TCP do not necessarily detect all packet
   drops, such as the drop of a "pure" ACK packet; for example, TCP
   does not reduce the arrival rate of subsequent ACK packets in
   response to an earlier dropped ACK packet. Any proposal for
   extending ECN-Capability to such packets would have to address
   issues such as the case of an ACK packet that was marked with the CE
   bit but was later dropped in the network. We believe that this
   aspect is still the subject of research, so this document specifies
   that at this time, "pure" ACK packets MUST NOT indicate
   ECN-Capability.

   Similarly, if a CE packet is dropped later in the network due to
   corruption (bit errors), the end nodes should still invoke
   congestion control, just as TCP would today in response to a dropped
   data packet. This issue of corrupted CE packets would have to be
   considered in any proposal for the network to distinguish between
   packets dropped due to corruption, and packets dropped due to
   congestion or buffer overflow. In particular, the ubiquitous
   deployment of ECN would not, in and of itself, be a sufficient
   development to allow end nodes to interpret packet drops as
   indications of corruption rather than congestion.

6. Support from the Transport Protocol

   ECN requires support from the transport protocol, in addition to the
   functionality given by the ECN field in the IP packet header. First,
   the transport protocol might require negotiation between the
   endpoints during setup to determine that all of the endpoints are
   ECN-capable, so that the sender can set the ECT bit in transmitted
   packets. Second, the transport protocol must be capable of reacting
   appropriately to the receipt of CE packets. This reaction could be
   in the form of the data receiver informing the data sender of the
   received CE packet (e.g., TCP), of the data receiver unsubscribing
   from a layered multicast group (e.g., RLM [MJV96]), or of some other
   action that ultimately reduces the arrival rate of that flow on that
   congested link.

   This document only addresses the addition of ECN Capability to TCP,
   leaving issues of ECN in other transport protocols to further
   research. For TCP, ECN requires three new pieces of functionality:
   negotiation between the endpoints during connection setup to
   determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the
   TCP header so that the data receiver can inform the data sender when
   a CE packet has been received; and a Congestion Window Reduced (CWR)
   flag in the TCP header so that the data sender can inform the data
   receiver that the congestion window has been reduced. The support
   required from other transport protocols is likely to be different,
   particularly for unreliable or reliable multicast transport
   protocols, and will have to be determined as other transport
   protocols are brought to the IETF for standardization.

6.1. TCP

   The following sections describe in detail the proposed use of ECN in
   TCP. This proposal is described in essentially the same form in
We assume that the source TCP uses the standard congestion 444 control algorithms of Slow-start, Fast Retransmit and Fast Recovery 445 [RFC 2001]. 447 This proposal specifies two new flags in the Reserved field of the 448 TCP header. The TCP mechanism for negotiating ECN-Capability uses 449 the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved 450 field of the TCP header is designated as the ECN-Echo flag. The 451 location of the 6-bit Reserved field in the TCP header is shown in 452 Figure 3 of RFC 793 [RFC793] (and is reproduced below for complete- 453 ness). This specification of the ECN Field leaves the Reserved field 454 as a 4-bit field using bits 4-7. 456 To enable the TCP receiver to determine when to stop setting the ECN- 457 Echo flag, we introduce a second new flag in the TCP header, the CWR 458 flag. The CWR flag is assigned to Bit 8 in the Reserved field of the 459 TCP header. 461 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 462 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 463 | | | U | A | P | R | S | F | 464 | Header Length | Reserved | R | C | S | S | Y | I | 465 | | | G | K | H | T | N | N | 466 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 468 Figure 2: The old definition of bytes 13 and 14 of the TCP 469 header. 471 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 472 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 473 | | | C | E | U | A | P | R | S | F | 474 | Header Length | Reserved | W | C | R | C | S | S | Y | I | 475 | | | R | E | G | K | H | T | N | N | 476 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 478 Figure 3: The new definition of bytes 13 and 14 of the TCP 479 Header. 481 Thus, ECN uses the ECT and CE flags in the IP header (as shown in 482 Figure 1) for signaling between routers and connection endpoints, and 483 uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure 484 3) for TCP-endpoint to TCP-endpoint signaling. 
For a TCP connection, a typical sequence of events in an ECN-based reaction to congestion is as follows:

* The ECT bit is set in packets transmitted by the sender to indicate that ECN is supported by the transport entities for these packets.
* An ECN-capable router detects impending congestion and detects that the ECT bit is set in the packet it is about to drop. Instead of dropping the packet, the router chooses to set the CE bit in the IP header and forwards the packet.
* The receiver receives the packet with the CE bit set, and sets the ECN-Echo flag in its next TCP ACK sent to the sender.
* The sender receives the TCP ACK with ECN-Echo set, and reacts to the congestion as if a packet had been dropped.
* The sender sets the CWR flag in the TCP header of the next packet sent to the receiver to acknowledge its receipt of and reaction to the ECN-Echo flag.

The negotiation for using ECN by the TCP transport entities and the use of the ECN-Echo and CWR flags are described in more detail in the sections below.

6.1.1. TCP Initialization

In the TCP connection setup phase, the source and destination TCPs exchange information about their willingness to use ECN. Subsequent to the completion of this negotiation, the TCP sender sets the ECT bit in the IP header of data packets to indicate to the network that the transport is capable and willing to participate in ECN for this packet. This indicates to the routers that they may mark this packet with the CE bit, if they would like to use that as a method of congestion notification. If the TCP connection does not wish to use ECN notification for a particular packet, the sending TCP sets the ECT bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE bit in the received packet.

For this discussion, we designate the initiating host as Host A and the responding host as Host B.
We call a SYN packet with both the ECE and CWR flags set an "ECN-setup SYN packet", and we call a SYN packet with neither the ECE nor the CWR flag set a "non-ECN-setup SYN packet". Similarly, we call a SYN-ACK packet with only the ECE flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and we call a SYN-ACK packet with neither the ECE nor the CWR flag set a "non-ECN-setup SYN-ACK packet".

Before a TCP connection can use ECN, Host A sends an ECN-setup SYN packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN packet, the setting of both ECE and CWR in the ECN-setup SYN packet is defined as an indication that the sending TCP is ECN-Capable, rather than as an indication of congestion or of response to congestion. More precisely, an ECN-setup SYN packet indicates that the TCP implementation transmitting the SYN packet will participate in ECN as both a sender and receiver. Specifically, as a receiver, it will respond to incoming data packets that have the CE bit set in the IP header by setting ECE in outgoing TCP Acknowledgement (ACK) packets.

As a sender, it will respond to incoming packets that have ECE set by reducing the congestion window and setting CWR when appropriate. An ECN-setup SYN packet does not commit the TCP sender to setting the ECT bit in any or all of the packets it may transmit. However, the commitment to respond appropriately to incoming packets with the CE bit set remains even if the TCP sender in a later transmission, within this TCP connection, sends a SYN packet without ECE and CWR set.

When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an indication that the TCP transmitting the SYN-ACK packet is ECN-Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does not commit the TCP host to setting the ECT bit in transmitted packets.
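
The four definitions above depend only on the SYN, ACK, ECE and CWR bits, so they can be expressed as a small classifier. The following is a minimal Python sketch for illustration, not part of this specification: the function name classify_handshake and the constant names are hypothetical, and the masks assume the flag byte is byte 14 of the TCP header laid out as in Figure 3 (CWR = 0x80, ECE = 0x40, ACK = 0x10, SYN = 0x02).

```python
# Hypothetical constants: bit masks within byte 14 of the TCP header,
# following Figure 3 (CWR and ECE are the two high-order bits).
CWR = 0x80
ECE = 0x40
ACK = 0x10
SYN = 0x02


def classify_handshake(flags: int) -> str:
    """Classify a SYN or SYN-ACK by its ECE/CWR bits (Section 6.1.1 terms)."""
    ecn = flags & (ECE | CWR)
    kind = flags & (SYN | ACK)
    if kind == SYN:
        if ecn == (ECE | CWR):
            return "ECN-setup SYN"
        if ecn == 0:
            return "non-ECN-setup SYN"
    elif kind == (SYN | ACK):
        if ecn == ECE:
            return "ECN-setup SYN-ACK"
        if ecn == 0:
            return "non-ECN-setup SYN-ACK"
    # Any other combination, e.g. a SYN-ACK with both ECE and CWR set,
    # is neither an ECN-setup nor a non-ECN-setup handshake packet.
    return "other"


print(classify_handshake(SYN | ECE | CWR))        # ECN-setup SYN
print(classify_handshake(SYN | ACK | ECE))        # ECN-setup SYN-ACK
print(classify_handshake(SYN | ACK | ECE | CWR))  # other
```

Note that a SYN-ACK which merely reflects both bits of the incoming SYN lands in the "other" case rather than being taken as an ECN-setup SYN-ACK; this is the asymmetry that Section 6.1.1.1 relies on.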

The following rules apply to the sending of ECN-setup packets:

* If a host has received an ECN-setup SYN packet, then it MAY send an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup SYN-ACK packet.
* A host MUST NOT set ECT on data packets unless it has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on data packets.
* If a host ever sets the ECT bit on a data packet, then that host MUST correctly set/clear the CWR TCP bit on all subsequent packets in the connection.
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, has received no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet, and receives TCP data packets with the ECT and CE bits set in the IP header, then that host MUST process these packets as specified for an ECN-capable connection.

6.1.1.1. Robust TCP Initialization with an Echoed Reserved Field

One might ask why we chose to have the TCP sending the SYN set two ECN-related flags in the Reserved field of the TCP header for the SYN packet, while the responding TCP sending the SYN-ACK sets only one ECN-related flag in the SYN-ACK packet. This asymmetry is necessary for the robust negotiation of ECN-capability with some deployed TCP implementations. There exists at least one faulty TCP implementation in which TCP receivers set the Reserved field of the TCP header in ACK packets (and hence the SYN-ACK) simply to reflect the Reserved field of the TCP header in the received data packet.
Because the TCP SYN packet sets the ECN-Echo and CWR flags to indicate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP correctly interprets a receiver's reflection of its own flags in the Reserved field as an indication that the receiver is not ECN-capable. The sending TCP is not misled by a faulty TCP implementation sending a SYN-ACK packet that simply reflects the Reserved field of the incoming SYN packet.

6.1.1.2. Robust TCP Initialization with no response to the SYN

ECN introduces the use of the ECN-Echo and CWR flags in the TCP header (as shown in Figure 3) for initialization. There exists some faulty equipment in the Internet that either ignores an ECN-setup SYN packet or responds with a RST, in the belief that such a packet (with these bits set) is a signature for a port-scanning tool that could be used in a denial-of-service attack. To provide robust connectivity even in the presence of such faulty equipment, a host that receives a RST in response to the transmission of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared. This could result in a TCP connection being established without using ECN. Similarly, a host that receives no reply to an ECN-setup SYN within the normal SYN retransmission timeout interval MAY resend the SYN and any subsequent SYN retransmissions with CWR and ECE cleared. To overcome normal packet loss that results in the original SYN being lost, the originating host may retransmit one or more ECN-setup SYN packets before giving up and retransmitting the SYN with the CWR and ECE bits cleared.

We note that in this case, the following example scenario is possible:

(1) Host A: Sends an ECN-setup SYN.
(2) Host B: Sends an ECN-setup SYN-ACK, which is dropped or delayed.
(3) Host A: Sends a non-ECN-setup SYN.
(4) Host B: Sends a non-ECN-setup SYN-ACK.
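
To make the ECT-related rules of Section 6.1.1 concrete, the following minimal Python sketch tracks, per host, which kinds of handshake packets have been sent and received, and replays the scenario above. The class and method names are hypothetical, and the sketch treats the SHOULD NOT case (a non-ECN-setup packet was received) the same as the MUST NOT case; a real implementation may choose differently.

```python
from dataclasses import dataclass


@dataclass
class EcnHandshakeState:
    """Hypothetical record of the SYN / SYN-ACK packets one host has sent
    and received, used to apply the ECN-setup rules of Section 6.1.1."""
    sent_setup: bool = False       # sent an ECN-setup SYN or SYN-ACK
    sent_non_setup: bool = False   # sent a non-ECN-setup SYN or SYN-ACK
    rcvd_setup: bool = False       # received an ECN-setup SYN or SYN-ACK
    rcvd_non_setup: bool = False   # received a non-ECN-setup SYN or SYN-ACK

    def sent(self, ecn_setup: bool) -> None:
        if ecn_setup:
            self.sent_setup = True
        else:
            self.sent_non_setup = True

    def received(self, ecn_setup: bool) -> None:
        if ecn_setup:
            self.rcvd_setup = True
        else:
            self.rcvd_non_setup = True

    def may_set_ect(self) -> bool:
        # MUST NOT set ECT unless an ECN-setup packet was both sent and
        # received and no non-ECN-setup packet was sent; this sketch also
        # declines (SHOULD NOT) once any non-ECN-setup packet is received.
        return (self.sent_setup and self.rcvd_setup
                and not self.sent_non_setup and not self.rcvd_non_setup)

    def must_honor_ce(self) -> bool:
        # Sent an ECN-setup packet and received no non-ECN-setup packet.
        return self.sent_setup and not self.rcvd_non_setup


# Replay of scenario (1)-(4) above.
a, b = EcnHandshakeState(), EcnHandshakeState()
a.sent(True)         # (1) Host A sends an ECN-setup SYN ...
b.received(True)     #     ... which Host B receives
b.sent(True)         # (2) ECN-setup SYN-ACK, dropped or delayed
a.sent(False)        # (3) non-ECN-setup SYN
b.received(False)
b.sent(False)        # (4) non-ECN-setup SYN-ACK
a.received(False)
print(a.may_set_ect(), b.may_set_ect())   # False False
```

Even if the delayed ECN-setup SYN-ACK from step (2) later reaches Host A, a.may_set_ect() stays False in this model, because Host A has already sent a non-ECN-setup SYN.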

We note that in this case, following the procedures above, neither Host A nor Host B may set the ECT bit on data packets. We further note that a host NEVER uses the reception of ECT data packets as an implicit signal that the other host is ECN-capable.

6.1.2. The TCP Sender

For a TCP connection using ECN, data packets are transmitted with the ECT bit set in the IP header (set to a "1"). If the sender receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with the ECN-Echo flag set in the TCP header), then the sender knows that congestion was encountered in the network on the path from the sender to the receiver. The indication of congestion should be treated just as a congestion loss in non-ECN-Capable TCP. That is, the TCP source halves the congestion window "cwnd" and reduces the slow start threshold "ssthresh". The sending TCP SHOULD NOT increase the congestion window in response to the receipt of an ECN-Echo ACK packet.

TCP should not react to congestion indications more than once every window of data (or more loosely, more than once every round-trip time). That is, the TCP sender's congestion window should be reduced only once in response to a series of dropped and/or CE packets from a single window of data. In addition, the TCP source should not decrease the slow-start threshold, ssthresh, if it has been decreased within the last round trip time. However, if any retransmitted packets are dropped, then this is interpreted by the source TCP as a new instance of congestion.

After the source TCP reduces its congestion window in response to a CE packet, incoming acknowledgements that continue to arrive can "clock out" outgoing packets as allowed by the reduced congestion window.
If the congestion window consists of only one MSS (maximum segment size), and the sending TCP receives an ECN-Echo ACK packet, then the sending TCP should in principle still halve its congestion window. However, the value of the congestion window is bounded below by a value of one MSS. If the sending TCP were to continue to send using a congestion window of 1 MSS, this would result in the transmission of one packet per round-trip time. On receipt of an ECN-Echo packet when the congestion window is one MSS, the sending rate of the TCP sender must still be reduced even further. We use the retransmit timer as a means of reducing the rate further in this circumstance. Therefore, the sending TCP MUST reset the retransmit timer on receiving the ECN-Echo packet when the congestion window is one. The sending TCP will then be able to send a new packet only when the retransmit timer expires.

[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] discusses the validation test in the ns simulator, which illustrates a wide range of ECN scenarios. These scenarios include the following: an ECN followed by another ECN, a Fast Retransmit, or a Retransmit Timeout; a Retransmit Timeout or a Fast Retransmit followed by an ECN; and a congestion window of one packet followed by an ECN.

TCP follows existing algorithms for sending data packets in response to incoming ACKs, multiple duplicate acknowledgements, or retransmit timeouts [RFC2581]. TCP also follows the normal procedures for increasing the congestion window when it receives ACK packets without the ECN-Echo bit set [RFC2581].

6.1.3. The TCP Receiver

When TCP receives a CE data packet at the destination end-system, the TCP data receiver sets the ECN-Echo flag in the TCP header of the subsequent ACK packet.
If ACK withholding is implemented, as in current "delayed-ACK" TCP implementations where the TCP receiver can send an ACK for two arriving data packets, then the ECN-Echo flag in the ACK packet will be set to the OR of the CE bits of all of the data packets being acknowledged. That is, if any of the received data packets are CE packets, then the returning ACK has the ECN-Echo flag set.

To provide robustness against the possibility of a dropped ACK packet carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in a series of ACK packets sent subsequently. The TCP receiver uses the CWR flag received from the TCP sender to determine when to stop setting the ECN-Echo flag.

When an ECN-Capable TCP sender reduces its congestion window for any reason (because of a retransmit timeout, a Fast Retransmit, or in response to an ECN Notification), the TCP sender sets the CWR flag in the TCP header of the first new data packet sent after the window reduction. If that data packet is dropped in the network, then the sending TCP will have to reduce the congestion window again and retransmit the dropped packet.

The "Congestion Window Reduced" information is therefore reliably delivered to the TCP receiver: if the new data packet carrying the CWR flag is dropped, then the TCP sender will have to reduce its congestion window again, and send another new data packet with the CWR flag set. Thus, the CWR bit in the TCP header SHOULD NOT be set on retransmitted packets. When the TCP data sender is ready to set the CWR bit after reducing the congestion window, it SHOULD set the CWR bit only on the first new data packet that it transmits.
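
The interplay of ECN-Echo and CWR described in Sections 6.1.2 and 6.1.3 can be illustrated with a toy model of both endpoints. This is a minimal Python sketch under simplifying assumptions, not a TCP implementation: windows are counted in segments, sequence numbers are segment indices, normal window growth [RFC2581] is omitted, and the class names are hypothetical. The "react at most once per window of data" rule is approximated, as an assumption of this sketch, by remembering the next sequence number at the time of a reduction and ignoring ECN-Echo ACKs that do not acknowledge data beyond it.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Hypothetical, simplified ECN sender (segment units, not bytes)."""
    cwnd: int = 8
    ssthresh: int = 64
    snd_nxt: int = 0           # index of the next new segment
    recover: int = 0           # ECE ACKs with ack_no <= recover are ignored
    cwr_pending: bool = False  # put CWR on the next *new* data segment
    timer_resets: int = 0      # retransmit-timer resets in the 1-MSS case

    def send_new(self) -> dict:
        seg = {"seq": self.snd_nxt, "ect": True, "cwr": self.cwr_pending}
        self.cwr_pending = False           # CWR only on the first new segment
        self.snd_nxt += 1
        return seg

    def retransmit(self, seq: int) -> dict:
        # Retransmissions carry neither ECT (Section 6.1.5) nor CWR.
        return {"seq": seq, "ect": False, "cwr": False}

    def on_ack(self, ack_no: int, ece: bool) -> None:
        """ack_no is the index of the next segment the receiver expects."""
        if not ece or ack_no <= self.recover:
            return  # no ECE (growth omitted), or already reacted this window
        if self.cwnd == 1:
            self.timer_resets += 1         # cannot halve below 1 MSS
        half = self.cwnd // 2
        self.ssthresh = max(half, 2)       # simplified ssthresh reduction
        self.cwnd = max(half, 1)           # never increased on an ECE ACK
        self.recover = self.snd_nxt
        self.cwr_pending = True


@dataclass
class Receiver:
    """Hypothetical receiver: echoes ECE from a CE packet until CWR arrives."""
    rcv_nxt: int = 0
    ece: bool = False

    def on_data(self, seg: dict, ce: bool) -> None:
        if seg["cwr"]:
            self.ece = False               # sender has reduced its window
        if seg["ect"] and ce:
            self.ece = True                # keep echoing until the next CWR
        if seg["seq"] == self.rcv_nxt:
            self.rcv_nxt += 1              # in-order data only, for brevity

    def ack(self) -> tuple:
        return self.rcv_nxt, self.ece


snd, rcv = Sender(), Receiver()
acks = []
for i in range(4):                         # four segments in flight
    rcv.on_data(snd.send_new(), ce=(i == 1))   # a router marks segment 1
    acks.append(rcv.ack())
for ack_no, ece in acks:                   # three ECE ACKs, one reduction
    snd.on_ack(ack_no, ece)
seg = snd.send_new()                       # first new segment carries CWR
rcv.on_data(seg, ce=False)
print(snd.cwnd, seg["cwr"], rcv.ece)       # 4 True False
```

Because the receiver latches ECE until it sees CWR, an ACK covering several data packets naturally carries the OR of their CE bits, and losing one ECE ACK does not lose the congestion indication.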

After a TCP receiver sends an ACK packet with the ECN-Echo bit set, that TCP receiver continues to set the ECN-Echo flag in all the ACK packets it sends (whether they acknowledge CE data packets or non-CE data packets) until it receives a CWR packet (a packet with the CWR flag set). After the receipt of the CWR packet, acknowledgements for subsequent non-CE data packets do not have the ECN-Echo flag set. If another CE packet is received by the data receiver, the receiver would once again send ACK packets with the ECN-Echo flag set. While the receipt of a CWR packet does not guarantee that the data sender received the ECN-Echo message, it does suggest that the data sender reduced its congestion window at some point *after* it sent the data packet for which the CE bit was set.

We have already specified that a TCP sender is not required to reduce its congestion window more than once per window of data. Some care is required if the TCP sender is to avoid unnecessary reductions of the congestion window when a window of data includes both dropped packets and (marked) CE packets. This is illustrated in [Floyd98].

6.1.4. Congestion on the ACK-path

For the current generation of TCP congestion control algorithms, pure acknowledgement packets (i.e., packets that do not contain any accompanying data) should be sent with the ECT bit off. Current TCP receivers have no mechanisms for reducing traffic on the ACK-path in response to congestion notification. Mechanisms for responding to congestion on the ACK-path are areas for current and future research. (One simple possibility would be for the sender to reduce its congestion window when it receives a pure ACK packet with the CE bit set.) For current TCP implementations, a single dropped ACK generally has only a very small effect on the TCP's sending rate.

6.1.5. Retransmitted TCP packets

This document specifies that for ECN-capable TCP implementations, the ECT bit (ECN-Capable Transport) in the IP header MUST NOT be set on retransmitted data packets, and that the TCP data receiver SHOULD ignore the ECN field on arriving data packets that are outside of the receiver's current window. This is for greater security against denial-of-service attacks, as well as for robustness of the ECN congestion indication with packets that are dropped later in the network.

First, we note that if the TCP sender were to set the ECT bit on a retransmitted packet, then if an unnecessarily-retransmitted packet was later dropped in the network, the end nodes would never receive the indication of congestion from the router setting the CE bit. Thus, setting the ECT bit on retransmitted data packets is not consistent with the robust delivery of the congestion indication even for packets that are later dropped in the network.

In addition, an attacker capable of spoofing the IP source address of the TCP sender could send data packets with arbitrary sequence numbers, with both the ECT and CE bits set in the IP header. On receiving this spoofed data packet, the TCP data receiver would determine that the data does not lie in the current receive window, and return a duplicate acknowledgement. We define an out-of-window packet at the TCP data receiver as a data packet that lies outside the receiver's current window. On receiving an out-of-window packet, the TCP data receiver has to decide whether or not to treat the CE bit in the packet header as a valid indication of congestion, and therefore whether to return ECN-Echo indications to the TCP data sender.
If the TCP data receiver ignored the CE bit in an out-of-window packet, then the TCP data sender would not receive this possibly-legitimate indication of congestion from the network, resulting in a violation of end-to-end congestion control. On the other hand, if the TCP data receiver honors the CE indication in the out-of-window packet, and reports the indication of congestion to the TCP data sender, then the malicious node that created the spoofed, out-of-window packet has successfully "attacked" the TCP connection by forcing the data sender to unnecessarily reduce (halve) its congestion window. To prevent such a denial-of-service attack, we specify that a legitimate TCP data sender MUST NOT set the ECT bit on retransmitted data packets, and that the TCP data receiver SHOULD ignore the CE bit on out-of-window packets.

One drawback of not setting ECT on retransmitted packets is that it denies ECN protection to retransmitted packets. However, for an ECN-capable TCP connection in a fully-ECN-capable environment with mild congestion, packets should rarely be dropped due to congestion in the first place, and so instances of retransmitted packets should rarely arise. If packets are being retransmitted, then there are already packet losses (from corruption or from congestion) that ECN has been unable to prevent.

We note that if the router sets the CE bit for an ECN-capable data packet within a TCP connection, then the TCP connection is guaranteed to receive that indication of congestion, or to receive some other indication of congestion within the same window of data, even if this packet is dropped or reordered in the network. We consider two cases: when the packet is later retransmitted, and when it is not.

In the first case, if the packet is either dropped or delayed, and at some point retransmitted by the data sender, then the retransmission is a result of a Fast Retransmit or a Retransmit Timeout for either that packet or for some prior packet in the same window of data. In this case, because the data sender already has retransmitted this packet, we know that the data sender has already responded to an indication of congestion for some packet within the same window of data as the original packet. Thus, even if the first transmission of the packet is dropped in the network, or is delayed, if it had the CE bit set, and is later ignored by the data receiver as an out-of-window packet, this is not a problem, because the sender has already responded to an indication of congestion for that window of data.

In the second case, if the packet is never retransmitted by the data sender, then this data packet is the only copy of this data received by the data receiver, and therefore arrives at the data receiver as an in-window packet, regardless of how much the packet might be delayed or reordered. In this case, if the CE bit is set on the packet within the network, this will be treated by the data receiver as a valid indication of congestion.

6.1.6. TCP Window Probes

When the TCP data receiver advertises a zero window, the TCP data sender sends window probes to determine if the receiver's window has increased. Window probe packets do not contain any user data except for the sequence number, which is a byte. If a window probe packet is dropped in the network, this loss is not detected by the receiver. Therefore, the TCP data sender MUST NOT set either the ECT or CWR bits on window probe packets.

However, because window probes use exact sequence numbers, they cannot be easily spoofed in denial-of-service attacks.
Therefore, if a window probe arrives with ECT and CE set, then the receiver SHOULD respond to the ECN indications.

7. Non-compliance by the End Nodes

This section discusses concerns about the vulnerability of ECN to non-compliant end-nodes (i.e., end nodes that set the ECT bit in transmitted packets but do not respond to received CE packets). We argue that the addition of ECN to the IP architecture will not significantly increase the current vulnerability of the architecture to unresponsive flows.

Even for non-ECN environments, there are serious concerns about the damage that can be done by non-compliant or unresponsive flows (that is, flows that do not respond to congestion control indications by reducing their arrival rate at the congested link). For example, an end-node could "turn off congestion control" by not reducing its congestion window in response to packet drops. This is a concern for the current Internet. It has been argued that routers will have to deploy mechanisms to detect and differentially treat packets from non-compliant flows [RFC2309,FF99]. It has also been suggested that techniques such as end-to-end per-flow scheduling and isolation of one flow from another, differentiated services, or end-to-end reservations could remove some of the more damaging effects of unresponsive flows.

It might seem that dropping packets in itself is an adequate deterrent for non-compliance, and that the use of ECN removes this deterrent. We would argue in response that (1) ECN-capable routers preserve packet-dropping behavior in times of high congestion; and (2) even in times of high congestion, dropping packets in itself is not an adequate deterrent for non-compliance.

First, ECN-Capable routers will only mark packets (as opposed to dropping them) when the packet marking rate is reasonably low.
During periods where the average queue size exceeds an upper threshold, and therefore the potential packet marking rate would be high, our recommendation is that routers drop packets rather than set the CE bit in packet headers.

During the periods of low or moderate packet marking rates when ECN would be deployed, dropping rather than marking those packets would have little deterrent effect on unresponsive flows. For example, delay-insensitive flows using reliable delivery might have an incentive to increase rather than to decrease their sending rate in the presence of dropped packets. Similarly, delay-sensitive flows using unreliable delivery might increase their use of FEC in response to an increased packet drop rate, increasing rather than decreasing their sending rate. For the same reasons, we do not believe that packet dropping itself is an effective deterrent for non-compliance even in an environment of high packet drop rates, when all flows are sharing the same packet drop rate.

Several methods have been proposed to identify and restrict non-compliant or unresponsive flows. The addition of ECN to the network environment would not in any way increase the difficulty of designing and deploying such mechanisms. If anything, the addition of ECN to the architecture would make the job of identifying unresponsive flows slightly easier. For example, in an ECN-Capable environment routers are not limited to information about packets that are dropped or have the CE bit set at that router itself; in such an environment, routers could also take note of arriving CE packets that indicate congestion encountered by that packet earlier in the path.

8. Non-compliance in the Network

This section considers the issues when a router is operating, possibly maliciously, to modify either of the bits in the ECN field.
In this section we represent the ECN field in the IP header by the tuple (ECT bit, CE bit).

By tampering with the bits in the ECN field, an adversary (or a broken router) could do one or more of the following: falsely report congestion, disable ECN-Capability for an individual packet, erase the ECN congestion indication, or falsely indicate ECN-Capability. Appendix X systematically examines the various cases by which the ECN field could be modified. The important criterion considered in determining the consequences of such modifications is whether they are likely to lead to poorer behavior in any dimension (throughput, delay, fairness or functionality) than if a router were to drop a packet.

The first two possible changes, falsely reporting congestion or disabling ECN-Capability for an individual packet, are no worse than if the router were to simply drop the packet. From a congestion control point of view, setting the CE bit in the absence of congestion by a non-compliant router would be no worse than a router dropping a packet unnecessarily. By "erasing" the ECT bit of a packet that is later dropped in the network, a router's actions could result in an unnecessary packet drop for that packet later in the network.

However, as discussed in Section X in the Appendix, a router that erases the ECN congestion indication or falsely indicates ECN-Capability could potentially do more damage to the flow than if it had simply dropped the packet. A rogue or broken router that "erased" the CE bit in arriving CE packets would prevent that indication of congestion from reaching downstream receivers. This could result in the failure of congestion control for that flow and a resulting increase in congestion in the network, ultimately resulting in subsequent packets dropped for this flow as the average queue size increased at the congested gateway.
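
As a rough illustration of the four effects listed above, the following Python sketch maps a change of the (ECT bit, CE bit) tuple to the effect each flipped bit would have, together with whether the text above judges it no worse than a drop. This is a simplified, hypothetical mapping (one effect per flipped bit); it does not reproduce the case-by-case analysis of Appendix X, which may draw finer distinctions.

```python
# Hypothetical mapping: (field, old value, new value) -> effect named above.
EFFECTS = {
    ("CE", 0, 1): "falsely report congestion",
    ("ECT", 1, 0): "disable ECN-Capability",
    ("CE", 1, 0): "erase the ECN congestion indication",
    ("ECT", 0, 1): "falsely indicate ECN-Capability",
}
# Per the text, only the first two are no worse than dropping the packet.
NO_WORSE_THAN_DROP = {"falsely report congestion", "disable ECN-Capability"}


def tampering_effects(before: tuple, after: tuple) -> list:
    """Return (effect, no_worse_than_drop) for each bit of (ECT, CE) changed."""
    result = []
    for i, name in enumerate(("ECT", "CE")):
        key = (name, before[i], after[i])
        if key in EFFECTS:
            effect = EFFECTS[key]
            result.append((effect, effect in NO_WORSE_THAN_DROP))
    return result


print(tampering_effects((1, 1), (1, 0)))
# [('erase the ECN congestion indication', False)]
```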

Appendix X considers the potential repercussions of subverting end-to-end congestion control either by falsely indicating ECN-Capability or by erasing the congestion indication in ECN (the CE bit). We observe in the Appendix that subverting ECN-based congestion control may lead to unfairness, but this is likely to be no worse than the subversion of either ECN-based or packet-based congestion control by the end nodes.

8.1. Complications Introduced by Split Paths

If a router or other network element has access to all of the packets of a flow, then that router could do no more damage to a flow by altering the ECN field than it could by simply dropping all of the packets from that flow. However, in some cases, a malicious or broken router might have access to only a subset of the packets from a flow. The question is as follows: can this router, by altering the ECN field in this subset of the packets, do more damage to that flow than if it had simply dropped that set of packets?

This is also discussed in detail in the Appendix, which concludes as follows: It is true that the adversary that has access only to a subset of packets in an aggregate might, by subverting ECN-based congestion control, be able to deny the benefits of ECN to the other packets in the aggregate. While this is undesirable, this is not a sufficient concern to result in disabling ECN within an IP tunnel.

9. Encapsulated Packets

9.1. IP packets encapsulated in IP

The encapsulation of IP packet headers in tunnels is used in many places, including IPsec and IP in IP [RFC2003]. Currently, the ECN specification does not accommodate the constraints imposed by some of these pre-existing specifications for tunnels. This document considers issues related to interactions between ECN and IP tunnels, and specifies two alternative solutions.

Some IP tunnel modes are based on adding a new "outer" IP header that encapsulates the original, or "inner" IP header and its associated packet. In many cases, the new "outer" IP header may be added and removed at intermediate points along a connection, enabling the network to establish a tunnel without requiring endpoint participation. We denote tunnels that specify that the outer header be discarded at tunnel egress as "simple tunnels".

ECN uses the ECT and CE flags in the IP header for signaling between routers and connection endpoints. ECN interacts with IP tunnels because the ECT and CE flags are carried in the DS field octet in the IP header [RFC2474] (also referred to as the IPv4 TOS octet or IPv6 Traffic Class octet). [RFC2983] discusses interactions of Differentiated Services with IP tunnels of various forms. In simple IP tunnels the DS field octet is copied or mapped from the inner IP header to the outer IP header at IP tunnel ingress, and the outer header's copy of this field is discarded at IP tunnel egress. If the outer header were simply discarded without handling the ECN-related flags, and an ECN-capable router were to set the CE (Congestion Experienced) bit within a packet in a simple IP tunnel, this indication would be discarded at tunnel egress, losing the indication of congestion.

Thus, the use of ECN over simple IP tunnels would result in routers attempting to use the outer IP header to signal congestion to endpoints, but those congestion warnings never arriving because the outer header is discarded at the tunnel egress point. This problem was encountered with ECN and IPsec in tunnel mode, and RFC 2481 recommended that ECN not be used with the older simple IPsec tunnels in order to avoid this behavior and its consequences.
When ECN becomes widely deployed, simple tunnels that are likely to carry ECN-capable traffic will have to be changed.

The use of ECN in the outer header of an IP tunnel might raise security concerns because an adversary could tamper with the ECN information that propagates beyond the tunnel endpoint. Based on an analysis in the Appendix of these concerns and the resultant risks, our overall approach is to make support for ECN an option for IP tunnels, so that an IP tunnel can be specified or configured either to use ECN or not to use ECN in the outer header of the tunnel. Thus, in environments or tunneling protocols where the risks of using ECN are judged to outweigh its benefits, the tunnel can simply refrain from using ECN in the outer header. Then the only indication of congestion experienced at routers within the tunnel would be through packet loss.

The result is that there are two viable options for the behavior of ECN-capable connections over an IP tunnel, especially IPsec tunnels:

* A limited-functionality option in which ECN is preserved in the inner header, but disabled in the outer header. The only mechanism available for signaling congestion occurring within the tunnel in this case is dropped packets.
* A full-functionality option that supports ECN in both the inner and outer headers, and propagates congestion warnings from nodes within the tunnel to endpoints.

Support for these options requires varying amounts of changes to IP header processing at tunnel ingress and egress. The small subset of these changes needed to support only the limited-functionality option would suffice to eliminate any incompatibility between ECN and IP tunnels.

One goal of this document is to give guidance about the tradeoffs between the limited-functionality and full-functionality options.
A full discussion of the potential effects of an adversary's modifications of the CE and ECT bits is given in the Appendix.

9.1.1. The limited-functionality and full-functionality options within IP Tunnels

The limited-functionality option for ECN encapsulation in IP tunnels is for the ECT bit in the outside (encapsulating) header to be off (i.e., set to 0), regardless of the value of the ECT bit in the inside (encapsulated) header. With this option, the ECN field in the inner header is not altered upon decapsulation. The disadvantage of this approach is that the flow does not have ECN support for the part of the path that uses IP tunneling, even if the encapsulated packet (from the original TCP sender) is ECN-Capable. That is, if the encapsulated packet arrives at a congested ECN-capable router, and the router decides to indicate congestion to the end nodes, the router will not be permitted to set the CE bit in the packet header, but will instead have to drop the packet.

The full-functionality option for ECN encapsulation is to copy the ECT bit of the inside header to the outside header on encapsulation, and to OR the CE bit from the outer header with the CE bit of the inside header on decapsulation. That is, for full ECN support the encapsulation and decapsulation processing for the DS field octet involves the following. At tunnel ingress, the full-functionality option copies the value of ECT (bit 6) in the inner header to the outer header, and sets CE (bit 7) to 0 in the outer header. Upon decapsulation at the tunnel egress, the full-functionality option sets CE to 1 in the inner header if the value of ECT (bit 6) in the inner header is 1 and the value of CE (bit 7) in the outer header is 1. Otherwise, no change is made to this field of the inner header.
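The two options can be summarized as a pair of ingress and egress rules. The following Python sketch is illustrative only: it models just the two ECN bits of a header as an (ECT, CE) pair, and the names `EcnBits`, `encapsulate`, and `decapsulate` are hypothetical, not taken from any tunnel implementation.

```python
from dataclasses import dataclass, replace

# Hypothetical names; a sketch of the two options, not a reference implementation.


@dataclass(frozen=True)
class EcnBits:
    """The two ECN bits of an IP header (ECT = bit 6, CE = bit 7 of the DS octet)."""
    ect: int  # 1 if the transport is ECN-capable
    ce: int   # 1 if a router has signaled Congestion Experienced


def encapsulate(inner: EcnBits, full: bool) -> EcnBits:
    """Return the ECN bits of the new outer header at tunnel ingress."""
    if full:
        # Full functionality: copy ECT from the inner header; CE starts at 0.
        return EcnBits(ect=inner.ect, ce=0)
    # Limited functionality: ECT is off in the outer header whatever the inner value.
    return EcnBits(ect=0, ce=0)


def decapsulate(inner: EcnBits, outer: EcnBits, full: bool) -> EcnBits:
    """Return the inner header's ECN bits after processing at tunnel egress."""
    if full and inner.ect == 1 and outer.ce == 1:
        # Full functionality: OR the outer CE bit into the inner CE bit.
        return replace(inner, ce=1)
    # Limited functionality, or nothing to propagate: inner header unchanged.
    return inner


# Usage: an ECN-capable packet that is marked inside a full-functionality tunnel.
outer = encapsulate(EcnBits(ect=1, ce=0), full=True)
marked = replace(outer, ce=1)  # a congested router inside the tunnel sets CE
print(decapsulate(EcnBits(ect=1, ce=0), marked, full=True))  # prints EcnBits(ect=1, ce=1)
```

Note that the limited-functionality egress never touches the inner header; under that option a congested router sees ECT=0 in the outer header and has to drop the packet instead.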

With the full-functionality option, a flow can take advantage of ECN in those parts of the path that might use IP tunneling. The disadvantage of the full-functionality option from a security perspective is that the IP tunnel cannot protect the flow from certain modifications to the ECN bits in the IP header within the tunnel. The potential dangers from modifications to the ECN bits in the IP header are described in detail in the Appendix.

   (1) An IP tunnel MUST modify the handling of the DS field octet at IP tunnel endpoints by implementing either the limited-functionality or the full-functionality option.
   (2) Optionally, an IP tunnel MAY enable its endpoints to negotiate the choice between the limited-functionality and the full-functionality option for ECN in the tunnel.

The minimum required to make ECN usable with IP tunnels is the limited-functionality option, which prevents ECN from being enabled in the outer header of an IP tunnel. Full support for ECN requires the use of the full-functionality option. If there are no optional mechanisms for the tunnel endpoints to negotiate a choice between the limited-functionality and full-functionality options, there can be a pre-existing agreement between the tunnel endpoints about whether to support the limited-functionality or the full-functionality ECN option.

In addition, it is RECOMMENDED that packets with ECT and CE both set to 1 in the outer header be dropped if they arrive at the tunnel egress point for a tunnel that uses the limited-functionality option, or for a tunnel that uses the full-functionality option but for which the ECT bit in the inner header is set to zero. This is motivated by backwards compatibility and by the need to ensure that no unauthorized modifications of the ECN field take place, and is discussed further in the Appendix.
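The RECOMMENDED egress check above reduces to a small predicate. This is a hedged Python sketch: `should_drop_at_egress` is a hypothetical name, and the Boolean `full` parameter stands in for however a tunnel actually records which option it uses.

```python
# Hypothetical helper; a sketch of the RECOMMENDED egress drop rule.


def should_drop_at_egress(outer_ect: int, outer_ce: int, inner_ect: int, full: bool) -> bool:
    """Return True if a packet arriving at the tunnel egress should be dropped.

    `full` is True for the full-functionality option and False for the
    limited-functionality option.
    """
    if not (outer_ect == 1 and outer_ce == 1):
        # Only packets with both ECT and CE set in the outer header are affected.
        return False
    if not full:
        # A conforming limited-functionality ingress never sets ECT in the outer header.
        return True
    # A conforming full-functionality ingress copies ECT, so outer ECT=1 with
    # inner ECT=0 cannot have been produced by it.
    return inner_ect == 0


print(should_drop_at_egress(1, 1, 0, full=True))  # prints True
```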

9.1.2. Changes to the ECN Field within an IP Tunnel.

The presence of a copy of the ECN field in the inner header of an IP tunnel mode packet provides an opportunity for detection of unauthorized modifications to the ECT bit in the outer header. Comparison of the ECT bits in the inner and outer headers falls into two categories for implementations that conform to this document:

   * If the IP tunnel uses the full-functionality option, then the values of the ECT bits in the inner and outer headers should be identical.
   * If the tunnel uses the limited-functionality option, then the ECT bit in the outer header should be 0.

Receipt of a packet not satisfying the appropriate condition could be a cause of concern.

Consider the case of an IP tunnel where the tunnel ingress point has not been updated to this document's requirements, while the tunnel egress point has been updated to support ECN. In this case, the IP tunnel is not explicitly configured to support the full-functionality ECN option. However, because the ingress copies the DS field octet, including the ECT bit, from the inner header to the outer header, it behaves identically to a tunnel ingress point that supports the full-functionality option. If packets from an ECN-capable connection use this tunnel, ECT will be set to 1 in the outer header at the tunnel ingress point. Congestion within the tunnel may then result in ECN-capable routers setting CE in the outer header. Because the tunnel has not been explicitly configured to support the full-functionality option, the tunnel egress point expects the ECT bit in the outer header to be 0. When an ECN-capable tunnel egress point receives a packet with the ECT bit in the outer header set to 1, in a tunnel that has not been configured to support the full-functionality option, that packet should be processed, according to whether the CE bit was set, as follows.
It is RECOMMENDED that such packets, with the ECT bit in the outer header set to 1 on a tunnel that has not been configured to support the full-functionality option, be dropped at the egress point if CE is set to 1 in the outer header but 0 in the inner header, and forwarded otherwise.

An IP tunnel cannot provide protection against erasure of congestion indications based on resetting the value of the CE bit in packets for which ECT is set in the outer header. The erasure of congestion indications may impact the network and other flows in ways that would not be possible in the absence of ECN. It is important to note that erasure can only affect congestion indications placed by nodes within the tunnel; the copy of the CE bit in the inner header preserves congestion notifications from nodes upstream of the tunnel ingress. If erasure of congestion notifications is judged to be a security risk that exceeds the congestion management benefits of ECN, then tunnels could be specified or configured to use the limited-functionality option.

9.2. IPsec Tunnels

IPsec supports secure communication over potentially insecure network components such as intermediate routers. IPsec protocols support two operating modes, transport mode and tunnel mode, that span a wide range of security requirements and operating environments. Transport mode security protocol header(s) are inserted between the IP (IPv4 or IPv6) header and higher layer protocol headers (e.g., TCP), and hence transport mode can only be used for end-to-end security on a connection. IPsec tunnel mode is based on adding a new "outer" IP header that encapsulates the original, or "inner", IP header and its associated packet. Tunnel mode security headers are inserted between these two IP headers.
In contrast to transport mode, the new "outer" IP header and tunnel mode security headers can be added and removed at intermediate points along a connection, enabling security gateways to secure vulnerable portions of a connection without requiring endpoint participation in the security protocols. An important aspect of tunnel mode security is that, in the original specification, the outer header is discarded at tunnel egress, ensuring that security threats based on modifying the IP header do not propagate beyond that tunnel endpoint. Further discussion of IPsec can be found in [RFC2401].

The IPsec protocol as originally defined in [ESP, AH] required that the inner header's ECN field not be changed by IPsec decapsulation processing at a tunnel egress node; this would have ruled out the possibility of full-functionality mode for ECN. At the same time, this would ensure that an adversary's modifications to the ECN field cannot be used to launch theft- or denial-of-service attacks across an IPsec tunnel endpoint, as any such modifications will be discarded at the tunnel endpoint.

In principle, permitting the use of ECN functionality in the outer header of an IPsec tunnel raises security concerns because an adversary could tamper with the information that propagates beyond the tunnel endpoint. Based on an analysis (included in the Appendix) of these concerns and the associated risks, our overall approach has been to provide configuration support for IPsec changes to remove the conflict with ECN.

In particular, in tunnel mode the IPsec tunnel MUST support either the limited-functionality or the full-functionality mode outlined in Section 9.1.1.
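Because an IPsec tunnel, like any IP tunnel, runs in one of these two modes, its egress is in a position to apply the outer-ECT consistency conditions and the unconfigured-tunnel rule of Section 9.1.2. The following Python sketch is illustrative only; `TunnelMode`, `outer_ect_consistent`, and `egress_action_unconfigured` are hypothetical names, and returning strings for the action is a simplification.

```python
from enum import Enum

# Hypothetical names; a sketch of the Section 9.1.2 egress checks.


class TunnelMode(Enum):
    LIMITED = "limited"
    FULL = "full"


def outer_ect_consistent(mode: TunnelMode, inner_ect: int, outer_ect: int) -> bool:
    """Check the outer ECT bit against what a conforming ingress would produce."""
    if mode is TunnelMode.FULL:
        # Full functionality copies ECT, so the inner and outer bits should match.
        return outer_ect == inner_ect
    # Limited functionality always sends ECT=0 in the outer header.
    return outer_ect == 0


def egress_action_unconfigured(outer_ect: int, outer_ce: int, inner_ce: int) -> str:
    """RECOMMENDED handling on a tunnel not configured for full functionality."""
    if outer_ect == 1 and outer_ce == 1 and inner_ce == 0:
        # The congestion mark exists only in the outer header, which is about
        # to be discarded, so dropping the packet keeps the congestion signal.
        return "drop"
    return "forward"


print(outer_ect_consistent(TunnelMode.LIMITED, inner_ect=1, outer_ect=1))  # prints False
```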

This makes permission to use ECN functionality in the outer header of an IPsec tunnel a configurable part of the corresponding IPsec Security Association (SA), so that it can be disabled in situations where the risks are judged to outweigh the benefits. The result is that an IPsec security administrator is presented with two alternatives for the behavior of ECN-capable connections within an IPsec tunnel, the limited-functionality alternative and full-functionality alternative described earlier. All IPsec implementations MUST implement either the limited-functionality or the full-functionality alternative in order to eliminate incompatibility between ECN and IPsec tunnels, but implementers MAY choose to implement either alternative.

In addition, this document specifies how the endpoints of an IPsec tunnel could negotiate enabling ECN functionality in the outer headers of that tunnel based on security policy. The ability to negotiate ECN usage between tunnel endpoints would enable a security administrator to disable ECN in situations where she believes the risks (e.g., of lost congestion notifications) outweigh the benefits of ECN.

The IPsec protocol, as defined in [ESP, AH], does not include the IP header's ECN field in any of its cryptographic calculations (in the case of tunnel mode, the outer IP header's ECN field is not included). Hence modification of the ECN field by a network node has no effect on IPsec's end-to-end security, because it cannot cause any IPsec integrity check to fail. As a consequence, IPsec does not provide any defense against an adversary's modification of the ECN field (i.e., a man-in-the-middle attack), as the adversary's modification will also have no effect on IPsec's end-to-end security.
In some environments, the ability to modify the ECN field without affecting IPsec integrity checks may constitute a covert channel; if it is necessary to eliminate such a channel or reduce its bandwidth, then the IPsec tunnel should be run in limited-functionality mode.

9.2.1. Negotiation between Tunnel Endpoints

This section describes the detailed changes to enable usage of ECN over IPsec tunnels, including the negotiation of ECN support between tunnel endpoints. This is supported by three changes to IPsec:

   * An optional Security Association Database (SAD) field indicating whether tunnel encapsulation and decapsulation processing allows or forbids ECN usage in the outer IP header.
   * An optional Security Association Attribute that enables negotiation of this SAD field between the two endpoints of an SA that supports tunnel mode.
   * Changes to tunnel mode encapsulation and decapsulation processing to allow or forbid ECN usage in the outer IP header based on the value of the SAD field. When ECN usage is allowed in the outer IP header, ECT is set in the outer header for ECN-capable connections, and congestion notifications (indicated by the CE bit) from such connections are propagated to the inner header at tunnel egress.

If negotiation of ECN usage is implemented, then the SAD field SHOULD also be implemented. On the other hand, negotiation of ECN usage is OPTIONAL in all cases, even for implementations that support the SAD field. The encapsulation and decapsulation processing changes are REQUIRED, but MAY be implemented without the other two changes by assuming that ECN usage is always forbidden. The full-functionality alternative for ECN usage over IPsec tunnels consists of the SAD field and the full version of the encapsulation and decapsulation processing changes, with or without the OPTIONAL negotiation support.
The limited-functionality alternative consists of a subset of the encapsulation and decapsulation changes that always forbids ECN usage.

These changes are covered further in the following three subsections.

9.2.1.1. ECN Tunnel Security Association Database Field

Full ECN functionality adds a new field to the SAD (see [RFC2401]):

   ECN Tunnel: allowed or forbidden.

      Indicates whether ECN-capable connections using this SA in tunnel mode are permitted to receive ECN congestion notifications for congestion occurring within the tunnel. The allowed value enables ECN congestion notifications. The forbidden value disables such notifications, causing all congestion to be indicated via dropped packets.

      [OPTIONAL. The value of this field SHOULD be assumed to be "forbidden" in implementations that do not support it.]

If this attribute is implemented, then the SA specification in a Security Policy Database (SPD) entry MUST support a corresponding attribute, and this SPD attribute MUST be covered by the SPD administrative interface (currently described in Section 4.4.1 of [RFC2401]).

9.2.1.2. ECN Tunnel Security Association Attribute

A new IPsec Security Association Attribute is defined to enable the support for ECN congestion notifications based on the outer IP header to be negotiated for IPsec tunnels (see [RFC2407]). This attribute is OPTIONAL, although implementations that support it SHOULD also support the SAD field defined in Section 9.2.1.1.

   Attribute Type

             class               value           type
      -------------------------------------------------
      ECN Tunnel                  10             Basic

The IPsec SA Attribute value 10 has been allocated by IANA to indicate that the ECN Tunnel SA Attribute is being negotiated; the type of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class Values are used to conduct the negotiation.
See [RFC2407, RFC2408, RFC2409] for further information, including encoding formats and requirements for negotiating this SA attribute.

   Class Values

      ECN Tunnel

      Specifies whether ECN functionality is allowed to be used with Tunnel Encapsulation Mode. This affects tunnel encapsulation and decapsulation processing; see Section 9.2.1.3.

         RESERVED        0
         Allowed         1
         Forbidden       2

      Values 3-61439 are reserved to IANA. Values 61440-65535 are for private use.

      If unspecified, the default shall be assumed to be Forbidden.

ECN Tunnel is a new SA attribute, and hence initiators that use it can expect to encounter responders that do not understand it, and therefore reject proposals containing it. For backwards compatibility with such implementations, initiators SHOULD always also include a proposal without the ECN Tunnel attribute, to enable such a responder to select a transform or proposal that does not contain the ECN Tunnel attribute. RFC 2407 currently requires responders to reject all proposals if any proposal contains an unknown attribute; this requirement is expected to be changed to require a responder not to select proposals or transforms containing unknown attributes.

9.2.1.3. Changes to IPsec Tunnel Header Processing

Subsequent to the publication of [RFC2401], the TOS octet of IPv4 and the Traffic Class octet of IPv6 have been superseded by the six-bit DS Field [RFC2474, RFC2780] and a two-bit "currently unused" (CU) field [RFC2780], and this document replaces the CU field with the ECN Field.
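To make this layout concrete: numbering the bits of the octet from 0 (most significant) to 7, as this document does when it places ECT at bit 6 and CE at bit 7, the six-bit DS Field occupies bits 0-5 and the two-bit ECN Field occupies bits 6-7. The Python sketch below splits and rebuilds the octet under that layout; `split_ds_octet` and `join_ds_octet` are hypothetical helper names.

```python
# Hypothetical helpers; masks follow this document's ECT (bit 6) / CE (bit 7) layout.
ECT_MASK = 0b10  # octet bit 6 (MSB-first numbering), i.e. ECN Field bit 0
CE_MASK = 0b01   # octet bit 7, i.e. ECN Field bit 1


def split_ds_octet(octet: int) -> tuple[int, int, int]:
    """Split a TOS / Traffic Class octet into (DSCP, ECT, CE)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must fit in 8 bits")
    dscp = octet >> 2  # the six high-order bits form the DS Field
    ect = (octet & ECT_MASK) >> 1
    ce = octet & CE_MASK
    return dscp, ect, ce


def join_ds_octet(dscp: int, ect: int, ce: int) -> int:
    """Rebuild the octet from a six-bit DSCP and the two ECN bits."""
    if not 0 <= dscp < 64 or ect not in (0, 1) or ce not in (0, 1):
        raise ValueError("field out of range")
    return (dscp << 2) | (ect << 1) | ce


print(split_ds_octet(0xBA))  # prints (46, 1, 0)
```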

For full ECN support, the encapsulation and decapsulation processing for the IPv4 TOS field and the IPv6 Traffic Class field is changed from that specified in [RFC2401] to the following:

                      <-- How Outer Hdr Relates to Inner Hdr -->
                      Outer Hdr at                  Inner Hdr at
   IPv4               Encapsulator                  Decapsulator
     Header fields:   --------------------          ------------
       DS Field       copied from inner hdr (5)     no change
       ECN Field      constructed (7)               constructed (8)

   IPv6
     Header fields:
       DS Field       copied from inner hdr (6)     no change
       ECN Field      constructed (7)               constructed (8)

      (5)(6) If the packet will immediately enter a domain for which the DSCP value in the outer header is not appropriate, that value MUST be mapped to an appropriate value for the domain [RFC2474]. Also see [RFC2475] for further information.

      (7) If the value of the ECN Tunnel field in the SAD entry for this SA is "allowed" and the value of ECT (bit 0 of the ECN Field) is 1 in the inner header, set ECT to 1 in the outer header, else set ECT to 0 in the outer header. Set CE (bit 1 of the ECN Field) to 0 in the outer header.

      (8) If the value of the ECN Tunnel field in the SAD entry for this SA is "allowed" and the value of ECT (bit 0) in the inner header is 1, then set the CE bit (bit 1) in the inner header to the logical OR of the CE bit in the inner header with the CE bit in the outer header, else make no change to the ECN field.

      Notes (5) and (6) are identical; they are kept as separate notes to match the numbering used in [RFC2401], where the corresponding notes differ.

The above description applies to implementations that support the ECN Tunnel field in the SAD; such implementations MUST implement this processing of the DS field instead of the processing of the IPv4 TOS octet and IPv6 Traffic Class octet defined in [RFC2401]. This constitutes the full-functionality alternative for ECN usage with IPsec tunnels.
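Notes (7) and (8), together with the default of Forbidden for the ECN Tunnel attribute, can be sketched on whole TOS/Traffic Class octets. This Python sketch is a hedged illustration under stated assumptions: the `EcnTunnel` enum and the functions are hypothetical, the DS Field is copied unchanged (the DSCP remapping of notes (5) and (6) is omitted), and attribute class values other than 1 and 2 are rejected rather than interpreted.

```python
from enum import Enum
from typing import Optional

# Hypothetical names; a sketch of the SAD-driven processing, not IPsec code.
ECT = 0b10                   # ECN Field bit 0 (octet bit 6)
CE = 0b01                    # ECN Field bit 1 (octet bit 7)
DS_FIELD_MASK = 0b1111_1100  # the six-bit DS Field


class EcnTunnel(Enum):
    """Model of the SAD "ECN Tunnel" field, keyed by the SA attribute class value."""
    ALLOWED = 1
    FORBIDDEN = 2


def sad_value(class_value: Optional[int]) -> EcnTunnel:
    """Map a negotiated class value to the SAD field; an absent attribute means Forbidden."""
    if class_value is None:
        return EcnTunnel.FORBIDDEN
    return EcnTunnel(class_value)  # raises ValueError for values other than 1 or 2


def ipsec_encapsulate(inner: int, sad: EcnTunnel) -> int:
    """Build the outer TOS / Traffic Class octet at the encapsulator."""
    outer = inner & DS_FIELD_MASK  # (5)/(6): DS Field copied; remapping omitted
    if sad is EcnTunnel.ALLOWED and inner & ECT:
        outer |= ECT               # (7): ECT copied only when ECN is allowed
    return outer                   # (7): CE is always 0 in the outer header


def ipsec_decapsulate(inner: int, outer: int, sad: EcnTunnel) -> int:
    """Return the inner octet after decapsulation; the DS Field is unchanged."""
    if sad is EcnTunnel.ALLOWED and inner & ECT:
        return inner | (outer & CE)  # (8): inner CE = inner CE OR outer CE
    return inner                     # (8): otherwise no change to the ECN field


print(ipsec_encapsulate(0xBA, sad_value(None)) == 0xB8)  # prints True: ECT cleared
```

With the SAD field treated as Forbidden for every SA, these two functions reduce to the limited-functionality processing: the outer ECN field is always zero and the inner header is never changed.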

An implementation that does not support the ECN Tunnel field in the SAD MUST implement processing of the DS Field by assuming that the value of the ECN Tunnel field of the SAD is "forbidden" for every SA. In this case, the processing of the ECN field reduces to:

      (7) Set the ECN field (ECT and CE bits) to zero in the outer header.
      (8) Make no change to the ECN field in the inner header.

This constitutes the limited-functionality alternative for ECN usage with IPsec tunnels.

For backwards compatibility, packets with ECT and CE both set to 1 in the outer header SHOULD be dropped if they arrive on an SA that is using the limited-functionality option, or on an SA that is using the full-functionality option when the packet has the ECT flag set to 0 in the inner header (so that the outer ECT flag of 1 cannot have been copied from the inner header at encapsulation).

9.2.2. Changes to the ECN Field within an IPsec Tunnel.

If the ECN Field is changed inappropriately within an IPsec tunnel, and this change is detected at the tunnel egress, then the receipt of a packet not satisfying the appropriate condition for its SA is an auditable event. An implementation MAY create audit records with per-SA counts of incorrect packets over some time period rather than creating an audit record for each erroneous packet. Any such audit record SHOULD contain the headers from at least one erroneous packet, but need not contain the headers from every packet represented by the entry.

9.2.3. Comments for IPsec Support

Substantial comments were received on two areas of this document during review by the IPsec working group. This section describes these comments and explains why the proposed changes were not incorporated.

The first comment indicated that per-node configuration is easier to implement than per-SA configuration.
After serious thought, and despite some initial encouragement of per-node configuration, it no longer seems to be a good idea. The concern is that as IPsec is progressively deployed, many ECN-aware IPsec implementations will find themselves communicating with a mixture of ECN-aware and ECN-unaware IPsec tunnel endpoints. In such an environment with per-node configuration, the only reasonable thing to do is to forbid ECN usage for all IPsec tunnels, which is not the desired outcome.

In the second area, several reviewers noted that SA negotiation is complex, and adding to it is non-trivial. One reviewer suggested using ICMP after tunnel setup as a possible alternative. The addition to SA negotiation in this document is OPTIONAL and will remain so; implementers are free to ignore it. The authors believe that the assurance it provides can be useful in a number of situations. In practice, if this is not implemented, it can be deleted at a subsequent stage in the standards process. Extending ICMP to negotiate ECN after tunnel setup is more complex than extending SA attribute negotiation. Some tunnels do not permit traffic to be addressed to the tunnel egress endpoint, hence the ICMP packet would have to be addressed to somewhere else, scanned for by the egress endpoint, and discarded there or at its actual destination. In addition, ICMP delivery is unreliable, and hence there is a possibility of an ICMP packet being dropped, entailing the invention of yet another ack/retransmit mechanism. It seems better simply to specify an OPTIONAL extension to the existing SA negotiation mechanism.

9.3. IP packets encapsulated in non-IP packet headers.

A different set of issues is raised, relative to ECN, when IP packets are encapsulated in tunnels with non-IP packet headers.
This occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. For these protocols, there is no conflict with ECN; it is just that ECN cannot be used within the tunnel unless an ECN codepoint can be specified for the header of the encapsulating protocol. [RFD99] considered a preliminary proposal for incorporating ECN into MPLS, and proposals for incorporating ECN into GRE, L2TP, or PPTP will be considered as the need arises.

10. Issues Raised by Monitoring and Policing Devices

One possibility is that monitoring and policing devices (or more informally, "penalty boxes") will be installed in the network to monitor whether best-effort flows are appropriately responding to congestion, and to preferentially drop packets from flows determined not to be using adequate end-to-end congestion control procedures. This is discussed in more detail in the Appendix.

We recommend that any "penalty box" that detects a flow or an aggregate of flows that is not responding to end-to-end congestion control first change from marking to dropping packets from that flow, before taking any additional action to restrict the bandwidth available to that flow. Thus, initially, the router may drop packets in which the router would otherwise have set the CE bit. This could include dropping those arriving packets for that flow that are ECN-Capable and that already have the CE bit set. In this way, any congestion indications seen by that router for that flow will be guaranteed to also be seen by the end nodes, even in the presence of malicious or broken routers elsewhere in the path.
If we assume that the first action taken at any "penalty box" for an ECN-capable flow will be to drop packets instead of marking them, then there is no way that an adversary that subverts ECN-based end-to-end congestion control can cause a flow to be characterized as being non-cooperative and subjected to more severe action by the "penalty box".

The monitoring and policing devices that are actually deployed could fall short of the "ideal" monitoring device described above, in that the monitoring is applied not to a single flow or to a single IPsec tunnel, but to an aggregate of flows. In this case, the switch from marking to dropping would apply to all of the flows in that aggregate, denying the benefits of ECN to the other flows in the aggregate also. At the highest level of aggregation, another form of the disabling of ECN happens even in the absence of monitoring and policing devices, when ECN-Capable RED queues switch from marking to dropping packets as an indication of congestion when the average queue size has exceeded some threshold.

If there were serious operational problems with routers inappropriately erasing the CE bit in packet headers, one potential fix would be to include a one-bit ECN nonce in packet headers, and for routers to erase the nonce when they set the CE bit [SCWA99]. Routers that erased the CE bit would be unable to consistently reconstruct the original nonce, and thus repeated erasure of the CE bit would be detected by the end nodes. (This could in fact be done without adding any extra bits for ECN in the IP header, by using the ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1) as the two values for the nonce, and by defining the codepoint (ECT=0, CE=1) to mean exactly the same as the codepoint (ECT=1, CE=0).)
However, at this point the potential danger of misbehaving routers does not seem of sufficient concern to warrant this additional complication of adding an ECN nonce to protect against the erasure of the CE bit.

An ECN nonce would also address the problem of misbehaving transport receivers lying to the transport sender about whether or not the CE bit was set in a packet. However, another possibility is for the data sender to test for a misbehaving receiver directly, by occasionally sending a data packet with ECT and CE set, to see if the receiver reports receiving the CE bit. Of course, if these packets encountered congestion in the network, the TCP sender would not receive this indication of congestion, so setting the ECT and CE bits at the sender would have to be done very sparingly. In addition, the TCP sender would have to remember which packets were sent with the ECT and CE bits set, so that it does not react to them as if there was congestion in the network. We believe that further research is needed on possible transport-based mechanisms for verifying that the transport receiver does not lie to the transport sender about the receipt of congestion indications.

11. Evaluations of ECN

This section discusses some of the related work evaluating the use of ECN. The ECN Web Page [ECN] has pointers to other papers, as well as to implementations of ECN.

[Floyd94] considers the advantages and drawbacks of adding ECN to the TCP/IP architecture. As shown in the simulation-based comparisons, one advantage of ECN is to avoid unnecessary packet drops for short or delay-sensitive TCP connections. A second advantage of ECN is in avoiding some unnecessary retransmit timeouts in TCP. This paper discusses in detail the integration of ECN into TCP's congestion control mechanisms.
The possible disadvantages of ECN discussed in the paper are that a non-compliant TCP connection could falsely advertise itself as ECN-capable, and that a TCP ACK packet carrying an ECN-Echo message could itself be dropped in the network. The first of these two issues is discussed in the Appendix of this document, and the second is addressed by the addition of the CWR flag in the TCP header.

Experimental evaluations of ECN include [RFC2884, K98]. The conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately better throughput than non-ECN TCP; that ECN TCP flows are fair towards non-ECN TCP flows; and that ECN TCP is robust with two-way traffic (with congestion in both directions) and with multiple congested gateways. Experiments with many short web transfers show that, while most of the short connections have similar transfer times with or without ECN, a small percentage of the short connections have very long transfer times for the non-ECN experiments as compared to the ECN experiments.

12. Summary of changes required in IP and TCP

This document specified two bits in the IP header, the ECN-Capable Transport (ECT) bit and the Congestion Experienced (CE) bit, to be used for ECN. The ECT bit set to "0" indicates that the transport protocol will ignore the CE bit. This is the default value for the ECT bit. The ECT bit set to "1" indicates that the transport protocol is willing and able to participate in ECN.

The default value for the CE bit is "0". The router sets the CE bit to "1" to indicate congestion to the end nodes. The CE bit in a packet header MUST NOT be reset by a router from "1" to "0".

When viewed in terms of code points, this document has defined three code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1).
The code point (ECT=0, CE=1) is not defined in this document. One possibility would be for this code point to be used, some time in the future, for some other function for non-ECN-capable packets. A second possibility would be for this code point to be used as an ECN nonce, as described earlier in this document. A third possibility would be for the code point (ECT=0, CE=1) to be used to indicate that the packet is ECN-capable for an alternate semantics for the Congestion Experienced indication. However, at this time the code point (ECT=0, CE=1) remains undefined.

TCP requires three changes for ECN: a setup phase and two new flags in the TCP header. The ECN-Echo flag is used by the data receiver to inform the data sender of a received CE packet. The Congestion Window Reduced (CWR) flag is used by the data sender to inform the data receiver that the congestion window has been reduced.

When ECN (Explicit Congestion Notification [RFC2481]) is used, it is required that congestion indications generated within an IP tunnel not be lost at the tunnel egress. We specified a minor modification to the IP protocol's handling of the ECN field during encapsulation and decapsulation to allow flows that will undergo IP tunneling to use ECN.

Two options for ECN in tunnels were specified:

   1) A limited-functionality option that does not use ECN inside the IP tunnel, by turning the ECT bit in the outer header off, and not altering the inner header at the time of decapsulation.
   2) The full-functionality option, which copies the ECT bit of the inner header to the encapsulating header. At decapsulation, if the ECT bit is set in the inner header, the CE bit on the outer header is ORed with the CE bit of the inner header to update the CE bit of the packet.
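A toy simulation can contrast the two options with the older simple tunnels described earlier, for one packet that crosses a congested ECN-capable router inside the tunnel. In this hedged Python sketch the router sets CE when the outer header has ECT set and otherwise drops the packet; `congested_router`, `traverse_tunnel`, and the mode strings are hypothetical, and real routers mark or drop probabilistically rather than always.

```python
from typing import Optional, Tuple

# Hypothetical names; a toy model of one packet crossing a congested tunnel.
Ecn = Tuple[int, int]  # (ECT, CE)


def congested_router(outer: Ecn) -> Optional[Ecn]:
    """An ECN-capable router under congestion: set CE if ECT is set, else drop."""
    ect, _ = outer
    return (ect, 1) if ect == 1 else None


def traverse_tunnel(mode: str, inner: Ecn) -> Optional[Ecn]:
    """Return the inner (ECT, CE) delivered at egress, or None if the packet was dropped.

    mode is "simple" (DS octet copied, outer header discarded), "limited", or "full".
    """
    ect, ce = inner
    if mode == "simple":
        outer = inner             # DS octet copied as-is at ingress
    elif mode == "limited":
        outer = (0, 0)            # ECT off in the outer header
    elif mode == "full":
        outer = (ect, 0)          # ECT copied, CE cleared
    else:
        raise ValueError(f"unknown mode: {mode}")
    marked = congested_router(outer)
    if marked is None:
        return None               # congestion signaled by packet loss
    if mode == "full" and ect == 1:
        return (ect, ce | marked[1])  # outer CE ORed into the inner header
    return inner                  # simple and limited egress leave the inner header alone


for mode in ("simple", "limited", "full"):
    print(mode, traverse_tunnel(mode, (1, 0)))
```

For an ECN-capable packet (1, 0), the simple tunnel delivers (1, 0), silently losing the router's mark; the limited-functionality option delivers nothing, because the router had to drop the packet; and the full-functionality option delivers (1, 1).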
All IP tunnels MUST implement one of the two alternative approaches described above. For IPsec tunnels, this document also defines an optional IPsec SA attribute that enables negotiation of ECN usage within IPsec tunnels, and an optional field in the Security Association Database to indicate whether ECN is permitted in tunnel mode on an SA.

This document is intended to obsolete RFC 2481, "A Proposal to add Explicit Congestion Notification (ECN) to IP", which defined ECN as an Experimental Protocol for the Internet Community, as well as three subsequent internet-drafts on ECN: "IPsec Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP with ECN: The Treatment of Retransmitted Data Packets". This document is largely intended to merge these earlier documents into a single document, for greater clarity, in preparation for becoming a Proposed Standard. The rest of this section describes the relationship between this document and its predecessors.

RFC 2481 included a brief discussion of the use of ECN with encapsulated packets, and noted that under the IPsec specifications at the time (January 1999), flows could not safely use ECN if they were to traverse IPsec tunnels. RFC 2481 also described the changes that could be made to IPsec tunnel specifications to make them compatible with ECN. "IPsec Interactions with ECN" outlined these changes to IPsec tunnels in detail, and included an extensive discussion of the security implications of ECN (now included as Sections 18 and 19 of this document). The draft of "ECN Interactions with IP Tunnels" extended the discussion of IPsec tunnels to include all IP tunnels.
Because older IP tunnels are not compatible with a flow's use of ECN, the deployment of ECN in the Internet will create strong pressure for older IP tunnels to be updated to an ECN-compatible version, using either the limited-functionality or the full-functionality option.

This document does not address the issue of including ECN in non-IP tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary document about adding ECN support to MPLS has since expired.

This document expands on one area not addressed in RFC 2481: the use of ECN with retransmitted data packets. That is, this document includes the material from "TCP with ECN: The Treatment of Retransmitted Data Packets" specifying that the ECT bit should not be set on retransmitted data packets. The motivation for this additional specification is to eliminate a possible avenue for denial-of-service attacks on an existing TCP connection. Some prior deployments of ECN-capable TCP might not conform to the (new) requirement not to set the ECT bit on retransmitted packets; we do not believe this will cause significant problems in practice.

This document also expands on the specification of the use of SYN packets for the negotiation of ECN, and specifies some optional behavior for this. In particular, the document allows a TCP host to send a non-ECN-setup SYN packet after sending a failed ECN-setup SYN packet, and precisely specifies the required behavior when both ECN-setup SYN packets and non-ECN-setup SYN packets are sent in the same connection. While some prior deployments of ECN-capable TCP might not conform to the requirements specified in this document, we do not believe that this will lead to any performance or compatibility problems for TCP connections with a combination of TCP implementations at the endpoints.

13. Conclusions

Given the current effort to implement AQM, we believe this is the right time to deploy congestion avoidance mechanisms that do not depend on packet drops alone. With the increased deployment of applications and transports sensitive to the delay and loss of a single packet (e.g., realtime traffic, short web transfers), depending on packet loss as a normal congestion notification mechanism appears to be insufficient (or at the very least, non-optimal).

We examined the consequences of modifications to the ECN field within the network, analyzing all the opportunities for an adversary to change the ECN field. In many cases, the change to the ECN field is no worse than dropping a packet. However, we noted that some changes have the more serious consequence of subverting end-to-end congestion control. Even then, we point out, the potential damage is limited, and is similar to the threat posed by end-systems intentionally failing to cooperate with end-to-end congestion control.

14. Acknowledgements

Many people have made contributions to this work and this document, including many that we have not managed to directly acknowledge in this document. In addition, we would like to thank Kenjiro Cho for the proposal for the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the proposal of the CWR bit, Steve Blake for material on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for discussions of security issues. We also thank the Internet End-to-End Research Group for ongoing discussions of these issues.
Email discussions with a number of people, including Alexey Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed the issues raised by non-conformant equipment in the Internet that does not respond to TCP SYN packets with the ECE and CWR flags set. We thank Mark Handley, Jitendra Padhye, and others for contributions to the TCP initialization procedures.

The discussion of ECN and IP tunnel considerations draws heavily on related discussions and documents from the Differentiated Services Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen for proposing modifications to RFC 2407 that improve the usability of negotiating the ECN Tunnel SA attribute.

15. References

[AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, November 1998.

[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html".

[ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", RFC 2406, November 1998.

[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, p. 397-413.

[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-ecn.

[FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, August 1999.

[FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", SIGCOMM '97, September 1997.

[GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing Encapsulation (GRE), RFC 1701, October 1994.

[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. ACM SIGCOMM '88, pp. 314-329.

[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance Algorithm", Message to end2end-interest mailing list, April 1990. URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) benefits for TCP", Master's thesis, UCLA, 1998, URL "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

[L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999.

[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

[MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, and J. McManus, Requirements for Traffic Engineering Over MPLS, RFC 2702, September 1999.

[PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, July 1999.

[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the Internet Checksum", RFC 1141, January 1990.

[RFC1349] Almquist, P., "Type of Service in the Internet Protocol Suite", RFC 1349, July 1992.

[RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC 1455, May 1993.

[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic Routing Encapsulation (GRE), RFC 1701, October 1994.

[RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic Routing Encapsulation over IPv4 networks, RFC 1702, October 1994.

[RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October 1996.

[RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997.

[RFC2309] Braden, B., et al., "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC2401] S. Kent and R. Atkinson, Security Architecture for the Internet Protocol, RFC 2401, November 1998.

[RFC2407] D. Piper, The Internet IP Security Domain of Interpretation for ISAKMP, RFC 2407, November 1998.

[RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, Internet Security Association and Key Management Protocol (ISAKMP), RFC 2408, November 1998.

[RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), RFC 2409, November 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, An Architecture for Differentiated Services, RFC 2475, December 1998.

[RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit Congestion Notification (ECN) to IP, RFC 2481, January 1999.

[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[RFC2780] S. Bradner and V. Paxson, IANA Allocation Guidelines For Values In the Internet Protocol and Related Headers, RFC 2780, March 2000.

[RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, July 2000.

[RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983, October 2000.

[RFD99] Ramakrishnan, K., Floyd, S., and Davie, B., A Proposal to Incorporate ECN in MPLS, work in progress, June 1999.
URL "http://www.aciri.org/floyd/papers/draft-ietf-mpls-ecn-00.txt".

[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks", ACM Transactions on Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

[SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM Computer Communications Review, October 1999.

16. Security Considerations

Security considerations have been discussed in Sections 7 and 8.

17. IPv4 Header Checksum Recalculation

IPv4 header checksum recalculation is an issue with some high-end router architectures using an output-buffered switch, since most if not all of the header manipulation is performed on the input side of the switch, while the ECN decision would need to be made local to the output buffer. This is not an issue for IPv6, since there is no IPv6 header checksum. The IPv4 TOS octet is the last byte of a 16-bit half-word; since the CE bit is the low-order bit of that half-word, setting it from "0" to "1" adds exactly one to the 16-bit sum over the header, so the checksum can be updated incrementally rather than recomputed.

RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 checksum after the TTL field is decremented. The incremental updating of the IPv4 checksum after the CE bit was set would work as follows. Let HC be the original header checksum, and let HC' be the new header checksum after the CE bit has been set. Then for header checksums calculated with one's complement subtraction, HC' would be recalculated as follows:

      HC' = { HC - 1     HC > 1
            { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC' would be recalculated as follows after the CE bit was set:

      HC' = { HC - 1     HC > 0
            { 0xFFFE     HC = 0

18. Possible Changes to the ECN Field in the Network

This section discusses in detail possible changes to the ECN field in the network, such as falsely reporting congestion, disabling ECN-Capability for an individual packet, erasing the ECN congestion indication, or falsely indicating ECN-Capability. We represent the ECN bits in the IP header by the tuple (ECT bit, CE bit).

18.1. Possible Changes to the IP Header

18.1.1. Erasing the Congestion Indication

First, we consider the changes that a router could make that would result in effectively erasing the congestion indication after it had been set by a router upstream. The convention followed is:

   (ECT, CE) of received packet -> (ECT, CE) of packet transmitted.

   (1, 1) -> (1, 0): erase only the CE bit that was set.
   (1, 1) -> (0, 0): erase both the ECT bit and the CE bit.
   (1, 1) -> (0, 1): erase the ECT bit.

The first change turns off the CE bit after it has been set by some upstream router along the path. The consequence for the upstream router is that congestion can potentially build for a time, because the congestion indication does not reach the source. However, the packet would be received and acknowledged.

The potential effect of erasing the congestion indication is complex, and is discussed in depth in Section 19 below. Note that the effect of erasing the congestion indication is different from dropping a packet in the network. When a data packet is dropped, the drop is detected by the TCP sender and interpreted as an indication of congestion. Similarly, if a sufficient number of consecutive acknowledgement packets are dropped, causing the cumulative acknowledgement field not to be advanced at the sender, the sender is limited by the congestion window from sending additional packets, and ultimately the retransmit timer expires.

In contrast, a systematic erasure of the CE bit by a downstream router can cause a queue buildup at an upstream router, including the possible loss of packets due to buffer overflow. There is a potential for unfairness, in that another flow that goes through the congested router could react to the CE bit being set, while the flow that has the CE bit erased could see better performance. The limitations on this potential unfairness are discussed in more detail in Section 19 below.

The second change is to turn off both the ECT and the CE bits, thus erasing the congestion indication and disabling ECN-Capability at the same time. The third change turns off only the ECT bit, disabling ECN-Capability.

Within an IP tunnel using the full-functionality option, the third change would not erase the congestion indication, but would only disable ECN-Capability for that packet within the rest of the tunnel. However, when performed outside of an IP tunnel, the third change would also effectively erase the congestion indication, because an ECN field of (0, 1) is undefined.

The `erasure' of the congestion indication is only effective if the packet does not end up being marked or dropped again by a downstream router. With the first change, the packet remains ECN-Capable, and could be either marked or dropped by a downstream router as an indication of congestion. With the second and third changes, the packet is no longer ECN-capable, and can therefore be dropped but not marked by a downstream router as an indication of congestion.

18.1.2. Falsely Reporting Congestion

   (1, 0) -> (1, 1)

This change is to set the CE bit when the ECT bit was already set, even though there was no congestion. This change does not affect the treatment of that packet along the rest of the path.
In particular, a router does not examine the CE bit in deciding whether to drop or mark an arriving packet.

However, this could result in the application unnecessarily invoking end-to-end congestion control and reducing its arrival rate. By itself, this is no worse (for the application or for the network) than if the tampering router had actually dropped the packet.

18.1.3. Disabling ECN-Capability

   (1, 0) -> (0, *)

This change is to turn off the ECT bit of a packet that does not have the CE bit set. (Section 18.1.1 discussed the case of turning off the ECT bit of a packet that does have the CE bit set.) This means that if the packet later encounters congestion (e.g., by arriving at a RED queue with a moderate average queue size), it will be dropped instead of being marked. By itself, this is no worse (for the application) than if the tampering router had actually dropped the packet. The saving grace in this particular case is that there is no congested router upstream expecting a reaction from setting the CE bit.

18.1.4. Falsely Indicating ECN-Capability

This change would incorrectly label a packet as ECN-Capable. The packet may have been sent either by an ECN-Capable transport or by a transport that is not ECN-Capable.

   (0, *) -> (1, 0);
   (0, *) -> (1, 1);

If the packet later encounters moderate congestion at an ECN-Capable router, the router could set the CE bit instead of dropping the packet. If the transport protocol in fact is not ECN-Capable, then the transport will never receive this indication of congestion, and will not reduce its sending rate in response. The potential consequences of falsely indicating ECN-Capability are discussed further in Section 19 below.

If the packet never later encounters congestion at an ECN-Capable router, then the first of these two changes would have no effect.
The second change, however, would have the effect of giving false reports of congestion to a monitoring device along the path. If the transport protocol is ECN-Capable, then the second of these two changes (when, for example, (0, 0) was changed to (1, 1)) could also have an effect at the transport level, by combining falsely indicating ECN-Capability with falsely reporting congestion. For an ECN-capable transport, this would cause the transport to unnecessarily react to congestion. In this particular case, the router that is incorrectly changing the ECN field could have dropped the packet.

Thus, for this case of an ECN-capable transport, the consequence of this change to the ECN field is no worse than dropping the packet.

18.1.5. Changes with No Functional Effect

   (0, *) -> (0, *)

The CE bit is ignored in a packet that does not have the ECT bit set. Thus, this change would have no effect, in terms of ECN.

18.2. Information Carried in the Transport Header

For TCP, an ECN-capable TCP receiver informs its TCP peer that it is ECN-capable at the TCP level, using information in the TCP header at the time the connection is set up. This document does not consider potential dangers introduced by changes in the transport header within the network. In the case of IPsec tunnels, the IPsec tunnel protects the transport header.

18.3. Split Paths

In some cases, a malicious or broken router might have access to only a subset of the packets from a flow. The question is as follows: can this router, by altering the ECN field in this subset of the packets, do more damage to that flow than if it had simply dropped that set of packets?

We will classify the packets in the flow as A packets and B packets, and assume that the adversary only has access to A packets.
Assume that the adversary is subverting end-to-end congestion control along the path traveled by A packets only, by either falsely indicating ECN-Capability upstream of the point where congestion occurs, or erasing the congestion indication downstream. Consider also that there exists a monitoring device that sees both the A and B packets, and will "punish" both the A and B packets if the total flow is determined not to be properly responding to indications of congestion. Another key characteristic that we believe is likely to be true is that the monitoring device, before `punishing' the A&B flow, will first drop packets instead of setting the CE bit, and will drop arriving packets of that flow that already have the ECT and CE bits set. If the end nodes are in fact using end-to-end congestion control, they will see all of the indications of congestion seen by the monitoring device, and will begin to respond to these indications of congestion. Thus, the monitoring device is successful in providing the indications to the flow at an early stage.

It is true that the adversary that has access only to the A packets might, by subverting ECN-based congestion control, be able to deny the benefits of ECN to the other packets in the A&B aggregate. While this is unfortunate, this is not a reason to disable ECN within an IPsec tunnel.

A variant of falsely reporting congestion occurs when there are two adversaries along a path, where the first adversary falsely reports congestion, and the second adversary `erases' those reports. (Unlike packet drops, ECN congestion reports can be `reversed' later in the network by a malicious or broken router.) While this would be transparent to the end node, it is possible that a monitoring device between the first and second adversaries would see the false indications of congestion.
Keep in mind our recommendation in this document that, before `punishing' a flow for not responding appropriately to congestion, the router will first switch to dropping rather than marking as an indication of congestion for that flow. When this includes dropping arriving packets from that flow that have the CE bit set, this ensures that these indications of congestion are being seen by the end nodes. Thus, there is no additional harm that we are able to postulate as a result of multiple conflicting adversaries.

19. Implications of Subverting End-to-End Congestion Control

This section focuses on the potential repercussions of subverting end-to-end congestion control by either falsely indicating ECN-Capability or erasing the congestion indication in ECN (the CE bit). Subverting end-to-end congestion control by either of these two methods can have consequences both for the application and for the network. We discuss these separately below.

The first method to subvert end-to-end congestion control, that of falsely indicating ECN-Capability, effectively subverts end-to-end congestion control only if the packet later encounters congestion that results in the setting of the CE bit. In this case, the transport protocol (which may not be ECN-capable) does not receive the indication of congestion from these downstream congested routers.

The second method to subvert end-to-end congestion control, `erasing' the (set) CE bit in a packet, effectively subverts end-to-end congestion control only when the CE bit in the packet was set earlier by a congested router. In this case, the transport protocol does not receive the indication of congestion from the upstream congested routers.
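The transitions catalogued in Section 18.1 can be summarized as a small classifier that also flags the two methods above as the only changes that can subvert end-to-end congestion control, rather than merely acting like a packet drop. This is a hypothetical Python sketch, not part of this specification: the `Effect` names and both functions are illustrative; the (1, 1) -> (0, 1) case is classified as erasing the congestion indication, which per Section 18.1.1 holds outside a full-functionality IP tunnel; and a flagged change only matters in practice if the packet meets, or has already met, congestion.

```python
from enum import Enum


class Effect(Enum):
    NO_EFFECT = "no functional effect"                      # Section 18.1.5
    ERASES_CONGESTION = "erases the congestion indication"  # Section 18.1.1
    FALSE_CONGESTION = "falsely reports congestion"         # Section 18.1.2
    DISABLES_ECN = "disables ECN-Capability"                # Section 18.1.3
    FALSE_ECN_CAPABLE = "falsely indicates ECN-Capability"  # Section 18.1.4


# The two methods from Section 19 that can subvert congestion control.
SUBVERTING = {Effect.ERASES_CONGESTION, Effect.FALSE_ECN_CAPABLE}


def classify_change(before: tuple[int, int], after: tuple[int, int]) -> set[Effect]:
    """Classify a router's (ECT, CE) -> (ECT, CE) rewrite of the ECN field."""
    (ect0, ce0), (ect1, ce1) = before, after
    effects: set[Effect] = set()
    if ect0 == 1:
        if ce0 == 1 and after != (1, 1):
            # (1, 1) -> (1, 0), (0, 0) or (0, 1); the last only outside a tunnel.
            effects.add(Effect.ERASES_CONGESTION)
        if ce0 == 0 and after == (1, 1):
            effects.add(Effect.FALSE_CONGESTION)
        if ect1 == 0:
            effects.add(Effect.DISABLES_ECN)
    elif ect1 == 1:
        effects.add(Effect.FALSE_ECN_CAPABLE)
        if ce1 == 1:
            # (0, *) -> (1, 1) also carries a false congestion report.
            effects.add(Effect.FALSE_CONGESTION)
    # (0, *) -> (0, *) and unchanged ECT packets end up with no effects.
    return effects or {Effect.NO_EFFECT}


def can_subvert(before: tuple[int, int], after: tuple[int, int]) -> bool:
    """True if the change can do more damage than simply dropping the packet."""
    return bool(classify_change(before, after) & SUBVERTING)


if __name__ == "__main__":
    for change in [((1, 1), (1, 0)), ((1, 0), (1, 1)), ((0, 0), (1, 0))]:
        names = sorted(e.value for e in classify_change(*change))
        print(change, names, "subverts" if can_subvert(*change) else "no worse than a drop")
```

In the demonstration loop, the first and third changes are flagged as subverting, while the false congestion report in the middle is no worse than a drop, matching Section 18.1.2.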

Either of these two methods of subverting end-to-end congestion control can potentially cause more damage to the network (and possibly to the flow itself) than if the adversary had simply dropped packets from that flow. However, as we discuss later in this section and in Section 7, this potential damage is limited.

19.1. Implications for the Network and for Competing Flows

The CE bit of the ECN field is only used by routers as an indication of congestion during periods of *moderate* congestion. ECN-capable routers should drop rather than mark packets during heavy congestion, even if the router's queue is not yet full. For example, routers using active queue management based on RED should drop rather than mark packets that arrive while the average queue size exceeds the RED queue's maximum threshold.

One consequence for the network of subverting end-to-end congestion control is that flows that do not receive the congestion indications from the network might increase their sending rate until they drive the network into heavier congestion. Then, the congested router could begin to drop rather than mark arriving packets. For flows that are not isolated by some form of per-flow scheduling or other per-flow mechanisms, but are instead aggregated with other flows in a single queue in an undifferentiated fashion, this packet-dropping at the congested router would apply to all flows that share that queue. Thus, the consequence would be an increased level of congestion in the network.

In some cases, the increase in the level of congestion will lead to a substantial buffer buildup at the congested queue that will be sufficient to drive the congested queue from the packet-marking to the packet-dropping regime.
This transition could occur either because of buffer overflow, or because of the active queue management policy described above that drops packets when the average queue is above RED's maximum threshold. At this point, all flows, including the subverted flow, will begin to see packet drops instead of packet marks, and a malicious or broken router will no longer be able to `erase' these indications of congestion in the network. If the end nodes are deploying appropriate end-to-end congestion control, then the subverted flow will reduce its arrival rate in response to congestion. When the level of congestion is sufficiently reduced, the congested queue can return from the packet-dropping regime to the packet-marking regime. The steady-state pattern could be one of the congested queue oscillating between these two regimes.

In other cases, the consequences of subverting end-to-end congestion control will not be severe enough to drive the congested link into sufficiently heavy congestion that packets are dropped instead of being marked. In this case, the implications for competing flows in the network will be a slightly increased rate of packet marking or dropping, and a corresponding decrease in the bandwidth available to those flows. This can be a stable state if the arrival rate of the subverted flow is sufficiently small, relative to the link bandwidth, that the average queue size at the congested router remains under control. In particular, the subverted flow could have a limited bandwidth demand on the link at this router, while still getting more than its "fair" share of the link. This limited demand could be due to a limited demand from the data source; a limitation from the TCP advertised window; a lower-bandwidth access pipe; or other factors.
Thus the subversion of ECN-based congestion control can still lead to unfairness, which we believe is appropriate to note here.

The threat to the network posed by the subversion of ECN-based congestion control in the network is essentially the same as the threat posed by an end-system that intentionally fails to cooperate with end-to-end congestion control. The deployment of mechanisms in routers to address this threat is an open research question, and is discussed further in Section 10.

Let us take the example described in Section 18.1.1, where the CE bit that was set in a packet is erased: {(1, 1) -> (1, 0)}. The consequence for the congested upstream router that set the CE bit is that this congestion indication does not reach the end nodes for that flow. The source (even one which is completely cooperative and not malicious) is thus allowed to continue to increase its sending rate (if it is a TCP flow, by increasing its congestion window). The flow potentially achieves better throughput than the other flows that also share the congested router, especially if there are no policing mechanisms or per-flow queueing mechanisms at that router. Consider the behavior of the other flows, especially if they are cooperative: that is, the flows that do not experience subverted end-to-end congestion control. They are likely to reduce their load (e.g., by reducing their window size) on the congested router, thus benefiting our subverted flow. This results in unfairness.
As we discussed above, this unfairness could either be transient (because the congested queue is driven into the packet-dropping regime), oscillatory (because the congested queue oscillates between the packet-marking and the packet-dropping regimes), or more moderate but a persistent stable state (because the congested queue is never driven to the packet-dropping regime).

The results would be similar if the subverted flow was intentionally avoiding end-to-end congestion control. One difference is that a flow that is intentionally avoiding end-to-end congestion control at the end nodes can avoid end-to-end congestion control even when the congested queue is in packet-dropping mode, by refusing to reduce its sending rate in response to packet drops in the network. Thus the problems for the network from the subversion of ECN-based congestion control are less severe than the problems caused by the intentional avoidance of end-to-end congestion control in the end nodes. It is also the case that it is considerably more difficult to control the behavior of the end nodes than it is to control the behavior of the infrastructure itself. This is not to say that the problems for the network posed by the network's subversion of ECN-based congestion control are small; just that they are dwarfed by the problems for the network posed by the subversion of either ECN-based or other currently known packet-based congestion control mechanisms by the end nodes.

19.2. Implications for the Subverted Flow

When a source indicates that it is ECN-capable, there is an expectation that the routers in the network that are capable of participating in ECN will use the CE bit for indication of congestion.
Using ECN offers the potential benefit of reduced packet loss (in addition to the reduced queueing delays that come from active queue management policies). When the packet flows through a tunnel where the nodes that the tunneled packets traverse are untrusted in some way, the expectation is that IPsec will protect the flow from subversion that results in undesirable consequences.

In many cases, a subverted flow will benefit from the subversion of end-to-end congestion control for that flow in the network, by receiving more bandwidth than it otherwise would, relative to competing non-subverted flows. If the congested queue reaches the packet-dropping stage, then the subversion of end-to-end congestion control might or might not be of overall benefit to the subverted flow, depending on that flow's relative tradeoffs between throughput, loss, and delay.

One form of subverting end-to-end congestion control is to falsely indicate ECN-capability by setting the ECT bit. As a consequence, downstream congested routers set the CE bit in vain. However, as we describe in the section below, if the ECT bit is changed in the IPsec tunnel, this can be detected at the egress point of the tunnel.

The second form of subverting end-to-end congestion control is to erase the congestion indication, either by erasing the CE bit directly, or by erasing the ECT bit when the CE bit is already set. In this case, it is the upstream congested routers that set the CE bit in vain.

If the ECT bit is erased within an IP tunnel, then this can be detected at the egress point of the tunnel. If the CE bit is set upstream of the IP tunnel, then any erasure of the outer header's CE bit within the tunnel will have no effect, because the inner header preserves the set value of the CE bit.
However, if the CE bit is set within the tunnel and erased either within or downstream of the tunnel, this is not necessarily detected at the egress point of the tunnel.

With this subversion of end-to-end congestion control, an end-system transport does not respond to the congestion indication. Along with the increased unfairness for the non-subverted flows described in the previous section, the congested router's queue could continue to build, resulting in packet loss at the congested router, which indicates congestion to the transport in any case. In the interim, the flow might experience higher queueing delays, possibly along with increased bandwidth relative to other non-subverted flows. However, transports do not inherently assume that they will consistently experience carefully managed queueing along the path. We believe that these forms of subverting end-to-end congestion control are no worse for the subverted flow than if the adversary had simply dropped the packets of that flow itself.

19.3. Non-ECN-Based Methods of Subverting End-to-End Congestion Control

We have shown that, in many cases, a malicious or broken router that is able to change the bits in the ECN field can do no more damage than if it had simply dropped the packet in question. However, this is not true in all cases, in particular in the cases where the broken router subverts end-to-end congestion control either by falsely indicating ECN-Capability or by erasing the ECN congestion indication (the CE bit). While there are many ways that a router can harm a flow by dropping packets, a router cannot subvert end-to-end congestion control by dropping packets. As an example, a router cannot subvert TCP congestion control by dropping data packets, acknowledgement packets, or control packets.
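To make the distinction above concrete, the sketch below enumerates every rewrite of the ECN field that a router on the path could make, writing the field as an (ECT, CE) pair as in the example {(1, 1) -> (1, 0)}, and flags the two kinds of rewrite that the text identifies as subverting end-to-end congestion control. It is a minimal sketch under one stated assumption: a congestion indication is present only when both ECT and CE are set, so that erasing either bit removes it. The function names and category strings are hypothetical labels, not terms defined by this document, and only bit values are modeled, not router or tunnel behavior.

```python
from itertools import product
from typing import Tuple

# The ECN field as an (ECT, CE) pair of 0/1 values, using the same
# notation as the example {(1, 1) -> (1, 0)}.
EcnField = Tuple[int, int]


def congestion_indicated(field: EcnField) -> bool:
    # Assumption: the indication needs both ECT and CE set, so clearing
    # either bit erases it (matching the two erasure forms in the text).
    ect, ce = field
    return ect == 1 and ce == 1


def classify_rewrite(before: EcnField, after: EcnField) -> str:
    """Hypothetical label for a router's change to the ECN field."""
    if before == after:
        return "unchanged"
    if congestion_indicated(before) and not congestion_indicated(after):
        # Clearing CE directly, or clearing ECT while CE is set.
        return "erases congestion indication"
    if before[0] == 0 and after[0] == 1:
        # Setting ECT for a sender that never claimed ECN-Capability.
        return "falsely indicates ECN-Capability"
    return "other change"


def subverts_congestion_control(before: EcnField, after: EcnField) -> bool:
    return classify_rewrite(before, after) in (
        "erases congestion indication",
        "falsely indicates ECN-Capability",
    )


if __name__ == "__main__":
    fields = list(product((0, 1), repeat=2))
    for before, after in product(fields, fields):
        if subverts_congestion_control(before, after):
            print(f"{before} -> {after}: {classify_rewrite(before, after)}")
```

Run as a script, it lists seven of the sixteen possible (before, after) pairs. Rewrites such as (1, 0) -> (1, 1) (falsely reporting congestion) or (1, 0) -> (0, 0) (clearing ECT) land in "other change", matching the text's argument that in those cases the router does no more damage than dropping the packet.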
Even though packet-dropping cannot be used to subvert end-to-end congestion control, there *are* non-ECN-based methods for subverting end-to-end congestion control that a broken or malicious router could use. For example, a broken router could duplicate data packets, thus effectively negating the effects of end-to-end congestion control along some portion of the path. (For a router that duplicated packets within an IPsec tunnel, the security administrator can cause the duplicate packets to be discarded by configuring anti-replay protection for the tunnel.) This duplication of packets within the network would have implications for the network and for the subverted flow similar to those described in Sections 18.1.1 and 18.1.4 above.

20. The motivation for the ECT bit

The need for the ECT bit is motivated by the fact that ECN will be deployed incrementally in an Internet where some transport protocols and routers understand ECN and some do not. With the ECT bit, the router can drop packets from flows that are not ECN-capable, but can *instead* set the CE bit in packets that *are* ECN-capable. Because the ECT bit allows an end node to have the CE bit set in a packet *instead* of having the packet dropped, an end node might have some incentive to deploy ECN.

If there were no ECT indication, then the router would have to set the CE bit for packets from both ECN-capable and non-ECN-capable flows. In this case, there would be no incentive for end nodes to deploy ECN, and no viable path of incremental deployment from a non-ECN world to an ECN-capable world. Consider the first stages of such an incremental deployment, where a subset of the flows are ECN-capable. At the onset of congestion, when the packet dropping/marking rate would be low, routers would only set CE bits, rather than dropping packets.
However, only those flows that are ECN-capable would understand and respond to CE packets. The result is that the ECN-capable flows would back off, and the non-ECN-capable flows would be unaware of the ECN signals and would continue to open their congestion windows.

In this case, there are two possible outcomes: (1) the ECN-capable flows back off, the non-ECN-capable flows get all of the bandwidth, and congestion remains mild; or (2) the ECN-capable flows back off, the non-ECN-capable flows don't, and congestion increases until the router transitions from setting the CE bit to dropping packets. While this second outcome evens out the fairness, the ECN-capable flows would still receive little benefit from being ECN-capable, because the increased congestion would drive the router to packet-dropping behavior.

A flow that advertises itself as ECN-Capable but does not respond to CE bits is functionally equivalent to a flow that turns off congestion control, as discussed earlier in this document.

Thus, in a world where a subset of the flows are ECN-capable, but where ECN-capable flows have no mechanism for indicating that fact to the routers, there would be less effective and less fair congestion control in the Internet, resulting in a strong incentive for end nodes not to deploy ECN.

21. Why use two bits in the IP header?

Given the need for an ECT indication in the IP header, there remains the question of whether the ECT (ECN-Capable Transport) and CE (Congestion Experienced) indications should have been overloaded on a single bit. This overloaded-one-bit alternative, explored in [Floyd94], would have involved a single bit with two values.
One value, "ECT and not CE", would represent an ECN-Capable Transport, and the other value, "CE or not ECT", would represent either Congestion Experienced or a non-ECN-Capable transport.

One difference between the one-bit and two-bit implementations concerns packets that traverse multiple congested routers. Consider a CE packet that arrives at a second congested router, and is selected by the active queue management at that router for either marking or dropping. In the one-bit implementation, the second congested router has no choice but to drop the CE packet, because it cannot distinguish between a CE packet and a non-ECT packet. In the two-bit implementation, the second congested router has the choice of either dropping the CE packet or leaving it alone with the CE bit set.

Another difference between the one-bit and two-bit implementations comes from the fact that with the one-bit implementation, receivers in a single flow cannot distinguish between CE and non-ECT packets. Thus, in the one-bit implementation an ECN-capable data sender would have to unambiguously indicate to the receiver or receivers whether each packet had been sent as ECN-Capable or as non-ECN-Capable. One possibility would be for the sender to indicate in the transport header whether the packet was sent as ECN-Capable. A second possibility, which would impose a functional limitation on the one-bit implementation, would be for the sender to unambiguously indicate that it was going to send *all* of its packets as ECN-Capable or as non-ECN-Capable. For a multicast transport protocol, this unambiguous indication would have to be apparent to receivers joining an ongoing multicast session.
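The first difference, and the receiver ambiguity behind the second, can be illustrated with a small model of the options open to a congested router whose active queue management has selected a packet. This is a hypothetical sketch: `TwoBit`, `OneBit`, `to_one_bit`, `two_bit_choices`, and `one_bit_choices` are illustrative names, and the one-bit alternative is modeled only through its two values, "ECT and not CE" and "CE or not ECT". Dropping is kept as an option for ECN-capable packets to reflect a router in its packet-dropping regime.

```python
from enum import Enum
from typing import FrozenSet, NamedTuple


class Action(Enum):
    DROP = "drop the packet"
    SET_CE = "set CE instead of dropping"
    LEAVE_CE = "leave the packet alone with CE set"


class TwoBit(NamedTuple):
    """Two-bit encoding: separate ECT and CE bits."""
    ect: bool
    ce: bool


class OneBit(Enum):
    """The overloaded one-bit alternative and its two values."""
    ECT_AND_NOT_CE = "ECT and not CE"
    CE_OR_NOT_ECT = "CE or not ECT"


def to_one_bit(field: TwoBit) -> OneBit:
    # Every state other than "ECT and not CE" collapses onto one value,
    # so CE packets and non-ECT packets become indistinguishable.
    if field.ect and not field.ce:
        return OneBit.ECT_AND_NOT_CE
    return OneBit.CE_OR_NOT_ECT


def two_bit_choices(field: TwoBit) -> FrozenSet[Action]:
    if not field.ect:
        # Not ECN-capable: dropping is the only congestion signal.
        return frozenset({Action.DROP})
    if field.ce:
        # Already marked upstream: drop it, or leave it with CE set.
        return frozenset({Action.DROP, Action.LEAVE_CE})
    # ECN-capable and unmarked: set CE instead of dropping
    # (or drop, once the router is in its packet-dropping regime).
    return frozenset({Action.SET_CE, Action.DROP})


def one_bit_choices(bit: OneBit) -> FrozenSet[Action]:
    if bit is OneBit.ECT_AND_NOT_CE:
        return frozenset({Action.SET_CE, Action.DROP})
    # Could be a CE packet or a non-ECT packet; marking cannot be
    # expressed, so the router has no choice but to drop.
    return frozenset({Action.DROP})


if __name__ == "__main__":
    ce_packet = TwoBit(ect=True, ce=True)
    non_ect = TwoBit(ect=False, ce=False)
    print(sorted(a.value for a in two_bit_choices(ce_packet)))
    print(sorted(a.value for a in one_bit_choices(to_one_bit(ce_packet))))
    print(to_one_bit(ce_packet) == to_one_bit(non_ect))
```

Modeling the one-bit field as a mapping from the two-bit states makes the loss of information explicit: three of the four two-bit states collapse onto "CE or not ECT", which is exactly why neither the second congested router nor the receiver can tell a CE packet from a non-ECT packet.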
As described earlier (and recommended in this document), transports (particularly TCP) should not mark pure ACK packets or retransmitted packets as being ECN-Capable. A pure ACK packet from a non-ECN-capable transport could be dropped without necessarily having an impact on the transport from a congestion control perspective (because subsequent ACKs are cumulative). An ECN-capable transport reacting to the CE bit set in a pure ACK packet by reducing the window would be at a disadvantage in comparison to a non-ECN-capable transport. For this reason (and for the reasons described earlier in relation to retransmitted packets), it is desirable to have the ECN-Capable indication on a per-packet basis.

Another advantage of the two-bit approach is that it is somewhat more robust. The most critical issue, discussed in Section 8, is that the default indication should be that of a non-ECN-Capable transport. In a two-bit implementation, this requirement for the default value simply means that the ECT bit should be "OFF" by default. In the one-bit implementation, this means that the single overloaded bit should by default be in the "CE or not ECT" position. This is less clear and straightforward, and possibly more open to incorrect implementations either in the end nodes or in the routers.

In summary, while the one-bit implementation is possible, it has the following significant limitations relative to the two-bit implementation. First, the one-bit implementation has more limited functionality for the treatment of CE packets at a second congested router.
Second, the one-bit implementation requires either that extra information be carried in the transport header of packets from ECN-Capable flows (to convey the functionality of the second bit elsewhere, namely in the transport header), or that senders in ECN-Capable flows accept the limitation that receivers must be able to determine a priori which packets are ECN-Capable and which are not. Third, the one-bit implementation is possibly more open to errors from faulty implementations that choose the wrong default value for the ECN bit. We believe that dedicating the extra bit in the IP header to the ECT indication is extremely valuable in overcoming these limitations.

22. Historical definitions for the IPv4 TOS octet

RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP header. In RFC 791, bits 6 and 7 of the ToS octet are listed as "Reserved for Future Use", and are shown set to zero. The first two fields of the ToS octet were defined as the Precedence and Type of Service (TOS) fields.

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |       TOS       |  0  |  0  |   RFC 791
      +-----+-----+-----+-----+-----+-----+-----+-----+

RFC 1122 included bits 6 and 7 in the TOS field, though it did not discuss any specific use for those two bits:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |             TOS             |   RFC 1122
      +-----+-----+-----+-----+-----+-----+-----+-----+

The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |          TOS          | MBZ |   RFC 1349
      +-----+-----+-----+-----+-----+-----+-----+-----+

Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary Cost".
In addition to the Precedence and Type of Service (TOS) fields, the last field, MBZ (for "must be zero"), was defined as currently unused. RFC 1349 stated that "The originator of a datagram sets [the MBZ] field to zero (unless participating in an Internet protocol experiment which makes use of that bit)."

RFC 1455 [RFC1455] defined an experimental standard that used all four bits in the TOS field to request a guaranteed level of link security.

RFC 1349 is obsoleted by "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474], in which bits 6 and 7 of the DS field are listed as Currently Unused (CU). The first six bits of the DS field are defined as the Differentiated Services CodePoint (DSCP):

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |               DSCP                |    CU     |   RFC 2474
      +-----+-----+-----+-----+-----+-----+-----+-----+

Because of this unstable history, the definition of the ECN field in this document cannot be guaranteed to be backwards compatible with all past uses of these two bits. The damage that could be done by a non-ECN-capable router would be to "erase" the CE bit for an ECN-capable packet that arrived at the router with the CE bit set, or to set the CE bit even in the absence of congestion. This has been discussed in the section on "Non-compliance in the Network".

The damage that could be done in an ECN-capable environment by a non-ECN-capable end node transmitting packets with the ECT bit set has been discussed in the section on "Non-compliance by the End Nodes".

AUTHORS' ADDRESSES

K. K. Ramakrishnan
TeraOptic Networks, Inc.
Phone: +1 (408) 666-8650
Email: kk@teraoptic.com

Sally Floyd
ACIRI
Phone: +1 (510) 666-2989
Email: floyd@aciri.org
URL: http://www.aciri.org/floyd/

David L. Black
EMC Corporation
42 South St.
Hopkinton, MA 01748
Phone: +1 (508) 435-1000 x75140
Email: black_david@emc.com

This draft was created in November 2000.
It expires May 2001.