| draft-ietf-tsvwg-ecn-00.txt | draft-ietf-tsvwg-ecn-01.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force K. K. Ramakrishnan | Internet Engineering Task Force K. K. Ramakrishnan | |||
| INTERNET DRAFT TeraOptic Networks | INTERNET DRAFT TeraOptic Networks | |||
| draft-ietf-tsvwg-ecn-00.txt Sally Floyd | draft-ietf-tsvwg-ecn-01.txt Sally Floyd | |||
| ACIRI | ACIRI | |||
| D. Black | D. Black | |||
| EMC | EMC | |||
| November, 2000 | January, 2001 | |||
| Expires: May, 2001 | Expires: July, 2001 | |||
| The Addition of Explicit Congestion Notification (ECN) to IP | The Addition of Explicit Congestion Notification (ECN) to IP | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| Abstract | Abstract | |||
| This document specifies the incorporation of ECN (Explicit Congestion | This document specifies the incorporation of ECN (Explicit Congestion | |||
| Notification) to TCP and IP, including ECN's use of two bits in the | Notification) to TCP and IP, including ECN's use of two bits in the | |||
| IP header's DS field. We begin by describing TCP's use of packet | IP header. We begin by describing TCP's use of packet drops as an | |||
| drops as an indication of congestion. Next we explain that with the | indication of congestion. Next we explain that with the addition of | |||
| addition of active queue management (e.g., RED) to the Internet | active queue management (e.g., RED) to the Internet infrastructure, | |||
| infrastructure, where routers detect congestion before the queue | where routers detect congestion before the queue overflows, routers | |||
| overflows, routers are no longer limited to packet drops as an | are no longer limited to packet drops as an indication of congestion. | |||
| indication of congestion. Routers can instead set the Congestion | Routers can instead set the Congestion Experienced (CE) bit in the IP | |||
| Experienced (CE) bit in the IP header of packets from ECN-capable | header of packets from ECN-capable transports. We describe when the | |||
| transports. We describe when the CE bit is to be set in routers, and | CE bit is to be set in routers, and describe modifications needed to | |||
| describe modifications needed to TCP to make it ECN-capable. | TCP to make it ECN-capable. Modifications to other transport | |||
| Modifications to other transport protocols (e.g., unreliable unicast | protocols (e.g., unreliable unicast or multicast, reliable multicast, | |||
| or multicast, reliable multicast, other reliable unicast transport | other reliable unicast transport protocols) could be considered as | |||
| protocols) could be considered as those protocols are developed and | those protocols are developed and advance through the standards | |||
| advance through the standards process. | process. | |||
| We also describe in this document the issues involving the use of ECN | We also describe in this document the issues involving the use of ECN | |||
| within IP tunnels, and within IPsec tunnels in particular. | within IP tunnels, and within IPsec tunnels in particular. | |||
| One of the guiding principles for this document is that all the | One of the guiding principles for this document is that all the | |||
| mechanisms specified here are incrementally deployable. | mechanisms specified here are incrementally deployable. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction | 1. Introduction | |||
| 2. Conventions and Acronyms | 2. Conventions and Acronyms | |||
| 3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
| 4. Active Queue Management (AQM) | 4. Active Queue Management (AQM) | |||
| 5. Explicit Congestion Notification in IP | 5. Explicit Congestion Notification in IP | |||
| 5.1. ECN as an indication of persistent congestion | 5.1. ECN as an Indication of Persistent Congestion | |||
| 5.2. Dropped or Corrupted Packets | 5.2. Dropped or Corrupted Packets | |||
| 6. Support from the Transport Protocol | 6. Support from the Transport Protocol | |||
| 6.1. TCP | 6.1. TCP | |||
| 6.1.1. TCP Initialization | 6.1.1. TCP Initialization | |||
| 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | |||
| 6.1.1.2. Robust TCP Initialization with no response to the SYN | ||||
| 6.1.2. The TCP Sender | 6.1.2. The TCP Sender | |||
| 6.1.3. The TCP Receiver | 6.1.3. The TCP Receiver | |||
| 6.1.4. Congestion on the ACK-path | 6.1.4. Congestion on the ACK-path | |||
| 6.1.5. Retransmitted TCP packets | 6.1.5. Retransmitted TCP packets | |||
| 6.1.6. TCP Window Probes. | 6.1.6. TCP Window Probes. | |||
| 7. Non-compliance by the End Nodes | 7. Non-compliance by the End Nodes | |||
| 8. Non-compliance in the Network | 8. Non-compliance in the Network | |||
| 8.1. Complications Introduced by Split Paths | 8.1. Complications Introduced by Split Paths | |||
| 9. Encapsulated Packets | 9. Encapsulated Packets | |||
| 9.1. IP packets encapsulated in IP | 9.1. IP packets encapsulated in IP | |||
| 9.1.1. The limited-functionality and full-functionality options within | 9.1.1. The Limited-functionality and Full-functionality Options | |||
| 9.1.2. Changes to the ECN Field within an IP Tunnel. | 9.1.2. Changes to the ECN Field within an IP Tunnel. | |||
| 9.2. IPsec Tunnels | 9.2. IPsec Tunnels | |||
| 9.2.1. Negotiation between Tunnel Endpoints | 9.2.1. Negotiation between Tunnel Endpoints | |||
| 9.2.1.1. ECN Tunnel Security Association Database Field | 9.2.1.1. ECN Tunnel Security Association Database Field | |||
| 9.2.1.2. ECN Tunnel Security Association Attribute | 9.2.1.2. ECN Tunnel Security Association Attribute | |||
| 9.2.1.3. Changes to IPsec Tunnel Header Processing | 9.2.1.3. Changes to IPsec Tunnel Header Processing | |||
| 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | |||
| 9.2.3. Comments for IPsec Support | 9.2.3. Comments for IPsec Support | |||
| 9.3. IP packets encapsulated in non-IP packet headers. | 9.3. IP packets encapsulated in non-IP packet headers. | |||
| 10. Issues Raised by Monitoring and Policing Devices | 10. Issues Raised by Monitoring and Policing Devices | |||
| skipping to change at page 4, line 9 | skipping to change at page 4, line 8 | |||
| 18.1.2. Falsely Reporting Congestion | 18.1.2. Falsely Reporting Congestion | |||
| 18.1.3. Disabling ECN-Capability | 18.1.3. Disabling ECN-Capability | |||
| 18.1.4. Falsely Indicating ECN-Capability | 18.1.4. Falsely Indicating ECN-Capability | |||
| 18.1.5. Changes with No Functional Effect | 18.1.5. Changes with No Functional Effect | |||
| 18.2. Information carried in the Transport Header | 18.2. Information carried in the Transport Header | |||
| 18.3. Split Paths | 18.3. Split Paths | |||
| 19. Implications of Subverting End-to-End Congestion Control | 19. Implications of Subverting End-to-End Congestion Control | |||
| 19.1. Implications for the Network and for Competing Flows | 19.1. Implications for the Network and for Competing Flows | |||
| 19.2. Implications for the Subverted Flow | 19.2. Implications for the Subverted Flow | |||
| 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | |||
| 20. The motivation for the ECT bit. | 20. The Motivation for the ECT bit. | |||
| 21. Why use two bits in the IP header? | 21. Why use Two Bits in the IP Header? | |||
| 22. Historical definitions for the IPv4 TOS octet | 22. Historical Definitions for the IPv4 TOS Octet | |||
| 23. IANA Considerations | ||||
| RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare | ||||
| this with draft-ietf-tsvwg-ecn-00, compare the following: | ||||
| "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-00.troff" | ||||
| "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff" | ||||
| Changes from draft-ietf-tsvwg-ecn-00: | ||||
| * Deleted Section 6.1.1.2. on "Robust TCP Initialization with no | ||||
| response to the SYN", and modified the paragraph in the Conclusions | ||||
| referring to this. | ||||
| * Added Section 23 on IANA Considerations. | ||||
| * Added two paragraphs to Section 18.2 on denial-of-service attacks. | ||||
| * Added some text about the ECN nonce being a research issue. | ||||
| * Moved two paragraphs about setting the CWR bit from Section 6.1.3 to | ||||
| Section 6.1.2. | ||||
| * Various small changes: | ||||
| Added several small clarifying sentences to Sections 12 and 22. | ||||
| Small clarification to text in Section 19.2. | ||||
| Deleted a few unnecessary sentences in Section 9. | ||||
| Updated some references to Section X. | ||||
| Added more references to RFC 2780. | ||||
| Deleted references to internet-drafts. | ||||
| Clarified terminology for "non-ECN-setup SYN packet", including the | ||||
| following: "Receivers MUST correctly handle all forms of the non-ECN- | ||||
| setup SYN and SYN-ACK packets." | ||||
| 1. Introduction | 1. Introduction | |||
| TCP's congestion control and avoidance algorithms are based on the | TCP's congestion control and avoidance algorithms are based on the | |||
| notion that the network is a black-box [Jacobson88, Jacobson90]. The | notion that the network is a black-box [Jacobson88, Jacobson90]. The | |||
| network's state of congestion or otherwise is determined by end- sys- | network's state of congestion or otherwise is determined by end-sys- | |||
| tems probing for the network state, by gradually increasing the load | tems probing for the network state, by gradually increasing the load | |||
| on the network (by increasing the window of packets that are out- | on the network (by increasing the window of packets that are out- | |||
| standing in the network) until the network becomes congested and a | standing in the network) until the network becomes congested and a | |||
| packet is lost. Treating the network as a "black-box" and treating | packet is lost. Treating the network as a "black-box" and treating | |||
| loss as an indication of congestion in the network is appropriate for | loss as an indication of congestion in the network is appropriate for | |||
| pure best-effort data carried by TCP, with little or no sensitivity | pure best-effort data carried by TCP, with little or no sensitivity | |||
| to delay or loss of individual packets. In addition, TCP's conges- | to delay or loss of individual packets. In addition, TCP's conges- | |||
| tion management algorithms have techniques built-in (such as Fast | tion management algorithms have techniques built-in (such as Fast | |||
| Retransmit and Fast Recovery) to minimize the impact of losses, from | Retransmit and Fast Recovery) to minimize the impact of losses, from | |||
| a throughput perspective. However, these mechanisms are not intended | a throughput perspective. However, these mechanisms are not intended | |||
| skipping to change at page 5, line 20 | skipping to change at page 5, line 44 | |||
| Active queue management mechanisms may use one of several methods for | Active queue management mechanisms may use one of several methods for | |||
| indicating congestion to end-nodes. One is to use packet drops, as is | indicating congestion to end-nodes. One is to use packet drops, as is | |||
| currently done. However, active queue management allows the router to | currently done. However, active queue management allows the router to | |||
| separate policies of queueing or dropping packets from the policies | separate policies of queueing or dropping packets from the policies | |||
| for indicating congestion. Thus, active queue management allows | for indicating congestion. Thus, active queue management allows | |||
| routers to use the Congestion Experienced (CE) bit in a packet header | routers to use the Congestion Experienced (CE) bit in a packet header | |||
| as an indication of congestion, instead of relying solely on packet | as an indication of congestion, instead of relying solely on packet | |||
| drops. This has the potential of reducing the impact of loss on | drops. This has the potential of reducing the impact of loss on | |||
| latency-sensitive flows. | latency-sensitive flows. | |||
| This document is intended to obsolete RFC 2481, "A Proposal to add | ||||
| Explicit Congestion Notification (ECN) to IP", which defined ECN as | ||||
| an Experimental Protocol for the Internet Community. | ||||
| RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This | ||||
| document obsoletes three subsequent internet-drafts on ECN, "IPsec | ||||
| Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP | ||||
| with ECN: The Treatment of Retransmitted Data Packets". This | ||||
| document is intended largely to merge the earlier documents all into | ||||
| a single document, for greater clarity, in preparation to becoming a | ||||
| Proposed Standard. | ||||
| 2. Conventions and Acronyms | 2. Conventions and Acronyms | |||
| The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | |||
| SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | |||
| document, are to be interpreted as described in [B97]. | document, are to be interpreted as described in [B97]. | |||
| 3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
| In this section, we describe some of the important design principles | In this section, we describe some of the important design principles | |||
| and assumptions that guided the design choices in this proposal. | and assumptions that guided the design choices in this proposal. | |||
| * Because ECN is likely to be adopted gradually, accommodating migra- | * Because ECN is likely to be adopted gradually, accommodating migra- | |||
| tion is essential. Some routers may still only drop packets to indi- | tion is essential. Some routers may still only drop packets to indi- | |||
| cate congestion, and some end-systems may not be ECN- capable. The | cate congestion, and some end-systems may not be ECN-capable. The | |||
| most viable strategy is one that accommodates incremental deployment | most viable strategy is one that accommodates incremental deployment | |||
| without having to resort to "islands" of ECN-capable and non-ECN- | without having to resort to "islands" of ECN-capable and non-ECN- | |||
| capable environments. | capable environments. | |||
| * New mechanisms for congestion control and avoidance need to co- | * New mechanisms for congestion control and avoidance need to co- | |||
| exist and cooperate with existing mechanisms for congestion control. | exist and cooperate with existing mechanisms for congestion control. | |||
| In particular, new mechanisms have to co-exist with TCP's current | In particular, new mechanisms have to co-exist with TCP's current | |||
| methods of adapting to congestion and with routers' current practice | methods of adapting to congestion and with routers' current practice | |||
| of dropping packets in periods of congestion. | of dropping packets in periods of congestion. | |||
| * Congestion may persist over different time-scales. The time scales | * Congestion may persist over different time-scales. The time scales | |||
| that we are concerned with are congestion events that may last longer | that we are concerned with are congestion events that may last longer | |||
| skipping to change at page 7, line 5 | skipping to change at page 7, line 42 | |||
| with two bits. The ECN-Capable Transport (ECT) bit is set by the | with two bits. The ECN-Capable Transport (ECT) bit is set by the | |||
| data sender to indicate that the end-points of the transport protocol | data sender to indicate that the end-points of the transport protocol | |||
| are ECN-capable. The CE bit is set by the router to indicate conges- | are ECN-capable. The CE bit is set by the router to indicate conges- | |||
| tion to the end nodes. Routers that have a packet arriving at a full | tion to the end nodes. Routers that have a packet arriving at a full | |||
| queue drop the packet, just as they do in the absence of ECN. | queue drop the packet, just as they do in the absence of ECN. | |||
| Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. | Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. | |||
| Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE | Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE | |||
| bit. The IPv4 TOS octet corresponds to the Traffic Class octet in | bit. The IPv4 TOS octet corresponds to the Traffic Class octet in | |||
| IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6 | IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6 | |||
| Traffic Class octet have been superseded by the DS (Differentiated | Traffic Class octet have been superseded by the six-bit DS (Differen- | |||
| Services) Field [RFC2474]. Bits 6 and 7 are listed in [RFC2474] as | tiated Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed | |||
| Currently Unused. Section 22 gives a brief history of the TOS octet. | in [RFC2474] as Currently Unused, and are specified in RFC 2780 as | |||
| approved for experimental use for ECN. Section 22 gives a brief his- | ||||
| tory of the TOS octet. | ||||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | | ECN FIELD | | | DS FIELD | ECN FIELD | | |||
| | DSCP | | | | | | | |||
| | | ECT | CE | | | DSCP | ECT | CE | | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| DSCP: differentiated services codepoint | DSCP: differentiated services codepoint | |||
| ECN: Explicit Congestion Notification | ECN: Explicit Congestion Notification | |||
| Figure 1: The Differentiated Services Field in IP. | Figure 1: The Differentiated Services and ECN Fields in IP. | |||
| Because of the unstable history of the TOS octet, the use of the ECN | Because of the unstable history of the TOS octet, the use of the ECN | |||
| field as specified in this document cannot be guaranteed to be back- | field as specified in this document cannot be guaranteed to be back- | |||
| wards compatible with all past uses of these two bits. The potential | wards compatible with all past uses of these two bits. The potential | |||
| dangers of this lack of backwards compatibility are discussed in Sec- | dangers of this lack of backwards compatibility are discussed in Sec- | |||
| tion 22. | tion 22. | |||
| Upon the receipt by an ECN-Capable transport of a single CE packet, | Upon the receipt by an ECN-Capable transport of a single CE packet, | |||
| the congestion control algorithms followed at the end-systems MUST be | the congestion control algorithms followed at the end-systems MUST be | |||
| essentially the same as the congestion control response to a *single* | essentially the same as the congestion control response to a *single* | |||
| skipping to change at page 8, line 24 | skipping to change at page 9, line 19 | |||
| control mechanisms for end-node reaction to CE packets. However, | control mechanisms for end-node reaction to CE packets. However, | |||
| this is a research issue, and as such is not addressed in this docu- | this is a research issue, and as such is not addressed in this docu- | |||
| ment. | ment. | |||
| When a CE packet (i.e., a packet that has the CE bit set) is received | When a CE packet (i.e., a packet that has the CE bit set) is received | |||
| by a router, the CE bit is left unchanged, and the packet is trans- | by a router, the CE bit is left unchanged, and the packet is trans- | |||
| mitted as usual. When severe congestion has occurred and the router's | mitted as usual. When severe congestion has occurred and the router's | |||
| queue is full, then the router has no choice but to drop some packet | queue is full, then the router has no choice but to drop some packet | |||
| when a new packet arrives. We anticipate that such packet losses | when a new packet arrives. We anticipate that such packet losses | |||
| will become relatively infrequent when a majority of end-systems | will become relatively infrequent when a majority of end-systems | |||
| become ECN- Capable and participate in TCP or other compatible con- | become ECN-Capable and participate in TCP or other compatible conges- | |||
| gestion control mechanisms. In an adequately-provisioned network in | tion control mechanisms. In an adequately-provisioned network in an | |||
| an ECN-Capable environment, packet losses should occur primarily | ECN-Capable environment, packet losses should occur primarily | |||
| during transients or in the presence of non-cooperating sources. | during transients or in the presence of non-cooperating sources. | |||
| We expect that routers will set the CE bit in response to incipient | We expect that routers will set the CE bit in response to incipient | |||
| congestion as indicated by the average queue size, using the RED | congestion as indicated by the average queue size, using the RED | |||
| algorithms suggested in [FJ93, RFC2309]. To the best of our knowl- | algorithms suggested in [FJ93, RFC2309]. To the best of our knowl- | |||
| edge, this is the only proposal currently under discussion in the | edge, this is the only proposal currently under discussion in the | |||
| IETF for routers to drop packets proactively, before the buffer over- | IETF for routers to drop packets proactively, before the buffer over- | |||
| flows. However, this document does not attempt to specify a particu- | flows. However, this document does not attempt to specify a particu- | |||
| lar mechanism for active queue management, leaving that endeavor, if | lar mechanism for active queue management, leaving that endeavor, if | |||
| needed, to other areas of the IETF. While ECN is inextricably tied | needed, to other areas of the IETF. While ECN is inextricably tied | |||
| up with the need to have a reasonable active queue management mecha- | up with the need to have a reasonable active queue management mecha- | |||
| nism at the router, the reverse does not hold; active queue manage- | nism at the router, the reverse does not hold; active queue manage- | |||
| ment mechanisms have been developed and deployed independent of ECN, | ment mechanisms have been developed and deployed independent of ECN, | |||
| using packet drops as indications of congestion in the absence of ECN | using packet drops as indications of congestion in the absence of ECN | |||
| in the IP architecture. | in the IP architecture. | |||
| 5.1. ECN as an indication of persistent congestion | 5.1. ECN as an Indication of Persistent Congestion | |||
| We emphasize that a *single* packet with the CE bit set in an IP | We emphasize that a *single* packet with the CE bit set in an IP | |||
| packet causes the transport layer to respond, in terms of congestion | packet causes the transport layer to respond, in terms of congestion | |||
| control, as it would to a packet drop. The instantaneous queue size | control, as it would to a packet drop. The instantaneous queue size | |||
| is likely to see considerable variations even when the router does | is likely to see considerable variations even when the router does | |||
| not experience persistent congestion. As such, it is important that | not experience persistent congestion. As such, it is important that | |||
| transient congestion at a router, reflected by the instantaneous | transient congestion at a router, reflected by the instantaneous | |||
| queue size reaching a threshold much smaller than the capacity of the | queue size reaching a threshold much smaller than the capacity of the | |||
| queue, not trigger a reaction at the transport layer. Therefore, the | queue, not trigger a reaction at the transport layer. Therefore, the | |||
| CE bit should not be set by a router based on the instantaneous queue | CE bit should not be set by a router based on the instantaneous queue | |||
| skipping to change at page 12, line 36 | skipping to change at page 13, line 32 | |||
| packet. This indicates to the routers that they may mark this packet | packet. This indicates to the routers that they may mark this packet | |||
| with the CE bit, if they would like to use that as a method of con- | with the CE bit, if they would like to use that as a method of con- | |||
| gestion notification. If the TCP connection does not wish to use ECN | gestion notification. If the TCP connection does not wish to use ECN | |||
| notification for a particular packet, the sending TCP sets the ECT | notification for a particular packet, the sending TCP sets the ECT | |||
| bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE | bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE | |||
| bit in the received packet. | bit in the received packet. | |||
| For this discussion, we designate the initiating host as Host A and | For this discussion, we designate the initiating host as Host A and | |||
| the responding host as Host B. We call a SYN packet with the ECE and | the responding host as Host B. We call a SYN packet with the ECE and | |||
| CWR flags set an "ECN-setup SYN packet", and we call a SYN packet | CWR flags set an "ECN-setup SYN packet", and we call a SYN packet | |||
| with the ECE and CWR flags not set a "non-ECN-setup SYN packet". | with at least one of the ECE and CWR flags not set a "non-ECN-setup | |||
| Similarly, we call a SYN-ACK packet with only the ECE flag set but | SYN packet". Similarly, we call a SYN-ACK packet with only the ECE | |||
| the CWR flag not set an "ECN-setup SYN-ACK packet", and we call a | flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and | |||
| SYN-ACK packet with both the ECE and CWR flags not set a "non-ECN- | we call a SYN-ACK packet with any other configuration of the ECE and | |||
| setup SYN-ACK packet". | CWR flags a "non-ECN-setup SYN-ACK packet". | |||
| Before a TCP connection can use ECN, Host A sends an ECN-setup SYN | Before a TCP connection can use ECN, Host A sends an ECN-setup SYN | |||
| packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN | packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN | |||
| packet, the setting of both ECE and CWR in the ECN-setup SYN packet | packet, the setting of both ECE and CWR in the ECN-setup SYN packet | |||
| is defined as an indication that the sending TCP is ECN-Capable, | is defined as an indication that the sending TCP is ECN-Capable, | |||
| rather than as an indication of congestion or of response to conges- | rather than as an indication of congestion or of response to conges- | |||
| tion. More precisely, an ECN-setup SYN packet indicates that the TCP | tion. More precisely, an ECN-setup SYN packet indicates that the TCP | |||
| implementation transmitting the SYN packet will participate in ECN as | implementation transmitting the SYN packet will participate in ECN as | |||
| both a sender and receiver. Specifically, as a receiver, it will | both a sender and receiver. Specifically, as a receiver, it will | |||
| respond to incoming data packets that have the CE bit set in the IP | respond to incoming data packets that have the CE bit set in the IP | |||
| skipping to change at page 13, line 38 | skipping to change at page 14, line 32 | |||
| non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | |||
| received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | |||
| packet, then it SHOULD NOT set ECT on data packets. | packet, then it SHOULD NOT set ECT on data packets. | |||
| * If a host ever sets the ECT bit on a data packet, then that host | * If a host ever sets the ECT bit on a data packet, then that host | |||
| MUST correctly set/clear the CWR TCP bit on all subsequent packets in | MUST correctly set/clear the CWR TCP bit on all subsequent packets in | |||
| the connection. | the connection. | |||
| * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK | * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK | |||
| packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- | packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- | |||
| ACK packet, then if that host receives TCP data packets with ECT and | ACK packet, then if that host receives TCP data packets with ECT and | |||
| CE bits set in the IP header, then that host MUST process these pack- | CE bits set in the IP header, then that host MUST process these pack- | |||
| ets as specified for an ECN-capable connection. | ets as specified for an ECN-capable connection. * A host that is not | |||
| willing to use ECN on a TCP connection SHOULD clear both the ECE and | ||||
| CWR flags in all non-ECN-setup SYN and/or SYN-ACK packets that it | ||||
| sends to indicate this unwillingness. Receivers MUST correctly han- | ||||
| dle all forms of the non-ECN-setup SYN and SYN-ACK packets. | ||||
| 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | |||
| There is the question of why we chose to have the TCP sending the SYN | There is the question of why we chose to have the TCP sending the SYN | |||
| set two ECN-related flags in the Reserved field of the TCP header for | set two ECN-related flags in the Reserved field of the TCP header for | |||
| the SYN packet, while the responding TCP sending the SYN-ACK sets | the SYN packet, while the responding TCP sending the SYN-ACK sets | |||
| only one ECN-related flag in the SYN-ACK packet. This asymmetry is | only one ECN-related flag in the SYN-ACK packet. This asymmetry is | |||
| necessary for the robust negotiation of ECN-capability with some | necessary for the robust negotiation of ECN-capability with some | |||
| deployed TCP implementations. There exists at least one faulty TCP | deployed TCP implementations. There exists at least one faulty TCP | |||
| implementation in which TCP receivers set the Reserved field of the | implementation in which TCP receivers set the Reserved field of the | |||
| TCP header in ACK packets (and hence the SYN-ACK) simply to reflect | TCP header in ACK packets (and hence the SYN-ACK) simply to reflect | |||
| the Reserved field of the TCP header in the received data packet. | the Reserved field of the TCP header in the received data packet. | |||
| Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi- | Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi- | |||
| cate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo | cate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo | |||
| flag, the sending TCP correctly interprets a receiver's reflection of | flag, the sending TCP correctly interprets a receiver's reflection of | |||
| its own flags in the Reserved field as an indication that the | its own flags in the Reserved field as an indication that the | |||
| receiver is not ECN-capable. The sending TCP is not misled by a | receiver is not ECN-capable. The sending TCP is not misled by a | |||
| faulty TCP implementation sending a SYN-ACK packet that simply | faulty TCP implementation sending a SYN-ACK packet that simply | |||
| reflects the Reserved field of the incoming SYN packet. | reflects the Reserved field of the incoming SYN packet. | |||
| 6.1.1.2. Robust TCP Initialization with no response to the SYN | ||||
| ECN introduces the use of the ECN-Echo and CWR flags in the TCP | ||||
| header (as shown in Figure 3) for initialization. There exists some | ||||
| faulty equipment in the Internet that either ignores an ECN-setup SYN | ||||
| packet or responds with a RST, in the belief that such a packet (with | ||||
| these bits set) is a signature for a port-scanning tool that could be | ||||
| used in a denial-of-service attack. To provide robust connectivity | ||||
| even in the presence of such faulty equipment, a host that receives a | ||||
| RST in response to the transmission of an ECN-setup SYN packet MAY | ||||
| resend a SYN with CWR and ECE cleared. This could result in a TCP | ||||
| connection being established without using ECN. Similarly, a host | ||||
| that receives no reply to an ECN-setup SYN within the normal SYN | ||||
| retransmission timeout interval MAY resend the SYN and any subsequent | ||||
| SYN retransmissions with CWR and ECE cleared. To overcome normal | ||||
| packet loss that results in the original SYN being lost, the origi- | ||||
| nating host may retransmit one or more ECN-setup SYN packets before | ||||
| giving up and retransmitting the SYN with the CWR and ECE bits | ||||
| cleared. | ||||
| We note that in this case, the following example scenario is possi- | ||||
| ble: | ||||
| (1) Host A: Sends an ECN-setup SYN. | ||||
| (2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or | ||||
| delayed. | ||||
| (3) Host A: Sends a non-ECN-setup SYN. | ||||
| (4) Host B: Sends a non-ECN-setup SYN/ACK. | ||||
| We note that in this case, following the procedures above, neither | ||||
| Host A nor Host B may set the ECT bit on data packets. We further | ||||
| note that a host NEVER uses the reception of ECT data packets as an | ||||
| implicit signal that the other host is ECN-capable. | ||||
| 6.1.2. The TCP Sender | 6.1.2. The TCP Sender | |||
| For a TCP connection using ECN, data packets are transmitted with the | For a TCP connection using ECN, new data packets are transmitted with | |||
| ECT bit set in the IP header (set to a "1"). If the sender receives | the ECT bit set in the IP header (set to a "1"). If the sender | |||
| an ECN-Echo (ECE) ACK packet (that is, an ACK packet with the ECN- | receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with | |||
| Echo flag set in the TCP header), then the sender knows that conges- | the ECN-Echo flag set in the TCP header), then the sender knows that | |||
| tion was encountered in the network on the path from the sender to | congestion was encountered in the network on the path from the sender | |||
| the receiver. The indication of congestion should be treated just as | to the receiver. The indication of congestion should be treated just | |||
| a congestion loss in non-ECN-Capable TCP. That is, the TCP source | as a congestion loss in non-ECN-Capable TCP. That is, the TCP source | |||
| halves the congestion window "cwnd" and reduces the slow start | halves the congestion window "cwnd" and reduces the slow start | |||
| threshold "ssthresh". The sending TCP SHOULD NOT increase the con- | threshold "ssthresh". The sending TCP SHOULD NOT increase the con- | |||
| gestion window in response to the receipt of an ECN-Echo ACK packet. | gestion window in response to the receipt of an ECN-Echo ACK packet. | |||
| TCP should not react to congestion indications more than once every | TCP should not react to congestion indications more than once every | |||
| window of data (or more loosely, more than once every round-trip | window of data (or more loosely, more than once every round-trip | |||
| time). That is, the TCP sender's congestion window should be reduced | time). That is, the TCP sender's congestion window should be reduced | |||
| only once in response to a series of dropped and/or CE packets from a | only once in response to a series of dropped and/or CE packets from a | |||
| single window of data. In addition, the TCP source should not | single window of data. In addition, the TCP source should not | |||
| decrease the slow-start threshold, ssthresh, if it has been decreased | decrease the slow-start threshold, ssthresh, if it has been decreased | |||
| skipping to change at page 15, line 37 ¶ | skipping to change at page 15, line 50 ¶ | |||
| tinue to send, using a congestion window of 1 MSS, this results in | tinue to send, using a congestion window of 1 MSS, this results in | |||
| the transmission of one packet per round-trip time. It is necessary | the transmission of one packet per round-trip time. It is necessary | |||
| to still reduce the sending rate of the TCP sender even further, on | to still reduce the sending rate of the TCP sender even further, on | |||
| receipt of an ECN-Echo packet when the congestion window is one. We | receipt of an ECN-Echo packet when the congestion window is one. We | |||
| use the retransmit timer as a means of reducing the rate further in | use the retransmit timer as a means of reducing the rate further in | |||
| this circumstance. Therefore, the sending TCP MUST reset the | this circumstance. Therefore, the sending TCP MUST reset the | |||
| retransmit timer on receiving the ECN-Echo packet when the congestion | retransmit timer on receiving the ECN-Echo packet when the congestion | |||
| window is one. The sending TCP will then be able to send a new | window is one. The sending TCP will then be able to send a new | |||
| packet only when the retransmit timer expires. | packet only when the retransmit timer expires. | |||
| When an ECN-Capable TCP sender reduces its congestion window for any | ||||
| reason (because of a retransmit timeout, a Fast Retransmit, or in | ||||
| response to an ECN Notification), the TCP sender sets the CWR flag in | ||||
| the TCP header of the first new data packet sent after the window | ||||
| reduction. If that data packet is dropped in the network, then the | ||||
| sending TCP will have to reduce the congestion window again and | ||||
| retransmit the dropped packet. | ||||
| We ensure that the "Congestion Window Reduced" information is reli- | ||||
| ably delivered to the TCP receiver. This comes about from the fact | ||||
| that if the new data packet carrying the CWR flag is dropped, then | ||||
| the TCP sender will have to again reduce its congestion window, and | ||||
| send another new data packet with the CWR flag set. Thus, the CWR | ||||
| bit in the TCP header SHOULD NOT be set on retransmitted packets. | ||||
| When the TCP data sender is ready to set the CWR bit after reducing | ||||
| the congestion window, it SHOULD set the CWR bit only on the first | ||||
| new data packet that it transmits. | ||||
| [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] | [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] | |||
| discusses the validation test in the ns simulator, which illustrates | discusses the validation test in the ns simulator, which illustrates | |||
| a wide range of ECN scenarios. These scenarios include the following: | a wide range of ECN scenarios. These scenarios include the following: | |||
| an ECN followed by another ECN, a Fast Retransmit, or a Retransmit | an ECN followed by another ECN, a Fast Retransmit, or a Retransmit | |||
| Timeout; a Retransmit Timeout or a Fast Retransmit followed by an | Timeout; a Retransmit Timeout or a Fast Retransmit followed by an | |||
| ECN; and a congestion window of one packet followed by an ECN. | ECN; and a congestion window of one packet followed by an ECN. | |||
| TCP follows existing algorithms for sending data packets in response | TCP follows existing algorithms for sending data packets in response | |||
| to incoming ACKs, multiple duplicate acknowledgements, or retransmit | to incoming ACKs, multiple duplicate acknowledgements, or retransmit | |||
| timeouts [RFC2581]. TCP also follows the normal procedures for | timeouts [RFC2581]. TCP also follows the normal procedures for | |||
| skipping to change at page 16, line 23 ¶ | skipping to change at page 16, line 51 ¶ | |||
| all of the data packets being acknowledged. That is, if any of the | all of the data packets being acknowledged. That is, if any of the | |||
| received data packets are CE packets, then the returning ACK has the | received data packets are CE packets, then the returning ACK has the | |||
| ECN-Echo flag set. | ECN-Echo flag set. | |||
| To provide robustness against the possibility of a dropped ACK packet | To provide robustness against the possibility of a dropped ACK packet | |||
| carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in | carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in | |||
| a series of ACK packets sent subsequently. The TCP receiver uses the | a series of ACK packets sent subsequently. The TCP receiver uses the | |||
| CWR flag received from the TCP sender to determine when to stop set- | CWR flag received from the TCP sender to determine when to stop set- | |||
| ting the ECN-Echo flag. | ting the ECN-Echo flag. | |||
| When an ECN-Capable TCP sender reduces its congestion window for any | ||||
| reason (because of a retransmit timeout, a Fast Retransmit, or in | ||||
| response to an ECN Notification), the TCP sender sets the CWR flag in | ||||
| the TCP header of the first new data packet sent after the window | ||||
| reduction. If that data packet is dropped in the network, then the | ||||
| sending TCP will have to reduce the congestion window again and | ||||
| retransmit the dropped packet. | ||||
| We ensure that the "Congestion Window Reduced" information is reli- | ||||
| ably delivered to the TCP receiver. This comes about from the fact | ||||
| that if the new data packet carrying the CWR flag is dropped, then | ||||
| the TCP sender will have to again reduce its congestion window, and | ||||
| send another new data packet with the CWR flag set. Thus, the CWR | ||||
| bit in the TCP header SHOULD NOT be set on retransmitted packets. | ||||
| When the TCP data sender is ready to set the CWR bit after reducing | ||||
| the congestion window, it SHOULD set the CWR bit only on the first | ||||
| new data packet that it transmits. | ||||
| After a TCP receiver sends an ACK packet with the ECN-Echo bit set, | After a TCP receiver sends an ACK packet with the ECN-Echo bit set, | |||
| that TCP receiver continues to set the ECN-Echo flag in all the ACK | that TCP receiver continues to set the ECN-Echo flag in all the ACK | |||
| packets it sends (whether they acknowledge CE data packets or non-CE | packets it sends (whether they acknowledge CE data packets or non-CE | |||
| data packets) until it receives a CWR packet (a packet with the CWR | data packets) until it receives a CWR packet (a packet with the CWR | |||
| flag set). After the receipt of the CWR packet, acknowledgements for | flag set). After the receipt of the CWR packet, acknowledgements for | |||
| subsequent non-CE data packets do not have the ECN-Echo flag set. If | subsequent non-CE data packets do not have the ECN-Echo flag set. If | |||
| another CE packet is received by the data receiver, the receiver | another CE packet is received by the data receiver, the receiver | |||
| would once again send ACK packets with the ECN-Echo flag set. While | would once again send ACK packets with the ECN-Echo flag set. While | |||
| the receipt of a CWR packet does not guarantee that the data sender | the receipt of a CWR packet does not guarantee that the data sender | |||
| received the ECN-Echo message, this does suggest that the data sender | received the ECN-Echo message, this does suggest that the data sender | |||
| skipping to change at page 20, line 26 ¶ | skipping to change at page 20, line 38 ¶ | |||
| example, delay-insensitive flows using reliable delivery might have | example, delay-insensitive flows using reliable delivery might have | |||
| an incentive to increase rather than to decrease their sending rate | an incentive to increase rather than to decrease their sending rate | |||
| in the presence of dropped packets. Similarly, delay-sensitive flows | in the presence of dropped packets. Similarly, delay-sensitive flows | |||
| using unreliable delivery might increase their use of FEC in response | using unreliable delivery might increase their use of FEC in response | |||
| to an increased packet drop rate, increasing rather than decreasing | to an increased packet drop rate, increasing rather than decreasing | |||
| their sending rate. For the same reasons, we do not believe that | their sending rate. For the same reasons, we do not believe that | |||
| packet dropping itself is an effective deterrent for non-compliance | packet dropping itself is an effective deterrent for non-compliance | |||
| even in an environment of high packet drop rates, when all flows are | even in an environment of high packet drop rates, when all flows are | |||
| sharing the same packet drop rate. | sharing the same packet drop rate. | |||
| Several methods have been proposed to identify and restrict non- com- | Several methods have been proposed to identify and restrict non-com- | |||
| pliant or unresponsive flows. The addition of ECN to the network | pliant or unresponsive flows. The addition of ECN to the network | |||
| environment would not in any way increase the difficulty of designing | environment would not in any way increase the difficulty of designing | |||
| and deploying such mechanisms. If anything, the addition of ECN to | and deploying such mechanisms. If anything, the addition of ECN to | |||
| the architecture would make the job of identifying unresponsive flows | the architecture would make the job of identifying unresponsive flows | |||
| slightly easier. For example, in an ECN-Capable environment routers | slightly easier. For example, in an ECN-Capable environment routers | |||
| are not limited to information about packets that are dropped or have | are not limited to information about packets that are dropped or have | |||
| the CE bit set at that router itself; in such an environment, routers | the CE bit set at that router itself; in such an environment, routers | |||
| could also take note of arriving CE packets that indicate congestion | could also take note of arriving CE packets that indicate congestion | |||
| encountered by that packet earlier in the path. | encountered by that packet earlier in the path. | |||
| skipping to change at page 20, line 48 ¶ | skipping to change at page 21, line 16 ¶ | |||
| This section considers the issues when a router is operating, possi- | This section considers the issues when a router is operating, possi- | |||
| bly maliciously, to modify either of the bits in the ECN field. In | bly maliciously, to modify either of the bits in the ECN field. In | |||
| this section we represent the ECN field in the IP header by the tuple | this section we represent the ECN field in the IP header by the tuple | |||
| (ECT bit, CE bit). | (ECT bit, CE bit). | |||
| By tampering with the bits in the ECN field, an adversary (or a bro- | By tampering with the bits in the ECN field, an adversary (or a bro- | |||
| ken router) could do one or more of the following: falsely report | ken router) could do one or more of the following: falsely report | |||
| congestion, disable ECN-Capability for an individual packet, erase | congestion, disable ECN-Capability for an individual packet, erase | |||
| the ECN congestion indication, or falsely indicate ECN-Capability. | the ECN congestion indication, or falsely indicate ECN-Capability. | |||
| Appendix X systematically examines the various cases by which the ECN | Section 18 systematically examines the various cases by which the ECN | |||
| field could be modified. The important criterion considered in | field could be modified. The important criterion considered in | |||
| determining the consequences of such modifications is whether it is | determining the consequences of such modifications is whether it is | |||
| likely to lead to poorer behavior in any dimension (throughput, | likely to lead to poorer behavior in any dimension (throughput, | |||
| delay, fairness or functionality) than if a router were to drop a | delay, fairness or functionality) than if a router were to drop a | |||
| packet. | packet. | |||
| The first two possible changes, falsely reporting congestion or dis- | The first two possible changes, falsely reporting congestion or dis- | |||
| abling ECN-Capability for an individual packet, are no worse than if | abling ECN-Capability for an individual packet, are no worse than if | |||
| the router were to simply drop the packet. From a congestion control | the router were to simply drop the packet. From a congestion control | |||
| point of view, setting the CE bit in the absence of congestion by a | point of view, setting the CE bit in the absence of congestion by a | |||
| non-compliant router would be no worse than a router dropping a | non-compliant router would be no worse than a router dropping a | |||
| packet unnecessarily. By "erasing" the ECT bit of a packet that is | packet unnecessarily. By "erasing" the ECT bit of a packet that is | |||
| later dropped in the network, a router's actions could result in an | later dropped in the network, a router's actions could result in an | |||
| unnecessary packet drop for that packet later in the network. | unnecessary packet drop for that packet later in the network. | |||
| However, as discussed in Section X in the Appendix, a router that | However, as discussed in Section 18, a router that erases the ECN | |||
| erases the ECN congestion indication or falsely indicates ECN-Capa- | congestion indication or falsely indicates ECN-Capability could | |||
| bility could potentially do more damage to the flow than if it has | potentially do more damage to the flow than if it has simply dropped | |||
| simply dropped the packet. A rogue or broken router that "erased" | the packet. A rogue or broken router that "erased" the CE bit in | |||
| the CE bit in arriving CE packets would prevent that indication of | arriving CE packets would prevent that indication of congestion from | |||
| congestion from reaching downstream receivers. This could result in | reaching downstream receivers. This could result in the failure of | |||
| the failure of congestion control for that flow and a resulting | congestion control for that flow and a resulting increase in conges- | |||
| increase in congestion in the network, ultimately resulting in subse- | tion in the network, ultimately resulting in subsequent packets | |||
| quent packets dropped for this flow as the average queue size | dropped for this flow as the average queue size increased at the con- | |||
| increased at the congested gateway. | gested gateway. | |||
| Appendix X considers the potential repercussions of subverting end- | Section 19 considers the potential repercussions of subverting end- | |||
| to-end congestion control by either falsely indicating ECN-Capabil- | to-end congestion control by either falsely indicating ECN-Capabil- | |||
| ity, or by erasing the congestion indication in ECN (the CE-bit). We | ity, or by erasing the congestion indication in ECN (the CE-bit). We | |||
| observe in the Appendix that the consequence of subverting ECN-based | observe in Section 19 that the consequence of subverting ECN-based | |||
| congestion control may lead to potential unfairness, but this is | congestion control may lead to potential unfairness, but this is | |||
| likely to be no worse than the subversion of either ECN-based or | likely to be no worse than the subversion of either ECN-based or | |||
| packet-based congestion control by the end nodes. | packet-based congestion control by the end nodes. | |||
| 8.1. Complications Introduced by Split Paths | 8.1. Complications Introduced by Split Paths | |||
| If a router or other network element has access to all of the packets | If a router or other network element has access to all of the packets | |||
| of a flow, then that router could do no more damage to a flow by | of a flow, then that router could do no more damage to a flow by | |||
| altering the ECN field than it could by simply dropping all of the | altering the ECN field than it could by simply dropping all of the | |||
| packets from that flow. However, in some cases, a malicious or bro- | packets from that flow. However, in some cases, a malicious or bro- | |||
| ken router might have access to only a subset of the packets from a | ken router might have access to only a subset of the packets from a | |||
| flow. The question is as follows: can this router, by altering the | flow. The question is as follows: can this router, by altering the | |||
| ECN field in this subset of the packets, do more damage to that flow | ECN field in this subset of the packets, do more damage to that flow | |||
| than if it has simply dropped that set of the packets? | than if it has simply dropped that set of the packets? | |||
| This is also discussed in detail in the Appendix, which concludes as | This is also discussed in detail in Section 18, which concludes as | |||
| follows: It is true that the adversary that has access only to a | follows: It is true that the adversary that has access only to a | |||
| subset of packets in an aggregate might, by subverting ECN-based con- | subset of packets in an aggregate might, by subverting ECN-based con- | |||
| gestion control, be able to deny the benefits of ECN to the other | gestion control, be able to deny the benefits of ECN to the other | |||
| packets in the aggregate. While this is undesirable, this is not a | packets in the aggregate. While this is undesirable, this is not a | |||
| sufficient concern to result in disabling ECN within an IP tunnel. | sufficient concern to result in disabling ECN. | |||
| 9. Encapsulated Packets | 9. Encapsulated Packets | |||
| 9.1. IP packets encapsulated in IP | 9.1. IP packets encapsulated in IP | |||
| The encapsulation of IP packet headers in tunnels is used in many | The encapsulation of IP packet headers in tunnels is used in many | |||
| places, including IPsec and IP in IP [RFC2003]. Currently, the ECN | places, including IPsec and IP in IP [RFC2003]. This section consid- | |||
| specification does not accommodate the constraints imposed by some of | ||||
| these pre-existing specifications for tunnels. This document consid- | ||||
| ers issues related to interactions between ECN and IP tunnels, and | ers issues related to interactions between ECN and IP tunnels, and | |||
| specifies two alternative solutions. | specifies two alternative solutions. This discussion is complemented | |||
| by RFC 2983's discussion of interactions between Differentiated Ser- | ||||
| vices and IP tunnels of various forms [RFC2983], as Differentiated | ||||
| Services uses the remaining six bits of the IP header octet that is | ||||
| used by ECN (see Figure 1 in Section 5). | ||||
| Some IP tunnel modes are based on adding a new "outer" IP header that | Some IP tunnel modes are based on adding a new "outer" IP header that | |||
| encapsulates the original, or "inner" IP header and its associated | encapsulates the original, or "inner" IP header and its associated | |||
| packet. In many cases, the new "outer" IP header may be added and | packet. In many cases, the new "outer" IP header may be added and | |||
| removed at intermediate points along a connection, enabling the net- | removed at intermediate points along a connection, enabling the net- | |||
| work to establish a tunnel without requiring endpoint participation. | work to establish a tunnel without requiring endpoint participation. | |||
| We denote tunnels that specify that the outer header be discarded at | We denote tunnels that specify that the outer header be discarded at | |||
| tunnel egress as "simple tunnels". | tunnel egress as "simple tunnels". | |||
| ECN uses the ECT and CE flags in the IP header for signaling between | ECN uses the ECT and CE flags in the IP header for signaling between | |||
| routers and connection endpoints. ECN interacts with IP tunnels | routers and connection endpoints. ECN interacts with IP tunnels | |||
| because of the ECT and CE flags in the DS field octet in the IP | based on the treatment of these flags in the IP header. In simple IP | |||
| header [RFC2474] (also referred to as the IPv4 TOS octet or IPv6 | tunnels the octet containing these flags is copied or mapped from the | |||
| Traffic Class octet). [RFC2983] discusses interactions of Differen- | inner IP header to the outer IP header at IP tunnel ingress, and the | |||
| tiated Services with IP tunnels of various forms. In simple IP tun- | outer header's copy of this field is discarded at IP tunnel egress. | |||
| nels the DS field octet is copied or mapped from the inner IP header | If the outer header were to be simply discarded without taking care | |||
| to the outer IP header at IP tunnel ingress, and the outer header's | to deal with the ECN related flags, and an ECN-capable router were to | |||
| copy of this field is discarded at IP tunnel egress. If the outer | set the CE (Congestion Experienced) bit within a packet in a simple | |||
| header were to be simply discarded without taking care to deal with | IP tunnel, this indication would be discarded at tunnel egress, los- | |||
| the ECN related flags, and an ECN-capable router were to set the CE | ing the indication of congestion. | |||
| (Congestion Experienced) bit within a packet in a simple IP tunnel, | ||||
| this indication would be discarded at tunnel egress, losing the indi- | ||||
| cation of congestion. | ||||
| Thus, the use of ECN over simple IP tunnels would result in routers | Thus, the use of ECN over simple IP tunnels would result in routers | |||
| attempting to use the outer IP header to signal congestion to end- | attempting to use the outer IP header to signal congestion to end- | |||
| points, but those congestion warnings never arriving because the | points, but those congestion warnings never arriving because the | |||
| outer header is discarded at the tunnel egress point. This problem | outer header is discarded at the tunnel egress point. This problem | |||
| was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec- | was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec- | |||
| ommended that ECN not be used with the older simple IPsec tunnels in | ommended that ECN not be used with the older simple IPsec tunnels in | |||
| order to avoid this behavior and its consequences. When ECN becomes | order to avoid this behavior and its consequences. When ECN becomes | |||
| widely deployed, then simple tunnels likely to carry ECN-capable | widely deployed, then simple tunnels likely to carry ECN-capable | |||
| traffic will have to be changed. | traffic will have to be changed. | |||
| From a security point of view, the use of ECN in the outer header of | From a security point of view, the use of ECN in the outer header of | |||
| an IP tunnel might raise security concerns because an adversary could | an IP tunnel might raise security concerns because an adversary could | |||
| tamper with the ECN information that propagates beyond the tunnel | tamper with the ECN information that propagates beyond the tunnel | |||
| endpoint. Based on an analysis in the Appendix of these concerns and | endpoint. Based on an analysis in Sections 18 and 19 of these con- | |||
| the resultant risks, our overall approach is to make support for ECN | cerns and the resultant risks, our overall approach is to make sup- | |||
| an option for IP tunnels, so that an IP tunnel can be specified or | port for ECN an option for IP tunnels, so that an IP tunnel can be | |||
| configured either to use ECN or not to use ECN in the outer header of | specified or configured either to use ECN or not to use ECN in the | |||
| the tunnel. Thus, in environments or tunneling protocols where the | outer header of the tunnel. Thus, in environments or tunneling pro- | |||
| risks of using ECN are judged to outweigh its benefits, the tunnel | tocols where the risks of using ECN are judged to outweigh its bene- | |||
| can simply not use ECN in the outer header. Then the only indication | fits, the tunnel can simply not use ECN in the outer header. Then | |||
| of congestion experienced at routers within the tunnel would be | the only indication of congestion experienced at routers within the | |||
| through packet loss. | tunnel would be through packet loss. | |||
| The result is that there are two viable options for the behavior of | The result is that there are two viable options for the behavior of | |||
| ECN-capable connections over an IP tunnel, especially IPSec tunnels: | ECN-capable connections over an IP tunnel, especially IPsec tunnels: | |||
| * A limited-functionality option in which ECN is preserved in the | * A limited-functionality option in which ECN is preserved in the | |||
| inner header, but disabled in the outer header. The only mecha- | inner header, but disabled in the outer header. The only mecha- | |||
| nism available for signaling congestion occurring within the tun- | nism available for signaling congestion occurring within the tun- | |||
| nel in this case is dropped packets. | nel in this case is dropped packets. | |||
| * A full-functionality option that supports ECN in both the inner | * A full-functionality option that supports ECN in both the inner | |||
| and outer headers, and propagates congestion warnings from nodes | and outer headers, and propagates congestion warnings from nodes | |||
| within the tunnel to endpoints. | within the tunnel to endpoints. | |||
| Support for these options requires varying amounts of changes to IP | Support for these options requires varying amounts of changes to IP | |||
| header processing at tunnel ingress and egress. A small subset of | header processing at tunnel ingress and egress. A small subset of | |||
| these changes sufficient to support only the limited-functionality | these changes sufficient to support only the limited-functionality | |||
| option would be sufficient to eliminate any incompatibility between | option would be sufficient to eliminate any incompatibility between | |||
| ECN and IP tunnels. | ECN and IP tunnels. | |||
| One goal of this document is to give guidance about the tradeoffs | One goal of this document is to give guidance about the tradeoffs | |||
| between the limited-functionality and full-functionality options. A | between the limited-functionality and full-functionality options. A | |||
| full discussion of the potential effects of an adversary's modifica- | full discussion of the potential effects of an adversary's modifica- | |||
| tions of the CE and ECT bits is given in the Appendix. | tions of the CE and ECT bits is given in Sections 18 and 19. | |||
| 9.1.1. The limited-functionality and full-functionality options within | 9.1.1. The Limited-functionality and Full-functionality Options | |||
| IP Tunnels | ||||
| The limited-functionality option for ECN encapsulation in IP tunnels | The limited-functionality option for ECN encapsulation in IP tunnels | |||
| is for the ECT bit in the outside (encapsulating) header to be off | is for the ECT bit in the outside (encapsulating) header to be off | |||
| (i.e., set to 0), regardless of the value of the ECT bit in the | (i.e., set to 0), regardless of the value of the ECT bit in the | |||
| inside (encapsulated) header. With this option, the ECN field in the | inside (encapsulated) header. With this option, the ECN field in the | |||
| inner header is not altered upon decapsulation. The disadvantage of | inner header is not altered upon decapsulation. The disadvantage of | |||
| this approach is that the flow does not have ECN support for that | this approach is that the flow does not have ECN support for that | |||
| part of the path that is using IP tunneling, even if the encapsulated | part of the path that is using IP tunneling, even if the encapsulated | |||
| packet (from the original TCP sender) is ECN-Capable. That is, if | packet (from the original TCP sender) is ECN-Capable. That is, if | |||
| the encapsulated packet arrives at a congested router that is ECN- | the encapsulated packet arrives at a congested router that is ECN- | |||
| capable, and the router can decide to drop or mark the packet as an | capable, and the router can decide to drop or mark the packet as an | |||
| indication of congestion to the end nodes, the router will not be | indication of congestion to the end nodes, the router will not be | |||
| permitted to set the CE bit in the packet header, but instead will | permitted to set the CE bit in the packet header, but instead will | |||
| have to drop the packet. | have to drop the packet. | |||
| The IP full-functionality option for ECN encapsulation is to copy the | The full-functionality option for ECN encapsulation is to copy the | |||
| ECT bit of the inside header to the outside header on encapsulation, | ECT bit of the inside header to the outside header on encapsulation, | |||
| and to OR the CE bit from the outer header with the CE bit of the | and to OR the CE bit from the outer header with the CE bit of the | |||
| inside header on decapsulation. That is, for full ECN support the | inside header on decapsulation. That is, for full ECN support the | |||
| encapsulation and decapsulation processing for the DS field octet | encapsulation and decapsulation processing involves the following: | |||
| involves the following: At tunnel ingress, the full-functionality | At tunnel ingress, the full-functionality option copies the value of | |||
| option copies the value of ECT (bit 6) in the inner header to the | ECT (bit 6) in the inner header to the outer header. CE (bit 7) is | |||
| outer header. CE (bit 7) is set to 0 in the outer header. Upon | set to 0 in the outer header. Upon decapsulation at the tunnel | |||
| decapsulation at the tunnel egress, the full-functionality option | egress, the full-functionality option sets CE to 1 in the inner | |||
| sets CE to 1 in the inner header if the value of ECT (bit 6) in the | header if the value of ECT (bit 6) in the inner header is 1, and the | |||
| inner header is 1, and the value of CE (bit 7) in the outer header is | value of CE (bit 7) in the outer header is 1. Otherwise, no change | |||
| 1. Otherwise, no change is made to this field of the inner header. | is made to this field of the inner header. | |||
| With the full-functionality option, a flow can take advantage of ECN | With the full-functionality option, a flow can take advantage of ECN | |||
| in those parts of the path that might use IP tunneling. The disad- | in those parts of the path that might use IP tunneling. The disad- | |||
| vantage of the full-functionality option from a security perspective | vantage of the full-functionality option from a security perspective | |||
| is that the IP tunnel cannot protect the flow from certain modifica- | is that the IP tunnel cannot protect the flow from certain modifica- | |||
| tions to the ECN bits in the IP header within the tunnel. The poten- | tions to the ECN bits in the IP header within the tunnel. The poten- | |||
| tial dangers from modifications to the ECN bits in the IP header are | tial dangers from modifications to the ECN bits in the IP header are | |||
| described in detail in the Appendix. | described in detail in Sections 18 and 19. | |||
| (1) An IP tunnel MUST modify the handling of the DS field octet at | (1) An IP tunnel MUST modify the handling of the DS field octet at | |||
| IP tunnel endpoints by implementing either the limited-functional- | IP tunnel endpoints by implementing either the limited-functional- | |||
| ity or the full-functionality option. | ity or the full-functionality option. | |||
| (2) Optionally, an IP tunnel MAY enable the endpoints of an IP | (2) Optionally, an IP tunnel MAY enable the endpoints of an IP | |||
| tunnel to negotiate the choice between the limited-functionality | tunnel to negotiate the choice between the limited-functionality | |||
| and the full-functionality option for ECN in the tunnel. | and the full-functionality option for ECN in the tunnel. | |||
| The minimum required to make ECN usable with IP tunnels is the lim- | The minimum required to make ECN usable with IP tunnels is the lim- | |||
| ited-functionality option, which prevents ECN from being enabled in | ited-functionality option, which prevents ECN from being enabled in | |||
| skipping to change at page 24, line 48 ¶ | skipping to change at page 25, line 17 ¶ | |||
| support the limited-functionality or the full-functionality ECN | support the limited-functionality or the full-functionality ECN | |||
| option. | option. | |||
| In addition, it is RECOMMENDED that packets with ECT and CE both set | In addition, it is RECOMMENDED that packets with ECT and CE both set | |||
| to 1 in the outer header be dropped if they arrive at the tunnel | to 1 in the outer header be dropped if they arrive at the tunnel | |||
| egress point for a tunnel that uses the limited-functionality option, | egress point for a tunnel that uses the limited-functionality option, | |||
| or for a tunnel that uses the full-functionality option but for which | or for a tunnel that uses the full-functionality option but for which | |||
| the ECT bit in the inner header is set to zero. This is motivated by | the ECT bit in the inner header is set to zero. This is motivated by | |||
| backwards compatibility and to ensure that no unauthorized modifica- | backwards compatibility and to ensure that no unauthorized modifica- | |||
| tions of the ECN field take place, and is discussed further in the | tions of the ECN field take place, and is discussed further in the | |||
| Appendix. | next Section (9.1.2). | |||
| 9.1.2. Changes to the ECN Field within an IP Tunnel. | 9.1.2. Changes to the ECN Field within an IP Tunnel. | |||
| The presence of a copy of the ECN field in the inner header of an IP | The presence of a copy of the ECN field in the inner header of an IP | |||
| tunnel mode packet provides an opportunity for detection of unautho- | tunnel mode packet provides an opportunity for detection of unautho- | |||
| rized modifications to the ECT bit in the outer header. Comparison | rized modifications to the ECT bit in the outer header. Comparison | |||
| of the ECT bits in the inner and outer headers falls into two cate- | of the ECT bits in the inner and outer headers falls into two cate- | |||
| gories for implementations that conform to this document: | gories for implementations that conform to this document: | |||
| * If the IP tunnel uses the full-functionality option, then the | * If the IP tunnel uses the full-functionality option, then the | |||
| values of the ECT bits in the inner and outer headers should be | values of the ECT bits in the inner and outer headers should be | |||
| skipping to change at page 26, line 27 ¶ | skipping to change at page 26, line 42 ¶ | |||
| that encapsulates the original, or "inner" IP header and its associ- | that encapsulates the original, or "inner" IP header and its associ- | |||
| ated packet. Tunnel mode security headers are inserted between these | ated packet. Tunnel mode security headers are inserted between these | |||
| two IP headers. In contrast to transport mode, the new "outer" IP | two IP headers. In contrast to transport mode, the new "outer" IP | |||
| header and tunnel mode security headers can be added and removed at | header and tunnel mode security headers can be added and removed at | |||
| intermediate points along a connection, enabling security gateways to | intermediate points along a connection, enabling security gateways to | |||
| secure vulnerable portions of a connection without requiring endpoint | secure vulnerable portions of a connection without requiring endpoint | |||
| participation in the security protocols. An important aspect of tun- | participation in the security protocols. An important aspect of tun- | |||
| nel mode security is that in the original specification, the outer | nel mode security is that in the original specification, the outer | |||
| header is discarded at tunnel egress, ensuring that security threats | header is discarded at tunnel egress, ensuring that security threats | |||
| based on modifying the IP header do not propagate beyond that tunnel | based on modifying the IP header do not propagate beyond that tunnel | |||
| endpoint. Further discussion of IPsec can be found in [RFC 2401]. | endpoint. Further discussion of IPsec can be found in [RFC2401]. | |||
| The IPsec protocol as originally defined in [ESP, AH] required that | The IPsec protocol as originally defined in [ESP, AH] required that | |||
| the inner header's ECN field not be changed by IPsec decapsulation | the inner header's ECN field not be changed by IPsec decapsulation | |||
| processing at a tunnel egress node; this would have ruled out the | processing at a tunnel egress node; this would have ruled out the | |||
| possibility of full-functionality mode for ECN. At the same time, | possibility of full-functionality mode for ECN. At the same time, | |||
| this would ensure that an adversary's modifications to the ECN field | this would ensure that an adversary's modifications to the ECN field | |||
| cannot be used to launch theft- or denial-of-service attacks across | cannot be used to launch theft- or denial-of-service attacks across | |||
| an IPsec tunnel endpoint, as any such modifications will be discarded | an IPsec tunnel endpoint, as any such modifications will be discarded | |||
| at the tunnel endpoint. | at the tunnel endpoint. | |||
| In principle, permitting the use of ECN functionality in the outer | In principle, permitting the use of ECN functionality in the outer | |||
| header of an IPsec tunnel raises security concerns because an adver- | header of an IPsec tunnel raises security concerns because an adver- | |||
| sary could tamper with the information that propagates beyond the | sary could tamper with the information that propagates beyond the | |||
| tunnel endpoint. Based on an analysis (included in the Appendix) of | tunnel endpoint. Based on an analysis (included in Sections 18 and | |||
| these concerns and the associated risks, our overall approach has | 19) of these concerns and the associated risks, our overall approach | |||
| been to provide configuration support for IPsec changes to remove the | has been to provide configuration support for IPsec changes to remove | |||
| conflict with ECN. | the conflict with ECN. | |||
| In particular, in tunnel mode the IPsec tunnel MUST support either | In particular, in tunnel mode the IPsec tunnel MUST support either | |||
| the limited-functionality or the full-functionality mode outlined in | the limited-functionality or the full-functionality mode outlined in | |||
| Section X. | Section 9.1.1. | |||
| This makes permission to use ECN functionality in the outer header of | This makes permission to use ECN functionality in the outer header of | |||
| an IPsec tunnel a configurable part of the corresponding IPsec | an IPsec tunnel a configurable part of the corresponding IPsec Secu- | |||
| Security Association (SA), so that it can be disabled in situations | rity Association (SA), so that it can be disabled in situations where | |||
| where the risks are judged to outweigh the benefits. The result is | the risks are judged to outweigh the benefits. The result is that an | |||
| that an IPsec security administrator is presented with two alterna- | IPsec security administrator is presented with two alternatives for | |||
| tives for the behavior of ECN-capable connections within an IPsec | the behavior of ECN-capable connections within an IPsec tunnel, the | |||
| tunnel, the limited-functionality alternative and full-functionality | limited-functionality alternative and full-functionality alternative | |||
| alternative described earlier. All IPsec implementations MUST imple- | described earlier. All IPsec implementations MUST implement either | |||
| ment either the limited-functionality or the full-functionality | the limited-functionality or the full-functionality alternative in | |||
| alternative in order to eliminate incompatibility between ECN and | order to eliminate incompatibility between ECN and IPsec tunnels, but | |||
| IPsec tunnels, but implementers MAY choose to implement either alter- | implementers MAY choose to implement either alternative. | |||
| native. | ||||
| In addition, this document specifies how the endpoints of an IPsec | In addition, this document specifies how the endpoints of an IPsec | |||
| tunnel could negotiate enabling ECN functionality in the outer head- | tunnel could negotiate enabling ECN functionality in the outer head- | |||
| ers of that tunnel based on security policy. The ability to negoti- | ers of that tunnel based on security policy. The ability to negoti- | |||
| ate ECN usage between tunnel endpoints would enable a security admin- | ate ECN usage between tunnel endpoints would enable a security admin- | |||
| istrator to disable ECN in situations where she believes the risks | istrator to disable ECN in situations where she believes the risks | |||
| (e.g., of lost congestion notifications) outweigh the benefits of | (e.g., of lost congestion notifications) outweigh the benefits of | |||
| ECN. | ECN. | |||
| The IPsec protocol, as defined in [ESP, AH], does not include the IP | The IPsec protocol, as defined in [ESP, AH], does not include the IP | |||
| skipping to change at page 28, line 36 ¶ | skipping to change at page 28, line 52 ¶ | |||
| ECN Tunnel: allowed or forbidden. | ECN Tunnel: allowed or forbidden. | |||
| Indicates whether ECN-capable connections using this SA in tunnel | Indicates whether ECN-capable connections using this SA in tunnel | |||
| mode are permitted to receive ECN congestion notifications for | mode are permitted to receive ECN congestion notifications for | |||
| congestion occurring within the tunnel. The allowed value enables | congestion occurring within the tunnel. The allowed value enables | |||
| ECN congestion notifications. The forbidden value disables such | ECN congestion notifications. The forbidden value disables such | |||
| notifications, causing all congestion to be indicated via dropped | notifications, causing all congestion to be indicated via dropped | |||
| packets. | packets. | |||
| [OPTIONAL. The value of this field SHOULD be assumed to be "for- | [OPTIONAL. The value of this field SHOULD be assumed to be | |||
| bidden" in implementations that do not support it.] | "forbidden" in implementations that do not support it.] | |||
| If this attribute is implemented, then the SA specification in a | If this attribute is implemented, then the SA specification in a | |||
| Security Policy Database (SPD) entry MUST support a corresponding | Security Policy Database (SPD) entry MUST support a corresponding | |||
| attribute, and this SPD attribute MUST be covered by the SPD adminis- | attribute, and this SPD attribute MUST be covered by the SPD adminis- | |||
| trative interface (currently described in Section 4.4.1 of | trative interface (currently described in Section 4.4.1 of | |||
| [RFC2401]). | [RFC2401]). | |||
| 9.2.1.2. ECN Tunnel Security Association Attribute | 9.2.1.2. ECN Tunnel Security Association Attribute | |||
| A new IPsec Security Association Attribute is defined to enable the | A new IPsec Security Association Attribute is defined to enable the | |||
| support for ECN congestion notifications based on the outer IP header | support for ECN congestion notifications based on the outer IP header | |||
| to be negotiated for IPsec tunnels (see [RFC2407]). This attribute | to be negotiated for IPsec tunnels (see [RFC2407]). This attribute | |||
| is OPTIONAL, although implementations that support it SHOULD also | is OPTIONAL, although implementations that support it SHOULD also | |||
| support the SAD field defined in Section 3.1. | support the SAD field defined in Section 9.2.1.1. | |||
| Attribute Type | Attribute Type | |||
| class value type | class value type | |||
| ------------------------------------------------- | ------------------------------------------------- | |||
| ECN Tunnel 10 Basic | ECN Tunnel 10 Basic | |||
| The IPsec SA Attribute value 10 has been allocated by IANA to indi- | The IPsec SA Attribute value 10 has been allocated by IANA to indi- | |||
| cate that the ECN Tunnel SA Attribute is being negotiated; the type | cate that the ECN Tunnel SA Attribute is being negotiated; the type | |||
| of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class | of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class | |||
| skipping to change at page 29, line 25 ¶ | skipping to change at page 29, line 40 ¶ | |||
| RFC2409] for further information including encoding formats and | RFC2409] for further information including encoding formats and | |||
| requirements for negotiating this SA attribute. | requirements for negotiating this SA attribute. | |||
| Class Values | Class Values | |||
| ECN Tunnel | ECN Tunnel | |||
| Specifies whether ECN functionality is allowed to | Specifies whether ECN functionality is allowed to | |||
| be used with Tunnel Encapsulation Mode. | be used with Tunnel Encapsulation Mode. | |||
| This affects tunnel encapsulation and decapsulation processing - | This affects tunnel encapsulation and decapsulation processing - | |||
| see Section 3.3. | see Section 9.2.1.3. | |||
| RESERVED 0 | RESERVED 0 | |||
| Allowed 1 | Allowed 1 | |||
| Forbidden 2 | Forbidden 2 | |||
| Values 3-61439 are reserved to IANA. Values 61440-65535 are for | Values 3-61439 are reserved to IANA. Values 61440-65535 are for | |||
| private use. | private use. | |||
| If unspecified, the default shall be assumed to be Forbidden. | If unspecified, the default shall be assumed to be Forbidden. | |||
| skipping to change at page 29, line 49 ¶ | skipping to change at page 30, line 16 ¶ | |||
| ity with such implementations initiators SHOULD always also include a | ity with such implementations initiators SHOULD always also include a | |||
| proposal without the ECN Tunnel attribute to enable such a responder | proposal without the ECN Tunnel attribute to enable such a responder | |||
| to select a transform or proposal that does not contain the ECN Tun- | to select a transform or proposal that does not contain the ECN Tun- | |||
| nel attribute. RFC 2407 currently requires responders to reject all | nel attribute. RFC 2407 currently requires responders to reject all | |||
| proposals if any proposal contains an unknown attribute; this | proposals if any proposal contains an unknown attribute; this | |||
| requirement is expected to be changed to require a responder not to | requirement is expected to be changed to require a responder not to | |||
| select proposals or transforms containing unknown attributes. | select proposals or transforms containing unknown attributes. | |||
| 9.2.1.3. Changes to IPsec Tunnel Header Processing | 9.2.1.3. Changes to IPsec Tunnel Header Processing | |||
| Subsequent to the publication of [RFC 2401], the TOS octet of IPv4 | ||||
| and the Traffic Class octet of IPv6 have been superseded by the six- | ||||
| bit DS Field [RFC2474, RFC2780] and a two-bit "currently unused" (CU) | ||||
| field [RFC2780], and this document supersedes the CU field by the ECN | ||||
| Field. | ||||
| For full ECN support, the encapsulation and decapsulation processing | For full ECN support, the encapsulation and decapsulation processing | |||
| for the IPv4 TOS field and the IPv6 Traffic Class field are changed | for the IPv4 TOS field and the IPv6 Traffic Class field are changed | |||
| from that specified in [RFC2401] to the following: | from that specified in [RFC2401] to the following: | |||
| <-- How Outer Hdr Relates to Inner Hdr --> | <-- How Outer Hdr Relates to Inner Hdr --> | |||
| Outer Hdr at Inner Hdr at | Outer Hdr at Inner Hdr at | |||
| IPv4 Encapsulator Decapsulator | IPv4 Encapsulator Decapsulator | |||
| Header fields: -------------------- ------------ | Header fields: -------------------- ------------ | |||
| DS Field copied from inner hdr (5) no change | DS Field copied from inner hdr (5) no change | |||
| ECN Field constructed (7) constructed (8) | ECN Field constructed (7) constructed (8) | |||
| skipping to change at page 30, line 43 ¶ | skipping to change at page 31, line 5 ¶ | |||
| SA is "allowed" and the value of ECT (bit 0) in the inner header | SA is "allowed" and the value of ECT (bit 0) in the inner header | |||
| is 1, then set the CE bit (bit 1) in the inner header to the logi- | is 1, then set the CE bit (bit 1) in the inner header to the logi- | |||
| cal OR of the CE bit in the inner header with the CE bit in the | cal OR of the CE bit in the inner header with the CE bit in the | |||
| outer header, else make no change to the ECN field. | outer header, else make no change to the ECN field. | |||
| (5) and (6) are identical to match usage in [RFC2401], although | (5) and (6) are identical to match usage in [RFC2401], although | |||
| they are different in [RFC2401]. | they are different in [RFC2401]. | |||
| The above description applies to implementations that support the ECN | The above description applies to implementations that support the ECN | |||
| Tunnel field in the SAD; such implementations MUST implement this | Tunnel field in the SAD; such implementations MUST implement this | |||
| processing of the DS field instead of the processing of the IPv4 TOS | processing instead of the processing of the IPv4 TOS octet and IPv6 | |||
| octet and IPv6 Traffic Class octet defined in [RFC2401]. This con- | Traffic Class octet defined in [RFC2401]. This constitutes the full- | |||
| stitutes the full-functionality alternative for ECN usage with IPsec | functionality alternative for ECN usage with IPsec tunnels. | |||
| tunnels. | ||||
| An implementation that does not support the ECN Tunnel field in the | An implementation that does not support the ECN Tunnel field in the | |||
| SAD MUST implement processing of the DS Field by assuming that the | SAD MUST implement this processing by assuming that the value of the | |||
| value of the ECN Tunnel field of the SAD is "forbidden" for every SA. | ECN Tunnel field of the SAD is "forbidden" for every SA. In this | |||
| In this case, the processing of the ECN field reduces to: | case, the processing of the ECN field reduces to: | |||
| (7) Set the ECN field (ECT and CE bits) to zero in the outer | (7) Set the ECN field (ECT and CE bits) to zero in the outer | |||
| header. | header. | |||
| (8) Make no change to the ECN field in the inner header. | (8) Make no change to the ECN field in the inner header. | |||
| This constitutes the limited-functionality alternative for ECN usage | This constitutes the limited-functionality alternative for ECN usage | |||
| with IPsec tunnels. | with IPsec tunnels. | |||
| For backwards compatibility, packets with ECT and CE both set to 1 in | For backwards compatibility, packets with ECT and CE both set to 1 in | |||
| the outer header SHOULD be dropped if they arrive on an SA that is | the outer header SHOULD be dropped if they arrive on an SA that is | |||
| skipping to change at page 31, line 40 ¶ | skipping to change at page 31, line 49 ¶ | |||
| 9.2.3. Comments for IPsec Support | 9.2.3. Comments for IPsec Support | |||
| Substantial comments were received on two areas of this document dur- | Substantial comments were received on two areas of this document dur- | |||
| ing review by the IPsec working group. This section describes these | ing review by the IPsec working group. This section describes these | |||
| comments and explains why the proposed changes were not incorporated. | comments and explains why the proposed changes were not incorporated. | |||
| The first comment indicated that per-node configuration is easier to | The first comment indicated that per-node configuration is easier to | |||
| implement than per-SA configuration. After serious thought and | implement than per-SA configuration. After serious thought and | |||
| despite some initial encouragement of per-node configuration, it no | despite some initial encouragement of per-node configuration, it no | |||
| longer seems to be a good idea. The concern is that as IPsec is pro- | longer seems to be a good idea. The concern is that as ECN-awareness | |||
| gressively deployed, many ECN-aware IPsec implementations will find | is progressively deployed in IPsec, many ECN-aware IPsec implementa- | |||
| themselves communicating with a mixture of ECN-aware and ECN-unaware | tions will find themselves communicating with a mixture of ECN-aware | |||
| IPsec tunnel endpoints. In such an environment with per-node config- | and ECN-unaware IPsec tunnel endpoints. In such an environment with | |||
| uration, the only reasonable thing to do is forbid ECN usage for all | per-node configuration, the only reasonable thing to do is forbid ECN | |||
| IPsec tunnels, which is not the desired outcome. | usage for all IPsec tunnels, which is not the desired outcome. | |||
| In the second area, several reviewers noted that SA negotiation is | In the second area, several reviewers noted that SA negotiation is | |||
| complex, and adding to it is non-trivial. One reviewer suggested | complex, and adding to it is non-trivial. One reviewer suggested | |||
| using ICMP after tunnel setup as a possible alternative. The addi- | using ICMP after tunnel setup as a possible alternative. The addi- | |||
| tion to SA negotiation in the document is OPTIONAL and will remain | tion to SA negotiation in this document is OPTIONAL and will remain | |||
| so; implementers are free to ignore it. The authors believe that the | so; implementers are free to ignore it. The authors believe that the | |||
| assurance it provides can be useful in a number of situations. In | assurance it provides can be useful in a number of situations. In | |||
| practice, if this is not implemented, it can be deleted at a subse- | practice, if this is not implemented, it can be deleted at a subse- | |||
| quent stage in the standards process. Extending ICMP to negotiate | quent stage in the standards process. Extending ICMP to negotiate | |||
| ECN after tunnel setup is more complex than extending SA attribute | ECN after tunnel setup is more complex than extending SA attribute | |||
| negotiation. Some tunnels do not permit traffic to be addressed to | negotiation. Some tunnels do not permit traffic to be addressed to | |||
| the tunnel egress endpoint, hence the ICMP packet would have to be | the tunnel egress endpoint, hence the ICMP packet would have to be | |||
| addressed to somewhere else, scanned for by the egress endpoint, and | addressed to somewhere else, scanned for by the egress endpoint, and | |||
| discarded there or at its actual destination. In addition, ICMP | discarded there or at its actual destination. In addition, ICMP | |||
| delivery is unreliable, and hence there is a possibility of an ICMP | delivery is unreliable, and hence there is a possibility of an ICMP | |||
| skipping to change at page 32, line 23 ¶ | skipping to change at page 32, line 33 ¶ | |||
| ack/retransmit mechanism. It seems better simply to specify an | ack/retransmit mechanism. It seems better simply to specify an | |||
| OPTIONAL extension to the existing SA negotiation mechanism. | OPTIONAL extension to the existing SA negotiation mechanism. | |||
| 9.3. IP packets encapsulated in non-IP packet headers. | 9.3. IP packets encapsulated in non-IP packet headers. | |||
| A different set of issues is raised, relative to ECN, when IP packets | A different set of issues is raised, relative to ECN, when IP packets | |||
| are encapsulated in tunnels with non-IP packet headers. This | are encapsulated in tunnels with non-IP packet headers. This | |||
| occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. | occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. | |||
| For these protocols, there is no conflict with ECN; it is just that | For these protocols, there is no conflict with ECN; it is just that | |||
| ECN cannot be used within the tunnel unless an ECN codepoint can be | ECN cannot be used within the tunnel unless an ECN codepoint can be | |||
| specified for the header of the encapsulating protocol. [RFD99] con- | specified for the header of the encapsulating protocol. Earlier work | |||
| sidered a preliminary proposal for incorporating ECN into MPLS, and | considered a preliminary proposal for incorporating ECN into MPLS, | |||
| proposals for incorporating ECN into GRE, L2TP, or PPTP will be con- | and proposals for incorporating ECN into GRE, L2TP, or PPTP will be | |||
| sidered as the need arises. | considered as the need arises. | |||
| 10. Issues Raised by Monitoring and Policing Devices | 10. Issues Raised by Monitoring and Policing Devices | |||
| One possibility is that monitoring and policing devices (or more | One possibility is that monitoring and policing devices (or more | |||
| informally, "penalty boxes") will be installed in the network to mon- | informally, "penalty boxes") will be installed in the network to mon- | |||
| itor whether best-effort flows are appropriately responding to con- | itor whether best-effort flows are appropriately responding to con- | |||
| gestion, and to preferentially drop packets from flows determined not | gestion, and to preferentially drop packets from flows determined not | |||
| to be using adequate end-to-end congestion control procedures. This | to be using adequate end-to-end congestion control procedures. | |||
| is discussed in more detail in the Appendix. | ||||
| We recommend that any "penalty box" that detects a flow or an aggre- | We recommend that any "penalty box" that detects a flow or an aggre- | |||
| gate of flows that is not responding to end-to-end congestion control | gate of flows that is not responding to end-to-end congestion control | |||
| first change from marking to dropping packets from that flow, before | first change from marking to dropping packets from that flow, before | |||
| taking any additional action to restrict the bandwidth available to | taking any additional action to restrict the bandwidth available to | |||
| that flow. Thus, initially, the router may drop packets in which the | that flow. Thus, initially, the router may drop packets in which the | |||
| router would otherwise have set the CE bit. This could include | router would otherwise have set the CE bit. This could include | |||
| dropping those arriving packets for that flow that are ECN-Capable | dropping those arriving packets for that flow that are ECN-Capable | |||
| and that already have the CE bit set. In this way, any congestion | and that already have the CE bit set. In this way, any congestion | |||
| indications seen by that router for that flow will be guaranteed to | indications seen by that router for that flow will be guaranteed to | |||
| also be seen by the end nodes, even in the presence of malicious or | also be seen by the end nodes, even in the presence of malicious or | |||
| broken routers elsewhere in the path. If we assume that the first | broken routers elsewhere in the path. If we assume that the first | |||
| action taken at any "penalty box" for an ECN-capable flow will be to | action taken at any "penalty box" for an ECN-capable flow will be to | |||
| drop packets instead of marking them, then there is no way that an | drop packets instead of marking them, then there is no way that an | |||
| adversary that subverts ECN-based end-to-end congestion control can | adversary that subverts ECN-based end-to-end congestion control can | |||
| cause a flow to be characterized as being non-cooperative and placed | cause a flow to be characterized as being non-cooperative and placed | |||
| into a more severe action within the "penalty box". | into a more severe action within the "penalty box". | |||
| The monitoring and policing devices that are actually deployed could | The monitoring and policing devices that are actually deployed could | |||
| fall short of the `ideal' monitoring device described above, in that | fall short of the `ideal' monitoring device described above, in that | |||
| the monitoring is applied not to a single flow or to a single IPsec | the monitoring is applied not to a single flow, but to an aggregate | |||
| tunnel, but to an aggregate of flows. In this case, the switch from | of flows (e.g., those sharing a single IPsec tunnel). In this case, | |||
| marking to dropping would apply to all of the flows in that aggre- | the switch from marking to dropping would apply to all of the flows | |||
| gate, denying the benefits of ECN to the other flows in the aggregate | in that aggregate, denying the benefits of ECN to the other flows in | |||
| also. At the highest level of aggregation, another form of the dis- | the aggregate also. At the highest level of aggregation, another | |||
| abling of ECN happens even in the absence of monitoring and policing | form of the disabling of ECN happens even in the absence of monitor- | |||
| devices, when ECN-Capable RED queues switch from marking to dropping | ing and policing devices, when ECN-Capable RED queues switch from | |||
| packets as an indication of congestion when the average queue size | marking to dropping packets as an indication of congestion when the | |||
| has exceeded some threshold. | average queue size has exceeded some threshold. | |||
| If there were serious operational problems with routers inappropri- | If there were serious operational problems with routers inappropri- | |||
| ately erasing the CE bit in packet headers, one potential fix would | ately erasing the CE bit in packet headers, this could be addressed | |||
| be to include a one-bit ECN nonce in packet headers, and for routers | to some extent by including a one-bit ECN nonce in packet headers. | |||
| to erase the nonce when they set the CE bit [SCWA99]. Routers that | Routers would erase the nonce when they set the CE bit [SCWA99]. | |||
| erased the CE bit would be unable to consistently reconstruct the | Routers that erased the CE bit would face additional difficulty in | |||
| original nonce, and thus repeated erasure of the CE bit would be | reconstructing the original nonce, and thus repeated erasure of the | |||
| detected by the end-nodes. (This could in fact be done without | CE bit would be more likely to be detected by the end-nodes. (This | |||
| adding any extra bits for ECN in the IP header, by using the ECN | could in fact be done without adding any extra bits for ECN in the IP | |||
| codepoints (ECT=1, CE=0) and (ECT=0, CE=1) as the two values for the | header, by using the ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1) | |||
| nonce, and by defining the codepoint (ECT=0, CE=1) to mean exactly | as the two values for the nonce, and by defining the codepoint | |||
| the same as the codepoint (ECT=1, CE=0).) However, at this point the | (ECT=0, CE=1) to mean exactly the same as the codepoint (ECT=1, | |||
| potential danger of misbehaving routers does not seem of sufficient | CE=0).) However, at this point the potential danger of misbehaving | |||
| concern to warrant this additional complication of adding an ECN | routers does not seem of sufficient concern to warrant this addi- | |||
| nonce to protect against the erasure of the CE bit. | tional complication of adding an ECN nonce to protect against the | |||
| erasure of the CE bit. Additional research is also needed to better | ||||
| understand the value of such a nonce and appropriate means of gener- | ||||
| ating sequences of nonce values that an adversary will find suffi- | ||||
| ciently difficult to reconstruct. | ||||
| An ECN nonce would also address the problem of misbehaving transport | An ECN nonce would also address the problem of misbehaving transport | |||
| receivers lying to the transport sender about whether or not the CE | receivers lying to the transport sender about whether or not the CE | |||
| bit was set in a packet. However, another possibility is for the | bit was set in a packet. However, another possibility is for the | |||
| data sender to test for a misbehaving receiver directly, by occasion- | data sender to test for a misbehaving receiver directly, by occasion- | |||
| ally sending a data packet with ECT and CE set, to see if the | ally sending a data packet with ECT and CE set, to see if the | |||
| receiver reports receiving the CE bit. Of course, if these packets | receiver reports receiving the CE bit. Of course, if these packets | |||
| encountered congestion in the network, the TCP sender would not | encountered congestion in the network, the router would make no | |||
| receive this indication of congestion, so setting the ECT and CE bits | change in the packets, because the CE bit would already be set. | |||
| at the sender would have to be done very sparingly. In addition, the | Thus, for packets sent with the ECT and CE bits set, the TCP end- | |||
| TCP sender would have to remember which packets were sent with the | nodes could not determine if some router intended to set the CE bit | |||
| ECT and CE bits set, so that it doesn't react to them as if there was | in these packets. For this reason, sending packets with the ECT and | |||
| CE bits set would have to be done very sparingly. In addition, the TCP | ||||
| sender would have to remember which packets were sent with the ECT | ||||
| and CE bits set, so that it doesn't react to them as if there was | ||||
| congestion in the network. We believe that further research is | congestion in the network. We believe that further research is | |||
| needed on possible transport-based mechanisms for verifying that the | needed on possible transport-based mechanisms for verifying that the | |||
| transport receiver does not lie to the transport sender about the | transport receiver does not lie to the transport sender about the | |||
| receipt of congestion indications. | receipt of congestion indications. | |||
| 11. Evaluations of ECN | 11. Evaluations of ECN | |||
| This section discusses some of the related work evaluating the use of | This section discusses some of the related work evaluating the use of | |||
| ECN. The ECN Web Page [ECN] has pointers to other papers, as well as | ECN. The ECN Web Page [ECN] has pointers to other papers, as well as | |||
| to implementations of ECN. | to implementations of ECN. | |||
| skipping to change at page 34, line 52 ¶ | skipping to change at page 35, line 16 ¶ | |||
| ECT bit. The ECT bit set to "1" indicates that the transport proto- | ECT bit. The ECT bit set to "1" indicates that the transport proto- | |||
| col is willing and able to participate in ECN. | col is willing and able to participate in ECN. | |||
| The default value for the CE bit is "0". The router sets the CE bit | The default value for the CE bit is "0". The router sets the CE bit | |||
| to "1" to indicate congestion to the end nodes. The CE bit in a | to "1" to indicate congestion to the end nodes. The CE bit in a | |||
| packet header MUST NOT be reset by a router from "1" to "0". | packet header MUST NOT be reset by a router from "1" to "0". | |||
| When viewed in terms of code points, this document has defined three | When viewed in terms of code points, this document has defined three | |||
| code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but | code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but | |||
| not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code | not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code | |||
| point of (ECT=0, CE=1) is not defined in this document. One | point of (ECT=0, CE=1) is not defined in this document. One possi- | |||
| possibility would be for this code point to be used, some time in the | bility would be for this code point to be used, some time in the | |||
| future, for some other function for non-ECN-capable packets. A sec- | future, for some other function for non-ECN-capable packets. A sec- | |||
| ond possibility would be for this code point to be used as an ECN | ond possibility would be for this code point to be used as an ECN | |||
| nonce, as described earlier in the paper. A third possibility would | nonce, as described earlier in the document. A third possibility | |||
| be for the code point (ECT=0, CE=1) to be used to indicate that the | would be for the code point (ECT=0, CE=1) to be used to indicate that | |||
| packet is ECN-capable for an alternate semantics for the Congestion | the packet is ECN-capable for an alternate semantics for the Conges- | |||
| Experienced indication. However, at this time the code point (ECT=0, | tion Experienced indication. However, at this time the code point | |||
| CE=1) remains undefined. | (ECT=0, CE=1) remains undefined. | |||
| TCP requires three changes for ECN, a setup phase and two new flags | TCP requires three changes for ECN, a setup phase and two new flags | |||
| in the TCP header. The ECN-Echo flag is used by the data receiver to | in the TCP header. The ECN-Echo flag is used by the data receiver to | |||
| inform the data sender of a received CE packet. The Congestion Win- | inform the data sender of a received CE packet. The Congestion Win- | |||
| dow Reduced (CWR) flag is used by the data sender to inform the data | dow Reduced (CWR) flag is used by the data sender to inform the data | |||
| receiver that the congestion window has been reduced. | receiver that the congestion window has been reduced. | |||
| When ECN (Explicit Congestion Notification [RFC2481]) is used, it is | When ECN (Explicit Congestion Notification [RFC2481]) is used, it is | |||
| required that congestion indications generated within an IP tunnel | required that congestion indications generated within an IP tunnel | |||
| not be lost at the tunnel egress. We specified a minor modification | not be lost at the tunnel egress. We specified a minor modification | |||
| skipping to change at page 35, line 38 ¶ | skipping to change at page 35, line 51 ¶ | |||
| tunnel, by turning the ECT bit in the outer header off, and not | tunnel, by turning the ECT bit in the outer header off, and not | |||
| altering the inner header at the time of decapsulation. | altering the inner header at the time of decapsulation. | |||
| 2) The full-functionality option, which copies the ECT bit of the | 2) The full-functionality option, which copies the ECT bit of the | |||
| inner header to the encapsulating header. At decapsulation, if the | inner header to the encapsulating header. At decapsulation, if the | |||
| ECT bit is set in the inner header, the CE bit on the outer header is | ECT bit is set in the inner header, the CE bit on the outer header is | |||
| ORed with the CE bit of the inner header to update the CE bit of the | ORed with the CE bit of the inner header to update the CE bit of the | |||
| packet. | packet. | |||
| All IP tunnels MUST implement one of the two alternative approaches | All IP tunnels MUST implement one of the two alternative approaches | |||
| described above. For IPsec tunnels, this document also defines an | described above. For IPsec tunnels, this document also defines an | |||
| optional IPsec SA attribute that enables negotiation of ECN usage | optional IPsec Security Association (SA) attribute that enables | |||
| within IPsec tunnels and an optional field in the Security Associa- | negotiation of ECN usage within IPsec tunnels and an optional field | |||
| tion Database to indicate whether ECN is permitted in tunnel mode on | in the Security Association Database to indicate whether ECN is per- | |||
| a SA. | mitted in tunnel mode on a SA. The required changes to IPsec tunnels | |||
| for ECN usage modify RFC 2401 [RFC2401], which defines the IPsec | ||||
| architecture and specifies some aspects of its implementation. The | ||||
| new IPsec SA attribute is in addition to those already defined in | ||||
| Section 4.5 of [RFC2407]. | ||||
| This document is intended to obsolete RFC 2481, "A Proposal to add | This document is intended to obsolete RFC 2481, "A Proposal to add | |||
| Explicit Congestion Notification (ECN) to IP", which defined ECN as | Explicit Congestion Notification (ECN) to IP", which defined ECN as | |||
| an Experimental Protocol for the Internet Community, as well as to | an Experimental Protocol for the Internet Community. The rest of | |||
| obsolete three subsequent internet-drafts on ECN, "IPsec Interactions | this section describes the relationship between this document and its | |||
| with ECN", "ECN Interactions with IP Tunnels", and "TCP with ECN: The | predecessor. | |||
| Treatment of Retransmitted Data Packets". This document is intended | ||||
| largely to merge the earlier documents all into a single document, | ||||
| for greater clarity, in preparation to becoming a Proposed Standard. | ||||
| The rest of this section describes the relationship between this | ||||
| document and its predecessors. | ||||
| RFC 2481 included a brief discussion of the use of ECN with encapsu- | RFC 2481 included a brief discussion of the use of ECN with encapsu- | |||
| lated packets, and noted that for the IPsec specifications at the | lated packets, and noted that for the IPsec specifications at the | |||
| time (January 1999), flows could not safely use ECN if they were to | time (January 1999), flows could not safely use ECN if they were to | |||
| traverse IPsec tunnels. RFC 2481 also described the changes that | traverse IPsec tunnels. RFC 2481 also described the changes that | |||
| could be made to IPsec tunnel specifications to make them compatible | could be made to IPsec tunnel specifications to make them compatible | |||
| with ECN. "IPsec Interactions with ECN" outlined these changes to | with ECN. | |||
| IPsec tunnels in detail, and included an extensive discussion of the | ||||
| security implications of ECN (now included as Sections 18 and 19 of | This document also incorporates work that was done after RFC 2481. | |||
| this document). The draft of "ECN Interactions with IP Tunnels" | First was to describe the changes to IPsec tunnels in detail, and | |||
| extended the discussion of IPsec tunnels to include all IP tunnels. | extensively discuss the security implications of ECN (now included as | |||
| Because older IP tunnels are not compatible with a flow's use of ECN, | Sections 18 and 19 of this document). Second was to extend the dis- | |||
| the deployment of ECN in the Internet will create strong pressure for | cussion of IPsec tunnels to include all IP tunnels. Because older IP | |||
| older IP tunnels to be updated to an ECN-compatible version, using | tunnels are not compatible with a flow's use of ECN, the deployment | |||
| either the limited-functionality or the full-functionality option. | of ECN in the Internet will create strong pressure for older IP tun- | |||
| nels to be updated to an ECN-compatible version, using either the | ||||
| limited-functionality or the full-functionality option. | ||||
| This document does not address the issue of including ECN in non-IP | This document does not address the issue of including ECN in non-IP | |||
| tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary | tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary | |||
| document about adding ECN support to MPLS has since expired. | document about adding ECN support to MPLS was not advanced. | |||
| This document expands on one area not addressed in RFC 2481, the use | A third new piece of work after RFC2481 was to describe the ECN pro- | |||
| of ECN with retransmitted data packets. That is, this document | cedure with retransmitted data packets, that the ECT bit should not | |||
| includes the material from "TCP with ECN: The Treatment of Retrans- | be set on retransmitted data packets. The motivation for this addi- | |||
| mitted Data Packets" specifying that the ECT bit should not be set on | tional specification is to eliminate a possible avenue for denial-of- | |||
| retransmitted data packets. The motivation for this additional spec- | service attacks on an existing TCP connection. Some prior deploy- | |||
| ification is to eliminate a possible avenue for denial-of-service | ments of ECN-capable TCP might not conform to the (new) requirement | |||
| attacks on an existing TCP connection. Some prior deployments of | not to set the ECT bit on retransmitted packets; we do not believe | |||
| ECN-capable TCP might not conform to the (new) requirement not to set | this will cause significant problems in practice. | |||
| the ECT bit on retransmitted packets; we do not believe this will | ||||
| cause significant problems in practice. | ||||
| This document also expands on the specification of the use of SYN | This document also expands slightly on the specification of the use | |||
| packets for the negotiation of ECN, and specifies some optional | of SYN packets for the negotiation of ECN. While some prior deploy- | |||
| behavior for this. In particular, the document allows a TCP host to | ments of ECN-capable TCP might not conform to the requirements speci- | |||
| send a non-ECN-setup SYN packet after sending a failed ECN-setup SYN | fied in this document, we do not believe that this will lead to any | |||
| packet, and precisely specifies the required behavior when both ECN- | performance or compatibility problems for TCP connections with a com- | |||
| setup SYN packets and non-ECN-setup SYN packets are sent in the same | bination of TCP implementations at the endpoints. | |||
| connection. While some prior deployments of ECN-capable TCP might | ||||
| not conform to the requirements specified in this document, we do not | ||||
| believe that this will lead to any performance or compatibility prob- | ||||
| lems for TCP connections with a combination of TCP implementations at | ||||
| the endpoints. | ||||
| 13. Conclusions | 13. Conclusions | |||
| Given the current effort to implement AQM, we believe this is the | Given the current effort to implement AQM, we believe this is the | |||
| right time to deploy congestion avoidance mechanisms that do not | right time to deploy congestion avoidance mechanisms that do not | |||
| depend on packet drops alone. With the increased deployment of | depend on packet drops alone. With the increased deployment of | |||
| applications and transports sensitive to the delay and loss of a sin- | applications and transports sensitive to the delay and loss of a sin- | |||
| gle packet (e.g., realtime traffic, short web transfers), depending | gle packet (e.g., realtime traffic, short web transfers), depending | |||
| on packet loss as a normal congestion notification mechanism appears | on packet loss as a normal congestion notification mechanism appears | |||
| to be insufficient (or at the very least, non-optimal). | to be insufficient (or at the very least, non-optimal). | |||
| skipping to change at page 37, line 36 ¶ | skipping to change at page 37, line 43 ¶ | |||
| on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus- | on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus- | |||
| sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, | sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, | |||
| Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis- | Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis- | |||
| cussions of security issues. We also thank the Internet End-to-End | cussions of security issues. We also thank the Internet End-to-End | |||
| Research Group for ongoing discussions of these issues. | Research Group for ongoing discussions of these issues. | |||
| Email discussions with a number of people, including Alexey | Email discussions with a number of people, including Alexey | |||
| Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed | Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed | |||
| the issues raised by non-conformant equipment in the Internet that | the issues raised by non-conformant equipment in the Internet that | |||
| does not respond to TCP SYN packets with the ECE and CWR flags set. | does not respond to TCP SYN packets with the ECE and CWR flags set. | |||
| We thank Mark Handley, Jitentra Padhye, and others for contributions | We thank Mark Handley, Jitentra Padhye, and others for discussions on | |||
| to the TCP initialization procedures. | the TCP initialization procedures. | |||
| The discussion of ECN and IP tunnel considerations draws heavily on | The discussion of ECN and IP tunnel considerations draws heavily on | |||
| related discussions and documents from the Differentiated Services | related discussions and documents from the Differentiated Services | |||
| Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | |||
| for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen | for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen | |||
| for proposing modifications to RFC 2407 that improve the usability of | for proposing modifications to RFC 2407 that improve the usability of | |||
| negotiating the ECN Tunnel SA attribute. | negotiating the ECN Tunnel SA attribute. | |||
| 15. References | 15. References | |||
| [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | |||
| November 1998. | November 1998. | |||
| [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement | [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement | |||
| Levels", BCP 14, RFC 2119, March 1997. | Levels", BCP 14, RFC 2119, March 1997. | |||
| [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | |||
| Reference for informational purposes only. | ||||
| [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", | [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload", | |||
| RFC 2406, November 1998. | RFC 2406, November 1998. | |||
| [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways | [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways | |||
| for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | |||
| N.4, August 1993, p. 397-413. | N.4, August 1993, p. 397-413. | |||
| [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | |||
| Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | |||
| [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | |||
| URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- | URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- | |||
| ecn. | ecn. Reference for informational purposes only. | |||
| [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con- | [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con- | |||
| gestion Control in the Internet", IEEE/ACM Transactions on Network- | gestion Control in the Internet", IEEE/ACM Transactions on Network- | |||
| ing, August 1999. | ing, August 1999. | |||
| [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | |||
| SIGCOMM '97, September 1997. | SIGCOMM '97, September 1997. | |||
| [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | |||
| Encapsulation (GRE), RFC 1701, October 1994. | Encapsulation (GRE), RFC 1701, October 1994. | |||
| skipping to change at page 39, line 39 ¶ | skipping to change at page 39, line 46 ¶ | |||
| [RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October | [RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October | |||
| 1996. | 1996. | |||
| [RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Require- | [RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Require- | |||
| ment Levels, RFC 2119, March 1997. | ment Levels, RFC 2119, March 1997. | |||
| [RFC2309] Braden, B., et al., "Recommendations on Queue Management | [RFC2309] Braden, B., et al., "Recommendations on Queue Management | |||
| and Congestion Avoidance in the Internet", RFC 2309, April 1998. | and Congestion Avoidance in the Internet", RFC 2309, April 1998. | |||
| [RFC 2401] S. Kent and R. Atkinson, Security Architecture for the | [RFC2401] S. Kent and R. Atkinson, Security Architecture for the | |||
| Internet Protocol, RFC 2401, November 1998. | Internet Protocol, RFC 2401, November 1998. | |||
| [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation | [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation | |||
| for ISAKMP, RFC 2407, November 1998. | for ISAKMP, RFC 2407, November 1998. | |||
| [RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, | [RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner, | |||
| Internet Security Association and Key Management Protocol (ISAKMP), | Internet Security Association and Key Management Protocol (ISAKMP), | |||
| RFC 2408, November 1998. | RFC 2408, November 1998. | |||
| [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), | [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), | |||
| skipping to change at page 40, line 24 ¶ | skipping to change at page 40, line 31 ¶ | |||
| [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", | [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", | |||
| RFC 2581, April 1999. | RFC 2581, April 1999. | |||
| [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation | [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation | |||
| of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, | of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, | |||
| July 2000. | July 2000. | |||
| [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983, | [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983, | |||
| October 2000. | October 2000. | |||
| [RFC2780] S. Bradner and V. Paxson, IANA Allocation Guidelines For | [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For | |||
| Values In the Internet Protocol and Related Headers, RFC 2780, March | Values In the Internet Protocol and Related Headers", RFC 2780, March | |||
| 2000. | 2000. | |||
| [RFD99] Ramakrishnan, Floyd, S., and Davie, B., A Proposal to Incor- | ||||
| porate ECN in MPLS, work in progress, June 1999. URL | ||||
| "http://www.aciri.org/floyd/papers/draft-ietf-mpls-ecn-00.txt". | ||||
| [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | |||
| Congestion Avoidance in Computer Networks", ACM Transactions on Com- | Congestion Avoidance in Computer Networks", ACM Transactions on Com- | |||
| puter Systems, Vol.8, No.2, pp. 158-181, May 1990. | puter Systems, Vol.8, No.2, pp. 158-181, May 1990. | |||
| [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | |||
| Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | |||
| Computer Communications Review, October 1999. | Computer Communications Review, October 1999. | |||
| 16. Security Considerations | 16. Security Considerations | |||
| Security considerations have been discussed in Sections 7 and 8. | Security considerations have been discussed in Sections 7, 8, 18, and | |||
| 19. | ||||
| 17. IPv4 Header Checksum Recalculation | 17. IPv4 Header Checksum Recalculation | |||
| IPv4 header checksum recalculation is an issue with some high-end | IPv4 header checksum recalculation is an issue with some high-end | |||
| router architectures using an output-buffered switch, since most if | router architectures using an output-buffered switch, since most if | |||
| not all of the header manipulation is performed on the input side of | not all of the header manipulation is performed on the input side of | |||
| the switch, while the ECN decision would need to be made local to the | the switch, while the ECN decision would need to be made local to the | |||
| output buffer. This is not an issue for IPv6, since there is no IPv6 | output buffer. This is not an issue for IPv6, since there is no IPv6 | |||
| header checksum. The IPv4 TOS octet is the last byte of a 16-bit | header checksum. The IPv4 TOS octet is the last byte of a 16-bit | |||
| half-word. | half-word. | |||
| skipping to change at page 44, line 4 ¶ | skipping to change at page 44, line 12 ¶ | |||
| router, then the first of these two changes would have no effect. | router, then the first of these two changes would have no effect. | |||
| The second change, however, would have the effect of giving false | The second change, however, would have the effect of giving false | |||
| reports of congestion to a monitoring device along the path. If the | reports of congestion to a monitoring device along the path. If the | |||
| transport protocol is ECN-Capable, then the second of these two | transport protocol is ECN-Capable, then the second of these two | |||
| changes (when, for example, (0,0) was changed to (1,1)) could also | changes (when, for example, (0,0) was changed to (1,1)) could also | |||
| have an effect at the transport level, by combining falsely indicat- | have an effect at the transport level, by combining falsely indicat- | |||
| ing ECN-Capability with falsely reporting congestion. For an ECN- | ing ECN-Capability with falsely reporting congestion. For an ECN- | |||
| capable transport, this would cause the transport to unnecessarily | capable transport, this would cause the transport to unnecessarily | |||
| react to congestion. In this particular case, the router that is | react to congestion. In this particular case, the router that is | |||
| incorrectly changing the ECN field could have dropped the packet. | incorrectly changing the ECN field could have dropped the packet. | |||
| Thus for this case of an ECN-capable transport, the consequence of | Thus for this case of an ECN-capable transport, the consequence of | |||
| this change to the ECN field is no worse than dropping the packet. | this change to the ECN field is no worse than dropping the packet. | |||
| 18.1.5. Changes with No Functional Effect | 18.1.5. Changes with No Functional Effect | |||
| (0, *) -> (0, *) | (0, *) -> (0, *) | |||
| The CE bit is ignored in a packet that does not have the ECT bit set. | The CE bit is ignored in a packet that does not have the ECT bit set. | |||
| Thus, this change would have no effect, in terms of ECN. | Thus, this change would have no effect, in terms of ECN. | |||
| 18.2. Information carried in the Transport Header | 18.2. Information carried in the Transport Header | |||
| For TCP, an ECN-capable TCP receiver informs its TCP peer that it is | For TCP, an ECN-capable TCP receiver informs its TCP peer that it is | |||
| ECN-capable at the TCP level, using information in the TCP header at | ECN-capable at the TCP level, conveying this information in the TCP | |||
| the time the connection is set up. This document does not consider | header at the time the connection is set up. This document does not | |||
| potential dangers introduced by changes in the transport header | consider potential dangers introduced by changes in the transport | |||
| within the network. In the case of IPsec tunnels, the IPsec tunnel | header within the network. In the case of IPsec tunnels, the IPsec | |||
| protects the transport header. | tunnel protects the transport header. | |||
| Another issue concerns TCP packets with a spoofed IP source address | ||||
| carrying invalid ECN information in the transport header. For com- | ||||
| pleteness, we examine here some possible ways that a node spoofing | ||||
| the IP source address of another node could use the two ECN flags in | ||||
| the TCP header to launch a denial-of-service attack. However, these | ||||
| attacks would require an ability for the attacker to use valid TCP | ||||
| sequence numbers, and any attacker with this ability and with the | ||||
| ability to spoof IP source addresses could damage the TCP connection | ||||
| without using the ECN flags. Therefore, ECN does not add any new | ||||
| vulnerabilities in this respect. | ||||
| An acknowledgement packet with a spoofed IP source address of the TCP | ||||
| data receiver could include the ECE bit set. If accepted by the TCP | ||||
| data sender as a valid packet, this spoofed acknowledgement packet | ||||
| could result in the TCP data sender unnecessarily halving its conges- | ||||
| tion window. However, to be accepted by the data sender, such a | ||||
| spoofed acknowledgement packet would have to have the correct 32-bit | ||||
| sequence number as well as a valid acknowledgement number. An | ||||
| attacker that could successfully send such a spoofed acknowledgement | ||||
| packet could also send a spoofed RST packet, or do other equally dam- | ||||
| aging operations to the TCP connection. | ||||
| Packets with a spoofed IP source address of the TCP data sender could | ||||
| include the CWR bit set. Again, to be accepted, such a packet would | ||||
| have to have a valid sequence number. In addition, such a spoofed | ||||
| packet would have a limited performance impact. Spoofing a data | ||||
| packet with the CWR bit set could result in the TCP data receiver | ||||
| sending fewer ECE packets than it would otherwise, if the data | ||||
| receiver was sending ECE packets when it received the spoofed CWR | ||||
| packet. | ||||
| 18.3. Split Paths | 18.3. Split Paths | |||
| In some cases, a malicious or broken router might have access to only | In some cases, a malicious or broken router might have access to only | |||
| a subset of the packets from a flow. The question is as follows: | a subset of the packets from a flow. The question is as follows: | |||
| can this router, by altering the ECN field in this subset of the | can this router, by altering the ECN field in this subset of the | |||
| packets, do more damage to that flow than if it had simply dropped | packets, do more damage to that flow than if it had simply dropped | |||
| that set of packets? | that set of packets? | |||
| We will classify the packets in the flow as A packets and B packets, | We will classify the packets in the flow as A packets and B packets, | |||
| skipping to change at page 48, line 35 ¶ | skipping to change at page 49, line 29 ¶ | |||
| receiving more bandwidth than it would have otherwise, relative to | receiving more bandwidth than it would have otherwise, relative to | |||
| competing non-subverted flows. If the congested queue reaches the | competing non-subverted flows. If the congested queue reaches the | |||
| packet-dropping stage, then the subversion of end-to-end congestion | packet-dropping stage, then the subversion of end-to-end congestion | |||
| control might or might not be of overall benefit to the subverted | control might or might not be of overall benefit to the subverted | |||
| flow, depending on that flow's relative tradeoffs between throughput, | flow, depending on that flow's relative tradeoffs between throughput, | |||
| loss, and delay. | loss, and delay. | |||
| One form of subverting end-to-end congestion control is to falsely | One form of subverting end-to-end congestion control is to falsely | |||
| indicate ECN-capability by setting the ECT bit. This has the conse- | indicate ECN-capability by setting the ECT bit. This has the conse- | |||
| quence of downstream congested routers setting the CE bit in vain. | quence of downstream congested routers setting the CE bit in vain. | |||
| However, as we describe in the section below, if the ECT bit is | However, as described in Section 9.1.2, if the ECT bit is changed in | |||
| changed in the IPsec tunnel, this can be detected at the egress point | an IP tunnel, this can be detected at the egress point of the tunnel, | |||
| of the tunnel. | as long as the inner header was not changed within the tunnel. | |||
| The second form of subverting end-to-end congestion control is to | The second form of subverting end-to-end congestion control is to | |||
| erase the congestion indication, either by erasing the CE bit | erase the congestion indication, either by erasing the CE bit | |||
| directly, or by erasing the ECT bit when the CE bit is already set. | directly, or by erasing the ECT bit when the CE bit is already set. | |||
| In this case, it is the upstream congested routers that set the CE | In this case, it is the upstream congested routers that set the CE | |||
| bit in vain. | bit in vain. | |||
| If the ECT bit is erased within an IP tunnel, then this can be | If the ECT bit is erased within an IP tunnel, then this can be | |||
| detected at the egress point of the tunnel. If the CE bit is set | detected at the egress point of the tunnel, as long as the inner | |||
| header was not changed within the tunnel. If the CE bit is set | ||||
| upstream of the IP tunnel, then any erasure of the outer header's CE | upstream of the IP tunnel, then any erasure of the outer header's CE | |||
| bit within the tunnel will have no effect because the inner header | bit within the tunnel will have no effect because the inner header | |||
| preserves the set value of the CE bit. However, if the CE bit is set | preserves the set value of the CE bit. However, if the CE bit is set | |||
| within the tunnel, and erased either within or downstream of the tun- | within the tunnel, and erased either within or downstream of the tun- | |||
| nel, this is not necessarily detected at the egress point of the | nel, this is not necessarily detected at the egress point of the tun- | |||
| tunnel. | nel. | |||
| With this subversion of end-to-end congestion control, an end-system | With this subversion of end-to-end congestion control, an end-system | |||
| transport does not respond to the congestion indication. Along with | transport does not respond to the congestion indication. Along with | |||
| the increased unfairness for the non-subverted flows described in the | the increased unfairness for the non-subverted flows described in the | |||
| previous section, the congested router's queue could continue to | previous section, the congested router's queue could continue to | |||
| build, resulting in packet loss at the congested router - which is a | build, resulting in packet loss at the congested router - which is a | |||
| means for indicating congestion to the transport in any case. In the | means for indicating congestion to the transport in any case. In the | |||
| interim, the flow might experience higher queueing delays, possibly | interim, the flow might experience higher queueing delays, possibly | |||
| along with an increased bandwidth relative to other non-subverted | along with an increased bandwidth relative to other non-subverted | |||
| flows. But transports do not inherently make assumptions of consis- | flows. But transports do not inherently make assumptions of consis- | |||
| skipping to change at page 49, line 46 ¶ | skipping to change at page 50, line 40 ¶ | |||
| end-to-end congestion control that a broken or malicious router could | end-to-end congestion control that a broken or malicious router could | |||
| use. For example, a broken router could duplicate data packets, thus | use. For example, a broken router could duplicate data packets, thus | |||
| effectively negating the effects of end-to-end congestion control | effectively negating the effects of end-to-end congestion control | |||
| along some portion of the path. (For a router that duplicated pack- | along some portion of the path. (For a router that duplicated pack- | |||
| ets within an IPsec tunnel, the security administrator can cause the | ets within an IPsec tunnel, the security administrator can cause the | |||
| duplicate packets to be discarded by configuring anti-replay protec- | duplicate packets to be discarded by configuring anti-replay protec- | |||
| tion for the tunnel.) This duplication of packets within the network | tion for the tunnel.) This duplication of packets within the network | |||
| would have similar implications for the network and for the subverted | would have similar implications for the network and for the subverted | |||
| flow as those described in Sections 18.1.1 and 18.1.4 above. | flow as those described in Sections 18.1.1 and 18.1.4 above. | |||
| 20. The motivation for the ECT bit. | 20. The Motivation for the ECT bit. | |||
| The need for the ECT bit is motivated by the fact that ECN will be | The need for the ECT bit is motivated by the fact that ECN will be | |||
| deployed incrementally in an Internet where some transport protocols | deployed incrementally in an Internet where some transport protocols | |||
| and routers understand ECN and some do not. With the ECT bit, the | and routers understand ECN and some do not. With the ECT bit, the | |||
| router can drop packets from flows that are not ECN-capable, but can | router can drop packets from flows that are not ECN-capable, but can | |||
| *instead* set the CE bit in packets that *are* ECN-capable. Because | *instead* set the CE bit in packets that *are* ECN-capable. Because | |||
| the ECT bit allows an end node to have the CE bit set in a packet | the ECT bit allows an end node to have the CE bit set in a packet | |||
| *instead* of having the packet dropped, an end node might have some | *instead* of having the packet dropped, an end node might have some | |||
| incentive to deploy ECN. | incentive to deploy ECN. | |||
| If there was no ECT indication, then the router would have to set the | If there was no ECT indication, then the router would have to set the | |||
| CE bit for packets from both ECN-capable and non-ECN-capable flows. | CE bit for packets from both ECN-capable and non-ECN-capable flows. | |||
| In this case, there would be no incentive for end-nodes to deploy | In this case, there would be no incentive for end-nodes to deploy | |||
| ECN, and no viable path of incremental deployment from a non-ECN | ECN, and no viable path of incremental deployment from a non-ECN | |||
| world to an ECN-capable world. Consider the first stages of such an | world to an ECN-capable world. Consider the first stages of such an | |||
| incremental deployment, where a subset of the flows are ECN-capable. | incremental deployment, where a subset of the flows are ECN-capable. | |||
| At the onset of congestion, when the packet dropping/marking rate | At the onset of congestion, when the packet dropping/marking rate | |||
| would be low, routers would only set CE bits, rather than dropping | would be low, routers would only set CE bits, rather than dropping | |||
| packets. However, only those flows that are ECN-capable would under- | packets. However, only those flows that are ECN-capable would under- | |||
| stand and respond to CE packets. The result is that the ECN- capable | stand and respond to CE packets. The result is that the ECN-capable | |||
| flows would back off, and the non-ECN-capable flows would be unaware | flows would back off, and the non-ECN-capable flows would be unaware | |||
| of the ECN signals and would continue to open their congestion win- | of the ECN signals and would continue to open their congestion win- | |||
| dows. | dows. | |||
| In this case, there are two possible outcomes: (1) the ECN-capable | In this case, there are two possible outcomes: (1) the ECN-capable | |||
| flows back off, the non-ECN-capable flows get all of the bandwidth, | flows back off, the non-ECN-capable flows get all of the bandwidth, | |||
| and congestion remains mild, or (2) the ECN-capable flows back off, | and congestion remains mild, or (2) the ECN-capable flows back off, | |||
| the non-ECN-capable flows don't, and congestion increases until the | the non-ECN-capable flows don't, and congestion increases until the | |||
| router transitions from setting the CE bit to dropping packets. | router transitions from setting the CE bit to dropping packets. | |||
| While this second outcome evens out the fairness, the ECN-capable | While this second outcome evens out the fairness, the ECN-capable | |||
| skipping to change at page 50, line 43 ¶ | skipping to change at page 51, line 37 ¶ | |||
| A flow that advertised itself as ECN-Capable but does not respond to | A flow that advertised itself as ECN-Capable but does not respond to | |||
| CE bits is functionally equivalent to a flow that turns off conges- | CE bits is functionally equivalent to a flow that turns off conges- | |||
| tion control, as discussed earlier in this document. | tion control, as discussed earlier in this document. | |||
| Thus, in a world when a subset of the flows are ECN-capable, but | Thus, in a world when a subset of the flows are ECN-capable, but | |||
| where ECN-capable flows have no mechanism for indicating that fact to | where ECN-capable flows have no mechanism for indicating that fact to | |||
| the routers, there would be less effective and less fair congestion | the routers, there would be less effective and less fair congestion | |||
| control in the Internet, resulting in a strong incentive for end | control in the Internet, resulting in a strong incentive for end | |||
| nodes not to deploy ECN. | nodes not to deploy ECN. | |||
| 21. Why use two bits in the IP header? | 21. Why use Two Bits in the IP Header? | |||
| Given the need for an ECT indication in the IP header, there still | Given the need for an ECT indication in the IP header, there still | |||
| remains the question of whether the ECT (ECN-Capable Transport) and | remains the question of whether the ECT (ECN-Capable Transport) and | |||
| CE (Congestion Experienced) indications should have been overloaded | CE (Congestion Experienced) indications should have been overloaded | |||
| on a single bit. This overloaded-one-bit alternative, explored in | on a single bit. This overloaded-one-bit alternative, explored in | |||
| [Floyd94], would have involved a single bit with two values. One | [Floyd94], would have involved a single bit with two values. One | |||
| value, "ECT and not CE", would represent an ECN-Capable Transport, | value, "ECT and not CE", would represent an ECN-Capable Transport, | |||
| and the other value, "CE or not ECT", would represent either | and the other value, "CE or not ECT", would represent either Conges- | |||
| Congestion Experienced or a non-ECN-Capable transport. | tion Experienced or a non-ECN-Capable transport. | |||
| One difference between the one-bit and two-bit implementations con- | One difference between the one-bit and two-bit implementations con- | |||
| cerns packets that traverse multiple congested routers. Consider a | cerns packets that traverse multiple congested routers. Consider a | |||
| CE packet that arrives at a second congested router, and is selected | CE packet that arrives at a second congested router, and is selected | |||
| by the active queue management at that router for either marking or | by the active queue management at that router for either marking or | |||
| dropping. In the one-bit implementation, the second congested router | dropping. In the one-bit implementation, the second congested router | |||
| has no choice but to drop the CE packet, because it cannot distin- | has no choice but to drop the CE packet, because it cannot distin- | |||
| guish between a CE packet and a non-ECT packet. In the two-bit | guish between a CE packet and a non-ECT packet. In the two-bit | |||
| implementation, the second congested router has the choice of either | implementation, the second congested router has the choice of either | |||
| dropping the CE packet, or of leaving it alone with the CE bit set. | dropping the CE packet, or of leaving it alone with the CE bit set. | |||
| skipping to change at page 52, line 21 ¶ | skipping to change at page 53, line 15 ¶ | |||
| packets from ECN-Capable flows (to convey the functionality of the | packets from ECN-Capable flows (to convey the functionality of the | |||
| second bit elsewhere, namely in the transport header), or that | second bit elsewhere, namely in the transport header), or that | |||
| senders in ECN-Capable flows accept the limitation that receivers | senders in ECN-Capable flows accept the limitation that receivers | |||
| must be able to determine a priori which packets are ECN-Capable and | must be able to determine a priori which packets are ECN-Capable and | |||
| which are not ECN-Capable. Third, the one-bit implementation is pos- | which are not ECN-Capable. Third, the one-bit implementation is pos- | |||
| sibly more open to errors from faulty implementations that choose the | sibly more open to errors from faulty implementations that choose the | |||
| wrong default value for the ECN bit. We believe that the use of the | wrong default value for the ECN bit. We believe that the use of the | |||
| extra bit in the IP header for the ECT-bit is extremely valuable to | extra bit in the IP header for the ECT-bit is extremely valuable to | |||
| overcome these limitations. | overcome these limitations. | |||
| 22. Historical definitions for the IPv4 TOS octet | 22. Historical Definitions for the IPv4 TOS Octet | |||
| RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP | RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP | |||
| header. In RFC 791, bits 6 and 7 of the ToS octet are listed as | header. In RFC 791, bits 6 and 7 of the ToS octet are listed as | |||
| "Reserved for Future Use", and are shown set to zero. The first two | "Reserved for Future Use", and are shown set to zero. The first two | |||
| fields of the ToS octet were defined as the Precedence and Type of | fields of the ToS octet were defined as the Precedence and Type of | |||
| Service (TOS) fields. | Service (TOS) fields. | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | PRECEDENCE | TOS | 0 | 0 | RFC 791 | | PRECEDENCE | TOS | 0 | 0 | RFC 791 | |||
| skipping to change at page 52, line 51 ¶ | skipping to change at page 53, line 45 ¶ | |||
| The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: | The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | PRECEDENCE | TOS | MBZ | RFC 1349 | | PRECEDENCE | TOS | MBZ | RFC 1349 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary | Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary | |||
| Cost". In addition to the Precedence and Type of Service (TOS) | Cost". In addition to the Precedence and Type of Service (TOS) | |||
| fields, the last field, MBZ (for "must be zero") was defined as | fields, the last field, MBZ (for "must be zero") was defined as cur- | |||
| currently unused. RFC 1349 stated that "The originator of a datagram | rently unused. RFC 1349 stated that "The originator of a datagram | |||
| sets [the MBZ] field to zero (unless participating in an Internet | sets [the MBZ] field to zero (unless participating in an Internet | |||
| protocol experiment which makes use of that bit)." | protocol experiment which makes use of that bit)." | |||
| RFC 1455 [RFC1455] defined an experimental standard that used all | RFC 1455 [RFC1455] defined an experimental standard that used all | |||
| four bits in the TOS field to request a guaranteed level of link | four bits in the TOS field to request a guaranteed level of link | |||
| security. | security. | |||
| RFC 1349 is obsoleted by "Definition of the Differentiated Services | RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Dif- | |||
| Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474], in which | ferentiated Services Field (DS Field) in the IPv4 and IPv6 Headers" | |||
| bits 6 and 7 of the DS field are listed as Currently Unused (CU). | [RFC2474] in which bits 6 and 7 of the DS field are listed as Cur- | |||
| The first six bits of the DS field are defined as the Differentiated | rently Unused (CU). RFC 2780 [RFC2780] specified ECN as an experi- | |||
| Services CodePoint (DSCP): | mental use of the two-bit CU field. RFC 2780 updated the definition | |||
| of the DS Field to only encompass the first six bits of this octet | ||||
| rather than all eight bits; these first six bits are defined as the | ||||
| Differentiated Services CodePoint (DSCP): | ||||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | DSCP | CU | RFC 2474 | | DSCP | CU | RFCs 2474, | |||
| 2780 | ||||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| Because of this unstable history, the definition of the ECN field in | Because of this unstable history, the definition of the ECN field in | |||
| this document cannot be guaranteed to be backwards compatible with | this document cannot be guaranteed to be backwards compatible with | |||
| all past uses of these two bits. The damage that could be done by a | all past uses of these two bits. | |||
| non-ECN-capable router would be to "erase" the CE bit for an ECN- | ||||
| capable packet that arrived at the router with the CE bit set, or set | Prior to RFC 2474, routers were not permitted to modify bits in | |||
| the CE bit even in the absence of congestion. This has been dis- | either the DSCP or ECN field of packets forwarded through them, and | |||
| cussed in the section on "Non-compliance in the Network". | hence routers that comply only with RFCs prior to 2474 should have no | |||
| effect on ECN. For end nodes, bit 7 (the ECN CE bit) must be trans- | ||||
| mitted as zero for any implementation compliant only with RFCs prior | ||||
| to 2474. Such nodes may transmit bit 6 (the ECN ECT bit) as one for | ||||
| the "Minimize Monetary Cost" provision of RFC 1349 or the experiment | ||||
| authorized by RFC 1455; neither this aspect of RFC 1349 nor the | ||||
| experiment in RFC 1455 was widely implemented or used. The damage | ||||
| that could be done by a broken, non-conformant router would be to | ||||
| "erase" the CE bit for an ECN-capable packet that arrived at the | ||||
| router with the CE bit set, or set the CE bit even in the absence of | ||||
| congestion. This has been discussed in the section on "Non-compli- | ||||
| ance in the Network". | ||||
| The damage that could be done in an ECN-capable environment by a non- | The damage that could be done in an ECN-capable environment by a non- | |||
| ECN-capable end-node transmitting packets with the ECT bit set has | ECN-capable end-node transmitting packets with the ECT bit set has | |||
| been discussed in the section on "Non-compliance by the End Nodes". | been discussed in the section on "Non-compliance by the End Nodes". | |||
| 23. IANA Considerations | ||||
| The bits for ECT and CE in the ECN Field of the IP header and the | ||||
| bits for CWR and ECE in the TCP header are specified by the Standards | ||||
| Action of this RFC, as is required by RFC 2780. We would note that | ||||
| this RFC does not define the codepoint of (ECT=0, CE=1) for the ECT | ||||
| and CE bits. | ||||
| IANA allocated the IPSEC Security Association Attribute value 10 for | ||||
| the ECN Tunnel use described in Section 9.2.1.2 above at the request | ||||
| of David Black in November 1999. If this draft is approved for pub- | ||||
| lication as an RFC, IANA should change the Reference for this alloca- | ||||
| tion from David Black's request to this RFC based on its RFC number. | ||||
| AUTHORS' ADDRESSES | AUTHORS' ADDRESSES | |||
| K. K. Ramakrishnan | K. K. Ramakrishnan | |||
| TeraOptic Networks, Inc. | TeraOptic Networks, Inc. | |||
| Phone: +1 (408) 666-8650 | Phone: +1 (408) 666-8650 | |||
| Email: kk@teraoptic.com | Email: kk@teraoptic.com | |||
| Sally Floyd | Sally Floyd | |||
| Phone: +1 (510) 666-2989 | Phone: +1 (510) 666-2989 | |||
| ACIRI | ACIRI | |||
| Email: floyd@aciri.org | Email: floyd@aciri.org | |||
| URL: http://www.aciri.org/floyd/ | URL: http://www.aciri.org/floyd/ | |||
| David L. Black | David L. Black | |||
| EMC Corporation | EMC Corporation | |||
| 42 South St. | 42 South St. | |||
| Hopkinton, MA 01748 | Hopkinton, MA 01748 | |||
| Phone: +1 (508) 435-1000 x75140 | Phone: +1 (508) 435-1000 x75140 | |||
| Email: black_david@emc.com | Email: black_david@emc.com | |||
| This draft was created in November 2000. | This draft was created in January 2001. | |||
| It expires May 2001. | It expires July 2001. | |||
| End of changes. 88 change blocks. | ||||
| 323 lines changed or deleted | 378 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
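
The bit layout given in Section 22 (DSCP in bits 0 through 5, ECT in bit 6, CE in bit 7, with bit 0 the most significant) and the mark-instead-of-drop decision motivated in Section 20 can be illustrated with a minimal sketch. This is not part of the draft: the function names `split_tos_octet` and `congestion_action` are hypothetical, and the sketch assumes the router's active queue management has already selected the packet for marking or dropping.

```python
# Illustrative sketch only; names and structure are assumptions, not from the draft.
# RFC diagrams number bits from the most significant end, so bit 6 (ECT) is 0x02
# and bit 7 (CE) is 0x01 when the octet is treated as an integer.
ECT_MASK = 0x02
CE_MASK = 0x01


def split_tos_octet(octet):
    """Split the former IPv4 TOS octet into DSCP, ECT and CE."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return {
        "dscp": octet >> 2,              # first six bits
        "ect": bool(octet & ECT_MASK),   # ECN-Capable Transport
        "ce": bool(octet & CE_MASK),     # Congestion Experienced
    }


def congestion_action(octet):
    """Action for a packet already selected by active queue management.

    Packets from ECN-capable transports get the CE bit set instead of
    being dropped; all other packets are dropped.
    Returns (action, new_octet); new_octet is None when the packet is dropped.
    """
    if octet & ECT_MASK:
        return "mark", octet | CE_MASK
    return "drop", None
```

For example, `congestion_action(0b10111010)` marks the packet because ECT is set, while `congestion_action(0b10111000)` drops it because ECT is clear.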