| draft-ietf-tsvwg-ecn-01.txt | draft-ietf-tsvwg-ecn-02.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force K. K. Ramakrishnan | Internet Engineering Task Force K. K. Ramakrishnan | |||
| INTERNET DRAFT TeraOptic Networks | INTERNET DRAFT TeraOptic Networks | |||
| draft-ietf-tsvwg-ecn-01.txt Sally Floyd | draft-ietf-tsvwg-ecn-02.txt Sally Floyd | |||
| ACIRI | ACIRI | |||
| D. Black | D. Black | |||
| EMC | EMC | |||
| January, 2001 | February, 2001 | |||
| Expires: July, 2001 | Expires: August, 2001 | |||
| The Addition of Explicit Congestion Notification (ECN) to IP | The Addition of Explicit Congestion Notification (ECN) to IP | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
| all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
| Drafts. | Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet- Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| Abstract | Abstract | |||
| This document specifies the incorporation of ECN (Explicit Congestion | This document specifies the incorporation of ECN (Explicit Congestion | |||
| Notification) to TCP and IP, including ECN's use of two bits in the | Notification) to TCP and IP, including ECN's use of two bits in the | |||
| IP header. We begin by describing TCP's use of packet drops as an | IP header. We begin by describing TCP's use of packet drops as an | |||
| indication of congestion. Next we explain that with the addition of | indication of congestion. Next we explain that with the addition of | |||
| active queue management (e.g., RED) to the Internet infrastructure, | active queue management (e.g., RED) to the Internet infrastructure, | |||
| where routers detect congestion before the queue overflows, routers | where routers detect congestion before the queue overflows, routers | |||
| are no longer limited to packet drops as an indication of congestion. | are no longer limited to packet drops as an indication of congestion. | |||
| Routers can instead set the Congestion Experienced (CE) bit in the IP | Routers can instead set the Congestion Experienced (CE) codepoint in | |||
| header of packets from ECN-capable transports. We describe when the | the IP header of packets from ECN-capable transports. We describe | |||
| CE bit is to be set in routers, and describe modifications needed to | when the CE codepoint is to be set in routers, and describe | |||
| TCP to make it ECN-capable. Modifications to other transport | modifications needed to TCP to make it ECN-capable. Modifications to | |||
| protocols (e.g., unreliable unicast or multicast, reliable multicast, | other transport protocols (e.g., unreliable unicast or multicast, | |||
| other reliable unicast transport protocols) could be considered as | reliable multicast, other reliable unicast transport protocols) could | |||
| those protocols are developed and advance through the standards | be considered as those protocols are developed and advance through | |||
| process. | the standards process. | |||
| We also describe in this document the issues involving the use of ECN | We also describe in this document the issues involving the use of ECN | |||
| within IP tunnels, and within IPsec tunnels in particular. | within IP tunnels, and within IPsec tunnels in particular. | |||
| One of the guiding principles for this document is that all the | One of the guiding principles for this document is that all the | |||
| mechanisms specified here are incrementally deployable. | mechanisms specified here are incrementally deployable. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction | 1. Introduction | |||
| 2. Conventions and Acronyms | 2. Conventions and Acronyms | |||
| 3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
| 4. Active Queue Management (AQM) | 4. Active Queue Management (AQM) | |||
| 5. Explicit Congestion Notification in IP | 5. Explicit Congestion Notification in IP | |||
| 5.1. ECN as an Indication of Persistent Congestion | 5.1. ECN as an Indication of Persistent Congestion | |||
| 5.2. Dropped or Corrupted Packets | 5.2. Dropped or Corrupted Packets | |||
| 5.3. Fragmentation | ||||
| 6. Support from the Transport Protocol | 6. Support from the Transport Protocol | |||
| 6.1. TCP | 6.1. TCP | |||
| 6.1.1. TCP Initialization | 6.1.1 TCP Initialization | |||
| 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | 6.1.1.1. Robust TCP Initialization with an Echoed Reserve Field | |||
| 6.1.2. The TCP Sender | 6.1.2. The TCP Sender | |||
| 6.1.3. The TCP Receiver | 6.1.3. The TCP Receiver | |||
| 6.1.4. Congestion on the ACK-path | 6.1.4. Congestion on the ACK-path | |||
| 6.1.5. Retransmitted TCP packets | 6.1.5. Retransmitted TCP packets | |||
| 6.1.6. TCP Window Probes. | 6.1.6. TCP Window Probes. | |||
| 7. Non-compliance by the End Nodes | 7. Non-compliance by the End Nodes | |||
| 8. Non-compliance in the Network | 8. Non-compliance in the Network | |||
| 8.1. Complications Introduced by Split Paths | 8.1. Complications Introduced by Split Paths | |||
| 9. Encapsulated Packets | 9. Encapsulated Packets | |||
| skipping to change at page 3, line 37 | skipping to change at page 3, line 38 | |||
| 9.2. IPsec Tunnels | 9.2. IPsec Tunnels | |||
| 9.2.1. Negotiation between Tunnel Endpoints | 9.2.1. Negotiation between Tunnel Endpoints | |||
| 9.2.1.1. ECN Tunnel Security Association Database Field | 9.2.1.1. ECN Tunnel Security Association Database Field | |||
| 9.2.1.2. ECN Tunnel Security Association Attribute | 9.2.1.2. ECN Tunnel Security Association Attribute | |||
| 9.2.1.3. Changes to IPsec Tunnel Header Processing | 9.2.1.3. Changes to IPsec Tunnel Header Processing | |||
| 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | |||
| 9.2.3. Comments for IPsec Support | 9.2.3. Comments for IPsec Support | |||
| 9.3. IP packets encapsulated in non-IP packet headers. | 9.3. IP packets encapsulated in non-IP packet headers. | |||
| 10. Issues Raised by Monitoring and Policing Devices | 10. Issues Raised by Monitoring and Policing Devices | |||
| 11. Evaluations of ECN | 11. Evaluations of ECN | |||
| 11.1. Related Work Evaluating ECN | ||||
| 11.2. A Discussion of the ECN nonce. | ||||
| 11.2.1. The Incremental Deployment of ECT(1) in Routers. | ||||
| 12. Summary of changes required in IP and TCP | 12. Summary of changes required in IP and TCP | |||
| 13. Conclusions | 13. Conclusions | |||
| 14. Acknowledgements | 14. Acknowledgements | |||
| 15. References | 15. References | |||
| 16. Security Considerations | 16. Security Considerations | |||
| 17. IPv4 Header Checksum Recalculation | 17. IPv4 Header Checksum Recalculation | |||
| 18. Possible Changes to the ECN Field in the Network | 18. Possible Changes to the ECN Field in the Network | |||
| 18.1. Possible Changes to the IP Header | 18.1. Possible Changes to the IP Header | |||
| 18.1.1. Erasing the Congestion Indication | 18.1.1. Erasing the Congestion Indication | |||
| 18.1.2. Falsely Reporting Congestion | 18.1.2. Falsely Reporting Congestion | |||
| 18.1.3. Disabling ECN-Capability | 18.1.3. Disabling ECN-Capability | |||
| 18.1.4. Falsely Indicating ECN-Capability | 18.1.4. Falsely Indicating ECN-Capability | |||
| 18.1.5. Changes with No Functional Effect | ||||
| 18.2. Information carried in the Transport Header | 18.2. Information carried in the Transport Header | |||
| 18.3. Split Paths | 18.3. Split Paths | |||
| 19. Implications of Subverting End-to-End Congestion Control | 19. Implications of Subverting End-to-End Congestion Control | |||
| 19.1. Implications for the Network and for Competing Flows | 19.1. Implications for the Network and for Competing Flows | |||
| 19.2. Implications for the Subverted Flow | 19.2. Implications for the Subverted Flow | |||
| 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | |||
| 20. The Motivation for the ECT bit. | 20. The Motivation for the ECT Codepoints. | |||
| 20.1. The Motivation for an ECT Codepoint. | ||||
| 20.2. The Motivation for two ECT Codepoints. | ||||
| 21. Why use Two Bits in the IP Header? | 21. Why use Two Bits in the IP Header? | |||
| 22. Historical Definitions for the IPv4 TOS Octet | 22. Historical Definitions for the IPv4 TOS Octet | |||
| 23. IANA Considerations | 23. IANA Considerations | |||
| RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare | RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare | |||
| this with draft-ietf-tsvwg-ecn-00, compare the following: | this with draft-ietf-tsvwg-ecn-01, compare the following: | |||
| "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-00.troff" | ||||
| "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff" | "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff" | |||
| Changes from draft-ietf-tsvwg-ecn-00: | "http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-02.troff" | |||
| * Deleted Section 6.1.1.2. on "Robust TCP Initialization with no | Changes from draft-ietf-tsvwg-ecn-01: | |||
| response to the SYN", and modified the paragraph in the Conclusions | Added the ECT(1) codepoint, and changed references about bits to | |||
| referring to this. | references about codepoints in many places. Also added Section 11.2 on | |||
| * Added Section 23 on IANA Considerations. | "A Discussion of the ECN nonce", and Section 20.2 on "The Motivation for | |||
| * Added two paragraphs to Section 18.2 on denial-of-service attacks. | two ECT Codepoints". | |||
| * Added some text about the ECN nonce being a research issue. | Added a paragraph saying that by default, the discussion of setting | |||
| * Moved two paragraphs about setting the CWR bit from Section 6.1.3 to | the CE codepoint applies to all Differentiated Services Per-Hop | |||
| Section 6.1.2. | Behaviors. | |||
| * Various small changes: | Added Section 5.3 on fragmentation. | |||
| Adding several small clarifying sentences in Section 12, 22. | Added "A host MUST NOT set ECT on SYN or SYN-ACK packets." to the end | |||
| Small clarification to text in Section 19.2. | of Section 6.1.1, just to be explicit. | |||
| Deleted a few unnecessary sentences in Section 9. | Corrected some references from "Section 19" to "Section 22". | |||
| Updated some references to Section X. | Clarified that ECN is defined identically in IPv4 and in IPv6. | |||
| Added more references to RFC 2780. | ||||
| Deleted references to internet-drafts. | ||||
| Clarified terminology for "non-ECN-setup SYN packet", including the | ||||
| following: "Receivers MUST correctly handle all forms of the non-ECN- | ||||
| setup SYN and SYN-ACK packets." | ||||
| 1. Introduction | 1. Introduction | |||
| TCP's congestion control and avoidance algorithms are based on the | TCP's congestion control and avoidance algorithms are based on the | |||
| notion that the network is a black-box [Jacobson88, Jacobson90]. The | notion that the network is a black-box [Jacobson88, Jacobson90]. The | |||
| network's state of congestion or otherwise is determined by end-sys- | network's state of congestion or otherwise is determined by end- | |||
| tems probing for the network state, by gradually increasing the load | systems probing for the network state, by gradually increasing the | |||
| on the network (by increasing the window of packets that are out- | load on the network (by increasing the window of packets that are | |||
| standing in the network) until the network becomes congested and a | outstanding in the network) until the network becomes congested and a | |||
| packet is lost. Treating the network as a "black-box" and treating | packet is lost. Treating the network as a "black-box" and treating | |||
| loss as an indication of congestion in the network is appropriate for | loss as an indication of congestion in the network is appropriate for | |||
| pure best-effort data carried by TCP, with little or no sensitivity | pure best-effort data carried by TCP, with little or no sensitivity | |||
| to delay or loss of individual packets. In addition, TCP's conges- | to delay or loss of individual packets. In addition, TCP's | |||
| tion management algorithms have techniques built-in (such as Fast | congestion management algorithms have techniques built-in (such as | |||
| Retransmit and Fast Recovery) to minimize the impact of losses, from | Fast Retransmit and Fast Recovery) to minimize the impact of losses, | |||
| a throughput perspective. However, these mechanisms are not intended | from a throughput perspective. However, these mechanisms are not | |||
| to help applications that are in fact sensitive to the delay or loss | intended to help applications that are in fact sensitive to the delay | |||
| of one or more individual packets. Interactive traffic such as tel- | or loss of one or more individual packets. Interactive traffic such | |||
| net, web-browsing, and transfer of audio and video data can be sensi- | as telnet, web-browsing, and transfer of audio and video data can be | |||
| tive to packet losses (especially when using an unreliable data | sensitive to packet losses (especially when using an unreliable data | |||
| delivery transport such as UDP) or to the increased latency of the | delivery transport such as UDP) or to the increased latency of the | |||
| packet caused by the need to retransmit the packet after a loss (with | packet caused by the need to retransmit the packet after a loss (with | |||
| the reliable data delivery semantics provided by TCP). | the reliable data delivery semantics provided by TCP). | |||
| Since TCP determines the appropriate congestion window to use by | Since TCP determines the appropriate congestion window to use by | |||
| gradually increasing the window size until it experiences a dropped | gradually increasing the window size until it experiences a dropped | |||
| packet, this causes the queues at the bottleneck router to build up. | packet, this causes the queues at the bottleneck router to build up. | |||
| With most packet drop policies at the router that are not sensitive | With most packet drop policies at the router that are not sensitive | |||
| to the load placed by each individual flow (e.g., tail-drop on queue | to the load placed by each individual flow (e.g., tail-drop on queue | |||
| overflow), this means that some of the packets of latency-sensitive | overflow), this means that some of the packets of latency-sensitive | |||
| flows may be dropped. In addition, such drop policies lead to syn- | flows may be dropped. In addition, such drop policies lead to | |||
| chronization of loss across multiple flows. | synchronization of loss across multiple flows. | |||
| Active queue management mechanisms detect congestion before the queue | Active queue management mechanisms detect congestion before the queue | |||
| overflows, and provide an indication of this congestion to the end | overflows, and provide an indication of this congestion to the end | |||
| nodes. Thus, active queue management can reduce unnecessary queueing | nodes. Thus, active queue management can reduce unnecessary queueing | |||
| delay for all traffic sharing that queue. The advantages of active | delay for all traffic sharing that queue. The advantages of active | |||
| queue management are discussed in RFC 2309 [RFC2309]. Active queue | queue management are discussed in RFC 2309 [RFC2309]. Active queue | |||
| management avoids some of the bad properties of dropping on queue | management avoids some of the bad properties of dropping on queue | |||
| overflow, including the undesirable synchronization of loss across | overflow, including the undesirable synchronization of loss across | |||
| multiple flows. More importantly, active queue management means that | multiple flows. More importantly, active queue management means that | |||
| transport protocols with mechanisms for congestion control (e.g., | transport protocols with mechanisms for congestion control (e.g., | |||
| TCP) do not have to rely on buffer overflow as the only indication of | TCP) do not have to rely on buffer overflow as the only indication of | |||
| congestion. | congestion. | |||
| Active queue management mechanisms may use one of several methods for | Active queue management mechanisms may use one of several methods for | |||
| indicating congestion to end-nodes. One is to use packet drops, as is | indicating congestion to end-nodes. One is to use packet drops, as is | |||
| currently done. However, active queue management allows the router to | currently done. However, active queue management allows the router to | |||
| separate policies of queueing or dropping packets from the policies | separate policies of queueing or dropping packets from the policies | |||
| for indicating congestion. Thus, active queue management allows | for indicating congestion. Thus, active queue management allows | |||
| routers to use the Congestion Experienced (CE) bit in a packet header | routers to use the Congestion Experienced (CE) codepoint in a packet | |||
| as an indication of congestion, instead of relying solely on packet | header as an indication of congestion, instead of relying solely on | |||
| drops. This has the potential of reducing the impact of loss on | packet drops. This has the potential of reducing the impact of loss | |||
| latency-sensitive flows. | on latency-sensitive flows. | |||
| This document is intended to obsolete RFC 2481, "A Proposal to add | This document is intended to obsolete RFC 2481, "A Proposal to add | |||
| Explicit Congestion Notification (ECN) to IP", which defined ECN as | Explicit Congestion Notification (ECN) to IP", which defined ECN as | |||
| an Experimental Protocol for the Internet Community. | an Experimental Protocol for the Internet Community. | |||
| RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This | RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This | |||
| document obsoletes three subsequent internet-drafts on ECN, "IPsec | document obsoletes three subsequent internet-drafts on ECN, "IPsec | |||
| Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP | Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP | |||
| with ECN: The Treatment of Retransmitted Data Packets". This | with ECN: The Treatment of Retransmitted Data Packets". This | |||
| document is intended largely to merge the earlier documents all into | document is intended largely to merge the earlier documents all into | |||
| skipping to change at page 6, line 19 | skipping to change at page 6, line 18 | |||
| The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, | |||
| SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this | |||
| document, are to be interpreted as described in [B97]. | document, are to be interpreted as described in [B97]. | |||
| 3. Assumptions and General Principles | 3. Assumptions and General Principles | |||
| In this section, we describe some of the important design principles | In this section, we describe some of the important design principles | |||
| and assumptions that guided the design choices in this proposal. | and assumptions that guided the design choices in this proposal. | |||
| * Because ECN is likely to be adopted gradually, accommodating migra- | * Because ECN is likely to be adopted gradually, accommodating | |||
| tion is essential. Some routers may still only drop packets to indi- | migration is essential. Some routers may still only drop packets to | |||
| cate congestion, and some end-systems may not be ECN-capable. The | indicate congestion, and some end-systems may not be ECN-capable. The | |||
| most viable strategy is one that accommodates incremental deployment | most viable strategy is one that accommodates incremental deployment | |||
| without having to resort to "islands" of ECN-capable and non-ECN- | without having to resort to "islands" of ECN-capable and non-ECN- | |||
| capable environments. | capable environments. | |||
| * New mechanisms for congestion control and avoidance need to co- | * New mechanisms for congestion control and avoidance need to co- | |||
| exist and cooperate with existing mechanisms for congestion control. | exist and cooperate with existing mechanisms for congestion control. | |||
| In particular, new mechanisms have to co-exist with TCP's current | In particular, new mechanisms have to co-exist with TCP's current | |||
| methods of adapting to congestion and with routers' current practice | methods of adapting to congestion and with routers' current practice | |||
| of dropping packets in periods of congestion. | of dropping packets in periods of congestion. | |||
| * Congestion may persist over different time-scales. The time scales | * Congestion may persist over different time-scales. The time scales | |||
| that we are concerned with are congestion events that may last longer | that we are concerned with are congestion events that may last longer | |||
| than a round-trip time. | than a round-trip time. | |||
| * The number of packets in an individual flow (e.g., TCP connection | * The number of packets in an individual flow (e.g., TCP connection | |||
| or an exchange using UDP) may range from a small number of packets to | or an exchange using UDP) may range from a small number of packets to | |||
| quite a large number. We are interested in managing the congestion | quite a large number. We are interested in managing the congestion | |||
| caused by flows that send enough packets so that they are still | caused by flows that send enough packets so that they are still | |||
| active when network feedback reaches them. | active when network feedback reaches them. | |||
| * Asymmetric routing is likely to be a normal occurrence in the | * Asymmetric routing is likely to be a normal occurrence in the | |||
| Internet. The path (sequence of links and routers) followed by data | Internet. The path (sequence of links and routers) followed by data | |||
| packets may be different from the path followed by the acknowledgment | packets may be different from the path followed by the acknowledgment | |||
| packets in the reverse direction. | packets in the reverse direction. | |||
| * Many routers process the "regular" headers in IP packets more effi- | * Many routers process the "regular" headers in IP packets more | |||
| ciently than they process the header information in IP options. This | efficiently than they process the header information in IP options. | |||
| suggests keeping congestion experienced information in the regular | This suggests keeping congestion experienced information in the | |||
| headers of an IP packet. | regular headers of an IP packet. | |||
| * It must be recognized that not all end-systems will cooperate in | * It must be recognized that not all end-systems will cooperate in | |||
| mechanisms for congestion control. However, new mechanisms shouldn't | mechanisms for congestion control. However, new mechanisms shouldn't | |||
| make it easier for TCP applications to disable TCP congestion con- | make it easier for TCP applications to disable TCP congestion | |||
| trol. The benefit of lying about participating in new mechanisms | control. The benefit of lying about participating in new mechanisms | |||
| such as ECN-capability should be small. | such as ECN-capability should be small. | |||
| 4. Active Queue Management (AQM) | 4. Active Queue Management (AQM) | |||
| Random Early Detection (RED) is one mechanism for Active Queue Man- | Random Early Detection (RED) is one mechanism for Active Queue | |||
| agement (AQM) that has been proposed to detect incipient congestion | Management (AQM) that has been proposed to detect incipient | |||
| [FJ93], and is currently being deployed in the Internet [RFC2309]. | congestion [FJ93], and is currently being deployed in the Internet | |||
| AQM is meant to be a general mechanism using one of several alterna- | [RFC2309]. AQM is meant to be a general mechanism using one of | |||
| tives for congestion indication, but in the absence of ECN, AQM is | several alternatives for congestion indication, but in the absence of | |||
| restricted to using packet drops as a mechanism for congestion indi- | ECN, AQM is restricted to using packet drops as a mechanism for | |||
| cation. AQM drops packets based on the average queue length exceed- | congestion indication. AQM drops packets based on the average queue | |||
| ing a threshold, rather than only when the queue overflows. However, | length exceeding a threshold, rather than only when the queue | |||
| because AQM may drop packets before the queue actually overflows, AQM | overflows. However, because AQM may drop packets before the queue | |||
| is not always forced by memory limitations to discard the packet. | actually overflows, AQM is not always forced by memory limitations to | |||
| discard the packet. | ||||
| AQM can set a Congestion Experienced (CE) bit in the packet header | AQM can set a Congestion Experienced (CE) codepoint in the packet | |||
| instead of dropping the packet, when such a bit is provided in the IP | header instead of dropping the packet, when such a field is provided | |||
| header and understood by the transport protocol. The use of the CE | in the IP header and understood by the transport protocol. The use | |||
| bit with ECN allows the receiver(s) to receive the packet, avoiding | of the CE codepoint with ECN allows the receiver(s) to receive the | |||
| the potential for excessive delays due to retransmissions after | packet, avoiding the potential for excessive delays due to | |||
| packet losses. We use the term 'CE packet' to denote a packet that | retransmissions after packet losses. We use the term 'CE packet' to | |||
| has the CE bit set. | denote a packet that has the CE codepoint set. | |||
| 5. Explicit Congestion Notification in IP | 5. Explicit Congestion Notification in IP | |||
| This document specifies that the Internet provide a congestion indi- | This document specifies that the Internet provide a congestion | |||
| cation for incipient congestion (as in RED and earlier work [RJ90]) | indication for incipient congestion (as in RED and earlier work | |||
| where the notification can sometimes be through marking packets | [RJ90]) where the notification can sometimes be through marking | |||
| rather than dropping them. This uses an ECN field in the IP header | packets rather than dropping them. This uses an ECN field in the IP | |||
| with two bits. The ECN-Capable Transport (ECT) bit is set by the | header with two bits, making four ECN codepoints, '00' to '11'. The | |||
| ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the | ||||
| data sender to indicate that the end-points of the transport protocol | data sender to indicate that the end-points of the transport protocol | |||
| are ECN-capable. The CE bit is set by the router to indicate conges- | are ECN-capable; we call them ECT(0) and ECT(1) respectively. The | |||
| tion to the end nodes. Routers that have a packet arriving at a full | phrase "the ECT codepoint" in this document refers to either of the | |||
| queue drop the packet, just as they do in the absence of ECN. | two ECT codepoints. Routers treat the ECT(0) and ECT(1) codepoints | |||
| as equivalent. Senders are free to use either the ECT(0) or the | ||||
| ECT(1) codepoint to indicate ECT, on a packet-by-packet basis. | ||||
| Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. | The use of two codepoints for ECT, ECT(0) and ECT(1), is | |||
| Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE | motivated primarily by the desire to allow mechanisms for the data | |||
| bit. The IPv4 TOS octet corresponds to the Traffic Class octet in | sender to verify that network elements are not erasing the CE | |||
| IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6 | codepoint, and that data receivers are properly reporting to the | |||
| Traffic Class octet have been superseded by the six-bit DS (Differen- | sender the receipt of packets with the CE codepoint set, as required | |||
| tiated Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed | by the transport protocol. Guidelines for the senders and receivers | |||
| in [RFC2474] as Currently Unused, and are specified in RFC 2780 as | to differentiate between the ECT(0) and ECT(1) codepoints will be | |||
| approved for experimental use for ECN. Section 19 gives a brief his- | addressed in separate documents, for each transport protocol. In | |||
| tory of the TOS octet. | particular, this document does not address mechanisms for TCP end- | |||
| nodes to differentiate between the ECT(0) and ECT(1) codepoints. | ||||
| Protocols and senders that only require a single ECT codepoint SHOULD | ||||
| use ECT(0). | ||||
| The not-ECT codepoint '00' indicates a packet that is not using ECN. | ||||
| The CE codepoint '11' is set by a router to indicate congestion to | ||||
| the end nodes. Routers that have a packet arriving at a full queue | ||||
| drop the packet, just as they do in the absence of ECN. | ||||
| +-----+-----+ | ||||
| | ECN FIELD | | ||||
| +-----+-----+ | ||||
| ECT CE The ECT and CE bits defined in RFC 2481. | ||||
| 0 0 Not-ECT | ||||
| 0 1 ECT(1) | ||||
| 1 0 ECT(0) | ||||
| 1 1 CE | ||||
| Figure 1: The ECN Field in IP. | ||||
| The use of two ECT codepoints essentially gives a one-bit ECN nonce | ||||
| in packet headers, and routers necessarily "erase" the nonce when | ||||
| they set the CE codepoint [SCWA99]. For example, routers that erased | ||||
| the CE codepoint would face additional difficulty in reconstructing | ||||
| the original nonce, and thus repeated erasure of the CE codepoint | ||||
| would be more likely to be detected by the end-nodes. The ECN nonce | ||||
| also can address the problem of misbehaving transport receivers lying | ||||
| to the transport sender about whether or not the CE codepoint was set | ||||
| in a packet. The motivation for the use of two ECT codepoints is | ||||
| discussed in more detail in Section 20, along with some discussion of | ||||
| alternate possibilities for the fourth ECT codepoint. Backwards | ||||
| compatibility with earlier ECN implementations that do not understand | ||||
| the ECT(1) codepoint is discussed in Section 11. | ||||
| In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable | ||||
| Transport (ECT) bit and the CE bit. The ECN field with only the ECN- | ||||
| Capable Transport (ECT) bit set in RFC 2481 corresponds to the ECT(0) | ||||
| codepoint in this document, and the ECN field with both the ECT and | ||||
| CE bits set in RFC 2481 corresponds to the CE codepoint in this document. | ||||
| The '01' codepoint was left undefined in RFC 2481, and this is the | ||||
| reason for recommending the use of ECT(0) when only a single ECT | ||||
| codepoint is needed. | ||||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | DS FIELD | ECN FIELD | | | DS FIELD, DSCP | ECN FIELD | | |||
| | | | | ||||
| | DSCP | ECT | CE | | ||||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| DSCP: differentiated services codepoint | DSCP: differentiated services codepoint | |||
| ECN: Explicit Congestion Notification | ECN: Explicit Congestion Notification | |||
| Figure 1: The Differentiated Services and ECN Fields in IP. | Figure 2: The Differentiated Services and ECN Fields in IP. | |||
| Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. | ||||
| The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6, | ||||
| and the ECN field is defined identically in both cases. The | ||||
| definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic | ||||
| Class octet have been superseded by the six-bit DS (Differentiated | ||||
| Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in | ||||
| [RFC2474] as Currently Unused, and are specified in RFC 2780 as | ||||
| approved for experimental use for ECN. Section 22 gives a brief | ||||
| history of the TOS octet. | ||||
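
As an illustration of the layout described above (a minimal sketch, not part of either draft), the following Python splits an IPv4 TOS or IPv6 Traffic Class octet into the six-bit DSCP and the two-bit ECN field, using the codepoint values from Figure 1. The function and constant names are assumptions made for this example.

```python
from typing import Tuple

# Codepoint values as listed in Figure 1: bit 6 is the high-order bit
# of the two-bit ECN field and bit 7 is the low-order bit.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11

ECN_NAMES = {NOT_ECT: "Not-ECT", ECT_1: "ECT(1)", ECT_0: "ECT(0)", CE: "CE"}


def split_tos_octet(octet: int) -> Tuple[int, int]:
    """Return (dscp, ecn) for an IPv4 TOS / IPv6 Traffic Class octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must fit in 8 bits")
    dscp = octet >> 2     # bits 0-5: the six most significant bits
    ecn = octet & 0b11    # bits 6-7: the two least significant bits
    return dscp, ecn


def set_ecn(octet: int, ecn: int) -> int:
    """Return the octet with its ECN field replaced and the DSCP untouched."""
    return (octet & 0xFC) | (ecn & 0b11)
```

For example, under these definitions a sender that needs only one ECT codepoint would call set_ecn(octet, ECT_0), in line with the SHOULD stated earlier.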
| Because of the unstable history of the TOS octet, the use of the ECN | Because of the unstable history of the TOS octet, the use of the ECN | |||
| field as specified in this document cannot be guaranteed to be back- | field as specified in this document cannot be guaranteed to be | |||
| wards compatible with all past uses of these two bits. The potential | backwards compatible with those past uses of these two bits that pre- | |||
| dangers of this lack of backwards compatibility are discussed in Sec- | date ECN. The potential dangers of this lack of backwards | |||
| tion 19. | compatibility are discussed in Section 22. | |||
| Upon the receipt by an ECN-Capable transport of a single CE packet, | Upon the receipt by an ECN-Capable transport of a single CE packet, | |||
| the congestion control algorithms followed at the end-systems MUST be | the congestion control algorithms followed at the end-systems MUST be | |||
| essentially the same as the congestion control response to a *single* | essentially the same as the congestion control response to a *single* | |||
| dropped packet. For example, for ECN-Capable TCP the source TCP is | dropped packet. For example, for ECN-Capable TCP the source TCP is | |||
| required to halve its congestion window for any window of data con- | required to halve its congestion window for any window of data | |||
| taining either a packet drop or an ECN indication. | containing either a packet drop or an ECN indication. | |||
| One reason for requiring that the congestion-control response to the | One reason for requiring that the congestion-control response to the | |||
| CE packet be essentially the same as the response to a dropped packet | CE packet be essentially the same as the response to a dropped packet | |||
| is to accommodate the incremental deployment of ECN in both end-sys- | is to accommodate the incremental deployment of ECN in both end- | |||
| tems and in routers. Some routers may drop ECN-Capable packets | systems and in routers. Some routers may drop ECN-Capable packets | |||
| (e.g., using the same AQM policies for congestion detection) while | (e.g., using the same AQM policies for congestion detection) while | |||
| other routers set the CE bit, for equivalent levels of congestion. | other routers set the CE codepoint, for equivalent levels of | |||
| Similarly, a router might drop a non-ECN-Capable packet but set the | congestion. Similarly, a router might drop a non-ECN-Capable packet | |||
| CE bit in an ECN-Capable packet, for equivalent levels of congestion. | but set the CE codepoint in an ECN-Capable packet, for equivalent | |||
| If there were different congestion control responses to a CE bit | levels of congestion. If there were different congestion control | |||
| indication than to a packet drop, this could result in unfair treat- | responses to a CE codepoint than to a packet drop, this could result | |||
| ment for different flows. | in unfair treatment for different flows. | |||
| An additional goal is that the end-systems should react to congestion | An additional goal is that the end-systems should react to congestion | |||
| at most once per window of data (i.e., at most once per round-trip | at most once per window of data (i.e., at most once per round-trip | |||
| time), to avoid reacting multiple times to multiple indications of | time), to avoid reacting multiple times to multiple indications of | |||
| congestion within a round-trip time. | congestion within a round-trip time. | |||
| For a router, the CE bit of an ECN-Capable packet should only be set | For a router, the CE codepoint of an ECN-Capable packet SHOULD only | |||
| if the router would otherwise have dropped the packet as an indica- | be set if the router would otherwise have dropped the packet as an | |||
| tion of congestion to the end nodes. When the router's buffer is not | indication of congestion to the end nodes. When the router's buffer | |||
| yet full and the router is prepared to drop a packet to inform end | is not yet full and the router is prepared to drop a packet to inform | |||
| nodes of incipient congestion, the router should first check to see | end nodes of incipient congestion, the router should first check to | |||
| if the ECT bit is set in that packet's IP header. If so, then | see if the ECT codepoint is set in that packet's IP header. If so, | |||
| instead of dropping the packet, the router MAY instead set the CE bit | then instead of dropping the packet, the router MAY instead set the | |||
| in the IP header. | CE codepoint in the IP header. | |||
| An environment where all end nodes were ECN-Capable could allow new | An environment where all end nodes were ECN-Capable could allow new | |||
| criteria to be developed for setting the CE bit, and new congestion | criteria to be developed for setting the CE codepoint, and new | |||
| control mechanisms for end-node reaction to CE packets. However, | congestion control mechanisms for end-node reaction to CE packets. | |||
| this is a research issue, and as such is not addressed in this docu- | However, this is a research issue, and as such is not addressed in | |||
| ment. | this document. | |||
| When a CE packet (i.e., a packet that has the CE bit set) is received | When a CE packet (i.e., a packet that has the CE codepoint set) is | |||
| by a router, the CE bit is left unchanged, and the packet is trans- | received by a router, the CE codepoint is left unchanged, and the | |||
| mitted as usual. When severe congestion has occurred and the router's | packet is transmitted as usual. When severe congestion has occurred | |||
| queue is full, then the router has no choice but to drop some packet | and the router's queue is full, then the router has no choice but to | |||
| when a new packet arrives. We anticipate that such packet losses | drop some packet when a new packet arrives. We anticipate that such | |||
| will become relatively infrequent when a majority of end-systems | packet losses will become relatively infrequent when a majority of | |||
| become ECN-Capable and participate in TCP or other compatible conges- | end-systems become ECN-Capable and participate in TCP or other | |||
| tion control mechanisms. In an ECN-Capable environment that is ade- | compatible congestion control mechanisms. In an ECN-Capable | |||
| quately-provisioned network, packet losses should occur primarily | environment that is adequately-provisioned, packet losses should | |||
| during transients or in the presence of non-cooperating sources. | occur primarily during transients or in the presence of non- | |||
| cooperating sources. | ||||
| We expect that routers will set the CE bit in response to incipient | The above discussion of when CE may be set instead of dropping a | |||
| congestion as indicated by the average queue size, using the RED | packet applies by default to all Differentiated Services Per-Hop | |||
| algorithms suggested in [FJ93, RFC2309]. To the best of our knowl- | Behaviors (PHBs) [RFC 2475]. Specifications for PHBs MAY provide | |||
| edge, this is the only proposal currently under discussion in the | more specifics on how a compliant implementation is to choose between | |||
| IETF for routers to drop packets proactively, before the buffer over- | setting CE and dropping a packet, but this is NOT REQUIRED. A router | |||
| flows. However, this document does not attempt to specify a particu- | MUST NOT set CE instead of dropping a packet when the drop that would | |||
| lar mechanism for active queue management, leaving that endeavor, if | occur is caused by reasons other than congestion or the desire to | |||
| needed, to other areas of the IETF. While ECN is inextricably tied | indicate incipient congestion to end nodes (e.g., a diffserv edge | |||
| up with the need to have a reasonable active queue management mecha- | node may be configured to unconditionally drop certain classes of | |||
| nism at the router, the reverse does not hold; active queue manage- | traffic to prevent them from entering its diffserv domain). | |||
| ment mechanisms have been developed and deployed independent of ECN, | ||||
| using packet drops as indications of congestion in the absence of ECN | We expect that routers will set the CE codepoint in response to | |||
| in the IP architecture. | incipient congestion as indicated by the average queue size, using | |||
| the RED algorithms suggested in [FJ93, RFC2309]. To the best of our | ||||
| knowledge, this is the only proposal currently under discussion in | ||||
| the IETF for routers to drop packets proactively, before the buffer | ||||
| overflows. However, this document does not attempt to specify a | ||||
| particular mechanism for active queue management, leaving that | ||||
| endeavor, if needed, to other areas of the IETF. While ECN is | ||||
| inextricably tied up with the need to have a reasonable active queue | ||||
| management mechanism at the router, the reverse does not hold; active | ||||
| queue management mechanisms have been developed and deployed | ||||
| independent of ECN, using packet drops as indications of congestion | ||||
| in the absence of ECN in the IP architecture. | ||||
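
The router rules in this section can be summarized in a small decision function. This is a sketch under stated assumptions: the Action enum, the parameter names, and the prefer_marking switch (which models the fact that marking is a MAY) are hypothetical and do not come from the draft. A packet that already carries CE is treated here as forwarded unchanged when the AQM would otherwise signal congestion, which is one reading of the text.

```python
from enum import Enum

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class Action(Enum):
    FORWARD = "forward"
    MARK_CE = "set CE and forward"
    DROP = "drop"


def router_action(ecn, queue_full, aqm_signals_congestion,
                  policy_drop=False, prefer_marking=True):
    """Decide what a router does with one arriving packet.

    ecn: ECN codepoint carried by the packet.
    queue_full: True when no buffer space remains.
    aqm_signals_congestion: True when active queue management would
        drop this packet to signal incipient congestion.
    policy_drop: True when the drop has a cause other than congestion
        (for example a diffserv edge filter).
    prefer_marking: local choice, since setting CE is optional.
    """
    if policy_drop:
        return Action.DROP        # CE must not replace a non-congestion drop
    if queue_full:
        return Action.DROP        # a full queue leaves no alternative
    if aqm_signals_congestion:
        if ecn == CE:
            return Action.FORWARD  # already marked; codepoint left unchanged
        if ecn in (ECT_0, ECT_1) and prefer_marking:
            return Action.MARK_CE
        return Action.DROP        # Not-ECT, or the router chose to drop
    return Action.FORWARD
```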
| 5.1. ECN as an Indication of Persistent Congestion | 5.1. ECN as an Indication of Persistent Congestion | |||
| We emphasize that a *single* packet with the CE bit set in an IP | We emphasize that a *single* packet with the CE codepoint set in an | |||
| packet causes the transport layer to respond, in terms of congestion | IP packet causes the transport layer to respond, in terms of | |||
| control, as it would to a packet drop. The instantaneous queue size | congestion control, as it would to a packet drop. The instantaneous | |||
| is likely to see considerable variations even when the router does | queue size is likely to see considerable variations even when the | |||
| not experience persistent congestion. As such, it is important that | router does not experience persistent congestion. As such, it is | |||
| transient congestion at a router, reflected by the instantaneous | important that transient congestion at a router, reflected by the | |||
| queue size reaching a threshold much smaller than the capacity of the | instantaneous queue size reaching a threshold much smaller than the | |||
| queue, not trigger a reaction at the transport layer. Therefore, the | capacity of the queue, not trigger a reaction at the transport layer. | |||
| CE bit should not be set by a router based on the instantaneous queue | Therefore, the CE codepoint should not be set by a router based on | |||
| size. | the instantaneous queue size. | |||
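
To illustrate why the average rather than the instantaneous queue size is used, the sketch below applies an exponentially weighted moving average of the kind used by RED [FJ93]. The weight and threshold values are illustrative assumptions, and the probabilistic marking that RED performs between its thresholds is deliberately left out.

```python
def update_average(avg, instantaneous, weight=0.002):
    """One averaging step: avg <- (1 - w) * avg + w * q."""
    return (1.0 - weight) * avg + weight * instantaneous


def congestion_indicated(avg, min_threshold=5.0):
    """Compare the average queue size, not the instantaneous one."""
    return avg > min_threshold


def average_after(samples, avg=0.0):
    """Feed a sequence of instantaneous queue sizes through the filter."""
    for q in samples:
        avg = update_average(avg, q)
    return avg
```

With these assumed values, a single sample of 50 packets leaves the average far below the threshold, while a queue that stays at 50 packets for a few hundred samples pushes the average above it. The first case is transient congestion and produces no mark; the second is persistent congestion.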
| For example, since the ATM and Frame Relay mechanisms for congestion | For example, since the ATM and Frame Relay mechanisms for congestion | |||
| indication have typically been defined without an associated notion | indication have typically been defined without an associated notion | |||
| of average queue size as the basis for determining that an intermedi- | of average queue size as the basis for determining that an | |||
| ate node is congested, we believe that they provide a very noisy sig- | intermediate node is congested, we believe that they provide a very | |||
| nal. The TCP-sender reaction specified in this document for ECN is | noisy signal. The TCP-sender reaction specified in this document for | |||
| NOT the appropriate reaction for such a noisy signal of congestion | ECN is NOT the appropriate reaction for such a noisy signal of | |||
| notification. However, if the routers that interface to the ATM net- | congestion notification. However, if the routers that interface to | |||
| work have a way of maintaining the average queue at the interface, | the ATM network have a way of maintaining the average queue at the | |||
| and use it to come to a reliable determination that the ATM subnet is | interface, and use it to come to a reliable determination that the | |||
| congested, they may use the ECN notification that is defined here. | ATM subnet is congested, they may use the ECN notification that is | |||
| defined here. | ||||
| We continue to encourage experiments in techniques at layer 2 (e.g., | We continue to encourage experiments in techniques at layer 2 (e.g., | |||
| in ATM switches or Frame Relay switches) to take advantage of ECN. | in ATM switches or Frame Relay switches) to take advantage of ECN. | |||
| For example, using a scheme such as RED (where packet marking is | For example, using a scheme such as RED (where packet marking is | |||
| based on the average queue length exceeding a threshold), layer 2 | based on the average queue length exceeding a threshold), layer 2 | |||
| devices could provide a reasonably reliable indication of congestion. | devices could provide a reasonably reliable indication of congestion. | |||
| When all the layer 2 devices in a path set that layer's own Conges- | When all the layer 2 devices in a path set that layer's own | |||
| tion Experienced bit (e.g., the EFCI bit for ATM, the FECN bit in | Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the | |||
| Frame Relay) in this reliable manner, then the interface router to | FECN bit in Frame Relay) in this reliable manner, then the interface | |||
| the layer 2 network could copy the state of that layer 2 Congestion | router to the layer 2 network could copy the state of that layer 2 | |||
| Experienced bit into the CE bit in the IP header. We recognize that | Congestion Experienced codepoint into the CE codepoint in the IP | |||
| this is not the current practice, nor is it in current standards. | header. We recognize that this is not the current practice, nor is | |||
| However, encouraging experimentation in this manner may provide the | it in current standards. However, encouraging experimentation in this | |||
| information needed to enable evolution of existing layer 2 mechanisms | manner may provide the information needed to enable evolution of | |||
| to provide a more reliable means of congestion indication, when they | existing layer 2 mechanisms to provide a more reliable means of | |||
| use a single bit for indicating congestion. | congestion indication, when they use a single bit for indicating | |||
| congestion. | ||||
| 5.2. Dropped or Corrupted Packets | 5.2. Dropped or Corrupted Packets | |||
| For the proposed use for ECN in this document (that is, for a trans- | For the proposed use for ECN in this document (that is, for a | |||
| port protocol such as TCP for which a dropped data packet is an indi- | transport protocol such as TCP for which a dropped data packet is an | |||
| cation of congestion), end nodes detect dropped data packets, and the | indication of congestion), end nodes detect dropped data packets, and | |||
| congestion response of the end nodes to a dropped data packet is at | the congestion response of the end nodes to a dropped data packet is | |||
| least as strong as the congestion response to a received CE packet. | at least as strong as the congestion response to a received CE | |||
| To ensure the reliable delivery of the congestion indication of the | packet. To ensure the reliable delivery of the congestion indication | |||
| CE bit, the ECT bit MUST NOT be set in a packet unless the loss of | of the CE codepoint, an ECT codepoint MUST NOT be set in a packet | |||
| that packet in the network would be detected by the end nodes and | unless the loss of that packet in the network would be detected by | |||
| interpreted as an indication of congestion. | the end nodes and interpreted as an indication of congestion. | |||
| Transport protocols such as TCP do not necessarily detect all packet | Transport protocols such as TCP do not necessarily detect all packet | |||
| drops, such as the drop of a "pure" ACK packet; for example, TCP does | drops, such as the drop of a "pure" ACK packet; for example, TCP does | |||
| not reduce the arrival rate of subsequent ACK packets in response to | not reduce the arrival rate of subsequent ACK packets in response to | |||
| an earlier dropped ACK packet. Any proposal for extending ECN-Capa- | an earlier dropped ACK packet. Any proposal for extending ECN- | |||
| bility to such packets would have to address issues such as the case | Capability to such packets would have to address issues such as the | |||
| of an ACK packet that was marked with the CE bit but was later | case of an ACK packet that was marked with the CE codepoint but was | |||
| dropped in the network. We believe that this aspect is still the sub- | later dropped in the network. We believe that this aspect is still | |||
| ject of research, so this document specifies that at this time, | the subject of research, so this document specifies that at this | |||
| "pure" ACK packets MUST NOT indicate ECN-Capability. | time, "pure" ACK packets MUST NOT indicate ECN-Capability. | |||
| Similarly, if a CE packet is dropped later in the network due to cor- | Similarly, if a CE packet is dropped later in the network due to | |||
| ruption (bit errors), the end nodes should still invoke congestion | corruption (bit errors), the end nodes should still invoke congestion | |||
| control, just as TCP would today in response to a dropped data | control, just as TCP would today in response to a dropped data | |||
| packet. This issue of corrupted CE packets would have to be consid- | packet. This issue of corrupted CE packets would have to be | |||
| ered in any proposal for the network to distinguish between packets | considered in any proposal for the network to distinguish between | |||
| dropped due to corruption, and packets dropped due to congestion or | packets dropped due to corruption, and packets dropped due to | |||
| buffer overflow. In particular, the ubiquitous deployment of ECN | congestion or buffer overflow. In particular, the ubiquitous | |||
| would not, in and of itself, be a sufficient development to allow | deployment of ECN would not, in and of itself, be a sufficient | |||
| end-nodes to interpret packet drops as indications of corruption | development to allow end-nodes to interpret packet drops as | |||
| rather than congestion. | indications of corruption rather than congestion. | |||
| 5.3. Fragmentation | ||||
| All ECN-capable packets SHOULD have the DF (Don't Fragment) bit set. | ||||
| Reassembly of a fragmented packet MUST NOT lose indications of | ||||
| congestion. In other words, if any fragment of an IP packet to be | ||||
| reassembled has the CE codepoint set, then one of two actions MUST be | ||||
| taken: | ||||
| * The reassembled packet has the CE codepoint set. This MUST NOT | ||||
| occur if any of the other fragments contributing to this | ||||
| reassembly carries the Not-ECT codepoint. | ||||
| * The packet is dropped instead of being reassembled. | ||||
| If both actions are applicable, either MAY be chosen. Reassembly of | ||||
| a fragmented packet MUST NOT change the ECN codepoint when all of the | ||||
| fragments carry the same codepoint. | ||||
| Situations may arise in which the above specification is | ||||
| insufficiently precise. For example, it does not place requirements | ||||
| on reassembly of fragments that carry a mixture of ECT(0), ECT(1) | ||||
| and/or Not-ECT. In situations where more precise reassembly behavior | ||||
| would be required, protocol specifications SHOULD instead specify | ||||
| that DF MUST be set in all packets sent by the protocol. | ||||
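
The reassembly rules above can be sketched as follows. The function name, the DROP sentinel, and the prefer_drop switch are assumptions made for this example. Mixtures of ECT(0), ECT(1) and Not-ECT without CE are not covered by the text, so the sketch raises an error for them rather than guessing a behavior.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

DROP = None  # sentinel: the packet is dropped instead of being reassembled


def reassembled_ecn(fragment_ecns, prefer_drop=False):
    """Return the ECN codepoint for the reassembled packet, or DROP."""
    codepoints = set(fragment_ecns)
    if not codepoints:
        raise ValueError("at least one fragment is required")
    if len(codepoints) == 1:
        return codepoints.pop()   # all fragments agree: codepoint unchanged
    if CE in codepoints:
        # The congestion indication must not be lost: set CE or drop.
        if NOT_ECT in codepoints or prefer_drop:
            return DROP           # CE is not allowed next to Not-ECT
        return CE
    # Mixed ECT(0) / ECT(1) / Not-ECT without CE: left open by the text.
    raise ValueError("reassembly behavior not specified for this mixture")
```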
| 6. Support from the Transport Protocol | 6. Support from the Transport Protocol | |||
| ECN requires support from the transport protocol, in addition to the | ECN requires support from the transport protocol, in addition to the | |||
| functionality given by the ECN field in the IP packet header. The | functionality given by the ECN field in the IP packet header. The | |||
| transport protocol might require negotiation between the endpoints | transport protocol might require negotiation between the endpoints | |||
| during setup to determine that all of the endpoints are ECN-capable, | during setup to determine that all of the endpoints are ECN-capable, | |||
| so that the sender can set the ECT bit in transmitted packets. Sec- | so that the sender can set the ECT codepoint in transmitted packets. | |||
| ond, the transport protocol must be capable of reacting appropriately | Second, the transport protocol must be capable of reacting | |||
| to the receipt of CE packets. This reaction could be in the form of | appropriately to the receipt of CE packets. This reaction could be | |||
| the data receiver informing the data sender of the received CE packet | in the form of the data receiver informing the data sender of the | |||
| (e.g., TCP), of the data receiver unsubscribing to a layered multi- | received CE packet (e.g., TCP), of the data receiver unsubscribing to | |||
| cast group (e.g., RLM [MJV96]), or of some other action that ulti- | a layered multicast group (e.g., RLM [MJV96]), or of some other | |||
| mately reduces the arrival rate of that flow on that congested link. | action that ultimately reduces the arrival rate of that flow on that | |||
| congested link. CE packets indicate persistent rather than transient | ||||
| congestion (see Section 5.1), and hence reactions to the receipt of | ||||
| CE packets should be those appropriate for persistent congestion. | ||||
| This document only addresses the addition of ECN Capability to TCP, | This document only addresses the addition of ECN Capability to TCP, | |||
| leaving issues of ECN in other transport protocols to further | leaving issues of ECN in other transport protocols to further | |||
| research. For TCP, ECN requires three new pieces of functionality: | research. For TCP, ECN requires three new pieces of functionality: | |||
| negotiation between the endpoints during connection setup to deter- | negotiation between the endpoints during connection setup to | |||
| mine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP | determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the | |||
| header so that the data receiver can inform the data sender when a CE | TCP header so that the data receiver can inform the data sender when | |||
| packet has been received; and a Congestion Window Reduced (CWR) flag | a CE packet has been received; and a Congestion Window Reduced (CWR) | |||
| in the TCP header so that the data sender can inform the data | flag in the TCP header so that the data sender can inform the data | |||
| receiver that the congestion window has been reduced. The support | receiver that the congestion window has been reduced. The support | |||
| required from other transport protocols is likely to be different, | required from other transport protocols is likely to be different, | |||
| particularly for unreliable or reliable multicast transport proto- | particularly for unreliable or reliable multicast transport | |||
| cols, and will have to be determined as other transport protocols are | protocols, and will have to be determined as other transport | |||
| brought to the IETF for standardization. | protocols are brought to the IETF for standardization. | |||
| 6.1. TCP | 6.1. TCP | |||
| The following sections describe in detail the proposed use of ECN in | The following sections describe in detail the proposed use of ECN in | |||
| TCP. This proposal is described in essentially the same form in | TCP. This proposal is described in essentially the same form in | |||
| [Floyd94]. We assume that the source TCP uses the standard congestion | [Floyd94]. We assume that the source TCP uses the standard congestion | |||
| control algorithms of Slow-start, Fast Retransmit and Fast Recovery | control algorithms of Slow-start, Fast Retransmit and Fast Recovery | |||
| [RFC 2001]. | [RFC 2001]. | |||
| This proposal specifies two new flags in the Reserved field of the | This proposal specifies two new flags in the Reserved field of the | |||
| TCP header. The TCP mechanism for negotiating ECN-Capability uses | TCP header. The TCP mechanism for negotiating ECN-Capability uses | |||
| the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved | the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved | |||
| field of the TCP header is designated as the ECN-Echo flag. The | field of the TCP header is designated as the ECN-Echo flag. The | |||
| location of the 6-bit Reserved field in the TCP header is shown in | location of the 6-bit Reserved field in the TCP header is shown in | |||
| Figure 3 of RFC 793 [RFC793] (and is reproduced below for complete- | Figure 4 of RFC 793 [RFC793] (and is reproduced below for | |||
| ness). This specification of the ECN Field leaves the Reserved field | completeness). This specification of the ECN Field leaves the | |||
| as a 4-bit field using bits 4-7. | Reserved field as a 4-bit field using bits 4-7. | |||
| To enable the TCP receiver to determine when to stop setting the ECN- | To enable the TCP receiver to determine when to stop setting the ECN- | |||
| Echo flag, we introduce a second new flag in the TCP header, the CWR | Echo flag, we introduce a second new flag in the TCP header, the CWR | |||
| flag. The CWR flag is assigned to Bit 8 in the Reserved field of the | flag. The CWR flag is assigned to Bit 8 in the Reserved field of the | |||
| TCP header. | TCP header. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | U | A | P | R | S | F | | | | | U | A | P | R | S | F | | |||
| | Header Length | Reserved | R | C | S | S | Y | I | | | Header Length | Reserved | R | C | S | S | Y | I | | |||
| | | | G | K | H | T | N | N | | | | | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 2: The old definition of bytes 13 and 14 of the TCP | Figure 3: The old definition of bytes 13 and 14 of the TCP | |||
| header. | header. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | C | E | U | A | P | R | S | F | | | | | C | E | U | A | P | R | S | F | | |||
| | Header Length | Reserved | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | W | C | R | C | S | S | Y | I | | |||
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 3: The new definition of bytes 13 and 14 of the TCP | Figure 4: The new definition of bytes 13 and 14 of the TCP | |||
| Header. | Header. | |||
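
As a sketch of the new layout (names are assumptions for this example), the following Python decodes bytes 13 and 14 of the TCP header as one 16-bit word in network byte order. Bit 0 of the figure is the most significant bit, so CWR (bit 8) and ECE (bit 9) map to the masks 0x80 and 0x40 of the low-order byte.

```python
import struct

# Flag masks within the low-order byte (bits 8-15 of the 16-bit word).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
ECE = 0x40   # bit 9: ECN-Echo
CWR = 0x80   # bit 8: Congestion Window Reduced


def parse_bytes_13_14(two_bytes):
    """Split bytes 13 and 14 of the TCP header into their fields."""
    (word,) = struct.unpack("!H", two_bytes)
    return {
        "header_length": word >> 12,     # bits 0-3, counted in 32-bit words
        "reserved": (word >> 8) & 0x0F,  # bits 4-7, the remaining Reserved field
        "flags": word & 0xFF,            # bits 8-15, CWR through FIN
    }
```

For instance, the two bytes 0x50 0xC2 decode to a header length of 5 words with the CWR, ECE and SYN flags set.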
| Thus, ECN uses the ECT and CE flags in the IP header (as shown in | Thus, ECN uses the ECT and CE flags in the IP header (as shown in | |||
| Figure 1) for signaling between routers and connection endpoints, and | Figure 1) for signaling between routers and connection endpoints, and | |||
| uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure | uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure | |||
| 3) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, | 4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection, | |||
| a typical sequence of events in an ECN-based reaction to congestion | a typical sequence of events in an ECN-based reaction to congestion | |||
| is as follows: | is as follows: | |||
| * The ECT bit is set in packets transmitted by the sender to indi- | * An ECT codepoint is set in packets transmitted by the sender to | |||
| cate that ECN is supported by the transport entities for these | indicate that ECN is supported by the transport entities for these | |||
| packets. | packets. | |||
| * An ECN-capable router detects impending congestion and detects | * An ECN-capable router detects impending congestion and detects | |||
| that the ECT bit is set in the packet it is about to drop. | that an ECT codepoint is set in the packet it is about to drop. | |||
| Instead of dropping the packet, the router chooses to set the CE | Instead of dropping the packet, the router chooses to set the CE | |||
| bit in the IP header and forwards the packet. | codepoint in the IP header and forwards the packet. | |||
| * The receiver receives the packet with the CE bit set, and sets | * The receiver receives the packet with the CE codepoint set, and | |||
| the ECN-Echo flag in its next TCP ACK sent to the sender. | sets the ECN-Echo flag in its next TCP ACK sent to the sender. | |||
| * The sender receives the TCP ACK with ECN-Echo set, and reacts to | * The sender receives the TCP ACK with ECN-Echo set, and reacts to | |||
| the congestion as if a packet had been dropped. | the congestion as if a packet had been dropped. | |||
| * The sender sets the CWR flag in the TCP header of the next | * The sender sets the CWR flag in the TCP header of the next | |||
| packet sent to the receiver to acknowledge its receipt of and | packet sent to the receiver to acknowledge its receipt of and | |||
| reaction to the ECN-Echo flag. | reaction to the ECN-Echo flag. | |||
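
The sequence of events above can be walked through with a toy model. This is a sketch only: the classes, the dictionary-based packets, and the integer congestion window are assumptions for illustration, and the rule of reacting at most once per window of data is omitted for brevity.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class Sender:
    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.cwr_pending = False

    def make_packet(self):
        # Data packets carry an ECT codepoint; CWR rides on the next packet
        # sent after the sender has reacted to ECN-Echo.
        pkt = {"ecn": ECT_0, "cwr": self.cwr_pending}
        self.cwr_pending = False
        return pkt

    def on_ack(self, ack):
        if ack["ece"]:
            self.cwnd = max(1, self.cwnd // 2)  # react as if to a drop
            self.cwr_pending = True


class Receiver:
    def __init__(self):
        self.echoing = False

    def on_packet(self, pkt):
        if pkt["cwr"]:
            self.echoing = False   # sender has reduced its window
        if pkt["ecn"] == CE:
            self.echoing = True    # start (or restart) echoing
        return {"ece": self.echoing}


def router(pkt, congested):
    """Set CE instead of dropping when the packet carries an ECT codepoint."""
    if congested and pkt["ecn"] in (ECT_0, ECT_1):
        return dict(pkt, ecn=CE)
    return pkt
```

Passing one packet through a congested router halves the sender's window in this model, and the next packet carries CWR, after which the receiver stops setting ECN-Echo.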
| The negotiation for using ECN by the TCP transport entities and the | The negotiation for using ECN by the TCP transport entities and the | |||
| use of the ECN-Echo and CWR flags is described in more detail in the | use of the ECN-Echo and CWR flags is described in more detail in the | |||
| sections below. | sections below. | |||
| 6.1.1 TCP Initialization | 6.1.1 TCP Initialization | |||
| In the TCP connection setup phase, the source and destination TCPs | In the TCP connection setup phase, the source and destination TCPs | |||
| exchange information about their willingness to use ECN. Subsequent | exchange information about their willingness to use ECN. Subsequent | |||
| to the completion of this negotiation, the TCP sender sets the ECT | to the completion of this negotiation, the TCP sender sets an ECT | |||
| bit in the IP header of data packets to indicate to the network that | codepoint in the IP header of data packets to indicate to the network | |||
| the transport is capable and willing to participate in ECN for this | that the transport is capable and willing to participate in ECN for | |||
| packet. This indicates to the routers that they may mark this packet | this packet. This indicates to the routers that they may mark this | |||
| with the CE bit, if they would like to use that as a method of con- | packet with the CE codepoint, if they would like to use that as a | |||
| gestion notification. If the TCP connection does not wish to use ECN | method of congestion notification. If the TCP connection does not | |||
| notification for a particular packet, the sending TCP sets the ECT | wish to use ECN notification for a particular packet, the sending TCP | |||
| bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE | sets the ECN codepoint to not-ECT, and the TCP receiver ignores the | |||
| bit in the received packet. | CE codepoint in the received packet. | |||
| For this discussion, we designate the initiating host as Host A and | For this discussion, we designate the initiating host as Host A and | |||
| the responding host as Host B. We call a SYN packet with the ECE and | the responding host as Host B. We call a SYN packet with the ECE and | |||
| CWR flags set an "ECN-setup SYN packet", and we call a SYN packet | CWR flags set an "ECN-setup SYN packet", and we call a SYN packet | |||
| with at least one of the ECE and CWR flags not set a "non-ECN-setup | with at least one of the ECE and CWR flags not set a "non-ECN-setup | |||
| SYN packet". Similarly, we call a SYN-ACK packet with only the ECE | SYN packet". Similarly, we call a SYN-ACK packet with only the ECE | |||
| flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and | flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and | |||
| we call a SYN-ACK packet with any other configuration of the ECE and | we call a SYN-ACK packet with any other configuration of the ECE and | |||
| CWR flags a "non-ECN-setup SYN-ACK packet". | CWR flags a "non-ECN-setup SYN-ACK packet". | |||
| Before a TCP connection can use ECN, Host A sends an ECN-setup SYN | Before a TCP connection can use ECN, Host A sends an ECN-setup SYN | |||
| packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN | packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN | |||
| packet, the setting of both ECE and CWR in the ECN-setup SYN packet | packet, the setting of both ECE and CWR in the ECN-setup SYN packet | |||
| is defined as an indication that the sending TCP is ECN-Capable, | is defined as an indication that the sending TCP is ECN-Capable, | |||
| rather than as an indication of congestion or of response to conges- | rather than as an indication of congestion or of response to | |||
| tion. More precisely, an ECN-setup SYN packet indicates that the TCP | congestion. More precisely, an ECN-setup SYN packet indicates that | |||
| implementation transmitting the SYN packet will participate in ECN as | the TCP implementation transmitting the SYN packet will participate | |||
| both a sender and receiver. Specifically, as a receiver, it will | in ECN as both a sender and receiver. Specifically, as a receiver, | |||
| respond to incoming data packets that have the CE bit set in the IP | it will respond to incoming data packets that have the CE codepoint | |||
| header by setting ECE in outgoing TCP Acknowledgement (ACK) packets. | set in the IP header by setting ECE in outgoing TCP Acknowledgement | |||
| As a sender, it will respond to incoming packets that have ECE set by | (ACK) packets. As a sender, it will respond to incoming packets that | |||
| reducing the congestion window and setting CWR when appropriate. An | have ECE set by reducing the congestion window and setting CWR when | |||
| ECN-setup SYN packet does not commit the TCP sender to setting the | appropriate. An ECN-setup SYN packet does not commit the TCP sender | |||
| ECT bit in any or all of the packets it may transmit. However, the | to setting the ECT codepoint in any or all of the packets it may | |||
| commitment to respond appropriately to incoming packets with the CE | transmit. However, the commitment to respond appropriately to | |||
| bit set remains even if the TCP sender in a later transmission, | incoming packets with the CE codepoint set remains even if the TCP | |||
| within this TCP connection, sends a SYN packet without ECE and CWR | sender in a later transmission, within this TCP connection, sends a | |||
| set. | SYN packet without ECE and CWR set. | |||
| When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag | When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag | |||
| but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an | but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an | |||
| indication that the TCP transmitting the SYN-ACK packet is ECN-Capa- | indication that the TCP transmitting the SYN-ACK packet is ECN- | |||
| ble. As with the SYN packet, an ECN-setup SYN-ACK packet does not | Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does | |||
| commit the TCP host to setting the ECT bit in transmitted packets. | not commit the TCP host to setting the ECT codepoint in transmitted | |||
| packets. | ||||
| The following rules apply to the sending of ECN-setup packets: | The following rules apply to the sending of ECN-setup packets: | |||
| * If a host has received an ECN-setup SYN packet, then it MAY send an | * If a host has received an ECN-setup SYN packet, then it MAY send an | |||
| ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup | ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an ECN-setup | |||
| SYN-ACK packet. | SYN-ACK packet. | |||
| * A host MUST NOT set ECT on data packets unless it has sent at least | * A host MUST NOT set ECT on data packets unless it has sent at least | |||
| one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at | one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at | |||
| least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no | least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no | |||
| non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | non-ECN-setup SYN or non-ECN-setup SYN-ACK packet. If a host has | |||
| received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK | |||
| packet, then it SHOULD NOT set ECT on data packets. | packet, then it SHOULD NOT set ECT on data packets. | |||
| * If a host ever sets the ECT bit on a data packet, then that host | * If a host ever sets the ECT codepoint on a data packet, then that | |||
| MUST correctly set/clear the CWR TCP bit on all subsequent packets in | host MUST correctly set/clear the CWR TCP bit on all subsequent | |||
| the connection. | packets in the connection. | |||
| * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK | * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK | |||
| packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- | packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN- | |||
| ACK packet, then if that host receives TCP data packets with ECT and | ACK packet, then if that host receives TCP data packets with ECT and | |||
| CE bits set in the IP header, then that host MUST process these pack- | CE codepoints set in the IP header, then that host MUST process these | |||
| ets as specified for an ECN-capable connection. * A host that is not | packets as specified for an ECN-capable connection. | |||
| willing to use ECN on a TCP connection SHOULD clear both the ECE and | * A host that is not willing to use ECN on a TCP connection SHOULD | |||
| CWR flags in all non-ECN-setup SYN and/or SYN-ACK packets that it | clear both the ECE and CWR flags in all non-ECN-setup SYN and/or SYN- | |||
| sends to indicate this unwillingness. Receivers MUST correctly han- | ACK packets that it sends to indicate this unwillingness. Receivers | |||
| dle all forms of the non-ECN-setup SYN and SYN-ACK packets. | MUST correctly handle all forms of the non-ECN-setup SYN and SYN-ACK | |||
| packets. | ||||
| * A host MUST NOT set ECT on SYN or SYN-ACK packets. | ||||
| 6.1.1.1. Robust TCP Initialization with an Echoed Reserved Field | 6.1.1.1. Robust TCP Initialization with an Echoed Reserved Field | |||
| There is the question of why we chose to have the TCP sending the SYN | There is the question of why we chose to have the TCP sending the SYN | |||
| set two ECN-related flags in the Reserved field of the TCP header for | set two ECN-related flags in the Reserved field of the TCP header for | |||
| the SYN packet, while the responding TCP sending the SYN-ACK sets | the SYN packet, while the responding TCP sending the SYN-ACK sets | |||
| only one ECN-related flag in the SYN-ACK packet. This asymmetry is | only one ECN-related flag in the SYN-ACK packet. This asymmetry is | |||
| necessary for the robust negotiation of ECN-capability with some | necessary for the robust negotiation of ECN-capability with some | |||
| deployed TCP implementations. There exists at least one faulty TCP | deployed TCP implementations. There exists at least one faulty TCP | |||
| implementation in which TCP receivers set the Reserved field of the | implementation in which TCP receivers set the Reserved field of the | |||
| TCP header in ACK packets (and hence the SYN-ACK) simply to reflect | TCP header in ACK packets (and hence the SYN-ACK) simply to reflect | |||
| the Reserved field of the TCP header in the received data packet. | the Reserved field of the TCP header in the received data packet. | |||
| Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi- | Because the TCP SYN packet sets the ECN-Echo and CWR flags to | |||
| cate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo | indicate ECN-capability, while the SYN-ACK packet sets only the ECN- | |||
| flag, the sending TCP correctly interprets a receiver's reflection of | Echo flag, the sending TCP correctly interprets a receiver's | |||
| its own flags in the Reserved field as an indication that the | reflection of its own flags in the Reserved field as an indication | |||
| receiver is not ECN-capable. The sending TCP is not misled by a | that the receiver is not ECN-capable. The sending TCP is not misled | |||
| faulty TCP implementation sending a SYN-ACK packet that simply | by a faulty TCP implementation sending a SYN-ACK packet that simply | |||
| reflects the Reserved field of the incoming SYN packet. | reflects the Reserved field of the incoming SYN packet. | |||
| 6.1.2. The TCP Sender | 6.1.2. The TCP Sender | |||
| For a TCP connection using ECN, new data packets are transmitted with | For a TCP connection using ECN, new data packets are transmitted with | |||
| the ECT bit set in the IP header (set to a "1"). If the sender | an ECT codepoint set in the IP header. When only one ECT codepoint | |||
| receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with | is needed by a sender for all packets sent on a TCP connection, | |||
| the ECN-Echo flag set in the TCP header), then the sender knows that | ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK | |||
| congestion was encountered in the network on the path from the sender | packet (that is, an ACK packet with the ECN-Echo flag set in the TCP | |||
| to the receiver. The indication of congestion should be treated just | header), then the sender knows that congestion was encountered in the | |||
| as a congestion loss in non-ECN-Capable TCP. That is, the TCP source | network on the path from the sender to the receiver. The indication | |||
| halves the congestion window "cwnd" and reduces the slow start | of congestion should be treated just as a congestion loss in non-ECN- | |||
| threshold "ssthresh". The sending TCP SHOULD NOT increase the con- | Capable TCP. That is, the TCP source halves the congestion window | |||
| gestion window in response to the receipt of an ECN-Echo ACK packet. | "cwnd" and reduces the slow start threshold "ssthresh". The sending | |||
| TCP SHOULD NOT increase the congestion window in response to the | ||||
| receipt of an ECN-Echo ACK packet. | ||||
| TCP should not react to congestion indications more than once every | TCP should not react to congestion indications more than once every | |||
| window of data (or more loosely, more than once every round-trip | window of data (or more loosely, more than once every round-trip | |||
| time). That is, the TCP sender's congestion window should be reduced | time). That is, the TCP sender's congestion window should be reduced | |||
| only once in response to a series of dropped and/or CE packets from a | only once in response to a series of dropped and/or CE packets from a | |||
| single window of data. In addition, the TCP source should not | single window of data. In addition, the TCP source should not | |||
| decrease the slow-start threshold, ssthresh, if it has been decreased | decrease the slow-start threshold, ssthresh, if it has been decreased | |||
| within the last round trip time. However, if any retransmitted pack- | within the last round trip time. However, if any retransmitted | |||
| ets are dropped, then this is interpreted by the source TCP as a new | packets are dropped, then this is interpreted by the source TCP as a | |||
| instance of congestion. | new instance of congestion. | |||
| After the source TCP reduces its congestion window in response to a | After the source TCP reduces its congestion window in response to a | |||
| CE packet, incoming acknowledgements that continue to arrive can | CE packet, incoming acknowledgements that continue to arrive can | |||
| "clock out" outgoing packets as allowed by the reduced congestion | "clock out" outgoing packets as allowed by the reduced congestion | |||
| window. If the congestion window consists of only one MSS (maximum | window. If the congestion window consists of only one MSS (maximum | |||
| segment size), and the sending TCP receives an ECN-Echo ACK packet, | segment size), and the sending TCP receives an ECN-Echo ACK packet, | |||
| then the sending TCP should in principle still reduce its congestion | then the sending TCP should in principle still reduce its congestion | |||
| window in half. However, the value of the congestion window is | window in half. However, the value of the congestion window is | |||
| bounded below by a value of one MSS. If the sending TCP were to con- | bounded below by a value of one MSS. If the sending TCP were to | |||
| tinue to send, using a congestion window of 1 MSS, this results in | continue to send, using a congestion window of 1 MSS, this results in | |||
| the transmission of one packet per round-trip time. It is necessary | the transmission of one packet per round-trip time. It is necessary | |||
| to still reduce the sending rate of the TCP sender even further, on | to still reduce the sending rate of the TCP sender even further, on | |||
| receipt of an ECN-Echo packet when the congestion window is one. We | receipt of an ECN-Echo packet when the congestion window is one. We | |||
| use the retransmit timer as a means of reducing the rate further in | use the retransmit timer as a means of reducing the rate further in | |||
| this circumstance. Therefore, the sending TCP MUST reset the | this circumstance. Therefore, the sending TCP MUST reset the | |||
| retransmit timer on receiving the ECN-Echo packet when the congestion | retransmit timer on receiving the ECN-Echo packet when the congestion | |||
| window is one. The sending TCP will then be able to send a new | window is one. The sending TCP will then be able to send a new | |||
| packet only when the retransmit timer expires. | packet only when the retransmit timer expires. | |||
| When an ECN-Capable TCP sender reduces its congestion window for any | When an ECN-Capable TCP sender reduces its congestion window for any | |||
| reason (because of a retransmit timeout, a Fast Retransmit, or in | reason (because of a retransmit timeout, a Fast Retransmit, or in | |||
| response to an ECN Notification), the TCP sender sets the CWR flag in | response to an ECN Notification), the TCP sender sets the CWR flag in | |||
| the TCP header of the first new data packet sent after the window | the TCP header of the first new data packet sent after the window | |||
| reduction. If that data packet is dropped in the network, then the | reduction. If that data packet is dropped in the network, then the | |||
| sending TCP will have to reduce the congestion window again and | sending TCP will have to reduce the congestion window again and | |||
| retransmit the dropped packet. | retransmit the dropped packet. | |||
| We ensure that the "Congestion Window Reduced" information is reli- | We ensure that the "Congestion Window Reduced" information is | |||
| ably delivered to the TCP receiver. This comes about from the fact | reliably delivered to the TCP receiver. This comes about from the | |||
| that if the new data packet carrying the CWR flag is dropped, then | fact that if the new data packet carrying the CWR flag is dropped, | |||
| the TCP sender will have to again reduce its congestion window, and | then the TCP sender will have to again reduce its congestion window, | |||
| send another new data packet with the CWR flag set. Thus, the CWR | and send another new data packet with the CWR flag set. Thus, the | |||
| bit in the TCP header SHOULD NOT be set on retransmitted packets. | CWR bit in the TCP header SHOULD NOT be set on retransmitted packets. | |||
| When the TCP data sender is ready to set the CWR bit after reducing | When the TCP data sender is ready to set the CWR bit after reducing | |||
| the congestion window, it SHOULD set the CWR bit only on the first | the congestion window, it SHOULD set the CWR bit only on the first | |||
| new data packet that it transmits. | new data packet that it transmits. | |||
| [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] | [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98] | |||
| discusses the validation test in the ns simulator, which illustrates | discusses the validation test in the ns simulator, which illustrates | |||
| a wide range of ECN scenarios. These scenarios include the following: | a wide range of ECN scenarios. These scenarios include the following: | |||
| an ECN followed by another ECN, a Fast Retransmit, or a Retransmit | an ECN followed by another ECN, a Fast Retransmit, or a Retransmit | |||
| Timeout; a Retransmit Timeout or a Fast Retransmit followed by an | Timeout; a Retransmit Timeout or a Fast Retransmit followed by an | |||
| ECN; and a congestion window of one packet followed by an ECN. | ECN; and a congestion window of one packet followed by an ECN. | |||
| skipping to change at page 16, line 40 | skipping to change at page 18, line 44 | |||
| increasing the congestion window when it receives ACK packets without | increasing the congestion window when it receives ACK packets without | |||
| the ECN-Echo bit set [RFC2581]. | the ECN-Echo bit set [RFC2581]. | |||
| 6.1.3. The TCP Receiver | 6.1.3. The TCP Receiver | |||
| When TCP receives a CE data packet at the destination end-system, the | When TCP receives a CE data packet at the destination end-system, the | |||
| TCP data receiver sets the ECN-Echo flag in the TCP header of the | TCP data receiver sets the ECN-Echo flag in the TCP header of the | |||
| subsequent ACK packet. If there is any ACK withholding implemented, | subsequent ACK packet. If there is any ACK withholding implemented, | |||
| as in current "delayed-ACK" TCP implementations where the TCP | as in current "delayed-ACK" TCP implementations where the TCP | |||
| receiver can send an ACK for two arriving data packets, then the ECN- | receiver can send an ACK for two arriving data packets, then the ECN- | |||
| Echo flag in the ACK packet will be set to the OR of the CE bits of | Echo flag in the ACK packet will be set to '1' if the CE codepoint is | |||
| all of the data packets being acknowledged. That is, if any of the | set in any of the data packets being acknowledged. That is, if any | |||
| received data packets are CE packets, then the returning ACK has the | of the received data packets are CE packets, then the returning ACK | |||
| ECN-Echo flag set. | has the ECN-Echo flag set. | |||
| To provide robustness against the possibility of a dropped ACK packet | To provide robustness against the possibility of a dropped ACK packet | |||
| carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in | carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in | |||
| a series of ACK packets sent subsequently. The TCP receiver uses the | a series of ACK packets sent subsequently. The TCP receiver uses the | |||
| CWR flag received from the TCP sender to determine when to stop set- | CWR flag received from the TCP sender to determine when to stop | |||
| ting the ECN-Echo flag. | setting the ECN-Echo flag. | |||
| After a TCP receiver sends an ACK packet with the ECN-Echo bit set, | After a TCP receiver sends an ACK packet with the ECN-Echo bit set, | |||
| that TCP receiver continues to set the ECN-Echo flag in all the ACK | that TCP receiver continues to set the ECN-Echo flag in all the ACK | |||
| packets it sends (whether they acknowledge CE data packets or non-CE | packets it sends (whether they acknowledge CE data packets or non-CE | |||
| data packets) until it receives a CWR packet (a packet with the CWR | data packets) until it receives a CWR packet (a packet with the CWR | |||
| flag set). After the receipt of the CWR packet, acknowledgements for | flag set). After the receipt of the CWR packet, acknowledgements for | |||
| subsequent non-CE data packets do not have the ECN-Echo flag set. If | subsequent non-CE data packets do not have the ECN-Echo flag set. If | |||
| another CE packet is received by the data receiver, the receiver | another CE packet is received by the data receiver, the receiver | |||
| would once again send ACK packets with the ECN-Echo flag set. While | would once again send ACK packets with the ECN-Echo flag set. While | |||
| the receipt of a CWR packet does not guarantee that the data sender | the receipt of a CWR packet does not guarantee that the data sender | |||
| received the ECN-Echo message, this does suggest that the data sender | received the ECN-Echo message, this does suggest that the data sender | |||
| reduced its congestion window at some point *after* it sent the data | reduced its congestion window at some point *after* it sent the data | |||
| packet for which the CE bit was set. | packet for which the CE codepoint was set. | |||
| We have already specified that a TCP sender is not required to reduce | We have already specified that a TCP sender is not required to reduce | |||
| its congestion window more than once per window of data. Some care | its congestion window more than once per window of data. Some care | |||
| is required if the TCP sender is to avoid unnecessary reductions of | is required if the TCP sender is to avoid unnecessary reductions of | |||
| the congestion window when a window of data includes both dropped | the congestion window when a window of data includes both dropped | |||
| packets and (marked) CE packets. This is illustrated in [Floyd98]. | packets and (marked) CE packets. This is illustrated in [Floyd98]. | |||
| 6.1.4. Congestion on the ACK-path | 6.1.4. Congestion on the ACK-path | |||
| For the current generation of TCP congestion control algorithms, pure | For the current generation of TCP congestion control algorithms, pure | |||
| acknowledgement packets (e.g., packets that do not contain any accom- | acknowledgement packets (e.g., packets that do not contain any | |||
| panying data) should be sent with the ECT bit off. Current TCP | accompanying data) should be sent with the not-ECT codepoint. | |||
| receivers have no mechanisms for reducing traffic on the ACK-path in | Current TCP receivers have no mechanisms for reducing traffic on the | |||
| response to congestion notification. Mechanisms for responding to | ACK-path in response to congestion notification. Mechanisms for | |||
| congestion on the ACK-path are areas for current and future research. | responding to congestion on the ACK-path are areas for current and | |||
| (One simple possibility would be for the sender to reduce its conges- | future research. (One simple possibility would be for the sender to | |||
| tion window when it receives a pure ACK packet with the CE bit set). | reduce its congestion window when it receives a pure ACK packet with | |||
| For current TCP implementations, a single dropped ACK generally has | the CE codepoint set). For current TCP implementations, a single | |||
| only a very small effect on the TCP's sending rate. | dropped ACK generally has only a very small effect on the TCP's | |||
| sending rate. | ||||
| 6.1.5. Retransmitted TCP packets | 6.1.5. Retransmitted TCP packets | |||
| This document specifies that for ECN-capable TCP implementations, the | This document specifies ECN-capable TCP implementations MUST NOT set | |||
| ECT bit (ECN-Capable Transport) in the IP header MUST NOT be set on | either ECT codepoint (ECT(0) or ECT(1)) in the IP header for | |||
| retransmitted data packets, and that the TCP data receiver SHOULD | retransmitted data packets, and that the TCP data receiver SHOULD | |||
| ignore the ECN field on arriving data packets that are outside of the | ignore the ECN field on arriving data packets that are outside of the | |||
| receiver's current window. This is for greater security against | receiver's current window. This is for greater security against | |||
| denial-of-service attacks, as well as for robustness of the ECN con- | denial-of-service attacks, as well as for robustness of the ECN | |||
| gestion indication with packets that are dropped later in the net- | congestion indication with packets that are dropped later in the | |||
| work. | network. | |||
| First, we note that if the TCP sender were to set the ECT bit on a | First, we note that if the TCP sender were to set an ECT codepoint on | |||
| retransmitted packet, then if an unnecessarily-retransmitted packet | a retransmitted packet, then if an unnecessarily-retransmitted packet | |||
| was later dropped in the network, the end nodes would never receive | was later dropped in the network, the end nodes would never receive | |||
| the indication of congestion from the router setting the CE bit. | the indication of congestion from the router setting the CE | |||
| Thus, setting the ECT bit on retransmitted data packets is not con- | codepoint. Thus, setting an ECT codepoint on retransmitted data | |||
| sistent with the robust delivery of the congestion indication even | packets is not consistent with the robust delivery of the congestion | |||
| for packets that are later dropped in the network. | indication even for packets that are later dropped in the network. | |||
| In addition, an attacker capable of spoofing the IP source address of | In addition, an attacker capable of spoofing the IP source address of | |||
| the TCP sender could send data packets with arbitrary sequence num- | the TCP sender could send data packets with arbitrary sequence | |||
| bers, with both the ECT and CE bits set in the IP header. On receiv- | numbers, with the CE codepoint set in the IP header. On receiving | |||
| ing this spoofed data packet, the TCP data receiver would determine | this spoofed data packet, the TCP data receiver would determine that | |||
| that the data does not lie in the current receive window, and return | the data does not lie in the current receive window, and return a | |||
| a duplicate acknowledgement. We define an out-of-window packet at | duplicate acknowledgement. We define an out-of-window packet at the | |||
| the TCP data receiver as a data packet that lies outside the | TCP data receiver as a data packet that lies outside the receiver's | |||
| receiver's current window. On receiving an out-of-window packet, the | current window. On receiving an out-of-window packet, the TCP data | |||
| TCP data receiver has to decide whether or not to treat the CE bit in | receiver has to decide whether or not to treat the CE codepoint in | |||
| the packet header as a valid indication of congestion, and therefore | the packet header as a valid indication of congestion, and therefore | |||
| whether to return ECN-Echo indications to the TCP data sender. If | whether to return ECN-Echo indications to the TCP data sender. If | |||
| the TCP data receiver ignored the CE bit in an out-of-window packet, | the TCP data receiver ignored the CE codepoint in an out-of-window | |||
| then the TCP data sender would not receive this possibly-legitimate | packet, then the TCP data sender would not receive this possibly- | |||
| indication of congestion from the network, resulting in a violation | legitimate indication of congestion from the network, resulting in a | |||
| of end-to-end congestion control. On the other hand, if the TCP data | violation of end-to-end congestion control. On the other hand, if | |||
| receiver honors the CE indication in the out-of-window packet, and | the TCP data receiver honors the CE indication in the out-of-window | |||
| reports the indication of congestion to the TCP data sender, then the | packet, and reports the indication of congestion to the TCP data | |||
| malicious node that created the spoofed, out-of-window packet has | sender, then the malicious node that created the spoofed, out-of- | |||
| successfully "attacked" the TCP connection by forcing the data sender | window packet has successfully "attacked" the TCP connection by | |||
| to unnecessarily reduce (halve) its congestion window. To prevent | forcing the data sender to unnecessarily reduce (halve) its | |||
| such a denial-of-service attack, we specify that a legitimate TCP | congestion window. To prevent such a denial-of-service attack, we | |||
| data sender MUST NOT set the ECT bit on retransmitted data packets, | specify that a legitimate TCP data sender MUST NOT set an ECT | |||
| and that the TCP data receiver SHOULD ignore the CE bit on out-of- | codepoint on retransmitted data packets, and that the TCP data | |||
| window packets. | receiver SHOULD ignore the CE codepoint on out-of-window packets. | |||
| One drawback of not setting ECT on retransmitted packets denies ECN | One drawback of not setting ECT(0) or ECT(1) on retransmitted packets | |||
| protection for retransmitted packets. However, for an ECN-capable | is that it denies ECN protection for retransmitted packets. However, | |||
| TCP connection in a fully-ECN-capable environment with mild conges- | for an ECN-capable TCP connection in a fully-ECN-capable environment | |||
| tion, packets should rarely be dropped due to congestion in the first | with mild congestion, packets should rarely be dropped due to | |||
| place, and so instances of retransmitted packets should rarely arise. | congestion in the first place, and so instances of retransmitted | |||
| If packets are being retransmitted, then there are already packet | packets should rarely arise. If packets are being retransmitted, | |||
| losses (from corruption or from congestion) that ECN has been unable | then there are already packet losses (from corruption or from | |||
| to prevent. | congestion) that ECN has been unable to prevent. | |||
| We note that if the router sets the CE bit for an ECN-capable data | We note that if the router sets the CE codepoint for an ECN-capable | |||
| packet within a TCP connection, then the TCP connection is guaranteed | data packet within a TCP connection, then the TCP connection is | |||
| to receive that indication of congestion, or to receive some other | guaranteed to receive that indication of congestion, or to receive | |||
| indication of congestion within the same window of data, even if this | some other indication of congestion within the same window of data, | |||
| packet is dropped or reordered in the network. We consider two | even if this packet is dropped or reordered in the network. We | |||
| cases, when the packet is later retransmitted, and when the packet is | consider two cases, when the packet is later retransmitted, and when | |||
| not later retransmitted. | the packet is not later retransmitted. | |||
| In the first case, if the packet is either dropped or delayed, and at | In the first case, if the packet is either dropped or delayed, and at | |||
| some point retransmitted by the data sender, then the retransmission | some point retransmitted by the data sender, then the retransmission | |||
| is a result of a Fast Retransmit or a Retransmit Timeout for either | is a result of a Fast Retransmit or a Retransmit Timeout for either | |||
| that packet or for some prior packet in the same window of data. In | that packet or for some prior packet in the same window of data. In | |||
| this case, because the data sender already has retransmitted this | this case, because the data sender already has retransmitted this | |||
| packet, we know that the data sender has already responded to an | packet, we know that the data sender has already responded to an | |||
| indication of congestion for some packet within the same window of | indication of congestion for some packet within the same window of | |||
| data as the original packet. Thus, even if the first transmission of | data as the original packet. Thus, even if the first transmission of | |||
| the packet is dropped in the network, or is delayed, if it had the CE | the packet is dropped in the network, or is delayed, if it had the CE | |||
| bit set, and is later ignored by the data receiver as an out-of-win- | codepoint set, and is later ignored by the data receiver as an out- | |||
| dow packet, this is not a problem, because the sender has already | of-window packet, this is not a problem, because the sender has | |||
| responded to an indication of congestion for that window of data. | already responded to an indication of congestion for that window of | |||
| data. | ||||
| In the second case, if the packet is never retransmitted by the data | In the second case, if the packet is never retransmitted by the data | |||
| sender, then this data packet is the only copy of this data received | sender, then this data packet is the only copy of this data received | |||
| by the data receiver, and therefore arrives at the data receiver as | by the data receiver, and therefore arrives at the data receiver as | |||
| an in-window packet, regardless of how much the packet might be | an in-window packet, regardless of how much the packet might be | |||
| delayed or reordered. In this case, if the CE bit is set on the | delayed or reordered. In this case, if the CE codepoint is set on | |||
| packet within the network, this will be treated by the data receiver | the packet within the network, this will be treated by the data | |||
| as a valid indication of congestion. | receiver as a valid indication of congestion. | |||
| 6.1.6. TCP Window Probes. | 6.1.6. TCP Window Probes. | |||
| When the TCP data receiver advertises a zero window, the TCP data | When the TCP data receiver advertises a zero window, the TCP data | |||
| sender sends window probes to determine if the receiver's window has | sender sends window probes to determine if the receiver's window has | |||
| increased. Window probe packets do not contain any user data except | increased. Window probe packets do not contain any user data except | |||
| for the sequence number, which is a byte. If a window probe packet | for the sequence number, which is a byte. If a window probe packet | |||
| is dropped in the network, this loss is not detected by the receiver. | is dropped in the network, this loss is not detected by the receiver. | |||
| Therefore, the TCP data sender MUST NOT set either the ECT or CWR | Therefore, the TCP data sender MUST NOT set either an ECT codepoint | |||
| bits on window probe packets. | or the CWR bit on window probe packets. | |||
| However, because window probes use exact sequence numbers, they can- | However, because window probes use exact sequence numbers, they | |||
| not be easily spoofed in denial-of-service attacks. Therefore, if a | cannot be easily spoofed in denial-of-service attacks. Therefore, if | |||
| window probe arrives with ECT and CE set, then the receiver SHOULD | a window probe arrives with the CE codepoint set, then the receiver | |||
| respond to the ECN indications. | SHOULD respond to the ECN indications. | |||
| 7. Non-compliance by the End Nodes | 7. Non-compliance by the End Nodes | |||
| This section discusses concerns about the vulnerability of ECN to | This section discusses concerns about the vulnerability of ECN to | |||
| non-compliant end-nodes (i.e., end nodes that set the ECT bit in | non-compliant end-nodes (i.e., end nodes that set the ECT codepoint | |||
| transmitted packets but do not respond to received CE packets). We | in transmitted packets but do not respond to received CE packets). | |||
| argue that the addition of ECN to the IP architecture will not sig- | We argue that the addition of ECN to the IP architecture will not | |||
| nificantly increase the current vulnerability of the architecture to | significantly increase the current vulnerability of the architecture | |||
| unresponsive flows. | to unresponsive flows. | |||
| Even for non-ECN environments, there are serious concerns about the | Even for non-ECN environments, there are serious concerns about the | |||
| damage that can be done by non-compliant or unresponsive flows (that | damage that can be done by non-compliant or unresponsive flows (that | |||
| is, flows that do not respond to congestion control indications by | is, flows that do not respond to congestion control indications by | |||
| reducing their arrival rate at the congested link). For example, an | reducing their arrival rate at the congested link). For example, an | |||
| end-node could "turn off congestion control" by not reducing its con- | end-node could "turn off congestion control" by not reducing its | |||
| gestion window in response to packet drops. This is a concern for the | congestion window in response to packet drops. This is a concern for | |||
| current Internet. It has been argued that routers will have to | the current Internet. It has been argued that routers will have to | |||
| deploy mechanisms to detect and differentially treat packets from | deploy mechanisms to detect and differentially treat packets from | |||
| non-compliant flows [RFC2309,FF99]. It has also been suggested that | non-compliant flows [RFC2309,FF99]. It has also been suggested that | |||
| techniques such as end-to-end per-flow scheduling and isolation of | techniques such as end-to-end per-flow scheduling and isolation of | |||
| one flow from another, differentiated services, or end-to-end reser- | one flow from another, differentiated services, or end-to-end | |||
| vations could remove some of the more damaging effects of unrespon- | reservations could remove some of the more damaging effects of | |||
| sive flows. | unresponsive flows. | |||
| It might seem that dropping packets in itself is an adequate deter- | It might seem that dropping packets in itself is an adequate | |||
| rent for non-compliance, and that the use of ECN removes this deter- | deterrent for non-compliance, and that the use of ECN removes this | |||
| rent. We would argue in response that (1) ECN-capable routers pre- | deterrent. We would argue in response that (1) ECN-capable routers | |||
| serve packet-dropping behavior in times of high congestion; and (2) | preserve packet-dropping behavior in times of high congestion; and | |||
| even in times of high congestion, dropping packets in itself is not | (2) even in times of high congestion, dropping packets in itself is | |||
| an adequate deterrent for non-compliance. | not an adequate deterrent for non-compliance. | |||
| First, ECN-Capable routers will only mark packets (as opposed to | First, ECN-Capable routers will only mark packets (as opposed to | |||
| dropping them) when the packet marking rate is reasonably low. During | dropping them) when the packet marking rate is reasonably low. During | |||
| periods where the average queue size exceeds an upper threshold, and | periods where the average queue size exceeds an upper threshold, and | |||
| therefore the potential packet marking rate would be high, our recom- | therefore the potential packet marking rate would be high, our | |||
| mendation is that routers drop packets rather than set the CE bit in | recommendation is that routers drop packets rather than set the CE | |||
| packet headers. | codepoint in packet headers. | |||
| During the periods of low or moderate packet marking rates when ECN | During the periods of low or moderate packet marking rates when ECN | |||
| would be deployed, there would be little deterrent effect on unre- | would be deployed, there would be little deterrent effect on | |||
| sponsive flows of dropping rather than marking those packets. For | unresponsive flows of dropping rather than marking those packets. For | |||
| example, delay-insensitive flows using reliable delivery might have | example, delay-insensitive flows using reliable delivery might have | |||
| an incentive to increase rather than to decrease their sending rate | an incentive to increase rather than to decrease their sending rate | |||
| in the presence of dropped packets. Similarly, delay-sensitive flows | in the presence of dropped packets. Similarly, delay-sensitive flows | |||
| using unreliable delivery might increase their use of FEC in response | using unreliable delivery might increase their use of FEC in response | |||
| to an increased packet drop rate, increasing rather than decreasing | to an increased packet drop rate, increasing rather than decreasing | |||
| their sending rate. For the same reasons, we do not believe that | their sending rate. For the same reasons, we do not believe that | |||
| packet dropping itself is an effective deterrent for non-compliance | packet dropping itself is an effective deterrent for non-compliance | |||
| even in an environment of high packet drop rates, when all flows are | even in an environment of high packet drop rates, when all flows are | |||
| sharing the same packet drop rate. | sharing the same packet drop rate. | |||
| Several methods have been proposed to identify and restrict non-com- | Several methods have been proposed to identify and restrict non- | |||
| pliant or unresponsive flows. The addition of ECN to the network | compliant or unresponsive flows. The addition of ECN to the network | |||
| environment would not in any way increase the difficulty of designing | environment would not in any way increase the difficulty of designing | |||
| and deploying such mechanisms. If anything, the addition of ECN to | and deploying such mechanisms. If anything, the addition of ECN to | |||
| the architecture would make the job of identifying unresponsive flows | the architecture would make the job of identifying unresponsive flows | |||
| slightly easier. For example, in an ECN-Capable environment routers | slightly easier. For example, in an ECN-Capable environment routers | |||
| are not limited to information about packets that are dropped or have | are not limited to information about packets that are dropped or have | |||
| the CE bit set at that router itself; in such an environment, routers | the CE codepoint set at that router itself; in such an environment, | |||
| could also take note of arriving CE packets that indicate congestion | routers could also take note of arriving CE packets that indicate | |||
| encountered by that packet earlier in the path. | congestion encountered by that packet earlier in the path. | |||
| 8. Non-compliance in the Network | 8. Non-compliance in the Network | |||
| This section considers the issues when a router is operating, possi- | This section considers the issues when a router is operating, | |||
| bly maliciously, to modify either of the bits in the ECN field. In | possibly maliciously, to modify either of the bits in the ECN field. | |||
| this section we represent the ECN field in the IP header by the tuple | ||||
| (ECT bit, CE bit). | ||||
| By tampering with the bits in the ECN field, an adversary (or a bro- | By tampering with the bits in the ECN field, an adversary (or a | |||
| ken router) could do one or more of the following: falsely report | broken router) could do one or more of the following: falsely report | |||
| congestion, disable ECN-Capability for an individual packet, erase | congestion, disable ECN-Capability for an individual packet, erase | |||
| the ECN congestion indication, or falsely indicate ECN-Capability. | the ECN congestion indication, or falsely indicate ECN-Capability. | |||
| Section 18 systematically examines the various cases by which the ECN | Section 18 systematically examines the various cases by which the ECN | |||
| field could be modified. The important criterion considered in | field could be modified. The important criterion considered in | |||
| determining the consequences of such modifications is whether it is | determining the consequences of such modifications is whether it is | |||
| likely to lead to poorer behavior in any dimension (throughput, | likely to lead to poorer behavior in any dimension (throughput, | |||
| delay, fairness or functionality) than if a router were to drop a | delay, fairness or functionality) than if a router were to drop a | |||
| packet. | packet. | |||
| The first two possible changes, falsely reporting congestion or dis- | The first two possible changes, falsely reporting congestion or | |||
| abling ECN-Capability for an individual packet, are no worse than if | disabling ECN-Capability for an individual packet, are no worse than | |||
| the router were to simply drop the packet. From a congestion control | if the router were to simply drop the packet. From a congestion | |||
| point of view, setting the CE bit in the absence of congestion by a | control point of view, setting the CE codepoint in the absence of | |||
| non-compliant router would be no worse than a router dropping a | congestion by a non-compliant router would be no worse than a router | |||
| packet unnecessarily. By "erasing" the ECT bit of a packet that is | dropping a packet unnecessarily. By "erasing" an ECT codepoint of a | |||
| later dropped in the network, a router's actions could result in an | packet that is later dropped in the network, a router's actions could | |||
| unnecessary packet drop for that packet later in the network. | result in an unnecessary packet drop for that packet later in the | |||
| network. | ||||
| However, as discussed in Section 18, a router that erases the ECN | However, as discussed in Section 18, a router that erases the ECN | |||
| congestion indication or falsely indicates ECN-Capability could | congestion indication or falsely indicates ECN-Capability could | |||
| potentially do more damage to the flow than if it had simply dropped | potentially do more damage to the flow than if it had simply dropped | |||
| the packet. A rogue or broken router that "erased" the CE bit in | the packet. A rogue or broken router that "erased" the CE codepoint | |||
| arriving CE packets would prevent that indication of congestion from | in arriving CE packets would prevent that indication of congestion | |||
| reaching downstream receivers. This could result in the failure of | from reaching downstream receivers. This could result in the failure | |||
| congestion control for that flow and a resulting increase in conges- | of congestion control for that flow and a resulting increase in | |||
| tion in the network, ultimately resulting in subsequent packets | congestion in the network, ultimately resulting in subsequent packets | |||
| dropped for this flow as the average queue size increased at the con- | dropped for this flow as the average queue size increased at the | |||
| gested gateway. | congested gateway. | |||
| Section 19 considers the potential repercussions of subverting end- | Section 19 considers the potential repercussions of subverting end- | |||
| to-end congestion control by either falsely indicating ECN-Capabil- | to-end congestion control by either falsely indicating ECN- | |||
| ity, or by erasing the congestion indication in ECN (the CE-bit). We | Capability, or by erasing the congestion indication in ECN (the CE- | |||
| observe in Section 19 that the consequence of subverting ECN-based | codepoint). We observe in Section 19 that the consequence of | |||
| congestion control may lead to potential unfairness, but this is | subverting ECN-based congestion control may lead to potential | |||
| likely to be no worse than the subversion of either ECN-based or | unfairness, but this is likely to be no worse than the subversion of | |||
| packet-based congestion control by the end nodes. | either ECN-based or packet-based congestion control by the end nodes. | |||
| 8.1. Complications Introduced by Split Paths | 8.1. Complications Introduced by Split Paths | |||
| If a router or other network element has access to all of the packets | If a router or other network element has access to all of the packets | |||
| of a flow, then that router could do no more damage to a flow by | of a flow, then that router could do no more damage to a flow by | |||
| altering the ECN field than it could by simply dropping all of the | altering the ECN field than it could by simply dropping all of the | |||
| packets from that flow. However, in some cases, a malicious or bro- | packets from that flow. However, in some cases, a malicious or | |||
| ken router might have access to only a subset of the packets from a | broken router might have access to only a subset of the packets from | |||
| flow. The question is as follows: can this router, by altering the | a flow. The question is as follows: can this router, by altering | |||
| ECN field in this subset of the packets, do more damage to that flow | the ECN field in this subset of the packets, do more damage to that | |||
| than if it had simply dropped that set of the packets? | flow than if it had simply dropped that set of the packets? | |||
| This is also discussed in detail in Section 18, which concludes as | This is also discussed in detail in Section 18, which concludes as | |||
| follows: It is true that the adversary that has access only to a | follows: It is true that the adversary that has access only to a | |||
| subset of packets in an aggregate might, by subverting ECN-based con- | subset of packets in an aggregate might, by subverting ECN-based | |||
| gestion control, be able to deny the benefits of ECN to the other | congestion control, be able to deny the benefits of ECN to the other | |||
| packets in the aggregate. While this is undesirable, this is not a | packets in the aggregate. While this is undesirable, this is not a | |||
| sufficient concern to result in disabling ECN. | sufficient concern to result in disabling ECN. | |||
| 9. Encapsulated Packets | 9. Encapsulated Packets | |||
| 9.1. IP packets encapsulated in IP | 9.1. IP packets encapsulated in IP | |||
| The encapsulation of IP packet headers in tunnels is used in many | The encapsulation of IP packet headers in tunnels is used in many | |||
| places, including IPsec and IP in IP [RFC2003]. This section consid- | places, including IPsec and IP in IP [RFC2003]. This section | |||
| ers issues related to interactions between ECN and IP tunnels, and | considers issues related to interactions between ECN and IP tunnels, | |||
| specifies two alternative solutions. This discussion is complemented | and specifies two alternative solutions. This discussion is | |||
| by RFC 2983's discussion of interactions between Differentiated Ser- | complemented by RFC 2983's discussion of interactions between | |||
| vices and IP tunnels of various forms [RFC 2983], as Differentiated | Differentiated Services and IP tunnels of various forms [RFC 2983], | |||
| Services uses the remaining six bits of the IP header octet that is | as Differentiated Services uses the remaining six bits of the IP | |||
| used by ECN (see Figure 1 in Section 5). | header octet that is used by ECN (see Figure 2 in Section 5). | |||
| Some IP tunnel modes are based on adding a new "outer" IP header that | Some IP tunnel modes are based on adding a new "outer" IP header that | |||
| encapsulates the original, or "inner" IP header and its associated | encapsulates the original, or "inner" IP header and its associated | |||
| packet. In many cases, the new "outer" IP header may be added and | packet. In many cases, the new "outer" IP header may be added and | |||
| removed at intermediate points along a connection, enabling the net- | removed at intermediate points along a connection, enabling the | |||
| work to establish a tunnel without requiring endpoint participation. | network to establish a tunnel without requiring endpoint | |||
| We denote tunnels that specify that the outer header be discarded at | participation. We denote tunnels that specify that the outer header | |||
| tunnel egress as "simple tunnels". | be discarded at tunnel egress as "simple tunnels". | |||
| ECN uses the ECT and CE flags in the IP header for signaling between | ECN uses the ECN field in the IP header for signaling between routers | |||
| routers and connection endpoints. ECN interacts with IP tunnels | and connection endpoints. ECN interacts with IP tunnels based on the | |||
| based on the treatment of these flags in the IP header. In simple IP | treatment of the ECN field in the IP header. In simple IP tunnels | |||
| tunnels the octet containing these flags is copied or mapped from the | the octet containing the ECN field is copied or mapped from the inner | |||
| inner IP header to the outer IP header at IP tunnel ingress, and the | IP header to the outer IP header at IP tunnel ingress, and the outer | |||
| outer header's copy of this field is discarded at IP tunnel egress. | header's copy of this field is discarded at IP tunnel egress. If the | |||
| If the outer header were to be simply discarded without taking care | outer header were to be simply discarded without taking care to deal | |||
| to deal with the ECN related flags, and an ECN-capable router were to | with the ECN field, and an ECN-capable router were to set the CE | |||
| set the CE (Congestion Experienced) bit within a packet in a simple | (Congestion Experienced) codepoint within a packet in a simple IP | |||
| IP tunnel, this indication would be discarded at tunnel egress, los- | tunnel, this indication would be discarded at tunnel egress, losing | |||
| ing the indication of congestion. | the indication of congestion. | |||
| Thus, the use of ECN over simple IP tunnels would result in routers | Thus, the use of ECN over simple IP tunnels would result in routers | |||
| attempting to use the outer IP header to signal congestion to end- | attempting to use the outer IP header to signal congestion to | |||
| points, but those congestion warnings never arriving because the | endpoints, but those congestion warnings never arriving because the | |||
| outer header is discarded at the tunnel egress point. This problem | outer header is discarded at the tunnel egress point. This problem | |||
| was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec- | was encountered with ECN and IPsec in tunnel mode, and RFC 2481 | |||
| ommended that ECN not be used with the older simple IPsec tunnels in | recommended that ECN not be used with the older simple IPsec tunnels | |||
| order to avoid this behavior and its consequences. When ECN becomes | in order to avoid this behavior and its consequences. When ECN | |||
| widely deployed, then simple tunnels likely to carry ECN-capable | becomes widely deployed, then simple tunnels likely to carry ECN- | |||
| traffic will have to be changed. | capable traffic will have to be changed. | |||
| From a security point of view, the use of ECN in the outer header of | From a security point of view, the use of ECN in the outer header of | |||
| an IP tunnel might raise security concerns because an adversary could | an IP tunnel might raise security concerns because an adversary could | |||
| tamper with the ECN information that propagates beyond the tunnel | tamper with the ECN information that propagates beyond the tunnel | |||
| endpoint. Based on an analysis in Sections 18 and 19 of these con- | endpoint. Based on an analysis in Sections 18 and 19 of these | |||
| cerns and the resultant risks, our overall approach is to make sup- | concerns and the resultant risks, our overall approach is to make | |||
| port for ECN an option for IP tunnels, so that an IP tunnel can be | support for ECN an option for IP tunnels, so that an IP tunnel can be | |||
| specified or configured either to use ECN or not to use ECN in the | specified or configured either to use ECN or not to use ECN in the | |||
| outer header of the tunnel. Thus, in environments or tunneling pro- | outer header of the tunnel. Thus, in environments or tunneling | |||
| tocols where the risks of using ECN are judged to outweigh its bene- | protocols where the risks of using ECN are judged to outweigh its | |||
| fits, the tunnel can simply not use ECN in the outer header. Then | benefits, the tunnel can simply not use ECN in the outer header. | |||
| the only indication of congestion experienced at routers within the | Then the only indication of congestion experienced at routers within | |||
| tunnel would be through packet loss. | the tunnel would be through packet loss. | |||
| The result is that there are two viable options for the behavior of | The result is that there are two viable options for the behavior of | |||
| ECN-capable connections over an IP tunnel, especially IPsec tunnels: | ECN-capable connections over an IP tunnel, especially IPsec tunnels: | |||
| * A limited-functionality option in which ECN is preserved in the | * A limited-functionality option in which ECN is preserved in the | |||
| inner header, but disabled in the outer header. The only mecha- | inner header, but disabled in the outer header. The only | |||
| nism available for signaling congestion occurring within the tun- | mechanism available for signaling congestion occurring within the | |||
| nel in this case is dropped packets. | tunnel in this case is dropped packets. | |||
| * A full-functionality option that supports ECN in both the inner | * A full-functionality option that supports ECN in both the inner | |||
| and outer headers, and propagates congestion warnings from nodes | and outer headers, and propagates congestion warnings from nodes | |||
| within the tunnel to endpoints. | within the tunnel to endpoints. | |||
| Support for these options requires varying amounts of changes to IP | Support for these options requires varying amounts of changes to IP | |||
| header processing at tunnel ingress and egress. A small subset of | header processing at tunnel ingress and egress. A small subset of | |||
| these changes sufficient to support only the limited-functionality | these changes sufficient to support only the limited-functionality | |||
| option would be sufficient to eliminate any incompatibility between | option would be sufficient to eliminate any incompatibility between | |||
| ECN and IP tunnels. | ECN and IP tunnels. | |||
| One goal of this document is to give guidance about the tradeoffs | One goal of this document is to give guidance about the tradeoffs | |||
| between the limited-functionality and full-functionality options. A | between the limited-functionality and full-functionality options. A | |||
| full discussion of the potential effects of an adversary's modifica- | full discussion of the potential effects of an adversary's | |||
| tions of the CE and ECT bits is given in Sections 18 and 19. | modifications of the ECN field is given in Sections 18 and 19. | |||
| 9.1.1. The Limited-functionality and Full-functionality Options | 9.1.1. The Limited-functionality and Full-functionality Options | |||
| The limited-functionality option for ECN encapsulation in IP tunnels | The limited-functionality option for ECN encapsulation in IP tunnels | |||
| is for the ECT bit in the outside (encapsulating) header to be off | is for the non-ECT codepoint to be set in the outside (encapsulating) | |||
| (i.e., set to 0), regardless of the value of the ECT bit in the | header regardless of the value of the ECN field in the inside | |||
| inside (encapsulated) header. With this option, the ECN field in the | (encapsulated) header. With this option, the ECN field in the inner | |||
| inner header is not altered upon de-capsulation. The disadvantage of | header is not altered upon de-capsulation. The disadvantage of this | |||
| this approach is that the flow does not have ECN support for that | approach is that the flow does not have ECN support for that part of | |||
| part of the path that is using IP tunneling, even if the encapsulated | the path that is using IP tunneling, even if the encapsulated packet | |||
| packet (from the original TCP sender) is ECN-Capable. That is, if | (from the original TCP sender) is ECN-Capable. That is, if the | |||
| the encapsulated packet arrives at a congested router that is ECN- | encapsulated packet arrives at a congested router that is ECN- | |||
| capable, and the router can decide to drop or mark the packet as an | capable, and the router can decide to drop or mark the packet as an | |||
| indication of congestion to the end nodes, the router will not be | indication of congestion to the end nodes, the router will not be | |||
| permitted to set the CE bit in the packet header, but instead will | permitted to set the CE codepoint in the packet header, but instead | |||
| have to drop the packet. | will have to drop the packet. | |||
| The full-functionality option for ECN encapsulation is to copy the | The full-functionality option for ECN encapsulation is to copy the | |||
| ECT bit of the inside header to the outside header on encapsulation, | ECN codepoint of the inside header to the outside header on | |||
| and to OR the CE bit from the outer header with the CE bit of the | encapsulation if the inside header is not-ECT or ECT, and to set the | |||
| inside header on decapsulation. That is, for full ECN support the | ECN codepoint of the outside header to ECT(0) if the ECN codepoint of | |||
| encapsulation and decapsulation processing involves the following: | the inside header is CE. On decapsulation, if the CE codepoint is | |||
| At tunnel ingress, the full-functionality option copies the value of | set on the outside header, then the CE codepoint is also set in the | |||
| ECT (bit 6) in the inner header to the outer header. CE (bit 7) is | inner header. Otherwise, the ECN codepoint on the inner header is | |||
| set to 0 in the outer header. Upon decapsulation at the tunnel | left unchanged. That is, for full ECN support the encapsulation and | |||
| egress, the full-functionality option sets CE to 1 in the inner | decapsulation processing involves the following: At tunnel ingress, | |||
| header if the value of ECT (bit 6) in the inner header is 1, and the | the full-functionality option sets the ECN codepoint in the outer | |||
| value of CE (bit 7) in the outer header is 1. Otherwise, no change | header. If the ECN codepoint in the inner header is not-ECT or ECT, | |||
| is made to this field of the inner header. | then it is copied to the ECN codepoint in the outer header. If the | |||
| ECN codepoint in the inner header is CE, then the ECN codepoint in | ||||
| the outer header is set to ECT(0). Upon decapsulation at the tunnel | ||||
| egress, the full-functionality option sets the CE codepoint in the | ||||
| inner header if the CE codepoint is set in the outer header. | ||||
| Otherwise, no change is made to this field of the inner header. | ||||
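
The full-functionality rules in the -02 column above can be sketched as two small functions. This is only an illustrative sketch: the `Ecn` enum, the function names, and the numeric codepoint values are assumptions not taken from this excerpt (the values follow the assignment later published in RFC 3168).

```python
from enum import Enum


class Ecn(Enum):
    # Numeric values are an assumption; this excerpt does not list them.
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def encapsulate_full(inner: Ecn) -> Ecn:
    """Return the ECN codepoint to place in the outer header at tunnel ingress."""
    if inner is Ecn.CE:
        # An inner CE is not copied; the outer header starts as ECT(0).
        return Ecn.ECT_0
    # not-ECT, ECT(0) and ECT(1) are copied unchanged.
    return inner


def decapsulate_full(inner: Ecn, outer: Ecn) -> Ecn:
    """Return the ECN codepoint of the inner header after tunnel egress."""
    if outer is Ecn.CE:
        # Congestion marked inside the tunnel is propagated to the inner header.
        return Ecn.CE
    # Otherwise the inner header is left unchanged.
    return inner
```

Note that a CE codepoint already present in the inner header survives the tunnel because the egress never clears it: `decapsulate_full(Ecn.CE, Ecn.ECT_0)` still returns `Ecn.CE`. The sketch follows this paragraph only and does not apply the drop recommendation given later in this section.
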
| With the full-functionality option, a flow can take advantage of ECN | With the full-functionality option, a flow can take advantage of ECN | |||
| in those parts of the path that might use IP tunneling. The disad- | in those parts of the path that might use IP tunneling. The | |||
| vantage of the full-functionality option from a security perspective | disadvantage of the full-functionality option from a security | |||
| is that the IP tunnel cannot protect the flow from certain modifica- | perspective is that the IP tunnel cannot protect the flow from | |||
| tions to the ECN bits in the IP header within the tunnel. The poten- | certain modifications to the ECN bits in the IP header within the | |||
| tial dangers from modifications to the ECN bits in the IP header are | tunnel. The potential dangers from modifications to the ECN bits in | |||
| described in detail in Sections 18 and 19. | the IP header are described in detail in Sections 18 and 19. | |||
| (1) An IP tunnel MUST modify the handling of the DS field octet at | (1) An IP tunnel MUST modify the handling of the DS field octet at | |||
| IP tunnel endpoints by implementing either the limited-functional- | IP tunnel endpoints by implementing either the limited- | |||
| ity or the full-functionality option. | functionality or the full-functionality option. | |||
| (2) Optionally, an IP tunnel MAY enable the endpoints of an IP | (2) Optionally, an IP tunnel MAY enable the endpoints of an IP | |||
| tunnel to negotiate the choice between the limited-functionality | tunnel to negotiate the choice between the limited-functionality | |||
| and the full-functionality option for ECN in the tunnel. | and the full-functionality option for ECN in the tunnel. | |||
| The minimum required to make ECN usable with IP tunnels is the lim- | The minimum required to make ECN usable with IP tunnels is the | |||
| ited-functionality option, which prevents ECN from being enabled in | limited-functionality option, which prevents ECN from being enabled | |||
| the outer header of an IPsec tunnel. Full support for ECN requires | in the outer header of an IPsec tunnel. Full support for ECN | |||
| the use of the full-functionality option. If there are no optional | requires the use of the full-functionality option. If there are no | |||
| mechanisms for the tunnel endpoints to negotiate a choice between the | optional mechanisms for the tunnel endpoints to negotiate a choice | |||
| limited-functionality or full-functionality option, there can be a | between the limited-functionality or full-functionality option, there | |||
| pre-existing agreement between the tunnel endpoints about whether to | can be a pre-existing agreement between the tunnel endpoints about | |||
| support the limited-functionality or the full-functionality ECN | whether to support the limited-functionality or the full- | |||
| option. | functionality ECN option. | |||
| In addition, it is RECOMMENDED that packets with ECT and CE both set | In addition, it is RECOMMENDED that packets with the CE codepoint in | |||
| to 1 in the outer header be dropped if they arrive at the tunnel | the outer header be dropped if they arrive at the tunnel egress point | |||
| egress point for a tunnel that uses the limited-functionality option, | for a tunnel that uses the limited-functionality option, or for a | |||
| or for a tunnel that uses the full-functionality option but for which | tunnel that uses the full-functionality option but for which the not- | |||
| the ECT bit in the inner header is set to zero. This is motivated by | ECT codepoint is set in the inner header. This is motivated by | |||
| backwards compatibility and to ensure that no unauthorized modifica- | backwards compatibility and to ensure that no unauthorized | |||
| tions of the ECN field take place, and is discussed further in the | modifications of the ECN field take place, and is discussed further | |||
| next Section (9.1.2). | in the next Section (9.1.2). | |||
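
The drop recommendation in the -02 column of the paragraph above can be written as a single predicate. This is a hedged sketch: the `Ecn` enum, its numeric values, and the `should_drop_at_egress` name and boolean `full_functionality` flag are hypothetical and exist only for illustration.

```python
from enum import Enum


class Ecn(Enum):
    # Numeric values are an assumption; this excerpt does not list them.
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def should_drop_at_egress(full_functionality: bool, inner: Ecn, outer: Ecn) -> bool:
    """Apply the RECOMMENDED egress drop rule for an outer header carrying CE."""
    if outer is not Ecn.CE:
        # The recommendation only concerns packets whose outer header is CE.
        return False
    if not full_functionality:
        # Limited-functionality tunnel: the outer header should never carry CE.
        return True
    # Full-functionality tunnel: CE on a flow whose inner header is not-ECT.
    return inner is Ecn.NOT_ECT
```

For example, `should_drop_at_egress(True, Ecn.NOT_ECT, Ecn.CE)` is true, while the same outer header over an inner ECT(0) packet on a full-functionality tunnel is forwarded and has CE propagated to the inner header.
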
| 9.1.2. Changes to the ECN Field within an IP Tunnel. | 9.1.2. Changes to the ECN Field within an IP Tunnel. | |||
| The presence of a copy of the ECN field in the inner header of an IP | The presence of a copy of the ECN field in the inner header of an IP | |||
| tunnel mode packet provides an opportunity for detection of unautho- | tunnel mode packet provides an opportunity for detection of | |||
| rized modifications to the ECT bit in the outer header. Comparison | unauthorized modifications to the ECN field in the outer header. | |||
| of the ECT bits in the inner and outer headers falls into two cate- | Comparison of the ECT fields in the inner and outer headers falls | |||
| gories for implementations that conform to this document: | into two categories for implementations that conform to this | |||
| document: | ||||
| * If the IP tunnel uses the full-functionality option, then the | * If the IP tunnel uses the full-functionality option, then the | |||
| values of the ECT bits in the inner and outer headers should be | not-ECT codepoint should be set in the outer header if and only if | |||
| identical. | it is also set in the inner header. | |||
| * If the tunnel uses the limited-functionality option, then the | * If the tunnel uses the limited-functionality option, then the | |||
| ECT bit in the outer header should be 0. | not-ECT codepoint should be set in the outer header. | |||
| Receipt of a packet not satisfying the appropriate condition could be | Receipt of a packet not satisfying the appropriate condition could be | |||
| a cause of concern. | a cause of concern. | |||
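
The two consistency conditions listed above (as worded in the -02 column) can be checked at tunnel egress roughly as follows. This is a minimal sketch under stated assumptions: the `Ecn` enum, its numeric values, and the `outer_is_consistent` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class Ecn(Enum):
    # Numeric values are an assumption; this excerpt does not list them.
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def outer_is_consistent(full_functionality: bool, inner: Ecn, outer: Ecn) -> bool:
    """Check the outer header against the condition for the tunnel's option."""
    if full_functionality:
        # not-ECT in the outer header if and only if not-ECT in the inner header.
        return (outer is Ecn.NOT_ECT) == (inner is Ecn.NOT_ECT)
    # Limited-functionality: the outer header should always be not-ECT.
    return outer is Ecn.NOT_ECT
```

A result of `False` does not by itself say what to do with the packet; it only flags the "cause of concern" case that the following paragraphs discuss.
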
| Consider the case of an IP tunnel where the tunnel ingress point has | Consider the case of an IP tunnel where the tunnel ingress point has | |||
| not been updated to this document's requirements, while the tunnel | not been updated to this document's requirements, while the tunnel | |||
| egress point has been updated to support ECN. In this case, the IP | egress point has been updated to support ECN. In this case, the IP | |||
| tunnel is not explicitly configured to support the full-functionality | tunnel is not explicitly configured to support the full-functionality | |||
| ECN option. However, the tunnel ingress point is behaving identically | ECN option. However, the tunnel ingress point is behaving identically | |||
| to a tunnel ingress point that supports the full-functionality | to a tunnel ingress point that supports the full-functionality | |||
| option. If packets from an ECN-capable connection use this tunnel, | option. If packets from an ECN-capable connection use this tunnel, | |||
| ECT will be set to 1 in the outer header at the tunnel ingress point. | the ECT codepoint will be set in the outer header at the tunnel | |||
| Congestion within the tunnel may then result in ECN-capable routers | ingress point. Congestion within the tunnel may then result in ECN- | |||
| setting CE in the outer header. Because the tunnel has not been | capable routers setting CE in the outer header. Because the tunnel | |||
| explicitly configured to support the full-functionality option, the | has not been explicitly configured to support the full-functionality | |||
| tunnel egress point expects the ECT bit in the outer header to be 0. | option, the tunnel egress point expects the not-ECT codepoint to be | |||
| When an ECN-capable tunnel egress point receives a packet with the | set in the outer header. When an ECN-capable tunnel egress point | |||
| ECT bit in the outer header set to 1, in a tunnel that has not been | receives a packet with the ECT or CE codepoint in the outer header, | |||
| configured to support the full-functionality option, that packet | in a tunnel that has not been configured to support the full- | |||
| should be processed, according to whether CE bit was set, as follows. | functionality option, that packet should be processed, according to | |||
| It is RECOMMENDED that such packets, with the ECT bit in the outer | whether the CE codepoint was set, as follows. It is RECOMMENDED that | |||
| header set to 1 on a tunnel that has not been configured to support | on a tunnel that has not been configured to support the full- | |||
| the full-functionality option, be dropped at the egress point if CE | functionality option, packets should be dropped at the egress point | |||
| is set to 1 in the outer header but 0 in the inner header, and for- | if the CE codepoint is set in the outer header but not in the inner | |||
| warded otherwise. | header, and should be forwarded otherwise. | |||
| An IP tunnel cannot provide protection against erasure of congestion | An IP tunnel cannot provide protection against erasure of congestion | |||
| indications based on resetting the value of the CE bit in packets for | indications based on changing the ECN codepoint from CE to ECT. The | |||
| which ECT is set in the outer header. The erasure of congestion | erasure of congestion indications may impact the network and other | |||
| indications may impact the network and other flows in ways that would | flows in ways that would not be possible in the absence of ECN. It | |||
| not be possible in the absence of ECN. It is important to note that | is important to note that erasure of congestion indications can only | |||
| erasure of congestion indications can only be performed to congestion | be performed to congestion indications placed by nodes within the | |||
| indications placed by nodes within the tunnel; the copy of the CE bit | tunnel; the copy of the ECN field in the inner header preserves | |||
| in the inner header preserves congestion notifications from nodes | congestion notifications from nodes upstream of the tunnel ingress | |||
| upstream of the tunnel ingress. If erasure of congestion notifica- | (unless the inner header is also erased). If erasure of congestion | |||
| tions is judged to be a security risk that exceeds the congestion | notifications is judged to be a security risk that exceeds the | |||
| management benefits of ECN, then tunnels could be specified or con- | congestion management benefits of ECN, then tunnels could be | |||
| figured to use the limited-functionality option. | specified or configured to use the limited-functionality option. | |||
| 9.2. IPsec Tunnels | 9.2. IPsec Tunnels | |||
| IPsec supports secure communication over potentially insecure network | IPsec supports secure communication over potentially insecure network | |||
| components such as intermediate routers. IPsec protocols support two | components such as intermediate routers. IPsec protocols support two | |||
| operating modes, transport mode and tunnel mode, that span a wide | operating modes, transport mode and tunnel mode, that span a wide | |||
| range of security requirements and operating environments. Transport | range of security requirements and operating environments. Transport | |||
| mode security protocol header(s) are inserted between the IP (IPv4 or | mode security protocol header(s) are inserted between the IP (IPv4 or | |||
| IPv6) header and higher layer protocol headers (e.g., TCP), and hence | IPv6) header and higher layer protocol headers (e.g., TCP), and hence | |||
| transport mode can only be used for end-to-end security on a connec- | transport mode can only be used for end-to-end security on a | |||
| tion. IPsec tunnel mode is based on adding a new "outer" IP header | connection. IPsec tunnel mode is based on adding a new "outer" IP | |||
| that encapsulates the original, or "inner" IP header and its associ- | header that encapsulates the original, or "inner" IP header and its | |||
| ated packet. Tunnel mode security headers are inserted between these | associated packet. Tunnel mode security headers are inserted between | |||
| two IP headers. In contrast to transport mode, the new "outer" IP | these two IP headers. In contrast to transport mode, the new "outer" | |||
| header and tunnel mode security headers can be added and removed at | IP header and tunnel mode security headers can be added and removed | |||
| intermediate points along a connection, enabling security gateways to | at intermediate points along a connection, enabling security gateways | |||
| secure vulnerable portions of a connection without requiring endpoint | to secure vulnerable portions of a connection without requiring | |||
| participation in the security protocols. An important aspect of tun- | endpoint participation in the security protocols. An important | |||
| nel mode security is that in the original specification, the outer | aspect of tunnel mode security is that in the original specification, | |||
| header is discarded at tunnel egress, ensuring that security threats | the outer header is discarded at tunnel egress, ensuring that | |||
| based on modifying the IP header do not propagate beyond that tunnel | security threats based on modifying the IP header do not propagate | |||
| endpoint. Further discussion of IPsec can be found in [RFC2401]. | beyond that tunnel endpoint. Further discussion of IPsec can be | |||
| found in [RFC2401]. | ||||
| The IPsec protocol as originally defined in [ESP, AH] required that | The IPsec protocol as originally defined in [ESP, AH] required that | |||
| the inner header's ECN field not be changed by IPsec decapsulation | the inner header's ECN field not be changed by IPsec decapsulation | |||
| processing at a tunnel egress node; this would have ruled out the | processing at a tunnel egress node; this would have ruled out the | |||
| possibility of full-functionality mode for ECN. At the same time, | possibility of full-functionality mode for ECN. At the same time, | |||
| this would ensure that an adversary's modifications to the ECN field | this would ensure that an adversary's modifications to the ECN field | |||
| cannot be used to launch theft- or denial-of-service attacks across | cannot be used to launch theft- or denial-of-service attacks across | |||
| an IPsec tunnel endpoint, as any such modifications will be discarded | an IPsec tunnel endpoint, as any such modifications will be discarded | |||
| at the tunnel endpoint. | at the tunnel endpoint. | |||
| In principle, permitting the use of ECN functionality in the outer | In principle, permitting the use of ECN functionality in the outer | |||
| header of an IPsec tunnel raises security concerns because an adver- | header of an IPsec tunnel raises security concerns because an | |||
| sary could tamper with the information that propagates beyond the | adversary could tamper with the information that propagates beyond | |||
| tunnel endpoint. Based on an analysis (included in Sections 18 and | the tunnel endpoint. Based on an analysis (included in Sections 18 | |||
| 19) of these concerns and the associated risks, our overall approach | and 19) of these concerns and the associated risks, our overall | |||
| has been to provide configuration support for IPsec changes to remove | approach has been to provide configuration support for IPsec changes | |||
| the conflict with ECN. | to remove the conflict with ECN. | |||
| In particular, in tunnel mode the IPsec tunnel MUST support either | In particular, in tunnel mode the IPsec tunnel MUST support either | |||
| the limited-functionality or the full-functionality mode outlined in | the limited-functionality or the full-functionality mode outlined in | |||
| Section 9.1.1. | Section 9.1.1. | |||
| This makes permission to use ECN functionality in the outer header of | This makes permission to use ECN functionality in the outer header of | |||
| an IPsec tunnel a configurable part of the corresponding IPsec Secu- | an IPsec tunnel a configurable part of the corresponding IPsec | |||
| rity Association (SA), so that it can be disabled in situations where | Security Association (SA), so that it can be disabled in situations | |||
| the risks are judged to outweigh the benefits. The result is that an | where the risks are judged to outweigh the benefits. The result is | |||
| IPsec security administrator is presented with two alternatives for | that an IPsec security administrator is presented with two | |||
| the behavior of ECN-capable connections within an IPsec tunnel, the | alternatives for the behavior of ECN-capable connections within an | |||
| limited-functionality alternative and full-functionality alternative | IPsec tunnel, the limited-functionality alternative and full- | |||
| described earlier. All IPsec implementations MUST implement either | functionality alternative described earlier. All IPsec | |||
| the limited-functionality or the full-functionality alternative in | implementations MUST implement either the limited-functionality or | |||
| order to eliminate incompatibility between ECN and IPsec tunnels, but | the full-functionality alternative in order to eliminate | |||
| implementers MAY choose to implement either alternative. | incompatibility between ECN and IPsec tunnels, but implementers MAY | |||
| choose to implement either alternative. | ||||
| In addition, this document specifies how the endpoints of an IPsec | In addition, this document specifies how the endpoints of an IPsec | |||
| tunnel could negotiate enabling ECN functionality in the outer head- | tunnel could negotiate enabling ECN functionality in the outer | |||
| ers of that tunnel based on security policy. The ability to negoti- | headers of that tunnel based on security policy. The ability to | |||
| ate ECN usage between tunnel endpoints would enable a security admin- | negotiate ECN usage between tunnel endpoints would enable a security | |||
| istrator to disable ECN in situations where she believes the risks | administrator to disable ECN in situations where she believes the | |||
| (e.g., of lost congestion notifications) outweigh the benefits of | risks (e.g., of lost congestion notifications) outweigh the benefits | |||
| ECN. | of ECN. | |||
| The IPsec protocol, as defined in [ESP, AH], does not include the IP | The IPsec protocol, as defined in [ESP, AH], does not include the IP | |||
| header's ECN field in any of its cryptographic calculations (in the | header's ECN field in any of its cryptographic calculations (in the | |||
| case of tunnel mode, the outer IP header's ECN field is not | case of tunnel mode, the outer IP header's ECN field is not | |||
| included). Hence modification of the ECN field by a network node has | included). Hence modification of the ECN field by a network node has | |||
| no effect on IPsec's end-to-end security, because it cannot cause any | no effect on IPsec's end-to-end security, because it cannot cause any | |||
| IPsec integrity check to fail. As a consequence, IPsec does not pro- | IPsec integrity check to fail. As a consequence, IPsec does not | |||
| vide any defense against an adversary's modification of the ECN field | provide any defense against an adversary's modification of the ECN | |||
| (i.e., a man-in-the-middle attack), as the adversary's modification | field (i.e., a man-in-the-middle attack), as the adversary's | |||
| will also have no effect on IPsec's end-to-end security. In some | modification will also have no effect on IPsec's end-to-end security. | |||
| environments, the ability to modify the ECN field without affecting | In some environments, the ability to modify the ECN field without | |||
| IPsec integrity checks may constitute a covert channel; if it is nec- | affecting IPsec integrity checks may constitute a covert channel; if | |||
| essary to eliminate such a channel or reduce its bandwidth, then the | it is necessary to eliminate such a channel or reduce its bandwidth, | |||
| IPsec tunnel should be run in limited-functionality mode. | then the IPsec tunnel should be run in limited-functionality mode. | |||
| 9.2.1. Negotiation between Tunnel Endpoints | 9.2.1. Negotiation between Tunnel Endpoints | |||
| This section describes the detailed changes to enable usage of ECN | This section describes the detailed changes to enable usage of ECN | |||
| over IPsec tunnels, including the negotiation of ECN support between | over IPsec tunnels, including the negotiation of ECN support between | |||
| tunnel endpoints. This is supported by three changes to IPsec: | tunnel endpoints. This is supported by three changes to IPsec: | |||
| * An optional Security Association Database (SAD) field indicating | * An optional Security Association Database (SAD) field indicating | |||
| whether tunnel encapsulation and decapsulation processing allows | whether tunnel encapsulation and decapsulation processing allows | |||
| or forbids ECN usage in the outer IP header. | or forbids ECN usage in the outer IP header. | |||
| * An optional Security Association Attribute that enables negotia- | * An optional Security Association Attribute that enables | |||
| tion of this SAD field between the two endpoints of an SA that | negotiation of this SAD field between the two endpoints of an SA | |||
| supports tunnel mode. | that supports tunnel mode. | |||
| * Changes to tunnel mode encapsulation and decapsulation process- | * Changes to tunnel mode encapsulation and decapsulation | |||
| ing to allow or forbid ECN usage in the outer IP header based on | processing to allow or forbid ECN usage in the outer IP header | |||
| the value of the SAD field. When ECN usage is allowed in the | based on the value of the SAD field. When ECN usage is allowed in | |||
| outer IP header, ECT is set in the outer header for ECN-capable | the outer IP header, the ECT codepoint is set in the outer header | |||
| connections and congestion notifications (indicated by the CE bit) | for ECN-capable connections and congestion notifications | |||
| from such connections are propagated to the inner header at tunnel | (indicated by the CE codepoint) from such connections are | |||
| egress. | propagated to the inner header at tunnel egress. | |||
| If negotiation of ECN usage is implemented, then the SAD field SHOULD | If negotiation of ECN usage is implemented, then the SAD field SHOULD | |||
| also be implemented. On the other hand, negotiation of ECN usage is | also be implemented. On the other hand, negotiation of ECN usage is | |||
| OPTIONAL in all cases, even for implementations that support the SAD | OPTIONAL in all cases, even for implementations that support the SAD | |||
| field. The encapsulation and decapsulation processing changes are | field. The encapsulation and decapsulation processing changes are | |||
| REQUIRED, but MAY be implemented without the other two changes by | REQUIRED, but MAY be implemented without the other two changes by | |||
| assuming that ECN usage is always forbidden. The full-functionality | assuming that ECN usage is always forbidden. The full-functionality | |||
| alternative for ECN usage over IPsec tunnels consists of the SAD | alternative for ECN usage over IPsec tunnels consists of the SAD | |||
| field and the full version of encapsulation and decapsulation pro- | field and the full version of encapsulation and decapsulation | |||
| cessing changes, with or without the OPTIONAL negotiation support. | processing changes, with or without the OPTIONAL negotiation support. | |||
| The limited-functionality alternative consists of a subset of the | The limited-functionality alternative consists of a subset of the | |||
| encapsulation and decapsulation changes that always forbids ECN | encapsulation and decapsulation changes that always forbids ECN | |||
| usage. | usage. | |||
| These changes are covered further in the following three subsections. | These changes are covered further in the following three subsections. | |||
| 9.2.1.1. ECN Tunnel Security Association Database Field | 9.2.1.1. ECN Tunnel Security Association Database Field | |||
| Full ECN functionality adds a new field to the SAD (see [RFC2401]): | Full ECN functionality adds a new field to the SAD (see [RFC2401]): | |||
| skipping to change at page 29, line 8 ¶ | skipping to change at page 31, line 14 ¶ | |||
| congestion occurring within the tunnel. The allowed value enables | congestion occurring within the tunnel. The allowed value enables | |||
| ECN congestion notifications. The forbidden value disables such | ECN congestion notifications. The forbidden value disables such | |||
| notifications, causing all congestion to be indicated via dropped | notifications, causing all congestion to be indicated via dropped | |||
| packets. | packets. | |||
| [OPTIONAL. The value of this field SHOULD be assumed to be | [OPTIONAL. The value of this field SHOULD be assumed to be | |||
| "forbidden" in implementations that do not support it.] | "forbidden" in implementations that do not support it.] | |||
| If this attribute is implemented, then the SA specification in a | If this attribute is implemented, then the SA specification in a | |||
| Security Policy Database (SPD) entry MUST support a corresponding | Security Policy Database (SPD) entry MUST support a corresponding | |||
| attribute, and this SPD attribute MUST be covered by the SPD adminis- | attribute, and this SPD attribute MUST be covered by the SPD | |||
| trative interface (currently described in Section 4.4.1 of | administrative interface (currently described in Section 4.4.1 of | |||
| [RFC2401]). | [RFC2401]). | |||
| 9.2.1.2. ECN Tunnel Security Association Attribute | 9.2.1.2. ECN Tunnel Security Association Attribute | |||
| A new IPsec Security Association Attribute is defined to enable the | A new IPsec Security Association Attribute is defined to enable the | |||
| support for ECN congestion notifications based on the outer IP header | support for ECN congestion notifications based on the outer IP header | |||
| to be negotiated for IPsec tunnels (see [RFC2407]). This attribute | to be negotiated for IPsec tunnels (see [RFC2407]). This attribute | |||
| is OPTIONAL, although implementations that support it SHOULD also | is OPTIONAL, although implementations that support it SHOULD also | |||
| support the SAD field defined in Section 9.2.1.1. | support the SAD field defined in Section 9.2.1.1. | |||
| Attribute Type | Attribute Type | |||
| class value type | class value type | |||
| ------------------------------------------------- | ------------------------------------------------- | |||
| ECN Tunnel 10 Basic | ECN Tunnel 10 Basic | |||
| The IPsec SA Attribute value 10 has been allocated by IANA to indi- | The IPsec SA Attribute value 10 has been allocated by IANA to | |||
| cate that the ECN Tunnel SA Attribute is being negotiated; the type | indicate that the ECN Tunnel SA Attribute is being negotiated; the | |||
| of this attribute is Basic (see Section 4.5 of [RFC2407]). The Class | type of this attribute is Basic (see Section 4.5 of [RFC2407]). The | |||
| Values are used to conduct the negotiation. See [RFC2407, RFC2408, | Class Values are used to conduct the negotiation. See [RFC2407, | |||
| RFC2409] for further information including encoding formats and | RFC2408, RFC2409] for further information including encoding formats | |||
| requirements for negotiating this SA attribute. | and requirements for negotiating this SA attribute. | |||
| Class Values | Class Values | |||
| ECN Tunnel | ECN Tunnel | |||
| Specifies whether ECN functionality is allowed to | Specifies whether ECN functionality is allowed to | |||
| be used with Tunnel Encapsulation Mode. | be used with Tunnel Encapsulation Mode. | |||
| This affects tunnel encapsulation and decapsulation processing - | This affects tunnel encapsulation and decapsulation processing - | |||
| see Section 9.2.1.3. | see Section 9.2.1.3. | |||
| RESERVED 0 | RESERVED 0 | |||
| Allowed 1 | Allowed 1 | |||
| Forbidden 2 | Forbidden 2 | |||
| Values 3-61439 are reserved to IANA. Values 61440-65535 are for | Values 3-61439 are reserved to IANA. Values 61440-65535 are for | |||
| private use. | private use. | |||
| If unspecified, the default shall be assumed to be Forbidden. | If unspecified, the default shall be assumed to be Forbidden. | |||
| ECN Tunnel is a new SA attribute, and hence initiators that use it | ECN Tunnel is a new SA attribute, and hence initiators that use it | |||
| can expect to encounter responders that do not understand it, and | can expect to encounter responders that do not understand it, and | |||
| therefore reject proposals containing it. For backwards compatibil- | therefore reject proposals containing it. For backwards | |||
| ity with such implementations initiators SHOULD always also include a | compatibility with such implementations initiators SHOULD always also | |||
| proposal without the ECN Tunnel attribute to enable such a responder | include a proposal without the ECN Tunnel attribute to enable such a | |||
| to select a transform or proposal that does not contain the ECN Tun- | responder to select a transform or proposal that does not contain the | |||
| nel attribute. RFC 2407 currently requires responders to reject all | ECN Tunnel attribute. RFC 2407 currently requires responders to | |||
| proposals if any proposal contains an unknown attribute; this | reject all proposals if any proposal contains an unknown attribute; | |||
| requirement is expected to be changed to require a responder not to | this requirement is expected to be changed to require a responder not | |||
| select proposals or transforms containing unknown attributes. | to select proposals or transforms containing unknown attributes. | |||
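
The backwards-compatibility advice above can be modelled with a toy negotiation. This sketch does not reproduce the ISAKMP wire encoding: proposals are plain dictionaries keyed by attribute number, and `build_proposals`, `respond`, and `negotiated_ecn_value` are hypothetical helpers. Only the attribute number (10), the class values (Allowed 1, Forbidden 2), and the Forbidden default come from the text.

```python
ECN_TUNNEL_ATTR = 10   # IPsec SA attribute number allocated for ECN Tunnel
ECN_ALLOWED = 1
ECN_FORBIDDEN = 2


def build_proposals(base_attrs: dict, allow_ecn: bool) -> list:
    """Offer a proposal with the ECN Tunnel attribute and a fallback without it."""
    with_ecn = dict(base_attrs)
    with_ecn[ECN_TUNNEL_ATTR] = ECN_ALLOWED if allow_ecn else ECN_FORBIDDEN
    # The second proposal lets a responder that does not know the attribute
    # still select something.
    return [with_ecn, dict(base_attrs)]


def respond(proposals: list, known_attrs: set):
    """Toy responder: pick the first proposal made only of known attributes."""
    for proposal in proposals:
        if set(proposal) <= known_attrs:
            return proposal
    return None


def negotiated_ecn_value(selected: dict) -> int:
    """If the attribute is unspecified, the default is Forbidden."""
    return selected.get(ECN_TUNNEL_ATTR, ECN_FORBIDDEN)
```

The toy responder models the expected future behavior (not selecting proposals with unknown attributes), not the RFC 2407 behavior of rejecting all proposals when any contains an unknown attribute. A responder that knows only some hypothetical attribute 1 would select the fallback proposal, and `negotiated_ecn_value` would then report Forbidden for that SA.
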
| 9.2.1.3. Changes to IPsec Tunnel Header Processing | 9.2.1.3. Changes to IPsec Tunnel Header Processing | |||
| For full ECN support, the encapsulation and decapsulation processing | For full ECN support, the encapsulation and decapsulation processing | |||
| for the IPv4 TOS field and the IPv6 Traffic Class field are changed | for the IPv4 TOS field and the IPv6 Traffic Class field are changed | |||
| from that specified in [RFC2401] to the following: | from that specified in [RFC2401] to the following: | |||
| <-- How Outer Hdr Relates to Inner Hdr --> | <-- How Outer Hdr Relates to Inner Hdr --> | |||
| Outer Hdr at Inner Hdr at | Outer Hdr at Inner Hdr at | |||
| IPv4 Encapsulator Decapsulator | IPv4 Encapsulator Decapsulator | |||
| skipping to change at page 30, line 38 ¶ | skipping to change at page 32, line 44 ¶ | |||
| Header fields: | Header fields: | |||
| DS Field copied from inner hdr (6) no change | DS Field copied from inner hdr (6) no change | |||
| ECN Field constructed (7) constructed (8) | ECN Field constructed (7) constructed (8) | |||
| (5)(6) If the packet will immediately enter a domain for which the | (5)(6) If the packet will immediately enter a domain for which the | |||
| DSCP value in the outer header is not appropriate, that value MUST | DSCP value in the outer header is not appropriate, that value MUST | |||
| be mapped to an appropriate value for the domain [RFC 2474]. Also | be mapped to an appropriate value for the domain [RFC 2474]. Also | |||
| see [RFC 2475] for further information. | see [RFC 2475] for further information. | |||
| (7) If the value of the ECN Tunnel field in the SAD entry for this | (7) If the value of the ECN Tunnel field in the SAD entry for this | |||
| SA is "allowed" and the value of ECT (bit 0) is 1 in the inner | SA is "allowed" and the ECN field in the inner header is set to | |||
| header, set ECT to 1 in the outer header, else set ECT to 0 in the | any value other than CE, copy this ECN field to the outer header. | |||
| outer header. Set CE (bit 1) to 0 in the outer header. | If the ECN field in the inner header is set to CE, then set the | |||
| ECN field in the outer header to ECT(0). | ||||
| (8) If the value of the ECN tunnel field in the SAD entry for this | (8) If the value of the ECN tunnel field in the SAD entry for this | |||
| SA is "allowed" and the value of ECT (bit 0) in the inner header | SA is "allowed" and the ECN field in the inner header is set to | |||
| is 1, then set the CE bit (bit 1) in the inner header to the logi- | ECT(0) or ECT(1) and the ECN field in the outer header is set to | |||
| cal OR of the CE bit in the inner header with the CE bit in the | CE, then copy the ECN field from the outer header to the inner | |||
| outer header, else make no change to the ECN field. | header. Otherwise, make no change to the ECN field in the inner | |||
| header. | ||||
| (5) and (6) are identical here; both numbers are kept to match | (5) and (6) are identical here; both numbers are kept to match | |||
| usage in [RFC2401], where they differ. | usage in [RFC2401], where they differ. | |||
| The above description applies to implementations that support the ECN | The above description applies to implementations that support the ECN | |||
| Tunnel field in the SAD; such implementations MUST implement this | Tunnel field in the SAD; such implementations MUST implement this | |||
| processing instead of the processing of the IPv4 TOS octet and IPv6 | processing instead of the processing of the IPv4 TOS octet and IPv6 | |||
| Traffic Class octet defined in [RFC2401]. This constitutes the full- | Traffic Class octet defined in [RFC2401]. This constitutes the full- | |||
| functionality alternative for ECN usage with IPsec tunnels. | functionality alternative for ECN usage with IPsec tunnels. | |||
| An implementation that does not support the ECN Tunnel field in the | An implementation that does not support the ECN Tunnel field in the | |||
| SAD MUST implement this processing by assuming that the value of the | SAD MUST implement this processing by assuming that the value of the | |||
| ECN Tunnel field of the SAD is "forbidden" for every SA. In this | ECN Tunnel field of the SAD is "forbidden" for every SA. In this | |||
| case, the processing of the ECN field reduces to: | case, the processing of the ECN field reduces to: | |||
| (7) Set the ECN field (ECT and CE bits) to zero in the outer | (7) Set the ECN field to not-ECT in the outer header. | |||
| header. | ||||
| (8) Make no change to the ECN field in the inner header. | (8) Make no change to the ECN field in the inner header. | |||
| This constitutes the limited functionality alternative for ECN usage | This constitutes the limited functionality alternative for ECN usage | |||
| with IPsec tunnels. | with IPsec tunnels. | |||
| For backwards compatibility, packets with ECT and CE both set to 1 in | For backwards compatibility, packets with the CE codepoint set in the | |||
| the outer header SHOULD be dropped if they arrive on an SA that is | outer header SHOULD be dropped if they arrive on an SA that is using | |||
| using the limited-functionality option, or that is using the full- | the limited-functionality option, or that is using the full- | |||
| functionality option (i.e., and has set the ECT flag in the outer | functionality option with the not-ECT codepoint set in the inner | |||
| header to 1) for a packet with the ECT flag set to 0 in the inner | ||||
| header. | header. | |||
| 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | 9.2.2. Changes to the ECN Field within an IPsec Tunnel. | |||
| If the ECN Field is changed inappropriately within an IPsec tunnel, | If the ECN Field is changed inappropriately within an IPsec tunnel, | |||
| and this change is detected at the tunnel egress, then the receipt of | and this change is detected at the tunnel egress, then the receipt of | |||
| a packet not satisfying the appropriate condition for its SA is an | a packet not satisfying the appropriate condition for its SA is an | |||
| auditable event. An implementation MAY create audit records with | auditable event. An implementation MAY create audit records with | |||
| per-SA counts of incorrect packets over some time period rather than | per-SA counts of incorrect packets over some time period rather than | |||
| creating an audit record for each erroneous packet. Any such audit | creating an audit record for each erroneous packet. Any such audit | |||
| record SHOULD contain the headers from at least one erroneous packet, | record SHOULD contain the headers from at least one erroneous packet, | |||
| but need not contain the headers from every packet represented by the | but need not contain the headers from every packet represented by the | |||
| entry. | entry. | |||
| 9.2.3. Comments for IPsec Support | 9.2.3. Comments for IPsec Support | |||
| Substantial comments were received on two areas of this document dur- | Substantial comments were received on two areas of this document | |||
| ing review by the IPsec working group. This section describes these | during review by the IPsec working group. This section describes | |||
| comments and explains why the proposed changes were not incorporated. | these comments and explains why the proposed changes were not | |||
| incorporated. | ||||
| The first comment indicated that per-node configuration is easier to | The first comment indicated that per-node configuration is easier to | |||
| implement than per-SA configuration. After serious thought and | implement than per-SA configuration. After serious thought and | |||
| despite some initial encouragement of per-node configuration, it no | despite some initial encouragement of per-node configuration, it no | |||
| longer seems to be a good idea. The concern is that as ECN-awareness | longer seems to be a good idea. The concern is that as ECN-awareness | |||
| is progressively deployed in IPsec, many ECN-aware IPsec implementa- | is progressively deployed in IPsec, many ECN-aware IPsec | |||
| tions will find themselves communicating with a mixture of ECN-aware | implementations will find themselves communicating with a mixture of | |||
| and ECN-unaware IPsec tunnel endpoints. In such an environment with | ECN-aware and ECN-unaware IPsec tunnel endpoints. In such an | |||
| per-node configuration, the only reasonable thing to do is forbid ECN | environment with per-node configuration, the only reasonable thing to | |||
| usage for all IPsec tunnels, which is not the desired outcome. | do is forbid ECN usage for all IPsec tunnels, which is not the | |||
| desired outcome. | ||||
| In the second area, several reviewers noted that SA negotiation is | In the second area, several reviewers noted that SA negotiation is | |||
| complex, and adding to it is non-trivial. One reviewer suggested | complex, and adding to it is non-trivial. One reviewer suggested | |||
| using ICMP after tunnel setup as a possible alternative. The addi- | using ICMP after tunnel setup as a possible alternative. The | |||
| tion to SA negotiation in this document is OPTIONAL and will remain | addition to SA negotiation in this document is OPTIONAL and will | |||
| so; implementers are free to ignore it. The authors believe that the | remain so; implementers are free to ignore it. The authors believe | |||
| assurance it provides can be useful in a number of situations. In | that the assurance it provides can be useful in a number of | |||
| practice, if this is not implemented, it can be deleted at a subse- | situations. In practice, if this is not implemented, it can be | |||
| quent stage in the standards process. Extending ICMP to negotiate | deleted at a subsequent stage in the standards process. Extending | |||
| ECN after tunnel setup is more complex than extending SA attribute | ICMP to negotiate ECN after tunnel setup is more complex than | |||
| negotiation. Some tunnels do not permit traffic to be addressed to | extending SA attribute negotiation. Some tunnels do not permit | |||
| the tunnel egress endpoint, hence the ICMP packet would have to be | traffic to be addressed to the tunnel egress endpoint, hence the ICMP | |||
| addressed to somewhere else, scanned for by the egress endpoint, and | packet would have to be addressed to somewhere else, scanned for by | |||
| discarded there or at its actual destination. In addition, ICMP | the egress endpoint, and discarded there or at its actual | |||
| delivery is unreliable, and hence there is a possibility of an ICMP | destination. In addition, ICMP delivery is unreliable, and hence | |||
| packet being dropped, entailing the invention of yet another | there is a possibility of an ICMP packet being dropped, entailing the | |||
| ack/retransmit mechanism. It seems better simply to specify an | invention of yet another ack/retransmit mechanism. It seems better | |||
| OPTIONAL extension to the existing SA negotiation mechanism. | simply to specify an OPTIONAL extension to the existing SA | |||
| negotiation mechanism. | ||||
| 9.3. IP packets encapsulated in non-IP packet headers. | 9.3. IP packets encapsulated in non-IP packet headers. | |||
| A different set of issues is raised, relative to ECN, when IP | A different set of issues is raised, relative to ECN, when IP | |||
| packets are encapsulated in tunnels with non-IP packet headers. This | packets are encapsulated in tunnels with non-IP packet headers. This | |||
| occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. | occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP]. | |||
| For these protocols, there is no conflict with ECN; it is just that | For these protocols, there is no conflict with ECN; it is just that | |||
| ECN cannot be used within the tunnel unless an ECN codepoint can be | ECN cannot be used within the tunnel unless an ECN codepoint can be | |||
| specified for the header of the encapsulating protocol. Earlier work | specified for the header of the encapsulating protocol. Earlier work | |||
| considered a preliminary proposal for incorporating ECN into MPLS, | considered a preliminary proposal for incorporating ECN into MPLS, | |||
| and proposals for incorporating ECN into GRE, L2TP, or PPTP will be | and proposals for incorporating ECN into GRE, L2TP, or PPTP will be | |||
| considered as the need arises. | considered as the need arises. | |||
| 10. Issues Raised by Monitoring and Policing Devices | 10. Issues Raised by Monitoring and Policing Devices | |||
| One possibility is that monitoring and policing devices (or more | One possibility is that monitoring and policing devices (or more | |||
| informally, "penalty boxes") will be installed in the network to mon- | informally, "penalty boxes") will be installed in the network to | |||
| itor whether best-effort flows are appropriately responding to con- | monitor whether best-effort flows are appropriately responding to | |||
| gestion, and to preferentially drop packets from flows determined not | congestion, and to preferentially drop packets from flows determined | |||
| to be using adequate end-to-end congestion control procedures. | not to be using adequate end-to-end congestion control procedures. | |||
| We recommend that any "penalty box" that detects a flow or an aggre- | We recommend that any "penalty box" that detects a flow or an | |||
| gate of flows that is not responding to end-to-end congestion control | aggregate of flows that is not responding to end-to-end congestion | |||
| first change from marking to dropping packets from that flow, before | control first change from marking to dropping packets from that flow, | |||
| taking any additional action to restrict the bandwidth available to | before taking any additional action to restrict the bandwidth | |||
| that flow. Thus, initially, the router may drop packets in which the | available to that flow. Thus, initially, the router may drop packets | |||
| router would otherwise have set the CE bit. This could include | in which the router would otherwise have set the CE codepoint. | |||
| dropping those arriving packets for that flow that are ECN-Capable | This could include dropping those arriving packets for that flow that | |||
| and that already have the CE bit set. In this way, any congestion | are ECN-Capable and that already have the CE codepoint set. In this | |||
| indications seen by that router for that flow will be guaranteed to | way, any congestion indications seen by that router for that flow | |||
| also be seen by the end nodes, even in the presence of malicious or | will be guaranteed to also be seen by the end nodes, even in the | |||
| broken routers elsewhere in the path. If we assume that the first | presence of malicious or broken routers elsewhere in the path. If we | |||
| action taken at any "penalty box" for an ECN-capable flow will be to | assume that the first action taken at any "penalty box" for an ECN- | |||
| drop packets instead of marking them, then there is no way that an | capable flow will be to drop packets instead of marking them, then | |||
| adversary that subverts ECN-based end-to-end congestion control can | there is no way that an adversary that subverts ECN-based end-to-end | |||
| cause a flow to be characterized as being non-cooperative and placed | congestion control can cause a flow to be characterized as being non- | |||
| into a more severe action within the "penalty box". | cooperative and placed into a more severe action within the "penalty | |||
| box". | ||||
| The monitoring and policing devices that are actually deployed could | The monitoring and policing devices that are actually deployed could | |||
| fall short of the `ideal' monitoring device described above, in that | fall short of the `ideal' monitoring device described above, in that | |||
| the monitoring is applied not to a single flow, but to an aggregate | the monitoring is applied not to a single flow, but to an aggregate | |||
| of flows (e.g., those sharing a single IPsec tunnel). In this case, | of flows (e.g., those sharing a single IPsec tunnel). In this case, | |||
| the switch from marking to dropping would apply to all of the flows | the switch from marking to dropping would apply to all of the flows | |||
| in that aggregate, denying the benefits of ECN to the other flows in | in that aggregate, denying the benefits of ECN to the other flows in | |||
| the aggregate also. At the highest level of aggregation, another | the aggregate also. At the highest level of aggregation, another | |||
| form of the disabling of ECN happens even in the absence of monitor- | form of the disabling of ECN happens even in the absence of | |||
| ing and policing devices, when ECN-Capable RED queues switch from | monitoring and policing devices, when ECN-Capable RED queues switch | |||
| marking to dropping packets as an indication of congestion when the | from marking to dropping packets as an indication of congestion when | |||
| average queue size has exceeded some threshold. | the average queue size has exceeded some threshold. | |||
| If there were serious operational problems with routers inappropri- | ||||
| ately erasing the CE bit in packet headers, this could be addressed | ||||
| to some extent by including a one-bit ECN nonce in packet headers. | ||||
| Routers would erase the nonce when they set the CE bit [SCWA99]. | ||||
| Routers that erased the CE bit would face additional difficulty in | ||||
| reconstructing the original nonce, and thus repeated erasure of the | ||||
| CE bit would be more likely to be detected by the end-nodes. (This | ||||
| could in fact be done without adding any extra bits for ECN in the IP | ||||
| header, by using the ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1) | ||||
| as the two values for the nonce, and by defining the codepoint | ||||
| (ECT=0, CE=1) to mean exactly the same as the codepoint (ECT=1, | ||||
| CE=0).) However, at this point the potential danger of misbehaving | ||||
| routers does not seem of sufficient concern to warrant this addi- | ||||
| tional complication of adding an ECN nonce to protect against the | ||||
| erasure of the CE bit. Additional research is also needed to better | ||||
| understand the value of such a nonce and appropriate means of gener- | ||||
| ating sequences of nonce values that an adversary will find suffi- | ||||
| ciently difficult to reconstruct. | ||||
| An ECN nonce would also address the problem of misbehaving transport | ||||
| receivers lying to the transport sender about whether or not the CE | ||||
| bit was set in a packet. However, another possibility is for the | ||||
| data sender to test for a misbehaving receiver directly, by occasion- | ||||
| ally sending a data packet with ECT and CE set, to see if the | ||||
| receiver reports receiving the CE bit. Of course, if these packets | ||||
| encountered congestion in the network, the router would make no | ||||
| change in the packets, because the CE bit would already be set. | ||||
| Thus, for packets sent with the ECT and CE bits set, the TCP end- | ||||
| nodes could not determine if some router intended to set the CE bit | ||||
| in these packets. For this reason, sending packets with the ECT and | ||||
| CE bits would have to be done very sparingly. In addition, the TCP | ||||
| sender would have to remember which packets were sent with the ECT | ||||
| and CE bits set, so that it doesn't react to them as if there were | ||||
| congestion in the network. We believe that further research is | ||||
| needed on possible transport-based mechanisms for verifying that the | ||||
| transport receiver does not lie to the transport sender about the | ||||
| receipt of congestion indications. | ||||
| 11. Evaluations of ECN | 11. Evaluations of ECN | |||
| 11.1. Related Work Evaluating ECN | ||||
| This section discusses some of the related work evaluating the use of | This section discusses some of the related work evaluating the use of | |||
| ECN. The ECN Web Page [ECN] has pointers to other papers, as well as | ECN. The ECN Web Page [ECN] has pointers to other papers, as well as | |||
| to implementations of ECN. | to implementations of ECN. | |||
| [Floyd94] considers the advantages and drawbacks of adding ECN to the | [Floyd94] considers the advantages and drawbacks of adding ECN to the | |||
| TCP/IP architecture. As shown in the simulation-based comparisons, | TCP/IP architecture. As shown in the simulation-based comparisons, | |||
| one advantage of ECN is to avoid unnecessary packet drops for short | one advantage of ECN is to avoid unnecessary packet drops for short | |||
| or delay-sensitive TCP connections. A second advantage of ECN is in | or delay-sensitive TCP connections. A second advantage of ECN is in | |||
| avoiding some unnecessary retransmit timeouts in TCP. This paper | avoiding some unnecessary retransmit timeouts in TCP. This paper | |||
| discusses in detail the integration of ECN into TCP's congestion con- | discusses in detail the integration of ECN into TCP's congestion | |||
| trol mechanisms. The possible disadvantages of ECN discussed in the | control mechanisms. The possible disadvantages of ECN discussed in | |||
| paper are that a non-compliant TCP connection could falsely advertise | the paper are that a non-compliant TCP connection could falsely | |||
| itself as ECN-capable, and that a TCP ACK packet carrying an ECN-Echo | advertise itself as ECN-capable, and that a TCP ACK packet carrying | |||
| message could itself be dropped in the network. The first of these | an ECN-Echo message could itself be dropped in the network. The | |||
| two issues is discussed in the appendix of this document, and the | first of these two issues is discussed in the appendix of this | |||
| second is addressed by the addition of the CWR flag in the TCP | document, and the second is addressed by the addition of the CWR flag | |||
| header. | in the TCP header. | |||
| Experimental evaluations of ECN include [RFC2884,K98]. The conclu- | Experimental evaluations of ECN include [RFC2884,K98]. The | |||
| sions of [K98] and [RFC2884] are that ECN TCP gets moderately better | conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately | |||
| throughput than non-ECN TCP; that ECN TCP flows are fair towards non- | better throughput than non-ECN TCP; that ECN TCP flows are fair | |||
| ECN TCP flows; and that ECN TCP is robust with two-way traffic (with | towards non-ECN TCP flows; and that ECN TCP is robust with two-way | |||
| congestion in both directions) and with multiple congested gateways. | traffic (with congestion in both directions) and with multiple | |||
| Experiments with many short web transfers show that, while most of | congested gateways. Experiments with many short web transfers show | |||
| the short connections have similar transfer times with or without | that, while most of the short connections have similar transfer times | |||
| ECN, a small percentage of the short connections have very long | with or without ECN, a small percentage of the short connections have | |||
| transfer times for the non-ECN experiments as compared to the ECN | very long transfer times for the non-ECN experiments as compared to | |||
| experiments. | the ECN experiments. | |||
| 12. Summary of changes required in IP and TCP | 11.2. A Discussion of the ECN nonce. | |||
| This document specified two bits in the IP header, the ECN-Capable | The use of two ECT codepoints, ECT(0) and ECT(1), can provide a one- | |||
| Transport (ECT) bit and the Congestion Experienced (CE) bit, to be | bit ECN nonce in packet headers [SCWA99]. The primary motivation for | |||
| used for ECN. The ECT bit set to "0" indicates that the transport | this is the desire to allow mechanisms for the data sender to verify | |||
| protocol will ignore the CE bit. This is the default value for the | that network elements are not erasing the CE codepoint, and that data | |||
| ECT bit. The ECT bit set to "1" indicates that the transport proto- | receivers are properly reporting to the sender the receipt of packets | |||
| col is willing and able to participate in ECN. | with the CE codepoint set, as required by the transport protocol. | |||
| This section discusses issues of backwards compatibility with IP ECN | ||||
| implementations in routers conformant with RFC 2481, in which only | ||||
| one ECT codepoint was defined. We do not believe that the | ||||
| incremental deployment of ECN implementations that understand the | ||||
| ECT(1) codepoint will cause significant operational problems. This | ||||
| is particularly likely to be the case when the deployment of the | ||||
| ECT(1) codepoint begins with routers, before the ECT(1) codepoint | ||||
| starts to be used by end-nodes. | ||||
| The default value for the CE bit is "0". The router sets the CE bit | 11.2.1. The Incremental Deployment of ECT(1) in Routers. | |||
| to "1" to indicate congestion to the end nodes. The CE bit in a | ||||
| packet header MUST NOT be reset by a router from "1" to "0". | ||||
| When viewed in terms of code points, this document has defined three | ECN has been an Experimental standard since January 1999, and there | |||
| code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but | are already implementations of ECN in routers that do not understand | |||
| not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1). The code | the ECT(1) codepoint. When the use of the ECT(1) codepoint is | |||
| point of (ECT=0, CE=1) is not defined in this document. One possi- | standardized for TCP or for other transport protocols, this could | |||
| bility would be for this code point to be used, some time in the | mean that a data sender is using the ECT(1) codepoint, but that this | |||
| future, for some other function for non-ECN-capable packets. A sec- | codepoint is not understood by a congested router on the path. | |||
| ond possibility would be for this code point to be used as an ECN | ||||
| nonce, as described earlier in the document. A third possibility | If allowed by the transport protocol, a data sender would be free not | |||
| would be for the code point (ECT=0, CE=1) to be used to indicate that | to make use of ECT(1) at all, and to send all ECN-capable packets | |||
| the packet is ECN-capable for an alternate semantics for the Conges- | with the codepoint ECT(0). However, if an ECN-capable sender is | |||
| tion Experienced indication. However, at this time the code point | using ECT(1), and the congested router on the path did not understand | |||
| (ECT=0, CE=1) remains undefined. | the ECT(1) codepoint, then the router would end up marking some of | |||
| the ECT(0) packets, and dropping some of the ECT(1) packets, as | ||||
| indications of congestion. Since TCP is required to react to both | ||||
| marked and dropped packets, this behavior of dropping packets that | ||||
| could have been marked poses no significant threat to the network, | ||||
| and is consistent with the overall approach to ECN that allows | ||||
| routers to determine when and whether to mark packets as they see fit | ||||
| (see Section 5). | ||||
| 12. Summary of changes required in IP and TCP | ||||
| This document specified two bits in the IP header to be used for ECN. | ||||
| The not-ECT codepoint indicates that the transport protocol will | ||||
| ignore the CE codepoint. This is the default value for the ECN | ||||
| codepoint. The ECT codepoints indicate that the transport protocol | ||||
| is willing and able to participate in ECN. | ||||
| The router sets the CE codepoint to indicate congestion to the end | ||||
| nodes. The CE codepoint in a packet header MUST NOT be reset by a | ||||
| router. | ||||
| TCP requires three changes for ECN: a setup phase and two new flags | TCP requires three changes for ECN: a setup phase and two new flags | |||
| in the TCP header. The ECN-Echo flag is used by the data receiver to | in the TCP header. The ECN-Echo flag is used by the data receiver to | |||
| inform the data sender of a received CE packet. The Congestion Win- | inform the data sender of a received CE packet. The Congestion | |||
| dow Reduced (CWR) flag is used by the data sender to inform the data | Window Reduced (CWR) flag is used by the data sender to inform the | |||
| receiver that the congestion window has been reduced. | data receiver that the congestion window has been reduced. | |||
| When ECN (Explicit Congestion Notification [RFC2481]) is used, it is | When ECN (Explicit Congestion Notification [RFC2481]) is used, it is | |||
| required that congestion indications generated within an IP tunnel | required that congestion indications generated within an IP tunnel | |||
| not be lost at the tunnel egress. We specified a minor modification | not be lost at the tunnel egress. We specified a minor modification | |||
| to the IP protocol's handling of the ECN field during encapsulation | to the IP protocol's handling of the ECN field during encapsulation | |||
| and de-capsulation to allow flows that will undergo IP tunneling to | and de-capsulation to allow flows that will undergo IP tunneling to | |||
| use ECN. | use ECN. | |||
| Two options for ECN in tunnels were specified: | Two options for ECN in tunnels were specified: | |||
| 1) A limited-functionality option that does not use ECN inside the IP | 1) A limited-functionality option that does not use ECN inside the IP | |||
| tunnel, by turning the ECT bit in the outer header off, and not | tunnel, by setting the ECN field in the outer header to not-ECT, and | |||
| altering the inner header at the time of decapsulation. | not altering the inner header at the time of decapsulation. | |||
| 2) The full-functionality option, which copies the ECT bit of the | 2) The full-functionality option, which sets the ECN field in the | |||
| inner header to the encapsulating header. At decapsulation, if the | outer header to either not-ECT or to one of the ECT codepoints, | |||
| ECT bit is set in the inner header, the CE bit on the outer header is | depending on the ECN field in the inner header. At decapsulation, if | |||
| ORed with the CE bit of the inner header to update the CE bit of the | the CE codepoint is set in the outer header, and the inner header is | |||
| packet. | set to one of the ECT codepoints, then the CE codepoint is copied to | |||
| the inner header. | ||||
| All IP tunnels MUST implement one of the two alternative approaches | All IP tunnels MUST implement one of the two alternative approaches | |||
| described above. For IPsec tunnels, this document also defines an | described above. For IPsec tunnels, this document also defines an | |||
| optional IPsec Security Association (SA) attribute that enables | optional IPsec Security Association (SA) attribute that enables | |||
| negotiation of ECN usage within IPsec tunnels and an optional field | negotiation of ECN usage within IPsec tunnels and an optional field | |||
| in the Security Association Database to indicate whether ECN is per- | in the Security Association Database to indicate whether ECN is | |||
| mitted in tunnel mode on a SA. The required changes to IPsec tunnels | permitted in tunnel mode on a SA. The required changes to IPsec | |||
| for ECN usage modify RFC 2401 [RFC2401], which defines the IPsec | tunnels for ECN usage modify RFC 2401 [RFC2401], which defines the | |||
| architecture and specifies some aspects of its implementation. The | IPsec architecture and specifies some aspects of its implementation. | |||
| new IPsec SA attribute is in addition to those already defined in | The new IPsec SA attribute is in addition to those already defined in | |||
| Section 4.5 of [RFC2407]. | Section 4.5 of [RFC2407]. | |||
| This document is intended to obsolete RFC 2481, "A Proposal to add | This document is intended to obsolete RFC 2481, "A Proposal to add | |||
| Explicit Congestion Notification (ECN) to IP", which defined ECN as | Explicit Congestion Notification (ECN) to IP", which defined ECN as | |||
| an Experimental Protocol for the Internet Community. The rest of | an Experimental Protocol for the Internet Community. The rest of | |||
| this section describes the relationship between this document and its | this section describes the relationship between this document and its | |||
| predecessor. | predecessor. | |||
| RFC 2481 included a brief discussion of the use of ECN with encapsu- | RFC 2481 included a brief discussion of the use of ECN with | |||
| lated packets, and noted that for the IPsec specifications at the | encapsulated packets, and noted that for the IPsec specifications at | |||
| time (January 1999), flows could not safely use ECN if they were to | the time (January 1999), flows could not safely use ECN if they were | |||
| traverse IPsec tunnels. RFC 2481 also described the changes that | to traverse IPsec tunnels. RFC 2481 also described the changes that | |||
| could be made to IPsec tunnel specifications to make them compatible | could be made to IPsec tunnel specifications to make them compatible | |||
| with ECN. | with ECN. | |||
| This document also incorporates work that was done after RFC 2481. | This document also incorporates work that was done after RFC 2481. | |||
| First was to describe the changes to IPsec tunnels in detail, and | First was to describe the changes to IPsec tunnels in detail, and | |||
| extensively discuss the security implications of ECN (now included as | extensively discuss the security implications of ECN (now included as | |||
| Sections 18 and 19 of this document). Second was to extend the dis- | Sections 18 and 19 of this document). Second was to extend the | |||
| cussion of IPsec tunnels to include all IP tunnels. Because older IP | discussion of IPsec tunnels to include all IP tunnels. Because older | |||
| tunnels are not compatible with a flow's use of ECN, the deployment | IP tunnels are not compatible with a flow's use of ECN, the | |||
| of ECN in the Internet will create strong pressure for older IP tun- | deployment of ECN in the Internet will create strong pressure for | |||
| nels to be updated to an ECN-compatible version, using either the | older IP tunnels to be updated to an ECN-compatible version, using | |||
| limited-functionality or the full-functionality option. | either the limited-functionality or the full-functionality option. | |||
| This document does not address the issue of including ECN in non-IP | This document does not address the issue of including ECN in non-IP | |||
| tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary | tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary | |||
| document about adding ECN support to MPLS was not advanced. | document about adding ECN support to MPLS was not advanced. | |||
| A third new piece of work after RFC2481 was to describe the ECN pro- | A third new piece of work after RFC2481 was to describe the ECN | |||
| cedure with retransmitted data packets, that the ECT bit should not | procedure with retransmitted data packets, that an ECT codepoint | |||
| be set on retransmitted data packets. The motivation for this addi- | should not be set on retransmitted data packets. The motivation for | |||
| tional specification is to eliminate a possible avenue for denial-of- | this additional specification is to eliminate a possible avenue for | |||
| service attacks on an existing TCP connection. Some prior deploy- | denial-of-service attacks on an existing TCP connection. Some prior | |||
| ments of ECN-capable TCP might not conform to the (new) requirement | deployments of ECN-capable TCP might not conform to the (new) | |||
| not to set the ECT bit on retransmitted packets; we do not believe | requirement not to set an ECT codepoint on retransmitted packets; we | |||
| this will cause significant problems in practice. | do not believe this will cause significant problems in practice. | |||
| This document also expands slightly on the specification of the use | This document also expands slightly on the specification of the use | |||
| of SYN packets for the negotiation of ECN. While some prior deploy- | of SYN packets for the negotiation of ECN. While some prior | |||
| ments of ECN-capable TCP might not conform to the requirements speci- | deployments of ECN-capable TCP might not conform to the requirements | |||
| fied in this document, we do not believe that this will lead to any | specified in this document, we do not believe that this will lead to | |||
| performance or compatibility problems for TCP connections with a com- | any performance or compatibility problems for TCP connections with a | |||
| bination of TCP implementations at the endpoints. | combination of TCP implementations at the endpoints. | |||
| This document also includes the specification of the ECT(1) | ||||
| codepoint, which may be used by TCP as part of the implementation of | ||||
| an ECN nonce. | ||||
| 13. Conclusions | 13. Conclusions | |||
| Given the current effort to implement AQM, we believe this is the | Given the current effort to implement AQM, we believe this is the | |||
| right time to deploy congestion avoidance mechanisms that do not | right time to deploy congestion avoidance mechanisms that do not | |||
| depend on packet drops alone. With the increased deployment of | depend on packet drops alone. With the increased deployment of | |||
| applications and transports sensitive to the delay and loss of a sin- | applications and transports sensitive to the delay and loss of a | |||
| gle packet (e.g., realtime traffic, short web transfers), depending | single packet (e.g., realtime traffic, short web transfers), | |||
| on packet loss as a normal congestion notification mechanism appears | depending on packet loss as a normal congestion notification | |||
| to be insufficient (or at the very least, non-optimal). | mechanism appears to be insufficient (or at the very least, non- | |||
| optimal). | ||||
| We examined the consequence of modifications of the ECN field within | We examined the consequence of modifications of the ECN field within | |||
| the network, analyzing all the opportunities for an adversary to | the network, analyzing all the opportunities for an adversary to | |||
| change the ECN field. In many cases, the change to the ECN field is | change the ECN field. In many cases, the change to the ECN field is | |||
| no worse than dropping a packet. However, we noted that some changes | no worse than dropping a packet. However, we noted that some changes | |||
| have the more serious consequence of subverting end-to-end congestion | have the more serious consequence of subverting end-to-end congestion | |||
| control. However, we point out that even then the potential damage | control. However, we point out that even then the potential damage | |||
| is limited, and is similar to the threat posed by end-systems inten- | is limited, and is similar to the threat posed by end-systems | |||
| tionally failing to cooperate with end-to-end congestion control. | intentionally failing to cooperate with end-to-end congestion | |||
| control. | ||||
| 14. Acknowledgements | 14. Acknowledgements | |||
| Many people have made contributions to this work and this document, | Many people have made contributions to this work and this document, | |||
| including many that we have not managed to directly acknowledge in | including many that we have not managed to directly acknowledge in | |||
| this document. In addition, we would like to thank Kenjiro Cho for | this document. In addition, we would like to thank Kenjiro Cho for | |||
| the proposal for the TCP mechanism for negotiating ECN-Capability, | the proposal for the TCP mechanism for negotiating ECN-Capability, | |||
| Kevin Fall for the proposal of the CWR bit, Steve Blake for material | Kevin Fall for the proposal of the CWR bit, Steve Blake for material | |||
| on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus- | on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for | |||
| sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter, | discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian | |||
| Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis- | Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern | |||
| cussions of security issues. We also thank the Internet End-to-End | Paxson for discussions of security issues. We also thank the | |||
| Research Group for ongoing discussions of these issues. | Internet End-to-End Research Group for ongoing discussions of these | |||
| issues. | ||||
| Email discussions with a number of people, including Alexey | Email discussions with a number of people, including Alexey | |||
| Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed | Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed | |||
| the issues raised by non-conformant equipment in the Internet that | the issues raised by non-conformant equipment in the Internet that | |||
| does not respond to TCP SYN packets with the ECE and CWR flags set. | does not respond to TCP SYN packets with the ECE and CWR flags set. | |||
| We thank Mark Handley, Jitentra Padhye, and others for discussions on | We thank Mark Handley, Jitentra Padhye, and others for discussions on | |||
| the TCP initialization procedures. | the TCP initialization procedures. | |||
| The discussion of ECN and IP tunnel considerations draws heavily on | The discussion of ECN and IP tunnel considerations draws heavily on | |||
| related discussions and documents from the Differentiated Services | related discussions and documents from the Differentiated Services | |||
| Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh, | |||
| for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen | for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen | |||
| for proposing modifications to RFC 2407 that improve the usability of | for proposing modifications to RFC 2407 that improve the usability of | |||
| negotiating the ECN Tunnel SA attribute. | negotiating the ECN Tunnel SA attribute. | |||
| We thank David Wetherall, David Ely, and Neil Spring for the proposal | ||||
| for the ECN nonce. We also thank Stefan Savage for discussions on | ||||
| this issue. We thank Bob Briscoe and Jon Crowcroft for raising the | ||||
| issue of fragmentation in IP, on alternate semantics for the fourth | ||||
| ECN codepoint, and several other topics. We thank Richard Wendland | ||||
| for feedback on several issues in the draft. | ||||
| 15. References | 15. References | |||
| [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, | |||
| November 1998. | November 1998. | |||
| [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement | [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement | |||
| Levels", BCP 14, RFC 2119, March 1997. | Levels", BCP 14, RFC 2119, March 1997. | |||
| [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html". | |||
| Reference for informational purposes only. | Reference for informational purposes only. | |||
| skipping to change at page 38, line 30 ¶ | skipping to change at page 40, line 44 ¶ | |||
| for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 | |||
| N.4, August 1993, p. 397-413. | N.4, August 1993, p. 397-413. | |||
| [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM | |||
| Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. | |||
| [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", | |||
| URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- | URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- | |||
| ecn. Reference for informational purposes only. | ecn. Reference for informational purposes only. | |||
| [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con- | [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End | |||
| gestion Control in the Internet", IEEE/ACM Transactions on Network- | Congestion Control in the Internet", IEEE/ACM Transactions on | |||
| ing, August 1999. | Networking, August 1999. | |||
| [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection", | |||
| SIGCOMM '97, September 1997. | SIGCOMM '97, September 1997. | |||
| [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing | |||
| Encapsulation (GRE), RFC 1701, October 1994. | Encapsulation (GRE), RFC 1701, October 1994. | |||
| [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. | [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. | |||
| ACM SIGCOMM '88, pp. 314-329. | ACM SIGCOMM '88, pp. 314-329. | |||
| [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance Algo- | [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance | |||
| rithm", Message to end2end-interest mailing list, April 1990. URL | Algorithm", Message to end2end-interest mailing list, April 1990. URL | |||
| "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". | "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". | |||
| [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) | [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) | |||
| benefits for TCP", Master's thesis, UCLA, 1998, URL | benefits for TCP", Master's thesis, UCLA, 1998, URL | |||
| "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz". | "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz". | |||
| [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. | [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. | |||
| Palter Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. | Palter Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999. | |||
| [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver- driven | [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven | |||
| Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. | Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. | |||
| [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, | [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus, | |||
| Requirements for Traffic Engineering Over MPLS, RFC 2702, September | Requirements for Traffic Engineering Over MPLS, RFC 2702, September | |||
| 1999. | 1999. | |||
| [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. | [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W. | |||
| and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, | and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637, | |||
| July 1999. | July 1999. | |||
| skipping to change at page 39, line 40 ¶ | skipping to change at page 42, line 5 ¶ | |||
| [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic | [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic | |||
| Routing Encapsulation (GRE), RFC 1701, October 1994. | Routing Encapsulation (GRE), RFC 1701, October 1994. | |||
| [RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic | [RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic | |||
| Routing Encapsulation over IPv4 networks, RFC 1702, October 1994. | Routing Encapsulation over IPv4 networks, RFC 1702, October 1994. | |||
| [RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October | [RFC2003] Perkins, C., IP Encapsulation within IP, RFC 2003, October | |||
| 1996. | 1996. | |||
| [RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Require- | [RFC 2119] S. Bradner, Key words for use in RFCs to Indicate | |||
| ment Levels, RFC 2119, March 1997. | Requirement Levels, RFC 2119, March 1997. | |||
| [RFC2309] Braden, B., et al., "Recommendations on Queue Management | [RFC2309] Braden, B., et al., "Recommendations on Queue Management | |||
| and Congestion Avoidance in the Internet", RFC 2309, April 1998. | and Congestion Avoidance in the Internet", RFC 2309, April 1998. | |||
| [RFC2401] S. Kent and R. Atkinson, Security Architecture for the | [RFC2401] S. Kent and R. Atkinson, Security Architecture for the | |||
| Internet Protocol, RFC 2401, November 1998. | Internet Protocol, RFC 2401, November 1998. | |||
| [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation | [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation | |||
| for ISAKMP, RFC 2407, November 1998. | for ISAKMP, RFC 2407, November 1998. | |||
| skipping to change at page 40, line 15 ¶ | skipping to change at page 42, line 29 ¶ | |||
| RFC 2409, November 1998. | RFC 2409, November 1998. | |||
| [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), | [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE), | |||
| RFC 2409, November 1998. | RFC 2409, November 1998. | |||
| [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition | [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition | |||
| of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 | of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 | |||
| Headers", RFC 2474, December 1998. | Headers", RFC 2474, December 1998. | |||
| [RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. | [RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. | |||
| Weiss, An Architecture for Differentiated Services, RFC 2475, Decem- | Weiss, An Architecture for Differentiated Services, RFC 2475, | |||
| ber 1998. | December 1998. | |||
| [RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit | [RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit | |||
| Congestion Notification (ECN) to IP, RFC 2481, January 1999. | Congestion Notification (ECN) to IP, RFC 2481, January 1999. | |||
| [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", | [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", | |||
| RFC 2581, April 1999. | RFC 2581, April 1999. | |||
| [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation | [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation | |||
| of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, | of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884, | |||
| July 2000. | July 2000. | |||
| [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983, | [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC2983, | |||
| October 2000. | October 2000. | |||
| [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For | [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For | |||
| Values In the Internet Protocol and Related Headers", RFC 2780, March | Values In the Internet Protocol and Related Headers", RFC 2780, March | |||
| 2000. | 2000. | |||
| [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for | |||
| Congestion Avoidance in Computer Networks", ACM Transactions on Com- | Congestion Avoidance in Computer Networks", ACM Transactions on | |||
| puter Systems, Vol.8, No.2, pp. 158-181, May 1990. | Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. | |||
| [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom | |||
| Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM | |||
| Computer Communications Review, October 1999. | Computer Communications Review, October 1999. | |||
| 16. Security Considerations | 16. Security Considerations | |||
| Security considerations have been discussed in Sections 7, 8, 18, and | Security considerations have been discussed in Sections 7, 8, 18, and | |||
| 19. | 19. | |||
| skipping to change at page 41, line 11 ¶ | skipping to change at page 43, line 25 ¶ | |||
| IPv4 header checksum recalculation is an issue with some high-end | IPv4 header checksum recalculation is an issue with some high-end | |||
| router architectures using an output-buffered switch, since most if | router architectures using an output-buffered switch, since most if | |||
| not all of the header manipulation is performed on the input side of | not all of the header manipulation is performed on the input side of | |||
| the switch, while the ECN decision would need to be made local to the | the switch, while the ECN decision would need to be made local to the | |||
| output buffer. This is not an issue for IPv6, since there is no IPv6 | output buffer. This is not an issue for IPv6, since there is no IPv6 | |||
| header checksum. The IPv4 TOS octet is the last byte of a 16-bit | header checksum. The IPv4 TOS octet is the last byte of a 16-bit | |||
| half-word. | half-word. | |||
| RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 | RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 | |||
| checksum after the TTL field is decremented. The incremental updat- | checksum after the TTL field is decremented. The incremental | |||
| ing of the IPv4 checksum after the CE bit was set would work as fol- | updating of the IPv4 checksum after the CE codepoint was set would | |||
| lows: Let HC be the original header checksum, and let HC' be the new | work as follows: Let HC be the original header checksum for an ECT(0) | |||
| header checksum after the CE bit has been set. Then for header | packet, and let HC' be the new header checksum after the CE codepoint | |||
| checksums calculated with one's complement subtraction, HC' would be | has been set. That is, the ECN field has changed from '10' to '11'. | |||
| recalculated as follows: | Then for header checksums calculated with one's complement | |||
| subtraction, HC' would be recalculated as follows: | ||||
| HC' = { HC - 1, if HC > 1 | HC' = { HC - 1, if HC > 1 | |||
| { 0x0000, if HC = 1 | { 0x0000, if HC = 1 | |||
| For header checksums calculated on two's complement machines, HC' would | For header checksums calculated on two's complement machines, HC' would | |||
| be recalculated as follows after the CE bit was set: | be recalculated as follows after the CE bit was set: | |||
| HC' = { HC - 1, if HC > 0 | HC' = { HC - 1, if HC > 0 | |||
| { 0xFFFE, if HC = 0 | { 0xFFFE, if HC = 0 | |||
| A similar incremental updating of the IPv4 checksum can be carried out | ||||
| when the ECN field is changed from ECT(1) to CE, that is, from '01' to | ||||
| '11'. | ||||
| 18. Possible Changes to the ECN Field in the Network | 18. Possible Changes to the ECN Field in the Network | |||
| This section discusses in detail possible changes to the ECN field in | This section discusses in detail possible changes to the ECN field in | |||
| the network, such as falsely reporting congestion, disabling ECN- | the network, such as falsely reporting congestion, disabling ECN- | |||
| Capability for an individual packet, erasing the ECN congestion indi- | Capability for an individual packet, erasing the ECN congestion | |||
| cation, or falsely indicating ECN-Capability. We represent the ECN | indication, or falsely indicating ECN-Capability. | |||
| bits in the IP header by the tuple (ECT bit, CE bit). | ||||
| 18.1. Possible Changes to the IP Header | 18.1. Possible Changes to the IP Header | |||
| 18.1.1. Erasing the Congestion Indication | 18.1.1. Erasing the Congestion Indication | |||
| First, we consider the changes that a router could make that would | First, we consider the changes that a router could make that would | |||
| result in effectively erasing the congestion indication after it had | result in effectively erasing the congestion indication after it had | |||
| been set by a router upstream. The convention followed is: | been set by a router upstream. The convention followed is: | |||
| (ECT, CE) of received packet -> (ECT, CE) of packet transmitted. | ECN codepoint of received packet -> ECN codepoint of packet | |||
| transmitted. | ||||
| (1, 1) -> (1, 0): erase only the CE bit that was set. | Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint | |||
| (1, 1) -> (0, 0): erase both the ECT bit and the CE bit. | effectively erases the congestion indication. However, with the use | |||
| (1, 1) -> (0, 1): erase the ECT bit | of two ECT codepoints, a router erasing the CE codepoint has no way | |||
| to know whether the original ECT codepoint was ECT(0) or ECT(1). | ||||
| Thus, it is possible for the transport protocol to deploy mechanisms | ||||
| to detect such erasures of the CE codepoint. | ||||
| The first change turns off the CE bit after it has been set by some | The consequence of the erasure of the CE codepoint for the upstream | |||
| upstream router along the path. The consequence for the upstream | ||||
| router is that there is a potential for congestion to build for a | router is that there is a potential for congestion to build for a | |||
| time, because the congestion indication does not reach the source. | time, because the congestion indication does not reach the source. | |||
| However, the packet would be received and acknowledged. | However, the packet would be received and acknowledged. | |||
| The potential effect of erasing the congestion indication is complex, | The potential effect of erasing the congestion indication is complex, | |||
| and is discussed in depth in Section 19 below. Note that the effect | and is discussed in depth in Section 19 below. Note that the effect | |||
| of erasing the congestion indication is different from dropping a | of erasing the congestion indication is different from dropping a | |||
| packet in the network. When a data packet is dropped, the drop is | packet in the network. When a data packet is dropped, the drop is | |||
| detected by the TCP sender, and interpreted as an indication of con- | detected by the TCP sender, and interpreted as an indication of | |||
| gestion. Similarly, if a sufficient number of consecutive acknowl- | congestion. Similarly, if a sufficient number of consecutive | |||
| edgement packets are dropped, causing the cumulative acknowledgement | acknowledgement packets are dropped, causing the cumulative | |||
| field not to be advanced at the sender, the sender is limited by the | acknowledgement field not to be advanced at the sender, the sender is | |||
| congestion window from sending additional packets, and ultimately the | limited by the congestion window from sending additional packets, and | |||
| retransmit timer expires. | ultimately the retransmit timer expires. | |||
| In contrast, a systematic erasure of the CE bit by a downstream | In contrast, a systematic erasure of the CE bit by a downstream | |||
| router can have the effect of causing a queue buildup at an upstream | router can have the effect of causing a queue buildup at an upstream | |||
| router, including the possible loss of packets due to buffer over- | router, including the possible loss of packets due to buffer | |||
| flow. There is a potential of unfairness in that another flow that | overflow. There is a potential of unfairness in that another flow | |||
| goes through the congested router could react to the CE bit set while | that goes through the congested router could react to the CE bit set | |||
| the flow that has the CE bit erased could see better performance. | while the flow that has the CE bit erased could see better | |||
| The limitations on this potential unfairness are discussed in more | performance. The limitations on this potential unfairness are | |||
| detail in Section 19 below. | discussed in more detail in Section 19 below. | |||
| The second change is to turn off both the ECT and the CE bits, thus | ||||
| erasing the congestion indication and disabling ECN-Capability at the | ||||
| same time. The third change turns off only the ECT bit, disabling | ||||
| ECN-Capability. | ||||
| Within an IP tunnel using the full-functionality option, the third | The last of the three changes is to replace the CE codepoint with the | |||
| change would not erase the congestion indication, but would only dis- | not-ECT codepoint, thus erasing the congestion indication and | |||
| able ECN-Capability for that packet within the rest of the tunnel. | disabling ECN-Capability at the same time. | |||
| However, when performed outside of an IP tunnel, the third change | ||||
| would also effectively erase the congestion indication, because an | ||||
| ECN field of (0, 1) is undefined. | ||||
| The `erasure' of the congestion indication is only effective if the | The `erasure' of the congestion indication is only effective if the | |||
| packet does not end up being marked or dropped again by a downstream | packet does not end up being marked or dropped again by a downstream | |||
| router. With the first change, the packet remains ECN-Capable, and | router. If the CE codepoint is replaced by an ECT codepoint, the | |||
| could be either marked or dropped by a downstream router as an indi- | packet remains ECN-Capable, and could be either marked or dropped by | |||
| cation of congestion. With the second and third changes, the packet | a downstream router as an indication of congestion. If the CE | |||
| is no longer ECN-capable, and can therefore be dropped but not marked | codepoint is replaced by the not-ECT codepoint, the packet is no | |||
| by a downstream router as an indication of congestion. | longer ECN-capable, and can therefore be dropped but not marked by a | |||
| downstream router as an indication of congestion. | ||||
| 18.1.2. Falsely Reporting Congestion | 18.1.2. Falsely Reporting Congestion | |||
| (1, 0) -> (1, 1) | This change is to set the CE codepoint when an ECT codepoint was | |||
| already set, even though there was no congestion. This change does | ||||
| This change is to set the CE bit when the ECT bit was already set, | not affect the treatment of that packet along the rest of the path. | |||
| even though there was no congestion. This change does not affect the | In particular, a router does not examine the CE codepoint in deciding | |||
| treatment of that packet along the rest of the path. In particular, | whether to drop or mark an arriving packet. | |||
| a router does not examine the CE bit in deciding whether to drop or | ||||
| mark an arriving packet. | ||||
| However, this could result in the application unnecessarily invoking | However, this could result in the application unnecessarily invoking | |||
| end-to-end congestion control, and reducing its arrival rate. By | end-to-end congestion control, and reducing its arrival rate. By | |||
| itself, this is no worse (for the application or for the network) | itself, this is no worse (for the application or for the network) | |||
| than if the tampering router had actually dropped the packet. | than if the tampering router had actually dropped the packet. | |||
| 18.1.3. Disabling ECN-Capability | 18.1.3. Disabling ECN-Capability | |||
| (1, 0) -> (0, *) | This change is to turn off the ECT codepoint of a packet. This means | |||
| This change is to turn off the ECT bit of a packet that does not have | ||||
| the CE bit set. (Section 18.1.1 discussed the case of turning off | ||||
| the ECT bit of a packet that does have the CE bit set.) This means | ||||
| that if the packet later encounters congestion (e.g., by arriving to | that if the packet later encounters congestion (e.g., by arriving to | |||
| a RED queue with a moderate average queue size), it will be dropped | a RED queue with a moderate average queue size), it will be dropped | |||
| instead of being marked. By itself, this is no worse (for the appli- | instead of being marked. By itself, this is no worse (for the | |||
| cation) than if the tampering router had actually dropped the packet. | application) than if the tampering router had actually dropped the | |||
| The saving grace in this particular case is that there is no con- | packet. The saving grace in this particular case is that there is no | |||
| gested router upstream expecting a reaction from setting the CE bit. | congested router upstream expecting a reaction from setting the CE | |||
| bit. | ||||
| 18.1.4. Falsely Indicating ECN-Capability | 18.1.4. Falsely Indicating ECN-Capability | |||
| This change would incorrectly label a packet as ECN-Capable. The | This change would incorrectly label a packet as ECN-Capable. The | |||
| packet may have been sent either by an ECN-Capable transport or a | packet may have been sent either by an ECN-Capable transport or a | |||
| transport that is not ECN-Capable. | transport that is not ECN-Capable. | |||
| (0, *) -> (1, 0); | ||||
| (0, *) -> (1, 1); | ||||
| If the packet later encounters moderate congestion at an ECN-Capable | If the packet later encounters moderate congestion at an ECN-Capable | |||
| router, the router could set the CE bit instead of dropping the | router, the router could set the CE codepoint instead of dropping the | |||
| packet. If the transport protocol in fact is not ECN-Capable, then | packet. If the transport protocol in fact is not ECN-Capable, then | |||
| the transport will never receive this indication of congestion, and | the transport will never receive this indication of congestion, and | |||
| will not reduce its sending rate in response. The potential conse- | will not reduce its sending rate in response. The potential | |||
| quences of falsely indicating ECN-capability are discussed further in | consequences of falsely indicating ECN-capability are discussed | |||
| Section 19 below. | further in Section 19 below. | |||
| If the packet never later encounters congestion at an ECN-Capable | If the packet never later encounters congestion at an ECN-Capable | |||
| router, then the first of these two changes would have no effect. | router, then the first of these two changes would have no effect, | |||
| The second change, however, would have the effect of giving false | other than possibly interfering with the use of the ECN nonce by the | |||
| reports of congestion to a monitoring device along the path. If the | transport protocol. The last change, however, would have the effect | |||
| transport protocol is ECN-Capable, then the second of these two | of giving false reports of congestion to a monitoring device along | |||
| changes (when, for example, (0,0) was changed to (1,1)) could also | the path. If the transport protocol is ECN-Capable, then this change | |||
| have an effect at the transport level, by combining falsely indicat- | could also have an effect at the transport level, by combining | |||
| ing ECN-Capability with falsely reporting congestion. For an ECN- | falsely indicating ECN-Capability with falsely reporting congestion. | |||
| capable transport, this would cause the transport to unnecessarily | For an ECN-capable transport, this would cause the transport to | |||
| react to congestion. In this particular case, the router that is | unnecessarily react to congestion. In this particular case, the | |||
| incorrectly changing the ECN field could have dropped the packet. | router that is incorrectly changing the ECN field could have dropped | |||
| Thus for this case of an ECN-capable transport, the consequence of | the packet. Thus for this case of an ECN-capable transport, the | |||
| this change to the ECN field is no worse than dropping the packet. | consequence of this change to the ECN field is no worse than dropping | |||
| the packet. | ||||
| 18.1.5. Changes with No Functional Effect | ||||
| (0, *) -> (0, *) | ||||
| The CE bit is ignored in a packet that does not have the ECT bit set. | ||||
| Thus, this change would have no effect, in terms of ECN. | ||||
| 18.2. Information carried in the Transport Header | 18.2. Information carried in the Transport Header | |||
| For TCP, an ECN-capable TCP receiver informs its TCP peer that it is | For TCP, an ECN-capable TCP receiver informs its TCP peer that it is | |||
| ECN-capable at the TCP level, conveying this information in the TCP | ECN-capable at the TCP level, conveying this information in the TCP | |||
| header at the time the connection is set up. This document does not | header at the time the connection is set up. This document does not | |||
| consider potential dangers introduced by changes in the transport | consider potential dangers introduced by changes in the transport | |||
| header within the network. In the case of IPsec tunnels, the IPsec | header within the network. In the case of IPsec tunnels, the IPsec | |||
| tunnel protects the transport header. | tunnel protects the transport header. | |||
| Another issue concerns TCP packets with a spoofed IP source address | Another issue concerns TCP packets with a spoofed IP source address | |||
| carrying invalid ECN information in the transport header. For com- | carrying invalid ECN information in the transport header. For | |||
| pleteness, we examine here some possible ways that a node spoofing | completeness, we examine here some possible ways that a node spoofing | |||
| the IP source address of another node could use the two ECN flags in | the IP source address of another node could use the two ECN flags in | |||
| the TCP header to launch a denial-of-service attack. However, these | the TCP header to launch a denial-of-service attack. However, these | |||
| attacks would require an ability for the attacker to use valid TCP | attacks would require an ability for the attacker to use valid TCP | |||
| sequence numbers, and any attacker with this ability and with the | sequence numbers, and any attacker with this ability and with the | |||
| ability to spoof IP source addresses could damage the TCP connection | ability to spoof IP source addresses could damage the TCP connection | |||
| without using the ECN flags. Therefore, ECN does not add any new | without using the ECN flags. Therefore, ECN does not add any new | |||
| vulnerabilities in this respect. | vulnerabilities in this respect. | |||
| An acknowledgement packet with a spoofed IP source address of the TCP | An acknowledgement packet with a spoofed IP source address of the TCP | |||
| data receiver could include the ECE bit set. If accepted by the TCP | data receiver could include the ECE bit set. If accepted by the TCP | |||
| data sender as a valid packet, this spoofed acknowledgement packet | data sender as a valid packet, this spoofed acknowledgement packet | |||
| could result in the TCP data sender unnecessarily halving its conges- | could result in the TCP data sender unnecessarily halving its | |||
| tion window. However, to be accepted by the data sender, such a | congestion window. However, to be accepted by the data sender, such | |||
| spoofed acknowledgement packet would have to have the correct 32-bit | a spoofed acknowledgement packet would have to have the correct | |||
| sequence number as well as a valid acknowledgement number. An | 32-bit sequence number as well as a valid acknowledgement number. An | |||
| attacker that could successfully send such a spoofed acknowledgement | attacker that could successfully send such a spoofed acknowledgement | |||
| packet could also send a spoofed RST packet, or do other equally dam- | packet could also send a spoofed RST packet, or do other equally | |||
| aging operations to the TCP connection. | damaging operations to the TCP connection. | |||
| Packets with a spoofed IP source address of the TCP data sender could | Packets with a spoofed IP source address of the TCP data sender could | |||
| include the CWR bit set. Again, to be accepted, such a packet would | include the CWR bit set. Again, to be accepted, such a packet would | |||
| have to have a valid sequence number. In addition, such a spoofed | have to have a valid sequence number. In addition, such a spoofed | |||
| packet would have a limited performance impact. Spoofing a data | packet would have a limited performance impact. Spoofing a data | |||
| packet with the CWR bit set could result in the TCP data receiver | packet with the CWR bit set could result in the TCP data receiver | |||
| sending fewer ECE packets than it would otherwise, if the data | sending fewer ECE packets than it would otherwise, if the data | |||
| receiver was sending ECE packets when it received the spoofed CWR | receiver was sending ECE packets when it received the spoofed CWR | |||
| packet. | packet. | |||
| skipping to change at page 45, line 30 | skipping to change at page 47, line 30 | |||
| that set of packets? | that set of packets? | |||
| We will classify the packets in the flow as A packets and B packets, | We will classify the packets in the flow as A packets and B packets, | |||
| and assume that the adversary only has access to A packets. Assume | and assume that the adversary only has access to A packets. Assume | |||
| that the adversary is subverting end-to-end congestion control along | that the adversary is subverting end-to-end congestion control along | |||
| the path traveled by A packets only, by either falsely indicating | the path traveled by A packets only, by either falsely indicating | |||
| ECN-Capability upstream of the point where congestion occurs, or | ECN-Capability upstream of the point where congestion occurs, or | |||
| erasing the congestion indication downstream. Consider also that | erasing the congestion indication downstream. Consider also that | |||
| there exists a monitoring device that sees both the A and B packets, | there exists a monitoring device that sees both the A and B packets, | |||
| and will "punish" both the A and B packets if the total flow is | and will "punish" both the A and B packets if the total flow is | |||
| determined not to be properly responding to indications of conges- | determined not to be properly responding to indications of | |||
| tion. Another key characteristic that we believe is likely to be | congestion. Another key characteristic that we believe is likely to | |||
| true is that the monitoring device, before `punishing' the A&B flow, | be true is that the monitoring device, before `punishing' the A&B | |||
| will first drop packets instead of setting the CE bit, and will drop | flow, will first drop packets instead of setting the CE codepoint, | |||
| arriving packets of that flow that already have the ECT and CE bits | and will drop arriving packets of that flow that already have the CE | |||
| set. If the end nodes are in fact using end-to-end congestion con- | codepoint set. If the end nodes are in fact using end-to-end | |||
| trol, they will see all of the indications of congestion seen by the | congestion control, they will see all of the indications of | |||
| monitoring device, and will begin to respond to these indications of | congestion seen by the monitoring device, and will begin to respond | |||
| congestion. Thus, the monitoring device is successful in providing | to these indications of congestion. Thus, the monitoring device is | |||
| the indications to the flow at an early stage. | successful in providing the indications to the flow at an early | |||
| stage. | ||||
| It is true that the adversary that has access only to the A packets | It is true that the adversary that has access only to the A packets | |||
| might, by subverting ECN-based congestion control, be able to deny | might, by subverting ECN-based congestion control, be able to deny | |||
| the benefits of ECN to the other packets in the A&B aggregate. While | the benefits of ECN to the other packets in the A&B aggregate. While | |||
| this is unfortunate, this is not a reason to disable ECN within an | this is unfortunate, this is not a reason to disable ECN within an | |||
| IPsec tunnel. | IPsec tunnel. | |||
| A variant of falsely reporting congestion occurs when there are two | A variant of falsely reporting congestion occurs when there are two | |||
| adversaries along a path, where the first adversary falsely reports | adversaries along a path, where the first adversary falsely reports | |||
| congestion, and the second adversary `erases' those reports. (Unlike | congestion, and the second adversary `erases' those reports. (Unlike | |||
| packet drops, ECN congestion reports can be `reversed' later in the | packet drops, ECN congestion reports can be `reversed' later in the | |||
| network by a malicious or broken router.) While this would be trans- | network by a malicious or broken router. However, the use of the ECN | |||
| parent to the end node, it is possible that a monitoring device | nonce could help the transport to detect this behavior.) While this | |||
| between the first and second adversaries would see the false indica- | would be transparent to the end node, it is possible that a | |||
| tions of congestion. Keep in mind our recommendation in this docu- | monitoring device between the first and second adversaries would see | |||
| ment, that before `punishing' a flow for not responding appropriately | the false indications of congestion. Keep in mind our recommendation | |||
| to congestion, the router will first switch to dropping rather than | in this document, that before `punishing' a flow for not responding | |||
| marking as an indication of congestion, for that flow. When this | appropriately to congestion, the router will first switch to dropping | |||
| includes dropping arriving packets from that flow that have the CE | rather than marking as an indication of congestion, for that flow. | |||
| bit set, this ensures that these indications of congestion are being | When this includes dropping arriving packets from that flow that have | |||
| seen by the end nodes. Thus, there is no additional harm that we are | the CE codepoint set, this ensures that these indications of | |||
| able to postulate as a result of multiple conflicting adversaries. | congestion are being seen by the end nodes. Thus, there is no | |||
| additional harm that we are able to postulate as a result of multiple | ||||
| conflicting adversaries. | ||||
| 19. Implications of Subverting End-to-End Congestion Control | 19. Implications of Subverting End-to-End Congestion Control | |||
| This section focuses on the potential repercussions of subverting | This section focuses on the potential repercussions of subverting | |||
| end-to-end congestion control by either falsely indicating ECN-Capa- | end-to-end congestion control by either falsely indicating ECN- | |||
| bility, or by erasing the congestion indication in ECN (the CE-bit). | Capability, or by erasing the congestion indication in ECN (the CE | |||
| Subverting end-to-end congestion control by either of these two meth- | codepoint). Subverting end-to-end congestion control by either of | |||
| ods can have consequences both for the application and for the net- | these two methods can have consequences both for the application and | |||
| work. We discuss these separately below. | for the network. We discuss these separately below. | |||
| The first method to subvert end-to-end congestion control, that of | The first method to subvert end-to-end congestion control, that of | |||
| falsely indicating ECN-Capability, effectively subverts end-to-end | falsely indicating ECN-Capability, effectively subverts end-to-end | |||
| congestion control only if the packet later encounters congestion | congestion control only if the packet later encounters congestion | |||
| that results in the setting of the CE bit. In this case, the trans- | that results in the setting of the CE codepoint. In this case, the | |||
| port protocol (which may not be ECN-capable) does not receive the | transport protocol (which may not be ECN-capable) does not receive | |||
| indication of congestion from these downstream congested routers. | the indication of congestion from these downstream congested routers. | |||
| The second method to subvert end-to-end congestion control, `erasing' | The second method to subvert end-to-end congestion control, `erasing' | |||
| the (set) CE bit in a packet, effectively subverts end-to-end conges- | the CE codepoint in a packet, effectively subverts end-to-end | |||
| tion control only when the CE bit in the packet was set earlier by a | congestion control only when the CE codepoint in the packet was set | |||
| congested router. In this case, the transport protocol does not | earlier by a congested router. In this case, the transport protocol | |||
| receive the indication of congestion from the upstream congested | does not receive the indication of congestion from the upstream | |||
| routers. | congested routers. | |||
| Either of these two methods of subverting end-to-end congestion con- | Either of these two methods of subverting end-to-end congestion | |||
| trol can potentially introduce more damage to the network (and possi- | control can potentially introduce more damage to the network (and | |||
| bly to the flow itself) than if the adversary had simply dropped | possibly to the flow itself) than if the adversary had simply dropped | |||
| packets from that flow. However, as we discuss later in this section | packets from that flow. However, as we discuss later in this section | |||
| and in Section 7, this potential damage is limited. | and in Section 7, this potential damage is limited. | |||
| 19.1. Implications for the Network and for Competing Flows | 19.1. Implications for the Network and for Competing Flows | |||
| The CE bit of the ECN field is only used by routers as an indication | The CE codepoint of the ECN field is only used by routers as an | |||
| of congestion during periods of *moderate* congestion. ECN-capable | indication of congestion during periods of *moderate* congestion. | |||
| routers should drop rather than mark packets during heavy congestion | ECN-capable routers should drop rather than mark packets during heavy | |||
| even if the router's queue is not yet full. For example, for routers | congestion even if the router's queue is not yet full. For example, | |||
| using active queue management based on RED, the router should drop | for routers using active queue management based on RED, the router | |||
| rather than mark packets that arrive while the average queue sizes | should drop rather than mark packets that arrive while the average | |||
| exceed the RED queue's maximum threshold. | queue sizes exceed the RED queue's maximum threshold. | |||
| One consequence for the network of subverting end-to-end congestion | One consequence for the network of subverting end-to-end congestion | |||
| control is that flows that do not receive the congestion indications | control is that flows that do not receive the congestion indications | |||
| from the network might increase their sending rate until they drive | from the network might increase their sending rate until they drive | |||
| the network into heavier congestion. Then, the congested router | the network into heavier congestion. Then, the congested router | |||
| could begin to drop rather than mark arriving packets. For flows | could begin to drop rather than mark arriving packets. For flows | |||
| that are not isolated by some form of per-flow scheduling or other | that are not isolated by some form of per-flow scheduling or other | |||
| per-flow mechanisms, but are instead aggregated with other flows in a | per-flow mechanisms, but are instead aggregated with other flows in a | |||
| single queue in an undifferentiated fashion, this packet-dropping at | single queue in an undifferentiated fashion, this packet-dropping at | |||
| the congested router would apply to all flows that share that queue. | the congested router would apply to all flows that share that queue. | |||
| Thus, the consequences would be to increase the level of congestion | Thus, the consequences would be to increase the level of congestion | |||
| in the network. | in the network. | |||
| In some cases, the increase in the level of congestion will lead to a | In some cases, the increase in the level of congestion will lead to a | |||
| substantial buffer buildup at the congested queue that will be suffi- | substantial buffer buildup at the congested queue that will be | |||
| cient to drive the congested queue from the packet-marking to the | sufficient to drive the congested queue from the packet-marking to | |||
| packet-dropping regime. This transition could occur either because | the packet-dropping regime. This transition could occur either | |||
| of buffer overflow, or because of the active queue management policy | because of buffer overflow, or because of the active queue management | |||
| described above that drops packets when the average queue is above | policy described above that drops packets when the average queue is | |||
| RED's maximum threshold. At this point, all flows, including the | above RED's maximum threshold. At this point, all flows, including | |||
| subverted flow, will begin to see packet drops instead of packet | the subverted flow, will begin to see packet drops instead of packet | |||
| marks, and a malicious or broken router will no longer be able to | marks, and a malicious or broken router will no longer be able to | |||
| `erase' these indications of congestion in the network. If the end | `erase' these indications of congestion in the network. If the end | |||
| nodes are deploying appropriate end-to-end congestion control, then | nodes are deploying appropriate end-to-end congestion control, then | |||
| the subverted flow will reduce its arrival rate in response to con- | the subverted flow will reduce its arrival rate in response to | |||
| gestion. When the level of congestion is sufficiently reduced, the | congestion. When the level of congestion is sufficiently reduced, | |||
| congested queue can return from the packet-dropping regime to the | the congested queue can return from the packet-dropping regime to the | |||
| packet-marking regime. The steady-state pattern could be one of the | packet-marking regime. The steady-state pattern could be one of the | |||
| congested queue oscillating between these two regimes. | congested queue oscillating between these two regimes. | |||
| In other cases, the consequences of subverting end-to-end congestion | In other cases, the consequences of subverting end-to-end congestion | |||
| control will not be severe enough to drive the congested link into | control will not be severe enough to drive the congested link into | |||
| sufficiently-heavy congestion that packets are dropped instead of | sufficiently-heavy congestion that packets are dropped instead of | |||
| being marked. In this case, the implications for competing flows in | being marked. In this case, the implications for competing flows in | |||
| the network will be a slightly-increased rate of packet marking or | the network will be a slightly-increased rate of packet marking or | |||
| dropping, and a corresponding decrease in the bandwidth available to | dropping, and a corresponding decrease in the bandwidth available to | |||
| those flows. This can be a stable state if the arrival rate of the | those flows. This can be a stable state if the arrival rate of the | |||
| subverted flow is sufficiently small, relative to the link bandwidth, | subverted flow is sufficiently small, relative to the link bandwidth, | |||
| that the average queue size at the congested router remains under | that the average queue size at the congested router remains under | |||
| control. In particular, the subverted flow could have a limited | control. In particular, the subverted flow could have a limited | |||
| bandwidth demand on the link at this router, while still getting more | bandwidth demand on the link at this router, while still getting more | |||
| than its "fair" share of the link. This limited demand could be due | than its "fair" share of the link. This limited demand could be due | |||
| to a limited demand from the data source; a limitation from the TCP | to a limited demand from the data source; a limitation from the TCP | |||
| advertised window; a lower-bandwidth access pipe; or other factors. | advertised window; a lower-bandwidth access pipe; or other factors. | |||
| Thus the subversion of ECN-based congestion control can still lead to | Thus the subversion of ECN-based congestion control can still lead to | |||
| unfairness, which we believe is appropriate to note here. | unfairness, which we believe is appropriate to note here. | |||
| The threat to the network posed by the subversion of ECN-based con- | The threat to the network posed by the subversion of ECN-based | |||
| gestion control in the network is essentially the same as the threat | congestion control in the network is essentially the same as the | |||
| posed by an end-system that intentionally fails to cooperate with | threat posed by an end-system that intentionally fails to cooperate | |||
| end-to-end congestion control. The deployment of mechanisms in | with end-to-end congestion control. The deployment of mechanisms in | |||
| routers to address this threat is an open research question, and is | routers to address this threat is an open research question, and is | |||
| discussed further in Section 10. | discussed further in Section 10. | |||
| Let us take the example described in Section 18.1.1, where the CE bit | Let us take the example described in Section 18.1.1, where the CE | |||
| that was set in a packet is erased: {(1, 1) -> (1, 0)}. The conse- | codepoint that was set in a packet is erased: {'11' -> '10' or '11' | |||
| quence for the congested upstream router that set the CE bit is that | -> '01'}. The consequence for the congested upstream router that set | |||
| this congestion indication does not reach the end nodes for that | the CE codepoint is that this congestion indication does not reach | |||
| flow. The source (even one which is completely cooperative and not | the end nodes for that flow. The source (even one which is completely | |||
| malicious) is thus allowed to continue to increase its sending rate | cooperative and not malicious) is thus allowed to continue to | |||
| (if it is a TCP flow, by increasing its congestion window). The flow | increase its sending rate (if it is a TCP flow, by increasing its | |||
| potentially achieves better throughput than the other flows that also | congestion window). The flow potentially achieves better throughput | |||
| share the congested router, especially if there are no policing mech- | than the other flows that also share the congested router, especially | |||
| anisms or per-flow queueing mechanisms at that router. Consider the | if there are no policing mechanisms or per-flow queueing mechanisms | |||
| behavior of the other flows, especially if they are cooperative: that | at that router. Consider the behavior of the other flows, especially | |||
| is, the flows that do not experience subverted end-to-end congestion | if they are cooperative: that is, the flows that do not experience | |||
| control. They are likely to reduce their load (e.g., by reducing | subverted end-to-end congestion control. They are likely to reduce | |||
| their window size) on the congested router, thus benefiting our sub- | their load (e.g., by reducing their window size) on the congested | |||
| verted flow. This results in unfairness. As we discussed above, this | router, thus benefiting our subverted flow. This results in | |||
| unfairness could either be transient (because the congested queue is | unfairness. As we discussed above, this unfairness could either be | |||
| driven into the packet-dropping regime), oscillatory (because the con- | transient (because the congested queue is driven into the packet- | |||
| gested queue oscillates between the packet marking and the packet | dropping regime), oscillatory (because the congested queue oscillates | |||
| dropping regime), or more moderate but a persistent stable state | between the packet marking and the packet dropping regime), or more | |||
| (because the congested queue is never driven to the packet dropping | moderate but a persistent stable state (because the congested queue | |||
| regime). | is never driven to the packet dropping regime). | |||
| The results would be similar if the subverted flow was intentionally | The results would be similar if the subverted flow was intentionally | |||
| avoiding end-to-end congestion control. One difference is that a | avoiding end-to-end congestion control. One difference is that a | |||
| flow that is intentionally avoiding end-to-end congestion control at | flow that is intentionally avoiding end-to-end congestion control at | |||
| the end nodes can avoid end-to-end congestion control even when the | the end nodes can avoid end-to-end congestion control even when the | |||
| congested queue is in packet-dropping mode, by refusing to reduce its | congested queue is in packet-dropping mode, by refusing to reduce its | |||
| sending rate in response to packet drops in the network. Thus the | sending rate in response to packet drops in the network. Thus the | |||
| problems for the network from the subversion of ECN-based congestion | problems for the network from the subversion of ECN-based congestion | |||
| control are less severe than the problems caused by the intentional | control are less severe than the problems caused by the intentional | |||
| avoidance of end-to-end congestion control in the end nodes. It is | avoidance of end-to-end congestion control in the end nodes. It is | |||
| also the case that it is considerably more difficult to control the | also the case that it is considerably more difficult to control the | |||
| behavior of the end nodes than it is to control the behavior of the | behavior of the end nodes than it is to control the behavior of the | |||
| infrastructure itself. This is not to say that the problems for the | infrastructure itself. This is not to say that the problems for the | |||
| network posed by the network's subversion of ECN-based congestion | network posed by the network's subversion of ECN-based congestion | |||
| control are small; just that they are dwarfed by the problems for the | control are small; just that they are dwarfed by the problems for the | |||
| network posed by the subversion of either ECN-based or other cur- | network posed by the subversion of either ECN-based or other | |||
| rently known packet-based congestion control mechanisms by the end | currently known packet-based congestion control mechanisms by the end | |||
| nodes. | nodes. | |||
| 19.2. Implications for the Subverted Flow | 19.2. Implications for the Subverted Flow | |||
| When a source indicates that it is ECN-capable, there is an expecta- | When a source indicates that it is ECN-capable, there is an | |||
| tion that the routers in the network that are capable of participat- | expectation that the routers in the network that are capable of | |||
| ing in ECN will use the CE bit for indication of congestion. There is | participating in ECN will use the CE codepoint for indication of | |||
| the potential benefit of using ECN in reducing the amount of packet | congestion. There is the potential benefit of using ECN in reducing | |||
| loss (in addition to the reduced queueing delays because of active | the amount of packet loss (in addition to the reduced queueing delays | |||
| queue management policies). When the packet flows through a tunnel | because of active queue management policies). When the packet flows | |||
| where the nodes that the tunneled packets traverse are untrusted in | through a tunnel where the nodes that the tunneled packets traverse | |||
| some way, the expectation is that IPsec will protect the flow from | are untrusted in some way, the expectation is that IPsec will protect | |||
| subversion that results in undesirable consequences. | the flow from subversion that results in undesirable consequences. | |||
| In many cases, a subverted flow will benefit from the subversion of | In many cases, a subverted flow will benefit from the subversion of | |||
| end-to-end congestion control for that flow in the network, by | end-to-end congestion control for that flow in the network, by | |||
| receiving more bandwidth than it would have otherwise, relative to | receiving more bandwidth than it would have otherwise, relative to | |||
| competing non-subverted flows. If the congested queue reaches the | competing non-subverted flows. If the congested queue reaches the | |||
| packet-dropping stage, then the subversion of end-to-end congestion | packet-dropping stage, then the subversion of end-to-end congestion | |||
| control might or might not be of overall benefit to the subverted | control might or might not be of overall benefit to the subverted | |||
| flow, depending on that flow's relative tradeoffs between throughput, | flow, depending on that flow's relative tradeoffs between throughput, | |||
| loss, and delay. | loss, and delay. | |||
| One form of subverting end-to-end congestion control is to falsely | One form of subverting end-to-end congestion control is to falsely | |||
| indicate ECN-capability by setting the ECT bit. This has the conse- | indicate ECN-capability by setting the ECT codepoint. This has the | |||
| quence of downstream congested routers setting the CE bit in vain. | consequence of downstream congested routers setting the CE codepoint | |||
| However, as described in Section 9.1.2, if the ECT bit is changed in | in vain. However, as described in Section 9.1.2, if an ECT codepoint | |||
| an IP tunnel, this can be detected at the egress point of the tunnel, | is changed in an IP tunnel, this can be detected at the egress point | |||
| as long as the inner header was not changed within the tunnel. | of the tunnel, as long as the inner header was not changed within the | |||
| tunnel. | ||||
| The second form of subverting end-to-end congestion control is to | The second form of subverting end-to-end congestion control is to | |||
| erase the congestion indication, either by erasing the CE bit | erase the congestion indication by erasing the CE codepoint. In this | |||
| directly, or by erasing the ECT bit when the CE bit is already set. | case, it is the upstream congested routers that set the CE codepoint | |||
| In this case, it is the upstream congested routers that set the CE | in vain. | |||
| bit in vain. | ||||
| If the ECT bit is erased within an IP tunnel, then this can be | If an ECT codepoint is erased within an IP tunnel, then this can be | |||
| detected at the egress point of the tunnel, as long as the inner | detected at the egress point of the tunnel, as long as the inner | |||
| header was not changed within the tunnel. If the CE bit is set | header was not changed within the tunnel. If the CE codepoint is set | |||
| upstream of the IP tunnel, then any erasure of the outer header's CE | upstream of the IP tunnel, then any erasure of the outer header's CE | |||
| bit within the tunnel will have no effect because the inner header | codepoint within the tunnel will have no effect because the inner | |||
| preserves the set value of the CE bit. However, if the CE bit is set | header preserves the set value of the CE codepoint. However, if the | |||
| within the tunnel, and erased either within or downstream of the tun- | CE codepoint is set within the tunnel, and erased either within or | |||
| nel, this is not necessarily detected at the egress point of the tun- | downstream of the tunnel, this is not necessarily detected at the | |||
| nel. | egress point of the tunnel. | |||
| With this subversion of end-to-end congestion control, an end-system | With this subversion of end-to-end congestion control, an end-system | |||
| transport does not respond to the congestion indication. Along with | transport does not respond to the congestion indication. Along with | |||
| the increased unfairness for the non-subverted flows described in the | the increased unfairness for the non-subverted flows described in the | |||
| previous section, the congested router's queue could continue to | previous section, the congested router's queue could continue to | |||
| build, resulting in packet loss at the congested router - which is a | build, resulting in packet loss at the congested router - which is a | |||
| means for indicating congestion to the transport in any case. In the | means for indicating congestion to the transport in any case. In the | |||
| interim, the flow might experience higher queueing delays, possibly | interim, the flow might experience higher queueing delays, possibly | |||
| along with an increased bandwidth relative to other non-subverted | along with an increased bandwidth relative to other non-subverted | |||
| flows. But transports do not inherently make assumptions of consis- | flows. But transports do not inherently make assumptions of | |||
| tently experiencing carefully managed queueing in the path. We | consistently experiencing carefully managed queueing in the path. We | |||
| believe that these forms of subverting end-to-end congestion control | believe that these forms of subverting end-to-end congestion control | |||
| are no worse for the subverted flow than if the adversary had simply | are no worse for the subverted flow than if the adversary had simply | |||
| dropped the packets of that flow itself. | dropped the packets of that flow itself. | |||
| 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | 19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control | |||
| We have shown that, in many cases, a malicious or broken router that | We have shown that, in many cases, a malicious or broken router that | |||
| is able to change the bits in the ECN field can do no more damage | is able to change the bits in the ECN field can do no more damage | |||
| than if it had simply dropped the packet in question. However, this | than if it had simply dropped the packet in question. However, this | |||
| is not true in all cases, in particular in the cases where the broken | is not true in all cases, in particular in the cases where the broken | |||
| router subverted end-to-end congestion control by either falsely | router subverted end-to-end congestion control by either falsely | |||
| indicating ECN-Capability or by erasing the ECN congestion indication | indicating ECN-Capability or by erasing the ECN congestion indication | |||
| (in the CE-bit). While there are many ways that a router can harm a | (in the CE codepoint). While there are many ways that a router can | |||
| flow by dropping packets, a router cannot subvert end-to-end conges- | harm a flow by dropping packets, a router cannot subvert end-to-end | |||
| tion control by dropping packets. As an example, a router cannot | congestion control by dropping packets. As an example, a router | |||
| subvert TCP congestion control by dropping data packets, acknowledge- | cannot subvert TCP congestion control by dropping data packets, | |||
| ment packets, or control packets. | acknowledgement packets, or control packets. | |||
| Even though packet-dropping cannot be used to subvert end-to-end con- | Even though packet-dropping cannot be used to subvert end-to-end | |||
| gestion control, there *are* non-ECN-based methods for subverting | congestion control, there *are* non-ECN-based methods for subverting | |||
| end-to-end congestion control that a broken or malicious router could | end-to-end congestion control that a broken or malicious router could | |||
| use. For example, a broken router could duplicate data packets, thus | use. For example, a broken router could duplicate data packets, thus | |||
| effectively negating the effects of end-to-end congestion control | effectively negating the effects of end-to-end congestion control | |||
| along some portion of the path. (For a router that duplicated pack- | along some portion of the path. (For a router that duplicated | |||
| ets within an IPsec tunnel, the security administrator can cause the | packets within an IPsec tunnel, the security administrator can cause | |||
| duplicate packets to be discarded by configuring anti-replay protec- | the duplicate packets to be discarded by configuring anti-replay | |||
| tion for the tunnel.) This duplication of packets within the network | protection for the tunnel.) This duplication of packets within the | |||
| would have similar implications for the network and for the subverted | network would have similar implications for the network and for the | |||
| flow as those described in Sections 18.1.1 and 18.1.4 above. | subverted flow as those described in Sections 18.1.1 and 18.1.4 | |||
| above. | ||||
| 20. The Motivation for the ECT bit. | 20. The Motivation for the ECT Codepoints. | |||
| The need for the ECT bit is motivated by the fact that ECN will be | 20.1. The Motivation for an ECT Codepoint. | |||
| deployed incrementally in an Internet where some transport protocols | ||||
| and routers understand ECN and some do not. With the ECT bit, the | ||||
| router can drop packets from flows that are not ECN-capable, but can | ||||
| *instead* set the CE bit in packets that *are* ECN-capable. Because | ||||
| the ECT bit allows an end node to have the CE bit set in a packet | ||||
| *instead* of having the packet dropped, an end node might have some | ||||
| incentive to deploy ECN. | ||||
| If there was no ECT indication, then the router would have to set the | The need for an ECT codepoint is motivated by the fact that ECN will | |||
| CE bit for packets from both ECN-capable and non-ECN-capable flows. | be deployed incrementally in an Internet where some transport | |||
| In this case, there would be no incentive for end-nodes to deploy | protocols and routers understand ECN and some do not. With an ECT | |||
| ECN, and no viable path of incremental deployment from a non-ECN | codepoint, the router can drop packets from flows that are not ECN- | |||
| world to an ECN-capable world. Consider the first stages of such an | capable, but can *instead* set the CE codepoint in packets that *are* | |||
| incremental deployment, where a subset of the flows are ECN-capable. | ECN-capable. Because an ECT codepoint allows an end node to have the | |||
| At the onset of congestion, when the packet dropping/marking rate | CE codepoint set in a packet *instead* of having the packet dropped, | |||
| would be low, routers would only set CE bits, rather than dropping | an end node might have some incentive to deploy ECN. | |||
| packets. However, only those flows that are ECN-capable would under- | ||||
| stand and respond to CE packets. The result is that the ECN-capable | If there was no ECT codepoint, then the router would have to set the | |||
| flows would back off, and the non-ECN-capable flows would be unaware | CE codepoint for packets from both ECN-capable and non-ECN-capable | |||
| of the ECN signals and would continue to open their congestion win- | flows. In this case, there would be no incentive for end-nodes to | |||
| dows. | deploy ECN, and no viable path of incremental deployment from a non- | |||
| ECN world to an ECN-capable world. Consider the first stages of such | ||||
| an incremental deployment, where a subset of the flows are ECN- | ||||
| capable. At the onset of congestion, when the packet | ||||
| dropping/marking rate would be low, routers would only set CE | ||||
| codepoints, rather than dropping packets. However, only those flows | ||||
| that are ECN-capable would understand and respond to CE packets. The | ||||
| result is that the ECN-capable flows would back off, and the non-ECN- | ||||
| capable flows would be unaware of the ECN signals and would continue | ||||
| to open their congestion windows. | ||||
| In this case, there are two possible outcomes: (1) the ECN-capable | In this case, there are two possible outcomes: (1) the ECN-capable | |||
| flows back off, the non-ECN-capable flows get all of the bandwidth, | flows back off, the non-ECN-capable flows get all of the bandwidth, | |||
| and congestion remains mild, or (2) the ECN-capable flows back off, | and congestion remains mild, or (2) the ECN-capable flows back off, | |||
| the non-ECN-capable flows don't, and congestion increases until the | the non-ECN-capable flows don't, and congestion increases until the | |||
| router transitions from setting the CE bit to dropping packets. | router transitions from setting the CE codepoint to dropping packets. | |||
| While this second outcome evens out the fairness, the ECN-capable | While this second outcome evens out the fairness, the ECN-capable | |||
| flows would still receive little benefit from being ECN-capable, | flows would still receive little benefit from being ECN-capable, | |||
| because the increased congestion would drive the router to packet- | because the increased congestion would drive the router to packet- | |||
| dropping behavior. | dropping behavior. | |||
| A flow that advertised itself as ECN-Capable but does not respond to | A flow that advertised itself as ECN-Capable but does not respond to | |||
| CE bits is functionally equivalent to a flow that turns off conges- | CE codepoints is functionally equivalent to a flow that turns off | |||
| tion control, as discussed earlier in this document. | congestion control, as discussed earlier in this document. | |||
| Thus, in a world where a subset of the flows are ECN-capable, but | Thus, in a world where a subset of the flows are ECN-capable, but | |||
| where ECN-capable flows have no mechanism for indicating that fact to | where ECN-capable flows have no mechanism for indicating that fact to | |||
| the routers, there would be less effective and less fair congestion | the routers, there would be less effective and less fair congestion | |||
| control in the Internet, resulting in a strong incentive for end | control in the Internet, resulting in a strong incentive for end | |||
| nodes not to deploy ECN. | nodes not to deploy ECN. | |||
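The per-packet choice that an ECT codepoint makes possible can be sketched as follows. This is an illustrative sketch only, not text from either draft: the function name `handle_selected_packet` is hypothetical, and the numeric codepoint values (00 for not-ECT, 01 for ECT(1), 10 for ECT(0), 11 for CE) are assumed for the example rather than taken from this section.

```python
# Assumed values of the two-bit ECN field (illustrative only).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def handle_selected_packet(ecn):
    """Decide what a congested router does with a packet that its active
    queue management has selected, while the marking/dropping rate is low.

    Returns a tuple (action, new_ecn); new_ecn is None when the packet
    is dropped.
    """
    if ecn == NOT_ECT:
        # The flow has not indicated ECN capability, so the only way to
        # signal congestion to it is to drop the packet.
        return ("drop", None)
    # ECT(0), ECT(1), or already CE: signal congestion by setting
    # (or leaving) the CE codepoint instead of dropping.
    return ("forward", CE)


if __name__ == "__main__":
    for name, value in [("not-ECT", NOT_ECT), ("ECT(0)", ECT_0),
                        ("ECT(1)", ECT_1), ("CE", CE)]:
        print(name, handle_selected_packet(value))
```

The sketch covers only the onset of congestion; a router under heavier congestion may drop ECN-capable packets as well.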
| 20.2. The Motivation for two ECT Codepoints. | ||||
| The primary motivation for the two ECT codepoints is to provide a | ||||
| one-bit ECN nonce. The ECN nonce allows the development of | ||||
| mechanisms for the sender to probabilistically verify that network | ||||
| elements are not erasing the CE codepoint, and that data receivers | ||||
| are properly reporting to the sender the receipt of packets with the | ||||
| CE codepoint set. | ||||
| Another possibility for senders to detect misbehaving network | ||||
| elements or receivers would be for the data sender to occasionally | ||||
| send a data packet with the CE codepoint set, to see if the receiver | ||||
| reports receiving the CE codepoint. Of course, if these packets | ||||
| encountered congestion in the network, the router might make no | ||||
| change in the packets, because the CE codepoint would already be set. | ||||
| Thus, for packets sent with the CE codepoint set, the TCP end-nodes | ||||
| could not determine if some router intended to set the CE codepoint | ||||
| in these packets. For this reason, sending packets with the CE | ||||
| codepoint would have to be done sparingly, and would be a less | ||||
| effective check against misbehaving network elements and receivers | ||||
| than would be the ECN nonce. | ||||
| The assignment of the fourth ECN codepoint to ECT(1) precludes the | ||||
| use of this codepoint for other purposes. For clarity, we briefly | ||||
| list those possible purposes here. | ||||
| One possibility might have been for the data sender to use the fourth | ||||
| ECN codepoint to indicate an alternate semantics for ECN. However, | ||||
| this seems to us more appropriate to be signalled using a | ||||
| differentiated services codepoint in the DS field. | ||||
| A second possible use for the fourth ECN codepoint would have been to | ||||
| give the router two separate codepoints for the indication of | ||||
| congestion, CE(0) and CE(1), for mild and severe congestion | ||||
| respectively. While this could be useful in some cases, this | ||||
| certainly does not seem a compelling requirement at this point. If | ||||
| there was judged to be a compelling need for this, the complications | ||||
| of incremental deployment would most likely necessitate more than | ||||
| just one codepoint for this function. | ||||
| A third use that has been informally proposed for the ECN codepoint | ||||
| is for use in some forms of multicast congestion control, based on | ||||
| randomized procedures for duplicating marked packets at routers. | ||||
| Some proposed multicast packet duplication procedures are based on a | ||||
| new ECN codepoint that (1) conveys the fact that congestion occurred | ||||
| upstream of the duplication point that marked the packet with this | ||||
| codepoint and (2) can detect congestion downstream of that | ||||
| duplication point. ECT(1) can serve this purpose because it is both | ||||
| distinct from ECT(0) and is replaced by CE when ECN marking occurs in | ||||
| response to congestion or incipient congestion. Explanation of how | ||||
| this enhanced version of ECN would be used by multicast congestion | ||||
| control is beyond the scope of this document, as are ECN-aware | ||||
| multicast packet duplication procedures and the processing of the ECN | ||||
| field at multicast receivers in all cases (i.e., irrespective of the | ||||
| multicast packet duplication procedure(s) used). | ||||
| The specification of IP tunnel modifications for ECN in this document | ||||
| assumes that the only change made to the outer IP header's ECN field | ||||
| between tunnel endpoints is to set the CE codepoint to indicate | ||||
| congestion. This is not consistent with some of the proposed uses of | ||||
| ECT(1) by the multicast duplication procedures in the previous | ||||
| paragraph, and such procedures SHOULD NOT be deployed within tunnels | ||||
| configured for full ECN functionality. Limited ECN functionality may | ||||
| be used instead, although in practice many tunnel protocols | ||||
| (including IPsec) will not work correctly if multicast traffic | ||||
| duplication occurs within the tunnel. | ||||
| 21. Why use Two Bits in the IP Header? | 21. Why use Two Bits in the IP Header? | |||
| Given the need for an ECT indication in the IP header, there still | Given the need for an ECT indication in the IP header, there still | |||
| remains the question of whether the ECT (ECN-Capable Transport) and | remains the question of whether the ECT (ECN-Capable Transport) and | |||
| CE (Congestion Experienced) indications should have been overloaded | CE (Congestion Experienced) codepoints should have been overloaded on | |||
| on a single bit. This overloaded-one-bit alternative, explored in | a single bit. This overloaded-one-bit alternative, explored in | |||
| [Floyd94], would have involved a single bit with two values. One | [Floyd94], would have involved a single bit with two values. One | |||
| value, "ECT and not CE", would represent an ECN-Capable Transport, | value, "ECT and not CE", would represent an ECN-Capable Transport, | |||
| and the other value, "CE or not ECT", would represent either Conges- | and the other value, "CE or not ECT", would represent either | |||
| tion Experienced or a non-ECN-Capable transport. | Congestion Experienced or a non-ECN-Capable transport. | |||
| One difference between the one-bit and two-bit implementations con- | One difference between the one-bit and two-bit implementations | |||
| cerns packets that traverse multiple congested routers. Consider a | concerns packets that traverse multiple congested routers. Consider | |||
| CE packet that arrives at a second congested router, and is selected | a CE packet that arrives at a second congested router, and is | |||
| by the active queue management at that router for either marking or | selected by the active queue management at that router for either | |||
| dropping. In the one-bit implementation, the second congested router | marking or dropping. In the one-bit implementation, the second | |||
| has no choice but to drop the CE packet, because it cannot distin- | congested router has no choice but to drop the CE packet, because it | |||
| guish between a CE packet and a non-ECT packet. In the two-bit | cannot distinguish between a CE packet and a non-ECT packet. In the | |||
| implementation, the second congested router has the choice of either | two-bit implementation, the second congested router has the choice of | |||
| dropping the CE packet, or of leaving it alone with the CE bit set. | either dropping the CE packet, or of leaving it alone with the CE | |||
| codepoint set. | ||||
| Another difference between the one-bit and two-bit implementations | Another difference between the one-bit and two-bit implementations | |||
| comes from the fact that with the one-bit implementation, receivers | comes from the fact that with the one-bit implementation, receivers | |||
| in a single flow cannot distinguish between CE and non-ECT packets. | in a single flow cannot distinguish between CE and non-ECT packets. | |||
| Thus, in the one-bit implementation an ECN-capable data sender would | Thus, in the one-bit implementation an ECN-capable data sender would | |||
| have to unambiguously indicate to the receiver or receivers whether | have to unambiguously indicate to the receiver or receivers whether | |||
| each packet had been sent as ECN-Capable or as non-ECN-Capable. One | each packet had been sent as ECN-Capable or as non-ECN-Capable. One | |||
| possibility would be for the sender to indicate in the transport | possibility would be for the sender to indicate in the transport | |||
| header whether the packet was sent as ECN-Capable. A second possi- | header whether the packet was sent as ECN-Capable. A second | |||
| bility that would involve a functional limitation for the one- bit | possibility that would involve a functional limitation for the one- | |||
| implementation would be for the sender to unambiguously indicate that | bit implementation would be for the sender to unambiguously indicate | |||
| it was going to send *all* of its packets as ECN-Capable or as non- | that it was going to send *all* of its packets as ECN-Capable or as | |||
| ECN-Capable. For a multicast transport protocol, this unambiguous | non-ECN-Capable. For a multicast transport protocol, this | |||
| indication would have to be apparent to receivers joining an on-going | unambiguous indication would have to be apparent to receivers joining | |||
| multicast session. | an on-going multicast session. | |||
| Another concern that was described earlier (and recommended in this | Another concern that was described earlier (and recommended in this | |||
| document) is that transports (particularly TCP) should not mark pure | document) is that transports (particularly TCP) should not mark pure | |||
| ACK packets or retransmitted packets as being ECN-Capable. A pure | ACK packets or retransmitted packets as being ECN-Capable. A pure | |||
| ACK packet from a non-ECN-capable transport could be dropped, without | ACK packet from a non-ECN-capable transport could be dropped, without | |||
| necessarily having an impact on the transport from a congestion con- | necessarily having an impact on the transport from a congestion | |||
| trol perspective (because subsequent ACKs are cumulative). An ECN- | control perspective (because subsequent ACKs are cumulative). An | |||
| capable transport reacting to the CE bit set in a pure ACK packet by | ECN-capable transport reacting to the CE codepoint in a pure ACK | |||
| reducing the window would be at a disadvantage in comparison to a | packet by reducing the window would be at a disadvantage in | |||
| non-ECN-capable transport. For this reason (and for reasons described | comparison to a non-ECN-capable transport. For this reason (and for | |||
| earlier in relation to retransmitted packets), it is desirable to | reasons described earlier in relation to retransmitted packets), it | |||
| have the ECN-Capable bit indication on a per-packet basis. | is desirable to have the ECT codepoint set on a per-packet basis. | |||
| Another advantage of the two-bit approach is that it is somewhat more | Another advantage of the two-bit approach is that it is somewhat more | |||
| robust. The most critical issue, discussed in Section 8, is that the | robust. The most critical issue, discussed in Section 8, is that the | |||
| default indication should be that of a non-ECN-Capable transport. In | default indication should be that of a non-ECN-Capable transport. In | |||
| a two-bit implementation, this requirement for the default value sim- | a two-bit implementation, this requirement for the default value | |||
| ply means that the ECT bit should be `OFF' by default. In the one- | simply means that the non-ECT codepoint should be the default. In | |||
| bit implementation, this means that the single overloaded bit should | the one-bit implementation, this means that the single overloaded bit | |||
| by default be in the "CE or not ECT" position. This is less clear | should by default be in the "CE or not ECT" position. This is less | |||
| and straightforward, and possibly more open to incorrect implementa- | clear and straightforward, and possibly more open to incorrect | |||
| tions either in the end nodes or in the routers. | implementations either in the end nodes or in the routers. | |||
| In summary, while the one-bit implementation could be a possible | In summary, while the one-bit implementation could be a possible | |||
| implementation, it has the following significant limitations relative | implementation, it has the following significant limitations relative | |||
| to the two-bit implementation. First, the one-bit implementation has | to the two-bit implementation. First, the one-bit implementation has | |||
| more limited functionality for the treatment of CE packets at a sec- | more limited functionality for the treatment of CE packets at a | |||
| ond congested router. Second, the one-bit implementation requires | second congested router. Second, the one-bit implementation requires | |||
| either that extra information be carried in the transport header of | either that extra information be carried in the transport header of | |||
| packets from ECN-Capable flows (to convey the functionality of the | packets from ECN-Capable flows (to convey the functionality of the | |||
| second bit elsewhere, namely in the transport header), or that | second bit elsewhere, namely in the transport header), or that | |||
| senders in ECN-Capable flows accept the limitation that receivers | senders in ECN-Capable flows accept the limitation that receivers | |||
| must be able to determine a priori which packets are ECN-Capable and | must be able to determine a priori which packets are ECN-Capable and | |||
| which are not ECN-Capable. Third, the one-bit implementation is pos- | which are not ECN-Capable. Third, the one-bit implementation is | |||
| sibly more open to errors from faulty implementations that choose the | possibly more open to errors from faulty implementations that choose | |||
| wrong default value for the ECN bit. We believe that the use of the | the wrong default value for the ECN bit. We believe that the use of | |||
| extra bit in the IP header for the ECT-bit is extremely valuable to | the extra bit in the IP header for the ECT-bit is extremely valuable | |||
| overcome these limitations. | to overcome these limitations. | |||
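The first limitation summarized above, the treatment of a CE packet at a second congested router, can be illustrated with a small sketch. The function names, the action labels, and the choice of which value of the overloaded bit means "ECT and not CE" are hypothetical, as are the numeric two-bit codepoint values; none of them are specified in this section.

```python
# Hypothetical encoding of the overloaded single bit.
ECT_AND_NOT_CE = 1
CE_OR_NOT_ECT = 0

# Assumed values of the two-bit ECN field (illustrative only).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def one_bit_options(bit):
    """Actions open to a congested router that has selected this packet,
    under the overloaded one-bit alternative."""
    if bit == ECT_AND_NOT_CE:
        return {"mark", "drop"}
    # "CE or not ECT": the router cannot tell an already-marked packet
    # from a non-ECN-capable one, so dropping is the only safe signal.
    return {"drop"}


def two_bit_options(ecn):
    """Actions open to the same router with the two-bit ECN field."""
    if ecn == NOT_ECT:
        return {"drop"}
    if ecn == CE:
        # Already marked upstream: drop it, or leave CE set.
        return {"drop", "leave_marked"}
    return {"mark", "drop"}


if __name__ == "__main__":
    # A CE packet arriving at a second congested router:
    print(one_bit_options(CE_OR_NOT_ECT))
    print(two_bit_options(CE))
```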
| 22. Historical Definitions for the IPv4 TOS Octet | 22. Historical Definitions for the IPv4 TOS Octet | |||
| RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP | RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP | |||
| header. In RFC 791, bits 6 and 7 of the ToS octet are listed as | header. In RFC 791, bits 6 and 7 of the ToS octet are listed as | |||
| "Reserved for Future Use", and are shown set to zero. The first two | "Reserved for Future Use", and are shown set to zero. The first two | |||
| fields of the ToS octet were defined as the Precedence and Type of | fields of the ToS octet were defined as the Precedence and Type of | |||
| Service (TOS) fields. | Service (TOS) fields. | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| skipping to change at page 53, line 45 ¶ | skipping to change at page 57, line 20 ¶ | |||
| The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: | The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | PRECEDENCE | TOS | MBZ | RFC 1349 | | PRECEDENCE | TOS | MBZ | RFC 1349 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary | Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary | |||
| Cost". In addition to the Precedence and Type of Service (TOS) | Cost". In addition to the Precedence and Type of Service (TOS) | |||
| fields, the last field, MBZ (for "must be zero") was defined as cur- | fields, the last field, MBZ (for "must be zero") was defined as | |||
| rently unused. RFC 1349 stated that "The originator of a datagram | currently unused. RFC 1349 stated that "The originator of a datagram | |||
| sets [the MBZ] field to zero (unless participating in an Internet | sets [the MBZ] field to zero (unless participating in an Internet | |||
| protocol experiment which makes use of that bit)." | protocol experiment which makes use of that bit)." | |||
| RFC 1455 [RFC 1455] defined an experimental standard that used all | RFC 1455 [RFC 1455] defined an experimental standard that used all | |||
| four bits in the TOS field to request a guaranteed level of link | four bits in the TOS field to request a guaranteed level of link | |||
| security. | security. | |||
| RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Dif- | RFC 1349 and RFC 1455 have been obsoleted by "Definition of the | |||
| ferentiated Services Field (DS Field) in the IPv4 and IPv6 Headers" | Differentiated Services Field (DS Field) in the IPv4 and IPv6 | |||
| [RFC2474] in which bits 6 and 7 of the DS field are listed as Cur- | Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed | |||
| rently Unused (CU). RFC 2780 [RFC2780] specified ECN as an experi- | as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an | |||
| mental use of the two-bit CU field. RFC 2780 updated the definition | experimental use of the two-bit CU field. RFC 2780 updated the | |||
| of the DS Field to only encompass the first six bits of this octet | definition of the DS Field to only encompass the first six bits of | |||
| rather than all eight bits; these first six bits are defined as the | this octet rather than all eight bits; these first six bits are | |||
| Differentiated Services CodePoint (DSCP): | defined as the Differentiated Services CodePoint (DSCP): | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
| | DSCP | CU | RFCs 2474, | | DSCP | CU | RFCs 2474, | |||
| 2780 | 2780 | |||
| +-----+-----+-----+-----+-----+-----+-----+-----+ | +-----+-----+-----+-----+-----+-----+-----+-----+ | |||
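As a minimal sketch of the layout in the diagram above, the octet can be split into the six-bit DSCP and the two-bit field that ECN uses. The function name `split_ds_octet` is hypothetical. Bit 0 in the diagram is the most significant bit of the octet, so bits 6 and 7 are the two low-order bits.

```python
def split_ds_octet(octet):
    """Split the former IPv4 TOS octet into (dscp, ecn).

    dscp is bits 0-5 of the diagram (the six high-order bits);
    ecn is bits 6-7 (the two low-order bits, formerly "CU").
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> 2
    ecn = octet & 0b11
    return dscp, ecn


if __name__ == "__main__":
    # 0xBA is 1011 1010 in binary: DSCP 101110, low two bits 10.
    print(split_ds_octet(0xBA))
```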
| Because of this unstable history, the definition of the ECN field in | Because of this unstable history, the definition of the ECN field in | |||
| this document cannot be guaranteed to be backwards compatible with | this document cannot be guaranteed to be backwards compatible with | |||
| all past uses of these two bits. | all past uses of these two bits. | |||
| Prior to RFC 2474, routers were not permitted to modify bits in | Prior to RFC 2474, routers were not permitted to modify bits in | |||
| either the DSCP or ECN field of packets forwarded through them, and | either the DSCP or ECN field of packets forwarded through them, and | |||
| hence routers that comply only with RFCs prior to 2474 should have no | hence routers that comply only with RFCs prior to 2474 should have no | |||
| effect on ECN. For end nodes, bit 7 (the ECN CE bit) must be trans- | effect on ECN. For end nodes, bit 7 (the second ECN bit) must be | |||
| mitted as zero for any implementation compliant only with RFCs prior | transmitted as zero for any implementation compliant only with RFCs | |||
| to 2474. Such nodes may transmit bit 6 (the ECN ECT bit) as one for | prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as | |||
| the "Minimize Monetary Cost" provision of RFC 1349 or the experiment | one for the "Minimize Monetary Cost" provision of RFC 1349 or the | |||
| authorized by RFC 1455; neither this aspect of RFC 1349 nor the | experiment authorized by RFC 1455; neither this aspect of RFC 1349 | |||
| experiment in RFC 1455 were widely implemented or used. The damage | nor the experiment in RFC 1455 were widely implemented or used. The | |||
| that could be done by a broken, non-conformant router would be to | damage that could be done by a broken, non-conformant router would | |||
| "erase" the CE bit for an ECN- capable packet that arrived at the | include "erasing" the CE codepoint for an ECN-capable packet that | |||
| router with the CE bit set, or set the CE bit even in the absence of | arrived at the router with the CE codepoint set, or setting the CE | |||
| congestion. This has been discussed in the section on "Non-compli- | codepoint even in the absence of congestion. This has been discussed | |||
| ance in the Network". | in the section on "Non-compliance in the Network". | |||
| The damage that could be done in an ECN-capable environment by a non- | The damage that could be done in an ECN-capable environment by a non- | |||
| ECN-capable end-node transmitting packets with the ECT bit set has | ECN-capable end-node transmitting packets with the ECT codepoint set | |||
| been discussed in the section on "Non-compliance by the End Nodes". | has been discussed in the section on "Non-compliance by the End | |||
| Nodes". | ||||
| 23. IANA Considerations | 23. IANA Considerations | |||
| The bits for ECT and CE in the ECN Field of the IP header and the | The codepoints for the ECN Field of the IP header and the bits for | |||
| bits for CWR and ECE in the TCP header are specified by the Standards | CWR and ECE in the TCP header are specified by the Standards Action | |||
| Action of this RFC, as is required by RFC 2780. We would note that | of this RFC, as is required by RFC 2780. | |||
| this RFC does not define the codepoint of (ECT=0, CE=1) for the ECT | ||||
| and CE bits. | ||||
| IANA allocated the IPSEC Security Association Attribute value 10 for | IANA allocated the IPSEC Security Association Attribute value 10 for | |||
| the ECN Tunnel use described in Section 9.2.1.2 above at the request | the ECN Tunnel use described in Section 9.2.1.2 above at the request | |||
| of David Black in November 1999. If this draft is approved for pub- | of David Black in November 1999. If this draft is approved for | |||
| lication as an RFC, IANA should change the Reference for this alloca- | publication as an RFC, IANA should change the Reference for this | |||
| tion from David Black's request to this RFC based on its RFC number. | allocation from David Black's request to this RFC based on its RFC | |||
| number. | ||||
| AUTHORS' ADDRESSES | AUTHORS' ADDRESSES | |||
| K. K. Ramakrishnan | K. K. Ramakrishnan | |||
| TeraOptic Networks, Inc. | TeraOptic Networks, Inc. | |||
| Phone: +1 (408) 666-8650 | Phone: +1 (408) 666-8650 | |||
| Email: kk@teraoptic.com | Email: kk@teraoptic.com | |||
| Sally Floyd | Sally Floyd | |||
| Phone: +1 (510) 666-2989 | Phone: +1 (510) 666-2989 | |||
| skipping to change at page 55, line 31 | skipping to change at page 59, line 7 | |||
| Email: floyd@aciri.org | Email: floyd@aciri.org | |||
| URL: http://www.aciri.org/floyd/ | URL: http://www.aciri.org/floyd/ | |||
| David L. Black | David L. Black | |||
| EMC Corporation | EMC Corporation | |||
| 42 South St. | 42 South St. | |||
| Hopkinton, MA 01748 | Hopkinton, MA 01748 | |||
| Phone: +1 (508) 435-1000 x75140 | Phone: +1 (508) 435-1000 x75140 | |||
| Email: black_david@emc.com | Email: black_david@emc.com | |||
| This draft was created in January 2001. | This draft was created in February 2001. | |||
| It expires July 2001. | It expires August 2001. | |||
| End of changes. 221 change blocks. | ||||
| 1131 lines changed or deleted | 1304 lines changed or added | |||
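
The DS field layout shown in the diff above (a six-bit DSCP followed by the two-bit field that this draft uses as the ECN field) can be illustrated with a minimal sketch. The function name `split_ds_octet` and the shape of its return value are hypothetical, chosen only for illustration. The sketch assumes the RFC bit-numbering convention used in the figure, in which bit 0 is the most significant bit of the octet, so bits 6 and 7 are the two low-order bits.

```python
from typing import Tuple


def split_ds_octet(octet: int) -> Tuple[int, int, int]:
    """Split the former IPv4 TOS octet into (dscp, bit6, bit7).

    Hypothetical helper, not part of the draft. Bit 0 is taken to be
    the most significant bit, matching the figure's numbering.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet (0..255)")
    dscp = octet >> 2        # bits 0-5: Differentiated Services CodePoint
    bit6 = (octet >> 1) & 1  # bit 6: first bit of the two-bit ECN field
    bit7 = octet & 1         # bit 7: second bit of the two-bit ECN field
    return dscp, bit6, bit7


print(split_ds_octet(0xBA))  # prints (46, 1, 0)
```

Under these assumptions, an end node that complies only with RFCs prior to 2474 would always yield `bit7 == 0`, while `bit6` could be 1 if the node used the "Minimize Monetary Cost" provision of RFC 1349 or the RFC 1455 experiment.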