idnits 2.17.1 

draft-ietf-tsvwg-ecn-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 54
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 55 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There is 1 instance of too long lines in the document, the longest one
     being 5 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2001' is mentioned on line 482, but not defined

  ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581)

  == Missing Reference: 'RFC 2983' is mentioned on line 988, but not defined

  == Missing Reference: 'RFC 2474' is mentioned on line 1373, but not defined

  == Missing Reference: 'RFC 2475' is mentioned on line 1374, but not defined

  == Missing Reference: 'RFC 1455' is mentioned on line 2488, but not defined

  ** Obsolete undefined reference: RFC 1455 (Obsoleted by RFC 2474)

  == Unused Reference: 'FRED' is defined on line 1757, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1455' is defined on line 1800, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1701' is defined on line 1803, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1702' is defined on line 1806, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 2119' is defined on line 1812, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2408' is defined on line 1824, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2409' is defined on line 1828, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2475' is defined on line 1835, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2983' is defined on line 1849, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2402 (ref. 'AH') (Obsoleted by RFC
     4302, RFC 4305)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN'

  ** Obsolete normative reference: RFC 2406 (ref. 'ESP') (Obsoleted by RFC
     4303, RFC 4305)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FF99'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED'

  ** Downref: Normative reference to an Informational RFC: RFC 1701 (ref.
     'GRE')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'K98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96'

  ** Downref: Normative reference to an Informational RFC: RFC 2702 (ref.
     'MPLS')

  ** Downref: Normative reference to an Informational RFC: RFC 2637 (ref.
     'PPTP')

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Downref: Normative reference to an Informational RFC: RFC 1141

  ** Obsolete normative reference: RFC 1349 (Obsoleted by RFC 2474)

  ** Obsolete normative reference: RFC 1455 (Obsoleted by RFC 2474)

  -- Duplicate reference: RFC1701, mentioned in 'RFC1701', was also mentioned
     in 'GRE'.

  ** Downref: Normative reference to an Informational RFC: RFC 1701

  ** Downref: Normative reference to an Informational RFC: RFC 1702

  -- Duplicate reference: RFC2119, mentioned in 'RFC 2119', was also
     mentioned in 'B97'.

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  ** Obsolete normative reference: RFC 2401 (Obsoleted by RFC 4301)

  ** Obsolete normative reference: RFC 2407 (Obsoleted by RFC 4306)

  ** Obsolete normative reference: RFC 2409 (ref. 'RFC2408') (Obsoleted by
     RFC 4306)

  -- Duplicate reference: RFC2409, mentioned in 'RFC2409', was also mentioned
     in 'RFC2408'.

  ** Obsolete normative reference: RFC 2409 (Obsoleted by RFC 4306)

  ** Downref: Normative reference to an Informational RFC: RFC 2475

  ** Obsolete normative reference: RFC 2481 (Obsoleted by RFC 3168)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Downref: Normative reference to an Informational RFC: RFC 2884

  ** Downref: Normative reference to an Informational RFC: RFC 2983

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SCWA99'


     Summary: 27 errors (**), 0 flaws (~~), 18 warnings (==), 17 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                       K. K. Ramakrishnan
3	INTERNET DRAFT                                        TeraOptic Networks
4	draft-ietf-tsvwg-ecn-01.txt                                  Sally Floyd
5	                                                                   ACIRI
6	                                                                D. Black
7	                                                                     EMC
8	                                                           January, 2001
9	                                                     Expires: July, 2001

11	      The Addition of Explicit Congestion Notification (ECN) to IP

13	                          Status of this Memo

15	   This document is an Internet-Draft and is in full conformance with
16	   all provisions of Section 10 of RFC2026.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	Abstract

36	   This document specifies the incorporation of ECN (Explicit Congestion
37	   Notification) into TCP and IP, including ECN's use of two bits in the
38	   IP header.  We begin by describing TCP's use of packet drops as an
39	   indication of congestion.  Next we explain that with the addition of
40	   active queue management (e.g., RED) to the Internet infrastructure,
41	   where routers detect congestion before the queue overflows, routers
42	   are no longer limited to packet drops as an indication of congestion.
43	   Routers can instead set the Congestion Experienced (CE) bit in the IP
44	   header of packets from ECN-capable transports.  We describe when the
45	   CE bit is to be set in routers, and describe modifications needed to
46	   TCP to make it ECN-capable.  Modifications to other transport
47	   protocols (e.g., unreliable unicast or multicast, reliable multicast,
48	   other reliable unicast transport protocols) could be considered as
49	   those protocols are developed and advance through the standards
50	   process.

52	   We also describe in this document the issues involving the use of ECN
53	   within IP tunnels, and within IPsec tunnels in particular.

55	   One of the guiding principles for this document is that all the
56	   mechanisms specified here are incrementally deployable.

58	Table of Contents
59	     1.  Introduction
60	     2.  Conventions and Acronyms
61	     3.  Assumptions and General Principles
62	     4.  Active Queue Management (AQM)
63	     5.  Explicit Congestion Notification in IP
64	     5.1.  ECN as an Indication of Persistent Congestion
65	     5.2.  Dropped or Corrupted Packets
66	     6.  Support from the Transport Protocol
67	     6.1.  TCP
68	     6.1.1.  TCP Initialization
69	     6.1.1.1.  Robust TCP Initialization with an Echoed Reserve Field
70	     6.1.2.  The TCP Sender
71	     6.1.3.  The TCP Receiver
72	     6.1.4.  Congestion on the ACK-path
73	     6.1.5.  Retransmitted TCP packets
74	     6.1.6.  TCP Window Probes.
75	     7.  Non-compliance by the End Nodes
76	     8.  Non-compliance in the Network
77	     8.1.  Complications Introduced by Split Paths
78	     9.  Encapsulated Packets
79	     9.1.  IP packets encapsulated in IP
80	     9.1.1.  The Limited-functionality and Full-functionality Options
81	     9.1.2.  Changes to the ECN Field within an IP Tunnel.
82	     9.2.  IPsec Tunnels
83	     9.2.1.  Negotiation between Tunnel Endpoints
84	     9.2.1.1.  ECN Tunnel Security Association Database Field
85	     9.2.1.2.  ECN Tunnel Security Association Attribute
86	     9.2.1.3.  Changes to IPsec Tunnel Header Processing
87	     9.2.2.  Changes to the ECN Field within an IPsec Tunnel.
88	     9.2.3.  Comments for IPsec Support
89	     9.3.  IP packets encapsulated in non-IP packet headers.
90	     10.  Issues Raised by Monitoring and Policing Devices
91	     11.  Evaluations of ECN
92	     12.  Summary of changes required in IP and TCP
93	     13.  Conclusions
94	     14.  Acknowledgements
95	     15.  References
96	     16.  Security Considerations
97	     17.  IPv4 Header Checksum Recalculation
98	     18.  Possible Changes to the ECN Field in the Network
99	     18.1.  Possible Changes to the IP Header
100	     18.1.1.  Erasing the Congestion Indication
101	     18.1.2.  Falsely Reporting Congestion
102	     18.1.3.  Disabling ECN-Capability
103	     18.1.4.  Falsely Indicating ECN-Capability
104	     18.1.5.  Changes with No Functional Effect
105	     18.2.  Information carried in the Transport Header
106	     18.3.  Split Paths
107	     19.  Implications of Subverting End-to-End Congestion Control
108	     19.1.  Implications for the Network and for Competing Flows
109	     19.2.  Implications for the Subverted Flow
110	     19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion Control
111	     20.  The Motivation for the ECT bit.
112	     21.  Why use Two Bits in the IP Header?
113	     22.  Historical Definitions for the IPv4 TOS Octet
114	     23.  IANA Considerations

116	RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - To compare
117	this with draft-ietf-tsvwg-ecn-00, compare the following:
118	"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-00.troff"
119	"http://www.aciri.org/floyd/papers/draft-ietf-tsvwg-ecn-01.troff"
120	Changes from draft-ietf-tsvwg-ecn-00:
121	* Deleted Section 6.1.1.2. on "Robust TCP Initialization with no
122	response to the SYN", and modified the paragraph in the Conclusions
123	referring to this.
124	* Added Section 23 on IANA Considerations.
125	* Added two paragraphs to Section 18.2 on denial-of-service attacks.
126	* Added some text about the ECN nonce being a research issue.
127	* Moved two paragraphs about setting the CWR bit from Section 6.1.3 to
128	  Section 6.1.2.
129	* Various small changes:
130	  Adding several small clarifying sentences in Section 12, 22.
131	  Small clarification to text in Section 19.2.
132	  Deleted a few unnecessary sentences in Section 9.
133	  Updated some references to Section X.
134	  Added more references to RFC 2780.
135	  Deleted references to internet-drafts.
136	  Clarified terminology for "non-ECN-setup SYN packet", including the
137	following:  "Receivers MUST correctly handle all forms of the non-ECN-
138	setup SYN and SYN-ACK packets."

140	1.  Introduction

142	   TCP's congestion control and avoidance algorithms are based on the
143	   notion that the network is a black-box [Jacobson88, Jacobson90].  The
144	   network's state of congestion or otherwise is determined by end-sys-
145	   tems probing for the network state, by gradually increasing the load
146	   on the network (by increasing the window of packets that are out-
147	   standing in the network) until the network becomes congested and a
148	   packet is lost.  Treating the network as a "black-box" and treating
149	   loss as an indication of congestion in the network is appropriate for
150	   pure best-effort data carried by TCP, with little or no sensitivity
151	   to delay or loss of individual packets.  In addition, TCP's conges-
152	   tion management algorithms have techniques built-in (such as Fast
153	   Retransmit and Fast Recovery) to minimize the impact of losses, from
154	   a throughput perspective.  However, these mechanisms are not intended
155	   to help applications that are in fact sensitive to the delay or loss
156	   of one or more individual packets.  Interactive traffic such as tel-
157	   net, web-browsing, and transfer of audio and video data can be sensi-
158	   tive to packet losses (especially when using an unreliable data
159	   delivery transport such as UDP) or to the increased latency of the
160	   packet caused by the need to retransmit the packet after a loss (with
161	   the reliable data delivery semantics provided by TCP).

163	   Since TCP determines the appropriate congestion window to use by
164	   gradually increasing the window size until it experiences a dropped
165	   packet, this causes the queues at the bottleneck router to build up.
166	   With most packet drop policies at the router that are not sensitive
167	   to the load placed by each individual flow (e.g., tail-drop on queue
168	   overflow), this means that some of the packets of latency-sensitive
169	   flows may be dropped. In addition, such drop policies lead to syn-
170	   chronization of loss across multiple flows.

172	   Active queue management mechanisms detect congestion before the queue
173	   overflows, and provide an indication of this congestion to the end
174	   nodes.  Thus, active queue management can reduce unnecessary queueing
175	   delay for all traffic sharing that queue.  The advantages of active
176	   queue management are discussed in RFC 2309 [RFC2309].  Active queue
177	   management avoids some of the bad properties of dropping on queue
178	   overflow, including the undesirable synchronization of loss across
179	   multiple flows.  More importantly, active queue management means that
180	   transport protocols with mechanisms for congestion control (e.g.,
181	   TCP) do not have to rely on buffer overflow as the only indication of
182	   congestion.

184	   Active queue management mechanisms may use one of several methods for
185	   indicating congestion to end-nodes. One is to use packet drops, as is
186	   currently done. However, active queue management allows the router to
187	   separate policies of queueing or dropping packets from the policies
188	   for indicating congestion. Thus, active queue management allows
189	   routers to use the Congestion Experienced (CE) bit in a packet header
190	   as an indication of congestion, instead of relying solely on packet
191	   drops. This has the potential of reducing the impact of loss on
192	   latency-sensitive flows.

194	   This document is intended to obsolete RFC 2481, "A Proposal to add
195	   Explicit Congestion Notification (ECN) to IP", which defined ECN as
196	   an Experimental Protocol for the Internet Community.

198	   RFC EDITOR - REMOVE THE FOLLOWING PARAGRAPH ON PUBLICATION - This
199	   document obsoletes three subsequent internet-drafts on ECN, "IPsec
200	   Interactions with ECN", "ECN Interactions with IP Tunnels", and "TCP
201	   with ECN: The Treatment of Retransmitted Data Packets".  This
202	   document is intended largely to merge the earlier documents all into
203	   a single document, for greater clarity, in preparation to becoming a
204	   Proposed Standard.

206	2.  Conventions and Acronyms

208	   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
209	   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
210	   document, are to be interpreted as described in [B97].

212	3.  Assumptions and General Principles

214	   In this section, we describe some of the important design principles
215	   and assumptions that guided the design choices in this proposal.

217	   * Because ECN is likely to be adopted gradually, accommodating migra-
218	   tion is essential. Some routers may still only drop packets to indi-
219	   cate congestion, and some end-systems may not be ECN-capable. The
220	   most viable strategy is one that accommodates incremental deployment
221	   without having to resort to "islands" of ECN-capable and non-ECN-
222	   capable environments.
223	   * New mechanisms for congestion control and avoidance need to co-
224	   exist and cooperate with existing mechanisms for congestion control.
225	   In particular, new mechanisms have to co-exist with TCP's current
226	   methods of adapting to congestion and with routers' current practice
227	   of dropping packets in periods of congestion.
228	   * Congestion may persist over different time-scales. The time scales
229	   that we are concerned with are congestion events that may last longer
230	   than a round-trip time.
231	   * The number of packets in an individual flow (e.g., TCP connection
232	   or an exchange using UDP) may range from a small number of packets to
233	   quite a large number. We are interested in managing the congestion
234	   caused by flows that send enough packets so that they are still
235	   active when network feedback reaches them.
236	   * Asymmetric routing is likely to be a normal occurrence in the
237	   Internet. The path (sequence of links and routers) followed by data
238	   packets may be different from the path followed by the acknowledgment
239	   packets in the reverse direction.
240	   * Many routers process the "regular" headers in IP packets more effi-
241	   ciently than they process the header information in IP options.  This
242	   suggests keeping congestion experienced information in the regular
243	   headers of an IP packet.
244	   * It must be recognized that not all end-systems will cooperate in
245	   mechanisms for congestion control. However, new mechanisms shouldn't
246	   make it easier for TCP applications to disable TCP congestion con-
247	   trol.  The benefit of lying about participating in new mechanisms
248	   such as ECN-capability should be small.

250	4.  Active Queue Management (AQM)

252	   Random Early Detection (RED) is one mechanism for Active Queue Man-
253	   agement (AQM) that has been proposed to detect incipient congestion
254	   [FJ93], and is currently being deployed in the Internet [RFC2309].
255	   AQM is meant to be a general mechanism using one of several alterna-
256	   tives for congestion indication, but in the absence of ECN, AQM is
257	   restricted to using packet drops as a mechanism for congestion indi-
258	   cation.  AQM drops packets based on the average queue length exceed-
259	   ing a threshold, rather than only when the queue overflows.  However,
260	   because AQM may drop packets before the queue actually overflows, AQM
261	   is not always forced by memory limitations to discard the packet.

263	   AQM can set a Congestion Experienced (CE) bit in the packet header
264	   instead of dropping the packet, when such a bit is provided in the IP
265	   header and understood by the transport protocol.  The use of the CE
266	   bit with ECN allows the receiver(s) to receive the packet, avoiding
267	   the potential for excessive delays due to retransmissions after
268	   packet losses.  We use the term 'CE packet' to denote a packet that
269	   has the CE bit set.

271	5.  Explicit Congestion Notification in IP

273	   This document specifies that the Internet provide a congestion indi-
274	   cation for incipient congestion (as in RED and earlier work [RJ90])
275	   where the notification can sometimes be through marking packets
276	   rather than dropping them.  This uses an ECN field in the IP header
277	   with two bits.  The ECN-Capable Transport (ECT) bit is set by the
278	   data sender to indicate that the end-points of the transport protocol
279	   are ECN-capable.  The CE bit is set by the router to indicate conges-
280	   tion to the end nodes.  Routers that have a packet arriving at a full
281	   queue drop the packet, just as they do in the absence of ECN.

283	   Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
284	   Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE
285	   bit.  The IPv4 TOS octet corresponds to the Traffic Class octet in
286	   IPv6.  The definitions for the IPv4 TOS octet [RFC791] and the IPv6
287	   Traffic Class octet have been superseded by the six-bit DS (Differen-
288	   tiated Services) Field [RFC2474, RFC2780].  Bits 6 and 7 are listed
289	   in [RFC2474] as Currently Unused, and are specified in RFC 2780 as
290	   approved for experimental use for ECN.  Section 22 gives a brief his-
291	   tory of the TOS octet.

293	            0     1     2     3     4     5     6     7
294	         +-----+-----+-----+-----+-----+-----+-----+-----+
295	         |             DS FIELD              | ECN FIELD |
296	         |                                   |           |
297	         |               DSCP                | ECT | CE  |
298	         +-----+-----+-----+-----+-----+-----+-----+-----+

300	           DSCP: differentiated services codepoint
301	           ECN:  Explicit Congestion Notification

303	          Figure 1: The Differentiated Services and ECN Fields in IP.
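
   As an illustration of Figure 1, the following Python sketch splits
   the octet into its DSCP, ECT, and CE parts.  The mask values assume
   the figure's numbering, in which bit 0 is the most significant bit
   of the octet.  The names ECT_MASK, CE_MASK, split_octet, and set_ce
   are hypothetical and are not defined by this document.

```python
# Illustrative masks, assuming bit 0 in Figure 1 is the most significant
# bit of the octet: bit 6 (ECT) is then 0x02 and bit 7 (CE) is 0x01.
ECT_MASK = 0x02
CE_MASK = 0x01


def split_octet(octet):
    """Return (dscp, ect, ce) for an IPv4 TOS or IPv6 Traffic Class octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> 2                 # upper six bits: the DS field
    ect = bool(octet & ECT_MASK)      # ECN-Capable Transport bit
    ce = bool(octet & CE_MASK)        # Congestion Experienced bit
    return dscp, ect, ce


def set_ce(octet):
    """Return the octet with the CE bit set and all other bits unchanged."""
    return octet | CE_MASK
```

   For instance, split_octet(0xBA) yields a DSCP of 46 with ECT set and
   CE clear, and set_ce(0xBA) yields 0xBB.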

305	   Because of the unstable history of the TOS octet, the use of the ECN
306	   field as specified in this document cannot be guaranteed to be back-
307	   wards compatible with all past uses of these two bits.  The potential
308	   dangers of this lack of backwards compatibility are discussed in Sec-
309	   tion 22.

311	   Upon the receipt by an ECN-Capable transport of a single CE packet,
312	   the congestion control algorithms followed at the end-systems MUST be
313	   essentially the same as the congestion control response to a *single*
314	   dropped packet.  For example, for ECN-Capable TCP the source TCP is
315	   required to halve its congestion window for any window of data con-
316	   taining either a packet drop or an ECN indication.

318	   One reason for requiring that the congestion-control response to the
319	   CE packet be essentially the same as the response to a dropped packet
320	   is to accommodate the incremental deployment of ECN in both end-sys-
321	   tems and in routers.  Some routers may drop ECN-Capable packets
322	   (e.g., using the same AQM policies for congestion detection) while
323	   other routers set the CE bit, for equivalent levels of congestion.
324	   Similarly, a router might drop a non-ECN-Capable packet but set the
325	   CE bit in an ECN-Capable packet, for equivalent levels of congestion.
326	   If there were different congestion control responses to a CE bit
327	   indication than to a packet drop, this could result in unfair treat-
328	   ment for different flows.

330	   An additional goal is that the end-systems should react to congestion
331	   at most once per window of data (i.e., at most once per round-trip
332	   time), to avoid reacting multiple times to multiple indications of
333	   congestion within a round-trip time.
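
   The "at most once per window of data" goal can be sketched as
   follows.  This is a toy model under stated assumptions, not the TCP
   sender procedure specified later in this document: the window is
   counted in packets, the floor of one packet is an assumption, and
   the class and attribute names are hypothetical.

```python
class WindowedSender:
    """Toy sender that reacts to congestion at most once per window."""

    def __init__(self, cwnd):
        self.cwnd = cwnd      # congestion window, in packets
        self.next_seq = 0     # number of the next packet to be sent
        self.barrier = 0      # packets numbered below this were already
                              # sent when the window was last reduced

    def send(self, count):
        """Record that `count` more packets have been sent."""
        self.next_seq += count

    def on_congestion(self, seq):
        """Handle a drop or a CE indication for packet number `seq`.

        Returns True if the window was reduced, False if the indication
        belongs to a window that has already triggered a reduction.
        """
        if seq < self.barrier:
            return False
        # Same response to a drop and to a CE packet: halve the window.
        self.cwnd = max(1, self.cwnd // 2)   # floor of 1 is an assumption
        self.barrier = self.next_seq
        return True
```

   With a window of 16 and 20 packets sent, an indication for packet 5
   halves the window to 8; a second indication for packet 12 in the
   same window of data is ignored.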

335	   For a router, the CE bit of an ECN-Capable packet should only be set
336	   if the router would otherwise have dropped the packet as an indica-
337	   tion of congestion to the end nodes. When the router's buffer is not
338	   yet full and the router is prepared to drop a packet to inform end
339	   nodes of incipient congestion, the router should first check to see
340	   if the ECT bit is set in that packet's IP header.  If so, then
341	   instead of dropping the packet, the router MAY instead set the CE bit
342	   in the IP header.
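
   The router behavior described above can be summarized in a short
   sketch.  It assumes that some active queue management algorithm,
   not specified here, has already decided whether to signal
   congestion on the arriving packet.  The function name, parameter
   names, and returned strings are hypothetical.

```python
def router_action(queue_full, aqm_signals_congestion, ect,
                  prefer_marking=True):
    """Return "drop", "mark", or "forward" for an arriving packet.

    queue_full:             no buffer space is left for the packet
    aqm_signals_congestion: the AQM algorithm chose this packet to
                            carry an indication of incipient congestion
    ect:                    the ECT bit is set in the packet's IP header
    prefer_marking:         models the MAY; a router is permitted, but
                            not required, to mark instead of dropping
    """
    if queue_full:
        # A full queue forces a drop, exactly as in the absence of ECN.
        return "drop"
    if aqm_signals_congestion:
        if ect and prefer_marking:
            return "mark"     # set the CE bit instead of dropping
        return "drop"         # non-ECN-Capable packets are dropped
    return "forward"          # any CE bit already set is left unchanged
```

   Note that the CE bit is set only in cases where the router would
   otherwise have dropped the packet.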

344	   An environment where all end nodes were ECN-Capable could allow new
345	   criteria to be developed for setting the CE bit, and new congestion
346	   control mechanisms for end-node reaction to CE packets.  However,
347	   this is a research issue, and as such is not addressed in this docu-
348	   ment.

350	   When a CE packet (i.e., a packet that has the CE bit set) is received
351	   by a router, the CE bit is left unchanged, and the packet is trans-
352	   mitted as usual. When severe congestion has occurred and the router's
353	   queue is full, then the router has no choice but to drop some packet
354	   when a new packet arrives.  We anticipate that such packet losses
355	   will become relatively infrequent when a majority of end-systems
356	   become ECN-Capable and participate in TCP or other compatible conges-
357	   tion control mechanisms. In an adequately-provisioned network in such
358	   an ECN-Capable environment, packet losses should occur primarily
359	   during transients or in the presence of non-cooperating sources.

361	   We expect that routers will set the CE bit in response to incipient
362	   congestion as indicated by the average queue size, using the RED
363	   algorithms suggested in [FJ93, RFC2309].  To the best of our knowl-
364	   edge, this is the only proposal currently under discussion in the
365	   IETF for routers to drop packets proactively, before the buffer over-
366	   flows.  However, this document does not attempt to specify a particu-
367	   lar mechanism for active queue management, leaving that endeavor, if
368	   needed, to other areas of the IETF.  While ECN is inextricably tied
369	   up with the need to have a reasonable active queue management mecha-
370	   nism at the router, the reverse does not hold; active queue manage-
371	   ment mechanisms have been developed and deployed independent of ECN,
372	   using packet drops as indications of congestion in the absence of ECN
373	   in the IP architecture.

375	5.1.  ECN as an Indication of Persistent Congestion

377	   We emphasize that a *single* IP packet with the CE bit set causes
378	   the transport layer to respond, in terms of congestion control, as
379	   it would to a packet drop.  The instantaneous queue size
380	   is likely to see considerable variations even when the router does
381	   not experience persistent congestion.  As such, it is important that
382	   transient congestion at a router, reflected by the instantaneous
383	   queue size reaching a threshold much smaller than the capacity of the
384	   queue, not trigger a reaction at the transport layer.  Therefore, the
385	   CE bit should not be set by a router based on the instantaneous queue
386	   size.

388	   For example, since the ATM and Frame Relay mechanisms for congestion
389	   indication have typically been defined without an associated notion
390	   of average queue size as the basis for determining that an intermedi-
391	   ate node is congested, we believe that they provide a very noisy sig-
392	   nal. The TCP-sender reaction specified in this document for ECN is
393	   NOT the appropriate reaction for such a noisy signal of congestion
394	   notification.  However, if the routers that interface to the ATM net-
395	   work have a way of maintaining the average queue at the interface,
396	   and use it to come to a reliable determination that the ATM subnet is
397	   congested, they may use the ECN notification that is defined here.

399	   We continue to encourage experiments in techniques at layer 2 (e.g.,
400	   in ATM switches or Frame Relay switches) to take advantage of ECN.
401	   For example, using a scheme such as RED (where packet marking is
402	   based on the average queue length exceeding a threshold), layer 2
403	   devices could provide a reasonably reliable indication of congestion.
404	   When all the layer 2 devices in a path set that layer's own Conges-
405	   tion Experienced bit (e.g., the EFCI bit for ATM, the FECN bit in
406	   Frame Relay) in this reliable manner, then the interface router to
407	   the layer 2 network could copy the state of that layer 2 Congestion
408	   Experienced bit into the CE bit in the IP header.  We recognize that
409	   this is not the current practice, nor is it in current standards.
410	   However, encouraging experimentation in this manner may provide the
411	   information needed to enable evolution of existing layer 2 mechanisms
412	   to provide a more reliable means of congestion indication, when they
413	   use a single bit for indicating congestion.
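
   The distinction drawn in this section between instantaneous and
   average queue size can be illustrated with an exponentially
   weighted moving average, the style of averaging used by RED
   [FJ93].  This is a sketch only: the weight of 0.002, the threshold
   of 5 packets, and the function names are illustrative assumptions,
   and this document does not specify an averaging method.

```python
def update_average(avg, instantaneous, weight=0.002):
    """Fold one instantaneous queue-size sample into the running average."""
    return (1.0 - weight) * avg + weight * instantaneous


def average_after(samples, avg=0.0, weight=0.002):
    """Return the average queue size after a sequence of samples."""
    for queue_size in samples:
        avg = update_average(avg, queue_size, weight)
    return avg


THRESHOLD = 5.0   # hypothetical marking threshold, in packets

# A short burst: the instantaneous queue is at 50 packets for 10 arrivals.
burst_average = average_after([50] * 10)

# Persistent congestion: the queue stays at 50 packets for 5000 arrivals.
persistent_average = average_after([50] * 5000)

burst_would_mark = burst_average > THRESHOLD            # False
persistent_would_mark = persistent_average > THRESHOLD  # True
```

   A router comparing the instantaneous queue size of 50 against the
   threshold would signal congestion in both cases; the averaged value
   crosses the threshold only when the congestion persists.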

415	5.2.  Dropped or Corrupted Packets

417	   For the proposed use for ECN in this document (that is, for a trans-
418	   port protocol such as TCP for which a dropped data packet is an indi-
419	   cation of congestion), end nodes detect dropped data packets, and the
420	   congestion response of the end nodes to a dropped data packet is at
421	   least as strong as the congestion response to a received CE packet.
422	   To ensure the reliable delivery of the congestion indication of the
423	   CE bit, the ECT bit MUST NOT be set in a packet unless the loss of
424	   that packet in the network would be detected by the end nodes and
425	   interpreted as an indication of congestion.

427	   Transport protocols such as TCP do not necessarily detect all packet
428	   drops, such as the drop of a "pure" ACK packet; for example, TCP does
429	   not reduce the arrival rate of subsequent ACK packets in response to
430	   an earlier dropped ACK packet.  Any proposal for extending ECN-Capa-
431	   bility to such packets would have to address issues such as the case
432	   of an ACK packet that was marked with the CE bit but was later
433	   dropped in the network. We believe that this aspect is still the sub-
434	   ject of research, so this document specifies that at this time,
435	   "pure" ACK packets MUST NOT indicate ECN-Capability.

437	   Similarly, if a CE packet is dropped later in the network due to cor-
438	   ruption (bit errors), the end nodes should still invoke congestion
439	   control, just as TCP would today in response to a dropped data
440	   packet. This issue of corrupted CE packets would have to be consid-
441	   ered in any proposal for the network to distinguish between packets
442	   dropped due to corruption, and packets dropped due to congestion or
443	   buffer overflow.  In particular, the ubiquitous deployment of ECN
444	   would not, in and of itself, be a sufficient development to allow
445	   end-nodes to interpret packet drops as indications of corruption
446	   rather than congestion.

448	6.  Support from the Transport Protocol

450	   ECN requires support from the transport protocol, in addition to the
451	   functionality given by the ECN field in the IP packet header. The
452	   transport protocol might require negotiation between the endpoints
453	   during setup to determine that all of the endpoints are ECN-capable,
454	   so that the sender can set the ECT bit in transmitted packets.  Also,
455	   the transport protocol must be capable of reacting appropriately
456	   to the receipt of CE packets.  This reaction could be in the form of
457	   the data receiver informing the data sender of the received CE packet
458	   (e.g., TCP), of the data receiver unsubscribing to a layered multi-
459	   cast group (e.g., RLM [MJV96]), or of some other action that ulti-
460	   mately reduces the arrival rate of that flow on that congested link.

462	   This document only addresses the addition of ECN Capability to TCP,
463	   leaving issues of ECN in other transport protocols to further
464	   research.  For TCP, ECN requires three new pieces of functionality:
465	   negotiation between the endpoints during connection setup to deter-
466	   mine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP
467	   header so that the data receiver can inform the data sender when a CE
468	   packet has been received; and a Congestion Window Reduced (CWR) flag
469	   in the TCP header so that the data sender can inform the data
470	   receiver that the congestion window has been reduced. The support
471	   required from other transport protocols is likely to be different,
472	   particularly for unreliable or reliable multicast transport proto-
473	   cols, and will have to be determined as other transport protocols are
474	   brought to the IETF for standardization.

476	6.1.  TCP

478	   The following sections describe in detail the proposed use of ECN in
479	   TCP.  This proposal is described in essentially the same form in
480	   [Floyd94]. We assume that the source TCP uses the standard congestion
481	   control algorithms of Slow-start, Fast Retransmit and Fast Recovery
482	   [RFC2001].

484	   This proposal specifies two new flags in the Reserved field of the
485	   TCP header.  The TCP mechanism for negotiating ECN-Capability uses
486	   the ECN-Echo (ECE) flag in the TCP header.  Bit 9 in the Reserved
487	   field of the TCP header is designated as the ECN-Echo flag.  The
488	   location of the 6-bit Reserved field in the TCP header is shown in
489	   Figure 3 of RFC 793 [RFC793] (and is reproduced below for complete-
490	   ness).  This specification of the ECN Field leaves the Reserved field
491	   as a 4-bit field using bits 4-7.

493	   To enable the TCP receiver to determine when to stop setting the ECN-
494	   Echo flag, we introduce a second new flag in the TCP header, the CWR
495	   flag.  The CWR flag is assigned to Bit 8 in the Reserved field of the
496	   TCP header.

498	         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
499	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
500	       |               |                       | U | A | P | R | S | F |
501	       | Header Length |        Reserved       | R | C | S | S | Y | I |
502	       |               |                       | G | K | H | T | N | N |
503	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

505	        Figure 2: The old definition of bytes 13 and 14 of the TCP
506	   header.

508	         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
509	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
510	       |               |               | C | E | U | A | P | R | S | F |
511	       | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
512	       |               |               | R | E | G | K | H | T | N | N |
513	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

515	        Figure 3: The new definition of bytes 13 and 14 of the TCP
516	   header.
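
   As an illustration only, the layout of Figure 3 can be decoded as in
   the following Python sketch.  The function name and the dictionary
   keys are hypothetical and are not part of this proposal; the masks
   follow from the figure, where bit 0 is the most significant bit of
   the 16-bit word formed by bytes 13 and 14.

```python
import struct

# Bit 0 in Figure 3 is the most significant bit of the 16-bit word, so
# bit n corresponds to the mask 1 << (15 - n).
FLAG_BITS = {"CWR": 8, "ECE": 9, "URG": 10, "ACK": 11,
             "PSH": 12, "RST": 13, "SYN": 14, "FIN": 15}


def parse_bytes_13_14(raw: bytes) -> dict:
    """Decode bytes 13 and 14 of a TCP header (hypothetical helper)."""
    (word,) = struct.unpack("!H", raw)       # two bytes, network order
    fields = {
        "header_length": word >> 12,         # bits 0-3, in 32-bit words
        "reserved": (word >> 8) & 0xF,       # bits 4-7, still reserved
    }
    for name, bit in FLAG_BITS.items():
        fields[name] = bool(word & (1 << (15 - bit)))
    return fields
```

   For instance, the two bytes 0x50 0xC2 decode as a header length of
   5 words with the CWR, ECE and SYN flags set.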

518	   Thus, ECN uses the ECT and CE flags in the IP header (as shown in
519	   Figure 1) for signaling between routers and connection endpoints, and
520	   uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
521	   3) for TCP-endpoint to TCP-endpoint signaling.  For a TCP connection,
522	   a typical sequence of events in an ECN-based reaction to congestion
523	   is as follows:
524	      * The ECT bit is set in packets transmitted by the sender to indi-
525	      cate that ECN is supported by the transport entities for these
526	      packets.
527	      * An ECN-capable router detects impending congestion and detects
528	      that the ECT bit is set in the packet it is about to drop.
529	      Instead of dropping the packet, the router chooses to set the CE
530	      bit in the IP header and forwards the packet.
531	      * The receiver receives the packet with the CE bit set, and sets
532	      the ECN-Echo flag in its next TCP ACK sent to the sender.
534	      * The sender receives the TCP ACK with ECN-Echo set, and reacts to
535	      the congestion as if a packet had been dropped.
536	      * The sender sets the CWR flag in the TCP header of the next
537	      packet sent to the receiver to acknowledge its receipt of and
538	      reaction to the ECN-Echo flag.
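
   The five steps above can be traced with a small Python sketch.  It
   is a toy model under stated assumptions, not an implementation: the
   Packet class, the function names, and the congestion window counted
   in packets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool = False   # ECT bit, IP header
    ce: bool = False    # CE bit, IP header
    ece: bool = False   # ECN-Echo flag, TCP header
    cwr: bool = False   # CWR flag, TCP header


def router_forward(pkt, congested):
    """Mark instead of dropping when ECT is set; None means dropped."""
    if not congested:
        return pkt
    if pkt.ect:
        pkt.ce = True
        return pkt
    return None


def one_round(cwnd):
    """Run the five steps once; return (new cwnd, next data packet)."""
    data = Packet(ect=True)                        # sender sets ECT
    data = router_forward(data, congested=True)    # router sets CE
    ack = Packet(ece=data.ce)                      # receiver echoes CE
    if ack.ece:                                    # sender reacts as to a drop
        cwnd = max(cwnd // 2, 1)
    return cwnd, Packet(ect=True, cwr=ack.ece)     # sender sets CWR
```

   Starting from a window of 8 packets, one_round(8) yields a window of
   4 packets and a next data packet that carries the CWR flag.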

540	   The negotiation for using ECN by the TCP transport entities and the
541	   use of the ECN-Echo and CWR flags are described in more detail in the
542	   sections below.

544	6.1.1.  TCP Initialization

546	   In the TCP connection setup phase, the source and destination TCPs
547	   exchange information about their willingness to use ECN.  Subsequent
548	   to the completion of this negotiation, the TCP sender sets the ECT
549	   bit in the IP header of data packets to indicate to the network that
550	   the transport is capable and willing to participate in ECN for this
551	   packet. This indicates to the routers that they may mark this packet
552	   with the CE bit, if they would like to use that as a method of con-
553	   gestion notification. If the TCP connection does not wish to use ECN
554	   notification for a particular packet, the sending TCP sets the ECT
555	   bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE
556	   bit in the received packet.

558	   For this discussion, we designate the initiating host as Host A and
559	   the responding host as Host B.  We call a SYN packet with the ECE and
560	   CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
561	   with at least one of the ECE and CWR flags not set a "non-ECN-setup
562	   SYN packet".  Similarly, we call a SYN-ACK packet with only the ECE
563	   flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and
564	   we call a SYN-ACK packet with any other configuration of the ECE and
565	   CWR flags a "non-ECN-setup SYN-ACK packet".
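
   These four definitions reduce to two predicates on the ECE and CWR
   flags.  The Python sketch below is illustrative only, and the
   function names are hypothetical.

```python
def is_ecn_setup_syn(ece: bool, cwr: bool) -> bool:
    """A SYN packet with both the ECE and CWR flags set."""
    return ece and cwr


def is_ecn_setup_syn_ack(ece: bool, cwr: bool) -> bool:
    """A SYN-ACK packet with the ECE flag set and the CWR flag not set."""
    return ece and not cwr
```

   Every other combination of the two flags is a non-ECN-setup SYN or
   a non-ECN-setup SYN-ACK packet, respectively.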

567	   Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
568	   packet, and Host B sends an ECN-setup SYN-ACK packet.  For a SYN
569	   packet, the setting of both ECE and CWR in the ECN-setup SYN packet
570	   is defined as an indication that the sending TCP is ECN-Capable,
571	   rather than as an indication of congestion or of response to conges-
572	   tion. More precisely, an ECN-setup SYN packet indicates that the TCP
573	   implementation transmitting the SYN packet will participate in ECN as
574	   both a sender and receiver.  Specifically, as a receiver, it will
575	   respond to incoming data packets that have the CE bit set in the IP
576	   header by setting ECE in outgoing TCP Acknowledgement (ACK) packets.
577	   As a sender, it will respond to incoming packets that have ECE set by
578	   reducing the congestion window and setting CWR when appropriate.  An
579	   ECN-setup SYN packet does not commit the TCP sender to setting the
580	   ECT bit in any or all of the packets it may transmit.  However, the
581	   commitment to respond appropriately to incoming packets with the CE
582	   bit set remains even if the TCP sender in a later transmission,
583	   within this TCP connection, sends a SYN packet without ECE and CWR
584	   set.

586	   When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
587	   but not the CWR flag.  An ECN-setup SYN-ACK packet is defined as an
588	   indication that the TCP transmitting the SYN-ACK packet is ECN-Capa-
589	   ble.  As with the SYN packet, an ECN-setup SYN-ACK packet does not
590	   commit the TCP host to setting the ECT bit in transmitted packets.

592	   The following rules apply to the sending of ECN-setup packets:

594	   * If a host has received an ECN-setup SYN packet, then it MAY send an
595	   ECN-setup SYN-ACK packet.  Otherwise, it MUST NOT send an ECN-setup
596	   SYN-ACK packet.
597	   * A host MUST NOT set ECT on data packets unless it has sent at least
598	   one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at
599	   least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no
600	   non-ECN-setup SYN or non-ECN-setup SYN-ACK packet.  If a host has
601	   received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK
602	   packet, then it SHOULD NOT set ECT on data packets.
603	   * If a host ever sets the ECT bit on a data packet, then that host
604	   MUST correctly set/clear the CWR TCP bit on all subsequent packets in
605	   the connection.
606	   * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK
607	   packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN-
608	   ACK packet, then if that host receives TCP data packets with ECT and
609	   CE bits set in the IP header, then that host MUST process these pack-
610	   ets as specified for an ECN-capable connection.
611	   * A host unwilling to use ECN on a TCP connection SHOULD clear both
612	   the ECE and CWR flags in all non-ECN-setup SYN and/or SYN-ACK packets
613	   that it sends to indicate this unwillingness.  Receivers MUST cor-
614	   rectly handle all forms of the non-ECN-setup SYN and SYN-ACK packets.
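
   The rules on setting ECT and on processing CE can be summarized as
   per-connection state, as in the Python sketch below.  The class and
   method names are hypothetical.  As a simplifying assumption, the
   sketch treats the SHOULD NOT in the second rule as a prohibition.

```python
from dataclasses import dataclass


@dataclass
class EcnHandshakeState:
    sent_setup: bool = False          # sent an ECN-setup SYN or SYN-ACK
    sent_non_setup: bool = False      # sent a non-ECN-setup SYN or SYN-ACK
    received_setup: bool = False      # received an ECN-setup SYN or SYN-ACK
    received_non_setup: bool = False  # received a non-ECN-setup one

    def record_sent(self, is_ecn_setup: bool) -> None:
        if is_ecn_setup:
            self.sent_setup = True
        else:
            self.sent_non_setup = True

    def record_received(self, is_ecn_setup: bool) -> None:
        if is_ecn_setup:
            self.received_setup = True
        else:
            self.received_non_setup = True

    def may_set_ect(self) -> bool:
        """ECT on data packets: the MUST NOT and SHOULD NOT conditions."""
        allowed = (self.sent_setup and self.received_setup
                   and not self.sent_non_setup)
        return allowed and not self.received_non_setup

    def must_process_ce(self) -> bool:
        """Whether arriving ECT+CE data packets must be honored."""
        return self.sent_setup and not self.received_non_setup
```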

616	6.1.1.1.  Robust TCP Initialization with an Echoed Reserved Field

618	   There is the question of why we chose to have the TCP sending the SYN
619	   set two ECN-related flags in the Reserved field of the TCP header for
620	   the SYN packet, while the responding TCP sending the SYN-ACK sets
621	   only one ECN-related flag in the SYN-ACK packet.  This asymmetry is
622	   necessary for the robust negotiation of ECN-capability with some
623	   deployed TCP implementations.  There exists at least one faulty TCP
624	   implementation in which TCP receivers set the Reserved field of the
625	   TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
626	   the Reserved field of the TCP header in the received data packet.
627	   Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi-
628	   cate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo
629	   flag, the sending TCP correctly interprets a receiver's reflection of
630	   its own flags in the Reserved field as an indication that the
631	   receiver is not ECN-capable.  The sending TCP is not misled by a
632	   faulty TCP implementation sending a SYN-ACK packet that simply
633	   reflects the Reserved field of the incoming SYN packet.
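
   The effect of the asymmetry can be checked with a short Python
   sketch.  The function names are hypothetical, and the model of the
   faulty responder is an assumption based on the description above.
   The masks are those of bits 8 and 9 in Figure 3.

```python
ECE = 0x40  # bit 9 of Figure 3, within byte 14
CWR = 0x80  # bit 8 of Figure 3, within byte 14


def faulty_syn_ack_flags(syn_flags: int) -> int:
    """Faulty responder: reflect the ECN-related bits of the SYN."""
    return syn_flags & (ECE | CWR)


def initiator_accepts_ecn(syn_ack_flags: int) -> bool:
    """Only ECE set with CWR not set is an ECN-setup SYN-ACK."""
    return bool(syn_ack_flags & ECE) and not (syn_ack_flags & CWR)
```

   An ECN-setup SYN carries ECE | CWR.  The reflected SYN-ACK carries
   the same two flags, so initiator_accepts_ecn() returns False for it,
   whereas a SYN-ACK carrying only ECE is accepted.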

635	6.1.2.  The TCP Sender

637	   For a TCP connection using ECN, new data packets are transmitted with
638	   the ECT bit set in the IP header (set to a "1").  If the sender
639	   receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with
640	   the ECN-Echo flag set in the TCP header), then the sender knows that
641	   congestion was encountered in the network on the path from the sender
642	   to the receiver.  The indication of congestion should be treated just
643	   as a congestion loss in non-ECN-Capable TCP. That is, the TCP source
644	   halves the congestion window "cwnd" and reduces the slow start
645	   threshold "ssthresh".  The sending TCP SHOULD NOT increase the con-
646	   gestion window in response to the receipt of an ECN-Echo ACK packet.

648	   TCP should not react to congestion indications more than once every
649	   window of data (or more loosely, more than once every round-trip
650	   time). That is, the TCP sender's congestion window should be reduced
651	   only once in response to a series of dropped and/or CE packets from a
652	   single window of data.  In addition, the TCP source should not
653	   decrease the slow-start threshold, ssthresh, if it has been decreased
654	   within the last round trip time.  However, if any retransmitted pack-
655	   ets are dropped, then this is interpreted by the source TCP as a new
656	   instance of congestion.

658	   After the source TCP reduces its congestion window in response to a
659	   CE packet, incoming acknowledgements that continue to arrive can
660	   "clock out" outgoing packets as allowed by the reduced congestion
661	   window.  If the congestion window consists of only one MSS (maximum
662	   segment size), and the sending TCP receives an ECN-Echo ACK packet,
663	   then the sending TCP should in principle still reduce its congestion
664	   window by half. However, the value of the congestion window is
665	   bounded below by a value of one MSS.  If the sending TCP continues
666	   to send, using a congestion window of one MSS, this results in
667	   the transmission of one packet per round-trip time.  It is necessary
668	   to still reduce the sending rate of the TCP sender even further, on
669	   receipt of an ECN-Echo packet when the congestion window is one.  We
670	   use the retransmit timer as a means of reducing the rate further in
671	   this circumstance.  Therefore, the sending TCP MUST reset the
672	   retransmit timer on receiving the ECN-Echo packet when the congestion
673	   window is one.  The sending TCP will then be able to send a new
674	   packet only when the retransmit timer expires.
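
   The sender behavior described so far can be sketched in Python as
   follows.  This is a minimal model under stated assumptions, not a
   definitive implementation: the "recover" marker used to react only
   once per window, the choice of setting ssthresh to the new cwnd, the
   absence of sequence number wraparound, and the counter standing in
   for the retransmit timer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    mss: int               # maximum segment size, bytes
    cwnd: int              # congestion window, bytes
    ssthresh: int          # slow start threshold, bytes
    snd_nxt: int = 0       # next sequence number to send
    recover: int = -1      # assumption: snd_nxt at the last reduction
    timer_resets: int = 0  # stand-in for resetting the retransmit timer

    def on_ack(self, ack: int, ece: bool) -> str:
        if not ece:
            return "normal"        # usual window increase applies
        if ack <= self.recover:
            return "ignored"       # same window: no second reduction
        self.recover = self.snd_nxt
        if self.cwnd <= self.mss:
            self.timer_resets += 1 # wait for the retransmit timer
            return "timer-reset"
        self.cwnd = max(self.cwnd // 2, self.mss)
        self.ssthresh = self.cwnd  # assumption, see lead-in
        return "halved"
```

   Note that the window is never increased on an ACK carrying ECN-Echo:
   the "ignored" case returns without touching cwnd.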

676	   When an ECN-Capable TCP sender reduces its congestion window for any
677	   reason (because of a retransmit timeout, a Fast Retransmit, or in
678	   response to an ECN Notification), the TCP sender sets the CWR flag in
679	   the TCP header of the first new data packet sent after the window
680	   reduction.  If that data packet is dropped in the network, then the
681	   sending TCP will have to reduce the congestion window again and
682	   retransmit the dropped packet.

684	   We ensure that the "Congestion Window Reduced" information is reli-
685	   ably delivered to the TCP receiver.  This comes about from the fact
686	   that if the new data packet carrying the CWR flag is dropped, then
687	   the TCP sender will have to again reduce its congestion window, and
688	   send another new data packet with the CWR flag set.  Thus, the CWR
689	   bit in the TCP header SHOULD NOT be set on retransmitted packets.
690	   When the TCP data sender is ready to set the CWR bit after reducing
691	   the congestion window, it SHOULD set the CWR bit only on the first
692	   new data packet that it transmits.
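
   The placement of the CWR flag can be sketched as follows in Python.
   The class and method names are hypothetical.  The sketch assumes
   that the caller invokes window_reduced() on every reduction,
   including the one that follows the loss of the packet that carried
   CWR.

```python
from dataclasses import dataclass


@dataclass
class CwrMarker:
    pending: bool = False  # a reduction has not yet been signaled

    def window_reduced(self) -> None:
        self.pending = True

    def cwr_for(self, is_retransmission: bool, payload_len: int) -> bool:
        """Return True if this outgoing packet should carry CWR."""
        if not self.pending:
            return False
        if is_retransmission or payload_len == 0:
            return False   # only the first new data packet carries CWR
        self.pending = False
        return True
```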

694	   [Floyd94] discusses TCP's response to ECN in more detail.  [Floyd98]
695	   discusses the validation test in the ns simulator, which illustrates
696	   a wide range of ECN scenarios. These scenarios include the following:
697	   an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
698	   Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
699	   ECN; and a congestion window of one packet followed by an ECN.

701	   TCP follows existing algorithms for sending data packets in response
702	   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
703	   timeouts [RFC2581].  TCP also follows the normal procedures for
704	   increasing the congestion window when it receives ACK packets without
705	   the ECN-Echo bit set [RFC2581].

707	6.1.3.  The TCP Receiver

709	   When TCP receives a CE data packet at the destination end-system, the
710	   TCP data receiver sets the ECN-Echo flag in the TCP header of the
711	   subsequent ACK packet.  If there is any ACK withholding implemented,
712	   as in current "delayed-ACK" TCP implementations where the TCP
713	   receiver can send an ACK for two arriving data packets, then the ECN-
714	   Echo flag in the ACK packet will be set to the OR of the CE bits of
715	   all of the data packets being acknowledged.  That is, if any of the
716	   received data packets are CE packets, then the returning ACK has the
717	   ECN-Echo flag set.

719	   To provide robustness against the possibility of a dropped ACK packet
720	   carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
721	   a series of ACK packets sent subsequently.  The TCP receiver uses the
722	   CWR flag received from the TCP sender to determine when to stop set-
723	   ting the ECN-Echo flag.

725	   After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
726	   that TCP receiver continues to set the ECN-Echo flag in all the ACK
727	   packets it sends (whether they acknowledge CE data packets or non-CE
728	   data packets) until it receives a CWR packet (a packet with the CWR
729	   flag set).  After the receipt of the CWR packet, acknowledgements for
730	   subsequent non-CE data packets do not have the ECN-Echo flag set. If
731	   another CE packet is received by the data receiver, the receiver
732	   would once again send ACK packets with the ECN-Echo flag set.  While
733	   the receipt of a CWR packet does not guarantee that the data sender
734	   received the ECN-Echo message, this does suggest that the data sender
735	   reduced its congestion window at some point *after* it sent the data
736	   packet for which the CE bit was set.
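
   The receiver rules of this section can be sketched as a two-flag
   state machine in Python.  The names are hypothetical.  The
   "ce_pending" flag is one way to realize the OR over the data packets
   covered by a delayed ACK, and "echoing" records that ECN-Echo is
   being repeated until a CWR packet arrives.

```python
from dataclasses import dataclass


@dataclass
class EcnReceiver:
    echoing: bool = False     # ECN-Echo set on every ACK until CWR
    ce_pending: bool = False  # a CE packet arrived since the last ACK

    def on_data(self, ce: bool, cwr: bool) -> None:
        """Process the CE bit and CWR flag of an arriving data packet."""
        if cwr:
            self.echoing = False
        if ce:
            self.ce_pending = True

    def next_ack_ece(self) -> bool:
        """Return the ECN-Echo flag for the ACK about to be sent."""
        if self.ce_pending:
            self.echoing = True
            self.ce_pending = False
        return self.echoing
```

   With this sketch, a CE packet followed by non-CE packets keeps
   ECN-Echo set on every ACK until a packet with CWR arrives, and a
   later CE packet starts the echo again.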

738	   We have already specified that a TCP sender is not required to reduce
739	   its congestion window more than once per window of data.  Some care
740	   is required if the TCP sender is to avoid unnecessary reductions of
741	   the congestion window when a window of data includes both dropped
742	   packets and (marked) CE packets.  This is illustrated in [Floyd98].

744	6.1.4.  Congestion on the ACK-path

746	   For the current generation of TCP congestion control algorithms, pure
747	   acknowledgement packets (i.e., packets that do not contain any accom-
748	   panying data) should be sent with the ECT bit off. Current TCP
749	   receivers have no mechanisms for reducing traffic on the ACK-path in
750	   response to congestion notification.  Mechanisms for responding to
751	   congestion on the ACK-path are areas for current and future research.
752	   (One simple possibility would be for the sender to reduce its conges-
753	   tion window when it receives a pure ACK packet with the CE bit set).
754	   For current TCP implementations, a single dropped ACK generally has
755	   only a very small effect on the TCP's sending rate.

757	6.1.5.  Retransmitted TCP packets

759	   This document specifies that for ECN-capable TCP implementations, the
760	   ECT bit (ECN-Capable Transport) in the IP header MUST NOT be set on
761	   retransmitted data packets, and that the TCP data receiver SHOULD
762	   ignore the ECN field on arriving data packets that are outside of the
763	   receiver's current window.  This is for greater security against
764	   denial-of-service attacks, as well as for robustness of the ECN con-
765	   gestion indication with packets that are dropped later in the net-
766	   work.

768	   First, we note that if the TCP sender were to set the ECT bit on a
769	   retransmitted packet, then if an unnecessarily-retransmitted packet
770	   was later dropped in the network, the end nodes would never receive
771	   the indication of congestion from the router setting the CE bit.
772	   Thus, setting the ECT bit on retransmitted data packets is not con-
773	   sistent with the robust delivery of the congestion indication even
774	   for packets that are later dropped in the network.

776	   In addition, an attacker capable of spoofing the IP source address of
777	   the TCP sender could send data packets with arbitrary sequence num-
778	   bers, with both the ECT and CE bits set in the IP header.  On receiv-
779	   ing this spoofed data packet, the TCP data receiver would determine
780	   that the data does not lie in the current receive window, and return
781	   a duplicate acknowledgement.  We define an out-of-window packet at
782	   the TCP data receiver as a data packet that lies outside the
783	   receiver's current window.  On receiving an out-of-window packet, the
784	   TCP data receiver has to decide whether or not to treat the CE bit in
785	   the packet header as a valid indication of congestion, and therefore
786	   whether to return ECN-Echo indications to the TCP data sender.  If
787	   the TCP data receiver ignored the CE bit in an out-of-window packet,
788	   then the TCP data sender would not receive this possibly-legitimate
789	   indication of congestion from the network, resulting in a violation
790	   of end-to-end congestion control.  On the other hand, if the TCP data
791	   receiver honors the CE indication in the out-of-window packet, and
792	   reports the indication of congestion to the TCP data sender, then the
793	   malicious node that created the spoofed, out-of-window packet has
794	   successfully "attacked" the TCP connection by forcing the data sender
795	   to unnecessarily reduce (halve) its congestion window.  To prevent
796	   such a denial-of-service attack, we specify that a legitimate TCP
797	   data sender MUST NOT set the ECT bit on retransmitted data packets,
798	   and that the TCP data receiver SHOULD ignore the CE bit on out-of-
799	   window packets.
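
   The receiver-side check can be sketched in Python as follows.  The
   function names are hypothetical.  As an assumption, "in the current
   window" is modeled with the segment acceptability test of RFC 793
   for data segments and a non-zero window: the first or the last byte
   of the segment must fall within the window.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def in_window(seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if the first or last byte of the segment is in the window."""
    if seg_len <= 0 or rcv_wnd <= 0:
        return False  # simplification: data segments, open window only
    first = (seq - rcv_nxt) % SEQ_MOD
    last = (seq + seg_len - 1 - rcv_nxt) % SEQ_MOD
    return first < rcv_wnd or last < rcv_wnd


def ce_is_honored(ce: bool, seq: int, seg_len: int,
                  rcv_nxt: int, rcv_wnd: int) -> bool:
    """The receiver ignores the CE bit on out-of-window data packets."""
    return ce and in_window(seq, seg_len, rcv_nxt, rcv_wnd)
```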

801	   One drawback of not setting ECT on retransmitted packets is that it
802	   denies ECN protection for retransmitted packets.  However, for an
803	   ECN-capable TCP connection in a fully-ECN-capable environment with
804	   mild congestion, packets should rarely be dropped due to congestion
805	   in the first place, and so instances of retransmitted packets should
806	   rarely arise.  If packets are being retransmitted, then there are
807	   already packet losses (from corruption or from congestion) that ECN
808	   has been unable to prevent.

810	   We note that if the router sets the CE bit for an ECN-capable data
811	   packet within a TCP connection, then the TCP connection is guaranteed
812	   to receive that indication of congestion, or to receive some other
813	   indication of congestion within the same window of data, even if this
814	   packet is dropped or reordered in the network.  We consider two
815	   cases, when the packet is later retransmitted, and when the packet is
816	   not later retransmitted.

818	   In the first case, if the packet is either dropped or delayed, and at
819	   some point retransmitted by the data sender, then the retransmission
820	   is a result of a Fast Retransmit or a Retransmit Timeout for either
821	   that packet or for some prior packet in the same window of data.  In
822	   this case, because the data sender already has retransmitted this
823	   packet, we know that the data sender has already responded to an
824	   indication of congestion for some packet within the same window of
825	   data as the original packet.  Thus, even if the first transmission of
826	   the packet is dropped in the network, or is delayed, if it had the CE
827	   bit set, and is later ignored by the data receiver as an out-of-win-
828	   dow packet, this is not a problem, because the sender has already
829	   responded to an indication of congestion for that window of data.

831	   In the second case, if the packet is never retransmitted by the data
832	   sender, then this data packet is the only copy of this data received
833	   by the data receiver, and therefore arrives at the data receiver as
834	   an in-window packet, regardless of how much the packet might be
835	   delayed or reordered.  In this case, if the CE bit is set on the
836	   packet within the network, this will be treated by the data receiver
837	   as a valid indication of congestion.

839	6.1.6.  TCP Window Probes

841	   When the TCP data receiver advertises a zero window, the TCP data
842	   sender sends window probes to determine if the receiver's window has
843	   increased.  Window probe packets do not contain any user data except
844	   for the sequence number, which is a byte.  If a window probe packet
845	   is dropped in the network, this loss is not detected by the receiver.
846	   Therefore, the TCP data sender MUST NOT set either the ECT or CWR
847	   bits on window probe packets.

849	   However, because window probes use exact sequence numbers, they can-
850	   not be easily spoofed in denial-of-service attacks.  Therefore, if a
851	   window probe arrives with ECT and CE set, then the receiver SHOULD
852	   respond to the ECN indications.
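
   Taken together, Section 5.2 and Sections 6.1.4 through 6.1.6
   restrict which segments a TCP sender may mark with ECT.  The Python
   sketch below collects these restrictions in one predicate.  The
   Segment class, its fields, and the "ecn_negotiated" argument are
   hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    payload_len: int                 # bytes of data carried
    is_retransmission: bool = False
    is_window_probe: bool = False


def may_set_ect(seg: Segment, ecn_negotiated: bool) -> bool:
    """Return True if the sender may set ECT on this segment."""
    if not ecn_negotiated:
        return False   # ECN was not set up for this connection
    if seg.payload_len == 0:
        return False   # "pure" ACK packet
    if seg.is_retransmission:
        return False   # retransmitted data packet
    if seg.is_window_probe:
        return False   # window probe
    return True
```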

854	7.  Non-compliance by the End Nodes

856	   This section discusses concerns about the vulnerability of ECN to
857	   non-compliant end-nodes (i.e., end nodes that set the ECT bit in
858	   transmitted packets but do not respond to received CE packets).  We
859	   argue that the addition of ECN to the IP architecture will not sig-
860	   nificantly increase the current vulnerability of the architecture to
861	   unresponsive flows.

863	   Even for non-ECN environments, there are serious concerns about the
864	   damage that can be done by non-compliant or unresponsive flows (that
865	   is, flows that do not respond to congestion control indications by
866	   reducing their arrival rate at the congested link).  For example, an
867	   end-node could "turn off congestion control" by not reducing its con-
868	   gestion window in response to packet drops. This is a concern for the
869	   current Internet.  It has been argued that routers will have to
870	   deploy mechanisms to detect and differentially treat packets from
871	   non-compliant flows [RFC2309,FF99].  It has also been suggested that
872	   techniques such as end-to-end per-flow scheduling and isolation of
873	   one flow from another, differentiated services, or end-to-end reser-
874	   vations could remove some of the more damaging effects of unrespon-
875	   sive flows.

877	   It might seem that dropping packets in itself is an adequate deter-
878	   rent for non-compliance, and that the use of ECN removes this deter-
879	   rent.  We would argue in response that (1) ECN-capable routers pre-
880	   serve packet-dropping behavior in times of high congestion; and (2)
881	   even in times of high congestion, dropping packets in itself is not
882	   an adequate deterrent for non-compliance.

884	   First, ECN-Capable routers will only mark packets (as opposed to
885	   dropping them) when the packet marking rate is reasonably low. During
886	   periods where the average queue size exceeds an upper threshold, and
887	   therefore the potential packet marking rate would be high, our recom-
888	   mendation is that routers drop packets rather than set the CE bit in
889	   packet headers.
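
   This recommendation can be sketched as a RED-style decision in
   Python.  The function and parameter names are hypothetical, and the
   sketch assumes that the probabilistic choice made by the queue
   management scheme is computed elsewhere and passed in as "selected".

```python
def router_action(avg_queue: float, min_th: float, max_th: float,
                  ect: bool, selected: bool) -> str:
    """Return "forward", "mark" or "drop" for an arriving packet."""
    if avg_queue > max_th:
        return "drop"      # high congestion: drop even if ECT is set
    if avg_queue < min_th or not selected:
        return "forward"   # no congestion indication for this packet
    return "mark" if ect else "drop"
```

   With thresholds of 10 and 30 packets, a selected ECT packet is
   marked at an average queue of 20 packets and dropped at an average
   queue of 40 packets.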

891	   During the periods of low or moderate packet marking rates when ECN
892	   would be deployed, there would be little deterrent effect on unre-
893	   sponsive flows of dropping rather than marking those packets. For
894	   example, delay-insensitive flows using reliable delivery might have
895	   an incentive to increase rather than to decrease their sending rate
896	   in the presence of dropped packets.  Similarly, delay-sensitive flows
897	   using unreliable delivery might increase their use of FEC in response
898	   to an increased packet drop rate, increasing rather than decreasing
899	   their sending rate.  For the same reasons, we do not believe that
900	   packet dropping itself is an effective deterrent for non-compliance
901	   even in an environment of high packet drop rates, when all flows are
902	   sharing the same packet drop rate.

904	   Several methods have been proposed to identify and restrict non-com-
905	   pliant or unresponsive flows. The addition of ECN to the network
906	   environment would not in any way increase the difficulty of designing
907	   and deploying such mechanisms. If anything, the addition of ECN to
908	   the architecture would make the job of identifying unresponsive flows
909	   slightly easier.  For example, in an ECN-Capable environment routers
910	   are not limited to information about packets that are dropped or have
911	   the CE bit set at that router itself; in such an environment, routers
912	   could also take note of arriving CE packets that indicate congestion
913	   encountered by that packet earlier in the path.

915	8.  Non-compliance in the Network

917	   This section considers the issues when a router is operating, possi-
918	   bly maliciously, to modify either of the bits in the ECN field.  In
919	   this section we represent the ECN field in the IP header by the tuple
920	   (ECT bit, CE bit).

922	   By tampering with the bits in the ECN field, an adversary (or a bro-
923	   ken router) could do one or more of the following: falsely report
924	   congestion, disable ECN-Capability for an individual packet, erase
925	   the ECN congestion indication, or falsely indicate ECN-Capability.
926	   Section 18 systematically examines the various cases by which the ECN
927	   field could be modified.  The important criterion considered in
928	   determining the consequences of such modifications is whether it is
929	   likely to lead to poorer behavior in any dimension (throughput,
930	   delay, fairness or functionality) than if a router were to drop a
931	   packet.

933	   The first two possible changes, falsely reporting congestion or dis-
934	   abling ECN-Capability for an individual packet, are no worse than if
935	   the router were to simply drop the packet.  From a congestion control
936	   point of view, setting the CE bit in the absence of congestion by a
937	   non-compliant router would be no worse than a router dropping a
938	   packet unnecessarily. By "erasing" the ECT bit of a packet that is
939	   later dropped in the network, a router's actions could result in an
940	   unnecessary packet drop for that packet later in the network.

942	   However, as discussed in Section 18, a router that erases the ECN
943	   congestion indication or falsely indicates ECN-Capability could
944	   potentially do more damage to the flow than if it had simply dropped
945	   the packet.  A rogue or broken router that "erased" the CE bit in
946	   arriving CE packets would prevent that indication of congestion from
947	   reaching downstream receivers.  This could result in the failure of
948	   congestion control for that flow and a resulting increase in conges-
949	   tion in the network, ultimately resulting in subsequent packets
950	   dropped for this flow as the average queue size increased at the con-
951	   gested gateway.

953	   Section 19 considers the potential repercussions of subverting end-
954	   to-end congestion control by either falsely indicating ECN-Capabil-
955	   ity, or by erasing the congestion indication in ECN (the CE-bit).  We
956	   observe in Section 19 that subverting ECN-based congestion
957	   control may lead to unfairness, but this is
958	   likely to be no worse than the subversion of either ECN-based or
959	   packet-based congestion control by the end nodes.

961	8.1.  Complications Introduced by Split Paths

963	   If a router or other network element has access to all of the packets
964	   of a flow, then that router could do no more damage to a flow by
965	   altering the ECN field than it could by simply dropping all of the
966	   packets from that flow.  However, in some cases, a malicious or bro-
967	   ken router might have access to only a subset of the packets from a
968	   flow.  The question is as follows:  can this router, by altering the
969	   ECN field in this subset of the packets, do more damage to that flow
970	   than if it had simply dropped that subset of the packets?

972	   This is also discussed in detail in Section 18, which concludes as
973	   follows:  It is true that the adversary that has access only to a
974	   subset of packets in an aggregate might, by subverting ECN-based con-
975	   gestion control, be able to deny the benefits of ECN to the other
976	   packets in the aggregate.  While this is undesirable, this is not a
977	   sufficient concern to result in disabling ECN.

979	9.  Encapsulated Packets

981	9.1.  IP packets encapsulated in IP

983	   The encapsulation of IP packet headers in tunnels is used in many
984	   places, including IPsec and IP in IP [RFC2003].  This section consid-
985	   ers issues related to interactions between ECN and IP tunnels, and
986	   specifies two alternative solutions.  This discussion is complemented
987	   by RFC 2983's discussion of interactions between Differentiated Ser-
988	   vices and IP tunnels of various forms [RFC2983], as Differentiated
989	   Services uses the remaining six bits of the IP header octet that is
990	   used by ECN (see Figure 1 in Section 5).

992	   Some IP tunnel modes are based on adding a new "outer" IP header that
993	   encapsulates the original, or "inner" IP header and its associated
994	   packet.  In many cases, the new "outer" IP header may be added and
995	   removed at intermediate points along a connection, enabling the net-
996	   work to establish a tunnel without requiring endpoint participation.
997	   We denote tunnels that specify that the outer header be discarded at
998	   tunnel egress as "simple tunnels".

1000	   ECN uses the ECT and CE flags in the IP header for signaling between
1001	   routers and connection endpoints.  ECN interacts with IP tunnels
1002	   based on the treatment of these flags in the IP header.  In simple IP
1003	   tunnels the octet containing these flags is copied or mapped from the
1004	   inner IP header to the outer IP header at IP tunnel ingress, and the
1005	   outer header's copy of this field is discarded at IP tunnel egress.
1006	   If the outer header were to be simply discarded without taking care
1007	   to deal with the ECN related flags, and an ECN-capable router were to
1008	   set the CE (Congestion Experienced) bit within a packet in a simple
1009	   IP tunnel, this indication would be discarded at tunnel egress, los-
1010	   ing the indication of congestion.

1012	   Thus, the use of ECN over simple IP tunnels would result in routers
1013	   attempting to use the outer IP header to signal congestion to end-
1014	   points, but those congestion warnings never arriving because the
1015	   outer header is discarded at the tunnel egress point.  This problem
1016	   was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec-
1017	   ommended that ECN not be used with the older simple IPsec tunnels in
1018	   order to avoid this behavior and its consequences.  When ECN becomes
1019	   widely deployed, then simple tunnels likely to carry ECN-capable
1020	   traffic will have to be changed.

1022	   From a security point of view, the use of ECN in the outer header of
1023	   an IP tunnel might raise security concerns because an adversary could
1024	   tamper with the ECN information that propagates beyond the tunnel
1025	   endpoint.  Based on an analysis in Sections 18 and 19 of these con-
1026	   cerns and the resultant risks, our overall approach is to make sup-
1027	   port for ECN an option for IP tunnels, so that an IP tunnel can be
1028	   specified or configured either to use ECN or not to use ECN in the
1029	   outer header of the tunnel.  Thus, in environments or tunneling pro-
1030	   tocols where the risks of using ECN are judged to outweigh its bene-
1031	   fits, the tunnel can simply not use ECN in the outer header.  Then
1032	   the only indication of congestion experienced at routers within the
1033	   tunnel would be through packet loss.

1035	   The result is that there are two viable options for the behavior of
1036	   ECN-capable connections over an IP tunnel, especially IPsec tunnels:
1037	      * A limited-functionality option in which ECN is preserved in the
1038	      inner header, but disabled in the outer header.  The only mecha-
1039	      nism available for signaling congestion occurring within the tun-
1040	      nel in this case is dropped packets.
1041	      * A full-functionality option that supports ECN in both the inner
1042	      and outer headers, and propagates congestion warnings from nodes
1043	      within the tunnel to endpoints.

1045	   Support for these options requires varying amounts of changes to IP
1046	   header processing at tunnel ingress and egress.  A small subset of
1047	   these changes, enough to support only the limited-functionality
1048	   option, would be sufficient to eliminate any incompatibility between
1049	   ECN and IP tunnels.

1051	   One goal of this document is to give guidance about the tradeoffs
1052	   between the limited-functionality and full-functionality options.  A
1053	   full discussion of the potential effects of an adversary's modifica-
1054	   tions of the CE and ECT bits is given in Sections 18 and 19.

1056	9.1.1.  The Limited-functionality and Full-functionality Options

1058	   The limited-functionality option for ECN encapsulation in IP tunnels
1059	   is for the ECT bit in the outside (encapsulating) header to be off
1060	   (i.e., set to 0), regardless of the value of the ECT bit in the
1061	   inside (encapsulated) header.  With this option, the ECN field in the
1062	   inner header is not altered upon decapsulation.  The disadvantage of
1063	   this approach is that the flow does not have ECN support for that
1064	   part of the path that is using IP tunneling, even if the encapsulated
1065	   packet (from the original TCP sender) is ECN-Capable.  That is, if
1066	   the encapsulated packet arrives at a congested router that is ECN-
1067	   capable, and the router can decide to drop or mark the packet as an
1068	   indication of congestion to the end nodes, the router will not be
1069	   permitted to set the CE bit in the packet header, but instead will
1070	   have to drop the packet.

1072	   The full-functionality option for ECN encapsulation is to copy the
1073	   ECT bit of the inside header to the outside header on encapsulation,
1074	   and to OR the CE bit from the outer header with the CE bit of the
1075	   inside header on decapsulation.  That is, for full ECN support the
1076	   encapsulation and decapsulation processing involves the following:
1077	   At tunnel ingress, the full-functionality option copies the value of
1078	   ECT (bit 6) in the inner header to the outer header.  CE (bit 7) is
1079	   set to 0 in the outer header.  Upon decapsulation at the tunnel
1080	   egress, the full-functionality option sets CE to 1 in the inner
1081	   header if the value of ECT (bit 6) in the inner header is 1, and the
1082	   value of CE (bit 7) in the outer header is 1.  Otherwise, no change
1083	   is made to this field of the inner header.
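
   The two options can be sketched as follows, using the (ECT bit, CE
   bit) tuple notation.  This is a minimal illustration, not a
   normative implementation; the function names and the
   full_functionality flag are hypothetical.  Setting CE to 0 in the
   outer header under the limited-functionality option is taken from
   the IPsec processing in Section 9.2.1.3.

```python
from typing import Tuple

EcnField = Tuple[int, int]  # (ECT bit, CE bit)


def encapsulate(inner: EcnField, full_functionality: bool) -> EcnField:
    """Return the ECN field placed in the outer header at tunnel ingress."""
    inner_ect, _inner_ce = inner
    # Full functionality copies ECT; limited functionality forces it off.
    outer_ect = inner_ect if full_functionality else 0
    # CE is always 0 in a freshly built outer header.
    return (outer_ect, 0)


def decapsulate(inner: EcnField, outer: EcnField,
                full_functionality: bool) -> EcnField:
    """Return the inner header's ECN field after tunnel egress."""
    inner_ect, _inner_ce = inner
    _outer_ect, outer_ce = outer
    # CE is propagated only for ECN-capable inner packets on a
    # full-functionality tunnel; otherwise the inner field is untouched.
    if full_functionality and inner_ect == 1 and outer_ce == 1:
        return (inner_ect, 1)
    return inner
```

   Because an already-set inner CE bit is never cleared, the egress
   step behaves as a logical OR of the inner and outer CE bits for
   packets whose inner ECT bit is 1.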

1085	   With the full-functionality option, a flow can take advantage of ECN
1086	   in those parts of the path that might use IP tunneling.  The disad-
1087	   vantage of the full-functionality option from a security perspective
1088	   is that the IP tunnel cannot protect the flow from certain modifica-
1089	   tions to the ECN bits in the IP header within the tunnel.  The poten-
1090	   tial dangers from modifications to the ECN bits in the IP header are
1091	   described in detail in Sections 18 and 19.

1093	      (1) An IP tunnel MUST modify the handling of the DS field octet at
1094	      IP tunnel endpoints by implementing either the limited-functional-
1095	      ity or the full-functionality option.
1096	      (2) Optionally, an IP tunnel MAY enable the endpoints of an IP
1097	      tunnel to negotiate the choice between the limited-functionality
1098	      and the full-functionality option for ECN in the tunnel.

1100	   The minimum required to make ECN usable with IP tunnels is the lim-
1101	   ited-functionality option, which prevents ECN from being enabled in
1102	   the outer header of the IP tunnel.  Full support for ECN requires
1103	   the use of the full-functionality option.  If there are no optional
1104	   mechanisms for the tunnel endpoints to negotiate a choice between the
1105	   limited-functionality and full-functionality options, there can be a
1106	   pre-existing agreement between the tunnel endpoints about whether to
1107	   support the limited-functionality or the full-functionality ECN
1108	   option.

1110	   In addition, it is RECOMMENDED that packets with ECT and CE both set
1111	   to 1 in the outer header be dropped if they arrive at the tunnel
1112	   egress point for a tunnel that uses the limited-functionality option,
1113	   or for a tunnel that uses the full-functionality option but for which
1114	   the ECT bit in the inner header is set to zero.  This is motivated by
1115	   backwards compatibility and to ensure that no unauthorized modifica-
1116	   tions of the ECN field take place, and is discussed further in the
1117	   next Section (9.1.2).
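
   The RECOMMENDED drop rule above can be sketched as a predicate
   evaluated at the tunnel egress.  The function name and arguments
   are hypothetical; headers are again given as (ECT bit, CE bit)
   tuples.

```python
from typing import Tuple

EcnField = Tuple[int, int]  # (ECT bit, CE bit)


def should_drop_at_egress(outer: EcnField, inner: EcnField,
                          full_functionality: bool) -> bool:
    """Apply the RECOMMENDED drop rule for outer headers with ECT=CE=1."""
    outer_ect, outer_ce = outer
    inner_ect, _inner_ce = inner
    if not (outer_ect == 1 and outer_ce == 1):
        return False  # the rule covers only outer headers with both bits set
    # Drop on a limited-functionality tunnel, or on a full-functionality
    # tunnel whose inner packet is not ECN-capable.
    return (not full_functionality) or inner_ect == 0
```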

1119	9.1.2.  Changes to the ECN Field within an IP Tunnel

1121	   The presence of a copy of the ECN field in the inner header of an IP
1122	   tunnel mode packet provides an opportunity for detection of unautho-
1123	   rized modifications to the ECT bit in the outer header.  Comparison
1124	   of the ECT bits in the inner and outer headers falls into two cate-
1125	   gories for implementations that conform to this document:
1126	      * If the IP tunnel uses the full-functionality option, then the
1127	      values of the ECT bits in the inner and outer headers should be
1128	      identical.
1129	      * If the tunnel uses the limited-functionality option, then the
1130	      ECT bit in the outer header should be 0.

1132	   Receipt of a packet not satisfying the appropriate condition could be
1133	   a cause of concern.

1135	   Consider the case of an IP tunnel where the tunnel ingress point has
1136	   not been updated to this document's requirements, while the tunnel
1137	   egress point has been updated to support ECN.  In this case, the IP
1138	   tunnel is not explicitly configured to support the full-functionality
1139	   ECN option. However, the tunnel ingress point is behaving identically
1140	   to a tunnel ingress point that supports the full-functionality
1141	   option.  If packets from an ECN-capable connection use this tunnel,
1142	   ECT will be set to 1 in the outer header at the tunnel ingress point.
1143	   Congestion within the tunnel may then result in ECN-capable routers
1144	   setting CE in the outer header.  Because the tunnel has not been
1145	   explicitly configured to support the full-functionality option, the
1146	   tunnel egress point expects the ECT bit in the outer header to be 0.
1147	   When an ECN-capable tunnel egress point receives a packet with the
1148	   ECT bit in the outer header set to 1, in a tunnel that has not been
1149	   configured to support the full-functionality option, that packet
1150	   should be processed, according to whether the CE bit was set, as follows.
1151	   It is RECOMMENDED that such packets, with the ECT bit in the outer
1152	   header set to 1 on a tunnel that has not been configured to support
1153	   the full-functionality option, be dropped at the egress point if CE
1154	   is set to 1 in the outer header but 0 in the inner header, and for-
1155	   warded otherwise.
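
   A minimal sketch of this egress decision follows, for a tunnel
   that has not been configured for the full-functionality option.
   It follows the wording of this paragraph, which conditions on the
   CE bits of both headers.  The function name and the "drop" and
   "forward" return strings are hypothetical.

```python
from typing import Tuple

EcnField = Tuple[int, int]  # (ECT bit, CE bit)


def egress_action_not_full(outer: EcnField, inner: EcnField) -> str:
    """Decide what an updated egress does on a tunnel that is not
    configured for the full-functionality option."""
    outer_ect, outer_ce = outer
    _inner_ect, inner_ce = inner
    # A CE mark set inside the tunnel (outer CE=1, inner CE=0) cannot
    # be propagated on this tunnel, so the packet is dropped and the
    # congestion is signaled by loss.
    if outer_ect == 1 and outer_ce == 1 and inner_ce == 0:
        return "drop"
    return "forward"
```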

1157	   An IP tunnel cannot provide protection against erasure of congestion
1158	   indications based on resetting the value of the CE bit in packets for
1159	   which ECT is set in the outer header.  The erasure of congestion
1160	   indications may impact the network and other flows in ways that would
1161	   not be possible in the absence of ECN.  It is important to note that
1162	   erasure of congestion indications can only be performed to congestion
1163	   indications placed by nodes within the tunnel; the copy of the CE bit
1164	   in the inner header preserves congestion notifications from nodes
1165	   upstream of the tunnel ingress.  If erasure of congestion notifica-
1166	   tions is judged to be a security risk that exceeds the congestion
1167	   management benefits of ECN, then tunnels could be specified or con-
1168	   figured to use the limited-functionality option.

1170	9.2.  IPsec Tunnels

1172	   IPsec supports secure communication over potentially insecure network
1173	   components such as intermediate routers.  IPsec protocols support two
1174	   operating modes, transport mode and tunnel mode, that span a wide
1175	   range of security requirements and operating environments.  Transport
1176	   mode security protocol header(s) are inserted between the IP (IPv4 or
1177	   IPv6) header and higher layer protocol headers (e.g., TCP), and hence
1178	   transport mode can only be used for end-to-end security on a connec-
1179	   tion.  IPsec tunnel mode is based on adding a new "outer" IP header
1180	   that encapsulates the original, or "inner" IP header and its associ-
1181	   ated packet.  Tunnel mode security headers are inserted between these
1182	   two IP headers.  In contrast to transport mode, the new "outer" IP
1183	   header and tunnel mode security headers can be added and removed at
1184	   intermediate points along a connection, enabling security gateways to
1185	   secure vulnerable portions of a connection without requiring endpoint
1186	   participation in the security protocols.  An important aspect of tun-
1187	   nel mode security is that in the original specification, the outer
1188	   header is discarded at tunnel egress, ensuring that security threats
1189	   based on modifying the IP header do not propagate beyond that tunnel
1190	   endpoint.  Further discussion of IPsec can be found in [RFC2401].

1192	   The IPsec protocol as originally defined in [ESP, AH] required that
1193	   the inner header's ECN field not be changed by IPsec decapsulation
1194	   processing at a tunnel egress node; this would have ruled out the
1195	   possibility of full-functionality mode for ECN.  At the same time,
1196	   this would ensure that an adversary's modifications to the ECN field
1197	   cannot be used to launch theft- or denial-of-service attacks across
1198	   an IPsec tunnel endpoint, as any such modifications will be discarded
1199	   at the tunnel endpoint.

1201	   In principle, permitting the use of ECN functionality in the outer
1202	   header of an IPsec tunnel raises security concerns because an adver-
1203	   sary could tamper with the information that propagates beyond the
1204	   tunnel endpoint.  Based on an analysis (included in Sections 18 and
1205	   19) of these concerns and the associated risks, our overall approach
1206	   has been to provide configuration support for IPsec changes to remove
1207	   the conflict with ECN.

1209	   In particular, in tunnel mode the IPsec tunnel MUST support either
1210	   the limited-functionality or the full-functionality mode outlined in
1211	   Section 9.1.1.

1213	   This makes permission to use ECN functionality in the outer header of
1214	   an IPsec tunnel a configurable part of the corresponding IPsec Secu-
1215	   rity Association (SA), so that it can be disabled in situations where
1216	   the risks are judged to outweigh the benefits.  The result is that an
1217	   IPsec security administrator is presented with two alternatives for
1218	   the behavior of ECN-capable connections within an IPsec tunnel, the
1219	   limited-functionality alternative and full-functionality alternative
1220	   described earlier.  All IPsec implementations MUST implement either
1221	   the limited-functionality or the full-functionality alternative in
1222	   order to eliminate incompatibility between ECN and IPsec tunnels, but
1223	   implementers MAY choose to implement either alternative.

1225	   In addition, this document specifies how the endpoints of an IPsec
1226	   tunnel could negotiate enabling ECN functionality in the outer head-
1227	   ers of that tunnel based on security policy.  The ability to negoti-
1228	   ate ECN usage between tunnel endpoints would enable a security admin-
1229	   istrator to disable ECN in situations where she believes the risks
1230	   (e.g., of lost congestion notifications) outweigh the benefits of
1231	   ECN.

1233	   The IPsec protocol, as defined in [ESP, AH], does not include the IP
1234	   header's ECN field in any of its cryptographic calculations (in the
1235	   case of tunnel mode, the outer IP header's ECN field is not
1236	   included).  Hence modification of the ECN field by a network node has
1237	   no effect on IPsec's end-to-end security, because it cannot cause any
1238	   IPsec integrity check to fail.  As a consequence, IPsec does not pro-
1239	   vide any defense against an adversary's modification of the ECN field
1240	   (i.e., a man-in-the-middle attack), as the adversary's modification
1241	   will also have no effect on IPsec's end-to-end security.  In some
1242	   environments, the ability to modify the ECN field without affecting
1243	   IPsec integrity checks may constitute a covert channel; if it is nec-
1244	   essary to eliminate such a channel or reduce its bandwidth, then the
1245	   IPsec tunnel should be run in limited-functionality mode.

1247	9.2.1.  Negotiation between Tunnel Endpoints

1249	   This section describes the detailed changes to enable usage of ECN
1250	   over IPsec tunnels, including the negotiation of ECN support between
1251	   tunnel endpoints.  This is supported by three changes to IPsec:
1252	      * An optional Security Association Database (SAD) field indicating
1253	      whether tunnel encapsulation and decapsulation processing allows
1254	      or forbids ECN usage in the outer IP header.
1255	      * An optional Security Association Attribute that enables negotia-
1256	      tion of this SAD field between the two endpoints of an SA that
1257	      supports tunnel mode.
1258	      * Changes to tunnel mode encapsulation and decapsulation process-
1259	      ing to allow or forbid ECN usage in the outer IP header based on
1260	      the value of the SAD field.  When ECN usage is allowed in the
1261	      outer IP header, ECT is set in the outer header for ECN-capable
1262	      connections and congestion notifications (indicated by the CE bit)
1263	      from such connections are propagated to the inner header at tunnel
1264	      egress.

1266	   If negotiation of ECN usage is implemented, then the SAD field SHOULD
1267	   also be implemented.  On the other hand, negotiation of ECN usage is
1268	   OPTIONAL in all cases, even for implementations that support the SAD
1269	   field.  The encapsulation and decapsulation processing changes are
1270	   REQUIRED, but MAY be implemented without the other two changes by
1271	   assuming that ECN usage is always forbidden.  The full-functionality
1272	   alternative for ECN usage over IPsec tunnels consists of the SAD
1273	   field and the full version of encapsulation and decapsulation pro-
1274	   cessing changes, with or without the OPTIONAL negotiation support.
1275	   The limited-functionality alternative consists of a subset of the
1276	   encapsulation and decapsulation changes that always forbids ECN
1277	   usage.

1279	   These changes are covered further in the following three subsections.

1281	9.2.1.1.  ECN Tunnel Security Association Database Field

1283	   Full ECN functionality adds a new field to the SAD (see [RFC2401]):

1285	      ECN Tunnel: allowed or forbidden.

1287	      Indicates whether ECN-capable connections using this SA in tunnel
1288	      mode are permitted to receive ECN congestion notifications for
1289	      congestion occurring within the tunnel.  The allowed value enables
1290	      ECN congestion notifications.  The forbidden value disables such
1291	      notifications, causing all congestion to be indicated via dropped
1292	      packets.

1294	      [OPTIONAL.  The value of this field SHOULD be assumed to be
1295	      "forbidden" in implementations that do not support it.]

1297	   If this attribute is implemented, then the SA specification in a
1298	   Security Policy Database (SPD) entry MUST support a corresponding
1299	   attribute, and this SPD attribute MUST be covered by the SPD adminis-
1300	   trative interface (currently described in Section 4.4.1 of
1301	   [RFC2401]).

1303	9.2.1.2.  ECN Tunnel Security Association Attribute

1305	   A new IPsec Security Association Attribute is defined to enable the
1306	   support for ECN congestion notifications based on the outer IP header
1307	   to be negotiated for IPsec tunnels (see [RFC2407]).  This attribute
1308	   is OPTIONAL, although implementations that support it SHOULD also
1309	   support the SAD field defined in Section 9.2.1.1.

1311	   Attribute Type

1313	           class               value           type
1314	     -------------------------------------------------
1315	     ECN Tunnel                 10             Basic

1317	   The IPsec SA Attribute value 10 has been allocated by IANA to indi-
1318	   cate that the ECN Tunnel SA Attribute is being negotiated; the type
1319	   of this attribute is Basic (see Section 4.5 of [RFC2407]).  The Class
1320	   Values are used to conduct the negotiation.  See [RFC2407, RFC2408,
1321	   RFC2409] for further information including encoding formats and
1322	   requirements for negotiating this SA attribute.

1324	   Class Values

1326	     ECN Tunnel

1328	       Specifies whether ECN functionality is allowed to
1329	       be used with Tunnel Encapsulation Mode.
1330	       This affects tunnel encapsulation and decapsulation processing -
1331	       see Section 9.2.1.3.

1333	       RESERVED          0
1334	       Allowed           1
1335	       Forbidden         2

1337	       Values 3-61439 are reserved to IANA.  Values 61440-65535 are for
1338	       private use.

1340	       If unspecified, the default shall be assumed to be Forbidden.
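
   As an illustration, the attribute can be encoded in the ISAKMP
   Basic (type/value) attribute format, in which the high-order bit
   of the 16-bit type field is set and a 16-bit value follows.  The
   function names are hypothetical, and the decoder is a sketch that
   assumes every attribute in the buffer is a Basic attribute.

```python
import struct

ECN_TUNNEL_ATTR_TYPE = 10  # SA attribute value allocated by IANA (see above)
ECN_TUNNEL_ALLOWED = 1
ECN_TUNNEL_FORBIDDEN = 2
AF_BASIC = 0x8000          # ISAKMP attribute format bit: 1 = Basic


def encode_ecn_tunnel(value: int) -> bytes:
    """Encode the ECN Tunnel attribute as a 4-octet Basic attribute."""
    if value not in (ECN_TUNNEL_ALLOWED, ECN_TUNNEL_FORBIDDEN):
        raise ValueError("ECN Tunnel must be Allowed (1) or Forbidden (2)")
    return struct.pack("!HH", AF_BASIC | ECN_TUNNEL_ATTR_TYPE, value)


def decode_ecn_tunnel(attrs: bytes) -> int:
    """Return the ECN Tunnel value, or Forbidden if it is unspecified.

    Assumption: `attrs` holds only Basic attributes of 4 octets each.
    """
    for offset in range(0, len(attrs) - 3, 4):
        attr_type, attr_value = struct.unpack_from("!HH", attrs, offset)
        if attr_type == AF_BASIC | ECN_TUNNEL_ATTR_TYPE:
            return attr_value
    return ECN_TUNNEL_FORBIDDEN
```

   The decoder returns Forbidden when the attribute is absent, which
   mirrors the default stated above.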

1342	   ECN Tunnel is a new SA attribute, and hence initiators that use it
1343	   can expect to encounter responders that do not understand it, and
1344	   therefore reject proposals containing it.  For backwards compatibil-
1345	   ity with such implementations initiators SHOULD always also include a
1346	   proposal without the ECN Tunnel attribute to enable such a responder
1347	   to select a transform or proposal that does not contain the ECN Tun-
1348	   nel attribute.  RFC 2407 currently requires responders to reject all
1349	   proposals if any proposal contains an unknown attribute; this
1350	   requirement is expected to be changed to require a responder not to
1351	   select proposals or transforms containing unknown attributes.

1353	9.2.1.3.  Changes to IPsec Tunnel Header Processing

1355	   For full ECN support, the encapsulation and decapsulation processing
1356	   for the IPv4 TOS field and the IPv6 Traffic Class field are changed
1357	   from that specified in [RFC2401] to the following:

1359	                           <-- How Outer Hdr Relates to Inner Hdr -->
1360	                           Outer Hdr at                 Inner Hdr at
1361	      IPv4                 Encapsulator                 Decapsulator
1362	        Header fields:     --------------------         ------------
1363	          DS Field         copied from inner hdr (5)    no change
1364	          ECN Field        constructed (7)              constructed (8)

1366	      IPv6
1367	        Header fields:
1368	          DS Field         copied from inner hdr (6)    no change
1369	          ECN Field        constructed (7)              constructed (8)

1371	      (5)(6) If the packet will immediately enter a domain for which the
1372	      DSCP value in the outer header is not appropriate, that value MUST
1373	      be mapped to an appropriate value for the domain [RFC2474].  Also
1374	      see [RFC2475] for further information.

1376	      (7) If the value of the ECN Tunnel field in the SAD entry for this
1377	      SA is "allowed" and the value of ECT (bit 0) is 1 in the inner
1378	      header, set ECT to 1 in the outer header, else set ECT to 0 in the
1379	      outer header.  Set CE (bit 1) to 0 in the outer header.

1381	      (8) If the value of the ECN tunnel field in the SAD entry for this
1382	      SA is "allowed" and the value of ECT (bit 0) in the inner header
1383	      is 1, then set the CE bit (bit 1) in the inner header to the logi-
1384	      cal OR of the CE bit in the inner header with the CE bit in the
1385	      outer header, else make no change to the ECN field.

1387	      Notes (5) and (6) are identical here; both numbers are retained to
1388	      match the numbering in [RFC2401], where the two notes differ.

1390	   The above description applies to implementations that support the ECN
1391	   Tunnel field in the SAD; such implementations MUST implement this
1392	   processing instead of the processing of the IPv4 TOS octet and IPv6
1393	   Traffic Class octet defined in [RFC2401].  This constitutes the full-
1394	   functionality alternative for ECN usage with IPsec tunnels.

1396	   An implementation that does not support the ECN Tunnel field in the
1397	   SAD MUST implement this processing by assuming that the value of the
1398	   ECN Tunnel field of the SAD is "forbidden" for every SA.  In this
1399	   case, the processing of the ECN field reduces to:

1401	      (7) Set the ECN field (ECT and CE bits) to zero in the outer
1402	      header.
1403	      (8) Make no change to the ECN field in the inner header.

1405	   This constitutes the limited functionality alternative for ECN usage
1406	   with IPsec tunnels.
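
   Notes (7) and (8) can be sketched as operations on the IPv4 TOS or
   IPv6 Traffic Class octet.  The sketch assumes the bit positions
   given in Section 9.1.1 (ECT is bit 6 and CE is bit 7 of the octet,
   with bit 0 the most significant bit) and omits the DSCP remapping
   of notes (5) and (6).  The function names and the
   ecn_tunnel_allowed parameter, which stands for the SAD field, are
   hypothetical.

```python
ECT_MASK = 0x02   # assumed: bit 6 of the octet, bit 0 most significant
CE_MASK = 0x01    # assumed: bit 7 of the octet
DSCP_MASK = 0xFC  # the six DS field bits


def outer_octet_at_encapsulator(inner_octet: int,
                                ecn_tunnel_allowed: bool = False) -> int:
    """Build the outer header octet from the inner one (note 7)."""
    outer = inner_octet & DSCP_MASK  # DS field copied, ECN field zeroed
    if ecn_tunnel_allowed and inner_octet & ECT_MASK:
        outer |= ECT_MASK            # ECT copied; CE stays 0
    return outer


def inner_octet_at_decapsulator(inner_octet: int, outer_octet: int,
                                ecn_tunnel_allowed: bool = False) -> int:
    """Return the inner header octet after decapsulation (note 8)."""
    if ecn_tunnel_allowed and inner_octet & ECT_MASK:
        # Logical OR of the inner and outer CE bits.
        return inner_octet | (outer_octet & CE_MASK)
    return inner_octet               # "forbidden": no change
```

   Defaulting ecn_tunnel_allowed to False reflects the rule that an
   implementation without the SAD field behaves as if the field were
   "forbidden" for every SA.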

1408	   For backwards compatibility, packets with ECT and CE both set to 1 in
1409	   the outer header SHOULD be dropped if they arrive on an SA that is
1410	   using the limited-functionality option, or that is using the full-
1411	   functionality option (i.e., has set the ECT flag in the outer
1412	   header to 1) for a packet with the ECT flag set to 0 in the inner
1413	   header.

1415	9.2.2.  Changes to the ECN Field within an IPsec Tunnel

1417	   If the ECN Field is changed inappropriately within an IPsec tunnel,
1418	   and this change is detected at the tunnel egress, then the receipt of
1419	   a packet not satisfying the appropriate condition for its SA is an
1420	   auditable event.  An implementation MAY create audit records with
1421	   per-SA counts of incorrect packets over some time period rather than
1422	   creating an audit record for each erroneous packet.  Any such audit
1423	   record SHOULD contain the headers from at least one erroneous packet,
1424	   but need not contain the headers from every packet represented by the
1425	   entry.
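
   One possible shape for such aggregated audit records is sketched
   below.  The class names, the use of the SPI as the per-SA key, and
   the flush method that ends a time period are all assumptions made
   for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuditEntry:
    count: int = 0                          # erroneous packets this period
    sample_headers: Optional[bytes] = None  # headers of one such packet


class EcnAuditLog:
    """Per-SA counts of packets whose ECN field failed the egress check."""

    def __init__(self) -> None:
        self._entries: Dict[int, AuditEntry] = defaultdict(AuditEntry)

    def record(self, spi: int, headers: bytes) -> None:
        """Count one erroneous packet; keep the first packet's headers."""
        entry = self._entries[spi]
        entry.count += 1
        if entry.sample_headers is None:
            entry.sample_headers = headers

    def flush(self) -> Dict[int, AuditEntry]:
        """Return the records for the elapsed period and start a new one."""
        records = dict(self._entries)
        self._entries = defaultdict(AuditEntry)
        return records
```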

1427	9.2.3.  Comments for IPsec Support

1429	   Substantial comments were received on two areas of this document dur-
1430	   ing review by the IPsec working group.  This section describes these
1431	   comments and explains why the proposed changes were not incorporated.

1433	   The first comment indicated that per-node configuration is easier to
1434	   implement than per-SA configuration.  After serious thought and
1435	   despite some initial encouragement of per-node configuration, it no
1436	   longer seems to be a good idea. The concern is that as ECN-awareness
1437	   is progressively deployed in IPsec, many ECN-aware IPsec implementa-
1438	   tions will find themselves communicating with a mixture of ECN-aware
1439	   and ECN-unaware IPsec tunnel endpoints.  In such an environment with
1440	   per-node configuration, the only reasonable thing to do is forbid ECN
1441	   usage for all IPsec tunnels, which is not the desired outcome.

1443	   In the second area, several reviewers noted that SA negotiation is
1444	   complex, and adding to it is non-trivial.  One reviewer suggested
1445	   using ICMP after tunnel setup as a possible alternative.  The addi-
1446	   tion to SA negotiation in this document is OPTIONAL and will remain
1447	   so; implementers are free to ignore it.  The authors believe that the
1448	   assurance it provides can be useful in a number of situations.  In
1449	   practice, if this is not implemented, it can be deleted at a subse-
1450	   quent stage in the standards process.  Extending ICMP to negotiate
1451	   ECN after tunnel setup is more complex than extending SA attribute
1452	   negotiation.  Some tunnels do not permit traffic to be addressed to
1453	   the tunnel egress endpoint, hence the ICMP packet would have to be
1454	   addressed to somewhere else, scanned for by the egress endpoint, and
1455	   discarded there or at its actual destination.  In addition, ICMP
1456	   delivery is unreliable, and hence there is a possibility of an ICMP
1457	   packet being dropped, entailing the invention of yet another
1458	   ack/retransmit mechanism.  It seems better simply to specify an
1459	   OPTIONAL extension to the existing SA negotiation mechanism.

1461	9.3.  IP packets encapsulated in non-IP packet headers

1463	   A different set of issues is raised, relative to ECN, when IP pack-
1464	   ets are encapsulated in tunnels with non-IP packet headers.  This
1465	   occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP].
1466	   For these protocols, there is no conflict with ECN; it is just that
1467	   ECN cannot be used within the tunnel unless an ECN codepoint can be
1468	   specified for the header of the encapsulating protocol.  Earlier work
1469	   considered a preliminary proposal for incorporating ECN into MPLS,
1470	   and proposals for incorporating ECN into GRE, L2TP, or PPTP will be
1471	   considered as the need arises.

1473	10.  Issues Raised by Monitoring and Policing Devices

1475	   One possibility is that monitoring and policing devices (or more
1476	   informally, "penalty boxes") will be installed in the network to mon-
1477	   itor whether best-effort flows are appropriately responding to con-
1478	   gestion, and to preferentially drop packets from flows determined not
1479	   to be using adequate end-to-end congestion control procedures.

1481	   We recommend that any "penalty box" that detects a flow or an aggre-
1482	   gate of flows that is not responding to end-to-end congestion control
1483	   first change from marking to dropping packets from that flow, before
1484	   taking any additional action to restrict the bandwidth available to
1485	   that flow.  Thus, initially, the router may drop packets in which the
1486	   router would otherwise have set the CE bit.  This could include
1487	   dropping those arriving packets for that flow that are ECN-Capable
1488	   and that already have the CE bit set.  In this way, any congestion
1489	   indications seen by that router for that flow will be guaranteed to
1490	   also be seen by the end nodes, even in the presence of malicious or
1491	   broken routers elsewhere in the path.  If we assume that the first
1492	   action taken at any "penalty box" for an ECN-capable flow will be to
1493	   drop packets instead of marking them, then there is no way that an
1494	   adversary that subverts ECN-based end-to-end congestion control can
1495	   cause a flow to be characterized as being non-cooperative and made
1496	   subject to a more severe action within the "penalty box".
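
   The recommended first step can be sketched as follows.  This is an
   illustration only, not part of the specification: the function name
   congestion_action and its arguments are hypothetical, and the sketch
   models nothing beyond the choice among forwarding, marking, and
   dropping a single packet.

```python
def congestion_action(ect, ce, congested, flow_flagged):
    """Choose what a router does with one arriving packet.

    ect, ce      -- ECN bits of the arriving packet (0 or 1)
    congested    -- True if the queue would signal congestion on this packet
    flow_flagged -- True if the "penalty box" judged the flow unresponsive
                    (hypothetical input; how that is decided is out of scope)
    Returns "forward", "mark" (set the CE bit), or "drop".
    """
    if flow_flagged and ect and ce:
        # Already-marked packets of a flagged flow may be dropped, so the
        # indication reaches the end nodes even if CE is erased downstream.
        return "drop"
    if not congested:
        return "forward"
    if ect and not flow_flagged:
        return "mark"
    # Not ECN-capable, or a flagged flow: switch from marking to dropping.
    return "drop"
```

   With this ordering, a flagged flow never has a packet marked by this
   router; every congestion indication the router generates for that
   flow is a drop.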

1498	   The monitoring and policing devices that are actually deployed could
1499	   fall short of the `ideal' monitoring device described above, in that
1500	   the monitoring is applied not to a single flow, but to an aggregate
1501	   of flows (e.g., those sharing a single IPsec tunnel).  In this case,
1502	   the switch from marking to dropping would apply to all of the flows
1503	   in that aggregate, denying the benefits of ECN to the other flows in
1504	   the aggregate also.  At the highest level of aggregation, another
1505	   form of the disabling of ECN happens even in the absence of monitor-
1506	   ing and policing devices, when ECN-Capable RED queues switch from
1507	   marking to dropping packets as an indication of congestion when the
1508	   average queue size has exceeded some threshold.

1510	   If there were serious operational problems with routers inappropri-
1511	   ately erasing the CE bit in packet headers, this could be addressed
1512	   to some extent by including a one-bit ECN nonce in packet headers.
1513	   Routers would erase the nonce when they set the CE bit [SCWA99].
1514	   Routers that erased the CE bit would face additional difficulty in
1515	   reconstructing the original nonce, and thus repeated erasure of the
1516	   CE bit would be more likely to be detected by the end-nodes.  (This
1517	   could in fact be done without adding any extra bits for ECN in the IP
1518	   header, by using the ECN codepoints (ECT=1, CE=0) and (ECT=0, CE=1)
1519	   as the two values for the nonce, and by defining the codepoint
1520	   (ECT=0, CE=1) to mean exactly the same as the codepoint (ECT=1,
1521	   CE=0).)  However, at this point the potential danger of misbehaving
1522	   routers does not seem of sufficient concern to warrant this addi-
1523	   tional complication of adding an ECN nonce to protect against the
1524	   erasure of the CE bit.  Additional research is also needed to better
1525	   understand the value of such a nonce and appropriate means of gener-
1526	   ating sequences of nonce values that an adversary will find suffi-
1527	   ciently difficult to reconstruct.

1529	   An ECN nonce would also address the problem of misbehaving transport
1530	   receivers lying to the transport sender about whether or not the CE
1531	   bit was set in a packet.  However, another possibility is for the
1532	   data sender to test for a misbehaving receiver directly, by occasion-
1533	   ally sending a data packet with ECT and CE set, to see if the
1534	   receiver reports receiving the CE bit.  Of course, if these packets
1535	   encountered congestion in the network, the router would make no
1536	   change in the packets, because the CE bit would already be set.
1537	   Thus, for packets sent with the ECT and CE bits set, the TCP end-
1538	   nodes could not determine if some router intended to set the CE bit
1539	   in these packets.  For this reason, sending packets with the ECT and
1540	   CE bits would have to be done very sparingly.  In addition, the TCP
1541	   sender would have to remember which packets were sent with the ECT
1542	   and CE bits set, so that it does not react to them as if there were
1543	   congestion in the network.  We believe that further research is
1544	   needed on possible transport-based mechanisms for verifying that the
1545	   transport receiver does not lie to the transport sender about the
1546	   receipt of congestion indications.

1548	11.  Evaluations of ECN

1550	   This section discusses some of the related work evaluating the use of
1551	   ECN.  The ECN Web Page [ECN] has pointers to other papers, as well as
1552	   to implementations of ECN.

1554	   [Floyd94] considers the advantages and drawbacks of adding ECN to the
1555	   TCP/IP architecture.  As shown in the simulation-based comparisons,
1556	   one advantage of ECN is to avoid unnecessary packet drops for short
1557	   or delay-sensitive TCP connections.  A second advantage of ECN is in
1558	   avoiding some unnecessary retransmit timeouts in TCP.  This paper
1559	   discusses in detail the integration of ECN into TCP's congestion con-
1560	   trol mechanisms.  The possible disadvantages of ECN discussed in the
1561	   paper are that a non-compliant TCP connection could falsely advertise
1562	   itself as ECN-capable, and that a TCP ACK packet carrying an ECN-Echo
1563	   message could itself be dropped in the network.  The first of these
1564	   two issues is discussed in the appendix of this document, and the
1565	   second is addressed by the addition of the CWR flag in the TCP
1566	   header.

1568	   Experimental evaluations of ECN include [RFC2884,K98].  The conclu-
1569	   sions of [K98] and [RFC2884] are that ECN TCP gets moderately better
1570	   throughput than non-ECN TCP; that ECN TCP flows are fair towards non-
1571	   ECN TCP flows; and that ECN TCP is robust with two-way traffic (with
1572	   congestion in both directions) and with multiple congested gateways.
1573	   Experiments with many short web transfers show that, while most of
1574	   the short connections have similar transfer times with or without
1575	   ECN, a small percentage of the short connections have very long
1576	   transfer times for the non-ECN experiments as compared to the ECN
1577	   experiments.

1579	12.  Summary of changes required in IP and TCP

1581	   This document specified two bits in the IP header, the ECN-Capable
1582	   Transport (ECT) bit and the Congestion Experienced (CE) bit, to be
1583	   used for ECN.  The ECT bit set to "0" indicates that the transport
1584	   protocol will ignore the CE bit.  This is the default value for the
1585	   ECT bit.  The ECT bit set to "1" indicates that the transport proto-
1586	   col is willing and able to participate in ECN.

1588	   The default value for the CE bit is "0".  The router sets the CE bit
1589	   to "1" to indicate congestion to the end nodes.  The CE bit in a
1590	   packet header MUST NOT be reset by a router from "1" to "0".

1592	   When viewed in terms of code points, this document has defined three
1593	   code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but
1594	   not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1).  The code
1595	   point of (ECT=0, CE=1) is not defined in this document.  One possi-
1596	   bility would be for this code point to be used, some time in the
1597	   future, for some other function for non-ECN-capable packets.  A sec-
1598	   ond possibility would be for this code point to be used as an ECN
1599	   nonce, as described earlier in the document.  A third possibility
1600	   would be for the code point (ECT=0, CE=1) to be used to indicate that
1601	   the packet is ECN-capable for an alternate semantics for the Conges-
1602	   tion Experienced indication.  However, at this time the code point
1603	   (ECT=0, CE=1) remains undefined.

1605	   TCP requires three changes for ECN: a setup phase and two new flags
1606	   in the TCP header. The ECN-Echo flag is used by the data receiver to
1607	   inform the data sender of a received CE packet.  The Congestion Win-
1608	   dow Reduced (CWR) flag is used by the data sender to inform the data
1609	   receiver that the congestion window has been reduced.
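
   The interplay of the two flags at the data receiver can be sketched
   as below.  The class and method names are hypothetical, and the
   sketch assumes that the receiver repeats ECN-Echo on every ACK until
   it sees CWR; this summary does not itself spell out that rule.

```python
class EcnEchoReceiver:
    """Tracks whether outgoing ACKs should carry the ECN-Echo flag."""

    def __init__(self):
        self.echoing = False

    def on_data_packet(self, ce, cwr):
        """Process one arriving data packet.

        ce  -- True if the IP header arrived with the CE bit set
        cwr -- True if the TCP header carried the CWR flag
        Returns the ECN-Echo flag to place on the corresponding ACK.
        """
        if cwr:
            # The sender reports that it has reduced its congestion window.
            self.echoing = False
        if ce:
            # A new congestion indication (re)starts the echo
            # (assumption: CE on the same packet takes effect after CWR).
            self.echoing = True
        return self.echoing
```

   A packet that carries both CWR and CE therefore ends one round of
   echoing and immediately begins another.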

1611	   When ECN (Explicit Congestion Notification [RFC2481]) is used, it is
1612	   required that congestion indications generated within an IP tunnel
1613	   not be lost at the tunnel egress.  We specified a minor modification
1614	   to the IP protocol's handling of the ECN field during encapsulation
1615	   and de-capsulation to allow flows that will undergo IP tunneling to
1616	   use ECN.

1618	   Two options for ECN in tunnels were specified:
1619	   1) A limited-functionality option that does not use ECN inside the IP
1620	   tunnel, by turning the ECT bit in the outer header off, and not
1621	   altering the inner header at the time of decapsulation.
1622	   2) The full-functionality option, which copies the ECT bit of the
1623	   inner header to the encapsulating header. At decapsulation, if the
1624	   ECT bit is set in the inner header, the CE bit on the outer header is
1625	   ORed with the CE bit of the inner header to update the CE bit of the
1626	   packet.
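
   The two options can be sketched as a pair of functions over
   (ECT, CE) tuples.  The function names are hypothetical, and the
   sketch assumes that the outer CE bit starts cleared at
   encapsulation, which the summary above does not state.

```python
def encapsulate(inner_ect, full_functionality):
    """Return the (ECT, CE) bits of the outer header at tunnel ingress."""
    # Full functionality copies the inner ECT bit; limited turns ECT off.
    outer_ect = inner_ect if full_functionality else 0
    # Assumption: the outer CE bit starts at 0.
    return (outer_ect, 0)


def decapsulate(inner, outer, full_functionality):
    """Return the (ECT, CE) bits of the packet leaving the tunnel egress."""
    inner_ect, inner_ce = inner
    _outer_ect, outer_ce = outer
    if full_functionality and inner_ect:
        # OR the outer CE bit into the inner CE bit.
        return (inner_ect, inner_ce | outer_ce)
    # Limited functionality, or inner packet not ECN-capable:
    # the inner header is left unaltered.
    return (inner_ect, inner_ce)
```

   For example, decapsulate((1, 0), (1, 1), True) yields (1, 1), so a
   congestion indication set inside the tunnel survives the egress.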

1628	   All IP tunnels MUST implement one of the two alternative approaches
1629	   described above.  For IPsec tunnels, this document also defines an
1630	   optional IPsec Security Association (SA) attribute that enables
1631	   negotiation of ECN usage within IPsec tunnels and an optional field
1632	   in the Security Association Database to indicate whether ECN is per-
1633	   mitted in tunnel mode on a SA.  The required changes to IPsec tunnels
1634	   for ECN usage modify RFC 2401 [RFC2401], which defines the IPsec
1635	   architecture and specifies some aspects of its implementation.  The
1636	   new IPsec SA attribute is in addition to those already defined in
1637	   Section 4.5 of [RFC2407].

1639	   This document is intended to obsolete RFC 2481, "A Proposal to add
1640	   Explicit Congestion Notification (ECN) to IP", which defined ECN as
1641	   an Experimental Protocol for the Internet Community.  The rest of
1642	   this section describes the relationship between this document and its
1643	   predecessor.

1645	   RFC 2481 included a brief discussion of the use of ECN with encapsu-
1646	   lated packets, and noted that for the IPsec specifications at the
1647	   time (January 1999), flows could not safely use ECN if they were to
1648	   traverse IPsec tunnels.  RFC 2481 also described the changes that
1649	   could be made to IPsec tunnel specifications to make them compatible
1650	   with ECN.

1652	   This document also incorporates work that was done after RFC 2481.
1653	   First was to describe the changes to IPsec tunnels in detail, and
1654	   extensively discuss the security implications of ECN (now included as
1655	   Sections 18 and 19 of this document).  Second was to extend the dis-
1656	   cussion of IPsec tunnels to include all IP tunnels.  Because older IP
1657	   tunnels are not compatible with a flow's use of ECN, the deployment
1658	   of ECN in the Internet will create strong pressure for older IP tun-
1659	   nels to be updated to an ECN-compatible version, using either the
1660	   limited-functionality or the full-functionality option.

1662	   This document does not address the issue of including ECN in non-IP
1663	   tunnels such as MPLS, GRE, L2TP, or PPTP.  An earlier preliminary
1664	   document about adding ECN support to MPLS was not advanced.

1666	   A third new piece of work after RFC 2481 was to describe the ECN
1667	   procedure with retransmitted data packets, namely that the ECT bit
1668	   should not be set on retransmitted data packets.  The motivation for
1669	   this additional specification is to eliminate a possible avenue for
1670	   denial-of-service attacks on an existing TCP connection.  Some prior
1671	   deployments of ECN-capable TCP might not conform to the (new)
1672	   requirement not to set the ECT bit on retransmitted packets; we do
1673	   not believe this will cause significant problems in practice.

1675	   This document also expands slightly on the specification of the use
1676	   of SYN packets for the negotiation of ECN.  While some prior deploy-
1677	   ments of ECN-capable TCP might not conform to the requirements speci-
1678	   fied in this document, we do not believe that this will lead to any
1679	   performance or compatibility problems for TCP connections with a com-
1680	   bination of TCP implementations at the endpoints.

1682	13.  Conclusions

1684	   Given the current effort to implement AQM, we believe this is the
1685	   right time to deploy congestion avoidance mechanisms that do not
1686	   depend on packet drops alone.  With the increased deployment of
1687	   applications and transports sensitive to the delay and loss of a sin-
1688	   gle packet (e.g., realtime traffic, short web transfers), depending
1689	   on packet loss as a normal congestion notification mechanism appears
1690	   to be insufficient (or at the very least, non-optimal).

1692	   We examined the consequence of modifications of the ECN field within
1693	   the network, analyzing all the opportunities for an adversary to
1694	   change the ECN field.  In many cases, the change to the ECN field is
1695	   no worse than dropping a packet. However, we noted that some changes
1696	   have the more serious consequence of subverting end-to-end congestion
1697	   control.  We point out, though, that even then the potential damage
1698	   is limited, and is similar to the threat posed by end-systems inten-
1699	   tionally failing to cooperate with end-to-end congestion control.

1701	14.  Acknowledgements

1703	   Many people have made contributions to this work and this document,
1704	   including many that we have not managed to directly acknowledge in
1705	   this document.  In addition, we would like to thank Kenjiro Cho for
1706	   the proposal for the TCP mechanism for negotiating ECN-Capability,
1707	   Kevin Fall for the proposal of the CWR bit, Steve Blake for material
1708	   on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus-
1709	   sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter,
1710	   Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis-
1711	   cussions of security issues.  We also thank the Internet End-to-End
1712	   Research Group for ongoing discussions of these issues.

1714	   Email discussions with a number of people, including Alexey
1715	   Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed
1716	   the issues raised by non-conformant equipment in the Internet that
1717	   does not respond to TCP SYN packets with the ECE and CWR flags set.
1718	   We thank Mark Handley, Jitendra Padhye, and others for discussions on
1719	   the TCP initialization procedures.

1721	   The discussion of ECN and IP tunnel considerations draws heavily on
1722	   related discussions and documents from the Differentiated Services
1723	   Working Group.  We thank Tabassum Bint Haque from Dhaka, Bangladesh,
1724	   for feedback on IP tunnels.  We thank Derrell Piper and Kero Tivinen
1725	   for proposing modifications to RFC 2407 that improve the usability of
1726	   negotiating the ECN Tunnel SA attribute.

1728	15.  References

1730	   [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402,
1731	   November 1998.

1733	   [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
1734	   Levels", BCP 14, RFC 2119, March 1997.

1736	   [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html".
1737	   Reference for informational purposes only.

1739	   [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload",
1740	   RFC 2406, November 1998.

1742	   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
1743	   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
1744	   N.4, August 1993, p.  397-413.

1746	   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
1747	   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

1749	   [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
1750	   URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
1751	   ecn.  Reference for informational purposes only.

1753	   [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con-
1754	   gestion Control in the Internet", IEEE/ACM Transactions on Network-
1755	   ing, August 1999.

1757	   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
1758	   SIGCOMM '97, September 1997.

1760	   [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing
1761	   Encapsulation (GRE), RFC 1701, October 1994.

1763	   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
1764	   ACM SIGCOMM '88, pp. 314-329.

1766	   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance Algo-
1767	   rithm", Message to end2end-interest mailing list, April 1990. URL
1768	   "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

1770	   [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
1771	   benefits for TCP", Master's thesis, UCLA, 1998, URL
1772	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

1774	   [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B.
1775	   Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999.

1777	   [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
1778	   Layered Multicast", SIGCOMM '96, August 1996, pp.  117-130.

1780	   [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus,
1781	   Requirements for Traffic Engineering Over MPLS, RFC 2702, September
1782	   1999.

1784	   [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W.
1785	   and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637,
1786	   July 1999.

1788	   [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
1789	   1981.

1791	   [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
1792	   September 1981.

1794	   [RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the
1795	   Internet Checksum", RFC 1141, January 1990.

1797	   [RFC1349] Almquist, P., "Type of Service in the Internet Protocol
1798	   Suite", RFC 1349, July 1992.

1800	   [RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC
1801	   1455, May 1993.

1803	   [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic
1804	   Routing Encapsulation (GRE), RFC 1701, October 1994.

1806	   [RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic
1807	   Routing Encapsulation over IPv4 networks, RFC 1702, October 1994.

1809	   [RFC2003]  Perkins, C., IP Encapsulation within IP, RFC 2003, October
1810	   1996.

1812	   [RFC2119] S. Bradner, Key words for use in RFCs to Indicate
1813	   Requirement Levels, RFC 2119, March 1997.

1815	   [RFC2309] Braden, B., et al., "Recommendations on Queue Management
1816	   and Congestion Avoidance in the Internet", RFC 2309, April 1998.

1818	   [RFC2401] S. Kent and R. Atkinson, Security Architecture for the
1819	   Internet Protocol, RFC 2401, November 1998.

1821	   [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation
1822	   for ISAKMP, RFC 2407, November 1998.

1824	   [RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner,
1825	   Internet Security Association and Key Management Protocol (ISAKMP),
1826	   RFC 2408, November 1998.

1828	   [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE),
1829	   RFC 2409, November 1998.

1831	   [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition
1832	   of the Differentiated Services Field (DS Field) in the IPv4 and IPv6
1833	   Headers", RFC 2474, December 1998.

1835	   [RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W.
1836	   Weiss, An Architecture for Differentiated Services, RFC 2475, Decem-
1837	   ber 1998.

1839	   [RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit
1840	   Congestion Notification (ECN) to IP, RFC 2481, January 1999.

1842	   [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control",
1843	   RFC 2581, April 1999.

1845	   [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation
1846	   of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884,
1847	   July 2000.

1849	   [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983,
1850	   October 2000.

1852	   [RFC2780] S. Bradner and V. Paxson, "IANA Allocation Guidelines For
1853	   Values In the Internet Protocol and Related Headers", RFC 2780, March
1854	   2000.

1856	   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
1857	   Congestion Avoidance in Computer Networks", ACM Transactions on Com-
1858	   puter Systems, Vol.8, No.2, pp.  158-181, May 1990.

1860	   [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom
1861	   Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM
1862	   Computer Communications Review, October 1999.

1864	16.  Security Considerations

1866	   Security considerations have been discussed in Sections 7, 8, 18, and
1867	   19.

1869	17.  IPv4 Header Checksum Recalculation

1871	   IPv4 header checksum recalculation is an issue with some high-end
1872	   router architectures using an output-buffered switch, since most if
1873	   not all of the header manipulation is performed on the input side of
1874	   the switch, while the ECN decision would need to be made local to the
1875	   output buffer. This is not an issue for IPv6, since there is no IPv6
1876	   header checksum. The IPv4 TOS octet is the last byte of a 16-bit
1877	   half-word.

1879	   RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
1880	   checksum after the TTL field is decremented.  The incremental updat-
1881	   ing of the IPv4 checksum after the CE bit was set would work as fol-
1882	   lows: Let HC be the original header checksum, and let HC' be the new
1883	   header checksum after the CE bit has been set.  Then for header
1884	   checksums calculated with one's complement subtraction, HC' would be
1885	   recalculated as follows:

1887	        HC' = { HC - 1     HC > 1
1888	              { 0x0000     HC = 1

1890	   For header checksums calculated on two's complement machines, HC'
1891	   would be recalculated as follows after the CE bit was set:

1893	        HC' = { HC - 1     HC > 0
1894	              { 0xFFFE     HC = 0
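
   The two's complement case can be checked against a full
   recomputation of the header checksum, as in the sketch below.  The
   function names, header field values, and addresses are illustrative
   only.  The sketch assumes the RFC 2481 bit layout, in which CE is
   the least-significant bit of the TOS octet, so that setting CE adds
   one to the first 16-bit half-word of the header.

```python
import struct


def header_checksum(header):
    """Full IPv4 checksum of a 20-byte header (checksum field taken as 0)."""
    words = list(struct.unpack("!10H", header))
    words[5] = 0
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def checksum_after_setting_ce(hc):
    """Incremental update for the two's complement case shown above."""
    return hc - 1 if hc > 0 else 0xFFFE


def make_header(tos, ident):
    """Build an illustrative 20-byte IPv4 header with a zero checksum field."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, tos, 40, ident, 0, 64, 6, 0,
                       bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))


# TOS 0x02 has ECT set and CE clear; TOS 0x03 has both set (assumed layout).
before = make_header(0x02, 0x1234)
after = make_header(0x03, 0x1234)
assert checksum_after_setting_ce(header_checksum(before)) == header_checksum(after)
```

   The HC = 0 branch matters only when the one's complement sum of the
   header is 0xFFFF; in that case checksum_after_setting_ce(0) returns
   0xFFFE, matching the second line of the formula.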

1896	18.  Possible Changes to the ECN Field in the Network

1898	   This section discusses in detail possible changes to the ECN field in
1899	   the network, such as falsely reporting congestion, disabling ECN-
1900	   Capability for an individual packet, erasing the ECN congestion indi-
1901	   cation, or falsely indicating ECN-Capability.  We represent the ECN
1902	   bits in the IP header by the tuple (ECT bit, CE bit).

1904	18.1.  Possible Changes to the IP Header

1906	18.1.1.  Erasing the Congestion Indication

1908	   First, we consider the changes that a router could make that would
1909	   result in effectively erasing the congestion indication after it had
1910	   been set by a router upstream.  The convention followed is:
1911	   (ECT, CE) of received packet -> (ECT, CE) of packet transmitted.

1913	   (1, 1) -> (1, 0): erase only the CE bit that was set.
1914	   (1, 1) -> (0, 0): erase both the ECT bit and the CE bit.
1915	   (1, 1) -> (0, 1): erase only the ECT bit.

1917	   The first change turns off the CE bit after it has been set by some
1918	   upstream router along the path.  The consequence for the upstream
1919	   router is that there is a potential for congestion to build for a
1920	   time, because the congestion indication does not reach the source.
1921	   However, the packet would be received and acknowledged.

1923	   The potential effect of erasing the congestion indication is complex,
1924	   and is discussed in depth in Section 19 below.  Note that the effect
1925	   of erasing the congestion indication is different from dropping a
1926	   packet in the network.  When a data packet is dropped, the drop is
1927	   detected by the TCP sender, and interpreted as an indication of con-
1928	   gestion.  Similarly, if a sufficient number of consecutive acknowl-
1929	   edgement packets are dropped, causing the cumulative acknowledgement
1930	   field not to be advanced at the sender, the sender is limited by the
1931	   congestion window from sending additional packets, and ultimately the
1932	   retransmit timer expires.

1934	   In contrast, a systematic erasure of the CE bit by a downstream
1935	   router can have the effect of causing a queue buildup at an upstream
1936	   router, including the possible loss of packets due to buffer over-
1937	   flow.  There is a potential of unfairness in that another flow that
1938	   goes through the congested router could react to the CE bit set while
1939	   the flow that has the CE bit erased could see better performance.
1940	   The limitations on this potential unfairness are discussed in more
1941	   detail in Section 19 below.

1943	   The second change is to turn off both the ECT and the CE bits, thus
1944	   erasing the congestion indication and disabling ECN-Capability at the
1945	   same time.  The third change turns off only the ECT bit, disabling
1946	   ECN-Capability.

1948	   Within an IP tunnel using the full-functionality option, the third
1949	   change would not erase the congestion indication, but would only dis-
1950	   able ECN-Capability for that packet within the rest of the tunnel.
1951	   However, when performed outside of an IP tunnel, the third change
1952	   would also effectively erase the congestion indication, because an
1953	   ECN field of (0, 1) is undefined.

1955	   The `erasure' of the congestion indication is only effective if the
1956	   packet does not end up being marked or dropped again by a downstream
1957	   router.  With the first change, the packet remains ECN-Capable, and
1958	   could be either marked or dropped by a downstream router as an indi-
1959	   cation of congestion.  With the second and third changes, the packet
1960	   is no longer ECN-capable, and can therefore be dropped but not marked
1961	   by a downstream router as an indication of congestion.

1963	18.1.2.  Falsely Reporting Congestion

1965	   (1, 0) -> (1, 1)

1967	   This change is to set the CE bit when the ECT bit was already set,
1968	   even though there was no congestion.  This change does not affect the
1969	   treatment of that packet along the rest of the path.  In particular,
1970	   a router does not examine the CE bit in deciding whether to drop or
1971	   mark an arriving packet.

1973	   However, this could result in the application unnecessarily invoking
1974	   end-to-end congestion control, and reducing its sending rate.  By
1975	   itself, this is no worse (for the application or for the network)
1976	   than if the tampering router had actually dropped the packet.

1978	18.1.3.  Disabling ECN-Capability

1980	   (1, 0) -> (0, *)

1982	   This change is to turn off the ECT bit of a packet that does not have
1983	   the CE bit set.  (Section 18.1.1 discussed the case of turning off
1984	   the ECT bit of a packet that does have the CE bit set.)  This means
1985	   that if the packet later encounters congestion (e.g., by arriving at
1986	   a RED queue with a moderate average queue size), it will be dropped
1987	   instead of being marked.  By itself, this is no worse (for the appli-
1988	   cation) than if the tampering router had actually dropped the packet.
1989	   The saving grace in this particular case is that there is no con-
1990	   gested router upstream expecting a reaction from setting the CE bit.

1992	18.1.4.  Falsely Indicating ECN-Capability

1993	   This change would incorrectly label a packet as ECN-Capable. The
1994	   packet may have been sent either by an ECN-Capable transport or a
1995	   transport that is not ECN-Capable.

1997	   (0, *) -> (1, 0);
1998	   (0, *) -> (1, 1);

2000	   If the packet later encounters moderate congestion at an ECN-Capable
2001	   router, the router could set the CE bit instead of dropping the
2002	   packet.  If the transport protocol in fact is not ECN-Capable, then
2003	   the transport will never receive this indication of congestion, and
2004	   will not reduce its sending rate in response.  The potential conse-
2005	   quences of falsely indicating ECN-capability are discussed further in
2006	   Section 19 below.

2008	   If the packet never later encounters congestion at an ECN-Capable
2009	   router, then the first of these two changes would have no effect.
2010	   The second change, however, would have the effect of giving false
2011	   reports of congestion to a monitoring device along the path.  If the
2012	   transport protocol is ECN-Capable, then the second of these two
2013	   changes (when, for example, (0,0) was changed to (1,1)) could also
2014	   have an effect at the transport level, by combining falsely indicat-
2015	   ing ECN-Capability with falsely reporting congestion.  For an ECN-
2016	   capable transport, this would cause the transport to unnecessarily
2017	   react to congestion.  In this particular case, the router that is
2018	   incorrectly changing the ECN field could have dropped the packet.
2019	   Thus for this case of an ECN-capable transport, the consequence of
2020	   this change to the ECN field is no worse than dropping the packet.

2022	18.1.5.  Changes with No Functional Effect

2024	   (0, *) -> (0, *)

2026	   The CE bit is ignored in a packet that does not have the ECT bit set.
2027	   Thus, this change would have no effect, in terms of ECN.
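
   The cases of Sections 18.1.1 through 18.1.5 can be collected into a
   single lookup, sketched below.  The function name and the returned
   labels are hypothetical; the tuples follow the (ECT, CE) convention
   used above.

```python
def classify_ecn_change(received, transmitted):
    """Name the Section 18.1 case covering an (ECT, CE) rewrite by a router."""
    if received == transmitted:
        return "unchanged"
    ect_in, ce_in = received
    ect_out, _ce_out = transmitted
    if not ect_in:
        if ect_out:
            return "18.1.4 falsely indicating ECN-Capability"
        # (0, *) -> (0, *): CE is ignored when ECT is not set.
        return "18.1.5 no functional effect"
    if ce_in:
        # (1, 1) -> anything else loses the congestion indication.
        return "18.1.1 erasing the congestion indication"
    if ect_out:
        return "18.1.2 falsely reporting congestion"
    return "18.1.3 disabling ECN-Capability"
```

   For instance, classify_ecn_change((1, 1), (0, 1)) falls under
   Section 18.1.1, because an ECN field of (0, 1) is undefined outside
   of a tunnel.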

2029	18.2.  Information carried in the Transport Header

2031	   For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
2032	   ECN-capable at the TCP level, conveying this information in the TCP
2033	   header at the time the connection is set up.  This document does not
2034	   consider potential dangers introduced by changes in the transport
2035	   header within the network.  In the case of IPsec tunnels, the IPsec
2036	   tunnel protects the transport header.

2038	   Another issue concerns TCP packets with a spoofed IP source address
2039	   carrying invalid ECN information in the transport header.  For com-
2040	   pleteness, we examine here some possible ways that a node spoofing
2041	   the IP source address of another node could use the two ECN flags in
2042	   the TCP header to launch a denial-of-service attack. However, these
2043	   attacks would require an ability for the attacker to use valid TCP
2044	   sequence numbers, and any attacker with this ability and with the
2045	   ability to spoof IP source addresses could damage the TCP connection
2046	   without using the ECN flags.  Therefore, ECN does not add any new
2047	   vulnerabilities in this respect.

2049	   An acknowledgement packet with a spoofed IP source address of the TCP
2050	   data receiver could include the ECE bit set.  If accepted by the TCP
2051	   data sender as a valid packet, this spoofed acknowledgement packet
2052	   could result in the TCP data sender unnecessarily halving its conges-
2053	   tion window.  However, to be accepted by the data sender, such a
2054	   spoofed acknowledgement packet would have to have the correct 32-bit
2055	   sequence number as well as a valid acknowledgement number.  An
2056	   attacker that could successfully send such a spoofed acknowledgement
2057	   packet could also send a spoofed RST packet, or do other equally dam-
2058	   aging operations to the TCP connection.

2060	   Packets with a spoofed IP source address of the TCP data sender could
2061	   include the CWR bit set.  Again, to be accepted, such a packet would
2062	   have to have a valid sequence number.  In addition, such a spoofed
2063	   packet would have a limited performance impact.  Spoofing a data
2064	   packet with the CWR bit set could result in the TCP data receiver
2065	   sending fewer ECE packets than it would otherwise, if the data
2066	   receiver was sending ECE packets when it received the spoofed CWR
2067	   packet.

2069	18.3.  Split Paths

2071	   In some cases, a malicious or broken router might have access to only
2072	   a subset of the packets from a flow.  The question is as follows:
2073	   can this router, by altering the ECN field in this subset of the
2074	   packets, do more damage to that flow than if it had simply dropped
2075	   that set of packets?

2077	   We will classify the packets in the flow as A packets and B packets,
2078	   and assume that the adversary only has access to A packets.  Assume
2079	   that the adversary is subverting end-to-end congestion control along
2080	   the path traveled by A packets only, by either falsely indicating
2081	   ECN-Capability upstream of the point where congestion occurs, or
2082	   erasing the congestion indication downstream.  Consider also that
2083	   there exists a monitoring device that sees both the A and B packets,
2084	   and will "punish" both the A and B packets if the total flow is
2085	   determined not to be properly responding to indications of conges-
2086	   tion.  Another key characteristic that we believe is likely to be
2087	   true is that the monitoring device, before `punishing' the A&B flow,
2088	   will first drop packets instead of setting the CE bit, and will drop
2089	   arriving packets of that flow that already have the ECT and CE bits
2090	   set.  If the end nodes are in fact using end-to-end congestion con-
2091	   trol, they will see all of the indications of congestion seen by the
2092	   monitoring device, and will begin to respond to these indications of
2093	   congestion. Thus, the monitoring device is successful in providing
2094	   the indications to the flow at an early stage.

2096	   It is true that the adversary that has access only to the A packets
2097	   might, by subverting ECN-based congestion control, be able to deny
2098	   the benefits of ECN to the other packets in the A&B aggregate.  While
2099	   this is unfortunate, this is not a reason to disable ECN within an
2100	   IPsec tunnel.

2102	   A variant of falsely reporting congestion occurs when there are two
2103	   adversaries along a path, where the first adversary falsely reports
2104	   congestion, and the second adversary `erases' those reports. (Unlike
2105	   packet drops, ECN congestion reports can be `reversed' later in the
2106	   network by a malicious or broken router.)  While this would be trans-
2107	   parent to the end node, it is possible that a monitoring device
2108	   between the first and second adversaries would see the false indica-
2109	   tions of congestion.  Keep in mind our recommendation in this docu-
2110	   ment, that before `punishing' a flow for not responding appropriately
2111	   to congestion, the router will first switch to dropping rather than
2112	   marking as an indication of congestion, for that flow.  When this
2113	   includes dropping arriving packets from that flow that have the CE
2114	   bit set, this ensures that these indications of congestion are being
2115	   seen by the end nodes.  Thus, there is no additional harm that we are
2116	   able to postulate as a result of multiple conflicting adversaries.

2118	19.  Implications of Subverting End-to-End Congestion Control

2120	   This section focuses on the potential repercussions of subverting
2121	   end-to-end congestion control by either falsely indicating ECN-Capa-
2122	   bility, or by erasing the congestion indication in ECN (the CE-bit).
2123	   Subverting end-to-end congestion control by either of these two meth-
2124	   ods can have consequences both for the application and for the net-
2125	   work.  We discuss these separately below.

2127	   The first method to subvert end-to-end congestion control, that of
2128	   falsely indicating ECN-Capability, effectively subverts end-to-end
2129	   congestion control only if the packet later encounters congestion
2130	   that results in the setting of the CE bit.  In this case, the trans-
2131	   port protocol (which may not be ECN-capable) does not receive the
2132	   indication of congestion from these downstream congested routers.

2134	   The second method to subvert end-to-end congestion control, `erasing'
2135	   the (set) CE bit in a packet, effectively subverts end-to-end conges-
2136	   tion control only when the CE bit in the packet was set earlier by a
2137	   congested router.  In this case, the transport protocol does not
2138	   receive the indication of congestion from the upstream congested
2139	   routers.

2141	   Either of these two methods of subverting end-to-end congestion con-
2142	   trol can potentially introduce more damage to the network (and possi-
2143	   bly to the flow itself) than if the adversary had simply dropped
2144	   packets from that flow.  However, as we discuss later in this section
2145	   and in Section 7, this potential damage is limited.

2147	19.1.  Implications for the Network and for Competing Flows

2149	   The CE bit of the ECN field is only used by routers as an indication
2150	   of congestion during periods of *moderate* congestion.  ECN-capable
2151	   routers should drop rather than mark packets during heavy congestion
2152	   even if the router's queue is not yet full.  For example, for routers
2153	   using active queue management based on RED, the router should drop
2154	   rather than mark packets that arrive while the average queue size
2155	   exceeds the RED queue's maximum threshold.
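
   The marking-versus-dropping rule above can be sketched as follows.
   This is a minimal illustration, not part of the specification: the
   names red_ecn_action, min_th, max_th and selected are hypothetical,
   and the sketch assumes a classic RED queue that discards every
   arrival once the average queue size exceeds the maximum threshold.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"  # enqueue the packet unchanged
    MARK = "mark"        # set the CE bit and enqueue
    DROP = "drop"        # discard the packet


def red_ecn_action(avg_queue, min_th, max_th, ect, selected):
    """Decide what a hypothetical RED queue does with one arriving packet.

    avg_queue      -- current average queue size
    min_th, max_th -- RED minimum and maximum thresholds (assumed names)
    ect            -- True if the packet's ECT bit is set
    selected       -- True if RED's probabilistic test picked this packet
    """
    if avg_queue > max_th:
        # Heavy congestion: drop rather than mark, even for ECT packets.
        return Action.DROP
    if avg_queue < min_th or not selected:
        # No congestion signal is needed for this packet.
        return Action.FORWARD
    # Moderate congestion: mark ECN-capable packets, drop the others.
    return Action.MARK if ect else Action.DROP
```

   Passing the outcome of RED's probabilistic test in as "selected"
   keeps the sketch deterministic; a real queue would compute it from
   the average queue size.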

2157	   One consequence for the network of subverting end-to-end congestion
2158	   control is that flows that do not receive the congestion indications
2159	   from the network might increase their sending rate until they drive
2160	   the network into heavier congestion.  Then, the congested router
2161	   could begin to drop rather than mark arriving packets.  For flows
2162	   that are not isolated by some form of per-flow scheduling or other
2163	   per-flow mechanisms, but are instead aggregated with other flows in a
2164	   single queue in an undifferentiated fashion, this packet-dropping at
2165	   the congested router would apply to all flows that share that queue.
2166	   Thus, the consequences would be to increase the level of congestion
2167	   in the network.

2169	   In some cases, the increase in the level of congestion will lead to a
2170	   substantial buffer buildup at the congested queue that will be suffi-
2171	   cient to drive the congested queue from the packet-marking to the
2172	   packet-dropping regime.  This transition could occur either because
2173	   of buffer overflow, or because of the active queue management policy
2174	   described above that drops packets when the average queue is above
2175	   RED's maximum threshold.  At this point, all flows, including the
2176	   subverted flow, will begin to see packet drops instead of packet
2177	   marks, and a malicious or broken router will no longer be able to
2178	   `erase' these indications of congestion in the network.  If the end
2179	   nodes are deploying appropriate end-to-end congestion control, then
2180	   the subverted flow will reduce its arrival rate in response to con-
2181	   gestion.  When the level of congestion is sufficiently reduced, the
2182	   congested queue can return from the packet-dropping regime to the
2183	   packet-marking regime.  The steady-state pattern could be one of the
2184	   congested queue oscillating between these two regimes.

2186	   In other cases, the consequences of subverting end-to-end congestion
2187	   control will not be severe enough to drive the congested link into
2188	   sufficiently-heavy congestion that packets are dropped instead of
2189	   being marked.  In this case, the implications for competing flows in
2190	   the network will be a slightly-increased rate of packet marking or
2191	   dropping, and a corresponding decrease in the bandwidth available to
2192	   those flows.  This can be a stable state if the arrival rate of the
2193	   subverted flow is sufficiently small, relative to the link bandwidth,
2194	   that the average queue size at the congested router remains under
2195	   control.  In particular, the subverted flow could have a limited
2196	   bandwidth demand on the link at this router, while still getting more
2197	   than its "fair" share of the link.  This limited demand could be due
2198	   to a limited demand from the data source; a limitation from the TCP
2199	   advertised window; a lower-bandwidth access pipe; or other factors.
2200	   Thus the subversion of ECN-based congestion control can still lead to
2201	   unfairness, which we believe is appropriate to note here.

2203	   The threat to the network posed by the subversion of ECN-based con-
2204	   gestion control in the network is essentially the same as the threat
2205	   posed by an end-system that intentionally fails to cooperate with
2206	   end-to-end congestion control.  The deployment of mechanisms in
2207	   routers to address this threat is an open research question, and is
2208	   discussed further in Section 10.

2210	   Let us take the example described in Section 18.1.1, where the CE bit
2211	   that was set in a packet is erased: {(1, 1) -> (1, 0)}.  The conse-
2212	   quence for the congested upstream router that set the CE bit is that
2213	   this congestion indication does not reach the end nodes for that
2214	   flow. The source (even one which is completely cooperative and not
2215	   malicious) is thus allowed to continue to increase its sending rate
2216	   (if it is a TCP flow, by increasing its congestion window).  The flow
2217	   potentially achieves better throughput than the other flows that also
2218	   share the congested router, especially if there are no policing mech-
2219	   anisms or per-flow queueing mechanisms at that router.  Consider the
2220	   behavior of the other flows, especially if they are cooperative: that
2221	   is, the flows that do not experience subverted end-to-end congestion
2222	   control.  They are likely to reduce their load (e.g., by reducing
2223	   their window size) on the congested router, thus benefiting our sub-
2224	   verted flow. This results in unfairness.  As we discussed above, this
2225	   unfairness could either be transient (because the congested queue is
2226	   driven into the packet-dropping regime), oscillatory (because the
2227	   congested queue oscillates between the packet marking and the packet
2228	   dropping regime), or more moderate but a persistent stable state
2229	   (because the congested queue is never driven to the packet dropping
2230	   regime).

2232	   The results would be similar if the subverted flow was intentionally
2233	   avoiding end-to-end congestion control.  One difference is that a
2234	   flow that is intentionally avoiding end-to-end congestion control at
2235	   the end nodes can avoid end-to-end congestion control even when the
2236	   congested queue is in packet-dropping mode, by refusing to reduce its
2237	   sending rate in response to packet drops in the network.  Thus the
2238	   problems for the network from the subversion of ECN-based congestion
2239	   control are less severe than the problems caused by the intentional
2240	   avoidance of end-to-end congestion control in the end nodes.  It is
2241	   also the case that it is considerably more difficult to control the
2242	   behavior of the end nodes than it is to control the behavior of the
2243	   infrastructure itself.  This is not to say that the problems for the
2244	   network posed by the network's subversion of ECN-based congestion
2245	   control are small; just that they are dwarfed by the problems for the
2246	   network posed by the subversion of either ECN-based or other cur-
2247	   rently known packet-based congestion control mechanisms by the end
2248	   nodes.

2250	19.2.  Implications for the Subverted Flow

2252	   When a source indicates that it is ECN-capable, there is an expecta-
2253	   tion that the routers in the network that are capable of participat-
2254	   ing in ECN will use the CE bit for indication of congestion. There is
2255	   the potential benefit of using ECN in reducing the amount of packet
2256	   loss (in addition to the reduced queueing delays because of active
2257	   queue management policies).  When the packet flows through a tunnel
2258	   where the nodes that the tunneled packets traverse are untrusted in
2259	   some way, the expectation is that IPsec will protect the flow from
2260	   subversion that results in undesirable consequences.

2262	   In many cases, a subverted flow will benefit from the subversion of
2263	   end-to-end congestion control for that flow in the network, by
2264	   receiving more bandwidth than it would have otherwise, relative to
2265	   competing non-subverted flows.  If the congested queue reaches the
2266	   packet-dropping stage, then the subversion of end-to-end congestion
2267	   control might or might not be of overall benefit to the subverted
2268	   flow, depending on that flow's relative tradeoffs between throughput,
2269	   loss, and delay.

2271	   One form of subverting end-to-end congestion control is to falsely
2272	   indicate ECN-capability by setting the ECT bit.  This has the conse-
2273	   quence of downstream congested routers setting the CE bit in vain.
2274	   However, as described in Section 9.1.2, if the ECT bit is changed in
2275	   an IP tunnel, this can be detected at the egress point of the tunnel,
2276	   as long as the inner header was not changed within the tunnel.

2278	   The second form of subverting end-to-end congestion control is to
2279	   erase the congestion indication, either by erasing the CE bit
2280	   directly, or by erasing the ECT bit when the CE bit is already set.
2281	   In this case, it is the upstream congested routers that set the CE
2282	   bit in vain.

2284	   If the ECT bit is erased within an IP tunnel, then this can be
2285	   detected at the egress point of the tunnel, as long as the inner
2286	   header was not changed within the tunnel.  If the CE bit is set
2287	   upstream of the IP tunnel, then any erasure of the outer header's CE
2288	   bit within the tunnel will have no effect because the inner header
2289	   preserves the set value of the CE bit.  However, if the CE bit is set
2290	   within the tunnel, and erased either within or downstream of the tun-
2291	   nel, this is not necessarily detected at the egress point of the tun-
2292	   nel.

2294	   With this subversion of end-to-end congestion control, an end-system
2295	   transport does not respond to the congestion indication.  Along with
2296	   the increased unfairness for the non-subverted flows described in the
2297	   previous section, the congested router's queue could continue to
2298	   build, resulting in packet loss at the congested router - which is a
2299	   means for indicating congestion to the transport in any case.  In the
2300	   interim, the flow might experience higher queueing delays, possibly
2301	   along with an increased bandwidth relative to other non-subverted
2302	   flows.  But transports do not inherently make assumptions of consis-
2303	   tently experiencing carefully managed queueing in the path.  We
2304	   believe that these forms of subverting end-to-end congestion control
2305	   are no worse for the subverted flow than if the adversary had simply
2306	   dropped the packets of that flow itself.

2308	19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion Control

2310	   We have shown that, in many cases, a malicious or broken router that
2311	   is able to change the bits in the ECN field can do no more damage
2312	   than if it had simply dropped the packet in question.  However, this
2313	   is not true in all cases, in particular in the cases where the broken
2314	   router subverted end-to-end congestion control by either falsely
2315	   indicating ECN-Capability or by erasing the ECN congestion indication
2316	   (in the CE-bit).  While there are many ways that a router can harm a
2317	   flow by dropping packets, a router cannot subvert end-to-end conges-
2318	   tion control by dropping packets.  As an example, a router cannot
2319	   subvert TCP congestion control by dropping data packets, acknowledge-
2320	   ment packets, or control packets.

2322	   Even though packet-dropping cannot be used to subvert end-to-end con-
2323	   gestion control, there *are* non-ECN-based methods for subverting
2324	   end-to-end congestion control that a broken or malicious router could
2325	   use.  For example, a broken router could duplicate data packets, thus
2326	   effectively negating the effects of end-to-end congestion control
2327	   along some portion of the path.  (For a router that duplicated pack-
2328	   ets within an IPsec tunnel, the security administrator can cause the
2329	   duplicate packets to be discarded by configuring anti-replay protec-
2330	   tion for the tunnel.)  This duplication of packets within the network
2331	   would have similar implications for the network and for the subverted
2332	   flow as those described in Sections 18.1.1 and 18.1.4 above.

2334	20.  The Motivation for the ECT bit

2336	   The need for the ECT bit is motivated by the fact that ECN will be
2337	   deployed incrementally in an Internet where some transport protocols
2338	   and routers understand ECN and some do not. With the ECT bit, the
2339	   router can drop packets from flows that are not ECN-capable, but can
2340	   *instead* set the CE bit in packets that *are* ECN-capable. Because
2341	   the ECT bit allows an end node to have the CE bit set in a packet
2342	   *instead* of having the packet dropped, an end node might have some
2343	   incentive to deploy ECN.

2345	   If there was no ECT indication, then the router would have to set the
2346	   CE bit for packets from both ECN-capable and non-ECN-capable flows.
2347	   In this case, there would be no incentive for end-nodes to deploy
2348	   ECN, and no viable path of incremental deployment from a non-ECN
2349	   world to an ECN-capable world.  Consider the first stages of such an
2350	   incremental deployment, where a subset of the flows are ECN-capable.
2351	   At the onset of congestion, when the packet dropping/marking rate
2352	   would be low, routers would only set CE bits, rather than dropping
2353	   packets.  However, only those flows that are ECN-capable would under-
2354	   stand and respond to CE packets. The result is that the ECN-capable
2355	   flows would back off, and the non-ECN-capable flows would be unaware
2356	   of the ECN signals and would continue to open their congestion win-
2357	   dows.

2359	   In this case, there are two possible outcomes: (1) the ECN-capable
2360	   flows back off, the non-ECN-capable flows get all of the bandwidth,
2361	   and congestion remains mild, or (2) the ECN-capable flows back off,
2362	   the non-ECN-capable flows don't, and congestion increases until the
2363	   router transitions from setting the CE bit to dropping packets.
2364	   While this second outcome evens out the fairness, the ECN-capable
2365	   flows would still receive little benefit from being ECN-capable,
2366	   because the increased congestion would drive the router to packet-
2367	   dropping behavior.

2369	   A flow that advertises itself as ECN-Capable but does not respond to
2370	   CE bits is functionally equivalent to a flow that turns off conges-
2371	   tion control, as discussed earlier in this document.

2373	   Thus, in a world where a subset of the flows are ECN-capable, but
2374	   where ECN-capable flows have no mechanism for indicating that fact to
2375	   the routers, there would be less effective and less fair congestion
2376	   control in the Internet, resulting in a strong incentive for end
2377	   nodes not to deploy ECN.

2379	21.  Why Use Two Bits in the IP Header?

2381	   Given the need for an ECT indication in the IP header, there still
2382	   remains the question of whether the ECT (ECN-Capable Transport) and
2383	   CE (Congestion Experienced) indications should have been overloaded
2384	   on a single bit.  This overloaded-one-bit alternative, explored in
2385	   [Floyd94], would have involved a single bit with two values.  One
2386	   value, "ECT and not CE", would represent an ECN-Capable Transport,
2387	   and the other value, "CE or not ECT", would represent either Conges-
2388	   tion Experienced or a non-ECN-Capable transport.

2390	   One difference between the one-bit and two-bit implementations con-
2391	   cerns packets that traverse multiple congested routers.  Consider a
2392	   CE packet that arrives at a second congested router, and is selected
2393	   by the active queue management at that router for either marking or
2394	   dropping.  In the one-bit implementation, the second congested router
2395	   has no choice but to drop the CE packet, because it cannot distin-
2396	   guish between a CE packet and a non-ECT packet.  In the two-bit
2397	   implementation, the second congested router has the choice of either
2398	   dropping the CE packet, or of leaving it alone with the CE bit set.
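
   The difference at a second congested router can be sketched as
   follows.  This is an illustrative sketch only: the function names
   and the drop_marked policy flag are hypothetical, and each function
   is assumed to be called only for a packet that the router's active
   queue management has already selected for marking or dropping.

```python
def one_bit_action(ect_and_not_ce: bool) -> str:
    """Overloaded single bit: True means "ECT and not CE",
    False means "CE or not ECT"."""
    # A CE packet is indistinguishable from a non-ECT packet,
    # so the router has no choice but to drop it.
    return "mark" if ect_and_not_ce else "drop"


def two_bit_action(ect: bool, ce: bool, drop_marked: bool = False) -> str:
    """Separate ECT and CE bits."""
    if not ect:
        return "drop"  # non-ECN-capable transport
    if ce:
        # Already marked upstream: the router may drop the packet
        # or leave it alone with the CE bit set.
        return "drop" if drop_marked else "leave"
    return "mark"  # first congested router sets the CE bit
```

   For a packet that was marked upstream, one_bit_action(False)
   returns "drop", while two_bit_action(True, True) returns "leave".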

2400	   Another difference between the one-bit and two-bit implementations
2401	   comes from the fact that with the one-bit implementation, receivers
2402	   in a single flow cannot distinguish between CE and non-ECT packets.
2403	   Thus, in the one-bit implementation an ECN-capable data sender would
2404	   have to unambiguously indicate to the receiver or receivers whether
2405	   each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
2406	   possibility would be for the sender to indicate in the transport
2407	   header whether the packet was sent as ECN-Capable.  A second possi-
2408	   bility that would involve a functional limitation for the one-bit
2409	   implementation would be for the sender to unambiguously indicate that
2410	   it was going to send *all* of its packets as ECN-Capable or as non-
2411	   ECN-Capable.  For a multicast transport protocol, this unambiguous
2412	   indication would have to be apparent to receivers joining an on-going
2413	   multicast session.

2415	   Another consideration is the recommendation, made earlier in this
2416	   document, that transports (particularly TCP) should not mark pure ACK
2417	   packets or retransmitted packets as being ECN-Capable.  A pure
2418	   ACK packet from a non-ECN-capable transport could be dropped, without
2419	   necessarily having an impact on the transport from a congestion con-
2420	   trol perspective (because subsequent ACKs are cumulative).  An ECN-
2421	   capable transport reacting to the CE bit set in a pure ACK packet by
2422	   reducing the window would be at a disadvantage in comparison to a
2423	   non-ECN-capable transport. For this reason (and for reasons described
2424	   earlier in relation to retransmitted packets), it is desirable to
2425	   have the ECN-Capable bit indication on a per-packet basis.

2427	   Another advantage of the two-bit approach is that it is somewhat more
2428	   robust.  The most critical issue, discussed in Section 8, is that the
2429	   default indication should be that of a non-ECN-Capable transport.  In
2430	   a two-bit implementation, this requirement for the default value sim-
2431	   ply means that the ECT bit should be `OFF' by default.  In the one-
2432	   bit implementation, this means that the single overloaded bit should
2433	   by default be in the "CE or not ECT" position.  This is less clear
2434	   and straightforward, and possibly more open to incorrect implementa-
2435	   tions either in the end nodes or in the routers.

2437	   In summary, while the one-bit implementation could be a possible
2438	   implementation, it has the following significant limitations relative
2439	   to the two-bit implementation.  First, the one-bit implementation has
2440	   more limited functionality for the treatment of CE packets at a sec-
2441	   ond congested router.  Second, the one-bit implementation requires
2442	   either that extra information be carried in the transport header of
2443	   packets from ECN-Capable flows (to convey the functionality of the
2444	   second bit elsewhere, namely in the transport header), or that
2445	   senders in ECN-Capable flows accept the limitation that receivers
2446	   must be able to determine a priori which packets are ECN-Capable and
2447	   which are not ECN-Capable. Third, the one-bit implementation is pos-
2448	   sibly more open to errors from faulty implementations that choose the
2449	   wrong default value for the ECN bit.  We believe that the use of the
2450	   extra bit in the IP header for the ECT-bit is extremely valuable to
2451	   overcome these limitations.

2453	22.  Historical Definitions for the IPv4 TOS Octet

2455	   RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
2456	   header.  In RFC 791, bits 6 and 7 of the ToS octet are listed as
2457	   "Reserved for Future Use", and are shown set to zero.  The first two
2458	   fields of the ToS octet were defined as the Precedence and Type of
2459	   Service (TOS) fields.

2461	            0     1     2     3     4     5     6     7
2462	         +-----+-----+-----+-----+-----+-----+-----+-----+
2463	         |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
2464	         +-----+-----+-----+-----+-----+-----+-----+-----+

2466	   RFC 1122 included bits 6 and 7 in the TOS field, though it did not
2467	   discuss any specific use for those two bits:

2469	            0     1     2     3     4     5     6     7
2470	         +-----+-----+-----+-----+-----+-----+-----+-----+
2471	         |   PRECEDENCE    |       TOS                   |  RFC 1122
2472	         +-----+-----+-----+-----+-----+-----+-----+-----+

2474	   The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

2476	            0     1     2     3     4     5     6     7
2477	         +-----+-----+-----+-----+-----+-----+-----+-----+
2478	         |   PRECEDENCE    |       TOS             | MBZ |  RFC 1349
2479	         +-----+-----+-----+-----+-----+-----+-----+-----+

2481	   Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
2482	   Cost".  In addition to the Precedence and Type of Service (TOS)
2483	   fields, the last field, MBZ (for "must be zero") was defined as cur-
2484	   rently unused.  RFC 1349 stated that "The originator of a datagram
2485	   sets [the MBZ] field to zero (unless participating in an Internet
2486	   protocol experiment which makes use of that bit)."

2488	   RFC 1455 [RFC1455] defined an experimental standard that used all
2489	   four bits in the TOS field to request a guaranteed level of link
2490	   security.

2492	   RFC 1349 and RFC 1455 have been obsoleted by "Definition of the Dif-
2493	   ferentiated Services Field (DS Field) in the IPv4 and IPv6 Headers"
2494	   [RFC2474] in which bits 6 and 7 of the DS field are listed as Cur-
2495	   rently Unused (CU).  RFC 2780 [RFC2780] specified ECN as an experi-
2496	   mental use of the two-bit CU field.  RFC 2780 updated the definition
2497	   of the DS Field to only encompass the first six bits of this octet
2498	   rather than all eight bits; these first six bits are defined as the
2499	   Differentiated Services CodePoint (DSCP):

2501	            0     1     2     3     4     5     6     7
2502	         +-----+-----+-----+-----+-----+-----+-----+-----+
2503	         |               DSCP                |    CU     |  RFCs 2474,
2505	         +-----+-----+-----+-----+-----+-----+-----+-----+    2780
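
   As a sketch of the layout above, the octet can be split into the
   six-bit DSCP and the two ECN bits.  The bit positions follow this
   section (bits are numbered from the most significant bit, with bit 6
   as the ECT bit and bit 7 as the CE bit); the function and constant
   names are hypothetical.

```python
ECT_MASK = 0x02  # bit 6 of the octet (ECT)
CE_MASK = 0x01   # bit 7 of the octet (CE)


def split_ds_octet(octet: int):
    """Return (dscp, ect, ce) for the former IPv4 TOS octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> 2             # bits 0-5
    ect = bool(octet & ECT_MASK)  # bit 6
    ce = bool(octet & CE_MASK)    # bit 7
    return dscp, ect, ce
```

   For example, split_ds_octet(0b10111011) yields a DSCP of 46 with
   both the ECT and CE bits set.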

2507	   Because of this unstable history, the definition of the ECN field in
2508	   this document cannot be guaranteed to be backwards compatible with
2509	   all past uses of these two bits.

2511	   Prior to RFC 2474, routers were not permitted to modify bits in
2512	   either the DSCP or ECN field of packets forwarded through them, and
2513	   hence routers that comply only with RFCs prior to 2474 should have no
2514	   effect on ECN.  For end nodes, bit 7 (the ECN CE bit) must be trans-
2515	   mitted as zero for any implementation compliant only with RFCs prior
2516	   to 2474.  Such nodes may transmit bit 6 (the ECN ECT bit) as one for
2517	   the "Minimize Monetary Cost" provision of RFC 1349 or the experiment
2518	   authorized by RFC 1455; neither this aspect of RFC 1349 nor the
2519	   experiment in RFC 1455 was widely implemented or used.  The damage
2520	   that could be done by a broken, non-conformant router would be to
2521	   "erase" the CE bit for an ECN-capable packet that arrived at the
2522	   router with the CE bit set, or set the CE bit even in the absence of
2523	   congestion.  This has been discussed in the section on "Non-compli-
2524	   ance in the Network".

2526	   The damage that could be done in an ECN-capable environment by a non-
2527	   ECN-capable end-node transmitting packets with the ECT bit set has
2528	   been discussed in the section on "Non-compliance by the End Nodes".

2530	23.  IANA Considerations

2532	   The bits for ECT and CE in the ECN Field of the IP header and the
2533	   bits for CWR and ECE in the TCP header are specified by the Standards
2534	   Action of this RFC, as is required by RFC 2780.  We would note that
2535	   this RFC does not define the codepoint of (ECT=0, CE=1) for the ECT
2536	   and CE bits.

2538	   IANA allocated the IPSEC Security Association Attribute value 10 for
2539	   the ECN Tunnel use described in Section 9.2.1.2 above at the request
2540	   of David Black in November 1999.  If this draft is approved for pub-
2541	   lication as an RFC, IANA should change the Reference for this alloca-
2542	   tion from David Black's request to this RFC based on its RFC number.

2544	   AUTHORS' ADDRESSES

2546	      K. K. Ramakrishnan
2547	      TeraOptic Networks, Inc.
2548	      Phone: +1 (408) 666-8650
2549	      Email: kk@teraoptic.com

2551	      Sally Floyd
2552	      Phone: +1 (510) 666-2989
2553	      ACIRI
2554	      Email: floyd@aciri.org
2555	      URL: http://www.aciri.org/floyd/

2557	      David L. Black
2558	      EMC Corporation
2559	      42 South St.
2560	      Hopkinton, MA  01748
2561	      Phone:  +1 (508) 435-1000 x75140
2562	      Email: black_david@emc.com

2564	      This draft was created in January 2001.
2565	      It expires July 2001.