idnits 2.17.1 

draft-ietf-tsvwg-ecn-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 53
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 54 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 2 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2001' is mentioned on line 445, but not defined

  ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581)

  == Missing Reference: 'RFC2401' is mentioned on line 1395, but not defined

  ** Obsolete undefined reference: RFC 2401 (Obsoleted by RFC 4301)

  == Missing Reference: 'RFC 2474' is mentioned on line 1375, but not defined

  == Missing Reference: 'RFC 2475' is mentioned on line 1376, but not defined

  == Missing Reference: 'RFC 1455' is mentioned on line 2462, but not defined

  ** Obsolete undefined reference: RFC 1455 (Obsoleted by RFC 2474)

  == Unused Reference: 'FRED' is defined on line 1759, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1455' is defined on line 1802, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1701' is defined on line 1805, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1702' is defined on line 1808, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC 2119' is defined on line 1814, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2408' is defined on line 1826, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2409' is defined on line 1830, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2475' is defined on line 1837, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2402 (ref. 'AH') (Obsoleted by RFC
     4302, RFC 4305)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN'

  ** Obsolete normative reference: RFC 2406 (ref. 'ESP') (Obsoleted by RFC
     4303, RFC 4305)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FF99'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED'

  ** Downref: Normative reference to an Informational RFC: RFC 1701 (ref.
     'GRE')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'K98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96'

  ** Downref: Normative reference to an Informational RFC: RFC 2702 (ref.
     'MPLS')

  ** Downref: Normative reference to an Informational RFC: RFC 2637 (ref.
     'PPTP')

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Downref: Normative reference to an Informational RFC: RFC 1141

  ** Obsolete normative reference: RFC 1349 (Obsoleted by RFC 2474)

  ** Obsolete normative reference: RFC 1455 (Obsoleted by RFC 2474)

  -- Duplicate reference: RFC1701, mentioned in 'RFC1701', was also mentioned
     in 'GRE'.

  ** Downref: Normative reference to an Informational RFC: RFC 1701

  ** Downref: Normative reference to an Informational RFC: RFC 1702

  -- Duplicate reference: RFC2119, mentioned in 'RFC 2119', was also
     mentioned in 'B97'.

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  ** Obsolete normative reference: RFC 2401 (Obsoleted by RFC 4301)

  ** Obsolete normative reference: RFC 2407 (Obsoleted by RFC 4306)

  ** Obsolete normative reference: RFC 2409 (ref. 'RFC2408') (Obsoleted by
     RFC 4306)

  -- Duplicate reference: RFC2409, mentioned in 'RFC2409', was also mentioned
     in 'RFC2408'.

  ** Obsolete normative reference: RFC 2409 (Obsoleted by RFC 4306)

  ** Downref: Normative reference to an Informational RFC: RFC 2475

  ** Obsolete normative reference: RFC 2481 (Obsoleted by RFC 3168)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Downref: Normative reference to an Informational RFC: RFC 2884

  ** Downref: Normative reference to an Informational RFC: RFC 2983

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RFD99'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SCWA99'


     Summary: 29 errors (**), 0 flaws (~~), 17 warnings (==), 18 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                       K. K. Ramakrishnan
3	INTERNET DRAFT                                        TeraOptic Networks
4	draft-ietf-tsvwg-ecn-00.txt                                  Sally Floyd
5	                                                                   ACIRI
6	                                                                D. Black
7	                                                                     EMC
8	                                                          November, 2000
9	                                                      Expires: May, 2001

11	      The Addition of Explicit Congestion Notification (ECN) to IP

13	                          Status of this Memo

15	   This document is an Internet-Draft and is in full conformance with
16	   all provisions of Section 10 of RFC2026.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	Abstract

36	   This document specifies the incorporation of ECN (Explicit Congestion
37	   Notification) into TCP and IP, including ECN's use of two bits in the
38	   IP header's DS field.  We begin by describing TCP's use of packet
39	   drops as an indication of congestion.  Next we explain that with the
40	   addition of active queue management (e.g., RED) to the Internet
41	   infrastructure, where routers detect congestion before the queue
42	   overflows, routers are no longer limited to packet drops as an
43	   indication of congestion.  Routers can instead set the Congestion
44	   Experienced (CE) bit in the IP header of packets from ECN-capable
45	   transports.  We describe when the CE bit is to be set in routers, and
46	   describe modifications needed to TCP to make it ECN-capable.
47	   Modifications to other transport protocols (e.g., unreliable unicast
48	   or multicast, reliable multicast, other reliable unicast transport
49	   protocols) could be considered as those protocols are developed and
50	   advance through the standards process.

52	   We also describe in this document the issues involving the use of ECN
53	   within IP tunnels, and within IPsec tunnels in particular.

55	   One of the guiding principles for this document is that all the
56	   mechanisms specified here are incrementally deployable.

58	Table of Contents
59	     1.  Introduction
60	     2.  Conventions and Acronyms
61	     3.  Assumptions and General Principles
62	     4.  Active Queue Management (AQM)
63	     5.  Explicit Congestion Notification in IP
64	     5.1.  ECN as an indication of persistent congestion
65	     5.2.  Dropped or Corrupted Packets
66	     6.  Support from the Transport Protocol
67	     6.1.  TCP
68	     6.1.1.  TCP Initialization
69	     6.1.1.1.  Robust TCP Initialization with an Echoed Reserve Field
70	     6.1.1.2.  Robust TCP Initialization with no response to the SYN
71	     6.1.2.  The TCP Sender
72	     6.1.3.  The TCP Receiver
73	     6.1.4.  Congestion on the ACK-path
74	     6.1.5.  Retransmitted TCP packets
75	     6.1.6.  TCP Window Probes.
76	     7.  Non-compliance by the End Nodes
77	     8.  Non-compliance in the Network
78	     8.1.  Complications Introduced by Split Paths
79	     9.  Encapsulated Packets
80	     9.1.  IP packets encapsulated in IP
81	     9.1.1.  The limited-functionality and full-functionality options within
82	     9.1.2.  Changes to the ECN Field within an IP Tunnel.
83	     9.2.  IPsec Tunnels
84	     9.2.1.  Negotiation between Tunnel Endpoints
85	     9.2.1.1.  ECN Tunnel Security Association Database Field
86	     9.2.1.2.  ECN Tunnel Security Association Attribute
87	     9.2.1.3.  Changes to IPsec Tunnel Header Processing
88	     9.2.2.  Changes to the ECN Field within an IPsec Tunnel.
89	     9.2.3.  Comments for IPsec Support
90	     9.3.  IP packets encapsulated in non-IP packet headers.
91	     10.  Issues Raised by Monitoring and Policing Devices
92	     11.  Evaluations of ECN
93	     12.  Summary of changes required in IP and TCP
94	     13.  Conclusions
95	     14.  Acknowledgements
96	     15.  References
97	     16.  Security Considerations
98	     17.  IPv4 Header Checksum Recalculation
99	     18.  Possible Changes to the ECN Field in the Network
100	     18.1.  Possible Changes to the IP Header
101	     18.1.1.  Erasing the Congestion Indication
102	     18.1.2.  Falsely Reporting Congestion
103	     18.1.3.  Disabling ECN-Capability
104	     18.1.4.  Falsely Indicating ECN-Capability
105	     18.1.5.  Changes with No Functional Effect
106	     18.2.  Information carried in the Transport Header
107	     18.3.  Split Paths
108	     19.  Implications of Subverting End-to-End Congestion Control
109	     19.1.  Implications for the Network and for Competing Flows
110	     19.2.  Implications for the Subverted Flow
111	     19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion Control
112	     20.  The motivation for the ECT bit.
113	     21.  Why use two bits in the IP header?
114	     22.  Historical definitions for the IPv4 TOS octet

116	1.  Introduction

118	   TCP's congestion control and avoidance algorithms are based on the
119	   notion that the network is a black-box [Jacobson88, Jacobson90].  The
120	   network's state of congestion or otherwise is determined by
121	   end-systems probing for the network state, by gradually increasing
122	   the load on the network (by increasing the window of outstanding
123	   packets) until the network becomes congested and a packet is lost.
124	   Treating the network as a "black-box" and treating
125	   loss as an indication of congestion in the network is appropriate for
126	   pure best-effort data carried by TCP, with little or no sensitivity
127	   to delay or loss of individual packets.  In addition, TCP's conges-
128	   tion management algorithms have techniques built-in (such as Fast
129	   Retransmit and Fast Recovery) to minimize the impact of losses, from
130	   a throughput perspective.  However, these mechanisms are not intended
131	   to help applications that are in fact sensitive to the delay or loss
132	   of one or more individual packets.  Interactive traffic such as tel-
133	   net, web-browsing, and transfer of audio and video data can be sensi-
134	   tive to packet losses (especially when using an unreliable data
135	   delivery transport such as UDP) or to the increased latency of the
136	   packet caused by the need to retransmit the packet after a loss (with
137	   the reliable data delivery semantics provided by TCP).

139	   Since TCP determines the appropriate congestion window to use by
140	   gradually increasing the window size until it experiences a dropped
141	   packet, this causes the queues at the bottleneck router to build up.
142	   With most packet drop policies at the router that are not sensitive
143	   to the load placed by each individual flow (e.g., tail-drop on queue
144	   overflow), this means that some of the packets of latency-sensitive
145	   flows may be dropped. In addition, such drop policies lead to syn-
146	   chronization of loss across multiple flows.

148	   Active queue management mechanisms detect congestion before the queue
149	   overflows, and provide an indication of this congestion to the end
150	   nodes.  Thus, active queue management can reduce unnecessary queueing
151	   delay for all traffic sharing that queue.  The advantages of active
152	   queue management are discussed in RFC 2309 [RFC2309].  Active queue
153	   management avoids some of the bad properties of dropping on queue
154	   overflow, including the undesirable synchronization of loss across
155	   multiple flows.  More importantly, active queue management means that
156	   transport protocols with mechanisms for congestion control (e.g.,
157	   TCP) do not have to rely on buffer overflow as the only indication of
158	   congestion.

160	   Active queue management mechanisms may use one of several methods for
161	   indicating congestion to end-nodes. One is to use packet drops, as is
162	   currently done. However, active queue management allows the router to
163	   separate policies of queueing or dropping packets from the policies
164	   for indicating congestion. Thus, active queue management allows
165	   routers to use the Congestion Experienced (CE) bit in a packet header
166	   as an indication of congestion, instead of relying solely on packet
167	   drops. This has the potential of reducing the impact of loss on
168	   latency-sensitive flows.

170	2.  Conventions and Acronyms

172	   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
173	   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
174	   document, are to be interpreted as described in [B97].

176	3.  Assumptions and General Principles

178	   In this section, we describe some of the important design principles
179	   and assumptions that guided the design choices in this proposal.

181	   * Because ECN is likely to be adopted gradually, accommodating
182	   migration is essential. Some routers may still only drop packets to
183	   indicate congestion, and some end-systems may not be ECN-capable. The
184	   most viable strategy is one that accommodates incremental deployment
185	   without having to resort to "islands" of ECN-capable and non-ECN-
186	   capable environments.
187	   * New mechanisms for congestion control and avoidance need to co-
188	   exist and cooperate with existing mechanisms for congestion control.
189	   In particular, new mechanisms have to co-exist with TCP's current
190	   methods of adapting to congestion and with routers' current practice
191	   of dropping packets in periods of congestion.
192	   * Congestion may persist over different time-scales. The time scales
193	   that we are concerned with are congestion events that may last longer
194	   than a round-trip time.
195	   * The number of packets in an individual flow (e.g., TCP connection
196	   or an exchange using UDP) may range from a small number of packets to
197	   quite a large number. We are interested in managing the congestion
198	   caused by flows that send enough packets so that they are still
199	   active when network feedback reaches them.
200	   * Asymmetric routing is likely to be a normal occurrence in the
201	   Internet. The path (sequence of links and routers) followed by data
202	   packets may be different from the path followed by the acknowledgment
203	   packets in the reverse direction.
204	   * Many routers process the "regular" headers in IP packets more effi-
205	   ciently than they process the header information in IP options.  This
206	   suggests keeping congestion experienced information in the regular
207	   headers of an IP packet.
208	   * It must be recognized that not all end-systems will cooperate in
209	   mechanisms for congestion control. However, new mechanisms shouldn't
210	   make it easier for TCP applications to disable TCP congestion con-
211	   trol.  The benefit of lying about participating in new mechanisms
212	   such as ECN-capability should be small.

214	4.  Active Queue Management (AQM)

216	   Random Early Detection (RED) is one mechanism for Active Queue Man-
217	   agement (AQM) that has been proposed to detect incipient congestion
218	   [FJ93], and is currently being deployed in the Internet [RFC2309].
219	   AQM is meant to be a general mechanism using one of several alterna-
220	   tives for congestion indication, but in the absence of ECN, AQM is
221	   restricted to using packet drops as a mechanism for congestion indi-
222	   cation.  AQM drops packets based on the average queue length exceed-
223	   ing a threshold, rather than only when the queue overflows.  However,
224	   because AQM may drop packets before the queue actually overflows, AQM
225	   is not always forced by memory limitations to discard the packet.

227	   AQM can set a Congestion Experienced (CE) bit in the packet header
228	   instead of dropping the packet, when such a bit is provided in the IP
229	   header and understood by the transport protocol.  The use of the CE
230	   bit with ECN allows the receiver(s) to receive the packet, avoiding
231	   the potential for excessive delays due to retransmissions after
232	   packet losses.  We use the term 'CE packet' to denote a packet that
233	   has the CE bit set.

235	5.  Explicit Congestion Notification in IP

237	   This document specifies that the Internet provide a congestion indi-
238	   cation for incipient congestion (as in RED and earlier work [RJ90])
239	   where the notification can sometimes be through marking packets
240	   rather than dropping them.  This uses an ECN field in the IP header
241	   with two bits.  The ECN-Capable Transport (ECT) bit is set by the
242	   data sender to indicate that the end-points of the transport protocol
243	   are ECN-capable.  The CE bit is set by the router to indicate conges-
244	   tion to the end nodes.  Routers that have a packet arriving at a full
245	   queue drop the packet, just as they do in the absence of ECN.

247	   Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
248	   Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE
249	   bit.  The IPv4 TOS octet corresponds to the Traffic Class octet in
250	   IPv6.  The definitions for the IPv4 TOS octet [RFC791] and the IPv6
251	   Traffic Class octet have been superseded by the DS (Differentiated
252	   Services) Field [RFC2474].  Bits 6 and 7 are listed in [RFC2474] as
253	   Currently Unused.  Section 22 gives a brief history of the TOS octet.

255	            0     1     2     3     4     5     6     7
256	         +-----+-----+-----+-----+-----+-----+-----+-----+
257	         |                                   | ECN FIELD |
258	         |               DSCP                |           |
259	         |                                   | ECT | CE  |
260	         +-----+-----+-----+-----+-----+-----+-----+-----+

262	           DSCP: differentiated services codepoint
263	           ECN:  Explicit Congestion Notification

265	          Figure 1: The Differentiated Services Field in IP.
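
The bit layout in Figure 1 can be sketched as masks on the DS field
octet.  This is a minimal illustration, not part of the specification.
It assumes the draft's numbering, in which bit 0 is the most
significant bit of the octet, so bit 6 (ECT) and bit 7 (CE) are the two
least significant bits.  The function and constant names are
hypothetical.

```python
# Masks for the DS field octet, with bit 0 as the most significant bit
# (assumption: this matches the numbering used in Figure 1).
DSCP_MASK = 0xFC  # bits 0-5: differentiated services codepoint
ECT_MASK = 0x02   # bit 6: ECN-Capable Transport
CE_MASK = 0x01    # bit 7: Congestion Experienced


def split_ds_field(octet):
    """Return (dscp, ect, ce) for an 8-bit DS field value."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("DS field must fit in one octet")
    dscp = (octet & DSCP_MASK) >> 2
    return dscp, bool(octet & ECT_MASK), bool(octet & CE_MASK)


def build_ds_field(dscp, ect, ce):
    """Assemble a DS field octet from a 6-bit DSCP and the two ECN bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in six bits")
    return (dscp << 2) | (ECT_MASK if ect else 0) | (CE_MASK if ce else 0)
```

For example, split_ds_field(0x02) describes a packet with DSCP 0 that
has the ECT bit set and the CE bit clear.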

267	   Because of the unstable history of the TOS octet, the use of the ECN
268	   field as specified in this document cannot be guaranteed to be
269	   backwards compatible with all past uses of these two bits.  The
270	   potential dangers of this lack of backwards compatibility are
271	   discussed in Section 22.

273	   Upon the receipt by an ECN-Capable transport of a single CE packet,
274	   the congestion control algorithms followed at the end-systems MUST be
275	   essentially the same as the congestion control response to a *single*
276	   dropped packet.  For example, for ECN-Capable TCP the source TCP is
277	   required to halve its congestion window for any window of data con-
278	   taining either a packet drop or an ECN indication.

280	   One reason for requiring that the congestion-control response to the
281	   CE packet be essentially the same as the response to a dropped packet
282	   is to accommodate the incremental deployment of ECN in both end-sys-
283	   tems and in routers.  Some routers may drop ECN-Capable packets
284	   (e.g., using the same AQM policies for congestion detection) while
285	   other routers set the CE bit, for equivalent levels of congestion.
286	   Similarly, a router might drop a non-ECN-Capable packet but set the
287	   CE bit in an ECN-Capable packet, for equivalent levels of congestion.
288	   If there were different congestion control responses to a CE bit
289	   indication than to a packet drop, this could result in unfair treat-
290	   ment for different flows.

292	   An additional goal is that the end-systems should react to congestion
293	   at most once per window of data (i.e., at most once per round-trip
294	   time), to avoid reacting multiple times to multiple indications of
295	   congestion within a round-trip time.
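
The two end-system requirements above (respond to a CE packet as to a
single drop, and react at most once per window of data) can be
sketched as follows.  This is a toy model under stated assumptions:
the window is counted in whole segments, segments are numbered from
zero, and the floor of one segment is an illustrative choice.  The
class and attribute names are hypothetical and do not come from this
document.

```python
class EcnWindowSketch:
    """Toy sender state: halve cwnd at most once per window of data."""

    def __init__(self, cwnd_segments):
        self.cwnd = cwnd_segments  # congestion window, in segments
        self.next_seq = 0          # number of the next segment to send
        self.recover = -1          # last segment sent before the latest cut

    def send_segment(self):
        """Record that one more segment was sent and return its number."""
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_congestion_indication(self, seq):
        """Handle a drop or an ECN indication that concerns segment seq.

        Returns True if the window was reduced, False if the indication
        falls in a window of data that already caused a reduction.
        """
        if seq <= self.recover:
            return False
        # Same response for a drop and for a CE packet: halve the window.
        self.cwnd = max(1, self.cwnd // 2)
        self.recover = self.next_seq - 1
        return True
```

A second indication for a segment that was already outstanding when
the window was halved is ignored, which keeps the response to one
reduction per round-trip time.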

297	   For a router, the CE bit of an ECN-Capable packet should only be set
298	   if the router would otherwise have dropped the packet as an indica-
299	   tion of congestion to the end nodes. When the router's buffer is not
300	   yet full and the router is prepared to drop a packet to inform end
301	   nodes of incipient congestion, the router should first check to see
302	   if the ECT bit is set in that packet's IP header.  If so, then
303	   instead of dropping the packet, the router MAY instead set the CE bit
304	   in the IP header.
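
The router behavior described above can be summarized as a small
decision function.  This is a sketch only: the inputs are assumed to be
computed elsewhere by the router's queue management, and the names
(including the marking_enabled switch, which models the MAY in the
text) are hypothetical.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK_CE = "mark_ce"
    DROP = "drop"


def router_action(queue_full, aqm_would_drop, ect, marking_enabled=True):
    """Choose what to do with an arriving packet.

    queue_full      -- True if there is no buffer space for the packet
    aqm_would_drop  -- True if AQM wants to signal incipient congestion
    ect             -- True if the ECT bit is set in the IP header
    marking_enabled -- True if this router chooses to mark (it MAY)
    """
    if queue_full:
        # A full queue forces a drop, exactly as without ECN.
        return Action.DROP
    if not aqm_would_drop:
        # No congestion indication is due; any CE bit already set on
        # the packet is left unchanged.
        return Action.FORWARD
    if ect and marking_enabled:
        # Mark instead of dropping, only for ECN-capable transports.
        return Action.MARK_CE
    return Action.DROP
```

Note that the CE bit is set only in the case where the packet would
otherwise have been dropped as a congestion indication.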

306	   An environment where all end nodes were ECN-Capable could allow new
307	   criteria to be developed for setting the CE bit, and new congestion
308	   control mechanisms for end-node reaction to CE packets.  However,
309	   this is a research issue, and as such is not addressed in this docu-
310	   ment.

312	   When a CE packet (i.e., a packet that has the CE bit set) is received
313	   by a router, the CE bit is left unchanged, and the packet is trans-
314	   mitted as usual. When severe congestion has occurred and the router's
315	   queue is full, then the router has no choice but to drop some packet
316	   when a new packet arrives.  We anticipate that such packet losses
317	   will become relatively infrequent when a majority of end-systems
318	   become ECN-Capable and participate in TCP or other compatible
319	   congestion control mechanisms. In an adequately-provisioned network
320	   in an ECN-Capable environment, packet losses should occur primarily
321	   during transients or in the presence of non-cooperating sources.

323	   We expect that routers will set the CE bit in response to incipient
324	   congestion as indicated by the average queue size, using the RED
325	   algorithms suggested in [FJ93, RFC2309].  To the best of our knowl-
326	   edge, this is the only proposal currently under discussion in the
327	   IETF for routers to drop packets proactively, before the buffer over-
328	   flows.  However, this document does not attempt to specify a particu-
329	   lar mechanism for active queue management, leaving that endeavor, if
330	   needed, to other areas of the IETF.  While ECN is inextricably tied
331	   up with the need to have a reasonable active queue management mecha-
332	   nism at the router, the reverse does not hold; active queue manage-
333	   ment mechanisms have been developed and deployed independent of ECN,
334	   using packet drops as indications of congestion in the absence of ECN
335	   in the IP architecture.

337	5.1.  ECN as an indication of persistent congestion

339	   We emphasize that a *single* packet with the CE bit set in an IP
340	   packet causes the transport layer to respond, in terms of congestion
341	   control, as it would to a packet drop.  The instantaneous queue size
342	   is likely to see considerable variations even when the router does
343	   not experience persistent congestion.  As such, it is important that
344	   transient congestion at a router, reflected by the instantaneous
345	   queue size reaching a threshold much smaller than the capacity of the
346	   queue, not trigger a reaction at the transport layer.  Therefore, the
347	   CE bit should not be set by a router based on the instantaneous queue
348	   size.
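
The difference between the instantaneous and the average queue size
can be illustrated with an exponentially weighted moving average, the
style of estimator used by RED [FJ93].  The sketch below makes several
assumptions: the weight of 0.002 is only an illustrative value, and the
single threshold test simplifies RED, which marks probabilistically
between a minimum and a maximum threshold.  The function names are
hypothetical.

```python
def update_average(avg, instantaneous, weight=0.002):
    """Fold one queue-length sample into the running average."""
    return (1.0 - weight) * avg + weight * instantaneous


def should_signal(avg, min_threshold):
    """Signal congestion based on the average, not on the sample."""
    return avg > min_threshold


# A short burst of 10 samples at queue length 100 barely moves the
# average, so no congestion is signalled for a threshold of 5 packets.
avg = 0.0
for _ in range(10):
    avg = update_average(avg, 100)
burst_signals = should_signal(avg, 5.0)

# If the same queue length persists for many more samples, the average
# climbs past the threshold and congestion is signalled.
for _ in range(1000):
    avg = update_average(avg, 100)
sustained_signals = should_signal(avg, 5.0)
```

With these illustrative numbers the burst does not trigger a signal,
while the sustained queue does, which is the distinction between
transient and persistent congestion drawn above.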

350	   For example, since the ATM and Frame Relay mechanisms for congestion
351	   indication have typically been defined without an associated notion
352	   of average queue size as the basis for determining that an intermedi-
353	   ate node is congested, we believe that they provide a very noisy sig-
354	   nal. The TCP-sender reaction specified in this document for ECN is
355	   NOT the appropriate reaction for such a noisy signal of congestion
356	   notification.  However, if the routers that interface to the ATM net-
357	   work have a way of maintaining the average queue at the interface,
358	   and use it to come to a reliable determination that the ATM subnet is
359	   congested, they may use the ECN notification that is defined here.

361	   We continue to encourage experiments in techniques at layer 2 (e.g.,
362	   in ATM switches or Frame Relay switches) to take advantage of ECN.
363	   For example, using a scheme such as RED (where packet marking is
364	   based on the average queue length exceeding a threshold), layer 2
365	   devices could provide a reasonably reliable indication of congestion.
366	   When all the layer 2 devices in a path set that layer's own Conges-
367	   tion Experienced bit (e.g., the EFCI bit for ATM, the FECN bit in
368	   Frame Relay) in this reliable manner, then the interface router to
369	   the layer 2 network could copy the state of that layer 2 Congestion
370	   Experienced bit into the CE bit in the IP header.  We recognize that
371	   this is not the current practice, nor is it in current standards.
372	   However, encouraging experimentation in this manner may provide the
373	   information needed to enable evolution of existing layer 2 mechanisms
374	   to provide a more reliable means of congestion indication, when they
375	   use a single bit for indicating congestion.

377	5.2.  Dropped or Corrupted Packets

379	   For the proposed use for ECN in this document (that is, for a trans-
380	   port protocol such as TCP for which a dropped data packet is an indi-
381	   cation of congestion), end nodes detect dropped data packets, and the
382	   congestion response of the end nodes to a dropped data packet is at
383	   least as strong as the congestion response to a received CE packet.
384	   To ensure the reliable delivery of the congestion indication of the
385	   CE bit, the ECT bit MUST NOT be set in a packet unless the loss of
386	   that packet in the network would be detected by the end nodes and
387	   interpreted as an indication of congestion.

389	   Transport protocols such as TCP do not necessarily detect all packet
390	   drops, such as the drop of a "pure" ACK packet; for example, TCP does
391	   not reduce the arrival rate of subsequent ACK packets in response to
392	   an earlier dropped ACK packet.  Any proposal for extending ECN-Capa-
393	   bility to such packets would have to address issues such as the case
394	   of an ACK packet that was marked with the CE bit but was later
395	   dropped in the network. We believe that this aspect is still the sub-
396	   ject of research, so this document specifies that at this time,
397	   "pure" ACK packets MUST NOT indicate ECN-Capability.
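
For a TCP sender, the rules of this section can be sketched as a
simple check made before the ECT bit is set on an outgoing packet.
This sketch covers only the conditions stated in this section (ECN
must have been negotiated, and "pure" ACK packets are excluded); other
sections of this document may add further conditions.  The function
and parameter names are hypothetical.

```python
def may_set_ect(ecn_negotiated, payload_length):
    """Return True if the ECT bit may be set on this TCP packet.

    ecn_negotiated -- True if both endpoints agreed to use ECN
    payload_length -- number of data octets carried by the packet
    """
    if payload_length < 0:
        raise ValueError("payload length cannot be negative")
    # A packet with no data is a "pure" ACK: its loss would not be
    # detected by the end nodes, so it MUST NOT indicate ECN-Capability.
    return bool(ecn_negotiated) and payload_length > 0
```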

399	   Similarly, if a CE packet is dropped later in the network due to cor-
400	   ruption (bit errors), the end nodes should still invoke congestion
401	   control, just as TCP would today in response to a dropped data
402	   packet. This issue of corrupted CE packets would have to be consid-
403	   ered in any proposal for the network to distinguish between packets
404	   dropped due to corruption, and packets dropped due to congestion or
405	   buffer overflow.  In particular, the ubiquitous deployment of ECN
406	   would not, in and of itself, be a sufficient development to allow
407	   end-nodes to interpret packet drops as indications of corruption
408	   rather than congestion.

410	6.  Support from the Transport Protocol

412	   ECN requires support from the transport protocol, in addition to the
413	   functionality given by the ECN field in the IP packet header. The
414	   transport protocol might require negotiation between the endpoints
415	   during setup to determine that all of the endpoints are ECN-capable,
416	   so that the sender can set the ECT bit in transmitted packets.  Sec-
417	   ond, the transport protocol must be capable of reacting appropriately
418	   to the receipt of CE packets.  This reaction could be in the form of
419	   the data receiver informing the data sender of the received CE packet
420	   (e.g., TCP), of the data receiver unsubscribing to a layered multi-
421	   cast group (e.g., RLM [MJV96]), or of some other action that ulti-
422	   mately reduces the arrival rate of that flow on that congested link.

424	   This document only addresses the addition of ECN Capability to TCP,
425	   leaving issues of ECN in other transport protocols to further
426	   research.  For TCP, ECN requires three new pieces of functionality:
427	   negotiation between the endpoints during connection setup to deter-
428	   mine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP
429	   header so that the data receiver can inform the data sender when a CE
430	   packet has been received; and a Congestion Window Reduced (CWR) flag
431	   in the TCP header so that the data sender can inform the data
432	   receiver that the congestion window has been reduced. The support
433	   required from other transport protocols is likely to be different,
434	   particularly for unreliable or reliable multicast transport proto-
435	   cols, and will have to be determined as other transport protocols are
436	   brought to the IETF for standardization.

438	6.1.  TCP

440	   The following sections describe in detail the proposed use of ECN in
441	   TCP.  This proposal is described in essentially the same form in
443	   [Floyd94]. We assume that the source TCP uses the standard congestion
444	   control algorithms of Slow-start, Fast Retransmit and Fast Recovery
445	   [RFC 2001].

447	   This proposal specifies two new flags in the Reserved field of the
448	   TCP header.  The TCP mechanism for negotiating ECN-Capability uses
449	   the ECN-Echo (ECE) flag in the TCP header.  Bit 9 in the Reserved
450	   field of the TCP header is designated as the ECN-Echo flag.  The
451	   location of the 6-bit Reserved field in the TCP header is shown in
452	   Figure 3 of RFC 793 [RFC793] (and is reproduced below for complete-
453	   ness).  This specification of the ECN Field leaves the Reserved field
454	   as a 4-bit field using bits 4-7.

456	   To enable the TCP receiver to determine when to stop setting the ECN-
457	   Echo flag, we introduce a second new flag in the TCP header, the CWR
458	   flag.  The CWR flag is assigned to Bit 8 in the Reserved field of the
459	   TCP header.

461	         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
462	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
463	       |               |                       | U | A | P | R | S | F |
464	       | Header Length |        Reserved       | R | C | S | S | Y | I |
465	       |               |                       | G | K | H | T | N | N |
466	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

468	        Figure 2: The old definition of bytes 13 and 14 of the TCP
469	   header.

471	         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
472	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
473	       |               |               | C | E | U | A | P | R | S | F |
474	       | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
475	       |               |               | R | E | G | K | H | T | N | N |
476	       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

478	        Figure 3: The new definition of bytes 13 and 14 of the TCP
479	   header.
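
   As a minimal sketch of the layout in Figure 3, the following Python
   reads bytes 13 and 14 of the TCP header as one 16-bit big-endian
   word and extracts the two new flags.  The mask values follow from
   the bit positions in the figure (bit 0 is the most significant
   bit).  The function and constant names are illustrative
   assumptions and are not part of this specification.

```python
import struct

# Masks for the 16-bit word shown in Figure 3.  Bit 0 is the most
# significant bit, so bit n has mask 1 << (15 - n).
CWR = 1 << (15 - 8)   # 0x0080
ECE = 1 << (15 - 9)   # 0x0040
ACK = 1 << (15 - 11)  # 0x0010
SYN = 1 << (15 - 14)  # 0x0002


def flags_word(tcp_header: bytes) -> int:
    """Return bytes 13 and 14 of the TCP header (offsets 12 and 13)."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return word


def ecn_flags(tcp_header: bytes) -> dict:
    """Report whether the CWR and ECE flags are set in a TCP header."""
    word = flags_word(tcp_header)
    return {"CWR": bool(word & CWR), "ECE": bool(word & ECE)}
```

   The 4-bit Reserved field (bits 4-7) is left untouched by these
   masks.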

481	   Thus, ECN uses the ECT and CE flags in the IP header (as shown in
482	   Figure 1) for signaling between routers and connection endpoints, and
483	   uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
484	   3) for TCP-endpoint to TCP-endpoint signaling.  For a TCP connection,
485	   a typical sequence of events in an ECN-based reaction to congestion
486	   is as follows:
487	      * The ECT bit is set in packets transmitted by the sender to indi-
488	      cate that ECN is supported by the transport entities for these
489	      packets.
490	      * An ECN-capable router detects impending congestion and detects
491	      that the ECT bit is set in the packet it is about to drop.
492	      Instead of dropping the packet, the router chooses to set the CE
493	      bit in the IP header and forwards the packet.
494	      * The receiver receives the packet with the CE bit set, and sets
495	      the ECN-Echo flag in its next TCP ACK sent to the sender.
496	      * The sender receives the TCP ACK with ECN-Echo set, and reacts to
497	      the congestion as if a packet had been dropped.
498	      * The sender sets the CWR flag in the TCP header of the next
499	      packet sent to the receiver to acknowledge its receipt of and
500	      reaction to the ECN-Echo flag.

502	   The negotiation for using ECN by the TCP transport entities and the
503	   use of the ECN-Echo and CWR flags is described in more detail in the
504	   sections below.

506	6.1.1.  TCP Initialization

508	   In the TCP connection setup phase, the source and destination TCPs
509	   exchange information about their willingness to use ECN.  Subsequent
510	   to the completion of this negotiation, the TCP sender sets the ECT
511	   bit in the IP header of data packets to indicate to the network that
512	   the transport is capable and willing to participate in ECN for this
513	   packet. This indicates to the routers that they may mark this packet
514	   with the CE bit, if they would like to use that as a method of con-
515	   gestion notification. If the TCP connection does not wish to use ECN
516	   notification for a particular packet, the sending TCP sets the ECT
517	   bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE
518	   bit in the received packet.

520	   For this discussion, we designate the initiating host as Host A and
521	   the responding host as Host B.  We call a SYN packet with the ECE and
522	   CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
523	   with the ECE and CWR flags not set a "non-ECN-setup SYN packet".
524	   Similarly, we call a SYN-ACK packet with only the ECE flag set but
525	   the CWR flag not set an "ECN-setup SYN-ACK packet", and we call a
526	   SYN-ACK packet with both the ECE and CWR flags not set a "non-ECN-
527	   setup SYN-ACK packet".
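
   The four packet names defined above can be sketched as a small
   classifier.  This is an illustration only: the function name and
   its boolean arguments are assumptions, and flag combinations that
   the definitions above do not name (for example, a SYN-ACK with both
   ECE and CWR set) are reported as None rather than guessed at.

```python
from typing import Optional


def classify_handshake_packet(syn: bool, ack: bool,
                              ece: bool, cwr: bool) -> Optional[str]:
    """Name a SYN or SYN-ACK packet using the terms defined above."""
    if not syn:
        raise ValueError("not a SYN or SYN-ACK packet")
    if not ack:
        # SYN: ECN-setup means both ECE and CWR are set.
        if ece and cwr:
            return "ECN-setup SYN"
        if not ece and not cwr:
            return "non-ECN-setup SYN"
        return None
    # SYN-ACK: ECN-setup means ECE set and CWR not set.
    if ece and not cwr:
        return "ECN-setup SYN-ACK"
    if not ece and not cwr:
        return "non-ECN-setup SYN-ACK"
    return None
```

   Returning None for unnamed combinations keeps the sketch from
   treating a SYN-ACK that sets both flags as an ECN-setup SYN-ACK.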

529	   Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
530	   packet, and Host B sends an ECN-setup SYN-ACK packet.  For a SYN
531	   packet, the setting of both ECE and CWR in the ECN-setup SYN packet
532	   is defined as an indication that the sending TCP is ECN-Capable,
533	   rather than as an indication of congestion or of response to conges-
534	   tion. More precisely, an ECN-setup SYN packet indicates that the TCP
535	   implementation transmitting the SYN packet will participate in ECN as
536	   both a sender and receiver.  Specifically, as a receiver, it will
537	   respond to incoming data packets that have the CE bit set in the IP
538	   header by setting ECE in outgoing TCP Acknowledgement (ACK) packets.
540	   As a sender, it will respond to incoming packets that have ECE set by
541	   reducing the congestion window and setting CWR when appropriate.  An
542	   ECN-setup SYN packet does not commit the TCP sender to setting the
543	   ECT bit in any or all of the packets it may transmit.  However, the
544	   commitment to respond appropriately to incoming packets with the CE
545	   bit set remains even if the TCP sender later sends, within this TCP
546	   connection, a SYN packet without ECE and CWR set.

549	   When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
550	   but not the CWR flag.  An ECN-setup SYN-ACK packet is defined as an
551	   indication that the TCP transmitting the SYN-ACK packet is ECN-Capa-
552	   ble.  As with the SYN packet, an ECN-setup SYN-ACK packet does not
553	   commit the TCP host to setting the ECT bit in transmitted packets.

555	   The following rules apply to the sending of ECN-setup packets:

557	   * If a host has received an ECN-setup SYN packet, then it MAY send an
558	   ECN-setup SYN-ACK packet.  Otherwise, it MUST NOT send an ECN-setup
559	   SYN-ACK packet.
560	   * A host MUST NOT set ECT on data packets unless it has sent at least
561	   one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at
562	   least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no
563	   non-ECN-setup SYN or non-ECN-setup SYN-ACK packet.  If a host has
564	   received at least one non-ECN-setup SYN or non-ECN-setup SYN-ACK
565	   packet, then it SHOULD NOT set ECT on data packets.
566	   * If a host ever sets the ECT bit on a data packet, then that host
567	   MUST correctly set/clear the CWR TCP bit on all subsequent packets in
568	   the connection.
569	   * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-ACK
570	   packet, and has received no non-ECN-setup SYN or non-ECN-setup SYN-
571	   ACK packet, then if that host receives TCP data packets with ECT and
572	   CE bits set in the IP header, then that host MUST process these pack-
573	   ets as specified for an ECN-capable connection.
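
   The second and fourth rules above depend only on a count of the
   setup packets a host has sent and received, which the following
   sketch makes explicit.  The class and field names are assumptions
   made for illustration.  The sketch also treats the SHOULD NOT in
   the second rule as a refusal, which is stricter than the rule
   requires.

```python
from dataclasses import dataclass


@dataclass
class EcnSetupHistory:
    """Counts of SYN and SYN-ACK packets seen on one connection."""
    sent_ecn_setup: int = 0
    sent_non_ecn_setup: int = 0
    received_ecn_setup: int = 0
    received_non_ecn_setup: int = 0

    def may_set_ect_on_data(self) -> bool:
        # MUST NOT set ECT unless all three conditions hold.
        allowed = (self.sent_ecn_setup >= 1
                   and self.received_ecn_setup >= 1
                   and self.sent_non_ecn_setup == 0)
        # SHOULD NOT set ECT after receiving a non-ECN-setup packet
        # (treated here as a hard refusal).
        return allowed and self.received_non_ecn_setup == 0

    def must_process_ce(self) -> bool:
        # Data packets with ECT and CE set are processed as for an
        # ECN-capable connection under this condition.
        return (self.sent_ecn_setup >= 1
                and self.received_non_ecn_setup == 0)
```

   A host that has sent and received one ECN-setup packet each, and
   nothing else, may set ECT.  A host that has additionally sent a
   non-ECN-setup SYN may not.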

575	6.1.1.1.  Robust TCP Initialization with an Echoed Reserve Field

577	   There is the question of why we chose to have the TCP sending the SYN
578	   set two ECN-related flags in the Reserved field of the TCP header for
579	   the SYN packet, while the responding TCP sending the SYN-ACK sets
580	   only one ECN-related flag in the SYN-ACK packet.  This asymmetry is
581	   necessary for the robust negotiation of ECN-capability with some
582	   deployed TCP implementations.  There exists at least one faulty TCP
583	   implementation in which TCP receivers set the Reserved field of the
584	   TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
585	   the Reserved field of the TCP header in the received data packet.
586	   Because the TCP SYN packet sets the ECN-Echo and CWR flags to indi-
587	   cate ECN-capability, while the SYN-ACK packet sets only the ECN-Echo
588	   flag, the sending TCP correctly interprets a receiver's reflection of
589	   its own flags in the Reserved field as an indication that the
590	   receiver is not ECN-capable.  The sending TCP is not misled by a
591	   faulty TCP implementation sending a SYN-ACK packet that simply
592	   reflects the Reserved field of the incoming SYN packet.

594	6.1.1.2.  Robust TCP Initialization with no response to the SYN

596	   ECN introduces the use of the ECN-Echo and CWR flags in the TCP
597	   header (as shown in Figure 3) for initialization.  There exists some
598	   faulty equipment in the Internet that either ignores an ECN-setup SYN
599	   packet or responds with a RST, in the belief that such a packet (with
600	   these bits set) is a signature for a port-scanning tool that could be
601	   used in a denial-of-service attack.  To provide robust connectivity
602	   even in the presence of such faulty equipment, a host that receives a
603	   RST in response to the transmission of an ECN-setup SYN packet MAY
604	   resend a SYN with CWR and ECE cleared. This could result in a TCP
605	   connection being established without using ECN.  Similarly, a host
606	   that receives no reply to an ECN-setup SYN within the normal SYN
607	   retransmission timeout interval MAY resend the SYN and any subsequent
608	   SYN retransmissions with CWR and ECE cleared.  To overcome normal
609	   packet loss that results in the original SYN being lost, the origi-
610	   nating host may retransmit one or more ECN-setup SYN packets before
611	   giving up and retransmitting the SYN with the CWR and ECE bits
612	   cleared.
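
   One way to picture the fallback described above is as a small
   retry policy that decides, after each RST or SYN timeout, whether
   the next SYN is still an ECN-setup SYN.  This is a sketch under
   stated assumptions: the class name, the event strings, and the
   max_ecn_setup_syns parameter (how many ECN-setup SYNs to try before
   giving up) are hypothetical, and this document does not specify a
   particular limit.

```python
class SynRetryPolicy:
    """Decide whether each SYN (re)transmission is an ECN-setup SYN."""

    def __init__(self, max_ecn_setup_syns: int = 2):
        if max_ecn_setup_syns < 1:
            raise ValueError("need at least one ECN-setup SYN")
        self.max_ecn_setup_syns = max_ecn_setup_syns
        self.ecn_setup_syns_sent = 0
        self.fallen_back = False

    def first_syn(self) -> bool:
        """The initial SYN is an ECN-setup SYN (ECE and CWR set)."""
        self.ecn_setup_syns_sent = 1
        return True

    def on_event(self, event: str) -> bool:
        """Return True if the next SYN should set ECE and CWR."""
        if event not in ("rst", "timeout"):
            raise ValueError("unknown event: " + event)
        # A RST, or running out of ECN-setup attempts, triggers the
        # fallback; once fallen back, all later SYNs clear the flags.
        if event == "rst" or self.ecn_setup_syns_sent >= self.max_ecn_setup_syns:
            self.fallen_back = True
        if self.fallen_back:
            return False
        self.ecn_setup_syns_sent += 1
        return True
```

   Retrying the ECN-setup SYN at least once before falling back guards
   against ordinary loss of the original SYN.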

614	   We note that in this case, the following example scenario is possi-
615	   ble:

617	    (1) Host A: Sends an ECN-setup SYN.
618	    (2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or
619	   delayed.
620	    (3) Host A: Sends a non-ECN-setup SYN.
621	    (4) Host B: Sends a non-ECN-setup SYN/ACK.

623	   We note that in this case, following the procedures above, neither
624	   Host A nor Host B may set the ECT bit on data packets.  We further
625	   note that a host NEVER uses the reception of ECT data packets as an
626	   implicit signal that the other host is ECN-capable.

628	6.1.2.  The TCP Sender

630	   For a TCP connection using ECN, data packets are transmitted with the
631	   ECT bit set in the IP header (set to a "1").  If the sender receives
632	   an ECN-Echo (ECE) ACK packet (that is, an ACK packet with the ECN-
633	   Echo flag set in the TCP header), then the sender knows that conges-
634	   tion was encountered in the network on the path from the sender to
635	   the receiver.  The indication of congestion should be treated just as
636	   a congestion loss in non-ECN-Capable TCP. That is, the TCP source
637	   halves the congestion window "cwnd" and reduces the slow start
638	   threshold "ssthresh".  The sending TCP SHOULD NOT increase the con-
639	   gestion window in response to the receipt of an ECN-Echo ACK packet.

641	   TCP should not react to congestion indications more than once every
642	   window of data (or more loosely, more than once every round-trip
643	   time). That is, the TCP sender's congestion window should be reduced
644	   only once in response to a series of dropped and/or CE packets from a
645	   single window of data.  In addition, the TCP source should not
646	   decrease the slow-start threshold, ssthresh, if it has been decreased
647	   within the last round trip time.  However, if any retransmitted pack-
648	   ets are dropped, then this is interpreted by the source TCP as a new
649	   instance of congestion.

651	   After the source TCP reduces its congestion window in response to a
652	   CE packet, incoming acknowledgements that continue to arrive can
653	   "clock out" outgoing packets as allowed by the reduced congestion
654	   window.  If the congestion window consists of only one MSS (maximum
655	   segment size), and the sending TCP receives an ECN-Echo ACK packet,
656	   then the sending TCP should in principle still reduce its congestion
657	   window in half. However, the value of the congestion window is
658	   bounded below by a value of one MSS.  If the sending TCP were to con-
659	   tinue to send, using a congestion window of 1 MSS, this results in
660	   the transmission of one packet per round-trip time.  It is necessary
661	   to still reduce the sending rate of the TCP sender even further, on
662	   receipt of an ECN-Echo packet when the congestion window is one.  We
663	   use the retransmit timer as a means of reducing the rate further in
664	   this circumstance.  Therefore, the sending TCP MUST reset the
665	   retransmit timer on receiving the ECN-Echo packet when the congestion
666	   window is one.  The sending TCP will then be able to send a new
667	   packet only when the retransmit timer expires.
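
   The sender behavior described in this section can be sketched as
   follows.  Several details are assumptions made only so that the
   example runs: sequence numbers are plain integers with no
   wraparound, "once per window" is tracked by remembering the highest
   sequence number sent at the time of the last reduction, ssthresh is
   set equal to the halved window, and resetting the retransmit timer
   is represented by a counter.  None of the names below come from
   this document.

```python
class EcnSender:
    """Toy model of a TCP sender's reaction to ECN-Echo ACKs."""

    def __init__(self, mss: int, cwnd: int, ssthresh: int):
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.snd_nxt = 0           # next sequence number to send
        self.recover = None        # snd_nxt at the last reduction
        self.cwr_pending = False   # set CWR on the next new data packet
        self.timer_resets = 0      # stands in for the retransmit timer

    def on_send(self, nbytes: int) -> None:
        self.snd_nxt += nbytes

    def on_ack(self, ack_seq: int, ece: bool) -> bool:
        """Process an ACK; return True if the sender reacted to ECE."""
        if not ece:
            return False
        # React at most once per window of data.
        if self.recover is not None and ack_seq <= self.recover:
            return False
        if self.cwnd <= self.mss:
            # Window already at one MSS: slow down further by
            # resetting the retransmit timer.
            self.timer_resets += 1
        self.cwnd = max(self.cwnd // 2, self.mss)  # bounded below by 1 MSS
        self.ssthresh = self.cwnd                  # assumption, see lead-in
        self.recover = self.snd_nxt
        self.cwr_pending = True
        return True
```

   In this model, a second ECN-Echo ACK for data sent before the
   reduction leaves the window unchanged.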

669	   [Floyd94] discusses TCP's response to ECN in more detail.  [Floyd98]
670	   discusses the validation test in the ns simulator, which illustrates
671	   a wide range of ECN scenarios. These scenarios include the following:
672	   an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
673	   Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
674	   ECN; and a congestion window of one packet followed by an ECN.

676	   TCP follows existing algorithms for sending data packets in response
677	   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
678	   timeouts [RFC2581].  TCP also follows the normal procedures for
679	   increasing the congestion window when it receives ACK packets without
680	   the ECN-Echo bit set [RFC2581].

682	6.1.3.  The TCP Receiver

684	   When TCP receives a CE data packet at the destination end-system, the
685	   TCP data receiver sets the ECN-Echo flag in the TCP header of the
686	   subsequent ACK packet.  If there is any ACK withholding implemented,
687	   as in current "delayed-ACK" TCP implementations where the TCP
688	   receiver can send an ACK for two arriving data packets, then the ECN-
689	   Echo flag in the ACK packet will be set to the OR of the CE bits of
690	   all of the data packets being acknowledged.  That is, if any of the
691	   received data packets are CE packets, then the returning ACK has the
692	   ECN-Echo flag set.

694	   To provide robustness against the possibility of a dropped ACK packet
695	   carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
696	   a series of ACK packets sent subsequently.  The TCP receiver uses the
697	   CWR flag received from the TCP sender to determine when to stop set-
698	   ting the ECN-Echo flag.

700	   When an ECN-Capable TCP sender reduces its congestion window for any
701	   reason (because of a retransmit timeout, a Fast Retransmit, or in
702	   response to an ECN Notification), the TCP sender sets the CWR flag in
703	   the TCP header of the first new data packet sent after the window
704	   reduction.  If that data packet is dropped in the network, then the
705	   sending TCP will have to reduce the congestion window again and
706	   retransmit the dropped packet.

708	   We ensure that the "Congestion Window Reduced" information is reli-
709	   ably delivered to the TCP receiver.  This comes about from the fact
710	   that if the new data packet carrying the CWR flag is dropped, then
711	   the TCP sender will have to again reduce its congestion window, and
712	   send another new data packet with the CWR flag set.  Thus, the CWR
713	   bit in the TCP header SHOULD NOT be set on retransmitted packets.
714	   When the TCP data sender is ready to set the CWR bit after reducing
715	   the congestion window, it SHOULD set the CWR bit only on the first
716	   new data packet that it transmits.

718	   After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
719	   that TCP receiver continues to set the ECN-Echo flag in all the ACK
720	   packets it sends (whether they acknowledge CE data packets or non-CE
721	   data packets) until it receives a CWR packet (a packet with the CWR
722	   flag set).  After the receipt of the CWR packet, acknowledgements for
723	   subsequent non-CE data packets do not have the ECN-Echo flag set. If
724	   another CE packet is received by the data receiver, the receiver
725	   would once again send ACK packets with the ECN-Echo flag set.  While
726	   the receipt of a CWR packet does not guarantee that the data sender
727	   received the ECN-Echo message, this does suggest that the data sender
728	   reduced its congestion window at some point *after* it sent the data
729	   packet for which the CE bit was set.
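
   The receiver's handling of the ECN-Echo flag amounts to a single
   latched bit, as the following sketch shows.  The class and method
   names are assumptions.  The sketch also assumes that a packet
   carrying both CWR and CE first clears and then sets the latch
   again, a case this section does not spell out.

```python
class EcnReceiver:
    """Toy model of when a TCP receiver sets ECN-Echo in its ACKs."""

    def __init__(self):
        self.echoing = False

    def on_data(self, ce: bool, cwr: bool) -> None:
        """Record one arriving data packet."""
        if cwr:
            # The sender has reduced its window: stop echoing.
            self.echoing = False
        if ce:
            # A (new) CE packet starts the echo again.
            self.echoing = True

    def ack_ece(self) -> bool:
        """Value of the ECN-Echo flag for the next ACK sent."""
        return self.echoing
```

   Because the latch persists across calls, an ACK that covers several
   data packets (as with delayed ACKs) carries the OR of their CE
   bits.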

731	   We have already specified that a TCP sender is not required to reduce
732	   its congestion window more than once per window of data.  Some care
733	   is required if the TCP sender is to avoid unnecessary reductions of
734	   the congestion window when a window of data includes both dropped
735	   packets and (marked) CE packets.  This is illustrated in [Floyd98].

737	6.1.4.  Congestion on the ACK-path

739	   For the current generation of TCP congestion control algorithms, pure
740	   acknowledgement packets (i.e., packets that do not contain any
741	   accompanying data) should be sent with the ECT bit off. Current TCP
742	   receivers have no mechanisms for reducing traffic on the ACK-path in
743	   response to congestion notification.  Mechanisms for responding to
744	   congestion on the ACK-path are areas for current and future research.
745	   (One simple possibility would be for the sender to reduce its conges-
746	   tion window when it receives a pure ACK packet with the CE bit set).
747	   For current TCP implementations, a single dropped ACK generally has
748	   only a very small effect on the TCP's sending rate.

750	6.1.5.  Retransmitted TCP packets

752	   This document specifies that for ECN-capable TCP implementations, the
753	   ECT bit (ECN-Capable Transport) in the IP header MUST NOT be set on
754	   retransmitted data packets, and that the TCP data receiver SHOULD
755	   ignore the ECN field on arriving data packets that are outside of the
756	   receiver's current window.  This is for greater security against
757	   denial-of-service attacks, as well as for robustness of the ECN con-
758	   gestion indication with packets that are dropped later in the net-
759	   work.

761	   First, we note that if the TCP sender were to set the ECT bit on a
762	   retransmitted packet, then if an unnecessarily-retransmitted packet
763	   was later dropped in the network, the end nodes would never receive
764	   the indication of congestion from the router setting the CE bit.
765	   Thus, setting the ECT bit on retransmitted data packets is not con-
766	   sistent with the robust delivery of the congestion indication even
767	   for packets that are later dropped in the network.

769	   In addition, an attacker capable of spoofing the IP source address of
770	   the TCP sender could send data packets with arbitrary sequence num-
771	   bers, with both the ECT and CE bits set in the IP header.  On receiv-
772	   ing this spoofed data packet, the TCP data receiver would determine
773	   that the data does not lie in the current receive window, and return
774	   a duplicate acknowledgement.  We define an out-of-window packet at
775	   the TCP data receiver as a data packet that lies outside the
776	   receiver's current window.  On receiving an out-of-window packet, the
777	   TCP data receiver has to decide whether or not to treat the CE bit in
778	   the packet header as a valid indication of congestion, and therefore
779	   whether to return ECN-Echo indications to the TCP data sender.  If
780	   the TCP data receiver ignored the CE bit in an out-of-window packet,
781	   then the TCP data sender would not receive this possibly-legitimate
782	   indication of congestion from the network, resulting in a violation
783	   of end-to-end congestion control.  On the other hand, if the TCP data
784	   receiver honors the CE indication in the out-of-window packet, and
785	   reports the indication of congestion to the TCP data sender, then the
786	   malicious node that created the spoofed, out-of-window packet has
787	   successfully "attacked" the TCP connection by forcing the data sender
788	   to unnecessarily reduce (halve) its congestion window.  To prevent
789	   such a denial-of-service attack, we specify that a legitimate TCP
790	   data sender MUST NOT set the ECT bit on retransmitted data packets,
791	   and that the TCP data receiver SHOULD ignore the CE bit on out-of-
792	   window packets.
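
   The receiver-side rule can be sketched as a window test applied
   before the CE bit is honored.  As a simplification, the sketch
   tests only the first sequence number of the packet against the
   receive window, using 32-bit modular arithmetic.  The function and
   parameter names are assumptions.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return ((seq - rcv_nxt) % SEQ_MOD) < rcv_wnd


def honor_ce(seq: int, rcv_nxt: int, rcv_wnd: int,
             ect: bool, ce: bool) -> bool:
    """Decide whether a data packet's CE bit should trigger ECN-Echo."""
    # CE on an out-of-window packet is ignored, so a spoofed packet
    # with an arbitrary sequence number cannot force a reduction.
    return ect and ce and in_window(seq, rcv_nxt, rcv_wnd)
```

   With this test, a spoofed packet must land inside the current
   receive window before its CE bit has any effect.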

794	   One drawback of not setting ECT on retransmitted packets is that it
795	   denies ECN protection to those packets.  However, for an ECN-capable
796	   TCP connection in a fully-ECN-capable environment with mild conges-
797	   tion, packets should rarely be dropped due to congestion in the first
798	   place, and so instances of retransmitted packets should rarely arise.
799	   If packets are being retransmitted, then there are already packet
800	   losses (from corruption or from congestion) that ECN has been unable
801	   to prevent.

803	   We note that if the router sets the CE bit for an ECN-capable data
804	   packet within a TCP connection, then the TCP connection is guaranteed
805	   to receive that indication of congestion, or to receive some other
806	   indication of congestion within the same window of data, even if this
807	   packet is dropped or reordered in the network.  We consider two
808	   cases, when the packet is later retransmitted, and when the packet is
809	   not later retransmitted.

811	   In the first case, if the packet is either dropped or delayed, and at
812	   some point retransmitted by the data sender, then the retransmission
813	   is a result of a Fast Retransmit or a Retransmit Timeout for either
814	   that packet or for some prior packet in the same window of data.  In
815	   this case, because the data sender already has retransmitted this
816	   packet, we know that the data sender has already responded to an
817	   indication of congestion for some packet within the same window of
818	   data as the original packet.  Thus, even if the first transmission of
819	   the packet is dropped in the network, or is delayed, if it had the CE
820	   bit set, and is later ignored by the data receiver as an out-of-win-
821	   dow packet, this is not a problem, because the sender has already
822	   responded to an indication of congestion for that window of data.

824	   In the second case, if the packet is never retransmitted by the data
825	   sender, then this data packet is the only copy of this data received
826	   by the data receiver, and therefore arrives at the data receiver as
827	   an in-window packet, regardless of how much the packet might be
828	   delayed or reordered.  In this case, if the CE bit is set on the
829	   packet within the network, this will be treated by the data receiver
830	   as a valid indication of congestion.

832	6.1.6.  TCP Window Probes

834	   When the TCP data receiver advertises a zero window, the TCP data
835	   sender sends window probes to determine if the receiver's window has
836	   increased.  Window probe packets do not contain any user data except
837	   for the sequence number, which is a byte.  If a window probe packet
838	   is dropped in the network, this loss is not detected by the receiver.
839	   Therefore, the TCP data sender MUST NOT set either the ECT or CWR
840	   bits on window probe packets.

842	   However, because window probes use exact sequence numbers, they can-
843	   not be easily spoofed in denial-of-service attacks.  Therefore, if a
844	   window probe arrives with ECT and CE set, then the receiver SHOULD
845	   respond to the ECN indications.
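
   Taken together, the restrictions on pure ACKs, retransmitted
   packets, and window probes give a simple sender-side test for
   whether ECT may be set on an outgoing packet.  The sketch below is
   illustrative only: the function and its boolean arguments are
   assumptions, and "ecn_negotiated" stands for the outcome of the
   rules in the TCP Initialization section.

```python
def may_set_ect(ecn_negotiated: bool, has_data: bool,
                is_retransmission: bool, is_window_probe: bool) -> bool:
    """True if the sender is permitted to set ECT on this packet."""
    if not ecn_negotiated:
        return False
    if not has_data:
        return False   # "pure" ACK packets do not indicate ECN-Capability
    if is_retransmission:
        return False   # ECT MUST NOT be set on retransmitted data
    if is_window_probe:
        return False   # ECT MUST NOT be set on window probes
    return True
```

   A True result is a permission rather than an obligation, since a
   sender is not committed to setting ECT on every packet.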

847	7.  Non-compliance by the End Nodes

849	   This section discusses concerns about the vulnerability of ECN to
850	   non-compliant end-nodes (i.e., end nodes that set the ECT bit in
851	   transmitted packets but do not respond to received CE packets).  We
852	   argue that the addition of ECN to the IP architecture will not sig-
853	   nificantly increase the current vulnerability of the architecture to
854	   unresponsive flows.

856	   Even for non-ECN environments, there are serious concerns about the
857	   damage that can be done by non-compliant or unresponsive flows (that
858	   is, flows that do not respond to congestion control indications by
859	   reducing their arrival rate at the congested link).  For example, an
860	   end-node could "turn off congestion control" by not reducing its con-
861	   gestion window in response to packet drops. This is a concern for the
862	   current Internet.  It has been argued that routers will have to
863	   deploy mechanisms to detect and differentially treat packets from
864	   non-compliant flows [RFC2309,FF99].  It has also been suggested that
865	   techniques such as end-to-end per-flow scheduling and isolation of
866	   one flow from another, differentiated services, or end-to-end reser-
867	   vations could remove some of the more damaging effects of unrespon-
868	   sive flows.

870	   It might seem that dropping packets in itself is an adequate deter-
871	   rent for non-compliance, and that the use of ECN removes this deter-
872	   rent.  We would argue in response that (1) ECN-capable routers pre-
873	   serve packet-dropping behavior in times of high congestion; and (2)
874	   even in times of high congestion, dropping packets in itself is not
875	   an adequate deterrent for non-compliance.

877	   First, ECN-Capable routers will only mark packets (as opposed to
878	   dropping them) when the packet marking rate is reasonably low. During
879	   periods where the average queue size exceeds an upper threshold, and
880	   therefore the potential packet marking rate would be high, our
881	   recommendation is that routers drop packets rather than set the CE
882	   bit in packet headers.
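
   The recommendation above can be sketched as a per-packet decision
   at the router.  The sketch assumes that some active queue
   management algorithm has already decided whether this packet
   should carry a congestion signal (the hypothetical
   "congestion_signalled" input).  It does not model how that
   decision is made, and the names used are assumptions.

```python
def router_action(avg_queue: float, max_threshold: float,
                  congestion_signalled: bool, ect: bool) -> str:
    """Return "forward", "mark" (set CE), or "drop" for one packet."""
    if avg_queue > max_threshold:
        # High congestion: drop rather than mark, even if ECT is set.
        return "drop"
    if not congestion_signalled:
        return "forward"
    # Low or moderate congestion: mark ECN-capable packets and drop
    # the others.
    return "mark" if ect else "drop"
```

   Under this rule an unresponsive flow that sets ECT still has its
   packets dropped once the average queue size passes the upper
   threshold.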

884	   During the periods of low or moderate packet marking rates when ECN
885	   would be deployed, there would be little deterrent effect on unre-
886	   sponsive flows of dropping rather than marking those packets. For
887	   example, delay-insensitive flows using reliable delivery might have
888	   an incentive to increase rather than to decrease their sending rate
889	   in the presence of dropped packets.  Similarly, delay-sensitive flows
890	   using unreliable delivery might increase their use of FEC in response
891	   to an increased packet drop rate, increasing rather than decreasing
892	   their sending rate.  For the same reasons, we do not believe that
893	   packet dropping itself is an effective deterrent for non-compliance
894	   even in an environment of high packet drop rates, when all flows are
895	   sharing the same packet drop rate.

897	   Several methods have been proposed to identify and restrict
898	   non-compliant or unresponsive flows. The addition of ECN to the
899	   network environment would not in any way increase the difficulty of
900	   designing and deploying such mechanisms. If anything, the addition of
901	   ECN to the architecture would make the job of identifying
902	   unresponsive flows slightly easier.  For example, in an ECN-Capable
903	   environment routers are not limited to information about packets that
904	   are dropped or have the CE bit set at that router itself; in such an
905	   environment, routers could also take note of arriving CE packets that
906	   indicate congestion encountered by that packet earlier in the path.

908	8.  Non-compliance in the Network

910	   This section considers the issues when a router is operating, possi-
911	   bly maliciously, to modify either of the bits in the ECN field.  In
912	   this section we represent the ECN field in the IP header by the tuple
913	   (ECT bit, CE bit).

915	   By tampering with the bits in the ECN field, an adversary (or a bro-
916	   ken router) could do one or more of the following: falsely report
917	   congestion, disable ECN-Capability for an individual packet, erase
918	   the ECN congestion indication, or falsely indicate ECN-Capability.
919	   Appendix X systematically examines the various cases by which the ECN
920	   field could be modified.  The important criterion considered in
921	   determining the consequences of such modifications is whether it is
922	   likely to lead to poorer behavior in any dimension (throughput,
923	   delay, fairness or functionality) than if a router were to drop a
924	   packet.

926	   The first two possible changes, falsely reporting congestion or dis-
927	   abling ECN-Capability for an individual packet, are no worse than if
928	   the router were to simply drop the packet.  From a congestion control
929	   point of view, setting the CE bit in the absence of congestion by a
930	   non-compliant router would be no worse than a router dropping a
931	   packet unnecessarily. By "erasing" the ECT bit of a packet that is
932	   later dropped in the network, a router's actions could result in an
933	   unnecessary packet drop for that packet later in the network.

935	   However, as discussed in Section X in the Appendix, a router that
936	   erases the ECN congestion indication or falsely indicates
937	   ECN-Capability could potentially do more damage to the flow than if
938	   it had simply dropped the packet.  A rogue or broken router that
939	   "erased" the CE bit in arriving CE packets would prevent that
940	   indication of congestion from reaching downstream receivers.  This
941	   could result in the failure of congestion control for that flow and a
942	   resulting increase in congestion in the network, ultimately resulting
943	   in subsequent packets dropped for this flow as the average queue size
944	   increased at the congested gateway.
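
   Using the (ECT bit, CE bit) tuple introduced at the start of this
   section, the four kinds of tampering can be sketched as a
   comparison of the ECN field before and after a router.  The
   function names and label strings are assumptions made for
   illustration.  The split into "no worse than a drop" and
   "potentially worse" follows the discussion in this section.

```python
def classify_modification(before: tuple, after: tuple) -> list:
    """List the effects of changing (ECT, CE) from before to after."""
    (ect0, ce0), (ect1, ce1) = before, after
    effects = []
    if not ce0 and ce1:
        effects.append("falsely report congestion")
    if ect0 and not ect1:
        effects.append("disable ECN-Capability")
    if ce0 and not ce1:
        effects.append("erase congestion indication")
    if not ect0 and ect1:
        effects.append("falsely indicate ECN-Capability")
    return effects


# Effects that could do more damage than dropping the packet.
POTENTIALLY_WORSE = {"erase congestion indication",
                     "falsely indicate ECN-Capability"}


def potentially_worse_than_drop(before: tuple, after: tuple) -> bool:
    return any(effect in POTENTIALLY_WORSE
               for effect in classify_modification(before, after))
```

   For example, changing (1, 1) to (1, 0) erases the congestion
   indication and is flagged as potentially worse than a drop, whereas
   changing (1, 0) to (1, 1) is not.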

946	   Appendix X considers the potential repercussions of subverting end-
947	   to-end congestion control by either falsely indicating ECN-Capabil-
948	   ity, or by erasing the congestion indication in ECN (the CE-bit).  We
949	   observe in the Appendix that the consequence of subverting ECN-based
950	   congestion control may lead to potential unfairness, but this is
951	   likely to be no worse than the subversion of either ECN-based or
952	   packet-based congestion control by the end nodes.

954	8.1.  Complications Introduced by Split Paths

956	   If a router or other network element has access to all of the packets
957	   of a flow, then that router could do no more damage to a flow by
958	   altering the ECN field than it could by simply dropping all of the
959	   packets from that flow.  However, in some cases, a malicious or bro-
960	   ken router might have access to only a subset of the packets from a
961	   flow.  The question is as follows:  can this router, by altering the
962	   ECN field in this subset of the packets, do more damage to that flow
963	   than if it had simply dropped that subset of the packets?

965	   This is also discussed in detail in the Appendix, which concludes as
966	   follows:  It is true that the adversary that has access only to a
967	   subset of packets in an aggregate might, by subverting ECN-based con-
968	   gestion control, be able to deny the benefits of ECN to the other
969	   packets in the aggregate.  While this is undesirable, this is not a
970	   sufficient concern to result in disabling ECN within an IP tunnel.

972	9.  Encapsulated Packets

974	9.1.  IP packets encapsulated in IP

976	   The encapsulation of IP packet headers in tunnels is used in many
977	   places, including IPsec and IP in IP [RFC2003].  Currently, the ECN
978	   specification does not accommodate the constraints imposed by some of
979	   these pre-existing specifications for tunnels.  This document consid-
980	   ers issues related to interactions between ECN and IP tunnels, and
981	   specifies two alternative solutions.

983	   Some IP tunnel modes are based on adding a new "outer" IP header that
984	   encapsulates the original, or "inner" IP header and its associated
985	   packet.  In many cases, the new "outer" IP header may be added and
986	   removed at intermediate points along a connection, enabling the net-
987	   work to establish a tunnel without requiring endpoint participation.
988	   We denote tunnels that specify that the outer header be discarded at
989	   tunnel egress as "simple tunnels".

991	   ECN uses the ECT and CE flags in the IP header for signaling between
992	   routers and connection endpoints.  ECN interacts with IP tunnels
993	   because of the ECT and CE flags in the DS field octet in the IP
994	   header [RFC2474] (also referred to as the IPv4 TOS octet or IPv6
995	   Traffic Class octet).  [RFC2983] discusses interactions of Differen-
996	   tiated Services with IP tunnels of various forms.  In simple IP tun-
997	   nels the DS field octet is copied or mapped from the inner IP header
998	   to the outer IP header at IP tunnel ingress, and the outer header's
999	   copy of this field is discarded at IP tunnel egress.  If the outer
1000	   header were to be simply discarded without taking care to deal with
1001	   the ECN related flags, and an ECN-capable router were to set the CE
1002	   (Congestion Experienced) bit within a packet in a simple IP tunnel,
1003	   this indication would be discarded at tunnel egress, losing the indi-
1004	   cation of congestion.

1006	   Thus, the use of ECN over simple IP tunnels would result in routers
1007	   attempting to use the outer IP header to signal congestion to end-
1008	   points, but those congestion warnings never arriving because the
1009	   outer header is discarded at the tunnel egress point.  This problem
1010	   was encountered with ECN and IPsec in tunnel mode, and RFC 2481 rec-
1011	   ommended that ECN not be used with the older simple IPsec tunnels in
1012	   order to avoid this behavior and its consequences.  When ECN becomes
1013	   widely deployed, then simple tunnels likely to carry ECN-capable
1014	   traffic will have to be changed.

1016	   From a security point of view, the use of ECN in the outer header of
1017	   an IP tunnel might raise security concerns because an adversary could
1018	   tamper with the ECN information that propagates beyond the tunnel
1019	   endpoint.  Based on an analysis in the Appendix of these concerns and
1020	   the resultant risks, our overall approach is to make support for ECN
1021	   an option for IP tunnels, so that an IP tunnel can be specified or
1022	   configured either to use ECN or not to use ECN in the outer header of
1023	   the tunnel.  Thus, in environments or tunneling protocols where the
1024	   risks of using ECN are judged to outweigh its benefits, the tunnel
1025	   can simply not use ECN in the outer header.  Then the only indication
1026	   of congestion experienced at routers within the tunnel would be
1027	   through packet loss.

1029	   The result is that there are two viable options for the behavior of
1030	   ECN-capable connections over an IP tunnel, especially IPsec tunnels:
1031	      * A limited-functionality option in which ECN is preserved in the
1032	      inner header, but disabled in the outer header.  The only mecha-
1033	      nism available for signaling congestion occurring within the tun-
1034	      nel in this case is dropped packets.
1035	      * A full-functionality option that supports ECN in both the inner
1036	      and outer headers, and propagates congestion warnings from nodes
1037	      within the tunnel to endpoints.

1039	   Support for these options requires varying amounts of changes to IP
1040	   header processing at tunnel ingress and egress.  A small subset of
1041	   these changes needed to support only the limited-functionality
1042	   option would be sufficient to eliminate any incompatibility between
1043	   ECN and IP tunnels.

1045	   One goal of this document is to give guidance about the tradeoffs
1046	   between the limited-functionality and full-functionality options.  A
1047	   full discussion of the potential effects of an adversary's modifica-
1048	   tions of the CE and ECT bits is given in the Appendix.

1050	9.1.1.  The limited-functionality and full-functionality options within
1051	IP Tunnels

1053	   The limited-functionality option for ECN encapsulation in IP tunnels
1054	   is for the ECT bit in the outside (encapsulating) header to be off
1055	   (i.e., set to 0), regardless of the value of the ECT bit in the
1056	   inside (encapsulated) header.  With this option, the ECN field in the
1057	   inner header is not altered upon decapsulation.  The disadvantage of
1058	   this approach is that the flow does not have ECN support for that
1059	   part of the path that is using IP tunneling, even if the encapsulated
1060	   packet (from the original TCP sender) is ECN-Capable.  That is, if
1061	   the encapsulated packet arrives at a congested router that is ECN-
1062	   capable, and the router can decide to drop or mark the packet as an
1063	   indication of congestion to the end nodes, the router will not be
1064	   permitted to set the CE bit in the packet header, but instead will
1065	   have to drop the packet.

1067	   The IP full-functionality option for ECN encapsulation is to copy the
1068	   ECT bit of the inside header to the outside header on encapsulation,
1069	   and to OR the CE bit from the outer header with the CE bit of the
1070	   inside header on decapsulation.  That is, for full ECN support the
1071	   encapsulation and decapsulation processing for the DS field octet
1072	   involves the following:  At tunnel ingress, the full-functionality
1073	   option copies the value of ECT (bit 6) in the inner header to the
1074	   outer header.  CE (bit 7) is set to 0 in the outer header.  Upon
1075	   decapsulation at the tunnel egress, the full-functionality option
1076	   sets CE to 1 in the inner header if the value of ECT (bit 6) in the
1077	   inner header is 1, and the value of CE (bit 7) in the outer header is
1078	   1.  Otherwise, no change is made to this field of the inner header.
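
   As an illustration only, the two options can be sketched in Python.
   The sketch assumes the octet is held as an integer with bit 0 as the
   most significant bit, so that ECT (bit 6) is the mask 0x02 and CE
   (bit 7) is the mask 0x01.  The function names encapsulate and
   decapsulate are hypothetical, and any DS field mapping at ingress is
   omitted.

```python
# Assumed masks: bits are numbered from the most significant bit of the
# octet, so bit 6 (ECT) is 0x02 and bit 7 (CE) is 0x01.
ECT = 0x02
CE = 0x01
DS_MASK = 0xFC  # the six-bit DS field


def encapsulate(inner_octet, full_functionality):
    """Build the outer-header octet at tunnel ingress."""
    outer = inner_octet & DS_MASK      # DS field copied from the inner header
    if full_functionality:
        outer |= inner_octet & ECT     # copy ECT; CE stays 0
    return outer                       # limited option: ECT = 0 and CE = 0


def decapsulate(inner_octet, outer_octet, full_functionality):
    """Return the inner-header octet after tunnel egress processing."""
    if full_functionality and (inner_octet & ECT) and (outer_octet & CE):
        return inner_octet | CE        # propagate congestion seen in the tunnel
    return inner_octet                 # otherwise the inner header is unchanged
```

   With these assumptions, an inner octet of 0xBA (ECT set) is carried
   with ECT set under the full-functionality option and with a zero ECN
   field under the limited-functionality option.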

1080	   With the full-functionality option, a flow can take advantage of ECN
1081	   in those parts of the path that might use IP tunneling.  The disad-
1082	   vantage of the full-functionality option from a security perspective
1083	   is that the IP tunnel cannot protect the flow from certain modifica-
1084	   tions to the ECN bits in the IP header within the tunnel.  The poten-
1085	   tial dangers from modifications to the ECN bits in the IP header are
1086	   described in detail in the Appendix.

1088	      (1) An IP tunnel MUST modify the handling of the DS field octet at
1089	      IP tunnel endpoints by implementing either the limited-functional-
1090	      ity or the full-functionality option.
1091	      (2) Optionally, an IP tunnel MAY enable the endpoints of an IP
1092	      tunnel to negotiate the choice between the limited-functionality
1093	      and the full-functionality option for ECN in the tunnel.

1095	   The minimum required to make ECN usable with IP tunnels is the lim-
1096	   ited-functionality option, which prevents ECN from being enabled in
1097	   the outer header of an IPsec tunnel.  Full support for ECN requires
1098	   the use of the full-functionality option.  If there are no optional
1099	   mechanisms for the tunnel endpoints to negotiate a choice between the
1100	   limited-functionality and full-functionality options, there can be a
1101	   pre-existing agreement between the tunnel endpoints about whether to
1102	   support the limited-functionality or the full-functionality ECN
1103	   option.

1105	   In addition, it is RECOMMENDED that packets with ECT and CE both set
1106	   to 1 in the outer header be dropped if they arrive at the tunnel
1107	   egress point for a tunnel that uses the limited-functionality option,
1108	   or for a tunnel that uses the full-functionality option but for which
1109	   the ECT bit in the inner header is set to zero.  This is motivated by
1110	   backwards compatibility and to ensure that no unauthorized modifica-
1111	   tions of the ECN field take place, and is discussed further in the
1112	   Appendix.
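
   The recommendation above can be read as a simple predicate at the
   tunnel egress.  The following Python sketch is illustrative only: the
   name should_drop_at_egress is hypothetical, and the masks assume that
   bit 0 is the most significant bit of the octet (ECT = 0x02,
   CE = 0x01).

```python
ECT = 0x02  # assumed mask for ECT (bit 6, counting from the MSB)
CE = 0x01   # assumed mask for CE (bit 7)


def should_drop_at_egress(inner_octet, outer_octet, full_functionality):
    """True if the RECOMMENDED action for this packet is to drop it."""
    both_set = (outer_octet & (ECT | CE)) == (ECT | CE)
    if not both_set:
        return False                   # the recommendation does not apply
    if not full_functionality:
        return True                    # limited option: outer ECT must be 0
    return (inner_octet & ECT) == 0    # full option: inner ECT was 0
```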

1114	9.1.2.  Changes to the ECN Field within an IP Tunnel

1116	   The presence of a copy of the ECN field in the inner header of an IP
1117	   tunnel mode packet provides an opportunity for detection of unautho-
1118	   rized modifications to the ECT bit in the outer header.  Comparison
1119	   of the ECT bits in the inner and outer headers falls into two cate-
1120	   gories for implementations that conform to this document:
1121	      * If the IP tunnel uses the full-functionality option, then the
1122	      values of the ECT bits in the inner and outer headers should be
1123	      identical.
1124	      * If the tunnel uses the limited-functionality option, then the
1125	      ECT bit in the outer header should be 0.

1127	   Receipt of a packet not satisfying the appropriate condition could be
1128	   a cause of concern.
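
   A minimal sketch of this comparison, in Python, is given below.  The
   function name ect_is_consistent is hypothetical, and the ECT mask
   assumes that bit 0 is the most significant bit of the octet.

```python
ECT = 0x02  # assumed mask for ECT (bit 6, counting from the MSB)


def ect_is_consistent(inner_octet, outer_octet, full_functionality):
    """Check the outer ECT bit against the condition for the tunnel's option."""
    outer_ect = bool(outer_octet & ECT)
    if full_functionality:
        # Inner and outer ECT bits should be identical.
        return outer_ect == bool(inner_octet & ECT)
    # Limited-functionality option: outer ECT should be 0.
    return not outer_ect
```

   A False result corresponds to a packet that does not satisfy the
   appropriate condition; what to do with such a packet is a separate
   policy decision.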

1130	   Consider the case of an IP tunnel where the tunnel ingress point has
1131	   not been updated to this document's requirements, while the tunnel
1132	   egress point has been updated to support ECN.  In this case, the IP
1133	   tunnel is not explicitly configured to support the full-functionality
1134	   ECN option. However, the tunnel ingress point is behaving identically
1135	   to a tunnel ingress point that supports the full-functionality
1136	   option.  If packets from an ECN-capable connection use this tunnel,
1137	   ECT will be set to 1 in the outer header at the tunnel ingress point.
1138	   Congestion within the tunnel may then result in ECN-capable routers
1139	   setting CE in the outer header.  Because the tunnel has not been
1140	   explicitly configured to support the full-functionality option, the
1141	   tunnel egress point expects the ECT bit in the outer header to be 0.
1142	   When an ECN-capable tunnel egress point receives a packet with the
1143	   ECT bit in the outer header set to 1, in a tunnel that has not been
1144	   configured to support the full-functionality option, that packet
1145	   should be processed, according to whether the CE bit was set, as
1146	   follows.  It is RECOMMENDED that such packets, with the ECT bit in
1147	   the outer header set to 1 on a tunnel that has not been configured
1148	   to support the full-functionality option, be dropped at the egress
1149	   point if CE is set to 1 in the outer header but 0 in the inner
1150	   header, and forwarded otherwise.
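
   The handling of this case can be sketched as follows.  This Python
   fragment is illustrative only: the name legacy_ingress_action is
   hypothetical, the masks assume that bit 0 is the most significant
   bit of the octet, and packets arriving with ECT set to 0 in the outer
   header are assumed to be forwarded as usual.

```python
ECT = 0x02  # assumed mask for ECT (bit 6, counting from the MSB)
CE = 0x01   # assumed mask for CE (bit 7)


def legacy_ingress_action(inner_octet, outer_octet):
    """Action at an egress NOT configured for the full-functionality option."""
    if not (outer_octet & ECT):
        return "forward"               # the expected case for this tunnel
    if (outer_octet & CE) and not (inner_octet & CE):
        return "drop"                  # CE was set inside the tunnel
    return "forward"
```

   Dropping in the second case means that the congestion indication set
   within the tunnel still reaches the end nodes, as a packet loss.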

1152	   An IP tunnel cannot provide protection against erasure of congestion
1153	   indications based on resetting the value of the CE bit in packets for
1154	   which ECT is set in the outer header.  The erasure of congestion
1155	   indications may impact the network and other flows in ways that would
1156	   not be possible in the absence of ECN.  It is important to note that
1157	   erasure of congestion indications can only be performed on congestion
1158	   indications placed by nodes within the tunnel; the copy of the CE bit
1159	   in the inner header preserves congestion notifications from nodes
1160	   upstream of the tunnel ingress.  If erasure of congestion notifica-
1161	   tions is judged to be a security risk that exceeds the congestion
1162	   management benefits of ECN, then tunnels could be specified or con-
1163	   figured to use the limited-functionality option.

1165	9.2.  IPsec Tunnels

1167	   IPsec supports secure communication over potentially insecure network
1168	   components such as intermediate routers.  IPsec protocols support two
1169	   operating modes, transport mode and tunnel mode, that span a wide
1170	   range of security requirements and operating environments.  Transport
1171	   mode security protocol header(s) are inserted between the IP (IPv4 or
1172	   IPv6) header and higher layer protocol headers (e.g., TCP), and hence
1173	   transport mode can only be used for end-to-end security on a connec-
1174	   tion.  IPsec tunnel mode is based on adding a new "outer" IP header
1175	   that encapsulates the original, or "inner" IP header and its associ-
1176	   ated packet.  Tunnel mode security headers are inserted between these
1177	   two IP headers.  In contrast to transport mode, the new "outer" IP
1178	   header and tunnel mode security headers can be added and removed at
1179	   intermediate points along a connection, enabling security gateways to
1180	   secure vulnerable portions of a connection without requiring endpoint
1181	   participation in the security protocols.  An important aspect of tun-
1182	   nel mode security is that in the original specification, the outer
1183	   header is discarded at tunnel egress, ensuring that security threats
1184	   based on modifying the IP header do not propagate beyond that tunnel
1185	   endpoint.  Further discussion of IPsec can be found in [RFC2401].

1187	   The IPsec protocol as originally defined in [ESP, AH] required that
1188	   the inner header's ECN field not be changed by IPsec decapsulation
1189	   processing at a tunnel egress node; this would have ruled out the
1190	   possibility of full-functionality mode for ECN.  At the same time,
1191	   this would ensure that an adversary's modifications to the ECN field
1192	   cannot be used to launch theft- or denial-of-service attacks across
1193	   an IPsec tunnel endpoint, as any such modifications will be discarded
1194	   at the tunnel endpoint.

1196	   In principle, permitting the use of ECN functionality in the outer
1197	   header of an IPsec tunnel raises security concerns because an adver-
1198	   sary could tamper with the information that propagates beyond the
1199	   tunnel endpoint.  Based on an analysis (included in the Appendix) of
1200	   these concerns and the associated risks, our overall approach has
1201	   been to provide configuration support for IPsec changes to remove the
1202	   conflict with ECN.

1204	   In particular, in tunnel mode the IPsec tunnel MUST support either
1205	   the limited-functionality or the full-functionality mode outlined in
1206	   Section X.

1208	   This makes permission to use ECN functionality in the outer header of
1209	   an IPsec tunnel a configurable part of the corresponding IPsec
1210	   Security Association (SA), so that it can be disabled in situations
1211	   where the risks are judged to outweigh the benefits.  The result is
1212	   that an IPsec security administrator is presented with two alterna-
1213	   tives for the behavior of ECN-capable connections within an IPsec
1214	   tunnel, the limited-functionality alternative and full-functionality
1215	   alternative described earlier.  All IPsec implementations MUST imple-
1216	   ment either the limited-functionality or the full-functionality
1217	   alternative in order to eliminate incompatibility between ECN and
1218	   IPsec tunnels, but implementers MAY choose to implement either alter-
1219	   native.

1221	   In addition, this document specifies how the endpoints of an IPsec
1222	   tunnel could negotiate enabling ECN functionality in the outer head-
1223	   ers of that tunnel based on security policy.  The ability to negoti-
1224	   ate ECN usage between tunnel endpoints would enable a security admin-
1225	   istrator to disable ECN in situations where she believes the risks
1226	   (e.g., of lost congestion notifications) outweigh the benefits of
1227	   ECN.

1229	   The IPsec protocol, as defined in [ESP, AH], does not include the IP
1230	   header's ECN field in any of its cryptographic calculations (in the
1231	   case of tunnel mode, the outer IP header's ECN field is not
1232	   included).  Hence modification of the ECN field by a network node has
1233	   no effect on IPsec's end-to-end security, because it cannot cause any
1234	   IPsec integrity check to fail.  As a consequence, IPsec does not pro-
1235	   vide any defense against an adversary's modification of the ECN field
1236	   (i.e., a man-in-the-middle attack), as the adversary's modification
1237	   will also have no effect on IPsec's end-to-end security.  In some
1238	   environments, the ability to modify the ECN field without affecting
1239	   IPsec integrity checks may constitute a covert channel; if it is nec-
1240	   essary to eliminate such a channel or reduce its bandwidth, then the
1241	   IPsec tunnel should be run in limited-functionality mode.

1243	9.2.1.  Negotiation between Tunnel Endpoints

1245	   This section describes the detailed changes to enable usage of ECN
1246	   over IPsec tunnels, including the negotiation of ECN support between
1247	   tunnel endpoints.  This is supported by three changes to IPsec:
1248	      * An optional Security Association Database (SAD) field indicating
1249	      whether tunnel encapsulation and decapsulation processing allows
1250	      or forbids ECN usage in the outer IP header.
1251	      * An optional Security Association Attribute that enables negotia-
1252	      tion of this SAD field between the two endpoints of an SA that
1253	      supports tunnel mode.
1254	      * Changes to tunnel mode encapsulation and decapsulation process-
1255	      ing to allow or forbid ECN usage in the outer IP header based on
1256	      the value of the SAD field.  When ECN usage is allowed in the
1257	      outer IP header, ECT is set in the outer header for ECN-capable
1258	      connections and congestion notifications (indicated by the CE bit)
1259	      from such connections are propagated to the inner header at tunnel
1260	      egress.

1262	   If negotiation of ECN usage is implemented, then the SAD field SHOULD
1263	   also be implemented.  On the other hand, negotiation of ECN usage is
1264	   OPTIONAL in all cases, even for implementations that support the SAD
1265	   field.  The encapsulation and decapsulation processing changes are
1266	   REQUIRED, but MAY be implemented without the other two changes by
1267	   assuming that ECN usage is always forbidden.  The full-functionality
1268	   alternative for ECN usage over IPsec tunnels consists of the SAD
1269	   field and the full version of encapsulation and decapsulation pro-
1270	   cessing changes, with or without the OPTIONAL negotiation support.
1271	   The limited-functionality alternative consists of a subset of the
1272	   encapsulation and decapsulation changes that always forbids ECN
1273	   usage.

1275	   These changes are covered further in the following three subsections.

1277	9.2.1.1.  ECN Tunnel Security Association Database Field

1279	   Full ECN functionality adds a new field to the SAD (see [RFC2401]):

1281	      ECN Tunnel: allowed or forbidden.

1283	      Indicates whether ECN-capable connections using this SA in tunnel
1284	      mode are permitted to receive ECN congestion notifications for
1285	      congestion occurring within the tunnel.  The allowed value enables
1286	      ECN congestion notifications.  The forbidden value disables such
1287	      notifications, causing all congestion to be indicated via dropped
1288	      packets.

1290	      [OPTIONAL.  The value of this field SHOULD be assumed to be "for-
1291	      bidden" in implementations that do not support it.]

1293	   If this attribute is implemented, then the SA specification in a
1294	   Security Policy Database (SPD) entry MUST support a corresponding
1295	   attribute, and this SPD attribute MUST be covered by the SPD adminis-
1296	   trative interface (currently described in Section 4.4.1 of
1297	   [RFC2401]).

1299	9.2.1.2.  ECN Tunnel Security Association Attribute

1301	   A new IPsec Security Association Attribute is defined to enable the
1302	   support for ECN congestion notifications based on the outer IP header
1303	   to be negotiated for IPsec tunnels (see [RFC2407]).  This attribute
1304	   is OPTIONAL, although implementations that support it SHOULD also
1305	   support the SAD field defined in Section 9.2.1.1.

1307	   Attribute Type

1309	           class               value           type
1310	     -------------------------------------------------
1311	     ECN Tunnel                 10             Basic

1313	   The IPsec SA Attribute value 10 has been allocated by IANA to indi-
1314	   cate that the ECN Tunnel SA Attribute is being negotiated; the type
1315	   of this attribute is Basic (see Section 4.5 of [RFC2407]).  The Class
1316	   Values are used to conduct the negotiation.  See [RFC2407, RFC2408,
1317	   RFC2409] for further information including encoding formats and
1318	   requirements for negotiating this SA attribute.

1320	   Class Values

1322	     ECN Tunnel

1324	       Specifies whether ECN functionality is allowed to
1325	       be used with Tunnel Encapsulation Mode.
1326	       This affects tunnel encapsulation and decapsulation processing -
1327	       see Section 9.2.1.3.

1329	       RESERVED          0
1330	       Allowed           1
1331	       Forbidden         2

1333	       Values 3-61439 are reserved to IANA.  Values 61440-65535 are for
1334	       private use.

1336	       If unspecified, the default shall be assumed to be Forbidden.
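
   The class values and the default can be illustrated with a short
   Python sketch.  The representation of negotiated attributes as a
   dictionary keyed by attribute type, the name negotiated_ecn_tunnel,
   and the choice to reject values other than Allowed and Forbidden are
   assumptions made for illustration; they are not part of the
   attribute definition.

```python
from enum import IntEnum

ECN_TUNNEL_ATTRIBUTE = 10  # IPsec SA Attribute type for ECN Tunnel


class EcnTunnel(IntEnum):
    RESERVED = 0
    ALLOWED = 1
    FORBIDDEN = 2


def negotiated_ecn_tunnel(sa_attributes):
    """Return the ECN Tunnel setting from a {type: value} attribute map."""
    value = sa_attributes.get(ECN_TUNNEL_ATTRIBUTE)
    if value is None:
        return EcnTunnel.FORBIDDEN     # unspecified: default is Forbidden
    if value not in (EcnTunnel.ALLOWED, EcnTunnel.FORBIDDEN):
        # Assumption: reserved and unassigned values are treated as errors.
        raise ValueError(f"unsupported ECN Tunnel value: {value}")
    return EcnTunnel(value)
```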

1338	   ECN Tunnel is a new SA attribute, and hence initiators that use it
1339	   can expect to encounter responders that do not understand it, and
1340	   therefore reject proposals containing it.  For backwards compatibil-
1341	   ity with such implementations initiators SHOULD always also include a
1342	   proposal without the ECN Tunnel attribute to enable such a responder
1343	   to select a transform or proposal that does not contain the ECN Tun-
1344	   nel attribute.  RFC 2407 currently requires responders to reject all
1345	   proposals if any proposal contains an unknown attribute; this
1346	   requirement is expected to be changed to require a responder not to
1347	   select proposals or transforms containing unknown attributes.

1349	9.2.1.3.  Changes to IPsec Tunnel Header Processing

1351	   Subsequent to the publication of [RFC2401], the TOS octet of IPv4
1352	   and the Traffic Class octet of IPv6 have been superseded by the six-
1353	   bit DS Field [RFC2474, RFC2780] and a two-bit "currently unused" (CU)
1354	   field [RFC2780], and this document supersedes the CU field by the ECN
1355	   Field.

1357	   For full ECN support, the encapsulation and decapsulation processing
1358	   for the IPv4 TOS field and the IPv6 Traffic Class field are changed
1359	   from that specified in [RFC2401] to the following:

1361	                           <-- How Outer Hdr Relates to Inner Hdr -->
1362	                           Outer Hdr at                 Inner Hdr at
1363	      IPv4                 Encapsulator                 Decapsulator
1364	        Header fields:     --------------------         ------------
1365	          DS Field         copied from inner hdr (5)    no change
1366	          ECN Field        constructed (7)              constructed (8)

1368	      IPv6
1369	        Header fields:
1370	          DS Field         copied from inner hdr (6)    no change
1371	          ECN Field        constructed (7)              constructed (8)

1373	      (5)(6) If the packet will immediately enter a domain for which the
1374	      DSCP value in the outer header is not appropriate, that value MUST
1375	      be mapped to an appropriate value for the domain [RFC2474].  Also
1376	      see [RFC2475] for further information.

1378	      (7) If the value of the ECN Tunnel field in the SAD entry for this
1379	      SA is "allowed" and the value of ECT (bit 0) is 1 in the inner
1380	      header, set ECT to 1 in the outer header, else set ECT to 0 in the
1381	      outer header.  Set CE (bit 1) to 0 in the outer header.

1383	      (8) If the value of the ECN tunnel field in the SAD entry for this
1384	      SA is "allowed" and the value of ECT (bit 0) in the inner header
1385	      is 1, then set the CE bit (bit 1) in the inner header to the logi-
1386	      cal OR of the CE bit in the inner header with the CE bit in the
1387	      outer header, else make no change to the ECN field.

1389	      Notes (5) and (6) are identical here; both numbers are retained
1390	      to match usage in [RFC2401], where the two notes differ.

1392	   The above description applies to implementations that support the ECN
1393	   Tunnel field in the SAD; such implementations MUST implement this
1394	   processing of the DS field instead of the processing of the IPv4 TOS
1395	   octet and IPv6 Traffic Class octet defined in [RFC2401].  This con-
1396	   stitutes the full-functionality alternative for ECN usage with IPsec
1397	   tunnels.

1399	   An implementation that does not support the ECN Tunnel field in the
1400	   SAD MUST implement processing of the DS Field by assuming that the
1401	   value of the ECN Tunnel field of the SAD is "forbidden" for every SA.
1402	   In this case, the processing of the ECN field reduces to:

1404	      (7) Set the ECN field (ECT and CE bits) to zero in the outer
1405	      header.
1406	      (8) Make no change to the ECN field in the inner header.

1408	   This constitutes the limited-functionality alternative for ECN usage
1409	   with IPsec tunnels.
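
   Notes (7) and (8), in both their full and reduced forms, can be
   sketched in Python as processing driven by the ECN Tunnel field of
   the SAD entry.  The SadEntry class and the function names are
   hypothetical, and the masks assume that ECT and CE are the two least
   significant bits of the octet (ECT = 0x02, CE = 0x01).  An
   implementation without the SAD field behaves as if every entry held
   the default value shown here.

```python
from dataclasses import dataclass

ECT = 0x02  # assumed mask for the ECT bit
CE = 0x01   # assumed mask for the CE bit


@dataclass
class SadEntry:
    # Hypothetical stand-in for an SAD entry; "forbidden" is the default.
    ecn_tunnel: str = "forbidden"


def build_outer_ecn(sa, inner_octet):
    """Note (7): construct the outer ECN field at the encapsulator."""
    if sa.ecn_tunnel == "allowed" and inner_octet & ECT:
        return ECT                     # ECT = 1, CE = 0
    return 0                           # ECT = 0, CE = 0


def update_inner_ecn(sa, inner_octet, outer_octet):
    """Note (8): construct the inner ECN field at the decapsulator."""
    if sa.ecn_tunnel == "allowed" and inner_octet & ECT:
        return inner_octet | (outer_octet & CE)   # logical OR of the CE bits
    return inner_octet                 # no change to the ECN field
```

   For example, with ECN Tunnel "allowed", an ECN-capable inner packet
   whose outer header is marked CE within the tunnel leaves the egress
   with CE set in the inner header; with the default value the inner
   header is returned unchanged.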

1411	   For backwards compatibility, packets with ECT and CE both set to 1 in
1412	   the outer header SHOULD be dropped if they arrive on an SA that is
1413	   using the limited-functionality option, or on an SA that is using
1414	   the full-functionality option when the packet's inner header has the
1415	   ECT flag set to 0 (so that a conforming ingress would have set ECT
1416	   to 0 in the outer header as well).

1418	9.2.2.  Changes to the ECN Field within an IPsec Tunnel

1420	   If the ECN Field is changed inappropriately within an IPsec tunnel,
1421	   and this change is detected at the tunnel egress, then the receipt of
1422	   a packet not satisfying the appropriate condition for its SA is an
1423	   auditable event.  An implementation MAY create audit records with
1424	   per-SA counts of incorrect packets over some time period rather than
1425	   creating an audit record for each erroneous packet.  Any such audit
1426	   record SHOULD contain the headers from at least one erroneous packet,
1427	   but need not contain the headers from every packet represented by the
1428	   entry.
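
   One possible shape for such aggregated audit records is sketched
   below in Python.  The class name EcnAuditLog, the record layout, and
   the use of an opaque SA identifier are assumptions for illustration
   only; the flush interval is left to the caller.

```python
from collections import defaultdict


class EcnAuditLog:
    """Per-SA counts of erroneous packets, with one sample header each."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._samples = {}

    def record(self, sa_id, headers):
        """Note one erroneous packet received on the given SA."""
        self._counts[sa_id] += 1
        # Keep the headers of the first erroneous packet only.
        self._samples.setdefault(sa_id, headers)

    def flush(self):
        """Emit one audit record per SA and start a new time period."""
        records = [
            {"sa": sa_id, "count": count, "sample_headers": self._samples[sa_id]}
            for sa_id, count in self._counts.items()
        ]
        self._counts.clear()
        self._samples.clear()
        return records
```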

1430	9.2.3.  Comments for IPsec Support

1432	   Substantial comments were received on two areas of this document dur-
1433	   ing review by the IPsec working group.  This section describes these
1434	   comments and explains why the proposed changes were not incorporated.

1436	   The first comment indicated that per-node configuration is easier to
1437	   implement than per-SA configuration.  After serious thought and
1438	   despite some initial encouragement of per-node configuration, it no
1439	   longer seems to be a good idea. The concern is that as IPsec is pro-
1440	   gressively deployed, many ECN-aware IPsec implementations will find
1441	   themselves communicating with a mixture of ECN-aware and ECN-unaware
1442	   IPsec tunnel endpoints.  In such an environment with per-node config-
1443	   uration, the only reasonable thing to do is forbid ECN usage for all
1444	   IPsec tunnels, which is not the desired outcome.

1446	   In the second area, several reviewers noted that SA negotiation is
1447	   complex, and adding to it is non-trivial.  One reviewer suggested
1448	   using ICMP after tunnel setup as a possible alternative.  The addi-
1449	   tion to SA negotiation in the document is OPTIONAL and will remain
1450	   so; implementers are free to ignore it.  The authors believe that the
1451	   assurance it provides can be useful in a number of situations.  In
1452	   practice, if this is not implemented, it can be deleted at a subse-
1453	   quent stage in the standards process.  Extending ICMP to negotiate
1454	   ECN after tunnel setup is more complex than extending SA attribute
1455	   negotiation.  Some tunnels do not permit traffic to be addressed to
1456	   the tunnel egress endpoint, hence the ICMP packet would have to be
1457	   addressed to somewhere else, scanned for by the egress endpoint, and
1458	   discarded there or at its actual destination.  In addition, ICMP
1459	   delivery is unreliable, and hence there is a possibility of an ICMP
1460	   packet being dropped, entailing the invention of yet another
1461	   ack/retransmit mechanism.  It seems better simply to specify an
1462	   OPTIONAL extension to the existing SA negotiation mechanism.

1464	9.3.  IP packets encapsulated in non-IP packet headers

1466	   A different set of issues is raised, relative to ECN, when IP
1467	   packets are encapsulated in tunnels with non-IP packet headers.  This
1468	   occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP].
1469	   For these protocols, there is no conflict with ECN; it is just that
1470	   ECN cannot be used within the tunnel unless an ECN codepoint can be
1471	   specified for the header of the encapsulating protocol.  [RFD99] con-
1472	   sidered a preliminary proposal for incorporating ECN into MPLS, and
1473	   proposals for incorporating ECN into GRE, L2TP, or PPTP will be con-
1474	   sidered as the need arises.

1476	10.  Issues Raised by Monitoring and Policing Devices

1478	   One possibility is that monitoring and policing devices (or more
1479	   informally, "penalty boxes") will be installed in the network to mon-
1480	   itor whether best-effort flows are appropriately responding to con-
1481	   gestion, and to preferentially drop packets from flows determined not
1482	   to be using adequate end-to-end congestion control procedures.  This
1483	   is discussed in more detail in the Appendix.

1485	   We recommend that any "penalty box" that detects a flow or an aggre-
1486	   gate of flows that is not responding to end-to-end congestion control
1487	   first change from marking to dropping packets from that flow, before
1488	   taking any additional action to restrict the bandwidth available to
1489	   that flow.  Thus, initially, the router may drop packets in which the
1490	   router would otherwise have set the CE bit.  This could include
1491	   dropping those arriving packets for that flow that are ECN-Capable
1492	   and that already have the CE bit set.  In this way, any congestion
1493	   indications seen by that router for that flow will be guaranteed to
1494	   also be seen by the end nodes, even in the presence of malicious or
1495	   broken routers elsewhere in the path.  If we assume that the first
1496	   action taken at any "penalty box" for an ECN-capable flow will be to
1497	   drop packets instead of marking them, then there is no way that an
1498	   adversary that subverts ECN-based end-to-end congestion control can
1499	   cause a flow to be characterized as being non-cooperative and placed
1500	   into a more severe action within the "penalty box".
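
   As an illustration only, the first step recommended above can be
   sketched in Python.  The names Packet, congestion_action, congested,
   and flow_is_unresponsive are hypothetical and are not defined by
   this document; "congested" stands for the queue's decision to signal
   congestion on this packet, and the optional dropping of packets that
   arrive with the CE bit already set is assumed to be enabled.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: int  # ECN-Capable Transport bit
    ce: int   # Congestion Experienced bit


def congestion_action(pkt, congested, flow_is_unresponsive):
    """Return 'forward', 'mark', or 'drop' for one arriving packet.

    Hypothetical helper: flow_is_unresponsive is the penalty box's
    verdict on the flow that this packet belongs to.
    """
    if flow_is_unresponsive:
        # First step of the penalty box: drop wherever the router would
        # otherwise have marked, and also drop ECN-Capable packets that
        # arrive with the CE bit already set.
        if congested or (pkt.ect and pkt.ce):
            return "drop"
        return "forward"
    if congested:
        # Normal behavior: mark ECN-Capable packets, drop the others.
        return "mark" if pkt.ect else "drop"
    return "forward"
```

   Because every congestion indication for a flagged flow becomes a
   drop in this sketch, the end nodes see each indication even if a
   later router erases CE bits.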

1502	   The monitoring and policing devices that are actually deployed could
1503	   fall short of the `ideal' monitoring device described above, in that
1504	   the monitoring is applied not to a single flow or to a single IPsec
1505	   tunnel, but to an aggregate of flows.  In this case, the switch from
1506	   marking to dropping would apply to all of the flows in that aggre-
1507	   gate, denying the benefits of ECN to the other flows in the aggregate
1508	   also.  At the highest level of aggregation, another form of the dis-
1509	   abling of ECN happens even in the absence of monitoring and policing
1510	   devices, when ECN-Capable RED queues switch from marking to dropping
1511	   packets as an indication of congestion when the average queue size
1512	   has exceeded some threshold.

1514	   If there were serious operational problems with routers inappropri-
1515	   ately erasing the CE bit in packet headers, one potential fix would
1516	   be to include a one-bit ECN nonce in packet headers, and for routers
1517	   to erase the nonce when they set the CE bit [SCWA99].  Routers that
1518	   erased the CE bit would be unable to consistently reconstruct the
1519	   original nonce, and thus repeated erasure of the CE bit would be
1520	   detected by the end-nodes.  (This could in fact be done without
1521	   adding any extra bits for ECN in the IP header, by using the ECN
1522	   codepoints (ECT=1, CE=0) and (ECT=0, CE=1) as the two values for the
1523	   nonce, and by defining the codepoint (ECT=0, CE=1) to mean exactly
1524	   the same as the codepoint (ECT=1, CE=0).)  However, at this point the
1525	   potential danger of misbehaving routers does not seem of sufficient
1526	   concern to warrant this additional complication of adding an ECN
1527	   nonce to protect against the erasure of the CE bit.

1529	   An ECN nonce would also address the problem of misbehaving transport
1530	   receivers lying to the transport sender about whether or not the CE
1531	   bit was set in a packet.  However, another possibility is for the
1532	   data sender to test for a misbehaving receiver directly, by occasion-
1533	   ally sending a data packet with ECT and CE set, to see if the
1534	   receiver reports receiving the CE bit.  Of course, if these packets
1535	   encountered congestion in the network, the TCP sender would not
1536	   receive this indication of congestion, so setting the ECT and CE bits
1537	   at the sender would have to be done very sparingly.  In addition, the
1538	   TCP sender would have to remember which packets were sent with the
1539	   ECT and CE bits set, so that it doesn't react to them as if there was
1540	   congestion in the network.  We believe that further research is
1541	   needed on possible transport-based mechanisms for verifying that the
1542	   transport receiver does not lie to the transport sender about the
1543	   receipt of congestion indications.

1545	11.  Evaluations of ECN

1547	   This section discusses some of the related work evaluating the use of
1548	   ECN.  The ECN Web Page [ECN] has pointers to other papers, as well as
1549	   to implementations of ECN.

1551	   [Floyd94] considers the advantages and drawbacks of adding ECN to the
1552	   TCP/IP architecture.  As shown in the simulation-based comparisons,
1553	   one advantage of ECN is to avoid unnecessary packet drops for short
1554	   or delay-sensitive TCP connections.  A second advantage of ECN is in
1555	   avoiding some unnecessary retransmit timeouts in TCP.  This paper
1556	   discusses in detail the integration of ECN into TCP's congestion con-
1557	   trol mechanisms.  The possible disadvantages of ECN discussed in the
1558	   paper are that a non-compliant TCP connection could falsely advertise
1559	   itself as ECN-capable, and that a TCP ACK packet carrying an ECN-Echo
1560	   message could itself be dropped in the network.  The first of these
1561	   two issues is discussed in the appendix of this document, and the
1562	   second is addressed by the addition of the CWR flag in the TCP
1563	   header.

1565	   Experimental evaluations of ECN include [RFC2884,K98].  The conclu-
1566	   sions of [K98] and [RFC2884] are that ECN TCP gets moderately better
1567	   throughput than non-ECN TCP; that ECN TCP flows are fair towards non-
1568	   ECN TCP flows; and that ECN TCP is robust with two-way traffic (with
1569	   congestion in both directions) and with multiple congested gateways.
1570	   Experiments with many short web transfers show that, while most of
1571	   the short connections have similar transfer times with or without
1572	   ECN, a small percentage of the short connections have very long
1573	   transfer times for the non-ECN experiments as compared to the ECN
1574	   experiments.

1576	12.  Summary of changes required in IP and TCP

1578	   This document specified two bits in the IP header, the ECN-Capable
1579	   Transport (ECT) bit and the Congestion Experienced (CE) bit, to be
1580	   used for ECN.  The ECT bit set to "0" indicates that the transport
1581	   protocol will ignore the CE bit.  This is the default value for the
1582	   ECT bit.  The ECT bit set to "1" indicates that the transport proto-
1583	   col is willing and able to participate in ECN.

1585	   The default value for the CE bit is "0".  The router sets the CE bit
1586	   to "1" to indicate congestion to the end nodes.  The CE bit in a
1587	   packet header MUST NOT be reset by a router from "1" to "0".

1589	   When viewed in terms of code points, this document has defined three
1590	   code points for the ECN field, for "not ECT" (ECT=0, CE=0), "ECT but
1591	   not CE" (ECT=1, CE=0), and "ECT and CE" (ECT=1, CE=1).  The code
1592	   point of (ECT=0, CE=1) is not defined in this document.  One
1593	   possibility would be for this code point to be used, some time in the
1594	   future, for some other function for non-ECN-capable packets.  A sec-
1595	   ond possibility would be for this code point to be used as an ECN
1596	   nonce, as described in Section 10.  A third possibility would
1597	   be for the code point (ECT=0, CE=1) to be used to indicate that the
1598	   packet is ECN-capable for an alternate semantics for the Congestion
1599	   Experienced indication.  However, at this time the code point (ECT=0,
1600	   CE=1) remains undefined.

1602	   TCP requires three changes for ECN: a setup phase and two new flags
1603	   in the TCP header. The ECN-Echo flag is used by the data receiver to
1604	   inform the data sender of a received CE packet.  The Congestion Win-
1605	   dow Reduced (CWR) flag is used by the data sender to inform the data
1606	   receiver that the congestion window has been reduced.

1608	   When ECN (Explicit Congestion Notification [RFC2481]) is used, it is
1609	   required that congestion indications generated within an IP tunnel
1610	   not be lost at the tunnel egress.  We specified a minor modification
1611	   to the IP protocol's handling of the ECN field during encapsulation
1612	   and de-capsulation to allow flows that will undergo IP tunneling to
1613	   use ECN.

1615	   Two options for ECN in tunnels were specified:
1616	   1) A limited-functionality option that does not use ECN inside the IP
1617	   tunnel, by turning the ECT bit in the outer header off, and not
1618	   altering the inner header at the time of decapsulation.
1619	   2) The full-functionality option, which copies the ECT bit of the
1620	   inner header to the encapsulating header. At decapsulation, if the
1621	   ECT bit is set in the inner header, the CE bit on the outer header is
1622	   ORed with the CE bit of the inner header to update the CE bit of the
1623	   packet.
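
   The two options can be sketched as follows.  This is a minimal
   Python illustration, not a normative description: the function
   names encapsulate and decapsulate are hypothetical, headers are
   reduced to (ECT, CE) tuples, and starting the outer CE bit at its
   default value of 0 is an assumption of the sketch.

```python
def encapsulate(inner, full_functionality):
    """Return the (ECT, CE) bits to place in the outer header."""
    inner_ect, _inner_ce = inner
    if full_functionality:
        # Copy the ECT bit of the inner header to the outer header.
        # Assumption: the outer CE bit starts at its default of 0.
        return (inner_ect, 0)
    # Limited functionality: ECT is turned off in the outer header.
    return (0, 0)


def decapsulate(inner, outer, full_functionality):
    """Return the (ECT, CE) bits of the packet after decapsulation."""
    inner_ect, inner_ce = inner
    _outer_ect, outer_ce = outer
    if full_functionality and inner_ect:
        # OR the outer CE bit into the inner CE bit.
        return (inner_ect, inner_ce | outer_ce)
    # Otherwise the inner header is not altered.
    return inner
```

   For example, with the full-functionality option, an inner header of
   (1, 0) whose outer header is marked (1, 1) inside the tunnel leaves
   the tunnel as (1, 1); with the limited-functionality option the
   inner header leaves the tunnel unchanged.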

1625	   All IP tunnels MUST implement one of the two alternative approaches
1626	   described above.  For IPsec tunnels, this document also defines an
1627	   optional IPsec SA attribute that enables negotiation of ECN usage
1628	   within IPsec tunnels and an optional field in the Security Associa-
1629	   tion Database to indicate whether ECN is permitted in tunnel mode on
1630	   a SA.

1632	   This document is intended to obsolete RFC 2481, "A Proposal to add
1633	   Explicit Congestion Notification (ECN) to IP", which defined ECN as
1634	   an Experimental Protocol for the Internet Community, as well as to
1635	   obsolete three subsequent internet-drafts on ECN, "IPsec Interactions
1636	   with ECN", "ECN Interactions with IP Tunnels", and "TCP with ECN: The
1637	   Treatment of Retransmitted Data Packets".  This document is intended
1638	   largely to merge the earlier documents all into a single document,
1639	   for greater clarity, in preparation for becoming a Proposed Standard.
1640	   The rest of this section describes the relationship between this
1641	   document and its predecessors.

1643	   RFC 2481 included a brief discussion of the use of ECN with encapsu-
1644	   lated packets, and noted that for the IPsec specifications at the
1645	   time (January 1999), flows could not safely use ECN if they were to
1646	   traverse IPsec tunnels.  RFC 2481 also described the changes that
1647	   could be made to IPsec tunnel specifications to make them compatible
1648	   with ECN.  "IPsec Interactions with ECN" outlined these changes to
1649	   IPsec tunnels in detail, and included an extensive discussion of the
1650	   security implications of ECN (now included as Sections 18 and 19 of
1651	   this document).  The draft of "ECN Interactions with IP Tunnels"
1652	   extended the discussion of IPsec tunnels to include all IP tunnels.
1653	   Because older IP tunnels are not compatible with a flow's use of ECN,
1654	   the deployment of ECN in the Internet will create strong pressure for
1655	   older IP tunnels to be updated to an ECN-compatible version, using
1656	   either the limited-functionality or the full-functionality option.

1658	   This document does not address the issue of including ECN in non-IP
1659	   tunnels such as MPLS, GRE, L2TP, or PPTP.  An earlier preliminary
1660	   document about adding ECN support to MPLS has since expired.

1662	   This document expands on one area not addressed in RFC 2481, the use
1663	   of ECN with retransmitted data packets.  That is, this document
1664	   includes the material from "TCP with ECN: The Treatment of Retrans-
1665	   mitted Data Packets" specifying that the ECT bit should not be set on
1666	   retransmitted data packets.  The motivation for this additional spec-
1667	   ification is to eliminate a possible avenue for denial-of-service
1668	   attacks on an existing TCP connection.  Some prior deployments of
1669	   ECN-capable TCP might not conform to the (new) requirement not to set
1670	   the ECT bit on retransmitted packets; we do not believe this will
1671	   cause significant problems in practice.

1673	   This document also expands on the specification of the use of SYN
1674	   packets for the negotiation of ECN, and specifies some optional
1675	   behavior for this.  In particular, the document allows a TCP host to
1676	   send a non-ECN-setup SYN packet after sending a failed ECN-setup SYN
1677	   packet, and precisely specifies the required behavior when both ECN-
1678	   setup SYN packets and non-ECN-setup SYN packets are sent in the same
1679	   connection.  While some prior deployments of ECN-capable TCP might
1680	   not conform to the requirements specified in this document, we do not
1681	   believe that this will lead to any performance or compatibility prob-
1682	   lems for TCP connections with a combination of TCP implementations at
1683	   the endpoints.

1685	13.  Conclusions

1687	   Given the current effort to implement AQM, we believe this is the
1688	   right time to deploy congestion avoidance mechanisms that do not
1689	   depend on packet drops alone.  With the increased deployment of
1690	   applications and transports sensitive to the delay and loss of a sin-
1691	   gle packet (e.g., realtime traffic, short web transfers), depending
1692	   on packet loss as a normal congestion notification mechanism appears
1693	   to be insufficient (or at the very least, non-optimal).

1695	   We examined the consequence of modifications of the ECN field within
1696	   the network, analyzing all the opportunities for an adversary to
1697	   change the ECN field.  In many cases, the change to the ECN field is
1698	   no worse than dropping a packet. However, we noted that some changes
1699	   have the more serious consequence of subverting end-to-end congestion
1700	   control.  However, we point out that even then the potential damage
1701	   is limited, and is similar to the threat posed by end-systems inten-
1702	   tionally failing to cooperate with end-to-end congestion control.

1704	14.  Acknowledgements

1706	   Many people have made contributions to this work and this document,
1707	   including many that we have not managed to directly acknowledge in
1708	   this document.  In addition, we would like to thank Kenjiro Cho for
1709	   the proposal for the TCP mechanism for negotiating ECN-Capability,
1710	   Kevin Fall for the proposal of the CWR bit, Steve Blake for material
1711	   on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for discus-
1712	   sions of ECN issues, and Steve Bellovin, Jim Bound, Brian Carpenter,
1713	   Paul Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for dis-
1714	   cussions of security issues.  We also thank the Internet End-to-End
1715	   Research Group for ongoing discussions of these issues.

1717	   Email discussions with a number of people, including Alexey
1718	   Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have addressed
1719	   the issues raised by non-conformant equipment in the Internet that
1720	   does not respond to TCP SYN packets with the ECE and CWR flags set.
1721	   We thank Mark Handley, Jitendra Padhye, and others for contributions
1722	   to the TCP initialization procedures.

1724	   The discussion of ECN and IP tunnel considerations draws heavily on
1725	   related discussions and documents from the Differentiated Services
1726	   Working Group.  We thank Tabassum Bint Haque from Dhaka, Bangladesh,
1727	   for feedback on IP tunnels.  We thank Derrell Piper and Tero Kivinen
1728	   for proposing modifications to RFC 2407 that improve the usability of
1729	   negotiating the ECN Tunnel SA attribute.

1731	15.  References

1733	   [AH] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402,
1734	   November 1998.

1736	   [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
1737	   Levels", BCP 14, RFC 2119, March 1997.

1739	   [ECN] "The ECN Web Page", URL "http://www.aciri.org/floyd/ecn.html".

1741	   [ESP] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload",
1742	   RFC 2406, November 1998.

1744	   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
1745	   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
1746	   N.4, August 1993, p.  397-413.

1748	   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
1749	   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.

1751	   [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
1752	   URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
1753	   ecn.

1755	   [FF99] Floyd, S., and Fall, K., "Promoting the Use of End-to-End Con-
1756	   gestion Control in the Internet", IEEE/ACM Transactions on Network-
1757	   ing, August 1999.

1759	   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
1760	   SIGCOMM '97, September 1997.

1762	   [GRE] S. Hanks, T. Li, D. Farinacci, and P. Traina, Generic Routing
1763	   Encapsulation (GRE), RFC 1701, October 1994.

1765	   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
1766	   ACM SIGCOMM '88, pp. 314-329.

1768	   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance Algo-
1769	   rithm", Message to end2end-interest mailing list, April 1990. URL
1770	   "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

1772	   [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
1773	   benefits for TCP", Master's thesis, UCLA, 1998, URL
1774	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

1776	   [L2TP] W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B.
1777	   Palter, Layer Two Tunneling Protocol "L2TP", RFC 2661, August 1999.

1779	   [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
1780	   Layered Multicast", SIGCOMM '96, August 1996, pp.  117-130.

1782	   [MPLS] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus,
1783	   Requirements for Traffic Engineering Over MPLS, RFC 2702, September
1784	   1999.

1786	   [PPTP] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W.
1787	   and G. Zorn, "Point-to-Point Tunneling Protocol (PPTP)", RFC 2637,
1788	   July 1999.

1790	   [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
1791	   1981.

1793	   [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
1794	   September 1981.

1796	   [RFC1141] Mallory, T. and A. Kullberg, "Incremental Updating of the
1797	   Internet Checksum", RFC 1141, January 1990.

1799	   [RFC1349] Almquist, P., "Type of Service in the Internet Protocol
1800	   Suite", RFC 1349, July 1992.

1802	   [RFC1455] Eastlake, D., "Physical Link Security Type of Service", RFC
1803	   1455, May 1993.

1805	   [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic
1806	   Routing Encapsulation (GRE), RFC 1701, October 1994.

1808	   [RFC1702] Hanks, S., Li, T., Farinacci, D., and P. Traina, Generic
1809	   Routing Encapsulation over IPv4 networks, RFC 1702, October 1994.

1811	   [RFC2003]  Perkins, C., IP Encapsulation within IP, RFC 2003, October
1812	   1996.

1814	   [RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Require-
1815	   ment Levels, RFC 2119, March 1997.

1817	   [RFC2309] Braden, B., et al., "Recommendations on Queue Management
1818	   and Congestion Avoidance in the Internet", RFC 2309, April 1998.

1820	   [RFC 2401] S. Kent and R. Atkinson, Security Architecture for the
1821	   Internet Protocol, RFC 2401, November 1998.

1823	   [RFC2407] D. Piper, The Internet IP Security Domain of Interpretation
1824	   for ISAKMP, RFC 2407, November 1998.

1826	   [RFC2408] D. Maughan, M. Schertler, M. Schneider, and J. Turner,
1827	   Internet Security Association and Key Management Protocol (ISAKMP),
1828	   RFC 2408, November 1998.

1830	   [RFC2409] D. Harkins and D. Carrel, The Internet Key Exchange (IKE),
1831	   RFC 2409, November 1998.

1833	   [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition
1834	   of the Differentiated Services Field (DS Field) in the IPv4 and IPv6
1835	   Headers", RFC 2474, December 1998.

1837	   [RFC2475] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W.
1838	   Weiss, An Architecture for Differentiated Services, RFC 2475, Decem-
1839	   ber 1998.

1841	   [RFC2481] K. Ramakrishnan and S. Floyd, A Proposal to add Explicit
1842	   Congestion Notification (ECN) to IP, RFC 2481, January 1999.

1844	   [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control",
1845	   RFC 2581, April 1999.

1847	   [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed, "Performance Evaluation
1848	   of Explicit Congestion Notification (ECN) in IP Networks", RFC 2884,
1849	   July 2000.

1851	   [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983,
1852	   October 2000.

1854	   [RFC2780] S. Bradner and V. Paxson, IANA Allocation Guidelines For
1855	   Values In the Internet Protocol and Related Headers, RFC 2780, March
1856	   2000.

1858	   [RFD99] Ramakrishnan, K., Floyd, S., and Davie, B., A Proposal to
1859	   Incorporate ECN in MPLS, work in progress, June 1999.  URL
1860	   "http://www.aciri.org/floyd/papers/draft-ietf-mpls-ecn-00.txt".

1862	   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
1863	   Congestion Avoidance in Computer Networks", ACM Transactions on Com-
1864	   puter Systems, Vol.8, No.2, pp.  158-181, May 1990.

1866	   [SCWA99] Stefan Savage, Neal Cardwell, David Wetherall, and Tom
1867	   Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM
1868	   Computer Communications Review, October 1999.

1870	16.  Security Considerations

1872	   Security considerations have been discussed in Sections 7 and 8.

1874	17.  IPv4 Header Checksum Recalculation

1876	   IPv4 header checksum recalculation is an issue with some high-end
1877	   router architectures using an output-buffered switch, since most if
1878	   not all of the header manipulation is performed on the input side of
1879	   the switch, while the ECN decision would need to be made local to the
1880	   output buffer. This is not an issue for IPv6, since there is no IPv6
1881	   header checksum. The IPv4 TOS octet is the last byte of a 16-bit
1882	   half-word.

1884	   RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
1885	   checksum after the TTL field is decremented.  The incremental updat-
1886	   ing of the IPv4 checksum after the CE bit was set would work as fol-
1887	   lows: Let HC be the original header checksum, and let HC' be the new
1888	   header checksum after the CE bit has been set.  Then for header
1889	   checksums calculated with one's complement subtraction, HC' would be
1890	   recalculated as follows:

1892	        HC' = { HC - 1     HC > 1
1893	              { 0x0000     HC = 1

1895	   For header checksums calculated on two's complement machines, HC'
1896	   would be recalculated as follows after the CE bit was set:

1898	        HC' = { HC - 1     HC > 0
1899	              { 0xFFFE     HC = 0
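
   The two's complement rule can be checked against a full
   recalculation with a short Python sketch.  The function names and
   the sample header words are hypothetical.  Consistent with the
   HC - 1 rule above, the sketch assumes that the CE bit is the least
   significant bit of the TOS octet, so that setting it adds one to
   the first 16-bit word of the header, and that the CE bit was not
   already set.

```python
def ones_complement_sum(words):
    """Sum 16-bit words with end-around carry."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_checksum(words):
    """Full checksum over header words whose checksum field is zero."""
    return ~ones_complement_sum(words) & 0xFFFF


def checksum_after_ce_set(hc):
    """Incremental update on a two's complement machine."""
    return hc - 1 if hc > 0 else 0xFFFE


def set_ce(words):
    """Return a copy of the header words with the CE bit set.

    Assumption: CE is the low-order bit of the first 16-bit word.
    """
    assert words[0] & 0x0001 == 0, "sketch assumes CE was not set"
    return [words[0] | 0x0001] + list(words[1:])


# Sample 20-byte header as ten 16-bit words, checksum field zeroed.
header = [0x4502, 0x0054, 0x1C46, 0x4000, 0x4006,
          0x0000, 0xAC10, 0x0A63, 0xAC10, 0x0A0C]
old_hc = header_checksum(header)
new_hc = header_checksum(set_ce(header))
assert new_hc == checksum_after_ce_set(old_hc)
```

   The special case HC = 0 arises when the one's complement sum of the
   header words is 0xFFFF; adding one then wraps the sum around to
   0x0001, whose complement is 0xFFFE.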

1901	18.  Possible Changes to the ECN Field in the Network

1903	   This section discusses in detail possible changes to the ECN field in
1904	   the network, such as falsely reporting congestion, disabling ECN-
1905	   Capability for an individual packet, erasing the ECN congestion indi-
1906	   cation, or falsely indicating ECN-Capability.  We represent the ECN
1907	   bits in the IP header by the tuple (ECT bit, CE bit).

1909	18.1.  Possible Changes to the IP Header

1911	18.1.1.  Erasing the Congestion Indication

1913	   First, we consider the changes that a router could make that would
1914	   result in effectively erasing the congestion indication after it had
1915	   been set by a router upstream.  The convention followed is:
1916	   (ECT, CE) of received packet -> (ECT, CE) of packet transmitted.

1918	   (1, 1) -> (1, 0): erase only the CE bit that was set.
1919	   (1, 1) -> (0, 0): erase both the ECT bit and the CE bit.
1920	   (1, 1) -> (0, 1): erase the ECT bit.

1922	   The first change turns off the CE bit after it has been set by some
1923	   upstream router along the path.  The consequence for the upstream
1924	   router is that there is a potential for congestion to build for a
1925	   time, because the congestion indication does not reach the source.
1926	   However, the packet would be received and acknowledged.

1928	   The potential effect of erasing the congestion indication is complex,
1929	   and is discussed in depth in Section 19 below.  Note that the effect
1930	   of erasing the congestion indication is different from dropping a
1931	   packet in the network.  When a data packet is dropped, the drop is
1932	   detected by the TCP sender, and interpreted as an indication of con-
1933	   gestion.  Similarly, if a sufficient number of consecutive acknowl-
1934	   edgement packets are dropped, causing the cumulative acknowledgement
1935	   field not to be advanced at the sender, the sender is limited by the
1936	   congestion window from sending additional packets, and ultimately the
1937	   retransmit timer expires.

1939	   In contrast, a systematic erasure of the CE bit by a downstream
1940	   router can have the effect of causing a queue buildup at an upstream
1941	   router, including the possible loss of packets due to buffer over-
1942	   flow.  There is a potential of unfairness in that another flow that
1943	   goes through the congested router could react to the CE bit set while
1944	   the flow that has the CE bit erased could see better performance.
1945	   The limitations on this potential unfairness are discussed in more
1946	   detail in Section 19 below.

1948	   The second change is to turn off both the ECT and the CE bits, thus
1949	   erasing the congestion indication and disabling ECN-Capability at the
1950	   same time.  The third change turns off only the ECT bit, disabling
1951	   ECN-Capability.

1953	   Within an IP tunnel using the full-functionality option, the third
1954	   change would not erase the congestion indication, but would only dis-
1955	   able ECN-Capability for that packet within the rest of the tunnel.
1956	   However, when performed outside of an IP tunnel, the third change
1957	   would also effectively erase the congestion indication, because an
1958	   ECN field of (0, 1) is undefined.

1960	   The `erasure' of the congestion indication is only effective if the
1961	   packet does not end up being marked or dropped again by a downstream
1962	   router.  With the first change, the packet remains ECN-Capable, and
1963	   could be either marked or dropped by a downstream router as an indi-
1964	   cation of congestion.  With the second and third changes, the packet
1965	   is no longer ECN-capable, and can therefore be dropped but not marked
1966	   by a downstream router as an indication of congestion.

1968	18.1.2.  Falsely Reporting Congestion

1970	   (1, 0) -> (1, 1)

1972	   This change is to set the CE bit when the ECT bit was already set,
1973	   even though there was no congestion.  This change does not affect the
1974	   treatment of that packet along the rest of the path.  In particular,
1975	   a router does not examine the CE bit in deciding whether to drop or
1976	   mark an arriving packet.

1978	   However, this could result in the application unnecessarily invoking
1979	   end-to-end congestion control, and reducing its sending rate.  By
1980	   itself, this is no worse (for the application or for the network)
1981	   than if the tampering router had actually dropped the packet.

1983	18.1.3.  Disabling ECN-Capability

1985	   (1, 0) -> (0, *)

1987	   This change is to turn off the ECT bit of a packet that does not have
1988	   the CE bit set.  (Section 18.1.1 discussed the case of turning off
1989	   the ECT bit of a packet that does have the CE bit set.)  This means
1990	   that if the packet later encounters congestion (e.g., by arriving at
1991	   a RED queue with a moderate average queue size), it will be dropped
1992	   instead of being marked.  By itself, this is no worse (for the appli-
1993	   cation) than if the tampering router had actually dropped the packet.
1994	   The saving grace in this particular case is that there is no con-
1995	   gested router upstream expecting a reaction from setting the CE bit.

1997	18.1.4.  Falsely Indicating ECN-Capability

1998	   This change would incorrectly label a packet as ECN-Capable. The
1999	   packet may have been sent either by an ECN-Capable transport or a
2000	   transport that is not ECN-Capable.

2002	   (0, *) -> (1, 0);
2003	   (0, *) -> (1, 1);

2005	   If the packet later encounters moderate congestion at an ECN-Capable
2006	   router, the router could set the CE bit instead of dropping the
2007	   packet.  If the transport protocol in fact is not ECN-Capable, then
2008	   the transport will never receive this indication of congestion, and
2009	   will not reduce its sending rate in response.  The potential conse-
2010	   quences of falsely indicating ECN-capability are discussed further in
2011	   Section 19 below.

2013	   If the packet never later encounters congestion at an ECN-Capable
2014	   router, then the first of these two changes would have no effect.
2015	   The second change, however, would have the effect of giving false
2016	   reports of congestion to a monitoring device along the path.  If the
2017	   transport protocol is ECN-Capable, then the second of these two
2018	   changes (when, for example, (0,0) was changed to (1,1)) could also
2019	   have an effect at the transport level, by combining falsely indicat-
2020	   ing ECN-Capability with falsely reporting congestion.  For an ECN-
2021	   capable transport, this would cause the transport to unnecessarily
2022	   react to congestion.  In this particular case, the router that is
2023	   incorrectly changing the ECN field could have dropped the packet.

2025	   Thus for this case of an ECN-capable transport, the consequence of
2026	   this change to the ECN field is no worse than dropping the packet.

2028	18.1.5.  Changes with No Functional Effect

2030	   (0, *) -> (0, *)

2032	   The CE bit is ignored in a packet that does not have the ECT bit set.
2033	   Thus, this change would have no effect, in terms of ECN.
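
   The cases of Sections 18.1.1 through 18.1.5 can be summarized by a
   small Python sketch that maps a received and a transmitted
   (ECT, CE) tuple to the corresponding category.  The function name
   classify_change is hypothetical, and the separate "unchanged"
   result for identical tuples is an addition of this sketch.

```python
def classify_change(received, transmitted):
    """Classify a change to the (ECT, CE) tuple made within the network."""
    (ect_in, ce_in), (ect_out, _ce_out) = received, transmitted
    if received == transmitted:
        return "unchanged"
    if ect_in == 0 and ect_out == 0:
        # (0, *) -> (0, *): CE is ignored when ECT is not set.
        return "no functional effect"
    if ect_in == 0:
        # (0, *) -> (1, 0) or (1, 1)
        return "falsely indicating ECN-Capability"
    if ce_in == 1:
        # (1, 1) -> (1, 0), (0, 0), or (0, 1)
        return "erasing the congestion indication"
    if ect_out == 1:
        # (1, 0) -> (1, 1)
        return "falsely reporting congestion"
    # (1, 0) -> (0, *)
    return "disabling ECN-Capability"
```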

2035	18.2.  Information carried in the Transport Header

2037	   For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
2038	   ECN-capable at the TCP level, using information in the TCP header at
2039	   the time the connection is set up.  This document does not consider
2040	   potential dangers introduced by changes in the transport header
2041	   within the network.  In the case of IPsec tunnels, the IPsec tunnel
2042	   protects the transport header.

2044	18.3.  Split Paths

2046	   In some cases, a malicious or broken router might have access to only
2047	   a subset of the packets from a flow.  The question is as follows:
2048	   can this router, by altering the ECN field in this subset of the
2049	   packets, do more damage to that flow than if it had simply dropped
2050	   that set of packets?

2052	   We will classify the packets in the flow as A packets and B packets,
2053	   and assume that the adversary only has access to A packets.  Assume
2054	   that the adversary is subverting end-to-end congestion control along
2055	   the path traveled by A packets only, by either falsely indicating
2056	   ECN-Capability upstream of the point where congestion occurs, or
2057	   erasing the congestion indication downstream.  Consider also that
2058	   there exists a monitoring device that sees both the A and B packets,
2059	   and will "punish" both the A and B packets if the total flow is
2060	   determined not to be properly responding to indications of conges-
2061	   tion.  Another key characteristic that we believe is likely to be
2062	   true is that the monitoring device, before `punishing' the A&B flow,
2063	   will first drop packets instead of setting the CE bit, and will drop
2064	   arriving packets of that flow that already have the ECT and CE bits
2065	   set.  If the end nodes are in fact using end-to-end congestion con-
2066	   trol, they will see all of the indications of congestion seen by the
2067	   monitoring device, and will begin to respond to these indications of
2068	   congestion. Thus, the monitoring device is successful in providing
2069	   the indications to the flow at an early stage.

2071	   It is true that the adversary that has access only to the A packets
2072	   might, by subverting ECN-based congestion control, be able to deny
2073	   the benefits of ECN to the other packets in the A&B aggregate.  While
2074	   this is unfortunate, this is not a reason to disable ECN within an
2075	   IPsec tunnel.

2077	   A variant of falsely reporting congestion occurs when there are two
2078	   adversaries along a path, where the first adversary falsely reports
2079	   congestion, and the second adversary `erases' those reports. (Unlike
2080	   packet drops, ECN congestion reports can be `reversed' later in the
2081	   network by a malicious or broken router.)  While this would be trans-
2082	   parent to the end node, it is possible that a monitoring device
2083	   between the first and second adversaries would see the false indica-
2084	   tions of congestion.  Keep in mind our recommendation in this docu-
2085	   ment, that before `punishing' a flow for not responding appropriately
2086	   to congestion, the router will first switch to dropping rather than
2087	   marking as an indication of congestion, for that flow.  When this
2088	   includes dropping arriving packets from that flow that have the CE
2089	   bit set, this ensures that these indications of congestion are being
2090	   seen by the end nodes.  Thus, there is no additional harm that we are
2091	   able to postulate as a result of multiple conflicting adversaries.

2093	19.  Implications of Subverting End-to-End Congestion Control

2095	   This section focuses on the potential repercussions of subverting
2096	   end-to-end congestion control by either falsely indicating ECN-Capa-
2097	   bility, or by erasing the congestion indication in ECN (the CE-bit).
2098	   Subverting end-to-end congestion control by either of these two meth-
2099	   ods can have consequences both for the application and for the net-
2100	   work.  We discuss these separately below.

2102	   The first method to subvert end-to-end congestion control, that of
2103	   falsely indicating ECN-Capability, effectively subverts end-to-end
2104	   congestion control only if the packet later encounters congestion
2105	   that results in the setting of the CE bit.  In this case, the trans-
2106	   port protocol (which may not be ECN-capable) does not receive the
2107	   indication of congestion from these downstream congested routers.

2109	   The second method to subvert end-to-end congestion control, `erasing'
2110	   the (set) CE bit in a packet, effectively subverts end-to-end conges-
2111	   tion control only when the CE bit in the packet was set earlier by a
2112	   congested router.  In this case, the transport protocol does not
2113	   receive the indication of congestion from the upstream congested
2114	   routers.

2116	   Either of these two methods of subverting end-to-end congestion con-
2117	   trol can potentially introduce more damage to the network (and possi-
2118	   bly to the flow itself) than if the adversary had simply dropped
2119	   packets from that flow.  However, as we discuss later in this section
2120	   and in Section 7, this potential damage is limited.

2122	19.1.  Implications for the Network and for Competing Flows

2124	   The CE bit of the ECN field is only used by routers as an indication
2125	   of congestion during periods of *moderate* congestion.  ECN-capable
2126	   routers should drop rather than mark packets during heavy congestion
2127	   even if the router's queue is not yet full.  For example, for routers
2128	   using active queue management based on RED, the router should drop
2129	   rather than mark packets that arrive while the average queue sizes
2130	   exceed the RED queue's maximum threshold.
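
The drop-versus-mark rule described above can be illustrated with a
minimal sketch.  It is a simplified RED-style decision, not a complete
RED implementation: the function name, the parameter names (min_th,
max_th, max_p), and the linear marking probability are assumptions
made for illustration, and the packet-count adjustment used by RED is
omitted.

```python
import random


def red_action(avg_queue, min_th, max_th, max_p, ect, rng=random.random):
    """Return 'forward', 'mark', or 'drop' for one arriving packet.

    avg_queue      -- average queue size seen by the router
    min_th, max_th -- hypothetical names for the two RED thresholds
    max_p          -- hypothetical maximum marking probability
    ect            -- True if the packet has the ECT bit set
    rng            -- source of uniform random numbers in [0, 1)
    """
    if avg_queue < min_th:
        return "forward"  # no congestion is indicated
    if avg_queue > max_th:
        # Heavy congestion: drop rather than mark, even for ECT packets.
        return "drop"
    # Moderate congestion: indicate congestion with a probability that
    # grows linearly between the two thresholds (a simplification).
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng() >= p:
        return "forward"
    # The indication is a CE mark for ECN-capable packets, a drop otherwise.
    return "mark" if ect else "drop"
```

In this sketch an ECT packet is only ever marked while the average
queue lies between the two thresholds; above the maximum threshold it
is dropped like any other packet.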

2132	   One consequence for the network of subverting end-to-end congestion
2133	   control is that flows that do not receive the congestion indications
2134	   from the network might increase their sending rate until they drive
2135	   the network into heavier congestion.  Then, the congested router
2136	   could begin to drop rather than mark arriving packets.  For flows
2137	   that are not isolated by some form of per-flow scheduling or other
2138	   per-flow mechanisms, but are instead aggregated with other flows in a
2139	   single queue in an undifferentiated fashion, this packet-dropping at
2140	   the congested router would apply to all flows that share that queue.
2141	   Thus, the consequences would be to increase the level of congestion
2142	   in the network.

2144	   In some cases, the increase in the level of congestion will lead to a
2145	   substantial buffer buildup at the congested queue that will be suffi-
2146	   cient to drive the congested queue from the packet-marking to the
2147	   packet-dropping regime.  This transition could occur either because
2148	   of buffer overflow, or because of the active queue management policy
2149	   described above that drops packets when the average queue is above
2150	   RED's maximum threshold.  At this point, all flows, including the
2151	   subverted flow, will begin to see packet drops instead of packet
2152	   marks, and a malicious or broken router will no longer be able to
2153	   `erase' these indications of congestion in the network.  If the end
2154	   nodes are deploying appropriate end-to-end congestion control, then
2155	   the subverted flow will reduce its arrival rate in response to con-
2156	   gestion.  When the level of congestion is sufficiently reduced, the
2157	   congested queue can return from the packet-dropping regime to the
2158	   packet-marking regime.  The steady-state pattern could be one of the
2159	   congested queue oscillating between these two regimes.

2161	   In other cases, the consequences of subverting end-to-end congestion
2162	   control will not be severe enough to drive the congested link into
2163	   sufficiently-heavy congestion that packets are dropped instead of
2164	   being marked.  In this case, the implications for competing flows in
2165	   the network will be a slightly-increased rate of packet marking or
2166	   dropping, and a corresponding decrease in the bandwidth available to
2167	   those flows.  This can be a stable state if the arrival rate of the
2168	   subverted flow is sufficiently small, relative to the link bandwidth,
2169	   that the average queue size at the congested router remains under
2170	   control.  In particular, the subverted flow could have a limited
2171	   bandwidth demand on the link at this router, while still getting more
2172	   than its "fair" share of the link.  This limited demand could be due
2173	   to a limited demand from the data source; a limitation from the TCP
2174	   advertised window; a lower-bandwidth access pipe; or other factors.
2175	   Thus the subversion of ECN-based congestion control can still lead to
2176	   unfairness, which we believe is appropriate to note here.

2178	   The threat to the network posed by the subversion of ECN-based con-
2179	   gestion control in the network is essentially the same as the threat
2180	   posed by an end-system that intentionally fails to cooperate with
2181	   end-to-end congestion control.  The deployment of mechanisms in
2182	   routers to address this threat is an open research question, and is
2183	   discussed further in Section 10.

2185	   Let us take the example described in Section 18.1.1, where the CE bit
2186	   that was set in a packet is erased: {(1, 1) -> (1, 0)}.  The conse-
2187	   quence for the congested upstream router that set the CE bit is that
2188	   this congestion indication does not reach the end nodes for that
2189	   flow. The source (even one which is completely cooperative and not
2190	   malicious) is thus allowed to continue to increase its sending rate
2191	   (if it is a TCP flow, by increasing its congestion window).  The flow
2192	   potentially achieves better throughput than the other flows that also
2193	   share the congested router, especially if there are no policing mech-
2194	   anisms or per-flow queueing mechanisms at that router.  Consider the
2195	   behavior of the other flows, especially if they are cooperative: that
2196	   is, the flows that do not experience subverted end-to-end congestion
2197	   control.  They are likely to reduce their load (e.g., by reducing
2198	   their window size) on the congested router, thus benefiting our sub-
2199	   verted flow. This results in unfairness.  As we discussed above, this
2200	   unfairness could be transient (because the congested queue is
2201	   driven into the packet-dropping regime), oscillatory (because the
2202	   congested queue oscillates between the packet-marking and the
2203	   packet-dropping regimes), or a more moderate but persistent stable
2204	   state (because the congested queue is never driven to the
2205	   packet-dropping regime).

2207	   The results would be similar if the subverted flow was intentionally
2208	   avoiding end-to-end congestion control.  One difference is that a
2209	   flow that is intentionally avoiding end-to-end congestion control at
2210	   the end nodes can avoid end-to-end congestion control even when the
2211	   congested queue is in packet-dropping mode, by refusing to reduce its
2212	   sending rate in response to packet drops in the network.  Thus the
2213	   problems for the network from the subversion of ECN-based congestion
2214	   control are less severe than the problems caused by the intentional
2215	   avoidance of end-to-end congestion control in the end nodes.  It is
2216	   also the case that it is considerably more difficult to control the
2217	   behavior of the end nodes than it is to control the behavior of the
2218	   infrastructure itself.  This is not to say that the problems for the
2219	   network posed by the network's subversion of ECN-based congestion
2220	   control are small; just that they are dwarfed by the problems for the
2221	   network posed by the subversion of either ECN-based or other cur-
2222	   rently known packet-based congestion control mechanisms by the end
2223	   nodes.

2225	19.2.  Implications for the Subverted Flow

2227	   When a source indicates that it is ECN-capable, there is an expecta-
2228	   tion that the routers in the network that are capable of participat-
2229	   ing in ECN will use the CE bit for indication of congestion. There is
2230	   the potential benefit of using ECN in reducing the amount of packet
2231	   loss (in addition to the reduced queueing delays because of active
2232	   queue management policies).  When the packet flows through a tunnel
2233	   where the nodes that the tunneled packets traverse are untrusted in
2234	   some way, the expectation is that IPsec will protect the flow from
2235	   subversion that results in undesirable consequences.

2237	   In many cases, a subverted flow will benefit from the subversion of
2238	   end-to-end congestion control for that flow in the network, by
2239	   receiving more bandwidth than it would have otherwise, relative to
2240	   competing non-subverted flows.  If the congested queue reaches the
2241	   packet-dropping stage, then the subversion of end-to-end congestion
2242	   control might or might not be of overall benefit to the subverted
2243	   flow, depending on that flow's relative tradeoffs between throughput,
2244	   loss, and delay.

2246	   One form of subverting end-to-end congestion control is to falsely
2247	   indicate ECN-capability by setting the ECT bit.  This has the conse-
2248	   quence of downstream congested routers setting the CE bit in vain.
2249	   However, as we describe in the section below, if the ECT bit is
2250	   changed in the IPsec tunnel, this can be detected at the egress point
2251	   of the tunnel.

2253	   The second form of subverting end-to-end congestion control is to
2254	   erase the congestion indication, either by erasing the CE bit
2255	   directly, or by erasing the ECT bit when the CE bit is already set.
2256	   In this case, it is the upstream congested routers that set the CE
2257	   bit in vain.

2259	   If the ECT bit is erased within an IP tunnel, then this can be
2260	   detected at the egress point of the tunnel.  If the CE bit is set
2261	   upstream of the IP tunnel, then any erasure of the outer header's CE
2262	   bit within the tunnel will have no effect because the inner header
2263	   preserves the set value of the CE bit.  However, if the CE bit is set
2264	   within the tunnel, and erased either within or downstream of the tun-
2265	   nel, this is not necessarily detected at the egress point of the
2266	   tunnel.
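
The three tunnel cases above can be traced with a small sketch.  It
assumes, purely for illustration, that the tunnel ingress copies the
inner ECN field to the outer header and that the egress forwards CE if
either header carries it.  The names Ecn and decapsulate are
hypothetical, and the rule is a simplification rather than the
normative egress behavior.

```python
from typing import NamedTuple, Tuple


class Ecn(NamedTuple):
    """The two ECN bits of one IP header, written (ECT, CE) as in the text."""
    ect: int
    ce: int


def decapsulate(inner: Ecn, outer: Ecn) -> Tuple[Ecn, bool]:
    """Return (forwarded ECN field, ect_changed) at the tunnel egress.

    Assumption: the ingress copied the inner ECT bit to the outer header,
    so any difference between the two ECT bits was introduced in the tunnel.
    """
    ect_changed = inner.ect != outer.ect
    # CE set in either header is forwarded, but only for packets whose
    # inner header says the transport is ECN-capable.  A real egress might
    # drop a packet with mismatched ECT bits; this sketch only flags it.
    ce = inner.ce | (outer.ce & inner.ect)
    return Ecn(inner.ect, ce), ect_changed


# CE set upstream of the tunnel, outer CE erased inside it: no effect.
print(decapsulate(Ecn(1, 1), Ecn(1, 0)))
# ECT erased inside the tunnel: detected at the egress.
print(decapsulate(Ecn(1, 0), Ecn(0, 0)))
# CE set and then erased inside the tunnel: indistinguishable from an
# uncongested path, so nothing is detected.
print(decapsulate(Ecn(1, 0), Ecn(1, 0)))
```

The last call shows why erasure of a CE bit that was set within the
tunnel is not necessarily detected: the egress sees two headers that
agree with each other.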

2268	   With this subversion of end-to-end congestion control, an end-system
2269	   transport does not respond to the congestion indication.  Along with
2270	   the increased unfairness for the non-subverted flows described in the
2271	   previous section, the congested router's queue could continue to
2272	   build, resulting in packet loss at the congested router, which is a
2273	   means for indicating congestion to the transport in any case.  In the
2274	   interim, the flow might experience higher queueing delays, possibly
2275	   along with an increased bandwidth relative to other non-subverted
2276	   flows.  But transports do not inherently make assumptions of consis-
2277	   tently experiencing carefully managed queueing in the path.  We
2278	   believe that these forms of subverting end-to-end congestion control
2279	   are no worse for the subverted flow than if the adversary had simply
2280	   dropped the packets of that flow itself.

2282	19.3.  Non-ECN-Based Methods of Subverting End-to-End Congestion Control

2284	   We have shown that, in many cases, a malicious or broken router that
2285	   is able to change the bits in the ECN field can do no more damage
2286	   than if it had simply dropped the packet in question.  However, this
2287	   is not true in all cases, in particular in the cases where the broken
2288	   router subverted end-to-end congestion control by either falsely
2289	   indicating ECN-Capability or by erasing the ECN congestion indication
2290	   (in the CE-bit).  While there are many ways that a router can harm a
2291	   flow by dropping packets, a router cannot subvert end-to-end conges-
2292	   tion control by dropping packets.  As an example, a router cannot
2293	   subvert TCP congestion control by dropping data packets, acknowledge-
2294	   ment packets, or control packets.

2296	   Even though packet-dropping cannot be used to subvert end-to-end con-
2297	   gestion control, there *are* non-ECN-based methods for subverting
2298	   end-to-end congestion control that a broken or malicious router could
2299	   use.  For example, a broken router could duplicate data packets, thus
2300	   effectively negating the effects of end-to-end congestion control
2301	   along some portion of the path.  (For a router that duplicated pack-
2302	   ets within an IPsec tunnel, the security administrator can cause the
2303	   duplicate packets to be discarded by configuring anti-replay protec-
2304	   tion for the tunnel.)  This duplication of packets within the network
2305	   would have similar implications for the network and for the subverted
2306	   flow as those described in Sections 18.1.1 and 18.1.4 above.

2308	20.  The motivation for the ECT bit

2310	   The need for the ECT bit is motivated by the fact that ECN will be
2311	   deployed incrementally in an Internet where some transport protocols
2312	   and routers understand ECN and some do not. With the ECT bit, the
2313	   router can drop packets from flows that are not ECN-capable, but can
2314	   *instead* set the CE bit in packets that *are* ECN-capable. Because
2315	   the ECT bit allows an end node to have the CE bit set in a packet
2316	   *instead* of having the packet dropped, an end node might have some
2317	   incentive to deploy ECN.

2319	   If there was no ECT indication, then the router would have to set the
2320	   CE bit for packets from both ECN-capable and non-ECN-capable flows.
2321	   In this case, there would be no incentive for end-nodes to deploy
2322	   ECN, and no viable path of incremental deployment from a non-ECN
2323	   world to an ECN-capable world.  Consider the first stages of such an
2324	   incremental deployment, where a subset of the flows are ECN-capable.
2325	   At the onset of congestion, when the packet dropping/marking rate
2326	   would be low, routers would only set CE bits, rather than dropping
2327	   packets.  However, only those flows that are ECN-capable would
2328	   understand and respond to CE packets.  The result is that the
2329	   ECN-capable flows would back off, and the non-ECN-capable flows
2330	   would be unaware of the ECN signals and would continue to open their
2331	   congestion windows.

2333	   In this case, there are two possible outcomes: (1) the ECN-capable
2334	   flows back off, the non-ECN-capable flows get all of the bandwidth,
2335	   and congestion remains mild, or (2) the ECN-capable flows back off,
2336	   the non-ECN-capable flows don't, and congestion increases until the
2337	   router transitions from setting the CE bit to dropping packets.
2338	   While this second outcome evens out the fairness, the ECN-capable
2339	   flows would still receive little benefit from being ECN-capable,
2340	   because the increased congestion would drive the router to packet-
2341	   dropping behavior.

2343	   A flow that advertises itself as ECN-Capable but does not respond to
2344	   CE bits is functionally equivalent to a flow that turns off conges-
2345	   tion control, as discussed earlier in this document.

2347	   Thus, in a world where a subset of the flows are ECN-capable, but
2348	   where ECN-capable flows have no mechanism for indicating that fact to
2349	   the routers, there would be less effective and less fair congestion
2350	   control in the Internet, resulting in a strong incentive for end
2351	   nodes not to deploy ECN.

2353	21.  Why use two bits in the IP header?

2355	   Given the need for an ECT indication in the IP header, there still
2356	   remains the question of whether the ECT (ECN-Capable Transport) and
2357	   CE (Congestion Experienced) indications should have been overloaded
2358	   on a single bit.  This overloaded-one-bit alternative, explored in
2359	   [Floyd94], would have involved a single bit with two values.  One
2360	   value, "ECT and not CE", would represent an ECN-Capable Transport,
2361	   and the other value, "CE or not ECT", would represent either
2362	   Congestion Experienced or a non-ECN-Capable transport.

2364	   One difference between the one-bit and two-bit implementations con-
2365	   cerns packets that traverse multiple congested routers.  Consider a
2366	   CE packet that arrives at a second congested router, and is selected
2367	   by the active queue management at that router for either marking or
2368	   dropping.  In the one-bit implementation, the second congested router
2369	   has no choice but to drop the CE packet, because it cannot distin-
2370	   guish between a CE packet and a non-ECT packet.  In the two-bit
2371	   implementation, the second congested router has the choice of either
2372	   dropping the CE packet, or of leaving it alone with the CE bit set.
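
The difference at a second congested router can be made concrete with
a short sketch.  The function names and the returned strings are
hypothetical; the sketch only covers a packet that the router's active
queue management has already selected for marking or dropping.

```python
def second_router_two_bit(ect: bool, ce: bool, prefer_drop: bool = False) -> str:
    """Two-bit case: ECT and CE are separate bits in the IP header."""
    if not ect:
        return "drop"  # not ECN-capable: dropping is the only indication
    if ce:
        # Already marked upstream: the router may drop the packet or
        # leave it alone with the CE bit set.
        return "drop" if prefer_drop else "leave CE set"
    return "set CE"


def second_router_one_bit(ect_and_not_ce: bool) -> str:
    """One-bit case: the single bit is 'ECT and not CE' or 'CE or not ECT'."""
    if ect_and_not_ce:
        return "set CE"  # flip the bit to 'CE or not ECT'
    # A CE packet and a non-ECT packet look identical here, so the
    # router has no choice but to drop.
    return "drop"


print(second_router_two_bit(ect=True, ce=True))     # leave CE set
print(second_router_one_bit(ect_and_not_ce=False))  # drop
```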

2374	   Another difference between the one-bit and two-bit implementations
2375	   comes from the fact that with the one-bit implementation, receivers
2376	   in a single flow cannot distinguish between CE and non-ECT packets.
2377	   Thus, in the one-bit implementation an ECN-capable data sender would
2378	   have to unambiguously indicate to the receiver or receivers whether
2379	   each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
2380	   possibility would be for the sender to indicate in the transport
2381	   header whether the packet was sent as ECN-Capable.  A second
2382	   possibility, which would involve a functional limitation for the
2383	   one-bit implementation, would be for the sender to unambiguously
2384	   indicate that it was going to send *all* of its packets as
2385	   ECN-Capable or as non-ECN-Capable.  For a multicast transport
2386	   protocol, this unambiguous indication would have to be apparent to
2387	   receivers joining an on-going multicast session.

2389	   Another consideration, recommended earlier in this
2390	   document, is that transports (particularly TCP) should not mark pure
2391	   ACK packets or retransmitted packets as being ECN-Capable.  A pure
2392	   ACK packet from a non-ECN-capable transport could be dropped, without
2393	   necessarily having an impact on the transport from a congestion con-
2394	   trol perspective (because subsequent ACKs are cumulative).  An ECN-
2395	   capable transport reacting to the CE bit set in a pure ACK packet by
2396	   reducing the window would be at a disadvantage in comparison to a
2397	   non-ECN-capable transport. For this reason (and for reasons described
2398	   earlier in relation to retransmitted packets), it is desirable to
2399	   have the ECN-Capable bit indication on a per-packet basis.

2401	   Another advantage of the two-bit approach is that it is somewhat more
2402	   robust.  The most critical issue, discussed in Section 8, is that the
2403	   default indication should be that of a non-ECN-Capable transport.  In
2404	   a two-bit implementation, this requirement for the default value sim-
2405	   ply means that the ECT bit should be `OFF' by default.  In the one-
2406	   bit implementation, this means that the single overloaded bit should
2407	   by default be in the "CE or not ECT" position.  This is less clear
2408	   and straightforward, and possibly more open to incorrect implementa-
2409	   tions either in the end nodes or in the routers.

2411	   In summary, while a one-bit implementation is possible, it has the
2412	   following significant limitations relative
2413	   to the two-bit implementation.  First, the one-bit implementation has
2414	   more limited functionality for the treatment of CE packets at a sec-
2415	   ond congested router.  Second, the one-bit implementation requires
2416	   either that extra information be carried in the transport header of
2417	   packets from ECN-Capable flows (to convey the functionality of the
2418	   second bit elsewhere, namely in the transport header), or that
2419	   senders in ECN-Capable flows accept the limitation that receivers
2420	   must be able to determine a priori which packets are ECN-Capable and
2421	   which are not ECN-Capable. Third, the one-bit implementation is pos-
2422	   sibly more open to errors from faulty implementations that choose the
2423	   wrong default value for the ECN bit.  We believe that the use of the
2424	   extra bit in the IP header for the ECT-bit is extremely valuable to
2425	   overcome these limitations.

2427	22.  Historical definitions for the IPv4 TOS octet

2429	   RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
2430	   header.  In RFC 791, bits 6 and 7 of the ToS octet are listed as
2431	   "Reserved for Future Use", and are shown set to zero.  The first two
2432	   fields of the ToS octet were defined as the Precedence and Type of
2433	   Service (TOS) fields.

2435	            0     1     2     3     4     5     6     7
2436	         +-----+-----+-----+-----+-----+-----+-----+-----+
2437	         |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
2438	         +-----+-----+-----+-----+-----+-----+-----+-----+

2440	   RFC 1122 included bits 6 and 7 in the TOS field, though it did not
2441	   discuss any specific use for those two bits:

2443	            0     1     2     3     4     5     6     7
2444	         +-----+-----+-----+-----+-----+-----+-----+-----+
2445	         |   PRECEDENCE    |       TOS                   |  RFC 1122
2446	         +-----+-----+-----+-----+-----+-----+-----+-----+

2448	   The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

2450	            0     1     2     3     4     5     6     7
2451	         +-----+-----+-----+-----+-----+-----+-----+-----+
2452	         |   PRECEDENCE    |       TOS             | MBZ |  RFC 1349
2453	         +-----+-----+-----+-----+-----+-----+-----+-----+

2455	   Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
2456	   Cost".  In addition to the Precedence and Type of Service (TOS)
2457	   fields, the last field, MBZ (for "must be zero") was defined as
2458	   currently unused.  RFC 1349 stated that "The originator of a datagram
2459	   sets [the MBZ] field to zero (unless participating in an Internet
2460	   protocol experiment which makes use of that bit)."

2462	   RFC 1455 [RFC1455] defined an experimental standard that used all
2463	   four bits in the TOS field to request a guaranteed level of link
2464	   security.

2466	   RFC 1349 is obsoleted by "Definition of the Differentiated Services
2467	   Field (DS Field) in the IPv4 and IPv6 Headers" [RFC2474], in which
2468	   bits 6 and 7 of the DS field are listed as Currently Unused (CU).
2469	   The first six bits of the DS field are defined as the Differentiated
2470	   Services CodePoint (DSCP):

2472	            0     1     2     3     4     5     6     7
2473	         +-----+-----+-----+-----+-----+-----+-----+-----+
2474	         |               DSCP                |    CU     |  RFC 2474
2475	         +-----+-----+-----+-----+-----+-----+-----+-----+

2477	   Because of this unstable history, the definition of the ECN field in
2478	   this document cannot be guaranteed to be backwards compatible with
2479	   all past uses of these two bits.  The damage that could be done by a
2480	   non-ECN-capable router would be to "erase" the CE bit for an ECN-
2481	   capable packet that arrived at the router with the CE bit set, or set
2482	   the CE bit even in the absence of congestion.  This has been dis-
2483	   cussed in the section on "Non-compliance in the Network".

2485	   The damage that could be done in an ECN-capable environment by a non-
2486	   ECN-capable end-node transmitting packets with the ECT bit set has
2487	   been discussed in the section on "Non-compliance by the End Nodes".
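
The layout in the diagrams above can be expressed as bit masks.  The
diagrams number bits from 0 (most significant) to 7 (least
significant), so bits 6 and 7 are the two low-order bits of the octet.
The sketch below assumes that bit 6 carries ECT and bit 7 carries CE;
that assignment is not stated in this section, and the constant and
function names are hypothetical.

```python
ECT_MASK = 0x02  # bit 6 of the octet (assumed to be the ECT bit)
CE_MASK = 0x01   # bit 7 of the octet (assumed to be the CE bit)


def split_octet(octet: int):
    """Split the former TOS octet into (DSCP, ECT, CE)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> 2  # bits 0-5, the DS codepoint
    return dscp, bool(octet & ECT_MASK), bool(octet & CE_MASK)


def erase_ce(octet: int) -> int:
    """What a non-ECN-capable router could do: clear the CE bit."""
    return octet & ~CE_MASK & 0xFF


octet = 0b10111011                   # DSCP 101110, ECT set, CE set
print(split_octet(octet))            # (46, True, True)
print(split_octet(erase_ce(octet)))  # (46, True, False)
```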

2489	   AUTHORS' ADDRESSES

2491	      K. K. Ramakrishnan
2492	      TeraOptic Networks, Inc.
2493	      Phone: +1 (408) 666-8650
2494	      Email: kk@teraoptic.com

2496	      Sally Floyd
2497	      ACIRI
2498	      Phone: +1 (510) 666-2989
2499	      Email: floyd@aciri.org
2500	      URL: http://www.aciri.org/floyd/

2502	      David L. Black
2503	      EMC Corporation
2504	      42 South St.
2506	      Hopkinton, MA  01748
2507	      Phone:  +1 (508) 435-1000 x75140
2508	      Email: black_david@emc.com

2510	      This draft was created in November 2000.
2511	      It expires May 2001.