idnits 2.17.1 

draft-kksjf-ecn-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 24 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 1998) is 9325 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2001' is mentioned on line 293, but not defined

  ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581)

  == Missing Reference: 'RFC 1455' is mentioned on line 1044, but not defined

  ** Obsolete undefined reference: RFC 1455 (Obsoleted by RFC 2474)

  == Unused Reference: 'Floyd97' is defined on line 819, but no explicit
     reference was found in the text

  == Unused Reference: 'FRED' is defined on line 831, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC1455' is defined on line 857, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'K98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96'

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Downref: Normative reference to an Informational RFC: RFC 1141

  ** Obsolete normative reference: RFC 1349 (Obsoleted by RFC 2474)

  ** Obsolete normative reference: RFC 1455 (Obsoleted by RFC 2474)

  ** Obsolete normative reference: RFC 2001 (Obsoleted by RFC 2581)

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90'


     Summary: 17 errors (**), 0 flaws (~~), 8 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                       K. K. Ramakrishnan
2	INTERNET DRAFT                                        AT&T Labs Research
3	draft-kksjf-ecn-03.txt                                       Sally Floyd
4	                                                                    LBNL
5	                                                            October 1998
6	                                                    Expires:  April 1999

8	    A Proposal to add Explicit Congestion Notification (ECN) to IP

10	                          Status of this Memo

12	   This document is an Internet-Draft.  Internet-Drafts are working
13	   documents of the Internet Engineering Task Force (IETF), its areas,
14	   and its working groups.  Note that other groups may also distribute
15	   working documents as Internet-Drafts.

17	   Internet-Drafts are draft documents valid for a maximum of six months
18	   and may be updated, replaced, or obsoleted by other documents at any
19	   time.  It is inappropriate to use Internet-Drafts as reference
20	   material or to cite them other than as "work in progress."

22	   To view the entire list of current Internet-Drafts, please check the
23	   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
24	   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
25	   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
26	   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

28	Abstract

30	   This note describes a proposed addition of ECN (Explicit Congestion
31	   Notification) to IP.  TCP is currently the dominant transport
32	   protocol used in the Internet. We begin by describing TCP's use of
33	   packet drops as an indication of congestion.  Next we argue that with
34	   the addition of active queue management (e.g., RED) to the Internet
35	   infrastructure, where routers detect congestion before the queue
36	   overflows, routers are no longer limited to packet drops as an
37	   indication of congestion.  Routers could instead set a Congestion
38	   Experienced (CE) bit in the packet header of packets from ECN-capable
39	   transport protocols.  We describe when the CE bit would be set in the
40	   routers, and describe what modifications would be needed to TCP to
41	   make it ECN-capable.  Modifications to other transport protocols
42	   (e.g., unreliable unicast or multicast, reliable multicast, other
43	   reliable unicast transport protocols) could be considered as those
44	   protocols are developed and advance through the standards process.

46	1.  Conventions and Acronyms

48	   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
49	   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
50	   document, are to be interpreted as described in [B97].

52	2. Introduction

54	   TCP's congestion control and avoidance algorithms are based on the
55	   notion that the network is a black-box [Jacobson88, Jacobson90].  The
56	   network's state of congestion or otherwise is determined by end-
57	   systems probing for the network state, by gradually increasing the
58	   load on the network (by increasing the window of packets that are
59	   outstanding in the network) until the network becomes congested and a
60	   packet is lost.  Treating the network as a "black-box" and treating
61	   loss as an indication of congestion in the network is appropriate for
62	   pure best-effort data carried by TCP which has little or no
63	   sensitivity to delay or loss of individual packets.  In addition,
64	   TCP's congestion management algorithms have techniques built-in (such
65	   as Fast Retransmit and Fast Recovery) to minimize the impact of
66	   losses from a throughput perspective.

68	   However, these mechanisms are not intended to help applications that
69	   are in fact sensitive to the delay or loss of one or more individual
70	   packets.  Interactive traffic such as telnet, web-browsing, and
71	   transfer of audio and video data can be sensitive to packet losses
72	   (using an unreliable data delivery transport such as UDP) or to the
73	   increased latency of the packet caused by the need to retransmit the
74	   packet after a loss (for reliable data delivery such as TCP).

76	   Since TCP determines the appropriate congestion window to use by
77	   gradually increasing the window size until it experiences a dropped
78	   packet, this causes the queues at the bottleneck router to build up.
79	   With most packet drop policies at the router that are not sensitive
80	   to the load placed by each individual flow, this means that some of
81	   the packets of latency-sensitive flows are going to be dropped.
82	   Active queue management mechanisms detect congestion before the queue
83	   overflows, and provide an indication of this congestion to the end
84	   nodes.  The advantages of active queue management are discussed in
85	   RFC 2309 [RFC2309].  Active queue management avoids some of the bad
86	   properties of dropping on queue overflow, including the undesirable
87	   synchronization of loss across multiple flows.  More importantly,
88	   active queue management means that transport protocols with
89	   congestion control (e.g., TCP) do not have to rely on buffer overflow
90	   as the only indication of congestion.  This can reduce unnecessary
91	   queueing delay for all traffic sharing that queue.

93	   Active queue management mechanisms may use one of several methods for
94	   indicating congestion to end-nodes. One is to use packet drops, as is
95	   currently done. However, active queue management allows the router to
96	   separate policies of queueing or dropping packets from the policies
97	   for indicating congestion. Thus, active queue management allows
98	   routers to use the Congestion Experienced (CE) bit in a packet header
99	   as an indication of congestion, instead of relying solely on packet
100	   drops.

102	3. Assumptions and General Principles

104	   In this section, we describe some of the important design principles
105	   and assumptions that guided the design choices in this proposal.

107	   (1) Congestion may persist over different time-scales. The time
108	   scales that we are concerned with are congestion events that may last
109	   longer than a round-trip time.
110	   (2) The number of packets in an individual flow (e.g., TCP connection
111	   or an exchange using UDP) may range from a small number of packets to
112	   quite a large number. We are interested in managing the congestion
113	   caused by flows that send enough packets so that they are still
114	   active when network feedback reaches them.
115	   (3) New mechanisms for congestion control and avoidance need to co-
116	   exist and cooperate with existing mechanisms for congestion control.
117	   In particular, new mechanisms have to co-exist with TCP's current
118	   methods of adapting to congestion and with routers' current practice
119	   of dropping packets in periods of congestion.
120	   (4) Because ECN is likely to be adopted gradually, accommodating
121	   migration is essential. Some routers may still only drop packets to
122	   indicate congestion, and some end-systems may not be ECN-capable. The
123	   most viable strategy is one that accommodates incremental deployment
124	   without having to resort to "islands" of ECN-capable and non-ECN-
125	   capable environments.
126	   (5) Asymmetric routing is likely to be a normal occurrence in the
127	   Internet. The path (sequence of links and routers) followed by data
128	   packets may be different from the path followed by the acknowledgment
129	   packets in the reverse direction.
130	   (6) Many routers process the "regular" headers in IP packets more
131	   efficiently than they process the header information in IP options.
132	   This suggests keeping congestion experienced information in the
133	   regular headers of an IP packet.
134	   (7) It must be recognized that not all end-systems will cooperate in
135	   mechanisms for congestion control. However, new mechanisms should not
136	   make it easier for TCP applications to disable TCP congestion
137	   control. The benefit of lying about participating in new mechanisms
138	   such as ECN-capability should be small.

140	4. Random Early Detection (RED)

142	   Random Early Detection (RED) is a mechanism for active queue
143	   management that has been proposed to detect incipient congestion
144	   [FJ93], and is currently being deployed in the Internet backbone
145	   [RFC2309].  Although RED is meant to be a general mechanism using one
146	   of several alternatives for congestion indication, in the current
147	   environment of the Internet RED is restricted to using packet drops
148	   as a mechanism for congestion indication.  RED drops packets based on
149	   the average queue length exceeding a threshold, rather than only when
150	   the queue overflows.  However, when RED drops packets before the
151	   queue actually overflows, RED is not forced by memory limitations to
152	   discard the packet.

154	   RED could set a Congestion Experienced (CE) bit in the packet header
155	   instead of dropping the packet, if such a bit was provided in the IP
156	   header and understood by the transport protocol.  The use of the CE
157	   bit would allow the receiver(s) to receive the packet, avoiding the
158	   potential for excessive delays due to retransmissions after packet
159	   losses.  We use the term 'CE packet' to denote a packet that has the
160	   CE bit set.

162	5. Explicit Congestion Notification in IP

164	   We propose that the Internet provide a congestion indication for
165	   incipient congestion (as in RED and earlier work [RJ90]) where the
166	   notification can sometimes be through marking packets rather than
167	   dropping them.  This would require an ECN field in the IP header with
168	   two bits.  The ECN-Capable Transport (ECT) bit would be set by the
169	   data sender to indicate that the end-points of the transport protocol
170	   are ECN-capable.  The CE bit would be set by the router to indicate
171	   congestion to the end nodes.  Routers that have a packet arriving at
172	   a full queue would drop the packet, just as they do now.

174	   Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
175	   Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE
176	   bit.  The IPv4 TOS octet corresponds to the Traffic Class octet in
177	   IPv6.  The definitions for the IPv4 TOS octet [RFC791] and the IPv6
178	   Traffic Class octet are intended to be superseded by the DS
179	   (Differentiated Services) Field [RFC-DIFFSERV?].  Bits 6 and 7 are
180	   listed in [RFC-DIFFSERV?] as Currently Unused.  Section 19 gives a
181	   brief history of the TOS octet.
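
   As a minimal sketch of the bit layout described above, the masks
   below assume the RFC 791 convention that bit 0 is the most
   significant bit of the octet, so that bits 6 and 7 are the two
   least significant bits.  The names ECT_MASK, CE_MASK, is_ect,
   is_ce, and set_ce are hypothetical and are not defined by this
   proposal.

```python
# Illustrative only: hypothetical helper names, not part of the proposal.
# Assumes RFC 791 bit numbering (bit 0 = most significant bit, 0x80).
ECT_MASK = 0x02  # bit 6 of the TOS octet: ECN-Capable Transport
CE_MASK = 0x01   # bit 7 of the TOS octet: Congestion Experienced


def is_ect(tos: int) -> bool:
    """True if the ECT bit is set in the given TOS octet value."""
    return bool(tos & ECT_MASK)


def is_ce(tos: int) -> bool:
    """True if the CE bit is set in the given TOS octet value."""
    return bool(tos & CE_MASK)


def set_ce(tos: int) -> int:
    """Return the TOS octet with the CE bit set; other bits unchanged."""
    return (tos | CE_MASK) & 0xFF
```

   The upper six bits of the octet are passed through untouched, so
   the helpers do not depend on how those bits are defined.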

183	   Because of the unstable history of the TOS octet, the use of the ECN
184	   field as specified in this document cannot be guaranteed to be
185	   backwards compatible with all past uses of these two bits.  The
186	   potential dangers of this lack of backwards compatibility are
187	   discussed in Section 19.

189	   Upon the receipt by an ECN-Capable transport of a single CE packet,
190	   the congestion control algorithms followed at the end-systems MUST be
191	   essentially the same as the congestion control response to a *single*
192	   dropped packet.  For example, for ECN-Capable TCP the source TCP is
193	   required to halve its congestion window for any window of data
194	   containing either a packet drop or an ECN indication.  However, we
195	   would like to point out some notable exceptions in the reaction of
196	   the source TCP, related to following the shorter-time-scale details
197	   of particular implementations of TCP.  For TCP's response to an ECN
198	   indication, we do not recommend such behavior as the slow-start of
199	   Tahoe TCP in response to a packet drop, or Reno TCP's wait of roughly
200	   half a round-trip time during Fast Recovery.

202	   One reason for requiring that the congestion-control response to the
203	   CE packet be essentially the same as the response to a dropped packet
204	   is to accommodate the incremental deployment of ECN in both end-
205	   systems and in routers.  Some routers may drop ECN-Capable packets
206	   (e.g., using the same RED policies for congestion detection) while
207	   other routers set the CE bit, for equivalent levels of congestion.
208	   Similarly, a router might drop a non-ECN-Capable packet but set the
209	   CE bit in an ECN-Capable packet, for equivalent levels of congestion.
210	   Different congestion control responses to a CE bit indication and to
211	   a packet drop could result in unfair treatment for different flows.

213	   An additional requirement is that the end-systems should react to
214	   congestion at most once per window of data (i.e., at most once per
215	   round-trip time), to avoid reacting multiple times to multiple
216	   indications of congestion within a round-trip time.

218	   For a router, the CE bit of an ECN-Capable packet should only be set
219	   if the router would otherwise have dropped the packet as an
220	   indication of congestion to the end nodes. When the router's buffer
221	   is not yet full and the router is prepared to drop a packet to inform
222	   end nodes of incipient congestion, the router should first check to
223	   see if the ECT bit is set in that packet's IP header.  If so, then
224	   instead of dropping the packet, the router MAY set the CE bit
225	   in the IP header.
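
   The router rule above might be sketched as follows.  This is an
   illustration under stated assumptions, not a specification: the
   function name, the Action values, and the marking_enabled switch
   (which models the router's freedom under MAY) are hypothetical, and
   the decision that congestion should be signalled is taken as an
   input from whatever active queue management scheme is in use.

```python
from enum import Enum

ECT_MASK = 0x02  # bit 6 of the TOS octet (assumed RFC 791 bit numbering)
CE_MASK = 0x01   # bit 7 of the TOS octet


class Action(Enum):
    FORWARD = "forward"
    MARK = "mark"
    DROP = "drop"


def router_action(tos, queue_full, aqm_signals_congestion,
                  marking_enabled=True):
    """Return (action, outgoing TOS octet) for one arriving packet."""
    if queue_full:
        # A packet arriving at a full queue is dropped, as today.
        return Action.DROP, tos
    if not aqm_signals_congestion:
        # No congestion indication is due; any CE bit already set by an
        # upstream router is left unchanged.
        return Action.FORWARD, tos
    if marking_enabled and (tos & ECT_MASK):
        # The router would otherwise have dropped this packet, and the
        # transport is ECN-capable, so mark instead of dropping.
        return Action.MARK, tos | CE_MASK
    return Action.DROP, tos
```

   Note that the CE bit is set only on the path where the packet would
   otherwise have been dropped as a congestion indication.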

227	   An environment where all end nodes were ECN-Capable could allow new
228	   criteria to be developed for setting the CE bit, and new congestion
229	   control mechanisms for end-node reaction to CE packets.  However,
230	   this is a research issue, and as such is not addressed in this
231	   document.

233	   When a CE packet is received by a router, the CE bit is left
234	   unchanged, and the packet transmitted as usual. When severe
235	   congestion has occurred and the router's queue is full, then the
236	   router has no choice but to drop some packet when a new packet
237	   arrives.  We anticipate that such packet losses will become
238	   relatively infrequent when a majority of end-systems become ECN-
239	   Capable and participate in TCP or other compatible congestion control
240	   mechanisms. In an adequately-provisioned network in such an ECN-
241	   Capable environment, packet losses should occur primarily during
242	   transients or in the presence of non-cooperating sources.

244	   We expect that routers will set the CE bit in response to incipient
245	   congestion as indicated by the average queue size, using the RED
246	   algorithms suggested in [FJ93, RFC2309].  To the best of our
247	   knowledge, this is the only proposal currently under discussion in
248	   the IETF for routers to drop packets proactively, before the buffer
249	   overflows.  However, this document does not attempt to specify a
250	   particular mechanism for active queue management, leaving that
251	   endeavor, if needed, to other areas of the IETF.  While ECN is
252	   inextricably tied up with active queue management at the router, the
253	   reverse does not hold; active queue management mechanisms have been
254	   developed and deployed independently from ECN, using packet drops as
255	   indications of congestion in the absence of ECN in the IP
256	   architecture.

258	6. Support from the Transport Protocol

260	   ECN requires support from the transport protocol, in addition to the
261	   functionality given by the ECN field in the IP packet header. The
262	   transport protocol might require negotiation between the endpoints
263	   during setup to determine that all of the endpoints are ECN-capable,
264	   so that the sender can set the ECT bit in transmitted packets.
265	   In addition, the transport protocol must be capable of reacting
266	   appropriately to the receipt of CE packets.  This reaction could be
267	   in the form of the data receiver informing the data sender of the
268	   received CE packet (e.g., TCP), of the data receiver unsubscribing to
269	   a layered multicast group (e.g., RLM [MJV96]), or of some other
270	   action that ultimately reduces the arrival rate of that flow to that
271	   receiver.

273	   This document only addresses the addition of ECN Capability to TCP,
274	   leaving issues of ECN and other transport protocols to further
275	   research.  For TCP, ECN requires three new mechanisms:  negotiation
276	   between the endpoints during setup to determine if they are both
277	   ECN-capable; an ECN-Echo flag in the TCP header so that the data
278	   receiver can inform the data sender when a CE packet has been
279	   received; and a Congestion Window Reduced (CWR) flag in the TCP
280	   header so that the data sender can inform the data receiver that the
281	   congestion window has been reduced. The support required from other
282	   transport protocols is likely to be different, particularly for
283	   unreliable or reliable multicast transport protocols, and will have
284	   to be determined as other transport protocols are brought to the IETF
285	   for standardization.

287	6.1. TCP

289	   The following sections describe in detail the proposed use of ECN in
290	   TCP.  This proposal is described in essentially the same form in
291	   [Floyd94]. We assume that the source TCP uses the standard congestion
292	   control algorithms of Slow-start, Fast Retransmit and Fast Recovery
293	   [RFC 2001].

295	   This proposal specifies two new flags in the Reserved field of the
296	   TCP header.  The TCP mechanism for negotiating ECN-Capability uses
297	   the ECN-Echo flag in the TCP header.  (This was called the ECN Notify
298	   flag in some earlier documents.)  Bit 9 in the Reserved field of the
299	   TCP header is designated as the ECN-Echo flag.  The location of the
300	   6-bit Reserved field in the TCP header is shown in Figure 3 of RFC
301	   793 [RFC793].

303	   To enable the TCP receiver to determine when to stop setting the
304	   ECN-Echo flag, we introduce a second new flag in the TCP header, the
305	   Congestion Window Reduced (CWR) flag.  The CWR flag is assigned to
306	   Bit 8 in the Reserved field of the TCP header.

308	   The use of these flags is described in the sections below.

310	6.1.1.  TCP Initialization

312	   In the TCP connection setup phase, the source and destination TCPs
313	   exchange information about their desire and/or capability to use ECN.
314	   Subsequent to the completion of this negotiation, the TCP sender sets
315	   the ECT bit in the IP header of data packets to indicate to the
316	   network that the transport is capable and willing to participate in
317	   ECN for this packet. This will indicate to the routers that they may
318	   mark this packet with the CE bit, if they would like to use that as a
319	   method of congestion notification. If the TCP connection does not
320	   wish to use ECN notification for a particular packet, the sending TCP
321	   sets the ECT bit equal to 0 (i.e., not set), and the TCP receiver
322	   ignores the CE bit in the received packet.

324	   When a node sends a TCP SYN packet, it may set the ECN-Echo and CWR
325	   flags in the TCP header.  For a SYN packet, the setting of both the
326	   ECN-Echo and CWR flags is defined as an indication that the sending
327	   TCP is ECN-Capable, rather than as an indication of congestion or of
328	   response to congestion. More precisely, a SYN packet with both the
329	   ECN-Echo and CWR flags set indicates that the TCP implementation
330	   transmitting the SYN packet will participate in ECN as both a sender
331	   and receiver.  As a receiver, it will respond to incoming data
332	   packets that have the CE bit set in the IP header by setting the
333	   ECN-Echo flag in outgoing TCP Acknowledgement (ACK) packets.  As a
334	   sender, it will respond to incoming packets that have the ECN-Echo
335	   flag set by reducing the congestion window when appropriate.

337	   When a node sends a SYN-ACK packet, it may set the ECN-Echo flag, but
338	   it does not set the CWR flag.  For a SYN-ACK packet, the pattern of
339	   the ECN-Echo flag set and the CWR flag not set in the TCP header is
340	   defined as an indication that the TCP transmitting the SYN-ACK packet
341	   is ECN-Capable.

343	   There is the question of why we chose to have the TCP sending the SYN
344	   set two ECN-related flags in the Reserved field of the TCP header for
345	   the SYN packet, while the responding TCP sending the SYN-ACK sets
346	   only one ECN-related flag in the SYN-ACK packet.  This asymmetry is
347	   necessary for the robust negotiation of ECN-capability with deployed
348	   TCP implementations.  There exists at least one TCP implementation in
349	   which TCP receivers set the Reserved field of the TCP header in ACK
350	   packets (and hence the SYN-ACK) simply to reflect the Reserved field
351	   of the TCP header in the received data packet.  Because the TCP SYN
352	   packet sets the ECN-Echo and CWR flags to indicate ECN-capability,
353	   while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP
354	   correctly interprets a receiver's reflection of its own flags in the
355	   Reserved field as an indication that the receiver is not ECN-capable.
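
   The negotiation above can be illustrated with a short sketch.  The
   masks assume that the bit numbers refer to the 16-bit header word
   holding the Data Offset, Reserved, and control-bit fields, counted
   from the most significant bit as in Figure 3 of RFC 793, so that
   Bit 9 and Bit 8 fall just above the URG flag in the low-order
   octet.  The function names are hypothetical.

```python
# Masks within the low-order octet of the TCP offset/reserved/flags word.
# Assumption: bits are numbered from the most significant bit of that
# 16-bit word, so Bit 9 -> 0x40 and Bit 8 -> 0x80.
FIN = 0x01
SYN = 0x02
ACK = 0x10
ECN_ECHO = 0x40  # Bit 9 of the Reserved field
CWR = 0x80       # Bit 8 of the Reserved field


def syn_offers_ecn(flags: int) -> bool:
    """A SYN (without ACK) advertises ECN with both ECN-Echo and CWR set."""
    is_syn = bool(flags & SYN) and not (flags & ACK)
    return is_syn and (flags & (ECN_ECHO | CWR)) == (ECN_ECHO | CWR)


def syn_ack_accepts_ecn(flags: int) -> bool:
    """A SYN-ACK accepts ECN with ECN-Echo set and CWR not set."""
    is_syn_ack = (flags & (SYN | ACK)) == (SYN | ACK)
    return is_syn_ack and bool(flags & ECN_ECHO) and not (flags & CWR)
```

   A responder that merely reflects the Reserved field of the SYN would
   return both flags set, which syn_ack_accepts_ecn() rejects.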

357	6.1.2.  The TCP Sender

359	   For a TCP connection using ECN, data packets are transmitted with the
360	   ECT bit set in the IP header (set to a "1").  If the sender receives
361	   an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag
362	   set in the TCP header), then the sender knows that congestion was
363	   encountered in the network on the path from the sender to the
364	   receiver.  The indication of congestion should be treated just as a
365	   congestion loss in non-ECN-Capable TCP. That is, the TCP source
366	   halves the congestion window "cwnd" and reduces the slow start
367	   threshold "ssthresh".  The sending TCP does NOT increase the
368	   congestion window in response to the receipt of an ECN-Echo ACK
369	   packet.

371	   A critical condition is that TCP does not react to congestion
372	   indications more than once every window of data (or more loosely,
373	   more than once every round-trip time). That is, the TCP sender's
374	   congestion window should be reduced only once in response to a series
375	   of dropped and/or CE packets from a single window of data.  In
376	   addition, the TCP source should not decrease the slow-start
377	   threshold, ssthresh, if it has been decreased within the last round
378	   trip time.  However, if any retransmitted packets are dropped or have
379	   the CE bit set, then this is interpreted by the source TCP as a new
380	   instance of congestion.
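
   One possible way to enforce the once-per-window condition is
   sketched below.  It is an assumption of this sketch, not something
   the proposal specifies, that the sender remembers the next sequence
   number to be sent at the time of a reduction (called "recover"
   here) and ignores further ECN-Echo ACKs until an ACK covers data
   sent after that point.  Setting ssthresh equal to the halved window
   is likewise an assumption, and sequence-number wraparound is
   ignored for brevity.

```python
class EcnSender:
    """Toy model of the sender's once-per-window reaction (hypothetical)."""

    def __init__(self, mss: int, cwnd: int):
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.snd_nxt = 0      # next sequence number to be sent
        self.recover = None   # snd_nxt recorded at the last reduction

    def on_ack(self, ack_seq: int, ecn_echo: bool) -> bool:
        """Process an ACK; return True if the window was reduced."""
        if not ecn_echo:
            return False
        if self.recover is not None and ack_seq <= self.recover:
            # This ACK covers only data sent before the last reduction,
            # so it belongs to the window that was already reacted to.
            return False
        self.cwnd = max(self.cwnd // 2, self.mss)  # bounded below by 1 MSS
        self.ssthresh = self.cwnd                  # assumption of this sketch
        self.recover = self.snd_nxt
        return True
```

   For example, with an 8000-byte window fully outstanding, the first
   ECN-Echo ACK halves the window and later ECN-Echo ACKs for the same
   window leave it unchanged.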

382	   After the source TCP reduces its congestion window in response to a
383	   CE packet, incoming acknowledgements that continue to arrive can
384	   "clock out" outgoing packets as allowed by the reduced congestion
385	   window.  If the congestion window consists of only one MSS (maximum
386	   segment size), and the sending TCP receives an ECN-Echo ACK packet,
387	   then the sending TCP should in principle still halve its congestion
388	   window.  However, the value of the congestion window is
389	   bounded below by a value of one MSS.  If the sending TCP were to
390	   continue to send, using a congestion window of 1 MSS, this results in
391	   the transmission of one packet per round-trip time.  We believe it is
392	   desirable to still reduce the sending rate of the TCP sender even
393	   further, on receipt of an ECN-Echo packet when the congestion window
394	   is one.  We use the retransmit timer as a means to reduce the rate
395	   further in this circumstance.  Therefore, the sending TCP should also
396	   reset the retransmit timer on receiving the ECN-Echo packet when the
397	   congestion window is one.  The sending TCP will then be able to send
398	   a new packet when the retransmit timer expires.
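
   The minimum-window case above can be sketched as a small function.
   The name respond_to_ecn_echo and the boolean "wait for the
   retransmit timer" result are hypothetical; the sketch only shows
   that the window cannot fall below one MSS and that the timer is
   used to slow the sender further in that case.

```python
def respond_to_ecn_echo(cwnd: int, mss: int):
    """Return (new_cwnd, wait_for_timer) after an ECN-Echo ACK.

    wait_for_timer is True when the window was already one MSS, meaning
    the sender resets its retransmit timer and sends the next new packet
    only when that timer expires.
    """
    wait_for_timer = cwnd <= mss
    new_cwnd = max(cwnd // 2, mss)  # never below one MSS
    return new_cwnd, wait_for_timer
```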

400	   [Floyd94] discusses TCP's response to ECN in more detail.  [Floyd98]
401	   discusses the validation test in the ns simulator, which illustrates
402	   a wide range of ECN scenarios. These scenarios include the following:
403	   an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
404	   Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
405	   ECN, and a congestion window of one packet followed by an ECN.

407	   TCP follows existing algorithms for sending data packets in response
408	   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
409	   timeouts [RFC2001].

411	6.1.3.  The TCP Receiver

413	   When TCP receives a CE data packet at the destination end-system, the
414	   TCP data receiver sets the ECN-Echo flag in the TCP header of the
415	   subsequent ACK packet.  If there is any ACK withholding implemented,
416	   as in current "delayed-ACK" TCP implementations where the TCP
417	   receiver can send an ACK for two arriving data packets, then the
418	   ECN-Echo flag in the ACK packet will be set to the OR of the CE bits
419	   of all of the data packets being acknowledged.  That is, if any of
420	   the received data packets are CE packets, then the returning ACK has
421	   the ECN-Echo flag set.
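
   For delayed ACKs, the rule above amounts to a logical OR over the
   packets being acknowledged.  In the sketch below the function name
   is hypothetical, the CE mask assumes RFC 791 bit numbering, and all
   packets are assumed to belong to a connection that has negotiated
   ECN.

```python
CE_MASK = 0x01  # bit 7 of the TOS octet (assumed RFC 791 bit numbering)


def ack_ecn_echo(tos_values) -> bool:
    """ECN-Echo for one ACK: the OR of the CE bits of the packets it covers."""
    return any(tos & CE_MASK for tos in tos_values)
```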

423	   To provide robustness against the possibility of a dropped ACK packet
424	   carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo
425	   flag in a series of ACK packets. The TCP receiver uses the CWR flag
426	   to determine when to stop setting the ECN-Echo flag.

428	   When an ECN-Capable TCP reduces its congestion window for any reason
429	   (because of a retransmit timeout, a Fast Retransmit, or in response
430	   to an ECN Notification), the TCP sets the CWR flag in the TCP header
431	   of the first data packet sent after the window reduction.  If that
432	   data packet is dropped in the network, then the sending TCP will have
433	   to reduce the congestion window again and retransmit the dropped
434	   packet.  Thus, the Congestion Window Reduced message is reliably
435	   delivered to the data receiver.

437	   After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
438	   that TCP receiver continues to set the ECN-Echo flag in ACK packets
439	   until it receives a CWR packet (a packet with the CWR flag set).
440	   After the receipt of the CWR packet, acknowledgements for subsequent
441	   non-CE data packets do not have the ECN-Echo flag set. If another CE
442	   packet is received by the data receiver, the receiver would once
443	   again send ACK packets with the ECN-Echo flag set.  While the receipt
444	   of a CWR packet does not guarantee that the data sender received the
445	   ECN-Echo message, this does indicate that the data sender reduced its
446	   congestion window at some point *after* it sent the data packet for
447	   which the CE bit was set.
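
   As a non-normative illustration of the receiver behaviour described
   above, the following Python sketch models when the ECN-Echo flag is
   set on outgoing ACKs.  The names EcnReceiver, on_data and make_ack
   are hypothetical.  Processing CWR before CE when both arrive on the
   same packet is an assumption of this sketch, not something the text
   above specifies.

```python
class EcnReceiver:
    """Toy model of the ECN-Echo logic at a TCP data receiver."""

    def __init__(self) -> None:
        # Latched state: keep setting ECN-Echo until a CWR packet arrives.
        self.echoing = False
        # OR of the CE bits of data packets not yet acknowledged
        # (covers delayed ACKs that acknowledge several packets).
        self.ce_pending = False

    def on_data(self, ce: bool, cwr: bool) -> None:
        """Record one arriving data packet."""
        if cwr:
            # Assumption: CWR is handled first, so a packet carrying
            # both CWR and CE restarts the echo.
            self.echoing = False
        if ce:
            self.echoing = True
            self.ce_pending = True

    def make_ack(self) -> bool:
        """Return the ECN-Echo flag for the next ACK packet."""
        ece = self.echoing or self.ce_pending
        self.ce_pending = False
        return ece


receiver = EcnReceiver()
receiver.on_data(ce=True, cwr=False)
receiver.on_data(ce=False, cwr=False)
first_ack = receiver.make_ack()      # delayed ACK covering both packets
receiver.on_data(ce=False, cwr=False)
second_ack = receiver.make_ack()     # still echoing: no CWR seen yet
receiver.on_data(ce=False, cwr=True)
third_ack = receiver.make_ack()      # CWR received: echo stops
```

   In this model the echo is repeated on every ACK until the sender's
   CWR packet arrives, which is what makes the scheme robust against a
   single dropped ACK.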

449	   We have already specified that a TCP sender reduces its congestion
450	   window at most once per window of data.  This mechanism requires some
451	   care to make sure that the sender reduces its congestion window at
452	   most once per ECN indication, and that multiple ECN messages over
453	   several successive windows of data are properly reported to the ECN
454	   sender.  This is discussed further in [Floyd98].

456	6.1.4. Congestion on the ACK-path

458	   For the current generation of TCP congestion control algorithms, pure
459	   acknowledgement packets (i.e., packets that do not contain any
460	   accompanying data) should be sent with the ECT bit off. Current TCP
461	   receivers have no mechanisms for reducing traffic on the ACK-path in
462	   response to congestion notification.  Mechanisms for responding to
463	   congestion on the ACK-path are areas for current and future research.
464	   (One simple possibility would be for the sender to reduce its
465	   congestion window when it receives a pure ACK packet with the CE bit
466	   set.) For current TCP implementations, a single dropped ACK generally
467	   has only a very small effect on the TCP's sending rate.

469	7. Summary of changes required in IP and TCP

471	   Two bits need to be specified in the IP header: the ECN-Capable
472	   Transport (ECT) bit and the Congestion Experienced (CE) bit.  The ECT
473	   bit set to "0" indicates that the transport protocol will ignore the
474	   CE bit.  This is the default value for the ECT bit.  The ECT bit set
475	   to "1" indicates that the transport protocol is willing and able to
476	   participate in ECN.

478	   The default value for the CE bit is "0".  The router sets the CE bit
479	   to "1" to indicate congestion to the end nodes.  The CE bit in a
480	   packet header should never be reset by a router from "1" to "0".

482	   TCP requires three changes: a negotiation phase during setup to
483	   determine if both end nodes are ECN-capable, and two new flags in the
484	   TCP header, from the "reserved" flags in the TCP flags field.  The
485	   ECN-Echo flag is used by the data receiver to inform the data sender
486	   of a received CE packet.  The Congestion Window Reduced flag is used
487	   by the data sender to inform the data receiver that the congestion
488	   window has been reduced.

490	8. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN

492	   Since the ATM and Frame Relay mechanisms for congestion indication
493	   have typically been defined without any notion of average queue size
494	   as the basis for determining that an intermediate node is congested,
495	   we believe that they provide a very noisy signal. The TCP-sender
496	   reaction specified in this draft for ECN is NOT the appropriate
497	   reaction for such a noisy signal of congestion notification. It is
498	   our expectation that ATM's EFCI and Frame Relay's FECN mechanisms
499	   would be phased out over time within the ATM network.  However, if
500	   the routers that interface to the ATM network have a way of
501	   maintaining the average queue at the interface, and use it to come to
502	   a reliable determination that the ATM subnet is congested, they may
503	   use the ECN notification that is defined here.

505	   We emphasize that a *single* packet with the CE bit set in an IP
506	   packet causes the transport layer to respond, in terms of congestion
507	   control, as it would to a packet drop.  As such, the CE bit is not a
508	   good match to a transient signal such as one based on the
509	   instantaneous queue size.  However, experiments in techniques at
510	   layer 2 (e.g., in ATM switches or Frame Relay switches) should be
511	   encouraged.  For example, using a scheme such as RED (where packet
512	   marking is based on the average queue length exceeding a threshold),
513	   layer 2 devices could provide a reasonably reliable indication of
514	   congestion.  When all the layer 2 devices in a path set that layer's
515	   own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN
516	   bit in Frame Relay) in this reliable manner, then the interface
517	   router to the layer 2 network could copy the state of that layer 2
518	   Congestion Experienced bit into the CE bit in the IP header.  We
519	   recognize that this is not the current practice, nor is it in current
520	   standards. However, encouraging experimentation in this manner may
521	   provide the information needed to enable evolution of existing layer
522	   2 mechanisms to provide a more reliable means of congestion
523	   indication, when they use a single bit for indicating congestion.

525	9. Non-compliance by the End Nodes

527	   This section discusses concerns about the vulnerability of ECN to
528	   non-compliant end-nodes (i.e., end nodes that set the ECT bit in
529	   transmitted packets but do not respond to received CE packets).  We
530	   argue that the addition of ECN to the IP architecture would not
531	   significantly increase the current vulnerability of the architecture
532	   to unresponsive flows.

534	   Even for non-ECN environments, there are serious concerns about the
535	   damage that can be done by non-compliant or unresponsive flows (that
536	   is, flows that do not respond to congestion control indications by
537	   reducing their arrival rate at the congested link).  For example, an
538	   end-node could "turn off congestion control" by not reducing its
539	   congestion window in response to packet drops. This is a concern for
540	   the current Internet.  It has been argued that routers will have to
541	   deploy mechanisms to detect and differentially treat packets from
542	   non-compliant flows.  It has also been argued that techniques such as
543	   end-to-end per-flow scheduling and isolation of one flow from
544	   another, differentiated services, or end-to-end reservations could
545	   remove some of the more damaging effects of unresponsive flows.

547	   It has been argued that dropping packets in itself may be an adequate
548	   deterrent for non-compliance, and that the use of ECN removes this
549	   deterrent.  We would argue in response that (1) ECN-capable routers
550	   preserve packet-dropping behavior in times of high congestion; and
551	   (2) even in times of high congestion, dropping packets in itself is
552	   not an adequate deterrent for non-compliance.

554	   First, ECN-Capable routers will only mark packets (as opposed to
555	   dropping them) when the packet marking rate is reasonably low. During
556	   periods where the average queue size exceeds an upper threshold, and
557	   therefore the potential packet marking rate would be high, our
558	   recommendation is that routers drop packets rather than set the CE
559	   bit in packet headers.
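
   The mark-or-drop decision described above can be sketched as follows.
   This is a minimal Python illustration, not a router implementation:
   the function name, the "selected" flag (standing in for whatever the
   active queue management scheme decides), and the bit masks (ECT and
   CE as the two low-order bits of the TOS octet) are assumptions of
   the sketch.

```python
from enum import Enum
from typing import Tuple

ECT = 0x02  # assumed position of the ECT bit in the TOS octet
CE = 0x01   # assumed position of the CE bit in the TOS octet


class Action(Enum):
    FORWARD = "forward"
    DROP = "drop"


def congestion_action(tos: int, avg_queue: float, max_thresh: float,
                      selected: bool) -> Tuple[Action, int]:
    """Return (action, new TOS octet) for one arriving packet.

    `selected` is True when the queue management scheme has chosen
    this packet to carry a congestion indication.
    """
    if not selected:
        return Action.FORWARD, tos
    if avg_queue > max_thresh:
        # High congestion: drop even ECN-capable packets.
        return Action.DROP, tos
    if tos & ECT:
        # ECN-capable transport: set CE instead of dropping.
        # CE is only ever set here, never cleared.
        return Action.FORWARD, tos | CE
    # Not ECN-capable: a drop is the only available indication.
    return Action.DROP, tos


marked = congestion_action(tos=ECT, avg_queue=5.0, max_thresh=10.0,
                           selected=True)
overloaded = congestion_action(tos=ECT, avg_queue=15.0, max_thresh=10.0,
                               selected=True)
```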

561	   During the periods of low or moderate packet marking rates when ECN
562	   would be deployed, there would be little deterrent effect on
563	   unresponsive flows of dropping rather than marking those packets. For
564	   example, delay-insensitive flows using reliable delivery might have
565	   an incentive to increase rather than to decrease their sending rate
566	   in the presence of dropped packets.  Similarly, delay-sensitive flows
567	   using unreliable delivery might increase their use of FEC in response
568	   to an increased packet drop rate, increasing rather than decreasing
569	   their sending rate.  For the same reasons, we do not believe that
570	   packet dropping itself is an effective deterrent for non-compliance
571	   even in an environment of high packet drop rates.

573	   Several methods have been proposed to identify and restrict non-
574	   compliant or unresponsive flows. The addition of ECN to the network
575	   environment would not in any way increase the difficulty of designing
576	   and deploying such mechanisms. If anything, the addition of ECN to
577	   the architecture would make the job of identifying unresponsive flows
578	   slightly easier.  For example, in an ECN-Capable environment routers
579	   are not limited to information about packets that are dropped or have
580	   the CE bit set at that router itself; in such an environment routers
581	   could also take note of arriving CE packets that indicate congestion
582	   encountered by that packet earlier in the path.

584	10. Non-compliance in the Network

586	   The breakdown of effective congestion control could be caused not
587	   only by a non-compliant end-node, but also by the loss of the
588	   congestion indication in the network itself.  This could happen
589	   through a rogue or broken router that set the ECT bit in a packet
590	   from a non-ECN-capable transport, or "erased" the CE bit in arriving
591	   packets.  As one example, a rogue or broken router that "erased" the
592	   CE bit in arriving CE packets would prevent that indication of
593	   congestion from reaching downstream receivers.  This could result in
594	   the failure of congestion control for that flow and a resulting
595	   increase in congestion in the network, ultimately resulting in
596	   subsequent packets dropped for this flow as the average queue size
597	   increased at the congested gateway.

599	   The actions of a rogue or broken router could also result in an
600	   unnecessary indication of congestion to the end-nodes.  These actions
601	   can include a router dropping a packet or setting the CE bit in the
602	   absence of congestion. From a congestion control point of view,
603	   setting the CE bit in the absence of congestion by a non-compliant
604	   router would be no different than a router dropping a packet
605	   unnecessarily. By "erasing" the ECT bit of a packet that is later
606	   dropped in the network, a router's actions could result in an
607	   unnecessary packet drop for that packet later in the network.

609	   Concerns regarding the loss of congestion indications from
610	   encapsulated, dropped, or corrupted packets are discussed below.

612	10.1. Encapsulated packets

614	   Some care is required to handle the CE and ECT bits appropriately
615	   when packets are encapsulated and de-encapsulated for tunnels.

617	   When a packet is encapsulated, the following rules apply regarding
618	   the ECT bit.  First, if the ECT bit in the encapsulated ('inside')
619	   header is a 0, then the ECT bit in the encapsulating ('outside')
620	   header MUST be a 0.  If the ECT bit in the inside header is a 1, then
621	   the ECT bit in the outside header SHOULD be a 1.

623	   When a packet is de-encapsulated, the following rules apply regarding
624	   the CE bit.  If the ECT bit is a 1 in both the inside and the outside
625	   header, then the CE bit in the outside header MUST be ORed with the
626	   CE bit in the inside header.  (That is, in this case a CE bit of 1 in
627	   the outside header must be copied to the inside header.)  If the ECT
628	   bit in either header is a 0, then the CE bit in the outside header is
629	   ignored.  This requirement for the treatment of de-encapsulated
630	   packets does not currently apply to IPsec tunnels.
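
   The encapsulation and de-encapsulation rules above can be expressed
   compactly.  The Python sketch below is illustrative only and applies
   to non-IPsec tunnels.  The function names and the bit masks are
   assumptions, as is the choice to start the outer header with the CE
   bit clear (the rules above only constrain the outer ECT bit).

```python
ECT = 0x02  # assumed position of the ECT bit in the ECN field
CE = 0x01   # assumed position of the CE bit in the ECN field


def encapsulate(inner_ecn: int) -> int:
    """Return the ECN bits for the outer header at tunnel ingress.

    Inner ECT of 0 gives outer ECT of 0 (MUST); inner ECT of 1 gives
    outer ECT of 1 (SHOULD).  Assumption: outer CE starts clear.
    """
    return inner_ecn & ECT


def decapsulate(inner_ecn: int, outer_ecn: int) -> int:
    """Return the ECN bits of the inner header at tunnel egress."""
    if (inner_ecn & ECT) and (outer_ecn & ECT):
        # Both headers are ECN-capable: OR the outer CE bit into the inner.
        return inner_ecn | (outer_ecn & CE)
    # ECT is 0 in either header: the outer CE bit is ignored.
    return inner_ecn


outer = encapsulate(ECT)
# A router inside the tunnel sets CE on the outer header.
after_tunnel = decapsulate(ECT, outer | CE)
```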

632	   A specific example of the use of ECN with encapsulation occurs when a
633	   flow wishes to use ECN-capability to avoid the danger of an
634	   unnecessary packet drop for the encapsulated packet as a result of
635	   congestion at an intermediate node in the tunnel.  This functionality
636	   can be supported by copying the ECN field in the inner IP header to
637	   the outer IP header upon encapsulation, and using the ECN field in
638	   the outer IP header to set the ECN field in the inner IP header upon
639	   decapsulation.  This effectively allows routers along the tunnel to
640	   cause the CE bit to be set in the ECN field of the unencapsulated IP
641	   header of an ECN-capable packet when such routers experience
642	   congestion.

644	10.2.  IPsec Tunnel Considerations

646	   The IPsec protocol, as defined in [RFC-ESP?, RFC-AH?], does not
647	   include the IP header's ECN field in any of its cryptographic
648	   calculations (in the case of tunnel mode, the outer IP header's ECN
649	   field is not included).  Hence modification of the ECN field by a
650	   network node has no effect on IPsec's end-to-end security, because it
651	   cannot cause any IPsec integrity check to fail.  As a consequence,
652	   IPsec does not provide any defense against an adversary's
653	   modification of the ECN field (i.e., a man-in-the-middle attack), as
654	   the adversary's modification will also have no effect on IPsec's
655	   end-to-end security.  In some environments, the ability to modify the
656	   ECN field without affecting IPsec integrity checks may constitute a
657	   covert channel; if it is necessary to eliminate such a channel or
658	   reduce its bandwidth, then the outer IP header's ECN field can be
659	   zeroed at the tunnel ingress and egress nodes.

661	   The IPsec protocol currently requires that the inner header's ECN
662	   field not be changed by IPsec decapsulation processing at a tunnel
663	   egress node.  This ensures that an adversary's modifications to the
664	   ECN field cannot be used to launch theft- or denial-of-service
665	   attacks across an IPsec tunnel endpoint, as any such modifications
666	   will be discarded at the tunnel endpoint.  This document makes no
667	   change to that IPsec requirement. As a consequence of the current
668	   specification of the IPsec protocol, we suggest that experiments with
669	   ECN not be carried out for flows that will undergo IPsec tunneling at
670	   the present time.

672	   If the IPsec specifications are modified in the future to permit a
673	   tunnel egress node to modify the ECN field in an inner IP header
674	   based on the ECN field value in the outer header (e.g., copying part
675	   or all of the outer ECN field to the inner ECN field), or to permit
676	   the ECN field of the outer IP header to be zeroed during
677	   encapsulation, then experiments with ECN may be used in combination
678	   with IPsec tunneling.

680	   This discussion of ECN and IPsec tunnel considerations draws heavily
681	   on related discussions and documents from the Differentiated Services
682	   Working Group.

684	10.3.  Dropped or Corrupted Packets

686	   An additional issue concerns a packet that has the CE bit set at one
687	   router and is dropped by a subsequent router.  For the proposed use
688	   for ECN in this document (that is, for a transport protocol such as
689	   TCP for which a dropped data packet is an indication of congestion),
690	   end nodes detect dropped data packets, and the congestion response of
691	   the end nodes to a dropped data packet is at least as strong as the
692	   congestion response to a received CE packet.

694	   However, transport protocols such as TCP do not necessarily detect
695	   all packet drops, such as the drop of a "pure" ACK packet; for
696	   example, TCP does not reduce the arrival rate of subsequent ACK
697	   packets in response to an earlier dropped ACK packet.  Any proposal
698	   for extending ECN-Capability to such packets would have to address
699	   concerns raised by CE packets that were later dropped in the network.

701	   Similarly, if a CE packet is dropped later in the network due to
702	   corruption (bit errors), the end nodes should still invoke congestion
703	   control, just as TCP would today in response to a dropped data
704	   packet. This issue of corrupted CE packets would have to be
705	   considered in any proposal for the network to distinguish between
706	   packets dropped due to corruption, and packets dropped due to
707	   congestion or buffer overflow.

709	11. A summary of related work.

711	   [Floyd94] considers the advantages and drawbacks of adding ECN to the
712	   TCP/IP architecture.  As shown in the simulation-based comparisons,
713	   one advantage of ECN is to avoid unnecessary packet drops for short
714	   or delay-sensitive TCP connections.  A second advantage of ECN is in
715	   avoiding some unnecessary retransmit timeouts in TCP.  This paper
716	   discusses in detail the integration of ECN into TCP's congestion
717	   control mechanisms.  The possible disadvantages of ECN discussed in
718	   the paper are that a non-compliant TCP connection could falsely
719	   advertise itself as ECN-capable, and that a TCP ACK packet carrying
720	   an ECN-Echo message could itself be dropped in the network.  The
721	   first of these two issues is discussed in Section 9 of this document,
722	   and the second is addressed by the proposal in Section 6.1.3 for a
723	   CWR flag in the TCP header.

725	   [CKLTZ97] reports on an experimental implementation of ECN in IPv6.
726	   The experiments include an implementation of ECN in an existing
727	   implementation of RED for FreeBSD.  A number of experiments were run
728	   to demonstrate the control of the average queue size in the router,
729	   the performance of ECN for a single TCP connection at a congested
730	   router, and fairness with multiple competing TCP connections.  One
731	   conclusion of the experiments is that dropping packets from a bulk-
732	   data transfer can degrade performance much more severely than marking
733	   packets.

735	   Because the experimental implementation in [CKLTZ97] predates some of
736	   the developments in this document, the implementation does not
737	   conform to this document in all respects.  For example, in the
738	   experimental implementation the CWR flag is not used, but instead the
739	   TCP receiver sends the ECN-Echo bit on a single ACK packet.

741	   [K98] and [CKLTZ98] build on [CKLTZ97] to further analyze the
742	   benefits of ECN for TCP. The conclusions are that ECN TCP gets
743	   moderately better throughput than non-ECN TCP; that ECN TCP flows are
744	   fair towards non-ECN TCP flows; and that ECN TCP is robust with two-
745	   way traffic, congestion in both directions, and with multiple
746	   congested gateways.  Experiments with many short web transfers show
747	   that, while most of the short connections have similar transfer times
748	   with or without ECN, a small percentage of the short connections have
749	   very long transfer times for the non-ECN experiments as compared to
750	   the ECN experiments.  This increased transfer time is particularly
751	   dramatic for those short connections that have their first packet
752	   dropped in the non-ECN experiments, and that therefore have to wait
753	   six seconds for the retransmit timer to expire.

755	   The ECN Web Page [ECN] has pointers to other implementations of ECN
756	   in progress.

758	12. Conclusions

760	   Given the current effort to implement RED, we believe this is the
761	   right time for router vendors to examine how to implement congestion
762	   avoidance mechanisms that do not depend on packet drops alone.  With
763	   the increased deployment of applications and transports sensitive to
764	   the delay and loss of a single packet (e.g., realtime traffic, short
765	   web transfers), depending on packet loss as a normal congestion
766	   notification mechanism appears to be insufficient (or at the very
767	   least, non-optimal).

769	13. Acknowledgements

771	   Many people have made contributions to this internet-draft.  In
772	   particular, we would like to thank Kenjiro Cho for the proposal for
773	   the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the
774	   proposal of the CWR bit, Steve Blake for material on IPv4 Header
775	   Checksum Recalculation, Jamal Hadi Salim for discussions of ECN
776	   issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul
777	   Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for
778	   discussions of security issues.  We also thank the Internet End-to-
779	   End Research Group for ongoing discussions of these issues.

781	14. References

783	   [RFC-AH?] S. Kent and R. Atkinson, "IP Authentication Header",
784	   Internet Draft <draft-ietf-ipsec-auth-header-07.txt>, July 1998.

786	   [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
787	   Levels", BCP 14, RFC 2119, March 1997.

789	   [CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
790	   "Implementing Explicit Congestion Notification (ECN) in TCP over
791	   IPv6", UCLA Technical Report, December 1997, URL
792	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz".

794	   [CKLTZ98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
795	   "Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the
796	   L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn-
797	   ietf.ps".

799	   [RFC-DIFFSERV?] Kathleen Nichols, Steven Blake, Fred Baker, and David
800	   L.  Black, "Definition of the Differentiated Services Field (DS
801	   Field) in the IPv4 and IPv6 Headers", Internet draft draft-ietf-
802	   diffserv-header-04.txt in last call, October 1998.

804	   [ECN] "The ECN Web Page", URL "http://www-
805	   nrg.ee.lbl.gov/floyd/ecn.html".

807	   [RFC-ESP?] S. Kent and R. Atkinson, "IP Encapsulating Security
808	   Payload", Internet Draft <draft-ietf-ipsec-esp-v2-06.txt>, July 1998.

810	   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
811	   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
812	   N.4, August 1993, p. 397-413.  URL
813	   "ftp://ftp.ee.lbl.gov/papers/early.pdf".

815	   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
816	   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
817	   URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".

819	   [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support
820	   End-to-End Congestion Control", Technical report, February 1997.  URL
821	   "ftp://ftp.ee.lbl.gov/papers/collapse.ps".

823	   [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
824	   URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
825	   ecn.

827	   [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
828	   benefits for TCP", Master's thesis, UCLA, 1998, URL
829	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

831	   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
832	   SIGCOMM '97, September 1997.  URL
833	   "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".

835	   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
836	   ACM SIGCOMM '88, pp. 314-329.  URL
837	   "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".

839	   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
840	   Algorithm", Message to end2end-interest mailing list, April 1990.
841	   URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

843	   [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
844	   Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

846	   [RFC791] J. Postel, Internet Protocol, RFC 791, September 1981.

848	   [RFC793] J. Postel, Transmission Control Protocol, RFC 793, September
849	   1981.

851	   [RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the
852	   Internet Checksum", RFC 1141, January 1990.

854	   [RFC1349] P. Almquist, "Type of Service in the Internet Protocol
855	   Suite", RFC 1349, July 1992.

857	   [RFC1455] D. Eastlake, "Physical Link Security Type of Service", RFC
858	   1455, May 1993.

860	   [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
861	   Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

863	   [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D.
864	   Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L.
865	   Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang,
866	   "Recommendations on Queue Management and Congestion Avoidance in the
867	   Internet", RFC 2309, April 1998.

869	   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
870	   Congestion Avoidance in Computer Networks", ACM Transactions on
871	   Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

873	15. Security Considerations

875	   Security considerations have been discussed in Sections 9 and 10.

877	16. IPv4 Header Checksum Recalculation

879	   IPv4 header checksum recalculation is an issue with some high-end
880	   router architectures using an output-buffered switch, since most if
881	   not all of the header manipulation is performed on the input side of
882	   the switch, while the ECN decision would need to be made local to the
883	   output buffer. This is not an issue for IPv6, since there is no IPv6
884	   header checksum. The IPv4 TOS octet is the last byte of a 16-bit
885	   half-word.

887	   RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
888	   checksum after the TTL field is decremented.  The incremental
889	   updating of the IPv4 checksum after the CE bit was set would work as
890	   follows: Let HC be the original header checksum, and let HC' be the
891	   new header checksum after the CE bit has been set.  Then for header
892	   checksums calculated with one's complement subtraction, HC' would be
893	   recalculated as follows:
894	      HC' = { HC - 1     HC > 1
895	            { 0x0000     HC = 1
896	   For header checksums calculated on two's complement machines, HC'
897	   would be recalculated as follows after the CE bit was set:
898	       HC' = { HC - 1     HC > 0
899	             { 0xFFFE     HC = 0
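
   As a sanity check on the two's complement form above, the following
   Python sketch compares the incremental update with a checksum
   recomputed from scratch.  The helper names, the sample header bytes,
   and the assumption that CE is the least-significant bit of the TOS
   octet are all illustrative.

```python
CE_MASK = 0x01  # assumption: CE is the least-significant bit of the TOS octet


def full_checksum(header: bytes) -> int:
    """Compute the IPv4 header checksum from scratch.

    The checksum field itself (bytes 10-11) is skipped, which is
    equivalent to treating it as zero.
    """
    total = 0
    for i in range(0, len(header), 2):
        if i != 10:
            total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back in (one's complement addition)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_after_ce(hc: int) -> int:
    """Incremental update after the CE bit changes from 0 to 1."""
    return 0xFFFE if hc == 0 else hc - 1


# Hypothetical 20-byte header with ECT set and CE clear (TOS = 0x02).
header = bytearray.fromhex("45020054 1c464000 40060000 c0a80001 c0a800c7")

hc_before = full_checksum(header)
header[1] |= CE_MASK                      # the router sets the CE bit
hc_recomputed = full_checksum(header)
hc_incremental = checksum_after_ce(hc_before)
```

   The two results agree because setting the CE bit adds exactly one to
   the 16-bit word containing the TOS octet, so the one's complement
   sum rises by one and its complement, the checksum, falls by one.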

901	17. The motivation for the ECT bit.

903	   The need for the ECT bit is motivated by the fact that ECN will be
904	   deployed incrementally in an Internet where some transport protocols
905	   and routers understand ECN and some do not. With the ECT bit, the
906	   router can drop packets from flows that are not ECN-capable, but can
907	   *instead* set the CE bit in flows that *are* ECN-capable. Because the
908	   ECT bit allows an end node to have the CE bit set in a packet
909	   *instead* of having the packet dropped, an end node might have some
910	   incentive to deploy ECN.

912	   If there were no ECT indication, then the router would have to set the
913	   CE bit for packets from both ECN-capable and non-ECN-capable flows.
914	   In this case, there would be no incentive for end-nodes to deploy
915	   ECN, and no viable path of incremental deployment from a non-ECN
916	   world to an ECN-capable world.  Consider the first stages of such an
917	   incremental deployment, where a subset of the flows are ECN-capable.
918	   At the onset of congestion, when the packet dropping/marking rate
919	   would be low, routers would only set CE bits, rather than dropping
920	   packets.  However, only those flows that are ECN-capable would
921	   understand and respond to CE packets. The result is that the ECN-
922	   capable flows would back off, and the non-ECN-capable flows would be
923	   unaware of the ECN signals and would continue to open their
924	   congestion windows.

926	   In this case, there are two possible outcomes: (1) the ECN-capable
927	   flows back off, the non-ECN-capable flows get all of the bandwidth,
928	   and congestion remains mild, or (2) the ECN-capable flows back off,
929	   the non-ECN-capable flows don't, and congestion increases until the
930	   router transitions from setting the CE bit to dropping packets.
931	   While this second outcome evens out the fairness, the ECN-capable
932	   flows would still receive little benefit from being ECN-capable,
933	   because the increased congestion would drive the router to packet-
934	   dropping behavior.

936	   A flow that advertises itself as ECN-Capable but does not respond to
937	   CE bits is functionally equivalent to a flow that turns off
938	   congestion control, as discussed in Sections 9 and 10.

940	   Thus, in a world where a subset of the flows are ECN-capable, but
941	   where ECN-capable flows have no mechanism for indicating that fact to
942	   the routers, there would be less effective and less fair congestion
943	   control in the Internet, resulting in a strong incentive for end
944	   nodes not to deploy ECN.

946	18. Why use two bits in the IP header?

948	   Given the need for an ECT indication in the IP header, there still
949	   remains the question of whether the ECT (ECN-Capable Transport) and
950	   CE (Congestion Experienced) indications should be overloaded on a
951	   single bit.  This overloaded-one-bit alternative, explored in
952	   [Floyd94], would involve a single bit with two values.  One value,
953	   "ECT and not CE", would represent an ECN-Capable Transport, and the
954	   other value, "CE or not ECT", would represent either Congestion
955	   Experienced or a non-ECN-Capable transport.

957	   One difference between the one-bit and two-bit implementations
958	   concerns packets that traverse multiple congested routers.  Consider
959	   a CE packet that arrives at a second congested router, and is
960	   selected by the active queue management at that router for either
961	   marking or dropping.  In the one-bit implementation, the second
962	   congested router has no choice but to drop the CE packet, because it
963	   cannot distinguish between a CE packet and a non-ECT packet.  In the
964	   two-bit implementation, the second congested router has the choice of
965	   either dropping the CE packet, or of leaving it alone with the CE bit
966	   set.
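
   The difference at a second congested router can be illustrated with
   a small Python sketch.  The function names and the option strings
   are hypothetical.  Each function returns the choices open to a
   router whose queue management has selected the packet for marking or
   dropping.

```python
from typing import Set


def one_bit_options(ect_and_not_ce: bool) -> Set[str]:
    """Choices when ECT and CE are overloaded on a single bit."""
    if ect_and_not_ce:
        return {"mark"}
    # "CE or not ECT": the router cannot tell a CE packet from a
    # non-ECT packet, so it has to drop.
    return {"drop"}


def two_bit_options(ect: bool, ce: bool) -> Set[str]:
    """Choices when ECT and CE are carried in separate bits."""
    if not ect:
        return {"drop"}
    if ce:
        # Already marked upstream: drop it, or leave it alone.
        return {"drop", "leave CE set"}
    # Normal case for an ECN-capable packet: set CE rather than drop.
    return {"mark"}


ce_packet_one_bit = one_bit_options(ect_and_not_ce=False)
ce_packet_two_bit = two_bit_options(ect=True, ce=True)
```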

968	   Another difference between the one-bit and two-bit implementations
969	   comes from the fact that with the one-bit implementation, receivers
970	   in a single flow cannot distinguish between CE and non-ECT packets.
971	   Thus, in the one-bit implementation an ECN-capable data sender would
972	   have to unambiguously indicate to the receiver or receivers whether
973	   each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
974	   possibility would be for the sender to indicate in the transport
975	   header whether the packet was sent as ECN-Capable.  A second
976	   possibility that would involve a functional limitation for the one-
977	   bit implementation would be for the sender to unambiguously indicate
978	   that it was going to send *all* of its packets as ECN-Capable or as
979	   non-ECN-Capable.  For a multicast transport protocol, this
980	   unambiguous indication would have to be apparent to receivers joining
981	   an on-going multicast session.
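
   The receiver's ambiguity in the one-bit case can be made concrete
   with a short Python sketch.  The function and parameter names are
   hypothetical; "sent_as_ect" stands for whatever out-of-band
   knowledge the receiver has (for example, a transport-header
   indication or a per-flow declaration), and None means that the
   receiver has no such knowledge.

```python
from typing import Optional


def receiver_saw_congestion(bit_is_ect_and_not_ce: bool,
                            sent_as_ect: Optional[bool]) -> Optional[bool]:
    """Interpret the overloaded bit at the receiver.

    Returns True if the packet carries a congestion indication, False
    if it does not, and None if the receiver cannot tell.
    """
    if bit_is_ect_and_not_ce:
        return False          # "ECT and not CE": no router marked it
    # The bit reads "CE or not ECT"; the meaning depends on how the
    # packet was originally sent.
    if sent_as_ect is None:
        return None           # ambiguous without extra information
    return sent_as_ect        # sent as ECT, so a router must have marked it
```

   With two separate bits the second argument would be unnecessary,
   because the ECT bit itself tells the receiver how the packet was
   sent.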

983	   Another advantage of the two-bit approach is that it is somewhat more
984	   robust.  The most critical issue, discussed in Section 8, is that the
985	   default indication should be that of a non-ECN-Capable transport.  In
986	   a two-bit implementation, this requirement for the default value
987	   simply means that the ECT bit should be `OFF' by default.  In the
988	   one-bit implementation, this means that the single overloaded bit
989	   should by default be in the "CE or not ECT" position.  This is less
990	   clear and straightforward, and possibly more open to incorrect
991	   implementations either in the end nodes or in the routers.

993	   In summary, while the one-bit implementation would be feasible, it
994	   has the following significant limitations relative to the two-bit
995	   implementation.  First, the one-bit implementation has
996	   more limited functionality for the treatment of CE packets at a
997	   second congested router.  Second, the one-bit implementation requires
998	   either that extra information be carried in the transport header of
999	   packets from ECN-Capable flows (to convey the functionality of the
1000	   second bit elsewhere, namely in the transport header), or that
1001	   senders in ECN-Capable flows accept the limitation that receivers
1002	   must be able to determine a priori which packets are ECN-Capable and
1003	   which are not ECN-Capable.  Third, the one-bit implementation is
1004	   possibly more open to errors from faulty implementations that choose
1005	   the wrong default value for the ECN bit.  We believe that the use of
1006	   the extra bit in the IP header for the ECT-bit is extremely valuable
1007	   to overcome these limitations.

1009	19.  Historical definitions for the IPv4 TOS octet

1011	   RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
1012	   header.  In RFC 791, bits 6 and 7 of the ToS octet are listed as
1013	   "Reserved for Future Use", and are shown set to zero.  The first two
1014	   fields of the ToS octet were defined as the Precedence and Type of
1015	   Service (TOS) fields.

1017	            0     1     2     3     4     5     6     7
1018	         +-----+-----+-----+-----+-----+-----+-----+-----+
1019	         |   PRECEDENCE    |       TOS       |  0  |  0  |    RFC 791
1020	         +-----+-----+-----+-----+-----+-----+-----+-----+

1022	   RFC 1122 included bits 6 and 7 in the TOS field, though it did not
1023	   discuss any specific use for those two bits:

1025	            0     1     2     3     4     5     6     7
1026	         +-----+-----+-----+-----+-----+-----+-----+-----+
1027	         |   PRECEDENCE    |       TOS                   |    RFC 1122
1028	         +-----+-----+-----+-----+-----+-----+-----+-----+

1030	   The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

1032	            0     1     2     3     4     5     6     7
1033	         +-----+-----+-----+-----+-----+-----+-----+-----+
1034	         |   PRECEDENCE    |       TOS             | MBZ |    RFC 1349
1035	         +-----+-----+-----+-----+-----+-----+-----+-----+

1037	   Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
1038	   Cost".  In addition to the Precedence and Type of Service (TOS)
1039	   fields, the last field, MBZ (for "must be zero") was defined as
1040	   currently unused.  RFC 1349 stated that "The originator of a datagram
1041	   sets [the MBZ] field to zero (unless participating in an Internet
1042	   protocol experiment which makes use of that bit)."

1044	   RFC 1455 [RFC1455] defined an experimental standard that used all
1045	   four bits in the TOS field to request a guaranteed level of link
1046	   security.

1048	   RFC 1349 is obsoleted by "Definition of the Differentiated Services
1049	   Field (DS Field) in the IPv4 and IPv6 Headers" [RFC-DIFFSERV?], in
1050	   which bits 6 and 7 of the DS field are listed as Currently Unused
1051	   (CU).  The first six bits of the DS field are defined as the
1052	   Differentiated Services CodePoint (DSCP):

1054	            0     1     2     3     4     5     6     7
1055	         +-----+-----+-----+-----+-----+-----+-----+-----+
1056	         |               DSCP                |    CU     |
1057	         +-----+-----+-----+-----+-----+-----+-----+-----+
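
   The layouts shown in this section can be compared by decoding the
   same octet under each definition.  The Python sketch below is
   illustrative only: the function names are hypothetical, and it
   assumes that bit 0 in the diagrams is the most significant bit of
   the octet.

```python
def _check(octet: int) -> None:
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")


def decode_rfc791(octet: int) -> dict:
    """Bits 0-2 Precedence, bits 3-5 TOS, bits 6-7 reserved."""
    _check(octet)
    return {"precedence": octet >> 5,
            "tos": (octet >> 2) & 0x07,
            "reserved": octet & 0x03}


def decode_rfc1349(octet: int) -> dict:
    """Bits 0-2 Precedence, bits 3-6 TOS, bit 7 MBZ."""
    _check(octet)
    return {"precedence": octet >> 5,
            "tos": (octet >> 1) & 0x0F,
            "mbz": octet & 0x01}


def decode_ds_field(octet: int) -> dict:
    """Bits 0-5 DSCP, bits 6-7 Currently Unused (CU)."""
    _check(octet)
    return {"dscp": octet >> 2,
            "cu": octet & 0x03}


if __name__ == "__main__":
    sample = 0b10111010
    print(decode_rfc791(sample))
    print(decode_rfc1349(sample))
    print(decode_ds_field(sample))
```

   The two low-order bits are the ones whose meaning differs across
   the three definitions.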

1059	   Because of this unstable history, the definition of the ECN field in
1060	   this document cannot be guaranteed to be backwards compatible with
1061	   all past uses of these two bits.  The damage that could be done by a
1062	   non-ECN-capable router would be to "erase" the CE bit for an ECN-
1063	   capable packet that arrived at the router with the CE bit set, or to
1064	   set the CE bit even in the absence of congestion.  This has been
1065	   discussed in Section 10 on "Non-compliance in the Network".

1067	   The damage that could be done in an ECN-capable environment by a
1068	   non-ECN-capable end-node transmitting packets with the ECT bit set
1069	   has been discussed in Section 9 on "Non-compliance by the End Nodes".

1071	AUTHORS' ADDRESSES

1073	   K. K. Ramakrishnan
1074	   AT&T Labs. Research
1075	   Phone: +1 (973) 360-8766
1076	   Email: kkrama@research.att.com
1077	   URL: http://www.research.att.com/info/kkrama

1079	   Sally Floyd
1080	   Lawrence Berkeley National Laboratory
1081	   Phone: +1 (510) 486-7518
1082	   Email: floyd@ee.lbl.gov
1083	   URL: http://www-nrg.ee.lbl.gov/floyd/

1085	   This draft was created in October 1998.
1086	   It expires April 1999.