idnits 2.17.1 

draft-kksjf-ecn-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 20
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 21 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 169: '...ms followed at the end-systems MUST be...'
     RFC 2119 keyword, line 197: '...cket, the router MAY instead set the C...'
     RFC 2119 keyword, line 566: '...   header MUST be a 0.  If the ECT bit...'
     RFC 2119 keyword, line 567: '...t in the outside header SHOULD be a 1....'
     RFC 2119 keyword, line 571: '...e outside header MUST be ORed with the...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 1998) is 9348 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2001' is mentioned on line 266, but not defined

  ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581)

  == Unused Reference: 'Floyd97' is defined on line 756, but no explicit
     reference was found in the text

  == Unused Reference: 'FRED' is defined on line 768, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AH'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLT98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ESP'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'K98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90'

  ** Downref: Normative reference to an Informational RFC: RFC 1141

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96'

  ** Obsolete normative reference: RFC 2001 (Obsoleted by RFC 2581)

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90'


     Summary: 14 errors (**), 0 flaws (~~), 6 warnings (==), 17 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                       K. K. Ramakrishnan
2	INTERNET DRAFT                                        AT&T Labs Research
3	draft-kksjf-ecn-02.txt                                       Sally Floyd
4	                                                                    LBNL
5	                                                          September 1998
6	                                                    Expires:  March 1999

8	     A Proposal to add Explicit Congestion Notification (ECN) to IP

10	                          Status of this Memo

12	   This document is an Internet-Draft.  Internet-Drafts are working
13	   documents of the Internet Engineering Task Force (IETF), its areas,
14	   and its working groups.  Note that other groups may also distribute
15	   working documents as Internet-Drafts.

17	   Internet-Drafts are draft documents valid for a maximum of six months
18	   and may be updated, replaced, or obsoleted by other documents at any
19	   time.  It is inappropriate to use Internet- Drafts as reference
20	   material or to cite them other than as "work in progress."

22	   To view the entire list of current Internet-Drafts, please check the
23	   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
24	   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
25	   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
26	   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

28	Abstract

30	   This note describes a proposed addition of ECN (Explicit Congestion
31	   Notification) to IP.  TCP is currently the dominant transport
32	   protocol used in the Internet. We begin by describing TCP's use of
33	   packet drops as an indication of congestion.  Next we argue that with
34	   the addition of active queue management (e.g., RED) to the Internet
35	   infrastructure, where routers detect congestion before the queue
36	   overflows, routers are no longer limited to packet drops as an
37	   indication of congestion.  Routers could instead set a Congestion
38	   Experienced (CE) bit in the packet header of packets from ECN-capable
39	   transport protocols.  We describe when the CE bit would be set in the
40	   routers, and describe what modifications would be needed to TCP to
41	   make it ECN-capable.  Modifications to other transport protocols
42	   (e.g., unreliable unicast or multicast, reliable multicast, other
43	   reliable unicast transport protocols) could be considered as those
44	   protocols are developed and advance through the standards process.

46	1. Introduction

48	   TCP's congestion control and avoidance algorithms are based on the
49	   notion that the network is a black-box [Jacobson88, Jacobson90].  The
50	   network's state of congestion or otherwise is determined by end-
51	   systems probing for the network state, by gradually increasing the
52	   load on the network (by increasing the window of packets that are
53	   outstanding in the network) until the network becomes congested and a
54	   packet is lost.  Treating the network as a "black-box" and treating
55	   loss as an indication of congestion in the network is appropriate for
56	   pure best-effort data carried by TCP which has little or no
57	   sensitivity to delay or loss of individual packets.  In addition,
58	   TCP's congestion management algorithms have techniques built-in (such
59	   as Fast Retransmit and Fast Recovery) to minimize the impact of
60	   losses from a throughput perspective.

62	   However, these mechanisms are not intended to help applications that
63	   are in fact sensitive to the delay or loss of one or more individual
64	   packets.  Interactive traffic such as telnet, web-browsing, and
65	   transfer of audio and video data can be sensitive to packet losses
66	   (using an unreliable data delivery transport such as UDP) or to the
67	   increased latency of the packet caused by the need to retransmit the
68	   packet after a loss (for reliable data delivery such as TCP).

70	   Since TCP determines the appropriate congestion window to use by
71	   gradually increasing the window size until it experiences a dropped
72	   packet, this causes the queues at the bottleneck router to build up.
73	   With most packet drop policies at the router that are not sensitive
74	   to the load placed by each individual flow, this means that some of
75	   the packets of latency-sensitive flows are going to be dropped.
76	   Active queue management mechanisms detect congestion before the queue
77	   overflows, and provide an indication of this congestion to the end
78	   nodes.  The advantages of active queue management are discussed in
79	   RFC 2309 [RFC2309].  Active queue management avoids some of the bad
80	   properties of dropping on queue overflow, including the undesirable
81	   synchronization of loss across multiple flows.  More importantly,
82	   active queue management means that transport protocols with
83	   congestion control (e.g., TCP) do not have to rely on buffer overflow
84	   as the only indication of congestion.  This can reduce unnecessary
85	   queueing delay for all traffic sharing that queue.

87	   Active queue management mechanisms may use one of several methods for
88	   indicating congestion to end-nodes. One is to use packet drops, as is
89	   currently done.  However, active queue management allows the router
90	   to separate policies of queueing or dropping packets from the
91	   policies for indicating congestion.  Thus, active queue management
92	   allows routers to use the Congestion Experienced (CE) bit in a packet
93	   header as an indication of congestion, instead of relying solely on
94	   packet drops.

96	2. Assumptions and General Principles

98	   In this section, we describe some of the important design principles
99	   and assumptions that guided the design choices in this proposal.

101	   (1) Congestion may persist over different time-scales. The time
102	   scales that we are concerned with are congestion events that may last
103	   longer than a round-trip time.
104	   (2) The number of packets in an individual flow (e.g., TCP connection
105	   or an exchange using UDP) may range from a small number of packets to
106	   quite a large number. We are interested in managing the congestion
107	   caused by flows that send enough packets so that they are still
108	   active when network feedback reaches them.
109	   (3) New mechanisms for congestion control and avoidance need to co-
110	   exist and cooperate with existing mechanisms for congestion control.
111	   In particular, new mechanisms have to co-exist with TCP's current
112	   methods of adapting to congestion and with routers' current practice
113	   of dropping packets in periods of congestion.
114	   (4) Because ECN is likely to be adopted gradually, accommodating
115	   migration is essential. Some routers may still only drop packets to
116	   indicate congestion, and some end-systems may not be ECN-capable.
117	   The most viable strategy is one that accommodates incremental
118	   deployment without having to resort to "islands" of ECN-capable and
119	   non-ECN-capable environments.
120	   (5) Asymmetric routing is likely to be a normal occurrence in the
121	   Internet.  The path (sequence of links and routers) followed by data
122	   packets may be different from the path followed by the acknowledgment
123	   packets in the reverse direction.
124	   (6) Many routers process the "regular" headers in IP packets more
125	   efficiently than they process the header information in IP options.
126	   This suggests keeping congestion experienced information in the
127	   regular headers of an IP packet.
128	   (7) It must be recognized that not all end-systems will cooperate in
129	   mechanisms for congestion control. However, new mechanisms shouldn't
130	   make it easier for TCP applications to disable TCP congestion
131	   control. The benefit of lying about participating in new mechanisms
132	   such as ECN-capability should be small.

134	3. Random Early Detection (RED)

136	   Random Early Detection (RED) is a mechanism for active queue
137	   management that has been proposed to detect incipient congestion
138	   [FJ93], and is currently being deployed in the Internet backbone
139	   [RFC2309].  Although RED is meant to be a general mechanism using one
140	   of several alternatives for congestion indication, in the current
141	   environment of the Internet RED is restricted to using packet drops
142	   as a mechanism for congestion indication.  RED drops packets based on
143	   the average queue length exceeding a threshold, rather than only when
144	   the queue overflows.  However, when RED drops packets before the
145	   queue actually overflows, RED is not forced by memory limitations to
146	   discard the packet.
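
   The following is a minimal sketch of the kind of average-queue test
   described above, not the algorithm as specified in [FJ93].  The
   class name, the parameter names (min_th, max_th, max_p, weight,
   capacity), and their default values are illustrative assumptions,
   and refinements of RED such as spacing out successive signals are
   omitted.

```python
import random


class RedQueue:
    """Simplified RED-style detector (illustrative; all names are assumptions)."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002,
                 capacity=30):
        self.min_th = min_th      # below this average, never signal
        self.max_th = max_th      # at or above this average, always signal
        self.max_p = max_p        # signalling probability just below max_th
        self.weight = weight      # weight of the newest sample in the average
        self.capacity = capacity  # physical queue limit, in packets
        self.avg = 0.0            # exponentially weighted average queue length

    def on_arrival(self, queue_len, rng=random.random):
        """Return 'overflow', 'signal' (incipient congestion), or 'enqueue'."""
        if queue_len >= self.capacity:
            return "overflow"  # memory limits force a discard
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return "enqueue"
        if self.avg >= self.max_th:
            return "signal"
        # Between the thresholds, signal with a probability that grows
        # linearly with the average queue length.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return "signal" if rng() < p else "enqueue"
```

   In this sketch a 'signal' result is the point at which a router
   chooses how to indicate congestion; in the current Internet that
   indication is a packet drop.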

148	   RED could set a Congestion Experienced (CE) bit in the packet header
149	   instead of dropping the packet, if such a bit was provided in the IP
150	   header and understood by the transport protocol.  The use of the CE
151	   bit would allow the receiver(s) to receive the packet, avoiding the
152	   potential for excessive delays due to retransmissions after packet
153	   losses.  We use the term 'CE packet' to denote a packet that has the
154	   CE bit set.

156	4. Explicit Congestion Notification in IP

158	   We propose that the Internet provide a congestion indication for
159	   incipient congestion (as in RED and earlier work [RJ90]) where the
160	   notification can sometimes be through marking packets rather than
161	   dropping them.  This would require an ECN field in the IP header with
162	   two bits.  The ECN-Capable Transport (ECT) bit would be set by the
163	   data sender to indicate that the end-points of the transport protocol
164	   are ECN-capable.  The CE bit would be set by the router to indicate
165	   congestion to the end nodes.  Routers that have a packet arriving at
166	   a full queue would drop the packet, just as they do now.

168	   Upon the receipt by an ECN-Capable transport of a single CE packet,
169	   the congestion control algorithms followed at the end-systems MUST be
170	   essentially the same as the congestion control response to a *single*
171	   dropped packet.  For example, for TCP the source TCP halves its
172	   congestion window "cwnd" in response to an ECN indication received by
173	   the data receiver.

175	   One reason for requiring that the congestion-control response to the
176	   CE packet be essentially the same as the response to a dropped packet
177	   is to accommodate the incremental deployment of ECN in both end-
178	   systems and in routers.  Some routers may drop ECN-Capable packets
179	   (e.g., using the same RED policies for congestion detection) while
180	   other routers set the CE bit, for equivalent levels of congestion.
181	   Similarly, a router might drop a non-ECN-Capable packet but set the
182	   CE bit in an ECN-Capable packet, for equivalent levels of congestion.
183	   Different congestion control responses to a CE bit indication and to
184	   a packet drop could result in unfair treatment for different flows.

186	   An additional requirement is that the end-systems should react to
187	   congestion at most once per window of data (i.e., at most once per
188	   roundtrip time), to avoid reacting multiple times to multiple
189	   indications of congestion within a roundtrip time.

191	   For a router, the CE bit of an ECN-Capable packet should only be set
192	   if the router would otherwise have dropped the packet as an
193	   indication of congestion to the end nodes.  When the router's buffer
194	   is not yet full and the router is prepared to drop a packet to inform
195	   end nodes of incipient congestion, the router should first check to
196	   see if the ECT bit is set in that packet's IP header.  If so, then
197	   instead of dropping the packet, the router MAY instead set the CE bit
198	   in the IP header.
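
   The router behavior in the preceding paragraphs can be sketched as
   follows.  This is an illustration under stated assumptions rather
   than a specification: the IpPacket type, the function name, and the
   marking_enabled switch (standing in for the router's choice implied
   by "MAY") are hypothetical, and the sketch simply drops the arriving
   packet when the queue is full.

```python
from dataclasses import dataclass


@dataclass
class IpPacket:
    ect: bool = False  # ECN-Capable Transport bit, set by the data sender
    ce: bool = False   # Congestion Experienced bit, set by routers


def router_handle(pkt, queue_full, incipient_congestion, marking_enabled=True):
    """Return 'drop' or 'forward'; may set pkt.ce as a side effect."""
    if queue_full:
        # Severe congestion: no choice but to drop, as routers do now.
        return "drop"
    if incipient_congestion:
        # The router would otherwise drop this packet as a congestion signal.
        if pkt.ect and marking_enabled:
            pkt.ce = True  # mark instead of dropping
            return "forward"
        return "drop"
    # No congestion signal needed; an existing CE bit is left unchanged.
    return "forward"
```

   Note that the CE bit is only ever set, never cleared, and only on
   packets whose ECT bit is set.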

200	   An environment where all end nodes were ECN-Capable could allow new
201	   criteria to be developed for setting the CE bit, and new congestion
202	   control mechanisms for end-node reaction to CE packets.  However,
203	   this is a research issue, and as such is not addressed in this
204	   document.

206	   When a CE packet is received by a router, the CE bit is left
207	   unchanged, and the packet transmitted as usual.  When severe
208	   congestion has occurred and the router's queue is full, then the
209	   router has no choice but to drop some packet when a new packet
210	   arrives.  We anticipate that such packet losses will become
211	   relatively infrequent when a majority of end-systems become ECN-
212	   Capable and participate in TCP or other compatible congestion control
213	   mechanisms.  In an adequately-provisioned network in such an ECN-
214	   Capable environment, packet losses should occur primarily during
215	   transients or in the presence of non-cooperating sources.

217	   We expect that routers will set the CE bit in response to incipient
218	   congestion as indicated by the average queue size, using the RED
219	   algorithms suggested in [FJ93, RFC2309].  To the best of our
220	   knowledge, this is the only proposal currently under discussion in
221	   the IETF for routers to drop packets proactively, before the buffer
222	   overflows.  However, this document does not attempt to specify a
223	   particular mechanism for active queue management, leaving that
224	   endeavor, if needed, to other areas of the IETF.  While ECN is
225	   inextricably tied up with active queue management at the router, the
226	   reverse does not hold; active queue management mechanisms have been
227	   developed and deployed independently from ECN, using packet drops as
228	   indications of congestion in the absence of ECN in the IP
229	   architecture.

231	5. Support from the Transport Protocol

233	   ECN requires support from the transport protocol, in addition to the
234	   functionality given by the ECN field in the IP packet header.  The
235	   transport protocol might require negotiation between the endpoints
236	   during setup to determine that all of the endpoints are ECN-capable,
237	   so that the sender can set the ECT bit in transmitted packets.
238	   Second, the transport protocol must be capable of reacting
239	   appropriately to the receipt of CE packets.  This reaction could be
240	   in the form of the data receiver informing the data sender of the
241	   received CE packet (e.g., TCP), of the data receiver unsubscribing to
242	   a layered multicast group (e.g., RLM [MJV96]), or of some other
243	   action that ultimately reduces the arrival rate of that flow to that
244	   receiver.

246	   This document only addresses the addition of ECN Capability to TCP,
247	   leaving issues of ECN and other transport protocols to further
248	   research.  For TCP, ECN requires three new mechanisms:  negotiation
249	   between the endpoints during setup to determine if they are both ECN-
250	   capable; an ECN-Echo flag in the TCP header so that the data receiver
251	   can inform the data sender when a CE packet has been received; and a
252	   Congestion Window Reduced (CWR) flag in the TCP header so that the
253	   data sender can inform the data receiver that the congestion window
254	   has been reduced.  The support required from other transport
255	   protocols is likely to be different, particularly for unreliable or
256	   reliable multicast transport protocols, and will have to be
257	   determined as other transport protocols are brought to the IETF for
258	   standardization.

260	5.1. TCP

262	   The following sections describe in detail the proposed use of ECN in
263	   TCP.  This proposal is described in essentially the same form in
264	   [Floyd94].  We assume that the source TCP uses the standard
265	   congestion control algorithms of Slow-start, Fast Retransmit and Fast
266	   Recovery [RFC2001].

268	   This proposal specifies two new flags in the Reserved field of the
269	   TCP header.  The TCP mechanism for negotiating ECN-Capability uses
270	   the ECN-Echo flag in the TCP header.  (This was called the ECN Notify
271	   flag in some earlier documents.)  Bit 9 in the Reserved field of the
272	   TCP header is designated as the ECN-Echo flag.

274	   To enable the TCP receiver to determine when to stop setting the ECN-
275	   Echo flag, we introduce a second new flag in the TCP header, the
276	   Congestion Window Reduced (CWR) flag.  The CWR flag is assigned to
277	   Bit 8 in the Reserved field of the TCP header.

279	   The use of these flags is described in the sections below.

281	5.1.1.  TCP Initialization

283	   In the TCP connection setup phase, the source and destination TCPs
284	   exchange information about their desire and/or capability to use ECN.
285	   Subsequent to the completion of this negotiation, the TCP sender sets
286	   the ECT bit in the IP header of packets to indicate to the network
287	   that the transport is capable and willing to participate in ECN for
288	   this packet.  This will indicate to the routers that they may mark
289	   this packet with the CE bit, if they would like to use that as a
290	   method of congestion notification. If the TCP connection does not
291	   wish to use ECN notification for a particular packet, the sending TCP
292	   sets the ECT bit equal to 0 (i.e., not set), and the TCP receiver
293	   ignores the CE bit in the received packet.

295	   When a node sends a TCP SYN packet, it may set the ECN-Echo and CWR
296	   flags in the TCP header.  For a SYN packet, the setting of both the
297	   ECN-Echo and CWR flags is defined as an indication that the sending
298	   TCP is ECN-Capable, rather than as an indication of congestion or of
299	   response to congestion.  More precisely, a SYN packet with both the
300	   ECN-Echo and CWR flags set indicates that the TCP implementation
301	   transmitting the SYN packet will respond to incoming data packets
302	   that have the CE bit set in the IP header by setting the ECN-Echo
303	   flag in outgoing TCP Acknowledgement (ACK) packets.

305	   When a node sends a SYN-ACK packet, it may set the ECN-Echo flag, but
306	   it does not set the CWR flag.  For a SYN-ACK packet, the pattern of
307	   the ECN-Echo flag set and the CWR flag not set in the TCP header is
308	   defined as an indication that the TCP transmitting the SYN-ACK packet
309	   is ECN-Capable.

311	   Why does the TCP sending the SYN set two ECN-related flags in the
312	   Reserved field of the TCP header for the SYN packet, while the
313	   responding TCP sending the SYN-ACK sets only one ECN-related flag
314	   in the SYN-ACK packet?  This asymmetry is
315	   necessary for the robust negotiation of ECN-capability with deployed
316	   TCP implementations.  There exists at least one TCP implementation in
317	   which TCP receivers set the Reserved field of the TCP header in ACK
318	   packets (and hence the SYN-ACK) simply to reflect the Reserved field
319	   of the TCP header in the received data packet.  Because the TCP SYN
320	   packet sets the ECN-Echo and CWR flags to indicate ECN-capability,
321	   while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP
322	   correctly interprets a receiver's reflection of its own flags in the
323	   Reserved field as an indication that the receiver is not ECN-capable.
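
   The negotiation can be sketched as below.  The function names are
   hypothetical, flags are modeled as (ecn_echo, cwr) pairs of booleans
   rather than header bits, and the sketch assumes that a responder
   advertises ECN-Capability only when the SYN advertised it.

```python
def syn_flags(want_ecn):
    """(ecn_echo, cwr) flags an initiator puts on its SYN."""
    return (True, True) if want_ecn else (False, False)


def syn_ack_flags(syn_ecn_echo, syn_cwr, responder_ecn_capable):
    """(ecn_echo, cwr) flags an ECN-aware responder puts on its SYN-ACK."""
    if responder_ecn_capable and syn_ecn_echo and syn_cwr:
        return (True, False)  # ECN-Echo set, CWR not set
    return (False, False)


def initiator_accepts(synack_ecn_echo, synack_cwr):
    """True only for the ECN-Capable pattern: ECN-Echo set, CWR not set."""
    return synack_ecn_echo and not synack_cwr


def reflecting_responder(syn_ecn_echo, syn_cwr):
    """Models an implementation that copies the Reserved field back."""
    return (syn_ecn_echo, syn_cwr)
```

   With these definitions, a SYN-ACK that merely reflects the SYN's
   flags carries both flags set, so initiator_accepts() returns False
   and the connection proceeds without ECN.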

325	5.1.2.  The TCP Sender

327	   For a TCP connection using ECN, data packets are transmitted with the
328	   ECT bit set in the IP header (set to a "1").  If the sender receives
329	   an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag
330	   set in the TCP header), then the sender knows that congestion was
331	   encountered in the network on the path from the sender to the
332	   receiver.  The indication of congestion should be treated just as a
333	   congestion loss in non-ECN-Capable TCP. That is, the TCP source
334	   halves the congestion window "cwnd" and reduces the slow start
335	   threshold "ssthresh".  The sending TCP does NOT increase the
336	   congestion window in response to the receipt of an ECN-Echo ACK
337	   packet.

339	   A critical condition is that TCP does not react to congestion
340	   indications more than once every window of data (or more loosely,
341	   more than once every round-trip time).  That is, the TCP sender's
342	   congestion window should be reduced only once in response to a series
343	   of dropped and/or CE packets from a single window of data.  In
344	   addition, the TCP source should not decrease the slow-start
345	   threshold, ssthresh, if it has been decreased within the last round
346	   trip time.  However, if any retransmitted packets are dropped or have
347	   the CE bit set, then this is interpreted by the source TCP as a new
348	   instance of congestion.
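
   One way to realize the "at most once per window of data" rule is
   sketched below.  It is an assumption of this sketch, not something
   specified here, that the sender remembers the highest sequence
   number sent at the time of a reduction and ignores ECN-Echo ACKs
   until the cumulative acknowledgement passes that point.  Windows are
   counted in segments, the floor of 2 segments is likewise an
   assumption, and packet drops and retransmissions are not modeled.

```python
class EcnSender:
    """Illustrative sender-side reaction to ECN-Echo ACKs (names are assumptions)."""

    def __init__(self, cwnd=16, ssthresh=64):
        self.cwnd = cwnd          # congestion window, in segments
        self.ssthresh = ssthresh  # slow-start threshold, in segments
        self.snd_nxt = 0          # next sequence number to be sent
        self.reduce_point = -1    # snd_nxt at the time of the last reduction
        self.cwr_pending = False  # set CWR on the next data packet sent

    def on_ack(self, ack_seq, ecn_echo):
        """Process a cumulative ACK; return True if the window was reduced."""
        if not ecn_echo:
            return False
        if ack_seq <= self.reduce_point:
            # This ACK covers data from a window that already caused a
            # reduction, so do not react a second time.
            return False
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh  # halve; no slow-start, no increase
        self.reduce_point = self.snd_nxt
        self.cwr_pending = True
        return True
```

   For example, with cwnd at 16 segments and 100 segments sent, a first
   ECN-Echo ACK reduces cwnd to 8; further ECN-Echo ACKs for the same
   100 segments leave cwnd unchanged.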

350	   [Floyd94] discusses this further, and [Floyd98] includes a validation
351	   test in the ns simulator illustrating a wide range of ECN scenarios.
352	   These scenarios include the following: an ECN followed by another
353	   ECN, a Fast Retransmit, or a Retransmit Timeout; and a Retransmit
354	   Timeout or a Fast Retransmit followed by an ECN.

356	   When the TCP sender reduces its congestion window in response to an
357	   ECN-Echo ACK packet, there is no need for the sender to slow-start
358	   (as in Tahoe TCP in response to a packet drop) or to stop sending
359	   packets for a period of time to allow the queue to dissipate (as in
360	   Reno TCP for roughly half a round-trip time during Fast Recovery).
361	   The CE packet in the forward direction does not indicate the imminent
362	   possibility of buffer overflow requiring an urgent source action to
363	   reduce the load dramatically.  Incoming acknowledgements that
364	   continue to arrive can "clock out" outgoing packets as allowed by the
365	   reduced congestion window.

367	   TCP follows existing algorithms for sending data packets in response
368	   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
369	   timeouts [RFC2001].

371	5.1.3.  The TCP Receiver

373	   When TCP receives a CE data packet at the destination end-system, the
374	   TCP data receiver sets the ECN-Echo flag in the TCP header of the
375	   subsequent ACK packet.  If there is any ACK withholding implemented,
376	   as in current "delayed-ACK" TCP implementations where the TCP
377	   receiver can send an ACK for two arriving data packets, then the ECN-
378	   Echo flag in the ACK packet will be set to the OR of the CE bits of
379	   all of the data packets being acknowledged.  That is, if any of the
380	   received data packets are CE packets, then the returning ACK has the
381	   ECN-Echo flag set.

383	   To provide robustness against the possibility of a dropped ACK packet
384	   carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo
385	   flag in a series of ACK packets.  The TCP receiver uses the CWR flag
386	   to determine when to stop setting the ECN-Echo flag.

388	   When an ECN-Capable TCP reduces its congestion window for any reason
389	   (because of a retransmit timeout, a Fast Retransmit, or in response
390	   to an ECN Notification), the TCP sets the CWR flag in the TCP header
391	   of the first data packet sent after the window reduction.  If that
392	   data packet is dropped in the network, then the sending TCP will have
393	   to reduce the congestion window again and retransmit the dropped
394	   packet.  Thus, the Congestion Window Reduced message is reliably
395	   delivered to the data receiver.

397	   After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
398	   that TCP receiver continues to set the ECN-Echo flag in ACK packets
399	   until it receives a CWR packet (a packet with the CWR flag set).
400	   After the receipt of the CWR packet, acknowledgements for subsequent
401	   non-CE data packets do not have the ECN-Echo flag set.  If another CE
402	   packet is received by the data receiver, the receiver would once
403	   again send ACK packets with the ECN-Echo flag set.  While the receipt
404	   of a CWR packet does not guarantee that the data sender received the
405	   ECN-Echo message, this does guarantee that the data sender reduced
406	   its congestion window at some point *after* it sent the data packet
407	   for which the CE bit was set.
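
   The receiver behavior described in this section can be sketched as a
   small state machine.  The class and method names are hypothetical,
   and the treatment of a single packet carrying both the CWR flag and
   the CE bit (the CE mark starts a new run of ECN-Echo ACKs) is an
   assumption of this sketch.

```python
class EcnReceiver:
    """Illustrative receiver-side ECN-Echo logic (names are assumptions)."""

    def __init__(self):
        self.echoing = False     # keep setting ECN-Echo until CWR arrives
        self.ce_unacked = False  # a CE packet arrived since the last ACK

    def on_data(self, ce=False, cwr=False):
        """Record the CE bit and CWR flag of an arriving data packet."""
        if cwr:
            self.echoing = False  # sender has reduced its window
        if ce:
            self.echoing = True
            self.ce_unacked = True

    def build_ack(self):
        """Return the ECN-Echo flag for the next ACK (handles delayed ACKs)."""
        # OR of the CE bits of all packets being acknowledged, plus any
        # ECN-Echo run that has not yet been ended by a CWR packet.
        ecn_echo = self.echoing or self.ce_unacked
        self.ce_unacked = False
        return ecn_echo
```

   If one delayed ACK covers a CE packet followed by a CWR packet, the
   ACK still carries ECN-Echo, and later ACKs do not.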

409	   We have already specified that a TCP sender reduces its congestion
410	   window at most once per window of data.  This mechanism requires some
411	   care to make sure that the sender reduces its congestion window at
412	   most once per ECN indication, and that multiple ECN messages over
413	   several successive windows of data are properly reported to the ECN
414	   sender.  This is discussed further in [Floyd98].

416	5.1.4. Congestion on the ACK-path

418	   For the current generation of TCP congestion control algorithms, pure
419	   acknowledgement packets (i.e., packets that do not contain any
420	   accompanying data) should be sent with the ECT bit off.  Current TCP
421	   receivers have no mechanisms for reducing traffic on the ACK-path in
422	   response to congestion notification.  Mechanisms for responding to
423	   congestion on the ACK-path are left as an area for future
424	   research.  (One simple possibility would be for the sender to reduce
425	   its congestion window when it receives a pure ACK packet with the CE
426	   bit set.)  For current TCP implementations, a single dropped ACK
427	   generally has only a very small effect on the TCP's sending rate.

429	6. Summary of changes required in IP and TCP

431	   Two bits need to be specified in the IP header: the ECN-Capable
432	   Transport (ECT) bit and the Congestion Experienced (CE) bit.  The ECT
433	   bit set to "0" indicates that the transport protocol will ignore the
434	   CE bit.  This is the default value for the ECT bit.  The ECT bit set
435	   to "1" indicates that the transport protocol is willing and able to
436	   participate in ECN.

438	   The default value for the CE bit is "0".  The router sets the CE bit
439	   to "1" to indicate congestion to the end nodes.  The CE bit in a
440	   packet header should never be reset by a router from "1" to "0".

442	   TCP requires three changes: a negotiation phase during setup to
443	   determine if both end nodes are ECN-capable, and two new flags in the
444	   TCP header, from the "reserved" flags in the TCP flags field.  The
445	   ECN-Echo flag is used by the data receiver to inform the data sender
446	   of a received CE packet.  The Congestion Window Reduced flag is used
447	   by the data sender to inform the data receiver that the congestion
448	   window has been reduced.

450	7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN

452	   Since the ATM and Frame Relay mechanisms for congestion indication
453	   have typically been defined without any notion of average queue size
454	   as the basis for determining that an intermediate node is congested,
455	   we believe that they provide a very noisy signal. The TCP-sender
456	   reaction specified in this draft for ECN is NOT the appropriate
457	   reaction for such a noisy signal of congestion notification. It is
458	   our expectation that ATM's EFCI and Frame Relay's FECN mechanisms
459	   would be phased out over time within the ATM network.  However, if
460	   the routers that interface to the ATM network have a way of
461	   maintaining the average queue at the interface, and use it to come to
462	   a reliable determination that the ATM subnet is congested, they may
463	   use the ECN notification that is defined here.

465	   We emphasize that a *single* packet with the CE bit set in an IP
466	   packet causes the transport layer to respond, in terms of congestion
467	   control, as it would to a packet drop.  As such, the CE bit is not a
468	   good match to a transient signal such as one based on the
469	   instantaneous queue size.  However, experiments in techniques at
470	   layer 2 (e.g., in ATM switches or Frame Relay switches) should be
471	   encouraged.  For example, using a scheme such as RED (where packet
472	   marking is based on the average queue length exceeding a threshold),
473	   layer 2 devices could provide a reasonably reliable indication of
474	   congestion.  When all the layer 2 devices in a path set that layer's
475	   own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN
476	   bit in Frame Relay) in this reliable manner, then the interface
477	   router to the layer 2 network could copy the state of that layer 2
478	   Congestion Experienced bit into the CE bit in the IP header.  We
479	   recognize that this is not the current practice, nor is it in current
480	   standards. However, encouraging experimentation in this manner may
481	   provide the information needed to enable evolution of existing layer
482	   2 mechanisms to provide a more reliable means of congestion
483	   indication, when they use a single bit for indicating congestion.

485	8. Non-compliance by the End Nodes

487	   This section discusses concerns about the vulnerability of ECN to
488	   non-compliant end-nodes (i.e., end nodes that set the ECT bit in
489	   transmitted packets but do not respond to received CE packets).  We
490	   argue that the addition of ECN to the IP architecture would not
491	   significantly increase the current vulnerability of the architecture
492	   to unresponsive flows.

494	   Even for non-ECN environments, there are serious concerns about the
495	   damage that can be done by non-compliant or unresponsive flows (that
496	   is, flows that do not respond to congestion control indications by
497	   reducing their arrival rate at the congested link).  For example, an
498	   end-node could "turn off congestion control" by not reducing its
499	   congestion window in response to packet drops.  This is a concern for
500	   the current Internet.  It has been argued that routers will have to
501	   deploy mechanisms to detect and differentially treat packets from
502	   non-compliant flows.  It has also been argued that techniques such as
503	   end-to-end per-flow scheduling and isolation of one flow from
504	   another, differentiated services, or end-to-end reservations could
505	   remove some of the more damaging effects of unresponsive flows.

507	   It has been argued that dropping packets in itself may be an adequate
508	   deterrent for non-compliance, and that the use of ECN removes this
509	   deterrent.  We would argue in response that (1) ECN-capable routers
510	   preserve packet-dropping behavior in times of high congestion; and
511	   (2) even in times of high congestion, dropping packets in itself is
512	   not an adequate deterrent for non-compliance.

514	   First, ECN-Capable routers will only mark packets (as opposed to
515	   dropping them) when the packet marking rate is reasonably low.
516	   During periods where the average queue size exceeds an upper
517	   threshold, and therefore the potential packet marking rate would be
518	   high, our recommendation is that routers drop packets rather than set
519	   the CE bit in packet headers.
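
   The recommendation above can be sketched as a small decision
   function.  This is a minimal sketch, not a definitive router
   implementation: the function name, its parameters, and the string
   results are hypothetical, "selected" stands for the decision of the
   active queue management (e.g., RED) to notify this packet of
   congestion, and drops caused by buffer overflow are not modeled.

```python
def router_action(selected: bool, ect: bool,
                  avg_queue: float, upper_thresh: float) -> str:
    """Return "forward", "mark", or "drop" for one arriving packet.

    selected     -- active queue management chose this packet for a
                    congestion notification (assumed input)
    ect          -- the ECT bit is set in the IP header
    avg_queue    -- current average queue size
    upper_thresh -- the upper threshold on the average queue size
    """
    if not selected:
        return "forward"
    if ect and avg_queue <= upper_thresh:
        # Low or moderate congestion and an ECN-capable transport:
        # set the CE bit instead of dropping the packet.
        return "mark"
    # Either the transport is not ECN-capable, or the average queue
    # size exceeds the upper threshold: fall back to dropping.
    return "drop"
```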

521	   During the periods of low or moderate packet marking rates when ECN
522	   would be deployed, dropping rather than marking those packets would
523	   have little deterrent effect on unresponsive flows.
524	   For example, delay-insensitive flows using reliable delivery might
525	   have an incentive to increase rather than to decrease their sending
526	   rate in the presence of dropped packets.  Similarly, delay-sensitive
527	   flows using unreliable delivery might increase their use of FEC in
528	   response to an increased packet drop rate, increasing rather than
529	   decreasing their sending rate.  For the same reasons, we do not
530	   believe that packet dropping itself is an effective deterrent for
531	   non-compliance even in an environment of high packet drop rates.

533	   Several methods have been proposed to identify and restrict non-
534	   compliant or unresponsive flows.  The addition of ECN to the network
535	   environment would not in any way increase the difficulty of designing
536	   and deploying such mechanisms.  If anything, the addition of ECN to
537	   the architecture would make the job of identifying unresponsive flows
538	   slightly easier.  For example, in an ECN-Capable environment routers
539	   are not limited to information about packets that are dropped or have
540	   the CE bit set at that router itself; in such an environment routers
541	   could also take note of arriving CE packets that indicate congestion
542	   encountered by that packet earlier in the path.

544	9. Non-compliance in the Network

546	   The breakdown of effective congestion control could be caused not
547	   only by a non-compliant end-node, but also by the loss of the
548	   congestion indication in the network itself.  As one example, a rogue
549	   or broken router could "erase" the CE bit in arriving CE packets,
550	   thus preventing that indication of congestion from reaching
551	   downstream receivers.  This could result in the failure of congestion
552	   control for that flow and a resulting increase in congestion in the
553	   network, ultimately resulting in subsequent packets dropped for this
554	   flow as the average queue size increased at the congested gateway.
555	   Concerns regarding the loss of congestion indications from
556	   encapsulated, dropped, or corrupted packets are discussed below.

558	9.1. Encapsulated packets

560	   Some care is required to handle the CE and ECT bits appropriately
561	   when packets are encapsulated and de-encapsulated for tunnels.

563	   When a packet is encapsulated, the following rules apply regarding
564	   the ECT bit.  First, if the ECT bit in the encapsulated ('inside')
565	   header is a 0, then the ECT bit in the encapsulating ('outside')
566	   header MUST be a 0.  If the ECT bit in the inside header is a 1, then
567	   the ECT bit in the outside header SHOULD be a 1.

569	   When a packet is de-encapsulated, the following rules apply regarding
570	   the CE bit.  If the ECT bit is a 1 in both the inside and the outside
571	   header, then the CE bit in the outside header MUST be ORed with the
572	   CE bit in the inside header.  (That is, in this case a CE bit of 1 in
573	   the outside header must be copied to the inside header.)  If the ECT
574	   bit in either header is a 0, then the CE bit in the outside header is
575	   ignored.  This requirement for the treatment of de-encapsulated
576	   packets does not currently apply to IPsec tunnels.
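
   The encapsulation and de-encapsulation rules above can be sketched as
   follows for non-IPsec tunnels.  The function names and the
   "honor_should" parameter are hypothetical; the latter only models the
   fact that setting the outer ECT bit is a SHOULD rather than a MUST.

```python
def outer_ect_on_encapsulation(inner_ect: int,
                               honor_should: bool = True) -> int:
    """Return the ECT bit for the encapsulating ('outside') header."""
    if inner_ect == 0:
        # MUST: the outer header never claims ECN capability that
        # the inner header does not have.
        return 0
    # SHOULD: propagate ECN capability to the outer header.
    return 1 if honor_should else 0


def inner_ce_on_decapsulation(inner_ect: int, inner_ce: int,
                              outer_ect: int, outer_ce: int) -> int:
    """Return the CE bit of the inside header after de-encapsulation."""
    if inner_ect == 1 and outer_ect == 1:
        # MUST: OR the outer CE bit into the inner CE bit.
        return inner_ce | outer_ce
    # ECT is 0 in either header: the outer CE bit is ignored.
    return inner_ce
```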

578	   A specific example of the use of ECN with encapsulation occurs when a
579	   flow wishes to use ECN-capability to avoid the danger of an
580	   unnecessary packet drop for the encapsulated packet as a result of
581	   congestion at an intermediate node in the tunnel.  This functionality
582	   can be supported by copying the ECN codepoint in the inner IP header
583	   to the outer IP header upon encapsulation, and using the ECN
584	   codepoint in the outer IP header to set the ECN codepoint in the
585	   inner IP header upon decapsulation.  This effectively allows routers
586	   along the tunnel to cause the CE bit to be set in the ECN field of
587	   the unencapsulated IP header of an ECN-capable packet when such
588	   routers experience congestion.

590	9.2.  IPsec Tunnel Considerations

592	   The IPsec protocol, as defined in [ESP, AH], does not include the IP
593	   header's ECN field in any of its cryptographic calculations (in the
594	   case of tunnel mode, the outer IP header's ECN field is not
595	   included).  Hence modification of the ECN field by a network node has
596	   no effect on IPsec's end-to-end security, because it cannot cause any
597	   IPsec integrity check to fail.  As a consequence, IPsec does not
598	   provide any defense against an adversary's modification of the ECN
599	   field (i.e., a man-in-the-middle attack), as the adversary's
600	   modification will also have no effect on IPsec's end-to-end security.
601	   In some environments, the ability to modify the ECN field without
602	   affecting IPsec integrity checks may constitute a covert channel; if
603	   it is necessary to eliminate such a channel or reduce its bandwidth,
604	   then the outer IP header's ECN field can be zeroed at the tunnel
605	   ingress and egress nodes.

607	   The IPsec protocol currently requires that the inner header's ECN
608	   field not be changed by IPsec decapsulation processing at a tunnel
609	   egress node.  This ensures that an adversary's modifications to the
610	   ECN field cannot be used to launch theft- or denial-of-service
611	   attacks across an IPsec tunnel endpoint, as any such modifications
612	   will be discarded at the tunnel endpoint.  This document makes no
613	   change to that IPsec requirement.  As a consequence of the current
614	   specification of the IPsec protocol, we suggest that experiments with
615	   ECN not be carried out for flows that will undergo IPsec tunneling at
616	   the present time.

618	   If the IPsec specifications are modified in the future to permit a
619	   tunnel egress node to modify the ECN field in an inner IP header
620	   based on the ECN field value in the outer header (e.g., copying part
621	   or all of the outer ECN field to the inner ECN field), or to permit
622	   the ECN field of the outer IP header to be zeroed during
623	   encapsulation, then experiments with ECN may be used in combination
624	   with IPsec tunneling.

626	   This discussion of ECN and IPsec tunnel considerations draws heavily
627	   on related discussions and documents from the Differentiated Services
628	   Working Group.

630	9.3.  Dropped or Corrupted Packets

632	   An additional issue concerns a packet that has the CE bit set at one
633	   router and is dropped by a subsequent router.  For the proposed use
634	   for ECN in this document (that is, for a transport protocol such as TCP
635	   for which a dropped data packet is an indication of congestion), end
636	   nodes detect dropped data packets, and the congestion response of the
637	   end nodes to a dropped data packet is at least as strong as the
638	   congestion response to a received CE packet.

640	   However, transport protocols such as TCP do not necessarily detect
641	   all packet drops, such as the drop of a "pure" ACK packet; for
642	   example, TCP does not reduce the arrival rate of subsequent ACK
643	   packets in response to an earlier dropped ACK packet.  Any proposal
644	   for extending ECN-Capability to such packets would have to address
645	   concerns raised by CE packets that were later dropped in the network.

647	   Similarly, if a CE packet is dropped later in the network due to
648	   corruption (bit errors), the end nodes should still invoke congestion
649	   control, just as TCP would today in response to a dropped data
650	   packet.  This issue of corrupted CE packets would have to be
651	   considered in any proposal for the network to distinguish between
652	   packets dropped due to corruption, and packets dropped due to
653	   congestion or buffer overflow.

655	10. A summary of related work

657	   [Floyd94] considers the advantages and drawbacks of adding ECN to the
658	   TCP/IP architecture.  As shown in the simulation-based comparisons,
659	   one advantage of ECN is to avoid unnecessary packet drops for short
660	   or delay-sensitive TCP connections.  A second advantage of ECN is in
661	   avoiding some unnecessary retransmit timeouts in TCP.  This paper
662	   discusses in detail the integration of ECN into TCP's congestion
663	   control mechanisms.  The possible disadvantages of ECN discussed in
664	   the paper are that a non-compliant TCP connection could falsely
665	   advertise itself as ECN-capable, and that a TCP ACK packet carrying
666	   an ECN-Echo message could itself be dropped in the network.  The
667	   first of these two issues is discussed in Section 8 of this document,
668	   and the second is addressed by the proposal in Section 5.1.3 for a
669	   CWR flag in the TCP header.

671	   [CKLTZ97] reports on an experimental implementation of ECN in IPv6.
672	   The experiments include an implementation of ECN in an existing
673	   implementation of RED for FreeBSD.  A number of experiments were run
674	   to demonstrate the control of the average queue size in the router,
675	   the performance of ECN for a single TCP connection at a congested
676	   router, and fairness with multiple competing TCP connections.  One
677	   conclusion of the experiments is that dropping a packet from a bulk-
678	   data transfer degrades performance much more severely than marking a
679	   packet.

681	   Because the experimental implementation in [CKLTZ97] predates some of
682	   the developments in this document, the implementation does not
683	   conform to this document in all respects.  For example, in the
684	   experimental implementation the CWR flag is not used, but instead the
685	   TCP receiver sends the ECN-Echo bit on a single ACK packet.

687	   [K98] and [CKLT98] build on [CKLTZ97] to further analyze the benefits
688	   of ECN for TCP.  The conclusions are that ECN TCP gets moderately
689	   better throughput than non-ECN TCP; that ECN TCP flows are fair
690	   towards non-ECN TCP flows; and that ECN TCP is robust with two-way
691	   traffic, congestion in both directions, and with multiple congested
692	   gateways.  Experiments with many short web transfers show that, while
693	   most of the short connections have similar transfer times with or
694	   without ECN, a small percentage of the short connections have very
695	   high transfer times for the non-ECN experiments as compared to the
696	   ECN experiments.  This increased transfer time is particularly
697	   dramatic for those short connections that have their first packet
698	   dropped in the non-ECN experiments, and that therefore have to wait
699	   six seconds for the retransmit timer to expire.

701	   The ECN Web Page [ECN] has pointers to other implementations of ECN
702	   in progress.

704	11. Conclusions

706	   Given the current effort to implement RED, we believe this is the
707	   right time for router vendors to examine how to implement congestion
708	   avoidance mechanisms that do not depend on packet drops alone.  With
709	   the increased deployment of applications and transports sensitive to
710	   the delay and loss of a single packet, depending on packet loss as a
711	   normal congestion notification mechanism appears to be insufficient
712	   (or at the very least, non-optimal).

714	12. Acknowledgements

716	   Many people have made contributions to this internet-draft.  In
717	   particular, we would like to thank Kenjiro Cho for the proposal for
718	   the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the
719	   proposal of the CWR bit, Steve Blake for material on IPv4 Header
720	   Checksum Recalculation, Jamal Hadi Salim for discussions of ECN
721	   issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul
722	   Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for
723	   discussions of security issues.  We also thank the Internet End-to-
724	   End Research Group for ongoing discussions of these issues.

726	13. References

728	   [AH] S. Kent and R. Atkinson, "IP Authentication Header", Internet
729	   Draft <draft-ietf-ipsec-auth-header-07.txt>, July 1998.

731	   [CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
732	   "Implementing Explicit Congestion Notification (ECN) in TCP over
733	   IPv6", UCLA Technical Report, December 1997, URL
734	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz".

736	   [CKLT98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
737	   "Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the
738	   L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn-
739	   ietf.ps".

741	   [ECN] "The ECN Web Page", URL "http://www-
742	   nrg.ee.lbl.gov/floyd/ecn.html".

744	   [ESP] S. Kent and R. Atkinson, "IP Encapsulating Security Payload",
745	   Internet Draft <draft-ietf-ipsec-esp-v2-06.txt>, July 1998.

747	   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
748	   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
749	   N.4, August 1993, p. 397-413.  URL
750	   "ftp://ftp.ee.lbl.gov/papers/early.pdf".

752	   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
753	   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
754	   URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".

756	   [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support End-
757	   to-End Congestion Control", Technical report, February 1997.  URL
758	   "ftp://ftp.ee.lbl.gov/papers/collapse.ps".

760	   [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
761	   URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
762	   ecn.

764	   [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
765	   benefits for TCP", Master's thesis, UCLA, 1998, URL
766	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

768	   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
769	   SIGCOMM '97, September 1997.  URL
770	   "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".

772	   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
773	   ACM SIGCOMM '88, pp. 314-329.  URL
774	   "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".

776	   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
777	   Algorithm", Message to end2end-interest mailing list, April 1990.
778	   URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

780	   [RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the
781	   Internet Checksum", RFC 1141, January 1990.

783	   [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
784	   Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

786	   [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
787	   Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

789	   [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D.
790	   Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L.
791	   Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang,
792	   "Recommendations on Queue Management and Congestion Avoidance in the
793	   Internet", RFC 2309, April 1998.

795	   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
796	   Congestion Avoidance in Computer Networks", ACM Transactions on
797	   Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

799	14. Security Considerations

801	   Security considerations have been discussed in Section 9.

803	15. IPv4 Header Checksum Recalculation

805	   IPv4 header checksum recalculation is an issue with some high-end
806	   router architectures using an output-buffered switch, since most if
807	   not all of the header manipulation is performed on the input side of
808	   the switch, while the ECN decision would need to be made local to the
809	   output buffer.  This is not an issue for IPv6, since there is no IPv6
810	   header checksum.  The IPv4 TOS octet is the last byte of a 16-bit
811	   half-word.

813	   RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
814	   checksum after the TTL field is decremented.  The incremental
815	   updating of the IPv4 checksum after the CE bit was set would work as
816	   follows: Let HC be the original header checksum, and let HC' be the
817	   new header checksum after the CE bit has been set.  Then for header
818	   checksums calculated with one's complement subtraction, HC' would be
819	   recalculated as follows:
820	      HC' = { HC - 1     HC > 1
821	            { 0x0000     HC = 1

823	   For header checksums calculated on two's complement machines, HC'
824	   would be recalculated as follows after the CE bit was set:
825	       HC' = { HC - 1     HC > 0
826	             { 0xFFFE     HC = 0
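
   As a check on the two's complement rule above, the following Python
   sketch compares the incremental update with a full recalculation of
   the checksum.  The function names are hypothetical.  The sketch
   assumes that the CE bit is the least-significant bit of the TOS
   octet (byte 1 of the header), which is what makes setting the bit
   equivalent to adding one to the first 16-bit half-word.

```python
def ipv4_checksum(header: bytes) -> int:
    """Full checksum over an IPv4 header (even length assumed).

    The checksum field itself (bytes 10-11) is treated as zero.
    """
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:
            continue  # skip the checksum field
        total += (header[i] << 8) | header[i + 1]
    # Fold the carries back in (one's complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ce(header: bytes) -> bytes:
    """Return a copy of the header with the CE bit set.

    Assumption: CE is the least-significant bit of the TOS octet.
    """
    marked = bytearray(header)
    marked[1] |= 0x01
    return bytes(marked)


def checksum_after_ce_set(hc: int) -> int:
    """Incremental update on a two's complement machine."""
    return hc - 1 if hc > 0 else 0xFFFE
```

   A router using this approach would apply checksum_after_ce_set() to
   the existing checksum field when it sets the CE bit, rather than
   summing the whole header again.  The update is only valid when the
   CE bit was not already set in the arriving packet.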

828	16. The motivation for the ECT bit

830	   The need for the ECT bit is motivated by the fact that ECN will be
831	   deployed incrementally in an Internet where some transport protocols
832	   and routers understand ECN and some do not.  With the ECT bit, the
833	   router can drop packets from flows that are not ECN-capable, but can
834	   **instead** set the CE bit in flows that **are** ECN-capable.
835	   Because the ECT bit allows an end node to have the CE bit set in a
836	   packet **instead** of having the packet dropped, an end node might
837	   have some incentive to deploy ECN.

839	   If there were no ECT indication, then the router would have to set the
840	   CE bit for packets from both ECN-capable and non-ECN-capable flows.
841	   In this case, there would be no incentive for end-nodes to deploy
842	   ECN, and no viable path of incremental deployment from a non-ECN
843	   world to an ECN-capable world.  Consider the first stages of such an
844	   incremental deployment, where a subset of the flows are ECN-capable.
845	   At the onset of congestion, when the packet dropping/marking rate
846	   would be low, routers would only set CE bits, rather than dropping
847	   packets.  However, only those flows that are ECN-capable would
848	   understand and respond to CE packets.  The result is that the ECN-
849	   capable flows would back off, and the non-ECN-capable flows would be
850	   unaware of the ECN signals and would continue to open their
851	   congestion windows.

853	   In this case, there are two possible outcomes: (1) the ECN-capable
854	   flows back off, the non-ECN-capable flows get all of the bandwidth,
855	   and congestion remains mild, or (2) the ECN-capable flows back off,
856	   the non-ECN-capable flows don't, and congestion increases until the
857	   router transitions from setting the CE bit to dropping packets.
858	   While this second outcome evens out the fairness, the ECN-capable
859	   flows would still receive little benefit from being ECN-capable,
860	   because the increased congestion would drive the router to packet-
861	   dropping behavior.

863	   A flow that advertises itself as ECN-Capable but does not respond to
864	   CE bits is functionally equivalent to a flow that turns off
865	   congestion control, as discussed in Sections 8 and 9.

867	   Thus, in a world where a subset of the flows are ECN-capable, but
868	   where ECN-capable flows have no mechanism for indicating that fact to
869	   the routers, there would be less effective and less fair congestion
870	   control in the Internet, resulting in a strong incentive for end
871	   nodes not to deploy ECN.

873	17. Why use two bits in the IP header?

875	   Given the need for an ECT indication in the IP header, there still
876	   remains the question of whether the ECT (ECN-Capable Transport) and
877	   CE (Congestion Experienced) indications should be overloaded on a
878	   single bit.  This overloaded-one-bit alternative, explored in
879	   [Floyd94], would involve a single bit with two values.  One value,
880	   "ECT and not CE", would represent an ECN-Capable Transport, and the
881	   other value, "CE or not ECT", would represent either Congestion
882	   Experienced or a non-ECN-Capable transport.

884	   There is only one inherent functional difference between the one-bit
885	   and two-bit implementations.  This functional difference concerns
886	   packets that traverse multiple congested routers.  Consider a CE
887	   packet that arrives at a second congested router, and is selected by
888	   the active queue management at that router for either marking or
889	   dropping.  In the one-bit implementation, the second congested router
890	   has no choice but to drop the CE packet, because it cannot
891	   distinguish between a CE packet and a non-ECT packet.  In the two-bit
892	   implementation, the second congested router has the choice of either
893	   dropping the CE packet, or of leaving it alone with the CE bit set.
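
   The difference at a second congested router can be sketched as
   follows.  Both functions assume that the packet has already been
   selected by active queue management for marking or dropping.  The
   function names, the string labels for the one-bit states, and the
   "drop_marked" parameter are hypothetical and serve only to
   illustrate the choice available to the router.

```python
def one_bit_action(bit: str) -> str:
    """Second congested router, overloaded one-bit scheme.

    bit is "ECT_NOT_CE" or "CE_OR_NOT_ECT".
    """
    if bit == "ECT_NOT_CE":
        # Unmarked packet from an ECN-capable transport: mark it by
        # changing the bit to "CE or not ECT".
        return "mark"
    # A CE packet cannot be told apart from a non-ECT packet, so the
    # router has no choice but to drop.
    return "drop"


def two_bit_action(ect: bool, ce: bool, drop_marked: bool = False) -> str:
    """Second congested router, separate ECT and CE bits."""
    if not ect:
        return "drop"  # non-ECN-capable transport
    if ce:
        # Already marked upstream: the router may drop the packet or
        # leave it alone with the CE bit set.
        return "drop" if drop_marked else "leave"
    return "mark"  # set the CE bit
```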

895	   Another difference between the one-bit and two-bit implementations
896	   comes from the fact that with the one-bit implementation, receivers
897	   in a single flow cannot distinguish between CE and non-ECT packets.
898	   Thus, in the one-bit implementation an ECN-capable data sender would
899	   have to unambiguously indicate to the receiver or receivers whether
900	   each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
901	   possibility would be for the sender to indicate in the transport
902	   header whether the packet was sent as ECN-Capable.  A second
903	   possibility that would involve a functional limitation for the one-
904	   bit implementation would be for the sender to unambiguously indicate
905	   that it was going to send *all* of its packets as ECN-Capable or as
906	   non-ECN-Capable.  For a multicast transport protocol, this
907	   unambiguous indication would have to be apparent to receivers joining
908	   an on-going multicast session.

910	   Another advantage of the two-bit approach is that it is somewhat more
911	   robust.  The most critical issue, discussed in Section 8, is that the
912	   default indication should be that of a non-ECN-Capable transport.  In
913	   a two-bit implementation, this requirement for the default value
914	   simply means that the ECT bit should be `OFF' by default.  In the
915	   one-bit implementation, this means that the single overloaded bit
916	   should by default be in the "CE or not ECT" position.  This is less
917	   clear and straightforward, and possibly more open to incorrect
918	   implementations either in the end nodes or in the routers.

920	   In summary, while the one-bit implementation could be a possible
921	   implementation, it has the following significant limitations relative
922	   to the two-bit implementation.  First, the one-bit implementation has
923	   more limited functionality for the treatment of CE packets at a
924	   second congested router.  Second, the one-bit implementation requires
925	   either that extra information be carried in the transport header of
926	   packets from ECN-Capable flows (to convey the functionality of the
927	   second bit elsewhere, namely in the transport header), or that
928	   senders in ECN-Capable flows accept the limitation that receivers
929	   must be able to determine a priori which packets are ECN-Capable and
930	   which are not ECN-Capable.  Third, the one-bit implementation is
931	   possibly more open to errors from faulty implementations that choose
932	   the wrong default value for the ECN bit.  We believe that the use of
933	   the extra bit in the IP header for the ECT-bit is extremely valuable
934	   to overcome these limitations.

936	AUTHORS' ADDRESSES

938	   K. K. Ramakrishnan
939	   AT&T Labs. Research
940	   Phone: +1 (973) 360-8766
941	   Email: kkrama@research.att.com
942	   URL: http://www.research.att.com/info/kkrama

944	   Sally Floyd
945	   Lawrence Berkeley National Laboratory
946	   Phone: +1 (510) 486-7518
947	   Email: floyd@ee.lbl.gov
948	   URL: http://www-nrg.ee.lbl.gov/floyd/

950	   This draft was created in September 1998.
951	   It expires March 1999.