idnits 2.17.1 

draft-kksjf-ecn-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 18
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 19 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 170: '...ms followed at the end-systems MUST be...'
     RFC 2119 keyword, line 198: '...cket, the router MAY instead set the C...'
     RFC 2119 keyword, line 530: '...capsulating ('outside') header MUST be...'
     RFC 2119 keyword, line 532: '...   the outside header SHOULD be a 1....'
     RFC 2119 keyword, line 536: '...e outside header MUST be ORed with the...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 1998) is 9410 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2001' is mentioned on line 267, but not defined

  ** Obsolete undefined reference: RFC 2001 (Obsoleted by RFC 2581)

  == Unused Reference: 'Floyd97' is defined on line 661, but no explicit
     reference was found in the text

  == Unused Reference: 'FRED' is defined on line 673, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLTZ97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'CKLT98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ECN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FJ93'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd94'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Floyd98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'K98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'FRED'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson88'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Jacobson90'

  ** Downref: Normative reference to an Informational RFC: RFC 1141

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MJV96'

  ** Obsolete normative reference: RFC 2001 (Obsoleted by RFC 2581)

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RJ90'


     Summary: 14 errors (**), 0 flaws (~~), 6 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                       K. K. Ramakrishnan
3	INTERNET DRAFT                                        AT&T Labs Research
4	draft-kksjf-ecn-01.txt                                       Sally Floyd
5	                                                                    LBNL
6	                                                               July 1998
7	                                                  Expires:  January 1999

9	    A Proposal to add Explicit Congestion Notification (ECN) to IP

11	                          Status of this Memo

13	   This document is an Internet-Draft.  Internet-Drafts are working
14	   documents of the Internet Engineering Task Force (IETF), its areas,
15	   and its working groups.  Note that other groups may also distribute
16	   working documents as Internet-Drafts.

18	   Internet-Drafts are draft documents valid for a maximum of six months
19	   and may be updated, replaced, or obsoleted by other documents at any
20	   time.  It is inappropriate to use Internet-Drafts as reference
21	   material or to cite them other than as "work in progress."

23	   To view the entire list of current Internet-Drafts, please check the
24	   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
25	   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
26	   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
27	   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

29	Abstract

31	   This note describes a proposed addition of ECN (Explicit Congestion
32	   Notification) to IP.  TCP is currently the dominant transport
33	   protocol used in the Internet. We begin by describing TCP's use of
34	   packet drops as an indication of congestion.  Next we argue that with
35	   the addition of active queue management (e.g., RED) to the Internet
36	   infrastructure, where routers detect congestion before the queue
37	   overflows, routers are no longer limited to packet drops as an
38	   indication of congestion, but could instead set a Congestion
39	   Experienced (CE) bit in the packet header, for ECN-capable transport
40	   protocols.  We describe when the CE bit would be set in the routers,
41	   and describe what modifications would be needed to TCP to make it
42	   ECN-capable.  Modifications to other transport protocols (e.g.,
43	   unreliable unicast or multicast, reliable multicast, other reliable
44	   unicast transport protocols) could be considered as those protocols
45	   are developed and advance through the standards process.

47	1. Introduction

49	   TCP's congestion control and avoidance algorithms are based on the
50	   notion that the network is a black-box [Jacobson88, Jacobson90].  The
51	   network's state of congestion or otherwise is determined by end-
52	   systems probing for the network state, by gradually increasing the
53	   load on the network (by increasing the window of packets that are
54	   outstanding in the network) until the network becomes congested and a
55	   packet is lost.  Treating the network as a "black-box" and treating
56	   loss as an indication of congestion in the network is appropriate for
57	   pure best-effort data carried by TCP which has little or no
58	   sensitivity to delay or loss of individual packets.  In addition,
59	   TCP's congestion management algorithms have techniques built-in (such
60	   as Fast Retransmit and Fast Recovery) to minimize the impact of
61	   losses from a throughput perspective.

63	   However, these mechanisms are not intended to help applications that
64	   are in fact sensitive to the delay or loss of one or more individual
65	   packets.  Interactive traffic such as telnet, web-browsing, and
66	   transfer of audio and video data can be sensitive to packet losses
67	   (using an unreliable data delivery transport such as UDP) or to the
68	   increased latency of the packet caused by the need to retransmit the
69	   packet after a loss (for reliable data delivery such as TCP).

71	   Since TCP determines the appropriate congestion window to use by
72	   gradually increasing the window size until it experiences a dropped
73	   packet, this causes the queues at the bottleneck router to build up.
74	   With most packet drop policies at the router that are not sensitive
75	   to the load placed by each individual flow, this means that some of
76	   the packets of latency-sensitive flows are going to be dropped.
77	   Active queue management mechanisms detect congestion before the queue
78	   overflows, and provide an indication of this congestion to the end
79	   nodes.  The advantages of active queue management are discussed in
80	   RFC 2309 [RFC2309].  Active queue management avoids some of the bad
81	   properties of dropping on queue overflow, including the undesirable
82	   synchronization of loss across multiple flows.  More importantly,
83	   active queue management means that transport protocols with
84	   congestion control (e.g., TCP) do not have to rely on buffer overflow
85	   as the only indication of congestion.  This can reduce unnecessary
86	   queueing delay for all traffic sharing that queue.

88	   Active queue management mechanisms may use one of several methods for
89	   indicating congestion to end-nodes. One is to use packet drops, as is
90	   currently done. However, active queue management allows the router to
91	   separate policies of queueing or dropping packets from the policies
92	   for indicating congestion. Thus, active queue management allows
93	   routers to use the Congestion Experienced (CE) bit in a packet header
94	   as an indication of congestion, instead of relying solely on packet
95	   drops.

97	2. Assumptions and General Principles

99	   In this section, we describe some of the important design principles
100	   and assumptions that guided the design choices in this proposal.

102	   (1) Congestion may persist over different time-scales. The time
103	   scales that we are concerned with are congestion events that may last
104	   longer than a round-trip time.
105	   (2) The number of packets in an individual flow (e.g., TCP connection
106	   or an exchange using UDP) may range from a small number of packets to
107	   quite a large number. We are interested in managing the congestion
108	   caused by flows that send enough packets so that they are still
109	   active when network feedback reaches them.
110	   (3) New mechanisms for congestion control and avoidance need to co-
111	   exist and cooperate with existing mechanisms for congestion control.
112	   In particular, new mechanisms have to co-exist with TCP's current
113	   methods of adapting to congestion and with routers' current practice
114	   of dropping packets in periods of congestion.
115	   (4) Because ECN is likely to be adopted gradually, accommodating
116	   migration is essential. Some routers may still only drop packets to
117	   indicate congestion, and some end-systems may not be ECN-capable. The
118	   most viable strategy is one that accommodates incremental deployment
119	   without having to resort to "islands" of ECN-capable and non-ECN-
120	   capable environments.
121	   (5) Asymmetric routing is likely to be a normal occurrence in the
122	   Internet. The path (sequence of links and routers) followed by data
123	   packets may be different from the path followed by the acknowledgment
124	   packets in the reverse direction.
125	   (6) Routers process the "regular" headers in IP packets more
126	   efficiently than they process the header information in IP options.
127	   This suggests keeping congestion experienced information in the
128	   regular headers of an IP packet.
129	   (7) It must be recognized that not all end-systems will cooperate in
130	   mechanisms for congestion control. However, new mechanisms should not
131	   make it easier for TCP applications to disable TCP congestion
132	   control. The benefit of lying about participating in new mechanisms
133	   such as ECN-capability should be small.

135	3. Random Early Detection (RED)

137	   Random Early Detection (RED) is a mechanism for active queue
138	   management that has been proposed to detect incipient congestion
139	   [FJ93], and is currently being deployed in the Internet backbone
140	   [RFC2309].  Although RED is meant to be a general mechanism using one
141	   of several alternatives for congestion indication, in the current
142	   environment of the Internet RED is restricted to using packet drops
143	   as a mechanism for congestion indication.  RED drops packets based on
144	   the average queue length exceeding a threshold, rather than only when
145	   the queue overflows.  However, when RED drops packets before the
146	   queue actually overflows, RED is not forced by memory limitations to
147	   discard the packet.
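
   As an illustration only, the threshold test described above can be
   sketched as follows.  This is a simplified reading of the basic RED
   scheme in [FJ93]: it omits the refinement that spaces out successive
   drops, and the function names, the averaging weight, and the
   threshold values are assumptions chosen for the example, not values
   proposed by this document.

```python
import random


def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length.

    The weight is an illustrative assumption.
    """
    return (1.0 - weight) * avg + weight * queue_len


def early_drop_probability(avg, min_th, max_th, max_p):
    """Probability of signalling congestion for one arriving packet.

    Below min_th nothing is signalled; at or above max_th every packet
    is; in between the probability grows linearly up to max_p.
    """
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


def red_should_signal(avg, min_th=5.0, max_th=15.0, max_p=0.1,
                      rng=random.random):
    """Decide, from the average queue size alone, whether to signal."""
    return rng() < early_drop_probability(avg, min_th, max_th, max_p)
```

   The decision depends on the average queue size and not on whether
   the buffer is actually full, which is why the router is free to
   choose how the congestion indication is delivered.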

149	   RED could set a Congestion Experienced (CE) bit in the packet header
150	   instead of dropping the packet, if such a bit was provided in the IP
151	   header and understood by the transport protocol.  The use of the CE
152	   bit would allow the receiver(s) to receive the packet, avoiding the
153	   potential for excessive delays due to retransmissions after packet
154	   losses.  We use the term 'CE packet' to denote a packet that has the
155	   CE bit set.

157	4. Explicit Congestion Notification in IP

159	   We propose that the Internet provide a congestion indication for
160	   incipient congestion (as in RED and earlier work [RJ90]) where the
161	   notification can sometimes be through marking packets rather than
162	   dropping them.  This would require an ECN field in the IP header with
163	   two bits.  The ECN-Capable Transport (ECT) bit would be set by the
164	   data sender to indicate that the end-points of the transport protocol
165	   are ECN-capable.  The CE bit would be set by the router to indicate
166	   congestion to the end nodes.  Routers that have a packet arriving at
167	   a full queue would drop the packet, just as they do now.

169	   Upon the receipt by an ECN-Capable transport of a single CE packet,
170	   the congestion control algorithms followed at the end-systems MUST be
171	   essentially the same as the congestion control response to a *single*
172	   dropped packet.  For example, for TCP the source TCP halves its
173	   congestion window "cwnd" in response to an ECN indication received by
174	   the data receiver.

176	   One reason for requiring that the congestion-control response to the
177	   CE packet be essentially the same as the response to a dropped packet
178	   is to accommodate the incremental deployment of ECN in both end-
179	   systems and in routers.  Some routers may drop ECN-Capable packets
180	   (e.g., using the same RED policies for congestion detection) while
181	   other routers set the CE bit, for equivalent levels of congestion.
182	   Similarly, a router might drop a non-ECN-Capable packet but set the
183	   CE bit in an ECN-Capable packet, for equivalent levels of congestion.
184	   Different congestion control responses to a CE bit indication and to
185	   a packet drop could result in unfair treatment for different flows.

187	   An additional requirement is that the end-systems should react to
188	   congestion at most once per window of data (i.e., at most once per
189	   roundtrip time), to avoid reacting multiple times to multiple
190	   indications of congestion within a roundtrip time.

192	   For a router, the CE bit of an ECN-Capable packet should only be set
193	   if the router would otherwise have dropped the packet as an
194	   indication of congestion to the end nodes. When the router's buffer
195	   is not yet full and the router is prepared to drop a packet to inform
196	   end nodes of incipient congestion, the router should first check to
197	   see if the ECT bit is set in that packet's IP header.  If so, then
198	   instead of dropping the packet, the router MAY instead set the CE bit
199	   in the IP header.
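
   The router behaviour described in this section can be summarized in
   a short sketch.  The Packet class, the function name, and the
   "marking_enabled" switch (modelling the fact that marking is a MAY,
   not a requirement) are hypothetical and serve only to illustrate the
   decision order.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool = False  # ECN-Capable Transport bit, set by the data sender
    ce: bool = False   # Congestion Experienced bit, set by routers


def router_enqueue(pkt, queue_full, wants_early_signal, marking_enabled=True):
    """Return "drop" or "forward" for one arriving packet.

    queue_full:         no buffer space is left for the packet.
    wants_early_signal: active queue management has selected this packet
                        as an indication of incipient congestion.
    """
    if queue_full:
        # A full queue leaves no choice, whatever the ECT bit says.
        return "drop"
    if wants_early_signal:
        if pkt.ect and marking_enabled:
            pkt.ce = True          # mark instead of dropping
            return "forward"
        return "drop"              # non-ECN-capable packet, or marking off
    # No congestion indication needed: an existing CE bit is never cleared.
    return "forward"
```

   Note that the CE bit is set only in the branch where the packet
   would otherwise have been dropped, so a marking router and a
   dropping router signal congestion for the same packets.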

201	   An environment where all end nodes were ECN-Capable could allow new
202	   criteria to be developed for setting the CE bit, and new congestion
203	   control mechanisms for end-node reaction to CE packets.  However,
204	   this is a research issue, and as such is not addressed in this
205	   document.

207	   When a CE packet is received by a router, the CE bit is left
208	   unchanged, and the packet transmitted as usual. When severe
209	   congestion has occurred and the router's queue is full, then the
210	   router has no choice but to drop some packet when a new packet
211	   arrives.  We anticipate that such packet losses will become
212	   relatively infrequent when a majority of end-systems become ECN-
213	   Capable and participate in TCP or other compatible congestion control
214	   mechanisms. In an adequately-provisioned network in such an ECN-
215	   Capable environment, packet losses should occur primarily during
216	   transients or in the presence of non-cooperating sources.

218	   We expect that routers will set the CE bit in response to incipient
219	   congestion as indicated by the average queue size, using the RED
220	   algorithms suggested in [FJ93, RFC2309].  To the best of our
221	   knowledge, this is the only proposal currently under discussion in
222	   the IETF for routers to drop packets proactively, before the buffer
223	   overflows.  However, this document does not attempt to specify a
224	   particular mechanism for active queue management, leaving that
225	   endeavor, if needed, to other areas of the IETF.  While ECN is
226	   inextricably tied up with active queue management at the router, the
227	   reverse does not hold; active queue management mechanisms have been
228	   developed and deployed independently from ECN, using packet drops as
229	   indications of congestion in the absence of ECN in the IP
230	   architecture.

232	5. Support from the Transport Protocol

234	   ECN requires support from the transport protocol, in addition to the
235	   functionality given by the ECN field in the IP packet header. The
236	   transport protocol might require negotiation between the endpoints
237	   during setup to determine that all of the endpoints are ECN-capable,
238	   so that the sender can set the ECT bit in transmitted packets.
239	   Second, the transport protocol must be capable of reacting
240	   appropriately to the receipt of CE packets.  This reaction could be
241	   in the form of the data receiver informing the data sender of the
242	   received CE packet (e.g., TCP), of the data receiver unsubscribing to
243	   a layered multicast group (e.g., RLM [MJV96]), or of some other
244	   action that ultimately reduces the arrival rate of that flow to that
245	   receiver.

247	   This document only addresses the addition of ECN Capability to TCP,
248	   leaving issues of ECN and other transport protocols to further
249	   research.  For TCP, ECN requires three new mechanisms:  negotiation
250	   between the endpoints during setup to determine if they are both
251	   ECN-capable; an ECN-Echo flag in the TCP header so that the data
252	   receiver can inform the data sender when a CE packet has been
253	   received; and a Congestion Window Reduced (CWR) flag in the TCP
254	   header so that the data sender can inform the data receiver that the
255	   congestion window has been reduced. The support required from other
256	   transport protocols is likely to be different, particularly for
257	   unreliable or reliable multicast transport protocols, and will have
258	   to be determined as other transport protocols are brought to the IETF
259	   for standardization.

261	5.1. TCP

263	   The following sections describe in detail the proposed use of ECN in
264	   TCP.  This proposal is described in essentially the same form in
265	   [Floyd94]. We assume that the source TCP uses the standard congestion
266	   control algorithms of Slow-start, Fast Retransmit and Fast Recovery
267	   [RFC 2001].

269	5.1.1.  TCP Initialization

271	   In the TCP connection setup phase, the source and destination TCPs
272	   exchange information about their desire and/or capability to use ECN.
273	   As a result of the negotiation, the TCP sender sets the ECT bit in
274	   the IP header to indicate to the network that the transport is
275	   capable and willing to participate in ECN for this packet. This will
276	   indicate to the routers that they may mark this packet with the CE
277	   bit, if they would like to use that as a method of congestion
278	   notification. If the TCP connection does not wish to use ECN
279	   notification for a particular packet, the sending TCP sets the ECT
280	   bit equal to 0 (i.e., not set), and the TCP receiver ignores the CE
281	   bit in the received packet.

283	   The TCP mechanism for negotiating ECN-Capability uses the ECN-Echo
284	   flag in the TCP header.  (This was called the ECN Notify flag in some
285	   earlier documents.) Bit 9 in the Reserved field of the TCP header is
286	   assigned to the ECN-Echo flag.

288	   When a node sends a TCP SYN packet, it may set the ECN-Echo flag in
289	   the TCP header.  For a SYN packet, the ECN-Echo flag is defined as an
290	   indication that the sending TCP is ECN-Capable, rather than as a
291	   return indication of congestion. More precisely, a SYN packet with
292	   the ECN-Echo flag set indicates that the sending TCP implementation
293	   will respond to incoming data packets that have the CE bit set in the
294	   IP header by setting the ECN-Echo flag in outgoing TCP
295	   Acknowledgement (ACK) packets.

297	   Similarly, for a SYN-ACK packet, the ECN-Echo flag in the TCP header
298	   is defined as an indication that the TCP transmitting the SYN-ACK
299	   packet is ECN-Capable.
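
   A minimal sketch of this exchange follows.  Segments are modelled as
   plain dictionaries and the function names are hypothetical; the only
   rule taken from the text is that each side advertises its own
   capability with the ECN-Echo flag and that ECN is used only when
   both sides have done so.

```python
def build_syn(local_ecn_capable):
    """SYN from the initiator; ECN-Echo here means "I am ECN-Capable"."""
    return {"syn": True, "ack": False, "ecn_echo": bool(local_ecn_capable)}


def build_syn_ack(local_ecn_capable):
    """SYN-ACK from the responder; the flag has the same meaning."""
    return {"syn": True, "ack": True, "ecn_echo": bool(local_ecn_capable)}


def connection_uses_ecn(syn, syn_ack):
    """The sender may set the ECT bit only if both ends advertised ECN."""
    return syn["ecn_echo"] and syn_ack["ecn_echo"]
```

   If either flag is absent, connection_uses_ecn() is false and the
   connection behaves as it would without ECN.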

301	5.1.2.  The TCP Sender

303	   For a TCP connection using ECN, data packets are transmitted with the
304	   ECT bit set in the IP header (set to a "1").  If the sender receives
305	   an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag
306	   set in the TCP header), then the sender knows that congestion was
307	   encountered in the network on the path from the sender to the
308	   receiver.  The indication of congestion should be treated just as a
309	   congestion loss in non-ECN-Capable TCP. That is, the TCP source
310	   halves the congestion window "cwnd" and reduces the slow start
311	   threshold "ssthresh".  The sending TCP does NOT increase the
312	   congestion window in response to the receipt of an ECN-Echo ACK
313	   packet.

315	   A critical condition is that TCP does not react to congestion
316	   indications more than once every window of data (or more loosely,
317	   more than once every round-trip time). That is, the TCP sender's
318	   congestion window should be reduced only once in response to a series
319	   of dropped and/or CE packets from a single window of data.

321	   The recommended method for implementing this is as follows.  Assume
322	   that at time "t" the source TCP reacts to an ECN-Echo ACK packet by
323	   reducing its congestion window.  The source TCP notes the packets
324	   that are outstanding at that time (i.e., packets that have not yet
325	   been acknowledged).  Until all these packets are acknowledged, the
326	   source TCP does not react to another ECN indication of congestion.
327	   However, if during this period a packet is retransmitted as a result
328	   of a retransmission timeout or the receipt of the required number
329	   (e.g., 3) of duplicate acknowledgments, then the source TCP will
330	   react to subsequent ECN indications of congestion.
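
   One way to realize the recommended method is sketched below.  It is
   an illustration under several assumptions: sequence numbers count
   whole packets and never wrap, the class and attribute names are
   hypothetical, and the exact reduction (half the window, with a floor
   of two segments) is our choice for the example.  Window growth and
   the loss-driven reduction itself are not modelled.

```python
class EcnSender:
    """Reacts to ECN-Echo at most once per window of data."""

    def __init__(self, cwnd, ssthresh, mss=1):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.mss = mss
        self.snd_una = 0     # oldest unacknowledged packet
        self.snd_nxt = 0     # next packet to be sent
        # End of the data outstanding at the last ECN reduction;
        # None means the sender is free to react to ECN-Echo.
        self.recover = None

    def on_ack(self, ack_seq, ecn_echo):
        """Process one ACK; return True if the window was reduced."""
        self.snd_una = max(self.snd_una, ack_seq)
        if self.recover is not None and self.snd_una >= self.recover:
            # Everything outstanding at time "t" is now acknowledged.
            self.recover = None
        if ecn_echo and self.recover is None:
            self._reduce()
            self.recover = self.snd_nxt   # note what is outstanding now
            return True
        return False

    def on_retransmit(self):
        """Retransmit timeout or duplicate ACKs: react to later ECN again."""
        self.recover = None

    def _reduce(self):
        # Assumed form of "halve cwnd and reduce ssthresh".
        self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
        self.cwnd = self.ssthresh
```

   With a window of 16 packets outstanding, a first ECN-Echo ACK halves
   the window to 8, and further ECN-Echo ACKs are ignored until packet
   16 has been acknowledged or a retransmission occurs.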

332	   [Floyd94] discusses this further, and [Floyd98] includes a validation
333	   test illustrating a wide range of ECN scenarios. These scenarios
334	   include the following: an ECN followed by another ECN, a Fast
335	   Retransmit, or a Retransmit Timeout; and a Retransmit Timeout or a
336	   Fast Retransmit followed by an ECN.

338	   When the TCP sender reduces its congestion window in response to an
339	   ECN-Echo ACK packet, there is no need for the sender to slow-start
340	   (as in Tahoe TCP in response to a packet drop) or to stop sending
341	   packets for a period of time to allow the queue to dissipate (as in
342	   Reno TCP for roughly half a round-trip time during Fast Recovery).
343	   The CE packet in the forward direction does not indicate the imminent
344	   possibility of buffer overflow requiring an urgent source action to
345	   reduce the load dramatically.  Incoming acknowledgements that
346	   continue to arrive can "clock out" outgoing packets as allowed by the
347	   reduced congestion window.

349	   TCP follows existing algorithms for sending data packets in response
350	   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
351	   timeouts [RFC2001].

353	5.1.3.  The TCP Receiver

355	   When TCP receives a CE data packet at the destination end-system, the
356	   TCP data receiver sets the ECN-Echo flag in the TCP header of the
357	   subsequent ACK packet.  If there is any ACK withholding implemented,
358	   as in current "delayed-ACK" TCP implementations where the TCP
359	   receiver can send an ACK for two arriving data packets, then the
360	   ECN-Echo flag in the ACK packet will be set to the OR of the CE bits
361	   of all of the data packets being acknowledged.  That is, if any of
362	   the received data packets are CE packets, then the returning ACK has
363	   the ECN-Echo flag set.

365	   To provide robustness against the possibility of a dropped ACK packet
366	   carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo
367	   flag in a series of ACK packets.  To enable the TCP receiver to
368	   determine when to stop setting the ECN-Echo flag, we introduce a
369	   second new flag in the TCP header, the Congestion Window Reduced
370	   (CWR) flag.  The CWR flag is assigned to Bit 8 in the Reserved field
371	   of the TCP header.

373	   When an ECN-Capable TCP reduces its congestion window for any reason
374	   (because of a retransmit timeout, a Fast Retransmit, or in response
375	   to an ECN Notification), the TCP sets the CWR flag in the TCP header
376	   of the first data packet sent after the window reduction.  If that
377	   data packet is dropped in the network, then the sending TCP will have
378	   to reduce the congestion window again and retransmit the dropped
379	   packet.  Thus, the Congestion Window Reduced message is reliably
380	   delivered to the data receiver.

382	   After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
383	   that TCP receiver continues to set the ECN-Echo flag in ACK packets
384	   until it receives a CWR packet (a packet with the CWR flag set).
385	   After the receipt of the CWR packet, acknowledgements for subsequent
386	   non-CE data packets do not have the ECN-Echo flag set. If another CE
387	   packet is received by the data receiver, the receiver would once
388	   again send ACK packets with the ECN-Echo flag set.  While the receipt
389	   of a CWR packet does not guarantee that the data sender received the
390	   ECN-Echo message, this does guarantee that the data sender reduced
391	   its congestion window at some point *after* it sent the data packet
392	   for which the CE bit was set.
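
   The receiver rules above amount to a small state machine, sketched
   here.  The class and method names are hypothetical.  One detail is
   our assumption rather than something stated in the text: when a
   single packet carries both the CWR flag and the CE bit, the CWR flag
   is processed first, so the new CE indication is still echoed.

```python
class EcnReceiver:
    """Decides the ECN-Echo flag for outgoing ACKs."""

    def __init__(self):
        self.echoing = False            # keep echoing until CWR arrives
        self.ce_since_last_ack = False  # CE seen among packets not yet ACKed

    def on_data(self, ce, cwr):
        """Record one arriving data packet."""
        if cwr:
            # The sender has reduced its window: stop the current echo.
            self.echoing = False
        if ce:
            # A CE packet (re)starts the echo.
            self.echoing = True
            self.ce_since_last_ack = True

    def build_ack(self):
        """Return the ECN-Echo flag for the next ACK.

        With delayed ACKs this is the OR of the CE bits of all packets
        being acknowledged, plus the echo that persists until CWR.
        """
        flag = self.echoing or self.ce_since_last_ack
        self.ce_since_last_ack = False
        return flag
```

   Because the echo persists until a CWR packet arrives, the loss of
   any single ACK carrying the ECN-Echo flag does not hide the
   congestion indication from the sender.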

394	   We have already specified that a TCP sender reduces its congestion
395	   window at most once per window of data.  This mechanism requires some
396	   care to make sure that the sender reduces its congestion window at
397	   most once per ECN indication, and that multiple ECN messages over
398	   several successive windows of data are properly reported to the ECN
399	   sender.  This is discussed further in [Floyd98].

401	5.1.4. Congestion on the ACK-path

403	   For the current generation of TCP congestion control algorithms, pure
404	   acknowledgement packets (i.e., packets that do not contain any
405	   accompanying data) should be sent with the ECN-capable bit off.
406	   Current TCP receivers have no mechanisms for reducing traffic on the
407	   ACK-path in response to congestion notification.  Mechanisms for
408	   responding to congestion on the ACK-path are left as an area
409	   for future research.  (One simple possibility would be for the sender
410	   to reduce its congestion window when it receives a pure ACK packet
411	   with the CE bit set.)  For current TCP implementations, a single
412	   dropped ACK generally has only a very small effect on the TCP's
413	   sending rate.

415	6. Summary of changes required in IP and TCP

417	   Two bits need to be specified in the IP header, the ECN-Capable
418	   Transport (ECT) bit and the Congestion Experienced (CE) bit.  The ECT
419	   bit set to "0" indicates that the transport protocol will ignore the
420	   CE bit.  This is the default value for the ECT bit.  The ECT bit set
421	   to "1" indicates that the transport protocol is willing and able to
422	   participate in ECN.

424	   The default value for the CE bit is "0".  The router sets the CE bit
425	   to "1" to indicate congestion to the end nodes.  The CE bit in a
426	   packet header should never be reset by a router from "1" to "0".

428	   TCP requires three changes: a negotiation phase during setup to
429	   determine if both end nodes are ECN-capable, and two new flags in the
430	   TCP header, from the "reserved" flags in the TCP flags field.  The
431	   ECN-Echo flag is used by the data receiver to inform the data sender
432	   of a received CE packet.  The Congestion Window Reduced flag is used
433	   by the data sender to inform the data receiver that the congestion
434	   window has been reduced.
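
The interplay of the two flags can be sketched from the data
receiver's side as follows.  This is an illustrative model only: the
class and method names are hypothetical, and repeating ECN-Echo on
every ACK until a CWR arrives is an assumption based on the CWR
proposal of Section 5.1.3.

```python
from dataclasses import dataclass


@dataclass
class EcnReceiver:
    """Tracks whether ACKs should currently carry the ECN-Echo flag."""

    echo_pending: bool = False

    def on_data_packet(self, ce: bool, cwr: bool) -> bool:
        """Process one arriving data packet.

        ce:  the packet arrived with the CE bit set in the IP header.
        cwr: the packet carries the Congestion Window Reduced flag.
        Returns the ECN-Echo flag to place on the resulting ACK.
        """
        if cwr:
            # The sender reports that it has reduced its window.
            self.echo_pending = False
        if ce:
            # A new congestion indication must be reported to the sender.
            self.echo_pending = True
        return self.echo_pending
```

Handling CWR before CE means a packet carrying both flags is treated
as a fresh indication; that ordering is a design choice of this
sketch.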

436	7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN

438	   Since the ATM and Frame Relay mechanisms for congestion indication
439	   have typically been defined without any notion of average queue size
440	   as the basis for determining that an intermediate node is congested,
441	   we believe that they provide a very noisy signal. The TCP-sender
442	   reaction specified in this draft for ECN is NOT the appropriate
443	   reaction for such a noisy signal of congestion notification. It is
444	   our expectation that ATM's EFCI and Frame Relay's FECN mechanisms
445	   would be phased out over time within the ATM network.  However, if
446	   the routers that interface to the ATM network have a way of
447	   maintaining the average queue at the interface, and use it to come to
448	   a reliable determination that the ATM subnet is congested, they may
449	   use the ECN notification that is defined here.

451	8. Non-compliance by the End Nodes

453	   This section discusses concerns about the vulnerability of ECN to
454	   non-compliant end-nodes (i.e., end nodes that set the ECT bit in
455	   transmitted packets but do not respond to received CE packets).  We
456	   argue that the addition of ECN to the IP architecture would not
457	   significantly increase the current vulnerability of the architecture
458	   to unresponsive flows.

460	   Even for non-ECN environments, there are serious concerns about the
461	   damage that can be done by non-compliant or unresponsive flows (that
462	   is, flows that do not respond to congestion control indications by
463	   reducing their arrival rate at the congested link).  For example, an
464	   end-node could "turn off congestion control" by not reducing its
465	   congestion window in response to packet drops. This is a concern for
466	   the current Internet.  It has been argued that routers will have to
467	   deploy mechanisms to detect and differentially treat packets from
468	   non-compliant flows.  It has also been argued that techniques such as
469	   end-to-end per-flow scheduling and isolation of one flow from
470	   another, differentiated services, or end-to-end reservations could
471	   remove some of the more damaging effects of unresponsive flows.

473	   It has been argued that dropping packets in itself may be an adequate
474	   deterrent for non-compliance, and that the use of ECN removes this
475	   deterrent.  We would argue in response that (1) ECN-capable routers
476	   preserve packet-dropping behavior in times of high congestion; and
477	   (2) even in times of high congestion, dropping packets in itself is
478	   not an adequate deterrent for non-compliance.

480	   First, ECN-Capable routers will only mark packets (as opposed to
481	   dropping them) when the packet marking rate is reasonably low. During
482	   periods where the average queue size exceeds an upper threshold, and
483	   therefore the potential packet marking rate would be high, our
484	   recommendation is that routers drop packets rather than set the CE
485	   bit in packet headers.
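
The recommendation above might be sketched as the following decision
function.  The names avg_queue, upper_threshold and selected are
hypothetical, and how active queue management selects a packet for a
congestion indication is deliberately left outside the sketch.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK = "mark"
    DROP = "drop"


def congestion_action(avg_queue: float, upper_threshold: float,
                      selected: bool, ect: bool) -> Action:
    """Choose how a router treats one arriving packet.

    avg_queue:       current average queue size.
    upper_threshold: above this average the router drops, never marks.
    selected:        active queue management picked this packet for a
                     congestion indication.
    ect:             the packet's ECT bit is set.
    """
    if not selected:
        return Action.FORWARD
    if avg_queue > upper_threshold:
        # High congestion: preserve packet-dropping behavior.
        return Action.DROP
    # Low or moderate congestion: mark ECN-capable packets, drop others.
    return Action.MARK if ect else Action.DROP
```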

487	   During the periods of low or moderate packet marking rates when ECN
488	   would be deployed, dropping rather than marking those packets would
489	   have little deterrent effect on unresponsive flows.  For
490	   example, delay-insensitive flows using reliable delivery might have
491	   an incentive to increase rather than to decrease their sending rate
492	   in the presence of dropped packets.  Similarly, delay-sensitive flows
493	   using unreliable delivery might increase their use of FEC in response
494	   to an increased packet drop rate, increasing rather than decreasing
495	   their sending rate.  For the same reasons, we do not believe that
496	   packet dropping itself is an effective deterrent for non-compliance
497	   even in an environment of high packet drop rates.

499	   Several methods have been proposed to identify and restrict non-
500	   compliant or unresponsive flows. The addition of ECN to the network
501	   environment would not in any way increase the difficulty of designing
502	   and deploying such mechanisms. If anything, the addition of ECN to
503	   the architecture would make the job of identifying unresponsive flows
504	   slightly easier.  For example, in an ECN-Capable environment routers
505	   are not limited to information about packets that are dropped or have
506	   the CE bit set at that router itself; in such an environment routers
507	   could also take note of arriving CE packets that indicate congestion
508	   encountered by that packet earlier in the path.

510	9. Non-compliance in the Network

512	   The breakdown of effective congestion control could be caused not
513	   only by a non-compliant end-node, but also by the loss of the
514	   congestion indication in the network itself.  As one example, a rogue
515	   or broken router could "erase" the CE bit in arriving CE packets,
516	   thus preventing that indication of congestion from reaching
517	   downstream receivers.  This could cause congestion control to fail
518	   for that flow and congestion in the network to increase, so that
519	   subsequent packets of this flow are ultimately dropped as the
520	   average queue size increases at the congested gateway.
521	   Concerns regarding the loss of congestion indications from
522	   encapsulated, dropped, or corrupted packets are discussed below.

524	9.1. Encapsulated packets

526	   Some care is required to handle the CE and ECT bits appropriately
527	   when packets are encapsulated and de-encapsulated for tunnels.  When
528	   a packet is encapsulated, the following rules apply regarding the ECT
529	   bit.  First, if the ECT bit in the encapsulated ('inside') header is
530	   a 0, then the ECT bit in the encapsulating ('outside') header MUST be
531	   a 0.  If the ECT bit in the inside header is a 1, then the ECT bit in
532	   the outside header SHOULD be a 1.

534	   When a packet is de-encapsulated, the following rules apply regarding
535	   the CE bit.  If the ECT bit is a 1 in both the inside and the outside
536	   header, then the CE bit in the outside header MUST be ORed with the
537	   CE bit in the inside header.  (That is, in this case a CE bit of 1 in
538	   the outside header must be copied to the inside header.) If the ECT
539	   bit in either header is a 0, then the CE bit in the outside header is
540	   ignored.
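
The encapsulation and de-encapsulation rules above can be sketched as
two small functions.  The EcnBits type and function names are
hypothetical.  Starting the outside CE bit at 0 on encapsulation is an
assumption, since the rules above constrain only the ECT bit at that
step.

```python
from typing import NamedTuple


class EcnBits(NamedTuple):
    ect: int  # 0 or 1
    ce: int   # 0 or 1


def encapsulate(inner: EcnBits) -> EcnBits:
    """Derive the outside header's bits from the inside header.

    Inside ECT 0 -> outside ECT MUST be 0.
    Inside ECT 1 -> outside ECT SHOULD be 1 (this sketch always sets it).
    """
    return EcnBits(ect=inner.ect, ce=0)  # outside CE = 0 is an assumption


def decapsulate(outer: EcnBits, inner: EcnBits) -> EcnBits:
    """Return the inside header's bits after the tunnel header is removed."""
    if outer.ect == 1 and inner.ect == 1:
        # Both ECN-capable: OR the outside CE bit into the inside header.
        return EcnBits(ect=1, ce=inner.ce | outer.ce)
    # ECT is 0 in either header: the outside CE bit is ignored.
    return inner
```

For example, a packet marked inside the tunnel leaves it with CE set:
decapsulate(EcnBits(1, 1), EcnBits(1, 0)) yields EcnBits(ect=1, ce=1).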

542	9.2.  Dropped or Corrupted Packets

544	   An additional issue concerns a packet that has the CE bit set at one
545	   router and is dropped by a subsequent router.  For the proposed use
546	   for ECN in this document (that is, for a transport protocol such as TCP
547	   for which a dropped data packet is an indication of congestion), end
548	   nodes detect dropped data packets, and the congestion response of the
549	   end nodes to a dropped data packet is at least as strong as the
550	   congestion response to a received CE packet.

552	   However, transport protocols such as TCP do not necessarily detect
553	   all packet drops, such as the drop of a "pure" ACK packet; for
554	   example, TCP does not reduce the arrival rate of subsequent ACK
555	   packets in response to an earlier dropped ACK packet.  Any proposal
556	   for extending ECN-Capability to such packets would have to address
557	   concerns raised by CE packets that were later dropped in the network.

559	   Similarly, if a CE packet is dropped later in the network due to
560	   corruption (bit errors), the end nodes should still invoke congestion
561	   control, just as TCP would today in response to a dropped data
562	   packet. This issue of corrupted CE packets would have to be
563	   considered in any proposal for the network to distinguish between
564	   packets dropped due to corruption, and packets dropped due to
565	   congestion or buffer overflow.

567	10. A summary of related work.

569	   [Floyd94] considers the advantages and drawbacks of adding ECN to the
570	   TCP/IP architecture.  As shown in the simulation-based comparisons,
571	   one advantage of ECN is to avoid unnecessary packet drops for short
572	   or delay-sensitive TCP connections.  A second advantage of ECN is in
573	   avoiding some unnecessary retransmit timeouts in TCP.  This paper
574	   discusses in detail the integration of ECN into TCP's congestion
575	   control mechanisms.  The possible disadvantages of ECN discussed in
576	   the paper are that a non-compliant TCP connection could falsely
577	   advertise itself as ECN-capable, and that a TCP ACK packet carrying
578	   an ECN-Echo message could itself be dropped in the network.  The
579	   first of these two issues is discussed in Section 8 of this document,
580	   and the second is addressed by the proposal in Section 5.1.3 for a
581	   CWR flag in the TCP header.

583	   [CKLTZ97] reports on an experimental implementation of ECN in IPv6.
584	   The experiments include an implementation of ECN in an existing
585	   implementation of RED for FreeBSD.  A number of experiments were run
586	   to demonstrate the control of the average queue size in the router,
587	   the performance of ECN for a single TCP connection at a congested
588	   router, and fairness with multiple competing TCP connections.  One
589	   conclusion of the experiments is that dropping a packet from a bulk-
590	   data transfer degrades performance much more severely than marking a
591	   packet.

593	   Because the experimental implementation in [CKLTZ97] predates some of
594	   the developments in this document, the implementation does not
595	   conform to this document in all respects.  For example, in the
596	   experimental implementation the CWR flag is not used, but instead the
597	   TCP receiver sends the ECN-Echo bit on a single ACK packet.

599	   [K98] and [CKLT98] build on [CKLTZ97] to further analyze the benefits
600	   of ECN for TCP. The conclusions are that ECN TCP gets moderately
601	   better throughput than non-ECN TCP; that ECN TCP flows are fair
602	   towards non-ECN TCP flows; and that ECN TCP is robust with two-way
603	   traffic, congestion in both directions, and with multiple congested
604	   gateways.  Experiments with many short web transfers show that, while
605	   most of the short connections have similar transfer times with or
606	   without ECN, a small percentage of the short connections have very
607	   high transfer times for the non-ECN experiments as compared to the
608	   ECN experiments.  This increased transfer time is particularly
609	   dramatic for those short connections that have their first packet
610	   dropped in the non-ECN experiments, and that therefore have to wait
611	   six seconds for the retransmit timer to expire.

613	   The ECN Web Page [ECN] has pointers to other implementations of ECN
614	   in progress.

616	11. Conclusions

618	   Given the current effort to implement RED, we believe this is the
619	   right time for router vendors to examine how to implement congestion
620	   avoidance mechanisms that do not depend on packet drops alone.  With
621	   the increased deployment of applications and transports sensitive to
622	   the delay and loss of a single packet, depending on packet loss as a
623	   normal congestion notification mechanism appears to be insufficient
624	   (or at the very least, non-optimal).

626	12. Acknowledgements

628	   A number of people have made contributions to this internet-draft.
629	   In particular, we would like to thank Kenjiro Cho for the proposal
630	   for the TCP mechanism for negotiating ECN-Capability, Steve Blake and
631	   Kevin Fall for the material on IPv4 Header Checksum Recalculation,
632	   and Steve Bellovin, Jim Bound, Brian Carpenter, Paul Ferguson,
633	   Stephen Kent, Greg Minshall, and Vern Paxson for discussions of
634	   security issues.  We also thank the Internet End-to-End Research
635	   Group for ongoing discussions of these issues.

637	13. References

639	   [CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
640	   "Implementing Explicit Congestion Notification (ECN) in TCP over
641	   IPv6", UCLA Technical Report, December 1997, URL
642	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz".

644	   [CKLT98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
645	   "Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the
646	   L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn-
647	   ietf.ps".

649	   [ECN] "The ECN Web Page", URL "http://www-
650	   nrg.ee.lbl.gov/floyd/ecn.html".

652	   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
653	   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
654	   N.4, August 1993, p. 397-413.  URL
655	   "ftp://ftp.ee.lbl.gov/papers/early.pdf".

657	   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
658	   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
659	   URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".

661	   [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support
662	   End-to-End Congestion Control", Technical report, February 1997.  URL
663	   "ftp://ftp.ee.lbl.gov/papers/collapse.ps".

665	   [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
666	   URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
667	   ecn.

669	   [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
670	   benefits for TCP", Master's thesis, UCLA, 1998, URL
671	   "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".

673	   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
674	   SIGCOMM '97, September 1997.  URL
675	   "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".

677	   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
678	   ACM SIGCOMM '88, pp. 314-329.  URL
679	   "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".

681	   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
682	   Algorithm", Message to end2end-interest mailing list, April 1990.
683	   URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

685	   [RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the
686	   Internet Checksum", RFC 1141, January 1990.

688	   [MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
689	   Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.

691	   [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
692	   Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

694	   [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D.
695	   Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L.
696	   Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang,
697	   "Recommendations on Queue Management and Congestion Avoidance in the
698	   Internet", RFC 2309, April 1998.

700	   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
701	   Congestion Avoidance in Computer Networks", ACM Transactions on
702	   Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

704	14. Security Considerations

706	   Security considerations have been discussed in Section 9.

708	15. IPv4 Header Checksum Recalculation

710	   IPv4 header checksum recalculation is an issue with some high-end
711	   router architectures using an output-buffered switch, since most if
712	   not all of the header manipulation is performed on the input side of
713	   the switch, while the ECN decision would need to be made local to the
714	   output buffer. This is not an issue for IPv6, since there is no IPv6
715	   header checksum. The IPv4 TOS octet is the last byte of a 16-bit
716	   half-word.

718	   RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
719	   checksum after the TTL field is decremented.  The incremental
720	   updating of the IPv4 checksum after the CE bit was set would work as
721	   follows: Let HC be the original header checksum, and let HC' be the
722	   new header checksum after the CE bit has been set.  Then for header
723	   checksums calculated with one's complement subtraction, HC' would be
724	   recalculated as follows:
725	      HC' = { HC - 1     HC > 1
726	            { 0x0000     HC = 1
727	   For header checksums calculated on two's complement machines, HC'
728	   would be recalculated as follows after the CE bit was set:
729	       HC' = { HC - 1     HC > 0
730	             { 0xFFFE     HC = 0
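
As a hedged check of the two's complement form above, the following
Python compares the incremental update against a full recomputation of
the header checksum.  It assumes that the CE bit is the least
significant bit of the TOS octet (consistent with the HC - 1 update,
since the TOS octet is the low byte of its 16-bit half-word).  The
function names and the sample header bytes are illustrative.

```python
def ip_checksum(header: bytes) -> int:
    """Full Internet checksum over a header whose checksum field is zero."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back in (one's complement sum)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ce(header: bytes) -> bytes:
    """Copy of the header with the assumed CE bit (0x01 of byte 1) set."""
    return header[:1] + bytes([header[1] | 0x01]) + header[2:]


def checksum_after_ce(hc: int) -> int:
    """Incremental update: HC' = HC - 1 if HC > 0, else 0xFFFE."""
    return hc - 1 if hc > 0 else 0xFFFE


# Sample 20-byte IPv4 header, CE clear, checksum field zeroed.
header = bytes.fromhex("45020054000040004001" "0000" "c0a80001c0a800c7")
hc = ip_checksum(header)
assert checksum_after_ce(hc) == ip_checksum(set_ce(header))
```

The update only applies when the CE bit actually changes from 0 to 1;
a packet that already carries CE needs no checksum change.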

732	16. The motivation for the ECT bit.

734	   The need for the ECT bit is motivated by the fact that ECN will be
735	   deployed incrementally in an Internet where some transport protocols
736	   and routers understand ECN and some do not. With the ECT bit, the
737	   router can drop packets from flows that are not ECN-capable, but can
738	   *instead* set the CE bit in flows that *are* ECN-capable. Because
739	   the ECT bit allows an end node to have the CE bit set in a packet
740	   *instead* of having the packet dropped, an end node might have some
741	   incentive to deploy ECN.

743	   If there were no ECT indication, then the router would have to set the
744	   CE bit for packets from both ECN-capable and non-ECN-capable flows.
745	   In this case, there would be no incentive for end-nodes to deploy
746	   ECN, and no viable path of incremental deployment from a non-ECN
747	   world to an ECN-capable world.  Consider the first stages of such an
748	   incremental deployment, where a subset of the flows are ECN-capable.
749	   At the onset of congestion, when the packet dropping/marking rate
750	   would be low, routers would only set CE bits, rather than dropping
751	   packets.  However, only those flows that are ECN-capable would
752	   understand and respond to CE packets. The result is that the ECN-
753	   capable flows would back off, and the non-ECN-capable flows would be
754	   unaware of the ECN signals and would continue to open their
755	   congestion windows.

757	   In this case, there are two possible outcomes: (1) the ECN-capable
758	   flows back off, the non-ECN-capable flows get all of the bandwidth,
759	   and congestion remains mild, or (2) the ECN-capable flows back off,
760	   the non-ECN-capable flows don't, and congestion increases until the
761	   router transitions from setting the CE bit to dropping packets.
762	   While this second outcome evens out the fairness, the ECN-capable
763	   flows would still receive little benefit from being ECN-capable,
764	   because the increased congestion would drive the router to packet-
765	   dropping behavior.

767	   A flow that advertises itself as ECN-Capable but does not respond to
768	   CE bits is functionally equivalent to a flow that turns off
769	   congestion control, as discussed in Sections 8 and 9.

771	   Thus, in a world where a subset of the flows are ECN-capable, but
772	   where ECN-capable flows have no mechanism for indicating that fact to
773	   the routers, there would be less effective and less fair congestion
774	   control in the Internet, resulting in a strong incentive for end
775	   nodes not to deploy ECN.

777	17. Why use two bits in the IP header?

779	   Given the need for an ECT indication in the IP header, there still
780	   remains the question of whether the ECT (ECN-Capable Transport) and
781	   CE (Congestion Experienced) indications should be overloaded on a
782	   single bit.  This overloaded-one-bit alternative, explored in
783	   [Floyd94], would involve a single bit with two values.  One value,
784	   "ECT and not CE", would represent an ECN-Capable Transport, and the
785	   other value, "CE or not ECT", would represent either Congestion
786	   Experienced or a non-ECN-Capable transport.

788	   There is only one inherent functional difference between the one-bit
789	   and two-bit implementations.  This functional difference concerns
790	   packets that traverse multiple congested routers.  Consider a CE
791	   packet that arrives at a second congested router, and is selected by
792	   the active queue management at that router for either marking or
793	   dropping.  In the one-bit implementation, the second congested router
794	   has no choice but to drop the CE packet, because it cannot
795	   distinguish between a CE packet and a non-ECT packet.  In the two-bit
796	   implementation, the second congested router has the choice of either
797	   dropping the CE packet, or of leaving it alone with the CE bit set.
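
The difference at a second congested router can be illustrated with
the sketch below, which covers only packets already selected by active
queue management for marking or dropping.  The function names and the
returned strings are hypothetical.

```python
def second_router_two_bit(ect: bool, ce: bool) -> str:
    """Treatment of a selected packet with separate ECT and CE bits."""
    if not ect:
        return "drop"                    # not ECN-capable
    if ce:
        return "leave CE set (or drop)"  # already marked upstream
    return "set CE"


def second_router_one_bit(ect_and_not_ce: bool) -> str:
    """Treatment of a selected packet with one overloaded bit.

    ect_and_not_ce is True for the "ECT and not CE" value and False
    for the "CE or not ECT" value.
    """
    if ect_and_not_ce:
        return "set CE"  # flip the bit to "CE or not ECT"
    # A CE packet and a non-ECT packet look identical here.
    return "drop"
```

Only the two-bit form lets an already-marked packet survive the second
congested router.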

799	   Another difference between the one-bit and two-bit implementations
800	   comes from the fact that with the one-bit implementation, receivers
801	   in a single flow cannot distinguish between CE and non-ECT packets.
802	   Thus, in the one-bit implementation an ECN-capable data sender would
803	   have to unambiguously indicate to the receiver or receivers whether
804	   each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
805	   possibility would be for the sender to indicate in the transport
806	   header whether the packet was sent as ECN-Capable.  A second
807	   possibility that would involve a functional limitation for the one-
808	   bit implementation would be for the sender to unambiguously indicate
809	   that it was going to send *all* of its packets as ECN-Capable or as
810	   non-ECN-Capable.  For a multicast transport protocol, this
811	   unambiguous indication would have to be apparent to receivers joining
812	   an on-going multicast session.

814	   Another advantage of the two-bit approach is that it is somewhat more
815	   robust.  The most critical issue, discussed in Section 8, is that the
816	   default indication should be that of a non-ECN-Capable transport.  In
817	   a two-bit implementation, this requirement for the default value
818	   simply means that the ECT bit should be `OFF' by default.  In the
819	   one-bit implementation, this means that the single overloaded bit
820	   should by default be in the "CE or not ECT" position.  This is less
821	   clear and straightforward, and possibly more open to incorrect
822	   implementations either in the end nodes or in the routers.

824	   In summary, while a one-bit implementation is possible, it has the
825	   following significant limitations relative to the two-bit
826	   implementation.  First, the one-bit implementation has
827	   more limited functionality for the treatment of CE packets at a
828	   second congested router.  Second, the one-bit implementation requires
829	   either that extra information be carried in the transport header of
830	   packets from ECN-Capable flows (to convey the functionality of the
831	   second bit elsewhere, namely in the transport header), or that
832	   senders in ECN-Capable flows accept the limitation that receivers
833	   must be able to determine a priori which packets are ECN-Capable and
834	   which are not ECN-Capable. Third, the one-bit implementation is
835	   possibly more open to errors from faulty implementations that choose
836	   the wrong default value for the ECN bit.  We believe that the use of
837	   the extra bit in the IP header for the ECT-bit is extremely valuable
838	   to overcome these limitations.

840	AUTHORS' ADDRESSES

842	   K. K. Ramakrishnan
843	   AT&T Labs. Research
844	   Phone: +1 (973) 360-8766
845	   Email: kkrama@research.att.com
846	   URL: http://www.research.att.com/info/kkrama

848	   Sally Floyd
849	   Lawrence Berkeley National Laboratory
850	   Phone: +1 (510) 486-7518
851	   Email: floyd@ee.lbl.gov
852	   URL: http://www-nrg.ee.lbl.gov/floyd/

854	   This draft was created in July 1998.
855	   It expires January 1999.