< draft-kksjf-ecn-02.txt   draft-kksjf-ecn-03.txt >
Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        AT&T Labs Research
draft-kksjf-ecn-02.txt / draft-kksjf-ecn-03.txt              Sally Floyd
                                                                    LBNL
September 1998 (-02) / October 1998 (-03)
Expires: March 1999 (-02) / April 1999 (-03)
A Proposal to add Explicit Congestion Notification (ECN) to IP
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
skipping to change at page 2, line 6 (-02) / page 2, line 5 (-03)
overflows, routers are no longer limited to packet drops as an
indication of congestion. Routers could instead set a Congestion
Experienced (CE) bit in the packet header of packets from ECN-capable
transport protocols. We describe when the CE bit would be set in the
routers, and describe what modifications would be needed to TCP to
make it ECN-capable. Modifications to other transport protocols
(e.g., unreliable unicast or multicast, reliable multicast, other
reliable unicast transport protocols) could be considered as those
protocols are developed and advance through the standards process.
1. Conventions and Acronyms [new in -03; Section 1 of -02 is the Introduction]
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [B97].
2. Introduction [Section 1 in -02]
TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by
end-systems probing for the network state, by gradually increasing the
load on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
loss as an indication of congestion in the network is appropriate for
pure best-effort data carried by TCP which has little or no
skipping to change at page 2, line 49 (-02) / page 3, line 7 (-03)
RFC 2309 [RFC2309]. Active queue management avoids some of the bad
properties of dropping on queue overflow, including the undesirable
synchronization of loss across multiple flows. More importantly,
active queue management means that transport protocols with
congestion control (e.g., TCP) do not have to rely on buffer overflow
as the only indication of congestion. This can reduce unnecessary
queueing delay for all traffic sharing that queue.
Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router to
separate policies of queueing or dropping packets from the policies
for indicating congestion. Thus, active queue management allows
routers to use the Congestion Experienced (CE) bit in a packet header
as an indication of congestion, instead of relying solely on packet
drops.
3. Assumptions and General Principles [Section 2 in -02]
In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.
(1) Congestion may persist over different time-scales. The time
scales that we are concerned with are congestion events that may last
longer than a round-trip time.
(2) The number of packets in an individual flow (e.g., TCP connection
or an exchange using UDP) may range from a small number of packets to
quite a large number. We are interested in managing the congestion
caused by flows that send enough packets so that they are still
active when network feedback reaches them.
(3) New mechanisms for congestion control and avoidance need to
co-exist and cooperate with existing mechanisms for congestion control.
In particular, new mechanisms have to co-exist with TCP's current
methods of adapting to congestion and with routers' current practice
of dropping packets in periods of congestion.
(4) Because ECN is likely to be adopted gradually, accommodating
migration is essential. Some routers may still only drop packets to
indicate congestion, and some end-systems may not be ECN-capable. The
most viable strategy is one that accommodates incremental deployment
without having to resort to "islands" of ECN-capable and
non-ECN-capable environments.
(5) Asymmetric routing is likely to be a normal occurrence in the
Internet. The path (sequence of links and routers) followed by data
packets may be different from the path followed by the acknowledgment
packets in the reverse direction.
(6) Many routers process the "regular" headers in IP packets more
efficiently than they process the header information in IP options.
This suggests keeping congestion experienced information in the
regular headers of an IP packet.
(7) It must be recognized that not all end-systems will cooperate in
mechanisms for congestion control. However, new mechanisms shouldn't
make it easier for TCP applications to disable TCP congestion
control. The benefit of lying about participating in new mechanisms
such as ECN-capability should be small.
4. Random Early Detection (RED) [Section 3 in -02]
Random Early Detection (RED) is a mechanism for active queue
management that has been proposed to detect incipient congestion
[FJ93], and is currently being deployed in the Internet backbone
[RFC2309]. Although RED is meant to be a general mechanism using one
of several alternatives for congestion indication, in the current
environment of the Internet RED is restricted to using packet drops
as a mechanism for congestion indication. RED drops packets based on
the average queue length exceeding a threshold, rather than only when
the queue overflows. However, when RED drops packets before the
skipping to change at page 4, line 20 (-02) / page 4, line 27 (-03)
discard the packet.
RED could set a Congestion Experienced (CE) bit in the packet header
instead of dropping the packet, if such a bit was provided in the IP
header and understood by the transport protocol. The use of the CE
bit would allow the receiver(s) to receive the packet, avoiding the
potential for excessive delays due to retransmissions after packet
losses. We use the term 'CE packet' to denote a packet that has the
CE bit set.
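The RED behavior described above (acting on the *average* queue length rather than only on overflow) might be sketched as follows. This is a simplified, hypothetical rendering of RED [FJ93]. The parameter names and values (`min_th`, `max_th`, `max_p`, `w_q`) are illustrative assumptions, and the count-based spreading of early signals and the idle-period adjustment of the average are omitted. The sketch only decides *whether* to signal incipient congestion. Whether that signal becomes a drop or a CE mark is a separate decision.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RedQueue:
    """Simplified RED early-congestion detector (hypothetical parameters)."""
    min_th: float = 5.0   # lower threshold on the average queue, in packets
    max_th: float = 15.0  # upper threshold on the average queue, in packets
    max_p: float = 0.1    # early-signal probability as avg approaches max_th
    w_q: float = 0.002    # weight of the exponentially weighted moving average
    avg: float = 0.0      # current average queue length
    rng: random.Random = field(default_factory=random.Random)

    def update_avg(self, instantaneous_len: int) -> float:
        # avg <- (1 - w_q) * avg + w_q * q
        self.avg = (1 - self.w_q) * self.avg + self.w_q * instantaneous_len
        return self.avg

    def early_signal(self, instantaneous_len: int) -> bool:
        """True if RED would signal incipient congestion for this arrival."""
        avg = self.update_avg(instantaneous_len)
        if avg < self.min_th:
            return False
        if avg >= self.max_th:
            return True
        # Between the thresholds, signal with probability rising linearly to max_p.
        p = self.max_p * (avg - self.min_th) / (self.max_th - self.min_th)
        return self.rng.random() < p
```

Because the decision depends on the smoothed average, a short burst that briefly fills the queue does not by itself trigger early signals.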
5. Explicit Congestion Notification in IP [Section 4 in -02]
We propose that the Internet provide a congestion indication for
incipient congestion (as in RED and earlier work [RJ90]) where the
notification can sometimes be through marking packets rather than
dropping them. This would require an ECN field in the IP header with
two bits. The ECN-Capable Transport (ECT) bit would be set by the
data sender to indicate that the end-points of the transport protocol
are ECN-capable. The CE bit would be set by the router to indicate
congestion to the end nodes. Routers that have a packet arriving at
a full queue would drop the packet, just as they do now.
[new in -03] Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE
bit. The IPv4 TOS octet corresponds to the Traffic Class octet in
IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6
Traffic Class octet are intended to be superseded by the DS
(Differentiated Services) Field [RFC-DIFFSERV?]. Bits 6 and 7 are
listed in [RFC-DIFFSERV?] as Currently Unused. Section 19 gives a
brief history of the TOS octet.
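The following sketch shows the bit layout just described. It assumes IETF bit numbering, in which bit 0 is the most significant bit of the octet, so bits 6 and 7 correspond to the masks 0x02 and 0x01. The helper names are hypothetical. The sketch follows the ECT/CE assignment proposed in this draft. The later standard, RFC 3168, instead treats the two bits as a single codepoint field.

```python
def ietf_bit_mask(bit: int, width: int = 8) -> int:
    """Mask for IETF-numbered bit `bit` (0 = most significant) in a field."""
    return 1 << (width - 1 - bit)


ECT_MASK = ietf_bit_mask(6)  # 0x02: ECN-Capable Transport
CE_MASK = ietf_bit_mask(7)   # 0x01: Congestion Experienced


def is_ect(tos: int) -> bool:
    return bool(tos & ECT_MASK)


def is_ce(tos: int) -> bool:
    return bool(tos & CE_MASK)


def set_ce(tos: int) -> int:
    """Set CE, leaving the other six bits of the octet untouched."""
    return tos | CE_MASK


# Example: an ECT packet whose upper six bits carry some other marking.
tos = 0xB8 | ECT_MASK   # 0xBA
marked = set_ce(tos)    # 0xBB: only the low bit changes
```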
[new in -03] Because of the unstable history of the TOS octet, the use of the ECN
field as specified in this document cannot be guaranteed to be
backwards compatible with all past uses of these two bits. The
potential dangers of this lack of backwards compatibility are
discussed in Section 19.
Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
dropped packet.
[-02] For example, for TCP the source TCP halves its congestion window
"cwnd" in response to an ECN indication received by the data receiver.
[-03] For example, for ECN-Capable TCP the source TCP is required to
halve its congestion window for any window of data containing either
a packet drop or an ECN indication. However, we would like to point
out some notable exceptions in the reaction of the source TCP, related
to following the shorter-time-scale details of particular
implementations of TCP. For TCP's response to an ECN indication, we
do not recommend such behavior as the slow-start of Tahoe TCP in
response to a packet drop, or Reno TCP's wait of roughly half a
round-trip time during Fast Recovery.
One reason for requiring that the congestion-control response to the
CE packet be essentially the same as the response to a dropped packet
is to accommodate the incremental deployment of ECN in both
end-systems and in routers. Some routers may drop ECN-Capable packets
(e.g., using the same RED policies for congestion detection) while
other routers set the CE bit, for equivalent levels of congestion.
Similarly, a router might drop a non-ECN-Capable packet but set the
CE bit in an ECN-Capable packet, for equivalent levels of congestion.
Different congestion control responses to a CE bit indication and to
a packet drop could result in unfair treatment for different flows.
An additional requirement is that the end-systems should react to
congestion at most once per window of data (i.e., at most once per
roundtrip time), to avoid reacting multiple times to multiple
indications of congestion within a roundtrip time.
For a router, the CE bit of an ECN-Capable packet should only be set
if the router would otherwise have dropped the packet as an
indication of congestion to the end nodes. When the router's buffer
is not yet full and the router is prepared to drop a packet to inform
end nodes of incipient congestion, the router should first check to
see if the ECT bit is set in that packet's IP header. If so, then
instead of dropping the packet, the router MAY set the CE bit in the
IP header.
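The router behavior described in this section might be sketched as below. A full queue means a drop. On incipient congestion, the router sets CE in ECT packets and drops all other packets. An existing CE bit is never cleared. The `early_signal` argument is a hypothetical stand-in for whatever active queue management algorithm the router uses, such as RED. The enum and function names are also illustrative. The router MAY mark, and this sketch always chooses to.

```python
from __future__ import annotations

from enum import Enum

ECT_MASK = 0x02  # bit 6 of the TOS octet (this draft's layout)
CE_MASK = 0x01   # bit 7


class Action(Enum):
    ENQUEUE = "enqueue"
    MARK_CE = "mark"  # set CE, then enqueue the packet as usual
    DROP = "drop"


def router_action(tos: int, queue_full: bool,
                  early_signal: bool) -> tuple[Action, int]:
    """Decide what to do with one arriving packet; returns (action, TOS)."""
    if queue_full:
        return Action.DROP, tos               # buffer overflow: drop, as today
    if early_signal:                          # AQM wants to signal congestion
        if tos & ECT_MASK:
            return Action.MARK_CE, tos | CE_MASK  # mark instead of dropping
        return Action.DROP, tos               # not ECN-Capable: drop
    return Action.ENQUEUE, tos                # an existing CE bit is left as is
```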
An environment where all end nodes were ECN-Capable could allow new
criteria to be developed for setting the CE bit, and new congestion
control mechanisms for end-node reaction to CE packets. However,
this is a research issue, and as such is not addressed in this
document.
When a CE packet is received by a router, the CE bit is left
unchanged, and the packet is transmitted as usual. When severe
congestion has occurred and the router's queue is full, then the
router has no choice but to drop some packet when a new packet
arrives. We anticipate that such packet losses will become
relatively infrequent when a majority of end-systems become
ECN-Capable and participate in TCP or other compatible congestion
control mechanisms. In an adequately-provisioned network in such an
ECN-Capable environment, packet losses should occur primarily during
transients or in the presence of non-cooperating sources.
We expect that routers will set the CE bit in response to incipient
congestion as indicated by the average queue size, using the RED
algorithms suggested in [FJ93, RFC2309]. To the best of our
knowledge, this is the only proposal currently under discussion in
the IETF for routers to drop packets proactively, before the buffer
overflows. However, this document does not attempt to specify a
particular mechanism for active queue management, leaving that
endeavor, if needed, to other areas of the IETF. While ECN is
inextricably tied up with active queue management at the router, the
reverse does not hold; active queue management mechanisms have been
developed and deployed independently from ECN, using packet drops as
indications of congestion in the absence of ECN in the IP
architecture.
6. Support from the Transport Protocol [Section 5 in -02]
ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header. First,
the transport protocol might require negotiation between the endpoints
during setup to determine that all of the endpoints are ECN-capable,
so that the sender can set the ECT bit in transmitted packets.
Second, the transport protocol must be capable of reacting
appropriately to the receipt of CE packets. This reaction could be
in the form of the data receiver informing the data sender of the
received CE packet (e.g., TCP), of the data receiver unsubscribing to
a layered multicast group (e.g., RLM [MJV96]), or of some other
action that ultimately reduces the arrival rate of that flow to that
receiver.
This document only addresses the addition of ECN Capability to TCP,
leaving issues of ECN and other transport protocols to further
research. For TCP, ECN requires three new mechanisms: negotiation
between the endpoints during setup to determine if they are both
ECN-capable; an ECN-Echo flag in the TCP header so that the data
receiver can inform the data sender when a CE packet has been
received; and a Congestion Window Reduced (CWR) flag in the TCP
header so that the data sender can inform the data receiver that the
congestion window has been reduced. The support required from other
transport protocols is likely to be different, particularly for
unreliable or reliable multicast transport protocols, and will have
to be determined as other transport protocols are brought to the IETF
for standardization.
6.1. TCP [Section 5.1 in -02]
The following sections describe in detail the proposed use of ECN in
TCP. This proposal is described in essentially the same form in
[Floyd94]. We assume that the source TCP uses the standard congestion
control algorithms of Slow-start, Fast Retransmit and Fast Recovery
[RFC 2001].
This proposal specifies two new flags in the Reserved field of the
TCP header. The TCP mechanism for negotiating ECN-Capability uses
the ECN-Echo flag in the TCP header. (This was called the ECN Notify
flag in some earlier documents.) Bit 9 in the Reserved field of the
TCP header is designated as the ECN-Echo flag. [new in -03] The
location of the 6-bit Reserved field in the TCP header is shown in
Figure 3 of RFC 793 [RFC793].
To enable the TCP receiver to determine when to stop setting the
ECN-Echo flag, we introduce a second new flag in the TCP header, the
Congestion Window Reduced (CWR) flag. The CWR flag is assigned to
Bit 8 in the Reserved field of the TCP header.
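The flag positions above can be turned into masks as sketched below. This assumes that "bit 8" and "bit 9" count from 0 at the most significant bit of the 16-bit word at octets 12-13 of the TCP header. That word holds Data Offset, the Reserved field, and the control bits in Figure 3 of RFC 793. Under that reading, CWR and ECN-Echo are the top two bits of octet 13. The constant and function names are hypothetical.

```python
def ietf_bit_mask(bit: int, width: int) -> int:
    """Mask for IETF-numbered bit `bit` (0 = most significant) in a field."""
    return 1 << (width - 1 - bit)


# 16-bit word at octets 12-13: Data Offset (bits 0-3), Reserved (bits 4-9),
# URG ACK PSH RST SYN FIN (bits 10-15), following RFC 793 Figure 3.
CWR = ietf_bit_mask(8, 16)   # 0x0080: Congestion Window Reduced
ECE = ietf_bit_mask(9, 16)   # 0x0040: ECN-Echo
ACK = ietf_bit_mask(11, 16)  # 0x0010
SYN = ietf_bit_mask(14, 16)  # 0x0002


def flags_word(tcp_header: bytes) -> int:
    """Return the 16-bit word holding Data Offset, Reserved and control bits."""
    return int.from_bytes(tcp_header[12:14], "big")


def has_flag(tcp_header: bytes, mask: int) -> bool:
    return bool(flags_word(tcp_header) & mask)


# Example: a 20-byte header for a SYN carrying both ECN-Echo and CWR.
hdr = bytearray(20)
hdr[12] = 5 << 4                # Data Offset = 5 32-bit words
hdr[13] = 0x80 | 0x40 | 0x02    # CWR, ECN-Echo, SYN
```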
The use of these flags is described in the sections below.
6.1.1. TCP Initialization [Section 5.1.1 in -02]
In the TCP connection setup phase, the source and destination TCPs
exchange information about their desire and/or capability to use ECN.
Subsequent to the completion of this negotiation, the TCP sender sets
the ECT bit in the IP header of data packets [-02: "of packets"] to
indicate to the network that the transport is capable and willing to
participate in ECN for this packet. This will indicate to the routers
that they may mark this packet with the CE bit, if they would like to
use that as a method of congestion notification. If the TCP
connection does not wish to use ECN notification for a particular
packet, the sending TCP sets the ECT bit equal to 0 (i.e., not set),
and the TCP receiver ignores the CE bit in the received packet.
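Here is a minimal sketch of the per-packet rules above. It assumes the TOS-octet masks ECT = 0x02 and CE = 0x01 (bits 6 and 7) and follows the -03 wording, which sets ECT only on data packets. The function names and the `ecn_negotiated`/`is_data` inputs are hypothetical. Clearing CE on transmission is also an assumption, since only routers are described as setting it.

```python
ECT_MASK = 0x02  # bit 6 of the IPv4 TOS octet (this draft's layout)
CE_MASK = 0x01   # bit 7


def outgoing_tos(base_tos: int, ecn_negotiated: bool, is_data: bool) -> int:
    """TOS octet for an outgoing segment of this connection."""
    if ecn_negotiated and is_data:
        # ECT = 1; CE left clear (assumption: only routers set CE).
        return (base_tos | ECT_MASK) & ~CE_MASK
    # ECT = 0: this packet does not use ECN notification.
    return base_tos & ~(ECT_MASK | CE_MASK)


def ack_needs_ecn_echo(received_tos: int) -> bool:
    """Receiver: CE counts only if the packet was sent with ECT set."""
    return bool(received_tos & ECT_MASK) and bool(received_tos & CE_MASK)
```

When `ack_needs_ecn_echo` is true, the receiver would set the ECN-Echo flag on its outgoing ACK, as described in the following paragraphs.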
When a node sends a TCP SYN packet, it may set the ECN-Echo and CWR
flags in the TCP header. For a SYN packet, the setting of both the
ECN-Echo and CWR flags is defined as an indication that the sending
TCP is ECN-Capable, rather than as an indication of congestion or of
response to congestion.
[-02] More precisely, a SYN packet with both the ECN-Echo and CWR
flags set indicates that the TCP implementation transmitting the SYN
packet will respond to incoming data packets that have the CE bit set
in the IP header by setting the ECN-Echo flag in outgoing TCP
Acknowledgement (ACK) packets.
[-03] More precisely, a SYN packet with both the ECN-Echo and CWR
flags set indicates that the TCP implementation transmitting the SYN
packet will participate in ECN as both a sender and receiver. As a
receiver, it will respond to incoming data packets that have the CE
bit set in the IP header by setting the ECN-Echo flag in outgoing TCP
Acknowledgement (ACK) packets. As a sender, it will respond to
incoming packets that have the ECN-Echo flag set by reducing the
congestion window when appropriate.
When a node sends a SYN-ACK packet, it may set the ECN-Echo flag, but
it does not set the CWR flag. For a SYN-ACK packet, the pattern of
the ECN-Echo flag set and the CWR flag not set in the TCP header is
defined as an indication that the TCP transmitting the SYN-ACK packet
is ECN-Capable.
There is the question of why we chose to have the TCP sending the SYN There is the question of why we chose to have the TCP sending the SYN
set two ECN-related flags in the Reserved field of the TCP header for set two ECN-related flags in the Reserved field of the TCP header for
the SYN packet, while the responding TCP sending the SYN-ACK sets the SYN packet, while the responding TCP sending the SYN-ACK sets
only one ECN-related flag in the SYN-ACK packet? This asymmetry is only one ECN-related flag in the SYN-ACK packet. This asymmetry is
necessary for the robust negotiation of ECN-capability with deployed necessary for the robust negotiation of ECN-capability with deployed
TCP implementations. There exists at least one TCP implementation in TCP implementations. There exists at least one TCP implementation in
which TCP receivers set the Reserved field of the TCP header in ACK which TCP receivers set the Reserved field of the TCP header in ACK
packets (and hence the SYN-ACK) simply to reflect the Reserved field packets (and hence the SYN-ACK) simply to reflect the Reserved field
of the TCP header in the received data packet. Because the TCP SYN of the TCP header in the received data packet. Because the TCP SYN
packet sets the ECN-Echo and CWR flags to indicate ECN-capability, packet sets the ECN-Echo and CWR flags to indicate ECN-capability,
while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP
correctly interprets a receiver's reflection of its own flags in the correctly interprets a receiver's reflection of its own flags in the
Reserved field as an indication that the receiver is not ECN-capable. Reserved field as an indication that the receiver is not ECN-capable.
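The asymmetric SYN/SYN-ACK negotiation can be summarized as a small classifier. This is a minimal sketch: the `EcnFlags` type and the function names are hypothetical. Only the two ECN-related bits taken from the Reserved field are modeled, and no real TCP stack API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcnFlags:
    """The two ECN-related bits taken from the TCP header's Reserved field."""
    ece: bool  # ECN-Echo
    cwr: bool  # Congestion Window Reduced


def syn_flags(ecn_capable: bool) -> EcnFlags:
    # An ECN-Capable initiator sets both ECN-Echo and CWR on its SYN.
    return EcnFlags(ece=ecn_capable, cwr=ecn_capable)


def syn_ack_flags(syn: EcnFlags, ecn_capable: bool) -> EcnFlags:
    # An ECN-Capable responder answering an ECN-setup SYN sets ECN-Echo only.
    if ecn_capable and syn.ece and syn.cwr:
        return EcnFlags(ece=True, cwr=False)
    return EcnFlags(ece=False, cwr=False)


def syn_ack_signals_ecn(syn_ack: EcnFlags) -> bool:
    # Only "ECN-Echo set, CWR clear" counts. A stack that merely reflects the
    # SYN's Reserved bits returns both flags set, which is rejected here.
    return syn_ack.ece and not syn_ack.cwr
```

For example, `syn_ack_signals_ecn(syn_flags(True))` models a receiver that simply reflects the SYN's flags and evaluates to False. In contrast, `syn_ack_signals_ecn(syn_ack_flags(syn_flags(True), True))` evaluates to True.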
5.1.2. The TCP Sender 6.1.2. The TCP Sender
For a TCP connection using ECN, data packets are transmitted with the For a TCP connection using ECN, data packets are transmitted with the
ECT bit set in the IP header (set to a "1"). If the sender receives ECT bit set in the IP header (set to a "1"). If the sender receives
an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag
set in the TCP header), then the sender knows that congestion was set in the TCP header), then the sender knows that congestion was
encountered in the network on the path from the sender to the encountered in the network on the path from the sender to the
receiver. The indication of congestion should be treated just as a receiver. The indication of congestion should be treated just as a
congestion loss in non-ECN-Capable TCP. That is, the TCP source congestion loss in non-ECN-Capable TCP. That is, the TCP source
halves the congestion window "cwnd" and reduces the slow start halves the congestion window "cwnd" and reduces the slow start
threshold "ssthresh". The sending TCP does NOT increase the threshold "ssthresh". The sending TCP does NOT increase the
congestion window in response to the receipt of an ECN-Echo ACK congestion window in response to the receipt of an ECN-Echo ACK
packet. packet.
A critical condition is that TCP does not react to congestion A critical condition is that TCP does not react to congestion
indications more than once every window of data (or more loosely, indications more than once every window of data (or more loosely,
more than once every round-trip time). That is, the TCP sender's more than once every round-trip time). That is, the TCP sender's
congestion window should be reduced only once in response to a series congestion window should be reduced only once in response to a series
of dropped and/or CE packets from a single window of data. In of dropped and/or CE packets from a single window of data. In
addition, the TCP source should not decrease the slow-start addition, the TCP source should not decrease the slow-start
threshold, ssthresh, if it has been decreased within the last round threshold, ssthresh, if it has been decreased within the last round
trip time. However, if any retransmitted packets are dropped or have trip time. However, if any retransmitted packets are dropped or have
the CE bit set, then this is interpreted by the source TCP as a new the CE bit set, then this is interpreted by the source TCP as a new
instance of congestion. instance of congestion.
[Floyd94] discusses this further, and [Floyd98] includes a validation After the source TCP reduces its congestion window in response to a
test in the ns simulator illustrating a wide range of ECN scenarios. CE packet, incoming acknowledgements that continue to arrive can
These scenarios include the following: an ECN followed by another "clock out" outgoing packets as allowed by the reduced congestion
ECN, a Fast Retransmit, or a Retransmit Timeout; and a Retransmit window. If the congestion window consists of only one MSS (maximum
Timeout or a Fast Retransmit followed by an ECN. segment size), and the sending TCP receives an ECN-Echo ACK packet,
then the sending TCP should in principle still halve its congestion
window. However, the value of the congestion window is bounded below
by a value of one MSS. If the sending TCP were to continue to send
using a congestion window of 1 MSS, it would transmit one packet per
round-trip time. We believe it is desirable to reduce the sending
rate of the TCP sender even further on receipt of an ECN-Echo packet
when the congestion window is one MSS. We use the retransmit timer as
a means to reduce the rate further in this circumstance. Therefore,
the sending TCP should also reset the retransmit timer on receiving
the ECN-Echo packet when the congestion window is one MSS. The
sending TCP will then be able to send a new packet when the
retransmit timer expires.
When the TCP sender reduces its congestion window in response to an [Floyd94] discusses TCP's response to ECN in more detail. [Floyd98]
ECN-Echo ACK packet, there is no need for the sender to slow-start discusses the validation test in the ns simulator, which illustrates
(as in Tahoe TCP in response to a packet drop) or to stop sending a wide range of ECN scenarios. These scenarios include the following:
packets for a period of time to allow the queue to dissipate (as in an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
Reno TCP for roughly half a round-trip time during Fast Recovery). Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
The CE packet in the forward direction does not indicate the imminent ECN, and a congestion window of one packet followed by an ECN.
possibility of buffer overflow requiring an urgent source action to
reduce the load dramatically. Incoming acknowledgements that
continue to arrive can "clock out" outgoing packets as allowed by the
reduced congestion window.
TCP follows existing algorithms for sending data packets in response TCP follows existing algorithms for sending data packets in response
to incoming ACKs, multiple duplicate acknowledgements, or retransmit to incoming ACKs, multiple duplicate acknowledgements, or retransmit
timeouts [RFC2001]. timeouts [RFC2001].
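The sender behavior above can be sketched as a small state object. It treats an ECN-Echo ACK as a congestion loss, reacts at most once per window of data, never grows cwnd on an ECN-Echo ACK, resets the retransmit timer when cwnd is already one MSS, and sets CWR on the first new data packet after a reduction (as the receiver subsection specifies). This is a sketch under several assumptions. `EcnSender`, `recover` and `timer_resets` are hypothetical names. The "once per window" test compares the cumulative ACK with `snd_nxt` recorded at the last reduction. The 2-MSS floor on ssthresh and the growth rules for ordinary ACKs are borrowed from RFC 2001. Retransmissions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Minimal sketch of an ECN-capable TCP sender (sizes in bytes)."""
    mss: int
    cwnd: int
    ssthresh: int
    snd_nxt: int = 0            # next new sequence number to send (ECT set)
    recover: int = -1           # hypothetical: snd_nxt at the last reduction
    pending_cwr: bool = False   # set CWR on the next new data packet
    timer_resets: int = 0       # retransmit-timer resets, counted for illustration

    def send_data(self, length: int) -> tuple[int, bool]:
        """Send new data; returns (sequence number, CWR flag)."""
        seq, cwr = self.snd_nxt, self.pending_cwr
        self.snd_nxt += length
        self.pending_cwr = False
        return seq, cwr

    def on_ack(self, ack_no: int, ece: bool) -> None:
        if ece:
            # ACKs at or below `recover` belong to the window already reduced for.
            if ack_no > self.recover:
                self._reduce_for_ecn()
            return  # never grow cwnd on an ECN-Echo ACK
        # Ordinary ACK: RFC 2001 slow start / congestion avoidance growth.
        if self.cwnd < self.ssthresh:
            self.cwnd += self.mss
        else:
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def _reduce_for_ecn(self) -> None:
        at_one_mss = self.cwnd <= self.mss
        self.ssthresh = max(self.cwnd // 2, 2 * self.mss)  # assumed RFC 2001 floor
        self.cwnd = max(self.cwnd // 2, self.mss)          # bounded below by 1 MSS
        if at_one_mss:
            # cwnd cannot shrink further, so restart the retransmit timer;
            # the next new packet waits for the timer to expire.
            self.timer_resets += 1
        self.recover = self.snd_nxt
        self.pending_cwr = True
```

Recording `snd_nxt` at the moment of reduction is one simple way to define "a window of data." An ECN-Echo for any packet sent before the reduction is absorbed, and only ACKs for data sent afterwards can trigger another halving.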
5.1.3. The TCP Receiver 6.1.3. The TCP Receiver
When TCP receives a CE data packet at the destination end-system, the When TCP receives a CE data packet at the destination end-system, the
TCP data receiver sets the ECN-Echo flag in the TCP header of the TCP data receiver sets the ECN-Echo flag in the TCP header of the
subsequent ACK packet. If there is any ACK withholding implemented, subsequent ACK packet. If there is any ACK withholding implemented,
as in current "delayed-ACK" TCP implementations where the TCP as in current "delayed-ACK" TCP implementations where the TCP
receiver can send an ACK for two arriving data packets, then the ECN- receiver can send an ACK for two arriving data packets, then the
Echo flag in the ACK packet will be set to the OR of the CE bits of ECN-Echo flag in the ACK packet will be set to the OR of the CE bits
all of the data packets being acknowledged. That is, if any of the of all of the data packets being acknowledged. That is, if any of
received data packets are CE packets, then the returning ACK has the the received data packets are CE packets, then the returning ACK has
ECN-Echo flag set. the ECN-Echo flag set.
To provide robustness against the possibility of a dropped ACK packet To provide robustness against the possibility of a dropped ACK packet
carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo
flag in a series of ACK packets. The TCP receiver uses the CWR flag flag in a series of ACK packets. The TCP receiver uses the CWR flag
to determine when to stop setting the ECN-Echo flag. to determine when to stop setting the ECN-Echo flag.
When an ECN-Capable TCP reduces its congestion window for any reason When an ECN-Capable TCP reduces its congestion window for any reason
(because of a retransmit timeout, a Fast Retransmit, or in response (because of a retransmit timeout, a Fast Retransmit, or in response
to an ECN Notification), the TCP sets the CWR flag in the TCP header to an ECN Notification), the TCP sets the CWR flag in the TCP header
of the first data packet sent after the window reduction. If that of the first data packet sent after the window reduction. If that
data packet is dropped in the network, then the sending TCP will have data packet is dropped in the network, then the sending TCP will have
to reduce the congestion window again and retransmit the dropped to reduce the congestion window again and retransmit the dropped
packet. Thus, the Congestion Window Reduced message is reliably packet. Thus, the Congestion Window Reduced message is reliably
delivered to the data receiver. delivered to the data receiver.
After a TCP receiver sends an ACK packet with the ECN-Echo bit set, After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
that TCP receiver continues to set the ECN-Echo flag in ACK packets that TCP receiver continues to set the ECN-Echo flag in ACK packets
until it receives a CWR packet (a packet with the CWR flag set). until it receives a CWR packet (a packet with the CWR flag set).
After the receipt of the CWR packet, acknowledgements for subsequent After the receipt of the CWR packet, acknowledgements for subsequent
non-CE data packets do not have the ECN-Echo flag set. If another CE non-CE data packets do not have the ECN-Echo flag set. If another CE
packet is received by the data receiver, the receiver would once packet is received by the data receiver, the receiver would once
again send ACK packets with the ECN-Echo flag set. While the receipt again send ACK packets with the ECN-Echo flag set. While the receipt
of a CWR packet does not guarantee that the data sender received the of a CWR packet does not guarantee that the data sender received the
ECN-Echo message, this does guarantee that the data sender reduced ECN-Echo message, this does indicate that the data sender reduced its
its congestion window at some point *after* it sent the data packet congestion window at some point *after* it sent the data packet for
for which the CE bit was set. which the CE bit was set.
We have already specified that a TCP sender reduces its congestion We have already specified that a TCP sender reduces its congestion
window at most once per window of data. This mechanism requires some window at most once per window of data. This mechanism requires some
care to make sure that the sender reduces its congestion window at care to make sure that the sender reduces its congestion window at
most once per ECN indication, and that multiple ECN messages over most once per ECN indication, and that multiple ECN messages over
several successive windows of data are properly reported to the ECN several successive windows of data are properly reported to the ECN
sender. This is discussed further in [Floyd98]. sender. This is discussed further in [Floyd98].
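The receiver rules above (OR the CE bits of all packets covered by a delayed ACK, keep setting ECN-Echo until a CWR packet arrives, and start again on a new CE packet) fit in a few lines. This is a minimal sketch. `EcnReceiver`, `on_data` and `build_ack` are hypothetical names, and the ordering subtleties deferred to [Floyd98] are not modeled.

```python
from dataclasses import dataclass


@dataclass
class EcnReceiver:
    """Sketch of the data receiver's ECN-Echo logic."""
    ece_latched: bool = False   # keep echoing until a CWR packet arrives
    ce_since_ack: bool = False  # OR of CE bits of data not yet acknowledged

    def on_data(self, ce: bool, cwr: bool) -> None:
        if cwr:
            # The sender reduced its window after the marked packet: stop echoing.
            self.ece_latched = False
        if ce:
            # A CE packet (even one also carrying CWR) starts a new echo series.
            self.ce_since_ack = True
            self.ece_latched = True

    def build_ack(self) -> bool:
        """Return the ECN-Echo flag for the next (possibly delayed) ACK."""
        ece = self.ece_latched or self.ce_since_ack
        self.ce_since_ack = False
        return ece
```

The separate `ce_since_ack` bit ensures that a delayed ACK covering a CE packet followed by a CWR packet still carries ECN-Echo, as the OR rule requires.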
5.1.4. Congestion on the ACK-path 6.1.4. Congestion on the ACK-path
For the current generation of TCP congestion control algorithms, pure For the current generation of TCP congestion control algorithms, pure
acknowledgement packets (i.e., packets that do not contain any acknowledgement packets (i.e., packets that do not contain any
accompanying data) should be sent with the ECT bit off. Current TCP accompanying data) should be sent with the ECT bit off. Current TCP
receivers have no mechanisms for reducing traffic on the ACK-path in receivers have no mechanisms for reducing traffic on the ACK-path in
response to congestion notification. Mechanisms for responding to response to congestion notification. Mechanisms for responding to
congestion on the ACK-path can be relegated as an area for future congestion on the ACK-path are areas for current and future research.
research. (One simple possibility would be for the sender to reduce (One simple possibility would be for the sender to reduce its
its congestion window when it receives a pure ACK packet with the CE congestion window when it receives a pure ACK packet with the CE bit
bit set). For current TCP implementations, a single dropped ACK set). For current TCP implementations, a single dropped ACK generally
generally has only a very small effect on the TCP's sending rate. has only a very small effect on the TCP's sending rate.
6. Summary of changes required in IP and TCP 7. Summary of changes required in IP and TCP
Two bits need to be specified in the IP header, the ECN-Capable Two bits need to be specified in the IP header, the ECN-Capable
Transport (ECT) bit and the Congestion Experienced (CE) bit. The ECT Transport (ECT) bit and the Congestion Experienced (CE) bit. The ECT
bit set to "0" indicates that the transport protocol will ignore the bit set to "0" indicates that the transport protocol will ignore the
CE bit. This is the default value for the ECT bit. The ECT bit set CE bit. This is the default value for the ECT bit. The ECT bit set
to "1" indicates that the transport protocol is willing and able to to "1" indicates that the transport protocol is willing and able to
participate in ECN. participate in ECN.
The default value for the CE bit is "0". The router sets the CE bit The default value for the CE bit is "0". The router sets the CE bit
to "1" to indicate congestion to the end nodes. The CE bit in a to "1" to indicate congestion to the end nodes. The CE bit in a
packet header should never be reset by a router from "1" to "0". packet header should never be reset by a router from "1" to "0".
TCP requires three changes: a negotiation phase during setup to TCP requires three changes: a negotiation phase during setup to
determine if both end nodes are ECN-capable, and two new flags in the determine if both end nodes are ECN-capable, and two new flags in the
TCP header, from the "reserved" flags in the TCP flags field. The TCP header, from the "reserved" flags in the TCP flags field. The
ECN-Echo flag is used by the data receiver to inform the data sender ECN-Echo flag is used by the data receiver to inform the data sender
of a received CE packet. The Congestion Window Reduced flag is used of a received CE packet. The Congestion Window Reduced flag is used
by the data sender to inform the data receiver that the congestion by the data sender to inform the data receiver that the congestion
window has been reduced. window has been reduced.
7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN 8. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN
Since the ATM and Frame Relay mechanisms for congestion indication Since the ATM and Frame Relay mechanisms for congestion indication
have typically been defined without any notion of average queue size have typically been defined without any notion of average queue size
as the basis for determining that an intermediate node is congested, as the basis for determining that an intermediate node is congested,
we believe that they provide a very noisy signal. The TCP-sender we believe that they provide a very noisy signal. The TCP-sender
reaction specified in this draft for ECN is NOT the appropriate reaction specified in this draft for ECN is NOT the appropriate
reaction for such a noisy signal of congestion notification. It is reaction for such a noisy signal of congestion notification. It is
our expectation that ATM's EFCI and Frame Relay's FECN mechanisms our expectation that ATM's EFCI and Frame Relay's FECN mechanisms
would be phased out over time within the ATM network. However, if would be phased out over time within the ATM network. However, if
the routers that interface to the ATM network have a way of the routers that interface to the ATM network have a way of
skipping to change at page 11, line 16 skipping to change at page 12, line 5
own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN
bit in Frame Relay) in this reliable manner, then the interface bit in Frame Relay) in this reliable manner, then the interface
router to the layer 2 network could copy the state of that layer 2 router to the layer 2 network could copy the state of that layer 2
Congestion Experienced bit into the CE bit in the IP header. We Congestion Experienced bit into the CE bit in the IP header. We
recognize that this is not the current practice, nor is it in current recognize that this is not the current practice, nor is it in current
standards. However, encouraging experimentation in this manner may standards. However, encouraging experimentation in this manner may
provide the information needed to enable evolution of existing layer provide the information needed to enable evolution of existing layer
2 mechanisms to provide a more reliable means of congestion 2 mechanisms to provide a more reliable means of congestion
indication, when they use a single bit for indicating congestion. indication, when they use a single bit for indicating congestion.
8. Non-compliance by the End Nodes 9. Non-compliance by the End Nodes
This section discusses concerns about the vulnerability of ECN to This section discusses concerns about the vulnerability of ECN to
non-compliant end-nodes (i.e., end nodes that set the ECT bit in non-compliant end-nodes (i.e., end nodes that set the ECT bit in
transmitted packets but do not respond to received CE packets). We transmitted packets but do not respond to received CE packets). We
argue that the addition of ECN to the IP architecture would not argue that the addition of ECN to the IP architecture would not
significantly increase the current vulnerability of the architecture significantly increase the current vulnerability of the architecture
to unresponsive flows. to unresponsive flows.
Even for non-ECN environments, there are serious concerns about the Even for non-ECN environments, there are serious concerns about the
damage that can be done by non-compliant or unresponsive flows (that damage that can be done by non-compliant or unresponsive flows (that
is, flows that do not respond to congestion control indications by is, flows that do not respond to congestion control indications by
reducing their arrival rate at the congested link). For example, an reducing their arrival rate at the congested link). For example, an
end-node could "turn off congestion control" by not reducing its end-node could "turn off congestion control" by not reducing its
congestion window in response to packet drops. This is a concern for congestion window in response to packet drops. This is a concern for
the current Internet. It has been argued that routers will have to the current Internet. It has been argued that routers will have to
deploy mechanisms to detect and differentially treat packets from deploy mechanisms to detect and differentially treat packets from
non-compliant flows. It has also been argued that techniques such as non-compliant flows. It has also been argued that techniques such as
end-to-end per-flow scheduling and isolation of one flow from end-to-end per-flow scheduling and isolation of one flow from
another, differentiated services, or end-to-end reservations could another, differentiated services, or end-to-end reservations could
remove some of the more damaging effects of unresponsive flows. remove some of the more damaging effects of unresponsive flows.
It has been argued that dropping packets in itself may be an adequate It has been argued that dropping packets in itself may be an adequate
deterrent for non-compliance, and that the use of ECN removes this deterrent for non-compliance, and that the use of ECN removes this
deterrent. We would argue in response that (1) ECN-capable routers deterrent. We would argue in response that (1) ECN-capable routers
preserve packet-dropping behavior in times of high congestion; and preserve packet-dropping behavior in times of high congestion; and
(2) even in times of high congestion, dropping packets in itself is (2) even in times of high congestion, dropping packets in itself is
not an adequate deterrent for non-compliance. not an adequate deterrent for non-compliance.
First, ECN-Capable routers will only mark packets (as opposed to First, ECN-Capable routers will only mark packets (as opposed to
dropping them) when the packet marking rate is reasonably low. dropping them) when the packet marking rate is reasonably low. During
During periods where the average queue size exceeds an upper periods where the average queue size exceeds an upper threshold, and
threshold, and therefore the potential packet marking rate would be therefore the potential packet marking rate would be high, our
high, our recommendation is that routers drop packets rather than set recommendation is that routers drop packets rather than set the CE
the CE bit in packet headers. bit in packet headers.
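The router policy above can be sketched as follows: mark only when the average queue is moderate, drop above the upper threshold, drop non-ECT packets that would otherwise be marked, and never clear a CE bit. The thresholds `min_th` and `max_th` and the linear ramp up to `max_p` are assumptions borrowed from RED-style queue management and are not specified in this section. `ecn_red_action` and `outgoing_ce` are hypothetical names.

```python
import random
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK = "mark"   # set the CE bit
    DROP = "drop"


def ecn_red_action(avg_queue: float, min_th: float, max_th: float,
                   max_p: float, ect: bool, rng: random.Random) -> Action:
    """RED-style choice between forwarding, marking, and dropping (assumed)."""
    if avg_queue < min_th:
        return Action.FORWARD
    if avg_queue >= max_th:
        # The marking rate would be high: drop, even for ECN-Capable packets.
        return Action.DROP
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng.random() >= p:
        return Action.FORWARD
    # Selected as a congestion indication: mark if ECT is set, else drop.
    return Action.MARK if ect else Action.DROP


def outgoing_ce(ce_in: bool, action: Action) -> bool:
    # A router may set CE to 1 but never resets it from 1 to 0.
    return ce_in or action is Action.MARK
```

A caller would pass its own seeded generator, e.g. `ecn_red_action(avg, 5.0, 15.0, 0.1, ect, random.Random(1))`.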
During the periods of low or moderate packet marking rates when ECN During the periods of low or moderate packet marking rates when ECN
would be deployed, there would be little deterrent effect on would be deployed, there would be little deterrent effect on
unresponsive flows of dropping rather than marking those packets. unresponsive flows of dropping rather than marking those packets. For
For example, delay-insensitive flows using reliable delivery might example, delay-insensitive flows using reliable delivery might have
have an incentive to increase rather than to decrease their sending an incentive to increase rather than to decrease their sending rate
rate in the presence of dropped packets. Similarly, delay-sensitive in the presence of dropped packets. Similarly, delay-sensitive flows
flows using unreliable delivery might increase their use of FEC in using unreliable delivery might increase their use of FEC in response
response to an increased packet drop rate, increasing rather than to an increased packet drop rate, increasing rather than decreasing
decreasing their sending rate. For the same reasons, we do not their sending rate. For the same reasons, we do not believe that
believe that packet dropping itself is an effective deterrent for packet dropping itself is an effective deterrent for non-compliance
non-compliance even in an environment of high packet drop rates. even in an environment of high packet drop rates.
Several methods have been proposed to identify and restrict non- Several methods have been proposed to identify and restrict non-
compliant or unresponsive flows. The addition of ECN to the network compliant or unresponsive flows. The addition of ECN to the network
environment would not in any way increase the difficulty of designing environment would not in any way increase the difficulty of designing
and deploying such mechanisms. If anything, the addition of ECN to and deploying such mechanisms. If anything, the addition of ECN to
the architecture would make the job of identifying unresponsive flows the architecture would make the job of identifying unresponsive flows
slightly easier. For example, in an ECN-Capable environment routers slightly easier. For example, in an ECN-Capable environment routers
are not limited to information about packets that are dropped or have are not limited to information about packets that are dropped or have
the CE bit set at that router itself; in such an environment routers the CE bit set at that router itself; in such an environment routers
could also take note of arriving CE packets that indicate congestion could also take note of arriving CE packets that indicate congestion
encountered by that packet earlier in the path. encountered by that packet earlier in the path.
9. Non-compliance in the Network 10. Non-compliance in the Network
The breakdown of effective congestion control could be caused not The breakdown of effective congestion control could be caused not
only by a non-compliant end-node, but also by the loss of the only by a non-compliant end-node, but also by the loss of the
congestion indication in the network itself. As one example, a rogue congestion indication in the network itself. This could happen
or broken router could "erase" the CE bit in arriving CE packets, through a rogue or broken router that set the ECT bit in a packet
thus preventing that indication of congestion from reaching from a non-ECN-capable transport, or "erased" the CE bit in arriving
downstream receivers. This could result in the failure of congestion packets. As one example, a rogue or broken router that "erased" the
control for that flow and a resulting increase in congestion in the CE bit in arriving CE packets would prevent that indication of
network, ultimately resulting in subsequent packets dropped for this congestion from reaching downstream receivers. This could result in
flow as the average queue size increased at the congested gateway. the failure of congestion control for that flow and a resulting
increase in congestion in the network, ultimately resulting in
subsequent packets dropped for this flow as the average queue size
increased at the congested gateway.
The actions of a rogue or broken router could also result in an
unnecessary indication of congestion to the end-nodes. These actions
can include a router dropping a packet or setting the CE bit in the
absence of congestion. From a congestion control point of view,
setting the CE bit in the absence of congestion by a non-compliant
router would be no different than a router dropping a packet
unnecessarily. By "erasing" the ECT bit of a packet that is later
dropped in the network, a router's actions could result in an
unnecessary packet drop for that packet later in the network.
Concerns regarding the loss of congestion indications from Concerns regarding the loss of congestion indications from
encapsulated, dropped, or corrupted packets are discussed below. encapsulated, dropped, or corrupted packets are discussed below.
9.1. Encapsulated packets 10.1. Encapsulated packets
Some care is required to handle the CE and ECT bits appropriately Some care is required to handle the CE and ECT bits appropriately
when packets are encapsulated and de-encapsulated for tunnels. when packets are encapsulated and de-encapsulated for tunnels.
When a packet is encapsulated, the following rules apply regarding When a packet is encapsulated, the following rules apply regarding
the ECT bit. First, if the ECT bit in the encapsulated ('inside') the ECT bit. First, if the ECT bit in the encapsulated ('inside')
header is a 0, then the ECT bit in the encapsulating ('outside') header is a 0, then the ECT bit in the encapsulating ('outside')
header MUST be a 0. If the ECT bit in the inside header is a 1, then header MUST be a 0. If the ECT bit in the inside header is a 1, then
the ECT bit in the outside header SHOULD be a 1. the ECT bit in the outside header SHOULD be a 1.
skipping to change at page 13, line 18 skipping to change at page 14, line 19
CE bit in the inside header. (That is, in this case a CE bit of 1 in CE bit in the inside header. (That is, in this case a CE bit of 1 in
the outside header must be copied to the inside header.) If the ECT the outside header must be copied to the inside header.) If the ECT
bit in either header is a 0, then the CE bit in the outside header is bit in either header is a 0, then the CE bit in the outside header is
ignored. This requirement for the treatment of de-encapsulated ignored. This requirement for the treatment of de-encapsulated
packets does not currently apply to IPsec tunnels. packets does not currently apply to IPsec tunnels.
A specific example of the use of ECN with encapsulation occurs when a A specific example of the use of ECN with encapsulation occurs when a
flow wishes to use ECN-capability to avoid the danger of an flow wishes to use ECN-capability to avoid the danger of an
unnecessary packet drop for the encapsulated packet as a result of unnecessary packet drop for the encapsulated packet as a result of
congestion at an intermediate node in the tunnel. This functionality congestion at an intermediate node in the tunnel. This functionality
can be supported by copying the ECN codepoint in the inner IP header can be supported by copying the ECN field in the inner IP header to
to the outer IP header upon encapsulation, and using the ECN the outer IP header upon encapsulation, and using the ECN field in
codepoint in the outer IP header to set the ECN codepoint in the the outer IP header to set the ECN field in the inner IP header upon
inner IP header upon decapsulation. This effectively allows routers decapsulation. This effectively allows routers along the tunnel to
along the tunnel to cause the CE bit to be set in the ECN field of cause the CE bit to be set in the ECN field of the unencapsulated IP
the unencapsulated IP header of an ECN-capable packet when such header of an ECN-capable packet when such routers experience
routers experience congestion. congestion.
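The encapsulation example above (copy the inner ECN field to the outer header; on decapsulation, carry an outer CE mark into the inner header only when ECT is 1 in both headers) can be sketched as follows. Part of the full encapsulation rule is elided from this excerpt, so the CE handling at encapsulation follows only the copying example in the preceding paragraph. `EcnBits`, `encapsulate` and `decapsulate` are hypothetical names, and this does not apply to IPsec tunnels.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EcnBits:
    ect: bool
    ce: bool


def encapsulate(inner: EcnBits) -> EcnBits:
    # Copying the inner ECN field satisfies both ECT rules: inner ECT 0 gives
    # outer ECT 0 (MUST), and inner ECT 1 gives outer ECT 1 (SHOULD).
    return EcnBits(ect=inner.ect, ce=inner.ce)


def decapsulate(outer: EcnBits, inner: EcnBits) -> EcnBits:
    # Propagate congestion experienced inside the tunnel to the inner header.
    if outer.ect and inner.ect and outer.ce:
        return replace(inner, ce=True)
    # ECT is 0 in either header (or no mark): the outer CE bit is ignored.
    return inner
```

With this scheme, a router inside the tunnel that sets CE in the outer header causes the receiver to see a CE packet after decapsulation, instead of suffering an unnecessary drop.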
9.2. IPsec Tunnel Considerations 10.2. IPsec Tunnel Considerations
The IPsec protocol, as defined in [ESP, AH], does not include the IP The IPsec protocol, as defined in [RFC-ESP?, RFC-AH?], does not
header's ECN field in any of its cryptographic calculations (in the include the IP header's ECN field in any of its cryptographic
case of tunnel mode, the outer IP header's ECN field is not calculations (in the case of tunnel mode, the outer IP header's ECN
included). Hence modification of the ECN field by a network node has field is not included). Hence modification of the ECN field by a
no effect on IPsec's end-to-end security, because it cannot cause any network node has no effect on IPsec's end-to-end security, because it
IPsec integrity check to fail. As a consequence, IPsec does not cannot cause any IPsec integrity check to fail. As a consequence,
provide any defense against an adversary's modification of the ECN IPsec does not provide any defense against an adversary's
field (i.e., a man-in-the-middle attack), as the adversary's modification of the ECN field (i.e., a man-in-the-middle attack), as
modification will also have no effect on IPsec's end-to-end security. the adversary's modification will also have no effect on IPsec's
In some environments, the ability to modify the ECN field without end-to-end security. In some environments, the ability to modify the
affecting IPsec integrity checks may constitute a covert channel; if ECN field without affecting IPsec integrity checks may constitute a
it is necessary to eliminate such a channel or reduce its bandwidth, covert channel; if it is necessary to eliminate such a channel or
then the outer IP header's ECN field can be zeroed at the tunnel reduce its bandwidth, then the outer IP header's ECN field can be
ingress and egress nodes. zeroed at the tunnel ingress and egress nodes.
The IPsec protocol currently requires that the inner header's ECN The IPsec protocol currently requires that the inner header's ECN
field not be changed by IPsec decapsulation processing at a tunnel field not be changed by IPsec decapsulation processing at a tunnel
egress node. This ensures that an adversary's modifications to the egress node. This ensures that an adversary's modifications to the
ECN field cannot be used to launch theft- or denial-of-service ECN field cannot be used to launch theft- or denial-of-service
attacks across an IPsec tunnel endpoint, as any such modifications attacks across an IPsec tunnel endpoint, as any such modifications
will be discarded at the tunnel endpoint. This document makes no will be discarded at the tunnel endpoint. This document makes no
change to that IPsec requirement. As a consequence of the current change to that IPsec requirement. As a consequence of the current
specification of the IPsec protocol, we suggest that experiments with specification of the IPsec protocol, we suggest that experiments with
ECN not be carried out for flows that will undergo IPsec tunneling at ECN not be carried out for flows that will undergo IPsec tunneling at
the present time. the present time.
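The reason for postponing ECN experiments over IPsec tunnels can be seen in a short sketch. Under the current IPsec requirement, the egress leaves the inner ECN field unchanged, so a CE mark set in the outer header by a congested router inside the tunnel never reaches the receiver. The names are hypothetical. The assumption that the outer ECN bits start as a copy of the inner ones, unless zeroed to limit the covert channel, is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcnBits:
    ect: bool
    ce: bool


NOT_ECT = EcnBits(ect=False, ce=False)


def ipsec_ingress_outer(inner: EcnBits, zero_outer: bool) -> EcnBits:
    # Assumed: outer ECN bits copy the inner ones unless zeroed at ingress.
    return NOT_ECT if zero_outer else inner


def ipsec_egress_inner(outer: EcnBits, inner: EcnBits) -> EcnBits:
    # Current IPsec: decapsulation must not change the inner ECN field,
    # so any CE mark in `outer` set inside the tunnel is discarded.
    return inner
```

Zeroing the outer field avoids the lost marks, but routers inside the tunnel would then see a non-ECT packet and drop it rather than mark it.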
If the IPsec specifications are modified in the future to permit a If the IPsec specifications are modified in the future to permit a
tunnel egress node to modify the ECN field in an inner IP header tunnel egress node to modify the ECN field in an inner IP header
based on the ECN field value in the outer header (e.g., copying part based on the ECN field value in the outer header (e.g., copying part
or all of the outer ECN field to the inner ECN field), or to permit or all of the outer ECN field to the inner ECN field), or to permit
the ECN field of the outer IP header to be zeroed during the ECN field of the outer IP header to be zeroed during
encapsulation, then experiments with ECN may be used in combination encapsulation, then experiments with ECN may be used in combination
with IPsec tunneling. with IPsec tunneling.
This discussion of ECN and IPsec tunnel considerations draws heavily This discussion of ECN and IPsec tunnel considerations draws heavily
on related discussions and documents from the Differentiated Services on related discussions and documents from the Differentiated Services
Working Group. Working Group.
9.3. Dropped or Corrupted Packets 10.3. Dropped or Corrupted Packets
An additional issue concerns a packet that has the CE bit set at one An additional issue concerns a packet that has the CE bit set at one
router and is dropped by a subsequent router. For the proposed use router and is dropped by a subsequent router. For the proposed use
for ECN in this paper (that is, for a transport protocol such as TCP for ECN in this paper (that is, for a transport protocol such as TCP
for which a dropped data packet is an indication of congestion), end for which a dropped data packet is an indication of congestion), end
nodes detect dropped data packets, and the congestion response of the nodes detect dropped data packets, and the congestion response of the
end nodes to a dropped data packet is at least as strong as the end nodes to a dropped data packet is at least as strong as the
congestion response to a received CE packet. congestion response to a received CE packet.
However, transport protocols such as TCP do not necessarily detect However, transport protocols such as TCP do not necessarily detect
all packet drops, such as the drop of a "pure" ACK packet; for all packet drops, such as the drop of a "pure" ACK packet; for
example, TCP does not reduce the arrival rate of subsequent ACK example, TCP does not reduce the arrival rate of subsequent ACK
packets in response to an earlier dropped ACK packet. Any proposal packets in response to an earlier dropped ACK packet. Any proposal
for extending ECN-Capability to such packets would have to address for extending ECN-Capability to such packets would have to address
concerns raised by CE packets that were later dropped in the network. concerns raised by CE packets that were later dropped in the network.
Similarly, if a CE packet is dropped later in the network due to Similarly, if a CE packet is dropped later in the network due to
corruption (bit errors), the end nodes should still invoke congestion corruption (bit errors), the end nodes should still invoke congestion
control, just as TCP would today in response to a dropped data control, just as TCP would today in response to a dropped data
packet. This issue of corrupted CE packets would have to be packet. This issue of corrupted CE packets would have to be
considered in any proposal for the network to distinguish between considered in any proposal for the network to distinguish between
packets dropped due to corruption, and packets dropped due to packets dropped due to corruption, and packets dropped due to
congestion or buffer overflow. congestion or buffer overflow.
10. A summary of related work. 11. A summary of related work.
[Floyd94] considers the advantages and drawbacks of adding ECN to the [Floyd94] considers the advantages and drawbacks of adding ECN to the
TCP/IP architecture. As shown in the simulation-based comparisons, TCP/IP architecture. As shown in the simulation-based comparisons,
one advantage of ECN is to avoid unnecessary packet drops for short one advantage of ECN is to avoid unnecessary packet drops for short
or delay-sensitive TCP connections. A second advantage of ECN is in or delay-sensitive TCP connections. A second advantage of ECN is in
avoiding some unnecessary retransmit timeouts in TCP. This paper avoiding some unnecessary retransmit timeouts in TCP. This paper
discusses in detail the integration of ECN into TCP's congestion discusses in detail the integration of ECN into TCP's congestion
control mechanisms. The possible disadvantages of ECN discussed in control mechanisms. The possible disadvantages of ECN discussed in
the paper are that a non-compliant TCP connection could falsely the paper are that a non-compliant TCP connection could falsely
advertise itself as ECN-capable, and that a TCP ACK packet carrying advertise itself as ECN-capable, and that a TCP ACK packet carrying
skipping to change at page 15, line 15 skipping to change at page 16, line 18
first of these two issues is discussed in Section 8 of this document, first of these two issues is discussed in Section 8 of this document,
and the second is addressed by the proposal in Section 5.1.3 for a and the second is addressed by the proposal in Section 5.1.3 for a
CWR flag in the TCP header. CWR flag in the TCP header.
[CKLTZ97] reports on an experimental implementation of ECN in IPv6. [CKLTZ97] reports on an experimental implementation of ECN in IPv6.
The experiments include an implementation of ECN in an existing The experiments include an implementation of ECN in an existing
implementation of RED for FreeBSD. A number of experiments were run implementation of RED for FreeBSD. A number of experiments were run
to demonstrate the control of the average queue size in the router, to demonstrate the control of the average queue size in the router,
the performance of ECN for a single TCP connection at a congested the performance of ECN for a single TCP connection at a congested
router, and fairness with multiple competing TCP connections. One router, and fairness with multiple competing TCP connections. One
conclusion of the experiments is that dropping a packet from a bulk- conclusion of the experiments is that dropping packets from a bulk-
data transfer degrades performance much more severely than marking a data transfer can degrade performance much more severely than marking
packet. packets.
Because the experimental implementation in [CKLTZ97] predates some of Because the experimental implementation in [CKLTZ97] predates some of
the developments in this document, the implementation does not the developments in this document, the implementation does not
conform to this document in all respects. For example, in the conform to this document in all respects. For example, in the
experimental implementation the CWR flag is not used, but instead the experimental implementation the CWR flag is not used, but instead the
TCP receiver sends the ECN-Echo bit on a single ACK packet. TCP receiver sends the ECN-Echo bit on a single ACK packet.
[K98] and [CKLT98] build on [CKLTZ97] to further analyze the benefits [K98] and [CKLTZ98] build on [CKLTZ97] to further analyze the
of ECN for TCP. The conclusions are that ECN TCP gets moderately benefits of ECN for TCP. The conclusions are that ECN TCP gets
better throughput than non-ECN TCP; that ECN TCP flows are fair moderately better throughput than non-ECN TCP; that ECN TCP flows are
towards non-ECN TCP flows; and that ECN TCP is robust with two-way fair towards non-ECN TCP flows; and that ECN TCP is robust with two-
traffic, congestion in both directions, and with multiple congested way traffic, congestion in both directions, and with multiple
gateways. Experiments with many short web transfers show that, while congested gateways. Experiments with many short web transfers show
most of the short connections have similar transfer times with or that, while most of the short connections have similar transfer times
without ECN, a small percentage of the short connections have very with or without ECN, a small percentage of the short connections have
high transfer times for the non-ECN experiments as compared to the very long transfer times for the non-ECN experiments as compared to
ECN experiments. This increased transfer time is particularly the ECN experiments. This increased transfer time is particularly
dramatic for those short connections that have their first packet dramatic for those short connections that have their first packet
dropped in the non-ECN experiments, and that therefore have to wait dropped in the non-ECN experiments, and that therefore have to wait
six seconds for the retransmit timer to expire. six seconds for the retransmit timer to expire.
The ECN Web Page [ECN] has pointers to other implementations of ECN The ECN Web Page [ECN] has pointers to other implementations of ECN
in progress. in progress.
11. Conclusions 12. Conclusions
Given the current effort to implement RED, we believe this is the Given the current effort to implement RED, we believe this is the
right time for router vendors to examine how to implement congestion right time for router vendors to examine how to implement congestion
avoidance mechanisms that do not depend on packet drops alone. With avoidance mechanisms that do not depend on packet drops alone. With
the increased deployment of applications and transports sensitive to the increased deployment of applications and transports sensitive to
the delay and loss of a single packet, depending on packet loss as a the delay and loss of a single packet (e.g., realtime traffic, short
normal congestion notification mechanism appears to be insufficient web transfers), depending on packet loss as a normal congestion
(or at the very least, non-optimal). notification mechanism appears to be insufficient (or at the very
least, non-optimal).
12. Acknowledgements 13. Acknowledgements
Many people have made contributions to this internet-draft. In Many people have made contributions to this internet-draft. In
particular, we would like to thank Kenjiro Cho for the proposal for particular, we would like to thank Kenjiro Cho for the proposal for
the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the the TCP mechanism for negotiating ECN-Capability, Kevin Fall for the
proposal of the CWR bit, Steve Blake for material on IPv4 Header proposal of the CWR bit, Steve Blake for material on IPv4 Header
Checksum Recalculation, Jamal Hadi Salim for discussions of ECN Checksum Recalculation, Jamal Hadi Salim for discussions of ECN
issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul issues, and Steve Bellovin, Jim Bound, Brian Carpenter, Paul
Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for Ferguson, Stephen Kent, Greg Minshall, and Vern Paxson for
discussions of security issues. We also thank the Internet End-to- discussions of security issues. We also thank the Internet End-to-
End Research Group for ongoing discussions of these issues. End Research Group for ongoing discussions of these issues.
13. References 14. References
[AH] S. Kent and R. Atkinson, "IP Authentication Header", Internet [RFC-AH?] S. Kent and R. Atkinson, "IP Authentication Header",
Draft <draft-ietf-ipsec-auth-header-07.txt>, July 1998. Internet Draft <draft-ietf-ipsec-auth-header-07.txt>, July 1998.
[B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L., [CKLTZ97] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
"Implementing Explicit Congestion Notification (ECN) in TCP over "Implementing Explicit Congestion Notification (ECN) in TCP over
IPv6", UCLA Technical Report, December 1997, URL IPv6", UCLA Technical Report, December 1997, URL
"http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz". "http://www.cs.ucla.edu/~hari/software/ecn/ecn_rpt.ps.gz".
[CKLT98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L., [CKLTZ98] Chen, C., Krishnan, H., Leung, S., Tang, N., and Zhang, L.,
"Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the "Implementing ECN for TCP/IPv6", presentation to the ECN BOF at the
L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn- L.A. IETF, March 1998, URL "http://www.cs.ucla.edu/~hari/ecn-
ietf.ps". ietf.ps".
[RFC-DIFFSERV?] Kathleen Nichols, Steven Blake, Fred Baker, and David
L. Black, "Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", Internet draft draft-ietf-
diffserv-header-04.txt in last call, October 1998.
[ECN] "The ECN Web Page", URL "http://www- [ECN] "The ECN Web Page", URL "http://www-
nrg.ee.lbl.gov/floyd/ecn.html". nrg.ee.lbl.gov/floyd/ecn.html".
[ESP] S. Kent and R. Atkinson, "IP Encapsulating Security Payload", [RFC-ESP?] S. Kent and R. Atkinson, "IP Encapsulating Security
Internet Draft <draft-ietf-ipsec-esp-v2-06.txt>, July 1998. Payload", Internet Draft <draft-ietf-ipsec-esp-v2-06.txt>, July 1998.
[FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
N.4, August 1993, p. 397-413. URL N.4, August 1993, p. 397-413. URL
"ftp://ftp.ee.lbl.gov/papers/early.pdf". "ftp://ftp.ee.lbl.gov/papers/early.pdf".
[Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23. Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z". URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".
[Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support End- [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support
to-End Congestion Control", Technical report, February 1997. URL End-to-End Congestion Control", Technical report, February 1997. URL
"ftp://ftp.ee.lbl.gov/papers/collapse.ps". "ftp://ftp.ee.lbl.gov/papers/collapse.ps".
[Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator", [Floyd98] Floyd, S., "The ECN Validation Test in the NS Simulator",
URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all- URL "http://www-mash.cs.berkeley.edu/ns/", test tcl/test/test-all-
ecn. ecn.
[K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN) [K98] Krishnan, H., "Analyzing Explicit Congestion Notification (ECN)
benefits for TCP", Master's thesis, UCLA, 1998, URL benefits for TCP", Master's thesis, UCLA, 1998, URL
"http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz". "http://www.cs.ucla.edu/~hari/software/ecn/ecn_report.ps.gz".
skipping to change at page 18, line 10 skipping to change at page 19, line 18
"http://www.inria.fr/rodeo/sigcomm97/program.html#ab078". "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".
[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc. [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
ACM SIGCOMM '88, pp. 314-329. URL ACM SIGCOMM '88, pp. 314-329. URL
"ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z". "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".
[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
Algorithm", Message to end2end-interest mailing list, April 1990. Algorithm", Message to end2end-interest mailing list, April 1990.
URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt". URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".
[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130.
[RFC791] J. Postel, "Internet Protocol", RFC 791, September 1981.
[RFC793] J. Postel, "Transmission Control Protocol", RFC 793, September
1981.
[RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the [RFC1141] T. Mallory and A. Kullberg, "Incremental Updating of the
Internet Checksum", RFC 1141, January 1990. Internet Checksum", RFC 1141, January 1990.
[MJV96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven [RFC1349] P. Almquist, "Type of Service in the Internet Protocol
Layered Multicast", SIGCOMM '96, August 1996, pp. 117-130. Suite", RFC 1349, July 1992.
[RFC1455] D. Eastlake, "Physical Link Security Type of Service", RFC
1455, May 1993.
[RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997. Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.
[RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D.
Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L.
Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang, Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang,
"Recommendations on Queue Management and Congestion Avoidance in the "Recommendations on Queue Management and Congestion Avoidance in the
Internet", RFC 2309, April 1998. Internet", RFC 2309, April 1998.
[RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
Congestion Avoidance in Computer Networks", ACM Transactions on Congestion Avoidance in Computer Networks", ACM Transactions on
Computer Systems, Vol.8, No.2, pp. 158-181, May 1990. Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.
14. Security Considerations 15. Security Considerations
Security considerations have been discussed in Section 9. Security considerations have been discussed in Section 9.
15. IPv4 Header Checksum Recalculation 16. IPv4 Header Checksum Recalculation
IPv4 header checksum recalculation is an issue with some high-end IPv4 header checksum recalculation is an issue with some high-end
router architectures using an output-buffered switch, since most, if router architectures using an output-buffered switch, since most, if
not all, of the header manipulation is performed on the input side of not all, of the header manipulation is performed on the input side of
the switch, while the ECN decision would need to be made local to the the switch, while the ECN decision would need to be made local to the
output buffer. This is not an issue for IPv6, since there is no IPv6 output buffer. This is not an issue for IPv6, since there is no IPv6
header checksum. The IPv4 TOS octet is the last byte of a 16-bit header checksum. The IPv4 TOS octet is the last byte of a 16-bit
half-word. half-word.
RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
checksum after the TTL field is decremented. The incremental checksum after the TTL field is decremented. The incremental
updating of the IPv4 checksum after the CE bit was set would work as updating of the IPv4 checksum after the CE bit was set would work as
follows: Let HC be the original header checksum, and let HC' be the follows: Let HC be the original header checksum, and let HC' be the
new header checksum after the CE bit has been set. Then for header new header checksum after the CE bit has been set. Then for header
checksums calculated with one's complement subtraction, HC' would be checksums calculated with one's complement subtraction, HC' would be
recalculated as follows: recalculated as follows:
HC' = { HC - 1 HC > 1 HC' = { HC - 1 HC > 1
{ 0x0000 HC = 1 { 0x0000 HC = 1
For header checksums calculated on two's complement machines, HC' For header checksums calculated on two's complement machines, HC'
would be recalculated as follows after the CE bit was set: would be recalculated as follows after the CE bit was set:
HC' = { HC - 1 HC > 0 HC' = { HC - 1 HC > 0
{ 0xFFFE HC = 0 { 0xFFFE HC = 0
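The two's complement rule above (HC - 1 for HC > 0, 0xFFFE for HC = 0) can be checked against a full recomputation of the checksum. The following Python sketch assumes, since this section does not restate it, that the CE bit is bit 7, the least significant bit, of the TOS octet; because the TOS octet is the low-order byte of the first 16-bit word of the header, setting CE then adds exactly one to that word. The function names and the sample header are illustrative, not part of the draft.

```python
# Sketch: incremental IPv4 header checksum update after setting the CE bit.
# Assumption (not stated in this section): CE is bit 7 of the TOS octet,
# i.e. its least significant bit, so setting it adds exactly 1 to the
# 16-bit word formed by the version/IHL octet and the TOS octet.

CE_MASK = 0x01  # hypothetical name for the CE bit within the TOS octet


def ones_complement_sum(header: bytes) -> int:
    """One's complement sum of big-endian 16-bit words, with end-around carry.

    IPv4 headers are a multiple of 4 octets, so no padding is needed.
    """
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(header: bytes) -> int:
    """Recompute the checksum from scratch, treating octets 10-11 as zero."""
    zeroed = bytes(header[:10]) + b"\x00\x00" + bytes(header[12:])
    return ~ones_complement_sum(zeroed) & 0xFFFF


def incremental_ce_checksum(hc: int) -> int:
    """Two's complement rule from the text: HC - 1 if HC > 0, else 0xFFFE."""
    return hc - 1 if hc > 0 else 0xFFFE


def set_ce(header: bytearray) -> None:
    """Set CE in place and patch the checksum without re-summing the header."""
    if header[1] & CE_MASK:
        return  # CE already set: header and checksum stay unchanged
    hc = (header[10] << 8) | header[11]
    header[1] |= CE_MASK
    new_hc = incremental_ce_checksum(hc)
    header[10], header[11] = new_hc >> 8, new_hc & 0xFF


# Example 20-octet header with TOS = 0x00 and checksum 0xB861.
hdr = bytearray.fromhex("45000073000040004011b861c0a80001c0a800c7")
assert full_checksum(hdr) == 0xB861
set_ce(hdr)
print(hex((hdr[10] << 8) | hdr[11]))  # 0xb860
```

The HC = 0 branch matters because a one's complement sum of 0xFFFF, when incremented, wraps around to 0x0001 under end-around carry, so the new checksum is 0xFFFE rather than 0xFFFF.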
16. The motivation for the ECT bit. 17. The motivation for the ECT bit.
The need for the ECT bit is motivated by the fact that ECN will be The need for the ECT bit is motivated by the fact that ECN will be
deployed incrementally in an Internet where some transport protocols deployed incrementally in an Internet where some transport protocols
and routers understand ECN and some do not. With the ECT bit, the and routers understand ECN and some do not. With the ECT bit, the
router can drop packets from flows that are not ECN-capable, but can router can drop packets from flows that are not ECN-capable, but can
**instead** set the CE bit in flows that **are** ECN-capable. *instead* set the CE bit in flows that *are* ECN-capable. Because the
Because the ECT bit allows an end node to have the CE bit set in a ECT bit allows an end node to have the CE bit set in a packet
packet **instead** of having the packet dropped, an end node might *instead* of having the packet dropped, an end node might have some
have some incentive to deploy ECN. incentive to deploy ECN.
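As a minimal sketch of the router behavior described above, not the draft's specification, the decision made once active queue management such as RED has selected a packet could look like the following. The bit masks assume ECT is bit 6 and CE is bit 7 of the TOS octet, and the names `Action` and `congestion_action` are hypothetical.

```python
from enum import Enum

ECT_MASK = 0x02  # assumption: ECT is bit 6 of the TOS octet (bit 0 = most significant)
CE_MASK = 0x01   # assumption: CE is bit 7, the least significant bit


class Action(Enum):
    FORWARD = "forward"  # packet not selected: pass it through unchanged
    MARK_CE = "mark_ce"  # ECN-capable packet: set CE instead of dropping
    DROP = "drop"        # non-ECN-capable packet: drop, as without ECN


def congestion_action(tos: int, selected: bool):
    """Return (action, new TOS octet) for one packet.

    `selected` stands in for the AQM decision (e.g. RED) that this packet
    should signal congestion; computing it is outside this sketch.
    """
    if not selected:
        return Action.FORWARD, tos
    if tos & ECT_MASK:
        return Action.MARK_CE, tos | CE_MASK
    return Action.DROP, tos


print(congestion_action(0x02, selected=True))  # (<Action.MARK_CE: 'mark_ce'>, 3)
print(congestion_action(0x00, selected=True))  # (<Action.DROP: 'drop'>, 0)
```

Without an ECT indication the `tos & ECT_MASK` test could not be made, so the router would have to treat every selected packet the same way.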
If there were no ECT indication, then the router would have to set the If there were no ECT indication, then the router would have to set the
CE bit for packets from both ECN-capable and non-ECN-capable flows. CE bit for packets from both ECN-capable and non-ECN-capable flows.
In this case, there would be no incentive for end-nodes to deploy In this case, there would be no incentive for end-nodes to deploy
ECN, and no viable path of incremental deployment from a non-ECN ECN, and no viable path of incremental deployment from a non-ECN
world to an ECN-capable world. Consider the first stages of such an world to an ECN-capable world. Consider the first stages of such an
incremental deployment, where a subset of the flows are ECN-capable. incremental deployment, where a subset of the flows are ECN-capable.
At the onset of congestion, when the packet dropping/marking rate At the onset of congestion, when the packet dropping/marking rate
would be low, routers would only set CE bits, rather than dropping would be low, routers would only set CE bits, rather than dropping
packets. However, only those flows that are ECN-capable would packets. However, only those flows that are ECN-capable would
understand and respond to CE packets. The result is that the ECN- understand and respond to CE packets. The result is that the ECN-
capable flows would back off, and the non-ECN-capable flows would be capable flows would back off, and the non-ECN-capable flows would be
unaware of the ECN signals and would continue to open their unaware of the ECN signals and would continue to open their
congestion windows. congestion windows.
In this case, there are two possible outcomes: (1) the ECN-capable In this case, there are two possible outcomes: (1) the ECN-capable
flows back off, the non-ECN-capable flows get all of the bandwidth, flows back off, the non-ECN-capable flows get all of the bandwidth,
and congestion remains mild, or (2) the ECN-capable flows back off, and congestion remains mild, or (2) the ECN-capable flows back off,
the non-ECN-capable flows don't, and congestion increases until the the non-ECN-capable flows don't, and congestion increases until the
router transitions from setting the CE bit to dropping packets. router transitions from setting the CE bit to dropping packets.
While this second outcome evens out the fairness, the ECN-capable While this second outcome evens out the fairness, the ECN-capable
skipping to change at page 20, line 6 skipping to change at page 21, line 25
A flow that advertised itself as ECN-Capable but does not respond to A flow that advertised itself as ECN-Capable but does not respond to
CE bits is functionally equivalent to a flow that turns off CE bits is functionally equivalent to a flow that turns off
congestion control, as discussed in Sections 8 and 9. congestion control, as discussed in Sections 8 and 9.
Thus, in a world where a subset of the flows are ECN-capable, but Thus, in a world where a subset of the flows are ECN-capable, but
where ECN-capable flows have no mechanism for indicating that fact to where ECN-capable flows have no mechanism for indicating that fact to
the routers, there would be less effective and less fair congestion the routers, there would be less effective and less fair congestion
control in the Internet, resulting in a strong incentive for end control in the Internet, resulting in a strong incentive for end
nodes not to deploy ECN. nodes not to deploy ECN.
17. Why use two bits in the IP header? 18. Why use two bits in the IP header?
Given the need for an ECT indication in the IP header, there still Given the need for an ECT indication in the IP header, there still
remains the question of whether the ECT (ECN-Capable Transport) and remains the question of whether the ECT (ECN-Capable Transport) and
CE (Congestion Experienced) indications should be overloaded on a CE (Congestion Experienced) indications should be overloaded on a
single bit. This overloaded-one-bit alternative, explored in single bit. This overloaded-one-bit alternative, explored in
[Floyd94], would involve a single bit with two values. One value, [Floyd94], would involve a single bit with two values. One value,
"ECT and not CE", would represent an ECN-Capable Transport, and the "ECT and not CE", would represent an ECN-Capable Transport, and the
other value, "CE or not ECT", would represent either Congestion other value, "CE or not ECT", would represent either Congestion
Experienced or a non-ECN-Capable transport. Experienced or a non-ECN-Capable transport.
There is only one inherent functional difference between the one-bit One difference between the one-bit and two-bit implementations
and two-bit implementations. This functional difference concerns concerns packets that traverse multiple congested routers. Consider
packets that traverse multiple congested routers. Consider a CE a CE packet that arrives at a second congested router, and is
packet that arrives at a second congested router, and is selected by selected by the active queue management at that router for either
the active queue management at that router for either marking or marking or dropping. In the one-bit implementation, the second
dropping. In the one-bit implementation, the second congested router congested router has no choice but to drop the CE packet, because it
has no choice but to drop the CE packet, because it cannot cannot distinguish between a CE packet and a non-ECT packet. In the
distinguish between a CE packet and a non-ECT packet. In the two-bit two-bit implementation, the second congested router has the choice of
implementation, the second congested router has the choice of either either dropping the CE packet, or of leaving it alone with the CE bit
dropping the CE packet, or of leaving it alone with the CE bit set. set.
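The functional difference at a second congested router can be illustrated with a small sketch that lists the actions available for a packet that active queue management has already selected. It is a hypothetical model of the two encodings described above, not a specified algorithm; the function names and action strings are illustrative, and dropping is kept as an option for every packet on the assumption that a router can always fall back to dropping.

```python
def one_bit_choices(ect_and_not_ce: bool) -> set:
    """One overloaded bit: True encodes "ECT and not CE", False "CE or not ECT"."""
    if ect_and_not_ce:
        return {"mark", "drop"}
    # A CE packet looks exactly like a non-ECT packet, so only dropping is safe.
    return {"drop"}


def two_bit_choices(ect: bool, ce: bool) -> set:
    """Separate ECT and CE bits."""
    if not ect:
        return {"drop"}
    if ce:
        # Already marked upstream: drop it, or leave it alone with CE set.
        return {"drop", "leave"}
    return {"mark", "drop"}


# A CE packet reaching a second congested router:
print(sorted(one_bit_choices(False)))       # ['drop']
print(sorted(two_bit_choices(True, True)))  # ['drop', 'leave']
```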
Another difference between the one-bit and two-bit implementations Another difference between the one-bit and two-bit implementations
comes from the fact that with the one-bit implementation, receivers comes from the fact that with the one-bit implementation, receivers
in a single flow cannot distinguish between CE and non-ECT packets. in a single flow cannot distinguish between CE and non-ECT packets.
Thus, in the one-bit implementation an ECN-capable data sender would Thus, in the one-bit implementation an ECN-capable data sender would
have to unambiguously indicate to the receiver or receivers whether have to unambiguously indicate to the receiver or receivers whether
each packet had been sent as ECN-Capable or as non-ECN-Capable. One each packet had been sent as ECN-Capable or as non-ECN-Capable. One
possibility would be for the sender to indicate in the transport possibility would be for the sender to indicate in the transport
header whether the packet was sent as ECN-Capable. A second header whether the packet was sent as ECN-Capable. A second
possibility that would involve a functional limitation for the one- possibility that would involve a functional limitation for the one-
skipping to change at page 21, line 15 skipping to change at page 22, line 33
In summary, while the one-bit implementation could be a possible In summary, while the one-bit implementation could be a possible
implementation, it has the following significant limitations relative implementation, it has the following significant limitations relative
to the two-bit implementation. First, the one-bit implementation has to the two-bit implementation. First, the one-bit implementation has
more limited functionality for the treatment of CE packets at a more limited functionality for the treatment of CE packets at a
second congested router. Second, the one-bit implementation requires second congested router. Second, the one-bit implementation requires
either that extra information be carried in the transport header of either that extra information be carried in the transport header of
packets from ECN-Capable flows (to convey the functionality of the packets from ECN-Capable flows (to convey the functionality of the
second bit elsewhere, namely in the transport header), or that second bit elsewhere, namely in the transport header), or that
senders in ECN-Capable flows accept the limitation that receivers senders in ECN-Capable flows accept the limitation that receivers
must be able to determine a priori which packets are ECN-Capable and must be able to determine a priori which packets are ECN-Capable and
which are not ECN-Capable. Third, the one-bit implementation is which are not ECN-Capable. Third, the one-bit implementation is
possibly more open to errors from faulty implementations that choose possibly more open to errors from faulty implementations that choose
the wrong default value for the ECN bit. We believe that the use of the wrong default value for the ECN bit. We believe that the use of
the extra bit in the IP header for the ECT-bit is extremely valuable the extra bit in the IP header for the ECT-bit is extremely valuable
to overcome these limitations. to overcome these limitations.
19. Historical definitions for the IPv4 TOS octet
RFC 791 [RFC791] defined the TOS (Type of Service) octet in the IP
header. In RFC 791, bits 6 and 7 of the TOS octet are listed as
"Reserved for Future Use", and are shown set to zero. The first two
fields of the TOS octet were defined as the Precedence and Type of
Service (TOS) fields.
      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |       TOS       |  0  |  0  |    RFC 791
   +-----+-----+-----+-----+-----+-----+-----+-----+
RFC 1122 included bits 6 and 7 in the TOS field, though it did not
discuss any specific use for those two bits:
      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |             TOS             |    RFC 1122
   +-----+-----+-----+-----+-----+-----+-----+-----+
The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:
      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |   PRECEDENCE    |          TOS          | MBZ |    RFC 1349
   +-----+-----+-----+-----+-----+-----+-----+-----+
Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
Cost". In addition to the Precedence and Type of Service (TOS)
fields, the last field, MBZ (for "must be zero") was defined as
currently unused. RFC 1349 stated that "The originator of a datagram
sets [the MBZ] field to zero (unless participating in an Internet
protocol experiment which makes use of that bit)."
RFC 1455 [RFC1455] defined an experimental standard that used all
four bits in the TOS field to request a guaranteed level of link
security.
RFC 1349 is obsoleted by "Definition of the Differentiated Services
Field (DS Field) in the IPv4 and IPv6 Headers" [RFC-DIFFSERV?], in
which bits 6 and 7 of the DS field are listed as Currently Unused
(CU). The first six bits of the DS field are defined as the
Differentiated Services CodePoint (DSCP):
       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |               DSCP                |    CU     |
    +-----+-----+-----+-----+-----+-----+-----+-----+
Because of this unstable history, the definition of the ECN field in
this document cannot be guaranteed to be backwards compatible with
all past uses of these two bits. The damage that could be done by a
non-ECN-capable router would be to "erase" the CE bit for an ECN-
capable packet that arrived at the router with the CE bit set, or set
the CE bit even in the absence of congestion. This has been
discussed in Section 10 on "Non-compliance in the Network".
The damage that could be done in an ECN-capable environment by a
non-ECN-capable end-node transmitting packets with the ECT bit set
has been discussed in Section 9 on "Non-compliance by the End Nodes".
AUTHORS' ADDRESSES
K. K. Ramakrishnan
AT&T Labs. Research
Phone: +1 (973) 360-8766
Email: kkrama@research.att.com
URL: http://www.research.att.com/info/kkrama
Sally Floyd
Lawrence Berkeley National Laboratory
Phone: +1 (510) 486-7518
Email: floyd@ee.lbl.gov
URL: http://www-nrg.ee.lbl.gov/floyd/
This draft was created in September 1998. This draft was created in October 1998.
It expires March 1999. It expires April 1999.