idnits 2.17.1 

draft-briscoe-tsvwg-re-ecn-tcp-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 3, 2009) is 5530 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-tcpm-ecnsyn-07

  == Outdated reference: A later version (-03) exists of
     draft-moncaster-tcpm-rcv-cheat-02

  == Outdated reference: A later version (-11) exists of
     draft-ietf-pcn-architecture-09

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 2988
     (Obsoleted by RFC 6298)

  -- Obsolete informational reference (is this intentional?): RFC 4835
     (Obsoleted by RFC 7321)

  == Outdated reference: A later version (-03) exists of
     draft-briscoe-re-pcn-border-cheat-02


     Summary: 3 errors (**), 0 flaws (~~), 5 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Transport Area Working Group                                  B. Briscoe
3	Internet-Draft                                                  BT & UCL
4	Intended status: Standards Track                              A. Jacquet
5	Expires: September 4, 2009                                  T. Moncaster
6	                                                                A. Smith
7	                                                                      BT
8	                                                           March 3, 2009

10	     Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
11	                   draft-briscoe-tsvwg-re-ecn-tcp-07

13	Status of this Memo

15	   This Internet-Draft is submitted to IETF in full conformance with the
16	   provisions of BCP 78 and BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on September 4, 2009.

36	Copyright Notice

38	   Copyright (c) 2009 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents in effect on the date of
43	   publication of this document (http://trustee.ietf.org/license-info).
44	   Please review these documents carefully, as they describe your rights
45	   and restrictions with respect to this document.

47	Abstract

49	   This document introduces a new protocol for explicit congestion
50	   notification (ECN), termed re-ECN, which can be deployed
51	   incrementally around unmodified routers.  The protocol works by
52	   arranging an extended ECN field in each packet so that, as it crosses
53	   any interface in an internetwork, it will carry a truthful prediction
54	   of congestion on the remainder of its path.  The purpose of this
55	   document is to specify the re-ECN protocol at the IP layer and to
56	   give guidelines on any consequent changes required to transport
57	   protocols.  It includes the changes required to TCP both as an
58	   example and as a specification.  It briefly gives examples of
59	   mechanisms that can use the protocol to ensure data sources respond
60	   correctly to congestion, and these are described more fully in a
61	   companion document [re-ecn-motive].

63	Authors' Statement: Status (to be removed by the RFC Editor)

65	   Although the re-ECN protocol is intended to make a simple but far-
66	   reaching change to the Internet architecture, the most immediate
67	   priority for the authors is to delay any move of the ECN nonce to
68	   Proposed Standard status.  The argument for this position is
69	   developed in Appendix E.

71	Changes from previous drafts (to be removed by the RFC Editor)

73	   Full diffs created using the rfcdiff tool are available at
74	   <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp>

76	   From -06 to -07 (current version):

78	      Major changes made following splitting this protocol document from
79	      the related motivations document [re-ecn-motive].

81	      Significant re-ordering of remaining text.

83	      New terminology introduced for clarity.

85	      Minor editorial changes throughout.

87	Table of Contents

89	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  5
90	   2.  Requirements notation  . . . . . . . . . . . . . . . . . . . .  6
91	   3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  6
92	   4.  Protocol Overview  . . . . . . . . . . . . . . . . . . . . . .  7
93	     4.1.  Simplified Re-ECN Protocol . . . . . . . . . . . . . . . .  7
94	       4.1.1.  Congestion Control and Policing the Protocol . . . . .  7
95	       4.1.2.  Background and Applicability . . . . . . . . . . . . .  8
96	     4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
97	           v6)  . . . . . . . . . . . . . . . . . . . . . . . . . . .  9
98	     4.3.  Re-ECN Protocol Operation  . . . . . . . . . . . . . . . . 10
99	     4.4.  Positive and Negative Flows  . . . . . . . . . . . . . . . 12
100	   5.  Network Layer  . . . . . . . . . . . . . . . . . . . . . . . . 13
101	     5.1.  Re-ECN IPv4 Wire Protocol  . . . . . . . . . . . . . . . . 13
102	     5.2.  Re-ECN IPv6 Wire Protocol  . . . . . . . . . . . . . . . . 15
103	     5.3.  Router Forwarding Behaviour  . . . . . . . . . . . . . . . 16
104	     5.4.  Justification for Setting the First SYN to FNE . . . . . . 17
105	     5.5.  Control and Management . . . . . . . . . . . . . . . . . . 18
106	       5.5.1.  Negative Balance Warning . . . . . . . . . . . . . . . 18
107	       5.5.2.  Rate Response Control  . . . . . . . . . . . . . . . . 19
108	     5.6.  IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 19
109	     5.7.  Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 20
110	   6.  Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 21
111	     6.1.  TCP  . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
112	       6.1.1.  RECN mode: Full Re-ECN capable transport . . . . . . . 22
113	       6.1.2.  RECN-Co mode: Re-ECT Sender with a RFC3168
114	               compliant ECN Receiver . . . . . . . . . . . . . . . . 24
115	       6.1.3.  Capability Negotiation . . . . . . . . . . . . . . . . 26
116	       6.1.4.  Extended ECN (EECN) Field Settings during Flow
117	               Start or after Idle Periods  . . . . . . . . . . . . . 27
118	       6.1.5.  Pure ACKS, Retransmissions, Window Probes and
119	               Partial ACKs . . . . . . . . . . . . . . . . . . . . . 31
120	     6.2.  Other Transports . . . . . . . . . . . . . . . . . . . . . 32
121	       6.2.1.  General Guidelines for Adding Re-ECN to Other
122	               Transports . . . . . . . . . . . . . . . . . . . . . . 32
123	       6.2.2.  Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 32
124	       6.2.3.  Guidelines for adding Re-ECN to DCCP . . . . . . . . . 33
125	       6.2.4.  Guidelines for adding Re-ECN to SCTP . . . . . . . . . 33
126	   7.  Incremental Deployment . . . . . . . . . . . . . . . . . . . . 33
127	   8.  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 34
128	     8.1.  Congestion Notification Integrity  . . . . . . . . . . . . 34
129	   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 35
130	   10. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 37
131	   11. Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 37
132	   12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37
133	   13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 38
134	   14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38
135	     14.1. Normative References . . . . . . . . . . . . . . . . . . . 38
136	     14.2. Informative References . . . . . . . . . . . . . . . . . . 39
137	   Appendix A.  Precise Re-ECN Protocol Operation . . . . . . . . . . 41
138	   Appendix B.  Justification for Two Codepoints Signifying Zero
139	                Worth Packets . . . . . . . . . . . . . . . . . . . . 43
140	   Appendix C.  ECN Compatibility . . . . . . . . . . . . . . . . . . 44
141	   Appendix D.  Packet Marking with FNE During Flow Start . . . . . . 45
142	   Appendix E.  Argument for holding back the ECN nonce . . . . . . . 47
143	   Appendix F.  Alternative Terminology Used in Other Documents . . . 49

145	1.  Introduction

147	   This document aims to provide a complete specification of the
148	   addition of the re-ECN protocol to IP and guidelines on how to add it
149	   to transport layer protocols, including a complete specification of
150	   re-ECN in TCP as an example.  The motivation behind this proposal is
151	   given in [re-ecn-motive], but we include a brief summary here.

153	   Re-ECN is intended to allow senders to inform the network of the
154	   level of congestion they expect their flows to see.  This information
155	   is currently only visible at the transport layer.  ECN [RFC3168]
156	   reveals the upstream congestion state of a path to the receiver,
157	   which monitors the rate of CE marks and informs the sender whenever
158	   it has seen a marked packet.  Re-ECN builds on ECN by providing new
159	   codepoints that allow the sender to declare the level of congestion
160	   it expects on the forward path.  It is closely related to ECN and
161	   indeed we define a compatibility mode to allow a re-ECN sender to
162	   communicate with an ECN receiver (Section 6.1.2).

164	   If a sender understates expected congestion compared to actual
165	   congestion then the network could discard packets or enact some other
166	   sanction.  A policer that limits the level of congestion being
167	   caused can also be introduced at the ingress of networks.

169	   A general statement of the problem solved by re-ECN is to provide
170	   sufficient information in each IP datagram to be able to hold senders
171	   and whole networks accountable for the congestion they cause
172	   downstream, before they cause it.  But the everyday problems that
173	   re-ECN can solve are much more recognisable than this rather generic
174	   statement: mitigating distributed denial of service (DDoS);
175	   simplifying differentiation of quality of service (QoS); policing
176	   compliance with congestion control; and so on.

178	   It is important to add a few key points.

180	   o  In any standard network it always takes one round trip before any
181	      feedback is received.  For this reason a sender must make a
182	      conservative prediction by transmitting IP packets with a special
183	      Cautious marking.

185	   o  It should be noted that the prediction is carried in-band in
186	      normal data packets and for many transports feedback can be
187	      carried in the normal acknowledgements or control packets.

189	   o  The re-ECN protocol is independent of the transport.  In TCP,
190	      acknowledgments are used to convey the feedback from receiver to
191	      sender.  This memo concentrates on TCP as an example transport
192	      protocol; however, the re-ECN protocol is compatible with any
193	      transport where feedback can be sent from receiver to sender.

195	   This document is structured as follows.  First an overview of the re-
196	   ECN protocol is given (Section 4), outlining its attributes and
197	   explaining conceptually how it works as a whole.  The two main parts
198	   of the document follow: the protocol specification, divided into
199	   network (Section 5) and transport (Section 6) layers.  Deployment
200	   issues discussed throughout the document are brought together in
201	   Section 7.  Related work is discussed in Section 8.

203	2.  Requirements notation

205	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
206	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
207	   document are to be interpreted as described in [RFC2119].

209	3.  Terminology

211	   The following terminology is used throughout this memo.  Some of this
212	   terminology is new and, to avoid confusion, Appendix F sets out all
213	   the alternative terminology that has been used in other re-ECN
214	   related documents.

216	   o  Neutral packet - a packet that is able to be congestion marked by
217	      an ECN or re-ECN queue.

219	   o  Negative packet - a Neutral packet that has been congestion marked
220	      by an ECN or re-ECN queue.

222	   o  Positive packet - a packet that has been marked by the sender to
223	      indicate the expected level of congestion along its path.  In
224	      general Positive packets should only be sent in response to
225	      feedback received from the receiver.*

227	   o  Cancelled packet - a Positive packet that has been congestion
228	      marked by an ECN or re-ECN queue.

230	   o  Cautious packet - a packet that has been marked by the sender to
231	      indicate the expected level of congestion along its path.  In
232	      general Cautious packets should be used when there is insufficient
233	      feedback to be confident about the congestion state of the
234	      network.*

236	   o  * The difference between Positive and Cautious packets is
237	      explained in detail later in the document, along with guidelines
238	      on the use of Cautious packets.

240	   All the above terms have related IP codepoints, as defined in
241	   Section 5.

243	4.  Protocol Overview

245	4.1.  Simplified Re-ECN Protocol

247	   We describe here the simplified re-ECN protocol.  To simplify the
248	   description we assume packets and segments are synonymous.

250	   Packets are sent from a sender to a receiver.  In Figure 1 the queues
251	   (Q1 and Q2) are ECN enabled as per RFC 3168 [RFC3168].  If congestion
252	   occurs then packets are marked with the congestion experienced (CE)
253	   flag exactly as in the ECN protocol [RFC3168]; the routers do not
254	   need to be modified and do not need to know the re-ECN protocol.  The
255	   receiver constantly informs the sender of the current count of
256	   Negative packets it has seen.  The sender uses this information to
257	   determine how many Positive packets it must send into the network.
258	   The sender's aim is to balance the number of bytes that have been
259	   congestion marked with the number of Positive bytes it has sent.

261	          +--------- Feedback----------+
262	          |                            |
263	          v                            |
264	        +---+    +----+    +----+    +---+
265	        |   |    |    |    |    |    |   |
266	        | S |--->| Q1 |--->| Q2 |--->| R |
267	        |   |    |    |    |    |    |   |
268	        +---+    +----+    +----+    +---+

270	                          Figure 1: Simple Re-ECN

272	4.1.1.  Congestion Control and Policing the Protocol

274	   The arrangement of the protocol ensures that packets carry a
275	   declaration of the amount of congestion that will be experienced on
276	   the path.  The re-ECN protocol is orthogonal to any congestion
277	   control algorithms, but can be used to ensure that congestion control
278	   is being applied by the sender.

280	   In general we assume that there will be a policer at the network
281	   ingress which can rate limit traffic based on the amount of
282	   congestion declared.

284	   At the network egress there is a dropper which can impose sanctions on
285	   flows that incorrectly declare congestion.

287	   Policers and droppers are explained in more detail in [re-ecn-motive].

291	4.1.2.  Background and Applicability

293	   The re-ECN protocol makes no changes and has no effect on the TCP
294	   congestion control algorithm or on other rate responses to
295	   congestion.  Re-ECN is not a new congestion control protocol; rather
296	   it is orthogonal to congestion control itself.  Re-ECN is concerned
297	   with revealing information about congestion so that users and
298	   networks can be held accountable for the congestion they cause, or
299	   allow to be caused.

301	   Re-ECN builds on ECN so we briefly recap the essentials of the ECN
302	   protocol [RFC3168].  Two bits in the IP header (v4 or v6) are
303	   assigned to the ECN field.  The sender clears the field to "00" (Not-
304	   ECT) if either end-point transport is not ECN-capable.  Otherwise it
305	   indicates an ECN-capable transport (ECT) using either of the two
306	   code-points "10" or "01" (ECT(0) and ECT(1) respectively).

308	   ECN-capable queues probabilistically set this field to "11" if
309	   congestion is experienced (CE).  In general this marking probability
310	   will increase with the length of the queue at its egress link
311	   (typically using the RED algorithm [RFC2309]).  However, they still
312	   drop rather than mark Not-ECT packets.  With multiple ECN-capable
313	   queues on a path, a flow of packets accumulates the fraction of CE
314	   marking that each queue adds.  The combined effect of the packet
315	   marking of all the queues along the path signals congestion of the
316	   whole path to the receiver.  So, for example, if one queue early in a
317	   path is marking 1% of packets and another later in a path is marking
318	   2%, flows that pass through both queues will experience approximately
319	   3% marking (see Appendix A for a precise treatment).
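   The additive 3% figure above is an approximation.  As a minimal
   sketch, assuming each queue marks packets independently of the
   others, the exact fraction of packets arriving CE-marked is one minus
   the product of each queue's probability of not marking.  The function
   name combined_marking is hypothetical; Appendix A remains the draft's
   own precise treatment.

```python
def combined_marking(fractions):
    """Probability that a packet is CE-marked by at least one queue,
    assuming each queue marks independently with the given probability."""
    unmarked = 1.0
    for p in fractions:
        unmarked *= 1.0 - p  # the packet must escape marking at every queue
    return 1.0 - unmarked


path = [0.01, 0.02]          # Q1 marks 1%, Q2 marks 2% (example above)
exact = combined_marking(path)
approx = sum(path)           # the additive approximation used in the text
```

   With the 1% and 2% example the exact value is about 2.98%, slightly
   below the additive 3% because a packet already marked at Q1 cannot be
   marked again at Q2.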

321	   The choice of two ECT code-points in the ECN field [RFC3168]
322	   permitted future flexibility, optionally allowing the sender to
323	   encode the experimental ECN nonce [RFC3540] in the packet stream.
324	   The nonce is designed to allow a sender to check the integrity of
325	   congestion feedback.  But Section 8.1 explains that it still gives no
326	   control over how fast the sender transmits as a result of the
327	   feedback.  On the other hand, re-ECN is designed both to ensure that
328	   congestion is declared honestly and that the sender's rate responds
329	   appropriately.

331	   Re-ECN is based on a feedback arrangement called `re-
332	   feedback' [Re-fb].  The word is short for either receiver-aligned,
333	   re-inserted or re-echoed feedback.  But it actually works even when
334	   no feedback is available.  In fact it has been carefully designed to
335	   work for single datagram flows.  It also encourages aggregation of
336	   single packet flows by congestion control proxies.  Then, even if the
337	   traffic mix of the Internet were to become dominated by short
338	   messages, it would still be possible to control congestion
339	   effectively and efficiently.

341	   Changing the Internet's feedback architecture seems to imply
342	   considerable upheaval.  But re-ECN can be deployed incrementally at
343	   the transport layer around unmodified queues using existing fields in
344	   IP (v4 or v6).  However, it does also require the last undefined bit
345	   in the IPv4 header, which it uses in combination with the 2-bit ECN
346	   field to create four new codepoints.  Nonetheless, we RECOMMEND
347	   adding optional preferential drop to IP queues based on the re-ECN
348	   fields in order to improve resilience against DoS attacks.
349	   Similarly, re-ECN works best if both the sender and receiver
350	   transports are re-ECN-capable, but it can work with just sender
351	   support (Section 6.1.2).

353	4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)

355	   The re-ECN wire protocol uses the two bit ECN field broadly as in
356	   RFC3168 [RFC3168] as described above, but with five differences of
357	   detail (brought together in a list in Section 7).  This specification
358	   defines a new re-ECN extension (RE) flag.  We will defer the
359	   definition of the actual position of the RE flag in the IPv4 & v6
360	   headers until Section 5.  When we don't need to choose between IPv4
361	   and v6 wire protocols it will suffice to call it the RE flag.

363	   Unlike the ECN field, the RE flag is intended to be set by the sender
364	   and SHOULD remain unchanged along the path, although it can be read
365	   by network elements that understand the re-ECN protocol.  It is
366	   feasible that a network element MAY change the setting of the RE
367	   flag, perhaps acting as a proxy for an end-point, but such a protocol
368	   would have to be defined in another specification (e.g. [Re-PCN]).

370	   Although the RE flag is a separate, single bit field, it can be read
371	   as an extension to the two-bit ECN field; the three concatenated bits
372	   in what we will call the extended ECN field (EECN) give eight
373	   codepoints.  We will use the RFC3168 names of the ECN codepoints to
374	   describe settings of the ECN field when the RE flag setting is "don't
375	   care", but we also define the following six extended ECN codepoint
376	   names for when we need to be more specific.

378	   One of re-ECN's codepoints is an alternative use of the codepoint set
379	   aside in RFC3168 for the ECN nonce (ECT(1)).  Transports using re-ECN
380	   do not need to use the ECN nonce as long as the sender is also
381	   checking for transport protocol compliance
382	   [I-D.moncaster-tcpm-rcv-cheat].  The case for doing this is given in
383	   Appendix E.  Two re-ECN codepoints are given compatible uses to those
384	   defined in RFC3168 (Not-ECT and CE).  The other codepoint used by
385	   RFC3168 (ECT(0)) isn't used for re-ECN.  Altogether this leaves one
386	   codepoint of the eight unused by ECN or re-ECN and available for
387	   future use.

389	   +--------+-------------+-------+-----------+------------------------+
390	   |   ECN  |   RFC3168   |   RE  |    EECN   |     re-ECN meaning     |
391	   |  field |  codepoint  |  flag | codepoint |                        |
392	   +--------+-------------+-------+-----------+------------------------+
393	   |   00   |   Not-ECT   |   0   |  Not-ECT  |   Not re-ECN-capable   |
394	   |        |             |       |           |   transport (Legacy)   |
395	   |   00   |     ---     |   1   |    FNE    |      Feedback not      |
396	   |        |             |       |           | established (Cautious) |
397	   |   01   |    ECT(1)   |   0   |  Re-Echo  |  Re-echoed congestion  |
398	   |        |             |       |           |   and RECT (Positive)  |
399	   |   01   |     ---     |   1   |    RECT   |     Re-ECN capable     |
400	   |        |             |       |           |   transport (Neutral)  |
401	   |   10   |    ECT(0)   |   0   |   ECT(0)  |  RFC3168 ECN use only  |
402	   |        |             |       |           |                        |
403	   |   10   |     ---     |   1   |   --CU--  |    Currently unused    |
404	   |        |             |       |           |                        |
405	   |   11   |      CE     |   0   |   CE(0)   |  Re-Echo cancelled by  |
406	   |        |             |       |           |     CE (Cancelled)     |
407	   |   11   |     ---     |   1   |   CE(-1)  | Congestion Experienced |
408	   |        |             |       |           |       (Negative)       |
409	   +--------+-------------+-------+-----------+------------------------+

411	                     Table 1: Extended ECN Codepoints
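   Because the EECN codepoint of a packet depends only on the 2-bit ECN
   field and the RE flag, Table 1 can be expressed as a simple lookup.
   The Python sketch below assumes both values have already been
   extracted from the header (their positions in IPv4 and IPv6 are
   defined in Section 5); the names EECN_NAMES and decode_eecn are
   illustrative only.

```python
# Table 1 as a lookup keyed by (2-bit ECN field, 1-bit RE flag).
EECN_NAMES = {
    (0b00, 0): "Not-ECT",  # Not re-ECN-capable transport (Legacy)
    (0b00, 1): "FNE",      # Feedback not established (Cautious)
    (0b01, 0): "Re-Echo",  # Re-echoed congestion and RECT (Positive)
    (0b01, 1): "RECT",     # Re-ECN capable transport (Neutral)
    (0b10, 0): "ECT(0)",   # RFC3168 ECN use only
    (0b10, 1): "--CU--",   # Currently unused
    (0b11, 0): "CE(0)",    # Re-Echo cancelled by CE (Cancelled)
    (0b11, 1): "CE(-1)",   # Congestion Experienced (Negative)
}


def decode_eecn(ecn_field: int, re_flag: int) -> str:
    """Return the Table 1 EECN codepoint name for one packet."""
    try:
        return EECN_NAMES[(ecn_field, re_flag)]
    except KeyError:
        raise ValueError("ECN field must be 0-3 and RE flag 0 or 1") from None
```

   For example, decode_eecn(0b01, 0) returns "Re-Echo", the codepoint a
   re-ECN sender uses when it re-echoes congestion.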

413	4.3.  Re-ECN Protocol Operation

415	   In this section we will give an overview of the operation of the re-
416	   ECN protocol for TCP/IP, leaving a detailed specification to the
417	   following sections.  Other transports will be discussed later.

419	   In summary, the protocol adds a third `re-echo' stage to the existing
420	   TCP/IP ECN protocol.  Whenever the network adds CE congestion
421	   signalling to the IP header on the forward data path, the receiver
422	   feeds it back to the ingress using TCP, then the sender re-echoes it
423	   into the forward data path using the RE flag in the next packet.

425	   Prior to receiving any feedback a sender will not know which setting
426	   of the RE flag to use, so it sends Cautious packets by setting the
427	   FNE codepoint.  The network reads the FNE codepoint conservatively as
428	   equivalent to re-echoed congestion.

430	   Specifically, once feedback from an ECN or re-ECN capable flow is
431	   established, a re-ECN sender always initialises the ECN field to
432	   ECT(1).  And it usually sets the RE flag to "1" indicating a Neutral
433	   packet.  Whenever a queue marks a packet to CE, the receiver feeds
434	   back this event to the sender.  On receiving this feedback, the re-
435	   ECN sender will clear the RE flag to "0" in the next packet it sends
436	   (indicating a Positive packet).
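   The sender rules above can be sketched as a small state machine.
   This is a minimal illustration under stated assumptions, not the TCP
   specification of Section 6.1: the class and method names are
   hypothetical, CE feedback is assumed to arrive as a count of
   congestion events, each re-echo is applied to one whole packet (so
   packets are assumed to be of equal size), and the additional
   flow-start rules of Section 6.1.4 are ignored.

```python
from dataclasses import dataclass
from typing import Tuple

NOT_ECT = 0b00  # ECN field carried by FNE packets (Table 1)
ECT1 = 0b01     # ECN field a re-ECN sender uses once feedback is established


@dataclass
class ReEcnSender:
    """Hypothetical sketch of the Section 4.3 sender rules."""
    feedback_established: bool = False
    pending_re_echoes: int = 0  # CE events fed back but not yet re-echoed

    def on_feedback(self, ce_events: int) -> None:
        """Record feedback from the receiver reporting ce_events CE marks."""
        self.feedback_established = True
        self.pending_re_echoes += ce_events

    def next_packet_bits(self) -> Tuple[int, int]:
        """Return (ECN field, RE flag) for the next packet to send."""
        if not self.feedback_established:
            return (NOT_ECT, 1)   # FNE: Cautious, no feedback yet
        if self.pending_re_echoes > 0:
            self.pending_re_echoes -= 1
            return (ECT1, 0)      # Re-Echo: RE flag blanked, Positive
        return (ECT1, 1)          # RECT: Neutral


sender = ReEcnSender()
first = sender.next_packet_bits()   # (0, 1): FNE
sender.on_feedback(ce_events=1)
second = sender.next_packet_bits()  # (1, 0): Re-Echo
third = sender.next_packet_bits()   # (1, 1): RECT
```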

438	   We chose to set and clear the RE flag this way round to ease
439	   incremental deployment (see Section 7).  To avoid confusion we will
440	   use the term `blanking' (rather than marking) when the RE flag is
441	   cleared to "0".  So, over a stream of packets, we will talk of the
442	   `RE blanking fraction' as the fraction of octets in packets with the
443	   RE flag cleared to "0".

445	       +---+  +----+                +----+  +---+
446	       | S |--| Q1 |----------------| Q2 |--| R |
447	       +---+  +----+                +----+  +---+
448	         .      .                      .      .
449	       ^ .      .                      .      .
450	       | .      .                      .      .
451	       | .     RE blanking fraction    .      .
452	    3% |-------------------------------+=======
453	       | .      .                      |      .
454	    2% | .      .                      |      .
455	       | .      .  CE marking fraction |      .
456	    1% | .      +----------------------+      .
457	       | .      |                      .      .
458	    0% +--------------------------------------->
459	         ^          ^                      ^
460	         L          M                      N    Observation points

462	                  Figure 2: A 2-Queue Example (Imprecise)

464	   Figure 2 uses a simple network to illustrate how re-ECN allows queues
465	   to measure downstream congestion.  The receiver views a CE marking
466	   fraction of 3% which is fed back to the sender.  The sender sets an
467	   RE blanking fraction of 3% to match this.  This RE blanking fraction
468	   can be observed along the path as the RE flag is not changed by
469	   network nodes once set by the sender.  This is shown by the
470	   horizontal line at 3% in the figure.  The CE marked fraction is shown
471	   by the stepped line which rises to meet the RE blanking fraction line
472	   with steps at each queue where packets are marked.  Two congested
473	   queues are shown (Q1 and Q2); of the packets passing through, 1% are
474	   marked at Q1 and a further 2% at Q2.  The approximate downstream
475	   congestion at each observation point is the RE blanking fraction
476	   minus the CE marking fraction: 3% - 0% = 3% at L, 3% - 1% = 2% at M
477	   and 3% - 3% = 0% at N (Appendix A derives these approximations from
478	   a precise analysis).
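   The subtraction can be made from packet headers alone at any
   observation point.  The sketch below assumes packets are observed as
   hypothetical (size, ECN field, RE flag) tuples, weights both fractions
   by octets as in the definition of the RE blanking fraction, and for
   simplicity skips FNE, Not-ECT and ECT(0) packets; a real monitor would
   need to count FNE as re-echoed congestion, as described above.

```python
from typing import Iterable, Tuple

Packet = Tuple[int, int, int]  # (size in octets, ECN field, RE flag)


def downstream_congestion(packets: Iterable[Packet]) -> float:
    """RE blanking fraction minus CE marking fraction, both octet-weighted.

    Only re-ECN data packets (ECN field 01 or 11) are counted; FNE,
    Not-ECT, ECT(0) and the unused codepoint are skipped in this sketch.
    """
    total = blanked = marked = 0
    for size, ecn, re_flag in packets:
        if ecn not in (0b01, 0b11):
            continue
        total += size
        if re_flag == 0:
            blanked += size  # Re-Echo or CE(0): sender blanked the RE flag
        if ecn == 0b11:
            marked += size   # CE(0) or CE(-1): marked by an upstream queue
    if total == 0:
        return 0.0
    return (blanked - marked) / total


# Observation point M in Figure 2: 100 equal-sized packets, 3 re-echoed,
# and 1 of the others already marked CE at Q1.
at_m = [(1500, 0b01, 0)] * 3 + [(1500, 0b11, 1)] + [(1500, 0b01, 1)] * 96
estimate = downstream_congestion(at_m)  # about 0.03 - 0.01 = 0.02
```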

480	   NB: because ECN marking and re-ECN blanking are both unary (each
481	   packet is either marked or not), the fraction of marked bytes has to
482	   be calculated by maintaining a moving average of the number of
483	   bytes in marked packets as a proportion of the total number of
484	   bytes.
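   One way to maintain such a moving average is to smooth the marked
   octets and the total octets separately and take their ratio.  In this
   sketch the exponentially weighted form and the gain of 1/16 are
   assumptions chosen for illustration; the draft does not specify an
   averaging method, and the class name is hypothetical.

```python
class MarkedOctetAverage:
    """Ratio of two exponentially weighted moving averages: marked octets
    and total octets per packet. The default gain is an assumption."""

    def __init__(self, gain: float = 1 / 16) -> None:
        if not 0.0 < gain <= 1.0:
            raise ValueError("gain must be in (0, 1]")
        self.gain = gain
        self.avg_marked = 0.0  # smoothed marked octets per packet
        self.avg_total = 0.0   # smoothed octets per packet

    def update(self, size: int, marked: bool) -> float:
        """Fold in one packet and return the current marked-octet fraction."""
        g = self.gain
        self.avg_marked += g * ((size if marked else 0) - self.avg_marked)
        self.avg_total += g * (size - self.avg_total)
        return self.avg_marked / self.avg_total if self.avg_total else 0.0
```

   A smaller gain gives a smoother estimate that responds more slowly to
   changes in marking; the same structure can track the RE blanking
   fraction by passing "RE flag blanked" instead of "CE marked".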

486	   Along the path the fraction of packets that had their RE flag
487	   cleared remains unchanged, so it can be used as a reference against
488	   which to compare upstream congestion.  The difference predicts
489	   downstream congestion for the rest of the path.  Therefore, measuring
490	   the fractions of each codepoint at any point in the Internet will
491	   reveal upstream, downstream and whole path congestion.

493	   Note that we have introduced discussion of marking and blanking
494	   fractions solely for illustration.  We are not saying any protocol
495	   handler will work with these average fractions directly.  In fact the
496	   protocol requires the number of marked and blanked bytes to balance
497	   by the time the packets reach the receiver.

499	4.4.  Positive and Negative Flows

501	   In Section 3 we introduced the terms Positive, Neutral, Negative,
502	   Cautious and Cancelled.  This terminology is based on the requirement
503	   to balance the proportion of bytes marked as CE with the proportion
504	   of bytes that are re-echo marked.  In the rest of this memo we will
505	   loosely talk of positive or negative flows, meaning flows where the
506	   moving average of the downstream congestion metric is persistently
507	   positive or negative.  A negative flow is one where more CE marked
508	   packets than re-ECN blanked packets arrive.  Likewise in positive
509	   flows more re-ECN blanked packets arrive than CE marked packets.  The
510	   notion of a negative metric arises because it is derived by
511	   subtracting one metric from another.  Of course actual downstream
512	   congestion cannot be negative, only the metric can (whether due to
513	   time lags or deliberate malice).

515	   Therefore we will talk of packets having `worth' of +1, 0 or -1,
516	   which, when multiplied by their size, indicates their contribution to
517	   the downstream congestion metric.  The worth of each type of packet
518	   is given below in Table 2.  The idea is that most flows start with
519	   zero worth.  Every time the network decrements the worth of a packet,
520	   the sender increments the worth of a later packet.  Then, over time,
521	   as many positive octets should arrive at the receiver as negative.
522	   Note we have said octets not packets, so if packets are of different
523	   sizes, the worth should be incremented on enough octets to balance
524	   the octets in negative packets arriving at the receiver.  It is this
525	   balance that will allow the network to hold the sender accountable
526	   for the congestion it causes.
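   The balance described above can be pictured as a running sum of
   worth multiplied by packet size.  The following is a minimal sketch
   that assumes a hypothetical per-flow observer at the receiver.  The
   class name `WorthBalance` is illustrative and not part of the
   protocol.

```python
class WorthBalance:
    """Running sum of worth x size (in octets) for one flow, as seen by an observer."""

    def __init__(self) -> None:
        self.balance = 0  # octets in positive packets minus octets in negative packets

    def add(self, worth: int, size: int) -> None:
        """Account for one arriving packet of the given worth (+1, 0 or -1) and size."""
        if worth not in (-1, 0, 1):
            raise ValueError("worth must be -1, 0 or +1")
        self.balance += worth * size

    def is_balanced(self) -> bool:
        """True when as many positive octets as negative octets have arrived."""
        return self.balance == 0
```

   For example, one 1500-octet packet arriving with worth -1 is
   balanced by a later 1500-octet packet of worth +1, or by two
   750-octet ones.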

528	   If a packet carrying re-echoed congestion happens to also be
529	   congestion marked, the +1 worth added by the sender will be cancelled
530	   out by the -1 network congestion marking.  Although the two worth
531	   values correctly cancel out, neither the congestion marking nor the
532	   re-echoed congestion are lost, because the RE bit and the ECN field
533	   are orthogonal.  So, whenever this happens, the receiver will
534	   correctly detect and re-echo the new congestion event as well.

536	   The table below specifies unambiguously the worth of each extended
537	   ECN codepoint.  Note the order is different from the previous table
538	   to better show how the worth increments and decrements.

540	   +---------+-------+---------------+-------+-------------------------+
541	   |   ECN   |   RE  | Extended ECN  | Worth |       Re-ECN Term       |
542	   |  field  |  bit  | codepoint     |       |                         |
543	   +---------+-------+---------------+-------+-------------------------+
544	   |    00   |   0   | Not-RECT      | ...   |           ---           |
545	   |    00   |   1   | FNE           | +1    |         Cautious        |
546	   |    01   |   0   | Re-Echo       | +1    |         Positive        |
547	   |    10   |   0   | Legacy        | ...   |   RFC3168 ECN use only  |
548	   |         |       |               |       |                         |
   |    11   |   0   | CE(0)         |  0    |        Cancelled        |
550	   |    01   |   1   | RECT          |  0    |         Neutral         |
551	   |    10   |   1   | --CU--        | ...   |     Currently unused    |
552	   |         |       |               |       |                         |
553	   |    11   |   1   | CE(-1)        | -1    |         Negative        |
554	   +---------+-------+---------------+-------+-------------------------+

556	                Table 2: 'Worth' of Extended ECN Codepoints
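   For illustration, Table 2 can be transcribed into a lookup keyed by
   the ECN field and the RE bit.  This is a sketch only.  The names
   `EECN_WORTH` and `classify` are hypothetical, and `None` stands for
   the codepoints whose worth the table leaves undefined.

```python
from typing import Dict, Optional, Tuple

# (ECN field, RE bit) -> (extended ECN codepoint name, worth), transcribed from Table 2.
EECN_WORTH: Dict[Tuple[int, int], Tuple[str, Optional[int]]] = {
    (0b00, 0): ("Not-RECT", None),
    (0b00, 1): ("FNE", +1),
    (0b01, 0): ("Re-Echo", +1),
    (0b10, 0): ("Legacy", None),
    (0b11, 0): ("CE(0)", 0),
    (0b01, 1): ("RECT", 0),
    (0b10, 1): ("--CU--", None),
    (0b11, 1): ("CE(-1)", -1),
}


def classify(ecn: int, re_bit: int) -> Tuple[str, Optional[int]]:
    """Return the codepoint name and worth for a 2-bit ECN field and 1-bit RE flag."""
    if ecn not in (0, 1, 2, 3) or re_bit not in (0, 1):
        raise ValueError("ECN field must be 0-3 and RE bit 0 or 1")
    return EECN_WORTH[(ecn, re_bit)]
```

   Note how a CE mark applied by a router to a Re-Echo packet (01,0
   becoming 11,0) lowers its worth from +1 to 0, and a CE mark applied
   to a RECT packet (01,1 becoming 11,1) lowers it from 0 to -1.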

558	5.  Network Layer

560	5.1.  Re-ECN IPv4 Wire Protocol

562	   The wire protocol of the ECN field in the IP header remains largely
563	   unchanged from [RFC3168].  However, an extension to the ECN field we
564	   call the RE (Re-ECN extension) flag (Section 4.2) is defined in this
565	   document.  It doubles the extended ECN codepoint space, giving 8
566	   potential codepoints.  The semantics of the extra codepoints are
567	   backward compatible with the semantics of the 4 original codepoints
568	   [RFC3168] (Section 7 collects together and summarises all the changes
569	   defined in this document).

571	   For IPv4, this document proposes that the new RE control flag will be
572	   positioned where the `reserved' control flag was at bit 48 of the
573	   IPv4 header (counting from 0).  Alternatively, some would call this
574	   bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4
575	   header (Figure 3).

577	             0   1   2
578	           +---+---+---+
579	           | R | D | M |
580	           | E | F | F |
581	           +---+---+---+

583	   Figure 3: New Definition of the Re-ECN Extension (RE) Control Flag at
584	                  the Start of Byte 7 of the IPv4 Header
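   A minimal sketch of reading and writing the RE flag in a raw IPv4
   header follows.  It assumes the header is held in a Python
   `bytearray`.  Bit 48 (counting from 0) is the most significant bit
   of byte offset 6, and the ECN field is the two least significant
   bits of byte offset 1.  The RE flag is covered by the IPv4 header
   checksum, so the sketch recomputes the checksum (the standard
   one's-complement sum) whenever it writes the flag.  The helper names
   are hypothetical.

```python
RE_MASK = 0x80   # bit 48 of the header = most significant bit of byte offset 6
ECN_MASK = 0x03  # ECN field = two least significant bits of byte offset 1


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words; yields 0 when verifying a valid header."""
    s = 0
    for i in range(0, len(header), 2):
        s += (header[i] << 8) | header[i + 1]
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF


def get_re(hdr: bytes) -> int:
    return (hdr[6] & RE_MASK) >> 7


def get_ecn(hdr: bytes) -> int:
    return hdr[1] & ECN_MASK


def set_re(hdr: bytearray, value: int) -> None:
    """Set or clear the RE flag, leaving DF/MF untouched, then fix the checksum."""
    if value:
        hdr[6] |= RE_MASK
    else:
        hdr[6] &= ~RE_MASK & 0xFF
    ihl = (hdr[0] & 0x0F) * 4            # header length in bytes
    hdr[10:12] = b"\x00\x00"             # checksum field is zero while summing
    hdr[10:12] = ipv4_checksum(bytes(hdr[:ihl])).to_bytes(2, "big")
```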

586	   The semantics of the RE flag are described in outline in Section 4
587	   and specified fully in Section 6.  The RE flag is always considered
588	   in conjunction with the 2-bit ECN field, as if they were concatenated
589	   together to form a 3-bit extended ECN field.  If the ECN field is set
590	   to either the ECT(1) or CE codepoint, when the RE flag is blanked
   (cleared to "0") it represents a re-echo of congestion experienced by
   an earlier packet.  If the ECN field is set to the Not-ECT codepoint,
593	   when the RE flag is set to "1" it represents the feedback not
594	   established (FNE) codepoint, which signals that the packet was sent
595	   without the benefit of congestion feedback.

597	   It is believed that the FNE codepoint can simultaneously serve other
598	   purposes, particularly where the start of a flow needs distinguishing
599	   from packets later in the flow.  For instance it would have been
600	   useful to identify new flows for tag switching and might enable
601	   similar developments in the future if it were adopted.  It is similar
602	   to the state set-up bit idea designed to protect against memory
603	   exhaustion attacks.  This idea was proposed informally by David Clark
604	   and documented by Handley and Greenhalgh  [Steps_DoS].  The FNE
605	   codepoint can be thought of as a `soft-state set-up flag', because it
606	   is idempotent (i.e. one occurrence of the flag is sufficient but
607	   further occurrences achieve the same effect if previous ones were
608	   lost).

   There will probably be other claims on the use of bit 48.  We know
   of at least two [ARI05], [RFC3514], but neither has been pursued in
   the IETF so far, although the present proposal would meet the needs
   of the latter.

   The security flag proposal (commonly known as the evil bit) was
   published on 1 April 2003 as Informational RFC 3514, but it was not
   adopted due to confusion over whether evil-doers might set it
   inappropriately.  The present proposal is backward compatible with
   RFC3514 because benign re-ECN compliant senders would correctly
   clear the evil bit to honestly declare that they had just received
   congestion feedback, whereas evil-doers would hide congestion
   feedback by setting the evil bit continuously, or at least more
   often than they should.  So, evil senders can be identified, because
   they declare that they are good less often than they should.

626	5.2.  Re-ECN IPv6 Wire Protocol

628	   For IPv6, this document proposes that the new RE control flag will be
629	   positioned as the first bit of the option field of a new Congestion
630	   hop by hop option header (Figure 4).

632	        0                   1                   2                   3
633	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
634	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
635	       |  Next Header  |  Hdr ext Len  |  Option Type  | Opt Length =4 |
636	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
637	       |R|                     Reserved for future use                 |
638	       |E|                                                             |
639	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

641	      Figure 4: Definition of a New IPv6 Congestion Hop by Hop Option
642	         Header containing the re-ECN Extension (RE) Control Flag

644	               0 1 2 3 4 5 6 7 8
645	               +-+-+-+-+-+-+-+-+-
646	               |AIU|C|Option ID|
647	               +-+-+-+-+-+-+-+-+-

649	           Figure 5: Congestion Hop by Hop Option Type Encoding

651	   The Hop-by-Hop Options header enables packets to carry information to
652	   be examined and processed by routers or nodes along the packet's
653	   delivery path, including the source and destination nodes.  For re-
654	   ECN, the two bits of the Action If Unrecognized (AIU) flag of the
655	   Congestion extension header MUST be set to "00" meaning if
656	   unrecognized `skip over option and continue processing the header'.
   Then any router or receiver not upgraded with the optional re-ECN
   features described in this memo will simply skip over this option,
   whereas routers with these optional re-ECN features, or a re-ECN
   policing function, will process the Congestion option.

662	   The `C' flag MUST be set to "1" to specify that the Option Data
663	   (currently only the RE control flag) can change en-route to the
664	   packet's final destination.  This ensures that, when an
665	   Authentication header (AH [RFC4302]) is present in the packet, for
666	   any option whose data may change en-route, its entire Option Data
667	   field will be treated as zero-valued octets when computing or
668	   verifying the packet's authenticating value.

670	   Although the RE control flag should not be changed along the path, we
671	   expect that the rest of this option field that is currently `Reserved
672	   for future use' could be used for a multi-bit congestion notification
673	   field which we would expect to change en route.  As the RE flag does
674	   not need end-to-end authentication, we set the C flag to '1'.
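   The encoding of Figures 4 and 5 can be sketched as follows.  This is
   an illustration only.  The Option ID has not been registered with
   IANA, so it is passed in as a placeholder parameter, and the value
   used in the example is hypothetical.  The sketch sets AIU to "00" and
   C to "1" as required above, puts the RE flag in the first bit of the
   Option Data, and leaves the reserved bits zero.

```python
def congestion_option_type(option_id: int, aiu: int = 0b00, change: int = 1) -> int:
    """Pack AIU (2 bits), C (1 bit) and Option ID (5 bits) into the Option Type byte."""
    if not 0 <= option_id <= 0x1F:
        raise ValueError("Option ID is a 5-bit value")
    return (aiu << 6) | (change << 5) | option_id


def build_congestion_hbh(next_header: int, option_id: int, re_bit: int) -> bytes:
    """Build the 8-octet Hop-by-Hop header of Figure 4 carrying the RE flag."""
    option_data = bytes([0x80 if re_bit else 0x00, 0, 0, 0])  # RE = first bit; rest reserved
    hdr_ext_len = 0  # in 8-octet units, not counting the first 8 octets
    return bytes([next_header, hdr_ext_len,
                  congestion_option_type(option_id), len(option_data)]) + option_data


def parse_re(hbh: bytes) -> int:
    """Return the RE flag from a header laid out as in Figure 4."""
    return hbh[4] >> 7
```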

676	   {ToDo: A Congestion Hop by Hop Option ID will need to be registered
677	   with IANA.}

679	5.3.  Router Forwarding Behaviour

681	   Re-ECN works well without modifying the forwarding behaviour of any
682	   routers.  However, below, two OPTIONAL changes to forwarding
683	   behaviour are defined which respectively enhance performance and
684	   improve a router's discrimination against flooding attacks.  They are
685	   both OPTIONAL additions that we propose MAY apply by default to all
686	   Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN
687	   marking behaviours [RFC3168].  Specifications for PHBs MAY define
688	   different forwarding behaviours from this default, but this is not
689	   required.  [Re-PCN] is one example.

691	   FNE indicates ECT:

693	      The FNE codepoint tells a router to assume that the packet was
694	      sent by an ECN-capable transport (see Section 5.4).  Therefore an
695	      FNE packet MAY be marked rather than dropped.  Note that the FNE
696	      codepoint has been intentionally chosen so that, to RFC3168
697	      compliant routers (which do not inspect the RE flag) an FNE packet
698	      appears to be Not-ECT so it will be dropped by legacy AQM
699	      algorithms.

701	      A network operator MUST NOT configure a queue to ECN mark rather
702	      than drop FNE packets unless it can guarantee that FNE packets
703	      will be rate limited, either locally or upstream.  The ingress
704	      policers discussed in [re-ecn-motive] would count as rate limiters
705	      for this purpose.

707	   Preferential Drop:  If a re-ECN capable router queue experiences very
708	      high load so that it has to drop arriving packets (e.g. a DoS
709	      attack), it MAY preferentially drop packets within the same
710	      Diffserv PHB using the preference order for extended ECN
711	      codepoints given in Table 3.  Preferential dropping can be
712	      difficult to implement on some hardware, but if feasible it would
713	      discriminate against attack traffic if done as part of the overall
714	      policing framework of [re-ecn-motive].  If nowhere else, routers
715	      at the egress of a network SHOULD implement preferential drop
716	      (stronger than the MAY above).  For simplicity, preferences 4 & 5
717	      MAY be merged into one preference level.

719	   +-------+-----+------------+-------+------------+-------------------+
720	   |  ECN  |  RE | Extended   | Worth | Drop Pref  |   Re-ECN meaning  |
721	   | field | bit | ECN        |       | (1 = drop  |                   |
722	   |       |     | codepoint  |       | 1st)       |                   |
723	   +-------+-----+------------+-------+------------+-------------------+
724	   |   01  |  0  | Re-Echo    | +1    | 5/4        |     Re-echoed     |
725	   |       |     |            |       |            |   congestion and  |
726	   |       |     |            |       |            |        RECT       |
727	   |   00  |  1  | FNE        | +1    | 4          |    Feedback not   |
728	   |       |     |            |       |            |    established    |
729	   |   11  |  0  | CE(0)      | 0     | 3          |  Re-Echo canceled |
730	   |       |     |            |       |            |   by congestion   |
731	   |       |     |            |       |            |    experienced    |
732	   |   01  |  1  | RECT       | 0     | 3          |   Re-ECN capable  |
733	   |       |     |            |       |            |     transport     |
734	   |   11  |  1  | CE(-1)     | -1    | 3          |     Congestion    |
735	   |       |     |            |       |            |    experienced    |
736	   |   10  |  1  | --CU--     | n/a   | 2          |  Currently Unused |
737	   |   10  |  0  | ---        | n/a   | 2          |  RFC3168 ECN use  |
738	   |       |     |            |       |            |        only       |
739	   |   00  |  0  | Not-RECT   | n/a   | 1          |        Not        |
740	   |       |     |            |       |            |   Re-ECN-capable  |
741	   |       |     |            |       |            |     transport     |
742	   +-------+-----+------------+-------+------------+-------------------+

744	       Table 3: Drop Preference of EECN Codepoints (Sorted by `Worth')

      The above drop preferences are arranged to preserve packets with
      more positive worth (Section 4.4), given that senders of positive
      packets must have honestly declared downstream congestion.  A full
      treatment of this is provided in the companion document describing
      the motivation and architecture for re-ECN [re-ecn-motive],
      particularly where it describes the application of re-ECN to
      protect against DDoS attacks.
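   The two optional behaviours above can be sketched together for a
   queue that has to signal congestion on, or shed, a packet.  This is
   an illustration under assumptions, not a specified algorithm.  The
   `fne_rate_limited` flag is hypothetical and stands for the
   operator's guarantee that FNE packets are rate limited locally or
   upstream.  An RFC3168 compliant queue behaves as if that flag were
   always false, because it never inspects the RE flag.  When several
   candidates share the lowest drop preference, the sketch drops the
   earliest one, a tie-break the text does not specify.

```python
from typing import List, Tuple

# (ECN field, RE bit) -> drop preference from Table 3 (1 = drop first).
DROP_PREF = {
    (0b01, 0): 5,  # Re-Echo (5, or 4 if merged with FNE)
    (0b00, 1): 4,  # FNE
    (0b11, 0): 3,  # CE(0)
    (0b01, 1): 3,  # RECT
    (0b11, 1): 3,  # CE(-1)
    (0b10, 1): 2,  # currently unused
    (0b10, 0): 2,  # RFC3168 ECN use only
    (0b00, 0): 1,  # Not-RECT
}


def congestion_action(ecn: int, re_bit: int, fne_rate_limited: bool) -> str:
    """Return 'mark' (set CE) or 'drop' for a packet chosen to signal congestion."""
    ecn_capable = ecn != 0b00              # ECT(1), ECT(0) or already CE
    is_fne = ecn == 0b00 and re_bit == 1   # feedback not established
    if ecn_capable or (is_fne and fne_rate_limited):
        return "mark"                      # an FNE packet becomes CE(-1)
    return "drop"


def drop_preference(ecn: int, re_bit: int, merge_4_and_5: bool = False) -> int:
    pref = DROP_PREF[(ecn, re_bit)]
    return 4 if merge_4_and_5 and pref == 5 else pref


def choose_victim(candidates: List[Tuple[int, int]], merge_4_and_5: bool = False) -> int:
    """Index of the packet to drop among candidates in the same Diffserv PHB."""
    return min(range(len(candidates)),
               key=lambda i: drop_preference(*candidates[i], merge_4_and_5))
```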

754	5.4.  Justification for Setting the First SYN to FNE

   Section 6.1.4 requires Re-ECT client A to set the initial SYN to FNE,
   and Section 5.3 says a queue MAY optionally treat an FNE packet as
   ECN-capable, so an initial SYN may be marked CE(-1) rather than
   dropped.  This seems dangerous, because the sender has not yet
   established whether the receiver understands congestion marking.  It
   also seems to allow malicious senders to exploit ECN marking to
   suffer less drop when launching SYN flooding attacks.  Below we
   explain the features of the protocol design that remove both these
   dangers.

766	   ECN-capable initial SYN with a Not-ECT server:  If the TCP server B
767	      is re-ECN capable, provision is made for it to feedback a possible
768	      congestion marked SYN in the SYN ACK (Section 6.1.4).  But if the
769	      TCP client A finds out from the SYN ACK that the server was not
770	      ECN-capable, the TCP client MUST conservatively consider the first
771	      SYN as congestion marked before setting itself into Not-ECT mode.
772	      Section 6.1.4 mandates that such a TCP client MUST also set its
      initial window to 1 segment.  In this way we remove the need to
      cautiously set the first SYN to Not-RECT.  This will
775	      give worse performance while deployment is patchy, but better
776	      performance once deployment is widespread.

778	   SYN flooding attacks can't exploit ECN-capability:  Malicious hosts
779	      may think they can use the advantage that ECN-marking gives over
780	      drop in launching classic SYN-flood attacks.  But Section 5.3
781	      mandates that a router MUST only be configured to treat packets
782	      with the FNE codepoint as ECN-capable if FNE packets are rate
783	      limited somewhere.  Introduction of the FNE codepoint was a
784	      deliberate move to enable transport-neutral handling of flow-start
785	      and flow state set-up in the IP layer where it belongs.  It then
786	      becomes possible to protect against flooding attacks of all forms
787	      (not just SYN flooding) without transport-specific inspection for
788	      things like the SYN flag in TCP headers.  Then, for instance, SYN
789	      flooding attacks using IPSec ESP encryption can also be rate
790	      limited at the IP layer.

792	   It might seem pedantic going to all this trouble to enable ECN on the
793	   initial packet of a flow, but it is motivated by a much wider concern
794	   to ensure safe congestion control will still be possible even if the
795	   application mix evolves to the point where the majority of flows
796	   consist of a single window or even a single packet.  It also allows
797	   denial of service attacks to be more easily isolated and prevented.

799	5.5.  Control and Management

801	5.5.1.  Negative Balance Warning

803	   A new ICMP message type is being considered so that a dropper can
804	   warn the apparent sender of a flow that it has started to sanction
805	   the flow.  The message would have similar semantics to the `Time
806	   exceeded' ICMP message type.  To ensure the sender has to invest some
807	   work before the network will generate such a message, a dropper
808	   SHOULD only send such a message for flows that have demonstrated that
809	   they have started correctly by establishing a positive record, but
810	   have later gone negative.  The threshold is up to the implementation.
   The purpose of the message is to distinguish drops caused by
   sanctions from drops with other causes, such as congestion or
   transmission losses.  The dropper
813	   would send the message to the sender of the flow, not the receiver.

815	   If we did define this message type, it would be REQUIRED for all re-
816	   ECT senders to parse and understand it.  Note that a sender MUST only
817	   use this message to explain why losses are occurring.  A sender MUST
818	   NOT take this message to mean that losses have occurred that it was
   not aware of.  Otherwise, spoofed messages could be sent by malicious
   sources to slow down a sender (cf. ICMP source quench).

822	   However, the need for this message type is not yet confirmed, as we
823	   are considering how to prevent it being used by malicious senders to
824	   scan for droppers and to test their threshold settings. {ToDo:
825	   Complete this section.}

827	5.5.2.  Rate Response Control

829	   As discussed in [re-ecn-motive] the sender's access operator will be
830	   expected to use bulk per-user policing, but they might choose to
831	   introduce a per-flow policer.  In cases where operators do introduce
832	   per-flow policing, there may be a need for a sender to send a request
833	   to the ingress policer asking for permission to apply a non-default
834	   response to congestion (where TCP-friendly is assumed to be the
835	   default).  This would require the sender to know what message
836	   format(s) to use and to be able to discover how to address the
837	   policer.  The required control protocol(s) are outside the scope of
838	   this document, but will require definition elsewhere.

840	   The policer is likely to be local to the sender and inline, probably
841	   at the ingress interface to the internetwork.  So, discovery should
842	   not be hard.  A variety of control protocols already exist for some
843	   widely used rate-responses to congestion.  For instance DCCP
844	   congestion control identifiers (CCIDs [RFC4340]) fulfil this role and
   so does QoS signalling (e.g. an RSVP request for controlled load
846	   service is equivalent to a request for no rate response to
847	   congestion, but with admission control).

849	5.6.  IP in IP Tunnels

851	   For re-ECN to work correctly through IP in IP tunnels, it needs
852	   slightly different tunnel handling to regular ECN [RFC3168].
   Currently there is some inconsistency between how the handling of IP
854	   in IP tunnels is defined in [RFC3168] and how it is defined in
855	   [RFC4301], but re-ECN would work fine with the IPsec behaviour.  This
856	   inconsistency is addressed in a new Internet Draft [ECN-tunnel] that
857	   proposes to update RFC3168 tunnel behaviour to bring it into line
858	   with IPsec.  Ideally, for re-ECN to work through a tunnel, the tunnel
859	   entry should copy both the RE flag and the ECN field from the inner
860	   to the outer IP header.  Then at the tunnel exit, any congestion
861	   marking of the outer ECN field should overwrite the inner ECN field
862	   (unless the inner field is Not-ECT in which case an alarm should be
863	   raised).  The RE flag shouldn't change along a path, so the outer RE
   flag should be the same as the inner.  If it isn't, a management alarm
865	   should be raised.  This behaviour is the same as the full-
866	   functionality variant of [RFC3168] at tunnel exit, but different at
867	   tunnel entry.
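   The ideal tunnel behaviour described above can be sketched as a pair
   of functions over the (ECN field, RE flag) pair.  This illustration
   rests on stated assumptions.  The `alarm` callback is a hypothetical
   hook for the management alarm.  When the outer header is CE but the
   inner one is Not-ECT, the sketch raises the alarm and leaves the
   inner field unchanged, because the text does not say what else
   should happen to the packet.

```python
from typing import Callable, Tuple

NOT_ECT, CE = 0b00, 0b11


def tunnel_entry(inner_ecn: int, inner_re: int) -> Tuple[int, int]:
    """Outer (ECN field, RE flag): both copied from the inner header."""
    return inner_ecn, inner_re


def tunnel_exit(inner_ecn: int, inner_re: int, outer_ecn: int, outer_re: int,
                alarm: Callable[[str], None]) -> Tuple[int, int]:
    """Return the inner (ECN field, RE flag) to forward after decapsulation."""
    ecn = inner_ecn
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            alarm("outer CE on a Not-ECT inner packet")  # inner left unchanged here
        else:
            ecn = CE  # congestion marked inside the tunnel is propagated
    if outer_re != inner_re:
        alarm("RE flag changed inside the tunnel")
    return ecn, inner_re
```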

869	   If tunnels are left as they are specified in [RFC3168], whether the
870	   limited or full-functionality variants are used, a problem arises
871	   with re-ECN if a tunnel crosses an inter-domain boundary, because the
872	   difference between positive and negative markings will not be
873	   correctly accounted for.  In a limited functionality ECN tunnel, the
874	   flow will appear to be RFC3168 compliant traffic, and therefore may
875	   be wrongly rate limited.  In a full-functionality ECN tunnel, the
   result will depend on whether the tunnel entry copies the inner RE flag
877	   to the outer header or the RE flag in the outer header is always
878	   cleared.  If the former, the flow will tend to be too positive when
879	   accounted for at borders.  If the latter, it will be too negative.
880	   If the rules set out in [ECN-tunnel] are followed then this will not
881	   be an issue.

883	5.7.  Non-Issues

885	   The following issues might seem to cause unfavourable interactions
886	   with re-ECN, but we will explain why they don't:

888	   o  Various link layers support explicit congestion notification, such
889	      as Frame Relay and ATM.  Explicit congestion notification is
890	      proposed to be added to other link layers, such as Ethernet
891	      (802.3ar Ethernet congestion management) and MPLS [RFC5129];

893	   o  Encryption and IPSec.

895	   In the case of congestion notification at the link layer, each
896	   particular link layer scheme either manages congestion on the link
897	   with its own link-level feedback (the usual arrangement in the cases
898	   of ATM and Frame Relay), or congestion notification from the link
899	   layer is merged into congestion notification at the IP level when the
900	   frame headers are decapsulated at the end of the link (the
   recommended arrangement in the Ethernet and MPLS cases).  Given the
   RE flag is not intended to change along the path, downstream
   congestion will still be measurable at any point on the path where
   IP is processed, by subtracting negative markings from positive
   ones.

907	   In the case of encryption, as long as the tunnel issues described in
908	   Section 5.6 are dealt with, payload encryption itself will not be a
909	   problem.  The design goal of re-ECN is to include downstream
   congestion in the IP header so that it is not necessary to inspect
   inner headers.  Obfuscation of flow identifiers is not a problem for
   re-ECN policing elements.  Re-ECN never requires flow identifiers to
   be valid; it only requires them to be unique.  So if
914	   an IPSec encapsulating security payload (ESP [RFC4835]) or an
915	   authentication header (AH [RFC4302]) is used, the security parameters
916	   index (SPI) will be a sufficient flow identifier, as it is intended
917	   to be unique to a flow without revealing actual port numbers.

919	   In general, even if endpoints use some locally agreed scheme to hide
920	   port numbers, re-ECN policing elements can just consider the pair of
921	   source and destination IP addresses as the flow identifier.  Re-ECN
922	   encourages endpoints to at least tell the network layer that a
923	   sequence of packets are all part of the same flow, if indeed they
924	   are.  The alternative would be for the sender to make each packet
925	   appear to be a new flow, which would require them all to be marked
926	   FNE in order to avoid being treated with the bulk of malicious flows
927	   at the egress dropper.  Given the FNE marking is worth +1 and
928	   networks are likely to rate limit FNE packets, endpoints are given an
929	   incentive not to set FNE on each packet.  But if the sender really
930	   does want to hide the flow relationship between packets it can choose
931	   to pay the cost of multiple FNE packets, which in the long run will
932	   compensate for the extra memory required on network policing elements
933	   to process each flow.
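   A policing element might pick its flow identifier roughly as
   sketched below.  This is a hypothetical illustration, not a
   specified algorithm.  It uses the SPI when the payload is protected
   by ESP or AH, otherwise the usual address-and-port tuple when ports
   are visible, and otherwise falls back to the source and destination
   address pair.  The IP protocol numbers 50 (ESP) and 51 (AH) are the
   standard IANA assignments.

```python
from typing import Optional, Tuple

ESP, AH = 50, 51  # IP protocol numbers


def flow_key(src: str, dst: str, proto: int,
             sport: Optional[int] = None, dport: Optional[int] = None,
             spi: Optional[int] = None) -> Tuple:
    """Choose a flow identifier from whatever the IP layer exposes.

    Re-ECN only needs the key to be unique to a flow, not meaningful.
    """
    if proto in (ESP, AH) and spi is not None:
        return (src, dst, proto, spi)           # SPI stands in for hidden ports
    if sport is not None and dport is not None:
        return (src, dst, proto, sport, dport)  # ordinary 5-tuple
    return (src, dst)                           # coarsest fallback: address pair
```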

935	6.  Transport Layers

937	6.1.  TCP

939	   Re-ECN capability at the sender is essential.  At the receiver it is
940	   optional, as long as the receiver has a basic RFC3168-compliant ECN-
941	   capable transport (ECT) [RFC3168].  Given re-ECN is not the first
942	   attempt to define the semantics of the ECN field, we give a table
943	   below summarising what happens for various combinations of
944	   capabilities of the sender S and receiver R, as indicated in the
   first four columns below.  The last column gives the mode a
   half-connection should be in after the first two segments of the
   TCP three-way handshake.

949	   +--------+--------------+------------+---------+--------------------+
950	   | Re-ECT |   ECT-Nonce  |     ECT    | Not-ECT |         S-R        |
951	   |        |   (RFC3540)  |  (RFC3168) |         |   Half-connection  |
952	   |        |              |            |         |        Mode        |
953	   +--------+--------------+------------+---------+--------------------+
954	   |   SR   |              |            |         |        RECN        |
955	   |    S   |       R      |            |         |       RECN-Co      |
956	   |    S   |              |      R     |         |       RECN-Co      |
957	   |    S   |              |            |    R    |       Not-ECT      |
958	   +--------+--------------+------------+---------+--------------------+

960	       Table 4: Modes of TCP Half-connection for Combinations of ECN
961	                  Capabilities of Sender S and Receiver R

963	   We will describe what happens in each mode, then describe how they
964	   are negotiated.  The abbreviations for the modes in the above table
965	   mean:

967	   RECN:  Full re-ECN capable transport

969	   RECN-Co:  Re-ECN sender in compatibility mode with a RFC3168
970	      compliant [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable
971	      receiver.  Implementation of this mode is OPTIONAL.

973	   Not-ECT:  Not ECN-capable transport, as defined in [RFC3168] for when
974	      at least one of the transports does not understand even basic ECN
975	      marking.

977	   Note that we use the term Re-ECT for a host transport that is re-ECN-
978	   capable but RECN for the modes of the half connections between hosts
979	   when they are both Re-ECT.  If a host transport is Re-ECT, this fact
980	   alone does NOT imply either of its half connections will necessarily
981	   be in RECN mode, at least not until it has confirmed that the other
982	   host is Re-ECT.
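   Table 4 can be read as a small function of the sender's and the
   receiver's capabilities.  The sketch below uses hypothetical names
   and covers only the rows given in the table, all of which have a
   Re-ECT sender.  How the capabilities are learnt during the handshake
   is left to the negotiation rules and is not modelled here.

```python
from enum import Enum


class Cap(Enum):
    """ECN capability of a host transport (column headings of Table 4)."""
    RE_ECT = "Re-ECT"
    ECT_NONCE = "ECT-Nonce (RFC3540)"
    ECT = "ECT (RFC3168)"
    NOT_ECT = "Not-ECT"


def half_connection_mode(sender: Cap, receiver: Cap) -> str:
    """Mode of the S-R half-connection for a Re-ECT sender, per Table 4."""
    if sender is not Cap.RE_ECT:
        raise ValueError("Table 4 only covers a Re-ECT sender")
    if receiver is Cap.RE_ECT:
        return "RECN"
    if receiver in (Cap.ECT_NONCE, Cap.ECT):
        return "RECN-Co"  # compatibility mode; implementing it is OPTIONAL
    return "Not-ECT"
```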

984	6.1.1.  RECN mode: Full Re-ECN capable transport

   In full RECN mode, for each half connection, the sender and the
   receiver each maintain an unsigned integer counter we will call ECC
   (echo congestion counter).  The receiver maintains a count of how
   many times a CE marked packet has arrived during the half-connection.
   Once a RECN connection is established, the three TCP header flags
   (ECE, CWR & NS) used for ECN-related functions in other versions of
   ECN are used as a 3-bit field for the receiver to repeatedly tell the
   sender the current value of ECC, modulo 8, whenever it sends a TCP
   ACK.  We will call this the echo congestion increment (ECI) field.
   This overloaded use of these 3 flags as one 3-bit ECI field is shown
   in Figure 7.  The actual definition of the TCP header, including the
   addition of support for the ECN nonce, is shown for comparison in
   Figure 6.  This specification does not redefine the names of these
   three TCP header flags; it merely overloads them with another
   definition once a flow is established.

1002	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1003	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1004	      |               |           | N | C | E | U | A | P | R | S | F |
1005	      | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
1006	      |               |           |   | R | E | G | K | H | T | N | N |
1007	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

1009	    Figure 6: The (post-ECN Nonce) definition of bytes 13 and 14 of the
1010	                                TCP Header

1012	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1013	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1014	      |               |           |           | U | A | P | R | S | F |
1015	      | Header Length | Reserved  |    ECI    | R | C | S | S | Y | I |
1016	      |               |           |           | G | K | H | T | N | N |
1017	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

1019	    Figure 7: Definition of the ECI field within bytes 13 and 14 of the
1020	   TCP Header, overloading the current definitions above for established
1021	                                RECN flows.
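   As a sketch of the bit layout in Figures 6 and 7, the helpers below
   read and write the ECI field in a raw TCP header held in a
   `bytearray`.  Bytes 13 and 14 (counting from 1) are offsets 12 and
   13.  Bits 7, 8 and 9 of that 16-bit word are NS, CWR and ECE, with
   ECE as the least significant bit of ECI.  The function names are
   hypothetical, and a real stack would also have to update the TCP
   checksum.

```python
ECI_SHIFT = 6                 # bit 9 (ECE) is 6 places above the word's LSB
ECI_MASK = 0x7 << ECI_SHIFT   # 0x01C0 covers NS, CWR and ECE


def get_eci(tcp_hdr: bytes) -> int:
    """Read the 3-bit ECI field from bytes 13 and 14 of the TCP header."""
    word = int.from_bytes(tcp_hdr[12:14], "big")
    return (word & ECI_MASK) >> ECI_SHIFT


def set_eci(tcp_hdr: bytearray, eci: int) -> None:
    """Write ECI (taken modulo 8) without disturbing the other flags."""
    word = int.from_bytes(tcp_hdr[12:14], "big")
    word = (word & ~ECI_MASK & 0xFFFF) | ((eci % 8) << ECI_SHIFT)
    tcp_hdr[12:14] = word.to_bytes(2, "big")
```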

1023	   Receiver Action in RECN Mode

1025	      Every time a CE marked packet arrives at a receiver in RECN mode,
1026	      the receiver transport increments its local value of ECC and MUST
1027	      echo its value, modulo 8, to the sender in the ECI field of the
1028	      next ACK.  It MUST repeat the same value of ECI in every
1029	      subsequent ACK until the next CE event, when it increments ECI
1030	      again.

1032	      The increment of the local ECC values is modulo 8 so the field
1033	      value simply wraps round back to zero when it overflows.  The
1034	      least significant bit is to the right (labelled bit 9).

1036	      A receiver in RECN mode MAY delay the echo of a CE to the next
1037	      delayed-ACK, which would be necessary if ACK-withholding were
1038	      implemented.
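   The receiver rules above amount to a counter that wraps modulo 8.  A
   minimal sketch follows.  The class and method names are
   hypothetical, and the sketch ignores delayed ACKs, which would simply
   echo the same prevailing value later.

```python
class RecnReceiver:
    """Receiver half of one RECN half-connection (illustrative sketch)."""

    def __init__(self) -> None:
        self.ecc = 0  # echo congestion counter, kept modulo 8

    def on_data_packet(self, ce_marked: bool) -> None:
        # Count each CE marked arrival; the value wraps from 7 back to 0.
        if ce_marked:
            self.ecc = (self.ecc + 1) % 8

    def eci_for_ack(self) -> int:
        # Every ACK repeats the prevailing ECC value until the next CE arrival.
        return self.ecc
```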

1040	   Sender Action in RECN Mode

1042	      On the arrival of every ACK, the sender compares the ECI field
1043	      with its own ECC value, then replaces its local value with that
1044	      from the ACK.  The difference D (D = (ECI + 8 - ECC mod 8) mod 8)
1045	      is assumed to be the number of CE marked packets that arrived at
1046	      the receiver since it sent the previously received ACK (but see
1047	      below for the sender's safety strategy).  Whenever the ECI field
1048	      increments by D (and/or d drops are detected), the sender MUST
1049	      clear the RE flag to "0" in the IP header of the next D' data
1050	      packets it sends (where D' = D + d), effectively re-echoing each
1051	      single increment of ECI.  Otherwise the data sender MUST send all
1052	      data packets with RE set to "1".
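   The sender's bookkeeping can be sketched as follows.  The sketch
   assumes a hypothetical `RecnSender` class that is told about each
   arriving ECI value and about any drops it detects.  It implements
   only the basic rule above; the sender's safety strategy referred to
   in the text is not modelled.  Each pending re-echo is paid off by one
   data packet sent with RE cleared, and all other established-flow
   data packets are sent as RECT with the ECN field set to ECT(1).

```python
from typing import Tuple


class RecnSender:
    """Sender half of one RECN half-connection (illustrative sketch)."""

    def __init__(self) -> None:
        self.ecc = 0               # last ECI value received
        self.pending_re_echoes = 0  # D' = D + d packets still to be sent with RE = 0

    def on_ack(self, eci: int, drops_detected: int = 0) -> int:
        """Process one ACK; return D, the number of new CE marks it reports."""
        d_ce = (eci + 8 - self.ecc % 8) % 8
        self.ecc = eci
        self.pending_re_echoes += d_ce + drops_detected
        return d_ce

    def next_data_packet_bits(self) -> Tuple[int, int]:
        """Return (ECN field, RE flag) for the next established-flow data packet."""
        if self.pending_re_echoes > 0:
            self.pending_re_echoes -= 1
            return (0b01, 0)  # ECT(1) with RE blanked: Re-Echo
        return (0b01, 1)      # ECT(1) with RE set: RECT
```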

1054	      As a general rule, once a flow is established, as well as setting
1055	      or clearing the RE flag as above, a data sender in RECN mode MUST
1056	      always set the ECN field to ECT(1).  However, the settings of the
1057	      extended ECN field during flow start are defined in Section 6.1.4.

1059	      As we have already emphasised, the re-ECN protocol makes no
1060	      changes and has no effect on the TCP congestion control algorithm.
1061	      So, the first increment of ECI (or detection of a drop) in an RTT
1062	      triggers the standard TCP congestion response, no more than one
1063	      congestion response per round trip, as usual.  However, the sender
1064	      re-echoes every increment of ECI irrespective of RTTs.

1066	      A TCP sender also acts as the receiver for the other half-
1067	      connection.  The host will maintain two ECC values S.ECC and R.ECC
1068	      as sender and receiver respectively.  Every TCP header sent by a
1069	      host in RECN mode will also repeat the prevailing value of R.ECC
1070	      in its ECI field.  If a sender in RECN mode has to retransmit a
1071	      packet due to a suspected loss, the re-transmitted packet MUST
1072	      carry the latest prevailing value of R.ECC when it is re-
1073	      transmitted, which will not necessarily be the one it carried
1074	      originally.
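      Because each host plays both roles, it keeps S.ECC and R.ECC side
      by side.  The Python sketch below uses hypothetical names to
      illustrate the point about retransmissions: the ECI value is
      stamped when a segment is (re)sent, so a retransmitted copy picks
      up the current R.ECC rather than the value it originally carried.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    seq: int
    payload: bytes
    eci: int = 0  # ECI carried in the TCP header


class RecnHost:
    """Hypothetical per-connection state for a host in RECN mode."""

    def __init__(self) -> None:
        self.s_ecc = 0  # as sender: last ECI seen on incoming ACKs
        self.r_ecc = 0  # as receiver: CE marks received so far, modulo 8

    def on_ce_marked_arrival(self) -> None:
        self.r_ecc = (self.r_ecc + 1) % 8

    def on_incoming_eci(self, eci: int) -> None:
        self.s_ecc = eci

    def transmit(self, seg: Segment) -> Segment:
        # Every header repeats the prevailing R.ECC, including retransmissions,
        # so the copy on the wire may differ from the original transmission.
        return replace(seg, eci=self.r_ecc)
```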

1076	6.1.2.  RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN
1077	        Receiver

1079	   If the half-connection is in RECN-Co mode, ECN feedback proceeds no
1080	   differently to that of RFC3168 compliant ECN.  In other words, the
1081	   receiver sets the ECE flag repeatedly in the TCP header and the
1082	   sender responds by setting the CWR flag.  Although RECN-Co mode is
1083	   used when the receiver has not implemented the re-ECN protocol, the
1084	   sender can infer enough from its RFC3168 compliant ECN feedback to
1085	   set or clear the RE flag reasonably well.  Specifically, every time
1086	   the receiver toggles the ECE field from "0" to "1" (or a loss is
1087	   detected), as well as setting CWR in the TCP flags, the re-ECN sender
1088	   MUST blank the RE flag of the next packet to "0" as it would do in
1089	   full RECN mode.  Otherwise, the data sender SHOULD send all other
1090	   packets with RE set to "1".  Once a flow is established, a re-ECN
1091	   data sender in RECN-Co mode MUST always set the ECN field to ECT(1).
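   In RECN-Co mode the sender only sees RFC3168's single ECE flag, so it
   has to infer each congestion event from a "0" to "1" transition of
   ECE.  A Python sketch with hypothetical names follows.  Setting CWR
   is left to the ordinary RFC3168 code and is not shown.

```python
class RecnCoSender:
    """Hypothetical RE-flag logic for a sending half-connection in RECN-Co mode."""

    def __init__(self) -> None:
        self.prev_ece = 0
        self.re_blanks_due = 0

    def on_ack(self, ece: int, loss_detected: bool = False) -> None:
        # A new congestion event is a rising edge of ECE (or a detected loss).
        rising_edge = ece == 1 and self.prev_ece == 0
        self.prev_ece = ece
        if rising_edge:
            self.re_blanks_due += 1
        if loss_detected:
            self.re_blanks_due += 1

    def next_re_flag(self) -> int:
        """RE=0 on the packet following each event; otherwise RE=1."""
        if self.re_blanks_due > 0:
            self.re_blanks_due -= 1
            return 0
        return 1
```

   A second CE mark arriving while the receiver is still echoing ECE=1
   produces no new edge, which is the under-counting discussed in the
   following paragraph.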

1093	   If a CE marked packet arrives at the receiver within a round trip
1094	   time of a previous mark, the receiver will still be echoing ECE for
1095	   the last CE mark.  Therefore, such a mark will be missed by the
1096	   sender.  Of course, this isn't of concern for congestion control, but
1097	   it does mean that very occasionally the RE blanking fraction will be
1098	   understated.  Therefore flows in RECN-Co mode may occasionally be
1099	   mistaken for very lightly cheating flows and consequently might
1100	   suffer a small number of packet drops through an egress dropper.  We
1101	   expect re-ECN would be deployed for some time before policers and
1102	   droppers start to enforce it.  So, given there is not much ECN
1103	   deployment yet anyway, this minor problem may affect only a very
1104	   small proportion of flows, reducing to nothing over the years as
1105	   RFC3168 compliant ECN hosts upgrade.  The use of RECN-Co mode would
1106	   need to be reviewed in the light of experience at the time of re-ECN
1107	   deployment.

1109	   RECN-Co mode is OPTIONAL.  Re-ECN implementers who want to keep their
1110	   code simple MAY choose not to implement this mode.  If they do not,
1111	   a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the
1112	   presence of an ECN-capable receiver.  It MAY choose to fall back to
1113	   the ECT-Nonce mode, but if re-ECN implementers don't want to be
1114	   bothered with RECN-Co mode, they probably won't want to add an ECT-
1115	   Nonce mode either.

1117	6.1.2.1.  Re-ECN support for the ECN Nonce

1119	   A TCP half-connection in RECN-Co mode MUST NOT support the ECN
1120	   Nonce [RFC3540].  This means that the sending code of a re-ECN
1121	   implementation will never need to include ECN Nonce support.  Re-ECN
1122	   is intended to provide wider protection than the ECN nonce against
1123	   congestion control misbehaviour, and re-ECN only requires support
1124	   from the sender, therefore it is preferable to specifically rule out
1125	   the need for dual sender implementations.  As a consequence, a re-ECN
1126	   capable sender will never set ECT(0), so it will be easier for
1127	   network elements to discriminate re-ECN traffic flows from other ECN
1128	   traffic, which will always contain some ECT(0) packets.

1130	   However, a re-ECN implementation MAY include receiving
1131	   code that complies with the ECN Nonce protocol when interacting with
1132	   a sender that supports the ECN nonce (rather than re-ECN), but this
1133	   support is not required.

1135	   RFC3540 allows an ECN nonce sender to choose whether to sanction a
1136	   receiver that does not ever set the nonce sum.  Given re-ECN is
1137	   intended to provide wider protection than the ECN nonce against
1138	   congestion control misbehaviour, implementers of re-ECN receivers MAY
1139	   choose not to implement backwards compatibility with the ECN nonce
1140	   capability.  This may be because they deem that the risk of sanctions
1141	   is low, perhaps because significant deployment of the ECN nonce seems
1142	   unlikely at implementation time.

1144	6.1.3.  Capability Negotiation

1146	   During the TCP hand-shake at the start of a connection, an originator
1147	   of the connection (host A) with a re-ECN-capable transport MUST
1148	   indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1
1149	   in the initial SYN.

1151	   A responding Re-ECT host (host B) MUST return a SYN ACK with flags
1152	   CWR=1 and ECE=0.  The responding host MUST NOT set this combination
1153	   of flags unless the preceding SYN has already indicated Re-ECT
1154	   support as above.  Normally a Re-ECT server (B) will reply to a Re-
1155	   ECT client with NS=0, but if the initial SYN from Re-ECT client A is
1156	   marked CE(-1), a Re-ECT server B MUST increment its local value of
1157	   ECC.  But B cannot reflect the value of ECC in the SYN ACK, because
1158	   it is still using the 3 bits to negotiate connection capabilities.
1159	   So, server B MUST set the alternative TCP header flags in its SYN
1160	   ACK: NS=1, CWR=1 and ECE=0.
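   The responder's side of this handshake can be sketched in Python as
   below.  The TcpEcnFlags tuple and the function name are hypothetical.
   The non-Re-ECT cases simply follow the SYN ACK column of Table 5, and
   the returned ECC is 0 whenever B's receiving half will not be in RECN
   mode.

```python
from typing import NamedTuple, Tuple


class TcpEcnFlags(NamedTuple):
    ns: int
    cwr: int
    ece: int


RE_ECT_SYN = TcpEcnFlags(ns=1, cwr=1, ece=1)


def re_ect_syn_ack(syn: TcpEcnFlags, syn_ce_marked: bool) -> Tuple[TcpEcnFlags, int]:
    """Return (SYN ACK flags, initial ECC) for a Re-ECT responder B."""
    if syn == RE_ECT_SYN:
        # The 3 bits are still negotiating, so ECC cannot be echoed as ECI;
        # NS=1 alone signals that the SYN arrived marked CE(-1).
        ecc = 1 if syn_ce_marked else 0
        return TcpEcnFlags(ns=ecc, cwr=1, ece=0), ecc
    if syn.cwr == 1 and syn.ece == 1:
        # RFC3168 or ECN-nonce originator: answer as an RFC3168 responder.
        return TcpEcnFlags(ns=0, cwr=0, ece=1), 0
    # Not-ECT originator.
    return TcpEcnFlags(ns=0, cwr=0, ece=0), 0
```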

1162	   These handshakes are summarised in Table 5 below, with X indicating
1163	   NS can be either 0 or 1 depending on whether congestion had been
1164	   experienced.  The handshakes used for the other flavours of ECN are
1165	   also shown for comparison.  To compress the width of the table, the
1166	   headings of the first four columns have been severely abbreviated, as
1167	   follows:

1169	      R: *R*e-ECT

1171	      N: ECT-*N*once (RFC3540)

1173	      E: *E*CT (RFC3168)

1175	      I: Not-ECT (*I*mplicit congestion notification).

1177	   These correspond with the same headings used in Table 4.  Indeed, the
1178	   resulting modes in the last two columns of the table below are a more
1179	   comprehensive way of saying the same thing as Table 4.

1181	   +----+---+---+---+------------+-------------+-----------+-----------+
1182	   | R  | N | E | I |   SYN A-B  | SYN ACK B-A |  A-B Mode |  B-A Mode |
1183	   +----+---+---+---+------------+-------------+-----------+-----------+
1184	   |    |   |   |   | NS CWR ECE |  NS CWR ECE |           |           |
1185	   | AB |   |   |   |  1   1   1 |  X   1   0  |    RECN   |    RECN   |
1186	   | A  | B |   |   |  1   1   1 |  1   0   1  |  RECN-Co  | ECT-Nonce |
1187	   | A  |   | B |   |  1   1   1 |  0   0   1  |  RECN-Co  |    ECT    |
1188	   | A  |   |   | B |  1   1   1 |  0   0   0  |  Not-ECT  |  Not-ECT  |
1189	   | B  | A |   |   |  0   1   1 |  0   0   1  | ECT-Nonce |  RECN-Co  |
1190	   | B  |   | A |   |  0   1   1 |  0   0   1  |    ECT    |  RECN-Co  |
1191	   | B  |   |   | A |  0   0   0 |  0   0   0  |  Not-ECT  |  Not-ECT  |
1192	   +----+---+---+---+------------+-------------+-----------+-----------+

1194	      Table 5: TCP Capability Negotiation between Originator (A) and
1195	                               Responder (B)

1197	   As soon as a re-ECN capable TCP server receives a SYN, it MUST set
1198	   its two half-connections into the modes given in Table 5.  As soon as
1199	   a re-ECN capable TCP client receives a SYN ACK, it MUST set its two
1200	   half-connections into the modes given in Table 5.  The half-
1201	   connections will remain in these modes for the rest of the
1202	   connection, including for the third segment of TCP's three-way hand-
1203	   shake (the ACK).
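   From client A's point of view, Table 5 reduces to a lookup on the SYN
   ACK flags.  The Python sketch below assumes A is Re-ECT and sent
   NS=1, CWR=1, ECE=1.  Treating any combination not listed in Table 5
   as Not-ECT is our assumption, although it also covers the case of a
   responder that reflects the SYN's flags, described in the next
   paragraph.

```python
from typing import NamedTuple, Tuple


class TcpEcnFlags(NamedTuple):
    ns: int
    cwr: int
    ece: int


def re_ect_client_modes(syn_ack: TcpEcnFlags) -> Tuple[str, str]:
    """(A-B mode, B-A mode) for Re-ECT originator A, following Table 5."""
    if (syn_ack.cwr, syn_ack.ece) == (1, 0):
        return "RECN", "RECN"  # Re-ECT responder; NS (X) may be 0 or 1
    if syn_ack == (1, 0, 1):
        return "RECN-Co", "ECT-Nonce"
    if syn_ack == (0, 0, 1):
        return "RECN-Co", "ECT"
    # Not-ECT responder, a broken responder that reflected the SYN's
    # flags (1, 1, 1), or any combination Table 5 does not list.
    return "Not-ECT", "Not-ECT"
```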

1205	   {ToDo: Consider RSTs within a connection.}

1207	   Recall that, if the SYN ACK reflects the same flag settings as the
1208	   preceding SYN (because there is a broken RFC3168 compliant
1209	   implementation that behaves this way), RFC3168 specifies that the
1210	   whole connection MUST revert to Not-ECT.

1212	   Also note that, whenever the SYN flag of a TCP segment is set
1213	   (including when the ACK flag is also set), the NS, CWR and ECE flags
1214	   (i.e. the ECI field of a SYN or SYN ACK) MUST NOT be interpreted
1215	   as the 3-bit ECI value, which is only set as a copy of the local ECC
1216	   value in non-SYN packets.

1218	6.1.4.  Extended ECN (EECN) Field Settings during Flow Start or after
1219	        Idle Periods

1221	   If the originator (A) of a TCP connection supports re-ECN it MUST set
1222	   the extended ECN (EECN) field in the IP header of the initial SYN
1223	   packet to the feedback not established (FNE) codepoint.

1225	   FNE is a new extended ECN codepoint defined by this specification
1226	   (Section 4.2).  The feedback not established (FNE) codepoint is used
1227	   when the transport does not have the benefit of ECN feedback so it
1228	   cannot decide whether to set or clear the RE flag.

1230	   If after receiving a SYN the server B has set its sending half-
1231	   connection into RECN mode or RECN-Co mode, it MUST set the extended
1232	   ECN field in the IP header of its SYN ACK to the feedback not
1233	   established (FNE) codepoint.  Note the careful wording here, which
1234	   means that Re-ECT server B MUST set FNE on a SYN ACK whether it is
1235	   responding to a SYN from a Re-ECT client or from a client that is
1236	   merely ECN-capable.  This is because FNE indicates the transport is
1237	   ECN capable.

1239	   The original ECN specification [RFC3168] required SYNs and SYN ACKs
1240	   to use the Not-ECT codepoint of the ECN field.  The aim was to
1241	   prevent well-known DoS attacks, such as SYN flooding, from gaining
1242	   the advantage over drop that ECN capability affords at ECN-capable
1243	   routers.

1245	   For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this
1246	   caution was unnecessary, and proposes to allow a SYN ACK to be ECN-
1247	   capable to improve performance.  By stipulating the FNE codepoint for
1248	   the initial SYN, we comply with RFC3168 in word but not in spirit,
1249	   because we have indeed set the ECN field to Not-ECT, but we have
1250	   extended the ECN field with another bit.  And it will be seen
1251	   (Section 5.3) that we have defined one setting of that bit to mean an
1252	   ECN-capable transport.  Therefore, by proposing that the FNE
1253	   codepoint MUST be used on the initial SYN of a connection, we have
1254	   gone further by proposing to make the initial SYN ECN-capable too.
1255	   Section 5.4 justifies deciding to make the initial SYN ECN-capable.

1257	   Once a TCP half-connection is in RECN mode or RECN-Co mode, FNE will
1258	   have already been set on the initial SYN and possibly the SYN ACK as
1259	   above.  But each re-ECN sender will have to set FNE cautiously on a
1260	   few data packets as well, given a number of packets will usually have
1261	   to be sent before sufficient congestion feedback is received.  The
1262	   behaviour will be different depending on the mode of the half-
1263	   connection:

1265	   RECN mode:  Given the constraints on TCP's initial window [RFC3390]
1266	      and its exponential window increase during slow start
1267	      phase [RFC2581], it turns out that the sender SHOULD set FNE on
1268	      the first and third data packets in its flow after the initial
1269	      3-way handshake, assuming equal sized data packets once a flow is
1270	      established.  Appendix D presents the calculation that led to this
1271	      conclusion.  Below, after running through the start of an example
1272	      TCP session, we give the intuition learned from that calculation.

1274	   RECN-Co mode:  A re-ECT sender that switches into re-ECN
1275	      compatibility mode or into Not-ECT mode (because it has detected
1276	      the corresponding host is not re-ECN capable) MUST limit its
1277	      initial window to 1 segment.  The reasoning behind this constraint
1278	      is given in Section 5.4.  Having set this initial window, a re-ECN
1279	      sender in RECN-Co mode SHOULD set FNE on the first and third data
1280	      packets in a flow, as for RECN mode.
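   The flow-start rule for data packets can be illustrated with a small
   Python function.  It is a sketch under stated assumptions: packet
   indices count data packets sent after the three-way handshake, the
   string labels are our own shorthand ("ECT(1),RE=0" for a packet whose
   RE flag has been blanked), and the RECN-Co initial window limit is
   not modelled.

```python
from typing import Iterable, List

# Shorthand labels (ours): FNE and RECT are the extended ECN codepoints named
# in the text; "ECT(1),RE=0" denotes a packet whose RE flag has been blanked.
FNE, RECT, RE_BLANKED = "FNE", "RECT", "ECT(1),RE=0"


def flow_start_eecn(n_data_packets: int, blank_packets: Iterable[int] = ()) -> List[str]:
    """EECN setting for data packets 1..n sent after the three-way handshake."""
    blanks = set(blank_packets)
    settings = []
    for i in range(1, n_data_packets + 1):
        if i in (1, 3):
            settings.append(FNE)  # the SHOULD for the first and third packets
        elif i in blanks:
            settings.append(RE_BLANKED)  # re-echo feedback that has arrived
        else:
            settings.append(RECT)
    return settings
```

   flow_start_eecn(4) reproduces the EECN settings of server B's first
   four data packets in Table 6 (lines 5, 6, 7 and 9).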

1282	   +----+------+----------------+-------+-------+---------------+------+
1283	   |    | Data | TCP A(Re-ECT)  | IP A  | IP B  | TCP B(Re-ECT) | Data |
1284	   +----+------+----------------+-------+-------+---------------+------+
1285	   |    | Byte |  SEQ  ACK CTL  | EECN  | EECN  |  SEQ  ACK CTL | Byte |
1286	   | -- | ---- | -------------  | ----- | ----- | ------------- | ---- |
1287	   |  1 |      | 0100      SYN  | FNE   | -->   |      R.ECC=0  |      |
1288	   |    |      |    CWR,ECE,NS  |       |       |               |      |
1289	   |  2 |      |      R.ECC=0   | <--   | FNE   | 0300 0101     |      |
1290	   |    |      |                |       |       |   SYN,ACK,CWR |      |
1291	   |  3 |      | 0101 0301 ACK  | RECT  | -->   |      R.ECC=0  |      |
1292	   |  4 | 1000 | 0101 0301 ACK  | FNE   | -->   |      R.ECC=0  |      |
1293	   |  5 |      |      R.ECC=0   | <--   | FNE   | 0301 1102 ACK | 1460 |
1294	   |  6 |      |      R.ECC=0   | <--   | RECT  | 1762 1102 ACK | 1460 |
1295	   |  7 |      |      R.ECC=0   | <--   | FNE   | 3222 1102 ACK | 1460 |
1296	   |  8 |      | 1102 1762 ACK  | RECT  | -->   |      R.ECC=0  |      |
1297	   |  9 |      |      R.ECC=0   | <--   | RECT  | 4682 1102 ACK | 1460 |
1298	   | 10 |      |      R.ECC=0   | <--   | RECT  | 6142 1102 ACK | 1460 |
1299	   | 11 |      | 1102 3222 ACK  | RECT  | -->   |      R.ECC=0  |      |
1300	   | 12 |      |      R.ECC=0   | <--   | RECT  | 7602 1102 ACK | 1460 |
1301	   | 13 |      |      R.ECC=1   | <*-   | RECT  | 9062 1102 ACK | 1460 |
1302	   |    |      | ...            |       |       |               |      |
1303	   +----+------+----------------+-------+-------+---------------+------+

1305	                      Table 6: TCP Session Example #1

1307	   Table 6 shows an example TCP session, where the server B sets FNE on
1308	   its first and third data packets (lines 5 & 7) as well as on the
1309	   initial SYN ACK as previously described.  The left hand half of the
1310	   table shows the relevant settings of headers sent by client A in
1311	   three layers: the TCP payload size; TCP settings; then IP settings.
1312	   The right hand half gives equivalent columns for server B. The only
1313	   TCP settings shown are the sequence number (SEQ), acknowledgement
1314	   number (ACK) and the relevant control (CTL) flags that each host
1315	   sets in the TCP header.  The IP columns show the setting of the
1316	   extended ECN (EECN) field.

1318	   Also shown on the receiving side of the table is the value of the
1319	   receiver's echo congestion counter (R.ECC) after processing the
1320	   incoming EECN header.  Note that, once a host sets a half-connection
1321	   into RECN mode, it MUST initialise its local value of ECC to zero.

1323	   The intuition that Appendix D gives for why a sender should set FNE
1324	   on the first and third data packets is as follows.  At line 13, a
1325	   packet sent by B is shown with an '*', which means it has been
1326	   congestion marked by an intermediate queue from RECT to CE(-1).  On
1327	   receiving this CE marked packet, client A increments its ECC counter
1328	   to 1 as shown.  This was the 7th data packet B sent, but before
1329	   feedback about this event returns to B, it might well have sent many
1330	   more packets.  Indeed, during exponential slow start, about as many
1331	   packets will be in flight (unacknowledged) as have been acknowledged.
1332	   So, when the feedback from the congestion event on B's 7th segment
1333	   returns, B will have sent about 7 further packets that will still be
1334	   in flight.  At that stage, B's best estimate of the network's packet
1335	   marking fraction will be 1/7.  So, as B will have sent about 14
1336	   packets, it should have already marked 2 of them as FNE in order to
1337	   have marked 1/7; hence the need to have set the first and third data
1338	   packets to FNE.

1340	   Client A's behaviour in Table 6 also shows FNE being set on the first
1341	   SYN and the first data packet (lines 1 & 4), but in this case it
1342	   sends no more data packets, so of course, it cannot, and does not
1343	   need to, set FNE again.  Note that in the A-B direction there is no
1344	   need to set FNE on the third part of the three-way hand-shake (line
1345	   3, the ACK).

1347	   Note that in this section we have used the word SHOULD rather than
1348	   MUST when specifying how to set FNE on data segments before positive
1349	   congestion feedback arrives (but note that the word MUST was used for
1350	   FNE on the SYN and SYN ACK).  FNE is only RECOMMENDED for the first
1351	   and third data segments to entertain the possibility that the TCP
1352	   transport has the benefit of other knowledge of the path, which it
1353	   re-uses from one flow for the benefit of a newly starting flow.  For
1354	   instance, one flow can re-use knowledge of other flows between the
1355	   same hosts if using a Congestion Manager [RFC3124] or when a proxy
1356	   host aggregates congestion information for large numbers of flows.

1358	   After an idle period of more than 1 second, a re-ECN sender transport
1359	   MUST set the EECN field of the packet that resumes the connection to
1360	   FNE.  Note that this next packet may be sent a very long time later;
1361	   a packet does NOT have to be sent after 1 second of idling.  In order
1362	   that the design of network policers can be deterministic, this
1363	   specification deliberately puts an absolute lower limit on how long a
1364	   connection can be idle before the packet that resumes the connection
1365	   must be set to FNE, rather than relating it to the connection round
1366	   trip time.  We use the lower bound of the retransmission timeout
1367	   (RTO) [RFC2988], which is commonly used as the idle period before TCP
1368	   must reduce to the restart window [RFC2581].  Note our specification
1369	   of re-ECN's idle period is NOT intended to change the idle period for
1370	   TCP's restart, nor indeed for any other purposes.
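   The idle rule uses a fixed 1 second threshold rather than an
   RTT-relative one, which makes it simple to check.  The sketch below
   is hypothetical Python.  It assumes idleness is measured from the
   last packet this sender transmitted on the half-connection, and it
   leaves the first packet of a flow to the flow-start rules above.

```python
from typing import Optional

IDLE_THRESHOLD_S = 1.0  # lower bound of TCP's RTO [RFC2988]


class IdleFneTracker:
    """Hypothetical helper deciding when a resuming packet must carry FNE."""

    def __init__(self) -> None:
        self.last_send: Optional[float] = None

    def must_set_fne(self, now: float) -> bool:
        """Call once per packet sent; True if it resumes after > 1 s of idling."""
        resumes_after_idle = (
            self.last_send is not None
            and now - self.last_send > IDLE_THRESHOLD_S
        )
        self.last_send = now
        return resumes_after_idle
```

   In practice, now would come from a monotonic clock such as Python's
   time.monotonic().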

1372	   {ToDo: Describe how the sender falls back to RFC3168 modes if packets
1373	   don't appear to be getting through (to work round firewalls
1374	   discarding packets they consider unusual).}

1376	6.1.5.  Pure ACKs, Retransmissions, Window Probes and Partial ACKs

1378	   A re-ECN sender MUST clear the RE flag to "0" and set the ECN field
1379	   to Not-ECT in pure ACKs, retransmissions and window probes, as
1380	   specified in [RFC3168].  Our eventual goal is for all packets to be
1381	   sent with re-ECN enabled, and we believe the semantics of the ECI
1382	   field go a long way towards being able to achieve this.  However, we
1383	   have not completed a full security analysis for these cases,
1384	   therefore, currently we merely re-state current practice.

1386	   We must also reconcile the fact that congestion marking is applied
1387	   to packets, whereas acknowledgements cover octet ranges and the
1388	   acknowledged octet boundaries need not match the transmitted ones.
1389	   The general principle we work to is to remain compatible with TCP's
1390	   congestion control, which is driven by congestion events at packet
1391	   granularity, while at the same time aiming to blank the RE flag on
1392	   at least as many octets in a flow as have been marked CE.

1394	   Therefore, a re-ECN TCP receiver MUST increment its ECC value as many
1395	   times as CE marked packets have been received.  And that value MUST
1396	   be echoed to the sender in the first available ACK using the ECI
1397	   field.  This ensures the TCP sender's congestion control receives
1398	   timely feedback on congestion events at the same packet granularity
1399	   that they were generated on congested queues.

1401	   Then, a re-ECN sender stores the difference D between its own ECC
1402	   value and the incoming ECI field by incrementing a counter R. Then, R
1403	   is decremented by 1 for each subsequent packet sent with the RE
1404	   flag blanked, until R is no longer positive.  Using this technique,
1405	   whenever a re-ECN transport sends a not re-ECN capable packet (e.g. a
1406	   retransmission), the remaining packets required to have the RE flag
1407	   blanked will be automatically carried over to subsequent packets,
1408	   through the variable R.
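   The counter R can be sketched as below, with hypothetical Python
   names.  Packets that are not re-ECN capable, such as retransmissions,
   go out as Not-ECT and do not reduce R, so the owed RE blanking simply
   carries over to the next re-ECN capable packet.

```python
ECC_MODULUS = 8


class ReEchoCounter:
    """Hypothetical implementation of the carry-over counter R."""

    def __init__(self) -> None:
        self.ecc = 0  # sender's copy of the last ECI received
        self.r = 0    # RE blanks still owed

    def on_ack(self, eci: int) -> None:
        self.r += (eci - self.ecc) % ECC_MODULUS  # difference D, with wrap-round
        self.ecc = eci

    def next_packet(self, re_ecn_capable: bool) -> str:
        if not re_ecn_capable:
            # Pure ACK, retransmission or window probe: Not-ECT with RE=0.
            # It does not discharge R, which carries over.
            return "Not-ECT"
        if self.r > 0:
            self.r -= 1
            return "ECT(1),RE=0"
        return "RECT"
```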

1410	   This does not ensure precisely the same number of octets have RE
1411	   blanked as were CE marked.  But we believe positive errors will
1412	   cancel negative over a long enough period. {ToDo: However, more
1413	   research is needed to prove whether this is so.  If it is not, it may
1414	   be necessary to increment and decrement R in octets rather than
1415	   packets, by incrementing R as the product of D and the size in octets
1416	   of packets being sent (typically the MSS).}

1418	6.2.  Other Transports

1420	6.2.1.  General Guidelines for Adding Re-ECN to Other Transports

1422	   As a general rule, Re-ECT sender transports that have established
1423	   that the receiver transport is at least ECN-capable (not necessarily
1424	   re-ECN capable) MUST blank the RE flag for at least as many octets
1425	   as arrive at the receiver with the CE codepoint set.  Re-ECN-capable
1426	   sender transports should always initialise the ECN field to the
1427	   ECT(1) codepoint once a flow is established.

1429	   If the sender transport does not have sufficient feedback to even
1430	   estimate the path's CE rate, it SHOULD set FNE continuously.  If the
1431	   sender transport has some, perhaps stale, feedback to estimate that
1432	   the path's CE rate is nearly definitely less than E%, the transport
1433	   MAY blank RE in packets for E% of sent octets, and set the RECT
1434	   codepoint for the remainder.
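   One way to read the E% rule is as a running octet budget.  The
   Python sketch below is an assumption, not a specified algorithm.  It
   blanks RE whenever the blanked share of sent octets would otherwise
   fall below E%, so the share never drops below E%.  It returns FNE on
   every packet when no estimate is available (e_percent is None).

```python
from typing import Optional


class ReBlankingBudget:
    """Hypothetical per-flow choice between RE-blanked, RECT and FNE."""

    def __init__(self, e_percent: Optional[int]) -> None:
        self.e_percent = e_percent  # None: no usable CE-rate estimate yet
        self.sent_octets = 0
        self.blanked_octets = 0

    def next_codepoint(self, size: int) -> str:
        if self.e_percent is None:
            return "FNE"  # no feedback at all: set FNE continuously
        self.sent_octets += size
        # Integer comparison of blanked/sent against E/100 avoids float rounding.
        if self.blanked_octets * 100 < self.e_percent * self.sent_octets:
            self.blanked_octets += size
            return "ECT(1),RE=0"
        return "RECT"
```

   With E = 10 and 1000-octet packets, the 1st and 11th packets are
   blanked and the rest are sent as RECT.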

1436	   The following sections give guidelines on how re-ECN support could be
1437	   added to RSVP or NSIS, to DCCP, and to SCTP - although separate
1438	   Internet drafts will be necessary to document the exact mechanics of
1439	   re-ECN in each of these protocols.

1441	   {ToDo: Give a brief outline of what would be expected for each of the
1442	   following:

1444	   o  UDP fire and forget (e.g.  DNS)

1446	   o  UDP streaming with no feedback

1448	   o  UDP streaming with feedback

1450	   }

1452	6.2.2.  Guidelines for adding Re-ECN to RSVP or NSIS

1454	   A separate I-D has been submitted [Re-PCN] describing how re-ECN can
1455	   be used in an edge-to-edge rather than end-to-end scenario.  It can
1456	   then be used by downstream networks to police whether upstream
1457	   networks are blocking new flow reservations when downstream
1458	   congestion is too high, even though the congestion is in other
1459	   operators' downstream networks.  This relates to current IETF work on
1460	   Admission Control over Diffserv using Pre-Congestion Notification
1461	   (PCN) [PCN-arch].

1463	6.2.3.  Guidelines for adding Re-ECN to DCCP

1465	   Besides adjusting the initial feature negotiation sequence, operating
1466	   re-ECN in DCCP [RFC4340] could be achieved by defining a new option
1467	   to be added to acknowledgments, that would include a multibit field
1468	   where the destination could copy its ECC.

1470	6.2.4.  Guidelines for adding Re-ECN to SCTP

1472	   Appendix A in [RFC4960] gives the specifications for SCTP to support
1473	   ECN.  Similar steps should be taken to support re-ECN.  Besides
1474	   adjusting the initial feature negotiation sequence, operating re-ECN
1475	   in SCTP could be achieved by defining a new control chunk that would
1476	   include a multibit field where the destination could copy its ECC.

1478	7.  Incremental Deployment

1480	   The design of the re-ECN protocol started from the fact that the
1481	   current ECN marking behaviour of queues was sufficient and that re-
1482	   feedback could be introduced around these queues by changing the
1483	   sender behaviour but not the routers.  Otherwise, if we had required
1484	   routers to be changed, the chance of encountering a path that had
1485	   every router upgraded would be vanishingly small during early
1486	   deployment, giving no incentive to start deployment.  Also, as there
1487	   is no new forwarding behaviour, routers and hosts do not have to
1488	   signal or negotiate anything.

1490	   However, networks that choose to protect themselves using re-ECN do
1491	   have to add new security functions at their trust boundaries with
1492	   others.  They distinguish legacy traffic by its ECN field.  Traffic
1493	   from Not-ECT transports is distinguishable by its Not-ECT marking.
1494	   Traffic from RFC3168 compliant ECN transports is distinguished from
1495	   re-ECN by which of ECT(0) or ECT(1) is used.  We chose to use ECT(1)
1496	   for re-ECN traffic deliberately.  Existing ECN sources set ECT(0) on
1497	   either 50% (the nonce) or 100% (the default) of packets, whereas re-
1498	   ECN does not use ECT(0) at all.  We can use this distinguishing
1499	   feature of RFC3168 compliant ECN traffic to separate it out for
1500	   different treatment at the various border security functions: egress
1501	   dropping, ingress policing and border policing.

1503	   The general principle we adopt is that an egress dropper will not
1504	   drop any legacy traffic, but ingress and border policers will limit
1505	   the bulk rate of legacy traffic (Not-ECT, ECT(0) and those marked
1506	   with the unused codepoint) that can enter each network.  Then, during
1507	   early re-ECN deployment, operators can set very permissive (or non-
1508	   existent) rate-limits on legacy traffic, but once re-ECN
1509	   implementations are generally available, legacy traffic can be rate-
1510	   limited increasingly harshly.  Ultimately, an operator might choose
1511	   to block all legacy traffic entering its network, or at least only
1512	   allow through a trickle.

1514	   Then, the more strictly the limits are set, the more RFC3168 ECN
1515	   sources will gain by upgrading to re-ECN.  Thus, towards the end of
1516	   the voluntary incremental deployment period, RFC3168 compliant
1517	   transports can be given progressively stronger encouragement to
1518	   upgrade.

1520	   The following list of minor changes brings together all the points
1521	   where re-ECN semantics for use of the two-bit ECN field are different
1522	   compared to RFC3168:

1524	   o  A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender
1525	      sets ECT(0) by default (Section 4.3);

1527	   o  No provision is necessary for a re-ECN capable source transport to
1528	      use the ECN nonce (Section 6.1.2.1);

1530	   o  Routers MAY preferentially drop different extended ECN codepoints
1531	      (Section 5.3);

1533	   o  Packets carrying the feedback not established (FNE) codepoint MAY
1534	      optionally be marked rather than dropped by routers, even though
1535	      their ECN field is Not-ECT (with the important caveat in
1536	      Section 5.3);

1538	   o  Packets may be dropped by policing nodes because of apparent
1539	      misbehaviour, not just because of congestion;

1541	   o  Tunnel entry behaviour is still to be defined, but may have to be
1542	      different from RFC3168 (Section 5.6).

1544	   None of these changes REQUIRE any modifications to routers.  Also
1545	   none of these changes affect anything about end to end congestion
1546	   control; they are all to do with allowing networks to police that end
1547	   to end congestion control is well-behaved.

1549	8.  Related Work

1551	8.1.  Congestion Notification Integrity

1553	   The choice of two ECT code-points in the ECN field [RFC3168]
1554	   permitted future flexibility, optionally allowing the sender to
1555	   encode the experimental ECN nonce [RFC3540] in the packet stream.
1556	   This mechanism has since been included in the specifications of DCCP
1557	   [RFC4340].

1559	   The ECN nonce is an elegant scheme that allows the sender to detect
1560	   if someone in the feedback loop - the receiver especially - tries to
1561	   claim no congestion was experienced when in fact congestion led to
1562	   packet drops or ECN marks.  For each packet it sends, the sender
1563	   chooses between the two ECT codepoints in a pseudo-random sequence.
1564	   Then, whenever the network marks a packet with CE, if the receiver
1565	   wants to deny congestion happened, she has to guess which ECT
1566	   codepoint was overwritten.  She has only a 50:50 chance of being
1567	   correct each time she denies a congestion mark or a drop, which
1568	   ultimately will give her away.

1570	   The purpose of a network-layer nonce should primarily be protection
1571	   of the network, while a transport-layer nonce would be better used to
1572	   protect the sender from cheating receivers.  Now, the assumption
1573	   behind the ECN nonce is that a sender will want to detect whether a
1574	   receiver is suppressing congestion feedback.  This is only true if
1575	   the sender's interests are aligned with the network's, or with the
1576	   community of users as a whole.  This may be true for certain large
1577	   senders, who are under close scrutiny and have a reputation to
1578	   maintain.  But we have to deal with a more hostile world, where
1579	   traffic may be dominated by peer-to-peer transfers, rather than
1580	   downloads from a few popular sites.  Often the `natural' self-
1581	   interest of a sender is not aligned with the interests of other
1582	   users.  It often wishes to transfer data quickly to the receiver as
1583	   much as the receiver wants the data quickly.

1585	   In contrast, the re-ECN protocol enables policing of an agreed rate-
1586	   response to congestion (e.g. TCP-friendliness) at the sender's
1587	   interface with the internetwork.  It also ensures downstream networks
1588	   can police their upstream neighbours, to encourage them to police
1589	   their users in turn.  But most importantly, it requires the sender to
1590	   declare path congestion to the network, which can remove traffic at
1591	   the egress if this declaration is dishonest.  So it can police
1592	   correctly, irrespective of whether the receiver tries to suppress
1593	   congestion feedback or whether the sender ignores genuine congestion
1594	   feedback.  Therefore the re-ECN protocol addresses a much wider range
1595	   of cheating problems, which includes the one addressed by the ECN
1596	   nonce.

1598	9.  Security Considerations

1600	   This whole memo concerns the deployment of a secure congestion
1601	   control framework.  However, below we list some specific security
1602	   issues that we are still working on:

1604	   o  Malicious users have the ability to launch dynamically changing
1605	      attacks, exploiting the time it takes to detect an attack, given
1606	      ECN marking is binary.  We are concentrating on subtle
1607	      interactions between the ingress policer and the egress dropper in
1608	      an effort to make it impossible to game the system.

1610	   o  There is an inherent need for at least some flow state at the
1611	      egress dropper given the binary marking environment, which leads
1612	      to an apparent vulnerability to state exhaustion attacks.  An
1613	      egress dropper design with bounded flow state is being written up.

1615	   o  A malicious source can spoof another user's address and send
1616	      negative traffic to the same destination in order to fool the
1617	      dropper into sanctioning the other user's flow.  To prevent or
1618	      mitigate these two different kinds of DoS attack, against the
1619	      dropper and against given flows, we are considering various
1620	      protection mechanisms.

1622	   o  A malicious client can send requests using a spoofed source
1623	      address to a server (such as a DNS server) that tends to respond
1624	      with single packet responses.  This server will then be tricked
1625	      into having to set FNE on the first (and only) packet of all these
1626	      wasted responses.  Given packets marked FNE are worth +1, this
1627	      will cause such servers to consume more of their allowance to
1628	      cause congestion than they would wish to.  In general, re-ECN is
1629	      deliberately designed so that single packet flows have to bear the
1630	      cost of not discovering the congestion state of their path.  One
1631	      of the reasons for introducing re-ECN is to encourage short flows
1632	      to make use of previous path knowledge by moving the cost of this
1633	      lack of knowledge to sources that create short flows.  Therefore,
1634	      in the long run we might expect services like DNS to aggregate
1635	      single packet flows into connections where it brings benefits.
1636	      However, this attack where DNS requests are made from spoofed
1637	      addresses genuinely forces the server to waste its resources.  The
1638	      only mitigating feature is that the attacker has to set FNE on
1639	      each of its requests if they are to get through an egress dropper
1640	      to a DNS server.  The attacker therefore has to consume as many
1641	      resources as the victim, which at least implies re-ECN does not
1642	      unwittingly amplify this attack.

1644	   Having highlighted outstanding security issues, we now explain the
1645	   design decisions that were taken based on a security-related
1646	   rationale.  It may seem that the six codepoints of the eight made
1647	   available by extending the ECN field with the RE flag have been used
1648	   rather wastefully to encode just five states.  In effect the RE flag
1649	   has been used as an orthogonal single bit, using up four codepoints
1650	   to encode the three states of positive, neutral and negative worth.
1651	   The mapping of the codepoints in an earlier version of this proposal
1652	   used the codepoint space more efficiently, but the scheme became
1653	   vulnerable to network operators bypassing congestion penalties by
1654	   focusing congestion marking on positive packets.  Appendix B explains
1655	   why fixing that problem, while allowing for incremental deployment,
1656	   would have used another codepoint anyway.  So it was better to use
1657	   this orthogonal encoding scheme, which greatly simplified the whole
1658	   protocol and brought with it some subtle security benefits (see the
1659	   last paragraph of Appendix B).

1661	   With the scheme as now proposed, once the RE flag is set or cleared
1662	   by the sender or its proxy, it should not be written by the network,
1663	   only read.  So the endpoints can detect if any network maliciously
1664	   alters the RE flag.  IPsec AH integrity checking does not cover the
1665	   IPv4 header flags (they were considered mutable---even the one we
1666	   propose using for the RE flag that was `currently unused' when IPSec
1667	   was defined).  But it would be sufficient for a pair of endpoints to
1668	   make random checks on whether the RE flag was the same when it
1669	   reached the egress as when it left the ingress.  Indeed, if IPsec AH
1670	   had covered the RE flag, any network intending to alter sufficient RE
1671	   flags to make a gain would have focused its alterations on packets
1672	   without authenticating headers (AHs).

1674	   The security of re-ECN has been deliberately designed to not rely on
1675	   cryptography.

1677	10.  IANA Considerations

1679	   This memo includes no request to IANA (yet).

1681	   If this memo were to progress to standards track, it would list:

1683	   o  The new RE flag in IPv4 (Section 5.1) and its extension with the
1684	      ECN field to create a new set of extended ECN (EECN) codepoints;

1686	   o  The definition of the EECN codepoints for default Diffserv PHBs
1687	      (Section 4.2);

1689	   o  The new extension header for IPv6 (Section 5.2);

1691	   o  The new combinations of flags in the TCP header for capability
1692	      negotiation (Section 6.1.3).

1694	11.  Conclusions

1696	   {ToDo:}

1698	12.  Acknowledgements

1700	   Sebastien Cazalet and Andrea Soppera contributed to the idea of re-
1701	   feedback.  All the following have given helpful comments: Andrea
1702	   Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley,
1703	   Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright,
1704	   John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru
1705	   Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd
1706	   (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark
1707	   Handley (who developed the attack with canceled packets), Adam
1708	   Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft
1709	   (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who
1710	   complemented our own dummy traffic attacks with others), Liz Maida
1711	   (MIT), and comments from participants in the CRN/CFP Broadband and
1712	   DoS-resistant Internet working groups.  A special thank you to
1713	   Alessandro Salvatori for coming up with fiendish attacks on re-ECN.

1715	13.  Comments Solicited

1717	   Comments and questions are encouraged and very welcome.  They can be
1718	   addressed to the IETF Transport Area working group's mailing list
1719	   <tsvwg@ietf.org>, and/or to the authors.

1721	14.  References

1723	14.1.  Normative References

1725	   [RFC2119]                       Bradner, S., "Key words for use in
1726	                                   RFCs to Indicate Requirement Levels",
1727	                                   BCP 14, RFC 2119, March 1997.

1729	   [RFC2581]                       Allman, M., Paxson, V., and W.
1730	                                   Stevens, "TCP Congestion Control",
1731	                                   RFC 2581, April 1999.

1733	   [RFC3168]                       Ramakrishnan, K., Floyd, S., and D.
1734	                                   Black, "The Addition of Explicit
1735	                                   Congestion Notification (ECN) to IP",
1736	                                   RFC 3168, September 2001.

1738	   [RFC3390]                       Allman, M., Floyd, S., and C.
1739	                                   Partridge, "Increasing TCP's Initial
1740	                                   Window", RFC 3390, October 2002.

1742	   [RFC4340]                       Kohler, E., Handley, M., and S.
1743	                                   Floyd, "Datagram Congestion Control
1744	                                   Protocol (DCCP)", RFC 4340,
1745	                                   March 2006.

1747	   [RFC4341]                       Floyd, S. and E. Kohler, "Profile for
1748	                                   Datagram Congestion Control Protocol
1749	                                   (DCCP) Congestion Control ID 2: TCP-
1750	                                   like Congestion Control", RFC 4341,
1751	                                   March 2006.

1753	   [RFC4342]                       Floyd, S., Kohler, E., and J. Padhye,
1754	                                   "Profile for Datagram Congestion
1755	                                   Control Protocol (DCCP) Congestion
1756	                                   Control ID 3: TCP-Friendly Rate
1757	                                   Control (TFRC)", RFC 4342,
1758	                                   March 2006.

1760	   [RFC4960]                       Stewart, R., "Stream Control
1761	                                   Transmission Protocol", RFC 4960,
1762	                                   September 2007.

1764	14.2.  Informative References

1766	   [ARI05]                         Adams, J., Roberts, L., and A.
1767	                                   IJsselmuiden, "Changing the Internet
1768	                                   to Support Real-Time Content Supply
1769	                                   from a Large Fraction of Broadband
1770	                                   Residential Users", BT Technology
1771	                                   Journal (BTTJ) 23(2), April 2005.

1773	   [ECN-tunnel]                    Briscoe, B., "Layered Encapsulation
1774	                                   of Congestion Notification",
1775	                                   draft-briscoe-tsvwg-ecn-tunnel-01
1776	                                   (work in progress), July 2008.

1778	   [I-D.ietf-tcpm-ecnsyn]          Kuzmanovic, A., "Adding Explicit
1779	                                   Congestion Notification (ECN)
1780	                                   Capability to TCP's SYN/ACK
1781	                                   Packets", draft-ietf-tcpm-ecnsyn-07
1782	                                   (work in progress), November 2008.

1784	   [I-D.moncaster-tcpm-rcv-cheat]  Moncaster, T., "A TCP Test to Allow
1785	                                   Senders to Identify Receiver Non-
1786	                                   Compliance",
1787	                                   draft-moncaster-tcpm-rcv-cheat-02
1788	                                   (work in progress), November 2007.

1790	   [PCN-arch]                      Eardley, P., Babiarz, J., Chan, K.,
1791	                                   Charny, A., Geib, R., Karagiannis,
1792	                                   G., Menth, M., and T. Tsou, "Pre-
1793	                                   Congestion Notification
1794	                                   Architecture",
1795	                                   draft-ietf-pcn-architecture-09 (work
1796	                                   in progress), January 2008.

1798	   [RFC2309]                       Braden, B., Clark, D., Crowcroft, J.,
1799	                                   Davie, B., Deering, S., Estrin, D.,
1800	                                   Floyd, S., Jacobson, V., Minshall,
1801	                                   G., Partridge, C., Peterson, L.,
1802	                                   Ramakrishnan, K., Shenker, S.,
1803	                                   Wroclawski, J., and L. Zhang,
1804	                                   "Recommendations on Queue Management
1805	                                   and Congestion Avoidance in the
1806	                                   Internet", RFC 2309, April 1998.

1808	   [RFC2475]                       Blake, S., Black, D., Carlson, M.,
1809	                                   Davies, E., Wang, Z., and W. Weiss,
1810	                                   "An Architecture for Differentiated
1811	                                   Services", RFC 2475, December 1998.

1813	   [RFC2988]                       Paxson, V. and M. Allman, "Computing
1814	                                   TCP's Retransmission Timer",
1815	                                   RFC 2988, November 2000.

1817	   [RFC3124]                       Balakrishnan, H. and S. Seshan, "The
1818	                                   Congestion Manager", RFC 3124,
1819	                                   June 2001.

1821	   [RFC3514]                       Bellovin, S., "The Security Flag in
1822	                                   the IPv4 Header", RFC 3514,
1823	                                   April 2003.

1825	   [RFC3540]                       Spring, N., Wetherall, D., and D.
1826	                                   Ely, "Robust Explicit Congestion
1827	                                   Notification (ECN) Signaling with
1828	                                   Nonces", RFC 3540, June 2003.

1830	   [RFC4301]                       Kent, S. and K. Seo, "Security
1831	                                   Architecture for the Internet
1832	                                   Protocol", RFC 4301, December 2005.

1834	   [RFC4302]                       Kent, S., "IP Authentication Header",
1835	                                   RFC 4302, December 2005.

1837	   [RFC4835]                       Eastlake, D., "Cryptographic
1838	                                   Algorithm Implementation Requirements
1839	                                   for Encapsulating Security Payload
1840	                                   (ESP) and Authentication Header
1841	                                   (AH)", RFC 4835, April 2007.

1843	   [RFC5129]                       Davie, B., Briscoe, B., and J. Tay,
1844	                                   "Explicit Congestion Marking in
1845	                                   MPLS", RFC 5129, January 2008.

1847	   [Re-PCN]                        Briscoe, B., "Emulating Border Flow
1848	                                   Policing using Re-ECN on Bulk Data",
1849	                                   draft-briscoe-re-pcn-border-cheat-02
1850	                                   (work in progress), September 2008.

1852	   [Re-fb]                         Briscoe, B., Jacquet, A., Di Cairano-
1853	                                   Gilfedder, C., Salvatori, A.,
1854	                                   Soppera, A., and M. Koyabe, "Policing
1855	                                   Congestion Response in an
1856	                                   Internetwork Using Re-Feedback", ACM
1857	                                   SIGCOMM CCR 35(4)277--288,
1858	                                   August 2005, <http://www.acm.org/
1859	                                   sigs/sigcomm/sigcomm2005/
1860	                                   techprog.html#session8>.

1862	   [Savage99]                      Savage, S., Cardwell, N., Wetherall,
1863	                                   D., and T. Anderson, "TCP congestion
1864	                                   control with a misbehaving receiver",
1865	                                   ACM SIGCOMM CCR 29(5), October 1999,
1866	                                   <http://citeseer.ist.psu.edu/
1867	                                   savage99tcp.html>.

1869	   [Steps_DoS]                     Handley, M. and A. Greenhalgh, "Steps
1870	                                   towards a DoS-resistant Internet
1871	                                   Architecture", Proc. ACM SIGCOMM
1872	                                   workshop on Future directions in
1873	                                   network architecture (FDNA'04) pp
1874	                                   49--56, August 2004.

1876	   [re-ecn-motive]                 Briscoe, B., "Re-ECN: The Motivation
1877	                                   for Adding Congestion Accountability
1878	                                   to TCP/IP", draft-briscoe-tsvwg-re-
1879	                                   ecn-tcp-motivation-00 (work in
1880	                                   progress), March 2009.

1882	Appendix A.  Precise Re-ECN Protocol Operation

1884	   {ToDo: fix this}

1886	   The protocol operation in the middle described in Section 4.3 was an
1887	   approximation.  In fact, standard ECN router marking combines 1% and
1888	   2% marking into slightly less than 3% whole-path marking, because
1889	   routers deliberately mark CE whether or not the packet has already been
1890	   marked by another router upstream.  So the combined marking fraction
1891	   would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.

1893	   To generalise this we will need some notation.

1895	   o  j represents the index of each resource (typically queues) along a
1896	      path, ranging from 0 at the first router to n-1 at the last.

1898	   o  m_j represents the fraction of octets *m*arked CE by a particular
1899	      router (whether or not they are already marked) because of
1900	      congestion of resource j.

1902	   o  u_j represents congestion *u*pstream of resource j, being the
1903	      fraction of CE marking in arriving packet headers (before
1904	      marking).

1906	   o  p_j represents *p*ath congestion, being the fraction of packets
1907	      arriving at resource j with the RE flag blanked (excluding Not-
1908	      RECT packets).

1910	   o  v_j denotes expected congestion downstream of resource j, which
1911	      can be thought of as a *v*irtual marking fraction, being derived
1912	      from two other marking fractions.

1914	   Observed fractions of each particular codepoint (u, p and v) and
1915	   router marking rate m are dimensionless fractions, being the ratio of
1916	   two data volumes (marked and total) over a monitoring period.  All
1917	   measurements are in terms of octets, not packets, assuming that line
1918	   resources are more congestible than packet processing.

1920	   The path congestion (RE blanking fraction) set by the sender should
1921	   reflect the upstream congestion (CE marking fraction) fed back from
1922	   the destination.  Therefore in the steady state

1924	      p_0  = u_n
1925	           = 1 - (1 - m_0)(1 - m_1)...(1 - m_(n-1))

1927	   Similarly, at some point j in the middle of the network, if p = 1 -
1928	   (1 - u_j)(1 - v_j), then

1930	      v_j  = 1 - (1 - p)/(1 - u_j)

1932	          ~= p - u_j;                      if u_j << 100%

1934	   So, between the two routers in the example in Section 4.3, congestion
1935	   downstream is

1937	      v_1  = 100.00% - (100% - 2.98%) / (100% - 1.00%)
1938	           = 2.00%,

1940	   or a useful approximation of downstream congestion is
1941	      v_1 ~= 2.98% - 1.00%
1942	          ~= 1.98%.
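   The arithmetic above can be checked with the following Python
   sketch.  It is illustrative only: it assumes a list m of per-resource
   marking fractions indexed from 0 at the first router, as in the
   notation above, and the function names are hypothetical rather than
   part of the protocol.

```python
from math import prod


def upstream_congestion(m, j):
    """u_j: fraction of CE marking arriving at resource j, before it marks.

    m is a list of per-resource marking fractions m_0 .. m_(n-1).
    Each router marks independently of earlier marks, so the unmarked
    fractions multiply.
    """
    return 1 - prod(1 - m_i for m_i in m[:j])


def path_congestion(m):
    """p = u_n: whole-path CE fraction, which the sender re-echoes."""
    return upstream_congestion(m, len(m))


def downstream_congestion(p, u_j):
    """v_j exactly, from p = 1 - (1 - u_j)(1 - v_j)."""
    return 1 - (1 - p) / (1 - u_j)


def downstream_congestion_approx(p, u_j):
    """v_j ~= p - u_j, adequate when u_j << 100%."""
    return p - u_j


if __name__ == "__main__":
    m = [0.01, 0.02]                 # the 1% and 2% routers of Section 4.3
    p = path_congestion(m)
    u_1 = upstream_congestion(m, 1)
    print(f"p     = {p:.2%}")                                    # 2.98%
    print(f"v_1   = {downstream_congestion(p, u_1):.2%}")        # 2.00%
    print(f"v_1 ~= {downstream_congestion_approx(p, u_1):.2%}")  # 1.98%
```

   Here the exact and approximate values of v_1 differ by only 0.02
   percentage points, which is why v_j ~= p - u_j is usually adequate at
   realistic congestion levels.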

1944	Appendix B.  Justification for Two Codepoints Signifying Zero Worth
1945	             Packets

1947	   It may seem a waste of a codepoint to set aside two codepoints of the
1948	   Extended ECN field to signify zero worth (RECT and CE(0) are both
1949	   worth zero).  The justification is subtle, but worth recording.

1951	   The original version of Re-ECN ([Re-fb] and draft-00 of this memo)
1952	   used three codepoints for neutral (ECT(1)), positive (ECT(0)) and
1953	   negative (CE) packets.  The sender set packets to neutral unless re-
1954	   echoing congestion, when it set them positive, in much the same way
1955	   that it blanks the RE flag in the current protocol.  However, routers
1956	   were meant to mark congestion by setting packets negative (CE)
1957	   irrespective of whether they had previously been neutral or positive.

1959	   However, we did not arrange for senders to remember which packet had
1960	   been sent with which codepoint, or for feedback to say exactly which
1961	   packets arrived with which codepoints.  The transport was meant to
1962	   inflate the number of positive packets it sent to allow for a few
1963	   being wiped out by congestion marking.  We (wrongly) assumed that
1964	   routers would congestion mark packets indiscriminately, so the
1965	   transport could infer how many positive packets had been marked and
1966	   compensate accordingly by re-echoing.  But this created a perverse
1967	   incentive for routers to preferentially congestion mark positive
1968	   packets rather than neutral ones.

1970	   We could have removed this perverse incentive by requiring Re-ECN
1971	   senders to remember which packets they had sent with which codepoint,
1972	   and by requiring feedback from the receiver to identify which packets
1973	   arrived with which codepoint.  Then, if a positive packet was
1974	   congestion marked to negative, the sender could have re-echoed twice
1975	   to maintain the balance between positive and negative at the receiver.

1977	   Instead, we chose to make re-echoing congestion (blanking RE)
1978	   orthogonal to congestion notification (marking CE), which required a
1979	   second neutral codepoint.  Then the receiver would be able to detect
1980	   and echo a congestion event even if it arrived on a packet that had
1981	   originally been positive.
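   The orthogonality can be illustrated with a short Python sketch,
   assuming the worths stated in this memo: RECT and CE(0) are worth
   zero, CE(-1) is worth -1, and a Re-Echo packet is positive (taken
   here as +1, consistent with a CE-marked Re-Echo packet becoming
   CE(0)).  The sender controls only the RE flag and routers only the
   CE mark, so worth is the sum of two independent contributions.  The
   names are hypothetical, and Not-ECT and FNE packets are deliberately
   left out.

```python
from enum import Enum


class ExtendedECN(Enum):
    """The four ECN-capable states of the orthogonal encoding."""
    RECT = "RECT"          # RE flag set by sender, not CE marked
    RE_ECHO = "Re-Echo"    # RE flag blanked by sender, not CE marked
    CE_MINUS_1 = "CE(-1)"  # RE flag set, CE marked by a router
    CE_0 = "CE(0)"         # RE flag blanked, CE marked by a router


def classify(re_blanked: bool, ce_marked: bool) -> ExtendedECN:
    """Map the two independent bits of information to a state name."""
    if re_blanked:
        return ExtendedECN.CE_0 if ce_marked else ExtendedECN.RE_ECHO
    return ExtendedECN.CE_MINUS_1 if ce_marked else ExtendedECN.RECT


def worth(re_blanked: bool, ce_marked: bool) -> int:
    """+1 for a sender's re-echo, -1 for a router's CE mark."""
    return int(re_blanked) - int(ce_marked)


if __name__ == "__main__":
    for re_blanked in (False, True):
        for ce_marked in (False, True):
            state = classify(re_blanked, ce_marked)
            print(f"{state.value:8} worth {worth(re_blanked, ce_marked):+d}")
```

   Because a router's CE mark subtracts one whatever the RE flag says,
   the receiver can detect and echo a congestion event on an originally
   positive packet just as on a RECT packet.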

1983	   If we had added extra complexity to the sender and receiver
1984	   transports to track changes to individual packets, we could have made
1985	   it work, but then routers would have had an incentive to mark
1986	   positive packets with half the probability of neutral packets.  That
1987	   in turn would have led router algorithms to become more complex.
1988	   Then senders wouldn't know whether a mark had been introduced by a
1989	   simple or a complex router algorithm.  That in turn would have
1990	   required another codepoint to distinguish between RFC3168 ECN and new
1991	   Re-ECN router marking.

1993	   Once the cost of IP header codepoint real-estate was the same for
1994	   both schemes, there was no doubt that the simpler option for
1995	   endpoints and for routers should be chosen.  The resulting protocol
1996	   also no longer needed the tricky inflation/deflation complexity of
1997	   the original (broken) scheme.  It was also much simpler to understand
1998	   conceptually.

2000	   A further advantage of the new orthogonal four-codepoint scheme was
2001	   that senders owned sole rights to change the RE flag and routers
2002	   owned sole rights to change the ECN field.  Although we still arrange
2003	   the incentives so neither party strays outside their dominion, these
2004	   clear lines of authority simplify the matter.

2006	   Finally, a little redundancy can be very powerful in a scheme such as
2007	   this.  In one flow, the proportion of packets changed to CE should be
2008	   the same as the proportion of RECT packets changed to CE(-1) and the
2009	   proportion of Re-Echo packets changed to CE(0).  Double checking
2010	   using such redundant relationships can improve the security of a
2011	   scheme (cf. double-entry book-keeping or the ECN Nonce).
2012	   Alternatively, it might be necessary to exploit the redundancy in the
2013	   future to encode an extra information channel.
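   A minimal Python sketch of such a double check follows.  It assumes
   a monitor that counts, for one flow, how much traffic arrives in each
   of the four ECN-capable states; the class and function names and the
   fixed absolute tolerance are hypothetical, and a practical check
   would have to allow for sampling noise on short flows.

```python
from dataclasses import dataclass


@dataclass
class FlowCounts:
    """Traffic seen on one flow in each state (octets or packets)."""
    rect: int        # RE set, not CE marked
    ce_minus_1: int  # RE set, CE marked (was RECT)
    re_echo: int     # RE blanked, not CE marked
    ce_0: int        # RE blanked, CE marked (was Re-Echo)


def marking_is_consistent(c: FlowCounts, tolerance: float = 0.01) -> bool:
    """Check that CE marking hit RECT and Re-Echo packets in equal proportion.

    Returns True when either class is empty, since there is then
    nothing to cross-check.
    """
    re_set = c.rect + c.ce_minus_1
    re_blanked = c.re_echo + c.ce_0
    if re_set == 0 or re_blanked == 0:
        return True
    overall = (c.ce_minus_1 + c.ce_0) / (re_set + re_blanked)
    of_rect = c.ce_minus_1 / re_set
    of_re_echo = c.ce_0 / re_blanked
    return (abs(of_rect - overall) <= tolerance
            and abs(of_re_echo - overall) <= tolerance)


if __name__ == "__main__":
    # Indiscriminate 2% marking: both classes marked at the same rate.
    print(marking_is_consistent(FlowCounts(980, 20, 49, 1)))   # True
    # A router that focuses marking on re-echoed packets stands out.
    print(marking_is_consistent(FlowCounts(990, 10, 40, 10)))  # False
```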

2015	Appendix C.  ECN Compatibility

2017	   The rationale for choosing the particular combinations of SYN and SYN
2018	   ACK flags in Section 6.1.3 is as follows.

2020	   Choice of SYN flags:  A Re-ECN sender can work with RFC3168 compliant
2021	      ECN receivers, so we wanted to use the same flags as would be used
2022	      in an ECN-setup SYN [RFC3168] (CWR=1, ECE=1).  But at the same
2023	      time, we wanted a server (host B) that is Re-ECT to be able to
2024	      recognise that the client (A) is also Re-ECT.  We believe also
2025	      setting NS=1 in the initial SYN achieves both these objectives, as
2026	      it should be ignored by RFC3168 compliant ECT receivers and by
2027	      ECT-Nonce receivers.  But senders that are not Re-ECT should not
2028	      set NS=1.  At the time ECN was defined, the NS flag was not
2029	      defined, so setting NS=1 should be ignored by existing ECT
2030	      receivers (but testing against implementations may yet prove
2031	      otherwise).  The ECN Nonce RFC [RFC3540] is silent on what the NS
2032	      field might be set to in the TCP SYN, but we believe the intent
2033	      was for a nonce client to set NS=0 in the initial SYN (again only
2034	      testing will tell).  Therefore we define a Re-ECN-setup SYN as one
2035	      with NS=1, CWR=1 & ECE=1.

2037	   Choice of SYN ACK flags:  The client (A) needs to be able to
2038	      determine whether the server (B) is Re-ECT.  The
2039	      original ECN specification required an ECT server to respond to an
2040	      ECN-setup SYN with an ECN-setup SYN ACK of CWR=0 and ECE=1.  There
2041	      is no room to modify this by setting the NS flag, as that is
2042	      already set in the SYN ACK of an ECT-Nonce server.  So we used the
2043	      only combination of CWR and ECE that would not be used by existing
2044	      TCP receivers: CWR=1 and ECE=0.  The original ECN specification
2045	      defines this combination as a non-ECN-setup SYN ACK, which remains
2046	      true for RFC3168 compliant and Nonce ECTs.  But for Re-ECN we
2047	      define it as a Re-ECN-setup SYN ACK.  We didn't use a SYN ACK with
2048	      both CWR and ECE cleared to 0 because that would be the likely
2049	      response from most Not-ECT receivers.  And we didn't use a SYN ACK
2050	      with both CWR and ECE set to 1 either, as at least one broken
2051	      receiver implementation echoes whatever flags were in the SYN into
2052	      its SYN ACK.  Therefore we define a Re-ECN-setup SYN ACK as one
2053	      with CWR=1 & ECE=0.

2055	   Choice of two alternative SYN ACKs:  The NS flag may take either
2056	      value in a Re-ECN-setup SYN ACK.  Section 5.4 specifies that a
2057	      Re-ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to
2058	      echo congestion experienced (CE) on the initial SYN.  Otherwise a
2059	      Re-ECN-setup SYN ACK MUST be returned with NS=0.  The only current
2060	      known use of the NS flag in a SYN ACK is to indicate support for
2061	      the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
2062	      Given the ECN nonce MUST NOT be used for a RECN mode connection, a
2063	      Re-ECN-setup SYN ACK can use either setting of the NS flag without
2064	      any risk of confusion, because the CWR & ECE flags will be
2065	      reversed relative to those used by an ECN nonce SYN ACK.
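   The resulting negotiation rules can be summarised in a Python
   sketch.  It is illustrative only: the enumerations and function names
   are hypothetical, flags are passed as booleans, and the sketch covers
   only the flag combinations discussed in this appendix (following
   [RFC3168], a SYN ACK with CWR=0 and ECE=1 is an ECN-setup SYN ACK,
   and other combinations not claimed by Re-ECN are non-ECN-setup).

```python
from enum import Enum


class SynKind(Enum):
    RE_ECN_SETUP = "Re-ECN-setup SYN"
    ECN_SETUP = "ECN-setup SYN"
    NOT_ECN_SETUP = "non-ECN-setup SYN"


class SynAckKind(Enum):
    RE_ECN_SETUP = "Re-ECN-setup SYN ACK"
    ECN_SETUP = "ECN-setup SYN ACK"
    NOT_ECN_SETUP = "non-ECN-setup SYN ACK"


def classify_syn(ns: bool, cwr: bool, ece: bool) -> SynKind:
    """How a Re-ECT server (B) interprets the flags of a SYN."""
    if cwr and ece:
        # An RFC 3168 client sends CWR=1, ECE=1; a Re-ECT client adds NS=1.
        return SynKind.RE_ECN_SETUP if ns else SynKind.ECN_SETUP
    return SynKind.NOT_ECN_SETUP


def classify_syn_ack(ns: bool, cwr: bool, ece: bool) -> SynAckKind:
    """How a Re-ECT client (A) interprets the flags of a SYN ACK."""
    if cwr and not ece:
        return SynAckKind.RE_ECN_SETUP   # NS echoes CE on the SYN
    if ece and not cwr:
        return SynAckKind.ECN_SETUP      # NS=1 here means nonce support
    # CWR=ECE=0 is the likely Not-ECT reply; CWR=ECE=1 may come from a
    # broken receiver that reflects the SYN's flags.
    return SynAckKind.NOT_ECN_SETUP


def syn_was_ce_marked(ns: bool, cwr: bool, ece: bool) -> bool:
    """True only if a Re-ECN-setup SYN ACK echoes CE on the initial SYN."""
    return classify_syn_ack(ns, cwr, ece) is SynAckKind.RE_ECN_SETUP and ns
```

   Because the CWR and ECE settings of the two ECN-capable SYN ACKs are
   reversed, the meaning of NS never needs to be inferred on its own.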

2067	Appendix D.  Packet Marking with FNE During Flow Start

2069	   FNE (feedback not established) packets have two functions.  Their
2070	   main role is to announce the start of a new flow when feedback has
2071	   not yet been established.  However they also have the role of
2072	   balancing the expected feedback and can be used where there are
2073	   sudden changes in the rate of transmission.  Although such sudden
2074	   changes should not happen under TCP, the role of FNE packets as
2075	   speculative marking underlies the following argument for why the
2076	   first and third packets should be set to FNE.

2078	   The proportion of FNE packets in each roundtrip should be a high
2079	   estimate of the potential error in the balance between the number of
2080	   congestion-marked packets and the number of Re-Echo packets already
2081	   issued.

2083	   Let's call:

2085	      S: the number of TCP segments sent so far

2087	      F: the number of FNE packets sent so far

2089	      R: the number of Re-Echo packets sent so far

2091	      A: the number of acknowledgments received so far

2093	      C: the number of acknowledgments echoing a CE packet

2095	   In normal operation, when we want to send packet S+1, we first need
2096	   to check that enough Re-Echo packets have been issued:

2098	   If R<C, then S+1 will be a Re-Echo packet

2100	   Next we need to estimate the amount of congestion observed so far.
2101	   If congestion were stationary, it could be estimated as C/A.  A
2102	   pessimistic bound is (C+1)/(A+1) which assumes that the next
2103	   acknowledgment will echo a CE packet; we'll use that more pessimistic
2104	   estimate to drive the generation of FNE packets.

2106	   The number of CE packets expected by the time S+1 is acknowledged is
2107	   therefore (S+1)*(C+1)/(A+1).  Packet S+1 should be set to FNE if that
2108	   expected value exceeds the sum of FNE and Re-Echo packets sent so
2109	   far.

2111	      If  (F+R)<(S+1)*(C+1)/(A+1),
2112	        then S+1 will be set to FNE
2113	        else S+1 will be set to RECT

2115	   So the full test should be:

2117	      When packet (S+1) is about to be sent...
2118	        If R<C,
2119	           then S+1 will be set to Re-Echo
2120	        Else if  (F+R)<(S+1)*(C+1)/(A+1),
2121	          then S+1 will be set to FNE
2122	        Else S+1 will be set to RECT
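   The test above translates directly into a Python sketch.  It is
   illustrative, not normative: the function name is hypothetical, and
   the parameters keep the single-letter names S, F, R, A and C used in
   this appendix.  The comparison is multiplied through by (A+1), which
   is always positive, so integer arithmetic gives an exact result
   instead of a rounded division.

```python
def next_marking(S: int, F: int, R: int, A: int, C: int) -> str:
    """Return the marking for packet S+1.

    S: segments sent so far        F: FNE packets sent so far
    R: Re-Echo packets sent so far A: acknowledgments received so far
    C: acknowledgments echoing a CE packet
    """
    if R < C:
        # Congestion has been fed back but not yet re-echoed.
        return "Re-Echo"
    # (F+R) < (S+1)*(C+1)/(A+1), multiplied through by (A+1) > 0.
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        # Not enough positive packets for the pessimistic estimate.
        return "FNE"
    return "RECT"


if __name__ == "__main__":
    print(next_marking(S=0, F=0, R=0, A=0, C=0))  # first packet:  FNE
    print(next_marking(S=1, F=1, R=0, A=1, C=0))  # second packet: RECT
    print(next_marking(S=2, F=1, R=0, A=1, C=0))  # third packet:  FNE
```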

2124	   This means that at any point, given S, A, R, F and C, the source could
2125	   send another k RECT packets, such that k < (F+R)*(A+1)/(C+1)-S.

2127	   The above scheme is independent of the actions of both the dropper
2128	   and policer and doesn't depend on the rate adaptation discipline of
2129	   the source.  It only defines Re-Echo packets as notification of
2130	   effective end-to-end congestion (as witnessed at the previous
2131	   roundtrip), and FNE packets as notification of speculative end-to-end
2132	   congestion based on a high estimate of congestion.

2133	   In practice, for any source:

2135	   o  for the first packet, A=R=F=C=S=0 ==> 1 FNE

2137	   o  if the acknowledgment doesn't echo a mark

2139	      *  for the second packet, A=F=S=1 R=C=0 ==> 1 RECT

2141	      *  for the third packet, S=2 A=F=1 R=C=0 ==> 1 FNE

2143	   o  if no acknowledgement for these two packets echoes a congestion
2144	      mark, then {A=S=3 F=2 R=C=0} which gives k<2*4/1-3, so the source
2145	      could send another 4 RECT packets. ==> 4 RECT

2146	   o  if no acknowledgement for these four packets echoes a congestion
2147	      mark, then {A=S=7 F=2 R=C=0} which gives k<2*8/1-7, so the source
2148	      could send another 8 RECT packets. ==> 8 RECT

2150	   This behaviour happens to match TCP's congestion window control in
2151	   slow start, which is why for TCP sources, only the first and third
2152	   packets need to be FNE packets.

2154	   A source that would open the congestion window any quicker would have
2155	   to insert more FNE packets.  As another example a UDP source sending
2156	   VBR traffic might need to send several FNE packets ahead of the
2157	   traffic peaks it generates.
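   The claim that only the first and third packets of a TCP flow need
   to be FNE, and that a faster-opening source needs more, can be
   checked with a small simulation.  This Python sketch assumes an
   idealised slow start: an initial window of one segment, a window
   multiplied by a fixed growth factor each round, every acknowledgment
   of a round arriving before the next round starts, and no congestion
   marks.  The function names are hypothetical, and the marking rule is
   restated so the block stands alone.

```python
def next_marking(S, F, R, A, C):
    """Per-packet rule of this appendix (integer form)."""
    if R < C:
        return "Re-Echo"
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        return "FNE"
    return "RECT"


def slow_start_markings(rounds, growth=2):
    """Markings of every packet sent in an idealised, uncongested start."""
    S = F = R = A = C = 0    # C stays 0: no acknowledgment echoes CE
    cwnd = 1
    markings = []
    for _ in range(rounds):
        for _ in range(cwnd):
            mark = next_marking(S, F, R, A, C)
            if mark == "FNE":
                F += 1
            elif mark == "Re-Echo":
                R += 1
            markings.append(mark)
            S += 1
        A = S                # the whole round is acknowledged
        cwnd *= growth
    return markings


def fne_positions(markings):
    """1-based packet numbers that were sent as FNE."""
    return [i + 1 for i, mark in enumerate(markings) if mark == "FNE"]


if __name__ == "__main__":
    print(fne_positions(slow_start_markings(rounds=4)))            # [1, 3]
    print(fne_positions(slow_start_markings(rounds=3, growth=3)))  # [1, 3, 11]
```

   Under these assumptions a doubling window never needs a third FNE
   packet, whereas tripling the window triggers another one in the
   third round.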

2159	Appendix E.  Argument for holding back the ECN nonce

2161	   The ECN nonce is a mechanism that allows a /sending/ transport to
2162	   detect if drop or ECN marking at a congested router has been
2163	   suppressed by a node somewhere in the feedback loop---another router
2164	   or the receiver.

2166	   Space for the ECN nonce was set aside in [RFC3168] (currently
2167	   proposed standard) while the full nonce mechanism is specified in
2168	   [RFC3540] (currently experimental).  The DCCP specification [RFC4340]
2169	   (currently proposed standard) requires that "Each DCCP sender SHOULD
2170	   set ECN Nonces on its packets...".  It also mandates as a requirement
2171	   for all CCID profiles that "Any newly defined acknowledgement
2172	   mechanism MUST include a way to transmit ECN Nonce Echoes back to the
2173	   sender.", therefore:

2175	   o  The CCID profile for TCP-like Congestion Control [RFC4341]
2176	      (currently proposed standard) says "The sender will use the ECN
2177	      Nonce for data packets, and the receiver will echo those nonces in
2178	      its Ack Vectors."

2180	   o  The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342]
2181	      recommends that "The sender [use] Loss Intervals options' ECN
2182	      Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to
2183	      probabilistically verify that the receiver is correctly reporting
2184	      all dropped or marked packets."

2186	   The primary function of the ECN nonce is to protect the integrity of
2187	   the information about congestion: ECN marks and packet drops.
2188	   However, when the nonce is used to protect the integrity of
2189	   information about packet drops, rather than ECN marks, a transport
2190	   layer nonce will always be sufficient (because a drop loses the
2191	   transport header as well as the ECN field in the network header),
2192	   which would avoid using scarce IP header codepoint space.  Similarly,
2193	   a transport layer nonce would protect against a receiver sending
2194	   early acknowledgements [Savage99].

2196	   If the ECN nonce reveals integrity problems with the information
2197	   about congestion, the sending transport can use that knowledge for
2198	   two functions:

2200	   o  to protect its own resources, by allocating them in proportion to
2201	      the rates that each network path can sustain, based on congestion
2202	      control,

2204	   o  and to protect congested routers in the network, by drastically
2205	      slowing down its connection to the destination with corrupt
2206	      congestion information.

2208	   If the sending transport chooses to act in the interests of congested
2209	   routers, it can reduce its rate if it detects that some malicious party
2210	   in the feedback loop may be suppressing ECN feedback.  But this would
2211	   only be useful to congested routers if /all/ senders using them could
2212	   be trusted to act in the interests of the congested routers.

2214	   In the end, the only essential use of a network layer nonce is when
2215	   sending transports (e.g. large servers) want to allocate their /own/
2216	   resources in proportion to the rates that each network path can
2217	   sustain, based on congestion control.  In that case, the nonce allows
2218	   senders to be assured that they aren't being duped into giving more
2219	   of their own resources to a particular flow.  And if congestion
2220	   suppression is detected, the sending transport can rate limit the
2221	   offending connection to protect its own resources.  Certainly, this
2222	   is a useful function, but the IETF should carefully decide whether
2223	   such a single, very specific case warrants IP header space.

2225	   In contrast, Re-ECN allows all routers to fully protect themselves
2226	   from such attacks, without having to trust anyone: senders,
2227	   receivers or neighbouring networks.  Re-ECN is therefore proposed in
2228	   preference to the ECN nonce on the basis that it addresses the
2229	   generic problem of accountability for congestion of a network's
2230	   resources at the IP layer.

2232	   Delaying the ECN nonce is justified because the applicability of the
2233	   ECN nonce seems too limited for it to consume a codepoint of the
2234	   two-bit ECN field in the IP header.  It therefore seems prudent to
2235	   allow time for an alternative to be found for the one function for
2236	   which the nonce is essential.

2238	   Moreover, while we have re-designed the Re-ECN codepoints so that
2239	   they do not prevent the ECN nonce progressing, the same is not true
2240	   the other way round.  If the ECN nonce started to see some deployment
2241	   (perhaps because it was blessed with proposed standard status),
2242	   incremental deployment of Re-ECN would effectively be impossible,
2243	   because Re-ECN marking fractions at inter-domain borders would be
2244	   polluted by unknown levels of nonce traffic.

2246	   The authors are aware that Re-ECN must prove it has the potential it
2247	   claims if it is to displace the nonce.  Therefore, every effort has
2248	   been made to complete a comprehensive specification of Re-ECN so that
2249	   its potential can be assessed.  We now seek the opinion of the
2250	   Internet community on whether the Re-ECN protocol is sufficiently
2251	   useful to warrant standards action.

2253	Appendix F.  Alternative Terminology Used in Other Documents

2255	   A number of alternative terms have been used in various documents
2256	   describing re-feedback and re-ECN.  These are set out in the
2257	   following table:
2258	   +-------------------+---------------+-------------------------------+
2259	   |      Current      |      EECN     |             Colour            |
2260	   |    Terminology    |   codepoint   |                               |
2261	   +-------------------+---------------+-------------------------------+
2262	   |      Cautious     |      FNE      |             Green             |
2263	   |      Positive     |    Re-Echo    |             Black             |
2264	   |      Neutral      |      RECT     |              Grey             |
2265	   |      Negative     |     CE(-1)    |              Red              |
2266	   |     Cancelled     |     CE(0)     |           Red-Black           |
2267	   |     Legacy ECN    |     ECT(0)    |             White             |
2268	   |  Currently Unused |     --CU--    |        Currently unused       |
2269	   |                   |               |                               |
2270	   |       Legacy      |    Not-ECT    |             White             |
2271	   +-------------------+---------------+-------------------------------+

2273	                  Table 7: Alternative re-ECN Terminology

2275	Authors' Addresses

2277	   Bob Briscoe
2278	   BT & UCL
2279	   B54/77, Adastral Park
2280	   Martlesham Heath
2281	   Ipswich  IP5 3RE
2282	   UK

2284	   Phone: +44 1473 645196
2285	   EMail: bob.briscoe@bt.com
2286	   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

2288	   Arnaud Jacquet
2289	   BT
2290	   B54/70, Adastral Park
2291	   Martlesham Heath
2292	   Ipswich  IP5 3RE
2293	   UK

2295	   Phone: +44 1473 647284
2296	   EMail: arnaud.jacquet@bt.com
2297	   URI:

2299	   Toby Moncaster
2300	   BT
2301	   B54/70, Adastral Park
2302	   Martlesham Heath
2303	   Ipswich  IP5 3RE
2304	   UK

2306	   Phone: +44 1473 648734
2307	   EMail: toby.moncaster@bt.com

2309	   Alan Smith
2310	   BT
2311	   B54/76, Adastral Park
2312	   Martlesham Heath
2313	   Ipswich  IP5 3RE
2314	   UK

2316	   Phone: +44 1473 640404
2317	   EMail: alan.p.smith@bt.com