2	Transport Area Working Group                             B. Briscoe, Ed.
3	Internet-Draft                                                A. Jacquet
4	Intended status: Historic                                             BT
5	Expires: January 17, 2014                                   T. Moncaster
6	                                                           Moncaster.com
7	                                                                A. Smith
8	                                                                      BT
9	                                                           July 16, 2013

11	     Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
12	                   draft-briscoe-conex-re-ecn-tcp-02

14	Abstract

16	   This document introduces re-ECN (re-inserted explicit congestion
17	   notification), which is intended to make a simple but far-reaching
18	   change to the Internet architecture.  The sender uses the IP header
19	   to reveal the congestion that it expects on the end-to-end path.  The
20	   protocol works by arranging an extended ECN field in each packet so
21	   that, as it crosses any interface in an internetwork, it will carry a
22	   truthful prediction of congestion on the remainder of its path.  It
23	   can be deployed incrementally around unmodified routers.  The purpose
24	   of this document is to specify the re-ECN protocol at the IP layer
25	   and to give guidelines on any consequent changes required to
26	   transport protocols.  It includes the changes required to TCP both as
27	   an example and as a specification.  It briefly gives examples of
28	   mechanisms that can use the protocol to ensure data sources respond
29	   sufficiently to congestion, but these are described more fully in a
30	   companion document.

32	   Note concerning Intended Status: If this draft were ever published as
33	   an RFC it would probably have historic status.  There is limited
34	   space in the IP header, so re-ECN had to compromise by requiring the
35	   receiver to be ECN-enabled; otherwise the sender could not use re-ECN.
36	   Re-ECN was a precursor to chartering of the IETF's Congestion
37	   Exposure (ConEx) working group, but during chartering there were
38	   still too few ECN receivers enabled, therefore it was decided to
39	   pursue other compromises in order to fit a similar capability into
40	   the IP header.

42	Status of This Memo

44	   This Internet-Draft is submitted in full conformance with the
45	   provisions of BCP 78 and BCP 79.

47	   Internet-Drafts are working documents of the Internet Engineering
48	   Task Force (IETF).  Note that other groups may also distribute
49	   working documents as Internet-Drafts.  The list of current Internet-
50	   Drafts is at http://datatracker.ietf.org/drafts/current/.

52	   Internet-Drafts are draft documents valid for a maximum of six months
53	   and may be updated, replaced, or obsoleted by other documents at any
54	   time.  It is inappropriate to use Internet-Drafts as reference
55	   material or to cite them other than as "work in progress."

57	   This Internet-Draft will expire on January 17, 2014.

59	Copyright Notice

61	   Copyright (c) 2013 IETF Trust and the persons identified as the
62	   document authors.  All rights reserved.

64	   This document is subject to BCP 78 and the IETF Trust's Legal
65	   Provisions Relating to IETF Documents
66	   (http://trustee.ietf.org/license-info) in effect on the date of
67	   publication of this document.  Please review these documents
68	   carefully, as they describe your rights and restrictions with respect
69	   to this document.  Code Components extracted from this document must
70	   include Simplified BSD License text as described in Section 4.e of
71	   the Trust Legal Provisions and are provided without warranty as
72	   described in the Simplified BSD License.

74	Table of Contents

76	   1.  Introduction
77	   2.  Requirements notation
78	   3.  Terminology
79	   4.  Protocol Overview
80	     4.1.  Simplified Re-ECN Protocol
81	       4.1.1.  Congestion Control and Policing the Protocol
82	       4.1.2.  Background and Applicability
83	     4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
84	           v6)
85	     4.3.  Re-ECN Protocol Operation
86	     4.4.  Positive and Negative Flows
87	   5.  Network Layer
88	     5.1.  Re-ECN IPv4 Wire Protocol
89	     5.2.  Re-ECN IPv6 Wire Protocol
90	     5.3.  Router Forwarding Behaviour
91	     5.4.  Justification for Setting the First SYN to FNE
92	     5.5.  Control and Management
93	       5.5.1.  Negative Balance Warning
94	       5.5.2.  Rate Response Control
95	     5.6.  IP in IP Tunnels
96	     5.7.  Non-Issues

98	   6.  Transport Layers
99	     6.1.  TCP
100	       6.1.1.  RECN mode: Full Re-ECN capable transport
101	       6.1.2.  RECN-Co mode: Re-ECT Sender with a RFC3168
102	               compliant ECN Receiver
103	       6.1.3.  Capability Negotiation
104	       6.1.4.  Extended ECN (EECN) Field Settings during Flow
105	               Start or after Idle Periods
106	       6.1.5.  Pure ACKS, Retransmissions, Window Probes and
107	               Partial ACKs
108	     6.2.  Other Transports
109	       6.2.1.  General Guidelines for Adding Re-ECN to Other
110	               Transports
111	       6.2.2.  Guidelines for adding Re-ECN to RSVP or NSIS
112	       6.2.3.  Guidelines for adding Re-ECN to DCCP
113	       6.2.4.  Guidelines for adding Re-ECN to SCTP
114	   7.  Incremental Deployment
115	   8.  Related Work
116	     8.1.  Congestion Notification Integrity
117	   9.  Security Considerations
118	   10. IANA Considerations
119	   11. Conclusions
120	   12. Acknowledgements
121	   13. Comments Solicited
122	   14. References
123	     14.1. Normative References
124	     14.2. Informative References
125	   Appendix A.  Precise Re-ECN Protocol Operation
126	   Appendix B.  Justification for Two Codepoints Signifying Zero
127	                Worth Packets
128	   Appendix C.  ECN Compatibility
129	   Appendix D.  Packet Marking with FNE During Flow Start
130	   Appendix E.  Argument for holding back the ECN nonce
131	   Appendix F.  Alternative Terminology Used in Other Documents

133	Authors' Statement: (to be removed by the RFC Editor)

135	   The most immediate priority for the authors is to delay any move of
136	   the ECN nonce to Proposed Standard status, in order to leave options
137	   open for the future.  The argument for this position is developed in
138	   Appendix E.

140	Changes from previous drafts (to be removed by the RFC Editor)

142	   Full diffs from all previous versions (created using the rfcdiff
143	   tool) are available at <http://www.bobbriscoe.net/pubs.html#retcp>

145	   From draft-briscoe-conex-...-01 to -02 (current version):  Re-issued
146	      to keep alive; updated references

148	   From draft-briscoe-conex-...-00 to -01:  Re-issued to keep alive;
149	      updated references

151	   From draft-briscoe-tsvwg-...-08 to draft-briscoe-conex-...-00:

153	      Re-issued to keep alive for reference by ConEx working group

155	      Changed working group tag in filename from tsvwg to conex

157	      Changed intended status to historic and added explanatory note

159	      Updated references.  Also, now that RFC6040 has been published,
160	      the section on tunnelling required a re-write

162	      Corrected name of CE(0) to Cancelled in Table 2

164	      Noted errors and omissions (rather than spending time correcting
165	      them):

167	      *  Made a few 'ToDo' comments visible that had previously been
168	         comments within the document source

170	      *  Identified errors with 'ToDo' comments, referring to correct
171	         material where possible.

173	   From -08 to -09:

175	      Re-issued to keep alive for reference by ConEx working group.

177	      Hardly any changes to content, even where it is out of date,
178	      except references updated.

180	   From -07 to -08:

182	      Minor changes and consistency checks.

184	      References updated.

186	   From -06 to -07:

188	      Major changes made following splitting this protocol document from
189	      the related motivations document [I-D.re-ecn-motiv].

191	      Significant re-ordering of remaining text.

193	      New terminology introduced for clarity.

195	      Minor editorial changes throughout.

197	1.  Introduction

199	   This document provides a complete specification for the addition of
200	   the re-ECN protocol to IP and guidelines on how to add it to
201	   transport layer protocols, including a complete specification of re-
202	   ECN in TCP as an example.  The motivation behind this proposal is
203	   given in [I-D.re-ecn-motiv], but we include a brief summary here.

205	   Re-ECN is intended to allow senders to inform the network of the
206	   level of congestion they expect their flows to see.  This information
207	   is currently only visible at the transport layer.  ECN [RFC3168]
208	   reveals the upstream congestion state of any path by monitoring the
209	   rate of CE marks.  The receiver then informs the sender when they
210	   have seen a marked packet.  Re-ECN builds on ECN by providing new
211	   codepoints that allow the sender to declare the level of congestion
212	   they expect on the forward path.  It is closely related to ECN and
213	   indeed we define a compatibility mode to allow a re-ECN sender to
214	   communicate with an ECN receiver.

216	   If a sender understates expected congestion compared to actual
217	   congestion then the network could discard packets or enact some other
218	   sanction.  A policer can also be introduced at the ingress of
219	   networks that can limit the level of congestion being caused.

221	   A general statement of the problem solved by re-ECN is to provide
222	   sufficient information in each IP datagram to be able to hold senders
223	   and whole networks accountable for the congestion they cause
224	   downstream, before they cause it.  But the every-day problems that
225	   re-ECN can solve are much more recognisable than this rather generic
226	   statement: mitigating distributed denial of service (DDoS);
227	   simplifying differentiation of quality of service (QoS); policing
228	   compliance to congestion control; and so on.

230	   It is important to add a few key points.

232	   o  In any standard network it always takes one round trip before any
233	      feedback is received.  For this reason a sender must make a
234	      conservative prediction by transmitting IP packets with a special
235	      Cautious marking when it is unsure of the state of the network.

237	   o  It should be noted that the prediction is carried in-band in
238	      normal data packets and for many transports feedback can be
239	      carried in the normal acknowledgements or control packets.

241	   o  The re-ECN protocol is independent of the transport.  In TCP,
242	      acknowledgments are used to convey the feedback from receiver to
243	      sender.  This memo concentrates on TCP as an example transport
244	      protocol; however, the re-ECN protocol is compatible with any
245	      transport where feedback can be sent from receiver to sender.

247	   This document is structured as follows.  First an overview of the re-
248	   ECN protocol is given (Section 4), outlining its attributes and
249	   explaining conceptually how it works as a whole.  The two main parts
250	   of the document follow.  That is, the protocol specification divided
251	   into network (Section 5) and transport (Section 6) layers.
252	   Deployment issues discussed throughout the document are brought
253	   together in Section 7.  Related work is discussed in Section 8.

255	2.  Requirements notation

257	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
258	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
259	   document are to be interpreted as described in [RFC2119].

261	3.  Terminology

263	   {ToDo: No attempt has been made to bring terminology into line with
264	   that agreed within the ConEx working group.  For instance the term
265	   dropper remains unchanged, even though the ConEx w-g has decided to
266	   call it an audit function (which is actually a much better term).}

268	   The following terminology is used throughout this memo.  Some of this
269	   terminology has changed as this draft has been revised.  Therefore,
270	   to help avoid confusion, Appendix F sets out all the alternative
271	   terminology that has been used in other re-ECN related documents.

273	   o  Neutral packet - a packet that is able to be congestion marked by
274	      an ECN or re-ECN queue.

276	   o  Negative packet - a Neutral packet that has been congestion marked
277	      by an ECN or re-ECN queue.

279	   o  Positive packet - a packet that has been marked by the sender to
280	      indicate the expected level of congestion along its path.  In
281	      general Positive packets should only be sent in response to
282	      feedback received from the receiver.*

284	   o  Cancelled packet - a Positive Packet that has been congestion
285	      marked by an ECN or re-ECN queue.

287	   o  Cautious packet - a packet that has been marked by the sender to
288	      indicate the expected level of congestion along its path.  In
289	      general Cautious packets should be used when there is insufficient
290	      feedback to be confident about the congestion state of the
291	      network.*

293	      * The difference between Positive and Cautious packets is
294	      explained in detail later in the document along with guidelines on
295	      the use of Cautious packets.

297	   All the above terms have related IP codepoints as defined in
298	   Section 5.

300	4.  Protocol Overview

302	4.1.  Simplified Re-ECN Protocol

304	   We describe here the simplified re-ECN protocol.  To simplify the
305	   description we assume packets and segments are synonymous.

307	   Packets are sent from a sender to a receiver.  In Figure 1 the queues
308	   (Q1 and Q2) are ECN enabled as per RFC 3168 [RFC3168].  If congestion
309	   occurs then packets are marked with the congestion experienced (CE)
310	   flag exactly as in the ECN protocol [RFC3168]; the routers do not
311	   need to be modified and do not need to know the re-ECN protocol.  The
312	   receiver constantly informs the sender of the current count of
313	   Negative packets it has seen.  The sender uses this information to
314	   determine how many Positive packets it must send into the network.
315	   The sender's aim is to balance the number of bytes that have been
316	   congestion marked with the number of Positive bytes it has sent.

318	          +--------- Feedback----------+
319	          |                            |
320	          v                            |
321	        +---+    +----+    +----+    +---+
322	        |   |    |    |    |    |    |   |
323	        | S |--->| Q1 |--->| Q2 |--->| R |
324	        |   |    |    |    |    |    |   |
325	        +---+    +----+    +----+    +---+

327	                          Figure 1: Simple Re-ECN

329	4.1.1.  Congestion Control and Policing the Protocol

331	   The arrangement of the protocol ensures that packets carry a
332	   declaration of the amount of congestion that will be experienced on
333	   the path.  The re-ECN protocol is orthogonal to any congestion
334	   control algorithms, but can be used to ensure that congestion control
335	   is being applied by the sender.

337	   In general we assume that there will be a policer at the network
338	   ingress which can rate limit traffic based on the amount of
339	   congestion declared.

341	   At the network egress there is a dropper which can impose sanctions
342	   on flows that incorrectly declare congestion.

344	   Policers and droppers are explained in more detail in
345	   [I-D.re-ecn-motiv].

347	4.1.2.  Background and Applicability

349	   The re-ECN protocol makes no changes to, and has no effect on, the TCP
350	   congestion control algorithm or on other rate responses to
351	   congestion.  Re-ECN is not a new congestion control protocol, rather
352	   it is orthogonal to congestion control itself.  Re-ECN is concerned
353	   with revealing information about congestion so that users and
354	   networks can be held accountable for the congestion they cause, or
355	   allow to be caused.

357	   Re-ECN builds on ECN so we briefly recap the essentials of the ECN
358	   protocol [RFC3168].  Two bits in the IP protocol (v4 or v6) are
359	   assigned to the ECN field.  The sender clears the field to "00" (Not-
360	   ECT) if either end-point transport is not ECN-capable.  Otherwise it
361	   indicates an ECN-capable transport (ECT) using either of the two
362	   code-points "10" or "01" (ECT(0) and ECT(1) resp.).

364	   ECN-capable queues probabilistically set this field to "11" if
365	   congestion is experienced (CE).  In general this marking probability
366	   will increase with the length of the queue at its egress link
367	   (typically using the RED algorithm [RFC2309]).  However, they still
368	   drop rather than mark Not-ECT packets.  With multiple ECN-capable
369	   queues on a path, a flow of packets accumulates the fraction of CE
370	   marking that each queue adds.  The combined effect of the packet
371	   marking of all the queues along the path signals congestion of the
372	   whole path to the receiver.  So, for example, if one queue early in a
373	   path is marking 1% of packets and another later in a path is marking
374	   2%, flows that pass through both queues will experience approximately
375	   3% marking (see Appendix A for a precise treatment).
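
   The 1% and 2% example above can be checked with a short calculation.
   The sketch below is illustrative only: it assumes each queue marks
   packets independently and that a packet marked CE stays marked, and
   the function name path_marking_fraction is hypothetical rather than
   taken from this specification.  Under those assumptions the whole-path
   fraction is 1 - (1 - 0.01)(1 - 0.02) = 2.98%, which is why the text
   says "approximately 3%".

```python
def path_marking_fraction(queue_fractions):
    """Fraction of packets carrying CE after crossing every queue in turn.

    Assumes (for illustration) independent marking at each queue and that
    a CE mark, once applied, is never removed.
    """
    unmarked = 1.0
    for p in queue_fractions:
        # A packet stays unmarked only if this queue also leaves it alone.
        unmarked *= (1.0 - p)
    return 1.0 - unmarked


if __name__ == "__main__":
    exact = path_marking_fraction([0.01, 0.02])
    approx = 0.01 + 0.02  # the simple sum used in the running example
    print(f"exact = {exact:.2%}, simple sum = {approx:.2%}")
```

   For small marking fractions the simple sum and the product form differ
   only in the second-order term, so the additive approximation is
   adequate for the illustrations in this section.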

377	   The choice of two ECT code-points in the ECN field [RFC3168]
378	   permitted future flexibility, optionally allowing the sender to
379	   encode the experimental ECN nonce [RFC3540] in the packet stream.
380	   The nonce is designed to allow a sender to check the integrity of
381	   congestion feedback.  But Section 8.1 explains that it still gives no
382	   control over how fast the sender transmits as a result of the
383	   feedback.  On the other hand, re-ECN is designed both to ensure that
384	   congestion is declared honestly and that the sender's rate responds
385	   appropriately.

387	   Re-ECN is based on a feedback arrangement called `re-
388	   feedback' [Re-fb].  The word is short for either receiver-aligned,
389	   re-inserted or re-echoed feedback.  But it actually works even when
390	   no feedback is available.  In fact it has been carefully designed to
391	   work for single datagram flows.  It also encourages aggregation of
392	   single packet flows by congestion control proxies.  Then, even if the
393	   traffic mix of the Internet were to become dominated by short
394	   messages, it would still be possible to control congestion
395	   effectively and efficiently.

397	   Changing the Internet's feedback architecture seems to imply
398	   considerable upheaval.  But re-ECN can be deployed incrementally at
399	   the transport layer around unmodified queues using existing fields in
400	   IP (v4 or v6).  However it does also require the last undefined bit
401	   in the IPv4 header, which it uses in combination with the 2-bit ECN
402	   field to create four new codepoints.  Nonetheless, we RECOMMEND
403	   adding optional preferential drop to IP queues based on the re-ECN
404	   fields in order to improve resilience against DoS attacks.
405	   Similarly, re-ECN works best if both the sender and receiver
406	   transports are re-ECN-capable, but it can work with just sender
407	   support (Section 6.1.2).

409	4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)

411	   The re-ECN wire protocol uses the two bit ECN field broadly as in
412	   RFC3168 [RFC3168] as described above, but with five differences of
413	   detail (brought together in a list in Section 7).  This specification
414	   defines a new re-ECN extension (RE) flag.  We will defer the
415	   definition of the actual position of the RE flag in the IPv4 & v6
416	   headers until Section 5.  When we don't need to choose between IPv4
417	   and v6 wire protocols it will suffice to call it the RE flag.

419	   Unlike the ECN field, the RE flag is intended to be set by the sender
420	   and SHOULD remain unchanged along the path, although it can be read
421	   by network elements that understand the re-ECN protocol.  It is
422	   feasible that a network element MAY change the setting of the RE
423	   flag, perhaps acting as a proxy for an end-point, but such a protocol
424	   would have to be defined in another specification
425	   (e.g. [I-D.re-pcn-border-cheat]).

427	   Although the RE flag is a separate, single bit field, it can be read
428	   as an extension to the two-bit ECN field; the three concatenated bits
429	   in what we will call the extended ECN field (EECN) giving eight
430	   codepoints.  We will use the RFC3168 names of the ECN codepoints to
431	   describe settings of the ECN field when the RE flag setting is "don't
432	   care", but we also define the following six extended ECN codepoint
433	   names for when we need to be more specific.

435	   One of re-ECN's codepoints is an alternative use of the codepoint set
436	   aside in RFC3168 for the ECN nonce (ECT(1)).  Transports using re-ECN
437	   do not need to use the ECN nonce as long as the sender is also
438	   checking for transport protocol compliance [tcp-rcv-cheat].  The case
439	   for doing this is given in Appendix E.  Two re-ECN codepoints are
440	   given compatible uses to those defined in RFC3168 (Not-ECT and CE).
441	   The other codepoint used by RFC3168 (ECT(0)) isn't used for re-ECN.
442	   Altogether this leaves one codepoint of the eight unused by ECN or re-
443	   ECN and available for future use.

445	   +--------+-------------+-------+-----------+------------------------+
446	   |   ECN  |   RFC3168   |   RE  |    EECN   |     re-ECN meaning     |
447	   |  field |  codepoint  |  flag | codepoint |                        |
448	   +--------+-------------+-------+-----------+------------------------+
449	   |   00   |   Not-ECT   |   0   |  Not-ECT  |   Not re-ECN-capable   |
450	   |        |             |       |           |   transport (Legacy)   |
451	   |   00   |     ---     |   1   |    FNE    |      Feedback not      |
452	   |        |             |       |           | established (Cautious) |
453	   |   01   |    ECT(1)   |   0   |  Re-Echo  |  Re-echoed congestion  |
454	   |        |             |       |           |   and RECT (Positive)  |
455	   |   01   |     ---     |   1   |    RECT   |     Re-ECN capable     |
456	   |        |             |       |           |   transport (Neutral)  |
457	   |   10   |    ECT(0)   |   0   |   ECT(0)  |  RFC3168 ECN use only  |
458	   |        |             |       |           |                        |
459	   |   10   |     ---     |   1   |   --CU--  |    Currently unused    |
460	   |        |             |       |           |                        |
461	   |   11   |      CE     |   0   |   CE(0)   |  Re-Echo cancelled by  |
462	   |        |             |       |           |     CE (Cancelled)     |
463	   |   11   |     ---     |   1   |   CE(-1)  | Congestion Experienced |
464	   |        |             |       |           |       (Negative)       |
465	   +--------+-------------+-------+-----------+------------------------+

467	                     Table 1: Extended ECN Codepoints
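
   Table 1 can be read as a lookup keyed on the two-bit ECN field and the
   RE flag.  The following is a minimal sketch of such a lookup; the
   names EECN_CODEPOINTS and decode_eecn are hypothetical, and the sketch
   deliberately says nothing about where the RE flag sits in the IPv4 or
   IPv6 header, which is deferred to Section 5.

```python
# Keys are (ECN field, RE flag); values are (EECN codepoint, re-ECN meaning),
# copied row by row from Table 1.
EECN_CODEPOINTS = {
    (0b00, 0): ("Not-ECT", "Not re-ECN-capable transport (Legacy)"),
    (0b00, 1): ("FNE", "Feedback not established (Cautious)"),
    (0b01, 0): ("Re-Echo", "Re-echoed congestion and RECT (Positive)"),
    (0b01, 1): ("RECT", "Re-ECN capable transport (Neutral)"),
    (0b10, 0): ("ECT(0)", "RFC3168 ECN use only"),
    (0b10, 1): ("CU", "Currently unused"),
    (0b11, 0): ("CE(0)", "Re-Echo cancelled by CE (Cancelled)"),
    (0b11, 1): ("CE(-1)", "Congestion Experienced (Negative)"),
}


def decode_eecn(ecn_field, re_flag):
    """Return (codepoint name, meaning) for a 2-bit ECN field and RE flag."""
    if ecn_field not in (0, 1, 2, 3) or re_flag not in (0, 1):
        raise ValueError("ECN field must be 0..3 and RE flag must be 0 or 1")
    return EECN_CODEPOINTS[(ecn_field, re_flag)]


if __name__ == "__main__":
    for (ecn, re_flag), (name, meaning) in sorted(EECN_CODEPOINTS.items()):
        print(f"ECN={ecn:02b} RE={re_flag} -> {name}: {meaning}")
```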

469	4.3.  Re-ECN Protocol Operation

471	   In this section we will give an overview of the operation of the re-
472	   ECN protocol for TCP/IP, leaving a detailed specification to the
473	   following sections.  Other transports will be discussed later.

475	   {ToDo: This section to be updated to explain that the sender re-
476	   echoes losses in the same way as ECN markings.}

478	   In summary, the protocol adds a third `re-echo' stage to the existing
479	   TCP/IP ECN protocol.  Whenever the network adds CE congestion
480	   signalling to the IP header on the forward data path, the receiver
481	   feeds it back to the ingress using TCP, then the sender re-echoes it
482	   into the forward data path using the RE flag in the next packet.

484	   Prior to receiving any feedback a sender will not know which setting
485	   of the RE flag to use, so it sends Cautious packets by setting the
486	   FNE codepoint.  The network reads the FNE codepoint conservatively as
487	   equivalent to re-echoed congestion.

489	   Specifically, once feedback from an ECN or re-ECN capable flow is
490	   established, a re-ECN sender always initialises the ECN field to
491	   ECT(1).  And it usually sets the RE flag to "1" indicating a Neutral
492	   packet.  Whenever a queue marks a packet to CE, the receiver feeds
493	   back this event to the sender.  On receiving this feedback, the re-
494	   ECN sender will clear the RE flag to "0" in the next packet it sends
495	   (indicating a Positive packet).
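
   The sender behaviour just described can be sketched as a small state
   holder.  This is an illustration under stated assumptions, not the
   normative TCP behaviour specified in Section 6: the class and method
   names are hypothetical, feedback is assumed to arrive as a count of
   newly reported CE marks, and the sketch balances packets rather than
   bytes (packets and segments being treated as synonymous, as in
   Section 4.1).

```python
class ReEcnSender:
    """Chooses the extended ECN codepoint for each outgoing packet."""

    def __init__(self):
        self.feedback_established = False
        self.pending_re_echo = 0  # CE marks fed back but not yet re-echoed

    def on_feedback(self, new_ce_marks):
        """Record feedback from the receiver (hypothetical interface)."""
        self.feedback_established = True
        self.pending_re_echo += new_ce_marks

    def next_codepoint(self):
        """Return (ECN field, RE flag, name) for the next packet to send."""
        if not self.feedback_established:
            # No feedback yet: send a Cautious packet.
            return (0b00, 1, "FNE")
        if self.pending_re_echo > 0:
            # Re-echo one reported CE mark by blanking the RE flag.
            self.pending_re_echo -= 1
            return (0b01, 0, "Re-Echo")
        # Normal case: ECT(1) with the RE flag set, a Neutral packet.
        return (0b01, 1, "RECT")


if __name__ == "__main__":
    sender = ReEcnSender()
    print(sender.next_codepoint())   # before any feedback
    sender.on_feedback(2)            # receiver reports two CE marks
    for _ in range(3):
        print(sender.next_codepoint())
```

   In this sketch two reported CE marks lead to two Positive packets,
   after which the sender returns to sending Neutral packets.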

497	   We chose to set and clear the RE flag this way round to ease
498	   incremental deployment (see Section 7).  To avoid confusion we will
499	   use the term `blanking' (rather than marking) when the RE flag is
500	   cleared to "0".  So, over a stream of packets, we will talk of the
501	   `RE blanking fraction' as the fraction of octets in packets with the
502	   RE flag cleared to "0".

504	       +---+  +----+                +----+  +---+
505	       | S |--| Q1 |----------------| Q2 |--| R |
506	       +---+  +----+                +----+  +---+
507	         .      .                      .      .
508	       ^ .      .                      .      .
509	       | .      .                      .      .
510	       | .     RE blanking fraction    .      .
511	    3% |-------------------------------+=======
512	       | .      .                      |      .
513	    2% | .      .                      |      .
514	       | .      .  CE marking fraction |      .
515	    1% | .      +----------------------+      .
516	       | .      |                      .      .
517	    0% +--------------------------------------->
518	         ^          ^                      ^
519	         L          M                      N    Observation points

521	                  Figure 2: A 2-Queue Example (Imprecise)

523	   Figure 2 uses a simple network to illustrate how re-ECN allows queues
524	   to measure downstream congestion.  The receiver views a CE marking
525	   fraction of 3% which is fed back to the sender.  The sender sets an
526	   RE blanking fraction of 3% to match this.  This RE blanking fraction
527	   can be observed along the path as the RE flag is not changed by
528	   network nodes once set by the sender.  This is shown by the
529	   horizontal line at 3% in the figure.  The CE marked fraction is shown
530	   by the stepped line which rises to meet the RE blanking fraction line
531	   with steps at each queue where packets are marked.  Two queues are
532	   shown (Q1 and Q2) that are currently congested.  Each time packets
533	   pass through a queue, a fraction is marked (1% at Q1 and 2% at Q2).  The
534	   approximate downstream congestion can be measured at the observation
535	   points shown along the path by subtracting the CE marking fraction
536	   from the RE blanking fraction, as shown in the table below
537	   (Appendix A derives these approximations from a precise analysis).
538	   NB due to the unary nature of ECN marking and the equivalent unary
539	   nature of re-ECN blanking, the precise fraction of marked bytes must
540	   be calculated by maintaining a moving average of the number of
541	   packets that have been marked as a proportion of the total number of
542	   packets.
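
   The subtraction described above can be worked through for the three
   observation points of Figure 2.  The sketch below simply encodes the
   figure's values (a constant 3% RE blanking fraction and CE marking
   fractions of 0%, 1% and 3% at L, M and N); the function and variable
   names are hypothetical, and the result is the approximation discussed
   in the text, not the precise treatment of Appendix A.

```python
RE_BLANKING_FRACTION = 0.03  # set by the sender, unchanged along the path

# CE marking fraction seen at each observation point in Figure 2.
CE_MARKING_AT = {"L": 0.00, "M": 0.01, "N": 0.03}


def downstream_congestion(re_blanking, ce_marking):
    """Approximate congestion on the rest of the path from this point."""
    return re_blanking - ce_marking


if __name__ == "__main__":
    for point, ce in CE_MARKING_AT.items():
        downstream = downstream_congestion(RE_BLANKING_FRACTION, ce)
        print(f"{point}: {RE_BLANKING_FRACTION:.0%} - {ce:.0%} "
              f"= {downstream:.0%}")
```

   Run as a script, this reports approximately 3% downstream congestion
   at L, 2% at M and 0% at N, matching the gap between the two lines in
   the figure at each point.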

544	   Along the path the fraction of packets that had their RE flag
545	   cleared remains unchanged so it can be used as a reference against
546	   which to compare upstream congestion.  The difference predicts
547	   downstream congestion for the rest of the path.  Therefore, measuring
548	   the fractions of each codepoint at any point in the Internet will
549	   reveal upstream, downstream and whole path congestion.

551	   Note that we have introduced discussion of marking and blanking
552	   fractions solely for illustration.  We are not saying any protocol
553	   handler will work with these average fractions directly.  In fact the
554	   protocol actually requires the number of marked and blanked bytes to
555	   balance by the time the packet reaches the receiver.

557	4.4.  Positive and Negative Flows

559	   In Section 3 we introduced the terms Positive, Neutral, Negative,
560	   Cautious and Cancelled.  This terminology is based on the requirement
561	   to balance the proportion of bytes marked as CE with the proportion
562	   of bytes that are re-echo marked.  In the rest of this memo we will
563	   loosely talk of positive or negative flows, meaning flows where the
564	   moving average of the downstream congestion metric is persistently
565	   positive or negative.  A negative flow is one where more CE marked
566	   packets than re-ECN blanked packets arrive.  Likewise in positive
567	   flows more re-ECN blanked packets arrive than CE marked packets.  The
568	   notion of a negative metric arises because it is derived by
569	   subtracting one metric from another.  Of course actual downstream
570	   congestion cannot be negative, only the metric can (whether due to
571	   time lags or deliberate malice).

573	   Therefore we will talk of packets having `worth' of +1, 0 or -1,
574	   which, when multiplied by their size, indicates their contribution to
575	   the downstream congestion metric.  The worth of each type of packet
576	   is given below in Table 2.  The idea is that most flows start with
577	   zero worth.  Every time the network decrements the worth of a packet,
578	   the sender increments the worth of a later packet.  Then, over time,
579	   as many positive octets should arrive at the receiver as negative.
580	   Note we have said octets not packets, so if packets are of different
581	   sizes, the worth should be incremented on enough octets to balance
582	   the octets in negative packets arriving at the receiver.  It is this
583	   balance that will allow the network to hold the sender accountable
584	   for the congestion it causes.

586	   If a packet carrying re-echoed congestion happens to also be
587	   congestion marked, the +1 worth added by the sender will be cancelled
588	   out by the -1 network congestion marking.  Although the two worth
589	   values correctly cancel out, neither the congestion marking nor the
590	   re-echoed congestion are lost, because the RE bit and the ECN field
591	   are orthogonal.  So, whenever this happens, the receiver will
592	   correctly detect and re-echo the new congestion event as well.

594	   The table below specifies unambiguously the worth of each extended
595	   ECN codepoint.  Note the order is different from the previous table
596	   to better show how the worth increments and decrements.

598	   +---------+-------+---------------+-------+-------------------------+
599	   |   ECN   |   RE  | Extended ECN  | Worth |       Re-ECN Term       |
600	   |  field  |  bit  | codepoint     |       |                         |
601	   +---------+-------+---------------+-------+-------------------------+
602	   |    00   |   0   | Not-RECT      | ...   |           ---           |
603	   |    00   |   1   | FNE           | +1    |         Cautious        |
604	   |    01   |   0   | Re-Echo       | +1    |         Positive        |
605	   |    10   |   0   | Legacy        | ...   |   RFC3168 ECN use only  |
606	   |         |       |               |       |                         |
607	   |    11   |   0   | CE(0)         |  0    |        Cancelled        |
608	   |    01   |   1   | RECT          |  0    |         Neutral         |
609	   |    10   |   1   | --CU--        | ...   |     Currently unused    |
610	   |         |       |               |       |                         |
611	   |    11   |   1   | CE(-1)        | -1    |         Negative        |
612	   +---------+-------+---------------+-------+-------------------------+

614	                Table 2: 'Worth' of Extended ECN Codepoints
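
   The octet balance described above can be sketched as follows.  This
   is only an illustration under stated assumptions: the dictionary is
   transcribed from Table 2, while the function name, the tuple layout
   and the choice to skip codepoints with no stated worth are ours.

```python
# Worth of each extended ECN codepoint, keyed by (ECN field, RE bit),
# transcribed from Table 2.  None means the table gives no worth.
WORTH = {
    (0b00, 0): None,  # Not-RECT
    (0b00, 1): +1,    # FNE (Cautious)
    (0b01, 0): +1,    # Re-Echo (Positive)
    (0b10, 0): None,  # Legacy, RFC3168 ECN use only
    (0b11, 0): 0,     # CE(0) (Cancelled)
    (0b01, 1): 0,     # RECT (Neutral)
    (0b10, 1): None,  # Currently unused
    (0b11, 1): -1,    # CE(-1) (Negative)
}

def flow_balance(packets):
    """Sum worth * size in octets over (ecn, re, size) tuples.
    Packets whose codepoint has no stated worth are skipped."""
    total = 0
    for ecn, re_bit, size in packets:
        worth = WORTH[(ecn, re_bit)]
        if worth is not None:
            total += worth * size
    return total

# One 1500-octet CE(-1) packet is balanced by two 750-octet Re-Echo
# packets; the neutral RECT packet contributes nothing.
arrivals = [(0b11, 1, 1500), (0b01, 1, 1500),
            (0b01, 0, 750), (0b01, 0, 750)]
print(flow_balance(arrivals))
```

   Because the balance is kept in octets, the two smaller positive
   packets are needed to offset the single larger negative one.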

616	5.  Network Layer

618	5.1.  Re-ECN IPv4 Wire Protocol

620	   The wire protocol of the ECN field in the IP header remains largely
621	   unchanged from [RFC3168].  However, an extension to the ECN field we
622	   call the RE (Re-ECN extension) flag (Section 4.2) is defined in this
623	   document.  It doubles the extended ECN codepoint space, giving 8
624	   potential codepoints.  The semantics of the extra codepoints are
625	   backward compatible with the semantics of the 4 original codepoints
626	   [RFC3168] (Section 7 collects together and summarises all the changes
627	   defined in this document).

629	   For IPv4, this document proposes that the new RE control flag will be
630	   positioned where the `reserved' control flag was at bit 48 of the
631	   IPv4 header (counting from 0).  Alternatively, some would call this
632	   bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4
633	   header (Figure 3).

635	             0   1   2
636	           +---+---+---+
637	           | R | D | M |
638	           | E | F | F |
639	           +---+---+---+

641	   Figure 3: New Definition of the Re-ECN Extension (RE) Control Flag at
642	                  the Start of Byte 7 of the IPv4 Header

644	   The semantics of the RE flag are described in outline in Section 4
645	   and specified fully in Section 6.  The RE flag is always considered
646	   in conjunction with the 2-bit ECN field, as if they were concatenated
647	   together to form a 3-bit extended ECN field.  If the ECN field is set
648	   to either the ECT(1) or CE codepoint, when the RE flag is blanked
649	   (cleared to "0") it represents a re-echo of congestion experienced by
650	   an earlier packet.  If the ECN field is set to the Not-ECT codepoint,
651	   when the RE flag is set to "1" it represents the feedback not
652	   established (FNE) codepoint, which signals that the packet was sent
653	   without the benefit of congestion feedback.
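
   A minimal sketch of reading the 3-bit extended ECN field from a raw
   IPv4 header follows.  It assumes the RE flag sits at bit 48 as
   proposed above and that the ECN field occupies the two low-order
   bits of the second header byte [RFC3168].  The function and
   dictionary names are hypothetical.

```python
def extended_ecn(ipv4_header: bytes):
    """Return (ecn_field, re_flag) from a raw IPv4 header.
    ECN is the low 2 bits of byte 1 (counting from 0); RE is bit 48,
    i.e. the most significant bit of byte 6 (counting from 0)."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    ecn_field = ipv4_header[1] & 0b11
    re_flag = (ipv4_header[6] >> 7) & 1
    return ecn_field, re_flag

# Extended ECN codepoint names keyed by (ECN field, RE bit), per Table 2.
EECN_NAMES = {
    (0b00, 0): "Not-RECT", (0b00, 1): "FNE",
    (0b01, 0): "Re-Echo",  (0b01, 1): "RECT",
    (0b10, 0): "Legacy",   (0b10, 1): "--CU--",
    (0b11, 0): "CE(0)",    (0b11, 1): "CE(-1)",
}

header = bytearray(20)
header[0] = 0x45   # version 4, header length 5 words
header[1] = 0b01   # ECN field = ECT(1)
header[6] = 0x80   # RE flag set
print(EECN_NAMES[extended_ecn(bytes(header))])
```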

655	   It is believed that the FNE codepoint can simultaneously serve other
656	   purposes, particularly where the start of a flow needs distinguishing
657	   from packets later in the flow.  For instance it would have been
658	   useful to identify new flows for tag switching and might enable
659	   similar developments in the future if it were adopted.  It is similar
660	   to the state set-up bit idea designed to protect against memory
661	   exhaustion attacks.  This idea was proposed informally by David Clark
662	   and documented by Handley and Greenhalgh  [Steps_DoS].  The FNE
663	   codepoint can be thought of as a `soft-state set-up flag', because it
664	   is idempotent (i.e. one occurrence of the flag is sufficient but
665	   further occurrences achieve the same effect if previous ones were
666	   lost).

668	   There will probably be other claims pending on the use of bit 48.  We
669	   know of at least two [ARI05], [RFC3514], but neither has been pursued
670	   in the IETF so far, although the present proposal would meet the
671	   needs of the latter.

673	   The security flag proposal (commonly known as the evil bit) was
674	   published on 1 April 2003 as Informational RFC 3514, but it was not
675	   adopted due to confusion over whether evil-doers might set it
676	   inappropriately.  The present proposal is backward compatible with
677	   RFC3514 because if re-ECN compliant senders were benign they would
678	   correctly clear the evil bit to honestly declare that they had just
679	   received congestion feedback.  Whereas evil-doers would hide
680	   congestion feedback by setting the evil bit continuously, or at least
681	   more often than they should.  So, evil senders can be identified,
682	   because they declare that they are good less often than they should.

684	5.2.  Re-ECN IPv6 Wire Protocol

686	   For IPv6, this document proposes that the new RE control flag will be
687	   positioned as the first bit of the option field of a new Congestion
688	   hop by hop option header (Figure 4).

690	        0                   1                   2                   3
691	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
692	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
693	       |  Next Header  |  Hdr ext Len  |  Option Type  | Opt Length =4 |
694	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
695	       |R|                     Reserved for future use                 |
696	       |E|                                                             |
697	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

699	      Figure 4: Definition of a New IPv6 Congestion Hop by Hop Option
700	         Header containing the re-ECN Extension (RE) Control Flag

702	                0 1 2 3 4 5 6 7
703	               +-+-+-+-+-+-+-+-+
704	               |AIU|C|Option ID|
705	               +-+-+-+-+-+-+-+-+

707	           Figure 5: Congestion Hop by Hop Option Type Encoding

709	   The Hop-by-Hop Options header enables packets to carry information to
710	   be examined and processed by routers or nodes along the packet's
711	   delivery path, including the source and destination nodes.  For re-
712	   ECN, the two bits of the Action If Unrecognized (AIU) flag of the
713	   Congestion extension header MUST be set to "00" meaning if
714	   unrecognized `skip over option and continue processing the header'.
715	   Then, any routers or a receiver not upgraded with the optional re-ECN
716	   features described in this memo will simply ignore this header.  But
717	   routers with these optional re-ECN features or a re-ECN policing
718	   function, will process this Congestion extension header.

720	   The `C' flag MUST be set to "1" to specify that the Option Data
721	   (currently only the RE control flag) can change en-route to the
722	   packet's final destination.  This ensures that, when an
723	   Authentication header (AH [RFC4302]) is present in the packet, for
724	   any option whose data may change en-route, its entire Option Data
725	   field will be treated as zero-valued octets when computing or
726	   verifying the packet's authenticating value.

728	   Although the RE control flag should not be changed along the path, we
729	   expect that the rest of this option field that is currently `Reserved
730	   for future use' could be used for a multi-bit congestion notification
731	   field which we would expect to change en route.  Therefore, as
732	   changes to the RE flag could be detected end-to-end without
733	   authentication (see Section 9), we set the C flag to '1'.
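
   The encoding in Figures 4 and 5 might be built as sketched below.
   This is illustrative only.  No Option ID is assigned by this
   document, so the value used is a placeholder, and the function
   names are hypothetical.  The Hdr Ext Len value follows the usual
   IPv6 convention of counting 8-octet units beyond the first 8
   octets.

```python
def congestion_option_type(option_id, aiu=0b00, c=1):
    """Pack the Option Type octet of Figure 5: AIU (2 bits), C (1 bit),
    Option ID (5 bits), most significant bit first."""
    if not 0 <= option_id <= 0b11111:
        raise ValueError("Option ID must fit in 5 bits")
    return (aiu << 6) | (c << 5) | option_id

def congestion_hbh_header(next_header, option_id, re_flag):
    """Build the 8-octet header of Figure 4, with RE as the first bit
    of the 4-octet option data and the reserved bits left as zero."""
    option_data = bytes([(re_flag & 1) << 7, 0, 0, 0])
    hdr_ext_len = 0  # 8-octet units, not counting the first 8 octets
    return bytes([next_header, hdr_ext_len,
                  congestion_option_type(option_id),
                  len(option_data)]) + option_data

PLACEHOLDER_OPTION_ID = 0b11110  # hypothetical; no value is assigned
header = congestion_hbh_header(next_header=6,  # 6 = TCP
                               option_id=PLACEHOLDER_OPTION_ID,
                               re_flag=1)
print(header.hex())
```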

735	5.3.  Router Forwarding Behaviour

737	   {ToDo: Consider a section on how whole protocol interworks with drop.
738	   Perhaps in Protocol Overview.}

740	   Re-ECN works well without modifying the forwarding behaviour of any
741	   routers.  However, below, two OPTIONAL changes to forwarding
742	   behaviour are defined which respectively enhance performance and
743	   improve a router's discrimination against flooding attacks.  They are
744	   both OPTIONAL additions that we propose MAY apply by default to all
745	   Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN
746	   marking behaviours [RFC3168].  Specifications for PHBs MAY define
747	   different forwarding behaviours from this default, but this is not
748	   required.  [I-D.re-pcn-border-cheat] is one example.

750	   FNE indicates ECT:

752	      The FNE codepoint tells a router to assume that the packet was
753	      sent by an ECN-capable transport (see Section 5.4).  Therefore an
754	      FNE packet MAY be marked rather than dropped.  Note that the FNE
755	      codepoint has been intentionally chosen so that, to RFC3168
756	      compliant routers (which do not inspect the RE flag) an FNE packet
757	      appears to be Not-ECT so it will be dropped by legacy AQM
758	      algorithms.

760	      A network operator MUST NOT configure a queue to ECN mark rather
761	      than drop FNE packets unless it can guarantee that FNE packets
762	      will be rate limited, either locally or upstream.  The ingress
763	      policers discussed in [I-D.re-ecn-motiv] would count as rate
764	      limiters for this purpose.

766	   Preferential Drop:  If a re-ECN capable router queue experiences very
767	      high load so that it has to drop arriving packets (e.g. a DoS
768	      attack), it MAY preferentially drop packets within the same
769	      Diffserv PHB using the preference order for extended ECN
770	      codepoints given in Table 3.  Preferential dropping can be
771	      difficult to implement on some hardware, but if feasible it would
772	      discriminate against attack traffic if done as part of the overall
773	      policing framework of [I-D.re-ecn-motiv].  If nowhere else,
774	      routers at the egress of a network SHOULD implement preferential
775	      drop (stronger than the MAY above).  For simplicity, preferences 4
776	      & 5 MAY be merged into one preference level.

778	      The tabulated drop preferences are arranged to preserve packets
779	      with more positive worth (Section 4.4), given senders of positive
780	      packets must have honestly declared downstream congestion.  A full
781	      treatment of this is provided in the companion document describing
782	      the motivation and architecture for re-ECN [I-D.re-ecn-motiv]
783	      particularly when the application of re-ECN to protect against
784	      DDoS attacks is described.

786	   +-------+-----+------------+-------+------------+-------------------+
787	   |  ECN  |  RE | Extended   | Worth | Drop Pref  |   Re-ECN meaning  |
788	   | field | bit | ECN        |       | (1 = drop  |                   |
789	   |       |     | codepoint  |       | 1st)       |                   |
790	   +-------+-----+------------+-------+------------+-------------------+
791	   |   01  |  0  | Re-Echo    | +1    | 5/4        |     Re-echoed     |
792	   |       |     |            |       |            |   congestion and  |
793	   |       |     |            |       |            |        RECT       |
794	   |   00  |  1  | FNE        | +1    | 4          |    Feedback not   |
795	   |       |     |            |       |            |    established    |
796	   |   11  |  0  | CE(0)      | 0     | 3          |  Re-Echo canceled |
797	   |       |     |            |       |            |   by congestion   |
798	   |       |     |            |       |            |    experienced    |
799	   |   01  |  1  | RECT       | 0     | 3          |   Re-ECN capable  |
800	   |       |     |            |       |            |     transport     |
801	   |   11  |  1  | CE(-1)     | -1    | 3          |     Congestion    |
802	   |       |     |            |       |            |    experienced    |
803	   |   10  |  1  | --CU--     | n/a   | 2          |  Currently Unused |
804	   |   10  |  0  | ---        | n/a   | 2          |  RFC3168 ECN use  |
805	   |       |     |            |       |            |        only       |
806	   |   00  |  0  | Not-RECT   | n/a   | 1          |        Not        |
807	   |       |     |            |       |            |   Re-ECN-capable  |
808	   |       |     |            |       |            |     transport     |
809	   +-------+-----+------------+-------+------------+-------------------+

811	      Table 3: Drop Preference of EECN Codepoints (Sorted by `Worth')
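
   As a sketch of how a queue might apply Table 3 when it has to drop,
   the following picks a victim among queued packets of one Diffserv
   PHB.  The dictionary is transcribed from Table 3 (treating the
   `5/4' entry as level 5).  The function name and the tie-break in
   favour of the earliest queued packet are assumptions of this
   example, not part of the specification.

```python
# Drop preference keyed by (ECN field, RE bit); 1 = drop first.
DROP_PREF = {
    (0b01, 0): 5,  # Re-Echo (5/4: MAY be merged with level 4)
    (0b00, 1): 4,  # FNE
    (0b11, 0): 3,  # CE(0)
    (0b01, 1): 3,  # RECT
    (0b11, 1): 3,  # CE(-1)
    (0b10, 1): 2,  # Currently unused
    (0b10, 0): 2,  # RFC3168 ECN use only
    (0b00, 0): 1,  # Not-RECT
}

def pick_victim(queue):
    """Return the index of the packet to drop first: the one with the
    lowest preference number, earliest in the queue on a tie."""
    return min(range(len(queue)), key=lambda i: DROP_PREF[queue[i]])

# A RECT packet, a Not-RECT packet and a Re-Echo packet are queued.
queue = [(0b01, 1), (0b00, 0), (0b01, 0)]
print(pick_victim(queue))
```

   Here the Not-RECT packet is selected, so packets with more positive
   worth are preserved for longest.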

813	5.4.  Justification for Setting the First SYN to FNE

815	   The initial SYN MUST be set to FNE by Re-ECT client A (Section 6.1.4)
816	   and Section 5.3 says a queue MAY optionally treat an FNE packet as
817	   ECN capable, so an initial SYN may be marked CE(-1) rather than
818	   dropped.  This seems dangerous, because the sender has not yet
819	   established whether the receiver is an RFC3168 one that does not
820	   understand congestion marking.  It also seems to allow malicious
821	   senders to take advantage of ECN marking to avoid so much drop when
822	   launching SYN flooding attacks.  Below we explain the features of the
823	   protocol design that remove both these dangers.

825	   ECN-capable initial SYN with a Not-ECT server:  If the TCP server B
826	      is re-ECN capable, provision is made for it to feedback a possible
827	      congestion marked SYN in the SYN ACK (Section 6.1.4).  But if the
828	      TCP client A finds out from the SYN ACK that the server was not
829	      ECN-capable, the TCP client MUST conservatively consider the first
830	      SYN as congestion marked before setting itself into Not-ECT mode.
831	      Section 6.1.4 mandates that such a TCP client MUST also set its
832	      initial window to 1 segment.  In this way we remove the need to
833	      cautiously avoid setting the first SYN to Not-RECT.  This will
834	      give worse performance while deployment is patchy, but better
835	      performance once deployment is widespread.

837	   SYN flooding attacks can't exploit ECN-capability:  Malicious hosts
838	      may think they can use the advantage that ECN-marking gives over
839	      drop in launching classic SYN-flood attacks.  But Section 5.3
840	      mandates that a router MUST only be configured to treat packets
841	      with the FNE codepoint as ECN-capable if FNE packets are rate
842	      limited somewhere.  Introduction of the FNE codepoint was a
843	      deliberate move to enable transport-neutral handling of flow-start
844	      and flow state set-up in the IP layer where it belongs.  It then
845	      becomes possible to protect against flooding attacks of all forms
846	      (not just SYN flooding) without transport-specific inspection for
847	      things like the SYN flag in TCP headers.  Then, for instance, SYN
848	      flooding attacks using IPsec ESP encryption can also be rate
849	      limited at the IP layer.

851	   It might seem pedantic going to all this trouble to enable ECN on the
852	   initial packet of a flow, but it is motivated by a much wider concern
853	   to ensure safe congestion control will still be possible even if the
854	   application mix evolves to the point where the majority of flows
855	   consist of a single window or even a single packet.  It also allows
856	   denial of service attacks to be more easily isolated and prevented.

858	   {ToDo: Give alternative where initial packet is Not-RECT and last ACK
859	   of three-way handshake is FNE.  Explain this will give better
860	   performance while deployment is patchy, but worse performance once
861	   deployment is high.}

863	5.5.  Control and Management

865	5.5.1.  Negative Balance Warning

867	   A new ICMP message type is being considered so that a dropper can
868	   warn the apparent sender of a flow that it has started to sanction
869	   the flow.  The message would have similar semantics to the `Time
870	   exceeded' ICMP message type.  To ensure the sender has to invest some
871	   work before the network will generate such a message, a dropper
872	   SHOULD only send such a message for flows that have demonstrated that
873	   they have started correctly by establishing a positive record, but
874	   have later gone negative.  The threshold is up to the implementation.
875	   The purpose of the message is to distinguish the cause of drops from
876	   other causes, such as congestion or transmission losses.  The dropper
877	   would send the message to the sender of the flow, not the receiver.
878	   If we did define this message type, it would be REQUIRED for all re-
879	   ECT senders to parse and understand it.  Note that a sender MUST only
880	   use this message to explain why losses are occurring.  A sender MUST
881	   NOT take this message to mean that losses have occurred that it was
882	   not aware of.  Otherwise, spoof messages could be sent by malicious
883	   sources to slow down a sender (cf. ICMP source quench).

885	   However, the need for this message type is not yet confirmed, as we
886	   are considering how to prevent it being used by malicious senders to
887	   scan for droppers and to test their threshold settings. {ToDo:
888	   Complete this section.}

890	5.5.2.  Rate Response Control

892	   As discussed in [I-D.re-ecn-motiv] the sender's access operator will
893	   be expected to use bulk per-user policing, but they might choose to
894	   introduce a per-flow policer.  In cases where operators do introduce
895	   per-flow policing, there may be a need for a sender to send a request
896	   to the ingress policer asking for permission to apply a non-default
897	   response to congestion (where TCP-friendly is assumed to be the
898	   default).  This would require the sender to know what message
899	   format(s) to use and to be able to discover how to address the
900	   policer.  The required control protocol(s) are outside the scope of
901	   this document, but will require definition elsewhere.

903	   The policer is likely to be local to the sender and inline, probably
904	   at the ingress interface to the internetwork.  So, discovery should
905	   not be hard.  A variety of control protocols already exist for some
906	   widely used rate-responses to congestion.  For instance DCCP
907	   congestion control identifiers (CCIDs [RFC4340]) fulfil this role and
908	   so does QoS signalling (e.g. an RSVP request for controlled load
909	   service is equivalent to a request for no rate response to
910	   congestion, but with admission control).

912	5.6.  IP in IP Tunnels

914	   Ideally, for re-ECN to work through IP in IP tunnels, the tunnel
915	   entry should copy both the RE flag and the ECN field from the inner
916	   to the outer IP header.  Then at the tunnel exit, any CE marking of
917	   the outer ECN field should overwrite the inner ECN field (unless the
918	   inner field is Not-ECT in which case an alarm should be raised).  The
919	   RE flag shouldn't change along a path, so the outer RE flag should be
920	   the same as the inner.  If it isn't, a management alarm should be
921	   raised.
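
   The ideal tunnel behaviour just described can be sketched as
   follows.  This is an illustration under assumptions: the function
   names and alarm strings are hypothetical, and forwarding the inner
   header unchanged when an alarm is raised is our choice, since the
   text above only says an alarm should be raised.

```python
NOT_ECT, CE = 0b00, 0b11

def tunnel_entry(inner_ecn, inner_re):
    """Tunnel entry: copy the ECN field and RE flag to the outer header."""
    return inner_ecn, inner_re

def tunnel_exit(inner_ecn, inner_re, outer_ecn, outer_re):
    """Tunnel exit: return (ecn, re, alarms) for the forwarded packet."""
    alarms = []
    ecn = inner_ecn
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            alarms.append("CE on outer header but inner header is Not-ECT")
        else:
            ecn = CE  # outer CE marking overwrites the inner ECN field
    if outer_re != inner_re:
        alarms.append("outer RE flag differs from inner RE flag")
    return ecn, inner_re, alarms

outer_ecn, outer_re = tunnel_entry(0b01, 1)  # a RECT packet enters
outer_ecn = CE                               # a queue in the tunnel marks it
result = tunnel_exit(0b01, 1, outer_ecn, outer_re)
print(result)
```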

923	   This requirement is satisfied by the latest specification for
924	   handling ECN through IP tunnels [RFC6040] as well as by IPsec
925	   [RFC4301].  However, it is not satisfied by the ingress behaviour
926	   specified in [RFC3168] although at least the full-functionality
927	   variant of the egress behaviour is fine.  RFC6040 updates RFC3168,
928	   but it is likely that many legacy non-IPsec IP-in-IP tunnels will
929	   exist.

931	   If legacy tunnels are left as specified in [RFC3168], whether the
932	   limited or full-functionality variant is used, a problem arises with
933	   re-ECN if a tunnel crosses an inter-domain boundary, because the
934	   difference between positive and negative markings will not be
935	   correctly accounted for.  In a limited functionality ECN tunnel, the
936	   flow will appear to be RFC3168 compliant traffic, and therefore may
937	   be wrongly rate limited.  In a full-functionality ECN tunnel, the
938	   result depends on whether the tunnel entry copies the inner RE flag
939	   to the outer header or the RE flag in the outer header is always
940	   cleared.  If the former, the flow will tend to be too positive when
941	   accounted for at borders.  If the latter, it will be too negative.
942	   If the rules set out in [RFC6040] are followed then this will not be
943	   an issue.

945	5.7.  Non-Issues

947	   The following issues might seem to cause unfavourable interactions
948	   with re-ECN, but we will explain why they don't:

950	   o  Various link layers support explicit congestion notification, such
951	      as Frame Relay and ATM.  Explicit congestion notification is
952	      proposed to be added to other link layers, such as Ethernet
953	      (802.3ar Ethernet congestion management) and MPLS [RFC5129];

955	   o  Encryption and IPsec.

957	   In the case of congestion notification at the link layer, each
958	   particular link layer scheme either manages congestion on the link
959	   with its own link-level feedback (the usual arrangement in the cases
960	   of ATM and Frame Relay), or congestion notification from the link
961	   layer is merged into congestion notification at the IP level when the
962	   frame headers are decapsulated at the end of the link (the
963	   recommended arrangement in the Ethernet and MPLS cases).  Given the
964	   RE flag is not intended to change along the path, this means that
965	   downstream congestion will still be measurable at any point where IP
966	   is processed on the path by subtracting negative from positive
967	   markings.

969	   In the case of encryption, as long as the tunnel issues described in
970	   Section 5.6 are dealt with, payload encryption itself will not be a
971	   problem.  The design goal of re-ECN is to include downstream
972	   congestion in the IP header so that it is not necessary to delve into
973	   inner headers.  Obfuscation of flow identifiers is not a problem for
974	   re-ECN policing elements.  Re-ECN doesn't ever require flow
975	   identifiers to be valid, it only requires them to be unique.  So if
976	   an IPsec encapsulating security payload (ESP [RFC4835]) or an
977	   authentication header (AH [RFC4302]) is used, the security parameters
978	   index (SPI) will be a sufficient flow identifier, as it is intended
979	   to be unique to a flow without revealing actual port numbers.

981	   In general, even if endpoints use some locally agreed scheme to hide
982	   port numbers, re-ECN policing elements can just consider the pair of
983	   source and destination IP addresses as the flow identifier.  Re-ECN
984	   encourages endpoints to at least tell the network layer that a
985	   sequence of packets are all part of the same flow, if indeed they
986	   are.  The alternative would be for the sender to make each packet
987	   appear to be a new flow, which would require them all to be marked
988	   FNE in order to avoid being treated with the bulk of malicious flows
989	   at the egress dropper.  Given the FNE marking is worth +1 and
990	   networks are likely to rate limit FNE packets, endpoints are given an
991	   incentive not to set FNE on each packet.  But if the sender really
992	   does want to hide the flow relationship between packets it can choose
993	   to pay the cost of multiple FNE packets, which in the long run will
994	   compensate for the extra memory required on network policing elements
995	   to process each flow.

997	   {ToDo: Add a note about it being useful that the AH header does not
998	   cover the RE flag, referring to Section 9.}

1000	6.  Transport Layers

1002	6.1.  TCP

1004	   Re-ECN capability at the sender is essential.  At the receiver it is
1005	   optional, as long as the receiver has a basic RFC3168-compliant ECN-
1006	   capable transport (ECT) [RFC3168].  Given re-ECN is not the first
1007	   attempt to define the semantics of the ECN field, we give a table
1008	   below summarising what happens for various combinations of
1009	   capabilities of the sender S and receiver R, as indicated in the
1010	   first four columns below.  The last column gives the mode a
1011	   half-connection should be in after the first two segments of the
1012	   TCP three-way handshake.

1014	   +--------+--------------+------------+---------+--------------------+
1015	   | Re-ECT |   ECT-Nonce  |     ECT    | Not-ECT |         S-R        |
1016	   |        |   (RFC3540)  |  (RFC3168) |         |   Half-connection  |
1017	   |        |              |            |         |        Mode        |
1018	   +--------+--------------+------------+---------+--------------------+
1019	   |   SR   |              |            |         |        RECN        |
1020	   |    S   |       R      |            |         |       RECN-Co      |
1021	   |    S   |              |      R     |         |       RECN-Co      |
1022	   |    S   |              |            |    R    |       Not-ECT      |
1023	   +--------+--------------+------------+---------+--------------------+

1025	       Table 4: Modes of TCP Half-connection for Combinations of ECN
1026	                  Capabilities of Sender S and Receiver R

1028	   We will describe what happens in each mode, then describe how they
1029	   are negotiated.  The abbreviations for the modes in the above table
1030	   mean:

1032	   RECN:  Full re-ECN capable transport

1034	   RECN-Co:  Re-ECN sender in compatibility mode with an RFC3168
1035	      compliant [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable
1036	      receiver.  Implementation of this mode is OPTIONAL.

1038	   Not-ECT:  Not ECN-capable transport, as defined in [RFC3168] for when
1039	      at least one of the transports does not understand even basic ECN
1040	      marking.

1042	   Note that we use the term Re-ECT for a host transport that is re-ECN-
1043	   capable but RECN for the modes of the half connections between hosts
1044	   when they are both Re-ECT.  If a host transport is Re-ECT, this fact
1045	   alone does NOT imply either of its half connections will necessarily
1046	   be in RECN mode, at least not until it has confirmed that the other
1047	   host is Re-ECT.
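
   Table 4 amounts to a simple lookup, sketched below for a sender that
   is Re-ECT.  The names are hypothetical, and the sketch ignores the
   fact that RECN-Co is OPTIONAL to implement.

```python
# S-R half-connection mode for a Re-ECT sender, keyed by the
# receiver's capability (rows of Table 4).
MODE_BY_RECEIVER = {
    "Re-ECT": "RECN",
    "ECT-Nonce": "RECN-Co",
    "ECT": "RECN-Co",
    "Not-ECT": "Not-ECT",
}

def half_connection_mode(receiver_capability):
    """Return the mode the S-R half-connection should be in."""
    return MODE_BY_RECEIVER[receiver_capability]

print(half_connection_mode("ECT"))
```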

1049	6.1.1.  RECN mode: Full Re-ECN capable transport

1051	   In full RECN mode, for each half connection, the sender and the
1052	   receiver each maintain an unsigned integer counter we will call ECC
1053	   (echo congestion counter).  The receiver maintains a count of how
1054	   many times a CE marked packet has arrived during the half-connection.
1055	   Once a RECN connection is established, the three TCP option flags
1056	   (ECE, CWR & NS) used for ECN-related functions in other versions of
1057	   ECN are used as a 3-bit field for the receiver to repeatedly tell the
1058	   sender the current value of ECC, modulo 8, whenever it sends a TCP
1059	   ACK.  We will call this the echo congestion increment (ECI) field.
1060	   This overloaded use of these 3 option flags as one 3-bit ECI field is
1061	   shown in Figure 7.  The actual definition of the TCP header,
1062	   including the addition of support for the ECN nonce, is shown for
1063	   comparison in Figure 6.  This specification does not redefine the
1064	   names of these three TCP option flags, it merely overloads them with
1065	   another definition once a flow is established.

1067	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1068	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1069	      |               |           | N | C | E | U | A | P | R | S | F |
1070	      | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
1071	      |               |           |   | R | E | G | K | H | T | N | N |
1072	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

1074	    Figure 6: The (post-ECN Nonce) definition of bytes 13 and 14 of the
1075	                                TCP Header

1077	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1078	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1079	      |               |           |           | U | A | P | R | S | F |
1080	      | Header Length | Reserved  |    ECI    | R | C | S | S | Y | I |
1081	      |               |           |           | G | K | H | T | N | N |
1082	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

1084	    Figure 7: Definition of the ECI field within bytes 13 and 14 of the
1085	   TCP Header, overloading the current definitions above for established
1086	                                RECN flows.

1088	   Receiver Action in RECN Mode

1090	      Every time a CE marked packet arrives at a receiver in RECN mode,
1091	      the receiver transport increments its local value of ECC and MUST
1092	      echo its value, modulo 8, to the sender in the ECI field of the
1093	      next ACK.  It MUST repeat the same value of ECI in every
1094	      subsequent ACK until the next CE event, when it increments ECI
1095	      again.

1097	      The local ECC value is incremented modulo 8, so the field value
1098	      simply wraps round to zero when it overflows.  The least
1099	      significant bit is to the right (labelled bit 9).

1101	      A receiver in RECN mode MAY delay the echo of a CE to the next
1102	      delayed-ACK, which would be necessary if ACK-withholding were
1103	      implemented.
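
	      The receiver behaviour above can be sketched as follows.  This
	      is an illustration only, not part of the specification: the
	      names ReEcnReceiver, pack_eci and unpack_eci are hypothetical,
	      and the bit layout simply follows Figure 7 (ECI in bits 7 to 9
	      of the 16-bit word formed by bytes 13 and 14, bit 9 being the
	      least significant).

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

ECI_SHIFT = 6                 # bit 9 (LSB of ECI) is 6 places above bit 15
ECI_MASK = 0x7 << ECI_SHIFT   # covers bits 7, 8 and 9 of bytes 13 and 14


class ReEcnReceiver:
    """Receiver side of a half-connection in RECN mode."""

    def __init__(self):
        self.ecc = 0  # ECC starts at zero when RECN mode is entered

    def on_packet(self, ce_marked):
        """Process one arriving packet; ECC wraps modulo 8 on CE marks."""
        if ce_marked:
            self.ecc = (self.ecc + 1) % 8

    def eci(self):
        """Value echoed in the ECI field of every ACK until the next CE."""
        return self.ecc


def pack_eci(word, eci):
    """Write a 3-bit ECI value into the 16-bit word of bytes 13 and 14."""
    return (word & ~ECI_MASK & 0xFFFF) | ((eci & 0x7) << ECI_SHIFT)


def unpack_eci(word):
    """Read the 3-bit ECI value back out of the same 16-bit word."""
    return (word & ECI_MASK) >> ECI_SHIFT
```

	      For example, after nine CE marked packets the counter has
	      wrapped once, so the receiver echoes ECI = 1.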

1105	   Sender Action in RECN Mode

1107	      On the arrival of every ACK, the sender compares the ECI field
1108	      with its own ECC value, then replaces its local value with that
1109	      from the ACK.  The difference D (D = (ECI + 8 - ECC mod 8) mod 8)
1110	      is assumed to be the number of CE marked packets that arrived at
1111	      the receiver since it sent the previously received ACK (but see
1112	      below for the sender's safety strategy).  Whenever the ECI field
1113	      increments by D (and/or d drops are detected), the sender MUST
1114	      clear the RE flag to "0" in the IP header of the next D' data
1115	      packets it sends (where D' = D + d), effectively re-echoing each
1116	      single increment of ECI.  Otherwise the data sender MUST send all
1117	      data packets with RE set to "1".
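
	      A minimal sketch of this sender rule is given below.  The class
	      ReEcnSender and the field to_blank are hypothetical names used
	      only for illustration, and the sender's safety strategy
	      mentioned above is deliberately not modelled.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

def eci_delta(eci, ecc):
    """D = (ECI + 8 - ECC mod 8) mod 8, the apparent number of new CE marks."""
    return (eci + 8 - ecc % 8) % 8


class ReEcnSender:
    """Sender side of a half-connection in RECN mode (no safety strategy)."""

    def __init__(self):
        self.ecc = 0       # sender's copy of the echo congestion counter
        self.to_blank = 0  # data packets still to be sent with RE = 0

    def on_ack(self, eci, drops=0):
        """Compare ECI with the local ECC, then adopt the value from the ACK."""
        d = eci_delta(eci, self.ecc)
        self.ecc = eci
        self.to_blank += d + drops  # D' = D + d
        return d

    def next_re_flag(self):
        """RE flag for the next data packet: 0 re-echoes one mark or drop."""
        if self.to_blank > 0:
            self.to_blank -= 1
            return 0
        return 1
```

	      Because the arithmetic is modulo 8, an ECI of 1 arriving while
	      the local ECC is 7 yields D = 2 rather than a negative value.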

1119	      As a general rule, once a flow is established, as well as setting
1120	      or clearing the RE flag as above, a data sender in RECN mode MUST
1121	      always set the ECN field to ECT(1).  However, the settings of the
1122	      extended ECN field during flow start are defined in Section 6.1.4.

1124	      As we have already emphasised, the re-ECN protocol makes no
1125	      changes and has no effect on the TCP congestion control algorithm.
1126	      So, the first increment of ECI (or detection of a drop) in an RTT
1127	      triggers the standard TCP congestion response, no more than one
1128	      congestion response per round trip, as usual.  However, the sender
1129	      re-echoes every increment of ECI irrespective of RTTs.

1131	      A TCP sender also acts as the receiver for the other half-
1132	      connection.  The host will maintain two ECC values S.ECC and R.ECC
1133	      as sender and receiver respectively.  Every TCP header sent by a
1134	      host in RECN mode will also repeat the prevailing value of R.ECC
1135	      in its ECI field.  If a sender in RECN mode has to retransmit a
1136	      packet due to a suspected loss, the re-transmitted packet MUST
1137	      carry the latest prevailing value of R.ECC when it is re-
1138	      transmitted, which will not necessarily be the one it carried
1139	      originally.

1141	6.1.2.  RECN-Co mode: Re-ECT Sender with an RFC3168 compliant ECN
1142	        Receiver

1144	   If the half-connection is in RECN-Co mode, ECN feedback proceeds no
1145	   differently to that of RFC3168 compliant ECN.  In other words, the
1146	   receiver sets the ECE flag repeatedly in the TCP header and the
1147	   sender responds by setting the CWR flag.  Although RECN-Co mode is
1148	   used when the receiver has not implemented the re-ECN protocol, the
1149	   sender can infer enough from its RFC3168 compliant ECN feedback to
1150	   set or clear the RE flag reasonably well.  Specifically, every time
1151	   the receiver toggles the ECE field from "0" to "1" (or a loss is
1152	   detected), as well as setting CWR in the TCP flags, the re-ECN sender
1153	   MUST blank the RE flag of the next packet to "0" as it would do in
1154	   full RECN mode.  Otherwise, the data sender SHOULD send all other
1155	   packets with RE set to "1".  Once a flow is established, a re-ECN
1156	   data sender in RECN-Co mode MUST always set the ECN field to ECT(1).
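
	   The inference described above can be sketched as follows.  The
	   class RecnCoSender is a hypothetical name, and treating a 0-to-1
	   ECE toggle and a loss reported with the same ACK as a single event
	   is an assumption of this sketch, not a requirement of this
	   specification.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

class RecnCoSender:
    """Re-ECN sender facing an RFC3168 compliant ECN receiver."""

    def __init__(self):
        self.last_ece = 0  # ECE value seen in the previous ACK
        self.to_blank = 0  # data packets still to be sent with RE = 0

    def on_ack(self, ece, loss_detected=False):
        """Return True when a congestion event is inferred from this ACK.

        The caller would also set CWR in the TCP flags on a True result.
        """
        rising = self.last_ece == 0 and ece == 1
        self.last_ece = ece
        if rising or loss_detected:
            self.to_blank += 1  # assumption: one event per ACK at most
            return True
        return False

    def next_re_flag(self):
        """RE flag for the next data packet (0 means blanked)."""
        if self.to_blank > 0:
            self.to_blank -= 1
            return 0
        return 1
```

	   Repeated ECE=1 ACKs after the first do not trigger further
	   blanking, which is why marks arriving within the same round trip
	   are missed, as discussed next.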

1158	   If a CE marked packet arrives at the receiver within a round trip
1159	   time of a previous mark, the receiver will still be echoing ECE for
1160	   the last CE mark.  Therefore, such a mark will be missed by the
1161	   sender.  Of course, this isn't of concern for congestion control, but
1162	   it does mean that very occasionally the RE blanking fraction will be
1163	   understated.  Therefore flows in RECN-Co mode may occasionally be
1164	   mistaken for very lightly cheating flows and consequently might
1165	   suffer a small number of packet drops through an egress dropper.  We
1166	   expect re-ECN would be deployed for some time before policers and
1167	   droppers start to enforce it.  So, given there is not much ECN
1168	   deployment yet anyway, this minor problem may affect only a very
1169	   small proportion of flows, reducing to nothing over the years as
1170	   RFC3168 compliant ECN hosts upgrade.  The use of RECN-Co mode would
1171	   need to be reviewed in the light of experience at the time of re-ECN
1172	   deployment.

1174	   RECN-Co mode is OPTIONAL.  Re-ECN implementers who want to keep their
1175	   code simple MAY choose not to implement this mode.  If they do not,
1176	   a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the
1177	   presence of an ECN-capable receiver.  It MAY choose to fall back to
1178	   the ECT-Nonce mode, but if re-ECN implementers don't want to be
1179	   bothered with RECN-Co mode, they probably won't want to add an ECT-
1180	   Nonce mode either.

1182	6.1.2.1.  Re-ECN support for the ECN Nonce

1184	   A TCP half-connection in RECN-Co mode MUST NOT support the ECN
1185	   Nonce [RFC3540].  This means that the sending code of a re-ECN
1186	   implementation will never need to include ECN Nonce support.  Re-ECN
1187	   is intended to provide wider protection than the ECN nonce against
1188	   congestion control misbehaviour, and re-ECN only requires support
1189	   from the sender, therefore it is preferable to specifically rule out
1190	   the need for dual sender implementations.  As a consequence, a re-ECN
1191	   capable sender will never set ECT(0), so it will be easier for
1192	   network elements to discriminate re-ECN traffic flows from other ECN
1193	   traffic, which will always contain some ECT(0) packets.

1195	   However, a re-ECN implementation MAY include receiving code that
1196	   complies with the ECN Nonce protocol when interacting with a sender
1197	   that supports the ECN nonce (rather than re-ECN), but this support
1198	   is not required.

1200	   RFC3540 allows an ECN nonce sender to choose whether to sanction a
1201	   receiver that does not ever set the nonce sum.  Given re-ECN is
1202	   intended to provide wider protection than the ECN nonce against
1203	   congestion control misbehaviour, implementers of re-ECN receivers MAY
1204	   choose not to implement backwards compatibility with the ECN nonce
1205	   capability.  This may be because they deem that the risk of sanctions
1206	   is low, perhaps because significant deployment of the ECN nonce seems
1207	   unlikely at implementation time.

1209	6.1.3.  Capability Negotiation

1211	   During the TCP hand-shake at the start of a connection, an originator
1212	   of the connection (host A) with a re-ECN-capable transport MUST
1213	   indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1
1214	   in the initial SYN.

1216	   A responding Re-ECT host (host B) MUST return a SYN ACK with flags
1217	   CWR=1 and ECE=0.  The responding host MUST NOT set this combination
1218	   of flags unless the preceding SYN has already indicated Re-ECT
1219	   support as above.  Normally a Re-ECT server (B) will reply to a Re-
1220	   ECT client with NS=0, but if the initial SYN from Re-ECT client A is
1221	   marked CE(-1), a Re-ECT server B MUST increment its local value of
1222	   ECC.  But B cannot reflect the value of ECC in the SYN ACK, because
1223	   it is still using the 3 bits to negotiate connection capabilities.
1224	   So, server B MUST set the alternative TCP header flags in its SYN
1225	   ACK: NS=1, CWR=1 and ECE=0.

1227	   These handshakes are summarised in Table 5 below, with X indicating
1228	   NS can be either 1 or 0 depending respectively on whether congestion
1229	   had been experienced or not.  The handshakes used for the other
1230	   flavours of ECN are also shown for comparison.  To compress the width
1231	   of the table, the headings of the first four columns have been
1232	   severely abbreviated, as follows:

1234	      R: *R*e-ECT

1236	      N: ECT-*N*once (RFC3540)

1238	      E: *E*CT (RFC3168)

1240	      I: Not-ECT (*I*mplicit congestion notification).

1242	   These correspond with the same headings used in Table 4.  Indeed, the
1243	   resulting modes in the last two columns of the table below are a more
1244	   comprehensive way of saying the same thing as Table 4.

1246	   +----+---+---+---+------------+-------------+-----------+-----------+
1247	   | R  | N | E | I |   SYN A-B  | SYN ACK B-A |  A-B Mode |  B-A Mode |
1248	   +----+---+---+---+------------+-------------+-----------+-----------+
1249	   |    |   |   |   | NS CWR ECE |  NS CWR ECE |           |           |
1250	   | AB |   |   |   |  1   1   1 |  X   1   0  |    RECN   |    RECN   |
1251	   | A  | B |   |   |  1   1   1 |  1   0   1  |  RECN-Co  | ECT-Nonce |
1252	   | A  |   | B |   |  1   1   1 |  0   0   1  |  RECN-Co  |    ECT    |
1253	   | A  |   |   | B |  1   1   1 |  0   0   0  |  Not-ECT  |  Not-ECT  |
1254	   | B  | A |   |   |  0   1   1 |  0   0   1  | ECT-Nonce |  RECN-Co  |
1255	   | B  |   | A |   |  0   1   1 |  0   0   1  |    ECT    |  RECN-Co  |
1256	   | B  |   |   | A |  0   0   0 |  0   0   0  |  Not-ECT  |  Not-ECT  |
1257	   +----+---+---+---+------------+-------------+-----------+-----------+

1259	      Table 5: TCP Capability Negotiation between Originator (A) and
1260	                               Responder (B)

1262	   As soon as a re-ECN capable TCP server receives a SYN, it MUST set
1263	   its two half-connections into the modes given in Table 5.  As soon as
1264	   a re-ECN capable TCP client receives a SYN ACK, it MUST set its two
1265	   half-connections into the modes given in Table 5.  The half-
1266	   connections will remain in these modes for the rest of the
1267	   connection, including for the third segment of TCP's three-way hand-
1268	   shake (the ACK).
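
	   The originator's side of Table 5 can be expressed as a small
	   lookup, sketched below.  The function names are hypothetical.
	   Falling back to Not-ECT for flag combinations that Table 5 does
	   not list is an assumption of this sketch; the reflected-SYN case
	   follows the RFC3168 rule recalled later in this section.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

def client_modes(ns, cwr, ece):
    """Modes (A-B, B-A) for a Re-ECT originator that sent NS=CWR=ECE=1.

    The arguments are the flags found in the SYN ACK from responder B.
    """
    flags = (ns, cwr, ece)
    if flags == (1, 1, 1):
        # SYN flags reflected by a broken implementation: whole
        # connection reverts to Not-ECT.
        return ("Not-ECT", "Not-ECT")
    if (cwr, ece) == (1, 0):
        # NS is "X": it only reports whether the SYN was CE marked.
        return ("RECN", "RECN")
    if flags == (1, 0, 1):
        return ("RECN-Co", "ECT-Nonce")
    if flags == (0, 0, 1):
        return ("RECN-Co", "ECT")
    # (0, 0, 0) per Table 5; other combinations by assumption.
    return ("Not-ECT", "Not-ECT")


def server_synack_flags(syn_was_ce_marked):
    """(NS, CWR, ECE) a Re-ECT responder returns to a Re-ECT SYN."""
    return (1 if syn_was_ce_marked else 0, 1, 0)
```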

1270	   {ToDo: Consider delaying mode changes if using SYN cookies (will also
1271	   affect next section).}

1273	   {ToDo: consider RSTs within a connection.}

1275	   Recall that, if the SYN ACK reflects the same flag settings as the
1276	   preceding SYN (because there is a broken RFC3168 compliant
1277	   implementation that behaves this way), RFC3168 specifies that the
1278	   whole connection MUST revert to Not-ECT.

1280	   Also note that, whenever the SYN flag of a TCP segment is set
1281	   (including when the ACK flag is also set), the NS, CWR and ECE flags
1282	   (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the
1283	   3-bit ECI value, which is only set as a copy of the local ECC value
1284	   in non-SYN packets.

1286	6.1.4.  Extended ECN (EECN) Field Settings during Flow Start or after
1287	        Idle Periods

1289	   If the originator (A) of a TCP connection supports re-ECN it MUST set
1290	   the extended ECN (EECN) field in the IP header of the initial SYN
1291	   packet to the feedback not established (FNE) codepoint.

1293	   FNE is a new extended ECN codepoint defined by this specification
1294	   (Section 4.2).  The feedback not established (FNE) codepoint is used
1295	   when the transport does not have the benefit of ECN feedback so it
1296	   cannot decide whether to set or clear the RE flag.

1298	   If after receiving a SYN the server B has set its sending half-
1299	   connection into RECN mode or RECN-Co mode, it MUST set the extended
1300	   ECN field in the IP header of its SYN ACK to the feedback not
1301	   established (FNE) codepoint.  Note the careful wording here, which
1302	   means that Re-ECT server B MUST set FNE on a SYN ACK whether it is
1303	   responding to a SYN from a Re-ECT client or from a client that is
1304	   merely ECN-capable.  This is because FNE indicates the transport is
1305	   ECN capable as well as re-ECN capable.

1307	   The original ECN specification [RFC3168] required SYNs and SYN ACKs
1308	   to use the Not-ECT codepoint of the ECN field.  The aim was to
1309	   prevent well-known DoS attacks such as SYN flooding being able to
1310	   gain from the advantage that ECN capability afforded over drop at
1311	   ECN-capable routers.

1313	   For a SYN ACK, Kuzmanovic [RFC5562] has shown that this caution was
1314	   unnecessary, and allows a SYN ACK to be ECN-capable to improve
1315	   performance.  By stipulating the FNE codepoint for the initial SYN,
1316	   we comply with RFC3168 in word but not in spirit, because we have
1317	   indeed set the ECN field to Not-ECT, but we have extended the ECN
1318	   field with another bit.  And it will be seen (Section 5.3) that we
1319	   have defined one setting of that bit to mean an ECN-capable
1320	   transport.  Therefore, by proposing that the FNE codepoint MUST be
1321	   used on the initial SYN of a connection, we have gone further by
1322	   proposing to make the initial SYN ECN-capable too.  Section 5.4
1323	   justifies deciding to make the initial SYN ECN-capable.

1325	   Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will
1326	   have already been set on the initial SYN and possibly the SYN ACK as
1327	   above.  But each re-ECN sender will have to set FNE cautiously on a
1328	   few data packets as well, given a number of packets will usually have
1329	   to be sent before sufficient congestion feedback is received.  The
1330	   behaviour will be different depending on the mode of the half-
1331	   connection:

1333	   RECN mode:  Given the constraints on TCP's initial window [RFC3390]
1334	      and its exponential window increase during slow start
1335	      phase [RFC5681], it turns out that the sender SHOULD set FNE on
1336	      the first and third data packets in its flow after the initial
1337	      3-way handshake, assuming equal sized data packets once a flow is
1338	      established.  Appendix D presents the calculation that led to this
1339	      conclusion.  Below, after running through the start of an example
1340	      TCP session, we give the intuition learned from that calculation.
1341	      {ToDo: unfortunately the calculation was based on erroneous
1342	      assumptions; see [I-D.conex-tcp-mods] for a better approach.}

1344	   RECN-Co mode:  A re-ECT sender that switches into re-ECN
1345	      compatibility mode or into Not-ECT mode (because it has detected
1346	      the corresponding host is not re-ECN capable) MUST limit its
1347	      initial window to 1 segment.  The reasoning behind this constraint
1348	      is given in Section 5.4.  Having set this initial window, a re-ECN
1349	      sender in RECN-Co mode SHOULD set FNE on the first and third data
1350	      packets in a flow, as for RECN mode.

1352	   +----+------+----------------+-------+-------+---------------+------+
1353	   |    | Data | TCP A(Re-ECT)  | IP A  | IP B  | TCP B(Re-ECT) | Data |
1354	   +----+------+----------------+-------+-------+---------------+------+
1355	   |    | Byte |  SEQ  ACK CTL  | EECN  | EECN  |  SEQ  ACK CTL | Byte |
1356	   | -- | ---- | -------------  | ----- | ----- | ------------- | ---- |
1357	   |  1 |      | 0100      SYN  | FNE   | -->   |      R.ECC=0  |      |
1358	   |    |      |    CWR,ECE,NS  |       |       |               |      |
1359	   |  2 |      |      R.ECC=0   | <--   | FNE   | 0300 0101     |      |
1360	   |    |      |                |       |       |   SYN,ACK,CWR |      |
1361	   |  3 |      | 0101 0301 ACK  | RECT  | -->   |      R.ECC=0  |      |
1362	   |  4 | 1000 | 0101 0301 ACK  | FNE   | -->   |      R.ECC=0  |      |
1363	   |  5 |      |      R.ECC=0   | <--   | FNE   | 0301 1101 ACK | 1460 |
1364	   |  6 |      |      R.ECC=0   | <--   | RECT  | 1761 1101 ACK | 1460 |
1365	   |  7 |      |      R.ECC=0   | <--   | FNE   | 3221 1101 ACK | 1460 |
1366	   |  8 |      | 1101 1761 ACK  | RECT  | -->   |      R.ECC=0  |      |
1367	   |  9 |      |      R.ECC=0   | <--   | RECT  | 4681 1101 ACK | 1460 |
1368	   | 10 |      |      R.ECC=0   | <--   | RECT  | 6141 1101 ACK | 1460 |
1369	   | 11 |      | 1101 3221 ACK  | RECT  | -->   |      R.ECC=0  |      |
1370	   | 12 |      |      R.ECC=0   | <--   | RECT  | 7601 1101 ACK | 1460 |
1371	   | 13 |      |      R.ECC=1   | <*-   | RECT  | 9061 1101 ACK | 1460 |
1372	   |    |      | ...            |       |       |               |      |
1373	   +----+------+----------------+-------+-------+---------------+------+

1375	                      Table 6: TCP Session Example #1

1377	   Table 6 shows an example TCP session, where the server B sets FNE on
1378	   its first and third data packets (lines 5 & 7) as well as on the
1379	   initial SYN ACK as previously described.  The left hand half of the
1380	   table shows the relevant settings of headers sent by client A in
1381	   three layers: the TCP payload size; TCP settings; then IP settings.
1382	   The right hand half gives equivalent columns for server B.  The only
1383	   TCP settings shown are the sequence number (SEQ), acknowledgement
1384	   number (ACK) and the relevant control (CTL) flags that the sending
1385	   host sets in the TCP header.  The IP columns show the setting of the
1386	   extended ECN (EECN) field.

1388	   Also shown on the receiving side of the table is the value of the
1389	   receiver's echo congestion counter (R.ECC) after processing the
1390	   incoming EECN header.  Note that, once a host sets a half-connection
1391	   into RECN mode, it MUST initialise its local value of ECC to zero.

1393	   The intuition that Appendix D gives for why a sender should set FNE
1394	   on the first and third data packets is as follows.  At line 13, a
1395	   packet sent by B is shown with an '*', which means it has been
1396	   congestion marked by an intermediate queue from RECT to CE(-1).  On
1397	   receiving this CE marked packet, client A increments its ECC counter
1398	   to 1 as shown.  This was the 7th data packet B sent, but before
1399	   feedback about this event returns to B, it might well have sent many
1400	   more packets.  Indeed, during exponential slow start, about as many
1401	   packets will be in flight (unacknowledged) as have been acknowledged.
1402	   So, when the feedback from the congestion event on B's 7th segment
1403	   returns, B will have sent about 7 further packets that will still be
1404	   in flight.  At that stage, B's best estimate of the network's packet
1405	   marking fraction will be 1/7.  So, as B will have sent about 14
1406	   packets, it should have already marked 2 of them as FNE in order to
1407	   have marked 1/7; hence the need to have set the first and third data
1408	   packets to FNE.
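
	   The arithmetic behind this intuition can be checked with the short
	   sketch below.  The function fne_needed is a hypothetical name, and
	   the model (as many packets in flight as acknowledged, and a
	   marking fraction estimated as one mark in n packets) is only the
	   simplification stated above, which the ToDo notes in this section
	   flag as too optimistic.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.
from fractions import Fraction


def fne_needed(first_marked_packet):
    """FNE packets needed by the time the first mark is fed back.

    first_marked_packet is n, the index of the first CE marked data
    packet.  During slow start about n further packets are in flight,
    so about 2n have been sent when the feedback returns.
    """
    n = first_marked_packet
    sent = 2 * n                     # acknowledged plus in flight
    marking_estimate = Fraction(1, n)
    return marking_estimate * sent


print(fne_needed(7))
```

	   Under these assumptions the result is 2 whichever packet is marked
	   first, since (1/n) * 2n = 2.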

1410	   Client A's behaviour in Table 6 also shows FNE being set on the first
1411	   SYN and the first data packet (lines 1 & 4), but in this case it
1412	   sends no more data packets, so of course, it cannot, and does not
1413	   need to, set FNE again.  Note that in the A-B direction there is no
1414	   need to set FNE on the third part of the three-way hand-shake (line
1415	   3---the ACK).

1417	   Note that in this section we have used the word SHOULD rather than
1418	   MUST when specifying how to set FNE on data segments before positive
1419	   congestion feedback arrives (but note that the word MUST was used for
1420	   FNE on the SYN and SYN ACK).  FNE is only RECOMMENDED for the first
1421	   and third data segments to entertain the possibility that the TCP
1422	   transport has the benefit of other knowledge of the path, which it
1423	   re-uses from one flow for the benefit of a newly starting flow.  For
1424	   instance, one flow can re-use knowledge of other flows between the
1425	   same hosts if using a Congestion Manager [RFC3124] or when a proxy
1426	   host aggregates congestion information for large numbers of flows.

1428	   {ToDo: There is probably scope for re-writing the above in a
1429	   different way so that it says MUST unless some other knowledge of the
1430	   path is available.  See earlier note pointing out FNE on 1st & 3rd is
1431	   too few.}

1433	   After an idle period of more than 1 second, a re-ECN sender transport
1434	   MUST set the EECN field of the packet that resumes the connection to
1435	   FNE.  Note that this next packet may be sent a very long time later;
1436	   a packet does NOT have to be sent after 1 second of idling.  In order
1437	   that the design of network policers can be deterministic, this
1438	   specification deliberately puts an absolute lower limit on how long a
1439	   connection can be idle before the packet that resumes the connection
1440	   must be set to FNE, rather than relating it to the connection round
1441	   trip time.  We use the lower bound of the retransmission timeout
1442	   (RTO) [RFC6298], which is commonly used as the idle period before TCP
1443	   must reduce to the restart window [RFC5681].  Note our specification
1444	   of re-ECN's idle period is NOT intended to change the idle period for
1445	   TCP's restart, nor indeed for any other purposes.
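
	   The idle rule can be sketched as below.  The names are
	   hypothetical, and measuring the idle period from the time the
	   sender last sent a packet is an assumption of this sketch.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

IDLE_LIMIT_SECONDS = 1.0  # fixed limit, deliberately not tied to the RTT


def resume_needs_fne(last_send_time, now):
    """True if the packet that resumes the connection must carry FNE.

    Both arguments are timestamps in seconds.  The test is "more than
    1 second", so exactly 1 second of idling does not trigger it.
    """
    return (now - last_send_time) > IDLE_LIMIT_SECONDS
```

	   The check is made only when a packet is about to be sent; nothing
	   needs to be transmitted when the limit expires.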

1447	   {ToDo: Describe how the sender falls back to RFC3168 modes if packets
1448	   don't appear to be getting through (to work round firewalls
1449	   discarding packets they consider unusual).}

1451	   {ToDo: Possible future capabilities for changing Slow Start}

1453	6.1.5.  Pure ACKs, Retransmissions, Window Probes and Partial ACKs

1455	   A re-ECN sender MUST clear the RE flag to "0" and set the ECN field
1456	   to Not-ECT in pure ACKs, retransmissions and window probes, as
1457	   specified in [RFC3168].  Our eventual goal is for all packets to be
1458	   sent with re-ECN enabled, and we believe the semantics of the ECI
1459	   field go a long way towards being able to achieve this.  However, we
1460	   have not completed a full security analysis for these cases, so for
1461	   now we merely re-state current practice.

1463	   We must also reconcile the facts that congestion marking is applied
1464	   to packets but acknowledgements cover octet ranges and acknowledged
1465	   octet boundaries need not match the transmitted boundaries.  The
1466	   general principle we work to is to remain compatible with TCP's
1467	   congestion control which is driven by congestion events at packet
1468	   granularity while at the same time aiming to blank the RE flag on at
1469	   least as many octets in a flow as have been marked CE.

1471	   Therefore, a re-ECN TCP receiver MUST increment its ECC value as many
1472	   times as CE marked packets have been received.  And that value MUST
1473	   be echoed to the sender in the first available ACK using the ECI
1474	   field.  This ensures the TCP sender's congestion control receives
1475	   timely feedback on congestion events at the same packet granularity
1476	   that they were generated on congested queues.

1478	   Then, a re-ECN sender stores the difference D between its own ECC
1479	   value and the incoming ECI field by adding D to a counter R.  R is
1480	   then decremented by 1 for each subsequent packet that is sent with
1481	   the RE flag blanked, until R is no longer positive.  Using this
1482	   technique, whenever a re-ECN transport sends a not re-ECN capable
1483	   packet (e.g. a retransmission), the remaining packets required to
1484	   have the RE flag blanked will be automatically carried over to
1485	   subsequent packets, through the variable R.
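
	   The carry-over through R can be illustrated with the sketch below.
	   The function mark_packets and the packet kind labels are
	   hypothetical; the sketch counts in packets, not octets, and
	   assumes the flow is already established.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

NOT_RE_ECN_CAPABLE = {"pure_ack", "retransmission", "window_probe"}


def mark_packets(r, kinds):
    """Return (ECN field, RE flag) for each packet, plus the leftover R.

    r is the outstanding number of packets to send with RE blanked;
    kinds is a sequence of "data" or one of NOT_RE_ECN_CAPABLE.
    """
    out = []
    for kind in kinds:
        if kind in NOT_RE_ECN_CAPABLE:
            # Sent as in RFC3168 practice; R is left untouched, so the
            # debt carries over to later data packets.
            out.append(("Not-ECT", 0))
        elif r > 0:
            out.append(("ECT(1)", 0))  # RE blanked: one unit of R repaid
            r -= 1
        else:
            out.append(("ECT(1)", 1))  # default setting
    return out, r
```

	   For instance, with R = 2 and a retransmission sent between two
	   data packets, the second blanked packet is simply the next data
	   packet after the retransmission.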

1487	   This does not ensure precisely the same number of octets have RE
1488	   blanked as were CE marked.  But we believe positive errors will
1489	   cancel negative over a long enough period. {ToDo: However, more
1490	   research is needed to prove whether this is so.  If it is not, it may
1491	   be necessary to increment and decrement R in octets rather than
1492	   packets, by incrementing R as the product of D and the size in octets
1493	   of packets being sent (typically the MSS).}

1495	6.2.  Other Transports

1497	6.2.1.  General Guidelines for Adding Re-ECN to Other Transports

1499	   As a general rule, Re-ECT sender transports that have established the
1500	   receiver transport is at least ECN-capable (not necessarily re-ECN
1501	   capable) MUST blank the RE codepoint for at least as many octets as
1502	   arrive at the receiver with the CE codepoint set.  Re-ECN-capable
1503	   sender transports should always initialise the ECN field to the
1504	   ECT(1) codepoint once a flow is established.

1506	   If the sender transport does not have sufficient feedback to even
1507	   estimate the path's CE rate, it SHOULD set FNE continuously.  If the
1508	   sender transport has some, perhaps stale, feedback to estimate that
1509	   the path's CE rate is almost certainly less than E%, the transport
1510	   MAY blank RE in packets for E% of sent octets, and set the RECT
1511	   codepoint for the remainder.
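
	   One possible way to spread RE blanking over E% of sent octets is
	   sketched below.  The class OctetBlanker and its credit scheme are
	   purely hypothetical, chosen here for illustration; this
	   specification does not prescribe any particular scheduling.

```python
# Illustrative sketch only; names are hypothetical, not defined by this draft.

class OctetBlanker:
    """Blank RE on roughly e_percent of sent octets using integer credit."""

    def __init__(self, e_percent):
        self.e_percent = e_percent
        self.credit = 0  # accumulated in units of octets * percent

    def re_flag(self, size):
        """RE flag for a packet of `size` octets (0 means blanked)."""
        self.credit += self.e_percent * size
        if self.credit >= 100 * size:
            self.credit -= 100 * size
            return 0  # blank RE on this packet
        return 1      # send as RECT


blanker = OctetBlanker(10)
flags = [blanker.re_flag(1000) for _ in range(100)]
print(flags.count(0))
```

	   With E = 10 and equal-sized packets, every tenth packet is sent
	   with RE blanked.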

1513	   The following sections give guidelines on how re-ECN support could be
1514	   added to RSVP or NSIS, to DCCP, and to SCTP - although separate
1515	   Internet drafts will be necessary to document the exact mechanics of
1516	   re-ECN in each of these protocols.

1518	   {ToDo: Give a brief outline of what would be expected for each of the
1519	   following:

1521	   o  UDP fire and forget (e.g. DNS)

1523	   o  UDP streaming with no feedback

1525	   o  UDP streaming with feedback

1527	   }

1529	6.2.2.  Guidelines for adding Re-ECN to RSVP or NSIS

1531	   A separate I-D has been submitted [I-D.re-pcn-border-cheat]
1532	   describing how re-ECN can be used in an edge-to-edge rather than end-
1533	   to-end scenario.  It can then be used by downstream networks to
1534	   police whether upstream networks are blocking new flow reservations
1535	   when downstream congestion is too high, even though the congestion is
1536	   in other operators' downstream networks.  This relates to current
1537	   IETF work on Admission Control over Diffserv using Pre-Congestion
1538	   Notification (PCN) [RFC5559].

1540	6.2.3.  Guidelines for adding Re-ECN to DCCP

1542	   Beside adjusting the initial features negotiation sequence, operating
1543	   re-ECN in DCCP [RFC4340] could be achieved by defining a new option
1544	   to be added to acknowledgments, that would include a multibit field
1545	   where the destination could copy its ECC.

1547	6.2.4.  Guidelines for adding Re-ECN to SCTP

1549	   Appendix A in [RFC4960] gives the specifications for SCTP to support
1550	   ECN.  Similar steps should be taken to support re-ECN.  Beside
1551	   adjusting the initial features negotiation sequence, operating re-ECN
1552	   in SCTP could be achieved by defining a new control chunk, that would
1553	   include a multibit field where the destination could copy its ECC.

1555	7.  Incremental Deployment

1557	   The design of the re-ECN protocol started from the fact that the
1558	   current ECN marking behaviour of queues was sufficient and that re-
1559	   feedback could be introduced around these queues by changing the
1560	   sender behaviour but not the routers.  Otherwise, if we had required
1561	   routers to be changed, the chance of encountering a path that had
1562	   every router upgraded would be vanishingly small during early
1563	   deployment, giving no incentive to start deployment.  Also, as there
1564	   is no new forwarding behaviour, routers and hosts do not have to
1565	   signal or negotiate anything.

1567	   However, networks that choose to protect themselves using re-ECN do
1568	   have to add new security functions at their trust boundaries with
1569	   others.  They distinguish legacy traffic by its ECN field.  Traffic
1570	   from Not-ECT transports is distinguishable by its Not-ECT marking.
1571	   Traffic from RFC3168 compliant ECN transports is distinguished from
1572	   re-ECN by which of ECT(0) or ECT(1) is used.  We chose to use ECT(1)
1573	   for re-ECN traffic deliberately.  Existing ECN sources set ECT(0) on
1574	   either 50% (the nonce) or 100% (the default) of packets, whereas re-
1575	   ECN does not use ECT(0) at all.  We can use this distinguishing
1576	   feature of RFC3168 compliant ECN traffic to separate it out for
1577	   different treatment at the various border security functions: egress
1578	   dropping, ingress policing and border policing.

1580	   The general principle we adopt is that an egress dropper will not
1581	   drop any legacy traffic, but ingress and border policers will limit
1582	   the bulk rate of legacy traffic (Not-ECT, ECT(0) and those marked
1583	   with the unused codepoint) that can enter each network.  Then, during
1584	   early re-ECN deployment, operators can set very permissive (or non-
1585	   existent) rate-limits on legacy traffic, but once re-ECN
1586	   implementations are generally available, legacy traffic can be rate-
1587	   limited increasingly harshly.  Ultimately, an operator might choose
1588	   to block all legacy traffic entering its network, or at least only
1589	   allow through a trickle.

1591	   Then, the more strictly the limits are set, the more RFC3168 ECN
1592	   sources will gain by upgrading to re-ECN.  Thus, towards the end of
1593	   the voluntary incremental deployment period, RFC3168 compliant
1594	   transports can be given progressively stronger encouragement to
1595	   upgrade.

1597	   The following list of minor changes brings together all the points
1598	   where re-ECN semantics for use of the two-bit ECN field are different
1599	   compared to RFC3168:

1601	   o  A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender
1602	      sets ECT(0) by default (Section 4.3);

1604	   o  No provision is necessary for a re-ECN capable source transport to
1605	      use the ECN nonce (Section 6.1.2.1);

1607	   o  Routers MAY preferentially drop different extended ECN codepoints
1608	      (Section 5.3);

1610	   o  Packets carrying the feedback not established (FNE) codepoint MAY
1611	      optionally be marked rather than dropped by routers, even though
1612	      their ECN field is Not-ECT (with the important caveat in
1613	      Section 5.3);

1615	   o  Packets may be dropped by policing nodes because of apparent
1616	      misbehaviour, not just because of congestion;

1618	   o  Tunnel entry behaviour is still to be defined, but may have to be
1619	      different from RFC3168 (Section 5.6).

1621	   None of these changes require any modifications to routers.  Also
1622	   none of these changes affect anything about end to end congestion
1623	   control; they are all to do with allowing networks to police that end
1624	   to end congestion control is well-behaved.

1626	8.  Related Work
1627	8.1.  Congestion Notification Integrity

1629	   The choice of two ECT code-points in the ECN field [RFC3168]
1630	   permitted future flexibility, optionally allowing the sender to
1631	   encode the experimental ECN nonce [RFC3540] in the packet stream.
1632	   This mechanism has since been included in the specifications of DCCP
1633	   [RFC4340].

1635	   {ToDo: DCCP provides nonce support - how does this affect the RFC?}

1637	   The ECN nonce is an elegant scheme that allows the sender to detect
1638	   if someone in the feedback loop - the receiver especially - tries to
1639	   claim no congestion was experienced when in fact congestion led to
1640	   packet drops or ECN marks.  For each packet it sends, the sender
1641	   chooses between the two ECT codepoints in a pseudo-random sequence.
1642	   Then, whenever the network marks a packet with CE, if the receiver
1643	   wants to deny congestion happened, she has to guess which ECT
1644	   codepoint was overwritten.  She has only a 50:50 chance of being
1645	   correct each time she denies a congestion mark or a drop, which
1646	   ultimately will give her away.

1648	   The purpose of a network-layer nonce should primarily be protection
1649	   of the network, while a transport-layer nonce would be better used to
1650	   protect the sender from cheating receivers.  Now, the assumption
1651	   behind the ECN nonce is that a sender will want to detect whether a
1652	   receiver is suppressing congestion feedback.  This is only true if
1653	   the sender's interests are aligned with the network's, or with the
1654	   community of users as a whole.  This may be true for certain large
1655	   senders, who are under close scrutiny and have a reputation to
1656	   maintain.  But we have to deal with a more hostile world, where
1657	   traffic may be dominated by peer-to-peer transfers, rather than
1658	   downloads from a few popular sites.  Often the `natural' self-
1659	   interest of a sender is not aligned with the interests of other
1660	   users.  The sender often wishes to transfer data to the receiver
1661	   quickly just as much as the receiver wants to receive it quickly.

1663	   In contrast, the re-ECN protocol enables policing of an agreed rate-
1664	   response to congestion (e.g. TCP-friendliness) at the sender's
1665	   interface with the internetwork.  It also ensures downstream networks
1666	   can police their upstream neighbours, to encourage them to police
1667	   their users in turn.  But most importantly, it requires the sender to
1668	   declare path congestion to the network and it can remove traffic at
1669	   the egress if this declaration is dishonest.  So it can police
1670	   correctly, irrespective of whether the receiver tries to suppress
1671	   congestion feedback or whether the sender ignores genuine congestion
1672	   feedback.  Therefore the re-ECN protocol addresses a much wider range
1673	   of cheating problems, which includes the one addressed by the ECN
1674	   nonce.

1676	   {ToDo: Ensure we address the early ACK problem.}

1678	9.  Security Considerations

1680	   {ToDo: Describe attacks by networks on flows and by spoofing
1681	   sources.} {ToDo: Re-ECN & DNS servers}

1683	   This whole memo concerns the deployment of a secure congestion
1684	   control framework.  However, below we list some specific security
1685	   issues that we are still working on:

1687	   o  Malicious users have the ability to launch dynamically changing
1688	      attacks, exploiting the time it takes to detect an attack, given
1689	      ECN marking is binary.  We are concentrating on subtle
1690	      interactions between the ingress policer and the egress dropper in
1691	      an effort to make it impossible to game the system.

1693	   o  There is an inherent need for at least some flow state at the
1694	      egress dropper given the binary marking environment, which leads
1695	      to an apparent vulnerability to state exhaustion attacks.  An
1696	      egress dropper design with bounded flow state is in write-up.

1698	   o  A malicious source can spoof another user's address and send
1699	      negative traffic to the same destination in order to fool the
1700	      dropper into sanctioning the other user's flow.  To prevent or
1701	      mitigate these two different kinds of DoS attack, against the
1702	      dropper and against given flows, we are considering various
1703	      protection mechanisms.

1705	   o  A malicious client can send requests using a spoofed source
1706	      address to a server (such as a DNS server) that tends to respond
1707	      with single packet responses.  This server will then be tricked
1708	      into having to set FNE on the first (and only) packet of all these
1709	      wasted responses.  Given packets marked FNE are worth +1, this
1710	      will cause such servers to consume more of their allowance to
1711	      cause congestion than they would wish to.  In general, re-ECN is
1712	      deliberately designed so that single packet flows have to bear the
1713	      cost of not discovering the congestion state of their path.  One
1714	      of the reasons for introducing re-ECN is to encourage short flows
1715	      to make use of previous path knowledge by moving the cost of this
1716	      lack of knowledge to sources that create short flows.  Therefore,
1717	      in the long run we might expect services like DNS to aggregate
1718	      single packet flows into connections where it brings benefits.
1719	      However, this attack where DNS requests are made from spoofed
1720	      addresses genuinely forces the server to waste its resources.  The
1721	      only mitigating feature is that the attacker has to set FNE on
1722	      each of its requests if they are to get through an egress dropper
1723	      to a DNS server.  The attacker therefore has to consume as many
1724	      resources as the victim, which at least implies re-ECN does not
1725	      unwittingly amplify this attack.

1727	   Having highlighted outstanding security issues, we now explain the
1728	   design decisions that were taken based on a security-related
1729	   rationale.  It may seem that the six codepoints of the eight made
1730	   available by extending the ECN field with the RE flag have been used
1731	   rather wastefully to encode just five states.  In effect the RE flag
1732	   has been used as an orthogonal single bit, using up four codepoints
1733	   to encode the three states of positive, neutral and negative worth.
1734	   The mapping of the codepoints in an earlier version of this proposal
1735	   used the codepoint space more efficiently, but the scheme became
1736	   vulnerable to network operators bypassing congestion penalties by
1737	   focusing congestion marking on positive packets.  Appendix B explains
1738	   why fixing that problem while allowing for incremental deployment,
1739	   would have used another codepoint anyway.  So it was better to use
1740	   this orthogonal encoding scheme, which greatly simplified the whole
1741	   protocol and brought with it some subtle security benefits (see the
1742	   last paragraph of Appendix B).

1744	   With the scheme as now proposed, once the RE flag is set or cleared
1745	   by the sender or its proxy, it should not be written by the network,
1746	   only read.  So the endpoints can detect if any network maliciously
1747	   alters the RE flag.  IPsec AH integrity checking does not cover the
1748	   IPv4 option flags (they were considered mutable---even the one we
1749	   propose using for the RE flag that was `currently unused' when IPsec
1750	   was defined).  But it would be sufficient for a pair of endpoints to
1751	   make random checks on whether the RE flag was the same when it
1752	   reached the egress as when it left the ingress.  Indeed, if IPsec AH
1753	   had covered the RE flag, any network intending to alter sufficient RE
1754	   flags to make a gain would have focused its alterations on packets
1755	   without authenticating headers (AHs).

1757	   The security of re-ECN has been deliberately designed to not rely on
1758	   cryptography.

1760	10.  IANA Considerations

1762	   This memo includes no request to IANA (yet).

1764	   If this memo were to progress to standards track, it would list:

1766	   o  The new RE flag in IPv4 (Section 5.1) and its extension with the
1767	      ECN field to create a new set of extended ECN (EECN) codepoints;

1769	   o  The definition of the EECN codepoints for default Diffserv PHBs
1770	      (Section 4.2);

1772	   o  The Hop-by-Hop option ID for the new extension header for IPv6
1773	      (Section 5.2);

1775	   o  The new combinations of flags in the TCP header for capability
1776	      negotiation (Section 6.1.3).

1778	11.  Conclusions

1780	   {ToDo:}

1782	12.  Acknowledgements

1784	   Sebastien Cazalet and Andrea Soppera contributed to the idea of re-
1785	   feedback.  All the following have given helpful comments: Andrea
1786	   Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley,
1787	   Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright,
1788	   John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru
1789	   Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd
1790	   (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark
1791	   Handley (who developed the attack with canceled packets), Adam
1792	   Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft
1793	   (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who
1794	   complemented our own dummy traffic attacks with others), Liz Maida
1795	   (MIT), Meral Shirazipour (Ericsson) and comments from participants in
1796	   the CRN/CFP Broadband and DoS-resistant Internet working groups.  A
1797	   special thank you to Alessandro Salvatori for coming up with fiendish
1798	   attacks on re-ECN.

1800	13.  Comments Solicited

1802	   Comments and questions are encouraged and very welcome.  They can be
1803	   addressed to the IETF Congestion Exposure (ConEx) working group's
1804	   mailing list <conex@ietf.org>, and/or to the authors.

1806	14.  References

1808	14.1.  Normative References

1810	   [RFC2119]                  Bradner, S., "Key words for use in RFCs to
1811	                              Indicate Requirement Levels", BCP 14,
1812	                              RFC 2119, March 1997.

1814	   [RFC3168]                  Ramakrishnan, K., Floyd, S., and D. Black,
1815	                              "The Addition of Explicit Congestion
1816	                              Notification (ECN) to IP", RFC 3168,
1817	                              September 2001.

1819	   [RFC3390]                  Allman, M., Floyd, S., and C. Partridge,
1820	                              "Increasing TCP's Initial Window",
1821	                              RFC 3390, October 2002.

1823	   [RFC4302]                  Kent, S., "IP Authentication Header",
1824	                              RFC 4302, December 2005.

1826	   [RFC4340]                  Kohler, E., Handley, M., and S. Floyd,
1827	                              "Datagram Congestion Control Protocol
1828	                              (DCCP)", RFC 4340, March 2006.

1830	   [RFC4341]                  Floyd, S. and E. Kohler, "Profile for
1831	                              Datagram Congestion Control Protocol
1832	                              (DCCP) Congestion Control ID 2: TCP-like
1833	                              Congestion Control", RFC 4341, March 2006.

1835	   [RFC4342]                  Floyd, S., Kohler, E., and J. Padhye,
1836	                              "Profile for Datagram Congestion Control
1837	                              Protocol (DCCP) Congestion Control ID 3:
1838	                              TCP-Friendly Rate Control (TFRC)",
1839	                              RFC 4342, March 2006.

1841	   [RFC4835]                  Manral, V., "Cryptographic Algorithm
1842	                              Implementation Requirements for
1843	                              Encapsulating Security Payload (ESP) and
1844	                              Authentication Header (AH)", RFC 4835,
1845	                              April 2007.

1847	   [RFC4960]                  Stewart, R., "Stream Control Transmission
1848	                              Protocol", RFC 4960, September 2007.

1850	   [RFC5562]                  Kuzmanovic, A., Mondal, A., Floyd, S., and
1851	                              K. Ramakrishnan, "Adding Explicit
1852	                              Congestion Notification (ECN) Capability
1853	                              to TCP's SYN/ACK Packets", RFC 5562,
1854	                              June 2009.

1856	   [RFC5681]                  Allman, M., Paxson, V., and E. Blanton,
1857	                              "TCP Congestion Control", RFC 5681,
1858	                              September 2009.

1860	   [RFC6040]                  Briscoe, B., "Tunnelling of Explicit
1861	                              Congestion Notification", RFC 6040,
1862	                              November 2010.

1864	14.2.  Informative References

1866	   [ARI05]                    Adams, J., Roberts, L., and A.
1867	                              IJsselmuiden, "Changing the Internet to
1868	                              Support Real-Time Content Supply from a
1869	                              Large Fraction of Broadband Residential
1870	                              Users", BT Technology Journal
1871	                              (BTTJ) 23(2), April 2005.

1873	   [I-D.conex-tcp-mods]       Kuehlewind, M. and R. Scheffenegger, "TCP
1874	                              modifications for Congestion Exposure",
1875	                              draft-ietf-conex-tcp-modifications-04
1876	                              (work in progress), July 2013.

1878	   [I-D.re-ecn-motiv]         Briscoe, B., Jacquet, A., Moncaster, T.,
1879	                              and A. Smith, "Re-ECN: A Framework for
1880	                              adding Congestion Accountability to
1881	                              TCP/IP",
1882	                              draft-briscoe-conex-re-ecn-motiv-02 (work
1883	                              in progress), July 2013.

1885	   [I-D.re-pcn-border-cheat]  Briscoe, B., "Emulating Border Flow
1886	                              Policing using Re-PCN on Bulk Data",
1887	                              draft-briscoe-re-pcn-border-cheat-03 (work
1888	                              in progress), October 2009.

1890	   [RFC2309]                  Braden, B., Clark, D., Crowcroft, J.,
1891	                              Davie, B., Deering, S., Estrin, D., Floyd,
1892	                              S., Jacobson, V., Minshall, G., Partridge,
1893	                              C., Peterson, L., Ramakrishnan, K.,
1894	                              Shenker, S., Wroclawski, J., and L. Zhang,
1895	                              "Recommendations on Queue Management and
1896	                              Congestion Avoidance in the Internet",
1897	                              RFC 2309, April 1998.

1899	   [RFC2475]                  Blake, S., Black, D., Carlson, M., Davies,
1900	                              E., Wang, Z., and W. Weiss, "An
1901	                              Architecture for Differentiated Services",
1902	                              RFC 2475, December 1998.

1904	   [RFC3124]                  Balakrishnan, H. and S. Seshan, "The
1905	                              Congestion Manager", RFC 3124, June 2001.

1907	   [RFC3514]                  Bellovin, S., "The Security Flag in the
1908	                              IPv4 Header", RFC 3514, April 2003.

1910	   [RFC3540]                  Spring, N., Wetherall, D., and D. Ely,
1911	                              "Robust Explicit Congestion Notification
1912	                              (ECN) Signaling with Nonces", RFC 3540,
1913	                              June 2003.

1915	   [RFC4301]                  Kent, S. and K. Seo, "Security
1916	                              Architecture for the Internet Protocol",
1917	                              RFC 4301, December 2005.

1919	   [RFC5129]                  Davie, B., Briscoe, B., and J. Tay,
1920	                              "Explicit Congestion Marking in MPLS",
1921	                              RFC 5129, January 2008.

1923	   [RFC5559]                  Eardley, P., "Pre-Congestion Notification
1924	                              (PCN) Architecture", RFC 5559, June 2009.

1926	   [RFC6298]                  Paxson, V., Allman, M., Chu, J., and M.
1927	                              Sargent, "Computing TCP's Retransmission
1928	                              Timer", RFC 6298, June 2011.

1930	   [Re-fb]                    Briscoe, B., Jacquet, A., Di Cairano-
1931	                              Gilfedder, C., Salvatori, A., Soppera, A.,
1932	                              and M. Koyabe, "Policing Congestion
1933	                              Response in an Internetwork Using Re-
1934	                              Feedback", ACM SIGCOMM CCR 35(4)277--288,
1935	                              August 2005, <http://www.acm.org/sigs/
1936	                              sigcomm/sigcomm2005/
1937	                              techprog.html#session8>.

1939	   [Savage99]                 Savage, S., Cardwell, N., Wetherall, D.,
1940	                              and T. Anderson, "TCP congestion control
1941	                              with a misbehaving receiver", ACM SIGCOMM
1942	                              CCR 29(5), October 1999, <http://
1943	                              citeseer.ist.psu.edu/savage99tcp.html>.

1945	   [Steps_DoS]                Handley, M. and A. Greenhalgh, "Steps
1946	                              towards a DoS-resistant Internet
1947	                              Architecture", Proc. ACM SIGCOMM workshop
1948	                              on Future directions in network
1949	                              architecture (FDNA'04) pp 49--56,
1950	                              August 2004.

1952	   [tcp-rcv-cheat]            Moncaster, T., Briscoe, B., and A.
1953	                              Jacquet, "A TCP Test to Allow Senders to
1954	                              Identify Receiver Non-Compliance",
1955	                              draft-moncaster-tcpm-rcv-cheat-02 (work in
1956	                              progress), November 2007.

1958	Appendix A.  Precise Re-ECN Protocol Operation

1960	   The protocol operation in Section 4.3 was described as an
1961	   approximation.  In fact, standard ECN marking at a queue combines 1%
1962	   and 2% marking into slightly less than 3% whole-path marking, because
1963	   queues deliberately mark CE whether or not it has already been marked
1964	   by another queue upstream.  So the combined marking fraction would
1965	   actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.
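
   The same arithmetic can be checked for any number of queues.  This
   is a minimal sketch, assuming each queue marks independently of
   marks applied upstream; the function name is hypothetical.

```python
from math import prod


def combined_marking(marking_fractions):
    """Whole-path CE fraction for queues that mark independently.

    Each element is the fraction marked by one queue, applied whether
    or not a packet was already marked upstream.
    """
    return 1.0 - prod(1.0 - m for m in marking_fractions)


if __name__ == "__main__":
    # The 1% and 2% queues from the example above.
    print(f"{combined_marking([0.01, 0.02]):.2%}")  # prints 2.98%
```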

1967	   To generalise this we will need some notation.

1969	   o  j represents the index of each resource (typically queues) along a
1970	      path, ranging from 0 at the first queue to n-1 at the last.

1972	   o  m_j represents the fraction of octets to be *m*arked CE by a
1973	      particular queue (whether or not they are already marked) because
1974	      of congestion of resource j.

1976	   o  u_j represents congestion signals arriving from *u*pstream of
1977	      resource j, being the fraction of CE marking in arriving packet
1978	      headers (before marking).

1980	   o  p_j represents *p*ath congestion, being the fraction of packets
1981	      arriving at resource j with the RE flag blanked (excluding Not-
1982	      RECT packets).

1984	   o  v_j denotes expected congestion downstream of resource j, which
1985	      can be thought of as a *v*irtual marking fraction, being derived
1986	      from two other marking fractions.

1988	   Observed fractions of each particular codepoint (u, p and v) and
1989	   queue marking rate m are dimensionless fractions, being the ratio of
1990	   two data volumes (marked and total) over a monitoring period.  All
1991	   measurements are in terms of octets, not packets, assuming that line
1992	   resources are more congestible than packet processing.

1994	   The path congestion (RE blanking fraction) set by the sender should
1995	   reflect upstream congestion (CE marking fraction) from the viewpoint
1996	   of the destination, which it feeds back to the sender.  Therefore in
1997	   the steady state

1999	      p_0  = u_n
2000	           = 1 - (1 - m_0)(1 - m_1)...(1 - m_(n-1))

2002	   Similarly, at some point j in the middle of the network, given p = 1
2003	   - (1 - u_j)(1 - v_j), then

2005	      v_j  = 1 - (1 - p)/(1 - u_j)

2007	          ~= p - u_j;                      if u_j << 100%

2009	   So, between the two routers in the example in Section 4.3, congestion
2010	   downstream is
2011	      v_1  = 100.00% - (100% - 2.98%) / (100% - 1.00%)
2012	           = 2.00%,

2014	   or a useful approximation of downstream congestion is

2016	      v_1 ~= 2.98% - 1.00%
2017	          ~= 1.98%.
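
   The exact and approximate forms of downstream congestion can be
   compared directly.  The sketch below simply evaluates the two
   expressions above; the function names are hypothetical, and p and
   u are passed as fractions between 0 and 1.

```python
def downstream_exact(p: float, u: float) -> float:
    """v = 1 - (1 - p)/(1 - u), for path congestion p and upstream u."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    return 1.0 - (1.0 - p) / (1.0 - u)


def downstream_approx(p: float, u: float) -> float:
    """v ~= p - u, a reasonable approximation only when u << 100%."""
    return p - u


if __name__ == "__main__":
    p, u = 0.0298, 0.01
    print(f"exact  {downstream_exact(p, u):.2%}")   # prints exact  2.00%
    print(f"approx {downstream_approx(p, u):.2%}")  # prints approx 1.98%
```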

2019	Appendix B.  Justification for Two Codepoints Signifying Zero Worth
2020	             Packets

2022	   It may seem a waste of a codepoint to set aside two codepoints of the
2023	   Extended ECN field to signify zero worth (RECT and CE(0) are both
2024	   worth zero).  The justification is subtle, but worth recording.

2026	   The original version of Re-ECN ([Re-fb] and draft-00 of this memo)
2027	   used three codepoints for neutral (ECT(1)), positive (ECT(0)) and
2028	   negative (CE) packets.  The sender set packets to neutral unless re-
2029	   echoing congestion, when it set them positive, in much the same way
2030	   that it blanks the RE flag in the current protocol.  However, routers
2031	   were meant to mark congestion by setting packets negative (CE)
2032	   irrespective of whether they had previously been neutral or positive.

2034	   However, we did not arrange for senders to remember which packet had
2035	   been sent with which codepoint, or for feedback to say exactly which
2036	   packets arrived with which codepoints.  The transport was meant to
2037	   inflate the number of positive packets it sent to allow for a few
2038	   being wiped out by congestion marking.  We (wrongly) assumed that
2039	   routers would congestion mark packets indiscriminately, so the
2040	   transport could infer how many positive packets had been marked and
2041	   compensate accordingly by re-echoing.  But this created a perverse
2042	   incentive for routers to preferentially congestion mark positive
2043	   packets rather than neutral ones.

2045	   We could have removed this perverse incentive by requiring Re-ECN
2046	   senders to remember which packets they had sent with which codepoint,
2047	   and for feedback from the receiver to identify which packets arrived
2048	   as which.  Then, if a positive packet was congestion marked to
2049	   negative, the sender could have re-echoed twice to maintain the
2050	   balance between positive and negative at the receiver.

2052	   Instead, we chose to make re-echoing congestion (blanking RE)
2053	   orthogonal to congestion notification (marking CE), which required a
2054	   second neutral codepoint.  Then the receiver would be able to detect
2055	   and echo a congestion event even if it arrived on a packet that had
2056	   originally been positive.

2058	   If we had added extra complexity to the sender and receiver
2059	   transports to track changes to individual packets, we could have made
2060	   it work, but then routers would have had an incentive to mark
2061	   positive packets with half the probability of neutral packets.  That
2062	   in turn would have led router algorithms to become more complex.
2063	   Then senders wouldn't know whether a mark had been introduced by a
2064	   simple or a complex router algorithm.  That in turn would have
2065	   required another codepoint to distinguish between RFC3168 ECN and new
2066	   Re-ECN router marking.

2068	   Once the cost of IP header codepoint real-estate was the same for
2069	   both schemes, there was no doubt that the simpler option for
2070	   endpoints and for routers should be chosen.  The resulting protocol
2071	   also no longer needed the tricky inflation/deflation complexity of
2072	   the original (broken) scheme.  It was also much simpler to understand
2073	   conceptually.

2075	   A further advantage of the new orthogonal four-codepoint scheme was
2076	   that senders owned sole rights to change the RE flag and routers
2077	   owned sole rights to change the ECN field.  Although we still arrange
2078	   the incentives so neither party strays outside their dominion, these
2079	   clear lines of authority simplify the matter.

2081	   Finally, a little redundancy can be very powerful in a scheme such as
2082	   this.  In one flow, the proportion of packets changed to CE should be
2083	   the same as the proportion of RECT packets changed to CE(-1) and the
2084	   proportion of Re-Echo packets changed to CE(0).  Double checking
2085	   using such redundant relationships can improve the security of a
2086	   scheme (cf. double-entry book-keeping or the ECN Nonce).
2087	   Alternatively, it might be necessary to exploit the redundancy in the
2088	   future to encode an extra information channel.

2090	Appendix C.  ECN Compatibility

2092	   The rationale for choosing the particular combinations of SYN and SYN
2093	   ACK flags in Section 6.1.3 is as follows.

2095	   Choice of SYN flags:  A Re-ECN sender can work with RFC3168 compliant
2096	      ECN receivers so we wanted to use the same flags as would be used
2097	      in an ECN-setup SYN [RFC3168] (CWR=1, ECE=1).  But at the same
2098	      time, we wanted a server (host B) that is Re-ECT to be able to
2099	      recognise that the client (A) is also Re-ECT.  We believe also
2100	      setting NS=1 in the initial SYN achieves both these objectives, as
2101	      it should be ignored by RFC3168 compliant ECT receivers and by
2102	      ECT-Nonce receivers.  But senders that are not Re-ECT should not
2103	      set NS=1.  At the time ECN was defined, the NS flag was not
2104	      defined, so setting NS=1 should be ignored by existing ECT
2105	      receivers (but testing against implementations may yet prove
2106	      otherwise).  The ECN Nonce RFC [RFC3540] is silent on what the NS
2107	      field might be set to in the TCP SYN, but we believe the intent
2108	      was for a nonce client to set NS=0 in the initial SYN (again only
2109	      testing will tell).  Therefore we define a Re-ECN-setup SYN as one
2110	      with NS=1, CWR=1 & ECE=1.

2112	   Choice of SYN ACK flags:  The client (A) needs to
2113	      be able to determine whether the server (B) is Re-ECT.  The
2114	      original ECN specification required an ECT server to respond to an
2115	      ECN-setup SYN with an ECN-setup SYN ACK of CWR=0 and ECE=1.  There
2116	      is no room to modify this by setting the NS flag, as that is
2117	      already set in the SYN ACK of an ECT-Nonce server.  So we used the
2118	      only combination of CWR and ECE that would not be used by existing
2119	      TCP receivers: CWR=1 and ECE=0.  The original ECN specification
2120	      defines this combination as a non-ECN-setup SYN ACK, which remains
2121	      true for RFC3168 compliant and Nonce ECTs.  But for Re-ECN we
2122	      define it as a Re-ECN-setup SYN ACK.  We didn't use a SYN ACK with
2123	      both CWR and ECE cleared to 0 because that would be the likely
2124	      response from most Not-ECT receivers.  And we didn't use a SYN ACK
2125	      with both CWR and ECE set to 1 either, as at least one broken
2126	      receiver implementation echoes whatever flags were in the SYN into
2127	      its SYN ACK.  Therefore we define a Re-ECN-setup SYN ACK as one
2128	      with CWR=1 & ECE=0.

2130	   Choice of two alternative SYN ACKs:  the NS flag may take either
2131	      value in a Re-ECN-setup SYN ACK.  Section 5.4 REQUIRES that a Re-
2132	      ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to
2133	      echo congestion experienced (CE) on the initial SYN.  Otherwise a
2134	      Re-ECN-setup SYN ACK MUST be returned with NS=0.  The only current
2135	      known use of the NS flag in a SYN ACK is to indicate support for
2136	      the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
2137	      Given the ECN nonce MUST NOT be used for a RECN mode connection, a
2138	      Re-ECN-setup SYN ACK can use either setting of the NS flag without
2139	      any risk of confusion, because the CWR & ECE flags will be
2140	      reversed relative to those used by an ECN nonce SYN ACK.
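
   The flag combinations discussed above can be summarised as a small
   classifier, written from the viewpoint of a Re-ECN endpoint.  This
   is only a sketch of the rationale in this appendix: the function
   names and the returned label strings are hypothetical, and the
   normative definitions remain those of Section 6.1.3.

```python
def classify_syn(ns: int, cwr: int, ece: int) -> str:
    """Classify an initial SYN by its NS, CWR and ECE flags."""
    if cwr == 1 and ece == 1:
        # NS=1 distinguishes a Re-ECN client from an RFC3168 one.
        return "Re-ECN-setup SYN" if ns == 1 else "ECN-setup SYN"
    return "non-ECN-setup SYN"


def classify_syn_ack(ns: int, cwr: int, ece: int) -> str:
    """Classify a SYN ACK as seen by a Re-ECN client."""
    if cwr == 1 and ece == 0:
        # NS may take either value; NS=1 echoes CE on the initial SYN.
        if ns == 1:
            return "Re-ECN-setup SYN ACK (CE echoed)"
        return "Re-ECN-setup SYN ACK"
    if cwr == 0 and ece == 1:
        # NS=1 here indicates support for the ECN nonce.
        if ns == 1:
            return "ECN-setup SYN ACK (nonce)"
        return "ECN-setup SYN ACK"
    # CWR=ECE=0 (likely Not-ECT) or CWR=ECE=1 (e.g. a receiver that
    # echoes the SYN flags) are not treated as ECN-setup.
    return "non-ECN-setup SYN ACK"
```

   Note that an RFC3168 compliant client would still treat CWR=1 and
   ECE=0 as a non-ECN-setup SYN ACK, as explained above.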

2142	   {ToDo: include the text below, either here, or in the algorithm
2143	   sections} At an egress dropper, well-behaved RFC3168 compliant flows
2144	   will appear to consist mostly of ECT(0) packets, with a few CE(0)
2145	   packets.  And, if the legacy source is setting the ECN nonce, the
2146	   majority of packets will be an equal mix of ECT(0) and ECT(1) packets
2147	   (the latter appearing to be Re-Echo packets in Re-ECN terms).  None
2148	   of these three packet markings is negative, so an egress dropper can
2149	   handle all legacy flows in bulk and, as long as they don't send any
2150	   packets using Re-ECN markings, it need not drop any legacy packets.
2151	   So, as soon as an ECT(0) packet is seen, its flow ID can be added to
2152	   the set of known legacy flows (a single Bloom filter would suffice).
2153	   But, if any packets in flows classified as RFC3168 compliant are
2154	   marked with any other marking than the three expected, the flow can
2155	   be removed from the RFC3168 set, to be treated in bulk with mis-
2156	   behaving Re-ECN flows---the remainder of flow IDs that require no
2157	   flow state to be held.

2159	   To an ingress Re-ECN policer, legacy ECN flows will appear as very
2160	   highly congested paths.  When policers are first deployed they can be
2161	   configured permissively, allowing through both `RFC3168' ECN and
2162	   misbehaving Re-ECN flows.  Then, as the threshold is set more
2163	   strictly, the more RFC3168 ECN sources will gain by upgrading to Re-
2164	   ECN.  Thus, towards the end of the voluntary incremental deployment
2165	   period, RFC3168 transports can be given progressively stronger
2166	   encouragement to upgrade.

2168	Appendix D.  Packet Marking with FNE During Flow Start

2170	   FNE (feedback not established) packets have two functions.  Their
2171	   main role is to announce the start of a new flow when feedback has
2172	   not yet been established.  However they also have the role of
2173	   balancing the expected feedback and can be used where there are
2174	   sudden changes in the rate of transmission.  Whilst this should not
2175	   happen under TCP, their use as speculative marking underpins
2176	   the following argument as to why the first and third packets should
2177	   be set to FNE.

2179	   The proportion of FNE packets in each round-trip should be a high
2180	   estimate of the potential error in the balance of number of
2181	   congestion marked packets versus number of re-echo packets already
2182	   issued.

2184	   Let's call:

2186	      S: the number of the TCP segments sent so far

2188	      F: the number of FNE packets sent so far

2190	      R: the number of Re-Echo packets sent so far

2192	      A: the number of acknowledgments received so far

2194	      C: the number of acknowledgments echoing a CE packet

2196	   In normal operation, when we want to send packet S+1, we first need
2197	   to check that enough Re-Echo packets have been issued:

2199	   If R<C, then S+1 will be a Re-echo packet

2201	   Next we need to estimate the amount of congestion observed so far.
2202	   If congestion was stationary, it could be estimated as C/A. A
2203	   pessimistic bound is (C+1)/(A+1) which assumes that the next
2204	   acknowledgment will echo a CE packet; we'll use that more pessimistic
2205	   estimate to drive the generation of FNE packets.

2207	   The number of CE packets expected when (S+1) will be acknowledged is
2208	   therefore (S+1)*(C+1)/(A+1).  Packet S+1 should be set to FNE if that
2209	   expected value exceeds the sum of FNE and Re-Echo packets sent so
2210	   far.

2212	      If  (F+R)<(S+1)*(C+1)/(A+1),
2213	        then S+1 will be set to FNE
2214	        else S+1 will be set to RECT

2216	   So the full test should be:

2218	      When packet (S+1) is about to be sent...
2219	        If R<C,
2220	           then S+1 will be set to Re-Echo
2221	        Else if  (F+R)<(S+1)*(C+1)/(A+1),
2222	          then S+1 will be set to FNE
2223	        Else S+1 will be set to RECT
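
   The full test can be written as a short function.  This is a
   sketch only: the function name and the returned strings are
   hypothetical, and the comparison is done by cross-multiplication
   (valid because A+1 is always positive) to avoid division.

```python
def next_marking(S: int, F: int, R: int, A: int, C: int) -> str:
    """Choose the marking for packet S+1.

    S: segments sent so far        F: FNE packets sent so far
    R: Re-Echo packets sent so far A: acknowledgments received so far
    C: acknowledgments echoing a CE packet
    """
    if R < C:
        return "Re-Echo"
    # (F+R) < (S+1)*(C+1)/(A+1), rearranged to use integers only.
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        return "FNE"
    return "RECT"


if __name__ == "__main__":
    print(next_marking(S=0, F=0, R=0, A=0, C=0))  # prints FNE
    print(next_marking(S=1, F=1, R=0, A=1, C=0))  # prints RECT
    print(next_marking(S=2, F=1, R=0, A=1, C=0))  # prints FNE
```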

2225	   This means that at any point, given A, R, F, C, the source could send
2226	   another k RECT packets, so that k < (F+R)*(A+1)/(C+1)-S

2228	   The above scheme is independent of the actions of both the dropper
2229	   and policer and doesn't depend on the rate adaptation discipline of
2230	   the source.  It only defines Re-Echo packets as notification of
2231	   effective end-to-end congestion (as witnessed at the previous round-
2232	   trip), and FNE packets as notification of speculative end-to-end
2233	   congestion based on a high estimate of congestion.

2235	   In practice, for any source:

2237	   o  for the first packet, A=R=F=C=S=0 ==> 1 FNE

2239	   o  if the acknowledgment doesn't echo a mark

2241	      *  for the second packet, A=F=S=1 R=C=0 ==> 1 RECT

2243	      *  for the third packet, S=2 A=F=1 R=C=0 ==> 1 FNE

2245	   o  if no acknowledgement for these two packets echoes a congestion
2246	      mark, then {A=S=3 F=2 R=C=0} which gives k<2*4/1-3, so the source
2247	      could send another 4 RECT packets. ==> 4 RECT

2248	   o  if no acknowledgement for these four packets echoes a congestion
2249	      mark, then {A=S=7 F=2 R=C=0} which gives k<2*8/1-7, so the source
2250	      could send another 8 RECT packets. ==> 8 RECT

2252	   This behaviour happens to match TCP's congestion window control in
2253	   slow start, which is why for TCP sources, only the first and third
2254	   packets need be FNE packets.
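
   The match with slow start can be checked with a small simulation.
   The sketch below is an idealisation, not a model of a real TCP
   stack: it assumes one acknowledgment per packet, that every packet
   of a round trip is sent before any of its acknowledgments arrive,
   that no acknowledgment echoes a congestion mark, and that the window
   doubles every round trip.  The function name is hypothetical.

```python
def markings_in_slow_start(rounds: int) -> list[list[str]]:
    """Return the markings chosen in each round trip of idealised slow start."""
    S = F = R = A = C = 0  # counters as defined in the text
    window = 1
    history = []
    for _ in range(rounds):
        sent = []
        for _ in range(window):
            if R < C:
                sent.append("Re-Echo")
                R += 1
            elif (F + R) * (A + 1) < (S + 1) * (C + 1):
                # (F+R) < (S+1)*(C+1)/(A+1), cross-multiplied
                sent.append("FNE")
                F += 1
            else:
                sent.append("RECT")
            S += 1
        history.append(sent)
        A += window   # the whole round is acknowledged, none echo CE
        window *= 2   # window doubles each round trip
    return history


for number, sent in enumerate(markings_in_slow_start(4), start=1):
    print(number, sent.count("FNE"), "FNE,", sent.count("RECT"), "RECT")
```

   Under these assumptions only the first and third packets come out as
   FNE, and the following rounds consist of 4 and then 8 RECT packets.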

2256	   A source that opens its congestion window any more quickly would
2257	   have to insert more FNE packets.  As another example, a UDP source
2258	   sending VBR traffic might need to send several FNE packets ahead of
2259	   the traffic peaks it generates.

2261	Appendix E.  Argument for holding back the ECN nonce

2263	   The ECN nonce is a mechanism that allows a /sending/ transport to
2264	   detect if drop or ECN marking at a congested router has been
2265	   suppressed by a node somewhere in the feedback loop, whether another
2266	   router or the receiver.

2268	   Space for the ECN nonce was set aside in [RFC3168] (currently
2269	   proposed standard) while the full nonce mechanism is specified in
2270	   [RFC3540] (currently experimental).  The specification in [RFC4340]
2271	   (currently proposed standard) requires that "Each DCCP sender SHOULD
2272	   set ECN Nonces on its packets...".  It also mandates as a requirement
2273	   for all CCID profiles that "Any newly defined acknowledgement
2274	   mechanism MUST include a way to transmit ECN Nonce Echoes back to the
2275	   sender.", therefore:

2277	   o  The CCID profile for TCP-like Congestion Control [RFC4341]
2278	      (currently proposed standard) says "The sender will use the ECN
2279	      Nonce for data packets, and the receiver will echo those nonces in
2280	      its Ack Vectors."

2282	   o  The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342]
2283	      recommends that "The sender [use] Loss Intervals options' ECN
2284	      Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to
2285	      probabilistically verify that the receiver is correctly reporting
2286	      all dropped or marked packets."

2288	   The primary function of the ECN nonce is to protect the integrity of
2289	   the information about congestion: ECN marks and packet drops.
2290	   However, when the nonce is used to protect the integrity of
2291	   information about packet drops, rather than ECN marks, a transport
2292	   layer nonce will always be sufficient (because a drop loses the
2293	   transport header as well as the ECN field in the network header),
2294	   which would avoid using scarce IP header codepoint space.  Similarly,
2295	   a transport layer nonce would protect against a receiver sending
2296	   early acknowledgements [Savage99].

2298	   If the ECN nonce reveals integrity problems with the information
2299	   about congestion, the sending transport can use that knowledge for
2300	   two functions:

2302	   o  to protect its own resources, by allocating them in proportion to
2303	      the rates that each network path can sustain, based on congestion
2304	      control,

2306	   o  and to protect congested routers in the network, by drastically
2307	      slowing down its connection to the destination with corrupt
2308	      congestion information.

2310	   If the sending transport chooses to act in the interests of congested
2311	   routers, it can reduce its rate if it detects that some malicious
2312	   party in the feedback loop may be suppressing ECN feedback.  But it
2313	   would only be useful to congested routers when /all/ senders using
2314	   them are trusted to act in the interest of the congested routers.

2316	   In the end, the only essential use of a network layer nonce is when
2317	   sending transports (e.g. large servers) want to allocate their /own/
2318	   resources in proportion to the rates that each network path can
2319	   sustain, based on congestion control.  In that case, the nonce allows
2320	   senders to be assured that they aren't being duped into giving more
2321	   of their own resources to a particular flow.  And if congestion
2322	   suppression is detected, the sending transport can rate limit the
2323	   offending connection to protect its own resources.  Certainly, this
2324	   is a useful function, but the IETF should carefully decide whether
2325	   such a single, very specific case warrants IP header space.

2327	   In contrast, Re-ECN allows all routers to fully protect themselves
2328	   from such attacks, without having to trust anyone - senders,
2329	   receivers, neighbouring networks.  Re-ECN is therefore proposed in
2330	   preference to the ECN nonce on the basis that it addresses the
2331	   generic problem of accountability for congestion of a network's
2332	   resources at the IP layer.

2334	   Delaying the ECN nonce is justified because the applicability of the
2335	   ECN nonce seems too limited for it to consume a two-bit codepoint in
2336	   the IP header.  It therefore seems prudent to give time for an
2337	   alternative way to be found to do the one function the nonce is
2338	   essential for.

2340	   Moreover, while we have re-designed the Re-ECN codepoints so that
2341	   they do not prevent the ECN nonce progressing, the same is not true
2342	   the other way round.  If the ECN nonce started to see some deployment
2343	   (perhaps because it was blessed with proposed standard status),
2344	   incremental deployment of Re-ECN would effectively be impossible,
2345	   because Re-ECN marking fractions at inter-domain borders would be
2346	   polluted by unknown levels of nonce traffic.

2348	   The authors are aware that Re-ECN must prove it has the potential it
2349	   claims if it is to displace the nonce.  Therefore, every effort has
2350	   been made to complete a comprehensive specification of Re-ECN so that
2351	   its potential can be assessed.  We therefore seek the opinion of the
2352	   Internet community on whether the Re-ECN protocol is sufficiently
2353	   useful to warrant standards action.

2355	Appendix F.  Alternative Terminology Used in Other Documents

2357	   A number of alternative terms have been used in various documents
2358	   describing re-feedback and re-ECN.  These are set out in the
2359	   following table.

2361	        +---------------------+----------------+------------------+
2362	        | Current Terminology | EECN codepoint |      Colour      |
2363	        +---------------------+----------------+------------------+
2364	        |       Cautious      |       FNE      |       Green      |
2365	        |       Positive      |     Re-Echo    |       Black      |
2366	        |       Neutral       |      RECT      |       Grey       |
2367	        |       Negative      |     CE(-1)     |        Red       |
2368	        |      Cancelled      |      CE(0)     |     Red-Black    |
2369	        |      Legacy ECN     |     ECT(0)     |       White      |
2370	        |   Currently Unused  |     --CU--     | Currently unused |
2371	        |                     |                |                  |
2372	        |        Legacy       |     Not-ECT    |       White      |
2373	        +---------------------+----------------+------------------+

2375	                  Table 7: Alternative re-ECN Terminology

2377	Authors' Addresses

2379	   Bob Briscoe (editor)
2380	   BT
2381	   B54/77, Adastral Park
2382	   Martlesham Heath
2383	   Ipswich  IP5 3RE
2384	   UK

2386	   Phone: +44 1473 645196
2387	   EMail: bob.briscoe@bt.com
2388	   URI:   http://bobbriscoe.net/

2389	   Arnaud Jacquet
2390	   BT
2391	   B54/70, Adastral Park
2392	   Martlesham Heath
2393	   Ipswich  IP5 3RE
2394	   UK

2396	   Phone: +44 1473 647284
2397	   EMail: arnaud.jacquet@bt.com
2398	   URI:

2400	   Toby Moncaster
2401	   Moncaster.com
2402	   Dukes
2403	   Layer Marney
2404	   Colchester  CO5 9UZ
2405	   UK

2407	   EMail: toby@moncaster.com

2409	   Alan Smith
2410	   BT
2411	   B54/76, Adastral Park
2412	   Martlesham Heath
2413	   Ipswich  IP5 3RE
2414	   UK

2416	   Phone: +44 1473 640404
2417	   EMail: alan.p.smith@bt.com