idnits 2.17.1 

draft-briscoe-conex-re-ecn-tcp-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 11, 2014) is 3698 days in the past.  Is this
     intentional?


  Checking references for intended status: Historic
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 4835 (Obsoleted by RFC 7321)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-conex-tcp-modifications-05

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Transport Area Working Group                             B. Briscoe, Ed.
3	Internet-Draft                                                A. Jacquet
4	Intended status: Historic                                             BT
5	Expires: September 12, 2014                                 T. Moncaster
6	                                                           Moncaster.com
7	                                                                A. Smith
8	                                                                      BT
9	                                                          March 11, 2014

11	     Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
12	                   draft-briscoe-conex-re-ecn-tcp-03

14	Abstract

16	   This document introduces re-ECN (re-inserted explicit congestion
17	   notification), which is intended to make a simple but far-reaching
18	   change to the Internet architecture.  The sender uses the IP header
19	   to reveal the congestion that it expects on the end-to-end path.  The
20	   protocol works by arranging an extended ECN field in each packet so
21	   that, as it crosses any interface in an internetwork, it will carry a
22	   truthful prediction of congestion on the remainder of its path.  It
23	   can be deployed incrementally around unmodified routers.  The purpose
24	   of this document is to specify the re-ECN protocol at the IP layer
25	   and to give guidelines on any consequent changes required to
26	   transport protocols.  It includes the changes required to TCP both as
27	   an example and as a specification.  It briefly gives examples of
28	   mechanisms that can use the protocol to ensure data sources respond
29	   sufficiently to congestion, but these are described more fully in a
30	   companion document.

32	   Note concerning Intended Status: If this draft were ever published as
33	   an RFC it would probably have historic status.  There is limited
34	   space in the IP header, so re-ECN had to compromise by requiring the
35	   receiver to be ECN-enabled; otherwise the sender could not use re-ECN.
36	   Re-ECN was a precursor to the chartering of the IETF's Congestion
37	   Exposure (ConEx) working group, but during chartering there were
38	   still too few ECN-enabled receivers, so it was decided to pursue
39	   other compromises in order to fit a similar capability into the IP
40	   header.

42	Status of This Memo

44	   This Internet-Draft is submitted in full conformance with the
45	   provisions of BCP 78 and BCP 79.

47	   Internet-Drafts are working documents of the Internet Engineering
48	   Task Force (IETF).  Note that other groups may also distribute
49	   working documents as Internet-Drafts.  The list of current Internet-
50	   Drafts is at http://datatracker.ietf.org/drafts/current/.

52	   Internet-Drafts are draft documents valid for a maximum of six months
53	   and may be updated, replaced, or obsoleted by other documents at any
54	   time.  It is inappropriate to use Internet-Drafts as reference
55	   material or to cite them other than as "work in progress."

57	   This Internet-Draft will expire on September 12, 2014.

59	Copyright Notice

61	   Copyright (c) 2014 IETF Trust and the persons identified as the
62	   document authors.  All rights reserved.

64	   This document is subject to BCP 78 and the IETF Trust's Legal
65	   Provisions Relating to IETF Documents
66	   (http://trustee.ietf.org/license-info) in effect on the date of
67	   publication of this document.  Please review these documents
68	   carefully, as they describe your rights and restrictions with respect
69	   to this document.  Code Components extracted from this document must
70	   include Simplified BSD License text as described in Section 4.e of
71	   the Trust Legal Provisions and are provided without warranty as
72	   described in the Simplified BSD License.

74	Table of Contents

76	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  5
77	   2.  Requirements notation  . . . . . . . . . . . . . . . . . . . .  6
78	   3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  6
79	   4.  Protocol Overview  . . . . . . . . . . . . . . . . . . . . . .  7
80	     4.1.  Simplified Re-ECN Protocol . . . . . . . . . . . . . . . .  7
81	       4.1.1.  Congestion Control and Policing the Protocol . . . . .  8
82	       4.1.2.  Background and Applicability . . . . . . . . . . . . .  8
83	     4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
84	           v6)  . . . . . . . . . . . . . . . . . . . . . . . . . . .  9
85	     4.3.  Re-ECN Protocol Operation  . . . . . . . . . . . . . . . . 11
86	     4.4.  Positive and Negative Flows  . . . . . . . . . . . . . . . 13
87	   5.  Network Layer  . . . . . . . . . . . . . . . . . . . . . . . . 14
88	     5.1.  Re-ECN IPv4 Wire Protocol  . . . . . . . . . . . . . . . . 14
89	     5.2.  Re-ECN IPv6 Wire Protocol  . . . . . . . . . . . . . . . . 16
90	     5.3.  Router Forwarding Behaviour  . . . . . . . . . . . . . . . 17
91	     5.4.  Justification for Setting the First SYN to FNE . . . . . . 18
92	     5.5.  Control and Management . . . . . . . . . . . . . . . . . . 19
93	       5.5.1.  Negative Balance Warning . . . . . . . . . . . . . . . 19
94	       5.5.2.  Rate Response Control  . . . . . . . . . . . . . . . . 20
95	     5.6.  IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 20
96	     5.7.  Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 21

98	   6.  Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 22
99	     6.1.  TCP  . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
100	       6.1.1.  RECN mode: Full Re-ECN capable transport . . . . . . . 23
101	       6.1.2.  RECN-Co mode: Re-ECT Sender with a RFC3168
102	               compliant ECN Receiver . . . . . . . . . . . . . . . . 25
103	       6.1.3.  Capability Negotiation . . . . . . . . . . . . . . . . 27
104	       6.1.4.  Extended ECN (EECN) Field Settings during Flow
105	               Start or after Idle Periods  . . . . . . . . . . . . . 28
106	       6.1.5.  Pure ACKS, Retransmissions, Window Probes and
107	               Partial ACKs . . . . . . . . . . . . . . . . . . . . . 32
108	     6.2.  Other Transports . . . . . . . . . . . . . . . . . . . . . 33
109	       6.2.1.  General Guidelines for Adding Re-ECN to Other
110	               Transports . . . . . . . . . . . . . . . . . . . . . . 33
111	       6.2.2.  Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 33
112	       6.2.3.  Guidelines for adding Re-ECN to DCCP . . . . . . . . . 34
113	       6.2.4.  Guidelines for adding Re-ECN to SCTP . . . . . . . . . 34
114	   7.  Incremental Deployment . . . . . . . . . . . . . . . . . . . . 34
115	   8.  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 35
116	     8.1.  Congestion Notification Integrity  . . . . . . . . . . . . 36
117	   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 37
118	   10. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 38
119	   11. Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 39
120	   12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 39
121	   13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 39
122	   14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 39
123	     14.1. Normative References . . . . . . . . . . . . . . . . . . . 39
124	     14.2. Informative References . . . . . . . . . . . . . . . . . . 40
125	   Appendix A.  Precise Re-ECN Protocol Operation . . . . . . . . . . 42
126	   Appendix B.  Justification for Two Codepoints Signifying Zero
127	                Worth Packets . . . . . . . . . . . . . . . . . . . . 44
128	   Appendix C.  ECN Compatibility . . . . . . . . . . . . . . . . . . 45
129	   Appendix D.  Packet Marking with FNE During Flow Start . . . . . . 47
130	   Appendix E.  Argument for holding back the ECN nonce . . . . . . . 49
131	   Appendix F.  Alternative Terminology Used in Other Documents . . . 51

133	Authors' Statement: (to be removed by the RFC Editor)

135	   The most immediate priority for the authors is to delay any move of
136	   the ECN nonce to Proposed Standard status, in order to leave options
137	   open for the future.  The argument for this position is developed in
138	   Appendix E.

140	Changes from previous drafts (to be removed by the RFC Editor)

142	   Full diffs from all previous versions (created using the rfcdiff
143	   tool) are available at <http://www.bobbriscoe.net/pubs.html#retcp>

145	   From draft-briscoe-conex-...-02 to -03 (current version):  Re-issued
146	      to keep alive; updated references

148	   From draft-briscoe-conex-...-01 to -02:  Re-issued
149	      to keep alive; updated references

151	   From draft-briscoe-conex-...-00 to -01:  Re-issued to keep alive;
152	      updated references

154	   From draft-briscoe-tsvwg-...-08 to draft-briscoe-conex-...-00:

156	      Re-issued to keep alive for reference by ConEx working group

158	      Changed working group tag in filename from tsvwg to conex

160	      Changed intended status to historic and added explanatory note

162	      Updated references.  Also, now that RFC6040 has been published,
163	      the section on tunnelling required a re-write

165	      Corrected name of CE(0) to Cancelled in Table 2

167	      Noted errors and omissions (rather than spending time correcting
168	      them):

170	      *  Made a few 'ToDo' comments visible that had previously been
171	         comments within the document source

173	      *  Identified errors with 'ToDo' comments, referring to correct
174	         material where possible.

176	   From -08 to -09:

178	      Re-issued to keep alive for reference by ConEx working group.

180	      Hardly any changes to content, even where it is out of date,
181	      except references updated.

183	   From -07 to -08:

185	      Minor changes and consistency checks.

187	      References updated.

189	   From -06 to -07:

191	      Major changes made following splitting this protocol document from
192	      the related motivations document [I-D.re-ecn-motiv].

194	      Significant re-ordering of remaining text.

196	      New terminology introduced for clarity.

198	      Minor editorial changes throughout.

200	1.  Introduction

202	   This document provides a complete specification for the addition of
203	   the re-ECN protocol to IP and guidelines on how to add it to
204	   transport layer protocols, including a complete specification of re-
205	   ECN in TCP as an example.  The motivation behind this proposal is
206	   given in [I-D.re-ecn-motiv], but we include a brief summary here.

208	   Re-ECN is intended to allow senders to inform the network of the
209	   level of congestion they expect their flows to see.  This information
210	   is currently only visible at the transport layer.  ECN [RFC3168]
211	   reveals the congestion state of the path upstream of the receiver,
212	   which can be monitored as the rate of CE marks.  The receiver then
213	   informs the sender when it has seen a marked packet.  Re-ECN builds
214	   on ECN by providing new codepoints that allow the sender to declare
215	   the level of congestion it expects on the forward path.  It is
216	   closely related to ECN; indeed, we define a compatibility mode to
217	   allow a re-ECN sender to communicate with an ECN receiver.

219	   If a sender understates expected congestion compared to actual
220	   congestion, the network could discard packets or enact some other
221	   sanction.  A policer that limits the level of congestion being
222	   caused can also be introduced at the ingress of networks.

224	   A general statement of the problem solved by re-ECN is to provide
225	   sufficient information in each IP datagram to be able to hold senders
226	   and whole networks accountable for the congestion they cause
227	   downstream, before they cause it.  But the everyday problems that
228	   re-ECN can solve are much more recognisable than this rather generic
229	   statement: mitigating distributed denial of service (DDoS);
230	   simplifying differentiation of quality of service (QoS); policing
231	   compliance with congestion control; and so on.

233	   It is important to add a few key points.

235	   o  In any standard network it takes at least one round trip before
236	      any feedback is received.  For this reason a sender must make a
237	      conservative prediction by transmitting IP packets with a special
238	      Cautious marking when it is unsure of the state of the network.

240	   o  It should be noted that the prediction is carried in-band in
241	      normal data packets and for many transports feedback can be
242	      carried in the normal acknowledgements or control packets.

244	   o  The re-ECN protocol is independent of the transport.  In TCP,
245	      acknowledgments are used to convey the feedback from receiver to
246	      sender.  This memo concentrates on TCP as an example transport
247	      protocol; however, the re-ECN protocol is compatible with any
248	      transport where feedback can be sent from receiver to sender.

250	   This document is structured as follows.  First an overview of the re-
251	   ECN protocol is given (Section 4), outlining its attributes and
252	   explaining conceptually how it works as a whole.  The two main parts
253	   of the document follow: the protocol specification, divided into
254	   network (Section 5) and transport (Section 6) layers.  Deployment
255	   issues discussed throughout the document are brought together in
256	   Section 7.  Related work is discussed in Section 8.

258	2.  Requirements notation

260	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
261	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
262	   document are to be interpreted as described in [RFC2119].

264	3.  Terminology

266	   {ToDo: No attempt has been made to bring terminology into line with
267	   that agreed within the ConEx working group.  For instance the term
268	   dropper remains unchanged, even though the ConEx w-g has decided to
269	   call it an audit function (which is actually a much better term).}

271	   The following terminology is used throughout this memo.  Some of this
272	   terminology has changed as this draft has been revised.  Therefore,
273	   to help avoid confusion, Appendix F sets out all the alternative
274	   terminology that has been used in other re-ECN related documents.

276	   o  Neutral packet - a packet that can be congestion marked by an
277	      ECN or re-ECN queue.

279	   o  Negative packet - a Neutral packet that has been congestion marked
280	      by an ECN or re-ECN queue.

282	   o  Positive packet - a packet that has been marked by the sender to
283	      indicate the expected level of congestion along its path.  In
284	      general Positive packets should only be sent in response to
285	      feedback received from the receiver.*

287	   o  Cancelled packet - a Positive packet that has been congestion
288	      marked by an ECN or re-ECN queue.

290	   o  Cautious packet - a packet that has been marked by the sender to
291	      indicate the expected level of congestion along its path.  In
292	      general Cautious packets should be used when there is insufficient
293	      feedback to be confident about the congestion state of the
294	      network.*

296	      * The difference between Positive and Cautious packets is
297	      explained in detail later in the document along with guidelines on
298	      the use of Cautious packets.

300	   All the above terms have related IP codepoints, as defined in
301	   Section 5.

303	4.  Protocol Overview

305	4.1.  Simplified Re-ECN Protocol

307	   We describe here the simplified re-ECN protocol.  To simplify the
308	   description we assume packets and segments are synonymous.

310	   Packets are sent from a sender to a receiver.  In Figure 1 the queues
311	   (Q1 and Q2) are ECN enabled as per RFC 3168 [RFC3168].  If congestion
312	   occurs then packets are marked with the congestion experienced (CE)
313	   codepoint exactly as in the ECN protocol [RFC3168]; the routers do
314	   not need to be modified and do not need to know the re-ECN protocol.
315	   The receiver continually informs the sender of the current count of
316	   Negative packets it has seen.  The sender uses this information to
317	   determine how many Positive packets it must send into the network.
318	   The sender's aim is to balance the number of bytes that have been
319	   congestion marked with the number of Positive bytes it has sent.

321	          +--------- Feedback----------+
322	          |                            |
323	          v                            |
324	        +---+    +----+    +----+    +---+
325	        |   |    |    |    |    |    |   |
326	        | S |--->| Q1 |--->| Q2 |--->| R |
327	        |   |    |    |    |    |    |   |
328	        +---+    +----+    +----+    +---+

330	                          Figure 1: Simple Re-ECN

332	4.1.1.  Congestion Control and Policing the Protocol

334	   The arrangement of the protocol ensures that packets carry a
335	   declaration of the amount of congestion that will be experienced on
336	   the path.  The re-ECN protocol is orthogonal to any congestion
337	   control algorithms, but can be used to ensure that congestion control
338	   is being applied by the sender.

340	   In general we assume that there will be a policer at the network
341	   ingress which can rate limit traffic based on the amount of
342	   congestion declared.

344	   At the network egress there is a dropper which can impose sanctions
345	   on flows that incorrectly declare congestion.

347	   Policers and droppers are explained in more detail in
348	   [I-D.re-ecn-motiv].

350	4.1.2.  Background and Applicability

352	   The re-ECN protocol makes no changes and has no effect on the TCP
353	   congestion control algorithm or on other rate responses to
354	   congestion.  Re-ECN is not a new congestion control protocol, rather
355	   it is orthogonal to congestion control itself.  Re-ECN is concerned
356	   with revealing information about congestion so that users and
357	   networks can be held accountable for the congestion they cause, or
358	   allow to be caused.

360	   Re-ECN builds on ECN so we briefly recap the essentials of the ECN
361	   protocol [RFC3168].  Two bits in the IP header (v4 or v6) are
362	   assigned to the ECN field.  The sender clears the field to "00"
363	   (Not-ECT) if either end-point transport is not ECN-capable.
364	   Otherwise it indicates an ECN-capable transport (ECT) using either of
365	   the two code-points "10" or "01" (ECT(0) and ECT(1) respectively).

367	   ECN-capable queues probabilistically set this field to "11" if
368	   congestion is experienced (CE).  In general this marking probability
369	   will increase with the length of the queue at its egress link
370	   (typically using the RED algorithm [RFC2309]).  However, they still
371	   drop rather than mark Not-ECT packets.  With multiple ECN-capable
372	   queues on a path, a flow of packets accumulates the fraction of CE
373	   marking that each queue adds.  The combined effect of the packet
374	   marking of all the queues along the path signals congestion of the
375	   whole path to the receiver.  So, for example, if one queue early in a
376	   path is marking 1% of packets and another later in a path is marking
377	   2%, flows that pass through both queues will experience approximately
378	   3% marking (see Appendix A for a precise treatment).

380	   The choice of two ECT code-points in the ECN field [RFC3168]
381	   permitted future flexibility, optionally allowing the sender to
382	   encode the experimental ECN nonce [RFC3540] in the packet stream.
383	   The nonce is designed to allow a sender to check the integrity of
384	   congestion feedback.  But Section 8.1 explains that it still gives no
385	   control over how fast the sender transmits as a result of the
386	   feedback.  On the other hand, re-ECN is designed both to ensure that
387	   congestion is declared honestly and that the sender's rate responds
388	   appropriately.

390	   Re-ECN is based on a feedback arrangement called `re-
391	   feedback' [Re-fb].  The word is short for either receiver-aligned,
392	   re-inserted or re-echoed feedback.  But it actually works even when
393	   no feedback is available.  In fact it has been carefully designed to
394	   work for single datagram flows.  It also encourages aggregation of
395	   single packet flows by congestion control proxies.  Then, even if the
396	   traffic mix of the Internet were to become dominated by short
397	   messages, it would still be possible to control congestion
398	   effectively and efficiently.

400	   Changing the Internet's feedback architecture seems to imply
401	   considerable upheaval.  But re-ECN can be deployed incrementally at
402	   the transport layer around unmodified queues using existing fields in
403	   IP (v4 or v6).  However, in IPv4 it does also require the last
404	   undefined bit in the header, which it uses in combination with the
405	   2-bit ECN field to create four new codepoints.  Although queues need
406	   not be modified, we RECOMMEND adding optional preferential drop to IP
407	   queues based on the re-ECN fields in order to improve resilience
408	   against DoS attacks.  Similarly, re-ECN works best if both the
409	   sender and receiver transports are re-ECN-capable, but it can work
410	   with just sender support (Section 6.1.2).

412	4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)

414	   The re-ECN wire protocol uses the two bit ECN field broadly as in
415	   RFC3168 [RFC3168] as described above, but with five differences of
416	   detail (brought together in a list in Section 7).  This specification
417	   defines a new re-ECN extension (RE) flag.  We will defer the
418	   definition of the actual position of the RE flag in the IPv4 & v6
419	   headers until Section 5.  When we don't need to choose between IPv4
420	   and v6 wire protocols it will suffice to call it the RE flag.

422	   Unlike the ECN field, the RE flag is intended to be set by the sender
423	   and SHOULD remain unchanged along the path, although it can be read
424	   by network elements that understand the re-ECN protocol.  It is
425	   feasible that a network element MAY change the setting of the RE
426	   flag, perhaps acting as a proxy for an end-point, but such a protocol
427	   would have to be defined in another specification
428	   (e.g. [I-D.re-pcn-border-cheat]).

430	   Although the RE flag is a separate, single bit field, it can be read
431	   as an extension to the two-bit ECN field; the three concatenated bits
432	   form what we will call the extended ECN field (EECN), giving eight
433	   codepoints.  We will use the RFC3168 names of the ECN codepoints to
434	   describe settings of the ECN field when the RE flag setting is "don't
435	   care", but we also define the following six extended ECN codepoint
436	   names for when we need to be more specific.

438	   One of re-ECN's codepoints is an alternative use of the codepoint set
439	   aside in RFC3168 for the ECN nonce (ECT(1)).  Transports using re-ECN
440	   do not need to use the ECN nonce as long as the sender is also
441	   checking for transport protocol compliance [tcp-rcv-cheat].  The case
442	   for doing this is given in Appendix E.  Two re-ECN codepoints are
443	   given compatible uses to those defined in RFC3168 (Not-ECT and CE).
444	   The other codepoint used by RFC3168 (ECT(0)) isn't used for re-ECN.
445	   Altogether this leaves one codepoint of the eight unused by ECN or
446	   re-ECN and available for future use.

448	   +--------+-------------+-------+-----------+------------------------+
449	   |   ECN  |   RFC3168   |   RE  |    EECN   |     re-ECN meaning     |
450	   |  field |  codepoint  |  flag | codepoint |                        |
451	   +--------+-------------+-------+-----------+------------------------+
452	   |   00   |   Not-ECT   |   0   |  Not-ECT  |   Not re-ECN-capable   |
453	   |        |             |       |           |   transport (Legacy)   |
454	   |   00   |     ---     |   1   |    FNE    |      Feedback not      |
455	   |        |             |       |           | established (Cautious) |
456	   |   01   |    ECT(1)   |   0   |  Re-Echo  |  Re-echoed congestion  |
457	   |        |             |       |           |   and RECT (Positive)  |
458	   |   01   |     ---     |   1   |    RECT   |     Re-ECN capable     |
459	   |        |             |       |           |   transport (Neutral)  |
460	   |   10   |    ECT(0)   |   0   |   ECT(0)  |  RFC3168 ECN use only  |
461	   |        |             |       |           |                        |
462	   |   10   |     ---     |   1   |   --CU--  |    Currently unused    |
463	   |        |             |       |           |                        |
464	   |   11   |      CE     |   0   |   CE(0)   |  Re-Echo cancelled by  |
465	   |        |             |       |           |     CE (Cancelled)     |
466	   |   11   |     ---     |   1   |   CE(-1)  | Congestion Experienced |
467	   |        |             |       |           |       (Negative)       |
468	   +--------+-------------+-------+-----------+------------------------+

470	                     Table 1: Extended ECN Codepoints

472	4.3.  Re-ECN Protocol Operation

474	   In this section we will give an overview of the operation of the re-
475	   ECN protocol for TCP/IP, leaving a detailed specification to the
476	   following sections.  Other transports will be discussed later.

478	   {ToDo: This section to be updated to explain that the sender re-
479	   echoes losses in the same way as ECN markings.}

481	   In summary, the protocol adds a third `re-echo' stage to the existing
482	   TCP/IP ECN protocol.  Whenever the network adds CE congestion
483	   signalling to the IP header on the forward data path, the receiver
484	   feeds it back to the ingress using TCP, then the sender re-echoes it
485	   into the forward data path using the RE flag in the next packet.

487	   Prior to receiving any feedback a sender will not know which setting
488	   of the RE flag to use, so it sends Cautious packets by setting the
489	   FNE codepoint.  The network reads the FNE codepoint conservatively as
490	   equivalent to re-echoed congestion.

492	   Specifically, once feedback from an ECN or re-ECN capable flow is
493	   established, a re-ECN sender always initialises the ECN field to
494	   ECT(1).  And it usually sets the RE flag to "1" indicating a Neutral
495	   packet.  Whenever a queue marks a packet to CE, the receiver feeds
496	   back this event to the sender.  On receiving this feedback, the re-
497	   ECN sender will clear the RE flag to "0" in the next packet it sends
498	   (indicating a Positive packet).

500	   We chose to set and clear the RE flag this way round to ease
501	   incremental deployment (see Section 7).  To avoid confusion we will
502	   use the term `blanking' (rather than marking) when the RE flag is
503	   cleared to "0".  So, over a stream of packets, we will talk of the
504	   `RE blanking fraction' as the fraction of octets in packets with the
505	   RE flag cleared to "0".

507	       +---+  +----+                +----+  +---+
508	       | S |--| Q1 |----------------| Q2 |--| R |
509	       +---+  +----+                +----+  +---+
510	         .      .                      .      .
511	       ^ .      .                      .      .
512	       | .      .                      .      .
513	       | .     RE blanking fraction    .      .
514	    3% |-------------------------------+=======
515	       | .      .                      |      .
516	    2% | .      .                      |      .
517	       | .      .  CE marking fraction |      .
518	    1% | .      +----------------------+      .
519	       | .      |                      .      .
520	    0% +--------------------------------------->
521	         ^          ^                      ^
522	         L          M                      N    Observation points

524	                  Figure 2: A 2-Queue Example (Imprecise)

526	   Figure 2 uses a simple network to illustrate how re-ECN allows queues
527	   to measure downstream congestion.  The receiver views a CE marking
528	   fraction of 3% which is fed back to the sender.  The sender sets an
529	   RE blanking fraction of 3% to match this.  This RE blanking fraction
530	   can be observed along the path as the RE flag is not changed by
531	   network nodes once set by the sender.  This is shown by the
532	   horizontal line at 3% in the figure.  The CE marked fraction is shown
533	   by the stepped line which rises to meet the RE blanking fraction line
534	   with steps at each queue where packets are marked.  Two queues are
535	   shown (Q1 and Q2) that are currently congested.  As packets pass
536	   through each queue a fraction are marked (1% at Q1 and 2% at Q2).
537	   The approximate downstream congestion can be measured at the
538	   observation points shown along the path by subtracting the CE
539	   marking fraction from the RE blanking fraction: about 3% - 0% = 3%
540	   at L, 3% - 1% = 2% at M and 3% - 3% = 0% at N (Appendix A derives
541	   these approximations from a precise analysis).  NB due to the unary
542	   nature of ECN marking and the equivalent unary nature of re-ECN
543	   blanking, the precise fraction of marked bytes must be calculated
544	   by maintaining a moving average of the octets in marked packets as
545	   a proportion of all octets.

547	   Along the path the fraction of packets that had their RE flag
548	   cleared remains unchanged, so it can be used as a reference against
549	   which to compare upstream congestion.  The difference predicts
550	   downstream congestion for the rest of the path.  Therefore, measuring
551	   the fractions of each codepoint at any point in the Internet will
552	   reveal upstream, downstream and whole path congestion.

   Note that we have introduced discussion of marking and blanking
   fractions solely for illustration.  We are not saying any protocol
   handler will work with these average fractions directly.  In fact,
   the protocol requires the number of marked and blanked bytes to
   balance by the time packets reach the receiver.

560	4.4.  Positive and Negative Flows

562	   In Section 3 we introduced the terms Positive, Neutral, Negative,
563	   Cautious and Cancelled.  This terminology is based on the requirement
564	   to balance the proportion of bytes marked as CE with the proportion
565	   of bytes that are re-echo marked.  In the rest of this memo we will
566	   loosely talk of positive or negative flows, meaning flows where the
567	   moving average of the downstream congestion metric is persistently
568	   positive or negative.  A negative flow is one where more CE marked
569	   packets than re-ECN blanked packets arrive.  Likewise in positive
570	   flows more re-ECN blanked packets arrive than CE marked packets.  The
571	   notion of a negative metric arises because it is derived by
572	   subtracting one metric from another.  Of course actual downstream
573	   congestion cannot be negative, only the metric can (whether due to
574	   time lags or deliberate malice).

576	   Therefore we will talk of packets having `worth' of +1, 0 or -1,
577	   which, when multiplied by their size, indicates their contribution to
578	   the downstream congestion metric.  The worth of each type of packet
579	   is given below in Table 2.  The idea is that most flows start with
580	   zero worth.  Every time the network decrements the worth of a packet,
581	   the sender increments the worth of a later packet.  Then, over time,
582	   as many positive octets should arrive at the receiver as negative.
583	   Note we have said octets not packets, so if packets are of different
584	   sizes, the worth should be incremented on enough octets to balance
585	   the octets in negative packets arriving at the receiver.  It is this
586	   balance that will allow the network to hold the sender accountable
587	   for the congestion it causes.

589	   If a packet carrying re-echoed congestion happens to also be
590	   congestion marked, the +1 worth added by the sender will be cancelled
591	   out by the -1 network congestion marking.  Although the two worth
592	   values correctly cancel out, neither the congestion marking nor the
593	   re-echoed congestion are lost, because the RE bit and the ECN field
594	   are orthogonal.  So, whenever this happens, the receiver will
595	   correctly detect and re-echo the new congestion event as well.

597	   The table below specifies unambiguously the worth of each extended
598	   ECN codepoint.  Note the order is different from the previous table
599	   to better show how the worth increments and decrements.

601	   +---------+-------+---------------+-------+-------------------------+
602	   |   ECN   |   RE  | Extended ECN  | Worth |       Re-ECN Term       |
603	   |  field  |  bit  | codepoint     |       |                         |
604	   +---------+-------+---------------+-------+-------------------------+
605	   |    00   |   0   | Not-RECT      | ...   |           ---           |
606	   |    00   |   1   | FNE           | +1    |         Cautious        |
607	   |    01   |   0   | Re-Echo       | +1    |         Positive        |
608	   |    10   |   0   | Legacy        | ...   |   RFC3168 ECN use only  |
609	   |         |       |               |       |                         |
610	   |    11   |   0   | CE(0)         |  0    |        Cancelled        |
611	   |    01   |   1   | RECT          |  0    |         Neutral         |
612	   |    10   |   1   | --CU--        | ...   |     Currently unused    |
613	   |         |       |               |       |                         |
614	   |    11   |   1   | CE(-1)        | -1    |         Negative        |
615	   +---------+-------+---------------+-------+-------------------------+

617	                Table 2: 'Worth' of Extended ECN Codepoints
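   The octet balance described in Section 4.4 can be computed directly
   from the worth column of Table 2.  The Python sketch below is
   illustrative only, and the names WORTH and octet_balance are
   hypothetical.  The sketch assumes that packets whose codepoint has no
   worth in Table 2 are simply skipped.

```python
from typing import Dict, Iterable, Tuple

# Worth of each extended ECN codepoint, keyed by (ECN field, RE bit), per Table 2.
# Not-RECT (00,0), Legacy (10,0) and the currently unused (10,1) have no worth.
WORTH: Dict[Tuple[int, int], int] = {
    (0b00, 1): +1,  # FNE     (Cautious)
    (0b01, 0): +1,  # Re-Echo (Positive)
    (0b11, 0): 0,   # CE(0)   (Cancelled)
    (0b01, 1): 0,   # RECT    (Neutral)
    (0b11, 1): -1,  # CE(-1)  (Negative)
}


def octet_balance(packets: Iterable[Tuple[int, int, int]]) -> int:
    """Sum of worth * size over (ECN field, RE bit, size in octets) tuples.

    A persistently positive result indicates a positive flow, and a
    persistently negative one indicates a negative flow."""
    total = 0
    for ecn, re_bit, size in packets:
        worth = WORTH.get((ecn, re_bit))
        if worth is not None:  # codepoints without a worth are ignored
            total += worth * size
    return total


# One 1000-octet CE(-1) packet is balanced by re-echo on 1000 octets,
# here spread over two 500-octet Re-Echo packets.
flow = [(0b01, 1, 1500), (0b11, 1, 1000), (0b01, 0, 500), (0b01, 0, 500)]
print(octet_balance(flow))
```

   The example flow balances to zero.  This shows why balancing is done
   in octets rather than packets when packet sizes differ.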

619	5.  Network Layer

621	5.1.  Re-ECN IPv4 Wire Protocol

623	   The wire protocol of the ECN field in the IP header remains largely
624	   unchanged from [RFC3168].  However, an extension to the ECN field we
625	   call the RE (Re-ECN extension) flag (Section 4.2) is defined in this
626	   document.  It doubles the extended ECN codepoint space, giving 8
627	   potential codepoints.  The semantics of the extra codepoints are
628	   backward compatible with the semantics of the 4 original codepoints
629	   [RFC3168] (Section 7 collects together and summarises all the changes
630	   defined in this document).

632	   For IPv4, this document proposes that the new RE control flag will be
633	   positioned where the `reserved' control flag was at bit 48 of the
634	   IPv4 header (counting from 0).  Alternatively, some would call this
635	   bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4
636	   header (Figure 3).

638	             0   1   2
639	           +---+---+---+
640	           | R | D | M |
641	           | E | F | F |
642	           +---+---+---+

644	   Figure 3: New Definition of the Re-ECN Extension (RE) Control Flag at
645	                  the Start of Byte 7 of the IPv4 Header

647	   The semantics of the RE flag are described in outline in Section 4
648	   and specified fully in Section 6.  The RE flag is always considered
649	   in conjunction with the 2-bit ECN field, as if they were concatenated
650	   together to form a 3-bit extended ECN field.  If the ECN field is set
651	   to either the ECT(1) or CE codepoint, when the RE flag is blanked
652	   (cleared to "0") it represents a re-echo of congestion experienced by
   an earlier packet.  If the ECN field is set to the Not-ECT codepoint,
654	   when the RE flag is set to "1" it represents the feedback not
655	   established (FNE) codepoint, which signals that the packet was sent
656	   without the benefit of congestion feedback.
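   For illustration, the Python sketch below reads and writes the RE
   flag and the ECN field in a raw IPv4 header.  It is a hedged sketch,
   not normative, and the function names are hypothetical.  It assumes
   the caller passes exactly the IPv4 header (20 octets or more,
   including any options).  The RE flag shares a 16-bit word with the DF
   and MF flags, and the header checksum covers that word, so changing
   the flag also means recomputing the standard checksum.  The sketch
   does this.

```python
import struct
from typing import Tuple

RE_MASK = 0x80  # RE: most significant bit of header octet offset 6 (bit 48)


def header_length(header: bytes) -> int:
    """IPv4 header length in octets, from the IHL field."""
    return (header[0] & 0x0F) * 4


def ipv4_checksum(header: bytes) -> int:
    """One's-complement header checksum, computed with the checksum field zeroed."""
    hlen = header_length(header)
    data = bytes(header[:10]) + b"\x00\x00" + bytes(header[12:hlen])
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def get_extended_ecn(header: bytes) -> Tuple[int, int]:
    """Return (ECN field, RE bit); the ECN field is the low 2 bits of octet 1."""
    return header[1] & 0x03, (header[6] & RE_MASK) >> 7


def set_re_flag(header: bytearray, re_bit: int) -> None:
    """Set or blank the RE flag in place and refresh the header checksum."""
    if re_bit:
        header[6] |= RE_MASK
    else:
        header[6] &= ~RE_MASK & 0xFF
    struct.pack_into("!H", header, 10, ipv4_checksum(header))


# Example header with DF set, ECN field Not-ECT and RE clear (Not-RECT).
hdr = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
set_re_flag(hdr, 1)  # now FNE: ECN field 00 with the RE flag set
print(get_extended_ecn(hdr), hex(hdr[6]))
```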

   The FNE codepoint may also be able to serve other purposes
   simultaneously, particularly where the start of a flow needs to be
   distinguished from packets later in the flow.  For instance, it would
   have been useful for identifying new flows in tag switching, and it
   might enable similar developments in the future if it were adopted.
   It is similar to the state set-up bit idea designed to protect
   against memory exhaustion attacks, which was proposed informally by
   David Clark and documented by Handley and Greenhalgh [Steps_DoS].
   The FNE codepoint can be thought of as a `soft-state set-up flag',
   because it is idempotent (i.e. one occurrence of the flag is
   sufficient, and further occurrences achieve the same effect if
   previous ones were lost).

   There will probably be other pending claims on the use of bit 48.  We
   know of at least two, [ARI05] and [RFC3514], but neither has been
   pursued in the IETF so far, although the present proposal would meet
   the needs of the latter.

   The security flag proposal (commonly known as the evil bit) was
   published on 1 April 2003 as Informational RFC 3514, but it was not
   adopted due to confusion over whether evil-doers might set it
   inappropriately.  The present proposal is backward compatible with
   RFC 3514 because, if re-ECN compliant senders were benign, they would
   correctly clear the evil bit to honestly declare that they had just
   received congestion feedback, whereas evil-doers would hide
   congestion feedback by setting the evil bit continuously, or at least
   more often than they should.  So evil senders can be identified,
   because they declare that they are good less often than they should.

687	5.2.  Re-ECN IPv6 Wire Protocol

689	   For IPv6, this document proposes that the new RE control flag will be
690	   positioned as the first bit of the option field of a new Congestion
691	   hop by hop option header (Figure 4).

693	        0                   1                   2                   3
694	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
695	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
696	       |  Next Header  |  Hdr ext Len  |  Option Type  | Opt Length =4 |
697	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
698	       |R|                     Reserved for future use                 |
699	       |E|                                                             |
700	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

702	      Figure 4: Definition of a New IPv6 Congestion Hop by Hop Option
703	         Header containing the re-ECN Extension (RE) Control Flag

                0 1 2 3 4 5 6 7
               +-+-+-+-+-+-+-+-+
               |AIU|C|Option ID|
               +-+-+-+-+-+-+-+-+

710	           Figure 5: Congestion Hop by Hop Option Type Encoding

712	   The Hop-by-Hop Options header enables packets to carry information to
713	   be examined and processed by routers or nodes along the packet's
714	   delivery path, including the source and destination nodes.  For re-
715	   ECN, the two bits of the Action If Unrecognized (AIU) flag of the
716	   Congestion extension header MUST be set to "00" meaning if
717	   unrecognized `skip over option and continue processing the header'.
   Then any router or receiver not upgraded with the optional re-ECN
   features described in this memo will simply ignore this header,
   whereas routers with these optional re-ECN features, or a re-ECN
   policing function, will process this Congestion extension header.

723	   The `C' flag MUST be set to "1" to specify that the Option Data
724	   (currently only the RE control flag) can change en-route to the
725	   packet's final destination.  This ensures that, when an
726	   Authentication header (AH [RFC4302]) is present in the packet, for
727	   any option whose data may change en-route, its entire Option Data
728	   field will be treated as zero-valued octets when computing or
729	   verifying the packet's authenticating value.

731	   Although the RE control flag should not be changed along the path, we
732	   expect that the rest of this option field that is currently `Reserved
733	   for future use' could be used for a multi-bit congestion notification
734	   field which we would expect to change en route.  Therefore, as
735	   changes to the RE flag could be detected end-to-end without
736	   authentication (see Section 9), we set the C flag to '1'.
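   The Python sketch below packs the Congestion Hop-by-Hop Options
   header to make the layout of Figures 4 and 5 concrete.  No Option ID
   has been assigned for this option, so the value used below (0x1E) is
   purely a hypothetical placeholder, as are the function names.  Hdr
   Ext Len is 0 because that field counts 8-octet units not including
   the first 8 octets, and this header is exactly 8 octets long.

```python
import struct

AIU_SKIP = 0b00                # skip over option and continue processing the header
C_MAY_CHANGE = 1               # Option Data may change en route
HYPOTHETICAL_OPTION_ID = 0x1E  # placeholder: no Option ID has been assigned


def option_type(option_id: int, aiu: int = AIU_SKIP, c: int = C_MAY_CHANGE) -> int:
    """Encode Option Type as AIU (2 bits) | C (1 bit) | Option ID (5 bits)."""
    if not 0 <= option_id < 32:
        raise ValueError("Option ID must fit in 5 bits")
    return (aiu << 6) | (c << 5) | option_id


def congestion_hbh_header(next_header: int, re_bit: int,
                          option_id: int = HYPOTHETICAL_OPTION_ID) -> bytes:
    """Pack the 8-octet Congestion Hop-by-Hop Options header of Figure 4."""
    option_data = (re_bit & 1) << 31  # RE is the first bit; the rest is reserved (zero)
    return struct.pack("!BBBBI", next_header, 0, option_type(option_id), 4, option_data)


hbh = congestion_hbh_header(next_header=6, re_bit=1)  # 6 = TCP header follows
print(hbh.hex())
```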

738	5.3.  Router Forwarding Behaviour

740	   {ToDo: Consider a section on how whole protocol interworks with drop.
741	   Perhaps in Protocol Overview.}

743	   Re-ECN works well without modifying the forwarding behaviour of any
744	   routers.  However, below, two OPTIONAL changes to forwarding
745	   behaviour are defined which respectively enhance performance and
746	   improve a router's discrimination against flooding attacks.  They are
747	   both OPTIONAL additions that we propose MAY apply by default to all
748	   Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN
749	   marking behaviours [RFC3168].  Specifications for PHBs MAY define
750	   different forwarding behaviours from this default, but this is not
751	   required.  [I-D.re-pcn-border-cheat] is one example.

753	   FNE indicates ECT:

755	      The FNE codepoint tells a router to assume that the packet was
756	      sent by an ECN-capable transport (see Section 5.4).  Therefore an
757	      FNE packet MAY be marked rather than dropped.  Note that the FNE
758	      codepoint has been intentionally chosen so that, to RFC3168
      compliant routers (which do not inspect the RE flag), an FNE
      packet appears to be Not-ECT, so it will be dropped by legacy AQM
761	      algorithms.

763	      A network operator MUST NOT configure a queue to ECN mark rather
764	      than drop FNE packets unless it can guarantee that FNE packets
765	      will be rate limited, either locally or upstream.  The ingress
766	      policers discussed in [I-D.re-ecn-motiv] would count as rate
767	      limiters for this purpose.

769	   Preferential Drop:  If a re-ECN capable router queue experiences very
770	      high load so that it has to drop arriving packets (e.g. a DoS
771	      attack), it MAY preferentially drop packets within the same
772	      Diffserv PHB using the preference order for extended ECN
773	      codepoints given in Table 3.  Preferential dropping can be
774	      difficult to implement on some hardware, but if feasible it would
775	      discriminate against attack traffic if done as part of the overall
776	      policing framework of [I-D.re-ecn-motiv].  If nowhere else,
777	      routers at the egress of a network SHOULD implement preferential
778	      drop (stronger than the MAY above).  For simplicity, preferences 4
779	      & 5 MAY be merged into one preference level.

781	      The tabulated drop preferences are arranged to preserve packets
782	      with more positive worth (Section 4.4), given senders of positive
783	      packets must have honestly declared downstream congestion.  A full
784	      treatment of this is provided in the companion document describing
      the motivation and architecture for re-ECN [I-D.re-ecn-motiv],
      particularly where it describes the application of re-ECN to
      protecting against DDoS attacks.

789	   +-------+-----+------------+-------+------------+-------------------+
790	   |  ECN  |  RE | Extended   | Worth | Drop Pref  |   Re-ECN meaning  |
791	   | field | bit | ECN        |       | (1 = drop  |                   |
792	   |       |     | codepoint  |       | 1st)       |                   |
793	   +-------+-----+------------+-------+------------+-------------------+
794	   |   01  |  0  | Re-Echo    | +1    | 5/4        |     Re-echoed     |
795	   |       |     |            |       |            |   congestion and  |
796	   |       |     |            |       |            |        RECT       |
797	   |   00  |  1  | FNE        | +1    | 4          |    Feedback not   |
798	   |       |     |            |       |            |    established    |
799	   |   11  |  0  | CE(0)      | 0     | 3          |  Re-Echo canceled |
800	   |       |     |            |       |            |   by congestion   |
801	   |       |     |            |       |            |    experienced    |
802	   |   01  |  1  | RECT       | 0     | 3          |   Re-ECN capable  |
803	   |       |     |            |       |            |     transport     |
804	   |   11  |  1  | CE(-1)     | -1    | 3          |     Congestion    |
805	   |       |     |            |       |            |    experienced    |
806	   |   10  |  1  | --CU--     | n/a   | 2          |  Currently Unused |
807	   |   10  |  0  | ---        | n/a   | 2          |  RFC3168 ECN use  |
808	   |       |     |            |       |            |        only       |
809	   |   00  |  0  | Not-RECT   | n/a   | 1          |        Not        |
810	   |       |     |            |       |            |   Re-ECN-capable  |
811	   |       |     |            |       |            |     transport     |
812	   +-------+-----+------------+-------+------------+-------------------+

814	      Table 3: Drop Preference of EECN Codepoints (Sorted by `Worth')
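   The Python sketch below combines the two OPTIONAL router behaviours
   of Section 5.3.  It is illustrative only.  The configuration flag
   fne_as_ect is a hypothetical name for an operator setting which, as
   required above, may only be enabled where FNE packets are guaranteed
   to be rate limited locally or upstream.  The drop preferences are
   taken from Table 3, with preferences 4 and 5 optionally merged.

```python
from typing import List, Tuple

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
Codepoint = Tuple[int, int]  # (ECN field, RE bit)

# Table 3 drop preferences: 1 = dropped first.
DROP_PREF = {
    (ECT1, 0): 5,     # Re-Echo
    (NOT_ECT, 1): 4,  # FNE
    (CE, 0): 3,       # CE(0)
    (ECT1, 1): 3,     # RECT
    (CE, 1): 3,       # CE(-1)
    (ECT0, 1): 2,     # currently unused
    (ECT0, 0): 2,     # RFC3168 ECN use only
    (NOT_ECT, 0): 1,  # Not-RECT
}


def drop_pref(cp: Codepoint, merge_4_5: bool = False) -> int:
    pref = DROP_PREF[cp]
    return 4 if (merge_4_5 and pref == 5) else pref


def congestion_action(cp: Codepoint, fne_as_ect: bool) -> str:
    """Return 'mark' or 'drop' for a packet the AQM has chosen to signal on.

    fne_as_ect: hypothetical operator setting; enable it only where FNE
    packets are guaranteed to be rate limited, locally or upstream."""
    ecn, re_bit = cp
    if ecn != NOT_ECT:
        return "mark"  # ECT(0), ECT(1) or already CE
    if re_bit == 1 and fne_as_ect:
        return "mark"  # FNE treated as ECN-capable
    return "drop"      # Not-RECT, or FNE on a queue not configured for it


def mark_ce(cp: Codepoint) -> Codepoint:
    """Set the ECN field to CE; the network never changes the RE flag."""
    return CE, cp[1]


def choose_victim(queue: List[Codepoint], merge_4_5: bool = False) -> int:
    """Index of the queued packet to drop first under overload."""
    return min(range(len(queue)), key=lambda i: drop_pref(queue[i], merge_4_5))


print(congestion_action((NOT_ECT, 1), fne_as_ect=True), mark_ce((NOT_ECT, 1)))
```

   An FNE packet marked this way becomes CE(-1), which is the case
   Section 5.4 discusses for an initial SYN.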

816	5.4.  Justification for Setting the First SYN to FNE

   Section 6.1.4 specifies that the initial SYN MUST be set to FNE by
   Re-ECT client A, and Section 5.3 says a queue MAY optionally treat an
   FNE packet as ECN-capable, so an initial SYN may be marked CE(-1)
   rather than dropped.  This seems dangerous, because the sender has
   not yet established whether the receiver is an RFC3168 one that does
   not understand congestion marking.  It also seems to allow malicious
   senders to take advantage of ECN marking to suffer less drop when
   launching SYN flooding attacks.  Below we explain the features of the
   protocol design that remove both these dangers.

828	   ECN-capable initial SYN with a Not-ECT server:  If the TCP server B
      is re-ECN capable, provision is made for it to feed back a
      possibly congestion-marked SYN in the SYN ACK (Section 6.1.4).
      But if the TCP client A finds out from the SYN ACK that the server
      was not ECN-capable, the TCP client MUST conservatively consider
      the first SYN as congestion marked before setting itself into
      Not-ECT mode.  Section 6.1.4 mandates that such a TCP client MUST
      also set its initial window to 1 segment.  In this way we remove
      the need to cautiously set the first SYN to Not-RECT.  This will
      give worse performance while deployment is patchy, but better
      performance once deployment is widespread.

840	   SYN flooding attacks can't exploit ECN-capability:  Malicious hosts
841	      may think they can use the advantage that ECN-marking gives over
842	      drop in launching classic SYN-flood attacks.  But Section 5.3
843	      mandates that a router MUST only be configured to treat packets
844	      with the FNE codepoint as ECN-capable if FNE packets are rate
845	      limited somewhere.  Introduction of the FNE codepoint was a
846	      deliberate move to enable transport-neutral handling of flow-start
847	      and flow state set-up in the IP layer where it belongs.  It then
848	      becomes possible to protect against flooding attacks of all forms
849	      (not just SYN flooding) without transport-specific inspection for
850	      things like the SYN flag in TCP headers.  Then, for instance, SYN
851	      flooding attacks using IPsec ESP encryption can also be rate
852	      limited at the IP layer.
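   FNE is visible in the IP header, so a rate limiter for it needs no
   transport-specific inspection.  The Python sketch below illustrates
   this with a token bucket.  This document does not specify how FNE
   packets are to be rate limited, so the token-bucket technique, its
   parameters (rate and burst) and the class name are all assumptions
   chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FneRateLimiter:
    """Hypothetical token-bucket limiter applied only to FNE packets."""
    rate_pps: float                    # assumed refill rate, FNE packets per second
    burst: float                       # assumed bucket depth, in packets
    tokens: float = field(init=False)  # current bucket fill
    last: float = 0.0                  # time of the previous FNE packet, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst

    def admit(self, ecn: int, re_bit: int, now: float) -> bool:
        # Classify on the extended ECN codepoint only: FNE is ECN 00 with RE 1.
        # Neither the TCP SYN flag nor any other transport header is inspected.
        if not (ecn == 0b00 and re_bit == 1):
            return True
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

   A queue that treats FNE packets as ECN-capable could apply that
   treatment only to FNE packets such a limiter admits.  Because the
   classifier ignores transport headers, the same limiter would also
   cover flows whose transport headers are encrypted with IPsec ESP.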

854	   It might seem pedantic going to all this trouble to enable ECN on the
855	   initial packet of a flow, but it is motivated by a much wider concern
856	   to ensure safe congestion control will still be possible even if the
857	   application mix evolves to the point where the majority of flows
858	   consist of a single window or even a single packet.  It also allows
859	   denial of service attacks to be more easily isolated and prevented.

861	   {ToDo: Give alternative where initial packet is Not-RECT and last ACK
862	   of three-way handshake is FNE.  Explain this will give better
863	   performance while deployment is patchy, but worse performance once
864	   deployment is high.}

866	5.5.  Control and Management

868	5.5.1.  Negative Balance Warning

870	   A new ICMP message type is being considered so that a dropper can
871	   warn the apparent sender of a flow that it has started to sanction
872	   the flow.  The message would have similar semantics to the `Time
873	   exceeded' ICMP message type.  To ensure the sender has to invest some
874	   work before the network will generate such a message, a dropper
875	   SHOULD only send such a message for flows that have demonstrated that
876	   they have started correctly by establishing a positive record, but
877	   have later gone negative.  The threshold is up to the implementation.
   The purpose of the message is to distinguish drops caused by the
   dropper from drops due to other causes, such as congestion or
   transmission losses.  The dropper would send the message to the
   sender of the flow, not the receiver.  If we did define this message
   type, it would be REQUIRED for all re-ECT senders to parse and
   understand it.  Note that a sender MUST only use this message to
   explain why losses are occurring.  A sender MUST NOT take this
   message to mean that losses have occurred that it was not aware of.
   Otherwise, spoofed messages could be sent by malicious sources to
   slow down a sender (cf. ICMP source quench).

888	   However, the need for this message type is not yet confirmed, as we
889	   are considering how to prevent it being used by malicious senders to
890	   scan for droppers and to test their threshold settings. {ToDo:
891	   Complete this section.}

893	5.5.2.  Rate Response Control

895	   As discussed in [I-D.re-ecn-motiv] the sender's access operator will
896	   be expected to use bulk per-user policing, but they might choose to
897	   introduce a per-flow policer.  In cases where operators do introduce
898	   per-flow policing, there may be a need for a sender to send a request
899	   to the ingress policer asking for permission to apply a non-default
900	   response to congestion (where TCP-friendly is assumed to be the
901	   default).  This would require the sender to know what message
902	   format(s) to use and to be able to discover how to address the
903	   policer.  The required control protocol(s) are outside the scope of
904	   this document, but will require definition elsewhere.

906	   The policer is likely to be local to the sender and inline, probably
907	   at the ingress interface to the internetwork.  So, discovery should
908	   not be hard.  A variety of control protocols already exist for some
909	   widely used rate-responses to congestion.  For instance DCCP
910	   congestion control identifiers (CCIDs [RFC4340]) fulfil this role and
   so does QoS signalling (e.g. an RSVP request for controlled load
912	   service is equivalent to a request for no rate response to
913	   congestion, but with admission control).

915	5.6.  IP in IP Tunnels

917	   Ideally, for re-ECN to work through IP in IP tunnels, the tunnel
918	   entry should copy both the RE flag and the ECN field from the inner
919	   to the outer IP header.  Then at the tunnel exit, any CE marking of
920	   the outer ECN field should overwrite the inner ECN field (unless the
   inner field is Not-ECT, in which case an alarm should be raised).  The
922	   RE flag shouldn't change along a path, so the outer RE flag should be
923	   the same as the inner.  If it isn't, a management alarm should be
924	   raised.
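   The Python fragment below sketches this tunnel entry and exit
   behaviour.  It covers only the cases this paragraph mentions and is
   not a full implementation of [RFC6040].  The function names and alarm
   strings are hypothetical.

```python
from typing import List, Tuple

NOT_ECT, CE = 0b00, 0b11


def encapsulate(inner_ecn: int, inner_re: int) -> Tuple[int, int]:
    """Tunnel entry: copy both the ECN field and the RE flag to the outer header."""
    return inner_ecn, inner_re


def decapsulate(inner_ecn: int, inner_re: int,
                outer_ecn: int, outer_re: int) -> Tuple[int, int, List[str]]:
    """Tunnel exit: return the forwarded (ECN field, RE flag) plus any alarms."""
    alarms: List[str] = []
    if outer_re != inner_re:
        # The RE flag should never change along a path.
        alarms.append("RE flag differs between outer and inner headers")
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            alarms.append("outer CE on a Not-ECT inner packet")
        else:
            inner_ecn = CE  # propagate the congestion marking to the inner header
    return inner_ecn, inner_re, alarms
```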

926	   This requirement is satisfied by the latest specification for
927	   handling ECN through IP tunnels [RFC6040] as well as by IPsec
928	   [RFC4301].  However, it is not satisfied by the ingress behaviour
929	   specified in [RFC3168] although at least the full-functionality
930	   variant of the egress behaviour is fine.  RFC6040 updates RFC3168,
931	   but it is likely that many legacy non-IPsec IP-in-IP tunnels will
   continue to exist.

   If legacy tunnels are left as specified in [RFC3168], whichever of
   the limited or full-functionality variants is used, a problem arises
   with re-ECN if a tunnel crosses an inter-domain boundary, because the
   difference between positive and negative markings will not be
   correctly accounted for.  In a limited-functionality ECN tunnel, the
   flow will appear to be RFC3168-compliant traffic, and therefore may
   be wrongly rate limited.  In a full-functionality ECN tunnel, the
   result will depend on whether the tunnel entry copies the inner RE
   flag to the outer header or always clears the RE flag in the outer
   header.  If the former, the flow will tend to be too positive when
   accounted for at borders.  If the latter, it will be too negative.
   If the rules set out in [RFC6040] are followed, this will not be an
   issue.

948	5.7.  Non-Issues

950	   The following issues might seem to cause unfavourable interactions
951	   with re-ECN, but we will explain why they don't:

953	   o  Various link layers support explicit congestion notification, such
954	      as Frame Relay and ATM.  Explicit congestion notification is
955	      proposed to be added to other link layers, such as Ethernet
956	      (802.3ar Ethernet congestion management) and MPLS [RFC5129];

958	   o  Encryption and IPsec.

960	   In the case of congestion notification at the link layer, each
961	   particular link layer scheme either manages congestion on the link
962	   with its own link-level feedback (the usual arrangement in the cases
963	   of ATM and Frame Relay), or congestion notification from the link
964	   layer is merged into congestion notification at the IP level when the
965	   frame headers are decapsulated at the end of the link (the
966	   recommended arrangement in the Ethernet and MPLS cases).  Given the
967	   RE flag is not intended to change along the path, this means that
968	   downstream congestion will still be measurable at any point where IP
   is processed on the path by subtracting negative from positive
970	   markings.

972	   In the case of encryption, as long as the tunnel issues described in
973	   Section 5.6 are dealt with, payload encryption itself will not be a
974	   problem.  The design goal of re-ECN is to include downstream
   congestion in the IP header so that it is not necessary to dig into
976	   inner headers.  Obfuscation of flow identifiers is not a problem for
977	   re-ECN policing elements.  Re-ECN doesn't ever require flow
978	   identifiers to be valid, it only requires them to be unique.  So if
979	   an IPsec encapsulating security payload (ESP [RFC4835]) or an
980	   authentication header (AH [RFC4302]) is used, the security parameters
981	   index (SPI) will be a sufficient flow identifier, as it is intended
982	   to be unique to a flow without revealing actual port numbers.

984	   In general, even if endpoints use some locally agreed scheme to hide
985	   port numbers, re-ECN policing elements can just consider the pair of
986	   source and destination IP addresses as the flow identifier.  Re-ECN
987	   encourages endpoints to at least tell the network layer that a
988	   sequence of packets are all part of the same flow, if indeed they
989	   are.  The alternative would be for the sender to make each packet
990	   appear to be a new flow, which would require them all to be marked
991	   FNE in order to avoid being treated with the bulk of malicious flows
992	   at the egress dropper.  Given the FNE marking is worth +1 and
993	   networks are likely to rate limit FNE packets, endpoints are given an
994	   incentive not to set FNE on each packet.  But if the sender really
995	   does want to hide the flow relationship between packets it can choose
996	   to pay the cost of multiple FNE packets, which in the long run will
997	   compensate for the extra memory required on network policing elements
998	   to process each flow.
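   The choice of flow identifier described in this section could be
   sketched as below.  The PacketView type and flow_id function are
   hypothetical.  The sketch assumes an IPsec flow is keyed on the SPI
   together with the destination address and protocol, rather than on
   the SPI alone.

```python
from typing import NamedTuple, Optional, Tuple


class PacketView(NamedTuple):
    """Fields a hypothetical policing element can see in a packet."""
    src: str
    dst: str
    protocol: int
    src_port: Optional[int] = None  # absent if hidden or encrypted
    dst_port: Optional[int] = None
    spi: Optional[int] = None       # present for IPsec ESP or AH


def flow_id(p: PacketView) -> Tuple:
    """Return a key that need only be unique per flow, not meaningful."""
    if p.spi is not None:
        return ("spi", p.dst, p.protocol, p.spi)
    if p.src_port is not None and p.dst_port is not None:
        return ("ports", p.src, p.dst, p.protocol, p.src_port, p.dst_port)
    return ("addrs", p.src, p.dst)  # fall back to the address pair
```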

1000	   {ToDo: Add a note about it being useful that the AH header does not
1001	   cover the RE flag, referring to Section 9.}

1003	6.  Transport Layers

1005	6.1.  TCP

1007	   Re-ECN capability at the sender is essential.  At the receiver it is
1008	   optional, as long as the receiver has a basic RFC3168-compliant ECN-
1009	   capable transport (ECT) [RFC3168].  Given re-ECN is not the first
1010	   attempt to define the semantics of the ECN field, we give a table
1011	   below summarising what happens for various combinations of
1012	   capabilities of the sender S and receiver R, as indicated in the
   first four columns below.  The last column gives the mode a
   half-connection should be in after the first two segments of the TCP
   three-way handshake.

1017	   +--------+--------------+------------+---------+--------------------+
1018	   | Re-ECT |   ECT-Nonce  |     ECT    | Not-ECT |         S-R        |
1019	   |        |   (RFC3540)  |  (RFC3168) |         |   Half-connection  |
1020	   |        |              |            |         |        Mode        |
1021	   +--------+--------------+------------+---------+--------------------+
1022	   |   SR   |              |            |         |        RECN        |
1023	   |    S   |       R      |            |         |       RECN-Co      |
1024	   |    S   |              |      R     |         |       RECN-Co      |
1025	   |    S   |              |            |    R    |       Not-ECT      |
1026	   +--------+--------------+------------+---------+--------------------+

1028	       Table 4: Modes of TCP Half-connection for Combinations of ECN
1029	                  Capabilities of Sender S and Receiver R
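   For a Re-ECT sender, Table 4 amounts to a simple lookup, shown below
   as a Python sketch.  The capability strings and function name are
   hypothetical.  The sketch does not model what a sender that does not
   implement the OPTIONAL RECN-Co mode would do instead.

```python
# Table 4 as a lookup: receiver capability -> mode of the S-R half-connection,
# assuming the sender S is Re-ECT (the only case Table 4 covers).
HALF_CONNECTION_MODE = {
    "re-ect": "RECN",
    "ect-nonce": "RECN-Co",  # RFC3540 ECN nonce-capable receiver
    "ect": "RECN-Co",        # RFC3168 ECN-capable receiver
    "not-ect": "Not-ECT",
}


def half_connection_mode(receiver_capability: str) -> str:
    try:
        return HALF_CONNECTION_MODE[receiver_capability]
    except KeyError:
        raise ValueError(f"unknown capability: {receiver_capability!r}") from None
```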

1031	   We will describe what happens in each mode, then describe how they
1032	   are negotiated.  The abbreviations for the modes in the above table
1033	   mean:

1035	   RECN:  Full re-ECN capable transport

   RECN-Co:  Re-ECN sender in compatibility mode with an RFC3168
      compliant ECN receiver [RFC3168] or an ECN nonce-capable receiver
      [RFC3540].  Implementation of this mode is OPTIONAL.

1041	   Not-ECT:  Not ECN-capable transport, as defined in [RFC3168] for when
1042	      at least one of the transports does not understand even basic ECN
1043	      marking.

1045	   Note that we use the term Re-ECT for a host transport that is re-ECN-
1046	   capable but RECN for the modes of the half connections between hosts
1047	   when they are both Re-ECT.  If a host transport is Re-ECT, this fact
1048	   alone does NOT imply either of its half connections will necessarily
1049	   be in RECN mode, at least not until it has confirmed that the other
1050	   host is Re-ECT.

1052	6.1.1.  RECN mode: Full Re-ECN capable transport

   In full RECN mode, for each half connection, the sender and the
   receiver each maintain an unsigned integer counter we will call ECC
   (echo congestion counter).  The receiver maintains a count of how
   many times a CE marked packet has arrived during the half-connection.
   Once a RECN connection is established, the three TCP header flags
   (ECE, CWR & NS) used for ECN-related functions in other versions of
   ECN are used as a 3-bit field for the receiver to repeatedly tell the
   sender the current value of ECC, modulo 8, whenever it sends a TCP
   ACK.  We will call this the echo congestion increment (ECI) field.
   This overloaded use of these 3 flags as one 3-bit ECI field is shown
   in Figure 7.  The actual definition of the TCP header, including the
   addition of support for the ECN nonce, is shown for comparison in
   Figure 6.  This specification does not redefine the names of these
   three TCP header flags; it merely overloads them with another
   definition once a flow is established.

1070	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1071	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1072	      |               |           | N | C | E | U | A | P | R | S | F |
1073	      | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
1074	      |               |           |   | R | E | G | K | H | T | N | N |
1075	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

1077	    Figure 6: The (post-ECN Nonce) definition of bytes 13 and 14 of the
1078	                                TCP Header

1080	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
1081	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1082	      |               |           |           | U | A | P | R | S | F |
1083	      | Header Length | Reserved  |    ECI    | R | C | S | S | Y | I |
1084	      |               |           |           | G | K | H | T | N | N |
1085	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

1087	    Figure 7: Definition of the ECI field within bytes 13 and 14 of the
1088	   TCP Header, overloading the current definitions above for established
1089	                                RECN flows.

1091	   Receiver Action in RECN Mode

      Every time a CE marked packet arrives at a receiver in RECN mode,
      the receiver transport increments its local value of ECC and MUST
      echo this value, modulo 8, to the sender in the ECI field of the
      next ACK.  It MUST repeat the same value of ECI in every
      subsequent ACK until the next CE event, when it increments ECC
      and echoes the new value.

      Because the local ECC value is incremented modulo 8, the ECI
      field value simply wraps round to zero when it overflows.  The
      least significant bit is on the right (bit 9 in Figure 7).
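      The receiver behaviour above is sketched below.  It is an
      illustrative model of one half-connection, not a TCP
      implementation.  The class and method names are hypothetical, and
      the sketch ignores delayed ACKs.

```python
class RecnReceiver:
    """Receiver side of one half-connection in RECN mode (sketch)."""

    def __init__(self) -> None:
        # ECC is initialised to zero when the half-connection enters RECN mode.
        self.ecc = 0

    def on_data_packet(self, ce_marked: bool) -> None:
        # Each CE marked arrival increments ECC; the 3-bit value wraps modulo 8.
        if ce_marked:
            self.ecc = (self.ecc + 1) % 8

    def eci_for_next_ack(self) -> int:
        # Every ACK repeats the prevailing ECC value in its ECI field.
        return self.ecc


if __name__ == "__main__":
    rx = RecnReceiver()
    for _ in range(9):            # nine CE marks in a row
        rx.on_data_packet(True)
    rx.on_data_packet(False)      # an unmarked packet leaves ECC unchanged
    print(rx.eci_for_next_ack())  # 9 mod 8 = 1
```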

1104	      A receiver in RECN mode MAY delay the echo of a CE to the next
1105	      delayed-ACK, which would be necessary if ACK-withholding were
1106	      implemented.

1108	   Sender Action in RECN Mode

      On the arrival of every ACK, the sender compares the ECI field
      with its own ECC value, then replaces its local value with that
      from the ACK.  The difference D = (ECI + 8 - (ECC mod 8)) mod 8
      is assumed to be the number of CE marked packets that arrived at
      the receiver since it sent the previously received ACK (but see
      below for the sender's safety strategy).  Whenever the ECI field
      has incremented by D (and/or d drops have been detected), the
      sender MUST clear the RE flag to "0" in the IP header of the next
      D' data packets it sends, where D' = D + d, effectively
      re-echoing each single increment of ECI.  Otherwise the data
      sender MUST send all data packets with RE set to "1".
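      A minimal sketch of this sender rule follows.  It assumes that the
      ACK processing path supplies the ECI value and a count d of newly
      detected drops.  The names are hypothetical.  The sketch omits the
      sender's safety strategy mentioned above.

```python
class RecnSender:
    """Sender side of one half-connection in RECN mode (sketch)."""

    def __init__(self) -> None:
        self.ecc = 0       # local copy of the last ECI value received
        self.pending = 0   # packets still to be sent with RE cleared (D' = D + d)

    def on_ack(self, eci: int, drops: int = 0) -> int:
        """Process an ACK carrying ECI; return D, the assumed number of new CE marks."""
        d_ce = (eci + 8 - (self.ecc % 8)) % 8
        self.ecc = eci                 # replace the local value with that from the ACK
        self.pending += d_ce + drops   # D' = D + d
        return d_ce

    def re_flag_for_next_data_packet(self) -> int:
        """RE is cleared to 0 on the next D' data packets, otherwise set to 1."""
        if self.pending > 0:
            self.pending -= 1
            return 0
        return 1


if __name__ == "__main__":
    tx = RecnSender()
    tx.on_ack(6)          # ECI went 0 -> 6, so D = 6
    print(tx.on_ack(1))   # ECI wrapped 6 -> 1, so D = (1 + 8 - 6) mod 8 = 3
```

      Because ECI is only 3 bits wide, 8 or more CE marks between
      consecutive ACKs would alias to a smaller D.  The sketch does
      nothing to detect this.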

1122	      As a general rule, once a flow is established, as well as setting
1123	      or clearing the RE flag as above, a data sender in RECN mode MUST
1124	      always set the ECN field to ECT(1).  However, the settings of the
1125	      extended ECN field during flow start are defined in Section 6.1.4.

      As we have already emphasised, the re-ECN protocol makes no
      changes to, and has no effect on, the TCP congestion control
      algorithm.  So the first increment of ECI (or detection of a drop)
      in an RTT triggers the standard TCP congestion response, with no
      more than one congestion response per round trip, as usual.
      However, the sender re-echoes every increment of ECI irrespective
      of RTTs.

1134	      A TCP sender also acts as the receiver for the other half-
1135	      connection.  The host will maintain two ECC values S.ECC and R.ECC
1136	      as sender and receiver respectively.  Every TCP header sent by a
1137	      host in RECN mode will also repeat the prevailing value of R.ECC
1138	      in its ECI field.  If a sender in RECN mode has to retransmit a
1139	      packet due to a suspected loss, the re-transmitted packet MUST
1140	      carry the latest prevailing value of R.ECC when it is re-
1141	      transmitted, which will not necessarily be the one it carried
1142	      originally.

1144	6.1.2.  RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN
1145	        Receiver

   If the half-connection is in RECN-Co mode, ECN feedback proceeds no
   differently to that of RFC3168 compliant ECN.  In other words, the
   receiver sets the ECE flag in every ACK until it receives a packet
   with CWR set, and the sender responds to ECE by setting the CWR flag.
   Although RECN-Co mode is used when the receiver has not implemented
   the re-ECN protocol, the sender can infer enough from its RFC3168
   compliant ECN feedback to set or clear the RE flag reasonably well.
   Specifically, every time the receiver toggles the ECE flag from "0"
   to "1" (or a loss is detected), as well as setting CWR in the TCP
   flags, the re-ECN sender MUST blank the RE flag of the next packet to
   "0" as it would do in full RECN mode.  Otherwise, the data sender
   SHOULD send all other packets with RE set to "1".  Once a flow is
   established, a re-ECN data sender in RECN-Co mode MUST always set the
   ECN field to ECT(1).
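   In RECN-Co mode the sender has to infer CE events from transitions of
   the ECE flag.  A minimal sketch of that inference follows.  It
   assumes the ACK processing path reports each ACK's ECE flag and
   whether a loss was newly detected.  The names are hypothetical.  The
   sketch leaves out the RFC3168 CWR handling and congestion response.

```python
class RecnCoSender:
    """RE flag handling for a sender in RECN-Co mode (sketch)."""

    def __init__(self) -> None:
        self.last_ece = False   # ECE flag seen on the previous ACK
        self.pending = 0        # packets still to be sent with RE blanked

    def on_ack(self, ece: bool, loss_detected: bool = False) -> None:
        # Only a 0 -> 1 transition of ECE signals a new congestion event;
        # an ECE flag that stays set is the receiver repeating the same echo.
        if (ece and not self.last_ece) or loss_detected:
            self.pending += 1
        self.last_ece = ece

    def re_flag_for_next_data_packet(self) -> int:
        if self.pending > 0:
            self.pending -= 1
            return 0
        return 1


if __name__ == "__main__":
    tx = RecnCoSender()
    for ece in (False, True, True, True, False, True):
        tx.on_ack(ece)
    print(tx.pending)  # two rising edges of ECE
```

   A second CE mark that arrives while ECE is still being echoed
   produces no new rising edge.  The sketch therefore misses it, which
   is the understatement discussed in the next paragraph.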

1161	   If a CE marked packet arrives at the receiver within a round trip
1162	   time of a previous mark, the receiver will still be echoing ECE for
1163	   the last CE mark.  Therefore, such a mark will be missed by the
1164	   sender.  Of course, this isn't of concern for congestion control, but
1165	   it does mean that very occasionally the RE blanking fraction will be
1166	   understated.  Therefore flows in RECN-Co mode may occasionally be
1167	   mistaken for very lightly cheating flows and consequently might
1168	   suffer a small number of packet drops through an egress dropper.  We
1169	   expect re-ECN would be deployed for some time before policers and
1170	   droppers start to enforce it.  So, given there is not much ECN
1171	   deployment yet anyway, this minor problem may affect only a very
1172	   small proportion of flows, reducing to nothing over the years as
1173	   RFC3168 compliant ECN hosts upgrade.  The use of RECN-Co mode would
1174	   need to be reviewed in the light of experience at the time of re-ECN
1175	   deployment.

   RECN-Co mode is OPTIONAL.  Re-ECN implementers who want to keep their
   code simple MAY choose not to implement this mode.  If they do not,
1179	   a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the
1180	   presence of an ECN-capable receiver.  It MAY choose to fall back to
1181	   the ECT-Nonce mode, but if re-ECN implementers don't want to be
1182	   bothered with RECN-Co mode, they probably won't want to add an ECT-
1183	   Nonce mode either.

1185	6.1.2.1.  Re-ECN support for the ECN Nonce

1187	   A TCP half-connection in RECN-Co mode MUST NOT support the ECN
1188	   Nonce [RFC3540].  This means that the sending code of a re-ECN
1189	   implementation will never need to include ECN Nonce support.  Re-ECN
1190	   is intended to provide wider protection than the ECN nonce against
1191	   congestion control misbehaviour, and re-ECN only requires support
1192	   from the sender, therefore it is preferable to specifically rule out
1193	   the need for dual sender implementations.  As a consequence, a re-ECN
1194	   capable sender will never set ECT(0), so it will be easier for
1195	   network elements to discriminate re-ECN traffic flows from other ECN
1196	   traffic, which will always contain some ECT(0) packets.

   However, a re-ECN implementation MAY include receiving code that
   complies with the ECN Nonce protocol when interacting with a sender
   that supports the ECN nonce (rather than re-ECN), but this support
   is not required.

1203	   RFC3540 allows an ECN nonce sender to choose whether to sanction a
1204	   receiver that does not ever set the nonce sum.  Given re-ECN is
1205	   intended to provide wider protection than the ECN nonce against
1206	   congestion control misbehaviour, implementers of re-ECN receivers MAY
1207	   choose not to implement backwards compatibility with the ECN nonce
1208	   capability.  This may be because they deem that the risk of sanctions
1209	   is low, perhaps because significant deployment of the ECN nonce seems
1210	   unlikely at implementation time.

1212	6.1.3.  Capability Negotiation

1214	   During the TCP hand-shake at the start of a connection, an originator
1215	   of the connection (host A) with a re-ECN-capable transport MUST
1216	   indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1
1217	   in the initial SYN.

1219	   A responding Re-ECT host (host B) MUST return a SYN ACK with flags
1220	   CWR=1 and ECE=0.  The responding host MUST NOT set this combination
1221	   of flags unless the preceding SYN has already indicated Re-ECT
1222	   support as above.  Normally a Re-ECT server (B) will reply to a Re-
1223	   ECT client with NS=0, but if the initial SYN from Re-ECT client A is
1224	   marked CE(-1), a Re-ECT server B MUST increment its local value of
1225	   ECC.  But B cannot reflect the value of ECC in the SYN ACK, because
1226	   it is still using the 3 bits to negotiate connection capabilities.
1227	   So, server B MUST set the alternative TCP header flags in its SYN
1228	   ACK: NS=1, CWR=1 and ECE=0.

1230	   These handshakes are summarised in Table 5 below, with X indicating
1231	   NS can be either 1 or 0 depending respectively on whether congestion
1232	   had been experienced or not.  The handshakes used for the other
1233	   flavours of ECN are also shown for comparison.  To compress the width
1234	   of the table, the headings of the first four columns have been
1235	   severely abbreviated, as follows:

1237	      R: *R*e-ECT

1239	      N: ECT-*N*once (RFC3540)

1241	      E: *E*CT (RFC3168)

1243	      I: Not-ECT (*I*mplicit congestion notification).

1245	   These correspond with the same headings used in Table 4.  Indeed, the
1246	   resulting modes in the last two columns of the table below are a more
1247	   comprehensive way of saying the same thing as Table 4.

1249	   +----+---+---+---+------------+-------------+-----------+-----------+
1250	   | R  | N | E | I |   SYN A-B  | SYN ACK B-A |  A-B Mode |  B-A Mode |
1251	   +----+---+---+---+------------+-------------+-----------+-----------+
1252	   |    |   |   |   | NS CWR ECE |  NS CWR ECE |           |           |
1253	   | AB |   |   |   |  1   1   1 |  X   1   0  |    RECN   |    RECN   |
1254	   | A  | B |   |   |  1   1   1 |  1   0   1  |  RECN-Co  | ECT-Nonce |
1255	   | A  |   | B |   |  1   1   1 |  0   0   1  |  RECN-Co  |    ECT    |
1256	   | A  |   |   | B |  1   1   1 |  0   0   0  |  Not-ECT  |  Not-ECT  |
1257	   | B  | A |   |   |  0   1   1 |  0   0   1  | ECT-Nonce |  RECN-Co  |
1258	   | B  |   | A |   |  0   1   1 |  0   0   1  |    ECT    |  RECN-Co  |
1259	   | B  |   |   | A |  0   0   0 |  0   0   0  |  Not-ECT  |  Not-ECT  |
1260	   +----+---+---+---+------------+-------------+-----------+-----------+

1262	      Table 5: TCP Capability Negotiation between Originator (A) and
1263	                               Responder (B)

1265	   As soon as a re-ECN capable TCP server receives a SYN, it MUST set
1266	   its two half-connections into the modes given in Table 5.  As soon as
1267	   a re-ECN capable TCP client receives a SYN ACK, it MUST set its two
1268	   half-connections into the modes given in Table 5.  The half-
1269	   connections will remain in these modes for the rest of the
1270	   connection, including for the third segment of TCP's three-way hand-
1271	   shake (the ACK).
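   Table 5 can be read as a pair of lookup functions, one for the
   responder and one for the originator.  The sketch below assumes that
   flags are given as (NS, CWR, ECE) tuples, and the names are
   hypothetical.  It treats any SYN ACK combination not listed in
   Table 5 as Not-ECT.  That is an assumption, not something this
   specification states.

```python
from enum import Enum


class Mode(Enum):
    RECN = "RECN"
    RECN_CO = "RECN-Co"
    ECT_NONCE = "ECT-Nonce"
    ECT = "ECT"
    NOT_ECT = "Not-ECT"


def responder_b(syn, syn_ce_marked=False):
    """Re-ECT host B: return (SYN ACK flags, A-B mode, B-A mode) for a SYN."""
    if syn == (1, 1, 1):                   # Re-ECT originator
        ns = 1 if syn_ce_marked else 0     # the 'X' in Table 5
        return (ns, 1, 0), Mode.RECN, Mode.RECN
    if syn[1:] == (1, 1):                  # RFC3168 ECT (or ECT-Nonce) originator
        # The SYN alone does not distinguish ECT from ECT-Nonce; Table 5
        # lists ECT-Nonce as the A-B mode when A uses the nonce.
        return (0, 0, 1), Mode.ECT, Mode.RECN_CO
    return (0, 0, 0), Mode.NOT_ECT, Mode.NOT_ECT


def originator_a(syn_ack):
    """Re-ECT host A, having sent NS=CWR=ECE=1: return (A-B mode, B-A mode)."""
    if syn_ack == (1, 1, 1):
        # SYN ACK merely reflects the SYN (broken RFC3168 responder).
        return Mode.NOT_ECT, Mode.NOT_ECT
    if syn_ack[1:] == (1, 0):              # Re-ECT responder, NS = X
        return Mode.RECN, Mode.RECN
    if syn_ack == (1, 0, 1):
        return Mode.RECN_CO, Mode.ECT_NONCE
    if syn_ack == (0, 0, 1):
        return Mode.RECN_CO, Mode.ECT
    return Mode.NOT_ECT, Mode.NOT_ECT      # (0, 0, 0) and, by assumption, anything else
```

   A host that does not implement the OPTIONAL RECN-Co mode would
   replace the RECN-Co results with its chosen fallback mode, as
   described in Section 6.1.2.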

1273	   {ToDo: Consider delaying mode changes if using SYN cookies (will also
1274	   affect next section).}

1276	   {ToDo: consider RSTs within a connection.}

1278	   Recall that, if the SYN ACK reflects the same flag settings as the
1279	   preceding SYN (because there is a broken RFC3168 compliant
1280	   implementation that behaves this way), RFC3168 specifies that the
1281	   whole connection MUST revert to Not-ECT.

   Also note that, whenever the SYN flag of a TCP segment is set
   (including when the ACK flag is also set), the NS, CWR and ECE flags
   (i.e. the bits that form the ECI field in other segments) MUST NOT be
   interpreted as the 3-bit ECI value, which is only set as a copy of
   the local ECC value in non-SYN packets.

1289	6.1.4.  Extended ECN (EECN) Field Settings during Flow Start or after
1290	        Idle Periods

1292	   If the originator (A) of a TCP connection supports re-ECN it MUST set
1293	   the extended ECN (EECN) field in the IP header of the initial SYN
1294	   packet to the feedback not established (FNE) codepoint.

1296	   FNE is a new extended ECN codepoint defined by this specification
1297	   (Section 4.2).  The feedback not established (FNE) codepoint is used
   when the transport does not have the benefit of ECN feedback, so it
1299	   cannot decide whether to set or clear the RE flag.

1301	   If after receiving a SYN the server B has set its sending half-
1302	   connection into RECN mode or RECN-Co mode, it MUST set the extended
1303	   ECN field in the IP header of its SYN ACK to the feedback not
1304	   established (FNE) codepoint.  Note the careful wording here, which
1305	   means that Re-ECT server B MUST set FNE on a SYN ACK whether it is
1306	   responding to a SYN from a Re-ECT client or from a client that is
1307	   merely ECN-capable.  This is because FNE indicates the transport is
1308	   ECN capable as well as re-ECN capable.

   The original ECN specification [RFC3168] required SYNs and SYN ACKs
   to use the Not-ECT codepoint of the ECN field.  The aim was to
   prevent well-known DoS attacks such as SYN flooding from gaining the
   advantage that ECN capability affords over drop at ECN-capable
   routers.

   For a SYN ACK, Kuzmanovic [RFC5562] has shown that this caution was
   unnecessary, and RFC5562 allows a SYN ACK to be ECN-capable to
   improve performance.  By stipulating the FNE codepoint for the
   initial SYN, we comply with RFC3168 in word but not in spirit.  We
   have indeed set the ECN field to Not-ECT, but we have extended the
   ECN field with another bit, and Section 5.3 defines one setting of
   that bit to mean an ECN-capable transport.  Therefore, by requiring
   that the FNE codepoint MUST be used on the initial SYN of a
   connection, we go further and propose to make the initial SYN
   ECN-capable too.  Section 5.4 justifies the decision to make the
   initial SYN ECN-capable.

1328	   Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will
1329	   have already been set on the initial SYN and possibly the SYN ACK as
1330	   above.  But each re-ECN sender will have to set FNE cautiously on a
1331	   few data packets as well, given a number of packets will usually have
1332	   to be sent before sufficient congestion feedback is received.  The
1333	   behaviour will be different depending on the mode of the half-
1334	   connection:

1336	   RECN mode:  Given the constraints on TCP's initial window [RFC3390]
      and its exponential window increase during the slow start
      phase [RFC5681], it turns out that the sender SHOULD set FNE on
1339	      the first and third data packets in its flow after the initial
1340	      3-way handshake, assuming equal sized data packets once a flow is
1341	      established.  Appendix D presents the calculation that led to this
1342	      conclusion.  Below, after running through the start of an example
1343	      TCP session, we give the intuition learned from that calculation.
1344	      {ToDo: unfortunately the calculation was based on erroneous
1345	      assumptions; see [I-D.conex-tcp-mods] for a better approach.}

   RECN-Co mode:  A Re-ECT sender that switches into re-ECN
1348	      compatibility mode or into Not-ECT mode (because it has detected
1349	      the corresponding host is not re-ECN capable) MUST limit its
1350	      initial window to 1 segment.  The reasoning behind this constraint
1351	      is given in Section 5.4.  Having set this initial window, a re-ECN
1352	      sender in RECN-Co mode SHOULD set FNE on the first and third data
1353	      packets in a flow, as for RECN mode.

1355	   +----+------+----------------+-------+-------+---------------+------+
1356	   |    | Data | TCP A(Re-ECT)  | IP A  | IP B  | TCP B(Re-ECT) | Data |
1357	   +----+------+----------------+-------+-------+---------------+------+
1358	   |    | Byte |  SEQ  ACK CTL  | EECN  | EECN  |  SEQ  ACK CTL | Byte |
1359	   | -- | ---- | -------------  | ----- | ----- | ------------- | ---- |
1360	   |  1 |      | 0100      SYN  | FNE   | -->   |      R.ECC=0  |      |
1361	   |    |      |    CWR,ECE,NS  |       |       |               |      |
1362	   |  2 |      |      R.ECC=0   | <--   | FNE   | 0300 0101     |      |
1363	   |    |      |                |       |       |   SYN,ACK,CWR |      |
1364	   |  3 |      | 0101 0301 ACK  | RECT  | -->   |      R.ECC=0  |      |
1365	   |  4 | 1000 | 0101 0301 ACK  | FNE   | -->   |      R.ECC=0  |      |
   |  5 |      |      R.ECC=0   | <--   | FNE   | 0301 1101 ACK | 1460 |
   |  6 |      |      R.ECC=0   | <--   | RECT  | 1761 1101 ACK | 1460 |
   |  7 |      |      R.ECC=0   | <--   | FNE   | 3221 1101 ACK | 1460 |
   |  8 |      | 1101 1761 ACK  | RECT  | -->   |      R.ECC=0  |      |
   |  9 |      |      R.ECC=0   | <--   | RECT  | 4681 1101 ACK | 1460 |
   | 10 |      |      R.ECC=0   | <--   | RECT  | 6141 1101 ACK | 1460 |
   | 11 |      | 1101 3221 ACK  | RECT  | -->   |      R.ECC=0  |      |
   | 12 |      |      R.ECC=0   | <--   | RECT  | 7601 1101 ACK | 1460 |
   | 13 |      |      R.ECC=1   | <*-   | RECT  | 9061 1101 ACK | 1460 |
1375	   |    |      | ...            |       |       |               |      |
1376	   +----+------+----------------+-------+-------+---------------+------+

1378	                      Table 6: TCP Session Example #1

1380	   Table 6 shows an example TCP session, where the server B sets FNE on
1381	   its first and third data packets (lines 5 & 7) as well as on the
1382	   initial SYN ACK as previously described.  The left hand half of the
1383	   table shows the relevant settings of headers sent by client A in
1384	   three layers: the TCP payload size; TCP settings; then IP settings.
1385	   The right hand half gives equivalent columns for server B. The only
1386	   TCP settings shown are the sequence number (SEQ), acknowledgement
   number (ACK) and the relevant control (CTL) flags that the sending
   host sets in the TCP header.  The IP columns show the setting
1389	   of the extended ECN (EECN) field.

1391	   Also shown on the receiving side of the table is the value of the
1392	   receiver's echo congestion counter (R.ECC) after processing the
1393	   incoming EECN header.  Note that, once a host sets a half-connection
1394	   into RECN mode, it MUST initialise its local value of ECC to zero.

1396	   The intuition that Appendix D gives for why a sender should set FNE
1397	   on the first and third data packets is as follows.  At line 13, a
1398	   packet sent by B is shown with an '*', which means it has been
1399	   congestion marked by an intermediate queue from RECT to CE(-1).  On
1400	   receiving this CE marked packet, client A increments its ECC counter
1401	   to 1 as shown.  This was the 7th data packet B sent, but before
1402	   feedback about this event returns to B, it might well have sent many
1403	   more packets.  Indeed, during exponential slow start, about as many
1404	   packets will be in flight (unacknowledged) as have been acknowledged.
1405	   So, when the feedback from the congestion event on B's 7th segment
1406	   returns, B will have sent about 7 further packets that will still be
1407	   in flight.  At that stage, B's best estimate of the network's packet
1408	   marking fraction will be 1/7.  So, as B will have sent about 14
1409	   packets, it should have already marked 2 of them as FNE in order to
1410	   have marked 1/7; hence the need to have set the first and third data
1411	   packets to FNE.
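   The arithmetic behind this intuition can be written for a general
   case.  It uses the same slow-start assumption (about as many packets
   in flight as acknowledged) and assumes the first mark falls on B's
   k-th data packet.  It merely restates the reasoning above and
   inherits the limitations noted in the earlier ToDo.

```latex
n_{\mathrm{sent}} \approx 2k, \qquad
\hat{p} = \frac{1}{k}, \qquad
n_{\mathrm{FNE}} \approx \hat{p}\, n_{\mathrm{sent}} = \frac{2k}{k} = 2
```

   With k = 7 this gives the figures of the example: about 14 packets
   sent, an estimated marking fraction of 1/7, and 2 packets that should
   already carry FNE.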

   Client A's behaviour in Table 6 also shows FNE being set on the
   initial SYN and the first data packet (lines 1 & 4), but in this case
   it sends no more data packets, so of course it cannot, and does not
   need to, set FNE again.  Note that in the A-B direction there is no
   need to set FNE on the third part of the three-way handshake (line 3,
   the ACK).

1420	   Note that in this section we have used the word SHOULD rather than
1421	   MUST when specifying how to set FNE on data segments before positive
1422	   congestion feedback arrives (but note that the word MUST was used for
   FNE on the SYN and SYN ACK).  FNE is only RECOMMENDED, rather than
   required, for the first and third data segments.  This allows for
   the possibility that the TCP transport has other knowledge of the
   path, which it re-uses from one flow for the benefit of a newly
   starting flow.  For instance, one flow can re-use knowledge of other
   flows between the same hosts if using a Congestion Manager
   [RFC3124], or when a proxy host aggregates congestion information
   for large numbers of flows.

1431	   {ToDo: There is probably scope for re-writing the above in a
1432	   different way so that it says MUST unless some other knowledge of the
1433	   path is available.  See earlier note pointing out FNE on 1st & 3rd is
1434	   too few.}

1436	   After an idle period of more than 1 second, a re-ECN sender transport
1437	   MUST set the EECN field of the packet that resumes the connection to
   FNE.  Note that this next packet may be sent a very long time later;
   a packet does NOT have to be sent after 1 second of idling.  In order
1440	   that the design of network policers can be deterministic, this
1441	   specification deliberately puts an absolute lower limit on how long a
1442	   connection can be idle before the packet that resumes the connection
1443	   must be set to FNE, rather than relating it to the connection round
1444	   trip time.  We use the lower bound of the retransmission timeout
1445	   (RTO) [RFC6298], which is commonly used as the idle period before TCP
1446	   must reduce to the restart window [RFC5681].  Note our specification
1447	   of re-ECN's idle period is NOT intended to change the idle period for
1448	   TCP's restart, nor indeed for any other purposes.
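   This section gives two FNE rules for data packets: FNE on the first
   and third data packets, and FNE on the packet that resumes a
   connection after more than 1 second idle.  A small helper could
   combine them.  This is a sketch under assumptions of our own:

   o  the idle period is measured from the previous data packet sent;

   o  the first/third count does not restart after an idle period;

   o  all names are hypothetical.

```python
from typing import Optional

IDLE_LIMIT_S = 1.0  # absolute idle limit; longer idle periods trigger FNE


class FneTracker:
    """Decides whether the next data packet of a re-ECN sender is sent as FNE."""

    def __init__(self) -> None:
        self.data_packets_sent = 0
        self.last_send_time: Optional[float] = None

    def next_packet_uses_fne(self, now: float) -> bool:
        resumes_after_idle = (
            self.last_send_time is not None
            and now - self.last_send_time > IDLE_LIMIT_S
        )
        self.data_packets_sent += 1
        self.last_send_time = now
        # SHOULD: first and third data packets after the handshake.
        # MUST: the packet that resumes the connection after the idle limit.
        return resumes_after_idle or self.data_packets_sent in (1, 3)


if __name__ == "__main__":
    fne = FneTracker()
    times = [0.00, 0.01, 0.02, 0.03, 5.00]
    print([fne.next_packet_uses_fne(t) for t in times])
    # prints [True, False, True, False, True]
```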

1450	   {ToDo: Describe how the sender falls back to RFC3168 modes if packets
1451	   don't appear to be getting through (to work round firewalls
1452	   discarding packets they consider unusual).}

1454	   {ToDo: Possible future capabilities for changing Slow Start}

1456	6.1.5.  Pure ACKS, Retransmissions, Window Probes and Partial ACKs

1458	   A re-ECN sender MUST clear the RE flag to "0" and set the ECN field
1459	   to Not-ECT in pure ACKs, retransmissions and window probes, as
   specified in [RFC3168].  Our eventual goal is for all packets to be
1461	   sent with re-ECN enabled, and we believe the semantics of the ECI
1462	   field go a long way towards being able to achieve this.  However, we
1463	   have not completed a full security analysis for these cases,
1464	   therefore, currently we merely re-state current practice.

   We must also reconcile the fact that congestion marking is applied
   to packets, whereas acknowledgements cover octet ranges, and
   acknowledged octet boundaries need not match the transmitted
   boundaries.  The general principle we work to is to remain
   compatible with TCP's congestion control, which is driven by
   congestion events at packet granularity, while at the same time
   aiming to blank the RE flag on at least as many octets in a flow as
   have been marked CE.

1474	   Therefore, a re-ECN TCP receiver MUST increment its ECC value as many
1475	   times as CE marked packets have been received.  And that value MUST
1476	   be echoed to the sender in the first available ACK using the ECI
1477	   field.  This ensures the TCP sender's congestion control receives
1478	   timely feedback on congestion events at the same packet granularity
1479	   that they were generated on congested queues.

   Then, a re-ECN sender adds the difference D between its own ECC
   value and the incoming ECI field to a counter R.  R is then
   decremented by 1 for each subsequent packet that is sent with the RE
   flag blanked, until R is no longer positive.  Using this technique,
   whenever a re-ECN transport sends a packet that is not re-ECN
   capable (e.g. a retransmission), the remaining packets required to
   have the RE flag blanked will be automatically carried over to
   subsequent packets, through the variable R.
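   A minimal sketch of the counter R follows.  It assumes the caller
   says whether each outgoing packet is re-ECN capable.  As stated
   above, pure ACKs, retransmissions and window probes are not.  Such
   packets are sent with RE cleared and Not-ECT, but they do not reduce
   R, so the outstanding blanks carry over.  The names are
   hypothetical.

```python
class ReEchoCounter:
    """Carries outstanding RE blanks across packets that are not re-ECN capable."""

    def __init__(self) -> None:
        self.r = 0

    def add_feedback(self, d: int) -> None:
        # D is the increase in ECI learned from an incoming ACK.
        self.r += d

    def re_flag(self, recn_capable: bool) -> int:
        if not recn_capable:
            # Pure ACK, retransmission or window probe: RE is cleared to 0
            # (with the ECN field set to Not-ECT), but this does not pay off R.
            return 0
        if self.r > 0:
            self.r -= 1
            return 0
        return 1


if __name__ == "__main__":
    c = ReEchoCounter()
    c.add_feedback(2)
    kinds = [True, False, True, True]   # the second packet is a retransmission
    print([c.re_flag(k) for k in kinds], c.r)   # prints [0, 0, 0, 1] 0
```

   The octet-based alternative raised in the ToDo below would instead
   add D times the packet size to R, and subtract each blanked packet's
   size.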

1490	   This does not ensure precisely the same number of octets have RE
1491	   blanked as were CE marked.  But we believe positive errors will
1492	   cancel negative over a long enough period. {ToDo: However, more
1493	   research is needed to prove whether this is so.  If it is not, it may
1494	   be necessary to increment and decrement R in octets rather than
1495	   packets, by incrementing R as the product of D and the size in octets
1496	   of packets being sent (typically the MSS).}

1498	6.2.  Other Transports

1500	6.2.1.  General Guidelines for Adding Re-ECN to Other Transports

   As a general rule, Re-ECT sender transports that have established
   that the receiver transport is at least ECN-capable (not necessarily
   re-ECN capable) MUST blank the RE flag for at least as many octets as
   arrive at the receiver with the CE codepoint set.  Re-ECN-capable
   sender transports should always initialise the ECN field to the
   ECT(1) codepoint once a flow is established.

   If the sender transport does not have sufficient feedback even to
   estimate the path's CE rate, it SHOULD set FNE continuously.  If the
   sender transport has some, perhaps stale, feedback from which to
   estimate that the path's CE rate is almost certainly less than E%,
   the transport MAY blank RE in packets for E% of sent octets, and set
   the RECT codepoint for the remainder.
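   One way to apply the E% rule is an octet credit counter.  It blanks
   RE on whole packets and keeps the running blanked share at or
   slightly above E% of sent octets.  This is a sketch, not a specified
   algorithm.  The class name is hypothetical, and the ECN field is
   assumed to be ECT(1) throughout, so RE=1 corresponds to the RECT
   codepoint mentioned above.

```python
class OctetBlanker:
    """Blank RE on about e_percent of sent octets, erring towards blanking."""

    def __init__(self, e_percent: float) -> None:
        self.fraction = e_percent / 100.0
        # Octets that should have been blanked so far, minus octets blanked.
        self.owed = 0.0

    def re_flag(self, size: int) -> int:
        """Return 0 (RE blanked) or 1 (RECT) for a packet of `size` octets."""
        self.owed += self.fraction * size
        if self.owed > 0:
            self.owed -= size
            return 0
        return 1


if __name__ == "__main__":
    b = OctetBlanker(25)
    print([b.re_flag(1000) for _ in range(8)])   # prints [0, 1, 1, 1, 0, 1, 1, 1]
```

   Blanking as soon as any octets are owed means the sender never
   blanks fewer than E% of sent octets, and overshoots by less than one
   packet.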

1516	   The following sections give guidelines on how re-ECN support could be
1517	   added to RSVP or NSIS, to DCCP, and to SCTP - although separate
1518	   Internet drafts will be necessary to document the exact mechanics of
1519	   re-ECN in each of these protocols.

1521	   {ToDo: Give a brief outline of what would be expected for each of the
1522	   following:

1524	   o  UDP fire and forget (e.g.  DNS)

1526	   o  UDP streaming with no feedback

1528	   o  UDP streaming with feedback

1530	   }

1532	6.2.2.  Guidelines for adding Re-ECN to RSVP or NSIS

1534	   A separate I-D has been submitted [I-D.re-pcn-border-cheat]
1535	   describing how re-ECN can be used in an edge-to-edge rather than end-
1536	   to-end scenario.  It can then be used by downstream networks to
1537	   police whether upstream networks are blocking new flow reservations
1538	   when downstream congestion is too high, even though the congestion is
1539	   in other operators' downstream networks.  This relates to current
1540	   IETF work on Admission Control over Diffserv using Pre-Congestion
   Notification (PCN) [RFC5559].

1543	6.2.3.  Guidelines for adding Re-ECN to DCCP

   Besides adjusting the initial feature negotiation sequence, operating
   re-ECN in DCCP [RFC4340] could be achieved by defining a new option
   to be added to acknowledgments, which would include a multibit field
   into which the destination could copy its ECC.

1550	6.2.4.  Guidelines for adding Re-ECN to SCTP

   Appendix A in [RFC4960] gives the specifications for SCTP to support
   ECN.  Similar steps should be taken to support re-ECN.  Besides
   adjusting the initial feature negotiation sequence, operating re-ECN
   in SCTP could be achieved by defining a new control chunk, which
   would include a multibit field into which the destination could copy
   its ECC.

1558	7.  Incremental Deployment

   The design of the re-ECN protocol started from the premise that the
   current ECN marking behaviour of queues was sufficient and that
   re-feedback could be introduced around these queues by changing the
1563	   sender behaviour but not the routers.  Otherwise, if we had required
1564	   routers to be changed, the chance of encountering a path that had
1565	   every router upgraded would be vanishingly small during early
1566	   deployment, giving no incentive to start deployment.  Also, as there
1567	   is no new forwarding behaviour, routers and hosts do not have to
1568	   signal or negotiate anything.

1570	   However, networks that choose to protect themselves using re-ECN do
1571	   have to add new security functions at their trust boundaries with
1572	   others.  They distinguish legacy traffic by its ECN field.  Traffic
1573	   from Not-ECT transports is distinguishable by its Not-ECT marking.
1574	   Traffic from RFC3168 compliant ECN transports is distinguished from
1575	   re-ECN by which of ECT(0) or ECT(1) is used.  We chose to use ECT(1)
1576	   for re-ECN traffic deliberately.  Existing ECN sources set ECT(0) on
1577	   either 50% (the nonce) or 100% (the default) of packets, whereas re-
1578	   ECN does not use ECT(0) at all.  We can use this distinguishing
1579	   feature of RFC3168 compliant ECN traffic to separate it out for
1580	   different treatment at the various border security functions: egress
1581	   dropping, ingress policing and border policing.
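   The distinction described above can be sketched as a per-flow
   heuristic at a border function.  The sketch rests on assumptions this
   specification does not state:

   o  It looks only at the two-bit ECN field and ignores the RE flag.
      So it cannot tell FNE packets or re-ECN pure ACKs from legacy
      Not-ECT packets.

   o  It classifies a flow from the set of codepoints seen so far.  A
      short ECN nonce flow might therefore show only ECT(1) by chance.

   The names are hypothetical.

```python
from typing import Iterable

# Two-bit ECN field values (RFC 3168).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def classify_flow(ecn_fields: Iterable[int]) -> str:
    """Heuristically classify a flow from the ECN field values of its packets."""
    seen = set(ecn_fields)
    if ECT_0 in seen:
        # RFC3168 senders set ECT(0) on 50% (nonce) or 100% of packets;
        # re-ECN senders never set ECT(0).
        return "legacy RFC3168 ECN"
    if ECT_1 in seen:
        return "re-ECN"
    if seen == {NOT_ECT}:
        return "legacy Not-ECT"
    return "undetermined"   # e.g. nothing seen yet, or only CE marked packets


if __name__ == "__main__":
    print(classify_flow([NOT_ECT, ECT_1, ECT_1, CE]))   # prints re-ECN
```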

1583	   The general principle we adopt is that an egress dropper will not
1584	   drop any legacy traffic, but ingress and border policers will limit
1585	   the bulk rate of legacy traffic (Not-ECT, ECT(0) and those marked
1586	   with the unused codepoint) that can enter each network.  Then, during
1587	   early re-ECN deployment, operators can set very permissive (or non-
1588	   existent) rate-limits on legacy traffic, but once re-ECN
1589	   implementations are generally available, legacy traffic can be rate-
1590	   limited increasingly harshly.  Ultimately, an operator might choose
1591	   to block all legacy traffic entering its network, or at least only
1592	   allow through a trickle.

   Then, the more strictly the limits are set, the more RFC3168 ECN
   sources will gain by upgrading to re-ECN.  Thus, towards the end of
   the voluntary incremental deployment period, RFC3168 compliant
   transports can be given progressively stronger encouragement to
   upgrade.

   The following list of minor changes brings together all the points
   where re-ECN semantics for use of the two-bit ECN field differ from
   those of RFC3168:

1604	   o  A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender
1605	      sets ECT(0) by default (Section 4.3);

1607	   o  No provision is necessary for a re-ECN capable source transport to
1608	      use the ECN nonce (Section 6.1.2.1);

1610	   o  Routers MAY preferentially drop different extended ECN codepoints
1611	      (Section 5.3);

1613	   o  Packets carrying the feedback not established (FNE) codepoint MAY
1614	      optionally be marked rather than dropped by routers, even though
1615	      their ECN field is Not-ECT (with the important caveat in
1616	      Section 5.3);

   o  Packets may be dropped by policing nodes because of apparent
      misbehaviour, not just because of congestion;

1621	   o  Tunnel entry behaviour is still to be defined, but may have to be
1622	      different from RFC3168 (Section 5.6).

   None of these changes requires any modification to routers.  Also,
   none of these changes affects anything about end-to-end congestion
   control; they are all to do with allowing networks to police that
   end-to-end congestion control is well-behaved.

1629	8.  Related Work
1630	8.1.  Congestion Notification Integrity

1632	   The choice of two ECT code-points in the ECN field [RFC3168]
1633	   permitted future flexibility, optionally allowing the sender to
1634	   encode the experimental ECN nonce [RFC3540] in the packet stream.
1635	   This mechanism has since been included in the specifications of DCCP
1636	   [RFC4340].

1638	   {ToDo: DCCP provides nonce support - how does this affect the RFC?}

1640	   The ECN nonce is an elegant scheme that allows the sender to detect
1641	   if someone in the feedback loop - the receiver especially - tries to
1642	   claim no congestion was experienced when in fact congestion led to
1643	   packet drops or ECN marks.  For each packet it sends, the sender
1644	   chooses between the two ECT codepoints in a pseudo-random sequence.
1645	   Then, whenever the network marks a packet with CE, if the receiver
1646	   wants to deny congestion happened, she has to guess which ECT
1647	   codepoint was overwritten.  She has only a 50:50 chance of being
1648	   correct each time she denies a congestion mark or a drop, which
1649	   ultimately will give her away.
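   The 50:50 argument above can be checked with a short sketch.  It
   uses a simplified per-packet model (an assumption: the nonce in
   [RFC3540] is actually reported as a running one-bit sum, which this
   sketch does not reproduce) in which the receiver must name the ECT
   codepoint that each concealed CE mark overwrote.  Enumerating every
   equally likely combination of nonce and guess shows that a receiver
   denying n marks escapes detection with probability 0.5^n.  The
   function names are hypothetical.

```python
from itertools import product

# Simplified model (assumption): each concealed CE mark forces the receiver
# to name the ECT codepoint it overwrote. RFC 3540 actually uses a running
# one-bit nonce sum, which this sketch does not reproduce.
ECT0, ECT1 = 0, 1


def concealment_detected(sender_nonces, receiver_guesses):
    """The sender catches the receiver if any guessed codepoint is wrong."""
    return any(s != g for s, g in zip(sender_nonces, receiver_guesses))


def escape_fraction(n):
    """Fraction of equally likely (nonce, guess) pairs in which a receiver
    that denies n congestion marks is never caught."""
    patterns = list(product((ECT0, ECT1), repeat=n))
    undetected = sum(
        1
        for nonces in patterns
        for guesses in patterns
        if not concealment_detected(nonces, guesses)
    )
    return undetected / len(patterns) ** 2


for n in (1, 2, 5):
    print(n, escape_fraction(n))  # equals 0.5 ** n
```

   Each additional denial halves the receiver's chance of going
   unnoticed, which is why repeated cheating "ultimately will give her
   away".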

1651	   The purpose of a network-layer nonce should primarily be protection
1652	   of the network, while a transport-layer nonce would be better used to
1653	   protect the sender from cheating receivers.  Now, the assumption
1654	   behind the ECN nonce is that a sender will want to detect whether a
1655	   receiver is suppressing congestion feedback.  This is only true if
1656	   the sender's interests are aligned with the network's, or with the
1657	   community of users as a whole.  This may be true for certain large
1658	   senders, who are under close scrutiny and have a reputation to
1659	   maintain.  But we have to deal with a more hostile world, where
1660	   traffic may be dominated by peer-to-peer transfers, rather than
1661	   downloads from a few popular sites.  Often the `natural' self-
1662	   interest of a sender is not aligned with the interests of other
1663	   users.  It often wishes to transfer data quickly to the receiver as
1664	   much as the receiver wants the data quickly.

1666	   In contrast, the re-ECN protocol enables policing of an agreed rate-
1667	   response to congestion (e.g. TCP-friendliness) at the sender's
1668	   interface with the internetwork.  It also ensures downstream networks
1669	   can police their upstream neighbours, to encourage them to police
1670	   their users in turn.  But most importantly, it requires the sender to
1671	   declare path congestion to the network and it can remove traffic at
1672	   the egress if this declaration is dishonest.  So it can police
1673	   correctly, irrespective of whether the receiver tries to suppress
1674	   congestion feedback or whether the sender ignores genuine congestion
1675	   feedback.  Therefore the re-ECN protocol addresses a much wider range
1676	   of cheating problems, which includes the one addressed by the ECN
1677	   nonce.

1679	   {ToDo: Ensure we address the early ACK problem.}

1681	9.  Security Considerations

1683	   {ToDo: Describe attacks by networks on flows and by spoofing
1684	   sources.} {ToDo: Re-ECN & DNS servers}

1686	   This whole memo concerns the deployment of a secure congestion
1687	   control framework.  However, below we list some specific security
1688	   issues that we are still working on:

1690	   o  Malicious users can launch dynamically changing attacks,
1691	      exploiting the time it takes to detect an attack, given that
1692	      ECN marking is binary.  We are concentrating on subtle
1693	      interactions between the ingress policer and the egress dropper in
1694	      an effort to make it impossible to game the system.

1696	   o  There is an inherent need for at least some flow state at the
1697	      egress dropper given the binary marking environment, which leads
1698	      to an apparent vulnerability to state exhaustion attacks.  An
1699	      egress dropper design with bounded flow state is being written up.

1701	   o  A malicious source can spoof another user's address and send
1702	      negative traffic to the same destination in order to fool the
1703	      dropper into sanctioning the other user's flow.  To prevent or
1704	      mitigate these two different kinds of DoS attack, against the
1705	      dropper and against given flows, we are considering various
1706	      protection mechanisms.

1708	   o  A malicious client can send requests using a spoofed source
1709	      address to a server (such as a DNS server) that tends to respond
1710	      with single packet responses.  This server will then be tricked
1711	      into having to set FNE on the first (and only) packet of all these
1712	      wasted responses.  Given packets marked FNE are worth +1, this
1713	      will cause such servers to consume more of their allowance to
1714	      cause congestion than they would wish to.  In general, re-ECN is
1715	      deliberately designed so that single packet flows have to bear the
1716	      cost of not discovering the congestion state of their path.  One
1717	      of the reasons for introducing re-ECN is to encourage short flows
1718	      to make use of previous path knowledge by moving the cost of this
1719	      lack of knowledge to sources that create short flows.  Therefore,
1720	      in the long run we might expect services like DNS to aggregate
1721	      single packet flows into connections where it brings benefits.
1722	      However, this attack where DNS requests are made from spoofed
1723	      addresses genuinely forces the server to waste its resources.  The
1724	      only mitigating feature is that the attacker has to set FNE on
1725	      each of its requests if they are to get through an egress dropper
1726	      to a DNS server.  The attacker therefore has to consume as many
1727	      resources as the victim, which at least implies re-ECN does not
1728	      unwittingly amplify this attack.

1730	   Having highlighted outstanding security issues, we now explain the
1731	   design decisions that were taken based on a security-related
1732	   rationale.  It may seem that the six codepoints of the eight made
1733	   available by extending the ECN field with the RE flag have been used
1734	   rather wastefully to encode just five states.  In effect the RE flag
1735	   has been used as an orthogonal single bit, using up four codepoints
1736	   to encode the three states of positive, neutral and negative worth.
1737	   The mapping of the codepoints in an earlier version of this proposal
1738	   used the codepoint space more efficiently, but the scheme became
1739	   vulnerable to network operators bypassing congestion penalties by
1740	   focusing congestion marking on positive packets.  Appendix B explains
1741	   why fixing that problem while allowing for incremental deployment
1742	   would have used another codepoint anyway.  So it was better to use
1743	   this orthogonal encoding scheme, which greatly simplified the whole
1744	   protocol and brought with it some subtle security benefits (see the
1745	   last paragraph of Appendix B).

1747	   With the scheme as now proposed, once the RE flag is set or cleared
1748	   by the sender or its proxy, it should not be written by the network,
1749	   only read.  So the endpoints can detect if any network maliciously
1750	   alters the RE flag.  IPsec AH integrity checking does not cover the
1751	   IPv4 header flags (they were considered mutable, even the one we
1752	   propose using for the RE flag, `currently unused' when IPsec
1753	   was defined).  But it would be sufficient for a pair of endpoints to
1754	   make random checks on whether the RE flag was the same when it
1755	   reached the egress as when it left the ingress.  Indeed, if IPsec AH
1756	   had covered the RE flag, any network intending to alter sufficient RE
1757	   flags to make a gain would have focused its alterations on packets
1758	   without authenticating headers (AHs).

1760	   The security of re-ECN has been deliberately designed to not rely on
1761	   cryptography.

1763	10.  IANA Considerations

1765	   This memo includes no request to IANA (yet).

1767	   If this memo were to progress to standards track, it would list:

1769	   o  The new RE flag in IPv4 (Section 5.1) and its extension with the
1770	      ECN field to create a new set of extended ECN (EECN) codepoints;

1772	   o  The definition of the EECN codepoints for default Diffserv PHBs
1773	      (Section 4.2);

1775	   o  The Hop-by-Hop option ID for the new extension header for IPv6
1776	      (Section 5.2);

1778	   o  The new combinations of flags in the TCP header for capability
1779	      negotiation (Section 6.1.3).

1781	11.  Conclusions

1783	   {ToDo:}

1785	12.  Acknowledgements

1787	   Sebastien Cazalet and Andrea Soppera contributed to the idea of re-
1788	   feedback.  All the following have given helpful comments: Andrea
1789	   Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley,
1790	   Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright,
1791	   John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru
1792	   Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd
1793	   (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark
1794	   Handley (who developed the attack with canceled packets), Adam
1795	   Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft
1796	   (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who
1797	   complemented our own dummy traffic attacks with others), Liz Maida
1798	   (MIT), Meral Shirazipour (Ericsson) and comments from participants in
1799	   the CRN/CFP Broadband and DoS-resistant Internet working groups.  A
1800	   special thank you to Alessandro Salvatori for coming up with fiendish
1801	   attacks on re-ECN.

1803	13.  Comments Solicited

1805	   Comments and questions are encouraged and very welcome.  They can be
1806	   addressed to the IETF Congestion Exposure (ConEx) working group's
1807	   mailing list <conex@ietf.org>, and/or to the authors.

1809	14.  References

1811	14.1.  Normative References

1813	   [RFC2119]                  Bradner, S., "Key words for use in RFCs to
1814	                              Indicate Requirement Levels", BCP 14,
1815	                              RFC 2119, March 1997.

1817	   [RFC3168]                  Ramakrishnan, K., Floyd, S., and D. Black,
1818	                              "The Addition of Explicit Congestion
1819	                              Notification (ECN) to IP", RFC 3168,
1820	                              September 2001.

1822	   [RFC3390]                  Allman, M., Floyd, S., and C. Partridge,
1823	                              "Increasing TCP's Initial Window",
1824	                              RFC 3390, October 2002.

1826	   [RFC4302]                  Kent, S., "IP Authentication Header",
1827	                              RFC 4302, December 2005.

1829	   [RFC4340]                  Kohler, E., Handley, M., and S. Floyd,
1830	                              "Datagram Congestion Control Protocol
1831	                              (DCCP)", RFC 4340, March 2006.

1833	   [RFC4341]                  Floyd, S. and E. Kohler, "Profile for
1834	                              Datagram Congestion Control Protocol
1835	                              (DCCP) Congestion Control ID 2: TCP-like
1836	                              Congestion Control", RFC 4341, March 2006.

1838	   [RFC4342]                  Floyd, S., Kohler, E., and J. Padhye,
1839	                              "Profile for Datagram Congestion Control
1840	                              Protocol (DCCP) Congestion Control ID 3:
1841	                              TCP-Friendly Rate Control (TFRC)",
1842	                              RFC 4342, March 2006.

1844	   [RFC4835]                  Manral, V., "Cryptographic Algorithm
1845	                              Implementation Requirements for
1846	                              Encapsulating Security Payload (ESP) and
1847	                              Authentication Header (AH)", RFC 4835,
1848	                              April 2007.

1850	   [RFC4960]                  Stewart, R., "Stream Control Transmission
1851	                              Protocol", RFC 4960, September 2007.

1853	   [RFC5562]                  Kuzmanovic, A., Mondal, A., Floyd, S., and
1854	                              K. Ramakrishnan, "Adding Explicit
1855	                              Congestion Notification (ECN) Capability
1856	                              to TCP's SYN/ACK Packets", RFC 5562,
1857	                              June 2009.

1859	   [RFC5681]                  Allman, M., Paxson, V., and E. Blanton,
1860	                              "TCP Congestion Control", RFC 5681,
1861	                              September 2009.

1863	   [RFC6040]                  Briscoe, B., "Tunnelling of Explicit
1864	                              Congestion Notification", RFC 6040,
1865	                              November 2010.

1867	14.2.  Informative References

1869	   [ARI05]                    Adams, J., Roberts, L., and A.
1870	                              IJsselmuiden, "Changing the Internet to
1871	                              Support Real-Time Content Supply from a
1872	                              Large Fraction of Broadband Residential
1873	                              Users", BT Technology Journal
1874	                              (BTTJ) 23(2), April 2005.

1876	   [I-D.conex-tcp-mods]       Kuehlewind, M. and R. Scheffenegger, "TCP
1877	                              modifications for Congestion Exposure",
1878	                              draft-ietf-conex-tcp-modifications-05
1879	                              (work in progress), February 2014.

1881	   [I-D.re-ecn-motiv]         Briscoe, B., Jacquet, A., Moncaster, T.,
1882	                              and A. Smith, "Re-ECN: A Framework for
1883	                              adding Congestion Accountability to
1884	                              TCP/IP",
1885	                              draft-briscoe-conex-re-ecn-motiv-03 (work
1886	                              in progress), March 2014.

1888	   [I-D.re-pcn-border-cheat]  Briscoe, B., "Emulating Border Flow
1889	                              Policing using Re-PCN on Bulk Data",
1890	                              draft-briscoe-re-pcn-border-cheat-03 (work
1891	                              in progress), October 2009.

1893	   [RFC2309]                  Braden, B., Clark, D., Crowcroft, J.,
1894	                              Davie, B., Deering, S., Estrin, D., Floyd,
1895	                              S., Jacobson, V., Minshall, G., Partridge,
1896	                              C., Peterson, L., Ramakrishnan, K.,
1897	                              Shenker, S., Wroclawski, J., and L. Zhang,
1898	                              "Recommendations on Queue Management and
1899	                              Congestion Avoidance in the Internet",
1900	                              RFC 2309, April 1998.

1902	   [RFC2475]                  Blake, S., Black, D., Carlson, M., Davies,
1903	                              E., Wang, Z., and W. Weiss, "An
1904	                              Architecture for Differentiated Services",
1905	                              RFC 2475, December 1998.

1907	   [RFC3124]                  Balakrishnan, H. and S. Seshan, "The
1908	                              Congestion Manager", RFC 3124, June 2001.

1910	   [RFC3514]                  Bellovin, S., "The Security Flag in the
1911	                              IPv4 Header", RFC 3514, April 2003.

1913	   [RFC3540]                  Spring, N., Wetherall, D., and D. Ely,
1914	                              "Robust Explicit Congestion Notification
1915	                              (ECN) Signaling with Nonces", RFC 3540,
1916	                              June 2003.

1918	   [RFC4301]                  Kent, S. and K. Seo, "Security
1919	                              Architecture for the Internet Protocol",
1920	                              RFC 4301, December 2005.

1922	   [RFC5129]                  Davie, B., Briscoe, B., and J. Tay,
1923	                              "Explicit Congestion Marking in MPLS",
1924	                              RFC 5129, January 2008.

1926	   [RFC5559]                  Eardley, P., "Pre-Congestion Notification
1927	                              (PCN) Architecture", RFC 5559, June 2009.

1929	   [RFC6298]                  Paxson, V., Allman, M., Chu, J., and M.
1930	                              Sargent, "Computing TCP's Retransmission
1931	                              Timer", RFC 6298, June 2011.

1933	   [Re-fb]                    Briscoe, B., Jacquet, A., Di Cairano-
1934	                              Gilfedder, C., Salvatori, A., Soppera, A.,
1935	                              and M. Koyabe, "Policing Congestion
1936	                              Response in an Internetwork Using Re-
1937	                              Feedback", ACM SIGCOMM CCR 35(4)277--288,
1938	                              August 2005, <http://www.acm.org/sigs/
1939	                              sigcomm/sigcomm2005/
1940	                              techprog.html#session8>.

1942	   [Savage99]                 Savage, S., Cardwell, N., Wetherall, D.,
1943	                              and T. Anderson, "TCP congestion control
1944	                              with a misbehaving receiver", ACM SIGCOMM
1945	                              CCR 29(5), October 1999, <http://
1946	                              citeseer.ist.psu.edu/savage99tcp.html>.

1948	   [Steps_DoS]                Handley, M. and A. Greenhalgh, "Steps
1949	                              towards a DoS-resistant Internet
1950	                              Architecture", Proc. ACM SIGCOMM workshop
1951	                              on Future directions in network
1952	                              architecture (FDNA'04) pp 49--56,
1953	                              August 2004.

1955	   [tcp-rcv-cheat]            Moncaster, T., Briscoe, B., and A.
1956	                              Jacquet, "A TCP Test to Allow Senders to
1957	                              Identify Receiver Non-Compliance",
1958	                              draft-moncaster-tcpm-rcv-cheat-02 (work in
1959	                              progress), November 2007.

1961	Appendix A.  Precise Re-ECN Protocol Operation

1963	   The protocol operation in Section 4.3 was described as an
1964	   approximation.  In fact, standard ECN marking at a queue combines 1%
1965	   and 2% marking into slightly less than 3% whole-path marking, because
1966	   queues deliberately mark CE whether or not a packet has already been
1967	   marked by another queue upstream.  So the combined marking fraction
1968	   would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.

1970	   To generalise this we will need some notation.

1972	   o  j represents the index of each resource (typically queues) along a
1973	      path, ranging from 0 at the first queue to n-1 at the last.

1975	   o  m_j represents the fraction of octets to be *m*arked CE by a
1976	      particular queue (whether or not they are already marked) because
1977	      of congestion of resource j.

1979	   o  u_j represents congestion signals arriving from *u*pstream of
1980	      resource j, being the fraction of CE marking in arriving packet
1981	      headers (before marking).

1983	   o  p_j represents *p*ath congestion, being the fraction of packets
1984	      arriving at resource j with the RE flag blanked (excluding Not-
1985	      RECT packets).

1987	   o  v_j denotes expected congestion downstream of resource j, which
1988	      can be thought of as a *v*irtual marking fraction, being derived
1989	      from two other marking fractions.

1991	   Observed fractions of each particular codepoint (u, p and v) and
1992	   queue marking rate m are dimensionless fractions, being the ratio of
1993	   two data volumes (marked and total) over a monitoring period.  All
1994	   measurements are in terms of octets, not packets, assuming that line
1995	   resources are more congestible than packet processing.

1997	   The path congestion (RE blanking fraction) set by the sender should
1998	   reflect upstream congestion (CE marking fraction) from the viewpoint
1999	   of the destination, which it feeds back to the sender.  Therefore in
2000	   the steady state

2002	      p_0  = u_n
2003	           = 1 - (1 - m_0)(1 - m_1)...(1 - m_(n-1))

2005	   Similarly, at some point j in the middle of the network, given
2006	   p_j = 1 - (1 - u_j)(1 - v_j), then

2008	      v_j  = 1 - (1 - p_j)/(1 - u_j)

2010	          ~= p_j - u_j;                    if u_j << 100%

2012	   So, between the two routers in the example in Section 4.3, congestion
2013	   downstream is
2014	      v_1  = 100.00% - (100% - 2.98%) / (100% - 1.00%)
2015	           = 2.00%,

2017	   or a useful approximation of downstream congestion is

2019	      v_1 ~= 2.98% - 1.00%
2020	          ~= 1.98%.
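   The arithmetic above can be reproduced with the following sketch.
   It assumes, as the formula for p_0 does, that each queue marks
   independently of any earlier marking, and that the network leaves
   the RE flag untouched so that p_j equals the p_0 set by the sender.
   The function names are illustrative only.

```python
def combined_marking(m):
    """Whole-path CE fraction when queue j marks a fraction m[j] of octets
    whether or not they are already marked:
    1 - (1 - m_0)(1 - m_1)...(1 - m_(n-1))."""
    unmarked = 1.0
    for m_j in m:
        unmarked *= 1.0 - m_j
    return 1.0 - unmarked


def downstream_congestion(p_j, u_j):
    """Exact expected downstream congestion v_j = 1 - (1 - p_j)/(1 - u_j)."""
    return 1.0 - (1.0 - p_j) / (1.0 - u_j)


def downstream_congestion_approx(p_j, u_j):
    """Approximation v_j ~= p_j - u_j, valid when u_j << 100%."""
    return p_j - u_j


m = [0.01, 0.02]               # the two queues of the Section 4.3 example
p = combined_marking(m)        # steady state: RE blanking p_0 = u_n
u_1 = combined_marking(m[:1])  # CE marking arriving at the second queue
print(f"p    = {p:.4f}")
print(f"v_1  = {downstream_congestion(p, u_1):.4f}")
print(f"v_1 ~= {downstream_congestion_approx(p, u_1):.4f}")
```

   Running it prints 0.0298, 0.0200 and 0.0198, matching the 2.98%,
   2.00% and 1.98% figures above.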

2022	Appendix B.  Justification for Two Codepoints Signifying Zero Worth
2023	             Packets

2025	   It may seem a waste of a codepoint to set aside two codepoints of the
2026	   Extended ECN field to signify zero worth (RECT and CE(0) are both
2027	   worth zero).  The justification is subtle, but worth recording.

2029	   The original version of Re-ECN ([Re-fb] and draft-00 of this memo)
2030	   used three codepoints for neutral (ECT(1)), positive (ECT(0)) and
2031	   negative (CE) packets.  The sender set packets to neutral unless re-
2032	   echoing congestion, when it set them positive, in much the same way
2033	   that it blanks the RE flag in the current protocol.  However, routers
2034	   were meant to mark congestion by setting packets negative (CE)
2035	   irrespective of whether they had previously been neutral or positive.

2037	   But we did not arrange for senders to remember which packet had
2038	   been sent with which codepoint, or for feedback to say exactly which
2039	   packets arrived with which codepoints.  The transport was meant to
2040	   inflate the number of positive packets it sent to allow for a few
2041	   being wiped out by congestion marking.  We (wrongly) assumed that
2042	   routers would congestion mark packets indiscriminately, so the
2043	   transport could infer how many positive packets had been marked and
2044	   compensate accordingly by re-echoing.  But this created a perverse
2045	   incentive for routers to preferentially congestion mark positive
2046	   packets rather than neutral ones.

2048	   We could have removed this perverse incentive by requiring Re-ECN
2049	   senders to remember which packets they had sent with which codepoint,
2050	   and by requiring feedback from the receiver to identify which packets
2051	   arrived as which.  Then, if a positive packet was congestion marked
2052	   to negative, the sender could have re-echoed twice to maintain the
2053	   balance between positive and negative at the receiver.

2055	   Instead, we chose to make re-echoing congestion (blanking RE)
2056	   orthogonal to congestion notification (marking CE), which required a
2057	   second neutral codepoint.  Then the receiver would be able to detect
2058	   and echo a congestion event even if it arrived on a packet that had
2059	   originally been positive.

2061	   If we had added extra complexity to the sender and receiver
2062	   transports to track changes to individual packets, we could have made
2063	   it work, but then routers would have had an incentive to mark
2064	   positive packets with half the probability of neutral packets.  That
2065	   in turn would have led router algorithms to become more complex.
2066	   Then senders wouldn't know whether a mark had been introduced by a
2067	   simple or a complex router algorithm.  That in turn would have
2068	   required another codepoint to distinguish between RFC3168 ECN and new
2069	   Re-ECN router marking.

2071	   Once the cost of IP header codepoint real-estate was the same for
2072	   both schemes, there was no doubt that the simpler option for
2073	   endpoints and for routers should be chosen.  The resulting protocol
2074	   also no longer needed the tricky inflation/deflation complexity of
2075	   the original (broken) scheme.  It was also much simpler to understand
2076	   conceptually.

2078	   A further advantage of the new orthogonal four-codepoint scheme was
2079	   that senders owned sole rights to change the RE flag and routers
2080	   owned sole rights to change the ECN field.  Although we still arrange
2081	   the incentives so neither party strays outside their dominion, these
2082	   clear lines of authority simplify the matter.

2084	   Finally, a little redundancy can be very powerful in a scheme such as
2085	   this.  In one flow, the proportion of packets changed to CE should be
2086	   the same as the proportion of RECT packets changed to CE(-1) and the
2087	   proportion of Re-Echo packets changed to CE(0).  Double checking
2088	   using such redundant relationships can improve the security of a
2089	   scheme (cf. double-entry book-keeping or the ECN Nonce).
2090	   Alternatively, it might be necessary to exploit the redundancy in the
2091	   future to encode an extra information channel.
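   As an illustration of the redundancy described above, a monitor
   could compare the CE fraction among packets sent as RECT with that
   among packets sent as Re-Echo.  This is a hypothetical sketch: it
   works on codepoint names rather than header bits, assumes a flow
   that uses only Re-ECN codepoints (FNE, Not-RECT and legacy ECT(0)
   packets are ignored), and uses a fixed tolerance where a real
   monitor would need a proper statistical test.

```python
from collections import Counter


def redundancy_consistent(codepoints, tolerance=0.01):
    """Compare the fraction of RECT packets that arrive as CE(-1) with the
    fraction of Re-Echo packets that arrive as CE(0).

    codepoints: extended ECN codepoint names observed for one flow.
    tolerance: hypothetical threshold, not taken from the draft.
    Returns None when either population is empty."""
    counts = Counter(codepoints)
    rect_sent = counts["RECT"] + counts["CE(-1)"]    # RE flag left set
    echo_sent = counts["Re-Echo"] + counts["CE(0)"]  # RE flag blanked
    if rect_sent == 0 or echo_sent == 0:
        return None
    rect_marked = counts["CE(-1)"] / rect_sent
    echo_marked = counts["CE(0)"] / echo_sent
    return abs(rect_marked - echo_marked) <= tolerance


flow = ["RECT"] * 98 + ["CE(-1)"] * 2 + ["Re-Echo"] * 49 + ["CE(0)"]
print(redundancy_consistent(flow))  # 2% of each population marked
```

   A router that preferentially marked one kind of packet would drive
   the two fractions apart, which is the kind of double check the
   paragraph above has in mind.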

2093	Appendix C.  ECN Compatibility

2095	   The rationale for choosing the particular combinations of SYN and SYN
2096	   ACK flags in Section 6.1.3 is as follows.

2098	   Choice of SYN flags:  A Re-ECN sender can work with RFC3168 compliant
2099	      ECN receivers so we wanted to use the same flags as would be used
2100	      in an ECN-setup SYN [RFC3168] (CWR=1, ECE=1).  But at the same
2101	      time, we wanted a server (host B) that is Re-ECT to be able to
2102	      recognise that the client (A) is also Re-ECT.  We believe also
2103	      setting NS=1 in the initial SYN achieves both these objectives, as
2104	      it should be ignored by RFC3168 compliant ECT receivers and by
2105	      ECT-Nonce receivers.  But senders that are not Re-ECT should not
2106	      set NS=1.  At the time ECN was defined, the NS flag was not
2107	      defined, so setting NS=1 should be ignored by existing ECT
2108	      receivers (but testing against implementations may yet prove
2109	      otherwise).  The ECN Nonce RFC [RFC3540] is silent on what the NS
2110	      field might be set to in the TCP SYN, but we believe the intent
2111	      was for a nonce client to set NS=0 in the initial SYN (again only
2112	      testing will tell).  Therefore we define a Re-ECN-setup SYN as one
2113	      with NS=1, CWR=1 & ECE=1.

2115	   Choice of SYN ACK flags:  The client (A) needs to
2116	      be able to determine whether the server (B) is Re-ECT.  The
2117	      original ECN specification required an ECT server to respond to an
2118	      ECN-setup SYN with an ECN-setup SYN ACK of CWR=0 and ECE=1.  There
2119	      is no room to modify this by setting the NS flag, as that is
2120	      already set in the SYN ACK of an ECT-Nonce server.  So we used the
2121	      only combination of CWR and ECE that would not be used by existing
2122	      TCP receivers: CWR=1 and ECE=0.  The original ECN specification
2123	      defines this combination as a non-ECN-setup SYN ACK, which remains
2124	      true for RFC3168 compliant and Nonce ECTs.  But for Re-ECN we
2125	      define it as a Re-ECN-setup SYN ACK.  We didn't use a SYN ACK with
2126	      both CWR and ECE cleared to 0 because that would be the likely
2127	      response from most Not-ECT receivers.  And we didn't use a SYN ACK
2128	      with both CWR and ECE set to 1 either, as at least one broken
2129	      receiver implementation echoes whatever flags were in the SYN into
2130	      its SYN ACK.  Therefore we define a Re-ECN-setup SYN ACK as one
2131	      with CWR=1 & ECE=0.

2133	   Choice of two alternative SYN ACKs:  the NS flag may take either
2134	      value in a Re-ECN-setup SYN ACK.  Section 5.4 specifies that a
2135	      Re-ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK
2136	      to echo congestion experienced (CE) on the initial SYN.  Otherwise,
2137	      a Re-ECN-setup SYN ACK MUST be returned with NS=0.  The only current
2138	      known use of the NS flag in a SYN ACK is to indicate support for
2139	      the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
2140	      Given the ECN nonce MUST NOT be used for a RECN mode connection, a
2141	      Re-ECN-setup SYN ACK can use either setting of the NS flag without
2142	      any risk of confusion, because the CWR & ECE flags will be
2143	      reversed relative to those used by an ECN nonce SYN ACK.
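   The flag combinations chosen above can be summarised as a pair of
   classifiers.  This sketch is derived only from the rationale in this
   appendix (Section 6.1.3 remains the definition); the type and
   function names are hypothetical, and only the NS, CWR and ECE flags
   are modelled.

```python
from typing import NamedTuple, Tuple


class SynFlags(NamedTuple):
    ns: int   # ECN-nonce sum flag
    cwr: int  # Congestion Window Reduced flag
    ece: int  # ECN-Echo flag


def classify_syn(f: SynFlags) -> str:
    """Server-side view of the client's SYN."""
    if f.cwr == 1 and f.ece == 1:
        # NS=1 distinguishes a Re-ECT client from an RFC 3168 one
        return "Re-ECN-setup SYN" if f.ns == 1 else "ECN-setup SYN"
    return "non-ECN-setup SYN"


def classify_syn_ack(f: SynFlags) -> Tuple[str, bool]:
    """Client-side view of the SYN ACK.

    Returns (kind, ce_on_syn); ce_on_syn is True only for a
    Re-ECN-setup SYN ACK with NS=1 (CE echoed on the initial SYN)."""
    if f.cwr == 1 and f.ece == 0:
        return "Re-ECN-setup SYN ACK", f.ns == 1
    if f.cwr == 0 and f.ece == 1:
        # RFC 3168 ECN-setup SYN ACK; here NS would signal nonce support
        return "ECN-setup SYN ACK", False
    # CWR=ECE=0 (likely Not-ECT) or CWR=ECE=1 (SYN flags reflected)
    return "non-ECN-setup SYN ACK", False
```

   Because the Re-ECN-setup SYN ACK reverses CWR and ECE relative to an
   RFC 3168 or nonce SYN ACK, the NS flag can carry the CE echo without
   being mistaken for nonce support.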

2145	   {ToDo: include the text below, either here, or in the algorithm
2146	   sections} At an egress dropper, well-behaved RFC3168 compliant flows
2147	   packets.  And, if the legacy source is setting the ECN nonce, the
2148	   packet.  And, if the legacy source is setting the ECN nonce, the
2149	   majority of packets will be an equal mix of ECT(0) and ECT(1) packets
2150	   (the latter appearing to be Re-Echo packets in Re-ECN terms).  None
2151	   of these three packet markings is negative, so an egress dropper can
2152	   handle all legacy flows in bulk and, as long as they don't send any
2153	   packets using Re-ECN markings, it need not drop any legacy packets.
2154	   So, as soon as an ECT(0) packet is seen, its flow ID can be added to
2155	   the set of known legacy flows (a single Bloom filter would suffice).
2156	   But, if any packets in flows classified as RFC3168 compliant are
2157	   marked with any other marking than the three expected, the flow can
2158	   be removed from the RFC3168 set, to be treated in bulk with mis-
2159	   behaving Re-ECN flows---the remainder of flow IDs that require no
2160	   flow state to be held.
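   The legacy-flow handling described above might be sketched as
   follows.  A plain Python set stands in for the Bloom filter
   suggested in the text; note that a standard Bloom filter cannot
   delete members, so the removal step would need a variant that
   supports deletion.  Codepoints are modelled as name strings, the
   class name is hypothetical, and this sketch re-admits a removed flow
   if it later sends ECT(0), a case the text does not address.

```python
# Markings an RFC3168 flow (with or without the nonce) can show at the
# egress dropper, named as in the Re-ECN extended codepoint scheme.
LEGACY_MARKINGS = {"ECT(0)", "CE(0)", "Re-Echo"}


class LegacyFlowSet:
    """Hypothetical sketch of the egress dropper's set of legacy flows.

    A plain set stands in for the Bloom filter mentioned in the text."""

    def __init__(self):
        self.flows = set()

    def observe(self, flow_id, marking):
        """Update the set for one packet; return True if the flow is
        currently treated as a legacy RFC3168 flow."""
        if marking == "ECT(0)":
            self.flows.add(flow_id)
        elif flow_id in self.flows and marking not in LEGACY_MARKINGS:
            # Unexpected marking: treat with misbehaving Re-ECN flows
            self.flows.discard(flow_id)
        return flow_id in self.flows


legacy = LegacyFlowSet()
legacy.observe("flow-a", "ECT(0)")   # True: added on first ECT(0)
legacy.observe("flow-a", "CE(0)")    # True: still consistent with RFC3168
legacy.observe("flow-a", "CE(-1)")   # False: removed from the legacy set
```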

2162	   To an ingress Re-ECN policer, legacy ECN flows will appear as very
2163	   highly congested paths.  When policers are first deployed they can be
2164	   configured permissively, allowing through both `RFC3168' ECN and
2165	   misbehaving Re-ECN flows.  Then, as the threshold is set more
2166	   strictly, RFC3168 ECN sources will gain more by upgrading to Re-ECN.
2167	   Thus, towards the end of the voluntary incremental deployment
2168	   period, RFC3168 transports can be given progressively stronger
2169	   encouragement to upgrade.

2171	Appendix D.  Packet Marking with FNE During Flow Start

2173	   FNE (feedback not established) packets have two functions.  Their
2174	   main role is to announce the start of a new flow when feedback has
2175	   not yet been established.  However, they also have the role of
2176	   balancing the expected feedback and can be used where there are
2177	   sudden changes in the rate of transmission.  Although such changes
2178	   should not happen under TCP, this speculative use of FNE marking
2179	   underpins the following argument for why the first and third packets
2180	   should be set to FNE.

2182	   The proportion of FNE packets in each round-trip should be a high
2183	   estimate of the potential error in the balance of number of
2184	   congestion marked packets versus number of re-echo packets already
2185	   issued.

2187	   Let's call:

2189	      S: the number of the TCP segments sent so far

2191	      F: the number of FNE packets sent so far

2193	      R: the number of Re-Echo packets sent so far

2195	      A: the number of acknowledgments received so far

2197	      C: the number of acknowledgments echoing a CE packet

2199	   In normal operation, when we want to send packet S+1, we first need
2200	   to check that enough Re-Echo packets have been issued:

2202	   If R<C, then S+1 will be a Re-Echo packet

2204	   Next we need to estimate the amount of congestion observed so far.
2205	   If congestion was stationary, it could be estimated as C/A. A
2206	   pessimistic bound is (C+1)/(A+1) which assumes that the next
2207	   acknowledgment will echo a CE packet; we'll use that more pessimistic
2208	   estimate to drive the generation of FNE packets.

2210	   The number of CE packets expected by the time S+1 is acknowledged is
2211	   therefore (S+1)*(C+1)/(A+1).  Packet S+1 should be set to FNE if that
2212	   expected value exceeds the sum of FNE and Re-Echo packets sent so
2213	   far.

2215	      If  (F+R)<(S+1)*(C+1)/(A+1),
2216	        then S+1 will be set to FNE
2217	        else S+1 will be set to RECT

2219	   So the full test should be:

2221	      When packet (S+1) is about to be sent...
2222	        If R<C,
2223	           then S+1 will be set to Re-Echo
2224	        Else if  (F+R)<(S+1)*(C+1)/(A+1),
2225	          then S+1 will be set to FNE
2226	        Else S+1 will be set to RECT

2228	   This means that at any point, given A, R, F, C, the source could send
2229	   another k RECT packets, so that k < (F+R)*(A+1)/(C+1)-S
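   The full test can be written as a small function.  This sketch
   cross-multiplies instead of dividing, which is equivalent because
   A+1 is always positive and avoids floating-point rounding; the
   function name and the string labels are illustrative.  The usage
   lines replay the first three packets of the flow-start walk-through
   that follows, assuming the first acknowledgment echoes no CE mark.

```python
def next_marking(S, F, R, A, C):
    """Marking for packet S+1, given S segments sent, F FNE and R Re-Echo
    packets sent, A acknowledgments received, of which C echoed CE."""
    if R < C:
        return "Re-Echo"  # re-echo outstanding congestion first
    # (F+R) < (S+1)*(C+1)/(A+1), cross-multiplied since A+1 > 0
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        return "FNE"
    return "RECT"


first = next_marking(S=0, F=0, R=0, A=0, C=0)   # "FNE"
second = next_marking(S=1, F=1, R=0, A=1, C=0)  # "RECT"
third = next_marking(S=2, F=1, R=0, A=1, C=0)   # "FNE"
print(first, second, third)
```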

2231	   The above scheme is independent of the actions of both the dropper
2232	   and policer and doesn't depend on the rate adaptation discipline of
2233	   the source.  It only defines Re-Echo packets as notification of
2234	   effective end-to-end congestion (as witnessed at the previous round-
2235	   trip), and FNE packets as notification of speculative end-to-end
2236	   congestion based on a high estimate of congestion.

2238	   In practice, for any source:

2240	   o  for the first packet, A=R=F=C=S=0 ==> 1 FNE

2242	   o  if the acknowledgment doesn't echo a mark

2244	      *  for the second packet, A=F=S=1 R=C=0 ==> 1 RECT

2246	      *  for the third packet, S=2 A=F=1 R=C=0 ==> 1 FNE

2248	   o  if no acknowledgement for these two packets echoes a congestion
2249	      mark, then {A=S=3 F=2 R=C=0} which gives k<2*4/1-3, so the source
2250	      could send another 4 RECT packets. ==> 4 RECT

2251	   o  if no acknowledgement for these four packets echoes a congestion
2252	      mark, then {A=S=7 F=2 R=C=0} which gives k<2*8/1-7, so the source
2253	      could send another 8 RECT packets. ==> 8 RECT

2255	   This behaviour happens to match TCP's congestion window control in
2256	   slow start, which is why, for TCP sources, only the first and third
2257	   packets need to be FNE packets.

2259	   A source that opened its congestion window any faster would have
2260	   to insert more FNE packets.  As another example, a UDP source
2261	   sending VBR traffic might need to send several FNE packets ahead of
2262	   the traffic peaks it generates.
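The claim that a slow-start source needs only two FNE packets can be checked with a small simulation. This is a sketch under simplifying assumptions that are not part of the specification: no packet is ever CE-marked (so C and R stay at zero), every packet of a round is acknowledged before the next round starts, and the window is multiplied by a fixed `growth` factor each round. A growth of 2 models slow start; a growth of 3 stands in for a hypothetical source that opens its window faster. The helper names are illustrative, and the classifier restates the per-packet test from this appendix.

```python
def classify_next_packet(S, F, R, A, C):
    """Per-packet test from this appendix, using integer arithmetic."""
    if R < C:
        return "Re-Echo"
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        return "FNE"
    return "RECT"


def fne_positions(growth, rounds):
    """Sequence numbers (1-based) of the packets sent as FNE.

    Hypothetical model: no CE marks, and all packets of a round are
    acknowledged before the next round is sent.
    """
    S = F = R = A = C = 0
    window = 1
    fne = []
    for _ in range(rounds):
        for _ in range(window):
            kind = classify_next_packet(S, F, R, A, C)
            S += 1
            if kind == "FNE":
                F += 1
                fne.append(S)
            elif kind == "Re-Echo":
                R += 1
        A = S  # every packet of this round is acknowledged, none marked
        window *= growth
    return fne


print(fne_positions(growth=2, rounds=6))  # [1, 3]
print(fne_positions(growth=3, rounds=6))  # [1, 3, 11]
```

Under these assumptions the doubling source sends only packets 1 and 3 as FNE, while the tripling source also needs packet 11 to be FNE, consistent with the statement that a faster-opening source has to insert more FNE packets.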

2264	Appendix E.  Argument for holding back the ECN nonce

2266	   The ECN nonce is a mechanism that allows a /sending/ transport to
2267	   detect if drop or ECN marking at a congested router has been
2268	   suppressed by a node somewhere in the feedback loop---another router
2269	   or the receiver.

2271	   Space for the ECN nonce was set aside in [RFC3168] (currently
2272	   proposed standard) while the full nonce mechanism is specified in
2273	   [RFC3540] (currently experimental).  The specification of [RFC4340]
2274	   (currently proposed standard) requires that "Each DCCP sender SHOULD
2275	   set ECN Nonces on its packets...".  It also mandates as a requirement
2276	   for all CCID profiles that "Any newly defined acknowledgement
2277	   mechanism MUST include a way to transmit ECN Nonce Echoes back to the
2278	   sender.", therefore:

2280	   o  The CCID profile for TCP-like Congestion Control [RFC4341]
2281	      (currently proposed standard) says "The sender will use the ECN
2282	      Nonce for data packets, and the receiver will echo those nonces in
2283	      its Ack Vectors."

2285	   o  The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342]
2286	      recommends that "The sender [use] Loss Intervals options' ECN
2287	      Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to
2288	      probabilistically verify that the receiver is correctly reporting
2289	      all dropped or marked packets."

2291	   The primary function of the ECN nonce is to protect the integrity of
2292	   the information about congestion: ECN marks and packet drops.
2293	   However, when the nonce is used to protect the integrity of
2294	   information about packet drops, rather than ECN marks, a transport
2295	   layer nonce will always be sufficient (because a drop loses the
2296	   transport header as well as the ECN field in the network header),
2297	   and would avoid using scarce IP header codepoint space.  Similarly,
2298	   a transport layer nonce would protect against a receiver sending
2299	   early acknowledgements [Savage99].

2301	   If the ECN nonce reveals integrity problems with the information
2302	   about congestion, the sending transport can use that knowledge for
2303	   two functions:

2305	   o  to protect its own resources, by allocating them in proportion to
2306	      the rates that each network path can sustain, based on congestion
2307	      control,

2309	   o  and to protect congested routers in the network, by drastically
2310	      slowing down its connection to the destination with corrupt
2311	      congestion information.

2313	   If the sending transport chooses to act in the interests of congested
2314	   routers, it can reduce its rate if it detects that some malicious
2315	   party in the feedback loop may be suppressing ECN feedback.  But this
2316	   would only be useful to congested routers when /all/ senders using
2317	   them are trusted to act in the interest of the congested routers.

2319	   In the end, the only essential use of a network layer nonce is when
2320	   sending transports (e.g. large servers) want to allocate their /own/
2321	   resources in proportion to the rates that each network path can
2322	   sustain, based on congestion control.  In that case, the nonce allows
2323	   senders to be assured that they aren't being duped into giving more
2324	   of their own resources to a particular flow.  And if congestion
2325	   suppression is detected, the sending transport can rate limit the
2326	   offending connection to protect its own resources.  Certainly, this
2327	   is a useful function, but the IETF should carefully decide whether
2328	   such a single, very specific case warrants IP header space.

2330	   In contrast, Re-ECN allows all routers to fully protect themselves
2331	   from such attacks, without having to trust anyone: senders,
2332	   receivers or neighbouring networks.  Re-ECN is therefore proposed in
2333	   preference to the ECN nonce on the basis that it addresses the
2334	   generic problem of accountability for congestion of a network's
2335	   resources at the IP layer.

2337	   Delaying the ECN nonce is justified because the applicability of the
2338	   ECN nonce seems too limited for it to consume a codepoint of the
2339	   two-bit ECN field in the IP header.  It therefore seems prudent to
2340	   allow time for an alternative way to be found to perform the one
2341	   function for which the nonce is essential.

2343	   Moreover, while we have re-designed the Re-ECN codepoints so that
2344	   they do not prevent the ECN nonce progressing, the same is not true
2345	   the other way round.  If the ECN nonce started to see some deployment
2346	   (perhaps because it was blessed with proposed standard status),
2347	   incremental deployment of Re-ECN would effectively be impossible,
2348	   because Re-ECN marking fractions at inter-domain borders would be
2349	   polluted by unknown levels of nonce traffic.

2351	   The authors are aware that Re-ECN must prove it has the potential it
2352	   claims if it is to displace the nonce.  Therefore, every effort has
2353	   been made to complete a comprehensive specification of Re-ECN so that
2354	   its potential can be assessed.  We therefore seek the opinion of the
2355	   Internet community on whether the Re-ECN protocol is sufficiently
2356	   useful to warrant standards action.

2358	Appendix F.  Alternative Terminology Used in Other Documents

2360	   A number of alternative terms have been used in various documents
2361	   describing re-feedback and re-ECN.  These are set out in the
2362	   following table.

2364	        +---------------------+----------------+------------------+
2365	        | Current Terminology | EECN codepoint |      Colour      |
2366	        +---------------------+----------------+------------------+
2367	        |       Cautious      |       FNE      |       Green      |
2368	        |       Positive      |     Re-Echo    |       Black      |
2369	        |       Neutral       |      RECT      |       Grey       |
2370	        |       Negative      |     CE(-1)     |        Red       |
2371	        |      Cancelled      |      CE(0)     |     Red-Black    |
2372	        |      Legacy ECN     |     ECT(0)     |       White      |
2373	        |   Currently Unused  |     --CU--     | Currently unused |
2374	        |                     |                |                  |
2375	        |        Legacy       |     Not-ECT    |       White      |
2376	        +---------------------+----------------+------------------+

2378	                  Table 7: Alternative re-ECN Terminology

2380	Authors' Addresses

2382	   Bob Briscoe (editor)
2383	   BT
2384	   B54/77, Adastral Park
2385	   Martlesham Heath
2386	   Ipswich  IP5 3RE
2387	   UK

2389	   Phone: +44 1473 645196
2390	   EMail: bob.briscoe@bt.com
2391	   URI:   http://bobbriscoe.net/

2392	   Arnaud Jacquet
2393	   BT
2394	   B54/70, Adastral Park
2395	   Martlesham Heath
2396	   Ipswich  IP5 3RE
2397	   UK

2399	   Phone: +44 1473 647284
2400	   EMail: arnaud.jacquet@bt.com
2401	   URI:

2403	   Toby Moncaster
2404	   Moncaster.com
2405	   Dukes
2406	   Layer Marney
2407	   Colchester  CO5 9UZ
2408	   UK

2410	   EMail: toby@moncaster.com

2412	   Alan Smith
2413	   BT
2414	   B54/76, Adastral Park
2415	   Martlesham Heath
2416	   Ipswich  IP5 3RE
2417	   UK

2419	   Phone: +44 1473 640404
2420	   EMail: alan.p.smith@bt.com