idnits 2.17.1 

draft-briscoe-tsvwg-re-ecn-tcp-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 3002.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2979.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2986.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2992.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHOULD not' in this paragraph:
     
     Appendix F also gives an example dropper implementation that
     aggregates flow state.  Dropper algorithms will often maintain a moving
     average across flows of the fraction of RE blanked packets. When
     maintaining an average across flows, a dropper SHOULD only allow flows
     into the average if they start with FNE, but it SHOULD not include
     packets with the FNE codepoint set in the average.  An ingress gateway
     sets the FNE codepoint when it does not have the benefit of feedback from
     the ingress.  So, counting packets with FNE cleared would be likely to
     make the average unnecessarily positive, providing headroom (or should we
     say footroom?) for dishonest (negative) traffic.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 06, 2006) is 6627 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'ITU-T.I.371' is defined on line 2523, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Downref: Normative reference to an Historic RFC: RFC 3540

  == Outdated reference: A later version (-04) exists of
     draft-briscoe-tsvwg-cl-architecture-02

  -- Obsolete informational reference (is this intentional?): RFC 2988
     (Obsoleted by RFC 6298)


     Summary: 6 errors (**), 0 flaws (~~), 5 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Transport Area Working Group                                  B. Briscoe
3	Internet-Draft                                                  BT & UCL
4	Expires: September 7, 2006                                    A. Jacquet
5	                                                            A. Salvatori
6	                                                                      BT
7	                                                          March 06, 2006

9	     Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
10	                   draft-briscoe-tsvwg-re-ecn-tcp-01

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on September 7, 2006.

37	Copyright Notice

39	   Copyright (C) The Internet Society (2006).

41	Abstract

43	   This document introduces a new protocol for explicit congestion
44	   notification (ECN), termed re-ECN, which can be deployed
45	   incrementally around unmodified routers.  The protocol arranges an
46	   extended ECN field in each packet so that, as it crosses any
47	   interface in an internetwork, it will carry a truthful prediction of
48	   congestion on the remainder of its path.  Then the upstream party at
49	   any trust boundary in the internetwork can be held responsible for
50	   the congestion they cause, or allow to be caused.  So, networks can
51	   introduce straightforward accountability and policing mechanisms for
52	   incoming traffic from end-customers or from neighbouring network
53	   domains.  The purpose of this document is to specify the re-ECN
54	   protocol at the IP layer and to give guidelines on any consequent
55	   changes required to transport protocols.  It includes the changes
56	   required to TCP both as an example and as a specification.  It also
57	   gives examples of mechanisms that can use the protocol to ensure data
58	   sources respond correctly to congestion.  And it describes example
59	   mechanisms that ensure the dominant selfish strategy of both network
60	   domains and end-points will be to set the extended ECN field
61	   honestly.

63	Authors' Statement: Status (to be removed by the RFC Editor)

65	   This document is posted as an Internet-Draft with the intent (at
66	   least that of the authors) to eventually progress to standards track.

68	   Although the re-ECN protocol is intended to make a simple but far-
69	   reaching change to the Internet architecture, the most immediate
70	   priority for the authors is to delay any move of the ECN nonce to
71	   Proposed Standard status.

73	   The ECN nonce is an experimental RFC that allows /senders/ to check
74	   the integrity of congestion feedback from /networks/.  Therefore the
75	   nonce only helps in scenarios where the sender is trusted to control
76	   network congestion.  On the other hand, the re-ECN protocol aims to
77	   allow networks themselves to be able to police cheating senders and
78	   receivers and to police neighbouring networks.  Re-ECN is therefore
79	   proposed in preference to the ECN nonce on the basis that it
80	   addresses the generic problem of accountability for congestion of a
81	   network's resources at the IP layer.

83	   Delaying the ECN nonce is justified by two factors:

85	   o  The ECN nonce would permanently consume a two-bit codepoint in
86	      the IP header for a purpose specific to a limited trust model.
87	      Although the nonce is a neat idea, its applicability seems too
88	      limited to warrant space in the IP header;

90	   o  Although we have re-designed the re-ECN codepoints so that they do
91	      not prevent the ECN nonce progressing, the same is not true the
92	      other way round.  If the ECN nonce started to see some deployment
93	      (perhaps because it was blessed with proposed standard status),
94	      incremental deployment of re-ECN would effectively be impossible,
95	      because re-ECN marking fractions at inter-domain borders would be
96	      polluted by unknown levels of nonce traffic.

98	   The authors are aware that re-ECN must prove it has the potential it
99	   claims if it is to displace the nonce.  Therefore, every effort has
100	   been made to complete a comprehensive specification of re-ECN so that
101	   its potential can be assessed.  We therefore seek the opinion of the
102	   Internet community on whether the re-ECN protocol is sufficiently
103	   useful to warrant standards action.

105	Table of Contents

107	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  5
108	   2.  Requirements notation  . . . . . . . . . . . . . . . . . . . .  6
109	   3.  Protocol Overview  . . . . . . . . . . . . . . . . . . . . . .  7
110	     3.1.  Background and Applicability . . . . . . . . . . . . . . .  7
111	     3.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
112	           v6)  . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
113	     3.3.  Re-ECN Protocol Operation  . . . . . . . . . . . . . . . .  9
114	     3.4.  Informal Terminology . . . . . . . . . . . . . . . . . . . 11
115	   4.  Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 13
116	     4.1.  TCP  . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
117	       4.1.1.  RECN mode: Full re-ECN capable transport . . . . . . . 14
118	       4.1.2.  RECN-Co mode: Re-ECT Sender with a Vanilla or
119	               Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 17
120	       4.1.3.  Capability Negotiation . . . . . . . . . . . . . . . . 18
121	       4.1.4.  Extended ECN (EECN) Field Settings during Flow
122	               Start or after Idle Periods  . . . . . . . . . . . . . 20
123	       4.1.5.  Pure ACKS, Retransmissions, Window Probes and
124	               Partial ACKs . . . . . . . . . . . . . . . . . . . . . 23
125	     4.2.  Other Transports . . . . . . . . . . . . . . . . . . . . . 24
126	       4.2.1.  Guidelines for Adding Re-ECN to Other Transports . . . 24
127	   5.  Network Layer  . . . . . . . . . . . . . . . . . . . . . . . . 24
128	     5.1.  Re-ECN IPv4 Wire Protocol  . . . . . . . . . . . . . . . . 24
129	     5.2.  Re-ECN IPv6 Wire Protocol  . . . . . . . . . . . . . . . . 26
130	     5.3.  Router Forwarding Behaviour  . . . . . . . . . . . . . . . 26
131	     5.4.  Justification for Setting the First SYN to FNE . . . . . . 27
132	     5.5.  Control and Management . . . . . . . . . . . . . . . . . . 28
133	       5.5.1.  Negative Balance Warning . . . . . . . . . . . . . . . 28
134	       5.5.2.  Rate Response Control  . . . . . . . . . . . . . . . . 28
135	     5.6.  Tunnels  . . . . . . . . . . . . . . . . . . . . . . . . . 29
136	     5.7.  Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 29
137	   6.  Applications . . . . . . . . . . . . . . . . . . . . . . . . . 29
138	     6.1.  Policing Congestion Response . . . . . . . . . . . . . . . 29
139	       6.1.1.  The Policing Problem . . . . . . . . . . . . . . . . . 29
140	       6.1.2.  Incentive Framework  . . . . . . . . . . . . . . . . . 30
141	       6.1.3.  Egress Dropper . . . . . . . . . . . . . . . . . . . . 36
142	       6.1.4.  Rate Policing  . . . . . . . . . . . . . . . . . . . . 37
143	       6.1.5.  Inter-domain Policing  . . . . . . . . . . . . . . . . 39
144	       6.1.6.  Simulations  . . . . . . . . . . . . . . . . . . . . . 39

146	     6.2.  Other Applications . . . . . . . . . . . . . . . . . . . . 40
147	       6.2.1.  DDoS Mitigation  . . . . . . . . . . . . . . . . . . . 40
148	       6.2.2.  End-to-end QoS . . . . . . . . . . . . . . . . . . . . 41
149	       6.2.3.  Traffic Engineering  . . . . . . . . . . . . . . . . . 41
150	       6.2.4.  Inter-Provider Service Monitoring  . . . . . . . . . . 41
151	     6.3.  Limitations  . . . . . . . . . . . . . . . . . . . . . . . 41
152	   7.  Incremental Deployment . . . . . . . . . . . . . . . . . . . . 41
153	     7.1.  Incremental Deployment Features  . . . . . . . . . . . . . 42
154	     7.2.  Incremental Deployment Incentives  . . . . . . . . . . . . 42
155	   8.  Architectural Rationale  . . . . . . . . . . . . . . . . . . . 47
156	   9.  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 50
157	     9.1.  Policing Rate Response to Congestion . . . . . . . . . . . 50
158	     9.2.  Congestion Notification Integrity  . . . . . . . . . . . . 50
159	     9.3.  Identifying Upstream and Downstream Congestion . . . . . . 51
160	   10. Security Considerations  . . . . . . . . . . . . . . . . . . . 51
161	   11. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 52
162	   12. Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 52
163	   13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 53
164	   14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 53
165	   15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 53
166	     15.1. Normative References . . . . . . . . . . . . . . . . . . . 53
167	     15.2. Informative References . . . . . . . . . . . . . . . . . . 54
168	   Appendix A.  Precise Re-ECN Protocol Operation . . . . . . . . . . 56
169	   Appendix B.  ECN Compatibility . . . . . . . . . . . . . . . . . . 57
170	   Appendix C.  Packet Marking During Flow Start  . . . . . . . . . . 58
171	   Appendix D.  Example Egress Dropper Algorithm  . . . . . . . . . . 59
172	   Appendix E.  Re-TTL  . . . . . . . . . . . . . . . . . . . . . . . 59
173	   Appendix F.  Policer Designs to ensure Congestion
174	                Responsiveness  . . . . . . . . . . . . . . . . . . . 59
175	     F.1.  Per-user Policing  . . . . . . . . . . . . . . . . . . . . 59
176	     F.2.  Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 61
177	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 64
178	   Intellectual Property and Copyright Statements . . . . . . . . . . 65

180	1.  Introduction

182	   This document aims:

184	   o  To provide a complete specification of the addition of the re-ECN
185	      protocol to IP and guidelines on how to add it to transport layer
186	      protocols, including a complete specification of re-ECN in TCP as
187	      an example;

189	   o  To show how a number of hard problems become much easier to solve
190	      once re-ECN is available in IP.

192	   A general statement of the problem solved by re-ECN is to provide
193	   sufficient information in each IP datagram to be able to hold senders
194	   and whole networks accountable for the congestion they cause
195	   downstream, before they cause it.  But the every-day problems that
196	   re-ECN can solve are much more recognisable than this rather generic
197	   statement: mitigating distributed denial of service (DDoS);
198	   simplifying differentiation of quality of service (QoS); policing
199	   compliance to congestion control; and so on.

201	   Uniquely, re-ECN manages to enable solutions to these problems
202	   without unduly stifling innovative new ways to use the Internet.
203	   This was a hard balance to strike, given it could be argued that DDoS
204	   is an innovative way to use the Internet.  The most valuable insight
205	   was to allow each network to choose the level of constraint it wishes
206	   to impose.  Also re-ECN has been carefully designed so that networks
207	   that choose to use it conservatively can protect themselves against
208	   the congestion caused in their network by users on other networks
209	   with more liberal policies.

211	   For instance, some network owners want to block applications like
212	   voice and video unless their network is compensated for the extra
213	   share of bottleneck bandwidth taken.  These real-time applications
214	   tend to be unresponsive when congestion arises, whereas elastic TCP-
215	   based applications back away quickly and end up with a much smaller
216	   share of congested capacity for themselves.  Other network owners
217	   want to invest in large amounts of capacity and make their gains from
218	   simplicity of operation and economies of scale.

220	   Re-ECN allows the more conservative networks to police out flows that
221	   have not asked to be unresponsive to congestion---not because they
222	   are voice or video---just because they don't respond to congestion.
223	   But it also allows other networks to choose not to police.
224	   Crucially, when flows from liberal networks cross into a conservative
225	   network, re-ECN enables the conservative network to apply penalties
226	   to its neighbouring networks for the congestion they cause.  And
227	   these penalties can be applied to bulk data, without regard to flows.

229	   Then, if unresponsive applications become so dominant that some of
230	   the more liberal networks experience congestion collapse [RFC3714],
231	   they can change their minds and use re-ECN to apply tighter controls
232	   in order to bring congestion back under control.

234	   Re-ECN works by arranging that each packet arrives at each network
235	   element carrying a view of expected congestion on its own downstream
236	   path, albeit averaged over multiple packets.  Most usefully,
237	   congestion on the remainder of the path becomes visible in the IP
238	   header at the first ingress.  Many of the applications of re-ECN
239	   involve a policer at this ingress using the view of downstream
240	   congestion arriving in packets to police or control the packet rate.

242	   Importantly, the scheme is recursive: a whole network harbouring
243	   users causing congestion in downstream networks can be held
244	   responsible or policed by its downstream neighbour.

246	   This document is structured as follows.  First an overview of the re-
247	   ECN protocol is given (Section 3), outlining its attributes and
248	   explaining conceptually how it works as a whole.  The two main parts
249	   of the document follow, as described above.  That is, the protocol
250	   specification divided into transport (Section 4) and network
251	   (Section 5) layers, then the applications it can be put to, such as
252	   policing DDoS, QoS and congestion control (Section 6).  Although
253	   these applications do not require standardisation themselves, they
254	   are described in a fair degree of detail in order to explain how re-
255	   ECN can be used.  Given that re-ECN proposes to use the last
256	   undefined bit in the IPv4 header, we felt it necessary to outline
257	   the potential that re-ECN could release in return for that bit.

259	   Deployment issues discussed throughout the document are brought
260	   together in Section 7, which is followed by a brief section
261	   explaining the somewhat subtle rationale for the design, from an
262	   architectural perspective (Section 8).  We end by describing related
263	   work (Section 9), listing security considerations (Section 10) and
264	   finally drawing conclusions (Section 12).

266	2.  Requirements notation

268	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
269	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
270	   document are to be interpreted as described in [RFC2119].

272	   This document first specifies a protocol, then describes a framework
273	   that creates the right incentives to ensure compliance to the
274	   protocol.  This could cause confusion because the second part of the
275	   document considers many cases where malicious nodes may not comply
276	   with the protocol.  When such contingencies are described, if any of
277	   the above keywords are not capitalised, that is deliberate.  So, for
278	   instance, the following two apparently contradictory sentences would
279	   be perfectly consistent: i) x MUST do this; ii) x may not do this.

281	3.  Protocol Overview

283	3.1.  Background and Applicability

285	   First we briefly recap the essentials of the ECN protocol [RFC3168].
286	   Two bits in the IP header (v4 or v6) are assigned to the ECN field.
287	   The sender clears the field to "00" (Not-ECT) if either end-point
288	   transport is not ECN-capable.  Otherwise it indicates an ECN-capable
289	   transport (ECT) using either of the two code-points "10" or "01"
290	   (ECT(0) and ECT(1) resp.).

292	   ECN-capable routers probabilistically set "11" if congestion is
293	   experienced (CE), the marking probability increasing with the length
294	   of the queue at the egress link (the RED algorithm [RFC2309]).
295	   However, they still drop rather than mark Not-ECT packets.  With
296	   multiple ECN-capable routers on a path, a flow of packets accumulates
297	   the fraction of CE marking that each router adds.  The combined
298	   effect of the packet marking of all the routers along the path
299	   signals congestion of the whole path to the receiver.  So, for
300	   example, if one router early in a path is marking 1% of packets and
301	   another later in a path is marking 2%, flows that pass through both
302	   routers will experience approximately 3% marking (see Appendix A for
303	   a precise treatment).
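
The "approximately 3%" figure above can be checked with a short
calculation.  The following Python sketch is illustrative only and is
not part of the specification.  It assumes that each router marks
packets independently and that a packet, once marked CE, stays marked;
the function name combined_marking is hypothetical.  Appendix A remains
the authoritative treatment.

```python
def combined_marking(fractions):
    """Fraction of packets CE-marked by at least one router on the path.

    Assumption (not from the draft): every router marks independently,
    so a packet survives unmarked with probability prod(1 - p_i).
    """
    unmarked = 1.0
    for p in fractions:
        unmarked *= 1.0 - p
    return 1.0 - unmarked


# The two-router example from the text: 1% early in the path, 2% later.
path = [0.01, 0.02]
print(f"{combined_marking(path):.4%}")  # prints 2.9800%
```

Under these assumptions the combined fraction is 2.98%, slightly less
than the simple sum of 3%, because a packet already marked by the first
router cannot be marked a second time.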

305	   The choice of two ECT code-points in the ECN field [RFC3168]
306	   permitted future flexibility, optionally allowing the sender to
307	   encode the experimental ECN nonce [RFC3540] in the packet stream.
308	   The nonce is designed to allow a sender to check the integrity of
309	   congestion feedback.  But Section 9.2 explains that it still gives no
310	   control over how fast the sender transmits as a result of the
311	   feedback.  On the other hand, re-ECN is designed both to ensure that
312	   congestion is declared honestly and that the sender's rate responds
313	   appropriately.

315	   Re-ECN is based on a feedback arrangement called
316	   `re-feedback' [Re-fb].  The word is short for either receiver-
317	   aligned, re-inserted or re-echoed feedback.  But it actually works
318	   even when no feedback is available.  In fact it has been carefully
319	   designed to work for single datagram flows.  Indeed, it even
320	   encourages aggregation of single packet flows by congestion control
321	   proxies.  Then, even if the traffic mix of the Internet were to
322	   become dominated by short messages, it would still be possible to
323	   control congestion efficiently.

325	   Changing the Internet's feedback architecture seems to imply
326	   considerable upheaval.  But re-ECN can be deployed incrementally at
327	   the transport layer around unmodified routers using existing fields
328	   in IP (v4 or v6).  However, it also requires the last undefined
329	   bit in the IPv4 header, which it uses in combination with the 2-bit
330	   ECN field to create four new codepoints.  Changes to IP routers are
331	   RECOMMENDED in order to improve resilience against DoS attacks.
332	   Similarly, re-ECN works best if both the sender and receiver
333	   transports are re-ECN-capable, but it can work with just sender
334	   support.  Section 7 summarises the incremental deployment strategy.

336	   The re-ECN protocol makes no changes to, and has no effect on, the
337	   TCP congestion control algorithm or on other rate responses to
338	   congestion.  Re-ECN is only concerned with enabling the ingress
339	   network to police that a source is complying with a congestion
340	   control algorithm, which is orthogonal to congestion control itself.

342	   Before re-ECN can be considered worthy of using up the last bit in
343	   the IP header, we must be sure that all our claims are robust.  We
344	   have gradually been reducing the list of outstanding issues, but the
345	   few that still remain are listed in Section 6.3.  We expect others
346	   may find new attacks, but we offer the re-ECN protocol on the basis
347	   that it is built on fairly solid theoretical foundations and, so far,
348	   it has proved possible to keep it relatively robust.

350	3.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)

352	   The re-ECN wire protocol uses the two bit ECN field broadly as in
353	   RFC3168 [RFC3168] as described above, but with three differences of
354	   detail (see Section 5.3).  This specification defines a new re-ECN
355	   extension (RE) flag.  We will defer the definition of the actual
356	   position of the RE flag in the IPv4 & v6 headers until Section 5.
357	   Until then it will suffice to use an abstraction of the IPv4 and v6
358	   wire protocols by just calling it the RE flag.

360	   Unlike the ECN field, the RE flag is intended to be set by the sender
361	   and remain unchanged along the path, although it can be read by
362	   network elements that understand the re-ECN protocol.  It is feasible
363	   that a network element MAY change the setting of the RE flag, perhaps
364	   acting as a proxy for an end-point, but such a protocol would have to
365	   be defined in another specification (e.g. [Re-PCN]).

367	   Although the RE flag is a separate, single bit field, it can be read
368	   as an extension to the two-bit ECN field; the three concatenated bits
369	   in what we will call the extended ECN field (EECN) making eight
370	   codepoints.  We will use the RFC3168 names of the ECN codepoints to
371	   describe settings of the ECN field when the RE flag setting is "don't
372	   care", but we also define the following six extended ECN codepoint
373	   names for when we need to be more specific.

375	   +-------+------------+------+---------------+-----------------------+
376	   |  ECN  | RFC3168    |  RE  | Extended ECN  |     Re-ECN meaning    |
377	   | field | codepoint  | flag | codepoint     |                       |
378	   +-------+------------+------+---------------+-----------------------+
379	   |   00  | Not-ECT    |   0  | Not-RECT      |   Not re-ECN-capable  |
380	   |       |            |      |               |       transport       |
381	   |   00  | Not-ECT    |   1  | FNE           |      Feedback not     |
382	   |       |            |      |               |      established      |
383	   |   01  | ECT(1)     |   0  | Re-Echo       |  Re-echoed congestion |
384	   |       |            |      |               |        and RECT       |
385	   |   01  | ECT(1)     |   1  | RECT          |     Re-ECN capable    |
386	   |       |            |      |               |       transport       |
387	   |   10  | ECT(0)     |   0  | ---           |  Legacy ECN use only  |
388	   |       |            |      |               |                       |
389	   |   10  | ECT(0)     |   1  | --CU--        |    Currently unused   |
390	   |       |            |      |               |                       |
391	   |   11  | CE         |   0  | CE(0)         |       Congestion      |
392	   |       |            |      |               |    experienced with   |
393	   |       |            |      |               |        Re-Echo        |
394	   |   11  | CE         |   1  | CE(-1)        |       Congestion      |
395	   |       |            |      |               |      experienced      |
396	   +-------+------------+------+---------------+-----------------------+

398	                     Table 1: Extended ECN Codepoints
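
As an illustration, Table 1 can be expressed as a lookup keyed on the
two-bit ECN field and the RE flag.  This Python sketch is not part of
the specification.  Because the position of the RE flag in the IPv4
and v6 headers is deferred to Section 5, the sketch takes the flag as a
separate argument; the names EECN_CODEPOINTS, RFC3168_CODEPOINTS and
decode_eecn are hypothetical.

```python
# Extended ECN codepoint names from Table 1, keyed by (ECN field, RE flag).
EECN_CODEPOINTS = {
    (0b00, 0): "Not-RECT",  # Not re-ECN-capable transport
    (0b00, 1): "FNE",       # Feedback not established
    (0b01, 0): "Re-Echo",   # Re-echoed congestion and RECT
    (0b01, 1): "RECT",      # Re-ECN capable transport
    (0b10, 0): "---",       # Legacy ECN use only (no extended name)
    (0b10, 1): "--CU--",    # Currently unused
    (0b11, 0): "CE(0)",     # Congestion experienced with Re-Echo
    (0b11, 1): "CE(-1)",    # Congestion experienced
}

# RFC 3168 names, used when the RE flag setting is "don't care".
RFC3168_CODEPOINTS = {0b00: "Not-ECT", 0b01: "ECT(1)",
                      0b10: "ECT(0)", 0b11: "CE"}


def decode_eecn(ecn, re_flag):
    """Return (RFC 3168 name, extended ECN name) for one packet."""
    if ecn not in RFC3168_CODEPOINTS or re_flag not in (0, 1):
        raise ValueError("ECN must be 0..3 and RE flag must be 0 or 1")
    return RFC3168_CODEPOINTS[ecn], EECN_CODEPOINTS[(ecn, re_flag)]


print(decode_eecn(0b01, 0))  # prints ('ECT(1)', 'Re-Echo')
```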

400	3.3.  Re-ECN Protocol Operation

402	   In this section we will give an overview of the operation of the re-
403	   ECN protocol for TCP/IP, leaving a detailed specification to the
404	   following sections.  Other transports will be discussed later.

406	   In summary, the protocol adds a third `re-echo' stage to the existing
407	   TCP/IP ECN protocol.  Whenever the network adds CE congestion
408	   signalling to the IP header on the forward data path, the receiver
409	   feeds it back to the ingress using TCP, then the sender re-echoes it
410	   into the forward data path using the RE flag in the next packet.

412	   Prior to receiving any feedback a sender will not know which setting
413	   of the RE flag to use, so it sets the feedback not established (FNE)
414	   codepoint.  The network reads the FNE codepoint conservatively as
415	   equivalent to re-echoed congestion.

417	   Specifically, once a flow is established, a re-ECN sender always
418	   initialises the ECN field to ECT(1).  And it usually sets the RE flag
419	   to "1".  Whenever a router re-marks a packet to CE, the receiver
420	   feeds back this event to the sender.  On receiving this feedback, the
421	   re-ECN sender will clear the RE flag to "0" in the next packet it
422	   sends.
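
The sender behaviour outlined above can be sketched in Python as
follows.  This is a simplified illustration of the overview only, not
the TCP specification of Section 4.  The class ReEcnSender, its method
names and the counter of pending re-echoes (one blanked packet per CE
mark fed back) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReEcnSender:
    """Chooses the extended ECN codepoint for each outgoing packet."""
    feedback_established: bool = False
    pending_re_echoes: int = 0  # assumption: one blanked packet per CE mark

    def on_feedback(self, ce_marks_reported=0):
        """Record feedback from the receiver, with any CE marks it reports."""
        self.feedback_established = True
        self.pending_re_echoes += ce_marks_reported

    def next_codepoint(self):
        """Return the codepoint name (as in Table 1) for the next packet."""
        if not self.feedback_established:
            return "FNE"       # ECN = 00, RE = 1: feedback not established
        if self.pending_re_echoes > 0:
            self.pending_re_echoes -= 1
            return "Re-Echo"   # ECN = ECT(1), RE blanked to 0
        return "RECT"          # ECN = ECT(1), RE = 1 (the usual setting)


sender = ReEcnSender()
print(sender.next_codepoint())   # prints FNE
sender.on_feedback(ce_marks_reported=1)
print(sender.next_codepoint())   # prints Re-Echo
print(sender.next_codepoint())   # prints RECT
```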

424	   We chose to set and clear the RE flag this way round to ease
425	   incremental deployment (see Section 7).  To avoid confusion we will
426	   use the term `blanking' (rather than marking) when the RE flag is
427	   cleared to "0".  So, over a stream of packets, we will talk of the
428	   `RE blanking fraction' as the fraction of octets in packets with the
429	   RE flag cleared to "0".

431	       ^
432	       |
433	       |       RE blanking fraction
434	    3% |--------------------------------+=====
435	       |                                |
436	    2% |                                |
437	       |            CE marking fraction |
438	    1% |        +-----------------------+
439	       |        |
440	    0% +---------------------------------------->
441	          ^     0     ^                 i    ^    resource index
442	          |     ^     |                 ^    |
443	          0     |     1                 |    2     observation points
444	              1.00%                  2.00%         marking fraction

446	   Figure 1: A 2-Router Example (Imprecise)

448	   Figure 1 uses the two router example introduced earlier to illustrate
449	   why re-ECN allows routers to measure downstream congestion.  The
450	   horizontal axis represents the index of each congestible resource
451	   (typically queues) along a path through the Internet.  There may be
452	   many routers on the path, but we assume only two are currently
453	   congested (those with resource index 0 and i).  The two superimposed
454	   plots show the fraction of each extended ECN codepoint in a flow
455	   observed along this path.  Given about 3% of packets reaching the
456	   destination are marked CE, in response to feedback the sender will
457	   blank the RE flag in about 3% of packets it sends.  Then approximate
458	   downstream congestion can be measured at the observation points shown
459	   along the path by subtracting the CE marking fraction from the RE
460	   blanking fraction, as shown in the table below (Appendix A derives
461	   these approximations from a precise analysis).

463	           +-------------------+------------------------------+
464	           | Observation point | Approx downstream congestion |
465	           +-------------------+------------------------------+
466	           |         0         |         3% - 0% = 3%         |
467	           |         1         |         3% - 1% = 2%         |
468	           |         2         |         3% - 3% = 0%         |
469	           +-------------------+------------------------------+

471	   Table 2: Downstream Congestion Measured at Example Observation Points
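
	   As a purely illustrative sketch of the subtraction behind Table 2,
	   the approximate downstream congestion at each observation point
	   could be computed as follows.  The function and variable names are
	   assumptions of this example, not part of the protocol, and the
	   fractions are the example values from Figure 1.

```python
from fractions import Fraction

def approx_downstream(re_blanking, ce_marking):
    """Approximate downstream congestion: RE blanking minus CE marking."""
    return re_blanking - ce_marking

# RE blanking fraction is set by the sender, so it is constant along the path.
RE_BLANKING = Fraction(3, 100)

# CE marking fraction seen at observation points 0, 1 and 2 of Figure 1.
CE_AT_POINT = {0: Fraction(0, 100), 1: Fraction(1, 100), 2: Fraction(3, 100)}

downstream = {
    point: approx_downstream(RE_BLANKING, ce)
    for point, ce in CE_AT_POINT.items()
}
```

	   The resulting values are 3%, 2% and 0%, as in Table 2.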

473	   All along the path, whole-path congestion remains unchanged so it can
474	   be used as a reference against which to compare upstream congestion.
475	   The difference predicts downstream congestion for the rest of the
476	   path.  Therefore, measuring the fractions of each codepoint at any
477	   point in the Internet will reveal upstream, downstream and whole path
478	   congestion.

480	   Note that we have introduced discussion of marking and blanking
481	   fractions solely for illustration.  To be absolutely clear, these
482	   fractions are averages that would result from the behaviour of a TCP
483	   protocol handler mechanically blanking outgoing packets in direct
484	   response to incoming feedback---we are not saying any protocol
485	   handler works with these average fractions directly.

487	3.4.  Informal Terminology

489	   In the rest of this memo we will loosely talk of positive or negative
490	   flows, meaning flows where the moving average of the downstream
491	   congestion metric is persistently positive or negative.  The notion
492	   of a negative metric arises because it is derived by subtracting one
493	   metric from another.  Of course actual downstream congestion cannot
494	   be negative, only the metric can (whether due to time lags or
495	   deliberate malice).

497	   Just as we will loosely talk of positive and negative flows, we will
498	   also talk of positive or negative packets, meaning packets that
499	   contribute positively or negatively to downstream congestion.

501	   Therefore packets can be considered to have a `worth' of +1, 0 or -1,
502	   which, when multiplied by their size, indicates their contribution to
503	   downstream congestion.  Figure 2 shows the main state transitions of
504	   the system once a flow is established, showing the worth of packets
505	   in each state.  When the network congestion marks a packet it
506	   decrements its worth.  When the sender blanks the RE flag in order to
507	   re-echo congestion it increments the worth of a packet.

509	   Sender state         Sent    Worth  Network    Received Worth
510	                        packet         Congestion packet
511	            +----------------------------------------------------+
512	            |                                                    ^
513	            V                                                    |
514	   Congestion echoed -->Re-Echo  +1      -->      CE(0)      0 --+
515	                          /                                      |
516	        No congestion___/                                        |
517	                   /    \                                        |
518	                  V       \                                      |
519	   Flow established --> RECT      0      -->      CE(-1)    -1 --+

521	   Figure 2: Re-ECN System State Diagram (bootstrap not shown)

523	   The idea is that every time the network decrements the worth of a
524	   packet, the sender increments the worth of a later packet.  Then,
525	   over time, as many positive packets should arrive at the receiver as
526	   negative.  It is this balance that will allow the network to hold the
527	   sender accountable for the congestion it causes, as we shall see.

529	   If we start with the sender in `flow established' state, normally it
530	   goes round the tight sub-loop, sending RECT packets (worth nothing)
531	   and returning to the flow established state to send another one.  But
532	   if one of the packets is congestion marked, its worth is decremented.
533	   The sender will have been continuing round its tight sending loop.
534	   But when congestion feedback returns from one of the packets in
535	   flight (the largest loop in the figure) the sender jumps to the
536	   congestion echoed state in order to re-echo the congestion,
537	   incrementing the worth of the next packet by blanking its RE bit.
538	   The sender then returns to the flow established state and continues
539	   in the tight loop sending zero worth.

541	   If a packet carrying re-echoed congestion happens to also be
542	   congestion marked, the worth added by the sender will be cancelled
543	   out by the network congestion marking.  Although the two worth values
544	   correctly cancel out, neither the congestion marking nor the re-
545	   echoed congestion is lost, because the RE bit and the ECN field are
546	   orthogonal.  So, whenever this happens, the receiver will correctly
547	   detect and re-echo the new congestion event as well (the top sub-
548	   loop).

550	   The table below specifies unambiguously the worth of each extended
551	   ECN codepoint.  Note the order is different from the previous table
552	   to better show how the worth increments and decrements.  The FNE
553	   codepoint is an exception.  It is used in the bootstrap process
554	   (explained later) and has the same positive worth as a packet with
555	   the Re-Echo codepoint.

557	   +--------+------+----------------+-------+--------------------------+
558	   |   ECN  |  RE  | Extended ECN   | Worth |      Re-ECN meaning      |
559	   |  field |  bit | codepoint      |       |                          |
560	   +--------+------+----------------+-------+--------------------------+
561	   |   00   |   0  | Not-RECT       | ...   |    Not re-ECN-capable    |
562	   |        |      |                |       |         transport        |
563	   |   01   |   0  | Re-Echo        | +1    | Re-echoed congestion and |
564	   |        |      |                |       |           RECT           |
565	   |   10   |   0  | ---            | ...   |  Legacy ECN use only     |
566	   |   11   |   0  | CE(0)          |  0    |  Congestion experienced  |
567	   |        |      |                |       |       with Re-Echo       |
568	   |   00   |   1  | FNE            | +1    | Feedback not established |
569	   |   01   |   1  | RECT           |  0    | Re-ECN capable transport |
570	   |   10   |   1  | --CU--         | ...   |     Currently unused     |
571	   |        |      |                |       |                          |
572	   |   11   |   1  | CE(-1)         | -1    |  Congestion experienced  |
573	   +--------+------+----------------+-------+--------------------------+

575	                Table 3: 'Worth' of Extended ECN Codepoints
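
	   Table 3 can be read as a lookup from the (ECN field, RE bit) pair
	   to a codepoint name and its worth.  The sketch below encodes it
	   that way.  The names EECN, contribution and congestion_mark are
	   hypothetical, and treating codepoints whose worth is shown as
	   "..." as contributing nothing is an assumption of this example.

```python
# (ECN field, RE bit) -> (extended ECN codepoint, worth).
# None means Table 3 gives no worth for that codepoint.
EECN = {
    (0b00, 0): ("Not-RECT", None),
    (0b01, 0): ("Re-Echo", +1),
    (0b10, 0): ("Legacy ECN", None),
    (0b11, 0): ("CE(0)", 0),
    (0b00, 1): ("FNE", +1),
    (0b01, 1): ("RECT", 0),
    (0b10, 1): ("CU", None),
    (0b11, 1): ("CE(-1)", -1),
}

def contribution(ecn, re, size_octets):
    """Worth multiplied by packet size (assumed 0 where no worth is given)."""
    _, worth = EECN[(ecn, re)]
    return 0 if worth is None else worth * size_octets

def congestion_mark(ecn, re):
    """Model a router marking CE: the ECN field becomes 11, RE is untouched."""
    return (0b11, re)
```

	   Marking a Re-Echo packet yields CE(0) and marking a RECT packet
	   yields CE(-1), so in both cases the worth drops by one.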

577	4.  Transport Layers

579	4.1.  TCP

581	   Re-ECN capability at the sender is essential.  At the receiver it is
582	   optional, as long as the receiver has a basic (`vanilla flavour')
583	   RFC3168-compliant ECN-capable transport (ECT) [RFC3168].  Given re-
584	   ECN is not the first attempt to define the semantics of the ECN
585	   field, we give a table below summarising what happens for various
586	   combinations of capabilities of the sender S and receiver R, as
587	   indicated in the first four columns below.  The last column gives the
588	   mode a half-connection should be in after the first two segments of
589	   TCP's three-way handshake.

591	   +--------+---------------+-----------+---------+--------------------+
592	   | Re-ECT |   ECT-Nonce   |    ECT    | Not-ECT |         S-R        |
593	   |        |   (RFC3540)   | (RFC3168) |         |   Half-connection  |
594	   |        |               |           |         |        Mode        |
595	   +--------+---------------+-----------+---------+--------------------+
596	   |   SR   |               |           |         |        RECN        |
597	   |    S   |       R       |           |         |       RECN-Co      |
598	   |    S   |               |     R     |         |       RECN-Co      |
599	   |    S   |               |           |    R    |       Not-ECT      |
600	   +--------+---------------+-----------+---------+--------------------+

602	       Table 4: Modes of TCP Half-connection for Combinations of ECN
603	                  Capabilities of Sender S and Receiver R

605	   We will describe what happens in each mode, then describe how they
606	   are negotiated.  The abbreviations for the modes in the above table
607	   mean:

609	   RECN: Full re-ECN capable transport

611	   RECN-Co: Re-ECN sender in compatibility mode with a vanilla [RFC3168]
612	      ECN receiver or an [RFC3540] ECN nonce-capable receiver.

614	   Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when
615	      at least one of the transports does not understand even basic ECN
616	      marking.

618	   Note that we use the term Re-ECT for a host transport that is re-ECN-
619	   capable but RECN for the modes of the half connections between hosts
620	   when they are both Re-ECT.  If a host transport is Re-ECT, this fact
621	   alone does NOT imply either of its half connections will necessarily
622	   be in RECN mode, at least not until it has confirmed that the other
623	   host is Re-ECT.

625	4.1.1.  RECN mode: Full re-ECN capable transport

627	   In full RECN mode, for each half connection, the sender and the
628	   receiver each maintain an unsigned integer counter we will call ECC
629	   (echo congestion counter).  The receiver maintains a count, modulo 8,
630	   of how many times a CE marked packet has arrived during the half-
631	   connection.  Once a RECN connection is established, the three TCP
632	   header flags (ECE, CWR & NS) used for ECN-related functions in
633	   previous versions of ECN are used as a 3-bit field for the receiver
634	   to repeatedly tell the sender the current value of ECC whenever it
635	   sends a TCP ACK.  We will call this the echo congestion increment
636	   (ECI) field.  This overloaded use of these 3 header flags as one
637	   3-bit ECI field is shown in Figure 4.  The actual definition of the
638	   TCP header, including the addition of support for the ECN nonce, is
639	   shown for comparison in Figure 3.  This specification does not
640	   redefine the names of these three TCP header flags; it merely
641	   overloads them with another definition once a flow is established.

643	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
644	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
645	      |               |           | N | C | E | U | A | P | R | S | F |
646	      | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
647	      |               |           |   | R | E | G | K | H | T | N | N |
648	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

650	   Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the
651	   TCP Header
652	        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
653	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
654	      |               |           |           | U | A | P | R | S | F |
655	      | Header Length | Reserved  |    ECI    | R | C | S | S | Y | I |
656	      |               |           |           | G | K | H | T | N | N |
657	      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

659	   Figure 4: Definition of the ECI field within bytes 13 and 14 of the
660	   TCP Header, overloading the current definitions above for established
661	   RECN flows.

663	   Receiver Action in RECN Mode

665	      Every time a CE marked packet arrives at a receiver in RECN mode,
666	      the receiver transport increments its local value of ECC modulo 8
667	      and MUST echo its value to the sender in the ECI field of the next
668	      ACK.  It MUST repeat the same value of ECI in every subsequent ACK
669	      until the next CE event, when it increments ECI again.

671	      The increment of the local ECC values is modulo 8 so the field
672	      value simply wraps round back to zero when it overflows.  The
673	      least significant bit is to the right (labelled bit 9).

675	      A receiver in RECN mode MAY delay the echo of a CE to the next
676	      delayed-ACK, which would be necessary if ACK-withholding were
677	      implemented.
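
	   A minimal sketch of the receiver's counter, assuming a
	   hypothetical RecnReceiver class (delayed ACKs and all other TCP
	   processing are left out):

```python
class RecnReceiver:
    """Receiver side of one half-connection in RECN mode (sketch only)."""

    def __init__(self):
        # ECC is initialised to zero when the half-connection enters RECN mode.
        self.ecc = 0

    def on_data_packet(self, ce_marked):
        """Count CE marked arrivals modulo 8."""
        if ce_marked:
            self.ecc = (self.ecc + 1) % 8

    def eci_for_ack(self):
        """Value to place in the 3-bit ECI field of every ACK sent."""
        return self.ecc
```

	   The same ECI value is returned for every ACK until the next CE
	   marked packet arrives, and the counter wraps to zero after 7.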

679	   Sender Action in RECN Mode

681	      On the arrival of every ACK, the sender compares the ECI field
682	      with its own ECC value, then replaces its local value with that
683	      from the ACK.  The difference D is assumed to be the number of CE
684	      marked packets that arrived at the receiver since it sent the
685	      previously received ACK (but see below for the sender's safety
686	      strategy).  Whenever the ECI field increments by D (or D drops are
687	      detected), the sender MUST clear the RE flag to "0" in the IP
688	      header of the next D data packets it sends, effectively re-echoing
689	      each single increment of ECI.  Otherwise the data sender MUST send
690	      all data packets with RE set to "1".
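
	   The sender rule above could be sketched as follows.  RecnSender
	   and its method names are hypothetical.  Carrying over any blanks
	   not yet sent when a further ACK arrives is an assumption of this
	   example, and the safety strategy for lost pure ACKs is not
	   modelled.

```python
class RecnSender:
    """Sender side of one half-connection in RECN mode (sketch only)."""

    def __init__(self):
        self.ecc = 0             # last ECI value seen from the receiver
        self.pending_blanks = 0  # packets still to be sent with RE = 0

    def on_ack(self, eci):
        """Compare ECI with the local ECC, modulo 8, and return D."""
        d = (eci - self.ecc) % 8
        self.ecc = eci
        self.pending_blanks += d
        return d

    def next_re_flag(self):
        """RE flag for the next data packet: 0 re-echoes one CE mark."""
        if self.pending_blanks > 0:
            self.pending_blanks -= 1
            return 0
        return 1
```

	   For instance, an ACK whose ECI is 2 higher than the local ECC
	   causes the next two data packets to be sent with RE cleared to
	   "0", after which RE returns to "1".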

692	      As a general rule, once a flow is established, as well as setting
693	      or clearing the RE flag as above, a data sender in RECN mode MUST
694	      always set the ECN field to ECT(1).  However, the settings of the
695	      extended ECN field during flow start are defined in Section 4.1.4.

697	      As we have already emphasised, the re-ECN protocol makes no
698	      changes and has no effect on the TCP congestion control algorithm.
699	      So, each increment of ECI (or detection of a drop) also triggers
700	      the standard TCP congestion response, but with no more than one
701	      congestion response per round trip, as usual.

703	      A TCP sender also acts as the receiver for the other half-
704	      connection.  The host will maintain two ECC values S.ECC and R.ECC
705	      as sender and receiver respectively.  Every data packet sent by a
706	      host in RECN mode will also repeat the prevailing value of R.ECC
707	      in its ECI field.  If a sender in RECN mode has to retransmit a
708	      packet due to a suspected loss, the re-transmitted packet MUST
709	      carry the latest prevailing value of R.ECC when it is re-
710	      transmitted, which will not necessarily be the one it carried
711	      originally.

713	4.1.1.1.  Safety against Long Pure ACK Loss Sequences

715	   The ECI method was chosen for echoing congestion marking because a
716	   re-ECN sender needs to know about every CE mark arriving at the
717	   receiver, not just whether at least one arrives within a round trip
718	   time (which is all the ECE/CWR mechanism supported).  But pure ACKs
719	   are not protected by TCP reliable delivery, so we repeat the same ECI
720	   value in every ACK until it changes.  Even if many ACKs in a row are
721	   lost, as soon as one gets through, the ECI field it repeats from
722	   previous ACKs that didn't get through will update the sender on how
723	   many CE marks arrived since the last ACK got through.

725	   The sender will only lose a record of the arrival of a CE mark if all
726	   the ACKs are lost (and all of them were pure ACKs) for a stream of
727	   data long enough to contain 8 or more CE marks.  So, if the marking
728	   fraction was p, at least 8/p pure ACKs would have to be lost.  For
729	   example, if p was 5%, a sequence of 160 pure ACKs would all have to
730	   be lost.  To protect against such extremely unlikely events, if a re-
731	   ECN sender detects a sequence of pure ACKs has been lost it SHOULD
732	   assume the ECI field wrapped as many times as possible within the
733	   sequence.

735	   Specifically, if a re-ECN sender receives an ACK with an
736	   acknowledgement number that acknowledges L segments since the
737	   previous ACK but with a sequence number unchanged from the previously
738	   received ACK, it SHOULD conservatively assume that the ECI field
739	   incremented by D' = L - ((L-D) mod 8), where D is the apparent
740	   increase in the ECI field.  For example if the ACK arriving after 9
741	   pure ACK losses apparently increased ECI by 2, the assumed increment
742	   of ECI would still be 2.  But if ECI apparently increased by 2 after
743	   11 pure ACK losses, ECI should be assumed to have increased by 10.
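
	   The conservative increment D' can be checked with a few lines of
	   code.  The function name is hypothetical, and the sketch assumes
	   L is at least as large as D.  The two calls reproduce the examples
	   in the text, taking L as 9 and 11 respectively.

```python
def assumed_eci_increment(l_segments, d_apparent):
    """D' = L - ((L - D) mod 8): assume ECI wrapped as often as possible."""
    return l_segments - ((l_segments - d_apparent) % 8)

example_small = assumed_eci_increment(9, 2)   # no room for a full wrap
example_large = assumed_eci_increment(11, 2)  # one wrap of 8 is possible
```

	   The first call gives 2 and the second gives 10, matching the
	   values stated above.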

745	   A re-ECN sender MAY implement a heuristic algorithm to establish
746	   beyond reasonable doubt that the ECI field did not wrap within a
747	   sequence of lost pure ACKs.  But such an algorithm is OPTIONAL.
748	   Such an algorithm MUST NOT be used unless it is proven to work even
749	   in the presence of correlation between high ACK loss rate on the back
750	   channel and high CE marking rate on the forward channel.

752	   Whatever assumption a re-ECN sender makes about potentially lost CE
753	   marks, both its congestion control and its re-echoing behaviour
754	   SHOULD be consistent with the assumption it makes.

756	4.1.2.  RECN-Co mode: Re-ECT Sender with a Vanilla or Nonce ECT Receiver

758	   If the half-connection is in RECN-Co mode, ECN feedback proceeds no
759	   differently to that of vanilla ECN.  In other words, the receiver
760	   sets the ECE flag repeatedly in the TCP header and the sender
761	   responds by setting the CWR flag.  Although RECN-Co mode is used when
762	   the receiver has not implemented the re-ECN protocol, the sender can
763	   infer enough from its vanilla ECN feedback to set or clear the RE
764	   flag reasonably well.  Essentially, every time the receiver toggles
765	   the ECE field from "0" to "1" (or a loss is detected), as well as
766	   setting CWR in the TCP flags, the re-ECN sender sets the IP header
767	   the same as it would do in full RECN mode.  Specifically, the re-ECN
768	   sender MUST clear the RE flag to "0" in the next packet.  Otherwise
769	   the data sender SHOULD send all other packets with RE set to "1".
770	   Once a flow is established, a re-ECN data sender in RECN-Co mode MUST
771	   always set the ECN field to ECT(1).
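
	   A sketch of how a sender in RECN-Co mode might derive the RE flag
	   from vanilla ECN feedback, assuming a hypothetical RecnCoSender
	   class.  Setting CWR and the normal TCP congestion response are
	   left to the caller.

```python
class RecnCoSender:
    """Re-ECN sender facing a vanilla ECN receiver (sketch only)."""

    def __init__(self):
        self.last_ece = 0
        self.blank_next = False

    def on_ack(self, ece, loss_detected=False):
        """Return True when ECE toggles from 0 to 1 or a loss is detected."""
        triggered = (self.last_ece == 0 and ece == 1) or loss_detected
        self.last_ece = ece
        if triggered:
            self.blank_next = True  # re-echo in the next packet
        return triggered

    def next_re_flag(self):
        """RE flag for the next data packet."""
        if self.blank_next:
            self.blank_next = False
            return 0
        return 1
```

	   Repeated ACKs carrying ECE=1 for the same congestion event do not
	   trigger further blanking; only a fresh transition from "0" to "1"
	   does.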

773	   If a CE marked packet arrives at the receiver within a round trip
774	   time of a previous mark, the receiver will still be echoing ECE for
775	   the last CE mark.  Therefore, such a mark will be missed by the
776	   sender.  Of course, this isn't of concern for congestion control, but
777	   it does mean that very occasionally the RE blanking fraction will be
778	   understated.  Therefore flows in RECN-Co mode may occasionally be
779	   mistaken for very lightly cheating flows and consequently might
780	   suffer a small number of packet drops through an egress dropper
781	   (Section 6.1.3).  We expect re-ECN would be deployed for some time
782	   before policers and droppers start to enforce it.  So, given there is
783	   not much ECN deployment yet anyway, this minor problem may affect
784	   only a very small proportion of flows, reducing to nothing over the
785	   years as vanilla ECN hosts upgrade.  The use of RECN-Co mode would
786	   need to be reviewed in the light of experience at the time of re-ECN
787	   deployment.

789	   RECN-Co mode is OPTIONAL.  Re-ECN implementers who want to keep their
790	   code simple MAY choose not to implement this mode.  If they do not,
791	   a re-ECN sender SHOULD fall back to vanilla ECT mode in the presence
792	   of an ECN-capable receiver.  It MAY choose to fall back to the ECT-
793	   Nonce mode, but if re-ECN implementers don't want to be bothered with
794	   RECN-Co mode, they probably won't want to add an ECT-Nonce mode
795	   either.

797	4.1.2.1.  Re-ECN support for the ECN Nonce

799	   A TCP half-connection in RECN-Co mode MUST NOT support the ECN
800	   Nonce [RFC3540].  This means that the sending code of a re-ECN
801	   implementation will never need to include ECN Nonce support.  Re-ECN
802	   is intended to provide wider protection than the ECN nonce against
803	   congestion control misbehaviour, and re-ECN only requires support
804	   from the sender, therefore it is preferable to specifically rule out
805	   the need for dual sender implementations.  As a consequence, a re-ECN
806	   capable sender will never set ECT(0), so it will be easier for
807	   network elements to discriminate re-ECN traffic flows from other ECN
808	   traffic, which will always contain some ECT(0) packets.

810	   However, a re-ECN implementation MAY include receiving code that
811	   complies with the ECN Nonce protocol when interacting with a sender
812	   that supports the ECN nonce (rather than re-ECN); this support is
813	   OPTIONAL.

815	   RFC3540 allows an ECN nonce sender to choose whether to sanction a
816	   receiver that does not ever set the nonce sum.  Given re-ECN is
817	   intended to provide wider protection than the ECN nonce against
818	   congestion control misbehaviour, implementers of re-ECN receivers MAY
819	   choose not to implement backwards compatibility with the ECN nonce
820	   capability.  This may be because they deem that the risk of sanctions
821	   is low, perhaps because significant deployment of the ECN nonce seems
822	   unlikely at implementation time.

824	4.1.3.  Capability Negotiation

826	   During the TCP hand-shake at the start of a connection, an originator
827	   of the connection (host A) with a re-ECN-capable transport MUST
828	   indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and
829	   ECE=1 in the initial SYN.

831	   A responding Re-ECT host (host B) MUST return a SYN ACK with flags
832	   CWR=1 and ECE=0.  The responding host MUST NOT set this combination
833	   of flags unless the preceding SYN has already indicated Re-ECT
834	   support as above.  A Re-ECT server (B) can use either setting of the
835	   NS flag combined with this type of SYN ACK in response to a SYN from
836	   a Re-ECT client (A).  Normally a Re-ECT server will reply to a Re-ECT
837	   client with NS=0, but under special circumstances described in
838	   Section 4.1.4 it can return a SYN ACK with NS=1.

840	   These handshakes are summarised in Table 5 below, with X meaning
841	   `don't care'.  The handshakes used for the other flavours of ECN are
842	   also shown for comparison.  To compress the width of the table, the
843	   headings of the first four columns have been severely abbreviated, as
844	   follows:

846	      R: *R*e-ECT

848	      N: ECT-*N*once (RFC3540)

850	      E: *E*CT (RFC3168)

852	      I: Not-ECT (*I*mplicit congestion notification).

854	   These correspond with the same headings used in Table 4.  Indeed, the
855	   resulting modes in the last two columns of the table below are a more
856	   comprehensive way of saying the same thing as Table 4.

858	   +----+---+---+---+------------+-------------+-----------+-----------+
859	   | R  | N | E | I |   SYN A-B  | SYN ACK B-A |  A-B Mode |  B-A Mode |
860	   +----+---+---+---+------------+-------------+-----------+-----------+
861	   |    |   |   |   | NS CWR ECE |  NS CWR ECE |           |           |
862	   | AB |   |   |   |  1   1   1 |  X   1   0  |    RECN   |    RECN   |
863	   | A  | B |   |   |  1   1   1 |  1   0   1  |  RECN-Co  | ECT-Nonce |
864	   | A  |   | B |   |  1   1   1 |  0   0   1  |  RECN-Co  |    ECT    |
865	   | A  |   |   | B |  1   1   1 |  0   0   0  |  Not-ECT  |  Not-ECT  |
866	   | B  | A |   |   |  0   1   1 |  0   0   1  | ECT-Nonce |  RECN-Co  |
867	   | B  |   | A |   |  0   1   1 |  0   0   1  |    ECT    |  RECN-Co  |
868	   | B  |   |   | A |  0   0   0 |  0   0   0  |  Not-ECT  |  Not-ECT  |
869	   +----+---+---+---+------------+-------------+-----------+-----------+

871	      Table 5: TCP Capability Negotiation between Originator (A) and
872	                               Responder (B)

874	   As soon as a re-ECN capable TCP server receives a SYN, it MUST set
875	   its two half-connections into the modes given in Table 5.  As soon as
876	   a re-ECN capable TCP client receives a SYN ACK, it MUST set its two
877	   half-connections into the modes given in Table 5.  The half-
878	   connections will remain in these modes for the rest of the
879	   connection, including for the third segment of TCP's three-way hand-
880	   shake (the ACK).

882	   {ToDo: Consider SYNs within a connection.}

884	   Recall that, if the SYN ACK reflects the same flag settings as the
885	   preceding SYN (because there is a broken legacy implementation that
886	   behaves this way), RFC3168 specifies that the whole connection MUST
887	   revert to Not-ECT.
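
	   From the point of view of a Re-ECT originator A that has sent a
	   SYN with NS=1, CWR=1 and ECE=1, the first four rows of Table 5
	   and the reflected-flags rule above amount to a small decision
	   function.  The name client_modes is hypothetical, and falling
	   back to Not-ECT for any flag combination not listed in Table 5 is
	   an assumption of this sketch.

```python
def client_modes(ns, cwr, ece):
    """Return (A-B mode, B-A mode) from the flags of the SYN ACK."""
    flags = (ns, cwr, ece)
    if flags == (1, 1, 1):
        # SYN ACK reflects the SYN: broken legacy responder.
        return ("Not-ECT", "Not-ECT")
    if (cwr, ece) == (1, 0):
        # NS is "don't care" in this row of Table 5.
        return ("RECN", "RECN")
    if flags == (1, 0, 1):
        return ("RECN-Co", "ECT-Nonce")
    if flags == (0, 0, 1):
        return ("RECN-Co", "ECT")
    # Covers (0, 0, 0) from Table 5 and, by assumption, anything unlisted.
    return ("Not-ECT", "Not-ECT")
```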

889	   Also note that, whenever the SYN flag of a TCP segment is set
890	   (including when the ACK flag is also set), the NS, CWR and ECE flags
891	   MUST NOT be interpreted as the 3-bit ECI value, which is only set as
892	   a copy of the local ECC value in non-SYN packets.

894	4.1.4.  Extended ECN (EECN) Field Settings during Flow Start or after
895	        Idle Periods

897	   If the originator (A) of a TCP connection supports re-ECN it MUST set
898	   the extended ECN (EECN) field in the IP header of the initial SYN
899	   packet to the feedback not established (FNE) codepoint.

901	   FNE is a new extended ECN codepoint defined by this specification
902	   (Section 3.2).  The feedback not established (FNE) codepoint is used
903	   when the transport does not have the benefit of ECN feedback so it
904	   cannot decide whether to set or clear the RE flag.

906	   If after receiving a SYN the server B has set its sending half-
907	   connection into RECN mode or RECN-Co mode, it MUST set the extended
908	   ECN field in the IP header of its SYN ACK to the feedback not
909	   established (FNE) codepoint.  Note the careful wording here, which
910	   means that Re-ECT server B must set FNE on a SYN ACK whether it is
911	   responding to a SYN from a Re-ECT client or from a client that is
912	   merely ECN-capable.

914	   The original ECN specification [RFC3168] required SYNs and SYN ACKs
915	   to use the Not-ECT codepoint of the ECN field.  The aim was to
916	   prevent well-known DoS attacks such as SYN flooding being able to
917	   gain from the advantage that ECN capability afforded over drop at
918	   ECN-capable routers.  For SYN ACKs, [I-D.ietf-tsvwg-ecnsyn] has shown
919	   this caution was unnecessary, and proposes to allow a SYN ACK to be
920	   ECN-capable to improve performance.  However, our use of FNE on the
921	   initial SYN seems to comply with this aim in word but not in spirit,
922	   so a justification for choosing to set RE to 1 for a SYN is given in
923	   Section 5.4.

925	   Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will
926	   have already been set on the initial SYN and possibly the SYN ACK as
927	   above.  But each re-ECN sender will have to set FNE cautiously on a
928	   few data packets as well, given a number of packets will usually have
929	   to be sent before sufficient congestion feedback is received.  The
930	   behaviour will be different depending on the mode of the half-
931	   connection:

933	   RECN mode: Given the constraints on TCP's initial window [RFC3390]
934	      and its exponential window increase during slow start
935	      phase [RFC2581], it turns out that the sender SHOULD set FNE on
936	      the first and third data packets in its flow, assuming equal sized
937	      data packets once a flow is established.  Appendix C presents the
938	      calculation that led to this conclusion.  Below, after running
939	      through the start of an example TCP session, we give the intuition
940	      learned from that calculation.

942	   RECN-Co mode: A re-ECT sender that switches into re-ECN compatibility
943	      mode (because it has detected the corresponding host is ECN-
944	      capable but not re-ECN capable) MUST limit its initial window to 1
945	      segment.  The reasoning behind this constraint is given in
946	      Section 5.4.  Having set this initial window, a re-ECN sender in
947	      RECN-Co mode SHOULD set FNE on the first and third data packets in
948	      a flow, as for RECN mode.
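
	   The recommended start-up pattern can be written down directly.
	   This sketch covers only data packets sent before any congestion
	   feedback has arrived.  The names FNE_DATA_PACKETS and start_eecn
	   are hypothetical, and data packets are numbered from 1.

```python
# Data packets (counted from 1) on which the sender SHOULD set FNE.
FNE_DATA_PACKETS = (1, 3)

def start_eecn(data_packet_number):
    """Extended ECN codepoint for a data packet during flow start."""
    if data_packet_number in FNE_DATA_PACKETS:
        return "FNE"
    return "RECT"

first_seven = [start_eecn(n) for n in range(1, 8)]
```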

950	   +----+------+----------------+-------+-------+---------------+------+
951	   |    | Data | TCP A(Re-ECT)  | IP A  | IP B  | TCP B(Re-ECT) | Data |
952	   +----+------+----------------+-------+-------+---------------+------+
953	   |    | Byte |  SEQ  ACK CTL  | EECN  | EECN  |  SEQ  ACK CTL | Byte |
954	   | -- | ---- | -------------  | ----- | ----- | ------------- | ---- |
955	   |  1 |      | 0100      SYN  | FNE   | -->   |      R.ECC=0  |      |
956	   |    |      |    CWR,ECE,NS  |       |       |               |      |
957	   |  2 |      |      R.ECC=0   | <--   | FNE   | 0300 0101     |      |
958	   |    |      |                |       |       |   SYN,ACK,CWR |      |
959	   |  3 |      | 0101 0301 ACK  | RECT  | -->   |      R.ECC=0  |      |
960	   |  4 | 1000 | 0101 0301 ACK  | FNE   | -->   |      R.ECC=0  |      |
961	   |  5 |      |      R.ECC=0   | <--   | FNE   | 0301 1102 ACK | 1460 |
962	   |  6 |      |      R.ECC=0   | <--   | RECT  | 1762 1102 ACK | 1460 |
963	   |  7 |      |      R.ECC=0   | <--   | FNE   | 3222 1102 ACK | 1460 |
964	   |  8 |      | 1102 1762 ACK  | RECT  | -->   |      R.ECC=0  |      |
965	   |  9 |      |      R.ECC=0   | <--   | RECT  | 4682 1102 ACK | 1460 |
966	   | 10 |      |      R.ECC=0   | <--   | RECT  | 6142 1102 ACK | 1460 |
967	   | 11 |      | 1102 3222 ACK  | RECT  | -->   |      R.ECC=0  |      |
968	   | 12 |      |      R.ECC=0   | <--   | RECT  | 7602 1102 ACK | 1460 |
969	   | 13 |      |      R.ECC=1   | <*-   | RECT  | 9062 1102 ACK | 1460 |
970	   |    |      | ...            |       |       |               |      |
971	   +----+------+----------------+-------+-------+---------------+------+

973	                      Table 6: TCP Session Example #1

975	   Table 6 shows an example TCP session, where the server B sets FNE on
976	   its first and third data packets (lines 5 & 7) as well as on the
977	   initial SYN ACK as previously described.  The left hand half of the
978	   table shows the relevant settings of headers sent by client A in
979	   three layers: the TCP payload size; TCP settings; then IP settings.
980	   The right hand half gives equivalent columns for server B.  The only
981	   TCP settings shown are the sequence number (SEQ), acknowledgement
982	   number (ACK) and the relevant control (CTL) flags set in the
983	   TCP header.  The IP columns show the setting of the extended ECN
984	   (EECN) field.

986	   Also shown on the receiving side of the table is the value of the
987	   receiver's echo congestion counter (R.ECC) after processing the
988	   incoming EECN header.  Note that, once a host sets a half-connection
989	   into RECN mode, it MUST initialise its local value of ECC to zero.

991	   The intuition that Appendix C gives for why a sender should set FNE
992	   on the first and third data packets is as follows.  At line 13, a
993	   packet sent by B is shown with an '*', which means it has been
994	   congestion marked by an intermediate router from RECT to CE(-1).  On
995	   receiving this CE marked packet, client A increments its ECC counter
996	   to 1 as shown.  This was the 7th data packet B sent, but before
997	   feedback about this event returns to B, it might well have sent many
998	   more packets.  Indeed, during exponential slow start, about as many
999	   packets will be in flight (unacknowledged) as have been acknowledged.
1000	   So, when the feedback from the congestion event on B's 7th segment
1001	   returns, B will have sent about 7 further packets that will still be
1002	   in flight.  At that stage, B's best estimate of the network's packet
1003	   marking fraction will be 1/7.  So, as B will have sent about 14
1004	   packets, it should have already marked 2 of them as FNE in order to
1005	   have marked 1/7; hence the need to have set the first and third data
1006	   packets to FNE.

1008	   Client A's behaviour in Table 6 also shows FNE being set on the first
1009	   SYN and the first data packet (lines 1 & 4), but in this case it
1010	   sends no more data packets, so of course, it cannot, and does not
1011	   need to, set FNE again.  Note that in the A-B direction there is no
1012	   need to set FNE on the third part of the three-way hand-shake (line
1013	   3, the ACK).

1015	   Note that in this section we have used the word SHOULD rather than
1016	   MUST when specifying how to set FNE on data segments before positive
1017	   congestion feedback arrives (but note that the word MUST was used for
1018	   FNE on the SYN and SYN ACK).  FNE is only RECOMMENDED for the first
1019	   and third data segments to entertain the possibility that the TCP
1020	   transport has the benefit of other knowledge of the path, which it
1021	   re-uses from one flow for the benefit of a newly starting flow.  For
1022	   instance, one flow can re-use knowledge of other flows between the
1023	   same hosts if using a Congestion Manager [RFC3124] or when a proxy
1024	   host aggregates congestion information for large numbers of flows.

1026	   After an idle period of more than 1 second, a re-ECN sender MUST set
1027	   the EECN field of the next packet it sends to FNE.  In order that the
1028	   design of network policers can be deterministic, this specification
1029	   deliberately puts an absolute lower limit on how long a connection
1030	   can be idle before the next packet must be FNE, rather than relating
1031	   it to the connection round trip time.  We use the lower bound of the
1032	   retransmission timeout (RTO) [RFC2988], which is commonly used as the
1033	   idle period before TCP must reduce to the restart window [RFC2581].

1035	   Note our specification of re-ECN's idle period is NOT intended to
1036	   change the idle period for TCP's restart, nor indeed for any other
1037	   purposes.
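
	   The FNE rules of this section can be summarised in a short sketch.
	   It is illustrative only: the function name needs_fne and its
	   parameters are hypothetical, and a transport that has other
	   knowledge of the path may legitimately deviate from the SHOULD for
	   the first and third data segments.

```python
IDLE_LIMIT_S = 1.0  # absolute idle period after which FNE is required


def needs_fne(data_segments_sent, idle_seconds, feedback_seen=False):
    """Decide whether the next data segment is sent with the FNE codepoint.

    data_segments_sent: data segments already sent on this half-connection
    idle_seconds:       time since the previous packet was sent
    feedback_seen:      True once congestion feedback has arrived
    """
    if idle_seconds > IDLE_LIMIT_S:
        return True                      # MUST after more than 1 s idle
    if feedback_seen:
        return False                     # feedback now drives RE blanking
    return data_segments_sent in (0, 2)  # SHOULD: first and third segments
```

	   For example, needs_fne(0, 0.0) and needs_fne(2, 0.0) are both true,
	   while needs_fne(1, 0.0) is false.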

1039	   {ToDo: Describe how the sender falls back to legacy modes if packets
1040	   don't appear to be getting through (to work round firewalls
1041	   discarding packets they consider unusual).}

1043	4.1.5.  Pure ACKs, Retransmissions, Window Probes and Partial ACKs

1045	   A re-ECN sender MUST clear the RE flag to "0" and set the ECN field
1046	   to Not-ECT in pure ACKs, retransmissions and window probes, as
1047	   specified in [RFC3168].  Our eventual goal is for all packets to be
1048	   sent with re-ECN enabled, and we believe the semantics of the ECI
1049	   field go a long way towards being able to achieve this.  However, we
1050	   have not completed a full security analysis for these cases;
1051	   therefore, for now, we merely re-state current practice.

1053	   We must also reconcile the facts that congestion marking is applied
1054	   to packets but acknowledgements cover octet ranges and acknowledged
1055	   octet boundaries need not match the transmitted boundaries.  The
1056	   general principle we work to is to remain compatible with TCP's
1057	   congestion control which is driven by congestion events at packet
1058	   granularity while at the same time aiming to blank the RE flag on at
1059	   least as many octets in a flow as have been marked CE.

1061	   Therefore, a re-ECN TCP receiver MUST increment its ECC value as many
1062	   times as CE marked packets have been received.  And that value MUST
1063	   be echoed to the sender in the first available ACK using the ECI
1064	   field.  This ensures the TCP sender's congestion control receives
1065	   timely feedback on congestion events at the same packet granularity
1066	   that they were generated on congested routers.
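
	   A minimal sketch of this receiver behaviour follows.  The class and
	   method names are hypothetical, and the sketch assumes that the ECI
	   field is 3 bits wide and carries the value of ECC modulo 8.

```python
CE = 0b11          # ECN field value meaning congestion experienced
ECI_MODULUS = 8    # assumption: the ECI field is 3 bits wide


class ReEcnReceiver:
    """Illustrative receiver-side echo congestion counter (ECC)."""

    def __init__(self):
        self.ecc = 0  # initialised to zero on entering RECN mode

    def on_packet(self, ecn_field):
        """Count one congestion event per CE-marked packet received."""
        if ecn_field == CE:
            self.ecc += 1

    def eci(self):
        """Value to echo in the ECI field of the next ACK."""
        return self.ecc % ECI_MODULUS
```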

1068	   Then, a re-ECN sender takes the difference D between the incoming
1069	   ECI field and its own ECC value and adds D to a counter R.  R is then
1070	   decremented by 1 for each subsequent packet that is sent with the RE
1071	   flag blanked, until R is no longer positive.  Using this technique,
1072	   whenever a re-ECN transport sends a not re-ECN capable (NRECN) packet
1073	   (e.g. a retransmission), the remaining packets required to have the
1074	   RE flag blanked will be automatically carried over to subsequent
1075	   packets, through the variable R.
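
	   The bookkeeping with the counter R can be sketched as follows.  This
	   is an illustration under simplifying assumptions, not a normative
	   algorithm: the class and method names are hypothetical, the ECI
	   field is assumed to carry ECC modulo 8, and loss or reordering of
	   ACKs is ignored.

```python
ECI_MODULUS = 8  # assumption: the ECI field is 3 bits wide


class ReEcnSender:
    """Illustrative sender-side bookkeeping for blanking the RE flag."""

    def __init__(self):
        self.ecc = 0  # sender's local echo congestion counter (mod 8)
        self.r = 0    # packets that still need the RE flag blanked

    def on_ack(self, eci):
        """Add the newly echoed congestion marks D to the counter R."""
        d = (eci - self.ecc) % ECI_MODULUS
        self.r += d
        self.ecc = eci

    def next_packet(self, recn_capable=True):
        """Return the extended ECN codepoint name for the next packet."""
        if not recn_capable:
            return "Not-RECT"   # e.g. a retransmission; R is carried over
        if self.r > 0:
            self.r -= 1
            return "Re-Echo"    # RE flag blanked
        return "RECT"           # RE flag set
```

	   After on_ack(2), a retransmission sent in between does not consume
	   any of the outstanding Re-Echo packets; the next two re-ECN capable
	   packets are still sent as Re-Echo.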

1077	   This does not ensure precisely the same number of octets have RE
1078	   blanked as were CE marked.  But we believe positive errors will
1079	   cancel negative over a long enough period. {ToDo: However, more
1080	   research is needed to prove whether this is so.  If it is not, it may
1081	   be necessary to increment and decrement R in octets rather than
1082	   packets, by incrementing R as the product of D and the size in octets
1083	   of packets being sent (typically the MSS).}

1085	4.2.  Other Transports

1087	4.2.1.  Guidelines for Adding Re-ECN to Other Transports

1089	   Re-ECT sender transports that have established the receiver transport
1090	   is at least ECN-capable (not necessarily re-ECN capable) MUST blank
1091	   the RE codepoint in packets carrying at least as many octets as
1092	   arrive at the receiver with the CE codepoint set.  Re-ECN-capable
1093	   sender transports should always initialise the ECN field to the
1094	   ECT(1) codepoint once a flow is established.

1096	   If the sender transport does not have sufficient feedback to even
1097	   estimate the path's CE rate, it SHOULD set FNE continuously.  If the
1098	   sender transport has some, perhaps stale, feedback to estimate that
1099	   the path's CE rate is almost certainly less than E%, the transport
1100	   MAY blank RE in packets for E% of sent octets, and set the RECT
1101	   codepoint for the remainder.
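
	   One possible way to account for this in octets is sketched below.
	   The class name OctetMarker and its interface are hypothetical; the
	   sketch simply blanks RE whenever the blanked octets fall below E% of
	   the octets sent so far, and uses FNE when no estimate is available.

```python
class OctetMarker:
    """Illustrative octet-based choice of EECN codepoint for a sender."""

    def __init__(self, e_percent=None):
        self.e_percent = e_percent  # None means no usable feedback at all
        self.sent = 0               # octets sent so far
        self.blanked = 0            # octets sent with the RE flag blanked

    def codepoint(self, size):
        """Return the codepoint name for a packet of `size` octets."""
        if self.e_percent is None:
            return "FNE"            # no estimate: set FNE continuously
        self.sent += size
        if self.blanked * 100 < self.e_percent * self.sent:
            self.blanked += size
            return "Re-Echo"        # RE blanked
        return "RECT"               # remainder of the traffic
```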

1103	   {ToDo: Give a brief outline of what would be expected for each of the
1104	   following:

1106	   o  UDP fire and forget (e.g. DNS)

1108	   o  UDP streaming with no feedback

1110	   o  UDP streaming with feedback

1112	   o  DCCP}

1114	   o  RSVP and/or NSIS: A separate I-D has been submitted [Re-PCN]
1115	      describing how re-ECN can be used in an edge-to-edge rather than
1116	      end-to-end scenario.  It can then be used by downstream networks
1117	      to police whether upstream networks are blocking new flow
1118	      reservations when downstream congestion is too high, even though
1119	      the congestion is in other operators' downstream networks.  This
1120	      relates to current work in progress on Admission Control over
1121	      Diffserv using Pre-Congestion Notification, being reported to the
1122	      IETF TSVWG [CL-arch].

1124	5.  Network Layer

1126	5.1.  Re-ECN IPv4 Wire Protocol

1128	   The wire protocol of the ECN field in the IP header remains largely
1129	   unchanged from [RFC3168].  However, an extension to the ECN field we
1130	   call the RE (re-ECN extension) flag (Section 3.2) is defined in this
1131	   document.  It doubles the extended ECN codepoint space, giving 8
1132	   potential codepoints.  The semantics of the extra codepoints are
1133	   backward compatible with the semantics of the 4 original codepoints
1134	   [RFC3168] (Section 7 collects together and summarises all the changes
1135	   defined in this document).

1137	   For IPv4, this document proposes that the new RE control flag will be
1138	   positioned where the `reserved' control flag was at bit 48 of the
1139	   IPv4 header (counting from 0).  Alternatively, some would call this
1140	   bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4
1141	   header (Figure 5).

1143	             0   1   2
1144	           +---+---+---+
1145	           | R | D | M |
1146	           | E | F | F |
1147	           +---+---+---+

1149	   Figure 5: New Definition of the Re-ECN Extension (RE) Control Flag at
1150	   the Start of Byte 7 of the IPv4 Header
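
	   As an illustration of the proposed bit position, the following
	   sketch reads and writes the RE flag in a raw IPv4 header.  The
	   function names are hypothetical, and the sketch does not recompute
	   the header checksum, which a real implementation would have to do
	   after changing the flag.

```python
RE_BYTE = 6      # 7th byte of the IPv4 header (index 6 counting from 0)
RE_MASK = 0x80   # most significant bit of that byte, i.e. bit 48


def get_re(header: bytes) -> int:
    """Return the RE flag (0 or 1) of a raw IPv4 header."""
    return (header[RE_BYTE] & RE_MASK) >> 7


def set_re(header: bytes, re_flag: int) -> bytes:
    """Return a copy of the header with the RE flag set or cleared.

    The DF and MF flags and the fragment offset are left untouched.
    """
    buf = bytearray(header)
    if re_flag:
        buf[RE_BYTE] |= RE_MASK
    else:
        buf[RE_BYTE] &= ~RE_MASK & 0xFF
    return bytes(buf)
```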

1152	   It is believed that the RE flag can simultaneously serve other
1153	   purposes, particularly where the start of a flow needs distinguishing
1154	   from packets later in the flow.  For instance it would have been
1155	   useful to identify new flows for tag switching and might enable
1156	   similar developments in the future if it were adopted.  It is similar
1157	   to the state set-up bit idea designed to protect against memory
1158	   exhaustion attacks.  This idea was proposed by David Clark and
1159	   documented by Handley and Greenhalgh [Steps_DoS].  The RE flag can be
1160	   thought of as a `soft-state set-up flag', because it is idempotent
1161	   (i.e. one occurrence of the flag is sufficient but further
1162	   occurrences achieve the same effect if previous ones were lost).

1164	   There will probably be other claims pending on the use of bit 48.  We
1165	   know of at least two [ARI05], [RFC3514], but neither has been pursued
1166	   in the IETF so far, although the present proposal would meet the
1167	   needs of the former.

1169	   The security flag proposal (commonly known as the evil bit) was
1170	   published on 1 April 2003 as Informational RFC 3514, but it was not
1171	   adopted due to confusion over whether evil-doers might set it
1172	   inappropriately.  The present proposal is backward compatible with
1173	   RFC3514 because if re-ECN compliant senders were benign they would
1174	   correctly clear the evil bit to honestly declare that they had just
1175	   received congestion feedback.  Whereas evil-doers would hide
1176	   congestion feedback by setting the evil bit continuously, or at least
1177	   more often than they should.  So, evil senders can be identified,
1178	   because they declare that they are good less often than they should.

1180	5.2.  Re-ECN IPv6 Wire Protocol

1182	   {ToDo: Include the IPv6 extension header design, including support
1183	   for the FNE flag.  Also its integrated support for a future multi-bit
1184	   congestion notification field, with a TTL hop count scheme to check
1185	   that all routers on the path support it (similar to Quick-Start).
1186	   So, if the whole path of routers doesn't support the extension, the
1187	   end-points can fall back to re-ECN (or drop).}

1189	5.3.  Router Forwarding Behaviour

1191	   Re-ECN works well without modifying the forwarding behaviour of any
1192	   routers.  However, below, two OPTIONAL changes to forwarding
1193	   behaviour are defined, which respectively enhance performance and
1194	   improve a router's discrimination against flooding attacks.  They are
1195	   both OPTIONAL additions that we propose MAY apply by default to all
1196	   Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN
1197	   marking behaviours [RFC3168].  Specifications for PHBs MAY define
1198	   different forwarding behaviours from this default, but they are not
1199	   required to.  [Re-PCN] is one example.

1201	   FNE indicates ECT:

1203	      The FNE codepoint indicates to a router that the packet was sent
1204	      and will be received by an ECN-capable transport.  Therefore an
1205	      FNE packet MAY be marked rather than dropped.  Note that the FNE
1206	      codepoint has been intentionally chosen so that, to legacy routers
1207	      (which do not inspect the RE flag), an FNE packet appears to be
1208	      Not-ECT, so will be dropped by legacy AQM algorithms.

1210	      A network operator MUST NOT configure a router to ECN mark rather
1211	      than drop FNE packets unless it can guarantee that FNE packets
1212	      will be rate limited, either locally or upstream.  The ingress
1213	      policers discussed in Section 6.1.4 would count as rate limiters
1214	      for this purpose.

1216	   Preferential Drop: If a re-ECN capable router experiences very high
1217	      load so that it has to drop arriving packets (e.g. a DoS attack),
1218	      it MAY preferentially drop packets within the same Diffserv PHB
1219	      using the preference order for extended ECN codepoints given in
1220	      Table 7.  Preferential dropping is difficult to implement, but if
1221	      feasible it would discriminate against attack traffic, if done as
1222	      part of the overall policing framework of Section 6.1.2.  If
1223	      nowhere else, routers at the egress of a network SHOULD implement
1224	      preferential drop (stronger than the MAY above).  For simplicity,
1225	      preferences 3, 4 & 5 MAY be merged into one preference level.

1227	   +-------+-----+------------+-------+-------------+------------------+
1228	   |  ECN  |  RE | Extended   | Worth | Drop Pref   |  Re-ECN meaning  |
1229	   | field | bit | ECN        |       | (1 = drop   |                  |
1230	   |       |     | codepoint  |       | 1st)        |                  |
1231	   +-------+-----+------------+-------+-------------+------------------+
1232	   |   01  |  0  | Re-Echo    | +1    | 7           |     Re-echoed    |
1233	   |       |     |            |       |             |  congestion and  |
1234	   |       |     |            |       |             |       RECT       |
1235	   |   00  |  1  | FNE        | +1    | 6           |   Feedback not   |
1236	   |       |     |            |       |             |    established   |
1237	   |   11  |  0  | CE(0)      | 0     | 5           |    Congestion    |
1238	   |       |     |            |       |             | experienced with |
1239	   |       |     |            |       |             |      Re-Echo     |
1240	   |   01  |  1  | RECT       | 0     | 4           |  Re-ECN capable  |
1241	   |       |     |            |       |             |     transport    |
1242	   |   11  |  1  | CE(-1)     | -1    | 3           |    Congestion    |
1243	   |       |     |            |       |             |    experienced   |
1244	   |   10  |  1  | --CU--     | n/a   | 2           | Currently Unused |
1245	   |   10  |  0  | ---        | n/a   | 2           |  Legacy ECN use  |
1246	   |       |     |            |       |             |       only       |
1247	   |   00  |  0  | Not-RECT   | n/a   | 1           |        Not       |
1248	   |       |     |            |       |             |  re-ECN-capable  |
1249	   |       |     |            |       |             |     transport    |
1250	   +-------+-----+------------+-------+-------------+------------------+

1252	       Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth')

1254	      The above drop preferences are arranged to preserve packets with
1255	      more positive worth (Section 3.4), given senders of positive
1256	      packets must have honestly declared downstream congestion.  This
1257	      is explained fully in Section 6 on applications.
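
	   Table 7 can be expressed as a simple lookup, as in the sketch below.
	   The dictionary layout and the function names are hypothetical; only
	   the codepoint values, worth and drop preferences are taken from the
	   table.  A router under overload would drop the queued packet with
	   the lowest preference number first.

```python
# (ECN field, RE flag) -> (codepoint name, worth, drop preference)
EECN = {
    (0b01, 0): ("Re-Echo", 1, 7),
    (0b00, 1): ("FNE", 1, 6),
    (0b11, 0): ("CE(0)", 0, 5),
    (0b01, 1): ("RECT", 0, 4),
    (0b11, 1): ("CE(-1)", -1, 3),
    (0b10, 1): ("CU", None, 2),          # currently unused
    (0b10, 0): ("Legacy", None, 2),      # legacy ECN use only
    (0b00, 0): ("Not-RECT", None, 1),
}


def drop_pref(ecn, re_flag):
    """Drop preference from Table 7 (1 = drop first)."""
    return EECN[(ecn, re_flag)][2]


def choose_victim(packets):
    """Pick the (ecn, re_flag) pair that should be dropped first."""
    return min(packets, key=lambda p: drop_pref(*p))
```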

1259	5.4.  Justification for Setting the First SYN to FNE

1261	   We require clients to consider the first SYN as congestion marked if
1262	   they find out at the end of the handshake that the server was not Re-
1263	   ECT capable.  This way we remove the need to cautiously avoid setting
1264	   the first SYN to Not-RECT.  This will give worse performance while
1265	   deployment is patchy, but better performance once deployment is
1266	   widespread.  Malicious clients may think they can use the advantage
1267	   that ECN-marking gives over drop in launching classic SYN-flood
1268	   attacks.  But the rate limit on FNE codepoints performed by the
1269	   ingress policer should be a sufficient countermeasure.

1271	   If the server is re-ECN capable, provision is made for it to echo a
1272	   possible congestion marking.  Congested routers may mark an FNE
1273	   packet to CE (see Section 5.3), in which case the packet will arrive
1274	   at B with an extended ECN codepoint of CE(-1).  So, if the initial
1275	   SYN from Re-ECT client A is marked CE(-1), a Re-ECT server B MUST
1276	   increment its local value of ECC.  But B cannot reflect the value of
1277	   ECC in the SYN ACK, because it is still using the 3 bits to negotiate
1278	   connection capabilities.  So, server B MUST set the alternative TCP
1279	   header flags in its SYN ACK: NS=1, CWR=1 and ECE=0 (see Table 5).

1281	   It might seem pedantic worrying about these single packets, but this
1282	   behaviour ensures the system is safe, even if the application mix on
1283	   the Internet evolves to the point where the majority of flows consist
1284	   of a single window or even a single packet.  It also allows denial of
1285	   service attacks to be more easily isolated and prevented.

1287	5.5.  Control and Management

1289	5.5.1.  Negative Balance Warning

1291	   A new ICMP message type is being considered so that a dropper can
1292	   warn the apparent sender of a flow that it has started to sanction
1293	   the flow.  The message would have similar semantics to the `Time
1294	   exceeded' ICMP message type.  To ensure the sender has to invest some
1295	   work before the network will generate such a message, a dropper
1296	   SHOULD only send such a message for flows that have demonstrated that
1297	   they have started correctly by establishing a positive record, but
1298	   have later gone negative.  The threshold is up to the implementation.
1299	   The purpose of the message is to distinguish this cause of drops from
1300	   other causes, such as congestion or transmission losses.  The dropper
1301	   would send the message to the sender of the flow, not the receiver.
1302	   If we did define this message type, it would be REQUIRED for all re-
1303	   ECT senders to parse and understand it.  Note that a sender MUST only
1304	   use this message to explain why losses are occurring.  A sender MUST
1305	   NOT take this message to mean that losses have occurred that it was
1306	   not aware of.  Otherwise, spoof messages could be sent by malicious
1307	   sources to slow down a sender (cf. ICMP source quench).

1309	   However, the need for this message type is not yet confirmed, as we
1310	   are considering how to prevent it being used by malicious senders to
1311	   scan for droppers and to test their threshold settings. {ToDo:
1312	   Complete this section.}

1314	5.5.2.  Rate Response Control

1316	   The framework of Section 6.1.2 implies the need for a sender to send
1317	   a request to an ingress policer asking that it be allowed to apply a
1318	   non-default response to congestion (where TCP-friendly is assumed to
1319	   be the default).  This would require the sender to be able to
1320	   discover how to address the policer.  And message format(s) would
1321	   have to be defined.  The required control protocol(s) are outside the
1322	   scope of this document, but will require definition elsewhere.

1324	   The policer is likely to be local to the sender and inline, probably
1325	   at the ingress interface to the internetwork.  So, discovery should
1326	   not be hard.  A variety of control protocols already exist for some
1327	   widely used rate-responses to congestion.  For instance DCCP
1328	   congestion control identifiers (CCIDs) fulfil this role and so does
1329	   QoS signalling (e.g. an RSVP request for controlled load service is
1330	   equivalent to a request for no rate response to congestion, but with
1331	   admission control).

1333	5.6.  Tunnels

1335	   For tunnels to work correctly, re-ECN largely requires no more than
1336	   the tunnel handling of regular ECN [RFC3168].  The RE flag raises an
1337	   extra issue, but it is more straightforward than the ECN field
1338	   because it is not intended to change along the path.  Therefore a
1339	   tunnel entry point only needs to copy the RE flag into the
1340	   encapsulating header, without any need to negotiate whether the
1341	   tunnel exit supports RE flag handling.

1343	   {ToDo: However, there are some issues to discuss concerning tunnels,
1344	   which will be included in a future version of this draft}

1346	5.7.  Non-Issues

1348	   {ToDo: This section will explain why the addition of re-ECN does not
1349	   interact with any of the following:

1351	   o  Integration with congestion notification in various link layers
1352	      (Ethernet, ATM (and MPLS if it had a congestion notification
1353	      capability added, which is not precluded for the EXP field
1354	      [RFC3270])

1356	   o  Tunnels, and Overlays that wish to support congestion notification
1357	      (see also the brief discussion of edge-to-edge support for re-ECN
1358	      in RSVP or NSIS transports earlier)

1360	   o  Encryption and IPSec

1362	   }

1364	6.  Applications

1366	6.1.  Policing Congestion Response

1368	6.1.1.  The Policing Problem

1370	   The current Internet architecture trusts hosts to respond voluntarily
1371	   to congestion.  Limited evidence shows that the large majority of
1372	   end-points on the Internet comply with a TCP-friendly response to
1373	   congestion.  But telephony (and increasingly video) services over the
1374	   best efforts Internet are attracting the interest of major commercial
1375	   operations.  Most of these applications do not respond to congestion
1376	   at all.  Those that can switch to lower rate codecs still have a
1377	   lower bound below which they must become unresponsive to congestion.

1379	   Even TCP-friendly applications can cause a disproportionate amount of
1380	   congestion, simply by using multiple flows or by transferring data
1381	   continuously.  Also the Internet Architecture has few defences
1382	   against distributed denial of service attacks that combine both
1383	   problems: unresponsiveness to congestion and flooding with multiple
1384	   flows.

1386	   Applications that need (or choose) to be unresponsive to congestion
1387	   can effectively steal whatever share of bottleneck resources they
1388	   want from responsive flows.  Whether or not such free-riding is
1389	   common, inability to prevent it increases the risk of poor returns
1390	   for investors in network infrastructure, leading to under-investment.
1391	   An increasing proportion of unresponsive, free-riding demand coupled
1392	   with persistent under-supply is a broken economic cycle.  Therefore,
1393	   if the current, largely co-operative consensus continues to erode,
1394	   congestion collapse could become more common in more areas of the
1395	   Internet [RFC3714].

1397	   However, while we have designed re-ECN to provide a way to solve
1398	   these problems, this does not imply we advocate that every network
1399	   should introduce tight controls on those that cause congestion.  Re-
1400	   ECN has been specifically designed to allow different networks to
1401	   choose how conservative or liberal they wish to be with respect to
1402	   policing congestion.  But those that choose to be conservative can
1403	   protect themselves from the excesses that liberal networks allow
1404	   their users.

1406	6.1.2.  Incentive Framework

1408	   The aim is to create an incentive environment that ensures optimal
1409	   sharing of capacity despite everyone acting selfishly (including
1410	   lying and cheating).  Of course, the mechanisms put in place for this
1411	   can lie dormant wherever co-operation is the norm.

1413	   Throughout this document we focus on path congestion.  But most forms
1414	   of fairness, including TCP's, also depend on round trip time.  So, we
1415	   also propose to measure downstream path delay using re-feedback.
1416	   This proposal will be published in a very simple future draft, but
1417	   for now we give an outline in Appendix E.

1419	   Figure 6 sketches the incentive framework that we will describe piece
1420	   by piece throughout this section.  We will do a first pass in
1421	   overview, then return to each piece in detail.  An internetwork with
1422	   multiple trust boundaries is depicted.  The difference between the
1423	   two plots in the example we used earlier (Figure 1) is plotted below.
1424	   The graph displays downstream path congestion seen in a typical flow
1425	   as it traverses an example path from sender S to receiver R, across
1426	   networks N1, N2 & N4.  Everyone is shown using re-ECN, but we intend
1427	   to show why everyone would /choose/ to use it, correctly and
1428	   honestly.

1430	   Two main types of self-interest can be identified:

1432	   o  Users want to transmit data across the network as fast as
1433	      possible, paying as little as possible for the privilege.  In this
1434	      respect, there is no distinction between senders and receivers,
1435	      but we must be wary of potential malice by one on the other;

1437	   o  Network operators want to maximise revenues from the resources
1438	      they invest in.  They compete amongst themselves for the custom of
1439	      users.

1441	         policer
1442	       A  |
1443	       |  |
1444	       |S <-----N1----> <---N2---> <---N4--> R         domain
1445	       |: :                                :
1446	       |V :                                :
1447	    3% |--------+                          :
1448	       |  :     |                          :
1449	    2% |  :     +-----------------------+  :
1450	       |  :    downstream congestion    |  :
1451	    1% |  :                             |  :
1452	       |  :                             |  :
1453	    0% +--------------------------------+=====-->
1454	                0                       i  ^      resource index
1455	                |                       | /|\
1456	              1.00%                  2.00% |       marking fraction
1457	                                           |
1458	                                        dropper

1460	   Figure 6: Incentive Framework, showing creation of opposing pressures
1461	   to under-declare and over-declare downstream congestion, using a
1462	   policer and a dropper

1463	   Source congestion control: We want to ensure that the sender will
1464	      throttle its rate as downstream congestion increases.  Whatever
1465	      the agreed congestion response (whether TCP-compatible or some
1466	      enhanced QoS), to some extent it will always be against the
1467	      sender's interest to comply.

1469	   Ingress policing: But it is in all the network operators' interests
1470	      to encourage fair congestion response, so that their investments
1471	      are employed to satisfy the most valuable demand.  N1 is in the
1472	      best position to deploy a policer at its ingress to check that S
1473	      is complying with congestion control (Section 6.1.4).  But ingress
1474	      policing is not the only possible arrangement.  Re-ECN provides
1475	      the necessary information for dual control of congestion either by
1476	      the sender or by the network ingress.  So, in some scenarios (e.g.
1477	      sensing devices with minimal capabilities) the network ingress
1478	      might do the congestion control as a proxy for the sender.

1480	   Edge egress dropper: If the policer ensures the source has less right
1481	      to a high rate the higher it declares downstream congestion, the
1482	      source has a clear incentive to understate downstream congestion.
1483	      But, if packets are understated when they enter the internetwork,
1484	      they will be negative when they leave.  So, we introduce a dropper
1485	      at the last network egress, which drops packets in flows that
1486	      persistently declare negative downstream congestion (see
1487	      Section 6.1.3 for details).  Incidentally, a network can trivially
1488	      prevent negative traffic from being sent in the first place by not
1489	      permitting a sender to send any CE packets, which would clearly
1490	      contravene the ECN protocol.

1492	               ..competitive routing
1493	             .'         :      '.
1494	           .'  p e n a l:t i e s '.
1495	          :           | :       \  :
1496	       A  :           | :        | :
1497	       |S <-----N1----> <---N2---> <---N4--> R         domain
1498	       |  :           | :        | :
1499	       |  V           | :        | :
1500	    3% |--------+     | :        | :
1501	       |        |     V V        V V
1502	    2% |        +-----------------------+
1503	       |       downstream congestion    |
1504	    1% |          :                     |
1505	       |          :                     |
1506	    0% +--------------------------------+=====-->
1507	                0                ^      i         resource index
1508	                |               /|\     |
1509	              1.00%              |   2.00%         marking fraction
1510	                                 |
1511	                             sanctions

1513	      Figure 7: Incentives at Inter-domain Borders

1515	   Inter-domain traffic policing: But next we must ask, if congestion
1516	   arises downstream (say in N4), what is the ingress network's (N1's)
1517	   incentive to police its customers' response?  If N1 turns a blind
1518	   eye, its own customers benefit while other networks suffer.  This is
1519	   why all inter-domain QoS architectures (e.g. Intserv, Diffserv)
1520	   police traffic each time it crosses a trust boundary.  Re-ECN gives
1521	   trustworthy information at each trust boundary, which N4 (say) can
1522	   use in bulk to police all the responses to congestion of all the
1523	   sources beyond its upstream neighbour (N2) with one very simple
1524	   passive mechanism, as we will now explain using Figure 7.

1526	   But before we do, we need to make a very important point.  In the
1527	   explanation that follows, we assume a very specific variant of volume
1528	   charging between networks.  We must make clear that we are not
1529	   advocating that everyone should use this form of contract.  We are
1530	   well aware that the IETF tries to avoid standardising technology that
1531	   depends on a particular business model.  And we strongly share this
1532	   desire to encourage diversity.  But our aim is merely to show that
1533	   border policing can at least work with this one model, then we can
1534	   assume that operators might experiment with the metric in other
1535	   models (see Section 6.1.5 for examples).  Of course, operators are
1536	   free to complement this usage element of their charges with
1537	   traditional capacity charging, and we expect they will.

1539	   Emulating policing with inter-domain congestion charging: Between
1540	      high-speed networks, we would rather avoid holding back traffic
1541	      while it is policed.  Instead, once re-ECN has arranged headers to
1542	      carry downstream congestion honestly, N2 can contract to pay N4
1543	      penalties in proportion to a single bulk count of the congestion
1544	      metrics crossing their mutual trust boundary (Section 6.1.5).  In
1545	      this way, N4 puts pressure on N2 to suppress downstream
1546	      congestion, as shown by the solid downward arrow at the egress of
1547	      N2.  Then N2 has an incentive either to police the congestion
1548	      response of its own ingress traffic (from N1) or to charge N1 in
1549	      turn on the basis of congestion counted at their mutual boundary.
1550	      In this recursive way, the incentives for each flow to respond
1551	      correctly to congestion trace back with each flow precisely to
1552	      each source, despite the mechanism not recognising flows (see
1553	      Section 6.2.2).  If N1 turns a blind eye to its own upstream
1554	      customers' congestion response, it will still have to pay its
1555	      downstream neighbours.

1557	   No congestion charging to users: Bulk congestion charging at trust
1558	      boundaries is passive and extremely simple, and loses none of its
1559	      per-packet precision from one boundary to the next (unlike
1560	      Diffserv all-address traffic conditioning agreements, which
1561	      dissipate their effectiveness across long topologies).  But at any
1562	      trust boundary, there is no imperative to use congestion charging.
1563	      Traditional traffic policing can be used, if the complexity and
1564	      cost is preferred.  In particular, at the boundary with end
1565	      customers (e.g. between S and N1), traffic policing will most
1566	      likely be far more appropriate.  Policer complexity is less of a
1567	      concern at the edge of the network.  And end-customers are known
1568	      to be highly averse to the unpredictability of congestion
1569	      charging.

1571	      So, NOTE WELL: this document neither advocates nor requires
1572	      congestion charging for end customers and advocates but does not
1573	      require inter-domain congestion charging.

1575	   Competitive discipline of inter-domain traffic engineering: With
1576	      inter-domain congestion charging, a domain seems to have a
1577	      perverse incentive to fake congestion; N2's profit depends on the
1578	      difference between congestion at its ingress (its revenue) and at
1579	      its egress (its cost).  So, overstating internal congestion seems
1580	      to increase profit.  However, smart border routing [Smart_rtg] by
1581	      N1 will bias its multipath routing towards the least cost routes.
1582	      So, N2 risks losing all its revenue to competitive routes if it
1583	      overstates congestion (see Section 6.2.3).  In other words, if N2
1584	      is the least congested route, its ability to raise excess profits
1585	      is limited by the congestion on the next least congested route.
1586	      This pressure on N2 to remain competitive is represented by the
1587	      dotted downward arrow at the ingress to N2 in Figure 7.

1589	   Closing the loop: All the above elements conspire to trap everyone
1590	      between two opposing pressures (upper half of Figure 6), ensuring
1591	      the downstream congestion metric arrives at the destination
1592	      neither above nor below zero.  So, we have arrived back where we
1593	      started in our argument.  The ingress edge network can rely on
1594	      downstream congestion declared in the packet headers presented by
1595	      the sender.  So it can police the sender's congestion response
1596	      accordingly.

1598	6.1.2.1.  The Case against Classic Feedback

1600	   A system that produces an optimal outcome as a result of everyone's
1601	   selfish actions is extremely powerful.  But why do we have to change
1602	   to re-ECN to achieve it?  Can't classic congestion feedback (as used
1603	   already by standard ECN) be arranged to provide similar incentives?
1604	   Superficially it can.  Given ECN already existed, this was the
1605	   deployment path Kelly proposed for his seminal work that used self-
1606	   interest to optimise a system of networks and users (summarised in
1607	   [Evol_cc]).  The mechanism was nearly identical to volume charging;
1608	   except only the volume of packets marked with congestion experienced
1609	   (CE) was counted.

1611	   However, below we explain why relying on classic feedback /requires/
1612	   congestion charging to be used, while re-ECN achieves the same
1613	   powerful outcome, but does not /require/ congestion charging.  In
1614	   brief, the problem with classic feedback is that the incentives have
1615	   to trace the indirect path back to the sender---the long way round
1616	   the feedback loop.  For example, if classic feedback were used in
1617	   Figure 6, N2 would have had to influence N1 via N4, R & S rather than
1618	   directly.

1620	   Inability to agree what is happening downstream: In order to police
1621	      its upstream neighbour's congestion response, the neighbours
1622	      should be able to agree on the congestion to be responded to.
1623	      Whatever the feedback regime, as packets change hands at each
1624	      trust boundary, any path metrics they carry are verifiable by both
1625	      neighbours.  But, with a classic path metric, they can only agree
1626	      on the /upstream/ path congestion.

1628	   Inaccessible back-channel: The network needs a whole-path congestion
1629	      metric to control the source.  Classically, whole path congestion
1630	      emerges at the destination, to be fed back from receiver to sender
1631	      in a back-channel.  But, in any data network, back-channels need
1632	      not be visible to relays, as they are essentially communications
1633	      between the end-points.  They may be encrypted, asymmetrically
1634	      routed or simply omitted, so no network element can reliably
1635	      intercept them.  The congestion charging literature solves this
1636	      problem by charging the receiver and assuming this will cause the
1637	      receiver to refer the charges to the sender.  But, of course, this
1638	      creates unintended side-effects...

1640	   `Receiver pays' unacceptable: In connectionless datagram networks,
1641	      receivers and receiving networks cannot prevent reception from
1642	      malicious senders, so `receiver pays' opens them to `denial of
1643	      funds' attacks.

1645	   End-user congestion charging unacceptable: Even if `denial of funds'
1646	      were not a problem, we know that end-users are highly averse to
1647	      the unpredictability of congestion charging and anyway, we want to
1648	      avoid restricting network operators to just one retail tariff.
1649	      But with classic feedback only an upstream metric is available, so
1650	      we cannot avoid having to wrap the `receiver pays' money flow
1651	      around the feedback loop, necessarily forcing end-users to be
1652	      subjected to congestion charging.

1654	   To summarise so far, with classic feedback, policing congestion
1655	   response /requires/ congestion charging of end-users and a `receiver
1656	   pays' model, whereas, with re-ECN, incentives can be fashioned either
1657	   by technical policing mechanisms (more appropriate for end users) or
1658	   by congestion charging using the safer `sender pays' model (more
1659	   appropriate inter-domain).

1661	   We now take a second pass over the incentive framework, filling in
1662	   the detail.

1664	6.1.3.  Egress Dropper

1666	   As traffic leaves the last network before the receiver (domain N4 in
1667	   Figure 6), the RE blanking fraction in a flow should match the CE
1668	   congestion marking fraction.  If it is less (a negative flow), it
1669	   implies that the source is understating path congestion (which will
1670	   reduce the penalties that N2 owes N4).

1672	   If flows are positive, N4 need take no action---this simply means its
1673	   upstream neighbour is paying more penalties than it needs to, and the
1674	   source is going slower than it needs to.  But, to protect itself
1675	   against persistently negative flows, N4 should install a dropper at
1676	   its egress.  Appendix D gives a suggested algorithm for the dropper,
1677	   meeting the criteria below.

1679	   o  It SHOULD introduce minimal false positives for honest flows;

1681	   o  It SHOULD quickly detect and sanction dishonest flows (minimal
1682	      false negatives);

1684	   o  It MUST be invulnerable to state exhaustion attacks from malicious
1685	      sources.  For instance, if the dropper uses flow-state, it should
1686	      not be possible for a source to send numerous packets, each with a
1687	      different flow ID, to force the dropper to exhaust its memory
1688	      capacity;

1690	   o  It MUST introduce sufficient loss in goodput so that malicious
1691	      sources cannot play off losses in the egress dropper against
1692	      higher allowed throughput.  Salvatori [CLoop_pol] describes this
1693	      attack, which involves the source understating path congestion
1694	      then inserting forward error correction (FEC) packets to
1695	      compensate expected losses.

1697	   Note that the dropper operates on flows but we would like it not to
1698	   require per-flow state.  This is why we have been careful to ensure
1699	   that all flows MUST start with a packet marked with the FNE
1700	   codepoint.  If a flow does not start with the FNE codepoint, a
1701	   dropper is likely to treat it unfavourably.  This risk makes it worth
1702	   setting the FNE codepoint at the start of a flow, even though there
1703	   is a cost to the sender of setting FNE (positive `worth').  Indeed,
1704	   with the FNE codepoint, the rate at which a sender can generate new
1705	   flows can be limited (Appendix F).  In this respect, the FNE
1706	   codepoint works like Clark's state set-up bit [Steps_DoS].

1708	   Appendix F also gives an example dropper implementation that
1709	   aggregates flow state.  Dropper algorithms will often maintain a
1710	   moving average across flows of the fraction of RE blanked packets.
1711	   When maintaining an average across flows, a dropper SHOULD only allow
1712	   flows into the average if they start with FNE, but it SHOULD NOT
1713	   include packets with the FNE codepoint set in the average.  An
1714	   ingress gateway sets the FNE codepoint when it does not have the
1715	   benefit of feedback from the egress.  So, counting packets with FNE
1716	   set would be likely to make the average unnecessarily positive,
1717	   providing headroom (or should we say footroom?) for dishonest
1718	   (negative) traffic.

1720	   If the dropper detects a persistently negative flow, it SHOULD drop
1721	   sufficient negative and neutral packets to force the flow to not be
1722	   negative.  Drops SHOULD be focused on just sufficient packets in
1723	   misbehaving flows to remove the negative bias while doing minimal
1724	   harm.
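
   As an illustration only, the criteria above might be sketched as
   follows.  This is not the algorithm of Appendix D.  The class name,
   the marking labels ("fne", "positive", "neutral", "negative"), the
   moving-average gain, the threshold and the flow cap are all
   assumptions made for the example.  The sketch keeps per-flow state
   for simplicity, but bounds it and only creates it for flows that
   start with FNE.

```python
from dataclasses import dataclass, field

# Assumed worth of each marking, using the text's terms: a positive
# packet has RE blanked, a negative packet is CE marked.
WORTH = {"positive": +1, "neutral": 0, "negative": -1}


@dataclass
class EgressDropper:
    """Toy egress dropper; every parameter here is hypothetical."""
    alpha: float = 0.1       # gain of the per-flow moving average
    threshold: float = -0.2  # below this a flow counts as persistently negative
    max_flows: int = 1000    # hard cap on flow state
    balance: dict = field(default_factory=dict)

    def forward(self, flow_id, marking):
        """Return True to forward the packet, False to drop it."""
        if flow_id not in self.balance:
            # State is only created for flows that start with FNE, and
            # never beyond the cap, so a source spraying fresh flow IDs
            # cannot exhaust memory.  Other packets are simply dropped
            # here, which is harsher than a real dropper need be.
            if marking != "fne" or len(self.balance) >= self.max_flows:
                return False
            self.balance[flow_id] = 0.0
            return True
        if marking == "fne":
            return True  # FNE packets are kept out of the average
        avg = self.balance[flow_id]
        worth = WORTH[marking]
        if avg < self.threshold and worth <= 0:
            # Persistently negative flow: sanction negative and neutral
            # packets until positive packets restore the balance.
            return False
        self.balance[flow_id] = (1 - self.alpha) * avg + self.alpha * worth
        return True
```

   With these assumed settings, a flow that alternates positive and
   negative packets is never sanctioned, whereas a flow that sends only
   negative packets after its FNE packet starts losing packets once its
   average falls below the threshold.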

1726	6.1.4.  Rate Policing

1728	   Schemes like [XCHOKe] & [pBox] are nice approaches for rate
1729	   policing traffic without the benefit of whole path information, such
1730	   as could be provided by re-ECN.  But they must be deployed at
1731	   bottlenecks in order to work.  Unfortunately, a large proportion of
1732	   traffic traverses at least two bottlenecks (in the two access
1733	   networks), particularly with the current traffic mix where peer-to-
1734	   peer file-sharing is prevalent.  These `bottleneck policers' could be
1735	   adapted to combine ECN congestion marking from the upstream path with
1736	   local congestion knowledge.  But then the only useful placement for
1737	   them would be close to the egress of the network.

1739	   But then, if these bottleneck policers were widely deployed, the
1740	   Internet would find itself with one universal rate adaptation policy
1741	   (TCP-friendliness) embedded throughout the network.  Given TCP's
1742	   congestion control algorithm is already known to be hitting its
1743	   scalability limits and new algorithms are being developed for high-
1744	   speed congestion control, embedding TCP policing into the Internet
1745	   would make evolution to new algorithms extremely painful.  If a
1746	   source wanted to use a different algorithm, it would have to both
1747	   discover and negotiate with a policer in some remote access network,
1748	   as well as possibly others on its path.

1750	   Therefore, re-ECN has been designed to avoid the need for bottleneck
1751	   policing so that we can avoid the threat of a single rate adaptation
1752	   policy throughout the network.  Instead, re-ECN allows the access
1753	   network operator at the ingress to choose which rate adaptation to
1754	   enforce.  If desired, the re-ECN wire protocol allows these ingress
1755	   policers to perform per-flow policing according to the widely adopted
1756	   TCP rate adaptation, but it also allows new rate adaptation policies
1757	   beyond TCP to be enforced.  Further, it also allows the flexibility
1758	   for networks to choose to police users as a whole, rather than flows
1759	   (see Appendix F for example designs).

1761	   o  The particular rate adaptation may be agreed bilaterally between
1762	      the sender and its ingress provider (Section 5.5.2), which would
1763	      greatly improve the evolvability of congestion control, requiring
1764	      only a single, local box to be updated upon changes.  Of course,
1765	      one would currently expect TCP to be the default of choice.

1767	   o  Bottleneck policing can easily be circumvented, either by opening
1768	      multiple flows by varying the active end-point port number, or by
1769	      spoofing the source address but arranging with the receiver to
1770	      hide the true return address at a higher layer.

1772	   A useful feature of re-ECN is that it provides all the information a
1773	   policer needs directly in the packets being policed.  Re-Echo packets
1774	   represent congestion echoes as far as an ingress policer is
1775	   concerned.  So, even policing TCP's AIMD algorithm is relatively
1776	   straightforward.  Appendix F presents an example design, but the
1777	   choice of the preferred mechanism is up to the implementer.

1779	   Finally, we must not forget that an easy way to circumvent re-ECN's
1780	   defences is for the source to turn off re-ECN support, by setting the
1781	   Not-RECT codepoint, implying legacy traffic.  Therefore an ingress
1782	   policer must put a general rate-limit on Not-RECT traffic, which
1783	   SHOULD be lax during early, patchy deployment, but will have to
1784	   become stricter as deployment widens.  Similarly, flows starting
1785	   without an FNE packet can be confined by a strict rate-limit used for
1786	   the remainder of flows that haven't proved they are well-behaved by
1787	   starting correctly (therefore they need not consume any flow state---
1788	   they are just confined to the `misbehaving' bin if they carry an
1789	   unrecognised flow ID).  Also, as already pointed out, an ingress rate
1790	   policer MUST block both CE codepoints, as traffic that is already
1791	   negative as soon as it is sent must be invalid.
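
   The ingress checks described in this section can be summarised in a
   short sketch.  It is not the design of Appendix F.  The function
   name, the marking labels and the treatment names it returns are
   hypothetical, and the rate limiters themselves are left out.  The
   sketch also assumes that the rate of new FNE packets is limited
   elsewhere, so that the set of known flows cannot grow without bound.

```python
def classify_at_ingress(marking, flow_id, known_flows):
    """Pick the treatment a toy ingress policer gives one packet.

    marking     -- "not-rect", "fne", "positive", "neutral" or "negative"
    flow_id     -- any hashable flow identifier
    known_flows -- set of flows that started correctly with FNE
    """
    if marking == "negative":
        # Traffic that is already CE marked when it is sent must be
        # invalid, so it is blocked outright.
        return "block"
    if marking == "not-rect":
        # Legacy traffic shares one general rate limit.
        return "legacy-limit"
    if marking == "fne":
        # The flow has started correctly, so remember it.
        known_flows.add(flow_id)
        return "police"
    if flow_id not in known_flows:
        # No flow state is created for flows that did not start with
        # FNE; they all share one strictly limited bin.
        return "misbehaving-bin"
    return "police"
```

   Note that a packet from an unrecognised flow never adds to
   known_flows, which is what keeps the misbehaving bin free of per-flow
   state.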

1793	6.1.5.  Inter-domain Policing

1795	   Section 6.1.2, outlining the whole Incentive Framework above, has
1796	   already explained how neighbouring domains can arrange their contract
1797	   with each other so that a network can penalise its upstream
1798	   neighbour in proportion to the total downstream congestion that
1799	   crosses the interface between them over an accounting period.  That
1800	   is, a simple count of the volume of data in packets with RE blanked
1801	   minus the volume with CE marked over, say, a month.
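
   As a rough sketch of such a bulk count, assuming each packet seen at
   the border is summarised as a (size in bytes, marking) pair:  the
   function names, the marking labels and the integer price are
   hypothetical, and flooring the penalty at zero is an assumption of
   this example, not something the contract model requires.

```python
def border_congestion_volume(packets):
    """Bytes in positive packets minus bytes in negative packets.

    packets -- iterable of (size_bytes, marking) pairs, where marking
               is "positive" (RE blanked), "negative" (CE marked) or
               anything else, which is ignored.  No flow information
               is needed.
    """
    total = 0
    for size_bytes, marking in packets:
        if marking == "positive":
            total += size_bytes
        elif marking == "negative":
            total -= size_bytes
    return total


def congestion_penalty(packets, price_per_byte):
    """Penalty owed by the upstream network for one accounting period."""
    # Assumption: a net negative count earns no refund.
    return max(0, border_congestion_volume(packets)) * price_per_byte
```

   For example, two 1500 byte positive packets, one 1500 byte neutral
   packet and one 500 byte negative packet give a downstream congestion
   volume of 2500 bytes for the period.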

1803	   Full details of how this can be done, why it works and a security
1804	   analysis are available in a sister Internet Draft entitled `Emulating
1805	   Border Flow Policing using Re-ECN on Bulk Data' [Re-PCN].  That I-D
1806	   gives examples of how downstream networks can police the aggregate
1807	   congestion response of their upstream neighbours, against different
1808	   contractual arrangements.  The goal is to ensure an upstream network
1809	   in turn polices its upstream networks, eventually ensuring upstream
1810	   networks will suffer if they do not police the rate response to
1811	   congestion of their users.

1813	   The scenario used in [Re-PCN] is one where re-ECN is used edge-to-
1814	   edge rather than end-to-end as in the present document.  However, the
1815	   position at inter-domain borders is nearly identical. {ToDo: A
1816	   summary of the relevant aspects of that I-D will be included here,
1817	   but due to lack of time this has had to be deferred for the next
1818	   version.}

1820	6.1.6.  Simulations

1822	   Simulations of policer and dropper performance done for the multi-bit
1823	   version of re-feedback have been included in section 5 "Dropper
1824	   Performance" of [Re-fb].  Simulations of policer and dropper for the
1825	   re-ECN version described in this document are work in progress.

1827	6.2.  Other Applications

1829	   {ToDo: Other applications of re-ECN will be briefly outlined here
1830	   (largely drawing from section 3 of [Re-fb]), such as: }

1832	6.2.1.  DDoS Mitigation

1834	   A flooding attack is inherently about congestion of a resource.
1835	   Because re-ECN ensures the sources causing network congestion
1836	   experience the cost of their own actions, it acts as a first line of
1837	   defence against DDoS.  As load focuses on a victim, upstream queues
1838	   grow, requiring honest sources to pre-load packets with a higher
1839	   fraction of positive packets.  Once downstream routers are so
1840	   congested that they are dropping traffic, they will be CE marking the
1841	   traffic they do forward 100%.  Honest sources will therefore be
1842	   sending Re-Echo 100% (and therefore being severely rate-limited at
1843	   the ingress).

1845	   Malicious sources can either do the same as honest sources, and be
1846	   rate-limited at ingress, or they can understate congestion by sending
1847	   more neutral RECT packets than they should.  If sources understate
1848	   congestion (i.e. do not re-echo sufficient positive packets) and the
1849	   preferential drop ranking is implemented on routers (Section 5.3),
1850	   these routers will preserve positive traffic until last.  So, the
1851	   neutral traffic from malicious sources will all be automatically
1852	   dropped first.  Either way, the malicious sources cannot send more
1853	   than honest sources.
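
   The effect of preferential drop on an overloaded queue can be
   illustrated with a toy example.  It does not reproduce the drop
   ranking of Section 5.3.  It only assumes, as stated above, that
   neutral packets are shed before positive ones.  The function name,
   the (source, marking) packet representation and the choice to drop
   the most recent arrivals first are hypothetical.

```python
def shed_load(queue, capacity):
    """Return the packets kept when a queue exceeds its capacity.

    queue    -- list of (source, marking) pairs in arrival order, with
                marking either "neutral" or "positive"
    capacity -- number of packets the queue can hold
    """
    excess = max(0, len(queue) - capacity)
    kept = list(queue)
    # Shed neutral packets first and positive packets only as a last
    # resort, working back from the most recent arrival each time.
    for victim in ("neutral", "positive"):
        i = len(kept) - 1
        while excess > 0 and i >= 0:
            if kept[i][1] == victim:
                del kept[i]
                excess -= 1
            i -= 1
    return kept
```

   In this sketch, a source that sends only neutral packets into a
   saturated queue loses its packets before any source that re-echoes
   honestly loses one.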

1855	   Further, DDoS sources will tend to be re-used by different
1856	   controllers for different attacks.  They will therefore build up a
1857	   long term history of causing congestion.  Therefore, as long as the
1858	   population of potentially compromisable hosts around the Internet is
1859	   limited, the per-user policing algorithms in Appendix F.1 will
1860	   gradually throttle down the zombies.  Therefore, widespread
1861	   deployment of re-ECN could considerably dampen the force of DDoS.
1862	   Zombie armies could hold back from attacking for long enough to be
1863	   able to build up enough credit in the per-user policers to launch an
1864	   attack.  But they would then still be limited to no more throughput
1865	   than other, honest users.

1867	   Inter-domain traffic policing (see Section 6.1.5) ensures that any
1868	   network that harbours compromised `zombie' hosts will have to bear
1869	   the cost of the congestion caused by the packets of the zombies in
1870	   downstream networks.  Such networks will be incentivised to deploy
1871	   per-user policers that rate-limit hosts unresponsive to congestion so
1872	   they can only send very slowly into congested paths.  As well as
1873	   protecting other networks, the extremely poor performance at any sign
1874	   of congestion will incentivise the zombie's owner to clean it up.

1876	   However, the host should behave normally when using uncongested
1877	   paths.

1879	6.2.2.  End-to-end QoS

1881	   {ToDo: }

1883	6.2.3.  Traffic Engineering

1885	   {ToDo: }

1887	6.2.4.  Inter-Provider Service Monitoring

1889	   {ToDo: }

1891	6.3.  Limitations

1893	   This section will discuss the limitations of the re-ECN approach,
1894	   particularly:

1896	   o  Malicious users have the ability to turn off ECT.  Given Not-ECT
1897	      traffic cannot be efficiently policed, users would be able to get
1898	      a considerable advantage that would not be simply compensated by
1899	      their being the preferential candidates for drops in case of
1900	      sustained congestion.  For this reason, we recommend that, while
1901	      accommodating a smooth initial transition to re-ECN, policers
1902	      should gradually be tuned to rate limit Not-ECT traffic in the
1903	      long term.

1905	   o  Re-feedback for TTL (re-TTL) would also be desirable at the same
1906	      time as re-ECN.  Unfortunately this requires a further agreement
1907	      to standardise the mechanisms briefly described in Appendix E.

1909	   o  We are considering the issue of whether it would be useful to
1910	      truncate rather than drop packets that appear to be malicious, so
1911	      that the feedback loop is not broken but useful data can be
1912	      removed.

1914	   o  The inability to police excessive congestion when it causes an
1915	      ECN-capable router to drop ECT traffic rather than marking it.
1916	      Re-ECN allows policing of downstream explicit congestion
1917	      notifications, not drops.

1919	7.  Incremental Deployment
1920	7.1.  Incremental Deployment Features

1922	   We chose to use ECT(1) for re-ECN traffic deliberately.  Existing ECN
1923	   sources set ECT(0) at either 50% (the nonce) or 100% (the default).
1924	   So they will appear to a re-ECN policer as very highly congested
1925	   paths.  When policers are first deployed they can be configured
1926	   permissively, allowing through both `legacy' ECN and misbehaving re-
1927	   ECN flows.  Then, the more strictly the threshold is set, the more
1928	   legacy ECN sources will gain by upgrading to re-ECN.  Thus, towards
1929	   the end of the voluntary incremental deployment period, legacy
1930	   transports can be given progressively stronger encouragement to
1931	   upgrade.

1933	   {ToDo: As well as introducing the new information above, this section
1934	   is intended to collect together all the snippets of information
1935	   throughout the draft about incremental deployment.  Through lack of
1936	   time, this rationalisation will have to wait until the next version,
1937	   except for the brief list below.  However, a long section describing
1938	   possible deployment scenarios is available in the section following.}

1940	   Re-ECN semantics for use of the two-bit ECN field are different in
1941	   the following minor respects compared to RFC3168:

1943	   o  A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender
1944	      sets ECT(0) by default;

1946	   o  No provision is necessary for a re-ECN capable source transport to
1947	      use the ECN nonce;

1949	   o  Routers MAY preferentially drop different extended ECN codepoints;

1951	   o  Packets carrying the feedback not established (FNE) codepoint MAY
1952	      optionally be marked rather than dropped by routers, even though
1953	      their ECN field is Not-ECT (with the important caveat in
1954	      "retcp_Router_Forwarding_Behaviour");

1956	   o  Packets may be dropped by policing nodes because of apparent
1957	      misbehaviour, not just because of congestion.

1959	   None of these changes REQUIRE any modifications to routers.

1961	7.2.  Incremental Deployment Incentives

1963	   It would only be worth standardising the re-ECN protocol if there
1964	   existed a coherent story for how it might be incrementally deployed.
1965	   In order for it to have a chance of deployment, everyone who needs to
1966	   act must have a strong incentive to act, and the incentives must
1967	   arise in the order that deployment would have to happen.  Re-ECN
1968	   works around unmodified ECN routers, but we can't just discuss why
1969	   and how re-ECN deployment might build on ECN deployment, because
1970	   there is precious little to build on in the first place.  Instead, we
1971	   aim to show that re-ECN deployment could carry ECN with it.  We focus
1972	   on commercial deployment incentives, although some of the arguments
1973	   apply equally to academic or government sectors.

1975	   ECN deployment:

1977	      ECN is largely implemented in commercial routers, but generally
1978	      not as a supported feature, and it has largely not been deployed
1979	      by commercial network operators.  It has been released in many
1980	      Unix-based operating systems, but not in proprietary OSs like
1981	      Windows or those in many mobile devices.  For detailed deployment
1982	      status, see [ECN-Deploy].  We believe the reason ECN deployment
1983	      has not happened is twofold:

1985	      *  ECN requires changes to both routers and hosts.  If someone
1986	         wanted to sell the improvement that ECN offers, they would have
1987	         to co-ordinate deployment of their product with others.  An ECN
1988	         server only gives any improvement on an ECN network.  An ECN
1989	         network only gives any improvement if used by ECN devices.
1990	         Deployment that requires co-ordination adds cost and delay and
1991	         tends to dilute any competitive advantage that might be gained.

1993	      *  ECN `only' gives a performance improvement.  Making a product a
1994	         bit faster (whether the product is a device or a network),
1995	         isn't usually a sufficient selling point to be worth the cost
1996	         of co-ordinating across the industry to deploy it.  Network
1997	         operators tend to avoid re-configuring a working network unless
1998	         launching a new product.

2000	   ECN and re-ECN for Edge-to-edge Assured QoS:

2002	      We believe the proposal to provide assured QoS sessions using a
2003	      form of ECN called pre-congestion notification (PCN) [CL-arch] is
2004	      most likely to break the deadlock in ECN deployment first.  It
2005	      only requires edge-to-edge deployment so it does not require
2006	      endpoint support.  It can be deployed in a single network, then
2007	      grow incrementally to interconnected networks.  And it provides a
2008	      different `product' (internetworked assured QoS), rather than
2009	      merely making an existing product a bit faster.

2011	      Not only could this assured QoS application kick-start ECN
2012	      deployment, it could also carry re-ECN deployment with it; because
2013	      re-ECN can enable the assured QoS region to expand to a large
2014	      internetwork where neighbouring networks do not trust each other.
2015	      [Re-PCN] argues that re-ECN security should be built in to the QoS
2016	      system from the start, explaining why and how.

2018	      If ECN and re-ECN were deployed edge-to-edge for assured QoS,
2019	      operators would gain valuable experience.  They would also clear
2020	      away many technical obstacles such as firewall configurations that
2021	      block all but the legacy settings of the ECN field and the RE
2022	      flag.

2024	   ECN in Access Networks:

2026	      The next obstacle to ECN deployment would be extension to access
2027	      and backhaul networks, where considerable link layer differences
2028	      make implementation non-trivial, particularly on congested
2029	      wireless links.  ECN and re-ECN work fine during partial
2030	      deployment, but they will not be very useful if the most congested
2031	      elements in networks are the last to support them.  Access network
2032	      support is one of the weakest parts of this deployment story.  All
2033	      we can hope is that, once the benefits of ECN are better
2034	      understood by operators, they will push for the necessary link
2035	      layer implementations as deployment proceeds.

2037	   Policing Unresponsive Flows:

2039	      Re-ECN allows a network to offer differentiated quality of service
2040	      as explained in Section 6.2.2.  But we do not believe this will
2041	      motivate initial deployment of re-ECN, because the industry is
2042	      already set on alternative ways of doing QoS.  Despite being much
2043	      more complicated and expensive, the alternative approaches are
2044	      here and now.

2046	      But re-ECN is critical to QoS deployment in another respect.  It
2047	      can be used to prevent applications from taking whatever bandwidth
2048	      they choose without asking.

2050	      Currently, applications that remain resolute in their lack of
2051	      response to congestion are rewarded by other TCP applications.  In
2052	      other words, TCP is naively friendly, in that it reduces its rate
2053	      in response to congestion whether it is competing with friends
2054	      (other TCPs) or with enemies (unresponsive applications).

2056	      Therefore, those network owners that want to sell QoS will be keen
2057	      to ensure that their users can't help themselves to QoS for free.
2058	      Given the very large revenues at stake, we believe effective
2059	      policing of congestion response will become highly sought after by
2060	      network owners.

2062	      But this does not necessarily argue for re-ECN deployment.
2063	      Network owners might choose to deploy bottleneck policers rather
2064	      than re-ECN-based policing.  However, under Related Work
2065	      (Section 9) we argue that bottleneck policers are inherently
2066	      vulnerable to circumvention.

2068	      Therefore we believe there will be a strong demand from network
2069	      owners for re-ECN deployment so they can police flows that do not
2070	      ask to be unresponsive to congestion, in order to protect their
2071	      revenues from flows that do ask (QoS).  In particular, we suspect
2072	      that the operators of cellular networks will want to prevent VoIP
2073	      and video applications being used freely on their networks as a
2074	      more open market develops in GPRS and 3G devices.

2076	      Initial deployments are likely to be isolated to single cellular
2077	      networks.  Cellular operators would first place requirements on
2078	      device manufacturers to include re-ECN in the standards for mobile
2079	      devices.  In parallel, they would put out tenders for ingress and
2080	      egress policers.  Then, after a while they would start to tighten
2081	      rate limits on Not-ECT traffic from non-standard devices and they
2082	      would start policing whatever non-accredited applications people
2083	      might install on mobile devices with re-ECN support in the
2084	      operating system.  This would force even independent mobile device
2085	      manufacturers to provide re-ECN support.  Early standardisation
2086	      across the cellular operators is likely, including interconnection
2087	      agreements with penalties for excess downstream congestion.

2089	      We suspect some fixed broadband networks (whether cable or DSL)
2090	      would follow a similar path.  However, we also believe that larger
2091	      parts of the fixed Internet would not choose to police on a per-
2092	      flow basis.  Some might choose to police congestion on a per-user
2093	      basis in order to manage heavy peer-to-peer file-sharing, but it
2094	      seems likely that a sizeable majority would not deploy any form of
2095	      policing.

2097	      This hybrid situation begs the question, "How does re-ECN work for
2098	      networks that choose to use policing if they connect with others
2099	      that don't?"  Traffic from non-ECN capable sources will arrive
2100	      from other networks and cause congestion within the policed, ECN-
2101	      capable networks.  So networks that chose to police congestion
2102	      would rate-limit Not-ECT traffic throughout their network,
2103	      particularly at their borders.  They would probably also set
2104	      higher usage prices in their interconnection contracts for
2105	      incoming Not-ECT and Not-RECT traffic.  We assume that
2106	      interconnection contracts between networks in the same tier will
2107	      include congestion penalties before contracts with provider
2108	      backbones do.

2110	      A hybrid situation could remain for all time.  As was explained in
2111	      the introduction, we believe in healthy competition between
2112	      policing and not policing, with no imperative to convert the whole
2113	      world to the religion of policing.  Networks that chose not to
2114	      deploy egress droppers would leave themselves open to being
2115	      congested by senders in other networks.  But that would be their
2116	      choice.

2118	      The important aspect of the egress dropper though is that it most
2119	      protects the network that deploys it.  If a network does not
2120	      deploy an egress dropper, sources sending into it from other
2121	      networks will be able to understate the congestion they are
2122	      causing.  Whereas, if a network deploys an egress dropper, it can
2123	      know how much congestion other networks are dumping into it, and
2124	      apply penalties or charges accordingly.  So, whether or not a
2125	      network polices its own sources at ingress, it is in its interests
2126	      to deploy an egress dropper.

2128	   Host support:

2130	      In the above deployment scenario, host operating system support
2131	      for re-ECN came about through the cellular operators demanding it
2132	      in device standards (i.e. 3GPP).  Of course, increasingly, mobile
2133	      devices are being built to support multiple wireless technologies.
2134	      So, if re-ECN were stipulated for cellular devices, it would
2135	      automatically appear in those devices connected to the wireless
2136	      fringes of fixed networks if they coupled cellular with WiFi or
2137	      Bluetooth technology, for instance.  Also, once implemented in the
2138	      operating system of one mobile device, it would tend to be found
2139	      in other devices using the same family of operating system.

2141	      Therefore, whether or not a fixed network deployed ECN, or
2142	      deployed re-ECN policers and droppers, many of its hosts might
2143	      well be using re-ECN over it.  Indeed, they would be at an
2144	      advantage when communicating with hosts across Re-ECN policed
2145	      networks that rate limited Not-RECT traffic.

2147	   Other possible scenarios:

2149	      The above is thankfully not the only plausible scenario we can
2150	      think of.  One of the many clubs of operators that meet regularly
2151	      around the world might decide to act together to persuade a major
2152	      operating system manufacturer to implement re-ECN.  And they may
2153	      agree between them on an interconnection model that includes
2154	      congestion penalties.

2156	      Re-ECN provides an interesting opportunity for device
2157	      manufacturers as well as network operators.  Policers can be
2158	      configured loosely when first deployed.  Then as re-ECN take-up
2159	      increases, they can be tightened up, so that a network with re-ECN
2160	      deployed can gradually squeeze down the service provided to legacy
2161	      devices that have not upgraded to re-ECN.  Many device vendors
2162	      rely on replacement sales.  And operating system companies rely
2163	      heavily on new release sales.  Also support services would like to
2164	      be able to force stragglers to upgrade.  So, the ability to
2165	      throttle service to legacy operating systems is quite valuable.

2167	      Also, policing unresponsive sources may not be the only or even
2168	      the first application that drives deployment.  It may be policing
2169	      causes of heavy congestion (e.g. peer-to-peer file-sharing).  Or
2170	      it may be mitigation of denial of service.  Or we may be wrong in
2171	      thinking simpler QoS will not be the initial motivation for re-ECN
2172	      deployment.  Indeed, the combined pressure for all these may be
2173	      the motivator, but it seems optimistic to expect such a level of
2174	      joined-up thinking from today's communications industry.  We
2175	      believe a single application alone must be a sufficient motivator.

2177	      In short, everyone gains from adding accountability to TCP/IP,
2178	      except the selfish or malicious.  So, deployment incentives tend
2179	      to be strong.

2181	8.  Architectural Rationale

2183	   In the Internet's technical community the danger of not responding to
2184	   congestion is well-understood, with its attendant risk of congestion
2185	   collapse [RFC3714].  However, many of the Internet's commercial
2186	   community consider that the very essence of IP is to provide open
2187	   access to the internetwork for all applications.  Congestion is seen
2188	   as a symptom of over-conservative investment.  And the goal of
2189	   application design is to find novel ways to continue working despite
2190	   congestion.  They argue that the Internet was never intended to be
2191	   solely for TCP-friendly applications.  Another side of the Internet's
2192	   commercial community believe that it is no use providing a network
2193	   for novel applications if it has insufficient capacity.  And it will
2194	   always have insufficient capacity unless a greater share of
2195	   application revenues can be /assured/ for the infrastructure
2196	   provider.  Otherwise the major investments required will carry too
2197	   much risk and won't happen.

2199	   The lesson articulated in [Tussle] is that we shouldn't embed our
2200	   view on these arguments into the Internet at design time.  Instead we
2201	   should design the Internet so that the outcome of these arguments can
2202	   get decided at run-time.  Re-ECN is designed in that spirit.  Once
2203	   the protocol is available, different network operators can choose how
2204	   liberal they want to be in holding people accountable for the
2205	   congestion they cause.  Some might boldly invest in capacity and not
2206	   police its use at all, hoping that novel applications will result.
2207	   Others might use re-ECN for fine-grained flow policing, expecting to
2208	   make money selling vertically integrated services.  Yet others might
2209	   sit somewhere half-way, perhaps doing coarse, per-user policing.  All
2210	   might change their minds later.  But re-ECN always allows them to
2211	   interconnect so that the careful ones can protect themselves from the
2212	   liberal ones.

2214	   The incentive-based approach used for re-ECN is based on Gibbens and
2215	   Kelly's arguments [Evol_cc] on allowing endpoints the freedom to
2216	   evolve new congestion control algorithms for new applications.  They
2217	   ensured responsible behaviour despite everyone's self-interest by
2218	   applying pricing to ECN marking, and Kelly had proved stability and
2219	   optimality in an earlier paper.

2221	   Re-ECN keeps all the underlying economic incentives, but rearranges
2222	   the feedback.  The idea is to allow a network operator (if it
2223	   chooses) to deploy engineering mechanisms like policers at the front
2224	   of the network which can be designed to behave /as if/ they are
2225	   responding to congestion prices.  Rather than having to subject users
2226	   to congestion pricing, networks can then use more traditional
2227	   charging regimes (or novel ones).  But the engineering can constrain
2228	   the overall amount of congestion a user can cause.  This provides a
2229	   buffer against completely outrageous congestion control, but still
2230	   makes it easy for novel applications to evolve if they need different
2231	   congestion control to the norms.  It also allows novel charging
2232	   regimes to evolve.

2234	   Despite being achieved with a relatively minor protocol change, re-
2235	   ECN is an architectural change.  Previously, Internet congestion
2236	   could only be controlled by the data sender, because it was the only
2237	   one both in a position to control the load and in a position to see
2238	   information on congestion.  Re-ECN levels the playing field.  It
2239	   recognises that the network also has a role to play in moderating
2240	   (policing) congestion control.  But policing is only truly effective
2241	   at the first ingress into an internetwork, whereas path congestion
2242	   was previously only visible at the last egress.  So, re-ECN
2243	   democratises congestion information.  Then the choice over who
2244	   actually controls congestion can be made at run-time, not design
2245	   time---a bit like an aircraft with dual controls.  And different
2246	   operators can make different choices.  We believe non-architectural
2247	   approaches to this problem are unlikely to offer more than partial
2248	   solutions (see Section 9).

2250	   Importantly, re-ECN does not require assumptions about specific
2251	   congestion responses to be embedded in any network elements, except
2252	   at the first ingress to the internetwork if that level of control is
2253	   desired by the ingress operator.  But such tight policing will be a
2254	   matter of agreement between the source and its access network
2255	   operator.  The ingress operator need not police congestion response
2256	   at flow granularity; it can simply hold a source responsible for the
2257	   aggregate congestion it causes, perhaps keeping it within a monthly
2258	   congestion quota.  Or if the ingress network trusts the source, it
2259	   can do nothing.

2261	   Therefore, the aim of the re-ECN protocol is NOT solely to police
2262	   TCP-friendliness.  Re-ECN preserves IP as a generic network layer for
2263	   all sorts of responses to congestion, for all sorts of transports.
2264	   Re-ECN merely ensures truthful downstream congestion information is
2265	   available in the network layer for all sorts of accountability
2266	   applications.

2268	   The end to end design principle does not say that all functions
2269	   should be moved out of the lower layers---only those functions that
2270	   are not generic to all higher layers.  Re-ECN adds a function to the
2271	   network layer that is generic, but was omitted: accountability for
2272	   causing congestion.  Accountability is not something that an end-user
2273	   can provide to themselves.  We believe re-ECN adds no more than is
2274	   sufficient to hold each flow accountable, even if it consists of a
2275	   single datagram.

2277	   "Accountability" implies being able to identify who is responsible
2278	   for causing congestion.  However, at the network layer it would NOT
2279	   be useful to identify the cause of congestion by adding individual or
2280	   organisational identity information, NOR by using source IP
2281	   addresses.  Rather than bringing identity information to the point of
2282	   congestion, we bring downstream congestion information to the point
2283	   where the cause can be most easily identified and dealt with.  That
2284	   is, at any trust boundary, congestion can be associated with the
2285	   physically connected upstream neighbour that is directly responsible
2286	   for causing it (whether intentionally or not).  A trust boundary
2287	   interface is exactly the place to police or throttle in order to
2288	   directly mitigate congestion, rather than having to trace the
2289	   (ir)responsible party in order to shut them down.

2291	   Some considered that ECN itself was a layering violation.  The
2292	   reasoning went that the interface to a layer should provide a service
2293	   to the higher layer and hide how the lower layer does it.  However,
2294	   ECN reveals the state of the network layer and below to the transport
2295	   layer.  A more positive way to describe ECN is that it is like the
2296	   return value of a function call to the network layer.  It explicitly
2297	   returns the status of the request to deliver a packet, by returning a
2298	   value representing the current risk that a packet will not be served.

2300	   Re-ECN has similar semantics, except the transport layer must try to
2301	   guess the return value, then it can use the actual return value from
2302	   the network layer to modify the next guess.

2304	9.  Related Work

2306	   {Due to lack of time, this section is incomplete.  The reader is
2307	   referred to the Related Work section of [Re-fb] for a brief selection
2308	   of related ideas.}

2310	9.1.  Policing Rate Response to Congestion

2312	   ATM network elements send congestion back-pressure messages [ITU-
2313	   T.I.371] along each connection, duplicating any end to end feedback
2314	   because they don't trust it.  On the other hand, re-ECN ensures
2315	   information in forwarded packets can be used for congestion
2316	   management without requiring a connection-oriented architecture, by
2317	   re-using the overhead of fields that are already set aside for end to
2318	   end congestion control (and routing loop detection in the case of re-
2319	   TTL in Appendix E).

2321	   We borrowed ideas from policers in the literature [pBox],[XCHOKe],
2322	   AFD etc. for our rate equation policer.  However, without the benefit
2323	   of re-ECN they don't police the correct rate for the condition of
2324	   their path.  They detect unusually high /absolute/ rates, but only
2325	   while the policer itself is congested, because they work by detecting
2326	   prevalent flows in the discards from the local RED queue.  These
2327	   policers must sit at every potential bottleneck, whereas our policer
2328	   need only be located at each ingress to the internetwork.  As Floyd &
2329	   Fall explain [pBox], the limitation of their approach is that a high
2330	   sending rate might be perfectly legitimate, if the rest of the path
2331	   is uncongested or the round trip time is short.  Commercially
2332	   available rate policers cap the rate of any one flow.  Or they
2333	   enforce monthly volume caps in an attempt to control high volume
2334	   file-sharing.  They limit the value a customer derives.  They might
2335	   also limit the congestion customers can cause, but only as an
2336	   accidental side-effect.  They actually punish traffic that fills
2337	   troughs as much as traffic that causes peaks in utilisation.  In
2338	   practice network operators need to be able to allocate service by
2339	   cost during congestion, and by value at other times.

2341	9.2.  Congestion Notification Integrity

2343	   The choice of two ECT code-points in the ECN field [RFC3168]
2344	   permitted future flexibility, optionally allowing the sender to
2345	   encode the experimental ECN nonce [RFC3540] in the packet stream.

2347	   The ECN nonce is an elegant scheme that allows the sender to detect
2348	   if someone in the feedback loop tries to claim no congestion was
2349	   experienced when in fact it was (whether drop or ECN marking).  The
2350	   sender chooses between the two ECT codepoints in a pseudo-random
2351	   sequence.  Then, whenever the network marks a packet with CE, to deny
2352	   the congestion happened, the cheater would have to guess which ECT
2353	   codepoint was overwritten, with only a 50:50 chance of being correct
2354	   each time.
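
   The 50:50 guessing argument above can be illustrated with a short
   sketch.  It assumes, purely for illustration, that each concealed CE
   mark forces one independent guess; the function name is hypothetical
   and not part of any specification.

```python
def undetected_probability(concealed_marks: int) -> float:
    """Chance that a cheater conceals every CE mark without being caught.

    Assumption: each concealed mark requires one independent guess of the
    overwritten ECT codepoint, and each guess is right with probability 1/2.
    """
    if concealed_marks < 0:
        raise ValueError("concealed_marks must be non-negative")
    return 0.5 ** concealed_marks


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, undetected_probability(k))
```

   Under these assumptions the chance of escaping detection halves with
   every mark the cheater tries to hide.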

2356	   The assumption behind the ECN nonce is that a sender will want to
2357	   detect whether a receiver is suppressing congestion feedback.  This
2358	   is only true if the sender's interests are aligned with the
2359	   network's, or with the community of users as a whole.  This may be
2360	   true for certain large senders, who are under close scrutiny and have
2361	   a reputation to maintain.  But we have to deal with a more hostile
2362	   world, where traffic may be dominated by peer-to-peer transfers,
2363	   rather than downloads from a few popular sites.  Often the `natural'
2364	   self-interest of a sender is not aligned with the interests of other
2365	   users.  The sender often wishes to transfer data quickly just as
2366	   much as the receiver wishes to receive it quickly.

2368	   In contrast, the re-ECN protocol enables policing of an agreed rate-
2369	   response to congestion (e.g. TCP-friendliness) at the sender's
2370	   interface with the internetwork.  It also ensures downstream networks
2371	   can police their upstream neighbours, to encourage them to police
2372	   their users in turn.  But most importantly, it requires the sender to
2373	   declare path congestion to the network, which can remove traffic at
2374	   the egress if this declaration is dishonest.  So it can police
2375	   correctly, irrespective of whether the receiver tries to suppress
2376	   congestion feedback or whether the sender ignores genuine congestion
2377	   feedback.  Therefore the re-ECN protocol addresses a much wider range
2378	   of cheating problems, which includes the one addressed by the ECN
2379	   nonce. {ToDo: Ensure we address the early ACK problem.}

2381	9.3.  Identifying Upstream and Downstream Congestion

2383	   Purple [Purple] proposes that routers should use the CWR flag in the
2384	   TCP header of ECN-capable flows to work out path congestion and
2385	   therefore downstream congestion in a similar way to re-ECN.  However,
2386	   because CWR is in the transport layer, it is not always visible to
2387	   network layer routers and policers.  Purple's motivation was to
2388	   improve AQM, not policing.  But, of course, nodes trying to avoid a
2389	   policer would not be expected to allow CWR to be visible.

2391	10.  Security Considerations

2393	   This whole memo concerns the deployment of a secure congestion
2394	   control framework.  There are some specific security issues that we
2395	   are still working on.

2397	   Malicious users can launch dynamically changing attacks,
2398	   exploiting the time it takes to detect an attack, given ECN marking
2399	   is binary.  We are concentrating on subtle interactions between the
2400	   ingress policer and the egress dropper in an effort to make it
2401	   impossible to game the system.

2403	   There is an inherent need for at least some flow state at the egress
2404	   dropper given the binary marking environment, with a consequent
2405	   vulnerability to state exhaustion attacks.  An egress dropper design
2406	   with bounded flow state is in write-up.

2408	   A malicious source can spoof another user's address and send negative
2409	   traffic to the same destination in order to fool the dropper into
2410	   sanctioning the other user's flow.  To prevent or mitigate these two
2411	   different kinds of DoS attack, against the dropper and against given
2412	   flows, we are considering various protection mechanisms.
2413	   Section 5.5.1 discusses one of these.

2415	   The security of re-ECN has been deliberately designed to not rely on
2416	   cryptography.

2418	11.  IANA Considerations

2420	   This memo includes no request to IANA (yet).

2422	   If this memo were to progress to standards track, it would list:

2424	   o  The new RE flag in IPv4 (Section 5.1) and its extension with the
2425	      ECN field to create a new set of extended ECN (EECN) codepoints;

2427	   o  The definition of the EECN codepoints for default Diffserv PHBs
2428	      (Section 3.2);

2430	   o  The new extension header for IPv6 (Section 5.2);

2432	   o  The new combinations of flags in the TCP header for capability
2433	      negotiation (Section 4.1.3);

2435	   o  The new ICMP message type (Section 5.5.1).

2437	12.  Conclusions

2439	   {ToDo:}

2441	13.  Acknowledgements

2443	   Sebastien Cazalet and Andrea Soppera contributed to the idea of re-
2444	   feedback.  All the following have given helpful comments: Andrea
2445	   Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley,
2446	   Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright,
2447	   John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru
2448	   Murgu, Nigel Geffen, Pete Willis (BT), Sally Floyd (ICIR), Stephen
2449	   Hailes, Mark Handley, Adam Greenhalgh (UCL), Jon Crowcroft (Uni Cam),
2450	   David Clark, Bill Lehr, Sharon Gillett, Steve Bauer, Liz Maida (MIT),
2451	   and comments from participants in the CRN/CFP Broadband and DoS-
2452	   resistant Internet working groups.

2454	14.  Comments Solicited

2456	   Comments and questions are encouraged and very welcome.  They can be
2457	   addressed to the IETF Transport Area working group's mailing list
2458	   <tsvwg@ietf.org>, and/or to the authors.

2460	15.  References

2462	15.1.  Normative References

2464	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2465	              Requirement Levels", BCP 14, RFC 2119, March 1997.

2467	   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
2468	              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
2469	              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
2470	              S., Wroclawski, J., and L. Zhang, "Recommendations on
2471	              Queue Management and Congestion Avoidance in the
2472	              Internet", RFC 2309, April 1998.

2474	   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
2475	              Control", RFC 2581, April 1999.

2477	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
2478	              of Explicit Congestion Notification (ECN) to IP",
2479	              RFC 3168, September 2001.

2481	   [RFC3390]  Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
2482	              Initial Window", RFC 3390, October 2002.

2484	   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
2485	              Congestion Notification (ECN) Signaling with Nonces",
2486	              RFC 3540, June 2003.

2488	15.2.  Informative References

2490	   [ARI05]    Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the
2491	              Internet to Support Real-Time Content Supply from a Large
2492	              Fraction of Broadband Residential Users", BT Technology
2493	              Journal (BTTJ) 23(2), April 2005.

2495	   [CL-arch]  Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
2496	              Charny, A., Babiarz, J., and K. Chan, "A Framework for
2497	              Admission Control over DiffServ using Pre-Congestion
2498	              Notification", draft-briscoe-tsvwg-cl-architecture-02
2499	              (work in progress), March 2006.

2501	   [CLoop_pol]
2502	              Salvatori, A., "Closed Loop Traffic Policing", Politecnico
2503	              Torino and Institut Eurecom Masters Thesis,
2504	              September 2005.

2506	   [ECN-Deploy]
2507	              Floyd, S., "ECN (Explicit Congestion Notification) in
2508	              TCP/IP; Implementation and Deployment of ECN", Web-page,
2509	              May 2004,
2510	              <http://www.icir.org/floyd/ecn.html#implementations>.

2512	   [Evol_cc]  Gibbens, R. and F. Kelly, "Resource pricing and the
2513	              evolution of congestion control", Automatica 35(12)1969--
2514	              1985, December 1999,
2515	              <http://www.statslab.cam.ac.uk/~frank/evol.html>.

2517	   [I-D.ietf-tsvwg-ecnsyn]
2518	              Kuzmanovic, A., "Adding Explicit Congestion Notification
2519	              (ECN) Capability to TCP's SYN/ACK  Packets",
2520	              draft-ietf-tsvwg-ecnsyn-00 (work in progress),
2521	              November 2005.

2523	   [ITU-T.I.371]
2524	              ITU-T, "Traffic Control and Congestion Control in
2525	              {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004.

2527	   [Jiang02]  Jiang, H. and C. Dovrolis, "Passive Estimation of TCP
2528	              Round-Trip Times", ACM SIGCOMM CCR 32(3)75--88,
2529	              July 2002,
2530	              <http://doi.acm.org/10.1145/571697.571725>.

2532	   [Mathis97]
2533	              Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The
2534	              Macroscopic Behavior of the TCP Congestion Avoidance
2535	              Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997,
2536	              <http://doi.acm.org/10.1145/263932.264023>.

2538	   [Purple]   Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE:
2539	              Predictive Active Queue Management Utilizing Congestion
2540	              Information", Proc. Local Computer Networks (LCN 2003),
2541	              October 2003.

2543	   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
2544	              and W. Weiss, "An Architecture for Differentiated
2545	              Services", RFC 2475, December 1998.

2547	   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
2548	              Timer", RFC 2988, November 2000.

2550	   [RFC3124]  Balakrishnan, H. and S. Seshan, "The Congestion Manager",
2551	              RFC 3124, June 2001.

2553	   [RFC3270]  Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
2554	              P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
2555	              Protocol Label Switching (MPLS) Support of Differentiated
2556	              Services", RFC 3270, May 2002.

2558	   [RFC3514]  Bellovin, S., "The Security Flag in the IPv4 Header",
2559	              RFC 3514, April 2003.

2561	   [RFC3714]  Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
2562	              Control for Voice Traffic in the Internet", RFC 3714,
2563	              March 2004.

2565	   [Re-PCN]   Briscoe, B., "Emulating Border Flow Policing using Re-ECN
2566	              on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01
2567	              (work in progress), March 2006.

2569	   [Re-fb]    Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
2570	              Salvatori, A., Soppera, A., and M. Koyabe, "Policing
2571	              Congestion Response in an Internetwork Using Re-Feedback",
2572	              ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
2573	              www.acm.org/sigs/sigcomm/sigcomm2005/
2574	              techprog.html#session8>.

2576	   [Smart_rtg]
2577	              Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,
2578	              "Optimizing Cost and Performance for Multihoming", ACM
2579	              SIGCOMM CCR 34(4)79--92, October 2004,
2580	              <http://citeseer.ist.psu.edu/698472.html>.

2582	   [Steps_DoS]
2583	              Handley, M. and A. Greenhalgh, "Steps towards a DoS-
2584	              resistant Internet Architecture", Proc. ACM SIGCOMM
2585	              workshop on Future directions in network architecture
2586	              (FDNA'04) pp 49--56, August 2004.

2588	   [Tussle]   Clark, D., Sollins, K., Wroclawski, J., and R. Braden,
2589	              "Tussle in Cyberspace: Defining Tomorrow's Internet", ACM
2590	              SIGCOMM CCR 32(4)347--356, October 2002,
2591	              <http://www.acm.org/sigcomm/sigcomm2002/papers/
2592	              tussle.pdf>.

2594	   [XCHOKe]   Chhabra, P., Chuig, S., Goel, A., John, A., Kumar, A.,
2595	              Saran, H., and R. Shorey, "XCHOKe: Malicious Source
2596	              Control for Congestion Avoidance at Internet Gateways",
2597	              Proceedings of IEEE International Conference on Network
2598	              Protocols (ICNP-02), November 2002,
2599	              <http://www.cc.gatech.edu/~akumar/xchoke.pdf>.

2601	   [pBox]     Floyd, S. and K. Fall, "Promoting the Use of End-to-End
2602	              Congestion Control in the Internet", IEEE/ACM Transactions
2603	              on Networking 7(4) 458--472, August 1999,
2604	              <http://www.aciri.org/floyd/end2end-paper.html>.

2606	Appendix A.  Precise Re-ECN Protocol Operation

2608	   The protocol operation described in Section 3.3 was an approximation.
2609	   In fact, standard ECN router marking combines 1% and 2% marking into
2610	   slightly less than 3% whole-path marking, because routers
2611	   deliberately mark CE whether or not it has already been marked by
2612	   another router upstream.  So the combined marking fraction would
2613	   actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.
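
   The arithmetic above can be checked with a minimal sketch.  It
   assumes each router marks independently of the others, which is what
   the product form implies; the function name is illustrative only.

```python
def combined_marking(fractions):
    """Whole-path CE marking fraction for a sequence of routers.

    Assumption: each router marks independently, whether or not a packet
    already carries CE, so the unmarked fractions multiply.
    """
    unmarked = 1.0
    for m in fractions:
        if not 0.0 <= m <= 1.0:
            raise ValueError("marking fractions must lie in [0, 1]")
        unmarked *= 1.0 - m
    return 1.0 - unmarked


if __name__ == "__main__":
    # Routers marking 1% and 2%, as in the example above.
    print(f"{combined_marking([0.01, 0.02]):.2%}")  # 2.98%
```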

2615	   To generalise this we will need some notation.

2617	   o  j represents the index of each resource (typically queues) along a
2618	      path, ranging from 0 at the first router to n-1 at the last.

2620	   o  m_j represents the fraction of octets *m*arked CE by a particular
2621	      router (whether or not they are already marked) because of
2622	      congestion of resource j.

2624	   o  u_j represents congestion *u*pstream of resource j, being the
2625	      fraction of CE marking in arriving packet headers (before
2626	      marking).

2628	   o  p_j represents *p*ath congestion, being the fraction of packets
2629	      arriving at resource j with the RE flag blanked (excluding Not-
2630	      RECT packets).

2632	   o  v_j denotes expected congestion downstream of resource j, which
2633	      can be thought of as a *v*irtual marking fraction, being derived
2634	      from two other marking fractions.

2636	   Observed fractions of each particular codepoint (u, p and v) and
2637	   router marking rate m are dimensionless fractions, being the ratio of
2638	   two data volumes (marked and total) over a monitoring period.  All
2639	   measurements are in terms of octets, not packets, assuming that line
2640	   resources are more congestible than packet processing.

2642	   The path congestion (RE blanking fraction) set by the sender should
2643	   reflect the upstream congestion (CE marking fraction) fed back from
2644	   the destination.  Therefore in the steady state

2646	      p_0  = u_n
2647	           = 1 - (1 - m_0)(1 - m_1)...(1 - m_{n-1})

2649	   Similarly, at some point j in the middle of the network, if p = 1 -
2650	   (1 - u_j)(1 - v_j), then

2652	      v_j  = 1 - (1 - p)/(1 - u_j)

2654	          ~= p - u_j;                      if u_j << 100%

2656	   So, between the two routers in the example in Section 3.3, congestion
2657	   downstream is

2659	      v_1  = 100.00% - (100% - 2.98%) / (100% - 1.00%)
2660	           = 2.00%,

2662	   or a useful approximation of downstream congestion is

2664	      v_1 ~= 2.98% - 1.00%
2665	          ~= 1.98%.
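
   As a sketch of how a monitoring point might compute these two
   figures, the following uses the exact expression for v_j and its
   approximation.  The function names are illustrative only, and the
   inputs are assumed to be fractions already measured over a
   monitoring period.

```python
def downstream_congestion(p, u_j):
    """Exact expected downstream congestion: v_j = 1 - (1 - p)/(1 - u_j)."""
    if not 0.0 <= u_j < 1.0:
        raise ValueError("u_j must lie in [0, 1)")
    return 1.0 - (1.0 - p) / (1.0 - u_j)


def downstream_congestion_approx(p, u_j):
    """Approximation v_j ~= p - u_j, reasonable only when u_j << 100%."""
    return p - u_j


if __name__ == "__main__":
    p, u_1 = 0.0298, 0.01  # path and upstream fractions from the example
    print(f"exact:  {downstream_congestion(p, u_1):.2%}")         # 2.00%
    print(f"approx: {downstream_congestion_approx(p, u_1):.2%}")  # 1.98%
```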

2667	Appendix B.  ECN Compatibility

2669	   The rationale for choosing the particular combinations of SYN and SYN
2670	   ACK flags in Section 4.1.3 is as follows.

2672	   Choice of SYN flags: A re-ECN sender can work with vanilla ECN
2673	      receivers so we wanted to use the same flags as would be used in
2674	      an ECN-setup SYN [RFC3168] (CWR=1, ECE=1).  But at the same time,
2675	      we wanted a server (host B) that is Re-ECT to be able to recognise
2676	      that the client (A) is also Re-ECT.  We believe also setting NS=1
2677	      in the initial SYN achieves both these objectives, as it should be
2678	      ignored by vanilla ECT receivers and by ECT-Nonce receivers.  But
2679	      senders that are not Re-ECT should not set NS=1.  At the time ECN
2680	      was defined, the NS flag was not defined, so setting NS=1 should
2681	      be ignored by existing ECT receivers (but testing against
2682	      implementations may yet prove otherwise).  The ECN Nonce
2683	      RFC [RFC3540] is silent on what the NS field might be set to in
2684	      the TCP SYN, but we believe the intent was for a nonce client to
2685	      set NS=0 in the initial SYN (again only testing will tell).
2686	      Therefore we define a Re-ECN-setup SYN as one with NS=1, CWR=1 &
2687	      ECE=1.

2689	   Choice of SYN ACK flags: The client (A) needs to
2690	      be able to determine whether the server (B) is Re-ECT.  The
2691	      original ECN specification required an ECT server to respond to an
2692	      ECN-setup SYN with an ECN-setup SYN ACK of CWR=0 and ECE=1.  There
2693	      is no room to modify this by setting the NS flag, as that is
2694	      already set in the SYN ACK of an ECT-Nonce server.  So we used the
2695	      only combination of CWR and ECE that would not be used by existing
2696	      TCP receivers: CWR=1 and ECE=0.  The original ECN specification
2697	      defines this combination as a non-ECN-setup SYN ACK, which remains
2698	      true for vanilla and Nonce ECTs.  But for re-ECN we define it as a
2699	      Re-ECN-setup SYN ACK.  We didn't use a SYN ACK with both CWR and
2700	      ECE cleared to 0 because that would be the likely response from
2701	      most Not-ECT receivers.  And we didn't use a SYN ACK with both CWR
2702	      and ECE set to 1 either, as at least one broken receiver
2703	      implementation echoes whatever flags were in the SYN into its SYN
2704	      ACK.  Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1
2705	      & ECE=0.

2707	   Choice of two alternative SYN ACKs: the NS flag may take either value
2708	      in a Re-ECN-setup SYN ACK.  Section 5.4 specifies that a Re-ECT
2709	      server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to echo
2710	      congestion experienced (CE) on the initial SYN.  Otherwise a Re-
2711	      ECN-setup SYN ACK MUST be returned with NS=0.  The only current
2712	      known use of the NS flag in a SYN ACK is to indicate support for
2713	      the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
2714	      Given the ECN nonce MUST NOT be used for a RECN mode connection, a
2715	      Re-ECN-setup SYN ACK can use either setting of the NS flag without
2716	      any risk of confusion, because the CWR & ECE flags will be
2717	      reversed relative to those used by an ECN nonce SYN ACK.
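
   The flag choices above can be summarised as a small classifier.  The
   following Python sketch is illustrative only: the names TcpFlags,
   classify_syn and classify_syn_ack are hypothetical, and the result
   strings are labels chosen for this example.  It takes the viewpoint
   of a Re-ECT endpoint; a vanilla or Nonce ECT endpoint would still
   treat CWR=1 & ECE=0 in a SYN ACK as a non-ECN-setup SYN ACK.

```python
from typing import NamedTuple


class TcpFlags(NamedTuple):
    """The three TCP header flags used in ECN capability negotiation."""
    ns: int
    cwr: int
    ece: int


def classify_syn(flags: TcpFlags) -> str:
    """Classify the flags of an initial SYN as seen by a Re-ECT server."""
    if flags.cwr == 1 and flags.ece == 1:
        # NS=1 distinguishes a Re-ECN-setup SYN from a plain ECN-setup SYN.
        return "Re-ECN-setup SYN" if flags.ns == 1 else "ECN-setup SYN"
    return "non-ECN-setup SYN"


def classify_syn_ack(flags: TcpFlags) -> str:
    """Classify the flags of a SYN ACK as seen by a Re-ECT client."""
    if flags.cwr == 1 and flags.ece == 0:
        # Either NS value is valid; NS=1 echoes CE on the initial SYN.
        if flags.ns == 1:
            return "Re-ECN-setup SYN ACK (CE echoed)"
        return "Re-ECN-setup SYN ACK"
    if flags.cwr == 0 and flags.ece == 1:
        # NS=1 here is the ECN nonce indication described above.
        if flags.ns == 1:
            return "ECN-setup SYN ACK (nonce)"
        return "ECN-setup SYN ACK"
    # CWR=ECE=0 (likely Not-ECT) or CWR=ECE=1 (e.g. SYN flags echoed back).
    return "non-ECN-setup SYN ACK"
```

   Because the CWR & ECE pair is tested before NS, the NS flag can never
   cause a Re-ECN-setup SYN ACK to be confused with an ECN nonce SYN ACK.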

2719	Appendix C.  Packet Marking During Flow Start

2721	   {ToDo: Write up proof that sender should mark FNE on first and third
2722	   data packets, even with the largest allowed initial window.}

2724	Appendix D.  Example Egress Dropper Algorithm

2726	   {ToDo: Write up the basic algorithm with flow state, then the
2727	   aggregated one.}

2729	Appendix E.  Re-TTL

2731	   This Appendix gives an overview of a proposal to be able to overload
2732	   the TTL field in the IP header to monitor downstream propagation
2733	   delay.  It is planned to fully write up this proposal in a future
2734	   Internet Draft.

2736	   Delay re-feedback can be achieved by overloading the TTL field,
2737	   without changing IP or router TTL processing.  A target value for TTL
2738	   at the destination would need standardising, say 16.  If the path hop
2739	   count increased by more than 16 during a routing change, it would
2740	   temporarily be mistaken for a routing loop, so this target would need
2741	   to be chosen to exceed typical hop count increases.  The TCP wire
2742	   protocol and handlers would need modifying to feed back the
2743	   destination TTL and initialise it.  It would be necessary to
2744	   standardise the unit of TTL in terms of real time (as was the
2745	   original intent in the early days of the Internet).

2747	   In the longer term, precision could be improved if routers
2748	   decremented TTL to represent exact propagation delay to the next
2749	   router.  That is, for a router to decrement TTL by, say, 1.8 time
2750	   units it would alternate the decrement of every packet between 1 & 2
2751	   at a ratio of 1:4.  Although this might sometimes require a seemingly
2752	   dangerous null decrement, a packet in a loop would still decrement to
2753	   zero after 255 time units on average.  As more routers were upgraded
2754	   to this more accurate TTL decrement, path delay estimates would
2755	   become increasingly accurate despite the presence of some legacy
2756	   routers that continued to always decrement the TTL by 1.
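
   One possible way to realise such a fractional decrement is sketched
   below in Python.  It is an assumption of this example, not part of
   the proposal, that the router keeps a per-interface carry and works
   in tenths of a time unit; the class name FractionalTtlDecrementer is
   hypothetical.  A randomised choice with the same ratio would serve
   equally well.

```python
class FractionalTtlDecrementer:
    """Spread a fractional mean TTL decrement over whole-number decrements."""

    def __init__(self, tenths_per_packet: int) -> None:
        # Target mean decrement in tenths of a time unit (18 means 1.8).
        self.tenths_per_packet = tenths_per_packet
        self.carry = 0  # decrement owed but not yet applied, in tenths

    def next_decrement(self) -> int:
        """Return the whole-number decrement to apply to the next packet."""
        self.carry += self.tenths_per_packet
        decrement = self.carry // 10
        self.carry -= decrement * 10
        return decrement


router = FractionalTtlDecrementer(18)
pattern = [router.next_decrement() for _ in range(10)]
print(pattern)  # [1, 2, 2, 2, 2, 1, 2, 2, 2, 2]
```

   Over every five packets the decrements are one 1 and four 2s, which
   averages 1.8.  With a target below one time unit (for example 5
   tenths) the same code emits the null decrement mentioned above on
   alternate packets.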

2758	Appendix F.  Policer Designs to ensure Congestion Responsiveness

2760	F.1.  Per-user Policing

2762	   User policing requires a policer on the ingress interface of the
2763	   access router associated with the user.  At that point, the traffic
2764	   of the user hasn't diverged on different routes yet; nor has it mixed
2765	   with traffic from other sources.

2767	   In order to ensure that a user doesn't generate more congestion in
2768	   the network than her due share, a modified bulk token-bucket is
2769	   maintained with the following parameters:

2771	   o  b_0 the initial token level

2773	   o  r the filling rate

2775	   o  b_max the bucket depth

2777	   The same token bucket algorithm is used as in many areas of
2778	   networking, but how it is used is very different:

2780	   o  all traffic from a user over the lifetime of their subscription is
2781	      policed in the same token bucket.

2783	   o  only Re-Echo packets consume tokens

2785	   Such a policer will allow network operators to throttle the
2786	   contribution of their users to network congestion.  This will require
2787	   the appropriate contractual terms to be in place between operators
2788	   and users.  For instance: a condition for a user to subscribe to a
2789	   given network service may be that she should not cause more than a
2790	   volume C_user of congestion over a reference period T_user, although
2791	   she may carry forward up to N_user times her allowance at the end of
2792	   each period.  These terms directly set the parameters of the user
2793	   policer:

2795	   o  b_0 = C_user

2797	   o  r = C_user/T_user

2799	   o  b_max = b_0 * (N_user +1)

2801	   Besides the congestion budget policer above, another user policer
2802	   will be necessary to rate-limit FNE packets, if they are to be marked
2803	   rather than dropped (see discussion in Section 5.3).  Rate-limiting
2804	   FNE packets will prevent high bursts of new flow arrivals, which is a
2805	   very useful feature in DoS prevention.  A condition to subscribe to a
2806	   given network service would have to be that a user should not
2807	   generate more than C_FNE FNE packets, over a reference period T_FNE,
2808	   with no option to carry forward any of the allowance at the end of
2809	   each period.  These terms directly set the parameters of the FNE
2810	   policer:

2812	   o  b_0 = C_FNE

2814	   o  r = C_FNE/T_FNE

2816	   o  b_max = b_0

2818	   T_FNE should be a much shorter period than T_user: for instance T_FNE
2819	   could be in the order of minutes while T_user could be in the order
2820	   of weeks.
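
   The two per-user policers of this section can be sketched with one
   token bucket class and two parameter mappings.  The Python below is a
   minimal illustration: the names TokenBucket, congestion_policer and
   fne_policer are hypothetical, time is passed in explicitly in
   seconds, and it is an assumption of this sketch that the caller
   invokes consume() only for Re-Echo packets (charging the packet size)
   or only for FNE packets (charging one token each).

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Bulk token bucket with initial level b_0, fill rate r and depth b_max."""
    level: float       # current token level, starts at b_0
    rate: float        # r, tokens added per second
    depth: float       # b_max
    last: float = 0.0  # time of the last update, in seconds

    def consume(self, tokens: float, now: float) -> bool:
        """Refill up to `now`, then charge `tokens`.

        Returns True while the level stays non-negative.
        """
        refill = self.rate * (now - self.last)
        self.level = min(self.depth, self.level + refill)
        self.last = now
        self.level -= tokens
        return self.level >= 0


def congestion_policer(c_user: float, t_user: float,
                       n_user: int) -> TokenBucket:
    """b_0 = C_user, r = C_user/T_user, b_max = b_0 * (N_user + 1)."""
    return TokenBucket(level=c_user, rate=c_user / t_user,
                       depth=c_user * (n_user + 1))


def fne_policer(c_fne: float, t_fne: float) -> TokenBucket:
    """b_0 = C_FNE, r = C_FNE/T_FNE, b_max = b_0 (no carry forward)."""
    return TokenBucket(level=c_fne, rate=c_fne / t_fne, depth=c_fne)
```

   For instance, congestion_policer(1_000_000, 7 * 24 * 3600, 2) would
   model an allowance of 1,000,000 tokens per week with up to two
   further allowances carried forward, and fne_policer(100, 60) would
   allow 100 FNE packets per minute.  These figures are made up for
   illustration.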

2822	F.2.  Per-flow Rate Policing

2824	   Per-flow policing aims to enforce congestion responsiveness on the
2825	   shortest information timescale on a network path: packet roundtrips.

2827	   This again requires that the appropriate terms be agreed between a
2828	   network operator and its users, where a congestion responsiveness
2829	   policy might be required for the use of a given network service
2830	   (perhaps unless the user specifically requests otherwise).

2832	   As an example, we describe below how a rate adaptation policer can be
2833	   designed when the applicable rate adaptation policy is TCP-
2834	   compliance.  In that context, the average throughput of a flow will
2835	   be expected to be bounded by the value of the TCP throughput during
2836	   congestion avoidance, given by Mathis' formula [Mathis97]:

2838	      x_TCP = k * s / ( T * sqrt(m) )

2840	   where:

2842	   o  x_TCP is the throughput of the TCP flow in bytes per second,

2844	   o  k is a constant upper-bounded by sqrt(3/2),

2846	   o  s is the average packet size of the flow in bytes,

2848	   o  T is the roundtrip time of the flow,

2850	   o  m is the congestion level experienced by the flow.

2852	   We define the marking period N=1/m which represents the average
2853	   number of packets between two re-echoes.  Mathis' formula can be re-
2854	   written as:

2856	      x_TCP = k*s*sqrt(N)/T

2858	   We can then get the average inter-mark time in a compliant TCP flow,
2859	   dt_TCP, by solving (x_TCP/s)*dt_TCP = N which gives

2861	      dt_TCP = sqrt(N)*T/k
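
   As a numerical check of the three expressions above, the following
   Python fragment evaluates them for one set of made-up values.  The
   packet size, roundtrip time and congestion level are assumptions
   chosen only for illustration, and k is set to its upper bound.

```python
import math

k = math.sqrt(3 / 2)   # upper bound on the constant k
s = 1500.0             # average packet size (assumed value)
T = 0.1                # roundtrip time in seconds (assumed value)
m = 0.01               # congestion level (assumed: 1 mark in 100 packets)

x_tcp = k * s / (T * math.sqrt(m))       # Mathis' formula
N = 1 / m                                # marking period in packets
x_tcp_from_N = k * s * math.sqrt(N) / T  # rewritten form
dt_tcp = math.sqrt(N) * T / k            # average inter-mark time

print(f"packet rate x_TCP/s = {x_tcp / s:.1f} packets per second")
print(f"dt_TCP              = {dt_tcp:.3f} s")
```

   With these values the flow sends roughly 122 packets per second and
   sees a mark about every 0.82 seconds, so that (x_TCP/s)*dt_TCP
   recovers N = 100 packets between marks.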

2863	   We rely on this equation for the design of a rate-adaptation policer
2864	   as a variation of a token bucket.  In that case a policer has to be
2865	   set up for each policed flow.  This may be triggered by FNE packets,
2866	   with the remainder of flows being all rate limited together if they
2867	   do not start with an FNE packet.

2869	   Where maintaining per flow state is not a problem, for instance on
2870	   some access routers, systematic per-flow policing may be considered.
2871	   Should per-flow state be more constrained, rate adaptation policing
2872	   could be limited to a random sample of flows exhibiting Re-Echoes.

2874	   As in the case of user policing, only Re-Echo packets will consume
2875	   tokens; however, the amount of tokens consumed will depend on the
2876	   congestion signal.

2878	   When a new rate adaptation policer is set up for flow j, the
2879	   following state is created:

2881	   o  a token bucket b_j of depth b_max starting at level b_0

2883	   o  a timestamp t_j = timenow()

2885	   o  a counter N_j = 0

2887	   o  a roundtrip estimate T_j

2889	   o  a filling rate r

2891	   When the policing node forwards a packet of flow j with no Re-Echo:

2893	   o  the counter is incremented: N_j += 1

2895	   When the policing node forwards a packet of flow j carrying a
2896	   Re-Echo:

2898	   o  the counter is incremented: N_j += 1

2900	   o  the token level is adjusted:
2901	      b_j += r*(timenow()-t_j) - sqrt(N_j)*T_j/k

2903	   o  the counter is reset: N_j = 0

2905	   o  the timestamp is reset: t_j = timenow()
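
   The per-flow state and the two update rules can be sketched in Python
   as follows.  This is an illustration under stated assumptions, not a
   reference implementation: the class name RateAdaptationPolicer is
   hypothetical, the clock is injected as a function so that the sketch
   can be exercised without real time, k defaults to its upper bound
   sqrt(3/2), the level is capped at b_max, and the marked case is taken
   to be the packet that consumes tokens.  The square root is computed
   directly.

```python
import math
from typing import Callable


class RateAdaptationPolicer:
    """Per-flow policer state and update rules for one flow j."""

    def __init__(self, b_0: float, b_max: float, r: float, rtt: float,
                 timenow: Callable[[], float],
                 k: float = math.sqrt(3 / 2)) -> None:
        self.b_max = b_max      # bucket depth
        self.b = b_0            # token level b_j, starts at b_0
        self.r = r              # filling rate
        self.rtt = rtt          # roundtrip estimate T_j
        self.k = k              # constant from Mathis' formula (assumed)
        self.timenow = timenow  # clock function returning seconds
        self.t = timenow()      # timestamp t_j
        self.n = 0              # counter N_j

    def on_packet(self, marked: bool) -> bool:
        """Update state for one forwarded packet.

        Returns False once the token level has become negative.
        """
        self.n += 1
        if marked:
            now = self.timenow()
            filled = self.r * (now - self.t)
            charged = math.sqrt(self.n) * self.rtt / self.k
            # Assumption: the level never exceeds the bucket depth.
            self.b = min(self.b_max, self.b + filled - charged)
            self.n = 0
            self.t = now
        return self.b >= 0
```

   A caller would invoke on_packet(False) for each unmarked packet of
   the flow and on_packet(True) for each marked one, and apply its
   sanction whenever the method returns False.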

2907	   An implementation example will be given in a later draft that avoids
2908	   having to extract the square root.

2910	   Analysis: For a TCP flow, for r = 1 token/sec, on average,

2912	      r*(timenow()-t_j) - sqrt(N_j)*T_j/k = dt_TCP - sqrt(N)*T/k = 0

2914	   This means that the token level will fluctuate around its initial
2915	   level.  The depth b_max of the bucket sets the timescale on which the
2916	   rate adaptation policy is performed while the filling rate r sets the
2917	   trade-off between responsiveness and robustness:

2919	   o  the higher b_max, the longer it will take to catch greedy flows

2921	   o  the higher r, the fewer false positives (greedy verdict on
2922	      compliant flows) but the more false negatives (compliant verdict
2923	      on greedy flows)
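
   The effect of r can be illustrated by computing the average change in
   token level per mark for a flow that sends at g times the compliant
   TCP rate.  The Python sketch below assumes that such a flow sees the
   same marking period N as a compliant flow, so that its marks arrive g
   times more often; the function name token_drift_per_mark and the
   parameter g are hypothetical.

```python
import math


def token_drift_per_mark(g: float, N: float, T: float, r: float = 1.0,
                         k: float = math.sqrt(3 / 2)) -> float:
    """Mean token change per mark for a flow at g times the TCP rate."""
    dt_tcp = math.sqrt(N) * T / k  # inter-mark time of a compliant flow
    dt = dt_tcp / g                # marks arrive g times more often
    return r * dt - math.sqrt(N) * T / k


for g in (1.0, 2.0, 4.0):
    print(g, round(token_drift_per_mark(g, N=100, T=0.1), 4))
```

   With r = 1 the drift is zero for g = 1 and negative for any g > 1, so
   a greedy flow drains its bucket.  Raising r to 1.5 gives compliant
   flows a positive drift (fewer false positives) but also lets a flow
   at g = 1.5 break even (more false negatives).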

2925	   This rate adaptation policer requires the availability of a roundtrip
2926	   estimate, which may be obtained for instance by applying re-feedback
2927	   to the downstream delay (Appendix E) or by passive estimation
2928	   [Jiang02].

2930	   When the bucket of a policer located at the access router (whether it
2931	   is a per-user policer or a per-flow policer) becomes empty, the
2932	   access router SHOULD drop at least all packets causing the token
2933	   level to become negative.  The network operator MAY take further
2934	   sanctions if the token level of the per-flow policers associated with
2935	   a user becomes negative.

2937	Authors' Addresses

2939	   Bob Briscoe
2940	   BT & UCL
2941	   B54/77, Adastral Park
2942	   Martlesham Heath
2943	   Ipswich  IP5 3RE
2944	   UK

2946	   Phone: +44 1473 645196
2947	   Email: bob.briscoe@bt.com
2948	   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

2950	   Arnaud Jacquet
2951	   BT
2952	   B54/70, Adastral Park
2953	   Martlesham Heath
2954	   Ipswich  IP5 3RE
2955	   UK

2957	   Phone: +44 1473 647284
2958	   Email: arnaud.jacquet@bt.com
2959	   URI:

2961	   Alessandro Salvatori
2962	   BT
2963	   B54/77, Adastral Park
2964	   Martlesham Heath
2965	   Ipswich  IP5 3RE
2966	   UK

2968	   Email: sandr8@gmail.com

2970	Intellectual Property Statement

2972	   The IETF takes no position regarding the validity or scope of any
2973	   Intellectual Property Rights or other rights that might be claimed to
2974	   pertain to the implementation or use of the technology described in
2975	   this document or the extent to which any license under such rights
2976	   might or might not be available; nor does it represent that it has
2977	   made any independent effort to identify any such rights.  Information
2978	   on the procedures with respect to rights in RFC documents can be
2979	   found in BCP 78 and BCP 79.

2981	   Copies of IPR disclosures made to the IETF Secretariat and any
2982	   assurances of licenses to be made available, or the result of an
2983	   attempt made to obtain a general license or permission for the use of
2984	   such proprietary rights by implementers or users of this
2985	   specification can be obtained from the IETF on-line IPR repository at
2986	   http://www.ietf.org/ipr.

2988	   The IETF invites any interested party to bring to its attention any
2989	   copyrights, patents or patent applications, or other proprietary
2990	   rights that may cover technology that may be required to implement
2991	   this standard.  Please address the information to the IETF at
2992	   ietf-ipr@ietf.org.

2994	Disclaimer of Validity

2996	   This document and the information contained herein are provided on an
2997	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2998	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2999	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
3000	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
3001	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
3002	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

3004	Copyright Statement

3006	   Copyright (C) The Internet Society (2006).  This document is subject
3007	   to the rights, licenses and restrictions contained in BCP 78, and
3008	   except as set forth therein, the authors retain all their rights.

3010	Acknowledgment

3012	   Funding for the RFC Editor function is currently provided by the
3013	   Internet Society.