Transport Area Working Group                             B. Briscoe, Ed.
Internet-Draft                                                A. Jacquet
Intended status: Historic                                             BT
Expires: September 12, 2014                                 T. Moncaster
                                                           Moncaster.com
                                                                A. Smith
                                                                      BT
                                                          March 11, 2014

   Re-ECN: A Framework for adding Congestion Accountability to TCP/IP
                   draft-briscoe-conex-re-ecn-motiv-03

Abstract

This document describes a framework for using a new protocol called re-ECN (re-inserted explicit congestion notification), which can be deployed incrementally around unmodified routers.
Re-ECN allows accurate congestion monitoring throughout the network, thus enabling the upstream party at any trust boundary in the internetwork to be held responsible for the congestion it causes, or allows to be caused. Networks can therefore introduce straightforward accountability for congestion, and policing mechanisms for incoming traffic from end-customers or from neighbouring network domains. As well as giving the motivation for re-ECN, this document gives examples of mechanisms that can use the protocol to ensure data sources respond correctly to congestion, and it describes example mechanisms that ensure the dominant selfish strategy of both network domains and end-points will be to use the protocol honestly.

Note concerning Intended Status: If this draft were ever published as an RFC it would probably have historic status. There is limited space in the IP header, so re-ECN had to compromise by requiring the receiver to be ECN-enabled, otherwise the sender could not use re-ECN. Re-ECN was a precursor to the chartering of the IETF's Congestion Exposure (ConEx) working group, but at the time of chartering too few receivers had ECN enabled, so it was decided to pursue other compromises in order to fit a similar capability into the IP header.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 12, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Motivation
      1.2. Re-ECN Protocol in Brief
      1.3. The Re-ECN Framework
      1.4. Solving Hard Problems
      1.5. The Rest of this Document
   2. Requirements notation
   3. Motivation
      3.1. Policing Congestion Response
           3.1.1. The Policing Problem
           3.1.2. The Case Against Bottleneck Policing
   4. Re-ECN Incentive Framework
      4.1. Revealing Congestion Along the Path
           4.1.1. Positive and Negative Flows
      4.2. Incentive Framework Overview
      4.3. Egress Dropper
      4.4. Ingress Policing
      4.5. Inter-domain Policing
      4.6. Inter-domain Fail-safes
      4.7. The Case against Classic Feedback
      4.8. Simulations
   5. Other Applications of Re-ECN
      5.1. DDoS Mitigation
      5.2. End-to-end QoS
      5.3. Traffic Engineering
      5.4. Inter-Provider Service Monitoring
   6. Limitations
   7. Incremental Deployment
      7.1. Incremental Deployment Features
      7.2. Incremental Deployment Incentives
   8. Architectural Rationale
   9. Related Work
      9.1. Policing Rate Response to Congestion
      9.2. Congestion Notification Integrity
      9.3. Identifying Upstream and Downstream Congestion
   10. Security Considerations
   11. IANA Considerations
   12. Conclusions
   13. Acknowledgements
   14. Comments Solicited
   15. References
       15.1. Normative References
       15.2. Informative References
   Appendix A. Example Egress Dropper Algorithm
   Appendix B. Policer Designs to ensure Congestion Responsiveness
      B.1. Per-user Policing
      B.2. Per-flow Rate Policing
   Appendix C. Downstream Congestion Metering Algorithms
      C.1. Bulk Downstream Congestion Metering Algorithm
      C.2. Inflation Factor for Persistently Negative Flows
   Appendix D. Re-TTL
   Appendix E. Argument for holding back the ECN nonce

Authors' Statement: Status (to be removed by the RFC Editor)

Although the re-ECN protocol is intended to make a simple but far-reaching change to the Internet architecture, the most immediate priority for the authors is to delay any move of the ECN nonce to Proposed Standard status. The argument for this position is developed in Appendix E.

1. Introduction

This document aims to:

o  Describe the motivation for wanting to introduce re-ECN;

o  Provide a very brief description of the protocol;

o  Describe the framework within which the protocol sits;

o  Show how a number of hard problems become much easier to solve once re-ECN is available in IP.

This introduction starts with a run-through of these four points.

1.1. Motivation

Re-ECN is proposed as a means of allowing accurate monitoring of congestion throughout the Internet. The current Internet relies on the vast majority of end-systems running TCP and reacting to detected congestion by reducing their sending rates. Thus congestion control is conducted by the collaboration of the majority of end-systems.

In this situation it is possible for applications that are unresponsive to congestion to take whatever share of bottleneck resources they want from responsive flows; the responsive flows reduce their sending rate in the face of congestion and effectively get out of the way of unresponsive flows.
An increasing proportion of such applications could lead to congestion collapse becoming more common [RFC3714]. Each network has no visibility of whole-path congestion and can only respond to congestion on a local basis.

Using re-ECN will allow any point along a path to calculate congestion both upstream and downstream of that point. As a consequence, policing of congestion /could/ be carried out in the network if end-systems fail to do so. Re-ECN enables flows and users to be policed, and for that policing to happen at network ingress and at network borders.

1.2. Re-ECN Protocol in Brief

In re-ECN each sender makes a prediction of the congestion that each flow will cause and signals that prediction within the IP headers of that flow. The prediction is based on, but not limited to, feedback received from the receiver. Sending a prediction of the congestion gives network equipment a view of the congestion both downstream and upstream.

In order to explain this mechanism we introduce the notion of IP packets carrying different, notional values depending on the state of their header flags:

o  Negative - packets marked by queues when incipient congestion is detected. This is exactly the same as ECN [RFC3168];

o  Positive - packets sent by the sender in proportion to the number of bytes in packets that have been marked negative, according to feedback received from the receiver;

o  Cautious - packets sent whenever the sender cannot be sure of the correct amount of positive bytes to inject into the network, for example at the start of a flow, to indicate that feedback has not been established;

o  Cancelled - packets sent by the sender as positive that get marked as negative by queues in the network due to incipient congestion;

o  Neutral - normal IP packets that indicate to queues that they may be marked negative.

A flow starts to transmit packets.
No feedback has been established, so a number of cautious packets are sent (see the protocol definition [Re-TCP] for an analysis of how many cautious packets should be sent at flow start). The rest are sent as neutral.

The packets traverse a congested queue. A fraction are marked negative as an indication of incipient congestion.

The packets are received by the receiver. The receiver feeds back to the sender a count of the number of packets that have been marked negative. This feedback can be provided either by the transport (e.g. TCP) or by higher-layer control messages.

The sender receives the feedback and then sends a number of positive packets in proportion to the bytes represented by packets that have been marked negative. It is important to note that congestion is revealed by the fraction of marked packets rather than by a field in the IP header. This is due to the limited codepoints available, even including use of the last unallocated bit (sometimes called the evil bit [RFC3514]). Full details of the codepoints used are given in [Re-TCP]. This shortage of codepoints is a constraint of IPv4; ECN is similarly restricted.

The number of bytes inside the negative packets and positive packets should therefore be approximately equal at the termination point of the flow. To put it another way, the balance of negative and positive should be zero.

1.3. The Re-ECN Framework

The introduction of the protocol enables three things:

o  It gives a view of whole-path congestion;

o  It enables policing of flows;

o  It allows networks to monitor the flow of congestion across their borders.

At any point in the network a device can calculate the upstream congestion from the fraction of bytes in negative packets relative to total bytes. It could do this using ECN alone, by calculating the fraction of packets marked Congestion Experienced.
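As an illustrative sketch only (the Packet record and marking names below are our stand-ins for the actual codepoints defined in [Re-TCP]), the upstream congestion fraction just described can be computed by counting the bytes carried in negative-marked packets:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    size: int       # bytes in the packet
    marking: str    # 'neutral', 'negative', 'positive', 'cautious' or 'cancelled'

def upstream_congestion(packets):
    """Fraction of bytes in negative-marked packets, as observed so far
    at any point on the path (meaningful only as an average over many packets)."""
    total = sum(p.size for p in packets)
    negative = sum(p.size for p in packets if p.marking == 'negative')
    return negative / total if total else 0.0

# Example: 3 negative packets out of 100 equal-sized packets
# correspond to 3% upstream congestion at this observation point.
packets = [Packet(1500, 'negative')] * 3 + [Packet(1500, 'neutral')] * 97
```

A real observation point would of course maintain running byte counters rather than a packet list; the sketch only shows the ratio being measured.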
Using re-ECN, a device in the network can calculate downstream congestion by subtracting the fraction of negative packets from the fraction of positive packets.

A user can be restricted to causing only a certain amount of congestion. A policer could be introduced at the ingress of a network that counts the number of positive packets being sent and limits the sender if that sender tries to transmit more positive packets than their allowance.

A user could deliberately ignore some or all of the feedback and transmit packets with a zero or much lower proportion of positive packets than negative packets. To solve this a dropper is proposed. This would be placed at the egress of a network. If the number of negative packets exceeds the number of positive packets then the flow could be dropped or some other sanction enacted.

Policers and droppers could be used between networks in order to police bulk traffic. A whole network harbouring users causing congestion in downstream networks can be held responsible, or policed, by its downstream neighbour.

1.4. Solving Hard Problems

We have already shown that, by making flows declare the level of congestion they cause, they can be policed. More specifically, these are the kinds of problem that can be solved:

o  mitigating distributed denial of service (DDoS);

o  simplifying differentiation of quality of service (QoS);

o  policing compliance to congestion control;

o  inter-provider service monitoring;

o  etc.

Uniquely, re-ECN manages to enable solutions to these problems without unduly stifling innovative new ways to use the Internet. This was a hard balance to strike, given it could be argued that DDoS is an innovative way to use the Internet. The most valuable insight was to allow each network to choose the level of constraint it wishes to impose.
Also, re-ECN has been carefully designed so that networks that choose to use it conservatively can protect themselves against the congestion caused in their network by users on other networks with more liberal policies.

For instance, some network owners want to block applications like voice and video unless their network is compensated for the extra share of bottleneck bandwidth taken. These real-time applications tend to be unresponsive when congestion arises, whereas elastic TCP-based applications back away quickly, ending up taking a much smaller share of congested capacity for themselves. Other network owners want to invest in large amounts of capacity and make their gains from simplicity of operation and economies of scale.

While we have designed re-ECN so that networks can choose to deploy stringent policing, this does not imply we advocate that every network should introduce tight controls on those that cause congestion. Re-ECN has been specifically designed to allow different networks to choose how conservative or liberal they wish to be with respect to policing congestion. But those that choose to be conservative can protect themselves from the excesses that liberal networks allow their users.

Re-ECN allows the more conservative networks to police out flows that have not asked permission to be unresponsive to congestion, not because they are voice or video, but simply because they do not respond to congestion. But it also allows other networks to choose not to police.

Crucially, when flows from liberal networks cross into a conservative network, re-ECN enables the conservative network to apply penalties to its neighbouring networks for the congestion they allow to be caused. And these penalties can be applied to bulk data, without regard to flows.
Then, if unresponsive applications become so dominant that some of the more liberal networks experience congestion collapse [RFC3714], they can change their minds and use re-ECN to apply tighter controls in order to bring congestion back under control.

Re-ECN reduces the need for complex network equipment to perform these functions.

1.5. The Rest of this Document

This document is structured as follows. First the motivation for the new protocol is given (Section 3), followed by the incentive framework that the protocol makes possible (Section 4). Section 5 then describes other important applications of re-ECN, such as policing DDoS, QoS and congestion control. Although these applications do not require standardisation themselves, they are described in a fair degree of detail in order to explain how re-ECN can be used. Given re-ECN proposes to use the last undefined bit in the IPv4 header, we felt it necessary to outline the potential that re-ECN could release in return for being given that bit.

Deployment issues discussed throughout the document are brought together in Section 7, which is followed by a brief section explaining the somewhat subtle rationale for the design from an architectural perspective (Section 8). We end by describing related work (Section 9), listing security considerations (Section 10) and finally drawing conclusions (Section 12).

2. Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

This document first specifies a protocol, then describes a framework that creates the right incentives to ensure compliance with the protocol.
This could cause confusion because the second part of the document considers many cases where malicious nodes may not comply with the protocol. When such contingencies are described, if any of the above keywords are not capitalised, that is deliberate. So, for instance, the following two apparently contradictory sentences would be perfectly consistent: i) x MUST do this; ii) x may not do this.

3. Motivation

3.1. Policing Congestion Response

3.1.1. The Policing Problem

The current Internet architecture trusts hosts to respond voluntarily to congestion. Limited evidence shows that the large majority of end-points on the Internet comply with a TCP-friendly response to congestion. But telephony (and increasingly video) services over the best-effort Internet are attracting the interest of major commercial operations. Most of these applications do not respond to congestion at all; those that can, switch to lower-rate codecs.

Of course, the Internet is intended to support many different application behaviours. But the problem is that this freedom can be exercised irresponsibly. The greater problem is that we will never be able to agree on where the boundary is between responsible and irresponsible. Therefore re-ECN is designed to allow different networks to set their own view of the limit of irresponsibility, and to allow networks that choose a more conservative limit to push back against congestion caused in more liberal networks.

As an example of the impossibility of setting a standard for fairness, mandating TCP-friendliness would set the bar too high for unresponsive streaming media, but still some would say the bar was too low [relax-fairness].
Even though all known peer-to-peer filesharing applications are TCP-compatible, they can cause a disproportionate amount of congestion, simply by using multiple flows and by transferring data continuously relative to other short-lived sessions. On the other hand, if we swung the other way and set the bar low enough to allow streaming media to be unresponsive, we would also allow denial-of-service attacks, which are typically unresponsive to congestion and consist of multiple continuous flows.

Applications that need (or choose) to be unresponsive to congestion can effectively take (some would say steal) whatever share of bottleneck resources they want from responsive flows. Whether or not such free-riding is common, the inability to prevent it increases the risk of poor returns for investors in network infrastructure, leading to under-investment. An increasing proportion of unresponsive or free-riding demand coupled with persistent under-supply is a broken economic cycle. Therefore, if the current, largely co-operative consensus continues to erode, congestion collapse could become more common in more areas of the Internet [RFC3714].

3.1.2. The Case Against Bottleneck Policing

The state of the art in rate policing is the bottleneck policer, which is intended to be deployed at any forwarding resource that may become congested.
Its aim is to detect flows that cause significantly more local congestion than others. Although operators might solve their immediate problems by deploying bottleneck policers, we are concerned that widespread deployment would make it extremely hard to evolve new application behaviours. We believe the IETF should offer re-ECN as the preferred protocol on which to base solutions to the policing problems of operators, because it would not harm evolvability and, frankly, it would be far more effective (see later for why).

Approaches like [XCHOKe] and [pBox] are attractive ways to rate-police traffic without the benefit of whole-path information (such as could be provided by re-ECN). But they must be deployed at bottlenecks in order to work. Unfortunately, a large proportion of traffic traverses at least two bottlenecks (in two access networks), particularly with the current traffic mix where peer-to-peer file-sharing is prevalent. If ECN were deployed, we believe it would be likely that these bottleneck policers would be adapted to combine ECN congestion marking from the upstream path with local congestion knowledge. But then the only useful placement for such policers would be close to the egress of the internetwork.

But then, if these bottleneck policers were widely deployed (which would require them to be more effective than they are now), the Internet would find itself with one universal rate-adaptation policy (probably TCP-friendliness) embedded throughout the network. Given TCP's congestion control algorithm is already known to be hitting its scalability limits and new algorithms are being developed for high-speed congestion control, embedding TCP policing into the Internet would make evolution to new algorithms extremely painful.
If a source wanted to use a different algorithm, it would have to first discover and then negotiate with all the policers on its path, particularly those in the far access network. The IETF has already travelled that path with the Intserv architecture and found that it constrains scalability [RFC2208].

Anyway, if bottleneck policers were ever widely deployed, they would be likely to be bypassed by determined attackers. They inherently have to police fairness per flow or per source-destination pair. Therefore they can easily be circumvented, either by opening multiple flows (by varying the end-point port number), or by spoofing the source address but arranging with the receiver to hide the true return address at a higher layer.

4. Re-ECN Incentive Framework

The aim is to create an incentive environment that ensures optimal sharing of capacity despite everyone acting selfishly (including lying and cheating). Of course, the mechanisms put in place for this can lie dormant wherever co-operation is the norm.

4.1. Revealing Congestion Along the Path

Throughout this document we focus on path congestion. But some forms of fairness, particularly TCP's, also depend on round-trip time. If TCP-fairness is required, we also propose to measure downstream path delay using re-feedback. We give a simple outline of how this could work in Appendix D. However, we do not expect this to be necessary, as researchers tend to agree that only congestion control dynamics need to depend on RTT, not the rate that the algorithm would converge on after a period of stability.

Recall that re-ECN can be used to measure path congestion at any point on the path. End-systems know the whole-path congestion: the receiver knows it from the ratio of negative packets to all other packets it observes, and the sender knows the same information via the feedback.
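To make the arithmetic concrete, here is a small illustrative sketch (ours, not part of the protocol specification; the marking fractions are assumed example values): downstream congestion at an observation point is approximated by subtracting the negative fraction seen so far from the whole-path positive fraction.

```python
def downstream_congestion(positive_fraction, negative_fraction):
    """Approximation used in this document: downstream = positive - negative."""
    return positive_fraction - negative_fraction

# Assumed example fractions: the receiver sees 3% of bytes marked negative
# and feeds this back, so the sender marks 3% of bytes positive.
whole_path = 0.03
# Negative fraction accumulated upstream of three observation points:
upstream = {'L': 0.00, 'M': 0.01, 'N': 0.03}
for point, neg in upstream.items():
    print(point, f"{downstream_congestion(whole_path, neg):.0%}")
```

With the sender re-echoing a 3% negative fraction, points seeing 0%, 1% and 3% negative marking infer roughly 3%, 2% and 0% downstream congestion respectively.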
       +---+    +----+                +----+    +---+
       | S |----| Q1 |----------------| Q2 |----| R |
       +---+    +----+                +----+    +---+

       ^
       |                positive fraction
    3% |-------------------------------+=======
       |                               |
    2% |                               |
       |        negative fraction      |
    1% |       +-----------------------+
       |       |
    0% +--------------------------------------->
          ^                ^                ^
          L                M                N    Observation points

                Figure 1: A 2-Queue Example (Imprecise)

Figure 1 uses a simple network to illustrate how re-ECN allows queues to measure downstream congestion. The receiver counts negative packets as being 3% of all received packets. This fraction is fed back to the sender, which sets 3% of its packets to be positive to match it. This fraction of positive packets can be observed along the path, shown by the horizontal line at 3% in the figure. The negative fraction is shown by the stepped line, which rises to meet the positive fraction line with a step at each queue where packets are marked negative. Two queues are shown (Q1 and Q2) that are currently congested. Each time packets pass through one, a fraction are marked negative (1% at Q1 and 2% at Q2). The approximate downstream congestion can be measured at the observation points shown along the path by subtracting the negative fraction from the positive fraction, as shown in the table below ([Re-TCP] derives these approximations from a precise analysis).
   +-------------------+------------------------------+
   | Observation point | Approx downstream congestion |
   +-------------------+------------------------------+
   |         L         |         3% - 0% = 3%         |
   |         M         |         3% - 1% = 2%         |
   |         N         |         3% - 3% = 0%         |
   +-------------------+------------------------------+

   Table 1: Downstream Congestion Measured at Example Observation Points

All along the path, whole-path congestion remains unchanged, so it can be used as a reference against which to compare upstream congestion. The difference predicts downstream congestion for the rest of the path. Therefore, measuring the fractions of negative and positive packets at any point in the Internet will reveal upstream, downstream and whole-path congestion.

Note: to be absolutely clear, these fractions are averages that would result from the behaviour of the protocol handler mechanically sending positive packets in direct response to incoming feedback; we are not saying any protocol handler has to work with these average fractions directly.

4.1.1. Positive and Negative Flows

In Section 1.2 we introduced the notion of IP packets having different values (negative, positive, cautious, cancelled and neutral). Positive and cautious packets have a value of +1, negative packets -1, and cancelled and neutral packets have zero value.

In the rest of this document we will loosely talk of positive or negative flows. A negative flow is one where more negative bytes than positive bytes arrive at the receiver. Likewise, positive flows are those where more positive bytes arrive than negative bytes. Both of these indicate that the wrong amount of positive bytes has been sent.

4.2. Incentive Framework Overview

Figure 2 sketches the incentive framework that we will describe piece by piece throughout this section. We will do a first pass in overview, then return to each piece in detail.
We re-use the earlier example of how downstream congestion is derived by subtracting upstream congestion from path congestion (Figure 1), but depict multiple trust boundaries to turn it into an internetwork. For clarity, only downstream congestion is shown (the difference between the two earlier plots). The graph displays the downstream path congestion seen in a typical flow as it traverses an example path from sender S to receiver R, across networks N1, N2 & N3. Everyone is shown using re-ECN correctly, but we intend to show why everyone would /choose/ to use it correctly, and honestly.

Three main types of self-interest can be identified:

o  Users want to transmit data across the network as fast as possible, paying as little as possible for the privilege. In this respect, there is no distinction between senders and receivers, but we must be wary of potential malice by one on the other;

o  Network operators want to maximise revenues from the resources they invest in. They compete amongst themselves for the custom of users;

o  Attackers (whether users or networks) want to use any opportunity to subvert the new re-ECN system for their own gain or to damage the service of their victims, whether targeted or random.

       policer                               dropper
          |                                     |
          |                                     |
   S <-----N1----> <---N2---> <---N3--> R     domain
                 |          |
                 |          |
               Border Gateways

               Figure 2: Incentive Framework

Source congestion control: We want to ensure that the sender will throttle its rate as downstream congestion increases. Whatever the agreed congestion response (whether TCP-compatible or some enhanced QoS), to some extent it will always be against the sender's interest to comply.

Ingress policing: But it is in all the network operators' interests to encourage a fair congestion response, so that their investments are employed to satisfy the most valuable demand.
The re-ECN
620 protocol ensures packets carry the necessary information about
621 their own expected downstream congestion so that N1 can deploy a
622 policer at its ingress to check that S is complying with whatever
623 congestion control it should be using (Section 4.4).  If N1 is
624 extremely conservative it could police each flow, but it is likely
625 to just police the bulk amount of congestion each customer causes
626 without regard to flows, or if it is extremely liberal it need not
627 police congestion control at all.  Whatever its choice, it is always
628 preferable to police traffic at the very first ingress into an
629 internetwork, before non-compliant traffic can cause any damage.

631 Edge egress dropper:  If the policer grants a source less right to a
632 high rate the higher the downstream congestion it declares,
633 the source has a clear incentive to understate downstream
634 congestion.  But, if flows of packets are understated when they
635 enter the internetwork, they will have become negative by the time
636 they leave.  So, we introduce a dropper at the last network
637 egress, which drops packets in flows that persistently declare
638 negative downstream congestion (see Section 4.3 for details).

640 Inter-domain traffic policing:  But next we must ask, if congestion
641 arises downstream (say in N3), what is the ingress network's
642 (N1's) incentive to police its customers' response?  If N1 turns a
643 blind eye, its own customers benefit while other networks suffer.
644 This is why all inter-domain QoS architectures (e.g. Intserv,
645 Diffserv) police traffic each time it crosses a trust boundary.
646 We have already shown that re-ECN gives a trustworthy measure of
647 the expected downstream congestion that a flow will cause by
648 subtracting negative volume from positive at any intermediate
649 point on a path.
N3 (say) can use this measure to police all the
650 responses to congestion of all the sources beyond its upstream
651 neighbour (N2), but in bulk with one very simple passive
652 mechanism, rather than per flow, as we will now explain.

654 Emulating policing with inter-domain congestion penalties:  Between
655 high-speed networks, we would rather avoid per-flow policing, and
656 we would rather avoid holding back traffic while it is policed.
657 Instead, once re-ECN has arranged headers to carry downstream
658 congestion honestly, N2 can contract to pay N3 penalties in
659 proportion to a single bulk count of the congestion metrics
660 crossing their mutual trust boundary (Section 4.5).  In this way,
661 N3 puts pressure on N2 to suppress downstream congestion, for
662 every flow passing through the border interface, even though they
663 will all start and end in different places, and even though they
664 may all be allowed different responses to congestion.  The figure
665 depicts this downward pressure on N2 by the solid downward arrow
666 at the egress of N2.  Then N2 has an incentive either to police
667 the congestion response of its own ingress traffic (from N1) or to
668 emulate policing by applying penalties to N1 in turn on the basis
669 of congestion counted at their mutual boundary.  In this recursive
670 way, the incentive for each flow to respond correctly to
671 congestion traces back precisely to each source,
672 despite the mechanism not recognising flows (see Section 5.2).

674 Inter-domain congestion charging diversity:  Any two networks are
675 free to agree any of a range of penalty regimes between themselves,
676 but a regime only provides the right incentives if it stays
677 within the following reasonable constraints: N2 should expect
678 to pay penalties to N3 that increase monotonically
679 with the volume of congestion, and negative penalties are not
680 allowed.
For instance, they may agree an SLA with tiered 681 congestion thresholds, where higher penalties apply the higher the 682 threshold that is broken. But the most obvious (and useful) form 683 of penalty is where N3 levies a charge on N2 proportional to the 684 volume of downstream congestion N2 dumps into N3. In the 685 explanation that follows, we assume this specific variant of 686 volume charging between networks - charging proportionate to the 687 volume of congestion. 689 We must make clear that we are not advocating that everyone should 690 use this form of contract. We are well aware that the IETF tries 691 to avoid standardising technology that depends on a particular 692 business model. And we strongly share this desire to encourage 693 diversity. But our aim is merely to show that border policing can 694 at least work with this one model, then we can assume that 695 operators might experiment with the metric in other models (see 696 Section 4.5 for examples). Of course, operators are free to 697 complement this usage element of their charges with traditional 698 capacity charging, and we expect they will as predicted by 699 economics. 701 No congestion charging to users: Bulk congestion penalties at trust 702 boundaries are passive and extremely simple, and lose none of 703 their per-packet precision from one boundary to the next (unlike 704 Diffserv all-address traffic conditioning agreements, which 705 dissipate their effectiveness across long topologies). But at any 706 trust boundary, there is no imperative to use congestion charging. 707 Traditional traffic policing can be used, if the complexity and 708 cost is preferred. In particular, at the boundary with end 709 customers (e.g. between S and N1), traffic policing will most 710 likely be more appropriate. Policer complexity is less of a 711 concern at the edge of the network. And end-customers are known 712 to be highly averse to the unpredictability of congestion 713 charging. 
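For concreteness, the two penalty regimes described above, tiered SLA thresholds and charging proportional to congestion volume, might be sketched as follows (invented units, rates and thresholds; re-ECN specifies none of this):

```python
def proportional_penalty(congestion_mb, cents_per_mb):
    """The `useful' variant: N3 charges N2 in direct proportion to the
    volume of downstream congestion N2 dumps into N3 over the
    accounting period (units here are megabytes and cents)."""
    return congestion_mb * cents_per_mb

def tiered_penalty(congestion_mb, tiers):
    """SLA variant: `tiers' is an ascending list of (threshold_mb, fee)
    pairs; the highest threshold broken determines the fee, so
    penalties increase monotonically and are never negative."""
    fee_due = 0
    for threshold, fee in tiers:
        if congestion_mb > threshold:
            fee_due = fee
    return fee_due

tiers = [(1, 10), (5, 50), (20, 200)]
print(proportional_penalty(10, 5))  # 50 cents for 10 MB of congestion
print(tiered_penalty(10, tiers))    # 50: the 5 MB tier is the highest broken
```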
715 NOTE WELL: This document neither advocates nor requires congestion
716 charging for end customers; it advocates, but does not require,
717 inter-domain congestion charging.

719 Competitive discipline of inter-domain traffic engineering:  With
720 inter-domain congestion charging, a domain seems to have a
721 perverse incentive to fake congestion; N2's profit depends on the
722 difference between congestion at its ingress (its revenue) and at
723 its egress (its cost).  So, overstating internal congestion seems
724 to increase profit.  However, smart border routing [Smart_rtg] by
725 N1 will bias its routing towards the least-cost routes.  So, N2
726 risks losing all its revenue to competitive routes if it
727 overstates congestion (see Section 5.3).  In other words, if N2 is
728 the least congested route, its ability to raise excess profits is
729 limited by the congestion on the next least congested route.

731 Closing the loop:  All the above elements conspire to trap everyone
732 between two opposing pressures, ensuring the downstream congestion
733 metric arrives at the destination neither above nor below zero.
734 So, we have arrived back where we started in our argument.  The
735 ingress edge network can rely on downstream congestion declared in
736 the packet headers presented by the sender.  So it can police the
737 sender's congestion response accordingly.

739 Evolvability of congestion control:  We have seen that re-ECN enables
740 policing at the very first ingress.  We have also seen that, as
741 flows continue on their path through further networks downstream,
742 re-ECN removes the need for further per-domain ingress policing of
743 all the different congestion responses allowed to each different
744 flow.  This is why the evolvability of re-ECN policing is so
745 superior to bottleneck policing or to any policing of different
746 QoS for different flows.
Even if all access networks choose to
747 conservatively police congestion per flow, each will want to
748 compete with the others to allow new responses to congestion for
749 new types of application.  With re-ECN, each can introduce new
750 controls independently, without coordinating with other networks
751 and without having to standardise anything.  But, as we have just
752 seen, by making inter-domain penalties proportionate to bulk
753 downstream congestion, downstream networks can be agnostic to the
754 specific congestion response for each flow, but they can still
755 apply more penalty the more liberal the ingress access network has
756 been in the response to congestion it allowed for each flow.

758 We now take a second pass over the incentive framework, filling in
759 the detail.

761 4.3.  Egress Dropper

763 As traffic leaves the last network before the receiver (domain N3 in
764 Figure 2), the fraction of positive octets in a flow should match the
765 fraction of negative octets introduced by congestion marking (red
766 packets), leaving a balance of zero.  If it is less (a negative
767 flow), it implies that the source is understating path congestion
768 (which will reduce the penalties that N2 owes N3).

770 If flows are positive, N3 need take no action---this simply means its
771 upstream neighbour is paying more penalties than it needs to, and the
772 source is going slower than it needs to.  But, to protect itself
773 against persistently negative flows, N3 will need to install a
774 dropper at its egress.  Appendix A gives a suggested algorithm for
775 this dropper.  There is no intention that the dropper algorithm needs
776 to be standardised; it is merely provided to show that an efficient,
777 robust algorithm is possible.
But whatever algorithm is used, it must
778 meet the criteria below:

780 o  It SHOULD introduce minimal false positives for honest flows;

782 o  It SHOULD quickly detect and sanction dishonest flows (minimal
783    false negatives);

785 o  It SHOULD be invulnerable to state exhaustion attacks from
786    malicious sources.  For instance, if the dropper uses flow-state,
787    it should not be possible for a source to send numerous packets,
788    each with a different flow ID, to force the dropper to exhaust its
789    memory capacity (rationale for SHOULD: continuously sending keep-
790    alive packets might be perfectly reasonable behaviour, so we can't
791    distinguish a deliberate attack from reasonable levels of such
792    behaviour.  Therefore it is strictly impossible to be invulnerable
793    to such an attack);

795 o  It MUST introduce sufficient loss in goodput so that malicious
796    sources cannot play off losses in the egress dropper against
797    higher allowed throughput.  Salvatori [CLoop_pol] describes this
798    attack, which involves the source understating path congestion
799    then inserting forward error correction (FEC) packets to
800    compensate for expected losses;

802 o  It MUST NOT be vulnerable to `identity whitewashing', where a
803    transport can label a flow with a new ID more cheaply than paying
804    the cost of continuing to use its current ID.

806 Note that the dropper operates on flows but we would like it not to
807 require per-flow state.  This is why we have been careful to ensure
808 that all flows MUST start with a cautious packet.  If a flow does not
809 start with a cautious packet, a dropper is likely to treat it
810 unfavourably.  This risk makes it worth sending a cautious packet at
811 the start of a flow, even though there is a cost to the sender of
812 doing so (positive `worth').  Indeed, with cautious packets, the rate
813 at which a sender can generate new flows can be limited (Appendix B).
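To illustrate the balance test that these criteria constrain, here is a toy per-flow sketch (purely hypothetical; the Appendix A algorithm aggregates flow state and is considerably more robust):

```python
class ToyDropper:
    """Toy egress dropper: tracks a flow's running positive-minus-
    negative byte balance and starts sanctioning the flow once it is
    persistently negative beyond a tolerance threshold."""

    def __init__(self, threshold_bytes=-1000):
        self.balance = 0                  # positive minus negative bytes
        self.threshold = threshold_bytes  # tolerance before sanctioning

    def packet(self, size, worth):
        """worth: +1 (positive/cautious), -1 (negative), 0 (neutral or
        cancelled).  Returns True if this packet should be dropped."""
        self.balance += worth * size
        # Sanction only negative and neutral packets, and only while
        # the flow's balance sits below the tolerance threshold.
        return self.balance < self.threshold and worth <= 0

d = ToyDropper()
print([d.packet(500, -1) for _ in range(3)])  # [False, False, True]
```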
814 In this respect, cautious packets work like Handley's state set-up
815 bit [Steps_DoS].

817 Appendix A also gives an example dropper implementation that
818 aggregates flow state.  Dropper algorithms will often maintain a
819 moving average across flows of the fraction of positive packets.
820 When maintaining an average across flows, a dropper SHOULD only allow
821 flows into the average if they start with a cautious packet, but it
822 SHOULD NOT include cautious packets in the positive packet average.
823 A sender sends cautious packets when it does not have the benefit of
824 feedback from the receiver.  So, counting cautious packets would be
825 likely to make the average unnecessarily positive, providing headroom
826 (or should we say footroom?) for dishonest (negative) traffic.

828 If the dropper detects a persistently negative flow, it SHOULD drop
829 sufficient negative and neutral packets to force the flow to not be
830 negative.  Drops SHOULD be focused on just sufficient packets in
831 misbehaving flows to remove the negative bias while doing minimal
832 extra harm.

834 4.4.  Ingress Policing

836 Access operators who wish to limit the congestion that a sender is
837 able to cause can deploy policers at the very first ingress to the
838 internetwork.  Re-ECN has been designed to avoid the need for
839 bottleneck policing so that we can avoid a future where a single rate
840 adaptation policy is embedded throughout the network.  Instead, re-
841 ECN allows the particular rate adaptation policy to be solely agreed
842 bilaterally between the sender and its ingress access provider ([ref
843 other document] discusses possible ways to signal between them),
844 which allows congestion control to be policed, but maintains its
845 evolvability, requiring only a single, local box to be updated.

847 Appendix B gives examples of per-user policing algorithms.  But there
848 is no implication that these algorithms are to be standardised, or
849 that they are ideal.
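To give a flavour of per-user bulk policing, here is a hypothetical token-bucket sketch (not one of the Appendix B algorithms): each customer is granted a contracted rate of congestion volume, the bytes of positive and cautious packets drain the bucket, and the customer's traffic is limited once it empties.

```python
class CongestionPolicer:
    """Toy per-user bulk policer: a token bucket filled at the user's
    contracted congestion-bitrate and drained by the bytes of positive
    (and cautious) packets the user sends, regardless of flows."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed congestion-bytes per second
        self.depth = depth          # burst tolerance in bytes
        self.tokens = depth
        self.now = 0.0

    def allow(self, t, size, positive):
        # Top up tokens since the last packet, capped at the bucket depth.
        self.tokens = min(self.depth,
                          self.tokens + (t - self.now) * self.fill_rate)
        self.now = t
        if not positive:
            return True             # only congestion-causing bytes are policed
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False                # congestion allowance exhausted
```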
The ingress rate policer is the part of the re-
850 ECN incentive framework that is intended to be the most flexible.
851 Once endpoint protocol handlers for re-ECN and egress droppers are in
852 place, operators can choose exactly which congestion response they
853 want to police, and whether they want to do it per user, per flow or
854 not at all.

856 The re-ECN protocol allows these ingress policers to easily perform
857 bulk per-user policing (Appendix B.1).  This is likely to provide
858 sufficient incentive to the user to correctly respond to congestion
859 without needing the policing function to be overly complex.  If an
860 access operator chooses, it could use per-flow policing according to
861 the widely adopted TCP rate adaptation (Appendix B.2) or other
862 alternatives; however, this would introduce extra complexity to the
863 system.

865 If a per-flow rate policer is used, it should use path (not
866 downstream) congestion as the relevant metric, which is represented
867 by the fraction of octets in packets with positive worth (positive
868 and cautious packets) plus cancelled packets.  Of course, re-ECN
869 provides all the information a policer needs directly in the packets
870 being policed.  So, even policing TCP's AIMD algorithm is relatively
871 straightforward (Appendix B.2).

873 Note that we have included cancelled packets in the measure of path
874 congestion.  Cancelled packets arise when the sender sends a positive
875 packet in response to feedback, but then this positive packet just
876 happens to be congestion marked itself.  One would not normally
877 expect many cancelled packets at the first ingress because one would
878 not normally expect much congestion marking to have been necessary
879 that soon in the path.  However, a home network or campus network may
880 well sit between the sending endpoint and the ingress policer, so
881 some congestion may occur upstream of the policer.
And if congestion 882 does occur upstream, some cancelled packets should be visible, and 883 should be taken into account in the measure of path congestion. 885 But a much more important reason for including cancelled packets in 886 the measure of path congestion at an ingress policer is that a sender 887 might otherwise subvert the protocol by sending cancelled packets 888 instead of neutral packets. Like neutral, cancelled packets are 889 worth zero, so the sender knows they won't be counted against any 890 quota it might have been allowed. But unlike neutral packets, 891 cancelled packets are immune to congestion marking, because they have 892 already been congestion marked. So, it is both correct and useful 893 that cancelled packets should be included in a policer's measure of 894 path congestion, as this removes the incentive the sender would 895 otherwise have to mark more packets as cancelled than it should. 897 An ingress policer should also ensure that flows are not already 898 negative when they enter the access network. As with cancelled 899 packets, the presence of negative packets will typically be unusual. 900 Therefore it will be easy to detect negative flows at the ingress by 901 just detecting negative packets then monitoring the flow they belong 902 to. 904 Of course, even if the sender does operate its own network, it may 905 arrange not to congestion mark traffic. Whether the sender does this 906 or not is of no concern to anyone else except the sender. Such a 907 sender will not be policed against its own network's contribution to 908 congestion, but the only resulting problem would be overload in the 909 sender's own network. 911 Finally, we must not forget that an easy way to circumvent re-ECN's 912 defences is for the source to turn off re-ECN support, by setting the 913 Not-RECT codepoint, implying RFC3168 compliant traffic. 
Therefore an 914 ingress policer should put a general rate-limit on Not-RECT traffic, 915 which SHOULD be lax during early, patchy deployment, but will have to 916 become stricter as deployment widens. Similarly, flows starting 917 without a cautious packet can be confined by a strict rate-limit used 918 for the remainder of flows that haven't proved they are well-behaved 919 by starting correctly (therefore they need not consume any flow 920 state---they are just confined to the `misbehaving' bin if they carry 921 an unrecognised flow ID). 923 4.5. Inter-domain Policing 925 One of the main design goals of re-ECN is for border security 926 mechanisms to be as simple as possible, otherwise they will become 927 the pinch-points that limit scalability of the whole internetwork. 928 We want to avoid per-flow processing at borders and to keep to 929 passive mechanisms that can monitor traffic in parallel to 930 forwarding, rather than having to filter traffic inline---in series 931 with forwarding. Such passive, off-line mechanisms are essential for 932 future high-speed all-optical border interconnection where packets 933 cannot be buffered while they are checked for policy compliance. 935 So far, we have been able to keep the border mechanisms simple, 936 despite having had to harden them against some subtle attacks on the 937 re-ECN design. The mechanisms are still passive and avoid per-flow 938 processing. 940 The basic accounting mechanism at each border interface simply 941 involves accumulating the volume of packets with positive worth 942 (positive and cautious packets), and subtracting the volume of those 943 with negative worth (red packets). Even though this mechanism takes 944 no regard of flows, over an accounting period (say a month) this 945 subtraction will account for the downstream congestion caused by all 946 the flows traversing the interface, wherever they come from, and 947 wherever they go to. 
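In outline, the accounting amounts to no more than this (a sketch in the spirit of the Appendix C.1 pseudo-code, not a copy of it):

```python
def border_account(packets):
    """Accumulate the downstream congestion volume crossing a border
    interface over an accounting period, with no per-flow state.
    `packets' yields (size_bytes, worth) pairs, where worth is +1 for
    positive/cautious packets, -1 for negative (red) packets and 0
    otherwise."""
    volume = 0
    for size, worth in packets:
        volume += worth * size  # positive worth adds, negative subtracts
    return volume               # bulk downstream congestion in bytes

# One positive, one neutral, one negative and one small positive packet:
print(border_account([(1500, 1), (1500, 0), (1500, -1), (500, 1)]))  # 500
```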
The two networks can agree to use this metric 948 however they wish to determine some congestion-related penalty 949 against the upstream network. Although the algorithm could hardly be 950 simpler, it is spelled out using pseudo-code in Appendix C.1. 952 Various attempts to subvert the re-ECN design have been made. In all 953 cases their root cause is persistently negative flows. But, after 954 describing these attacks we will show that we don't actually have to 955 get rid of all persistently negative flows in order to thwart the 956 attacks. 958 In honest flows, downstream congestion is measured as positive minus 959 negative volume. So if all flows are honest (i.e. not persistently 960 negative), adding all positive volume and all negative volume without 961 regard to flows will give an aggregate measure of downstream 962 congestion. But such simple aggregation is only possible if no flows 963 are persistently negative. Unless persistently negative flows are 964 completely removed, they will reduce the aggregate measure of 965 congestion. The aggregate may still be positive overall, but not as 966 positive as it would have been had the negative flows been removed. 968 In Section 4.3 we discussed how to sanction traffic to remove, or at 969 least to identify, persistently negative flows. But, even if the 970 sanction for negative traffic is to discard it, unless it is 971 discarded at the exact point it goes negative, it will wrongly 972 subtract from aggregate downstream congestion, at least at any 973 borders it crosses after it has gone negative but before it is 974 discarded. 976 We rely on sanctions to deter dishonest understatement of congestion. 977 But even the ultimate sanction of discard can only be effective if 978 the sender is bothered about the data getting through to its 979 destination. 
A number of attacks have been identified where a sender 980 gains from sending dummy traffic or it can attack someone or 981 something using dummy traffic even though it isn't communicating any 982 information to anyone: 984 o A host can send traffic with no positive packets towards its 985 intended destination, aiming to transmit as much traffic as any 986 dropper will allow [Bauer06]. It may add forward error correction 987 (FEC) to repair as much drop as it experiences. 989 o A host can send dummy traffic into the network with no positive 990 packets and with no intention of communicating with anyone, but 991 merely to cause higher levels of congestion for others who do want 992 to communicate (DoS). So, to ride over the extra congestion, 993 everyone else has to spend more of whatever rights to cause 994 congestion they have been allowed. 996 o A network can simply create its own dummy traffic to congest 997 another network, perhaps causing it to lose business at no cost to 998 the attacking network. This is a form of denial of service 999 perpetrated by one network on another. The preferential drop 1000 measures in [ref other document] provide crude protection against 1001 such attacks, but we are not overly worried about more accurate 1002 prevention measures, because it is already possible for networks 1003 to DoS other networks on the general Internet, but they generally 1004 don't because of the grave consequences of being found out. We 1005 are only concerned if re-ECN increases the motivation for such an 1006 attack, as in the next example. 1008 o A network can just generate negative traffic and send it over its 1009 border with a neighbour to reduce the overall penalties that it 1010 should pay to that neighbour. It could even initialise the TTL so 1011 it expired shortly after entering the neighbouring network, 1012 reducing the chance of detection further downstream. 
This attack 1013 need not be motivated by a desire to deny service and indeed need 1014 not cause denial of service. A network's main motivator would 1015 most likely be to reduce the penalties it pays to a neighbour. 1016 But, the prospect of financial gain might tempt the network into 1017 mounting a DoS attack on the other network as well, given the gain 1018 would offset some of the risk of being detected. 1020 The first step towards a solution to all these problems with negative 1021 flows is to be able to estimate the contribution they make to 1022 downstream congestion at a border and to correct the measure 1023 accordingly. Although ideally we want to remove negative flows 1024 themselves, perhaps surprisingly, the most effective first step is to 1025 cancel out the polluting effect negative flows have on the measure of 1026 downstream congestion at a border. It is more important to get an 1027 unbiased estimate of their effect, than to try to remove them all. A 1028 suggested algorithm to give an unbiased estimate of the contribution 1029 from negative flows to the downstream congestion measure is given in 1030 Appendix C.2. 1032 Although making an accurate assessment of the contribution from 1033 negative flows may not be easy, just the single step of neutralising 1034 their polluting effect on congestion metrics removes all the gains 1035 networks could otherwise make from mounting dummy traffic attacks on 1036 each other. This puts all networks on the same side (only with 1037 respect to negative flows of course), rather than being pitched 1038 against each other. The network where this flow goes negative as 1039 well as all the networks downstream lose out from not being 1040 reimbursed for any congestion this flow causes. So they all have an 1041 interest in getting rid of these negative flows. 
Networks forwarding 1042 a flow before it goes negative aren't strictly on the same side, but 1043 they are disinterested bystanders---they don't care that the flow 1044 goes negative downstream, but at least they can't actively gain from 1045 making it go negative. The problem becomes localised so that once a 1046 flow goes negative, all the networks from where it happens and beyond 1047 downstream each have a small problem, each can detect it has a 1048 problem and each can get rid of the problem if it chooses to. But 1049 negative flows can no longer be used for any new attacks. 1051 Once an unbiased estimate of the effect of negative flows can be 1052 made, the problem reduces to detecting and preferably removing flows 1053 that have gone negative as soon as possible. But importantly, 1054 complete eradication of negative flows is no longer critical---best 1055 endeavours will be sufficient. 1057 For instance, let us consider the case where a source sends traffic 1058 with no positive packets at all, hoping to at least get as much 1059 traffic delivered as network-based droppers will allow. The flow is 1060 likely to go at least slightly negative in the first network on the 1061 path (N1 if we use the example network layout in Figure 2). If all 1062 networks use the algorithm in Appendix C.2 to inflate penalties at 1063 their border with an upstream network, they will remove the effect of 1064 negative flows. So, for instance, N2 will not be paying a penalty to 1065 N1 for this flow. Further, because the flow contributes no positive 1066 packets at all, a dropper at the egress will completely remove it. 1068 The remaining problem is that every network is carrying a flow that 1069 is causing congestion to others but not being held to account for the 1070 congestion it is causing. 
Whenever the fail-safe border algorithm 1071 (Section 4.6) or the border algorithm to compensate for negative 1072 flows (Appendix C.2) detects a negative flow, it can instantiate a 1073 focused dropper for that flow locally. It may be some time before 1074 the flow is detected, but the more strongly negative the flow is, the 1075 more quickly it will be detected by the fail-safe algorithm. But, in 1076 the meantime, it will not be distorting border incentives. Until it 1077 is detected, if it contributes to drop anywhere, its packets will 1078 tend to be dropped before others if queues use the preferential drop 1079 rules in [ref other document], which discriminate against non- 1080 positive packets. All networks below the point where a flow goes 1081 negative (N1, N2 and N3 in this case) have an incentive to remove 1082 this flow, but the queue where it first goes negative (in N1) can of 1083 course remove the problem for everyone downstream. 1085 In the case of DDoS attacks, Section 5.1 describes how re-ECN 1086 mitigates their force. 1088 4.6. Inter-domain Fail-safes 1090 The mechanisms described so far create incentives for rational 1091 network operators to behave. That is, one operator aims to make 1092 another behave responsibly by applying penalties and expects a 1093 rational response (i.e. one that trades off costs against benefits). 1094 It is usually reasonable to assume that other network operators will 1095 behave rationally (policy routing can avoid those that might not). 1096 But this approach does not protect against the misconfigurations and 1097 accidents of other operators. 1099 Therefore, we propose the following two mechanisms at a network's 1100 borders to provide "defence in depth". Both are similar: 1102 Highly positive flows: A small sample of positive packets should be 1103 picked randomly as they cross a border interface. Then subsequent 1104 packets matching the same source and destination address and DSCP 1105 should be monitored. 
If the fraction of positive packets is well
1106 above a threshold (to be determined by operational practice), a
1107 management alarm SHOULD be raised, and the flow MAY be
1108 automatically subject to focused drop.

1110 Persistently negative flows:  A small sample of congestion marked
1111 (red) packets should be picked randomly as they cross a border
1112 interface.  Then subsequent packets matching the same source and
1113 destination address and DSCP should be monitored.  If the balance
1114 of positive packets minus negative packets (measured in bytes) is
1115 persistently negative, a management alarm SHOULD be raised, and
1116 the flow MAY be automatically subject to focused drop.

1118 Both these mechanisms rely on the fact that highly positive (or
1119 negative) flows will appear more quickly in the sample by selecting
1120 randomly solely from positive (or negative) packets.

1122 4.7.  The Case against Classic Feedback

1124 A system that produces an optimal outcome as a result of everyone's
1125 selfish actions is extremely powerful, especially one that enables
1126 evolvability of congestion control.  But why do we have to change to
1127 re-ECN to achieve it?  Can't classic congestion feedback (as used
1128 already by standard ECN) be arranged to provide similar incentives
1129 and similar evolvability?  Superficially it can.  Kelly's seminal
1130 work showed how we can allow everyone the freedom to evolve whatever
1131 congestion control behaviour is in their application's best interest
1132 but still optimise the whole system of networks and users by placing
1133 a price on congestion to ensure responsible use of this
1134 freedom [Evol_cc].  Kelly used ECN with its classic congestion
1135 feedback model as the mechanism to convey congestion price
1136 information.  The mechanism could be thought of as volume charging,
1137 except that only the volume of packets marked with congestion
1138 experienced (CE) was counted.
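Such Kelly-style volume charging can be sketched in a couple of lines (hypothetical price; the point is that only CE-marked volume is counted, and it only emerges at the receiving side of the feedback loop):

```python
def congestion_charge(packets, price_per_byte):
    """Classic-feedback congestion charging: count only the bytes of
    packets carrying a Congestion Experienced (CE) mark.  Whole-path
    congestion only materialises at the destination, so this charge
    falls on the receiver, who must somehow refer it to the sender."""
    ce_volume = sum(size for size, ce_marked in packets if ce_marked)
    return ce_volume * price_per_byte

# 2000 CE-marked bytes at 2 price units per byte:
print(congestion_charge([(1500, True), (1500, False), (500, True)], 2))
```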
1140 However, below we explain why relying on classic feedback /required/ 1141 congestion charging to be used, while re-ECN achieves the same 1142 powerful outcome (given it is built on Kelly's foundations), but does 1143 not /require/ congestion charging. In brief, the problem with 1144 classic feedback is that the incentives have to trace the indirect 1145 path back to the sender---the long way round the feedback loop. For 1146 example, if classic feedback were used in Figure 2, N2 would have had 1147 to influence N1 via all of N3, R & S rather than directly. 1149 Inability to agree what is happening downstream: In order to police 1150 its upstream neighbour's congestion response, the neighbours 1151 should be able to agree on the congestion to be responded to. 1152 Whatever the feedback regime, as packets change hands at each 1153 trust boundary, any path metrics they carry are verifiable by both 1154 neighbours. But, with a classic path metric, they can only agree 1155 on the /upstream/ path congestion. 1157 Inaccessible back-channel: The network needs a whole-path congestion 1158 metric if it wants to control the source. Classically, whole path 1159 congestion emerges at the destination, to be fed back from 1160 receiver to sender in a back-channel. But, in any data network, 1161 back-channels need not be visible to relays, as they are 1162 essentially communications between the end-points. They may be 1163 encrypted, asymmetrically routed or simply omitted, so no network 1164 element can reliably intercept them. The congestion charging 1165 literature solves this problem by charging the receiver and 1166 assuming this will cause the receiver to refer the charges to the 1167 sender. But, of course, this creates unintended side-effects... 
1169 `Receiver pays' unacceptable: In connectionless datagram networks, 1170 receivers and receiving networks cannot prevent reception from 1171 malicious senders, so `receiver pays' opens them to `denial of 1172 funds' attacks. 1174 End-user congestion charging unacceptable in many societies: Even if 1175 'denial of funds' were not a problem, we know that end-users are 1176 highly averse to the unpredictability of congestion charging and 1177 anyway, we want to avoid restricting network operators to just one 1178 retail tariff. But with classic feedback only an upstream metric 1179 is available, so we cannot avoid having to wrap the `receiver 1180 pays' money flow around the feedback loop, necessarily forcing 1181 end-users to be subjected to congestion charging. 1183 To summarise so far, with classic feedback, policing congestion 1184 response without losing evolvability /requires/ congestion charging 1185 of end-users and a `receiver pays' model, whereas, with re-ECN, it is 1186 still possible to influence incentives using congestion charging but 1187 using the safer `sender pays' model. However, congestion charging is 1188 only likely to be appropriate between domains. So, without losing 1189 evolvability, re-ECN enables technical policing mechanisms that are 1190 more appropriate for end users than congestion pricing. 1192 4.8. Simulations 1194 Simulations of policer and dropper performance done for the multi-bit 1195 version of re-feedback have been included in section 5 "Dropper 1196 Performance" of [Re-fb]. Simulations of policer and dropper for the 1197 re-ECN version described in this document are work in progress. 1199 5. Other Applications of Re-ECN 1201 5.1. DDoS Mitigation 1203 A flooding attack is inherently about congestion of a resource. 1204 Because re-ECN ensures the sources causing network congestion 1205 experience the cost of their own actions, it acts as a first line of 1206 defence against DDoS. 
As load focuses on a victim, upstream queues 1207 grow, requiring honest sources to pre-load packets with a higher 1208 fraction of positive packets. Once downstream queues are so 1209 congested that they are dropping traffic, they will be marking 100% 1210 of the traffic they do forward to negative. Honest sources will 1211 therefore be sending 100% positive packets (and will therefore be 1212 severely rate-limited at the ingress). 1214 Senders under malicious control can either do the same as honest 1215 sources, and be rate-limited at ingress, or they can understate 1216 congestion by sending more neutral RECT packets than they should. If 1217 sources understate congestion (i.e. do not re-echo sufficient 1218 positive packets) and the preferential drop ranking is implemented on 1219 queues ([ref other document]), these queues will preserve positive 1220 traffic until last. So, the neutral traffic from malicious sources 1221 will all be automatically dropped first. Either way, the malicious 1222 sources cannot send more than honest sources. 1224 Further, hosts under malicious control will tend to be re-used for 1225 many different attacks. They will therefore build up a long term 1226 history of causing congestion. As long as the population 1227 of potentially compromisable hosts around the Internet is limited, 1228 the per-user policing algorithms in Appendix B.1 will gradually 1229 throttle down zombies and other launchpads for attacks. Therefore, 1230 widespread deployment of re-ECN could considerably dampen the force 1231 of DDoS. Admittedly, zombie armies could hold their fire for long 1232 enough to build up enough credit in the per-user policers 1233 to launch an attack. But they would then still be limited to no more 1234 throughput than other, honest users.
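The preferential drop ranking described above might be sketched as follows. The marking labels and queue structure are hypothetical simplifications of the codepoint encoding defined in [Re-TCP]; this is a sketch of the principle, not an implementation.

```python
# Sketch of preferential drop under overload: when a queue must shed
# traffic, neutral (RECT) packets are dropped before positive
# (re-echoed) packets, so sources that understate congestion lose
# their traffic first. Marking labels are illustrative only.

DROP_ORDER = {"neutral": 0, "positive": 1}  # lower rank dropped first

def enqueue_with_preferential_drop(queue, packet, capacity):
    """Add a packet to the queue; if over capacity, drop the
    lowest-ranked (most droppable) packet currently queued."""
    queue.append(packet)
    if len(queue) > capacity:
        victim = min(queue, key=lambda p: DROP_ORDER[p["marking"]])
        queue.remove(victim)
    return queue

q = []
for marking in ["positive", "neutral", "positive", "neutral", "positive"]:
    enqueue_with_preferential_drop(q, {"marking": marking}, capacity=3)

# Under overload, only the positive packets survive.
assert all(p["marking"] == "positive" for p in q)
```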
1236 Inter-domain traffic policing (see Section 4.5) ensures that any 1237 network that harbours compromised `zombie' hosts will have to bear 1238 the cost of the congestion caused in downstream networks by traffic 1239 from those zombies. Such networks will be incentivised to deploy 1240 per-user policers that rate-limit hosts that are unresponsive to 1241 congestion so they can only send very slowly into congested paths. 1242 As well as protecting other networks, the extremely poor performance 1243 at any sign of congestion will incentivise the zombie's owner to 1244 clean it up. However, the host should behave normally when using 1245 uncongested paths. 1247 Uniquely, re-ECN handles DDoS traffic without relying on the validity 1248 of identifiers in packets. Certainly the egress dropper relies on 1249 uniqueness of flow identifiers, but not their validity. So if a 1250 source spoofs another address, re-ECN works just as well, as long as 1251 the attacker cannot imitate all the flow identifiers of another 1252 active flow passing through the same dropper (see Section 6). 1253 Similarly, the ingress policer relies on uniqueness of flow IDs, not 1254 their validity. A new flow will only be allowed any rate at 1255 all if it starts with a cautious packet, and the more cautious 1256 packets there are starting new flows, the more they will be limited. 1257 Essentially a re-ECN policer limits the bulk of all congestion 1258 entering the network through a physical interface; limiting the 1259 congestion caused by each flow is merely an optional extra. 1261 5.2. End-to-end QoS 1263 {ToDo: (Section 3.3.2 of [Re-fb] entitled `Edge QoS' gives an outline 1264 of the text that will be added here).} 1266 5.3. Traffic Engineering 1268 Classic feedback makes congestion-based traffic engineering 1269 inefficient too. Network N3 can see which of its alternative 1270 upstream networks is less congested. But it is N1 that 1271 makes the routing decision.
This is why current traffic engineering 1272 requires a continuous message stream from congestion monitors to the 1273 routing controller. And even then the monitors can only be trusted 1274 for /intra-/domain traffic engineering. The trustworthiness of re- 1275 ECN enables /inter-/domain traffic engineering without messaging 1276 overhead. {ToDo: Elaborate} 1278 5.4. Inter-Provider Service Monitoring 1280 {ToDo: } 1282 6. Limitations 1284 {ToDo: See also: slide of limitations} 1286 The known limitations of the re-ECN approach are: 1288 o We still cannot defend against the attack described in Section 10 1289 where a malicious source sends negative traffic through the same 1290 egress dropper as another flow and imitates its flow identifiers, 1291 allowing it to cause an innocent flow to 1292 experience heavy drop. 1294 o Re-feedback for TTL (re-TTL) would also be desirable at the same 1295 time as re-ECN. Unfortunately this requires a further standards 1296 action for the mechanisms briefly described in Appendix D. 1298 o Traffic must be ECN-capable for re-ECN to be effective. The only 1299 defence against malicious users who turn off ECN capability is that 1300 networks are expected to rate limit Not-ECT traffic and to apply 1301 higher drop preference to it during congestion. Although these 1302 are blunt instruments, they at least represent a feasible scenario 1303 for the future Internet where Not-ECT traffic co-exists with re- 1304 ECN traffic, but as a severely hobbled under-class. We recommend 1305 (Section 7.1) that while accommodating a smooth initial transition 1306 to re-ECN, policing policies should gradually be tightened to rate 1307 limit Not-ECT traffic more strictly in the longer term. 1309 o When checking whether a flow is balancing positive packets with 1310 negative packets (measured in bytes), re-ECN can only account for 1311 congestion marking, not drops.
So, whenever a sender experiences 1312 drop, it does not have to re-echo the congestion event by sending 1313 positive packet(s). Nonetheless, it is hardly any advantage to be 1314 able to send faster than other flows only when your traffic is 1315 being dropped and the other traffic isn't. 1317 o We are considering whether it would be useful to 1318 truncate rather than drop packets that appear to be malicious, so 1319 that the feedback loop is not broken but useful data can be 1320 removed. 1322 {ToDo: Monopolies over Routes} 1324 7. Incremental Deployment 1326 7.1. Incremental Deployment Features 1328 The design of the re-ECN protocol started from the fact that the 1329 current ECN marking behaviour of queues was sufficient and that re- 1330 feedback could be introduced around these queues by changing the 1331 sender behaviour but not the routers. Otherwise, if we had required 1332 routers to be changed, the chance of encountering a path that had 1333 every router upgraded would be vanishingly small during early 1334 deployment, giving no incentive to start deployment. Also, as there 1335 is no new forwarding behaviour, routers and hosts do not have to 1336 signal or negotiate anything. 1338 However, networks that choose to protect themselves using re-ECN do 1339 have to add new security functions at their trust boundaries with 1340 others. They distinguish legacy traffic by its ECN field. Traffic 1341 from Not-ECT transports is distinguishable by its Not-ECT marking. 1342 Traffic from RFC3168 compliant ECN transports is distinguished from 1343 re-ECN by which of ECT(0) or ECT(1) is used. We chose to use ECT(1) 1344 for re-ECN traffic deliberately. Existing ECN sources set ECT(0) on 1345 either 50% (the nonce) or 100% (the default) of packets, whereas re- 1346 ECN does not use ECT(0) at all.
We can use this distinguishing 1347 feature of RFC3168 compliant ECN traffic to separate it out for 1348 different treatment at the various border security functions: egress 1349 dropping, ingress policing and border policing. 1351 The general principle we adopt is that an egress dropper will not 1352 drop any legacy traffic, but ingress and border policers will limit 1353 the bulk rate of legacy traffic (Not-ECT, ECT(0) and packets marked 1354 with the unused codepoint as defined in [Re-TCP]) that can enter each 1355 network. Then, during early re-ECN deployment, operators can set 1356 very permissive (or non-existent) rate-limits on legacy traffic, but 1357 once re-ECN implementations are generally available, legacy traffic 1358 can be rate-limited increasingly harshly. Ultimately, an operator 1359 might choose to block all legacy traffic entering its network, or at 1360 least only allow through a trickle. 1362 The more strictly the limits are set, the more RFC3168 ECN 1363 sources will gain by upgrading to re-ECN. Thus, towards the end of 1364 the voluntary incremental deployment period, RFC3168 compliant 1365 transports can be given progressively stronger encouragement to 1366 upgrade. 1368 7.2. Incremental Deployment Incentives 1370 It would only be worth standardising the re-ECN protocol if there 1371 existed a coherent story for how it might be incrementally deployed. 1372 In order for it to have a chance of deployment, everyone who needs to 1373 act must have a strong incentive to act, and the incentives must 1374 arise in the order that deployment would have to happen. Re-ECN 1375 works around unmodified ECN routers, but we can't just discuss why 1376 and how re-ECN deployment might build on ECN deployment, because 1377 there is precious little to build on in the first place. Instead, we 1378 aim to show that re-ECN deployment could carry ECN with it.
We focus 1379 on commercial deployment incentives, although some of the arguments 1380 apply equally to academic or government sectors. 1382 ECN deployment: 1384 ECN is largely implemented in commercial routers, but generally 1385 not as a supported feature, and it has largely not been deployed 1386 by commercial network operators. ECN has been implemented in most 1387 Unix-based operating systems for some time. Microsoft first 1388 implemented ECN in Windows Vista, but it is only on by default for 1389 the server end of a TCP connection. Unfortunately the client end 1390 had to be turned off by default, because a non-zero ECN field 1391 triggers a bug in a legacy home gateway which makes it crash. For 1392 detailed deployment status, see [ECN-Deploy]. We believe the 1393 reason ECN deployment has not happened is twofold: 1395 * ECN requires changes to both routers and hosts. If someone 1396 wanted to sell the improvement that ECN offers, they would have 1397 to co-ordinate deployment of their product with others. An ECN 1398 server only gives any improvement on an ECN network. An ECN 1399 network only gives any improvement if used by ECN devices. 1400 Deployment that requires co-ordination adds cost and delay and 1401 tends to dilute any competitive advantage that might be gained. 1403 * ECN `only' gives a performance improvement. Making a product a 1404 bit faster (whether the product is a device or a network), 1405 isn't usually a sufficient selling point to be worth the cost 1406 of co-ordinating across the industry to deploy it. Network 1407 operators tend to avoid re-configuring a working network unless 1408 launching a new product. 1410 ECN and Re-ECN for Edge-to-edge Assured QoS: 1412 We believe the proposal to provide assured QoS sessions using a 1413 form of ECN called pre-congestion notification (PCN) [RFC5559] is 1414 most likely to break the deadlock in ECN deployment first. 
It 1415 only requires edge-to-edge deployment so it does not require 1416 endpoint support. It can be deployed in a single network, then 1417 grow incrementally to interconnected networks. And it provides a 1418 different `product' (internetworked assured QoS), rather than 1419 merely making an existing product a bit faster. 1421 Not only could this assured QoS application kick-start ECN 1422 deployment, it could also carry re-ECN deployment with it, because 1423 re-ECN can enable the assured QoS region to expand to a large 1424 internetwork where neighbouring networks do not trust each other. 1425 [Re-PCN] argues that re-ECN security should be built in to the QoS 1426 system from the start, explaining why and how. 1428 If ECN and re-ECN were deployed edge-to-edge for assured QoS, 1429 operators would gain valuable experience. They would also clear 1430 away many technical obstacles such as firewall configurations that 1431 block all but the RFC3168 settings of the ECN field and the RE 1432 flag. 1434 ECN in Access Networks: 1436 The next obstacle to ECN deployment would be extension to access 1437 and backhaul networks, where considerable link layer differences 1438 make implementation non-trivial, particularly on congested 1439 wireless links. ECN and re-ECN work fine during partial 1440 deployment, but they will not be very useful if the most congested 1441 elements in networks are the last to support them. Access network 1442 support is one of the weakest parts of this deployment story. All 1443 we can hope is that, once the benefits of ECN are better 1444 understood by operators, they will push for the necessary link 1445 layer implementations as deployment proceeds. 1447 Policing Unresponsive Flows: 1449 Re-ECN allows a network to offer differentiated quality of service 1450 as explained in Section 5.2. But we do not believe this will 1451 motivate initial deployment of re-ECN, because the industry is 1452 already set on alternative ways of doing QoS.
Despite being much 1453 more complicated and expensive, the alternative approaches are 1454 here and now. 1456 But re-ECN is critical to QoS deployment in another respect. It 1457 can be used to prevent applications from taking whatever bandwidth 1458 they choose without asking. 1460 Currently, applications that remain resolute in their lack of 1461 response to congestion are rewarded by other TCP applications. In 1462 other words, TCP is naively friendly, in that it reduces its rate 1463 in response to congestion whether it is competing with friends 1464 (other TCPs) or with enemies (unresponsive applications). 1466 Therefore, those network owners that want to sell QoS will be keen 1467 to ensure that their users can't help themselves to QoS for free. 1468 Given the very large revenues at stake, we believe effective 1469 policing of congestion response will become highly sought after by 1470 network owners. 1472 But this does not necessarily argue for re-ECN deployment. 1473 Network owners might choose to deploy bottleneck policers rather 1474 than re-ECN-based policing. However, under Related Work 1475 (Section 9) we argue that bottleneck policers are inherently 1476 vulnerable to circumvention. 1478 Therefore we believe there will be a strong demand from network 1479 owners for re-ECN deployment so they can police flows that do not 1480 ask to be unresponsive to congestion, in order to protect their 1481 revenues from flows that do ask (QoS). In particular, we suspect 1482 that the operators of cellular networks will want to prevent VoIP 1483 and video applications being used freely on their networks as a 1484 more open market develops in GPRS and 3G devices. 1486 Initial deployments are likely to be isolated to single cellular 1487 networks. Cellular operators would first place requirements on 1488 device manufacturers to include re-ECN in the standards for mobile 1489 devices. In parallel, they would put out tenders for ingress and 1490 egress policers. 
Then, after a while, they would start to tighten 1491 rate limits on Not-ECT traffic from non-standard devices and they 1492 would start policing whatever non-accredited applications people 1493 might install on mobile devices with re-ECN support in the 1494 operating system. This would force even independent mobile device 1495 manufacturers to provide re-ECN support. Early standardisation 1496 across the cellular operators is likely, including interconnection 1497 agreements with penalties for excess downstream congestion. 1499 We suspect some fixed broadband networks (whether cable or DSL) 1500 would follow a similar path. However, we also believe that larger 1501 parts of the fixed Internet would not choose to police on a per- 1502 flow basis. Some might choose to police congestion on a per-user 1503 basis in order to manage heavy peer-to-peer file-sharing, but it 1504 seems likely that a sizeable majority would not deploy any form of 1505 policing. 1507 This hybrid situation begs the question, "How does re-ECN work for 1508 networks that choose to use policing if they connect with others 1509 that don't?" Traffic from non-ECN capable sources will arrive 1510 from other networks and cause congestion within the policed, ECN- 1511 capable networks. So networks that chose to police congestion 1512 would rate-limit Not-ECT traffic throughout their network, 1513 particularly at their borders. They would probably also set 1514 higher usage prices in their interconnection contracts for 1515 incoming Not-ECT and Not-RECT traffic. We assume that 1516 interconnection contracts between networks in the same tier will 1517 include congestion penalties before contracts with provider 1518 backbones do. 1520 A hybrid situation could remain for all time. As was explained in 1521 the introduction, we believe in healthy competition between 1522 policing and not policing, with no imperative to convert the whole 1523 world to the religion of policing.
Networks that chose not to 1524 deploy egress droppers would leave themselves open to being 1525 congested by senders in other networks. But that would be their 1526 choice. 1528 The important aspect of the egress dropper, though, is that it 1529 chiefly protects the network that deploys it. If a network does not 1530 deploy an egress dropper, sources sending into it from other 1531 networks will be able to understate the congestion they are 1532 causing. Whereas, if a network deploys an egress dropper, it can 1533 know how much congestion other networks are dumping into it, and 1534 apply penalties or charges accordingly. So, whether or not a 1535 network polices its own sources at ingress, it is in its interests 1536 to deploy an egress dropper. 1538 Host support: 1540 In the above deployment scenario, host operating system support 1541 for re-ECN came about through the cellular operators demanding it 1542 in device standards (i.e. 3GPP). Of course, increasingly, mobile 1543 devices are being built to support multiple wireless technologies. 1544 So, if re-ECN were stipulated for cellular devices, it would 1545 automatically appear in those devices connected to the wireless 1546 fringes of fixed networks if they coupled cellular with WiFi or 1547 Bluetooth technology, for instance. Also, once implemented in the 1548 operating system of one mobile device, it would tend to be found 1549 in other devices using the same family of operating system. 1551 Therefore, whether or not a fixed network deployed ECN, or 1552 deployed re-ECN policers and droppers, many of its hosts might 1553 well be using re-ECN over it. Indeed, they would be at an 1554 advantage when communicating with hosts across re-ECN policed 1555 networks that rate limited Not-RECT traffic. 1557 Other possible scenarios: 1559 The above is thankfully not the only plausible scenario we can 1560 think of.
One of the many clubs of operators that meet regularly 1561 around the world might decide to act together to persuade a major 1562 operating system manufacturer to implement re-ECN. And they may 1563 agree between them on an interconnection model that includes 1564 congestion penalties. 1566 Re-ECN provides an interesting opportunity for device 1567 manufacturers as well as network operators. Policers can be 1568 configured loosely when first deployed. Then as re-ECN take-up 1569 increases, they can be tightened up, so that a network with re-ECN 1570 deployed can gradually squeeze down the service provided to 1571 RFC3168 compliant devices that have not upgraded to re-ECN. Many 1572 device vendors rely on replacement sales. And operating system 1573 companies rely heavily on new release sales. Also support 1574 services would like to be able to force stragglers to upgrade. 1575 So, the ability to throttle service to RFC3168 compliant operating 1576 systems is quite valuable. 1578 Also, policing unresponsive sources may not be the only or even 1579 the first application that drives deployment. It may be policing 1580 causes of heavy congestion (e.g. peer-to-peer file-sharing). Or 1581 it may be mitigation of denial of service. Or we may be wrong in 1582 thinking simpler QoS will not be the initial motivation for re-ECN 1583 deployment. Indeed, the combined pressure for all these may be 1584 the motivator, but it seems optimistic to expect such a level of 1585 joined-up thinking from today's communications industry. We 1586 believe a single application alone must be a sufficient motivator. 1588 In short, everyone gains from adding accountability to TCP/IP, 1589 except the selfish or malicious. So, deployment incentives tend 1590 to be strong. 1592 8. Architectural Rationale 1594 In the Internet's technical community, the danger of not responding 1595 to congestion is well-understood, as well as its attendant risk of 1596 congestion collapse [RFC3714]. 
However, one side of the Internet's 1597 commercial community considers that the very essence of IP is to 1598 provide open access to the internetwork for all applications. They 1599 see congestion as a symptom of over-conservative investment, and rely 1600 on revising application designs to find novel ways to keep 1601 applications working despite congestion. They argue that the 1602 Internet was never intended to be solely for TCP-friendly 1603 applications. Meanwhile, another side of the Internet's commercial 1604 community believes that it is worthwhile providing a network for 1605 novel applications only if it has sufficient capacity, which can 1606 happen only if a greater share of application revenues can be 1607 /assured/ for the infrastructure provider. Otherwise the major 1608 investments required would carry too much risk and wouldn't happen. 1610 The lesson articulated in [Tussle] is that we shouldn't embed our 1611 view on these arguments into the Internet at design time. Instead we 1612 should design the Internet so that the outcome of these arguments can 1613 get decided at run-time. Re-ECN is designed in that spirit. Once 1614 the protocol is available, different network operators can choose how 1615 liberal they want to be in holding people accountable for the 1616 congestion they cause. Some might boldly invest in capacity and not 1617 police its use at all, hoping that novel applications will result. 1618 Others might use re-ECN for fine-grained flow policing, expecting to 1619 make money selling vertically integrated services. Yet others might 1620 sit somewhere half-way, perhaps doing coarse, per-user policing. All 1621 might change their minds later. But re-ECN always allows them to 1622 interconnect so that the careful ones can protect themselves from the 1623 liberal ones. 
1625 The incentive-based approach used for re-ECN is based on Gibbens and 1626 Kelly's arguments [Evol_cc] on allowing endpoints the freedom to 1627 evolve new congestion control algorithms for new applications. They 1628 ensured responsible behaviour despite everyone's self-interest by 1629 applying pricing to ECN marking, and Kelly had proved stability and 1630 optimality in an earlier paper. 1632 Re-ECN keeps all the underlying economic incentives, but rearranges 1633 the feedback. The idea is to allow a network operator (if it 1634 chooses) to deploy engineering mechanisms like policers at the front 1635 of the network which can be designed to behave /as if/ they are 1636 responding to congestion prices. Rather than having to subject users 1637 to congestion pricing, networks can then use more traditional 1638 charging regimes (or novel ones). But the engineering can constrain 1639 the overall amount of congestion a user can cause. This provides a 1640 buffer against completely outrageous congestion control, but still 1641 makes it easy for novel applications to evolve if they need different 1642 congestion control to the norms. It also allows novel charging 1643 regimes to evolve. 1645 Despite being achieved with a relatively minor protocol change, re- 1646 ECN is an architectural change. Previously, Internet congestion 1647 could only be controlled by the data sender, because it was the only 1648 one both in a position to control the load and in a position to see 1649 information on congestion. Re-ECN levels the playing field. It 1650 recognises that the network also has a role to play in moderating 1651 (policing) congestion control. But policing is only truly effective 1652 at the first ingress into an internetwork, whereas path congestion 1653 was previously only visible at the last egress. So, re-ECN 1654 democratises congestion information. 
Then the choice over who 1655 actually controls congestion can be made at run-time, not design 1656 time---a bit like an aircraft with dual controls. And different 1657 operators can make different choices. We believe non-architectural 1658 approaches to this problem are unlikely to offer more than partial 1659 solutions (see Section 9). 1661 Importantly, re-ECN does not require assumptions about specific 1662 congestion responses to be embedded in any network elements, except 1663 at the first ingress to the internetwork if that level of control is 1664 desired by the ingress operator. But such tight policing will be a 1665 matter of agreement between the source and its access network 1666 operator. The ingress operator need not police congestion response 1667 at flow granularity; it can simply hold a source responsible for the 1668 aggregate congestion it causes, perhaps keeping it within a monthly 1669 congestion quota. Or if the ingress network trusts the source, it 1670 can do nothing. 1672 Therefore, the aim of the re-ECN protocol is NOT solely to police 1673 TCP-friendliness. Re-ECN preserves IP as a generic network layer for 1674 all sorts of responses to congestion, for all sorts of transports. 1675 Re-ECN merely ensures truthful downstream congestion information is 1676 available in the network layer for all sorts of accountability 1677 applications. 1679 The end to end design principle does not say that all functions 1680 should be moved out of the lower layers---only those functions that 1681 are not generic to all higher layers. Re-ECN adds a function to the 1682 network layer that is generic, but was omitted: accountability for 1683 causing congestion. Accountability is not something that an end-user 1684 can provide to themselves. We believe re-ECN adds no more than is 1685 sufficient to hold each flow accountable, even if it consists of a 1686 single datagram. 
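The aggregate, flow-agnostic ingress policing described above, which simply holds a source to a congestion quota, might be sketched like this. The class name, quota size and byte-level accounting are illustrative assumptions for the sketch, not part of the protocol.

```python
# Sketch of flow-agnostic ingress policing: a source is held
# responsible only for the aggregate congestion it causes (its
# positive-marked bytes), drawn against a congestion quota.
# The quota size and refill policy (e.g. monthly) are hypothetical.

class CongestionQuotaPolicer:
    def __init__(self, quota_bytes):
        self.remaining = quota_bytes  # e.g. refilled each month

    def admit(self, size, positive):
        """Decide whether to forward a packet. Positive
        (congestion-declaring) packets consume quota; once the quota
        is exhausted they are rate-limited (here, simply refused)."""
        if not positive:
            return True              # congestion-free traffic is unlimited
        if self.remaining >= size:
            self.remaining -= size
            return True
        return False

policer = CongestionQuotaPolicer(quota_bytes=3000)
assert policer.admit(1500, positive=True)      # first marked packet: ok
assert policer.admit(1500, positive=True)      # quota now exhausted
assert not policer.admit(1500, positive=True)  # over quota: limited
assert policer.admit(1500, positive=False)     # unmarked traffic unaffected
```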
1688 "Accountability" implies being able to identify who is responsible 1689 for causing congestion. However, at the network layer it would NOT 1690 be useful to identify the cause of congestion by adding individual or 1691 organisational identity information, NOR by using source IP 1692 addresses. Rather than bringing identity information to the point of 1693 congestion, we bring downstream congestion information to the point 1694 where the cause can be most easily identified and dealt with. That 1695 is, at any trust boundary congestion can be associated with the 1696 physically connected upstream neighbour that is directly responsible 1697 for causing it (whether intentionally or not). A trust boundary 1698 interface is exactly the place to police or throttle in order to 1699 directly mitigate congestion, rather than having to trace the 1700 (ir)responsible party in order to shut them down. 1702 Some considered that ECN itself was a layering violation. The 1703 reasoning went that the interface to a layer should provide a service 1704 to the higher layer and hide how the lower layer does it. However, 1705 ECN reveals the state of the network layer and below to the transport 1706 layer. A more positive way to describe ECN is that it is like the 1707 return value of a function call to the network layer. It explicitly 1708 returns the status of the request to deliver a packet, by returning a 1709 value representing the current risk that a packet will not be served. 1710 Re-ECN has similar semantics, except the transport layer must try to 1711 guess the return value, then it can use the actual return value from 1712 the network layer to modify the next guess. 1714 The guiding principle behind all the discussion in Section 4.5 on 1715 Policing is that any gain from subverting the protocol should be 1716 precisely neutralised, rather than punished. 
If a gain is punished 1717 to a greater extent than is sufficient to neutralise it, it will most 1718 likely open up a new vulnerability, where the amplifying effect of 1719 the punishment mechanism can be turned on others. 1721 For instance, if possible, flows should be removed as soon as they go 1722 negative, but we do NOT RECOMMEND any attempts to discard such flows 1723 further upstream while they are still positive. Such over-zealous 1724 push-back is unnecessary and potentially dangerous. These flows have 1725 paid their `fare' up to the point they go negative, so there is no 1726 harm in delivering them that far. If someone downstream asks for a 1727 flow to be dropped as near to the source as possible, because they 1728 say it is going to become negative later, an upstream node cannot 1729 test the truth of this assertion. Rather than have to authenticate 1730 such messages, re-ECN has been designed so that flows can be dropped 1731 solely based on locally measurable evidence. A message hinting that 1732 a flow should be watched closely to test for negativity is fine. But 1733 not a message that claims that a positive flow will go negative 1734 later, so it should be dropped. 1736 9. Related Work 1738 {Due to lack of time, this section is incomplete. The reader is 1739 referred to the Related Work section of [Re-fb] for a brief selection 1740 of related ideas.} 1742 9.1. Policing Rate Response to Congestion 1744 ATM network elements send congestion back-pressure 1745 messages [ITU-T.I.371] along each connection, duplicating any end to 1746 end feedback because they don't trust it. On the other hand, re-ECN 1747 ensures information in forwarded packets can be used for congestion 1748 management without requiring a connection-oriented architecture, 1749 re-using the overhead of fields that are already set aside for end to 1750 end congestion control (and routing loop detection in the case of re- 1751 TTL in Appendix D).
1753 We borrowed ideas from policers in the literature ([pBox], [XCHOKe], 1754 AFD, etc.) for our rate equation policer. However, without the benefit 1755 of re-ECN they don't police the correct rate for the condition of 1756 their path. They detect unusually high /absolute/ rates, but only 1757 while the policer itself is congested, because they work by detecting 1758 prevalent flows in the discards from the local RED queue. These 1759 policers must sit at every potential bottleneck, whereas our policer 1760 need only be located at each ingress to the internetwork. As Floyd & 1761 Fall explain [pBox], the limitation of their approach is that a high 1762 sending rate might be perfectly legitimate, if the rest of the path 1763 is uncongested or the round trip time is short. Commercially 1764 available rate policers cap the rate of any one flow, or they 1765 enforce monthly volume caps in an attempt to control high volume 1766 file-sharing. They limit the value a customer derives. They might 1767 also limit the congestion customers can cause, but only as an 1768 accidental side-effect. They actually punish traffic that fills 1769 troughs as much as traffic that causes peaks in utilisation. In 1770 practice network operators need to be able to allocate service by 1771 cost during congestion, and by value at other times.
For each packet it sends, the sender 1785 chooses between the two ECT codepoints in a pseudo-random sequence. 1786 Then, whenever the network marks a packet with CE, if the receiver 1787 wants to deny congestion happened, she has to guess which ECT 1788 codepoint was overwritten. She has only a 50:50 chance of being 1789 correct each time she denies a congestion mark or a drop, which 1790 ultimately will give her away. 1792 The purpose of a network-layer nonce should primarily be protection 1793 of the network, while a transport-layer nonce would be better used to 1794 protect the sender from cheating receivers. Now, the assumption 1795 behind the ECN nonce is that a sender will want to detect whether a 1796 receiver is suppressing congestion feedback. This is only true if 1797 the sender's interests are aligned with the network's, or with the 1798 community of users as a whole. This may be true for certain large 1799 senders, who are under close scrutiny and have a reputation to 1800 maintain. But we have to deal with a more hostile world, where 1801 traffic may be dominated by peer-to-peer transfers, rather than 1802 downloads from a few popular sites. Often the `natural' self- 1803 interest of a sender is not aligned with the interests of other 1804 users. The sender usually wants to transfer data to the receiver 1805 just as quickly as the receiver wants to get it. 1807 In contrast, the re-ECN protocol enables policing of an agreed rate- 1808 response to congestion (e.g. TCP-friendliness) at the sender's 1809 interface with the internetwork. It also ensures downstream networks 1810 can police their upstream neighbours, to encourage them to police 1811 their users in turn. But most importantly, it requires the sender to 1812 declare path congestion to the network and it can remove traffic at 1813 the egress if this declaration is dishonest.
So it can police 1814 correctly, irrespective of whether the receiver tries to suppress 1815 congestion feedback or whether the sender ignores genuine congestion 1816 feedback. Therefore the re-ECN protocol addresses a much wider range 1817 of cheating problems, which includes the one addressed by the ECN 1818 nonce. 1820 9.3. Identifying Upstream and Downstream Congestion 1822 Purple [Purple] proposes that queues should use the CWR flag in the 1823 TCP header of ECN-capable flows to work out path congestion and 1824 therefore downstream congestion in a similar way to re-ECN. However, 1825 because CWR is in the transport layer, it is not always visible to 1826 network layer routers and policers. Purple's motivation was to 1827 improve AQM, not policing. But, of course, nodes trying to avoid a 1828 policer would not be expected to allow CWR to be visible. 1830 10. Security Considerations 1832 {ToDo: enrich this section}{ToDo: Describe attacks by networks on 1833 flows (and by spoofing sources).} {ToDo: Re-ECN & DNS servers} 1835 Nearly the whole of this document concerns security. 1837 11. IANA Considerations 1839 This memo includes no request to IANA. 1841 12. Conclusions 1843 {ToDo:} 1845 13. Acknowledgements 1847 Sebastien Cazalet and Andrea Soppera contributed to the idea of re- 1848 feedback. 
All the following have given helpful comments: Andrea 1849 Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley, 1850 Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright, 1851 John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru 1852 Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd 1853 (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark 1854 Handley (who developed the attack with cancelled packets), Adam 1855 Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft 1856 (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who 1857 complemented our own dummy traffic attacks with others), Liz Maida 1858 (MIT), and comments from participants in the CRN/CFP Broadband and 1859 DoS-resistant Internet working groups. A special thank you to 1860 Alessandro Salvatori for coming up with fiendish attacks on re-ECN. 1862 14. Comments Solicited 1864 Comments and questions are encouraged and very welcome. They can be 1865 addressed to the IETF Transport Area working group's mailing list, 1866 and/or to the authors. 1868 15. References 1870 15.1. Normative References 1872 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1873 Requirement Levels", BCP 14, RFC 2119, March 1997. 1875 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The 1876 Addition of Explicit Congestion Notification (ECN) 1877 to IP", RFC 3168, September 2001. 1879 15.2. Informative References 1881 [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing 1882 the assumptions underlying mechanism design for the 1883 Internet", Proc. Workshop on the Economics of 1884 Networked Systems (NetEcon06), June 2006. 1888 [CLoop_pol] Salvatori, A., "Closed Loop Traffic Policing", 1889 Politecnico Torino and Institut Eurecom Masters 1890 Thesis, September 2005. 1892 [ECN-Deploy] Floyd, S., "ECN (Explicit Congestion Notification) 1893 in TCP/IP; Implementation and Deployment of ECN", 1894 Web-page, May 2004.
1897 [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the 1898 evolution of congestion control", 1899 Automatica 35(12)1969--1985, December 1999. 1903 [ITU-T.I.371] ITU-T, "Traffic Control and Congestion Control in 1904 B-ISDN", ITU-T Rec. I.371 (03/04), March 2004. 1906 [Jiang02] Jiang, H. and C. Dovrolis, "Passive Estimation of 1907 TCP Round-Trip Times", ACM SIGCOMM CCR 32(3)75--88, 1908 July 2002. 1911 [Mathis97] Mathis, M., Semke, J., Mahdavi, J., and T. Ott, 1912 "The Macroscopic Behavior of the TCP Congestion 1913 Avoidance Algorithm", ACM SIGCOMM CCR 27(3)67--82, 1914 July 1997. 1917 [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: 1918 Predictive Active Queue Management Utilizing 1919 Congestion Information", Proc. Local Computer 1920 Networks (LCN 2003), October 2003. 1922 [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., 1923 O'Dell, M., Romanow, A., Weinrib, A., and L. Zhang, 1924 "Resource ReSerVation Protocol (RSVP) Version 1 1925 Applicability Statement Some Guidelines on 1926 Deployment", RFC 2208, September 1997. 1928 [RFC3514] Bellovin, S., "The Security Flag in the IPv4 1929 Header", RFC 3514, April 2003. 1931 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust 1932 Explicit Congestion Notification (ECN) Signaling 1933 with Nonces", RFC 3540, June 2003. 1935 [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding 1936 Congestion Control for Voice Traffic in the 1937 Internet", RFC 3714, March 2004. 1939 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1940 Congestion Control Protocol (DCCP)", RFC 4340, 1941 March 2006. 1943 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram 1944 Congestion Control Protocol (DCCP) Congestion 1945 Control ID 2: TCP-like Congestion Control", 1946 RFC 4341, March 2006. 1948 [RFC4342] Floyd, S., Kohler, E., and J.
Padhye, "Profile for 1949 Datagram Congestion Control Protocol (DCCP) 1950 Congestion Control ID 3: TCP-Friendly Rate Control 1951 (TFRC)", RFC 4342, March 2006. 1953 [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) 1954 Architecture", RFC 5559, June 2009. 1956 [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using 1957 Re-PCN on Bulk Data", 1958 draft-briscoe-re-pcn-border-cheat-03 (work in 1959 progress), October 2009. 1961 [Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. 1962 Smith, "Re-ECN: Adding Accountability for Causing 1963 Congestion to TCP/IP", 1964 draft-briscoe-conex-re-ecn-tcp-03 (work in 1965 progress), March 2014. 1967 [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., 1968 Salvatori, A., Soppera, A., and M. Koyabe, 1969 "Policing Congestion Response in an Internetwork 1970 Using Re-Feedback", ACM SIGCOMM CCR 35(4)277--288, 1971 August 2005. 1974 [Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. 1975 Anderson, "TCP congestion control with a 1976 misbehaving receiver", ACM SIGCOMM CCR 29(5), 1977 October 1999. 1980 [Smart_rtg] Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. 1981 Zhang, "Optimizing Cost and Performance for 1982 Multihoming", ACM SIGCOMM CCR 34(4)79--92, 1983 October 2004. 1986 [Steps_DoS] Handley, M. and A. Greenhalgh, "Steps towards a 1987 DoS-resistant Internet Architecture", Proc. ACM 1988 SIGCOMM workshop on Future directions in network 1989 architecture (FDNA'04) pp 49--56, August 2004. 1991 [Tussle] Clark, D., Sollins, K., Wroclawski, J., and R. 1992 Braden, "Tussle in Cyberspace: Defining Tomorrow's 1993 Internet", ACM SIGCOMM CCR 32(4)347--356, 1994 October 2002. 1997 [XCHOKe] Chhabra, P., Chuig, S., Goel, A., John, A., Kumar, 1998 A., Saran, H., and R. Shorey, "XCHOKe: Malicious 1999 Source Control for Congestion Avoidance at Internet 2000 Gateways", Proceedings of IEEE International 2001 Conference on Network Protocols (ICNP-02), 2002 November 2002.
2005 [pBox] Floyd, S. and K. Fall, "Promoting the Use of End- 2006 to-End Congestion Control in the Internet", IEEE/ 2007 ACM Transactions on Networking 7(4) 458--472, 2008 August 1999. 2011 [relax-fairness] Briscoe, B., Moncaster, T., and L. Burness, 2012 "Problem Statement: Transport Protocols Don't Have 2013 To Do Fairness", 2014 draft-briscoe-tsvwg-relax-fairness-01 (work in 2015 progress), July 2008. 2017 Appendix A. Example Egress Dropper Algorithm 2019 {ToDo: Write up the basic algorithm with flow state, then the 2020 aggregated one.} 2022 Appendix B. Policer Designs to ensure Congestion Responsiveness 2024 B.1. Per-user Policing 2026 User policing requires a policer on the ingress interface of the 2027 access router associated with the user. At that point, the traffic 2028 of the user hasn't diverged on different routes yet; nor has it mixed 2029 with traffic from other sources. 2031 In order to ensure that a user doesn't generate more congestion in 2032 the network than her due share, a modified bulk token-bucket is 2033 maintained with the following parameters: 2035 o b_0 the initial token level 2037 o r the filling rate 2039 o b_max the bucket depth 2041 The same token bucket algorithm is used as in many areas of 2042 networking, but how it is used is very different: 2044 o all traffic from a user over the lifetime of their subscription is 2045 policed in the same token bucket. 2047 o only congestion-declaring packets (positive, cautious and 2048 cancelled) consume tokens 2050 Such a policer will allow network operators to throttle the 2051 contribution of their users to network congestion. This will require 2052 the appropriate contractual terms to be in place between operators 2053 and users.
For instance: a condition for a user to subscribe to a 2054 given network service may be that she should not cause more than a 2055 volume C_user of congestion over a reference period T_user, although 2056 she may carry forward up to N_user times her allowance at the end of 2057 each period. These terms directly set the parameters of the user 2058 policer: 2060 o b_0 = C_user 2062 o r = C_user/T_user 2064 o b_max = b_0 * (N_user +1) 2066 Besides the congestion budget policer above, another user policer may 2067 be necessary to further rate-limit cautious packets, if they are to 2068 be marked rather than dropped (see discussion in [ref other 2069 document]). Rate-limiting cautious packets will prevent high bursts 2070 of new flow arrivals, which is a very useful feature in DoS 2071 prevention. A condition to subscribe to a given network service 2072 would have to be that a user should not generate more than C_cautious 2073 cautious packets, over a reference period T_cautious, with no option 2074 to carry forward any of the allowance at the end of each period. 2075 These terms directly set the parameters of the cautious packet 2076 policer: 2078 o b_0 = C_cautious 2080 o r = C_cautious/T_cautious 2082 o b_max = b_0 2084 T_cautious should be a much shorter period than T_user: for instance 2085 T_cautious could be in the order of minutes while T_user could be in 2086 the order of weeks. 2088 B.2. Per-flow Rate Policing 2090 Whilst we believe that simple per-user policing would be sufficient 2091 to ensure senders comply with congestion control, some operators may 2092 wish to police the rate response of each flow to congestion as well. 2093 Although we do not believe this will be necessary, we include this 2094 section to show how one could perform per-flow policing using 2095 enforcement of TCP-fairness as an example. Per-flow policing aims to 2096 enforce congestion responsiveness on the shortest information 2097 timescale on a network path: packet roundtrips.
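Before turning to per-flow policing, the per-user congestion policer of Appendix B.1 above can be sketched as follows. This is a minimal illustration in Python; the class name, the `marking` strings and the choice of units are our own assumptions, not taken from any specification.

```python
class PerUserCongestionPolicer:
    """Token bucket charged only by congestion-declaring packets.

    Per the contractual terms in the text: the user may cause at most
    C_user of congestion volume per period T_user, and may carry
    forward up to N_user periods' worth of unused allowance.
    """

    def __init__(self, C_user, T_user, N_user):
        self.rate = C_user / T_user          # r = C_user / T_user
        self.depth = C_user * (N_user + 1)   # b_max = b_0 * (N_user + 1)
        self.level = C_user                  # b_0 = C_user
        self.last = 0.0                      # time of last update

    def packet(self, now, size, marking):
        """Account one forwarded packet; True while the user is in credit.

        Only positive, cautious and cancelled packets consume tokens;
        all other traffic passes without charge.
        """
        # Refill at rate r, capped at the bucket depth.
        self.level = min(self.depth,
                         self.level + self.rate * (now - self.last))
        self.last = now
        if marking in ("positive", "cautious", "cancelled"):
            self.level -= size
        return self.level >= 0
```

Note that, unlike a conventional token bucket, this one is charged in units of congestion volume over the lifetime of the subscription, so the bulk of the user's (unmarked) traffic never touches it.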
2099 This again requires that the appropriate terms be agreed between a 2100 network operator and its users, where a congestion responsiveness 2101 policy might be required for the use of a given network service 2102 (perhaps unless the user specifically requests otherwise). 2104 As an example, we describe below how a rate adaptation policer can be 2105 designed when the applicable rate adaptation policy is TCP- 2106 compliance. In that context, the average throughput of a flow will 2107 be expected to be bounded by the value of the TCP throughput during 2108 congestion avoidance, given by Mathis' formula [Mathis97] 2110 x_TCP = k * s / ( T * sqrt(m) ) 2112 where: 2114 o x_TCP is the throughput of the TCP flow in bytes per second, 2116 o k is a constant upper-bounded by sqrt(3/2), 2117 o s is the average packet size of the flow, 2119 o T is the roundtrip time of the flow, 2121 o m is the congestion level experienced by the flow. 2123 We define the marking period N=1/m which represents the average 2124 number of packets between two positive or cancelled packets. Mathis' 2125 formula can be re-written as: 2127 x_TCP = k*s*sqrt(N)/T 2129 We can then get the average inter-mark time in a compliant TCP flow, 2130 dt_TCP, by solving (x_TCP/s)*dt_TCP = N which gives 2132 dt_TCP = sqrt(N)*T/k 2134 We rely on this equation for the design of a rate-adaptation policer 2135 as a variation of a token bucket. In that case a policer has to be 2136 set up for each policed flow. This may be triggered by cautious 2137 packets, with the remainder of flows being all rate limited together 2138 if they do not start with a cautious packet. 2140 Where maintaining per flow state is not a problem, for instance on 2141 some access routers, systematic per-flow policing may be considered. 2142 Should per-flow state be more constrained, rate adaptation policing 2143 could be limited to a random sample of flows exhibiting positive or 2144 cancelled packets.
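As a numerical sanity check on the derivation above, the two forms of Mathis' formula and the inter-mark time can be evaluated together. The values below (1500 byte packets, 100 ms roundtrip, 1% congestion) are purely illustrative.

```python
import math

k = math.sqrt(3.0 / 2.0)   # upper bound on Mathis' constant k
s = 1500.0                 # average packet size (bytes)
T = 0.1                    # roundtrip time (seconds)
m = 0.01                   # congestion level
N = 1.0 / m                # marking period: packets between two marks

# Mathis' formula x_TCP = k*s/(T*sqrt(m)), rewritten with N = 1/m.
# (Since s is in bytes, x_TCP comes out in bytes per second.)
x_tcp = k * s * math.sqrt(N) / T
assert math.isclose(x_tcp, k * s / (T * math.sqrt(m)))

# The average inter-mark time solves (x_TCP/s) * dt_TCP = N ...
dt_tcp = N / (x_tcp / s)
# ... which agrees with the closed form dt_TCP = sqrt(N)*T/k.
assert math.isclose(dt_tcp, math.sqrt(N) * T / k)
```

For these example values the bound is roughly 184 kB/s, with a compliant flow showing a mark about every 0.82 seconds.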
2146 As in the case of user policing, only positive or cancelled packets 2147 will consume tokens; however the amount of tokens consumed will 2148 depend on the congestion signal. 2150 When a new rate adaptation policer is set up for flow j, the 2151 following state is created: 2153 o a token bucket b_j of depth b_max starting at level b_0 2155 o a timestamp t_j = timenow() 2157 o a counter N_j = 0 2159 o a roundtrip estimate T_j 2161 o a filling rate r 2163 When the policing node forwards a packet of flow j that is neither 2164 positive nor cancelled: 2166 o the counter is incremented: N_j += 1 2168 When the policing node forwards a packet of flow j that is positive 2169 or cancelled: 2171 o the counter is incremented: N_j += 1 2173 o the token level is adjusted: b_j += r*(timenow()-t_j) - sqrt(N_j)* 2174 T_j/k 2176 o the counter is reset: N_j = 0 2178 o the timer is reset: t_j = timenow() 2180 An implementation example will be given in a later draft that avoids 2181 having to extract the square root. 2183 Analysis: For a TCP flow, with r = 1 token/sec, on average, 2185 r*(timenow()-t_j) - sqrt(N_j)*T_j/k = dt_TCP - sqrt(N)*T/k = 0 2187 This means that the token level will fluctuate around its initial 2188 level. The depth b_max of the bucket sets the timescale on which the 2189 rate adaptation policy is performed while the filling rate r sets the 2190 trade-off between responsiveness and robustness: 2192 o the higher b_max, the longer it will take to catch greedy flows 2194 o the higher r, the fewer false positives (greedy verdict on 2195 compliant flows) but the more false negatives (compliant verdict 2196 on greedy flows) 2198 This rate adaptation policer requires the availability of a roundtrip 2199 estimate, which may be obtained, for instance, from the application of 2200 re-feedback to the downstream delay (Appendix D) or from passive 2201 estimation [Jiang02].
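The state and update rules above can be sketched as follows. This is a rough Python illustration with our own names; it charges tokens on positive or cancelled packets, per the rule at the start of this appendix, and keeps the explicit square root that the text says a later draft would avoid.

```python
import math

class FlowRatePolicer:
    """Token-bucket variant policing a flow's rate response to congestion.

    On each congestion-declaring (positive or cancelled) packet, the
    bucket is refilled by r*(now - t_j) and charged sqrt(N_j)*T_j/k,
    the inter-mark time a compliant TCP flow would have exhibited.
    """

    K = math.sqrt(3.0 / 2.0)  # upper bound on Mathis' constant k

    def __init__(self, rtt, b_0=5.0, b_max=10.0, r=1.0):
        self.T = rtt      # roundtrip estimate T_j
        self.b = b_0      # token level b_j
        self.b_max = b_max
        self.r = r        # filling rate (tokens/sec)
        self.N = 0        # packets since last mark, N_j
        self.t = 0.0      # timestamp of last mark, t_j

    def forward(self, now, marked):
        """Account one forwarded packet of this flow; return True
        while the flow is within its congestion-response allowance."""
        self.N += 1
        if marked:  # positive or cancelled packet
            charge = math.sqrt(self.N) * self.T / self.K
            self.b = min(self.b_max,
                         self.b + self.r * (now - self.t) - charge)
            self.N = 0
            self.t = now
        return self.b >= 0
```

For a flow whose inter-mark time matches dt_TCP = sqrt(N)*T/k, refill and charge cancel on average, so the token level fluctuates around b_0, as the analysis above states; a flow sending faster than TCP would for the congestion it declares drains the bucket.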
2203 When the bucket of a policer located at the access router (whether it 2204 is a per-user policer or a per-flow policer) becomes empty, the 2205 access router SHOULD drop at least all packets causing the token 2206 level to become negative. The network operator MAY take further 2207 sanctions if the token level of the per-flow policers associated with 2208 a user becomes negative. 2210 Appendix C. Downstream Congestion Metering Algorithms 2211 C.1. Bulk Downstream Congestion Metering Algorithm 2213 To meter the bulk amount of downstream congestion in traffic crossing 2214 an inter-domain border an algorithm is needed that accumulates the 2215 size of positive packets and subtracts the size of negative packets. 2216 We maintain two counters: 2218 V_b: accumulated congestion volume 2220 B: total data volume (in case it is needed) 2222 A suitable pseudo-code algorithm for a border router is as follows: 2224 ==================================================================== 2225 V_b = 0 2226 B = 0 2227 for each Re-ECN-capable packet { 2228 b = readLength(packet) /* set b to packet size */ 2229 B += b /* accumulate total volume */ 2230 if readEECN(packet) == (positive || cautious) { 2231 V_b += b /* increment... */ 2232 } elseif readEECN(packet) == negative { 2233 V_b -= b /* ...or decrement V_b... */ 2234 } /*...depending on EECN field */ 2235 } 2236 ==================================================================== 2238 At the end of an accounting period this counter V_b represents the 2239 congestion volume that penalties could be applied to, as described in 2240 Section 4.5. 2242 For instance, the accumulated volume of congestion through a border 2243 interface over a month might be V_b = 5PB (petabyte = 10^15 byte). 2244 This might have resulted from an average downstream congestion level 2245 of 1% on an accumulated total data volume of B = 500PB. 2247 {ToDo: Include algorithm for precise downstream congestion.} 2249 C.2.
Inflation Factor for Persistently Negative Flows 2251 The following process is suggested to complement the simple algorithm 2252 above in order to protect against the various attacks from 2253 persistently negative flows described in Section 4.5. As explained 2254 in that section, the most important and first step is to estimate the 2255 contribution of persistently negative flows to the bulk volume of 2256 downstream congestion and to inflate this bulk volume as if these 2257 flows weren't there. The process below has been designed to give an 2258 unbiased estimate, but it may be possible to define other processes 2259 that achieve similar ends. 2261 While the above simple metering algorithm is counting the bulk of 2262 traffic over an accounting period, the meter should also select a 2263 subset of the whole flow ID space that is small enough to be able to 2264 realistically measure but large enough to give a realistic sample. 2265 Many different samples of different subsets of the ID space should be 2266 taken at different times during the accounting period, preferably 2267 covering the whole ID space. During each sample, the meter should 2268 count the volume of positive packets and subtract the volume of 2269 negative, maintaining a separate account for each flow in the sample. 2270 Each sample should run for a lot longer than the large majority of 2271 flows, to avoid a bias from missing the starts and ends of flows, 2272 which tend to be positive and negative respectively. 2274 Once the accounting period finishes, the meter should calculate the 2275 total of the accounts V_{bI} for the subset of flows I in the sample, 2276 and the total of the accounts V_{fI} excluding flows with a negative 2277 account from the subset I. Then the weighted mean of all these 2278 samples should be taken: a_S = sum_{forall I} V_{fI} / sum_{forall I} 2279 V_{bI}.
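As a small worked example of this estimator (the per-flow accounts below are invented purely for illustration):

```python
# Per-flow accounts (positive volume minus negative volume, in bytes)
# from three sampled subsets I of the flow ID space.
samples = [
    [120, 80, -50],   # one persistently negative flow in this subset
    [200, -20, 60],
    [90, 40, 30],     # no negative flows in this subset
]

# V_{bI}: total of all accounts in subset I;
# V_{fI}: total excluding flows with a negative account.
V_b_total = sum(sum(flows) for flows in samples)
V_f_total = sum(sum(a for a in flows if a >= 0) for flows in samples)

# Weighted mean over all samples: a_S = sum V_{fI} / sum V_{bI}.
# a_S >= 1, since excluding negative accounts can only raise the total.
a_S = V_f_total / V_b_total
```

Here V_b over the samples is 550 while excluding the negative accounts gives 620, so a_S is about 1.13; the bulk meter's V_b for the accounting period would then be inflated by that factor.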
2281 If V_b is the result of the bulk accounting algorithm over the 2282 accounting period (Appendix C.1) it can be inflated by this factor 2283 a_S to get a good unbiased estimate of the volume of downstream 2284 congestion over the accounting period, a_S * V_b, without being polluted 2285 by the effect of persistently negative flows. 2287 Appendix D. Re-TTL 2289 This Appendix gives an overview of a proposal to overload 2290 the TTL field in the IP header in order to monitor downstream propagation 2291 delay. This is included to show that it would be possible to take 2292 account of RTT if that were deemed desirable. 2294 Delay re-feedback can be achieved by overloading the TTL field, 2295 without changing IP or router TTL processing. A target value for TTL 2296 at the destination would need standardising, say 16. If the path hop 2297 count increased by more than 16 during a routing change, it would 2298 temporarily be mistaken for a routing loop, so this target would need 2299 to be chosen to exceed typical hop count increases. The TCP wire 2300 protocol and handlers would need modifying to feed back the 2301 destination TTL and initialise it. It would be necessary to 2302 standardise the unit of TTL in terms of real time (as was the 2303 original intent in the early days of the Internet). 2305 In the longer term, precision could be improved if routers 2306 decremented TTL to represent exact propagation delay to the next 2307 router. That is, for a router to decrement TTL by, say, 1.8 time 2308 units it would alternate the decrement of every packet between 1 and 2 2309 at a ratio of 1:4. Although this might sometimes require a seemingly 2310 dangerous null decrement, a packet in a loop would still decrement to 2311 zero after 255 time units on average.
As more routers were upgraded 2312 to this more accurate TTL decrement, path delay estimates would 2313 become increasingly accurate despite the presence of some legacy 2314 routers that continued to always decrement the TTL by 1. 2316 Appendix E. Argument for holding back the ECN nonce 2318 The ECN nonce is a mechanism that allows a /sending/ transport to 2319 detect if drop or ECN marking at a congested router has been 2320 suppressed by a node somewhere in the feedback loop---another router 2321 or the receiver. 2323 Space for the ECN nonce was set aside in [RFC3168] (currently 2324 proposed standard) while the full nonce mechanism is specified in 2325 [RFC3540] (currently experimental). The specification of [RFC4340] 2326 (currently proposed standard) requires that "Each DCCP sender SHOULD 2327 set ECN Nonces on its packets...". It also mandates as a requirement 2328 for all CCID profiles that "Any newly defined acknowledgement 2329 mechanism MUST include a way to transmit ECN Nonce Echoes back to the 2330 sender.", therefore: 2332 o The CCID profile for TCP-like Congestion Control [RFC4341] 2333 (currently proposed standard) says "The sender will use the ECN 2334 Nonce for data packets, and the receiver will echo those nonces in 2335 its Ack Vectors." 2337 o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] 2338 recommends that "The sender [use] Loss Intervals options' ECN 2339 Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to 2340 probabilistically verify that the receiver is correctly reporting 2341 all dropped or marked packets." 2343 The primary function of the ECN nonce is to protect the integrity of 2344 the information about congestion: ECN marks and packet drops.
2345 However, when the nonce is used to protect the integrity of 2346 information about packet drops, rather than ECN marks, a transport 2347 layer nonce will always be sufficient (because a drop loses the 2348 transport header as well as the ECN field in the network header), 2349 which would avoid using scarce IP header codepoint space. Similarly, 2350 a transport layer nonce would protect against a receiver sending 2351 early acknowledgements [Savage99]. 2353 If the ECN nonce reveals integrity problems with the information 2354 about congestion, the sending transport can use that knowledge for 2355 two functions: 2357 o to protect its own resources, by allocating them in proportion to 2358 the rates that each network path can sustain, based on congestion 2359 control, 2361 o and to protect congested routers in the network, by drastically 2362 slowing down any connection whose congestion information has been 2363 corrupted. 2365 If the sending transport chooses to act in the interests of congested 2366 routers, it can reduce its rate if it detects that some malicious party 2367 in the feedback loop may be suppressing ECN feedback. But it would only 2368 be useful to congested routers when /all/ senders using them are 2369 trusted to act in the interests of the congested routers.
Certainly, this 2379 is a useful function, but the IETF should carefully decide whether 2380 such a single, very specific case warrants IP header space. 2382 In contrast, Re-ECN allows all routers to fully protect themselves 2383 from such attacks, without having to trust anyone - senders, 2384 receivers, neighbouring networks. Re-ECN is therefore proposed in 2385 preference to the ECN nonce on the basis that it addresses the 2386 generic problem of accountability for congestion of a network's 2387 resources at the IP layer. 2389 Delaying the ECN nonce is justified because the applicability of the 2390 ECN nonce seems too limited for it to consume a two-bit codepoint in 2391 the IP header. It therefore seems prudent to give time for an 2392 alternative way to be found to do the one function the nonce is 2393 essential for. 2395 Moreover, while we have re-designed the Re-ECN codepoints so that 2396 they do not prevent the ECN nonce progressing, the same is not true 2397 the other way round. If the ECN nonce started to see some deployment 2398 (perhaps because it was blessed with proposed standard status), 2399 incremental deployment of Re-ECN would effectively be impossible, 2400 because Re-ECN marking fractions at inter-domain borders would be 2401 polluted by unknown levels of nonce traffic. 2403 The authors are aware that Re-ECN must prove it has the potential it 2404 claims if it is to displace the nonce. Therefore, every effort has 2405 been made to complete a comprehensive specification of Re-ECN so that 2406 its potential can be assessed. We therefore seek the opinion of the 2407 Internet community on whether the Re-ECN protocol is sufficiently 2408 useful to warrant standards action. 
2410 Authors' Addresses 2412 Bob Briscoe (editor) 2413 BT 2414 B54/77, Adastral Park 2415 Martlesham Heath 2416 Ipswich IP5 3RE 2417 UK 2419 Phone: +44 1473 645196 2420 EMail: bob.briscoe@bt.com 2421 URI: http://bobbriscoe.net/ 2423 Arnaud Jacquet 2424 BT 2425 B54/70, Adastral Park 2426 Martlesham Heath 2427 Ipswich IP5 3RE 2428 UK 2430 Phone: +44 1473 647284 2431 EMail: arnaud.jacquet@bt.com 2432 URI: 2434 Toby Moncaster 2435 Moncaster.com 2436 Dukes 2437 Layer Marney 2438 Colchester CO5 9UZ 2439 UK 2441 EMail: toby@moncaster.com 2442 Alan Smith 2443 BT 2444 B54/76, Adastral Park 2445 Martlesham Heath 2446 Ipswich IP5 3RE 2447 UK 2449 Phone: +44 1473 640404 2450 EMail: alan.p.smith@bt.com