idnits 2.17.1 

draft-briscoe-re-pcn-border-cheat-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 2379.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2390.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2397.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2403.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RSVP-ECN], [Re-TCP],
     [PCN-arch], [PCN]), which it shouldn't.  Please replace those with
     straight textual mentions of the documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The exact meaning of the all-uppercase expression 'MAY NOT' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  == The expression 'MAY NOT', while looking like RFC 2119 requirements text,
     is not defined in RFC 2119, and should not be used.  Consider using 'MUST
     NOT' instead (if that is what you mean).
     
     Found 'MAY NOT' in this paragraph:
     
     However, if the ingress gateway can guarantee that the network(s)
     that will carry the flow to its egress gateway all use a common
     identifier for the aggregate (e.g. a single MPLS network without ECMP
     routing), it MAY NOT set FNE when it adds a new flow to an active
     aggregate.  And an FNE packet need only be sent if a whole aggregate has
     been idle for more than 1 second.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 30, 2007) is 6138 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-09) exists of
     draft-briscoe-tsvwg-re-ecn-tcp-04

  == Outdated reference: A later version (-02) exists of
     draft-ietf-tsvwg-ecn-mpls-01

  == Outdated reference: A later version (-20) exists of
     draft-ietf-nsis-rmd-09


     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	PCN Working Group                                             B. Briscoe
3	Internet-Draft                                                  BT & UCL
4	Intended status: Informational                             June 30, 2007
5	Expires: January 1, 2008

7	        Emulating Border Flow Policing using Re-ECN on Bulk Data
8	                  draft-briscoe-re-pcn-border-cheat-00

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on January 1, 2008.

35	Copyright Notice

37	   Copyright (C) The IETF Trust (2007).

39	Abstract

41	   Scaling per-flow admission control to the Internet is a hard problem.
42	   A recently proposed approach combines Diffserv and pre-congestion
43	   notification (PCN) to provide a service slightly better than Intserv
44	   controlled load.  It scales to networks of any size, but only if
45	   domains trust each other to comply with admission control and rate
46	   policing.  This memo claims to solve this trust problem without
47	   losing scalability.  It describes bulk border policing that provides
48	   a sufficient emulation of per-flow policing with the help of another
49	   recently proposed extension to ECN, involving re-echoing ECN feedback
50	   (re-ECN).  With only passive bulk measurements at borders, sanctions
51	   can be applied against cheating networks.

53	Status (to be removed by the RFC Editor)

55	   This memo is posted as an Internet-Draft with the intent to
56	   eventually be broken down into two documents: one for the standards
57	   track and one for informational status.  But until it becomes an item
58	   of IETF working group business the whole proposal has been kept
59	   together to aid understanding.  Only the text of Section 4 of this
60	   document requires standardisation.  The rest of the sections describe
61	   how a system might be built from these protocols by the operators of
62	   an internetwork.  Note in particular that the policing and monitoring
63	   functions proposed for the trust boundaries between operators would
64	   not need standardisation by the IETF.  They simply represent one way
65	   that the proposed protocols could be used to extend the PCN
66	   architecture [PCN-arch] to span multiple domains without mutual trust
67	   between the operators.

69	   To realise the system described, this document also depends on
70	   standardisation of three other documents currently being discussed
71	   (but not on the standards track) in the IETF Transport Area: pre-
72	   congestion notification (PCN) marking on interior nodes [PCN];
73	   feedback of aggregate PCN measurements by suitably extending the
74	   admission control signalling protocol (e.g.  RSVP) [RSVP-ECN]; and
75	   re-insertion of the feedback into the forward stream of IP packets by
76	   the PCN ingress gateway in a similar way to that proposed for a TCP
77	   source [Re-TCP].

79	   The authors seek comments from the Internet community on whether
80	   combining PCN and re-ECN in this way is a sufficient solution to the
81	   problem of scaling microflow admission control to the Internet as a
82	   whole, even though such scaling must take account of the increasing
83	   numbers of networks and users who may all have conflicting interests.

85	Changes from previous drafts (to be removed by the RFC Editor)

87	   Changes in this version <draft-briscoe-re-pcn-border-cheat-00>
88	   relative to the last <draft-briscoe-tsvwg-re-ecn-border-cheat-01>:

90	      Changed filename to associate it with the new IETF PCN w-g, rather
91	      than the TSVWG w-g.

93	      Introduction: Clarified that bulk policing only replaces per-flow
94	      policing at interior inter-domain borders, while per-flow policing
95	      is still needed at the access interface to the internetwork.  Also
96	      clarified that the aim is to neutralise any gains from cheating
97	      using local bilateral contracts between neighbouring networks,
98	      rather than merely identifying remote cheaters.

100	      Section 3.1: Described the traditional per-flow policing problem
101	      with inter-domain reservations more precisely, particularly with
102	      respect to direction of reservations and of traffic flows.

104	      Clarified status of Section 5 onwards, in particular that policers
105	      and monitors would not need standardisation, but that the protocol
106	      in Section 4 would require standardisation.

108	      Section 5.6.2 on competitive routing: Added discussion of direct
109	      incentives for a receiver to switch to a different provider even
110	      if the provider has a termination monopoly.

112	      Clarified that "Designing in security from the start" merely means
113	      allowing codepoint space in the PCN protocol encoding.  There is
114	      no need to actually implement inter-domain security mechanisms for
115	      solutions confined to a single domain.

117	      Updated some references and added a ref to the Security
118	      Considerations, as well as other minor corrections and
119	      improvements.

121	   Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to
122	   <draft-briscoe-tsvwg-re-ecn-border-cheat-01>:

124	      Added subsection on Border Accounting Mechanisms (Section 5.6.1).

126	      Section 4.2 on the re-ECN wire protocol clarified and re-organised
127	      to separately discuss re-ECN for default ECN marking and for pre-
128	      congestion marking (PCN).

130	      Router Forwarding Behaviour subsection added to re-organised
131	      section on Protocol Operation (Section 4.3).  Extensions section
132	      moved within Protocol Operations.

134	      Emulating Border Policing (Section 5) reorganised, starting with a
135	      new Terminology subsection heading, and a simplified overview
136	      section.  Added a large new subsection on Border Accounting
137	      Mechanisms within a new section bringing together other
138	      subsections on Border Mechanisms generally (Section 5.6).  Some
139	      text moved from old subsections into these new ones.

141	      Added section on Incremental Deployment (Section 7), drawing
142	      together relevant points about deployment made throughout.

144	      Sections on Design Rationale (Section 8) and Security
145	      Considerations (Section 9) expanded with some new material,
146	      including new attacks and their defences.

148	      Suggested Border Metering Algorithms improved (Appendix A.2) for
149	      resilience to newly identified attacks.

151	Table of Contents

153	   1.  Introduction
154	   2.  Requirements Notation
155	   3.  The Problem
156	     3.1.  The Traditional Per-flow Policing Problem
157	     3.2.  Generic Scenario
158	   4.  Re-ECN Protocol for an RSVP (or similar) Transport
159	     4.1.  Protocol Overview
160	     4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
161	           v6)
162	       4.2.1.  Re-ECN Recap
163	       4.2.2.  Re-ECN Combined with Pre-Congestion Notification
164	               (re-PCN)
165	     4.3.  Protocol Operation
166	       4.3.1.  Protocol Operation for an Established Flow
167	       4.3.2.  Aggregate Bootstrap
168	       4.3.3.  Flow Bootstrap
169	       4.3.4.  Router Forwarding Behaviour
170	       4.3.5.  Extensions
171	   5.  Emulating Border Policing with Re-ECN
172	     5.1.  Informal Terminology
173	     5.2.  Policing Overview
174	     5.3.  Pre-requisite Contractual Arrangements
175	     5.4.  Emulation of Per-Flow Rate Policing: Rationale and
176	           Limits
177	     5.5.  Sanctioning Dishonest Marking
178	     5.6.  Border Mechanisms
179	       5.6.1.  Border Accounting Mechanisms
180	       5.6.2.  Competitive Routing
181	       5.6.3.  Fail-safes
182	   6.  Analysis
183	   7.  Incremental Deployment
184	   8.  Design Choices and Rationale
185	   9.  Security Considerations
186	   10. IANA Considerations
187	   11. Conclusions
188	   12. Acknowledgements
189	   13. Comments Solicited
190	   14. References
191	     14.1. Normative References
192	     14.2. Informative References
193	   Appendix A.  Implementation
194	     A.1.  Ingress Gateway Algorithm for Blanking the RE flag
195	     A.2.  Downstream Congestion Metering Algorithms
196	       A.2.1.  Bulk Downstream Congestion Metering Algorithm
197	       A.2.2.  Inflation Factor for Persistently Negative Flows
198	     A.3.  Algorithm for Sanctioning Negative Traffic

200	   Author's Address
201	   Intellectual Property and Copyright Statements

203	1.  Introduction

205	   The Internet community largely lost interest in the Intserv
206	   architecture after it was clarified that it would be unlikely to
207	   scale to the whole Internet [RFC2208].  Although Intserv mechanisms
208	   proved impractical, the bandwidth reservation service it aimed to
209	   offer is still very much required.

211	   A recently proposed approach [PCN-arch] combines Diffserv and pre-
212	   congestion notification (PCN) to provide a service slightly better
213	   than Intserv controlled load [RFC2211].  It scales to networks of
214	   any size, but only if domains trust their neighbours to have checked
215	   that upstream customers aren't taking more bandwidth than they
216	   reserved, either accidentally or deliberately.  This memo describes
217	   border policing measures so that one network can protect its
218	   interests, even if networks around it are deliberately trying to
219	   cheat.  The approach provides a sufficient emulation of flow rate
220	   policing at trust boundaries but without per-flow processing.  The
221	   emulation is not perfect, but it is sufficient to ensure that the
222	   punishment is at least proportionate to the severity of the cheat.
223	   Per-flow rate policing for each reservation is still expected to be
224	   used at the access edge of the internetwork, but at the borders
225	   between networks bulk policing can be used to emulate per-flow
226	   policing.

228	   The aim is to be able to scale controlled load service to any number
229	   of endpoints, even though such scaling must take account of the
230	   increasing numbers of networks and users who may all have conflicting
231	   interests.  To achieve such scaling, this memo combines two recent
232	   proposals, both of which it briefly recaps:

234	   o  A deployment model for admission control over Diffserv using pre-
235	      congestion notification [PCN-arch] describes how bulk pre-
236	      congestion notification on routers within an edge-to-edge Diffserv
237	      region can emulate the precision of per-flow admission control to
238	      provide controlled load service without unscalable per-flow
239	      processing;

241	   o  Re-ECN: Adding Accountability to TCP/IP [Re-TCP].  The trick that
242	      addresses cheating at borders is to recognise that border policing
243	      is mainly necessary because cheating upstream networks will admit
244	      traffic when they shouldn't only as long as they don't directly
245	      experience the downstream congestion their misbehaviour can cause.
246	      The re-ECN protocol requires upstream nodes to declare expected
247	      downstream congestion in all forwarded packets and it makes it in
248	      their interests to declare it honestly.  Operators can then
249	      monitor downstream congestion in bulk at borders to emulate
250	      policing.

252	   The aim is not to enable a network to _identify_ some remote cheating
253	   party, which would rarely be useful given the victim network would be
254	   unlikely to be able to seek redress from a cheater in some remote
255	   part of the world with whom no direct contractual relationship
256	   exists.  Rather the aim is to ensure that any gain from cheating will
257	   be cancelled out by penalties applied to the cheating party by its
258	   local network.  Further, the solution ensures each of the chain of
259	   networks between the cheater and the victim will lose out if it
260	   doesn't apply penalties to its neighbour.  Thus the solution builds
261	   on the local bilateral contractual relationships that already exist
262	   between neighbouring networks.

264	   Rather than the end-to-end arrangement used when re-ECN was specified
265	   for the TCP transport [Re-TCP], this memo specifies re-ECN in an
266	   edge-to-edge arrangement, making it applicable to the above
267	   deployment model for admission control over Diffserv.  Also, rather
268	   than using a TCP transport for regular congestion feedback, this memo
269	   specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN].
270	   A similar deployment model, but with a different transport for
271	   signalling congestion feedback could be used (e.g.  RMD [NSIS-RMD]
272	   uses NSIS).

274	   This memo aims to do two things: i) define how to apply the re-ECN
275	   protocol to the admission control over Diffserv scenario; and ii)
276	   explain why re-ECN sufficiently emulates border policing in that
277	   scenario.  Most of the memo is taken up with the second aim:
278	   explaining why it works.  Applying re-ECN to the scenario actually
279	   involves quite a trivial modification to the ingress gateway.  That
280	   modification can be added to gateways later, so our immediate goal is
281	   to convince everyone to have the foresight to define the PCN wire
282	   protocol encoding to accommodate the extended codepoints defined in
283	   this document, whether first deployments require border policing or
284	   not.  Otherwise, when we want to add policing, we will have built
285	   ourselves a legacy problem.  In other words, we aim to convince
286	   people to "Design in security from the start."

288	   The body of this memo is structured as follows:

290	      Section 3 describes the border policing problem.  We recap the
291	      traditional, unscalable view of how to solve the problem, and we
292	      recap the admission control solution which has the scalability we
293	      do not want to lose when we add border policing;

295	      Section 4 specifies the re-ECN protocol solution in detail;

297	      Section 5 explains how to use the protocol to emulate border
298	      policing, and why it works;

299	      Section 6 analyses the security of the proposed solution;

301	      Section 8 explains the sometimes subtle rationale behind our
302	      design decisions;

304	      Section 9 comments on the overall robustness of the security
305	      assumptions and lists specific security issues.

307	   It must be emphasised that we are not evangelical about removing per-
308	   flow processing from borders.  Network operators may choose to do
309	   per-flow processing at their borders for their own reasons, such as
310	   to support business models that require per-flow accounting.  Our aim
311	   is to show that per-flow processing at borders is no longer
312	   _necessary_ in order to provide end-to-end QoS using flow admission
313	   control.  Indeed, we are absolutely opposed to standardisation of
314	   technology that embeds particular business models into the Internet.
315	   Our aim is merely to provide a new useful metric (downstream
316	   congestion) at trust boundaries.  Given the well-known significance
317	   of congestion in economics, operators can then use this new metric in
318	   their interconnection contracts if they choose.  This will enable
319	   competitive evolution of new business models (for examples
320	   see [IXQoS]), even for sets of flows running alongside another set
321	   across the same border but using the more traditional model that
322	   depends on more costly per-flow processing at each border.

324	2.  Requirements Notation

326	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
327	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
328	   document are to be interpreted as described in [RFC2119].

330	3.  The Problem

332	3.1.  The Traditional Per-flow Policing Problem

334	   If we claim to be able to emulate per-flow policing with bulk
335	   policing at trust boundaries, we need to know exactly what we are
336	   emulating.  So, we will start from the traditional scenario with per-
337	   flow policing at trust boundaries to explain why it has always been
338	   considered necessary.

340	   To be able to take advantage of a reservation-based service such as
341	   controlled load, a source-destination pair must reserve resources
342	   using a signalling protocol such as RSVP [RFC2205].  An RSVP
343	   signalling request refers to a flow of packets by its flow ID tuple
344	   (filter spec [RFC2205]) (or its security parameter index
345	   (SPI) [RFC2207] if port numbers are hidden by IPsec encryption).
346	   Other signalling protocols use similar flow identifiers.  But it is
347	   insufficient merely to authorise and admit a flow based on its
348	   identifiers, for instance by opening a pin-hole for packets with
349	   identifiers that match an admitted flow ID, because once a flow is
350	   admitted, it cannot necessarily be trusted to send packets within the
351	   rate profile it requested.

353	   The packet rate must also be policed to keep the flow within the
354	   requested flow spec [RFC2205].  For instance, without data rate
355	   policing, a source-destination pair could reserve resources for an
356	   8kbps audio flow but the source could transmit a 6Mbps video (theft
357	   of service).  More subtly, the sender could generate bursts that were
358	   outside the profile requested.
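
   As an illustration only, the rate policing described above can be
   sketched as a single-flow token bucket.  This is a minimal sketch,
   not a mechanism defined by this memo or by [RFC2205]: the class name,
   the parameter names and the choice to start with a full bucket are
   all assumptions made for the example.  It shows an 8kbps profile
   passing a conforming source in full while passing only a small
   fraction of the packets of a 6Mbps source.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketPolicer:
    """Polices one flow against a (rate, depth) profile (illustrative names)."""

    rate_bps: float          # reserved token rate, bits per second
    depth_bits: float        # bucket depth in bits (largest tolerated burst)
    tokens: float = 0.0      # current token level, bits
    last_time: float = 0.0   # arrival time of the previous packet, seconds

    def __post_init__(self) -> None:
        # Assumption: start with a full bucket so an initial burst conforms.
        self.tokens = self.depth_bits

    def conforms(self, now: float, packet_bits: int) -> bool:
        """Return True if the packet is in profile, consuming tokens if so."""
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last_time
        self.tokens = min(self.depth_bits, self.tokens + elapsed * self.rate_bps)
        self.last_time = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False  # out of profile: drop or re-mark, per local policy


def conforming_fraction(policer, send_rate_bps, packet_bits, duration_s):
    """Send fixed-size packets at a constant rate; return the share in profile."""
    interval = packet_bits / send_rate_bps
    total = int(duration_s / interval)
    passed = sum(policer.conforms(i * interval, packet_bits) for i in range(total))
    return passed / total


if __name__ == "__main__":
    audio = TokenBucketPolicer(rate_bps=8_000, depth_bits=1_600)
    print("8kbps source:", conforming_fraction(audio, 8_000, 160, 10))
    video = TokenBucketPolicer(rate_bps=8_000, depth_bits=24_000)
    print("6Mbps source:", conforming_fraction(video, 6_000_000, 12_000, 10))
```

   The policer needs one such state record per reservation, which is
   the per-flow cost that the rest of this memo seeks to avoid at
   borders.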

360	   In traditional architectures, per-flow packet rate-policing is
361	   expensive and unscalable but, without it, a network is vulnerable to
362	   such theft of service (whether malicious or accidental).  Perhaps
363	   more importantly, if flows are allowed to send more data than they
364	   were permitted, the ability of admission control to give assurances
365	   to other flows will break.

367	   Just as sources cannot be trusted to keep within the requested flow
368	   spec, whole networks might also try to cheat.  We will now set up a
369	   concrete scenario to illustrate such cheats.  Imagine reservations
370	   for unidirectional flows, through at least two networks, an edge
371	   network and its downstream transit provider.  Imagine the edge
372	   network charges its retail customers per reservation but also has to
373	   pay its transit provider a charge per reservation.  Typically, both
374	   its selling and buying charges might depend on the duration and rate
375	   of each reservation.  The actual levels of the selling and buying
376	   prices are irrelevant to our discussion (most likely the network will
377	   sell at a higher price than it buys, of course).

379	   A cheating ingress network could systematically reduce the size of
380	   its retail customers' reservation signalling requests (e.g. the
381	   SENDER_TSPEC object in RSVP's PATH message) before forwarding them to
382	   its transit provider and systematically reinstate the responses on
383	   the way back (e.g. the FLOWSPEC object in RSVP's RESV message).  It
384	   would then receive an honest income from its upstream retail customer
385	   but only pay for fraudulently smaller reservations downstream.  A
386	   similar but opposite trick (increasing the TSPEC and decreasing the
387	   FLOWSPEC) could be perpetrated by the receiver's access network if
388	   the reservation was paid for by the receiver.

390	   Equivalently, a cheating ingress network may feed the traffic from a
391	   number of flows into an aggregate reservation over the transit that
392	   is smaller than the total of all the flows.  Because of these fraud
393	   possibilities, in traditional QoS reservation architectures the
394	   downstream network polices at each border.  The policer checks that
395	   the actual sent data rate of each flow is within the signalled
396	   reservation.

398	   Reservation signalling could be authenticated end to end, but this
399	   wouldn't prevent the aggregation cheat just described.  For this
400	   reason, and to avoid the need for a global PKI, signalling integrity
401	   is typically only protected on a hop-by-hop basis [RFC2747].

403	   A variant of the above cheat is where a router in an honest
404	   downstream network denies admission to a new reservation, but a
405	   cheating upstream network still admits the flow.  For instance, the
406	   networks may be using Diffserv internally, but Intserv admission
407	   control at their borders [RFC2998].  The cheat would only work if
408	   they were using bulk Diffserv traffic policing at their borders,
409	   perhaps to avoid the cost/complexity of Intserv border policing.  As
410	   far as the cheating upstream network is concerned, it gets the
411	   revenue from the reservation, but it doesn't have to pay any
412	   downstream wholesale charges and the congestion is in someone else's
413	   network.  The cheating network may calculate that most of the flows
414	   affected by congestion in the downstream network aren't likely to be
415	   its own.  It may also calculate that the downstream router has been
416	   configured to deny admission to new flows in order to protect
417	   bandwidth assigned to other network services (e.g. enterprise VPNs).
418	   So the cheating network can steal capacity from the downstream
419	   operator's VPNs that are probably not actually congested.

421	   All the above cheats are framed in the context of RSVP's receiver-
422	   confirmed reservation model, but similar cheats are possible with
423	   sender-initiated and other models.

425	   To summarise, in traditional reservation signalling architectures, if
426	   a network cannot trust a neighbouring upstream network to rate-police
427	   each reservation, it has to check for itself that the data rate fits
428	   within each of the reservations it has admitted.

430	3.2.  Generic Scenario

432	   We will now describe a generic internetworking scenario that we will
433	   use to describe and to test our bulk policing proposal.  It consists
434	   of a number of networks and endpoints that do not fully trust each
435	   other to behave.  In Section 6 we will tie down exactly what we mean
436	   by partial trust, and we will consider the various combinations where
437	   some networks do not trust each other and others are colluding
438	   together.

440	    _    ___      _____________________________________       ___    _
441	   | |  |   |   _|__    ______    ______    ______    _|__   |   |  | |
442	   | |  |   |  |    |  |      |  |      |  |      |  |    |  |   |  | |
443	   | |  |   |  |    |  |Inter-|  |Inter-|  |Inter-|  |    |  |   |  | |
444	   | |  |   |  |    |  | ior  |  | ior  |  | ior  |  |    |  |   |  | |
445	   | |  |   |  |    |  |Domain|  |Domain|  |Domain|  |    |  |   |  | |
446	   | |  |   |  |    |  |  A   |  |  B   |  |  C   |  |    |  |   |  | |
447	   | |  |   |  |    |  |      |  |      |  |      |  |    |  |   |  | |
448	   | |  |   |  +----+  +-+  +-+  +-+  +-+  +-+  +-+  +----+  |   |  | |
449	   | |  |   |  |    |  |B|  |B|  |B|  |B|  |B|  |B|  |    |  |   |\ | |
450	   | |==|   |==|Ingr|==|R|  |R|==|R|  |R|==|R|  |R|==|Egr |==|   |=>| |
451	   | |  |   |  |G/W |  | |  | |  | |  | |  | |  | |  |G/W |  |   |/ | |
452	   | |  |   |  +----+  +-+  +-+  +-+  +-+  +-+  +-+  +----+  |   |  | |
453	   | |  |   |  |    |  |      |  |      |  |      |  |    |  |   |  | |
454	   | |  |   |  |____|  |______|  |______|  |______|  |____|  |   |  | |
455	   |_|  |___|    |_____________________________________|     |___|  |_|

457	   Sx   Ingress               Diffserv region               Egress   Rx
458	   End  Access                                              Access  End
459	   Host Network                                            Network Host
460	                <-------- edge-to-edge signalling ------->
461	                          (for admission control)

463	   <-------------------end-to-end QoS signalling protocol------------->

465	      Figure 1: Generic Scenario (see text for explanation of terms)

467	   An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1)
468	   connect the interior Diffserv region to the edge access networks
469	   where routers (not shown) use per-flow reservation processing.
470	   Within the Diffserv region are three interior domains, A, B and C, as
471	   well as the inward facing interfaces of the ingress and egress
472	   gateways.  An ingress and egress border router (BR) is shown
473	   interconnecting each interior domain with the next.  There may be
474	   other interior routers (not shown) within each interior domain.

476	   In two paragraphs we now briefly recap how pre-congestion
477	   notification is intended to be used to control flow admission to a
478	   large Diffserv region.  The first paragraph describes data plane
479	   functions and the second describes signalling in the control plane.
480	   We omit many details from [PCN-arch] including behaviour during
481	   routing changes.  For brevity here we assume other flows are already
482	   in progress across a path through the Diffserv region before a new
483	   one arrives, but how bootstrap works is described in Section 4.3.2.

485	   Figure 1 shows a single simplex reserved flow from the sending (Sx)
486	   end host to the receiving (Rx) end host.  The ingress gateway polices
487	   incoming traffic within its admitted reservation and remarks it to
488	   turn on an ECN-capable codepoint [RFC3168] and the controlled load
489	   (CL) Diffserv codepoint.  Together, these codepoints define which
490	   traffic is entitled to the enhanced scheduling of the CL behaviour
491	   aggregate on routers within the Diffserv region.  The CL PHB of
492	   interior routers consists of a scheduling behaviour and a new ECN
493	   marking behaviour that we call `pre-congestion notification' [PCN].
494	   The CL PHB simply re-uses the definition of expedited forwarding
495	   (EF) [RFC3246] for its scheduling behaviour.  But it incorporates a
496	   new ECN marking behaviour, which sets the ECN field of an increasing
497	   fraction of CL packets to the admission marked (AM) codepoint as CL
498	   load approaches a threshold rate below the line rate.  Marking is
499	   driven by virtual queues, so real queues will have built up almost
500	   no queuing delay.  The level of marking detected at the egress of the
501	   Diffserv region is then used by the signalling system in order to
502	   determine admission control as follows.

504	   The end-to-end QoS signalling (e.g.  RSVP) for a new reservation
505	   takes one giant hop from ingress to egress gateway, because interior
506	   routers within the Diffserv region are configured to ignore RSVP.
507	   The egress gateway holds flow state because it takes part in the end-
508	   to-end reservation.  So it can classify all packets by flow and it
509	   can identify all flows that have the same previous RSVP hop (a CL-
510	   region-aggregate).  For each CL-region-aggregate of flows in
511	   progress, the egress gateway maintains a per-packet moving average of
512	   the fraction of pre-congestion-marked traffic.  Once an RSVP PATH
513	   message for a new reservation has hopped across the Diffserv region
514	   and reached the destination, an RSVP RESV message is returned.  As
515	   the RESV message passes, the egress gateway piggy-backs the relevant
516	   pre-congestion level onto it [RSVP-ECN].  Again, interior routers
517	   ignore the RSVP message, but the ingress gateway strips off the pre-
518	   congestion level.  If the pre-congestion level is above a threshold,
519	   the ingress gateway denies admission to the new reservation,
520	   otherwise it returns the original RESV signal back towards the data
521	   sender.
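
   As a purely illustrative sketch (not part of [PCN-arch] or
   [RSVP-ECN]), the egress measurement and the ingress decision just
   described might look as follows in Python.  The exponentially
   weighted form of the moving average, the default weight of 0.01, the
   5% admission threshold and all names are assumptions made for the
   example; this memo does not specify them.

```python
class EgressMeter:
    """Per-packet moving average of the pre-congestion-marked fraction
    for one CL-region-aggregate (EWMA form is an assumption)."""

    def __init__(self, weight=0.01):
        self.weight = weight   # assumed smoothing weight
        self.level = 0.0       # current pre-congestion level estimate

    def on_packet(self, marked):
        # Move the estimate a small step towards 1 (marked) or 0 (unmarked).
        sample = 1.0 if marked else 0.0
        self.level += self.weight * (sample - self.level)
        return self.level


def admit(pre_congestion_level, threshold=0.05):
    """Ingress decision: deny admission if the level fed back by the
    egress is above the (assumed) threshold, otherwise admit."""
    return pre_congestion_level <= threshold
```

   With these assumed values, a fed-back level of 3% would be admitted
   and a level of 10% would be denied.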

523	   Once a reservation is admitted, its traffic will always receive low
524	   delay service for the duration of the reservation.  This is because
525	   ingress gateways ensure that traffic not under a reservation cannot
526	   pass into the Diffserv region with the CL DSCP set.  So non-reserved
527	   traffic will always be treated with a lower priority PHB at each
528	   interior router.  And even if some disaster re-routes traffic after
529	   it has been admitted, if the traffic through any resource tips over a
530	   fail-safe threshold, pre-congestion notification will trigger flow
531	   pre-emption to very quickly bring every router within the whole
532	   Diffserv region back below its operating point.

534	   The whole admission control system just described deliberately
535	   confines per-flow processing to the access edges of the network,
536	   where it will not limit the system's scalability.  But ideally we
537	   want to extend this approach to multiple networks, to take even more
538	   advantage of its scaling potential.  We would still need per-flow
539	   processing at the access edges of each network, but not at the high
540	   speed interfaces where they interconnect.  Even though such an
541	   admission control system would work technically, it would gain us no
542	   scaling advantage if each network also wanted to police the rate of
543	   each admitted flow for itself--border routers would still have to do
544	   complex packet operations per-flow anyway, given they don't trust
545	   upstream networks to do their policing for them.

547	   This memo describes how to emulate per-flow rate policing using bulk
548	   mechanisms at border routers, so the full scalability potential of
549	   pre-congestion notification is not limited by the need for per-flow
550	   policing mechanisms at borders, which would make borders the most
551	   cost-critical pinch-points.  Then we can achieve the long sought-for
552	   vision of secure Internet-wide bandwidth reservations without needing
553	   per-flow processing at all in core and border routers--where
554	   scalability is most critical.

556	4.  Re-ECN Protocol for an RSVP (or similar) Transport

558	4.1.  Protocol Overview

560	   First we need to recap the way routers accumulate congestion marking
561	   along a path.  Each ECN-capable router marks some packets with CE,
562	   the marking probability increasing with the length of the queue at
563	   its egress link.  The only difference with pre-congestion
564	   marking [PCN] is that marking is based on the length of a virtual
565	   queue, so that the real queue occupancy can remain very low.  We will
566	   use the terms congestion and pre-congestion interchangeably in the
567	   following unless it is important to distinguish between them.

569	   With multiple ECN-capable routers on a path, the ECN field
570	   accumulates the fraction of CE marking that each router adds.  The
571	   combined effect of the packet marking of all the routers along the
572	   path signals congestion of the whole path to the receiver.  So, for
573	   example, if one router early in a path is marking 1% of packets and
574	   another later in a path is marking 2%, flows that pass through both
575	   routers will experience approximately 3% marking.
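
   The result is only approximately 3% because marking fractions combine
   multiplicatively rather than additively.  A minimal Python sketch of
   the arithmetic follows, assuming each router marks independently and
   that a packet, once marked, stays marked.  The function name is
   hypothetical.

```python
def path_marking(fractions):
    """Combined marking fraction seen by the receiver, given the
    marking fraction applied by each router along the path."""
    unmarked = 1.0
    for f in fractions:
        unmarked *= 1.0 - f   # fraction that survives this router unmarked
    return 1.0 - unmarked
```

   For the example above, path_marking([0.01, 0.02]) gives about 0.0298,
   that is, just under 3%.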

577	   The packets crossing an inter-domain trust boundary within the
578	   Diffserv region will all have come from different ingress gateways
579	   and will all be destined for different egress gateways.  We will show
580	   that the key to policing against theft of service is for a border
581	   router to be able to directly measure the congestion that is about to
582	   be caused by the traffic it forwards.  That is, it can measure
583	   locally the congestion on each of the downstream paths between itself
584	   and the egress gateways that its traffic is destined for.

586	   With the original ECN protocol, if CE markings crossing the border
587	   had been counted over a period, they would have represented the
588	   accumulated upstream congestion that had already been experienced by
589	   those packets.  The general idea of re-ECN is for the ingress gateway
590	   to continuously encode path congestion into the IP header where, in
591	   this case, `path' means from ingress to egress gateway.  Then at any
592	   point on that path (e.g. between domains A & B in Figure 2 below), IP
593	   headers can be monitored to subtract upstream congestion from
594	   expected path congestion in order to give the expected downstream
595	   congestion still to be experienced until the egress gateway.

597	   Importantly, it turns out that there is no need to monitor downstream
598	   congestion on a per-flow basis.  We will show that accounting for it
599	   in bulk across all flows will be sufficient.

601	                  _____________________________________
602	                _|__    ______    ______    ______    _|__
603	               |    |  |  A   |  |  B   |  |  C   |  |    |
604	               +----+  +-+  +-+  +-+  +-+  +-+  +-+  +----+
605	               |    |  |B|  |B|  |B|  |B|  |B|  |B|  |    |
606	               |Ingr|==|R|  |R|==|R|  |R|==|R|  |R|==|Egr |
607	               |G/W |  | |  | |: | |  | |  | |  | |  |G/W |
608	               +----+  +-+  +-+: +-+  +-+  +-+  +-+  +----+
609	               |    |  |      |: |      |  |      |  |    |
610	               |____|  |______|: |______|  |______|  |____|
611	                 |_____________:_______________________|
612	                               :
613	                 |             :                       |
614	                 |<-upstream-->:<-expected downstream->|
615	                 | congestion  :      congestion       |
616	                 |     u               v ~= p - u      |
617	                 |                                     |
618	                 |<--- expected path congestion, p --->|

620	                         Figure 2: Re-ECN concept

622	4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)

624	   In this section we define the names of the various codepoints of the
625	   re-ECN protocol when used with pre-congestion notification, deferring
626	   description of their semantics to the following sections.  But first
627	   we recap the re-ECN wire protocol proposed in [Re-TCP].

629	4.2.1.  Re-ECN Recap

631	   Re-ECN uses the two bit ECN field broadly as in RFC3168 [RFC3168].
632	   It also uses a new re-ECN extension (RE) flag.  The actual position
633	   of the RE flag is different between IPv4 & v6 headers so we will use
634	   an abstraction of the IPv4 and v6 wire protocols by just calling it
635	   the RE flag.  [Re-TCP] proposes using bit 48 (currently unused) in
636	   the IPv4 header for the RE flag, while for IPv6 it proposes an ECN
637	   extension header.

639	   Unlike the ECN field, the RE flag is intended to be set by the sender
640	   and remain unchanged along the path, although it can be read by
641	   network elements that understand the re-ECN protocol.  In the
642	   scenario used in this memo, the ingress gateway acts as a proxy for
643	   the sender, setting the RE flag as permitted in the specification of
644	   re-ECN.

646	   Note that general-purpose routers do not have to read the RE flag,
647	   only special policing elements at borders do.  And no general-purpose
648	   routers have to change the RE flag, although the ingress and egress
649	   gateways do because in the edge-to-edge deployment model we are
650	   using, they act as proxies for the endpoints.  Therefore the RE flag
651	   does not even have to be visible to interior routers.  So the RE flag
652	   has no implications on protocols like MPLS.  Congested label
653	   switching routers (LSRs) would have to be able to notify their
654	   congestion with an ECN/PCN codepoint in the MPLS shim [ECN-MPLS], but
655	   like any interior IP router, they can be oblivious to the RE flag,
656	   which need only be read by border policing functions.

658	   Although the RE flag is a separate, single bit field, it can be read
659	   as an extension to the two-bit ECN field; the three concatenated bits
660	   in what we will call the extended ECN field (EECN) make eight
661	   codepoints available.  When the RE flag setting is "don't care", we
662	   use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes
663	   the following six codepoint names for when there is a need to be more
664	   specific.

666	   +--------+-------------+-------+-------------+----------------------+
667	   |   ECN  | RFC3168     |   RE  | Extended    |    Re-ECN meaning    |
668	   |  field | codepoint   |  flag | ECN         |                      |
669	   |        |             |       | codepoint   |                      |
670	   +--------+-------------+-------+-------------+----------------------+
671	   |   00   | Not-ECT     |   0   | Not-RECT    |  Not re-ECN-capable  |
672	   |        |             |       |             |       transport      |
673	   |   00   | Not-ECT     |   1   | FNE         |     Feedback not     |
674	   |        |             |       |             |      established     |
675	   |   01   | ECT(1)      |   0   | Re-Echo     | Re-echoed congestion |
676	   |        |             |       |             |       and RECT       |
677	   |   01   | ECT(1)      |   1   | RECT        |    Re-ECN capable    |
678	   |        |             |       |             |       transport      |
679	   |   10   | ECT(0)      |   0   | ---         |    Legacy ECN use    |
680	   |        |             |       |             |        only          |
681	   |   10   | ECT(0)      |   1   | --CU--      |   Currently unused   |
682	   |        |             |       |             |                      |
683	   |   11   | CE          |   0   | CE(0)       |      Congestion      |
684	   |        |             |       |             |   experienced with   |
685	   |        |             |       |             |        Re-Echo       |
686	   |   11   | CE          |   1   | CE(-1)      |      Congestion      |
687	   |        |             |       |             |      experienced     |
688	   +--------+-------------+-------+-------------+----------------------+

690	      Table 1: Recap of Default Extended ECN Codepoints Proposed for
691	                                   Re-ECN

693	4.2.2.  Re-ECN Combined with Pre-Congestion Notification (re-PCN)

695	   As permitted by the ECN specification [RFC3168], a proposal is
696	   currently being advanced in the IETF to define different semantics
697	   for how routers might mark the ECN field of certain packets.  The
698	   idea is to be able to notify congestion when the router's load
699	   approaches a logical limit, rather than the physical limit of the
700	   line.  This new marking is called pre-congestion notification [PCN]
701	   and we will use the term PCN-enabled router for a router that can
702	   apply pre-congestion notification marking to the ECN fields of
703	   packets.

705	   [RFC3168] recommends that a packet's Diffserv codepoint should
706	   determine which type of ECN marking it receives.  A Diffserv per-hop
707	   behaviour (PHB) can specify that routers should apply pre-congestion
708	   notification marking to PCN-capable packets.  We will call this a
709	   PCN-enhanced PHB.  A PCN-capable packet must meet two conditions: it
710	   must carry a DSCP that maps to a PCN-enhanced PHB, and it must carry
711	   an ECN field that turns on PCN marking.

713	   As an example, the controlled load (CL) PHB might specify expedited
714	   forwarding as its scheduling behaviour and PCN marking as its
715	   congestion marking behaviour.  Then we would say the CL PHB is a PCN-
716	   enhanced PHB, and that packets with a DSCP that maps to the CL PHB
717	   and with ECN turned on are PCN-capable packets.

719	   [PCN] actually proposes that two logical limits should be used for
720	   pre-congestion notification, with the higher limit as a back-stop for
721	   dealing with anomalous events.  It envisages PCN being used for
722	   admission control of inelastic real-time traffic, so marking at the
723	   lower limit will trigger admission control, while at the higher limit
724	   it will trigger flow pre-emption.

726	   Because it needs two types of congestion marking, PCN seems to need
727	   five states: Not-ECT, ECT (ECN-capable transport), the ECN Nonce,
728	   Admission Marking (AM) and Flow Pre-emption Marking (PM).  [PCN]
729	   proposes various alternative encodings of the ECN field, attempting
730	   various compromises to fit these five states into the four available
731	   ECN codepoints.

733	   One of the five states to make room for is the ECN Nonce [RFC3540],
734	   but the capability we describe in this memo supersedes any need for
735	   the Nonce.  The ECN Nonce is an elegant scheme, but it only allows a
736	   sending node (or its proxy) to detect suppression of congestion
737	   marking in the feedback loop.  Thus the Nonce requires the sender or
738	   its proxy to be trusted to respond correctly to congestion.  But this
739	   is precisely the main cheat we want to protect against (as well as
740	   many others).

742	   One of the compromise protocol encodings that [PCN] explores
743	   ("Alternative 5") leaves out support for the ECN Nonce.  Therefore we
744	   use that one.  This encoding of PCN markings is shown on the left of
745	   Table 2.  Note that these codepoints of the ECN field only take on
746	   the semantics of pre-congestion notification if they are combined
747	   with a Diffserv codepoint that the operator has configured to cause
748	   PCN marking, by mapping it to a PCN-enhanced PHB.

750	   For the rest of this memo, we will not distinguish between Admission
751	   Marking and Pre-emption Marking unless we need to be specific.  We
752	   will call both "congestion marking".  With the above encoding,
753	   congestion marking can be read to mean any packet with the left-most
754	   bit of the ECN field set.

756	   The re-ECN protocol can be used to control misbehaving sources
757	   whether congestion is with respect to a logical threshold (PCN) or
758	   the physical line rate (ECN).  In either case the RE flag can be used
759	   to create an extended ECN field.  For PCN-capable packets, the 8
760	   possible encodings of this 3-bit extended ECN (EECN) field are
761	   defined on the right of Table 2 below.  The purposes of these
762	   different codepoints will be introduced in subsequent sections.

764	   +-------+-----------------+------+--------------+-------------------+
765	   |  ECN  | PCN codepoint   |  RE  | Extended ECN |   Re-ECN meaning  |
766	   | field | (Alternative 5) | flag | codepoint    |                   |
767	   +-------+-----------------+------+--------------+-------------------+
768	   |   00  | Not-ECT         |   0  | Not-RECT     |        Not        |
769	   |       |                 |      |              |   re-ECN-capable  |
770	   |       |                 |      |              |     transport     |
771	   |   00  | Not-ECT         |   1  | FNE          |    Feedback not   |
772	   |       |                 |      |              |    established    |
773	   |   01  | ECT(1)          |   0  | Re-Echo      |     Re-echoed     |
774	   |       |                 |      |              |   congestion and  |
775	   |       |                 |      |              |        RECT       |
776	   |   01  | ECT(1)          |   1  | RECT         |   Re-ECN capable  |
777	   |       |                 |      |              |     transport     |
778	   |   10  | AM              |   0  | AM(0)        | Admission Marking |
779	   |       |                 |      |              |    with Re-Echo   |
780	   |   10  | AM              |   1  | AM(-1)       | Admission Marking |
781	   |       |                 |      |              |                   |
782	   |   11  | PM              |   0  | PM(0)        |    Pre-emption    |
783	   |       |                 |      |              |    Marking with   |
784	   |       |                 |      |              |      Re-Echo      |
785	   |   11  | PM              |   1  | PM(-1)       |    Pre-emption    |
786	   |       |                 |      |              |      Marking      |
787	   +-------+-----------------+------+--------------+-------------------+

789	     Table 2: Extended ECN Codepoints if the Diffserv codepoint uses
790	                    Pre-congestion Notification (PCN)
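
   To make the encoding in Table 2 concrete, the following Python sketch
   decodes the extended ECN codepoint from the two-bit ECN field and the
   RE flag, and tests for congestion marking using the left-most bit of
   the ECN field as described above.  The names used are illustrative
   assumptions, not part of the protocol.

```python
# (ECN field, RE flag) -> extended ECN codepoint, copied from Table 2.
PCN_EECN = {
    (0b00, 0): "Not-RECT",
    (0b00, 1): "FNE",
    (0b01, 0): "Re-Echo",
    (0b01, 1): "RECT",
    (0b10, 0): "AM(0)",
    (0b10, 1): "AM(-1)",
    (0b11, 0): "PM(0)",
    (0b11, 1): "PM(-1)",
}


def eecn_codepoint(ecn, re_flag):
    """Name of the extended ECN codepoint of a PCN-capable packet."""
    return PCN_EECN[(ecn, re_flag)]


def is_congestion_marked(ecn):
    """True for Admission Marking or Pre-emption Marking, i.e. when the
    left-most bit of the ECN field is set."""
    return bool(ecn & 0b10)
```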

792	4.3.  Protocol Operation

794	4.3.1.  Protocol Operation for an Established Flow

796	   The re-ECN protocol involves a simple tweak to the action of the
797	   gateway at the ingress edge of the CL region.  In the deployment
798	   model just described [PCN-arch], for each active traffic aggregate
799	   across the CL region (CL-region-aggregate) the ingress gateway will
800	   hold a fairly recent Congestion-Level-Estimate that the egress
801	   gateway will have fed back to it, piggybacked on the signalling that
802	   sets up each flow.  For instance, one aggregate might have been
803	   experiencing 3% pre-congestion (that is, congestion marked octets
804	   whether Admission Marked or Pre-emption Marked).  In this case, the
805	   ingress gateway MUST clear the RE flag to "0" for the same percentage
806	   of octets of CL-packets (3%) and set it to "1" in the rest (97%).
807	   Appendix A.1 gives a simple pseudo-code algorithm that the ingress
808	   gateway may use to do this.

810	   The RE flag is set and cleared this way round for incremental
811	   deployment reasons (see [Re-TCP]).  To avoid confusion we will use
812	   the term `blanking' (rather than marking) when the RE flag is cleared
813	   to "0", so we will talk of the `RE blanking fraction' as the fraction
814	   of octets with the RE flag cleared to "0".
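
   One possible way to blank the required fraction of octets is sketched
   below in Python.  It is not the algorithm of Appendix A.1; it is a
   simple credit counter chosen for illustration, and the class and
   method names are assumptions.

```python
class ReBlanker:
    """Chooses the RE flag for each CL packet of one aggregate so that
    the blanked fraction of octets tracks the Congestion-Level-Estimate."""

    def __init__(self, level):
        self.level = level   # Congestion-Level-Estimate, e.g. 0.03 for 3%
        self.credit = 0.0    # octets of blanking owed so far

    def re_flag(self, octets):
        # Every packet earns blanking credit in proportion to its size.
        self.credit += self.level * octets
        if self.credit >= octets:
            self.credit -= octets
            return 0         # blank: clear the RE flag to "0"
        return 1             # otherwise set the RE flag to "1"
```

   With a 3% estimate and equal-sized packets, roughly one packet in
   every 33 or 34 is blanked.  Because the counter works in octets, the
   blanked fraction of octets never exceeds the estimate when packet
   sizes vary.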

816	       ^
817	       |
818	       |         RE blanking fraction
819	    3% |    +----------------------------+====+
820	       |    |                            |    |
821	    2% |    |                            |    |
822	       |    | congestion marking fraction|    |
823	    1% |    |     +----------------------+    |
824	       |    |     |                           |
825	    0% +----+=====+---------------------------+------>
826	            ^   <--A---> <---B---> <---C--->  ^        domain
827	            |     ^                      ^    |
828	        ingress   |                      |    egress
829	                1.00%                 2.00%          marking fraction

831	        Figure 3: Example Extended ECN codepoint Marking fractions
832	                                (Imprecise)

834	   Figure 3 illustrates our example.  The horizontal axis represents the
835	   index of each congestible resource (typically queues) along a path
836	   through the Internet.  The two superimposed plots show the fraction
837	   of each ECN codepoint observed along this path, assuming there are
838	   two congested routers somewhere within domains A and C.  Table 3
839	   below shows the downstream pre-congestion measured at various border
840	   observation points along the path.  Figure 4 (later) shows the same
841	   results of these subtractions, but in graphical form like the above
842	   figure.  The tabulated figures are actually reasonable approximations
843	   derived from more precise formulae given in Appendix A of [Re-TCP].
844	   The RE flag is not changed by interior routers, so it can be seen
845	   that it acts as a reference against which the congestion marking
846	   fraction can be compared along the path.

848	   +--------------------------+---------------------------------------+
849	   | Border observation point | Approximate Downstream pre-congestion |
850	   +--------------------------+---------------------------------------+
851	   |       ingress -- A       |              3% - 0% = 3%             |
852	   |          A -- B          |              3% - 1% = 2%             |
853	   |          B -- C          |              3% - 1% = 2%             |
854	   |        C -- egress       |              3% - 3% = 0%             |
855	   +--------------------------+---------------------------------------+

857	   Table 3: Downstream Congestion Measured at Example Observation Points

858	   Note that the ingress determines the RE blanking fraction for each
859	   aggregate using the most recent feedback from the relevant egress,
860	   arriving with each new reservation, or each refresh.  These updates
861	   arrive relatively infrequently compared to the speed with which
862	   congestion changes.  Although this feedback will always be out of
863	   date, on average positive errors should cancel out negative over a
864	   sufficiently long duration.

866	   In summary, the network adds pre-congestion marking in the forward
867	   data path, the egress feeds its level back to the ingress in RSVP (or
868	   similar signalling), then the ingress gateway re-echoes it into the
869	   forward data path by blanking the RE flag.  Hence the name re-ECN.
870	   Then at any border within the Diffserv region, the pre-congestion
871	   marking that every passing packet will be expected to experience
872	   downstream can be measured to be the RE blanking fraction minus the
873	   congestion marking fraction.
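
   The bulk measurement at a border can be sketched in Python as
   follows.  This is an illustration under simplifying assumptions:
   packets are given as hypothetical (ECN field, RE flag, octets) tuples
   that have already been classified as carrying a PCN DSCP, and packets
   with an ECN field of '00' (Not-RECT and FNE) are simply left out of
   the count.

```python
def downstream_pre_congestion(packets):
    """RE blanking fraction minus congestion marking fraction, measured
    in octets across all flows without any per-flow state."""
    total = blanked = marked = 0
    for ecn, re_flag, octets in packets:
        if ecn == 0b00:
            continue             # Not-RECT and FNE ignored in this sketch
        total += octets
        if re_flag == 0:
            blanked += octets    # Re-Echo, AM(0) or PM(0)
        if ecn & 0b10:
            marked += octets     # Admission or Pre-emption Marked
    if total == 0:
        return 0.0
    return (blanked - marked) / total
```

   Fed with traffic resembling the A -- B border of Table 3 (3% of
   octets blanked, 1% congestion marked), the function returns 2%.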

875	4.3.2.  Aggregate Bootstrap

877	   When a new reservation PATH message arrives at the egress, if there
878	   are currently no flows in progress from the same ingress, there will
879	   be no state maintaining the current level of pre-congestion marking
880	   for the aggregate.  While the reservation signalling continues onward
881	   towards the receiving host, the egress gateway returns an RSVP
882	   message to the ingress with a flag [RSVP-ECN] asking the ingress to
883	   send a specified number of data probes between them.  This bootstrap
884	   behaviour is all described in the deployment model [PCN-arch].

886	   However, with our new re-ECN scheme, the ingress does not know what
887	   proportion of the data probes should have the RE flag blanked,
888	   because it has no estimate yet of pre-congestion for the path across
889	   the Diffserv region.

891	   To be conservative, following the guidance for specifying other re-
892	   ECN transports in [Re-TCP], the ingress SHOULD set the FNE codepoint
893	   of the extended ECN header in all probe packets (Table 2).  As per
894	   the deployment model, the egress gateway measures the fraction of
895	   congestion-marked probe octets and feeds back the resulting pre-
896	   congestion level to the ingress, piggy-backed on the returning
897	   reservation response (RESV) for the new flow.  Probe packets are
898	   identifiable by the egress because they have the ingress as the
899	   source and the egress as the destination in the IP header.

901	   It may seem inadvisable to expect the FNE codepoint to be set on
902	   probes, given legacy firewalls etc. might discard such packets
903	   (because this flag had no previous legitimate use).  However, in the
904	   deployment scenarios envisaged, each domain in the Diffserv region
905	   has to be explicitly configured to support the controlled load
906	   service.  So, before deploying the service, the operator MUST
907	   reconfigure such a misbehaving middlebox to allow through packets
908	   with the RE flag set.

910	   Note that we have said SHOULD rather than MUST for the FNE setting
911	   behaviour of the ingress for probe packets.  This entertains the
912	   possibility of an ingress implementation having the benefit of other
913	   knowledge of the path, which it re-uses for a newly starting
914	   aggregate.  For instance, it may hold cached information from a
915	   recent use of the aggregate that is still sufficiently current to be
916	   useful.

918	   It might seem pedantic worrying about these few probe packets, but
919	   this behaviour ensures the system is safe, even if the proportion of
920	   probe packets becomes large.

922	4.3.3.  Flow Bootstrap

924	   It might be expected that a new flow within an active aggregate would
925	   need no special bootstrap behaviour.  If there was an aggregate
926	   already in progress between the gateways the new flow was about to
927	   use, it would inherit the prevailing RE blanking fraction.  And if
928	   there were no active aggregate, the bootstrap behaviour for an
929	   aggregate would be appropriate and sufficient for the new flow.

931	   However, for a number of reasons, at least the first packet of each
932	   new flow SHOULD be set to the FNE codepoint, irrespective of whether
933	   it is joining an active aggregate or not.  If the first packet is
934	   unlikely to be reliably delivered, a number of FNE packets MAY be
935	   sent to increase the probability that at least one is delivered to
936	   the egress gateway.

938	   If each flow does not start with an FNE packet, it will be seen later
939	   that sanctions may be too strict at the interface before the egress
940	   gateway.  It will often be possible to apply sanctions at the
941	   granularity of aggregates rather than flows, but in an internetworked
942	   environment it cannot be guaranteed that aggregates will be
943	   identifiable in remote networks.  So setting FNE at the start of each
944	   flow is a safe strategy.  For instance, a remote network may have
945	   equal cost multi-path (ECMP) routing enabled, causing different flows
946	   between the same gateways to traverse different paths.

948	   After an idle period of more than 1 second, the ingress gateway
949	   SHOULD set the EECN field of the next packet it sends to FNE.  This
950	   allows the design of network policers to be deterministic (see
951	   [Re-TCP]).

953	   However, if the ingress gateway can guarantee that the network(s)
954	   that will carry the flow to its egress gateway all use a common
955	   identifier for the aggregate (e.g. a single MPLS network without ECMP
956	   routing), it MAY omit FNE when it adds a new flow to an active
957	   aggregate.  And an FNE packet need only be sent if a whole aggregate
958	   has been idle for more than 1 second.
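
   The rules in this section can be summarised by the following Python
   sketch of the ingress gateway's choice.  The function and parameter
   names are assumptions.  The caller is assumed to measure idle time
   per flow in the general case, or per aggregate when all networks on
   the path use a common aggregate identifier, and to pass None when
   nothing has yet been sent.

```python
IDLE_LIMIT = 1.0   # seconds, taken from the text above


def needs_fne(is_first_packet_of_flow, idle_seconds,
              common_aggregate_id=False):
    """Should the next packet carry the FNE codepoint?"""
    # Nothing sent yet, or idle for more than 1 second: send FNE.
    if idle_seconds is None or idle_seconds > IDLE_LIMIT:
        return True
    # A new flow joining an active aggregate starts with FNE unless the
    # aggregate is known to be identifiable along the whole path.
    return is_first_packet_of_flow and not common_aggregate_id
```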

960	4.3.4.  Router Forwarding Behaviour

962	   Adding re-ECN works well without modifying the forwarding behaviour
963	   of any routers.  However, below, two changes are proposed when
964	   forwarding packets with a per-hop-behaviour that requires pre-
965	   congestion notification:

967	   Preferential drop:  When a router cannot avoid dropping ECN-capable
968	      packets, preferential dropping of packets with different extended
969	      ECN codepoints SHOULD be implemented between packets within a PHB
970	      that uses PCN marking.  The drop preference order to use is
971	      defined in Table 4.  Note that to reduce configuration complexity,
972	      Re-Echo and FNE MAY be given the same drop preference, but if
973	      feasible, FNE should be dropped in preference to Re-Echo.

975	   +---------+-------+----------------+---------+----------------------+
976	   |   ECN   |   RE  | Extended ECN   | Drop    |    Re-ECN meaning    |
977	   |  field  |  flag | codepoint      | Pref    |                      |
978	   +---------+-------+----------------+---------+----------------------+
979	   |    01   |   0   | Re-Echo        | 5/4     | Re-echoed congestion |
980	   |         |       |                |         |       and RECT       |
981	   |    00   |   1   | FNE            | 4       |     Feedback not     |
982	   |         |       |                |         |      established     |
983	   |    01   |   1   | RECT           | 3       |    Re-ECN capable    |
984	   |         |       |                |         |       transport      |
985	   |    10   |   0   | AM(0)          | 3       |   Admission Marking  |
986	   |         |       |                |         |     with Re-Echo     |
987	   |    10   |   1   | AM(-1)         | 3       |   Admission Marking  |
988	   |         |       |                |         |                      |
989	   |    11   |   0   | PM(0)          | 2       |  Pre-emption Marking |
990	   |         |       |                |         |     with Re-Echo     |
991	   |    11   |   1   | PM(-1)         | 2       |  Pre-emption Marking |
992	   |         |       |                |         |                      |
993	   |    00   |   0   | Not-RECT       | 1       |  Not re-ECN-capable  |
994	   |         |       |                |         |       transport      |
995	   +---------+-------+----------------+---------+----------------------+

997	      Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st)

998	      Given this proposal is being advanced at the same time as PCN
999	      itself, it is strongly RECOMMENDED that preferential drop based on
1000	      extended ECN codepoint be added to router forwarding at the same
1001	      time as PCN marking.  Preferential dropping can be difficult to
1002	      implement, but this security-related re-ECN improvement is
1003	      RECOMMENDED where feasible, as it is an effective defence against
1004	      flooding attacks.
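
      As an illustration of how Table 4 might be applied, the Python
      sketch below picks the packet to discard from a queue represented
      as a hypothetical list of extended ECN codepoint names.  Re-Echo
      is given preference 5 here; Table 4 also allows it to share
      preference 4 with FNE.  A real router would not search its queue
      this way; the sketch only shows the ordering.

```python
# Drop preference from Table 4 (1 = drop first).
DROP_PREF = {
    "Not-RECT": 1,
    "PM(0)": 2, "PM(-1)": 2,
    "AM(0)": 3, "AM(-1)": 3, "RECT": 3,
    "FNE": 4,
    "Re-Echo": 5,
}


def pick_victim(queue):
    """Index of the packet to drop when a drop cannot be avoided: the
    earliest queued packet with the lowest preference number."""
    return min(range(len(queue)), key=lambda i: DROP_PREF[queue[i]])
```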

1006	   Marking vs. Drop:  We propose that PCN-enabled routers SHOULD inspect
1007	      the RE flag as well as the ECN field to decide whether to drop or
1008	      mark packets carrying a PCN DSCP.  They MUST choose drop if the
1009	      extended ECN codepoint is Not-RECT.  Otherwise they SHOULD mark
1010	      (unless, of course, buffer space is exhausted).

1012	      A PCN-capable router MUST NOT ever congestion mark a packet
1013	      carrying the Not-RECT codepoint because the transport will only
1014	      understand drop, not congestion marking.  But a PCN-capable router
1015	      can mark rather than drop an FNE packet, even though its ECN field
1016	      when looked at in isolation is '00' which appears to be a legacy
1017	      Not-ECT packet.  Therefore, if a packet's RE flag is '1', even if
1018	      its ECN field is '00', a PCN-enabled router SHOULD use congestion
1019	      marking.  This allows the `feedback not established' (FNE)
1020	      codepoint to be used for probe packets, in order to pick up PCN
1021	      marking when bootstrapping an aggregate.

1023	      ECN marking rather than dropping of FNE packets MUST only be
1024	      deployed in controlled environments, such as that in [PCN-arch],
1025	      where the presence of an egress node that understands ECN marking
1026	      is assured.  Congestion events might otherwise be ignored if the
1027	      receiver only understands drop, rather than ECN marking.  This is
1028	      because there is no guarantee that ECN capability has been
1029	      negotiated if feedback is not established (FNE).  Also, [Re-TCP]
1030	      places the strong condition that a router MUST apply drop rather
1031	      than marking to FNE packets unless it can guarantee that FNE
1032	      packets are rate limited either locally or upstream.
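
      As a non-normative illustration, the marking-versus-drop rules
      above can be sketched in Python.  The function name, its
      parameters and the fne_marking_allowed switch are assumptions made
      for this sketch, not part of the re-ECN specification.  The switch
      stands for a router in a controlled environment where FNE packets
      are known to be rate limited.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    MARK = "mark"
    DROP = "drop"


def congestion_action(ecn, re_flag, congested,
                      buffer_full=False, fne_marking_allowed=False):
    """Decide what a PCN-capable router does with one packet.

    ecn is the 2-bit ECN field as a string ("00".."11") and re_flag is
    0 or 1.  congested means the router's own algorithm has decided to
    signal congestion on this packet (hypothetical input).
    """
    if buffer_full:
        # No buffer space left: drop whatever the codepoint is.
        return Action.DROP
    if not congested:
        return Action.FORWARD
    if ecn == "00" and re_flag == 0:
        # Not-RECT: the transport only understands drop.
        return Action.DROP
    if ecn == "00" and re_flag == 1:
        # FNE: mark only where an ECN-aware egress and FNE rate
        # limiting are assured, otherwise fall back to drop.
        return Action.MARK if fne_marking_allowed else Action.DROP
    # Every other extended ECN codepoint is re-ECN capable: mark.
    return Action.MARK
```

      The default of dropping FNE packets reflects the stricter
      condition quoted from [Re-TCP]; a PCN deployment would set the
      switch once its operator is satisfied the conditions hold.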

1034	4.3.5.  Extensions

1036	   If a different signalling system, such as NSIS, were used, but it
1037	   provided admission control in a similar way, using pre-congestion
1038	   notification (e.g. with RMD [NSIS-RMD]), we believe re-ECN could be
1039	   used to protect against misbehaving networks in the same way as
1040	   proposed above.

1042	5.  Emulating Border Policing with Re-ECN

1044	   Note that the re-ECN protocol described in Section 4 above would
1045	   require standardisation, whereas operators acting in their own
1046	   interests would be expected to deploy policing and monitoring
1047	   functions similar to those proposed in the sections below without any
1048	   further need for standardisation by the IETF.  Flexibility is
1049	   expected in exactly how policing and monitoring are done.

1051	5.1.  Informal Terminology

1053	   In the rest of this memo, where the context makes it clear, we will
1054	   sometimes loosely use the term `congestion' rather than using the
1055	   stricter `downstream pre-congestion'.  Also we will loosely talk of
1056	   positive or negative flows, meaning flows where the moving average of
1057	   the downstream pre-congestion metric is persistently positive or
1058	   negative.  The notion of a negative metric arises because it is
1059	   derived by subtracting one metric from another.  Of course actual
1060	   downstream congestion cannot be negative, only the metric can
1061	   (whether due to time lags or deliberate malice).

1063	   Just as we will loosely talk of positive and negative flows, we will
1064	   also talk of positive or negative packets, meaning packets that
1065	   contribute positively or negatively to downstream pre-congestion.

1067	   Therefore packets can be considered to have a `worth' of +1, 0 or -1,
1068	   which, when multiplied by their size, indicates their contribution to
1069	   downstream congestion.  Packets will usually be sent with a worth of
1070	   0.  Blanking the RE flag increments the worth of a packet to +1.
1071	   Congestion marking a packet decrements its worth (whether admission
1072	   marking or pre-emption marking).  Congestion marking a previously
1073	   blanked packet cancels out the positive and negative worth of each
1074	   marking (a worth of 0).  The FNE codepoint is an exception.  It has
1075	   the same positive worth as a packet with the Re-Echo codepoint.  The
1076	   table below specifies unambiguously the worth of each extended ECN
1077	   codepoint.  Note the order is different from the previous table to
1078	   emphasise how congestion marking processes decrement the worth.

1080	   +---------+-------+-----------------+-------+-----------------------+
1081	   |   ECN   |   RE  | Extended ECN    | Worth |     Re-ECN meaning    |
1082	   |  field  |  flag | codepoint       |       |                       |
1083	   +---------+-------+-----------------+-------+-----------------------+
1084	   |    00   |   0   | Not-RECT        | n/a   |   Not re-ECN-capable  |
1085	   |         |       |                 |       |       transport       |
1086	   |    01   |   0   | Re-Echo         | +1    |  Re-echoed congestion |
1087	   |         |       |                 |       |        and RECT       |
1088	   |    10   |   0   | AM(0)           | 0     |   Admission Marking   |
1089	   |         |       |                 |       |      with Re-Echo     |
1090	   |    11   |   0   | PM(0)           | 0     |  Pre-emption Marking  |
1091	   |         |       |                 |       |      with Re-Echo     |
1092	   |    00   |   1   | FNE             | +1    |      Feedback not     |
1093	   |         |       |                 |       |      established      |
1094	   |    01   |   1   | RECT            | 0     |     Re-ECN capable    |
1095	   |         |       |                 |       |       transport       |
1096	   |    10   |   1   | AM(-1)          | -1    |   Admission Marking   |
1097	   |         |       |                 |       |                       |
1098	   |    11   |   1   | PM(-1)          | -1    |  Pre-emption Marking  |
1099	   +---------+-------+-----------------+-------+-----------------------+

1101	                Table 5: 'Worth' of Extended ECN Codepoints
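
   As a non-normative illustration, Table 5 can be written as a lookup
   keyed on the ECN field and RE flag, with a packet's contribution to
   downstream pre-congestion computed as worth multiplied by size.  The
   names WORTH and contribution are assumptions of this sketch, as is
   the choice to count Not-RECT packets (worth n/a) as contributing
   nothing.

```python
# Worth of each extended ECN codepoint, keyed by (ECN field, RE flag).
WORTH = {
    ("00", 0): None,  # Not-RECT: worth not applicable
    ("01", 0): +1,    # Re-Echo
    ("10", 0): 0,     # AM(0)
    ("11", 0): 0,     # PM(0)
    ("00", 1): +1,    # FNE
    ("01", 1): 0,     # RECT
    ("10", 1): -1,    # AM(-1)
    ("11", 1): -1,    # PM(-1)
}


def contribution(ecn, re_flag, size_octets):
    """Signed octets this packet adds to downstream pre-congestion."""
    worth = WORTH[(ecn, re_flag)]
    if worth is None:
        # Assumption: packets outside re-ECN are left out of the metric.
        return 0
    return worth * size_octets
```

   For example, a 1500 octet Re-Echo packet contributes +1500, and the
   same packet contributes 0 once it has been admission marked to
   AM(0).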

1103	5.2.  Policing Overview

1105	   It will be recalled that downstream congestion can be found by
1106	   subtracting upstream congestion from path congestion.  Figure 4
1107	   displays the difference between the two plots in Figure 3 to show
1108	   downstream pre-congestion across the same path through the Internet.

1110	   To emulate border policing, the general idea is for each domain to
1111	   apply penalties to its upstream neighbour in proportion to the amount
1112	   of downstream pre-congestion that the upstream network sends across
1113	   the border.  That is, the penalties should be in proportion to the
1114	   height of the plot.  Downward arrows in the figure show the resulting
1115	   pressure for each domain to under-declare downstream pre-congestion
1116	   in traffic they pass to the next domain, because of the penalties.

1118	               p e n a l t i e s
1119	              /        |        \
1120	       A     :         :         :
1121	       |     |  <--A---> <---B---> <---C--->           domain
1122	       |     V         :         :         :
1123	    3% |    +-----+    |         |         :
1124	       |    |     |    V         V         :
1125	    2% |    |     +----------------------+ :
1126	       |    |  downstream pre-congestion | :
1127	    1% |    |     :                      | :
1128	       |    |     :                      | :
1129	    0% +----+----------------------------+====+------>
1130	            :     :                      : A  :
1131	            :     :                      : |  :
1132	        ingress   :                      : :  egress
1133	                1.00%                 2.00%:         pre-congestion
1134	                                           |
1135	                                       sanctions

1137	   Figure 4: Policing Framework, showing creation of opposing pressures
1138	    to under-declare and over-declare downstream pre-congestion, using
1139	                          penalties and sanctions

1141	   These penalties seem to encourage everyone to understate downstream
1142	   congestion in order to reduce the penalties they incur.  But a
1143	   balancing pressure is introduced by the last domain, which applies
1144	   sanctions to flows if downstream congestion goes negative before the
1145	   egress gateway.  The upward arrow at Domain C's border with the
1146	   egress gateway represents the incentive the sanctions would create to
1147	   prevent negative traffic.  The same upward pressure can be applied at
1148	   any domain border (arrows not shown).

1150	   Any flow that persistently goes negative by the time it leaves a
1151	   domain must not have been marked correctly in the first place.  A
1152	   domain that discovers such a flow can adopt a range of strategies to
1153	   protect itself.  Which strategy it uses will depend on policy,
1154	   because it cannot immediately assume malice--there may be an innocent
1155	   configuration error somewhere in the system.

1157	   This memo does not propose to standardise any particular mechanism to
1158	   detect persistently negative flows, but Section 5.5 does give
1159	   examples.  Note that we have used the term flow, but there will be no
1160	   need to burrow into the transport layer for port numbers; identifiers
1161	   visible in the network layer will be sufficient (IP address pair,
1162	   DSCP, protocol ID).  The appendix also gives a mechanism to bound the
1163	   required flow state, preventing state exhaustion attacks.

1165	   Of course, some domains may trust other domains to comply with
1166	   admission control without applying sanctions or penalties.  In these
1167	   cases, the protocol should still be used but no penalties need be
1168	   applied.  The re-ECN protocol ensures downstream pre-congestion
1169	   marking is passed on correctly whether or not penalties are applied
1170	   to it, so the system works just as well with a mixture of some
1171	   domains trusting each other and others not.

1173	   Providers should be free to agree the contractual terms they wish
1174	   between themselves, so this memo does not propose to standardise how
1175	   these penalties would be applied.  It is sufficient to standardise
1176	   the re-ECN protocol so the downstream pre-congestion metric is
1177	   available if providers choose to use it.  However, the next section
1178	   (Section 5.3) gives some examples of how these penalties might be
1179	   implemented.

1181	5.3.  Pre-requisite Contractual Arrangements

1183	   The re-ECN protocol has been chosen to solve the policing problem
1184	   because it embeds a downstream pre-congestion metric in passing CL
1185	   traffic that is difficult to lie about and can be measured in bulk.
1186	   The ability to emulate border policing depends on network operators
1187	   choosing to use this metric as one of the elements in their contracts
1188	   with each other.

1190	   Already many inter-domain agreements involve a capacity and a usage
1191	   element.  The usage element may be based on volume or various
1192	   measures of peak demand.  We expect that those network operators who
1193	   choose to use pre-congestion notification for admission control would
1194	   also be willing to consider using this downstream pre-congestion
1195	   metric as a usage element in their interconnection contracts for
1196	   admission controlled (CL) traffic.

1198	   Congestion (or pre-congestion) has the dimension of [octet], being
1199	   the product of volume transferred [octet] and the congestion fraction
1200	   [dimensionless], which is the fraction of the offered load that the
1201	   network isn't able to serve (or would rather not serve in the case of
1202	   pre-congestion).  Measuring downstream congestion gives a measure of
1203	   the volume transferred but modulated by congestion expected
1204	   downstream.  So volume transferred during off-peak periods counts as
1205	   nearly nothing, while volume transferred at peak times counts very
1206	   highly.  The re-ECN protocol allows one network to measure how much
1207	   pre-congestion has been `dumped' into it by another network and, in
1208	   turn, how much of that pre-congestion it dumped into the next
1209	   downstream network.

1211	   Section 5.6 describes mechanisms for calculating border penalties,
1212	   referring to Appendix A.2 for suggested metering algorithms for
1213	   downstream congestion at a border router.  Conceptually, it could
1214	   hardly be simpler.  It broadly involves accumulating the volume of
1215	   packets with the RE flag blanked and the volume of those with
1216	   congestion marking, then subtracting the latter from the former.

1218	   Once this downstream pre-congestion metric is available, operators
1219	   are free to choose how they incorporate it into their interconnection
1220	   contracts [IXQoS].  Some may include a threshold volume of pre-
1221	   congestion as a quality measure in their service level agreement,
1222	   perhaps with a penalty clause if the upstream network exceeds this
1223	   threshold over, say, a month.  Others may agree a set of tiered
1224	   monthly thresholds, with increasing penalties as each threshold is
1225	   exceeded.  But, it would be just as easy, and more resistant to
1226	   gaming, to do away with discrete thresholds, and instead make the
1227	   penalty rise smoothly with the volume of pre-congestion by applying a
1228	   price to pre-congestion itself.  Then the usage element of the
1229	   interconnection contract would directly relate to the volume of pre-
1230	   congestion caused by the upstream network.

1232	   The direction of penalties and charges relative to the direction of
1233	   traffic flow is a constant source of confusion.  Typically, where
1234	   capacity charges are concerned, lower tier customer networks pay
1235	   higher tier provider networks.  So money flows from the edges to the
1236	   middle of the internetwork, towards greater connectivity,
1237	   irrespective of the flow of data.  But we advise that penalties or
1238	   charges for usage should follow the same direction as the data flow--
1239	   the direction of control at the network layer.  Otherwise a network
1240	   lays itself open to `denial of funds' attacks.  So, where a tier 2
1241	   provider sends data into a tier 3 customer network, we would expect
1242	   the penalty clauses for sending too much pre-congestion to be against
1243	   the tier 2 network, even though it is the provider.

1245	   It may help to remember that data will be flowing in the other
1246	   direction too.  So the provider network has as much opportunity to
1247	   levy usage penalties as its customer, and it can set the price or
1248	   strength of its own penalties higher if it chooses.  Usage charges in
1249	   both directions tend to cancel each other out, which confirms that
1250	   usage-charging is less to do with revenue raising and more to do with
1251	   encouraging load control discipline in order to smooth peaks and
1252	   troughs, improving utilisation and quality.

1254	   Further, when operators agree penalties in their interconnection
1255	   contracts for sending downstream congestion, they should make sure
1256	   that any level of negative marking only equates to zero penalty.  In
1257	   other words, penalties are always paid in the same direction as the
1258	   data, and never against the data flow, even if downstream congestion
1259	   seems to be negative.  This is consistent with the definition of
1260	   physical congestion; when a resource is underutilised, it is not
1261	   negatively congested.  Its congestion is just zero.  So, although
1262	   short periods of negative marking can be tolerated to correct
1263	   temporary over-declarations due to lags in the feedback system,
1264	   persistent downstream negative congestion can have no physical
1265	   meaning and therefore must signify a problem.  The incentive for
1266	   domains not to tolerate persistently negative traffic depends on this
1267	   principle that penalties must never be paid against the data flow.

1269	   Also note that at the last egress of the Diffserv region, domain C
1270	   should not agree to pay any penalties to the egress gateway for pre-
1271	   congestion passed to the egress gateway.  Downstream pre-congestion
1272	   to the egress gateway should have reached zero here.  If domain C
1273	   were to agree to pay for any remaining downstream pre-congestion, it
1274	   would give the egress gateway an incentive to over-declare pre-
1275	   congestion feedback and take the resulting profit from domain C.

1277	   To focus the discussion, from now on, unless otherwise stated, we
1278	   will assume a downstream network charges its upstream neighbour in
1279	   proportion to the pre-congestion it sends (V_b in the notation of
1280	   Appendix A.2).  Effectively, tiered thresholds would be just more
1281	   coarse-grained approximations of the fine-grained case we choose to
1282	   examine.  If these neighbours had previously agreed that the (fixed)
1283	   price per octet of pre-congestion would be L, then the bill at the
1284	   end of the month would simply be the product L*V_b, plus any fixed
1285	   charges they may also have agreed.
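
   As a non-normative illustration, the bill for one accounting period
   under this model can be sketched as follows.  The function and
   parameter names are assumptions of this sketch.  The floor at zero
   reflects the principle stated earlier in this section that any level
   of negative marking equates to zero penalty.

```python
def monthly_bill(v_b_octets, price_per_octet, fixed_charges=0):
    """Charge from a downstream network to its upstream neighbour.

    v_b_octets      -- metered downstream pre-congestion volume (V_b)
    price_per_octet -- agreed price L per octet of pre-congestion
    fixed_charges   -- any fixed charges the two networks agreed
    """
    # Penalties are never paid against the data flow, so a negative
    # V_b is billed as zero rather than as a credit.
    usage_charge = price_per_octet * max(0, v_b_octets)
    return usage_charge + fixed_charges
```

   With hypothetical values V_b = 2,000,000 octets, L = 3 and fixed
   charges of 500 (in arbitrary currency units), the bill is
   3 * 2,000,000 + 500 = 6,000,500.  A persistently negative V_b would
   leave only the fixed charges.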

1287	   We are well aware that the IETF tries to avoid standardising
1288	   technology that depends on a particular business model.  Indeed, this
1289	   principle is at the heart of all our own work.  Our aim here is to
1290	   make a new metric available that we believe is superior to all
1291	   existing metrics.  Then, our aim is to show that border policing can
1292	   at least work with the one model we have just outlined.  We assume
1293	   that operators might then experiment with the metric in other models.
1294	   Of course, operators are free to complement this pre-congestion-based
1295	   usage element of their charges with traditional capacity charging,
1296	   and we expect they will.

1298	   Also note well that everything we discuss in this memo only concerns
1299	   interconnection within the Diffserv region.  ISPs are free to sell or
1300	   give away reservations however they want on the retail market.  But
1301	   of course, interconnection charges will have a bearing on that.
1302	   Indeed, in the present scenario, the ingress gateway effectively
1303	   sells reservations on one side and buys congestion penalties on the
1304	   other.  As congestion rises, one can imagine the gateway discovering
1305	   that congestion penalties have risen higher than the (probably fixed)
1306	   revenue it will earn from selling the next flow reservation.  This
1307	   encourages the gateway to cut its losses by blocking new calls, which
1308	   is why we believe downstream congestion penalties can emulate per-
1309	   flow rate policing at borders, as the next section explains.

1311	5.4.  Emulation of Per-Flow Rate Policing: Rationale and Limits

1313	   The important feature of charging in proportion to congestion volume
1314	   is that the penalty aggregates and disaggregates correctly along with
1315	   packet flows.  This is because the penalty rises linearly with bit
1316	   rate (unless congestion is absolutely zero) and linearly with
1317	   congestion, because it is the product of them both.  So if the
1318	   packets crossing a border belong to a thousand flows, and one of
1319	   those flows doubles its rate, the ingress gateway forwarding that
1320	   flow will have to put twice as much congestion marking into the
1321	   packets of that flow.  And this extra congestion marking will add
1322	   proportionately to the penalties levied at every border the flow
1323	   crosses in proportion to the amount of pre-congestion remaining on
1324	   the path.

1326	   Effectively, usage charges will continuously flow from ingress
1327	   gateways to the places generating pre-congestion marking, in
1328	   proportion to the pre-congestion marking introduced and to the data
1329	   rates from those gateways.

1331	   As importantly, pre-congestion itself rises super-linearly with
1332	   utilisation of a particular resource.  So if someone tries to push
1333	   another flow into a path that is already signalling enough pre-
1334	   congestion to warrant admission control, the penalty will be a lot
1335	   greater than it would have been to add the same flow to a less
1336	   congested path.  This makes the incentive system fairly insensitive
1337	   to the actual level of pre-congestion for triggering admission
1338	   control that each ingress chooses.  The deterrent against exceeding
1339	   whatever threshold is chosen rises very quickly with a small amount
1340	   of cheating.

1342	   These are the properties that allow re-ECN to emulate per-flow border
1343	   policing of both rate and admission control.  It is not a perfect
1344	   emulation of per-flow border policing, but we claim it is sufficient
1345	   to at least ensure the cost to others of a cheat is borne by the
1346	   cheater, because the penalties are at least proportionate to the
1347	   level of the cheat.  If an edge network operator is selling
1348	   reservations at a large profit over the congestion cost, these pre-
1349	   congestion penalties will not be sufficient to ensure networks in the
1350	   middle get a share of those profits, but at least they can cover
1351	   their costs.

1353	   We will now explain with an example.  When a whole inter-network is
1354	   operating at normal (typically very low) congestion, the pre-
1355	   congestion marking from virtual queues will be a little higher than
1356	   if the real queues had been used--still low, but more noticeable.
1357	   But low congestion levels do not imply that usage _charges_ must also
1358	   be low.  Usage charges will depend on the _price_ L as well.

1360	   If the metric of the usage element of an interconnection agreement
1361	   was changed from pure volume to pre-congested volume, one would
1362	   expect the price of pre-congestion to be arranged so that the total
1363	   usage charge remained about the same.  So, if an average pre-
1364	   congestion fraction turned out to be 1/1000, one would expect that
1365	   the price L (per octet) of pre-congestion would be about 1000 times
1366	   the previously used (per octet) price for volume.  We should add that
1367	   a switch to pre-congestion is unlikely to exactly maintain the same
1368	   overall level of usage charges, but this argument will be
1369	   approximately true, because usage charge will rise to at least the
1370	   level the market finds necessary to push back against usage.
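
   The arithmetic behind this example can be checked with a short,
   non-normative Python sketch.  The volume price and the volume sent
   are hypothetical values chosen only to make the numbers easy to
   follow; the 1/1000 pre-congestion fraction is the one used in the
   text.

```python
from fractions import Fraction


def congestion_price(volume_price, avg_congestion_fraction):
    """Price L per octet of pre-congestion that keeps the total usage
    charge equal to the previous pure volume charge."""
    return volume_price / avg_congestion_fraction


volume_price = Fraction(1, 10**9)   # hypothetical price per octet of volume
fraction = Fraction(1, 1000)        # average pre-congestion fraction
volume = 10**12                     # hypothetical octets sent in the period

L = congestion_price(volume_price, fraction)
volume_charge = volume_price * volume
congestion_charge = L * (volume * fraction)
```

   Here L comes out at exactly 1000 times the volume price, and both
   ways of charging give the same total for the period.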

1372	   From the above example it can be seen why a 1000x higher price will
1373	   make operators become acutely sensitive to the congestion they cause
1374	   in other networks, which is of course the desired effect; to
1375	   encourage networks to _control_ the congestion they allow their users
1376	   to cause to others.

1378	   If any network sends even one flow at a higher rate, it will
1379	   immediately have to pay proportionately more usage charges.  Because
1380	   there is no knowledge of reservations within the Diffserv region, no
1381	   interior router can police whether the rate of each flow is greater
1382	   than each reservation.  So the system doesn't truly emulate rate-
1383	   policing of each flow.  But there is no incentive to pack a higher
1384	   rate into a reservation, because the charges are directly
1385	   proportional to rate, irrespective of the reservations.

1387	   However, if virtual queues start to fill on any path, even though
1388	   real queues will still be able to provide low latency service, pre-
1389	   congestion marking will rise fairly quickly.  It may eventually reach
1390	   the threshold where the ingress gateway would deny admission to new
1391	   flows.  If the ingress gateway cheats and continues to admit new
1392	   flows, the affected virtual queues will rapidly fill, even though the
1393	   real queues will still be little worse than they were when admission
1394	   control should have been invoked.  The ingress gateway will have to
1395	   pay the penalty for such an extremely high pre-congestion level, so
1396	   the pressure to invoke admission control should become unbearable.

1398	   The above mechanisms protect against rational operators.  In
1399	   Section 5.6.3 we discuss how networks can protect themselves from
1400	   accidental or deliberate misconfiguration in neighbouring networks.

1402	5.5.  Sanctioning Dishonest Marking

1404	   As CL traffic leaves the last network before the egress gateway
1405	   (domain C) the RE blanking fraction should match the congestion
1406	   marking fraction, when averaged over a sufficiently long duration
1407	   (perhaps ~10s to allow a few rounds of feedback through regular
1408	   signalling of new and refreshed reservations).

1410	   To protect itself, domain C should install a monitor at its egress.
1411	   It aims to detect flows of CL packets that are persistently negative.
1412	   If flows are positive, domain C need take no action--this simply
1413	   means an upstream network must be paying more penalties than it needs
1414	   to.  Appendix A.3 gives a suggested algorithm for the monitor,
1415	   meeting the criteria below.

1417	   o  It SHOULD introduce minimal false positives for honest flows;

1419	   o  It SHOULD quickly detect and sanction dishonest flows (minimal
1420	      false negatives);

1422	   o  It MUST be invulnerable to state exhaustion attacks from malicious
1423	      sources.  For instance, if the dropper uses flow-state, it should
1424	      not be possible for a source to send numerous packets, each with a
1425	      different flow ID, to force the dropper to exhaust its memory
1426	      capacity;

1428	   o  It MUST introduce sufficient loss in goodput so that malicious
1429	      sources cannot play off losses in the egress dropper against
1430	      higher allowed throughput.  Salvatori [CLoop_pol] describes this
1431	      attack, which involves the source understating path congestion
1432	      then inserting forward error correction (FEC) packets to
1433	      compensate expected losses.

1435	   Note that the monitor operates on flows but with careful design we
1436	   can avoid per-flow state.  This is why we have been careful to ensure
1437	   that all flows MUST start with a packet marked with the FNE
1438	   codepoint.  If a flow does not start with the FNE codepoint, a
1439	   monitor is likely to treat it unfavourably.  This risk makes it worth
1440	   setting the FNE codepoint at the start of a flow, even though there
1441	   is a cost to setting FNE (positive `worth').

1443	   Starting flows with an FNE packet also means that a monitor will be
1444	   resistant to state exhaustion attacks from other networks, as the
1445	   monitor can then be designed to never create state unless an FNE
1446	   packet arrives.  And an FNE packet counts positive, so it will cost a
1447	   lot for a network to send many of them.
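
   As a non-normative illustration of this design rule, the sketch
   below creates flow state only when an FNE packet arrives.  It is not
   the algorithm of Appendix A.3.  The class name, the max_flows bound
   and the use of a signed octet balance per flow are assumptions of
   this sketch; the flow identifier is taken to be any hashable value
   built from network-layer fields.

```python
class FlowStateTable:
    """Per-flow balances that can only be opened by an FNE packet."""

    def __init__(self, max_flows):
        self.max_flows = max_flows  # assumed hard bound on state
        self.balance = {}           # flow id -> signed octet balance

    def observe(self, flow_id, is_fne, signed_octets):
        """Account one packet; return True if flow state was updated."""
        if flow_id not in self.balance:
            # Never create state for a flow that did not start with
            # FNE, and never grow beyond the configured bound.
            if not is_fne or len(self.balance) >= self.max_flows:
                return False
            self.balance[flow_id] = 0
        self.balance[flow_id] += signed_octets
        return True
```

   A source that sprays packets with many different flow identifiers
   but without FNE therefore consumes no memory in this sketch, while
   one that does set FNE pays the positive worth of each such packet.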

1449	   Monitor algorithms will often maintain a moving average across flows
1450	   of the fraction of RE blanked packets.  When maintaining an average
1451	   across flows, a monitor MUST ignore packets with the FNE codepoint
1452	   set.  An ingress gateway sets the FNE codepoint when it does not have
1453	   the benefit of feedback from the egress.  So counting packets with
1454	   FNE set would be likely to make the average unnecessarily
1455	   positive, providing headroom (or should we say footroom?) for
1456	   dishonest (negative) traffic.
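
   As a non-normative illustration, a moving average of the fraction of
   RE blanked packets that ignores FNE packets could be kept as
   follows.  The exponentially weighted form, the default weight and
   the choice to skip Not-RECT packets as well are assumptions of this
   sketch, not requirements of this memo.

```python
class BlankedFractionAverage:
    """Moving average of the fraction of packets with RE blanked."""

    def __init__(self, weight=0.01):
        self.weight = weight   # assumed smoothing weight, 0 < weight <= 1
        self.average = 0.0

    def update(self, ecn, re_flag):
        """Fold one packet into the average and return the new value."""
        if ecn == "00":
            # FNE (RE flag 1) MUST be ignored.  Not-RECT (RE flag 0) is
            # not re-ECN traffic, so this sketch skips it too.
            return self.average
        blanked = 1.0 if re_flag == 0 else 0.0
        self.average += self.weight * (blanked - self.average)
        return self.average
```

   With a weight of 0.5, a Re-Echo packet followed by an FNE packet and
   then a RECT packet moves the average from 0 to 0.5, leaves it at
   0.5, and then halves it to 0.25.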

1458	   If the monitor detects a persistently negative flow, it could drop
1459	   sufficient negative and neutral packets to force the flow to not be
1460	   negative.  This is the approach taken for the `egress dropper' in
1461	   [Re-TCP], but for the scenario in this memo, where everyone would
1462	   expect everyone else to keep to the protocol, a management alarm
1463	   SHOULD be raised on detecting persistently negative traffic and any
1464	   automatic sanctions taken SHOULD be logged.  Even if the chosen
1465	   policy is to take no automatic action, the cause can then be
1466	   investigated manually.

1468	   Then no ingress can understate downstream pre-congestion without
1469	   its action being logged.  So network operators can deal
1470	   with offending networks at the human level, out of band.  As a last
1471	   resort, perhaps where the ingress gateway address seems to have been
1472	   spoofed in the signalling, packets can be dropped.  Drops could be
1473	   focused on just sufficient packets in misbehaving flows to remove the
1474	   negative bias while doing minimal harm.

1476	   A future version of this memo may define a control message that could
1477	   be used to notify an offending ingress gateway (possibly via the
1478	   egress gateway) that it is sending persistently negative flows.
1479	   However, we are aware that such messages could be used to test the
1480	   sensitivity of the detection system, so currently we prefer silent
1481	   sanctions.

1483	   An extreme scenario would be where an ingress gateway (or set of
1484	   gateways) mounted a DoS attack against another network.  If their
1485	   traffic caused sufficient congestion to lead to drop but they
1486	   understated path congestion to avoid penalties for causing high
1487	   congestion, the preferential drop recommendations in Section 4.3.4
1488	   would at least ensure that these flows would always be dropped before
1489	   honest flows.

1491	5.6.  Border Mechanisms

1493	5.6.1.  Border Accounting Mechanisms

1495	   One of the main design goals of re-ECN was for border security
1496	   mechanisms to be as simple as possible, otherwise they would become
1497	   the pinch-points that limit scalability of the whole internetwork.
1498	   As the title of this memo suggests, we want to avoid per-flow
1499	   processing at borders.  We also want to keep to passive mechanisms
1500	   that can monitor traffic in parallel to forwarding, rather than
1501	   having to filter traffic inline--in series with forwarding.  As data
1502	   rates continue to rise, we suspect that all-optical interconnection
1503	   between networks will soon be a requirement.  So we want to avoid any
1504	   new need for buffering (even though border filtering is current
1505	   practice for other reasons, we don't want to make it even less likely
1506	   that we will ever get rid of it).

1508	   So far, we have been able to keep the border mechanisms simple,
1509	   despite having had to harden them against some subtle attacks on the
1510	   re-ECN design.  The mechanisms are still passive and avoid per-flow
1511	   processing, although we do use filtering as a fail-safe to
1512	   temporarily shield against extreme events in other networks, such as
1513	   accidental misconfigurations (Section 5.6.3).

1515	   The basic accounting mechanism at each border interface simply
1516	   involves accumulating the volume of packets with positive worth (Re-
1517	   Echo and FNE), and subtracting the volume of those with negative
1518	   worth: AM(-1) and PM(-1).  Even though this mechanism takes no regard
1519	   of flows, over an accounting period (say a month) this subtraction
1520	   will account for the downstream congestion caused by all the flows
1521	   traversing the interface, wherever they come from, and wherever they
1522	   go to.  The two networks can agree to use this metric however they
1523	   wish to determine some congestion-related penalty against the
1524	   upstream network (see Section 5.3 for examples).  Although the
1525	   algorithm could hardly be simpler, it is spelled out using pseudo-
1526	   code in Appendix A.2.1.
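
   As a non-normative illustration of this accounting step, the sketch
   below keeps two octet counters per border interface and subtracts
   them on demand.  It is not the pseudo-code of Appendix A.2.1; the
   class and method names are assumptions of this sketch.

```python
class BorderMeter:
    """Bulk meter for downstream congestion at one border interface."""

    # Extended ECN codepoints as (ECN field, RE flag).
    POSITIVE = {("01", 0), ("00", 1)}   # Re-Echo, FNE
    NEGATIVE = {("10", 1), ("11", 1)}   # AM(-1), PM(-1)

    def __init__(self):
        self.positive_octets = 0
        self.negative_octets = 0

    def count(self, ecn, re_flag, size_octets):
        """Account one packet without looking at which flow it is in."""
        codepoint = (ecn, re_flag)
        if codepoint in self.POSITIVE:
            self.positive_octets += size_octets
        elif codepoint in self.NEGATIVE:
            self.negative_octets += size_octets
        # Codepoints of zero worth change neither counter.

    def downstream_congestion(self):
        """Positive volume minus negative volume, in octets."""
        return self.positive_octets - self.negative_octets
```

   The meter holds no flow state and never touches the forwarding path,
   so it can run passively on a copy of the traffic.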

1528	   Various attempts to subvert the re-ECN design have been made.  In all
1529	   cases their root cause is persistently negative flows.  But, after
1530	   describing these attacks we will show that we don't actually have to
1531	   get rid of all persistently negative flows in order to thwart the
1532	   attacks.

1534	   In honest flows, downstream congestion is measured as positive minus
1535	   negative volume.  So if all flows are honest (i.e. not persistently
1536	   negative), adding all positive volume and all negative volume without
1537	   regard to flows will give an aggregate measure of downstream
1538	   congestion.  But such simple aggregation is only possible if no flows
1539	   are persistently negative.  Unless persistently negative flows are
1540	   completely removed, they will reduce the aggregate measure of
1541	   congestion.  The aggregate may still be positive overall, but not as
1542	   positive as it would have been had the negative flows been removed.
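
	   A small worked example shows the effect.  The flow names and byte
	   volumes below are hypothetical, chosen only to illustrate how one
	   persistently negative flow depresses an aggregate that nevertheless
	   stays positive.

```python
# (positive_bytes, negative_bytes) seen at one border for each flow.
# All figures are invented for illustration.
flows = {
    "honest-1": (900, 600),   # flow balance +300
    "honest-2": (500, 450),   # flow balance  +50
    "dummy":    (0, 200),     # persistently negative: balance -200
}


def aggregate(volumes):
    """Sum positive minus negative volume without regard to flows."""
    return sum(pos - neg for pos, neg in volumes)


# What the simple border meter would report.
with_negative_flow = aggregate(flows.values())

# What it would report if the negative flow had been removed.
honest_only = aggregate(v for v in flows.values() if v[0] >= v[1])

print(with_negative_flow, honest_only)
```

	   Here the aggregate is 150 with the negative flow present, against
	   350 without it, so the upstream network's apparent downstream
	   congestion is understated by exactly the negative flow's balance.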

1544	   In Section 5.5 we discussed how to sanction traffic to remove, or at
1545	   least to identify, persistently negative flows.  But, even if the
1546	   sanction for negative traffic is to discard it, unless it is
1547	   discarded at the exact point it goes negative, it will wrongly
1548	   subtract from aggregate downstream congestion, at least at any
1549	   borders it crosses after it has gone negative but before it is
1550	   discarded.

1552	   We rely on sanctions to deter dishonest understatement of congestion.
1553	   But even the ultimate sanction of discard can only be effective if
1554	   the sender is bothered about the data getting through to its
1555	   destination.  A number of attacks have been identified where a sender
1556	   gains from sending dummy traffic or it can attack someone or
1557	   something using dummy traffic even though it isn't communicating any
1558	   information to anyone:

1560	   o  A network can simply create its own dummy traffic to congest
1561	      another network, perhaps causing it to lose business at no cost to
1562	      the attacking network.  This is a form of denial of service
1563	      perpetrated by one network on another.  The preferential drop
1564	      measures in Section 4.3.4 provide crude protection against such
1565	      attacks, but we are not overly worried about more accurate
1566	      prevention measures, because it is already possible for networks
1567	      to DoS other networks on the general Internet, but they generally
1568	      don't because of the grave consequences of being found out.  We
1569	      are only concerned if re-ECN increases the motivation for such an
1570	      attack, as in the next example.

1572	   o  A network can just generate negative traffic and send it over its
1573	      border with a neighbour to reduce the overall penalties that it
1574	      should pay to that neighbour.  It could even initialise the TTL so
1575	      it expired shortly after entering the neighbouring network,
1576	      reducing the chance of detection further downstream.  This attack
1577	      need not be motivated by a desire to deny service and indeed need
1578	      not cause denial of service.  A network's main motivator would
1579	      most likely be to reduce the penalties it pays to a neighbour.
1580	      But, the prospect of financial gain might tempt the network into
1581	      mounting a DoS attack on the other network as well, given the gain
1582	      would offset some of the risk of being detected.

1584	   Note that we have not included DoS by Internet hosts in the above
1585	   list of attacks, because we have restricted ourselves to a scenario
1586	   with edge-to-edge admission control across a Diffserv region.  In
1587	   this case, the edge ingress gateways insulate the Diffserv region
1588	   from DoS by Internet hosts.  Re-ECN resists more general DoS attacks,
1589	   but this is discussed in [Re-TCP].

1591	   The first step towards a solution to all these problems with negative
1592	   flows is to be able to estimate the contribution they make to
1593	   downstream congestion at a border and to correct the measure
1594	   accordingly.  Although ideally we want to remove negative flows
1595	   themselves, perhaps surprisingly, the most effective first step is to
1596	   cancel out the polluting effect negative flows have on the measure of
1597	   downstream congestion at a border.  It is more important to get an
1598	   unbiased estimate of their effect, than to try to remove them all.  A
1599	   suggested algorithm to give an unbiased estimate of the contribution
1600	   from negative flows to the downstream congestion measure is given in
1601	   Appendix A.2.2.

1603	   Although making an accurate assessment of the contribution from
1604	   negative flows may not be easy, just the single step of neutralising
1605	   their polluting effect on congestion metrics removes all the gains
1606	   networks could otherwise make from mounting dummy traffic attacks on
1607	   each other.  This puts all networks on the same side (only with
1608	   respect to negative flows of course), rather than being pitched
1609	   against each other.  The network where a flow goes negative, as
1610	   well as all the networks downstream, lose out from not being
1611	   reimbursed for any congestion that flow causes.  So they all have an
1612	   interest in getting rid of these negative flows.  Networks forwarding
1613	   a flow before it goes negative aren't strictly on the same side, but
1614	   they are disinterested bystanders--they don't care that the flow goes
1615	   negative downstream, but at least they can't actively gain from
1616	   making it go negative.  The problem becomes localised so that once a
1617	   flow goes negative, all the networks from where it happens and beyond
1618	   downstream each have a small problem, each can detect it has a
1619	   problem and each can get rid of the problem if it chooses to.  But
1620	   negative flows can no longer be used for any new attacks.

1622	   Once an unbiased estimate of the effect of negative flows can be
1623	   made, the problem reduces to detecting and preferably removing flows
1624	   that have gone negative as soon as possible.  But importantly,
1625	   complete eradication of negative flows is no longer critical--best
1626	   endeavours will be sufficient.

1628	   Note that the guiding principle behind all the above discussion is
1629	   that any gain from subverting the protocol should be precisely
1630	   neutralised, rather than punished.  If a gain is punished to a
1631	   greater extent than is sufficient to neutralise it, it will most
1632	   likely open up a new vulnerability, where the amplifying effect of
1633	   the punishment mechanism can be turned on others.

1635	   For instance, if possible, flows should be removed as soon as they go
1636	   negative, but we do NOT RECOMMEND any attempts to discard such flows
1637	   further upstream while they are still positive.  Such over-zealous
1638	   push-back is unnecessary and potentially dangerous.  These flows have
1639	   paid their `fare' up to the point they go negative, so there is no
1640	   harm in delivering them that far.  If someone downstream asks for a
1641	   flow to be dropped as near to the source as possible, because they
1642	   say it is going to become negative later, an upstream node cannot
1643	   test the truth of this assertion.  Rather than have to authenticate
1644	   such messages, re-ECN has been designed so that flows can be dropped
1645	   solely based on locally measurable evidence.  A message hinting that
1646	   a flow should be watched closely to test for negativity is fine.  But
1647	   not a message that claims that a positive flow will go negative
1648	   later, so it should be dropped.

1650	5.6.2.  Competitive Routing

1652	   With the above penalty system, each domain seems to have a perverse
1653	   incentive to fake pre-congestion.  For instance domain B profits from
1654	   the difference between penalties it receives at its ingress (its
1655	   revenue) and those it pays at its egress (its cost).  So if B
1656	   overstates internal pre-congestion it seems to increase its profit.
1657	   However, we can assume that domain A could bypass B, routing through
1658	   other domains to reach the egress.  So the competitive discipline of
1659	   least-cost routing can ensure that any domain tempted to fake pre-
1660	   congestion for profit risks losing _all_ its incoming traffic.  The
1661	   least congested route would eventually be able to win this
1662	   competitive game, only as long as it didn't declare more fake pre-
1663	   congestion than the next most competitive route.

1665	   The competitive effect of interdomain routing might be weaker nearer
1666	   to the egress.  For instance, C may be the only route B can take to
1667	   reach the ultimate receiver.  And if C over-penalises B, the egress
1668	   gateway and the ultimate receiver seem to have no incentive to move
1669	   their terminating attachment to another network, because only B and
1670	   those upstream of B suffer the higher penalties.  However, we must
1671	   remember that we are only looking at the money flows at the
1672	   unidirectional network layer.  There are likely to be all sorts of
1673	   higher level business models constructed over the top of these low
1674	   level 'sender-pays' penalties.  For instance, we might expect a
1675	   session layer charging model where the session originator pays for a
1676	   pair of duplex flows, one as receiver and one as sender.
1677	   Traditionally this has been a common model for telephony and we might
1678	   expect it to be used, at least sometimes, for other media such as
1679	   video.  Wherever such a model is used, the data receiver will be
1680	   directly affected if its sessions terminate through a network like C
1681	   that fakes congestion to over-penalise B. So end-customers will
1682	   experience a direct competitive pressure to switch to cheaper
1683	   networks, away from networks like C that try to over-penalise B.

1685	   This memo does not need to standardise any particular mechanism for
1686	   routing based on re-ECN.  Goldenberg et al. [Smart_rtg] refer to
1687	   various commercial products and present their own algorithms for
1688	   moving traffic between multi-homed routes based on usage charges.
1689	   None of these systems require any changes to standard protocols
1690	   because the choice between the available border gateway protocol
1691	   (BGP) routes is based on a combination of local knowledge of the
1692	   charging regime and local measurement of traffic levels.  If, as we
1693	   propose, charges or penalties were based on the level of re-ECN
1694	   measured in passing traffic, a similar optimisation could be achieved
1695	   without requiring any changes to standard routing protocols.

1697	   We must be clear that applying pre-congestion-based routing to this
1698	   admission control system remains an open research issue.  Traffic
1699	   engineering based on congestion requires careful damping to avoid
1700	   oscillations, and should not be attempted without adult supervision
1701	   :) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based
1702	   on congestion.  But without the benefit of re-ECN, they had to add a
1703	   path attribute to BGP to advertise a route's downstream congestion
1704	   (actually they proposed that BGP should advertise the charge for
1705	   congestion, which we believe wrongly embeds an assumption into BGP
1706	   that the only thing to do with congestion is charge for it).

1708	5.6.3.  Fail-safes

1710	   The mechanisms described so far create incentives for rational
1711	   operators to behave.  That is, one operator aims to make another
1712	   behave responsibly by applying penalties and expects a rational
1713	   response (i.e. one that trades off costs against benefits).  It is
1714	   usually reasonable to assume that other network operators will behave
1715	   rationally (policy routing can avoid those that might not).  But this
1716	   approach does not protect against the misconfigurations and accidents
1717	   of other operators.

1719	   Therefore, we propose the following two mechanisms at a network's
1720	   borders to provide "defence in depth".  Both are similar:

1722	   Highly positive flows:  A small sample of positive packets should be
1723	      picked randomly as they cross a border interface.  Then subsequent
1724	      packets matching the same source and destination address and DSCP
1725	      should be monitored.  If the fraction of positive marking is well
1726	      above a threshold (to be determined by operational practice), a
1727	      management alarm SHOULD be raised, and the flow MAY be
1728	      automatically subject to focused drop.

1730	   Persistently negative flows:  A small sample of congestion marked
1731	      packets should be picked randomly as they cross a border
1732	      interface.  Then subsequent packets matching the same source and
1733	      destination address and DSCP should be monitored.  If the RE
1734	      blanking fraction minus the congestion marking fraction is
1735	      persistently negative, a management alarm SHOULD be raised, and
1736	      the flow MAY be automatically subject to focused drop.

1738	   Both these mechanisms rely on the fact that highly positive (or
1739	   negative) flows will appear more quickly in the sample if it is
1740	   selected randomly solely from positive (or negative) packets.
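
	   The second fail-safe could be sketched along the following lines.
	   This is a minimal illustration under stated assumptions, not a
	   specified algorithm: the Packet fields, the sample_prob and
	   min_packets parameters, and the use of a minimum packet count to
	   stand in for "persistently" are all inventions of this sketch.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FlowKey = Tuple[str, str, int]  # (source address, destination address, DSCP)


@dataclass
class Packet:
    src: str
    dst: str
    dscp: int
    re_blanked: bool          # counts towards the RE blanking fraction
    congestion_marked: bool   # counts towards the congestion marking fraction


@dataclass
class FlowStats:
    packets: int = 0
    blanked: int = 0
    marked: int = 0

    def balance(self) -> float:
        """RE blanking fraction minus congestion marking fraction."""
        return (self.blanked - self.marked) / self.packets


class NegativeFlowMonitor:
    """Watches only the flows picked out by sampling marked packets."""

    def __init__(self, sample_prob: float, min_packets: int,
                 rng: Optional[random.Random] = None) -> None:
        self.sample_prob = sample_prob    # hypothetical tuning knob
        self.min_packets = min_packets    # assumed meaning of "persistently"
        self.rng = rng or random.Random()
        self.watched: Dict[FlowKey, FlowStats] = {}

    def observe(self, pkt: Packet) -> Optional[FlowKey]:
        """Return the flow key if an alarm should be raised, else None."""
        key = (pkt.src, pkt.dst, pkt.dscp)
        stats = self.watched.get(key)
        if stats is None:
            # Only congestion marked packets are candidates for sampling.
            if pkt.congestion_marked and self.rng.random() < self.sample_prob:
                self.watched[key] = FlowStats()
            return None
        # Subsequent packets of a sampled flow are monitored.
        stats.packets += 1
        stats.blanked += pkt.re_blanked
        stats.marked += pkt.congestion_marked
        if stats.packets >= self.min_packets and stats.balance() < 0:
            return key
        return None
```

	   Because candidates are drawn only from congestion marked packets,
	   a flow carrying many such packets is more likely to be picked, and
	   state is held only for the sampled flows rather than for every flow
	   crossing the border.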

1742	   Note that there is no assumption that _users_ behave rationally.  The
1743	   system is protected from the vagaries of irrational user behaviour by
1744	   the ingress gateways, which transform internal penalties into a
1745	   deterministic admission control mechanism that prevents users from
1746	   misbehaving, by directly engineered means.

1748	6.  Analysis

1750	   The domains in Figure 1 are not expected to be completely malicious
1751	   towards each other.  After all, we can assume that they are all co-
1752	   operating to provide an internetworking service to the benefit of
1753	   each of them and their customers.  Otherwise their routing policies
1754	   would not interconnect them in the first place.  However, we assume
1755	   that they are also competitors of each other.  So a network may try
1756	   to contravene our proposed protocol if it would gain or make a
1757	   competitor lose, or both, but only if it can do so without being
1758	   caught.  Therefore we do not have to consider every possible random
1759	   attack one network could launch on the traffic of another, given
1760	   that one network can always drop or corrupt packets that it
1761	   forwards on behalf of another anyway.

1763	   Therefore, we only consider new opportunities for _gainful_ attack
1764	   that our proposal introduces.  But to a certain extent we can also
1765	   rely on the in-depth defences we have described (Section 5.6.3),
1766	   intended to mitigate the potential impact of one network accidentally
1767	   misconfiguring the workings of this protocol.

1769	   The ingress and egress gateways are shown in the most generic
1770	   arrangement possible in Figure 1, without any surrounding network.
1771	   This allows us to consider more specific cases where these gateways
1772	   and a neighbouring network are operated by the same player.  As well
1773	   as cases where the same player operates neighbouring networks, we
1774	   will also consider cases where the two gateways collude as one player
1775	   and where the sender and receiver collude as one.  Collusion of other
1776	   sets of domains is less likely, but we will consider such cases.  In
1777	   the general case, we will assume none of the nine trust domains
1778	   across the figure fully trust any of the others.

1780	   As we only propose to change routers within the Diffserv region, we
1781	   assume the operators of networks outside the region will be doing
1782	   per-flow policing.  That is, we assume the networks outside the
1783	   Diffserv region and the gateways around its edges can protect
1784	   themselves.  So given we are proposing to remove flow policing from
1785	   some networks, our primary concern must be to protect networks that
1786	   don't do per-flow policing (the potential `victims') from those that
1787	   do (the `enemy').  The ingress and egress gateways are the only way
1788	   the outer enemy can get at the middle victim, so we can consider the
1789	   gateways as the representatives of the enemy as far as domains A, B
1790	   and C are concerned.  We will call this trust scenario `edges against
1791	   middles'.

1793	   Earlier in this memo, we outlined the classic border rate policing
1794	   problem (Section 3).  It will now be useful to reiterate the
1795	   motivations that are the root cause of the problem.  The more
1796	   reservations a gateway can allow, the more revenue it receives.  The
1797	   middle networks want the edges to comply with the admission control
1798	   protocol when they become so congested that their service to others
1799	   might suffer.  The middle networks also want to ensure the edges
1800	   cannot steal more service from them than they are entitled to.

1802	   In the context of this `edges against middles' scenario, the re-ECN
1803	   protocol has two main effects:

1805	   o  The more pre-congestion there is on a path across the Diffserv
1806	      region, the higher the ingress gateway must declare downstream
1807	      pre-congestion.

1809	   o  If the ingress gateway does not declare downstream pre-congestion
1810	      high enough on average, it will `hit the ground before the
1811	      runway', going negative and triggering sanctions, either directly
1812	      against the traffic or against the ingress gateway at a management
1813	      level.
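
	   The two effects above can be illustrated with a rough calculation.
	   The sketch below assumes that markings are counted per 1000 packets
	   and that the marking added by successive domains simply adds up,
	   which is only a reasonable approximation when marking fractions are
	   small.  The function names and the figures in the usage note are
	   hypothetical.

```python
from itertools import accumulate
from typing import List, Optional


def balance_along_path(declared: int, marked_per_domain: List[int]) -> List[int]:
    """Positive minus negative markings (per 1000 packets) after each domain.

    declared          -- positive markings set by the ingress gateway
    marked_per_domain -- negative markings added by each domain in path order
    """
    return [declared - total for total in accumulate(marked_per_domain)]


def first_negative_domain(declared: int,
                          marked_per_domain: List[int]) -> Optional[int]:
    """Index of the first domain in which the balance goes negative, if any."""
    for index, balance in enumerate(balance_along_path(declared, marked_per_domain)):
        if balance < 0:
            return index
    return None
```

	   For example, with domains A, B and C adding 10, 20 and 10 marks
	   per 1000 packets, a declaration of 30 leaves balances of 20, 0 and
	   -10, so the traffic goes negative in the third domain, whereas a
	   declaration of 40 reaches the egress without going negative.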

1815	   An executive summary of our security analysis can be stated in three
1816	   parts, distinguished by the type of collusion considered.

1818	   Neighbour-only Middle-Middle Collusion:  Here there is no collusion
1819	      or collusion is limited to neighbours in the feedback loop.  In
1820	      other words, two neighbouring networks can be assumed to act as
1821	      one.  Or the egress gateway might collude with domain C. Or the
1822	      ingress gateway might collude with domain A. Or ingress and egress
1823	      gateways might collude with each other.

1825	      In these cases where only neighbours in the feedback loop collude,
1826	      we conclude that all parties have a positive incentive to declare
1827	      downstream pre-congestion truthfully, and the ingress gateway has
1828	      a positive incentive to invoke admission control when congestion
1829	      rises above the admission threshold in any network in the region
1830	      (including its own).  No party has an incentive to send more
1831	      traffic than declared in reservation signalling (even though only
1832	      the gateways read this signalling).  In short, no party can gain
1833	      at the expense of another.

1835	   Non-neighbour Middle-Middle Collusion:  In the case of other forms of
1836	      collusion between middle networks (e.g. between domain A and C) it
1837	      would be possible for say A & C to create a tunnel between
1838	      themselves so that A would gain at the expense of B. But C would
1839	      then lose the gain that A had made.  Therefore the value to A & C
1840	      of colluding to mount this attack seems questionable.  It is made
1841	      more questionable, because the attack can be statistically
1842	      detected by B using the second `defence in depth' mechanism
1843	      mentioned already.  Note that C can defend itself from being
1844	      attacked through a tunnel by treating the tunnel end point as a
1845	      direct link to a neighbouring network (e.g. as if A were a
1846	      neighbour of C, via the tunnel), which falls back to the safety of
1847	      the neighbour-only scenario.

1849	   Middle-Edge Collusion:  Collusion between networks or gateways within
1850	      the Diffserv region and networks or users outside the region has
1851	      not yet been fully analysed.  The presence of full per-flow
1852	      policing at the ingress gateway seems to make this a less likely
1853	      source of a successful attack.

1855	   {ToDo: Due to lack of time, the full write up of the security
1856	   analysis is deferred to the next version of this memo.}

1858	   Finally, it is well known that the best person to analyse the
1859	   security of a system is not the designer.  Therefore, our confident
1860	   claims must be hedged with doubt until others with perhaps a greater
1861	   incentive to break it have mounted a full analysis.

1863	7.  Incremental Deployment

1865	   We believe ECN has so far not been widely deployed because it
1866	   requires widespread end system and network deployment just to achieve
1867	   a marginal improvement in performance.  The ability to offer a new
1868	   service (admission control) would be a much stronger driver for ECN
1869	   deployment.

1871	   As stated in the introduction, the aim of this memo is to "Design in
1872	   security from the start" when admission control is based on pre-
1873	   congestion notification.  The proposal has been designed so that
1874	   security can be added some time after first deployment, but only if
1875	   the PCN wire protocol encoding is defined with the foresight to
1876	   accommodate the extended set of codepoints defined in this document.
1877	   Given admission control based on pre-congestion notification requires
1878	   few changes to standards, it should be deployable fairly soon.
1879	   However, re-ECN requires a change to IP, which may take a little
1880	   longer.

1882	   We expect that initial deployments of PCN-based admission control
1883	   will be confined to single networks, or to clubs of networks that
1884	   trust each other.  The proposal in this memo will only become
1885	   relevant once networks with conflicting interests wish to
1886	   interconnect their admission controlled services, but without the
1887	   scalability constraints of per-flow border policing.  It will not be
1888	   possible to use re-ECN, even in a controlled environment between
1889	   consenting operators, unless it is standardised into IP.  Given the
1890	   IPv4 header has limited space for further changes, current IESG
1891	   policy [RFC4727] is not to allow experimental use of codepoints in
1892	   the IPv4 header, as whenever an experiment isn't taken up, the space
1893	   it used tends to be impossible to reclaim.

1895	   If PCN-based admission control is deployed before re-ECN is
1896	   standardised into IP, wherever a network (or club of networks)
1897	   connects to another network (or club of networks) with conflicting
1898	   interests, they will place a gateway between the two regions that
1899	   does per-flow rate policing and admission control.  If re-ECN is
1900	   eventually standardised into IP, it will be possible for these
1901	   separate regions to upgrade all their gateways to use re-ECN before
1902	   removing the per-flow policing gateways between them.  Given the
1903	   edge-to-edge deployment model of PCN-based admission control, it is
1904	   reasonable to imagine this incremental deployment model without
1905	   needing to cater for partial deployment of re-ECN in just some of the
1906	   gateways around one Diffserv region.

1908	   Only the edge gateways around a Diffserv region have to be upgraded
1909	   to add re-ECN support, not interior routers.  It is also necessary to
1910	   add the mechanisms that use re-ECN to secure a network against
1911	   misbehaving gateways and networks.  Specifically, these are the
1912	   border mechanisms (Section 5.6) and the mechanisms to sanction
1913	   dishonest marking (Section 5.5).

1915	   We also RECOMMEND adding improvements to forwarding on interior
1916	   routers (Section 4.3.4).  But the system works whether all, some or
1917	   none are upgraded, so interior routers may be upgraded in a piecemeal
1918	   fashion at any time.

1920	8.  Design Choices and Rationale

1922	   The primary insight of this work is that downstream congestion is the
1923	   metric that would be most useful to control an internetwork, and
1924	   particularly to police how one network responds to the congestion it
1925	   causes in a remote network.  This is the problem that has previously
1926	   made it so hard to provide scalable admission control.

1928	   The case for using re-feedback (a generalisation of re-ECN) to police
1929	   congestion response and provide QoS is made in [Re-fb].  Essentially,
1930	   the insight is that congestion is a factor that crosses layers from
1931	   the physical upwards.  Therefore re-feedback polices congestion where
1932	   it emerges from a physical interface between networks.  This is
1933	   achieved by bringing the congestion information to the interface,
1934	   rather than examining packet addressing where there is congestion.

1936	   Then congestion crossing the physical interface at a border can be
1937	   policed at the interface, rather than policing the congestion on
1938	   packets that claim to come from an address (which may be spoofed).
1939	   Also, re-feedback works in the network layer independently of other
1940	   layers--despite its name re-feedback does not actually require
1941	   feedback.  It requires a source to act conservatively before it gets
1942	   feedback.

1944	   On the subject of lack of feedback, the feedback not established
1945	   (FNE) codepoint is motivated by arguments for a state set-up bit in
1946	   IP to prevent state exhaustion attacks.  This idea was first put
1947	   forward informally by David Clark and documented by Handley and
1948	   Greenhalgh in [Steps_DoS].  The idea is that network layer datagrams
1949	   should signal explicitly when they require state to be created in the
1950	   network layer or the layer above (e.g. at flow start).  Then a node
1951	   can refuse to create any state unless a datagram declares this
1952	   intent.  We believe the proposed FNE codepoint serves the same
1953	   purpose as the proposed state-set-up bit, but it has been overloaded
1954	   with a more specific purpose, using it on more packets than just the
1955	   first in a flow, but never fewer (i.e. it is idempotent).  In effect
1956	   the FNE codepoint serves the purpose of a `soft-state set-up
1957	   codepoint'.

1959	   The re-feedback paper [Re-fb] also makes the case for converting the
1960	   economic interpretation of congestion into a hard engineering
1961	   mechanism, which is the basis of the approach used in this memo.  The
1962	   admission control gateways around the Diffserv region use hard
1963	   engineering, not incentives, to prevent end users from sending more
1964	   traffic than they have reserved.  Incentive-based mechanisms are only
1965	   used between networks, because they are expected to respond to
1966	   incentives more rationally than end-users can be expected to.
1967	   However, even then, a network can use fail-safes to protect itself
1968	   from excessively unusual behaviour by neighbouring networks, whether
1969	   due to an accidental misconfiguration or malicious intent.

1971	   The guiding principle behind the incentive-based approach used
1972	   between networks is that any gain from subverting the protocol should
1973	   be precisely neutralised, rather than punished.  If a gain is
1974	   punished to a greater extent than is sufficient to neutralise it, it
1975	   will most likely open up a new vulnerability, where the amplifying
1976	   effect of the punishment mechanism can be turned on others.

1978	   The re-feedback paper also makes the case against the use of
1979	   congestion charging to police congestion if it is based on classic
1980	   feedback (where only upstream congestion is visible to network
1981	   elements).  It argues this would open up receiving networks to
1982	   `denial of funds' attacks and would require end users to accept
1983	   dynamic pricing (which few would).

1985	   Re-ECN has been deliberately designed to simplify policing at the
1986	   borders between networks.  These trust boundaries are the critical
1987	   pinch-points that will limit the scalability of the whole
1988	   internetwork unless the overall design minimises the complexity of
1989	   security functions at these borders.  The border mechanisms described
1990	   in this memo run passively in parallel to data forwarding and they do
1991	   not require per-flow processing.

1993	9.  Security Considerations

1995	   This whole memo concerns the security of a scalable admission control
1996	   system, in particular the analysis section.  Below, some specific
1997	   security issues are mentioned that did not belong elsewhere or which
1998	   comment on the overall robustness of the security provided by the
1999	   design.

2001	   Firstly, we must repeat the statement of applicability in the
2002	   analysis: that we only consider new opportunities for _gainful_
2003	   attack that our proposal introduces, particularly if the attacker can
2004	   avoid being identified.  Despite only involving a few bits, there is
2005	   sufficient complexity in the whole system that there are probably
2006	   numerous possibilities for other attacks.  However, as far as we are
2007	   aware, none reap any benefit to the attacker.  For instance, it would
2008	   be possible for a downstream network to remove the congestion
2009	   markings introduced by an upstream network, but it would only lose
2010	   out on the penalties it could apply to a downstream network.

2012	   When one network forwards a neighbouring network's traffic it will
2013	   always be possible to cause damage by dropping or corrupting it.
2014	   Therefore we do not believe networks would set their routing policies
2015	   to interconnect in the first place if they didn't trust the other
2016	   networks not to arbitrarily damage their traffic.

2018	   Having said this, we do want to highlight some of the weaker parts of
2019	   our argument.  We have argued that networks will be dissuaded from
2020	   faking congestion marking by the possibility that upstream networks
2021	   will route round them.  As we have said, these arguments are based on
2022	   fairly delicate assumptions and will remain fairly tenuous until
2023	   proved in practice, particularly close to the egress where less
2024	   competitive routing is likely.

2026	   We should also point out that the approach in this memo was only
2027	   designed to be robust for admission control.  We do not claim the
2028	   incentives will always be strong enough to force correct flow pre-
2029	   emption behaviour.  This is because a user will tend to perceive much
2030	   greater loss in value if a flow is pre-empted than if admission is
2031	   denied at the start.  However, in general the incentives for correct
2032	   flow pre-emption are similar to those for admission control.

2034	   Finally, it may seem that the 8 codepoints that have been made
2035	   available by extending the ECN field with the RE flag have been used
2036	   rather wastefully.  In effect the RE flag has been used as an
2037	   orthogonal single bit in nearly all cases, the only exception being
2038	   when the ECN field is cleared to "00".  The mapping of the codepoints
2039	   in an earlier version of this proposal used the codepoint space more
2040	   efficiently, but the scheme became vulnerable to a network operator
2041	   focusing its congestion marking to mark more positive than neutral
2042	   packets in order to reduce its penalties (see Appendix B of
2043	   [Re-TCP]).

2045	   With the scheme as now proposed, once the RE flag is set or cleared
2046	   by the sender or its proxy, it should not be written by the network,
2047	   only read.  So the gateways can detect if any network maliciously
2048	   alters the RE flag.  IPsec AH integrity checking does not cover the
2049	   IPv4 header flags (they were considered mutable--even the one we
2050	   propose using for the RE flag that was `currently unused' when IPsec
2051	   was defined).  But it would be sufficient for a pair of gateways to
2052	   make random checks on whether the RE flag was the same when it
2053	   reached the egress gateway as when it left the ingress.  Indeed, if
2054	   IPsec AH had covered the RE flag, any network intending to alter
2055	   sufficient RE flags to make a gain would have focused its alterations
2056	   on packets without authentication headers (AHs).

2058	   No cryptographic algorithms have been harmed in the making of this
2059	   proposal.

2061	10.  IANA Considerations

2063	   This memo includes no request to IANA.

2065	11.  Conclusions

2067	   This memo builds on a promising technique to solve the classic
2068	   problem of making flow admission control scale to any size network.
2069	   It involves the use of Diffserv in a deployment model that uses pre-
2070	   congestion notification feedback to control admission into a network
2071	   path [PCN-arch].  However, as it stands, that deployment model
2072	   depends on all network domains trusting each other to comply with
2073	   the protocols, invoking admission control and flow pre-emption when
2074	   requested.

2076	   We propose that the congestion feedback used in that deployment model
2077	   should be re-echoed into the forward data path, by making a trivial
2078	   modification to the ingress gateway.  We then explain how the
2079	   resulting downstream pre-congestion metric in packets can be
2080	   monitored in bulk at borders to sufficiently emulate flow rate
2081	   policing.

2083	   We claim the result of combining these two approaches is an admission
2084	   control system that scales to any size network _and_ any number of
2085	   interconnected networks, even if they all act in their own interests.

2087	   This proposal aims to convince its readers to "Design in Security
2088	   from the start," by ensuring the PCN wire protocol encoding can
2089	   accommodate the extended set of codepoints defined in this document,
2090	   even if border policing is not needed at first.  This way, we will
2091	   not build ourselves tomorrow's legacy problem.

2093	   Re-echoing congestion feedback is based on a principled technique
2094	   called Re-ECN [Re-TCP], designed to add accountability for causing
2095	   congestion to the general-purpose IP datagram service.  Re-ECN
2096	   proposes to consume the last completely unused bit in the basic IPv4
2097	   header.

2099	12.  Acknowledgements

2101	   All the following have given helpful comments and some may become co-
2102	   authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve
2103	   Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard,
2104	   Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the
2105	   excess canceled packets attack), Stephen Hailes, Adam Greenhalgh
2106	   (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz,
2107	   Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr,
2108	   Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy
2109	   traffic attacks), Sally Floyd (ICIR) and comments from participants
2110	   in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant
2111	   Internet working groups.

2113	13.  Comments Solicited

2115	   Comments and questions are encouraged and very welcome.  They can be
2116	   addressed to the IETF Transport Area working group's mailing list
2117	   <tsvwg@ietf.org>, and/or to the authors.

2119	14.  References
2120	14.1.  Normative References

2122	   [PCN]      Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
2123	              Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley,
2124	              S., Westberg, L., Bader, A., and G. Karagiannis, "Pre-
2125	              Congestion Notification Marking",
2126	              draft-briscoe-tsvwg-cl-phb-03 (work in progress),
2127	              October 2006.

2129	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2130	              Requirement Levels", BCP 14, RFC 2119, March 1997.

2132	   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
2133	              Network Element Service", RFC 2211, September 1997.

2135	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
2136	              of Explicit Congestion Notification (ECN) to IP",
2137	              RFC 3168, September 2001.

2139	   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
2140	              J., Courtney, W., Davari, S., Firoiu, V., and D.
2141	              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
2142	              Behavior)", RFC 3246, March 2002.

2144	   [RSVP-ECN]
2145	              Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P.,
2146	              Babiarz, J., and K. Chan, "RSVP Extensions for Admission
2147	              Control over Diffserv using Pre-congestion Notification",
2148	              draft-lefaucheur-rsvp-ecn-01 (work in progress),
2149	              June 2006.

2151	   [Re-TCP]   Briscoe, B., Jacquet, A., Salvatori, A., and M. Koyabe,
2152	              "Re-ECN: Adding Accountability for Causing Congestion to
2153	              TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-04 (work in
2154	              progress), June 2007.

2156	14.2.  Informative References

2158	   [CLoop_pol]
2159	              Salvatori, A., "Closed Loop Traffic Policing", Politecnico
2160	              Torino and Institut Eurecom Masters Thesis,
2161	              September 2005.

2163	   [ECN-BGP]  Mortier, R. and I. Pratt, "Incentive Based Inter-Domain
2164	              Routeing", Proc Internet Charging and QoS Technology
2165	              Workshop (ICQT'03) pp308--317, September 2003, <http://
2166	              research.microsoft.com/users/mort/publications.aspx>.

2168	   [ECN-MPLS]
2169	              Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
2170	              Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in
2171	              progress), June 2007.

2173	   [IXQoS]    Briscoe, B. and S. Rudkin, "Commercial Models for IP
2174	              Quality of Service Interconnect", BT Technology Journal
2175	              (BTTJ) 23(2)171--195, April 2005,
2176	              <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>.

2178	   [NSIS-RMD]
2179	              Bader, A., Westberg, L., Karagiannis, G., Kappler, C., and
2180	              T. Phelan, "RMD-QOSM - The Resource Management in Diffserv
2181	              QOS Model", draft-ietf-nsis-rmd-09 (work in progress),
2182	              March 2007.

2184	   [PCN-arch]
2185	              Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
2186	              Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion
2187	              Notification Architecture",
2188	              draft-eardley-pcn-architecture-00 (work in progress),
2189	              June 2007.

2191	   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
2192	              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
2193	              Functional Specification", RFC 2205, September 1997.

2195	   [RFC2207]  Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC
2196	              Data Flows", RFC 2207, September 1997.

2198	   [RFC2208]  Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,
2199	              M., Romanow, A., Weinrib, A., and L. Zhang, "Resource
2200	              ReSerVation Protocol (RSVP) Version 1 Applicability
2201	              Statement Some Guidelines on Deployment", RFC 2208,
2202	              September 1997.

2204	   [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP Cryptographic
2205	              Authentication", RFC 2747, January 2000.

2207	   [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
2208	              Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
2209	              Felstaine, "A Framework for Integrated Services Operation
2210	              over Diffserv Networks", RFC 2998, November 2000.

2212	   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
2213	              Congestion Notification (ECN) Signaling with Nonces",
2214	              RFC 3540, June 2003.

2216	   [RFC4727]  Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,
2217	              ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006.

2219	   [Re-fb]    Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
2220	              Salvatori, A., Soppera, A., and M. Koyabe, "Policing
2221	              Congestion Response in an Internetwork Using Re-Feedback",
2222	              ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
2223	              www.acm.org/sigs/sigcomm/sigcomm2005/
2224	              techprog.html#session8>.

2226	   [Smart_rtg]
2227	              Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,
2228	              "Optimizing Cost and Performance for Multihoming", ACM
2229	              SIGCOMM CCR 34(4)79--92, October 2004,
2230	              <http://citeseer.ist.psu.edu/698472.html>.

2232	   [Steps_DoS]
2233	              Handley, M. and A. Greenhalgh, "Steps towards a DoS-
2234	              resistant Internet Architecture", Proc. ACM SIGCOMM
2235	              workshop on Future directions in network architecture
2236	              (FDNA'04) pp 49--56, August 2004.

2238	Appendix A.  Implementation

2240	A.1.  Ingress Gateway Algorithm for Blanking the RE flag

2242	   The ingress gateway receives regular feedback reporting the fraction
2243	   of congestion marked octets for each aggregate arriving at the
2244	   egress.  So for each aggregate it should blank the RE flag on the
2245	   same fraction of octets.  It is more efficient to calculate the
2246	   reciprocal of this fraction when the signalling arrives, Z_0 = (1 /
2247	   Congestion-Level-Estimate).  Z_0 is then the number of octets the
2248	   ingress should send in total for each octet it sends with the RE
2249	   flag blanked.  Z_0 will also take account of the sustainable
2250	   rate reported during the flow pre-emption process, if
2251	   necessary.

2253	   A suitable pseudo-code algorithm for the ingress gateway is as
2254	   follows:

2256	   ====================================================================
2257	   B_i = 0                 /* interblank volume                     */
2258	   for each PCN-capable packet {
2259	       b = readLength(packet)      /* set b to packet size          */
2260	       B_i += b            /* accumulate interblank volume          */
2261	       if B_i < b * Z_0 {  /* test whether interblank volume...     */
2262	           writeRE(1)
2263	       } else {            /* ...exceeds blank RE spacing * pkt size*/
2264	           writeRE(0)      /* ...and if so, clear RE                */
2265	           B_i = 0         /* ...and re-set interblank volume       */
2266	       }
2267	   }
2268	   ====================================================================
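
   As a minimal sketch only, the pseudo-code above could be rendered in
   Python as follows.  The function name blank_re_flags, its list-based
   interface and the handling of a zero Congestion-Level-Estimate are
   assumptions made for illustration; they are not defined by this memo.

```python
def blank_re_flags(packet_sizes, congestion_level_estimate):
    """Return one RE flag per packet (1 = set, 0 = blanked).

    packet_sizes: sizes in octets of the PCN-capable packets of one
    aggregate, in sending order.
    """
    if congestion_level_estimate <= 0:
        # Assumption: with no reported pre-congestion, never blank RE.
        return [1] * len(packet_sizes)
    z0 = 1.0 / congestion_level_estimate    # Z_0 in the text
    flags = []
    interblank = 0                          # B_i, interblank volume
    for b in packet_sizes:
        interblank += b                     # accumulate volume
        if interblank < b * z0:
            flags.append(1)                 # writeRE(1)
        else:
            flags.append(0)                 # writeRE(0)
            interblank = 0                  # re-set interblank volume
    return flags


# Equal-sized packets with a Congestion-Level-Estimate of 1%.
flags = blank_re_flags([1500] * 1000, 0.01)
blanked_fraction = flags.count(0) / len(flags)
```

   With equal-sized packets, the fraction of packets sent with the RE
   flag blanked then tracks the reported Congestion-Level-Estimate.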

2270	A.2.  Downstream Congestion Metering Algorithms

2272	A.2.1.  Bulk Downstream Congestion Metering Algorithm

2274	   To meter the bulk amount of downstream pre-congestion in traffic
2275	   crossing an inter-domain border, an algorithm is needed that
2276	   accumulates the size of positive packets and subtracts the size of
2277	   negative packets.  We maintain two counters:

2279	      V_b: accumulated pre-congestion volume

2281	      B: total data volume (in case it is needed)

2283	   A suitable pseudo-code algorithm for a border router is as follows:

2285	   ====================================================================
2286	   V_b = 0
2287	   B   = 0
2288	   for each PCN-capable packet {
2289	       b = readLength(packet)      /* set b to packet size          */
2290	       B += b                      /* accumulate total volume       */
2291	       if readEECN(packet) in {Re-Echo, FNE} {
2292	           V_b += b                /* increment...                  */
2293	       } else if readEECN(packet) in {AM(-1), PM(-1)} {
2294	           V_b -= b                /* ...or decrement V_b...        */
2295	       }                           /*...depending on EECN field     */
2296	   }
2297	   ====================================================================

2299	   At the end of an accounting period this counter V_b represents the
2300	   pre-congestion volume that penalties could be applied to, as
2301	   described in Section 5.3.

2303	   For instance, accumulated volume of pre-congestion through a border
2304	   interface over a month might be V_b = 5PB (petabyte = 10^15 byte).
2305	   This might have resulted from an average downstream pre-congestion
2306	   level of 1% on an accumulated total data volume of B = 500PB.
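
   The bulk metering algorithm of this section can be sketched in Python
   as below.  This is an illustration under stated assumptions rather
   than a definitive implementation: the Eecn enumeration, the NEUTRAL
   stand-in for every other codepoint and the (size, codepoint) tuple
   representation of a packet are all hypothetical.

```python
from enum import Enum


class Eecn(Enum):
    # Hypothetical labels for the extended ECN codepoints of interest.
    RE_ECHO = "Re-Echo"
    FNE = "FNE"
    AM_NEG = "AM(-1)"
    PM_NEG = "PM(-1)"
    NEUTRAL = "neutral"     # assumption: stands for all other codepoints


POSITIVE = {Eecn.RE_ECHO, Eecn.FNE}
NEGATIVE = {Eecn.AM_NEG, Eecn.PM_NEG}


def meter_bulk(packets):
    """packets: iterable of (size_in_octets, Eecn) pairs.

    Returns (V_b, B): accumulated pre-congestion volume and total
    data volume, both in octets.
    """
    v_b = 0
    total = 0
    for size, codepoint in packets:
        total += size               # accumulate total volume B
        if codepoint in POSITIVE:
            v_b += size             # positive packet: increment V_b
        elif codepoint in NEGATIVE:
            v_b -= size             # negative packet: decrement V_b
    return v_b, total


# 3 positive, 1 negative and 96 neutral packets of 1500 octets each.
traffic = ([(1500, Eecn.RE_ECHO)] * 3
           + [(1500, Eecn.AM_NEG)]
           + [(1500, Eecn.NEUTRAL)] * 96)
v_b, total = meter_bulk(traffic)
level = v_b / total     # average downstream pre-congestion level
```

   The ratio V_b / B computed at the end corresponds to the average
   downstream pre-congestion level in the example above (5PB / 500PB =
   1%).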

2308	A.2.2.  Inflation Factor for Persistently Negative Flows

2310	   The following process is suggested to complement the simple algorithm
2311	   above in order to protect against the various attacks from
2312	   persistently negative flows described in Section 5.6.1.  As explained
2313	   in that section, the most important and first step is to estimate the
2314	   contribution of persistently negative flows to the bulk volume of
2315	   downstream pre-congestion and to inflate this bulk volume as if these
2316	   flows weren't there.  The process below has been designed to give an
2317	   unbiased estimate, but it may be possible to define other processes
2318	   that achieve similar ends.

2320	   While the above simple metering algorithm is counting the bulk of
2321	   traffic over an accounting period, the meter should also select a
2322	   subset of the whole flow ID space that is small enough to be able to
2323	   realistically measure but large enough to give a realistic sample.
2324	   Many different samples of different subsets of the ID space should be
2325	   taken at different times during the accounting period, preferably
2326	   covering the whole ID space.  During each sample, the meter should
2327	   count the volume of positive packets and subtract the volume of
2328	   negative, maintaining a separate account for each flow in the sample.
2329	   Each sample should run much longer than the large majority of flows,
2330	   to avoid a bias from missing the starts and ends of flows, which tend
2331	   to be positive and negative respectively.

2333	   Once the accounting period finishes, the meter should calculate, for
2334	   each sample, the total of the accounts V_{bI} for the subset of flows
2335	   I in the sample, and the total of the accounts V_{fI} excluding flows
2336	   with a negative account from the subset I.  Then the weighted mean of
2337	   all these samples should be taken:
2338	   a_S = sum_{forall I} V_{fI} / sum_{forall I} V_{bI}.

2340	   If V_b is the result of the bulk accounting algorithm over the
2341	   accounting period (Appendix A.2.1), it can be inflated by this factor
2342	   a_S to get a good unbiased estimate of the volume of downstream
2343	   congestion over the accounting period, a_S.V_b, without being
2344	   polluted by the effect of persistently negative flows.
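
   The calculation of the inflation factor might be sketched in Python
   as follows.  The representation of each sample as a mapping from a
   flow ID to that flow's account, the flow IDs themselves and the
   error raised when the sampled total is not positive are assumptions
   made for illustration; the memo does not specify them.

```python
def inflation_factor(samples):
    """Return a_S from a list of samples.

    Each sample is a dict mapping a flow ID to that flow's account
    (volume of positive packets minus volume of negative packets, in
    octets) for one sampled subset I of the flow ID space.
    """
    sum_v_f = 0     # sum over I of V_{fI}
    sum_v_b = 0     # sum over I of V_{bI}
    for accounts in samples:
        # V_{bI}: total of all per-flow accounts in the sample.
        sum_v_b += sum(accounts.values())
        # V_{fI}: the same total, excluding negative accounts.
        sum_v_f += sum(v for v in accounts.values() if v >= 0)
    if sum_v_b <= 0:
        # Assumption: no meaningful factor can be derived in this case.
        raise ValueError("sampled pre-congestion volume is not positive")
    return sum_v_f / sum_v_b


# Two samples; flow "f2" is persistently negative.
samples = [
    {"f1": 3000, "f2": -1000},
    {"f3": 2000, "f4": 0},
]
a_s = inflation_factor(samples)

v_b = 4_000_000             # bulk result from Appendix A.2.1 (made up)
inflated = a_s * v_b        # estimate with negative flows discounted
```

   Because V_{fI} can never be smaller than V_{bI}, a_S computed this
   way is never less than 1 whenever the sampled total is positive.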

2346	A.3.  Algorithm for Sanctioning Negative Traffic

2348	   {ToDo: Write up algorithms similar to Appendix D of [Re-TCP] for the
2349	   negative flow monitor with flow management algorithm and the variant
2350	   with bounded flow state.}

2352	Author's Address

2354	   Bob Briscoe
2355	   BT & UCL
2356	   B54/77, Adastral Park
2357	   Martlesham Heath
2358	   Ipswich  IP5 3RE
2359	   UK

2361	   Phone: +44 1473 645196
2362	   Email: bob.briscoe@bt.com
2363	   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

2365	Full Copyright Statement

2367	   Copyright (C) The IETF Trust (2007).

2369	   This document is subject to the rights, licenses and restrictions
2370	   contained in BCP 78, and except as set forth therein, the authors
2371	   retain all their rights.

2373	   This document and the information contained herein are provided on an
2374	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2375	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
2376	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
2377	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
2378	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2379	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2381	Intellectual Property

2383	   The IETF takes no position regarding the validity or scope of any
2384	   Intellectual Property Rights or other rights that might be claimed to
2385	   pertain to the implementation or use of the technology described in
2386	   this document or the extent to which any license under such rights
2387	   might or might not be available; nor does it represent that it has
2388	   made any independent effort to identify any such rights.  Information
2389	   on the procedures with respect to rights in RFC documents can be
2390	   found in BCP 78 and BCP 79.

2392	   Copies of IPR disclosures made to the IETF Secretariat and any
2393	   assurances of licenses to be made available, or the result of an
2394	   attempt made to obtain a general license or permission for the use of
2395	   such proprietary rights by implementers or users of this
2396	   specification can be obtained from the IETF on-line IPR repository at
2397	   http://www.ietf.org/ipr.

2399	   The IETF invites any interested party to bring to its attention any
2400	   copyrights, patents or patent applications, or other proprietary
2401	   rights that may cover technology that may be required to implement
2402	   this standard.  Please address the information to the IETF at
2403	   ietf-ipr@ietf.org.

2405	Acknowledgments

2407	   Funding for the RFC Editor function is provided by the IETF
2408	   Administrative Support Activity (IASA).  This document was produced
2409	   using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
2410	   RFC-2629 XML format.