TSVWG                                                         B. Briscoe
Internet Draft                                                P. Eardley
draft-briscoe-tsvwg-cl-architecture-03.txt                  D. Songhurst
Expires: December 2006                                                BT

                                                          F. Le Faucheur
                                                               A. Charny
                                                      Cisco Systems, Inc

                                                              J. Babiarz
                                                                 K. Chan
                                                               S. Dudley
                                                                  Nortel

                                                          G. Karagiannis
                                         University of Twente / Ericsson

                                                                A. Bader
                                                             L. Westberg
                                                                Ericsson

                                                           26 June, 2006

    An edge-to-edge Deployment Model for Pre-Congestion Notification:
                Admission Control over a DiffServ Region
                draft-briscoe-tsvwg-cl-architecture-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 26, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).  All Rights Reserved.

Abstract

   This document describes a deployment model for pre-congestion
   notification (PCN).  PCN-based flow admission control and, if
   necessary, flow pre-emption preserve the Controlled Load service to
   admitted flows.  Routers in a large DiffServ-based region of the
   Internet use new pre-congestion notification marking to give early
   warning of their own congestion.  Gateways around the edges of the
   region convert measurements of this packet-granularity marking into
   admission control and pre-emption functions at flow granularity.
   Note that interior routers of the DiffServ-based region do not
   require flow state or signalling - they only have to do the bulk
   packet marking of PCN.  Hence an end-to-end Controlled Load service
   can be achieved without any scalability impact on interior routers.

Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION)

   This document is posted as an Internet-Draft with the intention of
   eventually becoming an INFORMATIONAL RFC.

Table of Contents

   1. Introduction
      1.1. Summary
         1.1.1. Flow admission control
         1.1.2. Flow pre-emption
         1.1.3. Both admission control and pre-emption
      1.2. Terminology
      1.3. Existing terminology
      1.4. Standardisation requirements
      1.5. Structure of rest of the document
   2. Key aspects of the deployment model
      2.1. Key goals
      2.2. Key assumptions
      2.3. Key benefits
   3. Deployment model
      3.1. Admission control
         3.1.1. Pre-Congestion Notification for Admission Marking
         3.1.2. Measurements to support admission control
         3.1.3. How edge-to-edge admission control supports end-to-end
                QoS signalling
         3.1.4. Use case
      3.2. Flow pre-emption
         3.2.1. Alerting an ingress gateway that flow pre-emption may
                be needed
         3.2.2. Determining the right amount of CL traffic to drop
         3.2.3. Use case for flow pre-emption
   4. Summary of Functionality
      4.1. Ingress gateways
      4.2. Interior routers
      4.3. Egress gateways
      4.4. Failures
   5. Limitations and some potential solutions
      5.1. ECMP
      5.2. Beat down effect
      5.3. Bi-directional sessions
      5.4. Global fairness
      5.5. Flash crowds
      5.6. Pre-empting too fast
      5.7. Other potential extensions
         5.7.1. Tunnelling
         5.7.2. Multi-domain and multi-operator usage
         5.7.3. Preferential dropping of pre-emption marked packets
         5.7.4. Adaptive bandwidth for the Controlled Load service
         5.7.5. Controlled Load service with end-to-end Pre-Congestion
                Notification
         5.7.6. MPLS-TE
   6. Relationship to other QoS mechanisms
      6.1. IntServ Controlled Load
      6.2. Integrated services operation over DiffServ
      6.3. Differentiated Services
      6.4. ECN
      6.5. RTECN
      6.6. RMD
      6.7. RSVP Aggregation over MPLS-TE
   7. Security Considerations
   8. Acknowledgements
   9. Comments solicited
   10. Changes from earlier versions of the draft
   11. Appendices
      11.1. Appendix A: Explicit Congestion Notification
      11.2. Appendix B: What is distributed measurement-based admission
            control?
      11.3. Appendix C: Calculating the Exponentially weighted moving
            average (EWMA)
   12. References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Copyright Statement

1. Introduction

1.1. Summary

   This document describes a deployment model to achieve an end-to-end
   Controlled Load service by using (within a large region of the
   Internet) DiffServ and edge-to-edge distributed measurement-based
   admission control and flow pre-emption.  Controlled Load service is
   a quality of service (QoS) closely approximating the QoS that the
   same flow would receive from a lightly loaded network element
   [RFC2211].  Controlled Load (CL) is useful for inelastic flows such
   as those for real-time media.

   In line with the "IntServ over DiffServ" framework defined in
   [RFC2998], the CL service is supported end-to-end and RSVP
   signalling [RFC2205] is used end-to-end, over an edge-to-edge
   DiffServ region.
    ___     ___    _______________________________________    ___     ___
   |   |   |   |  |                                       |  |   |   |   |
   |   |   |   |  |Ingress    Interior routers     Egress |  |   |   |   |
   |   |   |   |  |gateway                        gateway |  |   |   |   |
   |   |   |   |  |-------+  +-------+  +-------+  +------|  |   |   |   |
   |   |   |   |  | PCN-  |  | PCN-  |  | PCN-  |  |      |  |   |   |   |
   |   |...|   |..|marking|..|marking|..|marking|..| Meter|..|   |...|   |
   |   |   |   |  |-------+  +-------+  +-------+  +------|  |   |   |   |
   |   |   |   |  |   \                            /      |  |   |   |   |
   |   |   |   |  |    \  Congestion-Level-Estimate       |  |   |   |   |
   |   |   |   |  |     \  (for admission control)/       |  |   |   |   |
   |   |   |   |  |      --<-----<----<----<-----<--      |  |   |   |   |
   |   |   |   |  |      Sustainable-Aggregate-Rate       |  |   |   |   |
   |   |   |   |  |       (for flow pre-emption)          |  |   |   |   |
   |___|   |___|  |_______________________________________|  |___|   |___|

     Sx    Access               CL-region                  Access     Rx
    End    Network                                         Network    End
    Host                                                              Host

                  <------- edge-to-edge signalling ------->
                  (for admission control & flow pre-emption)

   <---------------- end-to-end QoS signalling protocol ---------------->

   Figure 1: Overall QoS architecture (NB terminology explained later)

   Figure 1 shows an example of an overall QoS architecture, where the
   two access networks are connected by a CL-region.  Another
   possibility is that there are several CL-regions between the access
   networks - each would operate the Pre-Congestion Notification
   mechanisms separately.

   In Section 1.1.1 we summarise how admission of new CL microflows is
   controlled so as to deliver the required QoS.  In abnormal
   circumstances, for instance a disaster affecting multiple interior
   routers, the QoS of existing CL microflows may degrade even if care
   was exercised when admitting those microflows before those
   circumstances arose.  Therefore we also propose a mechanism
   (summarised in Section 1.1.2) to pre-empt some of the existing
   microflows.
   The remaining microflows then retain their expected QoS, while
   improved QoS is quickly restored to lower priority traffic.

   As a fundamental building block to support these two mechanisms, we
   introduce "Pre-Congestion Notification".  Pre-Congestion
   Notification (PCN) builds on the concepts of RFC 3168, "The Addition
   of Explicit Congestion Notification (ECN) to IP".  The [PCN]
   document defines the respective algorithms that determine when a
   PCN-enabled router marks a packet with Admission Marking or Pre-
   emption Marking, depending on the traffic level.

   In order to support CL traffic we would expect PCN to supplement the
   existing Expedited Forwarding (EF) behaviour.  Within the controlled
   edge-to-edge region, a particular packet receives the Pre-Congestion
   Notification (PCN) behaviour if the packet's differentiated services
   codepoint (DSCP) is set to EF and the ECN field indicates an ECN-
   Capable Transport.  However, PCN is not only intended to supplement
   EF.  PCN is specified (in [PCN]) as a building block which can
   supplement the scheduling behaviour of other PHBs.

   There are various possible ways to encode the markings into a
   packet, using the ECN field and perhaps other DSCPs, which are
   discussed in [PCN].  In this draft we use the abstract names
   Admission Marking and Pre-emption Marking.

   This framework assumes that the Pre-Congestion Notification
   behaviour is used in a controlled environment, i.e. within the
   controlled edge-to-edge region.

1.1.1. Flow admission control

   This document describes a new admission control procedure for an
   edge-to-edge region, which uses new per-hop Pre-Congestion
   Notification 'admission marking' as a fundamental building block.
   In turn, an end-to-end CL service would use this as a building block
   within a broader QoS architecture.

   The per-hop, edge-to-edge and end-to-end aspects are now briefly
   introduced in turn.
   Appendix A provides a brief summary of Explicit Congestion
   Notification (ECN) [RFC3168].  RFC 3168 specifies that a router sets
   the ECN field to the Congestion Experienced (CE) value as a warning
   of incipient congestion.  It does not mandate a particular algorithm
   for setting the CE codepoint, although Random Early Detection (RED)
   is expected to be used.

   Pre-Congestion Notification (PCN) builds on the concepts of ECN.
   PCN introduces a new algorithm that Admission Marks packets before
   there is any significant build-up of CL packets in the queue.
   Admission marked packets therefore act as an "early warning" that
   the amount of traffic flowing is getting close to the engineered
   capacity.  Hence PCN can be used with per-hop behaviours (PHBs)
   designed to operate with very low queue occupancy, such as Expedited
   Forwarding (EF).  Note that our use of the ECN field operates across
   the CL-region, i.e. edge-to-edge, and not host-to-host as in
   [RFC3168].

   Turning next to the edge-to-edge aspect: all routers within a region
   of the Internet, which we call the CL-region, apply the PHB used for
   CL traffic and the Pre-Congestion Notification behaviour.  Traffic
   must enter/leave the CL-region through ingress/egress gateways,
   which have special functionality.  Typically the CL-region is the
   core or backbone of an operator.  The CL service is achieved "edge-
   to-edge" across the CL-region by using distributed measurement-based
   admission control: the decision whether to admit a new microflow
   depends on a measurement of the existing traffic between the same
   pair of ingress and egress gateways (i.e. the same pair as the
   prospective new microflow).
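   Purely for illustration of the per-hop building block, an Admission
   Marking meter might be sketched as a virtual queue that drains at
   the configured-admission-rate (Section 1.2) and marks packets once
   its depth crosses a threshold, so marking begins while the real
   queue is still essentially empty.  This sketch and its parameters
   are assumptions for the example only; the normative algorithms are
   defined in [PCN].

```python
# Illustrative sketch only: the real Admission Marking algorithms are
# specified in [PCN].  A virtual queue fills at the actual CL arrival
# rate but drains at the configured-admission-rate, so a backlog (and
# hence marking) appears only when arrivals exceed that reference rate.

class AdmissionMarker:
    def __init__(self, configured_admission_rate, mark_threshold):
        self.rate = configured_admission_rate  # reference rate, bytes/s
        self.threshold = mark_threshold        # virtual backlog, bytes
        self.vqueue = 0.0                      # virtual queue depth, bytes
        self.last_time = 0.0

    def on_packet(self, size_bytes, now):
        """Return True if this CL packet should be Admission Marked."""
        # Drain the virtual queue for the elapsed interval...
        elapsed = now - self.last_time
        self.vqueue = max(0.0, self.vqueue - self.rate * elapsed)
        self.last_time = now
        # ...then add the new packet and compare against the threshold.
        self.vqueue += size_bytes
        return self.vqueue > self.threshold
```

   The units and threshold value here are arbitrary; [PCN] discusses
   how the marking behaviour is actually configured.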
   (See Appendix B for further discussion of "What is distributed
   measurement-based admission control?")

   As CL packets travel across the CL-region, routers will admission
   mark packets (according to the Pre-Congestion Notification
   algorithm) as an "early warning" of potential congestion, i.e.
   before there is any significant build-up of CL packets in the queue.
   For traffic from each remote ingress gateway, the CL-region's egress
   gateway measures the fraction of CL traffic that is admission
   marked.  The egress gateway calculates this fraction on a per-bit
   basis as a moving average (an exponentially weighted moving average
   is suggested), which we term the Congestion-Level-Estimate (CLE).
   It then reports the CLE to the CL-region's ingress gateway, piggy-
   backed on the signalling for a new flow.  The ingress gateway only
   admits the new CL microflow if the Congestion-Level-Estimate is less
   than the CLE-threshold.  Hence previously accepted CL microflows
   will suffer minimal queuing delay, jitter and loss.

   In turn, the edge-to-edge architecture is a building block in
   delivering an end-to-end CL service.  The approach is similar to
   that described in [RFC2998] for Integrated services operation over
   DiffServ networks.  As in [RFC2998], an IntServ class (CL in our
   case) is achieved end-to-end, with a CL-region viewed as a single
   reservation hop in the total end-to-end path.  Interior routers of
   the CL-region do not process flow signalling, nor do they hold per-
   flow state.  We assume that the end-to-end signalling mechanism is
   RSVP (Section 2.2).  However, the RSVP signalling may itself be
   originated or terminated by proxies still closer to the edge of the
   network, such as home hubs or the like, triggered in turn by
   application layer signalling.  [RFC2998] and our approach are
   compared further in Section 6.2.
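   The edge-to-edge admission control loop described above can be
   sketched roughly as follows.  This is not normative behaviour: the
   EWMA weight, the per-interval measurement structure and the CLE-
   threshold of 5% are invented values for the example (Appendix C
   discusses EWMA calculation in more detail).

```python
# Rough sketch of the egress-side CLE measurement and the ingress-side
# admission test.  All numeric parameters are assumed example values.

class EgressMeter:
    """Kept per remote ingress gateway (i.e. per CL-region-aggregate):
    the Congestion-Level-Estimate (CLE) is an exponentially weighted
    moving average of the admission-marked fraction of CL bits."""

    def __init__(self, weight=0.3):
        self.weight = weight   # EWMA weight per measurement interval
        self.cle = 0.0         # Congestion-Level-Estimate, 0.0 .. 1.0
        self._marked_bits = 0
        self._total_bits = 0

    def on_packet(self, bits, admission_marked):
        self._total_bits += bits
        if admission_marked:
            self._marked_bits += bits

    def end_interval(self):
        """Fold this interval's marked fraction into the EWMA."""
        if self._total_bits:
            fraction = self._marked_bits / self._total_bits
            self.cle = (1 - self.weight) * self.cle + self.weight * fraction
        self._marked_bits = self._total_bits = 0
        return self.cle

def admit(reported_cle, cle_threshold=0.05):
    """Ingress gateway's test: admit the new CL microflow only if the
    CLE piggy-backed on the reservation signalling is below the
    CLE-threshold."""
    return reported_cle < cle_threshold
```

   In practice the CLE would be carried from egress to ingress in the
   extended RSVP signalling described in Section 1.4, not computed
   locally as here.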
   An important benefit compared with the IntServ over DiffServ model
   [RFC2998] arises from the fact that the load is controlled
   dynamically rather than with traffic conditioning agreements (TCAs).
   TCAs were originally introduced in the (informational) DiffServ
   architecture [RFC2475] as an alternative to reservation processing
   in the interior region, in order to reduce the burden on interior
   routers.  With TCAs, in practice service providers rely on
   subscription-time Service Level Agreements that statically define
   the parameters of the traffic that will be accepted from a customer.
   The problem arises because the TCA at the ingress must allow any
   destination address if it is to remain scalable.  But the longer the
   topology, the greater the chance that traffic will focus on an
   interior resource, even though it is within contract at the ingress
   [Reid], e.g. all flows converge on the same egress gateway.  Even
   though networks can be engineered to make such failures rare, when
   they occur all inelastic flows through the congested resource fail
   catastrophically.

   Distributed measurement-based admission control avoids reservation
   processing (whether per-flow or aggregated) on interior routers, but
   flows are still blocked dynamically in response to actual congestion
   on any interior router.  Hence there is no need for accurate or
   conservative prediction of the traffic matrix.

1.1.2. Flow pre-emption

   An essential QoS issue in core and backbone networks is being able
   to cope with failures of routers and links.  The consequent re-
   routing can cause severe congestion on some links and hence degrade
   the QoS experienced by on-going microflows and other, lower priority
   traffic.  Even when the network is engineered to sustain a single
   link failure, multiple link failures (e.g.
   due to a fibre cut, router failure or a natural disaster) can cause
   violation of capacity constraints and resulting QoS failures.  Our
   solution uses rate-based flow pre-emption, so that enough of the
   previously admitted CL microflows are dropped to ensure that the
   remaining ones again receive QoS commensurate with the CL service
   and at least some QoS is quickly restored to other traffic classes.

   The solution involves four steps.  First, triggering the ingress
   gateway to test whether pre-emption may be needed.  A router
   enhanced with Pre-Congestion Notification may optionally include an
   algorithm that Pre-emption Marks packets.  Reception of a packet
   with such a marking alerts the egress gateway that pre-emption may
   be needed, which in turn sends a Pre-emption Alert message to the
   ingress gateway.  Secondly, calculating the right amount of traffic
   to drop.  This involves the egress gateway measuring, and reporting
   to the ingress gateway, the current rate of CL traffic received from
   that particular ingress gateway.  This is the CL rate which the
   network can actually support from that ingress gateway to that
   egress gateway, and we thus call it the Sustainable-Aggregate-Rate.
   The ingress gateway compares the Sustainable-Aggregate-Rate with the
   rate that it is sending, and hence determines how much traffic needs
   to be pre-empted.  Thirdly, choosing which flows to shed in order to
   drop the traffic calculated in the second step.  Information on the
   priority of flows may be held by the ingress gateway, or by some
   out-of-band policy decision point.  How these systems co-ordinate to
   determine which flows to drop is outside the scope of this document,
   but between them they have all the information necessary to make the
   decision.  Fourthly, tearing down reservations for the chosen flows.
   The ingress gateway triggers standard tear-down messages for the
   reservation protocol in use.  In turn, this is expected to result in
   end-systems tearing down the corresponding sessions (e.g. voice
   calls) using the corresponding session control protocols.

   The focus of this document is on the first two steps, i.e.
   determining that pre-emption may be needed and estimating how much
   traffic needs to be pre-empted.  We provide some hints about the
   latter two steps in Section 3.2.3, but don't try to provide full
   guidance, as it greatly depends on the particular detailed
   operational situation.

   The solution operates within a little over one round trip time: the
   time required for microflow packets that have experienced Pre-
   emption Marking to travel downstream through the CL-region and
   arrive at the egress gateway, plus some additional time for the
   egress gateway to measure the rate seen after it has been alerted
   that pre-emption may be needed, and the time for the egress gateway
   to report this information to the ingress gateway.

1.1.3. Both admission control and pre-emption

   This document describes both the admission control and pre-emption
   mechanisms, and we suggest that an operator uses both.  However, we
   do not require this, and some operators may want to implement only
   one.

   For example, an operator could use just admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting': as
   sessions end, existing microflows naturally depart from the system
   over time, and the admission control mechanism will prevent
   admission of new microflows that use the affected links.  So the CL-
   region will naturally return to normal Controlled Load service, but
   with reduced capacity.  The drawback of this approach is that until
   flows naturally depart to relieve the congestion, all flows and
   lower priority services will be adversely affected.
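   Returning to the pre-emption mechanism summarised in Section 1.1.2:
   the second step's rate comparison, together with a purely
   hypothetical greedy flow-selection policy for the third step (which
   this document deliberately leaves out of scope), could be sketched
   as follows.

```python
# Hypothetical sketch: the draft specifies what must be compared, not
# this code, and leaves flow selection to the operator's policy system.

def preemption_target(ingress_aggregate_rate, sustainable_aggregate_rate):
    """Rate of CL traffic the ingress should shed for one
    CL-region-aggregate; zero if the aggregate is already within the
    Sustainable-Aggregate-Rate reported by the egress gateway."""
    return max(0.0, ingress_aggregate_rate - sustainable_aggregate_rate)

def choose_flows_to_drop(flows, target_rate):
    """Invented greedy policy: shed lowest-priority flows first until
    at least target_rate has been freed.

    flows: list of (flow_id, rate, priority) tuples, where a higher
    priority number means a more important session (cf. the MLPP-style
    priorities in Section 2.1)."""
    dropped, freed = [], 0.0
    for flow_id, rate, priority in sorted(flows, key=lambda f: f[2]):
        if freed >= target_rate:
            break
        dropped.append(flow_id)
        freed += rate
    return dropped
```

   For example, an ingress sending 120 Mbit/s on an aggregate whose
   egress reports a Sustainable-Aggregate-Rate of 90 Mbit/s would aim
   to shed 30 Mbit/s worth of flows.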
   As another example, an operator could use just admission control,
   avoiding heavy congestion (caused by re-routing) by 'capacity
   planning': configuring admission control thresholds to lower levels
   than the network could accept in normal situations, such that the
   load after a failure is expected to stay at acceptable levels even
   with reduced network resources.

   On the other hand, an operator could rely for admission control just
   on the traffic conditioning agreements of the DiffServ architecture
   [RFC2475].  The pre-emption mechanism described in this document
   would then be used to counteract the problem described at the end of
   Section 1.1.1.

1.2. Terminology

   This terminology is copied from the pre-congestion notification
   marking draft [PCN]:

   o  Pre-Congestion Notification (PCN): two new algorithms that
      determine when a PCN-enabled router Admission Marks and Pre-
      emption Marks a packet, depending on the traffic level.

   o  Admission Marking condition: the traffic level is such that the
      router Admission Marks packets.  The router provides an "early
      warning" that the load is nearing the engineered admission
      control capacity, before there is any significant build-up of CL
      packets in the queue.

   o  Pre-emption Marking condition: the traffic level is such that the
      router Pre-emption Marks packets.  The router warns explicitly
      that pre-emption may be needed.

   o  Configured-admission-rate: the reference rate used by the
      admission marking algorithm in a PCN-enabled router.

   o  Configured-pre-emption-rate: the reference rate used by the pre-
      emption marking algorithm in a PCN-enabled router.

   The following terms are defined here:

   o  Ingress gateway: router at an ingress to the CL-region.  A CL-
      region may have several ingress gateways.

   o  Egress gateway: router at an egress from the CL-region.  A CL-
      region may have several egress gateways.
   o  Interior router: a router which is part of the CL-region, but
      isn't an ingress or egress gateway.

   o  CL-region: a region of the Internet in which all traffic
      enters/leaves through an ingress/egress gateway and all routers
      run Pre-Congestion Notification marking.  A CL-region is a
      DiffServ region (either a single DiffServ domain or a set of
      contiguous DiffServ domains), but note that the CL-region does
      not use the traffic conditioning agreements (TCAs) of the
      (informational) DiffServ architecture.

   o  CL-region-aggregate: all the microflows between a specific pair
      of ingress and egress gateways.  Note there is no field in the
      flow packet headers that uniquely identifies the aggregate.

   o  Congestion-Level-Estimate: the number of bits in CL packets that
      are admission marked (or pre-emption marked), divided by the
      number of bits in all CL packets.  It is calculated as an
      exponentially weighted moving average, by an egress gateway, for
      the CL packets from a particular ingress gateway; i.e. there is a
      Congestion-Level-Estimate for each CL-region-aggregate.

   o  Sustainable-Aggregate-Rate: the rate of traffic that the network
      can actually support for a specific CL-region-aggregate.  It is
      measured by an egress gateway for the CL packets from a
      particular ingress gateway.

   o  Ingress-Aggregate-Rate: the rate of traffic that is being sent on
      a specific CL-region-aggregate.  It is measured by an ingress
      gateway for the CL packets sent towards a particular egress
      gateway.

1.3. Existing terminology

   This is a placeholder for useful terminology that is defined
   elsewhere.

1.4. Standardisation requirements

   The framework described in this document has two new
   standardisation requirements:

   o  new Pre-Congestion Notification behaviours for Admission Marking
      and Pre-emption Marking are required, as detailed in [PCN].
   o  the end-to-end signalling protocol needs to be modified to carry
      the Congestion-Level-Estimate report (for admission control) and
      the Sustainable-Aggregate-Rate (for flow pre-emption).  With our
      assumption of RSVP (Section 2.2) as the end-to-end signalling
      protocol, this means that extensions to RSVP are required, as
      detailed in [RSVP-PCN], for example to carry the Congestion-
      Level-Estimate and Sustainable-Aggregate-Rate information from
      egress gateway to ingress gateway.

   o  We are still discussing what to standardise about the gateways'
      behaviour.

   Other than this, the arrangement uses existing IETF protocols
   throughout, although not in their usual architecture.

1.5. Structure of rest of the document

   Section 2 describes some key aspects of the deployment model: our
   goals, assumptions and the benefits we believe it has.  Section 3
   describes the deployment model, whilst Section 4 summarises the
   required changes to the various routers in the CL-region.  Section 5
   outlines some limitations of PCN that we've identified in this
   deployment model; it also discusses some potential solutions and
   other possible extensions.  Section 6 provides some comparison with
   existing QoS mechanisms.

2. Key aspects of the deployment model

   In this section we discuss the key aspects of the deployment model:

   o  at a high level, our key goals, i.e. the functionality that we
      want to achieve;

   o  the assumptions that we're prepared to make;

   o  the consequent benefits they bring.

2.1. Key goals

   The deployment model achieves an end-to-end Controlled Load (CL)
   service where a segment of the end-to-end path is an edge-to-edge
   Pre-Congestion Notification region.  CL is a quality of service
   (QoS) closely approximating the QoS that the same flow would
   receive from a lightly loaded network element [RFC2211].  It is
   useful for inelastic flows such as those for real-time media.
   o  The CL service should be achieved despite varying load levels
      of other sorts of traffic, which may or may not be rate
      adaptive (i.e. responsive to packet drops or ECN marks).

   o  The CL service should be supported for a variety of possible CL
      sources: Constant Bit Rate (CBR), Variable Bit Rate (VBR) and
      voice with silence suppression.  VBR is the most challenging to
      support.

   o  After a localised failure in the interior of the CL-region
      causing heavy congestion, the CL service should recover
      gracefully by pre-empting (dropping) some of the admitted CL
      microflows, whilst preserving as many of them as possible with
      their full CL QoS.

   o  It needs to be possible to complete flow pre-emption within 1-2
      seconds.  Operators will have varying requirements but, at
      least for voice, it has been estimated that after a few seconds
      many affected users will start to hang up, making the flow
      pre-emption mechanism redundant and possibly even
      counter-productive.  Until flow pre-emption kicks in, other
      applications using CL (e.g. video) and lower priority traffic
      (e.g. Assured Forwarding (AF)) could be receiving reduced
      service.  Therefore an even faster flow pre-emption mechanism
      would be desirable (even if, in practice, operators have to add
      a deliberate pause to ride out a transient while the natural
      rate of call tear-down or lower layer protection mechanisms
      kick in).

   o  The CL service should support emergency services ([EMERG-RQTS],
      [EMERG-TEL]) as well as the Assured Service, which is the IP
      implementation of the existing ITU-T/NATO/DoD telephone system
      architecture known as Multi-Level Pre-emption and Precedence
      [ITU.MLPP.1990] [ANSI.MLPP.Spec] [ANSI.MLPP.Supplement], or
      MLPP.  In particular, this involves admitting new flows that
      are part of high priority sessions even when admission control
      would reject new routine flows.
      Similarly, when having to choose which flows to pre-empt, this
      involves taking into account the priorities and properties of
      the sessions that the flows are part of.

2.2.  Key assumptions

   The framework does not try to deliver the above functionality in
   all scenarios.  We make the following assumptions about the type
   of scenario to be solved.

   o  Edge-to-edge: all the routers in the CL-region are upgraded
      with Pre-Congestion Notification, and all the ingress and
      egress gateways are upgraded to perform the measurement-based
      admission control and flow pre-emption.  Note that although the
      upgrades required are edge-to-edge, the CL service is provided
      end-to-end.

   o  Additional load: we assume that any additional load offered
      within the reaction time of the admission control mechanism
      doesn't move the CL-region directly from no congestion to
      overload.  So we assume there will always be an intermediate
      stage where some CL packets are Admission Marked, but they are
      still delivered without significant QoS degradation.  We
      believe this is valid for core and backbone networks with
      typical call arrival patterns (given the reaction time is
      little more than one round trip time across the CL-region), but
      is unlikely to be valid in access networks, where the
      granularity of an individual call becomes significant.

   o  Aggregation: we assume that in normal operation there are many
      CL microflows within the CL-region, typically at least hundreds
      between any pair of ingress and egress gateways.  The
      implication is that the solution is targeted at core and
      backbone networks and possibly parts of large access networks.

   o  Trust: we assume that there is trust between all the routers in
      the CL-region.  For example, this trust model is satisfied if
      one operator runs the whole of the CL-region.  But we make no
      such assumptions about the end hosts, i.e.
      depending on the scenario they may be trusted or untrusted by
      the CL-region.

   o  Signalling: we assume that the end-to-end signalling protocol
      is RSVP.  Section 3 describes how the CL-region fits into such
      an end-to-end QoS scenario, whilst [RSVP-PCN] describes the
      extensions to RSVP that are required.

   o  Separation: we assume that all routers within the CL-region are
      upgraded with the CL mechanism, so the requirements of [Floyd]
      are met because the CL-region is an enclosed environment.
      Also, an operator separates CL traffic in the CL-region from
      outside traffic by administrative configuration of the ring of
      gateways around the region.  Within the CL-region we assume
      that the CL traffic is separated from non-CL traffic.

   o  Routing: we assume that all packets between a pair of ingress
      and egress gateways follow the same path, or that they follow
      different paths but that the load balancing scheme in the
      CL-region is tuned to distribute load such that the different
      paths always receive comparable relative load.  This ensures
      that the Congestion-Level-Estimate used in the admission
      control procedure (which is computed taking into account
      packets travelling on all the paths) approximately reflects the
      status of the actual path that will be followed by the new
      microflow's packets.

   We are investigating ways of loosening the restrictions set by
   some of these assumptions, for instance:

   o  Trust: to allow the CL-region to span multiple, non-trusting
      operators, using the technique of [Re-PCN] as mentioned in
      Section 5.7.2.

   o  Signalling: we believe that the solution could operate with
      another signalling protocol, such as the one produced by the
      NSIS working group.  It could also work with application level
      signalling as suggested in [RT-ECN].
   o  Additional load: we believe that the assumption is valid for
      core and backbone networks, with an appropriate margin between
      the configured-admission-rate and the capacity for CL traffic.
      However, in principle a burst of admission requests can occur
      in a short time.  We expect this to be a rare event under
      normal conditions, but it could happen, e.g. due to a 'flash
      crowd'.  If it does, then more flows may be admitted than
      should be, triggering the pre-emption mechanism.  There are
      various ways an operator might try to alleviate this issue,
      which are discussed in the 'Flash crowds' section (Section 5.5)
      later.

   o  Separation: the assumption that CL traffic is separated from
      non-CL traffic implies that the CL traffic has its own PHB, not
      shared with other traffic.  We are looking at whether it could
      share Expedited Forwarding's PHB, but supplemented with
      Pre-Congestion Notification.  If this is possible, other PHBs
      (like Assured Forwarding) could be supplemented with the same
      new behaviours.  This is similar to how RFC3168 ECN was defined
      to supplement any PHB.

   o  Routing: we are looking in greater detail at the solution in
      the presence of Equal Cost Multi-Path routing and at suitable
      enhancements.  See also the 'ECMP' section (Section 5.1) later.

2.3.  Key benefits

   We believe that the mechanism described in this document has
   several advantages:

   o  It achieves statistical guarantees of quality of service for
      microflows, delivering a very low delay, jitter and packet loss
      service suitable for applications like voice and video calls
      that generate real-time inelastic traffic.  This is because of
      its per-microflow admission control scheme, combined with its
      dynamic on-path "early warning" of potential congestion.
      The guarantee is at least as strong as with IntServ Controlled
      Load (Section 6.1 mentions why the guarantee may be somewhat
      better), but without the scalability problems of per-microflow
      IntServ.

   o  It can support "Emergency" and military Multi-Level Pre-emption
      and Precedence (MLPP) services, even in times of heavy
      congestion (perhaps caused by failure of a router within the
      CL-region), by pre-empting on-going "ordinary" CL microflows.
      See also Section 4.5.

   o  It scales well, because there is no signal processing or
      per-flow state held by the interior routers of the CL-region.
      Note that interior routers only hold state per outgoing
      interface - they do not hold state per CL-region-aggregate nor
      per flow.

   o  It is resilient, again because no per-flow state is held by the
      interior routers of the CL-region.  Hence during an interior
      routing change caused by a router failure, no microflow state
      has to be relocated.  The flow pre-emption mechanism further
      helps resilience because it rapidly reduces the load to one
      that the CL-region can support.

   o  It helps preserve, through the flow pre-emption mechanism, QoS
      to as many microflows as possible and to lower priority traffic
      in times of heavy congestion (e.g. caused by failure of an
      interior router).  Otherwise long-lived microflows could cause
      loss on all CL microflows for a long time.

   o  It avoids the potential catastrophic failure problem that
      arises when the DiffServ architecture is used in large networks
      with statically provisioned capacity.  This is achieved by
      controlling the load dynamically, based on real-time
      edge-to-edge measurement of Pre-Congestion Notification along
      the path, as discussed in Section 1.1.1.

   o  It requires minimal new standardisation, because it reuses
      existing QoS protocols and algorithms.

   o  It can be deployed incrementally, region by region or network
      by network.
      Not all the regions or networks on the end-to-end path need to
      have it deployed.  Two CL-regions can even be separated by a
      network that uses another QoS mechanism (e.g. MPLS-TE).

   o  It provides a deployment path for the use of ECN by real-time
      applications.  Operators can gain experience of ECN before its
      applicability to end-systems is understood and end terminals
      are ECN capable.

3.  Deployment model

3.1.  Admission control

   In this section we describe the admission control mechanism.  We
   discuss the three pieces of the solution and then give an example
   of how they fit together in a use case:

   o  the new Pre-Congestion Notification for Admission Marking used
      by all routers in the CL-region

   o  how the measurements made support our admission control
      mechanism

   o  how the edge-to-edge mechanism fits into the end-to-end RSVP
      signalling

3.1.1.  Pre-Congestion Notification for Admission Marking

   This is discussed in [PCN].  Here we only give a brief outline.

   To support our admission control mechanism, each router in the
   CL-region runs an algorithm to determine whether to Admission Mark
   a packet.  The algorithm measures the aggregate CL traffic on the
   link and ensures that packets are admission marked before the
   actual queue builds up, but when it is in danger of doing so soon;
   the probability of admission marking increases with the danger.
   The algorithm's main parameter is the configured-admission-rate,
   which is set lower than the link speed, perhaps considerably so.
   Admission marked packets indicate that the CL traffic rate is
   reaching the configured-admission-rate and so act as an "early
   warning" that the engineered capacity is nearly reached.
   Therefore they indicate that requests to admit prospective new CL
   flows may need to be refused.

3.1.2.  Measurements to support admission control

   To support our admission control mechanism the egress gateway
   measures the Congestion-Level-Estimate for traffic from each
   remote ingress gateway, i.e. per CL-region-aggregate.  The
   Congestion-Level-Estimate is the number of bits in CL packets that
   are admission marked or pre-emption marked, divided by the number
   of bits in all CL packets.  It is calculated as an exponentially
   weighted moving average.  It is calculated by an egress gateway
   separately for the CL packets from each particular ingress
   gateway.

   Why are pre-emption marked packets included in the
   Congestion-Level-Estimate?  Pre-emption marking over-writes
   admission marking, i.e. a packet cannot be both admission and
   pre-emption marked.  So if pre-emption marked packets weren't
   counted we would have the anomaly that, as the traffic rate grew
   above the configured-pre-emption-rate, the
   Congestion-Level-Estimate would fall.  If a particular encoding
   scheme is chosen in which a packet can be both admission and
   pre-emption marked (such as Alternative 4 in Appendix C of [PCN]),
   then this is not necessary.

   This Congestion-Level-Estimate provides an estimate of how near
   the links on the path inside the CL-region are getting to the
   configured-admission-rate.  Note that the metering is done
   separately per ingress gateway, because there may be sufficient
   capacity on all the routers on the path between one ingress
   gateway and a particular egress, but not from a second ingress to
   that same egress gateway.

3.1.3.  How edge-to-edge admission control supports end-to-end QoS
        signalling

   Consider a scenario that consists of two end hosts, each connected
   to their own access network, which are linked by the CL-region.  A
   source tries to set up a new CL microflow by sending an RSVP PATH
   message, and the receiving end host replies with an RSVP RESV
   message.
   Outside the CL-region some other method, for instance IntServ, is
   used to provide QoS.  From the perspective of RSVP the CL-region
   is a single hop, so the RSVP PATH and RESV messages are processed
   by the ingress and egress gateways but are carried transparently
   across all the interior routers; hence, the ingress and egress
   gateways hold per-microflow state, whilst no per-microflow state
   is kept by the interior routers.  So far this is as in IntServ
   over DiffServ [RFC2998].  However, in order to support our
   admission control mechanism, the egress gateway adds to the RESV
   message an opaque object which states the current
   Congestion-Level-Estimate for the relevant CL-region-aggregate.
   Details of the corresponding RSVP extensions are described in
   [RSVP-PCN].

3.1.4.  Use case

   To see how the three pieces of the solution fit together, we
   imagine a scenario where some microflows are already in place
   between a given pair of ingress and egress gateways, but the
   traffic load is such that no packets from these flows are
   admission marked as they travel across the CL-region.  A source
   wanting to start a new CL microflow sends an RSVP PATH message.
   The egress gateway adds an object to the RESV message with the
   Congestion-Level-Estimate, which is zero.  The ingress gateway
   sees this and consequently admits the new flow.  It then forwards
   the RSVP RESV message upstream towards the source end host.
   Hence, assuming there's sufficient capacity in the access
   networks, the new microflow is admitted end-to-end.

   The source now sends CL packets, which arrive at the ingress
   gateway.  The ingress uses a five-tuple filter to identify that
   the packets are part of a previously admitted CL microflow, and it
   also polices the microflow to ensure it remains within its traffic
   profile.  (The ingress has learnt the required information from
   the RSVP messages.)
   When forwarding a packet belonging to an admitted microflow, the
   ingress sets the packet's DSCP and ECN fields to the appropriate
   values configured for the CL-region.  The CL packet now travels
   across the CL-region, getting admission marked if necessary.

   Next, we imagine the same scenario but at a later time when load
   is higher at one (or more) of the interior routers, which start to
   Admission Mark CL packets because their load on the outgoing link
   is nearing the configured-admission-rate.  The next time a source
   tries to set up a CL microflow, the ingress gateway learns (from
   the egress) the relevant Congestion-Level-Estimate.  If it is
   greater than some CLE-threshold value then the ingress refuses the
   request, otherwise it is accepted.  The ingress gateway could also
   take into account attributes of the RSVP reservation (such as, for
   example, the RSVP pre-emption priority of [RSVP-PREEMPTION] or the
   RSVP admission priority of [RSVP-EMERGENCY]) as well as
   information provided by a policy decision point in order to make a
   more sophisticated admission decision.  This way, flow admission
   can help emergency/military calls by taking into account the
   corresponding priorities (as conveyed in RSVP policy elements)
   when deciding to admit or reject a new reservation.  Use of RSVP
   for the support of emergency/military applications is discussed in
   further detail in [RFC4542] and [RSVP-EMERGENCY].

   It is also possible for an egress gateway to get an RSVP RESV
   message and not know what the Congestion-Level-Estimate is, for
   example if there are no CL microflows at present between the
   relevant ingress and egress gateways.  In this case the egress
   requests the ingress to send probe packets, from which it can
   initialise its meter.  RSVP extensions for such a request to send
   probe data can be found in [RSVP-PCN].

3.2.  Flow pre-emption

   In this section we describe the flow pre-emption mechanism.  We
   discuss the two parts of the solution and then give an example of
   how they fit together in a use case:

   o  How an ingress gateway is triggered to test whether flow
      pre-emption may be needed

   o  How an ingress gateway determines the right amount of CL
      traffic to drop

   The mechanism is defined in [PCN] and [RSVP-PCN].

3.2.1.  Alerting an ingress gateway that flow pre-emption may be
        needed

   Alerting an ingress gateway that flow pre-emption may be needed is
   a two-stage process: a router in the CL-region alerts an egress
   gateway that flow pre-emption may be needed; in turn the egress
   gateway alerts the relevant ingress gateway.  Every router in the
   CL-region has the ability to alert egress gateways, which may be
   done either explicitly or implicitly:

   o  Explicit - the router per-hop behaviour is supplemented with a
      new Pre-emption Marking behaviour, which is outlined below.
      Reception of such a packet by the egress gateway alerts it that
      pre-emption may be needed.

   o  Implicit - the router behaviour is unchanged from the Admission
      Marking behaviour described earlier.  The egress gateway treats
      a Congestion-Level-Estimate of (almost) 100% as an implicit
      alert that pre-emption may be required.  ('Almost' because the
      Congestion-Level-Estimate is a moving average, so it can never
      reach exactly 100%.)

   To support explicit pre-emption alerting, each router in the
   CL-region runs an algorithm to determine whether to Pre-emption
   Mark a packet.  The algorithm measures the aggregate CL traffic
   and ensures that packets are pre-emption marked before the actual
   queue builds up.  The algorithm's main parameter is the
   configured-pre-emption-rate, which is set lower than the link
   speed (but higher than the configured-admission-rate).
   Thus pre-emption marked packets indicate that the CL traffic rate
   is reaching the configured-pre-emption-rate and so act as an
   "early warning" that the engineered capacity is nearly reached.
   Therefore they indicate that it may be advisable to pre-empt some
   of the existing CL flows in order to preserve the QoS of the
   others.

   Note that the explicit mechanism only makes sense if all the
   routers in the CL-region have the functionality, so that the
   egress gateways can rely on the explicit mechanism.  Otherwise
   there is the danger that the traffic happens to focus on a router
   without it, and egress gateways then also have to watch for
   implicit pre-emption alerts.

   When one or more packets in a CL-region-aggregate alert the egress
   gateway of the need for flow pre-emption, whether explicitly or
   implicitly, the egress puts that CL-region-aggregate into the
   Pre-emption Alert state.  For each CL-region-aggregate in alert
   state it measures the rate of traffic at the egress gateway (i.e.
   the traffic rate of the appropriate CL-region-aggregate) and
   reports this to the relevant ingress gateway.  The steps are:

   o  Determine the relevant ingress gateway - for the explicit case
      the egress gateway examines the pre-emption marked packet and
      uses the state installed at the time of admission to determine
      which ingress gateway the packet came from.  For the implicit
      case the egress gateway has already determined this
      information, because the Congestion-Level-Estimate is
      calculated per ingress gateway.

   o  Measure the traffic rate of CL packets - as soon as the egress
      gateway is alerted (whether explicitly or implicitly) it
      measures the rate of CL traffic from this ingress gateway (i.e.
      for this CL-region-aggregate).  Note that pre-emption marked
      packets are excluded from that measurement.
      It should make its measurement quickly and accurately, but
      exactly how is up to the implementation.

   o  Alert the ingress gateway - the egress gateway then immediately
      alerts the relevant ingress gateway that flow pre-emption may
      be required.  This Alert message also includes the measured
      Sustainable-Aggregate-Rate, i.e. the rate of CL traffic
      received from this ingress gateway.  The Alert message is sent
      using reliable delivery.  Procedures for the support of such an
      Alert using RSVP are defined in [RSVP-PCN].

                --------------       _ _         -----------------
    CL packet  |Update        |   / Is it a \  Y| Measure CL rate |
    arrives -->|Congestion-   |-->/pre-emption\>| from ingress and|
               |Level-Estimate|   \  marked   / | alert ingress   |
                --------------     \ packet? /   -----------------
                                     \_ _/

     Figure 2: Egress gateway action for explicit Pre-emption Alert

                                     _ _
                --------------      /   \        -----------------
    CL packet  |Update        |    / Is  \     Y| Measure CL rate |
    arrives -->|Congestion-   |-->/ C.L.E. \--->| from ingress and|
               |Level-Estimate|   \(nearly)/    | alert ingress   |
                --------------     \ 100%?/      -----------------
                                     \_ _/

     Figure 3: Egress gateway action for implicit Pre-emption Alert

3.2.2.  Determining the right amount of CL traffic to drop

   The method relies on the insight that the amount of CL traffic
   that can be supported between a particular pair of ingress and
   egress gateways is the amount of CL traffic that is actually
   getting across the CL-region to the egress gateway without being
   Pre-emption Marked.  Hence we term it the
   Sustainable-Aggregate-Rate.
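   The way an ingress gateway could act on this insight can be
   sketched as follows.  This Python fragment is purely illustrative:
   the function name, the flow tuples and the lowest-priority-first
   selection policy are our assumptions, not anything specified in
   this document; a real ingress would measure rates from live
   traffic and consult RSVP priorities or a policy decision point
   when choosing which flows to shed.

```python
# Illustrative sketch (not part of the specification): an ingress
# gateway's reaction to a Pre-emption Alert, using the rate
# comparison and "error" margin described in this section.

def flows_to_preempt(ingress_aggregate_rate, sustainable_aggregate_rate,
                     flows, error):
    """Return the flows to pre-empt, least important first.

    ingress_aggregate_rate:     rate this ingress sends to the egress
    sustainable_aggregate_rate: rate the egress reports in the Alert
    flows: list of (flow_id, rate, priority) in this CL-region-aggregate
    error: margin for measurement inaccuracy and source rate variation
    """
    # Only pre-empt if the ingress is sending significantly more than
    # the egress reports receiving without pre-emption marks.
    if ingress_aggregate_rate <= sustainable_aggregate_rate + error:
        return []

    # Shed slightly more load than seems necessary, in case the two
    # measurements were taken during a short-term fall in load.
    target = sustainable_aggregate_rate - error
    victims = []
    remaining = ingress_aggregate_rate
    # Drop the least important flows first (lowest priority value).
    for flow_id, rate, priority in sorted(flows, key=lambda f: f[2]):
        if remaining <= target:
            break
        victims.append(flow_id)
        remaining -= rate
    return victims
```

   For instance, with an Ingress-Aggregate-Rate of 100 units, a
   reported Sustainable-Aggregate-Rate of 90 and an error margin of
   5, only the lowest priority flow needs to be shed.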
   So when the ingress gateway gets the Alert message from an egress
   gateway, it compares:

   o  The traffic rate that it is sending to this particular egress
      gateway (which we term the Ingress-Aggregate-Rate)

   o  The traffic rate that the egress gateway reports (in the Alert
      message) that it is receiving from this ingress gateway (which
      is the Sustainable-Aggregate-Rate)

   If the difference is significant, then the ingress gateway
   pre-empts some microflows.  It only pre-empts if:

      Ingress-Aggregate-Rate > Sustainable-Aggregate-Rate + error

   The "error" term is partly to allow for inaccuracies in the
   measurements of the rates.  It is also needed because the
   Ingress-Aggregate-Rate is measured at a slightly later moment than
   the Sustainable-Aggregate-Rate, and it is quite possible that the
   Ingress-Aggregate-Rate has increased in the interim due to natural
   variation of the bit rate of the CL sources.  So the "error" term
   allows for some variation in the ingress rate without triggering
   pre-emption.

   The ingress gateway should pre-empt enough microflows to ensure
   that:

      New Ingress-Aggregate-Rate < Sustainable-Aggregate-Rate - error

   The "error" term here is used for similar reasons but in the other
   direction, to ensure slightly more load is shed than seems
   necessary, in case the two measurements were taken during a
   short-term fall in load.

   When the routers in the CL-region are using explicit pre-emption
   alerting, the ingress gateway would normally pre-empt microflows
   whenever it gets an alert (it always would if it were possible to
   set "error" equal to zero).  For the implicit case, however, this
   is not so.  The ingress receives an Alert message when the
   Congestion-Level-Estimate reaches (almost) 100%, which is roughly
   when traffic exceeds the configured-admission-rate.
   However, it is only when packets are actually dropped en route
   that the Sustainable-Aggregate-Rate becomes less than the
   Ingress-Aggregate-Rate, so only then will the ingress gateway
   actually pre-empt flows.

   Hence with the implicit scheme, pre-emption can only be triggered
   once the system starts dropping packets and thus the QoS of flows
   starts being significantly degraded.  This is in contrast with the
   explicit scheme, which allows flow pre-emption to be triggered
   before any packet drop, simply when the traffic reaches the
   configured-pre-emption-rate.  Therefore we believe that the
   explicit mechanism is superior.  However, it does require new
   functionality on all the routers (although this is little more
   than a bulk token bucket - see [PCN] for details).

3.2.3.  Use case for flow pre-emption

   To see how the pieces of the solution fit together in a use case,
   we imagine a scenario where many microflows have already been
   admitted.  We confine our description to the explicit pre-emption
   mechanism.  Now an interior router in the CL-region fails.  The
   network layer routing protocol re-routes round the problem, but as
   a consequence the traffic on other links increases.  In fact,
   let's assume the traffic on one link now exceeds its
   configured-pre-emption-rate and so the router pre-emption marks CL
   packets.  When the egress sees the first of the pre-emption marked
   packets it immediately determines which microflow the packet is
   part of (by using a five-tuple filter and comparing it with state
   installed at admission) and hence which ingress gateway the packet
   came from.  It sets up a meter to measure the traffic rate from
   this ingress gateway, and as soon as possible sends a message to
   the ingress gateway.  This message alerts the ingress gateway that
   pre-emption may be needed and contains the traffic rate measured
   by the egress gateway.
   Then the ingress gateway determines the traffic rate that it is
   sending towards this egress gateway, and hence it can calculate
   the amount of traffic that needs to be pre-empted.

   The ingress gateway could now just shed random microflows, but it
   is better if the least important ones are dropped.  The ingress
   gateway could use information stored locally in each reservation's
   state (such as, for example, the RSVP pre-emption priority of
   [RSVP-PREEMPTION] or the RSVP admission priority of
   [RSVP-EMERGENCY]) as well as information provided by a policy
   decision point in order to decide which of the flows to shed (or
   perhaps which ones not to shed).  This way, flow pre-emption can
   also help emergency/military calls by taking into account the
   corresponding priorities (as conveyed in RSVP policy elements)
   when selecting calls to be pre-empted, which is likely to be
   particularly important in a disaster scenario.  Use of RSVP for
   the support of emergency/military applications is discussed in
   further detail in [RFC4542] and [RSVP-EMERGENCY].

   The ingress gateway then initiates RSVP signalling to instruct the
   relevant destinations that their reservation has been terminated,
   and to tell (RSVP) nodes along the path to tear down the
   associated RSVP state.  To guard against recalcitrant sources,
   normal IntServ policing may be used to block any future traffic
   from the dropped flows from entering the CL-region.  Note that -
   with the explicit Pre-emption Alert mechanism - since the
   configured-pre-emption-rate may be significantly less than the
   physical line capacity, flow pre-emption may be triggered before
   any congestion has actually occurred and before any packet is
   dropped.

   We extend the scenario further by imagining that (due to a
   disaster of some kind) further routers in the CL-region fail
   during the time taken by the pre-emption process described above.
   This is handled naturally, as packets will continue to be
   pre-emption marked and so the pre-emption process will happen for
   a second time.

4.  Summary of Functionality

   This section is intended to provide a systematic summary of the
   new functionality required by the routers in the CL-region.

   A network operator upgrades normal IP routers by:

   o  Adding functionality related to admission control and flow
      pre-emption to all its ingress and egress gateways

   o  Adding Pre-Congestion Notification for Admission Marking and
      Pre-emption Marking to all the routers in the CL-region.

   We consider the detailed actions required for each of the types of
   router in turn.

4.1.  Ingress gateways

   Ingress gateways perform the following tasks:

   o  Classify incoming packets - decide whether they are CL or
      non-CL packets.  This is done using an IntServ filter spec
      (source and destination addresses and port numbers), whose
      details have been gathered from the RSVP messaging.

   o  Police - check that the microflow conforms with what has been
      agreed (i.e. it keeps to its agreed data rate).  If necessary,
      packets which do not correspond to any reservation, packets
      which are in excess of the rate agreed for their reservation,
      and packets for a reservation that has earlier been pre-empted
      may be policed.  Policing may be achieved via dropping or via
      re-marking of the packet's DSCP to a value different from the
      CL behaviour aggregate.

   o  ECN colouring - for CL microflows, set the ECN field of packets
      appropriately (see [PCN] for some discussion of encoding).

   o  Perform 'interior router' functions (see the next sub-section).
   o  Admission Control - on new session establishment, consider the
      Congestion-Level-Estimate received from the corresponding
      egress gateway and, most likely based on a simple configured
      CLE-threshold, decide if the new call is to be admitted or
      rejected (taking into account local policy information as well
      as, optionally, information provided by a policy decision
      point).

   o  Probe - if requested by the egress gateway to do so, the
      ingress gateway generates probe traffic so that the egress
      gateway can compute the Congestion-Level-Estimate from this
      ingress gateway.  Probe packets may be simple data addressed to
      the egress gateway and require no protocol standardisation,
      although there will be best practice for their number, size and
      rate.

   o  Measure - when it receives a Pre-emption Alert message from an
      egress gateway, the ingress gateway determines the rate at
      which it is sending packets to that egress gateway.

   o  Pre-empt - calculate how much CL traffic needs to be
      pre-empted; decide which microflows should be dropped, perhaps
      in consultation with a Policy Decision Point; and do the
      necessary signalling to drop them.

4.2.  Interior routers

   Interior routers perform the following tasks:

   o  Classify packets - examine the DSCP and ECN field to see
      whether it's a CL packet.

   o  Non-CL packets are handled as usual, with respect to dropping
      them or setting their CE codepoint.

   o  Pre-Congestion Notification - CL packets are Admission Marked
      and Pre-emption Marked according to the algorithm detailed in
      [PCN] and outlined in Section 3.

4.3.  Egress gateways

   Egress gateways perform the following tasks:

   o  Classify packets - determine which ingress gateway a CL packet
      has come from.
This is the previous RSVP hop, hence the necessary 1157 details are obtained just as with IntServ from the state 1158 associated with the packet five-tuple, which has been built using 1159 information from the RSVP messages. 1161 o Meter - for CL packets, calculate the fraction of the total number 1162 of bits which are in Admission marked packets or in Pre-emption 1163 Marked packets. The calculation is done as an exponentially 1164 weighted moving average (see Appendix C). A separate calculation 1165 is made for CL packets from each ingress gateway. The meter works 1166 on an aggregate basis and not per microflow. 1168 o Signal the Congestion-Level-Estimate - this is piggy-backed on the 1169 reservation reply. An egress gateway's interface is configured to 1170 know it is an egress gateway, so it always appends this to the 1171 RESV message. If the Congestion-Level-Estimate is unknown or is 1172 too stale, then the egress gateway can request the ingress gateway 1173 to send probes. 1175 o Packet colouring - for CL packets, set the DSCP and the ECN field 1176 to whatever has been agreed as appropriate for the next domain. By 1177 default the ECN field is set to the Not-ECT codepoint. See also 1178 the discussion in the Tunnelling section later. 1180 o Measure the rate - measure the rate of CL traffic from a 1181 particular ingress gateway, excluding packets that are Pre-emption 1182 Marked (i.e. the Sustainable-Aggregate-Rate for the CL-region- 1183 aggregate), when alerted (either explicitly or implicitly) that 1184 pre-emption may be required. The measured rate is reported back to 1185 the appropriate ingress gateway [RSVP-PCN]. 1187 4.4. Failures 1189 If an interior router fails, then the regular IP routing protocol 1190 will re-route round it. If the new route can carry all the admitted 1191 traffic, flows will gracefully continue. 
If instead this causes early warning of pre-congestion on the new route, then admission control based on pre-congestion notification will ensure that new flows are not admitted until enough existing flows have departed. Finally, re-routing may result in heavy congestion, at which point the flow pre-emption mechanism will kick in.

If a gateway fails then we would like regular RSVP procedures [RFC2205] to take care of things. With the local repair mechanism of [RFC2205], when a route changes the next RSVP PATH refresh message will establish path state along the new route, and thus attempt to re-establish reservations through the new ingress gateway. Essentially the same procedure is used as described earlier in this document, with the re-routed session treated as a new session request.

In more detail, consider what happens if an ingress gateway of the CL-region fails. RSVP routers upstream of it then re-route at the IP layer to a new ingress gateway. The next time the upstream RSVP router sends a PATH refresh message, it reaches the new ingress gateway, which therefore installs the associated RSVP state. The next RSVP RESV refresh will pick up the Congestion-Level-Estimate from the egress gateway, and the ingress compares this with its threshold to decide whether to admit the new session. This could result in some of the flows being rejected, but those accepted will receive the full QoS.

An issue with this is that we have to wait until PATH and RESV refresh messages are sent, which may not be very often; the default refresh interval is 30 seconds. [RFC2205] discusses how to speed up the local repair mechanism. First, the RSVP module is notified by the local routing protocol module of a route change to particular destinations, which triggers it to rapidly send out PATH refresh messages.
Further, when a PATH refresh arrives with a previous hop address different from the one stored, RESV refreshes are immediately sent to that previous hop. Where RSVP is operating hop-by-hop, i.e. on every router, triggering the PATH refresh is easy, as the router can simply monitor its local link. Thus this fast local repair mechanism can be used to deal with failures upstream of the ingress gateway, with failures of the ingress gateway, and with failures downstream of the egress gateway.

But where RSVP is not operating hop-by-hop (as is the case within the CL-region), it is not so easy to trigger the PATH refresh.

Unfortunately, this problem applies if an egress gateway fails, since it is very likely that an egress gateway is several IP hops from the ingress gateway. (If the ingress is several IP hops from its previous RSVP node, then there is the same issue.) The options appear to be:

o the ingress gateway has a link state database for the CL-region, so it can detect that an egress gateway has failed or become unreachable

o there is an inter-gateway protocol, so the ingress can continuously check that the egress gateways are still alive

o (default) do nothing and wait for the regular PATH/RESV refreshes (and, if needed, the pre-emption mechanism) to sort things out.

5. Limitations and some potential solutions

In this section we describe various limitations of the deployment model, and some suggestions about potential ways of alleviating them.
1254 The limitations fall into three broad categories: 1256 o ECMP (Section 5.1): the assumption about routing (Section 2.2) is 1257 that all packets between a pair of ingress and egress gateways 1258 follow the same path; ECMP breaks this assumption 1260 o The lack of global coordination (Sections 5.2, 5.3 and 5.4): a 1261 decision about admission control or flow pre-emption is made for 1262 one aggregate independently of other aggregates 1264 o Timing and accuracy of measurements (Sections 5.5 and 5.6): the 1265 assumption (Section 2.2) that additional load, offered within the 1266 reaction time of the measurement-based admission control 1267 mechanism, doesn't move the system directly from no congestion to 1268 overload (dropping packets). A 'flash crowd' may break this 1269 assumption (Section 5.5). There are a variety of more general 1270 issues associated with marking measurements, which may mean it's a 1271 good idea to do pre-emption 'slower' (Section 5.6). 1273 Each section describes a limitation and some possible solutions to 1274 alleviate the limitation. These are intended as options for an 1275 operator to consider, based on their particular requirements. 1277 We would welcome feedback, for example suggestions as to which 1278 potential solutions are worth working out in more detail, and ideas 1279 on new potential solutions. 1281 Finally Section 5.7 considers some other potential extensions. 1283 5.1. ECMP 1285 If the CL-region uses Equal Cost Multipath Routing (ECMP), then 1286 traffic between a particular pair of ingress and egress gateways may 1287 follow several different paths. 1289 Why? An ECMP-enabled router runs an algorithm to choose between 1290 potential outgoing links, based on a hash of fields such as the 1291 packet's source and destination addresses - exactly what depends on 1292 the proprietary algorithm. 
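As a purely illustrative sketch (the hash function, field choice and link names are assumptions, not any vendor's actual scheme), hash-based next-hop selection might look like:

```python
import hashlib

# Hypothetical outgoing links of an ECMP-enabled router.
LINKS = ["link-A", "link-B", "link-C"]

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, links=LINKS):
    """Choose an outgoing link from a hash of the flow's addressing fields.

    The choice is deterministic per flow, so all packets of one flow take
    one path, but different flows between the same pair of gateways may
    hash to different links.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]

# Every packet of a given flow maps to the same link:
path = ecmp_next_hop("10.0.0.1", "10.0.1.9", 5004, 5004)
assert path == ecmp_next_hop("10.0.0.1", "10.0.1.9", 5004, 5004)
```

Two flows that differ only in, say, a port number can hash to different links, which is exactly how a single gateway-to-gateway aggregate ends up split over several paths.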
Packets are addressed to the CL flow's 1293 end-point, and therefore different flows may follow different paths 1294 through the CL-region. (All packets of an individual flow follow the 1295 same ECMP path.) 1297 The problem is that if one of the paths is congested such that 1298 packets are being admission marked, then the Congestion-Level- 1299 Estimate measured by the egress gateway will be diluted by unmarked 1300 packets from other non-congested paths. Similarly, the measurement of 1301 the Sustainable-Aggregate-Rate will also be diluted. 1303 Possible solution approaches are: 1305 o tunnel: traffic is tunnelled across the CL-region. Then the 1306 destination address (and so on) seen by the ECMP algorithm is that 1307 of the egress gateway, so all flows follow the same path. 1308 Effectively ECMP is turned off. As a compromise, to try to retain 1309 some of the benefits of ECMP, there could be several tunnels, each 1310 following a different ECMP path, with flows randomly assigned to 1311 different tunnels. 1313 o assume worst case: the operator sets the configured-admission-rate 1314 (and configured-pre-emption-rate) to below the optimum level to 1315 compensate for the fact that the effect on the Congestion-Level- 1316 Estimate (and Sustainable-Aggregate-Rate) of the congestion 1317 experienced over one of the paths may be diluted by traffic 1318 received over non-congested paths. Hence lower thresholds need to 1319 be used to ensure early admission control rejection and pre- 1320 emption over the congested path. This approach will waste capacity 1321 (e.g. flows following a non-congested ECMP path are not admitted 1322 or are pre-empted), and there is still the danger that for some 1323 traffic mixes the operator hasn't been cautious enough. 1325 o for admission control, probe to obtain a flow-specific congestion- 1326 level-estimate. Earlier this document suggests continuously 1327 monitoring the congestion-level-estimate. 
Instead, probe packets 1328 could be sent for each prospective new flow. The probe packets 1329 have the same IP address etc as the data packets would have, and 1330 hence follow the same ECMP path. However, probing is an extra 1331 overhead, depending on how many probe packets need to be sent to 1332 get a sufficiently accurate congestion-level-estimate. 1334 o for flow pre-emption, only select flows for pre-emption from 1335 amongst those that have actually received a Pre-emption Marked 1336 packet. Because these flows must have followed an ECMP path that 1337 goes through an overloaded router. However, it needs some extra 1338 work by the egress gateway, to record this information and report 1339 it to the ingress gateway. 1341 o for flow pre-emption, a variant of this idea involves introducing 1342 a new marking behaviour, 'Router Marking'. A router that is pre- 1343 emption marking packets on an outgoing link, also 'Router Marks' 1344 all other packets. When selecting flows for pre-emption, the 1345 selection is made from amongst those that have actually received a 1346 Router Marked or Pre-emption Marked packet. Hence compared with 1347 the previous bullet, it may extend the range of flows from which 1348 the pre-emption selection is made (i.e. it includes those which, 1349 by chance, haven't had any pre-emption marked packets). However, 1350 it also requires that the 'Router Marking' state is somehow 1351 encoded into a packet, i.e. it makes harder the encoding challenge 1352 discussed in Appendix C of [PCN]. The extra work required by the 1353 egress gateway would also be somewhat higher than for the previous 1354 bullet. 1356 5.2. Beat down effect 1358 This limitation concerns the pre-emption mechanism in the case where 1359 more than one router is pre-emption marking packets. 
The result (explained in the next paragraph) is that the measurement of sustainable-aggregate-rate is lower than its true value, so more traffic is pre-empted than necessary.

Imagine the scenario:

              +-------+     +-------+     +-------+
   IAR-b=3 >@@| CPR=2 |@@@@@| CPR>2 |@@@@@| CPR=1 |@@> SAR-b=1
   IAR-a=1 >##|  R1   |#####|  R2   |     |  R3   |
              +-------+     +-------+     +-------+
                                #
                                #
                                #
                                v SAR-a=0.5

Figure 4: Scenario to illustrate 'beat down effect' limitation

Aggregate-a (ingress-aggregate-rate, IAR, 1 unit) takes a 'short' route through two routers, one of which (R1) is above its configured-pre-emption-rate (CPR, 2 units). Aggregate-b takes a 'long' route, going through a second congested router (R3, with a CPR of 1 unit).

R1's input traffic is 4 units, twice its configured-pre-emption-rate, so 50% of packets are pre-emption marked. Hence the measured sustainable-aggregate-rate (SAR) for aggregate-a is 0.5, and half of its traffic will be pre-empted.

R3's input of non-pre-emption-marked traffic is 1.5 units, and therefore it has to do further marking.

But this means that aggregate-a has taken a bigger hit than it needed to; the router R1 could have let through all of aggregate-a's traffic unmarked if it had known that router R3 was going to "beat down" aggregate-b's traffic further.

Generalising, the result is that in a scenario where more than one router is pre-emption marking packets, only the final router is sure to be fully loaded after flow pre-emption. The fundamental reason is that a router makes a local decision about which packets to pre-emption mark, i.e. independently of how other routers are pre-emption marking. A very similar effect has been noted in XCP [Low].
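The arithmetic of the Figure 4 scenario can be checked with a short sketch (the `marked_fraction` helper is an idealisation we introduce here: it assumes a router marks traffic uniformly, just enough to bring unmarked throughput down to its CPR):

```python
def marked_fraction(input_rate, cpr):
    """Idealised marking: the fraction of arriving unmarked traffic a
    router must pre-emption mark so unmarked output equals its CPR."""
    if input_rate <= cpr:
        return 0.0
    return (input_rate - cpr) / input_rate

# R1: aggregate-a (1 unit) and aggregate-b (3 units) arrive; CPR = 2.
f1 = marked_fraction(1.0 + 3.0, 2.0)   # 0.5: half of all packets marked
sar_a = 1.0 * (1.0 - f1)               # aggregate-a's measured SAR: 0.5

# R3: only aggregate-b's still-unmarked traffic arrives; CPR = 1.
b_unmarked = 3.0 * (1.0 - f1)          # 1.5 units reach R3 unmarked
f3 = marked_fraction(b_unmarked, 1.0)  # a further third gets marked
sar_b = b_unmarked * (1.0 - f3)        # aggregate-b's measured SAR: 1 unit
```

After pre-emption, aggregate-a carries 0.5 and aggregate-b carries 1 unit, so R1's input is only 1.5 units, well under its CPR of 2: half of aggregate-a was pre-empted unnecessarily, which is the beat down effect.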
1402 Potential solutions: 1404 o a full solution would involve routers learning about other routers 1405 that are pre-emption marking, and being able to differentially 1406 mark flows (e.g. in the example above, aggregate-a's packets 1407 wouldn't be marked by R1). This seems hard and complex. 1409 o do nothing about this limitation. It causes over-pre-emption, 1410 which is safe. At the moment this is our suggested option. 1412 o do pre-emption 'slowly'. The description earlier in this document 1413 assumes that after the measurements of ingress-aggregate-rate and 1414 sustainable-aggregate-rate, then sufficient flows are pre-empted 1415 in 'one shot' to eliminate the excess traffic. An alternative is 1416 to spread pre-emption over several rounds: initially, only pre- 1417 empt enough to eliminate some of the excess traffic, then re- 1418 measure the sustainable-aggregate-rate, and then pre-empt some 1419 more, etc. In the scenario above, the re-measurement would be 1420 lower than expected, due to the beat down effect, and hence in the 1421 second round of pre-emption less of aggregate-a's traffic would be 1422 pre-empted (perhaps none). Overall, therefore the impact of the 1423 'beat down' effect would be lessened, i.e. there would be a 1424 smaller degree of over pre-emption. The downside is that the 1425 overall pre-emption is slower, and therefore routers will be 1426 congested longer. 1428 5.3. Bi-directional sessions 1430 The document earlier describes how to decide whether or not to admit 1431 (or pre-empt) a particular flow. However, from a user/application 1432 perspective, the session is the relevant unit of granularity. A 1433 session can consist of several flows which may not all be part of the 1434 same aggregate. The most obvious example is a bi-directional session, 1435 where the two flows should ideally be admitted or pre-empted as a 1436 pair - for instance a voice call only makes sense if A can send to B 1437 as well as B to A! 
But the admission and pre-emption mechanisms described earlier in this document operate on a per-aggregate basis, independently of what's happening with other aggregates. For admission control the problem isn't serious: e.g. the SIP server for the voice call can easily detect that the A-to-B flow has been admitted but the B-to-A flow blocked, and inform the user, perhaps via a busy tone. For flow pre-emption, the problem is similar but more serious. If both the aggregate-1-to-2 (i.e. from gateway 1 to gateway 2) and the aggregate-2-to-1 have to pre-empt flows, then it would be good if either all of the flows of a particular session were pre-empted or none of them. Therefore if the two aggregates pre-empt flows independently of each other, more sessions will end up being torn down than is really necessary. For instance, pre-empting one direction of a voice call will result in the SIP server tearing down the other direction anyway.

Potential solutions:

o if it's known that all sessions are bi-directional, simply pre-empt roughly half as many flows as suggested by the measurement of {ingress-aggregate-rate - sustainable-aggregate-rate}. But this makes a big assumption about the nature of sessions, and also assumes that the aggregate-1-to-2 and aggregate-2-to-1 are equally overloaded.

o ignore the limitation. The penalty will be quite small if most sessions consist of one flow, or of flows that are part of the same aggregate.

o introduce a gateway controller. It would receive reports for all aggregates where the ingress-aggregate-rate exceeds the sustainable-aggregate-rate, and would then make a global decision about which flows to pre-empt. However, it requires considerable complexity; for example, the controller needs to understand which flows map to which sessions.
This may be an option in some scenarios, for example where gateways aren't handling too many flows (but note that this breaks the aggregation assumption of Section 2.2). A variant of this idea would be to introduce a gateway controller per pair of gateways, in order to handle bi-directional sessions but not try to deal with more complex sessions that include flows from an arbitrary number of aggregates.

o do pre-emption 'slowly'. As with the 'do pre-emption slowly' option for the beat down effect (Section 5.2), this would reduce the impact of this limitation. The downside is that the overall pre-emption is slower, and therefore router(s) will be congested longer.

o each ingress gateway 'loosely coordinates' with other gateways its decision about which specific flows to pre-empt. Each gateway numbers flows in the order they arrive (note that this number has no meaning outside the gateway), and when pre-empting flows, the most recent flow (or the most recent low priority flow) is selected for pre-emption; the gateway then works backwards, selecting as many flows as needed. Gateways will therefore tend to pre-empt flows that are part of the same session (as they were admitted at the same time). Of course this isn't guaranteed, for several reasons; for instance, gateway A's most recent bi-directional sessions may be with gateway C, whereas gateway B's are with gateway A (so gateway A will pre-empt A-to-C flows and gateway B will pre-empt B-to-A flows). Rather than pre-empting the most recent (low priority) flow, an alternative algorithm (for further study) may be to select flows based on a hash of particular fields in the packet, such that both gateways produce the same hash for flows of the same bi-directional session. We believe that this approach should be investigated further.

5.4.
Global fairness

The limitation here is that 'high priority' traffic may be pre-empted (or not admitted) when a global decision would instead pre-empt (or not admit) 'lower priority' traffic on a different aggregate.

Imagine the following scenario (extreme, to illustrate the point clearly). Aggregate_a is all Assured Services (MLPP) traffic, whilst aggregate_b is all ordinary traffic (i.e. comparatively low priority). Together the two aggregates cause a router to be at twice its configured-pre-emption-rate. Ideally we'd like all of aggregate_b to be pre-empted, as then all of aggregate_a could be carried. However, the approach described earlier in this document leads to half of each aggregate being pre-empted.

                     IAR_b=1
                        v
                        v
                    +-------+
   IAR_a=1 ---->----| CPR=1 |-----> SAR_a=0.5
                    |       |
                    +-------+
                        v
                        v
                     SAR_b=0.5

Figure 5: Scenario to illustrate 'global fairness' limitation

Similarly for admission control: Section 4.1 describes how, if the Congestion-Level-Estimate is greater than the CLE-threshold, all new sessions are refused. But it is unsatisfactory to block emergency calls, for instance.

Potential solutions:

o in the admission control case, it is recommended that an 'emergency / Assured Services' call is admitted immediately even if the CLE-threshold is exceeded. Usually the network can actually handle the additional microflow, because there is a safety margin between the configured-admission-rate and the configured-pre-emption-rate. Normal call termination behaviour will soon bring the traffic level down below the configured-admission-rate.
1546 However, in exceptional circumstances the 'emergency / higher 1547 precedence' call may cause the traffic level to exceed the 1548 configured-pre-emption-rate; then the usual pre-emption mechanism 1549 will pre-empt enough (non 'emergency / higher precedence') 1550 microflows to bring the total traffic back under the configured- 1551 pre-emption-rate. 1553 o all egress gateways report to a global coordinator that makes 1554 decisions about what flows to pre-empt. However this solution adds 1555 complexity and probably isn't scalable, but it may be an option in 1556 some scenarios, for example where gateways aren't handling too 1557 many flows (but note that this breaks the aggregation assumption 1558 of Section 2.2). 1560 o introduce a heuristic rule: before pre-empting a 'high priority' 1561 flow the egress gateway should wait to see if sufficient (lower 1562 priority) traffic is pre-empted on other aggregates. This is a 1563 reasonable option. 1565 o enhance the functionality of all the interior routers, so they can 1566 detect the priority of a packet, and then differentially mark 1567 them. As well as adding complexity, in general this would be an 1568 unacceptable security risk for MLPP traffic, since only controlled 1569 nodes (like gateways) should know which packets are high priority, 1570 as this information can be abused by an attacker. 1572 o do nothing, i.e. accept the limitation. Whilst it's unlikely that 1573 high priority calls will be quite so unbalanced as in the scenario 1574 above, just accepting this limitation may be risky. The sorts of 1575 situations that cause routers to start pre-emption marking are 1576 also likely to cause a surge of emergency / MLPP calls. 1578 5.5. 
Flash crowds

This limitation concerns admission control and arises because there is a time lag between the admission control decision (which depends on the Congestion-Level-Estimate during RSVP signalling at call set-up) and when the data is actually sent (after the called party has answered). In PSTN terms this is the time the phone rings. Normally the time lag doesn't matter much because (1) in the CL-region there are many flows, and they terminate and are answered at roughly the same rate, and (2) the network can still operate safely when the traffic level is some margin above the configured-admission-rate.

A 'flash crowd' occurs when something causes many calls to be initiated in a short period of time - for instance a 'tele-vote'. So there is a danger that a 'flash' of calls is accepted, but when the calls are answered and data flows, the traffic overloads the network. Therefore potentially the 'additional load' assumption of Section 2.2 doesn't hold.

Potential solutions:

o The simplest option is to do nothing; an operator relies on the pre-emption mechanism if there is a problem. This doesn't seem a good choice, as 'flash crowds' are reasonably common on the PSTN, unless the operator can ensure that nearly all 'flash crowd' events are blocked in the access network and so do not impact on the CL-region.

o A second option is to send 'dummy data' as soon as the call is admitted, thus effectively reserving the bandwidth whilst waiting for the called party to answer. Reserving bandwidth in advance means that the network cannot admit as many calls. For example, suppose sessions last 100 seconds and ringing lasts 10 seconds; then the cost is a 10% loss of capacity. It may be possible to offset this somewhat by increasing the configured-admission-rate in the routers, but it would need further investigation.
A concern with this 'dummy data' option is that it may allow an attacker to initiate many calls that are never answered (by a cooperating attacker), so that eventually the network would only be carrying 'dummy data'. The attack exploits the fact that charging only starts when the call is answered, not when it is dialled. It may be possible to alleviate the attack at the session layer - for example, by having the ingress gateway check, when it gets an RSVP PATH message, that the source has been well-behaved recently, and by limiting the maximum time that ringing can last. We believe that if this attack can be dealt with then this is a good option.

o A third option is that the egress gateway limits the rate at which it sends out the Congestion-Level-Estimate, or limits the rate at which calls are accepted by replying with a Congestion-Level-Estimate of 100% (the equivalent of 'call gapping' in the PSTN). There is a trade-off, which would need to be investigated further, between the degree of protection and possible adverse side-effects like slowing down call set-up.

o A final option is to re-perform admission control before the call is answered. The ingress gateway monitors Congestion-Level-Estimate updates received from each egress. If it notices that a Congestion-Level-Estimate has risen above the CLE-threshold, then it terminates all unanswered calls through that egress (e.g. by instructing the session protocol to stop the 'ringing tone'). For extra safety the Congestion-Level-Estimate could be re-checked when the call is answered. A potential drawback for an operator that wants to emulate the PSTN is that the PSTN never drops a 'ringing' call.

5.6.
Pre-empting too fast

As a general idea it seems good to pre-empt excess flows rapidly, so that the full QoS is restored to the remaining CL users as soon as possible, and partial service is restored to lower priority traffic classes on shared links. Therefore the pre-emption mechanism described earlier in this document works in 'one shot', i.e. one measurement is made of the sustainable-aggregate-rate and the ingress-aggregate-rate, and the excess is pre-empted immediately. However, there are some reasons why an operator may want to pre-empt 'more slowly':

o To allow time to modify the ingress gateway's policer. The ingress wants to be able to drop any packets that arrive from a pre-empted flow, since otherwise the source may cheat and ignore the instruction to drop its flow; but there will be a limit on how many new filters an ingress gateway can install in a certain time period.

o The operator may decide to slow down pre-emption in order to ameliorate the 'beat down' and/or 'bi-directional sessions' limitations (see above).

o To help combat inaccuracies in measurements of the sustainable-aggregate-rate and ingress-aggregate-rate. For a CL-region where it's assumed there are many flows in an aggregate, these measurements can be obtained in a short period of time; where there are fewer flows, it will take longer.

o To help combat over-pre-emption, because during the time it takes to pre-empt flows, others may be ending anyway (either the call has naturally ended, or the user hangs up due to poor QoS). Slowing pre-emption may seem counter-intuitive here, as it makes it more likely that calls will terminate anyway - however, it also gives time to adjust the amount pre-empted to take account of this.
o Earlier in this document we said that an egress starts measuring the sustainable-aggregate-rate as soon as it sees a single pre-emption marked packet. However, when a link or router fails the network's underlying recovery mechanism will kick in (e.g. switching to a back-up path), which may result in the network again being able to support all the traffic.

Potential solutions:

o To combat the final issue, the egress could measure the sustainable-aggregate-rate over a longer time period than the network recovery time (say 100ms vs. 50ms). If it detects no pre-emption marked packets towards the end of its measurement period (say in the last 30 ms) then it doesn't send a pre-emption alert message to the ingress.

o We suggest that, optionally (at the choice of the operator), pre-emption is slowed by pre-empting traffic in several rounds rather than in one shot. One possible algorithm is to pre-empt most of the traffic in the first round and the rest in the second round; the amount pre-empted in the second round is influenced by both the first and second round measurements:

   * Round 1: pre-empt h * S_1, where 0.5 <= h <= 1
     and S_1 is the amount the normal mechanism calculates it should
     shed, i.e. {ingress-aggregate-rate - sustainable-aggregate-rate}

   * Round 2: pre-empt Predicted-S_2 - h * (Predicted-S_2 - Measured-S_2),
     where Predicted-S_2 = (1-h) * S_1

  Note that the second measurement should be made when sufficient time has elapsed for the first round of pre-emption to have happened. One idea to achieve this is for the egress gateway to continuously measure and report its sustainable-aggregate-rate, in (say) 100ms windows. Therefore the ingress gateway knows when the egress gateway made its measurement (assuming the round trip time is known).
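As a numerical sketch of this two-round rule (the traffic figures are invented for illustration; h, S_1, Predicted-S_2 and Measured-S_2 are as defined in the bullet above):

```python
def round1_shed(iar, sar, h):
    """Round 1: pre-empt h * S_1, where S_1 = IAR - SAR and 0.5 <= h <= 1."""
    assert 0.5 <= h <= 1.0
    return h * max(0.0, iar - sar)

def round2_shed(s1, measured_s2, h):
    """Round 2: pre-empt Predicted-S_2 - h * (Predicted-S_2 - Measured-S_2),
    i.e. a blend of the predicted and re-measured residual excess."""
    predicted_s2 = (1.0 - h) * s1
    return predicted_s2 - h * (predicted_s2 - measured_s2)

# Example: IAR = 10 units, SAR = 8 units, h = 0.7, so S_1 = 2.
h = 0.7
shed1 = round1_shed(10.0, 8.0, h)  # round 1 sheds 0.7 * 2 = 1.4 units
# Suppose re-measurement finds only 0.2 units of excess left (e.g. because
# other aggregates were beaten down too), rather than the predicted 0.6.
shed2 = round2_shed(2.0, 0.2, h)   # round 2 sheds less than predicted
```

The second round sheds 0.6 - 0.7 * (0.6 - 0.2) = 0.32 units instead of the 0.6 a pure one-shot prediction would have removed, which is how the multi-round variant reduces over-pre-emption.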
Therefore the ingress gateway knows when measurements 1716 should reflect that it has pre-empted flows. 1718 5.7. Other potential extensions 1720 In this section we discuss some other potential extensions not 1721 already covered above. 1723 5.7.1. Tunnelling 1725 It is possible to tunnel all CL packets across the CL-region. 1726 Although there is a cost of tunnelling (additional header on each 1727 packet, additional processing at tunnel ingress and egress), there 1728 are three reasons it may be interesting. 1730 ECMP: 1732 Tunnelling is one of the possible solutions given earlier in Section 1733 5.1 on Equal Cost Multipath Routing (ECMP). 1735 Ingress gateway determination: 1737 If packets are tunnelled from ingress gateway to egress gateway, the 1738 egress gateway can very easily determine in the data path which 1739 ingress gateway a packet comes from (by simply looking at the source 1740 address of the tunnel header). This can facilitate operations such as 1741 computing the Congestion-Level-Estimate on a per ingress gateway 1742 basis. 1744 End-to-end ECN: 1746 The ECN field is used for PCN marking (see [PCN] for details), and so 1747 it needs to be re-set by the egress gateway to whatever has been 1748 agreed as appropriate for the next domain. Therefore if a packet 1749 arrives at the ingress gateway with its ECN field already set (i.e. 1750 not '00'), it may leave the egress gateway with a different value. 1751 Hence the end-to-end meaning of the ECN field is lost. 1753 It is open to debate whether end-to-end congestion control is ever 1754 necessary within an end-to-end reservation. But if a genuine need is 1755 identified for end-to-end ECN semantics within a reservation, then 1756 one solution is to tunnel CL packets across the CL-region. When the 1757 egress gateway decapsulates them the original ECN field is recovered. 1759 5.7.2. 
Multi-domain and multi-operator usage 1761 This potential extension would eliminate the trust assumption 1762 (Section 2.2), so that the CL-region could consist of multiple 1763 domains run by different operators that did not trust each other. 1764 Then only the ingress and egress gateways of the CL-region would take 1765 part in the admission control procedure, i.e. at the ingress to the 1766 first domain and the egress from the final domain. The border routers 1767 between operators within the CL-region would only have to do bulk 1768 accounting - they wouldn't do per microflow metering and policing, 1769 and they wouldn't take part in signal processing or hold per flow 1770 state [Briscoe]. [Re-feedback] explains how a downstream domain can 1771 police that its upstream domain does not 'cheat' by admitting traffic 1772 when the downstream path is over-congested. [Re-PCN] proposes how to 1773 achieve this with the help of another recently proposed extension to 1774 ECN, involving re-echoing ECN feedback [Re-ECN]. 1776 5.7.3. Preferential dropping of pre-emption marked packets 1778 When the rate of real-time traffic in the specified class exceeds the 1779 maximum configured rate, then a router has to drop some packet(s) 1780 instead of forwarding them on the out-going link. Now when the egress 1781 gateway measures the Sustainable-Aggregate-Rate, neither dropped 1782 packets nor pre-emption marked packets contribute to it. Dropping 1783 non-pre-emption-marked packets therefore reduces the measured 1784 Sustainable-Aggregate-Rate below its true value. Thus a router should 1785 preferentially drop pre-emption marked packets. 1787 Note that it is important that the operator doesn't set the 1788 configured-pre-emption-rate equal to the rate at which packets start 1789 being dropped (for the specified real-time service class). Otherwise 1790 the egress gateway may never see a pre-emption marked packet and so 1791 won't be triggered into the Pre-emption Alert state. 
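A toy sketch of this drop preference (the queue model and packet representation are our own illustrative assumptions, not a specific router implementation):

```python
from collections import deque

class ClInterface:
    """Toy output queue that, when full, discards an already pre-emption
    marked packet in preference to an unmarked one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def enqueue(self, packet):
        """packet: dict like {"seq": 7, "preempt_marked": True}.
        Returns the packet dropped, or None if nothing was dropped."""
        if len(self.queue) < self.capacity:
            self.queue.append(packet)
            return None
        # Queue full: sacrifice a pre-emption marked packet if one is
        # queued, so unmarked packets still reach the egress gateway's
        # Sustainable-Aggregate-Rate meter.
        for i, queued in enumerate(self.queue):
            if queued["preempt_marked"]:
                del self.queue[i]
                self.queue.append(packet)
                return queued
        return packet  # no marked packet to sacrifice: drop the arrival
```

Since dropped packets and pre-emption marked packets are both excluded from the measured Sustainable-Aggregate-Rate, sacrificing an already-marked packet avoids depressing the measurement twice.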
1793 This optimisation is optional. When considering whether to use it, an 1794 operator will consider issues such as whether the over-pre-emption is 1795 serious, and whether the particular routers can easily do this sort 1796 of selective drop. 1798 5.7.4. Adaptive bandwidth for the Controlled Load service 1800 The admission control mechanism described in this document assumes 1801 that each router has a fixed bandwidth allocated to CL flows. A 1802 possible extension is that the bandwidth is flexible, depending on 1803 the level of non-CL traffic. If a large share of the current load on 1804 a path is CL, then more CL traffic can be admitted. And if the 1805 greater share of the load is non-CL, then the admission threshold can 1806 be proportionately lower. The approach re-arranges sharing between 1807 classes to aim for economic efficiency, whatever the traffic load 1808 matrix. It also deals with unforeseen changes to capacity during 1809 failures better than configuring fixed engineered rates. Adaptive 1810 bandwidth allocation can be achieved by changing the admission 1811 marking behaviour, so that the probability of admission marking a 1812 packet would now depend on the number of queued non-CL packets as 1813 well as the size of the virtual queue. The adaptive bandwidth 1814 approach would be supplemented by placing limits on the adaptation to 1815 prevent starvation of the CL by other traffic classes and of other 1816 classes by CL traffic. [Songhurst] has more details of the adaptive 1817 bandwidth approach. 1819 5.7.5. Controlled Load service with end-to-end Pre-Congestion 1820 Notification 1822 It may be possible to extend the framework to parts of the network 1823 where there are only a small number of CL microflows, i.e. where the 1824 aggregation assumption (Section 2.2) doesn't hold. In the extreme it 1825 may be possible to operate the framework end-to-end, i.e. between end 1826 hosts.
One potential method is to send probe packets to test whether 1827 the network can support a prospective new CL microflow. The probe 1828 packets would be sent at the same traffic rate as expected for the 1829 actual microflow, but in order not to disturb existing CL traffic a 1830 router would always schedule probe packets behind CL ones (compare 1831 [Breslau00]); this implies they have a new DSCP. Otherwise the 1832 routers would treat probe packets identically to CL packets. In order 1833 to perform admission control quickly, in parts of the network where 1834 there are only a few CL microflows, the Pre-Congestion marking 1835 behaviour for probe packets would switch from admission marking no 1836 packets to admission marking them all for only a minimal increase in 1837 load. 1839 5.7.6. MPLS-TE 1841 [ECN-MPLS] discusses how to extend the deployment model to MPLS, i.e. 1842 for admission control of microflows into a set of MPLS-TE aggregates 1843 (Multi-protocol label switching traffic engineering). It would 1844 require that the MPLS header could include the ECN field, which is 1845 not precluded by RFC3270. See [ECN-MPLS]. 1847 6. Relationship to other QoS mechanisms 1849 6.1. IntServ Controlled Load 1851 The CL mechanism delivers QoS similar to Integrated Services 1852 controlled load, but rather better. The reason the QoS is better is 1853 that the CL mechanism keeps the real queues empty, by driving 1854 admission control from a bulk virtual queue on each interface. The 1855 virtual queue [AVQ, vq] can detect a rise in load before the real 1856 queue builds. It is also more robust to route changes. 1858 6.2. Integrated services operation over DiffServ 1860 Our approach to end-to-end QoS is similar to that described in 1861 [RFC2998] for Integrated services operation over DiffServ networks. 
1862 As in [RFC2998], an IntServ class (CL in our case) is achieved end-to- 1863 end, with a CL-region viewed as a single reservation hop in the total 1864 end-to-end path. Interior routers of the CL-region do not process 1865 flow signalling, nor do they hold per flow state. Unlike [RFC2998] we 1866 do not require the end-to-end signalling mechanism to be RSVP, 1867 although it can be. 1869 Bearing in mind these differences, we can describe our architecture 1870 in terms of the options in [RFC2998]. The DiffServ network region 1871 is RSVP-aware, but awareness is confined to (what [RFC2998] calls) 1872 the "border routers" of the DiffServ region. We use explicit 1873 admission control into this region, with static provisioning within 1874 it. The ingress "border router" does per microflow policing and sets 1875 the DSCP and ECN fields to indicate the packets are CL ones (i.e. we 1876 use router marking rather than host marking). 1878 6.3. Differentiated Services 1880 The DiffServ architecture does not specify any way for devices 1881 outside the domain to dynamically reserve resources or receive 1882 indications of network resource availability. In practice, service 1883 providers rely on subscription-time Service Level Agreements (SLAs) 1884 that statically define the parameters of the traffic that will be 1885 accepted from a customer. The CL mechanism allows dynamic reservation 1886 of resources through the DiffServ domain and, with the potential 1887 extension mentioned in Section 5.7.2, it can span multiple domains 1888 without active policing mechanisms at the borders (unlike DiffServ). 1889 Therefore we do not use the traffic conditioning agreements (TCAs) of 1890 the (informational) DiffServ architecture [RFC2475]. 1892 [Johnson] compares admission control with a 'generously dimensioned' 1893 DiffServ network as ways to achieve QoS, and recommends the former. 1895 6.4.
ECN 1897 The marking behaviour described in this document complies with the 1898 ECN aspects of the IP wire protocol RFC3168, but provides its own 1899 edge-to-edge feedback instead of the TCP aspects of RFC3168. All 1900 routers within the CL-region are upgraded with the admission marking 1901 and pre-emption marking of Pre-Congestion Notification, so the 1902 requirements of [Floyd] are met because the CL-region is an enclosed 1903 environment. The operator prevents traffic arriving at a router that 1904 doesn't understand CL by administrative configuration of the ring of 1905 gateways around the CL-region. 1907 6.5. RTECN 1909 Real-time ECN (RTECN) [RTECN, RTECN-usage] has a similar aim to this 1910 document (to achieve a low delay, jitter and loss service suitable 1911 for RT traffic) and a similar approach (per microflow admission 1912 control combined with an "early warning" of potential congestion 1913 through setting the CE codepoint). But it explores a different 1914 architecture without the aggregation assumption: host-to-host rather 1915 than edge-to-edge. We plan to document such a host-to-host framework 1916 in a parallel draft to this one, and to describe if and how [PCN] can 1917 work in this framework. 1919 6.6. RMD 1921 Resource Management in DiffServ (RMD) [RMD] is similar to this work, 1922 in that it pushes complex classification, traffic conditioning and 1923 admission control functions to the edge of a DiffServ domain and 1924 simplifies the operation of the interior routers. One of the RMD 1925 modes ("Congestion notification function based on probing") uses 1926 measurement-based admission control in a similar way to this 1927 document. The main difference is that in RMD probing plays a 1928 significant role in the admission control process. 
Other differences 1929 are that the admission control decision is taken at the egress 1930 gateway (rather than the ingress); that 'admission marking' is encoded in 1931 a packet as a new DSCP (rather than in the ECN field); and that the 1932 NSIS protocols are used for signalling (rather than RSVP). 1934 RMD also includes the concept of Severe Congestion handling. The pre- 1935 emption mechanism described in the CL architecture has similar 1936 objectives but relies on different mechanisms. The main difference is 1937 that the interior routers measure the data rate that causes an 1938 overload and mark packets according to this rate. 1940 6.7. RSVP Aggregation over MPLS-TE 1942 Multi-protocol label switching traffic engineering (MPLS-TE) allows 1943 scalable reservation of resources in the core for an aggregate of 1944 many microflows. To achieve end-to-end reservations, admission 1945 control and policing of microflows into the aggregate can be performed 1946 using techniques such as RSVP Aggregation over MPLS TE Tunnels as per 1947 [AGGRE-TE]. However, in the case of inter-provider environments, 1948 these techniques require that admission control and policing be 1949 repeated at each trust boundary or that MPLS TE tunnels span multiple 1950 domains. 1952 7. Security Considerations 1954 To protect against denial of service attacks, the ingress gateway of 1955 the CL-region needs to police all CL packets and drop packets in 1956 excess of the reservation. This is similar to operations with 1957 existing IntServ behaviour. 1959 For pre-emption, it is considered acceptable from a security 1960 perspective that the ingress gateway can treat "emergency/military" 1961 CL flows preferentially compared with "ordinary" CL flows. However, 1962 in the rest of the CL-region they are not distinguished (nonetheless, 1963 our proposed technique does not preclude the use of different DSCPs 1964 at the packet level as well as different priorities at the flow 1965 level).
Keeping emergency traffic indistinguishable at the packet 1966 level minimises the opportunity for new security attacks. For 1967 example, if instead a mechanism used different DSCPs for 1968 "emergency/military" and "ordinary" packets, then an attacker could 1969 specifically target the former in the data plane (perhaps for DoS or 1970 for eavesdropping). 1972 Further security aspects to be considered later. 1974 8. Acknowledgements 1976 The admission control mechanism evolved from the work led by Martin 1977 Karsten on the Guaranteed Stream Provider developed in the M3I 1978 project [GSPa, GSP-TR], which in turn was based on the theoretical 1979 work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, 1980 Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet 1981 and June Tay (BT) helped develop and evaluate this approach. 1983 Many thanks to those who have commented on this work at Transport 1984 Area Working Group meetings and on the mailing list, including: Ken 1985 Carlberg, Ruediger Geib, Lars Westberg, David Black, Robert Hancock, 1986 Cornelia Kappler. 1988 9. Comments solicited 1990 Comments and questions are encouraged and very welcome. They can be 1991 sent to the Transport Area Working Group's mailing list, 1992 tsvwg@ietf.org, and/or to the authors. 1994 10. Changes from earlier versions of the draft 1996 The main changes are: 1998 From -00 to -01 2000 The whole of the Pre-emption mechanism is added. 2002 There are several modifications to the admission control mechanism. 2004 From -01 to -02 2006 The pre-congestion notification algorithms for admission marking and 2007 pre-emption marking are now described in [PCN]. 2009 There are new sub-sections in Section 4 on Failures, Admission of 2010 'emergency / higher precedence' session, and Tunnelling; and a new 2011 sub-section in Section 5 on Mechanisms to deal with 'Flash crowds'. 2013 From -02 to -03 2015 Section 5 has been updated and expanded. 
It is now about the 2016 'limitations' of the PCN mechanism, as described in the earlier 2017 sections, plus discussion of 'possible solutions' to those 2018 limitations. 2020 The measurement of the Congestion-Level-Estimate now includes pre- 2021 emption marked packets as well as admission marked ones. Section 2022 3.1.2 explains. 2024 11. Appendices 2026 11.1. Appendix A: Explicit Congestion Notification 2028 This Appendix provides a brief summary of Explicit Congestion 2029 Notification (ECN). 2031 [RFC3168] specifies the incorporation of ECN to TCP and IP, including 2032 ECN's use of two bits in the IP header. It specifies a method for 2033 indicating incipient congestion to end-hosts (e.g. as in RED, Random 2034 Early Detection), where the notification is through ECN marking 2035 packets rather than dropping them. 2037 ECN uses two bits in the IP header of both IPv4 and IPv6 packets: 2039 0 1 2 3 4 5 6 7 2040 +-----+-----+-----+-----+-----+-----+-----+-----+ 2041 | DS FIELD, DSCP | ECN FIELD | 2042 +-----+-----+-----+-----+-----+-----+-----+-----+ 2044 DSCP: differentiated services codepoint 2045 ECN: Explicit Congestion Notification 2047 Figure A.1: The Differentiated Services and ECN Fields in IP. 2049 The two bits of the ECN field have four ECN codepoints, '00' to '11': 2050 +-----+-----+ 2051 | ECN FIELD | 2052 +-----+-----+ 2053 ECT CE 2054 0 0 Not-ECT 2055 0 1 ECT(1) 2056 1 0 ECT(0) 2057 1 1 CE 2059 Figure A.2: The ECN Field in IP. 2061 The not-ECT codepoint '00' indicates a packet that is not using ECN. 2063 The CE codepoint '11' is set by a router to indicate congestion to 2064 the end hosts. The term 'CE packet' denotes a packet that has the CE 2065 codepoint set. 2067 The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and 2068 ECT(1) respectively) are set by the data sender to indicate that the 2069 end-points of the transport protocol are ECN-capable. Routers treat 2070 the ECT(0) and ECT(1) codepoints as equivalent. 
Senders are free to 2071 use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a 2072 packet-by-packet basis. The use of two codepoints for ECT is 2073 motivated primarily by the desire to allow mechanisms for the data 2074 sender to verify that network elements are not erasing the CE 2075 codepoint, and that data receivers are properly reporting to the 2076 sender the receipt of packets with the CE codepoint set. 2078 ECN requires support from the transport protocol, in addition to the 2079 functionality given by the ECN field in the IP packet header. 2080 [RFC3168] addresses the addition of ECN Capability to TCP, specifying 2081 three new pieces of functionality: negotiation between the endpoints 2082 during connection setup to determine if they are both ECN-capable; an 2083 ECN-Echo (ECE) flag in the TCP header so that the data receiver can 2084 inform the data sender when a CE packet has been received; and a 2085 Congestion Window Reduced (CWR) flag in the TCP header so that the 2086 data sender can inform the data receiver that the congestion window 2087 has been reduced. 2089 The transport layer (e.g. TCP) must respond, in terms of congestion 2090 control, to a *single* CE packet as it would to a packet drop. 2092 The advantage of setting the CE codepoint as an indication of 2093 congestion, instead of relying on packet drops, is that it allows the 2094 receiver(s) to receive the packet, thus avoiding the potential for 2095 excessive delays due to retransmissions after packet losses. 2097 11.2. Appendix B: What is distributed measurement-based admission 2098 control? 2100 This Appendix briefly explains what distributed measurement-based 2101 admission control is [Breslau99]. 2103 Traditional admission control algorithms for 'hard' real-time 2104 services (those providing a firm delay bound, for example) guarantee 2105 QoS by using 'worst case analysis'.
Each time a flow is admitted, its 2106 traffic parameters are examined and the network re-calculates the 2107 remaining resources. When the network gets a new request it therefore 2108 knows for certain whether the prospective flow, with its particular 2109 parameters, should be admitted. However, parameter-based admission 2110 control algorithms result in under-utilisation when the traffic is 2111 bursty. Therefore 'soft' real-time services - like Controlled Load - 2112 can use a more relaxed admission control algorithm. 2114 This insight suggests measurement-based admission control (MBAC). The 2115 aim of MBAC is to provide a statistical service guarantee. The 2116 classic scenario for MBAC is where each router participates in hop- 2117 by-hop admission control, characterising existing traffic locally 2118 through measurements (instead of keeping accurate track of traffic 2119 as it is admitted), in order to determine the current value of some 2120 parameter, e.g. load. Note that for scalability the measurement is of 2121 the aggregate of the flows in the local system. The measured 2122 parameter(s) is then compared to the requirements of the prospective 2123 flow to see whether it should be admitted. 2125 MBAC may also be performed centrally for a network, in which case it 2126 uses centralised measurements by a bandwidth broker. 2128 We use distributed MBAC. "Distributed" means that the measurement is 2129 accumulated for the 'whole-path' using in-band signalling. In our 2130 case, this means that the measurement of existing traffic is for the 2131 same pair of ingress and egress gateways as the prospective 2132 microflow.
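With a whole-path measurement in hand, the per-flow decision reduces to a threshold test. A minimal sketch (the threshold value and the function name are illustrative assumptions, not values from this document):

```python
CLE_THRESHOLD = 0.05  # illustrative configured admission threshold


def admit_flow(congestion_level_estimate, threshold=CLE_THRESHOLD):
    """Gateway admission decision sketch: admit a prospective CL
    microflow only while the Congestion-Level-Estimate measured for
    the same ingress-egress gateway pair stays below the configured
    threshold."""
    return congestion_level_estimate < threshold
```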
2134 In fact our mechanism can be said to be distributed in three ways: 2135 all routers on the ingress-egress path affect the Congestion-Level- 2136 Estimate; the admission control decision is made just once on behalf 2137 of all the routers on the path across the CL-region; and the ingress 2138 and egress gateways cooperate to perform MBAC. 2140 11.3. Appendix C: Calculating the Exponentially weighted moving average 2141 (EWMA) 2143 At the egress gateway, for every CL packet arrival: 2145 [EWMA-total-bits]n+1 = (w * bits-in-packet) + ((1-w) * [EWMA- 2146 total-bits]n ) 2148 [EWMA-M-bits]n+1 = (B * w * bits-in-packet) + ((1-w) * [EWMA-M- 2149 bits]n ) 2151 Then, per new flow arrival: 2153 [Congestion-Level-Estimate]n+1 = [EWMA-M-bits]n+1 / [EWMA-total- 2154 bits]n+1 2156 where 2157 EWMA-total-bits is the total number of bits in CL packets, calculated 2158 as an exponentially weighted moving average (EWMA) 2160 EWMA-M-bits is the total number of bits in CL packets that are 2161 Admission Marked or Pre-emption Marked, again calculated as an EWMA. 2163 B is either 0 or 1: 2165 B = 0 if the CL packet is neither admission marked nor pre-emption marked 2167 B = 1 if the CL packet is admission marked or pre-emption marked 2169 w is the exponential weighting factor. 2171 Varying the value of the weight trades off between the smoothness and 2172 responsiveness of the Congestion-Level-Estimate. However, in general 2173 both can be achieved, given our original assumption of many CL 2174 microflows and remembering that the EWMA is calculated on the basis 2175 of aggregate traffic between the ingress and egress gateways. 2176 There will be a threshold inter-arrival time between packets of the 2177 same aggregate above which the egress will consider the 2178 Congestion-Level-Estimate too stale, and it will then trigger 2179 generation of probes by the ingress.
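The per-packet and per-flow algorithms above can be sketched as follows (a Python illustration; the class name and the example weight value are assumptions for the sketch, not part of this specification):

```python
W = 0.01  # exponential weighting factor (illustrative value)


class CongestionLevelEstimator:
    """Egress-gateway EWMAs of total CL bits and of marked CL bits,
    following the per-packet algorithms of Appendix C."""

    def __init__(self, w=W):
        self.w = w
        self.ewma_total_bits = 0.0
        self.ewma_m_bits = 0.0

    def on_packet(self, bits_in_packet, marked):
        """Run once per CL packet arrival.  `marked` is True if the
        packet is admission marked or pre-emption marked (B = 1)."""
        b = 1.0 if marked else 0.0
        self.ewma_total_bits = (self.w * bits_in_packet
                                + (1 - self.w) * self.ewma_total_bits)
        self.ewma_m_bits = (b * self.w * bits_in_packet
                            + (1 - self.w) * self.ewma_m_bits)

    def congestion_level_estimate(self):
        """Run once per new flow arrival: fraction of recent CL bits
        that were marked."""
        if self.ewma_total_bits == 0:
            return 0.0
        return self.ewma_m_bits / self.ewma_total_bits
```

Because both EWMAs use the same weight and the same packet stream, the quotient is a dimensionless fraction between 0 and 1, insensitive to the aggregate's absolute bit rate.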
2181 The first two per-packet algorithms can be simplified if their only 2182 use is where the result of one is divided by the result of the 2183 other in the third, per-flow algorithm. 2185 [EWMA-total-bits]'n+1 = bits-in-packet + (w' * [EWMA-total- 2186 bits]'n ) 2188 [EWMA-M-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-M-bits]'n 2189 ) 2191 where w' = 1-w. The primed values are the originals scaled by the constant factor 1/w, so their quotient in the per-flow algorithm is unchanged. 2193 If w' is arranged to be a power of 2 (i.e. w' = 2^-k), these per-packet algorithms can 2194 be implemented solely with a shift and an add. 2196 12. References 2198 A later version will distinguish normative and informative 2199 references. 2201 [AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, 2202 Michael Davenport, Chris Christou, Jerry Ash, Bur 2203 Goode, 'Aggregation of RSVP Reservations over MPLS 2204 TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-03 (work 2205 in progress), June 2006 2207 [ANSI.MLPP.Spec] American National Standards Institute, 2208 "Telecommunications- Integrated Services Digital 2209 Network (ISDN) - Multi-Level Precedence and Pre- 2210 emption (MLPP) Service Capability", ANSI T1.619-1992 2211 (R1999), 1992. 2213 [ANSI.MLPP.Supplement] American National Standards Institute, "MLPP 2214 Service Domain Cause Value Changes", ANSI 2215 T1.619a-1994 (R1999), 1990. 2217 [AVQ] S. Kunniyur and R. Srikant "Analysis and Design of an 2218 Adaptive Virtual Queue (AVQ) Algorithm for Active 2219 Queue Management", In: Proc. ACM SIGCOMM'01, Computer 2220 Communication Review 31 (4) (October, 2001). 2222 [Breslau99] L. Breslau, S. Jamin, S. Shenker "Measurement-based 2223 admission control: what is the research agenda?", In: 2224 Proc. Int'l Workshop on Quality of Service 1999. 2226 [Breslau00] L. Breslau, E. Knightly, S. Shenker, I. Stoica, H.
2227 Zhang "Endpoint Admission Control: Architectural 2228 Issues and Performance", In: ACM SIGCOMM 2000 2230 [Briscoe] Bob Briscoe and Steve Rudkin, "Commercial Models for 2231 IP Quality of Service Interconnect", BT Technology 2232 Journal, Vol 23 No 2, April 2005. 2234 [DCAC] Richard J. Gibbens and Frank P. Kelly "Distributed 2235 connection acceptance control for a connectionless 2236 network", In: Proc. International Teletraffic Congress 2237 (ITC16), Edinburgh, pp. 941-952 (1999). 2239 [ECN-MPLS] Bruce Davie, Bob Briscoe, June Tay, "Explicit 2240 Congestion Marking in MPLS", draft- 2241 davie-ecn-mpls-00.txt (work in progress), June 2006 2243 [EMERG-RQTS] Carlberg, K. and R. Atkinson, "General Requirements 2244 for Emergency Telecommunication Service (ETS)", RFC 2245 3689, February 2004. 2247 [EMERG-TEL] Carlberg, K. and R. Atkinson, "IP Telephony 2248 Requirements for Emergency Telecommunication Service 2249 (ETS)", RFC 3690, February 2004. 2251 [Floyd] S. Floyd, 'Specifying Alternate Semantics for the 2252 Explicit Congestion Notification (ECN) Field', draft- 2253 floyd-ecn-alternates-02.txt (work in progress), August 2254 2005 2256 [GSPa] Karsten (Ed.), Martin "GSP/ECN Technology & 2257 Experiments", Deliverable: 15.3 PtIII, M3I Eu Vth 2258 Framework Project IST-1999-11429, URL: 2259 http://www.m3i.org/ (February, 2002) (superseded by 2260 [GSP-TR]) 2262 [GSP-TR] Martin Karsten and Jens Schmitt, "Admission Control 2263 Based on Packet Marking and Feedback Signalling -- 2264 Mechanisms, Implementation and Experiments", TU- 2265 Darmstadt Technical Report TR-KOM-2002-03, URL: 2266 http://www.kom.e-technik.tu- 2267 darmstadt.de/publications/abstracts/KS02-5.html (May, 2268 2002) 2270 [ITU.MLPP.1990] International Telecommunications Union, "Multilevel 2271 Precedence and Pre-emption Service (MLPP)", ITU-T 2272 Recommendation I.255.3, 1990.
2274 [Johnson] DM Johnson, 'QoS control versus generous 2275 dimensioning', BT Technology Journal, Vol 23 No 2, 2276 April 2005 2278 [Low] S. Low, L. Andrew, B. Wydrowski, 'Understanding XCP: 2279 equilibrium and fairness', IEEE InfoCom 2005 2281 [PCN] B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur, 2282 A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan, 2283 G. Karagiannis, A. Bader, L. Westberg. 'Pre-Congestion 2284 Notification marking', draft-briscoe-tsvwg-cl-phb-02 2285 (work in progress), June 2006. 2287 [Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, 2288 'Re-ECN: Adding Accountability for Causing Congestion 2289 to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in 2290 progress), March 2006. 2292 [Re-feedback] Bob Briscoe, Arnaud Jacquet, Carla Di Cairano- 2293 Gilfedder, Andrea Soppera, 'Re-feedback for Policing 2294 Congestion Response in an Inter-network', ACM SIGCOMM 2295 2005, August 2005. 2297 [Re-PCN] B. Briscoe, 'Emulating Border Flow Policing using Re- 2298 ECN on Bulk Data', draft-briscoe-tsvwg-re-ecn-border- 2299 cheat-00 (work in progress), February 2006. 2301 [Reid] ABD Reid, 'Economics and scalability of QoS 2302 solutions', BT Technology Journal, Vol 23 No 2, April 2303 2005 2305 [RFC2211] J. Wroclawski, Specification of the Controlled-Load 2306 Network Element Service, September 1997 2308 [RFC2309] Braden, B., et al., "Recommendations on Queue 2309 Management and Congestion Avoidance in the Internet", 2310 RFC 2309, April 1998. 2312 [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, 2313 "Definition of the Differentiated Services Field (DS 2314 Field) in the IPv4 and IPv6 Headers", RFC 2474, 2315 December 1998 2317 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, 2318 Z. and W. Weiss, 'A framework for Differentiated 2319 Services', RFC 2475, December 1998. 2321 [RFC2597] Heinanen, J., Baker, F., Weiss, W. and J. Wrocklawski, 2322 "Assured Forwarding PHB Group", RFC 2597, June 1999. 
2324 [RFC2998] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, 2325 L., Speer, M., Braden, R., Davie, B., Wroclawski, J. 2326 and E. Felstaine, "A Framework for Integrated Services 2327 Operation Over DiffServ Networks", RFC 2998, November 2328 2000. 2330 [RFC3168] Ramakrishnan, K., Floyd, S. and D. Black "The Addition 2331 of Explicit Congestion Notification (ECN) to IP", RFC 2332 3168, September 2001. 2334 [RFC3246] B. Davie, A. Charny, J.C.R. Bennet, K. Benson, J.Y. Le 2335 Boudec, W. Courtney, S. Davari, V. Firoiu, D. 2336 Stiliadis, 'An Expedited Forwarding PHB (Per-Hop 2337 Behavior)', RFC 3246, March 2002. 2339 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., 2340 Vaananen, P., Krishnan, R., Cheval, P., and J. 2341 Heinanen, "Multi- Protocol Label Switching (MPLS) 2342 Support of Differentiated Services", RFC 3270, May 2343 2002. 2345 [RFC4542] F. Baker & J. Polk, "Implementing an Emergency 2346 Telecommunications Service for Real Time Services in 2347 the Internet Protocol Suite", RFC 4542, May 2006. 2349 [RMD] Attila Bader, Lars Westberg, Georgios Karagiannis, 2350 Cornelia Kappler, Tom Phelan, 'RMD-QOSM - The Resource 2351 Management in DiffServ QoS model', draft-ietf-nsis- 2352 rmd-03 Work in Progress, June 2005. 2354 [RSVP-PCN] Francois Le Faucheur, Anna Charny, Bob Briscoe, Philip 2355 Eardley, Joe Barbiaz, Kwok-Ho Chan, 'RSVP Extensions 2356 for Admission Control over DiffServ using Pre- 2357 Congestion Notification (PCN)', draft-lefaucheur-rsvp- 2358 ecn-01 (work in progress), June 2006. 2360 [RSVP-PREEMPTION] Herzog, S., "Signaled Preemption Priority Policy 2361 Element", RFC 3181, October 2001. 2363 [RSVP-EMERGENCY] Le Faucheur et al., RSVP Extensions for Emergency 2364 Services, draft-lefaucheur-emergency-rsvp-02.txt 2366 [RTECN] Babiarz, J., Chan, K. and V. Firoiu, 'Congestion 2367 Notification Process for Real-Time Traffic', draft- 2368 babiarz-tsvwg-rtecn-04 Work in Progress, July 2005. 
2370 [RTECN-usage] Alexander, C., Ed., Babiarz, J. and J. Matthews, 2371 'Admission Control Use Case for Real-time ECN', draft- 2372 alexander-rtecn-admission-control-use-case-00, Work in 2373 Progress, February 2005. 2375 [Songhurst] David J. Songhurst, Philip Eardley, Bob Briscoe, Carla 2376 Di Cairano Gilfedder and June Tay, 'Guaranteed QoS 2377 Synthesis for Admission Control with Shared Capacity', 2378 BT Technical Report TR-CXR9-2006-001, Feb 2006, 2379 http://www.cs.ucl.ac.uk/staff/B.Briscoe/projects/ipe2e 2380 qos/gqs/papers/GQS_shared_tr.pdf 2382 [vq] Costas Courcoubetis and Richard Weber "Buffer Overflow 2383 Asymptotics for a Switch Handling Many Traffic 2384 Sources" In: Journal Applied Probability 33 pp. 886-- 2385 903 (1996). 2387 Authors' Addresses 2389 Bob Briscoe 2390 BT Research 2391 B54/77, Sirius House 2392 Adastral Park 2393 Martlesham Heath 2394 Ipswich, Suffolk 2395 IP5 3RE 2396 United Kingdom 2397 Email: bob.briscoe@bt.com 2399 Dave Songhurst 2400 BT Research 2401 B54/69, Sirius House 2402 Adastral Park 2403 Martlesham Heath 2404 Ipswich, Suffolk 2405 IP5 3RE 2406 United Kingdom 2407 Email: dsonghurst@jungle.bt.co.uk 2409 Philip Eardley 2410 BT Research 2411 B54/77, Sirius House 2412 Adastral Park 2413 Martlesham Heath 2414 Ipswich, Suffolk 2415 IP5 3RE 2416 United Kingdom 2417 Email: philip.eardley@bt.com 2419 Francois Le Faucheur 2420 Cisco Systems, Inc. 2421 Village d'Entreprise Green Side - Batiment T3 2422 400, Avenue de Roumanille 2423 06410 Biot Sophia-Antipolis 2424 France 2425 Email: flefauch@cisco.com 2427 Anna Charny 2428 Cisco Systems 2429 300 Apollo Drive 2430 Chelmsford, MA 01824 2431 USA 2432 Email: acharny@cisco.com 2433 Kwok Ho Chan 2434 Nortel Networks 2435 600 Technology Park Drive 2436 Billerica, MA 01821 2437 USA 2438 Email: khchan@nortel.com 2440 Jozef Z. 
Babiarz 2441 Nortel Networks 2442 3500 Carling Avenue 2443 Ottawa, Ont K2H 8E9 2444 Canada 2445 Email: babiarz@nortel.com 2447 Stephen Dudley 2448 Nortel Networks 2449 4001 E. Chapel Hill Nelson Highway 2450 P.O. Box 13010, ms 570-01-0V8 2451 Research Triangle Park, NC 27709 2452 USA 2453 Email: smdudley@nortel.com 2455 Georgios Karagiannis 2456 University of Twente 2457 P.O. BOX 217 2458 7500 AE Enschede, 2459 The Netherlands 2460 Email: g.karagiannis@ewi.utwente.nl 2462 Attila Bader 2463 attila.bader@ericsson.com 2465 Lars Westberg 2466 Ericsson AB 2467 SE-164 80 Stockholm 2468 Sweden 2469 Email: Lars.Westberg@ericsson.com 2471 Intellectual Property Statement 2473 The IETF takes no position regarding the validity or scope of any 2474 Intellectual Property Rights or other rights that might be claimed to 2475 pertain to the implementation or use of the technology described in 2476 this document or the extent to which any license under such rights 2477 might or might not be available; nor does it represent that it has 2478 made any independent effort to identify any such rights. Information 2479 on the procedures with respect to rights in RFC documents can be 2480 found in BCP 78 and BCP 79. 2482 Copies of IPR disclosures made to the IETF Secretariat and any 2483 assurances of licenses to be made available, or the result of an 2484 attempt made to obtain a general license or permission for the use of 2485 such proprietary rights by implementers or users of this 2486 specification can be obtained from the IETF on-line IPR repository at 2487 http://www.ietf.org/ipr. 2489 The IETF invites any interested party to bring to its attention any 2490 copyrights, patents or patent applications, or other proprietary 2491 rights that may cover technology that may be required to implement 2492 this standard.
Please address the information to the IETF at 2493 ietf-ipr@ietf.org 2495 Disclaimer of Validity 2497 This document and the information contained herein are provided on an 2498 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 2499 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 2500 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 2501 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 2502 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 2503 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 2505 Copyright Statement 2507 Copyright (C) The Internet Society (2006). 2509 This document is subject to the rights, licenses and restrictions 2510 contained in BCP 78, and except as set forth therein, the authors 2511 retain all their rights.