
1	TSVWG                                                        B. Briscoe
2	Internet Draft                                               P. Eardley
3	draft-briscoe-tsvwg-cl-architecture-04.txt                 D. Songhurst
4	Expires: April 2007                                                  BT

6	                                                        F. Le Faucheur
7	                                                              A. Charny
8	                                                    Cisco Systems, Inc

10	                                                           J. Babiarz
11	                                                                K. Chan
12	                                                            S. Dudley
13	                                                               Nortel

15	                                                        G. Karagiannis
16	                                       University of Twente / Ericsson

18	                                                             A. Bader
19	                                                          L. Westberg
20	                                                             Ericsson

22	                                                      25 October, 2006

24	     An edge-to-edge Deployment Model for Pre-Congestion Notification:
25	                 Admission Control over a DiffServ Region
26	                draft-briscoe-tsvwg-cl-architecture-04.txt

28	Status of this Memo

30	   By submitting this Internet-Draft, each author represents that any
31	   applicable patent or other IPR claims of which he or she is aware
32	   have been or will be disclosed, and any of which he or she becomes
33	   aware will be disclosed, in accordance with Section 6 of BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF), its areas, and its working groups.  Note that
37	   other groups may also distribute working documents as Internet-
38	   Drafts.

40	   Internet-Drafts are draft documents valid for a maximum of six months
41	   and may be updated, replaced, or obsoleted by other documents at any
42	   time.  It is inappropriate to use Internet-Drafts as reference
43	   material or to cite them other than as "work in progress".

45	   The list of current Internet-Drafts can be accessed at
46	        http://www.ietf.org/1id-abstracts.html

48	   The list of Internet-Draft Shadow Directories can be accessed at
49	        http://www.ietf.org/shadow.html

51	   This Internet-Draft will expire in April 2007.

53	Copyright Notice

55	   Copyright (C) The Internet Society (2006).  All Rights Reserved.

57	Abstract

59	This document describes a deployment model for pre-congestion
60	notification (PCN) operating in a large DiffServ-based region of the
61	Internet. PCN-based admission control protects the quality of service
62	of existing flows in normal circumstances, whilst if necessary (eg
63	after a large failure) pre-emption of some flows preserves the quality
64	of service of the remaining flows. Each link has a configured-
65	admission-rate and a configured-pre-emption-rate, and a router marks
66	packets that exceed these rates. Hence routers give an early warning of
67	their own potential congestion, before packets need to be dropped.
68	Gateways around the edges of the PCN-region convert measurements of
69	packet rates and their markings into decisions about whether to admit
70	new flows, and (if necessary) into the rate of excess traffic that
71	should be pre-empted. Per-flow admission states are kept at the
72	gateways only, while the PCN markers that are required for all routers
73	operate on the aggregate traffic - hence there is no scalability impact
74	on interior routers.

76	Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION)

78	   This document is posted as an Internet-Draft with the intention of
79	   eventually becoming an INFORMATIONAL RFC.

81	Table of Contents

83	   1. Introduction
84	      1.1. Summary
85	      1.2. Key benefits
86	      1.3. Terminology
87	      1.4. Existing terminology
88	      1.5. Standardisation requirements
89	      1.6. Structure of rest of the document
90	   2. Key aspects of the deployment model
91	      2.1. Key goals
92	      2.2. Key assumptions
93	   3. Deployment model
94	      3.1. Admission control
95	         3.1.1. Pre-Congestion Notification for Admission Marking
96	         3.1.2. Measurements to support admission control
97	         3.1.3. How edge-to-edge admission control supports end-to-end
98	         QoS signalling
99	         3.1.4. Use case
100	      3.2. Flow pre-emption
101	         3.2.1. Alerting an ingress gateway that flow pre-emption may be
102	         needed
103	         3.2.2. Determining the right amount of CL traffic to drop
104	         3.2.3. Use case for flow pre-emption
105	      3.3. Both admission control and pre-emption
106	   4. Summary of Functionality
107	      4.1. Ingress gateways
108	      4.2. Interior routers
109	      4.3. Egress gateways
110	      4.4. Failures
111	   5. Limitations and some potential solutions
112	      5.1. ECMP
113	      5.2. Beat down effect
114	      5.3. Bi-directional sessions
115	      5.4. Global fairness
116	      5.5. Flash crowds
117	      5.6. Pre-empting too fast
118	      5.7. Other potential extensions
119	         5.7.1. Tunnelling
120	         5.7.2. Multi-domain and multi-operator usage
121	         5.7.3. Preferential dropping of pre-emption marked packets
122	         5.7.4. Adaptive bandwidth for the Controlled Load service
123	         5.7.5. Controlled Load service with end-to-end Pre-Congestion
124	         Notification
125	         5.7.6. MPLS-TE
126	   6. Relationship to other QoS mechanisms
127	      6.1. IntServ Controlled Load
128	      6.2. Integrated services operation over DiffServ
129	      6.3. Differentiated Services
130	      6.4. ECN
131	      6.5. RTECN
132	      6.6. RMD
133	      6.7. RSVP Aggregation over MPLS-TE
134	      6.8. Other Network Admission Control Approaches
135	   7. Security Considerations
136	   8. Acknowledgements
137	   9. Comments solicited
138	   10. Changes from earlier versions of the draft
139	   11. Appendices
140	      11.1. Appendix A: Explicit Congestion Notification
141	      11.2. Appendix B: What is distributed measurement-based admission
142	      control?
143	      11.3. Appendix C: Calculating the Exponentially weighted moving
144	      average (EWMA)
145	   12. References
146	   Authors' Addresses
147	   Intellectual Property Statement
148	   Disclaimer of Validity
149	   Copyright Statement

151	1. Introduction

153	1.1. Summary

155	   This document describes a deployment model to achieve an end-to-end
156	   Controlled Load service by using (within a large region of the
157	   Internet) DiffServ and edge-to-edge distributed measurement-based
158	   admission control and flow pre-emption. Controlled load service is a
159	   quality of service (QoS) closely approximating the QoS that the same
160	   flow would receive from a lightly loaded network element [RFC2211].
161	   Controlled Load (CL) is useful for inelastic flows such as those for
162	   real-time media.

164	   In line with the "IntServ over DiffServ" framework defined in
165	   [RFC2998], the CL service is supported end-to-end and RSVP signalling
166	   [RFC2205] is used end-to-end, over an edge-to-edge DiffServ region.
167	   We call the DiffServ region the "CL-region".

169	 ___    ___    _______________________________________    ____    ___
170	|   |  |   |  |Ingress         Interior         Egress|  |    |  |   |
171	|   |  |   |  |gateway         routers         gateway|  |    |  |   |
172	|   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   |
173	|   |  |   |  | PCN-  |  | PCN-  |  | PCN-  |  |      |  |    |  |   |
174	|   |..|   |..|marking|..|marking|..|marking|..| Meter|..|    |..|   |
175	|   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   |
176	|   |  |   |  |  \                                 /  |  |    |  |   |
177	|   |  |   |  |   \                               /   |  |    |  |   |
178	|   |  |   |  |    \  Congestion-Level-Estimate  /    |  |    |  |   |
179	|   |  |   |  |     \  (for admission control)  /     |  |    |  |   |
180	|   |  |   |  |      --<-----<----<----<-----<--      |  |    |  |   |
181	|   |  |   |  |      Sustainable-Aggregate-Rate       |  |    |  |   |
182	|   |  |   |  |        (for flow pre-emption)         |  |    |  |   |
183	|___|  |___|  |_______________________________________|  |____|  |___|

185	Sx     Access               CL-region                   Access    Rx
186	End    Network                                          Network   End
187	Host                                                              Host
188	                <------ edge-to-edge signalling ----->
189	              (for admission control & flow pre-emption)

191	<-------------------end-to-end QoS signalling protocol--------------->

193	Figure 1: Overall QoS architecture (NB terminology explained later)

194	   Figure 1 shows an example of an overall QoS architecture, where the
195	   two access networks are connected by a CL-region. Another possibility
196	   is that there are several CL-regions between the access networks -
197	   each would operate the Pre-Congestion Notification mechanisms
198	   separately. The document assumes RSVP as the end-to-end QoS
199	   signalling protocol. However, the RSVP signalling may itself be
200	   originated or terminated by proxies still closer to the edge of the
201	   network, such as home hubs or the like, triggered in turn by
202	   application layer signalling. [RFC2998] and our approach are compared
203	   further in Section 6.2.

205	   Flows must enter and leave the CL-region through its ingress and
206	   egress gateways, and they need traffic descriptors that are policed
207	   by the ingress gateway (NB the policing function is out of this
208	   document's scope). The overall CL-traffic between two border routers
209	   is called a "CL-region-aggregate".

211	   The document introduces a mechanism for flow admission control:
212	   should a new flow be admitted into a specific CL-region-aggregate?
213	   Admission control protects the QoS of existing CL-flows in normal
214	   circumstances. In abnormal circumstances, for instance a disaster
215	   affecting multiple interior routers, then the QoS on existing CL
216	   microflows may degrade even if care was exercised when admitting
217	   those microflows before those circumstances. Therefore we also
218	   propose a mechanism for flow pre-emption: how much traffic, in a
219	   specific CL-region-aggregate, should be pre-empted in order to
220	   preserve the QoS of the remaining CL-flows? Flow pre-emption also
221	   restores QoS to lower priority traffic.

223	   As a fundamental building block to enable these two mechanisms, each
224	   link of the CL-region is associated with a configured-admission-rate
225	   and configured-pre-emption-rate; the former is usually significantly
226	   smaller than the latter. If traffic in a specific DiffServ class ("CL-
227	   traffic") on the link exceeds these rates then packets are marked
228	   with "Admission Marking" or "Pre-emption Marking". The algorithms
229	   that determine the number of packets marked are outlined in Section 3
230	   and detailed in [PCN]. PCN marking (Pre-Congestion Notification)
231	   builds on the concepts of RFC 3168, "The addition of Explicit
232	   Congestion Notification to IP" (which is briefly summarised in
233	   Appendix A).

235	              Traffic rate on link ^
236	                                   |
237	                                   |   Drop packets
238	                   link bandwidth -|---------------------------
239	                                   |
240	                                   |  Pre-emption Mark packets
241	      configured-pre-emption-rate -|---------------------------
242	                                   |
243	                                   |   Admission Mark packets
244	        configured-admission-rate -|---------------------------
245	                                   |
246	                                   |   No marking of packets
247	                                   |
248	                                   +---------------------------

250	                       Figure 2: Packet Marking by Routers
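
   The ordering of the thresholds in Figure 2 can be illustrated with a
   short Python sketch. It is illustrative only: the function name, the
   zone labels and the example rates are assumptions made here, and it
   does not reproduce the marking algorithms of [PCN], which determine
   how many packets are actually marked.

```python
def marking_zone(rate, admission_rate, preemption_rate, link_bandwidth):
    """Return the zone of Figure 2 that a CL traffic rate falls into.

    All arguments use the same unit (e.g. Mbit/s). The zone labels are
    hypothetical names chosen for this sketch.
    """
    # Figure 2 places the admission rate below the pre-emption rate,
    # which in turn lies below the link bandwidth.
    if not (0 < admission_rate < preemption_rate <= link_bandwidth):
        raise ValueError("expected admission < pre-emption <= link bandwidth")
    if rate > link_bandwidth:
        return "drop"
    if rate > preemption_rate:
        return "pre-emption-mark"
    if rate > admission_rate:
        return "admission-mark"
    return "no-marking"


# Hypothetical link: 100 Mbit/s, admission at 60, pre-emption at 80.
for cl_rate in (50, 70, 90, 120):
    print(cl_rate, marking_zone(cl_rate, 60, 80, 100))
```

   With these hypothetical values, a CL rate of 70 falls in the
   Admission Marking zone and a rate of 90 in the Pre-emption Marking
   zone.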

252	   Gateways of the CL-region make measurements of packet rates and their
253	   PCN markings and convert them into decisions about whether to admit
254	   new flows, and (if necessary) into the rate of excess traffic that
255	   should be pre-empted. These mechanisms are detailed in Section 3 and
256	   briefly outlined in the next few paragraphs.

258	   The admission control mechanism for a new flow entering the network
259	   at ingress gateway G0 and leaving it at egress gateway G1 relies on
260	   feedback from the egress gateway G1 about the existing CL-region-
261	   aggregate between G0 and G1. This feedback is generated as follows.
262	   All routers meter the rate of the CL-traffic on their outgoing links
263	   and mark the packets with the Admission Mark if the configured-
264	   admission-rate is exceeded. Egress gateway G1 measures the Admission
265	   Marks for each of its CL-region-aggregates separately. If the
266	   fraction of traffic on a CL-region-aggregate that is Admission Marked
267	   exceeds some threshold, no further flows should be admitted into this
268	   CL-region-aggregate. Because sources vary their data rates (amongst
269	   other reasons) the rate of the CL-traffic on a link may fluctuate
270	   above and below the configured-admission-rate. Hence to get more
271	   stable information, the egress gateway measures the fraction as a
272	   moving average, called the Congestion-Level-Estimate. This is
273	   signalled from the egress G1 to the ingress G0, to enable the ingress
274	   to block new flows.
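
   The egress measurement and the resulting ingress decision might be
   sketched as follows. This is a minimal sketch under stated
   assumptions: it uses a generic exponentially weighted moving average
   rather than the exact calculation of Appendix C, and the class name,
   the weight and the threshold value are hypothetical.

```python
class CongestionLevelEstimate:
    """Moving average of the Admission Marked fraction for one
    CL-region-aggregate, as kept by an egress gateway (sketch)."""

    def __init__(self, weight=0.1):
        # 'weight' is an assumed smoothing factor, not a value from the draft.
        if not 0 < weight <= 1:
            raise ValueError("weight must be in (0, 1]")
        self.weight = weight
        self.value = 0.0

    def update(self, marked_bits, total_bits):
        """Fold one measurement interval into the moving average."""
        if total_bits <= 0:
            return self.value  # nothing received: keep the old estimate
        fraction = marked_bits / total_bits
        self.value += self.weight * (fraction - self.value)
        return self.value


def admit_new_flow(cle, threshold=0.5):
    """Ingress decision: block new flows once the estimate exceeds
    the threshold (the threshold value is an assumption)."""
    return cle <= threshold


estimate = CongestionLevelEstimate(weight=0.5)
estimate.update(0, 1000)       # no marks seen
estimate.update(1000, 1000)    # every bit marked in this interval
print(estimate.value, admit_new_flow(estimate.value))
```

   The smoothing means that a single interval of marking does not by
   itself block admission; the estimate has to stay high for several
   intervals before it crosses the threshold.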

276	   Admission control seems most useful for DiffServ's Controlled load
277	   service. In order to support CL traffic we would expect PCN to
278	   supplement the existing scheduling behaviour Expedited Forwarding
279	   (EF). Since PCN gives an "early warning" of potential congestion
280	   (hence "pre-congestion notification"), admission control can kick in
281	   before there is any significant build up of packets in routers -
282	   which is exactly the performance required for CL. However, PCN is not
283	   only intended to supplement EF. PCN is specified (in [PCN]) as a
284	   building block which can supplement the scheduling behaviour of other
285	   PHBs.

287	   The function to pre-empt flows (or allow the potential to pre-empt
288	   them) relies on feedback from the egress gateway about the CL-region-
289	   aggregates. This feedback is generated as follows. All routers meter
290	   the rate of the CL-traffic on their outgoing links, and if the rate
291	   is in excess of the configured-pre-emption-rate then packets
292	   amounting to the excess rate are Pre-emption Marked. If the egress
293	   gateway G1 sees a Pre-emption Marked packet then it measures, for
294	   this CL-region-aggregate, the rate of all received packets that
295	   aren't Pre-emption Marked. This is the rate of CL-traffic that the
296	   network can actually support from G0 to G1, and we thus call it the
297	   Sustainable-Aggregate-Rate. The ingress gateway G0 compares the
298	   Sustainable-Aggregate-Rate with the rate that it is sending towards
299	   G1, and hence determines the required traffic rate reduction. The
300	   document assumes flow pre-emption as the way of reacting to this
301	   information, ie stopping sufficient flows to reduce the rate to the
302	   Sustainable-Aggregate-Rate. However, this isn't mandated, for
303	   instance policy or regulation may prevent pre-emption of some flows -
304	   such considerations are out of scope of this document.
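
   The comparison made by the ingress gateway can be sketched in Python
   as below. The sketch is not a definitive procedure: the function
   names are hypothetical, the sum of per-flow rates stands in for the
   rate the ingress measures towards G1, and the "largest flow first"
   selection is purely an assumption, since this document does not
   mandate which flows are pre-empted.

```python
def required_reduction(ingress_rate, sustainable_rate):
    """Traffic rate that must be removed from the aggregate, if any."""
    return max(0.0, ingress_rate - sustainable_rate)


def flows_to_preempt(flow_rates, sustainable_rate):
    """Pick flows to stop until the aggregate fits the
    Sustainable-Aggregate-Rate reported by the egress gateway.

    flow_rates maps a flow identifier to its rate; all rates share
    one unit. Selection order (largest first) is an assumption.
    """
    excess = required_reduction(sum(flow_rates.values()), sustainable_rate)
    chosen = []
    by_rate = sorted(flow_rates.items(), key=lambda item: item[1], reverse=True)
    for flow_id, rate in by_rate:
        if excess <= 0:
            break
        chosen.append(flow_id)
        excess -= rate
    return chosen


# Hypothetical aggregate from G0 to G1 carrying 10 units in four flows.
flows = {"a": 4, "b": 3, "c": 2, "d": 1}
print(flows_to_preempt(flows, sustainable_rate=5))
```

   A policy that protects particular flows could be substituted for the
   sort order without changing the rate comparison itself.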

306	1.2. Key benefits

308	   We believe that the mechanisms described in this document are simple,
309	   scalable, and robust because:

311	   o Per-flow state is only required at the ingress gateways to prevent
312	      non-admitted CL traffic from entering the PCN-region. Other
313	      network entities are not aware of individual flows.

315	   o For each of its links a router has Admission Marking and Pre-
316	      emption Marking behaviours. These markers operate on the overall
317	      CL traffic of the respective link. Therefore, there are no
318	      scalability concerns.

320	   o The information from these measurements is implicitly signalled to
321	      the egress gateways by the marks in the packet headers. No
322	      protocol actions (explicit messages) are required.

324	   o The egress gateways make separate measurements of the packets from
325	      each ingress gateway. Each meter operates on the overall CL traffic
326	      of a particular CL-region-aggregate. Therefore, there are no
327	      scalability concerns as long as the number of ingress gateways is
328	      not overwhelmingly large.

330	   o Feedback signalling is required between all pairs of ingress and
331	      egress gateways and the signalled information is on the basis of
332	      the corresponding CL-region-aggregate, i.e. it is also unaware of
333	      individual flows.

335	   o The configured-admission-rates can be chosen small enough that
336	      admitted traffic can still be carried after a rerouting in most
337	      failure cases. This is an important feature as QoS violations in
338	      core networks due to link failures are more likely than QoS
339	      violations due to increased traffic volume.

341	   o The admitted load is controlled dynamically. Therefore it adapts
342	      as the traffic matrix changes, and also if the network topology
343	      changes (eg after a link failure). Hence an operator can be less
344	      conservative when deploying network capacity, and less accurate in
345	      their prediction of the traffic matrix. Also, controlling the load
346	      using statically provisioned capacity per ingress (regardless of
347	      the egress of a flow), as is typical in the DiffServ architecture
348	      [RFC2475], can lead to focussed overload: many flows happen to
349	      focus on a particular link and then all flows through the
350	      congested link fail catastrophically (Section 6.2).

352	   o The pre-emption function complements admission control. It allows
353	      the network to recover from sudden unexpected surges of CL-traffic
354	      on some links, thus restoring QoS to the remaining flows. Such
355	      scenarios are very unlikely but not impossible. They can be caused
356	      by large network failures that redirect lots of admitted CL-
357	      traffic to other links, or by malfunction of the measurement-based
358	      admission control in the presence of admitted flows that send for
359	      a while with an atypically low rate and increase their rates in a
360	      correlated way.

362	1.3. Terminology

364	   EDITOR'S NOTE: Terminology in this document is (hopefully) consistent
365	   with that in [PCN]. However, it may not be consistent with the
366	   terminology in other PCN-related documents. The PCN Working Group (if
367	   formed) will need to agree a single set of terminology.

369	   This terminology is copied from the pre-congestion notification
370	   marking draft [PCN]:

372	   o Pre-Congestion Notification (PCN): two new algorithms that
373	      determine when a PCN-enabled router Admission Marks and Pre-
374	      emption Marks a packet, depending on the traffic level.

376	   o Admission Marking condition: the traffic level is such that the
377	      router Admission Marks packets. The router provides an "early
378	      warning" that the load is nearing the engineered admission control
379	      capacity, before there is any significant build-up of CL packets
380	      in the queue.

382	   o Pre-emption Marking condition: the traffic level is such that the
383	      router Pre-emption Marks packets. The router warns explicitly that
384	      pre-emption may be needed.

386	   o Configured-admission-rate: the reference rate used by the
387	      admission marking algorithm in a PCN-enabled router.

389	   o Configured-pre-emption-rate: the reference rate used by the pre-
390	      emption marking algorithm in a PCN-enabled router.

392	   The following terms are defined here:

394	   o Ingress gateway: router at an ingress to the CL-region. A CL-
395	      region may have several ingress gateways.

397	   o Egress gateway: router at an egress from the CL-region. A CL-
398	      region may have several egress gateways.

400	   o Interior router: a router which is part of the CL-region, but
401	      isn't an ingress or egress gateway.

403	   o CL-region: A region of the Internet in which all traffic
404	      enters/leaves through an ingress/egress gateway and all routers
405	      run Pre-Congestion Notification marking. A CL-region is a DiffServ
406	      region (a DiffServ region is either a single DiffServ domain or
407	      set of contiguous DiffServ domains), but note that the CL-region
408	      does not use the traffic conditioning agreements (TCAs) of the
409	      (informational) DiffServ architecture.

411	   o CL-region-aggregate: all the microflows between a specific pair of
412	      ingress and egress gateways. Note there is no field in the flow
413	      packet headers that uniquely identifies the aggregate.

415	   o Congestion-Level-Estimate: the number of bits in CL packets that
416	      are admission marked (or pre-emption marked), divided by the
417	      number of bits in all CL packets. It is calculated as an
418	      exponentially weighted moving average. It is calculated by an
419	      egress gateway for the CL packets from a particular ingress
420	      gateway, i.e. there is a Congestion-Level-Estimate for each CL-
421	      region-aggregate.

423	   o Sustainable-Aggregate-Rate: the rate of traffic that the network
424	      can actually support for a specific CL-region-aggregate. So it is
425	      measured by an egress gateway for the CL packets from a particular
426	      ingress gateway.

428	   o Ingress-Aggregate-Rate: the rate of traffic that is being sent on
429	      a specific CL-region-aggregate. So it is measured by an ingress
430	      gateway for the CL packets sent towards a particular egress
431	      gateway.

433	1.4. Existing terminology

435	   This is a placeholder for useful terminology that is defined
436	   elsewhere.

438	1.5. Standardisation requirements

440	   The framework described in this document has two new standardisation
441	   requirements, plus one item that is still under discussion:

443	   o new Pre-Congestion Notification behaviours for Admission Marking
444	      and Pre-emption Marking are required, as detailed in [PCN].

446	   o the end-to-end signalling protocol needs to be modified to carry
447	      the Congestion-Level-Estimate report (for admission control) and
448	      the Sustainable-Aggregate-Rate (for flow pre-emption). With our
449	      assumption of RSVP (Section 2.2) as the end-to-end signalling
450	      protocol, it means that extensions to RSVP are required, as
451	      detailed in [RSVP-PCN], for example to carry the Congestion-Level-
452	      Estimate and Sustainable-Aggregate-Rate information from egress
453	      gateway to ingress gateway.

455	   o We are discussing what to standardise about the gateway's
456	      behaviour.

458	   Other than these things, the arrangement uses existing IETF protocols
459	   throughout, although not in their usual architecture.

461	1.6. Structure of rest of the document

463	   Section 2 describes some key aspects of the deployment model: our
464	   goals and assumptions. Section 3 describes the deployment model,
465	   whilst Section 4 summarises the required changes to the various
466	   routers in the CL-region. Section 5 outlines some limitations of PCN
467	   that we've identified in this deployment model; it also discusses
468	   some potential solutions, and other possible extensions. Section 6
469	   provides some comparison with existing QoS mechanisms.

471	2. Key aspects of the deployment model

473	   EDITOR'S NOTE: The material in Section 2 will eventually disappear,
474	   as it will be covered by the problem statement of the PCN Working
475	   Group (if formed).

477	   In this section we discuss the key aspects of the deployment model:

479	   o At a high level, our key goals, i.e. the functionality that we
480	      want to achieve

482	   o The assumptions that we're prepared to make

484	2.1. Key goals

486	   The deployment model achieves an end-to-end controlled load (CL)
487	   service where a segment of the end-to-end path is an edge-to-edge
488	   Pre-Congestion Notification region. CL is a quality of service (QoS)
489	   closely approximating the QoS that the same flow would receive from a
490	   lightly loaded network element [RFC2211]. It is useful for inelastic
491	   flows such as those for real-time media.

493	   o The CL service should be achieved despite varying load levels of
494	      other sorts of traffic, which may or may not be rate adaptive
495	      (i.e. responsive to packet drops or ECN marks).

497	   o The CL service should be supported for a variety of possible CL
498	      sources: Constant Bit Rate (CBR), Variable Bit Rate (VBR) and
499	      voice with silence suppression. VBR is the most challenging to
500	      support.

502	   o After a localised failure in the interior of the CL-region causing
503	      heavy congestion, the CL service should recover gracefully by pre-
504	      empting (dropping) some of the admitted CL microflows, whilst
505	      preserving as many of them as possible with their full CL QoS.

507	   o It needs to be possible to complete flow pre-emption within 1-2
508	      seconds. Operators will have varying requirements but, at least
509	      for voice, it has been estimated that after a few seconds many
510	      affected users will start to hang up, making the flow pre-
511	      emption mechanism redundant and possibly even counter-productive.
512	      Until flow pre-emption kicks in, other applications using CL (e.g.
513	      video) and lower priority traffic (e.g. Assured Forwarding (AF))
514	      could be receiving reduced service. Therefore an even faster flow
515	      pre-emption mechanism would be desirable (even if, in practice,
516	      operators have to add a deliberate pause to ride out a transient
517	      while the natural rate of call tear down or lower layer protection
518	      mechanisms kick in).

520	   o The CL service should support emergency services ([EMERG-RQTS],
521	      [EMERG-TEL]) as well as the Assured Service which is the IP
522	      implementation of the existing ITU-T/NATO/DoD telephone system
523	      architecture known as Multi-Level Precedence and Pre-emption
524	      [ITU.MLPP.1990] [ANSI.MLPP.Spec] [ANSI.MLPP.Supplement], or MLPP.
525	      In particular, this involves admitting new flows that are part of
526	      high priority sessions even when admission control would reject
527	      new routine flows. Similarly, when having to choose which flows to
528	      pre-empt, this involves taking into account the priorities and
529	      properties of the sessions that flows are part of.

531	2.2. Key assumptions

533	   The framework does not try to deliver the above functionality in all
534	   scenarios. We make the following assumptions about the type of
535	   scenario to be solved.

537	   o Edge-to-edge: all the routers in the CL-region are upgraded with
538	      Pre-Congestion Notification, and all the ingress and egress
539	      gateways are upgraded to perform the measurement-based admission
540	      control and flow pre-emption. Note that although the upgrades
541	      required are edge-to-edge, the CL service is provided end-to-end.

543	   o Additional load: we assume that any additional load offered within
544	      the reaction time of the admission control mechanism doesn't move
545	      the CL-region directly from no congestion to overload. That is,
546	      we assume there will always be an intermediate stage where some CL
547	      packets are Admission Marked, but they are still delivered without
548	      significant QoS degradation. We believe this is valid for core and
549	      backbone networks with typical call arrival patterns (given the
550	      reaction time is little more than one round trip time across the
551	      CL-region), but is unlikely to be valid in access networks where
552	      the granularity of an individual call becomes significant.

554	   o Aggregation: we assume that in normal operations, there are many
555	      CL microflows within the CL-region, typically at least hundreds
556	      between any pair of ingress and egress gateways. The implication
557	      is that the solution is targeted at core and backbone networks and
558	      possibly parts of large access networks.

560	   o Trust: we assume that there is trust between all the routers in
561	      the CL-region. For example, this trust model is satisfied if one
562	      operator runs the whole of the CL-region. But we make no such
563	      assumptions about the end hosts, i.e. depending on the scenario
564	      they may be trusted or untrusted by the CL-region.

566	   o Signalling: we assume that the end-to-end signalling protocol is
567	      RSVP. Section 3 describes how the CL-region fits into such an end-
568	      to-end QoS scenario, whilst [RSVP-PCN] describes the extensions to
569	      RSVP that are required.

571	   o Separation: we assume that all routers within the CL-region are
572	      upgraded with the CL mechanism, so the requirements of [Floyd] are
573	      met because the CL-region is an enclosed environment. Also, an
574	      operator separates CL-traffic in the CL-region from outside
575	      traffic by administrative configuration of the ring of gateways
576	      around the region. Within the CL-region we assume that the CL-
577	      traffic is separated from non-CL traffic.

579	   o Routing: we assume that all packets between a pair of ingress and
580	      egress gateways follow the same path, or that they follow
581	      different paths but that the load balancing scheme is tuned in the
582	      CL-region to distribute load such that the different paths always
583	      receive comparable relative load. This ensures that the
584	      Congestion-Level-Estimate used in the admission control procedure
585	      (and which is computed taking into account packets travelling on
586	      all the paths) approximately reflects the status of the actual
587	      path that will be followed by the new microflow's packets.

589	   We are investigating ways of loosening the restrictions set by some
590	   of these assumptions, for instance:

592	   o Trust: to allow the CL-region to span multiple, non-trusting
593	      operators, using the technique of [Re-PCN] as mentioned in Section
594	      5.7.2.

596	   o Signalling: we believe that the solution could operate with
597	      another signalling protocol, such as the one produced by the NSIS
598	      working group. It could also work with application level
599	      signalling as suggested in [RT-ECN].

601	   o Additional load: we believe that the assumption is valid for core
602	      and backbone networks, with an appropriate margin between the
603	      configured-admission-rate and the capacity for CL traffic.
604	      However, in principle a burst of admission requests can occur in a
605	      short time. We expect this to be a rare event under normal
606	      conditions, but it could happen e.g. due to a 'flash crowd'. If it
607	      does, then more flows may be admitted than should be, triggering
608	      the pre-emption mechanism. There are various ways an operator
609	      might try to alleviate this issue, which are discussed in the
610	      'Flash crowds' section 5.5 later.

612	   o Separation: the assumption that CL traffic is separated from non-
613	      CL traffic implies that the CL traffic has its own PHB, not shared
614	      with other traffic. We are looking at whether it could share
615	      Expedited Forwarding's PHB, but supplemented with Pre-Congestion
616	      Notification. If this is possible, other PHBs (like Assured
617	      Forwarding) could be supplemented with the same new behaviours.
618	      This is similar to how RFC3168 ECN was defined to supplement any
619	      PHB.

621	   o Routing: we are looking in greater detail at the solution in the
622	      presence of Equal Cost Multi-Path routing and at suitable
623	      enhancements. See also the 'ECMP' section 5.1 later.

625	3. Deployment model

627	3.1. Admission control

629	   In this section we describe the admission control mechanism. We
630	   discuss the three pieces of the solution and then give an example of
631	   how they fit together in a use case:

633	   o the new Pre-Congestion Notification for Admission Marking used by
634	      all routers in the CL-region

636	   o how the measurements made support our admission control mechanism

638	   o how the edge-to-edge mechanism fits into the end-to-end RSVP
639	      signalling

641	3.1.1. Pre-Congestion Notification for Admission Marking

643	   This is discussed in [PCN]. Here we only give a brief outline.

645	   To support our admission control mechanism, each router in the CL-
646	   region runs an algorithm to determine whether to Admission Mark the
647	   packet. The algorithm measures the aggregate CL traffic on the link
648	   and ensures that packets are admission marked before the actual queue
649	   builds up, but when it is in danger of doing so soon; the probability
650	   of admission marking increases with the danger. The algorithm's main
651	   parameter is the configured-admission-rate, which is set lower than
652	   the link speed, perhaps considerably so. Admission marked packets
653	   indicate that the CL traffic rate is reaching the configured-
654	   admission-rate and so act as an "early warning" that the engineered
655	   capacity is nearly reached. Therefore they indicate that requests to
656	   admit prospective new CL flows may need to be refused.

658	3.1.2. Measurements to support admission control

660	   To support our admission control mechanism the egress measures the
661	   Congestion-Level-Estimate for traffic from each remote ingress
662	   gateway, i.e. per CL-region-aggregate. The Congestion-Level-Estimate
663	   is the number of bits in CL packets that are admission marked or pre-
664	   emption marked, divided by the number of bits in all CL packets. It
665	   is calculated as an exponentially weighted moving average. It is
666	   calculated by an egress gateway separately for the CL packets from
667	   each particular ingress gateway.

669	   Why are pre-emption marked packets included in the Congestion-Level-
670	   Estimate? Pre-emption marking over-writes admission marking, i.e. a
671	   packet cannot be both admission and pre-emption marked. So if pre-
672	   emption marked packets weren't counted we would have the anomaly that
673	   as the traffic rate grew above the configured-pre-emption-rate, the
674	   Congestion-Level-Estimate would fall. If a particular encoding scheme
675	   is chosen where a packet can be both admission and pre-emption marked
676	   (such as Alternative 4 in Appendix C of [PCN]), then this is not
677	   necessary.

679	   This Congestion-Level-Estimate provides an estimate of how near the
680	   links on the path inside the CL-region are getting to the configured-
681	   admission-rate. Note that the metering is done separately per ingress
682	   gateway, because there may be sufficient capacity on all the routers
683	   on the path between one ingress gateway and a particular egress, but
684	   not from a second ingress to that same egress gateway.
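
   As an illustration only, the measurement described above can be
   sketched in Python. The class name, the per-interval update and the
   EWMA weight of 0.1 are assumptions made for this example; this
   document does not specify them.

```python
from collections import defaultdict


class CleMeter:
    """Congestion-Level-Estimate for one CL-region-aggregate (illustrative)."""

    def __init__(self, weight: float = 0.1) -> None:
        self.weight = weight      # assumed EWMA weight, not specified here
        self.cle = 0.0            # current estimate, between 0.0 and 1.0
        self._marked_bits = 0
        self._total_bits = 0

    def on_packet(self, size_bits: int, marked: bool) -> None:
        """Count one CL packet; `marked` covers admission or pre-emption marks."""
        self._total_bits += size_bits
        if marked:
            self._marked_bits += size_bits

    def end_interval(self) -> float:
        """Fold this interval's marked fraction into the moving average."""
        if self._total_bits > 0:
            sample = self._marked_bits / self._total_bits
            self.cle = (1 - self.weight) * self.cle + self.weight * sample
        self._marked_bits = 0
        self._total_bits = 0
        return self.cle


# One meter per remote ingress gateway, created on first use.
meters = defaultdict(CleMeter)
```

   Keying the meters by ingress gateway mirrors the requirement that
   the estimate is kept separately for each CL-region-aggregate.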

686	3.1.3. How edge-to-edge admission control supports end-to-end QoS
687	   signalling

689	   Consider a scenario that consists of two end hosts, each connected to
690	   their own access networks, which are linked by the CL-region. A
691	   source tries to set up a new CL microflow by sending an RSVP PATH
692	   message, and the receiving end host replies with an RSVP RESV
693	   message. Outside the CL-region some other method, for instance
694	   IntServ, is used to provide QoS. From the perspective of RSVP the CL-
695	   region is a single hop, so the RSVP PATH and RESV messages are
696	   processed by the ingress and egress gateways but are carried
697	   transparently across all the interior routers; hence, the ingress and
698	   egress gateways hold per microflow state, whilst no per microflow
699	   state is kept by the interior routers. So far this is as in IntServ
700	   over DiffServ [RFC2998]. However, in order to support our admission
701	   control mechanism, the egress gateway adds to the RESV message an
702	   opaque object which states the current Congestion-Level-Estimate for
703	   the relevant CL-region-aggregate. Details of the corresponding RSVP
704	   extensions are described in [RSVP-PCN].

706	3.1.4. Use case

708	   To see how the three pieces of the solution fit together, we imagine
709	   a scenario where some microflows are already in place between a given
710	   pair of ingress and egress gateways, but the traffic load is such
711	   that no packets from these flows are admission marked as they travel
712	   across the CL-region. A source wanting to start a new CL microflow
713	   sends an RSVP PATH message. The egress gateway adds an object to the
714	   RESV message with the Congestion-Level-Estimate, which is zero. The
715	   ingress gateway sees this and consequently admits the new flow. It
716	   then forwards the RSVP RESV message upstream towards the source end
717	   host. Hence, assuming there's sufficient capacity in the access
718	   networks, the new microflow is admitted end-to-end.

720	   The source now sends CL packets, which arrive at the ingress gateway.
721	   The ingress uses a five-tuple filter to identify that the packets are
722	   part of a previously admitted CL microflow, and it also polices the
723	   microflow to ensure it remains within its traffic profile. (The
724	   ingress has learnt the required information from the RSVP messages.)
725	   When forwarding a packet belonging to an admitted microflow, the
726	   ingress sets the packet's DSCP and ECN fields to the appropriate
727	   values configured for the CL region. The CL packet now travels across
728	   the CL-region, getting admission marked if necessary.

730	   Next, we imagine the same scenario but at a later time when load is
731	   higher at one (or more) of the interior routers, which start to
732	   Admission Mark CL packets, because their load on the outgoing link is
733	   nearing the configured-admission-rate. The next time a source tries
734	   to set up a CL microflow, the ingress gateway learns (from the
735	   egress) the relevant Congestion-Level-Estimate. If it is greater than
736	   some CLE-threshold value then the ingress refuses the request,
737	   otherwise it is accepted. The ingress gateway could also take into
738	   account attributes of the RSVP reservation (for example, the
739	   RSVP pre-emption priority of [RSVP-PREEMPTION] or the RSVP admission
740	   priority of [RSVP-EMERGENCY]) as well as information provided by a
741	   policy decision point in order to make a more sophisticated admission
742	   decision. This way, flow admission can help emergency/military calls
743	   by taking into account the corresponding priorities (as conveyed in
744	   RSVP policy elements) when deciding to admit or reject a new
745	   reservation. Use of RSVP for the support of emergency/military
746	   applications is discussed in further detail in [RFC4542] and [RSVP-
747	   EMERGENCY].
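
   The admission decision at the ingress gateway could be sketched as
   follows. The function names, the priority labels and the threshold
   values are hypothetical; in particular, using a higher CLE-threshold
   for priority sessions is only one possible way of taking priorities
   into account and is not mandated by this document.

```python
def admit_flow(cle: float, cle_threshold: float) -> bool:
    """Return True if the ingress should admit the new CL microflow.

    The request is refused only when the Congestion-Level-Estimate
    reported by the egress is greater than the CLE-threshold.
    """
    return cle <= cle_threshold


# Hypothetical per-priority thresholds: a higher threshold lets priority
# sessions in when routine flows would already be refused.
CLE_THRESHOLDS = {"routine": 0.05, "emergency": 0.50}


def admit_with_priority(cle: float, priority: str) -> bool:
    """Apply the threshold configured for the reservation's priority."""
    return admit_flow(cle, CLE_THRESHOLDS[priority])
```

   With these example values, a Congestion-Level-Estimate of 0.2 leads
   to a routine request being refused while an emergency request is
   still admitted.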

749	   It is also possible for an egress gateway to get an RSVP RESV message
750	   and not know what the Congestion-Level-Estimate is, for example if
751	   there are no CL microflows at present between the relevant ingress
752	   and egress gateways. In this case the egress requests the ingress to
753	   send probe packets, from which it can initialise its meter. RSVP
754	   Extensions for such a request to send probe data can be found in
755	   [RSVP-PCN].

757	3.2. Flow pre-emption

759	   In this section we describe the flow pre-emption mechanism. We
760	   discuss the two parts of the solution and then give an example of how
761	   they fit together in a use case:

763	   o How an ingress gateway is triggered to test whether flow pre-
764	      emption may be needed

766	   o How an ingress gateway determines the right amount of CL traffic
767	      to drop

769	   The mechanism is defined in [PCN] and [RSVP-PCN].

771	   Two subsequent steps could be:

773	   o Choose which flows to shed, influenced by their priority and other
774	      policy information

776	   o Tear down the reservations for the chosen flows

778	   We provide some hints about these latter two steps in Section 3.2.3,
779	   but don't try to provide full guidance as it greatly depends on the
780	   particular detailed operational situation.

782	   An essential QoS issue in core and backbone networks is being able to
783	   cope with failures of routers and links. The consequent re-routing
784	   can cause severe congestion on some links and hence degrade the QoS
785	   experienced by on-going microflows and other, lower priority traffic.
786	   Even when the network is engineered to sustain a single link failure,
787	   multiple link failures (e.g. due to a fibre cut, router failure or a
788	   natural disaster) can cause violation of capacity constraints and
789	   resulting QoS failures. Our solution uses rate-based flow pre-
790	   emption, so that enough of the previously admitted CL microflows
791	   are dropped to ensure that the remaining ones again receive QoS
792	   commensurate with the CL service and at least some QoS is quickly
793	   restored to other traffic classes.

795	3.2.1. Alerting an ingress gateway that flow pre-emption may be needed

797	   Alerting an ingress gateway that flow pre-emption may be needed is a
798	   two stage process: a router in the CL-region alerts an egress gateway
799	   that flow pre-emption may be needed; in turn the egress gateway
800	   alerts the relevant ingress gateway. Every router in the CL-region
801	   has the ability to alert egress gateways, which may be done either
802	   explicitly or implicitly:

804	   o Explicit - the router per-hop behaviour is supplemented with a new
805	      Pre-emption Marking behaviour, which is outlined below. Reception
806	      of such a packet by the egress gateway alerts it that pre-emption
807	      may be needed.

809	   o Implicit - the router behaviour is unchanged from the Admission
810	      Marking behaviour described earlier. The egress gateway treats a
811	      Congestion-Level-Estimate of (almost) 100% as an implicit alert
812	      that pre-emption may be required. ('Almost' because the
813	      Congestion-Level-Estimate is a moving average, so can never reach
814	      exactly 100%.)

816	   To support explicit pre-emption alerting, each router in the CL-
817	   region runs an algorithm to determine whether to Pre-emption Mark the
818	   packet. The algorithm measures the aggregate CL traffic and ensures
819	   that packets are pre-emption marked before the actual queue builds
820	   up. The algorithm's main parameter is the configured-pre-emption-
821	   rate, which is set lower than the link speed (but higher than the
822	   configured-admission-rate). Thus pre-emption marked packets indicate
823	   that the CL traffic rate is reaching the configured-pre-emption-rate
824	   and so act as an "early warning" that the engineered capacity is
825	   nearly reached. Therefore they indicate that it may be advisable to
826	   pre-empt some of the existing CL flows in order to preserve the QoS
827	   of the others.

829	   Note that the pre-emption marking algorithm doesn't measure the
830	   packets that are already Pre-emption Marked. This ensures that, in a
831	   scenario with several links that are above their configured-pre-
832	   emption-rate, the rate of packets at the egress gateway
833	   excluding Pre-emption Marked ones truly does represent the
834	   Sustainable-Aggregate-Rate (see below for explanation).
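
   The marking algorithm itself is defined in [PCN]. Purely as an
   illustration of the behaviour outlined here, the following Python
   sketch uses a simple token bucket filled at the configured-pre-
   emption-rate. The class name, the bucket depth and the exact
   metering rule are assumptions for this example and may differ from
   [PCN].

```python
class PreemptionMarker:
    """Token-bucket sketch of pre-emption marking on one link (illustrative)."""

    def __init__(self, rate_bps: float, depth_bits: float) -> None:
        self.rate_bps = rate_bps      # configured-pre-emption-rate
        self.depth_bits = depth_bits  # assumed bucket depth
        self.tokens = depth_bits
        self.last_time = 0.0

    def on_packet(self, now: float, size_bits: int, already_marked: bool) -> bool:
        """Return True if the packet leaves this router pre-emption marked."""
        if already_marked:
            # Packets marked upstream are not metered and stay marked.
            return True
        # Refill the bucket for the time elapsed since the last metered packet.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.depth_bits, self.tokens + elapsed * self.rate_bps)
        if self.tokens >= size_bits:
            self.tokens -= size_bits
            return False
        # Not enough tokens: mark the packet and leave the bucket untouched.
        return True
```

   Because marked packets consume no tokens, the unmarked traffic
   leaving the link stays close to the configured-pre-emption-rate,
   which is the property the paragraph above relies on.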

836	   Note that the explicit mechanism only makes sense if all the routers
837	   in the CL-region have the functionality, so that the egress gateways
838	   can rely on the explicit mechanism. Otherwise there is the danger
839	   that traffic concentrates on a router that lacks it, so egress
840	   gateways must also watch for implicit pre-emption alerts.

842	   When one or more packets in a CL-region-aggregate alert the egress
843	   gateway of the need for flow pre-emption, whether explicitly or
844	   implicitly, the egress puts that CL-region-aggregate into the Pre-
845	   emption Alert state. For each CL-region-aggregate in alert state it
846	   measures the rate of traffic at the egress gateway (i.e. the traffic
847	   rate of the appropriate CL-region-aggregate) and reports this to the
848	   relevant ingress gateway. The steps are:

850	   o Determine the relevant ingress gateway - for the explicit case the
851	      egress gateway examines the pre-emption marked packet and uses the
852	      state installed at the time of admission to determine which
853	      ingress gateway the packet came from. For the implicit case the
854	      egress gateway has already determined this information, because
855	      the Congestion-Level-Estimate is calculated per ingress gateway.

857	   o Measure the traffic rate of CL packets - as soon as the egress
858	      gateway is alerted (whether explicitly or implicitly) it measures
859	      the rate of CL traffic from this ingress gateway (i.e. for this
860	      CL-region-aggregate). Note that pre-emption marked packets are
861	      excluded from that measurement. It should make its measurement
862	      quickly and accurately, but exactly how is up to the
863	      implementation.

865	   o Alert the ingress gateway - the egress gateway then immediately
866	      alerts the relevant ingress gateway about the fact that flow pre-
867	      emption may be required. This Alert message also includes the
868	      measured Sustainable-Aggregate-Rate, i.e. the rate of CL-traffic
869	      received from this ingress gateway. The Alert message is sent
870	      using reliable delivery. Procedures for the support of such an
871	      Alert using RSVP are defined in [RSVP-PCN].
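
   The steps above can be sketched in Python as follows. The function
   names, the packet representation and the 0.99 value used for
   "(almost) 100%" are assumptions for this example; how the egress
   actually measures the rate is left to the implementation.

```python
from typing import Sequence, Tuple

Packet = Tuple[int, bool]  # (size in bits, pre-emption marked?)


def explicit_alert(packets: Sequence[Packet]) -> bool:
    """Explicit case: any pre-emption marked packet raises the alert."""
    return any(marked for _, marked in packets)


def implicit_alert(cle: float, near_full: float = 0.99) -> bool:
    """Implicit case: a Congestion-Level-Estimate of (almost) 100%."""
    return cle >= near_full


def sustainable_aggregate_rate(packets: Sequence[Packet],
                               window_seconds: float) -> float:
    """Rate in bits per second of CL packets that were not pre-emption marked."""
    unmarked_bits = sum(size for size, marked in packets if not marked)
    return unmarked_bits / window_seconds
```

   The value returned by sustainable_aggregate_rate() is what the
   egress would report to the ingress gateway in the Alert message.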

873	             --------------       _       _          -----------------
874	CL packet   |Update        |     / Is it a \   Y    | Measure CL rate |
875	arrives --->|Congestion-   |--->/pre-emption\-----> | from ingress and|
876	            |Level-Estimate|    \  marked   /       | alert ingress   |
877	             --------------      \ packet? /         -----------------
878	                                  \_     _/

880	Figure 2: Egress gateway action for explicit Pre-emption Alert
881	                                   _     _
882	             --------------       /       \          -----------------
883	CL packet   |Update        |     /  Is     \   Y    | Measure CL rate |
884	arrives --->|Congestion-   |--->/  C.L.E.   \-----> | from ingress and|
885	            |Level-Estimate|    \ (nearly)  /       | alert ingress   |
886	             --------------      \ 100%?   /         -----------------
887	                                  \_     _/

889	Figure 3: Egress gateway action for implicit Pre-emption Alert
890	3.2.2. Determining the right amount of CL traffic to drop

892	   The method relies on the insight that the amount of CL traffic that
893	   can be supported between a particular pair of ingress and egress
894	   gateways, is the amount of CL traffic that is actually getting across
895	   the CL-region to the egress gateway without being Pre-emption Marked.
896	   Hence we term it the Sustainable-Aggregate-Rate.

898	   So when the ingress gateway gets the Alert message from an egress
899	   gateway, it compares:

901	   o The traffic rate that it is sending to this particular egress
902	      gateway (which we term Ingress-Aggregate-Rate)

904	   o The traffic rate that the egress gateway reports (in the Alert
905	      message) that it is receiving from this ingress gateway (which is
906	      the Sustainable-Aggregate-Rate)

908	   If the difference is significant, then the ingress gateway pre-empts
909	   some microflows. It only pre-empts if:

911	        Ingress-Aggregate-Rate > Sustainable-Aggregate-Rate + error

913	   The "error" term is partly to allow for inaccuracies in the
914	   measurements of the rates. It is also needed because the Ingress-
915	   Aggregate-Rate is measured at a slightly later moment than the
916	   Sustainable-Aggregate-Rate, and it is quite possible that the
917	   Ingress-Aggregate-Rate has increased in the interim due to natural
918	   variation of the bit rate of the CL sources. So the "error" term
919	   allows for some variation in the ingress rate without triggering pre-
920	   emption.

922	   The ingress gateway should pre-empt enough microflows to ensure that:

924	        New Ingress-Aggregate-Rate < Sustainable-Aggregate-Rate - error

926	   The "error" term here is used for similar reasons but in the other
927	   direction, to ensure slightly more load is shed than seems necessary,
928	   in case the two measurements were taken during a short-term fall in
929	   load.
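
   The two inequalities can be combined into a short calculation. The
   following Python sketch is illustrative only; the function names are
   not defined in this document, and all rates are assumed to be in the
   same unit.

```python
def needs_preemption(ingress_rate: float, sustainable_rate: float,
                     error: float) -> bool:
    """True if Ingress-Aggregate-Rate > Sustainable-Aggregate-Rate + error."""
    return ingress_rate > sustainable_rate + error


def rate_to_shed(ingress_rate: float, sustainable_rate: float,
                 error: float) -> float:
    """Amount of CL traffic that the pre-empted microflows must exceed.

    After pre-emption the new Ingress-Aggregate-Rate has to be below
    Sustainable-Aggregate-Rate - error, so the load shed must be more
    than the value returned here. Returns 0.0 if no pre-emption is needed.
    """
    if not needs_preemption(ingress_rate, sustainable_rate, error):
        return 0.0
    return ingress_rate - (sustainable_rate - error)
```

   For example, with an Ingress-Aggregate-Rate of 100, a Sustainable-
   Aggregate-Rate of 90 and an error of 5 (same unit), pre-emption is
   triggered and more than 15 units of traffic have to be shed.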

931	   When the routers in the CL-region are using explicit pre-emption
932	   alerting, the ingress gateway would normally pre-empt microflows
933	   whenever it gets an alert (it always would if it were possible to set
934	   "error" equal to zero). For the implicit case however this is not so.
935	   It receives an Alert message when the Congestion-Level-Estimate
936	   reaches (almost) 100%, which is roughly when traffic exceeds the
937	   configured-admission-rate. However, it is only when packets are
938	   indeed dropped en route that the Sustainable-Aggregate-Rate becomes
939	   less than the Ingress-Aggregate-Rate so only then will pre-emption
940	   actually occur on the ingress gateway.

942	   Hence with the implicit scheme, pre-emption can only be triggered
943	   once the system starts dropping packets and thus the QoS of flows
944	   starts being significantly degraded. This is in contrast with the
945	   explicit scheme which allows flow pre-emption to be triggered before
946	   any packet drop, simply when the traffic reaches the configured-pre-
947	   emption-rate. Therefore we believe that the explicit mechanism is
948	   superior. However it does require new functionality on all the
949	   routers (although this is little more than a bulk token bucket - see
950	   [PCN] for details).

952	3.2.3. Use case for flow pre-emption

954	   To see how the pieces of the solution fit together in a use case, we
955	   imagine a scenario where many microflows have already been admitted.
956	   We confine our description to the explicit pre-emption mechanism. Now
957	   an interior router in the CL-region fails. The network layer routing
958	   protocol re-routes round the problem, but as a consequence traffic on
959	   other links increases. In fact let's assume the traffic on one link
960	   now exceeds its configured-pre-emption-rate and so the router pre-
961	   emption marks CL packets. When the egress sees the first one of the
962	   pre-emption marked packets it immediately determines which microflow
963	   this packet is part of (by using a five-tuple filter and comparing it
964	   with state installed at admission) and hence which ingress gateway
965	   the packet came from. It sets up a meter to measure the traffic rate
966	   from this ingress gateway, and as soon as possible sends a message to
967	   the ingress gateway. This message alerts the ingress gateway that
968	   pre-emption may be needed and contains the traffic rate measured by
969	   the egress gateway. Then the ingress gateway determines the traffic
970	   rate that it is sending towards this egress gateway and hence it can
971	   calculate the amount of traffic that needs to be pre-empted.

973	   The solution operates within a little over one round trip time - the
974	   time required for microflow packets that have experienced Pre-emption
975	   Marking to travel downstream through the CL-region and arrive at the
976	   egress gateway, plus some additional time for the egress gateway to
977	   measure the rate seen after it has been alerted that pre-emption may
978	   be needed, and the time for the egress gateway to report this
979	   information to the ingress gateway.

981	   The ingress gateway could now just shed random microflows, but it is
982	   better if the least important ones are dropped. The ingress gateway
983	   could use information stored locally in each reservation's state
984	   (for example, the RSVP pre-emption priority of [RSVP-
985	   PREEMPTION] or the RSVP admission priority of [RSVP-EMERGENCY]) as
986	   well as information provided by a policy decision point in order to
987	   decide which of the flows to shed (or perhaps which ones not to
988	   shed). This way, flow pre-emption can also help emergency/military
989	   calls by taking into account the corresponding priorities (as
990	   conveyed in RSVP policy elements) when selecting calls to be pre-
991	   empted, which is likely to be particularly important in a disaster
992	   scenario. Use of RSVP for support of emergency/military applications
993	   is discussed in further detail in [RFC4542] and [RSVP-EMERGENCY].
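
   One simple selection policy is sketched below in Python. It is only
   an example: the Flow record, the numeric priority (larger meaning
   more important) and the use of each reservation's rate are
   assumptions, and an operator's policy may be quite different.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    flow_id: str
    rate_bps: float   # assumed to come from the reservation's traffic profile
    priority: int     # assumed convention: larger means more important


def select_flows_to_shed(flows: List[Flow], excess_bps: float) -> List[Flow]:
    """Pick the least important flows until more than excess_bps is shed."""
    if excess_bps <= 0:
        return []
    chosen: List[Flow] = []
    shed = 0.0
    # sorted() is stable, so flows of equal priority keep their input order.
    for flow in sorted(flows, key=lambda f: f.priority):
        if shed > excess_bps:
            break
        chosen.append(flow)
        shed += flow.rate_bps
    return chosen
```

   If the flows cannot cover the excess, every flow is returned; a
   real implementation would also need to decide how to treat flows
   that policy says must never be shed.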

995	   The ingress gateway then initiates RSVP signalling to instruct the
996	   relevant destinations that their reservation has been terminated, and
997	   to tell (RSVP) nodes along the path to tear down associated RSVP
998	   state. To guard against recalcitrant sources, normal IntServ policing
999	   may be used to block any future traffic from the dropped flows from
1000	   entering the CL-region. Note that - with the explicit Pre-emption
1001	   Alert mechanism - since the configured-pre-emption-rate may be
1002	   significantly less than the physical line capacity, flow pre-emption
1003	   may be triggered before any congestion has actually occurred and
1004	   before any packet is dropped.

1006	   We extend the scenario further by imagining that (due to a disaster
1007	   of some kind) further routers in the CL-region fail during the time
1008	   taken by the pre-emption process described above. This is handled
1009	   naturally, as packets will continue to be pre-emption marked and so
1010	   the pre-emption process will happen for a second time.

1012	3.3. Both admission control and pre-emption

1014	   This document describes both the admission control and pre-emption
1015	   mechanisms, and we suggest that an operator uses both. However, we do
1016	   not require this and some operators may want to implement only one.

1018	   For example, an operator could use just admission control, solving
1019	   heavy congestion (caused by re-routing) by 'just waiting' - as
1020	   sessions end, existing microflows naturally depart from the system
1021	   over time, and the admission control mechanism will prevent admission
1022	   of new microflows that use the affected links. So the CL-region will
1023	   naturally return to normal controlled load service, but with reduced
1024	   capacity. The drawback of this approach would be that until flows
1025	   naturally depart to relieve the congestion, all flows and lower
1026	   priority services will be adversely affected. As another example, an
1027	   operator could use just admission control, avoiding heavy congestion
1028	   (caused by re-routing) by 'capacity planning' - by configuring
1029	   admission control thresholds to lower levels than the network could
1030	   accept in normal situations such that the load after failure is
1031	   expected to stay below acceptable levels even with reduced network
1032	   resources.

1034	   On the other hand, an operator could rely solely on the traffic
1035	   conditioning agreements of the DiffServ architecture [RFC2475] for
1036	   admission control. The pre-emption mechanism described in this
1037	   document would be used to counteract the problem described at the
1038	   end of Section 1.1.1.

1040	4. Summary of Functionality

1042	   This section is intended to provide a systematic summary of the new
1043	   functionality required by the routers in the CL-region.

1045	   A network operator upgrades normal IP routers by:

1047	   o Adding functionality related to admission control and flow pre-
1048	      emption to all its ingress and egress gateways

1050	   o Adding Pre-Congestion Notification for Admission Marking and Pre-
1051	      emption Marking to all the routers in the CL-region.

1053	   We consider the detailed actions required for each of the types of
1054	   router in turn.

1056	4.1. Ingress gateways

1058	   Ingress gateways perform the following tasks:

1060	   o Classify incoming packets - decide whether they are CL or non-CL
1061	      packets. This is done using an IntServ filter spec (source and
1062	      destination addresses and port numbers), whose details have been
1063	      gathered from the RSVP messaging.

1065	   o Police - check that the microflow conforms with what has been
1066	      agreed (i.e. it keeps to its agreed data rate). If necessary,
1067	      packets which do not correspond to any reservations, packets which
1068	      are in excess of the rate agreed for their reservation, and
1069	      packets for a reservation that has earlier been pre-empted may be
1070	      policed. Policing may be achieved via dropping or via re-marking
1071	      of the packet's DSCP to a value different from the CL behaviour
1072	      aggregate.

1074	   o ECN colouring packets - for CL microflows, set the ECN field of
1075	      packets appropriately (see [PCN] for some discussion of encoding).

1077	   o Perform 'interior router' functions (see next sub-section).

1079	   o Admission Control - on new session establishment, consider the
1080	      Congestion-Level-Estimate received from the corresponding egress
1081	      gateway and decide, most likely by comparing it with a simple
1082	      configured CLE-threshold, whether the new call is to be admitted
1083	      or rejected (taking into account local policy information as well
1084	      as, optionally, information provided by a policy decision point).

1086	   o Probe - if requested by the egress gateway to do so, the ingress
1087	      gateway generates probe traffic so that the egress gateway can
1088	      compute the Congestion-Level-Estimate from this ingress gateway.
1089	      Probe packets may be simple data addressed to the egress gateway
1090	      and require no protocol standardisation, although there will be
1091	      best practice for their number, size and rate.

1093	   o Measure - when it receives a Pre-emption Alert message from an
1094	      egress gateway, it determines the rate at which it is sending
1095	      packets to that egress gateway

1097	   o Pre-empt - calculate how much CL traffic needs to be pre-empted;
1098	      decide which microflows should be dropped, perhaps in consultation
1099	      with a Policy Decision Point; and do the necessary signalling to
1100	      drop them.
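
   The Admission Control task in the list above can be sketched as a
   simple threshold comparison. The function name and the boolean
   policy input are assumptions made for illustration; the document
   leaves the exact decision rule and the policy interface open.

```python
def admit_new_session(cle, cle_threshold, local_policy_allows=True):
    """Ingress gateway decision for a new session request.

    cle: Congestion-Level-Estimate received from the egress gateway (0.0 to 1.0).
    cle_threshold: configured CLE-threshold.
    local_policy_allows: stand-in for local policy / policy decision point input.
    """
    if not local_policy_allows:
        return False
    # Refuse the session when the estimate is greater than the threshold.
    return cle <= cle_threshold
```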

1102	4.2. Interior routers

1104	   Interior routers do the following tasks:

1106	   o Classify packets - examine each packet's DSCP and ECN field to see
1107	      whether it is a CL packet

1109	   o Non-CL packets are handled as usual, with respect to dropping them
1110	      or setting their CE codepoint.

1112	   o Pre-Congestion Notification - CL packets are Admission Marked and
1113	      Pre-emption Marked according to the algorithm detailed in [PCN]
1114	      and outlined in Section 3.

1116	4.3. Egress gateways

1118	   Egress gateways do the following tasks:

1120	   o Classify packets - determine which ingress gateway a CL packet has
1121	      come from. This is the previous RSVP hop, hence the necessary
1122	      details are obtained just as with IntServ from the state
1123	      associated with the packet five-tuple, which has been built using
1124	      information from the RSVP messages.

1126	   o Meter - for CL packets, calculate the fraction of the total number
1127	      of bits which are in Admission Marked packets or in Pre-emption
1128	      Marked packets. The calculation is done as an exponentially
1129	      weighted moving average (see Appendix C). A separate calculation
1130	      is made for CL packets from each ingress gateway. The meter works
1131	      on an aggregate basis and not per microflow.

1133	   o Signal the Congestion-Level-Estimate - this is piggy-backed on the
1134	      reservation reply. An egress gateway's interface is configured to
1135	      know it is an egress gateway, so it always appends this to the
1136	      RESV message. If the Congestion-Level-Estimate is unknown or is
1137	      too stale, then the egress gateway can request the ingress gateway
1138	      to send probes.

1140	   o Packet colouring - for CL packets, set the DSCP and the ECN field
1141	      to whatever has been agreed as appropriate for the next domain. By
1142	      default the ECN field is set to the Not-ECT codepoint. See also
1143	      the discussion in the Tunnelling section later.

1145	   o Measure the rate - measure the rate of CL traffic from a
1146	      particular ingress gateway, excluding packets that are Pre-emption
1147	      Marked (i.e. the Sustainable-Aggregate-Rate for the CL-region-
1148	      aggregate), when alerted (either explicitly or implicitly) that
1149	      pre-emption may be required. The measured rate is reported back to
1150	      the appropriate ingress gateway [RSVP-PCN].
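
   The Meter task in the list above can be illustrated with the
   following sketch. It is one possible form of a bit-weighted
   exponentially weighted moving average, not necessarily the algorithm
   of Appendix C; the class name, the smoothing weight and the use of
   two smoothed counters are assumptions. An egress gateway would keep
   one such meter per ingress gateway.

```python
class CleMeter:
    """Per-ingress estimate of the fraction of bits in marked CL packets."""

    def __init__(self, weight=0.1):
        self.weight = weight      # assumed smoothing weight, 0 < weight <= 1
        self.marked_bits = 0.0    # smoothed count of bits in marked packets
        self.total_bits = 0.0     # smoothed count of all CL bits

    def on_packet(self, size_bits, marked):
        """Update both moving averages for one arriving CL packet."""
        w = self.weight
        self.marked_bits = (1 - w) * self.marked_bits + w * (size_bits if marked else 0)
        self.total_bits = (1 - w) * self.total_bits + w * size_bits

    def estimate(self):
        """Return the Congestion-Level-Estimate, or None if it is unknown."""
        if self.total_bits == 0:
            return None
        return self.marked_bits / self.total_bits
```

   Returning None before any packet has been seen corresponds to the
   'unknown' case in which the egress gateway may request probes.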

1152	4.4. Failures

1154	   If an interior router fails, then the regular IP routing protocol
1155	   will re-route round it. If the new route can carry all the admitted
1156	   traffic, flows will gracefully continue. If instead this causes early
1157	   warning of pre-congestion on the new route, then admission control
1158	   based on pre-congestion notification will ensure new flows will not
1159	   be admitted until enough existing flows have departed. Finally, re-
1160	   routing may result in heavy congestion, in which case the flow
1161	   pre-emption mechanism will kick in.

1163	   If a gateway fails then we would like regular RSVP procedures
1164	   [RFC2205] to take care of things. With the local repair mechanism of
1165	   [RFC2205], when a route changes the next RSVP PATH refresh message
1166	   will establish path state along the new route, and thus attempt to
1167	   re-establish reservations through the new ingress gateway.
1168	   Essentially the same procedure is used as described earlier in this
1169	   document, with the re-routed session treated as a new session
1170	   request.

1172	   In more detail, consider what happens if an ingress gateway of the
1173	   CL-region fails. Then RSVP routers upstream of it do IP re-routing to
1174	   a new ingress gateway. The next time the upstream RSVP router sends a
1175	   PATH refresh message it reaches the new ingress gateway which
1176	   therefore installs the associated RSVP state. The next RSVP RESV
1177	   refresh will pick up the Congestion-Level-Estimate from the egress
1178	   gateway, and the ingress compares this with its threshold to decide
1179	   whether to admit the new session. This could result in some of the
1180	   flows being rejected, but those accepted will receive the full QoS.

1182	   An issue with this is that we have to wait until PATH and RESV
1183	   refresh messages are sent, which may not be very often (the default
1184	   value is 30 seconds). [RFC2205] discusses how to speed up the local
1185	   repair mechanism. First, the RSVP module is notified by the local
1186	   routing protocol module of a route change to particular destinations,
1187	   which triggers it to rapidly send out PATH refresh messages. Further,
1188	   when a PATH refresh arrives with a previous hop address different
1189	   from the one stored, then RESV refreshes are immediately sent to that
1190	   previous hop. Where RSVP is operating hop-by-hop, i.e. on every
1191	   router, then triggering the PATH refresh is easy as the router can
1192	   simply monitor its local link. Thus, this fast local repair mechanism
1193	   can be used to deal with failures upstream of the ingress gateway,
1194	   with failures of the ingress gateway and with failures downstream of
1195	   the egress gateway.

1197	   But where RSVP is not operating hop-by-hop (as is the case within the
1198	   CL-region), it is not so easy to trigger the PATH refresh.

1200	   Unfortunately, this problem applies if an egress gateway fails, since
1201	   it's very likely that an egress gateway is several IP hops from the
1202	   ingress gateway. (If the ingress is several IP hops from its previous
1203	   RSVP node, then there is the same issue.) The options appear to be:

1205	   o the ingress gateway has a link state database for the CL-region,
1206	      so it can detect that an egress gateway has failed or become
1207	      unreachable

1209	   o there is an inter-gateway protocol, so the ingress can
1210	      continuously check that the egress gateways are still alive

1212	   o (default) do nothing and wait for the regular PATH/RESV refreshes
1213	      (and, if needed, the pre-emption mechanism) to sort things out.

1215	5. Limitations and some potential solutions

1217	   In this section we describe various limitations of the deployment
1218	   model, and some suggestions about potential ways of alleviating them.
1219	   The limitations fall into three broad categories:

1221	   o ECMP (Section 5.1): the assumption about routing (Section 2.2) is
1222	      that all packets between a pair of ingress and egress gateways
1223	      follow the same path; ECMP breaks this assumption. A study
1224	      regarding the accuracy of load balancing schemes can be found in
1225	      [LoadBalancing-a] and [LoadBalancing-b].

1227	   o The lack of global coordination (Sections 5.2, 5.3 and 5.4): a
1228	      decision about admission control or flow pre-emption is made for
1229	      one aggregate independently of other aggregates

1231	   o Timing and accuracy of measurements (Sections 5.5 and 5.6): the
1232	      assumption (Section 2.2) that additional load, offered within the
1233	      reaction time of the measurement-based admission control
1234	      mechanism, doesn't move the system directly from no congestion to
1235	      overload (dropping packets). A 'flash crowd' may break this
1236	      assumption (Section 5.5). There are a variety of more general
1237	      issues associated with marking measurements, which may mean it's a
1238	      good idea to do pre-emption 'slower' (Section 5.6).

1240	   Each section describes a limitation and some possible solutions to
1241	   alleviate the limitation. These are intended as options for an
1242	   operator to consider, based on their particular requirements.

1244	   We would welcome feedback, for example suggestions as to which
1245	   potential solutions are worth working out in more detail, and ideas
1246	   on new potential solutions.

1248	   Finally Section 5.7 considers some other potential extensions.

1250	5.1. ECMP

1252	   If the CL-region uses Equal Cost Multipath Routing (ECMP), then
1253	   traffic between a particular pair of ingress and egress gateways may
1254	   follow several different paths.

1256	   Why? An ECMP-enabled router runs an algorithm to choose between
1257	   potential outgoing links, based on a hash of fields such as the
1258	   packet's source and destination addresses - exactly what depends on
1259	   the proprietary algorithm. Packets are addressed to the CL flow's
1260	   end-point, and therefore different flows may follow different paths
1261	   through the CL-region. (All packets of an individual flow follow the
1262	   same ECMP path.)

1264	   The problem is that if one of the paths is congested such that
1265	   packets are being admission marked, then the Congestion-Level-
1266	   Estimate measured by the egress gateway will be diluted by unmarked
1267	   packets from other non-congested paths. Similarly, the measurement of
1268	   the Sustainable-Aggregate-Rate will also be diluted.
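
   The dilution can be made concrete with a small calculation. The
   sketch below assumes, purely for illustration, that the egress
   gateway sees the aggregate as a set of ECMP paths, each with a known
   rate and marked fraction; the function name is hypothetical.

```python
def aggregate_cle(paths):
    """Return the marked fraction seen by the egress gateway over all paths.

    paths: list of (rate, marked_fraction) pairs, one per ECMP path used by
    the ingress-egress aggregate.
    """
    total = sum(rate for rate, _ in paths)
    marked = sum(rate * frac for rate, frac in paths)
    return marked / total if total else 0.0
```

   For instance, with three equally loaded paths of which one has 30%
   of its traffic marked, the measured Congestion-Level-Estimate is
   only 10%, which could be below a threshold that the congested path
   alone would exceed.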

1270	   Possible solution approaches are:

1272	   o tunnel: traffic is tunnelled across the CL-region. Then the
1273	      destination address (and so on) seen by the ECMP algorithm is that
1274	      of the egress gateway, so all flows follow the same path.
1275	      Effectively ECMP is turned off. As a compromise, to try to retain
1276	      some of the benefits of ECMP, there could be several tunnels, each
1277	      following a different ECMP path, with flows randomly assigned to
1278	      different tunnels.

1280	   o assume worst case: the operator sets the configured-admission-rate
1281	      (and configured-pre-emption-rate) to below the optimum level to
1282	      compensate for the fact that the effect on the Congestion-Level-
1283	      Estimate (and Sustainable-Aggregate-Rate) of the congestion
1284	      experienced over one of the paths may be diluted by traffic
1285	      received over non-congested paths. Hence lower thresholds need to
1286	      be used to ensure early admission control rejection and pre-
1287	      emption over the congested path. This approach will waste capacity
1288	      (e.g. flows following a non-congested ECMP path are not admitted
1289	      or are pre-empted), and there is still the danger that for some
1290	      traffic mixes the operator hasn't been cautious enough.

1292	   o for admission control, probe to obtain a flow-specific congestion-
1293	      level-estimate. Earlier this document suggests continuously
1294	      monitoring the congestion-level-estimate. Instead, probe packets
1295	      could be sent for each prospective new flow. The probe packets
1296	      have the same IP address etc as the data packets would have, and
1297	      hence follow the same ECMP path. However, probing is an extra
1298	      overhead, depending on how many probe packets need to be sent to
1299	      get a sufficiently accurate congestion-level-estimate. Probes also
1300	      cause a processing overhead, either for the machine at the
1301	      destination address or for the egress gateway to identify and
1302	      remove the probe packets.

1304	   o for flow pre-emption, only select flows for pre-emption from
1305	      amongst those that have actually received a Pre-emption Marked
1306	      packet. Because these flows must have followed an ECMP path that
1307	      goes through an overloaded router. However, it needs some extra
1308	      work by the egress gateway, to record this information and report
1309	      it to the ingress gateway.

1311	   o for flow pre-emption, a variant of this idea involves introducing
1312	      a new marking behaviour, 'Router Marking'. A router that is pre-
1313	      emption marking packets on an outgoing link, also 'Router Marks'
1314	      all other packets. When selecting flows for pre-emption, the
1315	      selection is made from amongst those that have actually received a
1316	      Router Marked or Pre-emption Marked packet. Hence compared with
1317	      the previous bullet, it may extend the range of flows from which
1318	      the pre-emption selection is made (i.e. it includes those which,
1319	      by chance, haven't had any pre-emption marked packets). However,
1320	      it also requires that the 'Router Marking' state is somehow
1321	      encoded into a packet, i.e. it makes the encoding challenge
1322	      discussed in Appendix C of [PCN] harder. The extra work required
1323	      by the egress gateway would also be somewhat higher than for the
1324	      previous bullet.

1326	5.2. Beat down effect

1328	   This limitation concerns the pre-emption mechanism in the case where
1329	   more than one router is pre-emption marking packets. The result
1330	   (explained in the next paragraph) is that the measurement of
1331	   sustainable-aggregate-rate is lower than its true value, so more
1332	   traffic is pre-empted than necessary.

1334	   Imagine the scenario:

1336	                 +-------+     +-------+     +-------+
1337	   IAR-b=3 >@@@@@| CPR=2 |@@@@@| CPR>2 |@@@@@| CPR=1 |@@> SAR-b=1
1338	   IAR-a=1 >#####|  R1   |#####|  R2   |     |  R3   |
1339	                 +-------+     +-------+     +-------+
1340	                                  #
1341	                                  #
1342	                                  #
1343	                                  v SAR-a=0.5

1345	   Figure 4: Scenario to illustrate 'beat down effect' limitation

1347	   Aggregate-a (ingress-aggregate-rate, IAR, 1 unit) takes a 'short'
1348	   route through two routers, one of which (R1) is above its configured-
1349	   pre-emption-rate (CPR, 2 units). Aggregate-b takes a 'long' route,
1350	   going through a second congested router (R3, with a CPR of 1 unit).

1352	   R1's input traffic is 4 units, twice its configured-pre-emption-rate,
1353	   so 50% of packets are pre-emption marked. Hence the measured
1354	   sustainable-aggregate-rate (SAR) for aggregate-a is 0.5, and half of
1355	   its traffic will be pre-empted.

1357	   R3's input of non-pre-emption-marked traffic is 1.5 units, and
1358	   therefore it has to do further marking.

1360	   But this means that aggregate-a has taken a bigger hit than it needed
1361	   to; the router R1 could have let through all of aggregate-a's traffic
1362	   unmarked if it had known that the second congested router R3 was
1363	   going to "beat down" aggregate-b's traffic further.
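
   The numbers in Figure 4 can be reproduced with the following sketch.
   It assumes a simplified marking model in which a router marks excess
   traffic in proportion to each aggregate's unmarked rate; the
   function name and the dictionary representation are illustrative
   only.

```python
def pass_router(unmarked, cpr):
    """Return per-aggregate unmarked rates after one router.

    unmarked: dict mapping aggregate name to its rate not yet pre-emption
    marked on input.
    cpr: the router's configured-pre-emption-rate.
    Assumes excess traffic is marked in proportion to each aggregate's rate.
    """
    load = sum(unmarked.values())
    keep = min(1.0, cpr / load) if load > 0 else 1.0
    return {agg: rate * keep for agg, rate in unmarked.items()}

# R1 (CPR=2) sees aggregate-a (1 unit) and aggregate-b (3 units).
after_r1 = pass_router({"a": 1.0, "b": 3.0}, cpr=2.0)
sar_a = after_r1["a"]            # aggregate-a leaves after R2, which does not mark
# R3 (CPR=1) sees only aggregate-b's remaining unmarked traffic.
after_r3 = pass_router({"b": after_r1["b"]}, cpr=1.0)
sar_b = after_r3["b"]
```

   Under this model the sustainable rates come out as 0.5 units for
   aggregate-a and 1 unit for aggregate-b, so after pre-emption R1
   would carry only 1.5 units although its configured-pre-emption-rate
   is 2 units.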

1365	   Generalising, the result is that in a scenario where more than one
1366	   router is pre-emption marking packets, only the final router is sure
1367	   to be fully loaded after flow pre-emption. The fundamental reason is
1368	   that a router makes a local decision about which packets to pre-
1369	   emption mark, i.e. independently of how other routers are pre-emption
1370	   marking. A very similar effect has been noted in XCP [Low].

1372	   Potential solutions:

1374	   o a full solution would involve routers learning about other routers
1375	      that are pre-emption marking, and being able to differentially
1376	      mark flows (e.g. in the example above, aggregate-a's packets
1377	      wouldn't be marked by R1). This seems hard and complex.

1379	   o do nothing about this limitation. It causes over-pre-emption,
1380	      which is safe. At the moment this is our suggested option.

1382	   o do pre-emption 'slowly'. The description earlier in this document
1383	      assumes that after the measurements of ingress-aggregate-rate and
1384	      sustainable-aggregate-rate, then sufficient flows are pre-empted
1385	      in 'one shot' to eliminate the excess traffic. An alternative is
1386	      to spread pre-emption over several rounds: initially, only pre-
1387	      empt enough to eliminate some of the excess traffic, then re-
1388	      measure the sustainable-aggregate-rate, and then pre-empt some
1389	      more, etc. In the scenario above, the re-measurement would be
1390	      lower than expected, due to the beat down effect, and hence in the
1391	      second round of pre-emption less of aggregate-a's traffic would be
1392	      pre-empted (perhaps none). Overall, therefore the impact of the
1393	      'beat down' effect would be lessened, i.e. there would be a
1394	      smaller degree of over pre-emption. The downside is that the
1395	      overall pre-emption is slower, and therefore routers will be
1396	      congested longer.

1398	5.3. Bi-directional sessions

1400	   Earlier, this document describes how to decide whether or not to admit
1401	   (or pre-empt) a particular flow. However, from a user/application
1402	   perspective, the session is the relevant unit of granularity. A
1403	   session can consist of several flows which may not all be part of the
1404	   same aggregate. The most obvious example is a bi-directional session,
1405	   where the two flows should ideally be admitted or pre-empted as a
1406	   pair - for instance a voice call only makes sense if A can send to B
1407	   as well as B to A! But the admission and pre-emption mechanisms
1408	   described earlier in this document operate on a per-aggregate basis,
1409	   independently of what's happening with other aggregates. For
1410	   admission control the problem isn't serious: e.g. the SIP server for
1411	   the voice call can easily detect that the A-to-B flow has been
1412	   admitted but the B-to-A flow blocked, and inform the user perhaps via
1413	   a busy tone. For flow pre-emption, the problem is similar but more
1414	   serious. If both the aggregate-1-to-2 (i.e. from gateway 1 to gateway
1415	   2) and the aggregate-2-to-1 have to pre-empt flows, then it would be
1416	   good if either all of the flows of a particular session were pre-
1417	   empted or none of them. Therefore if the two aggregates pre-empt
1418	   flows independently of each other, more sessions will end up being
1419	   torn down than is really necessary. For instance, pre-empting one
1420	   direction of a voice call will result in the SIP server tearing down
1421	   the other direction anyway.

1423	   Potential solutions:

1425	   o if it's known that all sessions are bi-directional, simply pre-
1426	      empt roughly half as many flows as suggested by the
1427	      measurements of {ingress-aggregate-rate - sustainable-aggregate-
1428	      rate}. But this makes a big assumption about the nature of
1429	      sessions, and also that the aggregate-1-to-2 and aggregate-2-to-1
1430	      are equally overloaded.

1432	   o ignore the limitation. The penalty will be quite small if most
1433	      sessions consist of one flow or of flows that are part of the same
1434	      aggregate.

1436	   o introduce a gateway controller. It would receive reports for all
1437	      aggregates where the ingress-aggregate-rate exceeds the
1438	      sustainable-aggregate-rate. It then would make a global decision
1439	      about which flows to pre-empt. However, this adds considerable
1440	      complexity, for example the controller needs to understand which
1441	      flows map to which sessions. This may be an option in some
1442	      scenarios, for example where gateways aren't handling too many
1443	      flows (but note that this breaks the aggregation assumption of
1444	      Section 2.2). A variant of this idea would be to introduce a
1445	      gateway controller per pair of gateways, in order to handle bi-
1446	      directional sessions but not try to deal with more complex
1447	      sessions that include flows from an arbitrary number of
1448	      aggregates.

1450	   o do pre-emption 'slowly'. As in the third "beat down" solution, this
1451	      would reduce the impact of this limitation. The downside is that
1452	      the overall pre-emption is slower, and therefore router(s) will be
1453	      congested longer.

1455	   o each ingress gateway 'loosely coordinates' with other gateways its
1456	      decision about which specific flows to pre-empt. Each gateway
1457	      numbers flows in the order they arrive (note that this number has
1458	      no meaning outside the gateway), and when pre-empting flows, the
1459	      most recent (or most recent low priority flow) is selected for
1460	      pre-emption; the gateway then works backwards selecting as many
1461	      flows as needed. Gateways will therefore tend to pre-empt flows
1462	      that are part of the same session (as they were admitted at the
1463	      same time). Of course this isn't guaranteed for several reasons,
1464	      for instance gateway A's most recent bi-directional sessions may
1465	      be with gateway C, whereas gateway B's are with gateway A (so
1466	      gateway A will pre-empt A-to-C flows and gateway B will pre-empt
1467	      B-to-A flows). Rather than pre-empting the most recent (low
1468	      priority) flow, an alternative algorithm (for further study) may
1469	      be to select flows based on a hash of particular fields in the
1470	      packet, such that both gateways produce the same hash for flows of
1471	      the same bi-directional session. We believe that this approach
1472	      should be investigated further.
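
   The hash-based alternative in the last bullet above could look
   roughly like the sketch below. It is an assumption-laden
   illustration: it supposes that the two directions of a session use
   the same pair of (address, port) endpoints, and the function names
   and the choice of SHA-256 are ours, not part of any specification.

```python
import hashlib

def session_hash(addr_a, port_a, addr_b, port_b):
    """Direction-independent hash of a bi-directional session's endpoints."""
    # Sort the two endpoints so that A-to-B and B-to-A give the same input.
    ends = sorted([(addr_a, port_a), (addr_b, port_b)])
    key = "|".join(f"{addr}:{port}" for addr, port in ends)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def preemption_order(flows):
    """Order flows, given as (src addr, src port, dst addr, dst port), by hash."""
    return sorted(flows, key=lambda f: session_hash(*f))
```

   Two gateways that each walk their own flows in this order would tend
   to reach both directions of the same session at a similar position,
   although, as noted above, this is not guaranteed.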

1474	5.4. Global fairness

1476	   The limitation here is that 'high priority' traffic may be pre-empted
1477	   (or not admitted) when a global decision would instead pre-empt (or
1478	   not admit) 'lower priority' traffic on a different aggregate.

1480	   Imagine the following scenario (extreme to illustrate the point
1481	   clearly). Aggregate_a is all Assured Services (MLPP) traffic, whilst
1482	   aggregate_b is all ordinary traffic (i.e. comparatively low
1483	   priority). Together the two aggregates cause a router to be at twice
1484	   its configured-pre-emption-rate. Ideally we'd like all of aggregate_b
1485	   to be pre-empted, as then all of aggregate_a could be carried.
1486	   However, the approach described earlier in this document leads to
1487	   half of each aggregate being pre-empted.

1489	                       IAR_b=1
1490	                         v
1491	                         v
1492	                     +-------+
1493	   IAR_a=1 ---->-----| CPR=1 |-----> SAR_a=0.5
1494	                     |       |
1495	                     +-------+
1496	                         v
1497	                         v
1498	                       SAR_b=0.5

1500	   Figure 5: Scenario to illustrate 'global fairness' limitation

1502	   Similarly, for admission control - Section 4.1 describes how if the
1503	   Congestion-Level-Estimate is greater than the CLE-threshold all new
1504	   sessions are refused. But it is unsatisfactory to block emergency
1505	   calls, for instance.

1507	   Potential solutions:

1509	   o in the admission control case, it is recommended that an
1510	      'emergency / Assured Services' call is admitted immediately even
1511	      if the CLE-threshold is exceeded. Usually the network can actually
1512	      handle the additional microflow, because there is a safety margin
1513	      between the configured-admission-rate and the configured-pre-
1514	      emption-rate. Normal call termination behaviour will soon bring
1515	      the traffic level down below the configured-admission-rate.
1516	      However, in exceptional circumstances the 'emergency / higher
1517	      precedence' call may cause the traffic level to exceed the
1518	      configured-pre-emption-rate; then the usual pre-emption mechanism
1519	      will pre-empt enough (non 'emergency / higher precedence')
1520	      microflows to bring the total traffic back under the configured-
1521	      pre-emption-rate.

1523	   o all egress gateways report to a global coordinator that makes
1524	      decisions about what flows to pre-empt. However this solution adds
1525	      complexity and probably isn't scalable, but it may be an option in
1526	      some scenarios, for example where gateways aren't handling too
1527	      many flows (but note that this breaks the aggregation assumption
1528	      of Section 2.2).

1530	   o introduce a heuristic rule: before pre-empting a 'high priority'
1531	      flow the egress gateway should wait to see if sufficient (lower
1532	      priority) traffic is pre-empted on other aggregates. This is a
1533	      reasonable option.

1535	   o enhance the functionality of all the interior routers, so they can
1536	      detect the priority of a packet, and then differentially mark
1537	      them. As well as adding complexity, in general this would be an
1538	      unacceptable security risk for MLPP traffic, since only controlled
1539	      nodes (like gateways) should know which packets are high priority,
1540	      as this information can be abused by an attacker.

1542	   o do nothing, i.e. accept the limitation. Whilst it's unlikely that
1543	      high priority calls will be quite so unbalanced as in the scenario
1544	      above, just accepting this limitation may be risky. The sorts of
1545	      situations that cause routers to start pre-emption marking are
1546	      also likely to cause a surge of emergency / MLPP calls.

1548	5.5. Flash crowds

1550	   This limitation concerns admission control and arises because there
1551	   is a time lag between the admission control decision (which depends
1552	   on the Congestion-Level-Estimate during RSVP signalling during call
1553	   set-up) and when the data is actually sent (after the called party
1554	   has answered). In PSTN terms this is the time the phone rings.
1555	   Normally the time lag doesn't matter much because (1) in the CL-
1556	   region there are many flows and they terminate and are answered at
1557	   roughly the same rate, and (2) the network can still operate safely
1558	   when the traffic level is some margin above the configured-admission-
1559	   rate.

1561	   A 'flash crowd' occurs when something causes many calls to be
1562	   initiated in a short period of time - for instance a 'tele-vote'. So
1563	   there is a danger that a 'flash' of calls is accepted, but when the
1564	   calls are answered and data flows the traffic overloads the network.
1565	   Therefore potentially the 'additional load' assumption of Section 2.2
1566	   doesn't hold.

1568	   Potential solutions:

1570	   o The simplest option is to do nothing; an operator relies on the
1571	      pre-emption mechanism if there is a problem. This doesn't seem a
1572	      good choice, as 'flash crowds' are reasonably common on the PSTN,
1573	      unless the operator can ensure that nearly all 'flash crowd'
1574	      events are blocked in the access network and so do not impact on
1575	      the CL-region.

1577	   o A second option is to send 'dummy data' as soon as the call is
1578	      admitted, thus effectively reserving the bandwidth whilst waiting
1579	      for the called party to answer. Reserving bandwidth in advance
1580	      means that the network cannot admit as many calls. For example,
1581	      if sessions last 100 seconds and ringing lasts 10 seconds, the
1582	      cost is a 10% loss of capacity. It may be possible to offset this
1583	      somewhat by increasing the configured-admission-rate in the
1584	      routers, but it would need further investigation. A concern with
1585	      this 'dummy data' option is that it may allow an attacker to
1586	      initiate many calls that are never answered (by a cooperating
1587	      attacker), so eventually the network would only be carrying 'dummy
1588	      data'. The attack exploits the fact that charging only starts when
1589	      the call is answered and not when it is dialled. It may be
1590	      possible to alleviate the attack at the session layer, for
1591	      example by having the ingress gateway check, when it gets an RSVP
1592	      PATH message, that the source has been well-behaved recently, and
1593	      by limiting the maximum time that ringing can last. We believe
1594	      that if this attack can be dealt with then this is a good option.

1596	   o A third option is that the egress gateway limits the rate at which
1597	      it sends out the Congestion-Level-Estimate, or limits the rate at
1598	      which calls are accepted by replying with a Congestion-Level-
1599	      Estimate of 100% (this is the equivalent of 'call gapping' in the
1600	      PSTN). There is a trade-off, which would need to be investigated
1601	      further, between the degree of protection and possible adverse
1602	      side-effects like slowing down call set-up.

1604	   o A final option is to re-perform admission control before the call
1605	      is answered. The ingress gateway monitors Congestion-Level-
1606	      Estimate updates received from each egress. If it notices that a
1607	      Congestion-Level-Estimate has risen above the CLE-threshold, then
1608	      it terminates all unanswered calls through that egress (e.g. by
1609	      instructing the session protocol to stop the 'ringing tone'). For
1610	      extra safety the Congestion-Level-Estimate could be re-checked
1611	      when the call is answered. A potential drawback for an operator
1612	      that wants to emulate the PSTN is that the PSTN never drops a
1613	      'ringing' call.
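
   As an illustration only, the final option above might be sketched
   as follows. The class name, method names and the use of string
   identifiers for gateways and calls are assumptions made for this
   sketch; they are not part of the framework. The ingress keeps a set
   of unanswered calls per egress and releases them when a
   Congestion-Level-Estimate update exceeds the CLE-threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class IngressGateway:
    """Hypothetical ingress-side state for re-checking unanswered calls."""

    cle_threshold: float
    # egress gateway id -> ids of calls admitted but not yet answered
    ringing: Dict[str, Set[str]] = field(default_factory=dict)

    def call_admitted(self, egress: str, call_id: str) -> None:
        """Record a call that passed admission control and is now ringing."""
        self.ringing.setdefault(egress, set()).add(call_id)

    def on_cle_update(self, egress: str, cle: float) -> Set[str]:
        """Return the unanswered calls to terminate after a CLE update."""
        if cle <= self.cle_threshold:
            return set()
        # Estimate is above the threshold: release every ringing call
        # through this egress (e.g. stop the 'ringing tone').
        return self.ringing.pop(egress, set())

    def call_answered(self, egress: str, call_id: str, cle: float) -> bool:
        """Extra safety: re-check the estimate when the call is answered."""
        self.ringing.get(egress, set()).discard(call_id)
        return cle <= self.cle_threshold
```

   Calls through other egress gateways are left untouched, which
   matches the per-egress nature of the estimate.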

1615	5.6. Pre-empting too fast

1617	   As a general idea it seems good to pre-empt excess flows rapidly, so
1618	   that the full QoS is restored to the remaining CL users as soon as
1619	   possible, and partial service is restored to lower priority traffic
1620	   classes on shared links. Therefore the pre-emption mechanism
1621	   described earlier in this document works in 'one shot', i.e. one
1622	   measurement is made of the sustainable-aggregate-rate and the
1623	   ingress-aggregate-rate, and the excess is pre-empted immediately.
1624	   However, there are some reasons why an operator may potentially want
1625	   to pre-empt 'more slowly':

1627	   o To allow time to modify the ingress gateway's policer, as the
1628	      ingress wants to be able to drop any packets that arrive from a
1629	      pre-empted flow. There will be a limit on how many new filters an
1630	      ingress gateway can install in a certain time period. Otherwise
1631	      the source may cheat and ignore the instruction to drop its flow.

1633	   o The operator may decide to slow down pre-emption in order to
1634	      ameliorate the 'beat down' and/or 'bi-directional sessions'
1635	      limitations (see above).

1637	   o To help combat inaccuracies in measurements of the sustainable-
1638	      aggregate-rate and ingress-aggregate-rate. For a CL-region where
1639	      it's assumed there are many flows in an aggregate these
1640	      measurements can be obtained in a short period of time, but where
1641	      there are fewer flows it will take longer.

1643	   o To help combat over pre-emption because, during the time it takes
1644	      to pre-empt flows, others may be ending anyway (either the call
1645	      has naturally ended, or the user hangs up due to poor QoS).
1646	      Slowing pre-emption may seem counter-intuitive here, as it makes
1647	      it more likely that calls will terminate anyway - however it also
1648	      gives time to adjust the amount pre-empted to take account of
1649	      this.

1651	   o Earlier in this document we said that an egress starts measuring
1652	      the sustainable-aggregate-rate immediately it sees a single pre-
1653	      emption marked packet. However, when a link or router fails the
1654	      network's underlying recovery mechanism will kick in (e.g.
1655	      switching to a back up path), which may result in the network
1656	      again being able to support all the traffic.

1658	   Potential solutions:

1660	   o To combat the final issue, the egress could measure the
1661	      sustainable-aggregate-rate over a longer time period than the
1662	      network recovery time (say 100ms vs. 50ms). If it detects no pre-
1663	      emption marked packets towards the end of its measurement period
1664	      (say in the last 30 ms) then it doesn't send a pre-emption alert
1665	      message to the ingress.

1667	   o We suggest that optionally (the choice of the operator) pre-
1668	      emption is slowed by pre-empting traffic in several rounds rather
1669	      than in one shot. One possible algorithm is to pre-empt most of
1670	      the traffic in the first round and the rest in the second round;
1671	      the amount pre-empted in the second round is influenced by both
1672	      the first and second round measurements:
1673	      * Round 1: pre-empt h * S_1, where 0.5 <= h <= 1 and S_1 is the
1674	        amount the normal mechanism calculates that it should shed,
1675	        i.e. {ingress-aggregate-rate - sustainable-aggregate-rate}.
1676	      * Round 2: pre-empt
1677	        Predicted-S_2 - h * (Predicted-S_2 - Measured-S_2),
1678	        where Predicted-S_2 = (1-h)*S_1.
1679	      Note that the second measurement should be made when sufficient
1680	      time has elapsed for the first round of pre-emption to have
1681	      happened. One idea to achieve this is for the egress gateway to
1682	      continuously measure and report its sustainable-aggregate-rate,
1683	      in (say) 100ms windows. Therefore the ingress gateway knows when
1684	      the egress gateway made its measurement (assuming the round trip
1685	      time is known). Therefore the ingress gateway knows when
1686	      measurements should reflect that it has pre-empted flows.
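
   The two potential solutions above can be sketched as follows. This
   is a minimal illustration, not a specification: the function names
   and the representation of marked-packet arrivals as a list of times
   in milliseconds are assumptions of the sketch, and the 100 ms and
   30 ms defaults are simply the example values given in the text.

```python
from typing import Sequence


def should_send_alert(mark_times_ms: Sequence[float],
                      window_ms: float = 100.0,
                      quiet_tail_ms: float = 30.0) -> bool:
    """Send a pre-emption alert only if marking persists late in the window.

    mark_times_ms holds arrival times of pre-emption marked packets,
    measured from the start of the measurement window.
    """
    tail_start = window_ms - quiet_tail_ms
    return any(tail_start <= t <= window_ms for t in mark_times_ms)


def round_one(s1: float, h: float) -> float:
    """Amount to pre-empt in round 1: h * S_1, with 0.5 <= h <= 1."""
    if not 0.5 <= h <= 1.0:
        raise ValueError("h must lie in [0.5, 1]")
    return h * s1


def round_two(s1: float, h: float, measured_s2: float) -> float:
    """Amount to pre-empt in round 2.

    Predicted-S_2 = (1 - h) * S_1 is the excess expected to remain after
    round 1; measured_s2 is the excess actually measured afterwards.
    """
    predicted_s2 = (1.0 - h) * s1
    return predicted_s2 - h * (predicted_s2 - measured_s2)
```

   For example, with S_1 = 100 and h = 0.75, round 1 sheds 75 and
   Predicted-S_2 is 25. If the second measurement also shows an excess
   of 25, round 2 sheds 25; if it shows only 5 (because other flows
   ended in the meantime), round 2 sheds 10. Algebraically the round 2
   amount equals (1-h) * Predicted-S_2 + h * Measured-S_2.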

1688	5.7. Other potential extensions

1690	   In this section we discuss some other potential extensions not
1691	   already covered above.

1693	5.7.1. Tunnelling

1695	   It is possible to tunnel all CL packets across the CL-region.
1696	   Although there is a cost of tunnelling (additional header on each
1697	   packet, additional processing at tunnel ingress and egress), there
1698	   are three reasons it may be interesting.

1700	   ECMP:

1702	   Tunnelling is one of the possible solutions given earlier in Section
1703	   5.1 on Equal Cost Multipath Routing (ECMP).

1705	   Ingress gateway determination:

1707	   If packets are tunnelled from ingress gateway to egress gateway, the
1708	   egress gateway can very easily determine in the data path which
1709	   ingress gateway a packet comes from (by simply looking at the source
1710	   address of the tunnel header). This can facilitate operations such as
1711	   computing the Congestion-Level-Estimate on a per ingress gateway
1712	   basis.

1714	   End-to-end ECN:

1716	   The ECN field is used for PCN marking (see [PCN] for details), and so
1717	   it needs to be re-set by the egress gateway to whatever has been
1718	   agreed as appropriate for the next domain. Therefore if a packet
1719	   arrives at the ingress gateway with its ECN field already set (i.e.
1720	   not '00'), it may leave the egress gateway with a different value.
1721	   Hence the end-to-end meaning of the ECN field is lost.

1723	   It is open to debate whether end-to-end congestion control is ever
1724	   necessary within an end-to-end reservation. But if a genuine need is
1725	   identified for end-to-end ECN semantics within a reservation, then
1726	   one solution is to tunnel CL packets across the CL-region. When the
1727	   egress gateway decapsulates them the original ECN field is recovered.
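
   The last two uses of tunnelling can be illustrated with a small
   sketch. It follows the description in this section only: the outer
   header carries the marking applied inside the CL-region and
   identifies the ingress gateway, while the inner header, including
   its ECN field, is recovered unchanged at the egress. The type and
   function names, the addresses, and the outer ECN values are
   assumptions of the sketch; the actual encoding is a matter for
   [PCN].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IPHeader:
    src: str
    dst: str
    ecn: int  # two-bit ECN field, 0b00..0b11


@dataclass
class TunnelledPacket:
    outer: IPHeader  # added by the ingress gateway; marked in the CL-region
    inner: IPHeader  # original header, not touched inside the CL-region


def encapsulate(pkt: IPHeader, ingress: str, egress: str,
                outer_ecn: int) -> TunnelledPacket:
    """Wrap the packet; outer_ecn is whatever the marking encoding needs."""
    return TunnelledPacket(IPHeader(ingress, egress, outer_ecn), pkt)


def decapsulate(tp: TunnelledPacket) -> Tuple[str, int, IPHeader]:
    """Return (ingress gateway address, outer marking, original header)."""
    return tp.outer.src, tp.outer.ecn, tp.inner
```

   The egress can key its per-ingress Congestion-Level-Estimate on the
   first returned value and forward the third with its end-to-end ECN
   field intact.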

1729	5.7.2. Multi-domain and multi-operator usage

1731	   This potential extension would eliminate the trust assumption
1732	   (Section 2.2), so that the CL-region could consist of multiple
1733	   domains run by different operators that did not trust each other.
1734	   Then only the ingress and egress gateways of the CL-region would take
1735	   part in the admission control procedure, i.e. at the ingress to the
1736	   first domain and the egress from the final domain. The border routers
1737	   between operators within the CL-region would only have to do bulk
1738	   accounting - they wouldn't do per microflow metering and policing,
1739	   and they wouldn't take part in signal processing or hold per flow
1740	   state [Briscoe]. [Re-feedback] explains how a downstream domain can
1741	   police that its upstream domain does not 'cheat' by admitting traffic
1742	   when the downstream path is congested. [Re-PCN] proposes how to
1743	   achieve this with the help of another recently proposed extension to
1744	   ECN, involving re-echoing ECN feedback [Re-ECN].

1746	5.7.3. Preferential dropping of pre-emption marked packets

1748	   When the rate of real-time traffic in the specified class exceeds the
1749	   maximum configured rate, then a router has to drop some packet(s)
1750	   instead of forwarding them on the out-going link. Now when the egress
1751	   gateway measures the Sustainable-Aggregate-Rate, neither dropped
1752	   packets nor pre-emption marked packets contribute to it. Dropping
1753	   non-pre-emption-marked packets therefore reduces the measured
1754	   Sustainable-Aggregate-Rate below its true value. Thus a router should
1755	   preferentially drop pre-emption marked packets.
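
   A small numerical sketch may make the effect clearer. It assumes,
   purely for illustration, that without the optimisation a router
   drops marked and unmarked packets in proportion to their share of
   the arriving traffic; the function name and parameters are likewise
   hypothetical. The value returned is what the egress would measure
   as the Sustainable-Aggregate-Rate, i.e. the unmarked traffic that
   survives dropping.

```python
def measured_sustainable_rate(unmarked: float, marked: float,
                              to_drop: float, preferential: bool) -> float:
    """Unmarked rate seen by the egress after a router drops `to_drop`.

    unmarked / marked: arriving rates of CL traffic without / with
    pre-emption marking; to_drop: the rate the router must discard.
    """
    total = unmarked + marked
    if to_drop < 0 or to_drop > total:
        raise ValueError("to_drop must lie between 0 and the arriving rate")
    if preferential:
        # Marked packets are dropped first; unmarked only if still needed.
        dropped_unmarked = max(0.0, to_drop - marked)
    else:
        # Assumed baseline: drops fall on both kinds proportionally.
        dropped_unmarked = to_drop * unmarked / total if total else 0.0
    return unmarked - dropped_unmarked
```

   With 60 units unmarked, 40 marked and 20 to be dropped, preferential
   dropping leaves the measurement at 60, whereas proportional dropping
   reduces it to 48 and so would lead to over-pre-emption.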

1757	   Note that it is important that the operator doesn't set the
1758	   configured-pre-emption-rate equal to the rate at which packets start
1759	   being dropped (for the specified real-time service class). Otherwise
1760	   the egress gateway may never see a pre-emption marked packet and so
1761	   won't be triggered into the Pre-emption Alert state.

1763	   This optimisation is optional. When deciding whether to use it, an
1764	   operator will consider issues such as whether the over-pre-emption is
1765	   serious, and whether the particular routers can easily do this sort
1766	   of selective drop.

1768	5.7.4. Adaptive bandwidth for the Controlled Load service

1770	   The admission control mechanism described in this document assumes
1771	   that each router has a fixed bandwidth allocated to CL flows. A
1772	   possible extension is that the bandwidth is flexible, depending on
1773	   the level of non-CL traffic. If a large share of the current load on
1774	   a path is CL, then more CL traffic can be admitted. And if the
1775	   greater share of the load is non-CL, then the admission threshold can
1776	   be proportionately lower. The approach re-arranges sharing between
1777	   classes to aim for economic efficiency, whatever the traffic load
1778	   matrix. It also deals with unforeseen changes to capacity during
1779	   failures better than configuring fixed engineered rates. Adaptive
1780	   bandwidth allocation can be achieved by changing the admission
1781	   marking behaviour, so that the probability of admission marking a
1782	   packet would now depend on the number of queued non-CL packets as
1783	   well as the size of the virtual queue. The adaptive bandwidth
1784	   approach would be supplemented by placing limits on the adaptation to
1785	   prevent starvation of the CL by other traffic classes and of other
1786	   classes by CL traffic. [Songhurst] has more details of the adaptive
1787	   bandwidth approach.

1789	5.7.5. Controlled Load service with end-to-end Pre-Congestion
1790	   Notification

1792	   It may be possible to extend the framework to parts of the network
1793	   where there are only a low number of CL microflows, i.e. the
1794	   aggregation assumption (Section 2.2) doesn't hold. In the extreme it
1795	   may be possible to operate the framework end-to-end, i.e. between end
1796	   hosts. One potential method is to send probe packets to test whether
1797	   the network can support a prospective new CL microflow. The probe
1798	   packets would be sent at the same traffic rate as expected for the
1799	   actual microflow, but in order not to disturb existing CL traffic a
1800	   router would always schedule probe packets behind CL ones (compare
1801	   [Breslau00]); this implies they have a new DSCP. Otherwise the
1802	   routers would treat probe packets identically to CL packets. In order
1803	   to perform admission control quickly, in parts of the network where
1804	   there are only a few CL microflows, the algorithm for Admission
1805	   Marking described in [PCN] would need to "switch on" very rapidly,
1806	   i.e. go from marking no packets to marking them all for only a
1807	   minimal increase in the size of the virtual queue.

1809	5.7.6. MPLS-TE

1811	   [ECN-MPLS] discusses how to extend the deployment model to MPLS, i.e.
1812	   for admission control of microflows into a set of Multi-protocol
1813	   label switching traffic engineering (MPLS-TE) aggregates. It would
1814	   require that the MPLS header could include the ECN field, which is
1815	   not precluded by RFC3270.

1817	6. Relationship to other QoS mechanisms

1819	6.1. IntServ Controlled Load

1821	   The CL mechanism delivers QoS similar to Integrated Services
1822	   controlled load, but rather better. The reason the QoS is better is
1823	   that the CL mechanism keeps the real queues empty, by driving
1824	   admission control from a bulk virtual queue on each interface. The
1825	   virtual queue [AVQ, vq] can detect a rise in load before the real
1826	   queue builds. The CL mechanism is also more robust to route changes.

1828	6.2. Integrated services operation over DiffServ

1830	   Our approach to end-to-end QoS is similar to that described in
1831	   [RFC2998] for Integrated services operation over DiffServ networks.
1832	   Like [RFC2998], an IntServ class (CL in our case) is achieved end-to-
1833	   end, with a CL-region viewed as a single reservation hop in the total
1834	   end-to-end path. Interior routers of the CL-region do not process
1835	   flow signalling nor do they hold per flow state. Unlike [RFC2998] we
1836	   do not require the end-to-end signalling mechanism to be RSVP,
1837	   although it can be.

1839	   Bearing in mind these differences, we can describe our architecture
1840	   in the terms of the options in [RFC2998]. The DiffServ network region
1841	   is RSVP-aware, but awareness is confined to (what [RFC2998] calls)
1842	   the "border routers" of the DiffServ region. We use explicit
1843	   admission control into this region, with static provisioning within
1844	   it. The ingress "border router" does per microflow policing and sets
1845	   the DSCP and ECN fields to indicate the packets are CL ones (i.e. we
1846	   use router marking rather than host marking).

1848	6.3. Differentiated Services

1850	   The DiffServ architecture does not specify any way for devices
1851	   outside the domain to dynamically reserve resources or receive
1852	   indications of network resource availability.  In practice, service
1853	   providers rely on subscription-time Service Level Agreements (SLAs)
1854	   that statically define the parameters of the traffic that will be
1855	   accepted from a customer. The CL mechanism allows dynamic reservation
1856	   of resources through the DiffServ domain and, with the potential
1857	   extension mentioned in Section 5.7.2, it can span multiple domains
1858	   without active policing mechanisms at the borders (unlike DiffServ).
1859	   Therefore we do not use the traffic conditioning agreements (TCAs) of
1860	   the (informational) DiffServ architecture [RFC2475].

1862	   An important benefit arises from the fact that the load is controlled
1863	   dynamically rather than with traffic conditioning agreements (TCAs).

1865	   TCAs were originally introduced in the (informational) DiffServ
1866	   architecture [RFC2475] as an alternative to reservation processing in
1867	   the interior region in order to reduce the burden on interior
1868	   routers. With TCAs, in practice service providers rely on
1869	   subscription-time Service Level Agreements that statically define the
1870	   parameters of the traffic that will be accepted from a customer. The
1871	   problem arises because the TCA at the ingress must allow any
1872	   destination address, if it is to remain scalable. But for longer
1873	   topologies, the chances increase that traffic will focus on an
1874	   interior resource, even though it is within contract at the ingress
1875	   [Reid], e.g. all flows converge on the same egress gateway. Even
1876	   though networks can be engineered to make such failures rare, when
1877	   they occur all inelastic flows through the congested resource fail
1878	   catastrophically.

1880	   [Johnson] compares admission control with a 'generously dimensioned'
1881	   DiffServ network as ways to achieve QoS. The former is recommended.

1883	6.4. ECN

1885	   The marking behaviour described in this document complies with the
1886	   ECN aspects of the IP wire protocol RFC3168, but provides its own
1887	   edge-to-edge feedback instead of the TCP aspects of RFC3168. All
1888	   routers within the CL-region are upgraded with the admission marking
1889	   and pre-emption marking of Pre-Congestion Notification, so the
1890	   requirements of [Floyd] are met because the CL-region is an enclosed
1891	   environment. The operator prevents traffic arriving at a router that
1892	   doesn't understand CL by administrative configuration of the ring of
1893	   gateways around the CL-region.

1895	6.5. RTECN

1897	   Real-time ECN (RTECN) [RTECN, RTECN-usage] has a similar aim to this
1898	   document (to achieve a low delay, jitter and loss service suitable
1899	   for RT traffic) and a similar approach (per microflow admission
1900	   control combined with an "early warning" of potential congestion
1901	   through setting the CE codepoint). But it explores a different
1902	   architecture without the aggregation assumption: host-to-host rather
1903	   than edge-to-edge. We plan to document such a host-to-host framework
1904	   in a parallel draft to this one, and to describe if and how [PCN] can
1905	   work in this framework.

1907	6.6. RMD

1909	   Resource Management in DiffServ (RMD) [RMD] is similar to this work,
1910	   in that it pushes complex classification, traffic conditioning and
1911	   admission control functions to the edge of a DiffServ domain and
1912	   simplifies the operation of the interior routers. One of the RMD
1913	   modes ("Congestion notification function based on probing") uses
1914	   measurement-based admission control in a similar way to this
1915	   document. The main difference is that in RMD probing plays a
1916	   significant role in the admission control process. Other differences
1917	   are that the admission control decision is taken on the egress
1918	   gateway (rather than the ingress); 'admission marking' is encoded in
1919	   a packet as a new DSCP (rather than in the ECN field), and that the
1920	   NSIS protocols are used for signalling (rather than RSVP).

1922	   RMD also includes the concept of Severe Congestion handling. The pre-
1923	   emption mechanism described in the CL architecture has similar
1924	   objectives but relies on different mechanisms. The main difference is
1925	   that in RMD the interior routers measure the data rate that causes
1926	   an overload and mark packets according to this rate.

1928	6.7. RSVP Aggregation over MPLS-TE

1930	   Multi-protocol label switching traffic engineering (MPLS-TE) allows
1931	   scalable reservation of resources in the core for an aggregate of
1932	   many microflows. To achieve end-to-end reservations, admission
1933	   control and policing of microflows into the aggregate can be achieved
1934	   using techniques such as RSVP Aggregation over MPLS TE Tunnels as per
1935	   [AGGRE-TE]. However, in the case of inter-provider environments,
1936	   these techniques require that admission control and policing be
1937	   repeated at each trust boundary or that MPLS TE tunnels span multiple
1938	   domains.

1940	6.8. Other Network Admission Control Approaches

1942	   Link admission control (LAC) describes how admission control (AC) can
1943	   be done on a single link and comprises, e.g., the calculation of
1944	   effective bandwidths which may be the base for a parameter-based AC.
1945	   In contrast, network AC (NAC) describes how AC can be done for a
1946	   network and focuses on the locations from which data is gathered for
1947	   the admission decision. Most approaches implement a link budget based
1948	   NAC (LB NAC) where each link has a certain AC-budget. RSVP works
1949	   according to that principle, as does the new concept, which admits
1950	   additional flows as long as each link on the new flow's path still
1951	   has resources available. The border-to-border budget based NAC (BBB
1952	   NAC) pre-configures an AC budget for all border-to-border
1953	   relationships (= CL-region-aggregates) and if this capacity budget is
1954	   exhausted, new flows are rejected. The TCA-based admission control
1955	   which is associated with the DiffServ architecture implements an
1956	   ingress budget based NAC (IB NAC). These fundamentally different
1957	   concepts have different flexibility and efficiency with regard to the
1958	   use of link bandwidths [NAC-a,NAC-b]. They can be made resilient by
1959	   choosing the budgets in such a way that the network will not be
1960	   congested after rerouting due to a failure. The efficiency of the
1961	   approaches differs with and without such resilience requirements.
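
   The three budget types can be contrasted with a short sketch. The
   function names and the dictionary-based bookkeeping are assumptions
   made for illustration; rates and budgets are in the same arbitrary
   unit.

```python
from typing import Dict, Sequence, Tuple


def lb_nac_admit(path: Sequence[str], used: Dict[str, float],
                 budget: Dict[str, float], rate: float) -> bool:
    """Link budget NAC: every link on the flow's path must have room."""
    return all(used.get(link, 0.0) + rate <= budget[link] for link in path)


def bbb_nac_admit(pair: Tuple[str, str], used: Dict[Tuple[str, str], float],
                  budget: Dict[Tuple[str, str], float], rate: float) -> bool:
    """Border-to-border budget NAC: one budget per (ingress, egress)."""
    return used.get(pair, 0.0) + rate <= budget[pair]


def ib_nac_admit(ingress: str, used: Dict[str, float],
                 budget: Dict[str, float], rate: float) -> bool:
    """Ingress budget NAC: one budget per ingress, any destination."""
    return used.get(ingress, 0.0) + rate <= budget[ingress]
```

   Note that only the link budget check looks at the interior links, so
   only it can notice that one link on the path is full while the
   border-to-border or ingress budget still has room.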

1963	7. Security Considerations

1965	   To protect against denial of service attacks, the ingress gateway of
1966	   the CL-region needs to police all CL packets and drop packets in
1967	   excess of the reservation. This is similar to operations with
1968	   existing IntServ behaviour.
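
   This document does not say how the policing is implemented. One
   common technique is a token bucket per microflow, sketched below
   purely as an illustration; the class name, its parameters and the
   choice of a token bucket are assumptions of the sketch.

```python
class TokenBucketPolicer:
    """Per-flow policer: packets beyond the reserved rate are dropped."""

    def __init__(self, rate_bps: float, bucket_bits: float) -> None:
        self.rate_bps = rate_bps        # reserved rate of the flow
        self.bucket_bits = bucket_bits  # permitted burst size
        self.tokens = bucket_bits       # bucket starts full
        self.last_s = 0.0               # time of the previous packet

    def conforms(self, now_s: float, packet_bits: float) -> bool:
        """Return True to forward the packet, False to drop it."""
        # Refill tokens for the elapsed time, capped at the bucket size.
        elapsed = max(0.0, now_s - self.last_s)
        self.last_s = now_s
        self.tokens = min(self.bucket_bits,
                          self.tokens + elapsed * self.rate_bps)
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False
```

   A packet that does not conform consumes no tokens, so a flow that
   returns to its reserved rate is forwarded again without delay.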

1970	   For pre-emption, it is considered acceptable from a security
1971	   perspective that the ingress gateway can treat "emergency/military"
1972	   CL flows preferentially compared with "ordinary" CL flows. However,
1973	   in the rest of the CL-region they are not distinguished (nonetheless,
1974	   our proposed technique does not preclude the use of different DSCPs
1975	   at the packet level as well as different priorities at the flow
1976	   level.). Keeping emergency traffic indistinguishable at the packet
1977	   level minimises the opportunity for new security attacks. For
1978	   example, if instead a mechanism used different DSCPs for
1979	   "emergency/military" and "ordinary" packets, then an attacker could
1980	   specifically target the former in the data plane (perhaps for DoS or
1981	   for eavesdropping).

1983	   Further security aspects to be considered later.

1985	8. Acknowledgements

1987	   The admission control mechanism evolved from the work led by Martin
1988	   Karsten on the Guaranteed Stream Provider developed in the M3I
1989	   project [GSPa, GSP-TR], which in turn was based on the theoretical
1990	   work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano,
1991	   Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet
1992	   and June Tay (BT) helped develop and evaluate this approach.

1994	   Many thanks to those who have commented on this work at Transport
1995	   Area Working Group meetings and on the mailing list, including: Ken
1996	   Carlberg, Ruediger Geib, Lars Westberg, David Black, Robert Hancock,
1997	   Cornelia Kappler, Michael Menth.

1999	9. Comments solicited

2001	   Comments and questions are encouraged and very welcome. They can be
2002	   sent to the Transport Area Working Group's mailing list,
2003	   tsvwg@ietf.org, and/or to the authors.

2005	10. Changes from earlier versions of the draft

2007	   The main changes are:

2009	   From -00 to -01

2011	   The whole of the Pre-emption mechanism is added.

2013	   There are several modifications to the admission control mechanism.

2015	   From -01 to -02

2017	   The pre-congestion notification algorithms for admission marking and
2018	   pre-emption marking are now described in [PCN].

2020	   There are new sub-sections in Section 4 on Failures, Admission of
2021	   'emergency / higher precedence' session, and Tunnelling; and a new
2022	   sub-section in Section 5 on Mechanisms to deal with 'Flash crowds'.

2024	   From -02 to -03

2026	   Section 5 has been updated and expanded. It is now about the
2027	   'limitations' of the PCN mechanism, as described in the earlier
2028	   sections, plus discussion of 'possible solutions' to those
2029	   limitations.

2031	   The measurement of the Congestion-Level-Estimate now includes pre-
2032	   emption marked packets as well as admission marked ones. Section
2033	   3.1.2 explains.

2035	   From -03 to -04

2037	   Detailed review by Michael Menth. In response, Abstract, Summary and
2038	   Key benefits sections re-written. Numerous detailed comments on
2039	   Sections 5 and following sections.

2041	11. Appendices

2043	11.1. Appendix A: Explicit Congestion Notification

2045	   This Appendix provides a brief summary of Explicit Congestion
2046	   Notification (ECN).

2048	   [RFC3168] specifies the incorporation of ECN to TCP and IP, including
2049	   ECN's use of two bits in the IP header. It specifies a method for
2050	   indicating incipient congestion to end-hosts (e.g. as in RED, Random
2051	   Early Detection), where the notification is through ECN marking
2052	   packets rather than dropping them.

2054	   ECN uses two bits in the IP header of both IPv4 and IPv6 packets:

2056	            0     1     2     3     4     5     6     7
2057	         +-----+-----+-----+-----+-----+-----+-----+-----+
2058	         |          DS FIELD, DSCP           | ECN FIELD |
2059	         +-----+-----+-----+-----+-----+-----+-----+-----+

2061	           DSCP: differentiated services codepoint
2062	           ECN:  Explicit Congestion Notification

2064	   Figure A.1: The Differentiated Services and ECN Fields in IP.

2066	   The two bits of the ECN field have four ECN codepoints, '00' to '11':
2067	         +-----+-----+
2068	         | ECN FIELD |
2069	         +-----+-----+
2070	           ECT   CE
2071	            0     0         Not-ECT
2072	            0     1         ECT(1)
2073	            1     0         ECT(0)
2074	            1     1         CE

2076	   Figure A.2: The ECN Field in IP.

2078	   The not-ECT codepoint '00' indicates a packet that is not using ECN.

2080	   The CE codepoint '11' is set by a router to indicate congestion to
2081	   the end hosts. The term 'CE packet' denotes a packet that has the CE
2082	   codepoint set.

2084	   The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and
2085	   ECT(1) respectively) are set by the data sender to indicate that the
2086	   end-points of the transport protocol are ECN-capable. Routers treat
2087	   the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to
2088	   use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a
2089	   packet-by-packet basis. The motivation for having two codepoints (the
2090	   'ECN nonce') is the desire to check two things: for the data sender
2091	   to verify that network elements are not erasing the CE codepoint; and
2092	   for the data sender to verify that data receivers are properly
2093	   reporting to the sender the receipt of packets with the CE codepoint
2094	   set.
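
   The layout in Figures A.1 and A.2 can be expressed as a few lines
   of code. The function and table names are chosen for this sketch
   only; the bit positions and codepoint names are those shown in the
   figures.

```python
from typing import Tuple

# Codepoint names from Figure A.2, keyed by the two-bit ECN field.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def split_ds_ecn(octet: int) -> Tuple[int, str]:
    """Split the former TOS / Traffic Class octet into (DSCP, ECN name)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet")
    dscp = octet >> 2   # upper six bits (bits 0-5 in Figure A.1)
    ecn = octet & 0b11  # lower two bits (bits 6-7 in Figure A.1)
    return dscp, ECN_NAMES[ecn]
```

   For instance, the octet 0xBB splits into DSCP 46 with the CE
   codepoint set.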

2096	   ECN requires support from the transport protocol, in addition to the
2097	   functionality given by the ECN field in the IP packet header.
2098	   [RFC3168] addresses the addition of ECN Capability to TCP, specifying
2099	   three new pieces of functionality: negotiation between the endpoints
2100	   during connection setup to determine if they are both ECN-capable; an
2101	   ECN-Echo (ECE) flag in the TCP header so that the data receiver can
2102	   inform the data sender when a CE packet has been received; and a
2103	   Congestion Window Reduced (CWR) flag in the TCP header so that the
2104	   data sender can inform the data receiver that the congestion window
2105	   has been reduced.

2107	   The transport layer (e.g. TCP) must respond, in terms of congestion
2108	   control, to a *single* CE packet as it would to a packet drop.

2110	   The advantage of setting the CE codepoint as an indication of
2111	   congestion, instead of relying on packet drops, is that it allows the
2112	   receiver(s) to receive the packet, thus avoiding the potential for
2113	   excessive delays due to retransmissions after packet losses.

2115	11.2. Appendix B: What is distributed measurement-based admission
2116	   control?

2118	   This Appendix briefly explains what distributed measurement-based
2119	   admission control is [Breslau99].

2121	   Traditional admission control algorithms for 'hard' real-time
2122	   services (those providing a firm delay bound for example) guarantee
2123	   QoS by using 'worst case analysis'. Each time a flow is admitted its
2124	   traffic parameters are examined and the network re-calculates the
2125	   remaining resources. When the network gets a new request it therefore
2126	   knows for certain whether the prospective flow, with its particular
2127	   parameters, should be admitted. However, parameter-based admission
2128	   control algorithms result in under-utilisation when the traffic is
2129	   bursty. Therefore 'soft' real time services - like Controlled Load -
2130	   can use a more relaxed admission control algorithm.

2132	   This insight suggests measurement-based admission control (MBAC). The
2133	   aim of MBAC is to provide a statistical service guarantee. The
2134	   classic scenario for MBAC is where each router participates in hop-
2135	   by-hop admission control, characterising existing traffic locally
2136	   through measurements (instead of keeping an accurate track of traffic
2137	   as it is admitted), in order to determine the current value of some
2138	   parameter e.g. load. Note that for scalability the measurement is of
2139	   the aggregate of the flows in the local system. The measured
2140	   parameter(s) is then compared to the requirements of the prospective
2141	   flow to see whether it should be admitted.
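
   The difference between the two styles of admission control can be
   shown with a deliberately simplified sketch. The function names and
   the target utilisation parameter are assumptions made for
   illustration; real MBAC algorithms use more refined estimators.

```python
from typing import Sequence


def parameter_based_admit(declared_rates: Sequence[float],
                          new_rate: float, capacity: float) -> bool:
    """Worst case: sum the declared parameters of all admitted flows."""
    return sum(declared_rates) + new_rate <= capacity


def measurement_based_admit(measured_load: float, new_rate: float,
                            capacity: float,
                            target_utilisation: float = 0.9) -> bool:
    """MBAC: compare the measured aggregate load plus the new flow."""
    return measured_load + new_rate <= target_utilisation * capacity
```

   With ten bursty flows each declaring a rate of 10 on a link of
   capacity 100, the parameter-based check rejects an eleventh flow
   even if the measured aggregate load is only 40, whereas the
   measurement-based check admits it.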

2143	   MBAC may also be performed centrally for a network, in which case it
2144	   uses centralised measurements by a bandwidth broker.

2146	   We use distributed MBAC. "Distributed" means that the measurement is
2147	   accumulated for the 'whole-path' using in-band signalling. In our
2148	   case, this means that the measurement of existing traffic is for the
2149	   same pair of ingress and egress gateways as the prospective
2150	   microflow.

2152	   In fact our mechanism can be said to be distributed in three ways:
2153	   all routers on the ingress-egress path affect the Congestion-Level-
2154	   Estimate; the admission control decision is made just once on behalf
2155	   of all the routers on the path across the CL-region; and the ingress
2156	   and egress gateways cooperate to perform MBAC.

2158	11.3. Appendix C: Calculating the Exponentially weighted moving average
2159	   (EWMA)

2161	   At the egress gateway, for every CL packet arrival:

2163	   [EWMA-total-bits]n+1  =  (w * bits-in-packet)
2164	                            +  ((1-w) * [EWMA-total-bits]n)

2166	   [EWMA-M-bits]n+1  =  (B * w * bits-in-packet)
2167	                        +  ((1-w) * [EWMA-M-bits]n)

2169	   Then, per new flow arrival:

2171	   [Congestion-Level-Estimate]n+1  =
2172	      [EWMA-M-bits]n+1  /  [EWMA-total-bits]n+1

2174	   where
2175	   EWMA-total-bits is the total number of bits in CL packets, calculated
2176	   as an exponentially weighted moving average (EWMA)

2178	   EWMA-M-bits is the total number of bits in CL packets that are
2179	   Admission Marked or Pre-emption Marked, again calculated as an EWMA.

2181	   B is either 0 or 1:

2183	     B = 0 if the CL packet is not admission marked

2185	     B = 1 if the CL packet is admission marked

2187	   w is the exponential weighting factor.
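
The per-packet and per-flow calculations above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class and method names are hypothetical, both averages are assumed to start at zero, and the caller is assumed to supply the value of B for each packet as the boolean `marked`.

```python
class CongestionLevelEstimator:
    """Hypothetical egress-side state for one ingress-egress aggregate."""

    def __init__(self, w):
        if not 0 < w <= 1:
            raise ValueError("w must be in (0, 1]")
        self.w = w
        self.ewma_total_bits = 0.0   # assumed initial value
        self.ewma_m_bits = 0.0       # assumed initial value

    def on_packet(self, bits_in_packet, marked):
        """Per CL packet arrival: update both moving averages."""
        b = 1 if marked else 0
        w = self.w
        self.ewma_total_bits = (w * bits_in_packet
                                + (1 - w) * self.ewma_total_bits)
        self.ewma_m_bits = (b * w * bits_in_packet
                            + (1 - w) * self.ewma_m_bits)

    def congestion_level_estimate(self):
        """Per new flow arrival: ratio of marked bits to total bits.
        Returns None if no CL packet has been seen yet."""
        if self.ewma_total_bits == 0:
            return None
        return self.ewma_m_bits / self.ewma_total_bits


est = CongestionLevelEstimator(w=0.5)
est.on_packet(1000, marked=True)
est.on_packet(1000, marked=False)
print(est.congestion_level_estimate())
```

With w = 0.5, the marked packet followed by the unmarked packet leaves the two averages at 250 and 750 bits respectively, so the estimate is one third. A smaller w would give the older, marked packet more influence.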

2189	   Varying the value of the weight trades off between the smoothness and
2190	   responsiveness of the Congestion-Level-Estimate. However, in general
2191	   both can be achieved, given our original assumption of many CL
2192	   microflows and remembering that the EWMA is calculated on the basis
2193	   of aggregate traffic between the ingress and egress gateways.
2194	   There will be a threshold inter-arrival time between packets of the
2195	   same aggregate above which the egress will consider the
2196	   Congestion-Level-Estimate too stale, and it will then trigger
2197	   generation of probes by the ingress.

2199	   The two per-packet algorithms can be simplified if their results are
2200	   used only in the third, per-flow algorithm, where one is divided by
2201	   the other: any scaling factor common to both cancels in the ratio.

2203	   [EWMA-total-bits]'n+1  =  bits-in-packet
2204	                             +  (w' * [EWMA-total-bits]n)

2206	   [EWMA-M-bits]'n+1  =  (B * bits-in-packet)
2207	                         +  (w' * [EWMA-M-bits]n)

2209	   where w' = (1-w)/w.

2211	   If w' is arranged to be a power of 2, these per packet algorithms can
2212	   be implemented solely with a shift and an add.
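
The following Python sketch checks, for a single step, that the simplified equations give the same ratio as the original ones, and shows the multiplication by w' done as a left shift. It reads the simplified equations literally, with the unscaled previous values on the right-hand side. The function names and the integer previous values are assumptions of this example, and exact fractions are used only so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def original_step(prev_total, prev_marked, bits, marked, w):
    """One step of the original per-packet equations."""
    b = 1 if marked else 0
    total = w * bits + (1 - w) * prev_total
    m_bits = b * w * bits + (1 - w) * prev_marked
    return total, m_bits


def simplified_step(prev_total, prev_marked, bits, marked, shift):
    """One step of the simplified equations with w' = 2**shift,
    so that multiplying by w' is a left shift of an integer."""
    b = 1 if marked else 0
    total = bits + (prev_total << shift)
    m_bits = b * bits + (prev_marked << shift)
    return total, m_bits


shift = 3                           # w' = 8
w = Fraction(1, 1 + (1 << shift))   # w' = (1-w)/w  =>  w = 1/(1+w')

t, m = original_step(12000, 3000, 8000, True, w)
ts, ms = simplified_step(12000, 3000, 8000, True, shift)
print(m / t, Fraction(ms, ts))      # the two ratios agree
```

Both simplified results equal the original results divided by w, which is why the ratio used for the Congestion-Level-Estimate is unchanged.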

2214	   There are alternative possibilities for smoothing out the congestion-
2215	   level-estimate. For example [TEWMA] deals better with the issue of
2216	   stale information when the traffic rate for

2218	12. References

2220	   A later version will distinguish normative and informative
2221	   references.

2223	   [AGGRE-TE]    Francois Le Faucheur, Michael Dibiasio, Bruce Davie,
2224	                 Michael Davenport, Chris Christou, Jerry Ash, Bur
2225	                 Goode, 'Aggregation of RSVP Reservations over MPLS
2226	                 TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-03 (work
2227	                 in progress), June 2006

2229	   [ANSI.MLPP.Spec] American National Standards Institute,
2230	                 "Telecommunications - Integrated Services Digital
2231	                 Network (ISDN) - Multi-Level Precedence and Pre-
2232	                 emption (MLPP) Service Capability", ANSI T1.619-1992
2233	                 (R1999), 1992.

2235	   [ANSI.MLPP.Supplement] American National Standards Institute, "MLPP
2236	                 Service Domain Cause Value Changes", ANSI
2237	                 T1.619a-1994 (R1999), 1990.

2239	   [AVQ]         S. Kunniyur and R. Srikant "Analysis and Design of an
2240	                 Adaptive Virtual Queue (AVQ) Algorithm for Active
2241	                 Queue Management", In: Proc. ACM SIGCOMM'01, Computer
2242	                 Communication Review 31 (4) (October, 2001).

2244	   [Breslau99]   L. Breslau, S. Jamin, S. Shenker "Measurement-based
2245	                 admission control: what is the research agenda?", In:
2246	                 Proc. Int'l Workshop on Quality of Service 1999.

2248	   [Breslau00]   L. Breslau, E. Knightly, S. Shenker, I. Stoica, H.
2249	                 Zhang "Endpoint Admission Control: Architectural
2250	                 Issues and Performance", In: ACM SIGCOMM 2000

2252	   [Briscoe]     Bob Briscoe and Steve Rudkin, "Commercial Models for
2253	                 IP Quality of Service Interconnect", BT Technology
2254	                 Journal, Vol 23 No 2, April 2005.

2256	   [DCAC]        Richard J. Gibbens and Frank P. Kelly "Distributed
2257	                 connection acceptance control for a connectionless
2258	                 network", In: Proc. International Teletraffic Congress
2259	                 (ITC16), Edinburgh, pp. 941-952 (1999).

2261	   [ECN-MPLS]    Bruce Davie, Bob Briscoe, June Tay, "Explicit
2262	                 Congestion Marking in MPLS", draft-davie-ecn-mpls-00.txt
2263	                 (work in progress), June 2006

2265	   [EMERG-RQTS]  Carlberg, K. and R. Atkinson, "General Requirements
2266	                 for Emergency Telecommunication Service (ETS)", RFC
2267	                 3689, February 2004.

2269	   [EMERG-TEL]   Carlberg, K. and R. Atkinson, "IP Telephony
2270	                 Requirements for Emergency Telecommunication Service
2271	                 (ETS)", RFC 3690, February 2004.

2273	   [Floyd]       S. Floyd, 'Specifying Alternate Semantics for the
2274	                 Explicit Congestion Notification (ECN) Field', draft-
2275	                 floyd-ecn-alternates-02.txt (work in progress), August
2276	                 2005

2278	   [GSPa]        Karsten (Ed.), Martin "GSP/ECN Technology &
2279	                 Experiments", Deliverable: 15.3 PtIII, M3I Eu Vth
2280	                 Framework Project IST-1999-11429, URL:
2281	                 http://www.m3i.org/ (February, 2002) (superseded by
2282	                 [GSP-TR])

2284	   [GSP-TR]      Martin Karsten and Jens Schmitt, "Admission Control
2285	                 Based on Packet Marking and Feedback Signalling --
2286	                 Mechanisms, Implementation and Experiments", TU-
2287	                 Darmstadt Technical Report TR-KOM-2002-03, URL:
2288	                 http://www.kom.e-technik.tu-
2289	                 darmstadt.de/publications/abstracts/KS02-5.html (May,
2290	                 2002)

2292	   [ITU.MLPP.1990] International Telecommunications Union, "Multilevel
2293	                 Precedence and Pre-emption Service (MLPP)", ITU-T
2294	                 Recommendation I.255.3, 1990.

2296	   [Johnson]     DM Johnson, 'QoS control versus generous
2297	                 dimensioning', BT Technology Journal, Vol 23 No 2,
2298	                 April 2005

2300	   [LoadBalancing-a] Ruediger Martin, Michael Menth, and Michael
2301	                 Hemmkeppler: "Accuracy and Dynamics of Hash-Based Load
2302	                 Balancing Algorithms for Multipath Internet Routing",
2303	                 IEEE Broadnets, San Jose, CA, USA, October 2006
2304	                 http://www3.informatik.uni-
2305	                 wuerzburg.de/~menth/Publications/Menth06p.pdf

2307	   [LoadBalancing-b] Ruediger Martin, Michael Menth, and Michael
2308	                 Hemmkeppler: "Accuracy and Dynamics of Multi-Stage
2309	                 Load Balancing for Multipath Internet Routing",
2310	                 currently under submission http://www3.informatik.uni-
2311	                 wuerzburg.de/~menth/Publications/Menth07-Sub-6.pdf

2313	   [Low]         S. Low, L. Andrew, B. Wydrowski, 'Understanding XCP:
2314	                 equilibrium and fairness', IEEE InfoCom 2005

2316	   [NAC-a]       Michael Menth: "Efficient Admission Control and
2317	                 Routing in Resilient Communication Networks", PhD
2318	                 thesis, July 2004, http://opus.bibliothek.uni-
2319	                 wuerzburg.de/opus/volltexte/2004/994/pdf/Menth04.pdf

2321	   [NAC-b]       Michael Menth, Stefan Kopf, Joachim Charzinski, and
2322	                 Karl Schrodi: "Resilient Network Admission Control",
2323	                 currently under submission.
2324	                 http://www3.informatik.uni-
2325	                 wuerzburg.de/~menth/Publications/Menth07-Sub-3.pdf

2327	   [PCN]         B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur,
2328	                 A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan,
2329	                 G. Karagiannis, A. Bader, L. Westberg. 'Pre-Congestion
2330	                 Notification marking', draft-briscoe-tsvwg-cl-phb-02
2331	                 (work in progress), June 2006.

2333	   [Re-ECN]      Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori,
2334	                 'Re-ECN: Adding Accountability for Causing Congestion
2335	                 to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in
2336	                 progress), March 2006.

2338	   [Re-feedback] Bob Briscoe, Arnaud Jacquet, Carla Di Cairano-
2339	                 Gilfedder, Andrea Soppera, 'Re-feedback for Policing
2340	                 Congestion Response in an Inter-network', ACM SIGCOMM
2341	                 2005, August 2005.

2343	   [Re-PCN]      B. Briscoe, 'Emulating Border Flow Policing using Re-
2344	                 ECN on Bulk Data', draft-briscoe-tsvwg-re-ecn-border-
2345	                 cheat-00 (work in progress), February 2006.

2347	   [Reid]        ABD Reid, 'Economics and scalability of QoS
2348	                 solutions', BT Technology Journal, Vol 23 No 2, April
2349	                 2005

2351	   [RFC2211]     Wroclawski, J., "Specification of the Controlled-Load
2352	                 Network Element Service", RFC 2211, September 1997.

2354	   [RFC2309]     Braden, B., et al., "Recommendations on Queue
2355	                 Management and Congestion Avoidance in the Internet",
2356	                 RFC 2309, April 1998.

2358	   [RFC2474]     Nichols, K., Blake, S., Baker, F. and D. Black,
2359	                 "Definition of the Differentiated Services Field (DS
2360	                 Field) in the IPv4 and IPv6 Headers", RFC 2474,
2361	                 December 1998

2363	   [RFC2475]     Blake, S., Black, D., Carlson, M., Davies, E., Wang,
2364	                 Z. and W. Weiss, 'A framework for Differentiated
2365	                 Services', RFC 2475, December 1998.

2367	   [RFC2597]     Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski,
2368	                 "Assured Forwarding PHB Group", RFC 2597, June 1999.

2370	   [RFC2998]     Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang,
2371	                 L., Speer, M., Braden, R., Davie, B., Wroclawski, J.
2372	                 and E. Felstaine, "A Framework for Integrated Services
2373	                 Operation Over DiffServ Networks", RFC 2998, November
2374	                 2000.

2376	   [RFC3168]     Ramakrishnan, K., Floyd, S. and D. Black "The Addition
2377	                 of Explicit Congestion Notification (ECN) to IP", RFC
2378	                 3168, September 2001.

2380	   [RFC3246]     B. Davie, A. Charny, J.C.R. Bennett, K. Benson, J.Y. Le
2381	                 Boudec, W. Courtney, S. Davari, V. Firoiu, D.
2382	                 Stiliadis, 'An Expedited Forwarding PHB (Per-Hop
2383	                 Behavior)', RFC 3246, March 2002.

2385	   [RFC3270]     Le Faucheur, F., Wu, L., Davie, B., Davari, S.,
2386	                 Vaananen, P., Krishnan, R., Cheval, P., and J.
2387	                 Heinanen, "Multi-Protocol Label Switching (MPLS)
2388	                 Support of Differentiated Services", RFC 3270, May
2389	                 2002.

2391	   [RFC4542]     F. Baker & J. Polk, "Implementing an Emergency
2392	                 Telecommunications Service for Real Time Services in
2393	                 the Internet Protocol Suite", RFC 4542, May 2006.

2395	   [RMD]         Attila Bader, Lars Westberg, Georgios Karagiannis,
2396	                 Cornelia Kappler, Tom Phelan, 'RMD-QOSM - The Resource
2397	                 Management in DiffServ QoS model', draft-ietf-nsis-
2398	                 rmd-03 Work in Progress, June 2005.

2400	   [RSVP-PCN]    Francois Le Faucheur, Anna Charny, Bob Briscoe, Philip
2401	                 Eardley, Joe Babiarz, Kwok-Ho Chan, 'RSVP Extensions
2402	                 for Admission Control over DiffServ using Pre-
2403	                 Congestion Notification (PCN)', draft-lefaucheur-rsvp-
2404	                 ecn-01 (work in progress), June 2006.

2406	   [RSVP-PREEMPTION] Herzog, S., "Signaled Preemption Priority Policy
2407	                 Element", RFC 3181, October 2001.

2409	   [RSVP-EMERGENCY] Le Faucheur et al., RSVP Extensions for Emergency
2410	                 Services, draft-lefaucheur-emergency-rsvp-02.txt

2412	   [RTECN]       Babiarz, J., Chan, K. and V. Firoiu, 'Congestion
2413	                 Notification Process for Real-Time Traffic', draft-
2414	                 babiarz-tsvwg-rtecn-04 Work in Progress, July 2005.

2416	   [RTECN-usage] Alexander, C., Ed., Babiarz, J. and J. Matthews,
2417	                 'Admission Control Use Case for Real-time ECN', draft-
2418	                 alexander-rtecn-admission-control-use-case-00, Work in
2419	                 Progress, February 2005.

2421	   [Songhurst]   David J. Songhurst, Philip Eardley, Bob Briscoe, Carla
2422	                 Di Cairano Gilfedder and June Tay, 'Guaranteed QoS
2423	                 Synthesis for Admission Control with Shared Capacity',
2424	                 BT Technical Report TR-CXR9-2006-001, Feb 2006,
2425	                 http://www.cs.ucl.ac.uk/staff/B.Briscoe/projects/ipe2e
2426	                 qos/gqs/papers/GQS_shared_tr.pdf

2428	   [vq]          Costas Courcoubetis and Richard Weber "Buffer Overflow
2429	                 Asymptotics for a Switch Handling Many Traffic
2430	                 Sources" In: Journal Applied Probability 33 pp. 886--
2431	                 903 (1996).

2433	Authors' Addresses

2435	   Bob Briscoe
2436	   BT Research
2437	   B54/77, Sirius House
2438	   Adastral Park
2439	   Martlesham Heath
2440	   Ipswich, Suffolk
2441	   IP5 3RE
2442	   United Kingdom
2443	   Email: bob.briscoe@bt.com

2445	   Dave Songhurst
2446	   BT Research
2447	   B54/69, Sirius House
2448	   Adastral Park
2449	   Martlesham Heath
2450	   Ipswich, Suffolk
2451	   IP5 3RE
2452	   United Kingdom
2453	   Email: dsonghurst@jungle.bt.co.uk

2455	   Philip Eardley
2456	   BT Research
2457	   B54/77, Sirius House
2458	   Adastral Park
2459	   Martlesham Heath
2460	   Ipswich, Suffolk
2461	   IP5 3RE
2462	   United Kingdom
2463	   Email: philip.eardley@bt.com

2465	   Francois Le Faucheur
2466	   Cisco Systems, Inc.
2467	   Village d'Entreprise Green Side - Batiment T3
2468	   400, Avenue de Roumanille
2469	   06410 Biot Sophia-Antipolis
2470	   France
2471	   Email: flefauch@cisco.com

2473	   Anna Charny
2474	   Cisco Systems
2475	   300 Apollo Drive
2476	   Chelmsford, MA 01824
2477	   USA
2478	   Email: acharny@cisco.com

2479	   Kwok Ho Chan
2480	   Nortel Networks
2481	   600 Technology Park Drive
2482	   Billerica, MA  01821
2483	   USA
2484	   Email: khchan@nortel.com

2486	   Jozef Z. Babiarz
2487	   Nortel Networks
2488	   3500 Carling Avenue
2489	   Ottawa, Ont  K2H 8E9
2490	   Canada
2491	   Email: babiarz@nortel.com

2493	   Stephen Dudley
2494	   Nortel Networks
2495	   4001 E. Chapel Hill Nelson Highway
2496	   P.O. Box 13010, ms 570-01-0V8
2497	   Research Triangle Park, NC 27709
2498	   USA
2499	   Email: smdudley@nortel.com

2501	   Georgios Karagiannis
2502	   University of Twente
2503	   P.O. BOX 217
2504	   7500 AE Enschede,
2505	   The Netherlands
2506	   EMail: g.karagiannis@ewi.utwente.nl

2508	   Attila Báder
2509	   attila.bader@ericsson.com

2511	   Lars Westberg
2512	   Ericsson AB
2513	   SE-164 80 Stockholm
2514	   Sweden
2515	   EMail: Lars.Westberg@ericsson.com

2517	Intellectual Property Statement

2519	   The IETF takes no position regarding the validity or scope of any
2520	   Intellectual Property Rights or other rights that might be claimed to
2521	   pertain to the implementation or use of the technology described in
2522	   this document or the extent to which any license under such rights
2523	   might or might not be available; nor does it represent that it has
2524	   made any independent effort to identify any such rights.  Information
2525	   on the procedures with respect to rights in RFC documents can be
2526	   found in BCP 78 and BCP 79.

2528	   Copies of IPR disclosures made to the IETF Secretariat and any
2529	   assurances of licenses to be made available, or the result of an
2530	   attempt made to obtain a general license or permission for the use of
2531	   such proprietary rights by implementers or users of this
2532	   specification can be obtained from the IETF on-line IPR repository at
2533	   http://www.ietf.org/ipr.

2535	   The IETF invites any interested party to bring to its attention any
2536	   copyrights, patents or patent applications, or other proprietary
2537	   rights that may cover technology that may be required to implement
2538	   this standard.  Please address the information to the IETF at
2539	   ietf-ipr@ietf.org

2541	Disclaimer of Validity

2543	   This document and the information contained herein are provided on an
2544	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2545	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2546	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2547	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2548	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2549	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2551	Copyright Statement

2553	   Copyright (C) The Internet Society (2006).

2555	   This document is subject to the rights, licenses and restrictions
2556	   contained in BCP 78, and except as set forth therein, the authors
2557	   retain all their rights.