TSVWG                                                        B. Briscoe
Internet Draft                                               P. Eardley
draft-briscoe-tsvwg-cl-architecture-02.txt                  D. Songhurst
Expires: September 2006                                               BT

                                                          F. Le Faucheur
                                                               A. Charny
                                                      Cisco Systems, Inc

                                                              J. Babiarz
                                                                 K. Chan
                                                               S. Dudley
                                                                  Nortel

                                                           March 6, 2006

  A Framework for Admission Control over DiffServ using Pre-Congestion
                              Notification
              draft-briscoe-tsvwg-cl-architecture-02.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 6, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006). All Rights Reserved.

Abstract

   This document describes a framework to achieve an end-to-end Controlled Load (CL) service without the scalability problems of previous approaches. Flow admission control and, if necessary, flow pre-emption preserve the CL service to admitted flows. But interior routers within a large DiffServ-based region of the Internet do not require flow state or signalling. They only have to give early warning of their own congestion by bulk packet marking, using new pre-congestion notification marking. Gateways around the edges of the region convert measurements of this packet granularity marking into admission control and pre-emption functions at flow granularity.

Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION)

   This document is posted as an Internet-Draft with the intention of eventually becoming an INFORMATIONAL RFC, rather than a standards track document.

Table of Contents

   1. Introduction
      1.1. Summary
           1.1.1. Flow admission control
           1.1.2. Flow pre-emption
           1.1.3. Both admission control and pre-emption
      1.2. Terminology
      1.3. Existing terminology
      1.4. Standardisation requirements
      1.5. Structure of rest of the document
   2. Key aspects of the framework
      2.1. Key goals
      2.2. Key assumptions
      2.3. Key benefits
   3. Architecture
      3.1. Admission control
           3.1.1. Pre-Congestion Notification for Admission Marking
           3.1.2. Measurements to support admission control
           3.1.3. How edge-to-edge admission control supports end-to-end QoS signalling
           3.1.4. Use case
      3.2. Flow pre-emption
           3.2.1. Alerting an ingress gateway that flow pre-emption may be needed
           3.2.2. Determining the right amount of CL traffic to drop
           3.2.3. Use case for flow pre-emption
   4. Details
      4.1. Ingress gateways
      4.2. Interior nodes
      4.3. Egress gateways
      4.4. Failures
      4.5. Admission of 'emergency / higher precedence' session
      4.6. Tunnelling
   5. Potential future extensions
      5.1. Mechanisms to deal with 'Flash crowds'
      5.2. Multi-domain and multi-operator usage
      5.3. Adaptive bandwidth for the Controlled Load service
      5.4. Controlled Load service with end-to-end Pre-Congestion Notification
      5.5. MPLS-TE
   6. Relationship to other QoS mechanisms
      6.1. IntServ Controlled Load
      6.2. Integrated services operation over DiffServ
      6.3. Differentiated Services
      6.4. ECN
      6.5. RTECN
      6.6. RMD
      6.7. RSVP Aggregation over MPLS-TE
   7. Security Considerations
   8. Acknowledgements
   9. Comments solicited
   10. Changes from earlier versions of the draft
   11. Appendices
      11.1. Appendix A: Explicit Congestion Notification
      11.2. Appendix B: What is distributed measurement-based admission control?
      11.3. Appendix C: Calculating the Exponentially weighted moving average (EWMA)
   12. References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Copyright Statement

1. Introduction

1.1. Summary

   This document describes a framework to achieve an end-to-end controlled load service by using - within a large region of the Internet - DiffServ and edge-to-edge distributed measurement-based admission control and flow pre-emption. Controlled load service is a quality of service (QoS) closely approximating the QoS that the same flow would receive from a lightly loaded network element [RFC2211]. Controlled Load (CL) is useful for inelastic flows such as those for real-time media.

   In line with the "IntServ over DiffServ" framework defined in [RFC2998], the CL service is supported end-to-end and RSVP signalling [RFC2205] is used end-to-end, over an edge-to-edge DiffServ region.

 ___    ___    _______________________________________    ___    ___
|   |  |   |  |                                       |  |   |  |   |
|   |  |   |  |Ingress         Interior        Egress |  |   |  |   |
|   |  |   |  |gateway          nodes         gateway |  |   |  |   |
|   |  |   |  |-------+  +-------+  +-------+  +------|  |   |  |   |
|   |  |   |  | PCN-  |  | PCN-  |  | PCN-  |  |      |  |   |  |   |
|   |..|   |..|marking|..|marking|..|marking|..| Meter|..|   |..|   |
|   |  |   |  |-------+  +-------+  +-------+  +------|  |   |  |   |
|   |  |   |  |   \                             /     |  |   |  |   |
|   |  |   |  |    \                           /      |  |   |  |   |
|   |  |   |  |     \Congestion-Level-Estimate/       |  |   |  |   |
|   |  |   |  |      \(for admission control)/        |  |   |  |   |
|   |  |   |  |      --<-----<----<----<-----<--      |  |   |  |   |
|   |  |   |  |      Sustainable-Aggregate-Rate       |  |   |  |   |
|   |  |   |  |        (for flow pre-emption)         |  |   |  |   |
|___|  |___|  |_______________________________________|  |___|  |___|

  Sx   Access                 CL-region                  Access   Rx
 End   Network                                           Network  End
 Host                                                             Host
              <------ edge-to-edge signalling ----->
             (for admission control & flow pre-emption)

<-------------------end-to-end QoS signalling protocol--------------->

   Figure 1: Overall QoS architecture (NB terminology explained later)

   In Section 1.1.1 we summarise how admission of new CL microflows is controlled so as to deliver the required QoS. In abnormal circumstances, for instance a disaster affecting multiple interior nodes, the QoS of existing CL microflows may degrade even if care was exercised when admitting those microflows before those circumstances arose. Therefore we also propose a mechanism (summarised in Section 1.1.2) to pre-empt some of the existing microflows. Then the remaining microflows retain their expected QoS, while improved QoS is quickly restored to lower priority traffic.

   As a fundamental building block to support these two mechanisms, we introduce "Pre-Congestion Notification". Pre-Congestion Notification (PCN) builds on the concepts of RFC 3168, "The addition of Explicit Congestion Notification to IP". The draft [PCN] proposes the respective algorithms that determine when a PCN-enabled router marks a packet with Admission Marking or Pre-emption Marking, depending on the traffic level.

   Pre-Congestion Notification can supplement any Per Hop Behaviour. In order to support CL traffic we would expect it to supplement the existing Expedited Forwarding (EF) PHB. Within the controlled edge-to-edge region, a particular packet receives the Pre-Congestion Notification behaviour if the packet's DSCP (differentiated services codepoint) is set to EF (or whatever is configured for CL traffic) and the ECN field also indicates an ECN Capable Transport.

   There are various possible ways to encode the markings into a packet, using the ECN field and perhaps other DSCPs, which are discussed in [PCN]. In this draft we use the abstract names Admission Marking and Pre-emption Marking.

   This framework assumes that the Pre-Congestion Notification behaviour is used in a controlled environment, i.e. within the controlled edge-to-edge region.

1.1.1. Flow admission control

   This document describes a new admission control procedure for an edge-to-edge region, which uses new per-hop Pre-Congestion Notification 'admission marking' as a fundamental building block. In turn, an end-to-end CL service would use this as a building block within a broader QoS architecture.

   The per-hop, edge-to-edge and end-to-end aspects are now briefly introduced in turn.

   Appendix A provides a brief summary of Explicit Congestion Notification (ECN) [RFC3168]. It specifies that a router sets the ECN field to the Congestion Experienced (CE) value as a warning of incipient congestion. RFC 3168 doesn't specify a particular algorithm for setting the CE codepoint, although RED (Random Early Detection) is expected to be used.

   Pre-Congestion Notification (PCN) builds on the concepts of ECN. PCN introduces a new algorithm that Admission Marks packets before there is any significant build-up of CL packets in the queue. Admission marked packets therefore act as an "early warning" that the amount of traffic flowing is getting close to the engineered capacity. Hence PCN can be used with per-hop behaviours (PHBs) designed to operate with very low queue occupancy, such as Expedited Forwarding (EF). Note that our use of the ECN field operates across the CL-region, i.e. edge-to-edge, and not host-to-host as in [RFC3168].

   Turning next to the edge-to-edge aspect: all nodes within a region of the Internet, which we call the CL-region, apply the PHB used for CL traffic and the Pre-Congestion Notification behaviour. Traffic must enter/leave the CL-region through ingress/egress gateways, which have special functionality. Typically the CL-region is the core or backbone of an operator. The CL service is achieved "edge-to-edge" across the CL-region by using distributed measurement-based admission control: the decision whether to admit a new microflow depends on a measurement of the existing traffic between the same pair of ingress and egress gateways (i.e. the same pair as the prospective new microflow). (See Appendix B for further discussion of "What is distributed measurement-based admission control?")

   As CL packets travel across the CL-region, nodes will admission mark packets (according to the Pre-Congestion Notification algorithm) as an "early warning" of potential congestion, i.e. before there is any significant build-up of CL packets in the queue. For traffic from each remote ingress gateway, the CL-region's egress gateway measures the fraction of CL traffic that is admission marked. The egress gateway calculates this value on a per bit basis, as an exponentially weighted moving average (which we term the Congestion-Level-Estimate), and reports it to the CL-region's ingress gateway, piggy-backed on the signalling for a new flow. The ingress gateway only admits the new CL microflow if the Congestion-Level-Estimate is less than the value of the CLE-threshold. Hence previously accepted CL microflows will suffer minimal queuing delay, jitter and loss.

   In turn, the edge-to-edge architecture is a building block in delivering an end-to-end CL service. The approach is similar to that described in [RFC2998] for Integrated services operation over DiffServ networks. Like [RFC2998], an IntServ class (CL in our case) is achieved end-to-end, with a CL-region viewed as a single reservation hop in the total end-to-end path. Interior nodes of the CL-region do not process flow signalling, nor do they hold flow state. We assume that the end-to-end signalling mechanism is RSVP (Section 2.2). However, the RSVP signalling may itself be originated or terminated by proxies still closer to the edge of the network, such as home hubs or the like, triggered in turn by application layer signalling. [RFC2998] and our approach are compared further in Section 6.2.

   An important benefit compared with the IntServ over DiffServ model [RFC2998] arises from the fact that the load is controlled dynamically rather than with traffic conditioning agreements (TCAs). TCAs were originally introduced in the (informational) DiffServ architecture [RFC2475] as an alternative to reservation processing in the interior region, in order to reduce the burden on interior nodes. With TCAs, in practice service providers rely on subscription-time Service Level Agreements that statically define the parameters of the traffic that will be accepted from a customer. The problem arises because the TCA at the ingress must allow any destination address if it is to remain scalable. But for longer topologies, the chances increase that traffic will focus on an interior resource, even though it is within contract at the ingress [Reid], e.g. all flows converge on the same egress gateway. Even though networks can be engineered to make such failures rare, when they occur all inelastic flows through the congested resource fail catastrophically.

   Distributed measurement-based admission control avoids reservation processing (whether per flow or aggregated) on interior nodes, but flows are still blocked dynamically in response to actual congestion on any interior node. Hence there is no need for accurate or conservative prediction of the traffic matrix.

1.1.2. Flow pre-emption

   An essential QoS issue in core and backbone networks is being able to cope with failures of nodes and links. The consequent re-routing can cause severe congestion on some links and hence degrade the QoS experienced by on-going microflows and other, lower priority traffic. Even when the network is engineered to sustain a single link failure, multiple link failures (e.g. due to a fibre cut or a node failure, or a natural disaster) can cause violation of capacity constraints and resulting QoS failures. Our solution uses rate-based flow pre-emption, so that enough of the previously admitted CL microflows are dropped to ensure that the remaining ones again receive QoS commensurate with the CL service, and at least some QoS is quickly restored to other traffic classes.

   The solution has two aspects. First, triggering the ingress gateway to test whether pre-emption may be needed: a router enhanced with Pre-Congestion Notification may optionally include an algorithm that sets packets into the Pre-emption Marked state. Such a packet alerts the egress gateway that pre-emption may be needed, which in turn sends a Pre-emption Alert message to the ingress gateway. Secondly, calculating the right amount of traffic to drop: this involves the egress gateway measuring, and reporting to the ingress gateway, the current amount of CL traffic received from that particular ingress gateway. The ingress gateway compares this measurement (which is the amount that the network can actually support, and which we thus call the Sustainable-Aggregate-Rate) with the rate that it is sending, and hence determines how much traffic needs to be pre-empted.

   The solution operates within a little over one round trip time: the time required for microflow packets that have experienced Pre-emption Marking to travel downstream through the CL-region and arrive at the egress gateway, plus some additional time for the egress gateway to measure the rate seen after it has been alerted that pre-emption may be needed, and the time for the egress gateway to report this information to the ingress gateway.

1.1.3. Both admission control and pre-emption

   This document describes both the admission control and pre-emption mechanisms, and we suggest that an operator uses both. However, we do not require this, and some operators may want to implement only one.

   For example, an operator could use just admission control, solving heavy congestion (caused by re-routing) by 'just waiting': as sessions end, existing microflows naturally depart from the system over time, and the admission control mechanism will prevent admission of new microflows that use the affected links. So the CL-region will naturally return to normal controlled load service, but with reduced capacity. The drawback of this approach would be that until flows naturally depart to relieve the congestion, all flows and lower priority services will be adversely affected. As another example, an operator could use just admission control, avoiding heavy congestion (caused by re-routing) by 'capacity planning': configuring admission control thresholds to lower levels than the network could accept in normal situations, such that the load after failure is expected to stay below acceptable levels even with reduced network resources.

   On the other hand, an operator could rely for admission control just on the traffic conditioning agreements of the DiffServ architecture [RFC2475]. The pre-emption mechanism described in this document would then be used to counteract the problem described at the end of Section 1.1.1.

1.2. Terminology

   This terminology is copied from the pre-congestion notification marking draft [PCN]:

   o Pre-Congestion Notification (PCN): two new algorithms that determine when a PCN-enabled router Admission Marks and Pre-emption Marks a packet, depending on the traffic level.

   o Admission Marking condition: the traffic level is such that the router Admission Marks packets. The router provides an "early warning" that the load is nearing the engineered admission control capacity, before there is any significant build-up of CL packets in the queue.

   o Pre-emption Marking condition: the traffic level is such that the router Pre-emption Marks packets. The router warns explicitly that pre-emption may be needed.

   o Configured-admission-rate: the reference rate used by the admission marking algorithm in a PCN-enabled router.

   o Configured-pre-emption-rate: the reference rate used by the pre-emption marking algorithm in a PCN-enabled router.

   The following terms are defined here:

   o Ingress gateway: node at an ingress to the CL-region. A CL-region may have several ingress gateways.

   o Egress gateway: node at an egress from the CL-region. A CL-region may have several egress gateways.

   o Interior node: a node which is part of the CL-region, but isn't an ingress or egress node.

   o CL-region: a region of the Internet in which all traffic enters/leaves through an ingress/egress gateway and all nodes run Pre-Congestion Notification marking. A CL-region is a DiffServ region (a DiffServ region is either a single DiffServ domain or a set of contiguous DiffServ domains), but note that the CL-region does not use the traffic conditioning agreements (TCAs) of the (informational) DiffServ architecture.

   o CL-region-aggregate: all the microflows between a specific pair of ingress and egress gateways. Note there is no identifier unique to the aggregate.

   o Congestion-Level-Estimate: the number of bits in CL packets that are admission marked, divided by the number of bits in all CL packets. It is calculated as an exponentially weighted moving average, by an egress gateway, for the CL packets from a particular ingress gateway; i.e. there is a Congestion-Level-Estimate for each CL-region-aggregate.

   o Sustainable-Aggregate-Rate: the rate of traffic that the network can actually support for a specific CL-region-aggregate. It is measured by an egress gateway for the CL packets from a particular ingress gateway.

1.3. Existing terminology

   This is a placeholder for useful terminology that is defined elsewhere.

1.4. Standardisation requirements

   The framework described in this document has two new standardisation requirements:

   o new Pre-Congestion Notification Admission Marking and Pre-emption Marking behaviours are required, as detailed in [PCN].

   o the end-to-end signalling protocol needs to be modified to carry the Congestion-Level-Estimate report (for admission control) and the Sustainable-Aggregate-Rate (for flow pre-emption). With our assumption of RSVP (Section 2.2) as the end-to-end signalling protocol, this means that extensions to RSVP are required, as detailed in [RSVP-ECN], for example to carry the Congestion-Level-Estimate and Sustainable-Aggregate-Rate information from egress gateway to ingress gateway.

   Other than these extensions, the arrangement uses existing IETF protocols throughout, although not in their usual architecture.

1.5. Structure of rest of the document

   Section 2 describes some key aspects of the framework: our goals, assumptions and the benefits we believe it has. Section 3 describes the architecture (including a use case), whilst Section 4 summarises the required changes to the various nodes in the CL-region. Section 5 outlines some possible extensions. Section 6 provides some comparison with existing QoS mechanisms.

2. Key aspects of the framework

   In this section we discuss the key aspects of the framework:

   o At a high level, our key goals, i.e. the functionality that we want to achieve

   o The assumptions that we're prepared to make

   o The consequent benefits they bring

2.1. Key goals

   The framework achieves an end-to-end controlled load (CL) service where a segment of the end-to-end path is an edge-to-edge Pre-Congestion Notification region. CL is a quality of service (QoS) closely approximating the QoS that the same flow would receive from a lightly loaded network element [RFC2211]. It is useful for inelastic flows such as those for real-time media.

   o The CL service should be achieved despite varying load levels of other sorts of traffic, which may or may not be rate adaptive (i.e. responsive to packet drops or ECN marks).

   o The CL service should be supported for a variety of possible CL sources: Constant Bit Rate (CBR), Variable Bit Rate (VBR) and voice with silence suppression. VBR is the most challenging to support.

   o After a localised failure in the interior of the CL-region causing heavy congestion, the CL service should recover gracefully by pre-empting (dropping) some of the admitted CL microflows, whilst preserving as many of them as possible with their full CL QoS.

   o It is suggested that flow pre-emption needs to be completed within 1-2 seconds, because it is estimated that after a few seconds many affected users will start to hang up (and then not only is a flow pre-emption mechanism redundant and possibly even counter-productive, but also many more flows than necessary to reduce congestion may hang up). Also, other, lower priority traffic classes will not be restored to partial service until the higher priority CL service reduces its load on shared links.

   o The CL service should support emergency services ([EMERG-RQTS], [EMERG-TEL]) as well as the Assured Service, which is the IP implementation of the existing ITU-T/NATO/DoD telephone system architecture known as Multi-Level Pre-emption and Precedence [ITU.MLPP.1990] [ANSI.MLPP.Spec] [ANSI.MLPP.Supplement], or MLPP. In particular, this involves admitting new high priority sessions even when admission control thresholds are reached and new routine sessions are rejected. Similarly, it involves taking into account session priorities and properties at the time of pre-empting flows.

2.2. Key assumptions

   The framework does not try to deliver the above functionality in all scenarios. We make the following assumptions about the type of scenario to be solved.

   o Edge-to-edge: all the nodes in the CL-region are upgraded with Pre-Congestion Notification, and all the ingress and egress gateways are upgraded to perform the measurement-based admission control and flow pre-emption. Note that although the upgrades required are edge-to-edge, the CL service is provided end-to-end.

   o Additional load: we assume that any additional load offered within the reaction time of the admission control mechanism doesn't move the CL-region directly from no congestion to overload. So we assume there will always be an intermediate stage where some CL packets are Admission Marked, but they are still delivered without significant QoS degradation. We believe this is valid for core and backbone networks with typical call arrival patterns (given the reaction time is little more than one round trip time across the CL-region), but is unlikely to be valid in access networks, where the granularity of an individual call becomes significant.

   o Aggregation: we assume that in normal operations there are many CL microflows within the CL-region, typically at least hundreds between any pair of ingress and egress gateways. The implication is that the solution is targeted at core and backbone networks, and possibly parts of large access networks.

   o Trust: we assume that there is trust between all the nodes in the CL-region. For example, this trust model is satisfied if one operator runs the whole of the CL-region. But we make no such assumptions about the end nodes, i.e. depending on the scenario they may be trusted or untrusted by the CL-region.

   o Signalling: we assume that the end-to-end signalling protocol is RSVP. Section 3 describes how the CL-region fits into such an end-to-end QoS scenario, whilst [RSVP-ECN] describes the extensions to RSVP that are required.

   o Separation: we assume that all nodes within the CL-region are upgraded with the CL mechanism, so the requirements of [Floyd] are met because the CL-region is an enclosed environment. Also, an operator separates CL traffic in the CL-region from outside traffic by administrative configuration of the ring of gateways around the region. Within the CL-region we assume that the CL traffic is separated from non-CL traffic.

   o Routing: we assume that one of the following applies:

      (same path) all packets between a pair of ingress and egress gateways follow the same path. This ensures that the Congestion-Level-Estimate used in the admission control procedure reflects the status of the path followed by the new flow's packets.

      (load balanced) packets between a pair of ingress and egress gateways follow different paths, but the load balancing scheme is tuned in the CL-region to distribute load such that the different paths always receive comparable relative load. This ensures that the Congestion-Level-Estimate used in the admission control procedure (which is computed taking into account packets travelling on all the paths) also approximately reflects the status of the actual path followed by the new microflow's packets.

      (worst case assumed) packets between a pair of ingress and egress gateways follow different paths, but (i) it is acceptable for the operator to keep the CL traffic between this pair of gateways to a level dictated by the most loaded of all paths between this pair of gateways (so that CL flows may be rejected - or even pre-empted in some situations - even if one or more of the paths between the pair of gateways is operating below its engineered levels); and (ii) it is acceptable for that operator to configure engineered levels below optimum levels, to compensate for the fact that the effect on the Congestion-Level-Estimate of the congestion experienced over one of the paths may be diluted by traffic received over non-congested paths, so that lower thresholds need to be used in these cases to ensure early admission control rejection and pre-emption over the congested paths.

   We are investigating ways of loosening the restrictions set by some of these assumptions, for instance:

   o Trust: to allow the CL-region to span multiple, non-trusting operators, using the technique of [Re-PCN] mentioned in Section 5.1.

   o Signalling: we believe that the solution could operate with another signalling protocol such as NSIS. It could also work with application level signalling, as suggested in [RT-ECN].

   o Additional load: we believe that the assumption is valid for core and backbone networks, with an appropriate margin between the configured-admission-rate and the capacity for CL traffic. However, in principle a burst of admission requests can occur in a short time. We expect this to be a rare event under normal conditions, but it could happen e.g. due to a 'flash crowd'. If it does, then more flows may be admitted than should be, triggering the pre-emption mechanisms. There are various approaches to how an operator might try to alleviate this issue, which are discussed in the 'Flash crowds' section (Section 5.1).

   o Separation: the assumption that CL traffic is separated from non-CL traffic implies that the CL traffic has its own PHB, not shared with other traffic. We are looking at whether it could share Expedited Forwarding's PHB, but supplemented with Pre-Congestion Notification. If this is possible, other PHBs (like Assured Forwarding) could be supplemented with the same new behaviours. This is similar to how RFC 3168 ECN was defined to supplement any PHB.

   o Routing: we are looking in greater detail at the solution in the presence of Equal Cost Multi-Path routing, and at suitable enhancements. See also the "Tunnelling" section later.

2.3. Key benefits

   We believe that the mechanism described in this document has several advantages:

   o It achieves statistical guarantees of quality of service for microflows, delivering a very low delay, jitter and packet loss service suitable for applications like voice and video calls that generate real time inelastic traffic. This is because of its per microflow admission control scheme, combined with its dynamic on-path "early warning" of potential congestion. The guarantee is at least as strong as with IntServ Controlled Load (Section 6.1 mentions why the guarantee may be somewhat better), but without the scalability problems of per-microflow IntServ.

   o It can support "Emergency" and military Multi-Level Pre-emption and Priority services, even in times of heavy congestion (perhaps caused by failure of a node within the CL-region), by pre-empting on-going "ordinary" CL microflows. See also Section 4.5.

   o It scales well, because there is no signal processing or path state held by the interior nodes of the CL-region.

   o It is resilient, again because no state is held by the interior nodes of the CL-region. Hence during an interior routing change caused by a node failure, no microflow state has to be relocated. The flow pre-emption mechanism further helps resilience, because it rapidly reduces the load to one that the CL-region can support.

   o It helps preserve, through the flow pre-emption mechanism, QoS to as many microflows as possible and to lower priority traffic in times of heavy congestion (e.g. caused by failure of an interior node). Otherwise long-lived microflows could cause loss on all CL microflows for a long time.

   o It avoids the potential catastrophic failure problem when the DiffServ architecture is used in large networks using statically provisioned capacity. This is achieved by controlling the load dynamically, based on edge-to-edge-path real-time measurement of Pre-Congestion Notification, as discussed in Section 1.1.1.

   o It requires minimal new standardisation, because it reuses existing QoS protocols and algorithms.

   o It can be deployed incrementally, region by region or network by network. Not all the regions or networks on the end-to-end path need to have it deployed. Two CL-regions can even be separated by a network that uses another QoS mechanism (e.g. MPLS-TE).

   o It provides a deployment path for use of ECN for real-time applications. Operators can gain experience of ECN before its applicability to end-systems is understood and end terminals are ECN capable.

3. Architecture

3.1. Admission control

   In this section we describe the admission control mechanism. We discuss the three pieces of the solution and then give an example of how they fit together in a use case:

   o the new Pre-Congestion Notification for Admission Marking used by all nodes in the CL-region

   o how the measurements made support our admission control mechanism

   o how the edge-to-edge mechanism fits into the end-to-end RSVP signalling

3.1.1. Pre-Congestion Notification for Admission Marking

   This is discussed in [PCN]. Here we only give a brief outline.

   To support our admission control mechanism, each node in the CL-region runs an algorithm to determine whether to set a packet into the Admission Marked state. The algorithm measures the aggregate CL traffic on the link and ensures that packets are admission marked before the actual queue builds up, but when it is in danger of doing so soon; the probability of admission marking increases with the danger. The algorithm's main parameter is the configured-admission-rate, which is set lower than the link speed, perhaps considerably so. Admission marked packets indicate that the CL traffic rate is reaching the configured-admission-rate, and so act as an "early warning" that the engineered capacity is nearly reached. Therefore they indicate that requests to admit prospective new CL flows may need to be refused. A simplified sketch of such a marking algorithm is given below.
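
   As an illustration only (the normative algorithms are specified in [PCN]; the class, helper names and parameter values below are invented for this example), such admission marking can be pictured as a token bucket that fills at the configured-admission-rate, with the marking probability rising as the bucket drains:

      import random, time

      class AdmissionMarker:
          """Illustrative sketch only: a token bucket filled at the
          configured-admission-rate.  Sustained CL traffic above that
          rate drains the bucket, and packets are then admission marked
          with a probability that grows as the bucket empties."""

          def __init__(self, admission_rate_bps, bucket_depth_bits):
              self.rate = admission_rate_bps    # configured-admission-rate
              self.depth = bucket_depth_bits    # absorbs transient bursts
              self.tokens = bucket_depth_bits
              self.last = time.monotonic()

          def should_admission_mark(self, pkt_size_bits):
              now = time.monotonic()
              self.tokens = min(self.depth,
                                self.tokens + self.rate * (now - self.last))
              self.last = now
              self.tokens -= pkt_size_bits  # may go negative under overload
              fill = max(0.0, self.tokens) / self.depth
              # The emptier the bucket, the likelier the "early warning".
              return random.random() > fill

   With CL traffic comfortably below the configured-admission-rate the bucket stays full and (almost) nothing is marked; as the rate approaches it, the marked fraction - and hence the Congestion-Level-Estimate seen at the egress - rises.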

3.1.2. Measurements to support admission control

   To support our admission control mechanism, the egress gateway measures the Congestion-Level-Estimate for traffic from each remote ingress gateway, i.e. per CL-region-aggregate. The Congestion-Level-Estimate is the number of bits in CL packets that are admission marked, divided by the number of bits in all CL packets. It is calculated as an exponentially weighted moving average, by an egress node, separately for the CL packets from each particular ingress node. This Congestion-Level-Estimate provides an estimate of how near the links on the path inside the CL-region are getting to the configured-admission-rate. Note that the metering is done separately per ingress node, because there may be sufficient capacity on all the nodes on the path between one ingress gateway and a particular egress, but not from a second ingress to that same egress gateway.
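
   Purely as an illustration (Appendix C discusses how to calculate the EWMA; the per-bit decay constant below is an invented value, not a recommendation), the egress gateway's meter for one CL-region-aggregate might look like:

      class CLEMeter:
          """Sketch of one per-ingress Congestion-Level-Estimate meter:
          the fraction of CL bits carried in admission marked packets,
          smoothed as an exponentially weighted moving average."""

          def __init__(self, decay_per_bit=0.999999):
              self.decay = decay_per_bit  # how much history one bit keeps
              self.cle = 0.0              # Congestion-Level-Estimate, 0..1

          def on_cl_packet(self, size_bits, admission_marked):
              sample = 1.0 if admission_marked else 0.0
              # Per-bit decay, so larger packets move the average more.
              keep = self.decay ** size_bits
              self.cle = keep * self.cle + (1.0 - keep) * sample
              return self.cle

      # One meter per CL-region-aggregate, i.e. per remote ingress gateway:
      cle_meters = {}   # ingress gateway id -> CLEMeter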

3.1.3. How edge-to-edge admission control supports end-to-end QoS signalling

   Consider a scenario that consists of two end hosts, each connected to their own access network, which are linked by the CL-region. A source tries to set up a new CL microflow by sending an RSVP PATH message, and the receiving end host replies with an RSVP RESV message. Outside the CL-region some other method, for instance IntServ, is used to provide QoS. From the perspective of RSVP the CL-region is a single hop, so the RSVP PATH and RESV messages are processed by the ingress and egress gateways but are carried transparently across all the interior nodes; hence the ingress and egress gateways hold per microflow state, whilst no state is kept by the interior nodes. So far this is as in IntServ over DiffServ [RFC2998]. However, in order to support our admission control mechanism, the egress gateway adds to the RESV message an opaque object which states the current Congestion-Level-Estimate for the relevant CL-region-aggregate. Details of the corresponding RSVP extensions are described in [RSVP-ECN].

3.1.4. Use case

   To see how the three pieces of the solution fit together, we imagine a scenario where some microflows are already in place between a given pair of ingress and egress gateways, but the traffic load is such that no packets from these flows are admission marked as they travel across the CL-region. A source wanting to start a new CL microflow sends an RSVP PATH message. The egress gateway adds an object to the RESV message with the Congestion-Level-Estimate, which is zero. The ingress gateway sees this and consequently admits the new flow. It then forwards the RSVP RESV message upstream towards the source end host. Hence, assuming there's sufficient capacity in the access networks, the new microflow is admitted end-to-end.

   The source now sends CL packets, which arrive at the ingress gateway. The ingress uses a five-tuple filter to identify that the packets are part of a previously admitted CL microflow, and it also polices the microflow to ensure it remains within its traffic profile. (The ingress has learnt the required information from the RSVP messages.) When forwarding a packet belonging to an admitted microflow, the ingress sets the packet's DSCP and ECN fields to the appropriate values configured for the CL-region. The CL packet now travels across the CL-region, getting admission marked if necessary.

   Next, we imagine the same scenario but at a later time, when load is higher at one (or more) of the interior nodes, which start to set CL packets into the Admission Marked state because their load on the outgoing link is nearing the configured-admission-rate. The next time a source tries to set up a CL microflow, the ingress gateway learns (from the egress) the relevant Congestion-Level-Estimate. If it is greater than some CLE-threshold value then the ingress refuses the request, otherwise it is accepted.

   It is also possible for an egress gateway to get an RSVP RESV message and not know what the Congestion-Level-Estimate is, for example if there are no CL microflows at present between the relevant ingress and egress gateways. In this case the egress requests the ingress to send probe packets, from which it can initialise its meter. RSVP extensions for such a request to send probe data can be found in [RSVP-ECN].
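
   The resulting decision logic at the ingress gateway is then a simple threshold test. The sketch below is illustrative only: the RSVP object that carries the estimate is defined in [RSVP-ECN], and the threshold value is invented here. (When the egress has no usable estimate, it instead requests probes as just described, and the decision waits for a fresh report.)

      CLE_THRESHOLD = 0.05   # operator-configured; value is illustrative

      def admit_new_microflow(cle_report, policy_ok=True):
          """Ingress gateway decision on receiving the RESV: admit only
          if the reported Congestion-Level-Estimate is under threshold
          (and local and/or policy-decision-point policy agrees)."""
          return policy_ok and cle_report < CLE_THRESHOLD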

3.2. Flow pre-emption

   In this section we describe the flow pre-emption mechanism. We discuss the two parts of the solution and then give an example of how they fit together in a use case:

   o how an ingress gateway is triggered to test whether flow pre-emption may be needed

   o how an ingress gateway determines the right amount of CL traffic to drop

   The mechanism is defined in [PCN] and [RSVP-ECN].

3.2.1. Alerting an ingress gateway that flow pre-emption may be needed

   Alerting an ingress gateway that flow pre-emption may be needed is a two stage process: a router in the CL-region alerts an egress gateway that flow pre-emption may be needed; in turn the egress gateway alerts the relevant ingress gateway. Every router in the CL-region has the ability to alert egress gateways, which may be done either explicitly or implicitly:

   o Explicit - the router per-hop behaviour is supplemented with a new Pre-emption Marking behaviour, which is outlined below. Reception of such a packet by the egress gateway alerts it that pre-emption may be needed.

   o Implicit - the router behaviour is unchanged from the Admission Marking behaviour described earlier. The egress gateway treats a Congestion-Level-Estimate of (almost) 100% as an implicit alert that pre-emption may be required. ('Almost' because the Congestion-Level-Estimate is a moving average, so it can never reach exactly 100%.)

   To support explicit pre-emption alerting, each node in the CL-region runs an algorithm to determine whether to set a packet into the Pre-emption Marked state. The algorithm measures the aggregate CL traffic and ensures that packets are pre-emption marked before the actual queue builds up. The algorithm's main parameter is the configured-pre-emption-rate, which is set lower than the link speed (but higher than the configured-admission-rate). Thus pre-emption marked packets indicate that the CL traffic rate is reaching the configured-pre-emption-rate, and so act as an "early warning" that the engineered capacity is nearly reached. Therefore they indicate that it may be advisable to pre-empt some of the existing CL flows in order to preserve the QoS of the others.

   Note that the explicit mechanism only makes sense if all the routers in the CL-region have this functionality, so that the egress gateways can rely on it. Otherwise there is the danger that traffic happens to focus on a router without it, and egress gateways then also have to watch for implicit pre-emption alerts.

   When one or more packets in a CL-region-aggregate alert the egress gateway of the need for flow pre-emption, whether explicitly or implicitly, the egress puts that CL-region-aggregate into the Pre-emption Alert state. For each CL-region-aggregate in alert state it measures the rate of traffic at the egress gateway (i.e. the traffic rate of the appropriate CL-region-aggregate) and reports this to the relevant ingress gateway. The steps are:

   o Determine the relevant ingress gateway - for the explicit case, the egress gateway examines the pre-emption marked packet and uses the state installed at the time of admission to determine which ingress gateway the packet came from. For the implicit case, the egress gateway has already determined this information, because the Congestion-Level-Estimate is calculated per ingress gateway.

   o Measure the traffic rate of CL packets - as soon as the egress gateway is alerted (whether explicitly or implicitly), it measures the rate of CL traffic from this ingress gateway (i.e. for this CL-region-aggregate). Note that pre-emption marked packets are excluded from that measurement. It should make its measurement quickly and accurately, but exactly how is up to the implementation.

   o Alert the ingress gateway - the egress gateway then immediately alerts the relevant ingress gateway that flow pre-emption may be required. This Alert message also includes the measured Sustainable-Aggregate-Rate, i.e. the egress rate of CL traffic for this ingress gateway. The Alert message is sent using reliable delivery. Procedures for support of such an Alert using RSVP are defined in [RSVP-ECN].

                                       _ _
              --------------         /       \        -----------------
 CL packet   |Update        |      / Is it a \    Y  | Measure CL rate |
 arrives --->|Congestion-   |---> /pre-emption\ ---->| from ingress and|
             |Level-Estimate|     \  marked   /      | alert ingress   |
              --------------       \ packet? /        -----------------
                                    \_     _/

   Figure 2: Egress gateway action for explicit Pre-emption Alert

                                       _ _
              --------------         /       \        -----------------
 CL packet   |Update        |      /   Is    \    Y  | Measure CL rate |
 arrives --->|Congestion-   |---> /  C.L.E.   \ ---->| from ingress and|
             |Level-Estimate|     \ (nearly)  /      | alert ingress   |
              --------------       \  100%?  /        -----------------
                                    \_     _/

   Figure 3: Egress gateway action for implicit Pre-emption Alert
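
   In code form, the per-packet egress behaviour of Figures 2 and 3 might be sketched as follows (illustrative only: the measure_sustainable_rate() and send_preemption_alert() helpers are hypothetical, the CLEMeter is the sketch from Section 3.1.2, and 0.99 simply stands for "(almost) 100%"):

      IMPLICIT_ALERT_CLE = 0.99   # "(almost) 100%"; value is illustrative

      def on_cl_packet_at_egress(pkt, ingress_id, meters, alerted):
          cle = meters[ingress_id].on_cl_packet(pkt.size_bits,
                                                pkt.admission_marked)
          explicit = pkt.preemption_marked          # Figure 2
          implicit = cle >= IMPLICIT_ALERT_CLE      # Figure 3
          if (explicit or implicit) and ingress_id not in alerted:
              alerted.add(ingress_id)  # aggregate enters Pre-emption Alert
              # Measure the CL rate of this CL-region-aggregate, excluding
              # pre-emption marked packets, and report it to the ingress
              # gateway as the Sustainable-Aggregate-Rate [RSVP-ECN].
              rate = measure_sustainable_rate(ingress_id)
              send_preemption_alert(ingress_id, rate)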

3.2.2. Determining the right amount of CL traffic to drop

   The method relies on the insight that the amount of CL traffic that can be supported between a particular pair of ingress and egress gateways is the amount of CL traffic that is actually getting across the CL-region to the egress gateway without being re-marked to the Pre-emption Marked state. Hence we term it the Sustainable-Aggregate-Rate.

   So when the ingress gateway gets the Alert message from an egress gateway, it compares:

   o the traffic rate that it is sending to this particular egress gateway (which we term the ingress-aggregate-rate)

   o the traffic rate that the egress gateway reports (in the Alert message) that it is receiving from this ingress gateway (which is the Sustainable-Aggregate-Rate)

   If the difference is significant, then the ingress gateway pre-empts some microflows. It only pre-empts if:

      ingress-aggregate-rate > Sustainable-Aggregate-Rate + error

   The "error" term is partly to allow for inaccuracies in the measurements of the rates. It is also needed because the ingress-aggregate-rate is measured at a slightly later moment than the Sustainable-Aggregate-Rate, and it is quite possible that the ingress-aggregate-rate has increased in the interim due to natural variation of the bit rate of the CL sources. So the "error" term allows for some variation in the ingress rate without triggering pre-emption.

   The ingress gateway should pre-empt enough microflows to ensure that:

      new ingress-aggregate-rate < Sustainable-Aggregate-Rate - error

   The "error" term here is used for similar reasons but in the other direction, to ensure slightly more load is shed than seems necessary, in case the two measurements were taken during a short-term fall in load.

   When the routers in the CL-region are using explicit pre-emption alerting, the ingress gateway would normally pre-empt microflows whenever it gets an alert (it always would if it were possible to set "error" equal to zero). For the implicit case, however, this is not so. The ingress gateway receives an Alert message when the Congestion-Level-Estimate reaches (almost) 100%, which is roughly when traffic exceeds the configured-admission-rate. However, it is only when packets are indeed dropped en route that the Sustainable-Aggregate-Rate becomes less than the ingress-aggregate-rate, so only then will pre-emption actually occur at the ingress gateway.

   Hence with the implicit scheme, pre-emption can only be triggered once the system starts dropping packets, and thus once the QoS of flows starts being significantly degraded. This is in contrast with the explicit scheme, which allows flow pre-emption to be triggered before any packet drop, simply when the traffic reaches the configured-pre-emption-rate. Therefore we believe that the explicit mechanism is superior. However, it does require new functionality on all the routers (although this is little more than a bulk token bucket - see [PCN] for details).
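
   The two inequalities above translate directly into the amount of traffic the ingress gateway needs to shed; a minimal sketch (the function and variable names are ours, not from any specification):

      def traffic_to_preempt(ingress_rate, sustainable_rate, error):
          """Pre-empt only if ingress-aggregate-rate exceeds
          Sustainable-Aggregate-Rate + error; then shed enough that the
          new ingress rate falls below Sustainable-Aggregate-Rate - error."""
          if ingress_rate <= sustainable_rate + error:
              return 0.0   # difference within measurement error: no action
          return ingress_rate - (sustainable_rate - error)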

3.2.3. Use case for flow pre-emption

   To see how the pieces of the solution fit together in a use case, we imagine a scenario where many microflows have already been admitted. We confine our description to the explicit pre-emption mechanism. Now an interior router in the CL-region fails. The network layer routing protocol re-routes round the problem, but as a consequence traffic on other links increases. In fact, let's assume the traffic on one link now exceeds its configured-pre-emption-rate, and so the router pre-emption marks CL packets. When the egress sees the first of the pre-emption marked packets, it immediately determines which microflow this packet is part of (by using a five-tuple filter and comparing it with state installed at admission) and hence which ingress gateway the packet came from. It sets up a meter to measure the traffic rate from this ingress gateway, and as soon as possible sends a message to the ingress gateway. This message alerts the ingress gateway that pre-emption may be needed and contains the traffic rate measured by the egress gateway. Then the ingress gateway determines the traffic rate that it is sending towards this egress gateway, and hence it can calculate the amount of traffic that needs to be pre-empted.

   The ingress gateway could now just shed random microflows, but it is better if the least important ones are dropped. The ingress gateway could use information stored locally in each reservation's state (such as, for example, the RSVP pre-emption priority) as well as information provided by a policy decision point in order to decide which of the flows to shed (or perhaps which ones not to shed). The ingress gateway then initiates RSVP signalling to instruct the relevant destinations that their session has been terminated, and to tell (RSVP) nodes along the path to tear down associated RSVP state. To guard against recalcitrant sources, normal IntServ policing will block any future traffic from the dropped flows from entering the CL-region. Note that - with the explicit Pre-emption Alert mechanism - since the configured-pre-emption-rate may be significantly less than the physical line capacity, flow pre-emption may be triggered before any congestion has actually occurred and before any packet is dropped.

   We extend the scenario further by imagining that (due to a disaster of some kind) further routers in the CL-region fail during the time taken by the pre-emption process described above. This is handled naturally, as packets will continue to be pre-emption marked and so the pre-emption process will happen for a second time.

   Flow pre-emption also helps emergency/military calls, by taking into account the corresponding call priorities when selecting calls to be pre-empted, which is likely to be particularly important in a disaster scenario.
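
   As the draft deliberately leaves the shedding policy open, the following is just one plausible sketch: drop the lowest priority reservations first (and, within a priority level, the highest rate ones) until the required reduction is reached. The flow attributes are hypothetical:

      def choose_flows_to_preempt(flows, rate_to_shed):
          victims, shed = [], 0.0
          # Lowest pre-emption priority first; largest flows first within
          # a priority level, so fewer sessions need to be torn down.
          for flow in sorted(flows, key=lambda f: (f.priority, -f.rate)):
              if shed >= rate_to_shed:
                  break
              victims.append(flow)
              shed += flow.rate
          return victims   # tear these down via RSVP, then police them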
Flow pre-emption also helps emergency/military calls by taking into account the corresponding call priorities when selecting calls to be pre-empted, which is likely to be particularly important in a disaster scenario.

4. Details

This section is intended to provide a systematic summary of the new functionality required by the routers in the CL-region.

A network operator upgrades normal IP routers by:

o Adding functionality related to admission control and flow pre-emption to all its ingress and egress gateways

o Adding Pre-Congestion Notification for Admission and Pre-emption Marking to all the nodes in the CL-region.

We consider the detailed actions required for each of the types of node in turn.

4.1. Ingress gateways

Ingress gateways perform the following tasks:

o Classify incoming packets - decide whether they are CL or non-CL packets. This is done using an IntServ filter spec (source and destination addresses and port numbers), whose details have been gathered from the RSVP messaging.

o Police - check that the microflow conforms with what has been agreed (i.e. it keeps to its agreed data rate). If necessary, the following may be policed: packets which do not correspond to any reservation, packets which are in excess of the rate agreed for their reservation, and packets for a reservation that has earlier been pre-empted. Policing may be achieved via dropping or via re-marking of the packet's DSCP to a value different from the CL behaviour aggregate.

o Packet ECN colouring - for CL microflows, set the ECN field appropriately (see [PCN] for some discussion of encoding)

o Perform 'interior node' functions (see next sub-section)

o Admission Control - on new session establishment, consider the Congestion-Level-Estimate received from the corresponding egress gateway and, most likely by comparing it with a simple configured CLE-threshold, decide whether the new call is to be admitted or rejected (taking into account local policy information as well as, optionally, information provided by a policy decision point).

o Probe - if requested by the egress gateway to do so, the ingress gateway generates probe traffic so that the egress gateway can compute the Congestion-Level-Estimate from this ingress gateway. Probe packets may be simple data addressed to the egress gateway and require no protocol standardisation, although there will be best practice for their number, size and rate.

o Measure - when it receives an Alert message from an egress gateway, it determines the rate at which it is sending packets to that egress gateway

o Pre-empt - calculate how much CL traffic needs to be pre-empted; decide which microflows should be dropped, perhaps in consultation with a Policy Decision Point; and do the necessary signalling to drop them.

4.2. Interior nodes

Interior nodes do the following tasks:

o Classify packets - examine the DSCP and ECN field to see whether it is a CL packet

o Non-CL packets are handled as usual, with respect to dropping them or setting their CE codepoint.

o Pre-Congestion Notification - CL packets are Admission Marked and Pre-emption Marked according to the algorithm detailed in [PCN] and outlined in Section 3 (a rough illustrative sketch follows this list).
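As a very rough illustration only (the normative algorithms are defined in [PCN]; this sketch merely assumes a virtual queue drained at the configured-admission-rate and a token bucket filled at the configured-pre-emption-rate, with invented class and parameter names):

   class PCNMarker:
       """Simplified per-interface Pre-Congestion Notification marker."""

       def __init__(self, admission_rate, preemption_rate,
                    vq_threshold, bucket_depth):
           self.admission_rate = admission_rate    # bit/s
           self.preemption_rate = preemption_rate  # bit/s
           self.vq_threshold = vq_threshold        # bits
           self.bucket_depth = bucket_depth        # bits
           self.vq = 0.0                           # virtual queue, bits
           self.tokens = bucket_depth              # token bucket, bits
           self.last = None                        # previous packet time

       def mark(self, now, packet_bits):
           if self.last is not None:
               dt = now - self.last
               # Drain the virtual queue at the configured-admission-rate
               # and refill tokens at the configured-pre-emption-rate.
               self.vq = max(0.0, self.vq - self.admission_rate * dt)
               self.tokens = min(self.bucket_depth,
                                 self.tokens + self.preemption_rate * dt)
           self.last = now
           self.vq += packet_bits
           if self.tokens < packet_bits:
               return "preemption-marked"   # above pre-emption rate
           self.tokens -= packet_bits
           if self.vq > self.vq_threshold:
               return "admission-marked"    # early warning of congestion
           return "unmarked"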
4.3. Egress gateways

Egress gateways do the following tasks:

o Classify packets - determine which ingress gateway a CL packet has come from. This is the previous RSVP hop, hence the necessary details are obtained just as with IntServ from the state associated with the packet five-tuple, which has been built using information from the RSVP messages.

o Meter - for CL packets, calculate the fraction of the total number of bits which are in Admission Marked packets. The calculation is done as an exponentially weighted moving average (see Appendix C). A separate calculation is made for CL packets from each ingress gateway. The meter works on an aggregate basis and not per microflow.

o Signal the Congestion-Level-Estimate - this is piggy-backed on the reservation reply. An egress gateway's interface is configured to know it is an egress gateway, so it always appends this to the RESV message. If the Congestion-Level-Estimate is unknown or is too stale, then the egress gateway can request the ingress gateway to send probes.

o Packet colouring - for CL packets, set the DSCP and the ECN field to whatever has been agreed as appropriate for the next domain. By default the ECN field is set to the Not-ECT codepoint. See also the discussion in the Tunnelling section later.

o Measure the rate - measure the rate of CL traffic from a particular ingress gateway (i.e. the rate for the CL-region-aggregate), when alerted (either explicitly or implicitly) that pre-emption may be required. The measured rate is reported back to the appropriate ingress gateway [RSVP-ECN].

4.4. Failures

If an interior node fails, then the regular IP routing protocol will re-route round it. If the new route can carry all the admitted traffic, flows will gracefully continue. If instead this causes early warning of congestion on the new route, then admission control based on pre-congestion notification will ensure that new flows are not admitted until enough existing flows have departed. Finally, re-routing may result in heavy congestion, in which case the pre-emption mechanism will kick in.

If a gateway fails, then we would like regular RSVP procedures [RFC2205] to take care of things. With the local repair mechanism of [RFC2205], when a route changes the next RSVP PATH refresh message will establish path state along the new route, and thus attempt to re-establish reservations through the new ingress gateway. Essentially the same procedure is used as described earlier in this document, with the re-routed session treated as a new session request.

In more detail, consider what happens if an ingress gateway of the CL-region fails. Then RSVP routers upstream of it do IP re-routing to a new ingress gateway. The next time the upstream RSVP router sends a PATH refresh message, it reaches the new ingress gateway, which therefore installs the associated RSVP state. The next RSVP RESV refresh will pick up the Congestion-Level-Estimate from the egress gateway, and the ingress compares this with its threshold to decide whether to admit the new session. This could result in some of the flows being rejected, but those accepted will receive the full QoS.
An issue with this is that we have to wait until PATH and RESV refresh messages are sent - which may not be very often; the default period is 30 seconds. [RFC2205] discusses how to speed up the local repair mechanism. First, the RSVP module is notified by the local routing protocol module of a route change to particular destinations, which triggers it to rapidly send out PATH refresh messages. Further, when a PATH refresh arrives with a previous hop address different from the one stored, RESV refreshes are immediately sent to that previous hop. Where RSVP is operating hop-by-hop, i.e. on every router, triggering the PATH refresh is easy, as the node can simply monitor its local link. Thus, this fast local repair mechanism can be used to deal with failures upstream of the ingress gateway, with failures of the ingress gateway and with failures downstream of the egress gateway.

But where RSVP is not operating hop-by-hop (as is the case within the CL-region), it is not so easy to trigger the PATH refresh.

Unfortunately, this problem applies if an egress gateway fails, since it is very likely that an egress gateway is several IP hops from the ingress gateway. (If the ingress is several IP hops from its previous RSVP node, then there is the same issue.) The options appear to be:

o the ingress gateway has a link state database for the CL-region, so it can detect that an egress gateway has failed or become unreachable

o there is an inter-gateway protocol, so the ingress can continuously check that the egress gateways are still alive

o (default) do nothing and wait for the regular PATH/RESV refreshes (and, if needed, the pre-emption mechanism) to sort things out.

4.5. Admission of 'emergency / higher precedence' sessions

Section 4.1 describes how, if the Congestion-Level-Estimate is greater than the CLE-threshold, all new sessions are refused. But it is unsatisfactory to block emergency calls, for instance. Therefore it is recommended that an 'emergency / higher precedence' call is admitted immediately, even if the CLE-threshold is exceeded. Usually the network can actually handle the additional microflow, because there is a safety margin between the configured-admission-rate and the configured-pre-emption-rate. Normal call termination behaviour will soon bring the traffic level down below the configured-admission-rate. However, in exceptional circumstances the 'emergency / higher precedence' call may cause the traffic level to exceed the configured-pre-emption-rate; then the usual pre-emption mechanism will pre-empt enough (non 'emergency / higher precedence') microflows to bring the total traffic back under the configured-pre-emption-rate.
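Combining this rule with the CLE-threshold test of Section 4.1, the admission decision at the ingress might be sketched as follows (illustrative Python; the threshold value and names are invented):

   CLE_THRESHOLD = 0.05   # configured CLE-threshold; example value only

   def admit_session(congestion_level_estimate, higher_precedence=False):
       """Admission decision at the ingress gateway (Sections 4.1, 4.5)."""
       if higher_precedence:
           # 'Emergency / higher precedence' calls are admitted
           # immediately, relying on the safety margin between the
           # configured-admission-rate and the configured-pre-emption-
           # rate (and, in extremis, on pre-emption of ordinary flows).
           return True
       return congestion_level_estimate < CLE_THRESHOLD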
4.6. Tunnelling

It is possible to tunnel all CL packets across the CL-region. Although there is a cost to tunnelling (an additional header on each packet, and additional processing at tunnel ingress and egress), there are three reasons it may be interesting.

ECMP:

If the CL-region uses Equal Cost Multipath Routing (ECMP), then traffic between a particular pair of ingress and egress gateways may follow several different paths. Why? An ECMP-enabled router runs an algorithm to choose between potential outgoing links, based on a hash of fields such as the packet's source and destination addresses - exactly which fields depends on the proprietary algorithm. Packets are addressed to the CL flow's end-point, and therefore different flows may follow different paths through the CL-region.

The problem is that if one of the paths is congested such that packets are being admission marked, then the Congestion-Level-Estimate measured by the egress gateway will be diluted by unmarked packets from other, non-congested paths. Similarly, the measurement of the Sustainable-Aggregate-Rate will also be diluted.

One solution is to tunnel across the CL-region. Then the destination address (and so on) seen by the ECMP algorithm is that of the egress gateway, so all flows follow the same path.

Ingress gateway determination:

If packets are tunnelled from ingress gateway to egress gateway, the egress gateway can very easily determine in the datapath which ingress gateway a packet comes from (by simply looking at the source address of the tunnel header). This can facilitate operations such as computing the Congestion-Level-Estimate on a per ingress gateway basis.

End-to-end ECN:

The ECN field is used for PCN marking (see [PCN] for details), and so it needs to be re-set by the egress gateway to whatever has been agreed as appropriate for the next domain. Therefore if a packet arrives at the ingress gateway with its ECN field already set (i.e. not '00'), it may leave the egress gateway with a different value. Hence the end-to-end meaning of the ECN field is lost.

It is open to debate whether end-to-end congestion control is ever necessary within an end-to-end reservation. But if a genuine need is identified for end-to-end ECN semantics within a reservation, then one solution is to tunnel CL packets across the CL-region. When the egress gateway decapsulates them, the original ECN field is recovered.

5. Potential future extensions

5.1. Mechanisms to deal with 'Flash crowds'

There is a time lag between the admission control decision (which depends on the Congestion-Level-Estimate during RSVP signalling at call set-up) and when the data is actually sent (after the called party has answered). In PSTN terms this is the time the phone rings. Normally the time lag doesn't matter much because (1) in the CL-region there are many flows, and they terminate and are answered at roughly the same rate, and (2) the network can still operate safely when the traffic level is some margin above the configured-admission-rate.

A 'flash crowd' occurs when something causes many calls to be initiated in a short period of time - for instance a 'televote'. So there is a danger that a 'flash' of calls is accepted, but when the calls are answered and data flows, the traffic overloads the network. There are various possible ways an operator could try to address the problem.

The simplest option is to do nothing; the operator relies on the pre-emption mechanism if there is a problem. This doesn't seem a good choice, as 'flash crowds' are reasonably common on the PSTN, unless the operator can ensure that nearly all 'flash crowd' events are blocked in the access network and so do not impact on the CL-region.
A second option is to send 'dummy data' as soon as the call is admitted, thus effectively reserving the bandwidth whilst waiting for the called party to answer. Reserving bandwidth in advance means that the network cannot admit as many calls. For example, if sessions last 100 seconds and ringing lasts 10 seconds, the cost is a 10% loss of capacity. It may be possible to offset this somewhat by increasing the configured-admission-rate in the routers, but this would need further investigation.

A concern with this 'dummy data' option is that it may allow an attacker to initiate many calls that are never answered (by a cooperating attacker), so that eventually the network would only be carrying 'dummy data'. The attack exploits the fact that charging only starts when the call is answered and not when it is dialled. It may be possible to alleviate the attack at the session layer - for example, when the ingress gateway gets an RSVP PATH message it checks that the source has been well-behaved recently.

A third option is that the egress gateway limits the rate at which it sends out the Congestion-Level-Estimate, or limits the rate at which calls are accepted by replying with a Congestion-Level-Estimate of 100% (this is the equivalent of 'call gapping' in the PSTN). There is a trade-off, which would need to be investigated further, between the degree of protection and possible adverse side-effects like slowing down call set-up.

A final option is to re-perform admission control before the call is answered. The ingress gateway monitors Congestion-Level-Estimate updates received from each egress. If it notices that a Congestion-Level-Estimate has risen above the CLE-threshold, then it terminates all unanswered calls through that egress (e.g. by instructing the session protocol to stop the 'ringing tone'). For extra safety the Congestion-Level-Estimate could be re-checked when the call is answered. A potential drawback for an operator that wants to emulate the PSTN is that the PSTN never drops a 'ringing' call.

5.2. Multi-domain and multi-operator usage

This potential extension would eliminate the trust assumption (Section 2.2), so that the CL-region could consist of multiple domains run by different operators that did not trust each other. Then only the ingress and egress gateways of the CL-region would take part in the admission control procedure, i.e. at the ingress to the first domain and the egress from the final domain. The border routers between operators within the CL-region would only have to do bulk accounting - they wouldn't do per microflow metering and policing, and they wouldn't take part in signal processing or hold path state [Briscoe]. [Re-feedback] explains how a downstream domain can police that its upstream domain does not 'cheat' by admitting traffic when the downstream path is over-congested. [Re-PCN] proposes how to achieve this with the help of another recently proposed extension to ECN, involving re-echoing ECN feedback [Re-ECN].

5.3. Adaptive bandwidth for the Controlled Load service

The admission control mechanism described in this document assumes that each router has a fixed bandwidth allocated to CL flows. A possible extension is that the bandwidth is flexible, depending on the level of non-CL traffic.
If a large share of the current load on a path is CL, then more CL traffic can be admitted. And if the greater share of the load is non-CL, then the admission threshold can be proportionately lower. The approach re-arranges sharing between classes to aim for economic efficiency, whatever the traffic load matrix. It also deals with unforeseen changes to capacity during failures better than configuring fixed engineered rates. Adaptive bandwidth allocation can be achieved by changing the admission marking behaviour, so that the probability of admission marking a packet would now depend on the number of queued non-CL packets as well as on the size of the virtual queue. The adaptive bandwidth approach would be supplemented by placing limits on the adaptation, to prevent starvation of CL by other traffic classes and of other classes by CL traffic.

5.4. Controlled Load service with end-to-end Pre-Congestion Notification

It may be possible to extend the framework to parts of the network where there are only a small number of CL microflows, i.e. where the aggregation assumption (Section 2.2) doesn't hold. In the extreme it may be possible to operate the framework end-to-end, i.e. between end hosts. One potential method is to send probe packets to test whether the network can support a prospective new CL microflow. The probe packets would be sent at the same traffic rate as expected for the actual microflow, but in order not to disturb existing CL traffic a router would always schedule probe packets behind CL ones (compare [Breslau00]); this implies they have a new DSCP. Otherwise the routers would treat probe packets identically to CL packets. In order to perform admission control quickly in parts of the network where there are only a few CL microflows, the Pre-Congestion marking behaviour for probe packets would switch from admission marking no packets to admission marking them all for only a minimal increase in load.

5.5. MPLS-TE

It may be possible to extend the framework for admission control of microflows into a set of MPLS-TE (Multi-protocol label switching traffic engineering) aggregates. However, it would require that the MPLS header could include the ECN field, which is not precluded by [RFC3270].

6. Relationship to other QoS mechanisms

6.1. IntServ Controlled Load

The CL mechanism delivers QoS similar to Integrated Services Controlled Load, but rather better, as queues are kept empty by driving admission control from a bulk virtual queue [AVQ, vq] on each interface, which can detect a rise in load before real queues build. It is also more robust to route changes.

6.2. Integrated services operation over DiffServ

Our approach to end-to-end QoS is similar to that described in [RFC2998] for Integrated services operation over DiffServ networks. As in [RFC2998], an IntServ class (CL in our case) is achieved end-to-end, with a CL-region viewed as a single reservation hop in the total end-to-end path. Interior routers of the CL-region do not process flow signalling, nor do they hold state. Unlike [RFC2998], we do not require the end-to-end signalling mechanism to be RSVP, although it can be.

Bearing in mind these differences, we can describe our architecture in the terms of the options in [RFC2998].
The DiffServ network region is RSVP-aware, but awareness is confined to (what [RFC2998] calls) the "border routers" of the DiffServ region. We use explicit admission control into this region, with static provisioning within it. The ingress "border router" does per microflow policing and sets the DSCP and ECN fields to indicate that the packets are CL ones (i.e. we use router marking rather than host marking).

6.3. Differentiated Services

The DiffServ architecture does not specify any way for devices outside the domain to dynamically reserve resources or receive indications of network resource availability. In practice, service providers rely on subscription-time Service Level Agreements (SLAs) that statically define the parameters of the traffic that will be accepted from a customer. The CL mechanism allows dynamic reservation of resources through the DiffServ domain and, with the potential extension mentioned in Section 5.2, it can span multiple domains without active policing mechanisms at the borders (unlike DiffServ). Therefore we do not use the traffic conditioning agreements (TCAs) of the (informational) DiffServ architecture [RFC2475].

[Johnson] compares admission control with a 'generously dimensioned' DiffServ network as ways to achieve QoS, and recommends the former.

6.4. ECN

The marking behaviour described in this document complies with the ECN aspects of the IP wire protocol [RFC3168], but provides its own edge-to-edge feedback instead of the TCP aspects of [RFC3168]. All nodes within the CL-region are upgraded with the admission marking and pre-emption marking of Pre-Congestion Notification, so the requirements of [Floyd] are met because the CL-region is an enclosed environment. The operator prevents traffic arriving at a node that doesn't understand CL by administrative configuration of the ring of gateways around the CL-region.

6.5. RTECN

Real-time ECN (RTECN) [RTECN, RTECN-usage] has a similar aim to this document (to achieve a low delay, jitter and loss service suitable for real-time traffic) and a similar approach (per microflow admission control combined with an "early warning" of potential congestion through setting the CE codepoint). But it explores a different architecture, without the aggregation assumption: host-to-host rather than edge-to-edge. We plan to document such a host-to-host framework in a parallel draft to this one, and to describe if and how [PCN] can work in this framework.

6.6. RMD

Resource Management in DiffServ (RMD) [RMD] is similar to this work, in that it pushes complex classification, traffic conditioning and admission control functions to the edge of a DiffServ domain and simplifies the operation of the interior nodes. One of the RMD modes uses measurement-based admission control; however, it works differently: each interior node measures the user traffic load in the PHB traffic aggregate, and each interior node processes a local RESERVE message and compares the requested resources with the available resources (maximum allowed load minus current load).

Hence a difference is that the CL architecture described in this document has been designed not to require interaction between interior nodes and signalling, whereas in RMD all interior nodes are QoS-NSLP aware.
So our architecture involves less processing in interior nodes, is more agnostic to signalling, and requires fewer changes to existing standards; it therefore works with existing RSVP, as well as having the potential to work with future signalling protocols like NSIS.

RMD introduced the concept of Severe Congestion handling. The pre-emption mechanism described in the CL architecture has similar objectives but relies on different mechanisms.

We plan to work together with the authors of [RMD]; the intention is that the next versions of this draft and of [PCN] will be co-authored with them.

6.7. RSVP Aggregation over MPLS-TE

Multi-protocol label switching traffic engineering (MPLS-TE) allows scalable reservation of resources in the core for an aggregate of many microflows. To achieve end-to-end reservations, admission control and policing of microflows into the aggregate can be achieved using techniques such as RSVP Aggregation over MPLS-TE Tunnels, as per [AGGRE-TE]. However, in the case of inter-provider environments, these techniques require that admission control and policing be repeated at each trust boundary, or that MPLS-TE tunnels span multiple domains.

7. Security Considerations

To protect against denial of service attacks, the ingress gateway of the CL-region needs to police all CL packets and drop packets in excess of the reservation. This is similar to operations with existing IntServ behaviour.

For pre-emption, it is considered acceptable from a security perspective that the ingress gateway can treat "emergency/military" CL flows preferentially compared with "ordinary" CL flows. However, in the rest of the CL-region they are not distinguished (nonetheless, our proposed technique does not preclude the use of different DSCPs at the packet level as well as different priorities at the flow level). Keeping emergency traffic indistinguishable at the packet level minimises the opportunity for new security attacks. For example, if instead a mechanism used different DSCPs for "emergency/military" and "ordinary" packets, then an attacker could specifically target the former in the data plane (perhaps for DoS or for eavesdropping).

Further security aspects are to be considered in a later version of this document.

8. Acknowledgements

The admission control mechanism evolved from the work led by Martin Karsten on the Guaranteed Stream Provider developed in the M3I project [GSPa, GSP-TR], which in turn was based on the theoretical work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet and June Tay (BT) helped develop and evaluate this approach.

9. Comments solicited

Comments and questions are encouraged and very welcome. They can be sent to the Transport Area Working Group's mailing list, tsvwg@ietf.org, and/or to the authors.

10. Changes from earlier versions of the draft

The main changes are:

From -00 to -01

The whole of the pre-emption mechanism is added.

There are several modifications to the admission control mechanism.

From -01 to -02

The pre-congestion notification algorithms for admission marking and pre-emption marking are now described in [PCN].
There are new sub-sections in Section 4 on Failures, Admission of 'emergency / higher precedence' sessions, and Tunnelling; and a new sub-section in Section 5 on Mechanisms to deal with 'Flash crowds'.

11. Appendices

11.1. Appendix A: Explicit Congestion Notification

This Appendix provides a brief summary of Explicit Congestion Notification (ECN).

[RFC3168] specifies the incorporation of ECN into TCP and IP, including ECN's use of two bits in the IP header. It specifies a method for indicating incipient congestion to end-nodes (e.g. as in Random Early Detection (RED)), where the notification is through ECN marking packets rather than dropping them.

ECN uses two bits in the IP header of both IPv4 and IPv6 packets:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          DS FIELD, DSCP           | ECN FIELD |
      +-----+-----+-----+-----+-----+-----+-----+-----+

        DSCP: differentiated services codepoint
        ECN:  Explicit Congestion Notification

      Figure A.1: The Differentiated Services and ECN Fields in IP.

The two bits of the ECN field have four ECN codepoints, '00' to '11':

      +-----+-----+
      | ECN FIELD |
      +-----+-----+
        ECT   CE
         0     0     Not-ECT
         0     1     ECT(1)
         1     0     ECT(0)
         1     1     CE

      Figure A.2: The ECN Field in IP.

The Not-ECT codepoint '00' indicates a packet that is not using ECN.

The CE codepoint '11' is set by a router to indicate congestion to the end nodes. The term 'CE packet' denotes a packet that has the CE codepoint set.

The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and ECT(1) respectively) are set by the data sender to indicate that the end-points of the transport protocol are ECN-capable. Routers treat the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a packet-by-packet basis. The use of two codepoints for ECT is motivated primarily by the desire to allow mechanisms for the data sender to verify that network elements are not erasing the CE codepoint, and that data receivers are properly reporting to the sender the receipt of packets with the CE codepoint set.

ECN requires support from the transport protocol, in addition to the functionality given by the ECN field in the IP packet header. [RFC3168] addresses the addition of ECN capability to TCP, specifying three new pieces of functionality: negotiation between the endpoints during connection setup to determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the TCP header so that the data receiver can inform the data sender when a CE packet has been received; and a Congestion Window Reduced (CWR) flag in the TCP header so that the data sender can inform the data receiver that the congestion window has been reduced.

The transport layer (e.g. TCP) must respond, in terms of congestion control, to a *single* CE packet as it would to a packet drop.

The advantage of setting the CE codepoint as an indication of congestion, instead of relying on packet drops, is that it allows the receiver(s) to receive the packet, thus avoiding the potential for excessive delays due to retransmissions after packet losses.
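For concreteness, the codepoint table can be expressed in a few lines of Python (illustrative only; per [RFC3168] the ECN field occupies the two least significant bits of the former IPv4 TOS octet / IPv6 Traffic Class octet):

   ECN_CODEPOINTS = {0b00: 'Not-ECT', 0b01: 'ECT(1)',
                     0b10: 'ECT(0)', 0b11: 'CE'}

   def ecn_codepoint(traffic_class_octet):
       """Return the ECN codepoint name for a TOS / Traffic Class octet."""
       return ECN_CODEPOINTS[traffic_class_octet & 0b11]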
11.2. Appendix B: What is distributed measurement-based admission control?

This Appendix briefly explains what distributed measurement-based admission control is [Breslau99].

Traditional admission control algorithms for 'hard' real-time services (those providing a firm delay bound, for example) guarantee QoS by using 'worst case analysis'. Each time a flow is admitted, its traffic parameters are examined and the network re-calculates the remaining resources. When the network gets a new request, it therefore knows for certain whether the prospective flow, with its particular parameters, should be admitted. However, parameter-based admission control algorithms result in under-utilisation when the traffic is bursty. Therefore 'soft' real-time services - like Controlled Load - can use a more relaxed admission control algorithm.

This insight suggests measurement-based admission control (MBAC). The aim of MBAC is to provide a statistical service guarantee. The classic scenario for MBAC is where each node participates in hop-by-hop admission control, characterising existing traffic locally through measurements (instead of keeping an accurate track of traffic as it is admitted), in order to determine the current value of some parameter, e.g. load. Note that for scalability the measurement is of the aggregate of the flows in the local system. The measured parameter(s) is then compared with the requirements of the prospective flow to see whether it should be admitted.

MBAC may also be performed centrally for a network, in which case it uses centralised measurements by a bandwidth broker.

We use distributed MBAC. "Distributed" means that the measurement is accumulated for the 'whole path' using in-band signalling. In our case, this means that the measurement of existing traffic is for the same pair of ingress and egress gateways as the prospective microflow.

In fact our mechanism can be said to be distributed in three ways: all nodes on the ingress-egress path affect the Congestion-Level-Estimate; the admission control decision is made just once on behalf of all the nodes on the path across the CL-region; and the ingress and egress gateways cooperate to perform MBAC.

11.3. Appendix C: Calculating the Exponentially weighted moving average (EWMA)

At the egress gateway, for every CL packet arrival:

   [EWMA-total-bits]n+1 = (w * bits-in-packet) + ((1-w) * [EWMA-total-bits]n)

   [EWMA-AM-bits]n+1 = (B * w * bits-in-packet) + ((1-w) * [EWMA-AM-bits]n)

Then, per new flow arrival:

   [Congestion-Level-Estimate]n+1 = [EWMA-AM-bits]n+1 / [EWMA-total-bits]n+1

where

EWMA-total-bits is the total number of bits in CL packets, calculated as an exponentially weighted moving average (EWMA)

EWMA-AM-bits is the total number of bits in CL packets that are Admission Marked, again calculated as an EWMA.

B is either 0 or 1:

   B = 0 if the CL packet is not admission marked

   B = 1 if the CL packet is admission marked

w is the exponential weighting factor.

Varying the value of the weight trades off between the smoothness and responsiveness of the Congestion-Level-Estimate. However, in general both can be achieved, given our original assumption of many CL microflows and remembering that the EWMA is calculated on the basis of aggregate traffic between the ingress and egress gateways.
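A direct transcription of these algorithms into Python may make them easier to follow (illustrative only; the weight value is an arbitrary example, and an egress gateway would keep one such pair of counters per ingress gateway):

   W = 1.0 / 16   # exponential weighting factor w; example value only

   ewma_total_bits = 0.0
   ewma_am_bits = 0.0

   def on_cl_packet(bits_in_packet, admission_marked):
       """Update the EWMAs on every CL packet arrival."""
       global ewma_total_bits, ewma_am_bits
       b = 1 if admission_marked else 0
       ewma_total_bits = (W * bits_in_packet) + ((1 - W) * ewma_total_bits)
       ewma_am_bits = (b * W * bits_in_packet) + ((1 - W) * ewma_am_bits)

   def congestion_level_estimate():
       """Evaluated per new flow arrival."""
       return ewma_am_bits / ewma_total_bits if ewma_total_bits else 0.0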
There will be a threshold inter-arrival time between packets of the same aggregate, above which the egress will consider the Congestion-Level-Estimate too stale; it will then trigger generation of probes by the ingress.

The first two per-packet algorithms can be simplified if their only use is for the result of one to be divided by the result of the other in the third, per-flow algorithm:

   [EWMA-total-bits]'n+1 = bits-in-packet + (w' * [EWMA-total-bits]n)

   [EWMA-AM-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-AM-bits]n)

where w' = (1-w)/w.

If w' is arranged to be a power of 2, these per-packet algorithms can be implemented solely with a shift and an add.

12. References

A later version will distinguish normative and informative references.

[AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, Michael Davenport, Chris Christou, Jerry Ash, Bur Goode, 'Aggregation of RSVP Reservations over MPLS TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-00 (work in progress), July 2005

[ANSI.MLPP.Spec] American National Standards Institute, 'Telecommunications - Integrated Services Digital Network (ISDN) - Multi-Level Precedence and Pre-emption (MLPP) Service Capability', ANSI T1.619-1992 (R1999), 1992.

[ANSI.MLPP.Supplement] American National Standards Institute, 'MLPP Service Domain Cause Value Changes', ANSI T1.619a-1994 (R1999), 1990.

[AVQ] S. Kunniyur and R. Srikant, 'Analysis and Design of an Adaptive Virtual Queue (AVQ) Algorithm for Active Queue Management', In: Proc. ACM SIGCOMM'01, Computer Communication Review 31 (4) (October 2001).

[Breslau99] L. Breslau, S. Jamin, S. Shenker, 'Measurement-based admission control: what is the research agenda?', In: Proc. Int'l Workshop on Quality of Service 1999.

[Breslau00] L. Breslau, E. Knightly, S. Shenker, I. Stoica, H. Zhang, 'Endpoint Admission Control: Architectural Issues and Performance', In: ACM SIGCOMM 2000

[Briscoe] Bob Briscoe and Steve Rudkin, 'Commercial Models for IP Quality of Service Interconnect', BT Technology Journal, Vol 23 No 2, April 2005.

[DCAC] Richard J. Gibbens and Frank P. Kelly, 'Distributed connection acceptance control for a connectionless network', In: Proc. International Teletraffic Congress (ITC16), Edinburgh, pp. 941-952 (1999).

[EMERG-RQTS] Carlberg, K. and R. Atkinson, 'General Requirements for Emergency Telecommunication Service (ETS)', RFC 3689, February 2004.

[EMERG-TEL] Carlberg, K. and R. Atkinson, 'IP Telephony Requirements for Emergency Telecommunication Service (ETS)', RFC 3690, February 2004.
[Floyd] S. Floyd, 'Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field', draft-floyd-ecn-alternates-02.txt (work in progress), August 2005

[GSPa] Karsten (Ed.), Martin, 'GSP/ECN Technology & Experiments', Deliverable 15.3 PtIII, M3I EU Vth Framework Project IST-1999-11429, URL: http://www.m3i.org/ (February 2002) (superseded by [GSP-TR])

[GSP-TR] Martin Karsten and Jens Schmitt, 'Admission Control Based on Packet Marking and Feedback Signalling -- Mechanisms, Implementation and Experiments', TU-Darmstadt Technical Report TR-KOM-2002-03, URL: http://www.kom.e-technik.tu-darmstadt.de/publications/abstracts/KS02-5.html (May 2002)

[ITU.MLPP.1990] International Telecommunications Union, 'Multilevel Precedence and Pre-emption Service (MLPP)', ITU-T Recommendation I.255.3, 1990.

[Johnson] DM Johnson, 'QoS control versus generous dimensioning', BT Technology Journal, Vol 23 No 2, April 2005

[PCN] B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur, A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan, 'Pre-Congestion Notification marking', draft-briscoe-tsvwg-cl-phb-01 (work in progress), March 2006.

[Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, 'Re-ECN: Adding Accountability for Causing Congestion to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in progress), March 2006.

[Re-feedback] Bob Briscoe, Arnaud Jacquet, Carla Di Cairano-Gilfedder, Andrea Soppera, 'Re-feedback for Policing Congestion Response in an Inter-network', ACM SIGCOMM 2005, August 2005.

[Re-PCN] B. Briscoe, 'Emulating Border Flow Policing using Re-ECN on Bulk Data', draft-briscoe-tsvwg-re-ecn-border-cheat-00 (work in progress), February 2006.

[Reid] ABD Reid, 'Economics and scalability of QoS solutions', BT Technology Journal, Vol 23 No 2, April 2005

[RFC2211] J. Wroclawski, 'Specification of the Controlled-Load Network Element Service', RFC 2211, September 1997

[RFC2309] Braden, B., et al., 'Recommendations on Queue Management and Congestion Avoidance in the Internet', RFC 2309, April 1998.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, 'Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers', RFC 2474, December 1998

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. Weiss, 'A framework for Differentiated Services', RFC 2475, December 1998.

[RFC2597] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, 'Assured Forwarding PHB Group', RFC 2597, June 1999.

[RFC2998] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L., Speer, M., Braden, R., Davie, B., Wroclawski, J. and E. Felstaine, 'A Framework for Integrated Services Operation Over DiffServ Networks', RFC 2998, November 2000.

[RFC3168] Ramakrishnan, K., Floyd, S. and D. Black, 'The Addition of Explicit Congestion Notification (ECN) to IP', RFC 3168, September 2001.

[RFC3246] B. Davie, A. Charny, J.C.R. Bennett, K. Benson, J.Y. Le Boudec, W. Courtney, S. Davari, V. Firoiu, D. Stiliadis, 'An Expedited Forwarding PHB (Per-Hop Behavior)', RFC 3246, March 2002.

[RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, P., Krishnan, R., Cheval, P., and J. Heinanen, 'Multi-Protocol Label Switching (MPLS) Support of Differentiated Services', RFC 3270, May 2002.
[RMD] Attila Bader, Lars Westberg, Georgios Karagiannis, Cornelia Kappler, Tom Phelan, 'RMD-QOSM - The Resource Management in DiffServ QoS model', draft-ietf-nsis-rmd-03 (work in progress), June 2005.

[RSVP-ECN] Francois Le Faucheur, Anna Charny, Bob Briscoe, Philip Eardley, Joe Babiarz, Kwok-Ho Chan, 'RSVP Extensions for Admission Control over DiffServ using Pre-congestion Notification', draft-lefaucheur-rsvp-ecn-00 (work in progress), October 2005.

[RTECN] Babiarz, J., Chan, K. and V. Firoiu, 'Congestion Notification Process for Real-Time Traffic', draft-babiarz-tsvwg-rtecn-04 (work in progress), July 2005.

[RTECN-usage] Alexander, C., Ed., Babiarz, J. and J. Matthews, 'Admission Control Use Case for Real-time ECN', draft-alexander-rtecn-admission-control-use-case-00 (work in progress), February 2005.

[vq] Costas Courcoubetis and Richard Weber, 'Buffer Overflow Asymptotics for a Switch Handling Many Traffic Sources', In: Journal of Applied Probability 33 pp. 886-903 (1996).

Authors' Addresses

   Bob Briscoe
   BT Research
   B54/77, Sirius House
   Adastral Park
   Martlesham Heath
   Ipswich, Suffolk
   IP5 3RE
   United Kingdom
   Email: bob.briscoe@bt.com

   Dave Songhurst
   BT Research
   B54/69, Sirius House
   Adastral Park
   Martlesham Heath
   Ipswich, Suffolk
   IP5 3RE
   United Kingdom
   Email: dsonghurst@jungle.bt.co.uk

   Philip Eardley
   BT Research
   B54/77, Sirius House
   Adastral Park
   Martlesham Heath
   Ipswich, Suffolk
   IP5 3RE
   United Kingdom
   Email: philip.eardley@bt.com

   Francois Le Faucheur
   Cisco Systems, Inc.
   Village d'Entreprise Green Side - Batiment T3
   400, Avenue de Roumanille
   06410 Biot Sophia-Antipolis
   France
   Email: flefauch@cisco.com

   Anna Charny
   Cisco Systems
   300 Apollo Drive
   Chelmsford, MA 01824
   USA
   Email: acharny@cisco.com

   Kwok Ho Chan
   Nortel Networks
   600 Technology Park Drive
   Billerica, MA 01821
   USA
   Email: khchan@nortel.com

   Jozef Z. Babiarz
   Nortel Networks
   3500 Carling Avenue
   Ottawa, Ont K2H 8E9
   Canada
   Email: babiarz@nortel.com

   Stephen Dudley
   Nortel Networks
   4001 E. Chapel Hill Nelson Highway
   P.O. Box 13010, ms 570-01-0V8
   Research Triangle Park, NC 27709
   USA
   Email: smdudley@nortel.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.