TSVWG                                                         B. Briscoe
Internet Draft                                                P. Eardley
draft-briscoe-tsvwg-cl-architecture-01.txt                  D. Songhurst
Expires: April 2006                                                   BT

                                                          F. Le Faucheur
                                                               A. Charny
                                                      Cisco Systems, Inc

                                                              J. Babiarz
                                                                 K. Chan
                                                                  Nortel

                                                        October 24, 2005

  A Framework for Admission Control over DiffServ using Pre-Congestion
                               Notification
               draft-briscoe-tsvwg-cl-architecture-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 24, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005). All Rights Reserved.

Abstract

   This document describes a framework to achieve an end-to-end
   Controlled Load (CL) service without the scalability problems of
   previous approaches. Flow admission control and, if necessary, flow
   pre-emption preserve the CL service to admitted flows. But interior
   routers within a large DiffServ-based region of the Internet do not
   require flow state or signalling. They only have to give early
   warning of their own congestion by bulk packet marking using a new
   pre-congestion notification behaviour.
   Gateways around the edges of the region convert measurements of this
   packet-granularity marking into admission control and pre-emption
   functions at flow granularity.

Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION)

   This document is posted as an Internet-Draft with the intention of
   eventually becoming an INFORMATIONAL RFC, rather than a standards
   track document.

Table of Contents

   1. Introduction....................................................4
      1.1. Summary....................................................4
         1.1.1. Admission control.....................................5
         1.1.2. Pre-emption...........................................7
         1.1.3. Both admission control and pre-emption................8
      1.2. Terminology................................................8
      1.3. Existing terminology......................................10
      1.4. Standardisation requirements..............................10
      1.5. Structure of rest of the document.........................10
   2. Key aspects of the framework...................................11
      2.1. Key goals.................................................11
      2.2. Key assumptions...........................................12
      2.3. Key benefits..............................................15
   3. Architecture...................................................17
      3.1. Admission control.........................................17
         3.1.1. Pre-Congestion Notification marking behaviour........17
         3.1.2. Measurements to support admission control............18
         3.1.3. How edge-to-edge admission control supports
                end-to-end QoS signalling............................19
         3.1.4. Use case.............................................19
      3.2. Pre-emption...............................................20
         3.2.1. Alerting an ingress gateway that pre-emption may
                be needed............................................20
         3.2.2. Determining the right amount of CL traffic to drop...23
         3.2.3. Use case for pre-emption.............................24
   4. Details........................................................25
      4.1. Ingress gateways..........................................26
      4.2. Interior nodes............................................27
      4.3. Egress gateways...........................................27
      4.4. Failures..................................................28
   5. Potential future extensions....................................29
      5.1. Multi-domain and multi-operator usage.....................29
      5.2. Adaptive bandwidth for the Controlled Load service........29
      5.3. Controlled Load service with end-to-end Pre-Congestion
           Notification..............................................29
      5.4. MPLS-TE...................................................30
   6. Relationship to other QoS mechanisms...........................30
      6.1. IntServ Controlled Load...................................30
      6.2. Integrated services operation over DiffServ...............30
      6.3. Differentiated Services...................................31
      6.4. ECN.......................................................31
      6.5. RTECN.....................................................31
      6.6. RMD.......................................................31
      6.7. RSVP Aggregation over MPLS-TE.............................32
   7. Security Considerations........................................32
   8. Acknowledgements...............................................33
   9. Comments solicited.............................................33
   10. Changes from the -00 version of this draft....................33
   11. Appendixes....................................................33
      11.1. Appendix A: Explicit Congestion Notification.............33
      11.2. Appendix B: What is distributed measurement-based
            admission control?.......................................35
      11.3. Appendix C: Calculating the Exponentially weighted
            moving average (EWMA)....................................36
   12. References....................................................37
   Authors' Addresses................................................41
   Intellectual Property Statement...................................42
   Disclaimer of Validity............................................43
   Copyright Statement...............................................43

1. Introduction

1.1. Summary

   This document describes a framework to achieve an end-to-end
   controlled load service by using - within a large region of the
   Internet - DiffServ and edge-to-edge distributed measurement-based
   admission control and flow pre-emption. Controlled load service is a
   quality of service (QoS) closely approximating the QoS that the same
   flow would receive from a lightly loaded network element [RFC2211].
   Controlled Load (CL) is useful for inelastic flows such as those for
   real-time media.

   In line with the "IntServ over DiffServ" framework defined in
   [RFC2998], the CL service is supported end-to-end and RSVP
   signalling [RFC2205] is used end-to-end, over an edge-to-edge
   DiffServ region.
     ___    ___    _______________________________________    ____    ___
    |   |  |   |  |                                       |  |    |  |   |
    |   |  |   |  |Ingress      Interior         Egress   |  |    |  |   |
    |   |  |   |  |gateway       nodes           gateway  |  |    |  |   |
    |   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   |
    |   |  |   |  | CL-   |  | CL-   |  | CL-   |  |      |  |    |  |   |
    |   |..|   |..|marking|..|marking|..|marking|..| Meter|..|    |..|   |
    |   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   |
    |   |  |   |  |  \                              /     |  |    |  |   |
    |   |  |   |  |   \                            /      |  |    |  |   |
    |   |  |   |  |    \ Congestion-Level-Estimate /      |  |    |  |   |
    |   |  |   |  |     \ (for admission control) /       |  |    |  |   |
    |   |  |   |  |      --<-----<----<----<-----<--      |  |    |  |   |
    |   |  |   |  |     Sustainable-Aggregate-Rate        |  |    |  |   |
    |   |  |   |  |       (for pre-emption)               |  |    |  |   |
    |___|  |___|  |_______________________________________|  |____|  |___|

     Sx    Access               CL-region                   Access    Rx
     End   Network                                          Network   End
     Host                                                             Host

                  <------ edge-to-edge signalling ------>
                  (for admission control & pre-emption)

  <-------------------end-to-end QoS signalling protocol--------------->

   Figure 1: Overall QoS architecture (NB terminology explained later)

   In Section 1.1.1 we summarise how admission of new CL microflows is
   controlled so as to deliver the required QoS. In abnormal
   circumstances, for instance a disaster affecting multiple interior
   nodes, the QoS of existing CL microflows may degrade even if care
   was exercised when admitting those microflows before those
   circumstances arose. Therefore we also propose a mechanism
   (summarised in Section 1.1.2) to pre-empt some of the existing
   microflows. Then the remaining microflows retain their expected QoS,
   while improved QoS is quickly restored to lower priority traffic.

1.1.1. Admission control

   This document describes a new admission control procedure for an
   edge-to-edge region, which uses a new per-hop Explicit Congestion
   Notification marking behaviour as a fundamental building block.
   In turn, an end-to-end CL service would use this as a building block
   within a broader QoS architecture.

   The per-hop, edge-to-edge and end-to-end aspects are now briefly
   introduced in turn.

   Appendix A provides a brief summary of Explicit Congestion
   Notification (ECN) [RFC3168]. It specifies that a router sets the
   ECN field to the Congestion Experienced (CE) value as a warning of
   incipient congestion. RFC 3168 doesn't specify a particular
   algorithm for setting the CE codepoint, although RED (Random Early
   Detection) is expected to be used. We introduce a new algorithm in
   this document, called Pre-Congestion Notification. It aims to set
   the CE codepoint before there is any significant build-up of CL
   packets in the queue, as an "early warning" that the rate of traffic
   is getting close to the engineered capacity. Hence it can be used
   with per-hop behaviours (PHBs) designed to operate with very low
   queue occupancy. Note that our use of the ECN field operates across
   the CL-region, i.e. edge-to-edge, and not host-to-host as in
   [RFC3168].

   This framework assumes that the Pre-Congestion Notification
   behaviour is used in a controlled environment, i.e. within the
   controlled edge-to-edge region.

   Within the controlled edge-to-edge region, a particular packet
   receives the Pre-Congestion Notification behaviour if the packet's
   header fulfils two conditions: its DSCP (differentiated services
   codepoint) corresponds to the PHB for CL traffic, and its ECN field
   indicates ECN Capable Transport (ECT).

   Turning next to the edge-to-edge aspect. All nodes within a region
   of the Internet, which we call the CL-region, apply the PHB used for
   CL traffic and the Pre-Congestion Notification behaviour. Traffic
   must enter/leave the CL-region through ingress/egress gateways,
   which have special functionality.
   Typically the CL-region is the core or backbone of an operator. The
   CL service is achieved "edge-to-edge" across the CL-region, by using
   distributed measurement-based admission control: the decision
   whether to admit a new microflow depends on a measurement of the
   existing traffic between the same pair of ingress and egress
   gateways as the prospective new microflow. (See Appendix B for
   further discussion of "What is distributed measurement-based
   admission control?")

   As CL packets travel across the CL-region, nodes will set the CE
   codepoint (according to the Pre-Congestion Notification algorithm)
   as an "early warning" of potential congestion, i.e. before there is
   any significant build-up of CL packets in the queue. For traffic
   from each remote ingress gateway, the CL-region's egress gateway
   measures the fraction of CL traffic for which the CE codepoint is
   set. The egress gateway calculates this value on a per-bit basis as
   an exponentially weighted moving average, which we term the
   Congestion-Level-Estimate, and then reports it to the CL-region's
   ingress gateway, piggy-backed on the signalling for a new flow. The
   ingress gateway only admits the new CL microflow if the Congestion-
   Level-Estimate is less than a threshold value. Hence previously
   accepted CL microflows will suffer minimal queuing delay, jitter and
   loss.

   In turn, the edge-to-edge architecture is a building block in
   delivering an end-to-end CL service. The approach is similar to that
   described in [RFC2998] for Integrated services operation over
   DiffServ networks. Like [RFC2998], an IntServ class (CL in our case)
   is achieved end-to-end, with a CL-region viewed as a single
   reservation hop in the total end-to-end path. Interior nodes of the
   CL-region do not process flow signalling nor do they hold state. We
   assume that the end-to-end signalling mechanism is RSVP (Section
   2.2).
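   As an informal illustration of the admission control procedure
   described above (a sketch only, not part of the framework), the
   following fragment shows an egress gateway maintaining a per-
   aggregate Congestion-Level-Estimate as an exponentially weighted
   moving average of the CE-marked fraction, counted in bits, and an
   ingress gateway admitting a new microflow only while that estimate
   is below a threshold. The class and function names, the EWMA weight
   and the threshold value are all invented for the example; this
   document does not specify particular values (Appendix C discusses
   the EWMA calculation).

```python
# Illustrative sketch of distributed measurement-based admission
# control at the gateways. The EWMA weight and admission threshold
# below are invented for the example, not specified by this document.

CLE_THRESHOLD = 0.01   # hypothetical admission threshold
EWMA_WEIGHT = 0.05     # hypothetical smoothing weight

class EgressGateway:
    """Measures the Congestion-Level-Estimate per CL-region-aggregate."""

    def __init__(self):
        # ingress gateway id -> EWMA of the CE-marked fraction
        self.cle = {}

    def on_interval(self, ingress_id, marked_bits, total_bits):
        """Update the EWMA from the CE-marked fraction of one
        measurement interval, counted on a per-bit basis."""
        if total_bits == 0:
            return
        sample = marked_bits / total_bits
        prev = self.cle.get(ingress_id, 0.0)
        self.cle[ingress_id] = (1 - EWMA_WEIGHT) * prev + EWMA_WEIGHT * sample

    def report(self, ingress_id):
        """Value piggy-backed on the flow signalling back to the
        ingress gateway of this CL-region-aggregate."""
        return self.cle.get(ingress_id, 0.0)

def admit_new_flow(congestion_level_estimate):
    """Ingress admits the new CL microflow only below the threshold."""
    return congestion_level_estimate < CLE_THRESHOLD
```

   In deployment the threshold would be an operator configuration
   choice, and the report would travel in the extended end-to-end
   signalling rather than as a function return value.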
   However, the RSVP signalling may itself be originated or terminated
   by proxies still closer to the edge of the network, such as home
   hubs or the like, triggered in turn by application layer signalling.
   [RFC2998] and our approach are compared further in Section 6.2.

   An important benefit compared with the IntServ over DiffServ model
   [RFC2998] arises from the fact that the load is controlled
   dynamically rather than with traffic conditioning agreements (TCAs).
   TCAs were originally introduced in the (informational) DiffServ
   architecture [RFC2475] as an alternative to reservation processing
   in the interior region, in order to reduce the burden on interior
   nodes. With TCAs, in practice service providers rely on
   subscription-time Service Level Agreements that statically define
   the parameters of the traffic that will be accepted from a customer.
   The problem arises because the TCA at the ingress must allow any
   destination address, if it is to remain scalable. But for longer
   topologies, the chances increase that traffic will focus on an
   interior resource, even though it is within contract at the ingress
   [Reid], e.g. all flows converge on the same egress gateway. Even
   though networks can be engineered to make such failures rare, when
   they occur all inelastic flows through the congested resource fail
   catastrophically.

   Distributed measurement-based admission control avoids reservation
   processing (whether per flow or aggregated) on interior nodes, but
   flows are still blocked dynamically in response to actual congestion
   on any interior node. Hence there is no need for accurate or
   conservative prediction of the traffic matrix.

1.1.2. Pre-emption

   An essential QoS issue in core and backbone networks is being able
   to cope with failures of nodes and links.
   The consequent re-routing can cause severe congestion on some links
   and hence degrade the QoS experienced by on-going microflows and
   other, lower priority traffic. Even when the network is engineered
   to sustain a single link failure, multiple link failures (e.g. due
   to a fibre cut or a node failure, or a natural disaster) can cause
   violation of capacity constraints and resulting QoS failures. Our
   solution uses rate-based pre-emption, so that enough of the
   previously admitted CL microflows are dropped to ensure that the
   remaining ones again receive QoS commensurate with the CL service
   and at least some QoS is quickly restored to other traffic classes.

   The solution has two aspects. First, triggering the ingress gateway
   to test whether pre-emption may be needed. This involves an optional
   new router marking behaviour for Pre-emption Alert. Secondly,
   calculating the right amount of traffic to drop. This involves the
   egress gateway measuring, and reporting to the ingress gateway, the
   current amount of CL traffic received from that particular ingress
   gateway. The ingress gateway compares this measurement (which is the
   amount that the network can actually support, and which we thus call
   the Sustainable-Aggregate-Rate) with the rate that it is sending,
   and hence determines how much traffic needs to be pre-empted.

   The solution operates within a little over one round trip time: the
   time required for microflow packets that have experienced Pre-
   emption Alert marking to travel downstream through the CL-region
   and arrive at the egress gateway, plus some additional time for the
   egress gateway to measure the rate seen after it has been alerted
   that pre-emption may be needed, and the time for the egress gateway
   to report this information to the ingress gateway.

1.1.3. Both admission control and pre-emption

   This document describes both the admission control and pre-emption
   mechanisms, and we suggest that an operator uses both. However, we
   do not require this, and some operators may want to implement only
   one.

   For example, an operator could use just admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting': as
   sessions end, existing microflows naturally depart from the system
   over time, and the admission control mechanism will prevent
   admission of new microflows that use the affected links. So the CL-
   region will naturally return to normal controlled load service, but
   with reduced capacity. The drawback of this approach would be that,
   until flows naturally depart to relieve the congestion, all flows
   and lower priority services will be adversely affected. As another
   example, an operator could use just admission control, avoiding
   heavy congestion (caused by re-routing) by 'capacity planning': by
   configuring admission control thresholds to lower levels than the
   network could accept in normal situations, such that the load after
   failure is expected to stay below acceptable levels even with
   reduced network resources.

   On the other hand, an operator could rely just on the traffic
   conditioning agreements of the DiffServ architecture [RFC2475] for
   admission control. The pre-emption mechanism described in this
   document would then be used to counteract the problem described at
   the end of Section 1.1.1.

1.2. Terminology

   o Ingress gateway: node at an ingress to the CL-region. A CL-region
     may have several ingress gateways.

   o Egress gateway: node at an egress from the CL-region. A CL-region
     may have several egress gateways.

   o Interior node: a node which is part of the CL-region, but isn't an
     ingress or egress node.
   o CL-region: A region of the Internet in which all traffic
     enters/leaves through an ingress/egress gateway and all nodes run
     the Pre-Congestion Notification and Pre-emption Alert behaviours.
     A CL-region is a DiffServ region (a DiffServ region is either a
     single DiffServ domain or a set of contiguous DiffServ domains),
     but note that the CL-region does not use the traffic conditioning
     agreements (TCAs) of the (informational) DiffServ architecture.

   o CL-region-aggregate: all the microflows between a specific pair
     of ingress and egress gateways. Note there is no identifier
     unique to the aggregate.

   o Pre-Congestion Notification: a new algorithm for deciding whether
     to set the ECN CE codepoint (Explicit Congestion Notification
     Congestion Experienced), for use by all routers in the CL-region.
     A router sets the CE codepoint as an "early warning" that the
     load is nearing the engineered admission control capacity, before
     there is any significant build-up of CL packets in the queue.

   o Inverse-token-bucket: a token bucket for which tokens are added
     when packets are queued for transmission on the corresponding
     link and consumed at a fixed rate. This is the inverse of a
     normal token bucket.

   o Pre-emption Alert: a new router marking behaviour, for use by
     either all or none of the routers in the CL-region. A router re-
     marks a packet to Re-marked-CL to warn explicitly that pre-
     emption may be needed.

   o Congestion-Level-Estimate: the number of bits in CL packets that
     have the CE codepoint set, divided by the number of bits in all
     CL packets. It is calculated as an exponentially weighted moving
     average. It is calculated by an egress gateway for the CL packets
     from a particular ingress gateway, i.e. there is a Congestion-
     Level-Estimate for each CL-region-aggregate.
   o Sustainable-Aggregate-Rate: the rate of traffic that the network
     can actually support for a specific CL-region-aggregate. So it is
     measured by an egress gateway for the CL packets from a
     particular ingress gateway.

1.3. Existing terminology

   This is a placeholder for useful terminology that is defined
   elsewhere.

1.4. Standardisation requirements

   The framework described in this document has two new
   standardisation requirements:

   o new Pre-Congestion Notification and Pre-emption Alert marking
     behaviours are required, as detailed in [CL-marking].

   o the end-to-end signalling protocol needs to be modified to carry
     the Congestion-Level-Estimate report (for admission control) and
     the Sustainable-Aggregate-Rate (for pre-emption). With our
     assumption of RSVP (Section 2.2) as the end-to-end signalling
     protocol, this means that extensions to RSVP are required, as
     detailed in [RSVP-ECN], for example to carry the Congestion-
     Level-Estimate and Sustainable-Aggregate-Rate information from
     egress gateway to ingress gateway.

   We are discussing whether the PHB used by CL traffic should be a
   new PHB (indicated by a new DSCP) or whether the Expedited
   Forwarding (EF) PHB can be used with the addition of the required
   ECN marking behaviour.

   Other than these things, the arrangement uses existing IETF
   protocols throughout, although not in their usual architecture.

1.5. Structure of rest of the document

   Section 2 describes some key aspects of the framework: our goals,
   assumptions and the benefits we believe it has. Section 3 describes
   the architecture (including a use case), whilst Section 4
   summarises the required changes to the various nodes in the CL-
   region. Section 5 outlines some possible extensions. Section 6
   provides some comparison with existing QoS mechanisms.

2. Key aspects of the framework

   In this section we discuss the key aspects of the framework:

   o At a high level, our key goals, i.e. the functionality that we
     want to achieve

   o The assumptions that we're prepared to make

   o The consequent benefits these assumptions bring

2.1. Key goals

   The framework achieves an end-to-end controlled load (CL) service
   where a segment of the end-to-end path is an edge-to-edge Pre-
   Congestion Notification region. CL is a quality of service (QoS)
   closely approximating the QoS that the same flow would receive from
   a lightly loaded network element [RFC2211]. It is useful for
   inelastic flows such as those for real-time media.

   o The CL service should be achieved despite varying load levels of
     other sorts of traffic, which may or may not be rate adaptive
     (i.e. responsive to packet drops or ECN marks).

   o The CL service should be supported for a variety of possible CL
     sources: Constant Bit Rate (CBR), Variable Bit Rate (VBR) and
     voice with silence suppression. VBR is the most challenging to
     support.

   o After a localised failure in the interior of the CL-region
     causing heavy congestion, the CL service should recover
     gracefully by pre-empting (dropping) some of the admitted CL
     microflows, whilst preserving as many of them as possible with
     their full CL QoS.

   o It is suggested that pre-emption needs to be completed within 1-2
     seconds, because it is estimated that after a few seconds many
     affected users will start to hang up (and then not only is a pre-
     emption mechanism redundant and possibly even counter-productive,
     but also many more flows than necessary to reduce congestion may
     hang up). Also, other, lower priority traffic classes will not be
     restored to partial service until the higher priority CL service
     reduces its load on shared links.
   o The CL service should support emergency services ([EMERG-RQTS],
     [EMERG-TEL]) as well as the Assured Service, which is the IP
     implementation of the existing ITU-T/NATO/DoD telephone system
     architecture known as Multi-Level Pre-emption and Precedence
     [ITU.MLPP.1990] [ANSI.MLPP.Spec] [ANSI.MLPP.Supplement], or MLPP.
     In particular, this involves admitting new high priority sessions
     even when admission control thresholds are reached and new
     routine sessions are rejected. Similarly, it involves taking into
     account session priorities and properties at the time of pre-
     empting calls.

2.2. Key assumptions

   The framework does not try to deliver the above functionality in
   all scenarios. We make the following assumptions about the type of
   scenario to be solved.

   o Edge-to-edge: all the nodes in the CL-region are upgraded with
     the Pre-Congestion Notification and Pre-emption Alert mechanisms,
     and all the ingress and egress gateways are upgraded to perform
     the measurement-based admission control and pre-emption. Note
     that although the upgrades required are edge-to-edge, the CL
     service is provided end-to-end.

   o Additional load: we assume that any additional load offered
     within the reaction time of the admission control mechanism
     doesn't move the CL-region directly from no congestion to
     overload. So we assume there will always be an intermediate stage
     where some CL packets have their CE codepoint set, but they are
     still delivered without significant QoS degradation. We believe
     this is valid for core and backbone networks with typical call
     arrival patterns (given the reaction time is little more than one
     round trip time across the CL-region), but it is unlikely to be
     valid in access networks where the granularity of an individual
     call becomes significant.
   o  Aggregation: we assume that in normal operation there are many CL
      microflows within the CL-region, typically at least hundreds
      between any pair of ingress and egress gateways.  The implication
      is that the solution is targeted at core and backbone networks
      and possibly parts of large access networks.

   o  Trust: we assume that there is trust between all the nodes in the
      CL-region.  For example, this trust model is satisfied if one
      operator runs the whole of the CL-region.  But we make no such
      assumptions about the end nodes, i.e. depending on the scenario
      they may be trusted or untrusted by the CL-region.

   o  Signalling: we assume that the end-to-end signalling protocol is
      RSVP.  Section 3 describes how the CL-region fits into such an
      end-to-end QoS scenario, whilst [RSVP-ECN] describes the
      extensions to RSVP that are required.

   o  Separation: we assume that all nodes within the CL-region are
      upgraded with the CL mechanism, so the requirements of [Floyd]
      are met because the CL-region is an enclosed environment.  Also,
      an operator separates CL-traffic in the CL-region from outside
      traffic by administrative configuration of the ring of gateways
      around the region.  Within the CL-region we assume that the
      CL-traffic is separated from non-CL traffic.

   o  Routing: we assume that one of the following applies:

      (same path) all packets between a pair of ingress and egress
         gateways follow the same path.  This ensures that the
         Congestion-Level-Estimate used in the admission control
         procedure reflects the status of the path followed by the new
         flow's packets.

      (load balanced) packets between a pair of ingress and egress
         gateways follow different paths, but the load balancing scheme
         is tuned in the CL-region to distribute load such that the
         different paths always receive comparable relative load.
         This ensures that the Congestion-Level-Estimate used in the
         admission control procedure (which is computed taking into
         account packets travelling on all the paths) also
         approximately reflects the status of the actual path followed
         by the new microflow's packets.

      (worst case assumed) packets between a pair of ingress and egress
         gateways follow different paths, but (i) it is acceptable for
         the operator to keep the CL traffic between this pair of
         gateways to a level dictated by the most loaded of all paths
         between this pair of gateways (so that CL traffic may be
         rejected - or even pre-empted in some situations - even if one
         or more of the paths between the pair of gateways is operating
         below its engineered levels), and (ii) it is acceptable for
         that operator to configure engineered levels below optimum
         levels, to compensate for the fact that the effect on the
         Congestion-Level-Estimate of the congestion experienced over
         one of the paths may be diluted by traffic received over
         non-congested paths, so that lower thresholds need to be used
         in these cases to ensure early admission control rejection and
         pre-emption over the congested paths.

   We are investigating ways of loosening the restrictions set by some
   of these assumptions, for instance:

   o  Trust: to allow the CL-region to span multiple, non-trusting
      operators, using the technique of [Re-feedback] [Re-ECN]
      mentioned in Section 5.1.

   o  Signalling: we believe that the solution could operate with
      another signalling protocol such as NSIS.  We would very much
      welcome input / collaboration with the NSIS community in order to
      carry out similar work as done for RSVP.  It could also work with
      application level signalling as suggested in [RT-ECN].
   o  Additional load: we believe that the assumption is valid for core
      and backbone networks, with an appropriate margin between the
      inverse-token-bucket's token rate and the configured rate for CL
      traffic.  However, in principle a burst of admission requests can
      occur in a short time.  We expect this to be a rare event under
      normal conditions, but it could happen, e.g. due to a 'flash
      crowd'.  If it does, then more flows may be admitted than should
      be, triggering the pre-emption mechanisms.  To avoid the need for
      pre-emption, 'call gapping' could be used at the egress (i.e. the
      egress gateway paces out the admission of microflows).

   o  Separation: the assumption that CL traffic is separated from
      non-CL traffic implies that the CL traffic has its own PHB, not
      shared with other traffic.  We are looking at whether it could
      share Expedited Forwarding's PHB, but supplemented with the new
      Pre-Congestion Notification and Pre-emption Alert marking
      behaviours.  If this is possible, other PHBs (like Assured
      Forwarding) could be supplemented with the same new behaviours.
      This is similar to how RFC3168 ECN was defined to supplement any
      PHB.

   o  Routing: we are looking in greater detail at the solution in the
      presence of Equal Cost Multi-Path routing and at suitable
      enhancements.

2.3.  Key benefits

   We believe that the mechanism described in this document has several
   advantages:

   o  It achieves statistical guarantees of quality of service for
      microflows, delivering a very low delay, jitter and packet loss
      service suitable for applications like voice and video calls
      that generate real-time inelastic traffic.  This is because of
      its per-microflow admission control scheme, combined with its
      dynamic on-path "early warning" of potential congestion.
      The guarantee is at least as strong as with IntServ Controlled
      Load (Section 6.1 mentions why the guarantee may be somewhat
      better), but without the scalability problems of per-microflow
      IntServ.

   o  It can support "Emergency" and military Multi-Level Pre-emption
      and Priority services, even in times of heavy congestion (perhaps
      caused by failure of a node within the CL-region), by pre-empting
      on-going "ordinary" CL microflows.

   o  It scales well, because there is no signal processing or path
      state held by the interior nodes of the CL-region.

   o  It is resilient, again because no state is held by the interior
      nodes of the CL-region.  Hence during an interior routing change
      caused by a node failure no microflow state has to be relocated.
      The pre-emption mechanism further helps resilience because it
      rapidly reduces the load to one that the CL-region can support.

   o  It helps preserve, through the pre-emption mechanism, QoS for as
      many microflows as possible and for lower priority traffic in
      times of heavy congestion (e.g. caused by failure of an interior
      node).  Otherwise long-lived microflows could cause loss on all
      CL microflows for a long time.

   o  It avoids the potential catastrophic failure problem when the
      DiffServ architecture is used in large networks using statically
      provisioned capacity.  This is achieved by controlling the load
      dynamically, based on edge-to-edge-path real-time measurement of
      Pre-Congestion Notification, as discussed in Section 1.1.1.

   o  It requires minimal new standardisation, because it reuses
      existing QoS protocols and algorithms.

   o  It can be deployed incrementally, region by region or network by
      network.  Not all the regions or networks on the end-to-end path
      need to have it deployed.  Two CL-regions can even be separated
      by a network that uses another QoS mechanism (e.g. MPLS-TE).
   o  It provides a deployment path for use of ECN for real-time
      applications.  Operators can gain experience of ECN before its
      applicability to end-systems is understood and end terminals are
      ECN capable.

3.  Architecture

3.1.  Admission control

   In this section we describe the admission control mechanism.  We
   discuss the three pieces of the solution and then give an example of
   how they fit together in a use case:

   o  the new Pre-Congestion Notification marking behaviour used by all
      nodes in the CL-region

   o  how the measurements made support our admission control mechanism

   o  how the edge-to-edge mechanism fits into the end-to-end RSVP
      signalling

3.1.1.  Pre-Congestion Notification marking behaviour

   To support our admission control mechanism, each node in the
   CL-region runs an algorithm to determine whether to set the CE
   codepoint of a particular CL packet.

   Each link in the CL-region has a fixed rate (bandwidth) reflecting
   the engineered admission control capacity for CL traffic, under the
   control of management configuration.  In order to make the
   description more specific we assume a bulk 'inverse-token-bucket' is
   used on each link; other implementations are possible.  Tokens are
   added to our inverse-token-bucket when packets are queued for
   transmission on the corresponding link, and are consumed at a fixed
   rate that is slower than the configured rate.  This means that the
   amount of tokens starts to increase before the actual queue builds
   up, when it is in danger of doing so soon; hence it can be used as
   an "early warning" that the engineered capacity is nearly reached.
   The probability that a node sets the CE codepoint of a CL packet
   depends on the number of tokens in the inverse-token-bucket.
   Below a first threshold value of the number of tokens no packets
   have their CE codepoint set, and above a second threshold they all
   do; in between, the probability increases linearly.  Note that the
   same inverse-token-bucket is used for all the CL packets on that
   link, i.e. it operates in bulk on the CL behaviour aggregate and
   not per microflow.  The algorithm is detailed in [CL-marking].

      Probability
      of setting    ^
      CE codepoint  |
                    |
                  1_|                 _______________
                    |                /
                    |               /
                    |              /
                    |             /
                    |            /
                    |           /
                    |          /
                    |         /
                    |        /
                  0_|_______/
                    |
                    --------|---------|--------------->
                          min-      max-    Amount of tokens in
                       threshold threshold  inverse-token-bucket

        Figure 2: Setting the Congestion Experienced Codepoint

   How does a node know that it should apply the new Pre-Congestion
   Notification marking behaviour?  A CL packet is indicated by a
   combination of three things: the node itself is in the CL-region, so
   it is configured with a behaviour for CL packets; the ECN codepoint
   is set to ECN-Capable Transport (ECT); and the DSCP is set to the
   value configured for the CL behaviour aggregate in the CL-region.
   On the third point, we are currently considering whether the PHB
   used by CL traffic should be a new PHB (indicated by a new DSCP) or
   whether the Expedited Forwarding (EF) PHB can be used.

3.1.2.  Measurements to support admission control

   To support our admission control mechanism, the egress measures the
   Congestion-Level-Estimate for traffic from each remote ingress
   gateway, i.e. per CL-region-aggregate.  The Congestion-Level-
   Estimate is the number of bits in CL packets that have the CE
   codepoint set, divided by the number of bits in all CL packets.  It
   is calculated as an exponentially weighted moving average.  It is
   calculated by an egress node separately for the CL packets from each
   particular ingress node.
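As an illustration, the linear marking ramp of Figure 2 and the egress meter just described might be sketched as follows.  All names, and the EWMA weight, are our own illustrative choices, not taken from [CL-marking] or its Appendix:

```python
def marking_probability(tokens, min_threshold, max_threshold):
    """Probability of setting the CE codepoint of a CL packet, given the
    amount of tokens in the bulk inverse-token-bucket (Figure 2):
    0 below min-threshold, 1 above max-threshold, linear in between."""
    if tokens <= min_threshold:
        return 0.0
    if tokens >= max_threshold:
        return 1.0
    return (tokens - min_threshold) / (max_threshold - min_threshold)


class CongestionLevelEstimate:
    """Egress-side meter: smoothed fraction of CL bits arriving in
    CE-marked packets, kept separately per ingress gateway (i.e. per
    CL-region-aggregate)."""

    def __init__(self, weight=0.01):
        self.weight = weight        # EWMA weight (illustrative value)
        self.marked_bits = 0.0      # smoothed CE-marked bits per packet
        self.total_bits = 0.0       # smoothed total CL bits per packet

    def on_cl_packet(self, size_bits, ce_marked):
        w = self.weight
        self.marked_bits += w * ((size_bits if ce_marked else 0) - self.marked_bits)
        self.total_bits += w * (size_bits - self.total_bits)

    @property
    def estimate(self):
        """CE-marked bits divided by all CL bits, smoothed exponentially."""
        return self.marked_bits / self.total_bits if self.total_bits else 0.0
```

The admission decision then reduces to comparing the reported estimate against a configured threshold, as described later in this section.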
   This Congestion-Level-Estimate provides an estimate of how near the
   links on the path inside the CL-region are getting to the engineered
   admission control capacity.  Note that the metering is done
   separately per ingress node, because there may be sufficient
   capacity on all the nodes on the path between one ingress gateway
   and a particular egress, but not from a second ingress to that same
   egress gateway.

3.1.3.  How edge-to-edge admission control supports end-to-end QoS
        signalling

   Consider a scenario that consists of two end hosts, each connected
   to their own access networks, which are linked by the CL-region.  A
   source tries to set up a new CL microflow by sending an RSVP PATH
   message, and the receiving end host replies with an RSVP RESV
   message.  Outside the CL-region some other method, for instance
   IntServ, is used to provide QoS.  From the perspective of RSVP the
   CL-region is a single hop, so the RSVP PATH and RESV messages are
   processed by the ingress and egress gateways but are carried
   transparently across all the interior nodes; hence, the ingress and
   egress gateways hold per-microflow state, whilst no state is kept by
   the interior nodes.  So far this is as in IntServ over DiffServ
   [RFC2998].  However, in order to support our admission control
   mechanism, the egress gateway adds to the RESV message an opaque
   object which states the current Congestion-Level-Estimate for the
   relevant CL-region-aggregate.  Details of the corresponding RSVP
   extensions are described in [RSVP-ECN].

3.1.4.  Use case

   To see how the three pieces of the solution fit together, we imagine
   a scenario where some microflows are already in place between a
   given pair of ingress and egress gateways, but the traffic load is
   such that no packets from these flows have their CE codepoint set as
   they travel across the CL-region.
   A source wanting to start a new CL microflow sends an RSVP PATH
   message.  The egress gateway adds an object to the RESV message with
   the Congestion-Level-Estimate, which is zero.  The ingress gateway
   sees this and consequently admits the new flow.  It then forwards
   the RSVP RESV message upstream towards the source end host.  Hence,
   assuming there's sufficient capacity in the access networks, the new
   microflow is admitted end-to-end.

   The source now sends CL packets, which arrive at the ingress
   gateway.  The ingress uses a five-tuple filter to identify that the
   packets are part of a previously admitted CL microflow, and it also
   polices the microflow to ensure it remains within its traffic
   profile.  (The ingress has learnt the required information from the
   RSVP messages.)  When forwarding a packet belonging to an admitted
   microflow, the ingress sets the packet's DSCP to that for the
   CL-traffic in the CL-region and the packet's ECN field to ECT, so
   that the interior nodes know this is a CL packet.  The CL packet now
   travels across the CL-region, with the CE codepoint getting set if
   necessary.  Also, appropriate queue scheduling is needed in each
   node to ensure that CL traffic gets its configured bandwidth.

   Next, we imagine the same scenario but at a later time, when load is
   higher at one (or more) of the interior nodes, which start to set
   the CE codepoint of CL packets because their arrival rate is nearing
   the configured rate.  The next time a source tries to set up a CL
   microflow, the ingress gateway learns (from the egress) the relevant
   Congestion-Level-Estimate.  If it is greater than some threshold
   value then the ingress refuses the request, otherwise it is
   accepted.

   It is also possible for an egress gateway to get an RSVP RESV
   message and not know what the Congestion-Level-Estimate is.
   For example, there may at present be no CL microflows between the
   relevant ingress and egress gateways.  In this case the egress
   requests the ingress to send probe packets, from which it can
   initialise its meter.  RSVP extensions for such a request to send
   probe data can be found in [RSVP-ECN].

3.2.  Pre-emption

   In this section we describe the pre-emption mechanism.  We discuss
   the two parts of the solution and then give an example of how they
   fit together in a use case:

   o  How an ingress gateway is triggered to test whether pre-emption
      may be needed

   o  How an ingress gateway determines the right amount of CL traffic
      to drop

   The mechanism is defined in [CL-marking] and [RSVP-ECN].

3.2.1.  Alerting an ingress gateway that pre-emption may be needed

   Alerting an ingress gateway that pre-emption may be needed is a two
   stage process: a router in the CL-region alerts an egress gateway
   that pre-emption may be needed; in turn the egress gateway alerts
   the relevant ingress gateway.  Every router in the CL-region has the
   ability to alert egress gateways, which may be done either
   explicitly or implicitly:

   o  Explicit - every link in the CL-region has a configured traffic
      rate, which is a threshold above which it re-marks exceeding CL
      packets to Re-marked-CL.  Reception of such a packet by the
      egress gateway acts as a Pre-emption Alert.  Encoding of
      Re-marked-CL is under discussion (a new DSCP, or leaving the DSCP
      unchanged and setting a new ECN codepoint).  Note that the
      explicit mechanism only makes sense if all the routers in the
      CL-region have the functionality, so that the egress gateways can
      rely on the explicit mechanism.  Otherwise there is the danger
      that the traffic happens to focus on a router without it, and
      egress gateways then also have to watch for implicit pre-emption
      alerts.
   o  Implicit - the router behaviour is unchanged from the
      Pre-Congestion marking behaviour described in the admission
      control section.  The egress gateway treats a Congestion-Level-
      Estimate of (almost) 100% as an implicit alert that pre-emption
      may be required.  ('Almost' because the Congestion-Level-Estimate
      is a moving average, so can never reach exactly 100%.)

      Probability
      of re-marking   ^
      CL packet to    |
      Re-marked-CL    |
      packet        1_|            ______________
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                    0_|___________|
                      |
                      -----------|-------------->
                              threshold   CL traffic rate

      Figure 3: Re-marking CL packets to Re-marked-CL packets for
                        explicit Pre-emption Alert

   When one or more packets in a CL-region-aggregate alert the egress
   gateway of the need for pre-emption, whether explicitly or
   implicitly, the egress puts that CL-region-aggregate into
   Pre-emption Alert state.  For each CL-region-aggregate in alert
   state it measures the rate of traffic at the egress gateway (i.e.
   the traffic rate of the appropriate CL-region-aggregate) and reports
   this to the relevant ingress gateway.  The steps are:

   o  Determine the relevant ingress gateway - for the explicit case
      the egress gateway examines the Re-marked-CL packet (resulting
      from Pre-emption Alert marking) and uses the state installed at
      the time of admission to determine which ingress gateway the
      packet came from.  For the implicit case the egress gateway has
      already determined this information, because the Congestion-
      Level-Estimate is calculated per ingress gateway.

   o  Measure the traffic rate of CL packets - as soon as the egress
      gateway is alerted (whether explicitly or implicitly) it measures
      the rate of CL traffic from this ingress gateway (i.e. for this
      CL-region-aggregate).  Note that Re-marked-CL packets are
      excluded from that measurement.
      It should make its measurement quickly and accurately, but
      exactly how is up to the implementation.

   o  Alert the ingress gateway - the egress gateway then immediately
      alerts the relevant ingress gateway about the fact that
      pre-emption may be required.  This Alert message also includes
      the measured Sustainable-Aggregate-Rate, i.e. the egress rate of
      CL-traffic for this ingress gateway.  The Alert message is sent
      using reliable delivery.  Procedures for support of such an Alert
      using RSVP are defined in [RSVP-ECN].

                 ______________        / \          ________________
                |              |      /   \        |                |
    CL packet   |Update        |     /Is it a\   Y |Measure CL rate |
    arrives --->|Congestion-   |--->/Re-marked-\-->|from ingress and|
                |Level-Estimate|    \CL packet?/   |alert ingress   |
                |______________|     \        /    |________________|
                                      \      /
                                       \    /

      Figure 4: Egress gateway action for explicit Pre-emption Alert

                 ______________        / \          ________________
                |              |      /   \        |                |
    CL packet   |Update        |     / C-L-E \   Y |Measure CL rate |
    arrives --->|Congestion-   |--->/threshold \-->|from ingress and|
                |Level-Estimate|    \ exceeded?/   |alert ingress   |
                |______________|     \        /    |________________|
                                      \      /
                                       \    /

      Figure 5: Egress gateway action for implicit Pre-emption Alert

3.2.2.  Determining the right amount of CL traffic to drop

   The method relies on the insight that the amount of CL traffic that
   can be supported between a particular pair of ingress and egress
   gateways is the amount of CL traffic that is actually getting across
   the CL-region to the egress gateway without being re-marked to
   Re-marked-CL.  Hence we term it the Sustainable-Aggregate-Rate.
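As a hedged illustration, the resulting ingress-side decision can be sketched as below.  The "error" margin and the two rates are explained in the paragraphs that follow; the function and variable names are our own:

```python
def traffic_to_preempt(ingress_rate, sustainable_aggregate_rate, error):
    """Amount of CL traffic (in the same units as the rates) that the
    ingress gateway should shed, or 0.0 if no pre-emption is needed.

    Pre-empt only if ingress_rate > sustainable_aggregate_rate + error;
    then shed enough that the new ingress rate is at most
    sustainable_aggregate_rate - error."""
    if ingress_rate <= sustainable_aggregate_rate + error:
        return 0.0
    return ingress_rate - (sustainable_aggregate_rate - error)
```

For example, with an ingress rate of 120 Mbit/s, a reported Sustainable-Aggregate-Rate of 100 Mbit/s and an error margin of 5 Mbit/s, the ingress would shed 25 Mbit/s worth of microflows.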
   So when the ingress gateway gets the Alert message from an egress
   gateway, it compares:

   o  The traffic rate that it is sending to this particular egress
      gateway (which we term the ingress-rate)

   o  The traffic rate that the egress gateway reports (in the Alert
      message) that it is receiving from this ingress gateway (which is
      the Sustainable-Aggregate-Rate)

   If the difference is significant, then the ingress gateway pre-empts
   some microflows.  It only pre-empts if:

      Ingress-rate > Sustainable-Aggregate-Rate + error

   The "error" term is partially to allow for inaccuracies in the
   measurements of the rates.  It is also needed because the
   ingress-rate is measured at a slightly later moment than the
   Sustainable-Aggregate-Rate, and it is quite possible that the
   ingress-rate has increased in the interim due to natural variation
   of the bit rate of the CL sources.  So the "error" term allows for
   some variation in the ingress rate without triggering pre-emption.

   The ingress gateway should pre-empt enough microflows to ensure
   that:

      New ingress-rate < Sustainable-Aggregate-Rate - error

   The "error" term here is used for similar reasons but in the other
   direction, to ensure slightly more load is shed than seems
   necessary, in case the two measurements were taken during a
   short-term fall in load.

   When the routers in the CL-region are using explicit pre-emption
   alerting, the ingress gateway would normally pre-empt microflows
   whenever it gets an alert (it always would if it were possible to
   set "error" equal to zero).  For the implicit case, however, this is
   not so.  It receives an Alert message when the Congestion-Level-
   Estimate reaches (almost) 100%, which is roughly when traffic
   exceeds the amount allocated for admission control of CL traffic at
   routers.
   However, it is only when packets are indeed dropped en route that
   the Sustainable-Aggregate-Rate becomes less than the ingress-rate,
   so only then will pre-emption actually occur at the ingress router.

   Hence with the implicit scheme, pre-emption can only be triggered
   once the system starts dropping packets and thus the QoS of flows
   starts being significantly degraded.  This is in contrast with the
   explicit scheme, which allows pre-emption to be triggered before any
   packet drop, simply when the traffic reaches a certain configured
   engineered pre-emption level.  Therefore we believe that the
   explicit mechanism is superior.  However, it does require new
   functionality on all the routers (although this is little more than
   a bulk token bucket).

3.2.3.  Use case for pre-emption

   To see how the pieces of the solution fit together in a use case, we
   imagine a scenario where many microflows have already been admitted.
   We confine our description to the explicit pre-emption mechanism.
   Now an interior router in the CL-region fails.  The network layer
   routing protocol re-routes round the problem, but as a consequence
   traffic on other links increases.  In fact, let's assume the traffic
   on one link now exceeds its pre-emption threshold and so the router
   re-marks CL packets to Re-marked-CL.  When the egress sees the first
   one of these packets it immediately determines which microflow this
   packet is part of (by using a five-tuple filter and comparing it
   with state installed at admission) and hence which ingress gateway
   the packet came from.  It sets up a meter to measure the traffic
   rate from this ingress gateway, and as soon as possible sends a
   message to the ingress gateway.  This message alerts the ingress
   gateway that pre-emption may be needed and contains the traffic rate
   measured by the egress gateway.
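The egress-side part of this use case might be sketched as follows.  The data structures, identifiers and alert transport are illustrative placeholders, not defined by the draft:

```python
class EgressPreemptionMonitor:
    """Sketch of an egress gateway reacting to explicit Pre-emption
    Alerts: a Re-marked-CL packet puts its CL-region-aggregate into
    alert state; the rate of (non-re-marked) CL traffic from that
    ingress is then measured and reported back to the ingress."""

    def __init__(self, send_alert):
        self.send_alert = send_alert  # callable(ingress_id, rate_bits_per_s)
        self.cl_bits = {}             # ingress_id -> unmarked CL bits seen
        self.alerted = set()          # aggregates in Pre-emption Alert state

    def on_cl_packet(self, ingress_id, size_bits, remarked):
        if remarked:
            # A Re-marked-CL packet acts as the Pre-emption Alert; it is
            # excluded from the rate measurement.
            self.alerted.add(ingress_id)
        else:
            self.cl_bits[ingress_id] = self.cl_bits.get(ingress_id, 0) + size_bits

    def report(self, ingress_id, interval_s):
        # At the end of a measurement interval, report the measured
        # Sustainable-Aggregate-Rate to the relevant ingress gateway.
        if ingress_id in self.alerted:
            rate = self.cl_bits.get(ingress_id, 0) / interval_s
            self.send_alert(ingress_id, rate)
```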
   Then the ingress gateway determines the traffic rate that it is
   sending towards this egress gateway, and hence it can calculate the
   amount of traffic that needs to be pre-empted.

   The ingress gateway could now just shed random microflows, but it is
   better if the least important ones are dropped.  The ingress gateway
   could use information stored locally in each reservation's state
   (such as, for example, the RSVP pre-emption priority) as well as
   information provided by a policy decision point in order to decide
   which of the flows to shed (or perhaps which ones not to shed).  The
   ingress gateway then initiates RSVP signalling to instruct the
   relevant destinations that their session has been terminated, and to
   tell (RSVP) nodes along the path to tear down associated RSVP state.
   To guard against recalcitrant sources, normal IntServ policing will
   block any future traffic from the dropped flows from entering the
   CL-region.  Note that - with the explicit Pre-emption Alert
   mechanism - since the threshold for re-marking packets to
   Re-marked-CL may be set at significantly less than the physical line
   capacity, traffic pre-emption may be triggered before any congestion
   has actually occurred and before any packet is dropped.

   We extend the scenario further by imagining that (due to a disaster
   of some kind) further routers in the CL-region fail during the time
   taken by the pre-emption process described above.  This is handled
   naturally, as packets will continue to be re-marked to Re-marked-CL
   and so the pre-emption process will happen for a second time.

   Pre-emption also helps emergency/military calls by taking into
   account the corresponding call priorities when selecting calls to be
   pre-empted, which is likely to be particularly important in a
   disaster scenario.

4.  Details

   This section is intended to provide a systematic summary of the new
   functionality required by the routers in the CL-region.

   A network operator upgrades normal IP routers by:

   o  Adding functionality related to admission control and pre-emption
      to all its ingress and egress gateways

   o  Adding Pre-Congestion Notification behaviour and Pre-emption
      Alert behaviour to all the nodes in the CL-region.

   We consider the detailed actions required for each type of node in
   turn.

4.1.  Ingress gateways

   Ingress gateways perform the following tasks:

   o  Classify incoming packets - decide whether they are CL or non-CL
      packets.  This is done using an IntServ filter spec (source and
      destination addresses and port numbers), whose details have been
      gathered from the RSVP messaging.

   o  Police - check that the microflow conforms with what has been
      agreed (i.e. it keeps to its agreed data rate).  If necessary,
      packets which do not correspond to any reservation, packets which
      are in excess of the rate agreed for their reservation, and
      packets for a reservation that has earlier been pre-empted may be
      policed.  Policing may be achieved via dropping or via re-marking
      of the packet's DSCP to a value different from the CL behaviour
      aggregate.

   o  Packet ECN colouring - for CL microflows, set the ECN field to
      ECT(0) or ECT(1) (uses for ECT(0) and ECT(1) will be discussed in
      a later version of this document)

   o  Perform 'interior node' functions (see next sub-section)

   o  Admission Control - on new session establishment, consider the
      Congestion-Level-Estimate received from the corresponding egress
      gateway and, most likely based on a simple configured threshold,
      decide whether a new call is to be admitted or rejected (taking
      into account local policy information as well as, optionally,
      information provided by a policy decision point).
   o  Probe - if requested by the egress gateway to do so, the ingress
      gateway generates probe traffic so that the egress gateway can
      compute the Congestion-Level-Estimate from this ingress gateway.
      Probe packets may be simple data addressed to the egress gateway
      and require no protocol standardisation, although there will be
      best practice for their number, size and rate.

   o  Measure - when it receives an Alert message from an egress
      gateway, it determines the rate at which it is sending packets to
      that egress gateway

   o  Pre-empt - calculate how much CL traffic needs to be pre-empted;
      decide which microflows should be dropped, perhaps in
      consultation with a Policy Decision Point; and do the necessary
      signalling to drop them.

4.2.  Interior nodes

   Interior nodes perform the following tasks:

   o  Classify packets - examine the DSCP and ECN field to see if it's
      a CL packet

   o  Non-CL packets are handled as usual, with respect to dropping
      them or setting their CE codepoint.

   o  Pre-Congestion Notification - CL packets have their CE codepoint
      set according to the algorithm detailed in [CL-marking] and
      outlined in Section 3.

   o  Pre-emption Alert - assuming the explicit Pre-emption Alert
      mechanism is being used, when the rate of CL traffic exceeds a
      threshold, re-mark packets to Re-marked-CL.

4.3.  Egress gateways

   Egress gateways perform the following tasks:

   o  Classify packets - determine which ingress gateway a CL packet
      has come from.  This is the previous RSVP hop, hence the
      necessary details are obtained just as with IntServ from the
      state associated with the packet five-tuple, which has been built
      using information from the RSVP messages.

   o  Meter - for CL packets, calculate the fraction of the total
      number of bits which are in CE-marked packets or in Re-marked-CL
      packets.
      The calculation is done as an exponentially weighted moving
      average (see Appendix).  A separate calculation is made for CL
      packets from each ingress gateway.  The meter works on an
      aggregate basis and not per microflow.

   o  Signal the Congestion-Level-Estimate - this is piggy-backed on
      the reservation reply.  An egress gateway's interface is
      configured to know it is an egress gateway, so it always appends
      this to the RESV message.  If the Congestion-Level-Estimate is
      unknown or is too stale, then the egress gateway can request the
      ingress gateway to send probes.

   o  Packet colouring - for CL packets, set the DSCP and the ECN field
      to whatever has been agreed as appropriate for the next domain.
      By default the ECN field is set to the Not-ECT codepoint.  Note
      that this results in the loss of the end-to-end meaning of the
      ECN field.  It can usually be assumed that end-to-end congestion
      control is unnecessary within an end-to-end reservation.  But if
      a genuine need is identified for end-to-end ECN semantics within
      a reservation, then an alternative is to tunnel CL packets across
      the CL-region, or to agree an extension to end-to-end signalling
      to indicate that the microflow uses an ECN-capable transport.  We
      do not recommend such apparently unnecessary complexity.

   o  Measure the rate - measure the rate of CL traffic from a
      particular ingress gateway (i.e. the rate for the
      CL-region-aggregate), when alerted (either explicitly or
      implicitly) that pre-emption may be required.  The measured rate
      is reported back to the appropriate ingress gateway [RSVP-ECN].

4.4.  Failures

   If a gateway fails, then regular RSVP procedures will take care of
   things.  For example, say an ingress gateway fails.  Then RSVP
   routers upstream of it do IP re-routing to a new ingress gateway.
   Then the upstream RSVP routers do RSVP fast local repair, i.e.
attempt to re- 1173 establish reservations through the new ingress gateway and, for 1174 example, through the same egress gateway. As part of this, admission 1175 control is performed, using the procedure described in this document. 1176 This could result in some of the flows being rejected, but those 1177 accepted will receive the full QoS. 1179 If an interior node fails, then the regular IP routing protocol will 1180 re-route round it. If the new route can carry admitted traffic, flows 1181 gracefully continue. If instead this causes early warning of 1182 congestion from the new route, admission control based on pre- 1183 congestion notification will ensure new flows will not be admitted 1184 until enough existing flows have departed. Finally re-routing may 1185 result in heavy congestion, when the pre-emption mechanism will kick 1186 in. 1188 5. Potential future extensions 1190 5.1. Multi-domain and multi-operator usage 1192 This potential extension would eliminate the trust assumption 1193 (Section 2.2), so that the CL-region could consist of multiple 1194 domains run by different operators that did not trust each other. 1195 Then only the ingress and egress gateways of the CL-region would take 1196 part in the admission control procedure, i.e. at the ingress to the 1197 first domain and the egress from the final domain. The border routers 1198 between operators within the CL-region would only have to do bulk 1199 accounting - they wouldn't do per microflow metering and policing, 1200 and they wouldn't take part in signal processing or hold path state 1201 [Briscoe]. [Re-feedback, Re-feedback-I-D] explains how a downstream 1202 domain can police that its upstream domain does not 'cheat' by 1203 admitting traffic when the downstream path is over-congested. 1205 5.2. Adaptive bandwidth for the Controlled Load service 1207 The admission control mechanism described in this document assumes 1208 that each router has a fixed bandwidth allocated to CL flows. 
A 1209 possible extension is that the bandwidth is flexible, depending on 1210 the level of non-CL traffic. If a large share of the current load on 1211 a path is CL, then more CL traffic can be admitted. And if the 1212 greater share of the load is non-CL, then the admission threshold can 1213 be proportionately lower. The approach re-arranges sharing between 1214 classes to aim for economic efficiency, whatever the traffic load 1215 matrix. It also deals with unforeseen changes to capacity during 1216 failures better than configuring fixed engineered rates. Adaptive 1217 bandwidth allocation can be achieved by changing the Pre-Congestion 1218 marking behaviour, so that the probability of setting the CE 1219 codepoint would now depend on the number of queued non-CL packets as 1220 well as the number of CL tokens. The adaptive bandwidth approach 1221 would be supplemented by placing limits on the adaptation to prevent 1222 starvation of the CL by other traffic classes and of other classes by 1223 CL traffic. 1225 5.3. Controlled Load service with end-to-end Pre-Congestion Notification 1227 It may be possible to extend the framework to parts of the network 1228 where there are only a small number of CL microflows, i.e. the 1229 aggregation assumption (Section 2.2) doesn't hold. In the extreme it 1230 may be possible to operate the framework end-to-end, i.e. between end 1231 hosts. One potential method is to send probe packets to test whether 1232 the network can support a prospective new CL microflow. The probe 1233 packets would be sent at the same traffic rate as expected for the 1234 actual microflow, but in order not to disturb existing CL traffic a 1235 router would always schedule probe packets behind CL ones (compare 1236 [Breslau00]); this implies they have a new DSCP. Otherwise the 1237 routers would treat probe packets identically to CL packets.
In order 1238 to perform admission control quickly, in parts of the network where 1239 there are only a few CL microflows, the Pre-Congestion marking 1240 behaviour for probe packets would switch from CE marking no packets 1241 to CE marking them all for only a minimal increase in load. 1243 5.4. MPLS-TE 1245 It may be possible to extend the framework for admission control of 1246 microflows into a set of MPLS-TE aggregates (Multi-protocol label 1247 switching traffic engineering). However, it would require that the 1248 MPLS header include the ECN field, which is not precluded by 1249 RFC3270. 1251 6. Relationship to other QoS mechanisms 1253 6.1. IntServ Controlled Load 1255 The CL mechanism delivers QoS similar to Integrated Services 1256 controlled load, but rather better, as queues are kept empty by 1257 driving admission control from bulk inverse-token-buckets on each 1258 interface that can detect a rise in load before queues build, 1259 sometimes termed a virtual queue [AVQ, vq]. It is also more robust to 1260 route changes. 1262 6.2. Integrated services operation over DiffServ 1264 Our approach to end-to-end QoS is similar to that described in 1265 [RFC2998] for Integrated services operation over DiffServ networks. 1266 As in [RFC2998], an IntServ class (CL in our case) is achieved end-to- 1267 end, with a CL-region viewed as a single reservation hop in the total 1268 end-to-end path. Interior routers of the CL-region do not process 1269 flow signalling nor do they hold state. Unlike [RFC2998], we do not 1270 require the end-to-end signalling mechanism to be RSVP, although it 1271 can be. 1273 Bearing in mind these differences, we can describe our architecture 1274 in the terms of the options in [RFC2998]. The DiffServ network region 1275 is RSVP-aware, but awareness is confined to (what [RFC2998] calls) 1276 the "border routers" of the DiffServ region. We use explicit 1277 admission control into this region, with static provisioning within 1278 it.
The ingress "border router" does per microflow policing and sets 1279 the DSCP and ECN fields to indicate the packets are CL ones (i.e. we 1280 use router marking rather than host marking). 1282 6.3. Differentiated Services 1284 The DiffServ architecture does not specify any way for devices 1285 outside the domain to dynamically reserve resources or receive 1286 indications of network resource availability. In practice, service 1287 providers rely on subscription-time Service Level Agreements (SLAs) 1288 that statically define the parameters of the traffic that will be 1289 accepted from a customer. The CL mechanism allows dynamic reservation 1290 of resources through the DiffServ domain and, with the potential 1291 extension mentioned in Section 5.1, it can span multiple domains 1292 without active policing mechanisms at the borders (unlike DiffServ). 1293 Therefore we do not use the traffic conditioning agreements (TCAs) of 1294 the (informational) DiffServ architecture [RFC2475]. 1296 [Johnson] compares admission control with a 'generously dimensioned' 1297 DiffServ network as ways to achieve QoS. The former is recommended. 1299 6.4. ECN 1301 The marking behaviour described in this document complies with the 1302 ECN aspects of the IP wire protocol RFC3168, but provides its own 1303 edge-to-edge feedback instead of the TCP aspects of RFC3168. All 1304 nodes within the CL-region are upgraded with the Pre-Congestion 1305 Notification and Pre-emption Alert mechanisms, so the requirements of 1306 [Floyd] are met because the CL-region is an enclosed environment. The 1307 operator prevents traffic arriving at a node that doesn't understand 1308 CL by administrative configuration of the ring of gateways around the 1309 CL-region. 1311 6.5. 
RTECN 1313 Real-time ECN (RTECN) [RTECN, RTECN-usage] has a similar aim to this 1314 document (to achieve a service with low delay, jitter and loss 1315 suitable for RT traffic) and a similar approach (per microflow admission 1316 control combined with an "early warning" of potential congestion 1317 through setting the CE codepoint). But it explores a different 1318 architecture without the aggregation assumption: host-to-host rather 1319 than edge-to-edge. 1321 6.6. RMD 1323 Resource Management in DiffServ (RMD) [RMD] is similar to this work, 1324 in that it pushes complex classification, traffic conditioning and 1325 admission control functions to the edge of a DiffServ domain and 1326 simplifies the operation of the interior nodes. One of the RMD modes 1327 uses measurement-based admission control; however, it works 1328 differently: each interior node measures the user traffic load in the 1329 PHB traffic aggregate, and each interior node processes a local 1330 RESERVE message and compares the requested resources with the 1331 available resources (maximum allowed load minus current load). 1333 Hence a difference is that the CL architecture described in this 1334 document has been designed not to require interaction between 1335 interior nodes and signalling, whereas in RMD all interior nodes are 1336 QoS-NSLP aware. So our architecture involves less processing in 1337 interior nodes, is more agnostic to signalling, requires fewer 1338 changes to existing standards and therefore works with existing RSVP 1339 as well as having the potential to work with future signalling 1340 protocols like NSIS. 1342 RMD introduced the concept of Severe Congestion handling. The pre- 1343 emption mechanism described in the CL architecture has similar 1344 objectives but relies on different mechanisms. 1346 6.7.
RSVP Aggregation over MPLS-TE 1348 Multi-protocol label switching traffic engineering (MPLS-TE) allows 1349 scalable reservation of resources in the core for an aggregate of 1350 many microflows. For end-to-end reservations, admission 1351 control and policing of microflows into the aggregate can be achieved 1352 using techniques such as RSVP Aggregation over MPLS TE Tunnels as per 1353 [AGGRE-TE]. However, in the case of inter-provider environments, 1354 these techniques require that admission control and policing be 1355 repeated at each trust boundary or that MPLS TE tunnels span multiple 1356 domains. 1358 7. Security Considerations 1360 To protect against denial of service attacks, the ingress gateway of 1361 the CL-region needs to police all CL packets and drop packets in 1362 excess of the reservation. This is similar to operations with 1363 existing IntServ behaviour. 1365 For pre-emption, it is considered acceptable from a security 1366 perspective that the ingress gateway can treat "emergency/military" 1367 CL flows preferentially compared with "ordinary" CL flows. However, 1368 in the rest of the CL-region they are not distinguished (nonetheless, 1369 our proposed technique does not preclude the use of different DSCPs 1370 at the packet level as well as different priorities at the flow 1371 level). Keeping emergency traffic indistinguishable at the packet 1372 level minimises the opportunity for new security attacks. For 1373 example, if instead a mechanism used different DSCPs for 1374 "emergency/military" and "ordinary" packets, then an attacker could 1375 specifically target the former in the data plane (perhaps for DoS or 1376 for eavesdropping). 1378 Further security aspects are to be considered later. 1380 8.
Acknowledgements 1382 The admission control mechanism evolved from the work led by Martin 1383 Karsten on the Guaranteed Stream Provider developed in the M3I 1384 project [GSPa, GSP-TR], which in turn was based on the theoretical 1385 work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, 1386 Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet 1387 and June Tay (BT) helped develop and evaluate this approach. 1389 9. Comments solicited 1391 Comments and questions are encouraged and very welcome. They can be 1392 sent to the Transport Area Working Group's mailing list, 1393 tsvwg@ietf.org, and/or to the authors. 1395 10. Changes from the -00 version of this draft 1397 There are several modifications to the admission control mechanism 1398 described in the first version of the draft, but the main technical 1399 change is the addition of the whole of the Pre-emption mechanism. 1401 11. Appendixes 1403 11.1. Appendix A: Explicit Congestion Notification 1405 This Appendix provides a brief summary of Explicit Congestion 1406 Notification (ECN). 1408 [RFC3168] specifies the incorporation of ECN into TCP and IP, including 1409 ECN's use of two bits in the IP header. It specifies a method for 1410 indicating incipient congestion to end-nodes (e.g. as in RED, Random 1411 Early Detection), where the notification is through ECN marking 1412 packets rather than dropping them. 1414 ECN uses two bits in the IP header of both IPv4 and IPv6 packets: 1416 0 1 2 3 4 5 6 7 1417 +-----+-----+-----+-----+-----+-----+-----+-----+ 1418 | DS FIELD, DSCP | ECN FIELD | 1419 +-----+-----+-----+-----+-----+-----+-----+-----+ 1421 DSCP: differentiated services codepoint 1422 ECN: Explicit Congestion Notification 1424 Figure A.1: The Differentiated Services and ECN Fields in IP.
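As a concrete illustration of the layout in Figure A.1, the two fields can be separated with simple bit operations. The following sketch is purely illustrative; the function name and constant names are ours, not part of [RFC3168] or any standard API:

```python
# Illustrative sketch: split the 8-bit DS/ECN octet (the former IPv4 TOS /
# IPv6 Traffic Class byte) into the DS field (upper six bits) and the ECN
# field (lower two bits), per the layout of Figure A.1.

def split_ds_ecn_octet(octet: int) -> tuple:
    """Return (dscp, ecn) from the 8-bit DS/ECN octet."""
    dscp = (octet >> 2) & 0x3F  # upper six bits: DiffServ codepoint
    ecn = octet & 0x03          # lower two bits: ECN field
    return dscp, ecn

# The four ECN codepoints (see the table that follows):
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
```

For example, an octet of 0xBB carries DSCP 46 (EF) with the CE codepoint set.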
1426 The two bits of the ECN field have four ECN codepoints, '00' to '11': 1427 +-----+-----+ 1428 | ECN FIELD | 1429 +-----+-----+ 1430 ECT CE 1431 0 0 Not-ECT 1432 0 1 ECT(1) 1433 1 0 ECT(0) 1434 1 1 CE 1436 Figure A.2: The ECN Field in IP. 1438 The not-ECT codepoint '00' indicates a packet that is not using ECN. 1440 The CE codepoint '11' is set by a router to indicate congestion to 1441 the end nodes. The term 'CE packet' denotes a packet that has the CE 1442 codepoint set. 1444 The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and 1445 ECT(1) respectively) are set by the data sender to indicate that the 1446 end-points of the transport protocol are ECN-capable. Routers treat 1447 the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to 1448 use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a 1449 packet-by-packet basis. The use of two codepoints for ECT is 1450 motivated primarily by the desire to allow mechanisms for the data 1451 sender to verify that network elements are not erasing the CE 1452 codepoint, and that data receivers are properly reporting to the 1453 sender the receipt of packets with the CE codepoint set. 1455 ECN requires support from the transport protocol, in addition to the 1456 functionality given by the ECN field in the IP packet header. 1457 [RFC3168] addresses the addition of ECN Capability to TCP, specifying 1458 three new pieces of functionality: negotiation between the endpoints 1459 during connection setup to determine if they are both ECN-capable; an 1460 ECN-Echo (ECE) flag in the TCP header so that the data receiver can 1461 inform the data sender when a CE packet has been received; and a 1462 Congestion Window Reduced (CWR) flag in the TCP header so that the 1463 data sender can inform the data receiver that the congestion window 1464 has been reduced. 1466 The transport layer (e.g.,
TCP) must respond, in terms of congestion 1467 control, to a *single* CE packet as it would to a packet drop. 1469 The advantage of setting the CE codepoint as an indication of 1470 congestion, instead of relying on packet drops, is that it allows the 1471 receiver(s) to receive the packet, thus avoiding the potential for 1472 excessive delays due to retransmissions after packet losses. 1474 11.2. Appendix B: What is distributed measurement-based admission 1475 control? 1477 This Appendix briefly explains what distributed measurement-based 1478 admission control is [Breslau99]. 1480 Traditional admission control algorithms for 'hard' real-time 1481 services (those providing a firm delay bound for example) guarantee 1482 QoS by using 'worst case analysis'. Each time a flow is admitted, its 1483 traffic parameters are examined and the network re-calculates the 1484 remaining resources. When the network gets a new request, it therefore 1485 knows for certain whether the prospective flow, with its particular 1486 parameters, should be admitted. However, parameter-based admission 1487 control algorithms result in under-utilisation when the traffic is 1488 bursty. Therefore 'soft' real time services - like Controlled Load - 1489 can use a more relaxed admission control algorithm. 1491 This idea suggests measurement-based admission control (MBAC). The 1492 aim of MBAC is to provide a statistical service guarantee. The 1493 classic scenario for MBAC is where each node participates in hop-by- 1494 hop admission control, characterising existing traffic locally 1495 through measurements (instead of keeping accurate track of traffic 1496 as it is admitted), in order to determine the current value of some 1497 parameter, e.g. load. Note that for scalability the measurement is of 1498 the aggregate of the flows in the local system. The measured 1499 parameter(s) are then compared to the requirements of the prospective 1500 flow to see whether it should be admitted.
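The measure-and-compare step above can be sketched as follows. Everything here is an illustrative assumption rather than a specified algorithm: the class name, the choice of aggregate load as the single measured parameter, and the simple capacity check.

```python
# Sketch of the MBAC decision outlined in Appendix B: existing traffic is
# characterised by measurement (a smoothed aggregate load) rather than by
# book-keeping of every admitted flow's parameters. Names are illustrative.

class MbacMeter:
    def __init__(self, weight: float):
        self.weight = weight          # EWMA smoothing factor, 0 < w <= 1
        self.measured_load = 0.0      # smoothed aggregate load (bits/s)

    def update(self, sample_load: float) -> None:
        # Smooth the raw load sample so that bursts do not dominate
        # the admission decision.
        w = self.weight
        self.measured_load = w * sample_load + (1 - w) * self.measured_load

    def admit(self, flow_rate: float, capacity: float) -> bool:
        # Admit the prospective flow only if the measured aggregate plus
        # the flow's requested rate stays within the configured capacity.
        return self.measured_load + flow_rate <= capacity
```

In the distributed case described next, the measurement is not local to one node but accumulated for the whole ingress-to-egress path.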
1502 MBAC may also be performed centrally for a network, in which case it 1503 uses centralised measurements by a bandwidth broker. 1505 We use distributed MBAC. "Distributed" means that the measurement is 1506 accumulated for the 'whole-path' using in-band signalling. In our 1507 case, this means that the measurement of existing traffic is for the 1508 same pair of ingress and egress gateways as the prospective 1509 microflow. 1511 In fact our mechanism can be said to be distributed in three ways: 1512 all nodes on the ingress-egress path affect the Congestion-Level- 1513 Estimate; the admission control decision is made just once on behalf 1514 of all the nodes on the path across the CL-region; and the ingress 1515 and egress gateways cooperate to perform MBAC. 1517 11.3. Appendix C: Calculating the Exponentially weighted moving average 1518 (EWMA) 1520 At the egress gateway, for every CL packet arrival: 1522 [EWMA-total-bits]n+1 = (w * bits-in-packet) + ((1-w) * [EWMA- 1523 total-bits]n ) 1525 [EWMA-CE-bits]n+1 = (B * w * bits-in-packet) + ((1-w) * [EWMA-CE- 1526 bits]n ) 1528 Then, per new flow arrival: 1530 [Congestion-Level-Estimate]n+1 = [EWMA-CE-bits]n+1 / [EWMA- 1531 total-bits]n+1 1533 where 1535 EWMA-total-bits is the total number of bits in CL packets, calculated 1536 as an exponentially weighted moving average (EWMA) 1538 EWMA-CE-bits is the total number of bits in CL packets where the 1539 packet has its CE codepoint set, again calculated as an EWMA. 1541 B is either 0 or 1: 1543 B = 0 if the CL packet does not have its CE codepoint set 1545 B = 1 if the CL packet has its CE codepoint set 1547 w is the exponential weighting factor. 1549 Varying the value of the weight trades off between the smoothness and 1550 responsiveness of the estimate of the percentage of CE packets.
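The per-packet and per-flow algorithms above can be sketched as follows; the class wrapper and names are illustrative, and returning zero before any packet has been metered is a simplification (in the architecture an unknown or stale estimate instead triggers probing).

```python
# Sketch of the Appendix C algorithms: per-CL-packet EWMA updates at the
# egress gateway, and the Congestion-Level-Estimate read per new flow
# arrival. Variable names follow the draft's formulas.

class CongestionLevelMeter:
    def __init__(self, w: float):
        self.w = w                  # exponential weighting factor
        self.ewma_total_bits = 0.0  # EWMA-total-bits
        self.ewma_ce_bits = 0.0     # EWMA-CE-bits

    def on_packet(self, bits_in_packet: int, ce_marked: bool) -> None:
        # Per CL packet arrival: B is 1 if the CE codepoint is set, else 0.
        w = self.w
        b = 1 if ce_marked else 0
        self.ewma_total_bits = w * bits_in_packet + (1 - w) * self.ewma_total_bits
        self.ewma_ce_bits = b * w * bits_in_packet + (1 - w) * self.ewma_ce_bits

    def congestion_level_estimate(self) -> float:
        # Per new flow arrival: fraction of bits carried in CE-marked packets.
        if self.ewma_total_bits == 0:
            return 0.0  # simplification: no packets metered yet
        return self.ewma_ce_bits / self.ewma_total_bits
```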
1551 However, in general both can be achieved, given our original 1552 assumption of many CL microflows and remembering that the EWMA is 1553 calculated on the basis of aggregate traffic between the ingress and 1554 egress gateways. 1555 There will be a threshold inter-arrival time between packets of the 1556 same aggregate below which the egress will consider the estimate of 1557 the Congestion-Level-Estimate as too stale, and it will then trigger 1558 generation of probes by the ingress. 1560 The first two per-packet algorithms can be simplified, if their only 1561 use will be where the result of one is divided by the result of the 1562 other in the third, per-flow algorithm. 1564 [EWMA-total-bits]'n+1 = bits-in-packet + (w' * [EWMA- total- 1565 bits]n ) 1567 [EWMA-CE-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-CE-bits]n 1568 ) 1570 where w' = (1-w)/w. 1572 If w' is arranged to be a power of 2, these per packet algorithms can 1573 be implemented solely with a shift and an add. 1575 12. References 1577 A later version will distinguish normative and informative 1578 references. 1580 [AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, 1581 Michael Davenport, Chris Christou, Jerry Ash, Bur 1582 Goode, 'Aggregation of RSVP Reservations over MPLS 1583 TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-00 (work 1584 in progress), July 2005 1586 [ANSI.MLPP.Spec] American National Standards Institute, 1587 "Telecommunications- Integrated Services Digital 1588 Network (ISDN) - Multi-Level Precedence and Pre- 1589 emption (MLPP) Service Capability", ANSI T1.619-1992 1590 (R1999), 1992. 1592 [ANSI.MLPP.Supplement] American National Standards Institute, "MLPP 1593 Service Domain Cause Value Changes", ANSI ANSI 1594 T1.619a-1994 (R1999), 1990. 1596 [AVQ] S. Kunniyur and R. Srikant "Analysis and Design of an 1597 Adaptive Virtual Queue (AVQ) Algorithm for Active 1598 Queue Management", In: Proc. 
ACM SIGCOMM'01, Computer 1599 Communication Review 31 (4) (October, 2001). 1601 [Breslau99] L. Breslau, S. Jamin, S. Shenker "Measurement-based 1602 admission control: what is the research agenda?", In: 1603 Proc. Int'l Workshop on Quality of Service 1999. 1605 [Breslau00] L. Breslau, E. Knightly, S. Shenker, I. Stoica, H. 1606 Zhang "Endpoint Admission Control: Architectural 1607 Issues and Performance", In: ACM SIGCOMM 2000 1609 [Briscoe] Bob Briscoe and Steve Rudkin, "Commercial Models for 1610 IP Quality of Service Interconnect", BT Technology 1611 Journal, Vol 23 No 2, April 2005. 1613 [CL-marking] Forthcoming. Supersedes draft-briscoe-tsvwg-cl-phb-00. 1615 [DCAC] Richard J. Gibbens and Frank P. Kelly "Distributed 1616 connection acceptance control for a connectionless 1617 network", In: Proc. International Teletraffic Congress 1618 (ITC16), Edinburgh, pp. 941-952 (1999). 1620 [EMERG-RQTS] Carlberg, K. and R. Atkinson, "General Requirements 1621 for Emergency Telecommunication Service (ETS)", RFC 1622 3689, February 2004. 1624 [EMERG-TEL] Carlberg, K. and R. Atkinson, "IP Telephony 1625 Requirements for Emergency Telecommunication Service 1626 (ETS)", RFC 3690, February 2004. 1628 [Floyd] S.
Floyd, 'Specifying Alternate Semantics for the 1629 Explicit Congestion Notification (ECN) Field', draft- 1630 floyd-ecn-alternates-02.txt (work in progress), August 1631 2005 1633 [GSPa] Karsten (Ed.), Martin "GSP/ECN Technology & 1634 Experiments", Deliverable: 15.3 PtIII, M3I Eu Vth 1635 Framework Project IST-1999-11429, URL: 1636 http://www.m3i.org/ (February, 2002) (superseded by 1637 [GSP-TR]) 1639 [GSP-TR] Martin Karsten and Jens Schmitt, "Admission Control 1640 Based on Packet Marking and Feedback Signalling -- 1641 Mechanisms, Implementation and Experiments", TU- 1642 Darmstadt Technical Report TR-KOM-2002-03, URL: 1643 http://www.kom.e-technik.tu- 1644 darmstadt.de/publications/abstracts/KS02-5.html (May, 1645 2002) 1647 [ITU.MLPP.1990] International Telecommunications Union, "Multilevel 1648 Precedence and Pre-emption Service (MLPP)", ITU-T 1649 Recommendation I.255.3, 1990. 1651 [Johnson] DM Johnson, 'QoS control versus generous 1652 dimensioning', BT Technology Journal, Vol 23 No 2, 1653 April 2005 1655 [Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, 1656 'Re-ECN: Adding Accountability for Causing Congestion 1657 to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-00 (work in 1658 progress), October 2005. 1660 [Re-feedback] Bob Briscoe, Arnaud Jacquet, Carla Di Cairano- 1661 Gilfedder, Andrea Soppera, 'Re-feedback for Policing 1662 Congestion Response in an Inter-network', ACM SIGCOMM 1663 2005, August 2005. 1665 [Reid] ABD Reid, 'Economics and scalability of QoS 1666 solutions', BT Technology Journal, Vol 23 No 2, April 1667 2005 1669 [RFC2211] J. Wroclawski, Specification of the Controlled-Load 1670 Network Element Service, September 1997 1672 [RFC2309] Braden, B., et al., "Recommendations on Queue 1673 Management and Congestion Avoidance in the Internet", 1674 RFC 2309, April 1998. 1676 [RFC2474] Nichols, K., Blake, S., Baker, F. and D.
Black, 1677 "Definition of the Differentiated Services Field (DS 1678 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1679 December 1998 1681 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, 1682 Z. and W. Weiss, 'An Architecture for Differentiated 1683 Services', RFC 2475, December 1998. 1685 [RFC2597] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, 1686 "Assured Forwarding PHB Group", RFC 2597, June 1999. 1688 [RFC2998] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, 1689 L., Speer, M., Braden, R., Davie, B., Wroclawski, J. 1690 and E. Felstaine, "A Framework for Integrated Services 1691 Operation Over DiffServ Networks", RFC 2998, November 1692 2000. 1694 [RFC3168] Ramakrishnan, K., Floyd, S. and D. Black "The Addition 1695 of Explicit Congestion Notification (ECN) to IP", RFC 1696 3168, September 2001. 1698 [RFC3246] B. Davie, A. Charny, J.C.R. Bennet, K. Benson, J.Y. Le 1699 Boudec, W. Courtney, S. Davari, V. Firoiu, D. 1700 Stiliadis, 'An Expedited Forwarding PHB (Per-Hop 1701 Behavior)', RFC 3246, March 2002. 1703 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., 1704 Vaananen, P., Krishnan, R., Cheval, P., and J. 1705 Heinanen, "Multi- Protocol Label Switching (MPLS) 1706 Support of Differentiated Services", RFC 3270, May 1707 2002. 1709 [RMD] Attila Bader, Lars Westberg, Georgios Karagiannis, 1710 Cornelia Kappler, Tom Phelan, 'RMD-QOSM - The Resource 1711 Management in DiffServ QoS model', draft-ietf-nsis- 1712 rmd-03 Work in Progress, June 2005. 1714 [RSVP-ECN] Francois Le Faucheur, Anna Charny, Bob Briscoe, Philip 1715 Eardley, Joe Barbiaz, Kwok-Ho Chan, 'RSVP Extensions 1716 for Admission Control over DiffServ using Pre- 1717 congestion Notification', draft-lefaucheur-rsvp-ecn-00 1718 (work in progress), October 2005. 1720 [RTECN] Babiarz, J., Chan, K. and V. Firoiu, 'Congestion 1721 Notification Process for Real-Time Traffic', draft- 1722 babiarz-tsvwg-rtecn-04 Work in Progress, July 2005.
1724 [RTECN-usage] Alexander, C., Ed., Babiarz, J. and J. Matthews, 1725 'Admission Control Use Case for Real-time ECN', draft- 1726 alexander-rtecn-admission-control-use-case-00, Work in 1727 Progress, February 2005. 1729 [vq] Costas Courcoubetis and Richard Weber "Buffer Overflow 1730 Asymptotics for a Switch Handling Many Traffic 1731 Sources" In: Journal Applied Probability 33 pp. 886-- 1732 903 (1996). 1734 Authors' Addresses 1736 Bob Briscoe 1737 BT Research 1738 B54/77, Sirius House 1739 Adastral Park 1740 Martlesham Heath 1741 Ipswich, Suffolk 1742 IP5 3RE 1743 United Kingdom 1744 Email: bob.briscoe@bt.com 1746 Dave Songhurst 1747 BT Research 1748 B54/69, Sirius House 1749 Adastral Park 1750 Martlesham Heath 1751 Ipswich, Suffolk 1752 IP5 3RE 1753 United Kingdom 1754 Email: dsonghurst@jungle.bt.co.uk 1756 Philip Eardley 1757 BT Research 1758 B54/77, Sirius House 1759 Adastral Park 1760 Martlesham Heath 1761 Ipswich, Suffolk 1762 IP5 3RE 1763 United Kingdom 1764 Email: philip.eardley@bt.com 1765 Francois Le Faucheur 1766 Cisco Systems, Inc. 1767 Village d'Entreprise Green Side - Batiment T3 1768 400, Avenue de Roumanille 1769 06410 Biot Sophia-Antipolis 1770 France 1771 Email: flefauch@cisco.com 1773 Anna Charny 1774 Cisco Systems 1775 300 Apollo Drive 1776 Chelmsford, MA 01824 1777 USA 1778 Email: acharny@cisco.com 1780 Kwok Ho Chan 1781 Nortel Networks 1782 600 Technology Park Drive 1783 Billerica, MA 01821 1784 USA 1785 Email: khchan@nortel.com 1787 Jozef Z. 
Babiarz 1788 Nortel Networks 1789 3500 Carling Avenue 1790 Ottawa, Ont K2H 8E9 1791 Canada 1792 Email: babiarz@nortel.com 1794 Intellectual Property Statement 1796 The IETF takes no position regarding the validity or scope of any 1797 Intellectual Property Rights or other rights that might be claimed to 1798 pertain to the implementation or use of the technology described in 1799 this document or the extent to which any license under such rights 1800 might or might not be available; nor does it represent that it has 1801 made any independent effort to identify any such rights. Information 1802 on the procedures with respect to rights in RFC documents can be 1803 found in BCP 78 and BCP 79. 1805 Copies of IPR disclosures made to the IETF Secretariat and any 1806 assurances of licenses to be made available, or the result of an 1807 attempt made to obtain a general license or permission for the use of 1808 such proprietary rights by implementers or users of this 1809 specification can be obtained from the IETF on-line IPR repository at 1810 http://www.ietf.org/ipr. 1812 The IETF invites any interested party to bring to its attention any 1813 copyrights, patents or patent applications, or other proprietary 1814 rights that may cover technology that may be required to implement 1815 this standard. Please address the information to the IETF at 1816 ietf-ipr@ietf.org 1818 Disclaimer of Validity 1820 This document and the information contained herein are provided on an 1821 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1822 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 1823 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 1824 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 1825 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1826 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1828 Copyright Statement 1830 Copyright (C) The Internet Society (2005). 
1832 This document is subject to the rights, licenses and restrictions 1833 contained in BCP 78, and except as set forth therein, the authors 1834 retain all their rights.