TSVWG                                                         B. Briscoe
Internet Draft                                                P. Eardley
draft-briscoe-tsvwg-cl-architecture-01.txt                  D. Songhurst
Expires: April 2006                                                   BT

                                                          F. Le Faucheur
                                                               A. Charny
                                                      Cisco Systems, Inc

                                                              J. Babiarz
                                                                 K. Chan
                                                                  Nortel

                                                        October 24, 2005

  A Framework for Admission Control over DiffServ using Pre-Congestion
                               Notification
               draft-briscoe-tsvwg-cl-architecture-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 24, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005). All Rights Reserved.

Abstract

   This document describes a framework to achieve an end-to-end
   Controlled Load (CL) service without the scalability problems of
   previous approaches. Flow admission control and, if necessary, flow
   pre-emption preserve the CL service to admitted flows. But interior
   routers within a large DiffServ-based region of the Internet do not
   require flow state or signalling. They only have to give early
   warning of their own congestion by bulk packet marking using a new
   pre-congestion notification behaviour.
   Gateways around the edges of the region convert measurements of this
   packet-granularity marking into admission control and pre-emption
   functions at flow granularity.

Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION)

   This document is posted as an Internet-Draft with the intention of
   eventually becoming an INFORMATIONAL RFC, rather than a standards
   track document.

Table of Contents

   1. Introduction....................................................4
      1.1. Summary....................................................4
         1.1.1. Admission control.....................................5
         1.1.2. Pre-emption...........................................7
         1.1.3. Both admission control and pre-emption................8
      1.2. Terminology................................................8
      1.3. Existing terminology......................................10
      1.4. Standardisation requirements..............................10
      1.5. Structure of rest of the document.........................10
   2. Key aspects of the framework...................................11
      2.1. Key goals.................................................11
      2.2. Key assumptions...........................................12
      2.3. Key benefits..............................................15
   3. Architecture...................................................17
      3.1. Admission control.........................................17
         3.1.1. Pre-Congestion Notification marking behaviour........17
         3.1.2. Measurements to support admission control............18
         3.1.3. How edge-to-edge admission control supports
                end-to-end QoS signalling............................19
         3.1.4. Use case.............................................19
      3.2. Pre-emption...............................................20
         3.2.1. Alerting an ingress gateway that pre-emption may
                be needed............................................20
         3.2.2. Determining the right amount of CL traffic to drop...23
         3.2.3. Use case for pre-emption.............................24
   4. Details........................................................25
      4.1. Ingress gateways..........................................26
      4.2. Interior nodes............................................27
      4.3. Egress gateways...........................................27
      4.4. Failures..................................................28
   5. Potential future extensions....................................29
      5.1. Multi-domain and multi-operator usage.....................29
      5.2. Adaptive bandwidth for the Controlled Load service........29
      5.3. Controlled Load service with end-to-end Pre-Congestion
           Notification..............................................29
      5.4. MPLS-TE...................................................30
   6. Relationship to other QoS mechanisms...........................30
      6.1. IntServ Controlled Load...................................30
      6.2. Integrated services operation over DiffServ...............30
      6.3. Differentiated Services...................................31
      6.4. ECN.......................................................31
      6.5. RTECN.....................................................31
      6.6. RMD.......................................................31
      6.7. RSVP Aggregation over MPLS-TE.............................32
   7. Security Considerations........................................32
   8. Acknowledgements...............................................33
   9. Comments solicited.............................................33
   10. Changes from the -00 version of this draft....................33
   11. Appendixes....................................................33
      11.1. Appendix A: Explicit Congestion Notification.............33
      11.2. Appendix B: What is distributed measurement-based
            admission control?.......................................35
      11.3. Appendix C: Calculating the Exponentially weighted
            moving average (EWMA)....................................36
   12. References....................................................37
   Authors' Addresses................................................41
   Intellectual Property Statement...................................42
   Disclaimer of Validity............................................43
   Copyright Statement...............................................43

1. Introduction

1.1. Summary

   This document describes a framework to achieve an end-to-end
   controlled load service by using - within a large region of the
   Internet - DiffServ and edge-to-edge distributed measurement-based
   admission control and flow pre-emption. Controlled load service is a
   quality of service (QoS) closely approximating the QoS that the same
   flow would receive from a lightly loaded network element [RFC2211].
   Controlled Load (CL) is useful for inelastic flows such as those for
   real-time media.

   In line with the "IntServ over DiffServ" framework defined in
   [RFC2998], the CL service is supported end-to-end and RSVP
   signalling [RFC2205] is used end-to-end, over an edge-to-edge
   DiffServ region.
     ___    ___    _______________________________________    ____    ___
    |   |  |   |  |                                       |  |    |  |   |
    |   |  |   |  |Ingress      Interior         Egress   |  |    |  |   |
    |   |  |   |  |gateway       nodes           gateway  |  |    |  |   |
    |   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   |
    |   |  |   |  | CL-   |  | CL-   |  | CL-   |  |      |  |    |  |   |
    |   |..|   |..|marking|..|marking|..|marking|..| Meter|..|    |..|   |
    |   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   |
    |   |  |   |  |  \                              /     |  |    |  |   |
    |   |  |   |  |   \                            /      |  |    |  |   |
    |   |  |   |  |    \ Congestion-Level-Estimate /      |  |    |  |   |
    |   |  |   |  |     \ (for admission control) /       |  |    |  |   |
    |   |  |   |  |      --<-----<----<----<-----<--      |  |    |  |   |
    |   |  |   |  |     Sustainable-Aggregate-Rate        |  |    |  |   |
    |   |  |   |  |       (for pre-emption)               |  |    |  |   |
    |___|  |___|  |_______________________________________|  |____|  |___|

     Sx    Access               CL-region                   Access    Rx
     End   Network                                          Network   End
     Host                                                             Host

                  <------ edge-to-edge signalling ------>
                  (for admission control & pre-emption)

  <-------------------end-to-end QoS signalling protocol--------------->

   Figure 1: Overall QoS architecture (NB terminology explained later)

   In Section 1.1.1 we summarise how admission of new CL microflows is
   controlled so as to deliver the required QoS. In abnormal
   circumstances, for instance a disaster affecting multiple interior
   nodes, the QoS of existing CL microflows may degrade even if care
   was exercised when admitting those microflows before those
   circumstances arose. Therefore we also propose a mechanism
   (summarised in Section 1.1.2) to pre-empt some of the existing
   microflows. Then the remaining microflows retain their expected QoS,
   while improved QoS is quickly restored to lower priority traffic.

1.1.1. Admission control

   This document describes a new admission control procedure for an
   edge-to-edge region, which uses a new per-hop Explicit Congestion
   Notification marking behaviour as a fundamental building block.
   In turn, an end-to-end CL service would use this as a building block
   within a broader QoS architecture.

   The per-hop, edge-to-edge and end-to-end aspects are now briefly
   introduced in turn.

   Appendix A provides a brief summary of Explicit Congestion
   Notification (ECN) [RFC3168]. It specifies that a router sets the
   ECN field to the Congestion Experienced (CE) value as a warning of
   incipient congestion. RFC 3168 doesn't specify a particular
   algorithm for setting the CE codepoint, although RED (Random Early
   Detection) is expected to be used. We introduce a new algorithm in
   this document, called Pre-Congestion Notification. It aims to set
   the CE codepoint before there is any significant build-up of CL
   packets in the queue, as an "early warning" that the rate of traffic
   is getting close to the engineered capacity. Hence it can be used
   with per-hop behaviours (PHBs) designed to operate with very low
   queue occupancy. Note that our use of the ECN field operates across
   the CL-region, i.e. edge-to-edge, and not host-to-host as in
   [RFC3168].

   This framework assumes that the Pre-Congestion Notification
   behaviour is used in a controlled environment, i.e. within the
   controlled edge-to-edge region.

   Within the controlled edge-to-edge region, a particular packet
   receives the Pre-Congestion Notification behaviour if the packet's
   header fulfils two conditions: its DSCP (differentiated services
   codepoint) corresponds to the PHB for CL traffic, and its ECN field
   indicates ECN Capable Transport (ECT).

   Turning next to the edge-to-edge aspect. All nodes within a region
   of the Internet, which we call the CL-region, apply the PHB used for
   CL traffic and the Pre-Congestion Notification behaviour. Traffic
   must enter/leave the CL-region through ingress/egress gateways,
   which have special functionality.
   Typically the CL-region is the core or backbone of an operator. The
   CL service is achieved "edge-to-edge" across the CL-region, by using
   distributed measurement-based admission control: the decision
   whether to admit a new microflow depends on a measurement of the
   existing traffic between the same pair of ingress and egress
   gateways as the prospective new microflow. (See Appendix B for
   further discussion of "What is distributed measurement-based
   admission control?")

   As CL packets travel across the CL-region, nodes will set the CE
   codepoint (according to the Pre-Congestion Notification algorithm)
   as an "early warning" of potential congestion, i.e. before there is
   any significant build-up of CL packets in the queue. For traffic
   from each remote ingress gateway, the CL-region's egress gateway
   measures the fraction of CL traffic for which the CE codepoint is
   set. The egress gateway calculates this value on a per-bit basis as
   an exponentially weighted moving average, which we term the
   Congestion-Level-Estimate, and then reports it to the CL-region's
   ingress gateway, piggy-backed on the signalling for a new flow. The
   ingress gateway only admits the new CL microflow if the Congestion-
   Level-Estimate is less than a threshold value. Hence previously
   accepted CL microflows will suffer minimal queuing delay, jitter and
   loss.

   In turn, the edge-to-edge architecture is a building block in
   delivering an end-to-end CL service. The approach is similar to that
   described in [RFC2998] for Integrated services operation over
   DiffServ networks. Like [RFC2998], an IntServ class (CL in our case)
   is achieved end-to-end, with a CL-region viewed as a single
   reservation hop in the total end-to-end path. Interior nodes of the
   CL-region do not process flow signalling nor do they hold state. We
   assume that the end-to-end signalling mechanism is RSVP (Section
   2.2).
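   As an informal illustration of the admission control procedure
   described above (a sketch only, not part of the framework), the
   following fragment shows an egress gateway maintaining a per-
   aggregate Congestion-Level-Estimate as an exponentially weighted
   moving average of the CE-marked fraction, counted in bits, and an
   ingress gateway admitting a new microflow only while that estimate
   is below a threshold. The class and function names, the EWMA weight
   and the threshold value are all invented for the example; this
   document does not specify particular values (Appendix C discusses
   the EWMA calculation).

```python
# Illustrative sketch of distributed measurement-based admission
# control at the gateways. The EWMA weight and admission threshold
# below are invented for the example, not specified by this document.

CLE_THRESHOLD = 0.01   # hypothetical admission threshold
EWMA_WEIGHT = 0.05     # hypothetical smoothing weight

class EgressGateway:
    """Measures the Congestion-Level-Estimate per CL-region-aggregate."""

    def __init__(self):
        # ingress gateway id -> EWMA of the CE-marked fraction
        self.cle = {}

    def on_interval(self, ingress_id, marked_bits, total_bits):
        """Update the EWMA from the CE-marked fraction of one
        measurement interval, counted on a per-bit basis."""
        if total_bits == 0:
            return
        sample = marked_bits / total_bits
        prev = self.cle.get(ingress_id, 0.0)
        self.cle[ingress_id] = (1 - EWMA_WEIGHT) * prev + EWMA_WEIGHT * sample

    def report(self, ingress_id):
        """Value piggy-backed on the flow signalling back to the
        ingress gateway of this CL-region-aggregate."""
        return self.cle.get(ingress_id, 0.0)

def admit_new_flow(congestion_level_estimate):
    """Ingress admits the new CL microflow only below the threshold."""
    return congestion_level_estimate < CLE_THRESHOLD
```

   In deployment the threshold would be an operator configuration
   choice, and the report would travel in the extended end-to-end
   signalling rather than as a function return value.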
   However, the RSVP signalling may itself be originated or terminated
   by proxies still closer to the edge of the network, such as home
   hubs or the like, triggered in turn by application layer signalling.
   [RFC2998] and our approach are compared further in Section 6.2.

   An important benefit compared with the IntServ over DiffServ model
   [RFC2998] arises from the fact that the load is controlled
   dynamically rather than with traffic conditioning agreements (TCAs).
   TCAs were originally introduced in the (informational) DiffServ
   architecture [RFC2475] as an alternative to reservation processing
   in the interior region, in order to reduce the burden on interior
   nodes. With TCAs, in practice service providers rely on
   subscription-time Service Level Agreements that statically define
   the parameters of the traffic that will be accepted from a customer.
   The problem arises because the TCA at the ingress must allow any
   destination address, if it is to remain scalable. But for longer
   topologies, the chances increase that traffic will focus on an
   interior resource, even though it is within contract at the ingress
   [Reid], e.g. all flows converge on the same egress gateway. Even
   though networks can be engineered to make such failures rare, when
   they occur all inelastic flows through the congested resource fail
   catastrophically.

   Distributed measurement-based admission control avoids reservation
   processing (whether per flow or aggregated) on interior nodes, but
   flows are still blocked dynamically in response to actual congestion
   on any interior node. Hence there is no need for accurate or
   conservative prediction of the traffic matrix.

1.1.2. Pre-emption

   An essential QoS issue in core and backbone networks is being able
   to cope with failures of nodes and links.
   The consequent re-routing can cause severe congestion on some links
   and hence degrade the QoS experienced by on-going microflows and
   other, lower priority traffic. Even when the network is engineered
   to sustain a single link failure, multiple link failures (e.g. due
   to a fibre cut or a node failure, or a natural disaster) can cause
   violation of capacity constraints and resulting QoS failures. Our
   solution uses rate-based pre-emption, so that enough of the
   previously admitted CL microflows are dropped to ensure that the
   remaining ones again receive QoS commensurate with the CL service
   and at least some QoS is quickly restored to other traffic classes.

   The solution has two aspects. First, triggering the ingress gateway
   to test whether pre-emption may be needed. This involves an optional
   new router marking behaviour for Pre-emption Alert. Secondly,
   calculating the right amount of traffic to drop. This involves the
   egress gateway measuring, and reporting to the ingress gateway, the
   current amount of CL traffic received from that particular ingress
   gateway. The ingress gateway compares this measurement (which is the
   amount that the network can actually support, and which we thus call
   the Sustainable-Aggregate-Rate) with the rate that it is sending,
   and hence determines how much traffic needs to be pre-empted.

   The solution operates within a little over one round trip time: the
   time required for microflow packets that have experienced Pre-
   emption Alert marking to travel downstream through the CL-region
   and arrive at the egress gateway, plus some additional time for the
   egress gateway to measure the rate seen after it has been alerted
   that pre-emption may be needed, and the time for the egress gateway
   to report this information to the ingress gateway.

1.1.3. Both admission control and pre-emption

   This document describes both the admission control and pre-emption
   mechanisms, and we suggest that an operator uses both. However, we
   do not require this, and some operators may want to implement only
   one.

   For example, an operator could use just admission control, solving
   heavy congestion (caused by re-routing) by 'just waiting': as
   sessions end, existing microflows naturally depart from the system
   over time, and the admission control mechanism will prevent
   admission of new microflows that use the affected links. So the CL-
   region will naturally return to normal controlled load service, but
   with reduced capacity. The drawback of this approach would be that,
   until flows naturally depart to relieve the congestion, all flows
   and lower priority services will be adversely affected. As another
   example, an operator could use just admission control, avoiding
   heavy congestion (caused by re-routing) by 'capacity planning': by
   configuring admission control thresholds to lower levels than the
   network could accept in normal situations, such that the load after
   failure is expected to stay below acceptable levels even with
   reduced network resources.

   On the other hand, an operator could rely just on the traffic
   conditioning agreements of the DiffServ architecture [RFC2475] for
   admission control. The pre-emption mechanism described in this
   document would then be used to counteract the problem described at
   the end of Section 1.1.1.

1.2. Terminology

   o Ingress gateway: node at an ingress to the CL-region. A CL-region
     may have several ingress gateways.

   o Egress gateway: node at an egress from the CL-region. A CL-region
     may have several egress gateways.

   o Interior node: a node which is part of the CL-region, but isn't an
     ingress or egress node.
   o CL-region: A region of the Internet in which all traffic
     enters/leaves through an ingress/egress gateway and all nodes run
     the Pre-Congestion Notification and Pre-emption Alert behaviours.
     A CL-region is a DiffServ region (a DiffServ region is either a
     single DiffServ domain or a set of contiguous DiffServ domains),
     but note that the CL-region does not use the traffic conditioning
     agreements (TCAs) of the (informational) DiffServ architecture.

   o CL-region-aggregate: all the microflows between a specific pair
     of ingress and egress gateways. Note there is no identifier
     unique to the aggregate.

   o Pre-Congestion Notification: a new algorithm for deciding whether
     to set the ECN CE codepoint (Explicit Congestion Notification
     Congestion Experienced), for use by all routers in the CL-region.
     A router sets the CE codepoint as an "early warning" that the
     load is nearing the engineered admission control capacity, before
     there is any significant build-up of CL packets in the queue.

   o Inverse-token-bucket: a token bucket for which tokens are added
     when packets are queued for transmission on the corresponding
     link and consumed at a fixed rate. This is the inverse of a
     normal token bucket.

   o Pre-emption Alert: a new router marking behaviour, for use by
     either all or none of the routers in the CL-region. A router re-
     marks a packet to Re-marked-CL to warn explicitly that pre-
     emption may be needed.

   o Congestion-Level-Estimate: the number of bits in CL packets that
     have the CE codepoint set, divided by the number of bits in all
     CL packets. It is calculated as an exponentially weighted moving
     average. It is calculated by an egress gateway for the CL packets
     from a particular ingress gateway, i.e. there is a Congestion-
     Level-Estimate for each CL-region-aggregate.
   o Sustainable-Aggregate-Rate: the rate of traffic that the network
     can actually support for a specific CL-region-aggregate. So it is
     measured by an egress gateway for the CL packets from a
     particular ingress gateway.

1.3. Existing terminology

   This is a placeholder for useful terminology that is defined
   elsewhere.

1.4. Standardisation requirements

   The framework described in this document has two new
   standardisation requirements:

   o new Pre-Congestion Notification and Pre-emption Alert marking
     behaviours are required, as detailed in [CL-marking].

   o the end-to-end signalling protocol needs to be modified to carry
     the Congestion-Level-Estimate report (for admission control) and
     the Sustainable-Aggregate-Rate (for pre-emption). With our
     assumption of RSVP (Section 2.2) as the end-to-end signalling
     protocol, this means that extensions to RSVP are required, as
     detailed in [RSVP-ECN], for example to carry the Congestion-
     Level-Estimate and Sustainable-Aggregate-Rate information from
     egress gateway to ingress gateway.

   We are discussing whether the PHB used by CL traffic should be a
   new PHB (indicated by a new DSCP) or whether the Expedited
   Forwarding (EF) PHB can be used with the addition of the required
   ECN marking behaviour.

   Other than these things, the arrangement uses existing IETF
   protocols throughout, although not in their usual architecture.

1.5. Structure of rest of the document

   Section 2 describes some key aspects of the framework: our goals,
   assumptions and the benefits we believe it has. Section 3 describes
   the architecture (including a use case), whilst Section 4
   summarises the required changes to the various nodes in the CL-
   region. Section 5 outlines some possible extensions. Section 6
   provides some comparison with existing QoS mechanisms.

2. Key aspects of the framework

   In this section we discuss the key aspects of the framework:

   o At a high level, our key goals, i.e. the functionality that we
     want to achieve

   o The assumptions that we're prepared to make

   o The consequent benefits these assumptions bring

2.1. Key goals

   The framework achieves an end-to-end controlled load (CL) service
   where a segment of the end-to-end path is an edge-to-edge Pre-
   Congestion Notification region. CL is a quality of service (QoS)
   closely approximating the QoS that the same flow would receive from
   a lightly loaded network element [RFC2211]. It is useful for
   inelastic flows such as those for real-time media.

   o The CL service should be achieved despite varying load levels of
     other sorts of traffic, which may or may not be rate adaptive
     (i.e. responsive to packet drops or ECN marks).

   o The CL service should be supported for a variety of possible CL
     sources: Constant Bit Rate (CBR), Variable Bit Rate (VBR) and
     voice with silence suppression. VBR is the most challenging to
     support.

   o After a localised failure in the interior of the CL-region
     causing heavy congestion, the CL service should recover
     gracefully by pre-empting (dropping) some of the admitted CL
     microflows, whilst preserving as many of them as possible with
     their full CL QoS.

   o It is suggested that pre-emption needs to be completed within 1-2
     seconds, because it is estimated that after a few seconds many
     affected users will start to hang up (and then not only is a pre-
     emption mechanism redundant and possibly even counter-productive,
     but also many more flows than necessary to reduce congestion may
     hang up). Also, other, lower priority traffic classes will not be
     restored to partial service until the higher priority CL service
     reduces its load on shared links.
   o The CL service should support emergency services ([EMERG-RQTS],
     [EMERG-TEL]) as well as the Assured Service, which is the IP
     implementation of the existing ITU-T/NATO/DoD telephone system
     architecture known as Multi-Level Pre-emption and Precedence
     [ITU.MLPP.1990] [ANSI.MLPP.Spec] [ANSI.MLPP.Supplement], or MLPP.
     In particular, this involves admitting new high priority sessions
     even when admission control thresholds are reached and new
     routine sessions are rejected. Similarly, it involves taking into
     account session priorities and properties at the time of pre-
     empting calls.

2.2. Key assumptions

   The framework does not try to deliver the above functionality in
   all scenarios. We make the following assumptions about the type of
   scenario to be solved.

   o Edge-to-edge: all the nodes in the CL-region are upgraded with
     the Pre-Congestion Notification and Pre-emption Alert mechanisms,
     and all the ingress and egress gateways are upgraded to perform
     the measurement-based admission control and pre-emption. Note
     that although the upgrades required are edge-to-edge, the CL
     service is provided end-to-end.

   o Additional load: we assume that any additional load offered
     within the reaction time of the admission control mechanism
     doesn't move the CL-region directly from no congestion to
     overload. So we assume there will always be an intermediate stage
     where some CL packets have their CE codepoint set, but they are
     still delivered without significant QoS degradation. We believe
     this is valid for core and backbone networks with typical call
     arrival patterns (given the reaction time is little more than one
     round trip time across the CL-region), but it is unlikely to be
     valid in access networks where the granularity of an individual
     call becomes significant.
   o  Aggregation: we assume that in normal operation there are many CL
      microflows within the CL-region, typically at least hundreds
      between any pair of ingress and egress gateways.  The implication
      is that the solution is targeted at core and backbone networks
      and possibly parts of large access networks.

   o  Trust: we assume that there is trust between all the nodes in the
      CL-region.  For example, this trust model is satisfied if one
      operator runs the whole of the CL-region.  But we make no such
      assumptions about the end nodes, i.e. depending on the scenario
      they may be trusted or untrusted by the CL-region.

   o  Signalling: we assume that the end-to-end signalling protocol is
      RSVP.  Section 3 describes how the CL-region fits into such an
      end-to-end QoS scenario, whilst [RSVP-ECN] describes the
      extensions to RSVP that are required.

   o  Separation: we assume that all nodes within the CL-region are
      upgraded with the CL mechanism, so the requirements of [Floyd]
      are met because the CL-region is an enclosed environment.  Also,
      an operator separates CL-traffic in the CL-region from outside
      traffic by administrative configuration of the ring of gateways
      around the region.  Within the CL-region we assume that the
      CL-traffic is separated from non-CL traffic.

   o  Routing: we assume that one of the following applies:

      (same path) all packets between a pair of ingress and egress
         gateways follow the same path.  This ensures that the
         Congestion-Level-Estimate used in the admission control
         procedure reflects the status of the path followed by the new
         flow's packets.

      (load balanced) packets between a pair of ingress and egress
         gateways follow different paths, but the load balancing scheme
         is tuned in the CL-region to distribute load such that the
         different paths always receive comparable relative load.
         This ensures that the Congestion-Level-Estimate used in the
         admission control procedure (which is computed taking into
         account packets travelling on all the paths) also
         approximately reflects the status of the actual path followed
         by the new microflow's packets.

      (worst case assumed) packets between a pair of ingress and egress
         gateways follow different paths, but (i) it is acceptable for
         the operator to keep the CL traffic between this pair of
         gateways to a level dictated by the most loaded of all paths
         between this pair of gateways (so that CL traffic may be
         rejected - or even pre-empted in some situations - even if one
         or more of the paths between the pair of gateways is operating
         below its engineered levels), and (ii) it is acceptable for
         that operator to configure engineered levels below optimum
         levels, to compensate for the fact that the effect on the
         Congestion-Level-Estimate of the congestion experienced over
         one of the paths may be diluted by traffic received over
         non-congested paths, so that lower thresholds need to be used
         in these cases to ensure early admission control rejection and
         pre-emption over the congested paths.

   We are investigating ways of loosening the restrictions set by some
   of these assumptions, for instance:

   o  Trust: to allow the CL-region to span multiple, non-trusting
      operators, using the technique of [Re-feedback] [Re-ECN]
      mentioned in Section 5.1.

   o  Signalling: we believe that the solution could operate with
      another signalling protocol such as NSIS.  We would very much
      welcome input / collaboration with the NSIS community in order to
      carry out similar work as done for RSVP.  It could also work with
      application level signalling as suggested in [RT-ECN].
   o  Additional load: we believe that the assumption is valid for core
      and backbone networks, with an appropriate margin between the
      inverse-token-bucket's token rate and the configured rate for CL
      traffic.  However, in principle a burst of admission requests can
      occur in a short time.  We expect this to be a rare event under
      normal conditions, but it could happen, e.g. due to a 'flash
      crowd'.  If it does, then more flows may be admitted than should
      be, triggering the pre-emption mechanisms.  To avoid the need for
      pre-emption, 'call gapping' could be used at the egress (i.e. the
      egress gateway paces out the admission of microflows).

   o  Separation: the assumption that CL traffic is separated from
      non-CL traffic implies that the CL traffic has its own PHB, not
      shared with other traffic.  We are looking at whether it could
      share Expedited Forwarding's PHB, but supplemented with the new
      Pre-Congestion Notification and Pre-emption Alert marking
      behaviours.  If this is possible, other PHBs (like Assured
      Forwarding) could be supplemented with the same new behaviours.
      This is similar to how RFC3168 ECN was defined to supplement any
      PHB.

   o  Routing: we are looking in greater detail at the solution in the
      presence of Equal Cost Multi-Path routing and at suitable
      enhancements.

2.3.  Key benefits

   We believe that the mechanism described in this document has several
   advantages:

   o  It achieves statistical guarantees of quality of service for
      microflows, delivering a very low delay, jitter and packet loss
      service suitable for applications like voice and video calls
      that generate real-time inelastic traffic.  This is because of
      its per-microflow admission control scheme, combined with its
      dynamic on-path "early warning" of potential congestion.
      The guarantee is at least as strong as with IntServ Controlled
      Load (Section 6.1 mentions why the guarantee may be somewhat
      better), but without the scalability problems of per-microflow
      IntServ.

   o  It can support "Emergency" and military Multi-Level Pre-emption
      and Priority services, even in times of heavy congestion (perhaps
      caused by failure of a node within the CL-region), by pre-empting
      on-going "ordinary" CL microflows.

   o  It scales well, because there is no signal processing or path
      state held by the interior nodes of the CL-region.

   o  It is resilient, again because no state is held by the interior
      nodes of the CL-region.  Hence during an interior routing change
      caused by a node failure no microflow state has to be relocated.
      The pre-emption mechanism further helps resilience because it
      rapidly reduces the load to one that the CL-region can support.

   o  It helps preserve, through the pre-emption mechanism, QoS for as
      many microflows as possible and for lower priority traffic in
      times of heavy congestion (e.g. caused by failure of an interior
      node).  Otherwise long-lived microflows could cause loss on all
      CL microflows for a long time.

   o  It avoids the potential catastrophic failure problem when the
      DiffServ architecture is used in large networks using statically
      provisioned capacity.  This is achieved by controlling the load
      dynamically, based on edge-to-edge-path real-time measurement of
      Pre-Congestion Notification, as discussed in Section 1.1.1.

   o  It requires minimal new standardisation, because it reuses
      existing QoS protocols and algorithms.

   o  It can be deployed incrementally, region by region or network by
      network.  Not all the regions or networks on the end-to-end path
      need to have it deployed.  Two CL-regions can even be separated
      by a network that uses another QoS mechanism (e.g. MPLS-TE).
   o  It provides a deployment path for use of ECN for real-time
      applications.  Operators can gain experience of ECN before its
      applicability to end-systems is understood and end terminals are
      ECN capable.

3.  Architecture

3.1.  Admission control

   In this section we describe the admission control mechanism.  We
   discuss the three pieces of the solution and then give an example of
   how they fit together in a use case:

   o  the new Pre-Congestion Notification marking behaviour used by all
      nodes in the CL-region

   o  how the measurements made support our admission control mechanism

   o  how the edge-to-edge mechanism fits into the end-to-end RSVP
      signalling

3.1.1.  Pre-Congestion Notification marking behaviour

   To support our admission control mechanism, each node in the
   CL-region runs an algorithm to determine whether to set the CE
   codepoint of a particular CL packet.

   Each link in the CL-region has a fixed rate (bandwidth) reflecting
   the engineered admission control capacity for CL traffic, under the
   control of management configuration.  In order to make the
   description more specific we assume a bulk 'inverse-token-bucket' is
   used on each link; other implementations are possible.  Tokens are
   added to our inverse-token-bucket when packets are queued for
   transmission on the corresponding link, and are consumed at a fixed
   rate that is slower than the configured rate.  This means that the
   amount of tokens starts to increase before the actual queue builds
   up, when it is in danger of doing so soon; hence it can be used as
   an "early warning" that the engineered capacity is nearly reached.
   The probability that a node sets the CE codepoint of a CL packet
   depends on the number of tokens in the inverse-token-bucket.
   Below a first threshold value of the number of tokens no packets
   have their CE codepoint set, and above a second threshold they all
   do; in between, the probability increases linearly.  Note that the
   same inverse-token-bucket is used for all the CL packets on that
   link, i.e. it operates in bulk on the CL behaviour aggregate and
   not per microflow.  The algorithm is detailed in [CL-marking].

      Probability
      of setting    ^
      CE codepoint  |
                    |
                  1_|                 _______________
                    |                /
                    |               /
                    |              /
                    |             /
                    |            /
                    |           /
                    |          /
                    |         /
                    |        /
                  0_|_______/
                    |
                    --------|---------|--------------->
                          min-      max-    Amount of tokens in
                       threshold threshold  inverse-token-bucket

        Figure 2: Setting the Congestion Experienced Codepoint

   How does a node know that it should apply the new Pre-Congestion
   Notification marking behaviour?  A CL packet is indicated by a
   combination of three things: the node itself is in the CL-region, so
   it is configured with a behaviour for CL packets; the ECN codepoint
   is set to ECN-Capable Transport (ECT); and the DSCP is set to the
   value configured for the CL behaviour aggregate in the CL-region.
   On the third point, we are currently considering whether the PHB
   used by CL traffic should be a new PHB (indicated by a new DSCP) or
   whether the Expedited Forwarding (EF) PHB can be used.

3.1.2.  Measurements to support admission control

   To support our admission control mechanism, the egress measures the
   Congestion-Level-Estimate for traffic from each remote ingress
   gateway, i.e. per CL-region-aggregate.  The Congestion-Level-
   Estimate is the number of bits in CL packets that have the CE
   codepoint set, divided by the number of bits in all CL packets.  It
   is calculated as an exponentially weighted moving average.  It is
   calculated by an egress node separately for the CL packets from each
   particular ingress node.
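As an illustration, the linear marking ramp of Figure 2 and the egress meter just described might be sketched as follows.  All names, and the EWMA weight, are our own illustrative choices, not taken from [CL-marking] or its Appendix:

```python
def marking_probability(tokens, min_threshold, max_threshold):
    """Probability of setting the CE codepoint of a CL packet, given the
    amount of tokens in the bulk inverse-token-bucket (Figure 2):
    0 below min-threshold, 1 above max-threshold, linear in between."""
    if tokens <= min_threshold:
        return 0.0
    if tokens >= max_threshold:
        return 1.0
    return (tokens - min_threshold) / (max_threshold - min_threshold)


class CongestionLevelEstimate:
    """Egress-side meter: smoothed fraction of CL bits arriving in
    CE-marked packets, kept separately per ingress gateway (i.e. per
    CL-region-aggregate)."""

    def __init__(self, weight=0.01):
        self.weight = weight        # EWMA weight (illustrative value)
        self.marked_bits = 0.0      # smoothed CE-marked bits per packet
        self.total_bits = 0.0       # smoothed total CL bits per packet

    def on_cl_packet(self, size_bits, ce_marked):
        w = self.weight
        self.marked_bits += w * ((size_bits if ce_marked else 0) - self.marked_bits)
        self.total_bits += w * (size_bits - self.total_bits)

    @property
    def estimate(self):
        """CE-marked bits divided by all CL bits, smoothed exponentially."""
        return self.marked_bits / self.total_bits if self.total_bits else 0.0
```

The admission decision then reduces to comparing the reported estimate against a configured threshold, as described later in this section.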
   This Congestion-Level-Estimate provides an estimate of how near the
   links on the path inside the CL-region are getting to the engineered
   admission control capacity.  Note that the metering is done
   separately per ingress node, because there may be sufficient
   capacity on all the nodes on the path between one ingress gateway
   and a particular egress, but not from a second ingress to that same
   egress gateway.

3.1.3.  How edge-to-edge admission control supports end-to-end QoS
        signalling

   Consider a scenario that consists of two end hosts, each connected
   to their own access networks, which are linked by the CL-region.  A
   source tries to set up a new CL microflow by sending an RSVP PATH
   message, and the receiving end host replies with an RSVP RESV
   message.  Outside the CL-region some other method, for instance
   IntServ, is used to provide QoS.  From the perspective of RSVP the
   CL-region is a single hop, so the RSVP PATH and RESV messages are
   processed by the ingress and egress gateways but are carried
   transparently across all the interior nodes; hence, the ingress and
   egress gateways hold per-microflow state, whilst no state is kept by
   the interior nodes.  So far this is as in IntServ over DiffServ
   [RFC2998].  However, in order to support our admission control
   mechanism, the egress gateway adds to the RESV message an opaque
   object which states the current Congestion-Level-Estimate for the
   relevant CL-region-aggregate.  Details of the corresponding RSVP
   extensions are described in [RSVP-ECN].

3.1.4.  Use case

   To see how the three pieces of the solution fit together, we imagine
   a scenario where some microflows are already in place between a
   given pair of ingress and egress gateways, but the traffic load is
   such that no packets from these flows have their CE codepoint set as
   they travel across the CL-region.
   A source wanting to start a new CL microflow sends an RSVP PATH
   message.  The egress gateway adds an object to the RESV message with
   the Congestion-Level-Estimate, which is zero.  The ingress gateway
   sees this and consequently admits the new flow.  It then forwards
   the RSVP RESV message upstream towards the source end host.  Hence,
   assuming there's sufficient capacity in the access networks, the new
   microflow is admitted end-to-end.

   The source now sends CL packets, which arrive at the ingress
   gateway.  The ingress uses a five-tuple filter to identify that the
   packets are part of a previously admitted CL microflow, and it also
   polices the microflow to ensure it remains within its traffic
   profile.  (The ingress has learnt the required information from the
   RSVP messages.)  When forwarding a packet belonging to an admitted
   microflow, the ingress sets the packet's DSCP to that for the
   CL-traffic in the CL-region and the packet's ECN field to ECT, so
   that the interior nodes know this is a CL packet.  The CL packet now
   travels across the CL-region, with the CE codepoint getting set if
   necessary.  Also, appropriate queue scheduling is needed in each
   node to ensure that CL traffic gets its configured bandwidth.

   Next, we imagine the same scenario but at a later time, when load is
   higher at one (or more) of the interior nodes, which start to set
   the CE codepoint of CL packets because their arrival rate is nearing
   the configured rate.  The next time a source tries to set up a CL
   microflow, the ingress gateway learns (from the egress) the relevant
   Congestion-Level-Estimate.  If it is greater than some threshold
   value then the ingress refuses the request, otherwise it is
   accepted.

   It is also possible for an egress gateway to get an RSVP RESV
   message and not know what the Congestion-Level-Estimate is.
   For example, there may at present be no CL microflows between the
   relevant ingress and egress gateways.  In this case the egress
   requests the ingress to send probe packets, from which it can
   initialise its meter.  RSVP extensions for such a request to send
   probe data can be found in [RSVP-ECN].

3.2.  Pre-emption

   In this section we describe the pre-emption mechanism.  We discuss
   the two parts of the solution and then give an example of how they
   fit together in a use case:

   o  How an ingress gateway is triggered to test whether pre-emption
      may be needed

   o  How an ingress gateway determines the right amount of CL traffic
      to drop

   The mechanism is defined in [CL-marking] and [RSVP-ECN].

3.2.1.  Alerting an ingress gateway that pre-emption may be needed

   Alerting an ingress gateway that pre-emption may be needed is a two
   stage process: a router in the CL-region alerts an egress gateway
   that pre-emption may be needed; in turn the egress gateway alerts
   the relevant ingress gateway.  Every router in the CL-region has the
   ability to alert egress gateways, which may be done either
   explicitly or implicitly:

   o  Explicit - every link in the CL-region has a configured traffic
      rate, which is a threshold above which it re-marks exceeding CL
      packets to Re-marked-CL.  Reception of such a packet by the
      egress gateway acts as a Pre-emption Alert.  Encoding of
      Re-marked-CL is under discussion (a new DSCP, or leaving the DSCP
      unchanged and setting a new ECN codepoint).  Note that the
      explicit mechanism only makes sense if all the routers in the
      CL-region have the functionality, so that the egress gateways can
      rely on the explicit mechanism.  Otherwise there is the danger
      that the traffic happens to focus on a router without it, and
      egress gateways then also have to watch for implicit pre-emption
      alerts.
   o  Implicit - the router behaviour is unchanged from the
      Pre-Congestion marking behaviour described in the admission
      control section.  The egress gateway treats a Congestion-Level-
      Estimate of (almost) 100% as an implicit alert that pre-emption
      may be required.  ('Almost' because the Congestion-Level-Estimate
      is a moving average, so can never reach exactly 100%.)

      Probability
      of re-marking   ^
      CL packet to    |
      Re-marked-CL    |
      packet        1_|            ______________
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                      |           |
                    0_|___________|
                      |
                      -----------|-------------->
                              threshold   CL traffic rate

      Figure 3: Re-marking CL packets to Re-marked-CL packets for
                        explicit Pre-emption Alert

   When one or more packets in a CL-region-aggregate alert the egress
   gateway of the need for pre-emption, whether explicitly or
   implicitly, the egress puts that CL-region-aggregate into
   Pre-emption Alert state.  For each CL-region-aggregate in alert
   state it measures the rate of traffic at the egress gateway (i.e.
   the traffic rate of the appropriate CL-region-aggregate) and reports
   this to the relevant ingress gateway.  The steps are:

   o  Determine the relevant ingress gateway - for the explicit case
      the egress gateway examines the Re-marked-CL packet (resulting
      from Pre-emption Alert marking) and uses the state installed at
      the time of admission to determine which ingress gateway the
      packet came from.  For the implicit case the egress gateway has
      already determined this information, because the Congestion-
      Level-Estimate is calculated per ingress gateway.

   o  Measure the traffic rate of CL packets - as soon as the egress
      gateway is alerted (whether explicitly or implicitly) it measures
      the rate of CL traffic from this ingress gateway (i.e. for this
      CL-region-aggregate).  Note that Re-marked-CL packets are
      excluded from that measurement.
      It should make its measurement quickly and accurately, but
      exactly how is up to the implementation.

   o  Alert the ingress gateway - the egress gateway then immediately
      alerts the relevant ingress gateway about the fact that
      pre-emption may be required.  This Alert message also includes
      the measured Sustainable-Aggregate-Rate, i.e. the egress rate of
      CL-traffic for this ingress gateway.  The Alert message is sent
      using reliable delivery.  Procedures for support of such an Alert
      using RSVP are defined in [RSVP-ECN].

                 ______________        / \          ________________
                |              |      /   \        |                |
    CL packet   |Update        |     /Is it a\   Y |Measure CL rate |
    arrives --->|Congestion-   |--->/Re-marked-\-->|from ingress and|
                |Level-Estimate|    \CL packet?/   |alert ingress   |
                |______________|     \        /    |________________|
                                      \      /
                                       \    /

      Figure 4: Egress gateway action for explicit Pre-emption Alert

                 ______________        / \          ________________
                |              |      /   \        |                |
    CL packet   |Update        |     / C-L-E \   Y |Measure CL rate |
    arrives --->|Congestion-   |--->/threshold \-->|from ingress and|
                |Level-Estimate|    \ exceeded?/   |alert ingress   |
                |______________|     \        /    |________________|
                                      \      /
                                       \    /

      Figure 5: Egress gateway action for implicit Pre-emption Alert

3.2.2.  Determining the right amount of CL traffic to drop

   The method relies on the insight that the amount of CL traffic that
   can be supported between a particular pair of ingress and egress
   gateways is the amount of CL traffic that is actually getting across
   the CL-region to the egress gateway without being re-marked to
   Re-marked-CL.  Hence we term it the Sustainable-Aggregate-Rate.
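As a hedged illustration, the resulting ingress-side decision can be sketched as below.  The "error" margin and the two rates are explained in the paragraphs that follow; the function and variable names are our own:

```python
def traffic_to_preempt(ingress_rate, sustainable_aggregate_rate, error):
    """Amount of CL traffic (in the same units as the rates) that the
    ingress gateway should shed, or 0.0 if no pre-emption is needed.

    Pre-empt only if ingress_rate > sustainable_aggregate_rate + error;
    then shed enough that the new ingress rate is at most
    sustainable_aggregate_rate - error."""
    if ingress_rate <= sustainable_aggregate_rate + error:
        return 0.0
    return ingress_rate - (sustainable_aggregate_rate - error)
```

For example, with an ingress rate of 120 Mbit/s, a reported Sustainable-Aggregate-Rate of 100 Mbit/s and an error margin of 5 Mbit/s, the ingress would shed 25 Mbit/s worth of microflows.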
   So when the ingress gateway gets the Alert message from an egress
   gateway, it compares:

   o  The traffic rate that it is sending to this particular egress
      gateway (which we term the ingress-rate)

   o  The traffic rate that the egress gateway reports (in the Alert
      message) that it is receiving from this ingress gateway (which is
      the Sustainable-Aggregate-Rate)

   If the difference is significant, then the ingress gateway pre-empts
   some microflows.  It only pre-empts if:

      Ingress-rate > Sustainable-Aggregate-Rate + error

   The "error" term is partially to allow for inaccuracies in the
   measurements of the rates.  It is also needed because the
   ingress-rate is measured at a slightly later moment than the
   Sustainable-Aggregate-Rate, and it is quite possible that the
   ingress-rate has increased in the interim due to natural variation
   of the bit rate of the CL sources.  So the "error" term allows for
   some variation in the ingress rate without triggering pre-emption.

   The ingress gateway should pre-empt enough microflows to ensure
   that:

      New ingress-rate < Sustainable-Aggregate-Rate - error

   The "error" term here is used for similar reasons but in the other
   direction, to ensure slightly more load is shed than seems
   necessary, in case the two measurements were taken during a
   short-term fall in load.

   When the routers in the CL-region are using explicit pre-emption
   alerting, the ingress gateway would normally pre-empt microflows
   whenever it gets an alert (it always would if it were possible to
   set "error" equal to zero).  For the implicit case, however, this is
   not so.  It receives an Alert message when the Congestion-Level-
   Estimate reaches (almost) 100%, which is roughly when traffic
   exceeds the amount allocated for admission control of CL traffic at
   routers.
   However, it is only when packets are indeed dropped en route that
   the Sustainable-Aggregate-Rate becomes less than the ingress-rate,
   so only then will pre-emption actually occur at the ingress router.

   Hence with the implicit scheme, pre-emption can only be triggered
   once the system starts dropping packets and thus the QoS of flows
   starts being significantly degraded.  This is in contrast with the
   explicit scheme, which allows pre-emption to be triggered before any
   packet drop, simply when the traffic reaches a certain configured
   engineered pre-emption level.  Therefore we believe that the
   explicit mechanism is superior.  However, it does require new
   functionality on all the routers (although this is little more than
   a bulk token bucket).

3.2.3.  Use case for pre-emption

   To see how the pieces of the solution fit together in a use case, we
   imagine a scenario where many microflows have already been admitted.
   We confine our description to the explicit pre-emption mechanism.
   Now an interior router in the CL-region fails.  The network layer
   routing protocol re-routes round the problem, but as a consequence
   traffic on other links increases.  In fact, let's assume the traffic
   on one link now exceeds its pre-emption threshold and so the router
   re-marks CL packets to Re-marked-CL.  When the egress sees the first
   one of these packets it immediately determines which microflow this
   packet is part of (by using a five-tuple filter and comparing it
   with state installed at admission) and hence which ingress gateway
   the packet came from.  It sets up a meter to measure the traffic
   rate from this ingress gateway, and as soon as possible sends a
   message to the ingress gateway.  This message alerts the ingress
   gateway that pre-emption may be needed and contains the traffic rate
   measured by the egress gateway.
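The egress-side part of this use case might be sketched as follows.  The data structures, identifiers and alert transport are illustrative placeholders, not defined by the draft:

```python
class EgressPreemptionMonitor:
    """Sketch of an egress gateway reacting to explicit Pre-emption
    Alerts: a Re-marked-CL packet puts its CL-region-aggregate into
    alert state; the rate of (non-re-marked) CL traffic from that
    ingress is then measured and reported back to the ingress."""

    def __init__(self, send_alert):
        self.send_alert = send_alert  # callable(ingress_id, rate_bits_per_s)
        self.cl_bits = {}             # ingress_id -> unmarked CL bits seen
        self.alerted = set()          # aggregates in Pre-emption Alert state

    def on_cl_packet(self, ingress_id, size_bits, remarked):
        if remarked:
            # A Re-marked-CL packet acts as the Pre-emption Alert; it is
            # excluded from the rate measurement.
            self.alerted.add(ingress_id)
        else:
            self.cl_bits[ingress_id] = self.cl_bits.get(ingress_id, 0) + size_bits

    def report(self, ingress_id, interval_s):
        # At the end of a measurement interval, report the measured
        # Sustainable-Aggregate-Rate to the relevant ingress gateway.
        if ingress_id in self.alerted:
            rate = self.cl_bits.get(ingress_id, 0) / interval_s
            self.send_alert(ingress_id, rate)
```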
   Then the ingress gateway determines the traffic rate that it is
   sending towards this egress gateway, and hence it can calculate the
   amount of traffic that needs to be pre-empted.

   The ingress gateway could now just shed random microflows, but it is
   better if the least important ones are dropped.  The ingress gateway
   could use information stored locally in each reservation's state
   (such as, for example, the RSVP pre-emption priority) as well as
   information provided by a policy decision point in order to decide
   which of the flows to shed (or perhaps which ones not to shed).  The
   ingress gateway then initiates RSVP signalling to instruct the
   relevant destinations that their session has been terminated, and to
   tell (RSVP) nodes along the path to tear down associated RSVP state.
   To guard against recalcitrant sources, normal IntServ policing will
   block any future traffic from the dropped flows from entering the
   CL-region.  Note that - with the explicit Pre-emption Alert
   mechanism - since the threshold for re-marking packets to
   Re-marked-CL may be set at significantly less than the physical line
   capacity, traffic pre-emption may be triggered before any congestion
   has actually occurred and before any packet is dropped.

   We extend the scenario further by imagining that (due to a disaster
   of some kind) further routers in the CL-region fail during the time
   taken by the pre-emption process described above.  This is handled
   naturally, as packets will continue to be re-marked to Re-marked-CL
   and so the pre-emption process will happen for a second time.

   Pre-emption also helps emergency/military calls by taking into
   account the corresponding call priorities when selecting calls to be
   pre-empted, which is likely to be particularly important in a
   disaster scenario.

4.  Details

   This section is intended to provide a systematic summary of the new
   functionality required by the routers in the CL-region.

   A network operator upgrades normal IP routers by:

   o  Adding functionality related to admission control and pre-emption
      to all its ingress and egress gateways

   o  Adding Pre-Congestion Notification behaviour and Pre-emption
      Alert behaviour to all the nodes in the CL-region.

   We consider the detailed actions required for each type of node in
   turn.

4.1.  Ingress gateways

   Ingress gateways perform the following tasks:

   o  Classify incoming packets - decide whether they are CL or non-CL
      packets.  This is done using an IntServ filter spec (source and
      destination addresses and port numbers), whose details have been
      gathered from the RSVP messaging.

   o  Police - check that the microflow conforms with what has been
      agreed (i.e. it keeps to its agreed data rate).  If necessary,
      packets which do not correspond to any reservation, packets which
      are in excess of the rate agreed for their reservation, and
      packets for a reservation that has earlier been pre-empted may be
      policed.  Policing may be achieved via dropping or via re-marking
      of the packet's DSCP to a value different from the CL behaviour
      aggregate.

   o  Packet ECN colouring - for CL microflows, set the ECN field to
      ECT(0) or ECT(1) (uses for ECT(0) and ECT(1) will be discussed in
      a later version of this document)

   o  Perform 'interior node' functions (see next sub-section)

   o  Admission Control - on new session establishment, consider the
      Congestion-Level-Estimate received from the corresponding egress
      gateway and, most likely based on a simple configured threshold,
      decide whether a new call is to be admitted or rejected (taking
      into account local policy information as well as, optionally,
      information provided by a policy decision point).
   o  Probe - if requested by the egress gateway to do so, the ingress
      gateway generates probe traffic so that the egress gateway can
      compute the Congestion-Level-Estimate from this ingress gateway.
      Probe packets may be simple data addressed to the egress gateway
      and require no protocol standardisation, although there will be
      best practice for their number, size and rate.

   o  Measure - when it receives an Alert message from an egress
      gateway, it determines the rate at which it is sending packets to
      that egress gateway

   o  Pre-empt - calculate how much CL traffic needs to be pre-empted;
      decide which microflows should be dropped, perhaps in
      consultation with a Policy Decision Point; and do the necessary
      signalling to drop them.

4.2.  Interior nodes

   Interior nodes perform the following tasks:

   o  Classify packets - examine the DSCP and ECN field to see if it's
      a CL packet

   o  Non-CL packets are handled as usual, with respect to dropping
      them or setting their CE codepoint.

   o  Pre-Congestion Notification - CL packets have their CE codepoint
      set according to the algorithm detailed in [CL-marking] and
      outlined in Section 3.

   o  Pre-emption Alert - assuming the explicit Pre-emption Alert
      mechanism is being used, when the rate of CL traffic exceeds a
      threshold, re-mark packets to Re-marked-CL.

4.3.  Egress gateways

   Egress gateways perform the following tasks:

   o  Classify packets - determine which ingress gateway a CL packet
      has come from.  This is the previous RSVP hop, hence the
      necessary details are obtained just as with IntServ from the
      state associated with the packet five-tuple, which has been built
      using information from the RSVP messages.

   o  Meter - for CL packets, calculate the fraction of the total
      number of bits which are in CE-marked packets or in Re-marked-CL
      packets.
      The calculation is done as an exponentially weighted moving
      average (see Appendix).  A separate calculation is made for CL
      packets from each ingress gateway.  The meter works on an
      aggregate basis and not per microflow.

   o  Signal the Congestion-Level-Estimate - this is piggy-backed on
      the reservation reply.  An egress gateway's interface is
      configured to know it is an egress gateway, so it always appends
      this to the RESV message.  If the Congestion-Level-Estimate is
      unknown or is too stale, then the egress gateway can request the
      ingress gateway to send probes.

   o  Packet colouring - for CL packets, set the DSCP and the ECN field
      to whatever has been agreed as appropriate for the next domain.
      By default the ECN field is set to the Not-ECT codepoint.  Note
      that this results in the loss of the end-to-end meaning of the
      ECN field.  It can usually be assumed that end-to-end congestion
      control is unnecessary within an end-to-end reservation.  But if
      a genuine need is identified for end-to-end ECN semantics within
      a reservation, then an alternative is to tunnel CL packets across
      the CL-region, or to agree an extension to end-to-end signalling
      to indicate that the microflow uses an ECN-capable transport.  We
      do not recommend such apparently unnecessary complexity.

   o  Measure the rate - measure the rate of CL traffic from a
      particular ingress gateway (i.e. the rate for the
      CL-region-aggregate), when alerted (either explicitly or
      implicitly) that pre-emption may be required.  The measured rate
      is reported back to the appropriate ingress gateway [RSVP-ECN].

4.4.  Failures

   If a gateway fails, then regular RSVP procedures will take care of
   things.  For example, say an ingress gateway fails.  Then RSVP
   routers upstream of it do IP re-routing to a new ingress gateway.
   Then the upstream RSVP routers do RSVP fast local repair, i.e.
attempt to re- 1173 establish reservations through the new ingress gateway and, for 1174 example, through the same egress gateway. As part of this, admission 1175 control is performed, using the procedure described in this document. 1176 This could result in some of the flows being rejected, but those 1177 accepted will receive the full QoS. 1179 If an interior node fails, then the regular IP routing protocol will 1180 re-route round it. If the new route can carry admitted traffic, flows 1181 gracefully continue. If instead this causes early warning of 1182 congestion from the new route, admission control based on pre- 1183 congestion notification will ensure new flows will not be admitted 1184 until enough existing flows have departed. Finally re-routing may 1185 result in heavy congestion, when the pre-emption mechanism will kick 1186 in. 1188 5. Potential future extensions 1190 5.1. Multi-domain and multi-operator usage 1192 This potential extension would eliminate the trust assumption 1193 (Section 2.2), so that the CL-region could consist of multiple 1194 domains run by different operators that did not trust each other. 1195 Then only the ingress and egress gateways of the CL-region would take 1196 part in the admission control procedure, i.e. at the ingress to the 1197 first domain and the egress from the final domain. The border routers 1198 between operators within the CL-region would only have to do bulk 1199 accounting - they wouldn't do per microflow metering and policing, 1200 and they wouldn't take part in signal processing or hold path state 1201 [Briscoe]. [Re-feedback, Re-feedback-I-D] explains how a downstream 1202 domain can police that its upstream domain does not 'cheat' by 1203 admitting traffic when the downstream path is over-congested. 1205 5.2. Adaptive bandwidth for the Controlled Load service 1207 The admission control mechanism described in this document assumes 1208 that each router has a fixed bandwidth allocated to CL flows. 
A 1209 possible extension is that the bandwidth is flexible, depending on 1210 the level of non-CL traffic. If a large share of the current load on 1211 a path is CL, then more CL traffic can be admitted. And if the 1212 greater share of the load is non-CL, then the admission threshold can 1213 be proportionately lower. The approach re-arranges sharing between 1214 classes to aim for economic efficiency, whatever the traffic load 1215 matrix. It also deals with unforeseen changes to capacity during 1216 failures better than configuring fixed engineered rates. Adaptive 1217 bandwidth allocation can be achieved by changing the Pre-Congestion 1218 marking behaviour, so that the probability of setting the CE 1219 codepoint would now depend on the number of queued non-CL packets as 1220 well as the number of CL tokens. The adaptive bandwidth approach 1221 would be supplemented by placing limits on the adaptation to prevent 1222 starvation of the CL by other traffic classes and of other classes by 1223 CL traffic. 1225 5.3. Controlled Load service with end-to-end Pre-Congestion Notification 1227 It may be possible to extend the framework to parts of the network 1228 where there are only a small number of CL microflows, i.e. the 1229 aggregation assumption (Section 2.2) doesn't hold. In the extreme it 1230 may be possible to operate the framework end-to-end, i.e. between end 1231 hosts. One potential method is to send probe packets to test whether 1232 the network can support a prospective new CL microflow. The probe 1233 packets would be sent at the same traffic rate as expected for the 1234 actual microflow, but in order not to disturb existing CL traffic a 1235 router would always schedule probe packets behind CL ones (compare 1236 [Breslau00]); this implies they have a new DSCP. Otherwise the 1237 routers would treat probe packets identically to CL packets.
In order 1238 to perform admission control quickly, in parts of the network where 1239 there are only a few CL microflows, the Pre-Congestion marking 1240 behaviour for probe packets would switch from CE marking no packets 1241 to CE marking them all for only a minimal increase in load. 1243 5.4. MPLS-TE 1245 It may be possible to extend the framework for admission control of 1246 microflows into a set of MPLS-TE aggregates (Multi-protocol label 1247 switching traffic engineering). However, it would require that the 1248 MPLS header include the ECN field, which is not precluded by 1249 RFC3270. 1251 6. Relationship to other QoS mechanisms 1253 6.1. IntServ Controlled Load 1255 The CL mechanism delivers QoS similar to Integrated Services 1256 controlled load, but rather better, as queues are kept empty by 1257 driving admission control from bulk inverse-token-buckets on each 1258 interface that can detect a rise in load before queues build, 1259 sometimes termed a virtual queue [AVQ, vq]. It is also more robust to 1260 route changes. 1262 6.2. Integrated services operation over DiffServ 1264 Our approach to end-to-end QoS is similar to that described in 1265 [RFC2998] for Integrated services operation over DiffServ networks. 1266 As in [RFC2998], an IntServ class (CL in our case) is achieved end-to- 1267 end, with a CL-region viewed as a single reservation hop in the total 1268 end-to-end path. Interior routers of the CL-region do not process 1269 flow signalling nor do they hold state. Unlike [RFC2998], we do not 1270 require the end-to-end signalling mechanism to be RSVP, although it 1271 can be. 1273 Bearing in mind these differences, we can describe our architecture 1274 in the terms of the options in [RFC2998]. The DiffServ network region 1275 is RSVP-aware, but awareness is confined to (what [RFC2998] calls) 1276 the "border routers" of the DiffServ region. We use explicit 1277 admission control into this region, with static provisioning within 1278 it.
The ingress "border router" does per microflow policing and sets 1279 the DSCP and ECN fields to indicate the packets are CL ones (i.e. we 1280 use router marking rather than host marking). 1282 6.3. Differentiated Services 1284 The DiffServ architecture does not specify any way for devices 1285 outside the domain to dynamically reserve resources or receive 1286 indications of network resource availability. In practice, service 1287 providers rely on subscription-time Service Level Agreements (SLAs) 1288 that statically define the parameters of the traffic that will be 1289 accepted from a customer. The CL mechanism allows dynamic reservation 1290 of resources through the DiffServ domain and, with the potential 1291 extension mentioned in Section 5.1, it can span multiple domains 1292 without active policing mechanisms at the borders (unlike DiffServ). 1293 Therefore we do not use the traffic conditioning agreements (TCAs) of 1294 the (informational) DiffServ architecture [RFC2475]. 1296 [Johnson] compares admission control with a 'generously dimensioned' 1297 DiffServ network as ways to achieve QoS. The former is recommended. 1299 6.4. ECN 1301 The marking behaviour described in this document complies with the 1302 ECN aspects of the IP wire protocol RFC3168, but provides its own 1303 edge-to-edge feedback instead of the TCP aspects of RFC3168. All 1304 nodes within the CL-region are upgraded with the Pre-Congestion 1305 Notification and Pre-emption Alert mechanisms, so the requirements of 1306 [Floyd] are met because the CL-region is an enclosed environment. The 1307 operator prevents traffic arriving at a node that doesn't understand 1308 CL by administrative configuration of the ring of gateways around the 1309 CL-region. 1311 6.5. 
RTECN 1313 Real-time ECN (RTECN) [RTECN, RTECN-usage] has a similar aim to this 1314 document (to achieve a service with low delay, jitter and loss 1315 suitable for RT traffic) and a similar approach (per microflow admission 1316 control combined with an "early warning" of potential congestion 1317 through setting the CE codepoint). But it explores a different 1318 architecture without the aggregation assumption: host-to-host rather 1319 than edge-to-edge. 1321 6.6. RMD 1323 Resource Management in DiffServ (RMD) [RMD] is similar to this work, 1324 in that it pushes complex classification, traffic conditioning and 1325 admission control functions to the edge of a DiffServ domain and 1326 simplifies the operation of the interior nodes. One of the RMD modes 1327 uses measurement-based admission control; however, it works 1328 differently: each interior node measures the user traffic load in the 1329 PHB traffic aggregate, and each interior node processes a local 1330 RESERVE message and compares the requested resources with the 1331 available resources (maximum allowed load minus current load). 1333 Hence a difference is that the CL architecture described in this 1334 document has been designed not to require interaction between 1335 interior nodes and signalling, whereas in RMD all interior nodes are 1336 QoS-NSLP aware. So our architecture involves less processing in 1337 interior nodes, is more agnostic to signalling, requires fewer 1338 changes to existing standards and therefore works with existing RSVP 1339 as well as having the potential to work with future signalling 1340 protocols like NSIS. 1342 RMD introduced the concept of Severe Congestion handling. The pre- 1343 emption mechanism described in the CL architecture has similar 1344 objectives but relies on different mechanisms. 1346 6.7.
RSVP Aggregation over MPLS-TE 1348 Multi-protocol label switching traffic engineering (MPLS-TE) allows 1349 scalable reservation of resources in the core for an aggregate of 1350 many microflows. For end-to-end reservations, admission 1351 control and policing of microflows into the aggregate can be achieved 1352 using techniques such as RSVP Aggregation over MPLS TE Tunnels as per 1353 [AGGRE-TE]. However, in the case of inter-provider environments, 1354 these techniques require that admission control and policing be 1355 repeated at each trust boundary or that MPLS TE tunnels span multiple 1356 domains. 1358 7. Security Considerations 1360 To protect against denial of service attacks, the ingress gateway of 1361 the CL-region needs to police all CL packets and drop packets in 1362 excess of the reservation. This is similar to operations with 1363 existing IntServ behaviour. 1365 For pre-emption, it is considered acceptable from a security 1366 perspective that the ingress gateway can treat "emergency/military" 1367 CL flows preferentially compared with "ordinary" CL flows. However, 1368 in the rest of the CL-region they are not distinguished (nonetheless, 1369 our proposed technique does not preclude the use of different DSCPs 1370 at the packet level as well as different priorities at the flow 1371 level). Keeping emergency traffic indistinguishable at the packet 1372 level minimises the opportunity for new security attacks. For 1373 example, if instead a mechanism used different DSCPs for 1374 "emergency/military" and "ordinary" packets, then an attacker could 1375 specifically target the former in the data plane (perhaps for DoS or 1376 for eavesdropping). 1378 Further security aspects are to be considered later. 1380 8.
Acknowledgements 1382 The admission control mechanism evolved from the work led by Martin 1383 Karsten on the Guaranteed Stream Provider developed in the M3I 1384 project [GSPa, GSP-TR], which in turn was based on the theoretical 1385 work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, 1386 Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet 1387 and June Tay (BT) helped develop and evaluate this approach. 1389 9. Comments solicited 1391 Comments and questions are encouraged and very welcome. They can be 1392 sent to the Transport Area Working Group's mailing list, 1393 tsvwg@ietf.org, and/or to the authors. 1395 10. Changes from the -00 version of this draft 1397 There are several modifications to the admission control mechanism 1398 described in the first version of the draft, but the main technical 1399 change is the addition of the whole of the Pre-emption mechanism. 1401 11. Appendixes 1403 11.1. Appendix A: Explicit Congestion Notification 1405 This Appendix provides a brief summary of Explicit Congestion 1406 Notification (ECN). 1408 [RFC3168] specifies the incorporation of ECN into TCP and IP, including 1409 ECN's use of two bits in the IP header. It specifies a method for 1410 indicating incipient congestion to end-nodes (e.g. as in RED, Random 1411 Early Detection), where the notification is through ECN marking 1412 packets rather than dropping them. 1414 ECN uses two bits in the IP header of both IPv4 and IPv6 packets: 1416 0 1 2 3 4 5 6 7 1417 +-----+-----+-----+-----+-----+-----+-----+-----+ 1418 | DS FIELD, DSCP | ECN FIELD | 1419 +-----+-----+-----+-----+-----+-----+-----+-----+ 1421 DSCP: differentiated services codepoint 1422 ECN: Explicit Congestion Notification 1424 Figure A.1: The Differentiated Services and ECN Fields in IP.
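As a concrete illustration of the layout in Figure A.1, the two fields can be separated with simple bit operations. The following sketch is purely illustrative; the function name and constant names are ours, not part of [RFC3168] or any standard API:

```python
# Illustrative sketch: split the 8-bit DS/ECN octet (the former IPv4 TOS /
# IPv6 Traffic Class byte) into the DS field (upper six bits) and the ECN
# field (lower two bits), per the layout of Figure A.1.

def split_ds_ecn_octet(octet: int) -> tuple:
    """Return (dscp, ecn) from the 8-bit DS/ECN octet."""
    dscp = (octet >> 2) & 0x3F  # upper six bits: DiffServ codepoint
    ecn = octet & 0x03          # lower two bits: ECN field
    return dscp, ecn

# The four ECN codepoints (see the table that follows):
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
```

For example, an octet of 0xBB carries DSCP 46 (EF) with the CE codepoint set.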
1426 The two bits of the ECN field have four ECN codepoints, '00' to '11': 1427 +-----+-----+ 1428 | ECN FIELD | 1429 +-----+-----+ 1430 ECT CE 1431 0 0 Not-ECT 1432 0 1 ECT(1) 1433 1 0 ECT(0) 1434 1 1 CE 1436 Figure A.2: The ECN Field in IP. 1438 The not-ECT codepoint '00' indicates a packet that is not using ECN. 1440 The CE codepoint '11' is set by a router to indicate congestion to 1441 the end nodes. The term 'CE packet' denotes a packet that has the CE 1442 codepoint set. 1444 The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and 1445 ECT(1) respectively) are set by the data sender to indicate that the 1446 end-points of the transport protocol are ECN-capable. Routers treat 1447 the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to 1448 use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a 1449 packet-by-packet basis. The use of two codepoints for ECT is 1450 motivated primarily by the desire to allow mechanisms for the data 1451 sender to verify that network elements are not erasing the CE 1452 codepoint, and that data receivers are properly reporting to the 1453 sender the receipt of packets with the CE codepoint set. 1455 ECN requires support from the transport protocol, in addition to the 1456 functionality given by the ECN field in the IP packet header. 1457 [RFC3168] addresses the addition of ECN Capability to TCP, specifying 1458 three new pieces of functionality: negotiation between the endpoints 1459 during connection setup to determine if they are both ECN-capable; an 1460 ECN-Echo (ECE) flag in the TCP header so that the data receiver can 1461 inform the data sender when a CE packet has been received; and a 1462 Congestion Window Reduced (CWR) flag in the TCP header so that the 1463 data sender can inform the data receiver that the congestion window 1464 has been reduced. 1466 The transport layer (e.g.,
TCP) must respond, in terms of congestion 1467 control, to a *single* CE packet as it would to a packet drop. 1469 The advantage of setting the CE codepoint as an indication of 1470 congestion, instead of relying on packet drops, is that it allows the 1471 receiver(s) to receive the packet, thus avoiding the potential for 1472 excessive delays due to retransmissions after packet losses. 1474 11.2. Appendix B: What is distributed measurement-based admission 1475 control? 1477 This Appendix briefly explains what distributed measurement-based 1478 admission control is [Breslau99]. 1480 Traditional admission control algorithms for 'hard' real-time 1481 services (those providing a firm delay bound for example) guarantee 1482 QoS by using 'worst case analysis'. Each time a flow is admitted, its 1483 traffic parameters are examined and the network re-calculates the 1484 remaining resources. When the network gets a new request, it therefore 1485 knows for certain whether the prospective flow, with its particular 1486 parameters, should be admitted. However, parameter-based admission 1487 control algorithms result in under-utilisation when the traffic is 1488 bursty. Therefore 'soft' real time services - like Controlled Load - 1489 can use a more relaxed admission control algorithm. 1491 This idea suggests measurement-based admission control (MBAC). The 1492 aim of MBAC is to provide a statistical service guarantee. The 1493 classic scenario for MBAC is where each node participates in hop-by- 1494 hop admission control, characterising existing traffic locally 1495 through measurements (instead of keeping accurate track of traffic 1496 as it is admitted), in order to determine the current value of some 1497 parameter, e.g. load. Note that for scalability the measurement is of 1498 the aggregate of the flows in the local system. The measured 1499 parameter(s) are then compared to the requirements of the prospective 1500 flow to see whether it should be admitted.
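The measure-and-compare step above can be sketched as follows. Everything here is an illustrative assumption rather than a specified algorithm: the class name, the choice of aggregate load as the single measured parameter, and the simple capacity check.

```python
# Sketch of the MBAC decision outlined in Appendix B: existing traffic is
# characterised by measurement (a smoothed aggregate load) rather than by
# book-keeping of every admitted flow's parameters. Names are illustrative.

class MbacMeter:
    def __init__(self, weight: float):
        self.weight = weight          # EWMA smoothing factor, 0 < w <= 1
        self.measured_load = 0.0      # smoothed aggregate load (bits/s)

    def update(self, sample_load: float) -> None:
        # Smooth the raw load sample so that bursts do not dominate
        # the admission decision.
        w = self.weight
        self.measured_load = w * sample_load + (1 - w) * self.measured_load

    def admit(self, flow_rate: float, capacity: float) -> bool:
        # Admit the prospective flow only if the measured aggregate plus
        # the flow's requested rate stays within the configured capacity.
        return self.measured_load + flow_rate <= capacity
```

In the distributed case described next, the measurement is not local to one node but accumulated for the whole ingress-to-egress path.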
1502 MBAC may also be performed centrally for a network, in which case it 1503 uses centralised measurements by a bandwidth broker. 1505 We use distributed MBAC. "Distributed" means that the measurement is 1506 accumulated for the 'whole-path' using in-band signalling. In our 1507 case, this means that the measurement of existing traffic is for the 1508 same pair of ingress and egress gateways as the prospective 1509 microflow. 1511 In fact our mechanism can be said to be distributed in three ways: 1512 all nodes on the ingress-egress path affect the Congestion-Level- 1513 Estimate; the admission control decision is made just once on behalf 1514 of all the nodes on the path across the CL-region; and the ingress 1515 and egress gateways cooperate to perform MBAC. 1517 11.3. Appendix C: Calculating the Exponentially weighted moving average 1518 (EWMA) 1520 At the egress gateway, for every CL packet arrival: 1522 [EWMA-total-bits]n+1 = (w * bits-in-packet) + ((1-w) * [EWMA- 1523 total-bits]n ) 1525 [EWMA-CE-bits]n+1 = (B * w * bits-in-packet) + ((1-w) * [EWMA-CE- 1526 bits]n ) 1528 Then, per new flow arrival: 1530 [Congestion-Level-Estimate]n+1 = [EWMA-CE-bits]n+1 / [EWMA- 1531 total-bits]n+1 1533 where 1535 EWMA-total-bits is the total number of bits in CL packets, calculated 1536 as an exponentially weighted moving average (EWMA) 1538 EWMA-CE-bits is the total number of bits in CL packets where the 1539 packet has its CE codepoint set, again calculated as an EWMA. 1541 B is either 0 or 1: 1543 B = 0 if the CL packet does not have its CE codepoint set 1545 B = 1 if the CL packet has its CE codepoint set 1547 w is the exponential weighting factor. 1549 Varying the value of the weight trades off between the smoothness and 1550 responsiveness of the estimate of the percentage of CE packets.
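The per-packet and per-flow algorithms above can be sketched as follows; the class wrapper and names are illustrative, and returning zero before any packet has been metered is a simplification (in the architecture an unknown or stale estimate instead triggers probing).

```python
# Sketch of the Appendix C algorithms: per-CL-packet EWMA updates at the
# egress gateway, and the Congestion-Level-Estimate read per new flow
# arrival. Variable names follow the draft's formulas.

class CongestionLevelMeter:
    def __init__(self, w: float):
        self.w = w                  # exponential weighting factor
        self.ewma_total_bits = 0.0  # EWMA-total-bits
        self.ewma_ce_bits = 0.0     # EWMA-CE-bits

    def on_packet(self, bits_in_packet: int, ce_marked: bool) -> None:
        # Per CL packet arrival: B is 1 if the CE codepoint is set, else 0.
        w = self.w
        b = 1 if ce_marked else 0
        self.ewma_total_bits = w * bits_in_packet + (1 - w) * self.ewma_total_bits
        self.ewma_ce_bits = b * w * bits_in_packet + (1 - w) * self.ewma_ce_bits

    def congestion_level_estimate(self) -> float:
        # Per new flow arrival: fraction of bits carried in CE-marked packets.
        if self.ewma_total_bits == 0:
            return 0.0  # simplification: no packets metered yet
        return self.ewma_ce_bits / self.ewma_total_bits
```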
1551 However, in general both can be achieved, given our original 1552 assumption of many CL microflows and remembering that the EWMA is 1553 calculated on the basis of aggregate traffic between the ingress and 1554 egress gateways. 1555 There will be a threshold inter-arrival time between packets of the 1556 same aggregate below which the egress will consider the estimate of 1557 the Congestion-Level-Estimate as too stale, and it will then trigger 1558 generation of probes by the ingress. 1560 The first two per-packet algorithms can be simplified, if their only 1561 use will be where the result of one is divided by the result of the 1562 other in the third, per-flow algorithm. 1564 [EWMA-total-bits]'n+1 = bits-in-packet + (w' * [EWMA- total- 1565 bits]n ) 1567 [EWMA-CE-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-CE-bits]n 1568 ) 1570 where w' = (1-w)/w. 1572 If w' is arranged to be a power of 2, these per packet algorithms can 1573 be implemented solely with a shift and an add. 1575 12. References 1577 A later version will distinguish normative and informative 1578 references. 1580 [AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, 1581 Michael Davenport, Chris Christou, Jerry Ash, Bur 1582 Goode, 'Aggregation of RSVP Reservations over MPLS 1583 TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-00 (work 1584 in progress), July 2005 1586 [ANSI.MLPP.Spec] American National Standards Institute, 1587 "Telecommunications- Integrated Services Digital 1588 Network (ISDN) - Multi-Level Precedence and Pre- 1589 emption (MLPP) Service Capability", ANSI T1.619-1992 1590 (R1999), 1992. 1592 [ANSI.MLPP.Supplement] American National Standards Institute, "MLPP 1593 Service Domain Cause Value Changes", ANSI ANSI 1594 T1.619a-1994 (R1999), 1990. 1596 [AVQ] S. Kunniyur and R. Srikant "Analysis and Design of an 1597 Adaptive Virtual Queue (AVQ) Algorithm for Active 1598 Queue Management", In: Proc. 
ACM SIGCOMM'01, Computer 1599 Communication Review 31 (4) (October, 2001). 1601 [Breslau99] L. Breslau, S. Jamin, S. Shenker "Measurement-based 1602 admission control: what is the research agenda?", In: 1603 Proc. Int'l Workshop on Quality of Service 1999. 1605 [Breslau00] L. Breslau, E. Knightly, S. Shenker, I. Stoica, H. 1606 Zhang "Endpoint Admission Control: Architectural 1607 Issues and Performance", In: ACM SIGCOMM 2000 1609 [Briscoe] Bob Briscoe and Steve Rudkin, "Commercial Models for 1610 IP Quality of Service Interconnect", BT Technology 1611 Journal, Vol 23 No 2, April 2005. 1613 [CL-marking] Forthcoming. Supersedes draft-briscoe-tsvwg-cl-phb-00. 1615 [DCAC] Richard J. Gibbens and Frank P. Kelly "Distributed 1616 connection acceptance control for a connectionless 1617 network", In: Proc. International Teletraffic Congress 1618 (ITC16), Edinburgh, pp. 941-952 (1999). 1620 [EMERG-RQTS] Carlberg, K. and R. Atkinson, "General Requirements 1621 for Emergency Telecommunication Service (ETS)", RFC 1622 3689, February 2004. 1624 [EMERG-TEL] Carlberg, K. and R. Atkinson, "IP Telephony 1625 Requirements for Emergency Telecommunication Service 1626 (ETS)", RFC 3690, February 2004. 1628 [Floyd] S.
Floyd, 'Specifying Alternate Semantics for the 1629 Explicit Congestion Notification (ECN) Field', draft- 1630 floyd-ecn-alternates-02.txt (work in progress), August 1631 2005 1633 [GSPa] Karsten (Ed.), Martin "GSP/ECN Technology & 1634 Experiments", Deliverable: 15.3 PtIII, M3I Eu Vth 1635 Framework Project IST-1999-11429, URL: 1636 http://www.m3i.org/ (February, 2002) (superseded by 1637 [GSP-TR]) 1639 [GSP-TR] Martin Karsten and Jens Schmitt, "Admission Control 1640 Based on Packet Marking and Feedback Signalling -- 1641 Mechanisms, Implementation and Experiments", TU- 1642 Darmstadt Technical Report TR-KOM-2002-03, URL: 1643 http://www.kom.e-technik.tu- 1644 darmstadt.de/publications/abstracts/KS02-5.html (May, 1645 2002) 1647 [ITU.MLPP.1990] International Telecommunications Union, "Multilevel 1648 Precedence and Pre-emption Service (MLPP)", ITU-T 1649 Recommendation I.255.3, 1990. 1651 [Johnson] DM Johnson, 'QoS control versus generous 1652 dimensioning', BT Technology Journal, Vol 23 No 2, 1653 April 2005 1655 [Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, 1656 'Re-ECN: Adding Accountability for Causing Congestion 1657 to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-00 (work in 1658 progress), October 2005. 1660 [Re-feedback] Bob Briscoe, Arnaud Jacquet, Carla Di Cairano- 1661 Gilfedder, Andrea Soppera, 'Re-feedback for Policing 1662 Congestion Response in an Inter-network', ACM SIGCOMM 1663 2005, August 2005. 1665 [Reid] ABD Reid, 'Economics and scalability of QoS 1666 solutions', BT Technology Journal, Vol 23 No 2, April 1667 2005 1669 [RFC2211] J. Wroclawski, Specification of the Controlled-Load 1670 Network Element Service, September 1997 1672 [RFC2309] Braden, B., et al., "Recommendations on Queue 1673 Management and Congestion Avoidance in the Internet", 1674 RFC 2309, April 1998. 1676 [RFC2474] Nichols, K., Blake, S., Baker, F. and D.
Black, 1677 "Definition of the Differentiated Services Field (DS 1678 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1679 December 1998 1681 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, 1682 Z. and W. Weiss, 'An Architecture for Differentiated 1683 Services', RFC 2475, December 1998. 1685 [RFC2597] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, 1686 "Assured Forwarding PHB Group", RFC 2597, June 1999. 1688 [RFC2998] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, 1689 L., Speer, M., Braden, R., Davie, B., Wroclawski, J. 1690 and E. Felstaine, "A Framework for Integrated Services 1691 Operation Over DiffServ Networks", RFC 2998, November 1692 2000. 1694 [RFC3168] Ramakrishnan, K., Floyd, S. and D. Black "The Addition 1695 of Explicit Congestion Notification (ECN) to IP", RFC 1696 3168, September 2001. 1698 [RFC3246] B. Davie, A. Charny, J.C.R. Bennet, K. Benson, J.Y. Le 1699 Boudec, W. Courtney, S. Davari, V. Firoiu, D. 1700 Stiliadis, 'An Expedited Forwarding PHB (Per-Hop 1701 Behavior)', RFC 3246, March 2002. 1703 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., 1704 Vaananen, P., Krishnan, R., Cheval, P., and J. 1705 Heinanen, "Multi- Protocol Label Switching (MPLS) 1706 Support of Differentiated Services", RFC 3270, May 1707 2002. 1709 [RMD] Attila Bader, Lars Westberg, Georgios Karagiannis, 1710 Cornelia Kappler, Tom Phelan, 'RMD-QOSM - The Resource 1711 Management in DiffServ QoS model', draft-ietf-nsis- 1712 rmd-03 Work in Progress, June 2005. 1714 [RSVP-ECN] Francois Le Faucheur, Anna Charny, Bob Briscoe, Philip 1715 Eardley, Joe Barbiaz, Kwok-Ho Chan, 'RSVP Extensions 1716 for Admission Control over DiffServ using Pre- 1717 congestion Notification', draft-lefaucheur-rsvp-ecn-00 1718 (work in progress), October 2005. 1720 [RTECN] Babiarz, J., Chan, K. and V. Firoiu, 'Congestion 1721 Notification Process for Real-Time Traffic', draft- 1722 babiarz-tsvwg-rtecn-04 Work in Progress, July 2005.
1724 [RTECN-usage] Alexander, C., Ed., Babiarz, J. and J. Matthews, 1725 'Admission Control Use Case for Real-time ECN', draft- 1726 alexander-rtecn-admission-control-use-case-00, Work in 1727 Progress, February 2005. 1729 [vq] Costas Courcoubetis and Richard Weber "Buffer Overflow 1730 Asymptotics for a Switch Handling Many Traffic 1731 Sources" In: Journal Applied Probability 33 pp. 886-- 1732 903 (1996). 1734 Authors' Addresses 1736 Bob Briscoe 1737 BT Research 1738 B54/77, Sirius House 1739 Adastral Park 1740 Martlesham Heath 1741 Ipswich, Suffolk 1742 IP5 3RE 1743 United Kingdom 1744 Email: bob.briscoe@bt.com 1746 Dave Songhurst 1747 BT Research 1748 B54/69, Sirius House 1749 Adastral Park 1750 Martlesham Heath 1751 Ipswich, Suffolk 1752 IP5 3RE 1753 United Kingdom 1754 Email: dsonghurst@jungle.bt.co.uk 1756 Philip Eardley 1757 BT Research 1758 B54/77, Sirius House 1759 Adastral Park 1760 Martlesham Heath 1761 Ipswich, Suffolk 1762 IP5 3RE 1763 United Kingdom 1764 Email: philip.eardley@bt.com 1765 Francois Le Faucheur 1766 Cisco Systems, Inc. 1767 Village d'Entreprise Green Side - Batiment T3 1768 400, Avenue de Roumanille 1769 06410 Biot Sophia-Antipolis 1770 France 1771 Email: flefauch@cisco.com 1773 Anna Charny 1774 Cisco Systems 1775 300 Apollo Drive 1776 Chelmsford, MA 01824 1777 USA 1778 Email: acharny@cisco.com 1780 Kwok Ho Chan 1781 Nortel Networks 1782 600 Technology Park Drive 1783 Billerica, MA 01821 1784 USA 1785 Email: khchan@nortel.com 1787 Jozef Z. 
Babiarz 1788 Nortel Networks 1789 3500 Carling Avenue 1790 Ottawa, Ont K2H 8E9 1791 Canada 1792 Email: babiarz@nortel.com 1794 Intellectual Property Statement 1796 The IETF takes no position regarding the validity or scope of any 1797 Intellectual Property Rights or other rights that might be claimed to 1798 pertain to the implementation or use of the technology described in 1799 this document or the extent to which any license under such rights 1800 might or might not be available; nor does it represent that it has 1801 made any independent effort to identify any such rights. Information 1802 on the procedures with respect to rights in RFC documents can be 1803 found in BCP 78 and BCP 79. 1805 Copies of IPR disclosures made to the IETF Secretariat and any 1806 assurances of licenses to be made available, or the result of an 1807 attempt made to obtain a general license or permission for the use of 1808 such proprietary rights by implementers or users of this 1809 specification can be obtained from the IETF on-line IPR repository at 1810 http://www.ietf.org/ipr. 1812 The IETF invites any interested party to bring to its attention any 1813 copyrights, patents or patent applications, or other proprietary 1814 rights that may cover technology that may be required to implement 1815 this standard. Please address the information to the IETF at 1816 ietf-ipr@ietf.org 1818 Disclaimer of Validity 1820 This document and the information contained herein are provided on an 1821 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1822 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 1823 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 1824 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 1825 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1826 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1828 Copyright Statement 1830 Copyright (C) The Internet Society (2005). 
1832 This document is subject to the rights, licenses and restrictions 1833 contained in BCP 78, and except as set forth therein, the authors 1834 retain all their rights.