TSVWG                                                        B. Briscoe 
Internet Draft                                               P. Eardley 
draft-briscoe-tsvwg-cl-architecture-01.txt                 D. Songhurst 
Expires: April 2006                                                  BT 
 
                                                         F. Le Faucheur 
                                                              A. Charny 
                                                     Cisco Systems, Inc 
 
                                                             J. Babiarz 
                                                                K. Chan 
                                                                 Nortel 
 
                                                       October 24, 2005 
                                                                      
                                      
    A Framework for Admission Control over DiffServ using Pre-Congestion 
                               Notification  
                draft-briscoe-tsvwg-cl-architecture-01.txt 


Status of this Memo 

   By submitting this Internet-Draft, each author represents that any 
   applicable patent or other IPR claims of which he or she is aware 
   have been or will be disclosed, and any of which he or she becomes 
   aware will be disclosed, in accordance with Section 6 of BCP 79. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress". 

   The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/ietf/1id-abstracts.txt 

   The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/shadow.html 

   This Internet-Draft will expire on April 24, 2006. 


Briscoe                Expires April 24, 2006                 [Page 1] 

Internet-Draft      Controlled Load architecture          October 2005 
    

Copyright Notice 

   Copyright (C) The Internet Society (2005).  All Rights Reserved. 

Abstract  

   This document describes a framework to achieve an end-to-end 
   Controlled Load (CL) service without the scalability problems of 
   previous approaches. Flow admission control and, if necessary, flow
   pre-emption preserve the CL service to admitted flows. But interior
   routers within a large DiffServ-based region of the Internet do not
   require flow state or signalling. They only have to give early
   warning of their own congestion by bulk packet marking using a new
   pre-congestion notification behaviour. Gateways around the edges of
   the region convert measurements of this packet granularity marking 
   into admission control and pre-emption functions at flow granularity. 

 
Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION) 

   This document is posted as an Internet-Draft with the intention of 
   eventually becoming an INFORMATIONAL RFC, rather than a standards 
   track document. 

 
Table of Contents 

    
   1. Introduction................................................4 
      1.1. Summary................................................4 
         1.1.1. Admission control..................................5 
         1.1.2. Pre-emption........................................7 
         1.1.3. Both admission control and pre-emption.............8 
      1.2. Terminology............................................8 
      1.3. Existing terminology...................................10 
      1.4. Standardisation requirements...........................10 
      1.5. Structure of rest of the document......................10 
   2. Key aspects of the framework................................11 
      2.1. Key goals.............................................11 
      2.2. Key assumptions........................................12 
      2.3. Key benefits..........................................15 
   3. Architecture...............................................17 
      3.1. Admission control......................................17 
         3.1.1. Pre-Congestion Notification marking behaviour......17 
         3.1.2. Measurements to support admission control..........18 
         3.1.3. How edge-to-edge admission control supports end-to-end 
         QoS signalling..........................................19 
         3.1.4. Use case.........................................19 
      3.2. Pre-emption...........................................20 
         3.2.1. Alerting an ingress gateway that pre-emption may be 
         needed..................................................20 
         3.2.2. Determining the right amount of CL traffic to drop.23 
         3.2.3. Use case for pre-emption..........................24 
   4. Details....................................................25 
      4.1. Ingress gateways.......................................26 
      4.2. Interior nodes........................................27 
      4.3. Egress gateways........................................27 
      4.4. Failures..............................................28 
   5. Potential future extensions.................................29 
      5.1. Multi-domain and multi-operator usage..................29 
      5.2. Adaptive bandwidth for the Controlled Load service......29 
      5.3. Controlled Load service with end-to-end Pre-Congestion 
      Notification...............................................29 
      5.4. MPLS-TE...............................................30 
   6. Relationship to other QoS mechanisms........................30 
      6.1. IntServ Controlled Load................................30 
      6.2. Integrated services operation over DiffServ............30 
      6.3. Differentiated Services................................31 
      6.4. ECN...................................................31 
      6.5. RTECN.................................................31 
      6.6. RMD...................................................31 
      6.7. RSVP Aggregation over MPLS-TE..........................32 
   7. Security Considerations.....................................32 
   8. Acknowledgements...........................................33 
   9. Comments solicited.........................................33 
   10. Changes from the -00 version of this draft.................33 
   11. Appendixes................................................33 
      11.1. Appendix A: Explicit Congestion Notification..........33 
      11.2. Appendix B: What is distributed measurement-based admission 
      control?...................................................35 
      11.3. Appendix C: Calculating the Exponentially weighted moving 
      average (EWMA).............................................36 
   12. References................................................37 
   Authors' Addresses............................................41 
   Intellectual Property Statement................................42 
   Disclaimer of Validity........................................43 
   Copyright Statement...........................................43 
    

1. Introduction 

1.1. Summary  

   This document describes a framework to achieve an end-to-end 
   controlled load service by using - within a large region of the 
   Internet - DiffServ and edge-to-edge distributed measurement-based 
   admission control and flow pre-emption. Controlled load service is a 
   quality of service (QoS) closely approximating the QoS that the same 
   flow would receive from a lightly loaded network element [RFC2211]. 
   Controlled Load (CL) is useful for inelastic flows such as those for 
   real-time media. 

   In line with the "IntServ over DiffServ" framework defined in 
   [RFC2998], the CL service is supported end-to-end and RSVP signalling 
   [RFC2205] is used end-to-end, over an edge-to-edge DiffServ region. 

 ___    ___    _______________________________________    ____    ___ 
|   |  |   |  |                                       |  |    |  |   | 
|   |  |   |  |Ingress         Interior         Egress|  |    |  |   | 
|   |  |   |  |gateway          nodes          gateway|  |    |  |   | 
|   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   | 
|   |  |   |  | CL-   |  | CL-   |  | CL-   |  |      |  |    |  |   | 
|   |..|   |..|marking|..|marking|..|marking|..| Meter|..|    |..|   | 
|   |  |   |  |-------+  +-------+  +-------+  +------|  |    |  |   | 
|   |  |   |  |  \                                 /  |  |    |  |   | 
|   |  |   |  |   \                               /   |  |    |  |   | 
|   |  |   |  |    \  Congestion-Level-Estimate  /    |  |    |  |   | 
|   |  |   |  |     \  (for admission control)  /     |  |    |  |   | 
|   |  |   |  |      --<-----<----<----<-----<--      |  |    |  |   | 
|   |  |   |  |      Sustainable-Aggregate-Rate       |  |    |  |   | 
|   |  |   |  |          (for pre-emption)            |  |    |  |   | 
|___|  |___|  |_______________________________________|  |____|  |___| 
 
Sx     Access               CL-region                   Access    Rx 
End    Network                                          Network   End 
Host                                                              Host 
                <------ edge-to-edge signalling -----> 
                (for admission control & pre-emption) 
 
<-------------------end-to-end QoS signalling protocol---------------> 
 
Figure 1: Overall QoS architecture (NB terminology explained later) 
 
 
   In Section 1.1.1 we summarise how admission of new CL microflows is 
   controlled so as to deliver the required QoS. In abnormal
   circumstances, for instance a disaster affecting multiple interior
   nodes, the QoS of existing CL microflows may degrade even if care
   was exercised when those microflows were admitted. Therefore we also
   propose a mechanism (summarised in Section 1.1.2) to pre-empt some
   of the existing microflows. The remaining microflows then retain
   their expected QoS, while improved QoS is quickly restored to lower
   priority traffic.

1.1.1. Admission control 

   This document describes a new admission control procedure for an 
   edge-to-edge region, which uses a new per-hop Explicit Congestion 
   Notification marking behaviour as a fundamental building block. In 
   turn, an end-to-end CL service would use this as a building block 
   within a broader QoS architecture. 

   The per-hop, edge-to-edge and end-to-end aspects are now briefly 
   introduced in turn. 

   Appendix A provides a brief summary of Explicit Congestion 
   Notification (ECN) [RFC3168]. It specifies that a router sets the ECN 
   field to the Congestion Experienced (CE) value as a warning of 
   incipient congestion. RFC3168 doesn't specify a particular algorithm 
   for setting the CE codepoint, although RED (Random Early Detection) 
   is expected to be used. We introduce a new algorithm in this 
   document, called Pre-Congestion Notification. It aims to set the CE 
   codepoint before there is any significant build-up of CL packets in 
   the queue, but as an "early warning" when the amount of packets 
   flowing is getting close to the engineered capacity. Hence it can be 
   used with per-hop behaviours (PHBs) designed to operate with very low 
   queue occupancy. Note that our use of the ECN field operates across 
   the CL-region, i.e. edge-to-edge, and not host-to-host as in 
   [RFC3168]. 

   This framework assumes that the Pre-Congestion Notification behaviour 
   is used in a controlled environment, i.e. within the controlled edge-
   to-edge region. 

   Within the controlled edge-to-edge region, a particular packet 
   receives the Pre-Congestion Notification behaviour if the packet's 
   header fulfils two conditions: its DSCP (differentiated services 
   codepoint) corresponds to the PHB for CL traffic, and also its ECN 
   field indicates ECN Capable Transport (ECT). 
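
   As an illustration only, the two header conditions above can be
   sketched in Python as follows. The value of CL_DSCP is an assumption
   made for this example (the EF codepoint is borrowed as a
   placeholder, since the DSCP for CL traffic has not been fixed); the
   ECN field values are those defined in [RFC3168]. The function names
   are hypothetical.

```python
# Sketch only. CL_DSCP is a placeholder assumption: the DSCP that
# identifies CL traffic is not fixed by this framework.
CL_DSCP = 0b101110  # assumed for illustration (the EF codepoint, 46)

# ECN field values as defined in RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def split_tos(tos: int) -> tuple[int, int]:
    """Split the 8-bit DS/ECN octet into (DSCP, ECN field)."""
    return (tos >> 2) & 0x3F, tos & 0x03


def gets_pcn_behaviour(dscp: int, ecn: int) -> bool:
    """True if both header conditions hold: CL DSCP and ECT."""
    return dscp == CL_DSCP and ecn in (ECT_0, ECT_1)
```

   For example, under these assumptions an octet of 0xBA splits into
   DSCP 46 and ECT(0), so the packet would receive the Pre-Congestion
   Notification behaviour, whereas a Not-ECT packet with the same DSCP
   would not.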


   Turning next to the edge-to-edge aspect: all nodes within a region of 
   the Internet, which we call the CL-region, apply the PHB used for CL 
   traffic and the Pre-Congestion Notification behaviour. Traffic must 
   enter/leave the CL-region through ingress/egress gateways, which have 
   special functionality. Typically the CL-region is the core or 
   backbone of an operator. The CL service is achieved "edge-to-edge" 
   across the CL-region, by using distributed measurement-based 
   admission control: the decision whether to admit a new microflow 
   depends on a measurement of the existing traffic between the same 
   pair of ingress and egress gateways (i.e. the same pair as the 
   prospective new microflow). (See Appendix B for further discussion on 
   "What is distributed measurement-based admission control?") 

   As CL packets travel across the CL-region, nodes will set the CE 
   codepoint (according to the Pre-Congestion Notification algorithm) as 
   an "early warning" of potential congestion, i.e. before there is any 
   significant build-up of CL packets in the queue. For traffic from 
   each remote ingress gateway, the CL-region's egress gateway measures 
   the fraction of CL traffic for which the CE codepoint is set. The 
   egress gateway calculates this fraction on a per-bit basis as an
   exponentially weighted moving average (which we term the Congestion-
   Level-Estimate). It then reports the estimate to the CL-region's
   ingress gateway, piggy-backed on the signalling for a new flow. The
   ingress gateway
   only admits the new CL microflow if the Congestion-Level-Estimate is 
   less than a threshold value. Hence previously accepted CL microflows 
   will suffer minimal queuing delay, jitter and loss. 
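
   The egress measurement and the ingress decision described above
   might be sketched as follows. This is a minimal sketch under stated
   assumptions: the egress gateway is assumed to sample the marked and
   total CL bits once per measurement interval, and the class name,
   the weight and the threshold are hypothetical values chosen for the
   example, not values taken from this framework.

```python
class CongestionLevelEstimate:
    """Per-aggregate estimate kept by an egress gateway (sketch)."""

    def __init__(self, weight: float = 0.1) -> None:
        self.weight = weight  # assumed EWMA weight, for illustration
        self.value = 0.0

    def update(self, marked_bits: int, total_bits: int) -> float:
        """Fold one interval's marked-bit fraction into the average."""
        if total_bits > 0:
            sample = marked_bits / total_bits
            self.value = ((1.0 - self.weight) * self.value
                          + self.weight * sample)
        return self.value


def admit_new_microflow(cle: float, threshold: float) -> bool:
    """Ingress decision: admit only if the estimate is below threshold."""
    return cle < threshold
```

   With a weight of 0.5, an interval with no marked bits followed by
   an interval with half the bits marked gives an estimate of 0.25. A
   new microflow would then be admitted against a threshold of 0.3,
   but not against a threshold of 0.25.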

   In turn, the edge-to-edge architecture is a building block in 
   delivering an end-to-end CL service. The approach is similar to that 
   described in [RFC2998] for Integrated services operation over 
   DiffServ networks. Like [RFC2998], an IntServ class (CL in our case) 
   is achieved end-to-end, with a CL-region viewed as a single 
   reservation hop in the total end-to-end path. Interior nodes of the 
   CL-region do not process flow signalling nor do they hold state. We 
   assume that the end-to-end signalling mechanism is RSVP (Section 
   2.2). However, the RSVP signalling may itself be originated or 
   terminated by proxies still closer to the edge of the network, such 
   as home hubs or the like, triggered in turn by application layer 
   signalling. [RFC2998] and our approach are compared further in 
   Section 6.2. 

   An important benefit compared with the IntServ over DiffServ model 
   [RFC2998] arises from the fact that the load is controlled 
   dynamically rather than with the traffic conditioning agreements 
   (TCAs). TCAs were originally introduced in the (informational) 
   DiffServ architecture [RFC2475] as an alternative to reservation 
   processing in the interior region in order to reduce the burden on 
   interior nodes. With TCAs, in practice service providers rely on 
   subscription-time Service Level Agreements that statically define the 
   parameters of the traffic that will be accepted from a customer. The 
   problem arises because the TCA at the ingress must allow any 
   destination address, if it is to remain scalable. But for longer 
   topologies, the chances increase that traffic will focus on an 
   interior resource, even though it is within contract at the ingress 
   [Reid], e.g. all flows converge on the same egress gateway. Even 
   though networks can be engineered to make such failures rare, when 
   they occur all inelastic flows through the congested resource fail 
   catastrophically.  

   Distributed measurement-based admission control avoids reservation 
   processing (whether per flow or aggregated) on interior nodes but 
   flows are still blocked dynamically in response to actual congestion 
   on any interior node. Hence there is no need for accurate or 
   conservative prediction of the traffic matrix. 

     
1.1.2. Pre-emption 

   An essential QoS issue in core and backbone networks is being able to 
   cope with failures of nodes and links. The consequent re-routing can 
   cause severe congestion on some links and hence degrade the QoS 
   experienced by on-going microflows and other, lower priority traffic. 
   Even when the network is engineered to sustain a single link failure, 
   multiple link failures (e.g. due to a fibre cut or a node failure, or 
   a natural disaster) can cause violation of capacity constraints and 
   resulting QoS failures. Our solution uses rate-based pre-emption, so 
   that sufficient of the previously admitted CL microflows are dropped 
   to ensure that the remaining ones again receive QoS commensurate with 
   the CL service and at least some QoS is quickly restored to other 
   traffic classes.  

   The solution has two aspects. First, triggering the ingress gateway 
   to test whether pre-emption may be needed. This involves an optional 
   new router marking behaviour for Pre-emption Alert. Secondly, 
   calculating the right amount of traffic to drop. This involves the 
   egress gateway measuring, and reporting to the ingress gateway, the 
   current amount of CL traffic received from that particular ingress 
   gateway. The ingress gateway compares this measurement (which is the 
   amount that the network can actually support, and which we thus call 
   the Sustainable-Aggregate-Rate) with the rate that it is sending and 
   hence determines how much traffic needs to be pre-empted.  
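
   The comparison made by the ingress gateway amounts to a simple
   subtraction, sketched below. The function and parameter names are
   hypothetical, and rates are assumed to be expressed in bits per
   second.

```python
def rate_to_preempt(sending_rate_bps: float,
                    sustainable_aggregate_rate_bps: float) -> float:
    """Excess CL traffic the ingress gateway needs to pre-empt.

    The result is zero when the egress gateway reports a rate at
    least as large as the rate the ingress gateway is sending.
    """
    excess = sending_rate_bps - sustainable_aggregate_rate_bps
    return max(0.0, excess)
```

   For instance, if the ingress gateway is sending 100 Mbit/s into a
   CL-region-aggregate and the egress gateway reports a Sustainable-
   Aggregate-Rate of 80 Mbit/s, then 20 Mbit/s of CL traffic needs to
   be pre-empted.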


   The solution operates within a little over one round trip time - the 
   time required for microflow packets that have experienced Pre-emption 
   Alert marking to travel downstream through the CL-region and arrive 
   at the egress gateway, plus some additional time for the egress 
   gateway to measure the rate seen after it has been alerted that pre-
   emption may be needed, and the time for the egress gateway to report 
   this information to the ingress gateway.  

1.1.3. Both admission control and pre-emption 

   This document describes both the admission control and pre-emption 
   mechanisms, and we suggest that an operator uses both. However, we do 
   not require this and some operators may want to implement only one.  

   For example, an operator could use just admission control, solving 
   heavy congestion (caused by re-routing) by 'just waiting' - as 
   sessions end, existing microflows naturally depart from the system 
   over time, and the admission control mechanism will prevent admission 
   of new microflows that use the affected links. So the CL-region will 
   naturally return to normal controlled load service, but with reduced 
   capacity. The drawback of this approach would be that until flows 
   naturally depart to relieve the congestion, all flows and lower 
   priority services will be adversely affected. As another example, an 
   operator could use just admission control, avoiding heavy congestion 
   (caused by re-routing) by 'capacity planning' - by configuring 
   admission control thresholds to lower levels than the network could 
   accept in normal situations such that the load after failure is 
   expected to stay below acceptable levels even with reduced network 
   resources. 

   On the other hand, an operator could just rely for admission control 
   on the traffic conditioning agreements of the DiffServ architecture 
   [RFC2475]. The pre-emption mechanism described in this document would 
   be used to counteract the problem described at the end of Section 
   1.1.1. 

    
1.2. Terminology 

   o Ingress gateway: node at an ingress to the CL-region. A CL-region 
      may have several ingress gateways.  

   o Egress gateway: node at an egress from the CL-region. A CL-region 
      may have several egress gateways. 


   o Interior node: a node which is part of the CL-region, but isn't an 
      ingress or egress node. 

   o CL-region: A region of the Internet in which all traffic 
      enters/leaves through an ingress/egress gateway and all nodes run 
      the Pre-Congestion Notification and Pre-emption Alert behaviours. 
      A CL-region is a DiffServ region (a DiffServ region is either a 
      single DiffServ domain or set of contiguous DiffServ domains), but 
      note that the CL-region does not use the traffic conditioning 
      agreements (TCAs) of the (informational) DiffServ architecture. 

   o CL-region-aggregate: all the microflows between a specific pair of 
      ingress and egress gateways. Note there is no identifier unique to 
      the aggregate. 

   o Pre-Congestion Notification: a new algorithm for deciding whether 
      to set the ECN CE codepoint (Explicit Congestion Notification 
      Congestion Experienced), for use by all routers in the CL-region. 
      A router sets the CE codepoint as an "early warning" that the load 
      is nearing the engineered admission control capacity, before there 
      is any significant build-up of CL packets in the queue.  

   o Inverse-token-bucket: a token bucket for which tokens are added 
      when packets are queued for transmission on the corresponding link 
      and consumed at a fixed rate. This is the inverse of a normal 
      token bucket. 

   o Pre-emption Alert: a new router marking behaviour, for use by 
      either all or none of the routers in the CL-region. A router re-
      marks a packet to Re-marked-CL to warn explicitly that pre-emption 
      may be needed.  

   o Congestion-Level-Estimate: the number of bits in CL packets that 
      have the CE codepoint set, divided by the number of bits in all CL 
      packets. It is calculated as an exponentially weighted moving 
      average. It is calculated by an egress gateway for the CL packets 
      from a particular ingress gateway, i.e. there is a Congestion-
      Level-Estimate for each CL-region-aggregate.  

   o Sustainable-Aggregate-Rate: the rate of traffic that the network 
      can actually support for a specific CL-region-aggregate. It is
      measured by an egress gateway for the CL packets from a particular
      ingress gateway.
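
   To make the inverse-token-bucket definition concrete, a minimal
   Python sketch follows. Only the filling and draining rule comes
   from the definition above. The class name, the parameters and the
   rule that a packet is marked when the token level exceeds a
   threshold are assumptions made for illustration; the actual marking
   behaviour is the subject of [CL-marking].

```python
class InverseTokenBucket:
    """Tokens are added per packet and drained at a fixed rate."""

    def __init__(self, drain_rate_bps: float,
                 mark_threshold_bits: float) -> None:
        self.drain_rate_bps = drain_rate_bps
        # Assumed marking rule: mark while tokens exceed this level.
        self.mark_threshold_bits = mark_threshold_bits
        self.tokens = 0.0
        self.last_time = 0.0

    def on_packet(self, now: float, size_bits: int) -> bool:
        """Account for a queued packet; return True if it is marked."""
        # Drain at the fixed rate for the time elapsed, never below 0.
        elapsed = now - self.last_time
        drained = elapsed * self.drain_rate_bps
        self.tokens = max(0.0, self.tokens - drained)
        self.last_time = now
        # Add tokens for the packet queued for transmission.
        self.tokens += size_bits
        return self.tokens > self.mark_threshold_bits
```

   In this sketch the token level rises only while packets arrive
   faster than the drain rate. Setting the drain rate slightly below
   the engineered capacity would therefore give an early warning
   before any significant queue builds up.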

 
1.3. Existing terminology 

   This is a placeholder for useful terminology that is defined 
   elsewhere. 

1.4. Standardisation requirements 

   The framework described in this document has two new standardisation 
   requirements:  

   o new Pre-Congestion Notification and Pre-emption Alert marking 
      behaviours are required, as detailed in [CL-marking].  

   o the end-to-end signalling protocol needs to be modified to carry 
      the Congestion-Level-Estimate report (for admission control) and 
      the Sustainable-Aggregate-Rate (for pre-emption). With our 
      assumption of RSVP (Section 2.2) as the end-to-end signalling 
      protocol, it means that extensions to RSVP are required, as 
      detailed in [RSVP-ECN], for example to carry the Congestion-Level-
      Estimate and Sustainable-Aggregate-Rate information from egress 
      gateway to ingress gateway. 

   We are discussing whether the PHB used by CL traffic should be a new 
   PHB (indicated by a new DSCP) or whether the Expedited Forwarding 
   (EF) PHB can be used with the addition of the required ECN marking 
   behaviour.  

   Other than these things, the arrangement uses existing IETF protocols 
   throughout, although not in their usual architecture. 

1.5. Structure of rest of the document 

   Section 2 describes some key aspects of the framework: our goals, 
   assumptions and the benefits we believe it has. Section 3 describes 
   the architecture (including a use case), whilst Section 4 summarises 
   the required changes to the various nodes in the CL-region. Section 5 
   outlines some possible extensions. Section 6 provides some comparison 
   with existing QoS mechanisms.  

    
2. Key aspects of the framework 

   In this section we discuss the key aspects of the framework: 

   o At a high level, our key goals, i.e. the functionality that we 
      want to achieve 

   o The assumptions that we're prepared to make  

   o The consequent benefits they bring 

2.1. Key goals 

   The framework achieves an end-to-end controlled load (CL) service 
   where a segment of the end-to-end path is an edge-to-edge Pre-
   Congestion Notification region. CL is a quality of service (QoS) 
   closely approximating the QoS that the same flow would receive from a 
   lightly loaded network element [RFC2211]. It is useful for inelastic 
   flows such as those for real-time media.  

   o The CL service should be achieved despite varying load levels of 
      other sorts of traffic, which may or may not be rate adaptive 
      (i.e. responsive to packet drops or ECN marks). 

   o The CL service should be supported for a variety of possible CL 
      sources: Constant Bit Rate (CBR), Variable Bit Rate (VBR) and 
      voice with silence suppression. VBR is the most challenging to 
      support. 

   o After a localised failure in the interior of the CL-region causing 
      heavy congestion, the CL service should recover gracefully by pre-
      empting (dropping) some of the admitted CL microflows, whilst 
      preserving as many of them as possible with their full CL QoS.  

   o It is suggested that pre-emption needs to be completed within 1-2
      seconds, because it is estimated that after a few seconds many
      affected users will start to hang up. At that point a pre-emption
      mechanism is not only redundant and possibly even counter-
      productive, but many more flows than necessary to reduce
      congestion may also hang up. Also, other, lower priority traffic
      classes will not be restored to partial service until the higher 
      priority CL service reduces its load on shared links. 


   o The CL service should support emergency services ([EMERG-RQTS], 
      [EMERG-TEL]) as well as the Assured Service which is the IP 
      implementation of the existing ITU-T/NATO/DoD telephone system 
      architecture known as Multi-Level Precedence and Pre-emption 
      [ITU.MLPP.1990] [ANSI.MLPP.Spec][ANSI.MLPP.Supplement], or MLPP. 
      In particular, this involves admitting new high priority sessions 
      even when admission control thresholds are reached and new routine 
      sessions are rejected. Similarly, this involves taking into 
      account session priorities and properties at the time of pre-
      empting calls. 
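
   One possible way of taking session priority into account when
   pre-empting is sketched below. The policy shown (pre-empt the
   lowest priority microflows first, until the excess rate is covered)
   and all names are assumptions made for illustration; this framework
   does not mandate a particular selection policy.

```python
def select_flows_to_preempt(flows, excess_bps):
    """Pick microflows to drop until their rates cover the excess.

    flows: iterable of (flow_id, priority, rate_bps) tuples, where a
    larger priority value means a more important session (assumed).
    Returns the list of flow ids chosen, lowest priority first.
    """
    chosen = []
    freed = 0.0
    # sorted() is stable, so equal priorities keep their input order.
    for flow_id, _priority, rate_bps in sorted(flows, key=lambda f: f[1]):
        if freed >= excess_bps:
            break
        chosen.append(flow_id)
        freed += rate_bps
    return chosen
```

   With this policy, routine sessions are dropped before emergency or
   high precedence sessions, and no microflow is dropped when there is
   no excess.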

    
2.2. Key assumptions 

   The framework does not try to deliver the above functionality in all 
   scenarios. We make the following assumptions about the type of 
   scenario to be solved.  

   o Edge-to-edge: all the nodes in the CL-region are upgraded with the 
      Pre-Congestion Notification and Pre-emption Alert mechanisms, and 
      all the ingress and egress gateways are upgraded to perform the 
      measurement-based admission control and pre-emption. Note that 
      although the upgrades required are edge-to-edge, the CL service is 
      provided end-to-end. 

   o Additional load: we assume that any additional load offered within 
      the reaction time of the admission control mechanism doesn't move 
      the CL-region directly from no congestion to overload. So we
      assume there will always be an intermediate stage where some CL
      packets have their CE codepoint set, but they are still delivered 
      without significant QoS degradation. We believe this is valid for 
      core and backbone networks with typical call arrival patterns 
      (given the reaction time is little more than one round trip time 
      across the CL-region), but is unlikely to be valid in access 
      networks where the granularity of an individual call becomes 
      significant. 

   o Aggregation: we assume that in normal operations, there are many 
      CL microflows within the CL-region, typically at least hundreds 
      between any pair of ingress and egress gateways. The implication 
      is that the solution is targeted at core and backbone networks and 
      possibly parts of large access networks.  


   o Trust: we assume that there is trust between all the nodes in the 
      CL-region. For example, this trust model is satisfied if one 
      operator runs the whole of the CL-region. But we make no such 
      assumptions about the end nodes, i.e. depending on the scenario 
      they may be trusted or untrusted by the CL-region.  

   o Signalling: we assume that the end-to-end signalling protocol is 
      RSVP. Section 3 describes how the CL-region fits into such an end-
      to-end QoS scenario, whilst [RSVP-ECN] describes the extensions to 
      RSVP that are required.  

   o Separation: we assume that all nodes within the CL-region are 
      upgraded with the CL mechanism, so the requirements of [Floyd] are 
      met because the CL-region is an enclosed environment. Also, an 
      operator separates CL-traffic in the CL-region from outside 
      traffic by administrative configuration of the ring of gateways 
      around the region. Within the CL-region we assume that the CL-
      traffic is separated from non-CL traffic.  

   o Routing: we assume that one of the following applies: 

        (same path) all packets between a pair of ingress and egress 
        gateways follow the same path. This ensures that the Congestion-
        Level-Estimate used in the admission control procedure reflects 
        the status of the path followed by the new flow's packets.

        (load balanced) packets between a pair of ingress and egress 
        gateways follow different paths but that the load balancing 
        scheme is tuned in the CL-region to distribute load such that 
        the different paths always receive comparable relative load. 
        This ensures that the Congestion-Level-Estimate used in the 
        admission control procedure (and which is computed taking into 
        account packets travelling on all the paths) also approximately 
        reflects the status of the actual path followed by the new 
        microflow's packets 

        (worst case assumed) packets between a pair of ingress and 
        egress gateways follow different paths but that (i) it is 
        acceptable for the operator to keep the CL traffic between this 
        pair of gateways to a level dictated by the most loaded of all 
        paths between this pair of gateways (so that CL traffic may be 
        rejected - or even pre-empted in some situations - even if one 
        or more of the paths between the pair of gateways is operating 
        below its engineered levels) and that (ii) it is acceptable for 
        that operator to configure engineered levels below optimum 
        levels to compensate for the fact that the effect on the 
        Congestion-Level-Estimate of the congestion experienced over one
        of the paths may be diluted by traffic received over non-
        congested paths so that lower thresholds need to be used in 
        these cases to ensure early admission control rejection and pre-
        emption over the congested paths.   

    
   We are investigating ways of loosening the restrictions set by some 
   of these assumptions, for instance: 

   o Trust: to allow the CL-region to span multiple, non-trusting 
      operators, using the technique of [Re-feedback] and [Re-ECN] 
      mentioned in Section 5.1. 

   o Signalling: we believe that the solution could operate with 
      another signalling protocol such as NSIS. We would very much 
      welcome input from, and collaboration with, the NSIS community in 
      order to carry out work similar to that done for RSVP. It could 
      also work with application level signalling as suggested in 
      [RT-ECN]. 

   o Additional load: we believe that the assumption is valid for core 
      and backbone networks, with an appropriate margin between the 
      inverse-token-bucket's token rate and the configured rate for CL 
      traffic. However, in principle a burst of admission requests can 
      occur in a short time. We expect this to be a rare event under 
      normal conditions, but it could happen e.g. due to a 'flash 
      crowd'. If it does, then more flows may be admitted than should 
      be, triggering the pre-emption mechanisms. To avoid the need for 
      pre-emption, 'call gapping' could be used at the egress (i.e. the 
      egress gateway paces out the admission of microflows). 

   o Separation: the assumption that CL traffic is separated from non-
      CL traffic implies that the CL traffic has its own PHB, not shared 
      with other traffic. We are looking at whether it could share 
      Expedited Forwarding's PHB, but supplemented with the new Pre-
      Congestion Notification and Pre-emption Alert marking behaviours. 
      If this is possible, other PHBs (like Assured Forwarding) could be 
      supplemented with the same new behaviours. This is similar to how 
      RFC3168 ECN was defined to supplement any PHB. 

   o Routing: we are looking in greater detail at the solution in the 
      presence of Equal Cost Multi-Path routing and at suitable 
      enhancements.  

    

2.3. Key benefits 

   We believe that the mechanism described in this document has several 
   advantages: 

   o It achieves statistical guarantees of quality of service for 
      microflows, delivering a very low delay, jitter and packet loss 
      service suitable for applications like voice and video calls that 
      generate real time inelastic traffic. This is because of its per 
      microflow admission control scheme, combined with its dynamic on-
      path "early warning" of potential congestion. The guarantee is at 
      least as strong as with IntServ Controlled Load (Section 6.1 
      mentions why the guarantee may be somewhat better), but without 
      the scalability problems of per-microflow IntServ. 

   o It can support "Emergency" and military Multi-Level Pre-emption 
      and Priority services, even in times of heavy congestion (perhaps 
      caused by failure of a node within the CL-region), by pre-empting 
      on-going "ordinary CL microflows". 

   o It scales well, because the interior nodes of the CL-region 
      process no signalling and hold no path state. 

   o It is resilient, again because no state is held by the interior 
      nodes of the CL-region. Hence during an interior routing change 
      caused by a node failure no microflow state has to be relocated. 
      The pre-emption mechanism further helps resilience because it 
      rapidly reduces the load to one that the CL-region can support. 

   o It helps preserve, through the pre-emption mechanism, QoS to as 
      many microflows as possible and to lower priority traffic in times 
      of heavy congestion (e.g. caused by failure of an interior node). 
      Otherwise long-lived microflows could cause loss on all CL 
      microflows for a long time.   

   o It avoids the potential catastrophic failure problem when the 
      DiffServ architecture is used in large networks using statically 
      provisioned capacity. This is achieved by controlling the load 
      dynamically based on edge-to-edge-path real-time measurement of 
      Pre-Congestion Notification, as discussed in Section 1.1.1. 

   o It requires minimal new standardisation, because it reuses 
      existing QoS protocols and algorithms. 



   o It can be deployed incrementally, region by region or network by 
      network. Not all the regions or networks on the end-to-end path 
      need to have it deployed. Two CL-regions can even be separated by 
      a network that uses another QoS mechanism (e.g. MPLS-TE).  

   o It provides a deployment path for use of ECN for real-time 
      applications. Operators can gain experience of ECN before its 
      applicability to end-systems is understood and end terminals are 
      ECN capable. 

    

3. Architecture 

3.1. Admission control  

   In this section we describe the admission control mechanism. We 
   discuss the three pieces of the solution and then give an example of 
   how they fit together in a use case: 

   o the new Pre-Congestion Notification marking behaviour used by all 
      nodes in the CL-region 

   o how the measurements made support our admission control mechanism  

   o how the edge to edge mechanism fits into the end to end RSVP 
      signalling 

    
3.1.1. Pre-Congestion Notification marking behaviour 

   To support our admission control mechanism, each node in the CL-
   region runs an algorithm to determine whether to set the CE codepoint 
   of a particular CL packet.  

   Each link in the CL-region has a fixed rate (bandwidth) reflecting 
   the engineered admission control capacity for CL traffic, under the 
   control of management configuration. In order to make the description 
   more specific we assume a bulk 'inverse-token-bucket' is used on each 
   link; other implementations are possible. Tokens are added to our 
   inverse-token-bucket when packets are queued for transmission on the 
   corresponding link, and are consumed at a fixed rate that is slower 
   than the configured rate. This means that the amount of tokens starts 
   to increase before the actual queue builds up, but when it is in 
   danger of doing so soon; hence it can be used as an "early warning" 
   that the engineered capacity is nearly reached. The probability that 
   a node sets the CE codepoint of a CL packet depends on the number of 
   tokens in the inverse-token-bucket. Below one threshold value of the 
   number of tokens no packets have their CE codepoint set and above the 
   second they all do; in between, the probability increases linearly. 
   Note that the same inverse-token-bucket is used for all the CL 
   packets on that link, i.e. it operates in bulk on the CL behaviour 
   aggregate and not per microflow. The algorithm is detailed in [CL-
   marking].  



Probability  
of setting    ^ 
CE codepoint  | 
              | 
            1_|                     _______________ 
              |                    / 
              |                   / 
              |                  / 
              |                 / 
              |                / 
              |               / 
              |              / 
              |             / 
              |            / 
            0_|___________/ 
              | 
               -----------|---------|--------------> 
                        min-       max-         Amount of tokens in 
                     threshold    threshold     inverse-token-bucket 
       
Figure 2: Setting the Congestion Experienced Codepoint 
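
   As an illustration only, the marking curve of Figure 2 and the 
   bookkeeping behind it might be sketched as follows. This is a 
   minimal sketch under stated assumptions, not the algorithm of 
   [CL-marking]: tokens are counted in bits, the bucket is drained 
   lazily whenever a packet arrives, and the names InverseTokenBucket, 
   drain_rate_bps, min_threshold and max_threshold are hypothetical. 

```python
def marking_probability(tokens: float, min_threshold: float,
                        max_threshold: float) -> float:
    """Probability of setting the CE codepoint, following Figure 2."""
    if max_threshold <= min_threshold:
        raise ValueError("max_threshold must exceed min_threshold")
    if tokens <= min_threshold:
        return 0.0
    if tokens >= max_threshold:
        return 1.0
    # Linear ramp between the two thresholds.
    return (tokens - min_threshold) / (max_threshold - min_threshold)


class InverseTokenBucket:
    """Bulk bucket for one link (hypothetical sketch, tokens in bits).

    Tokens are added when CL packets are queued and consumed at a fixed
    rate, which the operator sets below the configured CL rate.
    """

    def __init__(self, drain_rate_bps: float, min_threshold: float,
                 max_threshold: float) -> None:
        self.drain_rate_bps = drain_rate_bps
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold
        self.tokens = 0.0
        self.last_time = 0.0

    def on_packet(self, now: float, size_bits: float) -> float:
        """Account for one queued CL packet; return its marking probability."""
        elapsed = now - self.last_time
        # Consume tokens for the time that has passed; never go below zero.
        self.tokens = max(0.0, self.tokens - elapsed * self.drain_rate_bps)
        self.last_time = now
        # Add tokens for the packet being queued.
        self.tokens += size_bits
        return marking_probability(self.tokens, self.min_threshold,
                                   self.max_threshold)
```

   A node would compare the returned probability with a random number 
   to decide whether to set the CE codepoint of that packet. Because 
   one bucket serves the whole CL behaviour aggregate on the link, no 
   per-microflow state is involved. 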
 
   How does a node know that it should apply the new Pre-Congestion 
   Notification marking behaviour? A CL packet is indicated by a 
   combination of three things: the node itself is in the CL-region so 
   it is configured with a behaviour for CL packets; the ECN codepoint 
   is set to ECN-Capable Transport (ECT); and the DSCP is set to the 
   value configured for the CL behaviour aggregate in the CL-region. On 
   the third point, we are currently considering whether the PHB used by 
   CL traffic should be a new PHB (indicated by a new DSCP) or whether 
   the Expedited Forwarding (EF) PHB can be used.  

3.1.2. Measurements to support admission control 

   To support our admission control mechanism the egress measures the 
   Congestion-Level-Estimate for traffic from each remote ingress 
   gateway, i.e. per CL-region-aggregate. The Congestion-Level-Estimate 
   is the number of bits in CL packets that have the CE codepoint set, 
   divided by the number of bits in all CL packets. It is calculated as 
   an exponentially weighted moving average, separately at each egress 
   node for the CL packets from each particular ingress node. This 
   Congestion-Level-Estimate provides an estimate of 
   how near the links on the path inside the CL-region are getting to 
   the engineered admission control capacity. Note that the metering is
   done separately per ingress node, because there may be sufficient 
   capacity on all the nodes on the path between one ingress gateway and 
   a particular egress, but not from a second ingress to that same 
   egress gateway. 
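
   The per-ingress metering described above might be sketched as 
   follows. This is only one possible reading, offered under stated 
   assumptions: the estimate is updated once per packet, the same 
   weight smooths both the CE-marked bits and the total bits, and the 
   names CongestionLevelMeter, weight and on_cl_packet are 
   hypothetical. 

```python
from typing import Dict, Hashable, Optional, Tuple


class CongestionLevelMeter:
    """Congestion-Level-Estimate per ingress gateway, kept at one egress.

    Hypothetical sketch: a bit-weighted exponentially weighted moving
    average, updated on every CL packet.
    """

    def __init__(self, weight: float) -> None:
        if not 0.0 < weight <= 1.0:
            raise ValueError("weight must be in (0, 1]")
        self.weight = weight
        # ingress id -> (smoothed CE-marked bits, smoothed total bits)
        self._state: Dict[Hashable, Tuple[float, float]] = {}

    def on_cl_packet(self, ingress: Hashable, size_bits: float,
                     ce_marked: bool) -> float:
        """Update the estimate for this ingress and return it."""
        if size_bits <= 0:
            raise ValueError("size_bits must be positive")
        marked, total = self._state.get(ingress, (0.0, 0.0))
        w = self.weight
        marked = (1 - w) * marked + w * (size_bits if ce_marked else 0.0)
        total = (1 - w) * total + w * size_bits
        self._state[ingress] = (marked, total)
        return marked / total

    def estimate(self, ingress: Hashable) -> Optional[float]:
        """Current estimate, or None if no CL packet has been seen yet."""
        if ingress not in self._state:
            return None
        marked, total = self._state[ingress]
        return marked / total
```

   Keeping one entry per ingress gateway reflects the point made above: 
   the path from one ingress may have spare capacity while the path 
   from another ingress to the same egress does not. 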

3.1.3. How edge-to-edge admission control supports end-to-end QoS 
   signalling 

   Consider a scenario that consists of two end hosts, each connected to 
   their own access networks, which are linked by the CL-region. A 
   source tries to set up a new CL microflow by sending an RSVP PATH 
   message, and the receiving end host replies with an RSVP RESV 
   message. Outside the CL-region some other method, for instance 
   IntServ, is used to provide QoS. From the perspective of RSVP the CL-
   region is a single hop, so the RSVP PATH and RESV messages are 
   processed by the ingress and egress gateways but are carried 
   transparently across all the interior nodes; hence, the ingress and 
   egress gateways hold per microflow state, whilst no state is kept by 
   the interior nodes. So far this is as in IntServ over DiffServ 
   [RFC2998]. However, in order to support our admission control 
   mechanism, the egress gateway adds to the RESV message an opaque 
   object which states the current Congestion-Level-Estimate for the 
   relevant CL-region-aggregate. Details of the corresponding RSVP 
   extensions are described in [RSVP-ECN]. 

3.1.4. Use case 

   To see how the three pieces of the solution fit together, we imagine 
   a scenario where some microflows are already in place between a given 
   pair of ingress and egress gateways, but the traffic load is such 
   that no packets from these flows have their CE codepoint set as they 
   travel across the CL-region. A source wanting to start a new CL 
   microflow sends an RSVP PATH message. The egress gateway adds an 
   object to the RESV message with the Congestion-Level-Estimate, which 
   is zero. The ingress gateway sees this and consequently admits the 
   new flow. It then forwards the RSVP RESV message upstream towards the 
   source end host. Hence, assuming there's sufficient capacity in the 
   access networks, the new microflow is admitted end-to-end.  

   The source now sends CL packets, which arrive at the ingress gateway. 
   The ingress uses a five-tuple filter to identify that the packets are 
   part of a previously admitted CL microflow, and it also polices the 
   microflow to ensure it remains within its traffic profile. (The 
   ingress has learnt the required information from the RSVP messages). 
   When forwarding a packet belonging to an admitted microflow, the 
   ingress sets the packet's DSCP to that for the CL-traffic in the CL-
   region and the packet's ECN field to ECT, so that the interior nodes
   know this is a CL packet. The CL packet now travels across the CL-
   region, with the CE codepoint getting set if necessary. Also, 
   appropriate queue scheduling is needed in each node to ensure that CL 
   traffic gets its configured bandwidth. 

   Next, we imagine the same scenario but at a later time when load is 
   higher at one (or more) of the interior nodes, which start to set the 
   CE codepoint of CL packets because their arrival rate is nearing the 
   configured rate. The next time a source tries to set up a CL 
   microflow, the ingress gateway learns (from the egress) the relevant 
   Congestion-Level-Estimate. If it is greater than some threshold value 
   then the ingress refuses the request, otherwise it is accepted.  

   It is also possible for an egress gateway to get an RSVP RESV message 
   and not know what the Congestion-Level-Estimate is, for example if 
   there are no CL microflows at present between the relevant ingress 
   and egress gateways. In this case the egress requests the ingress to 
   send probe packets, from which it can initialise its meter. RSVP 
   Extensions for such a request to send probe data can be found in 
   [RSVP-ECN]. 
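
   The decision logic of this use case might be summarised by the 
   sketch below. It is illustrative only and rests on assumptions: the 
   three outcomes are folded into one function for brevity (in the 
   text it is the egress that requests probing and the ingress that 
   admits or rejects), and the names Outcome and admission_outcome are 
   hypothetical. 

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ADMIT = "admit"
    REJECT = "reject"
    PROBE_FIRST = "probe-first"


def admission_outcome(congestion_level_estimate: Optional[float],
                      threshold: float) -> Outcome:
    """Outcome for a new CL microflow on one CL-region-aggregate.

    congestion_level_estimate is None when the egress has no usable
    estimate, e.g. because no CL microflows currently use the aggregate.
    """
    if congestion_level_estimate is None:
        # The egress asks the ingress for probe packets to initialise
        # its meter before a decision can be made.
        return Outcome.PROBE_FIRST
    if congestion_level_estimate > threshold:
        return Outcome.REJECT
    return Outcome.ADMIT
```

   With an estimate of zero, as in the first scenario above, the flow 
   is admitted for any non-negative threshold. 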

    
3.2. Pre-emption 

   In this section we describe the pre-emption mechanism. We discuss the 
   two parts of the solution and then give an example of how they fit 
   together in a use case: 

   o How an ingress gateway is triggered to test whether pre-emption 
      may be needed 

   o How an ingress gateway determines the right amount of CL traffic 
      to drop 

   The mechanism is defined in [CL-marking] and [RSVP-ECN]. 

3.2.1. Alerting an ingress gateway that pre-emption may be needed 

   Alerting an ingress gateway that pre-emption may be needed is a two 
   stage process: a router in the CL-region alerts an egress gateway 
   that pre-emption may be needed; in turn the egress gateway alerts the 
   relevant ingress gateway. Every router in the CL-region has the 
   ability to alert egress gateways, which may be done either explicitly 
   or implicitly:  

 

   o Explicit - every link in the CL-region has a configured traffic 
      rate, which is a threshold above which it re-marks exceeding CL 
      packets to Re-marked-CL. Reception of such a packet by the egress 
      gateway acts as a Pre-emption Alert. Encoding of Re-marked-CL is 
      under discussion (a new DSCP or leaving the DSCP unchanged and 
      setting a new ECN codepoint). Note that the explicit mechanism 
      only makes sense if all the routers in the CL-region have the 
      functionality so that the egress gateways can rely on the explicit 
      mechanism. Otherwise there is the danger that the traffic happens 
      to focus on a router without it, and egress gateways then have to 
      also watch for implicit pre-emption alerts. 

   o Implicit - the router behaviour is unchanged from the Pre-
      Congestion marking behaviour described in the admission control 
      section. The egress gateway treats a Congestion-Level-Estimate of 
      (almost) 100% as an implicit alert that pre-emption may be 
      required. ('Almost' because the Congestion-Level-Estimate is a 
      moving average, so can never reach exactly 100%.) 

    
Probability   
of re-marking   ^  
CL packet to    |  
Re-marked-CL    |  
packet        1_|            ______________ 
                |           |  
                |           |  
                |           |  
                |           |  
                |           |  
                |           |  
                |           |  
                |           |  
                |           |  
              0_|___________| 
                |            
                 -----------|-------------->  
                      threshold      CL traffic rate                            
  
 
Figure 3: Re-marking CL packets to Re-marked-CL packets for explicit 
Pre-emption Alert 
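
   One way to realise the step behaviour of Figure 3 is a bulk token 
   bucket filled at the configured threshold rate, so that only the 
   packets exceeding that rate are re-marked. The sketch below is an 
   assumption-laden illustration, not the behaviour specified in 
   [CL-marking]; the names PreemptionAlertMarker, threshold_bps and 
   burst_bits are hypothetical. 

```python
class PreemptionAlertMarker:
    """Re-marks CL packets that exceed a configured rate (hypothetical)."""

    def __init__(self, threshold_bps: float, burst_bits: float) -> None:
        self.threshold_bps = threshold_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last_time = 0.0

    def on_packet(self, now: float, size_bits: float) -> bool:
        """Return True if this packet should become Re-marked-CL."""
        elapsed = now - self.last_time
        # Refill at the threshold rate, capped at the burst allowance.
        self.tokens = min(self.burst_bits,
                          self.tokens + elapsed * self.threshold_bps)
        self.last_time = now
        if self.tokens >= size_bits:
            # Within the configured rate: forward unchanged.
            self.tokens -= size_bits
            return False
        # Exceeds the configured rate: re-mark as a Pre-emption Alert.
        return True
```

   Like the admission control bucket, this operates on the CL 
   aggregate of a link and needs no per-microflow state. 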
    
 

   When one or more packets in a CL-region-aggregate alert the egress 
   gateway of the need for pre-emption, whether explicitly or 
   implicitly, the egress puts that CL-region-aggregate into Pre-emption 
   Alert state. For each CL-region-aggregate in alert state it measures 
   the rate of traffic at the egress gateway (i.e. the traffic rate of 
   the appropriate CL-region-aggregate) and reports this to the relevant 
   ingress gateway. The steps are: 

   o Determine the relevant ingress gateway - for the explicit case the 
      egress gateway examines the Re-marked-CL packet (resulting from 
      Pre-emption Alert marking) and uses the state installed at the 
      time of admission to determine which ingress gateway the packet 
      came from. For the implicit case the egress gateway has already 
      determined this information, because the Congestion-Level-Estimate 
      is calculated per ingress gateway. 

   o Measure the traffic rate of CL packets - as soon as the egress 
      gateway is alerted (whether explicitly or implicitly) it measures 
      the rate of CL traffic from this ingress gateway (i.e. for this 
      CL-region-aggregate). Note that Re-marked-CL packets are excluded 
      from that measurement. It should make its measurement quickly and 
      accurately, but exactly how is up to the implementation.  

   o Alert the ingress gateway - the egress gateway then immediately 
      alerts the relevant ingress gateway about the fact that pre-
      emption may be required. This Alert message also includes the 
      measured Sustainable-Aggregate-Rate, i.e. the egress rate of CL-
      traffic for this ingress gateway. The Alert message is sent using 
      reliable delivery. Procedures for support of such an Alert using 
      RSVP are defined in [RSVP-ECN]. 

 
             ______________           / \           ________________ 
            |              |        /     \        |                |     
CL packet   |Update        |      / Is it a \   Y  |Measure CL rate | 
arrives --->|Congestion-   |--->/Re-marked-CL \--->|from ingress and| 
            |Level-Estimate|    \   packet?   /    |alert ingress   | 
            |______________|      \         /      |________________| 
                                    \     / 
                                      \ / 
 
 
Figure 4: Egress gateway action for explicit Pre-emption Alert  
 
 

             ______________           / \           ________________ 
            |              |        /     \        |                |     
CL packet   |Update        |      / C-L-E   \   Y  |Measure CL rate | 
arrives --->|Congestion-   |--->/  threshold  \--->|from ingress and| 
            |Level-Estimate|    \  exceeded?  /    |alert ingress   | 
            |______________|      \         /      |________________| 
                                    \     / 
                                      \ / 
 
 
Figure 5: Egress gateway action for implicit Pre-emption Alert  
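
   The egress behaviour for the explicit case (Figure 4) might be 
   sketched per CL-region-aggregate as below. This is a minimal sketch 
   under assumptions: the rate is a simple average over the time since 
   the alert began, whereas the text leaves the measurement method to 
   the implementation, and the names EgressAlertMeter and on_cl_packet 
   are hypothetical. The implicit case would enter the alert state 
   when the Congestion-Level-Estimate exceeds its threshold instead. 

```python
from typing import Optional


class EgressAlertMeter:
    """Pre-emption Alert state for one CL-region-aggregate (hypothetical)."""

    def __init__(self) -> None:
        self.in_alert = False
        self._start: Optional[float] = None
        self._bits = 0.0

    def on_cl_packet(self, now: float, size_bits: float,
                     remarked: bool) -> None:
        if remarked:
            if not self.in_alert:
                # First Re-marked-CL packet: enter alert, start metering.
                self.in_alert = True
                self._start = now
                self._bits = 0.0
            return  # Re-marked-CL packets are excluded from the rate
        if self.in_alert:
            self._bits += size_bits

    def sustainable_aggregate_rate(self, now: float) -> Optional[float]:
        """Rate in bits per second to report in the Alert message."""
        if not self.in_alert or self._start is None or now <= self._start:
            return None
        return self._bits / (now - self._start)
```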
 
 
3.2.2. Determining the right amount of CL traffic to drop 

   The method relies on the insight that the amount of CL traffic that 
   can be supported between a particular pair of ingress and egress 
   gateways is the amount of CL traffic that is actually getting across 
   the CL-region to the egress gateway without being re-marked to Re-
   marked-CL. Hence we term it the Sustainable-Aggregate-Rate. 

   So when the ingress gateway gets the Alert message from an egress 
   gateway, it compares: 

   o The traffic rate that it is sending to this particular egress 
      gateway (which we term ingress-rate) 

   o The traffic rate that the egress gateway reports (in the Alert 
      message) that it is receiving from this ingress gateway (which is 
      the Sustainable-Aggregate-Rate) 

   If the difference is significant, then the ingress gateway pre-empts 
   some microflows. It only pre-empts if: 

        Ingress-rate > Sustainable-Aggregate-Rate + error 

   The "error" term is partially to allow for inaccuracies in the 
   measurements of the rates. It is also needed because the ingress-rate 
   is measured at a slightly later moment than the Sustainable-
   Aggregate-Rate, and it is quite possible that the ingress-rate has 
   increased in the interim due to natural variation of the bit rate of 
   the CL sources. So the "error" term allows for some variation in the 
   ingress rate without triggering pre-emption.  
 
 

   The ingress gateway should pre-empt enough microflows to ensure that: 

        New ingress-rate < Sustainable-Aggregate-Rate - error 

   The "error" term here is used for similar reasons but in the other 
   direction, to ensure slightly more load is shed than seems necessary, 
   in case the two measurements were taken during a short-term fall in 
   load.  

   When the routers in the CL-region are using explicit pre-emption 
   alerting, the ingress gateway would normally pre-empt microflows 
   whenever it gets an alert (it always would if it were possible to set 
   "error" equal to zero). For the implicit case however this is not so. 
   It receives an Alert message when the Congestion-Level-Estimate 
   reaches (almost) 100%, which is roughly when traffic exceeds the 
   amount allocated for admission control of CL traffic at routers. 
   However, it is only when packets are indeed dropped en route that the 
   Sustainable-Aggregate-Rate becomes less than the ingress-rate, so only 
   then will pre-emption actually occur on the ingress router.   

   Hence with the implicit scheme, pre-emption can only be triggered 
   once the system starts dropping packets and thus the QoS of flows 
   starts being significantly degraded. This is in contrast with the 
   explicit scheme which allows pre-emption to be triggered before any 
   packet drop, simply when the traffic reaches a certain configured 
   engineered pre-emption level. Therefore we believe that the explicit 
   mechanism is superior. However it does require new functionality on 
   all the routers (although this is little more than a bulk token 
   bucket).  
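
   The two inequalities of this section can be combined into a single 
   calculation, sketched below. It is illustrative only; the function 
   name rate_to_shed and the choice to return a lower bound on the 
   rate to be shed are assumptions. 

```python
def rate_to_shed(ingress_rate: float, sustainable_aggregate_rate: float,
                 error: float) -> float:
    """Lower bound on the CL rate the ingress should pre-empt.

    Returns 0.0 when pre-emption is not triggered, i.e. when
    ingress_rate <= sustainable_aggregate_rate + error. Otherwise the
    ingress must shed more than the returned amount, so that the new
    ingress-rate falls below sustainable_aggregate_rate - error.
    """
    if error < 0:
        raise ValueError("error must be non-negative")
    if ingress_rate <= sustainable_aggregate_rate + error:
        return 0.0
    return ingress_rate - (sustainable_aggregate_rate - error)
```

   For example, with an ingress-rate of 100, a Sustainable-Aggregate-
   Rate of 80 and an error of 5 (all in the same units), pre-emption 
   is triggered because 100 > 85, and more than 25 units must be shed 
   to bring the new ingress-rate below 75. 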

 
3.2.3. Use case for pre-emption  

   To see how the pieces of the solution fit together in a use case, we 
   imagine a scenario where many microflows have already been admitted. 
   We confine our description to the explicit pre-emption mechanism. Now 
   an interior router in the CL-region fails. The network layer routing 
   protocol re-routes round the problem, but as a consequence traffic on 
   other links increases. In fact let's assume the traffic on one link 
   now exceeds its pre-emption threshold and so the router re-marks CL 
   packets to Re-marked-CL. When the egress sees the first one of these 
   packets it immediately determines which microflow this packet is part 
   of (by using a five-tuple filter and comparing it with state 
   installed at admission) and hence which ingress gateway the packet 
   came from. It sets up a meter to measure the traffic rate from this 
   ingress gateway, and as soon as possible sends a message to the
   ingress gateway. This message alerts the ingress gateway that pre-
   emption may be needed and contains the traffic rate measured by the 
   egress gateway. Then the ingress gateway determines the traffic rate 
   that it is sending towards this egress gateway and hence it can 
   calculate the amount of traffic that needs to be pre-empted.  

   The ingress gateway could now just shed random microflows, but it is 
   better if the least important ones are dropped. The ingress gateway 
   could use information stored locally in each reservation's state 
   (such as for example the RSVP pre-emption priority) as well as 
   information provided by a policy decision point in order to decide 
   which of the flows to shed (or perhaps which ones not to shed). The 
   ingress gateway then initiates RSVP signalling to instruct the 
   relevant destinations that their session has been terminated, and to 
   tell (RSVP) nodes along the path to tear down associated RSVP state. 
   To guard against recalcitrant sources, normal IntServ policing will 
   block any future traffic from the dropped flows from entering the CL-
   region. Note that - with the explicit Pre-emption Alert mechanism - 
   since the threshold for re-marking packets to Re-marked-CL may be set 
   at significantly less than the physical line capacity, traffic pre-
   emption may be triggered before any congestion has actually occurred 
   and before any packet is dropped. 
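
   Choosing the least important microflows first might look like the 
   sketch below. It is a minimal illustration under assumptions: each 
   reservation carries a single integer priority in which a lower 
   value means less important, policy decision point input is ignored, 
   and the names Flow and select_flows_to_shed are hypothetical. 

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Flow:
    flow_id: str
    rate_bps: float
    priority: int  # assumption: lower value = less important


def select_flows_to_shed(flows: Iterable[Flow],
                         rate_to_shed_bps: float) -> List[Flow]:
    """Pick the least important flows until more than the target is shed."""
    if rate_to_shed_bps <= 0:
        return []
    chosen: List[Flow] = []
    shed = 0.0
    # sorted() is stable, so flows of equal priority keep their order.
    for flow in sorted(flows, key=lambda f: f.priority):
        if shed > rate_to_shed_bps:
            break
        chosen.append(flow)
        shed += flow.rate_bps
    return chosen
```

   The ingress gateway would then signal the termination of each 
   selected reservation and police any further traffic from it. 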

   We extend the scenario further by imagining that (due to a disaster 
   of some kind) further routers in the CL-region fail during the time 
   taken by the pre-emption process described above. This is handled 
   naturally, as packets will continue to be re-marked to Re-marked-CL 
   and so the pre-emption process will happen for a second time.  

   Pre-emption also helps emergency/military calls by taking into 
   account the corresponding call priorities when selecting calls to be 
   pre-empted, which is likely to be particularly important in a 
   disaster scenario.  

    
4. Details 

   This section is intended to provide a systematic summary of the new 
   functionality required by the routers in the CL-region. 

   A network operator upgrades normal IP routers by: 

   o Adding functionality related to admission control and pre-emption 
      to all its ingress and egress gateways 



   o Adding Pre-Congestion Notification behaviour and Pre-emption Alert 
      behaviour to all the nodes in the CL-region. 

   We consider the detailed actions required for each of the types of 
   node in turn.  

4.1. Ingress gateways 

   Ingress gateways perform the following tasks: 

   o Classify incoming packets - decide whether they are CL or non-CL 
      packets. This is done using an IntServ filter spec (source and 
      destination addresses and port numbers), whose details have been 
      gathered from the RSVP messaging. 

   o Police - check that the microflow conforms with what has been 
      agreed (i.e. it keeps to its agreed data rate). If necessary, 
      packets which do not correspond to any reservations, packets which 
      are in excess of the rate agreed for their reservation, and 
      packets for a reservation that has earlier been pre-empted may be 
      policed. Policing may be achieved via dropping or via re-marking 
      of the packet's DSCP to a value different from the CL behaviour 
      aggregate. 

   o Packet ECN colouring - for CL microflows, set the ECN field to 
      ECT(0) or ECT(1) (uses for ECT(0) and ECT(1) will be discussed in 
      a later version of this document) 

   o Perform 'interior node' functions (see next sub-section) 

   o Admission Control - on new session establishment, consider the 
      Congestion-Level-Estimate received from the corresponding egress 
      gateway and decide, most likely by comparing it with a simple 
      configured threshold, whether the new call is to be admitted or 
      rejected (taking into account local policy information and, 
      optionally, information provided by a policy decision point). 

   o Probe - if requested by the egress gateway to do so, the ingress 
      gateway generates probe traffic so that the egress gateway can 
      compute the Congestion-Level-Estimate from this ingress gateway. 
      Probe packets may be simple data addressed to the egress gateway 
      and require no protocol standardisation, although there will be 
      best practice for their number, size and rate. 

   o Measure - when it receives an Alert message from an egress 
      gateway, it determines the rate at which it is sending packets to 
      that egress gateway 
 
 

   o Pre-empt - calculate how much CL traffic needs to be pre-empted; 
      decide which microflows should be dropped, perhaps in consultation 
      with a Policy Decision Point; and do the necessary signalling to 
      drop them. 
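
   The classification and colouring tasks listed above might be 
   sketched as follows. This is an illustration under assumptions: 
   CL_DSCP is a hypothetical value (the PHB for CL traffic is still 
   under consideration), ECT(0) is chosen arbitrarily, rate policing 
   is omitted, and non-conforming packets are re-marked to the default 
   DSCP rather than dropped. The layout of the DSCP and ECN field in 
   one byte follows RFC2474 and RFC3168. 

```python
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


NOT_ECT = 0b00
ECT0 = 0b10          # ECT(0) codepoint of RFC3168
CL_DSCP = 42         # hypothetical DSCP for the CL behaviour aggregate
DEFAULT_DSCP = 0


def colour(packet: FiveTuple, reservations: Dict[FiveTuple, bool]) -> int:
    """Return the DS/ECN byte the ingress writes into the packet.

    reservations maps each five-tuple learnt from RSVP to True while the
    reservation is active and to False once it has been pre-empted.
    """
    if reservations.get(packet, False):
        # Admitted CL microflow: CL DSCP in the upper six bits, ECT below.
        return (CL_DSCP << 2) | ECT0
    # No reservation, or pre-empted: police by re-marking out of CL.
    return (DEFAULT_DSCP << 2) | NOT_ECT
```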

4.2. Interior nodes 

   Interior nodes do the following tasks: 

   o Classify packets - examine the DSCP and ECN field to see if it's a 
      CL packet 

   o Non-CL packets are handled as usual, with respect to dropping them 
      or setting their CE codepoint.  

   o Pre-Congestion Notification - CL packets have their CE codepoint 
      set according to the algorithm detailed in [CL-marking] and 
      outlined in Section 3. 

   o Pre-emption Alert - assuming the explicit Pre-emption Alert 
      mechanism is being used, when the rate of CL traffic exceeds a 
      threshold then re-mark packets to Re-marked-CL.  
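
   Classification at an interior node needs only the DSCP and ECN 
   field of each packet, as the sketch below illustrates. It is a 
   minimal sketch under assumptions: CL_DSCP is a hypothetical value, 
   and a packet whose CE codepoint was already set by an upstream node 
   is still treated as a CL packet. The ECN codepoint values are those 
   of RFC3168. 

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
CL_DSCP = 42  # hypothetical DSCP for the CL behaviour aggregate


def is_cl_packet(ds_ecn_byte: int) -> bool:
    """True if the DSCP and ECN field identify a CL packet."""
    dscp = ds_ecn_byte >> 2      # upper six bits
    ecn = ds_ecn_byte & 0b11     # lower two bits
    return dscp == CL_DSCP and ecn in (ECT0, ECT1, CE)


def set_ce(ds_ecn_byte: int) -> int:
    """Set the CE codepoint, leaving the DSCP unchanged."""
    return (ds_ecn_byte & 0xFC) | CE
```

   No five-tuple lookup is involved, which is why interior nodes hold 
   no per-microflow state. 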

 
4.3. Egress gateways 

   Egress gateways do the following tasks: 

   o Classify packets - determine which ingress gateway a CL packet has 
      come from. This is the previous RSVP hop, hence the necessary 
      details are obtained just as with IntServ from the state 
      associated with the packet five-tuple, which has been built using 
      information from the RSVP messages. 

   o Meter - for CL packets, calculate the fraction of the total number 
      of bits which are in CE marked packets or in Re-marked-CL packets. 
      The calculation is done as an exponentially weighted moving 
      average (see Appendix C). A separate calculation is made for CL 
      packets from each ingress gateway. The meter works on an aggregate 
      basis and not per microflow. 


   o Signal the Congestion-Level-Estimate - this is piggy-backed on the 
      reservation reply. An egress gateway's interface is configured to 
      know it is an egress gateway, so it always appends this to the 
      RESV message. If the Congestion-Level-Estimate is unknown or is 
      too stale, then the egress gateway can request the ingress gateway 
      to send probes.  

   o Packet colouring - for CL packets, set the DSCP and the ECN field 
      to whatever has been agreed as appropriate for the next domain. By 
      default the ECN field is set to the Not-ECT codepoint. Note that 
      this results in the loss of the end-to-end meaning of the ECN 
      field. It can usually be assumed that end-to-end congestion 
      control is unnecessary within an end-to-end reservation. But if a 
      genuine need is identified for end-to-end ECN semantics within a 
      reservation, then an alternative is to tunnel CL packets across 
      the CL-region, or to agree an extension to end-to-end signalling 
      to indicate that the microflow uses an ECN-capable transport. We 
      do not recommend such apparently unnecessary complexity. 

   o Measure the rate - measure the rate of CL traffic from a 
      particular ingress gateway (i.e. the rate for the CL-region-
      aggregate), when alerted (either explicitly or implicitly) that 
      pre-emption may be required. The measured rate is reported back to 
      the appropriate ingress gateway [RSVP-ECN].  

4.4. Failures  

   If a gateway fails, regular RSVP procedures handle the recovery. For 
   example, if an ingress gateway fails, IP routing at the RSVP routers 
   upstream of it re-routes traffic to a new ingress gateway. The 
   upstream RSVP routers then perform RSVP fast local repair, i.e. they 
   attempt to re-establish reservations through the new ingress gateway 
   and, for example, through the same egress gateway. As part of this, 
   admission control is performed using the procedure described in this 
   document. Some flows may be rejected as a result, but those accepted 
   will receive the full QoS. 

   If an interior node fails, then the regular IP routing protocol will 
   re-route around it. If the new route can carry the admitted traffic, 
   flows continue gracefully. If instead this causes early warning of 
   congestion from the new route, admission control based on 
   pre-congestion notification will ensure new flows will not be 
   admitted until enough existing flows have departed. Finally, 
   re-routing may result in heavy congestion, in which case the 
   pre-emption mechanism is triggered. 

    
5. Potential future extensions 

5.1. Multi-domain and multi-operator usage 

   This potential extension would eliminate the trust assumption 
   (Section 2.2), so that the CL-region could consist of multiple 
   domains run by different operators that did not trust each other. 
   Then only the ingress and egress gateways of the CL-region would take 
   part in the admission control procedure, i.e. at the ingress to the 
   first domain and the egress from the final domain. The border routers 
   between operators within the CL-region would only have to do bulk 
   accounting - they wouldn't do per microflow metering and policing, 
   and they wouldn't take part in signal processing or hold path state 
   [Briscoe]. [Re-feedback, Re-feedback-I-D] explains how a downstream 
   domain can police that its upstream domain does not 'cheat' by 
   admitting traffic when the downstream path is over-congested.  

5.2. Adaptive bandwidth for the Controlled Load service 

   The admission control mechanism described in this document assumes 
   that each router has a fixed bandwidth allocated to CL flows. A 
   possible extension is that the bandwidth is flexible, depending on 
   the level of non-CL traffic. If a large share of the current load on 
   a path is CL, then more CL traffic can be admitted. And if the 
   greater share of the load is non-CL, then the admission threshold can 
   be proportionately lower. The approach re-arranges sharing between 
   classes to aim for economic efficiency, whatever the traffic load 
   matrix. It also deals with unforeseen changes to capacity during 
   failures better than configuring fixed engineered rates. Adaptive 
   bandwidth allocation can be achieved by changing the Pre-Congestion 
   marking behaviour, so that the probability of setting the CE 
   codepoint would now depend on the number of queued non-CL packets as 
   well as the number of CL tokens. The adaptive bandwidth approach 
   would be supplemented by placing limits on the adaptation to prevent 
   starvation of the CL by other traffic classes and of other classes by 
   CL traffic.  

5.3. Controlled Load service with end-to-end Pre-Congestion Notification 

   It may be possible to extend the framework to parts of the network 
   where there are only a low number of CL microflows, i.e. the 
   aggregation assumption (Section 2.2) doesn't hold. In the extreme it 
   may be possible to operate the framework end-to-end, i.e. between end 
   hosts. One potential method is to send probe packets to test whether 
   the network can support a prospective new CL microflow. The probe 
   packets would be sent at the same traffic rate as expected for the 
   actual microflow, but in order not to disturb existing CL traffic a 
   router would always schedule probe packets behind CL ones (compare 
   [Breslau00]); this implies they have a new DSCP. Otherwise the 
   routers would treat probe packets identically to CL packets. In order 
   to perform admission control quickly, in parts of the network where 
   there are only a few CL microflows, the Pre-Congestion marking 
   behaviour for probe packets would switch from CE marking no packets 
   to CE marking them all for only a minimal increase in load. 

5.4. MPLS-TE 

   It may be possible to extend the framework for admission control of 
   microflows into a set of MPLS-TE (Multi-protocol label switching 
   traffic engineering) aggregates. However, this would require the 
   MPLS header to carry the ECN field, which is not precluded by 
   RFC3270. 

    
6. Relationship to other QoS mechanisms 

6.1. IntServ Controlled Load 

   The CL mechanism delivers QoS similar to Integrated Services 
   controlled load, but rather better because queues are kept empty. 
   Admission control is driven from a bulk inverse-token-bucket on each 
   interface, sometimes termed a virtual queue [AVQ, vq], which can 
   detect a rise in load before queues build. The mechanism is also 
   more robust to route changes. 

6.2. Integrated services operation over DiffServ 

   Our approach to end-to-end QoS is similar to that described in 
   [RFC2998] for Integrated services operation over DiffServ networks. 
   Like [RFC2998], an IntServ class (CL in our case) is achieved end-to-
   end, with a CL-region viewed as a single reservation hop in the total 
   end-to-end path. Interior routers of the CL-region do not process 
   flow signalling nor do they hold state. Unlike [RFC2998] we do not 
   require the end-to-end signalling mechanism to be RSVP, although it 
   can be.  

   Bearing in mind these differences, we can describe our architecture 
   in the terms of the options in [RFC2998]. The DiffServ network region 
   is RSVP-aware, but awareness is confined to (what [RFC2998] calls) 
   the "border routers" of the DiffServ region. We use explicit 
   admission control into this region, with static provisioning within 
   it. The ingress "border router" does per microflow policing and sets 
   the DSCP and ECN fields to indicate the packets are CL ones (i.e. we 
   use router marking rather than host marking). 

6.3. Differentiated Services 

   The DiffServ architecture does not specify any way for devices 
   outside the domain to dynamically reserve resources or receive 
   indications of network resource availability.  In practice, service 
   providers rely on subscription-time Service Level Agreements (SLAs) 
   that statically define the parameters of the traffic that will be 
   accepted from a customer. The CL mechanism allows dynamic reservation 
   of resources through the DiffServ domain and, with the potential 
   extension mentioned in Section 5.1, it can span multiple domains 
   without active policing mechanisms at the borders (unlike DiffServ). 
   Therefore we do not use the traffic conditioning agreements (TCAs) of 
   the (informational) DiffServ architecture [RFC2475].  

   [Johnson] compares admission control with a 'generously dimensioned' 
   DiffServ network as ways to achieve QoS. The former is recommended.  

6.4. ECN 

   The marking behaviour described in this document complies with the 
   ECN aspects of the IP wire protocol RFC3168, but provides its own 
   edge-to-edge feedback instead of the TCP aspects of RFC3168. All 
   nodes within the CL-region are upgraded with the Pre-Congestion 
   Notification and Pre-emption Alert mechanisms, so the requirements of 
   [Floyd] are met because the CL-region is an enclosed environment. The 
   operator prevents traffic arriving at a node that doesn't understand 
   CL by administrative configuration of the ring of gateways around the 
   CL-region.  

6.5. RTECN 

   Real-time ECN (RTECN) [RTECN, RTECN-usage] has a similar aim to this 
   document (to achieve a low delay, jitter and loss service suitable 
   for RT traffic) and a similar approach (per microflow admission 
   control combined with an "early warning" of potential congestion 
   through setting the CE codepoint). But it explores a different 
   architecture without the aggregation assumption: host-to-host rather 
   than edge-to-edge. 

6.6. RMD 

   Resource Management in DiffServ (RMD) [RMD] is similar to this work, 
   in that it pushes complex classification, traffic conditioning and 
   admission control functions to the edge of a DiffServ domain and 
   simplifies the operation of the interior nodes. One of the RMD modes 
   uses measurement-based admission control; however, it works 
   differently: each interior node measures the user traffic load in the 
   PHB traffic aggregate, and each interior node processes a local 
   RESERVE message and compares the requested resources with the 
   available resources (maximum allowed load minus current load). 

   Hence a difference is that the CL architecture described in this 
   document has been designed not to require interaction between 
   interior nodes and signalling, whereas in RMD all interior nodes are 
   QoS-NSLP aware. So our architecture involves less processing in 
   interior nodes, is more agnostic to signalling, requires fewer 
   changes to existing standards and therefore works with existing RSVP 
   as well as having the potential to work with future signalling 
   protocols like NSIS. 

   RMD introduced the concept of Severe Congestion handling. The pre-
   emption mechanism described in the CL architecture has similar 
   objectives but relies on different mechanisms. 

6.7. RSVP Aggregation over MPLS-TE 

   Multi-protocol label switching traffic engineering (MPLS-TE) allows 
   scalable reservation of resources in the core for an aggregate of 
   many microflows. To achieve end-to-end reservations, admission 
   control and policing of microflows into the aggregate can be achieved 
   using techniques such as RSVP Aggregation over MPLS TE Tunnels as per 
   [AGGRE-TE]. However, in the case of inter-provider environments, 
   these techniques require that admission control and policing be 
   repeated at each trust boundary or that MPLS TE tunnels span multiple 
   domains.  

    
7. Security Considerations 

   To protect against denial of service attacks, the ingress gateway of 
   the CL-region needs to police all CL packets and drop packets in 
   excess of the reservation. This is similar to operations with 
   existing IntServ behaviour. 

   For pre-emption, it is considered acceptable from a security 
   perspective that the ingress gateway can treat "emergency/military" 
   CL flows preferentially compared with "ordinary" CL flows. However, 
   in the rest of the CL-region they are not distinguished (nonetheless, 
   our proposed technique does not preclude the use of different DSCPs 
   at the packet level as well as different priorities at the flow 
   level). Keeping emergency traffic indistinguishable at the packet 
   level minimises the opportunity for new security attacks. For 
   example, if instead a mechanism used different DSCPs for 
   "emergency/military" and "ordinary" packets, then an attacker could 
   specifically target the former in the data plane (perhaps for DoS or 
   for eavesdropping). 

   Further security aspects will be considered in a later version of 
   this document. 

    
8. Acknowledgements 

   The admission control mechanism evolved from the work led by Martin 
   Karsten on the Guaranteed Stream Provider developed in the M3I 
   project [GSPa, GSP-TR], which in turn was based on the theoretical 
   work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, 
   Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet 
   and June Tay (BT) helped develop and evaluate this approach. 

9. Comments solicited 

   Comments and questions are encouraged and very welcome. They can be 
   sent to the Transport Area Working Group's mailing list, 
   tsvwg@ietf.org, and/or to the authors. 

10. Changes from the -00 version of this draft 

   There are several modifications to the admission control mechanism 
   described in the first version of the draft, but the main technical 
   change is the addition of the whole of the Pre-emption mechanism. 

    
11. Appendixes 

11.1. Appendix A: Explicit Congestion Notification 

   This Appendix provides a brief summary of Explicit Congestion 
   Notification (ECN). 

   [RFC3168] specifies the incorporation of ECN into TCP and IP, 
   including ECN's use of two bits in the IP header. It specifies a 
   method for indicating incipient congestion to end-nodes (e.g. as in 
   RED, Random Early Detection), where the notification is through ECN 
   marking packets rather than dropping them. 

   ECN uses two bits in the IP header of both IPv4 and IPv6 packets: 

            0     1     2     3     4     5     6     7 
         +-----+-----+-----+-----+-----+-----+-----+-----+ 
         |          DS FIELD, DSCP           | ECN FIELD | 
         +-----+-----+-----+-----+-----+-----+-----+-----+ 
    
           DSCP: differentiated services codepoint 
           ECN:  Explicit Congestion Notification 
    
   Figure A.1: The Differentiated Services and ECN Fields in IP. 

   The two bits of the ECN field have four ECN codepoints, '00' to '11': 
         +-----+-----+ 
         | ECN FIELD | 
         +-----+-----+ 
           ECT   CE          
            0     0         Not-ECT 
            0     1         ECT(1) 
            1     0         ECT(0) 
            1     1         CE 
    
   Figure A.2: The ECN Field in IP. 

   The not-ECT codepoint '00' indicates a packet that is not using ECN. 

   The CE codepoint '11' is set by a router to indicate congestion to 
   the end nodes. The term 'CE packet' denotes a packet that has the CE 
   codepoint set.   

   The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and 
   ECT(1) respectively) are set by the data sender to indicate that the 
   end-points of the transport protocol are ECN-capable. Routers treat 
   the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to 
   use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a 
   packet-by-packet basis. The use of both the two codepoints for ECT is 
   motivated primarily by the desire to allow mechanisms for the data 
   sender to verify that network elements are not erasing the CE 
   codepoint, and that data receivers are properly reporting to the 
   sender the receipt of packets with the CE codepoint set. 
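
   As an illustration of the layout in Figures A.1 and A.2, the 
   following sketch splits the octet into its DSCP and ECN parts and 
   rewrites the ECN field. The function names are hypothetical and the 
   sketch is not part of any specification. 

```python
# ECN codepoints as listed in Figure A.2.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def split_ds_ecn(octet: int):
    """Split the 8-bit field into (DSCP, ECN).

    Bits 0-5 (the six most significant bits) carry the DSCP and
    bits 6-7 (the two least significant bits) carry the ECN field.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet")
    return octet >> 2, octet & 0b11


def set_ecn(octet: int, ecn: int) -> int:
    """Return the octet with its ECN field replaced, DSCP unchanged."""
    return (octet & 0b11111100) | (ecn & 0b11)
```

   For example, set_ecn(octet, 0b11) models a router setting the CE 
   codepoint, and set_ecn(octet, 0b00) models resetting the field to 
   Not-ECT. 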

   ECN requires support from the transport protocol, in addition to the 
   functionality given by the ECN field in the IP packet header. 
   [RFC3168] addresses the addition of ECN Capability to TCP, specifying 
   three new pieces of functionality: negotiation between the endpoints 
   during connection setup to determine if they are both ECN-capable; an 
   ECN-Echo (ECE) flag in the TCP header so that the data receiver can 
   inform the data sender when a CE packet has been received; and a 
   Congestion Window Reduced (CWR) flag in the TCP header so that the 
   data sender can inform the data receiver that the congestion window 
   has been reduced. 

   The transport layer (e.g. TCP) must respond, in terms of congestion 
   control, to a *single* CE packet as it would to a packet drop. 

   The advantage of setting the CE codepoint as an indication of 
   congestion, instead of relying on packet drops, is that it allows the 
   receiver(s) to receive the packet, thus avoiding the potential for 
   excessive delays due to retransmissions after packet losses.  

    
11.2. Appendix B: What is distributed measurement-based admission 
   control?  

   This Appendix briefly explains what distributed measurement-based 
   admission control is [Breslau99].  

   Traditional admission control algorithms for 'hard' real-time 
   services (those providing a firm delay bound for example) guarantee 
   QoS by using 'worst case analysis'. Each time a flow is admitted its 
   traffic parameters are examined and the network re-calculates the 
   remaining resources. When the network gets a new request it therefore 
   knows for certain whether the prospective flow, with its particular 
   parameters, should be admitted. However, parameter-based admission 
   control algorithms result in under-utilisation when the traffic is 
   bursty. Therefore 'soft' real time services - like Controlled Load - 
   can use a more relaxed admission control algorithm.  

   This idea suggests measurement-based admission control (MBAC). The 
   aim of MBAC is to provide a statistical service guarantee. The 
   classic scenario for MBAC is where each node participates in hop-by-
   hop admission control, characterising existing traffic locally 
   through measurements (instead of keeping an accurate track of traffic 
   as it is admitted), in order to determine the current value of some 
   parameter e.g. load. Note that for scalability the measurement is of 
   the aggregate of the flows in the local system. The measured 
   parameter(s) is then compared to the requirements of the prospective 
   flow to see whether it should be admitted.  
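
   The classic hop-by-hop scenario described above can be illustrated 
   with a minimal sketch. The function names, the target_utilisation 
   parameter and the simple additive test are assumptions chosen for 
   illustration; practical MBAC algorithms use more elaborate measures 
   (see [Breslau99]). 

```python
def hop_admits(measured_load_bps, requested_rate_bps, capacity_bps,
               target_utilisation=0.9):
    """One node's decision: would the measured aggregate load plus the
    prospective flow stay within the target share of capacity?"""
    return (measured_load_bps + requested_rate_bps
            <= target_utilisation * capacity_bps)


def path_admits(hops, requested_rate_bps):
    """Hop-by-hop admission: every node on the path must agree.

    hops is an iterable of (measured_load_bps, capacity_bps) pairs,
    one per node, each measured locally on the aggregate.
    """
    return all(hop_admits(load, requested_rate_bps, capacity)
               for load, capacity in hops)
```

   Note that each node keeps only an aggregate measurement, not a 
   record of the flows it has admitted. 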


   MBAC may also be performed centrally for a network, in which case it 
   uses centralised measurements by a bandwidth broker. 

   We use distributed MBAC. "Distributed" means that the measurement is 
   accumulated for the 'whole-path' using in-band signalling. In our 
   case, this means that the measurement of existing traffic is for the 
   same pair of ingress and egress gateways as the prospective 
   microflow.  

   In fact our mechanism can be said to be distributed in three ways: 
   all nodes on the ingress-egress path affect the Congestion-Level-
   Estimate; the admission control decision is made just once on behalf 
   of all the nodes on the path across the CL-region; and the ingress 
   and egress gateways cooperate to perform MBAC.  

11.3. Appendix C: Calculating the Exponentially weighted moving average 
   (EWMA) 

   At the egress gateway, for every CL packet arrival: 

      [EWMA-total-bits]n+1  =  (w * bits-in-packet) 
                               +  ((1-w) * [EWMA-total-bits]n) 

      [EWMA-CE-bits]n+1     =  (B * w * bits-in-packet) 
                               +  ((1-w) * [EWMA-CE-bits]n) 

   Then, per new flow arrival: 

      [Congestion-Level-Estimate]n+1  =  [EWMA-CE-bits]n+1 
                                         /  [EWMA-total-bits]n+1 

    
   where 

   EWMA-total-bits is the total number of bits in CL packets, calculated 
   as an exponentially weighted moving average (EWMA) 

   EWMA-CE-bits is the total number of bits in CL packets where the 
   packet has its CE codepoint set, again calculated as an EWMA.  

   B is either 0 or 1: 

     B = 0 if the CL packet does not have its CE codepoint set  

     B = 1 if the CL packet has its CE codepoint set 

 
   w is the exponential weighting factor.  

    
   Varying the value of the weight trades off between the smoothness and 
   responsiveness of the Congestion-Level-Estimate. However, in general 
   both can be achieved, given our original assumption of many CL 
   microflows and remembering that the EWMA is calculated on the basis 
   of aggregate traffic between the ingress and egress gateways. 

   There will be a threshold inter-arrival time between packets of the 
   same aggregate above which the egress will consider the 
   Congestion-Level-Estimate too stale, and it will then trigger 
   generation of probes by the ingress. 
    
   The first two per-packet algorithms can be simplified, if their only 
   use will be where the result of one is divided by the result of the 
   other in the third, per-flow algorithm. 
    
      [EWMA-total-bits]'n+1  =  bits-in-packet 
                                +  (w' * [EWMA-total-bits]n) 

      [EWMA-CE-bits]'n+1     =  (B * bits-in-packet) 
                                +  (w' * [EWMA-CE-bits]n) 

   where w' = (1-w)/w. 

   If w' is arranged to be a power of 2, these per packet algorithms can 
   be implemented solely with a shift and an add. 
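
   A minimal sketch of the per-packet and per-flow calculations of this 
   Appendix, as they might be kept by an egress gateway for one ingress 
   gateway, is given below. The class and method names, the default 
   weight and the staleness interval (stale_after_s) are assumptions of 
   this sketch. It uses the unsimplified form of the algorithms and 
   floating-point arithmetic rather than shifts. 

```python
class CongestionLevelMeter:
    """Per-ingress meter at an egress gateway (illustrative only)."""

    def __init__(self, w: float = 0.01, stale_after_s: float = 1.0):
        if not 0.0 < w <= 1.0:
            raise ValueError("w must be in (0, 1]")
        self.w = w                          # exponential weighting factor
        self.stale_after_s = stale_after_s  # assumed staleness interval
        self.ewma_total_bits = 0.0
        self.ewma_ce_bits = 0.0
        self.last_arrival_s = None

    def on_packet(self, bits: int, ce_marked: bool, now_s: float) -> None:
        """Per CL packet arrival: update both moving averages."""
        b = 1 if ce_marked else 0
        self.ewma_total_bits = (self.w * bits
                                + (1 - self.w) * self.ewma_total_bits)
        self.ewma_ce_bits = (b * self.w * bits
                             + (1 - self.w) * self.ewma_ce_bits)
        self.last_arrival_s = now_s

    def is_stale(self, now_s: float) -> bool:
        """True if no CL packet has arrived within stale_after_s."""
        return (self.last_arrival_s is None
                or now_s - self.last_arrival_s > self.stale_after_s)

    def estimate(self, now_s: float):
        """Per new flow arrival: the Congestion-Level-Estimate, or None
        when it is unknown or stale (the egress would request probes)."""
        if self.is_stale(now_s) or self.ewma_total_bits == 0.0:
            return None
        return self.ewma_ce_bits / self.ewma_total_bits
```

   One such meter would be kept per ingress gateway, since the 
   calculation is made on the aggregate and not per microflow. 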

     
12. References 

   A later version will distinguish normative and informative 
   references. 

   [AGGRE-TE]    Francois Le Faucheur, Michael Dibiasio, Bruce Davie, 
                 Michael Davenport, Chris Christou, Jerry Ash, Bur 
                 Goode, 'Aggregation of RSVP Reservations over MPLS 
                 TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-00 (work 
                 in progress), July 2005  

 
   [ANSI.MLPP.Spec] American National Standards Institute, 
                 "Telecommunications- Integrated Services Digital 
                 Network (ISDN) - Multi-Level Precedence and Pre-
                 emption (MLPP) Service Capability", ANSI T1.619-1992 
                 (R1999), 1992. 

   [ANSI.MLPP.Supplement] American National Standards Institute, "MLPP 
                 Service Domain Cause Value Changes", ANSI T1.619a-1994 
                 (R1999), 1990. 

   [AVQ]         S. Kunniyur and R. Srikant "Analysis and Design of an 
                 Adaptive Virtual Queue (AVQ) Algorithm for Active 
                 Queue Management", In: Proc. ACM SIGCOMM'01, Computer 
                 Communication Review 31 (4) (October, 2001). 

   [Breslau99]   L. Breslau, S. Jamin, S. Shenker "Measurement-based 
                 admission control: what is the research agenda?", In: 
                 Proc. Int'l Workshop on Quality of Service 1999. 

   [Breslau00]   L. Breslau, E. Knightly, S. Shenker, I. Stoica, H. 
                 Zhang "Endpoint Admission Control: Architectural 
                 Issues and Performance", In: ACM SIGCOMM 2000  

   [Briscoe]     Bob Briscoe and Steve Rudkin, "Commercial Models for 
                 IP Quality of Service Interconnect", BT Technology 
                 Journal, Vol 23 No 2, April 2005. 

   [CL-marking]  Forthcoming. Supersedes draft-briscoe-tsvwg-cl-phb-00. 

   [DCAC]        Richard J. Gibbens and Frank P. Kelly "Distributed 
                 connection acceptance control for a connectionless 
                 network", In: Proc. International Teletraffic Congress 
                 (ITC16), Edinburgh, pp. 941-952 (1999). 

   [EMERG-RQTS]  Carlberg, K. and R. Atkinson, "General Requirements 
                 for Emergency Telecommunication Service (ETS)", RFC 
                 3689, February 2004. 

   [EMERG-TEL]   Carlberg, K. and R. Atkinson, "IP Telephony 
                 Requirements for Emergency Telecommunication Service 
                 (ETS)", RFC 3690, February 2004. 

   [Floyd]       S. Floyd, 'Specifying Alternate Semantics for the 
                 Explicit Congestion Notification (ECN) Field', draft-
                 floyd-ecn-alternates-02.txt (work in progress), August 
                 2005  

 
   [GSPa]        Karsten (Ed.), Martin "GSP/ECN Technology & 
                 Experiments", Deliverable: 15.3 PtIII, M3I Eu Vth 
                 Framework Project IST-1999-11429, URL: 
                 http://www.m3i.org/ (February, 2002) (superseded by 
                 [GSP-TR]) 

   [GSP-TR]      Martin Karsten and Jens Schmitt, "Admission Control 
                 Based on Packet Marking and Feedback Signalling -- 
                 Mechanisms, Implementation and Experiments", TU-
                 Darmstadt Technical Report TR-KOM-2002-03, URL: 
                 http://www.kom.e-technik.tu-
                 darmstadt.de/publications/abstracts/KS02-5.html (May, 
                 2002)  

   [ITU.MLPP.1990] International Telecommunications Union, "Multilevel 
                 Precedence and Pre-emption Service (MLPP)", ITU-T 
                 Recommendation I.255.3, 1990.  

   [Johnson]     DM Johnson, 'QoS control versus generous 
                 dimensioning', BT Technology Journal, Vol 23 No 2, 
                 April 2005 

   [Re-ECN]      Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, 
                 'Re-ECN: Adding Accountability for Causing Congestion 
                 to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-00 (work in 
                 progress), October 2005. 

   [Re-feedback] Bob Briscoe, Arnaud Jacquet, Carla Di Cairano-
                 Gilfedder, Andrea Soppera, 'Re-feedback for Policing 
                 Congestion Response in an Inter-network', ACM SIGCOMM 
                 2005, August 2005. 

   [Reid]        ABD Reid, 'Economics and scalability of QoS 
                 solutions', BT Technology Journal, Vol 23 No 2, April 
                 2005 

   [RFC2211]     Wroclawski, J., "Specification of the Controlled-Load 
                 Network Element Service", RFC 2211, September 1997. 

   [RFC2309]     Braden, B., et al., "Recommendations on Queue 
                 Management and Congestion Avoidance in the Internet", 
                 RFC 2309, April 1998. 

   [RFC2474]     Nichols, K., Blake, S., Baker, F. and D. Black, 
                 "Definition of the Differentiated Services Field (DS 
                 Field) in the IPv4 and IPv6 Headers", RFC 2474, 
                 December 1998 
 
 
   [RFC2475]     Blake, S., Black, D., Carlson, M., Davies, E., Wang, 
                 Z. and W. Weiss, "An Architecture for Differentiated 
                 Services", RFC 2475, December 1998. 

   [RFC2597]     Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, 
                 "Assured Forwarding PHB Group", RFC 2597, June 1999. 

   [RFC2998]     Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, 
                 L., Speer, M., Braden, R., Davie, B., Wroclawski, J. 
                 and E. Felstaine, "A Framework for Integrated Services 
                 Operation Over DiffServ Networks", RFC 2998, November 
                 2000. 

   [RFC3168]     Ramakrishnan, K., Floyd, S. and D. Black "The Addition 
                 of Explicit Congestion Notification (ECN) to IP", RFC 
                 3168, September 2001. 

   [RFC3246]     B. Davie, A. Charny, J.C.R. Bennet, K. Benson, J.Y. Le 
                 Boudec, W. Courtney, S. Davari, V. Firoiu, D. 
                 Stiliadis, 'An Expedited Forwarding PHB (Per-Hop 
                 Behavior)', RFC 3246, March 2002. 

   [RFC3270]     Le Faucheur, F., Wu, L., Davie, B., Davari, S., 
                 Vaananen, P., Krishnan, R., Cheval, P., and J. 
                 Heinanen, "Multi-Protocol Label Switching (MPLS) 
                 Support of Differentiated Services", RFC 3270, May 
                 2002. 

   [RMD]         Attila Bader, Lars Westberg, Georgios Karagiannis, 
                 Cornelia Kappler, Tom Phelan, 'RMD-QOSM - The Resource 
                 Management in DiffServ QoS model', draft-ietf-nsis-
                 rmd-03 Work in Progress, June 2005. 

   [RSVP-ECN]    Francois Le Faucheur, Anna Charny, Bob Briscoe, Philip 
                 Eardley, Joe Babiarz, Kwok-Ho Chan, 'RSVP Extensions 
                 for Admission Control over DiffServ using 
                 Pre-congestion Notification', 
                 draft-lefaucheur-rsvp-ecn-00 (work in progress), 
                 October 2005. 

   [RTECN]       Babiarz, J., Chan, K. and V. Firoiu, 'Congestion 
                 Notification Process for Real-Time Traffic', draft-
                 babiarz-tsvwg-rtecn-04 Work in Progress, July 2005. 

   [RTECN-usage] Alexander, C., Ed., Babiarz, J. and J. Matthews, 
                 'Admission Control Use Case for Real-time ECN', draft-
                 alexander-rtecn-admission-control-use-case-00, Work in 
                 Progress, February 2005. 
 
 
   [vq]          Costas Courcoubetis and Richard Weber "Buffer Overflow 
                 Asymptotics for a Switch Handling Many Traffic 
                 Sources" In: Journal Applied Probability 33 pp. 886--
                 903 (1996). 

    
Authors' Addresses 

   Bob Briscoe 
   BT Research 
   B54/77, Sirius House 
   Adastral Park 
   Martlesham Heath 
   Ipswich, Suffolk 
   IP5 3RE 
   United Kingdom 
   Email: bob.briscoe@bt.com 
    

   Dave Songhurst 
   BT Research 
   B54/69, Sirius House 
   Adastral Park 
   Martlesham Heath 
   Ipswich, Suffolk 
   IP5 3RE 
   United Kingdom 
   Email: dsonghurst@jungle.bt.co.uk 
    

   Philip Eardley 
   BT Research 
   B54/77, Sirius House 
   Adastral Park 
   Martlesham Heath 
   Ipswich, Suffolk 
   IP5 3RE 
   United Kingdom 
   Email: philip.eardley@bt.com 
    

   Francois Le Faucheur  
   Cisco Systems, Inc.  
   Village d'Entreprise Green Side - Batiment T3  
   400, Avenue de Roumanille  
   06410 Biot Sophia-Antipolis  
   France  
   Email: flefauch@cisco.com  
        

   Anna Charny  
   Cisco Systems  
   300 Apollo Drive  
   Chelmsford, MA 01824  
   USA  
   Email: acharny@cisco.com  
        

   Kwok Ho Chan  
   Nortel Networks  
   600 Technology Park Drive  
   Billerica, MA  01821  
   USA  
   Email: khchan@nortel.com  
        

   Jozef Z. Babiarz  
   Nortel Networks  
   3500 Carling Avenue  
   Ottawa, Ont  K2H 8E9  
   Canada  
   Email: babiarz@nortel.com 
    

Intellectual Property Statement 

   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 

   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 

   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at 
   ietf-ipr@ietf.org. 

Disclaimer of Validity 

   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

Copyright Statement 

   Copyright (C) The Internet Society (2005). 

   This document is subject to the rights, licenses and restrictions 
   contained in BCP 78, and except as set forth therein, the authors 
   retain all their rights. 

 
Briscoe                Expires April 24, 2006                [Page 43]