Traffic Engineering Working Group Kireeti Kompella Internet Draft Juniper Networks Expiration Date: January 2002 Bandwidth Accounting for Traffic Engineering draft-kompella-tewg-bw-acct-00.txt 1. Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. 2. Abstract This document examines bandwidth representations for Traffic Engineering. It compares the current representation, two schemes that are under consideration, and finally proposes yet another approach. The primary application is Traffic Engineering for Differentiated Services. 2.1. ID Summary SUMMARY This document examines bandwidth representations for Traffic Engineering. It compares the current representation, two schemes Kompella, Kireeti [Page 1] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 that are under consideration, and finally proposes yet another approach. The primary application is Traffic Engineering for Differentiated Services. RELATED DOCUMENTS draft-boyle-tewg-ds-nop-00.txt draft-ietf-mpls-diff-te-reqts-00.txt draft-ietf-tewg-diff-te-reqts-01.txt WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK Belongs in TE WG. WHY IS IT TARGETED AT THIS WG This document describes Traffic Engineering parameters and their use. JUSTIFICATION There is ongoing work in the TE WG to specify new bandwidth parameters for DiffServ TE. This document proposes a possible alternative approach to this problem. 3. Introduction This document examines bandwidth representations for Traffic Engineering. Section 4 describes the current representation in [ISIS-TE] and [OSPF-TE]. Section 5 describes modification of the current representation suggested in [NOP]. Section 6 describes the DiffServ TE work [DS-TE]. Finally, Section 7 proposes a new bandwidth scheme. In what follows, it is assumed that there is a "Traffic Engineering Database" (TED) consisting of the topology of a network, i.e., the nodes and links in the network, and their attributes. The goal is to compute a path for a traffic flow (variously called aggregated flow, traffic trunk and traffic engineering tunnel) of given bandwidth and priority from a source node to a destination node; the focus is limited to aspects of this computation related to bandwidth. The questions being examined are what bandwidth attributes a link in the TED should have, and how these are used in computation. Kompella, Kireeti [Page 2] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 3.1. Terms A link has a "natural" bandwidth, the rate of data flow across the link. For example, an OC-12 link has a bandwidth of 622Mbps. A flow has a bandwidth requirement B and a priority p. The semantics of the bandwidth requirement is outside the scope of this document; the relevant issue is whether the flow fits on a link (admission control). A link may be allowed to be "oversubscribed". That is, the total bandwidth of flows across the link may exceed the link's bandwidth. It is assumed that the bandwidth of a single flow may not exceed the link's bandwidth. Priorities range from 0 (highest) to 7 (lowest). A flow F of higher priority may preempt a flow G of lower priority on a link if both flows cannot be accommodated on the link. Preemption in this context means that the flow G is terminated; to reestablish G, its path will have to be recomputed, and the flow set up over a different set of links; in so doing, G may preempt other flows. This potential cascading preemption may lead to (temporary) network instability, and thus is not always desirable; this issue is discussed below. 3.2. Operation Admission control is the test whether a flow "fits" in a link. If a flow is set up over a given link, the unreserved bandwidth of the link will change. Of interest here is how it changes, and whether the computing agent has enough information to predict this change. Similarly, when a flow is terminated (for example, by preemption), the unreserved bandwidth on the links it was using changes; again, it is of interest to be able to determine these changes. For each scheme below, we attempt to describe the admission control criteria and the bandwidth update algorithm. Kompella, Kireeti [Page 3] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 4. The Current Approach The current scheme for bandwidth representation associates with a link a link bandwidth, a maximum reservable bandwidth and unreserved bandwidth at eight priority levels. The link bandwidth is set to the speed of the link; so for example, an OC-12 link would have a link bandwidth of 622Mbps; this quantity is static. The maximum reservable bandwidth is the allowable total bandwidth of flows passing through this link; if oversubscription is allowed, this total may be larger than the link bandwidth. This quantity is also relatively static (it is configured, and changes only if the configuration changes). So, for example, a OC-12 link that is allowed to be oversubscribed by 150% will have a maximum reservable bandwidth of 1555Mbps. Unreserved bandwidth on the other hand is dynamic, i.e., as flows are set up and torn down, the unreserved bandwidth of a link may change. The initial value of unreserved bandwidth at each priority is set to the maximum reservable bandwidth; thus, unreserved bandwidth may be larger than the link bandwidth. Denote a link's link bandwidth by lbw, its maximum reservable bandwidth by mrb and its unreserved bandwidth at priority p by unres[p]. 4.1. Admission Control Say one wants to compute a path for a flow F with bandwidth B and setup priority p. F can be set up over a link L only if L's link bandwidth is at least B and L's unreserved bandwidth at priority p is at least B, i.e., B <= lbw & B <= unres[p] If F is set up over L, then L's unreserved bandwidth changes as follows: unres[q] := unres[q] - B, q >= p If unres[q] < 0 as a result, this is an indication that one or more flows need to be preempted to make room for F. If flow F is terminated for any reason, including preemption, then for each link L over which F was established, L's unreserved bandwidth changes as follows: unres[q] := unres[q] + B, q >= p As an aside, the above rules imply the following invariant: unres[p] >= unres[q], p <= q (*) Kompella, Kireeti [Page 4] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 Note that this invariant is a consequence of the fact that the initial values of unreserved bandwidth are identical, and of the procedures that update bandwidth values. This is not a requirement of any specification or protocol. 4.2. Analysis This scheme has the advantage of simplicity; also, it is deployed and currently working in many networks. The drawbacks are that each priority gets the same initial value and the same oversubscription factor; thus, this scheme is insufficient to address the requirements of Differentiated Services. Note: if preemption is not allowed, there is no value whatsoever in having 8 bandwidth values; priorities become irrelevant. In this case, a single value of unreserved bandwidth suffices, and admission control reduces to: B <= lbw & B <= unres 5. The "No-op" Scheme The "No-op" scheme [NOP] is an attempt to satisfy the requirements of DiffServ Traffic Engineering with no changes to topology information distributed by routing protocols, and minimal changes in admission control procedures. The No-op scheme is simple: instead of fixing the initial value of unreserved bandwidth at each priority level as the maximum reservable bandwidth, allow this to be configured per priority (per link). 5.1. Admission Control Admission control in the No-op scheme is the same as for the current scheme, i.e., for a flow at priority p with bandwidth B: B <= lbw & B <= unres[p] Bandwidth update procedures are not specified. 5.2. Analysis Only the node that owns a given link can say what the new values of unreserved bandwidth are as flows are established/torn down. This is a serious drawback: all other nodes in the network have to wait for the new information to be flooded before they can make meaningful new Kompella, Kireeti [Page 5] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 computations. This time can range from a few seconds to several minutes, as it is strongly suggested that bandwidth updates be rate- limited. As an example, consider an OC-12 link L owned by node A, and assume no oversubscription. Say that node Z is considering link L as a candidate for a flow F with priority p and bandwidth B. Case 1: Suppose that the initial values of unres[p] is set to 622Mbps for all p. Suppose further that a 222Mbps flow at priority 0 and a 200Mbps flow at priority 7 have already been established. The values of unres[0] and unres[7] seen by Z will then be 400Mbps and 200Mbps respectively. Case 2: Suppose that the initial value of unres[0] is set to 400Mbps, and of unres[7] is set to 200Mbps. The values of unres[0] and unres[7] seen by Z will again be 400Mbps and 200Mbps respectively. If Z wants to set up a 100Mbps flow over this link, it certainly fits. However, in case 1, the values of unres[0] and unres[7] after this flow is established are 300Mbps and 100Mbps respectively. In case 2, the new values are 300Mbps and 200Mbps respectively. If Z chooses link L for the flow, and sets it up, Z cannot predict a priori which of these values will be the new values, and must wait for A's update to know for certain. Such knowledge is required before Z can compute paths for new flows. Another drawback is that Z cannot predict whether setting up a given flow will cause preemption or not. In the above example, say that Z wants to set up a 300Mbps flow. In case 1, this will require that the priority 7 flow be preempted; in case 2, this flow fits without preemption. If Z doesn't want to cause preemption, Z cannot tell whether it should use link L or not. 6. Per-Class-Type Bandwidth Pools The scheme in [DS-TE] defines the notion of "Class Type". A class type is a set of DiffServ classes with similar DiffServ characteristics. Each class type has unreserved bandwidth at 8 priorities, analogous to the current scheme; in fact, the current scheme is defined to be bandwidths for "class type 0". For each pool, the maximum reservable bandwidth is configured (but not advertised); the initial values of unreserved bandwidth for a pool are set to the pool's maximum reservable bandwidth. This means that the invariant (*) is valid within a pool. Kompella, Kireeti [Page 6] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 6.1. Admission Control A flow is admissible on a link iff it meets two constraints: the flow fits within the class-type bound; and the flow meets the aggregate bandwidth for the link. The former is the same as the admission control for the current scheme, except that the appropriate class type should be used. The latter is requires that the aggregate unreserved bandwidth be known. However, the actual approach suggested by [DS-TE] is to fold in the two constraints into the unreserved bandwidth values for each class type. This is straightforward to compute for the node owning a link; however, as we will see below, this is impossible to predict when viewing the link from afar. The scheme also allows for compression of advertised values of unreserved bandwidths. This is useful, since each class type has 8 bandwidths, so the number of bandwidths advertised can be quite large. 6.2. Analysis The first observation is that the notion of class type is a direct consequence of the scaling issues with this approach. Ideally, one would talk in terms of classes; however, if having 6 classes means having 6 bandwidth pools of 8 priorities each, this is clearly not tenable. However, going from classes to class types is a definite compromise. The second is that all members of a class type share the same characteristics, i.e., they have the same initial unreserved bandwidth. If a class type consists of AF1 and AF2, it is clearly preferable to be able to have independent maximum reservable bandwidths for each, rather than a single common value. The third is that the notion of oversubscription has not been addressed at all. At the very least, each class type should have independent oversubscription factors; at best, each class should. For example, one may want to oversubscribe best effort traffic by 150%, but voice traffic by only 25%. One may go further and say that gold voice should be oversubscribed by only 10% whereas regular voice by 25%. Fourthly, as mentioned above, the scheme does not allow a computing node to predict the new values of bandwidths if a flow is actually set up; instead, the node has to wait for this information to be Kompella, Kireeti [Page 7] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 flooded. Fifthly, it is not clear that cross-class preemption is desirable; a clearer understanding of the consequences, and an evaluation by those who wish to deploy DiffServ TE should be sought. Finally, there are proposals to advertise backup bandwidths for the purpose of setting up backup flows. If there are 4 class types with 8 bandwidths, and each needs to be backed up, there is a definite concern about the scalability of this approach. 7. Per-Class Bandwidth Accounting The scheme proposed in this document offers per-class bandwidth accounting. This means per class maximum bandwidth, as well as per- class oversubscription. The mapping of priority to class is configured by the network administrator. This mapping is irrelevant to protocols and to the TED; neither path computation nor admission control care about Class Types or classes. 7.1. Advertisements Bandwidth advertisements work as follows: each link has a "link bandwidth", and N priority levels (N = 8 is the current norm). Each priority has a normalized (see next section) reserved bandwidth, a normalized maximum bandwidth, and a subscription factor. Each priority corresponds to exactly one Diff Serv class; more than one priority can map to the same class. For example: priority grade 0 premium voice 1 premium assured 2 standard voice 3 standard assured 4 gold data 5 silver data 6 bronze data 7 best effort is the natural link bandwidth. is the reserved bandwidth at each priority level. is the maximum bandwidth allowed at each priority. is the subscription factor at each priority level. This advertisement deprecates current bandwidth Kompella, Kireeti [Page 8] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 advertisements, i.e., maximum link bandwidth, maximum reservable bandwidth and unreserved bandwidth. Normalized maximum bandwidth for each priority level is set by configuration and must be at most link bandwidth; the default is the link bandwidth. The subscription factor is given in percentage and is also set by configuration, and must be >= 100; the default is 100%. A subscription factor > 100 for priority p means that priority p can be over-subscribed by (subscr[p]-100) %. Normalized reserved bandwidth is initially 0, and is the only field that changes dynamically, i.e., as flows come and go. Normalized reserved bandwidth must be <= normalized maximum bandwidth. So, a 100Mbps link could have 10Mbps/110% maximum bandwidth and subscription factor for priority 0, 15Mbps/125% for priority 1, 25Mbps/150% for priority 2, 50Mbps/150% for priorities 3-6 and 100Mbps/400% for priority 7. 7.2. Normalization The notion of normalized bandwidth is a simple way of dealing with bandwidths at different oversubscription levels. Suppose a flow requires 10Mbps at a priority level that has 50% oversubscription, and another flow requires 40Mbps at a priority level that has 150% oversubscription. Adding the bandwidths (10+40 = 50) clearly doesn't make sense. On the other hand, for some model of oversubscription, normalizing the bandwidths before adding (or otherwise combining) them has some merit. Clearly, this sacrifices accuracy for simplicity; a more accurate mechanism for adding bandwidths (even at the same oversubscription levels) would involve adding probability functions. Definition: for a bandwidth B at priority p, the "normalized" bandwidth B' is given by B' = ((B*100)/subscr[p]). For a link, denote by reserved[p] the normalized reserved bandwidth at priority p; denote by max[p] the normalized maximum bandwidth at priority p; denote by subscr[p] the subscription factor for priority p. Definition: the unreserved bandwidth at priority p is given by: unreserved[p] = (link bandwidth - (sum reserved[q], q <= p)) Note that unreserved[] is computed at each node -- it is not flooded. Kompella, Kireeti [Page 9] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 7.3. Admission Control A flow of bandwidth B and priority p is admissible on a link iff all of the following are true: a) B <= max[p] for the link # flow must fit in priority class b) B' <= unreserved[p] # normalized flow must fit in # unreserved bandwidth at prio p c) B' <= max[p] - res[p] # normalized flow must fit within prio p So, for example, suppose priority 1 has max = 50Mbps with 300% subscription. A flow with bandwidth 145Mbps cannot be admitted even though 145 < 50x3. However, 3 flows with bandwidth 50Mbps can be admitted. If the flow is accepted, then reserved[p] := reserved[p] + B' for q >= p unreserved[q] := unreserved[q] - B' if unreserved[q] < 0, preempt flows at priority q until unreserved[q] >= 0 If a flow of bandwidth B and priority p goes away, then reserved[p] := reserved[p] + B' unreserved[q] := unreserved[q] - B' for q >= p 7.4. Analysis The requirements for DiffServ TE, reduced to simplest terms, can be stated as follows: overlay on a single physical topology multiple logical topologies (each could represent a DiffServ class); show how the topology information can be flooded, and how computation (admission control and bandwidth updates) can be done; and show the interaction between the various logical topologies. This document does that. It is unfortunate that [DS-TE-REQTS] and [TE-FW] speak about Class Types rather than classes (thus pre-supposing the solution), and ignore oversubscription. However, the following statement (with "class" replacing "class type") sums up the requirements for DiffServ TE: The fundamental requirement for DS-TE is to be able to enforce different bandwidth constraints for different Class Types rather than a single one. The above scheme may seem fairly complex; however, the procedures for admission control and bandwidth updates are very clear and Kompella, Kireeti [Page 10] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 straightforward, and readily implementable; the mechanism is scalable; and oversubscription is dealt with. Furthermore, there is sufficient information in the TED so that each computing node has an accurate picture of the state of the TED after its operations, so that a node that needs to do multiple computations can do so without waiting for new topology information (with a reasonable chance of success). In the opinion of the author (as an implementor), this property is highly desirable, if not an absolute requirement. 8. Security Considerations Not discussed. Not envisioned to be an issue. 9. Acknowledgments Many thanks to Quaizar Vohra and Yakov Rekhter for their valuable comments and bug fixes. Any remaining bugs are solely my doing. 10. References [DS-TE] "Requirements for support of Diff-Serv-aware MPLS Traffic Engineering", Le Faucheur, Francois, Chiu, Angela, Townsend, William, Skalecki, Darek, draft-ietf-mpls-diff-te-reqts-00.txt (work in progress). [DS-TE-REQTS] "Requirements for support of Diff-Serv-aware MPLS Traffic Engineering", Le Faucheur, Francois, Nadeau, Thomas, Tatham, Martin, Telkamp, Thomas, Cooper, David, Boyle, Jim, Martini, Luca, Fang, Luyuan, Lai, Waisum, Ash, Jerry, Hicks, Pete, Chiu, Angela, Townsend, William, Skalecki, Darek, draft-ietf-tewg-diff-te- reqts-01.txt (work in progress). [ISIS-TE] "IS-IS extensions for Traffic Engineering", Li, Tony, Smit, Henk, draft-ietf-isis-traffic-03.txt (work in progress). [NOP] "Accomplishing Diffserv TE needs with Current Specifications", Boyle, Jim, draft-boyle-tewg-ds-nop-00.txt (work in progress). [OSPF-TE] "Traffic Engineering Extensions to OSPF", Katz, Dave, Yeung, Derek, Kompella, Kireeti, draft-katz-yeung-ospf-traffic-05.txt (work in progress). [TE-FW] "A Framework for Internet Traffic Engineering", Awduche, Daniel, Chiu, Angela, Elwalid, Anwar, Widjaja, Indra, Xiao, XiPeng, draft-ietf-tewg-framework-05.txt (work in progress). Kompella, Kireeti [Page 11] Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001 11. Author's Address Kireeti Kompella Juniper Networks, Inc. 1194 N. Mathilda Ave Sunnyvale, CA 94089 kireeti@juniper.net Kompella, Kireeti [Page 12]