Traffic Engineering Working Group Kireeti Kompella
Internet Draft Juniper Networks
Expiration Date: January 2002
Bandwidth Accounting for Traffic Engineering
draft-kompella-tewg-bw-acct-00.txt
1. Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 except that the right to
produce derivative works is not granted.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
2. Abstract
This document examines bandwidth representations for Traffic
Engineering. It compares the current representation, two schemes
that are under consideration, and finally proposes yet another
approach. The primary application is Traffic Engineering for
Differentiated Services.
2.1. ID Summary
SUMMARY
This document examines bandwidth representations for Traffic
Engineering. It compares the current representation, two schemes
Kompella, Kireeti [Page 1]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
that are under consideration, and finally proposes yet another
approach. The primary application is Traffic Engineering for
Differentiated Services.
RELATED DOCUMENTS
draft-boyle-tewg-ds-nop-00.txt
draft-ietf-mpls-diff-te-reqts-00.txt
draft-ietf-tewg-diff-te-reqts-01.txt
WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK
Belongs in TE WG.
WHY IS IT TARGETED AT THIS WG
This document describes Traffic Engineering parameters and their use.
JUSTIFICATION
There is ongoing work in the TE WG to specify new bandwidth
parameters for DiffServ TE. This document proposes a possible
alternative approach to this problem.
3. Introduction
This document examines bandwidth representations for Traffic
Engineering. Section 4 describes the current representation in
[ISIS-TE] and [OSPF-TE]. Section 5 describes modification of the
current representation suggested in [NOP]. Section 6 describes the
DiffServ TE work [DS-TE]. Finally, Section 7 proposes a new
bandwidth scheme.
In what follows, it is assumed that there is a "Traffic Engineering
Database" (TED) consisting of the topology of a network, i.e., the
nodes and links in the network, and their attributes. The goal is to
compute a path for a traffic flow (variously called aggregated flow,
traffic trunk and traffic engineering tunnel) of given bandwidth and
priority from a source node to a destination node; the focus is
limited to aspects of this computation related to bandwidth. The
questions being examined are what bandwidth attributes a link in the
TED should have, and how these are used in computation.
Kompella, Kireeti [Page 2]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
3.1. Terms
A link has a "natural" bandwidth, the rate of data flow across the
link. For example, an OC-12 link has a bandwidth of 622Mbps.
A flow has a bandwidth requirement B and a priority p. The semantics
of the bandwidth requirement is outside the scope of this document;
the relevant issue is whether the flow fits on a link (admission
control).
A link may be allowed to be "oversubscribed". That is, the total
bandwidth of flows across the link may exceed the link's bandwidth.
It is assumed that the bandwidth of a single flow may not exceed the
link's bandwidth.
Priorities range from 0 (highest) to 7 (lowest). A flow F of higher
priority may preempt a flow G of lower priority on a link if both
flows cannot be accommodated on the link. Preemption in this context
means that the flow G is terminated; to reestablish G, its path will
have to be recomputed, and the flow set up over a different set of
links; in so doing, G may preempt other flows. This potential
cascading preemption may lead to (temporary) network instability, and
thus is not always desirable; this issue is discussed below.
3.2. Operation
Admission control is the test whether a flow "fits" in a link.
If a flow is set up over a given link, the unreserved bandwidth of
the link will change. Of interest here is how it changes, and
whether the computing agent has enough information to predict this
change. Similarly, when a flow is terminated (for example, by
preemption), the unreserved bandwidth on the links it was using
changes; again, it is of interest to be able to determine these
changes.
For each scheme below, we attempt to describe the admission control
criteria and the bandwidth update algorithm.
Kompella, Kireeti [Page 3]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
4. The Current Approach
The current scheme for bandwidth representation associates with a
link a link bandwidth, a maximum reservable bandwidth and unreserved
bandwidth at eight priority levels. The link bandwidth is set to the
speed of the link; so for example, an OC-12 link would have a link
bandwidth of 622Mbps; this quantity is static. The maximum
reservable bandwidth is the allowable total bandwidth of flows
passing through this link; if oversubscription is allowed, this total
may be larger than the link bandwidth. This quantity is also
relatively static (it is configured, and changes only if the
configuration changes). So, for example, a OC-12 link that is
allowed to be oversubscribed by 150% will have a maximum reservable
bandwidth of 1555Mbps.
Unreserved bandwidth on the other hand is dynamic, i.e., as flows are
set up and torn down, the unreserved bandwidth of a link may change.
The initial value of unreserved bandwidth at each priority is set to
the maximum reservable bandwidth; thus, unreserved bandwidth may be
larger than the link bandwidth.
Denote a link's link bandwidth by lbw, its maximum reservable
bandwidth by mrb and its unreserved bandwidth at priority p by
unres[p].
4.1. Admission Control
Say one wants to compute a path for a flow F with bandwidth B and
setup priority p. F can be set up over a link L only if L's link
bandwidth is at least B and L's unreserved bandwidth at priority p is
at least B, i.e.,
B <= lbw & B <= unres[p]
If F is set up over L, then L's unreserved bandwidth changes as
follows:
unres[q] := unres[q] - B, q >= p
If unres[q] < 0 as a result, this is an indication that one or more
flows need to be preempted to make room for F.
If flow F is terminated for any reason, including preemption, then
for each link L over which F was established, L's unreserved
bandwidth changes as follows:
unres[q] := unres[q] + B, q >= p
As an aside, the above rules imply the following invariant:
unres[p] >= unres[q], p <= q (*)
Kompella, Kireeti [Page 4]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
Note that this invariant is a consequence of the fact that the
initial values of unreserved bandwidth are identical, and of the
procedures that update bandwidth values. This is not a requirement
of any specification or protocol.
4.2. Analysis
This scheme has the advantage of simplicity; also, it is deployed and
currently working in many networks. The drawbacks are that each
priority gets the same initial value and the same oversubscription
factor; thus, this scheme is insufficient to address the requirements
of Differentiated Services.
Note: if preemption is not allowed, there is no value whatsoever in
having 8 bandwidth values; priorities become irrelevant. In this
case, a single value of unreserved bandwidth suffices, and admission
control reduces to:
B <= lbw & B <= unres
5. The "No-op" Scheme
The "No-op" scheme [NOP] is an attempt to satisfy the requirements of
DiffServ Traffic Engineering with no changes to topology information
distributed by routing protocols, and minimal changes in admission
control procedures.
The No-op scheme is simple: instead of fixing the initial value of
unreserved bandwidth at each priority level as the maximum reservable
bandwidth, allow this to be configured per priority (per link).
5.1. Admission Control
Admission control in the No-op scheme is the same as for the current
scheme, i.e., for a flow at priority p with bandwidth B:
B <= lbw & B <= unres[p]
Bandwidth update procedures are not specified.
5.2. Analysis
Only the node that owns a given link can say what the new values of
unreserved bandwidth are as flows are established/torn down. This is
a serious drawback: all other nodes in the network have to wait for
the new information to be flooded before they can make meaningful new
Kompella, Kireeti [Page 5]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
computations. This time can range from a few seconds to several
minutes, as it is strongly suggested that bandwidth updates be rate-
limited.
As an example, consider an OC-12 link L owned by node A, and assume
no oversubscription. Say that node Z is considering link L as a
candidate for a flow F with priority p and bandwidth B.
Case 1: Suppose that the initial values of unres[p] is set to 622Mbps
for all p. Suppose further that a 222Mbps flow at priority 0 and a
200Mbps flow at priority 7 have already been established. The values
of unres[0] and unres[7] seen by Z will then be 400Mbps and 200Mbps
respectively.
Case 2: Suppose that the initial value of unres[0] is set to 400Mbps,
and of unres[7] is set to 200Mbps. The values of unres[0] and
unres[7] seen by Z will again be 400Mbps and 200Mbps respectively.
If Z wants to set up a 100Mbps flow over this link, it certainly
fits. However, in case 1, the values of unres[0] and unres[7] after
this flow is established are 300Mbps and 100Mbps respectively. In
case 2, the new values are 300Mbps and 200Mbps respectively. If Z
chooses link L for the flow, and sets it up, Z cannot predict a
priori which of these values will be the new values, and must wait
for A's update to know for certain. Such knowledge is required
before Z can compute paths for new flows.
Another drawback is that Z cannot predict whether setting up a given
flow will cause preemption or not. In the above example, say that Z
wants to set up a 300Mbps flow. In case 1, this will require that
the priority 7 flow be preempted; in case 2, this flow fits without
preemption. If Z doesn't want to cause preemption, Z cannot tell
whether it should use link L or not.
6. Per-Class-Type Bandwidth Pools
The scheme in [DS-TE] defines the notion of "Class Type". A class
type is a set of DiffServ classes with similar DiffServ
characteristics. Each class type has unreserved bandwidth at 8
priorities, analogous to the current scheme; in fact, the current
scheme is defined to be bandwidths for "class type 0".
For each pool, the maximum reservable bandwidth is configured (but
not advertised); the initial values of unreserved bandwidth for a
pool are set to the pool's maximum reservable bandwidth. This means
that the invariant (*) is valid within a pool.
Kompella, Kireeti [Page 6]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
6.1. Admission Control
A flow is admissible on a link iff it meets two constraints: the flow
fits within the class-type bound; and the flow meets the aggregate
bandwidth for the link.
The former is the same as the admission control for the current
scheme, except that the appropriate class type should be used. The
latter is requires that the aggregate unreserved bandwidth be known.
However, the actual approach suggested by [DS-TE] is to fold in the
two constraints into the unreserved bandwidth values for each class
type. This is straightforward to compute for the node owning a link;
however, as we will see below, this is impossible to predict when
viewing the link from afar.
The scheme also allows for compression of advertised values of
unreserved bandwidths. This is useful, since each class type has 8
bandwidths, so the number of bandwidths advertised can be quite
large.
6.2. Analysis
The first observation is that the notion of class type is a direct
consequence of the scaling issues with this approach. Ideally, one
would talk in terms of classes; however, if having 6 classes means
having 6 bandwidth pools of 8 priorities each, this is clearly not
tenable. However, going from classes to class types is a definite
compromise.
The second is that all members of a class type share the same
characteristics, i.e., they have the same initial unreserved
bandwidth. If a class type consists of AF1 and AF2, it is clearly
preferable to be able to have independent maximum reservable
bandwidths for each, rather than a single common value.
The third is that the notion of oversubscription has not been
addressed at all. At the very least, each class type should have
independent oversubscription factors; at best, each class should.
For example, one may want to oversubscribe best effort traffic by
150%, but voice traffic by only 25%. One may go further and say that
gold voice should be oversubscribed by only 10% whereas regular voice
by 25%.
Fourthly, as mentioned above, the scheme does not allow a computing
node to predict the new values of bandwidths if a flow is actually
set up; instead, the node has to wait for this information to be
Kompella, Kireeti [Page 7]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
flooded.
Fifthly, it is not clear that cross-class preemption is desirable; a
clearer understanding of the consequences, and an evaluation by those
who wish to deploy DiffServ TE should be sought.
Finally, there are proposals to advertise backup bandwidths for the
purpose of setting up backup flows. If there are 4 class types with
8 bandwidths, and each needs to be backed up, there is a definite
concern about the scalability of this approach.
7. Per-Class Bandwidth Accounting
The scheme proposed in this document offers per-class bandwidth
accounting. This means per class maximum bandwidth, as well as per-
class oversubscription.
The mapping of priority to class is configured by the network
administrator. This mapping is irrelevant to protocols and to the
TED; neither path computation nor admission control care about Class
Types or classes.
7.1. Advertisements
Bandwidth advertisements work as follows: each link has a "link
bandwidth", and N priority levels (N = 8 is the current norm). Each
priority has a normalized (see next section) reserved bandwidth, a
normalized maximum bandwidth, and a subscription factor. Each
priority corresponds to exactly one Diff Serv class; more than one
priority can map to the same class. For example:
priority grade
0 premium voice
1 premium assured
2 standard voice
3 standard assured
4 gold data
5 silver data
6 bronze data
7 best effort
is the natural link bandwidth. is the reserved bandwidth at each priority level.
is the maximum bandwidth allowed at
each priority. is the subscription factor at each
priority level. This advertisement deprecates current bandwidth
Kompella, Kireeti [Page 8]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
advertisements, i.e., maximum link bandwidth, maximum reservable
bandwidth and unreserved bandwidth.
Normalized maximum bandwidth for each priority level is set by
configuration and must be at most link bandwidth; the default is the
link bandwidth. The subscription factor is given in percentage and
is also set by configuration, and must be >= 100; the default is
100%. A subscription factor > 100 for priority p means that priority
p can be over-subscribed by (subscr[p]-100) %. Normalized reserved
bandwidth is initially 0, and is the only field that changes
dynamically, i.e., as flows come and go. Normalized reserved
bandwidth must be <= normalized maximum bandwidth.
So, a 100Mbps link could have 10Mbps/110% maximum bandwidth and
subscription factor for priority 0, 15Mbps/125% for priority 1,
25Mbps/150% for priority 2, 50Mbps/150% for priorities 3-6 and
100Mbps/400% for priority 7.
7.2. Normalization
The notion of normalized bandwidth is a simple way of dealing with
bandwidths at different oversubscription levels. Suppose a flow
requires 10Mbps at a priority level that has 50% oversubscription,
and another flow requires 40Mbps at a priority level that has 150%
oversubscription. Adding the bandwidths (10+40 = 50) clearly doesn't
make sense. On the other hand, for some model of oversubscription,
normalizing the bandwidths before adding (or otherwise combining)
them has some merit. Clearly, this sacrifices accuracy for
simplicity; a more accurate mechanism for adding bandwidths (even at
the same oversubscription levels) would involve adding probability
functions.
Definition: for a bandwidth B at priority p, the "normalized"
bandwidth B' is given by
B' = ((B*100)/subscr[p]).
For a link, denote by reserved[p] the normalized reserved bandwidth
at priority p; denote by max[p] the normalized maximum bandwidth at
priority p; denote by subscr[p] the subscription factor for priority
p.
Definition: the unreserved bandwidth at priority p is given by:
unreserved[p] = (link bandwidth - (sum reserved[q], q <= p))
Note that unreserved[] is computed at each node -- it is not flooded.
Kompella, Kireeti [Page 9]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
7.3. Admission Control
A flow of bandwidth B and priority p is admissible on a link iff all
of the following are true:
a) B <= max[p] for the link # flow must fit in priority class
b) B' <= unreserved[p] # normalized flow must fit in
# unreserved bandwidth at prio p
c) B' <= max[p] - res[p] # normalized flow must fit within prio p
So, for example, suppose priority 1 has max = 50Mbps with 300%
subscription. A flow with bandwidth 145Mbps cannot be admitted even
though 145 < 50x3. However, 3 flows with bandwidth 50Mbps can be
admitted.
If the flow is accepted, then
reserved[p] := reserved[p] + B'
for q >= p
unreserved[q] := unreserved[q] - B'
if unreserved[q] < 0, preempt flows at priority q until
unreserved[q] >= 0
If a flow of bandwidth B and priority p goes away, then
reserved[p] := reserved[p] + B'
unreserved[q] := unreserved[q] - B' for q >= p
7.4. Analysis
The requirements for DiffServ TE, reduced to simplest terms, can be
stated as follows: overlay on a single physical topology multiple
logical topologies (each could represent a DiffServ class); show how
the topology information can be flooded, and how computation
(admission control and bandwidth updates) can be done; and show the
interaction between the various logical topologies. This document
does that.
It is unfortunate that [DS-TE-REQTS] and [TE-FW] speak about Class
Types rather than classes (thus pre-supposing the solution), and
ignore oversubscription. However, the following statement (with
"class" replacing "class type") sums up the requirements for DiffServ
TE:
The fundamental requirement for DS-TE is to be able to enforce
different bandwidth constraints for different Class Types rather
than a single one.
The above scheme may seem fairly complex; however, the procedures for
admission control and bandwidth updates are very clear and
Kompella, Kireeti [Page 10]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
straightforward, and readily implementable; the mechanism is
scalable; and oversubscription is dealt with. Furthermore, there is
sufficient information in the TED so that each computing node has an
accurate picture of the state of the TED after its operations, so
that a node that needs to do multiple computations can do so without
waiting for new topology information (with a reasonable chance of
success). In the opinion of the author (as an implementor), this
property is highly desirable, if not an absolute requirement.
8. Security Considerations
Not discussed. Not envisioned to be an issue.
9. Acknowledgments
Many thanks to Quaizar Vohra and Yakov Rekhter for their valuable
comments and bug fixes. Any remaining bugs are solely my doing.
10. References
[DS-TE] "Requirements for support of Diff-Serv-aware MPLS Traffic
Engineering", Le Faucheur, Francois, Chiu, Angela, Townsend, William,
Skalecki, Darek, draft-ietf-mpls-diff-te-reqts-00.txt (work in
progress).
[DS-TE-REQTS] "Requirements for support of Diff-Serv-aware MPLS
Traffic Engineering", Le Faucheur, Francois, Nadeau, Thomas, Tatham,
Martin, Telkamp, Thomas, Cooper, David, Boyle, Jim, Martini, Luca,
Fang, Luyuan, Lai, Waisum, Ash, Jerry, Hicks, Pete, Chiu, Angela,
Townsend, William, Skalecki, Darek, draft-ietf-tewg-diff-te-
reqts-01.txt (work in progress).
[ISIS-TE] "IS-IS extensions for Traffic Engineering", Li, Tony, Smit,
Henk, draft-ietf-isis-traffic-03.txt (work in progress).
[NOP] "Accomplishing Diffserv TE needs with Current Specifications",
Boyle, Jim, draft-boyle-tewg-ds-nop-00.txt (work in progress).
[OSPF-TE] "Traffic Engineering Extensions to OSPF", Katz, Dave,
Yeung, Derek, Kompella, Kireeti, draft-katz-yeung-ospf-traffic-05.txt
(work in progress).
[TE-FW] "A Framework for Internet Traffic Engineering", Awduche,
Daniel, Chiu, Angela, Elwalid, Anwar, Widjaja, Indra, Xiao, XiPeng,
draft-ietf-tewg-framework-05.txt (work in progress).
Kompella, Kireeti [Page 11]
Internet Draft draft-kompella-tewg-bw-acct-00.txt July 2001
11. Author's Address
Kireeti Kompella
Juniper Networks, Inc.
1194 N. Mathilda Ave
Sunnyvale, CA 94089
kireeti@juniper.net
Kompella, Kireeti [Page 12]