Internet Engineering Task Force                                Y. Bernet
Diffserv Working Group                                         Microsoft
INTERNET-DRAFT                                                  A. Smith
Expires: April 2000                                     Extreme Networks
                                                                S. Blake
                                                                Ericsson
                                                            October 1999

                A Conceptual Model for Diffserv Routers

                    draft-ietf-diffserv-model-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document is a product of the Diffserv working group. Comments
   on this draft should be directed to the Diffserv mailing list.

   Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

   This draft proposes a conceptual model of Differentiated Services
   (Diffserv) routers for use in their management and configuration.
   This model defines the general functional datapath elements
   (classifiers, meters, markers, droppers, monitors, mirrors, muxes,
   queues), their possible configuration parameters, and how they might
   be interconnected to realize the range of classification, traffic
   conditioning, and per-hop behavior (PHB) functionalities described in
   [DSARCH]. The model is intended to be abstract and capable of
   representing the configuration parameters important to Diffserv
   functionality for a variety of specific router implementations. It
   is not intended as a guide to hardware implementation.

   This model should serve as a rationale for the design of a Diffserv
   MIB [DSMIB], as well as for various configuration interfaces (such as
   [PIB]). Since these documents are all evolving simultaneously, there
   are discrepancies between their current revisions; these should be
   resolved in a future revision of this draft.

Table of Contents

   1. Introduction
   2. Glossary
   3. Conceptual Model
      3.1 Elements of a Diffserv Router
         3.1.1 Datapath
         3.1.2 Configuration and Management Interface
         3.1.3 Optional RSVP Module
      3.2 Hierarchical Model of Diffserv Components
   4. Classifiers
      4.1 Definition
         4.1.1 Filters
         4.1.2 Overlapping Filters
         4.1.3 Filter Groups
      4.2 Examples
         4.2.1 Behavior Aggregate (BA) Classifier
         4.2.2 Multi-Field (MF) Classifier
         4.2.3 IEEE802 MAC Address Classifier
         4.2.4 Free-form Classifier
         4.2.5 Other Possible Classifiers
      4.3 MPLS
   5. Meters
      5.1 Definition
      5.2 Examples
         5.2.1 Average Rate Meter
         5.2.2 Exponentially Weighted Moving Average (EWMA) Meter
         5.2.3 Two-Parameter Token Bucket Meter
         5.2.4 Multi-Stage Token Bucket Meter
         5.2.5 Null Meter
   6. Action Elements
      6.1 Marker
      6.2 Dropper
      6.3 Shaper
      6.4 Mirroring Element
      6.5 Multiplexor
      6.6 Enqueueing Element
      6.7 Monitor
      6.8 Null Action
   7. Queues
      7.1 Queue Sets and Scheduling
      7.2 Shaping
   8. Traffic Conditioning Blocks (TCBs)
      8.1 An Example TCB
      8.2 An Example TCB to Support Multiple Customers
      8.3 TCBs Supporting Microflow-based Services
   9. Open Issues
   10. Security Considerations
   11. Acknowledgments
   12. References
   Appendix A. Simple Token Bucket Definition

1. Introduction

   Differentiated Services (Diffserv) [DSARCH] is a set of technologies
   which allow network service providers to offer differing levels of
   network quality-of-service (QoS) to different customers and their
   traffic streams. The premise of Diffserv networks is that routers
   within the core of the network handle packets in different traffic
   streams by forwarding them using different per-hop behaviors (PHBs).
   The PHB to be applied is indicated by a Diffserv codepoint (DSCP) in
   the IP header of each packet [DSFIELD]. Note that this document
   uses the terminology defined in [DSARCH, DSTERMS] and in Sec. 2.

   The advantage of such a scheme is that many traffic streams can be
   aggregated into one of a small number of behavior aggregates (BAs),
   each of which is forwarded using the same PHB at the router, thereby
   simplifying the processing and associated storage. In addition,
   there is no signaling, other than what is carried in the DSCP of
   each packet, and no other related processing is required in the
   core of the Diffserv network, since QoS is invoked on a packet-by-
   packet basis.

   The Diffserv architecture enables a variety of possible services
   which could be deployed in a network. These services are reflected
   to customers at the edges of the Diffserv network in the form of a
   Service Level Specification (SLS) [DSTERMS].
   The ability to provide
   these services depends on the availability of cohesive management and
   configuration tools that can be used to provision and monitor a set
   of Diffserv routers in a coordinated manner. To facilitate the
   development of such configuration and management tools it is helpful
   to define a conceptual model of a Diffserv router that abstracts
   away implementation details of particular Diffserv routers from the
   parameters of interest for configuration and management. The purpose
   of this draft is to define such a model.

   The basic forwarding functionality of a Diffserv router is defined in
   other specifications; e.g., [DSARCH, DSFIELD, AF-PHB, EF-PHB].

   This document is not intended in any way to constrain or to dictate
   the implementation alternatives of Diffserv routers. We expect that
   router vendors will demonstrate a great deal of variability in their
   implementations. To the extent that vendors are able to model their
   implementations using the abstractions described in this draft,
   configuration and management tools will more readily be able to
   configure and manage networks incorporating Diffserv routers of
   various implementations.

   In Sec. 3 we start by describing the basic high-level functional
   elements of a Diffserv router and then describe the various
   components. We then focus on the Diffserv-specific components of
   the router and describe a hierarchical management model for these.

   In Sec. 4 we describe classification elements and in Sec. 5, we
   discuss the meter elements.

   In Sec. 6 we discuss action elements. In Sec. 7 we discuss the
   basic queueing elements and their functional behaviors (e.g.,
   shaping).

   In Sec. 8, we show how the basic classification, meter, action, and
   queueing elements can be combined to build modules called Traffic
   Conditioning Blocks (TCBs).

   In Sec. 9 we discuss open issues with this document and in Sec. 10 we
   discuss security concerns.

   Appendix A discusses token bucket implementation details.

2. Glossary

   Some of the terms used in this draft are defined in [DSARCH] and in
   [DSTERMS]. We define a few of them here again only to provide
   additional detail.

   Buffer       An algorithm used to determine whether an arriving
   management   packet should be stored in a queue, or discarded. This
   algorithm    decision is usually a function of the instantaneous or
                average queue occupancy, but also may be a function of
                the aggregate queue occupancy in a queue set, or of
                other parameters.

   Classifier   A functional datapath element which consists of filters
                which select packets based on the content of packet
                headers or other packet data, and/or on implicit or
                derived attributes associated with the packet, and
                forwards the packet along a particular datapath within
                the router. A classifier splits a single incoming
                traffic stream into multiple outgoing ones.

   Enqueueing   The process of executing a buffer management algorithm
                to determine whether an arriving packet should be
                stored in a queue.

   Filter       A set of (wildcard/prefix/masked/range/exact)
                conditions on the components of a packet's
                classification key. A filter is said to match only if
                each condition is satisfied.

   Mirroring    A functional datapath element which makes one or more
   element      copies of a packet and forwards them on distinct
                datapaths; for example to a monitoring port.

   Monitor      A functional datapath element which increments an octet
                and a packet counter for every packet which passes
                through it. Used for collecting statistics.

   Multiplexer  A functional datapath element that merges multiple
   (Mux)        traffic streams (datapaths) into a single traffic
                stream (datapath).

   Non-work     A property of a scheduling algorithm such that it does
   conserving   not necessarily service a packet if available at every
                transmission opportunity.

   Queue        A storage location for packets awaiting transmission or
                processing by the next functional element in the data-
                path. The queues represented in this model are
                abstract elements that may be implemented by multiple
                physical queues in series and/or in parallel in a
                specific implementation. Note that we assume that a
                queue is serviced such as to preserve the required
                ordering constraint for each Ordering Aggregate (OA)
                it queues [DSTERMS]. This can be achieved by a FIFO
                (first in, first out) service policy or by other means
                (e.g., multiple FIFOs exclusively servicing particular
                OAs).

   Queue set    A set of queues which are serviced by a scheduling
                algorithm and which may share a buffer management
                algorithm.

   Scheduling   An algorithm which determines which queue of a queue
   algorithm    set to service next. This may be based on the relative
                priority of the queues, or on a weighted fair bandwidth
                sharing policy, or some other policy. A scheduling
                algorithm may be either work-conserving or non-work-
                conserving.

   Shaping      The process of delaying packets within a traffic stream
                to cause it to conform to some defined traffic profile.
                Shaping can be implemented using a queue serviced by a
                non-work conserving scheduling algorithm.

   Traffic      A logical datapath element consisting of a number of
   Conditioning other functional datapath elements interconnected in
   Block (TCB)  such a way as to perform a specific set of traffic
                conditioning functions on an incoming traffic stream.
                A TCB can be thought of as a "black box" with a single
                input and output.

   Work         A property of a scheduling algorithm such that it
   conserving   services a packet if available at every transmission
                opportunity.

3. Conceptual Model

   In this section we introduce a block diagram of a Diffserv router and
   describe the various components illustrated. Note that a Diffserv
   core router is assumed to include only a subset of these components:
   the model we present here is intended to cover the case of both
   Diffserv edge and core routers.

3.1 Elements of a Diffserv Router

   The conceptual model we define includes abstract definitions for the
   following:

      o The basic traffic classification components.

      o The basic traffic conditioning components.

      o Certain combinations of traffic classification and conditioning
        components.

      o Queueing components.

   The components and combinations of components described in this
   document form building blocks that need to be manageable by Diffserv
   configuration and management tools. One of the goals of this
   document is to show how a model of a Diffserv device can be built
   using these component blocks. This model is in the form of a
   connected directed acyclic graph (DAG) of functional datapath
   elements that describes the traffic conditioning and queueing
   behaviors that any particular packet will experience when forwarded
   to the Diffserv router.

   The following diagram illustrates the major functional blocks of a
   Diffserv router:

               +---------------+
               | Diffserv      |
        Mgmt   | configuration |
      <----+-->| & management  |------------------+
      SNMP,|   | interface     |                  |
      COPS |   +---------------+                  |
      etc. |        |                             |
           |        |                             |
           |        v                             v
           |   +-------------+   +---------+   +-------------+
      data |   | ingress i/f |   |         |   | egress i/f  |
      -------->| class.,     |-->| routing |-->| class.,     |---->
           |   | TC,         |   | core    |   | TC,         |
           |   | queueing    |   |         |   | queueing    |
           |   +-------------+   +---------+   +-------------+
           |        ^                             ^
           |        |                             |
           |        |                             |
           |   +------------+                     |
           +-->| RSVP       |                     |
      -------->| (optional) |---------------------+
      RSVP     +------------+
      cntl
      msgs

         Figure 1: Diffserv Router Major Functional Blocks

3.1.1 Datapath

   An ingress interface, routing core, and egress interface are
   illustrated at the center of the diagram. In actual router
   implementations, there may be an arbitrary number of ingress and
   egress interfaces interconnected by the routing core. The routing
   core element serves as an abstraction of a router's normal routing
   and switching functionality. The routing core moves packets between
   interfaces according to policies outside the scope of Diffserv. The
   actual queueing delay and packet loss behavior of a specific router's
   switching fabric/backplane is not modeled by the routing core; these
   should be modeled using the functional elements described later. The
   routing core should be thought of as an infinite bandwidth, zero-
   delay backplane connecting ingress and egress interfaces.

   The components of interest on the ingress/egress interfaces are the
   traffic classifiers, traffic conditioning (TC) components, and the
   queueing components that support Diffserv traffic conditioning and
   per-hop behaviors [DSARCH]. These are the fundamental components
   comprising a Diffserv router and will be the focal point of our
   conceptual model.

3.1.2 Configuration and Management Interface

   Diffserv operating parameters are monitored and provisioned through
   this interface.
   Monitored parameters include statistics regarding
   traffic carried at various Diffserv service levels. These statistics
   may be important for accounting purposes and/or for tracking
   compliance with traffic conditioning specifications (TCSs) [DSTERMS]
   negotiated with customers. Provisioned parameters are primarily
   classification rules and TC and PHB configuration parameters. The
   network administrator interacts with the Diffserv configuration and
   management interface via one or more management protocols, such as
   SNMP or COPS, or through other router configuration tools such as
   serial terminal or telnet consoles.

3.1.3 Optional RSVP Module

   Diffserv routers may snoop or participate in either per-microflow or
   per-flow-aggregate signaling of QoS requirements [E2E]. The example
   discussed here uses the RSVP protocol. Snooping of RSVP messages may
   be used, for example, to learn how to classify traffic without
   actually participating as an RSVP protocol peer. Diffserv routers may
   reject or admit RSVP reservation requests to provide a means of
   admission control to Diffserv-based services, or they may use these
   requests to trigger provisioning changes for a flow-aggregation in
   the Diffserv network. A flow-aggregation in this context might be
   equivalent to a Diffserv BA or it may be more fine-grained, relying
   on an MF classifier [DSARCH]. Note that the conceptual model of such
   a router starts to look the same as an Integrated Services (intserv)
   router in its component makeup [E2E].

   Note that an RSVP component of a Diffserv router, if present, might
   be active only in the control plane and not in the data plane. In
   this scenario, RSVP is used strictly as a signaling protocol. The
   data plane of such a Diffserv router can still act purely on Diffserv
   DSCPs and PHBs in handling data traffic.

3.2 Hierarchical Model of Diffserv Components

   We focus on the Diffserv-specific functional components of the
   router: the classification, traffic conditioning, and queueing
   functionality. The diagram below is based on the larger block
   diagram shown above:

          Interface A                         Interface B
        +-------------+     +---------+     +-------------+
        | ingress i/f |     |         |     | egress i/f  |
        | class.,     |     |         |     | class.,     |
    --->| meter,      |---->|         |---->| meter,      |--->
        | action,     |     |         |     | action,     |
        | queueing    |     |         |     | queueing    |
        +-------------+     | routing |     +-------------+
                            | core    |
        +-------------+     |         |     +-------------+
        | egress i/f  |     |         |     | ingress i/f |
        | class.,     |     |         |     | class.,     |
    <---| meter,      |<----|         |<----| meter,      |<---
        | action,     |     |         |     | action,     |
        | queueing    |     +---------+     | queueing    |
        +-------------+                     +-------------+

         Figure 2. Traffic Conditioning and Queueing Elements

   This diagram illustrates two Diffserv router interfaces, each having
   an ingress and an egress component. It shows classification, meter,
   action, and queueing elements which might be instantiated on each
   interface's ingress and egress component. The TC functionality is
   implemented by a combination of classification, action, meter, and
   queueing elements. We show equivalent functional elements on both
   the ingress and egress components of an interface because we expect
   an N-port router to display the same Diffserv capabilities as a
   network of 2-port routers interconnected by LAN media [DSMIB]. Note
   that it is not mandatory that each of these functional elements be
   implemented on both ingress and egress components; it is dependent on
   the service requirements on a particular interface on a particular
   router.
   Further, we wish to point out that by showing these elements
   on both ingress and egress components we do not mean to imply that
   they must be implemented in this way in a specific router. For
   example, a router may implement all shaping and PHB queueing on the
   interface egress component, or may instead implement it only on the
   ingress component. Further, the classification needed to map a
   packet to an egress component queue (if present) need not be
   implemented on the egress component but instead may be implemented on
   the ingress component, with the packet passed through the routing
   core with in-band control information to allow for egress queue
   selection.

   From a configuration and management perspective, the following
   hierarchy exists:

   At the top level, the network administrator manages interfaces. Each
   interface consists of an ingress component and an egress component.
   Each component may contain classifier, action, meter, and queueing
   elements.

   At the next level, the network administrator manages groups of
   functional elements interconnected in a DAG. These elements are
   organized in self-contained Traffic Conditioning Blocks (TCBs) which
   are used to implement some desired network policy (see Sec. 8). One
   or more TCBs may be instantiated on each ingress or egress component,
   may be connected in series, and/or may be connected in a
   parallel configuration on the multiple outputs of a classifier.
   We define the TCB to optionally include classification and queueing
   elements so as to allow for rich functionality. A TCB can be thought
   of as a "black box" with a single input and a single output (on the
   main data path). TCBs can be constructed out of a DAG of other TCBs,
   recursively. We do not assume the same TCB configuration on every
   interface (ingress or egress).

   At the lowest level are individual functional elements, each with
   their own configuration parameters and management counters and flags.

4. Classifiers

4.1 Definition

   Classification is performed by a classifier element. Classifiers are
   1:N (fan-out) devices: they take a single traffic stream as input and
   generate N logically separate traffic streams as output. Classifiers
   are parameterized by filters and output streams. Packets from the
   input stream are sorted into various output streams by filters which
   match the contents of the packet or possibly match other attributes
   associated with the packet. Various types of classifiers are
   described in the following sections.

   We use the following diagram to illustrate a classifier, where the
   outputs connect to succeeding functional elements:

      unclassified              classified
      traffic                   traffic
              +------------+
              |            |--> match Filter1 --> output A
      ------->| classifier |--> match Filter2 --> output B
              |            |--> no match      --> output C
              +------------+

                  Figure 3. An Example Classifier

   Note that we allow a mux (see Sec. 6.5) before the classifier to
   allow input from multiple traffic streams. For example, if multiple
   ingress sub-interfaces feed through a single classifier then the
   interface number can be considered by the classifier as a packet
   attribute and be included in the packet's classification key. This
   optimization may be important for scalability in the management
   plane. Another possible packet attribute could be an integer
   representing the BGP community string associated with the packet's
   best-matching route.
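
   The classifier element described above can be sketched in a few
   lines of Python. This is an illustration under stated assumptions
   and not part of the model: the names Classifier, classify, and
   in_if, the dictionary form of the classification key, and the
   first-match-wins evaluation order are all hypothetical. The sketch
   shows a 1:N element whose filters test a classification key that
   carries the ingress sub-interface number as a packet attribute, as
   in the mux case just mentioned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a classification key maps component names
# (header fields or implicit attributes) to integer values.
Key = Dict[str, int]
Filter = Callable[[Key], bool]


@dataclass
class Classifier:
    """A 1:N fan-out element: one input stream, N named output streams."""
    rules: List[Tuple[Filter, str]]   # (filter, output) pairs, tried in order
    default_output: str               # taken when no filter matches

    def classify(self, key: Key) -> str:
        # Assumption: the first matching filter wins.
        for matches, output in self.rules:
            if matches(key):
                return output
        return self.default_output


# Two ingress sub-interfaces are muxed into one classifier; the interface
# number ("in_if", a made-up attribute name) is part of the key.
classifier = Classifier(
    rules=[
        (lambda k: k["in_if"] == 1, "A"),
        (lambda k: k["in_if"] == 2, "B"),
    ],
    default_output="C",
)

print(classifier.classify({"in_if": 1, "dscp": 0b101010}))  # A
print(classifier.classify({"in_if": 7, "dscp": 0b101010}))  # C
```

   Treating the interface number as one more key component keeps a
   single classifier instance per group of sub-interfaces, which is the
   management-plane saving the text refers to.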

   The following classifier separates traffic into one of three output
   streams based on three filters:

      Filter Matched       Output Stream
      --------------       ---------------
      Filter1              A
      Filter2              B
      Filter3 (no match)   C

   where Filter1, Filter2, and Filter3 are defined to be the following
   BA filters ([DSARCH], see Sec. 4.2.1):

      Filter   DSCP
      ------   ------
      1        101010
      2        111111
      3        ****** (wildcard)

4.1.1 Filters

   A filter consists of a set of conditions on the component values of
   a packet's classification key (the header values, contents, and
   attributes relevant for classification). In the BA classifier
   example above, the classification key consists of one packet header
   field, the DSCP, and both Filter1 and Filter2 specify exact-match
   conditions on the value of the DSCP. Filter3 is a wildcard default
   filter which matches every packet, but which is only selected in the
   event that no other more specific filter matches.

   In general, there is a set of possible component conditions including
   exact, prefix, range, masked, and wildcard matches. Note that ranges
   can be represented (with less efficiency) as a set of prefixes and
   that prefix matches are just a special case of both masked and range
   matches.

   In the case of an MF classifier [DSARCH], the classification key
   consists of a number of packet header fields. The filter may
   specify a different condition for each key component, as illustrated
   in the example below for an IPv4/TCP classifier:

      Filter   IP Src Addr    IP Dest Addr   TCP SrcPort  TCP DestPort
      ------   -------------  -------------  -----------  ------------
      Filter4  172.31.8.1/32  172.31.3.X/24  X            5003

   In this example, the fourth octet of the destination IPv4 address
   and the source TCP port are wildcard or "don't cares".

4.1.2 Overlapping Filters

   Note that it is easy to define sets of overlapping filters in a
   classifier. For example:

      Filter5:                    Filter6:
        Type:   Masked-DSCP         Type:   Masked-DSCP
        Value:  111000              Value:  000111 (binary)
        Mask:   111000              Mask:   000111 (binary)

   A packet containing DSCP = 111111 cannot be uniquely classified by
   this pair of filters and so a precedence must be established between
   Filter5 and Filter6 in order to break the tie. This precedence must
   be established either (a) by a manager which knows that the router
   can accomplish this particular ordering; e.g., by means of reported
   capabilities or (b) by the router along with a mechanism to report
   to a manager which precedence is being used. These ordering
   mechanisms must be supported by the configuration and management
   protocols although further discussion of this is outside the scope of
   this document.

   An unambiguous classifier requires that every possible classification
   key match at least one filter (including the wildcard default), and
   that any ambiguity between overlapping filters be resolved by
   precedence.

4.1.3 Filter Groups

   Filters may be logically combined. For example, consider the
   following DestMacAddress filter:

      Filter7:
        Type:   DestMacAddress
        Value:  01-02-03-04-05-06
        Mask:   FF-FF-FF-FF-FF-FF

   Classifier0 could then be declared as:

      Classifier0:
        Filter1 and Filter7:        output A
        Filter2 and Filter7:        output B
        Default (wildcard) filter:  output C

4.2 Examples

4.2.1 Behavior Aggregate (BA) Classifier

   The simplest Diffserv classifier is a behavior aggregate (BA)
   classifier [DSARCH]. A BA classifier uses only the Diffserv
   codepoint (DSCP) in a packet's IP header to determine the logical
   output stream to which the packet should be directed. We allow only
   an exact-match condition on this field because the assigned DSCP
   values have no structure, and therefore no subset of DSCP bits is
   significant.
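
   The difference between the exact-match condition used by a BA
   classifier and the masked conditions of Sec. 4.1.2 can be sketched
   as follows. The Python below is illustrative only: the function
   names and the rule that list order expresses precedence are
   assumptions, not something this model specifies. It checks that
   DSCP 111111 matches both Filter5 and Filter6, and shows one way a
   configured precedence could break the tie.

```python
from typing import List, Optional, Tuple

# (name, value, mask) triples; values taken from Filter5 and Filter6.
MaskedFilter = Tuple[str, int, int]
FILTER5: MaskedFilter = ("Filter5", 0b111000, 0b111000)
FILTER6: MaskedFilter = ("Filter6", 0b000111, 0b000111)


def exact_match(dscp: int, value: int) -> bool:
    """BA-style condition: all six DSCP bits must be equal."""
    return dscp == value


def masked_match(dscp: int, value: int, mask: int) -> bool:
    """Masked condition: only the bits selected by the mask are compared."""
    return (dscp & mask) == (value & mask)


def matching(dscp: int, filters: List[MaskedFilter]) -> List[str]:
    """Names of all filters that match, in the order given."""
    return [n for n, v, m in filters if masked_match(dscp, v, m)]


def select(dscp: int, by_precedence: List[MaskedFilter]) -> Optional[str]:
    """Assumption: earlier entries in the list have higher precedence."""
    hits = matching(dscp, by_precedence)
    return hits[0] if hits else None


print(matching(0b111111, [FILTER5, FILTER6]))  # ['Filter5', 'Filter6']
print(select(0b111111, [FILTER6, FILTER5]))    # Filter6
print(exact_match(0b101011, 0b101010))         # False
```

   With exact matching, two distinct filter values can never both
   match the same DSCP, which is why a pure BA classifier needs no
   precedence rule beyond the wildcard default.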
613 The following defines a possible BA filter: 615 Filter8: 616 Type: BA 617 Value: 111000 619 4.2.2 Multi-Field (MF) Classifier 621 Another type of classifier is a multi-field (MF) classifier [DSARCH]. 622 This classifies packets based on one or more fields in the packet 623 header (including the DSCP). A common type of MF classifier is a 6- 624 tuple classifier that classifies based on six IP header fields 625 (destination address, source address, IP protocol, source port, 626 destination port, and DSCP). MF classifiers may classify on other 627 fields such as MAC addresses, VLAN tags, link-layer traffic class 628 fields or other higher-layer protocol fields. 630 The following defines a possible MF filter: 632 Filter9: 633 Type: IPv4-6-tuple 634 IPv4DestAddrValue: 0 635 IPv4DestAddrMask: 0.0.0.0 636 IPv4SrcAddrValue: 172.31.8.0 637 IPv4SrcAddrMask: 255.255.255.0 638 IPv4DSCP: 28 639 IPv4Protocol: 6 640 IPv4DestL4PortMin: 0 641 IPv4DestL4PortMax: 65535 642 IPv4SrcL4PortMin: 20 643 IPv4SrcL4PortMax: 20 645 A similar type of classifier can be defined for IPv6. 647 4.2.3 IEEE802 MAC Address Classifier 649 A MacAddress filter is parameterized by a 6-byte {value, mask} pair 650 for either source or destination MAC address. 
For example, the 651 following classifier sends packets matching either DA = 652 01-02-03-04-05-06 or SA = 00-E0-2B-XX-XX-XX to output A: 654 Classifier1: 655 Filter10: output A 656 Filter11: output A 657 Default: output B 658 Filter10: 659 Type: DestMacAddress 660 Value: 01-02-03-04-05-06 (hex) 661 Mask: FF-FF-FF-FF-FF-FF (hex) 663 Filter11: 664 Type: SrcMacAddress 665 Value: 00-E0-2B-00-00-00 (hex) 666 Mask: FF-FF-FF-00-00-00 (hex) 668 4.2.4 Free-form Classifier 670 A Free-form classifier is made up of a set of user-definable 671 arbitrary filters each made up of {bit-field size, offset (from head 672 of packet), value, mask}: 674 Classifier2: 675 Filter12: output A 676 Filter13: output B 677 Default: output C 679 Filter12: 680 Type: FreeForm 681 SizeBits: 3 (bits) 682 Offset: 16 (bytes) 683 Value: 100 (binary) 684 Mask: 101 (binary) 686 Filter13: 687 Type: FreeForm 688 SizeBits: 12 (bits) 689 Offset: 16 (bytes) 690 Value: 100100000000 (binary) 691 Mask: 111111111111 (binary) 693 Free-form filters can be combined into filter groups to form very 694 powerful filters. 696 4.2.5 Other Possible Classifiers 698 Classifier3: 699 Filter14: output A 700 Filter15: output B 701 Default: output C 703 Filter14: 704 Type: IEEEPriority 705 Value: 100 (binary) 706 Mask: 101 (binary) 707 Filter15: 708 Type: IEEEVLAN 709 Value: 100100000000 (binary) 710 Mask: 111111111111 (binary) 712 Classification may be performed based on implicit information 713 associated with a packet (e.g., the incoming channel number on a 714 channelized interface) or on information derived from a different 715 non-Diffserv classification operation (e.g., the outgoing interface 716 determined by the route lookup operation). Other vendor-specific 717 filter formats are possible. We do not discuss these further here. 719 4.3 MPLS 721 It is possible for an MPLS label-switched router (LSR) to function as 722 a Diffserv router [MPLSDS].
In this case the IP header is not 723 visible for inspection and all header classification must be 724 performed on the MPLS label, and in the event of shim encapsulation, 725 on the 3-bit EXP field in addition. In general a MPLS classification 726 filter may specify either wildcard- or exact-match conditions for 727 either field (but not both wildcard at once). The distinction to be 728 drawn here is that MPLS labels are dynamically established and torn 729 down. An EXP-only classifier may be statically configured but a 730 label or label + EXP classifier must be established dynamically along 731 with the LSP. In all other respects (except marking) the labeled 732 packet can be treated identically to an unlabeled packet. 734 5. Meters 736 5.1 Definition 738 Metering is the function of monitoring the arrival times of packets 739 of a traffic stream and determining the level of conformance of each 740 packet to a pre-established traffic profile. Diffserv network 741 providers may choose to offer services to customers based on a 742 temporal (i.e., rate) profile within which the customer submits 743 traffic for the service. In this event, a meter might be used to 744 trigger real-time traffic conditioning actions (e.g., marking) by 745 routing a non-conforming packet through an appropriate next-stage 746 action element. Alternatively, it might also be used for out-of-band 747 management functions like statistics monitoring for billing 748 applications. 750 Meters are logically 1:N (fan-out) devices (although a mux can be 751 used in front of a meter). Meters are parameterized by a temporal 752 profile and by conformance levels, each of which is associated with 753 a meter's output. Each output can be connected to another functional 754 element. 756 Note that this model of a meter differs from that described in 757 [DSARCH]. 
In that description the meter is not a datapath element 758 but is instead used to monitor the traffic stream and send control 759 signals to action elements to dynamically modulate their behavior 760 based on the conformance of the packet. We find the description here 761 more powerful. 763 We use the following diagram to illustrate a meter with 3 levels of 764 conformance: 766 unmetered metered 767 traffic traffic 769 +---------+ 770 | |--------> conformanceA 771 --------->| meter |--------> conformanceB 772 | |--------> conformanceC 773 +---------+ 775 Figure 4. An Example Meter 777 In some Diffserv examples, three levels of conformance are discussed 778 in terms of colors, with green representing conforming, yellow 779 representing partially conforming, and red representing non- 780 conforming [AF-PHB]. These different conformance levels are used to 781 trigger different buffer management actions. Other example meters 782 use a binary notion of conformance; in the general case N levels of 783 conformance can be supported. In general there is no constraint on 784 the type of functional element following a meter output, but care 785 must be taken not to inadvertently configure a datapath that results 786 in packet reordering within an OA. 788 5.2 Examples 790 The following is a non-exhaustive list of possible meters. 792 5.2.1 Average Rate Meter 794 An example of a very simple meter is an average rate meter. This 795 type of meter measures the average rate at which packets are 796 submitted to it over a specified averaging time. 798 An average rate profile may take the following form: 800 Meter1: 801 Type: AverageRate 802 Profile1: output A 803 NonConforming: output B 805 Profile1: 806 Type: AverageRate 807 AverageRate: 120 KBps 808 Delta: 1.0 msec 810 A meter measuring against this profile would continually maintain a 811 count that indicates the total number of bytes arriving between 812 time T (now) and time T - 1.0 msec.
So long as an arriving packet 813 does not push the count over 120 bytes, the packet would be deemed 814 conforming. Any packet that pushes the count over 120 bytes would be 815 deemed non-conforming. Thus, this meter deems packets to correspond 816 to one of two conformance levels: conforming or non-conforming. 818 5.2.2 Exponential Weighted Moving Average (EWMA) Meter 820 The EWMA form of meter is easy to implement in hardware and can be 821 parameterized as follows: 823 avg_rate(t) = (1 - Gain) * avg_rate(t') + Gain * rate(t) 824 t = t' + Delta 826 For a packet arriving at time t: 828 if (avg_rate(t) > AverageRate) 829 non-conforming 830 else 831 conforming 833 Gain controls the time constant (i.e., the frequency response) of what is 834 essentially a simple IIR low-pass filter. rate(t) measures the 835 number of incoming bytes in a small fixed sampling interval, Delta. 836 Any packet that arrives and pushes the average rate over a predefined 837 rate AverageRate is deemed non-conforming. An EWMA meter profile 838 might look as follows: 840 Meter2: 841 Type: ExpWeightedMovingAvg 842 Profile2: output A 843 NonConforming: output B 845 Profile2: 846 Type: ExpWeightedMovingAvg 847 AverageRate: 25 KBps 848 Delta: 10.0 usec 849 Gain: 1/16 851 5.2.3 Two-Parameter Token Bucket Meter 853 A more sophisticated meter might measure conformance to a token 854 bucket (TB) profile. A TB profile generally has two parameters: an 855 average token rate and a burst size. TB meters compare the arrival 856 rate of packets to the average rate specified by the TB profile. 857 Logically, byte tokens accumulate in a bucket at the average rate, 858 up to a maximum credit which is the burst size. Packets of length 859 L bytes are considered conforming if L tokens are available in the 860 bucket at the time of packet arrival. Packets are allowed to 861 exceed the average rate in bursts up to the burst size.
Packets 862 which arrive to find a bucket with insufficient tokens in it are 863 deemed non-conforming. A two-parameter TB meter has exactly two 864 possible conformance levels (conforming, non-conforming). TB 865 implementation details are discussed in Appendix A. 867 A two-parameter TB meter profile might look as follows: 869 Meter3: 870 Type: SimpleTokenBucket 871 Profile3: output A 872 NonConforming: output B 874 Profile3: 875 Type: SimpleTokenBucket 876 AverageRate: 100 KBps 877 BurstSize: 100 KB 879 5.2.4 Multi-Stage Token Bucket Meter 881 More complicated TB meters might define two burst sizes and three 882 conformance levels. Packets found to exceed the larger burst size 883 are deemed non-conforming. Packets found to exceed the smaller 884 burst size but not the larger are deemed partially conforming. Packets exceeding 885 neither are deemed conforming. Token bucket meters designed for 886 Diffserv networks are described in more detail in [SRTCM, TRTCM, 887 GTC]; in some of these references three levels of conformance are 888 discussed in terms of colors, with green representing conforming, 889 yellow representing partially conforming, and red representing non- 890 conforming. Often these multi-conformance level meters can be 891 implemented using an appropriate configuration of multiple two- 892 parameter TB meters. 894 A profile for a multi-stage TB meter with three levels of conformance 895 might look as follows: 897 Meter4: 898 Type: MultiTokenBucket 899 Profile4: output A 900 Profile5: output B 901 NonConforming: output C 903 Profile4: 904 Type: SimpleTokenBucket 905 AverageRate: 100 KBps 906 BurstSize: 20 KB 908 Profile5: 909 Type: SimpleTokenBucket 910 AverageRate: 100 KBps 911 BurstSize: 100 KB 913 5.2.5 Null Meter 915 A null meter has only one output: always conforming, and no 916 associated temporal profile.
Such a meter is useful to define in the 917 event that the configuration or management interface does not have 918 the flexibility to omit a meter in a datapath segment. 920 6. Action Elements 922 Classifiers and meters are fan-out elements which are generally used 923 to determine the appropriate action to apply to a packet. The set of 924 possible actions includes: 926 1) Marking 927 2) Dropping 928 3) Shaping 929 4) Mirroring 930 5) Monitoring 932 The corresponding action elements are described in the following 933 paragraphs. 935 Policing is a general term for the process of preventing a traffic 936 stream from seizing more than its share of resources from a Diffserv 937 network. Each of the first three actions listed above may be used 938 to police traffic. Markers do so by re-marking non-conforming 939 packets to a DSCP value that is entitled to fewer network resources. 940 Shapers and droppers do so by limiting the rate at which a particular 941 traffic stream is submitted to the network. 943 6.1 Marker 945 Markers are 1:1 elements which set the DSCP in an IP header (in 946 the case of unlabeled packets). Markers may act on unmarked packets 947 (submitted with a DSCP of zero) or may re-mark previously marked 948 packets. In particular, the model supports the application of 949 marking based on a preceding classifier match. The DSCP set in a 950 packet will determine its subsequent treatment in downstream nodes 951 of a network, and possibly in subsequent processing stages within the 952 router (depending on configuration). 954 Markers are normally parameterized by a single parameter: the 6-bit 955 DSCP to be marked in the packet header. 957 ActionElement1: 958 Type: Marker 959 Mark: 010010 961 In the case of an MPLS-labeled packet, the marker is parameterized 962 by a 3-bit EXP value to be marked in the MPLS shim header. 964 6.2 Dropper 966 Droppers simply discard packets. There are no parameters for 967 droppers.
Because a dropper is a terminating point of the datapath, 968 it may be desirable to forward the packet through a monitor first 969 for instrumentation purposes. 971 Droppers are not the only elements that can cause a packet to be 972 discarded. The other element is an enqueueing element (see Sec. 973 6.6). However, since the enqueueing element's behavior is closely 974 tied to the state of one or more queues, we choose to distinguish them 975 as separate functional elements. 977 6.3 Shaper 979 Shapers are used to shape traffic streams to a certain temporal 980 profile. For example, a shaper can be used to smooth traffic 981 arriving in bursts. In [DSARCH] a shaper is described as a 982 queueing element controlled by a meter which defines its temporal 983 profile. This model of a shaper differs substantially from typical 984 shaper implementations. Further, with the inclusion of queueing 985 elements in the model a separate shaping element becomes confusing. 986 Therefore, the function of a shaper is embedded in a queue and is 987 covered in Sec. 7. 989 6.4 Mirroring Element 991 It is occasionally desirable to mirror data traffic on one or more 992 additional interfaces for data collection purposes. A mirroring 993 element is a 1:N (fan-out) element. However, unlike other fan-out elements, every packet 994 follows every output path simultaneously. A mirroring element is 995 parameterized by the number of outputs it supports. 997 6.5 Mux 999 It is occasionally necessary to multiplex traffic streams into a 1:1 1000 or 1:N action element or classifier. An M:1 (fan-in) mux is a simple 1001 logical device for merging traffic streams. It is parameterized by 1002 its number of incoming ports. 1004 6.6 Enqueueing Element 1006 Queueing elements (discussed in Sec. 7) require an action element to 1007 execute the appropriate buffer management algorithm and store or 1008 discard a packet. This is performed by an enqueueing element, which 1009 is an M:1 (fan-in) element.
An enqueueing element executes the 1010 buffer management algorithm appropriate for the queue it is feeding. 1011 This may include a deterministic discard behavior if the queue size 1012 exceeds a threshold, it may include a random discard behavior that 1013 is a function of the average queue size [AF-PHB], or it may include 1014 a more complex policy which is a function of the state of several 1015 queues in a queue set (see Sec. 7). The particular parameters to 1016 apply to a packet may depend on the particular input port the element 1017 receives it on; this allows packets which are classified into 1018 different colors to follow different datapaths and be processed 1019 appropriately at the enqueueing element. 1021 The configuration parameters for an enqueueing element will depend on 1022 the details of the algorithm it is executing. For an algorithm such 1023 as the one recommended in [AF-PHB], the parameters would include 1024 separate RED min_th, max_th, and max_p parameters per-element input 1025 port. 1027 An enqueueing element must maintain octet/packet counters for both 1028 the forwarded and discarded packets received at each element input 1029 port. Counters should be provided to distinguish between losses due 1030 to the normal operation of the algorithm (e.g., random drop) and 1031 those due to resource exhaustion (e.g., tail drop) [DSMIB]. 1033 6.7 Monitor 1035 One passive action is to account for the fact that a data packet was 1036 processed. The statistics that result might be used later for 1037 customer billing, service verification, or network engineering 1038 purposes. Monitors are 1:1 functional elements which increment an 1039 octet counter by L and a packet counter by 1 every time a L-byte 1040 sized packet passes through it. Monitors can also be used to count 1041 packets on the verge of being dropped by a dropper. 1043 6.8 Null Action 1045 A null action has one input and one output. The element performs no 1046 action on the packet. 
Such an element is useful to define in the 1047 event that the configuration or management interface does not have 1048 the flexibility to omit an action element in a datapath segment. 1050 7. Queues 1052 7.1 Queue Sets and Scheduling 1054 Queues are used to store packets prior to transmission or prior to 1055 forwarding to the next functional element. Packets are usually 1056 stored either because there is a resource constraint (e.g., available 1057 bandwidth) which prevents immediate forwarding, or because the queue 1058 is being used to alter the temporal properties of a traffic stream 1059 (shaping). Queues may be organized into queue sets, which are 1060 serviced using a common scheduling algorithm (although each queue may 1061 be individually parameterized). Queue sets can be treated as 1062 functional elements and organized hierarchically in queue supersets, 1063 using an n-th order scheduling algorithm. Such a queue set may be 1064 used to implement the entire range of PHBs on an egress interface, 1065 for instance. 1067 Possible queue scheduling algorithms fall into a number of 1068 categories, including strict priority, weighted fair bandwidth 1069 sharing (e.g., WFQ, WRR, etc.), rate-limited strict priority, etc. 1070 Scheduling algorithms can be further distinguished by whether they 1071 are work conserving or non-work conserving. A work conserving 1072 algorithm will always transmit an available packet at every 1073 transmission opportunity, while a non-work conserving algorithm will 1074 not. Non-work conserving schedulers can be used to shape traffic 1075 streams by delaying packets that would be deemed non-conforming by 1076 some traffic profile. The packet is delayed until such time that it 1077 would conform to a meter using the same profile. 1079 [DSARCH] defines PHBs without specifying required queueing 1080 algorithms. 
However, PHBs such as EF [EF-PHB] and AF [AF-PHB] have 1081 configuration parameters which strongly suggest the sort of queue 1082 scheduling algorithm needed to implement them. We have selected a 1083 minimal set of queue parameters to enable realization of these per- 1084 hop behaviors. These include a minimum service rate and a strict 1085 service priority along with an optional maximum service rate profile 1086 (depending on whether the queue is meant to be non-work conserving). 1087 The minimum service rate allows throughput guarantees for each queue 1088 as required by EF and AF without specifying the details of how excess 1089 bandwidth between these queues is shared (additional parameters to 1090 control this behavior should be made available, but are dependent on 1091 the particular scheduling algorithm implemented). The strict service 1092 priority is useful for implementing EF on some links (assuming that 1093 the aggregate EF rate has been appropriately bounded to avoid 1094 starvation). Setting the service priority of each queue in a queue 1095 set to the same value enables the scheduler to satisfy the minimum 1096 service rate for each queue. Queue sets can be serviced like 1097 individual queues in a queue superset using the same scheduling 1098 parameters. 1100 It should be noted that the queues in this model are logical 1101 abstractions used to configure PHB-related parameters. They are not 1102 expected to map one-to-one with physical queues in a specific router 1103 implementation. An implementor should map the configurable 1104 parameters of the physical queues to these queue parameters as 1105 appropriate to achieve equivalent behaviors. 1107 Other queue parameters such as maximum capacity are assumed to be 1108 mapped to the buffer management algorithm used by the enqueueing 1109 element feeding the queue. 
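The strict service priority parameter can be illustrated with a short Python sketch of a work-conserving scheduler. This is only an illustration under stated assumptions: the Queue type, the next_packet function, and the convention that a larger Priority value is served first are hypothetical, and the minimum and maximum service rate parameters discussed above are deliberately left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Queue:
    name: str
    priority: int  # assumption: a larger value is served first
    packets: Deque[bytes] = field(default_factory=deque)


def next_packet(queues: List[Queue]) -> Optional[bytes]:
    """Work-conserving strict priority: serve the highest-priority backlogged queue."""
    backlogged = [q for q in queues if q.packets]
    if not backlogged:
        return None  # nothing to send at this transmission opportunity
    # max() keeps the first queue on ties, so list order breaks equal priorities.
    return max(backlogged, key=lambda q: q.priority).packets.popleft()
```

Because a backlogged high-priority queue is always served first, lower-priority queues can starve unless the aggregate rate offered to the high-priority queue is bounded, which is the caveat noted above for EF.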
1111 A queue set might be represented using the following parameters: 1113 QueueSet1: 1114 Type: QueueSet 1115 MaxProfile: WorkConserving 1116 MinGuarRate: 20 MBps 1117 Interface: ifIndex 1118 QueueA: 1119 Type: Queue 1120 QueueSet: QueueSet1 1121 MaxProfile: Profile1 1122 MinGuarRate: 2 MBps 1123 Priority: 3 1125 QueueB: 1126 Type: Queue 1127 QueueSet: QueueSet1 1128 MaxProfile: WorkConserving 1129 MinGuarRate: 8 MBps 1130 Priority: 3 1132 7.2 Shaping 1134 Shapers are often used to pre-condition traffic such that packets 1135 are deemed conforming by subsequent meters, e.g., in downstream 1136 Diffserv nodes. Shapers may also be used to isolate certain traffic 1137 streams from the effects of other traffic streams of the same BA. 1139 A shaper action element is implemented in this model by using a non- 1140 work conserving queue. Shapers operate by delaying packets that 1141 would be deemed non-conforming by a meter configured to the shaper's 1142 maximum service rate profile. The packet is delayed until such 1143 time that it would become conforming. 1145 Profile definitions are identical in format to those described for 1146 meters. The use of a meter algorithm to control shaping is further 1147 discussed in Appendix A. Average rate, EWMA, and TB profiles are all 1148 feasible for shaping. Because a shaper is implemented as a queue it 1149 can also utilize a variety of buffer management algorithms 1150 (implemented in an enqueueing element). 1152 A shaping queue might be represented using the following parameters: 1154 QueueA: 1155 Type: Queue 1156 QueueSet: QueueSet1 1157 MaxProfile: Profile1 1158 MinGuarRate: 2 MBps 1159 Priority: 3 1161 Profile1: 1162 Type: SimpleTokenBucket 1163 AverageRate: 3 MBps 1164 BurstSize: 8 KB 1166 8. Traffic Conditioning Blocks (TCBs) 1168 The classifiers, meters, action elements, and queueing elements 1169 described above can be combined into traffic conditioning blocks 1170 (TCBs).
The TCB is an abstraction of a functional element that may 1171 be used to facilitate the definition of specific traffic conditioning 1172 functionality. 1174 One of the simplest possible TCBs would consist of the following 1175 stages: 1177 1. Classifier stage 1178 2. Enqueueing stage 1179 3. Queueing stage 1181 Note that a classifier is a 1:N element, while an enqueueing stage is 1182 an N:1 element and a queue is a 1:1 element. If the classifier splits 1183 traffic across multiple enqueueing elements then the queueing stage 1184 may consist of a hierarchy of queue sets, all resulting in a 1:1 1185 abstract element. 1187 A more general TCB might consist of the following four stages: 1189 1. Classifier stage 1190 2. Metering stage 1191 3. Action stage 1192 4. Queueing stage 1194 where each stage may consist of a set of parallel datapaths 1195 consisting of pipelined elements. 1197 TCBs are constructed by connecting elements corresponding to these 1198 stages in any sensible order. It is possible to omit stages, to 1199 include null elements, or to concatenate multiple stages of the same 1200 type. TCB outputs may drive additional TCBs (on either the ingress 1201 or egress interfaces). Classifiers and meters are fan-out elements; 1202 muxes and enqueueing elements are fan-in elements.
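The way elements are wired into a TCB can be sketched in Python as a table of (element, output) pairs that name the next element. This is a minimal illustration, not part of the model: the run_tcb function, the dictionary layout, the element names, and the hop limit are all assumptions. It wires the simplest three-stage TCB listed above, with both classifier outputs feeding one enqueueing element to show fan-in.

```python
from typing import Callable, Dict, List, Tuple

# An element inspects a packet (a plain dict here) and names the output it takes.
Element = Callable[[dict], str]


def run_tcb(start: str,
            elements: Dict[str, Element],
            wiring: Dict[Tuple[str, str], str],
            packet: dict,
            max_hops: int = 64) -> List[str]:
    """Follow `packet` through the datapath and return the elements visited.

    The walk ends at the first name that is not a key of `elements`,
    for example a queue or another TCB.
    """
    path = [start]
    current = start
    while current in elements:
        if len(path) > max_hops:
            raise RuntimeError("datapath loop suspected")
        output = elements[current](packet)
        current = wiring[(current, output)]
        path.append(current)
    return path


# Simplest TCB: classifier (1:N) -> enqueueing element (N:1) -> queue (1:1).
ELEMENTS: Dict[str, Element] = {
    "Classifier1": lambda p: "A" if p["dscp"] == 0b101010 else "X",
    "Enqueue1": lambda p: "A",
}
WIRING = {
    ("Classifier1", "A"): "Enqueue1",
    ("Classifier1", "X"): "Enqueue1",
    ("Enqueue1", "A"): "Queue1",
}
```

Omitting a stage or concatenating stages of the same type then amounts to editing the wiring table, without changing the elements themselves.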
1204 8.1 An Example TCB 1206 The following diagram illustrates an example TCB: 1208 +------------> to Queue A 1209 +-----+ | (not shown) 1210 | |--+ 1211 +->| | 1212 | | |--+ +-----+ +-----+ 1213 | +-----+ | | | | | 1214 | meter +->| |--->| | 1215 | | | | | 1216 | +-----+ +-----+ 1217 | monitor dropper 1218 | 1219 | 1220 | 1221 submitted +-----+ | +-----+ +-----+ 1222 traffic | A |-----+ | | | | 1223 --->| B |------->| |---->| |---> to Queue B 1224 | C |-----+ | | | | (not shown) 1225 | X |--+ | +-----+ +-----+ 1226 +-----+ | | marker shaper 1227 BA | | queue 1228 classifier| | 1229 | | 1230 | | 1231 | | 1232 | | 1233 | | +-----+ +-----+ 1234 | | | |--------------->| | to Queue C 1235 | +->| | | |-> 1236 | | |--+ +-----+ +->| | (not shown) 1237 | +-----+ | | | | +-----+ 1238 | meter +->| |-+ mux 1239 | | | 1240 | +-----+ 1241 | marker 1242 | 1243 +---------------------------> to Queue D 1244 (not shown) 1245 Figure 5: An Example Traffic Conditioning Block 1247 This sample TCB might be suitable for an ingress interface at a 1248 customer/provider boundary. A SLS is presumed to have been 1249 negotiated between the customer and the provider which specifies the 1250 handling of the customer's traffic by the provider's network. The 1251 agreement might be of the following form: 1253 DSCP PHB Profile Non-Conforming Packets 1254 ---- --- ------- ---------------------- 1255 001001 PHB1 Profile1 Discard 1256 001100 PHB2 Profile2 Wait in shaper queue 1257 001101 PHB3 Profile3 Re-mark to DSCP 001000 1259 It is implicit in this agreement that conforming packets are given 1260 the PHB originally indicated by the packets' DSCP field. It 1261 specifies that the customer may submit packets marked for DSCP 1262 001001 which will get PHB1 treatment so long as they remain 1263 conforming to Profile1 and will be discarded if they exceed this 1264 profile. Similar contract rules are applied for 001100 and 001101 1265 traffic. 
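The agreement above can be sketched in Python as a per-DSCP meter plus a rule for non-conforming packets. This is a simplified illustration under several assumptions: the TokenBucket class follows the two-parameter description in Sec. 5.2.3 and starts with a full bucket, the rate and burst numbers are placeholders because the agreement gives no values for Profile1 through Profile3, shaping is reduced to a "delay" verdict, and the make_policy and condition names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class TokenBucket:
    """Two-parameter token bucket meter (rate in bytes/s, burst in bytes)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # assumption: the bucket starts full
        self.last = 0.0      # time of the previous arrival, in seconds

    def conforms(self, now: float, length: int) -> bool:
        # Credit tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if length <= self.tokens:
            self.tokens -= length
            return True
        return False


REMARK_DSCP = 0b001000  # re-mark target taken from the agreement table


def make_policy() -> Dict[int, Tuple[TokenBucket, str]]:
    # Placeholder profiles: 1000 bytes/s with a 1500-byte burst for every row.
    return {
        0b001001: (TokenBucket(1000.0, 1500.0), "discard"),
        0b001100: (TokenBucket(1000.0, 1500.0), "delay"),
        0b001101: (TokenBucket(1000.0, 1500.0), "remark"),
    }


def condition(policy: Dict[int, Tuple[TokenBucket, str]],
              now: float, dscp: int, length: int) -> Tuple[str, Optional[int]]:
    """Return (verdict, dscp); verdict is 'forward', 'delay', or 'discard'."""
    if dscp not in policy:
        return ("forward", dscp)      # not covered by the agreement
    meter, action = policy[dscp]
    if meter.conforms(now, length):
        return ("forward", dscp)      # conforming: keep the requested PHB
    if action == "discard":
        return ("discard", None)
    if action == "delay":
        return ("delay", dscp)        # would wait in the shaper queue
    return ("forward", REMARK_DSCP)   # re-mark, then forward
```

Each DSCP has its own bucket, so exhausting the profile for 001001 does not affect traffic submitted for 001100 or 001101.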
1267 In this example, the classification stage consists of a single BA 1268 classifier. The BA classifier is used to separate traffic based on 1269 the Diffserv service level requested by the customer (as indicated 1270 by the DSCP in each submitted packet's IP header). We illustrate 1271 three DSCP filter values: A, B and C. The 'X' in the BA classifier 1272 is the default wildcard filter that matches every packet. 1274 A metering stage is next in the upper and lower branches. There is a 1275 separate meter for each set of packets corresponding to DSCPs A and 1276 C. Each meter uses a specific profile as specified in the TCS for 1277 the corresponding Diffserv service level. The meters in this 1278 example indicate one of two conformance levels: conforming or 1279 non-conforming. The middle branch has a marker which re-marks all 1280 packets received with DSCP B. 1282 Following the metering stage is the action stage in the upper and 1283 lower branches. Packets submitted for DSCP A that are deemed non- 1284 conforming are counted and discarded. Packets that are 1285 conforming are passed on to Queue A. Packets submitted for DSCP C 1286 that are deemed non-conforming are re-marked, and then conforming and 1287 non-conforming packets are muxed together before being forwarded to 1288 Queue C. Packets submitted for DSCP B are shaped to Profile2 before 1289 being forwarded to Queue B. 1291 The interconnections of the TCB elements illustrated in Fig.
5 can be 1292 represented as follows: 1294 TCB1: 1296 Classifier1: 1297 Output A --> Meter1 1298 Output B --> Marker1 1299 Output C --> Meter2 1300 Output X --> QueueD 1302 Meter1: 1303 Output A --> QueueA 1304 Output B --> Monitor1 1306 Monitor1: 1307 Output A --> Dropper1 1309 Marker1: 1310 Output A --> Shaper1 1311 Shaper1: 1312 Output A --> Queue B 1314 Meter2: 1315 Output A --> Mux1 1316 Output B --> Marker2 1318 Marker2: 1319 Output A --> Mux1 1321 Mux1: 1322 Output A --> Queue C 1324 8.2 An Example TCB to Support Multiple Customers 1326 The TCB described above can be installed on an ingress interface to 1327 implement a provider/customer TCS if the interface is dedicated to 1328 the customer. However, if a single interface is shared between 1329 multiple customers, then the TCB above will not suffice, since it 1330 does not differentiate among traffic from different customers. Its 1331 classification stage uses only BA classifiers. 1333 The TCB is readily extended to support the case of multiple customers 1334 per interface, as follows. First, we define a TCB for each customer 1335 to reflect the TCS with that customer. TCB1, defined above is the 1336 TCB for customer 1. We add definitions for TCB2 and for TCB3 which 1337 reflect the agreements with customers 2 and 3 respectively. 1339 Finally, we add a classifier which provides a front end to separate 1340 the traffic from the three different customers. 
This forms a new 1341 TCB which incorporates TCB1, TCB2, and TCB3, and can be illustrated 1342 as follows: 1344 submitted +-----+ 1345 traffic | A |--------> TCB1 1346 --->| B |--------> TCB2 1347 | C |--------> TCB3 1348 | X |--------> Dropper4 1349 +-----+ 1350 Classifier4 1352 Figure 6: An Example of a Multi-Customer TCB 1354 A formal representation of this multi-customer TCB might be: 1356 TCB1: 1357 (as defined above) 1359 TCB2: 1360 (similar to TCB1, perhaps with different numeric parameters) 1361 TCB3: 1362 (similar to TCB1, perhaps with different numeric parameters) 1364 TCB4: 1365 (the total TCB) 1367 Classifier4: 1368 Output A --> TCB1 1369 Output B --> TCB2 1370 Output C --> TCB3 1371 Output X --> Dropper4 1373 where Classifier4 is defined as follows: 1375 Classifier4: 1376 Filter1: Output A 1377 Filter2: Output B 1378 Filter3: Output C 1379 No Match: Output X 1381 and the filters, based on each customer's source MAC address, are 1382 defined as follows: 1384 Filter1: 1385 Type: MacAddress 1386 SrcValue: 01-02-03-04-05-06 (source MAC address of customer 1) 1387 SrcMask: FF-FF-FF-FF-FF-FF 1388 DestValue: 00-00-00-00-00-00 1389 DestMask: 00-00-00-00-00-00 1391 Filter2: 1392 (similar to Filter1 but with customer 2's source MAC address as 1393 SrcValue) 1395 Filter3: 1396 (similar to Filter1 but with customer 3's source MAC address as 1397 SrcValue) 1399 In this example, Classifier4 separates traffic submitted from 1400 different customers based on the source MAC address in submitted 1401 packets. Those packets with recognized source MAC addresses are 1402 passed to the TCB implementing the TCS with the corresponding 1403 customer. Those packets with unrecognized source MAC addresses are 1404 passed to a dropper. 1406 TCB4 has a classification stage and an action element stage, which 1407 consists of either a dropper or another TCB.
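The front-end classification performed by Classifier4 can be sketched in Python as a masked comparison on the source MAC address. This is an illustration only: the helper names mac_to_int, make_src_mac_filter, and dispatch are hypothetical, and the addresses used for customers 2 and 3 are invented placeholders, since only customer 1's address appears in the example above.

```python
from typing import Callable, List, Tuple


def mac_to_int(mac: str) -> int:
    """Convert 'XX-XX-XX-XX-XX-XX' (hex) to a 48-bit integer."""
    return int(mac.replace("-", ""), 16)


def make_src_mac_filter(value: str,
                        mask: str = "FF-FF-FF-FF-FF-FF") -> Callable[[str], bool]:
    """Build a {value, mask} filter on the source MAC address."""
    v, m = mac_to_int(value), mac_to_int(mask)
    return lambda src: (mac_to_int(src) & m) == (v & m)


# Filter1 uses customer 1's address from the example; the other two are placeholders.
CLASSIFIER4: List[Tuple[Callable[[str], bool], str]] = [
    (make_src_mac_filter("01-02-03-04-05-06"), "TCB1"),
    (make_src_mac_filter("01-02-03-04-05-07"), "TCB2"),
    (make_src_mac_filter("01-02-03-04-05-08"), "TCB3"),
]


def dispatch(src_mac: str) -> str:
    """Return the element that receives a packet with this source MAC address."""
    for matches, target in CLASSIFIER4:
        if matches(src_mac):
            return target
    return "Dropper4"  # no match: Output X
```

Since the destination mask in Filter1 is all zeros, the destination address never affects the match, which is why the sketch compares the source address only.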
1409 8.3 TCBs Supporting Microflow-based Services 1411 The TCB illustrated above describes a configuration that might be 1412 suitable for enforcing a SLS at a router's ingress. It assumes that 1413 the customer marks its own traffic for the appropriate service level. 1414 It then limits the rate of aggregate traffic submitted at each 1415 service level, thereby protecting the resources of the Diffserv 1416 network. It does not provide any isolation between the customer's 1417 individual microflows (other than from separated queueing). 1419 Next we present a TCB configuration that offers additional 1420 functionality to the customer. It recognizes individual customer 1421 microflows and marks each one independently. It also isolates the 1422 customer's individual microflows from each other in order to prevent 1423 a single microflow from seizing an unfair share of the resources 1424 available to the customer at a certain service level. This is 1425 illustrated in Figure 7 below: 1427 +-----+ +-----+ 1428 | | | |---------------+ 1429 +->| |-->| | +-----+ | 1430 +-----+ | | | | |---->| | | 1431 | |---- +-----+ +-----+ +-----+ | 1432 ->| |---- marker meter dropper | +-----+ to 1433 | |-+ | +-----+ +-----+ +-->| | 1434 +-----+ | | | | | |------------------>| |---> 1435 MF | +->| |-->| | +-----+ +-->| | 1436 class. | | | | |---->| | | +-----+ TCB2 1437 | +-----+ +-----+ +-----+ | mux 1438 | marker meter dropper | 1439 | +-----+ +-----+ | 1440 | | | | |---------------+ 1441 |--->| |-->| | +-----+ 1442 | | | | |---->| | 1443 | +-----+ +-----+ +-----+ 1444 | marker meter dropper 1445 | . . . 1446 V V V V 1448 Figure 7: An Example of a Marking and Traffic Isolation TCB 1450 Traffic is first directed to a MF classifier which classifies traffic 1451 based on miscellaneous classification criteria, to a granularity 1452 sufficient to identify individual customer microflows. 
   Each microflow can then be marked for a specific DSCP (in this
   particular example we assume that each microflow is marked with one
   of two different DSCPs).  The metering stage limits the contribution
   of each of the customer's microflows to the service level for which
   it was marked.  Packets exceeding the allowable limit for the
   microflow are dropped.

   The TCB could be formally specified as follows:

      TCB1:
      Classifier1: (MF)
      Output A     --> Marker1
      Output B     --> Marker2
      Output C     --> Marker3
      . . .

      Marker1      --> Meter1
      Marker2      --> Meter2
      Marker3      --> Meter3

      Meter1:
      Output A     --> TCB2
      Output B     --> ActionElement1 (dropper)

      Meter2:
      Output A     --> TCB2
      Output B     --> ActionElement2 (dropper)

      Meter3:
      Output A     --> TCB2
      Output B     --> ActionElement3 (dropper)

   The actual traffic element declarations are not shown here.

   Traffic is either dropped by TCB1 or emerges marked for one of two
   DSCPs.  This traffic is then passed to TCB2, illustrated below:

                 +-----+
                 |     |--------------->
              +->|     |     +-----+
    +-----+   |  |     |---->|     |
    |     |---+  +-----+     +-----+
  ->|     |      meter       dropper
    |     |---+  +-----+
    +-----+   |  |     |--------------->
    BA        +->|     |     +-----+
    classifier   |     |---->|     |
                 +-----+     +-----+
                 meter       dropper

      Figure 8: Additional Example TCB

   TCB2 would be formally specified as follows:

      Classifier2: (BA)
      Output A     --> Meter10
      Output B     --> Meter11

      Meter10:
      Output A     --> PHBQueueA
      Output B     --> Dropper10

      Meter11:
      Output A     --> PHBQueueB
      Output B     --> Dropper11

9.  Open Issues

   o  There is a difference in interpretation of token bucket behavior
      between this document (Appendix A) and [DSMIB].  Specifically,
      [DSMIB] allows a packet to conform if any smaller packet would
      conform.
   o  The meter in [SRTCM] cannot be precisely modeled using two
      two-parameter token buckets because its two buckets do not
      accumulate credits independently.  We intended to demonstrate how
      the [TRTCM] meter could be implemented but ran out of time.

   o  Are the queue parameters (scheduling and buffer management)
      defined here sufficient?

   o  Do Queue and Queue Set really belong in the model (and in the MIB
      and PIB), or should the model stick to the abstract PHB
      representation and leave the implementation details to the MIB
      and PIB?

   o  Should a classifier be part of a TCB?  We argue yes.  This allows
      a TCB to be a one input/one output black box element.

   o  Is the description of a shaper sufficient?  Is it overbroad?

10.  Security Considerations

   Security vulnerabilities of Diffserv network operation are discussed
   in [DSARCH].  This document describes an abstract functional model
   of Diffserv router elements.  Certain denial-of-service attacks such
   as those resulting from resource starvation may be mitigated by
   appropriate configuration of these router elements; for example, by
   rate limiting certain traffic streams or by authenticating traffic
   marked for higher quality-of-service.

11.  Acknowledgments

   Concepts, terminology, and text have been borrowed liberally from
   [DSMIB] and [PIB].  We wish to thank the authors: Fred Baker,
   Michael Fine, Keith McCloghrie, John Seligson, Kwok Chan, and
   Scott Hahn, for their permission.

   This document has benefitted from the comments and suggestions of
   several participants of the Diffserv working group.

12.  References

   [DSARCH]  M. Carlson, W. Weiss, S. Blake, Z. Wang, D. Black, and
             E. Davies, "An Architecture for Differentiated Services",
             RFC 2475, December 1998.

   [DSTERMS] D. Grossman, "New Terminology for Diffserv", Internet
             Draft, October 1999.

   [E2E]     Y. Bernet, R.
             Yavatkar, P. Ford, F. Baker, L. Zhang, M. Speer,
             K. Nichols, R. Braden, B. Davie, J. Wroclawski, and
             E. Felstaine, "Integrated Services Operation over Diffserv
             Networks", Internet Draft, September 1999.

   [DSFIELD] K. Nichols, S. Blake, F. Baker, and D. Black, "Definition
             of the Differentiated Services Field (DS Field) in the
             IPv4 and IPv6 Headers", RFC 2474, December 1998.

   [EF-PHB]  V. Jacobson, K. Nichols, and K. Poduri, "An Expedited
             Forwarding PHB", RFC 2598, June 1999.

   [AF-PHB]  J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski,
             "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [DSMIB]   F. Baker, "Differentiated Services MIB", Internet Draft,
             June 1999.

   [SRTCM]   J. Heinanen and R. Guerin, "A Single Rate Three Color
             Marker", RFC 2697, September 1999.

   [PIB]     M. Fine, K. McCloghrie, J. Seligson, K. Chan, S. Hahn, and
             A. Smith, "Quality of Service Policy Information Base",
             Internet Draft, June 1999.

   [TRTCM]   J. Heinanen and R. Guerin, "A Two Rate Three Color
             Marker", RFC 2698, September 1999.

   [GTC]     L. Lin, J. Lo, and F. Ou, "A Generic Traffic Conditioner",
             Internet Draft, August 1999.

   [MPLSDS]  J. Heinanen, "Differentiated Services in MPLS Networks",
             Internet Draft, June 1999.

Appendix A.  Simple Token Bucket Definition

   [DSMIB] presents a fairly detailed exposition of the operation of
   two-parameter token buckets for metering.  However, the behavior
   described does not appear to be consistent with the behavior defined
   in [SRTCM] and [TRTCM].  Specifically, under the definition in
   [DSMIB], a packet is assumed to conform to the meter if any of its
   bytes would have been accepted, while in [SRTCM] and [TRTCM], a
   packet is assumed to conform only if sufficient tokens are available
   for every byte in the packet.
   Further, a packet has no effect on the token occupancy if it does
   not conform (no tokens are decremented).

   The behavior defined in [SRTCM] and [TRTCM] is not mandatory for
   compliance, but we give here a mathematical definition of
   two-parameter token bucket operation which is consistent with these
   documents, and which can be used to define a shaping profile.

   Define a token bucket with bucket size BS, token accumulation rate
   R, and instantaneous token occupancy T(t).  Assume that T(0) = BS.

   Then after an arbitrary interval with no packet arrivals, T(t) will
   not change since the bucket is already full of tokens.  Assume a
   packet of size B bytes arrives at time t'.  The token occupancy
   T(t'-) is still BS.  Then, as long as B <= BS, the packet conforms
   to the meter, and

      T(t') = BS - B.

   Assume an interval v = t - t' elapses before the next packet, of
   size C <= BS, arrives.  T(t-) is given by the following equation:

      T(t-) = min { BS, T(t') + v*R }

   (the bucket has accumulated v*R tokens over the interval, up to a
   maximum of BS tokens).

   If T(t-) - C >= 0, the packet conforms and T(t) = T(t-) - C.
   Otherwise, the packet does not conform and T(t) = T(t-).

   This function can be used to define a shaping profile.  If a packet
   of size C arrives at time t, it will be eligible for transmission at
   time te given as follows (we still assume C <= BS):

      te = max { t, t" }

   where

      t" = (C - T(t') + t'*R)/R.

   t" is the time at which T(t") = C, that is, the time when C credits
   have accumulated in the bucket and when the packet would conform if
   the token bucket were a meter.  te != t" only if t > t".

Authors' Addresses

   Yoram Bernet
   Microsoft
   One Microsoft Way
   Redmond, WA  98052
   Phone:  +1 425 936 9568
   E-mail: yoramb@microsoft.com

   Andrew Smith
   Extreme Networks
   3585 Monroe St.
   Santa Clara, CA  95051
   Phone:  +1 408 579 2821
   E-mail: andrew@extremenetworks.com

   Steven Blake
   Ericsson
   3000 Aerial Center Parkway, Suite 140
   Morrisville, NC  27560
   Phone:  +1 919 468 8466 x232
   E-mail: slblake@torrentnet.com
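
   As a supplementary illustration of the two-parameter token bucket
   defined in Appendix A, the meter and shaping rules can be sketched
   in Python.  This is a minimal sketch under stated assumptions, not a
   normative implementation: the class name TokenBucket and its method
   names are hypothetical, times are in seconds, sizes are in bytes,
   and the refill step caps the occupancy at BS (accumulation "up to a
   maximum of BS tokens").

```python
class TokenBucket:
    """Two-parameter token bucket with bucket size BS and rate R."""

    def __init__(self, bucket_size: float, rate: float) -> None:
        self.bs = bucket_size      # BS, in bytes
        self.rate = rate           # R, in bytes per second
        self.tokens = bucket_size  # T(0) = BS: the bucket starts full
        self.last = 0.0            # time of the most recent update (t')

    def _refill(self, t: float) -> None:
        # T(t-): add v*R tokens for the elapsed interval v, capped at BS.
        elapsed = t - self.last
        self.tokens = min(self.bs, self.tokens + elapsed * self.rate)
        self.last = t

    def conforms(self, t: float, size: float) -> bool:
        """Meter a packet of `size` bytes arriving at time `t`."""
        self._refill(t)
        if self.tokens - size >= 0:
            self.tokens -= size    # conforming packet consumes tokens
            return True
        return False               # non-conforming packet consumes nothing

    def eligible_time(self, t: float, size: float) -> float:
        """Shaping profile: te = max(t, t"), assuming size <= BS."""
        # t" is when `size` tokens will have accumulated since t'.
        t2 = (size - self.tokens + self.last * self.rate) / self.rate
        return max(t, t2)
```

   For example, with BS = 1000 bytes and R = 100 bytes per second, a
   600-byte packet at time 0 conforms and leaves 400 tokens; a second
   600-byte packet one second later finds only 500 tokens, does not
   conform, and leaves the occupancy unchanged.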