Internet Engineering Task Force                        Y. Bernet
Diffserv Working Group                                 Microsoft
INTERNET-DRAFT                                         A. Smith
Expires: September 2000                                Extreme Networks
                                                       S. Blake
                                                       Ericsson
                                                       D. Grossman
                                                       Motorola
                                                       March 2000

                A Conceptual Model for Diffserv Routers

                   draft-ietf-diffserv-model-02.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document is a product of the Diffserv working group.
   Comments on this draft should be directed to the Diffserv mailing
   list.

   Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

   DISCLAIMER - for reasons outside our control this version has been
   rushed out with formatting errors and not checked by all authors.

   This draft proposes a conceptual model of Differentiated Services
   (Diffserv) routers for use in their management and configuration.
   This model defines the general functional datapath elements
   (classifiers, meters, markers, droppers, monitors, replicators,
   muxes, queues), their possible configuration parameters, and how
   they might be interconnected to realize the range of
   classification, traffic conditioning, and per-hop behavior (PHB)
   functionalities described in [DSARCH]. The model is intended to be
   abstract and capable of representing the configuration parameters
   important to Diffserv functionality for a variety of specific router
   implementations. It is not intended as a guide to hardware
   implementation.

   This model should serve as a rationale for the design of a Diffserv
   MIB [DSMIB], as well as for various configuration interfaces (such
   as [PIB]). Since these documents are all evolving simultaneously,
   there are discrepancies between their current revisions; these
   should be resolved in a future revision of this draft.

Table of Contents

   1. Introduction
   2. Glossary
   3. Conceptual Model
     3.1 Elements of a Diffserv Router
       3.1.1 Datapath
       3.1.2 Configuration and Management Interface
       3.1.3 Optional RSVP Module
     3.2 Hierarchical Model of Diffserv Components
   4. Classifiers
     4.1 Definition
       4.1.1 Filters
       4.1.2 Overlapping Filters
       4.1.3 Filter Groups
     4.2 Examples
       4.2.1 Behavior Aggregate (BA) Classifier
       4.2.2 Multi-Field (MF) Classifier
       4.2.3 IEEE802 MAC Address Classifier
       4.2.4 Free-form Classifier
       4.2.5 Other Possible Classifiers
     4.3 MPLS
   5. Meters
     5.1 Definition
     5.2 Examples
       5.2.1 Average Rate Meter
       5.2.2 Exponentially Weighted Moving Average (EWMA) Meter
       5.2.3 Two-Parameter Token Bucket Meter
       5.2.4 Multi-Stage Token Bucket Meter
       5.2.5 Null Meter
   6. Action Elements
     6.1 Marker
     6.2 Dropper
     6.3 Shaper
     6.4 Replicating Element
     6.5 Multiplexor
     6.6 Monitor
     6.7 Null Action
   7. Queues
     7.1 Queue Sets and Scheduling
     7.2 Shaping
   8. Traffic Conditioning Blocks (TCBs)
     8.1 An Example TCB
     8.2 An Example TCB to Support Multiple Customers
     8.3 TCBs Supporting Microflow-based Services
     8.4 Cascaded TCBs
   9. Open Issues
   10. Security Considerations
   11. Acknowledgments
   12. References
   Appendix A. Simple Token Bucket Definition

1. Introduction

   Differentiated Services (Diffserv) [DSARCH] is a set of technologies
   which allow network service providers to offer differing levels of
   network quality-of-service (QoS) to different customers and their
   traffic streams. The premise of Diffserv networks is that routers
   within the core of the network handle packets in different traffic
   streams by forwarding them using different per-hop behaviors (PHBs).
   The PHB to be applied is indicated by a Diffserv codepoint (DSCP) in
   the IP header of each packet [DSFIELD]. Note that this document
   uses the terminology defined in [DSARCH, DSTERMS] and in Sec. 2.

   The advantage of such a scheme is that many traffic streams can be
   aggregated to one of a small number of behavior aggregates (BA)
   which are each forwarded using the same PHB at the router, thereby
   simplifying the processing and associated storage.
   In addition, there is no signaling, other than what is carried in
   the DSCP of each packet, and no other related processing that is
   required in the core of the Diffserv network since QoS is invoked
   on a packet-by-packet basis.

   The Diffserv architecture enables a variety of possible services
   which could be deployed in a network. These services are reflected
   to customers at the edges of the Diffserv network in the form of a
   Service Level Specification (SLS) [DSTERMS]. The ability to provide
   these services depends on the availability of cohesive management and
   configuration tools that can be used to provision and monitor a set
   of Diffserv routers in a coordinated manner. To facilitate the
   development of such configuration and management tools it is helpful
   to define a conceptual model of a Diffserv router that abstracts
   away implementation details of particular Diffserv routers from the
   parameters of interest for configuration and management. The purpose
   of this draft is to define such a model.

   The basic forwarding functionality of a Diffserv router is defined in
   other specifications; e.g., [DSARCH, DSFIELD, AF-PHB, EF-PHB].

   This document is not intended in any way to constrain or to dictate
   the implementation alternatives of Diffserv routers. We expect that
   router vendors will demonstrate a great deal of variability in their
   implementations. To the extent that vendors are able to model their
   implementations using the abstractions described in this draft,
   configuration and management tools will more readily be able to
   configure and manage networks incorporating Diffserv routers of
   various implementations.

   In Sec. 3 we start by describing the basic high-level functional
   elements of a Diffserv router and then describe the various
   components.
   We then focus on the Diffserv-specific components of the router and
   describe a hierarchical management model for these.

   In Sec. 4 we describe classification elements and in Sec. 5, we
   discuss the meter elements.

   In Sec. 6 we discuss action elements. In Sec. 7 we discuss the
   basic queueing elements and their functional behaviors (e.g.,
   shaping).

   In Sec. 8, we show how the basic classification, meter, action, and
   queueing elements can be combined to build modules called Traffic
   Conditioning Blocks (TCBs).

   In Sec. 9 we discuss open issues with this document and in Sec. 10 we
   discuss security concerns.

   Appendix A discusses token bucket implementation details.

2. Glossary

   Some of the terms used in this draft are defined in [DSARCH] and in
   [DSTERMS]. We define a few of them here again only to provide
   additional detail.

   Buffer        An algorithm used to determine whether an arriving
   management    packet should be stored in a queue, or discarded. This
   algorithm     decision is usually a function of the instantaneous or
                 average queue occupancy, but also may be a function of
                 the aggregate queue occupancy in a queue set, or of
                 other parameters.

   Classifier    A functional datapath element which consists of filters
                 which select packets based on the content of packet
                 headers or other packet data, and/or on implicit or
                 derived attributes associated with the packet, and
                 forwards the packet along a particular datapath within
                 the router. A classifier splits a single incoming
                 traffic stream into multiple outgoing ones.

   Enqueueing    The process of executing a buffer management algorithm
                 to determine whether an arriving packet should be
                 stored in a queue.

   Filter        A set of (wildcard/prefix/masked/range/exact)
                 conditions on the components of a packet's
                 classification key.
                 A filter is said to match only if each condition is
                 satisfied.

   Replicating   A functional datapath element which makes one or more
   element       copies of a packet and forwards them on distinct
                 datapaths; for example to a monitoring port.

   Monitor       A functional datapath element which updates an octet
                 and a packet counter for every packet which passes
                 through it. Used for collecting statistics.

   Multiplexer   A functional datapath element that merges multiple
   (Mux)         traffic streams (datapaths) into a single traffic
                 stream (datapath).

   Non-work      A property of a scheduling algorithm such that it does
   conserving    not necessarily service a packet at every transmission
                 opportunity, even if one is available.

   Queue         A storage location for packets awaiting transmission or
                 processing by the next functional element in the
                 datapath. The queues represented in this model are
                 abstract elements that may be implemented by multiple
                 physical queues in series and/or in parallel in a
                 specific implementation. Note that we assume that a
                 queue is serviced so as to preserve the required
                 ordering constraint for each Ordering Aggregate (OA)
                 it queues [DSTERMS]. This can be achieved by a FIFO
                 (first in, first out) service policy or by other means
                 (e.g., multiple FIFOs exclusively servicing particular
                 OAs).

   Queue set     A set of queues which are serviced by a scheduling
                 algorithm and which may share a buffer management
                 algorithm.

   Scheduling    An algorithm which determines which queue of a queue
   algorithm     set to service next. This may be based on the relative
                 priority of the queues, or on a weighted fair bandwidth
                 sharing policy, or some other policy. A scheduling
                 algorithm may be either work-conserving or
                 non-work-conserving.

   Shaping       The process of delaying packets within a traffic stream
                 to cause it to conform to some defined traffic profile.
                 Shaping can be implemented using a queue serviced by a
                 non-work conserving scheduling algorithm.

   Traffic       A logical datapath entity consisting of a number of
   Conditioning  other functional datapath entities interconnected in
   Block (TCB)   such a way as to perform a specific set of traffic
                 conditioning functions on an incoming traffic stream.
                 A TCB can be thought of as an entity with at least one
                 input and output and a set of control parameters.

   Work          A property of a scheduling algorithm such that it
   conserving    services a packet, if one is available, at every
                 transmission opportunity.

3. Conceptual Model

   In this section we introduce a block diagram of a Diffserv router and
   describe the various components illustrated. Note that a Diffserv
   core router is assumed to include only a subset of these components:
   the model we present here is intended to cover the case of both
   Diffserv edge and core routers.

3.1 Elements of a Diffserv Router

   The conceptual model we define includes abstract definitions for the
   following:

   o  The basic traffic classification components.

   o  The basic traffic conditioning components.

   o  Certain combinations of traffic classification and conditioning
      components.

   o  Queueing components.

   The components and combinations of components described in this
   document form building blocks that need to be manageable by Diffserv
   configuration and management tools. One of the goals of this
   document is to show how a model of a Diffserv device can be built
   using these component blocks. This model is in the form of a
   connected directed acyclic graph (DAG) of functional datapath
   elements that describes the traffic conditioning and queueing
   behaviors that any particular packet will experience when forwarded
   to the Diffserv router.
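
   As an illustration of the DAG view described above, the following
   minimal Python sketch represents a datapath as a graph of named
   elements, checks that it is acyclic, and lists the element sequences
   a packet could traverse. The element names follow the vocabulary of
   this draft, but the Datapath class, its methods, and the example
   topology are hypothetical and are not part of the model.

```python
from collections import defaultdict


class Datapath:
    """Hypothetical container for a graph of functional datapath elements."""

    def __init__(self):
        # Maps an element name to the names of its downstream elements.
        self.edges = defaultdict(list)

    def connect(self, upstream, downstream):
        self.edges[upstream].append(downstream)
        self.edges[downstream]  # register sinks as nodes too

    def is_acyclic(self):
        """Depth-first search; an edge back to a node in progress is a cycle."""
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {node: WHITE for node in self.edges}

        def visit(node):
            colour[node] = GREY
            for nxt in self.edges[node]:
                if colour[nxt] == GREY:
                    return False
                if colour[nxt] == WHITE and not visit(nxt):
                    return False
            colour[node] = BLACK
            return True

        return all(visit(n) for n in list(colour) if colour[n] == WHITE)

    def paths(self, start):
        """Yield every element sequence reachable from start (DAG only)."""
        if not self.edges[start]:
            yield [start]
        for nxt in self.edges[start]:
            for tail in self.paths(nxt):
                yield [start] + tail


# Example topology (an assumption chosen for illustration only).
dp = Datapath()
dp.connect("classifier", "meter")
dp.connect("classifier", "queue")   # e.g. traffic that bypasses metering
dp.connect("meter", "marker")
dp.connect("meter", "dropper")
dp.connect("marker", "queue")

print(dp.is_acyclic())
for path in dp.paths("classifier"):
    print(" -> ".join(path))
```

   Representing the datapath as an explicit graph lets a management
   tool reject a configuration that would loop packets back to an
   earlier element before it is installed.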

   The following diagram illustrates the major functional blocks of a
   Diffserv router:

             +---------------+
             |   Diffserv    |
    Mgmt     | configuration |
    <----+-->| & management  |------------------+
    SNMP,|   |   interface   |                  |
    COPS |   +---------------+                  |
    etc. |          |                           |
         |          |                           |
         |          v                           v
         |   +-------------+   +---------+   +-------------+
    data |   | ingress i/f |   |         |   | egress i/f  |
    -------->|  class.,    |-->| routing |-->|  class.,    |---->
         |   |    TC,      |   |  core   |   |    TC,      |
         |   |  queueing   |   |         |   |  queueing   |
         |   +-------------+   +---------+   +-------------+
         |          ^                           ^
         |          |                           |
         |          |                           |
         |   +------------+                     |
         +-->| QOS agent  |                     |
    -------->| (optional) |---------------------+
    QOS      | (e.g. RSVP)|
    cntl     +------------+
    msgs

          Figure 1: Diffserv Router Major Functional Blocks

3.1.1 Datapath

   An ingress interface, routing core, and egress interface are
   illustrated at the center of the diagram. In actual router
   implementations, there may be an arbitrary number of ingress and
   egress interfaces interconnected by the routing core. The routing
   core element serves as an abstraction of a router's normal routing
   and switching functionality. The routing core moves packets between
   interfaces according to policies outside the scope of Diffserv. The
   actual queueing delay and packet loss behavior of a specific router's
   switching fabric/backplane is not modeled by the routing core; these
   should be modeled using the functional elements described later. The
   routing core should be thought of as an infinite bandwidth,
   zero-delay backplane connecting ingress and egress interfaces.

   The components of interest on the ingress/egress interfaces are the
   traffic classifiers, traffic conditioning (TC) components, and the
   queueing components that support Diffserv traffic conditioning and
   per-hop behaviors [DSARCH].
   These are the fundamental components comprising a Diffserv router
   and will be the focal point of our conceptual model.

3.1.2 Configuration and Management Interface

   Diffserv operating parameters are monitored and provisioned through
   this interface. Monitored parameters include statistics regarding
   traffic carried at various Diffserv service levels. These statistics
   may be important for accounting purposes and/or for tracking
   compliance with traffic conditioning specifications (TCSs) [DSTERMS]
   negotiated with customers. Provisioned parameters are primarily
   classification rules and TC and PHB configuration parameters. The
   network administrator interacts with the Diffserv configuration and
   management interface via one or more management protocols, such as
   SNMP or COPS, or through other router configuration tools such as
   serial terminal or telnet consoles.

   Specific policy objectives are presumed to be installed by or
   retrieved from policy management mechanisms. However, Diffserv
   routers are subject to implementation decisions which form a
   meta-policy that scopes the kinds of policies which can be created.

3.1.3 Optional RSVP Module

   Diffserv routers may snoop or participate in either per-microflow or
   per-flow-aggregate signaling of QoS requirements [E2E]. The example
   discussed here uses the RSVP protocol. Snooping of RSVP messages may
   be used, for example, to learn how to classify traffic without
   actually participating as an RSVP protocol peer. Diffserv routers may
   reject or admit RSVP reservation requests to provide a means of
   admission control to Diffserv-based services or they may use these
   requests to trigger provisioning changes for a flow-aggregation in
   the Diffserv network.
   A flow-aggregation in this context might be equivalent to a Diffserv
   BA or it may be more fine-grained, relying on an MF classifier
   [DSARCH]. Note that the conceptual model of such a router starts to
   look the same as an Integrated Services (intserv) router in its
   component makeup [E2E].

   Note that an RSVP component of a Diffserv router, if present, might
   be active only in the control plane and not in the data plane. In
   this scenario, RSVP is used strictly as a signaling protocol. The
   data plane of such a Diffserv router can still act purely on Diffserv
   DSCPs and PHBs in handling data traffic.

3.2 Hierarchical Model of Diffserv Components

   We focus on the Diffserv-specific functional components of the
   router: the classification, traffic conditioning, and queueing
   functionality. The diagram below is based on the larger block
   diagram shown above:

             Interface A                         Interface B
           +-------------+     +---------+     +-------------+
           | ingress i/f |     |         |     | egress i/f  |
           |   class.,   |     |         |     |   class.,   |
       --->|   meter,    |---->|         |---->|   meter,    |--->
           |   action,   |     |         |     |   action,   |
           |   queueing  |     |         |     |   queueing  |
           +-------------+     | routing |     +-------------+
                               |  core   |
           +-------------+     |         |     +-------------+
           | egress i/f  |     |         |     | ingress i/f |
           |   class.,   |     |         |     |   class.,   |
       <---|   meter,    |<----|         |<----|   meter,    |<---
           |   action,   |     |         |     |   action,   |
           |   queueing  |     +---------+     |   queueing  |
           +-------------+                     +-------------+

          Figure 2. Traffic Conditioning and Queueing Elements

   This diagram illustrates two Diffserv router interfaces, each having
   an ingress and an egress component. It shows classification, meter,
   action, and queueing elements which might be instantiated on each
   interface's ingress and egress component. The TC functionality is
   implemented by a combination of classification, action, meter, and
   queueing elements.
   We show equivalent functional elements on both the ingress and
   egress components of an interface because we expect an N-port router
   to display the same Diffserv capabilities as a network of 2-port
   routers interconnected by LAN media [DSMIB]. Note that it is not
   mandatory that each of these functional elements be implemented on
   both ingress and egress components; it is dependent on the service
   requirements on a particular interface on a particular router.
   Further, we wish to point out that by showing these elements on both
   ingress and egress components we do not mean to imply that they must
   be implemented in this way in a specific router. For example, a
   router may implement all shaping and PHB queueing on the interface
   egress component, or may instead implement it only on the ingress
   component. Further, the classification needed to map a packet to an
   egress component queue (if present) need not be implemented on the
   egress component but instead may be implemented on the ingress
   component, with the packet passed through the routing core with
   in-band control information to allow for egress queue selection.

   From a configuration and management perspective, the following
   hierarchy exists:

   At the top level, the network administrator manages interfaces. Each
   interface consists of an ingress component and an egress component.
   Each component may contain classifier, action, meter, and queueing
   elements.

   At the next level, the network administrator manages groups of
   functional elements interconnected in a DAG. These elements are
   organized in self-contained Traffic Conditioning Blocks (TCBs) which
   are used to implement some desired network policy (see Sec. 8).
   One or more TCBs may be instantiated on each ingress or egress
   component, may be connected in series, and/or may be connected in a
   parallel configuration on the multiple outputs of a classifier. We
   define the TCB to optionally include classification and queueing
   elements so as to allow for rich functionality. A TCB can be thought
   of as a "black box" with a single input and a single output (on the
   main data path). TCBs can be constructed out of a DAG of other TCBs,
   recursively. We do not assume the same TCB configuration on every
   interface (ingress or egress).

   At the lowest level are individual functional elements, each with
   its own configuration parameters and management counters and flags.

4. Classifiers

4.1 Definition

   Classification is performed by a classifier element. Classifiers are
   1:N (fan-out) devices: they take a single traffic stream as input and
   generate N logically separate traffic streams as output. Classifiers
   are parameterized by filters and output streams. Packets from the
   input stream are sorted into various output streams by filters which
   match the contents of the packet or possibly match other attributes
   associated with the packet. Various types of classifiers are
   described in the following sections.

   We use the following diagram to illustrate a classifier, where the
   outputs connect to succeeding functional elements:

      unclassified              classified
      traffic                   traffic
              +------------+
              |            |--> match Filter1 --> output A
      ------->| classifier |--> match Filter2 --> output B
              |            |--> no match      --> output C
              +------------+

                  Figure 3. An Example Classifier

   Note that we allow a mux (see Sec. 6.5) before the classifier to
   allow input from multiple traffic streams.
   For example, if multiple ingress sub-interfaces feed through a
   single classifier then the interface number can be considered by the
   classifier as a packet attribute and be included in the packet's
   classification key. This optimization may be important for
   scalability in the management plane. Another possible packet
   attribute could be an integer representing the BGP community string
   associated with the packet's best-matching route.

   The following classifier separates traffic into one of three output
   streams based on three filters:

      Filter Matched        Output Stream
      --------------        ---------------
      Filter1               A
      Filter2               B
      Filter3 (no match)    C

   where Filter1, Filter2, and Filter3 are defined to be the following
   BA filters ([DSARCH], see Sec. 4.2.1):

      Filter    DSCP
      ------    ------
      1         101010
      2         111111
      3         ****** (wildcard)

4.1.1 Filters

   A filter consists of a set of conditions on the component values of
   a packet's classification key (the header values, contents, and
   attributes relevant for classification). In the BA classifier
   example above, the classification key consists of one packet header
   field, the DSCP, and both Filter1 and Filter2 specify exact-match
   conditions on the value of the DSCP. Filter3 is a wildcard default
   filter which matches every packet, but which is only selected in the
   event that no other more specific filter matches.

   In general there is a set of possible component conditions including
   exact, prefix, range, masked, and wildcard matches. Note that ranges
   can be represented (with less efficiency) as a set of prefixes and
   that prefix matches are just a special case of both masked and range
   matches.

   In the case of an MF classifier [DSARCH], the classification key
   consists of a number of packet header fields.
   The filter may specify a different condition for each key component,
   as illustrated in the example below for an IPv4/TCP classifier:

      Filter   IP Src Addr    IP Dest Addr   TCP SrcPort  TCP DestPort
      ------   -------------  -------------  -----------  ------------
      Filter4  172.31.8.1/32  172.31.3.X/24  X            5003

   In this example, the fourth octet of the destination IPv4 address
   and the source TCP port are wildcard or "don't cares".

   MF filtering of fragmented packets is impossible if the filter uses
   transport-layer port numbers, since only the first fragment carries
   them. MTU size discovery is therefore a prerequisite for proper
   operation of a Diffserv network that uses such filters.

4.1.2 Overlapping Filters

   Note that it is easy to define sets of overlapping filters in a
   classifier. For example:

      Filter5:                      Filter6:
      Type:   Masked-DSCP           Type:   Masked-DSCP
      Value:  111000                Value:  000111 (binary)
      Mask:   111000                Mask:   000111 (binary)

   A packet containing DSCP = 111111 cannot be uniquely classified by
   this pair of filters and so a precedence must be established between
   Filter5 and Filter6 in order to break the tie. This precedence must
   be established either (a) by a manager which knows that the router
   can accomplish this particular ordering, e.g., by means of reported
   capabilities, or (b) by the router along with a mechanism to report
   to a manager which precedence is being used. These ordering
   mechanisms must be supported by the configuration and management
   protocols although further discussion of this is outside the scope of
   this document.

   An unambiguous classifier requires that every possible classification
   key match at least one filter (including the wildcard default), and
   that any ambiguity between overlapping filters be resolved by
   precedence.

4.1.3 Filter Groups

   Filters may be logically combined.
For example, consider the 603 following DestMacAddress filter: 605 Filter7: 606 Type: DestMacAddress 607 Value: 01-02-03-04-05-06 608 Mask: FF-FF-FF-FF-FF-FF 610 Classifier0 could then be declared as: 612 Classifier0: 613 Filter1 and Filter7: output A 614 Filter2 and Filter7: output B 615 Default (wildcard) filter: output C 617 4.2 Examples 619 4.2.1 Behaviour Aggregate (BA) Classifier 621 The simplest Diffserv classifier is a behavior aggregate (BA) 622 classifier [DSARCH]. A BA classifier uses only the Diffserv 623 codepoint (DSCP) in a packet's IP header to determine the logical 624 output stream to which the packet should be directed. We allow only 625 an exact-match condition on this field because the assigned DSCP 626 values have no structure, and therefore no subset of DSCP bits are 627 significant. 629 The following defines a possible BA filter: 631 Filter8: 632 Type: BA 633 Value: 111000 635 4.2.2 Multi-Field (MF) Classifier 637 Another type of classifier is a multi-field (MF) classifier [DSARCH]. 638 This classifies packets based on one or more fields in the packet 639 header (including the DSCP). A common type of MF classifier is a 6- 640 tuple classifier that classifies based on six IP header fields 641 (destination address, source address, IP protocol, source port, 642 destination port, and DSCP). MF classifiers may classify on other 643 fields such as MAC addresses, VLAN tags, link-layer traffic class 644 fields or other higher-layer protocol fields. 646 The following defines a possible MF filter: 648 Filter9: 649 Type: IPv4-6-tuple 650 IPv4DestAddrValue: 0 651 IPv4DestAddrMask: 0.0.0.0 652 IPv4SrcAddrValue: 172.31.8.0 653 IPv4SrcAddrMask: 255.255.255.0 654 IPv4DSCP: 28 655 IPv4Protocol: 6 656 IPv4DestL4PortMin: 0 657 IPv4DestL4PortMax: 65535 658 IPv4SrcL4PortMin: 20 659 IPv4SrcL4PortMax: 20 661 A similar type of classifier can be defined for IPv6. 
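The matching rule implied by Filter9 can be sketched as follows. This is only an illustration, not part of the model: the class name SixTupleFilter, the helper ip_to_int and the packet dictionary layout are hypothetical, and we assume that the IPv4DSCP value 28 is decimal and that address conditions are masked matches while port conditions are inclusive ranges.

```python
import ipaddress
from dataclasses import dataclass


def ip_to_int(addr: str) -> int:
    """Convert dotted-quad IPv4 text to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


@dataclass(frozen=True)
class SixTupleFilter:
    """Hypothetical container for the Filter9 parameters."""
    dst_value: int
    dst_mask: int
    src_value: int
    src_mask: int
    dscp: int
    protocol: int
    dst_port_min: int
    dst_port_max: int
    src_port_min: int
    src_port_max: int

    def matches(self, pkt: dict) -> bool:
        dst = ip_to_int(pkt["dst"])
        src = ip_to_int(pkt["src"])
        return (
            # Masked match: only bits set in the mask are compared.
            (dst & self.dst_mask) == (self.dst_value & self.dst_mask)
            and (src & self.src_mask) == (self.src_value & self.src_mask)
            # Exact matches on DSCP and IP protocol.
            and pkt["dscp"] == self.dscp
            and pkt["protocol"] == self.protocol
            # Range matches on the layer-4 ports (inclusive bounds).
            and self.dst_port_min <= pkt["dst_port"] <= self.dst_port_max
            and self.src_port_min <= pkt["src_port"] <= self.src_port_max
        )


# Filter9 from the text; DSCP 28 is assumed to be a decimal value.
FILTER9 = SixTupleFilter(
    dst_value=0, dst_mask=ip_to_int("0.0.0.0"),
    src_value=ip_to_int("172.31.8.0"), src_mask=ip_to_int("255.255.255.0"),
    dscp=28, protocol=6,
    dst_port_min=0, dst_port_max=65535,
    src_port_min=20, src_port_max=20,
)

packet = {"src": "172.31.8.77", "dst": "192.0.2.1", "dscp": 28,
          "protocol": 6, "src_port": 20, "dst_port": 4000}
print(FILTER9.matches(packet))
```

An all-zero mask makes the destination address a wildcard, and the 0 to 65535 port range does the same for the destination port, so only the source prefix, DSCP, protocol and source port constrain the match.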
663 4.2.3 IEEE802 MAC Address Classifier 665 A MacAddress filter is parameterized by a 6-byte {value, mask} pair 666 for either source or destination MAC address. For example, the 667 following classifier sends packets matching either DA = 668 01-02-03-04-05-06 or SA = 00-E0-2B-XX-XX-XX to output A: 670 Classifier1: 671 Filter10: output A 672 Filter11: output A 673 Default: output B 674 Filter10: 675 Type: DestMacAddress 676 Value: 01-02-03-04-05-06 (hex) 677 Mask: FF-FF-FF-FF-FF-FF (hex) 679 Filter11: 680 Type: SrcMacAddress 681 Value: 00-E0-2B-00-00-00 (hex) 682 Mask: FF-FF-FF-00-00-00 (hex) 684 4.2.4 Free-form Classifier 686 A Free-form classifier is made up of a set of user-definable 687 arbitrary filters each made up of {bit-field size, offset (from head 688 of packet), value, mask}: 690 Classifier2: 691 Filter12: output A 692 Filter13: output B 693 Default: output C 695 Filter12: 696 Type: FreeForm 697 SizeBits: 3 (bits) 698 Offset: 16 (bytes) 699 Value: 100 (binary) 700 Mask: 101 (binary) 702 Filter13: 703 Type: FreeForm 704 SizeBits: 12 (bits) 705 Offset: 16 (bytes) 706 Value: 100100000000 (binary) 707 Mask: 111111111111 (binary) 709 Free-form filters can be combined into filter groups to express 710 more complex conditions. 712 4.2.5 Other Possible Classifiers 714 Classifier3: 715 Filter14: output A 716 Filter15: output B 717 Default: output C 719 Filter14: 720 Type: IEEEPriority 721 Value: 100 (binary) 722 Mask: 101 (binary) 723 Filter15: 724 Type: IEEEVLAN 725 Value: 100100000000 (binary) 726 Mask: 111111111111 (binary) 728 Classification may be performed based on implicit information 729 associated with a packet (e.g., the incoming channel number on a 730 channelized interface) or on information derived from a different 731 non-Diffserv classification operation (e.g., the outgoing interface 732 determined by the route lookup operation). Other vendor-specific 733 filter formats are possible. We do not discuss these further here.
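As an illustration of the masked matches used by the free-form Classifier2 of Sec. 4.2.4, the following sketch extracts a bit field and compares it under a mask. Several points are assumptions rather than statements of the model: the field is taken to start at the most significant bit of the byte at Offset, and, because every packet that matches Filter13 also matches Filter12, the sketch gives Filter13 precedence. The names FreeFormFilter and classifier2 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeFormFilter:
    size_bits: int
    offset_bytes: int
    value: int
    mask: int

    def matches(self, packet: bytes) -> bool:
        n_bytes = (self.size_bits + 7) // 8
        chunk = packet[self.offset_bytes:self.offset_bytes + n_bytes]
        if len(chunk) < n_bytes:
            return False  # packet too short to contain the field
        # Assumption: the field occupies the most significant bits
        # starting at the given byte offset.
        field = int.from_bytes(chunk, "big") >> (n_bytes * 8 - self.size_bits)
        # Masked match: compare only the bits selected by the mask.
        return (field & self.mask) == (self.value & self.mask)


FILTER12 = FreeFormFilter(3, 16, 0b100, 0b101)
FILTER13 = FreeFormFilter(12, 16, 0b100100000000, 0b111111111111)


def classifier2(packet: bytes) -> str:
    # Assumed precedence: the longer, exact Filter13 is tried first,
    # since it overlaps completely with Filter12.
    for flt, output in ((FILTER13, "B"), (FILTER12, "A")):
        if flt.matches(packet):
            return output
    return "C"  # default (wildcard) output


header = bytes(16)  # 16 bytes preceding the examined field
print(classifier2(header + bytes([0b10010000, 0b00000000])))
print(classifier2(header + bytes([0b11000000, 0b00000000])))
print(classifier2(header + bytes([0b00000000, 0b00000000])))
```

With the opposite precedence, output B could never be selected; this is the kind of tie that the precedence rules of Sec. 4.1.2 are meant to resolve.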
735 4.3 MPLS 737 It is possible for an MPLS label-switched router (LSR) to function as 738 a Diffserv router [MPLSDS]. The interaction between MPLS and Diffserv 739 is not discussed further in this document. 741 5. Meters 743 5.1 Definition 745 Metering is the function of monitoring the arrival times of packets 746 of a traffic stream and determining the level of conformance of each 747 packet to a pre-established traffic profile. Diffserv network 748 providers may choose to offer services to customers based on a 749 temporal (i.e., rate) profile within which the customer submits 750 traffic for the service. In this event, a meter might be used to 751 trigger real-time traffic conditioning actions (e.g., marking) by 752 routing a non-conforming packet through an appropriate next-stage 753 action element. Alternatively, it might also be used for out-of-band 754 management functions like statistics monitoring for billing 755 applications. 757 Meters are logically 1:N (fan-out) devices (although a mux can be 758 used in front of a meter). Meters are parameterized by a temporal 759 profile and by conformance levels, each of which is associated with 760 a meter's output. Each output can be connected to another functional 761 element. 763 Note that this model of a meter differs from that described in 764 [DSARCH]. In that description the meter is not a datapath element 765 but is instead used to monitor the traffic stream and send control 766 signals to action elements to dynamically modulate their behavior 767 based on the conformance of the packet. We find the description here 768 more powerful. 770 We use the following diagram to illustrate a meter with 3 levels of 771 conformance: 773 unmetered metered 774 traffic traffic 776 +---------+ 777 | |--------> conformanceA 778 --------->| meter |--------> conformanceB 779 | |--------> conformanceC 780 +---------+ 782 Figure 4. 
An Example Meter 784 In some Diffserv examples, three levels of conformance are discussed 785 in terms of colors, with green representing conforming, yellow 786 representing partially conforming, and red representing non- 787 conforming [AF-PHB]. These different conformance levels are used to 788 trigger different buffer management actions. Other example meters 789 use a binary notion of conformance; in the general case N levels of 790 conformance can be supported. In general there is no constraint on 791 the type of functional element following a meter output, but care 792 must be taken not to inadvertently configure a datapath that results 793 in packet reordering within an OA. 795 5.2 Examples 797 The following is a non-exhaustive list of possible meters. 799 5.2.1 Average Rate Meter 801 An example of a very simple meter is an average rate meter. This 802 type of meter measures the average rate at which packets are 803 submitted to it over a specified averaging time. 805 An average rate profile may take the following form: 807 Meter1: 808 Type: AverageRate 809 Profile1: output A 810 NonConforming: output B 812 Profile1: 813 Type: AverageRate 814 AverageRate: 120 KBps 815 Delta: 1.0 msec 817 A meter measuring against this profile would continually maintain a 818 count that indicates the total number of bytes arriving between 819 time T (now) and time T - 1.0 msec. So long as an arriving packet 820 does not push the count over 120 bytes (120 KBps x 1.0 msec), the packet would be deemed 821 conforming. Any packet that pushes the count over 120 bytes would be 822 deemed non-conforming. Thus, this meter deems packets to correspond 823 to one of two conformance levels: conforming or non-conforming.
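The sliding-window count described above can be sketched as follows. This is a minimal illustration under stated assumptions: KBps is read as 1000 bytes per second, time is kept in integer microseconds to avoid rounding, and every arriving packet (conforming or not) is added to the count. The class name AverageRateMeter is hypothetical.

```python
from collections import deque


class AverageRateMeter:
    def __init__(self, rate_bytes_per_sec: int, delta_usec: int):
        self.delta_usec = delta_usec
        # Byte budget per averaging interval, e.g. 120 KBps * 1 msec = 120.
        self.budget = rate_bytes_per_sec * delta_usec // 1_000_000
        self.window = deque()  # (arrival time in usec, length in bytes)
        self.count = 0         # bytes seen in the interval (T - Delta, T]

    def meter(self, now_usec: int, length_bytes: int) -> str:
        # Forget arrivals that have fallen out of the averaging interval.
        while self.window and self.window[0][0] <= now_usec - self.delta_usec:
            _, old_length = self.window.popleft()
            self.count -= old_length
        # Assumption: all arrivals are counted, whatever their outcome.
        self.window.append((now_usec, length_bytes))
        self.count += length_bytes
        return "conforming" if self.count <= self.budget else "non-conforming"


# Profile1: AverageRate 120 KBps, Delta 1.0 msec.
meter1 = AverageRateMeter(rate_bytes_per_sec=120_000, delta_usec=1_000)
print(meter1.meter(0, 60))
print(meter1.meter(100, 60))
print(meter1.meter(200, 1))
print(meter1.meter(1_100, 60))
```

In this trace the third packet pushes the count to 121 bytes and is non-conforming; by the time of the fourth packet the first two arrivals have aged out of the window.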
825 5.2.2 Exponential Weighted Moving Average (EWMA) Meter 827 The EWMA form of meter is easy to implement in hardware and can be 828 parameterized as follows: 830 avg_rate(t) = (1 - Gain) * avg_rate(t') + Gain * rate(t) 831 t = t' + Delta 833 For a packet arriving at time t: 835 if (avg_rate(t) > AverageRate) 836 non-conforming 837 else 838 conforming 840 Gain controls the time constant (i.e., the frequency response) of what is 841 essentially a simple IIR low-pass filter. rate(t) measures the 842 number of incoming bytes in a small fixed sampling interval, Delta. 843 Any packet that arrives and pushes the average rate over a predefined 844 rate AverageRate is deemed non-conforming. An EWMA meter profile 845 might look as follows: 847 Meter2: 848 Type: ExpWeightedMovingAvg 849 Profile2: output A 850 NonConforming: output B 852 Profile2: 853 Type: ExpWeightedMovingAvg 854 AverageRate: 25 KBps 855 Delta: 10.0 usec 856 Gain: 1/16 858 5.2.3 Two-Parameter Token Bucket Meter 860 A more sophisticated meter might measure conformance to a token 861 bucket (TB) profile. A TB profile generally has two parameters: an 862 average token rate and a burst size. TB meters compare the arrival 863 rate of packets to the average rate specified by the TB profile. 864 Logically, byte tokens accumulate in a bucket at the average rate, 865 up to a maximum credit which is the burst size. Packets of length 866 L bytes are considered conforming if L tokens are available in the 867 bucket at the time of packet arrival. Packets are allowed to 868 exceed the average rate in bursts up to the burst size. Packets 869 which arrive to find a bucket with insufficient tokens in it are 870 deemed non-conforming. A two-parameter TB meter has exactly two 871 possible conformance levels (conforming, non-conforming). TB 872 implementation details are discussed in Appendix A.
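The token accounting described above can be sketched as follows. This is not the implementation discussed in Appendix A; it is a minimal illustration that assumes the bucket starts full, that a non-conforming packet consumes no tokens, and that KB and KBps mean 1000 bytes and 1000 bytes per second. Time is kept in integer microseconds and tokens are scaled by 1,000,000 so that the arithmetic stays exact. The class name TokenBucketMeter is hypothetical.

```python
class TokenBucketMeter:
    SCALE = 1_000_000  # microseconds per second; also the token scale factor

    def __init__(self, rate_bytes_per_sec: int, burst_bytes: int):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes * self.SCALE
        self.tokens = self.capacity  # assumption: the bucket starts full
        self.last_usec = 0

    def meter(self, now_usec: int, length_bytes: int) -> str:
        # Credit tokens for the elapsed time, capped at the burst size.
        elapsed = now_usec - self.last_usec
        self.last_usec = now_usec
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed)
        # A packet of L bytes conforms if L tokens are available.
        needed = length_bytes * self.SCALE
        if self.tokens >= needed:
            self.tokens -= needed
            return "conforming"
        # Assumption: non-conforming packets leave the bucket untouched.
        return "non-conforming"


# AverageRate 100 KBps, BurstSize 100 KB.
meter3 = TokenBucketMeter(rate_bytes_per_sec=100_000, burst_bytes=100_000)
print(meter3.meter(0, 60_000))
print(meter3.meter(0, 60_000))
print(meter3.meter(200_000, 60_000))
```

The second packet finds only 40,000 tokens and is non-conforming; after 200 msec another 20,000 tokens have accumulated, which is just enough for the third packet.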
874 A two-parameter TB meter profile might look as follows: 876 Meter3: 877 Type: SimpleTokenBucket 878 Profile3: output A 879 NonConforming: output B 881 Profile3: 882 Type: SimpleTokenBucket 883 AverageRate: 100 KBps 884 BurstSize: 100 KB 886 5.2.4 Multi-Stage Token Bucket Meter 888 More complicated TB meters might define two burst sizes and three 889 conformance levels. Packets found to exceed the larger burst size 890 are deemed non-conforming. Packets found to exceed only the smaller 891 burst size are deemed partially conforming. Packets exceeding 892 neither are deemed conforming. Token bucket meters designed for 893 Diffserv networks are described in more detail in [SRTCM], [TRTCM] and 894 [GTC]; in some of these references three levels of conformance are 895 discussed in terms of colors, with green representing conforming, 896 yellow representing partially conforming and red representing non- 897 conforming. Often these multi-conformance level meters can be 898 implemented using an appropriate configuration of multiple two- 899 parameter TB meters. 901 A profile for a multi-stage TB meter with three levels of conformance 902 might look as follows: 904 Meter4: 905 Type: MultiTokenBucket 906 Profile4: output A 907 Profile5: output B 908 NonConforming: output C 910 Profile4: 911 Type: SimpleTokenBucket 912 AverageRate: 100 KBps 913 BurstSize: 20 KB 915 Profile5: 916 Type: SimpleTokenBucket 917 AverageRate: 100 KBps 918 BurstSize: 100 KB 920 5.2.5 Null Meter 922 A null meter has only one output: always conforming, and no 923 associated temporal profile. Such a meter is useful to define in the 924 event that the configuration or management interface does not have 925 the flexibility to omit a meter in a datapath segment. 927 6. Action Elements 929 Classifiers and meters are fan-out elements which are generally used 930 to determine the appropriate action to apply to a packet.
The set of 931 possible actions includes: 933 1) Marking 934 2) Dropping 935 3) Shaping 936 4) Replicating 937 5) Monitoring 939 The corresponding action elements are described in the following 940 paragraphs. 942 Policing is a general term for the process of preventing a traffic 943 stream from seizing more than its share of resources from a Diffserv 944 network. Each of the first three actions described above may be used 945 to police traffic. Markers do so by re-marking non-conforming 946 packets to a DSCP value that is entitled to fewer network resources. 947 Shapers and droppers do so by limiting the rate at which a particular 948 traffic stream is submitted to the network. 950 6.1 Marker 952 Markers are 1:1 elements which set the DSCP in an IP header (in 953 the case of unlabeled packets). Markers may act on unmarked packets 954 (submitted with DSCP of zero) or may re-mark previously marked 955 packets. In particular, the model supports the application of 956 marking based on a preceding classifier match. The DSCP set in a 957 packet will determine its subsequent treatment in downstream nodes 958 of a network, and possibly in subsequent processing stages within the 959 router (depending on configuration). 961 Markers are normally parameterized by a single parameter: the 6-bit 962 DSCP to be marked in the packet header. 964 ActionElement1: 965 Type: Marker 966 Mark: 010010 968 In the case of an MPLS-labeled packet, the marker is parameterized 969 by a 3-bit EXP value to be marked in the MPLS shim header. 971 6.2 Dropper 973 Droppers simply discard packets. There are no parameters for 974 droppers. Because a dropper is a terminating point of the datapath, 975 it may be desirable to forward the packet through a monitor first 976 for instrumentation purposes. 978 Droppers are not the only elements that can cause a packet to be 979 discarded. The other element is an enqueueing element (see Sec. 980 7).
However, since the enqueueing element's behavior is closely 981 tied to the state of one or more queues, we choose to distinguish them 982 as separate functional elements. 984 6.3 Shaper 986 Shapers are used to shape traffic streams to a certain temporal 987 profile. For example, a shaper can be used to smooth traffic 988 arriving in bursts. In [DSARCH] a shaper is described as a 989 queueing element controlled by a meter which defines its temporal 990 profile. This model of a shaper differs substantially from typical 991 shaper implementations. Further, with the inclusion of queueing 992 elements in the model a separate shaping element becomes confusing. 993 Therefore, the function of a shaper is embedded in a queue and is 994 covered in Sec. 7. 996 6.4 Replicating Element 998 It is occasionally desirable to replicate traffic on one or more 999 additional interfaces for data collection purposes. A replicating 1000 element is a 1:N (fan-out) element. Unlike a classifier or meter, however, every packet 1001 follows every output path. A replicating element is 1002 parameterized by the number of outputs it supports. 1004 6.5 Mux 1006 It is occasionally necessary to multiplex traffic streams into a 1:1 1007 or 1:N action element or classifier. An M:1 (fan-in) mux is a simple 1008 logical device for merging traffic streams. It is parameterized by 1009 its number of incoming ports. 1011 6.6 Monitor 1013 One passive action is to account for the fact that a data packet was 1014 processed. The statistics that result might be used later for 1015 customer billing, service verification, or network engineering 1016 purposes. Monitors are 1:1 functional elements which update an 1017 octet counter by L and a packet counter by 1 every time an L-byte 1018 packet passes through them. Monitors can also be used to count 1019 packets that are about to be dropped by a dropper. 1021 6.7 Null Action 1023 A null action has one input and one output.
The element performs no 1024 action on the packet. Such an element is useful to define in the 1025 event that the configuration or management interface does not have 1026 the flexibility to omit an action element in a datapath segment. 1028 7. Queueing block 1030 The queueing block modulates the transmission of packets belonging to 1031 the different traffic streams and determines their ordering, possibly 1032 storing them temporarily or discarding them. Packets are usually 1033 stored either because there is a resource constraint (e.g., available 1034 bandwidth) which prevents immediate forwarding, or because the 1035 queueing block is being used to alter the temporal properties of a 1036 traffic stream (i.e., shaping). Packets are discarded because 1037 of buffering limitations, because a buffer threshold is exceeded 1038 (including when shaping is performed), as a feedback control signal 1039 to reactive control protocols such as TCP, or because a meter detects that a 1040 configured rate has been exceeded (i.e., policing). 1042 The queueing block in this model is a logical abstraction of a 1043 queueing system, which is used to configure PHB-related parameters. 1044 There is no requirement to conform to this model. The model can be used to 1045 represent a broad variety of possible implementations. However, it 1046 need not necessarily map one-to-one with physical queueing systems in 1047 a specific router implementation. Implementors should map the 1048 configurable parameters of the implementation's queueing systems to 1049 these queueing block parameters as appropriate to achieve equivalent 1050 behaviors. 1052 7.1 Model 1054 Queueing is a function which lends itself to innovation. It must be 1055 modelled to allow a broad range of possible implementations to be 1056 represented using common structures and parameters. This model uses 1057 functional decomposition as a tool to permit the needed latitude.
1059 Queueing systems, such as the queueing block defined in this model, 1060 perform three distinct, but related, functions: they store packets, 1061 they modulate the departure of packets belonging to various traffic 1062 streams and they selectively discard packets. This model decomposes 1063 the queueing block into the component elements that perform each of 1064 these functions. These elements may be connected together 1065 either dynamically or statically to construct queueing blocks. A 1066 queueing block is thus composed of one or more FIFOs, one or more 1067 schedulers, and one or more discarders. See figure TBA for an example 1068 of a queueing block. 1070 Note that the term FIFO is overloaded (i.e., has more than one 1071 meaning). In common usage it is taken to mean, among other things, a 1072 data structure that permits items to be removed only in the order in 1073 which they were inserted, and a service discipline which is non- 1074 reordering. 1076 7.1.1 FIFO 1078 A FIFO element is a data structure which at any time may contain zero 1079 or more packets. It may have one or more thresholds associated with 1080 it. A FIFO has one or more inputs and exactly one output. It must 1081 support an enqueue operation to add a packet to the tail of the 1082 queue, and a dequeue operation to remove a packet from the head of 1083 the queue. Packets must be dequeued in the order in which they were 1084 enqueued. A FIFO has a depth, which indicates the number of packets 1085 that it contains at a particular time; this is a traffic-dependent 1086 variable and is not used to configure a FIFO. 1088 Typically, the FIFO element of this model will be implemented as a 1089 FIFO data structure. However, this does not preclude implementations 1090 which are not strictly FIFO, in that they also support operations 1091 that remove or examine packets (e.g., for use by discarders) other 1092 than at the tail.
However, such operations MUST NOT have the effect 1093 of reordering packets belonging to the same microflow. 1095 In an implementation, packets are presumably stored in one or more 1096 buffer. Buffers are allocated from one or more free buffer pool. If 1097 there are multiple instances of a FIFO, their packet buffers may or 1098 may not be allocated out of the same free buffer pool. Free buffer 1099 pools may also have one or more threshold associated with them, which 1100 may affect discarding and/or scheduling. Otherwise, buffering 1101 mechanisms are implementation specific and not part of this model. 1103 A FIFO might be represented using the following parameters: 1105 FIFO1: 1106 Type: FIFO 1107 Input: QueuingBlock.input1 1108 Output: Discarder2 1109 Threshold1: 3 packets 1111 Another FIFO may be represented using the following parameters: 1113 FIFO2: 1114 Type: FIFO 1115 Input: Discarder1 1116 Output: Scheduler1 1117 Threshold1: 3 packets 1118 Threshold2: 1000 octets 1119 Threshold3: 10 packets 1120 Threshold4: 2000 octets 1122 7.1.2 Scheduler 1124 A scheduler is an element which gates the departure of each packet 1125 that arrives at one of its inputs, based on a service discipline. It 1126 has one or more input and exactly one output. Each input has an 1127 upstream element to which it is connected, and a set of parameters 1128 that affects the scheduling of packets received at that input. 1130 The service discipline (also known as a scheduling algorithm) is an 1131 algorithm which may take as its inputs static parameters (such as 1132 relative priority, and/or absolute token bucket parameters for 1133 maximum or minimum rates) associated with each of the scheduler's 1134 inputs; parameters (such as packet length or DSCP) associated with 1135 the packet present at its input; absolute time and/or local state. 
1137 Possible service disciplines fall into a number of categories, 1138 including (but not limited to) first come, first served (FCFS), 1139 strict priority, weighted fair bandwidth sharing (e.g., WFQ, 1140 WRR), rate-limited strict priority, and rate-based. Service 1141 disciplines can be further distinguished by whether they are work 1142 conserving or non-work conserving. A work conserving service 1143 discipline transmits a packet at every transmission opportunity if 1144 a packet is available. A non-work conserving service discipline transmits 1145 packets no sooner than a scheduled departure time, even if it means 1146 leaving packets in a FIFO while the link is idle. Non-work 1147 conserving schedulers can be used to shape traffic streams by 1148 delaying packets that would be deemed non-conforming by some traffic 1149 profile. The packet is delayed until such time as it would conform 1150 to a meter using the same profile. 1152 [DSARCH] defines PHBs without specifying required scheduling 1153 algorithms. However, PHBs such as the class selectors [DSFIELD], 1154 EF [EF-PHB] and AF [AF-PHB] have descriptions or 1155 configuration parameters which strongly suggest the sort of 1156 scheduling discipline needed to implement them. This memo specifies 1157 a minimal set of queue parameters to enable realization of these per- 1158 hop behaviors. It does not attempt to specify an all-embracing 1159 set of parameters to cover all possible implementation models. 1160 The minimum set includes a minimum service rate profile, a 1161 service priority and a maximum service rate profile (the latter is 1162 for use only with a non-work conserving service discipline). The 1163 minimum service rate allows rate guarantees for each traffic stream 1164 as required by EF and AF without specifying the details of how excess 1165 bandwidth between these traffic streams is shared.
Additional 1166 parameters to control this behavior should be made available, but are 1167 dependent on the particular scheduling algorithm implemented. The 1168 service priority is used only after the MinRateProfiles of all inputs 1169 have been satisfied in order to decide how to allocate any remaining 1170 bandwidth. It could be used for the class selectors. For the EF PHB, 1171 using a strict priority scheduling algorithm on some links, and assuming 1172 that the aggregate EF rate has been appropriately bounded to avoid 1173 starvation, for this scheduler the MinRateProfile would be reported 1174 as zero and the MaxRateProfile reported as line rate. Setting the 1175 service priority of each input to the scheduler to the same value 1176 enables the scheduler to satisfy the minimum service rates for each 1177 input, so long as the sum of all minimum service rates is less than 1178 or equal to the line rate. 1180 A non-work conserving scheduler might be represented using the 1181 following parameters: 1183 Scheduler1: 1184 Type: Scheduler 1186 Input1: Discarder1 1187 MaxRateProfile: Profile1 1188 MinRateProfile: Profile2 1189 Priority: None 1191 Input2: Discarder1 1192 MaxRateProfile: Profile3 1193 MinRateProfile: Profile4 1194 Priority: None 1196 A work conserving scheduler might be represented using the 1197 following parameters: 1199 Scheduler2: 1200 Type: Scheduler 1202 Input1: Scheduler1, 1203 MaxRateProfile: WorkConserving 1204 MinRateProfile: Profile5 1205 Priority: 1 1207 Input2: FIFO2 1208 MaxRateProfile: WorkConserving 1209 MinRateProfile: Profile6 1210 Priority: 2 1212 Input3: FIFO3 1213 MaxRateProfile: WorkConserving 1214 MinRateProfile: None 1215 Priority: 3 1217 7.1.3 Discarder 1219 A discarder is an element which selectively discards packets that 1220 arrive at its input, based on a discarding discipline. It has one 1221 input and one output. 
In this model (but not necessarily in a real 1222 implementation), a packet enters the discarder at the input, and 1223 either its buffer is returned to a free buffer pool or it exits the 1224 discarder at the output. 1226 Alternatively, a discarder may invoke operations on a FIFO which 1227 selectively remove packets, then return those packets to the free 1228 buffer pool, based on a discarding discipline. In this case, the 1229 discarder's operation is modelled as a side-effect on the FIFO upon 1230 which it operates, rather than as having a discrete input and output. 1232 A discarder has a trigger that causes the discarder to make a 1233 decision whether or not to drop one (or possibly more than one) 1234 packet. The trigger may be internal (i.e., the arrival of a packet at 1235 the input to the discarder), or it may be external (i.e., resulting 1236 from one or more state changes at another element, such as a FIFO 1237 depth exceeding a threshold or a scheduling event). A trigger may be 1238 a boolean combination of events (e.g., a FIFO depth exceeding a 1239 threshold OR a buffer pool depth falling below a threshold). 1241 The discarding discipline is an algorithm which makes a decision to 1242 forward or discard a packet. It takes as its inputs some set of 1243 dynamic parameters (e.g., averaged or instantaneous FIFO depth) and 1244 some set of static parameters (e.g., thresholds) and possibly 1245 parameters associated with the packet (e.g., its PHB, as determined by 1246 a classifier). It may also have internal state. RED, RIO, and drop- 1247 on-threshold are examples of discarding disciplines. Tail dropping 1248 and head dropping are effected by the location of the discarder 1249 relative to the FIFO. 1251 Note that although a discarder may need to examine the DSCP or 1252 possibly other fields in a packet, it may not modify them (i.e., 1253 it is not a marker).
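The relationship between a discarder, its trigger and a FIFO can be illustrated with a drop-on-threshold discipline. The sketch below places the discarder in front of the FIFO (tail dropping) and uses an internal trigger, i.e., the arrival of a packet at the discarder's input. The class names Fifo and DropOnThresholdDiscarder, and the choice to drop when the depth has reached the threshold, are assumptions made for illustration.

```python
from collections import deque


class Fifo:
    """Minimal FIFO element: enqueue at the tail, dequeue at the head."""

    def __init__(self):
        self._packets = deque()

    def enqueue(self, packet):
        self._packets.append(packet)

    def dequeue(self):
        return self._packets.popleft()

    @property
    def depth(self) -> int:
        # Traffic-dependent state, read by the discarder.
        return len(self._packets)


class DropOnThresholdDiscarder:
    """Discarder whose single output feeds a FIFO (tail dropping)."""

    def __init__(self, fifo: Fifo, threshold_packets: int):
        self.fifo = fifo
        self.threshold = threshold_packets  # static parameter
        self.dropped = 0

    def arrive(self, packet) -> bool:
        # Internal trigger: called for every packet arriving at the input.
        if self.fifo.depth >= self.threshold:
            self.dropped += 1   # packet is discarded, never modified
            return False
        self.fifo.enqueue(packet)  # packet exits at the output
        return True


fifo = Fifo()
discarder = DropOnThresholdDiscarder(fifo, threshold_packets=3)
results = [discarder.arrive(f"p{i}") for i in range(5)]
print(results, fifo.depth, discarder.dropped)
```

Placing the same discipline after the FIFO, operating on the packet at the head, would give head dropping instead.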
1255 A discarder might be represented using the following parameters: 1256 Discarder1: 1257 Type: Discarder 1258 Trigger: Internal 1259 Input: QueuingBlock.input2 1260 Output: FIFO2 1261 Discipline: RIO 1263 Parameters: 1264 In-MinTh: FIFO2.Threshold1 1265 In-MaxTh: FIFO2.Threshold2 1266 Out-MinTh: FIFO2.Threshold3 1267 Out-MaxTh: FIFO2.Threshold4 1268 InClassification: AFx1_PHB 1269 OutClassification: AFx2_PHB 1270 W_q: .002 1271 Max_p: .01 1273 Another discarder might be represented using the following parameters: 1274 Discarder2: 1275 Type: Discarder 1276 Trigger: 1277 Input: FIFO1 1278 Output: Scheduler1.input1 1279 Discipline: Drop-on-threshold 1281 Parameters: 1282 Threshold: FIFO1.Threshold1 1284 Yet another discarder (not part of the example) might be represented 1285 with the following parameters: 1286 Discarder3: 1287 Type: Discarder 1288 Operate_on: FIFO3 1289 Trigger: FIFO3.depth > 100 packets 1290 Discipline: Drop-all-out-packets 1292 Parameters: 1293 Out-DSCP: AFx2_recommended_DSCP | AFx3_recommended_DSCP 1295 7.1.4 Constructing queueing blocks from the elements 1297 A queueing block is constructed by concatenation of these elements 1298 so as to meet the meta-policy objectives of the implementation, 1299 subject to the grammar rules specified in this section. 1301 Elements of the same type may appear more than once in a queueing 1302 block, either in parallel or in series. Typically, a queueing block 1303 will have relatively many elements in parallel and few in series. 1304 Iteration and recursion are not supported constructs in this 1305 grammar. A queueing block must have at least one FIFO, at least 1306 one discarder, and at least one scheduler. The following 1307 connections are allowed: 1309 The input of a FIFO may be the input of the queueing block, or it 1310 may be connected to the output of a discarder or to an output of 1311 a scheduler.
1313 Each input of a scheduler may be connected to the output of a 1314 FIFO, to the output of a discarder or to the output of another 1315 scheduler. 1317 The input of a discarder which has a discrete input and output 1318 may be the input of the queue, or it may be connected to the 1319 output of a FIFO (e.g., head dropping). 1321 The output of the queueing block may be the output of a FIFO 1322 element, a discarding element or a scheduling element. 1324 Note, in particular, that schedulers may operate in series such 1325 that a packet at the head of a FIFO feeding the concatenated 1326 schedulers is serviced only after all of the scheduling criteria 1327 are met. For example, a FIFO which carries EF traffic streams 1328 may be served first by a non-work conserving scheduler to shape 1329 the stream to a maximum rate, then by a work conserving scheduler 1330 to mix EF traffic streams with other traffic streams. Alternatively, 1331 there might be a FIFO and/or a discarder between the two schedulers. 1333 7.2 Shaping 1334 Traffic shaping is often used to condition traffic such that packets 1335 will be deemed conforming by subsequent meters, e.g., in downstream 1336 Diffserv nodes. Shaping may also be used to isolate certain traffic 1337 streams from the effects of other traffic streams of the same BA. 1339 A shaper is realized in this model by using a non-work conserving 1340 scheduler. Some implementations may elect to have queues whose sole 1341 purpose is shaping, while others may integrate the shaping function 1342 with other buffering, discarding and scheduling associated with access 1343 to a resource. Shapers operate by delaying the departure of packets 1344 that would be deemed non-conforming by a meter configured to the shaper's 1345 maximum service rate profile. The packet is scheduled to depart no 1346 sooner than such time that it would become conforming. 1348 8. 
Traffic Conditioning Blocks (TCBs) 1350 The classifiers, meters, action elements, and queueing elements 1351 described above can be combined into traffic conditioning blocks 1352 (TCBs). The TCB is an abstraction of a functional element that may 1353 be used to facilitate the definition of specific traffic conditioning 1354 functionality. 1356 One of the simplest possible TCBs would consist of the following 1357 stages: 1359 1. Classifier stage 1360 2. Enqueueing stage 1361 3. Queueing stage 1363 Note that a classifier is a 1:N element, while an enqueueing stage is 1364 an N:1 element and a queue is a 1:1 element. If the classifier splits 1365 traffic across multiple enqueueing elements then the queueing stage 1366 may consist of a hierarchy of queue sets, all resulting in a 1:1 1367 abstract element. 1369 A more general TCB might consist of the following four stages: 1371 1. Classifier stage 1372 2. Metering stage 1373 3. Action stage 1374 4. Queueing stage 1376 where each stage may consist of a set of parallel datapaths 1377 consisting of pipelined elements. 1379 TCBs are constructed by connecting elements corresponding to these 1380 stages in any sensible order. It is possible to omit stages, to 1381 include null elements, or to concatenate multiple stages of the same 1382 type. TCB outputs may drive additional TCBs (on either the ingress 1383 or egress interfaces). Classifiers and meters are fan-out elements; 1384 muxes and enqueueing elements are fan-in elements.
1386 8.1 An Example TCB 1388 The following diagram illustrates an example TCB: 1390 +------------> to Queue A 1391 +-----+ | (not shown) 1392 | |--+ 1393 +->| | 1394 | | |--+ +-----+ +-----+ 1395 | +-----+ | | | | | 1396 | meter +->| |--->| | 1397 | | | | | 1398 | +-----+ +-----+ 1399 | monitor dropper 1400 | 1401 | 1402 | 1403 submitted +-----+ | +-----+ +-----+ 1404 traffic | A |-----+ | | | | 1405 --->| B |------->| |---->| |---> to Queue B 1406 | C |-----+ | | | | (not shown) 1407 | X |--+ | +-----+ +-----+ 1408 +-----+ | | marker shaper 1409 BA | | queue 1410 classifier| | 1411 | | 1412 | | 1413 | | 1414 | | 1415 | | +-----+ +-----+ 1416 | | | |--------------->| | to Queue C 1417 | +->| | | |-> 1418 | | |--+ +-----+ +->| | (not shown) 1419 | +-----+ | | | | +-----+ 1420 | meter +->| |-+ mux 1421 | | | 1422 | +-----+ 1423 | marker 1424 | 1425 +---------------------------> to Queue D 1426 (not shown) 1427 Figure 5: An Example Traffic Conditioning Block 1429 This sample TCB might be suitable for an ingress interface at a 1430 customer/provider boundary. A SLS is presumed to have been 1431 negotiated between the customer and the provider which specifies the 1432 handling of the customer's traffic by the provider's network. The 1433 agreement might be of the following form: 1435 DSCP PHB Profile Non-Conforming Packets 1436 ---- --- ------- ---------------------- 1437 001001 PHB1 Profile1 Discard 1438 001100 PHB2 Profile2 Wait in shaper queue 1439 001101 PHB3 Profile3 Re-mark to DSCP 001000 1441 It is implicit in this agreement that conforming packets are given 1442 the PHB originally indicated by the packets' DSCP field. It 1443 specifies that the customer may submit packets marked for DSCP 1444 001001 which will get PHB1 treatment so long as they remain 1445 conforming to Profile1 and will be discarded if they exceed this 1446 profile. Similar contract rules are applied for 001100 and 001101 1447 traffic. 
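The agreement above can be read as a lookup from the submitted DSCP and the meter's verdict to a treatment. The sketch below illustrates that reading only. The names SLS, REMARK_DSCP and treatment are hypothetical, and the handling of DSCPs that are not listed in the agreement (returned here as "default") is an assumption.

```python
# DSCP -> (PHB, profile, action for non-conforming packets)
SLS = {
    0b001001: ("PHB1", "Profile1", "discard"),
    0b001100: ("PHB2", "Profile2", "shape"),
    0b001101: ("PHB3", "Profile3", "remark"),
}
REMARK_DSCP = 0b001000  # target of "Re-mark to DSCP 001000"


def treatment(dscp: int, conforming: bool):
    """Return (treatment, outgoing DSCP) for a submitted packet."""
    entry = SLS.get(dscp)
    if entry is None:
        # Assumption: traffic outside the agreement gets a default path.
        return ("default", dscp)
    phb, _profile, nonconforming_action = entry
    if conforming:
        # Conforming packets get the PHB indicated by their DSCP.
        return (phb, dscp)
    if nonconforming_action == "discard":
        return ("discard", None)
    if nonconforming_action == "shape":
        return ("wait-in-shaper", dscp)
    return ("re-marked", REMARK_DSCP)


print(treatment(0b001001, True))
print(treatment(0b001001, False))
print(treatment(0b001101, False))
```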

   In this example, the classification stage consists of a single BA
   classifier.  The BA classifier is used to separate traffic based on
   the Diffserv service level requested by the customer (as indicated
   by the DSCP in each submitted packet's IP header).  We illustrate
   three DSCP filter values: A, B, and C.  The 'X' in the BA classifier
   is the default wildcard filter that matches every packet.

   A metering stage is next in the upper and lower branches.  There is a
   separate meter for each set of packets corresponding to DSCPs A and
   C.  Each meter uses a specific profile as specified in the TCS for
   the corresponding Diffserv service level.  The meters in this
   example indicate one of two conformance levels: conforming or
   non-conforming.  The middle branch has a marker which re-marks all
   packets received with DSCP B.

   Following the metering stage is the action stage in the upper and
   lower branches.  Packets submitted for DSCP A that are deemed
   non-conforming are counted and discarded.  Packets that are
   conforming are passed on to Queue A.  Packets submitted for DSCP C
   that are deemed non-conforming are re-marked, and then conforming and
   non-conforming packets are muxed together before being forwarded to
   Queue C.  Packets submitted for DSCP B are shaped to Profile2 before
   being forwarded to Queue B.

   The interconnections of the TCB elements illustrated in Figure 5 can
   be represented as follows:

   TCB1:

   Classifier1:
      Output A --> Meter1
      Output B --> Marker1
      Output C --> Meter2
      Output X --> QueueD

   Meter1:
      Output A --> QueueA
      Output B --> Monitor1

   Monitor1:
      Output A --> Dropper1

   Marker1:
      Output A --> Shaper1

   Shaper1:
      Output A --> QueueB

   Meter2:
      Output A --> Mux1
      Output B --> Marker2

   Marker2:
      Output A --> Mux1

   Mux1:
      Output A --> QueueC

8.2 An Example TCB to Support Multiple Customers

   The TCB described above can be installed on an ingress interface to
   implement a provider/customer TCS if the interface is dedicated to
   the customer.  However, if a single interface is shared between
   multiple customers, then the TCB above will not suffice, since it
   does not differentiate among traffic from different customers.  Its
   classification stage uses only BA classifiers.

   The TCB is readily extended to support the case of multiple customers
   per interface, as follows.  First, we define a TCB for each customer
   to reflect the TCS with that customer.  TCB1, defined above, is the
   TCB for customer 1.  We add definitions for TCB2 and for TCB3 which
   reflect the agreements with customers 2 and 3 respectively.

   Finally, we add a classifier which provides a front end to separate
   the traffic from the three different customers.
   This forms a new TCB which incorporates TCB1, TCB2, and TCB3, and can
   be illustrated as follows:

   submitted +-----+
   traffic   |  A  |--------> TCB1
         --->|  B  |--------> TCB2
             |  C  |--------> TCB3
             |  X  |--------> Dropper4
             +-----+
           Classifier4

             Figure 6: An Example of a Multi-Customer TCB

   A formal representation of this multi-customer TCB might be:

   TCB1:
      (as defined above)

   TCB2:
      (similar to TCB1, perhaps with different numeric parameters)

   TCB3:
      (similar to TCB1, perhaps with different numeric parameters)

   TCB4:
      (the total TCB)

   Classifier4:
      Output A --> TCB1
      Output B --> TCB2
      Output C --> TCB3
      Output X --> Dropper4

   where Classifier4 is defined as follows:

   Classifier4:
      Filter1:  Output A
      Filter2:  Output B
      Filter3:  Output C
      No Match: Output X

   and the filters, based on each customer's source MAC address, are
   defined as follows:

   Filter1:
      Type:      MacAddress
      SrcValue:  01-02-03-04-05-06 (source MAC address of customer 1)
      SrcMask:   FF-FF-FF-FF-FF-FF
      DestValue: 00-00-00-00-00-00
      DestMask:  00-00-00-00-00-00

   Filter2:
      (similar to Filter1 but with customer 2's source MAC address as
      SrcValue)

   Filter3:
      (similar to Filter1 but with customer 3's source MAC address as
      SrcValue)

   In this example, Classifier4 separates traffic submitted from
   different customers based on the source MAC address in submitted
   packets.  Those packets with recognized source MAC addresses are
   passed to the TCB implementing the TCS with the corresponding
   customer.  Those packets with unrecognized source MAC addresses are
   passed to a dropper.

   TCB4 has a classification stage and an action element stage, which
   consists of either a dropper or another TCB.
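
   The value/mask matching performed by Classifier4 can be sketched as
   follows.  This is a minimal illustration rather than part of the
   model: the class MacFilter and the functions mac_to_int and classify
   are hypothetical names, and the source MAC addresses shown for
   customers 2 and 3 are invented placeholders, since the text above
   does not give them.

```python
from dataclasses import dataclass


def mac_to_int(mac: str) -> int:
    """Convert 'AA-BB-CC-DD-EE-FF' notation to a 48-bit integer."""
    return int(mac.replace("-", ""), 16)


@dataclass(frozen=True)
class MacFilter:
    """Hypothetical rendering of a 'Type: MacAddress' filter."""
    src_value: str
    src_mask: str = "FF-FF-FF-FF-FF-FF"    # compare every source bit
    dest_value: str = "00-00-00-00-00-00"
    dest_mask: str = "00-00-00-00-00-00"   # ignore the destination

    def matches(self, src: str, dest: str) -> bool:
        # A field matches when it equals the filter value on every bit
        # selected by the mask; an all-zero mask matches anything.
        s_mask = mac_to_int(self.src_mask)
        d_mask = mac_to_int(self.dest_mask)
        src_ok = (mac_to_int(src) & s_mask) == \
                 (mac_to_int(self.src_value) & s_mask)
        dest_ok = (mac_to_int(dest) & d_mask) == \
                  (mac_to_int(self.dest_value) & d_mask)
        return src_ok and dest_ok


# Filter1..Filter3 paired with Outputs A..C.  Only the first address
# comes from the text; the other two are assumed for illustration.
CLASSIFIER4 = [
    (MacFilter("01-02-03-04-05-06"), "TCB1"),
    (MacFilter("01-02-03-04-05-07"), "TCB2"),  # assumed address
    (MacFilter("01-02-03-04-05-08"), "TCB3"),  # assumed address
]


def classify(src: str, dest: str) -> str:
    """Return the element that receives a packet with these addresses."""
    for mac_filter, target in CLASSIFIER4:
        if mac_filter.matches(src, dest):
            return target
    return "Dropper4"  # "No Match" maps to Output X
```

   With these definitions, a packet whose source MAC address is
   01-02-03-04-05-06 is handed to TCB1 regardless of its destination
   address, and a packet from an unrecognized source is handed to
   Dropper4.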

8.3 TCBs Supporting Microflow-based Services

   The TCB illustrated above describes a configuration that might be
   suitable for enforcing an SLS at a router's ingress.  It assumes that
   the customer marks its own traffic for the appropriate service level.
   It then limits the rate of aggregate traffic submitted at each
   service level, thereby protecting the resources of the Diffserv
   network.  It does not provide any isolation between the customer's
   individual microflows (other than that provided by separate
   queueing).

   Next we present a TCB configuration that offers additional
   functionality to the customer.  It recognizes individual customer
   microflows and marks each one independently.  It also isolates the
   customer's individual microflows from each other in order to prevent
   a single microflow from seizing an unfair share of the resources
   available to the customer at a certain service level.  This is
   illustrated in Figure 7 below:

                    +-----+   +-----+
                    |     |   |     |----------------+
                 +->|     |-->|     |     +-----+    |
      +-----+    |  |     |   |     |---->|     |    |
      |     |----+  +-----+   +-----+     +-----+    |
    ->|     |-----+  marker    meter      dropper    |   +-----+  to
      |     |---+ |  +-----+   +-----+               +-->|     |
      +-----+   | |  |     |   |     |------------------>|     |--->
        MF      | +->|     |-->|     |     +-----+   +-->|     |
      class.    |    |     |   |     |---->|     |   |   +-----+ TCB2
                |    +-----+   +-----+     +-----+   |     mux
                |    marker     meter      dropper   |
                |    +-----+   +-----+               |
                |    |     |   |     |---------------+
                +--->|     |-->|     |     +-----+
                |    |     |   |     |---->|     |
                |    +-----+   +-----+     +-----+
                |    marker     meter      dropper
                |       .         .           .
                V       V         V           V

       Figure 7: An Example of a Marking and Traffic Isolation TCB

   Traffic is first directed to an MF classifier which classifies
   traffic based on miscellaneous classification criteria, to a
   granularity sufficient to identify individual customer microflows.
   Each
Each 1635 microflow can then be marked for a specific DSCP (in this particular 1636 example we assume that one of two different DSCPs is marked). The 1637 metering stage limits the contribution of each of the customer's 1638 microflows to the service level for which it was marked. Packets 1639 exceeding the allowable limit for the microflow are dropped. 1641 The TCB could be formally specified as follows: 1643 TCB1: 1644 Classifier1: (MF) 1645 Output A --> Marker1 1646 Output B --> Marker2 1647 Output C --> Marker3 1648 . . . 1650 Marker1 --> Meter1 1651 Marker2 --> Meter2 1652 Marker3 --> Meter3 1654 Meter1: 1655 Output A --> TCB2 1656 Output B --> ActionElement1 (dropper) 1658 Meter2: 1659 Output A --> TCB2 1660 Output B --> ActionElement2 (dropper) 1662 Meter3: 1663 Output A --> TCB2 1664 Output B --> ActionElement3 (dropper) 1666 The actual traffic element declarations are not shown here. 1668 Traffic is either dropped by TCB1 or emerges marked for one of two 1669 DSCPs. This traffic is then passed to TCB2, illustrated below: 1671 +-----+ 1672 | |---------------> 1673 +->| | +-----+ 1674 +-----+ | | |---->| | 1675 | |---+ +-----+ +-----+ 1676 ->| | meter dropper 1677 | |---+ +-----+ 1678 +-----+ | | |---------------> 1679 BA +->| | +-----+ 1680 classifier | |---->| | 1681 +-----+ +-----+ 1682 meter dropper 1684 Figure 8: Additional Example TCB 1686 TCB2 would be formally specified as follows: 1688 Classifier2: (BA) 1689 Output A --> Meter10 1690 Output B --> Meter11 1691 Meter10: 1692 Output A --> PHBQueueA 1693 Output B --> Dropper10 1695 Meter11: 1696 Output A --> PHBQueueB 1697 Output B --> Dropper11 1699 8.4 Cascaded TCBs 1701 Conceptually, nothing prevents more complex scenarios in which one 1702 microflow TCB precedes another (for example, TCBs implementing 1703 separate TCS's for the source and for a set of destinations). 1705 9. 
Open Issues 1707 o There is a difference in interpretation of token bucket behavior 1708 between this document (Appendix A) and [DSMIB]. Specifically, 1709 [DSMIB] allows a packet to conform if any smaller packet would 1710 conform. 1712 o The meter in [SRTCM] cannot be precisely modeled using two 1713 two-parameter token buckets because its two buckets do not 1714 accumulate credits independently. We intended to demonstrate how 1715 the [TRTCM] meter could be implemented but ran out of time. 1717 o Are the queue parameters (scheduling and buffer management) 1718 parameters defined sufficient? 1720 o Does Queue and Queue Set really belong in the model (and the MIB 1721 and PIB?), or should the model stick to the abstract PHB 1722 representation and leave the implementation details to the MIB and 1723 PIB? 1725 o Should a classifier be part of a TCB? We argue yes. This allows a 1726 TCB to be a one input/one output black box element. 1728 o Is the description of a shaper sufficient? Is it overbroad? 1730 10. Security Considerations 1732 Security vulnerabilities of Diffserv network operation are discussed 1733 in [DSARCH]. This document describes an abstract functional model of 1734 Diffserv router elements. Certain denial-of-service attacks such as 1735 those resulting from resource starvation may be mitigated by 1736 appropriate configuration of these router elements; for example, by 1737 rate limiting certain traffic streams or by authenticating traffic 1738 marked for higher quality-of-service. 1740 11. Acknowledgments 1742 Concepts, terminology, and text have been borrowed liberally from 1743 [DSMIB] and [PIB]. We wish to thank the authors: Fred Baker, 1744 Michael Fine, Keith McCloghrie, John Seligson, Kwok Chan, and 1745 Scott Hahn, for their permission. 1747 This document has benefitted from the comments and suggestions of 1748 several participants of the Diffserv working group. 1750 12. References 1752 [DSARCH] M. Carlson, W. Weiss, S. Blake, Z. Wang, D. 
             Black, and E. Davies, "An Architecture for Differentiated
             Services", RFC 2475, December 1998.

   [DSTERMS] D. Grossman, "New Terminology for Diffserv", Internet
             Draft, October 1999.

   [E2E]     Y. Bernet, R. Yavatkar, P. Ford, F. Baker, L. Zhang,
             M. Speer, K. Nichols, R. Braden, B. Davie, J. Wroclawski,
             and E. Felstaine, "Integrated Services Operation over
             Diffserv Networks", Internet Draft, September 1999.

   [DSFIELD] K. Nichols, S. Blake, F. Baker, and D. Black,
             "Definition of the Differentiated Services Field (DS
             Field) in the IPv4 and IPv6 Headers", RFC 2474, December
             1998.

   [EF-PHB]  V. Jacobson, K. Nichols, and K. Poduri, "An Expedited
             Forwarding PHB", RFC 2598, June 1999.

   [AF-PHB]  J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski,
             "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [DSMIB]   F. Baker, "Differentiated Services MIB", Internet Draft,
             June 1999.

   [SRTCM]   J. Heinanen and R. Guerin, "A Single Rate Three Color
             Marker", RFC 2697, September 1999.

   [PIB]     M. Fine, K. McCloghrie, J. Seligson, K. Chan, S. Hahn,
             and A. Smith, "Quality of Service Policy Information
             Base", Internet Draft, June 1999.

   [TRTCM]   J. Heinanen and R. Guerin, "A Two Rate Three Color
             Marker", RFC 2698, September 1999.

   [GTC]     L. Lin, J. Lo, and F. Ou, "A Generic Traffic Conditioner",
             Internet Draft, August 1999.

   [MPLSDS]  J. Heinanen, "Differentiated Services in MPLS Networks",
             Internet Draft, June 1999.

Appendix A.  Simple Token Bucket Definition

   [DSMIB] presents a fairly detailed exposition on the operation of
   two-parameter token buckets for metering.  However, the behavior
   described does not appear to be consistent with the behavior defined
   in [SRTCM] and [TRTCM].
   Specifically, under the definition in [DSMIB], a packet is assumed
   to conform to the meter if any of its bytes would have been
   accepted, while in [SRTCM] and [TRTCM], a packet is assumed to
   conform only if sufficient tokens are available for every byte in
   the packet.  Further, a packet has no effect on the token occupancy
   if it does not conform (no tokens are decremented).

   The behavior defined in [SRTCM] and [TRTCM] is not mandatory for
   compliance, but we give here a mathematical definition of two-
   parameter token bucket operation which is consistent with these
   documents, and which can be used to define a shaping profile.

   Define a token bucket with bucket size BS, token accumulation rate
   R, and instantaneous token occupancy T(t).  Assume that T(0) = BS.

   Then after an arbitrary interval with no packet arrivals, T(t) will
   not change since the bucket is already full of tokens.  Assume a
   packet of size B bytes arrives at time t'.  The token occupancy
   T(t'-) is still BS.  Then, as long as B <= BS, the packet conforms
   to the meter, and

      T(t') = BS - B.

   Assume an interval v = t - t' elapses before the next packet, of
   size C <= BS, arrives.  T(t-) is given by the following equation:

      T(t-) = min { BS, T(t') + v*R }

   (the bucket has accumulated v*R tokens over the interval, up to a
   maximum of BS tokens).

   If T(t-) - C >= 0, the packet conforms and T(t) = T(t-) - C.
   Otherwise, the packet does not conform and T(t) = T(t-).

   This function can be used to define a shaping profile.  If a packet
   of size C arrives at time t, it will be eligible for transmission at
   time te given as follows (we still assume C <= BS):

      te = max { t, t" }

   where

      t" = (C - T(t') + t'*R)/R.

   t" is the time at which T(t") = C, that is, the time when C credits
   have accumulated in the bucket and when the packet would conform if
   the token bucket were a meter.  te != t" only if t > t".
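
   The definitions above can be exercised with a short sketch.  It is
   only an illustration under stated assumptions: the class name
   TokenBucket and its method names are hypothetical, sizes are in
   bytes, times are in seconds, and the shaping query assumes
   C <= BS as in the text.

```python
class TokenBucket:
    """Two-parameter token bucket following the definition above."""

    def __init__(self, bucket_size: float, rate: float) -> None:
        self.bs = float(bucket_size)  # BS, bucket size
        self.rate = float(rate)       # R, token accumulation rate
        self.tokens = self.bs         # T(t'), with T(0) = BS
        self.last = 0.0               # t', time of the last update

    def occupancy_before(self, t: float) -> float:
        """T(t-) = min { BS, T(t') + v*R } with v = t - t'."""
        return min(self.bs, self.tokens + (t - self.last) * self.rate)

    def meter(self, t: float, size: float) -> bool:
        """Meter a packet of the given size arriving at time t."""
        available = self.occupancy_before(t)
        self.last = t
        if available - size >= 0:
            self.tokens = available - size  # T(t) = T(t-) - C
            return True
        self.tokens = available             # T(t) = T(t-), no decrement
        return False

    def eligible_time(self, t: float, size: float) -> float:
        """Shaping query: te = max { t, t" }.  Does not change state."""
        t2 = (size - self.tokens + self.last * self.rate) / self.rate
        return max(t, t2)
```

   As a usage example, take BS = 1000 and R = 100.  A 600-byte packet at
   t = 0 conforms and leaves T = 400.  A 700-byte packet at t = 2 finds
   T(t-) = 600 and does not conform, while the same packet at t = 3
   finds T(t-) = 700 and conforms.  Used as a shaper after the first
   packet, a 700-byte packet arriving at t = 2 becomes eligible at
   te = 3.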

Authors' Addresses

   Yoram Bernet
   Microsoft
   One Microsoft Way
   Redmond, WA  98052
   Phone:  +1 425 936 9568
   E-mail: yoramb@microsoft.com

   Andrew Smith
   Extreme Networks
   3585 Monroe St.
   Santa Clara, CA  95051
   Phone:  +1 408 579 2821
   E-mail: andrew@extremenetworks.com

   Steven Blake
   Ericsson
   920 Main Campus Drive, Suite 500
   Raleigh, NC  27606
   Phone:  +1 919 472 9913
   E-mail: slblake@torrentnet.com

   Daniel Grossman
   Motorola Inc.
   20 Cabot Blvd.
   Mansfield, MA  02048
   Phone:  +1 508 261 5312
   E-mail: dan@dma.isg.mot.com