Internet Draft                                             J. Wroclawski
draft-ietf-issll-ds-map-01.txt                                   MIT LCS
Expires August, 2001                                           A. Charny
                                                           Cisco Systems
                                                          February, 2001

     Integrated Service Mappings for Differentiated Services Networks

Status of this Memo

This document is an Internet Draft and is in full conformance with all
provisions of Section 10 of RFC2026. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.ietf.org (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

This document is a product of the ISSLL working group of the Internet
Engineering Task Force. Please address comments to the group's mailing
list at issll@mercury.lcs.mit.edu, with a copy to the authors.

Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract

This document describes mappings of IETF Integrated Services onto IETF
differentiated services networks. These mappings allow appropriately
engineered and configured differentiated service network clouds to play
the role of "network elements" in the Integrated Services framework,
and thus to be used as components of an overall end-to-end Integrated
Services QoS solution.

1. Introduction

The IETF Integrated Services framework [INTSERV] defines mechanisms and
interfaces for providing network Quality of Service control useful for
applications that require more predictable network service than is
available with the traditional best-effort IP delivery model.
Provision of end-to-end QoS control in the Intserv model is based on
the concatenation of "network elements" along the data transmission
path. When all of the concatenated network elements implement one of
the defined Intserv "services" [G,CL], the resulting data transmission
path will deliver a known, controlled QoS defined by the particular
Intserv service in use.

The IETF Differentiated Services framework [DIFFSERV] defines a number
of mechanisms for differentiating traffic streams within a network and
providing different levels of delivery service to those streams. These
mechanisms include differentiated per-hop queuing and forwarding
behaviors, as well as behaviors such as traffic classification,
metering, policing and shaping that are intended to be used at the edge
or boundary of a diffserv cloud. Crucially, the Differentiated
Services framework manages traffic forwarding behavior within a
diffserv cloud at the aggregate level, rather than the
per-application-flow level.
The availability of Differentiated Services per-hop and cloud-edge
behaviors, together with additional mechanisms to statically or
dynamically limit the absolute level of traffic within a traffic class,
allows an IETF Differentiated Services network cloud to act as a
network element within the Integrated Services framework. In other
words, an appropriately designed, configured and managed Diffserv
network cloud can act as one component of an overall end-to-end QoS
controlled data path using the Integrated Services framework, and
therefore support the delivery of Intserv QoS services.

This document is one of a set that together describe the usage of
Differentiated Services networks in this manner. This document
describes methods for implementing Intserv using Diffserv network
behaviors and mechanisms. Companion documents [RSVPAGGR, DCLASS]
define extensions to the RSVP signaling protocol [RSVP] that are useful
in this environment. It is recommended that readers be familiar with
the overall framework in which these mappings and protocols are
expected to be used; this framework is described fully in [ISDSFRAME].

Within this document, Section 2 describes the overall approach and
discusses issues that are independent of the class of Intserv service
being implemented. Section 3 discusses implementation of the
Controlled Load service. Section 4 discusses implementation of a
mathematically correct Guaranteed service, and presents information
about the performance and limitations of this implementation. Section
5 discusses implementation of close approximations to the Guaranteed
service that may be acceptable in some circumstances and may allow more
efficient use of network resources. Section 6 briefly describes the
relationship of the mechanisms described here to the Intserv Null
Service [NULL].

2. Basics

2.1. Components

Figure 1 shows the basic use of a Diffserv network cloud as an Intserv
network element. In this figure, Intserv functions within the
non-Diffserv regions take place at the level of individual switches,
routers, subnets, and similar objects. In contrast, the entire
Diffserv region acts as a _single_ Intserv network element, using
components of the Diffserv architecture to implement the behaviors
expected of an object in the Intserv environment.

         ________      ______________      ________
        /        \    /              \    /        \
       /          \  /                \  /          \
 |---| |        |---|   |---|    |---|   |---|        | |---|
 |Tx |-|--O--O--|ER1|---|BR1|    |BR2|---|ER2|--O--O--|-|Rx |
 |---| |        |---|   |---|    |---|   |---|        | |---|
       \          /  \                /  \          /
        \________/    \______________/    \________/

 Non-Diffserv region   Diffserv region   Non-Diffserv region

                Figure 1: Sample Network Configuration

The figure shows that required Intserv network element functions are
mapped to the Diffserv cloud as follows:

- Traffic scheduling. The Intserv traffic scheduling function is
  supported by appropriately selected, configured, and provisioned
  PHB's within the Diffserv network. These PHB's, when concatenated
  along the path of traffic flow, must provide a scheduling result that
  adequately approximates the result defined by the Intserv service.

  In general, the PHB concatenation will only be able to approximate
  the defined Intserv service over a limited range of operating
  conditions (level of traffic, allocated resources, and the like). In
  that case, other elements of the network, such as shapers and
  policers, must ensure that the traffic conditions seen by the PHB's
  stay within this range.

- Traffic classification. The Intserv framework requires that each
  network element (re)classify arriving traffic into flows for further
  processing.
  This requirement is based on the architectural assumption that
  network elements should be independent, and not depend on other
  network elements for correct operation.

  NOTE: the Intserv framework does not specify the granularity of a
  flow. Intserv is often associated with per-application or
  per-session end-to-end flows, but in fact any collection of packets
  that can be described by an appropriate classifier can be treated as
  an Intserv traffic flow.

  When Intserv is mapped to Diffserv, packets must be classified into
  flows, policed, shaped, and marked with the appropriate DSCP before
  they enter the interior of the diffserv cloud. Strictly speaking,
  the independence requirement stated above implies that the ingress
  boundary router of each diffserv cloud must implement an MF
  classifier to perform the classification function. However, in
  keeping with the diffserv model, it is permissible to push the flow
  classification function further towards the edge of the network if
  appropriate agreements are in place. For example, flows may be
  classified and marked by the upstream edge router if the Diffserv
  network is prepared to trust this router.

- Policing and shaping. In terms of location in the network, these
  functions are similar to traffic classification. A strict
  interpretation of the Intserv framework would require that the
  ingress boundary router of the diffserv cloud perform these
  functions. In practice, they may be pushed to an upstream edge
  router if appropriate agreements are in place.

  Note that moving the shaping function upstream of the diffserv
  ingress boundary router may result in poorer overall QoS
  performance.
  This is because if shaping is performed at the boundary router, a
  single shaper can be applied to all of the traffic in the service
  class, whereas if the shaping is performed upstream, separate shapers
  will be applied to the traffic from each upstream node. As discussed
  further in Section 4, the single shaper may be preferable in some
  circumstances.

- Admission control. The quantitative Intserv services (Guaranteed and
  Controlled Load) require that some form of admission control limit
  the amount of arriving traffic relative to the available resources.
  Two issues are of interest: the method used by the diffserv cloud to
  determine whether sufficient resources are available, and the method
  used by the overall network to query the diffserv cloud about this
  availability.

  Within the cloud, the admission control *mechanism* is closely
  related to resource allocation. If some form of static resource
  allocation (provisioning) is used, the admission control function
  can be performed by any network component that is aware of this
  allocation, such as a properly configured boundary router. If
  resource allocation within the network cloud is dynamic (a dynamic
  "bandwidth broker" or signaling protocol), then this protocol can
  also perform the admission control function, by refusing to admit
  new traffic when it determines that it cannot allocate new resources
  to match.

  The admission control *mechanism* used is independent of the
  admission control *algorithm* used to determine whether sufficient
  resources are available to admit a new traffic flow. The algorithm
  used may range from simple peak-rate allocation to a complex
  statistical measurement-based approach. The choice of algorithm is
  dependent on the Intserv service to be supported. Admission control
  algorithms appropriate for each service are discussed in the
  service-specific sections below.
  The admission control mechanism used within the diffserv cloud is
  also independent of the mechanism used by the outside world to
  request service from the cloud. As an example, end-to-end RSVP might
  be used together with any form of interior admission control
  mechanism - static provisioning, a central bandwidth broker, or
  aggregate RSVP internal signalling.

2.2. Per-Cloud versus Per-Path Control

The key to providing absolute, quantitative QoS services within a
diffserv network is to ensure that at each hop in the network the
resources allocated to the PHB's used for these services are sufficient
to handle the arriving traffic. As described above, this can be done
through a spectrum of mechanisms ranging from static provisioning to
dynamic per-hop signaling within the cloud. Two situations are
possible:

- With per-cloud provisioning, sufficient resources are made available
  in the network so that traffic arriving at an ingress point can flow
  to *any* egress point without violating the PHB resource allocation
  requirements. In this case, admission control and traffic management
  decisions need not be based on destination information.

- With per-path provisioning, resources are made available in the
  network to ensure that the PHB resource allocation requirements will
  not be violated if traffic arriving at an ingress point flows to one
  (in the unicast case) specific egress point. This requires that
  admission control and resource allocation mechanisms take into
  account the egress point of traffic entering the network, but results
  in more efficient resource utilization.

Two points are important to note:

- Both approaches are valuable, but all functions must adopt the same
  approach.
  In particular, if resource allocation is per-path, then traffic
  shaping and policing, and hence classification, must be
  destination-aware as well.

- The per-cloud vs. per-path decision is independent of decisions
  about static vs. dynamic provisioning. It is often assumed that
  dynamic provisioning is necessarily per-path, while static
  provisioning is more likely to be per-cloud. In reality, all four
  options may be useful in differing circumstances.

3. Implementation of the Controlled Load Service

3.1. Summary of CL Requirements

The essence of the Controlled Load service is that traffic using it
experiences the performance expected of an unloaded network. The CL
specification [CL] refines this definition.

- Controlled Load traffic is described by a token bucket Tspec. When
  traffic is conformant to the Tspec, network elements will forward it
  with queuing delay not greater than that caused by the traffic's own
  burstiness - that is, the result of the source emitting a burst of
  size B into a logical network with capacity R. Further, in doing
  this no packets will be discarded due to queue overflow.
  Statistically rare deviations from this ideal behavior are
  permitted. A measure of the "quality" of a CL service is how rare
  these deviations are.

  NOTE: the actual behavior requirements stated in the CL spec are
  slightly more detailed than what is presented here.

- Network elements must not assume that the arrival of nonconformant
  traffic for a specific controlled-load flow will be unusual, or
  indicative of error. In certain circumstances large numbers of
  packets will fail the conformance test *as a matter of normal
  operation*. Some aspects of the behavior of a CL network element in
  the presence of nonconformant traffic are specified.
  (These circumstances include elements carrying traffic from adaptive
  applications that use the CL service to provide a floor on
  performance but constantly try to do better, and elements acting as
  the "split points" of a multicast distribution tree or carrying
  multi-source aggregate flows, such as those generated by RSVP's
  wildcard or shared-explicit reservation styles supporting a shared
  reservation.)

  In the presence of nonconformant packets arriving for one or more
  controlled-load flows, each network element must ensure locally that
  the following requirements are met:

  1) The network element MUST continue to provide the contracted
     quality of service to those controlled-load flows not
     experiencing excess traffic.

  2) The network element SHOULD prevent excess controlled-load
     traffic from unfairly impacting the handling of arriving
     best-effort traffic.

  3) Consistent with points 1 and 2, the network element MUST attempt
     to forward the excess traffic on a best-effort basis if
     sufficient resources are available.

These points lead to two observations about a well implemented CL
service.

- CL traffic can be sorted into "delay classes" based on burstiness.
  Highly bursty flows, having a large ratio of Tspec parameters B/R,
  should expect to experience more queuing delay than their
  low-burstiness counterparts. Thus, a good CL implementation will
  sort the offered CL traffic into sub-classes that are expecting
  roughly equivalent delay, and queue these subclasses independently
  to achieve this result.

- The CL specification leaves open the precise treatment of
  nonconformant traffic, giving only the minimum requirements listed
  above.

  NOTE: The phrase "best effort basis" in the portion of the CL spec
  quoted above has sometimes been taken to mean "the traffic must be
  placed in the best effort traffic class and treated identically to
  BE traffic".
  This interpretation is incorrect. It is easy to see this at one
  level, because if nonconformant CL traffic from non-adaptive
  applications is simply lumped in with adaptive best-effort traffic
  it will tend to unfairly impact that traffic, in contravention of
  point 2). However, the intent of the specification is more general.
  An appropriate reading is "nonconformant CL traffic should be
  transmitted, when possible, in the way that is most advantageous to
  users and applications, subject to the requirements on
  non-interference with other traffic". This allows the CL service to
  be used both to provide a specific QoS for non-adaptive applications
  and to provide a "floor" or minimum QoS for adaptive applications.

3.2. Implementation of CL using the AF Per-Hop Behavior

The CL service can be supported most effectively using an appropriately
designed and configured Assured Forwarding PHB implementation [AF] as
the data forwarding element. This approach SHOULD be used whenever
possible.

The basics of the AF-based approach are as follows:

- Sort the offered CL traffic into delay classes based on the B/R
  ratio of the Tspec. The packets of each delay class will be
  forwarded using a separate instance of the AF PHB.

- For each delay class, construct an aggregate Tspec for the admitted
  traffic according to the rule for summing Tspecs given in [CL].
  This Tspec will be used to police the traffic for conformance at the
  ingress to the diffserv cloud.

- For each delay class, police arriving packets against the token
  bucket Tspec derived above. Mark conforming packets with a DSCP
  indicating the selected AF instance, and highest priority forwarding
  within that instance. Mark nonconformant packets with a DSCP
  indicating the selected AF instance, and lowest priority forwarding
  within that instance.
- At each node within the diffserv network, configure each AF instance
  appropriately by:

  a) setting the actual queue size (or alternatively the dropping
     parameters for high priority packets) to limit queuing delay to
     the delay class's target. (In other words, packets that have
     been delayed beyond the class target should be dropped.)

  b) setting the dropping parameters for low priority packets to drop
     such packets as soon as any significant non-transient queuing of
     these packets is detected.

  c) setting the service rate of the AF instance to a bandwidth
     sufficient to meet the delay and loss behavior requirements of
     the CL spec when only high-priority packets are present.

- Implement an admission control algorithm that ensures that at each
  hop in the network the level of conformant traffic offered to each
  AF instance is equal to or less than that provisioned for in item c)
  above (or alternatively dynamically allocates more bandwidth to the
  relevant AF instance when required).

In addition to these basic actions, two subtleties with the use of AF
must be observed.

First, the relationship between different AF instances, and between AF
and other PHBs, must be more tightly constrained than is required by
the base AF specification.

- Bandwidth should be allocated between AF and BE (and any other
  relevant PHB's) in such a way that AF cannot simply steal all
  best-effort bandwidth on demand. A simple WFQ or CBQ scheduler can
  meet this requirement.

- The bandwidth allocation relationship between different AF instances
  must be known. Two likely relationships are:

  o Bandwidth is allocated to each AF instance independently, as with
    a WFQ scheduler.
  o Bandwidth is allocated across the AF instances used for CL service
    on a priority basis, with the AF instance supporting the lowest
    delay class of CL having the highest priority.

  Either of these approaches may be used. However, the choice of
  approach affects the admission control decision, and must be taken
  into account. In the first case, admission control decisions may be
  made for each CL delay class independently. In the second case,
  admission control decisions for high priority classes will affect
  lower priority classes, which must be taken into account.

The second subtlety is that the implementation of AF must service the
AF classes in a timely manner, by ensuring that the bandwidth allocated
to an AF instance is made available at a time-scale substantially
shorter than the delay target of the class. This requirement is
slightly stronger than that stated in the AF specification. In
practice, any implementation using a common queuing algorithm is likely
to be able to meet this requirement unless other PHB's, such as EF, are
served at higher priority. When that is true, the traffic seen by the
higher priority PHB will also require limiting and shaping in order to
ensure that the CL AF instances receive bandwidth on a timely basis.

The overall result of this procedure is an implementation of the CL
service with the following characteristics:

- Conformant CL traffic is carried according to the CL requirements.

- Resources are used efficiently by aggregating traffic with similar
  requirements, while supporting multiple delay classes for traffic
  with widely differing requirements.

- Nonconformant CL traffic is carried whenever resources permit, and
  is not reordered with respect to the CL flow's conformant traffic.

- Nonconformant CL traffic is not able to disrupt traffic of other
  classes, particularly BE.
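The ingress-conditioning steps of this procedure can be sketched as
follows. This is an illustrative sketch only: the delay-class
boundaries, the DSCP strings, and the flow/Tspec representation are
invented for the example and are not part of the [AF] or [CL]
specifications.

```python
from dataclasses import dataclass

@dataclass
class Tspec:
    r: float  # token rate, bytes/sec
    b: float  # bucket depth, bytes

# Illustrative delay-class boundaries on the b/r ratio, in seconds of
# self-induced burst delay; a real deployment would choose its own.
DELAY_CLASS_BOUNDS = [0.01, 0.1]  # -> delay classes 0, 1, 2

def delay_class(t):
    """Sort a flow into a delay class by its B/R (here b/r) ratio."""
    ratio = t.b / t.r
    for cls, bound in enumerate(DELAY_CLASS_BOUNDS):
        if ratio <= bound:
            return cls
    return len(DELAY_CLASS_BOUNDS)

def aggregate_tspec(tspecs):
    """Sum the admitted Tspecs of one delay class; under the [CL]
    summing rule the token rates and bucket depths simply add."""
    return Tspec(r=sum(t.r for t in tspecs), b=sum(t.b for t in tspecs))

class TokenBucketMarker:
    """Police an aggregate against its summed Tspec at the ingress and
    mark packets into one AF instance with two drop precedences.
    The DSCP strings passed in are placeholders, not codepoints."""
    def __init__(self, agg, dscp_conform, dscp_excess):
        self.agg = agg
        self.tokens = agg.b          # bucket starts full
        self.dscp_conform = dscp_conform
        self.dscp_excess = dscp_excess
        self.last = 0.0
    def mark(self, size, now):
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.agg.b, self.tokens + elapsed * self.agg.r)
        if size <= self.tokens:
            self.tokens -= size
            return self.dscp_conform  # highest priority within instance
        return self.dscp_excess       # lowest priority within instance
```

Because both conformance outcomes map into the same AF instance, excess
packets differ from conformant ones only in drop precedence, which is
what preserves packet ordering within a flow.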
3.2.1. CL/AF Admission Control Approaches

3.3. Implementation of CL using the EF Per-Hop Behavior

It is also possible to implement an approximation of the Controlled
Load service using the Diffserv Expedited Forwarding [EF] PHB as the
traffic scheduling element. This approach is not preferred, because of
two significant limitations. Therefore, this approach SHOULD NOT be
used unless the AF-based approach is not available.

- Because there is only one EF scheduling class per node, it is
  impossible to sort the Controlled Load traffic into queuing delay
  classes, as described above for the AF implementation. Instead, all
  CL traffic must be handled as one scheduling class, and sufficient
  resources must be allocated to the class to cause *all* CL traffic
  to meet the queuing delay expectations of the most demanding flows.

- Because the EF PHB requires a hard limit on the amount of traffic
  passing through it, a CL service implemented using EF cannot handle
  nonconformant (over-Tspec) traffic gracefully, as can be done with
  AF. Instead, nonconformant traffic must either be discarded at the
  ingress of the Diffserv cloud or remarked into a different behavior
  aggregate, and thus potentially reordered in transit. Either of
  these behaviors is less desirable than the one obtained from the
  AF-based implementation above.

Notwithstanding these limitations, it may be useful to implement a CL
approximation based on the EF PHB when the Diffserv network does not
support the AF PHB, or when the implementation of the AF PHB cannot
assure the forwarding of traffic in a sufficiently timely manner. In
this case:

- All CL traffic is marked with a DSCP corresponding to the EF PHB.

- A single aggregate Tspec for all CL traffic is computed for each
  network ingress.
- Arriving CL traffic is policed against this Tspec, and nonconformant
  traffic is either discarded or remarked as BE, at the preference of
  the network operator.

- At each hop within the network the EF PHB must receive a bandwidth
  allocation sufficient to meet the requirements given in the EF
  specification when the arriving CL traffic is at the Tspec level for
  that point within the network.

- The topology of the network must be designed so that the
  instantaneous queuing delay caused by fan-in to a node will exceed
  the CL requirements rarely or never. In practice, this will be a
  concern only with very high fan-in topologies.

4. Implementation of the Guaranteed Service

The Guaranteed service [G] offers a strict mathematical assurance of
both throughput and queuing delay, assuming only that the network is
functioning correctly. A key concept of the Guaranteed service is that
"error terms", referred to as C and D in the specification, are
provided by the network element to the customer, allowing the customer
to calculate the bandwidth it must request from the network in order
to achieve a particular queuing delay target. Thus, the two important
tasks in implementing a Guaranteed service network element are
providing the traffic scheduling, policing, and shaping functions
needed to support a hard bound on performance, and characterizing the
network element's error terms so that the customer of the service can
accurately characterize the network path and deduce what level of
resources must be requested.

Our strategy for implementing these capabilities within a diffserv
cloud revolves around the use of the EF PHB for Guaranteed traffic,
together with the shaping and policing functions necessary to obtain a
performance bound in this context.
The basic traffic policing and shaping requirements for Guaranteed
service are discussed more fully in the service specification.

Delay through a Diffserv cloud can be roughly classified into
propagation and serialization delay, shaping/reshaping delays at the
boundary, and queuing delay inside the cloud. In order to determine
the error terms C_dc and D_dc of the Diffserv cloud, which are needed
for the determination of end-to-end delay, each of these delay
components needs to be evaluated. The difficulty in characterizing
C_dc and D_dc is that unlike the Intserv model, where the C and D
terms are a local property of the router, in the case of a Diffserv
cloud these terms depend not only on the topology of the cloud, but
also on the internal traffic characteristics of potentially _all_ EF
traffic in the cloud.

Hence, the existence of upper bounds on delay through the cloud implies
centralized knowledge about the topology of the cloud and traffic
characterization. In turn, dependence of the delay bounds on traffic
characterization at any ingress point to the cloud implies the
existence of a policy that defines traffic characterization rules, as
well as implementation mechanisms at _all_ ingress points in the
network that enforce that policy.

These considerations imply that determination of the bound on the
delay through the Diffserv cloud should be performed off-line, perhaps
as part of a traffic management algorithm, based on the knowledge of
the topology, traffic patterns, shaping policies, and other relevant
parameters of the cloud. These parameters are discussed in the
following sections with respect to each delay component.

Once the delay bounds are determined, the corresponding error terms
C_dc and D_dc are configured into the appropriate Intserv-capable edge
routers, as discussed below.
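For orientation, the customer-side calculation that consumes the
exported error terms is the end-to-end delay-bound relation of the
Guaranteed service specification [G]. The sketch below transcribes
that relation; from the customer's point of view the Diffserv cloud's
C_dc and D_dc are simply folded into the path sums Ctot and Dtot like
any other element's terms.

```python
def gs_delay_bound(b, r, p, M, R, Ctot, Dtot):
    """End-to-end queuing delay bound of the Guaranteed service [G]
    for a Tspec with bucket depth b, token rate r, peak rate p, and
    maximum packet size M, given a reserved rate R >= r.  Ctot and
    Dtot are the rate-dependent and rate-independent error terms
    summed over all network elements on the path."""
    if R < r:
        raise ValueError("reserved rate R must be at least the token rate r")
    if p > R:
        # Reserved rate below the peak rate: the burst drains at R.
        return (b - M) * (p - R) / (R * (p - r)) + (M + Ctot) / R + Dtot
    # Reserved rate at or above the peak rate: only the error terms
    # and one maximum-size packet contribute.
    return (M + Ctot) / R + Dtot
```

A customer picks the smallest R for which this bound meets its delay
target; increasing R shrinks the rate-dependent terms at the cost of
reserving more bandwidth.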
4.1 Propagation and Serialization Delay

These delay components can be bounded by modeling the Diffserv cloud
as a sequence of at most h links, each of at most length C. The
parameters (h, C) determine the so-called "diameter" of the cloud.
Knowledge of this diameter can then be used to obtain upper bounds on
the propagation and serialization delay through the cloud.

4.2 Shaping delay

The Diffserv EF PHB assumes that traffic entering the Diffserv region
is conditioned at the Diffserv cloud boundary. In the framework of
Figure 1, shaping is expected to take place at the ingress edge router
ER1, and optionally at the boundary router BR1. The granularity of
such shaping is implementation dependent, and can range from microflow
shaping to aggregate shaping. The granularity of aggregation can be
"all EF traffic between a particular ingress-egress pair", frequently
referred to as the "pipe model", or "all EF traffic originating at a
given ingress to all possible destinations", frequently referred to as
the "hose model".

In addition to ingress shaping, the Diffserv model allows re-shaping
of traffic at the egress point. As with ingress shaping, the egress
shaping can be implemented either at BR2 or at ER2.

The effect of different choices of the location and granularity of
shaping on the delay guarantees that can be provided by a Diffserv
cloud will be discussed in section ??. In this section we consider
the effect of these choices on the C and D terms advertised by the
Intserv-capable routers ER1 and ER2. Note that the Intserv-capable
router downstream from the Diffserv cloud (ER2 in the reference
network of Figure 1) is responsible for exporting the C and D terms of
the Diffserv cloud.

4.2.1 Shaping at the Edge Routers

If shaping is performed at the ingress edge router ER1, and reshaping,
if any, is performed at ER2, but no shaping is implemented inside the
Diffserv cloud, then the shaping/reshaping delay is part of the total
delay advertised by the edge routers ER1 and ER2, and the
corresponding C and D terms are exported by the Intserv-capable edge
routers. These will be denoted C_is, D_is, C_es, and D_es, where the
indices _is and _es denote "ingress shaper" and "egress shaper". The
values of these parameters are implementation dependent.

Since the Diffserv cloud itself does not perform any shaping in this
case, its C_dc should be set to zero. The determination of the value
of D_dc and the factors affecting it are discussed in section 4.4
below.

4.2.2 Shaping at the boundary routers

In the case where shaping is performed by the boundary routers,
shaping and reshaping delay become part of the delay of the Diffserv
cloud and hence must be accounted for in the C_dc and D_dc error
terms. Note that, depending on the shaping implementation, the
rate-dependent error term may not be zero, and hence ingress shaping
may add a non-zero component to the C_dc value of the Diffserv cloud.

Since the ingress shaping delay depends on the shaping implementation
and shaping granularity at the border router, and since different
border routers may implement different shaping algorithms, it seems
natural to delegate the responsibility for exporting the error terms
for the ingress shaping delay to the ingress edge router(s) attached
to the border router.
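A standard network-calculus result gives the worst-case delay added by
a token bucket shaper: traffic arriving with burst parameter b_in at a
sustained rate no greater than r, when passed through a shaper
enforcing (r, b_shaper), is delayed by at most (b_in - b_shaper)/r.
The sketch below is illustrative only (the draft leaves shaper
implementations unspecified); it shows the kind of rate-independent
term a shaping point contributes.

```python
def reshaping_delay_bound(b_in, b_shaper, r):
    """Worst-case delay (seconds) added by a token bucket reshaper
    with rate r (bytes/s) and depth b_shaper (bytes), applied to
    traffic arriving with burst parameter b_in (bytes) at sustained
    rate <= r.  Standard network-calculus shaper bound; illustrative
    of the D term a boundary shaper contributes."""
    return max(0.0, (b_in - b_shaper) / r)
```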
It is important to note that in the case of aggregate shaping, the
shaping delay may be a function of the combined burst and combined
rate of all microflows comprising the shaped aggregate (note that the
aggregate may consist of microflows arriving from different ingress
points).

To ensure the existence of a meaningful upper bound on the shaping
delay at the boundary router, the shapers at the edge routers must be
configured appropriately. This may be accomplished by imposing a
policy such as "token bucket parameters of all flows requiring G
support entering the diffserv cloud from any edge router must satisfy
the condition (r >= r_min, b <= b_max)". Such conditions enable a
token bucket characterization of the aggregate stream, which in
combination with the properties of the shaping implementation enables
the computation of an upper bound for a particular microflow.

If the egress boundary router implements reshaping on an aggregate
basis, then just as in the case of ingress shaping, the egress
reshaping delay of a microflow depends on the combined rate and
burstiness of the aggregate being reshaped. Aggregate burstiness
depends, among other things, on the parameters of the ingress shapers
and on the delay bound of the diffserv cloud incurred by all
microflows after the last shaping point.

The C and D terms corresponding to the egress boundary shaping must be
configured at the egress edge router, which is responsible for
exporting the egress shaping component of the C and D terms of the
Diffserv cloud.

In addition, just as in section 4.2.1, the egress edge router is
responsible for exporting the D_dc component of the delay inside the
diffserv cloud which is not due to shaping or reshaping delays.

4.2.3 Shaping inside the Diffserv cloud

While the Diffserv model does not prevent shaping inside the cloud as
well as at the boundaries, this draft concentrates on the most common
case, in which all internal interfaces of any node in the diffserv
cloud implement work-conserving aggregate class-based scheduling only.

4.3 Queuing delay

Queuing delay experienced by a given packet has two causes: contention
with other packets in the scheduler, and interruption of service
experienced by the scheduler as a whole. A typical example of the
latter is the delay in a single-processor system when the processor
schedules some task other than the packet scheduler. If a bound on
this latter portion of the delay is known for all routers inside the
diffserv cloud, then the contribution of this delay component can be
bounded by multiplying this bound by the maximum hop count h.

The component of the queuing delay due to contention with other
packets in the link scheduler is discussed in detail in section 4.4.
For brevity, in the rest of this draft the term "queuing delay" refers
to just the portion of the queuing delay due to contention with other
packets in the scheduler.

4.4 Queuing delay bounds in the Diffserv Cloud

The main difficulty in obtaining hard delay bounds for a cloud of
arbitrary topology arises from the assumption of aggregate scheduling
inside the cloud. When a packet of some flow f traverses a sequence
of aggregate queues, its worst-case delay may depend on the traffic of
other flows which do not share even a single queue with flow f.
Moreover, the delay of a packet p of flow f at time t may be affected
by flows whose last packets have exited the network long before the
first packet of flow f entered the network [CHARNY].
The ability to provide hard delay bounds in a Diffserv cloud with
aggregate scheduling relies on the cooperation of all devices in the
cloud, as well as on strict constraints on the traffic entering the
cloud.

It has been demonstrated that knowledge of the following parameters,
global to the cloud, is essential to the ability to provide strict
queuing delay guarantees across the Diffserv cloud [CHARNY],
[LEBOUDEC]:

- a limited number of hops of any flow across the cloud (denoted h)

- a low (bounded) ratio of the load of EF traffic to the service rate
of the EF queue on any link in the cloud (denoted u)

- the minimum rate of a shaped aggregate (denoted r_min)

- the maximum token bucket depth of an edge-to-edge aggregate
(denoted b_max)

- the minimum service rate of the EF queue (denoted S)

- the maximum deviation of the amount of service of the EF queue from
the ideal fluid service at rate S (denoted E)

Currently, the only known delay bound that holds for an arbitrary
topology and arbitrary route distribution is given in [LEBOUDEC] by

   D = (E/S + u*b_max/r_min) * h/(1 - u*(h-1))

which holds for any utilization u < 1/(h-1). This bound holds for the
case when the capacity of any single link is substantially smaller
than the total capacity of all interfaces of any router. (The bound
may be slightly improved if the capacity of a single link is not
negligible compared to the total router capacity [LEBOUDEC].)
Unfortunately, this bound explodes as u approaches 1/(h-1).

Some knowledge of either the topology or the routes in the cloud may
yield an improved bound.
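The arbitrary-topology bound above, and the improved bound for
multistage-type topologies discussed next, can be evaluated
numerically. The fragment below simply transcribes the two formulas;
it is illustrative only, and assumes h >= 2 and consistent units
throughout.

```python
def leboudec_delay_bound(E, S, u, b_max, r_min, h):
    """Arbitrary-topology worst-case delay bound [LEBOUDEC]:
       D = (E/S + u*b_max/r_min) * h / (1 - u*(h-1))
    Valid only for utilization 0 < u < 1/(h-1); the bound diverges as
    u approaches that value.  Assumes h >= 2."""
    if not 0.0 < u < 1.0 / (h - 1):
        raise ValueError("bound valid only for 0 < u < 1/(h-1)")
    per_hop = E / S + u * b_max / r_min
    return per_hop * h / (1.0 - u * (h - 1))

def charny_multistage_bound(E, S, u, b_max, r_min, h):
    """Delay bound for multistage-type topologies [CHARNY]:
       D = (E/S + u*b_max/r_min) * ((1+u)**h - 1) / u
    Holds for any utilization 0 < u < 1, but grows exponentially with
    the hop count h."""
    per_hop = E / S + u * b_max / r_min
    return per_hop * ((1.0 + u) ** h - 1.0) / u
```

Note how the first bound explodes as u approaches 1/(h-1), while the
second remains finite for all u < 1 but grows rapidly with h.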
For example, for a class of network topologies which includes
multistage networks, it can be shown [CHARNY] that the bound is given
by

   D = (E/S + u*b_max/r_min) * ((1+u)^h - 1)/u

While this bound holds for any utilization, due to the exponential
term the delay grows very quickly with increasing utilization u.

Unfortunately, at the moment no bound is known for a general topology
with utilization greater than 1/(h-1). It can be shown [CHARNY] that
for utilization values greater than 1/(h-1), for any value of delay D
one can always construct a network such that the delay in that network
is greater than D. This implies that either no bound exists at all,
or, if a bound does exist, it must depend on some additional
characteristics of the network other than just h and u.

The practical implication of these results is that, barring new
results on delay bounds, the amount of traffic requiring end-to-end
Guaranteed service across the diffserv cloud should be rather small.
Furthermore, it also implies that if a substantial amount of other EF
traffic is present in the network, then in order to ensure strict
delay bounds for GS traffic, buffering and scheduling mechanisms must
exist that separate the GS traffic using the EF PHB from other traffic
using the EF PHB.

4.5 Relationship to Bandwidth Allocation Techniques and Traffic
Conditioning Models

4.5.1 Availability of sufficient bandwidth

As discussed in Section 4.4, in order to provide a strict delay bound
across the Diffserv cloud, the ratio of the EF load to the service
rate of the EF queue has to be deterministically bounded on all links
in the network. This can be ensured either by signaled admission
control (such as the RSVP aggregation techniques of [RSVPAGGR]) or by
a static provisioning mechanism.
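An admission control scheme enforcing this utilization bound can be
sketched as follows. The function, its parameters, and the per-link
state are invented for the example; in practice such state would be
maintained by signaled reservations, e.g. RSVP aggregation [RSVPAGGR].

```python
def admit(new_rate, link_ef_loads, link_ef_rates, u):
    """Illustrative per-link admission check for the utilization bound
    of section 4.4: admit a new EF reservation of rate new_rate
    (bytes/s) along a path only if every traversed link keeps its EF
    load within the fraction u of its EF queue service rate.
    link_ef_loads -- current admitted EF load on each link of the path
    link_ef_rates -- EF queue service rate of each link of the path"""
    return all(load + new_rate <= u * rate
               for load, rate in zip(link_ef_loads, link_ef_rates))
```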
It should be noted that if provisioning is used, then to ensure a
deterministic load/service rate ratio on all links the network should
be strongly overprovisioned, to account for possible inaccuracy of
traffic matrix estimates.

In either case, deterministic availability of sufficient bandwidth on
all links is a necessary condition for the ability to provide
deterministic delay guarantees.

4.5.2 Effect of Shaping Granularity on Delay Bounds

A related, although different, issue for the ability to provide
deterministic delay guarantees is the granularity of the ingress
shaping. The implications of different choices on the resulting delay
bounds are discussed in the following subsections.

4.5.2.1 Per-microflow shaping

The known worst-case delay bound is linear in the ratio b_max/r_min.
In the case of microflow shaping, the minimal rate of a microflow can
be quite small, resulting in a large delay bound. There is therefore
a substantial advantage in aggregating many small microflows into an
aggregate and shaping the aggregate as a whole. While in principle
there is a range of choices for aggregation, this document considers
only two: edge-to-edge aggregation and edge-to-everywhere aggregation.

4.5.2.2 Shaping of edge-to-edge aggregates

This type of shaping is natural for explicit bandwidth reservation
techniques. In this case r_min and b_max relate to the rate and token
bucket depth of the border-to-border aggregates. Since the delay
bound is linear in b_max/r_min, aggregating as many microflows sharing
the same border-to-border pair as possible increases r_min, and hence
decreases the delay bound. Locating the shaper at the border router
is therefore beneficial for reducing the edge-to-edge delay bound.

4.5.2.3 Shaping of edge-to-everywhere aggregates
This type of shaping is frequently assumed in conjunction with
bandwidth provisioning. The effect of this choice on delay bounds
depends on exactly how provisioning is done. One possibility for
provisioning the network is to estimate the edge-to-edge demand matrix
for EF traffic and ensure that there is sufficient capacity to
accommodate this demand, assuming that the traffic matrix is accurate
enough. Another option is to make no assumption about the
edge-to-edge EF traffic distribution, but rather to admit a certain
amount of EF traffic at each ingress edge, regardless of the
destination edge, and provision the network in such a way that even if
_all_ traffic from _all_ sources happens to pass through a single
bottleneck link, the capacity of that link is sufficient to ensure the
appropriate load to service rate ratio for the EF traffic.

Depending on which of the two choices for provisioning is chosen,
shaping of the edge-to-everywhere aggregate has opposite effects on
the delay bound.

In the case of "edge-to-edge provisioning", the bandwidth of any link
may be sufficient to accommodate the _actual_ load of EF traffic while
remaining within the target utilization bound. Hence, it is the
minimal rate and the maximum burst size of the _actual_ edge-to-edge
aggregates sharing any link that affect the delay bound. However,
aggregate edge-to-everywhere shaping may result in individual
substreams of the shaped aggregate being shaped to a much higher rate
than the expected rate of that substream. When the edge-to-everywhere
aggregate splits inside the network into different substreams going to
different destinations, each of those substreams may in the worst case
have substantially larger burstiness than the token bucket depth of
the aggregate edge-to-everywhere stream.
This results in a substantial increase of the worst-case delay over
the edge-to-edge shaping model. Moreover, in this case the properties
of the ingress shapers do not provide sufficient information to bound
the worst-case delay, since it is the burstiness of the _substreams_
inside the shaped aggregates that is needed, but is unknown.

In contrast, if "worst case" provisioning is assumed, the network is
provisioned in such a way that each link can accommodate all the
traffic even if all edge-to-everywhere aggregates end up sharing that
link. In this case the r_min and b_max of the edge-to-everywhere
aggregate should be used without modification in the formula for the
delay bound. Intuitively, in this case the actual traffic
distribution can only be better than the worst case, in which all the
aggregate traffic at a given ingress is destined to the same "worst
case" egress.

Note that the "worst case" provisioning model targeting a particular
utilization bound results in substantially more overprovisioning than
"point-to-point" provisioning using an estimated traffic matrix, or
explicit point-to-point bandwidth allocation using signaled admission
control.

4.6 Concatenation of Diffserv Clouds

In the case where one or more Diffserv clouds are concatenated via an
Intserv-capable node, the total delay is simply the sum of the delays
computed for each individual intserv-diffserv-intserv segment along
the path. However, obtaining an end-to-end delay bound for a
concatenation of Diffserv clouds via nodes implementing aggregate
scheduling is a more complicated problem which requires further
research.

5. Implementation of Resource Efficient Close Approximations to the
Guaranteed Service

6. Relationship to the Null Service

The Intserv "Null Service" [NULL] differs from other defined services
by not expressing any quantitative network performance requirements.
Use of the Null Service where an Intserv service class is required
allows an application or host requesting QoS control service to
express policy-related information to the network without making a
specific quantitative QoS request. The assumption is that the network
policy management and control elements will use this information to
select an appropriate QoS for the requesting entity, and take whatever
action is required to provide this QoS.

One possibility is that the network policy mechanisms will determine
that a quantitative end-to-end QoS is appropriate for this entity, and
that this QoS can be provided using Intserv mechanisms. In this case,
the Null service selector can be replaced, at the first hop router or
elsewhere along the path, with a different Intserv service class and
related parameter information. Once this occurs, the situation with
respect to the use of Diffserv networks to provide the desired QoS is
identical to that described above for these other services.

A second alternative is that the network policy mechanisms determine
that the requesting entity should receive a relative, rather than
absolute (quantitative), level of service. In this case, the packets
are marked with the appropriate DSCP, but the admission control
actions described above are not necessary.

7. Security Considerations

8. References

[AF] Heinanen, J., Baker, F., Weiss, W., Wroclawski, J., "Assured
Forwarding PHB Group", RFC 2597, June 1999.
[CHARNY] Anna Charny, "Delay Bounds in a Network with Aggregate
Scheduling", work in progress,
ftpeng.cisco.com/ftp/acharny/aggregate_delay_v4.ps

[CL] Wroclawski, J., "Specification of the Controlled-Load Network
Element Service", RFC 2211, September 1997.

[DCLASS] Bernet, Y., "Format of the RSVP DCLASS Object", RFC 2996,
November 2000.

[DIFFSERV] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
Weiss, W., "An Architecture for Differentiated Services", RFC 2475,
December 1998.

[EF] Jacobson, V., Nichols, K., Poduri, K., "An Expedited Forwarding
PHB", RFC 2598, June 1999.

[G] Shenker, S., Partridge, C., Guerin, R., "Specification of
Guaranteed Quality of Service", RFC 2212, September 1997.

[GENCHAR] Shenker, S., Wroclawski, J., "General Characterization
Parameters for Integrated Service Network Elements", RFC 2215,
September 1997.

[INTSERV] Clark, D., et al., "Integrated Services in the Internet
Architecture: an Overview", RFC 1633, June 1994.

[ISDSFRAME] Bernet, Ford, Yavatkar, Baker, Zhang, Speer, Braden,
Davie, Wroclawski, Felstaine, "A Framework for Integrated Services
Operation over Diffserv Networks", RFC 2998, November 2000.

[LEBOUDEC] Jean-Yves Le Boudec, "A Proven Delay Bound in a Network
with Aggregate Scheduling", work in progress,
http://ica1www.epfl.ch/PS_files/ds2.ps

[NULL] Bernet, Y., Smith, A., Davie, B., "Specification of the Null
Service Type", RFC 2997, November 2000.

[RSVP] Braden, R., L. Zhang, S. Berson, S. Herzog, S.
Jamin, "Resource Reservation Protocol (RSVP) - Version 1 Functional
Specification", RFC 2205, September 1997.

[RSVPAGGR] Baker, F., Iturralde, C., Le Faucheur, F., Davie, B.,
"Aggregation of RSVP for IPv4 and IPv6 Reservations", Internet Draft
draft-ietf-issll-rsvp-aggr-02.txt

[RSVPINTSERV] Wroclawski, J., "The use of RSVP with IETF Integrated
Services", RFC 2210, September 1997.

9. Authors' Addresses

John Wroclawski
MIT Laboratory for Computer Science
545 Technology Sq., Cambridge, MA 02139, USA
EMail: jtw@lcs.mit.edu

Anna Charny
Cisco Systems
300 Apollo Drive, Chelmsford, MA 01824, USA
EMail: acharny@cisco.com

10. Full Copyright

Copyright (C) The Internet Society 2001. All Rights Reserved.

This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise
explain it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without
restriction of any kind, provided that the above copyright notice
and this paragraph are included on all such copies and derivative
works. However, this document itself may not be modified in any
way, such as by removing the copyright notice or references to the
Internet Society or other Internet organizations, except as needed
for the purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards
process must be followed, or as required to translate it into
languages other than English.

The limited permissions granted above are perpetual and will not
be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.