idnits 2.17.1 draft-ietf-intserv-guaranteed-svc-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-24) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The abstract seems to contain references ([2], [3], [4], [5], [6], [7], [TBA], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 198 has weird spacing: '...55) and a sig...' == Line 477 has weird spacing: '... flow and c...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (15 December 1995) is 10358 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'TBA' on line 634 -- Possible downref: Non-RFC (?) normative reference: ref. 
'1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '3' -- Possible downref: Non-RFC (?) normative reference: ref. '4' -- Possible downref: Non-RFC (?) normative reference: ref. '5' -- Possible downref: Non-RFC (?) normative reference: ref. '6' -- Possible downref: Non-RFC (?) normative reference: ref. '7' Summary: 9 errors (**), 0 flaws (~~), 3 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force Integrated Services WG 2 INTERNET-DRAFT S. Shenker/C. Partridge 3 draft-ietf-intserv-guaranteed-svc-03.txt Xerox/BBN 4 15 December 1995 5 Expires: 5/15/96 7 Specification of Guaranteed Quality of Service 9 Status of this Memo 11 This document is an Internet-Draft. Internet-Drafts are working 12 documents of the Internet Engineering Task Force (IETF), its areas, 13 and its working groups. Note that other groups may also distribute 14 working documents as Internet-Drafts. 16 Internet-Drafts are draft documents valid for a maximum of six months 17 and may be updated, replaced, or obsoleted by other documents at any 18 time. It is inappropriate to use Internet- Drafts as reference 19 material or to cite them other than as ``work in progress.'' 21 To learn the current status of any Internet-Draft, please check the 22 ``1id-abstracts.txt'' listing contained in the Internet- Drafts 23 Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 24 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 25 ftp.isi.edu (US West Coast). 27 This document is a product of the Integrated Services working group 28 of the Internet Engineering Task Force. Comments are solicited and 29 should be addressed to the working group's mailing list at int- 30 serv@isi.edu and/or the author(s). 
32 This draft reflects changes from the IETF meeting in Dallas. 34 Abstract 36 This memo describes the network element behavior required to deliver 37 guaranteed service in the Internet. Guaranteed service provides firm 38 (mathematically provable) bounds on end-to-end packet delays. This 39 specification follows the service specification template described in 40 [1]. 42 Introduction 44 This document defines the requirements for network elements that 45 support guaranteed service. This memo is one of a series of 46 documents that specify the network element behavior required to 47 support various qualities of service in IP internetworks. Services 48 described in these documents are useful both in the global Internet 49 and private IP networks. 51 This document is based on the service specification template given in 52 [1]. Please refer to that document for definitions and additional 53 information about the specification of qualities of service within 54 the IP protocol family. 56 End-to-End Behavior 58 The end-to-end behavior provided by a series of service elements that 59 conform to this document is an assured level of bandwidth that, when 60 used by a policed flow, produces a delay bounded service with no 61 queueing loss for all conforming datagrams (assuming no failure of 62 network components or changes in routing during the life of the 63 flow). 65 The end-to-end behavior conforms to the fluid model (described below) 66 in that the delivered delays do not exceed the fluid delays by more 67 than the specified error bounds. More precisely, the end-to-end 68 delay bound is [(b-M)/R*(p-R)/(p-r)]+(M+Ctot)/R+Dtot for p>R, and 69 (M+Ctot)/R+Dtot for p<=R, (where b, r, p, M, R, Ctot, and Dtot are 70 defined later in this document). Guaranteed service does not control 71 the minimal delay of packets, merely the maximal delays. 
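As an informal illustration (not part of this specification), the end-to-end delay bound above can be evaluated as in the following Python sketch. The function name and the argument units are assumptions made for this example: b, M and Ctot are in bytes, r, p and R are in bytes per second, and Dtot is in seconds (that is, already converted from microseconds). An infinite peak rate is handled by taking the limit of the formula, in which the factor (p-R)/(p-r) tends to 1.

```python
import math


def end_to_end_delay_bound(b, r, p, M, R, Ctot, Dtot):
    """Return the end-to-end delay bound in seconds.

    b, M, Ctot are in bytes; r, p, R are in bytes per second;
    Dtot is in seconds (an assumption of this sketch).
    """
    if R < r:
        raise ValueError("the reserved rate R must be at least r")
    if p < r:
        raise ValueError("the peak rate p must be at least r")

    # Term common to both cases: (M+Ctot)/R + Dtot
    common = (M + Ctot) / R + Dtot
    if p <= R:
        return common

    # Case p > R: add [(b-M)/R * (p-R)/(p-r)]
    if math.isinf(p):
        # Limit of (p-R)/(p-r) as p grows without bound is 1.
        burst = (b - M) / R
    else:
        burst = (b - M) / R * (p - R) / (p - r)
    return burst + common
```

With an infinite peak rate the result reduces to b/R + Ctot/R + Dtot, which is the form of the bound that ignores the peak rate.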
73 NOTE: While the per-hop error terms needed to compute the end-to- 74 end delays are exported by the service module (see Exported 75 Information below), the mechanisms needed to collect per-hop 76 bounds and make the end-to-end quantities Ctot and Dtot known to 77 the applications are not described in this specification. These 78 functions, which can be provided by reservation setup protocols, 79 routing protocols or by other network management functions, are 80 outside the scope of this document. 82 The end-to-end behavior (as characterized by Ctot and Dtot) provided 83 along a path should be stable. That is, it should not change as long 84 as the end-to-end path does not change. 86 This service is subject to admission control. 88 Motivation 90 Guaranteed service guarantees that packets will arrive within the 91 guaranteed delivery time and will not be discarded due to queue 92 overflows, provided the flow traffic stays within its specified 93 traffic parameters. This service is intended for applications which 94 need a firm guarantee that a packet will arrive no later than a 95 certain time after it was transmitted by its source. For example, 96 some audio and video "play-back" applications are intolerant of any 97 packet arriving after their play-back time. Applications that have 98 hard real-time requirements will also require guaranteed service. 100 This service does not attempt to minimize the jitter (the difference 101 between the minimal and maximal packet delays); it merely controls 102 the maximal delay. Because the guaranteed bound is a firm one, it 103 must be large enough to cover extremely rare cases of long queueing 104 delays. Several studies have shown that the actual delay for the 105 vast majority of packets can be far lower than the guaranteed delay. 
106 Therefore, authors of playback applications should note that packets 107 will often arrive far earlier than the delivery deadline and will 108 have to be buffered at the receiving system until it is time for the 109 application to process them. 111 This service represents one extreme end of delay control for 112 networks. Most other services providing delay control provide much 113 weaker assurances about the resulting delays. In order to provide 114 this high level of assurance, guaranteed service is typically only 115 useful if provided by every network element along the path. 116 Moreover, as described in the Exported Information section, effective 117 provision and use of the service requires that the set-up protocol 118 used to request service provides service characterizations to 119 intermediate routers and to the endpoints. 121 Network Element Data Handling Requirements 123 The network element must ensure that the service approximates the 124 "fluid model" of service. The fluid model at service rate R is 125 essentially the service that would be provided by a dedicated wire of 126 bandwidth R between the source and receiver. Thus, in the fluid 127 model of service at a fixed rate R, the flow's service is completely 128 independent of that of any other flow. 130 The flow's level of service is characterized at each network element 131 by a bandwidth (or service rate) R and a buffer size B. R represents 132 the share of the link's bandwidth the flow is entitled to and B 133 represents the buffer space in the router that the flow may consume. 134 The network element must ensure that its service matches the fluid 135 model at that same rate to within a sharp error bound. 137 The definition of guaranteed service relies on the result that the 138 fluid delay of a flow obeying a token bucket (r,b) and being served 139 by a line with bandwidth R is bounded by b/R as long as R is no less 140 than r. 
Guaranteed service with a service rate R, where now R is a 141 share of bandwidth rather than the bandwidth of a dedicated line, 142 approximates this behavior. 144 More specifically, the network element must ensure that the delay of 145 any packet be less than b/R+C/R+D, where C and D describe the maximal 146 deviation away from the fluid model. It is important to emphasize 147 that C and D are maximums. So, for instance, if an implementation 148 has occasional gaps in service (perhaps due to processing routing 149 updates), D needs to be large enough to account for the time a packet 150 may lose during the gap in service. 152 Links are not permitted to fragment packets as part of guaranteed 153 service. Packets larger than the MTU of the link must be policed as 154 nonconformant which means that they will be policed according to the 155 rules described in the Policing section below. 157 Invocation Information 159 Guaranteed service is invoked by specifying the traffic (TSpec) and 160 the desired service (RSpec) to the network element. A service 161 request for an existing flow that has a new TSpec and/or RSpec should 162 be treated as a new invocation, in the sense that admission control 163 must be reapplied to the flow. Flows that reduce their TSpec and/or 164 their RSpec (i.e., their new TSpec/RSpec is strictly smaller than the 165 old TSpec/RSpec according to the ordering rules described in the 166 section on Ordering below) should never be denied service. 168 The TSpec takes the form of a token bucket plus a peak rate (p), a 169 minimum policed unit (m), and a maximum packet size (M). 171 The token bucket has a bucket depth, b, and a bucket rate, r. Both b 172 and r must be positive. The rate, r, is measured in bytes of IP 173 datagrams per second, and can range from 1 byte per second to as 174 large as 40 terabytes per second (or about what is believed to be the 175 maximum theoretical bandwidth of a single strand of fiber). 
Clearly, 176 particularly for large bandwidths, only the first few digits are 177 significant and so the use of floating point representations, 178 accurate to at least 0.1%, is encouraged. 180 The bucket depth, b, is also measured in bytes and can range from 1 181 byte to 250 gigabytes. Again, floating point representations 182 accurate to at least 0.1% are encouraged. 184 The range of values is intentionally large to allow for future 185 bandwidths. The range is not intended to imply that a network 186 element must support the entire range. 188 The peak rate, p, is measured in bytes of IP datagrams per second and 189 has the same range and suggested representation as the bucket rate. 190 The peak rate is the maximum rate at which the source and any 191 reshaping points (reshaping points are defined below) may inject 192 bursts of traffic into the network. More precisely, the 193 requirement is that, for all time periods, the amount of data sent cannot 194 exceed M+pT, where M is the maximum packet size and T is the length of 195 the time period. Furthermore, p must be greater than or equal to the 196 token bucket rate, r. If the peak rate is unknown or unspecified, 197 then p is set to infinity, which in the IEEE floating point format 198 corresponds to an exponent of all ones (255) and a sign bit and 199 mantissa of all zeroes. 201 The minimum policed unit, m, is an integer measured in bytes. All IP 202 datagrams smaller than size m will be counted, when policed and tested 203 for conformance to the TSpec, as being of size m. The maximum packet 204 size, M, is the biggest packet that will conform to the traffic 205 specification; it is also measured in bytes. The flow must be 206 rejected if the requested maximum packet size is larger than the MTU 207 of the link. Both m and M must be positive, and m must be less than 208 or equal to M. 210 The RSpec is a rate R and a slack term S, where R must be greater 211 than or equal to r and S must be nonnegative. 
The RSpec rate can be 212 bigger than the TSpec rate because higher rates will reduce queueing 213 delay. The slack term signifies the difference between the desired 214 delay and the delay obtained by using a reservation level R. This 215 slack term can be utilized by the service element to reduce its 216 resource reservation for this flow. When a service element chooses to 217 utilize some of the slack in the RSpec, it must follow specific rules 218 in updating the R and S fields of the RSpec; these rules are 219 specified in the Ordering and Merging section. If at the time of 220 service invocation no slack is specified, the slack term, S, is set 221 to zero. No buffer specification is included in the RSpec because 222 the service element is expected to derive the required buffer space 223 to ensure no queueing loss from the token bucket and peak rate in the 224 TSpec, the reserved rate and slack in the RSpec, combined with 225 internal information about how the element manages its traffic. 227 The TSpec can be represented by three floating point numbers in 228 single-precision IEEE floating point format followed by two 32-bit 229 integers in network byte order. The first value is the rate (r), the 230 second value is the bucket size (b), the third is the peak rate (p), 231 the fourth is the minimum policed unit (m), and the fifth is the 232 maximum packet size (M). 234 The RSpec rate, R, and the slack term, S, can also be represented 235 using single-precision IEEE floating point. 237 For all IEEE floating point values, the sign bit must be zero. (All 238 values must be positive). Exponents less than 127 (i.e., 0) are 239 prohibited. Exponents greater than 162 (i.e., positive 35) are 240 discouraged, except for specifying a peak rate of infinity. 242 Exported Information 244 Each guaranteed service module must export at least the following 245 information. All of the parameters described below are 246 characterization parameters. 
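The TSpec representation described above can be sketched in Python with the standard struct module. This is an illustration only. The function names are hypothetical. The text specifies network byte order only for the two integers, so using it for the three floating point values as well, and treating m and M as unsigned, are assumptions of this sketch.

```python
import math
import struct

# Assumed layout: r, b, p as IEEE single-precision floats, followed by
# m and M as unsigned 32-bit integers, all in network byte order.
TSPEC_FORMAT = "!fffII"


def biased_exponent(value):
    """Return the 8-bit biased exponent of value encoded as a float32."""
    bits = struct.unpack("!I", struct.pack("!f", value))[0]
    return (bits >> 23) & 0xFF


def check_float(value, allow_infinity=False):
    """Apply the sign and exponent rules stated for IEEE values."""
    if math.isnan(value) or value <= 0:
        raise ValueError("value must be positive (sign bit zero)")
    if math.isinf(value):
        if not allow_infinity:
            raise ValueError("infinity is only allowed for the peak rate")
        return
    if biased_exponent(value) < 127:
        raise ValueError("exponents less than 127 are prohibited")


def pack_tspec(r, b, p, m, M):
    """Encode (r, b, p, m, M) as 20 bytes."""
    check_float(r)
    check_float(b)
    check_float(p, allow_infinity=True)
    if p < r:
        raise ValueError("p must be greater than or equal to r")
    if not 0 < m <= M:
        raise ValueError("m and M must be positive with m <= M")
    return struct.pack(TSPEC_FORMAT, r, b, p, m, M)
```

An unspecified peak rate is passed as math.inf, which struct encodes with an exponent of all ones and a zero sign bit and mantissa, as the text requires. The sketch does not enforce the discouraged upper exponent limit of 162.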
248 A network element's implementation of guaranteed service is 249 characterized by two error terms, C and D, which represent how the 250 element's implementation of the guaranteed service deviates from the 251 fluid model. These two parameters have an additive composition rule. 253 If the composition function is applied along the entire path to 254 compute the end-to-end sums of C and D (Ctot and Dtot) and the 255 resulting values are then provided to the end nodes (presumably by 256 the setup protocol), the end nodes can compute the maximal packet 257 delays. Moreover, if the partial sums (Csum and Dsum) from the most 258 recent reshaping point (reshaping points are defined below) 259 downstream towards receivers are handed to each network element, then 260 these network elements can compute the buffer allocations necessary 261 to achieve no packet loss, as detailed in the section Guidelines for 262 Implementors. The proper use and provision of this service requires 263 that the quantities Ctot and Dtot, and the quantities Csum and Dsum, 264 be computed. Therefore, we assume that usage of guaranteed service 265 will be primarily in contexts where these quantities are made 266 available to end nodes and network elements. 268 The error term C is measured in units of bytes. An individual 269 element can advertise a C value between 1 and 2**28 (a little over 270 250 megabytes) and the total added over all elements can range as 271 high as (2**32)-1. Should the sum of the different elements' error 272 terms exceed (2**32)-1, the end-to-end error term should be (2**32)-1. 274 The error term D is measured in units of one microsecond. An 275 individual element can advertise a delay value between 1 and 2**28 276 (somewhat over two minutes) and the total delay added over all elements 277 can range as high as (2**32)-1. Should the sum of the different 278 elements' delays exceed (2**32)-1, the end-to-end delay should be 279 (2**32)-1. 281 The guaranteed service is service_name 2. 
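The additive composition rule for C and D, including the ceiling of (2**32)-1 on the composed value, might be sketched as follows. The function and constant names are hypothetical and are not defined by this specification. The same routine serves for Ctot, Dtot, Csum and Dsum, which differ only in the set of elements over which the sum is taken.

```python
MAX_PER_ELEMENT = 2**28      # largest value one element may advertise
MAX_COMPOSED = 2**32 - 1     # ceiling for the composed (summed) value


def compose(per_element_values):
    """Sum per-element C (bytes) or D (microseconds) values.

    The running total is clamped at (2**32)-1, as the text requires
    when the true sum would exceed that value.
    """
    total = 0
    for value in per_element_values:
        if not 1 <= value <= MAX_PER_ELEMENT:
            raise ValueError("advertised value must be between 1 and 2**28")
        total = min(total + value, MAX_COMPOSED)
    return total
```

For example, three hops advertising C values of 1500, 1500 and 9000 bytes compose to a Ctot of 12000 bytes.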
283 Error characterization parameter C is numbered 1 and parameter D is 284 numbered 2. 286 The end-to-end composed value (Ctot) for C is numbered 3 and the 287 end-to-end composed value for D (Dtot) is numbered 4. 289 The since-last-reshaping point composed value (Csum) for C is 290 numbered 5 and the since-last-reshaping point composed value for D 291 (Dsum) is numbered 6. 293 No other exported data is required by this specification. 295 Policing 297 Policing is done at the edge of the network, at all heterogeneous 298 source branch points and at all source merge points. A heterogeneous 299 source branch point is a spot where the multicast distribution tree 300 from a source branches to multiple distinct paths, and the TSpec's of 301 the reservations on the various outgoing links are not all the same. 302 Policing need only be done if the TSpec on the outgoing link is "less 303 than" (in the sense described in the Ordering section) the TSpec 304 reserved on the immediately upstream link. A source merge point is 305 where the multicast distribution trees from two different sources 306 (sharing the same reservation) merge. It is the responsibility of 307 the invoker of the service (a setup protocol, local configuration 308 tool, or similar mechanism) to identify points where policing is 309 required. Policing may be done at other points as well as those 310 described above. 312 The token bucket and peak rate parameters require that traffic must 313 obey the rule that over all time periods, the amount of data sent 314 cannot exceed M+min[pT, rT+b-M], where r and b are the token bucket 315 parameters, M is the maximum packet size, and T is the length of the 316 time period (note that when p is infinite this reduces to the 317 standard token bucket requirement). For the purposes of this 318 accounting, links must count packets which are smaller than the 319 minimal policing unit to be of size m. 
Packets which arrive at an 320 element and cause a violation of the M+min[pT, rT+b-M] bound are 321 considered non-conformant. Policing to conformance with this token 322 bucket is done in two different ways. 324 At the edge of the network, non-conforming packets are treated as 325 best-effort datagrams. [If and when a marking ability becomes 326 available, these non-conformant packets should be ''marked'' as being 327 non-compliant and then treated as best effort packets at all 328 subsequent routers.] 330 NOTE: There may be situations outside the scope of this document, 331 such as when a service module's implementation of guaranteed 332 service is being used to implement traffic sharing rather than a 333 quality of service, where the desired action is to discard non- 334 conforming packets. To allow for such uses, implementors should 335 ensure that the action to be taken for non-conforming packets is 336 configurable. 338 Inside the network, this approach does not produce the desired 339 results, because queueing effects will occasionally cause a flow's 340 traffic that entered the network as conformant to be no longer 341 conformant at some downstream network element. Therefore, inside the 342 network, service elements must reshape traffic before applying the 343 token bucket test. Reshaping entails delaying packets until they are 344 within conformance of the TSpec. 346 Reshaping is done by combining a buffer with a token bucket and peak 347 rate regulator and buffering data until it can be sent in conformance 348 with the token bucket and peak rate parameters. (The token bucket 349 regulator should start with its token bucket full of tokens). Under 350 guaranteed service, the amount of buffering required to reshape any 351 conforming traffic back to its original token bucket shape is 352 b+Csum+(Dsum*r), where Csum and Dsum are the sums of the parameters C 353 and D between the last reshaping point and the current reshaping 354 point. 
Note that the above buffer requirement is an upper bound that 355 can be significantly reduced if the cumulative latency [7] from the 356 last reshaping point is known. More precisely, in the above formula 357 Dsum can be replaced by Dsum - (cumulative latency). In addition, the 358 knowledge of the peak rate at the reshapers can also be used to 359 further reduce the buffer requirements. A network element must 360 provide the necessary buffers to ensure that conforming traffic is 361 not lost at the reshaper. 363 If a datagram arrives to discover the reshaping buffer is full, then 364 the datagram is non-conforming. Observe this means that a reshaper 365 is effectively policing too. As with a policer, the reshaper should 366 relegate non-conforming datagrams to best effort. [If marking is 367 available, the non-conforming datagrams should be marked] 369 NOTE: As with policers, it should be possible to configure how 370 reshapers handle non-conforming packets. 372 Note that while the large buffer makes it appear that reshapers add 373 considerable delay, this is not the case. Given a valid TSpec that 374 accurately describes the traffic, reshaping will cause little extra 375 delay at the reshaping point. However, if the TSpec is smaller than 376 the actual traffic, reshaping will cause a large queue to develop at 377 the reshaping point, which both causes substantial additional delays 378 and forces some datagrams to be treated as non-conforming. This 379 scenario makes an unpleasant denial of service attack possible, in 380 which a receiver who is successfully receiving a flow's traffic via 381 best effort service is pre-empted by a new receiver who requests a 382 reservation for the flow, but with an inadequate TSpec and RSpec. 383 The flow's traffic will now be policed and possibly reshaped. If the 384 policing function was chosen to discard datagrams, the best-effort 385 receiver would stop receiving traffic. 
For this reason, in the 386 normal case, policers are simply to mark packets as best effort. 387 While this protects against denial of service, it is still true that 388 the bad TSpec may cause queueing delays to increase. 390 NOTE: To minimize problems of reordering datagrams, reshaping 391 points may wish to forward a best-effort datagram from the front 392 of the reshaping queue when a new datagram arrives and the 393 reshaping buffer is full. 395 Readers should also observe that reclassifying datagrams as best 396 effort makes support for elastic flows easier. Such flows can 397 reserve a modest token bucket and when their traffic exceeds the 398 token bucket, the excess traffic will be sent best effort. 400 A related issue is that at all network elements, packets bigger than 401 the MTU of the network element must be considered non-conformant and 402 should be classified as best effort (and will then either be 403 fragmented or dropped according to the element's handling of best 404 effort traffic). [Again, if marking is available, these reclassified 405 packets should be marked.] 407 Ordering and Merging 409 TSpec's are ordered according to the following rule: TSpec A is a 410 substitute ("as good or better than") for TSpec B if (1) both the 411 token rate r and bucket depth b for TSpec A are greater than or equal 412 to those of TSpec B, (2) the peak rate p is at least as large in 413 TSpec A as it is in TSpec B, (3) the minimum policed unit m is at 414 least as small for TSpec A as it is for TSpec B, and (4) the maximum 415 packet size M is at least as large for TSpec A as it is for TSpec B. 417 A merged TSpec may be calculated over a set of TSpecs by taking the 418 largest token bucket rate, largest bucket size, largest peak rate, 419 smallest minimum policed unit, and largest maximum packet size across 420 all members of the set. 
This use of the word "merging" is similar to 421 that in the RSVP protocol; a merged TSpec is one which is adequate to 422 describe the traffic from any one of a number of flows. 424 The RSpec's are merged in a manner similar to the TSpecs, i.e. a set 425 of RSpecs is merged into a single RSpec by taking the largest rate R, 426 and the smallest slack S. More precisely, RSpec A is a substitute 427 for RSpec B if the value of reserved service rate, R, in RSpec A is 428 greater than or equal to the value in RSpec B, and the value of the 429 slack, S, in RSpec A is smaller than or equal to that in RSpec B. 431 Each network element receives a service request of the form (TSpec, 432 RSpec), where the RSpec is of the form (Rin, Sin). The network 433 element processes this request and performs one of two actions: 435 a. it accepts the request and returns a new RSpec of the form 436 (Rout, Sout); 437 b. it rejects the request. 439 The processing rules for generating the new RSpec are governed by the 440 delay constraint: 442 Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin, 444 where Ctoti is the cumulative sum of the error terms, C, for all the 445 network elements that are upstream of the current element, i. In 446 other words, this element consumes (Sin - Sout) of slack and can use 447 it to reduce its reservation level, provided that the above 448 inequality is satisfied. Rin and Rout must also satisfy the 449 constraint: 451 r <= Rout <= Rin. 453 When several RSpec's, each with rate Rj, j=1,2..., are to be merged 454 at a split point, the value of Rout is the maximum over all the rates 455 Rj, and the value of Sout is the minimum over all the slack terms Sj. 457 Guidelines for Implementors 459 This section discusses a number of important implementation issues in 460 no particular order. 
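The merging rules and the delay constraint in the Ordering and Merging section can be illustrated with the following Python sketch. The type and function names are hypothetical. The sketch assumes that the slack terms are expressed in seconds, the rates in bytes per second, and b and Ctoti in bytes, so that both sides of the inequality are in seconds. The two fractions on each side of the inequality are combined as (b+Ctoti)/R, which is algebraically identical.

```python
from typing import NamedTuple


class TSpec(NamedTuple):
    r: float  # token bucket rate, bytes per second
    b: float  # bucket depth, bytes
    p: float  # peak rate, bytes per second
    m: int    # minimum policed unit, bytes
    M: int    # maximum packet size, bytes


class RSpec(NamedTuple):
    R: float  # reserved rate, bytes per second
    S: float  # slack term, assumed to be in seconds


def merge_tspecs(tspecs):
    """Largest r, b, p and M; smallest m, across all members."""
    tspecs = list(tspecs)
    return TSpec(
        r=max(t.r for t in tspecs),
        b=max(t.b for t in tspecs),
        p=max(t.p for t in tspecs),
        m=min(t.m for t in tspecs),
        M=max(t.M for t in tspecs),
    )


def merge_rspecs(rspecs):
    """Largest rate R and smallest slack S."""
    rspecs = list(rspecs)
    return RSpec(R=max(x.R for x in rspecs), S=min(x.S for x in rspecs))


def update_allowed(tspec, ctoti, rspec_in, rspec_out):
    """Check (Rout, Sout) against (Rin, Sin) for one network element."""
    if rspec_out.S < 0:
        return False
    if not tspec.r <= rspec_out.R <= rspec_in.R:
        return False
    lhs = rspec_out.S + (tspec.b + ctoti) / rspec_out.R
    rhs = rspec_in.S + (tspec.b + ctoti) / rspec_in.R
    return lhs <= rhs
```

For instance, with b = 10000 bytes, Ctoti = 2000 bytes and an incoming RSpec of (2000, 3.0), an element may return (1500, 1.0): it consumes 2 seconds of slack, and both sides of the inequality evaluate to 9 seconds.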
462 It is important to note that individual subnetworks are service 463 elements and both routers and subnetworks must support the guaranteed 464 service model to achieve guaranteed service. Since subnetworks 465 typically are not capable of negotiating service using IP-based 466 protocols, as part of providing guaranteed service, routers will have 467 to act as proxies for the subnetworks they are attached to. 469 In some cases, this proxy service will be easy. For instance, on a 470 leased line, the proxy need only ensure that the sum of all the 471 flows' RSpec rates does not exceed the bandwidth of the line, and 472 advertise the serialization and transmission delays of the 473 link as the values of C and D. 475 In other cases, this proxy service will be complex. In an ATM 476 network, for example, it may require establishing an ATM VC for the 477 flow and computing the C and D terms for that VC. Readers may 478 observe that the token bucket and peak rate used by guaranteed 479 service map directly to the Sustained Cell Rate, Burst Size, and Peak 480 Cell Rate of ATM's Q.2931 QoS parameters for Variable Bit Rate 481 traffic. 483 The assurance that packets will not be lost is obtained by setting 484 the router buffer space B to be equal to the token bucket b plus some 485 error term (described below). 487 Another issue related to subnetworks is that the TSpec's token bucket 488 rates measure IP traffic and do not (and cannot) account for link 489 level headers. So the subnetwork service elements must adjust the 490 rate and possibly the bucket size to account for adding link level 491 headers. Tunnels must also account for the additional IP headers 492 that they add. 494 For packet networks, a maximum header rate can usually be computed by 495 dividing the rate and bucket sizes by the minimum policed unit. 
For 496 networks that do internal fragmentation, such as ATM, the computation 497 may be more complex, since one must account for both per-fragment 498 overhead and any wastage (padding bytes transmitted) due to 499 mismatches between packet sizes and fragment sizes. For instance, a 500 conservative estimate of the additional data rate imposed by ATM AAL5 501 plus ATM segmentation and reassembly is 503 ((r/48)*5)+((r/m)*(8+52)) 505 which represents the rate divided into 48-byte cells multiplied by 506 the 5-byte ATM header, plus the maximum packet rate (r/m) multiplied 507 by the cost of the 8-byte AAL5 header plus the maximum space that can 508 be wasted by ATM segmentation of a packet (which is the 52 bytes 509 wasted in a cell that contains one byte). But this estimate is 510 likely to be wildly high, especially if m is small, since ATM wastage 511 is usually much less than 52 bytes. (ATM implementors should be 512 warned that the token bucket may also have to be scaled when setting 513 the VC parameters for call setup and that this example does not 514 account for overhead incurred by encapsulations such as those 515 specified in RFC 1483). 517 To ensure no loss, service elements will have to allocate some 518 buffering for bursts. If every hop implemented the fluid model 519 perfectly, this buffering would simply be b (the token bucket size). 520 However, as noted in the discussion of reshaping earlier, 521 implementations are approximations and we expect that traffic will 522 become more bursty as it goes through the network. As with 523 reshaping, the amount of buffering required to handle the burstiness is 524 bounded by b+Csum+Dsum*R. If one accounts for the peak rate, this 525 can be further reduced to 527 M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)X 529 where X is set to r if (b-M)/(p-r) is less than Csum/R+Dsum and X is 530 R if (b-M)/(p-r) is greater than or equal to Csum/R+Dsum and p>R; 531 otherwise, X is set to p. 
This reduction comes from the fact that
532 the peak rate limits the rate at which the burst, b, can be placed in
533 the network. As before, the buffer requirements can be lowered by
534 subtracting from Dsum the propagation delay since the last reshaping
535 point, if it is known. Conversely, if a non-zero slack term, Sout, is
536 returned by the network element, the buffer requirements are
537 increased by adding Sout to Dsum.

539 While sending applications are encouraged to set the peak rate
540 parameter and reshaping points are required to conform to it, it is
541 always acceptable to ignore the peak rate for the purposes of
542 computing buffer requirements and end-to-end delays. The result is
543 simply an overestimate of the buffering and delay. As noted above,
544 if the peak rate is unknown (and thus potentially infinite), the
545 buffering required is b+Csum+Dsum*R. The end-to-end delay without
546 the peak rate is b/R+Ctot/R+Dtot.

548 The parameter D at each service element should be set to the maximum
549 packet transfer delay (independent of bucket size) through the
550 service element. For instance, in a simple router, one might compute
551 the worst-case amount of time it may take for a datagram to get
552 through the input interface to the processor, and how long it would
553 take to get from the processor to the outbound interface (assuming
554 the queueing schemes work correctly). For an Ethernet, it might
555 represent the worst-case delay if the maximum number of collisions is
556 experienced.

558 The parameter C is the data backlog resulting from the vagaries of
559 how a specific implementation deviates from a strict bit-by-bit
560 service. So, for instance, for packetized weighted fair queueing, C
561 is set to M.

563 If a network element uses a certain amount of slack, Si, to reduce
564 the amount of resources that it has reserved for a particular flow,
565 i, the value Si should be stored at the network element.
566 Subsequently, if reservation refreshes are received for flow i, the
567 network element must use the same slack Si without any further
568 computation. This guarantees consistency in the reservation process.

570 As an example of the use of the slack term, consider the case where
571 the required end-to-end delay, Dreq, is larger than the maximum delay
572 of the fluid flow system. The latter is obtained by setting R=r in
573 the fluid delay formula, and is given by

575      b/r + Ctot/r + Dtot.

577 In this case, the slack term is

579      S = Dreq - (b/r + Ctot/r + Dtot).

581 The slack term may be used by the network elements to adjust their
582 local reservations, so that they can admit flows that would otherwise
583 have been rejected. A service element at an intermediate network
584 element that can internally differentiate between delay and rate
585 guarantees can now take advantage of this information to lower the
586 amount of resources allocated to this flow. For example, by taking an
587 amount of slack s <= S, an RCSD scheduler [5] can increase the local
588 delay bound, d, assigned to the flow, to d+s. Given an RSpec, (Rin,
589 Sin), it would do so by setting Rout = Rin and Sout = Sin - s.

591 Similarly, a network element using a WFQ scheduler can decrease its
592 local reservation from Rin to Rout by using some of the slack in the
593 RSpec. This can be accomplished by using the transformation rules
594 given in the previous section, which ensure that the reduced
595 reservation level will not increase the overall end-to-end delay.

597 Evaluation Criteria

599 The scheduling algorithm and admission control algorithm of the
600 element must ensure that the delay bounds are never violated.
601 Furthermore, the element must ensure that misbehaving flows do not
602 affect the service given to other flows. Vendors are encouraged to
603 formally prove that their implementation is an approximation of the
604 fluid model.
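
The slack computation and the RCSD-style adjustment described earlier
in this section can be illustrated with a short sketch. It is not part
of the service definition: the function names are hypothetical, the
numbers are made up for illustration, and the sketch assumes b and Ctot
are in bytes, r and Rin in bytes per second, and Dtot, Dreq, Sin and s
in seconds.

```python
def fluid_delay_bound(b, Ctot, Dtot, R):
    """Delay bound without the peak rate: b/R + Ctot/R + Dtot."""
    if R <= 0:
        raise ValueError("reserved rate R must be positive")
    return b / R + Ctot / R + Dtot


def slack_term(Dreq, b, r, Ctot, Dtot):
    """S = Dreq - (b/r + Ctot/r + Dtot); the bound is evaluated at R = r.

    A negative result means Dreq is tighter than the bound at R = r,
    so there is no slack to hand out.
    """
    return Dreq - fluid_delay_bound(b, Ctot, Dtot, r)


def take_slack(Rin, Sin, s):
    """Consume s of the slack to relax a local delay bound from d to d + s.

    Returns (Rout, Sout) with Rout = Rin and Sout = Sin - s.
    """
    if not 0 <= s <= Sin:
        raise ValueError("s must satisfy 0 <= s <= Sin")
    return Rin, Sin - s


# Illustrative values only (assumptions, not taken from this document).
S = slack_term(Dreq=0.25, b=10000, r=100000, Ctot=3000, Dtot=0.02)
Rout, Sout = take_slack(Rin=100000, Sin=S, s=0.04)
```

Keeping the rate unchanged and passing on only the reduced slack
mirrors the rule Rout = Rin, Sout = Sin - s given for the RCSD case; a
WFQ element would instead lower the rate using the transformation
rules, which this sketch does not attempt to reproduce.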
606 Examples of Implementation

608 Several algorithms and implementations exist that approximate the
609 fluid model. They include Weighted Fair Queueing (WFQ) [2],
610 Jitter-EDD [4], Virtual Clock [3] and a scheme proposed by IBM [5]. A nice
611 theoretical presentation that shows these schemes are part of a large
612 class of algorithms can be found in [6].

614 Examples of Use

616 Consider an application that is intolerant of any lost or late
617 packets. It uses the advertised values Ctot and Dtot and the TSpec
618 of the flow to compute the resulting delay bound from a service
619 request with rate R. Assuming R < p, it then sets its playback point
620 to [(b-M)/R*(p-R)/(p-r)]+(M+Ctot)/R+Dtot.

622 Security Considerations

624 This memo discusses how this service could be abused to permit denial
625 of service attacks. The service, as defined, does not allow denial
626 of service (although service may degrade under certain
627 circumstances).

629 Acknowledgements

631 The authors would like to gratefully acknowledge the help of the INT
632 SERV working group. We would also like to expressly acknowledge the
633 help of several people who helped us ensure the mathematics of this
634 document were correct [TBA].

636 References

638 [1] S. Shenker and J. Wroclawski. "Network Element Service
639 Specification Template", Internet Draft, June 1995,

642 [2] A. Demers, S. Keshav and S. Shenker, "Analysis and Simulation of
643 a Fair Queueing Algorithm," in Internetworking: Research and
644 Experience, Vol. 1, No. 1, pp. 3-26.

646 [3] L. Zhang, "Virtual Clock: A New Traffic Control Algorithm for
647 Packet Switching Networks," in Proc. ACM SIGCOMM '90, pp. 19-29.

649 [4] D. Verma, H. Zhang, and D. Ferrari, "Guaranteeing Delay Jitter
650 Bounds in Packet Switching Networks," in Proc. Tricomm '91.

652 [5] L. Georgiadis, R. Guerin, V. Peris, and K. N. Sivarajan,
653 "Efficient Network QoS Provisioning Based on per Node Traffic
654 Shaping," IBM Research Report No. RC-20064.
656 [6] P. Goyal, S.S. Lam and H.M. Vin, "Determining End-to-End Delay
657 Bounds in Heterogeneous Networks," in Proc. 5th Intl. Workshop on
658 Network and Operating System Support for Digital Audio and Video,
659 April 1995.

661 [7] S. Shenker. "Specification of General Characterization
662 Parameters", Internet Draft, November 1995,

665 Authors' Address:

667 Scott Shenker
668 Xerox PARC
669 3333 Coyote Hill Road
670 Palo Alto, CA 94304-1314
671 shenker@parc.xerox.com
672 415-812-4840
673 415-812-4471 (FAX)

675 Craig Partridge
676 BBN
677 2370 Amherst St
678 Palo Alto CA 94306
679 craig@bbn.com