Internet Engineering Task Force                  Integrated Services WG
INTERNET-DRAFT                       S. Shenker/C. Partridge/R. Guerin
draft-ietf-intserv-guaranteed-svc-07.txt                 Xerox/BBN/IBM
                                                        3 February 1997
Expires: 8/3/97

             Specification of Guaranteed Quality of Service

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This document is a product of the Integrated Services working group
of the Internet Engineering Task Force. Comments are solicited and
should be addressed to the working group's mailing list at
int-serv@isi.edu and/or the author(s).
This draft reflects minor changes from the IETF meeting in Los
Angeles and comments received after circulating draft 5.

Abstract

This memo describes the network element behavior required to deliver
a guaranteed service (guaranteed delay and bandwidth) in the
Internet. Guaranteed service provides firm (mathematically provable)
bounds on end-to-end datagram queueing delays, making it possible to
guarantee both delay and bandwidth. This specification follows the
service specification template described in [1].

Introduction

This document defines the requirements for network elements that
support guaranteed service. This memo is one of a series of
documents that specify the network element behavior required to
support various qualities of service in IP internetworks. Services
described in these documents are useful both in the global Internet
and in private IP networks.

This document is based on the service specification template given in
[1]. Please refer to that document for definitions and additional
information about the specification of qualities of service within
the IP protocol family.

In brief, the concept behind this memo is that a flow is described
using a token bucket and, given this description of a flow, a service
element (a router, a subnet, etc.) computes various parameters
describing how the service element will handle the flow's data. By
combining the parameters from the various service elements in a path,
it is possible to compute the maximum delay a piece of data will
experience when transmitted via that path.

It is important to note three characteristics of this memo and the
service it specifies:

1. While the requirements a setup mechanism must follow to achieve
a guaranteed reservation are carefully specified, neither the
setup mechanism itself nor the method for identifying flows is
specified.
One can create a guaranteed reservation using a
protocol like RSVP, manual configuration of relevant routers, or a
network management protocol like SNMP. This specification is
intentionally independent of setup mechanism.

2. To achieve a bounded delay requires that every service element
in the path supports guaranteed service or adequately mimics
guaranteed service. However, this requirement does not imply that
guaranteed service must be deployed throughout the Internet to be
useful. Guaranteed service can have clear benefits even when
partially deployed. If fully deployed in an intranet, that
intranet can support guaranteed service internally. And an ISP
can put guaranteed service in its backbone and provide guaranteed
service between customers (or between POPs).

3. Because service elements produce a delay bound as a result
rather than take a delay bound as an input to be achieved, it is
sometimes assumed that applications cannot control the delay. In
reality, guaranteed service gives applications considerable
control over their delay.

In brief, delay has two parts: a fixed delay (transmission delays,
etc.) and a queueing delay. The fixed delay is a property of the
chosen path, which is determined not by guaranteed service but by
the setup mechanism. Only the queueing delay is determined by
guaranteed service. And (as the equations later in this memo
show) the queueing delay is primarily a function of two
parameters: the token bucket (in particular, the bucket size b)
and the data rate (R) the application requests. These two values
are completely under the application's control. In other words,
an application can usually accurately estimate, a priori, what
queueing delay guaranteed service will likely promise.
Furthermore, if the delay is larger than expected, the application
can modify its token bucket and data rate in predictable ways to
achieve a lower delay.
End-to-End Behavior

The end-to-end behavior provided by a series of network elements that
conform to this document is an assured level of bandwidth that, when
used by a policed flow, produces a delay-bounded service with no
queueing loss for all conforming datagrams (assuming no failure of
network components or changes in routing during the life of the
flow).

The end-to-end behavior conforms to the fluid model (described under
Network Element Data Handling below) in that the delivered queueing
delays do not exceed the fluid delays by more than the specified
error bounds. More precisely, the end-to-end delay bound is
[(b-M)/R*(p-R)/(p-r)]+(M+Ctot)/R+Dtot for p>R>=r, and
(M+Ctot)/R+Dtot for r<=p<=R (where b, r, p, M, R, Ctot, and Dtot are
defined later in this document).

NOTE: While the per-hop error terms needed to compute the end-to-end
delays are exported by the service module (see Exported
Information below), the mechanisms needed to collect per-hop
bounds and make the end-to-end quantities Ctot and Dtot known to
the applications are not described in this specification. These
functions are provided by reservation setup protocols, routing
protocols or other network management functions and are outside
the scope of this document.

The maximum end-to-end queueing delay (as characterized by Ctot and
Dtot) and bandwidth (characterized by R) provided along a path will
be stable. That is, they will not change as long as the end-to-end
path does not change.

Guaranteed service does not control the minimal or average delay of
datagrams, merely the maximal queueing delay. Furthermore, to
compute the maximum delay a datagram will experience, the latency of
the path MUST be determined and added to the guaranteed queueing
delay.
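The two-branch delay bound above can be sketched directly. The following helper is illustrative only (the function name and the example numbers are not part of this specification); rates and sizes are in bytes per second and bytes, and Ctot and Dtot must be supplied in units consistent with the desired result (the exported D terms are carried in microseconds, so convert as needed):

```python
def end_to_end_delay_bound(b, r, p, M, R, Ctot, Dtot):
    """Maximum end-to-end queueing delay for a flow with token bucket
    (r, b), peak rate p, maximum datagram size M, reserved rate R, and
    composed error terms Ctot (bytes) and Dtot (time units)."""
    assert R >= r, "the reserved rate R must be at least the token rate r"
    if p > R:
        # p > R >= r: the burst arrives at p but is served at R
        return (b - M) / R * (p - R) / (p - r) + (M + Ctot) / R + Dtot
    else:
        # r <= p <= R: only the per-datagram and error terms remain
        return (M + Ctot) / R + Dtot

# Hypothetical flow: 100 kbyte bucket, 1 Mbyte/s token rate,
# 2 Mbyte/s peak, 1500-byte datagrams, 1.5 Mbyte/s reservation.
bound = end_to_end_delay_bound(b=100_000, r=1_000_000, p=2_000_000,
                               M=1500, R=1_500_000, Ctot=3000, Dtot=0.002)
```

Note how raising R shrinks both the burst-drain term and the (M+Ctot)/R term, which is why an application can trade a larger reservation for a lower delay bound.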
(However, as noted below, a conservative bound on the latency
can be computed by observing the delay experienced by any one
packet.)

This service is subject to admission control.

Motivation

Guaranteed service guarantees that datagrams will arrive within the
guaranteed delivery time and will not be discarded due to queue
overflows, provided the flow's traffic stays within its specified
traffic parameters. This service is intended for applications which
need a firm guarantee that a datagram will arrive no later than a
certain time after it was transmitted by its source. For example,
some audio and video "play-back" applications are intolerant of any
datagram arriving after their play-back time. Applications that have
hard real-time requirements will also require guaranteed service.

This service does not attempt to minimize the jitter (the difference
between the minimal and maximal datagram delays); it merely controls
the maximal queueing delay. Because the guaranteed delay bound is a
firm one, the delay has to be set large enough to cover extremely
rare cases of long queueing delays. Several studies have shown that
the actual delay for the vast majority of datagrams can be far lower
than the guaranteed delay. Therefore, authors of playback
applications should note that datagrams will often arrive far earlier
than the delivery deadline and will have to be buffered at the
receiving system until it is time for the application to process
them.

This service represents one extreme end of delay control for
networks. Most other services providing delay control provide much
weaker assurances about the resulting delays. In order to provide
this high level of assurance, guaranteed service is typically only
useful if provided by every network element along the path (i.e., by
both routers and the links that interconnect the routers).
Moreover, as described in the Exported Information section, effective
provision and use of the service requires that the set-up protocol or
other mechanism used to request service provides service
characterizations to intermediate routers and to the endpoints.

Network Element Data Handling Requirements

The network element MUST ensure that the service approximates the
"fluid model" of service. The fluid model at service rate R is
essentially the service that would be provided by a dedicated wire of
bandwidth R between the source and receiver. Thus, in the fluid
model of service at a fixed rate R, the flow's service is completely
independent of that of any other flow.

The flow's level of service is characterized at each network element
by a bandwidth (or service rate) R and a buffer size B. R represents
the share of the link's bandwidth the flow is entitled to and B
represents the buffer space in the network element that the flow may
consume. The network element MUST ensure that its service matches
the fluid model at that same rate to within a sharp error bound.

The definition of guaranteed service relies on the result that the
fluid delay of a flow obeying a token bucket (r,b) and being served
by a line with bandwidth R is bounded by b/R as long as R is no less
than r. Guaranteed service with a service rate R, where now R is a
share of bandwidth rather than the bandwidth of a dedicated line,
approximates this behavior.

Consequently, the network element MUST ensure that the queueing delay
of any datagram be less than b/R+C/R+D, where C and D describe the
maximal local deviation away from the fluid model. It is important
to emphasize that C and D are maximums.
So, for instance, if an
implementation has occasional gaps in service (perhaps due to
processing routing updates), D needs to be large enough to account
for the time a datagram may lose during the gap in service. (C and D
are described in more detail in the section on Exported Information.)

NOTE: Strictly speaking, this memo requires only that the service
a flow receives is never worse than it would receive under this
approximation of the fluid model. It is perfectly acceptable to
give better service. For instance, if a flow is currently not
using its share, R, algorithms such as Weighted Fair Queueing that
temporarily give other flows the unused bandwidth are perfectly
acceptable (indeed, are encouraged).

Links are not permitted to fragment datagrams as part of guaranteed
service. Datagrams larger than the MTU of the link MUST be policed
as nonconformant, which means that they will be policed according to
the rules described in the Policing section below.

Invocation Information

Guaranteed service is invoked by specifying the traffic (TSpec) and
the desired service (RSpec) to the network element. A service
request for an existing flow that has a new TSpec and/or RSpec SHOULD
be treated as a new invocation, in the sense that admission control
SHOULD be reapplied to the flow. Flows that reduce their TSpec
and/or their RSpec (i.e., their new TSpec/RSpec is strictly smaller
than the old TSpec/RSpec according to the ordering rules described in
the section on Ordering below) SHOULD never be denied service.

The TSpec takes the form of a token bucket plus a peak rate (p), a
minimum policed unit (m), and a maximum datagram size (M).

The token bucket has a bucket depth, b, and a bucket rate, r. Both b
and r MUST be positive.
The rate, r, is measured in bytes of IP
datagrams per second, and can range from 1 byte per second to as
large as 40 terabytes per second (or close to what is believed to be
the maximum theoretical bandwidth of a single strand of fiber).
Clearly, particularly for large bandwidths, only the first few digits
are significant, and so the use of floating point representations,
accurate to at least 0.1%, is encouraged.

The bucket depth, b, is also measured in bytes and can range from 1
byte to 250 gigabytes. Again, floating point representations
accurate to at least 0.1% are encouraged.

The range of values is intentionally large to allow for future
bandwidths. The range is not intended to imply that a network
element has to support the entire range.

The peak rate, p, is measured in bytes of IP datagrams per second and
has the same range and suggested representation as the bucket rate.
The peak rate is the maximum rate at which the source and any
reshaping points (reshaping points are defined below) may inject
bursts of traffic into the network. More precisely, it is a
requirement that for all time periods the amount of data sent cannot
exceed M+pT, where M is the maximum datagram size and T is the length
of the time period. Furthermore, p MUST be greater than or equal to
the token bucket rate, r. If the peak rate is unknown or
unspecified, then p MUST be set to infinity.

The minimum policed unit, m, is an integer measured in bytes. All IP
datagrams less than size m will be counted, when policed and tested
for conformance to the TSpec, as being of size m. The maximum
datagram size, M, is the biggest datagram that will conform to the
traffic specification; it is also measured in bytes. The flow MUST
be rejected if the requested maximum datagram size is larger than the
MTU of the link. Both m and M MUST be positive, and m MUST be less
than or equal to M.
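The constraints on the TSpec fields above can be collected into a short validation sketch. The function name and the error strings are illustrative, not part of this specification:

```python
import math

def validate_tspec(r, b, p, m, M, link_mtu):
    """Check a TSpec (r, b, p, m, M) against the constraints above.
    Returns a list of violations; an empty list means acceptable.
    An unknown or unspecified peak rate is passed as math.inf."""
    errors = []
    if not (r > 0 and b > 0):
        errors.append("b and r MUST be positive")
    if not (0 < m <= M):
        errors.append("m and M MUST be positive, with m <= M")
    if not p >= r:
        errors.append("p MUST be greater than or equal to r")
    if M > link_mtu:
        errors.append("flow MUST be rejected: M exceeds the link MTU")
    return errors

# Hypothetical flow with an unspecified peak rate (p = infinity):
validate_tspec(r=1_000_000, b=100_000, p=math.inf, m=64, M=1500,
               link_mtu=1500)   # -> []
```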
The guaranteed service uses the general TOKEN_BUCKET_TSPEC
parameter defined in Reference [8] to describe a data flow's
traffic characteristics. The description above is of that
parameter. The TOKEN_BUCKET_TSPEC is general parameter number
127. Use of this parameter for the guaranteed service TSpec
simplifies the use of guaranteed service in a multi-service
environment.

The RSpec is a rate R and a slack term S, where R MUST be greater
than or equal to r and S MUST be nonnegative. The rate R is again
measured in bytes of IP datagrams per second and has the same range
and suggested representation as the bucket and peak rates. The
slack term S is measured in microseconds. The RSpec rate can be
bigger than the TSpec rate because higher rates will reduce queueing
delay. The slack term signifies the difference between the desired
delay and the delay obtained by using a reservation level R. This
slack term can be utilized by the network element to reduce its
resource reservation for this flow. When a network element chooses
to utilize some of the slack in the RSpec, it MUST follow specific
rules in updating the R and S fields of the RSpec; these rules are
specified in the Ordering and Merging section. If at the time of
service invocation no slack is specified, the slack term, S, is set
to zero. No buffer specification is included in the RSpec because
the network element is expected to derive the buffer space required
to ensure no queueing loss from the token bucket and peak rate in the
TSpec, the reserved rate and slack in the RSpec, and the exported
information received at the network element (i.e., Ctot and Dtot, or
Csum and Dsum), combined with internal information about how the
element manages its traffic.
The TSpec can be represented by three floating point numbers in
single-precision IEEE floating point format followed by two 32-bit
integers in network byte order. The first floating point value is
the rate (r), the second floating point value is the bucket size (b),
the third floating point value is the peak rate (p), the first
integer is the minimum policed unit (m), and the second integer is
the maximum datagram size (M).

The RSpec rate term, R, can also be represented using single-
precision IEEE floating point.

The slack term, S, can be represented as a 32-bit integer. Its value
can range from 0 to (2**32)-1 microseconds.

When the r, b, p, and R terms are represented as IEEE floating point
values, the sign bit MUST be zero (all values MUST be non-negative).
Exponents less than 127 (i.e., unbiased exponents less than 0) are
prohibited. Exponents greater than 162 (i.e., unbiased exponents
greater than positive 35) are discouraged, except for specifying a
peak rate of infinity. Infinity is represented with an exponent of
all ones (255) and a sign bit and mantissa of all zeroes.

Exported Information

Each guaranteed service module MUST export at least the following
information. All of the parameters described below are
characterization parameters.

A network element's implementation of guaranteed service is
characterized by two error terms, C and D, which represent how the
element's implementation of the guaranteed service deviates from the
fluid model. These two parameters have an additive composition rule.

The error term C is the rate-dependent error term. It represents the
delay a datagram in the flow might experience due to the rate
parameters of the flow. An example of such an error term is the need
to account for the time taken serializing a datagram broken up into
ATM cells, with the cells sent at a frequency of 1/r.
NOTE: It is important to observe that when computing the delay
bound, parameter C is divided by the reservation rate R. This
division is done because, as with the example of serializing the
datagram, the effect of the C term is a function of the
transmission rate. Implementors should take care to confirm that
their C values, when divided by various rates, give appropriate
results. Delay values that are not dependent on the rate SHOULD
be incorporated into the value for the D parameter.

The error term D is the rate-independent, per-element error term and
represents the worst case non-rate-based transit time variation
through the service element. It is generally determined or set at
boot or configuration time. An example of D is a slotted network, in
which guaranteed flows are assigned particular slots in a cycle of
slots. Some part of the per-flow delay may be determined by which
slots in the cycle are allocated to the flow. In this case, D would
measure the maximum amount of time a flow's data, once ready to be
sent, might have to wait for a slot. (Observe that this value can be
computed before slots are assigned and thus can be advertised. For
instance, imagine there are 100 slots. In the worst case, a flow
might get all of its N slots clustered together, such that if a
packet was made ready to send just after the cluster ended, the
packet might have to wait 100-N slot times before transmitting. In
this case one can easily approximate this delay by setting D to 100
slot times.)

If the composition function is applied along the entire path to
compute the end-to-end sums of C and D (Ctot and Dtot) and the
resulting values are then provided to the end nodes (presumably by
the setup protocol), the end nodes can compute the maximal datagram
queueing delays.
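The additive composition rule, together with the (2**32)-1 overflow clamping required for the composed terms, might be sketched as follows (the function name and the example hop values are illustrative, not part of this specification):

```python
CAP = 2**32 - 1   # composed C (bytes) and D (microseconds) saturate here

def compose(hops):
    """Accumulate per-hop error terms along a path. `hops` is a list
    of (C, D) pairs, C in bytes and D in microseconds. Returns
    (Ctot, Dtot), each clamped to (2**32)-1 on overflow."""
    ctot = dtot = 0
    for c, d in hops:
        ctot = min(ctot + c, CAP)
        dtot = min(dtot + d, CAP)
    return ctot, dtot

# Three hypothetical hops, e.g. an ATM link, a router, a slotted subnet:
compose([(9180, 120), (1500, 50), (0, 10_000)])   # -> (10680, 10170)
```

The same accumulator can be restarted at each reshaping point to produce the partial sums Csum and Dsum.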
Moreover, if the partial sums (Csum and Dsum) from
the most recent reshaping point (reshaping points are defined below)
downstream towards receivers are handed to each network element, then
these network elements can compute the buffer allocations necessary
to achieve no datagram loss, as detailed in the section Guidelines
for Implementors. The proper use and provision of this service
requires that the quantities Ctot and Dtot, and the quantities Csum
and Dsum, be computed. Therefore, we assume that usage of guaranteed
service will be primarily in contexts where these quantities are made
available to end nodes and network elements.

The error term C is measured in units of bytes. An individual
element can advertise a C value between 1 and 2**28 (a little over
250 megabytes) and the total added over all elements can range as
high as (2**32)-1. Should the sum of the different elements' values
exceed (2**32)-1, the end-to-end error term MUST be set to (2**32)-1.

The error term D is measured in units of one microsecond. An
individual element can advertise a delay value between 1 and 2**28
(nearly four and a half minutes) and the total delay added over all
elements can range as high as (2**32)-1. Should the sum of the
different elements' delays exceed (2**32)-1, the end-to-end delay
MUST be set to (2**32)-1.

The guaranteed service is service_name 2.

The RSpec parameter is numbered 130.

Error characterization parameters C and D are numbered 131 and 132.
The end-to-end composed values for C and D (Ctot and Dtot) are
numbered 133 and 134. The since-last-reshaping point composed values
for C and D (Csum and Dsum) are numbered 135 and 136.

Policing

There are two forms of policing in guaranteed service. One form is
simple policing (hereafter just called policing, to be consistent
with other documents), in which arriving traffic is compared against
a TSpec.
The other form is reshaping, where an attempt is made to
restore the (possibly distorted) traffic's shape to conform to the
TSpec, and the fact that traffic is in violation of the TSpec is
discovered because the reshaping fails (the reshaping buffer
overflows).

Policing is done at the edge of the network. Reshaping is done at
all heterogeneous source branch points and at all source merge
points. A heterogeneous source branch point is a spot where the
multicast distribution tree from a source branches to multiple
distinct paths, and the TSpec's of the reservations on the various
outgoing links are not all the same. Reshaping need only be done if
the TSpec on the outgoing link is "less than" (in the sense described
in the Ordering section) the TSpec reserved on the immediately
upstream link. A source merge point is where the distribution paths
or trees from two different sources (sharing the same reservation)
merge. It is the responsibility of the invoker of the service (a
setup protocol, local configuration tool, or similar mechanism) to
identify points where policing is required. Reshaping may also be
done at points other than those described above. Policing MUST NOT
be done except at the edge of the network.

The token bucket and peak rate parameters require that traffic MUST
obey the rule that over all time periods, the amount of data sent
cannot exceed M+min[pT, rT+b-M], where r and b are the token bucket
parameters, M is the maximum datagram size, and T is the length of
the time period (note that when p is infinite this reduces to the
standard token bucket requirement). For the purposes of this
accounting, links MUST count datagrams which are smaller than the
minimum policing unit to be of size m. Datagrams which arrive at an
element and cause a violation of the M+min[pT, rT+b-M] bound are
considered non-conformant.
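The M+min[pT, rT+b-M] rule can be read as requiring traffic to pass two token buckets that both start full: an (r, b) bucket, which allows rT+b bytes over any period T, and a peak-rate (p, M) bucket, which allows pT+M. A minimal policer sketch under that reading (the class and method names are illustrative, not part of this specification):

```python
import math

class Policer:
    """Sketch of a guaranteed-service policer: a datagram conforms if
    it fits in both an (r, b) token bucket and a (p, M) peak-rate
    bucket, with datagrams smaller than m counted as size m."""

    def __init__(self, r, b, p, m, M):
        self.r, self.b, self.p, self.m, self.M = r, b, p, m, M
        self.tokens = b      # (r, b) bucket MUST start full
        self.peak = M        # (p, M) bucket likewise starts full
        self.last = 0.0

    def arrive(self, t, size):
        """Return True if a datagram of `size` bytes arriving at time
        `t` (seconds, non-decreasing) is conformant."""
        dt = t - self.last
        self.last = t
        self.tokens = min(self.b, self.tokens + self.r * dt)
        if not math.isinf(self.p):
            self.peak = min(self.M, self.peak + self.p * dt)
        size = max(size, self.m)       # minimum policed unit
        if size > self.M:              # bigger than M: never conformant
            return False
        if size <= self.tokens and (math.isinf(self.p) or size <= self.peak):
            self.tokens -= size
            if not math.isinf(self.p):
                self.peak -= size
            return True
        return False
```

Non-conformant datagrams consume no tokens here, matching their relegation to best effort rather than deferral.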
At the edge of the network, traffic is policed to ensure it conforms
to the token bucket. Non-conforming datagrams SHOULD be treated as
best-effort datagrams. [If and when a marking ability becomes
available, these non-conformant datagrams SHOULD be "marked" as
being non-compliant and then treated as best effort datagrams at all
subsequent routers.]

Best effort service is defined as the default service a network
element would give to a datagram that is not part of a flow and was
sent between the flow's source and destination. Among other
implications, this definition means that if a flow's datagram is
changed to a best effort datagram, all flow control (e.g., RED [2])
that is normally applied to best effort datagrams is applied to that
datagram too.

NOTE: There may be situations outside the scope of this document,
such as when a service module's implementation of guaranteed
service is being used to implement traffic sharing rather than a
quality of service, where the desired action is to discard non-
conforming datagrams. To allow for such uses, implementors SHOULD
ensure that the action to be taken for non-conforming datagrams is
configurable.

Inside the network, policing does not produce the desired results,
because queueing effects will occasionally cause a flow's traffic
that entered the network as conformant to be no longer conformant at
some downstream network element. Therefore, inside the network,
network elements that wish to police traffic MUST do so by reshaping
traffic to the token bucket. Reshaping entails delaying datagrams
until they are within conformance of the TSpec.

Reshaping is done by combining a buffer with a token bucket and peak
rate regulator and buffering data until it can be sent in conformance
with the token bucket and peak rate parameters.
(The token bucket regulator MUST start with its token bucket full of tokens.) Under guaranteed service, the amount of buffering required to reshape any conforming traffic back to its original token bucket shape is b+Csum+(Dsum*r), where Csum and Dsum are the sums of the parameters C and D between the last reshaping point and the current reshaping point. Note that knowledge of the peak rate at the reshapers can be used to reduce these buffer requirements (see the section on "Guidelines for Implementors" below). A network element MUST provide the necessary buffers to ensure that conforming traffic is not lost at the reshaper.

NOTE: Observe that a router that is not reshaping can still identify non-conforming datagrams (and discard them or schedule them at lower priority) by observing when queued traffic for the flow exceeds b+Csum+(Dsum*r).

If a datagram arrives to discover the reshaping buffer is full, then the datagram is non-conforming. Observe that this means a reshaper is effectively policing too. As with a policer, the reshaper SHOULD relegate non-conforming datagrams to best effort. [If marking is available, the non-conforming datagrams SHOULD be marked.]

NOTE: As with policers, it SHOULD be possible to configure how reshapers handle non-conforming datagrams.

Note that while the large buffer makes it appear that reshapers add considerable delay, this is not the case. Given a valid TSpec that accurately describes the traffic, reshaping will cause little extra actual delay at the reshaping point (and will not affect the delay bound at all). Furthermore, in the normal case, reshaping will not cause the loss of any data.

However (typically at merge or branch points), it may happen that the TSpec is smaller than the actual traffic.
If this happens, reshaping will cause a large queue to develop at the reshaping point, which both causes substantial additional delays and forces some datagrams to be treated as non-conforming. This scenario makes an unpleasant denial of service attack possible, in which a receiver who is successfully receiving a flow's traffic via best effort service is pre-empted by a new receiver who requests a reservation for the flow, but with an inadequate TSpec and RSpec. The flow's traffic will now be policed and possibly reshaped. If the policing function was chosen to discard datagrams, the best-effort receiver would stop receiving traffic. For this reason, in the normal case, policers are simply to treat non-conforming datagrams as best effort (marking them if marking is implemented). While this protects against denial of service, it is still true that the bad TSpec may cause queueing delays to increase.

NOTE: To minimize problems of reordering datagrams, reshaping points may wish to forward a best-effort datagram from the front of the reshaping queue when a new datagram arrives and the reshaping buffer is full.

Readers should also observe that reclassifying datagrams as best effort (as opposed to dropping them) also makes support for elastic flows easier. Such flows can reserve a modest token bucket, and when their traffic exceeds the token bucket, the excess traffic will be sent best effort.

A related issue is that at all network elements, datagrams bigger than the MTU of the network element MUST be considered non-conformant and SHOULD be classified as best effort (and will then either be fragmented or dropped according to the element's handling of best effort traffic). [Again, if marking is available, these reclassified datagrams SHOULD be marked.]

Ordering and Merging

TSpecs are ordered according to the following rules.
TSpec A is a substitute ("as good or better than") for TSpec B if (1) both the token rate r and bucket depth b for TSpec A are greater than or equal to those of TSpec B; (2) the peak rate p is at least as large in TSpec A as it is in TSpec B; (3) the minimum policed unit m is at least as small for TSpec A as it is for TSpec B; and (4) the maximum datagram size M is at least as large for TSpec A as it is for TSpec B.

TSpec A is "less than or equal" to TSpec B if (1) both the token rate r and bucket depth b for TSpec A are less than or equal to those of TSpec B; (2) the peak rate p in TSpec A is at least as small as the peak rate in TSpec B; (3) the minimum policed unit m is at least as large for TSpec A as it is for TSpec B; and (4) the maximum datagram size M is at least as small for TSpec A as it is for TSpec B.

A merged TSpec may be calculated over a set of TSpecs by taking (1) the largest token bucket rate, (2) the largest bucket size, (3) the largest peak rate, (4) the smallest minimum policed unit, and (5) the smallest maximum datagram size across all members of the set. This use of the word "merging" is similar to that in the RSVP protocol [10]; a merged TSpec is one which is adequate to describe the traffic from any one of the constituent TSpecs.

A summed TSpec may be calculated over a set of TSpecs by computing (1) the sum of the token bucket rates, (2) the sum of the bucket sizes, (3) the sum of the peak rates, (4) the smallest minimum policed unit, and (5) the maximum datagram size parameter.

A least common TSpec is one that is sufficient to describe the traffic of any one in a set of traffic flows.
A least common TSpec may be calculated over a set of TSpecs by computing (1) the largest token bucket rate, (2) the largest bucket size, (3) the largest peak rate, (4) the smallest minimum policed unit, and (5) the largest maximum datagram size across all members of the set.

The minimum of two TSpecs differs according to whether the TSpecs can be ordered. If one TSpec is less than the other TSpec, the smaller TSpec is the minimum. Otherwise, the minimum TSpec of two TSpecs is determined by comparing the respective values in the two TSpecs and choosing (1) the smaller token bucket rate, (2) the larger token bucket size, (3) the smaller peak rate, (4) the smaller minimum policed unit, and (5) the smaller maximum datagram size.

RSpecs are merged in a similar manner as TSpecs, i.e., a set of RSpecs is merged into a single RSpec by taking the largest rate R and the smallest slack S. More precisely, RSpec A is a substitute for RSpec B if the value of the reserved service rate, R, in RSpec A is greater than or equal to the value in RSpec B, and the value of the slack, S, in RSpec A is smaller than or equal to that in RSpec B.

Each network element receives a service request of the form (TSpec, RSpec), where the RSpec is of the form (Rin, Sin). The network element processes this request and performs one of two actions:

   a. it accepts the request and returns a new RSpec of the form
      (Rout, Sout);
   b. it rejects the request.

The processing rules for generating the new RSpec are governed by the delay constraint:

   Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin,

where Ctoti is the cumulative sum of the error terms, C, for all the network elements that are upstream of and including the current element, i.
In other words, this element consumes (Sin - Sout) of slack and can use it to reduce its reservation level, provided that the above inequality is satisfied. Rin and Rout MUST also satisfy the constraint:

   r <= Rout <= Rin.

When several RSpecs, each with rate Rj, j=1,2,..., are to be merged at a split point, the value of Rout is the maximum over all the rates Rj, and the value of Sout is the minimum over all the slack terms Sj.

NOTE: The various TSpec functions described above are used by applications which desire to combine TSpecs. It is important to observe, however, that the properties of the actual reservation are determined by combining the TSpec with the RSpec rate (R).

Because the guaranteed reservation requires both the TSpec and the RSpec rate, there exist some difficult problems for shared reservations in RSVP, particularly where two or more source streams meet. Upstream of the meeting point, it would be desirable to reduce the TSpec and RSpec to use only as much bandwidth and buffering as is required by the individual source's traffic. (Indeed, it may be necessary if the sender is transmitting over a low bandwidth link.)

However, the RSpec's rate is set to achieve a particular delay bound (and is not just a function of the TSpec), so changing the RSpec may cause the reservation to fail to meet the receiver's delay requirements. At the same time, not adjusting the RSpec rate means that "shared" RSVP reservations using guaranteed service will fail whenever the bandwidth available at a particular link is less than the receiver's requested rate R, even if the bandwidth is adequate to support the number of senders actually using the link. At this time, this limitation is an open problem in using the guaranteed service with RSVP.
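The TSpec combination rules and the slack delay constraint described above can be sketched as follows (a non-normative illustration; the class and function names are ours):

```python
from dataclasses import dataclass

@dataclass
class TSpec:
    r: float  # token bucket rate (bytes/s)
    b: float  # token bucket depth (bytes)
    p: float  # peak rate (bytes/s)
    m: int    # minimum policed unit (bytes)
    M: int    # maximum datagram size (bytes)

def merge_tspecs(tspecs):
    """Merged TSpec: adequate to describe any one constituent flow
    (largest r, b, and p; smallest m and M)."""
    return TSpec(r=max(t.r for t in tspecs), b=max(t.b for t in tspecs),
                 p=max(t.p for t in tspecs), m=min(t.m for t in tspecs),
                 M=min(t.M for t in tspecs))

def min_tspec(a, c):
    """Minimum of two unordered TSpecs: smaller r, LARGER b,
    smaller p, smaller m, smaller M (per the rules above)."""
    return TSpec(r=min(a.r, c.r), b=max(a.b, c.b), p=min(a.p, c.p),
                 m=min(a.m, c.m), M=min(a.M, c.M))

def slack_constraint_holds(b, ctot_i, r, r_in, s_in, r_out, s_out):
    """Check r <= Rout <= Rin and the delay constraint
    Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin,
    where ctot_i is the cumulative C up to and including element i."""
    if not (r <= r_out <= r_in):
        return False
    return s_out + (b + ctot_i) / r_out <= s_in + (b + ctot_i) / r_in
```

An element would accept a request only when it can return an (Rout, Sout) pair for which the constraint holds.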
Guidelines for Implementors

This section discusses a number of important implementation issues in no particular order.

It is important to note that individual subnetworks are network elements, and both routers and subnetworks MUST support the guaranteed service model to achieve guaranteed service. Since subnetworks typically are not capable of negotiating service using IP-based protocols, as part of providing guaranteed service, routers will have to act as proxies for the subnetworks they are attached to.

In some cases, this proxy service will be easy. For instance, on a leased line managed by a WFQ scheduler on the upstream node, the proxy need simply ensure that the sum of all the flows' RSpec rates does not exceed the bandwidth of the line, and needs to advertise the rate-based and non-rate-based delays of the link as the values of C and D.

In other cases, this proxy service will be complex. In an ATM network, for example, it may require establishing an ATM VC for the flow and computing the C and D terms for that VC. Readers may observe that the token bucket and peak rate used by guaranteed service map directly to the Sustained Cell Rate, Burst Size, and Peak Cell Rate of ATM's Q.2931 QoS parameters for Variable Bit Rate traffic.

The assurance that datagrams will not be lost is obtained by setting the router buffer space B to be equal to the token bucket b plus some error term (described below).

Another issue related to subnetworks is that the TSpec's token bucket rates measure IP traffic and do not (and cannot) account for link level headers. So the subnetwork network elements MUST adjust the rate and possibly the bucket size to account for adding link level headers. Tunnels MUST also account for the additional IP headers that they add.
For datagram networks, a maximum header rate can usually be computed by dividing the rate and bucket sizes by the minimum policed unit. For networks that do internal fragmentation, such as ATM, the computation may be more complex, since one MUST account for both per-fragment overhead and any wastage (padding bytes transmitted) due to mismatches between datagram sizes and fragment sizes. For instance, a conservative estimate of the additional data rate imposed by ATM AAL5 plus ATM segmentation and reassembly is

   ((r/48)*5)+((r/m)*(8+52))

which represents the rate divided into 48-byte cells multiplied by the 5-byte ATM header, plus the maximum datagram rate (r/m) multiplied by the cost of the 8-byte AAL5 header plus the maximum space that can be wasted by ATM segmentation of a datagram (which is the 52 bytes wasted in a cell that contains one byte). But this estimate is likely to be wildly high, especially if m is small, since ATM wastage is usually much less than 52 bytes. (ATM implementors should be warned that the token bucket may also have to be scaled when setting the VC parameters for call setup, and that this example does not account for overhead incurred by encapsulations such as those specified in RFC 1483.)

To ensure no loss, network elements will have to allocate some buffering for bursts. If every hop implemented the fluid model perfectly, this buffering would simply be b (the token bucket size). However, as noted in the discussion of reshaping earlier, implementations are approximations, and we expect that traffic will become more bursty as it goes through the network. However, as with shaping, the amount of buffering required to handle the burstiness is bounded by b+Csum+Dsum*R.
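The AAL5 overhead estimate above can be sketched as a small helper (a non-normative sketch; the function name is ours):

```python
def aal5_overhead_rate(r, m):
    """Conservative extra data rate (bytes/s) for ATM AAL5 plus
    segmentation and reassembly, per the estimate above:
      (r/48)*5      one 5-byte cell header per 48 payload bytes
      (r/m)*(8+52)  per-datagram cost of the 8-byte AAL5 overhead
                    plus worst-case 52 bytes of padding wastage
    r: token bucket rate (bytes/s); m: minimum policed unit (bytes).
    """
    return (r / 48) * 5 + (r / m) * (8 + 52)
```

As the text notes, this deliberately overestimates: actual per-datagram wastage is usually far below 52 bytes.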
If one accounts for the peak rate, this can be further reduced to

   M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)X

where X is set to r if (b-M)/(p-r) is less than Csum/R+Dsum; X is set to R if (b-M)/(p-r) is greater than or equal to Csum/R+Dsum and p>R; otherwise, X is set to p. This reduction comes from the fact that the peak rate limits the rate at which the burst, b, can be placed in the network. Conversely, if a non-zero slack term, Sout, is returned by the network element, the buffer requirements are increased by adding Sout to Dsum.

While sending applications are encouraged to set the peak rate parameter and reshaping points are required to conform to it, it is always acceptable to ignore the peak rate for the purposes of computing buffer requirements and end-to-end delays. The result is simply an overestimate of the buffering and delay. As noted above, if the peak rate is unknown (and thus potentially infinite), the buffering required is b+Csum+Dsum*R. The end-to-end delay without the peak rate is b/R+Ctot/R+Dtot.

The parameter D for each network element SHOULD be set to the maximum datagram transfer delay variation (independent of rate and bucket size) through the network element. For instance, in a simple router, one might compute the difference between the worst case and best case times it takes for a datagram to get through the input interface to the processor, and add it to any variation that may occur in how long it would take to get from the processor to the outbound link scheduler (assuming the queueing schemes work correctly).

For weighted fair queueing in a datagram environment, D is set to the link MTU divided by the link bandwidth, to account for the possibility that a packet arrives just as a maximum-sized packet begins to be transmitted, and that the arriving packet should have departed before the maximum-sized packet.
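The buffer bound above, including the choice of X and the fallback when the peak rate is unknown, can be sketched as follows (a non-normative sketch assuming p > r; the function name is ours):

```python
def required_buffer(M, b, p, r, R, csum, dsum):
    """Buffer bound M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)*X,
    with X chosen per the rule above. Falls back to the
    peak-rate-unknown bound b + Csum + Dsum*R when p is infinite.
    Assumes p > r, so the peak rate actually limits the burst.
    """
    if p == float('inf'):
        return b + csum + dsum * R
    t = csum / R + dsum
    if (b - M) / (p - r) < t:
        X = r
    elif p > R:
        X = R
    else:
        X = p
    return M + (b - M) * (p - X) / (p - r) + t * X
```

As the text states, ignoring the peak rate is always safe: it simply yields the larger b + Csum + Dsum*R estimate.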
For a frame-based, slotted system such as Stop-and-Go queueing, D is the maximum number of slots a datagram may have to wait before getting a chance to be transmitted.

Note that multicasting may make determining D more difficult. In many subnets, ATM being one example, the properties of the subnet may depend on the path taken from the multicast sender to the receiver. There are a number of possible approaches to this problem. One is to choose a representative latency for the overall subnet and set D to the (non-negative) difference from that latency. Another is to estimate subnet properties at exit points from the subnet, since the exit point presumably is best placed to compute the properties of its path from the source.

NOTE: It is important to note that there is no fixed set of rules about how a subnet determines its properties, and each subnet technology will have to develop its own set of procedures to accurately compute C, D, and slack values.

D is intended to be distinct from the latency through the network element. Latency is the minimum time through the device (the speed-of-light delay in a fiber or the absolute minimum time it would take to move a packet through a router), while the parameter D is intended to bound the variability in non-rate-based delay. In practice, this distinction is sometimes arbitrary (the latency may be minimal); in such cases it is perfectly reasonable to combine the latency with D and to advertise the latency as zero.

NOTE: It is implicit in this scheme that to get a complete guarantee of the maximum delay a packet might experience, a user of this service will need to know both the queueing delay (provided by C and D) and the latency. The latency is not advertised by this service but is a general characterization parameter (advertised as specified in [8]).
However, even if latency is not advertised, this service can still be used. The simplest approach is to measure the delay experienced by the first packet (or the minimum delay of the first few packets) received and treat this delay value as an upper bound on the latency.

The parameter C is the data backlog resulting from the vagaries of how a specific implementation deviates from a strict bit-by-bit service. So, for instance, for datagramized weighted fair queueing, C is set to M to account for packetization effects.

If a network element uses a certain amount of slack, Si, to reduce the amount of resources that it has reserved for a particular flow, i, the value Si SHOULD be stored at the network element. Subsequently, if reservation refreshes are received for flow i, the network element MUST use the same slack Si without any further computation. This guarantees consistency in the reservation process.

As an example of the use of the slack term, consider the case where the required end-to-end delay, Dreq, is larger than the maximum delay of the fluid flow system. The latter is obtained by setting R=r in the fluid delay formula (for stability, R>=r must be true), and is given by

   b/r + Ctot/r + Dtot.

In this case the slack term is

   S = Dreq - (b/r + Ctot/r + Dtot).

The slack term may be used by network elements to adjust their local reservations, so that they can admit flows that would otherwise have been rejected. An intermediate network element that can internally differentiate between delay and rate guarantees can take advantage of this information to lower the amount of resources allocated to this flow. For example, by taking an amount of slack s <= S, an RCSD scheduler [5] can increase the local delay bound, d, assigned to the flow, to d+s.
Given an RSpec (Rin, Sin), it would do so by setting Rout = Rin and Sout = Sin - s.

Similarly, a network element using a WFQ scheduler can decrease its local reservation from Rin to Rout by using some of the slack in the RSpec. This can be accomplished by using the transformation rules given in the previous section, which ensure that the reduced reservation level will not increase the overall end-to-end delay.

Evaluation Criteria

The scheduling algorithm and admission control algorithm of the element MUST ensure that the delay bounds are never violated and datagrams are not lost when a source's traffic conforms to the TSpec. Furthermore, the element MUST ensure that misbehaving flows do not affect the service given to other flows. Vendors are encouraged to formally prove that their implementation is an approximation of the fluid model.

Examples of Implementation

Several algorithms and implementations exist that approximate the fluid model. They include Weighted Fair Queueing (WFQ) [2], Jitter-EDD [4], Virtual Clock [3], and a scheme proposed by IBM [5]. A nice theoretical presentation that shows these schemes are part of a large class of algorithms can be found in [6].

Examples of Use

Consider an application that is intolerant of any lost or late datagrams. It uses the advertised values Ctot and Dtot and the TSpec of the flow to compute the resulting delay bound from a service request with rate R. Assuming R < p, it then sets its playback point to [(b-M)/R*(p-R)/(p-r)]+(M+Ctot)/R+Dtot.

Security Considerations

This memo discusses how this service could be abused to permit denial of service attacks. The service, as defined, does not allow denial of service (although service may degrade under certain circumstances).
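As a recap of the slack example in "Guidelines for Implementors" and the playback-point computation in "Examples of Use", the formulas can be sketched as follows (a non-normative sketch; the function names are ours):

```python
def fluid_delay(b, r, ctot, dtot):
    """Maximum delay of the fluid flow system with R = r:
    b/r + Ctot/r + Dtot."""
    return b / r + ctot / r + dtot

def slack(dreq, b, r, ctot, dtot):
    """Slack available when the required end-to-end delay Dreq
    exceeds the fluid bound: S = Dreq - (b/r + Ctot/r + Dtot)."""
    return dreq - fluid_delay(b, r, ctot, dtot)

def playback_delay(b, M, p, r, R, ctot, dtot):
    """Delay bound used to set the playback point, assuming R < p:
    (b-M)/R * (p-R)/(p-r) + (M+Ctot)/R + Dtot."""
    return (b - M) / R * (p - R) / (p - r) + (M + ctot) / R + dtot
```

A positive slack value is what an intermediate element may (partially) consume, per the RSpec transformation rules of the previous section.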
Appendix 1: Use of the Guaranteed Service with RSVP

The use of guaranteed service in conjunction with the RSVP resource reservation setup protocol is specified in reference [9]. That document gives the format of the RSVP FLOWSPEC, SENDER_TSPEC, and ADSPEC objects needed to support applications desiring guaranteed service and gives information about how RSVP processes those objects. The RSVP protocol itself is specified in reference [10].

References

[1] S. Shenker and J. Wroclawski, "Network Element QoS Control Service Specification Template", Internet Draft, July 1996.

[2] A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of a Fair Queueing Algorithm", in Internetworking: Research and Experience, Vol. 1, No. 1, pp. 3-26.

[3] L. Zhang, "Virtual Clock: A New Traffic Control Algorithm for Packet Switching Networks", in Proc. ACM SIGCOMM '90, pp. 19-29.

[4] D. Verma, H. Zhang, and D. Ferrari, "Guaranteeing Delay Jitter Bounds in Packet Switching Networks", in Proc. Tricomm '91.

[5] L. Georgiadis, R. Guerin, V. Peris, and K. N. Sivarajan, "Efficient Network QoS Provisioning Based on per Node Traffic Shaping", IBM Research Report No. RC-20064.

[6] P. Goyal, S.S. Lam, and H.M. Vin, "Determining End-to-End Delay Bounds in Heterogeneous Networks", in Proc. 5th Intl. Workshop on Network and Operating System Support for Digital Audio and Video, April 1995.

[7] A.K.J. Parekh, A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks, MIT Laboratory for Information and Decision Systems, Report LIDS-TH-2089, February 1992.

[8] S. Shenker and J. Wroclawski, "General Characterization Parameters for Integrated Service Network Elements", Internet Draft, July 1996.

[9] J. Wroclawski, "Use of RSVP with IETF Integrated Services", Internet Draft, July 1996.

[10] B. Braden, et al.,
"Resource Reservation Protocol (RSVP) - 920 Version 1 Functional Specification", Internet Draft, July 1996, 921 923 Authors' Addresses: 925 Scott Shenker 926 Xerox PARC 927 3333 Coyote Hill Road 928 Palo Alto, CA 94304-1314 930 email: shenker@parc.xerox.com 931 415-812-4840 932 415-812-4471 (FAX) 934 Craig Partridge 935 BBN 936 2370 Amherst St 937 Palo Alto CA 94306 939 email: craig@bbn.com 941 Roch Guerin 942 IBM T.J. Watson Research Center 943 Yorktown Heights, NY 10598 945 email: guerin@watson.ibm.com 946 914-784-7038 947 914-784-6318 (FAX)