Internet Engineering Task Force                     Integrated Services WG
INTERNET-DRAFT                          S. Shenker/C. Partridge/R. Guerin
draft-ietf-intserv-guaranteed-svc-04.txt                    Xerox/BBN/IBM
                                                              10 June 1996
Expires: 1/1/97

             Specification of Guaranteed Quality of Service

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This document is a product of the Integrated Services working group
of the Internet Engineering Task Force. Comments are solicited and
should be addressed to the working group's mailing list at
int-serv@isi.edu and/or the author(s).

This draft reflects minor changes from the IETF meeting in Los
Angeles.
Abstract

This memo describes the network element behavior required to deliver
a guaranteed service (guaranteed delay and bandwidth) in the
Internet. Guaranteed service provides firm (mathematically provable)
bounds on end-to-end datagram queueing delays, making it possible to
offer a service that guarantees both delay and bandwidth. This
specification follows the service specification template described
in [1].

Introduction

This document defines the requirements for network elements that
support guaranteed service. This memo is one of a series of
documents that specify the network element behavior required to
support various qualities of service in IP internetworks. Services
described in these documents are useful both in the global Internet
and in private IP networks.

This document is based on the service specification template given in
[1]. Please refer to that document for definitions and additional
information about the specification of qualities of service within
the IP protocol family.

End-to-End Behavior

The end-to-end behavior provided by a series of service elements that
conform to this document is an assured level of bandwidth that, when
used by a policed flow, produces a delay-bounded service with no
queueing loss for all conforming datagrams (assuming no failure of
network components or changes in routing during the life of the
flow).

The end-to-end behavior conforms to the fluid model (described under
Network Element Data Handling below) in that the delivered queueing
delays do not exceed the fluid delays by more than the specified
error bounds. More precisely, the end-to-end delay bound is
[(b-M)/R*(p-R)/(p-r)]+(M+Ctot)/R+Dtot for p>R, and (M+Ctot)/R+Dtot
for r<=p<=R (where b, r, p, M, R, Ctot, and Dtot are defined later in
this document).
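The two-case delay bound above can be written as a small function. The sketch below is illustrative only and not part of this specification (the function name is ours); rates are in bytes per second, sizes in bytes, Ctot in bytes, and Dtot in seconds:

```python
def e2e_delay_bound(b, r, p, M, R, Ctot, Dtot):
    """End-to-end queueing delay bound for guaranteed service.

    b: bucket depth (bytes), r: token rate (bytes/s), p: peak rate (bytes/s),
    M: maximum datagram size (bytes), R: reserved rate (bytes/s, R >= r),
    Ctot: composed rate-dependent error term (bytes),
    Dtot: composed rate-independent error term (seconds).
    """
    if p > R:
        # Burst at peak rate drains at R until the token bucket limits it.
        return (b - M) / R * (p - R) / (p - r) + (M + Ctot) / R + Dtot
    # For r <= p <= R the peak-rate burst term vanishes.
    return (M + Ctot) / R + Dtot
```

Note how the bound shrinks as the reserved rate R grows, which is why the RSpec rate may usefully exceed the TSpec token rate.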
    NOTE: While the per-hop error terms needed to compute the end-to-end
    delays are exported by the service module (see Exported Information
    below), the mechanisms needed to collect per-hop bounds and make the
    end-to-end quantities Ctot and Dtot known to the applications are not
    described in this specification. These functions are provided by
    reservation setup protocols, routing protocols, or other network
    management functions and are outside the scope of this document.

The maximum end-to-end queueing delay (as characterized by Ctot and
Dtot) and bandwidth (characterized by R) provided along a path will
be stable. That is, they will not change as long as the end-to-end
path does not change.

Guaranteed service does not control the minimal delay of datagrams,
merely the maximal queueing delay.

This service is subject to admission control.

Motivation

Guaranteed service guarantees that datagrams will arrive within the
guaranteed delivery time and will not be discarded due to queue
overflows, provided the flow's traffic stays within its specified
traffic parameters. This service is intended for applications which
need a firm guarantee that a datagram will arrive no later than a
certain time after it was transmitted by its source. For example,
some audio and video "play-back" applications are intolerant of any
datagram arriving after their play-back time. Applications that have
hard real-time requirements will also require guaranteed service.

This service does not attempt to minimize the jitter (the difference
between the minimal and maximal datagram delays); it merely controls
the maximal queueing delay. Because the guaranteed delay bound is a
firm one, the delay has to be set large enough to cover extremely
rare cases of long queueing delays.
Several studies have shown that
the actual delay for the vast majority of datagrams can be far lower
than the guaranteed delay. Therefore, authors of playback
applications should note that datagrams will often arrive far earlier
than the delivery deadline and will have to be buffered at the
receiving system until it is time for the application to process
them.

This service represents one extreme end of delay control for
networks. Most other services providing delay control provide much
weaker assurances about the resulting delays. In order to provide
this high level of assurance, guaranteed service is typically only
useful if provided by every network element along the path (i.e., by
both routers and the links and switches that interconnect the
routers). Moreover, as described in the Exported Information
section, effective provision and use of the service requires that the
set-up protocol used to request service provide service
characterizations to intermediate routers and to the endpoints.

Network Element Data Handling Requirements

The network element MUST ensure that the service approximates the
"fluid model" of service. The fluid model at service rate R is
essentially the service that would be provided by a dedicated wire of
bandwidth R between the source and receiver. Thus, in the fluid
model of service at a fixed rate R, the flow's service is completely
independent of that of any other flow.

The flow's level of service is characterized at each network element
by a bandwidth (or service rate) R and a buffer size B. R represents
the share of the link's bandwidth the flow is entitled to and B
represents the buffer space in the router that the flow may consume.
The network element MUST ensure that its service matches the fluid
model at that same rate to within a sharp error bound.
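Since each flow's service is characterized by a rate R and a buffer B, the simplest admission-control bookkeeping an element can do is compare committed sums against its capacity. The sketch below is purely illustrative (the class and attribute names are ours, and real elements must also apply the scheduling and delay-bound checks described later in this document):

```python
class Element:
    """Minimal per-element bookkeeping for guaranteed-service admission."""

    def __init__(self, link_bandwidth, buffer_pool):
        self.link_bandwidth = link_bandwidth  # bytes/s available on the link
        self.buffer_pool = buffer_pool        # bytes of buffer available
        self.committed_rate = 0
        self.committed_buffer = 0

    def admit(self, R, B):
        """Try to reserve rate R (bytes/s) and buffer B (bytes) for a flow."""
        if (self.committed_rate + R <= self.link_bandwidth and
                self.committed_buffer + B <= self.buffer_pool):
            self.committed_rate += R
            self.committed_buffer += B
            return True
        return False  # request rejected by admission control
```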
The definition of guaranteed service relies on the result that the
fluid delay of a flow obeying a token bucket (r,b) and being served
by a line with bandwidth R is bounded by b/R as long as R is no less
than r. Guaranteed service with a service rate R, where now R is a
share of bandwidth rather than the bandwidth of a dedicated line,
approximates this behavior.

Consequently, the network element MUST ensure that the delay of any
datagram is less than b/R+C/R+D, where C and D describe the maximal
local deviation away from the fluid model. It is important to
emphasize that C and D are maximums. So, for instance, if an
implementation has occasional gaps in service (perhaps due to
processing routing updates), D needs to be large enough to account
for the time a datagram may lose during the gap in service. (C and D
are described in more detail in the section on Exported Information.)

    NOTE: Strictly speaking, this memo requires only that the service
    a flow receives is never worse than it would receive under this
    approximation of the fluid model. It is perfectly acceptable to
    give better service. For instance, if a flow is currently not
    using its share, R, algorithms such as Weighted Fair Queueing that
    temporarily give other flows the unused bandwidth are perfectly
    acceptable (indeed, are encouraged).

Links are not permitted to fragment datagrams as part of guaranteed
service. Datagrams larger than the MTU of the link MUST be policed
as nonconformant, which means that they will be policed according to
the rules described in the Policing section below.

Invocation Information

Guaranteed service is invoked by specifying the traffic (TSpec) and
the desired service (RSpec) to the network element.
A service
request for an existing flow that has a new TSpec and/or RSpec SHOULD
be treated as a new invocation, in the sense that admission control
SHOULD be reapplied to the flow. Flows that reduce their TSpec
and/or their RSpec (i.e., their new TSpec/RSpec is strictly smaller
than the old TSpec/RSpec according to the ordering rules described in
the section on Ordering below) SHOULD never be denied service.

The TSpec takes the form of a token bucket plus a peak rate (p), a
minimum policed unit (m), and a maximum datagram size (M).

The token bucket has a bucket depth, b, and a bucket rate, r. Both b
and r MUST be positive. The rate, r, is measured in bytes of IP
datagrams per second, and can range from 1 byte per second to as
large as 40 terabytes per second (close to what is believed to be the
maximum theoretical bandwidth of a single strand of fiber). Clearly,
particularly for large bandwidths, only the first few digits are
significant, so the use of floating point representations accurate to
at least 0.1% is encouraged.

The bucket depth, b, is also measured in bytes and can range from 1
byte to 250 gigabytes. Again, floating point representations
accurate to at least 0.1% are encouraged.

The range of values is intentionally large to allow for future
bandwidths. The range is not intended to imply that a network
element has to support the entire range.

The peak rate, p, is measured in bytes of IP datagrams per second and
has the same range and suggested representation as the bucket rate.
The peak rate is the maximum rate at which the source and any
reshaping points (reshaping points are defined below) may inject
bursts of traffic into the network. More precisely, it is a
requirement that for all time periods the amount of data sent cannot
exceed M+pT, where M is the maximum datagram size and T is the length
of the time period.
Furthermore, p MUST be greater than or equal to
the token bucket rate, r. If the peak rate is unknown or
unspecified, then p MUST be set to infinity.

The minimum policed unit, m, is an integer measured in bytes. All IP
datagrams less than size m will be counted, when policed and tested
for conformance to the TSpec, as being of size m. The maximum
datagram size, M, is the biggest datagram that will conform to the
traffic specification; it is also measured in bytes. The flow MUST
be rejected if the requested maximum datagram size is larger than the
MTU of the link. Both m and M MUST be positive, and m MUST be less
than or equal to M.

The RSpec is a rate R and a slack term S, where R MUST be greater
than or equal to r and S MUST be nonnegative. The RSpec rate can be
bigger than the TSpec rate because higher rates will reduce queueing
delay. The slack term signifies the difference between the desired
delay and the delay obtained by using a reservation level R. This
slack term can be utilized by the service element to reduce its
resource reservation for this flow. When a service element chooses to
utilize some of the slack in the RSpec, it MUST follow specific rules
in updating the R and S fields of the RSpec; these rules are
specified in the Ordering and Merging section. If at the time of
service invocation no slack is specified, the slack term, S, is set
to zero. No buffer specification is included in the RSpec because
the service element is expected to derive the required buffer space
to ensure no queueing loss from the token bucket and peak rate in the
TSpec, the reserved rate and slack in the RSpec, and the exported
information received at the network element (i.e., Ctot and Dtot, or
Csum and Dsum), combined with internal information about how the
element manages its traffic.
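The parameter constraints above (b and r positive; p >= r; 0 < m <= M <= MTU; R >= r; S >= 0) can be expressed directly, as can the arrival limit they imply; note that the combined bound M+min[pT, rT+b-M] is the one this document's Policing section uses for conformance testing. The following sketch is illustrative only (the function names are ours), with math.inf standing in for an unspecified peak rate:

```python
from math import inf

def tspec_valid(r, b, p, m, M, mtu):
    """Check the TSpec constraints stated in this specification."""
    return (r > 0 and b > 0       # bucket rate and depth MUST be positive
            and p >= r            # peak rate MUST be >= token bucket rate
            and 0 < m <= M        # minimum policed unit <= max datagram size
            and M <= mtu)         # flow MUST be rejected if M exceeds link MTU

def rspec_valid(R, S, r):
    """Check the RSpec constraints: R >= r and nonnegative slack."""
    return R >= r and S >= 0

def arrival_bound(T, r, b, p, M):
    """Maximum bytes a conforming flow may send in any period of length T.

    With p = inf this reduces to the standard token bucket bound rT+b."""
    return M + min(p * T, r * T + b - M)
```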
The TSpec can be represented by three floating point numbers in
single-precision IEEE floating point format followed by two 32-bit
integers in network byte order. The first floating point value is
the rate (r), the second floating point value is the bucket size (b),
the third floating point value is the peak rate (p), the first
integer is the minimum policed unit (m), and the second integer is
the maximum datagram size (M).

The RSpec rate, R, and the slack term, S, can also be represented
using single-precision IEEE floating point.

If the IEEE floating point representation is used, the sign bit MUST
be zero. (All values MUST be positive.) Exponents less than 127
(i.e., 0) are prohibited. Exponents greater than 162 (i.e., positive
35) are discouraged, except for specifying a peak rate of infinity.
Infinity is represented with an exponent of all ones (255) and a
sign bit and mantissa of all zeroes.

Exported Information

Each guaranteed service module MUST export at least the following
information. All of the parameters described below are
characterization parameters.

A network element's implementation of guaranteed service is
characterized by two error terms, C and D, which represent how the
element's implementation of the guaranteed service deviates from the
fluid model. These two parameters have an additive composition rule.

The error term C is the rate-dependent error term. It represents the
delay a datagram in the flow might experience due to the rate
parameters of the flow. An example of such an error term is the need
to account for the time taken serializing a datagram broken up into
ATM cells, with the cells sent at a frequency of 1/r.

    NOTE: It is important to observe that when computing the delay
    bound, parameter C is divided by the reservation rate R.
    This
    division is done because, as with the example of serializing the
    datagram, the effect of the C term is a function of the
    transmission rate. Implementors should take care to confirm that
    their C values, when divided by various rates, give appropriate
    results. Delay values that are not dependent on the rate should
    be incorporated into the value for the D parameter.

The error term D is the fixed per-element error term and represents
the worst case non-rate-based transit time through the service
element. It is generally determined or set at boot or configuration
time. An example of D is a slotted network, in which guaranteed
flows are assigned particular slots in a cycle of slots. Some part
of the per-flow delay may be determined by which slots in the cycle
are allocated to the flow. In this case, D would measure the maximum
amount of time a flow's data, once ready to be sent, might have to
wait for a slot. (Observe that this value can be computed before
slots are assigned and thus can be advertised. For instance, imagine
there are 100 slots. In the worst case, a flow might get all of its
N slots clustered together, such that if a packet was made ready to
send just after the cluster ended, the packet might have to wait
100-N slot times before transmitting. In this case one can easily
approximate this delay by setting D to 100 slot times.)

If the composition function is applied along the entire path to
compute the end-to-end sums of C and D (Ctot and Dtot) and the
resulting values are then provided to the end nodes (presumably by
the setup protocol), the end nodes can compute the maximal datagram
queueing delays.
Moreover, if the partial sums (Csum and Dsum) from
the most recent reshaping point (reshaping points are defined below)
downstream towards receivers are handed to each network element, then
these network elements can compute the buffer allocations necessary
to achieve no datagram loss, as detailed in the section Guidelines
for Implementors. The proper use and provision of this service
requires that the quantities Ctot and Dtot, and the quantities Csum
and Dsum, be computed. Therefore, we assume that usage of guaranteed
service will be primarily in contexts where these quantities are made
available to end nodes and network elements.

The error term C is measured in units of bytes. An individual
element can advertise a C value between 1 and 2**28 (a little over
250 megabytes) and the total added over all elements can range as
high as (2**32)-1. Should the sum of the different elements' values
exceed (2**32)-1, the end-to-end error term MUST be set to (2**32)-1.

The error term D is measured in units of one microsecond. An
individual element can advertise a delay value between 1 and 2**28
(about four and a half minutes) and the total delay added over all
elements can range as high as (2**32)-1. Should the sum of the
different elements' delays exceed (2**32)-1, the end-to-end delay
MUST be set to (2**32)-1.

The guaranteed service is service_name 2.

Error characterization parameter C is numbered 1 and parameter D is
numbered 2.

The end-to-end composed value for C (Ctot) is numbered 3 and the
end-to-end composed value for D (Dtot) is numbered 4.

The since-last-reshaping-point composed value for C (Csum) is
numbered 5 and the since-last-reshaping-point composed value for D
(Dsum) is numbered 6.

No other exported data is required by this specification.

Policing

There are two forms of policing in guaranteed service.
One form is
simple policing (hereafter just called policing, to be consistent
with other documents), in which arriving traffic is compared against
a TSpec. The other form is reshaping, where an attempt is made to
restore the (possibly distorted) traffic's shape to conform to the
TSpec, and the fact that traffic is in violation of the TSpec is
discovered because the reshaping fails (the reshaping buffer
overflows).

Policing is done at the edge of the network. Reshaping is done at
all heterogeneous source branch points and at all source merge
points. A heterogeneous source branch point is a spot where the
multicast distribution tree from a source branches to multiple
distinct paths, and the TSpecs of the reservations on the various
outgoing links are not all the same. Reshaping need only be done if
the TSpec on the outgoing link is "less than" (in the sense described
in the Ordering section) the TSpec reserved on the immediately
upstream link. A source merge point is where the multicast
distribution trees from two different sources (sharing the same
reservation) merge. It is the responsibility of the invoker of the
service (a setup protocol, local configuration tool, or similar
mechanism) to identify points where policing is required. Reshaping
may be done at points other than those described above. Policing
MUST NOT be done except at the edge of the network.

The token bucket and peak rate parameters require that traffic MUST
obey the rule that over all time periods, the amount of data sent
cannot exceed M+min[pT, rT+b-M], where r and b are the token bucket
parameters, M is the maximum datagram size, and T is the length of
the time period (note that when p is infinite this reduces to the
standard token bucket requirement). For the purposes of this
accounting, links MUST count datagrams which are smaller than the
minimal policing unit to be of size m.
Datagrams which arrive at an
element and cause a violation of the M+min[pT, rT+b-M] bound are
considered non-conformant.

At the edge of the network, traffic is policed to ensure it conforms
to the token bucket. Non-conforming datagrams are treated as best-
effort datagrams. [If and when a marking ability becomes available,
these non-conformant datagrams SHOULD be ``marked'' as being non-
compliant and then treated as best-effort datagrams at all subsequent
routers.]

    NOTE: There may be situations outside the scope of this document,
    such as when a service module's implementation of guaranteed
    service is being used to implement traffic sharing rather than a
    quality of service, where the desired action is to discard non-
    conforming datagrams. To allow for such uses, implementors SHOULD
    ensure that the action to be taken for non-conforming datagrams is
    configurable.

Inside the network, policing does not produce the desired results,
because queueing effects will occasionally cause a flow's traffic
that entered the network as conformant to be no longer conformant at
some downstream network element. Therefore, inside the network,
service elements that wish to police traffic MUST do so by reshaping
traffic to the token bucket. Reshaping entails delaying datagrams
until they are within conformance of the TSpec.

Reshaping is done by combining a buffer with a token bucket and peak
rate regulator and buffering data until it can be sent in conformance
with the token bucket and peak rate parameters. (The token bucket
regulator MUST start with its token bucket full of tokens.) Under
guaranteed service, the amount of buffering required to reshape any
conforming traffic back to its original token bucket shape is
b+Csum+(Dsum*r), where Csum and Dsum are the sums of the parameters C
and D between the last reshaping point and the current reshaping
point.
Note that the knowledge of the peak rate at the reshapers can
be used to reduce these buffer requirements (see the section on
"Guidelines for Implementors" below). A network element MUST provide
the necessary buffers to ensure that conforming traffic is not lost
at the reshaper.

If a datagram arrives to discover the reshaping buffer is full, then
the datagram is non-conforming. Observe that this means a reshaper
is effectively policing too. As with a policer, the reshaper SHOULD
relegate non-conforming datagrams to best effort. [If marking is
available, the non-conforming datagrams SHOULD be marked.]

    NOTE: As with policers, it SHOULD be possible to configure how
    reshapers handle non-conforming datagrams.

Note that while the large buffer makes it appear that reshapers add
considerable delay, this is not the case. Given a valid TSpec that
accurately describes the traffic, reshaping will cause little extra
actual delay at the reshaping point (and will not affect the delay
bound at all). Furthermore, in the normal case, reshaping will not
cause the loss of any data.

However, it may happen (typically at merge or branch points) that
the TSpec is smaller than the actual traffic. If this happens,
reshaping will cause a large queue to develop at the reshaping point,
which both causes substantial additional delays and forces some
datagrams to be treated as non-conforming. This scenario makes an
unpleasant denial-of-service attack possible, in which a receiver who
is successfully receiving a flow's traffic via best-effort service is
pre-empted by a new receiver who requests a reservation for the flow,
but with an inadequate TSpec and RSpec. The flow's traffic will now
be policed and possibly reshaped. If the policing function was
chosen to discard datagrams, the best-effort receiver would stop
receiving traffic.
For this reason, in the normal case, policers are
simply to treat non-conforming datagrams as best effort (marking
them if marking is implemented). While this protects against denial
of service, it is still true that the bad TSpec may cause queueing
delays to increase.

    NOTE: To minimize problems of reordering datagrams, reshaping
    points may wish to forward a best-effort datagram from the front
    of the reshaping queue when a new datagram arrives and the
    reshaping buffer is full.

Readers should also observe that reclassifying datagrams as best
effort makes support for elastic flows easier. They can reserve a
modest token bucket and, when their traffic exceeds the token bucket,
the excess traffic will be sent best effort.

A related issue is that at all network elements, datagrams bigger
than the MTU of the network element MUST be considered non-conformant
and SHOULD be classified as best effort (and will then either be
fragmented or dropped according to the element's handling of best-
effort traffic). [Again, if marking is available, these reclassified
datagrams SHOULD be marked.]

Ordering and Merging

TSpecs are ordered according to the following rule: TSpec A is a
substitute ("as good or better than") for TSpec B if (1) both the
token rate r and bucket depth b for TSpec A are greater than or equal
to those of TSpec B, (2) the peak rate p is at least as large in
TSpec A as it is in TSpec B, (3) the minimum policed unit m is at
least as small for TSpec A as it is for TSpec B, and (4) the maximum
datagram size M is at least as large for TSpec A as it is for TSpec
B.

A merged TSpec may be calculated over a set of TSpecs by taking the
largest token bucket rate, largest bucket size, largest peak rate,
smallest minimum policed unit, and largest maximum datagram size
across all members of the set.
This use of the word "merging" is similar to that in the RSVP
protocol; a merged TSpec is one which is adequate to describe the
traffic from any one of a number of flows.

RSpecs are merged in a similar manner to TSpecs, i.e., a set of
RSpecs is merged into a single RSpec by taking the largest rate R and
the smallest slack S.  More precisely, RSpec A is a substitute for
RSpec B if the value of the reserved service rate, R, in RSpec A is
greater than or equal to the value in RSpec B, and the value of the
slack, S, in RSpec A is smaller than or equal to that in RSpec B.

Each network element receives a service request of the form (TSpec,
RSpec), where the RSpec is of the form (Rin, Sin).  The network
element processes this request and performs one of two actions:

   a. it accepts the request and returns a new RSpec of the form
      (Rout, Sout);
   b. it rejects the request.

The processing rules for generating the new RSpec are governed by the
delay constraint:

   Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin,

where Ctoti is the cumulative sum of the error terms, C, for all the
network elements that are upstream of the current element, i.  In
other words, this element consumes (Sin - Sout) of slack and can use
it to reduce its reservation level, provided that the above
inequality is satisfied.  Rin and Rout MUST also satisfy the
constraint:

   r <= Rout <= Rin.

When several RSpecs, each with rate Rj, j=1,2,..., are to be merged
at a split point, the value of Rout is the maximum over all the rates
Rj, and the value of Sout is the minimum over all the slack terms Sj.

Guidelines for Implementors

This section discusses a number of important implementation issues in
no particular order.
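As a first implementation note, the RSpec processing rule of the
previous section can be sketched as follows (a simplified
illustration with hypothetical function names; a real element would
also check resource availability before accepting a request):

```python
def max_sout(b, ctot_i, r_in, s_in, r_out):
    """Largest Sout allowed by the delay constraint
       Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin."""
    return s_in + (b + ctot_i) / r_in - (b + ctot_i) / r_out

def process_request(r, b, ctot_i, r_in, s_in, r_out):
    """Return (Rout, Sout) if the chosen reservation level r_out is
    acceptable, else None (reject).  r is the TSpec token rate, and
    r <= Rout <= Rin MUST hold."""
    if not (r <= r_out <= r_in):
        return None
    s_out = max_sout(b, ctot_i, r_in, s_in, r_out)
    if s_out < 0:   # not enough slack to lower the rate this far
        return None
    return (r_out, s_out)
```

Lowering Rout below Rin consumes (Sin - Sout) of slack; the sketch
simply spends the minimum slack needed for the chosen rate.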
It is important to note that individual subnetworks are service
elements, and both routers and subnetworks MUST support the
guaranteed service model to achieve guaranteed service.  Since
subnetworks typically are not capable of negotiating service using
IP-based protocols, routers will have to act as proxies for the
subnetworks they are attached to as part of providing guaranteed
service.

In some cases, this proxy service will be easy.  For instance, on a
leased line managed by a WFQ scheduler on the upstream node, the
proxy need simply ensure that the sum of all the flows' RSpec rates
does not exceed the bandwidth of the line, and needs to advertise the
rate-based and non-rate-based delays of the link as the values of C
and D.

In other cases, this proxy service will be complex.  In an ATM
network, for example, it may require establishing an ATM VC for the
flow and computing the C and D terms for that VC.  Readers may
observe that the token bucket and peak rate used by guaranteed
service map directly to the Sustained Cell Rate, Burst Size, and Peak
Cell Rate of ATM's Q.2931 QoS parameters for Variable Bit Rate
traffic.

The assurance that datagrams will not be lost is obtained by setting
the router buffer space B to be equal to the token bucket b plus some
error term (described below).

Another issue related to subnetworks is that the TSpec's token bucket
rates measure IP traffic and do not (and cannot) account for link-
level headers.  So the subnetwork service elements MUST adjust the
rate and possibly the bucket size to account for adding link-level
headers.  Tunnels MUST also account for the additional IP headers
that they add.

For datagram networks, a maximum header rate can usually be computed
by dividing the rate and bucket sizes by the minimum policed unit.
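For the simple datagram-network case just described, the adjustment
might be sketched as follows (an assumption-laden illustration; the
link header size is an input chosen by the implementor, e.g. 14
bytes for Ethernet, and the bucket adjustment mirrors the rate
adjustment):

```python
def adjust_for_link_headers(r, b, m, header_bytes):
    """Scale a TSpec token rate r and bucket size b (both in bytes)
    to cover link-level headers.  The worst case is a stream of
    minimum-sized (m-byte) datagrams, i.e. at most r/m datagrams
    (and thus headers) per second."""
    extra_rate = (r / m) * header_bytes      # maximum header rate
    extra_bucket = (b / m) * header_bytes    # headers within one burst
    return r + extra_rate, b + extra_bucket
```

For example, a 1 Mbit/s (125000 byte/s) flow with m = 64 and an
assumed 14-byte link header needs roughly 27 kbyte/s of additional
rate.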
For networks that do internal fragmentation, such as ATM, the
computation may be more complex, since one MUST account for both
per-fragment overhead and any wastage (padding bytes transmitted) due
to mismatches between datagram sizes and fragment sizes.  For
instance, a conservative estimate of the additional data rate imposed
by ATM AAL5 plus ATM segmentation and reassembly is

   ((r/48)*5) + ((r/m)*(8+52)),

which represents the rate divided into 48-byte cells multiplied by
the 5-byte ATM header, plus the maximum datagram rate (r/m)
multiplied by the cost of the 8-byte AAL5 header plus the maximum
space that can be wasted by ATM segmentation of a datagram (which is
the 52 bytes wasted in a cell that contains one byte).  But this
estimate is likely to be wildly high, especially if m is small, since
ATM wastage is usually much less than 52 bytes.  (ATM implementors
should be warned that the token bucket may also have to be scaled
when setting the VC parameters for call setup, and that this example
does not account for overhead incurred by encapsulations such as
those specified in RFC 1483.)

To ensure no loss, service elements will have to allocate some
buffering for bursts.  If every hop implemented the fluid model
perfectly, this buffering would simply be b (the token bucket size).
However, as noted in the discussion of reshaping earlier,
implementations are approximations, and we expect that traffic will
become more bursty as it goes through the network.  As with
reshaping, the amount of buffering required to handle the burstiness
is bounded by

   b + Csum + Dsum*R.

If one accounts for the peak rate, this can be further reduced to

   M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)X,

where X is set to r if (b-M)/(p-r) is less than Csum/R+Dsum; X is set
to R if (b-M)/(p-r) is greater than or equal to Csum/R+Dsum and p>R;
otherwise, X is set to p.
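The peak-rate-aware buffer bound and its three-way choice of X can be
sketched as follows (illustration only; the guard for p <= r, an
assumption of this sketch, simply falls back to the bound that
ignores the peak rate):

```python
def buffer_required(M, b, p, r, R, csum, dsum):
    """Buffer bound M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)X, with X
    chosen as in the text.  If p <= r the peak rate carries no extra
    information, so fall back to b + Csum + Dsum*R."""
    if p <= r:
        return b + csum + dsum * R
    t = (b - M) / (p - r)      # time for the burst to drain at peak rate
    slack_time = csum / R + dsum
    if t < slack_time:
        X = r
    elif p > R:                # t >= slack_time and p > R
        X = R
    else:
        X = p
    return M + (b - M) * (p - X) / (p - r) + slack_time * X
```

Note that this bound never exceeds b + Csum + Dsum*R, the requirement
when the peak rate is unknown.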
This reduction comes from the fact that the peak rate limits the rate
at which the burst, b, can be placed in the network.  Note also that
the buffer requirements can be lowered by subtracting from Dsum the
propagation delay since the last reshaping point, if it is known.
Conversely, if a non-zero slack term, Sout, is returned by the
network element, the buffer requirements are increased by adding Sout
to Dsum.

While sending applications are encouraged to set the peak rate
parameter and reshaping points are required to conform to it, it is
always acceptable to ignore the peak rate for the purposes of
computing buffer requirements and end-to-end delays.  The result is
simply an overestimate of the buffering and delay.  As noted above,
if the peak rate is unknown (and thus potentially infinite), the
buffering required is b+Csum+Dsum*R.  The end-to-end delay without
the peak rate is b/R+Ctot/R+Dtot.

The parameter D at each service element SHOULD be set to the maximum
datagram transfer delay (independent of rate and bucket size) through
the service element.  For instance, in a simple router, one might
compute the worst-case amount of time it may take for a datagram to
get through the input interface to the processor, and how long it
would take to get from the processor to the outbound link scheduler
(assuming the queueing schemes work correctly).  For an Ethernet, D
might represent the worst-case delay if the maximum number of
collisions is experienced.  For datagramized weighted fair queueing,
D is set to the link MTU divided by the link bandwidth, to account
for the possibility that a packet arrives just as a maximum-sized
packet begins to be transmitted, and that the arriving packet should
have departed before the maximum-sized packet.

D is intended to be distinct from the latency through the service
element.
Latency is the minimum time through the device (the speed-of-light
delay in a fiber or the absolute minimum time it would take to move a
packet through a router), while the parameter D is intended to bound
the variability in non-rate-based delay.  In practice, this
distinction is sometimes arbitrary (the latency may be minimal); in
such cases it is perfectly reasonable to combine the latency with D
and to advertise any latency as zero.

   NOTE: It is implicit in this scheme that, to get a complete
   guarantee of the maximum delay a packet might experience, a user
   of this service will need to know both the queueing delay
   (provided by C and D) and the latency.  The latency is not
   advertised by this service but is a general characterization
   parameter (advertised as specified in [7]).

   However, even if latency is not advertised, this service can still
   be used.  The simplest approach is to measure the delay
   experienced by the first packet (or the minimum delay of the first
   few packets) received and treat this delay value as an upper bound
   on the latency.

The parameter C is the data backlog resulting from the vagaries of
how a specific implementation deviates from a strict bit-by-bit
service.  So, for instance, for datagramized weighted fair queueing,
C is set to M to account for packetization effects.

If a network element uses a certain amount of slack, Si, to reduce
the amount of resources that it has reserved for a particular flow,
i, the value Si SHOULD be stored at the network element.
Subsequently, if reservation refreshes are received for flow i, the
network element MUST use the same slack Si without any further
computation.  This guarantees consistency in the reservation process.

As an example of the use of the slack term, consider the case where
the required end-to-end delay, Dreq, is larger than the maximum delay
of the fluid flow system.
The latter is obtained by setting R=r in the fluid delay formula (for
stability, R>=r must be true), and is given by

   b/r + Ctot/r + Dtot.

In this case the slack term is

   S = Dreq - (b/r + Ctot/r + Dtot).

The slack term may be used by the network elements to adjust their
local reservations, so that they can admit flows that would otherwise
have been rejected.  A service element at an intermediate network
element that can internally differentiate between delay and rate
guarantees can take advantage of this information to lower the amount
of resources allocated to this flow.  For example, by taking an
amount of slack s <= S, an RCSD scheduler [5] can increase the local
delay bound, d, assigned to the flow, to d+s.  Given an RSpec (Rin,
Sin), it would do so by setting Rout = Rin and Sout = Sin - s.

Similarly, a network element using a WFQ scheduler can decrease its
local reservation from Rin to Rout by using some of the slack in the
RSpec.  This can be accomplished by using the transformation rules
given in the previous section, which ensure that the reduced
reservation level will not increase the overall end-to-end delay.

Evaluation Criteria

The scheduling algorithm and admission control algorithm of the
element MUST ensure that the delay bounds are never violated.
Furthermore, the element MUST ensure that misbehaving flows do not
affect the service given to other flows.  Vendors are encouraged to
formally prove that their implementation is an approximation of the
fluid model.

Examples of Implementation

Several algorithms and implementations exist that approximate the
fluid model.  They include Weighted Fair Queueing (WFQ) [2], Virtual
Clock [3], Jitter-EDD [4], and a scheme proposed by IBM [5].  A nice
theoretical presentation showing that these schemes are part of a
large class of algorithms can be found in [6].
Examples of Use

Consider an application that is intolerant of any lost or late
datagrams.  It uses the advertised values Ctot and Dtot and the TSpec
of the flow to compute the resulting delay bound from a service
request with rate R.  Assuming R < p, it then sets its playback point
to

   [(b-M)/R * (p-R)/(p-r)] + (M+Ctot)/R + Dtot.

Security Considerations

This memo discusses how this service could be abused to permit
denial-of-service attacks.  The service, as defined, does not allow
denial of service (although service may degrade under certain
circumstances).

Acknowledgements

The authors would like to gratefully acknowledge the help of the INT
SERV working group.  The basic results for the delay guarantees come
from reference [8].

References

[1] S. Shenker and J. Wroclawski, "Network Element Service
    Specification Template", Internet Draft, June 1995.

[2] A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of
    a Fair Queueing Algorithm", Internetworking: Research and
    Experience, Vol. 1, No. 1, pp. 3-26.

[3] L. Zhang, "Virtual Clock: A New Traffic Control Algorithm for
    Packet Switching Networks", Proc. ACM SIGCOMM '90, pp. 19-29.

[4] D. Verma, H. Zhang, and D. Ferrari, "Guaranteeing Delay Jitter
    Bounds in Packet Switching Networks", Proc. Tricomm '91.

[5] L. Georgiadis, R. Guerin, V. Peris, and K. N. Sivarajan,
    "Efficient Network QoS Provisioning Based on per Node Traffic
    Shaping", IBM Research Report No. RC-20064.

[6] P. Goyal, S. S. Lam, and H. M. Vin, "Determining End-to-End Delay
    Bounds in Heterogeneous Networks", Proc. 5th Intl. Workshop on
    Network and Operating System Support for Digital Audio and Video,
    April 1995.

[7] S. Shenker, "Specification of General Characterization
    Parameters", Internet Draft, November 1995.

[8] A. K. J. Parekh, "A Generalized Processor Sharing Approach to
    Flow Control in Integrated Services Networks", MIT Laboratory for
    Information and Decision Systems, Report LIDS-TH-2089, February
    1992.

Authors' Addresses

Scott Shenker
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94304-1314

email: shenker@parc.xerox.com
415-812-4840
415-812-4471 (FAX)

Craig Partridge
BBN
2370 Amherst St
Palo Alto, CA 94306

email: craig@bbn.com

Roch Guerin
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598

email: guerin@watson.ibm.com
914-784-7038
914-784-6318 (FAX)