Internet Engineering Task Force                     Integrated Services WG
INTERNET-DRAFT                          S. Shenker/C. Partridge/R. Guerin
draft-ietf-intserv-guaranteed-svc-04.txt                    Xerox/BBN/IBM
                                                              10 June 1996
Expires: 1/1/97

             Specification of Guaranteed Quality of Service

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This document is a product of the Integrated Services working group
of the Internet Engineering Task Force. Comments are solicited and
should be addressed to the working group's mailing list at
int-serv@isi.edu and/or the author(s).

This draft reflects minor changes from the IETF meeting in Los
Angeles.
Abstract

This memo describes the network element behavior required to deliver
a guaranteed service (guaranteed delay and bandwidth) in the
Internet. Guaranteed service provides firm (mathematically provable)
bounds on end-to-end datagram queueing delays, making it possible to
offer a service that guarantees both delay and bandwidth. This
specification follows the service specification template described
in [1].

Introduction

This document defines the requirements for network elements that
support guaranteed service. This memo is one of a series of
documents that specify the network element behavior required to
support various qualities of service in IP internetworks. Services
described in these documents are useful both in the global Internet
and in private IP networks.

This document is based on the service specification template given in
[1]. Please refer to that document for definitions and additional
information about the specification of qualities of service within
the IP protocol family.

End-to-End Behavior

The end-to-end behavior provided by a series of service elements that
conform to this document is an assured level of bandwidth that, when
used by a policed flow, produces a delay-bounded service with no
queueing loss for all conforming datagrams (assuming no failure of
network components or changes in routing during the life of the
flow).

The end-to-end behavior conforms to the fluid model (described under
Network Element Data Handling below) in that the delivered queueing
delays do not exceed the fluid delays by more than the specified
error bounds. More precisely, the end-to-end delay bound is
[(b-M)/R*(p-R)/(p-r)]+(M+Ctot)/R+Dtot for p>R, and (M+Ctot)/R+Dtot
for r<=p<=R (where b, r, p, M, R, Ctot, and Dtot are defined later in
this document).
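The two-case delay bound above can be written as a small function. The sketch below is illustrative only and not part of this specification (the function name is ours); rates are in bytes per second, sizes in bytes, Ctot in bytes, and Dtot in seconds:

```python
def e2e_delay_bound(b, r, p, M, R, Ctot, Dtot):
    """End-to-end queueing delay bound for guaranteed service.

    b: bucket depth (bytes), r: token rate (bytes/s), p: peak rate (bytes/s),
    M: maximum datagram size (bytes), R: reserved rate (bytes/s, R >= r),
    Ctot: composed rate-dependent error term (bytes),
    Dtot: composed rate-independent error term (seconds).
    """
    if p > R:
        # Burst at peak rate drains at R until the token bucket limits it.
        return (b - M) / R * (p - R) / (p - r) + (M + Ctot) / R + Dtot
    # For r <= p <= R the peak-rate burst term vanishes.
    return (M + Ctot) / R + Dtot
```

Note how the bound shrinks as the reserved rate R grows, which is why the RSpec rate may usefully exceed the TSpec token rate.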
    NOTE: While the per-hop error terms needed to compute the end-to-end
    delays are exported by the service module (see Exported Information
    below), the mechanisms needed to collect per-hop bounds and make the
    end-to-end quantities Ctot and Dtot known to the applications are not
    described in this specification. These functions are provided by
    reservation setup protocols, routing protocols, or other network
    management functions and are outside the scope of this document.

The maximum end-to-end queueing delay (as characterized by Ctot and
Dtot) and bandwidth (characterized by R) provided along a path will
be stable. That is, they will not change as long as the end-to-end
path does not change.

Guaranteed service does not control the minimal delay of datagrams,
merely the maximal queueing delay.

This service is subject to admission control.

Motivation

Guaranteed service guarantees that datagrams will arrive within the
guaranteed delivery time and will not be discarded due to queue
overflows, provided the flow's traffic stays within its specified
traffic parameters. This service is intended for applications which
need a firm guarantee that a datagram will arrive no later than a
certain time after it was transmitted by its source. For example,
some audio and video "play-back" applications are intolerant of any
datagram arriving after their play-back time. Applications that have
hard real-time requirements will also require guaranteed service.

This service does not attempt to minimize the jitter (the difference
between the minimal and maximal datagram delays); it merely controls
the maximal queueing delay. Because the guaranteed delay bound is a
firm one, the delay has to be set large enough to cover extremely
rare cases of long queueing delays.
Several studies have shown that
the actual delay for the vast majority of datagrams can be far lower
than the guaranteed delay. Therefore, authors of playback
applications should note that datagrams will often arrive far earlier
than the delivery deadline and will have to be buffered at the
receiving system until it is time for the application to process
them.

This service represents one extreme end of delay control for
networks. Most other services providing delay control provide much
weaker assurances about the resulting delays. In order to provide
this high level of assurance, guaranteed service is typically only
useful if provided by every network element along the path (i.e., by
both routers and the links and switches that interconnect the
routers). Moreover, as described in the Exported Information
section, effective provision and use of the service requires that the
set-up protocol used to request service provide service
characterizations to intermediate routers and to the endpoints.

Network Element Data Handling Requirements

The network element MUST ensure that the service approximates the
"fluid model" of service. The fluid model at service rate R is
essentially the service that would be provided by a dedicated wire of
bandwidth R between the source and receiver. Thus, in the fluid
model of service at a fixed rate R, the flow's service is completely
independent of that of any other flow.

The flow's level of service is characterized at each network element
by a bandwidth (or service rate) R and a buffer size B. R represents
the share of the link's bandwidth the flow is entitled to and B
represents the buffer space in the router that the flow may consume.
The network element MUST ensure that its service matches the fluid
model at that same rate to within a sharp error bound.
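Since each flow's service is characterized by a rate R and a buffer B, the simplest admission-control bookkeeping an element can do is compare committed sums against its capacity. The sketch below is purely illustrative (the class and attribute names are ours, and real elements must also apply the scheduling and delay-bound checks described later in this document):

```python
class Element:
    """Minimal per-element bookkeeping for guaranteed-service admission."""

    def __init__(self, link_bandwidth, buffer_pool):
        self.link_bandwidth = link_bandwidth  # bytes/s available on the link
        self.buffer_pool = buffer_pool        # bytes of buffer available
        self.committed_rate = 0
        self.committed_buffer = 0

    def admit(self, R, B):
        """Try to reserve rate R (bytes/s) and buffer B (bytes) for a flow."""
        if (self.committed_rate + R <= self.link_bandwidth and
                self.committed_buffer + B <= self.buffer_pool):
            self.committed_rate += R
            self.committed_buffer += B
            return True
        return False  # request rejected by admission control
```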
The definition of guaranteed service relies on the result that the
fluid delay of a flow obeying a token bucket (r,b) and being served
by a line with bandwidth R is bounded by b/R as long as R is no less
than r. Guaranteed service with a service rate R, where now R is a
share of bandwidth rather than the bandwidth of a dedicated line,
approximates this behavior.

Consequently, the network element MUST ensure that the delay of any
datagram is less than b/R+C/R+D, where C and D describe the maximal
local deviation away from the fluid model. It is important to
emphasize that C and D are maximums. So, for instance, if an
implementation has occasional gaps in service (perhaps due to
processing routing updates), D needs to be large enough to account
for the time a datagram may lose during the gap in service. (C and D
are described in more detail in the section on Exported Information.)

    NOTE: Strictly speaking, this memo requires only that the service
    a flow receives is never worse than it would receive under this
    approximation of the fluid model. It is perfectly acceptable to
    give better service. For instance, if a flow is currently not
    using its share, R, algorithms such as Weighted Fair Queueing that
    temporarily give other flows the unused bandwidth are perfectly
    acceptable (indeed, are encouraged).

Links are not permitted to fragment datagrams as part of guaranteed
service. Datagrams larger than the MTU of the link MUST be policed
as nonconformant, which means that they will be policed according to
the rules described in the Policing section below.

Invocation Information

Guaranteed service is invoked by specifying the traffic (TSpec) and
the desired service (RSpec) to the network element.
A service
request for an existing flow that has a new TSpec and/or RSpec SHOULD
be treated as a new invocation, in the sense that admission control
SHOULD be reapplied to the flow. Flows that reduce their TSpec
and/or their RSpec (i.e., their new TSpec/RSpec is strictly smaller
than the old TSpec/RSpec according to the ordering rules described in
the section on Ordering below) SHOULD never be denied service.

The TSpec takes the form of a token bucket plus a peak rate (p), a
minimum policed unit (m), and a maximum datagram size (M).

The token bucket has a bucket depth, b, and a bucket rate, r. Both b
and r MUST be positive. The rate, r, is measured in bytes of IP
datagrams per second, and can range from 1 byte per second to as
large as 40 terabytes per second (close to what is believed to be the
maximum theoretical bandwidth of a single strand of fiber). Clearly,
particularly for large bandwidths, only the first few digits are
significant, so the use of floating point representations accurate to
at least 0.1% is encouraged.

The bucket depth, b, is also measured in bytes and can range from 1
byte to 250 gigabytes. Again, floating point representations
accurate to at least 0.1% are encouraged.

The range of values is intentionally large to allow for future
bandwidths. The range is not intended to imply that a network
element has to support the entire range.

The peak rate, p, is measured in bytes of IP datagrams per second and
has the same range and suggested representation as the bucket rate.
The peak rate is the maximum rate at which the source and any
reshaping points (reshaping points are defined below) may inject
bursts of traffic into the network. More precisely, it is a
requirement that for all time periods the amount of data sent cannot
exceed M+pT, where M is the maximum datagram size and T is the length
of the time period.
Furthermore, p MUST be greater than or equal to
the token bucket rate, r. If the peak rate is unknown or
unspecified, then p MUST be set to infinity.

The minimum policed unit, m, is an integer measured in bytes. All IP
datagrams less than size m will be counted, when policed and tested
for conformance to the TSpec, as being of size m. The maximum
datagram size, M, is the biggest datagram that will conform to the
traffic specification; it is also measured in bytes. The flow MUST
be rejected if the requested maximum datagram size is larger than the
MTU of the link. Both m and M MUST be positive, and m MUST be less
than or equal to M.

The RSpec is a rate R and a slack term S, where R MUST be greater
than or equal to r and S MUST be nonnegative. The RSpec rate can be
bigger than the TSpec rate because higher rates will reduce queueing
delay. The slack term signifies the difference between the desired
delay and the delay obtained by using a reservation level R. This
slack term can be utilized by the service element to reduce its
resource reservation for this flow. When a service element chooses to
utilize some of the slack in the RSpec, it MUST follow specific rules
in updating the R and S fields of the RSpec; these rules are
specified in the Ordering and Merging section. If at the time of
service invocation no slack is specified, the slack term, S, is set
to zero. No buffer specification is included in the RSpec because
the service element is expected to derive the required buffer space
to ensure no queueing loss from the token bucket and peak rate in the
TSpec, the reserved rate and slack in the RSpec, and the exported
information received at the network element (i.e., Ctot and Dtot, or
Csum and Dsum), combined with internal information about how the
element manages its traffic.
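The parameter constraints above (b and r positive; p >= r; 0 < m <= M <= MTU; R >= r; S >= 0) can be expressed directly, as can the arrival limit they imply; note that the combined bound M+min[pT, rT+b-M] is the one this document's Policing section uses for conformance testing. The following sketch is illustrative only (the function names are ours), with math.inf standing in for an unspecified peak rate:

```python
from math import inf

def tspec_valid(r, b, p, m, M, mtu):
    """Check the TSpec constraints stated in this specification."""
    return (r > 0 and b > 0       # bucket rate and depth MUST be positive
            and p >= r            # peak rate MUST be >= token bucket rate
            and 0 < m <= M        # minimum policed unit <= max datagram size
            and M <= mtu)         # flow MUST be rejected if M exceeds link MTU

def rspec_valid(R, S, r):
    """Check the RSpec constraints: R >= r and nonnegative slack."""
    return R >= r and S >= 0

def arrival_bound(T, r, b, p, M):
    """Maximum bytes a conforming flow may send in any period of length T.

    With p = inf this reduces to the standard token bucket bound rT+b."""
    return M + min(p * T, r * T + b - M)
```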
The TSpec can be represented by three floating point numbers in
single-precision IEEE floating point format followed by two 32-bit
integers in network byte order. The first floating point value is
the rate (r), the second floating point value is the bucket size (b),
the third floating point value is the peak rate (p), the first
integer is the minimum policed unit (m), and the second integer is
the maximum datagram size (M).

The RSpec rate, R, and the slack term, S, can also be represented
using single-precision IEEE floating point.

If the IEEE floating point representation is used, the sign bit MUST
be zero. (All values MUST be positive.) Exponents less than 127
(i.e., 0) are prohibited. Exponents greater than 162 (i.e., positive
35) are discouraged, except for specifying a peak rate of infinity.
Infinity is represented with an exponent of all ones (255) and a
sign bit and mantissa of all zeroes.

Exported Information

Each guaranteed service module MUST export at least the following
information. All of the parameters described below are
characterization parameters.

A network element's implementation of guaranteed service is
characterized by two error terms, C and D, which represent how the
element's implementation of the guaranteed service deviates from the
fluid model. These two parameters have an additive composition rule.

The error term C is the rate-dependent error term. It represents the
delay a datagram in the flow might experience due to the rate
parameters of the flow. An example of such an error term is the need
to account for the time taken serializing a datagram broken up into
ATM cells, with the cells sent at a frequency of 1/r.

    NOTE: It is important to observe that when computing the delay
    bound, parameter C is divided by the reservation rate R.
    This
    division is done because, as with the example of serializing the
    datagram, the effect of the C term is a function of the
    transmission rate. Implementors should take care to confirm that
    their C values, when divided by various rates, give appropriate
    results. Delay values that are not dependent on the rate should
    be incorporated into the value for the D parameter.

The error term D is the fixed per-element error term and represents
the worst case non-rate-based transit time through the service
element. It is generally determined or set at boot or configuration
time. An example of D is a slotted network, in which guaranteed
flows are assigned particular slots in a cycle of slots. Some part
of the per-flow delay may be determined by which slots in the cycle
are allocated to the flow. In this case, D would measure the maximum
amount of time a flow's data, once ready to be sent, might have to
wait for a slot. (Observe that this value can be computed before
slots are assigned and thus can be advertised. For instance, imagine
there are 100 slots. In the worst case, a flow might get all of its
N slots clustered together, such that if a packet was made ready to
send just after the cluster ended, the packet might have to wait
100-N slot times before transmitting. In this case one can easily
approximate this delay by setting D to 100 slot times.)

If the composition function is applied along the entire path to
compute the end-to-end sums of C and D (Ctot and Dtot) and the
resulting values are then provided to the end nodes (presumably by
the setup protocol), the end nodes can compute the maximal datagram
queueing delays.
Moreover, if the partial sums (Csum and Dsum) from
the most recent reshaping point (reshaping points are defined below)
downstream towards receivers are handed to each network element, then
these network elements can compute the buffer allocations necessary
to achieve no datagram loss, as detailed in the section Guidelines
for Implementors. The proper use and provision of this service
requires that the quantities Ctot and Dtot, and the quantities Csum
and Dsum, be computed. Therefore, we assume that usage of guaranteed
service will be primarily in contexts where these quantities are made
available to end nodes and network elements.

The error term C is measured in units of bytes. An individual
element can advertise a C value between 1 and 2**28 (a little over
250 megabytes) and the total added over all elements can range as
high as (2**32)-1. Should the sum of the different elements' values
exceed (2**32)-1, the end-to-end error term MUST be set to (2**32)-1.

The error term D is measured in units of one microsecond. An
individual element can advertise a delay value between 1 and 2**28
(about four and a half minutes) and the total delay added over all
elements can range as high as (2**32)-1. Should the sum of the
different elements' delays exceed (2**32)-1, the end-to-end delay
MUST be set to (2**32)-1.

The guaranteed service is service_name 2.

Error characterization parameter C is numbered 1 and parameter D is
numbered 2.

The end-to-end composed value for C (Ctot) is numbered 3 and the
end-to-end composed value for D (Dtot) is numbered 4.

The since-last-reshaping-point composed value for C (Csum) is
numbered 5 and the since-last-reshaping-point composed value for D
(Dsum) is numbered 6.

No other exported data is required by this specification.

Policing

There are two forms of policing in guaranteed service.
One form is
simple policing (hereafter just called policing, to be consistent
with other documents), in which arriving traffic is compared against
a TSpec. The other form is reshaping, where an attempt is made to
restore the (possibly distorted) traffic's shape to conform to the
TSpec, and the fact that traffic is in violation of the TSpec is
discovered because the reshaping fails (the reshaping buffer
overflows).

Policing is done at the edge of the network. Reshaping is done at
all heterogeneous source branch points and at all source merge
points. A heterogeneous source branch point is a spot where the
multicast distribution tree from a source branches to multiple
distinct paths, and the TSpecs of the reservations on the various
outgoing links are not all the same. Reshaping need only be done if
the TSpec on the outgoing link is "less than" (in the sense described
in the Ordering section) the TSpec reserved on the immediately
upstream link. A source merge point is where the multicast
distribution trees from two different sources (sharing the same
reservation) merge. It is the responsibility of the invoker of the
service (a setup protocol, local configuration tool, or similar
mechanism) to identify points where policing is required. Reshaping
may be done at points other than those described above. Policing
MUST NOT be done except at the edge of the network.

The token bucket and peak rate parameters require that traffic MUST
obey the rule that over all time periods, the amount of data sent
cannot exceed M+min[pT, rT+b-M], where r and b are the token bucket
parameters, M is the maximum datagram size, and T is the length of
the time period (note that when p is infinite this reduces to the
standard token bucket requirement). For the purposes of this
accounting, links MUST count datagrams which are smaller than the
minimal policing unit to be of size m.
Datagrams which arrive at an
element and cause a violation of the M+min[pT, rT+b-M] bound are
considered non-conformant.

At the edge of the network, traffic is policed to ensure it conforms
to the token bucket. Non-conforming datagrams are treated as best-
effort datagrams. [If and when a marking ability becomes available,
these non-conformant datagrams SHOULD be ``marked'' as being non-
compliant and then treated as best-effort datagrams at all subsequent
routers.]

    NOTE: There may be situations outside the scope of this document,
    such as when a service module's implementation of guaranteed
    service is being used to implement traffic sharing rather than a
    quality of service, where the desired action is to discard non-
    conforming datagrams. To allow for such uses, implementors SHOULD
    ensure that the action to be taken for non-conforming datagrams is
    configurable.

Inside the network, policing does not produce the desired results,
because queueing effects will occasionally cause a flow's traffic
that entered the network as conformant to be no longer conformant at
some downstream network element. Therefore, inside the network,
service elements that wish to police traffic MUST do so by reshaping
traffic to the token bucket. Reshaping entails delaying datagrams
until they are within conformance of the TSpec.

Reshaping is done by combining a buffer with a token bucket and peak
rate regulator and buffering data until it can be sent in conformance
with the token bucket and peak rate parameters. (The token bucket
regulator MUST start with its token bucket full of tokens.) Under
guaranteed service, the amount of buffering required to reshape any
conforming traffic back to its original token bucket shape is
b+Csum+(Dsum*r), where Csum and Dsum are the sums of the parameters C
and D between the last reshaping point and the current reshaping
point.
Note that the knowledge of the peak rate at the reshapers can
be used to reduce these buffer requirements (see the section on
"Guidelines for Implementors" below). A network element MUST provide
the necessary buffers to ensure that conforming traffic is not lost
at the reshaper.

If a datagram arrives to discover the reshaping buffer is full, then
the datagram is non-conforming. Observe that this means a reshaper
is effectively policing too. As with a policer, the reshaper SHOULD
relegate non-conforming datagrams to best effort. [If marking is
available, the non-conforming datagrams SHOULD be marked.]

    NOTE: As with policers, it SHOULD be possible to configure how
    reshapers handle non-conforming datagrams.

Note that while the large buffer makes it appear that reshapers add
considerable delay, this is not the case. Given a valid TSpec that
accurately describes the traffic, reshaping will cause little extra
actual delay at the reshaping point (and will not affect the delay
bound at all). Furthermore, in the normal case, reshaping will not
cause the loss of any data.

However, it may happen (typically at merge or branch points) that
the TSpec is smaller than the actual traffic. If this happens,
reshaping will cause a large queue to develop at the reshaping point,
which both causes substantial additional delays and forces some
datagrams to be treated as non-conforming. This scenario makes an
unpleasant denial-of-service attack possible, in which a receiver who
is successfully receiving a flow's traffic via best-effort service is
pre-empted by a new receiver who requests a reservation for the flow,
but with an inadequate TSpec and RSpec. The flow's traffic will now
be policed and possibly reshaped. If the policing function was
chosen to discard datagrams, the best-effort receiver would stop
receiving traffic.
For this reason, in the normal case, policers are
simply to treat non-conforming datagrams as best effort (marking
them if marking is implemented). While this protects against denial
of service, it is still true that the bad TSpec may cause queueing
delays to increase.

    NOTE: To minimize problems of reordering datagrams, reshaping
    points may wish to forward a best-effort datagram from the front
    of the reshaping queue when a new datagram arrives and the
    reshaping buffer is full.

Readers should also observe that reclassifying datagrams as best
effort makes support for elastic flows easier. They can reserve a
modest token bucket and, when their traffic exceeds the token bucket,
the excess traffic will be sent best effort.

A related issue is that at all network elements, datagrams bigger
than the MTU of the network element MUST be considered non-conformant
and SHOULD be classified as best effort (and will then either be
fragmented or dropped according to the element's handling of best-
effort traffic). [Again, if marking is available, these reclassified
datagrams SHOULD be marked.]

Ordering and Merging

TSpecs are ordered according to the following rule: TSpec A is a
substitute ("as good or better than") for TSpec B if (1) both the
token rate r and bucket depth b for TSpec A are greater than or equal
to those of TSpec B, (2) the peak rate p is at least as large in
TSpec A as it is in TSpec B, (3) the minimum policed unit m is at
least as small for TSpec A as it is for TSpec B, and (4) the maximum
datagram size M is at least as large for TSpec A as it is for TSpec
B.

A merged TSpec may be calculated over a set of TSpecs by taking the
largest token bucket rate, largest bucket size, largest peak rate,
smallest minimum policed unit, and largest maximum datagram size
across all members of the set.
This use of the word "merging" is similar to that in the RSVP
protocol; a merged TSpec is one which is adequate to describe the
traffic from any one of a number of flows.

RSpecs are merged in a similar manner to TSpecs, i.e., a set of
RSpecs is merged into a single RSpec by taking the largest rate R and
the smallest slack S.  More precisely, RSpec A is a substitute for
RSpec B if the value of the reserved service rate, R, in RSpec A is
greater than or equal to the value in RSpec B, and the value of the
slack, S, in RSpec A is smaller than or equal to that in RSpec B.

Each network element receives a service request of the form (TSpec,
RSpec), where the RSpec is of the form (Rin, Sin).  The network
element processes this request and performs one of two actions:

   a. it accepts the request and returns a new RSpec of the form
      (Rout, Sout);
   b. it rejects the request.

The processing rules for generating the new RSpec are governed by the
delay constraint:

   Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin,

where Ctoti is the cumulative sum of the error terms, C, for all the
network elements that are upstream of the current element, i.  In
other words, this element consumes (Sin - Sout) of slack and can use
it to reduce its reservation level, provided that the above
inequality is satisfied.  Rin and Rout MUST also satisfy the
constraint:

   r <= Rout <= Rin.

When several RSpecs, each with rate Rj, j=1,2,..., are to be merged
at a split point, the value of Rout is the maximum over all the rates
Rj, and the value of Sout is the minimum over all the slack terms Sj.

Guidelines for Implementors

This section discusses a number of important implementation issues in
no particular order.
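As a first implementation note, the RSpec processing rule of the
previous section can be sketched as follows (a simplified
illustration with hypothetical function names; a real element would
also check resource availability before accepting a request):

```python
def max_sout(b, ctot_i, r_in, s_in, r_out):
    """Largest Sout allowed by the delay constraint
       Sout + b/Rout + Ctoti/Rout <= Sin + b/Rin + Ctoti/Rin."""
    return s_in + (b + ctot_i) / r_in - (b + ctot_i) / r_out

def process_request(r, b, ctot_i, r_in, s_in, r_out):
    """Return (Rout, Sout) if the chosen reservation level r_out is
    acceptable, else None (reject).  r is the TSpec token rate, and
    r <= Rout <= Rin MUST hold."""
    if not (r <= r_out <= r_in):
        return None
    s_out = max_sout(b, ctot_i, r_in, s_in, r_out)
    if s_out < 0:   # not enough slack to lower the rate this far
        return None
    return (r_out, s_out)
```

Lowering Rout below Rin consumes (Sin - Sout) of slack; the sketch
simply spends the minimum slack needed for the chosen rate.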
It is important to note that individual subnetworks are service
elements, and both routers and subnetworks MUST support the
guaranteed service model to achieve guaranteed service.  Since
subnetworks typically are not capable of negotiating service using
IP-based protocols, routers will have to act as proxies for the
subnetworks they are attached to as part of providing guaranteed
service.

In some cases, this proxy service will be easy.  For instance, on a
leased line managed by a WFQ scheduler on the upstream node, the
proxy need simply ensure that the sum of all the flows' RSpec rates
does not exceed the bandwidth of the line, and needs to advertise the
rate-based and non-rate-based delays of the link as the values of C
and D.

In other cases, this proxy service will be complex.  In an ATM
network, for example, it may require establishing an ATM VC for the
flow and computing the C and D terms for that VC.  Readers may
observe that the token bucket and peak rate used by guaranteed
service map directly to the Sustained Cell Rate, Burst Size, and Peak
Cell Rate of ATM's Q.2931 QoS parameters for Variable Bit Rate
traffic.

The assurance that datagrams will not be lost is obtained by setting
the router buffer space B to be equal to the token bucket b plus some
error term (described below).

Another issue related to subnetworks is that the TSpec's token bucket
rates measure IP traffic and do not (and cannot) account for link-
level headers.  So the subnetwork service elements MUST adjust the
rate and possibly the bucket size to account for adding link-level
headers.  Tunnels MUST also account for the additional IP headers
that they add.

For datagram networks, a maximum header rate can usually be computed
by dividing the rate and bucket sizes by the minimum policed unit.
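For the simple datagram-network case just described, the adjustment
might be sketched as follows (an assumption-laden illustration; the
link header size is an input chosen by the implementor, e.g. 14
bytes for Ethernet, and the bucket adjustment mirrors the rate
adjustment):

```python
def adjust_for_link_headers(r, b, m, header_bytes):
    """Scale a TSpec token rate r and bucket size b (both in bytes)
    to cover link-level headers.  The worst case is a stream of
    minimum-sized (m-byte) datagrams, i.e. at most r/m datagrams
    (and thus headers) per second."""
    extra_rate = (r / m) * header_bytes      # maximum header rate
    extra_bucket = (b / m) * header_bytes    # headers within one burst
    return r + extra_rate, b + extra_bucket
```

For example, a 1 Mbit/s (125000 byte/s) flow with m = 64 and an
assumed 14-byte link header needs roughly 27 kbyte/s of additional
rate.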
For networks that do internal fragmentation, such as ATM, the
computation may be more complex, since one MUST account for both
per-fragment overhead and any wastage (padding bytes transmitted) due
to mismatches between datagram sizes and fragment sizes.  For
instance, a conservative estimate of the additional data rate imposed
by ATM AAL5 plus ATM segmentation and reassembly is

   ((r/48)*5) + ((r/m)*(8+52)),

which represents the rate divided into 48-byte cells multiplied by
the 5-byte ATM header, plus the maximum datagram rate (r/m)
multiplied by the cost of the 8-byte AAL5 header plus the maximum
space that can be wasted by ATM segmentation of a datagram (which is
the 52 bytes wasted in a cell that contains one byte).  But this
estimate is likely to be wildly high, especially if m is small, since
ATM wastage is usually much less than 52 bytes.  (ATM implementors
should be warned that the token bucket may also have to be scaled
when setting the VC parameters for call setup, and that this example
does not account for overhead incurred by encapsulations such as
those specified in RFC 1483.)

To ensure no loss, service elements will have to allocate some
buffering for bursts.  If every hop implemented the fluid model
perfectly, this buffering would simply be b (the token bucket size).
However, as noted in the discussion of reshaping earlier,
implementations are approximations, and we expect that traffic will
become more bursty as it goes through the network.  As with
reshaping, the amount of buffering required to handle the burstiness
is bounded by

   b + Csum + Dsum*R.

If one accounts for the peak rate, this can be further reduced to

   M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)X,

where X is set to r if (b-M)/(p-r) is less than Csum/R+Dsum; X is set
to R if (b-M)/(p-r) is greater than or equal to Csum/R+Dsum and p>R;
otherwise, X is set to p.
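The peak-rate-aware buffer bound and its three-way choice of X can be
sketched as follows (illustration only; the guard for p <= r, an
assumption of this sketch, simply falls back to the bound that
ignores the peak rate):

```python
def buffer_required(M, b, p, r, R, csum, dsum):
    """Buffer bound M + (b-M)(p-X)/(p-r) + (Csum/R + Dsum)X, with X
    chosen as in the text.  If p <= r the peak rate carries no extra
    information, so fall back to b + Csum + Dsum*R."""
    if p <= r:
        return b + csum + dsum * R
    t = (b - M) / (p - r)      # time for the burst to drain at peak rate
    slack_time = csum / R + dsum
    if t < slack_time:
        X = r
    elif p > R:                # t >= slack_time and p > R
        X = R
    else:
        X = p
    return M + (b - M) * (p - X) / (p - r) + slack_time * X
```

Note that this bound never exceeds b + Csum + Dsum*R, the requirement
when the peak rate is unknown.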
This reduction comes from the fact that the peak rate limits the rate
at which the burst, b, can be placed in the network.  Note also that
the buffer requirements can be lowered by subtracting from Dsum the
propagation delay since the last reshaping point, if it is known.
Conversely, if a non-zero slack term, Sout, is returned by the
network element, the buffer requirements are increased by adding Sout
to Dsum.

While sending applications are encouraged to set the peak rate
parameter and reshaping points are required to conform to it, it is
always acceptable to ignore the peak rate for the purposes of
computing buffer requirements and end-to-end delays.  The result is
simply an overestimate of the buffering and delay.  As noted above,
if the peak rate is unknown (and thus potentially infinite), the
buffering required is b+Csum+Dsum*R.  The end-to-end delay without
the peak rate is b/R+Ctot/R+Dtot.

The parameter D at each service element SHOULD be set to the maximum
datagram transfer delay (independent of rate and bucket size) through
the service element.  For instance, in a simple router, one might
compute the worst-case amount of time it may take for a datagram to
get through the input interface to the processor, and how long it
would take to get from the processor to the outbound link scheduler
(assuming the queueing schemes work correctly).  For an Ethernet, D
might represent the worst-case delay if the maximum number of
collisions is experienced.  For datagramized weighted fair queueing,
D is set to the link MTU divided by the link bandwidth, to account
for the possibility that a packet arrives just as a maximum-sized
packet begins to be transmitted, and that the arriving packet should
have departed before the maximum-sized packet.

D is intended to be distinct from the latency through the service
element.
Latency is the minimum time through the device (the speed-of-light
delay in a fiber or the absolute minimum time it would take to move a
packet through a router), while the parameter D is intended to bound
the variability in non-rate-based delay.  In practice, this
distinction is sometimes arbitrary (the latency may be minimal); in
such cases it is perfectly reasonable to combine the latency with D
and to advertise any latency as zero.

   NOTE: It is implicit in this scheme that, to get a complete
   guarantee of the maximum delay a packet might experience, a user
   of this service will need to know both the queueing delay
   (provided by C and D) and the latency.  The latency is not
   advertised by this service but is a general characterization
   parameter (advertised as specified in [7]).

   However, even if latency is not advertised, this service can still
   be used.  The simplest approach is to measure the delay
   experienced by the first packet (or the minimum delay of the first
   few packets) received and treat this delay value as an upper bound
   on the latency.

The parameter C is the data backlog resulting from the vagaries of
how a specific implementation deviates from a strict bit-by-bit
service.  So, for instance, for datagramized weighted fair queueing,
C is set to M to account for packetization effects.

If a network element uses a certain amount of slack, Si, to reduce
the amount of resources that it has reserved for a particular flow,
i, the value Si SHOULD be stored at the network element.
Subsequently, if reservation refreshes are received for flow i, the
network element MUST use the same slack Si without any further
computation.  This guarantees consistency in the reservation process.

As an example of the use of the slack term, consider the case where
the required end-to-end delay, Dreq, is larger than the maximum delay
of the fluid flow system.
The latter is obtained by setting R=r in the fluid delay formula (for
stability, R>=r must be true), and is given by

   b/r + Ctot/r + Dtot.

In this case the slack term is

   S = Dreq - (b/r + Ctot/r + Dtot).

The slack term may be used by the network elements to adjust their
local reservations, so that they can admit flows that would otherwise
have been rejected.  A service element at an intermediate network
element that can internally differentiate between delay and rate
guarantees can take advantage of this information to lower the amount
of resources allocated to this flow.  For example, by taking an
amount of slack s <= S, an RCSD scheduler [5] can increase the local
delay bound, d, assigned to the flow, to d+s.  Given an RSpec (Rin,
Sin), it would do so by setting Rout = Rin and Sout = Sin - s.

Similarly, a network element using a WFQ scheduler can decrease its
local reservation from Rin to Rout by using some of the slack in the
RSpec.  This can be accomplished by using the transformation rules
given in the previous section, which ensure that the reduced
reservation level will not increase the overall end-to-end delay.

Evaluation Criteria

The scheduling algorithm and admission control algorithm of the
element MUST ensure that the delay bounds are never violated.
Furthermore, the element MUST ensure that misbehaving flows do not
affect the service given to other flows.  Vendors are encouraged to
formally prove that their implementation is an approximation of the
fluid model.

Examples of Implementation

Several algorithms and implementations exist that approximate the
fluid model.  They include Weighted Fair Queueing (WFQ) [2], Virtual
Clock [3], Jitter-EDD [4], and a scheme proposed by IBM [5].  A nice
theoretical presentation showing that these schemes are part of a
large class of algorithms can be found in [6].
Examples of Use

Consider an application that is intolerant of any lost or late
datagrams.  It uses the advertised values Ctot and Dtot and the TSpec
of the flow to compute the resulting delay bound from a service
request with rate R.  Assuming R < p, it then sets its playback point
to

   [(b-M)/R * (p-R)/(p-r)] + (M+Ctot)/R + Dtot.

Security Considerations

This memo discusses how this service could be abused to permit
denial-of-service attacks.  The service, as defined, does not allow
denial of service (although service may degrade under certain
circumstances).

Acknowledgements

The authors would like to gratefully acknowledge the help of the INT
SERV working group.  The basic results for the delay guarantees come
from reference [8].

References

[1] S. Shenker and J. Wroclawski, "Network Element Service
    Specification Template", Internet Draft, June 1995.

[2] A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of
    a Fair Queueing Algorithm", Internetworking: Research and
    Experience, Vol. 1, No. 1, pp. 3-26.

[3] L. Zhang, "Virtual Clock: A New Traffic Control Algorithm for
    Packet Switching Networks", Proc. ACM SIGCOMM '90, pp. 19-29.

[4] D. Verma, H. Zhang, and D. Ferrari, "Guaranteeing Delay Jitter
    Bounds in Packet Switching Networks", Proc. Tricomm '91.

[5] L. Georgiadis, R. Guerin, V. Peris, and K. N. Sivarajan,
    "Efficient Network QoS Provisioning Based on per Node Traffic
    Shaping", IBM Research Report No. RC-20064.

[6] P. Goyal, S. S. Lam, and H. M. Vin, "Determining End-to-End Delay
    Bounds in Heterogeneous Networks", Proc. 5th Intl. Workshop on
    Network and Operating System Support for Digital Audio and Video,
    April 1995.

[7] S. Shenker, "Specification of General Characterization
    Parameters", Internet Draft, November 1995.

[8] A. K. J. Parekh, "A Generalized Processor Sharing Approach to
    Flow Control in Integrated Services Networks", MIT Laboratory for
    Information and Decision Systems, Report LIDS-TH-2089, February
    1992.

Authors' Addresses

Scott Shenker
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94304-1314

email: shenker@parc.xerox.com
415-812-4840
415-812-4471 (FAX)

Craig Partridge
BBN
2370 Amherst St
Palo Alto, CA 94306

email: craig@bbn.com

Roch Guerin
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598

email: guerin@watson.ibm.com
914-784-7038
914-784-6318 (FAX)