idnits 2.17.1 

draft-westberg-loadcntr-03.txt:
  ** The Abstract section seems to be numbered


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 2000) is 8777 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'RFC2119' on line 505 looks like a reference

  -- Missing reference section? 'RFC2475' on line 462 looks like a reference

  -- Missing reference section? 'Berson97' on line 489 looks like a reference

  -- Missing reference section? 'Guerin97' on line 492 looks like a reference

  -- Missing reference section? 'Stoica99' on line 486 looks like a reference

  -- Missing reference section? 'Bernet99' on line 482 looks like a reference

  -- Missing reference section? 'Tur99' on line 498 looks like a reference

  -- Missing reference section? 'RFC2481' on line 779 looks like a reference

  -- Missing reference section? 'RFC2402' on line 468 looks like a reference

  -- Missing reference section? '2406' on line 468 looks like a reference

  -- Missing reference section? 'RFC2406' on line 479 looks like a reference

  -- Missing reference section? 'Gross99' on line 714 looks like a reference

  -- Missing reference section? 'IAB-QoS' on line 502 looks like a reference


     Summary: 5 errors (**), 0 flaws (~~), 1 warning (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Draft                Load Control                    April 2000

4	                   Load Control of Real-Time Traffic
5	                     draft-westberg-loadcntr-03.txt
6	                         Document Revision: 1.3
7	                          2000/04/19 12:43:19

9	                  A Two-bit Resource Allocation Scheme

11	                               April 2000

13	                              L. Westberg
14	                             Z. R. Turanyi
15	                               D. Partain

17	                                Ericsson

19	Status of this Memo

21	This document is an Internet-Draft and is in full conformance with all
22	provisions of Section 10 of RFC2026.

24	Internet-Drafts are working documents of the Internet Engineering Task
25	Force (IETF), its areas, and its working groups.  Note that other groups
26	may also distribute working documents as Internet-Drafts.

28	Internet-Drafts are draft documents valid for a maximum of six months
29	and may be updated, replaced, or obsoleted by other documents at any
30	time.  It is inappropriate to use Internet-Drafts as reference material
31	or to cite them other than as "work in progress."

33	The list of current Internet-Drafts can be accessed at
34	http://www.ietf.org/ietf/1id-abstracts.txt

36	The list of Internet-Draft Shadow Directories can be accessed at
37	http://www.ietf.org/shadow.html.

39	1.  Abstract

41	   The purpose of this memo is to present a new resource allocation
42	   scheme for DiffServ (DS) networks, called Load Control.  The main
43	   purpose of Load Control is to provide a simple and scalable solution
44	   to the resource provisioning problem.

46	   Load Control addresses two particular issues:
47	     1. Measurement-based access control, whereby a probe packet is
48	        sent along the forwarding path in a network to determine
49	        whether a flow can be admitted based upon the current
50	        congestion state of the network
51	     2. A lightweight reservation of a certain amount of network
52	        resources.

54	   Load Control uses two-bit markers in packet headers to carry load
55	   information from core routers to edge devices. The scheme provides
56	   the capability of controlling the traffic load in the network without
57	   requiring signaling or any per-flow processing in the core routers.
58	   The complexity of Load Control is kept to a minimum to make
59	   implementation simple.

61	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
62	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
63	   document are to be interpreted as described in [RFC2119].

65	2.  Background and Motivation

67	   The amount of traffic carried on the Internet is now greater than the
68	   traffic on the world's telephony network. Still, Internet-based
69	   communication services generate less income than plain old telephony
70	   services. Enabling value-added services over the Internet is
71	   therefore crucial for service providers. One significant class of
72	   such value-added services requires real-time packet transportation.
73	   It can be expected that these real-time services will be popular as
74	   they replicate or are natural extensions of existing communication
75	   services like telephony.  Exact and reliable resource management
76	   (e.g., admission control) is essential for achieving high utilization
77	   in networks with real-time transportation capabilities. The problem
78	   is difficult mainly due to scalability issues.

80	   With the introduction of differentiated services (DS) [RFC2475], it
81	   is now possible to provide large scale, real-time services. The basic
82	   idea of DiffServ is that, rather than classifying packets at each
83	   router, packets are only classified at the edge devices.  The result
84	   - the required packet treatment - is stored and carried in the packet
85	   headers, and core routers can carry out appropriate scheduling.

87	   The current definition of DiffServ, however, does not contain any
88	   simple, scalable solution to the problem of resource provisioning and
89	   control. A number of approaches to solving the problem already exist
90	   [Berson97, Guerin97, Stoica99, Bernet99].  The scheme presented in
91	   this document does not require any state aggregation and aims at
92	   extreme simplicity and low cost of implementation along with good
93	   scaling properties. Load control operates edge-to-edge in a DS
94	   domain, or between two RSVP-capable routers, where only the edge
95	   devices keep flow state and do per-flow processing.  The main purpose
96	   of Load Control is to provide a simple and scalable solution to the
97	   resource provisioning problem.

99	3.  Overview

101	   Load control is achieved by two actions: measurement-based admission
102	   control of incoming requests and the dropping of admitted flows in
103	   case of exceptional events such as link failures.  Load Control uses
104	   two-bit markers in the packet headers to gather information about the
105	   load level along various paths through the network.  The core routers
106	   are able to mark passing packets to signal the exhaustion of
107	   resources to the edge devices.

109	   For admission control, the resource state of core routers is gathered
110	   by sending a specially marked packet, denoted a "probe" packet, from
111	   the ingress to the egress edge device.  The probe result is then used
112	   by the ingress to decide flow acceptance or rejection and to set up
113	   traffic conditioning/policy.  If rigid admission control is required,
114	   soft-state based reservations are also supported. In this case the
115	   probe packet does both the probing and allocation of resources along
116	   the path. The latter method is comparable to signaling based schemes
117	   but does not require processing of signaling messages in the core
118	   routers.

120	   Under normal circumstances, admission control is enough to control
121	   the load in the network. Nevertheless, when exceptional events (such
122	   as link failures) cause too much traffic to be re-routed over a link,
123	   the resulting severe congestion may degrade the quality of all of the
124	   flows on the link. In that case, the best solution might be to keep
125	   existing flows and suffer the loss of quality. However, for some
126	   services, it may be desirable to drop some of the previously admitted
127	   flows to protect the quality of the remaining flows. Thus, when
128	   severe congestion occurs, the core routers mark the headers of all
129	   (not only probe) packets to notify the edge devices of the congestion
130	   condition.

132	   In the following sections, we assume a DS (DiffServ) domain where
133	   connection requests arrive at the edges of the domain via RSVP, at
134	   the request of a Bandwidth Broker, or by other means.  The requests
135	   may be generated directly at the edge by a gateway, which provides
136	   connection to other types of networks, or in hosts that are connected
137	   directly to the domain.

139	4.  Operation of Load Control

141	   The load control scheme has two modes of operation:

143	      a) 'Simple marking':  This refers to a measurement-based admission
144	      scheme where routers measure the traffic volume and base the
145	      marking on these results.

147	      b) 'Unit-based reservations':  A "unit" represents a share of
148	      bandwidth in the network that could be reserved by the edge
149	      devices.  This mode makes it possible to perform resource
150	      reservations, independently of the amount of traffic that is
151	      actually transmitted.

153	   Both modes can perform admission control of incoming requests and
154	   indicate exceptional events.

156	   In the appendices, we present some analysis of Load Control
157	   properties, but a more detailed investigation can be found in
158	   [Tur99].

160	4.1.  Simple Marking

162	   The idea of simple marking is that core routers measure the traffic,
163	   and, if they encounter near exhaustion of resources, they mark
164	   passing probe packets and thereby notify the edge devices of the lack
165	   of resources.

167	   The scheme has the following steps of operation:

169	      1) Resource Probing: Before establishing the flow, the initiating
170	      edge device sends a probe packet into the network.  The probe
171	      packet passes through the same routers as the actual traffic will
172	      pass through (in any case, with a high degree of probability) and
173	      is exposed to the marking function in each router. The marking
174	      performs an OR-operation of its own status and the incoming probe
175	      packet status (a packet once marked must not be changed).  When
176	      the packet reaches the egress edge device, its header will reflect
177	      the aggregated resource status along that path.

179	      2) Send resource status to ingress: When the egress edge device
180	      receives the probe packet, it copies the marker from the header to
181	      the header/payload of a reverse packet and sends it back to the
182	      initiating party (the ingress edge device). The probe packet may
183	      be discarded, converted to an ordinary data packet, or
184	      encapsulated (as mentioned above) and sent to the ingress edge
185	      device. The packet containing the probing result can also serve as
186	      a probe packet for the reverse path. This allows the initiating
187	      party to check for bi-directional resources.

189	      3) Acceptance/Rejection: The report packet is returned to the
190	      initiating ingress edge device, which uses the result of the probe
191	      to admit or block the request by setting up appropriate packet
192	      filtering, measuring, and marking rules.

194	      4) Reaction to exceptional events: If a core router detects severe
195	      congestion on an interface, it starts marking all packets on that
196	      interface. If the egress edge device receives a marked packet
197	      which is not a probe packet, this can be interpreted as a sign of
198	      severe congestion along the path.  The fact that the incoming
199	      marked packet was not sent as a probe packet can be determined
200	      from the packet content, by multi-field classification or by
201	      checking the admittance state at the egress edge device.  If
202	      severe congestion occurs, a signaling message can be sent to the
203	      ingress edge device, which can then take the appropriate action.
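
   The probing steps above can be sketched as follows.  This is a
   minimal illustration, not part of the scheme's specification: the
   names CoreRouter, probe_path and admit are hypothetical, and each
   router's congestion status is assumed to be a single boolean.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoreRouter:
    """Hypothetical core router; `congested` is its measured status."""
    name: str
    congested: bool = False

    def forward_probe(self, marked_in: bool) -> bool:
        # OR of the router's own status and the incoming probe status:
        # a probe that is already marked is never unmarked.
        return marked_in or self.congested


def probe_path(path: List[CoreRouter]) -> bool:
    """Return True if the probe reaches the egress marked."""
    marked = False
    for router in path:
        marked = router.forward_probe(marked)
    return marked


def admit(path: List[CoreRouter]) -> bool:
    """Ingress decision: admit only if the reported probe is unmarked."""
    return not probe_path(path)


path = [CoreRouter("r1"), CoreRouter("r2", congested=True), CoreRouter("r3")]
print(admit(path))      # False: r2 marked the probe
print(admit(path[:1]))  # True: r1 alone is not congested
```

   The egress only needs to copy one boolean back to the ingress; no
   per-flow state is touched in the core.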

205	   To make the scheme more robust against packet loss, the initiating
206	   edge device MAY maintain a timer associated with each probe packet.
207	   If a probe packet is lost, the device simply re-transmits on time-
208	   out.  How often and how many times the probe packet should be
209	   retransmitted before failure is declared is an implementation issue,
210	   but these parameters SHOULD be configurable (e.g., via an SNMP MIB).
211	   Furthermore, whether probes are retransmitted at all SHOULD be
212	   configurable.
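
   A possible shape for the retransmission logic is sketched below.
   The callback send_and_wait and the parameter names timeout_s,
   max_retries and retransmit are assumptions made for illustration;
   the draft leaves these details to the implementation.

```python
from typing import Callable, Optional


def send_probe_with_retries(
    send_and_wait: Callable[[float], Optional[bool]],
    timeout_s: float = 0.5,
    max_retries: int = 3,
    retransmit: bool = True,
) -> Optional[bool]:
    """Return the probe result (True = marked), or None on failure.

    `send_and_wait(timeout_s)` is a hypothetical callback that sends one
    probe and returns the reported marking, or None if no report
    arrived before the timer expired.
    """
    attempts = 1 + (max_retries if retransmit else 0)
    for _ in range(attempts):
        result = send_and_wait(timeout_s)
        if result is not None:
            return result
    return None  # failure declared after the configured number of tries


# Simulated channel: two probes are lost, the third returns unmarked.
outcomes = iter([None, None, False])
print(send_probe_with_retries(lambda timeout: next(outcomes)))  # False
```

   All three knobs are plain parameters, so they could be exposed
   through whatever management interface the device offers.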

214	4.2.  Unit-based Reservations

216	   While measurement-based admission control has important advantages
217	   over non-measurement based algorithms, it has disadvantages as well.
218	   Unit-based reservations allow the sources to keep their reservations
219	   irrespective of the volume of the traffic they transmit.  Although
220	   the admission scheme is very similar to the simple marking case, the
221	   presence of actual reservations is a fundamental difference.

223	   Each flow can occupy any number of units of resources, and even
224	   fractions of units by allowing a number of flows to share a common
225	   resource unit.  The unit is not necessarily a simple bandwidth value:
226	   it may be defined in terms of any resource unit (e.g., effective
227	   bandwidth) to support statistical multiplexing at packet level (use
228	   of silence period). The definition of the unit may vary from network
229	   to network and is outside the scope of this document.  The basic idea
230	   of unit-based reservation is to allow the edge devices periodically
231	   to mark some of the data packets to refresh resource reservation.
232	   Each refresh packet reserves one unit of resources for one refresh
233	   period. Reservations are timed out after a refresh period and have to
234	   be refreshed in a soft state manner.  The length of the refresh
235	   period must be the same throughout the DS domain and SHOULD be
236	   configurable.

238	   Core routers estimate the number of reservations by counting the
239	   number of refresh packets during a refresh interval. If the router
240	   runs out of units, it goes into blocking state, starts to mark probe
241	   packets indicating congestion and thereby rejects new flows.  The
242	   probe packets that pass the router unmarked and the refresh packets
243	   reserve one unit of resources for the following refresh period.
244	   (Editor note:  It is clear that we need to have the capability of
245	   reserving more than one unit, but it is not yet clear how that will
246	   be encoded in the packet header.  See below.) Thus, after the probe
247	   packet has passed along the path unmarked, the ingress edge device is
248	   required to send the first reservation refresh packet during the next
249	   refresh period.

251	   If a flow occupies more than one unit, more than one probe packet may
252	   be sent to allocate the required number of resources (an alternative
253	   using only one packet should be defined).  Similarly, more than one
254	   refresh packet must be sent for such a flow. By proper definition of
255	   the unit, a wide range of flows can be described and handled using
256	   this simple mechanism.

258	   If a probe packet was forwarded unmarked by a core router, but was
259	   marked later downstream, that core router will not be notified and
260	   will incorrectly maintain the reservation. However, as the flow is
261	   rejected, no refresh packets will arrive, and the reservation will
262	   time out at the end of the refresh period and will be released.

264	   Severe congestion is handled in the same way as in 'Simple marking'
265	   (see Section 4.1).

267	   If a refresh packet is lost, the downstream routers will
268	   underestimate the number of reserved units. Refresh and probe packets
269	   should therefore be protected from losses in the manner described
270	   above.

272	   Core routers estimate the number of allocated units by counting the
273	   number of refresh packets during a refresh period.  The accuracy of
274	   the estimate can be increased by generating refresh packets evenly
275	   spread in time over the refresh period. This minimizes errors
276	   resulting from time alignment differences between routers and edge
277	   devices.
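
   One way an edge device might spread its refresh packets is
   sketched here.  The function name refresh_send_times and the
   optional per-flow offset are illustrative assumptions.

```python
from typing import List


def refresh_send_times(
    units: int, period_s: float, offset_s: float = 0.0
) -> List[float]:
    """Send times (seconds from period start) for `units` refreshes.

    Packets are spaced period_s / units apart.  `offset_s` is a
    hypothetical per-flow phase so that different flows do not all
    refresh at the start of the period.
    """
    if units <= 0:
        return []
    spacing = period_s / units
    return [(offset_s + i * spacing) % period_s for i in range(units)]


print(refresh_send_times(4, 2.0))  # [0.0, 0.5, 1.0, 1.5]
```

   With even spacing, a router whose counting interval is shifted
   relative to the edge device sees roughly the same number of
   refresh packets in each interval, instead of a whole burst moving
   across an interval boundary.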

279	4.3.  Multiple Unit reservation

281	   In some cases it might be feasible to add functionality for reserving
282	   several units in a single reservation request.  Semantics similar
283	   to those of the two-bit reservation scheme could be used to provide
284	   such functionality, but this would require the addition of an
285	   integer value denoting the number of units.

287	   The coding of such a proposal is still under discussion and needs to
288	   be studied further.

290	4.4.  Codepoints for Flow Types

292	   In both variants of Load Control, routers making marking decisions
293	   have very little information about the resource or QoS requirement of
294	   the flow in question. The DS field of the probe packet can be used to
295	   indicate the DiffServ class the flow will arrive on and thus the QoS
296	   requirements.  The marking function of core routers can take the
297	   required PHB into account when deciding on the marking.

299	   Information on the resource requirements for incoming flows can also
300	   be expressed using the DS field by dividing real-time traffic into
301	   classes based on resource requirements and using different codepoints
302	   for different classes. If the DSCPs denote not only the PHB that the
303	   flow is to receive, but implicitly also the bandwidth requirements
304	   for the flow, core routers will be able to mark packets more
305	   intelligently, resulting in less resource waste and greater
306	   flexibility.

308	   In the unit-based case, the major benefit is that the size of the
309	   unit can be different in different classes, making it possible to
310	   allocate resources with finer granularity.

312	5.  Objects for Standardization

314	   A forthcoming standard might only include the encoding of the Load
315	   Control information into the IP header and some design
316	   recommendations.

318	5.1.  Packet Types

320	   We need four types of packets in the algorithm:

322	      - Ordinary Packet (OP)
323	      - Probe Packet (PP)
324	      - Marked Packet (MP)
325	      - Refresh Packet (RP)

327	   During transport through the network, a probe packet can be changed
328	   to a marked packet. This indicates that at least one router does not
329	   accept the reservation associated with the probe packet.

331	   ------       Rejection       ------
332	   | PP |---------------------->| MP |
333	   ------                       ------

335	   An ordinary packet can also be changed to a marked packet, meaning
336	   that some exceptional event caused severe congestion on one link of
337	   the path the packet took.

339	   ------  Severe Congestion   ------
340	   | OP |---------------------->| MP |
341	   ------                       ------

343	   In the simple marking scheme, only three packet types are used.
344	   Refresh packets are treated as ordinary packets, except that these
345	   packets cannot be changed to marked packets.

347	5.2.  Coding of Packet Types

349	   We have two alternative solutions for storing Load Control related
350	   information in the packet headers: using new DS codepoints or using
351	   the two currently unused bits (intended for ECN) in the DS byte.  The
352	   latter case is only considered in Appendix E.

354	   In the first alternative (where PHBs are intended to be used together
355	   with Load Control), two or three new codepoints would have to be
356	   defined for probe, marked and (optionally) refresh packets. For
357	   example, in the case of the EF PHB, in addition to the codepoint used
358	   for the EF packets, EF-probe, EF-marked and EF-refresh packets can
359	   also be sent. The new codepoints can be drawn from the LU/EXP space.
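
   As an illustration of the first alternative, the sketch below maps
   packet types to codepoints for the EF case.  Only the EF value
   (101110) is a real, recommended codepoint; the probe, marked and
   refresh values are hypothetical placeholders picked from the
   experimental/local-use pools (xxxx11 and xxxx01) and are not
   proposed assignments.

```python
from enum import Enum
from typing import Optional


class PacketType(Enum):
    ORDINARY = "OP"
    PROBE = "PP"
    MARKED = "MP"
    REFRESH = "RP"


EF_DSCP = 0b101110  # recommended EF codepoint

# Hypothetical codepoints, for illustration only.
EF_CODEPOINTS = {
    PacketType.ORDINARY: EF_DSCP,
    PacketType.PROBE: 0b101111,
    PacketType.MARKED: 0b101101,
    PacketType.REFRESH: 0b101011,
}


def is_exp_lu(dscp: int) -> bool:
    """True if a 6-bit DSCP lies in an EXP/LU pool (low bit set)."""
    return 0 <= dscp < 64 and (dscp & 0b1) == 1


def classify(dscp: int) -> Optional[PacketType]:
    """Map a DSCP back to its Load Control packet type, if any."""
    for ptype, codepoint in EF_CODEPOINTS.items():
        if codepoint == dscp:
            return ptype
    return None


print(classify(0b101111))  # PacketType.PROBE
```

   A router that does not implement Load Control would have to map
   these codepoints to the same forwarding treatment as EF for the
   scheme to be deployable incrementally; that mapping is outside
   this sketch.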

361	5.3.  Behavior Description

363	   The behavior of the edge devices depends greatly on the application
364	   or signaling protocol that uses the load control scheme. Below we
365	   only describe the few aspects of the edge device behavior that are
366	   necessary for interworking with the core routers.

368	5.3.1.  Behavior of the Core Routers

370	   All core routers continuously maintain a state of accepting or
371	   rejecting more flows.  If the state is accepting, the router passes
372	   all packets unchanged. If the state is rejecting, then the router
373	   changes the marking of incoming packets from probe to marked.

375	   If the router is capable of detecting severe congestion, and this
376	   occurs, then the router forwards both ordinary and probe packets as
377	   marked.  The router MUST NOT change the marking of refresh packets.

379	    Addition for Unit-based Reservations:

381	      The router uses the refresh and probe markers in packets to
382	      maintain its estimation of reserved resources. A refresh packet
383	      signals previously admitted resource usage, while a probe packet
384	      signals a new request. When passed unmarked, both types of packets
385	      reserve one unit for one refresh period.
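
   The core router behavior described in this section can be
   sketched as below.  This is an illustrative model only.  The
   class UnitRouter, its counters, and the rule that the estimate is
   "units counted in the previous period plus probes admitted in the
   current period" are assumptions of this sketch, not a definition
   taken from the draft.

```python
from enum import Enum


class PacketType(Enum):
    ORDINARY = "OP"
    PROBE = "PP"
    MARKED = "MP"
    REFRESH = "RP"


class UnitRouter:
    """Hypothetical core router using unit-based reservations."""

    def __init__(self, capacity_units: int) -> None:
        self.capacity_units = capacity_units
        self.prev_units = 0    # units counted during the previous period
        self.cur_refresh = 0   # refresh packets seen in this period
        self.cur_probes = 0    # probes passed unmarked in this period
        self.severe_congestion = False

    def blocking(self) -> bool:
        # Assumed estimate: last period's units plus newly admitted probes.
        return self.prev_units + self.cur_probes >= self.capacity_units

    def on_packet(self, ptype: PacketType) -> PacketType:
        if ptype is PacketType.REFRESH:
            self.cur_refresh += 1
            return ptype                  # refresh marking is never changed
        if ptype is PacketType.MARKED:
            return ptype                  # once marked, always marked
        if self.severe_congestion:
            return PacketType.MARKED      # ordinary and probe packets
        if ptype is PacketType.PROBE:
            if self.blocking():
                return PacketType.MARKED  # reject the new request
            self.cur_probes += 1          # reserve one unit
        return ptype

    def end_period(self) -> None:
        # Soft state: only units seen in this period carry over.
        self.prev_units = self.cur_refresh + self.cur_probes
        self.cur_refresh = 0
        self.cur_probes = 0


r = UnitRouter(capacity_units=2)
print([r.on_packet(PacketType.PROBE).value for _ in range(3)])
# ['PP', 'PP', 'MP']
```

   The router keeps three counters and one flag regardless of the
   number of flows, which is the property the scheme relies on.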

387	5.3.2.  Behavior of the Edge Devices

389	   When a new reservation is needed, the ingress edge device should send
390	   the appropriate number of packets marked as probe.

392	   If the egress edge device receives a probe packet that is marked,
393	   this means that the network has insufficient capacity along the path
394	   between the two edge devices. The egress edge device should take care
395	   of blocking the flow by notifying the ingress device.  If the egress
396	   device receives a marked packet that was not initially sent as a probe
397	   packet, it shall inform the ingress device to reject admitted flows.
398	   This can be determined from the packet content, multi-field
399	   classification of the IP header, or by checking the admittance state
400	   at the egress edge device.

402	    Addition for Unit-based Reservations:

404	      For the unit-based reservation scheme, the ingress edge device
405	      should generate the required number of refresh packets per refresh
406	      period and per flow. If there are not enough data packets to mark
407	      as refresh packets, the ingress device must generate dummy packets
408	      and mark those as refresh packets.  The generated refresh packets
409	      should be as uniformly distributed through the refresh interval as
410	      possible to minimize the effect of refresh interval timing between
411	      routers.
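
   The rule for dummy refresh packets can be sketched as simple
   arithmetic.  The function name plan_refresh_packets and its
   parameters are hypothetical.

```python
from typing import Tuple


def plan_refresh_packets(
    units: int, data_packets_available: int
) -> Tuple[int, int]:
    """Return (data packets to mark as refresh, dummy packets to send).

    `units` is the number of refresh packets the flow needs in this
    refresh period; `data_packets_available` is how many of its data
    packets are expected in the same period.
    """
    units = max(units, 0)
    marked_data = min(units, max(data_packets_available, 0))
    dummies = units - marked_data
    return marked_data, dummies


print(plan_refresh_packets(3, 1))   # (1, 2): two dummy packets needed
print(plan_refresh_packets(2, 10))  # (2, 0): enough data packets
```

   Dummy packets are only needed for flows that are idle or slow
   relative to their reservation, for example during silence periods.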

413	6.  Interworking with RSVP/Intserv

415	   Load control can also be used in DiffServ regions (backbones) that
416	   connect RSVP/Intserv regions. This inter-operation is described in
417	   detail in [Bernet99]. For load control, border routers of the
418	   DiffServ region must be RSVP-aware in order to detect the arrival of
419	   new connections.

421	   RSVP PATH messages can be used as probe packets to gather congestion
422	   information along the path between the two border routers. When a new
423	   RSVP path state is installed at the egress border router, the
424	   collective admission state of the path (collected in the packet of
425	   the PATH message) is also stored. If a RESV message for the installed
426	   state arrives within a time period during which the congestion state
427	   can be considered valid, then the egress border router can perform
428	   the admission control for the DiffServ network as well. If the first
429	   RESV message arrives too late, then the egress border router MUST
430	   solicit a new (dummy) probe packet from the ingress router to
431	   determine the current congestion state.
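
   The egress decision on arrival of a RESV message might look like
   the sketch below.  PathState, Decision and the validity_s
   parameter are illustrative assumptions; the draft does not define
   how long a stored congestion state remains valid.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    REJECT = "reject"
    REPROBE = "solicit a new probe"


@dataclass
class PathState:
    marked: bool      # admission state collected from the PATH packet
    stored_at: float  # time (seconds) the path state was installed


def on_resv(state: PathState, now: float, validity_s: float) -> Decision:
    """Decide what the egress border router does with a RESV message."""
    if now - state.stored_at > validity_s:
        # Stored state is stale: ask the ingress for a dummy probe.
        return Decision.REPROBE
    return Decision.REJECT if state.marked else Decision.ADMIT


state = PathState(marked=False, stored_at=10.0)
print(on_resv(state, now=10.5, validity_s=2.0).value)  # admit
print(on_resv(state, now=15.0, validity_s=2.0).value)  # solicit a new probe
```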

433	   When the egress receives a marked packet that is neither a PATH message
434	   nor a dummy probe packet, this signals a severe congestion state
435	   along the path. The identity of the ingress router can easily be
436	   determined from the path state, but in this case the egress router
437	   can itself decide to drop certain reservations. The ingress router
438	   can be notified via ResvTear messages while the receiver end systems
439	   get ResvErr messages.

441	   RSVP routers can also be placed inside the domain. In this case,
442	   probing is performed between RSVP routers instead of edge devices.
443	   Thus, by adding a simple and cheap extension to non-RSVP capable routers,
444	   correct admission control is possible on non-RSVP capable parts of an
445	   end-to-end path.

447	   Unit-based reservations can also be used to provide resources in a DS
448	   domain that is used to provide VPN tunnels between customer sites.
449	   Using a load control scheme, it is fast and easy to modify the size
450	   of these tunnels. Thus, tunnel size selection can be a very dynamic
451	   process. Note that tunnels are not necessarily real-time tunnels.
452	   Packets of any DSCP can travel on them after receiving the
453	   appropriate PHB. Even best-effort tunnels can be reserved this way.
454	   Provisioning can be done on a per-DSCP basis or in aggregates as the
455	   service provider wishes.

457	7.  Security Considerations

459	   We propose using two-bit markers in packet headers (DS field) to
460	   reserve resources within a DiffServ domain. This poses similar
461	   security problems to the use of the DS field to differentiate packets
462	   in general [RFC2475].

464	   If the interior of the DS domain fully contains a tunnel, then by
465	   copying the outer marking into the inner header at de-encapsulation,
466	   load control can be exercised over the links of the tunnel as well.
467	   The procedure is similar to the one described in [RFC2481]. As IPSec
468	   [RFC2402, 2406] does not allow the copying of the DS field from the
469	   outer to the inner header at de-encapsulation, load control cannot be
470	   exercised over regions where IPSec tunnels are used.

472	8.  Identification of Edge Nodes

474	   In the absence of RSVP, an alternative method for identification of
475	   edge nodes will be required.  This section needs to be written.

477	9.  Multicast-related Issues

479	[RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
480	   (ESP)", RFC 2406, November 1998.

482	[Bernet99] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L.,
483	   Speer, M., Braden, R., "Interoperation of RSVP/Intserv and Diffserv
484	   Networks", Work in Progress, March 1999

486	[Stoica99] Stoica, I., et al., "Per Hop Behaviors Based on Dynamic Packet
487	   States", Work in Progress, February 1999

489	[Berson97] Berson, S. and Vincent, R., "Aggregation of Internet
490	   Integrated Services State", Work in Progress, December 1997.

492	[Guerin97] Guerin, R., Blake, S. and Herzog, S., "Aggregating RSVP based
493	   QoS Requests", Work in Progress, November 1997.

495	[Gross99] Grossglauser, M., Tse, D. N. C., "A Time-Scale Decomposition
496	   Approach to Measurement-Based Admission Control", Infocom '99

498	[Tur99] Z. R. Turanyi, L. Westberg, "Load Control: Lightweight
499	   Provisioning of Internet Resources", submitted to Networking 2000,
500	   Paris, May 2000, http://www.ericsson.co.hu/ethzrt/

502	[IAB-QoS] G. Huston (Internet Architecture Board), "Next Steps for the
503	   IP QoS Architecture", Work in Progress, March 2000.

505	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
506	   Requirement Levels", BCP 14, RFC2119, March 1997.

508	Appendix A. Admission Precision of Simple Marking

510	   Simple marking is basically a measurement-based admission control
511	   scheme, where flows do not say anything about their traffic
512	   characteristics. In addition, flow departure is not signaled
513	   explicitly.

515	   When the network carries more types of flows with different bandwidth
516	   requirements, the core routers do not know the bandwidth requirements
517	   of the incoming flows. They simply declare whether they will accept
518	   more flows or not irrespective of the bandwidth demands of the new
519	   flow. Thus the marking algorithm in the routers should conservatively
520	   always expect the largest type of flow that the network carries and
521	   start rejecting flows when there is not enough bandwidth left for one
522	   such flow.  On the positive side, this will result in fair rejection
523	   among different flow types, but on the negative side, some bandwidth
524	   will be wasted.  However, if the links of our domain can carry at
525	   least several hundred requests even from the most bandwidth-demanding
526	   types of flow, then this is not a significant waste.
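
   The conservative rule above, and the bound on the bandwidth it
   can waste, can be written as a short sketch.  The function names
   and the use of abstract bandwidth units are assumptions.

```python
def accepting(
    link_capacity: float, measured_load: float, largest_flow: float
) -> bool:
    """Accept new flows only while one largest-type flow still fits."""
    return link_capacity - measured_load >= largest_flow


def worst_case_waste_fraction(
    link_capacity: float, largest_flow: float
) -> float:
    """Upper bound on the share of the link left unused by this rule.

    When the router starts rejecting, the free bandwidth is less than
    one largest flow, so at most that much could have gone to smaller
    flows.
    """
    return largest_flow / link_capacity


# A link that fits 300 flows of the largest type (abstract units).
print(accepting(300.0, 299.5, 1.0))                     # False
print(round(worst_case_waste_fraction(300.0, 1.0), 4))  # 0.0033
```

   With several hundred largest-type flows per link, the bound stays
   below half a percent of the link capacity.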

528	Appendix B. Effect of Delays on Admission

530	   When a probe packet is passed unmarked without correcting the
531	   estimate of the free resources, we in fact admit a flow without
532	   immediately reserving resources for it.  The reservation will be
533	   implicitly done later by the arriving traffic or refresh packets of
534	   the flow. During the time between admission and the arrival of the
535	   traffic of the flow, new requests can be admitted without taking the
536	   previously admitted flow into account.  To illustrate the effects of
537	   this delay, we took an old and simple Markovian example. Flows are
538	   identical with an average flow-holding time of 180 seconds and flow
539	   arrivals and departures follow a Poisson process. Let the link be
540	   able to carry N calls and let the delay be T. The link starts
541	   refusing flows when the measured traffic exceeds N-H calls. We can
542	   say that a space of size H is put aside to cater for the errors
543	   caused by the delay.

545	   If the link is properly dimensioned, then the usual blocking ratio
   should not exceed 1%. However, in a mass call situation (such as
   occurs on New Year's Eve, for example) it can be considerably higher.
548	   In this example, 50% blocking was chosen to demonstrate the extreme
549	   load case. Thus, the offered traffic is roughly twice the link
550	   capacity.

552	   QoS violation occurs if during time T the difference between the
553	   number of arriving and departing flows is larger than H. Under the
554	   above assumptions, the chance of QoS violation can be calculated.
   Naturally, the larger H is, the smaller the chance that QoS will be
   violated. The required value of H can be determined for a low target
   QoS violation probability (e.g., 10^-5).

559	   The following table presents the value of H as a function of link
560	   size (N), delay length (T) and load (causing 1% or 50% blocking).

562	         |    1ms    |   10ms    |   100ms   |   500ms   |     1s    |
563	         | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% |
564	   ------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
565	      50 |  2  |  2  |  2  |  3  |   3 |   4 |   4 |   5 |   5 |   7 |
566	     100 |  2  |  2  |  3  |  3  |   4 |   4 |   4 |   7 |   6 |   9 |
567	     500 |  2  |  3  |  3  |  4  |   4 |   7 |   9 |  13 |  12 |  18 |
568	    1000 |  3  |  3  |  4  |  4  |   5 |   9 |  12 |  18 |  16 |  25 |
569	    5000 |  3  |  4  |  5  |  7  |  12 |  18 |  24 |  44 |  33 |  69 |
570	   10000 |  4  |  4  |  7  |  9  |  16 |  25 |  33 |  69 |  47 | 113 |

   Relative to link size, the required safety margin is highest for
   small links, since less statistical multiplexing is possible there.
   In absolute terms, H grows with N.
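
   The kind of calculation behind the table can be approximated with a
   short sketch. It assumes that arrivals and departures during T are
   independent Poisson counts, that every arrival during T is admitted,
   and that 50% blocking corresponds to an offered load of twice the
   link capacity. The exact model used for the table is not given here,
   so the sketch is not expected to reproduce the table values exactly.
   All names and the load_factor parameter are hypothetical.

```python
import math


def poisson_pmf(mean, kmax):
    """Return [P(X=0), ..., P(X=kmax)] for X ~ Poisson(mean)."""
    pmf = [math.exp(-mean)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def required_margin(n_calls, delay_s, load_factor,
                    holding_s=180.0, target=1e-5):
    """Smallest H with P(arrivals - departures > H) <= target.

    load_factor is the offered load divided by the link capacity
    (assumed to be about 2.0 for the 50% blocking case).
    """
    arr_mean = load_factor * n_calls * delay_s / holding_s
    dep_mean = n_calls * delay_s / holding_s
    # Truncate both distributions far out in the tail.
    kmax = int(max(arr_mean, dep_mean) + 10 * math.sqrt(arr_mean) + 30)
    arrivals = poisson_pmf(arr_mean, kmax)
    departures = poisson_pmf(dep_mean, kmax)

    # dep_cdf[j] = P(departures <= j)
    dep_cdf = []
    total = 0.0
    for p in departures:
        total += p
        dep_cdf.append(total)

    def violation_probability(h):
        # P(A - D > h) = sum over a of P(A = a) * P(D <= a - h - 1)
        return sum(arrivals[a] * dep_cdf[a - h - 1]
                   for a in range(h + 1, kmax + 1))

    h = 0
    while violation_probability(h) > target:
        h += 1
    return h


for n in (50, 1000, 10000):
    print(n, required_margin(n, delay_s=1.0, load_factor=2.0))
```

   The sketch shows the same trends as the table: the margin grows with
   both the delay and the link size, but much more slowly than the link
   size itself.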

575	Appendix C. A Simple Algorithm for Core Routers

577	   In this appendix, we present an algorithm for core routers that use
578	   unit-based reservations. The algorithm is simple, so it can be easily
579	   implemented in hardware by simple counters. Its inputs are the
580	   refresh interval and the number of flows allowed on the link. The
581	   latter is denoted by <threshold>. (We assume flows with similar
582	   characteristics (e.g., voice) and that one flow sends one refresh
   packet per refresh interval.) If the network uses several DSCPs for
584	   real-time traffic, then a separate copy of the algorithm may be run
585	   for each DSCP, resulting in per-DSCP admission.

587	   The algorithm counts the number of refresh and admitted probe packets
588	   in refresh intervals (<count>). The result of the counting is an
589	   upper limit on the number of units reserved on the link, as some
590	   reservations may have gone by the end of the refresh interval. The
591	   value of this counter is used in the next interval to decide on
592	   admission (<last>). When a new reservation is admitted, this value is
   increased to take the new reservation into account. If this value
   is well above the admission limit (more than 10% above it in the
   pseudocode below), then we start sending severe congestion
   notifications by marking regular packets as well.

      On initialization:
         last = 0
         count = 0

      On arrival of a refresh packet:
         count++

      On arrival of a probe packet:
         if last < threshold then
            last++
            count++
         else
            Mark Packet
         endif

      On arrival of a regular packet:
         if last > threshold*1.1 then
            Mark Packet
         endif

      At the end of the refresh interval:
         last = count
         count = 0
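
   As an illustration, the counter algorithm can be written as a small
   class. This is a sketch rather than a reference implementation: the
   class and method names are hypothetical, and regular packets are
   marked when <last> exceeds the threshold by more than 10%, following
   the prose description of severe congestion above.

```python
class UnitReservationCounter:
    """Per-link (or per-DSCP) counters for unit-based reservations."""

    def __init__(self, threshold, severe_factor=1.1):
        self.threshold = threshold          # flows allowed on the link
        self.severe_factor = severe_factor  # severe congestion multiplier
        self.last = 0    # estimate used for admission in this interval
        self.count = 0   # packets counted in the running interval

    def on_refresh(self):
        """A refresh packet renews one existing reservation."""
        self.count += 1

    def on_probe(self):
        """Return True if the probe must be marked (flow rejected)."""
        if self.last < self.threshold:
            self.last += 1
            self.count += 1
            return False
        return True

    def on_regular(self):
        """Return True if a regular packet must be marked."""
        return self.last > self.threshold * self.severe_factor

    def end_of_interval(self):
        """Roll the counters over at the end of the refresh interval."""
        self.last = self.count
        self.count = 0


router = UnitReservationCounter(threshold=2)
print([router.on_probe() for _ in range(3)])  # third probe is marked
router.end_of_interval()
router.on_refresh()          # only one of the two flows refreshes
router.end_of_interval()
print(router.last)           # one unit is still reserved
```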

621	Appendix D. Simulation Results

623	   The purpose of the simulations described in this appendix is to give
624	   some insight into the performance of load control. The simulation
625	   cases are by no means representative, and the scheme may work
   differently in other situations. In Appendix D.1, the simple marking
   case is demonstrated with a purely measurement-based admission
   algorithm by using a single link with both constant bit-rate and
   on/off sources. In Appendix D.2, the unit-based reservation method is
   shown, using the algorithm in Appendix C.

632	   Severe congestion signaling is not used in any of the examples; only
633	   admission control is used.

635	   We simulated a very simple network of one link. This can be viewed as
636	   the single bottleneck in the domain. The link had a 2 Mbit/s
637	   throughput, 50% of which was designated to carry real-time traffic.
   The round trip propagation delay was set to 100 ms. The real-time
   flows arrived according to a Poisson process, and the holding time
   was exponential with a 90 second mean. The arrival rate of flows was set
641	   to produce approximately 50% blocking. Only real-time traffic was
642	   simulated, so scheduling was simple FIFO.

644	D.1 Simple Marking

646	D.1.1 Constant Bit-Rate Sources

648	   In the first case, flows emitted 40 byte long packets every 20 ms,
649	   producing a constant 16 kbit/s load. The 1 Mbit/s capacity assigned
   to this traffic can thus carry 62.5 flows. From the table in Appendix
   B, we can see that 4 calls should be reserved in addition to the
652	   62.5. After an initial transient of 5 minutes, we simulated 2.5
653	   hours.

655	   During the 2.5 hour simulation time, utilization was measured over
656	   5-minute intervals. Utilization was also measured in 20ms slots and
657	   the percentage of slots in which it was above 1.064 Mbit/s (66.5
658	   calls) was counted.
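
   The figures used in this case follow from simple arithmetic, which
   the short sketch below reproduces. The constant names are
   hypothetical; the values are the ones stated in the text.

```python
PACKET_BYTES = 40
PACKET_INTERVAL_MS = 20
REALTIME_CAPACITY_BPS = 1_000_000
SAFETY_MARGIN_CALLS = 4

# One flow: 40 bytes every 20 ms.
flow_rate_bps = PACKET_BYTES * 8 * 1000 / PACKET_INTERVAL_MS

# Number of such flows that fit into the real-time share of the link.
flows_carried = REALTIME_CAPACITY_BPS / flow_rate_bps

# Level above which a 20 ms slot counts as a violation.
violation_level_bps = (flows_carried + SAFETY_MARGIN_CALLS) * flow_rate_bps

print(flow_rate_bps)        # 16000.0
print(flows_carried)        # 62.5
print(violation_level_bps)  # 1064000.0
```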

660	      min/avg/max of the utilization was: 881 / 899 / 914 kbit/s
661	      min/avg/max of the violation ratio was: 98.96% / 99.78% / 100%

663	D.1.2 On/Off Sources

665	   In the second simulation case, on/off sources were used. During an
666	   "off" period, no packets were generated, while in the "on" state the
667	   behavior is the same as in the previous case: 40 byte long packets 20
   ms apart. The lengths of the on and off periods were both drawn
   from a Pareto distribution with a shape parameter of 1.1 and a mean
   of 5 seconds. The average bit-rate of the sources is thus 8 kbit/s.
   The flow arrival rate has been doubled to produce 50% blocking,
672	   since the link is capable of carrying nearly twice the number of
673	   flows. The same set of measurements was carried out as in the
674	   previous case.

676	      min/avg/max of the utilization was: 808 / 819 / 837 kbit/s
677	      min/avg/max of the violation ratio was: 98.98% / 99.40% / 99.70%

679	It can be seen that although the measurement-based approach was not able
680	to prevent the over-use of the real-time resources in this high load
681	case, it is a viable alternative. In no case did the 20 ms measurements
exceed 1.15 Mbit/s, so the over-use only means temporarily taking
resources provisioned for the lower priority traffic.

685	D.1.3 The Router Algorithm

   The measurement-based admission control (MBAC) algorithm used by the
   router is presented here only for completeness of the simulation
   description. The marking strategy was the same for both types of
   traffic. The router counts the number of bytes transmitted in every
   20 ms interval and calculates the average bit rate in these 20 ms
   slots. It then smooths these values in time through an exponentially
   weighted moving average (EWMA) filter. The window size of the EWMA
   was set to 9 seconds, i.e., if a unit step function is run through
   it, the output reaches 0.63 after 9 seconds. The algorithm also
   builds a histogram of the difference between the original slot
   values and the filtered values. The histogram has 1000 bins covering
   the range from -1 to +1 Mbit/s. The 99% quantile of the histogram is
   calculated every 100 seconds. The router marks all passing packets
   if the sum of the output of the EWMA filter and the calculated
   quantile is greater than 1 Mbit/s. The router makes no correction to
   its measurements when a new flow is admitted.

703	   Thus, the target violation probability was set to 1%, which was in
704	   fact fulfilled in the long run.

706	   On arrival of a new packet, only counters are incremented. Every 20
707	   ms a new value for the ewma must be calculated, a marking decision
708	   must be made for the next 20 ms and the value of one bin in the
709	   histogram must be increased. Every 100 seconds, the 99% quantile
710	   value must be looked up in the histogram and the histogram must be
711	   initialized.
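
   The per-packet, per-slot and per-100-second steps can be sketched as
   follows. This is an illustration under stated assumptions, not the
   simulator code: all names are hypothetical, the EWMA coefficient is
   derived from the 9 second window as 1 - exp(-slot/window), the
   difference is taken against the already updated filter output, and
   the quantile is read as the upper edge of the matching bin.

```python
import math


class MbacMarker:
    """EWMA-plus-quantile marking decision for one link."""

    def __init__(self, limit_bps=1_000_000.0, slot_s=0.020,
                 time_const_s=9.0, bins=1000, span_bps=1_000_000.0,
                 quantile=0.99, slots_per_update=5000):
        self.limit_bps = limit_bps
        self.slot_s = slot_s
        # A unit step reaches 1 - exp(-1) = 0.63 after time_const_s.
        self.alpha = 1.0 - math.exp(-slot_s / time_const_s)
        self.bins = bins
        self.span_bps = span_bps
        self.quantile = quantile
        self.slots_per_update = slots_per_update  # 100 s of 20 ms slots
        self.ewma_bps = 0.0
        self.margin_bps = 0.0   # last quantile read from the histogram
        self.histogram = [0] * bins
        self.slots_seen = 0
        self.slot_bytes = 0
        self.marking = False

    def on_packet(self, size_bytes):
        """Per-packet work: only a byte counter is incremented."""
        self.slot_bytes += size_bytes

    def end_of_slot(self):
        """Per-slot work: update EWMA, one histogram bin, the decision."""
        rate_bps = self.slot_bytes * 8 / self.slot_s
        self.slot_bytes = 0
        self.ewma_bps += self.alpha * (rate_bps - self.ewma_bps)
        diff = rate_bps - self.ewma_bps
        index = int((diff + self.span_bps) / (2 * self.span_bps) * self.bins)
        self.histogram[min(max(index, 0), self.bins - 1)] += 1
        self.slots_seen += 1
        if self.slots_seen == self.slots_per_update:
            self.margin_bps = self._quantile_bps()
            self.histogram = [0] * self.bins
            self.slots_seen = 0
        self.marking = self.ewma_bps + self.margin_bps > self.limit_bps
        return self.marking

    def _quantile_bps(self):
        """Upper edge of the bin where the cumulative count hits the quantile."""
        needed = self.quantile * self.slots_seen
        cumulative = 0
        for index, hits in enumerate(self.histogram):
            cumulative += hits
            if cumulative >= needed:
                break
        bin_width = 2 * self.span_bps / self.bins
        return -self.span_bps + (index + 1) * bin_width
```

   A caller would invoke on_packet() for every forwarded packet and
   end_of_slot() every 20 ms, marking all packets in the next slot
   whenever the returned value is true.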

713	   The interested reader can read more about the design rationale of the
714	   above algorithm in [Gross99].

716	D.2 Unit-Based Reservations

   In this section we demonstrate the unit-based reservation scheme. The
   routers use the simple algorithm in Appendix C, except that it never
   marks regular packets. The simulation setup is otherwise the same as
   in the previous section. The traffic inside the flows does not affect
   the admission algorithm, so during simulation, sources send only
   probe and refresh packets. The definition of the unit is a peak
   bit-rate of 16 kbit/s. The flow number threshold was set to 62 flows,
   resulting in close to the same target utilization of 1 Mbit/s as in
   Appendix D.1. The length of the refresh period was varied between
   100 ms and 10 seconds. The actual number of flows on the link never
   exceeded 62 (no violation), so only the utilization values are shown
   in kbit/s.

731	                      | interval | min | avg | max |
732	                      +----------+-----+-----+-----+
733	                      |    --    | 968 | 972 | 976 |
734	                      | 100 ms   | 952 | 954 | 959 |
735	                      |  1 sec.  | 941 | 946 | 949 |
736	                      |  2 sec.  | 927 | 933 | 936 |
737	                      |  4 sec.  | 908 | 913 | 920 |
738	                      |  7 sec.  | 861 | 870 | 879 |
739	                      | 10 sec.  | 827 | 837 | 852 |

741	The first line shows the utilization value for the case when the source
742	limits itself to 62 flows, i.e., blocking is not done by the network,
743	but by the source. This emulates the case when the refresh period is
744	infinitely short or when a state approach is used, as in RSVP. The
745	utilization is not 100% due to the burstiness of the arrivals.

It can be seen that as the refresh packets become less frequent, more
748	resources are wasted, as the resources allocated to departing flows
749	remain allocated until the end of the next refresh period. The result is
750	not only lower average utilization, but lower maximal utilization as
751	well. When the refresh period is 10 seconds long, the highest
752	utilization experienced was 952 kbit/sec, which is 3 units below the
753	limit.

755	This motivates the use of as short a refresh period as possible.
756	However, too short a refresh period will increase the effects of clock
757	differences between edge and core devices (which was not taken into
758	account during simulation). It also decreases the chance of finding a
759	packet to mark as refresh if the flow is currently transmitting below
760	its reserved rate.

762	Appendix E: Marking using ECN bits

If the ECN bits were to be used for load control marking, the values
would be encoded in the two unused bits as described below, and the DS
field would contain the PHB.

768	               DS byte    Load Control
769	               01234567   codepoint (in ECN)
770	               -----------------------------
771	               xxxxxx00   Ordinary
772	               xxxxxx01   Probe
773	               xxxxxx10   Marked
774	               xxxxxx11   Refresh

776	               The interpretation of the two unused bits remains
777	               unspecified for other PHBs that do not support Load
778	               Control. This is done so as not to interfere with
779	               possible ECN deployment [RFC2481].
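
The encoding in the table above can be illustrated with a few lines of
code. The sketch assumes the usual IETF bit numbering, in which bit 0
is the most significant bit, so the two load control bits are the two
least significant bits of the byte. The names are hypothetical, and the
0xB8 value in the usage lines is only an example DS byte.

```python
from enum import IntEnum


class LoadControl(IntEnum):
    """Load control codepoints carried in the two low-order bits."""
    ORDINARY = 0b00
    PROBE = 0b01
    MARKED = 0b10
    REFRESH = 0b11


def set_codepoint(ds_byte, codepoint):
    """Keep the six DS field bits and replace the two low-order bits."""
    return (ds_byte & 0xFC) | int(codepoint)


def get_codepoint(ds_byte):
    """Read the load control codepoint from the two low-order bits."""
    return LoadControl(ds_byte & 0x03)


probe = set_codepoint(0xB8, LoadControl.PROBE)
print(hex(probe))                        # 0xb9
marked = set_codepoint(probe, LoadControl.MARKED)
print(get_codepoint(marked).name)        # MARKED
```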

781	Table of Contents

1 Abstract
2 Background and Motivation
3 Overview
4 Operation of Load Control
4.1 Simple Marking
4.2 Unit-based Reservations
4.3 Multiple Unit reservation
4.4 Codepoints for Flow Types
5 Objects for Standardization
5.1 Packet Types
5.2 Coding of Packet Types
5.3 Behavior Description
5.3.1 Behavior of the Core Routers
5.3.2 Behavior of the Edge Devices
6 Interworking with RSVP/Intserv
7 Security Considerations
8 Identification of Edge Nodes
9 Multicast-related Issues
Appendix A. Admission Precision of Simple Marking
Appendix B. Effect of Delays on Admission
Appendix C. A Simple Algorithm for Core Routers
Appendix D. Simulation Results
D.1 Simple Marking
D.1.1 Constant Bit-Rate Sources
D.1.2 On/Off Sources
D.1.3 The Router Algorithm
D.2 Unit-Based Reservations
Appendix E: Marking using ECN bits

812	Authors' Addresses

814	Lars Westberg
815	Ericsson Research
816	Kistagangen 26
817	SE-164 80 Stockholm
818	Sweden
819	EMail: Lars.Westberg@era-t.ericsson.se

821	Zoltan R. Turanyi
Ericsson Telecommunications
823	Budapest, Laborc u. 1
824	H-1037
825	Hungary
826	EMail: Zoltan.Turanyi@ericsson.com

828	David Partain
829	Ericsson Radio Systems AB
830	P.O. Box 1248
831	SE-581 12  Linkoping
832	Sweden
833	EMail: David.Partain@ericsson.com