idnits 2.17.1 

draft-nichols-diff-svc-arch-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1162 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 7 instances of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** There are 7 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([2,3]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 876: '... 100kbps. (There MAY be a shaper set a...'
     RFC 2119 keyword, line 1022: '...datagrams SHOULD be treated as best-ef...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Normative reference to a draft: ref. '3' 

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'


     Summary: 13 errors (**), 0 flaws (~~), 2 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force         K. Nichols / V. Jacobson / L. Zhang
2	INTERNET-DRAFT                                                    Nov, 1997
3	draft-nichols-diff-svc-arch-00.txt                            Expires: 5/98

5	    A Two-bit Differentiated Services Architecture for the Internet

7	Status of this Memo

9	   This document is an Internet-Draft.  Internet-Drafts are working
10	   documents of the Internet Engineering Task Force (IETF), its areas,
11	   and its working groups.  Note that other groups may also distribute
12	   working documents as Internet-Drafts.

14	   Internet-Drafts are draft documents valid for a maximum of six months
15	   and may be updated, replaced, or obsoleted by other documents at any
16	   time.  It is inappropriate to use Internet-Drafts as reference
17	   material or to cite them other than as "work in progress".

19	   To learn the current status of any Internet-Draft, please check the
20	   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
21	   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
22	   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
23	   ftp.isi.edu (US West Coast).

25	Abstract

27	    This document presents a differentiated services architecture for
28	    the internet. Dave Clark and Van Jacobson each presented work
29	    on differentiated services at the Munich IETF meeting [2,3].
30	    Each explained how to use one bit of the IP header to deliver a
31	    new kind of service to packets in the internet. These were two very
32	    different kinds of service with quite different policy assumptions.
33	    Ensuing discussion has convinced us that both service types
34	    have merit and that both service types can be implemented with
35	    a set of very similar mechanisms. We propose an architectural
36	    framework that permits the use of both of these service types
37	    and exploits their similarities in forwarding path mechanisms.
38	    The major goals of this architecture are each shared with one
39	    or both of those two proposals: keep the forwarding path simple,
40	    push complexity to the edges of the network to the extent possible,
41	    provide a service that avoids assumptions about the type of traffic
42	    using it, employ an allocation policy that will be compatible with
43	    both long-term and short-term provisioning, and make it possible for
44	    the dominant Internet traffic model to remain best-effort.

46	NOTE: This document includes figures that are an integral part of its
47	      content.  The IETF's choice of ascii as the standard document form
48	      precludes the inclusion of those figures.  The complete document,
49	      with all its figures, is available at:
50	          http://ftp.ee.lbl.gov/papers/dsarch.pdf

56				 K. Nichols
57				Bay Networks

59				V. Jacobson
60				   LBNL

62				L. Zhang
63				   UCLA

65	1. Introduction

67	This document presents a differentiated services architecture for
68	the internet. Dave Clark and Van Jacobson each presented work
69	on differentiated services at the Munich IETF meeting [2,3].
70	Each explained how to use one bit of the IP header to deliver a
71	new kind of service to packets in the internet. These were two very
72	different kinds of service with quite different policy assumptions.
73	Ensuing discussion has convinced us that both service types
74	have merit and that both service types can be implemented with
75	a set of very similar mechanisms. We propose an architectural
76	framework that permits the use of both of these service types
77	and exploits their similarities in forwarding path mechanisms.
78	The major goals of this architecture are each shared with one
79	or both of those two proposals: keep the forwarding path simple,
80	push complexity to the edges of the network to the extent possible,
81	provide a service that avoids assumptions about the type of traffic
82	using it, employ an allocation policy that will be compatible with
83	both long-term and short-term provisioning, and make it possible for
84	the dominant Internet traffic model to remain best-effort.

86	The major contributions of this document are to present two
87	distinct service types, a set of general mechanisms for the
88	forwarding path that can be used to implement a range of
89	differentiated services and to propose a flexible framework
90	for provisioning a differentiated services network. It is
91	precisely this kind of architecture that is needed for expedient
92	deployment of differentiated services: we need a framework and
93	set of primitives that can be implemented in the short-term and
94	provide interoperable services, yet can provide a "sandbox" for
95	experimentation and elaboration that can lead in time to more
96	levels of differentiation within each service as needed.

98	At the risk of belaboring an analogy, we are motivated to provide
99	service tiers in somewhat the same fashion as the airlines do
100	with first class, business class and coach class. The latter
101	also has tiering built in due to the various restrictions put on
102	the purchase. A part of the analogy we want to stress is that
103	best effort traffic, like coach class seats on an airplane,
104	is still expected to make up the bulk of internet traffic.
105	Business and first class carry a small number of passengers,
106	but are quite important to the economics of the airline industry.
107	The various economic forces and realities combine to dictate the
108	relative allocation of the seats and to try to fill the airplane.
109	We don't expect that differentiated services will comprise all
110	the traffic on the internet, but we do expect that new services
111	will lead to a healthy economic and service environment.

113	This document is organized into sections describing service
114	architecture, mechanisms, the bandwidth allocation architecture,
115	how this architecture might interoperate with RSVP/int-serv work,
116	and recommendations for deployment.

118	2. Architecture

120	2.1 Background

122	The current internet delivers one type of service, best-effort,
123	to all traffic. A number of proposals have been made concerning
124	the addition of enhanced services to the Internet. We focus on two
125	particular methods of adding a differentiated level of service to
126	IP, each designated by one bit [1,2,3]. These services represent a
127	radical departure from the Internet's traditional service, but they
128	are also a radical departure from traditional "quality of service"
129	architectures which rely on circuit-based models. Both these
130	proposals seek to define a single common mechanism that is used by
131	interior network routers, pushing most of the complexity and state
132	of differentiated services to the network edges. Both use bandwidth
133	as the resource that is being requested and allocated. Clark and
134	Wroclawski defined an "Assured" service that follows "expected
135	capacity" usage profiles that are statistically provisioned [3].
136	The assurance that the user of such a service receives is that
137	such traffic is unlikely to be dropped as long as it stays within
138	the expected capacity profile. The exact meaning of "unlikely"
139	depends on how well provisioned the service is. An Assured service
140	traffic flow may exceed its Profile, but the excess traffic is
141	not given the same assurance level. Jacobson defined a "Premium"
142	service that is provisioned according to peak capacity Profiles
143	that are strictly not oversubscribed and that is given its own
144	high-priority queue in routers [2]. A Premium service traffic
145	flow is hard-limited to its provisioned peak rate and shaped
146	so that bursts are not injected into the network.
147	Premium service presents a "virtual wire" where a flow's bursts
148	may queue at the shaper at the edge of the network, but thereafter
149	only in proportion to the indegree of each router. Despite their
150	many similarities, these two approaches result in fundamentally
151	different services. The former uses buffer management to provide
152	a "better effort" service while the latter creates a service with
153	little jitter and queueing delay and no need for queue management
154	on the Premium packets' queue.

156	An Assured service was introduced in [3] by Clark and Wroclawski,
157	though we have made some alterations in its specification for
158	our architecture. Further refinements and an "Expected Capacity"
159	framework are given in Clark and Fang [10].  This framework is
160	focused on "providing different levels of best-effort service at
161	times of network congestion" but also mentions that it is possible
162	to have a separate router queue to implement a "guaranteed"
163	level of assurance.  We believe this framework and our Two-bit
164	architecture are compatible but this needs further exploration.
165	As Premium service has not been documented elsewhere, we describe
166	it next and follow this with a description of the two-bit
167	architecture.

169	2.2 Premium service

171	In [2], a Premium service was presented that is fundamentally
172	different from the Internet's current best effort service.
173	This service is not meant to replace best effort but primarily to
174	meet an emerging demand for a commercial service that can share the
175	network with best effort traffic. This is desirable economically,
176	since the same network can be used for both kinds of traffic.
177	It is expected that Premium traffic would be allocated a small
178	percentage of the total network capacity, but that it would
179	be priced much higher. One use of such a service might be to
180	create "virtual leased lines", saving the cost of building and
181	maintaining a separate network. Premium service, not unlike
182	a standard telephone line, is a capacity which the customer
183	expects to be there when the receiver is lifted, although it may,
184	depending on the household, be idle a good deal of the time.
185	Provisioning Premium traffic in this way reduces the capacity
186	of the best effort internet by the amount of Premium allocated,
187	in the worst case; thus it would have to be priced accordingly.
188	On the other hand, whenever that capacity is not being used it
189	is available to best effort traffic. In contrast to normal best
190	effort traffic which is bursty and requires queue management
191	to deal fairly with congestive episodes, this Premium service
192	by design creates very regular traffic patterns and small or
193	nonexistent queues.

195	Premium service levels are specified as a desired peak bit-rate
196	for a specific flow (or aggregation of flows). The user contract
197	with the network is not to exceed the peak rate. The network
198	contract is that the contracted bandwidth will be available when
199	traffic is sent. First-hop routers (or other edge devices) filter
200	the packets entering the network, set the Premium bit of those
201	that match a Premium service specification, and perform traffic
202	shaping on the flow that smooths all traffic bursts before they
203	enter the network. This approach requires no changes in hosts.
204	A compliant router along the path needs two levels of priority
205	queueing, sending all packets with the Premium bit set first.
206	Best-effort traffic is unmarked and queued and sent at the lower
207	priority. This results in two "virtual networks": one which is
208	identical to today's Internet with buffers designed to absorb
209	traffic bursts; and one where traffic is limited and shaped to
210	a contracted peak-rate, but packets move through a network of
211	queues where they experience almost no queueing delay.

213	In this architecture, forwarding path decisions are made separately
214	and more simply than the setting up of the service agreements
215	and traffic profiles. With the exception of policing and shaping
216	at administrative or "trust" boundaries, the only actions that
217	need to be handled in the forwarding path are to classify a
218	packet into one of two queues on a single bit and to service
219	the two queues using simple priority. Shaping must include both
220	rate and burst parameters; the latter is expected to be small,
221	in the one or two packet range. Policing at boundaries enforces
222	rate compliance, and may be implemented by a simple token bucket.
223	The admission and set-up procedures are expected to evolve, in
224	time, to be dynamically configurable and fairly complex while
225	the mechanisms in the forwarding path remain simple.
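
The two forwarding-path actions named above, classification on a
single bit and simple priority service, can be sketched as follows.
This is a minimal illustration under stated assumptions, not a router
implementation: the bit value P_BIT, the dictionary packet layout, and
the class name PriorityOutput are all hypothetical.

```python
from collections import deque

P_BIT = 0x2  # hypothetical encoding; the draft leaves the pattern open


class PriorityOutput:
    """Two FIFO queues served by simple (strict) priority."""

    def __init__(self):
        self.premium = deque()
        self.best_effort = deque()

    def enqueue(self, packet):
        # Classification looks only at the Premium bit.
        if packet["bits"] & P_BIT:
            self.premium.append(packet)
        else:
            self.best_effort.append(packet)

    def dequeue(self):
        # Premium packets are always sent before best-effort packets.
        if self.premium:
            return self.premium.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


out = PriorityOutput()
out.enqueue({"id": "be-1", "bits": 0})
out.enqueue({"id": "prem-1", "bits": P_BIT})
out.enqueue({"id": "be-2", "bits": 0})
order = [out.dequeue()["id"] for _ in range(3)]
```

In this run the Premium packet leaves first even though it arrived
second, and the best-effort packets keep their relative order.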

227	A Premium service built on this architecture can be deployed in
228	a useful way once the forwarding path mechanisms are in place
229	by making static allocations. Traffic flows can be designated
230	for special treatment through network management configuration.
231	Traffic flows can be designated by the source, the destination,
232	or any combination of fields in the packet header. First-hop (or
233	leaf) routers will filter flows on all or part of the header tuple
234	consisting of the source IP address, destination IP address,
235	protocol identifier, source port number, and destination
236	port number. Based on this classification, a first-hop router
237	performs traffic shaping and sets the designated Premium bit
238	of the precedence field. End-hosts are thus not required to be
239	"differentiated services aware", though if and when end-systems
240	become universally "aware", they might do their own shaping and
241	first-hop routers merely police.
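
Filtering on "all or part of the header tuple" can be pictured as a
match in which unspecified fields act as wildcards. The sketch below
assumes that reading; the names FlowFilter, matches and classify, the
use of None as a wildcard, and the example addresses are hypothetical.

```python
from typing import NamedTuple, Optional


class FlowFilter(NamedTuple):
    """Header fields to match; None is treated as a wildcard."""
    src: Optional[str] = None
    dst: Optional[str] = None
    proto: Optional[int] = None
    sport: Optional[int] = None
    dport: Optional[int] = None


def matches(flt, header):
    # Every field the filter specifies must equal the header's value.
    return all(
        want is None or header.get(name) == want
        for name, want in flt._asdict().items()
    )


def classify(filters, header):
    """Return the first configured filter that matches, or None."""
    for flt in filters:
        if matches(flt, header):
            return flt
    return None


configured = [FlowFilter(src="10.0.0.5", dst="192.0.2.9", proto=17)]
hdr = {"src": "10.0.0.5", "dst": "192.0.2.9", "proto": 17,
       "sport": 4000, "dport": 5004}
other = dict(hdr, src="10.0.0.6")
hit = classify(configured, hdr)
miss = classify(configured, other)
```

Here the filter leaves both port numbers unspecified, so any ports
match, while a packet from a different source address does not.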

243	Adherence to the subscribed rate and burst size must be enforced
244	at the entry to the network, either by the end-system or by the
245	first-hop router. Within an intranet, administrative domain, or
246	"trust region" the packets can then be classified and serviced
247	solely on the Premium bit. Where packets cross a boundary, the
248	policing function is critical. The entered region will check the
249	prioritized packet flow for conformance to a rate the two regions
250	have agreed upon, discarding packets that exceed the rate. It is
251	thus in the best interests of a region to ensure conformance
252	to the agreed-upon rate at the egress. This requirement means
253	that Premium traffic is burst-free and, together with the no
254	oversubscription rule, leads directly to the observation that
255	Premium queues can easily be sized to prevent the need to drop
256	packets and thus the need for a queue management policy. At each
257	router, the largest queue size is related to the in-degree of
258	other routers and is thus quite small, on the order of ten packets.

260	Premium bandwidth allocations must not be oversubscribed as
261	they represent a commitment by the network and should be priced
262	accordingly. Note that, in this architecture, Premium traffic will
263	also experience considerably less delay variation than either best
264	effort traffic or the Assured data traffic of [3]. Premium rates
265	might be configured on a subscription basis in the near-term,
266	or on-demand when dynamic set-up or signaling is available.

268	Figure 1 shows how a Premium packet flow is established within a
269	particular administrative domain, Company A, and sent across the
270	access link to Company A's ISP. Assume that the host's first-hop
271	router has been configured to match a flow from the host's IP
272	address to a destination IP address that is reached through the ISP.
273	A Premium flow is configured from a host with a rate which is
274	both smaller than the total Premium allocation Company A has
275	from the ISP, r bytes per second, and smaller than the amount of
276	that allocation not yet assigned to other hosts in Company A.
277	Packets are not marked in any special way when they leave the host.
278	The first-hop router clears the Premium bit on all arriving
279	packets, sets the Premium bit on all packets in the designated
280	flow, shapes packets in the Premium flow to a configured rate
281	and burst size, queues best-effort unmarked packets in the low
282	priority queue and shaped Premium packets in the high priority
283	queue, and sends packets from those two queues at simple priority.
284	Intermediate routers internal to Company A enqueue packets in
285	one of two output queues based on the Premium bit and service
286	the queues with simple priority. Border routers perform quite
287	different tasks, depending on whether they are processing an egress
288	flow or an ingress flow. An egress border router may perform
289	some reshaping on the aggregate Premium traffic to conform to
290	rate r, depending on the number of Premium flows aggregated.
291	Ingress border routers only need to perform a simple policing
292	function that can be implemented with a token bucket. In the
293	example, the ISP accepts all Premium packets from A as long as
294	the flow does not exceed r bytes per second.

296	Figure 1. Premium traffic flow from end-host to
297	organization's ISP
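
The rate condition in this example amounts to simple arithmetic. The
sketch below is one reading of it, assuming that the second bound is
the part of Company A's allocation not already assigned to other
hosts and that a rate exactly equal to that remainder is acceptable.
The function name can_configure and the sample figures are
hypothetical.

```python
def can_configure(new_rate, total_allocation, assigned_rates):
    """Check a requested Premium rate against what is still unassigned.

    All rates are in bytes per second. Allowing equality with the
    unassigned remainder is an assumption of this sketch.
    """
    unassigned = total_allocation - sum(assigned_rates)
    return 0 < new_rate <= unassigned


r = 125_000                          # Company A's allocation from the ISP
already_assigned = [50_000, 25_000]  # rates given to other hosts
fits = can_configure(40_000, r, already_assigned)
too_big = can_configure(60_000, r, already_assigned)
```

With 75,000 bytes per second already assigned, a new 40,000 flow fits
in the remaining 50,000 and a 60,000 flow does not.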

299	2.3 Two-bit differentiated services architecture

301	Clark's and Jacobson's proposals are markedly similar in the
302	location and type of functional blocks that are needed to implement
303	them. Furthermore, they implement quite different services which
304	are not incompatible in a network. The Premium service implements
305	a guaranteed peak bandwidth service with negligible queueing delay
306	that cannot starve best effort traffic and can be allocated in a
307	fairly straightforward fashion. This service would seem to have
308	a strong appeal for commercial applications, video broadcasts,
309	voice-over-IP, and VPNs. On the other hand, this service may
310	prove both too restrictive (in its hard limits) and overdesigned
311	(no overallocation) for some applications. The Assured service
312	implements a service that has the same delay characteristics as
313	(undropped) best effort packets and the firmness of its guarantee
314	depends on how well individual links are provisioned for bursts of
315	Assured packets. On the other hand, it permits traffic flows to use
316	any additional available capacity without penalty and occasional
317	dropped packets for short congestive periods may be acceptable
318	to many users. This service might be what an ISP would provide to
319	individual customers who are willing to pay a bit more for internet
320	service that seems unaffected by congestive periods. Both services
321	are only as good as their admission control schemes, though this
322	can be more difficult for traffic which is not peak-rate allocated.

324	There may be some additional benefits of deploying both services.
325	To the extent that Premium service is a conservative allocation
326	of resources, unused bandwidth that had been allocated to Premium
327	might provide some "headroom" for underallocated or burst periods
328	of Assured traffic or for best effort. Network elements that
329	deploy both services will be performing RED queue management on
330	all non-Premium traffic, as suggested in [4], and the effects of
331	mixing the Premium streams with best effort might serve to reduce
332	burstiness in the latter. A strength of the Assured service is that
333	it allows bursts to happen in their natural fashion, but this also
334	makes the provisioning, admission control and allocation problem
335	more difficult so it may take more time and experimentation before
336	the admission policy for this service is completely defined.
337	A Premium service could be deployed that employs static allocations
338	on peak rates with no statistical sharing.

340	As there appear to be a number of advantages to an architecture
341	that permits these two types of service and because, as we shall
342	see, they can be made to share many of the same mechanisms, we
343	propose designating two bit-patterns from the IP header precedence
344	field. We leave the explicit designation of these bit-patterns
345	to the standards process; thus we use the shorthand notation of
346	denoting each pattern by a bit, one we will call the Premium or
347	P-bit, the other we call the assurance or A-bit. It is possible
348	for a network to implement only one of these services and to have
349	network elements that only look at the one applicable bit, but we
350	focus on the two service architecture. Further, we assume the case
351	where no changes are made in the hosts, appropriate packet marking
352	all being done in the network, at the first-hop, or leaf, router.
353	We describe the forwarding path architecture in this section,
354	assuming that the service has been allocated through mechanisms
355	we will discuss in section 4.

357	In a more general sense, Premium service denotes packets that are
358	enqueued at a higher priority than the ordinary best-effort queue.
359	Similarly, Assured service denotes packets that are treated
360	preferentially with respect to the dropping probability within
361	the "normal" queue. There are a number of ways to add more service
362	levels within each of these service types [7], but this document
363	takes the position of specifying the base-level services of
364	Premium and Assured.

366	The forwarding path mechanisms can be broken down into those
367	that happen at the input interface, before packet forwarding,
368	and those that happen at the output interface, after packet
369	forwarding. Intermediate routers only need to implement the
370	post packet forwarding functions, while leaf and border routers
371	must perform functions on arriving packets before forwarding.
372	We describe the mechanisms this way for illustration; other ways
373	of composing their functions are possible.

375	Leaf routers are configured with a traffic profile for a particular
376	flow based on its packet header. This functionality has been
377	defined by the RSVP Working Group in RFC 2205. Figure 2 shows
378	what happens to a packet that arrives at the leaf router, before
379	it is passed to the forwarding engine. All arriving packets must
380	have both the A-bit and the P-bit cleared after which packets
381	are classified on their header. If the header does not match any
382	configured values, the packet is immediately forwarded. Matched flows
383	pass through individual Markers that have been configured from the
384	usage profile for that flow: service class (Premium or Assured),
385	rate (peak for Premium, "expected" for Assured), and permissible
386	burst size (may be optional for Premium). Assured flow packets
387	emerge from the Marker with their A-bits set when the flow is in
388	conformance to its Profile, but the flow is otherwise unchanged.
389	For a Premium flow, the Marker will hold packets when necessary
390	to enforce their configured rate. Thus Premium flow packets
391	emerge from the Marker in a shaped flow with their P-bits set.
392	(It is possible for Premium flow packets to be dropped inside
393	of the Marker as we describe below.) Packets are passed to the
394	forwarding engine when they emerge from Markers. Packets that have
395	either their P or A bits set we will refer to as Marked packets.

397	Figure 2. Block diagram of leaf router input functionality
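
The input processing just described can be sketched as a short
pipeline: clear both bits, classify, then hand matched packets to the
flow's Marker. This is an illustration only. The bit values, the
dictionary packet layout, and the names leaf_input and set_a are
hypothetical, and the Marker is reduced to a function that returns
immediately, which ignores the holding a Premium Marker may do.

```python
A_BIT, P_BIT = 0x1, 0x2  # hypothetical encodings of the two patterns


def leaf_input(packet, configured):
    """Apply leaf-router input processing to one packet.

    `configured` is a list of (matches, marker) pairs of callables,
    hypothetical stand-ins for the classifier and the Markers.
    """
    # Clear both bits first so marks set by a host are never trusted.
    packet["bits"] &= ~(A_BIT | P_BIT)
    for matches, marker in configured:
        if matches(packet):
            return marker(packet)
    # No configured flow matched: forward at once, unmarked.
    return packet


def set_a(packet):
    # Stand-in for an Assured Marker whose flow is within its Profile.
    packet["bits"] |= A_BIT
    return packet


configured = [(lambda p: p["src"] == "10.0.0.5", set_a)]
spoofed = leaf_input({"src": "10.0.0.9", "bits": P_BIT}, configured)
assured = leaf_input({"src": "10.0.0.5", "bits": 0}, configured)
```

A packet that arrives with the P-bit already set but matches no
configured flow leaves with both bits clear.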

399	Figure 3 shows the inner workings of the Marker. For both Assured
400	and Premium packets, a token bucket "fills" at the flow rate
401	that was specified in the usage profile. For Assured service,
402	the token bucket depth is set by the Profile's burst size.
403	For Premium service, the token bucket depth must be limited to
404	the equivalent of only one or two packets. (We suggest a depth of
405	one packet in early deployments.) When a token is present, Assured
406	flow packets have their A-bit set to one; in either case the packet is
407	passed to the forwarding engine. For a Premium-configured Marker,
408	arriving packets that see a token present have their P-bits set
409	and are forwarded, but when no token is present, Premium flow
410	packets are held until a token arrives. If a Premium flow bursts
411	enough to overflow the holding queue, its packets will be dropped.
412	Though the flow set up data can be used to configure a size limit
413	for the holding queue (this would be the meaning of a "burst"
414	in Premium service), it is not necessary. Unconfigured holding
415	queues should be capable of holding at least two bandwidth-delay
416	products, adequate for TCP connections. A smaller value might
417	be used to suit delay requirements of a specific application.

419	Figure 3. Markers to implement the two different services

421	In practice, the token bucket should be implemented in bytes
422	and a token is considered to be present if the number of bytes
423	in the bucket is equal to or larger than the size of the packet.
424	For Premium, the bucket can only be allowed to fill to the
425	maximum packet size; while Assured may fill to the configured
426	burst parameter. Premium traffic is held until a sufficient byte
427	credit has accumulated and this holding buffer provides the only
428	real queue the flow sees in the network. For Assured traffic,
429	we just test if the bytes in the bucket are sufficient for the
430	packet size and set A if so. If not, the only difference is that
431	A is not set. Assured traffic goes into a queue following this
432	step and potentially sees a queue at every hop along its path.
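
A byte-based token bucket and the two Marker behaviors can be
sketched as below. This is a simplified model under several
assumptions: time is passed in explicitly as `now` in seconds, the
bucket starts full, the holding queue limit is counted in packets,
and Premium packets are always placed in the holding queue and
released once credit allows, which is meant to be equivalent to
forwarding them at once when a token is present. All names and bit
values are hypothetical.

```python
from collections import deque

A_BIT, P_BIT = 0x1, 0x2  # hypothetical encodings


class ByteBucket:
    def __init__(self, rate, depth):
        self.rate = rate      # fill rate, bytes per second
        self.depth = depth    # maximum credit, bytes
        self.credit = depth   # starting full is an assumption
        self.last = 0.0

    def take(self, size, now):
        """Refill up to `now`, then spend `size` bytes if available."""
        gained = (now - self.last) * self.rate
        self.credit = min(self.depth, self.credit + gained)
        self.last = now
        if self.credit >= size:
            self.credit -= size
            return True
        return False


def mark_assured(bucket, packet, now):
    """Set A when in profile; the packet is forwarded either way."""
    if bucket.take(packet["size"], now):
        packet["bits"] |= A_BIT
    return packet


class PremiumMarker:
    def __init__(self, rate, max_packet, hold_limit):
        self.bucket = ByteBucket(rate, max_packet)  # one-packet depth
        self.held = deque()
        self.hold_limit = hold_limit  # in packets (an assumption)

    def arrive(self, packet):
        """Hold a packet for shaping; return False if it is dropped."""
        if len(self.held) >= self.hold_limit:
            return False
        self.held.append(packet)
        return True

    def release(self, now):
        """Return held packets whose byte credit is available at `now`."""
        out = []
        while self.held and self.bucket.take(self.held[0]["size"], now):
            packet = self.held.popleft()
            packet["bits"] |= P_BIT
            out.append(packet)
        return out


a_bucket = ByteBucket(rate=1000, depth=1500)
first = mark_assured(a_bucket, {"size": 1000, "bits": 0}, now=0.0)
second = mark_assured(a_bucket, {"size": 1000, "bits": 0}, now=0.0)

shaper = PremiumMarker(rate=1000, max_packet=500, hold_limit=2)
accepted = [shaper.arrive({"size": 500, "bits": 0}) for _ in range(3)]
sent_t0 = shaper.release(now=0.0)
sent_t025 = shaper.release(now=0.25)
sent_t05 = shaper.release(now=0.5)
```

In this run the second Assured packet exceeds the remaining credit
and is forwarded without the A-bit. The third Premium packet
overflows the two-packet holding queue and is dropped, and the two
held packets leave half a second apart, one per accumulated
packet's worth of credit.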

434	Each output interface of a router must have two queues and must
435	implement a test on the P-bit to select a packet's output queue.
436	The two queues must be serviced by simple priority, Premium packets
437	first. Each output interface must implement the RED-based RIO
438	mechanism described in [3] on the lower priority queue. RIO uses
439	two thresholds for when to begin dropping packets, a lower one
440	based on total queue occupancy for ordinary best effort traffic and
441	one based on the number of packets enqueued that have their A-bit
442	set. This means that any action preferential to Assured service
443	traffic will only be taken when the queue's occupancy exceeds the
444	threshold value for ordinary best effort service. In this case,
445	only unmarked packets will be dropped (using the RED algorithm)
446	unless the threshold value for Assured service is also reached.
447	Keeping an accurate count of the number of A-bit packets currently
448	in a queue requires either testing the A-bit at both entry and
449	exit of the queue or some additional state in the router. Figure 4
450	is a block diagram of the output interface for all routers.

452	Figure 4. Router output interface for two-bit architecture
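
A minimal sketch of the output interface described above is given
here, under several simplifying assumptions of ours: the
thresholds are hypothetical, instantaneous counts stand in for
the averaged queue length that RED uses, packets are plain
dictionaries with "P" and "A" keys, and the caller supplies the
random number u so that the behavior is reproducible. It is not
the RIO algorithm of [3] in full.

```python
from collections import deque


def red_drop_probability(occupancy, min_th, max_th, max_p):
    """RED-style ramp: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if occupancy < min_th:
        return 0.0
    if occupancy >= max_th:
        return 1.0
    return max_p * (occupancy - min_th) / (max_th - min_th)


class OutputInterface:
    """Two queues served by simple priority, with a RIO-like test on the low one."""

    def __init__(self, be_params, a_params):
        self.high = deque()         # Premium (P-bit) packets
        self.low = deque()          # best effort and Assured packets share this queue
        self.a_count = 0            # number of A-bit packets currently in self.low
        self.be_params = be_params  # (min_th, max_th, max_p) on total occupancy
        self.a_params = a_params    # (min_th, max_th, max_p) on the A-bit count

    def drop_probability(self, a_bit):
        if a_bit:
            return red_drop_probability(self.a_count, *self.a_params)
        return red_drop_probability(len(self.low), *self.be_params)

    def enqueue(self, pkt, u):
        """u is a uniform random number in [0, 1); returns False if pkt is dropped."""
        if pkt["P"]:
            self.high.append(pkt)
            return True
        if u < self.drop_probability(pkt["A"]):
            return False
        self.low.append(pkt)
        if pkt["A"]:
            self.a_count += 1       # A-bit tested on entry
        return True

    def dequeue(self):
        if self.high:               # Premium is always served first
            return self.high.popleft()
        if self.low:
            pkt = self.low.popleft()
            if pkt["A"]:
                self.a_count -= 1   # A-bit tested again on exit
            return pkt
        return None
```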

454	The packet output of a leaf router is thus a shaped stream of
455	packets with P-bits set mingled with an unshaped best effort stream
456	of packets, some of which may have A-bits set. Premium service
457	clearly cannot starve best effort traffic because it is both burst
458	and bandwidth controlled. Assured service might rely only on a
459	conservative allocation to prevent starvation of unmarked traffic,
460	but bursts of Assured traffic might then close out best-effort
461	traffic at bottleneck queues during congestive periods.

463	After [3], we designate the forwarding path objects that test flows
464	against their usage profiles "Profile Meters". Border routers will
465	require Profile Meters at their input interfaces. The bilateral
466	agreement between adjacent administrative domains must specify a
467	peak rate on all P traffic and a rate and burst for A traffic (and
468	possibly a start time and duration). A Profile Meter is required
469	at the ingress of a trust region to ensure that differentiated
470	service packet flows are in compliance with their agreed-upon
471	rates. Non-compliant packets of Premium flows are discarded while
472	non-compliant packets of Assured flows have their A-bits reset.
473	For example, in figure 1, if the ISP has agreed to supply Company
474	A with r bytes/sec of Premium service, P-bit marked packets that
475	enter the ISP through the link from Company A will be dropped if
476	they exceed r. If, instead, the service in figure 1 were Assured
477	service, the packets would simply be unmarked and forwarded as
478	best effort.

480	The simplest border router input interface is a Profile Meter
481	constructed from a token bucket configured with the contracted
482	rate across that ingress link (see figure 5). Each type, Premium
483	or Assured, and each interface must have its own profile meter
484	corresponding to a particular class across a particular boundary.
485	(This is in contrast to models where every flow that crosses the
486	boundary must be separately policed and/or shaped.) The exact
487	mechanisms required at a border router input interface depend
488	on the allocation policy deployed; a more complex approach is
489	presented in section 4.

491	Figure 5. Border router input interface Profile Meters

493	3. Mechanisms

495	3.1 Forwarding Path Primitives

497	Section 2.3 introduced the forwarding path objects of Markers and
498	Profile Meters. In this section we specify the primitive building
499	blocks required to compose them. The primitives are: general
500	classifier, bit-pattern classifier, bit setter, priority queues,
501	policing token bucket and shaping token bucket. These primitives
502	can compose a Marker (either a policing or a shaping token bucket
503	plus a bit setter) and a Profile Meter (a policing token bucket
504	plus a dropper or bit setter).

506	General Classifier:
507	    Leaf or first-hop routers must perform a transport-level signature
508	    matching based on a tuple in the packet header, a functionality
509	    which is part of any RSVP-capable router. As described above,
510	    packets whose tuples match one of the configured flows are
511	    conformance tested and have the appropriate service bit set.
512	    This function is memory- and processing-intensive, but is kept
513	    at the edges of the network where there are fewer flows.

515	Bit-pattern classifier:
516	    This primitive comprises a simple two-way decision based on
517	    whether a particular bit-pattern in the IP header is set or not.
518	    As in figure 4, the P-bit is tested when a packet arrives at a
519	    non-leaf router to determine whether to enqueue it in the high
520	    priority output queue or the low priority packet queue. The A-bit
521	    of packets bound for the low priority queue is tested to 1)
522	    increment the count of Assured packets in the queue if set and 2)
523	    determine which drop probability will be used for that packet.
524	    Packets exiting the low priority queue must also have the A-bit
525	    tested so that the count of enqueued Assured packets can be
526	    decremented if necessary.

528	Bit setter:
529	    The A-bits and P-bits must be set or cleared in several places.
530	    A functional block that sets the appropriate bits of the IP header
531	    to a configured bit-pattern would be the most general.

533	Priority queues:
534	    Every network element must include (at least) two levels of simple
535	    priority queueing. The high priority queue is for the Premium
536	    traffic and the service rule is to send packets in that queue
537	    first and to exhaustion. Recall that Premium traffic must never be
538	    oversubscribed, thus Premium traffic should see little or no queue.

540	Shaping token bucket:
541	    This is the token bucket required at the leaf router for Premium
542	    traffic and shown in figure 3. As we shall see, shaping is also
543	    useful at egress points of a trust region. An arriving packet is
544	    immediately forwarded if there is a token present in the bucket,
545	    otherwise the packet is enqueued until the bucket contains tokens
546	    sufficient to send it. Shaping requires clocking mechanisms,
547	    packet memory, and some state block for each flow and is thus a
548	    memory- and computation-intensive process.

550	Policing token bucket:
551	    This is the token bucket required for Profile Meters and shown in
552	    figure 5. Policing token buckets never hold arriving packets, but
553	    check on arrival to see if a token is available for the packet's
554	    service class. If so, the packet is forwarded immediately.
555	    If not, the policing action is taken, dropping for Premium and
556	    reclassifying or unmarking for Assured.
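
To make the contrast with the shaping bucket concrete, a policing
token bucket might be sketched as below. The class name
PolicingProfileMeter, the dictionary packet representation and
the choice to start with a full bucket are our own assumptions
for illustration only.

```python
class PolicingProfileMeter:
    """Policing token bucket: never holds a packet, decides on arrival."""

    def __init__(self, service, rate_bps, depth_bytes):
        if service not in ("premium", "assured"):
            raise ValueError("service must be 'premium' or 'assured'")
        self.service = service
        self.rate_bps = rate_bps
        self.depth_bytes = depth_bytes
        self.tokens = depth_bytes   # assumption: the bucket starts full
        self.last = 0.0

    def police(self, pkt, now):
        """Return the packet to forward, or None if it is dropped."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.depth_bytes,
                          self.tokens + elapsed * self.rate_bps / 8.0)
        self.last = max(self.last, now)
        if self.tokens >= pkt["size"]:
            self.tokens -= pkt["size"]
            return pkt                      # in profile: forward immediately
        if self.service == "premium":
            return None                     # out of profile Premium: drop
        return {**pkt, "A": False}          # out of profile Assured: unmark
```

Because nothing is ever queued, this primitive needs no packet
memory, which is why it is the cheaper of the two buckets.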

558	3.2 Passing configuration information

560	Clearly, mechanisms are required to communicate the information
561	about the request to the leaf router. This configuration
562	information is the rate, burst, and whether it is a Premium or
563	Assured type. There may also need to be a specific field to set
564	or clear this configuration. This information can be passed in
565	a number of ways, including the semantics of RSVP, SNMP, or
566	direct setting by a network administrator in some other way.
567	There must be some mechanisms for authenticating the sender of
568	this information. We expect configuration to be done in a variety
569	of ways in early deployments and a protocol and mechanism for
570	this to be a topic for future standards work.

572	3.3 Discussion

574	The requirements of shapers motivate their placement at the edges
575	of the network where the state per router can be smaller than
576	in the middle of a network. The greatest burden of flow matching
577	and shaping will be at leaf routers where the speeds and buffering
578	required should be less than those that might be required deeper in
579	the network. This functionality is not required at every network
580	element on the path. Routers that are internal to a trust region
581	will not need to shape traffic. Border routers may need or desire
582	to shape the aggregate flow of Marked packets at their egress
583	in order to ensure that they will not burst into non-compliance
584	with the policing mechanism at the ingress to the other domain
585	(though this may not be necessary if the in-degree of the router
586	is low). Further, the shaping would be applied to an aggregation
587	of all the Premium flows that exit the domain via that path,
588	not to each flow individually.

590	These mechanisms are within reach of today's technology and
591	it seems plausible to us that Premium and Assured services are
592	all that is needed in the Internet. If, in time, these services
593	are found insufficient, this architecture provides a migration
594	path for delivering other kinds of service levels to traffic.
595	The A- and P-bits would continue to be used to identify traffic
596	that gets Marked service, but further filter matching could be
597	done on packet headers to differentiate service levels further.
598	Using the bits this way reduces the number of packets that have
599	to have further matching done on them rather than filtering every
600	incoming packet. More queue levels and more complex scheduling
601	could be added for P-bit traffic and more levels of drop priority
602	could be added for A-bit traffic if experience shows them to be
603	necessary and processing speeds are sufficient. We propose that
604	the services described here be considered as "at least" services.
605	Thus, a network element should at least be capable of mapping all
606	P-bit traffic to Premium service and of mapping all A-bit traffic
607	to be treated with one level of priority in the "best effort" queue
608	(it appears that the single level of A-bit traffic should map to
609	a priority that is equivalent to the best level in a multi-level
610	element that is also in the path).

612	On the other hand, what is the downside of deploying an
613	architecture for both classes of service if later experience
614	convinces us that only one of them is needed? The functional blocks
615	of both service classes are similar and can be provided by the same
616	mechanism, parameterized differently. If Assured service is not
617	used, very little is lost. A RED-managed best effort queue has been
618	strongly recommended in [4] and, to the extent that the deployment
619	of this architecture pushes the deployment of RED-managed best
620	effort queues, it is clearly a positive. If Premium service
621	goes unused, the two queues with simple priority service are not
622	required and the shaping function of the Marker may be unused;
623	these would thus impose an unnecessary implementation cost.

625	4. The Architectural Framework for Marked Traffic Allocation

627	Thus far we have focused on the service definitions and the
628	forwarding path mechanisms. We now turn to the problem of
629	allocating the level of Marked traffic throughout the Internet.
630	We observe that most organizations have fixed portions of their
631	budgets, including data communications, that are determined on
632	an annual or quarterly basis. Some additional monies might be
633	attached to specific projects for discretionary costs that arise
634	in the shorter term. In turn, service providers (ISPs and NSPs)
635	must do their planning on annual and quarterly bases and thus
636	cannot be expected to provide differentiated services purely
637	"on call". Provisioning sets up static levels of Marked traffic
638	while call set-up creates an allocation of Marked traffic for
639	a single flow's duration. Static levels can be provisioned with
640	time-of-day specifications, but cannot be changed in response to
641	a dynamic message. We expect both kinds of bandwidth allocation
642	to be important. The purchasers of Marked services can generally
643	be expected to work on longer-term budget cycles where these
644	services will be accounted for similarly to many information
645	services today. A mail-order house may wish to purchase a fixed
646	allocation of bandwidth in and out of its web-server to give
647	potential customers a "fast" feel when browsing their site.
648	This allocation might be based on hit rates of the previous
649	quarter or some sort of industry-based averages. In addition,
650	there needs to be a dynamic allocation capability to respond to
651	particular events, such as a demonstration, a network broadcast
652	by a company's CEO, or a particular network test. Furthermore,
653	a dynamic capability may be needed in order to meet a precommitted
654	service level when the particular source or destination is allowed
655	to be "anywhere on the Internet". "Dynamic" covers the range
656	from a telephoned or e-mailed request to a signalling type model.
657	A strictly statically allocated scenario is expected to be useful
658	in initial deployment of differentiated services and to make up
659	a major portion of the Marked traffic for the foreseeable future.

661	Without a "per call" dynamic set up, the preconfiguring of
662	usage profiles can always be construed as "paying for bits you
663	don't use" whether the type of service is Premium or Assured.
664	We prefer to think of this as paying for the level of service that
665	one expects to have available at any time, for example paying
666	for a telephone line. A customer might pay an additional flat
667	fee to have the privilege of calling a wide local area for no
668	additional charge or might pay by the call. Although a customer
669	might pay on a "per call" basis for every call made anywhere,
670	it generally turns out not to be the most economical option for
671	most customers. It's possible similar pricing structures might
672	arise in the internet.

674	We use Allocation to refer to the process of making Marked
675	traffic commitments anywhere along this continuum from strictly
676	preallocated to dynamic call set-up and we require an Allocation
677	architecture capable of encompassing this entire spectrum
678	in any mix. We further observe that Allocation must follow
679	organizational hierarchies, that is, each organization must
680	have complete responsibility for the Allocation of the Marked
681	traffic resource within its domain. Finally, we observe that
682	the only chance of success for incremental deployment lies in an
683	Allocation architecture that is made up of bilateral agreements,
684	as multilateral agreements are much too complex to administer.
685	Thus, the Allocation architecture is made up of agreements across
686	boundaries as to the amount of Marked traffic that will be allowed
687	to pass. This is similar to "settlement" models used today.

689	4.1 Bandwidth Brokers - Allocating and Controlling Bandwidth Shares

691	The goal of differentiated services is controlled sharing of
692	some organization's Internet bandwidth. The control can be done
693	independently by individuals, i.e., users set bit(s) in their
694	packets to distinguish their most important traffic, or it can
695	be done by agents that have some knowledge of the organization's
696	priorities and policies and allocate bandwidth with respect to
697	those policies.  Independent labeling by individuals is simple to
698	implement but unlikely to be sufficient since it's unreasonable to
699	expect all individuals to know all their organization's priorities
700	and current network use and always mark their traffic accordingly.
701	Thus this architecture is designed with agents called bandwidth
702	brokers (BB) [2], that can be configured with organizational
703	policies, keep track of the current allocation of marked traffic,
704	and interpret new requests to mark traffic in light of the policies
705	and current allocation.

707	We note that such agents are inherent in any but the most trivial
708	notions of sharing.  Neither individuals nor the routers their
709	packets transit have the information necessary to decide which
710	packets are most important to the organization.  Since these
711	agents must exist, they can be used to allocate bandwidth for
712	end-to-end connections with far less state and simpler trust
713	relationships than deploying per flow or per filter guarantees in
714	all network elements on an end-to-end path. BBs make it possible
715	for bandwidth allocation to follow organizational hierarchies
716	and, in concert with the forwarding path mechanisms discussed
717	in section 3, reduce the state required to set up and maintain a
718	flow relative to architectures that check the full flow header
719	at every network element. Organizationally, the BB architecture
720	is motivated by the observation that multilateral agreements
721	rarely work and this architecture allows end-to-end services to
722	be constructed out of purely bilateral agreements. BBs only need
723	to establish relationships of limited trust with their peers
724	in adjacent domains, unlike schemes that require the setting
725	of flow specifications in routers throughout an end-to-end path.
726	In practical technical terms, the BB architecture makes it possible
727	to keep state on an administrative domain basis, rather than at
728	every router and the service definitions of Premium and Assured
729	service make it possible to confine per flow state to just the
730	leaf routers.

732	BBs have two responsibilities. Their primary one is to parcel
733	out their region's Marked traffic allocations and set up the
734	leaf routers within the local domain. The other is to manage the
735	messages that are sent across boundaries to adjacent regions' BBs.
736	A BB is associated with a particular trust region, one per domain.
737	A BB has a policy database that keeps the information on who can
738	do what when and a method of using that database to authenticate
739	requesters. Only a BB can configure the leaf routers to deliver a
740	particular service to flows; this is crucial for a secure system.
741	If the deployment of Differentiated Services has advanced to
742	the stage where dynamically allocated, marked flows are possible
743	between two adjacent domains, BBs also provide the hook needed to
744	implement this. Each domain's BB establishes a secure association
745	with its peer in the adjacent domain to negotiate or configure a
746	rate and a service class (Premium or Assured) across the shared
747	boundary and through the peer's domain. As we shall see, it is
748	possible, for some types of service and particularly in early
749	implementations, that this "secure association" is not automatic
750	but accomplished through human negotiation and subsequent manual
751	configuration of the adjacent BBs according to the negotiated
752	agreement. This negotiated rate is a capability that a BB controls
753	for all hosts in its region.

755	When an allocation is desired for a particular flow, a request is
756	sent to the BB. Requests include a service type, a target rate,
757	a maximum burst, and the time period when service is required.
758	The request can be made manually by a network administrator
759	or a user or it might come from another region's BB. A BB first
760	authenticates the credentials of the requester, then verifies there
761	exists unallocated bandwidth sufficient to meet the request. If a
762	request passes these tests, the available bandwidth is reduced by
763	the requested amount and the flow specification is recorded. In the
764	case where the flow has a destination outside this trust region,
765	the request must fall within the class allocation through the
766	"next hop" trust region that was established through a bilateral
767	agreement of the two trust regions. The requester's BB informs
768	the adjacent region's BB that it will be using some of this rate
769	allocation. The BB configures the appropriate leaf router with
770	the information about the packet flow to be given a service at
771	the time that the service is to commence. This configuration is
772	"soft state" that the BB will periodically refresh. The BB in
773	the adjacent region is responsible for configuring the border
774	router to permit the allocated packet flow to pass and for any
775	additional configurations and negotiations within and across its
776	borders that will allow the flow to reach its final destination.
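
The local part of this request handling (authenticate, check the
unallocated bandwidth, reduce it, record the flow) can be
sketched as follows. This is a rough illustration only: the class
and field names are hypothetical, authentication is reduced to a
membership test that stands in for a real verification of signed
credentials, and the requested time period is recorded but not
checked against other bookings.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    requester: str
    flow: str          # e.g. "V:4->D:8"
    service: str       # "premium" or "assured"
    rate_kbps: int
    burst_bytes: int
    start: str
    end: str


@dataclass
class BandwidthBroker:
    domain: str
    authorized: set            # stand-in for the policy database
    available_kbps: dict       # service type -> unallocated rate
    flows: list = field(default_factory=list)

    def handle(self, req: Request) -> bool:
        # 1. Authenticate the requester (placeholder check only).
        if req.requester not in self.authorized:
            return False
        # 2. Verify that enough unallocated bandwidth exists.
        if self.available_kbps.get(req.service, 0) < req.rate_kbps:
            return False
        # 3. Reduce the available bandwidth and record the flow specification.
        self.available_kbps[req.service] -= req.rate_kbps
        self.flows.append(req)
        return True
```

A BB that accepted the request would then go on to configure the
leaf router and, for destinations outside its region, inform the
adjacent BB; neither step is modelled here.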

778	At DMZs, there must be an unambiguous way to determine the local
779	source of a packet. An interface's source could be determined
780	from its MAC address which would then be used to classify packets
781	as coming across a logical link directly from the source domain
782	corresponding to that MAC address. Thus with this understanding
783	we can continue to use figures illustrating a single pipe between
784	two different domains.

786	In this way, all agreements and negotiations are performed
787	between two adjacent domains. An initial request might cause
788	communication between BBs on several domains along a path, but
789	each communication is only between two adjacent BBs. Initially,
790	these agreements will be prenegotiated and fairly static. Some may
791	become more dynamic as the service evolves.

793	4.2 Examples

795	This section gives examples of BB transactions in a non-trivial,
796	multi-transit-domain Internet. The BB framework allows operating
797	points across a spectrum from "no signalling across boundaries"
798	to "each flow set up dynamically". We might expect to move
799	across this spectrum over time, as the necessary mechanisms are
800	ubiquitously deployed and BBs become more sophisticated, but
801	the statically allocated portions of the spectrum should always
802	have uses. We believe the ability to support this wide spectrum
803	of choices simultaneously will be important both in incremental
804	deployment and in allowing ISPs to make a wide range of offerings
805	and pricings to users. The examples of this section roughly follow
806	the spectrum of increasing sophistication. Note that we assume
807	that domains contract for some amount of Marked traffic which can
808	be requested as either `Assured' or `Premium' in each individual
809	flow setup transaction. The examples say "Marked" although actual
810	transactions would have to specify either Assured or Premium.

812	A statically configured example with no BB messages exchanged

814	Here all allocations are statically preallocated through purely
815	bilateral agreements between users (individual TCPs, individual
816	hosts, campus networks, or whole ISPs) [6]. The allocations are in
817	the form of usage profiles of rate, burst, and a time during which
818	that profile is to be active. Users and providers negotiate these
819	Profiles which are then installed in the user domain BB and in the
820	provider domain BB. No BB messages cross the boundary; we assume
821	this negotiation is done by human representatives of each domain.
822	In this case, BBs only have to perform one of their two functions,
823	that of allocating this Profile within their local domain. It is
824	even possible to set all of these suballocations up in advance and
825	then the BB only needs to set up and tear down the Profile at the
826	proper time and to refresh the soft state in the leaf routers.
827	From the user domain BB, the Profile is sent as soft state
828	to the first hop router of the flow during the specified time.
829	These Profiles might be set using RSVP, a variant of RSVP, SNMP, or
830	some vendor-specific mechanism. Although this static approach can
831	work for all Marked traffic, the requirement that Premium never be
832	oversubscribed makes it appropriate for Premium traffic only if
833	it is kept to a small percentage of the bottleneck path through
834	a domain or is otherwise constrained to a well-known behavior.
835	Similar restrictions might hold for Assured depending on the
836	expectation associated with the service.

838	In figure 6, we show an example of setting a Profile in a leaf
839	router. A usage profile has been negotiated with the ISP for the
840	entire domain and the BB parcels it out among individual flows
841	as requested. The leaf router mechanism is that shown in figure 3,
842	with the token bucket set to the parameters from the usage profile.
843	The ISP's BB would configure its own Profile Meter at the ingress
844	router from that customer to ensure the Profile was maintained.
845	This mechanism was shown in figure 5. We assume that the time
846	duration and start times for any Profile to be active are
847	maintained in the BB. The Profile is sent to the ingress device
848	or cleared from the ingress device by messages sent from the BB.
849	In this example, we assume that van@lbl wants to talk to ddc@mit.
850	The LBL-BB is sent a request from Van asking that premium service
851	be assigned to a flow that is designated as having source address
852	"V:4" and going to destination address "D:8". This flow should be
853	configured for a rate of 128kb/sec and allocated from 1pm to 3pm.
854	The request must be "signed" in a secure, verifiable manner.
855	The request might be sent as data to the LBL-BB, an e-mail message
856	to a network administrator, or in a phone call to a network
857	administrator. The LBL-BB receives this message, verifies that
858	there is 128kb/sec of unused Premium service for the domain from
859	1-3pm, then sends a message to Leaf1 that sets up an appropriate
860	Profile Meter. The message to Leaf1 might be an RSVP message,
861	or SNMP, or some proprietary method. All the domains passed must
862	have sufficient reserve capacity to meet this request.

864	Figure 6. Bandwidth Broker setting Profiles in leaf routers

866	A statically configured example with BB messages exchanged

868	Next we present an example where all allocations are statically
869	preallocated but BB messages are exchanged for greater flexibility.
870	Figure 7 shows an end-to-end example for Marked traffic in a
871	statically allocated internet. The numbers at the trust region
872	boundaries indicate the total statically allocated Marked packet
873	rates that will be accepted across those boundaries. For example,
874	100kbps of Marked traffic can be sent from LBL to ESNet; a Profile
875	Meter at the ESNet ingress boundary would have a token bucket set
876	to rate 100kbps. (There MAY be a shaper set at LBL's egress to
877	ensure that the Marked traffic conforms to the aggregate Profile.)
878	The tables inside the transit network "bubbles" show their policy
879	databases and reflect the values after the transaction is complete.
880	In figure 7, V wants to transmit a flow from LBL to D at MIT at
881	10kbps. As in figure 6, a request for this profile is made of LBL's
882	BB. LBL's BB authenticates the request and checks to see if there
883	is 10kbps left in its Marked allocation going in that direction.
884	There is, so the LBL-BB passes a message to the ESNet-BB saying
885	that it would like to use 10kbps of its Marked allocation for
886	this flow. ESNet authenticates the message, checks its database
887	and sees that it has a 10kbps Marked allocation to NEARNet (the
888	next region in that direction) that is being unused. The policy
889	is that ESNet-BB must always inform ("ask") NEARNet-BB when it
890	is about to use part of its allocation. NEARNet-BB authenticates
891	the message, checks its database and discovers that 20kbps of the
892	allocation to MIT is unused and the policy at that boundary is to
893	not inform MIT when part of the allocation is about to be used
894	("<50 ok" where the total allocation is 50). The dotted lines
895	indicate the "implied" transaction, that is, the transaction that
896	would have happened if the policy hadn't said "don't ask me".
897	Now each BB can pass an "ok" message to this request across
898	its boundary. This allows V to send to D, but not vice versa.
899	It would also be possible for the request to originate from D.

901	Figure 7. End-to-end example with static allocation.

903	Consider the same example where the ESNet-BB finds all of its
904	Marked allocation to NEARNet, 10 kbps, in use. With static
905	allocations, ESNet must transmit a "no" to this request back to
906	the LBL-BB. Presumably, the LBL-BB would record this information to
907	complain to ESNet about the overbooking at the end of the month!
908	One solution to this sort of "busy signal" is for ESNet to get
909	better at anticipating its customers' needs or require long advance
910	bookings for every flow, but it's also possible for bandwidth
911	brokerage decisions to become dynamic.

913	Figure 8. End-to-end static allocation example with no remaining
914	allocation
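
The two static cases above (request accepted, then refused with a
"busy signal") can be mimicked with a small bookkeeping sketch.
The function request_path is hypothetical, the "used" figure on
the LBL to ESNet boundary is an assumption of ours, and the
authentication and "ask"/"don't ask" messaging policies are left
out entirely.

```python
def request_path(path, allocations, rate_kbps):
    """Check a request boundary by boundary; commit only if every one has room.

    path is a list of domain names; allocations maps (src, dst) to a dict
    with "total" and "used" rates in kbps.  Returns (accepted, refusing_hop).
    """
    hops = list(zip(path, path[1:]))
    for hop in hops:
        a = allocations[hop]
        if a["used"] + rate_kbps > a["total"]:
            return False, hop            # static allocation exhausted here
    for hop in hops:
        allocations[hop]["used"] += rate_kbps
    return True, None


allocations = {
    ("LBL", "ESNet"): {"total": 100, "used": 0},      # "used" is assumed
    ("ESNet", "NEARNet"): {"total": 10, "used": 0},
    ("NEARNet", "MIT"): {"total": 50, "used": 30},    # 20kbps unused
}
path = ["LBL", "ESNet", "NEARNet", "MIT"]

first = request_path(path, allocations, 10)    # V's 10kbps flow is accepted
second = request_path(path, allocations, 10)   # ESNet->NEARNet is now full
```

Here the second request is refused at the ESNet to NEARNet
boundary, which corresponds to the "no" that ESNet must return
to the LBL-BB.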

916	Dynamic Allocation and additional mechanism

918	As we shall see, dynamic allocation requires more complex BBs as
919	well as more complex border policing, including the necessity to
920	keep more state. However, it enables an important service with
921	a small increase in state.

923	The next set of figures (starting with figure 9) show what happens
924	in the case of dynamic allocation. As before, V requests 10kbps
925	to talk to D at MIT. Since the allocation is dynamic, the border
926	policers do not have a preset value, instead being set to reflect
927	the current peak value of Marked traffic permitted to cross
928	that boundary. The request is sent to the LBL-BB.

930	Figure 9. First step in end-to-end dynamic allocation example.

932	In figure 10, note that ESNet has no allocation set up to NEARNet.
933	This system is capable of dynamic allocations in addition to
934	static, so it asks NEARNet if it can "add 10" to its allocation
935	from ESNet. As in the figure 7 example, MIT's policy is set to
936	"don't ask" for this case, so the dotted lines represent "implicit
937	transactions" where no messages were exchanged. However, NEARNet
938	does update its table to indicate that it is now using 20kbps of
939	the Marked allocation to MIT.

941	Figure 10. Second step in end-to-end dynamic allocation example

943	In figure 11, we see the third step where MIT's "virtual ok"
944	allows the NEARNet-BB to tell its border router to increase the
945	Marked allocation across the ESNet-NEARNet boundary by 10 kbps.

947	Figure 11. Third step in end-to-end dynamic allocation example

949	Figure 12 shows NEARNet-BB's "ok" for that request transmitted
950	back to ESNet-BB. This causes ESNet-BB to send its border router
951	a message to create a 10 kbps subclass for the flow "V->D".
952	This is required in order to ensure that the 10kbps that has just
953	been dynamically allocated gets used only for that connection.
954	Note that this does require that the per flow state be passed
955	from LBL-BB to ESNet-BB, but this is the only boundary that needs
956	that level of flow information and this further classification
957	will only need to be done at that one boundary router and only
958	on packets coming from LBL. Thus dynamic allocation requires more
959	complex Profile Metering than that shown in figure 5.

961	Figure 12. Fourth step in end-to-end dynamic allocation example.

963	In figure 13, the ESNet border router gives the "ok" that a
964	subclass has been created, causing the ESNet-BB to send an "ok"
965	to the LBL-BB which lets V know the request has been approved.

967	Figure 13. Final step in end-to-end dynamic allocation example
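
The extra border state that dynamic allocation introduces can be
sketched as follows. BorderMeter and dynamic_setup are
hypothetical names, the starting rate of zero is an assumption,
and the BB-to-BB messages are collapsed into direct method calls;
the sketch only shows which router holds which piece of state.

```python
class BorderMeter:
    """Ingress policing state at one boundary (illustrative only)."""

    def __init__(self, marked_kbps=0):
        self.marked_kbps = marked_kbps  # current peak Marked rate allowed across
        self.subclasses = {}            # flow id -> rate reserved for that flow

    def add(self, kbps):
        self.marked_kbps += kbps

    def create_subclass(self, flow, kbps):
        if flow in self.subclasses:
            raise ValueError("subclass already exists for " + flow)
        self.subclasses[flow] = kbps


def dynamic_setup(flow, kbps, nearnet_ingress, esnet_ingress):
    # NEARNet-BB tells its border router to raise the ESNet->NEARNet limit.
    nearnet_ingress.add(kbps)
    # ESNet-BB pins the newly allocated rate to this one flow from LBL.
    esnet_ingress.create_subclass(flow, kbps)
    return "ok"
```

Only the ESNet border router facing LBL carries per flow state in
this sketch; the NEARNet border router sees just an aggregate
rate.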

969	For dynamic allocation, a basic version of a CBQ scheduler [5]
970	would have all the required functionality to set up the subclasses.
971	RSVP currently provides a way to move the TSpec for the flow.

973	For multicast flows, we assume that packets that are bound for
974	at least one egress can be carried through a domain at that level
975	of service to all egress points. If a particular multicast branch
976	has been subscribed to at best-effort when upstream branches are
977	Marked, it will have its bit settings cleared before it crosses
978	the boundary. The information required for this flow identification
979	is used to augment the existing state that is already kept on
980	this flow because it is a multicast flow. We note that we are
981	already "catching" this flow, but now we must potentially clear
982	the bit-pattern.

984	5. RSVP/int-serv and this architecture

986	Much work has been done in recent years on the definition of
987	related integrated services for the internet and the specification
988	of the RSVP signalling protocol. The two-bit architecture proposed
989	in this work can easily interoperate with those specifications.
990	In this section we first discuss how the forwarding mechanisms
991	described in section 3 can be used to support integrated
992	services. Second, we discuss how RSVP could interoperate with
993	the administrative structure of the BBs to provide better scaling.

995	5.1 Providing Controlled-Load and Guaranteed Service

997	We believe that the forwarding path mechanisms described in
998	section 3 are general enough that they can also be used to provide
999	the Controlled-Load service [8] and a version of the Guaranteed
1000	Quality of Service [9], as developed by the int-serv WG. First note
1001	that Premium service can be thought of as a constrained case of
1002	Controlled-Load service where the burst size is limited to one
1003	packet and where non-conforming packets are dropped. A network
1004	element that has implemented the mechanisms to support premium
1005	service can easily support the more general controlled-load
1006	service by making one or more minor parameter adjustments, e.g.
1007	by lifting the constraint on the token bucket size, or configuring
1008	the Premium service rate with the peak traffic rate parameter in
1009	the Controlled-Load specification, and by changing the policing
1010	action on out-of-profile packets from dropping to sending the
1011	packets to the Best-effort queue.
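
The parameter adjustments described above can be sketched with a
single token-bucket Profile Meter whose depth and out-of-profile
action are configurable. This is a minimal sketch under assumed
names and units (ProfileMeter, bytes and seconds, a 1500-byte
packet), not a definitive implementation of either service.

```python
from dataclasses import dataclass


@dataclass
class ProfileMeter:
    rate: float          # token fill rate in bytes per second
    depth: float         # token bucket depth in bytes
    drop_excess: bool    # True: drop; False: send to the best-effort queue
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self):
        self.tokens = self.depth   # start with a full bucket

    def police(self, now, size):
        """Classify one packet of `size` bytes arriving at time `now` (seconds)."""
        refill = (now - self.last) * self.rate
        self.tokens = min(self.depth, self.tokens + refill)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return "in-profile"
        return "drop" if self.drop_excess else "best-effort"


MTU = 1500  # assumed packet size in bytes

# Premium: burst limited to one packet, excess dropped.
premium = ProfileMeter(rate=MTU, depth=MTU, drop_excess=True)

# Controlled-Load: deeper bucket, excess sent to best-effort.
controlled_load = ProfileMeter(rate=MTU, depth=3 * MTU, drop_excess=False)
```

The two instances differ only in their parameters, which is the
sense in which Premium can be seen as a constrained case of
Controlled-Load.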

1013	It is also possible to implement Guaranteed Quality of Service
1014	using the mechanisms of Premium service. From RFC 2212 [9]:
1015	"The definition of guaranteed service relies on the result that
1016	the fluid delay of a flow obeying a token bucket (r, b) and being
1017	served by a line with bandwidth R is bounded by b/R as long as R is
1018	no less than r. Guaranteed service with a service rate R, where now
1019	R is a share of bandwidth rather than the bandwidth of a dedicated
1020	line approximates this behavior." The service model of Premium
1021	clearly fits this model. RFC 2212 states that "Non-conforming
1022	datagrams SHOULD be treated as best-effort datagrams." Thus, a
1023	policing Profile Meter that drops non-conforming datagrams would
1024	be acceptable, but it's also possible to change the action for
1025	non-compliant packets from a drop to sending to the best-effort
1026	queue.
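
The bound quoted from RFC 2212 can be written as a small helper.
The function name and the choice of units (bits and bits per
second) are assumptions for this example.

```python
def fluid_delay_bound(b, r, R):
    """Bound b/R (seconds) on the fluid delay of an (r, b) token bucket.

    b is the bucket depth in bits; r is the token rate and R the
    service rate, both in bits per second. The bound quoted from
    RFC 2212 holds only when R is no less than r.
    """
    if R < r:
        raise ValueError("service rate R must be no less than token rate r")
    return b / R
```

For instance, with a one-packet burst of b = 12000 bits, r = 1 Mb/s
and R = 1.5 Mb/s, fluid_delay_bound(12000, 1_000_000, 1_500_000)
evaluates to 0.008 seconds.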

1028	5.2 RSVP and BBs

1030	In this section we discuss how RSVP signaling can be used in
1031	conjunction with the BBs described in section 4 to deliver a
1032	more scalable end-to-end resource set up for Integrated Services.
1033	First we note that the BB architecture has three major differences
1034	from the original RSVP resource set up model:

1036	 1. There exist a priori bilateral business relations between BBs of
1037	    adjacent trust regions before one can set up end-to-end resource
1038	    allocation; real-time signaling is used only to activate/confirm
1039	    the availability of pre-negotiated Marked bandwidth, and to
1040	    dynamically readjust the allocation amount when necessary. We note
1041	    that this real-time signaling across domains is not required,
1042	    but depends on the nature of the bilateral agreement (e.g., the
1043	    agreement might state "I'll tell you whenever I'm going to use
1044	    some of my allocation" or not).

1046	 2. A few bits in the packet header, i.e. the P-bit and A-bit,
1047	    are used to mark the service class of each packet, therefore a
1048	    full packet classification (by checking all relevant fields in
1049	    the header) need be done only once at the leaf router; after that
1050	    packets will be served according to their class bit settings.

1052	 3. RSVP resource set up assumes that resources will be reserved
1053	    hop-by-hop at each router along the entire end-to-end path.

1055	RSVP messages sent to leaf routers by hosts can be intercepted
1056	and sent to the local domain's BB. The BB processes the message
1057	and, if the request is approved, forwards a message to the leaf
1058	router that sets up appropriate per-flow packet classification.
1059	A message should also be sent to the egress border router to add
1060	to the aggregate Marked traffic allocation for packet shaping by
1061	the Profile Meter on outbound traffic. (It's possible that this is
1062	always set to the full allocation.) An RSVP message must be sent
1063	across the boundary to the adjacent ISP's border router, either from
1064	the local domain's border router or from the local domain's BB.
1065	If the ISP is also implementing the RSVP with a BB and diff-serv
1066	framework, its border router forwards the message to the ISP's
1067	local BB. A similar process (to what happened in the first domain)
1068	can be carried out in the ISP domain, then an RSVP message
1069	gets forwarded to the next ISP along the path. Inside a domain,
1070	packets are served solely according to the Marked bits. The local
1071	BB knows exactly how much Premium traffic is permitted to enter
1072	at each border router and from which border router packets exit.
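
The message handling just described might be sketched as follows.
The class and attribute names (BandwidthBroker, egress_aggregate,
leaf_flows) are hypothetical, RSVP messages are reduced to a
(flow, rate) pair, and each domain is assumed to have a single
downstream neighbor. The sketch also assumes that a BB commits
local state only after the downstream domains accept; the text
does not say how a downstream refusal is handled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BandwidthBroker:
    domain: str
    allocation: int                               # pre-negotiated Marked bandwidth, bits/s
    next_bb: Optional["BandwidthBroker"] = None   # BB of the next domain on the path
    egress_aggregate: int = 0                     # Marked rate set at the egress Profile Meter
    leaf_flows: Dict[str, int] = field(default_factory=dict)  # per-flow leaf classifiers

    def handle_request(self, flow_id, rate, from_host=True):
        """Process an intercepted request; return True if approved end to end."""
        if self.egress_aggregate + rate > self.allocation:
            return False                          # would exceed this domain's allocation
        # Forward across the boundary before committing (assumed ordering).
        if self.next_bb is not None:
            if not self.next_bb.handle_request(flow_id, rate, from_host=False):
                return False
        if from_host:
            self.leaf_flows[flow_id] = rate       # per-flow classification at the leaf router
        self.egress_aggregate += rate             # add to the aggregate at the egress router
        return True


isp = BandwidthBroker("ISP", allocation=1_000_000)
local = BandwidthBroker("local", allocation=2_000_000, next_bb=isp)
approved = local.handle_request("flow-1", 600_000)
```

Only the first domain installs per-flow state; the ISP's BB tracks
the aggregate alone, which is where the scaling benefit over
hop-by-hop reservation is expected to come from.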

1074	6. Recommendations

1076	This document has presented a reference architecture for
1077	differentiated services. Several variations can be envisioned,
1078	particularly for early and partial deployments, but we do not
1079	enumerate all of these variations here. There has been a great
1080	market demand for differentiated services lately. As one of the
1081	many efforts to meet that demand this draft sketches out the
1082	framework of a flexible architecture for offering differential
1083	services, and in particular defines a simple set of packet
1084	forwarding path mechanisms to support two basic types of
1085	differential services. Although there remain a number of issues
1086	and parameters that need further exploration and refinement,
1087	we believe it is both possible and feasible at this time to
1088	start deployment of differentiated services incrementally. First,
1089	given that the basic mechanisms required in the packet forwarding
1090	path are clearly understood, both Assured and Premium services
1091	can be implemented today with manually configured BBs and static
1092	resource allocation. Initially we recommend conservative choices
1093	on the amount of Marked traffic that is admitted into the network.
1094	Second, we plan to continue the effort started with this draft
1095	and the experimental work of the authors to define and deploy
1096	increasingly sophisticated BBs. We hope to turn the experience
1097	gained from in-progress trial implementations on ESNet and CAIRN
1098	into future proposals to the IETF.

1100	Future revisions of this draft will present the receiver-based
1101	and multicast flow allocations in detail. After this step
1102	is finished, we believe the basic picture of a scalable,
1103	robust, secure resource management and allocation system will be
1104	completed. In this draft we described how the proposed architecture
1105	supports two services that seem to us to provide at least a good
1106	starting point for trial deployment of differentiated services.
1107	Our main intent is to define an architecture with three services,
1108	Premium, Assured, and Best effort, that can be determined by
1109	specific bit-patterns, but not to preclude additional levels
1110	of differentiation within each service. It seems that more
1111	experimentation and experience are required before we could
1112	standardize more than one level per service class. Our base-level
1113	approach says that everyone has to provide "at least" Premium
1114	service and Assured service as documented. We feel rather strongly
1115	both 1) that we should not try to define, at this time,
1116	something beyond the minimalist two service approach and 2) that
1117	the architecture we define must be open-ended so that more levels
1118	of differentiation might be standardized in the future. We believe
1119	this architecture is completely compatible with approaches that
1120	would define more levels of differentiation within a particular
1121	service, if the benefits of doing so become well understood.

1123	7. Acknowledgments

1125	The authors have benefited from many discussions, both in
1126	person and electronically, and wish to particularly thank Dave
1127	Clark who has been responsible for the genesis of many of the
1128	ideas presented here, though he does not agree with all of the
1129	content of this document. We also thank Sally Floyd for comments
1130	on an earlier draft. A comment from Jon Crowcroft was partially
1131	responsible for our including section 5. Comments from Fred Baker
1132	made us try to make it clearer that we are defining two base-level
1133	services, irrespective of the bit patterns used to encode them.

1135	8. References

1137	[1] D. Clark, "Adding Service Discrimination to the Internet",
1138	1995.

1140	[2] V. Jacobson, "Differentiated Services Architecture", talk in
1141	the Int-Serv WG at the Munich IETF, August, 1997.

1143	[3] D. Clark and J. Wroclawski, "An Approach to Service Allocation
1144	in the Internet", Internet Draft draft-clark-diff-svc-alloc-00.txt,
1145	July 1997, also talk by D. Clark in the Int-Serv WG at the Munich
1146	IETF, August, 1997.

1148	[4] Braden et al., "Recommendations on Queue Management and
1149	Congestion Avoidance in the Internet", Internet Draft, March, 1997.

1151	[4] Braden, R., Ed., et al., "Resource Reservation Protocol (RSVP)
1152	- Version 1 Functional Specification", RFC 2205, September, 1997.

1154	[5] S. Floyd and V. Jacobson, "Link-sharing and Resource Management
1155	Models for Packet Networks", IEEE/ACM Transactions on Networking,
1156	pp 365-386, August 1995.

1158	[6] D. Clark, private communication, October 26, 1997.

1160	[7] "Advanced QoS Services for the Intelligent Internet", Cisco
1161	Systems White Paper, 1997.

1163	[8] J. Wroclawski, "Specification of the Controlled-Load Network
1164	Element Service", RFC 2211, September, 1997.

1166	[9] S. Shenker, et al., "Specification of Guaranteed Quality of
1167	Service", RFC 2212, September, 1997.

1169	[10] D. Clark and W. Fang, "Explicit Allocation of
1170	Best Effort Packet Delivery Service", November, 1997.
1171	http://diffserv.lcs.mit.edu/Papers/exp-alloc-ddc-wf.pdf

1173	Authors' Addresses

1175	    Kathleen Nichols
1176	    Bay Networks, Inc.
1177	    Bay Architecture Lab
1178	    4401 Great America Parkway, SC1-04
1179	    Santa Clara, CA 95052-8185
1180	    Phone: 408-495-3252
1181	    Fax:   408-495-1299
1182	    Email: knichols@baynetworks.com

1184	    Van Jacobson
1185	    M/S 50B-2239
1186	    Lawrence Berkeley National Laboratory
1187	    One Cyclotron Rd
1188	    Berkeley, CA 94720
1189	    Email: van@ee.lbl.gov

1191	    Lixia Zhang
1192	    UCLA
1193	    4531G Boelter Hall
1194	    Los Angeles, CA  90095
1195	    Phone: 310-825-2695
1196	    Email: lixia@cs.ucla.edu

1198	Internet Engineering Task Force         K. Nichols / V. Jacobson / L. Zhang
1199	draft-nichols-diff-svc-arch-00.txt                            Expires: 5/98