idnits 2.17.1 

draft-nichols-diff-svc-arch-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1223 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 5 instances of lines with control characters in the document.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 1020: '...orming datagrams SHOULD be treated as ...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 20 has weird spacing: '... at any  time....'

  == Line 23 has weird spacing: '...ssed at  http:...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Normative reference to a draft: ref. '3' 

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'


     Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET DRAFT		                                      K. Nichols
2	draft-nichols-diff-svc-arch-01.txt	                     V. Jacobson
3	April, 1999		                                           Cisco
4				                                        L. Zhang
5				                                            UCLA

7	  A Two-bit Differentiated Services Architecture for the Internet

9	Status of this Memo

11	This document is an Internet-Draft and is in full conformance with
12	all provisions of Section 10 of RFC2026.

14	Internet-Drafts are working documents of the Internet Engineering
15	Task Force (IETF), its areas, and its working groups.  Note that other
16	groups may also distribute working documents as Internet-Drafts.

18	Internet-Drafts are draft documents valid for a maximum of six
19	months and may be updated, replaced, or obsoleted by other
20	documents at any  time.  It is inappropriate to use Internet-Drafts as
21	reference material or to cite them other than as "work in progress."

23	The list of current Internet-Drafts can be accessed at  http://
24	www.ietf.org/ietf/1id-abstracts.txt.

26	The list of Internet-Draft Shadow Directories can be accessed at
27	http://www.ietf.org/shadow.html.

29	Abstract

31	This document was originally submitted as an internet draft in
32	November of 1997. As one of the documents predating the formation
33	of the IETF's Differentiated Services Working Group, many of the
34	ideas presented here, in concert with Dave Clark's subsequent
35	presentation to the December 1997 meeting of the IETF Integrated
36	Services Working Group, were key to the work which led to RFCs
37	2474 and 2475 and the section on allocation remains a timely
38	proposal. For this reason, and to provide a reference, it is
39	being submitted in its original form. The forwarding path portion
40	of this document is intended as a record of where we were in late
41	1997 and not as an indication of future direction.

43	The postscript version of this document includes Clark's slides as an
44	appendix. The postscript version of this document also includes many
45	figures that aid greatly in its readability.

47	1. Introduction

49	This document presents a differentiated services architecture for the
50	internet. Dave Clark and Van Jacobson each presented work on
51	differentiated services at the Munich IETF meeting [2,3]. Each
52	explained how to use one bit of the IP header to deliver a new
53	kind of service to packets in the internet. These were two very
54	different kinds of service with quite different policy assumptions.
55	Ensuing discussion has convinced us that both service types have
56	merit and that both service types can be implemented with a set
57	of very similar mechanisms. We propose an architectural
58	framework that permits the use of both of these service types and
59	exploits their similarities in forwarding path mechanisms. The
60	major goals of this architecture are each shared with one or both
61	of those two proposals: keep the forwarding path simple, push
62	complexity to the edges of the network to the extent possible,
63	provide a service that avoids assumptions about the type of
64	traffic using it, employ an allocation policy that will be
65	compatible with both long-term and short-term provisioning,
66	make it possible for the dominant Internet traffic model to
67	remain best-effort.

69	The major contributions of this document are to present two
70	distinct service types, a set of general mechanisms for the
71	forwarding path that can be used to implement a range of
72	differentiated services and to propose a flexible framework for
73	provisioning a differentiated services network. It is precisely this
74	kind of architecture that is needed for expedient deployment of
75	differentiated services: we need a framework and set of
76	primitives that can be implemented in the short-term and provide
77	interoperable services, yet can provide a "sandbox" for
78	experimentation and elaboration that can lead in time to more
79	levels of differentiation within each service as needed.

81	At the risk of belaboring an analogy, we are motivated to provide
82	service tiers in somewhat the same fashion as the airlines do
83	with first class, business class and coach class. The latter also has
84	tiering built in due to the various restrictions put on the purchase.
85	A part of the analogy we want to stress is that best effort traffic,
86	like coach class seats on an airplane, is still expected to make up
87	the bulk of internet traffic. Business and first class carry a small
88	number of passengers, but are quite important to the economics
89	of the airline industry. The various economic forces and realities
90	combine to dictate the relative allocation of the seats and to try to
91	fill the airplane. We don't expect that differentiated services will
92	comprise all the traffic on the internet, but we do expect that new
93	services will lead to a healthy economic and service
94	environment.

96	This document is organized into sections describing service
97	architecture, mechanisms, the bandwidth allocation architecture,
98	how this architecture might interoperate with RSVP/int-serv
99	work, and recommendations for deployment.

101	2. Architecture

103	2.1 Background

105	The current internet delivers one type of service, best-effort, to
106	all traffic. A number of proposals have been made concerning
107	the addition of enhanced services to the Internet. We focus on
108	two particular methods of adding a differentiated level of service
109	to IP, each designated by one bit [1,2,3]. These services
110	represent a radical departure from the Internet's traditional
111	service, but they are also a radical departure from traditional
112	"quality of service" architectures which rely on circuit-based
113	models. Both these proposals seek to define a single common
114	mechanism that is used by interior network routers, pushing most
115	of the complexity and state of differentiated services to the
116	network edges. Both use bandwidth as the resource that is being
117	requested and allocated. Clark and Wroclawski defined an
118	"Assured" service that follows "expected capacity" usage profiles
119	that are statistically provisioned [3]. The assurance that the user
120	of such a service receives is that such traffic is unlikely to be
121	dropped as long as it stays within the expected capacity profile.
122	The exact meaning of "unlikely" depends on how well
123	provisioned the service is. An Assured service traffic flow may
124	exceed its Profile, but the excess traffic is not given the same
125	assurance level. Jacobson defined a "Premium" service that is
126	provisioned according to peak capacity Profiles that are strictly
127	not oversubscribed and that is given its own high-priority queue
128	in routers [2]. A Premium service traffic flow is hard-limited to
129	its provisioned peak rate and shaped so that
130	bursts are not injected into the network. Premium service
131	presents a "virtual wire" where a flow's bursts may queue at the
132	shaper at the edge of the network, but thereafter only in
133	proportion to the indegree of each router. Despite their many
134	similarities, these two approaches result in fundamentally
135	different services. The former uses buffer management to
136	provide a "better effort" service while the latter creates a service
137	with little jitter and queueing delay and no need for queue
138	management on the Premium packets' queue.

140	An Assured service was introduced in [3] by Clark and
141	Wroclawski, though we have made some alterations in its
142	specification for our architecture. Further refinements and an
143	"Expected Capacity" framework are given in Clark and Fang
144	[10].  This framework is focused on "providing different levels
145	of best-effort service at times of network congestion" but also
146	mentions that it is possible to have a separate router queue to
147	implement a "guaranteed" level of assurance.  We believe this
148	framework and our Two-bit architecture are compatible but this
149	needs further exploration.  As Premium service has not been
150	documented elsewhere, we describe it next and follow this with a
151	description of the two-bit architecture.

153	2.2 Premium service

155	In [2], a Premium service was presented that is fundamentally
156	different from the Internet's current best effort service. This
157	service is not meant to replace best effort but primarily to meet
158	an emerging demand for a commercial service that can share the
159	network with best effort traffic. This is desirable economically,
160	since the same network can be used for both kinds of traffic. It is
161	expected that Premium traffic would be allocated a small
162	percentage of the total network capacity, but that it would be
163	priced much higher. One use of such a service might be to create
164	"virtual leased lines", saving the cost of building and maintaining
165	a separate network. Premium service, not unlike a standard
166	telephone line, is a capacity which the customer expects to be
167	there when the receiver is lifted, although it may, depending on
168	the household, be idle a good deal of the time.  Provisioning
169	Premium traffic in this way reduces the capacity of the best
170	effort internet by the amount of Premium allocated, in the worst
171	case; thus it would have to be priced accordingly. On the other
172	hand, whenever that capacity is not being used it is available to
173	best effort traffic. In contrast to normal best effort traffic which
174	is bursty and requires queue management to deal fairly with
175	congestive episodes, this Premium service by design creates very
176	regular traffic patterns and small or nonexistent queues.

178	Premium service levels are specified as a desired peak bit-rate
179	for a specific flow (or aggregation of flows). The user contract
180	with the network is not to exceed the peak rate. The network
181	contract is that the contracted bandwidth will be available when
182	traffic is sent. First-hop routers (or other edge devices) filter the
183	packets entering the network, set the Premium bit of those that
184	match a Premium service specification, and perform traffic
185	shaping on the flow that smooths all traffic bursts before they
186	enter the network. This approach requires no changes in hosts. A
187	compliant router along the path needs two levels of priority
188	queueing, sending all packets with the Premium bit set first.
189	Best-effort traffic is unmarked and queued and sent at the lower
190	priority. This results in two "virtual networks": one which is
191	identical to today's Internet with buffers designed to absorb
192	traffic bursts; and one where traffic is limited and shaped to a
193	contracted peak-rate, but packets move through a network of
194	queues where they experience almost no queueing delay.
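
The two-level priority queueing described above can be sketched as
follows. This is a minimal illustration, assuming packets are
represented as dictionaries; the names TwoQueueOutput and P_BIT and the
flag value are hypothetical, since the bit pattern itself is left to
the standards process.

```python
from collections import deque

P_BIT = 0x1  # hypothetical flag value, not a standardized bit position


class TwoQueueOutput:
    """Output interface with a Premium queue served strictly before best effort."""

    def __init__(self):
        self.premium = deque()
        self.best_effort = deque()

    def enqueue(self, packet):
        # Classification uses only the single Premium bit.
        if packet["bits"] & P_BIT:
            self.premium.append(packet)
        else:
            self.best_effort.append(packet)

    def dequeue(self):
        # Simple priority: the Premium queue is always drained first.
        if self.premium:
            return self.premium.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


out = TwoQueueOutput()
out.enqueue({"id": "be-1", "bits": 0})
out.enqueue({"id": "prem-1", "bits": P_BIT})
out.enqueue({"id": "be-2", "bits": 0})
order = [out.dequeue()["id"] for _ in range(3)]
print(order)
```

The Premium packet leaves first even though it arrived second, while
the best-effort packets keep their relative order.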

196	In this architecture, forwarding path decisions are made
197	separately and more simply than the setting up of the service
198	agreements and traffic profiles. With the exception of policing
199	and shaping at administrative or "trust" boundaries, the only
200	actions that need to be handled in the forwarding path are to
201	classify a packet into one of two queues on a single bit and to
202	service the two queues using simple priority. Shaping must
203	include both rate and burst parameters; the latter is expected to
204	be small, in the one or two packet range. Policing at boundaries
205	enforces rate compliance, and may be implemented by a simple
206	token bucket. The admission and set-up procedures are expected
207	to evolve, in time, to be dynamically configurable and fairly
208	complex while the mechanisms in the forwarding path remain
209	simple.

211	A Premium service built on this architecture can be deployed in a
212	useful way once the forwarding path mechanisms are in place by
213	making static allocations. Traffic flows can be designated for
214	special treatment through network management configuration.
215	Traffic flows should be designated by the source, the destination,
216	or any combination of fields in the packet header. First-hop (or
217	leaf) routers will filter flows on all or part of the header tuple
218	consisting of the source IP address, destination IP address,
219	protocol identifier, source port number, and destination port
220	number. Based on this classification, a first-hop router performs
221	traffic shaping and sets the designated Premium bit of the
222	precedence field. End-hosts are thus not required to be
223	"differentiated services aware", though if and when end-systems
224	become universally "aware", they might do their own shaping
225	and first-hop routers merely police.
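
A first-hop filter on all or part of the header tuple might look like
the sketch below. It assumes that a field left unset acts as a
wildcard; FlowSpec, classify, the flow table layout and the addresses
are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

FIELDS = ("src_ip", "dst_ip", "protocol", "src_port", "dst_port")


@dataclass(frozen=True)
class FlowSpec:
    """Header-tuple filter; a field left as None matches any value."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    protocol: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None

    def matches(self, header: dict) -> bool:
        for name in FIELDS:
            wanted = getattr(self, name)
            if wanted is not None and header.get(name) != wanted:
                return False
        return True


def classify(header, flow_table):
    """Return the service configured for the first matching flow, else None."""
    for spec, service in flow_table:
        if spec.matches(header):
            return service
    return None


flow_table = [
    (FlowSpec(src_ip="10.0.0.5", dst_ip="192.0.2.9", protocol=17), "premium"),
]
voice = {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.9", "protocol": 17,
         "src_port": 5004, "dst_port": 5004}
web = {"src_ip": "10.0.0.7", "dst_ip": "192.0.2.9", "protocol": 6,
       "src_port": 40000, "dst_port": 80}
print(classify(voice, flow_table), classify(web, flow_table))
```

Packets that match no configured flow return None here and would be
forwarded as ordinary best effort.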

227	Adherence to the subscribed rate and burst size must be enforced
228	at the entry to the network, either by the end-system or by the
229	first-hop router. Within an intranet, administrative domain, or
230	"trust region" the packets can then be classified and serviced
231	solely on the Premium bit. Where packets cross a boundary, the
232	policing function is critical. The entered region will check the
233	prioritized packet flow for conformance to a rate the two regions
234	have agreed upon, discarding packets that exceed the rate. It is
235	thus in the best interests of a region to ensure conformance to the
236	agreed-upon rate at the egress. This requirement means that
237	Premium traffic is burst-free and, together with the no
238	oversubscription rule, leads directly to the observation that
239	Premium queues can easily be sized to prevent the need to drop
240	packets and thus the need for a queue management policy. At
241	each router, the largest queue size is related to the in-degree of
242	other routers and is thus quite small, on the order of ten packets.

244	Premium bandwidth allocations must not be oversubscribed as
245	they represent a commitment by the network and should be
246	priced accordingly. Note that, in this architecture, Premium
247	traffic will also experience considerably less delay variation than
248	either best effort traffic or the Assured data traffic of [3].
249	Premium rates might be configured on a subscription basis in the
250	near-term, or on-demand when dynamic set-up or signaling is
251	available.

253	Figure 1 shows how a Premium packet flow is established within
254	a particular administrative domain, Company A, and sent across
255	the access link to Company A's ISP. Assume that the host's first-
256	hop router has been configured to match a flow from the host's
257	IP address to a destination IP address that is reached through
258	the ISP. A Premium flow is configured from a host with a rate which
259	is both smaller than the total Premium allocation Company A has
260	from the ISP, r bytes per second, and smaller than the amount of
261	that allocation that has not been assigned to other hosts in Company A.
262	Packets are not marked in any special way when they leave the
263	host. The first-hop router clears the Premium bit on all arriving
264	packets, sets the Premium bit on all packets in the designated
265	flow, shapes packets in the Premium flow to a configured rate
266	and burst size, queues best-effort unmarked packets in the low
267	priority queue and shaped Premium packets in the high priority
268	queue, and sends packets from those two queues at simple
269	priority. Intermediate routers internal to Company A enqueue
270	packets in one of two output queues based on the Premium bit
271	and service the queues with simple priority. Border routers
272	perform quite different tasks, depending on whether they are
273	processing an egress flow or an ingress flow. An egress border
274	router may perform some reshaping on the aggregate Premium
275	traffic to conform to rate r, depending on the number of
276	Premium flows aggregated. Ingress border routers only need to
277	perform a simple policing function that can be implemented with
278	a token bucket. In the example, the ISP accepts all Premium
279	packets from A as long as the flow does not exceed r bytes per
280	second.

282	Figure 1. Premium traffic flow from end-host to organization's ISP

284	2.3 Two-bit differentiated services architecture

286	Clark's and Jacobson's proposals are markedly similar in the
287	location and type of functional blocks that are needed to
288	implement them. Furthermore, they implement quite different
289	services which are not incompatible in a network. The Premium
290	service implements a guaranteed peak bandwidth service with
291	negligible queueing delay that cannot starve best effort traffic
292	and can be allocated in a fairly straightforward fashion. This
293	service would seem to have a strong appeal for commercial
294	applications, video broadcasts, voice-over-IP, and VPNs. On the
295	other hand, this service may prove both too restrictive (in its hard
296	limits) and overdesigned (no overallocation) for some
297	applications. The Assured service implements a service that has
298	the same delay characteristics as (undropped) best effort packets
299	and the firmness of its guarantee depends on how well individual
300	links are provisioned for bursts of Assured packets. On the other
301	hand, it permits traffic flows to use any additional available
302	capacity without penalty and occasional dropped packets for
303	short congestive periods may be acceptable to many users. This
304	service might be what an ISP would provide to individual
305	customers who are willing to pay a bit more for internet service
306	that seems unaffected by congestive periods. Both services are
307	only as good as their admission control schemes, though this can
308	be more difficult for traffic which is not peak-rate allocated.

310	There may be some additional benefits of deploying both
311	services. To the extent that Premium service is a conservative
312	allocation of resources, unused bandwidth that had been
313	allocated to Premium might provide some "headroom" for
314	underallocated or burst periods of Assured traffic or for best
315	effort. Network elements that deploy both services will be
316	performing RED queue management on all non-Premium traffic,
317	as suggested in [4], and the effects of mixing the Premium
318	streams with best effort might serve to reduce burstiness in the
319	latter. A strength of the Assured service is that it allows bursts to
320	happen in their natural fashion, but this also makes the
321	provisioning, admission control and allocation problem more
322	difficult so it may take more time and experimentation before
323	the admission policy for this service is completely defined. A
324	Premium service could be deployed that employs static
325	allocations on peak rates with no statistical sharing.

327	As there appear to be a number of advantages to an architecture
328	that permits these two types of service and because, as we shall
329	see, they can be made to share many of the same mechanisms,
330	we propose designating two bit-patterns from the IP header
331	precedence field. We leave the explicit designation of these
332	bit-patterns to the standards process; thus we use the shorthand
333	notation of denoting each pattern by a bit, one we will call the
334	Premium or P-bit, the other we call the assurance or A-bit. It is
335	possible for a network to implement only one of these services
336	and to have network elements that only look at the one
337	applicable bit, but we focus on the two service architecture.
338	Further, we assume the case where no changes are made in the
339	hosts, appropriate packet marking all being done in the network,
340	at the first-hop, or leaf, router. We describe the forwarding path
341	architecture in this section, assuming that the service has been
342	allocated through mechanisms we will discuss in section 4.

344	In a more general sense, Premium service denotes packets that
345	are enqueued at a higher priority than the ordinary best-effort
346	queue. Similarly, Assured service denotes packets that are
347	treated preferentially with respect to the dropping probability
348	within the "normal" queue. There are a number of ways to add
349	more service levels within each of these service types [7], but
350	this document takes the position of specifying the base-level
351	services of Premium and Assured.

353	The forwarding path mechanisms can be broken down into those
354	that happen at the input interface, before packet forwarding, and
355	those that happen at the output interface, after packet forwarding.
356	Intermediate routers only need to implement the post packet
357	forwarding functions, while leaf and border routers must perform
358	functions on arriving packets before forwarding. We describe the
359	mechanisms this way for illustration; other ways of composing
360	their functions are possible.

362	Leaf routers are configured with a traffic profile for a particular
363	flow based on its packet header. This functionality has been
364	defined by the RSVP Working Group in RFC 2205. Figure 2
365	shows what happens to a packet that arrives at the leaf router,
366	before it is passed to the forwarding engine. All arriving packets
367	must have both the A-bit and the P-bit cleared after which
368	packets are classified on their header. If the header does not
369	match any configured values, the packet is immediately forwarded.
370	Matched flows pass through individual Markers that have been
371	configured from the usage profile for that flow: service class
372	(Premium or Assured), rate (peak for Premium, "expected" for
373	Assured), and permissible burst size (may be optional for
374	Premium). Assured flow packets emerge from the Marker with
375	their A-bits set when the flow is in conformance to its Profile,
376	but the flow is otherwise unchanged. For a Premium flow, the
377	Marker will hold packets when necessary to enforce their
378	configured rate. Thus Premium flow packets emerge from the
379	Marker in a shaped flow with their P-bits set. (It is possible for
380	Premium flow packets to be dropped inside of the Marker as we
381	describe below.) Packets are passed to the forwarding engine
382	when they emerge from Markers. Packets that have either their P
383	or A bits set we will refer to as Marked packets.

385	Figure 2. Block diagram of leaf router input functionality

387	Figure 3 shows the inner workings of the Marker. For both
388	Assured and Premium packets, a token bucket "fills" at the flow
389	rate that was specified in the usage profile. For Assured service,
390	the token bucket depth is set by the Profile's burst size. For
391	Premium service, the token bucket depth must be limited to the
392	equivalent of only one or two packets. (We suggest a depth of
393	one packet in early deployments.) When a token is present,
394	Assured flow packets have their A-bit set to one; otherwise the
395	packet is passed to the forwarding engine unmarked. For a Premium-
396	configured Marker, arriving packets that see a token present have
397	their P-bits set and are forwarded, but when no token is present,
398	Premium flow packets are held until a token arrives. If a
399	Premium flow bursts enough to overflow the holding queue, its
400	packets will be dropped. Though the flow set up data can be used
401	to configure a size limit for the holding queue (this would be the
402	meaning of a "burst" in Premium service), it is not necessary.
403	Unconfigured holding queues should be capable of holding at
404	least two bandwidth-delay products, adequate for TCP
405	connections. A smaller value might be used to suit delay
406	requirements of a specific application.

408	Figure 3. Markers to implement the two different services
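
The default holding queue size suggested above is simple arithmetic. The
sketch below assumes that "bandwidth" means the flow's configured rate
and "delay" means a round-trip time; the function name and the example
values are hypothetical.

```python
def default_holding_queue_bytes(rate_bytes_per_s: float, rtt_s: float) -> float:
    """Two bandwidth-delay products, the suggested minimum for unconfigured queues."""
    return 2 * rate_bytes_per_s * rtt_s


# Example values only: 125,000 bytes/s (1 Mbit/s) and a 125 ms round trip.
size = default_holding_queue_bytes(125_000, 0.125)
print(size)
```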

410	In practice, the token bucket should be implemented in bytes and
411	a token is considered to be present if the number of bytes in the
412	bucket is equal to or larger than the size of the packet. For Premium,
413	the bucket can only be allowed to fill to the maximum packet
414	size; while Assured may fill to the configured burst parameter.
415	Premium traffic is held until a sufficient byte credit has
416	accumulated and this holding buffer provides the only real queue
417	the flow sees in the network. For Assured traffic, we just test if
418	the bytes in the bucket are sufficient for the packet size and set A
419	if so. If not, the only difference is that A is not set. Assured
420	traffic goes into a queue following this step and potentially sees a
421	queue at every hop along its path.
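
The byte-based Marker behavior can be sketched as below. This is a
minimal illustration under stated assumptions: time is passed in
explicitly, the bucket starts full, no packet is larger than the bucket
depth, and the holding queue is unbounded (so the overflow drop is not
modeled). TokenBucket, mark_assured and shape_premium are hypothetical
names.

```python
class TokenBucket:
    """Byte-counting bucket that fills at `rate` bytes/s up to `depth` bytes."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth  # assumption: the bucket starts full
        self.last = 0.0

    def refill(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)


def mark_assured(bucket, packet, now):
    # In profile: spend credit and set A. Otherwise A is simply not set.
    bucket.refill(now)
    in_profile = bucket.tokens >= packet["size"]
    if in_profile:
        bucket.tokens -= packet["size"]
    packet["A"] = in_profile
    return packet


def shape_premium(bucket, packet, now):
    """Return (packet, release_time); the packet is held until credit accrues."""
    start = max(now, bucket.last)  # cannot leave before the previous packet
    bucket.refill(start)
    shortfall = max(0.0, packet["size"] - bucket.tokens)
    release = start + shortfall / bucket.rate
    bucket.refill(release)
    bucket.tokens = max(0.0, bucket.tokens - packet["size"])  # spend the credit
    packet["P"] = True
    return packet, release


# Assured: burst parameter of three 500-byte packets, four arrive at once.
assured = TokenBucket(rate=1000, depth=1500)
a_bits = [mark_assured(assured, {"size": 500}, now=0.0)["A"] for _ in range(4)]

# Premium: depth of one maximum-size packet, three arrive at once.
premium = TokenBucket(rate=1000, depth=500)
releases = [shape_premium(premium, {"size": 500}, now=0.0)[1] for _ in range(3)]
print(a_bits, releases)
```

The Assured burst passes through unchanged apart from its marking,
while the Premium burst is spread out at the configured rate.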

423	Each output interface of a router must have two queues and must
424	implement a test on the P-bit to select a packet's output queue.
425	The two queues must be serviced by simple priority, Premium
426	packets first. Each output interface must implement the RED-
427	based RIO mechanism described in [3] on the lower priority
428	queue. RIO uses two thresholds for when to begin dropping
429	packets, a lower one based on total queue occupancy for ordinary
430	best effort traffic and one based on the number of packets
431	enqueued that have their A-bit set. This means that any action
432	preferential to Assured service traffic will only be taken when
433	the queue's occupancy exceeds the threshold value for ordinary
434	best effort service. In this case, only unmarked packets will be
435	dropped (using the RED algorithm) unless the threshold value
436	for Assured service is also reached. Keeping an accurate count of
437	the number of A-bit packets currently in a queue requires either
438	testing the A-bit at both entry and exit of the queue or some
439	additional state in the router. Figure 4 is a block diagram of the
440	output interface for all routers.

442	Figure 4. Router output interface for two-bit architecture
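
The two-threshold drop decision can be sketched as follows. This is a
simplified illustration and not the RIO specification of [3]: it uses
instantaneous counts where RED uses an averaged queue size, and the
function names and threshold values are hypothetical.

```python
def red_drop_probability(occupancy, min_th, max_th, max_p):
    """RED-style ramp: 0 below min_th, rising linearly to max_p, then 1."""
    if occupancy < min_th:
        return 0.0
    if occupancy >= max_th:
        return 1.0
    return max_p * (occupancy - min_th) / (max_th - min_th)


def rio_drop_probability(has_a_bit, total_packets, assured_packets,
                         out_params, in_params):
    # Unmarked packets are judged on total occupancy; A-bit packets are
    # judged only on the number of enqueued A-bit packets.
    if has_a_bit:
        return red_drop_probability(assured_packets, *in_params)
    return red_drop_probability(total_packets, *out_params)


OUT_PARAMS = (10, 30, 0.5)   # (min_th, max_th, max_p), example values only
IN_PARAMS = (40, 60, 0.02)   # example values only

p_unmarked = rio_drop_probability(False, 20, 5, OUT_PARAMS, IN_PARAMS)
p_assured = rio_drop_probability(True, 20, 5, OUT_PARAMS, IN_PARAMS)
print(p_unmarked, p_assured)
```

With 20 packets queued, 5 of them Assured, only unmarked packets face a
non-zero drop probability in this example.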

444	The packet output of a leaf router is thus a shaped stream of
445	packets with P-bits set mingled with an unshaped best effort
446	stream of packets, some of which may have A-bits set. Premium
447	service clearly cannot starve best effort traffic because it is both
448	burst and bandwidth controlled. Assured service might rely only
449	on a conservative allocation to prevent starvation of unmarked
450	traffic, but bursts of Assured traffic might then close out best-
451	effort traffic at bottleneck queues during congestive periods.

453	After [3], we designate the forwarding path objects that test
454	flows against their usage profiles "Profile Meters". Border
455	routers will require Profile Meters at their input interfaces. The
456	bilateral agreement between adjacent administrative domains
457	must specify a peak rate on all P traffic and a rate and burst for A
458	traffic (and possibly a start time and duration). A Profile Meter is
459	required at the ingress of a trust region to ensure that
460	differentiated service packet flows are in compliance with their
461	agreed-upon rates. Non-compliant packets of Premium flows are
462	discarded while non-compliant packets of Assured flows have
463	their A-bits reset. For example, in figure 1, if the ISP has agreed
464	to supply Company A with r bytes/sec of Premium service, P-bit
465	marked packets that enter the ISP through the link from
466	Company A will be dropped if they exceed r. If instead, the
467	service in figure 1 was Assured service, the packets would
468	simply be unmarked, forwarded as best effort.
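
A Profile Meter with the two non-compliance actions might be sketched
as below. It assumes a byte-based policing bucket that starts full and
explicit timestamps; the class name ProfileMeter and the packet
dictionaries are hypothetical.

```python
class ProfileMeter:
    """Policing token bucket for one service class on one ingress link."""

    def __init__(self, service, rate, depth):
        self.service = service  # "premium" or "assured"
        self.rate = rate        # bytes per second
        self.depth = depth      # bytes
        self.tokens = depth     # assumption: the bucket starts full
        self.last = 0.0

    def police(self, packet, now):
        """Return the packet to forward, or None if it is discarded."""
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet["size"]:
            self.tokens -= packet["size"]
            return packet
        if self.service == "premium":
            return None                # non-compliant Premium is discarded
        return {**packet, "A": False}  # non-compliant Assured has A reset


premium_meter = ProfileMeter("premium", rate=1000, depth=500)
premium_results = [premium_meter.police({"size": 500, "P": True}, now=t)
                   for t in (0.0, 0.125, 0.75)]

assured_meter = ProfileMeter("assured", rate=1000, depth=500)
assured_results = [assured_meter.police({"size": 500, "A": True}, now=0.0)
                   for _ in range(2)]
print(premium_results, assured_results)
```

The second Premium packet arrives before enough credit has accumulated
and is dropped; the second Assured packet is forwarded as best effort.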

470	The simplest border router input interface is a Profile Meter
471	constructed from a token bucket configured with the contracted
472	rate across that ingress link (see figure 5). Each type, Premium or
473	Assured, and each interface must have its own profile meter
474	corresponding to a particular class across a particular boundary.
475	(This is in contrast to models where every flow that crosses the
476	boundary must be separately policed and/or shaped.) The exact
477	mechanisms required at a border router input interface depend on
478	the allocation policy deployed; a more complex approach is
479	presented in section 4.

481	Figure 5. Border router input interface Profile Meters

483	3. Mechanisms

485	3.1 Forwarding Path Primitives

487	Section 2.3 introduced the forwarding path objects of Markers
488	and Profile Meters. In this section we specify the primitive
489	building blocks required to compose them. The primitives are:
490	general classifier, bit-pattern classifier, bit setter, priority
491	queues, policing token bucket and shaping token bucket. These
492	primitives can compose a Marker (either a policing or a shaping
493	token bucket plus a bit setter) and a Profile Meter (a policing
494	token bucket plus a dropper or bit setter).

496	General Classifier: Leaf or first-hop routers must perform
497	transport-level signature matching based on a tuple in the packet
498	header, a functionality which is part of any RSVP-capable router.
499	As described above, packets whose tuples match one of the configured
500	flows are conformance tested and have the appropriate service bit set.
501	This function is memory- and processing-intensive, but is kept at the
502	edges of the network where there are fewer flows.

504	Bit-pattern classifier: This primitive comprises a simple two-
505	way decision based on whether a particular bit-pattern in the IP
506	header is set or not. As in figure 4, the P-bit is tested when a
507	packet arrives at a non-leaf router to determine whether to
508	enqueue it in the high priority output queue or the low priority
509	packet queue. The A-bit of packets bound for the low priority
510	queue is tested to 1) increment the count of Assured packets in
511	the queue if set and 2) determine which drop probability will be
512	used for that packet. Packets exiting the low priority queue must
513	also have the A-bit tested so that the count of enqueued Assured
514	packets can be decremented if necessary.
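
As an illustration only, the two-way decisions made by the
bit-pattern classifier can be sketched in Python as follows. The
bit positions P_BIT and A_BIT and the names classify,
drop_profile and LowPriorityQueue are assumptions of this sketch;
this document does not fix where the bits live in the IP header.

```python
from collections import deque

# Hypothetical bit positions, chosen only for this sketch.
P_BIT = 0x2
A_BIT = 0x1


def classify(header_bits):
    """Choose the output queue by testing the P-bit."""
    return "high" if header_bits & P_BIT else "low"


def drop_profile(header_bits):
    """Choose which drop probability applies in the low priority queue."""
    return "assured" if header_bits & A_BIT else "best-effort"


class LowPriorityQueue:
    """Low priority queue that counts the Assured packets it holds."""

    def __init__(self):
        self.packets = deque()
        self.assured_count = 0

    def enqueue(self, header_bits):
        # Test the A-bit on entry so the Assured count can be incremented.
        if header_bits & A_BIT:
            self.assured_count += 1
        self.packets.append(header_bits)

    def dequeue(self):
        # Test the A-bit again on exit so the count can be decremented.
        header_bits = self.packets.popleft()
        if header_bits & A_BIT:
            self.assured_count -= 1
        return header_bits
```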

516	Bit setter: The A-bits and P-bits must be set or cleared in several
517	places. A functional block that sets the appropriate bits of the IP
518	header to a configured bit-pattern would be the most general.

520	Priority queues: Every network element must include (at least)
521	two levels of simple priority queueing. The high priority queue is
522	for the Premium traffic and the service rule is to send packets in
523	that queue first and to exhaustion. Recall that Premium traffic
524	must never be oversubscribed, thus Premium traffic should see
525	little or no queue.
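
A minimal sketch of the "first and to exhaustion" service rule,
assuming a hypothetical TwoLevelPriorityQueue class and packets
represented by arbitrary Python objects:

```python
from collections import deque


class TwoLevelPriorityQueue:
    """Two simple priority levels: Premium is always served first."""

    def __init__(self):
        self.high = deque()  # Premium traffic
        self.low = deque()   # everything else

    def enqueue(self, packet, premium):
        (self.high if premium else self.low).append(packet)

    def dequeue(self):
        # Serve the high priority queue until it is empty, and only
        # then take a packet from the low priority queue.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

Because Premium traffic is never oversubscribed, the high
priority queue in such a scheduler should rarely hold more than a
packet or two.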

527	Shaping token bucket: This is the token bucket required at the
528	leaf router for Premium traffic and shown in figure 3. As we
529	shall see, shaping is also useful at egress points of a trust region.
530	An arriving packet is immediately forwarded if there is a token
531	present in the bucket, otherwise the packet is enqueued until the
532	bucket contains tokens sufficient to send it. Shaping requires
533	clocking mechanisms, packet memory, and some state block for
534	each flow and is thus a memory- and computation-intensive
535	process.
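
The shaping behavior can be sketched as below. This is not a
specification: the class name ShapingTokenBucket, the choice of
bytes as the token unit, and the explicit "now" argument (used in
place of a real clock) are all assumptions of the sketch, and it
assumes no packet is larger than the bucket depth.

```python
from collections import deque


class ShapingTokenBucket:
    """Holds packets until the bucket has enough tokens to send them."""

    def __init__(self, rate, depth, now=0):
        self.rate = rate      # tokens (bytes) added per second
        self.depth = depth    # most tokens the bucket can hold
        self.tokens = depth   # start with a full bucket
        self.last = now       # time of the last refill
        self.backlog = deque()

    def _refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now

    def arrive(self, size, now):
        """Queue a packet, then send whatever the tokens allow."""
        self.backlog.append(size)
        return self.release(now)

    def release(self, now):
        """Return the sizes of the packets that may be sent at time now."""
        self._refill(now)
        sent = []
        while self.backlog and self.backlog[0] <= self.tokens:
            size = self.backlog.popleft()
            self.tokens -= size
            sent.append(size)
        return sent
```

The backlog and the per-bucket timestamp are the packet memory
and state that make shaping more expensive than policing.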

537	Policing token bucket: This is the token bucket required for
538	Profile Meters and shown in figure 5. Policing token buckets
539	never hold arriving packets, but check on arrival to see if a token
540	is available for the packet's service class. If so, the packet is
541	forwarded immediately. If not, the policing action is taken,
542	dropping for Premium and reclassifying or unmarking for
543	Assured.
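
For comparison, a policing token bucket might be sketched as
follows. As before, the class name PolicingTokenBucket, the use
of bytes as tokens, and the explicit "now" argument are
assumptions made for illustration.

```python
class PolicingTokenBucket:
    """Never holds packets: each one is judged on arrival."""

    def __init__(self, rate, depth, service, now=0):
        self.rate = rate        # tokens (bytes) added per second
        self.depth = depth      # most tokens the bucket can hold
        self.tokens = depth     # start with a full bucket
        self.service = service  # "premium" or "assured"
        self.last = now         # time of the last refill

    def police(self, size, now):
        """Return the action for a packet of the given size."""
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return "forward"
        # Out of profile: drop Premium, unmark Assured.
        return "drop" if self.service == "premium" else "unmark"
```

Unlike the shaper, this object needs no packet memory, only a
token count and a timestamp per service class and interface.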

545	3.2 Passing configuration information

547	Clearly, mechanisms are required to communicate the
548	information about the request to the leaf router. This
549	configuration information is the rate, burst, and whether it is a
550	Premium or Assured type. There may also need to be a specific
551	field to set or clear this configuration. This information can be
552	passed in a number of ways, including using the semantics of
553	RSVP, SNMP, or directly set by a network administrator in some
554	other way. There must be some mechanisms for authenticating
555	the sender of this information. We expect configuration to be
556	done in a variety of ways in early deployments and a protocol
557	and mechanism for this to be a topic for future standards work.

559	3.3 Discussion

561	The requirements of shapers motivate their placement at the
562	edges of the network where the state per router can be smaller
563	than in the middle of a network. The greatest burden of flow
564	matching and shaping will be at leaf routers where the speeds
565	and buffering required should be less than those that might be
566	required deeper in the network. This functionality is not required
567	at every network element on the path. Routers that are internal to
568	a trust region will not need to shape traffic. Border routers may
569	need or desire to shape the aggregate flow of Marked packets at
570	their egress in order to ensure that they will not burst into non-
571	compliance with the policing mechanism at the ingress to the
572	other domain (though this may not be necessary if the in-degree
573	of the router is low). Further, the shaping would be applied to an
574	aggregation of all the Premium flows that exit the domain via
575	that path, not to each flow individually.

577	These mechanisms are within reach of today's technology and it
578	seems plausible to us that Premium and Assured services are all
579	that is needed in the Internet. If, in time, these services are found
580	insufficient, this architecture provides a migration path for
581	delivering other kinds of service levels to traffic. The A- and P-
582	bits would continue to be used to identify traffic that gets
583	Marked service, but further filter matching could be done on
584	packet headers to differentiate service levels further. Using the
585	bits this way reduces the number of packets that have to have
586	further matching done on them rather than filtering every
587	incoming packet. More queue levels and more complex
588	scheduling could be added for P-bit traffic and more levels of
589	drop priority could be added for A-bit traffic if experience shows
590	them to be necessary and processing speeds are sufficient. We
591	propose that the services described here be considered as "at
592	least" services. Thus, a network element should at least be
593	capable of mapping all P-bit traffic to Premium service and of
594	mapping all A-bit traffic to be treated with one level of priority
595	in the "best effort" queue (it appears that the single level of A-bit
596	traffic should map to a priority that is equivalent to the best level
597	in a multi-level element that is also in the path).

599	On the other hand, what is the downside of deploying an
600	architecture for both classes of service if later experience
601	convinces us that only one of them is needed? The functional
602	blocks of both service classes are similar and can be provided by
603	the same mechanism, parameterized differently. If Assured
604	service is not used, very little is lost. A RED-managed best effort
605	queue has been strongly recommended in [4] and, to the extent
606	that the deployment of this architecture pushes the deployment of
607	RED-managed best effort queues, it is clearly a positive. If
608	Premium service goes unused, the two queues with simple
609	priority service are not required and the shaping function of the
610	Marker may be unused; these would thus impose an unnecessary
611	implementation cost.

613	4. The Architectural Framework for Marked Traffic
614	Allocation

616	Thus far we have focused on the service definitions and the
617	forwarding path mechanisms. We now turn to the problem of
618	allocating the level of Marked traffic throughout the Internet. We
619	observe that most organizations have fixed portions of their
620	budgets, including data communications, that are determined on
621	an annual or quarterly basis. Some additional monies might be
622	attached to specific projects for discretionary costs that arise in
623	the shorter term. In turn, service providers (ISPs and NSPs) must
624	do their planning on annual and quarterly bases and thus cannot
625	be expected to provide differentiated services purely "on call".
626	Provisioning sets up static levels of Marked traffic while call set-
627	up creates an allocation of Marked traffic for a single flow's
628	duration. Static levels can be provisioned with time-of-day
629	specifications, but cannot be changed in response to a dynamic
630	message. We expect both kinds of bandwidth allocation to be
631	important. The purchasers of Marked services can generally be
632	expected to work on longer-term budget cycles where these
633	services will be accounted for similarly to many information
634	services today. A mail-order house may wish to purchase a fixed
635	allocation of bandwidth in and out of its web-server to give
636	potential customers a "fast" feel when browsing their site. This
637	allocation might be based on hit rates of the previous quarter or
638	some sort of industry-based averages. In addition, there needs to
639	be a dynamic allocation capability to respond to particular
640	events, such as a demonstration, a network broadcast by a
641	company's CEO, or a particular network test. Furthermore, a
642	dynamic capability may be needed in order to meet a
643	precommitted service level when the particular source or
644	destination is allowed to be "anywhere on the Internet".
645	"Dynamic" covers the range from a telephoned or e-mailed
646	request to a signalling type model. A strictly statically allocated
647	scenario is expected to be useful in initial deployment of
648	differentiated services and to make up a major portion of the
649	Marked traffic for the foreseeable future.

651	Without a "per call" dynamic set up, the preconfiguring of usage
652	profiles can always be construed as "paying for bits you don't
653	use" whether the type of service is Premium or Assured. We
654	prefer to think of this as paying for the level of service that one
655	expects to have available at any time, for example paying for a
656	telephone line. A customer might pay an additional flat fee to
657	have the privilege of calling a wide local area for no additional
658	charge or might pay by the call. Although a customer might pay
659	on a "per call" basis for every call made anywhere, it generally
660	turns out not to be the most economical option for most
661	customers. It's possible similar pricing structures might arise in
662	the internet.

664	We use Allocation to refer to the process of making Marked
665	traffic commitments anywhere along this continuum from strictly
666	preallocated to dynamic call set-up and we require an Allocation
667	architecture capable of encompassing this entire spectrum in any
668	mix. We further observe that Allocation must follow
669	organizational hierarchies, that is, each organization must have
670	complete responsibility for the Allocation of the Marked traffic
671	resource within its domain. Finally, we observe that the only
672	chance of success for incremental deployment lies in an
673	Allocation architecture that is made up of bilateral agreements,
674	as multilateral agreements are much too complex to administer.
675	Thus, the Allocation architecture is made up of agreements
676	across boundaries as to the amount of Marked traffic that will be
677	allowed to pass. This is similar to "settlement" models used
678	today.

680	4.1 Bandwidth Brokers: Allocating and Controlling Bandwidth Shares

682	The goal of differentiated services is controlled sharing of some
683	organization's Internet bandwidth. The control can be done
684	independently by individuals, i.e., users set bit(s) in their packets
685	to distinguish their most important traffic, or it can be done by
686	agents that have some knowledge of the organization's priorities
687	and policies and allocate bandwidth with respect to those
688	policies.  Independent labeling by individuals is simple to
689	implement but unlikely to be sufficient since it's unreasonable to
690	expect all individuals to know all their organization's priorities
691	and current network use and always mark their traffic
692	accordingly.  Thus this architecture is designed with agents
693	called bandwidth brokers (BB) [2], that can be configured with
694	organizational policies, keep track of the current allocation of
695	marked traffic, and interpret new requests to mark traffic in light
696	of the policies and current allocation.

698	We note that such agents are inherent in any but the most trivial
699	notions of sharing.  Neither individuals nor the routers their
700	packets transit have the information necessary to decide which
701	packets are most important to the organization.  Since these
702	agents must exist, they can be used to allocate bandwidth for
703	end-to-end connections with far less state and simpler trust
704	relationships than deploying per flow or per filter guarantees in
705	all network elements on an end-to-end path. BBs make it
706	possible for bandwidth allocation to follow organizational
707	hierarchies and, in concert with the forwarding path mechanisms
708	discussed in section 3, reduce the state required to set up and
709	maintain a flow over architectures that require checking the full
710	flow header at every network element. Organizationally, the BB
711	architecture is motivated by the observation that multilateral
712	agreements rarely work and this architecture allows end-to-end
713	services to be constructed out of purely bilateral agreements.
714	BBs only need to establish relationships of limited trust with
715	their peers in adjacent domains, unlike schemes that require the
716	setting of flow specifications in routers throughout an end-to-end
717	path. In practical technical terms, the BB architecture makes it
718	possible to keep state on an administrative domain basis, rather
719	than at every router and the service definitions of Premium and
720	Assured service make it possible to confine per flow state to just
721	the leaf routers.

723	BBs have two responsibilities. Their primary one is to parcel out
724	their region's Marked traffic allocations and set up the leaf
725	routers within the local domain. The other is to manage the
726	messages that are sent across boundaries to adjacent regions'
727	BBs. A BB is associated with a particular trust region, one per
728	domain. A BB has a policy database that keeps the information
729	on who can do what when and a method of using that database to
730	authenticate requesters. Only a BB can configure the leaf routers
731	to deliver a particular service to flows, crucial for deploying a
732	secure system. If the deployment of Differentiated Services has
733	advanced to the stage where dynamically allocated, marked
734	flows are possible between two adjacent domains, BBs also
735	provide the hook needed to implement this. Each domain's BB
736	establishes a secure association with its peer in the adjacent
737	domain to negotiate or configure a rate and a service class
738	(Premium or Assured) across the shared boundary and through
739	the peer's domain. As we shall see, it is possible, for some types
740	of service and particularly in early implementations, that this
741	"secure association" is not automatic but accomplished through
742	human negotiation and subsequent manual configuration of the
743	adjacent BBs according to the negotiated agreement. This
744	negotiated rate is a capability that a BB controls for all hosts in
745	its region.

747	When an allocation is desired for a particular flow, a request is
748	sent to the BB. Requests include a service type, a target rate, a
749	maximum burst, and the time period when service is required.
750	The request can be made manually by a network administrator or
751	a user or it might come from another region's BB. A BB first
752	authenticates the credentials of the requester, then verifies there
753	exists unallocated bandwidth sufficient to meet the request. If a
754	request passes these tests, the available bandwidth is reduced by
755	the requested amount and the flow specification is recorded. In
756	the case where the flow has a destination outside this trust
757	region, the request must fall within the class allocation through
758	the "next hop" trust region that was established through a
759	bilateral agreement of the two trust regions. The requester's BB
760	informs the adjacent region's BB that it will be using some of
761	this rate allocation. The BB configures the appropriate leaf router
762	with the information about the packet flow to be given a service
763	at the time that the service is to commence. This configuration is
764	"soft state" that the BB will periodically refresh. The BB in the
765	adjacent region is responsible for configuring the border router to
766	permit the allocated packet flow to pass and for any additional
767	configurations and negotiations within and across its borders that
768	will allow the flow to reach its final destination.
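
The local part of this request handling (authenticate, check
unallocated bandwidth, reduce it, record the flow) can be
sketched as follows. The BandwidthBroker class, its set of
authorized requesters standing in for the policy database, and
the string results are hypothetical; burst, time period, soft
state refresh and messages to the adjacent BB are omitted.

```python
class BandwidthBroker:
    """Toy BB that tracks one pool of Marked bandwidth for its region."""

    def __init__(self, capacity_kbps, authorized):
        self.available_kbps = capacity_kbps
        self.authorized = set(authorized)  # stand-in for the policy database
        self.flows = []                    # recorded flow specifications

    def request(self, requester, flow, service, rate_kbps):
        # First authenticate the requester's credentials.
        if requester not in self.authorized:
            return "denied: not authorized"
        # Then verify that enough unallocated bandwidth exists.
        if rate_kbps > self.available_kbps:
            return "denied: insufficient allocation"
        # Reduce the available bandwidth and record the flow.
        self.available_kbps -= rate_kbps
        self.flows.append((flow, service, rate_kbps))
        return "ok"
```

For instance, a broker created with 256 kbps and one authorized
user would accept that user's first two 128 kbps requests and
refuse a third.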

770	At DMZs, there must be an unambiguous way to determine the
771	local source of a packet. An interface's source could be
772	determined from its MAC address which would then be used to
773	classify packets as coming across a logical link directly from the
774	source domain corresponding to that MAC address. Thus with
775	this understanding we can continue to use figures illustrating a
776	single pipe between two different domains.

778	In this way, all agreements and negotiations are performed
779	between two adjacent domains. An initial request might cause
780	communication between BBs on several domains along a path,
781	but each communication is only between two adjacent BBs.
782	Initially, these agreements will be prenegotiated and fairly static.
783	Some may become more dynamic as the service evolves.

785	4.2 Examples

787	This section gives examples of BB transactions in a non-trivial,
788	multi-transit-domain Internet. The BB framework allows
789	operating points across a spectrum from "no signalling across
790	boundaries" to "each flow set up dynamically". We might expect
791	to move across this spectrum over time, as the necessary
792	mechanisms are ubiquitously deployed and BBs become more
793	sophisticated, but the statically allocated portions of the spectrum
794	should always have uses. We believe the ability to support this
795	wide spectrum of choices simultaneously will be important both
796	in incremental deployment and in allowing ISPs to make a wide
797	range of offerings and pricings to users. The examples of this
798	section roughly follow the spectrum of increasing sophistication.
799	Note that we assume that domains contract for some amount of
800	Marked traffic which can be requested as either Assured or
801	Premium in each individual flow setup transaction. The
802	examples say "Marked" although actual transactions would have
803	to specify either Assured or Premium.

805	A statically configured example with no BB messages
806	exchanged: Here all allocations are statically preallocated
807	through purely bilateral agreements between users (individual
808	TCPs, individual hosts, campus networks, or whole ISPs) [6].
809	The allocations are in the form of usage profiles of rate, burst,
810	and a time during which that profile is to be active. Users and
811	providers negotiate these Profiles which are then installed in the
812	user domain BB and in the provider domain BB. No BB
813	messages cross the boundary; we assume this negotiation is done
814	by human representatives of each domain. In this case, BBs only
815	have to perform one of their two functions, that of allocating this
816	Profile within their local domain. It is even possible to set all of
817	these suballocations up in advance and then the BB only needs to
818	set up and tear down the Profile at the proper time and to refresh
819	the soft state in the leaf routers. From the user domain BB, the
820	Profile is sent as soft state to the first hop router of the flow
821	during the specified time. These Profiles might be set using
822	RSVP, a variant of RSVP, SNMP, or some vendor-specific
823	mechanism. Although this static approach can work for all
824	Marked traffic, Premium must strictly never be oversubscribed,
825	so it is appropriate for Premium traffic only as long as that
826	traffic is kept to a small percentage of the bottleneck path through a
827	domain or is otherwise constrained to a well-known behavior.
828	Similar restrictions might hold for Assured depending on the
829	expectation associated with the service.

831	In figure 6, we show an example of setting a Profile in a leaf
832	router. A usage profile has been negotiated with the ISP for the
833	entire domain and the BB parcels it out among individual flows
834	as requested. The leaf router mechanism is that shown in figure
835	3, with the token bucket set to the parameters from the usage
836	profile. The ISP's BB would configure its own Profile Meter at
837	the ingress router from that customer to ensure the Profile was
838	maintained. This mechanism was shown in figure 5. We assume
839	that the time duration and start times for any Profile to be active
840	are maintained in the BB. The Profile is sent to the ingress
841	device or cleared from the ingress device by messages sent from
842	the BB. In this example, we assume that van@lbl wants to talk to
843	ddc@mit. The LBL-BB is sent a request from Van asking that
844	Premium service be assigned to a flow that is designated as
845	having source address "V:4" and going to destination address
846	"D:8". This flow should be configured for a rate of 128kb/sec
847	and allocated from 1pm to 3pm. The request must be "signed" in
848	a secure, verifiable manner. The request might be sent as data to
849	the LBL-BB, an e-mail message to a network administrator, or in
850	a phone call to a network administrator. The LBL-BB receives
851	this message, verifies that there is 128kb/sec of unused Premium
852	service for the domain from 1-3pm, then sends a message to
853	Leaf1 that sets up an appropriate Profile Meter. The message to
854	Leaf1 might be an RSVP message, or SNMP, or some
855	proprietary method. All the domains passed must have sufficient
856	reserve capacity to meet this request.

858	Figure 6. Bandwidth Broker setting Profiles in leaf routers

860	A statically configured example with BB messages
861	exchanged: Next we present an example where all allocations
862	are statically preallocated but BB messages are exchanged for
863	greater flexibility. Figure 7 shows an end-to-end example for
864	Marked traffic in a statically allocated internet. The numbers at
865	the trust region boundaries indicate the total statically allocated
866	Marked packet rates that will be accepted across those
867	boundaries. For example, 100kbps of Marked traffic can be sent
868	from LBL to ESNet; a Profile Meter at the ESNet ingress
869	boundary would have a token bucket set to rate 100kbps. (There
870	MAY be a shaper set at LBL's egress to ensure that the Marked
871	traffic conforms to the aggregate Profile.) The tables inside the
872	transit network "bubbles" show their policy databases and reflect
873	the values after the transaction is complete. In Figure 7, V wants
874	to transmit a flow from LBL to D at MIT at 10 Kbps. As in
875	figure 6, a request for this profile is made of LBL's BB. LBL's
876	BB authenticates the request and checks to see if there is 10kbps
877	left in its Marked allocation going in that direction. There is, so
878	the LBL-BB passes a message to the ESNet-BB saying that it
879	would like to use 10kbps of its Marked allocation for this flow.
880	ESNet authenticates the message, checks its database and sees
881	that it has a 10kbps Marked allocation to NEARNet (the next
882	region in that direction) that is unused. The policy is that
883	ESNet-BB must always inform ("ask") NEARNet-BB when it is
884	about to use part of its allocation. NEARNet-BB authenticates
885	the message, checks its database and discovers that 20kbps of the
886	allocation to MIT is unused and the policy at that boundary is to
887	not inform MIT when part of the allocation is about to be used
888	("<50 ok" where the total allocation is 50). The dotted lines
889	indicate the "implied" transaction, that is, the transaction that
890	would have happened if the policy hadn't said "don't ask me".
891	Now each BB can pass an "ok" message to this request across its
892	boundary. This allows V to send to D, but not vice versa. It
893	would also be possible for the request to originate from D.

895	Figure 7. End-to-end example with static allocation.

897	Consider the same example where the ESNet-BB finds all of its
898	Marked allocation to NEARNet, 10 kbps, in use. With static
899	allocations, ESNet must transmit a "no" to this request back to
900	the LBL-BB. Presumably, the LBL-BB would record this
901	information to complain to ESNet about the overbooking at the
902	end of the month! One solution to this sort of "busy signal" is for
903	ESNet to get better at anticipating its customers' needs or require
904	long advance bookings for every flow, but it's also possible for
905	bandwidth brokerage decisions to become dynamic.

907	Figure 8. End-to-end static allocation example with no remaining
908	allocation
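
The two static cases just described can be sketched as a walk
over the boundaries on the path. The function admit_static and
the dictionary fields are hypothetical. The 100, 10 and 50 kbps
totals and the 20 kbps unused at MIT are taken from the text; the
zero usage at the first two boundaries is assumed for the
example.

```python
def admit_static(boundaries, rate_kbps):
    """Return ("ok" or "no", number of BB messages sent forward)."""
    messages = 0
    for b in boundaries:
        # The upstream BB checks its remaining static allocation.
        if b["total"] - b["used"] < rate_kbps:
            return "no", messages
        # It informs the downstream BB only if the agreement says "ask".
        if b["ask"]:
            messages += 1
    # Every boundary had room, so commit the rate everywhere.
    for b in boundaries:
        b["used"] += rate_kbps
    return "ok", messages


path = [
    {"name": "LBL->ESNet", "total": 100, "used": 0, "ask": True},
    {"name": "ESNet->NEARNet", "total": 10, "used": 0, "ask": True},
    {"name": "NEARNet->MIT", "total": 50, "used": 30, "ask": False},
]
first = admit_static(path, 10)   # every boundary has room
second = admit_static(path, 10)  # ESNet->NEARNet is now fully used
```

In this sketch the first request succeeds after two forward
messages, and the second is refused by ESNet after a single
message from LBL, which is the "busy signal" case.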

910	Dynamic Allocation and additional mechanism: As we shall
911	see, dynamic allocation requires more complex BBs as well as
912	more complex border policing, including the necessity to keep
913	more state. However, it enables an important service with a small
914	increase in state.

916	The next set of figures (starting with figure 9) shows what
917	happens in the case of dynamic allocation. As before, V requests
918	10kbps to talk to D at MIT. Since the allocation is dynamic, the
919	border policers do not have a preset value, instead being set to
920	reflect the current peak value of Marked traffic permitted to cross
921	that boundary. The request is sent to the LBL-BB.

923	Figure 9. First step in end-to-end dynamic allocation example.

925	In figure 10, note that ESNet has no allocation set up to
926	NEARNet. This system is capable of dynamic allocations in
927	addition to static, so it asks NEARNet if it can "add 10" to its
928	allocation from ESNet. As in the figure 7 example, MIT's policy
929	is set to "don't ask" for this case, so the dotted lines represent
930	"implicit transactions" where no messages were exchanged.
931	However, NEARNet does update its table to indicate that it is
932	now using 20kbps of the Marked allocation to MIT.

934	Figure 10. Second step in end-to-end dynamic allocation example

936	In figure 11, we see the third step where MIT's "virtual ok"
937	allows the NEARNet-BB to tell its border router to increase the
938	Marked allocation across the ESNet-NEARNet boundary by 10
939	kbps.

941	Figure 11. Third step in end-to-end dynamic allocation example

943	Figure 12 shows NEARNet-BB's "ok" for that request
944	transmitted back to ESNet-BB. This causes ESNet-BB to send its
945	border router a message to create a 10 kbps subclass for the flow
946	"V->D". This is required in order to ensure that the 10kbps that
947	has just been dynamically allocated gets used only for that
948	connection. Note that this does require that the per flow state be
949	passed from LBL-BB to ESNet-BB, but this is the only boundary
950	that needs that level of flow information and this further
951	classification will only need to be done at that one boundary
952	router and only on packets coming from LBL. Thus dynamic
953	allocation requires more complex Profile Metering than that
954	shown in figure 5.

956	Figure 12. Fourth step in end-to-end dynamic allocation example.

958	In figure 13, the ESNet border router gives the "ok" that a
959	subclass has been created, causing the ESNet-BB to send an "ok"
960	to the LBL-BB which lets V know the request has been
961	approved.

963	Figure 13. Final step in end-to-end dynamic allocation example
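
The dynamic walk-through above can be replayed in a few lines.
This sketch is only a trace of the example: BorderRouter,
dynamic_allocate and the message strings are hypothetical, and
authentication and the LBL side of the exchange are left out.

```python
class BorderRouter:
    """Ingress policing state at one domain boundary (illustrative)."""

    def __init__(self):
        self.marked_kbps = 0   # current peak Marked rate allowed in
        self.subclasses = {}   # flow id -> rate, for per-flow policing


def dynamic_allocate(flow, rate_kbps, mit_unused_kbps,
                     nearnet_border, esnet_border):
    """Replay the example and return the BB messages that were sent."""
    messages = [
        "LBL-BB -> ESNet-BB: request %d for %s" % (rate_kbps, flow),
        "ESNet-BB -> NEARNet-BB: add %d" % rate_kbps,
    ]
    if rate_kbps > mit_unused_kbps:
        messages.append("NEARNet-BB -> ESNet-BB: no")
        messages.append("ESNet-BB -> LBL-BB: no")
        return messages
    # MIT's policy is "don't ask", so no message crosses that boundary.
    # NEARNet raises the aggregate policer on its boundary with ESNet.
    nearnet_border.marked_kbps += rate_kbps
    messages.append("NEARNet-BB -> ESNet-BB: ok")
    # ESNet creates a per-flow subclass on its boundary with LBL.
    esnet_border.subclasses[flow] = rate_kbps
    messages.append("ESNet-BB -> LBL-BB: ok")
    return messages
```

Note that only the ESNet border router ends up holding per-flow
state; the NEARNet border router sees just a larger aggregate.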

965	For dynamic allocation, a basic version of a CBQ scheduler [5]
966	would have all the required functionality to set up the subclasses.
967	RSVP currently provides a way to move the TSpec for the flow.

969	For multicast flows, we assume that packets that are bound for at
970	least one egress can be carried through a domain at that level of
971	service to all egress points. If a particular multicast branch has
972	been subscribed to at best-effort when upstream branches are
973	Marked, it will have its bit settings cleared before it crosses the
974	boundary. The information required for this flow identification is
975	used to augment the existing state that is already kept on this
976	flow because it is a multicast flow. We note that we are already
977	"catching" this flow, but now we must potentially clear the bit-
978	pattern.

980	5. RSVP/int-serv and this architecture

982	Much work has been done in recent years on the definition of
983	related integrated services for the internet and the specification
984	of the RSVP signalling protocol. The two-bit architecture
985	proposed in this work can easily interoperate with those
986	specifications. In this section we first discuss how the forwarding
987	mechanisms described in section 3 can be used to support
988	integrated services. Second, we discuss how RSVP could
989	interoperate with the administrative structure of the BBs to
990	provide better scaling.

992	5.1 Providing Controlled-Load and Guaranteed Service

994	We believe that the forwarding path mechanisms described in
995	section 3 are general enough that they can also be used to
996	provide the Controlled-Load service [8] and a version of the
997	Guaranteed Quality of Service [9], as developed by the int-serv
998	WG. First note that Premium service can be thought of as a
999	constrained case of Controlled-Load service where the burst size
1000	is limited to one packet and where non-conforming packets are
1001	dropped. A network element that has implemented the
1002	mechanisms to support Premium service can easily support the
1003	more general Controlled-Load service by making one or more
1004	minor parameter adjustments, e.g. by lifting the constraint on the
1005	token bucket size, or configuring the Premium service rate with
1006	the peak traffic rate parameter in the Controlled-Load
1007	specification, and by changing the policing action on out-of-
1008	profile packets from dropping to sending the packets to the Best-
1009	effort queue.

1011	It is also possible to implement Guaranteed Quality of Service
1012	using the mechanisms of Premium service. From RFC 2212 [9]:
1013	"The definition of guaranteed service relies on the result that the
1014	fluid delay of a flow obeying a token bucket (r, b) and being
1015	served by a line with bandwidth R is bounded by b/R as long as
1016	R is no less than r. Guaranteed service with a service rate R,
1017	where now R is a share of bandwidth rather than the bandwidth
1018	of a dedicated line approximates this behavior." The service
1019	model of Premium clearly fits this model. RFC 2212 states that
1020	"Non-conforming datagrams SHOULD be treated as best-effort
1021	datagrams." Thus, a policing Profile Meter that drops non-
1022	conforming datagrams would be acceptable, but it's also possible
1023	to change the action for non-compliant packets from a drop to
1024	sending to the best-effort queue.
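
As a worked example of the b/R bound quoted above, the following
sketch computes the fluid delay for a given bucket depth and
service rate. The function name fluid_delay_bound and the sample
numbers (a 1500 byte bucket served at 16000 bytes/sec, that is
128 kb/sec) are chosen only for illustration.

```python
def fluid_delay_bound(b_bytes, R_bytes_per_sec, r_bytes_per_sec):
    """Return the b/R bound in seconds; it holds only when R >= r."""
    if R_bytes_per_sec < r_bytes_per_sec:
        raise ValueError("bound holds only when R is no less than r")
    return b_bytes / R_bytes_per_sec


# A one-packet (1500 byte) bucket served at 128 kb/sec.
bound = fluid_delay_bound(1500, 16000, 16000)  # 0.09375 seconds
```

With Premium's burst limited to one packet, b is small, which is
why the bound stays under a tenth of a second even at this modest
rate.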

1026	5.2 RSVP and BBs

1028	In this section we discuss how RSVP signaling can be used in
1029	conjunction with the BBs described in section 4 to deliver a more
1030	scalable end-to-end resource set up for Integrated Services. First
1031	we note that the BB architecture has three major differences from
1032	the original RSVP resource set up model:

1034	1. There exist a priori bilateral business relations between BBs of
1035	adjacent trust regions before one can set up end-to-end resource
1036	allocation; real-time signaling is used only to activate/confirm
1037	the availability of pre-negotiated Marked bandwidth, and to
1038	dynamically readjust the allocation amount when necessary. We
1039	note that this real-time signaling across domains is not required,
1040	but depends on the nature of the bilateral agreement (e.g., the
1041	agreement might state "I'll tell you whenever I'm going to use
1042	some of my allocation" or not).

1044	2. A few bits in the packet header, i.e., the P-bit and A-bit, are
1045	used to mark the service class of each packet. A full packet
1046	classification (by checking all relevant fields in the header)
1047	therefore need be done only once, at the leaf router; after that,
1048	packets will be served according to their class bit settings.

1050	3. RSVP resource set up assumes that resources will be reserved
1051	hop-by-hop at each router along the entire end-to-end path.
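
The second point above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Packet fields, the flow table keyed on a five-tuple, and the function names are hypothetical, and the per-flow Profile Meter that would accompany classification at the leaf router is omitted.

```python
from typing import Dict, NamedTuple, Tuple


class Packet(NamedTuple):
    """Assumed packet model: a five-tuple plus the two class bits."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    p_bit: bool = False   # Premium
    a_bit: bool = False   # Assured


FlowKey = Tuple[str, str, int, int, int]


def leaf_classify(pkt: Packet, flow_table: Dict[FlowKey, str]) -> Packet:
    """Full classification on all relevant fields, done once at the leaf."""
    key = (pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)
    service = flow_table.get(key)          # None means no configured flow
    return pkt._replace(p_bit=(service == "premium"),
                        a_bit=(service == "assured"))


def interior_service(pkt: Packet) -> str:
    """Later routers look only at the class bits, never at the five-tuple."""
    if pkt.p_bit:
        return "premium"
    if pkt.a_bit:
        return "assured"
    return "best-effort"
```

Only leaf_classify consults per-flow state; interior_service needs no flow table at all, which is the source of the scaling benefit.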

1053	RSVP messages sent to leaf routers by hosts can be intercepted
1054	and sent to the local domain's BB. The BB processes the
1055	message and, if the request is approved, forwards a message to
1056	the leaf router that sets up appropriate per-flow packet
1057	classification. A message should also be sent to the egress border
1058	router to add to the aggregate Marked traffic allocation for
1059	packet shaping by the Profile Meter on outbound traffic. (It's
1060	possible that this is always set to the full allocation.) An RSVP
1061	message must be sent across the boundary to the adjacent ISP's
1062	border router, either from the local domain's border router or
1063	from the local domain's BB. If the ISP also implements RSVP
1064	with a BB and the diff-serv framework, its border router
1065	forwards the message to the ISP's local BB. A process similar to
1066	that in the first domain can be carried out in the ISP domain,
1067	after which an RSVP message is forwarded to the next ISP
1068	along the path. Inside a domain, packets are served solely
1069	according to the Marked bits. The local BB knows exactly how
1070	much Premium traffic is permitted to enter at each border router
1071	and from which border router packets exit.
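
The message flow just described might be sketched as below. This is a rough Python illustration, not a protocol definition: the BandwidthBroker class, its methods, the use of kbps, and the single per-domain allocation figure are all assumptions. In particular, the text above does not say what happens to upstream state when a later domain refuses a request; the sketch simply releases it.

```python
class BandwidthBroker:
    """One domain's BB with a pre-negotiated Marked allocation (assumed)."""

    def __init__(self, name, allocation_kbps):
        self.name = name
        self.allocation_kbps = allocation_kbps
        self.in_use_kbps = 0
        self.leaf_rules = {}   # stands in for per-flow classifier setup

    @property
    def egress_profile_kbps(self):
        # Aggregate handed to the egress border router's Profile Meter.
        return self.in_use_kbps

    def admit(self, flow_id, rate_kbps, at_leaf=False):
        if self.in_use_kbps + rate_kbps > self.allocation_kbps:
            return False                        # request not approved
        self.in_use_kbps += rate_kbps
        if at_leaf:
            self.leaf_rules[flow_id] = rate_kbps
        return True

    def release(self, flow_id, rate_kbps):
        # Assumption: undo an earlier admit; not described in the text.
        self.in_use_kbps -= rate_kbps
        self.leaf_rules.pop(flow_id, None)


def reserve_along_path(brokers, flow_id, rate_kbps):
    """Pass the request from BB to BB along the path of domains."""
    admitted = []
    for i, bb in enumerate(brokers):
        # Only the first domain contains the sender's leaf router.
        if not bb.admit(flow_id, rate_kbps, at_leaf=(i == 0)):
            for prev in admitted:
                prev.release(flow_id, rate_kbps)
            return False
        admitted.append(bb)
    return True
```

Each BB keeps only aggregate state per boundary, plus per-flow rules in the domain where the flow originates.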

1073	6. Recommendations

1075	This document has presented a reference architecture for
1076	differentiated services. Several variations can be envisioned,
1077	particularly for early and partial deployments, but we do not
1078	enumerate all of these variations here. There has been a great
1079	market demand for differentiated services lately. As one of the
1080	many efforts to meet that demand, this draft sketches out the
1081	framework of a flexible architecture for offering differential
1082	services, and in particular defines a simple set of packet
1083	forwarding path mechanisms to support two basic types of
1084	differential services. Although there remain a number of issues
1085	and parameters that need further exploration and refinement, we
1086	believe it is both possible and feasible at this time to start
1087	deployment of differentiated services incrementally. First, given
1088	that the basic mechanisms required in the packet forwarding path
1089	are clearly understood, both Assured and Premium services can
1090	be implemented today with manually configured BBs and static
1091	resource allocation. Initially we recommend conservative choices
1092	on the amount of Marked traffic that is admitted into the
1093	network. Second, we plan to continue the effort started with this
1094	draft and the experimental work of the authors to define and
1095	deploy increasingly sophisticated BBs. We hope to turn the
1096	experience gained from in-progress trial implementations on
1097	ESNet and CAIRN into future proposals to the IETF.

1099	Future revisions of this draft will present the receiver-based and
1100	multicast flow allocations in detail. After this step is finished,
1101	we believe the basic picture of a scalable, robust, secure
1102	resource management and allocation system will be completed.
1103	In this draft we described how the proposed architecture supports
1104	two services that seem to us to provide at least a good starting
1105	point for trial deployment of differentiated services. Our main
1106	intent is to define an architecture with three services, Premium,
1107	Assured, and Best effort, that can be determined by specific bit-
1108	patterns, but not to preclude additional levels of differentiation
1109	within each service. It seems that more experimentation and
1110	experience are required before we could standardize more than
1111	one level per service class. Our base-level approach says that
1112	everyone has to provide "at least" Premium service and Assured
1113	service as documented. We feel rather strongly both 1) that
1114	we should not try to define, at this time, something beyond the
1115	minimalist two service approach and 2) that the architecture we
1116	define must be open-ended so that more levels of differentiation
1117	might be standardized in the future. We believe this architecture
1118	is completely compatible with approaches that would define
1119	more levels of differentiation within a particular service, if the
1120	benefits of doing so become well understood.

1122	7. Acknowledgments

1124	The authors have benefited from many discussions, both in
1125	person and electronically, and wish to particularly thank Dave
1126	Clark, who has been responsible for the genesis of many of the
1127	ideas presented here, though he does not agree with all of the
1128	content of this document. We also thank Sally Floyd for comments
1129	on an earlier draft. A comment from Jon Crowcroft was partially
1130	responsible for our including section 5. Comments from Fred
1131	Baker made us try to make it clearer that we are defining two
1132	base-level services, irrespective of the bit patterns used to encode
1133	them.

1135	8. References

1137	[1] D. Clark, "Adding Service Discrimination to the Internet",
1138	Proceedings of the 23rd Annual Telecommunications Policy Research
1139	Conference (TPRC), Solomons, MD, October 1995.

1141	[2] V. Jacobson, "Differentiated Services Architecture", talk in
1142	the Int-Serv WG at the Munich IETF, August, 1997.

1144	[3] D. Clark and J. Wroclawski, "An Approach to Service
1145	Allocation in the Internet", Internet Draft draft-clark-diff-svc-
1146	alloc-00.txt, July 1997, also talk by D. Clark in the Int-Serv WG
1147	at the Munich IETF, August, 1997.

1149	[4] Braden et al., "Recommendations on Queue Management
1150	and Congestion Avoidance in the Internet", Internet Draft,
1151	March, 1997.

1153	[4] Braden, R., Ed., et al., "Resource Reservation Protocol
1154	(RSVP) - Version 1 Functional Specification", RFC 2205,
1155	September, 1997.

1157	[5] S. Floyd and V. Jacobson, "Link-sharing and Resource
1158	Management Models for Packet Networks", IEEE/ACM
1159	Transactions on Networking, pp 365-386, August 1995.

1161	[6] D. Clark, private communication, October 26, 1997.

1163	[7] "Advanced QoS Services for the Intelligent Internet", Cisco
1164	Systems White Paper, 1997.

1166	[8] J. Wroclawski, "Specification of the Controlled-Load
1167	Network Element Service", RFC 2211, September, 1997.

1169	[9] S. Shenker, et al., "Specification of Guaranteed Quality of
1170	Service", RFC 2212, September, 1997.

1172	[10] D. Clark and W. Fang, "Explicit Allocation of Best Effort
1173	Packet Delivery Service", IEEE/ACM Transactions on Networking,
1174	Vol. 6, No. 4, pp. 362-373, August 1998. Also at:
1175	http://diffserv.lcs.mit.edu/Papers/exp-alloc-ddc-wf.pdf

1177	Authors' Addresses

1179	Kathleen Nichols
1180	Cisco Systems, Inc.
1181	170 West Tasman Drive
1182	San Jose, CA 95134-1706

1184	Phone: 408-525-4857
1185	Email: kmn@cisco.com

1187	Van Jacobson
1188	Cisco Systems, Inc.
1189	170 West Tasman Drive
1190	San Jose, CA 95134-1706

1192	Email: van@cisco.com

1194	Lixia Zhang
1195	UCLA
1196	4531G Boelter Hall
1197	Los Angeles, CA  90095

1199	Phone: 310-825-2695
1200	Email: lixia@cs.ucla.edu

1202	Appendix: A Combined Approach to Differential Service in the Internet by
1203	David D. Clark

1205	After draft-nichols-diff-svc-00 was submitted, the co-authors had a
1206	discussion with Dave Clark and John Wroclawski which resulted in Clark's
1207	using the presentation slot for the draft at the December 1997 IETF
1208	Integrated Services Working Group meeting. A reading of the slides shows
1209	that it was Clark's proposal on "mechanisms", "services", and "rules"
1210	and how to proceed in the standards process that has guided much of the
1211	process in the subsequently formed IETF Differentiated Services Working
1212	Group. We believe Dave Clark's talk gave us a solid approach for
1213	bringing quality of service to the Internet in a manner that is
1214	compatible with its strengths.

1216	The slides presented at the December 1997 IETF Integrated Services
1217	Working Group are included with the Postscript version.